fix(e2e): resolve test failures from issue #244 #245

Open
visahak wants to merge 3 commits into AgentToolkit:main from visahak:fix/e2e-244-test-failures

@visahak (Collaborator) commented May 1, 2026

Description:

Summary

Changes

  • altk_evolve/backend/filesystem.py — atomic writes via tmp + os.replace; graceful recovery from empty/corrupt JSON
  • platform-integrations/codex/.../subscribe.py — audit-append failure now warns and continues instead of rolling back with exit 1
  • platform-integrations/{claude,claw-code}/.../lib/config.py — invalid-repo-name warning prints to stdout with "(skipped - invalid subscription name)" wording
  • tests/e2e/test_e2e_segmentation.py: adds @pytest.mark.flaky(retries=2, delay=1) to absorb LLM variance
  • tests/e2e/test_sharing.py — assertion now checks leaked entity content + [public: alice] marker instead of the query term, which always echoes in the response header
  • .gitignore — exclude .codex
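
The "tmp + os.replace" pattern named in the filesystem.py bullet can be sketched roughly as follows. This is a minimal illustration rather than the PR's actual code: the helper name `save_json_atomically` and the plain-dict payload are assumptions made for the example.

```python
import json
import os
import tempfile
from pathlib import Path


def save_json_atomically(file_path: Path, payload: dict) -> None:
    """Write payload so readers see either the old file or the new one."""
    # Hypothetical helper: the temp file is a sibling of the target, so
    # os.replace never has to cross a filesystem boundary.
    tmp_path = file_path.with_suffix(file_path.suffix + ".tmp")
    tmp_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    # os.replace swaps the finished file into place in one step,
    # overwriting any existing target.
    os.replace(tmp_path, file_path)


def demo() -> dict:
    """Write twice to the same target and read back the final content."""
    with tempfile.TemporaryDirectory() as d:
        target = Path(d) / "demo.json"
        save_json_atomically(target, {"entities": []})
        save_json_atomically(target, {"entities": ["a"]})
        leftover_tmp = (Path(d) / "demo.json.tmp").exists()
        content = json.loads(target.read_text(encoding="utf-8"))
    return {"content": content, "leftover_tmp": leftover_tmp}
```

A crash before `os.replace` leaves the previous target untouched; only the temp file may be incomplete.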

Test plan

  • Group A: 3 Phoenix-sync e2e pipeline tests pass
  • Group B: 4 invalid-subscription-name tests pass across bob/codex/claude
  • Group C: test_subscribe_warns_when_audit_write_fails passes; full test_codex_sharing.py green

Addresses issue #244

Summary by CodeRabbit

  • Bug Fixes

    • Improved namespace file handling with atomic writes and enhanced detection/logging for empty or corrupt files
    • Subscription operations now continue despite audit logging failures, improving resilience
  • Tests

    • Added retry for a flaky segmentation test
    • Strengthened assertions for cross-namespace discovery behavior
  • Chores

    • Added a .codex ignore entry and adjusted some validation/log messages for subscription entries

  - filesystem backend: atomic writes + graceful recovery from corrupt JSON
  - codex subscribe: warn on audit-append failure instead of rolling back
  - shared config: invalid-repo-name warning goes to stdout
  - segmentation test: mark flaky to absorb LLM variance
  - cross-namespace sharing test: check leaked content, not echoed query
  - .gitignore: exclude .codex

coderabbitai Bot commented May 1, 2026

📝 Walkthrough

Updates include: ignoring .codex in git, atomic writes and stricter JSON handling for namespace persistence, minor logging/message adjustments in two plugin configs, non-fatal handling for subscription audit-log failures, and two test changes (flaky retry and stronger assertions).

Changes

Git ignore

| Layer / File(s) | Summary |
| --- | --- |
| Config: .gitignore | Add .codex pattern to ignore list. |

Namespace filesystem operations

| Layer / File(s) | Summary |
| --- | --- |
| Read / Validation: altk_evolve/backend/filesystem.py | Namespace loader reads namespace JSON as raw text and detects empty/whitespace-only files; logs warnings for empty or corrupt JSON and raises NamespaceNotFoundException (corruption errors chained). |
| Atomic Write: altk_evolve/backend/filesystem.py | Namespace persistence writes updated JSON to a temporary sibling file and atomically replaces the target with os.replace, which is more robust than writing the target directly; extended logging/error handling around interrupted/partial writes and JSON decode failures. |
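
The read path summarized in the table might look roughly like the sketch below. It is an assumption-laden illustration: `load_namespace` is a hypothetical function name, the exception class is a local stand-in for the project's `NamespaceNotFoundException`, and plain `json` is used in place of whatever model parsing the real loader performs.

```python
import json
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)


class NamespaceNotFoundException(Exception):
    """Local stand-in for the project's exception of the same name."""


def load_namespace(file_path: Path) -> dict:
    """Read a namespace file, treating empty or corrupt JSON as 'not found'."""
    raw = file_path.read_text(encoding="utf-8")
    if not raw.strip():
        # Empty or whitespace-only file, e.g. left by an interrupted write.
        logger.warning("namespace file %s is empty", file_path)
        raise NamespaceNotFoundException(f"{file_path.name} is empty")
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        logger.warning("namespace file %s is corrupt: %s", file_path, exc)
        # Chain the decode error so the root cause stays visible.
        raise NamespaceNotFoundException(f"{file_path.name} is corrupt") from exc


def demo() -> list:
    """Try an empty, a corrupt, and a valid file; report what happened."""
    outcomes = []
    cases = [("empty.json", "  \n"), ("bad.json", "{oops"), ("ok.json", '{"id": "ns"}')]
    with tempfile.TemporaryDirectory() as d:
        for name, text in cases:
            path = Path(d) / name
            path.write_text(text, encoding="utf-8")
            try:
                load_namespace(path)
                outcomes.append("loaded")
            except NamespaceNotFoundException as exc:
                outcomes.append("chained" if exc.__cause__ else "unchained")
    return outcomes
```

Only the corrupt-JSON case carries a chained cause, which matches the "corruption errors chained" note in the summary.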

Plugin configuration messages

| Layer / File(s) | Summary |
| --- | --- |
| Validation messaging: platform-integrations/claude/plugins/evolve-lite/lib/config.py | Change _coerce_repo invalid-name branch message wording to include “(skipped - invalid subscription name)” and switch to a plain print(...) (no explicit sys.stderr). |
| Validation messaging: platform-integrations/claw-code/plugins/evolve-lite/lib/config.py | Same wording change for _coerce_repo; replaces previous print(..., file=sys.stderr) with a plain print(...) and updated phrasing. |

Subscription audit logging behavior

| Layer / File(s) | Summary |
| --- | --- |
| Error handling change: platform-integrations/codex/plugins/evolve-lite/skills/subscribe/scripts/subscribe.py | On audit-log append failure, the previous fatal rollback and exit is replaced with a non-fatal warning; the script continues without performing the prior rollback/deletion sequence. |
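
As a rough illustration of the "warn and continue" behavior described for subscribe.py, consider the sketch below. The function names, the in-memory `subscriptions` list, and the choice of stderr for the warning are all assumptions for the example; the real script's structure is not shown in this PR page.

```python
import sys
import tempfile
from pathlib import Path


def append_audit(audit_path: Path, line: str) -> None:
    """Append one line to the audit log (hypothetical helper)."""
    with audit_path.open("a", encoding="utf-8") as f:
        f.write(line + "\n")


def subscribe(name: str, subscriptions: list, audit_path: Path) -> int:
    """Record a subscription; an audit failure only produces a warning."""
    subscriptions.append(name)
    try:
        append_audit(audit_path, f"subscribe {name}")
    except OSError as exc:
        # Non-fatal: keep the subscription and report the problem.
        print(f"warning: could not write audit log: {exc}", file=sys.stderr)
    return 0


def demo() -> tuple:
    """Force an audit failure and show that the subscription survives."""
    subs = []
    with tempfile.TemporaryDirectory() as d:
        # A directory cannot be opened for appending, so the write fails.
        code = subscribe("alice-notes", subs, Path(d))
    return code, subs
```

The trade-off is that the audit log can now miss entries, in exchange for the subscription itself no longer being undone by a logging problem.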

End-to-end tests

| Layer / File(s) | Summary |
| --- | --- |
| Flakiness handling: tests/e2e/test_e2e_segmentation.py | Add flaky test marker to test_segment_trajectory_min_subtasks enabling up to 2 retries with 1s delay. |
| Assertion strengthening: tests/e2e/test_sharing.py | test_cross_namespace_public_discovery now asserts the full content string "use dependency injection for testability" when applicable and explicitly asserts the absence of "[public: alice]" when include_public=False; comments adjusted accordingly. |
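
Why the old assertion in test_sharing.py could not detect a leak can be shown with a toy response. The response layout, the `render_results` helper, and the query string below are hypothetical; only the content string and the `[public: alice]` marker come from the PR.

```python
def render_results(query: str, hits: list) -> str:
    """Hypothetical response format: a header echoing the query, then hits."""
    lines = [f"Results for '{query}':"]
    lines += [f"- {content} {marker}" for content, marker in hits]
    return "\n".join(lines)


QUERY = "testability"  # assumed query term, for illustration only
PUBLIC_HIT = ("use dependency injection for testability", "[public: alice]")

# Response when public entities are included, and when they are not.
with_public = render_results(QUERY, [PUBLIC_HIT])
without_public = render_results(QUERY, [])

# The query term shows up in both responses because the header echoes it,
# so its presence or absence says nothing about leaked content.
query_echoed_everywhere = QUERY in with_public and QUERY in without_public


def leaked(response: str) -> bool:
    """Check for the entity content or the public marker, not the query."""
    content, marker = PUBLIC_HIT
    return content in response or marker in response
```

With this shape, `leaked(without_public)` is the meaningful check for the include_public=False case.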

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • gaodan-fang
  • illeatmyhat
  • vinodmut

Poem

🐰 I nibble logs and atomic writes so neat,
I tidy git ignores on tiny feet,
Warnings whisper where errors used to roar,
Tests retry and assertions ask for more,
A happy hare hops on—code tidy and sweet.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly aligns with the PR's primary objective: fixing e2e test failures from issue #244, which is confirmed across multiple file changes and test corrections. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
platform-integrations/claw-code/plugins/evolve-lite/lib/config.py (1)

339-352: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Inconsistent output streams for sibling warnings in _coerce_repo.

The invalid-name warning (lines 339–341) now writes to stdout, while the invalid-scope warning (lines 349–352) still writes to stderr. Both are diagnostic/warning messages emitted from the same function for invalid config entries — they should target the same stream.

If the change was driven by a test that captures stdout, consider capturing stderr in the test instead (which is the conventional target for warnings) rather than routing a warning to stdout.

🔧 Proposed fix — restore `stderr` for the invalid-name path
     if not is_valid_repo_name(name.strip()):
         print(
-            f"evolve-lite: {name!r} (skipped - invalid subscription name) — only A-Z, a-z, 0-9, '.', '_', '-' allowed"
+            f"evolve-lite: {name!r} (skipped - invalid subscription name) — only A-Z, a-z, 0-9, '.', '_', '-' allowed",
+            file=sys.stderr,
         )
         return None
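
The alternative suggested above, keeping the warning on stderr and capturing stderr in the test, could be sketched like this. The regex is inferred from the warning text, and `coerce_name` is a simplified stand-in for `_coerce_repo`, so treat both as assumptions.

```python
import contextlib
import io
import re
import sys

# Assumed rule, taken from the warning text: letters, digits, '.', '_', '-'.
_VALID_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_repo_name(name: str) -> bool:
    return _VALID_NAME.fullmatch(name) is not None


def coerce_name(name: str):
    """Simplified stand-in for _coerce_repo's invalid-name branch."""
    if not is_valid_repo_name(name.strip()):
        print(
            f"evolve-lite: {name!r} (skipped - invalid subscription name)",
            file=sys.stderr,
        )
        return None
    return name.strip()


def demo() -> tuple:
    """Capture both streams and return (result, stdout text, stderr text)."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        result = coerce_name("bad/name")
    return result, out.getvalue(), err.getvalue()
```

In a pytest test the same check would typically read `capsys.readouterr().err` instead of using `contextlib` directly.
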
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@platform-integrations/claw-code/plugins/evolve-lite/lib/config.py` around
lines 339 - 352, The invalid-name diagnostic in _coerce_repo currently prints to
stdout while the invalid-scope diagnostic prints to stderr, causing inconsistent
output streams; change the print in the invalid-name branch that references name
to write to sys.stderr (matching the other warning), keep the return None
behavior, and ensure you import or reference sys the same way as the scope
branch so both warnings (the print that mentions "skipped - invalid subscription
name" and the print that mentions "unknown scope") go to stderr; symbols to
locate: function _coerce_repo, variables name/remote/scope, VALID_SCOPES, and
the two print calls.
platform-integrations/claude/plugins/evolve-lite/lib/config.py (1)

339-352: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Same inconsistent output-stream issue as in claw-code/plugins/evolve-lite/lib/config.py.

Identical concern: the invalid-name print (lines 339–341) targets stdout while the invalid-scope print (lines 349–352) targets stderr. Both paths should use the same stream. See the proposed fix in the claw-code file review above and apply it here as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@platform-integrations/claude/plugins/evolve-lite/lib/config.py` around lines
339 - 352, The two diagnostic print calls in this module are inconsistent: the
invalid-name print (using name) currently writes to stdout while the
invalid-scope print (using entry.get("scope") and VALID_SCOPES) writes to
sys.stderr; change the invalid-name print so it also writes to sys.stderr (i.e.,
add file=sys.stderr) to match the invalid-scope branch and ensure both error
messages go to the same stream; locate the prints around the name validation and
the scope validation (they reference name, entry, scope and VALID_SCOPES) and
modify only the print for the invalid-name case to include file=sys.stderr.
🧹 Nitpick comments (1)
altk_evolve/backend/filesystem.py (1)

73-75: ⚡ Quick win

Use a unique tmp filename per write to be safe under multi-process deployments.

The deterministic {namespace_id}.json.tmp name is protected by threading.Lock within a single process, but not across OS-level workers. If two processes ever write the same namespace concurrently (e.g., uvicorn with --workers > 1), Writer B can overwrite the tmp file after Writer A wrote it but before Writer A calls os.replace, causing Writer A to atomically commit Writer B's (possibly partial) content.

Using a uniquely-named tmp file keeps the same-directory guarantee that os.replace needs while eliminating the race:

♻️ Proposed fix using `tempfile`
+import tempfile
 ...
     def _save_namespace_data(self, namespace_id: str, data: FilesystemNamespace):
         """Save namespace data to JSON file atomically. ..."""
         file_path = self._namespace_file(namespace_id)
-        tmp_path = file_path.with_suffix(file_path.suffix + ".tmp")
-        tmp_path.write_text(data.model_dump_json(indent=2))
-        os.replace(tmp_path, file_path)
+        fd, tmp_name = tempfile.mkstemp(dir=file_path.parent, suffix=".tmp")
+        try:
+            with os.fdopen(fd, "w", encoding="utf-8") as f:
+                f.write(data.model_dump_json(indent=2))
+            os.replace(tmp_name, file_path)
+        except Exception:
+            os.unlink(tmp_name)
+            raise

This also adds the encoding="utf-8" spec missing from both write_text and read_text (line 55) calls — relevant if any entity content contains non-ASCII characters on Windows where the default locale encoding may not be UTF-8.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@altk_evolve/backend/filesystem.py` around lines 73 - 75, The current write
uses a deterministic tmp file (tmp_path = file_path.with_suffix(...)) which
races across processes; change the write in the function/method that uses
tmp_path/file_path so it creates a unique temp file in the same directory (e.g.,
use tempfile.NamedTemporaryFile or tempfile.mkstemp with dir=file_path.parent
and a unique suffix), write the JSON with encoding="utf-8" (use write() or
write_text(..., encoding="utf-8")), flush/close the temp file, then call
os.replace(temp_path, file_path) to atomically commit; also update the
corresponding read_text call to include encoding="utf-8". Ensure the temp file
is removed on exceptions if not replaced.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ff5fdb0b-f09a-4c15-8202-8ffb15eee833

📥 Commits

Reviewing files that changed from the base of the PR and between d8b9ac0 and 83c55f6.

📒 Files selected for processing (7)
  • .gitignore
  • altk_evolve/backend/filesystem.py
  • platform-integrations/claude/plugins/evolve-lite/lib/config.py
  • platform-integrations/claw-code/plugins/evolve-lite/lib/config.py
  • platform-integrations/codex/plugins/evolve-lite/skills/subscribe/scripts/subscribe.py
  • tests/e2e/test_e2e_segmentation.py
  • tests/e2e/test_sharing.py

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
platform-integrations/claude/plugins/evolve-lite/lib/config.py (1)

338-351: ⚡ Quick win

Inconsistent output streams for sibling warnings within _coerce_repo.

The invalid-name path (line 339) now emits to stdout via bare print(), while the invalid-scope path (lines 347-350) still emits to stderr via print(..., file=sys.stderr). Both are diagnostics for malformed config entries in the same function; routing them to different streams is surprising to callers.

As noted in the relevant context, sync.py already uses capture_output=True for subprocess calls. If any parent process captures these script invocations with stdout capture, the invalid-name warning will be inadvertently swallowed into the captured output buffer rather than flowing to the diagnostic stream.

If the test suite change that prompted this was asserting on stdout, consider redirecting both warnings to stderr and updating the tests to check capsys.readouterr().err (or equivalent), which keeps all config diagnostics on the standard diagnostic channel.

♻️ Proposed alignment (both warnings → stderr)
     if not is_valid_repo_name(name.strip()):
-        print(f"evolve-lite: {name!r} (skipped - invalid subscription name) — only A-Z, a-z, 0-9, '.', '_', '-' allowed")
+        print(
+            f"evolve-lite: {name!r} (skipped - invalid subscription name) — only A-Z, a-z, 0-9, '.', '_', '-' allowed",
+            file=sys.stderr,
+        )
         return None
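
The stream-capture concern raised above can be demonstrated with a small parent/child pair. The child script here is a made-up stand-in, not sync.py or config.py; it only shows how `capture_output=True` keeps stdout and stderr in separate buffers.

```python
import subprocess
import sys

# Hypothetical child script: one line of real output, one diagnostic.
CHILD = (
    "import sys\n"
    "print('result: ok')\n"
    "print('warning: skipped entry', file=sys.stderr)\n"
)


def run_child() -> tuple:
    """Run the child and return its captured (stdout, stderr) text."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout, proc.stderr
```

A parent that parses only `proc.stdout` gets clean data as long as warnings stay on stderr; a warning printed with a bare `print()` would end up mixed into that data instead.
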
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@platform-integrations/claude/plugins/evolve-lite/lib/config.py` around lines
338 - 351, In _coerce_repo, the invalid-name warning currently uses print(...)
to stdout while the invalid-scope warning prints to stderr; change the
invalid-name branch that prints "evolve-lite: {name!r} (skipped - invalid
subscription name) ..." to write to stderr (e.g., use print(...,
file=sys.stderr)) so both diagnostics go to stderr, and update any tests that
asserted on stdout to assert on stderr (capsys.readouterr().err) as needed;
reference symbols: function _coerce_repo, variables name and entry, and
VALID_SCOPES.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7a4305af-488c-4d5a-a4b3-343458c79cd7

📥 Commits

Reviewing files that changed from the base of the PR and between 83c55f6 and 904c917.

📒 Files selected for processing (2)
  • platform-integrations/claude/plugins/evolve-lite/lib/config.py
  • platform-integrations/claw-code/plugins/evolve-lite/lib/config.py
✅ Files skipped from review due to trivial changes (1)
  • platform-integrations/claw-code/plugins/evolve-lite/lib/config.py
