fix(e2e): resolve test failures from issue #244 by visahak · Pull Request #245 · AgentToolkit/altk-evolve

visahak · 2026-05-01T20:04:00Z

Description:

Summary

Fix 3 categories of e2e test failures reported in #244 ("E2E suite still has 6 failures and 32 errors after installing examples/milvus extras"), plus 2 failing sharing tests discovered in the full-suite run
Harden the filesystem backend against interrupted writes, which were causing 0-byte namespace files and cascading test failures
Test-side fixes for assertion bugs and LLM variance

Changes

altk_evolve/backend/filesystem.py — atomic writes via tmp + os.replace; graceful recovery from empty/corrupt JSON
platform-integrations/codex/.../subscribe.py — audit-append failure now warns and continues instead of rolling back with exit 1
platform-integrations/{claude,claw-code}/.../lib/config.py — invalid-repo-name warning prints to stdout with "(skipped - invalid subscription name)" wording
tests/e2e/test_e2e_segmentation.py — @pytest.mark.flaky(retries=2, delay=1) to absorb LLM variance
tests/e2e/test_sharing.py — assertion now checks leaked entity content + [public: alice] marker instead of the query term, which always echoes in the response header
.gitignore — exclude .codex
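
The "tmp + os.replace" pattern named in the filesystem bullet can be sketched as follows. This is a minimal illustration rather than the PR's actual code: `save_json_atomically` and the plain `dict` payload are hypothetical stand-ins for the backend's own method and namespace model.

```python
import json
import os
import tempfile
from pathlib import Path


def save_json_atomically(file_path: Path, payload: dict) -> None:
    """Write payload so that readers see either the old file or the new one."""
    # The temp file is a sibling of the target, so os.replace stays on one
    # filesystem and can swap the file in a single step.
    tmp_path = file_path.with_suffix(file_path.suffix + ".tmp")
    tmp_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    # If the process dies before this line, the target file is untouched.
    os.replace(tmp_path, file_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target = Path(workdir) / "demo_namespace.json"  # hypothetical file name
        save_json_atomically(target, {"entities": []})
        save_json_atomically(target, {"entities": ["a"]})
        print(json.loads(target.read_text(encoding="utf-8")))
```

An interrupted write can leave a stale `.tmp` sibling behind, but it should not leave a 0-byte target file, which is the failure mode the summary describes.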

Test plan

Group A: 3 Phoenix-sync e2e pipeline tests pass
Group B: 4 invalid-subscription-name tests pass across bob/codex/claude
Group C: test_subscribe_warns_when_audit_write_fails passes; full test_codex_sharing.py green

Addresses issue #244

Summary by CodeRabbit

Bug Fixes
- Improved namespace file handling with atomic writes and enhanced detection/logging for empty or corrupt files
- Subscription operations now continue despite audit logging failures, improving resilience
Tests
- Added retry for a flaky segmentation test
- Strengthened assertions for cross-namespace discovery behavior
Chores
- Added a .codex ignore entry and adjusted some validation/log messages for subscription entries

- filesystem backend: atomic writes + graceful recovery from corrupt JSON
- codex subscribe: warn on audit-append failure instead of rolling back
- shared config: invalid-repo-name warning goes to stdout
- segmentation test: mark flaky to absorb LLM variance
- cross-namespace sharing test: check leaked content, not echoed query
- .gitignore: exclude .codex

coderabbitai · 2026-05-01T20:04:13Z

📝 Walkthrough


Updates include: ignoring .codex in git, atomic writes and stricter JSON handling for namespace persistence, minor logging/message adjustments in two plugin configs, non-fatal handling for subscription audit-log failures, and two test changes (flaky retry and stronger assertions).

Changes

Git ignore

Layer / File(s)	Summary
Config `.gitignore`	Add `.codex` pattern to ignore list.

Namespace filesystem operations

Layer / File(s)	Summary
Read / Validation `altk_evolve/backend/filesystem.py`	Namespace loader reads namespace JSON as raw text and detects empty/whitespace-only files; logs warnings for empty or corrupt JSON and raises `NamespaceNotFoundException` (corruption errors chained).
Atomic Write `altk_evolve/backend/filesystem.py`	Namespace persistence writes updated JSON to a temporary sibling file and atomically replaces target with `os.replace`, improving robustness vs direct writes; extended logging/error handling around interrupted/partial writes and JSON decode failures.
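
The read-side behaviour summarised in the table (read raw text, treat empty or whitespace-only files as missing, chain decode errors) might look roughly like the sketch below. The function name `load_namespace_json` is hypothetical, and the exception class here is only a local stand-in for the project's `NamespaceNotFoundException`.

```python
import json
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)


class NamespaceNotFoundException(Exception):
    """Local stand-in for the project's exception of the same name."""


def load_namespace_json(file_path: Path) -> dict:
    """Return the parsed namespace, or raise if it is missing, empty, or corrupt."""
    if not file_path.exists():
        raise NamespaceNotFoundException(f"{file_path} does not exist")
    raw = file_path.read_text(encoding="utf-8")
    if not raw.strip():
        # A 0-byte or whitespace-only file is what an interrupted write leaves behind.
        logger.warning("Namespace file %s is empty", file_path)
        raise NamespaceNotFoundException(f"{file_path} is empty")
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        logger.warning("Namespace file %s is corrupt: %s", file_path, exc)
        # "from exc" keeps the decode error available as __cause__.
        raise NamespaceNotFoundException(f"{file_path} is corrupt") from exc


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        broken = Path(workdir) / "broken.json"
        broken.write_text("{not json", encoding="utf-8")
        try:
            load_namespace_json(broken)
        except NamespaceNotFoundException as err:
            print(type(err.__cause__).__name__)
```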

Plugin configuration messages

Layer / File(s)	Summary
Validation messaging `platform-integrations/claude/plugins/evolve-lite/lib/config.py`	Change `_coerce_repo` invalid-name branch message wording to include “(skipped - invalid subscription name)” and switch to a plain `print(...)` (no explicit `sys.stderr`).
Validation messaging `platform-integrations/claw-code/plugins/evolve-lite/lib/config.py`	Same wording change for `_coerce_repo`; replaces previous `print(..., file=sys.stderr)` with a plain `print(...)` and updated phrasing.

Subscription audit logging behavior

Layer / File(s)	Summary
Error handling change `platform-integrations/codex/plugins/evolve-lite/skills/subscribe/scripts/subscribe.py`	On audit-log append failure, previously a fatal rollback+exit is replaced with a non-fatal warning; script continues without performing the prior rollback/deletion sequence.
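
The warn-and-continue behaviour described in this row can be illustrated with a small sketch. Everything here is assumed for illustration: `append_audit_entry`, `subscribe`, the in-memory `subscriptions` list, and the choice of stderr for the warning are not taken from `subscribe.py`.

```python
import sys
import tempfile
from pathlib import Path


def append_audit_entry(audit_path: Path, line: str) -> bool:
    """Try to append one line to the audit log; warn instead of aborting."""
    try:
        with audit_path.open("a", encoding="utf-8") as handle:
            handle.write(line + "\n")
        return True
    except OSError as exc:
        # Non-fatal: the caller keeps the subscription it has already created.
        print(f"warning: could not write audit log {audit_path}: {exc}", file=sys.stderr)
        return False


def subscribe(name: str, audit_path: Path, subscriptions: list) -> int:
    """Record a subscription and return a process-style exit code."""
    subscriptions.append(name)
    # Under the previous behaviour, a failure here would have undone the
    # append above and returned 1; now it is only reported.
    append_audit_entry(audit_path, f"subscribe {name}")
    return 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        # Pointing at a missing directory forces the append to fail.
        bad_audit = Path(workdir) / "missing" / "audit.log"
        subs = []
        print(subscribe("alice", bad_audit, subs), subs)
```

The trade-off is that the audit log can now have gaps, so the warning is the only signal that an entry was lost.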

End-to-end tests

Layer / File(s)	Summary
Flakiness handling `tests/e2e/test_e2e_segmentation.py`	Add flaky test marker to `test_segment_trajectory_min_subtasks` enabling up to 2 retries with 1s delay.
Assertion strengthening `tests/e2e/test_sharing.py`	`test_cross_namespace_public_discovery` now asserts the full content string `"use dependency injection for testability"` when applicable and explicitly asserts the absence of `"[public: alice]"` when `include_public=False`; comments adjusted accordingly.
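
To see why asserting on the query term was too weak, consider the following sketch. The response format, `render_response`, and the query string are hypothetical; only the entity content string and the `[public: alice]` marker come from the summary above.

```python
ENTITY_CONTENT = "use dependency injection for testability"
PUBLIC_MARKER = "[public: alice]"


def render_response(query: str, hits: list) -> str:
    """Hypothetical renderer whose header always echoes the query."""
    lines = [f"Results for '{query}':"]
    lines.extend(hits)
    return "\n".join(lines)


def leaks_public_entity(response: str) -> bool:
    """True if the response exposes the shared entity or its public marker."""
    return ENTITY_CONTENT in response or PUBLIC_MARKER in response


query = "dependency injection"  # assumed query term
private_only = render_response(query, [])
with_public = render_response(query, [f"{PUBLIC_MARKER} {ENTITY_CONTENT}"])

# Weak check: the query term appears even when nothing was shared,
# so "query not in response" can never hold.
assert query in private_only

# Stronger checks: look for the entity content and the marker instead.
assert not leaks_public_entity(private_only)
assert leaks_public_entity(with_public)
```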

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

gaodan-fang
illeatmyhat
vinodmut

Poem

🐰 I nibble logs and atomic writes so neat,
I tidy git ignores on tiny feet,
Warnings whisper where errors used to roar,
Tests retry and assertions ask for more,
A happy hare hops on—code tidy and sweet.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly aligns with the PR's primary objective: fixing e2e test failures from issue `#244`, which is confirmed across multiple file changes and test corrections.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

platform-integrations/claw-code/plugins/evolve-lite/lib/config.py (1)
339-352: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Inconsistent output streams for sibling warnings in _coerce_repo.

The invalid-name warning (lines 339–341) now writes to stdout, while the invalid-scope warning (lines 349–352) still writes to stderr. Both are diagnostic/warning messages emitted from the same function for invalid config entries — they should target the same stream.

If the change was driven by a test that captures stdout, consider capturing stderr in the test instead (which is the conventional target for warnings) rather than routing a warning to stdout.
🔧 Proposed fix — restore `stderr` for the invalid-name path
     if not is_valid_repo_name(name.strip()):
         print(
-            f"evolve-lite: {name!r} (skipped - invalid subscription name) — only A-Z, a-z, 0-9, '.', '_', '-' allowed"
+            f"evolve-lite: {name!r} (skipped - invalid subscription name) — only A-Z, a-z, 0-9, '.', '_', '-' allowed",
+            file=sys.stderr,
         )
         return None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@platform-integrations/claw-code/plugins/evolve-lite/lib/config.py` around
lines 339 - 352, The invalid-name diagnostic in _coerce_repo currently prints to
stdout while the invalid-scope diagnostic prints to stderr, causing inconsistent
output streams; change the print in the invalid-name branch that references name
to write to sys.stderr (matching the other warning), keep the return None
behavior, and ensure you import or reference sys the same way as the scope
branch so both warnings (the print that mentions "skipped - invalid subscription
name" and the print that mentions "unknown scope") go to stderr; symbols to
locate: function _coerce_repo, variables name/remote/scope, VALID_SCOPES, and
the two print calls.
platform-integrations/claude/plugins/evolve-lite/lib/config.py (1)
339-352: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Same inconsistent output-stream issue as in claw-code/plugins/evolve-lite/lib/config.py.

Identical concern: the invalid-name print (lines 339–341) targets stdout while the invalid-scope print (lines 349–352) targets stderr. Both paths should use the same stream. See the proposed fix in the claw-code file review above and apply it here as well.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@platform-integrations/claude/plugins/evolve-lite/lib/config.py` around lines
339 - 352, The two diagnostic print calls in this module are inconsistent: the
invalid-name print (using name) currently writes to stdout while the
invalid-scope print (using entry.get("scope") and VALID_SCOPES) writes to
sys.stderr; change the invalid-name print so it also writes to sys.stderr (i.e.,
add file=sys.stderr) to match the invalid-scope branch and ensure both error
messages go to the same stream; locate the prints around the name validation and
the scope validation (they reference name, entry, scope and VALID_SCOPES) and
modify only the print for the invalid-name case to include file=sys.stderr.

🧹 Nitpick comments (1)

altk_evolve/backend/filesystem.py (1)
73-75: ⚡ Quick win

Use a unique tmp filename per write to be safe under multi-process deployments.

The deterministic {namespace_id}.json.tmp name is protected by threading.Lock within a single process, but not across OS-level workers. If two processes ever write the same namespace concurrently (e.g., uvicorn with --workers > 1), Writer B can overwrite the tmp file after Writer A wrote it but before Writer A calls os.replace, causing Writer A to atomically commit Writer B's (possibly partial) content.

Using a uniquely-named tmp file keeps the same-directory guarantee that os.replace needs while eliminating the race:
♻️ Proposed fix using `tempfile`
+import tempfile
 ...
     def _save_namespace_data(self, namespace_id: str, data: FilesystemNamespace):
         """Save namespace data to JSON file atomically. ..."""
         file_path = self._namespace_file(namespace_id)
-        tmp_path = file_path.with_suffix(file_path.suffix + ".tmp")
-        tmp_path.write_text(data.model_dump_json(indent=2))
-        os.replace(tmp_path, file_path)
+        fd, tmp_name = tempfile.mkstemp(dir=file_path.parent, suffix=".tmp")
+        try:
+            with os.fdopen(fd, "w", encoding="utf-8") as f:
+                f.write(data.model_dump_json(indent=2))
+            os.replace(tmp_name, file_path)
+        except Exception:
+            os.unlink(tmp_name)
+            raise
This also adds the encoding="utf-8" spec missing from both write_text and read_text (line 55) calls — relevant if any entity content contains non-ASCII characters on Windows where the default locale encoding may not be UTF-8.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@altk_evolve/backend/filesystem.py` around lines 73 - 75, The current write
uses a deterministic tmp file (tmp_path = file_path.with_suffix(...)) which
races across processes; change the write in the function/method that uses
tmp_path/file_path so it creates a unique temp file in the same directory (e.g.,
use tempfile.NamedTemporaryFile or tempfile.mkstemp with dir=file_path.parent
and a unique suffix), write the JSON with encoding="utf-8" (use write() or
write_text(..., encoding="utf-8")), flush/close the temp file, then call
os.replace(temp_path, file_path) to atomically commit; also update the
corresponding read_text call to include encoding="utf-8". Ensure the temp file
is removed on exceptions if not replaced.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ff5fdb0b-f09a-4c15-8202-8ffb15eee833

📥 Commits

Reviewing files that changed from the base of the PR and between d8b9ac0 and 83c55f6.

📒 Files selected for processing (7)

.gitignore
altk_evolve/backend/filesystem.py
platform-integrations/claude/plugins/evolve-lite/lib/config.py
platform-integrations/claw-code/plugins/evolve-lite/lib/config.py
platform-integrations/codex/plugins/evolve-lite/skills/subscribe/scripts/subscribe.py
tests/e2e/test_e2e_segmentation.py
tests/e2e/test_sharing.py

coderabbitai

🧹 Nitpick comments (1)

platform-integrations/claude/plugins/evolve-lite/lib/config.py (1)
338-351: ⚡ Quick win

Inconsistent output streams for sibling warnings within _coerce_repo.

The invalid-name path (line 339) now emits to stdout via bare print(), while the invalid-scope path (lines 347-350) still emits to stderr via print(..., file=sys.stderr). Both are diagnostics for malformed config entries in the same function; routing them to different streams is surprising to callers.

As noted in the relevant context, sync.py already uses capture_output=True for subprocess calls. If any parent process captures these script invocations with stdout capture, the invalid-name warning will be inadvertently swallowed into the captured output buffer rather than flowing to the diagnostic stream.

If the test suite change that prompted this was asserting on stdout, consider redirecting both warnings to stderr and updating the tests to check capsys.readouterr().err (or equivalent), which keeps all config diagnostics on the standard diagnostic channel.
♻️ Proposed alignment (both warnings → stderr)
     if not is_valid_repo_name(name.strip()):
-        print(f"evolve-lite: {name!r} (skipped - invalid subscription name) — only A-Z, a-z, 0-9, '.', '_', '-' allowed")
+        print(
+            f"evolve-lite: {name!r} (skipped - invalid subscription name) — only A-Z, a-z, 0-9, '.', '_', '-' allowed",
+            file=sys.stderr,
+        )
         return None
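
A test that captures stderr rather than stdout, as suggested above, could be sketched with only the standard library. `coerce_repo_name` is a simplified, hypothetical stand-in for `_coerce_repo`, and the regex is inferred from the allowed-character list in the warning text; a pytest suite would more likely use `capsys.readouterr().err`.

```python
import contextlib
import io
import re
import sys
from typing import Optional

# Inferred from the warning text: A-Z, a-z, 0-9, '.', '_', '-'.
_VALID_NAME = re.compile(r"[A-Za-z0-9._-]+")


def coerce_repo_name(name: str) -> Optional[str]:
    """Return the stripped name, or None after warning on stderr."""
    cleaned = name.strip()
    if not _VALID_NAME.fullmatch(cleaned):
        print(
            f"evolve-lite: {name!r} (skipped - invalid subscription name)",
            file=sys.stderr,
        )
        return None
    return cleaned


def run_and_capture(name: str):
    """Call coerce_repo_name and return (result, stdout text, stderr text)."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        result = coerce_repo_name(name)
    return result, out.getvalue(), err.getvalue()


result, out_text, err_text = run_and_capture("bad name!")
assert result is None
assert out_text == ""  # nothing leaks into captured stdout
assert "(skipped - invalid subscription name)" in err_text
```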
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@platform-integrations/claude/plugins/evolve-lite/lib/config.py` around lines
338 - 351, In _coerce_repo, the invalid-name warning currently uses print(...)
to stdout while the invalid-scope warning prints to stderr; change the
invalid-name branch that prints "evolve-lite: {name!r} (skipped - invalid
subscription name) ..." to write to stderr (e.g., use print(...,
file=sys.stderr)) so both diagnostics go to stderr, and update any tests that
asserted on stdout to assert on stderr (capsys.readouterr().err) as needed;
reference symbols: function _coerce_repo, variables name and entry, and
VALID_SCOPES.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7a4305af-488c-4d5a-a4b3-343458c79cd7

📥 Commits

Reviewing files that changed from the base of the PR and between 83c55f6 and 904c917.

📒 Files selected for processing (2)

platform-integrations/claude/plugins/evolve-lite/lib/config.py
platform-integrations/claw-code/plugins/evolve-lite/lib/config.py

✅ Files skipped from review due to trivial changes (1)

platform-integrations/claw-code/plugins/evolve-lite/lib/config.py

coderabbitai Bot reviewed May 1, 2026


visahak and others added 2 commits May 1, 2026 22:11

fix ruff formatting

253338f

Merge branch 'main' into fix/e2e-244-test-failures

904c917

coderabbitai Bot reviewed May 2, 2026


visahak requested review from gaodan-fang, illeatmyhat and vinodmut May 3, 2026 00:40

fix(e2e): resolve test failures from issue #244 #245
visahak wants to merge 3 commits into AgentToolkit:main from visahak:fix/e2e-244-test-failures
