
Harden wrapped media tag parsing and batch tag ordering #22

Merged: brianmeyer merged 3 commits into master from codex/rec-192-193-hardening on Mar 22, 2026


@brianmeyer (Owner) commented Mar 22, 2026

Summary

  • harden wrapped generator tag parsing so fenced JSON-like outputs do not leak malformed tags
  • keep search_batch tag ordering deterministic across duplicate hits from concurrent query branches
  • stabilize watch-folder async tests so CI asserts the daemon dedup and polling behavior instead of scheduler timing
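The first two summary items can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the generator emits tags as a JSON list (or an object with a `tags` key) that may be wrapped in a Markdown code fence; `extract_tags` and `merge_tags` are illustrative names, not RecallForge's actual functions. Malformed payloads yield no tags instead of leaking raw text, and merged tags are sorted so the result does not depend on which concurrent query branch finished first.

```python
import json
import re

# Hypothetical: matches an optional fence (three backticks plus an optional
# language tag) wrapped around the whole payload.
_FENCE_RE = re.compile(r"^\s*`{3}[A-Za-z]*\s*\n?(.*?)\n?\s*`{3}\s*$", re.DOTALL)


def extract_tags(raw: str) -> list[str]:
    """Parse tags from generator output, tolerating a surrounding code fence.

    Anything that is not valid JSON, or not a non-empty string tag, is dropped.
    """
    match = _FENCE_RE.match(raw)
    payload = match.group(1) if match else raw
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return []  # malformed output: emit no tags rather than raw fragments
    if isinstance(data, dict):
        data = data.get("tags", [])
    if not isinstance(data, list):
        return []
    return [t.strip() for t in data if isinstance(t, str) and t.strip()]


def merge_tags(*tag_lists: list[str]) -> list[str]:
    """Union tags from duplicate hits in a deterministic (sorted) order."""
    return sorted({tag for tags in tag_lists for tag in tags})
```

Sorting trades "first seen" order for reproducibility; first-seen order would vary with scheduling when duplicate hits arrive from concurrent branches.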

Testing

  • pytest -q tests/test_search_batch.py tests/test_storage.py -q
  • pytest -q tests/test_watch_folder.py -q
  • a repeated, focused watch-folder loop passed 5/5 runs for the create, image, and document paths
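Asserting dedup and polling behavior instead of scheduler timing usually means waiting on an observable condition with a deadline rather than sleeping for a fixed interval. The following is a minimal asyncio sketch that assumes nothing about the real test suite; `wait_until`, `demo`, and `fake_daemon` are hypothetical stand-ins.

```python
import asyncio
from typing import Callable


async def wait_until(
    predicate: Callable[[], bool], timeout: float = 5.0, interval: float = 0.05
) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        if predicate():
            return True
        if loop.time() >= deadline:
            return False
        await asyncio.sleep(interval)


async def demo() -> list[str]:
    processed: list[str] = []
    seen: set[str] = set()

    async def fake_daemon(events: list[str]) -> None:
        # Hypothetical daemon: handles events in order, skipping repeated paths.
        for path in events:
            await asyncio.sleep(0.01)  # stand-in for polling latency
            if path not in seen:
                seen.add(path)
                processed.append(path)

    task = asyncio.create_task(fake_daemon(["a.txt", "a.txt", "b.png"]))
    # Assert on the outcome (two unique paths), not on how long scheduling took.
    ok = await wait_until(lambda: len(processed) == 2, timeout=2.0)
    await task
    assert ok
    return processed
```

Running `asyncio.run(demo())` returns the deduplicated paths; a generous timeout only bounds failure cases and does not slow down passing runs.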

@brianmeyer (Owner, Author) commented:

Closing temporarily while I push a CI-stabilization follow-up to the same branch; I'll reopen the PR on that branch so the checks run fresh.

@brianmeyer closed this Mar 22, 2026

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ceb95e473f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread: src/recallforge/search.py (outdated)

```python
canonical_path = representative.memory_root_path or top_hit.memory_root_path
if canonical_path:
    representative.filepath = canonical_path
```

P1: Preserve collection prefix when canonicalizing rolled-up filepath

Assigning `representative.filepath = canonical_path` replaces the URI-style path (`recallforge://{collection}/{file_path}`) with `memory_root_path`, which is only the raw `file_path` from storage metadata. In cross-collection searches, distinct results that share a file path then collide, which can cause incorrect dedup/merging (for example, `search_batch` merges by `filepath`). The rolled-up result should keep a collection-qualified `filepath` and use the root path only for display/provenance fields.
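To make the collision concrete, here is a minimal sketch of a merge keyed on file path. `Hit` and `merge_by_key` are hypothetical stand-ins rather than RecallForge's classes; the URI shape follows the review comment.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical search result; field names follow the review comment.
    collection: str
    file_path: str  # raw, root-relative path from storage metadata
    score: float

    @property
    def filepath(self) -> str:
        # Collection-qualified URI, as described in the review.
        return f"recallforge://{self.collection}/{self.file_path}"


def merge_by_key(hits, key):
    """Keep the best-scoring hit per key, like a filepath-keyed merge would."""
    best = {}
    for hit in hits:
        k = key(hit)
        if k not in best or hit.score > best[k].score:
            best[k] = hit
    return list(best.values())


hits = [Hit("notes", "2024/plan.md", 0.9), Hit("work", "2024/plan.md", 0.8)]
raw = merge_by_key(hits, lambda h: h.file_path)       # collides: 1 result
qualified = merge_by_key(hits, lambda h: h.filepath)  # distinct: 2 results
```

Keyed on the raw path, the `work` result silently disappears behind the higher-scoring `notes` result.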


@brianmeyer reopened this Mar 22, 2026

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 549da58bae


Comment thread: src/recallforge/search.py (outdated)
Comment on lines +1265 to +1268
```python
canonical_path = representative.memory_root_path or top_hit.memory_root_path
if canonical_path:
    representative.filepath = canonical_path
    representative.display_path = canonical_path
```

P1: Preserve namespaced filepath when rolling up memory hits

Setting `representative.filepath` to `memory_root_path` drops the `recallforge://<collection>/...` namespace and rewrites rolled-up results to a raw storage path. In mixed-collection searches, two different memories with the same root-relative path now collide because downstream merging/dedup keys on `filepath` (e.g. `search_batch`), so results can be merged incorrectly or hidden.
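The reviewer's suggested direction (keep the collection-qualified `filepath` as the merge key and use the memory root only for display) could look like the following sketch. `Result` and `canonicalize` are hypothetical names; only the attribute names come from the diff excerpt, and this is not necessarily how the merged commit resolved the comment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    # Hypothetical result shape; only the fields named in the diff are modeled.
    filepath: str  # collection-qualified; used as the merge/dedup key
    display_path: str
    memory_root_path: Optional[str] = None


def canonicalize(representative: Result, top_hit: Result) -> Result:
    """Show the memory root to users, but never overwrite the namespaced key."""
    canonical_path = representative.memory_root_path or top_hit.memory_root_path
    if canonical_path:
        representative.display_path = canonical_path
    return representative
```

Because `filepath` is left untouched, downstream consumers such as a filepath-keyed merge keep seeing distinct keys per collection.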


Comment thread: src/recallforge/search.py (outdated)
Comment on lines +1278 to +1282
```python
representative.memory_primary_evidence_path = top_hit.filepath
representative.memory_supporting_paths = [
    item.filepath
    for item in group
    if item.filepath != representative.filepath
```

P2: Exclude primary evidence from supporting path list

`memory_supporting_paths` is filtered against `representative.filepath` instead of `memory_primary_evidence_path`. When the top hit is a child asset, its path differs from the representative's `filepath` (which was just set to the memory root path), so it passes the filter and appears in both primary and supporting evidence. This duplicates provenance entries and can evict a real supporting sibling, because the list is truncated to 5 items.
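One way to express the suggested fix, as a hypothetical helper: filter against the primary evidence path and apply the 5-item cap after filtering, so the primary path never occupies a slot. Skipping repeated paths is an extra assumption beyond what the review asks for.

```python
def supporting_paths(group_paths: list[str], primary: str, limit: int = 5) -> list[str]:
    """Group paths minus the primary evidence path, deduplicated, capped at `limit`."""
    seen = {primary}  # the primary path is excluded up front
    out: list[str] = []
    for path in group_paths:
        if len(out) >= limit:
            break
        if path in seen:
            continue
        seen.add(path)
        out.append(path)
    return out


group = ["r/child.png", "r/a", "r/b", "r/c", "r/d", "r/e"]
# Filtering against the root path "r" keeps the child, so "r/e" is evicted:
naive = [p for p in group if p != "r"][:5]
fixed = supporting_paths(group, primary="r/child.png")
```

Here `naive` repeats `r/child.png` (already the primary evidence) and drops `r/e`, while `fixed` lists five genuine siblings.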


@chatgpt-codex-connector (bot) commented:

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@brianmeyer merged commit cb193c3 into master Mar 22, 2026
3 checks passed
@brianmeyer deleted the codex/rec-192-193-hardening branch March 22, 2026 23:54