Skip to content

Add derived media tags for root memories#17

Merged
brianmeyer merged 2 commits intomasterfrom
codex/rec-186-media-tags
Mar 22, 2026
Merged

Add derived media tags for root memories#17
brianmeyer merged 2 commits intomasterfrom
codex/rec-186-media-tags

Conversation

@brianmeyer
Copy link
Copy Markdown
Owner

Summary

  • generate normalized root-memory image/video tags via the existing Qwen helper path during ingest
  • persist tags on root embedding metadata and surface them through memory reads and search responses
  • add regression coverage for tag storage, memory rollup, and API serialization

Verification

  • pytest -q tests/test_storage.py tests/test_search_pipeline.py tests/test_config_tools.py tests/test_json_compliance.py

Linear

  • REC-186

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7e4a18e798

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +145 to +152
for line in (line.strip() for line in text.splitlines() if line.strip()):
lowered = line.lower()
if lowered.startswith("tags:"):
line = line.split(":", 1)[1]
if "," in line:
candidates.extend(part.strip() for part in line.split(",") if part.strip())
else:
candidates.append(line)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Strip code fences before splitting generated tags

If the model returns fenced JSON (for example ```json ... ```), the JSON parse path is skipped and the fallback line splitter treats fence tokens and bracket fragments as tags (e.g. json, ["cat"). This produces malformed tags that get persisted and later surfaced in search/memory APIs, degrading retrieval quality for media memories. The parser should remove markdown fences (or extract the JSON payload) before the line/comma fallback.

Useful? React with 👍 / 👎.

Comment on lines +1204 to +1206
video_tags = (
self._generate_media_tags(embed_video_func or embed_image_func, video_body, "video")
if caption_media
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fall back to image generator when video generator is missing

Video tag generation currently passes embed_video_func or embed_image_func as a single candidate. When embed_video_func is present but does not implement generate_text (or is a plain callable wrapper), _generate_media_tags returns no tags and never tries embed_image_func, even if the image backend can generate tags. This silently drops derived tags for video roots in mixed-backend setups.

Useful? React with 👍 / 👎.

@brianmeyer brianmeyer merged commit 998d716 into master Mar 22, 2026
4 checks passed
@brianmeyer brianmeyer deleted the codex/rec-186-media-tags branch March 22, 2026 18:53
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b076eaf85b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/recallforge/search.py
score=data['rrf_score'],
source=','.join(str(i) for i in sorted(data['query_indices'])),
query_scores=data['query_scores'],
tags=getattr(result, "tags", None),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Merge tags across duplicate batch hits

search_batch deduplicates on filepath but keeps a single stored result object, and the new tags field is copied only from that one object. When the same filepath is returned by multiple queries with different tag sets (for example, hybrid mode with memory rollup vs. fts/vec mode), later query tags are discarded, and because futures complete asynchronously this can make returned tags nondeterministic across runs. The merged entry should combine tags from all contributing hits instead of taking only the first-seen result’s tags.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant