Add derived media tags for root memories#17
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7e4a18e798
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| for line in (line.strip() for line in text.splitlines() if line.strip()): | ||
| lowered = line.lower() | ||
| if lowered.startswith("tags:"): | ||
| line = line.split(":", 1)[1] | ||
| if "," in line: | ||
| candidates.extend(part.strip() for part in line.split(",") if part.strip()) | ||
| else: | ||
| candidates.append(line) |
There was a problem hiding this comment.
Strip code fences before splitting generated tags
If the model returns fenced JSON (for example ```json ... ```), the JSON parse path is skipped and the fallback line splitter treats fence tokens and bracket fragments as tags (e.g. json, ["cat"). This produces malformed tags that get persisted and later surfaced in search/memory APIs, degrading retrieval quality for media memories. The parser should remove markdown fences (or extract the JSON payload) before the line/comma fallback.
Useful? React with 👍 / 👎.
| video_tags = ( | ||
| self._generate_media_tags(embed_video_func or embed_image_func, video_body, "video") | ||
| if caption_media |
There was a problem hiding this comment.
Fall back to image generator when video generator is missing
Video tag generation currently passes embed_video_func or embed_image_func as a single candidate. When embed_video_func is present but does not implement generate_text (or is a plain callable wrapper), _generate_media_tags returns no tags and never tries embed_image_func, even if the image backend can generate tags. This silently drops derived tags for video roots in mixed-backend setups.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b076eaf85b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| score=data['rrf_score'], | ||
| source=','.join(str(i) for i in sorted(data['query_indices'])), | ||
| query_scores=data['query_scores'], | ||
| tags=getattr(result, "tags", None), |
There was a problem hiding this comment.
Merge tags across duplicate batch hits
search_batch deduplicates on filepath but keeps a single stored result object, and the new tags field is copied only from that one object. When the same filepath is returned by multiple queries with different tag sets (for example, hybrid mode with memory rollup vs. fts/vec mode), later query tags are discarded, and because futures complete asynchronously this can make returned tags nondeterministic across runs. The merged entry should combine tags from all contributing hits instead of taking only the first-seen result’s tags.
Useful? React with 👍 / 👎.
Summary
Verification
Linear