Add derived media tags for root memories by brianmeyer · Pull Request #17 · brianmeyer/recallforge

brianmeyer · 2026-03-22T18:40:45Z

Summary

- generate normalized root-memory image/video tags via the existing Qwen helper path during ingest
- persist tags on root embedding metadata and surface them through memory reads and search responses
- add regression coverage for tag storage, memory rollup, and API serialization

Verification

pytest -q tests/test_storage.py tests/test_search_pipeline.py tests/test_config_tools.py tests/test_json_compliance.py

Linear

REC-186

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7e4a18e798

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-22T18:47:20Z

+            for line in (line.strip() for line in text.splitlines() if line.strip()):
+                lowered = line.lower()
+                if lowered.startswith("tags:"):
+                    line = line.split(":", 1)[1]
+                if "," in line:
+                    candidates.extend(part.strip() for part in line.split(",") if part.strip())
+                else:
+                    candidates.append(line)


Strip code fences before splitting generated tags

If the model returns fenced JSON (for example ```json ... ```), the JSON parse path is skipped and the fallback line splitter treats fence tokens and bracket fragments as tags (e.g. json, ["cat"). This produces malformed tags that get persisted and later surfaced in search/memory APIs, degrading retrieval quality for media memories. The parser should remove markdown fences (or extract the JSON payload) before the line/comma fallback.
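
A minimal sketch of the suggested fix, not the PR's actual code. The helper name `parse_generated_tags`, the lowercase normalization, and the assumption that fence markers sit on their own lines are all hypothetical. It drops fence lines first, tries a JSON parse, and only then falls back to the line/comma splitter shown in the diff above.

```python
import json
import re

# Hypothetical helper: matches a line that is only a fence marker, e.g. ```json or ```.
_FENCE_RE = re.compile(r"^\s*```[\w-]*\s*$")


def parse_generated_tags(text: str) -> list[str]:
    """Parse model output into a de-duplicated, lowercased tag list."""
    # Remove fence lines so "json" or "```" can never become a tag.
    lines = [ln for ln in text.splitlines() if not _FENCE_RE.match(ln)]
    body = "\n".join(lines).strip()

    try:
        parsed = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        parsed = None

    candidates: list[str] = []
    if isinstance(parsed, list):
        candidates = [str(item) for item in parsed]
    elif isinstance(parsed, dict) and isinstance(parsed.get("tags"), list):
        candidates = [str(item) for item in parsed["tags"]]
    else:
        # Fallback: same line/comma splitting as the reviewed code.
        for line in (ln.strip() for ln in body.splitlines()):
            if not line:
                continue
            if line.lower().startswith("tags:"):
                line = line.split(":", 1)[1]
            candidates.extend(p.strip() for p in line.split(",") if p.strip())

    # Normalize and de-duplicate while preserving first-seen order.
    seen: set[str] = set()
    tags: list[str] = []
    for cand in candidates:
        tag = cand.strip().lower()
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags
```

With this shape, a fenced JSON array parses as JSON once the fence lines are gone, while plain `tags: a, b` output still goes through the fallback splitter.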


chatgpt-codex-connector · 2026-03-22T18:47:20Z

+        video_tags = (
+            self._generate_media_tags(embed_video_func or embed_image_func, video_body, "video")
+            if caption_media


Fall back to image generator when video generator is missing

Video tag generation currently passes embed_video_func or embed_image_func as a single candidate. When embed_video_func is present but does not implement generate_text (or is a plain callable wrapper), _generate_media_tags returns no tags and never tries embed_image_func, even if the image backend can generate tags. This silently drops derived tags for video roots in mixed-backend setups.
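
One possible way to express the fallback, sketched under assumptions: `first_text_generator` is a hypothetical helper, and the only detail taken from the review is that a usable backend exposes a callable `generate_text`. The signature of `generate_text` used in the stub below is invented for illustration.

```python
from typing import Any, Callable, Iterable, Optional


def first_text_generator(candidates: Iterable[Any]) -> Optional[Callable[..., str]]:
    """Return generate_text from the first candidate that actually provides it."""
    for candidate in candidates:
        # Plain callables and None simply have no generate_text attribute.
        generate = getattr(candidate, "generate_text", None)
        if callable(generate):
            return generate
    return None


class ImageBackend:
    """Stub standing in for an image backend that can generate text."""

    def generate_text(self, prompt: str) -> str:  # hypothetical signature
        return "cat, sofa"


def plain_video_embedder(data: bytes) -> list:
    """Stub for a video embedder that is a plain callable wrapper."""
    return [0.0]


# The video candidate cannot generate text, so the image backend is used instead.
generate = first_text_generator([plain_video_embedder, ImageBackend()])
```

Passing an ordered list of candidates, rather than `embed_video_func or embed_image_func`, lets the caller skip a present-but-incapable video backend instead of stopping at it.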


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b076eaf85b


chatgpt-codex-connector · 2026-03-22T18:54:03Z

            score=data['rrf_score'],
            source=','.join(str(i) for i in sorted(data['query_indices'])),
            query_scores=data['query_scores'],
+            tags=getattr(result, "tags", None),


Merge tags across duplicate batch hits

search_batch deduplicates on filepath but keeps a single stored result object, and the new tags field is copied only from that one object. When the same filepath is returned by multiple queries with different tag sets (for example, hybrid mode with memory rollup vs. fts/vec mode), later query tags are discarded, and because futures complete asynchronously this can make returned tags nondeterministic across runs. The merged entry should combine tags from all contributing hits instead of taking only the first-seen result’s tags.
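
A rough sketch of the merge the comment asks for, assuming tags are lists of strings or `None`. The names `merge_tags` and `merge_hits` are hypothetical and the `(filepath, tags)` tuples stand in for whatever result objects `search_batch` actually holds. Sorting the union makes the output independent of the order in which futures complete.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def merge_tags(existing: Optional[Iterable[str]],
               incoming: Optional[Iterable[str]]) -> Optional[List[str]]:
    """Union two optional tag collections into a sorted list (None if empty)."""
    merged = set(existing or ())
    merged.update(incoming or ())
    return sorted(merged) or None


def merge_hits(hits: Iterable[Tuple[str, Optional[List[str]]]]) -> Dict[str, Optional[List[str]]]:
    """Combine tags from every hit that shares a filepath."""
    by_path: Dict[str, Optional[List[str]]] = {}
    for filepath, tags in hits:
        by_path[filepath] = merge_tags(by_path.get(filepath), tags)
    return by_path


# Illustrative hits: the same file returned by two queries with different tag sets.
hits = [("a.jpg", ["cat"]), ("a.jpg", ["sofa", "cat"]), ("b.mp4", None)]
merged = merge_hits(hits)
merged_reversed = merge_hits(reversed(hits))
```

Because the union is sorted, `merged` and `merged_reversed` are equal, which is the determinism property the review is concerned about.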


Add derived media tags for memory roots

7e4a18e

chatgpt-codex-connector Bot reviewed Mar 22, 2026


Stabilize media tags and watch folder polling

b076eaf

brianmeyer merged commit 998d716 into master Mar 22, 2026
4 checks passed

brianmeyer deleted the codex/rec-186-media-tags branch March 22, 2026 18:53

chatgpt-codex-connector Bot reviewed Mar 22, 2026


brianmeyer merged 2 commits into master from codex/rec-186-media-tags
