Python: Bambriz context provider #5139

Open
bambriz wants to merge 21 commits into microsoft:main from bambriz:bambriz-context-provider

bambriz commented Apr 7, 2026

Motivation and Context

Adds an Azure Cosmos DB context provider to the agent-framework-azure-cosmos package, enabling RAG-style
context injection from Cosmos DB before model invocation and automatic write-back of request/response messages after
each run.

Description

New: CosmosContextProvider

  • Implements the ContextProvider protocol for Azure Cosmos DB
  • Supports three retrieval modes via CosmosContextSearchMode: VECTOR, FULL_TEXT, and HYBRID
  • Joins filtered user/assistant messages into a retrieval query string
  • Writes request/response messages back into the knowledge container after each run (after_run)
  • Configurable: top_k, scan_limit, partition_key, weights (for hybrid RRF), content_field_names
  • Supports environment-based settings resolution and flexible credential types
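
To illustrate the message-joining bullet above, here is a minimal sketch of how filtered user/assistant messages might be folded into one retrieval query string. The `Message` dataclass, the `build_retrieval_query` helper, the newline separator, and the role names are hypothetical stand-ins for illustration; they are not taken from the provider's code.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for the framework's message type."""

    role: str
    text: str


def build_retrieval_query(
    messages: list[Message],
    roles: frozenset[str] = frozenset({"user", "assistant"}),
) -> str:
    """Keep non-empty messages from the allowed roles and join them with newlines."""
    parts = [m.text.strip() for m in messages if m.role in roles and m.text.strip()]
    return "\n".join(parts)
```

Messages from other roles (for example system prompts) and empty messages are dropped before joining, so they do not influence retrieval.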

Samples (python/samples/02-agents/context_providers/azure_cosmos/)

  • cosmos_context_basics.py — Vector search context provider usage
  • cosmos_context_fulltext.py — Full-text retrieval mode
  • cosmos_context_hybrid.py — Hybrid retrieval with RRF weights

Tests

  • test_cosmos_context_provider.py — Unit tests covering all search modes, message filtering, write-back, and
    error handling (95% coverage)

Other changes

  • Updated AGENTS.md with context provider documentation and usage examples
  • Updated package README.md with detailed context provider API docs

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines and coding standard
  • All unit tests pass (34/34), and I have added new tests where possible
  • Is this a breaking change? No

moonbox3 added the documentation and python labels on Apr 7, 2026
github-actions bot changed the title from "Bambriz context provider" to "Python: Bambriz context provider" on Apr 7, 2026
Seven outdated comment threads on python/packages/azure-cosmos/agent_framework_azure_cosmos/_context_provider.py

moonbox3 commented Apr 15, 2026

Python Test Coverage

Python Test Coverage Report

File                                                 Stmts   Miss  Cover  Missing
packages/azure-cosmos/agent_framework_azure_cosmos
   _context_provider.py                                208     17    91%  178, 254, 286, 303, 311–315, 324, 364, 399, 405, 408–411
TOTAL                                                34327   3924    88%

Python Unit Test Overview

Tests  Skipped  Failures  Errors  Time
6827   30       0         0       1m 52s

Copilot AI review requested due to automatic review settings April 21, 2026 21:04

Copilot AI left a comment

Pull request overview

Adds a new Azure Cosmos DB-backed context provider to the Python agent-framework-azure-cosmos package, enabling retrieval-time context injection and post-run message writeback to Cosmos DB.

Changes:

  • Introduces CosmosContextProvider + CosmosContextSearchMode with full-text/vector/hybrid query construction and writeback behavior.
  • Exposes the new provider from the package public API and updates package metadata/docs.
  • Adds a comprehensive new test suite for the context provider.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Summary per file (file: description)

  • python/packages/azure-cosmos/agent_framework_azure_cosmos/_context_provider.py: New Cosmos DB context provider implementation (retrieval + writeback).
  • python/packages/azure-cosmos/agent_framework_azure_cosmos/__init__.py: Exports the new provider/search mode in the public API.
  • python/packages/azure-cosmos/tests/test_cosmos_context_provider.py: New tests covering retrieval modes, settings resolution, writeback, and lifecycle.
  • python/packages/azure-cosmos/README.md: Documents context provider usage/configuration and updates wording.
  • python/packages/azure-cosmos/AGENTS.md: Updates package overview and usage snippets to include the context provider.
  • python/packages/azure-cosmos/pyproject.toml: Updates package description to include context provider support.

Two comment threads on python/packages/azure-cosmos/README.md
bambriz marked this pull request as ready for review on May 6, 2026 21:30
bambriz and others added 4 commits May 14, 2026 09:05
- Streamline _context_provider.py implementation
- Reduce test complexity in test_cosmos_context_provider.py
- Remove deprecated cosmos_context_shared.py sample
- Update AGENTS.md and README.md documentation
- Update uv.lock

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

bambriz commented May 18, 2026

@moonbox3 could we get a review on this PR?

moonbox3 left a comment

Please also have a look at the failing CI/CD checks.

    user_id = context.metadata.get("user_id") if context.metadata else None

    base_sort_key = time.time_ns()
    for index, message in enumerate(writeback):

Should we be doing the writeback one upsert at a time? If message N raises (429, transient network, throttling), messages 0..N-1 are already committed and the exception propagates out of after_run. Next turn, before_run will retrieve half-written exchanges (user without assistant, or vice versa) and quietly poison the RAG context. Neither the mem0 (single add()) nor redis (single load()) sibling has this shape. Wondering if we want a try/except around upsert_item that logs and continues, or batch the documents so it is all-or-nothing per turn.
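
One way to read the "try/except around upsert_item that logs and continues" option is sketched below. This is a minimal illustration under stated assumptions, not the provider's implementation: `SupportsUpsert` and `write_back` are hypothetical names, and the only thing assumed about the container is an async `upsert_item(body=...)` method.

```python
import logging
from typing import Any, Protocol

logger = logging.getLogger(__name__)


class SupportsUpsert(Protocol):
    """Hypothetical minimal interface: anything with an async upsert_item(body=...)."""

    async def upsert_item(self, body: dict[str, Any]) -> Any: ...


async def write_back(
    container: SupportsUpsert, documents: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Upsert each document; log failures and keep going.

    Returns the documents that could not be written so the caller can
    decide whether to retry them or clean up the partial exchange.
    """
    failed: list[dict[str, Any]] = []
    for doc in documents:
        try:
            await container.upsert_item(body=doc)
        except Exception:
            logger.warning(
                "Writeback failed for document id=%s", doc.get("id"), exc_info=True
            )
            failed.append(doc)
    return failed
```

Note that log-and-continue only keeps the exception from escaping after_run; it does not by itself prevent half-written exchanges. The all-or-nothing alternative would need the turn's documents to be committed together, which in Cosmos DB is only possible when they share one logical partition.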

    if self.message_field_name and self.message_field_name not in fields:
        fields.append(self.message_field_name)
    select = ", ".join(f"c.{f}" for f in fields)
    base = f"SELECT TOP {self.scan_limit} {select} FROM c"  # noqa: S608  # nosec B608

The # nosec suppresses bandit, but select is built from content_field_names / message_field_name (line 332-335) and vector_field_name (line 346, 351) is also interpolated into the ORDER BY, all constructor strings with no validation. Cosmos NoSQL cannot parameterize field names, so the only defense is constructor-time validation. In a multi-tenant agent host where field names come from request/tenant config, a value like embedding}, @x) OFFSET 0 LIMIT 1; -- injects. Have we thought about a ^[A-Za-z_][A-Za-z0-9_]*$ check at __init__ for every user-supplied field name?

Suggested change
    - base = f"SELECT TOP {self.scan_limit} {select} FROM c"  # noqa: S608  # nosec B608
    + base = f"SELECT TOP {self.scan_limit} {select} FROM c"  # noqa: S608  # nosec B608 - field names validated in __init__

(plus the validation in __init__ for content_field_names, message_field_name, vector_field_name)
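
A minimal sketch of what that constructor-time check could look like, using the pattern proposed in the comment. The helper name `validate_field_name` and its `parameter` argument are hypothetical; `fullmatch` is used without anchors, which accepts the same identifiers as the anchored pattern but also rejects a trailing newline.

```python
import re

# Same character classes as the pattern suggested in the review comment.
_FIELD_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_field_name(name: str, *, parameter: str) -> str:
    """Return name unchanged if it is a simple identifier, else raise ValueError."""
    if not _FIELD_NAME.fullmatch(name):
        raise ValueError(f"{parameter} must be a simple identifier, got {name!r}")
    return name
```

In `__init__` this would be applied to each of content_field_names, message_field_name, and vector_field_name before they are stored. One trade-off: a pattern this strict also rejects nested property paths such as `metadata.text`, so it would need widening if those are meant to be supported.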

    self.vector_field_name = vector_field_name
    self.embedding_function = embedding_function
    self.partition_key = partition_key
    self.weights = tuple(float(w) for w in weights) if weights is not None else None

Cosmos RRF() requires weights in [0, 1]. No bounds check here means a misconfigured weights=[2.0, 1.0] (which is what test_hybrid_weights_passed_through actually uses) crashes mid-before_run against a real endpoint with an opaque server error inside an agent turn rather than a clear ValueError at construction. Worth validating at init?

Suggested change
    - self.weights = tuple(float(w) for w in weights) if weights is not None else None
    + if weights is not None:
    +     for w in weights:
    +         if not 0.0 <= float(w) <= 1.0:
    +             raise ValueError(f"RRF weight {w!r} must be in [0, 1]")
    +     self.weights = tuple(float(w) for w in weights)
    + else:
    +     self.weights = None


    container = await self._get_container()
    properties = await container.read()
    paths: list[str] = properties.get("partitionKey", {}).get("paths", [])  # type: ignore[assignment]

What happens on a container with a hash/synthetic partition key, or any response shape where paths is empty? paths[0] raises IndexError and crashes after_run with no useful message. Should we guard with an explicit error?

Suggested change
    - paths: list[str] = properties.get("partitionKey", {}).get("paths", [])  # type: ignore[assignment]
    + paths: list[str] = properties.get("partitionKey", {}).get("paths", [])  # type: ignore[assignment]
    + if not paths:
    +     raise ValueError(
    +         f"Could not determine partition key path for container '{self.container_name}'."
    +     )
    + field = paths[0].lstrip("/")
    + self._partition_key_field = field

        exc_tb: Any,
    ) -> None:
        try:
            await self.close()

If the async with block already raised and close() also raises, the close error is silently dropped. That can mask real client/socket leak signals from azure-cosmos. Other Azure clients in the repo at least log here. Should this except Exception log with exc_info=True before swallowing?

Suggested change
    - await self.close()
    + try:
    +     await self.close()
    + except Exception:
    +     if exc_type is None:
    +         raise
    +     logger.warning("Error closing Cosmos client during exception handling.", exc_info=True)
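
The behavior this suggestion aims for can be exercised in isolation with a small stand-in class. This is a sketch under stated assumptions: `ClosingResource` and its `fail_on_close` flag are hypothetical and exist only to make close() fail on demand; they are not part of the provider.

```python
from __future__ import annotations

import logging
from types import TracebackType

logger = logging.getLogger(__name__)


class ClosingResource:
    """Hypothetical stand-in for the provider; close() can be made to fail."""

    def __init__(self, fail_on_close: bool = False) -> None:
        self.fail_on_close = fail_on_close

    async def close(self) -> None:
        if self.fail_on_close:
            raise RuntimeError("close failed")

    async def __aenter__(self) -> ClosingResource:
        return self

    async def __aexit__(
        self,
        exc_type: type[BaseException] | None,
        exc: BaseException | None,
        exc_tb: TracebackType | None,
    ) -> None:
        try:
            await self.close()
        except Exception:
            if exc_type is None:
                # The block itself succeeded, so the close error is the real failure.
                raise
            # The block already raised: log the close error instead of dropping it,
            # then let the original exception propagate (we return None).
            logger.warning("Error closing resource during exception handling.", exc_info=True)
```

With this shape, a failure inside the `async with` body still surfaces as the original exception, while the close error is recorded in the log rather than silently discarded.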
