[Cosmos] Use ContainerReference in Cosmos Client SDK, initial driver dependency for caches#4005
Conversation
analogrelay
left a comment
Good start. Just one comment about wrapping/adapting/reusing driver types.
Pull request overview
Introduces the Cosmos driver as an internal SDK dependency and switches ContainerClient construction to eagerly resolve immutable container metadata via a new SDK ContainerReference, reducing per-operation cache lookups and aligning cache keys to use the container RID. Also updates call sites across tests/examples/native/perf to await the new fallible DatabaseClient::container_client() API and aligns several crate dependencies to workspace versions.
Changes:
- Add SDK-side `ContainerReference` and use the driver to eagerly resolve container RID + partition key definition at `ContainerClient` construction time.
- Make `DatabaseClient::container_client()` return `azure_core::Result<ContainerClient>` and update call sites to `.await?`.
- Align Cosmos crate dependencies to workspace (`azure_core`, `azure_identity`) and add `azure_data_cosmos_driver` as a workspace dependency.
Reviewed changes
Copilot reviewed 33 out of 34 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| sdk/cosmos/azure_data_cosmos_perf/src/setup.rs | Updates container client construction to be fallible/awaited (now interacts with eager metadata resolution). |
| sdk/cosmos/azure_data_cosmos_perf/src/main.rs | Updates perf tool call sites to container_client().await? and adjusts formatting of chained calls. |
| sdk/cosmos/azure_data_cosmos_perf/Cargo.toml | Switches azure_core/azure_identity to workspace dependencies. |
| sdk/cosmos/azure_data_cosmos_native/src/clients/database_client.rs | Updates FFI wrapper call sites to container_client().await?. |
| sdk/cosmos/azure_data_cosmos_native/Cargo.toml | Switches azure_core to workspace dependency. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/sharded_transport.rs | Simplifies tracing imports (now unconditional debug + trace). |
| sdk/cosmos/azure_data_cosmos/tests/multi_write_tests/cosmos_multi_write_retry_policies.rs | Updates test call sites to container_client().await?. |
| sdk/cosmos/azure_data_cosmos/tests/multi_write_tests/cosmos_multi_write_fault_injection.rs | Updates test call sites to container_client().await?. |
| sdk/cosmos/azure_data_cosmos/tests/framework/test_data.rs | Updates test helper to use fallible container_client().await?. |
| sdk/cosmos/azure_data_cosmos/tests/framework/test_client.rs | Updates framework helpers to use fallible container_client().await?. |
| sdk/cosmos/azure_data_cosmos/tests/emulator_tests/cosmos_items.rs | Updates emulator test helper to container_client().await?. |
| sdk/cosmos/azure_data_cosmos/tests/emulator_tests/cosmos_fault_injection.rs | Updates emulator tests to container_client().await? (and refactors one call for formatting). |
| sdk/cosmos/azure_data_cosmos/tests/emulator_tests/cosmos_containers.rs | Updates emulator test to container_client().await?. |
| sdk/cosmos/azure_data_cosmos/tests/emulator_tests/cosmos_batch.rs | Updates emulator test helper to container_client().await?. |
| sdk/cosmos/azure_data_cosmos/src/models/mod.rs | Wires in the new container_reference module and re-export. |
| sdk/cosmos/azure_data_cosmos/src/models/container_reference.rs | Adds SDK ContainerReference with driver adaptation + accessors + unit test. |
| sdk/cosmos/azure_data_cosmos/src/handler/container_connection.rs | Stores ContainerReference and uses its RID/PK definition for routing decisions in send(). |
| sdk/cosmos/azure_data_cosmos/src/cosmos_request.rs | Removes unused container-id extraction accessor from CosmosRequest. |
| sdk/cosmos/azure_data_cosmos/src/clients/database_client.rs | Makes container_client() fallible and passes driver + db id for eager metadata resolution. |
| sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs | Builds a driver runtime/driver during SDK client build and disables prior builder unit tests. |
| sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client.rs | Adds driver field and passes it into DatabaseClient. |
| sdk/cosmos/azure_data_cosmos/src/clients/container_client.rs | new() becomes fallible and eagerly resolves container metadata via driver resolve_container(). |
| sdk/cosmos/azure_data_cosmos/examples/cosmos/upsert.rs | Updates example to container_client().await?. |
| sdk/cosmos/azure_data_cosmos/examples/cosmos/replace.rs | Updates example to container_client().await?. |
| sdk/cosmos/azure_data_cosmos/examples/cosmos/read.rs | Updates example to container_client().await?. |
| sdk/cosmos/azure_data_cosmos/examples/cosmos/query.rs | Updates example to container_client().await?. |
| sdk/cosmos/azure_data_cosmos/examples/cosmos/metadata.rs | Updates example to container_client().await?. |
| sdk/cosmos/azure_data_cosmos/examples/cosmos/delete.rs | Updates example to container_client().await?. |
| sdk/cosmos/azure_data_cosmos/examples/cosmos/create.rs | Updates example to container_client().await?. |
| sdk/cosmos/azure_data_cosmos/examples/cosmos/batch.rs | Updates example to container_client().await?. |
| sdk/cosmos/azure_data_cosmos/Cargo.toml | Adds driver dependency; switches azure_core/azure_identity to workspace; forwards driver TLS feature through SDK features. |
| sdk/cosmos/azure_data_cosmos/CHANGELOG.md | Documents the breaking change to DatabaseClient::container_client() returning a Result. |
| Cargo.toml | Adds azure_data_cosmos_driver to [workspace.dependencies]. |
| Cargo.lock | Updates resolved dependency graph to use workspace azure_core/azure_identity and include the driver. |
Comments suppressed due to low confidence (2)
sdk/cosmos/azure_data_cosmos_perf/src/setup.rs:33
`DatabaseClient::container_client()` now eagerly resolves container metadata and will fail if the container doesn’t exist yet. Calling it at the start of `ensure_container()` means the function can no longer create a missing container (the failure happens before the `read()`/404 handling). Restructure this flow to create the container first (handling 409), then poll by attempting `container_client(...).await` (and/or `read()`) until it succeeds.
sdk/cosmos/azure_data_cosmos_perf/src/main.rs:101
`container_client` is created before `ensure_database`/`ensure_container`, but `DatabaseClient::container_client()` now performs a metadata lookup and will fail if the database/container aren’t present yet. Move container client construction to after the database and container have been ensured (or have `ensure_container` return a ready `ContainerClient`).
sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/sharded_transport.rs (comment outdated, resolved)
sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs (comment outdated, resolved)
analogrelay
left a comment
I'd like us to resolve the question of whether we wrap, copy, or re-export ContainerReference. Re-exporting is my preference here unless there's a strong motivation not to; wrapping would be my second preference.
sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs (comment outdated, resolved)
FabianMeiswinkel
left a comment
+1 on re-exporting the driver public API as the default. It allows differentiation later if needed, but for a large percentage of APIs it won't change anything, and it's easier to understand.
API Change Check: APIView identified API-level changes in this PR and created the following API reviews.
98d01c8 into release/azure_data_cosmos-previews
Fixes two issues in `sdk/cosmos/azure_data_cosmos/src/routing/partition_key_range_cache.rs`:

## Changes Made

### Primary fix — correct pkranges resource link URL construction

In `get_routing_map_for_collection()`, changed `.item(collection_rid)` to `.item_by_rid(collection_rid)`. The `.item()` method URL-encodes the value via `LinkSegment::new()`, so RIDs like `pLLZAIuPigw=` were being encoded to `pLLZAIuPigw%3D` in the URL, causing Cosmos DB to return 404 on every partition key range fetch. The `.item_by_rid()` method uses `LinkSegment::identity()` (no encoding), producing the correct URL:

```
dbs/perfdb/colls/pLLZAIuPigw=/pkranges   ← correct
```

### Error logging in try_lookup

Added `tracing::warn!` before the `.ok()` call in `try_lookup()` that was silently swallowing errors. Routing map fetch failures now emit a warning with the `collection_rid` and error details, making silent failures visible in diagnostics.

### Unit tests

Added three unit tests in `partition_key_range_cache.rs` that directly verify the URL construction behavior:

- `pkranges_link_rid_with_equals_is_not_encoded` — verifies `item_by_rid()` preserves `=` in the RID path (e.g., `pLLZAIuPigw=` → `dbs/perfdb/colls/pLLZAIuPigw=/pkranges`)
- `pkranges_link_item_encodes_equals_incorrectly` — documents the bug: `item()` encodes `=` to `%3D`, producing a path that causes 404s
- `pkranges_link_rid_with_plus_is_not_encoded` — verifies `item_by_rid()` also preserves `+` and `/` in base64 RIDs

## Root Cause

PR #4005 changed the `pk_range_cache` key from container name to collection RID. The URL construction code was not updated to use `.item_by_rid()`, causing RID URL-encoding and subsequent 404s on every pkranges fetch. Because errors were silently swallowed and `AsyncCache` does not cache errors, this failed on every single request, resulting in write lock contention and loss of client-side partition key routing (~7% throughput regression observed in benchmarks).
<details>
<summary>Original prompt</summary>

*This section details the original issue to resolve.*

<issue_title>[Cosmos] pk_range_cache uses .item() instead of .item_by_rid() causing silent 404s on every request</issue_title>

<issue_description>

## Bug Report

### Summary

PR #4005 changed the `pk_range_cache` key from container **name** to collection **RID**. However, the code that constructs the URL for fetching partition key ranges still uses `.item(collection_rid)` instead of `.item_by_rid(collection_rid)`. This causes the RID to be URL-encoded (e.g., `=` → `%3D`), resulting in a **404 from Cosmos DB** on every partition key range fetch attempt. Because `try_lookup` silently swallows errors via `Ok(routing_map.ok())` and `AsyncCache` does not cache errors, this failure repeats on **every single request**, causing:

1. **1.6M extra 404 requests/hour** observed on a benchmark account after deploying the change
2. **Write lock contention** in `AsyncCache` as every concurrent operation serializes through the failed fetch path
3. **Loss of client-side partition key routing** — the gateway must route all requests instead
4. **~7% throughput regression** observed in continuous benchmarks (110M → 102M requests/hour)

### Root Cause (3-step chain)

#### Step 1: Wrong URL encoding — `.item()` vs `.item_by_rid()`

In `partition_key_range_cache.rs`, `get_routing_map_for_collection()`:

```rust
let pk_range_link = self
    .database_link // dbs/perfdb
    .feed(ResourceType::Containers)
    .item(collection_rid) // ← BUG: .item() URL-encodes the RID
    .feed(ResourceType::PartitionKeyRanges);
```

`.item()` calls `LinkSegment::new()` which URL-encodes the value. RIDs like `pLLZAIuPigw=` get the `=` encoded to `%3D`:

```
dbs/perfdb/colls/pLLZAIuPigw%3D/pkranges   ← Cosmos DB returns 404
```

Should use `.item_by_rid()` which calls `LinkSegment::identity()` (no encoding):

```
dbs/perfdb/colls/pLLZAIuPigw=/pkranges   ← correct
```

#### Step 2: Error silently swallowed

In `partition_key_range_cache.rs` line 147, `try_lookup()`:

```rust
Ok(routing_map.ok()) // Converts Err(404) → Ok(None), invisible to caller
```

The caller in `container_connection.rs` sees `Ok(None)` and skips the routing block entirely:

```rust
let routing_map = self.pk_range_cache.try_lookup(collection_rid, None).await?;
if let Some(routing_map) = routing_map {
    // SKIPPED — no client-side partition key range resolution
}
```

#### Step 3: Errors not cached → retried on every request

`AsyncCache::get()` only caches successful values. When `compute()` returns `Err`, the error propagates and the cache remains empty. Every subsequent request:

1. Read lock → cache miss
2. Acquire **write lock** (serializes all concurrent operations on the same key)
3. HTTP request to Cosmos DB → **404**
4. Error propagated, cache stays empty
5. Error swallowed as `Ok(None)`
6. Routing bypassed

### Evidence from Benchmarks

Continuous benchmark on `cosmos-perf-rg` (4 pods, concurrency=100, 400K RU/s):

| Hour (UTC) | 404 Count | Notes |
|---|---|---|
| 13:02 – 17:02 | 3,500 – 6,300 | Normal background |
| **18:02** | **1,645,604** | After deploying commit `98d01c8` |

Throughput dropped from ~110M req/hr to ~102M req/hr (~7% regression). Server-side latency actually decreased (fewer effective requests reaching the service), confirming the bottleneck is client-side.

### Suggested Fix

```diff
// partition_key_range_cache.rs, get_routing_map_for_collection()
let pk_range_link = self
    .database_link
    .feed(ResourceType::Containers)
-   .item(collection_rid)
+   .item_by_rid(collection_rid)
    .feed(ResourceType::PartitionKeyRanges);
```

Additionally, consider:

- Logging errors in `try_lookup` before swallowing them, to make silent failures visible
- Adding negative caching (or a backoff) in `AsyncCache` to avoid retrying failed fetches on every request

### Affected Version

Commit `98d01c8` on the `release/azure_data_cosmos-previews` branch (PR #4005).

</issue_description>

<agent_instructions>
@copilot fix this issue. The target branch is release/azure_data_cosmos-previews. Required changes:

1. Primary fix — In sdk/cosmos/azure_data_cosmos/src/routing/partition_key_range_cache.rs, in get_routing_map_for_collection(), change .item(collection_rid) to .item_by_rid(collection_rid) so the collection RID is not URL-encoded when constructing the pkranges resource link.
2. Add error logging in try_lookup — Before the .ok() on the last line of try_lookup() in the same file, add a tracing::warn! that logs when the routing map fetch fails, including the collection_rid and the error. This ensures silent failures are visible in diagnostics. Example:

   let routing_map = self.routing_map_cache.get(/* ... */).await;
   if let Err(ref e) = routing_map {
       tracing::warn!(
           collection_rid,
           ...
</agent_instructions>

</details>

- Fixes #4031

---

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tvaron3 <70857381+tvaron3@users.noreply.github.com>
…e key (#4047)

## Summary

Fixes #4031 — the pkranges fetch was using the collection **RID** in a name-based URL hierarchy (`dbs/perfdb/colls/<RID>/pkranges`), which Cosmos DB rejects with 404 because mixed name/RID addressing is not supported. The previous fix (#4041) corrected URL encoding (`.item()` → `.item_by_rid()`) but did not fix the fundamental mixed-addressing issue. This PR resolves it by passing the container **name** for URL construction while keeping the **RID** as the cache key and request context value.

## Root Cause

PR #4005 changed `container_connection.rs::send()` to pass `self.container_ref.rid()` to `pk_range_cache.try_lookup()`. The cache used this RID to build the pkranges URL:

```
dbs/perfdb/colls/pLLZAIuPigw=/pkranges
    ^^^^^^       ^^^^^^^^^^^^
    NAME         RID            ← mixed addressing → 404
```

All other SDK and driver operations use name-based URLs. The pkranges fetch was the only code path using a RID in a name-based link hierarchy.

## Impact (observed on continuous benchmarks)

- **1.8M 404 requests/hour** from failed pkranges fetches
- Errors silently swallowed by `try_lookup` → `Ok(routing_map.ok())`
- Errors not cached → retried on every request (write lock contention on AsyncCache)
- Loss of client-side partition key routing → gateway must route all requests
- Throughput regression from ~110M to ~4.2M requests/hour

## Changes

### `partition_key_range_cache.rs`

- Added `collection_name: &str` parameter to `try_lookup`, `get_routing_map_for_collection`, `resolve_partition_key_range_by_id`, and `resolve_overlapping_ranges`
- Changed `.item_by_rid(collection_rid)` → `.item(collection_name)` for pkranges URL construction
- Cache key remains the RID (`collection_rid.to_string()`)
- `resource_id` on the request remains the RID
- Updated `tracing::warn!` to include both `collection_name` and `collection_rid`
- Replaced 3 RID-encoding unit tests with 2 tests verifying name-based URL construction

### `container_connection.rs`

- Extracted `collection_name` from `self.container_ref.name()` alongside existing `collection_rid`
- Passes `collection_name` to all `pk_range_cache` method calls
- `resolved_collection_rid` on request context still uses the RID (unchanged)

### `cosmos_fault_injection.rs`

- Added `fault_injection_pkrange_readfeed_is_exercised` integration test
- Injects a transient error on `MetadataPartitionKeyRanges` ReadFeed with hit_limit=1
- Verifies the fault rule is hit (proving the pkrange fetch code path executes)
- Verifies subsequent item operations succeed (proving end-to-end pkrange resolution works)

## Test Results

- ✅ 31 unit tests pass (including 2 new)
- ✅ Build succeeds
- Integration test requires emulator (will run in CI)

---

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This PR contains the work needed to port an initial method (`read_item`) over to use the underlying driver as the connection. For now, I am sharing a spec of the proposed changes, in the hope that this same spec will work to migrate all other remaining methods after we verify this one works. The spec file can be found in the PR to facilitate review, but it is also reproduced in the description below. Actual code implementation to follow.

# SDK-to-Driver Cutover: Design Specification

## Overview

This document describes the design for routing `azure_data_cosmos` SDK operations through the `azure_data_cosmos_driver` execution engine, replacing the legacy gateway pipeline path. The first operation cut over is `ContainerClient::read_item`, which serves as the **reference pattern** for all subsequent operations.

### Context

Prior to this work, the Cosmos SDK had two separate execution paths:

- **Gateway pipeline** (`azure_data_cosmos`): The SDK handled auth, routing, retries, and request construction via `CosmosRequest` → `GatewayPipeline` → HTTP.
- **Driver** (`azure_data_cosmos_driver`): A newer execution engine with its own transport, routing, and operation model (`CosmosOperation` + `OperationOptions`). Previously used only in driver-level tests.

PR #4005 bridged the two worlds by having `ContainerClient::new()` call `driver.resolve_container()` for eager metadata resolution. This PR takes the next step: routing the first data operation through the driver.

### Goal

Make the SDK client a **thin wrapper** over the driver. The SDK translates public-facing types into driver concepts, delegates execution, and translates the response back. All real work (auth, routing, retries, transport) happens inside `driver.execute_operation()`.
## Architecture

### Data Flow

```text
User calls: container_client.read_item(pk, id, options)
                        │
             ┌──────────▼───────────┐
             │ SDK ContainerClient  │
             └──────────┬───────────┘
                        │
    ┌───────────────────┼───────────────────┐
    │                   │                   │
PartitionKey        ItemOptions        ContainerRef
(SDK type)          (SDK type)         (driver type,
    │                   │              stored on client)
    │                   │                   │
    ▼                   ▼                   ▼
into_driver_pk()  item_options_to_    ItemReference::
    │             operation_options()   from_name()
    │                   │                   │
    └───────────────────┼───────────────────┘
                        │
             ┌──────────▼──────────┐
             │  CosmosOperation::  │
             │     read_item()     │
             └──────────┬──────────┘
                        │
             ┌──────────▼───────────┐
             │  driver.execute_     │
             │  operation(op, opts) │
             │                      │
             │  (auth, routing,     │
             │   retries, HTTP)     │
             └──────────┬───────────┘
                        │
             ┌──────────▼──────────┐
             │  driver_response_   │
             │  to_cosmos_response │
             └──────────┬──────────┘
                        │
             ┌──────────▼──────────┐
             │  CosmosResponse<T>  │
             │  (SDK public type)  │
             └─────────────────────┘
```

### Key Principle

The SDK's public API does not change. `read_item` retains the same signature, return type, and observable behavior. This is a pure internal refactor.

## Design Decision: Driver as Required Infrastructure

An alternative approach was explored where the driver is **optional** — stored as `Option<Arc<CosmosDriver>>` on `CosmosClient`, `DatabaseClient`, and `ContainerClient`. In that model, each operation checks at runtime whether a driver is available: if so, it takes the driver path; otherwise, it falls back to the legacy gateway pipeline. Container metadata resolution is also optional, and its failure is silently ignored.

We chose **not** to take that approach: we want to verify the behavior with only the driver in use, and this single method will serve as the test. In this design, the driver is **required**:

- `CosmosClient` stores `Arc<CosmosDriver>` (not `Option`).
- `ContainerClient::new()` eagerly resolves container metadata via the driver and returns `Result` — if resolution fails, the client cannot be created.
- Operations have a **single codepath** through the driver, with no gateway fallback.

### Rationale

The purpose of this cutover is to validate that the driver can fully replace the gateway pipeline for each operation. A fallback path undermines that goal:

- **Testability:** If the driver path can silently fall back to the gateway, we can't be 100% sure that the driver path is exercised in production or tests. Failures would be hidden rather than surfaced.
- **Correctness:** A dual-codepath design requires maintaining behavioral parity between two implementations indefinitely. A single path is easier to reason about, test, and debug.
- **Options fidelity:** A fallback path tempts skipping the options translation (e.g., passing empty `OperationOptions` on the driver path), which silently drops user-configured session tokens, etags, and excluded regions.
- **Response fidelity:** A minimal fallback implementation may skip reconstructing response headers from the driver's typed response, causing callers to get `None` for `request_charge()`, `session_token()`, and `etag()`.

The cutover is intentionally incremental — one operation at a time. Operations that haven't been cut over yet continue using the gateway pipeline naturally (they don't call the driver). This gives us the gradual-rollout benefit without the complexity of runtime branching within a single operation.

## Type Translation Decisions

### PartitionKey (SDK → Driver)

The SDK and driver define **separate `PartitionKey` types** with identical structure but in different crates. Both represent a JSON array of typed values (string, number, bool, null).

**Approach:** Added `into_driver_partition_key()` on the SDK's `PartitionKey` that maps each `InnerPartitionKeyValue` variant to the driver's `PartitionKeyValue`.

**Driver change required:** Made `PartitionKeyValue` `pub` (was `pub(crate)`) so the SDK crate can construct `Vec<PartitionKeyValue>` for the conversion.
**Future consideration:** Once Ashley's options alignment work unifies these types, this conversion can be eliminated, and we can just use the driver's definitions the way we did with the `ContainerReference`.

```rust
// SDK partition_key.rs
pub(crate) fn into_driver_partition_key(self) -> driver::PartitionKey {
    let driver_values: Vec<DriverPKV> = self.0.into_iter()
        .map(|v| match v.0 {
            InnerPartitionKeyValue::String(s) => DriverPKV::from(s),
            InnerPartitionKeyValue::Number(n) => DriverPKV::from(n),
            InnerPartitionKeyValue::Bool(b) => DriverPKV::from(b),
            InnerPartitionKeyValue::Null => DriverPKV::from(Option::<String>::None),
            // ...
        })
        .collect();
    DriverPK::from(driver_values)
}
```

### ItemOptions → OperationOptions

The SDK's `ItemOptions` (item-scoped request options) maps to the driver's `OperationOptions` field by field. The types in each field differ between crates, so values are bridged via their string representations.

| SDK `ItemOptions` field | Driver `OperationOptions` | Conversion |
| --- | --- | --- |
| `session_token: Option<SessionToken>` | `.with_session_token()` | `DriverSessionToken::new(token.to_string())` |
| `if_match_etag: Option<Etag>` | `.with_etag_condition()` | `Precondition::if_match(ETag::new(etag.to_string()))` |
| `custom_headers: HashMap<...>` | `.with_custom_headers()` | Passed through directly (types are the same) |
| `excluded_regions: Option<Vec<RegionName>>` | `.with_excluded_regions()` | `Region::new(name.to_string())` for each |
| `content_response_on_write_enabled: bool` | *Ignored for reads* | Driver always returns body for point reads |

**Driver change required:** Added `custom_headers` support to `OperationOptions` (new field, setter, getter) and wired it into `build_transport_request` in `operation_pipeline.rs`. Custom headers may be removed in the future as we analyze which options are truly needed.
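The field-by-field mapping above can be sketched with stand-in types. The real `ItemOptions`, `OperationOptions`, and the bridged wrapper types live in the two crates; every name and shape below is a simplified assumption for illustration, not the actual SDK or driver API:

```rust
// Illustrative stand-in for the SDK's item-scoped request options.
#[derive(Default)]
struct ItemOptions {
    session_token: Option<String>,
    if_match_etag: Option<String>,
    excluded_regions: Option<Vec<String>>,
}

// Illustrative stand-in for the driver's per-operation options.
#[derive(Default, Debug, PartialEq)]
struct OperationOptions {
    session_token: Option<String>,
    if_match_etag: Option<String>,
    excluded_regions: Vec<String>,
}

// Bridge each SDK field into the driver's options, going through the
// string representation as the spec describes for the real types.
fn item_options_to_operation_options(opts: &ItemOptions) -> OperationOptions {
    OperationOptions {
        session_token: opts.session_token.as_ref().map(|t| t.to_string()),
        if_match_etag: opts.if_match_etag.as_ref().map(|e| e.to_string()),
        excluded_regions: opts
            .excluded_regions
            .as_ref()
            .map(|regions| regions.iter().map(|r| r.to_string()).collect())
            .unwrap_or_default(),
    }
}
```

Unset SDK fields simply stay unset on the driver side, which is why a skipped translation (the "options fidelity" concern above) silently drops user configuration.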
### Response Bridge (Driver → SDK)

The driver returns an untyped `CosmosResponse { body: Vec<u8>, headers: CosmosResponseHeaders, status: CosmosStatus }`. The SDK returns a typed `CosmosResponse<T>` wrapping `azure_core::Response<T>`.

**Approach:** Reconstruct the SDK response from driver parts:

```rust
pub(crate) fn driver_response_to_cosmos_response<T>(
    driver_response: DriverResponse,
) -> CosmosResponse<T> {
    let status_code = driver_response.status().status_code();
    let headers = cosmos_response_headers_to_headers(driver_response.headers());
    let body = driver_response.into_body();
    let raw = RawResponse::from_bytes(status_code, headers, Bytes::from(body));
    let typed: Response<T> = raw.into();
    CosmosResponse::new(typed, None)
}
```

The header conversion maps each typed `CosmosResponseHeaders` field back to its raw header name/value pair (the reverse of the driver's `from_headers()` parser).

**Caveat:** Only headers that the driver explicitly parses are preserved (activity ID, request charge, session token, etag, continuation, item count, substatus). Any other server headers are lost. This covers all standard Cosmos response metadata. We will revisit this when we do the work of verifying which headers we want to keep.

### CosmosRequest → Optional

The SDK's `CosmosResponse<T>` previously held the original `CosmosRequest` — a gateway pipeline concept with no driver equivalent. The driver uses `CosmosOperation` + `OperationOptions` instead, which are consumed during execution.

**Decision:** Made the `request` field `Option<CosmosRequest>`:

- Gateway-routed operations (all methods not yet cut over) continue setting `Some(request)`.
- Driver-routed operations set `None`.
- The field is only accessed behind `#[cfg(feature = "fault_injection")]` and marked `#[allow(dead_code)]`.
- A TODO comment marks it for removal once all operations are on the driver.
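The reverse header mapping can be illustrated with a simplified stand-in for the driver's typed headers. The struct and function here are assumptions for illustration; only the wire header names (`x-ms-activity-id`, `x-ms-request-charge`, `x-ms-session-token`, `etag`) are the standard Cosmos response headers:

```rust
// Simplified stand-in for the driver's typed CosmosResponseHeaders.
#[derive(Default)]
struct TypedResponseHeaders {
    activity_id: Option<String>,
    request_charge: Option<f64>,
    session_token: Option<String>,
    etag: Option<String>,
}

// Map each typed field back to its raw (name, value) pair, the reverse of
// a from_headers()-style parser. Fields the driver never parsed are simply
// absent here, which is the "other server headers are lost" caveat above.
fn headers_to_pairs(h: &TypedResponseHeaders) -> Vec<(&'static str, String)> {
    let mut pairs = Vec::new();
    if let Some(id) = &h.activity_id {
        pairs.push(("x-ms-activity-id", id.clone()));
    }
    if let Some(ru) = h.request_charge {
        pairs.push(("x-ms-request-charge", ru.to_string()));
    }
    if let Some(token) = &h.session_token {
        pairs.push(("x-ms-session-token", token.clone()));
    }
    if let Some(etag) = &h.etag {
        pairs.push(("etag", etag.clone()));
    }
    pairs
}
```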
## Structural Changes

### ContainerClient

Added two fields to `ContainerClient` so `read_item` can reach the driver at execution time:

```rust
pub struct ContainerClient {
    // ... existing fields ...
    driver: Arc<CosmosDriver>,         // retained from new()
    container_ref: ContainerReference, // cloned before passing to ContainerConnection
}
```

Previously, the driver was discarded after `new()` and `ContainerReference` was buried inside `ContainerConnection`.

### driver_bridge Module

New private module at `src/driver_bridge.rs` containing:

- `driver_response_to_cosmos_response<T>()` — response conversion
- `item_options_to_operation_options()` — options translation
- `driver_response_headers_to_headers()` — converts the driver's typed response headers (e.g., `activity_id: Option<ActivityId>`, `request_charge: Option<RequestCharge>`) into raw `azure_core::Headers` key-value pairs for the SDK response

This module is the shared foundation for all future operation cutovers. When cutting over `create_item`, `delete_item`, etc., they reuse the same bridge functions.

## Applying This Pattern to Other Operations

To cut over another item operation (e.g., `create_item`), follow this template:

1. **Build the operation:** Use the appropriate `CosmosOperation::*` factory method (e.g., `CosmosOperation::create_item(container_ref, pk)`).
2. **Attach the body:** For write operations, serialize the item to bytes and call `.with_body(bytes)` on the operation.
3. **Translate options:** Reuse `item_options_to_operation_options()` from `driver_bridge.rs`. For write-specific options (e.g., `content_response_on_write_enabled`), extend the bridge function.
4. **Execute:** Call `self.driver.execute_operation(operation, driver_options).await?`.
5. **Bridge response:** Reuse `driver_response_to_cosmos_response(driver_response)`.

The public method signature should not change.
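The five-step template can be shown end to end with mocked stand-ins. Nothing below is the real `azure_data_cosmos` or driver API; the mocks exist only so the shape of the cutover (build → attach body → execute → bridge) is visible in one runnable piece:

```rust
// Mocked stand-in for the driver's operation type.
struct CosmosOperation {
    #[allow(dead_code)]
    kind: &'static str,
    body: Option<Vec<u8>>,
}

impl CosmosOperation {
    fn create_item(_container: &str, _pk: &str) -> Self {
        CosmosOperation { kind: "create_item", body: None }
    }
    fn with_body(mut self, body: Vec<u8>) -> Self {
        self.body = Some(body);
        self
    }
}

struct DriverResponse {
    status: u16,
    body: Vec<u8>,
}

// Step 4 stand-in: "execute" echoes the body back with a 201, in place of
// driver.execute_operation() (which really does auth, routing, retries, HTTP).
fn execute_operation(op: CosmosOperation) -> DriverResponse {
    DriverResponse { status: 201, body: op.body.unwrap_or_default() }
}

// Step 5 stand-in: bridge the driver response back to an SDK-shaped result.
fn driver_response_to_cosmos_response(r: DriverResponse) -> (u16, String) {
    (r.status, String::from_utf8(r.body).unwrap_or_default())
}

fn create_item(container: &str, pk: &str, item_json: &str) -> (u16, String) {
    let op = CosmosOperation::create_item(container, pk) // step 1: build
        .with_body(item_json.as_bytes().to_vec());       // step 2: attach body
    // step 3 (options translation) elided in this sketch
    let resp = execute_operation(op);                    // step 4: execute
    driver_response_to_cosmos_response(resp)             // step 5: bridge
}
```

The key property the template preserves: the public `create_item` signature stays the same; only its internals delegate to the driver.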
## Files Changed

| File | Change |
| --- | --- |
| `azure_data_cosmos_driver/src/options/operation_options.rs` | Added `custom_headers` field + setter/getter |
| `azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs` | Wired custom headers into request construction |
| `azure_data_cosmos_driver/src/models/partition_key.rs` | Made `PartitionKeyValue` `pub` |
| `azure_data_cosmos_driver/src/models/mod.rs` | Re-exported `PartitionKeyValue` |
| `azure_data_cosmos/src/driver_bridge.rs` | **New** — shared conversion module |
| `azure_data_cosmos/src/clients/container_client.rs` | Added `driver`/`container_ref` fields; rewrote `read_item` |
| `azure_data_cosmos/src/models/cosmos_response.rs` | Made `request` field optional |
| `azure_data_cosmos/src/partition_key.rs` | Added `into_driver_partition_key()` |
| `azure_data_cosmos/src/options/mod.rs` | Added `pub(crate)` accessors for bridge |
| `azure_data_cosmos/src/pipeline/mod.rs` | Updated `CosmosResponse::new` call site |
| `azure_data_cosmos/src/lib.rs` | Registered `mod driver_bridge` |

## Open Items and Future Work

- **Options alignment:** Ashley is working on aligning SDK options with the driver's options model. Once complete, the `ItemOptions` → `OperationOptions` translation may simplify or become unnecessary.
- **PartitionKey unification:** The dual `PartitionKey` types and the `into_driver_partition_key()` conversion should be eliminated once the types are unified.
- **`CosmosRequest` removal:** Once all operations are routed through the driver, the `Option<CosmosRequest>` field on `CosmosResponse<T>` can be removed entirely.
- **`custom_headers` review:** The `custom_headers` field on `OperationOptions` was added for feature parity. It may be removed as we analyze which options are truly needed at the driver level.
- **Remaining operations:** `create_item`, `delete_item`, `replace_item`, `upsert_item`, `patch_item`, and query operations should follow the same pattern established here.
## Fault Injection Wiring

When cutting `read_item` over to the driver, the SDK's fault injection tests initially failed because the two execution paths (gateway and driver) have **independent fault injection systems**. This section documents how they were connected.

### Problem

The SDK and driver each have their own fault injection module (`azure_data_cosmos::fault_injection` and `azure_data_cosmos_driver::fault_injection`). They define parallel but separate types (`FaultInjectionRule`, `FaultInjectionCondition`, `FaultInjectionResult`, etc.) with identical variants but different Rust types. Prior to this work, only the gateway pipeline received fault injection rules — the driver was built without them.

### Solution: Rule Translation with Shared State

The bridge module (`driver_bridge.rs`) includes `sdk_fi_rules_to_driver_fi_rules()`, which translates SDK fault injection rules into driver fault injection rules. The translation covers:

- `FaultOperationType` — variant-by-variant match (identical variant names)
- `FaultInjectionErrorType` — variant-by-variant match
- `FaultInjectionCondition` — `RegionName` → `Region`; operation type and container ID mapped directly
- `FaultInjectionResult` — `Duration` → `Option<Duration>`; probability copied
- Timing fields — `start_time: Instant` → `Option<Instant>`; `end_time` and `hit_limit` copied

### Shared Mutable State

SDK `FaultInjectionRule` has `enabled: Arc<AtomicBool>` and `hit_count: Arc<AtomicU32>` that tests mutate at runtime (`.disable()`, `.enable()`, `.hit_count()`). The driver's `FaultInjectionRuleBuilder` accepts external `Arc`s via `with_shared_state()`, so both the SDK gateway path and the driver path reference the **same atomic state**.
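A minimal sketch of this shared-`Arc` arrangement, with stand-in rule types (the real SDK and driver rule structs carry far more state; only the atomics pattern is shown here):

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;

// Stand-ins for the SDK rule and the driver rule. Both hold clones of the
// same Arc'd atomics, so mutations through one are visible through the other.
struct SdkRule {
    enabled: Arc<AtomicBool>,
    hit_count: Arc<AtomicU32>,
}

struct DriverRule {
    enabled: Arc<AtomicBool>,
    hit_count: Arc<AtomicU32>,
}

impl SdkRule {
    fn disable(&self) {
        self.enabled.store(false, Ordering::SeqCst);
    }
    fn hits(&self) -> u32 {
        self.hit_count.load(Ordering::SeqCst)
    }
}

impl DriverRule {
    fn is_enabled(&self) -> bool {
        self.enabled.load(Ordering::SeqCst)
    }
    fn record_hit(&self) {
        self.hit_count.fetch_add(1, Ordering::SeqCst);
    }
}

// Stand-in for what with_shared_state() achieves: both rules are built
// around the same enabled flag and hit counter.
fn shared_rules() -> (SdkRule, DriverRule) {
    let enabled = Arc::new(AtomicBool::new(true));
    let hits = Arc::new(AtomicU32::new(0));
    (
        SdkRule { enabled: enabled.clone(), hit_count: hits.clone() },
        DriverRule { enabled, hit_count: hits },
    )
}
```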
This means:

- Calling `.disable()` on the SDK rule also disables it in the driver
- Hit counts are shared — both paths increment the same counter
- Tests that toggle rules or assert hit counts work correctly across both paths

### Wiring in `CosmosClientBuilder`

In `CosmosClientBuilder::build()`:

1. Before the `FaultInjectionClientBuilder` is consumed for the gateway transport, `rules()` extracts a reference to the SDK rules
2. `sdk_fi_rules_to_driver_fi_rules()` translates them to driver rules with shared state
3. The translated rules are passed to `CosmosDriverRuntimeBuilder::with_fault_injection_rules()`
4. The SDK's `fault_injection` Cargo feature now forwards to the driver's `fault_injection` feature

### Test Patterns for Future Cutover

When cutting over additional operations, **no additional fault injection wiring is needed** — it's handled once at the `CosmosClientBuilder` level. However, tests that assert `request_url()` need to handle `None` for driver-routed operations:

```rust
// Gateway-routed operations return Some(url)
// Driver-routed operations return None
if let Some(url) = response.request_url() {
    assert_eq!(url.host_str().unwrap(), expected_endpoint);
}
```

### `custom_response` Translation

Translation of `CustomResponse` (synthetic HTTP responses) is not yet implemented. None of the current tests use custom responses for `ReadItem` operations. When needed, the bridge function should be extended to translate `CustomResponse` fields (`status_code`, `headers`, `body`).

### Consolidating to Driver Fault Injection After Cutover

The current dual-system architecture (SDK fault injection + driver fault injection + translation bridge) exists only because the cutover is incremental — some operations still go through the gateway while others go through the driver. Once **all** operations are routed through the driver:

1. **Drop `azure_data_cosmos::fault_injection`** — the SDK's HTTP-client-level fault interception module becomes unreachable.
   Delete the entire `src/fault_injection/` directory.
2. **Re-export driver types** — the SDK re-exports the driver's fault injection types directly:
   ```rust
   #[cfg(feature = "fault_injection")]
   pub use azure_data_cosmos_driver::fault_injection;
   ```
3. **Remove the translation layer** — `sdk_fi_rules_to_driver_fi_rules()` in `driver_bridge.rs` and the `shared_enabled()`/`shared_hit_count()` accessors on the SDK rule are no longer needed.
4. **Simplify `CosmosClientBuilder`** — `with_fault_injection()` accepts `Vec<Arc<driver::FaultInjectionRule>>` directly and passes them to `CosmosDriverRuntimeBuilder::with_fault_injection_rules()`. No translation, no cloning, no intermediary builder.
5. **Update tests** — tests construct driver `FaultInjectionRule` directly (same builders, same API) instead of SDK rules.

At that point the SDK has **no fault injection logic of its own** — it's a pass-through to the driver, matching the overall "SDK as thin wrapper" goal. The driver is the single source of truth for all transport-related concerns, including fault injection.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
## Summary
Adds `azure_data_cosmos_driver` as a dependency to the SDK and introduces the `ContainerReference` pattern for eager container metadata resolution. This is the first step toward using the driver as the SDK's internal transport/routing layer.

This PR is not the full cutover to the driver under the hood; its main purpose is to start that process with the bare minimum (the current caches) without fully replacing the transport pipeline. That will be done in an entirely separate issue/PR. The cutover from the SDK's fault injection to the driver's fault injection for end-to-end testing against the driver will also be in a separate issue/PR. This work is also meant to unblock other work, like #3987.
## Design

`ContainerClient` construction now eagerly resolves immutable container metadata (RID, partition key definition) via the driver's `resolve_container()`, rather than doing per-operation cache lookups in `send()`. This mirrors how the driver's own `ContainerReference` works.

### SDK `ContainerReference` (No Model Sharing)

The SDK defines its own `pub(crate) ContainerReference`, adapted from the driver's type via `from_driver_ref()`. This follows the versioning strategy in `AGENTS.md` — `azure_data_cosmos` cannot expose `azure_data_cosmos_driver` types directly.

## Changes
### SDK (`azure_data_cosmos`)

| File | Change |
| --- | --- |
| `models/container_reference.rs` | `ContainerReference` with `from_driver_ref()`, `from_parts()`, accessors |
| `clients/cosmos_client.rs` | `driver: Arc<CosmosDriver>` field, passes to `DatabaseClient` |
| `clients/cosmos_client_builder.rs` | `CosmosDriverRuntime` + `CosmosDriver` created in `build()`; commented out 5 builder unit tests (need fault injection linked from SDK to driver) |
| `clients/database_client.rs` | `driver` field; `container_client()` now returns `azure_core::Result<ContainerClient>` (breaking) |
| `clients/container_client.rs` | `new()` calls `driver.resolve_container()`, builds `ContainerReference`, returns `Result` |
| `handler/container_connection.rs` | `ContainerReference` stored; `send()` uses stored metadata; fixed dual-cache-key bug |

### Dependency alignment
| File | Change |
| --- | --- |
| `Cargo.toml` | `azure_data_cosmos_driver` workspace dependency |
| `Cargo.toml` | `azure_core` → workspace; `reqwest` feature forwards `driver/reqwest_native_tls` |
| `Cargo.toml` | `azure_core` → workspace |
| `Cargo.toml` | `azure_core` + `azure_identity` → workspace |

### Call site updates (~59 sites across 19 files)
All `.container_client()` calls updated to `.container_client().await?` across tests, examples, the native crate, and the perf crate.

## Bug fix
`send()` previously used the container name (e.g., `"MyContainer"`) as the `pk_range_cache` key, while the cache parameter is named `collection_rid` and expects a RID. All lookups now consistently use `ContainerReference::collection_rid()`.

## Architecture notes
- The driver is only used for `resolve_container()` in this PR. The SDK's `GatewayPipeline` still handles all data plane operations. Full transport cutover is planned for a future PR.
- Eager resolution means deleting and re-creating a container can cause existing `ContainerClient` instances to fail — this will be addressed in a follow-up taking care of container re-creation scenarios.
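The eager-resolution design can be sketched with hypothetical stand-in types: a fallible constructor resolves immutable metadata up front, so a client for an unresolvable container is never created. None of these names are the real SDK or driver APIs:

```rust
// Stand-in for the SDK's ContainerReference (immutable container metadata).
struct ContainerReference {
    #[allow(dead_code)]
    collection_rid: String,
}

// Stand-in for driver.resolve_container(): succeeds only for a known
// container; the RID value here is made up for illustration.
fn resolve_container(name: &str) -> Result<ContainerReference, String> {
    if name == "MyContainer" {
        Ok(ContainerReference { collection_rid: "a1B2c3==".to_string() })
    } else {
        Err(format!("container '{name}' not found"))
    }
}

struct ContainerClient {
    #[allow(dead_code)]
    container_ref: ContainerReference,
}

impl ContainerClient {
    // Eager resolution: metadata is fetched at construction time, so the
    // constructor is fallible and returns Result instead of Self.
    fn new(name: &str) -> Result<Self, String> {
        let container_ref = resolve_container(name)?;
        Ok(ContainerClient { container_ref })
    }
}
```

This is why `DatabaseClient::container_client()` became fallible and call sites changed to `.container_client().await?`: construction can now fail when the container cannot be resolved.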