Cosmos Driver: Hierarchical Partition Key Support #4087

Draft
simorenoh wants to merge 4 commits into release/azure_data_cosmos-previews from cosmos-hpk

simorenoh commented Apr 3, 2026

Adds MultiHash EPK (Effective Partition Key) computation and prefix partition key routing infrastructure to the driver crate, enabling hierarchical partition key (HPK) support for containers with multiple partition key paths.

What changed

MultiHash EPK computation (effective_partition_key.rs)

Previously, EffectivePartitionKey::compute() fell through to single-hash V2 for PartitionKeyKind::MultiHash, producing incorrect EPKs. MultiHash requires each component to be hashed independently — this PR adds that per-component hashing.

  • Added PartitionKeyKind::MultiHash arm to compute() routing to new effective_partition_key_multi_hash_v2()
  • Each component is independently V2-encoded → MurmurHash3-128 → byte-reversed → top-2-bit masked → hex-encoded, then concatenated (N×32 hex chars)
  • Extracted shared hash_v2_to_epk() helper used by both single-hash and multi-hash paths
  • Algorithm verified against cross-SDK baselines (.NET, Go, Java) via existing testdata/*.xml fixtures
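The per-component pipeline above can be sketched roughly as follows. This is an illustration under stated assumptions, not the driver's code: `toy_hash128` is a made-up stand-in for MurmurHash3-128 (so the hex values will not match the real fixtures), the function names are hypothetical, and the inputs are assumed to be components that have already been V2-encoded to bytes.

```rust
/// Stand-in for MurmurHash3-128. NOT the real hash: it only provides a
/// deterministic 16-byte digest so the surrounding steps can be shown.
fn toy_hash128(data: &[u8]) -> [u8; 16] {
    let prime: u128 = (1u128 << 88) + 0x13b;
    let mut h: u128 = 0x6c62272e07bb014262b821756295c58d;
    for &b in data {
        h ^= b as u128;
        h = h.wrapping_mul(prime);
    }
    h.to_le_bytes()
}

/// Hypothetical analogue of the shared helper: reverse the bytes, clear the
/// top two bits of the first byte, then hex-encode (32 uppercase hex chars).
fn hash_to_epk_sketch(mut hash: [u8; 16]) -> String {
    hash.reverse();
    hash[0] &= 0x3F;
    hash.iter().map(|b| format!("{:02X}", b)).collect()
}

/// Hash every already-encoded component independently and concatenate the
/// results, giving N * 32 hex characters for N components.
fn multi_hash_epk_sketch(encoded_components: &[Vec<u8>]) -> String {
    encoded_components
        .iter()
        .map(|c| hash_to_epk_sketch(toy_hash128(c)))
        .collect()
}
```

Because each component is hashed on its own, the EPK of a one-component prefix is exactly the first 32 hex characters of the full key's EPK, which is the property the prefix range computation relies on.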

Prefix EPK range computation (effective_partition_key.rs)

  • Added compute_range() for partial/prefix partition keys (fewer components than the container definition)
  • Full key → point range (start == end); prefix key on MultiHash → [prefix_epk, prefix_epk + "FF") range covering all possible suffix completions
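A minimal sketch of the range rule described above, assuming the EPK of the supplied components has already been computed as a hex string. The function name and signature are hypothetical and are not the actual `compute_range()` API.

```rust
/// Illustrative only: returns (start, end) for a partition key whose
/// `supplied` components hash to `epk`, on a container that defines
/// `defined` partition key paths.
fn epk_range_sketch(
    epk: &str,
    supplied: usize,
    defined: usize,
    is_multi_hash: bool,
) -> (String, String) {
    if is_multi_hash && supplied < defined {
        // Prefix key: half-open range [prefix, prefix + "FF").
        (epk.to_string(), format!("{}FF", epk))
    } else {
        // Full key, or single-hash: point range with start == end.
        (epk.to_string(), epk.to_string())
    }
}
```

Appending "FF" works as an exclusive upper bound because the top two bits of every component hash are masked off, so each following 32-character segment starts with a hex digit between 0 and 3 and therefore sorts below "FF".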

Prefix routing in PK range cache (partition_key_range_cache.rs)

  • Added resolve_partition_key_range_ids() that handles both full and prefix partition keys
  • Full key: point lookup (single range ID); prefix key: EPK range → resolve_overlapping_ranges() → multiple range IDs for fan-out
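The two lookup modes can be sketched as below. The struct, field names, and function are assumptions made for illustration (they are not the cache's real types), and the range boundaries in the usage note are shortened, made-up values.

```rust
/// Hypothetical physical partition key range: [min_inclusive, max_exclusive).
#[derive(Debug, Clone)]
struct PkRangeSketch {
    id: &'static str,
    min_inclusive: &'static str,
    max_exclusive: &'static str,
}

/// Returns the IDs of all ranges touched by [start, end).
/// When start == end the input is treated as a point lookup.
fn overlapping_ids(ranges: &[PkRangeSketch], start: &str, end: &str) -> Vec<&'static str> {
    ranges
        .iter()
        .filter(|r| {
            if start == end {
                // Full key: the single range containing the EPK.
                r.min_inclusive <= start && start < r.max_exclusive
            } else {
                // Prefix key: every range that intersects [start, end).
                r.min_inclusive < end && start < r.max_exclusive
            }
        })
        .map(|r| r.id)
        .collect()
}
```

For example, with ranges `["", "05C1")`, `["05C1", "05C1A07")`, and `["05C1A07", "FF")` carrying IDs 0, 1, and 2, a point lookup for `05C1A0` resolves to range 1 only, while the prefix range `["05C1A0", "05C1A0FF")` spans the split point and fans out to ranges 1 and 2.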

Tests

9 new unit tests covering:

  • Single/two/three-component MultiHash EPK computation with expected value verification
  • MultiHash with Undefined component (partial HPK)
  • MultiHash vs single-hash divergence for multi-component keys
  • compute_range() for full keys, prefix keys (1-of-3, 2-of-3), and single-hash (always point)

Follow-up: FeedRange API (PR #3987)

PR #3987 introduces feed_range_from_partition_key(), which currently uses the SDK's hash.rs to compute EPKs. For MultiHash containers, this hits the existing stub and returns incorrect results. Once that method is updated to route through the driver's EPK computation (as part of the SDK-to-driver cutover), it will get correct MultiHash support for free from this PR. Prefix HPK support for the FeedRange API (returning multiple feed ranges for partial keys) will additionally need the compute_range() infrastructure added here.

This is the first step of the end-to-end implementation of this feature. The remaining work, wiring operations up to this logic and ensuring that queries can also use it, depends on the migration to the driver.

github-actions bot commented Apr 3, 2026

API Change Check

APIView identified API level changes in this PR and created the following API reviews

azure_data_cosmos_driver

Labels

Cosmos The azure_cosmos crate

Projects

Status: Todo
