DNS dot-suffix: Adds TCP DNS dot-suffix for Direct mode to avoid Kubernetes ndots latency #5731

Open
kirankumarkolli wants to merge 3 commits into master from users/kirankk/copilot-5730-feature-dns-dot-suffix

kirankumarkolli commented Apr 3, 2026

Feature: Enable DNS dot-suffix for Direct mode to avoid Kubernetes ndots latency

Description

Fixes #5730

🤖 This PR was authored by GitHub Copilot as part of an automated issue triage and resolution workflow.

On Kubernetes with ndots:5, Cosmos DB endpoints like myaccount.documents.azure.com (only 3 dots) trigger multiple failed DNS search-domain expansions before the absolute lookup succeeds. This adds 50-200ms latency per DNS resolution in Direct mode (TCP/RNTBD).

This fix leverages the existing dnsResolutionFunction injection point on StoreClientFactory to append a trailing dot (.) to hostnames at DNS resolution time, making them fully qualified (FQDN). This signals the DNS resolver to skip search-domain expansion entirely.


Issue Summary

| Property | Value |
|---|---|
| Issue | #5730 |
| Area | DirectMode / Transport |
| SDK Version (reported) | v3 (.NET) |
| Severity | P3 |

Root Cause Analysis

Code Path

DocumentClient.cs:6761 - InitializeDirectConnectivity()
  └─> new StoreClientFactory(...) — dnsResolutionFunction defaults to null
      └─> Connection.cs:540 - this.dnsResolutionFunction(this.serverUri.DnsSafeHost)
          └─> Connection.cs:1016 - Dns.GetHostAddressesAsync(hostName) — bare hostname, no dot

Root Cause

Connection.ResolveHostAsync() calls Dns.GetHostAddressesAsync(hostName) with bare hostnames (e.g., myaccount.documents.azure.com). On Kubernetes with ndots:5, a hostname with fewer than 5 dots goes through search-domain expansion first: the resolver tries myaccount.documents.azure.com.default.svc.cluster.local, then myaccount.documents.azure.com.svc.cluster.local, and so on through the search list. Every one of those lookups fails before the absolute lookup is attempted. This is not a regression; the SDK has always resolved hostnames this way, but the latency only shows up in Kubernetes environments.
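The lookup order described here can be modelled in a few lines. The sketch below is not SDK code: NdotsModel and QueryOrder are hypothetical names, and the logic assumes the usual glibc-style rule (a trailing dot means one absolute query; otherwise the search list is tried first when the name has fewer dots than ndots). It returns candidates in the order they would be tried; a real resolver stops at the first success.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of stub-resolver candidate ordering; illustrative only.
public static class NdotsModel
{
    public static IReadOnlyList<string> QueryOrder(
        string host, int ndots, IReadOnlyList<string> searchDomains)
    {
        // A trailing dot marks the name as absolute: one query, no search list.
        if (host.EndsWith(".", StringComparison.Ordinal))
        {
            return new[] { host };
        }

        int dots = host.Count(c => c == '.');
        string absolute = host + ".";
        List<string> expanded = searchDomains
            .Select(domain => host + "." + domain + ".")
            .ToList();

        var order = new List<string>();
        if (dots >= ndots)
        {
            // Enough dots: try the name as given first, then the search list.
            order.Add(absolute);
            order.AddRange(expanded);
        }
        else
        {
            // Too few dots: every search domain is tried before the absolute name.
            order.AddRange(expanded);
            order.Add(absolute);
        }

        return order;
    }
}
```

With ndots set to 5 and the four search domains shown in the traces in this description, the bare Cosmos DB hostname yields five candidates with the absolute name last, while the dot-suffixed hostname yields exactly one.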


Changes Made

Files Modified

| File | Change |
|---|---|
| ConfigurationManager.cs | Added AZURE_COSMOS_DNS_DOT_SUFFIX_ENABLED env var constant + IsDnsDotSuffixEnabled() accessor |
| DnsDotSuffixHelper.cs (new) | ToFqdnHostName() appends trailing dot (skips IPs/nulls/already-dotted). CreateDnsResolutionFunction() factory for the StoreClientFactory lambda |
| DocumentClient.cs | Wired dnsResolutionFunction: into StoreClientFactory constructor, conditionally enabled via env var |
| DnsDotSuffixHelperTests.cs (new) | 10 unit tests covering all edge cases |

Code Changes

  • ConfigurationManager.cs: Follows existing env var pattern (internal static readonly string + public accessor). Added after last existing constant.
  • DnsDotSuffixHelper.cs: New internal static utility. ToFqdnHostName() uses IPAddress.TryParse to skip IPs, checks EndsWith(".") for idempotency. CreateDnsResolutionFunction() wraps Dns.GetHostAddressesAsync with dot-suffix logic.
  • DocumentClient.cs: Uses named parameter dnsResolutionFunction: in existing StoreClientFactory constructor call. Passes null when disabled (preserving default behavior).
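For readers without the diff open, the helper described in these bullets might look roughly like the following. This is a sketch reconstructed from the description, not the PR's actual file: the class name DnsDotSuffixHelperSketch is made up, the delegate shape Func<string, Task<IPAddress>> is an assumption about what StoreClientFactory accepts, and returning the first resolved address is likewise an assumption.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical stand-in for DnsDotSuffixHelper; the real file is not shown here.
public static class DnsDotSuffixHelperSketch
{
    public static string ToFqdnHostName(string hostName)
    {
        // Null and empty inputs pass through unchanged.
        if (string.IsNullOrEmpty(hostName))
        {
            return hostName;
        }

        // IP literals (IPv4 or IPv6) are not DNS names, so leave them alone.
        if (IPAddress.TryParse(hostName, out _))
        {
            return hostName;
        }

        // Already absolute: keeps the operation idempotent.
        if (hostName.EndsWith(".", StringComparison.Ordinal))
        {
            return hostName;
        }

        return hostName + ".";
    }

    // Assumed delegate shape; the real signature may differ.
    public static Func<string, Task<IPAddress>> CreateDnsResolutionFunction()
    {
        return async hostName =>
        {
            IPAddress[] addresses = await Dns
                .GetHostAddressesAsync(ToFqdnHostName(hostName))
                .ConfigureAwait(false);

            if (addresses.Length == 0)
            {
                throw new SocketException((int)SocketError.HostNotFound);
            }

            // Assumption: the caller wants a single address; take the first.
            return addresses[0];
        };
    }
}
```

One thing to be aware of in this sketch is that IPAddress.TryParse accepts some purely numeric strings as addresses, so a hostname made only of digits would be passed through without a dot.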

Generated Output (Before/After)

Before (bare hostname — triggers ndots expansion):

DNS query: myaccount.documents.azure.com
  → Try: myaccount.documents.azure.com.default.svc.cluster.local (NXDOMAIN ~10ms)
  → Try: myaccount.documents.azure.com.svc.cluster.local (NXDOMAIN ~10ms)
  → Try: myaccount.documents.azure.com.cluster.local (NXDOMAIN ~10ms)
  → Try: myaccount.documents.azure.com.internal (NXDOMAIN ~10ms)
  → Try: myaccount.documents.azure.com (SUCCESS ~5ms)
Total: ~45ms

After (FQDN with trailing dot — single query):

DNS query: myaccount.documents.azure.com.
  → Try: myaccount.documents.azure.com. (SUCCESS ~5ms)
Total: ~5ms

Testing

Test Results

| Test Suite | Total | Passed | Failed |
|---|---|---|---|
| DnsDotSuffixHelperTests | 10 | 10 | 0 |
| Build | - | - | |

New Tests Added

  • DnsDotSuffixHelperTests.ToFqdnHostName_AppendsTrailingDot - Standard Cosmos DB endpoint
  • DnsDotSuffixHelperTests.ToFqdnHostName_IdempotentWhenAlreadyDotSuffixed - Already FQDN
  • DnsDotSuffixHelperTests.ToFqdnHostName_SkipsIPv4Address - IPv4 passthrough
  • DnsDotSuffixHelperTests.ToFqdnHostName_SkipsIPv6Address - IPv6 loopback
  • DnsDotSuffixHelperTests.ToFqdnHostName_SkipsIPv6FullAddress - Full IPv6
  • DnsDotSuffixHelperTests.ToFqdnHostName_ReturnsNullForNull - Null guard
  • DnsDotSuffixHelperTests.ToFqdnHostName_ReturnsEmptyForEmpty - Empty guard
  • DnsDotSuffixHelperTests.ToFqdnHostName_AppendsTrailingDotToLocalhost - Single-label host
  • DnsDotSuffixHelperTests.ToFqdnHostName_AppendsTrailingDotToSingleLabel - Short hostname
  • DnsDotSuffixHelperTests.CreateDnsResolutionFunction_ReturnsNonNullFunction - Factory sanity

Breaking Changes

None. Opt-in via environment variable. Default behavior unchanged.
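The opt-in gate can be pictured as below. This is only a sketch: DotSuffixOptInSketch, IsEnabled and SelectResolver are hypothetical names, the variable name is the one used in this description (a later commit in this PR renames it with a TCP_ prefix), and parsing the value with bool.TryParse is an assumption since the SDK's actual parsing is not shown here.

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

// Hypothetical illustration of the env-var gate; not the SDK's ConfigurationManager.
public static class DotSuffixOptInSketch
{
    // Name as given in this description; renamed by a later commit in the PR.
    public const string EnvVarName = "AZURE_COSMOS_DNS_DOT_SUFFIX_ENABLED";

    public static bool IsEnabled()
    {
        // Unset, empty, or unparsable values all count as disabled.
        string raw = Environment.GetEnvironmentVariable(EnvVarName);
        return bool.TryParse(raw, out bool enabled) && enabled;
    }

    // Returns null when the feature is off, so the caller keeps its default resolver.
    public static Func<string, Task<IPAddress>> SelectResolver(
        Func<string, Task<IPAddress>> dotSuffixResolver)
    {
        return IsEnabled() ? dotSuffixResolver : null;
    }
}
```

Passing null when disabled mirrors the description's point that the StoreClientFactory default is preserved unless the variable is explicitly set.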


External References


Checklist

  • Code follows project conventions
  • Self-review completed
  • Comments added for complex logic
  • Documentation updated (XML doc comments)
  • New tests added for the fix
  • All existing tests pass
  • Remote CI gates pass (Section 7.4)

Generated by GitHub Copilot CLI Agent

…ency

Adds opt-in DNS dot-suffix (FQDN trailing dot) for Direct/TCP connections
to avoid Kubernetes ndots:5 search-domain expansion latency.

- Add AZURE_COSMOS_DNS_DOT_SUFFIX_ENABLED environment variable
- Add DnsDotSuffixHelper with ToFqdnHostName() and CreateDnsResolutionFunction()
- Wire dnsResolutionFunction into StoreClientFactory via DocumentClient
- Add unit tests for DnsDotSuffixHelper

Fixes #5730

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions bot left a comment

All good!

- Rename AZURE_COSMOS_DNS_DOT_SUFFIX_ENABLED → AZURE_COSMOS_TCP_DNS_DOT_SUFFIX_ENABLED
- Rename DnsDotSuffixEnabled → TcpDnsDotSuffixEnabled
- Rename IsDnsDotSuffixEnabled() → IsTcpDnsDotSuffixEnabled()
- Restore original line endings in IsLengthAwareRangeComparatorEnabled()

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@kirankumarkolli kirankumarkolli changed the title Feature: Enable DNS dot-suffix for Direct mode to avoid Kubernetes ndots latency DNS dot-suffix: Adds TCP DNS dot-suffix for Direct mode to avoid Kubernetes ndots latency Apr 3, 2026
@kirankumarkolli kirankumarkolli marked this pull request as ready for review April 3, 2026 18:22
Development

Successfully merging this pull request may close these issues.

Cosmos SDK: Enable FQDN DNS Resolution to Avoid Kubernetes ndots Latency