Fix JVM <clinit> deadlock by removing static final accessor fields #48689
jeet1995 wants to merge 3 commits into Azure:main
Closing: bridge classes don't allow adding new methods. Proceeding with #48667 (Class.forName with explicit classloader).
Remove all 49 'private static final XxxAccessor' fields from consuming
classes and replace with inline calls to the getter. These fields
triggered initializeAllAccessors() during <clinit>, creating circular
class initialization dependencies that permanently deadlocked the JVM
when multiple threads concurrently loaded Cosmos classes.
The accessor is already cached inside ImplementationBridgeHelpers via
AtomicReference — the static fields were a redundant cache. Inline
calls add one volatile read (~1ns) per access, negligible vs the
actual Cosmos operations.
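A minimal sketch of the before/after pattern described above, assuming a simplified bridge helper. `WidgetHelperSketch`, `Widget`, and `WidgetRenderer` are hypothetical names used only for illustration; they are not SDK classes, and the getter here fails fast rather than bootstrapping other classes the way the real helper may.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a bridge helper; not the SDK's real class.
final class WidgetHelperSketch {
    interface WidgetAccessor {
        String describe(Widget widget);
    }

    // The helper already caches the accessor, so consumers need no second cache.
    private static final AtomicReference<WidgetAccessor> ACCESSOR = new AtomicReference<>();

    static void setWidgetAccessor(WidgetAccessor accessor) {
        ACCESSOR.compareAndSet(null, accessor);
    }

    static WidgetAccessor getWidgetAccessor() {
        WidgetAccessor accessor = ACCESSOR.get(); // a single volatile read
        if (accessor == null) {
            // Assumption for this sketch only: fail fast instead of loading other classes.
            throw new IllegalStateException("WidgetAccessor is not registered");
        }
        return accessor;
    }
}

final class Widget {
    // Registers the accessor while Widget itself is being initialized.
    static {
        initialize();
    }

    private final String name;

    Widget(String name) {
        this.name = name;
    }

    static void initialize() {
        WidgetHelperSketch.setWidgetAccessor(widget -> "widget:" + widget.name);
    }
}

final class WidgetRenderer {
    // Before (removed pattern): the lookup ran inside WidgetRenderer's <clinit>.
    // private static final WidgetHelperSketch.WidgetAccessor ACCESSOR =
    //     WidgetHelperSketch.getWidgetAccessor();

    static String render(Widget widget) {
        // After: resolve the accessor lazily at the point of use.
        return WidgetHelperSketch.getWidgetAccessor().describe(widget);
    }
}
```

With the static field gone, loading `WidgetRenderer` no longer touches the helper during class initialization; the lookup happens only when `render` is called.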
Also includes:
- Missing static { initialize(); } blocks for 3 classes
- CosmosItemSerializer <clinit> ordering fix
- Renamed misnamed accessor in CosmosDiagnosticsThresholdsHelper
- Forked-JVM deadlock regression test (invocationCount=5, 12 threads)
- Behavioral enforcement test for static { initialize(); } blocks
Fixes: Azure#48622
Fixes: Azure#48585
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
e57066d to
66afd43
Compare
/azp run java - cosmos - ci
/azp run java - cosmos - tests
/azp run java - cosmos - kafka
/azp run java - cosmos - spark
/azp run java - spring - ci
Azure Pipelines successfully started running 1 pipeline(s).
Pull request overview
This PR addresses a JVM <clinit> deadlock in the Cosmos Java SDK by removing many static final ...Accessor caches in consuming classes and switching call sites to resolve accessors lazily via ImplementationBridgeHelpers.*Helper.get*Accessor() on demand, reducing class-initialization-time cross-dependencies.
Changes:
- Replaced numerous private static final XxxAccessor ... = getXxxAccessor() fields with inline (lazy) getter calls at usage sites.
- Added/adjusted static { initialize(); } blocks and <clinit> ordering to ensure accessors are registered safely during class initialization where required.
- Added forked-JVM regression/enforcement tests around concurrent <clinit> behavior and accessor registration.
Reviewed changes
Copilot reviewed 54 out of 54 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| sdk/spring/azure-spring-data-cosmos/README.md | Trailing whitespace/newline adjustment. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/util/CosmosPagedFluxStaticListImpl.java | Removed static accessor cache; inline FeedResponse accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/util/CosmosPagedFluxDefaultImpl.java | Removed static accessor cache; inline diagnostics context accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/models/FeedResponse.java | Removed static diagnostics accessor cache; inline accessor calls. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/models/CosmosOperationDetails.java | Added static { initialize(); } registration block. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/models/CosmosItemRequestOptions.java | Removed static thresholds accessor cache; inline thresholds accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/StaleResourceRetryPolicy.java | Removed static exception accessor cache. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/SessionTokenMismatchRetryPolicy.java | Removed static accessor cache; inline session retry options accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RxDocumentClientImpl.java | Removed multiple static accessor caches; inline accessor usage across implementation. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/QueryPlanRetriever.java | Removed static accessor caches; inline accessors for options/exception handling. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/PipelinedQueryExecutionContext.java | Removed static accessor cache; inline accessor usage when cloning options. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/PipelinedDocumentQueryExecutionContext.java | Removed static accessor caches; inline options and serializer accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/ParallelDocumentQueryExecutionContext.java | Removed static accessor caches; inline options/diagnostics accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/OrderByUtils.java | Removed static diagnostics accessor cache; inline diagnostics accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/OrderByDocumentQueryExecutionContext.java | Removed static accessor caches; inline feed/diagnostics accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/OrderByDocumentProducer.java | Removed static feed accessor cache; inline feed accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/NonStreamingOrderByUtils.java | Removed static diagnostics accessor cache; inline diagnostics accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/NonStreamingOrderByDocumentQueryExecutionContext.java | Removed static accessor caches; inline feed/diagnostics accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/HybridSearchDocumentQueryExecutionContext.java | Removed static accessor caches; inline feed/diagnostics accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/GroupByDocumentQueryExecutionContext.java | Removed static diagnostics accessor cache; inline diagnostics accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/Fetcher.java | Removed static diagnostics accessor cache; inline diagnostics accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/DocumentQueryExecutionContextFactory.java | Inline options accessor usage in creation flow. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/DocumentQueryExecutionContextBase.java | Removed static accessor caches; inline accessor usage for request creation and cloning. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/DocumentProducer.java | Removed static accessor cache; inline accessor usage when cloning options. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/DefaultDocumentQueryExecutionContext.java | Removed static accessor cache; inline accessor usage for partition key definition/properties. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/DCountDocumentQueryExecutionContext.java | Removed static diagnostics accessor cache; inline diagnostics accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/ChangeFeedFetcher.java | Removed static feed accessor cache; inline feed accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/AggregateDocumentQueryExecutionContext.java | Removed static accessor caches; inline feed/diagnostics accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/JsonSerializable.java | Removed static serializer accessor cache; inline serializer accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/ImplementationBridgeHelpers.java | Renamed thresholds accessor getter (getCosmosAsyncClientAccessor → getCosmosDiagnosticsThresholdsAccessor). |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/http/HttpClientConfig.java | Removed static HTTP2 config accessor cache; inline accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/Document.java | Removed static serializer accessor cache; inline serializer accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/GoneAndRetryWithRetryPolicy.java | Removed static exception accessor cache; inline exception accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/DiagnosticsProvider.java | Removed multiple static accessor caches; inline accessors throughout tracing/metrics paths. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/CosmosQueryRequestOptionsImpl.java | Updated thresholds accessor getter name usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/CosmosQueryRequestOptionsBase.java | Updated thresholds accessor getter name usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/ConnectionPolicy.java | Removed static HTTP2 config accessor cache; inline accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/clienttelemetry/ClientTelemetryMetrics.java | Removed static accessor caches; inline accessors for telemetry metrics recording. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/clienttelemetry/ClientMetricsDiagnosticsHandler.java | Removed static telemetry config accessor cache. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/ChangeFeedQueryImpl.java | Removed static accessor caches; inline accessors for change feed request/response. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/caches/RxCollectionCache.java | Removed static exception accessor cache; inline exception accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/batch/TransactionalBulkExecutor.java | Removed static batch request options accessor cache; inline accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/batch/BulkExecutor.java | Removed static accessor caches; inline accessors for batch response and diagnostics provider. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosRequestContext.java | Added static { initialize(); } registration block. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosItemSerializer.java | Reordered <clinit> to register accessor before static fields. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosDiagnosticsContext.java | Added static { initialize(); } registration block. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosContainerProactiveInitConfig.java | Removed static container identity accessor cache; inline accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosAsyncUser.java | Removed static accessor caches; inline accessors for query naming/feed response creation. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosAsyncScripts.java | Removed static accessor caches; inline accessors for query naming/feed response creation. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosAsyncDatabase.java | Removed static accessor caches; inline accessors for query naming/feed response creation. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosAsyncContainer.java | Removed many static accessor caches; inline accessors across request/response, policies, and telemetry. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosAsyncClient.java | Removed static accessor caches; inline telemetry/options/feed response accessors. |
| sdk/cosmos/azure-cosmos/CHANGELOG.md | Added changelog bullet for <clinit> deadlock fix. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/ImplementationBridgeHelpersTest.java | Added forked-JVM deadlock regression test and accessor registration enforcement test. |
| @@ -59,7 +59,6 @@ public class DocumentQueryExecutionContextFactory { | |||
| private final static int PageSizeFactorForTop = 5; | |||
| private static final Logger logger = LoggerFactory.getLogger(DocumentQueryExecutionContextFactory.class); | |||
DocumentQueryExecutionContextFactory still has a static final CosmosQueryRequestOptionsAccessor (qryOptAccessor) that is initialized during this class’ <clinit> via getCosmosQueryRequestOptionsAccessor(). This reintroduces the class-initialization-time accessor resolution pattern this PR is trying to eliminate and can put initializeAllAccessors() back on a <clinit> path. Please remove this static accessor field and resolve the accessor lazily at the point of use (e.g., inline getter call or a method-local variable).
| public abstract class CosmosQueryRequestOptionsBase<T extends CosmosQueryRequestOptionsBase<?>> implements OverridableRequestOptions { | ||
| private final static ImplementationBridgeHelpers.CosmosDiagnosticsThresholdsHelper.CosmosDiagnosticsThresholdsAccessor thresholdsAccessor = | ||
| ImplementationBridgeHelpers.CosmosDiagnosticsThresholdsHelper.getCosmosAsyncClientAccessor(); | ||
| ImplementationBridgeHelpers.CosmosDiagnosticsThresholdsHelper.getCosmosDiagnosticsThresholdsAccessor(); | ||
CosmosQueryRequestOptionsBase still initializes thresholdsAccessor as a static final field by calling getCosmosDiagnosticsThresholdsAccessor() during class initialization. This is the same pattern that can trigger accessor initialization during <clinit> and can reintroduce the deadlock this PR is addressing. Please remove this static field and resolve the accessor lazily (inline call or method-local caching) where it’s needed.
| public final class CosmosQueryRequestOptionsImpl extends CosmosQueryRequestOptionsBase<CosmosQueryRequestOptionsImpl> { | ||
| private final static ImplementationBridgeHelpers.CosmosDiagnosticsThresholdsHelper.CosmosDiagnosticsThresholdsAccessor thresholdsAccessor = | ||
| ImplementationBridgeHelpers.CosmosDiagnosticsThresholdsHelper.getCosmosAsyncClientAccessor(); | ||
| ImplementationBridgeHelpers.CosmosDiagnosticsThresholdsHelper.getCosmosDiagnosticsThresholdsAccessor(); | ||
| private String partitionKeyRangeId; |
CosmosQueryRequestOptionsImpl still initializes thresholdsAccessor as a static final field by calling getCosmosDiagnosticsThresholdsAccessor() during <clinit>. This keeps accessor resolution on a class-initialization path and can reintroduce the deadlock scenario. Please remove this static field and resolve the accessor lazily at the call sites (optionally cache in a method-local variable).
| * Enforces that every class targeted by {@code ensureClassInitialized()} in | ||
| * {@link ImplementationBridgeHelpers} registers its accessor during {@code <clinit>} | ||
| * (i.e., has a {@code static { initialize(); }} block). | ||
| * <p> | ||
| * Verification is behavioral, not source-based: a forked child JVM iterates every |
The JavaDoc for allAccessorClassesMustHaveStaticInitializerBlock refers to ensureClassInitialized() and claims the getter triggers Class.forName() → <clinit> and will fail if static { initialize(); } is missing. In the current ImplementationBridgeHelpers implementation, getters call initializeAllAccessors() (not a targeted Class.forName()), so this test can pass even when a class doesn’t register its accessor during <clinit> (because initializeAllAccessors() can set accessors directly). Please adjust the test (or the infrastructure) so it actually enforces the intended contract.
| * timeout detects the hang. Runs 5 invocations via TestNG ({@code invocationCount = 5}), | ||
| * each forking 3 child JVMs — totaling 15 fresh JVMs × 12 concurrent threads = 180 | ||
| * {@code <clinit>} race attempts. | ||
| */ | ||
| @Test(groups = { "unit" }, invocationCount = 5) | ||
| public void concurrentAccessorInitializationShouldNotDeadlock() throws Exception { |
concurrentAccessorInitializationShouldNotDeadlock is marked as a unit test but forks multiple child JVMs (invocationCount=5 and an inner runs=3, i.e., 15 JVM forks per test execution). This is likely to add substantial runtime/overhead to the unit test suite and increase flakiness on constrained CI agents. Consider moving this to a heavier-weight test group (or reducing the fork count) while keeping the regression coverage.
| this.maxRetryAttemptsInCurrentRegion = | ||
| new AtomicInteger(sessionRetryOptionsAccessor.getMaxInRegionRetryCount(sessionRetryOptions)); | ||
| this.regionSwitchHint = sessionRetryOptionsAccessor.getRegionSwitchHint(sessionRetryOptions); | ||
| this.minInRegionRetryTime = sessionRetryOptionsAccessor.getMinInRegionRetryTime(sessionRetryOptions); | ||
| new AtomicInteger(ImplementationBridgeHelpers.CosmosSessionRetryOptionsHelper.getCosmosSessionRetryOptionsAccessor().getMaxInRegionRetryCount(sessionRetryOptions)); | ||
| this.regionSwitchHint = ImplementationBridgeHelpers.CosmosSessionRetryOptionsHelper.getCosmosSessionRetryOptionsAccessor().getRegionSwitchHint(sessionRetryOptions); | ||
| this.minInRegionRetryTime = ImplementationBridgeHelpers.CosmosSessionRetryOptionsHelper.getCosmosSessionRetryOptionsAccessor().getMinInRegionRetryTime(sessionRetryOptions); |
These inlined accessor calls create very long lines (likely exceeding the 120-character Checkstyle LineLength limit) and are harder to read. Consider storing the accessor in a method-local variable (safe w.r.t. <clinit> deadlock) and/or wrapping the call chain across lines to satisfy the repository’s Checkstyle line-length rule (e.g., eng/lintingconfigs/checkstyle/track2/checkstyle.xml).
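The method-local variable suggested here could look roughly like the sketch below. All type names (`RetryOptionsSketch`, `RetryOptionsHelperSketch`, `RetryPolicySketch`) are hypothetical stand-ins, not the SDK's classes; the point is only that one local lookup inside the constructor keeps lines short without putting anything back on a `<clinit>` path.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper that caches the accessor, mirroring the bridge-helper idea.
final class RetryOptionsHelperSketch {
    interface Accessor {
        int getMaxInRegionRetryCount(RetryOptionsSketch options);

        Duration getMinInRegionRetryTime(RetryOptionsSketch options);
    }

    private static final AtomicReference<Accessor> ACCESSOR = new AtomicReference<>();

    static void setAccessor(Accessor accessor) {
        ACCESSOR.compareAndSet(null, accessor);
    }

    static Accessor getAccessor() {
        Accessor accessor = ACCESSOR.get();
        if (accessor == null) {
            // Assumption for this sketch only: fail fast when nothing is registered.
            throw new IllegalStateException("accessor is not registered");
        }
        return accessor;
    }
}

// Hypothetical options type that registers its accessor during its own initialization.
final class RetryOptionsSketch {
    static {
        RetryOptionsHelperSketch.setAccessor(new RetryOptionsHelperSketch.Accessor() {
            @Override
            public int getMaxInRegionRetryCount(RetryOptionsSketch options) {
                return options.maxInRegionRetryCount;
            }

            @Override
            public Duration getMinInRegionRetryTime(RetryOptionsSketch options) {
                return options.minInRegionRetryTime;
            }
        });
    }

    private final int maxInRegionRetryCount;
    private final Duration minInRegionRetryTime;

    RetryOptionsSketch(int maxInRegionRetryCount, Duration minInRegionRetryTime) {
        this.maxInRegionRetryCount = maxInRegionRetryCount;
        this.minInRegionRetryTime = minInRegionRetryTime;
    }
}

final class RetryPolicySketch {
    private final AtomicInteger maxRetryAttemptsInCurrentRegion;
    private final Duration minInRegionRetryTime;

    RetryPolicySketch(RetryOptionsSketch options) {
        // One lookup held in a local variable: short lines, no static field, no <clinit> work.
        RetryOptionsHelperSketch.Accessor accessor = RetryOptionsHelperSketch.getAccessor();
        this.maxRetryAttemptsInCurrentRegion =
            new AtomicInteger(accessor.getMaxInRegionRetryCount(options));
        this.minInRegionRetryTime = accessor.getMinInRegionRetryTime(options);
    }

    int maxRetryAttempts() {
        return maxRetryAttemptsInCurrentRegion.get();
    }

    Duration minInRegionRetryTime() {
        return minInRegionRetryTime;
    }
}
```

A local variable is scoped to the call, so it is resolved after class initialization has finished and also avoids repeating the getter on every line.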
| public CosmosDiagnostics createDiagnostics() { | ||
| CosmosDiagnostics diagnostics = | ||
| diagnosticsAccessor.create(this, telemetryCfgAccessor.getSamplingRate(this.clientTelemetryConfig)); | ||
| ImplementationBridgeHelpers.CosmosDiagnosticsHelper.getCosmosDiagnosticsAccessor().create(this, ImplementationBridgeHelpers.CosmosClientTelemetryConfigHelper.getCosmosClientTelemetryConfigAccessor().getSamplingRate(this.clientTelemetryConfig)); | ||
This line is extremely long due to nested inline accessor calls and is likely to violate the 120-character Checkstyle LineLength rule. Please wrap this across multiple lines and/or use a method-local variable for the accessors to keep the code readable and compliant.
| Duration.ofNanos(feedResponseConsumerLatencyInNanos.get())); | ||
| tracerProvider.endSpan(ctxSnapshot, traceCtx, ctxAccessor.isEmptyCompletion(ctxSnapshot), isSampledOut); | ||
| tracerProvider.endSpan(ctxSnapshot, traceCtx, ImplementationBridgeHelpers.CosmosDiagnosticsContextHelper.getCosmosDiagnosticsContextAccessor().isEmptyCompletion(ctxSnapshot), isSampledOut); | ||
The endSpan call inlines another getCosmosDiagnosticsContextAccessor() invocation, making the line very long and repeating the accessor lookup. Please store the accessor (or isEmptyCompletion result) in a local variable and/or wrap the call across lines to avoid Checkstyle LineLength violations and reduce repeated volatile reads.
sdk/cosmos/azure-cosmos/CHANGELOG.md
Outdated
| #### Bugs Fixed | ||
| Fixing an NPE caused due to boxed Boolean conversion. - See [PR 48656](https://github.com/Azure/azure-sdk-for-java/pull/48656/) | ||
| * Fixing an NPE caused due to boxed Boolean conversion. - See [PR 48656](https://github.com/Azure/azure-sdk-for-java/pull/48656/) | ||
| * Fixed JVM `<clinit>` deadlock when multiple threads concurrently trigger Cosmos SDK class loading for the first time. - See [PR 48667](https://github.com/Azure/azure-sdk-for-java/pull/48667) |
The changelog entry for the <clinit> deadlock fix points to PR #48667, but this PR is described as an alternative approach. Please update the reference to point to the correct PR (this one) or otherwise clarify which change actually delivered the fix in this release line.
Pattern A (private final instance fields), 18 files: CosmosAsyncClient, CosmosAsyncDatabase, CosmosAsyncScripts, CosmosAsyncUser, ConnectionPolicy, RxDocumentClientImpl, BulkExecutor, TransactionalBulkExecutor, HttpClientConfig, AggregateDocumentQueryExecutionContext, DefaultDocumentQueryExecutionContext, DocumentQueryExecutionContextBase, OrderByDocumentProducer, CosmosPagedFluxDefaultImpl, CosmosPagedFluxStaticListImpl
Pattern B (private static lazy getter methods), 11 files: CosmosAsyncContainer (11 methods), DiagnosticsProvider (3 methods), ClientTelemetryMetrics (1 method), DocumentProducer, DocumentQueryExecutionContextFactory, PipelinedDocumentQueryExecutionContext, PipelinedQueryExecutionContext, HybridSearchDocumentQueryExecutionContext, NonStreamingOrderByDocumentQueryExecutionContext, OrderByDocumentQueryExecutionContext, ParallelDocumentQueryExecutionContext
4 files originally listed as Pattern A were moved to Pattern B because their accessors are only used in static inner classes (ItemToPageTransformer, etc.).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… stale docs
- Removed remaining static final accessor fields in DocumentQueryExecutionContextFactory, CosmosQueryRequestOptionsBase, CosmosQueryRequestOptionsImpl
- Extracted local variables for long inline accessor chains in SessionTokenMismatchRetryPolicy, RxDocumentClientImpl, CosmosPagedFluxDefaultImpl
- Updated test Javadoc to reflect lazy accessor approach (not Class.forName)
- Reduced child JVM runs from 3 to 1 (invocationCount=5 provides repetition)
- Fixed CHANGELOG PR link to Azure#48689
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Fixes a JVM-level <clinit> deadlock that occurs when multiple threads concurrently trigger Cosmos SDK class loading for the first time. This is a permanent, unrecoverable deadlock that hangs all affected threads indefinitely.

Fixes: #48622, #48585

Root Cause
Consuming classes cache accessors in private static final fields.

During <clinit>, the getter finds the accessor null and calls initializeAllAccessors(), which eagerly loads 40+ classes. When two threads enter <clinit> of different classes simultaneously, the JVM's per-class initialization locks create a circular wait, which is a permanent deadlock.

Fix
Remove all 49 private static final XxxAccessor fields and replace them with inline calls to the getter.

The accessor is already cached inside ImplementationBridgeHelpers via AtomicReference; the static fields were a redundant cache. Inline calls add one volatile read (~1ns) per access, negligible vs actual Cosmos operations.

Why this approach
- Class.forName() in getters: … <clinit> (JLS §12.4.2 no-op). Also reverts Fabian's intentional removal in PR #28912
- … initializeAllAccessors() from <clinit> …
- *BridgeInternal classes …
- … initialize() public …
- CosmosClientBuilder.static{} …
- … <clinit> involvement, zero recursive edge cases

Additional fixes
- Added static { initialize(); } to CosmosRequestContext, CosmosOperationDetails, CosmosDiagnosticsContext
- Fixed <clinit> ordering in CosmosItemSerializer: moved static { initialize(); } before DEFAULT_SERIALIZER
- Renamed getCosmosAsyncClientAccessor() → getCosmosDiagnosticsThresholdsAccessor() in CosmosDiagnosticsThresholdsHelper

Files Changed
49 static final accessor fields removed across 27 consuming classes. No changes to ImplementationBridgeHelpers.java itself.

Tests
- concurrentAccessorInitializationShouldNotDeadlock (invocationCount=5): forks fresh child JVMs, 12 threads via CyclicBarrier concurrently triggering <clinit>
- allAccessorClassesMustHaveStaticInitializerBlock: behavioral enforcement via forked JVM, verifies every accessor is non-null after <clinit>
- accessorInitialization: existing test, validates explicit initializeAllAccessors() bootstrap path
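
The circular wait described under Root Cause, and the timeout-based way a test can detect it, can be reproduced in a few lines. This is only an illustrative sketch under stated assumptions: `ClinitDeadlockSketch`, `Left`, and `Right` are hypothetical names, the latch is there solely to force both threads into their static initializers at the same time, and daemon threads in a single JVM stand in for the forked child JVMs the real tests are described as using.

```java
import java.util.concurrent.CountDownLatch;

final class ClinitDeadlockSketch {
    // Released only once both threads are inside a static initializer.
    private static final CountDownLatch BOTH_INSIDE_CLINIT = new CountDownLatch(2);

    static void rendezvous() {
        BOTH_INSIDE_CLINIT.countDown();
        try {
            BOTH_INSIDE_CLINIT.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static final class Left {
        static final int VALUE;

        static {
            rendezvous();
            VALUE = Right.VALUE + 1; // blocks: Right is being initialized by the other thread
        }
    }

    static final class Right {
        static final int VALUE;

        static {
            rendezvous();
            VALUE = Left.VALUE + 1; // blocks: Left is being initialized by the other thread
        }
    }

    // Returns true when both initializer threads are still blocked after the timeout.
    static boolean bothThreadsStuck(long timeoutMillis) {
        Thread left = new Thread(() -> { int unused = Left.VALUE; }, "init-left");
        Thread right = new Thread(() -> { int unused = Right.VALUE; }, "init-right");
        // Daemon threads let the JVM exit even though they never finish.
        left.setDaemon(true);
        right.setDaemon(true);
        left.start();
        right.start();
        try {
            left.join(timeoutMillis);
            right.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return left.isAlive() && right.isAlive();
    }
}
```

Because each thread holds the initialization of its own class while waiting for the other's, neither can proceed, and no exception is ever thrown. That is why a regression test for this kind of bug has to rely on a timeout, and why running each attempt in a fresh JVM matters: a class is initialized at most once per class loader, so the race can only be exercised once per process.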