fix: prevent grants_version_going_backward under concurrent grants#4358
Open
singhpk234 wants to merge 1 commit into
Two independent issues in AtomicOperationMetaStoreManager / JDBC persistence could surface `IllegalStateException: grants_version_going_backward` and silently lose grant-version increments on the JDBC backend under concurrent write workload (reproducible via polaris-tools' create-dataset-simulation benchmark, apache#3836):

* Read side: refreshResolvedEntity called lookupEntityVersions at t0 and conditionally lookupEntity at t1, then returned the entity from t1 paired with the grants version from t0. A concurrent persistNewGrantRecord that bumped the entity row's grant_records_version between the two reads produced a result where entity.getGrantRecordsVersion() exceeded the reported grantsVersion, tripping ResolvedPolarisEntity's invariant. Fix: when the entity is reloaded, use its own grant_records_version as the reported value so the returned pair is internally consistent.
* Write side: JdbcBasePersistenceImpl.persistEntity CAS-checked entity_version only, not grant_records_version, so two concurrent grant bumps on the same grantee could both pass CAS and silently lose one increment. Fix: include grant_records_version in the CAS WHERE clause, matching the transactional path's CAS at AbstractTransactionalPersistence.checkConditionsForWriteEntityInCurrentTxn.

Adds AtomicOperationMetaStoreManagerRefreshTest covering the read-side race reproducer.

Fixes apache#3836
Contributor
@singhpk234: is this ready for review yet? 😉
HonahX
reviewed
May 14, 2026
Contributor
Author
HonahX
approved these changes
May 14, 2026
dimas-b
reviewed
May 16, 2026
// grant_records_version between our lookupEntityVersions and lookupEntity reads. Use the
// entity's own value so (entity, grantRecordsVersion) stays internally consistent.
int reportedGrantRecordsVersion =
    entity != null ? entity.getGrantRecordsVersion() : entityVersions.grantRecordsVersion();
Contributor
Does the same problem affect TransactionalMetaStoreManagerImpl? If so, it's not a blocker for this fix, but may be worth fixing separately.
originalEntity.getCatalogId(),
"entity_version",
originalEntity.getEntityVersion(),
"grant_records_version",
Contributor
Why is a condition on entity_version not sufficient? It should be incremented on all updates and thus prevent forks in the change history already 🤔
About the change
Two independent races in AtomicOperationMetaStoreManager caused HTTP 500s from IllegalStateException: grants_version_going_backward under concurrent write workload on the JDBC backend:
Read side: refreshResolvedEntity called lookupEntityVersions at t0 and then conditionally lookupEntity at t1, returning the grants version from t0 paired with the entity read at t1. A concurrent persistNewGrantRecord could bump the entity row's grant_records_version between the two reads, so the returned entity carried a grant_records_version greater than the reported grantsVersion, tripping ResolvedPolarisEntity's invariant. Fix: read the entity unconditionally and use its own grant_records_version as the canonical reported value (same pattern as loadResolvedEntityById).
Write side: JdbcBasePersistenceImpl.persistEntity CAS-checked entity_version only, not grant_records_version, so two concurrent grant bumps on the same grantee could both pass CAS and silently lose one increment. Fix: add grant_records_version to the CAS WHERE clause, matching the transactional path's CAS at AbstractTransactionalPersistence.checkConditionsForWriteEntityInCurrentTxn. Also reorder persistNewGrantRecord and revokeGrantRecord to bump grantee and securable versions before writing/deleting the grant record, so a CAS failure surfaces cleanly without orphaned grant_records rows.
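The lost increment on the write side can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the Polaris JDBC code: `GrantCasSketch`, `Row`, `Table`, `compareAndSet`, and `replay` are hypothetical names, the single-row table stands in for the entity row, and the model assumes a grant bump changes grant_records_version while leaving entity_version untouched, as the description above implies.

```java
/** Hypothetical, simplified model of the write-side CAS; not the Polaris JDBC code. */
final class GrantCasSketch {

  /** Immutable snapshot of the two version columns on an entity row. */
  static final class Row {
    final int entityVersion;
    final int grantRecordsVersion;

    Row(int entityVersion, int grantRecordsVersion) {
      this.entityVersion = entityVersion;
      this.grantRecordsVersion = grantRecordsVersion;
    }
  }

  /** Single-row table whose update mimics "UPDATE ... WHERE <version columns match>". */
  static final class Table {
    private Row row = new Row(1, 5);

    synchronized Row read() {
      return row;
    }

    /** Returns true when the stored row matched the expected versions and was replaced. */
    synchronized boolean compareAndSet(Row expected, Row updated, boolean checkGrantsVersion) {
      boolean matches =
          row.entityVersion == expected.entityVersion
              && (!checkGrantsVersion
                  || row.grantRecordsVersion == expected.grantRecordsVersion);
      if (matches) {
        row = updated;
      }
      return matches;
    }
  }

  /** Assumption of this sketch: a grant bump leaves entityVersion unchanged. */
  static Row withGrantsBumped(Row r) {
    return new Row(r.entityVersion, r.grantRecordsVersion + 1);
  }

  /**
   * Replays one fixed interleaving: both writers read the row, then both write.
   * Returns {number of successful writes, final grant_records_version}.
   */
  static int[] replay(boolean checkGrantsVersion) {
    Table table = new Table();
    Row seenByA = table.read();
    Row seenByB = table.read();
    int successes = 0;
    if (table.compareAndSet(seenByA, withGrantsBumped(seenByA), checkGrantsVersion)) {
      successes++;
    }
    if (table.compareAndSet(seenByB, withGrantsBumped(seenByB), checkGrantsVersion)) {
      successes++;
    }
    return new int[] {successes, table.read().grantRecordsVersion};
  }

  public static void main(String[] args) {
    int[] weak = replay(false);
    int[] strong = replay(true);
    System.out.println("entity_version only: writes=" + weak[0] + " final=" + weak[1]);
    System.out.println("both versions:       writes=" + strong[0] + " final=" + strong[1]);
  }
}
```

In this model, checking entity_version alone lets both writers succeed while the counter advances only once. Checking both columns rejects the second writer, which can then re-read and retry.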
Adds AtomicOperationMetaStoreManagerRefreshTest covering the refresh race reproducer (assertion failure pre-fix with grantsVersion=5 vs entity=7).
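The shape of the refresh race can be sketched in isolation. The following is not the test added by this PR and does not use the Polaris API: `RefreshRaceSketch`, `Row`, `Resolved`, `Store`, `refreshStale`, and `refreshConsistent` are hypothetical names, and the concurrent grant is simulated deterministically by a callback that runs between the two reads.

```java
/** Hypothetical, simplified model of the refresh race; not the Polaris API. */
final class RefreshRaceSketch {

  /** Immutable snapshot of the two version counters on an entity row. */
  static final class Row {
    final int entityVersion;
    final int grantRecordsVersion;

    Row(int entityVersion, int grantRecordsVersion) {
      this.entityVersion = entityVersion;
      this.grantRecordsVersion = grantRecordsVersion;
    }
  }

  /** The (entity, reported grants version) pair handed back to the caller. */
  static final class Resolved {
    final Row entity;
    final int reportedGrantsVersion;

    Resolved(Row entity, int reportedGrantsVersion) {
      this.entity = entity;
      this.reportedGrantsVersion = reportedGrantsVersion;
    }

    /** Invariant as described above: the entity must not be ahead of the reported version. */
    boolean isConsistent() {
      return entity.grantRecordsVersion <= reportedGrantsVersion;
    }
  }

  /** Single-row in-memory store standing in for the entity table. */
  static final class Store {
    private Row row = new Row(1, 5);

    Row read() {
      return row;
    }

    void bumpGrantsVersion() {
      row = new Row(row.entityVersion, row.grantRecordsVersion + 1);
    }
  }

  /** Pre-fix shape: grants version from the first read, entity from the second. */
  static Resolved refreshStale(Store store, Runnable concurrentWrite) {
    int grantsVersionAtT0 = store.read().grantRecordsVersion; // first read (t0)
    concurrentWrite.run(); // simulated concurrent grant between the reads
    Row entityAtT1 = store.read(); // second read (t1)
    return new Resolved(entityAtT1, grantsVersionAtT0);
  }

  /** Post-fix shape: report the reloaded entity's own grants version. */
  static Resolved refreshConsistent(Store store, Runnable concurrentWrite) {
    int grantsVersionAtT0 = store.read().grantRecordsVersion;
    concurrentWrite.run();
    Row entityAtT1 = store.read();
    int reported = entityAtT1 != null ? entityAtT1.grantRecordsVersion : grantsVersionAtT0;
    return new Resolved(entityAtT1, reported);
  }

  public static void main(String[] args) {
    Store a = new Store();
    Resolved stale = refreshStale(a, () -> { a.bumpGrantsVersion(); a.bumpGrantsVersion(); });
    System.out.println("stale pair consistent: " + stale.isConsistent());

    Store b = new Store();
    Resolved fixed = refreshConsistent(b, () -> { b.bumpGrantsVersion(); b.bumpGrantsVersion(); });
    System.out.println("fixed pair consistent: " + fixed.isConsistent());
  }
}
```

With two simulated bumps between the reads, the stale variant pairs a reported version of 5 with an entity at 7, mirroring the mismatch quoted above, while the consistent variant reports 7 for both.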
Fixes #3836
Checklist
CHANGELOG.md (if needed)
site/content/in-dev/unreleased (if needed)