fix: prevent grants_version_going_backward under concurrent grants #4358

Open
singhpk234 wants to merge 1 commit into apache:main from singhpk234:fix/grants-version-going-backward

@singhpk234
Contributor

About the change

Two independent races in AtomicOperationMetaStoreManager caused HTTP 500s from IllegalStateException: grants_version_going_backward under concurrent write workload on the JDBC backend:

  • Read side: refreshResolvedEntity called lookupEntityVersions at t0 and then conditionally lookupEntity at t1, returning the grants version from t0 paired with the entity read at t1. A concurrent persistNewGrantRecord could bump the entity row's grant_records_version between the two reads, so the returned entity carried a grant_records_version greater than the reported grantsVersion, tripping ResolvedPolarisEntity's invariant. Fix: read the entity unconditionally and use its own grant_records_version as the canonical reported value (same pattern as loadResolvedEntityById).

  • Write side: JdbcBasePersistenceImpl.persistEntity CAS-checked entity_version only, not grant_records_version, so two concurrent grant bumps on the same grantee could both pass CAS and silently lose one increment. Fix: add grant_records_version to the CAS WHERE clause, matching the transactional path's CAS at AbstractTransactionalPersistence.checkConditionsForWriteEntityInCurrentTxn. Also reorder persistNewGrantRecord and revokeGrantRecord to bump grantee and securable versions before writing/deleting the grant record, so a CAS failure surfaces cleanly without orphaned grant_records rows.
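
The read-side race described in the first bullet can be reproduced deterministically without threads. The following is a minimal sketch under stated assumptions, not the Polaris implementation: `Entity`, `Versions`, `Resolved`, `Store`, and the `refresh` signature are hypothetical stand-ins, the invariant is reduced to "the entity's grant_records_version must not exceed the reported grants version", and a grant bump is assumed to change only grant_records_version. A `Runnable` plays the concurrent writer that runs between the two reads.

```java
// Simplified, hypothetical stand-ins; none of these are the real Polaris classes.
final class RefreshRaceSketch {
  record Entity(int entityVersion, int grantRecordsVersion) {}

  record Versions(int entityVersion, int grantRecordsVersion) {}

  // Assumed invariant: the entity must not carry a newer grants version than reported.
  record Resolved(Entity entity, int grantsVersion) {
    Resolved {
      if (entity != null && entity.grantRecordsVersion() > grantsVersion) {
        throw new IllegalStateException("grants_version_going_backward");
      }
    }
  }

  static final class Store {
    private Entity row = new Entity(2, 5);

    Versions lookupEntityVersions() {
      return new Versions(row.entityVersion(), row.grantRecordsVersion());
    }

    Entity lookupEntity() {
      return row;
    }

    // Assumption: a grant bump changes only grant_records_version.
    void bumpGrants() {
      row = new Entity(row.entityVersion(), row.grantRecordsVersion() + 1);
    }
  }

  static Resolved refresh(
      Store store, int knownEntityVersion, Runnable concurrentWriter, boolean useEntityValue) {
    Versions versions = store.lookupEntityVersions(); // read at t0
    concurrentWriter.run(); // a writer slips in between the two reads
    Entity entity =
        versions.entityVersion() != knownEntityVersion ? store.lookupEntity() : null; // read at t1
    // Pre-fix behaviour reports the t0 value; the fix prefers the reloaded entity's own value.
    int reported =
        (useEntityValue && entity != null)
            ? entity.grantRecordsVersion()
            : versions.grantRecordsVersion();
    return new Resolved(entity, reported);
  }

  public static void main(String[] args) {
    Store fixedStore = new Store();
    Runnable twoGrants = () -> { fixedStore.bumpGrants(); fixedStore.bumpGrants(); };
    System.out.println("with fix: " + refresh(fixedStore, 1, twoGrants, true));

    Store racyStore = new Store();
    try {
      refresh(racyStore, 1, () -> { racyStore.bumpGrants(); racyStore.bumpGrants(); }, false);
    } catch (IllegalStateException e) {
      System.out.println("pre-fix: " + e.getMessage());
    }
  }
}
```

With two grant bumps landing between the reads, the pre-fix path pairs a reported version of 5 with an entity at 7 and the invariant check throws, while the fixed path reports 7 for both.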

Adds AtomicOperationMetaStoreManagerRefreshTest covering the refresh race reproducer (assertion failure pre-fix with grantsVersion=5 vs entity=7).

Fixes #3836

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

@github-project-automation (bot) moved this to PRs In Progress in Basic Kanban Board on May 3, 2026
@sfc-gh-prsingh force-pushed the fix/grants-version-going-backward branch 2 times, most recently from 10a3892 to 8eae75f, on May 3, 2026 20:22

Two independent issues in AtomicOperationMetaStoreManager / JDBC persistence
could surface `IllegalStateException: grants_version_going_backward` and
silently lose grant-version increments on the JDBC backend under a concurrent
write workload (reproducible via polaris-tools' create-dataset-simulation
benchmark, apache#3836):

* Read side: refreshResolvedEntity called lookupEntityVersions at t0 and
  conditionally lookupEntity at t1, then returned the entity from t1 paired
  with the grants version from t0. A concurrent persistNewGrantRecord that
  bumped the entity row's grant_records_version between the two reads
  produced a result where entity.getGrantRecordsVersion() exceeded the
  reported grantsVersion, tripping ResolvedPolarisEntity's invariant. Fix:
  when the entity is reloaded, use its own grant_records_version as the
  reported value so the returned pair is internally consistent.

* Write side: JdbcBasePersistenceImpl.persistEntity CAS-checked
  entity_version only, not grant_records_version, so two concurrent grant
  bumps on the same grantee could both pass CAS and silently lose one
  increment. Fix: include grant_records_version in the CAS WHERE clause,
  matching the transactional path's CAS at
  AbstractTransactionalPersistence.checkConditionsForWriteEntityInCurrentTxn.
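
The lost increment on the write side can be illustrated with an in-memory compare-and-swap. This is a hedged sketch only: `Row`, `Table`, `Outcome`, and `runTwoWriters` are hypothetical names, the `update` method merely mimics an `UPDATE ... WHERE` condition rather than issuing SQL, and it assumes (as the description implies) that a grant bump leaves entity_version unchanged.

```java
// Hypothetical in-memory model of the CAS; not JdbcBasePersistenceImpl itself.
final class GrantsVersionCasSketch {
  record Row(int entityVersion, int grantRecordsVersion) {}

  record Outcome(boolean firstApplied, boolean secondApplied, int finalGrantsVersion) {}

  static final class Table {
    private Row row = new Row(1, 5);

    Row read() {
      return row;
    }

    // Mimics: UPDATE ... WHERE entity_version = ? [AND grant_records_version = ?]
    // Returns true when the condition matched and the row was replaced.
    boolean update(Row expected, Row updated, boolean checkGrantsVersion) {
      boolean matches =
          row.entityVersion() == expected.entityVersion()
              && (!checkGrantsVersion
                  || row.grantRecordsVersion() == expected.grantRecordsVersion());
      if (matches) {
        row = updated;
      }
      return matches;
    }
  }

  // Assumption: a grant bump increments grant_records_version only.
  static Row bumpGrants(Row seen) {
    return new Row(seen.entityVersion(), seen.grantRecordsVersion() + 1);
  }

  // Both writers take their snapshot before either one writes.
  static Outcome runTwoWriters(boolean checkGrantsVersion) {
    Table table = new Table();
    Row seenByA = table.read();
    Row seenByB = table.read();
    boolean a = table.update(seenByA, bumpGrants(seenByA), checkGrantsVersion);
    boolean b = table.update(seenByB, bumpGrants(seenByB), checkGrantsVersion);
    return new Outcome(a, b, table.read().grantRecordsVersion());
  }

  public static void main(String[] args) {
    System.out.println("entity_version only: " + runTwoWriters(false));
    System.out.println("with grant_records_version: " + runTwoWriters(true));
  }
}
```

When only entity_version is compared, both writers report success yet the final grant_records_version is 6 rather than 7. With both columns in the condition, the second writer's update is rejected, so the conflict becomes visible to the caller instead of being silently dropped.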

Adds AtomicOperationMetaStoreManagerRefreshTest covering the read-side race
reproducer.

Fixes apache#3836

@sfc-gh-prsingh force-pushed the fix/grants-version-going-backward branch from 8eae75f to 10f8cbe on May 3, 2026 20:30
@dimas-b
Contributor

dimas-b commented May 11, 2026

@singhpk234 : is this ready for review yet? 😉

Contributor

@HonahX HonahX left a comment

Looking forward to when it's ready for review! The fix looks good! Thanks!

@singhpk234 singhpk234 marked this pull request as ready for review May 14, 2026 23:35
@singhpk234
Contributor Author

Apologies @dimas-b @HonahX, I just forgot to mark it ready for review :) Please have a look when you get some time!

@github-project-automation (bot) moved this from PRs In Progress to Ready to merge in Basic Kanban Board on May 14, 2026

// grant_records_version between our lookupEntityVersions and lookupEntity reads. Use the
// entity's own value so (entity, grantRecordsVersion) stays internally consistent.
int reportedGrantRecordsVersion =
    entity != null ? entity.getGrantRecordsVersion() : entityVersions.grantRecordsVersion();

Contributor

Does the same problem affect TransactionalMetaStoreManagerImpl? If so, it's not a blocker for this fix, but may be worth fixing separately.

originalEntity.getCatalogId(),
"entity_version",
originalEntity.getEntityVersion(),
"grant_records_version",
Contributor

Why is a condition on entity_version not sufficient? It should be incremented on all updates and thus already prevent forks in the change history 🤔

Development

Successfully merging this pull request may close these issues.

Grants version going backward during write workload

3 participants