Skip to content

Fix SIGSEGV in fsm_extend when vacuuming tables in non-default tablespace#1666

Open
avamingli wants to merge 1 commit intoapache:mainfrom
avamingli:fix_vacuum_crash
Open

Fix SIGSEGV in fsm_extend when vacuuming tables in non-default tablespace#1666
avamingli wants to merge 1 commit intoapache:mainfrom
avamingli:fix_vacuum_crash

Conversation

@avamingli
Copy link
Copy Markdown
Contributor

Fix issue #1665

The commit df1e2ff ("Prevent CREATE TABLE from using dangling tablespace") added a call to TablespaceLockTuple() inside TablespaceCreateDbspace() for every non-default tablespace, including the common path where the per-database directory already exists. That lock acquisition calls LockSharedObject(), which calls AcceptInvalidationMessages(), allowing pending sinval messages to be processed at an unexpected point deep inside smgrcreate().

This creates a crash window during VACUUM of a heap table (including auxiliary tables such as the aoseg table of an AO relation) that lives in a non-default tablespace and has never been vacuumed before (so no FSM or VM fork exists yet):

lazy_scan_heap
-> visibilitymap_pin -> vm_readbuf -> vm_extend
smgrcreate(VM fork) + CacheInvalidateSmgr() <- queues SHAREDINVALSMGR_ID
-> RecordPageWithFreeSpace -> fsm_readbuf -> fsm_extend
smgrcreate(FSM fork) -> mdcreate
-> TablespaceCreateDbspace [CBDB-specific for non-default tablespaces]
-> TablespaceLockTuple -> LockSharedObject
-> AcceptInvalidationMessages()
-> smgrclosenode() -> rel->rd_smgr = NULL
rel->rd_smgr->smgr_cached_nblocks[FSM_FORKNUM] = ...
-> SIGSEGV (NULL dereference) at freespace.c:637

Fix: add RelationOpenSmgr(rel) after smgrcreate() in both fsm_extend() (freespace.c) and vm_extend() (visibilitymap.c). RelationOpenSmgr is a no-op when rd_smgr is still valid, and re-opens the smgr handle if the sinval handler has closed it. The identical guard already exists earlier in both functions for the LockRelationForExtension path.

Add a regression test (vacuum_fsm_nondefault_tablespace) covering both a plain heap table and an AO table in a non-default tablespace.

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


…pace

The commit df1e2ff ("Prevent CREATE TABLE from using dangling tablespace")
added a call to TablespaceLockTuple() inside TablespaceCreateDbspace() for
every non-default tablespace, including the common path where the per-database
directory already exists.  That lock acquisition calls LockSharedObject(),
which calls AcceptInvalidationMessages(), allowing pending sinval messages to
be processed at an unexpected point deep inside smgrcreate().

This creates a crash window during VACUUM of a heap table (including auxiliary
tables such as the aoseg table of an AO relation) that lives in a non-default
tablespace and has never been vacuumed before (so no FSM or VM fork exists yet):

  lazy_scan_heap
  -> visibilitymap_pin -> vm_readbuf -> vm_extend
     smgrcreate(VM fork) + CacheInvalidateSmgr()   <- queues SHAREDINVALSMGR_ID
  -> RecordPageWithFreeSpace -> fsm_readbuf -> fsm_extend
     smgrcreate(FSM fork) -> mdcreate
     -> TablespaceCreateDbspace [CBDB-specific for non-default tablespaces]
        -> TablespaceLockTuple -> LockSharedObject
           -> AcceptInvalidationMessages()
              -> smgrclosenode() -> rel->rd_smgr = NULL
     rel->rd_smgr->smgr_cached_nblocks[FSM_FORKNUM] = ...
     -> SIGSEGV (NULL dereference) at freespace.c:637

Fix: add RelationOpenSmgr(rel) after smgrcreate() in both fsm_extend()
(freespace.c) and vm_extend() (visibilitymap.c).  RelationOpenSmgr is a
no-op when rd_smgr is still valid, and re-opens the smgr handle if the
sinval handler has closed it.  The identical guard already exists earlier
in both functions for the LockRelationForExtension path.

Add a regression test (vacuum_fsm_nondefault_tablespace) covering both a
plain heap table and an AO table in a non-default tablespace.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant