Skip to content

Removes in memory set from dead compaction detector#6283

Open
keith-turner wants to merge 8 commits intoapache:mainfrom
keith-turner:tmp-file-cleanup
Open

Removes in memory set from dead compaction detector#6283
keith-turner wants to merge 8 commits intoapache:mainfrom
keith-turner:tmp-file-cleanup

Conversation

@keith-turner
Copy link
Copy Markdown
Contributor

Removed an in memory set of tables ids in the dead compaction detectors that contained table ids that may have compaction tmp files that needed cleanup. This set would be hard to maintain in multiple managers. Also the set could lose track of tables if the process died.

Replaced the in memory set with a set in the metadata table. This set is directly populated by the split and merge fate operations, so there is no chance of losing track of things when a process dies. Also this set is more narrow and allows looking for tmp files to cleanup in single tablets dirs rather than scanning an entire tables dir.

Also made a change to the order in which tmp files are deleted for failed compactions. They used to be deleted after the metadata for the compaction was cleaned up, this could lead to losing track of the cleanup if the process died after deleting the metadata but before deleting the tmp file. Now the tmp files are deleted before the metadata entry, so should no longer lose track in process death.

This change is needed by #6217

Removed an in memory set of tables ids in the dead compaction detectors
that contained table ids that may have compaction tmp files that needed
cleanup. This set would be hard to maintain in multiple managers. Also
the set could lose track of tables if the process died.

Replaced the in memory set with a set in the metadata table. This set is
directly populated by the split and merge fate operations, so there is
no chance of losing track of things when a process dies.  Also this set
is more narrow and allows looking for tmp files to cleanup in single
tablets dirs rather than scanning an entire tables dir.

Also made a change to the order in which tmp files are deleted for
failed compactions.  They used to be deleted after the metadata for the
compaction was cleaned up, this could lead to losing track of the
cleanup if the process died after deleting the metadata but before
deleting the tmp file.  Now the tmp files are deleted before the
metadata entry, so should no longer lose track in process death.

This change is needed by apache#6217
@keith-turner keith-turner added this to the 4.0.0 milestone Mar 31, 2026
final TabletMetadata tm = ctx.getAmple().readTablet(extent, ColumnType.DIR);
if (tm != null) {
final Collection<Volume> vols = ctx.getVolumeManager().getVolumes();
for (Volume vol : vols) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was moved to FindCompactionTmpFiles so it could be called here and in the dead compaction detector.

Copy link
Copy Markdown
Contributor

@ddanielr ddanielr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like the functionality move into the metadata table.
I am curious how long those row entries become with the full UUID and tablet dir.

FindCompactionTmpFiles seems like it could use some refactoring but that's not technically impacting the work this PR is trying to accomplish.

}

// Finds any tmp files matching the given compaction ids in table dir and deletes them.
public static void deleteTmpFiles(ServerContext ctx, TableId tableId, String dirName,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of this code seems like it has overlap with findTempFiles

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some code I copied mostly as is from CompactionCoordinator to this class for deleting a file. Did not look at the existing code when I copied it in, will take a look at that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a change in 4993169 to consolidate the delete code. Still needed a separate function to find tmp files.

keith-turner and others added 5 commits April 7, 2026 10:43
…RemovedCompactionStoreImpl.java

Co-authored-by: Daniel Roberts <ddanielr@gmail.com>
…ction/coordinator/DeadCompactionDetector.java

Co-authored-by: Daniel Roberts <ddanielr@gmail.com>
interface RemovedCompactionStore {
Stream<RemovedCompaction> list();

void add(Collection<RemovedCompaction> removedCompactions);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a few minutes to understand what a RemovedCompaction represented. My understanding of these changes are that when a tablet is merged or split, the ECID entries in the tablet metadata for the tablets involved are removed. However, the compaction is likely running on a Compactor. Is that right?

I was having trouble understanding the context given the name. I wonder if a different name might better reflect the situation. The compaction itself is not removed, it's OBE. It's been orphaned from it's parent tablet or it's like a dangling ref in a database. Would OrphanedCompaction be a better name?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OrphanedCompaction is a much better name, will change to that. Used removed in the name because it was removed from the metadata table.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that right?

Yes that is all correct.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed in db57334

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants