feat(sync-service): hibernate before suspend to enable GC #4284
❌ 5 Tests Failed:
View the top 1 failed test(s) by shortest run time
View the full list of 4 ❄️ flaky test(s)
Claude Code Review

Summary
Iteration 3 review. Since iteration 2 (2026-05-07), the only change is a test-file formatting cleanup (

What's Working Well (Unchanged Since Iteration 2)

Issues Found

Critical (Must Fix)
None.

Important (Should Fix)

1. (Re-raised from iter. 2) Activity in the last
File:
Status from iter. 2: still unaddressed.
Bug-zone size with current defaults (
Why the new "activity during hibernation cancels pending suspend" test (
Why it is still Important, not Critical: terminated consumers restart on the next request, so the cost is wasted work, not data loss. But the claim in
Suggested fix (same as iter. 2, restated for completeness): bump the generation on every activity. Either inline at each call site:
    state = %{state | suspend_generation: state.suspend_generation + 1}
Or via a helper used by every handler that resets
    defp invalidate_suspend_timer(%{suspend_generation: gen} = state) do
      %{state | suspend_generation: gen + 1}
    end
Plus a regression test that actually lands in the bug zone, e.g.:
    @tag hibernate_after: 50, shape_suspend_after: 100, with_pure_file_storage_opts: [flush_period: 1]
    @tag suspend: true
    test "activity in last hibernate_after ms of suspend window does not terminate", ctx do
      # ... setup, wait until t≈10ms hibernated ...
      Process.sleep(80) # now in last 50ms of 100ms suspend window
      ShapeLogCollector.handle_event(txn2, ctx.stack_id)
      Process.sleep(50) # past original suspend_timer at ~t=110ms
      assert Process.alive?(consumer_pid)
    end

2. (Re-raised from iter. 2)
File:
    def handle_info({:configure_suspend, hibernate_after, suspend_after, jitter_period}, state) do
      state = %{state | hibernate_after: hibernate_after, suspend_after: suspend_after}
      {:noreply, state, Enum.random(hibernate_after..jitter_period)}
    end
Unchanged since iter. 2. If suspend was previously enabled with a pending gen=N timer, then

3. (New) 837-line implementation plan markdown checked into
File:
This file is the agent-authored task-by-task implementation plan ("REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development..."). It documents the planned changes step-by-step, with commit messages and shell snippets. It is clearly a working document, not project documentation. It does not belong in the source tree:
Suggested fix: remove

Suggestions (Nice to Have)

1. (Re-raised from iter. 2) Add an assertion that proves the cancellation fired
File:
Once Important #1 is fixed, consider adding an assertion that

2. (New, minor) Test comments at
The original suspend timer is scheduled when

Issue Conformance
No linked issue (carry-over). PR description and the changeset both claim "Any activity cancels the pending suspend timer, restarting the cycle." Once Important #1 is fixed, that claim will be accurate. Today it is a partial truth: it is true for activity arriving in the first

Previous Review Status
Net new commits since iter. 2:
Review iteration: 3 | 2026-05-18
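
The fix suggested under Important #1 (bump the generation on every activity) can be illustrated in isolation. The following is a minimal sketch, not the PR's code: the module name, `suspend_message/1`, and `on_suspend_message/2` are hypothetical, and the `{:suspend, gen}` message shape is an assumption. Only the `suspend_generation` field and the `invalidate_suspend_timer/1` helper come from the review text.

```elixir
defmodule SuspendGenerationSketch do
  # Hypothetical module. State is a plain map with the :suspend_generation
  # field named in the review.

  # Bump the generation on every activity, so that any timer message
  # scheduled earlier carries an outdated generation.
  def invalidate_suspend_timer(%{suspend_generation: gen} = state) do
    %{state | suspend_generation: gen + 1}
  end

  # Assumed message shape: the timer message is tagged with the generation
  # that was current when the timer was scheduled.
  def suspend_message(%{suspend_generation: gen}), do: {:suspend, gen}

  # A message whose generation matches the current state is still valid.
  def on_suspend_message({:suspend, gen}, %{suspend_generation: gen}), do: :stop

  # Any other generation is stale and must not terminate the process.
  def on_suspend_message({:suspend, _stale}, _state), do: :ignore
end

state = %{suspend_generation: 0}
msg = SuspendGenerationSketch.suspend_message(state)

# No activity since scheduling: the timer message is honoured.
IO.inspect(SuspendGenerationSketch.on_suspend_message(msg, state))
# => :stop

# Activity after scheduling bumps the generation; the old message is ignored.
bumped = SuspendGenerationSketch.invalidate_suspend_timer(state)
IO.inspect(SuspendGenerationSketch.on_suspend_message(msg, bumped))
# => :ignore
```

The point of the sketch is that the decision depends only on whether every activity path calls the bump; a handler that resets the idle timeout without bumping leaves the old message valid.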
3d7ade0 to 6029612
Add new config option to control the delay between hibernation and suspension. Default is 60 seconds. This prepares for hibernate-then-suspend behavior where consumers first hibernate (to trigger GC) before suspending. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…tate Add state fields to track the scheduled suspend timer and the configured delay between hibernation and suspension. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When suspend is enabled, consumers now:
1. Hibernate first on timeout (triggering GC)
2. Schedule a suspend timer for suspend_after ms later
3. Suspend (terminate) when the suspend timer fires
Any activity cancels the pending suspend timer, restarting the cycle. This ensures GC runs before eventual process termination. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
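
The cycle described in this commit message can be sketched as a small standalone GenServer. This is an illustration under stated assumptions, not the consumer implementation: the module, `touch/1` (standing in for "any activity"), the `:normal` stop reason, and the use of a fresh reference to tag each timer are all hypothetical choices. The review above indicates the PR itself tracks a `suspend_generation` counter instead.

```elixir
defmodule HibernateThenSuspendSketch do
  use GenServer

  # opts: :hibernate_after and :suspend_after, both in milliseconds.
  def start_link(opts), do: GenServer.start_link(__MODULE__, Map.new(opts))

  # Hypothetical stand-in for any activity reaching the process.
  def touch(pid), do: GenServer.call(pid, :touch)

  @impl true
  def init(opts) do
    state = Map.merge(%{timer: nil, tag: nil}, opts)
    # The idle timeout starts the cycle.
    {:ok, state, state.hibernate_after}
  end

  @impl true
  def handle_call(:touch, _from, state) do
    # Activity cancels the pending suspend and restarts the idle window.
    {:reply, :ok, cancel_suspend(state), state.hibernate_after}
  end

  @impl true
  def handle_info(:timeout, state) do
    # Idle for hibernate_after ms: schedule the suspend, then hibernate
    # (hibernation triggers a garbage collection of the process).
    tag = make_ref()
    timer = Process.send_after(self(), {:suspend, tag}, state.suspend_after)
    {:noreply, %{state | timer: timer, tag: tag}, :hibernate}
  end

  def handle_info({:suspend, tag}, %{tag: tag} = state) do
    # Still idle when the suspend timer fired: terminate.
    {:stop, :normal, state}
  end

  def handle_info({:suspend, _stale}, state) do
    # The message was already in the mailbox when activity cancelled it.
    {:noreply, state, state.hibernate_after}
  end

  defp cancel_suspend(%{timer: nil} = state), do: state

  defp cancel_suspend(%{timer: timer} = state) do
    Process.cancel_timer(timer)
    %{state | timer: nil, tag: nil}
  end
end

{:ok, pid} = HibernateThenSuspendSketch.start_link(hibernate_after: 50, suspend_after: 200)
Process.sleep(100)                           # hibernated, suspend pending
:ok = HibernateThenSuspendSketch.touch(pid)  # activity restarts the cycle
```

Cancelling the timer and clearing the tag in the same place means a late `{:suspend, tag}` message can never match the current state, whichever activity path ran.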
Change enable_suspend/3 to enable_suspend/4, adding the suspend_after parameter to configure the delay between hibernation and suspension. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add tests verifying:
- Consumer hibernates first, then suspends after suspend_after ms
- Activity during hibernation cancels the pending suspend timer
- Update enable_suspend test for new 4-arity function
- Update existing suspend test to use explicit suspend_after tag
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Clarify enable_suspend/4 docstring: jitter controls hibernation timing, suspension happens suspend_after ms after hibernation
- Update :configure_suspend handler comment to describe hibernate-then-suspend
- Add defensive guard for suspend_after: nil in schedule_suspend_timer/1
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…n test With hibernate-then-suspend, consumers hibernate first then wait suspend_after before terminating. Set ELECTRIC_SHAPE_SUSPEND_AFTER=200ms alongside ELECTRIC_SHAPE_HIBERNATE_AFTER=200ms so the test completes within the expected timeframe. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
6029612 to fe0cbe4
Summary
When consumer suspend is enabled, consumers now hibernate first (triggering GC) before eventually suspending (terminating). Previously, consumers went directly to suspend without hibernating, and due to a large timeout for suspend, consumer processes could hold onto garbage memory for longer than necessary.
- shape_suspend_after config (default 60s) - delay between hibernation and suspension
- hibernate_after timeout, schedules suspend timer
- suspend_after ms, terminates process if still idle
- ConsumerRegistry.enable_suspend/4 to include suspend_after parameter

Test Plan
🤖 Generated with Claude Code