test(internal): add lifecycle stress / fuzzing harness for PeriodicThread #18183

Draft
r1viollet wants to merge 1 commit into main from r1viollet/periodic-thread-stress

@r1viollet r1viollet commented May 20, 2026

Description

Adds tests/internal/test_periodic_stress.py — a randomized lifecycle
fuzzer for PeriodicThread that interleaves
create / start / stop / join / awake / drop-last-ref / fork / gc /
thread-churn across a pool of PeriodicThread and PeriodicService
objects.
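
One way to picture the interleaving described above is a seeded loop that picks an operation and a pool slot at random. The sketch below is not the code in tests/internal/test_periodic_stress.py: FakePeriodic, OPS, and run_fuzz are hypothetical names, the stand-in is built on threading.Thread rather than the native PeriodicThread, and fork / gc / thread-churn are left out for brevity.

```python
import random
import threading


class FakePeriodic:
    """Hypothetical stand-in for a periodic worker (NOT ddtrace's PeriodicThread)."""

    def __init__(self):
        self._stopped = threading.Event()
        self._wake = threading.Event()
        self._thread = None

    def start(self):
        # Ignore repeated starts and starts after stop.
        if self._thread is None and not self._stopped.is_set():
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()

    def _run(self):
        while not self._stopped.is_set():
            self._wake.wait(timeout=0.01)
            self._wake.clear()

    def awake(self):
        self._wake.set()

    def stop(self):
        self._stopped.set()
        self._wake.set()

    def join(self):
        # Bounded join so the fuzzer itself can never hang.
        if self._thread is not None:
            self._thread.join(timeout=0.05)


OPS = ("create", "start", "stop", "join", "awake", "drop")


def run_fuzz(seed, iters, pool_size=4):
    """Apply `iters` random lifecycle operations; return the operation trace."""
    rng = random.Random(seed)  # same seed => same sequence of operations
    pool, trace = [], []
    for _ in range(iters):
        op = rng.choice(OPS)
        trace.append(op)
        if op == "create":
            if len(pool) < pool_size:
                pool.append(FakePeriodic())
            continue
        if not pool:
            continue
        idx = rng.randrange(len(pool))
        if op == "drop":
            # Drop the pool's reference; stop first so the stand-in thread exits.
            pool.pop(idx).stop()
        else:
            getattr(pool[idx], op)()
    for obj in pool:  # final cleanup
        obj.stop()
        obj.join()
    return trace
```

Because every random choice flows through a single random.Random(seed), rerunning with the same seed replays the same operation sequence, which is the property that makes a printed seed useful for reproduction.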

This is the harness that surfaced #17707 and #18040
(PeriodicThread.awake() blocking forever after stop()).

What's in the file

  • test_periodic_thread_lifecycle_stress: the randomized fuzzer.
    The default budget is kept short for CI (DD_STRESS_ITERS=200, ~10s).
    Env vars for longer soaks are documented (DD_STRESS_ITERS,
    DD_STRESS_SECONDS, DD_STRESS_SEED, DD_STRESS_POOL,
    DD_STRESS_FORK_EVERY, DD_STRESS_TRACE_FILE). The seed is printed on
    every run, so a failing run can be reproduced with DD_STRESS_SEED=N.
  • test_periodic_thread_concurrent_dealloc_race: focused regression
    for #17485 ("fix: periodic thread start race"), a refcount TOCTOU
    between std::thread creation and the lambda acquiring the GIL.
  • test_periodic_thread_stop_without_join_then_fork_repeat: focused
    regression for #16955 ("fix: periodic thread after fork cleanup"),
    pthread_t recycling after stop-without-join then fork.
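
The env-var handling listed above could look roughly like the following. This is a minimal sketch under stated assumptions, not the module's actual code: load_stress_config and budget_exhausted are hypothetical helpers, only three of the documented variables are handled, and the rule that a positive DD_STRESS_SECONDS takes precedence over DD_STRESS_ITERS is an assumption. The default of 200 iterations and the seed being printed to stderr come from the description.

```python
import os
import random
import sys
import time


def _env_int(name, default, environ):
    raw = environ.get(name)
    return int(raw) if raw not in (None, "") else default


def load_stress_config(environ=None):
    """Read the stress budget and seed from the environment (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    seed = _env_int("DD_STRESS_SEED", None, environ)
    if seed is None:
        # No pinned seed: pick a fresh one so each run explores a new sequence.
        seed = random.SystemRandom().randrange(2**32)
    config = {
        "seed": seed,
        "iters": _env_int("DD_STRESS_ITERS", 200, environ),
        # Assumption: 0 means "no wall-clock budget".
        "seconds": _env_int("DD_STRESS_SECONDS", 0, environ),
    }
    # Always print the seed so a failing run can be replayed.
    print("stress seed: %d (reproduce with DD_STRESS_SEED=%d)" % (seed, seed),
          file=sys.stderr)
    return config


def budget_exhausted(config, done_iters, started_at, now=None):
    """Assumption: a positive wall-clock budget overrides the iteration budget."""
    now = time.monotonic() if now is None else now
    if config["seconds"] > 0:
        return now - started_at >= config["seconds"]
    return done_iters >= config["iters"]
```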

How to run it when modifying the periodic thread framework

The module docstring documents the exact commands. tl;dr:

# ~30s soak
DD_STRESS_ITERS=10000 scripts/run-tests -- -- tests/internal/test_periodic_stress.py

# 120s wall-clock soak (recommended under ASan/TSan)
DD_STRESS_SECONDS=120 scripts/run-tests -- -- tests/internal/test_periodic_stress.py

# Reproduce a failing seed (the seed is printed to stderr by every run)
DD_STRESS_SEED=12345 scripts/run-tests -- -- tests/internal/test_periodic_stress.py

Testing

All three tests pass locally on Python 3.13 in the testrunner image:

tests/internal/test_periodic_stress.py::test_periodic_thread_lifecycle_stress PASSED
tests/internal/test_periodic_stress.py::test_periodic_thread_concurrent_dealloc_race PASSED
tests/internal/test_periodic_stress.py::test_periodic_thread_stop_without_join_then_fork_repeat PASSED

Risks

None — test-only addition. Existing test discovery and CI suites are
unchanged.

changelog/no-changelog — test-only.

🤖 Generated with Claude Code

Adds tests/internal/test_periodic_stress.py — a randomized lifecycle
fuzzer that interleaves create / start / stop / join / awake /
drop-last-ref / fork / gc / thread-churn across a pool of PeriodicThread
and PeriodicService objects. The assertion surface is "no crash": any
child killed by a signal or unexpected exception escaping the public
API fails the test.
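
A "no crash" check of this kind can be sketched with os.fork and os.waitpid. The helper below is illustrative only: run_in_child is a hypothetical name, it assumes a POSIX platform, and it is not claimed to match how the harness inspects its children.

```python
import os
import signal


def run_in_child(fn):
    """Run fn in a forked child and report (ok, reason). Hypothetical helper."""
    pid = os.fork()
    if pid == 0:
        # Child: never return into the parent's test runner.
        code = 0
        try:
            fn()
        except BaseException:
            code = 1  # unexpected exception escaped fn
        os._exit(code)
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        # A native crash (SIGSEGV, SIGABRT, ...) shows up here.
        return False, "killed by signal %d" % os.WTERMSIG(status)
    exit_code = os.WEXITSTATUS(status)
    if exit_code == 0:
        return True, "clean exit"
    return False, "exit status %d" % exit_code
```

Running the body in a child keeps a native crash from taking down the test runner, and the wait status distinguishes a signal death from an ordinary non-zero exit.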

Two focused regression harnesses are bundled in the same file:
- test_periodic_thread_concurrent_dealloc_race (regression test for PR 17485)
- test_periodic_thread_stop_without_join_then_fork_repeat (regression test for PR 16955)

Default budget (DD_STRESS_ITERS=200, ~4 forks) is short enough for CI
(~10s). The module docstring documents how to run a longer soak when
modifying ddtrace/internal/_threads.cpp or the periodic lifecycle,
along with DD_STRESS_SEED for reproducing a failing run.

This is the harness that surfaced PRs 17707 and 18040
(PeriodicThread.awake() blocking forever after stop()).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@r1viollet r1viollet added the changelog/no-changelog label (A changelog entry is not required for this PR.) May 20, 2026

datadog-prod-us1-3 Bot commented May 20, 2026

⚠️ Warnings

🚦 2 Pipeline jobs failed

Changelog | Validate changelog (GitHub Actions)

🛟 This job is unlikely to succeed on retry. Please review your pipeline configuration. Release note not found during changelog validation. Use 'reno new <slug>' to add a new note to 'releasenotes/notes'.

DataDog/apm-reliability/dd-trace-py | download win_arm64 wheels (GitLab)

🛟 This job is unlikely to succeed on retry. Please review your pipeline configuration. RUN_ID not found. Check if the GitHub build jobs were successfully triggered on your PR.

ℹ️ Info

No other issues found

🧪 All tests passed
❄️ No new flaky tests detected

🚧 81 tests that failed were ignored due to quarantine

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: f853f3b
