fix(openfeature): avoid startup worker timeout #18161
Draft
leoromanovsky wants to merge 1 commit into
All tests passed. Commit SHA: c9abfbb
Motivation
The FEATURE_FLAGGING_AND_EXPERIMENTATION system-tests job showed gunicorn workers timing out at 30s during startup, with stderr also reporting:
OpenTelemetry configuration OTEL_METRIC_EXPORT_INTERVAL is not supported by Datadog.
The OTEL metric reader interval is actually supported and consumed by ddtrace, and the OpenFeature provider's default initialization wait lined up exactly with common 30s web-server worker timeouts.

Changes
Treat OTEL_METRIC_EXPORT_INTERVAL and OTEL_METRIC_EXPORT_TIMEOUT as supported OTEL startup configurations.
Lower the DataDogProvider initialization timeout from 30s to 25s so OpenFeature can mark the provider not ready and recover later instead of letting a common worker timeout kill startup.

Decisions
Use DD_EXPERIMENTAL_FLAGGING_PROVIDER_INITIALIZATION_TIMEOUT_MS for deployments that need a different startup budget.

Testing
scripts/ddtest riot -v run --pass-env -s 120e7d0 -- tests/opentelemetry/test_config.py::test_otel_metric_reader_configuration_supported
scripts/ddtest riot -v run --pass-env -s 18421e5 -- -k test_initialize_timeout_raises
scripts/ddtest riot -v run --pass-env -s 18421e5 -- -k test_late_recovery_after_timeout
git diff --check
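
Illustration
The reasoning behind the 25s default can be sketched roughly as follows. This is a minimal, self-contained illustration and not the ddtrace implementation: SketchProvider, read_timeout_s, and on_configuration are hypothetical names, and the way the environment variable is parsed here (milliseconds, falling back to 25000) is an assumption based only on its name. The point is that an initialization wait shorter than the worker timeout lets startup fail fast with an exception, and the provider can still become ready when configuration arrives later.

```python
import os
import threading

# Hypothetical constants for illustration only.
DEFAULT_INIT_TIMEOUT_S = 25.0  # new default described in this PR
WORKER_TIMEOUT_S = 30.0        # gunicorn's default worker timeout


def read_timeout_s(environ=os.environ):
    """Assumed parsing: the override is expressed in milliseconds."""
    raw = environ.get("DD_EXPERIMENTAL_FLAGGING_PROVIDER_INITIALIZATION_TIMEOUT_MS")
    return int(raw) / 1000.0 if raw else DEFAULT_INIT_TIMEOUT_S


class SketchProvider:
    """Hypothetical stand-in for a provider that waits for flag configuration."""

    def __init__(self, init_timeout_s=None):
        self.init_timeout_s = read_timeout_s() if init_timeout_s is None else init_timeout_s
        self._config_received = threading.Event()

    @property
    def ready(self):
        return self._config_received.is_set()

    def initialize(self):
        # Wait for configuration, but give up before the worker deadline.
        if not self._config_received.wait(self.init_timeout_s):
            raise TimeoutError("provider not ready within %.3fs" % self.init_timeout_s)

    def on_configuration(self):
        # Configuration may arrive after initialize() timed out (late recovery).
        self._config_received.set()
```

With the default budget, initialize() would return or raise about five seconds before a 30s worker timeout fires, which leaves the caller room to mark the provider not ready and continue serving defaults until on_configuration() is called.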