feat(webhook): proxy-aware error handling for forward proxies#899
Conversation
Adds a Features page covering the existing DESTINATIONS_WEBHOOK_PROXY_URL configuration and introduces DESTINATIONS_WEBHOOK_PROXY_TYPE=envoy as an opt-in for Envoy-aware error handling: infra-error detection via x-envoy-response-flags and the proxyconnect error path, transparent redelivery via nack, and x-envoy-*/server header sanitization. Includes required Envoy configuration, a minimal reference envoy.yaml, and queue retry policy guidance for sufficient redelivery runway during proxy outages. Default behavior for existing users is unchanged. Co-Authored-By: Claude <noreply@anthropic.com>
Adds an explicit error-attribution table. Nack is reserved for cases where the proxy itself is the proximate cause (auth failure, proxy unreachable). Destination failures that Envoy merely reports (upstream DNS / connect / timeout) are recorded as normal failed delivery attempts with destination-attributed codes; the wrapper sanitizes the response data so Envoy is not exposed, but the customer still sees a destination failure. This avoids the infinite-nack failure mode for genuinely broken destinations. Co-Authored-By: Claude <noreply@anthropic.com>
|
Is |
"DLQ runway" replaced with an explicit statement of what the number means: the maximum time a single nacked message is retried before landing in the dead-letter queue. Co-Authored-By: Claude <noreply@anthropic.com>
hmm good idea, yeah we can do that I think |
Removes DESTINATIONS_WEBHOOK_PROXY_TYPE. Envoy-specific signals (x-envoy-response-flags, x-envoy-* / server: envoy headers) are detected automatically from response headers when present; nothing to configure. The proxy-infra nack path (proxyconnect 407, dial to proxy fails) is generic and applies to any forward proxy, since the discriminator is "the proxy is broken or misconfigured," not "the proxy is Envoy." Co-Authored-By: Claude <noreply@anthropic.com>
Reorganizes the error handling section so generic-proxy behavior and Envoy-specific signals are in separate tables. Adding support for other proxy implementations (Squid, HAProxy, nginx) in the future would extend the Envoy support pattern. Co-Authored-By: Claude <noreply@anthropic.com>
Two clarifications: 1. Response sanitization is best-effort and currently complete only for Envoy. For arbitrary proxies, Outpost can rewrite error messages but cannot reliably strip proxy-identifying content from response bodies or headers. 2. Production proxy auth + TLS recommendation now depends on topology: not required if Outpost and the proxy share a private network; strongly recommended if the proxy is reachable over the internet to prevent open-relay abuse. Co-Authored-By: Claude <noreply@anthropic.com>
Trim the production-deployment guidance to a single sentence on the VPC-vs-internet trade-off, instead of a subsection with specific configuration suggestions. Co-Authored-By: Claude <noreply@anthropic.com>
Defines ErrProxyInfra (signals nack-on-infra-error) and ErrProxyDestination (signals record-as-destination-error with mapped code) along with MapEnvoyResponseFlag for translating Envoy response flags into existing destination error codes (connection_refused, timeout, dns_error, network_unreachable). No consumers yet; subsequent commits add the transport wrapper and integration into the webhook delivery path. Co-Authored-By: Claude <noreply@anthropic.com>
Wraps the proxy transport when a proxy URL is configured. Detects two
classes of proxy-originated failures:
- Proxy infrastructure (407 / 401 / 403 on CONNECT, dial-to-proxy
failure): wrapped as ErrProxyInfra. ExecuteHTTPRequest returns
Delivery: nil so the message handler nacks via the existing
nil-attempt path (registry.go:179-195), preserving the destination's
retry budget across proxy outages.
- Destination errors that the proxy reports (5xx on CONNECT, etc.):
wrapped as ErrProxyDestination with a classification code mapped
from the response. Falls through to the standard failed-attempt path
with the explicit code instead of substring-matching the underlying
error.
CONNECT-time detection uses http.Transport.OnProxyConnectResponse which
exposes the full response (status + headers) before Go discards it,
sidestepping the fragility of parsing internal error strings. The
wrapper still detects dial-to-proxy failures via the "proxyconnect"
prefix that Go's net.OpError keeps for that case.
Co-Authored-By: Claude <noreply@anthropic.com>
Extends proxyTransport with Envoy-specific signals layered on top of the
generic proxy handling:
- Plain-HTTP forwarding path: responses carrying a non-empty/non-"-"
x-envoy-response-flags header are treated as Envoy-synthesized
failures, converted to ErrProxyDestination with the flag mapped to a
destination error code, and their bodies dropped so they don't reach
the delivery record.
- CONNECT path: onProxyConnectResponse reads the same flag header on
non-200 responses to refine the generic "connection_refused"
classification (e.g. flag DC -> dns_error, UT -> timeout).
- Successful responses are stripped of x-envoy-* headers and a
"server: envoy" header before being returned, so the proxy is not
fingerprinted in the destination's recorded response.
Co-Authored-By: Claude <noreply@anthropic.com>
Adds a snapshot test asserting that net/http still emits "proxyconnect" in dial-to-proxy failure errors. The proxyTransport detector relies on this internal stdlib convention; if a Go upgrade changes the wording, this test fails in CI rather than letting the detector silently stop matching and causing proxy outages to be misclassified as destination errors. Co-Authored-By: Claude <noreply@anthropic.com>
Two small adjustments after the code landed:
- Drop the implementation-specific "Outpost sees" column from the
generic table; describe observed behavior instead. Proxy auth-style
failures cover 407, 401, and 403, not just 407.
- Frame the Envoy table around the response-flag signal directly,
rather than listing every flag-to-code combination (those are
documented by the MapEnvoyResponseFlag mapping).
Co-Authored-By: Claude <noreply@anthropic.com>
Envoy rejects response_headers_to_add at the HttpConnectionManager level (confirmed on Envoy 1.31). The field belongs on RouteConfiguration. Updates both the standalone snippet and the reference envoy.yaml. Co-Authored-By: Claude <noreply@anthropic.com>
Destinations commonly sit behind their own Envoy at the edge, which
emits server: envoy and x-envoy-response-flags on the response. For
HTTPS destinations, that response travels through an established CONNECT
tunnel — the forward proxy is byte-blind to TLS-encrypted bytes and
cannot have synthesized or modified anything. Previously the wrapper
treated those headers as if our forward proxy had set them, which:
- converted a real destination 503 from a downed app into a misleading
"connection_refused" with no body
- stripped the destination's own envoy observability headers
After this change, HTTPS responses (post-successful-CONNECT) are
returned unchanged. Proxy-originated HTTPS failures all happen at
CONNECT time and remain handled in onProxyConnectResponse.
The plain-HTTP forwarding path still applies envoy detection and header
stripping because the forward proxy is in the byte path on the return.
The residual case where this over-strips — a plain-HTTP destination
that is itself behind Envoy — is documented as a limitation;
attribution remains correct because x-envoy-response-flags is
overwritten by the forward Envoy via OVERWRITE_IF_EXISTS_OR_ADD.
Top-of-file doc comment on proxyTransport now describes the two
surfaces (CONNECT-time vs plain-HTTP-forwarding) so the dual-path
design is discoverable.
Co-Authored-By: Claude <noreply@anthropic.com>
- httphelper: skip ClassifyNetworkError when ErrProxyDestination sentinel matches - proxytransport: drop strings.ToLower; http.Header keys are canonicalized - proxytransport: drop impossible nil-guard on req/req.URL in RoundTrip - proxytransport: note unhandled Envoy flags fall through to network_error - proxytransport: cross-reference pin test from proxyconnect detector - proxytransport_test: drop unreachable empty-flag row from map test Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MakeHTTPClient never touched BaseProvider state — it was a free helper pretending to be a method. Move it to httpclient.go and update the three HTTP-based providers (desthookdeck, destwebhook, destwebhookstandard) to call destregistry.NewHTTPClient directly. BaseProvider keeps only the metadata/validation concerns it actually owns. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move proxytransport.go, the Envoy flag mapping, and all sentinels (ErrProxyInfra, ErrProxyDestination, IsProxyInfraError) into the destwebhook package — the only place they're actually used. destregistry keeps a generic WrapTransport hook on HTTPClientConfig; destwebhook exports a WrapTransport function and the two webhook providers (destwebhook, destwebhookstandard) plug it in when constructing their HTTP client. destregistry/ no longer has any proxy-specific code. The wrapper, the Envoy flag table, the CONNECT-response callback, and all four test files move with the sentinels. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the "set 5s/600s/10 attempts for ~30min runway" recommendation with the actual behavior of the default provisioning in internal/mqinfra/gcppubsub.go (10s/120s/6 attempts ~ 5min) and note that operators expecting longer proxy outages should tune MinRetryBackoff/MaxRetryBackoff/RetryLimit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
alexbouchardd
left a comment
There was a problem hiding this comment.
I would get @mkherlakian review here as well since he knows envoy's quirks well
There was a problem hiding this comment.
Is it common practice in Go to put tests in the same directory as the code it's testing?
There was a problem hiding this comment.
@mkherlakian yes, especially for unit test it's sort of the only option. Putting it in a different directory would be an anti-pattern.
Also keep in mind Golang enforce an implicit package boundary by directory, so there's that.
DC (Downstream Connection Termination) is the wrong flag — it means the client closed the connection, not DNS resolution. The actual flag emitted by Envoy's dynamic_forward_proxy when a host fails to resolve is DF (DNS Failure). Verified against a local Envoy reference proxy (build/dev/envoy/envoy.yaml): nonexistent.invalid → DF → dns_error. Also adds a local-dev Envoy service (build/dev/compose.yml + envoy.yaml) matching the documented config — `%RESPONSE_FLAGS%` on RouteConfiguration with OVERWRITE_IF_EXISTS_OR_ADD — so the QA suite at qa/suites/outpost/ webhook-proxy.md can be run against a real proxy locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address review feedback that the recorded error code is too coarse and
gives operators nothing to diagnose with.
Mapping: align with ClassifyNetworkError vocabulary so customers see the
same codes whether or not a proxy is in path.
- UC, UR, LR → connection_reset (split out of connection_refused)
- UPE, DPE → protocol_error (new bucket; previously network_error)
- UMSDR → timeout
- network_error stays as the catch-all; operator-visible signal that
a new flag has shown up and the table should be expanded.
Diagnostics: capture raw x-envoy-response-flags and the new
x-envoy-response-code-details (stage{reason}) on ErrProxyDestination
and attach them to the publish-attempt error payload as proxy_flag /
proxy_details. Logged operator-side; never written to the customer-
visible attempt response_data — same model EG uses.
Ref Envoy config emits both headers with OVERWRITE_IF_EXISTS_OR_ADD so
destinations can't spoof them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace Flag/Details fields with a free-form Diagnostics map[string]string. The previous shape baked Envoy-specific terminology into a struct that classifies errors from any forward proxy — even though only Envoy populates the fields today. Generic map keeps the contract honest: whoever populates a key owns its naming (envoy_flag, envoy_details for Envoy; future Squid/HAProxy paths would add their own). Error() sorts keys for deterministic log output. httphelper splats the map into the publish-attempt Data payload. Behavior unchanged — same log strings, same customer-invisible surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Webhook deliveries through a forward proxy (
DESTINATIONS_WEBHOOK_PROXY_URL) now distinguish proxy infrastructure failures from destination failures, and translate proxy-reported destination failures into the right destination-attributed error code. No new config; Envoy-specific signals are auto-detected from response headers when present. Implementation is scoped todestwebhook/;destregistry/exposes only a genericWrapTransporthook.Full feature doc:
docs/content/features/webhook-proxy.mdoc.Why
Two equally-load-bearing problems today:
server: envoy/x-envoy-*headers leak into the response stored against the destination.Behavior
UF→connection_refused,DC→dns_error,UT→timeout, etc.; unrecognized →network_error).x-envoy-*andserver: envoyheaders are stripped before storage. Envoy-synthesized 5xx bodies are dropped.Nuances
x-envoy-response-flagswithOVERWRITE_IF_EXISTS_OR_ADD(must live onRouteConfiguration, Envoy rejects it on HCM). Without this, a destination can fake the flag header and force its real body to be dropped.Pub/Sub redelivery runway
Outpost's default Pub/Sub provisioning gives roughly 5 minutes of redelivery before DLQ (see the docs page for the exact knobs). A proxy outage under that window is transparent to destinations; longer ones land in the DLQ and need manual replay.
Test plan
ExecuteHTTPRequestreturnsDelivery: nilonly on the infra-error sentinelproxyconnecterror wording (guards stdlib regression)MapEnvoyResponseFlagtable test🤖 Generated with Claude Code