Support policy-based transport header extraction and propagation in OtapPdata #2508

@lquerel

Description

Pre-filing checklist

  • I searched existing issues and didn't find a duplicate

Component(s)

Rust OTAP dataflow (rust/otap-dataflow/)

Objective

Introduce a policy-based transport header capability that allows header-capable receivers to extract selected inbound transport headers into OtapPdata context, and allows header-capable exporters to propagate selected stored headers on egress.

Rationale

Protocols such as OTLP and OTAP can carry important request-scoped metadata outside the payload itself, for example in HTTP headers or gRPC metadata. That metadata can represent trace context, tenant, auth, routing, correlation, or policy hints that need to survive the pipeline.

OtapPdata already has context, but it does not currently preserve transport headers. This makes end-to-end propagation and policy-driven use of request metadata difficult.

A shared policy-based model is the preferred direction because multiple receivers and exporters are expected to need the same capability. Receivers and exporters can still expose node-specific config that overrides the policy locally, but that config should primarily select or override shared transport-header policy rules rather than redefine the feature independently at every node.

Scope

Add a new transport_headers policy family and a protocol-neutral transport header abstraction in OtapPdata context.

The transport header abstraction should preserve the semantics needed across protocols:

  • duplicate header names must be preserved
  • matching should use a normalized logical name
  • the original wire name should remain available
  • values should support both text and binary
  • values should be stored as raw bytes, not as protocol-specific header objects

A conceptual TransportHeader entry should contain:

  • name: normalized logical name used for matching and policy lookup
  • wire_name: original header or metadata name observed on ingress
  • value_kind: text or binary
  • value: raw bytes
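
As a rough illustration of the entry described above, here is a minimal Rust sketch. The type and method names (ValueKind, TransportHeaders, push, get_all) are hypothetical and not taken from the existing OtapPdata code, and ASCII lowercasing is only an assumed normalization rule. The point is that an ordered list, rather than a map, keeps duplicate names and the original wire name intact.

```rust
/// Whether a header value is text or opaque binary.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ValueKind {
    Text,
    Binary,
}

/// One captured header; fields mirror the conceptual entry in this issue.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TransportHeader {
    /// Normalized logical name used for matching and policy lookup.
    pub name: String,
    /// Original header or metadata name observed on ingress.
    pub wire_name: String,
    pub value_kind: ValueKind,
    /// Raw bytes, not a protocol-specific header object.
    pub value: Vec<u8>,
}

/// Ordered collection (hypothetical): a Vec keeps duplicates, a map would not.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct TransportHeaders {
    entries: Vec<TransportHeader>,
}

impl TransportHeaders {
    /// Appends a header. Normalizing by ASCII lowercasing is an assumption.
    pub fn push(&mut self, wire_name: &str, value_kind: ValueKind, value: &[u8]) {
        self.entries.push(TransportHeader {
            name: wire_name.to_ascii_lowercase(),
            wire_name: wire_name.to_string(),
            value_kind,
            value: value.to_vec(),
        });
    }

    /// Returns every entry whose normalized name matches, in insertion order.
    pub fn get_all<'a>(&'a self, name: &str) -> impl Iterator<Item = &'a TransportHeader> + 'a {
        let wanted = name.to_ascii_lowercase();
        self.entries.iter().filter(move |h| h.name == wanted)
    }
}
```

With this shape, two inbound headers that differ only in case are matched under the same logical name while each keeps its own wire_name and value.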

The new transport_headers policy family should define:

  • extraction rules
  • propagation rules
  • shared limits and failure handling

Receivers and exporters should activate policy rules from node config. A receiver should be able to opt into extraction rules, and an exporter should be able to opt into propagation rules. Processors should be transport-header transparent unless they explicitly add support for reading or mutating this context.
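
To make the receiver side more concrete, the following Rust sketch shows one way opt-in extraction with limits could behave. Everything here is an assumption for illustration: the names CaptureRule, Limits, Captured, and extract are hypothetical, the field names are borrowed from the proposed policy shape later in this issue, and oversize headers are simply skipped, which is one possible reading of on_error: drop.

```rust
/// A captured header as stored in context (simplified for this sketch).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Captured {
    pub name: String,
    pub wire_name: String,
    pub value: Vec<u8>,
}

/// One extraction rule; fields follow the proposed YAML (hypothetical type).
pub struct CaptureRule {
    pub match_names: Vec<String>,
    pub store_as: Option<String>,
}

/// Shared limits; fields follow the proposed YAML (hypothetical type).
pub struct Limits {
    pub max_entries: usize,
    pub max_name_bytes: usize,
    pub max_value_bytes: usize,
}

/// Captures only headers matched by a rule. Unmatched headers are ignored,
/// so nothing is forwarded unless a rule opts in.
pub fn extract(inbound: &[(&str, &[u8])], rules: &[CaptureRule], limits: &Limits) -> Vec<Captured> {
    let mut out = Vec::new();
    for &(wire_name, value) in inbound {
        if out.len() >= limits.max_entries {
            break; // entry budget exhausted for this message
        }
        // Case-insensitive match against each rule's names.
        let Some(rule) = rules
            .iter()
            .find(|r| r.match_names.iter().any(|m| m.eq_ignore_ascii_case(wire_name)))
        else {
            continue; // not selected by any rule
        };
        if wire_name.len() > limits.max_name_bytes || value.len() > limits.max_value_bytes {
            continue; // assumed on_error: drop, skip only the offending header
        }
        // Default store_as: first matched name, normalized (assumed lowercase).
        let name = rule
            .store_as
            .clone()
            .unwrap_or_else(|| rule.match_names[0].to_ascii_lowercase());
        out.push(Captured {
            name,
            wire_name: wire_name.to_string(),
            value: value.to_vec(),
        });
    }
    out
}
```

Duplicate inbound names each produce their own entry, and a header that exceeds a limit does not fail the whole request in this sketch.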

The design should be protocol-neutral. OTLP/gRPC and OTAP/gRPC are suitable initial targets, but the abstraction should also cover other header-capable protocols.
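
For the gRPC targets, one piece of protocol-specific inference is well defined: gRPC treats metadata keys ending in "-bin" as binary-valued. The small Rust sketch below shows how a receiver might derive the value kind from the key; the function name infer_grpc_value_kind and the ValueKind enum are hypothetical, and other protocols would need their own rule.

```rust
/// Text or binary, as in the conceptual TransportHeader entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ValueKind {
    Text,
    Binary,
}

/// Infers the value kind of a gRPC metadata entry from its key.
/// gRPC marks binary-valued metadata with a "-bin" key suffix.
pub fn infer_grpc_value_kind(key: &str) -> ValueKind {
    if key.to_ascii_lowercase().ends_with("-bin") {
        ValueKind::Binary
    } else {
        ValueKind::Text
    }
}
```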

Acceptance Criteria

  • OtapPdata can carry transport headers in context without losing duplicate-name or binary-value semantics.
  • A shared transport_headers policy family exists in the config model.
  • Receivers can activate extraction rules from policy through node config.
  • Exporters can activate propagation rules from policy through node config.
  • A normal in-pipeline path preserves transport headers through at least one processor.
  • An end-to-end pipeline demonstrates extraction on receiver:otlp and receiver:otap, preservation through a basic processor, and propagation on exporter:otap.
  • Extraction and propagation are explicit and opt-in. The default behavior is not to forward all inbound headers.
  • Tests cover extraction, propagation, filtering, limits, and invalid-header handling.

Dependencies or Blockers

No response

Additional Context

Proposed policy shape:

version: otel_dataflow/v1

groups:
  default:
    pipelines:
      ingest:
        policies:
          header_capture:
            defaults:
              max_entries: 32         # default: 32 captured headers per message
              max_name_bytes: 128     # default: 128 bytes per header name
              max_value_bytes: 4096   # default: 4096 bytes per header value
              on_error: drop          # default: drop the offending captured header

            # Per-header defaults:
            # - store_as: first matched name, normalized
            # - value_kind: text unless protocol-specific inference marks it binary
            # - sensitive: false
            headers:
              - match_names: ["x-tenant-id"]
                store_as: tenant_id

              - match_names: ["x-request-id"]   # store_as defaults to the extracted header name

              - match_names: ["authorization"]
                sensitive: true

          header_propagation:
            default:
              selector: all_captured  # default: all_captured
              action: propagate       # default: propagate
              name: preserve          # default: preserve stored header name
              on_error: drop          # default: drop the offending outbound header

            # Per-override defaults:
            # - action: propagate
            # - name: preserve
            # - on_error: drop
            overrides:
              - match:
                  stored_names: ["authorization"]
                action: drop

        nodes:
          otlp_ingest:
            type: receiver:otlp
            config:
              protocols:
                grpc:
                  listening_addr: "0.0.0.0:4317"

          otap_ingest:
            type: receiver:otap
            header_capture:    # node-level override
              headers:
                - match_names: ["x-request-id"]
                  store_as: request_id
            config:
              listening_addr: "0.0.0.0:50051"

          batch:
            type: processor:batch
            config: {}

          otap_export:
            type: exporter:otap
            config:
              grpc_endpoint: "http://127.0.0.1:60051"

        connections:
          - from: otlp_ingest
            to: batch
          - from: otap_ingest
            to: batch
          - from: batch
            to: otap_export

Behavior in this example:

  • otlp_ingest uses the pipeline header_capture policy and captures x-tenant-id (stored as tenant_id), x-request-id, and authorization.
  • otap_ingest overrides header_capture at the node level, so it only captures x-request-id (stored as request_id).
  • batch preserves the captured headers unchanged.
  • otap_export propagates all captured headers by default.
  • authorization is dropped on egress by the propagation override.
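
The egress behavior listed above can be sketched in Rust as follows. This is only an illustration under assumptions: Action, Override, PropagationPolicy, and select_outbound are hypothetical names, captured headers are reduced to (stored name, raw value) pairs, and the first matching override is assumed to win over the default action.

```rust
/// What to do with a stored header on egress.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Propagate,
    Drop,
}

/// An override selecting stored names, as in the proposed YAML (hypothetical type).
pub struct Override {
    pub stored_names: Vec<String>,
    pub action: Action,
}

/// Default action plus overrides (hypothetical type).
pub struct PropagationPolicy {
    pub default_action: Action,
    pub overrides: Vec<Override>,
}

/// Returns the captured headers that should be written to the outbound request.
/// The first override that names a header decides; otherwise the default applies.
pub fn select_outbound<'a>(
    captured: &'a [(String, Vec<u8>)],
    policy: &PropagationPolicy,
) -> Vec<(&'a str, &'a [u8])> {
    captured
        .iter()
        .filter(|(name, _)| {
            let action = policy
                .overrides
                .iter()
                .find(|o| o.stored_names.iter().any(|n| n == name))
                .map(|o| o.action)
                .unwrap_or(policy.default_action);
            action == Action::Propagate
        })
        .map(|(name, value)| (name.as_str(), value.as_slice()))
        .collect()
}
```

With a default of propagate and a single override that drops authorization, every other captured header, including duplicates, passes through in its original order.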
