bug: Secrets rewriting is not applied to websocket traffic. #872

@dllu

Agent Diagnostic

  • Investigated a live OpenClaw sandbox running under OpenShell and a local checkout of the latest OpenShell source tree.
  • Did not use a repo-specific OpenShell skill in this session; investigation was done by direct source tracing and live sandbox diagnostics.
  • Verified that Discord REST calls work with DISCORD_BOT_TOKEN=openshell:resolve:env:DISCORD_BOT_TOKEN, but Discord gateway auth fails with WebSocket close code 4004 Authentication failed.
  • Confirmed inside the sandbox that the effective token visible to the process is the placeholder string, not the real token.
  • Reproduced the split directly:
    • fetch("https://discord.com/api/v10/users/@me", { headers: { Authorization: "Bot openshell:resolve:env:DISCORD_BOT_TOKEN" } }) succeeds through OpenShell proxy rewriting.
    • A raw WebSocket connection to wss://gateway.discord.gg/?v=10&encoding=json succeeds at the transport layer, but sending IDENTIFY with that same placeholder token is rejected with 4004.
  • Traced the OpenShell rewrite path in projects/openshell:
    • crates/openshell-sandbox/src/secrets.rs rewrites placeholders in HTTP headers and request targets.
    • crates/openshell-sandbox/src/l7/rest.rs rewrites the HTTP header block before forwarding, then streams remaining body bytes unchanged.
    • crates/openshell-sandbox/src/l7/relay.rs explicitly switches to raw bidirectional relay after 101 Switching Protocols, with the comment that L7 enforcement is no longer active.
  • Searched the repo for WebSocket-specific placeholder rewriting and did not find any code that parses or rewrites WebSocket frames after upgrade.
  • Conclusion: OpenShell's placeholder-based secret injection currently works for HTTP request metadata, but not for credentials sent inside upgraded WebSocket payloads. This breaks Discord gateway authentication because the bot token is sent in the WebSocket IDENTIFY frame body rather than in an HTTP header/path/query field.
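The header-versus-body split traced above can be illustrated with a small hypothetical sketch. It is not OpenShell's `SecretResolver` (which is Rust code under `crates/openshell-sandbox`); the names `resolvePlaceholders` and `rewriteRequest` and the placeholder-name regex are assumptions. It rewrites placeholders in the request line and header block of a raw HTTP/1.1 request and passes everything after the blank line through unchanged, mirroring what the diagnostic describes for `rest.rs`, so a credential carried in a body still reaches the upstream as the literal placeholder.

```javascript
// Hypothetical placeholder syntax, inferred from the values seen in this report.
const PLACEHOLDER = /openshell:resolve:env:([A-Za-z_][A-Za-z0-9_]*)/g;

// Swap each placeholder for its secret; unknown names are left unchanged.
function resolvePlaceholders(text, secrets) {
  return text.replace(PLACEHOLDER, (match, name) =>
    Object.prototype.hasOwnProperty.call(secrets, name) ? secrets[name] : match
  );
}

// Rewrite the request line and headers of a raw HTTP/1.1 request, then
// append everything from the blank line onward (the body) unchanged.
function rewriteRequest(raw, secrets) {
  const end = raw.indexOf("\r\n\r\n");
  if (end === -1) return raw; // head not complete yet; this sketch does nothing
  return resolvePlaceholders(raw.slice(0, end), secrets) + raw.slice(end);
}

// Illustrative request: the header is rewritten, the JSON body is not.
const demo =
  "POST /example HTTP/1.1\r\n" +
  "Authorization: Bot openshell:resolve:env:DISCORD_BOT_TOKEN\r\n\r\n" +
  '{"token":"openshell:resolve:env:DISCORD_BOT_TOKEN"}';
console.log(rewriteRequest(demo, { DISCORD_BOT_TOKEN: "real-token" }));
```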

Description

OpenShell provider placeholders are not rewritten inside WebSocket traffic after an HTTP upgrade.

This is a problem for integrations that rely on a secret being sent in WebSocket message payloads rather than in HTTP headers or URL components. A concrete case is Discord gateway authentication:

  • The sandboxed process sees DISCORD_BOT_TOKEN=openshell:resolve:env:DISCORD_BOT_TOKEN.
  • OpenShell correctly rewrites that placeholder for Discord REST requests, so calls like GET /api/v10/users/@me succeed.
  • The Discord gateway connection itself also succeeds at the transport level through the proxy.
  • But the Discord token is sent in the WebSocket IDENTIFY payload, and that payload is forwarded unchanged after the HTTP 101 Switching Protocols upgrade.
  • Discord therefore receives the literal placeholder string instead of the real bot token and closes the gateway with 4004 Authentication failed.

Expected behavior:

  • Either OpenShell should support secret rewriting for upgraded WebSocket traffic when the placeholder appears in outbound frames, or the product/docs should clearly state that provider placeholder injection only applies to HTTP request metadata and does not support WebSocket message-body authentication flows.
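If OpenShell took the first option, the rewriter would have to operate on WebSocket frames rather than raw bytes. The hypothetical sketch below, based on RFC 6455 framing, shows why: clients must mask every frame they send, so the placeholder generally does not appear verbatim in the stream, and the real token will usually differ in length from the placeholder, so the frame's length field must be recomputed. It assumes the proxy already sees the decrypted stream (as it must to rewrite HTTPS headers). All function names are illustrative, not OpenShell APIs, and the sketch deliberately skips fragmented messages, `permessage-deflate` compressed frames (RSV1 set), and placeholders split across frames.

```javascript
const crypto = require("node:crypto");

// Hypothetical placeholder syntax, inferred from the values seen in this report.
const PLACEHOLDER = /openshell:resolve:env:([A-Za-z_][A-Za-z0-9_]*)/g;

function resolvePlaceholders(text, secrets) {
  return text.replace(PLACEHOLDER, (match, name) =>
    Object.prototype.hasOwnProperty.call(secrets, name) ? secrets[name] : match
  );
}

// Decode one complete client-to-server frame (RFC 6455, section 5.2).
// Assumes `buf` holds exactly one frame; a real relay must buffer partial reads.
function parseClientFrame(buf) {
  const fin = (buf[0] & 0x80) !== 0;
  const rsv1 = (buf[0] & 0x40) !== 0; // set for permessage-deflate compressed messages
  const opcode = buf[0] & 0x0f;
  if ((buf[1] & 0x80) === 0) throw new Error("client frames must be masked");
  let len = buf[1] & 0x7f;
  let offset = 2;
  if (len === 126) {
    len = buf.readUInt16BE(2);
    offset = 4;
  } else if (len === 127) {
    len = Number(buf.readBigUInt64BE(2));
    offset = 10;
  }
  const mask = buf.subarray(offset, offset + 4);
  offset += 4;
  const payload = Buffer.alloc(len);
  for (let i = 0; i < len; i++) payload[i] = buf[offset + i] ^ mask[i % 4];
  return { fin, rsv1, opcode, payload, frameLength: offset + len };
}

// Encode a masked client frame; the length field is recomputed from the new payload.
function encodeClientFrame(fin, opcode, payload, mask = crypto.randomBytes(4)) {
  const len = payload.length;
  const b0 = (fin ? 0x80 : 0) | opcode;
  let header;
  if (len < 126) {
    header = Buffer.from([b0, 0x80 | len]);
  } else if (len <= 0xffff) {
    header = Buffer.alloc(4);
    header[0] = b0;
    header[1] = 0x80 | 126;
    header.writeUInt16BE(len, 2);
  } else {
    header = Buffer.alloc(10);
    header[0] = b0;
    header[1] = 0x80 | 127;
    header.writeBigUInt64BE(BigInt(len), 2);
  }
  const body = Buffer.alloc(len);
  for (let i = 0; i < len; i++) body[i] = payload[i] ^ mask[i % 4];
  return Buffer.concat([header, mask, body]);
}

// Resolve placeholders in a single unfragmented, uncompressed text frame.
// Binary, control, continuation, and compressed frames are returned untouched.
function rewriteTextFrame(buf, secrets) {
  const frame = parseClientFrame(buf);
  if (frame.opcode !== 0x1 || !frame.fin || frame.rsv1) return buf;
  const text = resolvePlaceholders(frame.payload.toString("utf8"), secrets);
  return encodeClientFrame(true, 0x1, Buffer.from(text, "utf8"));
}
```

Given an IDENTIFY frame built by the client, `rewriteTextFrame` returns a new frame whose decoded payload carries the resolved token under a fresh masking key. A production version would also need to reassemble fragmented messages, handle compression negotiated during the upgrade, and decide what to do with a placeholder it cannot safely rewrite.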

Actual behavior:

  • HTTP placeholder rewriting works.
  • WebSocket payload placeholder rewriting does not happen.
  • Discord native gateway auth fails even though REST works, which makes the integration appear partially functional and causes repeated reconnect/restart loops.

Reproduction Steps

  1. Create an OpenShell sandbox with a provider credential mapped to DISCORD_BOT_TOKEN (or any test token env var) so the sandbox sees a placeholder like openshell:resolve:env:DISCORD_BOT_TOKEN.
  2. Ensure outbound access to discord.com:443 and gateway.discord.gg:443 is allowed by policy.
  3. Inside the sandbox, confirm that the process env contains the placeholder string rather than the real token.
  4. Make an HTTPS request using the placeholder token in an HTTP header, for example:
```js
fetch("https://discord.com/api/v10/users/@me", {
  headers: { Authorization: "Bot " + process.env.DISCORD_BOT_TOKEN }
}).then((res) => console.log("status", res.status));
```
  5. Observe that the REST request succeeds, showing that OpenShell rewrites the placeholder in HTTP traffic.
  6. In the same sandbox, open a WebSocket to wss://gateway.discord.gg/?v=10&encoding=json.
  7. After receiving Discord's HELLO packet (op: 10), send an IDENTIFY payload using process.env.DISCORD_BOT_TOKEN as the token.
  8. Observe that Discord closes the socket with 4004 Authentication failed.
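For step 3, a small helper can list which environment variables hold placeholders rather than real values. This sketch assumes only the `openshell:resolve:env:` prefix seen in this report; `placeholderVars` is a hypothetical name, not part of any OpenShell tooling.

```javascript
// Prefix taken from the placeholder values shown in this report.
const PREFIX = "openshell:resolve:env:";

// Returns { name, target } for every variable whose value is a placeholder,
// where `target` is the secret name the proxy is expected to resolve.
function placeholderVars(env = process.env) {
  return Object.entries(env)
    .filter(([, value]) => typeof value === "string" && value.startsWith(PREFIX))
    .map(([name, value]) => ({ name, target: value.slice(PREFIX.length) }));
}

for (const { name, target } of placeholderVars()) {
  console.log(`${name} holds a placeholder for ${target}`);
}
```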

Minimal Node.js reproduction for steps 6-8:

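```js
// Assumes Node.js 22+, where WebSocket is available as a global.
const token = process.env.DISCORD_BOT_TOKEN;
const ws = new WebSocket("wss://gateway.discord.gg/?v=10&encoding=json");

ws.onmessage = (ev) => {
  const msg = JSON.parse(ev.data);
  console.log("msg", msg.op);
  // op 10 is HELLO; answer with op 2 (IDENTIFY) carrying the token from the env
  if (msg.op === 10) {
    ws.send(JSON.stringify({
      op: 2,
      d: {
        token,
        intents: 0,
        properties: {
          os: "linux",
          browser: "repro",
          device: "repro"
        }
      }
    }));
    console.log("identify-sent", token);
  }
};

ws.onclose = (ev) => {
  console.log("close", ev.code, ev.reason);
};
```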

Observed result:

  • REST succeeds with the placeholder token.
  • WebSocket gateway closes with 4004 Authentication failed.

Environment

  • OS: any host OS supported by OpenShell
  • OpenShell version: latest main source inspected locally; bug also reproduced against a live OpenShell deployment
  • Affected workload: OpenClaw Discord integration, though the bug is not specific to OpenClaw and should affect any client that sends provider placeholders inside WebSocket message payloads
  • Relevant OpenShell components:
    • crates/openshell-sandbox/src/secrets.rs
    • crates/openshell-sandbox/src/l7/rest.rs
    • crates/openshell-sandbox/src/l7/relay.rs

Logs

OpenClaw / Discord side:


```
discord: rest proxy enabled
discord: gateway proxy enabled
discord client initialized; awaiting gateway readiness
discord gateway: Gateway websocket closed: 4004
discord gateway error: Error: Fatal gateway close code: 4004
```


OpenShell sandbox env:


```
HTTPS_PROXY=http://10.200.0.1:3128
ALL_PROXY=http://10.200.0.1:3128
DISCORD_BOT_TOKEN=openshell:resolve:env:DISCORD_BOT_TOKEN
```


Manual REST check inside the sandbox:


```
token openshell:resolve:env:DISCORD_BOT_TOKEN
status 200
{"id":"<bot-id>","username":"<bot-name>", ...}
```


Manual gateway check inside the sandbox:


```
msg 10
identify-sent openshell:resolve:env:DISCORD_BOT_TOKEN
close 4004 Authentication failed.
```


Relevant OpenShell source comments:


From crates/openshell-sandbox/src/l7/relay.rs:

```rust
// 101 Switching Protocols — raw bidirectional relay (L7 enforcement no longer active)
```



```shell
# The sandbox process sees GITHUB_TOKEN=openshell:resolve:env:GITHUB_TOKEN
# in its environment. When curl sends this as an Authorization header,
# the proxy's SecretResolver rewrites the placeholder to the real token.
```

Agent-First Checklist

  • I pointed my agent at the repo and had it investigate this issue
  • I loaded relevant skills (e.g., debug-openshell-cluster, debug-inference, openshell-cli)
  • My agent could not resolve this — the diagnostic above explains why
