fix(formatter): strip terminal escape sequences from non-JSON output #708
API responses may contain user-generated content with embedded ANSI escape codes or C0/C1 control characters. JSON output is safe because serde escapes them as \uXXXX, but table/CSV/YAML formats passed strings through verbatim, allowing a malicious API value to inject terminal sequences into the user's terminal.

Adds strip_control_chars(), which removes CSI sequences (ESC [ ... final), OSC sequences (ESC ] ... BEL/ST), other Fe two-char sequences, and bare control characters (except tab and newline). It is called from value_to_cell() so every string field rendered by format_value() in non-JSON modes is sanitized.

Fixes googleworkspace#635
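The stripping rules listed in the commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: only the function name comes from the description above, and the body, the handling of a lone ESC, and the use of `char::is_control` for bare C0/C1 characters are assumptions.

```rust
/// Hypothetical sketch of a sanitizer in the spirit of `strip_control_chars()`.
/// The structure below is an assumption, not the code merged in the PR.
fn strip_control_chars(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();

    while let Some(c) = chars.next() {
        if c == '\u{1b}' {
            match chars.peek().copied() {
                // CSI: ESC [ parameters... then a final byte in 0x40..=0x7E.
                Some('[') => {
                    chars.next();
                    for p in chars.by_ref() {
                        if ('\u{40}'..='\u{7e}').contains(&p) {
                            break;
                        }
                    }
                }
                // OSC: ESC ] ... terminated by BEL or by ST (ESC \).
                Some(']') => {
                    chars.next();
                    while let Some(p) = chars.next() {
                        if p == '\u{07}' {
                            break;
                        }
                        if p == '\u{1b}' {
                            // Consume the backslash only if it really follows.
                            if chars.peek() == Some(&'\\') {
                                chars.next();
                            }
                            break;
                        }
                    }
                }
                // Other Fe two-character sequences: ESC plus 0x40..=0x5F.
                Some(n) if ('\u{40}'..='\u{5f}').contains(&n) => {
                    chars.next();
                }
                // Lone ESC (assumed behaviour): drop only the ESC itself.
                _ => {}
            }
        } else if c.is_control() && c != '\t' && c != '\n' {
            // Bare C0/C1 control character: dropped.
        } else {
            out.push(c);
        }
    }
    out
}
```

The `[` and `]` arms come before the generic Fe arm because both bytes fall inside the 0x40 to 0x5F range and would otherwise be treated as two-character sequences, leaving the sequence body in the output.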
🦋 Changeset detected
Latest commit: 27bd7b6
The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
Summary of Changes
This pull request addresses a security vulnerability where terminal escape sequences in API responses could be rendered directly in non-JSON output formats, potentially allowing for terminal injection attacks. By implementing a robust sanitization layer, the CLI now ensures that all string values are cleaned of malicious control characters before being displayed to the user.
Code Review
This pull request introduces a strip_control_chars function to sanitize terminal escape sequences and control characters in non-JSON outputs, mitigating terminal injection risks. The feedback identifies that the current implementation misses several control string sequences (DCS, SOS, PM, APC) and that the YAML formatter remains unsanitized as it bypasses the updated logic. It is also suggested to expand character filtering to include the Unicode 'Format' category for better security.
…ings
Two issues from review:
1. DCS (ESC P), SOS (ESC X), PM (ESC ^), and APC (ESC _) sequences were not handled. The previous 'Other Fe sequences' arm consumed only the introducer byte, leaving the sequence body and ST terminator in the output. Each now calls consume_until_st() (an extracted helper shared with OSC), which drains chars until BEL or the ESC-backslash ST.
2. The YAML formatter (json_to_yaml) built strings directly from the raw API value without going through value_to_cell, so escape sequences survived in YAML output. strip_control_chars() is now applied at the top of the String branch in json_to_yaml so all three non-JSON formats are covered.
Adds tests for DCS/SOS/PM/APC stripping and YAML sanitization.
/gemini review
Code Review
This pull request introduces a strip_control_chars function to sanitize terminal escape sequences and C0/C1 control characters from non-JSON outputs, such as Table, CSV, and YAML, to prevent terminal injection vulnerabilities. The sanitizer is applied within json_to_yaml and value_to_cell. A review comment suggests refining the consume_until_st logic so that it correctly identifies the String Terminator sequence (ESC \) by checking for the backslash before consuming the next character, preventing over-consumption of characters.
Previously, any ESC byte inside a control string caused the next character to be consumed unconditionally. If the ESC was not followed by a backslash (e.g., a malformed or nested sequence), that character would be silently dropped. Now peek() checks for '\' before consuming, so only the valid ESC \ String Terminator is consumed; other ESC bytes cause an immediate break without over-consuming.
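The peek-before-consume behaviour described in this commit might look roughly like the sketch below. The name consume_until_st and the set of introducers come from the commit messages; the signature, the body, and the small driver strip_control_strings are assumptions added so the helper can be exercised on its own.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Drains a control-string body up to and including its terminator.
/// Sketch only: the signature and body are assumed, not taken from the PR.
fn consume_until_st(chars: &mut Peekable<Chars<'_>>) {
    while let Some(c) = chars.next() {
        match c {
            // BEL terminates the string.
            '\u{07}' => break,
            '\u{1b}' => {
                // Only ESC \ is a String Terminator, so peek first and
                // consume the next char only when it is a backslash.
                if chars.peek() == Some(&'\\') {
                    chars.next();
                }
                break;
            }
            // Any other character is part of the body and is dropped.
            _ => {}
        }
    }
}

/// Hypothetical driver that strips only OSC/DCS/SOS/PM/APC control strings.
fn strip_control_strings(input: &str) -> String {
    let mut out = String::new();
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        let introducer = chars.peek().copied();
        if c == '\u{1b}' && matches!(introducer, Some('P' | 'X' | '^' | '_' | ']')) {
            chars.next(); // consume the introducer byte
            consume_until_st(&mut chars);
        } else {
            out.push(c);
        }
    }
    out
}
```

With this shape, an input such as ESC ] title ESC Z rest keeps the Z: the stray ESC ends the control string, and the character after it is left for the caller instead of being silently dropped.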
/gemini review
Code Review
This pull request introduces a security fix to strip terminal escape sequences and control characters from non-JSON outputs, such as Table, CSV, and YAML. This prevents malicious API responses from injecting terminal commands or manipulating the user's terminal display. The implementation includes a new strip_control_chars function that handles various ANSI/VT sequences while preserving tabs and newlines, along with comprehensive tests to verify the sanitization logic. I have no feedback to provide.
Problem
value_to_cell() returns raw strings from API responses without sanitizing terminal escape sequences. Any response field containing ANSI escape codes (e.g. \x1b]0;...) renders them directly when using --format table, --format yaml, or --format csv.

JSON output (--format json) is safe because serde automatically escapes control characters as \uXXXX. Non-JSON formats pass untrusted content straight to the terminal, allowing an attacker to craft an API response value that injects terminal escape sequences, for example to set the window title, move the cursor, or trigger other VT sequences.

Raised in #635.
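To see why an escaped JSON string is inert in a terminal, the hand-rolled escaper below applies the JSON string-escaping rule for characters below U+0020. It is an illustration only: json_escape is a hypothetical helper written for this example and is not serde's implementation.

```rust
/// Illustration only: quotes a string and escapes it the way the JSON
/// grammar requires for quotes, backslashes, and characters below U+0020.
/// This is a hypothetical helper, not serde's code.
fn json_escape(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\t' => out.push_str("\\t"),
            // Remaining C0 controls (including ESC and BEL) become \u00XX.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}
```

Passing the OSC title payload ESC ] 0;pwned BEL through this function yields only printable ASCII, so the terminal shows the literal text \u001b instead of interpreting an escape byte.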
Fix
Adds strip_control_chars(), a zero-dependency function that removes:
- CSI sequences (ESC [ ... <final byte>): SGR colours, cursor movement, etc.
- OSC sequences (ESC ] ... BEL or ESC ] ... ESC \): window title injection, hyperlinks, etc.
- Other Fe two-character sequences (ESC <0x40–0x5F>)

value_to_cell() now calls strip_control_chars() for every Value::String, so all string fields rendered through format_value() in non-JSON modes are sanitized before output.
Tests
- test_strip_control_chars_clean_string: passthrough for safe strings
- test_strip_control_chars_csi_sequence: SGR colour codes stripped
- test_strip_control_chars_osc_sequence: BEL- and ST-terminated OSC stripped
- test_strip_control_chars_c0_control: NUL, BEL, BS, CR stripped; tab/newline kept
- test_value_to_cell_sanitizes_escape_sequences: end-to-end through value_to_cell

Fixes #635