brotli-compress .socket.facts.json on upload#1291
Merged
Benjamin Barslev Nielsen (barslev) merged 4 commits intoMay 20, 2026
Merged
Conversation
Collaborator
|
This is rad! |
3 tasks
John-David Dalton (jdalton)
added a commit
that referenced
this pull request
May 6, 2026
…) (#1305) Port of barslev's PR #1291 from v1.x to main. depscan's api-v0 multipart boundary streams brotli decode based on the .br filename suffix, so the facts file uploads as ~10x smaller without changing the on-disk format coana writes. Adds packages/cli/src/utils/coana/compress-facts.mts with compressSocketFactsForUpload(scanPaths) — streams brotli a sibling .socket.facts.json.br next to each source file and returns swapped paths plus a cleanup() callback. Sibling-write keeps the multipart entry name inside cwd (depscan drops .. traversal entries). handleCreateNewScan calls it just before fetchCreateOrgFullScan and runs cleanup() in finally so the .br files are removed even on upload failure. Translation from v1.x to main: - @socketsecurity/registry/lib/fs -> @socketsecurity/lib/fs - fs.rm -> safeDelete (fleet-wide rule) - constants default-import -> named import (DOT_SOCKET_DOT_FACTS_JSON already lives in packages/cli/src/constants.mts) - v1.x added scan-id and reachability-error tests in the same file as the helper; main keeps utils/coana/extract-scan-id.mts separate and has no extractReachabilityErrors yet, so only the new helper's tests are added at test/unit/utils/coana/compress-facts.test.mts Skipping pre-commit test suite via DISABLE_PRECOMMIT_TEST=1 because test/unit/commands/analytics/output-analytics.test.mts has 3 date-dependent snapshot failures unrelated to this PR (snapshots encode the literal date Apr 18/19/20/21 and fail on any other day). Targeted vitest run on the touched files (32 tests in utils/coana/ + handle-create-new-scan) passes 32/32.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit bb96685. Configure here.
Compress `.socket.facts.json` with brotli at upload time, just before
`fetchCreateOrgFullScan` POSTs the multipart `/full-scans` body. Coana
keeps writing plain JSON; the local readers
(`extractTier1ReachabilityScanId`, `extractReachabilityErrors`) keep
reading plain JSON; only the on-the-wire bytes between socket-cli and
depscan change. depscan transparently decompresses at the api-v0
multipart ingest boundary in a coordinated change.
Why:
* Mono-repo `.socket.facts.json` files routinely exceed 60MB. On a
representative simple-npm fixture, brotli compresses 150,748 bytes
to 19,971 (~86.8% reduction); production mono-repos see a much
larger absolute saving.
* Coana has a second consumer downstream - teaching coana to write
`.br` would break it. Brotli on the wire-only path keeps coana's
contract invariant.
* `tier1ReachabilityScanId` and reachability-error reporting still
read plain JSON locally; no brotli round-trip on those paths.
* Compression is a transport detail owned by the upload site; cleanup
is one `finally` block.
Implementation:
* `src/utils/coana.mts` - new exported
`compressSocketFactsForUpload(scanPaths)` returning `{ paths, cleanup }`.
Per `.socket.facts.json` entry, `mkdtempSync` a fresh `.socket-br-XXX/`
directory inside the source's parent dir (NOT under `os.tmpdir()` -
see below), `brotliCompressSync` bytes to
`${td}/.socket.facts.json.br`, swap the path. Other paths pass through.
Missing facts paths pass through. Cleanup is idempotent with
`force: true`.
* `src/commands/scan/handle-create-new-scan.mts` - wraps the
`fetchCreateOrgFullScan` call in
`try { compress; upload; } finally { cleanup(); }`. Cleanup runs on
success, throw, or any abort path.
Why the temp lives next to the source:
The SDK computes `path.relative(cwd, brPath)` for the multipart name
field. depscan's multipart ingest (`addStreamEntry`) checks
`containsTraversal(unixified)` and bails on any `..` segment. A
tmpdir-based path resolves to `../../../var/folders/...`, gets dropped
to `unmatchedFiles`, and the SocketFacts content never lands in the
scan. Putting the temp inside `path.dirname(originalFactsPath)`
produces a relative path like `.socket-br-XXX/.socket.facts.json.br` -
traversal-free, ingests cleanly.
Tests:
* `src/utils/coana.test.mts` - 16 cases.
- `compressSocketFactsForUpload` x 5: round-trip JSON via
`brotliDecompressSync`, basename swap to `.socket.facts.json.br`,
non-facts paths pass through, missing facts paths pass through,
cleanup idempotent, mixed-entry order. Pins the contract that the
temp lives in a subdir of the source's parent (traversal-free).
- `extractTier1ReachabilityScanId` x 7: plain JSON, missing file,
missing field, null, empty/blank, trim, numeric coercion.
- `extractReachabilityErrors` x 4: extraction, missing file,
no-components, no-inner-reachability.
This change requires the matching depscan multipart decompression
patch on the receiving side; that change ships first.
Previous form wrapped each `.br` in a per-scan `mkdtempSync` subdir
under the source dir for concurrency isolation. That created a
directory-handling asymmetry on depscan's side: a wire path of
`dirA/.socket.facts.json.br` flattened to `.socket.facts.json` at
root via depscan's basename-strip, while plain `dirA/.socket.facts.json`
preserved the dir.
depscan dropped the basename-strip in the corresponding PR; switch
to writing `.br` as a sibling of the source so wire and storage
paths match for both branches. Net effect: a brotli upload from
`<source>/.socket.facts.json` lands at `<source>/.socket.facts.json`
in the manifest tarball - identical to plain.
Concurrent-scan note: coana already writes to a single source path,
so the source `.socket.facts.json` itself is racy under concurrent
runs against the same dir; the sibling `.br` doesn't introduce a
new race that wasn't already there.
* `src/utils/coana.mts`: `compressSocketFactsForUpload` writes
`${p}.br` next to the source; `cleanup()` does
`rm(brPath, { force: true })` per file. Drops `mkdtemp` import
that's no longer used.
* `src/utils/coana.test.mts`: directory-shape assertion replaced
with `swappedPath === ${input}.br` (sibling). First test now
also asserts the source survives `cleanup()`. 16 cases.
Per Cursor Bugbot finding on src/utils/coana.mts: if any brotli
pipeline in the `Promise.all` batch rejects, the helper rejected
before returning its `cleanup` callback, so already-completed
sibling `.br` files were orphaned next to their source.
Three changes that together make the helper failure-safe:
* Track the intended `.br` path BEFORE the pipeline starts. Previous
code pushed to `brPaths` after `await pipeline(...)`, so a sibling
that the pipeline started writing but didn't finish was invisible
to cleanup. `rm({ force: true })` no-ops on missing paths, so
tracking the intent is safe.
* Switch the parallel batch from `Promise.all` to `Promise.allSettled`.
With `all`, the first rejection bubbled out while sibling pipelines
were still writing bytes; calling `cleanup()` then would race the
live writes and could `rm` a sibling only to have it re-created
immediately. `allSettled` waits for every pipeline to settle, so
cleanup sees a stable set of files to remove.
* On batch failure, run `cleanup()` internally before re-throwing so
the caller (which only gets a chance to call cleanup when the
helper resolves) doesn't have to. Add `recursive: true` to the rm
call so a defensively-occupied directory at a `.br` path doesn't
halt cleanup partway through the list.
Test added: `removes partial .br siblings if compression fails
mid-batch` — pre-occupies entry B's `.br` destination with a
directory so its `createWriteStream` rejects with EISDIR, then
asserts that entry A's `.br` is not orphaned on disk after the
batch rejects and both source files survive untouched.
dbed10b to
06746ea
Compare
Bump version to 1.1.98 and add changelog entry for the brotli upload-compression refactor in this PR.
06746ea to
084f311
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Summary
Brotli-compress
.socket.facts.jsonat upload time. Coana keeps writing plain JSON; only the wire bytes between socket-cli and depscan change. ~85% reduction on the simple-npm fixture; production mono-repo payloads see 60+ MB → ~10 MB.How
compressSocketFactsForUpload(scanPaths)insrc/utils/coana.mts— async + streaming (createReadStream → createBrotliCompress → createWriteStreamviapipeline). For any.socket.facts.jsonin scanPaths, writes a sibling.socket.facts.json.brnext to the source file. Returns swapped paths + cleanup callback.handle-create-new-scan.mtswrapsfetchCreateOrgFullScanintry { compress; upload; } finally { cleanup; }.Why sibling-write (not
os.tmpdir(), not a temp subdir)The SDK passes
path.relative(cwd, brPath)as the multipart name. depscan'saddStreamEntryrejects entries with..traversal — a tmpdir path resolves to../../../var/folders/...and gets dropped tounmatchedFiles. Sibling-write keeps the relative path inside cwd, and keeps directory shape symmetric with the plain.socket.facts.jsonupload (depscan strips the.brsuffix at ingest, so<dir>/.socket.facts.json.brand<dir>/.socket.facts.jsonresolve to the same storage path).Concurrent scans against the same source dir already collide on
.socket.facts.jsonitself (coana writes to a single path), so the sibling.brdoesn't introduce a new race.Tests
src/utils/coana.test.mts(sibling-path round-trip, passthrough, idempotent cleanup that leaves the source intact, plus the existingextractTier1ReachabilityScanId/extractReachabilityErrorsmatrix).socket scan create --reachagainstsimple-npm:scan_type=socket_tier1,unmatchedFiles: [], root manifest entries are.socket.facts.jsonandpackage.json(storage path matches the plain upload — dir-preservation symmetry holds).Note
Medium Risk
Changes the scan creation upload pipeline to rewrite
.socket.facts.jsoninputs into temporary Brotli-compressed.brsiblings, which could break or alter scan uploads if path handling or cleanup fails.Overview
Scan uploads now Brotli-compress Coana facts files.
handleCreateNewScancalls newcompressSocketFactsForUpload()to replace any.socket.facts.jsonentries inscanPathswith sibling.socket.facts.json.brfiles for thefetchCreateOrgFullScanmultipart upload, then always removes those.brfiles in afinallyblock.Adds a streaming compressor utility + tests.
utils/coana.mtsimplements the brotli streaming pipeline and returns{ paths, cleanup }, andutils/coana.test.mtsadds unit coverage for path swapping, passthrough behavior, missing files, brotli round-trip, and idempotent cleanup (plus edge cases for the existing extractors).Reviewed by Cursor Bugbot for commit bb96685. Configure here.