fix (typespec_client_core): reduce peak memory in AsyncResponseBody::collect #4093

Open

nathanielmiller23 wants to merge 5 commits into Azure:main from
nathanielmiller23:fix/reduce-peak-memory-in-AsyncResponseBodycollect

Conversation

@nathanielmiller23

Iterates on #3893.

What changed

Refactors AsyncResponseBody::collect in
sdk/core/typespec_client_core/src/http/response.rs to reduce peak
retained memory during response body collection.

The previous implementation used a two-pass strategy: it accumulated all
incoming chunks into an intermediate Vec<Bytes> while tracking total
length, then allocated a single BytesMut with exact capacity and copied
the chunks into it. As discussed in #3893 and PR #3879, this meant the
full set of collected chunks remained live until the final contiguous
buffer was constructed, so both were briefly resident simultaneously.
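For readers without the diff at hand, the old shape can be sketched roughly as follows. This is a hypothetical stdlib-only sketch, with `Vec<u8>` standing in for `Bytes`/`BytesMut` and an illustrative function name; it is not the actual source.

```rust
// Hypothetical sketch of the prior two-pass collection strategy.
fn collect_two_pass(chunks: Vec<Vec<u8>>) -> Vec<u8> {
    // Pass 1: stage every chunk and track the total length.
    let mut staged = Vec::new();
    let mut total = 0;
    for chunk in chunks {
        total += chunk.len();
        staged.push(chunk);
    }
    // Pass 2: one exact-capacity allocation, then copy every chunk.
    // `staged` stays live until this loop finishes, so peak retained
    // memory is roughly twice the body size.
    let mut out = Vec::with_capacity(total);
    for chunk in &staged {
        out.extend_from_slice(chunk);
    }
    out
}

fn main() {
    let body = collect_two_pass(vec![vec![1u8; 4], vec![2u8; 4]]);
    assert_eq!(body.len(), 8);
}
```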

This change replaces that with a single-pass loop that appends each chunk
directly into a growing BytesMut. This removes the
intermediate Vec<Bytes> staging step and reduces peak retained memory
during collection. The trade-off is amortized buffer growth rather than a
single exact-capacity allocation.
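The new shape is the following, again as a hypothetical stdlib-only sketch (`Vec<u8>` in place of `BytesMut`, illustrative name):

```rust
// Hypothetical sketch of the single-pass replacement.
fn collect_single_pass(chunks: Vec<Vec<u8>>) -> Vec<u8> {
    // Append each chunk as it arrives; the buffer grows with amortized
    // doubling, but no second copy of the chunk set is ever retained.
    let mut out = Vec::new();
    for chunk in chunks {
        out.extend_from_slice(&chunk);
    }
    out
}

fn main() {
    let body = collect_single_pass(vec![vec![1u8; 4], vec![2u8; 4]]);
    assert_eq!(body.len(), 8);
}
```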

Validation

  • cargo test -p typespec_client_core → 139 passed, 0 failed
  • cargo clippy -p typespec_client_core -- -D warnings → 0 warnings

Benchmark Results

Local Criterion benchmarks (Apple Silicon, 100 samples each) compare a
reconstruction of the prior two-pass collection strategy against the
current single-pass implementation, measuring the collection algorithm in
isolation without network I/O. The benchmark lives in azure_core rather
than typespec_client_core because the public AsyncResponseBody type is
re-exported through azure_core and the Criterion infrastructure is
already present there.

Run with: cargo bench -p azure_core --bench collect_benchmark

Times are Criterion median values (ns = nanoseconds, µs = microseconds, ms = milliseconds; 1ms = 1,000µs = 1,000,000ns). Lower is faster.

Workload                           two_pass (reconstructed from PR #3879)   single_pass (this PR)
10 chunks × 1 KB (10 KB total)     20.887 µs                                886.55 ns
100 chunks × 1 KB (100 KB total)   209.17 µs                                7.561 µs
10 chunks × 64 KB (640 KB total)   1.316 ms                                 30.881 µs
10 chunks × 1 MB (10 MB total)     20.936 ms                                901.90 µs

In these local runs, the single-pass approach was faster across all tested
cases. In production, network I/O will dominate total latency, so these
ratios are specific to the isolated benchmark and are not meant to predict
real-world throughput gains. The primary motivation for this change remains
peak memory, not throughput.

Peak RSS measurement is not included here, but I can add an allocator profile or memory-focused benchmark if that would be useful.


github-actions bot commented Apr 4, 2026

Thank you for your contribution @nathanielmiller23! We will review the pull request and get back to you soon.

github-actions bot added labels on Apr 4, 2026: Azure.Core (the azure_core crate), Community Contribution (community members are working on the issue), and customer-reported (issues reported by GitHub users external to the Azure organization).
@nathanielmiller23
Author

@microsoft-github-policy-service agree
