fix (typespec_client_core): reduce peak memory in AsyncResponseBody::collect #4093
Open
nathanielmiller23 wants to merge 5 commits into Azure:main
Conversation
added 5 commits on April 3, 2026
Thank you for your contribution @nathanielmiller23! We will review the pull request and get back to you soon.
Author: @microsoft-github-policy-service agree
Iterates on #3893.
What changed
Refactors `AsyncResponseBody::collect` in `sdk/core/typespec_client_core/src/http/response.rs` to reduce peak retained memory during response body collection.
The previous implementation used a two-pass strategy: it accumulated all incoming chunks into an intermediate `Vec<Bytes>` while tracking total length, then allocated a single `BytesMut` with exact capacity and copied the chunks into it. As discussed in #3893 and PR #3879, this meant the full set of collected chunks remained live until the final contiguous buffer was constructed, so both were briefly resident simultaneously.
This change replaces that with a single-pass loop that appends each chunk directly into a growing `BytesMut`. This removes the intermediate `Vec<Bytes>` staging step and reduces peak retained memory during collection. The trade-off is amortized buffer growth rather than a single exact-capacity allocation.
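The single-pass shape can be sketched the same way (again with `Vec<u8>` standing in for `BytesMut` and an iterator standing in for the async stream; this is an illustration, not the PR's code):

```rust
// Sketch of the single-pass strategy: append each chunk into one
// growing buffer. Each chunk can be dropped as soon as it is copied,
// so at most one chunk plus the buffer is resident at any point.
fn collect_single_pass<I: IntoIterator<Item = Vec<u8>>>(chunks: I) -> Vec<u8> {
    let mut out = Vec::new();
    for chunk in chunks {
        // Amortized growth: `extend_from_slice` may reallocate, but
        // reallocation grows geometrically, so total copying stays O(n).
        out.extend_from_slice(&chunk);
        // `chunk` is dropped here, before the next one arrives.
    }
    out
}

fn main() {
    let body = collect_single_pass(vec![b"hello ".to_vec(), b"world".to_vec()]);
    assert_eq!(body, b"hello world");
}
```

With `BytesMut`, the real implementation gets the same property: only the buffer and the chunk currently being appended are live, rather than the whole chunk list plus the final buffer.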
Validation

- `cargo test -p typespec_client_core` → 139 passed, 0 failed
- `cargo clippy -p typespec_client_core -- -D warnings` → 0 warnings

Benchmark Results
Local Criterion benchmarks (Apple Silicon, 100 samples each) compare a reconstruction of the prior two-pass collection strategy against the current single-pass implementation, measuring the collection algorithm in isolation without network I/O. The benchmark lives in `azure_core` rather than `typespec_client_core` because the public `AsyncResponseBody` type is re-exported through `azure_core` and the Criterion infrastructure is already present there.
Run with: `cargo bench -p azure_core --bench collect_benchmark`

Times are Criterion median values (ns = nanoseconds, µs = microseconds, ms = milliseconds; 1 ms = 1,000 µs = 1,000,000 ns). Lower is faster.
In these local runs, the single-pass approach was faster across all tested
cases. In production, network I/O will dominate total latency, so these
ratios are specific to the isolated benchmark and are not meant to predict
real-world throughput gains. The primary motivation for this change remains
peak memory, not throughput.
Peak RSS measurement is not included here, but I can add an allocator profile or memory-focused benchmark if that would be useful.