feat: add session observability GET endpoints for sandboxes#331
feat: add session observability GET endpoints for sandboxes#331Abhinav-kodes wants to merge 1 commit into
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
f3820f4 to
7f79307
Compare
There was a problem hiding this comment.
Code Review
This pull request introduces functionality to retrieve individual sandboxes by session ID and list sandboxes filtered by kind. Changes include updates to the Store interface, Redis and Valkey implementations using SCAN for efficiency, and new HTTP handlers in the workload manager. Feedback highlights a potential failure in the Valkey implementation if keys expire during a scan, and suggests enforcing kind-based validation in the GET handler to prevent unauthorized cross-kind access. Additionally, indentation in the new handlers should be corrected to use tabs for consistency with Go standards.
There was a problem hiding this comment.
Pull request overview
Adds “session observability” read endpoints to the Workload Manager API so clients can retrieve a session by ID or list active sessions, backed by new store capabilities for enumerating sandboxes by kind.
Changes:
- Added
GET /v1/{agent-runtime|code-interpreter}/sessionsandGET /v1/{agent-runtime|code-interpreter}/sessions/:sessionIdroutes and handlers in Workload Manager. - Extended
store.StorewithListSandboxesByKindand implemented it for Redis + Valkey usingSCAN. - Added/updated unit tests for the new handlers and store methods, plus updated fakes to satisfy the new interface.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/workloadmanager/server.go | Registers new GET session routes for agent-runtime and code-interpreter APIs. |
| pkg/workloadmanager/handlers.go | Implements handleGetSandbox and handleListSandboxes with optional auth namespace filtering. |
| pkg/workloadmanager/handlers_test.go | Adds unit tests for the new GET handlers and updates fake store. |
| pkg/workloadmanager/garbage_collection_test.go | Updates nopStore to satisfy the extended store interface. |
| pkg/store/interface.go | Adds ListSandboxesByKind to the Store interface. |
| pkg/store/store_redis.go | Implements Redis ListSandboxesByKind via SCAN + batch load + kind filter. |
| pkg/store/store_redis_test.go | Adds a unit test for Redis ListSandboxesByKind. |
| pkg/store/store_valkey.go | Implements Valkey ListSandboxesByKind via SCAN + batch load + kind filter. |
| pkg/store/store_valkey_test.go | Adds a unit test for Valkey ListSandboxesByKind. |
| pkg/router/session_manager_test.go | Updates router test fake store to satisfy the extended store interface. |
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #331 +/- ##
==========================================
+ Coverage 47.57% 50.20% +2.63%
==========================================
Files 30 30
Lines 2819 2960 +141
==========================================
+ Hits 1341 1486 +145
+ Misses 1338 1313 -25
- Partials 140 161 +21
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
44d7045 to
edbec37
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements functionality to retrieve and list sandboxes by kind, adding new API endpoints and supporting logic in the Redis and Valkey storage backends. Feedback focuses on performance optimizations, specifically recommending secondary indices to replace O(N) scans when listing sandboxes, improving slice pre-allocation for memory efficiency, and strengthening result-length validation during batch operations.
ea991fe to
9cd1422
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces the ability to list and retrieve sandboxes by their kind across Redis and Valkey storage backends, adding new API endpoints and namespace-based authorization. Review feedback suggests using non-blocking SSCAN instead of SMembers to prevent performance issues with large datasets and recommends logging errors during secondary index cleanup in DeleteSandboxBySessionID to avoid orphaned session IDs.
9cd1422 to
4e7ab3b
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces new functionality to list and retrieve sandboxes by their kind, implemented across the store interfaces (Redis and Valkey) and the workload manager server. The changes include adding secondary indexing for sandbox kinds to support efficient querying, updating the API routes, and providing corresponding test coverage. My review identified several performance and consistency concerns, specifically regarding unbounded batch operations in Redis/Valkey and the lack of atomicity in deletion operations, which I recommend addressing through batching and Lua scripting.
4e7ab3b to
0fcba68
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements functionality to retrieve and list sandboxes by kind for both Agent Runtime and Code Interpreter services. It introduces a new ListSandboxesByKind method to the storage interface with implementations for Redis and Valkey, and adds corresponding API endpoints with namespace-based authorization filtering. Feedback focused on potential scalability issues with the current SCAN-based filtering approach in the storage layer and suggested a more efficient in-place filtering implementation for the list handler to reduce memory allocations.
0fcba68 to
b35599d
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces functionality to list and retrieve sandboxes by kind and session ID across the storage and API layers. The Store interface is expanded with ListSandboxesByKind, implemented for both Redis and Valkey using efficient SCAN operations to prevent blocking. New REST endpoints are added to the workload manager to support these operations, incorporating namespace-based filtering when authentication is enabled. Review feedback suggests removing redundant nil checks in the Redis and Valkey store implementations, as the batch loading logic already ensures non-nil results.
Signed-off-by: Abhinav Singh <abhinavsingh717073@gmail.com>
b35599d to
ae72e13
Compare
|
@LiZhenCheng9527 @hzxuzhonghu @YaoZengzeng a dedicated Redis SET per kind ( However, implementing a secondary index requires changes to For the current scale of the project, the SCAN-based approach with per-batch filtering (100 keys at a time) should be sufficient. |
|
Thank you for the update and for considering the scalability implications. Your plan to track the secondary index implementation as a follow-up optimization is reasonable, especially given the current scale of the project. The use of |
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR completes the CRUD operations for the Workload Manager API by introducing "Session Observability" endpoints.
Previously, the Workload Manager only supported creating (POST) and deleting (DELETE) sandboxes. If a client disconnected or lost the initial response, there was no native way to retrieve the entrypoints or check the status of a running session.
This PR adds:
GET /v1/agent-runtime/sessions/:sessionId(and/code-interpreter) to retrieve a specific session's details.GET /v1/agent-runtime/sessions(and/code-interpreter) to list all active sessions.ListSandboxesByKindimplementation in thestore.Storeinterface, backed by non-blocking SCAN loops for both Valkey and Redis.Which issue(s) this PR fixes:
Special notes for your reviewer:
The
ListSandboxesByKindimplementations for Valkey and Redis utilize theSCANcommand rather thanKEYSto ensure the newGET /sessionsendpoint does not block the storage engine when operating at scale.Does this PR introduce a user-facing change?: