feat: add session observability GET endpoints for sandboxes #331

Open
Abhinav-kodes wants to merge 1 commit into volcano-sh:main from Abhinav-kodes:feat-get-session-endpoints

@Abhinav-kodes
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:
This PR adds the missing read operations to the Workload Manager API by introducing "Session Observability" endpoints.

Previously, the Workload Manager only supported creating (POST) and deleting (DELETE) sandboxes. If a client disconnected or lost the initial response, there was no native way to retrieve the entrypoints or check the status of a running session.

This PR adds:

  • GET /v1/agent-runtime/sessions/:sessionId (and the equivalent /v1/code-interpreter route) to retrieve a specific session's details.
  • GET /v1/agent-runtime/sessions (and the equivalent /v1/code-interpreter route) to list all active sessions.
  • A new ListSandboxesByKind method on the store.Store interface, implemented for both Valkey and Redis with incremental (non-blocking) SCAN loops.
  • Authorization filtering to ensure authenticated users can only view and list sessions within their own namespaces.
  • Comprehensive unit tests covering the new handlers, data store queries, and authorization boundaries.

Which issue(s) this PR fixes:

Special notes for your reviewer:
The ListSandboxesByKind implementations for Valkey and Redis use the SCAN command rather than KEYS, so the new GET /sessions endpoint does not block the storage engine when operating at scale.

Does this PR introduce a user-facing change?:

Added `GET /sessions` and `GET /sessions/:sessionId` endpoints to the Workload Manager API, allowing clients to list active sandboxes and retrieve connection entrypoints for existing sessions.

Copilot AI review requested due to automatic review settings May 13, 2026 13:18
@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yaozengzeng for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces functionality to retrieve individual sandboxes by session ID and list sandboxes filtered by kind. Changes include updates to the Store interface, Redis and Valkey implementations using SCAN for efficiency, and new HTTP handlers in the workload manager. Feedback highlights a potential failure in the Valkey implementation if keys expire during a scan, and suggests enforcing kind-based validation in the GET handler to prevent unauthorized cross-kind access. Additionally, indentation in the new handlers should be corrected to use tabs for consistency with Go standards.

Comment thread pkg/store/store_valkey.go Outdated
Comment thread pkg/workloadmanager/handlers.go Outdated
Comment thread pkg/workloadmanager/server.go Outdated
Comment thread pkg/workloadmanager/server.go Outdated
Comment thread pkg/workloadmanager/handlers_test.go Outdated
Contributor

Copilot AI left a comment

Pull request overview

Adds “session observability” read endpoints to the Workload Manager API so clients can retrieve a session by ID or list active sessions, backed by new store capabilities for enumerating sandboxes by kind.

Changes:

  • Added GET /v1/{agent-runtime|code-interpreter}/sessions and GET /v1/{agent-runtime|code-interpreter}/sessions/:sessionId routes and handlers in Workload Manager.
  • Extended store.Store with ListSandboxesByKind and implemented it for Redis + Valkey using SCAN.
  • Added/updated unit tests for the new handlers and store methods, plus updated fakes to satisfy the new interface.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Summary per file:

  • pkg/workloadmanager/server.go: Registers new GET session routes for agent-runtime and code-interpreter APIs.
  • pkg/workloadmanager/handlers.go: Implements handleGetSandbox and handleListSandboxes with optional auth namespace filtering.
  • pkg/workloadmanager/handlers_test.go: Adds unit tests for the new GET handlers and updates fake store.
  • pkg/workloadmanager/garbage_collection_test.go: Updates nopStore to satisfy the extended store interface.
  • pkg/store/interface.go: Adds ListSandboxesByKind to the Store interface.
  • pkg/store/store_redis.go: Implements Redis ListSandboxesByKind via SCAN + batch load + kind filter.
  • pkg/store/store_redis_test.go: Adds a unit test for Redis ListSandboxesByKind.
  • pkg/store/store_valkey.go: Implements Valkey ListSandboxesByKind via SCAN + batch load + kind filter.
  • pkg/store/store_valkey_test.go: Adds a unit test for Valkey ListSandboxesByKind.
  • pkg/router/session_manager_test.go: Updates router test fake store to satisfy the extended store interface.

Comment thread pkg/store/store_redis.go
Comment thread pkg/store/store_valkey.go
Comment thread pkg/workloadmanager/handlers.go Outdated
Comment thread pkg/store/store_redis_test.go
Comment thread pkg/store/store_valkey_test.go
@codecov-commenter

codecov-commenter commented May 13, 2026

Codecov Report

❌ Patch coverage is 81.48148% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.20%. Comparing base (524e55e) to head (ae72e13).
⚠️ Report is 54 commits behind head on main.

File with missing lines           Patch %   Lines
pkg/store/store_valkey.go         74.35%    5 Missing and 5 partials ⚠️
pkg/store/store_redis.go          79.31%    3 Missing and 3 partials ⚠️
pkg/workloadmanager/server.go      0.00%    4 Missing ⚠️
@@            Coverage Diff             @@
##             main     #331      +/-   ##
==========================================
+ Coverage   47.57%   50.20%   +2.63%     
==========================================
  Files          30       30              
  Lines        2819     2960     +141     
==========================================
+ Hits         1341     1486     +145     
+ Misses       1338     1313      -25     
- Partials      140      161      +21     
Flag Coverage Δ
unittests 50.20% <81.48%> (+2.63%) ⬆️


@Abhinav-kodes
Contributor Author

/gemini review

Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@Abhinav-kodes Abhinav-kodes requested a review from Copilot May 13, 2026 14:14
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements functionality to retrieve and list sandboxes by kind, adding new API endpoints and supporting logic in the Redis and Valkey storage backends. Feedback focuses on performance optimizations, specifically recommending secondary indices to replace O(N) scans when listing sandboxes, improving slice pre-allocation for memory efficiency, and strengthening result-length validation during batch operations.
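A sketch of per-batch loading that applies two of these suggestions (a capacity hint for the result slice and a result-length check) and also tolerates keys that expire between SCAN and the fetch. It assumes, hypothetically, that each sandbox is stored as a JSON string and read back MGET-style, with a nil entry for a missing key; `sandboxRecord` and `decodeBatch` are illustrative names, not the PR's code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sandboxRecord is a HYPOTHETICAL stored shape; the real schema may differ.
type sandboxRecord struct {
	SessionID string `json:"sessionId"`
	Kind      string `json:"kind"`
}

// decodeBatch turns the values fetched for one batch of keys into records of
// the requested kind. values[i] must correspond to keys[i]; a nil entry means
// the key vanished (e.g. TTL expiry after SCAN returned it) and is skipped.
func decodeBatch(keys []string, values []*string, kind string) ([]sandboxRecord, error) {
	// A short or long reply means the key/value alignment cannot be trusted.
	if len(values) != len(keys) {
		return nil, fmt.Errorf("batch returned %d values for %d keys", len(values), len(keys))
	}
	// Pre-size for the worst case (every key matches) to avoid regrowth.
	out := make([]sandboxRecord, 0, len(keys))
	for i, v := range values {
		if v == nil {
			continue
		}
		var rec sandboxRecord
		if err := json.Unmarshal([]byte(*v), &rec); err != nil {
			return nil, fmt.Errorf("decode %s: %w", keys[i], err)
		}
		if rec.Kind == kind {
			out = append(out, rec)
		}
	}
	return out, nil
}
```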

Comment thread pkg/store/store_redis.go
Comment thread pkg/store/store_valkey.go Outdated
Comment thread pkg/store/store_valkey.go
Comment thread pkg/workloadmanager/handlers.go Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 15 changed files in this pull request and generated 2 comments.

Comment thread pkg/store/store_redis.go
Comment thread pkg/store/store_valkey.go
Copilot AI review requested due to automatic review settings May 13, 2026 14:35
@Abhinav-kodes Abhinav-kodes force-pushed the feat-get-session-endpoints branch from ea991fe to 9cd1422 May 13, 2026 14:37
@Abhinav-kodes
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces the ability to list and retrieve sandboxes by their kind across Redis and Valkey storage backends, adding new API endpoints and namespace-based authorization. Review feedback suggests using non-blocking SSCAN instead of SMembers to prevent performance issues with large datasets and recommends logging errors during secondary index cleanup in DeleteSandboxBySessionID to avoid orphaned session IDs.

Comment thread pkg/store/store_redis.go
Comment thread pkg/store/store_valkey.go
Comment thread pkg/store/store_redis.go Outdated
Comment thread pkg/store/store_valkey.go Outdated
Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@Abhinav-kodes
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces new functionality to list and retrieve sandboxes by their kind, implemented across the store interfaces (Redis and Valkey) and the workload manager server. The changes include adding secondary indexing for sandbox kinds to support efficient querying, updating the API routes, and providing corresponding test coverage. My review identified several performance and consistency concerns, specifically regarding unbounded batch operations in Redis/Valkey and the lack of atomicity in deletion operations, which I recommend addressing through batching and Lua scripting.

Comment thread pkg/workloadmanager/handlers.go Outdated
Comment thread pkg/store/store_redis.go Outdated
Comment thread pkg/store/store_redis.go Outdated
Comment thread pkg/store/store_valkey.go
Comment thread pkg/store/store_valkey.go Outdated
@Abhinav-kodes Abhinav-kodes force-pushed the feat-get-session-endpoints branch from 4e7ab3b to 0fcba68 May 13, 2026 15:11
Copilot AI review requested due to automatic review settings May 13, 2026 15:11
@Abhinav-kodes
Contributor Author

/gemini review

Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements functionality to retrieve and list sandboxes by kind for both Agent Runtime and Code Interpreter services. It introduces a new ListSandboxesByKind method to the storage interface with implementations for Redis and Valkey, and adds corresponding API endpoints with namespace-based authorization filtering. Feedback focused on potential scalability issues with the current SCAN-based filtering approach in the storage layer and suggested a more efficient in-place filtering implementation for the list handler to reduce memory allocations.
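The in-place filtering suggestion corresponds to the standard Go `s[:0]` idiom, which reuses the slice's backing array instead of allocating a second slice. A minimal sketch, with a hypothetical `sandboxSummary` type standing in for the real store record and a hypothetical `filterByNamespace` helper:

```go
package main

// sandboxSummary is a HYPOTHETICAL trimmed-down sandbox record.
type sandboxSummary struct {
	SessionID string
	Kind      string
	Namespace string
}

// filterByNamespace keeps only sandboxes whose namespace is in allowed,
// writing survivors into the front of the same backing array. Writes never
// run ahead of the read position, so no element is clobbered before it is read.
func filterByNamespace(items []sandboxSummary, allowed map[string]bool) []sandboxSummary {
	out := items[:0]
	for _, sb := range items {
		if allowed[sb.Namespace] {
			out = append(out, sb)
		}
	}
	// Zero the dropped tail so its strings can be garbage-collected.
	for i := len(out); i < len(items); i++ {
		items[i] = sandboxSummary{}
	}
	return out
}
```

The caller must treat the input slice as consumed, since its elements are overwritten.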

Comment thread pkg/store/store_redis.go
Comment thread pkg/store/store_valkey.go
Comment thread pkg/workloadmanager/handlers.go Outdated
@Abhinav-kodes Abhinav-kodes force-pushed the feat-get-session-endpoints branch from 0fcba68 to b35599d May 13, 2026 15:43
@Abhinav-kodes
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces functionality to list and retrieve sandboxes by kind and session ID across the storage and API layers. The Store interface is expanded with ListSandboxesByKind, implemented for both Redis and Valkey using efficient SCAN operations to prevent blocking. New REST endpoints are added to the workload manager to support these operations, incorporating namespace-based filtering when authentication is enabled. Review feedback suggests removing redundant nil checks in the Redis and Valkey store implementations, as the batch loading logic already ensures non-nil results.

Comment thread pkg/store/store_redis.go Outdated
Comment thread pkg/store/store_valkey.go Outdated
Signed-off-by: Abhinav Singh <abhinavsingh717073@gmail.com>
Copilot AI review requested due to automatic review settings May 13, 2026 16:09
@Abhinav-kodes Abhinav-kodes force-pushed the feat-get-session-endpoints branch from b35599d to ae72e13 May 13, 2026 16:09
Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@Abhinav-kodes
Contributor Author

@LiZhenCheng9527 @hzxuzhonghu @YaoZengzeng
I have applied all of the AI assistant's suggestions, but, as @gemini-code-assist pointed out, the SCAN-based approach is O(N) over all session keys and won't scale well to millions of sandboxes.

A dedicated Redis SET per kind (e.g. session:kind:agent-runtime) is the approach I'd lean towards: it provides O(K) lookups (where K is the number of sandboxes of that kind) without requiring a key schema migration.

However, implementing a secondary index requires changes to StoreSandbox and DeleteSandboxBySessionID to maintain the index (SADD on create, SREM on delete), which is outside the scope of this PR.
I'll track this as a follow-up optimization.

For the current scale of the project, the SCAN-based approach with per-batch filtering (100 keys at a time) should be sufficient.
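The proposed per-kind index can be modelled in memory to show the invariant that StoreSandbox and DeleteSandboxBySessionID would have to maintain. This is not Redis code: the maps stand in for the primary session keys and a per-kind SET such as session:kind:agent-runtime, and `kindIndexedStore` and its methods are hypothetical names:

```go
package main

import "sort"

// kindIndexedStore is an in-memory MODEL of the proposed layout, not Redis
// code. In Redis the paired writes below would be SET+SADD and DEL+SREM,
// ideally sent together (MULTI/EXEC or a Lua script) so the index cannot
// drift from the primary keys.
type kindIndexedStore struct {
	records map[string]string              // sessionID -> kind (primary keys)
	byKind  map[string]map[string]struct{} // kind -> set of sessionIDs (index)
}

func newKindIndexedStore() *kindIndexedStore {
	return &kindIndexedStore{
		records: make(map[string]string),
		byKind:  make(map[string]map[string]struct{}),
	}
}

// put stores a session and adds it to its kind's set (SET + SADD).
func (s *kindIndexedStore) put(sessionID, kind string) {
	if old, ok := s.records[sessionID]; ok {
		delete(s.byKind[old], sessionID) // re-put under a new kind
	}
	s.records[sessionID] = kind
	if s.byKind[kind] == nil {
		s.byKind[kind] = make(map[string]struct{})
	}
	s.byKind[kind][sessionID] = struct{}{}
}

// remove deletes a session and drops it from its kind's set (DEL + SREM).
func (s *kindIndexedStore) remove(sessionID string) {
	kind, ok := s.records[sessionID]
	if !ok {
		return
	}
	delete(s.records, sessionID)
	delete(s.byKind[kind], sessionID)
}

// listByKind touches only the K members of one set instead of all N keys.
// Set order is unspecified in Redis, so the IDs are sorted here.
func (s *kindIndexedStore) listByKind(kind string) []string {
	ids := make([]string, 0, len(s.byKind[kind]))
	for id := range s.byKind[kind] {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}
```

One caveat for the follow-up: if session keys carry a TTL, expiry removes the primary key but does not SREM the ID from the kind set, so readers would still need to skip IDs whose record is missing, or a periodic cleanup would have to prune them.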

@gemini-code-assist
Contributor

Thank you for the update and for considering the scalability implications. Your plan to track the secondary index implementation as a follow-up optimization is reasonable, especially given the current scale of the project. The use of SCAN with batch filtering is a pragmatic approach to avoid blocking the storage engine while still providing the requested functionality. I look forward to seeing the secondary index implementation in a future PR.


4 participants