Harden PicoD server: graceful shutdown, symlink guard, doc fix #337

Open
Abhinav-kodes wants to merge 3 commits into volcano-sh:main from Abhinav-kodes:harden-picod-server

@Abhinav-kodes
Contributor

What type of PR is this?

/kind security
/kind enhancement

What this PR does / why we need it:

This PR hardens the PicoD server with three targeted improvements:

  1. Graceful shutdown: Run() now accepts a context.Context and shuts down the HTTP server cleanly on SIGINT/SIGTERM, allowing in-flight requests to complete before exit. This is foundational for the stateful execution direction ([Proposal] Position AgentCube as a Stateful, Isolated, Concurrent Rollout Execution Layer for Agentic RL and Verifiable Agentic Tasks #267).

  2. Symlink traversal guard in upload handlers: handleMultipartUpload and handleJSONBase64Upload used os.MkdirAll to create parent directories, which follows symlinks in existing path components. An attacker with workspace write access could place a symlink pointing outside the jail, causing directory creation beyond the workspace boundary. Both call sites now use the existing mkdirSafe guard, consistent with ExecuteHandler.

  3. Doc fix: The ExecuteRequest.Timeout comment stated the default was "30s", but the actual default is 60s.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

  • The setWorkspace call site intentionally keeps os.MkdirAll — it creates the workspace root itself at startup from admin config, and mkdirSafe uses workspaceDir as the jail root, so using it there would be circular.
  • The graceful shutdown goroutine in Run() mirrors the pattern used by the workloadmanager's Shutdown() method.

Does this PR introduce a user-facing change?:

NONE

Copilot AI review requested due to automatic review settings May 14, 2026 13:01
@volcano-sh-bot volcano-sh-bot added the kind/enhancement New feature or request label May 14, 2026
@volcano-sh-bot volcano-sh-bot requested a review from acsoto May 14, 2026 13:01
@volcano-sh-bot
Contributor

@Abhinav-kodes: The label(s) kind/security cannot be applied, because the repository doesn't have them.

@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign acsoto for approval. For more information see the Kubernetes Code Review Process.


Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements graceful shutdown for the picod server by introducing a signal-aware context and updating the server's Run method. Additionally, it updates the default execution timeout documentation and switches to a safer directory creation method. The review feedback correctly identifies a race condition in the shutdown logic where the main process could exit prematurely, as well as an issue where a clean shutdown is incorrectly reported as a fatal error. A code suggestion was provided to address these concerns using a synchronization channel.

Comment thread pkg/picod/server.go Outdated
@codecov-commenter
codecov-commenter commented May 14, 2026

Codecov Report

❌ Patch coverage is 0% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.93%. Comparing base (524e55e) to head (02cb899).
⚠️ Report is 54 commits behind head on main.

Files with missing lines    Patch %    Lines
pkg/picod/server.go         0.00%      13 Missing ⚠️
pkg/picod/files.go          0.00%      0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #337      +/-   ##
==========================================
+ Coverage   47.57%   48.93%   +1.36%     
==========================================
  Files          30       30              
  Lines        2819     2869      +50     
==========================================
+ Hits         1341     1404      +63     
+ Misses       1338     1312      -26     
- Partials      140      153      +13     
Flag Coverage Δ
unittests 48.93% <0.00%> (+1.36%) ⬆️

@Abhinav-kodes
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements graceful shutdown for the picod server by integrating signal handling and context-aware execution. It also increases the default command execution timeout to 60 seconds and refactors directory creation to use a safer internal method. Feedback focuses on improving the server's state management by using a local variable for the HTTP server instead of a struct field to prevent potential race conditions, and increasing the shutdown timeout to 90 seconds to better accommodate long-running requests.

Comment thread pkg/picod/server.go Outdated
Comment thread pkg/picod/server.go Outdated
Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@Abhinav-kodes
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements graceful shutdown for the picod server by integrating signal handling in the main entry point and utilizing the http.Server.Shutdown method. Additionally, it updates the default command timeout documentation and refactors directory creation to use a safer internal method. A review comment suggests avoiding the use of a magic number for the shutdown timeout by defining it as a constant or making it configurable.

Comment thread pkg/picod/server.go
Signed-off-by: Abhinav Singh <abhinavsingh717073@gmail.com>
Signed-off-by: Abhinav Singh <abhinavsingh717073@gmail.com>
Signed-off-by: Abhinav Singh <abhinavsingh717073@gmail.com>
Labels

kind/enhancement New feature or request size/M

4 participants