feat(sandbox-manager): support auto-resume (wake-on-traffic) sandboxes#495

Open: AiRanthem wants to merge 6 commits into openkruise:master from AiRanthem:feature/auto-resume-260601

@AiRanthem

Member

Summary

  • Add wake-on-traffic annotation lifecycle and timeout parsing for E2B sandboxes.
  • Add scoped system credential support for manager-to-gateway wake requests.
  • Teach sandbox-gateway to resume paused sandboxes on matching traffic and add focused unit coverage.

Test Plan

  • go test ./pkg/sandbox-gateway/... ./pkg/sandbox-manager/... ./pkg/servers/e2b/... ./pkg/utils/proxyutils ./pkg/utils/timeout ./pkg/proxy

@kruise-bot kruise-bot requested review from furykerry and zmberg June 3, 2026 10:19
@kruise-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zmberg for approval by writing /assign @zmberg in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@codecov

codecov Bot commented Jun 3, 2026

Codecov Report

❌ Patch coverage is 91.35560% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.91%. Comparing base (0a98763) to head (e2cea07).
⚠️ Report is 15 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/sandbox-gateway/filter/filter.go | 87.50% | 8 Missing and 5 partials ⚠️ |
| pkg/servers/e2b/core.go | 33.33% | 10 Missing ⚠️ |
| pkg/servers/e2b/keys/systemkey.go | 93.33% | 3 Missing and 3 partials ⚠️ |
| ...g/sandbox-gateway/controller/gateway_controller.go | 0.00% | 5 Missing ⚠️ |
| pkg/sandbox-gateway/wake/wake.go | 95.29% | 2 Missing and 2 partials ⚠️ |
| pkg/sandbox-gateway/server/server.go | 75.00% | 1 Missing and 1 partial ⚠️ |
| pkg/sandbox-gateway/wake/client.go | 93.33% | 1 Missing and 1 partial ⚠️ |
| pkg/sandbox-manager/infra/sandboxcr/sandbox.go | 93.10% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #495      +/-   ##
==========================================
+ Coverage   79.37%   79.91%   +0.54%     
==========================================
  Files         189      194       +5     
  Lines       13308    13776     +468     
==========================================
+ Hits        10563    11009     +446     
- Misses       2365     2379      +14     
- Partials      380      388       +8     
| Flag | Coverage Δ |
|---|---|
| unittests | 79.91% <91.35%> (+0.54%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.


@AiRanthem AiRanthem changed the title from "feat: support wake-on-traffic sandboxes" to "feat(sandbox-manager): support auto-resume (wake-on-traffic) sandboxes" Jun 4, 2026
Persist the wake-on-traffic configuration on Sandbox CRs from E2B create and keep it synchronized with timeout updates. Add the AutoResume wire field, timeout annotation mutation support, validation for autoResume without autoPause, and focused tests for the manager and E2B surfaces.

Also carries the initial wake-on-traffic design spec so reviewers can evaluate the persisted configuration contract alongside the implementation.

Signed-off-by: AiRanthem <zhongtianyun.zty@alibaba-inc.com>
Introduce the cluster-scoped system key, route-level system auth scope, and cross-owner connect path used by the gateway. Thread AllowAnyOwner through the manager lookup path while keeping normal API-key ownership checks unchanged.

System callers receive no sandbox access token and get gateway-retryable wake failures mapped to HTTP 409; connect not-found behavior is mapped to 404.

Signed-off-by: AiRanthem <zhongtianyun.zty@alibaba-inc.com>
Add the gateway-side wake package, manager connect client, system-key reader, route WakeOnTraffic propagation, refresh behavior for paused routes, and async filter wake gate. The filter waits for the registry to observe Running before forwarding the original request and maps wake failures to local 502 responses.

Also shares the wake-on-traffic timeout codec with the E2B manager path and folds in review-driven fixes for system-key readiness, route refresh updates, and the design spec.

Signed-off-by: AiRanthem <zhongtianyun.zty@alibaba-inc.com>
Signed-off-by: AiRanthem <zhongtianyun.zty@alibaba-inc.com>
Signed-off-by: AiRanthem <zhongtianyun.zty@alibaba-inc.com>
@AiRanthem AiRanthem force-pushed the feature/auto-resume-260601 branch from f0115c0 to e2cea07 Compare June 10, 2026 02:50
@kruise-bot

@AiRanthem: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
