feat(cli): add okactl command-line tool for sandbox operations #497
Liquorice-Ma wants to merge 9 commits
Codecov Report
@@            Coverage Diff             @@
## master #497 +/- ##
==========================================
+ Coverage 78.34% 80.13% +1.79%
==========================================
Files 162 208 +46
Lines 11739 15299 +3560
==========================================
+ Hits 9197 12260 +3063
- Misses 2187 2587 +400
- Partials 355 452 +97
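| The OpenKruise Agents project originally had four components (controller, manager, gateway, runtime), all of them long-running services. In day-to-day operations, scaling a SandboxSet, updating images, or restarting containers could only be done through `kubectl edit` or hand-written YAML, which is tedious and error-prone.
| To address this, we added a fifth component, **agent-cli**: a kubectl-style command-line tool that provides three core commands:
| The first two commands (scale, set image) are **pure client-side operations**: they modify the SandboxSet CR directly through the K8s API, and the existing SandboxSet controller handles the change automatically.
| The third command (restart) uses a **CRD-driven pattern** (modeled on OpenKruise's ContainerRecreateRequest): the CLI creates a `SandboxContainerRestart` CR, and a new controller watches for it and performs the actual container restart.
Having the CLI create a CRR is enough; there is no need to create a SandboxContainerRestart CR.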
| @@ -0,0 +1,227 @@
| # agent-cli Command-Line Tool: Design and Implementation
Please put the proposal in docs/proposals, and rewrite the design in English for a wider audience.
feat(cli): add okactl command-line tool for sandbox operations
fix(cli): add in-cluster config support for okactl running inside Pods
Update okactl binary
okactl scale/setimage/restart -h
Force-pushed: 920a645 to 57d6ff2
Force-pushed: 06fb203 to b48e3df
Add create suo subcommand that creates SandboxUpdateOps to batch update container images of claimed sandboxes by label selector. Includes auto-cleanup of existing SUOs, container name validation, and correct PodTemplateSpec-level patch structure. Also adds proposal doc for the create suo feature.
- Add E2E tests for scale, set image, create suo, and restart commands
- Add dedicated e2e-okactl.yaml workflow running on kind cluster
- Add unit tests to meet 80% coverage requirement
- Fix set image optimistic lock conflict with retry.RetryOnConflict
Force-pushed: 59cca84 to cbfe589
… display
- Add top-level 'okactl status' command group with 'sbs' and 'suo' subcommands
- status sbs: show SandboxSet rolling update progress with auto-diagnosis
- status suo: show SandboxUpdateOps batch update progress
- Both support --wait flag for polling until completion
- Add cobra Aliases for resource short names (sandboxset/sbs, sandboxupdateops/suo)
- Customize cobra usage template to display subcommand aliases in help output
- Remove 'set image status' subcommand (replaced by top-level 'status sbs')
- Update set image examples to reference 'okactl status sbs'
- Add developer manual for okactl CLI
- Add proposal document for status command design
- Add unit tests (coverage: 85.2%) and E2E tests for new commands
Signed-off-by: 马赫 <Yurong.mh@alibaba-inc.com>
furykerry left a comment
Please never commit a binary to the repo.
| }
|
| var deleted []string
| for i := range list.Items {
deleteActiveSandboxUpdateOps deletes ALL SUOs, not just the active ones. Please delete only SUOs with Phase == Pending || Phase == Updating.
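The filter requested above could look roughly like the following. This is a minimal sketch, not the PR's code: the `suo` struct is a hypothetical stand-in for the real SandboxUpdateOps type, and the phase strings are taken from the review comment (the actual constants, and the `Succeeded` value used in the demo, are assumptions).

```go
package main

import "fmt"

// suo is a hypothetical stand-in for the SandboxUpdateOps type; only the
// fields needed for the filter are modelled here.
type suo struct {
	Name  string
	Phase string
}

// isActive reports whether an SUO is still in progress. The phase names come
// from the review comment; the real API constants may be named differently.
func isActive(s suo) bool {
	return s.Phase == "Pending" || s.Phase == "Updating"
}

// activeNames returns the names of the SUOs that should be deleted,
// skipping every SUO that has already reached a terminal phase.
func activeNames(items []suo) []string {
	var names []string
	for i := range items {
		if isActive(items[i]) {
			names = append(names, items[i].Name)
		}
	}
	return names
}

func main() {
	items := []suo{
		{Name: "a", Phase: "Pending"},
		{Name: "b", Phase: "Succeeded"}, // assumed terminal phase, kept
		{Name: "c", Phase: "Updating"},
	}
	fmt.Println(activeNames(items))
}
```

In the real function the delete call would sit where `names = append(...)` is, so finished SUOs stay available for inspection.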
| }
|
| if sbx.Spec.Template == nil {
| return nil
When Template is nil, please fetch the referenced SandboxTemplate and validate against it.
| // diagnoseSandboxSetUpdate checks sandboxes belonging to a SandboxSet and reports any issues.
| // It builds a kubernetes client to inspect pod status when sandbox messages are empty.
| func diagnoseSandboxSetUpdate(globalOpts *GlobalOptions, sbs *agentsv1alpha1.SandboxSet, reported *map[string]bool) {
Each diagnosis call creates two new REST clients via globalOpts.AgentsClient() and globalOpts.KubeClient(), which involves re-reading the kubeconfig and establishing new TLS connections. In --wait mode this happens every 3 poll cycles (~9s). Consider passing the already-created clients as parameters to diagnoseSandboxSetUpdate.
| }
|
| // formatSuoImagePairs formats a map of container=image pairs as a slice of "container=image" strings.
| func formatSuoImagePairs(images map[string]string) []string {
formatSuoImagePairs() and buildSuoImagePatch() generate non-deterministic output from map iteration. Consider sorting the keys before iteration in both functions.
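A minimal sketch of the sorted-keys approach suggested above, using only the standard library. The function name `formatImagePairsSorted` is hypothetical and is not the PR's implementation; the same key-collection step would apply to the patch builder.

```go
package main

import (
	"fmt"
	"sort"
)

// formatImagePairsSorted renders a container->image map as "container=image"
// strings in a stable order. Go randomizes map iteration order, so the keys
// are collected and sorted before the output slice is built.
func formatImagePairsSorted(images map[string]string) []string {
	keys := make([]string, 0, len(images))
	for k := range images {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, k+"="+images[k])
	}
	return pairs
}

func main() {
	images := map[string]string{"sidecar": "envoy:2.0", "app": "nginx:1.25"}
	// The order is the same on every run: app first, then sidecar.
	fmt.Println(formatImagePairsSorted(images))
}
```

Stable ordering also makes unit tests and E2E assertions on the printed output reliable.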
| }
|
| // parseContainerImages parses "container=image" pairs and returns a map.
| func parseContainerImages(args []string) (map[string]string, error) {
setimage.go and create.go duplicate parseContainerImages / parseSuoContainerImages. Consider extracting a shared parseImageArgs() to reduce duplication.
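One possible shape for the shared helper is sketched below. The name `parseImageArgs` comes from the review comment; the body, the error messages, and the rejection of duplicate container names are assumptions rather than the PR's actual behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// parseImageArgs parses "container=image" arguments into a map.
// It splits on the first '=' only, so the image part is kept verbatim.
// Rejecting duplicates is an assumption made for this sketch.
func parseImageArgs(args []string) (map[string]string, error) {
	images := make(map[string]string, len(args))
	for _, arg := range args {
		name, image, ok := strings.Cut(arg, "=")
		if !ok || name == "" || image == "" {
			return nil, fmt.Errorf("invalid argument %q: expected container=image", arg)
		}
		if _, dup := images[name]; dup {
			return nil, fmt.Errorf("container %q specified more than once", name)
		}
		images[name] = image
	}
	return images, nil
}

func main() {
	images, err := parseImageArgs([]string{"app=nginx:1.25", "sidecar=envoy:2.0"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(images["app"], images["sidecar"])
}
```

Both set image and create suo could then call this one function and keep only their command-specific validation.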
| okactl status sbs my-pool --wait
|
| # Batch update images for claimed sandboxes
| okactl create suo -l app=my-app app=nginx:1.25
Is it possible to consolidate the create suo command with set image? For example:
okactl set image -l app=my-app app=nginx:1.25
Originally set image supported both sbs and sbx, but create suo was split out in an earlier review so that set image handles only SandboxSet, while claimed sandboxes are updated via SUO.
| okactl set image sbs my-pool app=nginx:1.25
|
| # Check update progress (or wait for completion)
| okactl status sbs my-pool --wait
--wait should be an option of set image, not of status.
Binary is now distributed via GitHub Releases instead of committing to the repository. Signed-off-by: 马赫 <Yurong.mh@alibaba-inc.com>
- Use apierrors.IsNotFound instead of string matching in waitForSUODeletion
- Replace parseSuoSelectorToMap with metav1.ParseToLabelSelector for full label selector syntax support (key in (v1,v2), key!=value, etc.)
- Validate container names against all matching sandboxes instead of only the first; warn on partial mismatch, error only when missing from all
- Add Running phase check to restart command before creating CRR
- Remove --wait flag from status command (belongs to set image only)
- Sync developer manual and proposal documents
Signed-off-by: 马赫 <Yurong.mh@alibaba-inc.com>
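The "warn on partial mismatch, error only when missing from all" rule in this commit can be illustrated with a small standalone sketch. The function name `validateContainer` and the map-based input are hypothetical; the real code works on Sandbox objects fetched from the cluster.

```go
package main

import (
	"fmt"
	"sort"
)

// validateContainer checks one container name against every matching sandbox.
// sandboxes maps a sandbox name to its container names (a hypothetical shape).
// It returns the sandboxes that lack the container, which the caller would
// report as a warning, and an error only when no sandbox has the container.
func validateContainer(container string, sandboxes map[string][]string) ([]string, error) {
	var missing []string
	for sbx, containers := range sandboxes {
		found := false
		for _, c := range containers {
			if c == container {
				found = true
				break
			}
		}
		if !found {
			missing = append(missing, sbx)
		}
	}
	sort.Strings(missing) // stable order for messages and tests

	if len(sandboxes) > 0 && len(missing) == len(sandboxes) {
		return missing, fmt.Errorf("container %q not found in any matching sandbox", container)
	}
	return missing, nil
}

func main() {
	sandboxes := map[string][]string{
		"sbx-a": {"app", "sidecar"},
		"sbx-b": {"app"},
	}
	missing, err := validateContainer("sidecar", sandboxes)
	fmt.Println(missing, err)
}
```

Here the call succeeds with a warning list containing only sbx-b; asking for a container that neither sandbox defines would return an error instead.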
Ⅰ. Describe what this PR does
Add a new CLI component agent-cli — a kubectl-style command-line tool for common sandbox operations, eliminating the need to hand-write YAML or use kubectl edit.
Three commands are introduced:
- agent-cli scale sandboxset <name> --replicas=N: scales a SandboxSet's replica count via JSON Merge Patch (atomic, no optimistic-lock conflicts).
- agent-cli set image sandboxset <name> container=image [...]: updates one or more container images in a SandboxSet's inline template. Detects TemplateRef usage and guides users to modify the SandboxTemplate directly.
- agent-cli restart sandbox <name> [-c container ...]: restarts containers in a running Sandbox by creating a SandboxContainerRestart CR. If no -c flags are specified, all user containers are restarted.
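For the scale command, a JSON Merge Patch body only needs to carry the field being changed, which is why no resourceVersion (and hence no optimistic-lock conflict) is involved. The sketch below builds such a body with the standard library; the `spec.replicas` path and the `scalePatch` name are assumptions, not taken from the PR's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// scalePatch builds a JSON Merge Patch (RFC 7386) body that sets only the
// replica count. The spec.replicas field path is assumed for this sketch.
func scalePatch(replicas int32) ([]byte, error) {
	patch := map[string]any{
		"spec": map[string]any{
			"replicas": replicas,
		},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := scalePatch(5)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(body))
}
```

The resulting bytes would be sent with the merge-patch content type (application/merge-patch+json); fields not named in the body are left untouched by the API server.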
The restart command follows a CRD-driven pattern (inspired by OpenKruise's ContainerRecreateRequest):
- The CLI creates a SandboxContainerRestart CR
- A new controller watches these CRs and executes container restarts via kubectl exec kill -TERM 1
- The CR tracks per-container status (Pending → Succeeded/Failed) and supports TTL-based auto-cleanup
New CRD: SandboxContainerRestart (shortName: scr) with support for:
- Failure policies (Fail / Ignore)
- Ordered or parallel restart
- Active deadline timeout
- TTL-based auto-cleanup after completion
Ⅱ. Does this pull request fix one issue?
NONE
Ⅲ. Describe how to verify it
Build the CLI:
go build -o agent-cli ./cmd/agent-cli/
Scale a SandboxSet:
./agent-cli scale sandboxset my-sbs --replicas=5 -n sandbox-system
Update container images:
./agent-cli set image sandboxset my-sbs main=nginx:2.0 sidecar=envoy:2.0 -n sandbox-system
Restart containers in a Sandbox:
# Restart specific containers
./agent-cli restart sandbox my-sbx -c main -c sidecar -n sandbox-system
# Restart all user containers
./agent-cli restart sandbox my-sbx -n sandbox-system
Run unit tests:
go test ./pkg/cli/... ./pkg/controller/sandboxcontainerrestart/...
Ⅳ. Special notes for reviews
The compiled binary agent-cli (47MB) is included in this commit and should be removed before merge — it should be built via CI instead.
SandboxContainerRestart uses dynamic.Interface in the CLI because the generated clientset does not yet include this new type. After running code-gen, this can be switched to a typed client.
The orderedRecreate and minStartedSeconds strategy fields are defined in the CRD spec but not yet implemented in the controller — these are reserved for future iterations.
After merge, make generate manifests should be run to generate proper deepcopy functions and CRD YAML (the current types include hand-written DeepCopyObject stubs).