feat(cli): add okactl command-line tool for sandbox operations #497

Open
Liquorice-Ma wants to merge 9 commits into openkruise:master from Liquorice-Ma:agents-cli

@Liquorice-Ma


Ⅰ. Describe what this PR does

Add a new CLI component agent-cli — a kubectl-style command-line tool for common sandbox operations, eliminating the need to hand-write YAML or use kubectl edit.

Three commands are introduced:

- `agent-cli scale sandboxset <name> --replicas=N`: scale a SandboxSet's replica count via JSON Merge Patch (atomic, no optimistic-lock conflicts).
- `agent-cli set image sandboxset <name> container=image [...]`: update one or more container images in a SandboxSet's inline template. Detects TemplateRef usage and guides users to modify the SandboxTemplate directly.
- `agent-cli restart sandbox <name> [-c container ...]`: restart containers in a running Sandbox by creating a SandboxContainerRestart CR. If no -c flags are specified, all user containers are restarted.

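
For the scale command, a JSON Merge Patch body only needs to carry the field being changed. The following is a minimal sketch of building such a body, not the PR's actual code; `buildScalePatch` is a hypothetical name. With client-go, a body like this would typically be sent using the `types.MergePatchType` patch type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildScalePatch returns a JSON Merge Patch (RFC 7386) body that sets only
// spec.replicas. The function name is hypothetical, not taken from the PR.
func buildScalePatch(replicas int32) ([]byte, error) {
	patch := map[string]any{
		"spec": map[string]any{
			"replicas": replicas,
		},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := buildScalePatch(5)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // prints {"spec":{"replicas":5}}
}
```

Because the body carries no resourceVersion, the API server applies it without an optimistic-lock check, which is why this style of patch does not produce conflict errors.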
The restart command follows a CRD-driven pattern (inspired by OpenKruise's ContainerRecreateRequest):

1. CLI creates a SandboxContainerRestart CR
2. A new controller watches these CRs and executes container restarts via kubectl exec kill -TERM 1
3. The CR tracks per-container status (Pending → Succeeded/Failed) and supports TTL-based auto-cleanup

New CRD: SandboxContainerRestart (shortName: scr) with support for:

- Failure policies (Fail / Ignore)
- Ordered or parallel restart
- Active deadline timeout
- TTL-based auto-cleanup after completion

Ⅱ. Does this pull request fix one issue?

NONE

Ⅲ. Describe how to verify it

Build the CLI:

go build -o agent-cli ./cmd/agent-cli/

Scale a SandboxSet:

./agent-cli scale sandboxset my-sbs --replicas=5 -n sandbox-system

Update container images:

./agent-cli set image sandboxset my-sbs main=nginx:2.0 sidecar=envoy:2.0 -n sandbox-system

Restart containers in a Sandbox:

# Restart specific containers
./agent-cli restart sandbox my-sbx -c main -c sidecar -n sandbox-system

# Restart all user containers
./agent-cli restart sandbox my-sbx -n sandbox-system

Run unit tests:

go test ./pkg/cli/... ./pkg/controller/sandboxcontainerrestart/...

Ⅳ. Special notes for reviews

- The compiled binary agent-cli (47MB) is included in this commit and should be removed before merge; it should be built via CI instead.
- SandboxContainerRestart uses dynamic.Interface in the CLI because the generated clientset does not yet include this new type. After running code-gen, this can be switched to a typed client.
- The orderedRecreate and minStartedSeconds strategy fields are defined in the CRD spec but not yet implemented in the controller; these are reserved for future iterations.
- After merge, make generate manifests should be run to generate proper deepcopy functions and CRD YAML (the current types include hand-written DeepCopyObject stubs).

@kruise-bot


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zmberg for approval by writing /assign @zmberg in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@codecov

codecov Bot commented Jun 3, 2026


Codecov Report

❌ Patch coverage is 88.56209% with 70 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.13%. Comparing base (1ee08c4) to head (af064e3).
⚠️ Report is 47 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/cli/create.go | 83.85% | 15 Missing and 11 partials ⚠️ |
| pkg/cli/setimage.go | 89.94% | 11 Missing and 6 partials ⚠️ |
| pkg/cli/restart.go | 86.88% | 10 Missing and 6 partials ⚠️ |
| pkg/cli/options.go | 86.66% | 3 Missing and 3 partials ⚠️ |
| pkg/cli/scale.go | 94.33% | 2 Missing and 1 partial ⚠️ |
| pkg/cli/status.go | 96.72% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #497      +/-   ##
==========================================
+ Coverage   78.34%   80.13%   +1.79%     
==========================================
  Files         162      208      +46     
  Lines       11739    15299    +3560     
==========================================
+ Hits         9197    12260    +3063     
- Misses       2187     2587     +400     
- Partials      355      452      +97     

| Flag | Coverage Δ |
|---|---|
| unittests | 80.13% <88.56%> (+1.79%) ⬆️ |


Comment thread docs/agent-cli-design.md Outdated

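The OpenKruise Agents project originally had four components (controller, manager, gateway, runtime), all of them long-running services. In day-to-day operations, scaling a SandboxSet, updating images, and restarting containers could only be done through `kubectl edit` or hand-written YAML, which is tedious and error-prone.

To address this, we added a fifth component, **agent-cli**: a kubectl-style command-line tool that provides three core commands: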

Member

consider renaming to okactl

Comment thread docs/agent-cli-design.md Outdated

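The first two commands (scale, set image) are **purely client-side operations**: they modify the SandboxSet CR directly through the K8s API, and the existing SandboxSet controller handles the change automatically.

The third command (restart) uses a **CRD-driven pattern** (modeled on OpenKruise's ContainerRecreateRequest): the CLI creates a `SandboxContainerRestart` CR, and a new controller watches it and performs the actual container restart.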

Member

having the cli create a CRR is enough, no need to create a SandboxContainerRestart CR

Comment thread docs/agent-cli-design.md Outdated
@@ -0,0 +1,227 @@
# agent-cli Command-Line Tool Design and Implementation

Member

plz put the proposal in docs/proposals, and rewrite the design in English for a wider audience

@Liquorice-Ma Liquorice-Ma changed the title from "新增agent-cli命令行工具" (Add agent-cli command-line tool) to "feat(cli): add okactl command-line tool for sandbox operations" on Jun 5, 2026
feat(cli): add okactl command-line tool for sandbox operations

fix(cli): add in-cluster config support for okactl running inside Pods

Update okactl binary

okactl scale/setimage/restart -h
@Liquorice-Ma Liquorice-Ma force-pushed the agents-cli branch 4 times, most recently from 920a645 to 57d6ff2 Compare June 15, 2026 03:48
@Liquorice-Ma Liquorice-Ma force-pushed the agents-cli branch 2 times, most recently from 06fb203 to b48e3df Compare June 22, 2026 06:42
Add create suo subcommand that creates SandboxUpdateOps to batch update
container images of claimed sandboxes by label selector.
Includes auto-cleanup of existing SUOs, container name validation,
and correct PodTemplateSpec-level patch structure.

Also adds proposal doc for the create suo feature.
- Add E2E tests for scale, set image, create suo, and restart commands
- Add dedicated e2e-okactl.yaml workflow running on kind cluster
- Add unit tests to meet 80% coverage requirement
- Fix set image optimistic lock conflict with retry.RetryOnConflict
@Liquorice-Ma Liquorice-Ma force-pushed the agents-cli branch 2 times, most recently from 59cca84 to cbfe589 Compare June 22, 2026 09:37
… display

- Add top-level 'okactl status' command group with 'sbs' and 'suo' subcommands
  - status sbs: show SandboxSet rolling update progress with auto-diagnosis
  - status suo: show SandboxUpdateOps batch update progress
  - Both support --wait flag for polling until completion
- Add cobra Aliases for resource short names (sandboxset/sbs, sandboxupdateops/suo)
- Customize cobra usage template to display subcommand aliases in help output
- Remove 'set image status' subcommand (replaced by top-level 'status sbs')
- Update set image examples to reference 'okactl status sbs'
- Add developer manual for okactl CLI
- Add proposal document for status command design
- Add unit tests (coverage: 85.2%) and E2E tests for new commands

Signed-off-by: 马赫 <Yurong.mh@alibaba-inc.com>

@furykerry furykerry left a comment

Member

plz never submit binary to the repo

Comment thread pkg/cli/create.go
}

var deleted []string
for i := range list.Items {

Member

deleteActiveSandboxUpdateOps deletes ALL SUOs, not just active ones. plz only delete SUOs with Phase == Pending || Phase == Updating

Author

done

Comment thread pkg/cli/restart.go Outdated
}

if sbx.Spec.Template == nil {
return nil

Member

plz fetch the referenced SandboxTemplate to validate when Template is nil

Comment thread pkg/cli/setimage.go Outdated

// diagnoseSandboxSetUpdate checks sandboxes belonging to a SandboxSet and reports any issues.
// It builds a kubernetes client to inspect pod status when sandbox messages are empty.
func diagnoseSandboxSetUpdate(globalOpts *GlobalOptions, sbs *agentsv1alpha1.SandboxSet, reported *map[string]bool) {

Member

Each diagnosis call creates two new REST clients via globalOpts.AgentsClient() and globalOpts.KubeClient(), which involves re-reading kubeconfig and establishing new TLS connections. In --wait mode this happens every 3 poll cycles (~9s). Consider passing the already-created clients as parameters to diagnoseSandboxSetUpdate.

Comment thread pkg/cli/create.go
}

// formatSuoImagePairs formats a map of container=image pairs as a slice of "container=image" strings.
func formatSuoImagePairs(images map[string]string) []string {

Member

formatSuoImagePairs() and buildSuoImagePatch() generate non-deterministic output from map iteration. Consider sorting keys before iteration in both functions.

Author

done

Comment thread pkg/cli/setimage.go Outdated
}

// parseContainerImages parses "container=image" pairs and returns a map.
func parseContainerImages(args []string) (map[string]string, error) {

Member

setimage.go and create.go duplicate parseContainerImages / parseSuoContainerImages. Consider extracting a shared parseImageArgs() to reduce duplication.
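
One possible shape for such a shared helper is sketched below. The validation rules (non-empty name and image, no duplicate container names) are assumptions for illustration and may not match what the two existing functions enforce.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseImageArgs parses "container=image" arguments into a map. The rules
// applied here are assumptions: both sides must be non-empty and a container
// name may appear only once.
func parseImageArgs(args []string) (map[string]string, error) {
	if len(args) == 0 {
		return nil, errors.New("at least one container=image pair is required")
	}
	images := make(map[string]string, len(args))
	for _, arg := range args {
		// Split on the first "=" only, so the image part is kept intact.
		name, image, ok := strings.Cut(arg, "=")
		if !ok || name == "" || image == "" {
			return nil, fmt.Errorf("invalid argument %q: expected container=image", arg)
		}
		if _, dup := images[name]; dup {
			return nil, fmt.Errorf("container %q specified more than once", name)
		}
		images[name] = image
	}
	return images, nil
}

func main() {
	images, err := parseImageArgs([]string{"main=nginx:2.0", "sidecar=envoy:2.0"})
	if err != nil {
		panic(err)
	}
	fmt.Println(images["main"], images["sidecar"]) // prints nginx:2.0 envoy:2.0
}
```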

okactl status sbs my-pool --wait

# Batch update images for claimed sandboxes
okactl create suo -l app=my-app app=nginx:1.25

Member

is it possible to consolidate the create suo command with set image?

okactl set image -l app=my-app app=nginx:1.25

Author

Originally set image supported both sbs and sbx, but create suo was split out in an earlier review so that set image only handles SandboxSet while claimed sandboxes are updated via SUO.

Comment thread docs/developer-manuals/okactl.md Outdated
okactl set image sbs my-pool app=nginx:1.25

# Check update progress (or wait for completion)
okactl status sbs my-pool --wait

Member

--wait should be an option of set image, not status

Author

done

马赫 added 2 commits June 24, 2026 12:01
Binary is now distributed via GitHub Releases instead of committing
to the repository.

Signed-off-by: 马赫 <Yurong.mh@alibaba-inc.com>
- Use apierrors.IsNotFound instead of string matching in waitForSUODeletion
- Replace parseSuoSelectorToMap with metav1.ParseToLabelSelector for full
  label selector syntax support (key in (v1,v2), key!=value, etc.)
- Validate container names against all matching sandboxes instead of only
  the first; warn on partial mismatch, error only when missing from all
- Add Running phase check to restart command before creating CRR
- Remove --wait flag from status command (belongs to set image only)
- Sync developer manual and proposal documents

Signed-off-by: 马赫 <Yurong.mh@alibaba-inc.com>
3 participants