2 changes: 1 addition & 1 deletion kubernetes/Makefile
@@ -337,7 +337,7 @@ uninstall: manifests kustomize ## Uninstall CRDs from the K8s cluster specified
.PHONY: deploy
deploy: manifests kustomize ## Deploy controller to the K8s cluster specified in ~/.kube/config.
cd config/manager && $(KUSTOMIZE) edit set image controller=${CONTROLLER_IMG}
-	$(KUSTOMIZE) build config/default | $(KUBECTL) apply -f -
+	$(KUSTOMIZE) build config/default | sed 's|TASK_EXECUTOR_IMAGE_PLACEHOLDER|$(TASK_EXECUTOR_IMG)|g' | $(KUBECTL) apply -f -

.PHONY: undeploy
undeploy: kustomize ## Undeploy controller from the K8s cluster specified in ~/.kube/config. Call with ignore-not-found=true to ignore resource not found errors during deletion.
88 changes: 88 additions & 0 deletions kubernetes/README-ZH.md
@@ -30,6 +30,94 @@ Pool 自定义资源维护一个预热的计算资源池,以实现快速沙箱
- 基于需求的自动资源分配和释放
- 实时状态监控,显示总数、已分配和可用资源

### Pod 回收策略

Pool CRD 支持可配置的 Pod 回收策略,用于确定 BatchSandbox 删除时如何处理 Pod:

#### 策略类型

| 策略 | 描述 |
|------|------|
| `Delete`(默认) | BatchSandbox 删除时直接删除 Pod |
| `Reuse` | 重置 Pod 后归还资源池以供复用 |

#### Reuse 策略的重要行为变更

使用 `podRecyclePolicy: Reuse` 时,控制器会自动对 Pod 规格进行以下修改:

| 变更 | 原因 |
|------|------|
| `restartPolicy` 从 `Never` 改为 `Always` | 重置时需要重启容器 |
| `shareProcessNamespace` 设置为 `true` | Sidecar 通信所需 |
| 添加 `SYS_PTRACE` capability | nsenter 访问容器命名空间所需 |
| 注入 `task-executor` sidecar | 处理 Pod 重置操作 |
| 添加 `sandbox-storage` volume | 主容器与 sidecar 的共享存储 |

#### Reuse 策略的前置条件

使用 `Reuse` 策略时,**必须**在部署控制器时配置 `task-executor-image` 参数:

```sh
# 使用 Helm
helm install opensandbox-controller ./charts/opensandbox-controller \
--set controller.taskExecutorImage=<registry>/opensandbox-task-executor:<tag> \
--namespace opensandbox-system

# 使用 Kustomize
make deploy CONTROLLER_IMG=<registry>/opensandbox-controller:<tag> \
TASK_EXECUTOR_IMG=<registry>/opensandbox-task-executor:<tag>
```

> **注意**:如果未配置 `task-executor-image`,`Reuse` 策略将降级为 `Delete` 并输出警告日志。

#### 配置示例

```yaml
apiVersion: sandbox.opensandbox.io/v1alpha1
kind: Pool
metadata:
name: reuse-pool
spec:
podRecyclePolicy: Reuse # 启用 Pod 复用
resetSpec:
mainContainerName: sandbox-container # 可选:默认为第一个容器
cleanDirectories: # 可选:重置时清理的目录
- "/tmp/*"
- "/var/cache/**"
timeoutSeconds: 60 # 可选:10-600 秒,默认 60
template:
spec:
containers:
- name: sandbox-container
image: ubuntu:latest
command: ["sleep", "3600"]
capacitySpec:
bufferMax: 10
bufferMin: 2
poolMax: 20
poolMin: 5
```

#### 重置工作流程

当使用 `Reuse` 策略的 BatchSandbox 被删除时:

1. **停止任务**:停止 Pod 中所有正在运行的任务
2. **清理目录**:清理指定的目录(支持 glob 模式)
3. **重启主容器**:通过 SIGTERM/SIGKILL 重启主容器
4. **归还资源池**:Pod 被归还到资源池以供复用

#### 资源池状态

资源池状态包含 `resetting` 字段,用于追踪正在重置的 Pod:

```sh
kubectl get pool reuse-pool

NAME TOTAL ALLOCATED AVAILABLE RESETTING AGE
reuse-pool 10 3 5 2 5m
```

### 任务编排
集成的任务管理系统,在沙箱内执行自定义工作负载:
- **可选执行**:任务调度完全可选 - 可以在不带任务的情况下创建沙箱
88 changes: 88 additions & 0 deletions kubernetes/README.md
@@ -30,6 +30,94 @@ The Pool custom resource maintains a pool of pre-warmed compute resources to ena
- Automatic resource allocation and deallocation based on demand
- Real-time status monitoring showing total, allocated, and available resources

### Pod Recycle Policy

The Pool CRD supports configurable pod recycle policies that determine how pods are handled when a BatchSandbox is deleted:

#### Policy Types

| Policy | Description |
|--------|-------------|
| `Delete` (default) | Delete the pod when its BatchSandbox is deleted |
| `Reuse` | Reset the pod and return it to the pool for reuse |

#### Important Behavioral Changes for Reuse Policy

When using `podRecyclePolicy: Reuse`, the controller automatically makes the following changes to pod specs:

| Change | Reason |
|--------|--------|
| `restartPolicy` changed from `Never` to `Always` | Required for container restart during reset |
| `shareProcessNamespace` set to `true` | Required for sidecar communication |
| `SYS_PTRACE` capability added | Required for nsenter to access container namespaces |
| `task-executor` sidecar injected | Handles pod reset operations |
| `sandbox-storage` volume added | Shared storage between main container and sidecar |

#### Prerequisites for Reuse Policy

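To use the `Reuse` policy, you **must** pass a task-executor image to the controller at deploy time. The controller reads it from its `--task-executor-image` flag, which is set through the Helm value `controller.taskExecutorImage` or the `TASK_EXECUTOR_IMG` variable of `make deploy`: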

```sh
# Using Helm
helm install opensandbox-controller ./charts/opensandbox-controller \
--set controller.taskExecutorImage=<registry>/opensandbox-task-executor:<tag> \
--namespace opensandbox-system

# Using Kustomize
make deploy CONTROLLER_IMG=<registry>/opensandbox-controller:<tag> \
TASK_EXECUTOR_IMG=<registry>/opensandbox-task-executor:<tag>
```

> **Note**: If the task-executor image is not configured, the controller logs a warning and handles pools that request `Reuse` as if they used `Delete`.
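
The fallback rule in the note can be sketched in a few lines of Go. This is an illustrative sketch, not the controller's actual code: `effectivePolicy`, `PolicyDelete`, and `PolicyReuse` are hypothetical names, and the only behavior assumed is what the note states (no image configured means `Reuse` is treated as `Delete`, with a warning).

```go
package main

import "fmt"

// PodRecyclePolicy mirrors the string enum used by the Pool CRD.
type PodRecyclePolicy string

const (
	PolicyDelete PodRecyclePolicy = "Delete"
	PolicyReuse  PodRecyclePolicy = "Reuse"
)

// effectivePolicy is a hypothetical helper. It returns the policy that is
// actually applied and whether a warning should be logged.
func effectivePolicy(requested PodRecyclePolicy, taskExecutorImage string) (PodRecyclePolicy, bool) {
	if requested == PolicyReuse && taskExecutorImage == "" {
		// Reuse needs the task-executor sidecar; without an image, fall back.
		return PolicyDelete, true
	}
	if requested == "" {
		// An unset field behaves like the CRD default.
		return PolicyDelete, false
	}
	return requested, false
}

func main() {
	images := []string{"", "registry.example.com/opensandbox-task-executor:v0.1.0"}
	for _, image := range images {
		policy, warn := effectivePolicy(PolicyReuse, image)
		fmt.Printf("image=%q policy=%s warn=%v\n", image, policy, warn)
	}
}
```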

#### Configuration Example

```yaml
apiVersion: sandbox.opensandbox.io/v1alpha1
kind: Pool
metadata:
  name: reuse-pool
spec:
  podRecyclePolicy: Reuse  # Enable pod reuse
  resetSpec:
    mainContainerName: sandbox-container  # Optional: defaults to first container
    cleanDirectories:                     # Optional: directories to clean during reset
      - "/tmp/*"
      - "/var/cache/**"
    timeoutSeconds: 60                    # Optional: 10-600 seconds, default 60
  template:
    spec:
      containers:
        - name: sandbox-container
          image: ubuntu:latest
          command: ["sleep", "3600"]
  capacitySpec:
    bufferMax: 10
    bufferMin: 2
    poolMax: 20
    poolMin: 5
```
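
The optional `resetSpec` fields resolve to defaults when omitted. The Go sketch below illustrates those rules as documented (first container as the main container, `/tmp` as the default directory per the API type comments, a 60 second timeout bounded to 10-600). `resolveResetSpec` is a hypothetical helper written for this example; in a real cluster the timeout default and bounds are enforced by the CRD schema rather than by code like this.

```go
package main

import (
	"errors"
	"fmt"
)

// ResetSpec is a local copy of the documented fields, for illustration only.
type ResetSpec struct {
	MainContainerName string
	CleanDirectories  []string
	TimeoutSeconds    int64
}

// resolveResetSpec (hypothetical) fills in the documented defaults and checks
// the documented timeout range.
func resolveResetSpec(spec ResetSpec, containerNames []string) (ResetSpec, error) {
	if len(containerNames) == 0 {
		return spec, errors.New("pod template has no containers")
	}
	if spec.MainContainerName == "" {
		spec.MainContainerName = containerNames[0] // default: first container
	}
	if len(spec.CleanDirectories) == 0 {
		spec.CleanDirectories = []string{"/tmp"} // default from the type comments
	}
	if spec.TimeoutSeconds == 0 {
		spec.TimeoutSeconds = 60 // default timeout
	}
	if spec.TimeoutSeconds < 10 || spec.TimeoutSeconds > 600 {
		return spec, fmt.Errorf("timeoutSeconds %d outside 10-600", spec.TimeoutSeconds)
	}
	return spec, nil
}

func main() {
	spec, err := resolveResetSpec(ResetSpec{}, []string{"sandbox-container"})
	fmt.Printf("%+v err=%v\n", spec, err)
}
```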

#### How Reset Works

When a BatchSandbox backed by a Pool with the `Reuse` policy is deleted:

1. **Stop Tasks**: All running tasks in the pod are stopped
2. **Clean Directories**: Specified directories are cleaned (supports glob patterns)
3. **Restart Main Container**: The main container is restarted via SIGTERM/SIGKILL
4. **Return to Pool**: The pod is returned to the pool for reuse. If the reset fails, the pod is deleted instead.

#### Pool Status with Reset

The Pool status includes a `resetting` field to track pods being reset:

```sh
kubectl get pool reuse-pool

NAME TOTAL ALLOCATED AVAILABLE RESETTING AGE
reuse-pool 10 3 5 2 5m
```

### Task Orchestration
Integrated task management system that executes custom workloads within sandboxes:
- **Optional Execution**: Task scheduling is completely optional - sandboxes can be created without tasks
48 changes: 48 additions & 0 deletions kubernetes/apis/sandbox/v1alpha1/pool_types.go
@@ -32,6 +32,19 @@ type PoolSpec struct {
// CapacitySpec controls the size of the resource pool.
// +kubebuilder:validation:Required
CapacitySpec CapacitySpec `json:"capacitySpec"`
// PodRecyclePolicy specifies how to handle allocated Pods when a pooled BatchSandbox is deleted.
// - Delete: (default) Delete the allocated Pod directly.
// - Reuse: Reset the Pod before returning it to the pool; if reset fails, delete it.
// Note: Reuse policy requires task-executor image to be configured in controller.
// If not configured, pods will be deleted with a warning log.
// +optional
// +kubebuilder:default=Delete
// +kubebuilder:validation:Enum=Delete;Reuse
PodRecyclePolicy PodRecyclePolicy `json:"podRecyclePolicy,omitempty"`
// ResetSpec specifies the reset configuration when PodRecyclePolicy is Reuse.
// Ignored when PodRecyclePolicy is Delete.
// +optional
ResetSpec *ResetSpec `json:"resetSpec,omitempty"`
}

type CapacitySpec struct {
@@ -53,6 +66,37 @@ type CapacitySpec struct {
PoolMin int32 `json:"poolMin"`
}

// PodRecyclePolicy defines the policy for recycling pooled Pods.
type PodRecyclePolicy string

const (
// PodRecyclePolicyDelete deletes the allocated Pod directly.
PodRecyclePolicyDelete PodRecyclePolicy = "Delete"
// PodRecyclePolicyReuse resets the Pod before returning it to the pool.
// Requires task-executor image to be configured in controller.
PodRecyclePolicyReuse PodRecyclePolicy = "Reuse"
)

// ResetSpec specifies how to reset a Pod before returning it to the pool.
type ResetSpec struct {
// MainContainerName specifies which container is the main container for reset purposes.
// The main container will be restarted during reset.
// If not specified, the first container in the pod template is used.
// +optional
MainContainerName string `json:"mainContainerName,omitempty"`
// CleanDirectories specifies directories to clean during reset.
// Supports glob patterns like "/tmp/*", "/var/cache/**".
// Default: ["/tmp"]
// +optional
CleanDirectories []string `json:"cleanDirectories,omitempty"`
// TimeoutSeconds specifies the timeout for reset operation in seconds.
// +optional
// +kubebuilder:default=60
// +kubebuilder:validation:Minimum=10
// +kubebuilder:validation:Maximum=600
TimeoutSeconds int64 `json:"timeoutSeconds,omitempty"`
}

// PoolStatus defines the observed state of Pool.
type PoolStatus struct {
// ObservedGeneration is the most recent generation observed for this BatchSandbox. It corresponds to the
@@ -66,6 +110,9 @@ type PoolStatus struct {
Allocated int32 `json:"allocated"`
// Available is the number of nodes currently available in the pool.
Available int32 `json:"available"`
// Resetting is the number of Pods currently being reset.
// +optional
Resetting int32 `json:"resetting,omitempty"`
}

// +genclient
@@ -75,6 +122,7 @@ type PoolStatus struct {
// +kubebuilder:printcolumn:name="TOTAL",type="integer",JSONPath=".status.total",description="The number of all nodes in pool."
// +kubebuilder:printcolumn:name="ALLOCATED",type="integer",JSONPath=".status.allocated",description="The number of allocated nodes in pool."
// +kubebuilder:printcolumn:name="AVAILABLE",type="integer",JSONPath=".status.available",description="The number of available nodes in pool."
// +kubebuilder:printcolumn:name="RESETTING",type="integer",JSONPath=".status.resetting",description="The number of pods being reset in pool."
// Pool is the Schema for the pools API.
type Pool struct {
metav1.TypeMeta `json:",inline"`
25 changes: 25 additions & 0 deletions kubernetes/apis/sandbox/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default.

16 changes: 16 additions & 0 deletions kubernetes/charts/opensandbox-controller/README.md
@@ -80,6 +80,8 @@ kubectl delete crd pools.sandbox.opensandbox.io
| `controller.podLabels` | Additional labels for controller pods | `{}` |
| `controller.podAnnotations` | Additional annotations for controller pods | `{}` |
| `controller.priorityClassName` | Priority class name for controller pods | `""` |
| `controller.taskExecutorImage` | Task executor image; required for pools that use `podRecyclePolicy: Reuse` | `""` |
| `controller.taskExecutorResources` | Task executor sidecar resources in format "cpu,memory" | `200m,128Mi` |

### RBAC Parameters

@@ -149,6 +151,20 @@ imagePullSecrets:
- name: myregistrykey
```

### Enable Pod Reuse Policy

To use the `Reuse` pod recycle policy, you must configure the task-executor image:

```yaml
controller:
  taskExecutorImage: myregistry.example.com/opensandbox-task-executor:v0.1.0
  taskExecutorResources: 200m,128Mi  # Optional: defaults to "200m,128Mi"
```
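
The `taskExecutorResources` value is a single string in `cpu,memory` form, and the same pair is used for both request and limit. The Go sketch below shows one way such a string could be split. `parseTaskExecutorResources` and `sidecarResources` are hypothetical names for this example, and the sketch only checks the shape of the string; a real implementation would presumably also validate each part as a Kubernetes quantity, for instance with `resource.ParseQuantity` from `k8s.io/apimachinery`.

```go
package main

import (
	"fmt"
	"strings"
)

// sidecarResources holds the two parts of a "cpu,memory" string.
type sidecarResources struct {
	CPU    string
	Memory string
}

// parseTaskExecutorResources (hypothetical) splits "cpu,memory" into its parts.
func parseTaskExecutorResources(value string) (sidecarResources, error) {
	parts := strings.Split(value, ",")
	if len(parts) != 2 {
		return sidecarResources{}, fmt.Errorf("expected \"cpu,memory\", got %q", value)
	}
	cpu := strings.TrimSpace(parts[0])
	memory := strings.TrimSpace(parts[1])
	if cpu == "" || memory == "" {
		return sidecarResources{}, fmt.Errorf("cpu and memory must both be set in %q", value)
	}
	return sidecarResources{CPU: cpu, Memory: memory}, nil
}

func main() {
	res, err := parseTaskExecutorResources("200m,128Mi")
	if err != nil {
		panic(err)
	}
	// The same values would be applied as both request and limit.
	fmt.Printf("cpu=%s memory=%s\n", res.CPU, res.Memory)
}
```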

> **Note**: Without `taskExecutorImage` configured, pools with `podRecyclePolicy: Reuse` will fall back to `Delete` behavior.

See [Pod Recycle Policy](../README.md#pod-recycle-policy) for more details.

### Node Affinity

```yaml
@@ -58,6 +58,12 @@ spec:
{{- if and .Values.controller.kubeClient (gt .Values.controller.kubeClient.burst 0) }}
- --kube-client-burst={{ .Values.controller.kubeClient.burst }}
{{- end }}
{{- if .Values.controller.taskExecutorImage }}
- --task-executor-image={{ .Values.controller.taskExecutorImage }}
{{- end }}
{{- if .Values.controller.taskExecutorResources }}
- --task-executor-resources={{ .Values.controller.taskExecutorResources }}
{{- end }}
ports:
- name: health
containerPort: 8081
9 changes: 9 additions & 0 deletions kubernetes/charts/opensandbox-controller/values.yaml
@@ -104,6 +104,15 @@ controller:
# -- Priority class name for controller pods
priorityClassName: ""

# -- Task executor image for pod reuse policy support.
# Required when using Pool with podRecyclePolicy: Reuse.
# If not configured, Reuse policy will fall back to Delete with a warning log.
taskExecutorImage: ""

# -- Task executor sidecar resources in format "cpu,memory".
# Example: "200m,128Mi". Both request and limit will be set to the same value.
taskExecutorResources: "200m,128Mi"

# -- Image pull secrets for private registries
imagePullSecrets: []
# - name: myregistrykey