mtls: Pods restart multiple times when SPIRE is first enabled #374

**What happened**:

When SPIRE is enabled via `helm upgrade --set spire.enabled=true`, the Router and WorkloadManager pods crash and restart several times before stabilizing (Router: 5 restarts, WorkloadManager: 3, SPIRE Agent: 3).

The root cause is a startup race condition. The `spiffe-helper` sidecar and the main container start simultaneously, but the main container needs cert files that `spiffe-helper` hasn't written yet. Both Router (`cmd/router/main.go:66`) and WorkloadManager (`cmd/workload-manager/main.go:82`) call `klog.Fatalf` if certs aren't found within 30s (`pkg/mtls/wait.go:30`), crashing the pod.

**What you expected to happen**:
Pods should start cleanly with 0 restarts.


**How to reproduce it (as minimally and precisely as possible)**:
1. Install AgentCube without SPIRE on a Kind cluster.
2. Run `helm upgrade` with `--set spire.enabled=true`.
3. Watch the pods via `kubectl get pods -n agentcube -w` and observe multiple CrashLoopBackOff cycles.

**Anything else we need to know?**:

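**Suggested fix**: Convert `spiffe-helper` from a regular sidecar to a [Kubernetes native sidecar](https://kubernetes.io/blog/2023/08/25/native-sidecar-containers/) by moving it to `initContainers` with `restartPolicy: Always`. This guarantees that Kubernetes starts `spiffe-helper` before the main container. Note that a native sidecar only has to be started, not finished, before the main container launches, so a `startupProbe` on `spiffe-helper` that checks for the cert files may also be needed to ensure the certs are written first.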
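A minimal sketch of what the suggested change could look like in the pod spec. The image references, the `certs` volume name, and the `/run/certs` mount path are placeholders chosen for illustration, not values taken from the AgentCube chart; only the `spiffe-helper` name and `restartPolicy: Always` come from this issue.

```yaml
# Hypothetical pod spec fragment: names, images, and paths are placeholders.
spec:
  initContainers:
    - name: spiffe-helper
      image: "SPIFFE_HELPER_IMAGE"   # placeholder, use the chart's existing image
      restartPolicy: Always          # makes this init container a native sidecar
      volumeMounts:
        - name: certs                # assumed shared volume for the cert files
          mountPath: /run/certs      # assumed path
  containers:
    - name: router
      image: "ROUTER_IMAGE"          # placeholder
      volumeMounts:
        - name: certs
          mountPath: /run/certs
          readOnly: true
  volumes:
    - name: certs
      emptyDir: {}
```

Native sidecars rely on the `SidecarContainers` feature, which is enabled by default from Kubernetes v1.29, so this should work on the v1.32.2 Kind cluster listed below without extra feature gates.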

**Environment**:

- agentcube version:
- Kubernetes version: v1.32.2 (Kind v0.27.0)
- Others: Helm v3.17.3
