Original file line number Diff line number Diff line change
Expand Up @@ -118,6 +118,7 @@
type: object
type: object
mtlsMode:
default: permissive
description: |-
MTLSMode selects the mTLS posture between authbridge sidecars on
the proxy-sidecar / lite paths. envoy-sidecar handles transport
Expand All @@ -127,8 +128,8 @@

Three valid values:

disabled Plaintext between sidecars (default).
permissive Inbound: byte-peek listener accepts both TLS and
disabled Plaintext between sidecars.
permissive (default) Inbound: byte-peek listener accepts both TLS and
plaintext on the same port. Outbound: tries TLS,
falls back to plaintext on handshake failure (one-line
WARN log per fallback). Use during rollout.
Expand All @@ -137,7 +138,7 @@
completes.

Resolution: AgentRuntime CR > namespace authbridge-runtime-config
mtls.mode > "disabled". Setting mtlsMode != disabled implicitly
mtls.mode > "permissive". Setting mtlsMode != disabled implicitly
requires SPIRE — the operator auto-enables spire for the workload.

CR-empty vs CR="disabled" are observably different in
Expand Down Expand Up @@ -420,7 +421,7 @@
lastTransitionTime:
description: |-
lastTransitionTime is the last time the condition transitioned from one status to another.
This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.

Check warning on line 424 in charts/kagenti-operator/crds/agent.kagenti.dev_agentruntimes.yaml

GitHub Actions / YAML Lint

424:151 [line-length] line too long (162 > 150 characters)
format: date-time
type: string
message:
Expand All @@ -432,7 +433,7 @@
observedGeneration:
description: |-
observedGeneration represents the .metadata.generation that the condition was set based upon.
For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date

Check warning on line 436 in charts/kagenti-operator/crds/agent.kagenti.dev_agentruntimes.yaml

GitHub Actions / YAML Lint

436:151 [line-length] line too long (162 > 150 characters)
with respect to the current state of the instance.
format: int64
minimum: 0
Expand Down
7 changes: 4 additions & 3 deletions kagenti-operator/api/v1alpha1/agentruntime_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -86,8 +86,8 @@ type AgentRuntimeSpec struct {
//
// Three valid values:
//
// disabled Plaintext between sidecars (default).
// permissive Inbound: byte-peek listener accepts both TLS and
// disabled Plaintext between sidecars.
// permissive (default) Inbound: byte-peek listener accepts both TLS and
// plaintext on the same port. Outbound: tries TLS,
// falls back to plaintext on handshake failure (one-line
// WARN log per fallback). Use during rollout.
Expand All @@ -96,7 +96,7 @@ type AgentRuntimeSpec struct {
// completes.
//
// Resolution: AgentRuntime CR > namespace authbridge-runtime-config
// mtls.mode > "disabled". Setting mtlsMode != disabled implicitly
// mtls.mode > "permissive". Setting mtlsMode != disabled implicitly
// requires SPIRE — the operator auto-enables spire for the workload.
//
// CR-empty vs CR="disabled" are observably different in
Expand All @@ -111,6 +111,7 @@ type AgentRuntimeSpec struct {
// process start).
//
// +optional
// +kubebuilder:default=permissive

Collaborator

suggestion: This default flip has a notable upgrade impact worth surfacing. Per the field doc, mtlsMode != disabled auto-enables SPIRE, and changing the mode triggers a pod rollout. So on operator upgrade, every existing AgentRuntime with an unset mtlsMode flips from empty to permissive, which auto-enables SPIRE and triggers a pod rollout, fleet-wide. permissive is designed to be safe (it accepts plaintext inbound and falls back to plaintext outbound), but on clusters without SPIRE this relies entirely on graceful fallback, and the "deploy without SPIRE, verify graceful fallback" E2E box is still unchecked in the test plan. Recommend verifying the no-SPIRE path before merge and calling out the rollout and the SPIRE auto-enable in the release notes and/or a prominent startup log (the existing startup logs cover card-discovery/verified-fetch but not the mTLS default change).
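To make the upgrade impact concrete, here is a minimal sketch of the documented resolution chain (AgentRuntime CR > namespace mtls.mode > default), run once with the old default and once with the new one. The function names `resolveMTLSMode` and `requiresSPIRE` and the source labels `"cr"` and `"namespace"` are hypothetical and only for illustration; this is not the operator's actual code.

```go
package main

import "fmt"

const (
	modeDisabled   = "disabled"
	modePermissive = "permissive"
)

// resolveMTLSMode is a hypothetical helper that mirrors the documented
// precedence: CR value, then namespace config, then the built-in default.
func resolveMTLSMode(crMode, nsMode, defaultMode string) (mode, source string) {
	switch {
	case crMode != "":
		return crMode, "cr"
	case nsMode != "":
		return nsMode, "namespace"
	default:
		return defaultMode, "default"
	}
}

// requiresSPIRE reflects the field doc: any mode other than "disabled"
// implicitly requires SPIRE. (Hypothetical helper name.)
func requiresSPIRE(mode string) bool {
	return mode != modeDisabled
}

func main() {
	// An AgentRuntime with mtlsMode unset and no namespace override,
	// evaluated under the old default and then the new default.
	for _, def := range []string{modeDisabled, modePermissive} {
		mode, src := resolveMTLSMode("", "", def)
		fmt.Printf("default=%s -> mode=%s source=%s spire=%t\n",
			def, mode, src, requiresSPIRE(mode))
	}
}
```

Under these assumptions, the same unset CR resolves to `disabled` with no SPIRE requirement before the change, and to `permissive` with SPIRE required after it, which is the fleet-wide flip described above.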

// +kubebuilder:validation:Enum=disabled;permissive;strict
MTLSMode string `json:"mtlsMode,omitempty"`
}
Expand Down
32 changes: 28 additions & 4 deletions kagenti-operator/cmd/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -171,10 +171,10 @@ func main() {
flag.StringVar(&mlflowCAFile, "mlflow-ca-file", "",
"Path to PEM-encoded CA bundle for MLflow TLS verification (appended to system pool)")

flag.BoolVar(&enableCardDiscovery, "enable-card-discovery", false,
"Enable automatic agent card discovery from AgentRuntime workloads into status.card")
flag.BoolVar(&enableVerifiedFetch, "enable-verified-fetch", false,
"Enable mTLS-authenticated fetch of agent cards via SPIFFE identity")
flag.BoolVar(&enableCardDiscovery, "enable-card-discovery", true,
"Enable automatic agent card discovery from AgentRuntime workloads into status.card (set to false to disable)")
flag.BoolVar(&enableVerifiedFetch, "enable-verified-fetch", true,
"Enable mTLS-authenticated fetch of agent cards via SPIFFE identity (set to false as kill switch)")
flag.StringVar(&verifiedFetchSpiffeSocket, "verified-fetch-spiffe-socket",
"unix:///spiffe-workload-api/spire-agent.sock",
"SPIFFE Workload API socket path for verified fetch")
Expand Down Expand Up @@ -237,6 +237,30 @@ func main() {

ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))

// Startup info logs for defaults that changed in this release.
if enableCardDiscovery {
setupLog.Info("card discovery enabled by default; set --enable-card-discovery=false to disable")
}
if enableVerifiedFetch {
setupLog.Info("verified fetch enabled by default; set --enable-verified-fetch=false to disable")
}

// Deprecation warnings for legacy flags that are now superseded by
// mTLS defaults (permissive mode auto-enables SPIRE and identity binding).
for _, dep := range []struct {
name string
set bool
}{
{"require-a2a-signature", requireA2ASignature},
{"signature-audit-mode", signatureAuditMode},
{"enforce-network-policies", enforceNetworkPolicies},
} {
if dep.set {
setupLog.Info("DEPRECATED: flag is superseded by mTLS permissive default; will be removed in a future release",
"flag", dep.name)
}
}

ctx := ctrl.SetupSignalHandler()

// ========================================
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -118,6 +118,7 @@ spec:
type: object
type: object
mtlsMode:
default: permissive
description: |-
MTLSMode selects the mTLS posture between authbridge sidecars on
the proxy-sidecar / lite paths. envoy-sidecar handles transport
Expand All @@ -127,8 +128,8 @@ spec:

Three valid values:

disabled Plaintext between sidecars (default).
permissive Inbound: byte-peek listener accepts both TLS and
disabled Plaintext between sidecars.
permissive (default) Inbound: byte-peek listener accepts both TLS and
plaintext on the same port. Outbound: tries TLS,
falls back to plaintext on handshake failure (one-line
WARN log per fallback). Use during rollout.
Expand All @@ -137,7 +138,7 @@ spec:
completes.

Resolution: AgentRuntime CR > namespace authbridge-runtime-config
mtls.mode > "disabled". Setting mtlsMode != disabled implicitly
mtls.mode > "permissive". Setting mtlsMode != disabled implicitly
requires SPIRE — the operator auto-enables spire for the workload.

CR-empty vs CR="disabled" are observably different in
Expand Down
27 changes: 21 additions & 6 deletions kagenti-operator/internal/controller/agentruntime_controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,10 @@ const (
// Value is a JSON array of skill names, set by the kagenti backend or the user.
AnnotationSkills = "kagenti.io/skills"

// AnnotationMTLSMode is the annotation applied to PodTemplateSpec to advertise the
// resolved mTLS posture. Read by authbridge sidecars for observability.
AnnotationMTLSMode = "kagenti.io/mtls-mode"

// AnnotationRestartPending marks a Sandbox that was scaled to 0 and needs
// to be scaled back to 1 on the next reconcile cycle. Two-phase restart
// avoids a race with the Sandbox controller's pod-name annotation.
Expand All @@ -72,6 +76,7 @@ const (
ConditionTypeReady = "Ready"
ConditionTypeTargetResolved = "TargetResolved"
ConditionTypeConfigResolved = "ConfigResolved"
ConditionTypeMTLSReady = "MTLSReady"

// AnnotationLastCardFetchHash stores the change-detection key used to skip
// redundant card fetches when the workload's pod template has not changed.
Expand Down Expand Up @@ -333,6 +338,12 @@ func (r *AgentRuntimeReconciler) applyWorkloadConfig(ctx context.Context, rt *ag

key := types.NamespacedName{Name: ref.Name, Namespace: rt.Namespace}

// Resolve mTLS mode: CR value takes precedence, default to "permissive".
mtlsMode := rt.Spec.MTLSMode
if mtlsMode == "" {
mtlsMode = "permissive"
}

var configHashChanged bool

err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
Expand All @@ -348,7 +359,8 @@ func (r *AgentRuntimeReconciler) applyWorkloadConfig(ctx context.Context, rt *ag
alreadyConfigured := currentWorkloadLabels[LabelAgentType] == string(rt.Spec.Type) &&
currentWorkloadLabels[LabelManagedBy] == LabelManagedByValue &&
currentPodLabels[LabelAgentType] == string(rt.Spec.Type) &&
currentPodAnnotations[AnnotationConfigHash] == configHash
currentPodAnnotations[AnnotationConfigHash] == configHash &&
currentPodAnnotations[AnnotationMTLSMode] == mtlsMode

if alreadyConfigured {
return nil
Expand All @@ -375,19 +387,21 @@ func (r *AgentRuntimeReconciler) applyWorkloadConfig(ctx context.Context, rt *ag
podLabels[LabelAgentType] = string(rt.Spec.Type)
acc.setPodLabels(acc.obj, podLabels)

// Apply config-hash annotation to PodTemplateSpec
// Apply config-hash and mtls-mode annotations to PodTemplateSpec
podAnnotations := acc.getPodAnnotations(acc.obj)
if podAnnotations == nil {
podAnnotations = make(map[string]string)
}
podAnnotations[AnnotationConfigHash] = configHash
podAnnotations[AnnotationMTLSMode] = mtlsMode
acc.setPodAnnotations(acc.obj, podAnnotations)

logger.Info("Applying config to workload",
"workload", ref.Name,
"kind", ref.Kind,
"type", string(rt.Spec.Type),
"configHash", configHash[:12])
"configHash", configHash[:12],
"mtlsMode", mtlsMode)

return r.Update(ctx, acc.obj)
})
Expand Down Expand Up @@ -725,11 +739,12 @@ func (r *AgentRuntimeReconciler) handleDeletion(ctx context.Context, rt *agentv1
delete(podLabels, LabelAgentType)
acc.setPodLabels(acc.obj, podLabels)

// Remove kagenti.io/config-hash from PodTemplateSpec pod annotations.
// This triggers the rolling update that replaces existing injected pods,
// and leaves the workload annotation-clean for any future AR.
// Remove kagenti.io/config-hash and kagenti.io/mtls-mode from PodTemplateSpec
// pod annotations. This triggers the rolling update that replaces existing
// injected pods, and leaves the workload annotation-clean for any future AR.
podAnnotations := acc.getPodAnnotations(acc.obj)
delete(podAnnotations, AnnotationConfigHash)
delete(podAnnotations, AnnotationMTLSMode)
acc.setPodAnnotations(acc.obj, podAnnotations)

logger.Info("Removed kagenti labels and config-hash from workload on AgentRuntime deletion",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -56,7 +56,7 @@ type AgentRuntimeOverrides struct {

// mTLS posture — from .spec.mtlsMode
// Nil = no per-workload override; the namespace's
// authbridge-runtime-config mtls.mode (if set) or "disabled"
// authbridge-runtime-config mtls.mode (if set) or "permissive"
// applies.
MTLSMode *string
}
Expand Down
17 changes: 9 additions & 8 deletions kagenti-operator/internal/webhook/injector/envoy_template.go
Original file line number Diff line number Diff line change
Expand Up @@ -72,19 +72,20 @@ func RenderEnvoyConfig(cfg *ResolvedConfig) (string, error) {
return cfg.EnvoyYAML, nil
}

// MTLSEnabled checks both "" and MTLSModeDisabled because
// ResolvedConfig leaves MTLSMode as "" when no source set it
// (CR / namespace ConfigMap / default — see ResolveConfig). The
// resolution chain only fills MTLSMode when something explicitly
// asked for it, so "" means "no opinion → treat as disabled".
// MTLSEnabled: empty string is treated as permissive (mTLS is on
// by default). Only MTLSModeDisabled explicitly disables mTLS.
effectiveMode := cfg.MTLSMode
if effectiveMode == "" {
effectiveMode = MTLSModePermissive
}
data := envoyTemplateData{
AdminPort: cfg.Platform.Proxy.AdminPort,
OutboundPort: cfg.Platform.Proxy.Port,
InboundPort: cfg.Platform.Proxy.InboundProxyPort,
ExtProcPort: defaultExtProcPort,
MTLSEnabled: cfg.MTLSMode != "" && cfg.MTLSMode != MTLSModeDisabled,
MTLSPermissive: cfg.MTLSMode == MTLSModePermissive,
MTLSStrict: cfg.MTLSMode == MTLSModeStrict,
MTLSEnabled: effectiveMode != MTLSModeDisabled,
MTLSPermissive: effectiveMode == MTLSModePermissive,
MTLSStrict: effectiveMode == MTLSModeStrict,
}

var buf bytes.Buffer
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -70,10 +70,9 @@ func TestRenderEnvoyConfig_TemplateRendering(t *testing.T) {
}

func TestRenderEnvoyConfig_MTLSDisabled_NoTLSBlocks(t *testing.T) {
// Default / disabled mode — no TLS blocks should render. Locks in
// the existing plaintext shape so a future template edit can't
// silently leak TLS config into pods that didn't ask for it.
for _, mode := range []string{"", MTLSModeDisabled} {
// Explicitly disabled mode — no TLS blocks should render.
// Empty string is now treated as permissive (mTLS on by default).
for _, mode := range []string{MTLSModeDisabled} {
t.Run("mode="+mode, func(t *testing.T) {
cfg := &ResolvedConfig{
Platform: config.CompiledDefaults(),
Expand Down
23 changes: 20 additions & 3 deletions kagenti-operator/internal/webhook/injector/pod_mutator.go
Original file line number Diff line number Diff line change
Expand Up @@ -257,7 +257,7 @@ func (m *PodMutator) InjectAuthBridge(ctx context.Context, podSpec *corev1.PodSp
}
}
if mtlsMode == "" {
mtlsMode = MTLSModeDisabled
mtlsMode = MTLSModePermissive
mtlsSource = "default"
}
// Defense in depth: the CRD enum check rejects unknown values at
Expand All @@ -270,10 +270,10 @@ func (m *PodMutator) InjectAuthBridge(ctx context.Context, podSpec *corev1.PodSp
case MTLSModeDisabled, MTLSModePermissive, MTLSModeStrict:
// recognized, keep as-is
default:
mutatorLog.Info("WARN: unrecognized mtlsMode; defaulting to disabled",
mutatorLog.Info("WARN: unrecognized mtlsMode; defaulting to permissive",
"namespace", namespace, "crName", crName,
"unrecognized", mtlsMode, "source", mtlsSource)
mtlsMode = MTLSModeDisabled
mtlsMode = MTLSModePermissive

Collaborator

nit: The defense-in-depth fallback for an unrecognized mtlsMode flips from disabled to permissive. That is reasonable under the "mTLS by default" theme (fails secure rather than open), but note that a garbage value now also pulls in SPIRE. This is fine given that the CRD enum rejects unknown values at admission; it is just a conscious posture change worth a line in the commit/PR rationale.
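As a rough illustration of the posture change, the sketch below reproduces the fallback logic from the hunk above in isolation. The function name `normalizeMTLSMode` is hypothetical; the mode values and the `"default"` / `"default-invalid-fallback"` source labels are taken from the diff.

```go
package main

import "fmt"

const (
	MTLSModeDisabled   = "disabled"
	MTLSModePermissive = "permissive"
	MTLSModeStrict     = "strict"
)

// normalizeMTLSMode is a hypothetical stand-in for the defaulting and
// defense-in-depth steps in InjectAuthBridge: an empty mode gets the
// default, and an unrecognized mode falls back to permissive.
func normalizeMTLSMode(mode, source string) (string, string) {
	if mode == "" {
		return MTLSModePermissive, "default"
	}
	switch mode {
	case MTLSModeDisabled, MTLSModePermissive, MTLSModeStrict:
		// Recognized value: keep it and its source as-is.
		return mode, source
	default:
		// Unrecognized value: previously disabled, now permissive,
		// so SPIRE is pulled in for this workload as well.
		return MTLSModePermissive, "default-invalid-fallback"
	}
}

func main() {
	for _, in := range []string{"", "disabled", "strict", "bogus"} {
		mode, src := normalizeMTLSMode(in, "cr")
		fmt.Printf("input=%q -> mode=%s source=%s\n", in, mode, src)
	}
}
```

Only an explicit `disabled` keeps a workload on plaintext in this sketch; both the empty and the unrecognized input end up in `permissive`.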

mtlsSource = "default-invalid-fallback"
}
mutatorLog.Info("resolved mTLS mode",
Expand Down Expand Up @@ -516,6 +516,15 @@ func (m *PodMutator) InjectAuthBridge(ctx context.Context, podSpec *corev1.PodSp
))
}

// Set MTLS_MODE env var on the authbridge container so it knows the
// resolved mTLS posture at runtime.
for i := range podSpec.Containers {
if podSpec.Containers[i].Name == AuthBridgeProxyContainerName {
setOrAddEnv(&podSpec.Containers[i], "MTLS_MODE", mtlsMode)
break
}
}

// Inject HTTP_PROXY env vars into all existing app containers
for i := range podSpec.Containers {
c := &podSpec.Containers[i]
Expand Down Expand Up @@ -620,6 +629,14 @@ func (m *PodMutator) InjectAuthBridge(ctx context.Context, podSpec *corev1.PodSp
podSpec.Containers = append(podSpec.Containers, builder.BuildEnvoyProxyContainerWithSpireOption(spireEnabled))
}

// Set MTLS_MODE env var on the envoy-sidecar authbridge container.
for i := range podSpec.Containers {
if podSpec.Containers[i].Name == EnvoyProxyContainerName {
setOrAddEnv(&podSpec.Containers[i], "MTLS_MODE", mtlsMode)
break
}
}

if decision.ProxyInit.Inject && !containerExists(podSpec.InitContainers, ProxyInitContainerName) {
outboundExclude := annotations[OutboundPortsExcludeAnnotation]
inboundExclude := annotations[InboundPortsExcludeAnnotation]
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -233,7 +233,11 @@ func TestInjectAuthBridge_RespectsExistingServiceAccountName(t *testing.T) {
func TestInjectAuthBridge_NoSACreationWhenSpiffeHelperDisabled(t *testing.T) {
// Spiffe-helper is injected by default for agents. SA creation is skipped
// when spiffe-helper is explicitly opted out via its per-sidecar label.
m := newTestMutator(newAgentRuntime("test-ns", "my-agent"))
// MTLSMode must be set to "disabled" because the default (permissive) would
// auto-enable SPIRE, creating a ServiceAccount regardless of the spiffe-helper label.
rt := newAgentRuntime("test-ns", "my-agent")
rt.Spec.MTLSMode = "disabled"
m := newTestMutator(rt)
ctx := context.Background()

podSpec := &corev1.PodSpec{}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@ type ResolvedConfig struct {
// raw AuthBridgeRuntimeYAML so callers (e.g. RenderEnvoyConfig) can
// branch on the resolved values without re-parsing the YAML.
// AuthBridgeMode is "" when no source set it (caller picks the default).
// MTLSMode is "" when no source set it (caller treats as "disabled").
// MTLSMode is "" when no source set it (caller treats as "permissive").
AuthBridgeMode string
MTLSMode string
}
Expand Down
2 changes: 2 additions & 0 deletions kagenti-operator/test/e2e/fixtures.go
Original file line number Diff line number Diff line change
Expand Up @@ -959,6 +959,7 @@ metadata:
namespace: ` + authBridgeTestNamespace + `
spec:
type: agent
mtlsMode: disabled
targetRef:
apiVersion: apps/v1
kind: Deployment
Expand Down Expand Up @@ -1092,6 +1093,7 @@ metadata:
namespace: ` + authBridgeTestNamespace + `
spec:
type: agent
mtlsMode: disabled
targetRef:
apiVersion: apps/v1
kind: Deployment
Expand Down