Feat: [spec] mTLS transport security for agent communication (003) #401
1. **SPIRE (default and only)**: SPIRE-issued X.509 SVIDs via the Workload API. Already deployed for JWT SVIDs in Kagenti. The spiffe-helper sidecar or go-spiffe SDK provides certificates.

Istio, user-supplied certificates, and cert-manager are explicitly out of scope. No Istio dependency; Istio support can be added in a future iteration if needed.
This work has already been kicked off; it covers Istio service mesh with mTLS (L4) for pod-to-pod traffic encryption:
Force-pushed from 6752ecb to 3f4cae4.
The DCO check is failing on this PR: one of the two commits is missing its Signed-off-by line.
To fix, sign off all commits on the branch and force-push:

    git rebase --signoff main  # use origin/main or upstream/main if that is your base
    git push --force-with-lease
Review Guide: mTLS Transport Security for Agent Communication
Generated: 2026-06-08 | Spec:

Why This Change
Agent-to-agent and controller-to-agent communication in kagenti currently runs over plaintext HTTP by default. While authbridge already has full mTLS support implemented (permissive/strict modes, SPIRE-based SVIDs, per-handshake cert rotation), the operator doesn't activate it by default. Operators must manually set flags and configure the mTLS mode. This spec makes mTLS the default transport security, with clear error conditions when SPIRE is unavailable.

What Changes
No breaking changes. Existing deployments without SPIRE get a clear

How It Works
The implementation leans heavily on what's already built:
Of 24 tasks, 2 are already done (

When It Applies
Applies when:
Does not apply when:

Key Decisions
Areas Needing Attention
Open Questions
Review Checklist
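The defaulting behaviour described in the review guide (mTLS on by default, a clear error when SPIRE is missing) can be sketched roughly as follows. This is only an illustration under stated assumptions: the type `MTLSMode`, the functions `ResolveMode` and `CheckPrerequisites`, and the error `ErrSPIREUnavailable` are hypothetical names, not identifiers from the kagenti operator or authbridge. Only the mode values `permissive`, `strict`, and `disabled` come from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// MTLSMode holds one of the three mode values named in this PR.
// The type itself is hypothetical.
type MTLSMode string

const (
	ModeDisabled   MTLSMode = "disabled"
	ModePermissive MTLSMode = "permissive"
	ModeStrict     MTLSMode = "strict"
)

// ErrSPIREUnavailable is a hypothetical sentinel error for this sketch.
var ErrSPIREUnavailable = errors.New("mTLS is enabled but SPIRE is unavailable")

// ResolveMode applies the default (permissive) when the field is unset
// and rejects values outside the three known modes.
func ResolveMode(requested string) (MTLSMode, error) {
	switch MTLSMode(requested) {
	case "":
		return ModePermissive, nil
	case ModeDisabled, ModePermissive, ModeStrict:
		return MTLSMode(requested), nil
	default:
		return "", fmt.Errorf("unknown mTLS mode %q", requested)
	}
}

// CheckPrerequisites returns an error when mTLS is on but SPIRE was not
// detected. How SPIRE is detected is outside the scope of this sketch.
func CheckPrerequisites(mode MTLSMode, spireAvailable bool) error {
	if mode != ModeDisabled && !spireAvailable {
		return ErrSPIREUnavailable
	}
	return nil
}

func main() {
	mode, err := ResolveMode("") // field left unset by the user
	if err != nil {
		fmt.Println("invalid mode:", err)
		return
	}
	fmt.Println("effective mode:", mode)
	fmt.Println("prerequisite check:", CheckPrerequisites(mode, false))
}
```

Keeping mode resolution separate from the SPIRE check means an opt-out (`disabled`) never trips the SPIRE error, which matches the "no breaking changes" intent stated above.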
Add spec-driven development artifacts for mTLS transport security covering controller-to-agent and agent-to-agent communication paths. SPIRE is the sole certificate provider; Istio is out of scope.
Artifacts: spec.md, plan.md, research.md, data-model.md, tasks.md, brainstorm.md, and authbridge config contract.
Jira: RHAIENG-4944
Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>

Reconciled tasks against current main. Key findings:
- fetchCard() mTLS-first logic already implemented
- Envoy TLS contexts already wired in webhook injector
- Signing flags already default to false
- Marked T014 and T019 as DONE
- Added summary table of actual work needed (6 impl, 8 test, 6 polish)
- Updated file paths and line references to match current code
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
Co-authored-by: Roland Huss <rhuss@redhat.com>
Assisted-By: 🤖 Claude Code

- Add Istio coexistence section: SPIRE and Istio mTLS are complementary layers, coexist without conflict
- Clarify MTLSReady does NOT block Ready condition; emit Warning Event instead for kubectl describe visibility
- Document SPIRE CSI driver as known limitation in detection heuristic (spiffe-helper is the supported pattern for now)
- Update T016 task to reflect these decisions
Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
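The decision that MTLSReady does not block Ready, with a Warning Event emitted instead, could look something like the sketch below. It uses simplified stand-in types rather than the real Kubernetes API types, and every name in it (`Condition`, `Event`, `Evaluate`, and the reason strings) is an assumption made for illustration, not code from this PR.

```go
package main

import "fmt"

// Condition is a simplified stand-in for a Kubernetes status condition.
type Condition struct {
	Type   string
	Status bool
	Reason string
}

// Event is a simplified stand-in for a Kubernetes Event.
type Event struct {
	Type    string // "Normal" or "Warning"
	Reason  string
	Message string
}

// Evaluate derives Ready and MTLSReady independently. Ready depends only on
// workloadReady; a missing SPIRE deployment sets MTLSReady to false and adds
// a Warning event, but never changes Ready.
func Evaluate(workloadReady, mtlsEnabled, spireDetected bool) ([]Condition, []Event) {
	conds := []Condition{{Type: "Ready", Status: workloadReady, Reason: "WorkloadStatus"}}
	var events []Event
	if !mtlsEnabled {
		return conds, events
	}
	mtls := Condition{Type: "MTLSReady", Status: spireDetected, Reason: "SPIREDetected"}
	if !spireDetected {
		mtls.Reason = "SPIREUnavailable" // hypothetical reason string
		events = append(events, Event{
			Type:    "Warning",
			Reason:  "SPIREUnavailable",
			Message: "mTLS is enabled but SPIRE was not detected",
		})
	}
	return append(conds, mtls), events
}

func main() {
	conds, events := Evaluate(true, true, false)
	for _, c := range conds {
		fmt.Printf("%s=%t (%s)\n", c.Type, c.Status, c.Reason)
	}
	for _, e := range events {
		fmt.Printf("%s %s: %s\n", e.Type, e.Reason, e.Message)
	}
}
```

Surfacing the problem as an event keeps it visible in `kubectl describe` without turning a missing SPIRE deployment into a workload outage.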
Force-pushed from e6b379a to 90bc21a.
Using the following points for tracking based on #405 and team review:
pdettori left a comment
Solid spec-first approach. Architecture is sound — leveraging existing authbridge mTLS and adding operator-side wiring. The team has already identified the key pivot via #405 (annotation-based delivery via webhook instead of ConfigMap injection), which simplifies the operator's responsibility.
Main suggestion: update the spec artifacts to reflect the #405 decisions before merge, so the spec doesn't go stale on day one.
Areas reviewed: Architecture/design, scope boundaries, security model, task coherence
Commits: 4 commits, all signed-off ✓
CI: E2E failing (unrelated — spec-only PR), all other checks pass
Update mTLS config delivery from ConfigMap injection to annotation + env var approach per PR kagenti#405 team review:
- Controller sets kagenti.io/mtls-mode annotation on pod template (triggers rolling restart on change, independent of config hash)
- Webhook reads mTLSMode from AgentRuntime CR at pod CREATE time, sets MTLS_MODE env var on authbridge container
- Acknowledge Istio mTLS coexistence (PR kagenti#383, kagenti#399, RHAIENG-5467)
- Mark T005 as SUPERSEDED with new annotation-based approach
- Update contracts, data-model, research, REVIEWERS.md
Jira: RHAIENG-4944
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
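The annotation plus env var delivery described in this commit can be illustrated with the sketch below. The annotation key `kagenti.io/mtls-mode` and the variable name `MTLS_MODE` are taken from the commit message. Everything else is assumed for illustration: the plain structs stand in for the Kubernetes pod types, the container name `authbridge` is inferred from the phrase "authbridge container", and the functions `SetModeAnnotation` and `InjectModeEnv` are hypothetical.

```go
package main

import "fmt"

const (
	mtlsModeAnnotation = "kagenti.io/mtls-mode" // from the commit message
	mtlsModeEnvVar     = "MTLS_MODE"            // from the commit message
	authbridgeName     = "authbridge"           // assumed container name
)

// Simplified stand-ins for the Kubernetes pod template types.
type EnvVar struct{ Name, Value string }

type Container struct {
	Name string
	Env  []EnvVar
}

type PodTemplate struct {
	Annotations map[string]string
	Containers  []Container
}

// SetModeAnnotation is the controller-side step. It reports whether the
// annotation changed; a changed pod template is what leads to a rollout.
func SetModeAnnotation(tpl *PodTemplate, mode string) bool {
	if tpl.Annotations == nil {
		tpl.Annotations = map[string]string{}
	}
	if cur, ok := tpl.Annotations[mtlsModeAnnotation]; ok && cur == mode {
		return false
	}
	tpl.Annotations[mtlsModeAnnotation] = mode
	return true
}

// InjectModeEnv is the webhook-side step. It sets MTLS_MODE on the
// authbridge container, overwriting an existing value, and reports whether
// such a container was found.
func InjectModeEnv(tpl *PodTemplate, mode string) bool {
	for i := range tpl.Containers {
		c := &tpl.Containers[i]
		if c.Name != authbridgeName {
			continue
		}
		for j := range c.Env {
			if c.Env[j].Name == mtlsModeEnvVar {
				c.Env[j].Value = mode
				return true
			}
		}
		c.Env = append(c.Env, EnvVar{Name: mtlsModeEnvVar, Value: mode})
		return true
	}
	return false
}

func main() {
	tpl := &PodTemplate{Containers: []Container{{Name: "agent"}, {Name: "authbridge"}}}
	fmt.Println("annotation changed:", SetModeAnnotation(tpl, "permissive"))
	fmt.Println("env injected:", InjectModeEnv(tpl, "permissive"))
	fmt.Println(tpl.Annotations, tpl.Containers[1].Env)
}
```

Both functions are idempotent, so repeated reconciles or admission calls with the same mode leave the pod template unchanged.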
I have updated the spec based on the changes agreed in #405. Are we good to get the spec merged?
Summary
- permissive (enabled by default, opt-out with mTLSMode: disabled)

Artifacts
- specs/003-mtls-transport-security/spec.md
- specs/003-mtls-transport-security/plan.md
- specs/003-mtls-transport-security/research.md
- specs/003-mtls-transport-security/data-model.md
- specs/003-mtls-transport-security/tasks.md
- specs/003-mtls-transport-security/contracts/
- specs/003-mtls-transport-security/brainstorm.md

Key Decisions
- permissive mode, operators opt out with mTLSMode: disabled
- MTLSReady=False condition when SPIRE is unavailable

References
Test plan
Signed-off-by: Varsha Prasad Narsing varshaprasad96@gmail.com
🤖 Generated with Claude Code