Summary
Refactor agentstack so that agent scaling, deployment, and discovery are handled by kagenti instead of our custom Kubernetes provider management. This gives us standard A2A agent lifecycle, realtime agent card discovery, and a path to zero-trust identity (SPIRE/SPIFFE), service mesh (Istio Ambient), and Shipwright-based builds.
What We Gain
- Standard A2A agent lifecycle management (deploy, discover, scale)
- Realtime agent card discovery (no more storing cards in DB or Docker labels)
- Zero-trust identity via SPIRE/SPIFFE (optional)
- Service mesh observability via Istio Ambient (optional)
- Shipwright-based builds replacing our Kaniko pipeline (optional)
- Team namespace isolation
What We Drop
- `KubernetesProviderDeploymentManager` (custom kr8s-based deployment logic)
- `KubernetesProviderBuildManager` (Kaniko build jobs with Docker label baking)
- Agent card storage in database / Docker image labels
- Scale-to-zero logic (kagenti handles agent lifecycle)
- The concept of "managed" vs "unmanaged" providers (all agents become kagenti-managed)
Target Architecture
```
Lima VM (MicroShift)
├── agentstack namespace
│   ├── agentstack-server (FastAPI, slimmed down)
│   │   ├── A2A Proxy → delegates to kagenti agent services
│   │   └── Provider Registry → reads from kagenti API / agent cards
│   ├── PostgreSQL
│   ├── Redis
│   └── SeaweedFS
├── keycloak namespace (shared)
│   └── Keycloak (StatefulSet)
├── kagenti-system namespace
│   ├── kagenti-operator
│   ├── kagenti-webhook
│   ├── kagenti-ui (backend + frontend) [optional]
│   └── MCP Gateway
├── team1, team2, ... (agent namespaces)
│   └── agent Deployments + Services
├── istio-system (optional)
├── zero-trust-workload-identity-manager (optional)
└── cr-system (container registry, optional)
```
Key Decisions
Installing Kagenti into MicroShift: Helm-only
Strip kagenti down to its two Helm charts (kagenti and kagenti-deps) and install them directly into MicroShift. Skip the Ansible playbook entirely.
What Ansible does that we replicate elsewhere:
- Cluster creation → already handled by Lima/MicroShift
- DNS setup → already handled by Lima networking
- OAuth secret creation → Helm hooks or init containers
- Image preloading → pre-pull in Lima config
- Shipwright ClusterBuildStrategy → Helm template or post-install hook
Configurable Feature Toggles
```yaml
kagenti:
  enabled: true
  features:
    istio:
      enabled: false         # Service mesh (Ambient mode)
    spire:
      enabled: false         # Zero-trust workload identity
    shipwright:
      enabled: false         # Container builds
    builds:
      enabled: false         # Build UI + Tekton
    observability:
      phoenix:
        enabled: true        # LLM trace viewer
      otel:
        enabled: false       # OpenTelemetry collector
      kiali:
        enabled: false       # Service mesh dashboard
    containerRegistry:
      enabled: false         # In-cluster registry
    certManager:
      enabled: false         # Certificate management
    mcpGateway:
      enabled: false         # MCP Gateway
```

Minimal local setup: just kagenti operator + webhook + Keycloak. Full-featured: everything enabled.
Keycloak: kagenti-deps owns it, agentstack consumes it
- kagenti-deps deploys Keycloak in the `keycloak` namespace
- Agentstack Helm chart disables its own Keycloak and references the external instance
- Deployment order: kagenti-deps → agentstack → kagenti
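One way to express the "consumes external Keycloak" decision in the agentstack chart. The key names below are a sketch, not the actual chart schema:

```yaml
# values.yaml sketch — key names are assumptions, not the real chart schema
keycloak:
  enabled: false                  # disable the chart's bundled Keycloak
externalKeycloak:
  url: http://keycloak-service.keycloak:8080
  adminSecretRef: keycloak-admin  # shared K8s secret for the provision job
```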
Keycloak Realms: Separate
- Agentstack keeps its `agentstack` realm
- Kagenti keeps its `kagenti` realm
- Both realms in the same Keycloak instance, fully independent
- Agentstack's provision-job.yaml stays in the agentstack Helm chart, targets `keycloak-service.keycloak:8080`
- Admin credentials shared via K8s secret or dedicated admin client
Agent Discovery: Kagenti Backend API
Agentstack calls kagenti's REST API to discover agents:
agentstack-server → HTTP → kagenti-backend → K8s API → Deployments with kagenti.io/type=agent
Agentstack-server gets a dedicated Keycloak client in the kagenti realm (e.g., agentstack-api) with kagenti-viewer role, using client credentials grant for service-to-service auth.
Direct K8s label scanning was rejected because K8s API RBAC is namespace-scoped - agentstack's service account can't list Deployments in team namespaces without ClusterRole escalation. The kagenti API avoids this entirely.
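The flow above can be sketched as follows. The Keycloak token endpoint follows the standard realm path; the kagenti backend address and the `/api/v1/agents` path are assumptions for illustration, not confirmed API routes:

```python
import json
import urllib.parse
import urllib.request

# Token endpoint for the kagenti realm (standard Keycloak path).
KEYCLOAK_TOKEN_URL = (
    "http://keycloak-service.keycloak:8080/realms/kagenti/protocol/openid-connect/token"
)
# Assumed kagenti backend service address and agents path (illustrative).
KAGENTI_API = "http://kagenti-backend.kagenti-system:8000"


def auth_header(token: str) -> dict:
    """Bearer header for service-to-service calls."""
    return {"Authorization": f"Bearer {token}"}


def fetch_service_token(client_id: str, client_secret: str) -> str:
    """Client credentials grant for the agentstack-api client in the kagenti realm."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode()
    req = urllib.request.Request(KEYCLOAK_TOKEN_URL, data=body, method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["access_token"]


def list_agents(token: str) -> list:
    """List agents via the kagenti backend, avoiding cluster-wide RBAC."""
    req = urllib.request.Request(
        f"{KAGENTI_API}/api/v1/agents", headers=auth_header(token)
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```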
Multi-Tenancy: Global Agent Catalog + Per-User Data
All agents across all kagenti namespaces are visible to all agentstack users. User data (conversations, tasks, files) remains per-user.
Agents: GLOBAL (all users see all agents from all namespaces)
Conversations: PER-USER (user's own chat history with agents)
Tasks: PER-USER (A2A task ownership)
Files: PER-USER (uploaded documents)
Multi-Agent Communication: All through agentstack proxy
Agentstack issues custom tokens and controls which agent can call which, so it stays in the request path. Cross-namespace networking works fine in K8s (no architectural blockers).
```
# Agent (team1) → agentstack API (agentstack namespace) ← works
# Agentstack (agentstack namespace) → agent (team1) ← works
```
Namespaces are a logical boundary for resource organization, not a network boundary. Istio Ambient adds mTLS on top without restricting connectivity.
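Because agentstack stays in the request path, "which agent can call which" can be a simple policy check in the proxy before a token is issued. A minimal sketch, with illustrative agent names and policy shape:

```python
# Hypothetical call-graph policy enforced by the agentstack proxy.
# Keys are callers; values are the set of agents each caller may invoke.
CALL_GRAPH: dict[str, set[str]] = {
    "planner": {"researcher", "writer"},
    "researcher": set(),  # leaf agent: may not call other agents
}


def may_call(caller: str, callee: str) -> bool:
    """Return True if the proxy should issue a token for caller → callee."""
    return callee in CALL_GRAPH.get(caller, set())
```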
DNS
Adopt kagenti's localtest.me convention (wildcard DNS → 127.0.0.1) for Keycloak redirect URIs, agent URLs, and UI access.
Component Mapping
What agentstack drops (delegates to kagenti)
| Agentstack Component | Kagenti Replacement |
|---|---|
| `KubernetesProviderDeploymentManager` | Kagenti operator deploys agents as standard K8s Deployments in team namespaces |
| `KubernetesProviderBuildManager` (Kaniko) | Shipwright + Tekton builds (optional) |
| Agent card in Docker labels (`beeai.dev.agent.json`) | Realtime HTTP fetch from `/.well-known/agent-card.json` |
| Agent card stored in DB | Realtime discovery from running agents |
| Scale-to-zero / auto-stop logic | Kagenti manages agent lifecycle (or standard HPA) |
| Provider model (managed/unmanaged distinction) | All agents are kagenti-managed Deployments |
| `build-provider-job.yaml` (Kaniko + Crane) | Shipwright BuildRun with Buildah strategy |
| Keycloak deployment | kagenti-deps Keycloak deployment |
What agentstack keeps
| Component | Reason |
|---|---|
| A2A Proxy Service | Core routing/auth logic, user task tracking |
| Provider Registry sync | Can evolve to sync with kagenti's agent namespaces |
| PostgreSQL | Agentstack's own data (users, tasks, conversations) |
| Redis | Caching, rate limiting |
| SeaweedFS | Object storage for artifacts |
| Phoenix | LLM observability |
What changes in agentstack-server
| Area | Change |
|---|---|
| `bootstrap.py` | Remove `KubernetesProviderDeploymentManager` and `KubernetesProviderBuildManager` injection |
| `providers.py` service | Rewrite to discover agents via kagenti API |
| `a2a.py` service | Update URL resolution: `http://{agent}.{namespace}.svc.cluster.local:8080` |
| Provider model | Simplify: remove `auto_stop_timeout`, `unmanaged_state`, build fields |
| Provider cron jobs | Remove `auto_stop_providers`, `refresh_unmanaged_provider_state`; keep or adapt registry sync |
| Configuration | Add kagenti connection settings, remove build/scaling config |
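The URL-resolution change in `a2a.py` reduces to building the standard in-cluster Service DNS name. A sketch (the helper name is illustrative; port 8080 matches the pattern above):

```python
def resolve_agent_url(agent: str, namespace: str, port: int = 8080) -> str:
    """Build the in-cluster Service DNS URL for a kagenti-managed agent."""
    return f"http://{agent}.{namespace}.svc.cluster.local:{port}"
```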
Agent Card Discovery - New Model
Current: Build-time baking into Docker labels → stored in DB → DB lookup at runtime
New: Runtime HTTP fetch from /.well-known/agent-card.json via kagenti API. No more "offline" agents. Discovery is always fresh.
Transition: Keep DB-backed card cache as fallback during PoC, add kagenti-based discovery as primary, remove DB cache once stable.
MicroShift Compatibility Notes
- Kagenti supports OpenShift via `global.openshift: true`
- MicroShift may lack OLM → need Helm-based alternatives for operators
- SPIRE's ZTWIM operator requires OCP 4.19+ → use `useSpireHelmChart: true`
- Istio Ambient mode should work on MicroShift (standard K8s networking)
- Adopt `localtest.me` for DNS
Implementation Plan
Phase 1: Minimal Integration
- Install kagenti Helm charts into MicroShift (operator + webhook only)
- Move Keycloak to separate namespace
- Deploy a test agent via kagenti (manual kubectl)
- Verify agent card discovery via HTTP
- Update agentstack A2A proxy to route to kagenti-managed agents
Phase 2: Feature Parity
- Remove `KubernetesProviderDeploymentManager`
- Remove `KubernetesProviderBuildManager`
- Implement kagenti-based agent discovery in provider service
- Update Helm chart (remove Keycloak, add kagenti-deps dependency)
- Test full agent lifecycle: deploy → discover → chat → delete
Phase 3: Optional Features
- Enable Istio Ambient mode
- Enable SPIRE/SPIFFE identity
- Enable Shipwright builds
- Configure feature toggles in values.yaml
Open Questions
- Kagenti operator on MicroShift: Has this been tested? Any known issues?
- Agent namespaces: Should agentstack create team namespaces, or delegate to kagenti?
- UI: Do we keep agentstack UI only, or also deploy kagenti UI for agent management?
- MCP Gateway: Agentstack has managed MCP service. Kagenti also has MCP Gateway. Which wins?
- Provider registry: Current agentstack syncs from Git-based registries. Does kagenti have an equivalent, or do we keep this?
- Observability: Both use Phoenix. Consolidate to one instance?
Design document: docs/poc-kagenti-integration.md on branch poc/kagenti-integration