PoC: Kagenti integration for agent lifecycle management #2304

@jezekra1

Summary

Refactor agentstack so that agent scaling, deployment, and discovery are handled by kagenti instead of our custom Kubernetes provider management. This gives us a standard A2A agent lifecycle, real-time agent card discovery, and a path to zero-trust identity (SPIRE/SPIFFE), service mesh (Istio Ambient), and Shipwright-based builds.

What We Gain

  • Standard A2A agent lifecycle management (deploy, discover, scale)
  • Real-time agent card discovery (no more storing cards in the DB or in Docker labels)
  • Zero-trust identity via SPIRE/SPIFFE (optional)
  • Service mesh observability via Istio Ambient (optional)
  • Shipwright-based builds replacing our Kaniko pipeline (optional)
  • Team namespace isolation

What We Drop

  • KubernetesProviderDeploymentManager (custom kr8s-based deployment logic)
  • KubernetesProviderBuildManager (Kaniko build jobs with Docker label baking)
  • Agent card storage in database / Docker image labels
  • Scale-to-zero logic (kagenti handles agent lifecycle)
  • The concept of "managed" vs "unmanaged" providers (all agents become kagenti-managed)

Target Architecture

Lima VM (MicroShift)
├── agentstack namespace
│   ├── agentstack-server (FastAPI, slimmed down)
│   │   ├── A2A Proxy → delegates to kagenti agent services
│   │   └── Provider Registry → reads from kagenti API / agent cards
│   ├── PostgreSQL
│   ├── Redis
│   └── SeaweedFS
├── keycloak namespace (shared)
│   └── Keycloak (StatefulSet)
├── kagenti-system namespace
│   ├── kagenti-operator
│   ├── kagenti-webhook
│   ├── kagenti-ui (backend + frontend) [optional]
│   └── MCP Gateway
├── team1, team2, ... (agent namespaces)
│   └── agent Deployments + Services
├── istio-system (optional)
├── zero-trust-workload-identity-manager (optional)
└── cr-system (container registry, optional)

Key Decisions

Installing Kagenti into MicroShift: Helm-only

Strip kagenti down to its two Helm charts (kagenti and kagenti-deps) and install them directly into MicroShift. Skip the Ansible playbook entirely.

What Ansible does that we replicate elsewhere:

  1. Cluster creation → already handled by Lima/MicroShift
  2. DNS setup → already handled by Lima networking
  3. OAuth secret creation → Helm hooks or init containers
  4. Image preloading → pre-pull in Lima config
  5. Shipwright ClusterBuildStrategy → Helm template or post-install hook

Configurable Feature Toggles

kagenti:
  enabled: true
  features:
    istio:
      enabled: false        # Service mesh (Ambient mode)
    spire:
      enabled: false        # Zero-trust workload identity
    shipwright:
      enabled: false        # Container builds
    builds:
      enabled: false        # Build UI + Tekton
    observability:
      phoenix:
        enabled: true       # LLM trace viewer
      otel:
        enabled: false      # OpenTelemetry collector
      kiali:
        enabled: false      # Service mesh dashboard
    containerRegistry:
      enabled: false        # In-cluster registry
    certManager:
      enabled: false        # Certificate management
    mcpGateway:
      enabled: false        # MCP Gateway

Minimal local setup: just kagenti operator + webhook + Keycloak. Full-featured: everything enabled.

Keycloak: kagenti-deps owns it, agentstack consumes it

  • kagenti-deps deploys Keycloak in keycloak namespace
  • Agentstack Helm chart disables its own Keycloak and references the external instance
  • Deployment order: kagenti-deps → agentstack → kagenti

Keycloak Realms: Separate

  • Agentstack keeps its agentstack realm
  • Kagenti keeps its kagenti realm
  • Both realms in the same Keycloak instance, fully independent
  • Agentstack's provision-job.yaml stays in agentstack Helm chart, targets keycloak-service.keycloak:8080
  • Admin credentials shared via K8s secret or dedicated admin client

Agent Discovery: Kagenti Backend API

Agentstack calls kagenti's REST API to discover agents:

agentstack-server → HTTP → kagenti-backend → K8s API → Deployments with kagenti.io/type=agent

Agentstack-server gets a dedicated Keycloak client in the kagenti realm (e.g., agentstack-api) with kagenti-viewer role, using client credentials grant for service-to-service auth.

Direct K8s label scanning was rejected because K8s RBAC is namespace-scoped: agentstack's service account cannot list Deployments in team namespaces without a ClusterRole escalation. The kagenti API avoids this entirely.
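The auth flow above can be sketched as follows. Only the client-credentials grant, the kagenti realm, the agentstack-api client, and the Keycloak service address come from this document; the kagenti backend's service address, endpoint path, and response shape are assumptions for illustration.

```python
# Sketch: agentstack-server discovering agents via the kagenti backend API,
# authenticating with the client-credentials grant against the kagenti realm.
import json
import urllib.parse
import urllib.request

# Keycloak address follows the provision-job target mentioned above; the
# kagenti-backend address is an assumed in-cluster Service name.
KEYCLOAK_TOKEN_URL = (
    "http://keycloak-service.keycloak:8080"
    "/realms/kagenti/protocol/openid-connect/token"
)
KAGENTI_API = "http://kagenti-backend.kagenti-system:8080"  # assumption

def token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the client-credentials token request for the kagenti realm."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode()
    return urllib.request.Request(KEYCLOAK_TOKEN_URL, data=body, method="POST")

def list_agents(access_token: str) -> list[dict]:
    """Fetch agents visible to the kagenti-viewer role (assumed endpoint)."""
    req = urllib.request.Request(
        f"{KAGENTI_API}/agents",  # hypothetical path
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The provider service would exchange the `agentstack-api` client credentials for a token once, cache it until expiry, and call `list_agents` on each registry refresh.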

Multi-Tenancy: Global Agent Catalog + Per-User Data

All agents across all kagenti namespaces are visible to all agentstack users. User data (conversations, tasks, files) remains per-user.

Agents:        GLOBAL (all users see all agents from all namespaces)
Conversations: PER-USER (user's own chat history with agents)
Tasks:         PER-USER (A2A task ownership)
Files:         PER-USER (uploaded documents)

Multi-Agent Communication: All through agentstack proxy

Agentstack issues custom tokens and controls which agent can call which, so it stays in the request path. Cross-namespace networking works fine in K8s (no architectural blockers).

# Agent (team1) → agentstack API (agentstack namespace) ← works
# Agentstack (agentstack namespace) → agent (team1)     ← works

Namespaces are a logical boundary for resource organization, not a network boundary. Istio Ambient adds mTLS on top without restricting connectivity.
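Since agentstack stays in the request path, the proxy can enforce which agent may call which. A hypothetical sketch of such a policy check (the allowlist shape and names are illustrative, not from kagenti):

```python
# Hypothetical agent-to-agent call policy enforced by the agentstack proxy.
# Keys and values are (namespace, agent) pairs; the data here is made up.
ALLOWED_CALLS: dict[tuple[str, str], set[tuple[str, str]]] = {
    ("team1", "researcher"): {("team1", "summarizer"), ("team2", "translator")},
}

def may_call(src_ns: str, src_agent: str, dst_ns: str, dst_agent: str) -> bool:
    """True if the source agent is allowed to invoke the destination agent."""
    return (dst_ns, dst_agent) in ALLOWED_CALLS.get((src_ns, src_agent), set())
```

The proxy would consult this check after validating the custom token it issued to the calling agent, before forwarding the A2A request cross-namespace.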

DNS

Adopt kagenti's localtest.me convention (wildcard DNS → 127.0.0.1) for Keycloak redirect URIs, agent URLs, and UI access.


Component Mapping

What agentstack drops (delegates to kagenti)

  • KubernetesProviderDeploymentManager → kagenti operator deploys agents as standard K8s Deployments in team namespaces
  • KubernetesProviderBuildManager (Kaniko) → Shipwright + Tekton builds (optional)
  • Agent card in Docker labels (beeai.dev.agent.json) → real-time HTTP fetch from /.well-known/agent-card.json
  • Agent card stored in DB → real-time discovery from running agents
  • Scale-to-zero / auto-stop logic → kagenti manages agent lifecycle (or standard HPA)
  • Provider model (managed/unmanaged distinction) → all agents are kagenti-managed Deployments
  • build-provider-job.yaml (Kaniko + Crane) → Shipwright BuildRun with Buildah strategy
  • Keycloak deployment → kagenti-deps Keycloak deployment

What agentstack keeps

  • A2A Proxy Service: core routing/auth logic, user task tracking
  • Provider Registry sync: can evolve to sync with kagenti's agent namespaces
  • PostgreSQL: agentstack's own data (users, tasks, conversations)
  • Redis: caching, rate limiting
  • SeaweedFS: object storage for artifacts
  • Phoenix: LLM observability

What changes in agentstack-server

  • bootstrap.py: remove KubernetesProviderDeploymentManager and KubernetesProviderBuildManager injection
  • providers.py service: rewrite to discover agents via the kagenti API
  • a2a.py service: update URL resolution to http://{agent}.{namespace}.svc.cluster.local:8080
  • Provider model: simplify; remove auto_stop_timeout, unmanaged_state, and build fields
  • Provider cron jobs: remove auto_stop_providers and refresh_unmanaged_provider_state; keep or adapt registry sync
  • Configuration: add kagenti connection settings, remove build/scaling config

Agent Card Discovery - New Model

Current: Build-time baking into Docker labels → stored in DB → DB lookup at runtime

New: Runtime HTTP fetch from /.well-known/agent-card.json via kagenti API. No more "offline" agents. Discovery is always fresh.

Transition: Keep DB-backed card cache as fallback during PoC, add kagenti-based discovery as primary, remove DB cache once stable.
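The transition model can be sketched as below. The in-cluster URL pattern follows the a2a.py convention in the component mapping; the cache_lookup parameter is a hypothetical stand-in for the DB-backed card cache kept as a fallback during the PoC.

```python
# Sketch of runtime agent-card discovery with a DB-cache fallback (PoC phase).
import json
import urllib.request

def agent_card_url(agent: str, namespace: str, port: int = 8080) -> str:
    """Well-known agent-card endpoint of a kagenti-managed agent Service."""
    return (
        f"http://{agent}.{namespace}.svc.cluster.local:{port}"
        "/.well-known/agent-card.json"
    )

def fetch_agent_card(agent: str, namespace: str, cache_lookup=None) -> dict:
    """Fetch the card live; fall back to the cached copy if the agent is
    unreachable. cache_lookup is a hypothetical DB-backed accessor that is
    removed once kagenti-based discovery is stable."""
    try:
        url = agent_card_url(agent, namespace)
        with urllib.request.urlopen(url, timeout=2) as resp:
            return json.load(resp)
    except OSError:
        if cache_lookup is not None:
            return cache_lookup(agent, namespace)
        raise
```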


MicroShift Compatibility Notes

  • Kagenti supports OpenShift via global.openshift: true
  • MicroShift may lack OLM → need Helm-based alternatives for operators
  • SPIRE's ZTWIM operator requires OCP 4.19+ → use useSpireHelmChart: true
  • Istio Ambient mode should work on MicroShift (standard K8s networking)
  • Adopt localtest.me for DNS

Implementation Plan

Phase 1: Minimal Integration

  • Install kagenti Helm charts into MicroShift (operator + webhook only)
  • Move Keycloak to separate namespace
  • Deploy a test agent via kagenti (manual kubectl)
  • Verify agent card discovery via HTTP
  • Update agentstack A2A proxy to route to kagenti-managed agents

Phase 2: Feature Parity

  • Remove KubernetesProviderDeploymentManager
  • Remove KubernetesProviderBuildManager
  • Implement kagenti-based agent discovery in provider service
  • Update Helm chart (remove Keycloak, add kagenti-deps dependency)
  • Test full agent lifecycle: deploy → discover → chat → delete

Phase 3: Optional Features

  • Enable Istio Ambient mode
  • Enable SPIRE/SPIFFE identity
  • Enable Shipwright builds
  • Configure feature toggles in values.yaml

Open Questions

  1. Kagenti operator on MicroShift: Has this been tested? Any known issues?
  2. Agent namespaces: Should agentstack create team namespaces, or delegate to kagenti?
  3. UI: Do we keep agentstack UI only, or also deploy kagenti UI for agent management?
  4. MCP Gateway: Agentstack has managed MCP service. Kagenti also has MCP Gateway. Which wins?
  5. Provider registry: Current agentstack syncs from Git-based registries. Does kagenti have an equivalent, or do we keep this?
  6. Observability: Both use Phoenix. Consolidate to one instance?

Design document: docs/poc-kagenti-integration.md on branch poc/kagenti-integration
