Agentic Platform

A production-grade, multi-tenant Kubernetes platform for deploying, managing, and securing AI agents at scale. Built on AWS EKS with Istio Ambient Mesh, it provides enterprise-level isolation, observability, and governance for agentic workloads.

Architecture Overview

The platform combines two open-source projects — AgentGateway (data plane) and kagent (control plane) — with Istio Ambient Mesh, Keycloak, and a suite of Kyverno policies to form a complete multi-tenant agent runtime.

Key Design Decisions

Istio Ambient Mesh — Zero-sidecar service mesh. L4 mTLS via ztunnel on every node; L7 policy via per-tenant waypoint proxies. No sidecar injection overhead.
AgentGateway as Waypoint — The waypoint proxy implementation is AgentGateway (not Envoy), giving waypoints native understanding of MCP, A2A, and LLM protocols.
Credential Injection at Gateway — Tenants never hold real LLM API keys. They hold dummy secrets that satisfy LiteLLM's key validation; the gateway strips the dummy value and injects the real credential at the proxy layer.
Kyverno-Driven Conventions — Policies auto-generate HTTPRoutes from annotations, inject node scheduling constraints, and enforce trace propagation. Tenants opt in to route generation by setting the annotation platform.agentic.io/expose: "true".

Components

Component	Role	Namespace
AgentGateway	L7 proxy — routes MCP, A2A, and LLM traffic; enforces auth policies; injects credentials; traces to Langfuse	`agentgateway-system`
kagent	K8s-native agent controller — reconciles Agent, ModelConfig, MCPServer CRDs into running workloads	`kagent-system`
Istio (Ambient)	mTLS, SPIFFE identity, ztunnel (L4), waypoint enrollment	`istio-system`
Keycloak	Identity provider — JWT issuance, tenant claims	`keycloak`
Langfuse	LLM observability — OTEL trace ingestion, prompt/completion logging, cost tracking	`langfuse`
ClickHouse	Analytics database backing Langfuse	`langfuse`
Prometheus + Grafana	Metrics collection, dashboards, alerting	`monitoring`
Kyverno	Policy engine — 5 mutation/generation policies for scheduling, routing, and protocol detection	`kyverno`
OTEL Collector	Bridges agent gRPC OTLP to Langfuse HTTP OTLP endpoint	`langfuse`
EverMemOS	Long-term memory system — REST API for memory storage, retrieval, and hybrid search (BM25 + vector + reranker). Backed by MongoDB, Elasticsearch, Milvus, and Redis	`evermemos`
Cluster Autoscaler	Automatically scales EKS node groups when pods are unschedulable due to resource pressure	`kube-system`

Node Groups

The EKS cluster uses three node groups to isolate workload types. Two of them are tainted; the platform group is left untainted as the default landing zone:

Node Group	Instance type (min-max nodes)	Taint	Workloads
platform	t3.large (2-6)	— (untainted, default landing zone)	Controllers, Langfuse, Prometheus, Keycloak, Kyverno, EverMemOS
agents	t3.large (1-5)	`workload=agents:NoSchedule`	Tenant agent pods, MCP servers, waypoint proxies
gateway	t3.medium (1-2)	`workload=gateway:NoSchedule`	AgentGateway ingress proxy (NLB-backed)

Kyverno policies automatically inject the correct nodeSelector and tolerations — platform operators and tenants don't need to specify them. The Cluster Autoscaler monitors all three ASGs and adds/removes nodes based on pending pod scheduling pressure (scale-down threshold: 50% utilization, 10-minute cooldown).
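The taints in the table translate into pod tolerations roughly as follows. This is a minimal sketch, not the platform's Kyverno policy: `toleration_for` is a hypothetical helper, and the matching nodeSelector label is left out because this README does not name it.

```shell
#!/bin/sh
# Sketch: print the toleration a pod needs to land on a given node group.
# Taint key, values, and effect come from the Node Groups table above;
# toleration_for is a hypothetical name, not part of the repository.
set -eu

toleration_for() {
  group="$1"
  case "$group" in
    platform)
      # Untainted group: no toleration required.
      echo "tolerations: []"
      ;;
    agents|gateway)
      cat <<EOF
tolerations:
  - key: workload
    operator: Equal
    value: ${group}
    effect: NoSchedule
EOF
      ;;
    *)
      echo "unknown node group: ${group}" >&2
      return 1
      ;;
  esac
}

toleration_for agents
```

Comparing this output with the `tolerations` field of a running tenant pod is one way to check that the scheduling policy was applied.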

Multi-Tenancy

Tenant isolation is enforced in depth across five layers:

Namespace — Each tenant gets a dedicated namespace with resource quotas and limit ranges
RBAC — tenant-agent-developer ClusterRole scoped per-namespace via RoleBinding
NetworkPolicy — Deny-by-default at the CNI layer; explicit allow for DNS, OTEL, gateway, and external HTTPS
Istio AuthorizationPolicy — L4 isolation via SPIFFE identity through ztunnel (independent of NetworkPolicy)
Per-Tenant Waypoint — Each namespace gets its own AgentGateway waypoint for L7 policy (prompt guards, credential injection)

Tenants can:

Deploy Agents, MCPServers, and Sandboxes in their namespace
Bring their own LLM keys (BYOK pattern via AgentgatewayBackend)
Expose agents via annotation — Kyverno auto-generates HTTPRoutes
Call agents in other namespaces through the central gateway

Directory Structure

.
├── cluster/          # EKS cluster definition (eksctl) and IAM policies
├── platform/         # Helmfile, Helm values, Kubernetes manifests, and custom images
│   ├── helmfile.yaml       # Orchestrates ~12 Helm releases in phased order
│   ├── environments/       # Per-environment config (dev, defaults)
│   ├── images/             # Custom container images
│   ├── manifests/          # Post-install K8s manifests (policies, routes, RBAC)
│   └── values/             # Helm chart value overrides
├── scripts/          # Numbered deployment pipeline (00-05) and utilities
├── tenants/          # Tenant templates, examples, and onboarded tenant configs
│   ├── _template/          # Boilerplate YAML for tenant onboarding
│   ├── examples/           # Sample agent, MCP server, sandbox, and backend configs
│   └── onboarded/          # Live tenant directories (alpha, beta, test-a2a)
├── tests/            # Kyverno unit tests and Chainsaw e2e integration tests
│   ├── kyverno/            # 5 policy test suites
│   └── e2e/                # 2 Chainsaw test suites (passthrough, egress)
└── vendor/           # Patched forks of agentgateway, kagent, and evermemos

See individual directory READMEs for detailed documentation.

Getting Started

Prerequisites

AWS CLI configured with appropriate permissions
eksctl, kubectl, helm, helmfile installed
kyverno CLI (for running policy tests)
chainsaw (for e2e tests)
An Anthropic API key
An OpenRouter API key (for EverMemOS LLM)
A DeepInfra API key (for EverMemOS embedding/reranking)

Deployment Pipeline

The scripts in scripts/ are numbered in execution order:

# 1. Create the EKS cluster (15-20 min)
./scripts/00-create-cluster.sh

# 2. Provision AWS resources (RDS PostgreSQL, ElastiCache Redis, S3)
./scripts/01-create-aws-resources.sh

# 3. Create Kubernetes secrets from AWS outputs
./scripts/02-create-secrets.sh

# 4. Deploy all platform services via Helmfile
./scripts/03-deploy-platform.sh

# 5. Apply post-install manifests (Kyverno policies, Grafana MCP, Agent Sandbox CRDs)
./scripts/04-post-install.sh

# 6. Onboard a tenant
./scripts/05-onboard-tenant.sh alpha
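
The six steps can also be chained so that a failure stops the run before later steps execute. The wrapper below is a sketch under assumptions: `run_step`, `run_pipeline`, and the `DRY_RUN` variable are hypothetical and do not exist in the repository; only the script names are taken from the list above.

```shell
#!/bin/sh
# Sketch: run the numbered pipeline scripts in order, stopping at the first failure.
# run_step, run_pipeline, and DRY_RUN are hypothetical names.
set -eu

# Print the command, then execute it unless DRY_RUN=1.
run_step() {
  echo "==> $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@" || { echo "step failed: $*" >&2; return 1; }
  fi
}

# usage: run_pipeline <tenant-name>
run_pipeline() {
  tenant="$1"
  for step in 00-create-cluster.sh 01-create-aws-resources.sh \
              02-create-secrets.sh 03-deploy-platform.sh 04-post-install.sh; do
    run_step "./scripts/${step}" || return 1
  done
  run_step ./scripts/05-onboard-tenant.sh "${tenant}" || return 1
}

# Dry run: only prints the commands that would be executed.
DRY_RUN=1 run_pipeline alpha
```

Dropping `DRY_RUN=1` executes the scripts for real, so the dry run is a cheap way to confirm the order first.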

Accessing Platform UIs

./scripts/port-forward.sh

Service	URL
kagent UI	http://localhost:15000
Langfuse	http://localhost:15001
Grafana	http://localhost:15002
AgentGateway	http://localhost:15003
Keycloak	http://localhost:15004
Kiali	http://localhost:15005

Quick Reapply

After editing manifests, reapply without a full redeploy:

./scripts/apply.sh              # Full reapply (Helmfile + manifests)
./scripts/apply.sh --skip-helm  # Manifests only

Routing

All agent, LLM, and tool traffic flows through the AgentGateway proxy:

Path Pattern	Target	Protocol
`/a2a/{namespace}/{agent-name}`	Agent pods	A2A (Google Agent-to-Agent)
`/llm/{namespace}/{model}`	LLM backends (Anthropic, OpenAI, etc.)	HTTP (with credential injection)
`/mcp/{namespace}/{server}`	MCP tool servers	MCP (Model Context Protocol)
`/sandbox/*`	Dynamic sandbox router	HTTP (header-based routing)

Routes are auto-generated by Kyverno when resources are annotated with platform.agentic.io/expose: "true". The optional platform.agentic.io/expose-alias annotation overrides the namespace segment in the path.
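
To make the path convention concrete, the sketch below assembles gateway paths from the table. `route_path` is a hypothetical helper, the resource names are invented for illustration, and it assumes the alias replaces the namespace segment in the same way for all three prefixes. The `/sandbox/*` route is left out because it is header-based.

```shell
#!/bin/sh
# Sketch: build gateway paths following the Routing table above.
# route_path and the example names are hypothetical.
set -eu

# usage: route_path <a2a|llm|mcp> <namespace> <name> [alias]
# The optional alias stands in for the namespace segment, mirroring the
# platform.agentic.io/expose-alias annotation.
route_path() {
  kind="$1"; namespace="$2"; name="$3"; seg_alias="${4:-}"
  case "$kind" in
    a2a|llm|mcp) ;;
    *) echo "unknown route kind: $kind" >&2; return 1 ;;
  esac
  printf '/%s/%s/%s\n' "$kind" "${seg_alias:-$namespace}" "$name"
}

route_path a2a alpha research-agent        # prints /a2a/alpha/research-agent
route_path mcp alpha search-tools team-a   # prints /mcp/team-a/search-tools
```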

Observability

Agent interactions are traced end-to-end (see architecture.drawio.svg in the repository root). Langfuse captures prompt/completion pairs, token usage, latency, and cost. Traces follow W3C traceparent headers across agent-to-agent calls; kagent instruments both httpx and aiohttp via OpenTelemetry for context propagation.
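
For reference, a traceparent header has four dash-separated hex fields: version, trace ID, parent ID, and flags. The sketch below builds and checks one. The helper names are hypothetical, and it accepts only version 00 as defined by the W3C Trace Context specification; it does not reflect how kagent or AgentGateway handle the header internally.

```shell
#!/bin/sh
# Sketch: build and validate a W3C traceparent header (version 00).
# Function names are hypothetical; the layout follows the W3C Trace Context spec.
set -eu

# Print N random bytes as lowercase hex.
rand_hex() {
  od -An -N"$1" -tx1 /dev/urandom | tr -d ' \n'
}

# Layout: version(2 hex)-trace-id(32 hex)-parent-id(16 hex)-flags(2 hex)
new_traceparent() {
  printf '00-%s-%s-01\n' "$(rand_hex 16)" "$(rand_hex 8)"
}

# Succeeds only for a well-formed version-00 header with non-zero IDs.
is_valid_traceparent() {
  printf '%s\n' "$1" |
    LC_ALL=C grep -Eq '^00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$' || return 1
  case "$1" in
    00-00000000000000000000000000000000-*) return 1 ;;  # all-zero trace ID
    00-*-0000000000000000-*) return 1 ;;                # all-zero parent ID
  esac
  return 0
}

tp="$(new_traceparent)"
is_valid_traceparent "$tp" && echo "traceparent: $tp"
```

Sending a request with a known trace ID and then searching for that ID in Langfuse is one way to confirm that propagation works across an agent-to-agent hop.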

Prometheus scrapes metrics from all namespaces. Grafana dashboards are auto-discovered, and a Grafana MCP tool server is available to kagent's built-in observability agent.

Testing

# Run all tests
./tests/run-all.sh

# Unit tests only (Kyverno policy validation, no cluster required)
./tests/run-all.sh --unit-only

# Integration tests only (requires running cluster)
./tests/run-all.sh --integration-only

Kyverno Policy Tests (Unit)

platform-scheduling — Injects platform node affinity
tenant-scheduling — Injects agent node affinity
auto-expose — Generates HTTPRoutes from annotations

Chainsaw E2E Tests (Integration)

waypoint-passthrough — Verifies that intra-namespace traffic is routed through the waypoint
waypoint-egress — Tests external LLM API calls with TLS origination and credential injection

Vendored Dependencies

The vendor/ directory contains patched forks of three projects:

agentgateway (Rust) — Patched for Istio Ambient waypoint integration and HBONE protocol support. Branch: waypoint.
kagent (Go/Python/TypeScript) — Patched for trace context propagation (traceparent/tracestate). Natively supports appProtocol: kgateway.dev/a2a on agent Services and OTEL instrumentation of aiohttp/httpx.
evermemos (Python) — Patched to remove hard dependency on .env file, allowing configuration entirely via Kubernetes ConfigMap and Secret. Image built from vendor source and pushed to ECR.

All are upstream-tracking forks. Changes are scoped to features not yet merged upstream.

External Dependencies (AWS)

Service	Purpose	Configuration
RDS PostgreSQL 17	Keycloak + Langfuse database	`db.t4g.small`, encrypted, 20 GB gp3
ElastiCache Redis	Langfuse session/cache	`cache.t4g.small`, transit encryption
S3	Langfuse event/media uploads	`agentic-platform-langfuse-{env}`, AES256, no public access
EBS (gp3)	Persistent volumes for ClickHouse, Prometheus, Grafana, EverMemOS (MongoDB, Elasticsearch, Milvus, etcd, MinIO)	Provisioned via EBS CSI driver
