From 3afdb2a74ec705ae775564e740653d6c35d6f703 Mon Sep 17 00:00:00 2001 From: usize Date: Tue, 9 Jun 2026 15:53:53 -0700 Subject: [PATCH 01/10] docs: Propose AI Gateway CRDs and Controller This proposal is derived from the upstream wg-ai-gateway project -- part of Kubernetes SIG-Network. It targets Envoy AI Gateway as a proof of concept, but is meant to be extended to support additional proxies. A novel integration with SPIFFE is proposed, as a value add unique to our platform. Signed-off-by: usize --- docs/plans/ai-gateway.md | 682 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 682 insertions(+) create mode 100644 docs/plans/ai-gateway.md diff --git a/docs/plans/ai-gateway.md b/docs/plans/ai-gateway.md new file mode 100644 index 00000000..92340874 --- /dev/null +++ b/docs/plans/ai-gateway.md @@ -0,0 +1,682 @@ +# AI Gateway Policy Attachment + +## Status + +Proposal + +## Summary + +Add AI Gateway capabilities to the Kagenti operator using the Gateway API +policy attachment pattern. Users create a standard `Gateway` resource +(managed by Envoy Gateway), then attach Kagenti policy CRDs to control +LLM routing and access control. Each policy generates the downstream +Envoy AI Gateway resources needed to implement the declared intent. + +Two CRDs cover the initial scope: + +- **AIRoutingPolicy** — providers, models, per-model failover, + credentials, and per-model token rate limiting. +- **AIAccessPolicy** — gateway-level mTLS using SPIFFE trust bundles. + +Providers and models are separate concepts within AIRoutingPolicy. +Providers define shared connection configuration (endpoint, schema, +credentials). Models define what clients request, which provider +backends serve them, and failover order — similar to +[LiteLLM's model group pattern][LiteLLM Router]. Failover happens +within a model's backend list, never across unrelated models. + +`AIAccessPolicy` is a novel feature intended to leverage our platform's +deep SPIFFE integration to provide tokenless inference access and governance to +agent workloads. + +## Background + +### Gateway API + +[Gateway API] is the Kubernetes-native standard for configuring network +gateways. Because [Envoy Gateway] is an implementation that manages Envoy proxy +data planes for Gateway API resources we are targeting it as our initial dataplane. + +The Envoy [AI Gateway extension] adds AI-aware capabilities on top of +Envoy Gateway: protocol translation between LLM schemas, model-based +routing, credential injection, and token accounting. It defines its own +CRDs (`AIGatewayRoute`, `AIServiceBackend`, `BackendSecurityPolicy`) +that the extension server translates into xDS configuration. + +These CRDs are powerful but low-level. A user wanting to route to Ollama +with mTLS and token rate limiting needs to create and coordinate six or +more resources. Our policy CRDs provide a higher-level abstraction. + +As additional proxies are supported, we will likewise translate as needed +in order to program them. + +### WG AI Gateway + +The Kubernetes [WG AI Gateway] is defining standards for AI-aware +networking in Gateway API. Two proposals are directly relevant: + +- [Proposal 10: Egress Gateways] — introduces a `Backend` resource + (`gateway.networking.k8s.io/v1alpha1`) as a first-class representation + of external destinations with inline TLS, credential injection + extensions, and MCP protocol support. Defines three-tier policy scoping + (Route > Backend > Gateway) with oldest-wins conflict resolution. + When this proposal matures, it could collapse the per-provider + Backend + AIServiceBackend + BackendSecurityPolicy into a single + resource, reducing the generated resource count without changing the + user-facing API. + +- [Proposal 7: Payload Processing] — introduces + `PayloadProcessingPipeline` for ordered, sequential body/header + processors (prompt validation, PII redaction, semantic routing) as + HTTPRoute filters. This is the emerging standard for guardrails and + is the intended generation target for per-provider processing + pipelines (see [Future: guardrails](#future-guardrails)). + +Our design generates the Envoy AI Gateway CRDs that exist today, but is +structured so that when the WG proposals mature into accepted APIs, we +can adopt them as generation targets without changing the user-facing +policy CRDs. Translation from policy intent to data-plane resources is +isolated in a renderer package to support this migration. + +[Gateway API]: https://gateway-api.sigs.k8s.io/ +[Envoy Gateway]: https://gateway.envoyproxy.io/ +[AI Gateway extension]: https://aigateway.envoyproxy.io/ +[WG AI Gateway]: https://github.com/kubernetes-sigs/wg-ai-gateway +[Proposal 10: Egress Gateways]: https://github.com/kubernetes-sigs/wg-ai-gateway/blob/main/proposals/10-egress-gateways.md +[Proposal 7: Payload Processing]: https://github.com/kubernetes-sigs/wg-ai-gateway/blob/main/proposals/7-payload-processing.md + +## Architecture + +``` +User creates Controller generates +──────────── ──────────────────── + +┌──────────────┐ +│ Gateway │ ◄── Envoy Gateway manages the Envoy proxy +│ class: eg │ +└──────┬───────┘ + │ targetRef + │ + ┌─────┴──────────────────────────────────────────────┐ + │ │ + │ ┌────────────────┐ ┌───────────────┐ │ + │ │AIRoutingPolicy │ │AIAccessPolicy │ │ + │ │ (providers, │ │ (mTLS) │ │ + │ │ models, │ │ │ │ + │ │ rate limits) │ │ │ │ + │ └───────┬────────┘ └──────┬────────┘ │ + │ │ │ │ + │ ▼ ▼ │ + │ Backend CA Secret │ + │ AIServiceBackend Server cert │ + │ AIGatewayRoute ClientTrafficPolicy │ + │ BackendSecurityPolicy │ + │ BackendTrafficPolicy │ + │ (retry + rate limits) │ + └─────────────────────────────────────────────────────┘ +``` + +Two controllers watch the same Gateway for different concerns: + +- **Envoy Gateway** — reconciles the Gateway, deploys the Envoy proxy, + processes BackendTrafficPolicy and ClientTrafficPolicy +- **Kagenti operator** — reconciles the two AI policy CRDs and generates + downstream resources + +No conflict: our controller never modifies the Gateway. It creates +sibling resources that reference it. The two Kagenti policy controllers +are independent — they generate disjoint sets of resources and never +write to each other's objects. + +## API group + +``` +aigateway.kagenti.dev/v1alpha1 +``` + +Separate from the operator's agent CRDs (`agent.kagenti.dev`) and from +Envoy AI Gateway's CRDs (`aigateway.envoyproxy.io`). + +## CRDs + +### AIRoutingPolicy + +The core policy — without it, the Gateway has no AI routing. + +Providers and models are separate concepts: + +- **Providers** define shared connection configuration: endpoint, API + schema, and credentials. A provider is referenced by name from model + backend entries. Each provider generates one Backend, one + AIServiceBackend, and (if credentials are set) one + BackendSecurityPolicy. + +- **Models** define the client-facing routing unit. Each model has a + `name` (the virtual name clients request), a list of `backends` + (provider references with the actual model name and failover + priority), and optional rate limiting. Each model generates one rule + in the AIGatewayRoute. + +A single controller owns all generated resources, eliminating +cross-controller write conflicts. + +```yaml +apiVersion: aigateway.kagenti.dev/v1alpha1 +kind: AIRoutingPolicy +metadata: + name: my-routing + namespace: team1 +spec: + targetRef: + group: gateway.networking.k8s.io + kind: Gateway + name: ai-gateway + + # Providers: shared endpoint + credential definitions + providers: + - name: ollama-local + endpoint: http://ollama.team1.svc:11434 + schema: OpenAI + + - name: openai + endpoint: https://api.openai.com/v1 + schema: OpenAI + credentials: + type: APIKey + secretRef: + name: openai-secret + key: api-key + + - name: azure-openai + endpoint: https://my-resource.openai.azure.com + schema: OpenAI + credentials: + type: AzureCredentials + tenantId: "..." + clientId: "..." + clientSecretRef: + name: azure-secret + key: client-secret + + - name: bedrock + endpoint: https://bedrock-runtime.us-east-1.amazonaws.com + schema: AWSBedrock + credentials: + type: AWSCredentials + region: us-east-1 + + # Models: what clients request, with per-model failover + models: + - name: gpt-4o # virtual name (client-facing) + backends: + - provider: openai + model: gpt-4o # actual model name — always explicit + priority: 0 + - provider: azure-openai + model: gpt-4o-2024-05-13 # Azure uses a different name + priority: 1 + rateLimit: + tokensPerHour: 100000 + tokenCountMode: TotalToken # InputToken | OutputToken | TotalToken + failover: + retryOn: [502, 503, 429] + maxRetries: 2 + + - name: gpt-4o-mini + backends: + - provider: openai + model: gpt-4o-mini + priority: 0 + rateLimit: + tokensPerHour: 1000000 + + - name: qwen2.5:3b + backends: + - provider: ollama-local + model: qwen2.5:3b # single backend, no failover + + - name: claude-sonnet + backends: + - provider: bedrock + model: anthropic.claude-sonnet-4-20250514-v1:0 + priority: 0 + rateLimit: + tokensPerHour: 100000 +``` + +**What it generates:** + +| Generated resource | API group | Count | +|----|----|----| +| Backend | gateway.envoyproxy.io | one per provider | +| AIServiceBackend | aigateway.envoyproxy.io | one per provider | +| BackendSecurityPolicy | aigateway.envoyproxy.io | one per provider with credentials | +| AIGatewayRoute | aigateway.envoyproxy.io | one (one rule per model + llmRequestCosts) | +| BackendTrafficPolicy | gateway.envoyproxy.io | one (per-model failover + rate limit rules) | + +**Provider credential types:** + +| `type` | Generated BackendSecurityPolicy | Use case | +|--------|--------------------------------|----------| +| `APIKey` | `spec.apiKey.secretRef` | OpenAI, Anthropic, generic | +| `AWSCredentials` | `spec.awsCredentials` with SigV4 | AWS Bedrock | +| `AzureCredentials` | `spec.azureCredentials` | Azure OpenAI | +| `GCPCredentials` | `spec.gcpCredentials` | Vertex AI | + +Protocol translation between schemas (e.g., OpenAI-format request routed +to an Anthropic backend) is handled automatically by the AI Gateway +extension server based on the provider's `schema` field. No code in our +controller. + +**Failover behavior:** + +Failover is per model. Each model's `backends` list defines the failover +order via the `priority` field (lowest = highest priority). If the +primary backend fails (matching `retryOn` conditions), Envoy retries at +the next priority level. Within the same priority, `weight` (default 1) +distributes traffic for active-active scenarios. + +Each model can specify its own `failover` configuration (retryOn, +maxRetries, backoff). A top-level `defaultFailover` field can set +defaults to avoid repetition. Models with a single backend need no +failover configuration. + +In the generated AIGatewayRoute, each model becomes one rule matching +the `x-ai-eg-model` header, with multiple `backendRefs` when the model +has multiple backends. + +**Rate limiting:** + +Rate limits are defined per model. Each model with a `rateLimit` field +generates one rate limit rule in the BackendTrafficPolicy, matched by +the `x-ai-eg-model` header that the AI Gateway extension sets during +routing. This maps directly to Envoy AI Gateway's rate limiting +mechanism — one rule, one model header selector, one Redis counter — +with no ambiguity in descriptor grouping. + +A model with no `rateLimit` field has no token quota enforced. The +`tokenCountMode` defaults to `TotalToken` and can be set to +`InputToken` or `OutputToken` for finer-grained accounting. + +When any model specifies a `rateLimit`, the controller adds +`llmRequestCosts` entries to the AIGatewayRoute (telling the AI Gateway +extension to extract token counts into Envoy dynamic metadata under the +`io.envoy.ai_gateway` namespace) and rate limit rules with +`cost.response.from: Metadata` to the BackendTrafficPolicy. Both +resources are already owned by this controller for routing and failover, +so no coordination is needed. + +Requires Envoy Gateway's global rate limit service (Redis-backed) for +cross-instance quota enforcement. + +Per-user or per-tenant rate limiting (multi-tenant cost allocation) is +a separate concern. If needed, a future policy CRD could attach to the +Gateway to layer per-client quotas on top of per-model limits. + +### AIAccessPolicy + +Controls which clients can reach the Gateway. Phase 1 implements mTLS +with SPIFFE trust bundles. Future phases could add JWT validation or +API key authentication. + +This is a gateway-level concern — it applies uniformly to all traffic +entering the gateway, independent of which provider handles the request. + +```yaml +apiVersion: aigateway.kagenti.dev/v1alpha1 +kind: AIAccessPolicy +metadata: + name: my-access + namespace: team1 +spec: + targetRef: + group: gateway.networking.k8s.io + kind: Gateway + name: ai-gateway + + mtls: + trustDomain: localtest.me + trustBundleConfigMap: + name: spire-bundle + namespace: spire-system + key: bundle.spiffe # default + serverCertRef: # optional — auto-generated if omitted + name: my-server-cert +``` + +**What it generates:** + +| Generated resource | Purpose | +|----|-----| +| Secret `-mtls-ca` | PEM-encoded CA certs extracted from the SPIFFE JSON trust bundle | +| Secret `-mtls-server` | Self-signed ECDSA server cert (only if `serverCertRef` is omitted) | +| ClientTrafficPolicy | Requires client certs, validated against the CA Secret | + +The controller reads the SPIRE trust bundle ConfigMap (which contains a +SPIFFE-format JWK set), extracts the `x509-svid` certificates, converts +them to PEM, and writes a Secret. It skips expired certificates (SPIRE +bundles retain rotated CAs that Envoy rejects). + +When an AIAccessPolicy targets a Gateway, the controller also ensures +the Gateway's listeners use HTTPS with TLS termination. If the user's +AIRoutingPolicy specifies HTTP listeners, the AIAccessPolicy overrides +them to HTTPS — access policy takes precedence over routing config for +the listener protocol. + +### Future: guardrails + +Content filtering, prompt injection detection, and PII redaction are +per-provider concerns — trust boundaries differ between a local Ollama +instance and an external API endpoint. The natural home for guardrails +is inline on the provider definition, as an ordered processing pipeline: + +```yaml +providers: +- name: openai + endpoint: https://api.openai.com/v1 + schema: OpenAI + credentials: ... + processing: # future — not in initial scope + - name: pii-redaction + phase: request + extProc: + serviceRef: + name: pii-service + port: 50051 + - name: toxicity-check + phase: response + extProc: + serviceRef: + name: toxicity-service + port: 50051 + +- name: ollama + endpoint: http://ollama.svc:11434 + schema: OpenAI + # no processing — trusted local endpoint +``` + +The `processing` list is ordered, giving pipeline semantics. The +controller would generate per-route ext-proc filter chains so that +different providers get different processing. + +The WG AI Gateway's [Proposal 7: Payload Processing] defines a +`PayloadProcessingPipeline` CRD for ordered body/header processors as +HTTPRoute filters. When that proposal matures into an accepted API, the +`processing` field should generate `PayloadProcessingPipeline` resources +rather than raw ext-proc configuration. This follows the same principle +as routing: the user-facing spec stays stable while the generated +resources change as standards evolve. + +Gateway-wide guardrails (compliance requirements that apply regardless +of provider) are a separate concern. If needed, a future +`AIGuardrailsPolicy` could attach to the Gateway using the same policy +attachment pattern as AIAccessPolicy. That would be a separate design +proposal. + +Not in scope for the initial implementation. + +## Example: complete deployment + +A platform admin sets up a Gateway with mTLS. A team lead configures +the LLM providers and models with per-model failover and rate limits. + +```yaml +# Platform admin creates the Gateway +apiVersion: gateway.networking.k8s.io/v1 +kind: Gateway +metadata: + name: ai-gateway + namespace: team1 +spec: + gatewayClassName: eg + listeners: + - name: https + port: 8443 + protocol: HTTPS +--- +# Platform admin sets access policy +apiVersion: aigateway.kagenti.dev/v1alpha1 +kind: AIAccessPolicy +metadata: + name: ai-gateway-access + namespace: team1 +spec: + targetRef: + group: gateway.networking.k8s.io + kind: Gateway + name: ai-gateway + mtls: + trustDomain: localtest.me + trustBundleConfigMap: + name: spire-bundle + namespace: spire-system +--- +# Team lead configures providers and models +apiVersion: aigateway.kagenti.dev/v1alpha1 +kind: AIRoutingPolicy +metadata: + name: ai-gateway-routing + namespace: team1 +spec: + targetRef: + group: gateway.networking.k8s.io + kind: Gateway + name: ai-gateway + + providers: + - name: ollama + endpoint: http://ollama.team1.svc:11434 + schema: OpenAI + - name: openai + endpoint: https://api.openai.com/v1 + schema: OpenAI + credentials: + type: APIKey + secretRef: + name: openai-secret + key: api-key + - name: azure-openai + endpoint: https://my-resource.openai.azure.com + schema: OpenAI + credentials: + type: AzureCredentials + tenantId: "..." + clientId: "..." + clientSecretRef: + name: azure-secret + key: client-secret + + models: + - name: gpt-4o + backends: + - provider: openai + model: gpt-4o + priority: 0 + - provider: azure-openai + model: gpt-4o-2024-05-13 + priority: 1 + rateLimit: + tokensPerHour: 100000 + failover: + retryOn: [502, 503, connect-failure] + maxRetries: 2 + - name: qwen2.5:3b + backends: + - provider: ollama + model: qwen2.5:3b +``` + +## Reconciliation + +Each policy has its own controller. They run independently, generate +disjoint resource sets, and can be applied in any order. A missing +policy simply means that capability isn't configured. + +``` +AIRoutingPolicyReconciler + ├── parse provider endpoints (URL → host + port) + ├── for each provider: + │ ├── create/update Backend (gateway.envoyproxy.io) + │ ├── create/update AIServiceBackend (aigateway.envoyproxy.io) + │ └── create/update BackendSecurityPolicy (if credentials set) + ├── create/update AIGatewayRoute + │ ├── one rule per model (x-ai-eg-model match → backendRefs) + │ └── llmRequestCosts (if any model has rateLimit) + ├── create/update BackendTrafficPolicy + │ ├── per-model failover/retry config + │ └── per-model rate limit rules (x-ai-eg-model header selectors) + ├── clean up orphaned resources for removed providers + └── update status conditions + +AIAccessPolicyReconciler + ├── read SPIRE trust bundle ConfigMap + ├── parse SPIFFE JSON → extract x509-svid certs → PEM + ├── create/update CA Secret + ├── create/update self-signed server cert (if no serverCertRef) + ├── create/update ClientTrafficPolicy + └── update status conditions +``` + +All generated resources carry an owner reference to their policy CR. +Deleting a policy garbage-collects its generated resources. + +## Code structure + +Translation from policy intent to data-plane-specific resources is +isolated in a renderer package. Phase 1 ships with an Envoy AI Gateway +renderer. This boundary exists so that alternative data planes (or +future WG AI Gateway standard resources) can be supported by adding a +renderer without modifying reconciliation logic. + +``` +internal/aigateway/ + intent.go # data-plane-agnostic representation of routing intent + reconciler.go # shared reconciliation orchestration + envoy/ + renderer.go # intent → Envoy AI Gateway CRDs + renderer_test.go +``` + +## RBAC + +The policy attachment model maps naturally to organizational roles: + +| Role | Creates | Why | +|------|---------|-----| +| Platform admin | Gateway, AIAccessPolicy | Controls infra and security | +| Team lead | AIRoutingPolicy (providers, models, rate limits) | Decides which providers, models, and quotas the team uses | +| Developer | nothing — calls the gateway endpoint | Consumes inference through the gateway | + +The operator service account needs: + +| API group | Resources | Verbs | +|-----------|-----------|-------| +| aigateway.kagenti.dev | airoutingpolicies, aiaccesspolicies + /status + /finalizers | all | +| gateway.networking.k8s.io | gateways | get, list, watch | +| gateway.envoyproxy.io | backends, clienttrafficpolicies, backendtrafficpolicies | all | +| aigateway.envoyproxy.io | aiservicebackends, aigatewayroutes, backendsecuritypolicies | all | +| (core) | secrets | all | +| (core) | configmaps | get, list, watch | + +## Infrastructure prerequisites + +The Ansible installer deploys these before the operator runs: + +| Component | Version | Helm chart | Purpose | +|-----------|---------|------------|---------| +| Envoy Gateway | v1.7.0 | `oci://docker.io/envoyproxy/gateway-helm` | Data plane management, must have `extensionManager` configured | +| AI Gateway controller | v0.6.0 | `oci://docker.io/envoyproxy/ai-gateway-helm` | Extension server for AIGatewayRoute/AIServiceBackend | +| AI Gateway CRDs | v0.6.0 | `oci://docker.io/envoyproxy/ai-gateway-crds-helm` | CRD definitions | +| GatewayClass `eg` | — | kubectl apply | Links Gateways to Envoy Gateway's controller | +| SPIRE | — | spire-crds + spire charts | Trust bundle for mTLS (optional) | + +Envoy Gateway must be configured with the `extensionManager` pointing +to the AI Gateway controller's gRPC service: + +```yaml +config: + envoyGateway: + extensionApis: + enableBackend: true + extensionManager: + hooks: + xdsTranslator: + post: [Translation, Cluster, Route] + service: + fqdn: + hostname: ai-gateway-controller..svc.cluster.local + port: 1063 +``` + +## Implementation plan + +### Phase 1: AIRoutingPolicy + +The minimum viable feature. A user creates a Gateway and an +AIRoutingPolicy; the controller generates the Envoy AI Gateway +resources to make inference routing work. Per-model rate limits and +per-model failover are included from the start since they share +generated resources with routing (AIGatewayRoute and +BackendTrafficPolicy) and map directly to the Envoy data plane model. + +Files: +- `api/aigateway/v1alpha1/types.go` — shared types (targetRef, provider, model, backend, credentials, rateLimit) +- `api/aigateway/v1alpha1/airoutingpolicy_types.go` +- `internal/aigateway/intent.go` — data-plane-agnostic routing intent +- `internal/aigateway/envoy/renderer.go` — Envoy AI Gateway renderer +- `internal/aigateway/envoy/renderer_test.go` +- `internal/controller/airoutingpolicy_controller.go` +- `internal/controller/airoutingpolicy_controller_test.go` + +### Phase 2: AIAccessPolicy + +mTLS enforcement using SPIRE trust bundles. Adds the gateway-level +access control layer on top of routing. + +Files: +- `api/aigateway/v1alpha1/aiaccesspolicy_types.go` +- `internal/controller/aiaccesspolicy_controller.go` +- `internal/spiffe/bundle.go` — SPIFFE JSON → PEM conversion + +### Future: AIProvider extraction + +If the provider spec grows significantly (credentials, processing +pipelines, additional metadata), it may warrant extraction into its own +CRD (`AIProvider`) referenced by name from AIRoutingPolicy. This would +enable provider reuse across gateways and finer-grained RBAC. The +internal `RoutingIntent` model should be structured so that this +extraction is a mechanical refactor. + +### Future: guardrails + +Per-provider guardrails via the `processing` field (see +[Future: guardrails](#future-guardrails) above). Depends on the +WG AI Gateway [Proposal 7: Payload Processing] maturing. Gateway-wide +guardrails would be a separate policy CRD and a separate design +proposal. + +## Compatibility notes + +- AI Gateway v0.6.0 requires `AIServiceBackend.backendRef` to point to + an Envoy Gateway `Backend` resource, not a raw Service. + See: https://github.com/envoyproxy/ai-gateway/issues/902 + +- `AIGatewayRoute.backendRefs` must omit `group` and `kind` to default + to `AIServiceBackend`. Explicit values are rejected unless they + specify `InferencePool`. + +- The AI Gateway CRDs emit deprecation warnings for `v1alpha1` in favor + of `v1beta1`. The controller should target `v1alpha1` initially and + migrate when the beta API stabilizes. + +## References + +- [Gateway API Policy Attachment (GEP-713)](https://gateway-api.sigs.k8s.io/geps/gep-713/) +- [WG AI Gateway](https://github.com/kubernetes-sigs/wg-ai-gateway) +- [WG AI Gateway Charter](https://github.com/kubernetes/community/blob/master/wg-ai-gateway/charter.md) +- [Proposal 10: Egress Gateways](https://github.com/kubernetes-sigs/wg-ai-gateway/blob/main/proposals/10-egress-gateways.md) +- [Proposal 7: Payload Processing](https://github.com/kubernetes-sigs/wg-ai-gateway/blob/main/proposals/7-payload-processing.md) +- [LiteLLM Router](https://docs.litellm.ai/docs/routing) +- [Envoy AI Gateway v0.6.0](https://aigateway.envoyproxy.io/) +- [Envoy Gateway extensionManager](https://gateway.envoyproxy.io/docs/tasks/extensibility/extension-server/) From 0132f91420a27a7c9edd03c449206c03d496520d Mon Sep 17 00:00:00 2001 From: usize Date: Tue, 9 Jun 2026 15:59:39 -0700 Subject: [PATCH 02/10] The title doesn't need to mention policy attachment patterns. Signed-off-by: usize --- docs/plans/ai-gateway.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/plans/ai-gateway.md b/docs/plans/ai-gateway.md index 92340874..ebeb6a0f 100644 --- a/docs/plans/ai-gateway.md +++ b/docs/plans/ai-gateway.md @@ -1,4 +1,4 @@ -# AI Gateway Policy Attachment +# AI Gateway Operator ## Status From b8df5ff950ec8a560b15c0cc202f8f045756d256 Mon Sep 17 00:00:00 2001 From: usize Date: Thu, 11 Jun 2026 16:20:46 -0700 Subject: [PATCH 03/10] Enumerate status conditions for resources. Signed-off-by: usize --- docs/plans/ai-gateway.md | 109 ++++++++++++++++++++++++++++++++++++--- 1 file changed, 103 insertions(+), 6 deletions(-) diff --git a/docs/plans/ai-gateway.md b/docs/plans/ai-gateway.md index ebeb6a0f..efa2abe9 100644 --- a/docs/plans/ai-gateway.md +++ b/docs/plans/ai-gateway.md @@ -363,6 +363,95 @@ AIRoutingPolicy specifies HTTP listeners, the AIAccessPolicy overrides them to HTTPS — access policy takes precedence over routing config for the listener protocol. +### Status + +Both CRDs report their state through standard `metav1.Condition` +entries plus per-component status maps. The conditions give aggregate +signals (suitable for `kubectl wait --for=condition=...`); the maps +give per-provider or per-component detail for debugging. + +#### AIRoutingPolicy + +```yaml +status: + conditions: + - type: Accepted # spec is syntactically valid + status: "True" + reason: Valid + - type: GatewayBound # targetRef Gateway exists + status: "True" + reason: Bound + - type: ProvidersConfigured # all provider resources created + status: "False" + reason: PartialFailure + message: "2/3 providers configured" + - type: RoutingActive # AIGatewayRoute + BTP created and accepted + status: "True" + reason: Applied + + providers: + - name: ollama + ready: true + - name: openai + ready: true + - name: bedrock + ready: false + error: "invalid endpoint URL: missing scheme" + + models: + - name: gpt-4o + ready: true + - name: qwen2.5:3b + ready: true +``` + +| Condition | True when | False reasons | +|-----------|-----------|---------------| +| `Accepted` | Spec passes validation (endpoints parse, names unique) | `InvalidSpec` | +| `GatewayBound` | Target Gateway exists in namespace | `GatewayNotFound` | +| `ProvidersConfigured` | All Backend + AIServiceBackend + BSP resources created | `PartialFailure`, `ApplyFailed`, `CredentialSecretNotFound` | +| `RoutingActive` | AIGatewayRoute + BackendTrafficPolicy created and accepted | `ApplyFailed` | + +The `status.providers[]` list mirrors `spec.providers[]` and reports +per-provider readiness. When `ProvidersConfigured` is False, the +provider entries show exactly which providers failed and why — +eliminating guesswork in multi-provider configurations. + +The `status.models[]` list tracks per-model route rule creation. In +the common case all models are ready; a model becomes not-ready if its +referenced provider is missing from the spec. + +#### AIAccessPolicy + +```yaml +status: + conditions: + - type: Accepted + status: "True" + reason: Valid + - type: GatewayBound + status: "True" + reason: Bound + - type: BundleReady # trust bundle parsed with ≥1 valid cert + status: "True" + reason: Loaded + message: "2 certificates from SPIFFE JSON bundle" + - type: MTLSActive # CA Secret + server cert + CTP created + status: "True" + reason: Applied +``` + +| Condition | True when | False reasons | +|-----------|-----------|---------------| +| `Accepted` | Spec passes validation | `InvalidSpec` | +| `GatewayBound` | Target Gateway exists in namespace | `GatewayNotFound` | +| `BundleReady` | Trust bundle ConfigMap read and parsed with ≥1 valid cert | `BundleNotFound`, `BundleEmpty`, `BundleParseError` | +| `MTLSActive` | CA Secret, server cert, and ClientTrafficPolicy created | `ApplyFailed`, `CertGenerationFailed` | + +`BundleReady` is re-evaluated on every requeue (default 5 minutes), +so it reflects trust bundle rotations. The message includes the +certificate count for visibility into how many CAs are trusted. + ### Future: guardrails Content filtering, prompt injection detection, and PII redaction are @@ -515,27 +604,35 @@ policy simply means that capability isn't configured. ``` AIRoutingPolicyReconciler + ├── validate spec → set Accepted condition + ├── verify target Gateway exists → set GatewayBound condition ├── parse provider endpoints (URL → host + port) ├── for each provider: │ ├── create/update Backend (gateway.envoyproxy.io) │ ├── create/update AIServiceBackend (aigateway.envoyproxy.io) - │ └── create/update BackendSecurityPolicy (if credentials set) + │ ├── create/update BackendSecurityPolicy (if credentials set) + │ └── record result in status.providers[] + ├── set ProvidersConfigured condition (aggregate) ├── create/update AIGatewayRoute │ ├── one rule per model (x-ai-eg-model match → backendRefs) - │ └── llmRequestCosts (if any model has rateLimit) + │ ├── llmRequestCosts (if any model has rateLimit) + │ └── record result in status.models[] ├── create/update BackendTrafficPolicy │ ├── per-model failover/retry config - │ └── per-model rate limit rules (x-ai-eg-model header selectors) - ├── clean up orphaned resources for removed providers - └── update status conditions + │ └── per-model rate limit rules + ├── set RoutingActive condition + └── clean up orphaned resources for removed providers AIAccessPolicyReconciler + ├── validate spec → set Accepted condition + ├── verify target Gateway exists → set GatewayBound condition ├── read SPIRE trust bundle ConfigMap ├── parse SPIFFE JSON → extract x509-svid certs → PEM + ├── set BundleReady condition (with cert count) ├── create/update CA Secret ├── create/update self-signed server cert (if no serverCertRef) ├── create/update ClientTrafficPolicy - └── update status conditions + └── set MTLSActive condition ``` All generated resources carry an owner reference to their policy CR. From c519bdb0eb2414706e695a8b7e6302fb56cc1b2f Mon Sep 17 00:00:00 2001 From: usize Date: Thu, 11 Jun 2026 16:22:59 -0700 Subject: [PATCH 04/10] Describe client integration as out of scope. Signed-off-by: usize --- docs/plans/ai-gateway.md | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/docs/plans/ai-gateway.md b/docs/plans/ai-gateway.md index efa2abe9..2f128074 100644 --- a/docs/plans/ai-gateway.md +++ b/docs/plans/ai-gateway.md @@ -596,6 +596,37 @@ spec: model: qwen2.5:3b ``` +### Client integration + +This proposal covers the **gateway side**: programming the proxy to +require mTLS, route to LLM backends, and enforce access control. The +gateway validates client certificates against the SPIFFE trust bundle +CA and either accepts or rejects the TLS handshake. It does not care +how the client obtained its certificate. + +From the client's perspective, consuming the gateway requires two +things: + +1. **Endpoint** — the gateway's in-cluster Service address + (e.g. `https://..svc:8443`) +2. **Client certificate** — a valid X.509 SVID from the same trust + domain, presented during the TLS handshake + +Any workload with a SPIFFE identity can call the gateway. The agent +code itself doesn't change — it uses the standard OpenAI-compatible +`/v1/chat/completions` endpoint. How the workload acquires and +presents its SPIFFE certificate (CSI driver, spiffe-helper sidecar, +AuthBridge integration, or native go-spiffe) is the client-side +concern and is out of scope for this proposal. + +AuthBridge (agent-to-agent OAuth/OIDC) and the MCP Gateway (MCP +protocol routing) are orthogonal to AI Gateway (model routing and +inference access control). They target different traffic flows and +do not conflict. + +Client-side configuration for connecting workloads to AI Gateways +with SPIFFE identity may be addressed in a separate proposal. + ## Reconciliation Each policy has its own controller. They run independently, generate From 6e8e71cab4b4570f20979cc6427ecdcbc2f66ee4 Mon Sep 17 00:00:00 2001 From: usize Date: Thu, 11 Jun 2026 16:58:30 -0700 Subject: [PATCH 05/10] Defer multi-tenancy and multi-protocol support. Signed-off-by: usize --- docs/plans/ai-gateway.md | 48 +++++++++++++++++++++++++++++++++++++--- 1 file changed, 45 insertions(+), 3 deletions(-) diff --git a/docs/plans/ai-gateway.md b/docs/plans/ai-gateway.md index 2f128074..21bb2c6d 100644 --- a/docs/plans/ai-gateway.md +++ b/docs/plans/ai-gateway.md @@ -309,9 +309,14 @@ so no coordination is needed. Requires Envoy Gateway's global rate limit service (Redis-backed) for cross-instance quota enforcement. -Per-user or per-tenant rate limiting (multi-tenant cost allocation) is -a separate concern. If needed, a future policy CRD could attach to the -Gateway to layer per-client quotas on top of per-model limits. +The initial design treats the namespace as a single tenant — rate +limit counters are scoped per model, not per client. Per-user or +per-tenant rate limiting is a separate concern. Multi-tenancy could +be addressed by extracting rate limiting into a separate policy CRD +with precedence rules akin to [BackendTrafficPolicy][] — where +more specific policies override broader defaults. + +[BackendTrafficPolicy]: https://gateway.envoyproxy.io/docs/concepts/gateway_api_extensions/backend-traffic-policy/ ### AIAccessPolicy @@ -505,6 +510,43 @@ proposal. Not in scope for the initial implementation. +### Future: protocol extensibility + +The provider `schema` field declares the protocol a backend speaks. +The initial implementation supports inference schemas (OpenAI, +AWSBedrock, etc.), but this maps directly to the [Backend protocol +model][Proposal 10] emerging in WG AI Gateway — where protocol is a +property of the destination, not the route. + +The `models` list is inference-specific: it defines virtual model +names, maps them to provider backends, and configures per-model +failover and rate limits. Non-inference protocols like MCP don't have +a model selection concept. Extending AIRoutingPolicy to support MCP +providers would add protocol-appropriate routing entries alongside +`models`, driven by the provider's schema: + +```yaml +providers: +- name: ollama + endpoint: http://ollama.svc:11434 + schema: OpenAI # inference — uses models[] +- name: tool-server + endpoint: http://tools.svc:8080 + schema: MCP # MCP — no model mapping needed + +models: # inference-specific +- name: qwen2.5:3b + backends: + - provider: ollama + model: qwen2.5:3b +``` + +The renderer would generate `AIGatewayRoute` for inference providers +and `MCPRoute` for MCP providers. Credential injection, mTLS, and +rate limiting apply uniformly regardless of protocol. The exact shape +of non-inference routing entries is left to a future proposal once +the WG AI Gateway Backend specification matures. + ## Example: complete deployment A platform admin sets up a Gateway with mTLS. A team lead configures From 49ebb83cd886bc6a136b31dceec33446b49f9c29 Mon Sep 17 00:00:00 2001 From: usize Date: Thu, 11 Jun 2026 17:11:02 -0700 Subject: [PATCH 06/10] Lay the foundation for a future PayloadProcessingPipeline resource. Signed-off-by: usize --- docs/plans/ai-gateway.md | 80 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 80 insertions(+) diff --git a/docs/plans/ai-gateway.md b/docs/plans/ai-gateway.md index 21bb2c6d..41ad03f8 100644 --- a/docs/plans/ai-gateway.md +++ b/docs/plans/ai-gateway.md @@ -547,6 +547,86 @@ rate limiting apply uniformly regardless of protocol. The exact shape of non-inference routing entries is left to a future proposal once the WG AI Gateway Backend specification matures. +### Future: PayloadProcessingPipeline + +The WG AI Gateway's [Proposal 7: Payload Processing] defines +[`PayloadProcessingPipeline`][PPP] — a resource for +well-ordered payload processing policies such as semantic caching, +model selection, guardrails, PII redaction, and prompt injection +detection. These policies could apply at a per-provider or +per-model level. + +In the upstream proposal, processing pipelines attach to `Backend` +resources and `Route` resources. In this proposal, we've abstracted +networking-specific concepts behind terms of art associated with +Generative AI. `Backend` maps onto "provider" — where a provider is +a backend that communicates via an AI protocol (e.g. Responses API, +MCP, A2A). `Route` is implicit in the model-to-provider mapping. +The user never sees the generated HTTPRoute or Backend resources +directly. + +This presents a tension. Consider a user who wants PII redaction on +requests to OpenAI but not to Ollama (different trust boundaries). +They need a PayloadProcessingPipeline that targets only the OpenAI +provider's traffic. Three approaches: + +**Target by name string.** The user references a provider or model +by name within the AIRoutingPolicy: + +```yaml +kind: PayloadProcessingPipeline +spec: + targetRef: + kind: AIRoutingPolicy + name: my-routing + provider: openai # string reference into the spec +``` + +The controller looks up the AIRoutingPolicy, finds the provider +named `openai`, determines which HTTPRoute rules correspond to +models using that provider, and attaches the processing pipeline to +those rules. This works, but the controller does significant +indirection, and the targeting vocabulary grows with every new +concept (provider, model, etc.). + +**Expose providers and models as resources.** Instead of one +AIRoutingPolicy containing everything, the user creates individual +resources that map to the networking primitives: + +```yaml +kind: AIProvider # generates Backend + AIServiceBackend +spec: + name: openai + endpoint: https://api.openai.com/v1 + schema: OpenAI +--- +kind: AIModel # generates a rule in AIGatewayRoute +spec: + name: gpt-4o + providerRef: openai + model: gpt-4o +``` + +PayloadProcessingPipeline can now target `AIProvider` or `AIModel` +directly using standard `targetRef` — no string indirection. This is +more composable but the user manages many resources instead of one +for a simple setup. + +**Hybrid: compact CR with optional break-out.** AIRoutingPolicy +stays as-is for simple deployments. When a user needs per-provider +processing, they extract that provider into a standalone `AIProvider` +CR and reference it by name from the AIRoutingPolicy. The standalone +resource is targetable by PayloadProcessingPipeline. This is similar +to how Kubernetes handles inline vs referenced specs — like a pod +template inline in a Deployment vs a standalone Pod. + +The right answer likely depends on real usage patterns. This +proposal establishes the routing and access control foundation; +payload processing attachment semantics will be addressed in a +follow-up once the WG specification matures. + +[PPP]: https://github.com/kubernetes-sigs/wg-ai-gateway/blob/main/proposals/7-payload-processing.md + ## Example: complete deployment A platform admin sets up a Gateway with mTLS. A team lead configures From 46f690555bcefc21bb09235dcd41450d8f2414c3 Mon Sep 17 00:00:00 2001 From: usize Date: Thu, 11 Jun 2026 17:13:54 -0700 Subject: [PATCH 07/10] Remove superfluous guardrails section | replaced by PayloadProcessingPipeline Signed-off-by: usize --- docs/plans/ai-gateway.md | 57 +++++++--------------------------------- 1 file changed, 9 insertions(+), 48 deletions(-) diff --git a/docs/plans/ai-gateway.md b/docs/plans/ai-gateway.md index 41ad03f8..cf7a315c 100644 --- a/docs/plans/ai-gateway.md +++ b/docs/plans/ai-gateway.md @@ -459,54 +459,15 @@ certificate count for visibility into how many CAs are trusted. ### Future: guardrails -Content filtering, prompt injection detection, and PII redaction are -per-provider concerns — trust boundaries differ between a local Ollama -instance and an external API endpoint. The natural home for guardrails -is inline on the provider definition, as an ordered processing pipeline: - -```yaml -providers: -- name: openai - endpoint: https://api.openai.com/v1 - schema: OpenAI - credentials: ... - processing: # future — not in initial scope - - name: pii-redaction - phase: request - extProc: - serviceRef: - name: pii-service - port: 50051 - - name: toxicity-check - phase: response - extProc: - serviceRef: - name: toxicity-service - port: 50051 - -- name: ollama - endpoint: http://ollama.svc:11434 - schema: OpenAI - # no processing — trusted local endpoint -``` - -The `processing` list is ordered, giving pipeline semantics. The -controller would generate per-route ext-proc filter chains so that -different providers get different processing. - -The WG AI Gateway's [Proposal 7: Payload Processing] defines a -`PayloadProcessingPipeline` CRD for ordered body/header processors as -HTTPRoute filters. When that proposal matures into an accepted API, the -`processing` field should generate `PayloadProcessingPipeline` resources -rather than raw ext-proc configuration. This follows the same principle -as routing: the user-facing spec stays stable while the generated -resources change as standards evolve. - -Gateway-wide guardrails (compliance requirements that apply regardless -of provider) are a separate concern. If needed, a future -`AIGuardrailsPolicy` could attach to the Gateway using the same policy -attachment pattern as AIAccessPolicy. That would be a separate design -proposal. +Content filtering, prompt injection detection, PII redaction, and +semantic caching are payload processing concerns that could apply at +a per-provider or per-model level — trust boundaries differ between +a local Ollama instance and an external API endpoint. The WG AI +Gateway's [Proposal 7: Payload Processing] defines +`PayloadProcessingPipeline` as the standard for this capability. +How we attach processing pipelines to our abstraction is discussed +in [Future: PayloadProcessingPipeline](#future-payloadprocessingpipeline) +below. Not in scope for the initial implementation. From 371dd51579e0bfd40b52adf856d7a88d8719ad5c Mon Sep 17 00:00:00 2001 From: usize Date: Thu, 11 Jun 2026 17:23:23 -0700 Subject: [PATCH 08/10] Explain the value of mTLS based inference access. Signed-off-by: usize --- docs/plans/ai-gateway.md | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) diff --git a/docs/plans/ai-gateway.md b/docs/plans/ai-gateway.md index cf7a315c..c9bf19aa 100644 --- a/docs/plans/ai-gateway.md +++ b/docs/plans/ai-gateway.md @@ -327,6 +327,41 @@ API key authentication. This is a gateway-level concern — it applies uniformly to all traffic entering the gateway, independent of which provider handles the request. +#### Why: tokenless inference access + +Today, giving an agent access to an LLM provider means provisioning +an API key, storing it in a Secret, mounting it into the workload, +and building rotation and revocation processes around it. API keys +are bearer tokens — if leaked, anyone can use them. In multi-team +environments, shared keys make attribution and cost accounting +difficult. + +With AIAccessPolicy, inference access is tied to workload identity. +A platform admin deploys an AI Gateway with mTLS. Agent workloads +in the namespace have SPIFFE identities provisioned by SPIRE. When +an agent calls the gateway, its X.509 SVID is validated against the +trust bundle — no API keys to provision, rotate, or leak. Workloads +without a SPIFFE identity in the trust domain are rejected at the +TLS handshake. The agent code is unchanged; it calls an +OpenAI-compatible endpoint. The credential management is pushed from +application teams to infrastructure. + +Provider-level API keys (for external providers like OpenAI) still +exist, but they live in one place — the AIRoutingPolicy's credential +configuration — managed by the platform team. Individual agents +never see them. + +This mechanism also opens the door to per-workload policy. After +mTLS validation, the proxy sets the `x-forwarded-client-cert` +header with the client's SPIFFE ID (e.g. +`spiffe://localtest.me/ns/team1/sa/weather-agent`). This header +cannot be spoofed — the proxy strips any client-supplied XFCC and +replaces it with data extracted from the validated certificate. Our +initial data plane (Envoy) supports this natively; the mechanism +may be replicated in future supported proxies. A future policy CRD +could match on XFCC to apply per-workload rate limits, model access +lists, or processing pipelines. + ```yaml apiVersion: aigateway.kagenti.dev/v1alpha1 kind: AIAccessPolicy From 8e3778f60a80dd70ce5a37fec407cf80db0bb921 Mon Sep 17 00:00:00 2001 From: usize Date: Thu, 11 Jun 2026 17:27:51 -0700 Subject: [PATCH 09/10] Fix inconsistencies leftover from new section additions. Signed-off-by: usize --- docs/plans/ai-gateway.md | 38 +++++++++++++++++++++++++------------- 1 file changed, 25 insertions(+), 13 deletions(-) diff --git a/docs/plans/ai-gateway.md b/docs/plans/ai-gateway.md index c9bf19aa..2d2489b4 100644 --- a/docs/plans/ai-gateway.md +++ b/docs/plans/ai-gateway.md @@ -68,9 +68,9 @@ networking in Gateway API. Two proposals are directly relevant: - [Proposal 7: Payload Processing] — introduces `PayloadProcessingPipeline` for ordered, sequential body/header processors (prompt validation, PII redaction, semantic routing) as - HTTPRoute filters. This is the emerging standard for guardrails and - is the intended generation target for per-provider processing - pipelines (see [Future: guardrails](#future-guardrails)). + HTTPRoute filters. This is the emerging standard for guardrails. + How we integrate it with our abstraction is discussed in + [Future: PayloadProcessingPipeline](#future-payloadprocessingpipeline). Our design generates the Envoy AI Gateway CRDs that exist today, but is structured so that when the WG proposals mature into accepted APIs, we @@ -737,10 +737,9 @@ presents its SPIFFE certificate (CSI driver, spiffe-helper sidecar, AuthBridge integration, or native go-spiffe) is the client-side concern and is out of scope for this proposal. -AuthBridge (agent-to-agent OAuth/OIDC) and the MCP Gateway (MCP -protocol routing) are orthogonal to AI Gateway (model routing and -inference access control). They target different traffic flows and -do not conflict. +AuthBridge (agent-to-agent OAuth/OIDC) is orthogonal to AI Gateway +(model routing and inference access control). They target different +traffic flows and do not conflict. Client-side configuration for connecting workloads to AI Gateways with SPIFFE identity may be addressed in a separate proposal. @@ -849,12 +848,26 @@ config: hooks: xdsTranslator: post: [Translation, Cluster, Route] + translation: + cluster: + includeAll: true + listener: + includeAll: true + route: + includeAll: true + secret: + includeAll: true service: fqdn: hostname: ai-gateway-controller..svc.cluster.local port: 1063 ``` +The `translation` block is required. Without it, Envoy Gateway +defaults to excluding listeners and routes from the +`PostTranslateModify` hook, and the AI Gateway extension server +cannot inject the ext_proc filter into the listener filter chain. + ## Implementation plan ### Phase 1: AIRoutingPolicy @@ -894,13 +907,12 @@ enable provider reuse across gateways and finer-grained RBAC. The internal `RoutingIntent` model should be structured so that this extraction is a mechanical refactor. -### Future: guardrails +### Future: payload processing -Per-provider guardrails via the `processing` field (see -[Future: guardrails](#future-guardrails) above). Depends on the -WG AI Gateway [Proposal 7: Payload Processing] maturing. Gateway-wide -guardrails would be a separate policy CRD and a separate design -proposal. +Per-provider and per-model guardrails via +[PayloadProcessingPipeline](#future-payloadprocessingpipeline). +Depends on the WG AI Gateway [Proposal 7: Payload Processing] +maturing. ## Compatibility notes From aebb7539a6c4d4412a51a3ba20fb9d30c836017e Mon Sep 17 00:00:00 2001 From: usize Date: Thu, 11 Jun 2026 17:43:04 -0700 Subject: [PATCH 10/10] Separate protocol from schema | align terminology with Gateway API upstream. Signed-off-by: usize --- docs/plans/ai-gateway.md | 38 ++++++++++++++++++++++++++++++++++---- 1 file changed, 34 insertions(+), 4 deletions(-) diff --git a/docs/plans/ai-gateway.md b/docs/plans/ai-gateway.md index 2d2489b4..2bafa47d 100644 --- a/docs/plans/ai-gateway.md +++ b/docs/plans/ai-gateway.md @@ -145,12 +145,27 @@ The core policy — without it, the Gateway has no AI routing. Providers and models are separate concepts: -- **Providers** define shared connection configuration: endpoint, API - schema, and credentials. A provider is referenced by name from model - backend entries. Each provider generates one Backend, one - AIServiceBackend, and (if credentials are set) one +- **Providers** define shared connection configuration: endpoint, + protocol, API schema, and credentials. A provider is referenced by + name from model backend entries. Each provider generates one + Backend, one AIServiceBackend, and (if credentials are set) one BackendSecurityPolicy. + Providers distinguish two layers: + + | Field | Layer | Examples | + |-------|-------|----------| + | `protocol` | Transport | `HTTP`, `HTTP2`, `gRPC` | + | `schema` | Application format | `OpenAI`, `AWSBedrock`, `MCP`, `A2A` | + + The WG AI Gateway's [Proposal 10: Egress Gateways] combines both + into a single `BackendProtocol` enum. We separate them + intentionally — MCP and A2A are application-layer schemas + (JSON-RPC over HTTP/SSE, HTTP respectively), not transport + protocols. A provider's protocol determines the generated + `Backend` transport configuration; its schema determines the + `AIServiceBackend` API translation. + - **Models** define the client-facing routing unit. Each model has a `name` (the virtual name clients request), a list of `backends` (provider references with the actual model name and failover @@ -428,6 +443,9 @@ status: - type: RoutingActive # AIGatewayRoute + BTP created and accepted status: "True" reason: Applied + - type: Programmed # aggregate: all sub-conditions True + status: "False" + reason: ProvidersNotReady providers: - name: ollama @@ -451,6 +469,14 @@ status: | `GatewayBound` | Target Gateway exists in namespace | `GatewayNotFound` | | `ProvidersConfigured` | All Backend + AIServiceBackend + BSP resources created | `PartialFailure`, `ApplyFailed`, `CredentialSecretNotFound` | | `RoutingActive` | AIGatewayRoute + BackendTrafficPolicy created and accepted | `ApplyFailed` | +| `Programmed` | All of the above are True | Mirrors the failing sub-condition's reason | + +`Programmed` is an aggregate signal per [GEP-713][]. It goes True +when `Accepted`, `GatewayBound`, `ProvidersConfigured`, and +`RoutingActive` are all True, giving a single condition suitable for +`kubectl wait --for=condition=Programmed`. + +[GEP-713]: https://gateway-api.sigs.k8s.io/geps/gep-713/ The `status.providers[]` list mirrors `spec.providers[]` and reports per-provider readiness. When `ProvidersConfigured` is False, the @@ -479,6 +505,9 @@ status: - type: MTLSActive # CA Secret + server cert + CTP created status: "True" reason: Applied + - type: Programmed # aggregate: all sub-conditions True + status: "True" + reason: Programmed ``` | Condition | True when | False reasons | @@ -487,6 +516,7 @@ status: | `GatewayBound` | Target Gateway exists in namespace | `GatewayNotFound` | | `BundleReady` | Trust bundle ConfigMap read and parsed with ≥1 valid cert | `BundleNotFound`, `BundleEmpty`, `BundleParseError` | | `MTLSActive` | CA Secret, server cert, and ClientTrafficPolicy created | `ApplyFailed`, `CertGenerationFailed` | +| `Programmed` | All of the above are True | Mirrors the failing sub-condition's reason | `BundleReady` is re-evaluated on every requeue (default 5 minutes), so it reflects trust bundle rotations. The message includes the