Docs: Propose AI Gateway CRDs and Controller#419
Conversation
This proposal is derived from the upstream wg-ai-gateway project -- part of Kubernetes SIG-Network. It targets Envoy AI Gateway as a proof of concept, but is meant to be extended to support additional proxies. A novel integration with SPIFFE is proposed, as a value add unique to our platform. Signed-off-by: usize <mofoster@redhat.com>
Signed-off-by: usize <mofoster@redhat.com>
|
@usize I like the direction where this is going ! |
pdettori
left a comment
There was a problem hiding this comment.
Great direction — this is essentially the "service mesh for LLM calls" layer that completes Kagenti's infrastructure story. Kagenti already provides framework-neutral auth, identity (SPIFFE), and scaling for agents; adding platform-level control over the inference egress path is the natural next step. The renderer abstraction and alignment with WG AI Gateway proposals are solid forward-looking choices.
A few comments inline on areas that would strengthen the proposal.
| ├── read SPIRE trust bundle ConfigMap | ||
| ├── parse SPIFFE JSON → extract x509-svid certs → PEM | ||
| ├── create/update CA Secret | ||
| ├── create/update self-signed server cert (if no serverCertRef) |
There was a problem hiding this comment.
Status conditions need to be defined. The reconciler mentions "update status conditions" but the proposal never specifies what conditions are set. For production readiness, operators need clear signals: Is the gateway bound? Are providers valid? Is the SPIFFE bundle healthy?
Suggest adding a concrete .status.conditions schema for both CRDs, e.g.:
GatewayBound— target Gateway found and acceptedProvidersConfigured— all referenced providers validated (credentials exist, endpoints reachable)RoutingApplied— downstream resources generated successfullyBundleReady(AIAccessPolicy) — SPIFFE bundle parsed with ≥1 valid cert
Without these, debugging "why isn't my model routing?" becomes guesswork.
There was a problem hiding this comment.
Err, @pdettori I gave Claude access to the gh tool and it disobeyed my global CLAUDE.md and posted a comment without my permission. I have a separate account and token (https://github.com/usize-agent) which is configured in my sandboxes. Unfortunately this instance was a quick session that I spun up without using my sandbox command. Lesson learned. Apologies to post AI garble. I'm updating the proposal. :]
|
|
||
| ### Gateway API | ||
|
|
||
| [Gateway API] is the Kubernetes-native standard for configuring network |
There was a problem hiding this comment.
Agent integration example missing. This is the key value prop — "tokenless inference access to agent workloads" — but the proposal doesn't show how an agent actually consumes the gateway. What URL does it call? How does it present its SPIFFE cert? Does the webhook sidecar handle this automatically?
A short end-to-end example connecting AIAccessPolicy → Gateway → Agent (showing the env var or mount the agent uses) would make this much more tangible and help reviewers validate the design against real usage patterns.
Also worth clarifying: how does this coexist with AuthBridge/AuthProxy if both are on the same namespace? And is this orthogonal to the existing MCP Gateway (MCP = protocol routing, AI Gateway = model routing)?
There was a problem hiding this comment.
I've addressed Agent integration with AIAccessPolicy via the Why: tokenless inference access section.
I've touched on support for additional protocols e.g., MCP and integration with projects like AuthBridge as well.
The tl;dr is, yes, we can support MCP and A2A policies and I gave an example of what it would look like. But it's out of scope for our initial implementation, which is focused on governing inference.
| routing. This maps directly to Envoy AI Gateway's rate limiting | ||
| mechanism — one rule, one model header selector, one Redis counter — | ||
| with no ambiguity in descriptor grouping. | ||
|
|
There was a problem hiding this comment.
Multi-tenancy / namespace scoping for rate limits. The rate limit design uses x-ai-eg-model header matching with one Redis counter per model. But in a multi-team deployment:
- If two teams' AIRoutingPolicies define a model named
gpt-4oagainst the same Gateway, do they share a Redis counter? That would silently pool their quotas. - If they use separate Gateways, is that documented as the isolation boundary?
Suggest clarifying whether model names are namespace-scoped in the Redis descriptor key (e.g., <namespace>/<model-name>) or whether the expectation is one Gateway per team.
More broadly: a short section on the namespace model (can teams share a Gateway? must each own one?) would help operators plan their deployment topology.
There was a problem hiding this comment.
If two teams' AIRoutingPolicies define a model named gpt-4o against the same Gateway, do they share a Redis counter? That would silently pool their quotas.
In the current design. Yes. I punted that down below via:
Per-user or per-tenant rate limiting (multi-tenant cost allocation) is
a separate concern. If needed, a future policy CRD could attach to the
Gateway to layer per-client quotas on top of per-model limits.
Where per-tenant policy should live is a difficult question. If we create a different AIRoutingPolicy per tenant, we will end up duplicating a lot of boilerplate.
The model I had considered moving toward was to treat AIRoutingPolicy as a global configuration. With token rate limits etc... existing as defaults.
Then we could add another CRD for per-tenant configurations that overrides it. Let me think this through and update the proposal with something tidier here.
|
Hi, the naming of this GW is very confusing in the context of an agent platform. |
|
Re: naming The idea is to support the class of proxies broadly being called AI Gateways. See e.g., https://github.com/kubernetes-sigs/wg-ai-gateway It's a term of art being used more broadly. For example, https://github.com/BerriAI/litellm is one of the most popular "AI Gateway" projects in use in agentic systems today. What I will say, is that while governing inference is the initial scope. Support for programming a proxy with body based policies opens up the possibility of governing A2A traffic, MCP traffic etc.... Calling it an inference gateway operator would contradict the positioning of the proxies it's meant to program like Envoy AI Gateway, AgentGateway and others. |
Signed-off-by: usize <mofoster@redhat.com>
Signed-off-by: usize <mofoster@redhat.com>
Signed-off-by: usize <mofoster@redhat.com>
Signed-off-by: usize <mofoster@redhat.com>
…Pipeline Signed-off-by: usize <mofoster@redhat.com>
Signed-off-by: usize <mofoster@redhat.com>
Signed-off-by: usize <mofoster@redhat.com>
|
@pdettori I've added a number of new sections here.
|
…stream. Signed-off-by: usize <mofoster@redhat.com>
Summary
This proposal is derived from the upstream wg-ai-gateway project -- part of Kubernetes SIG-Network.
It targets Envoy AI Gateway as a proof of concept, but is meant to be extended to support additional proxies.
A novel integration with SPIFFE is proposed, as a value add unique to our platform.
Related issue(s)
kagenti/kagenti#1165