Docs: Propose AI Gateway CRDs and Controller #419

Open
usize wants to merge 10 commits into kagenti:main from usize:plan/ai-gateway

@usize usize commented Jun 9, 2026

Contributor

Summary

This proposal is derived from the upstream wg-ai-gateway project -- part of Kubernetes SIG-Network.

It targets Envoy AI Gateway as a proof of concept, but is meant to be extended to support additional proxies.

A novel integration with SPIFFE is proposed, as a value add unique to our platform.

Related issue(s)

kagenti/kagenti#1165

Signed-off-by: usize <mofoster@redhat.com>
@usize usize requested a review from a team as a code owner June 9, 2026 22:57
Signed-off-by: usize <mofoster@redhat.com>
@pdettori

Member

@usize I like the direction this is going!

@pdettori pdettori left a comment

Member

Great direction — this is essentially the "service mesh for LLM calls" layer that completes Kagenti's infrastructure story. Kagenti already provides framework-neutral auth, identity (SPIFFE), and scaling for agents; adding platform-level control over the inference egress path is the natural next step. The renderer abstraction and alignment with WG AI Gateway proposals are solid forward-looking choices.

A few comments inline on areas that would strengthen the proposal.

Comment thread docs/plans/ai-gateway.md
├── read SPIRE trust bundle ConfigMap
├── parse SPIFFE JSON → extract x509-svid certs → PEM
├── create/update CA Secret
├── create/update self-signed server cert (if no serverCertRef)

Member

Status conditions need to be defined. The reconciler mentions "update status conditions" but the proposal never specifies what conditions are set. For production readiness, operators need clear signals: Is the gateway bound? Are providers valid? Is the SPIFFE bundle healthy?

Suggest adding a concrete .status.conditions schema for both CRDs, e.g.:

  • GatewayBound — target Gateway found and accepted
  • ProvidersConfigured — all referenced providers validated (credentials exist, endpoints reachable)
  • RoutingApplied — downstream resources generated successfully
  • BundleReady (AIAccessPolicy) — SPIFFE bundle parsed with ≥1 valid cert

Without these, debugging "why isn't my model routing?" becomes guesswork.
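
To make the BundleReady suggestion above concrete, here is a minimal sketch of how a reconciler might turn a SPIFFE bundle into PEM certificates and a status condition. It assumes the bundle uses the standard SPIFFE JWKS layout (`use: x509-svid` keys with base64 DER in `x5c`). The reason strings (`BundleParsed`, `NoX509Authorities`, `BundleParseFailed`) and both function names are hypothetical and not taken from the proposal. It only checks that the certificates decode; it does not verify expiry or chain validity.

```python
import base64
import json
from datetime import datetime, timezone


def bundle_to_pem(bundle_json: str) -> list[str]:
    """Return the x509-svid certificates in a SPIFFE bundle (JWKS) as PEM strings."""
    pems = []
    for key in json.loads(bundle_json).get("keys", []):
        if key.get("use") != "x509-svid":
            continue  # jwt-svid keys carry no X.509 certificate
        for b64_der in key.get("x5c", []):
            der = base64.b64decode(b64_der, validate=True)
            if not der:
                continue  # ignore empty entries
            b64 = base64.b64encode(der).decode("ascii")
            # PEM bodies are base64 wrapped at 64 characters per line.
            lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
            pems.append(
                "-----BEGIN CERTIFICATE-----\n"
                + "\n".join(lines)
                + "\n-----END CERTIFICATE-----\n"
            )
    return pems


def bundle_ready_condition(bundle_json: str, generation: int) -> dict:
    """Build a hypothetical BundleReady condition in the usual Kubernetes shape."""
    try:
        count = len(bundle_to_pem(bundle_json))
    except ValueError as exc:  # covers malformed JSON and malformed base64
        status, reason, message = "False", "BundleParseFailed", str(exc)
    else:
        if count:
            status, reason = "True", "BundleParsed"
            message = f"{count} X.509 authorities found"
        else:
            status, reason = "False", "NoX509Authorities"
            message = "bundle contains no x509-svid certificates"
    return {
        "type": "BundleReady",
        "status": status,
        "reason": reason,
        "message": message,
        "observedGeneration": generation,
        # A real controller would only bump this when status changes.
        "lastTransitionTime": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

The other suggested conditions (GatewayBound, ProvidersConfigured, RoutingApplied) could follow the same shape, each with its own reason strings.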

@usize usize Jun 11, 2026

Contributor Author

Err, @pdettori, I gave Claude access to the gh tool and it disobeyed my global CLAUDE.md and posted a comment without my permission. I have a separate account and token (https://github.com/usize-agent) configured in my sandboxes. Unfortunately, this instance was a quick session that I spun up without using my sandbox command. Lesson learned. Apologies for posting AI garble. I'm updating the proposal. :]

Comment thread docs/plans/ai-gateway.md

### Gateway API

[Gateway API] is the Kubernetes-native standard for configuring network

Member

Agent integration example missing. This is the key value prop — "tokenless inference access to agent workloads" — but the proposal doesn't show how an agent actually consumes the gateway. What URL does it call? How does it present its SPIFFE cert? Does the webhook sidecar handle this automatically?

A short end-to-end example connecting AIAccessPolicy → Gateway → Agent (showing the env var or mount the agent uses) would make this much more tangible and help reviewers validate the design against real usage patterns.

Also worth clarifying: how does this coexist with AuthBridge/AuthProxy if both are on the same namespace? And is this orthogonal to the existing MCP Gateway (MCP = protocol routing, AI Gateway = model routing)?

Contributor Author

I've addressed Agent integration with AIAccessPolicy via the Why: tokenless inference access section.

I've also touched on support for additional protocols (e.g., MCP) and on integration with projects like AuthBridge.

The tl;dr is, yes, we can support MCP and A2A policies and I gave an example of what it would look like. But it's out of scope for our initial implementation, which is focused on governing inference.

Comment thread docs/plans/ai-gateway.md
routing. This maps directly to Envoy AI Gateway's rate limiting
mechanism — one rule, one model header selector, one Redis counter —
with no ambiguity in descriptor grouping.

Member

Multi-tenancy / namespace scoping for rate limits. The rate limit design uses x-ai-eg-model header matching with one Redis counter per model. But in a multi-team deployment:

  1. If two teams' AIRoutingPolicies define a model named gpt-4o against the same Gateway, do they share a Redis counter? That would silently pool their quotas.
  2. If they use separate Gateways, is that documented as the isolation boundary?

Suggest clarifying whether model names are namespace-scoped in the Redis descriptor key (e.g., <namespace>/<model-name>) or whether the expectation is one Gateway per team.

More broadly: a short section on the namespace model (can teams share a Gateway? must each own one?) would help operators plan their deployment topology.
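
The quota-pooling concern can be illustrated with a small sketch. It assumes an in-memory tally standing in for the Redis counters, and it does not model Envoy AI Gateway's actual descriptor layout. The function names and the team and namespace names are hypothetical; the `<namespace>/<model-name>` key format is the one floated in this comment.

```python
from collections import Counter


def model_only_key(namespace: str, model: str) -> str:
    """Counter key derived from the model name alone."""
    return model


def namespaced_key(namespace: str, model: str) -> str:
    """Counter key scoped as <namespace>/<model-name>."""
    return f"{namespace}/{model}"


def usage_by_key(requests, key_fn) -> dict:
    """Sum token usage per counter key; a stand-in for Redis increments."""
    counters = Counter()
    for namespace, model, tokens in requests:
        counters[key_fn(namespace, model)] += tokens
    return dict(counters)


# Two teams route a model with the same name through one Gateway.
requests = [
    ("team-a", "gpt-4o", 1000),
    ("team-b", "gpt-4o", 500),
]

pooled = usage_by_key(requests, model_only_key)
isolated = usage_by_key(requests, namespaced_key)
```

With the model-only key, both teams draw down a single `gpt-4o` counter. With the namespaced key, each team gets its own.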

Contributor Author

If two teams' AIRoutingPolicies define a model named gpt-4o against the same Gateway, do they share a Redis counter? That would silently pool their quotas.

In the current design, yes. I punted on that further down, via:

Per-user or per-tenant rate limiting (multi-tenant cost allocation) is
a separate concern. If needed, a future policy CRD could attach to the
Gateway to layer per-client quotas on top of per-model limits.

Where per-tenant policy should live is a difficult question. If we create a different AIRoutingPolicy per tenant, we will end up duplicating a lot of boilerplate.

The model I had considered moving toward was to treat AIRoutingPolicy as a global configuration, with token rate limits etc. existing as defaults.

Then we could add another CRD for per-tenant configurations that overrides it. Let me think this through and update the proposal with something tidier here.
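
As a rough sketch of that defaults-plus-overrides idea, assuming the simplest possible precedence rule (a tenant value replaces the global default for the same model): the dictionary layout, tenant names, model names, and numbers below are all hypothetical, and the proposal has not settled on any of this.

```python
def effective_limits(defaults: dict, overrides: dict, tenant: str) -> dict:
    """Merge global per-model token limits with one tenant's overrides.

    Assumption: an override replaces the default for the same model,
    and models without an override keep the global default.
    """
    merged = dict(defaults)
    merged.update(overrides.get(tenant, {}))
    return merged


# Global defaults, as an AIRoutingPolicy might carry them (tokens per minute).
defaults = {"gpt-4o": 100_000, "model-b": 50_000}

# Per-tenant overrides, as a separate hypothetical CRD might carry them.
overrides = {"team-a": {"gpt-4o": 20_000}}

team_a = effective_limits(defaults, overrides, "team-a")
team_b = effective_limits(defaults, overrides, "team-b")
```

Here `team-a` gets a tighter `gpt-4o` limit while `team-b` inherits every default, which avoids duplicating the routing boilerplate per tenant.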

@davidhadas

Hi, the naming of this GW is very confusing in the context of an agent platform.
From an agent platform POV it is an LLM GW, not an AI GW.
The agent platform offers AI services to users by deploying agents. These agents access an LLM resource, other agents, MCP resources, and maybe other resources. Any access to an LLM resource goes via the LLM GW - this makes sense :)

usize commented Jun 11, 2026

Contributor Author

Re: naming

The idea is to support the class of proxies broadly being called AI Gateways. See e.g., https://github.com/kubernetes-sigs/wg-ai-gateway

It's a term of art being used more broadly. For example, https://github.com/BerriAI/litellm is one of the most popular "AI Gateway" projects in use in agentic systems today.

What I will say is that, while governing inference is the initial scope, support for programming a proxy with body-based policies opens up the possibility of governing A2A traffic, MCP traffic, etc.

Calling it an inference gateway operator would contradict the positioning of the proxies it's meant to program, like Envoy AI Gateway, AgentGateway, and others.

usize added 7 commits June 11, 2026 17:16
Signed-off-by: usize <mofoster@redhat.com>
Signed-off-by: usize <mofoster@redhat.com>
Signed-off-by: usize <mofoster@redhat.com>
…Pipeline

Signed-off-by: usize <mofoster@redhat.com>
Signed-off-by: usize <mofoster@redhat.com>
Signed-off-by: usize <mofoster@redhat.com>
usize commented Jun 12, 2026

Contributor Author

@pdettori I've added a number of new sections here.

  • fleshed out status conditions.
  • described both the why and the mechanism behind tokenless inference access via AIAccessPolicy
  • deferred client access to a follow-up -- I want to keep this proposal focused on programming the Gateway.
  • deferred multi-tenancy but elaborated on a possible solution via BackendTrafficPolicy-style precedence along with some open questions we'll need to answer.
  • elaborated on future support paths for Providers that speak protocols like MCP or A2A, along with Inference request protocols.
  • laid the foundation for inclusion of a PayloadProcessingPipeline to configure arbitrary body-based policies in the Gateway. This is out of scope for this proposal; however, the discussion here should give us a good jumping-off point when we are ready to tackle it.

@usize usize requested a review from pdettori June 12, 2026 00:32
…stream.

Signed-off-by: usize <mofoster@redhat.com>