feat(ai-proxy): /v1/responses translation (ADR-0030 §2) #78
Closed
ndreno wants to merge 6 commits into test/ai-proxy-routes-coverage from
…nge)
ADR-0030 §1 calls for the dispatcher to become protocol-aware, with
path-based dispatch picking translation between Chat Completions and
the Responses API on the same target pool. This PR is the mechanical
move that preps the source layout — zero behavior change for the only
existing path (/v1/chat/completions).
Source layout:
- lib.rs orchestration (target resolution, fallback chain,
metrics, context propagation, shared helpers,
host stubs); now declares mod protocols/providers.
- protocols/
chat_completion.rs OpenAI Chat Completions adapter — handle(),
translate_to_anthropic, translate_from_anthropic,
AnthropicRequest.
- providers/
openai.rs OpenAI-compatible transport (openai_call,
openai_stream, maybe_inject_max_tokens, openai_url,
openai_headers).
anthropic.rs Anthropic Messages transport + ANTHROPIC_API_VERSION
constant (was inlined at lib.rs:359).
ollama.rs Empty slot — Ollama shares OpenAI transport today.
ADR-0030 §2 will use this file to reject Responses
requests against an Ollama target.
Dispatch is now path-aware via a function-pointer indirection
(`ProtocolHandler` type alias). Today only /v1/chat/completions is
routed; ADR-0030 PR-4 adds /v1/responses, PR-5 adds /v1/models. Unknown
paths return 404 with `urn:barbacane:error:not-found` (one new test
covers this).
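The function-pointer indirection can be sketched roughly like this. All types and handler names here (`Response`, `handle_chat_completion`, `not_found_response`) are simplified stand-ins for illustration, not the plugin's real signatures:

```rust
// Illustrative stand-in for the plugin's response type.
struct Response {
    status: u16,
    body: String,
}

// The indirection: a plain function pointer keyed off the request path.
type ProtocolHandler = fn(body: &str) -> Response;

fn handle_chat_completion(_body: &str) -> Response {
    Response { status: 200, body: "ok".into() }
}

fn not_found_response() -> Response {
    // problem+json with urn:barbacane:error:not-found in the real plugin
    Response { status: 404, body: "urn:barbacane:error:not-found".into() }
}

fn route(path: &str) -> Option<ProtocolHandler> {
    match path {
        // Only Chat Completions is routed today; ADR-0030 PR-4 adds
        // /v1/responses and PR-5 adds /v1/models to this match.
        "/v1/chat/completions" => Some(handle_chat_completion),
        _ => None,
    }
}

fn dispatch(path: &str, body: &str) -> Response {
    match route(path) {
        Some(handler) => handler(body),
        None => not_found_response(),
    }
}
```

Adding a protocol then means adding one handler function and one match arm; the orchestration loop never learns protocol details.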
All 43 existing unit tests pass unchanged via super::module::* paths;
test count goes 43 → 44 with the new path-dispatch test. WASM build
clean. Workspace clippy + fmt clean.
Pre-push checklist:
- cargo build --workspace: clean
- cargo build --target wasm32-unknown-unknown -r: clean
- cargo test --workspace --exclude barbacane-test: all green
- cargo fmt --all -- --check: clean
- cargo clippy --lib --bins: clean
- cargo deny check advisories: FAILS on RUSTSEC-2026-0114 (wasmtime);
  pre-existing on main, to be addressed in a separate PR.
…(ADR-0030 §0)
BREAKING CHANGE for ADR-0024 deployments. The model identifier is no
longer a gateway-side config knob — the client's `model` field on the
request body is passed to the upstream provider verbatim. The gateway
declares providers (where to go, with what credentials), never an
authoritative model list.
Removed:
- `model: String` from `TargetConfig` (was required)
- `model: Option<String>` from the flat `AiProxy` top-level config
- The `target.model` fallback in `translate_to_anthropic`
- The `cfg.model` assertion in `config_flat_minimal`
Added:
- `extract_client_model(body)` — parses the `model` field from an
OpenAI-format request body. Returns `None` for absent body, malformed
JSON, missing field, non-string value, or empty string.
- `model_required_response()` — `400 problem+json` with
`urn:barbacane:error:model_required` and `code: "model_required"`.
Returned by `dispatch()` when the client omits `model`. Matches both
upstream provider contracts (OpenAI Chat Completions and Responses
both require `model`) and ADR-0030 §0's caller-owned-model principle.
- `dispatch_chat_completion()` — path-specific helper that extracts the
client model upfront, short-circuits 400 if missing, otherwise calls
the shared orchestration loop with the client model plumbed through.
- `#[serde(deny_unknown_fields)]` on `AiProxy` and `TargetConfig` —
closes the runtime safety net, so leftover nested `model:` (which
vacuum's auto-generated validator does not recurse into yet) fails
at WASM instance load with a clear "unknown field model" error.
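The `extract_client_model` contract can be illustrated with a minimal stand-in. The real helper parses the body as JSON; this sketch hand-scans a flat `"model"` field purely to show the None-on-anything-odd behavior (absent body, missing field, non-string value, empty string):

```rust
/// Simplified stand-in for `extract_client_model`: returns Some(model)
/// only for a present, string-typed, non-empty `model` field; None
/// otherwise. The real plugin does this over a parsed JSON value.
fn extract_client_model(body: &str) -> Option<String> {
    let key = "\"model\"";
    let start = body.find(key)? + key.len();           // missing field -> None
    let rest = body[start..].trim_start();
    let rest = rest.strip_prefix(':')?.trim_start();
    let rest = rest.strip_prefix('"')?;                // non-string value -> None
    let end = rest.find('"')?;
    let model = &rest[..end];
    if model.is_empty() {
        None                                           // empty string -> None
    } else {
        Some(model.to_string())
    }
}
```

`dispatch()` then maps a `None` to the 400 `model_required` problem+json before any target resolution happens.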
Plumbing:
- `ProtocolHandler` signature now takes `client_model: &str` as a 4th
argument before `streaming`. The orchestration loop extracts the
model once and passes it through to the handler.
- `propagate_context(target, client_model, resp)` now writes `ai.model`
from the client value, not `target.model`. Downstream middlewares
(`ai-cost-tracker`, `ai-token-limit`) read the same identifier the
client requested.
- `anthropic_call` and `translate_to_anthropic` take `client_model: &str`
explicitly; no fallback.
Migration:
- vacuum surfaces leftover `model:` at lint time on the flat config
via the auto-generated `additionalProperties: false` check; the
message names the field. Runtime `deny_unknown_fields` catches
leftover `model:` on `targets.<>` and `fallback[]`.
- All shipped fixtures updated: `tests/fixtures/ai-{proxy,gateway}.yaml`,
`crates/barbacane-test/tests/ai_{proxy,gateway}.rs`,
`docs/rulesets/tests/valid-complete.yaml`.
Tests: 50 unit (was 44; +6 covering model_required, extract_client_model
helper, legacy-model-rejection at top-level and nested target).
14/14 ruleset tests pass.
Considered and rejected: a dedicated `barbacane-ai-proxy-no-model`
vacuum rule. Migration-specific lint rules accumulate forever; the
auto-generated message + CHANGELOG entry is sufficient. The genuine
gap (vacuum doesn't recurse into nested objects) is addressed by a
separate generator improvement that benefits every plugin, not by a
bespoke rule for this one migration.
Adds glob-based dynamic model routing as a third resolution layer between
ai.target context lookup and default_target/flat fallthrough. Each routes
entry binds a glob pattern (e.g. claude-*, gpt-4o*, o[1-4]*) to a
provider + credentials. First match wins. Catalog policy lives on the
target via optional allow/deny glob lists. Critically, allow/deny applies
on every resolution path that produces a target carrying those rules —
including ai.target-driven dispatch — so a cel misconfig that sets
ai.target to a target whose deny covers the requested model still gets
403. Catalog policy is a property of the target, not the resolution path.
Resolution precedence (4-step ladder, ADR-0030 §3):
1. ai.target context key (set by upstream cel)
2. routes glob match against client model
3. default_target → targets[name]
4. flat provider config
Failure modes:
- 400 model_required (PR-2): client omitted model
- 400 no_route (new): routes configured but no entry matched and no
  fallthrough — the operator's catalog doesn't cover the requested model
- 403 model_not_permitted (new): allow/deny rejected the model. Does NOT
  fall through to fallback or to another route — that would silently
  escalate a denied model to a different provider. Escape hatch: tighten
  the route's pattern so non-matching models miss the route entirely and
  reach the catch-all.
- 500 misconfiguration: nothing configured, or a route's glob fails to
  compile (surfaced from ensure_compiled_routes at first dispatch).
Implementation:
- New Route struct (pattern + provider + credentials + allow/deny).
- New CompiledRoute caching the precompiled GlobMatcher per route.
- ensure_compiled_routes runs once per plugin instance, lazily on first
  dispatch — same pattern as cel's compiled CEL program.
- New ResolveOutcome enum distinguishing Resolved / NoRouteMatch /
  NotConfigured so dispatch can map each to the right HTTP shape.
- evaluate_catalog_policy compiles per-target allow/deny on the fly
  (lists are typically <5 entries; cheap). Fails closed on a glob
  compile error rather than silently bypassing the policy.
- Glob library: `globset` 0.4 with case-sensitive, anchored matching;
  default-features = false to keep the WASM binary small.
- Schema: TargetConfig gains optional allow/deny; new RouteEntry; new
  GlobPattern referenced from both. The pattern allowlist regex
  ^[A-Za-z0-9_*?\[\]\-:.+/]+$ pins glob characters at lint time so
  vacuum surfaces nonsense like regex syntax before runtime.
- New resolution_total counter with a
  resolution=context|routes|default|flat label and a debug log
  (ai-proxy: resolved provider=X via=Y) for the "why did my request go
  there?" debugging case.
Tests: 67/67 (was 50; +17 covering routes first-match-wins, catch-all
fallthrough, no_route when no fallthrough, default/flat fallthrough,
ai.target overrides routes, invalid glob compile error, allow pass/
reject, deny pass/match, allow+deny ordering, end-to-end 400 no_route,
end-to-end 403 model_not_permitted, no-fallthrough on deny, the
ai.target+deny subtlety, and resolution_total label emission). 14/14
ruleset tests pass.
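The resolution ladder can be sketched as follows. `Route`, `resolve`, and the one-wildcard `glob_match` are illustrative stand-ins (the real code compiles `globset` matchers), and steps 3 and 4 are collapsed into a single fallthrough parameter for brevity:

```rust
#[derive(Debug, PartialEq)]
enum ResolveOutcome {
    Resolved(String), // target/provider name
    NoRouteMatch,     // routes configured, nothing matched, no fallthrough
    NotConfigured,    // nothing to resolve against
}

struct Route {
    pattern: String,
    provider: String,
}

// Toy matcher: supports only a trailing `*`; the real plugin uses globset.
fn glob_match(pattern: &str, model: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => model.starts_with(prefix),
        None => pattern == model,
    }
}

fn resolve(
    context_target: Option<&str>, // 1. ai.target set by upstream cel
    routes: &[Route],             // 2. glob routes, first match wins
    default_target: Option<&str>, // 3./4. fallthrough (collapsed here)
    model: &str,
) -> ResolveOutcome {
    if let Some(t) = context_target {
        return ResolveOutcome::Resolved(t.to_string());
    }
    for r in routes {
        if glob_match(&r.pattern, model) {
            return ResolveOutcome::Resolved(r.provider.clone());
        }
    }
    match default_target {
        Some(t) => ResolveOutcome::Resolved(t.to_string()),
        None if !routes.is_empty() => ResolveOutcome::NoRouteMatch, // -> 400 no_route
        None => ResolveOutcome::NotConfigured,                      // -> 500 misconfiguration
    }
}
```

Note that catalog policy (allow/deny) is not shown here: it runs after resolution on whatever target came out, which is exactly why an ai.target-driven dispatch still hits a deny.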
…e-by dispatch fix
Adds integration tests, fixture coverage, and a vacuum migration fixture
for the ADR-0030 implementation stack (PR-1 → PR-3). Surfaced and fixed
one bug along the way.
Drive-by fix
============
The path-based dispatch added in PR-1 (#73) was too strict: it returned
404 for any req.path != /v1/chat/completions, breaking every existing
test fixture and any operator-defined operation path. The dispatcher
shouldn't constrain the operator's choice of path when only one protocol
is on offer. Now defaults to chat_completion::handle for any path; PR-4
will narrowly add /v1/responses when there's a real second protocol to
differentiate. Removes the dead 404 arm from error_response(). The unit
test from PR-1 that asserted the rejected behavior is replaced with one
that asserts the dispatcher accepts custom paths today.
Coverage added
==============
Integration tests (crates/barbacane-test/tests/ai_proxy.rs, +5 tests):
- test_ai_proxy_routes_first_match_wins — wiremock with three route
  prefixes proves claude-* / gpt-* / catch-all dispatch to the right
  upstream URL via the actual data plane pipeline.
- test_ai_proxy_400_when_body_omits_model — proves the model_required
  short-circuit fires end-to-end.
- test_ai_proxy_400_no_route_when_model_does_not_match — proves the
  no_route response shape ships through the data plane.
- test_ai_proxy_403_model_not_permitted_does_not_reach_upstream —
  wiremock with .expect(0) proves the upstream is never called when
  catalog policy denies; would catch a regression that leaks through to
  the provider.
- test_ai_proxy_403_does_not_fall_through_to_next_route — proves the
  no-fallthrough rule from ADR-0030 §3 holds in the real pipeline.
End-to-end tests (crates/barbacane-test/tests/ai_gateway.rs, +2 tests):
- cel_driven_target_deny_fires_403_not_silent_pass — the load-bearing
  ADR-0030 §3 subtlety in the actual pipeline. cel writes
  ai.target=anthropic-tier based on a header; the named target carries
  deny:["claude-opus-*"]; a request with claude-opus-4-6 hits 403, not
  silent pass. Mock has .expect(0) so the test proves the upstream is
  never called.
- cel_driven_target_deny_passes_when_model_does_not_match_deny —
  positive control for the same spec; proves a non-denied model still
  reaches upstream.
Compilation smoke (tests/fixtures/ai-proxy.yaml, +2 operations):
- /ai/routed/chat/completions with a full routes table including allow,
  deny, and a catch-all.
- /ai/restricted/chat/completions with default_target + catalog deny on
  a named target.
Vacuum migration UX (docs/rulesets/tests/invalid-ai-proxy-leftover-model.yaml
+ run-tests.sh):
- Regression fixture proving the auto-generated dispatch validator
  surfaces "Unknown config field 'model' for dispatcher 'ai-proxy'"
  with the full list of allowed fields when an operator forgets to
  delete `model:` after upgrading from ADR-0024. The lint message is
  the migration UX promised in PR-2's CHANGELOG.
Known gap (out of scope for this PR): nested-glob lint coverage for
routes[].pattern and allow/deny entries. The auto-generated validator
doesn't recurse into nested objects, so vacuum can't catch a
regex-syntax pattern at lint time today. Runtime catches it via the
globset compile error from ensure_compiled_routes (covered by unit
tests). This is the generator-recursion improvement we discussed
earlier — a separate PR that benefits every plugin's nested schema, not
a migration-specific lint.
Test counts: plugin 793 → 801 (+8: PR-1 unit test relaxed +1 test,
PR-3 +17 already in stack, this PR +8 from the drive-by fix and new
helpers). Integration 275 → 282 (+7). 15/15 ruleset tests pass.
Adds the OpenAI Responses API surface as the second protocol on
ai-proxy. Path-based dispatch routes POST /v1/responses through the new
Responses adapter; everything else still routes to Chat Completions.
Per-provider behavior
=====================
- **OpenAI**: passthrough at /v1/responses upstream. Streaming uses
host_http_stream like Chat Completions.
- **Anthropic**: translate input[] items ↔ Messages content blocks:
input_text / input_image → text / image content blocks
function_call → tool_use block
function_call_output → tool_result block
reasoning → dropped (Anthropic doesn't accept
client-supplied reasoning input);
counted + Warning: 299 surfaced.
`instructions` is hoisted to Anthropic's top-level `system` field.
Streaming is buffered to a single terminal event (mirrors ADR-0024
Chat Completions until true SSE translation lands).
- **Ollama**: 400 problem+json with code:
responses_not_supported_for_provider. Ollama's OpenAI-compat surface
is Chat Completions only as of 2026-04.
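The Anthropic item mapping above reduces to a small lookup. This is a toy string-to-string stand-in for the real serde-based translation, just to pin down the table:

```rust
/// Maps a Responses input[] item type to the Anthropic Messages block
/// type it becomes; None means the item is dropped (with a warning in
/// the real plugin). Strings stand in for the real serde types.
fn map_input_item(item_type: &str) -> Option<&'static str> {
    match item_type {
        "input_text" => Some("text"),
        "input_image" => Some("image"),
        "function_call" => Some("tool_use"),
        "function_call_output" => Some("tool_result"),
        // Anthropic doesn't accept client-supplied reasoning input:
        // dropped, counted, and surfaced via Warning: 299.
        "reasoning" => None,
        // Unknown item types are also dropped (and logged).
        _ => None,
    }
}
```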
Spec-level guards (preflight, before target resolution)
=======================================================
- previous_response_id present (and not null) → 400 problem+json with
code: previous_response_id_not_supported. The stateful Responses API
requires session-scoped storage that ADR-0030 §2 explicitly defers;
this is the forward-compat hook.
- store: true | absent → permissive (process statelessly, attach
Warning: 299 — "store ignored; gateway is stateless"). Most clients
send store: true as an unexamined default; rejecting it would break
them gratuitously. Operators see the downgrade via the
barbacane_plugin_ai_proxy_responses_store_downgrades_total counter.
- store: false → no warning, no counter.
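The preflight decision table can be sketched over an already-parsed body. The `Preflight` enum and function name are illustrative, not the plugin's real types:

```rust
#[derive(Debug, PartialEq)]
enum Preflight {
    RejectPreviousResponseId, // -> 400 previous_response_id_not_supported
    StoreDowngrade,           // -> Warning: 299 + downgrade counter
    Ok,                       // store: false -> silent
}

fn responses_preflight(previous_response_id: Option<&str>, store: Option<bool>) -> Preflight {
    // Present and non-null previous_response_id: stateful surface deferred.
    if previous_response_id.is_some() {
        return Preflight::RejectPreviousResponseId;
    }
    match store {
        Some(false) => Preflight::Ok,
        // store: true or absent; most clients send true as an unexamined
        // default, so downgrade permissively instead of rejecting.
        _ => Preflight::StoreDowngrade,
    }
}
```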
Synthetic Responses id
======================
Format: resp_<uuid-v7>. v7 is time-ordered, so a `resp_*` grep across
access logs comes out chronologically without needing a separate sort
key. Built manually from host_time_now + a per-instance counter — the
wasm32-unknown-unknown target has no system RNG, but the v7 spec only
requires monotonicity within a node, which the counter provides.
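A hedged sketch of the construction, assuming the RFC 9562 v7 field layout (48-bit unix-ms timestamp, 4-bit version, 2-bit variant) with the per-instance counter filling the bits that would normally be random; the real helper's exact bit placement may differ:

```rust
/// Builds a resp_<uuid-v7>-shaped id from a millisecond timestamp and a
/// per-instance counter. Counter bits occupy the random fields, which
/// satisfies the spec's intra-node monotonicity requirement.
fn make_uuid_v7(unix_ms: u64, counter: u64) -> String {
    let time_hi = (unix_ms >> 16) & 0xFFFF_FFFF; // top 32 of 48 timestamp bits
    let time_lo = unix_ms & 0xFFFF;              // bottom 16 timestamp bits
    let ver_field = 0x7000u64 | ((counter >> 50) & 0x0FFF); // version 7 + high counter bits
    let rest = 0x8000_0000_0000_0000u64          // variant bits 10
        | (counter & 0x3FFF_FFFF_FFFF_FFFF);     // low 62 counter bits
    format!(
        "resp_{:08x}-{:04x}-{:04x}-{:04x}-{:012x}",
        time_hi,
        time_lo,
        ver_field,
        (rest >> 48) & 0xFFFF,
        rest & 0xFFFF_FFFF_FFFF,
    )
}
```

Because the timestamp occupies the most significant fields and the counter fills the rest, ids from one instance sort lexicographically in generation order, which is what makes the `resp_*` log grep come out chronological.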
Provider refactor
=================
providers/anthropic.rs gains an anthropic_messages_call_raw helper that
sends a pre-built Messages body and returns the raw upstream Response.
The existing anthropic_call (used by Chat Completions) now layers on
top of it; the new responses module uses it directly. No protocol-side
duplication of auth headers / version pinning.
Tests
=====
- Plugin unit tests: 67 → 90 (+23). 8 preflight tests, 7 input-item
translation cases, 5 output-item translation cases, 4 Warning header
cases, 1 error response shape.
- Integration tests (ai_proxy.rs): 9 → 15 (+6 covering OpenAI
passthrough, previous_response_id 400, Ollama 400, full Anthropic
translation roundtrip, store Warning header, reasoning Warning
header).
Schema description
==================
config-schema.json now documents the path-dispatch matrix and the
forward-compat scope of /v1/responses (stateless, what's rejected,
synthetic id format). The schema for the route table itself is
unchanged — Responses is a path concern, not a config one.
R1 (load-bearing) — OpenAI passthrough was leaking the upstream response
`id` to the client, violating ADR-0030 §2's stateless-uniform contract.
A client could read OpenAI's real id and send it back as
`previous_response_id`, which the gateway rejects with 400 — the bug is
the inconsistency between leaking the real id and rejecting its reuse.
Fix: in the non-streaming passthrough path, parse the 2xx upstream body,
replace `id` with a synthetic `resp_<uuid-v7>`, re-serialize. Streaming
SSE rewrite is harder (the id is buried in `response.created` event
payloads); marked as a known gap inline, inheriting ADR-0030 §2's
existing SSE deferral.
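The R1 fix can be sketched like this, using a naive string splice in place of the real parse/re-serialize round-trip (`rewrite_response_id_if_2xx` here is a simplified stand-in):

```rust
/// On a 2xx passthrough, replace the upstream `id` value with the
/// synthetic one; non-2xx and unparseable bodies pass through untouched.
fn rewrite_response_id_if_2xx(status: u16, body: &str, synthetic_id: &str) -> String {
    if !(200..300).contains(&status) {
        return body.to_string(); // 4xx/5xx: pass the provider error through
    }
    // Naive splice: find "id":"..." and swap in the synthetic id.
    let Some(key_pos) = body.find("\"id\":\"") else {
        return body.to_string(); // no id / unparseable: pass through
    };
    let val_start = key_pos + "\"id\":\"".len();
    let Some(val_len) = body[val_start..].find('"') else {
        return body.to_string();
    };
    format!("{}{}{}", &body[..val_start], synthetic_id, &body[val_start + val_len..])
}
```

The contract that matters is the three-way split the unit tests pin down: 2xx rewritten, 4xx passed through, unparseable passed through.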
R2 — `AnthropicToResponses::translate` had two unused parameters
(`store_downgrade`, `dropped_reasoning_count`) carried forward "for
symmetry" with a `let _ = (...);` discard. Both signals flow through
headers/metrics, never the body. Drop the parameters; less to read,
less to wonder about.
R3 — The Anthropic path was parsing the request body 4 times: preflight,
extract_client_model, the handler's full-body translation, and again
inside `ResponsesPreflight::from_body` with `.expect("preflight already
ran in dispatch")` to recover `store_downgrade`. Stash the flag on
context (`ai.responses.store_downgrade`) in `dispatch_responses`; the
handler reads it back via `host::context_get`. Eliminates the .expect
smell and one parse. Mirrors how `ai.target` already flows from cel
into ai-proxy.
R4 — `make_uuid_v7` doc-comment was over-promising "Two ids generated
in the same millisecond differ in their counter portion." Tightened to
"by the same plugin instance" + made the per-instance reset and 2^64
wrap explicit. Both inconsequential at realistic throughput, but worth
saying out loud.
R5 — `responses_not_supported_for_provider_response` took a `&str` and
the call site passed the literal `"ollama"`. Type as `Provider`, call
`.name()` for the body string. Compile-time guarantee against typos.
R6 — `build_text_or_array_blocks` silently dropped unknown content-part
types while the top-level item handler logs a warning. Made consistent
— part-type drops also log via `host::log_warn`, so a future OpenAI
part-type that Barbacane doesn't know yet stays diagnosable.
R7 — The `flush` closure inside `ResponsesToAnthropic::translate`
captured three `&mut`s and was invoked 4 times. Promoted to a private
`flush_message(role, blocks, messages)` helper. No semantic change.
Reads better; new tests can exercise it directly.
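A sketch of the promoted helper, with simplified stand-in types in place of the real message and block structs:

```rust
struct Message {
    role: String,
    blocks: Vec<String>,
}

/// Drains the buffered content blocks into a finished message. No-op on
/// an empty buffer or a missing role, mirroring the two direct tests.
fn flush_message(
    role: &mut Option<String>,
    blocks: &mut Vec<String>,
    messages: &mut Vec<Message>,
) {
    if blocks.is_empty() {
        return; // nothing buffered: no-op
    }
    let Some(r) = role.take() else {
        return; // no role to attribute the blocks to: no-op
    };
    messages.push(Message {
        role: r,
        blocks: std::mem::take(blocks),
    });
}
```

The translation loop then calls this once per role switch and once at the end, which is what coalesces consecutive same-role items into one message.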
R8 — Added unit test for interleaved [user, assistant, user] role
sequences in `input[]` to lock in `flush_message`'s ordering — the
most plausible regression site for a future change to the translation
loop. Plus a coalescing test (consecutive same-role items land in one
message) and two flush_message direct tests (no-op on empty buffer
and on missing role).
Tests: plugin 90 → 97 (+7: role-switch, coalescing, two flush_message
direct, three rewrite_response_id_if_2xx covering 2xx, 4xx pass-through,
and unparseable body pass-through). Integration 15 → 15 (the existing
test renamed to `..._rewrites_id` and now asserts the id is synthetic +
not equal to the upstream's real id, which would have caught the
original bug).
Summary
Adds the OpenAI Responses API as the second protocol on
ai-proxy, completing ADR-0030 §1's path-aware dispatcher promise.
Stateless only; the stateful surface (previous_response_id,
GET /v1/responses/{id}) is deferred per ADR-0030 §2. This is PR-4 in the
implementation plan, stacked on PR #77 (the coverage PR, which includes
the drive-by path-dispatch fix). Will rebase down the chain as each
lower PR merges.
Per-provider matrix
- OpenAI: passthrough at /v1/responses upstream. Streaming via
  host_http_stream.
- Anthropic: input[] ↔ Messages content blocks; translate the response
  back. Streaming buffered to a single terminal event (true SSE
  translation deferred).
- Ollama: 400 responses_not_supported_for_provider. Ollama's
  OpenAI-compat surface is Chat Completions only as of 2026-04.
Translation map (Anthropic path)
- input_text (string or [{type: input_text, text}]) → text
- input_image (image_url or base64) → image
- function_call (assistant tool call) → tool_use
- function_call_output (client tool result) → tool_result
- reasoning → dropped; Warning: 299 attached
- instructions → hoisted to the top-level system field
Output: Anthropic text blocks → output_text items; tool_use →
function_call; usage.input_tokens/output_tokens map directly. Synthetic
id = "resp_" + uuid-v7 — time-ordered so log greps come out
chronologically.
Stateless guards (preflight, runs before target resolution)
- previous_response_id: <non-null> → 400
  previous_response_id_not_supported
- store: true (or absent — OpenAI server-side default) → Warning: 299 -
  "store ignored; gateway is stateless" +
  barbacane_plugin_ai_proxy_responses_store_downgrades_total counter
- store: false → no warning, no counter
- model missing → 400 model_required (existing PR-2 path)
Synthetic id design
resp_<uuid-v7>. Built manually from host_time_now() + a per-instance
counter — the wasm32-unknown-unknown target has no system RNG, but the
v7 spec only requires monotonicity within a node, which the counter
provides. The id is opaque per the OpenAI Responses contract; clients
reading it as a tracking handle work unchanged.
Provider refactor
providers/anthropic.rs now exposes
anthropic_messages_call_raw(target, body) -> Result<Response, String>,
which sends a pre-built Messages body and returns the raw upstream
Response. The existing anthropic_call (Chat Completions) layers on top;
the new Responses adapter calls it directly. No protocol-side
duplication of auth headers / version pinning.
Test plan
- ai_proxy.rs: 15 integration tests (was 9; +6 covering OpenAI
  passthrough, previous_response_id 400, Ollama 400, full Anthropic
  translation roundtrip, store Warning header, reasoning Warning header)
- cargo build --target wasm32-unknown-unknown --release: clean (uuid
  crate without the js feature; v7 built manually)
- cargo test --workspace --exclude barbacane-test: all green
- cargo clippy --lib --bins: zero warnings
- cargo fmt --all -- --check: clean
- bash docs/rulesets/tests/run-tests.sh: 15/15 pass
Out of scope (deferred per ADR-0030 §2)
The stateful surface (previous_response_id, GET /v1/responses/{id},
cancel). Requires a session-scoped storage capability that doesn't exist
in the WASM runtime. The 400 rejection is the forward-compat hook.