feat(ai-proxy): /v1/responses translation (ADR-0030 §2) #78
Closed
ndreno wants to merge 6 commits into test/ai-proxy-routes-coverage from
…nge)
ADR-0030 §1 calls for the dispatcher to become protocol-aware, with
path-based dispatch picking translation between Chat Completions and
the Responses API on the same target pool. This PR is the mechanical
move that preps the source layout — zero behavior change for the only
existing path (/v1/chat/completions).
Source layout:
- lib.rs orchestration (target resolution, fallback chain,
metrics, context propagation, shared helpers,
host stubs); now declares mod protocols/providers.
- protocols/
chat_completion.rs OpenAI Chat Completions adapter — handle(),
translate_to_anthropic, translate_from_anthropic,
AnthropicRequest.
- providers/
openai.rs OpenAI-compatible transport (openai_call,
openai_stream, maybe_inject_max_tokens, openai_url,
openai_headers).
anthropic.rs Anthropic Messages transport + ANTHROPIC_API_VERSION
constant (was inlined at lib.rs:359).
ollama.rs Empty slot — Ollama shares OpenAI transport today.
ADR-0030 §2 will use this file to reject Responses
requests against an Ollama target.
Dispatch is now path-aware via a function-pointer indirection
(`ProtocolHandler` type alias). Today only /v1/chat/completions is
routed; ADR-0030 PR-4 adds /v1/responses, PR-5 adds /v1/models. Unknown
paths return 404 with `urn:barbacane:error:not-found` (one new test
covers this).
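The function-pointer indirection can be sketched roughly like this. All types and handler names here (`Response`, `handle_chat_completion`, `not_found_response`) are simplified stand-ins for illustration, not the plugin's real signatures:

```rust
// Illustrative stand-in for the plugin's response type.
struct Response {
    status: u16,
    body: String,
}

// The indirection: a plain function pointer keyed off the request path.
type ProtocolHandler = fn(body: &str) -> Response;

fn handle_chat_completion(_body: &str) -> Response {
    Response { status: 200, body: "ok".into() }
}

fn not_found_response() -> Response {
    // problem+json with urn:barbacane:error:not-found in the real plugin
    Response { status: 404, body: "urn:barbacane:error:not-found".into() }
}

fn route(path: &str) -> Option<ProtocolHandler> {
    match path {
        // Only Chat Completions is routed today; ADR-0030 PR-4 adds
        // /v1/responses and PR-5 adds /v1/models to this match.
        "/v1/chat/completions" => Some(handle_chat_completion),
        _ => None,
    }
}

fn dispatch(path: &str, body: &str) -> Response {
    match route(path) {
        Some(handler) => handler(body),
        None => not_found_response(),
    }
}
```

Adding a protocol then means adding one handler function and one match arm; the orchestration loop never learns protocol details.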
All 43 existing unit tests pass unchanged via super::module::* paths;
test count goes 43 → 44 with the new path-dispatch test. WASM build
clean. Workspace clippy + fmt clean.
Pre-push checklist:
- cargo build --workspace: clean
- cargo build --target wasm32-unknown-unknown -r: clean
- cargo test --workspace --exclude barbacane-test: all green
- cargo fmt --all -- --check: clean
- cargo clippy --lib --bins: clean
- cargo deny check advisories: FAILS on RUSTSEC-2026-0114 (wasmtime);
  pre-existing on main, to be addressed in a separate PR.
…(ADR-0030 §0)
BREAKING CHANGE for ADR-0024 deployments. The model identifier is no
longer a gateway-side config knob — the client's `model` field on the
request body is passed to the upstream provider verbatim. The gateway
declares providers (where to go, with what credentials), never an
authoritative model list.
Removed:
- `model: String` from `TargetConfig` (was required)
- `model: Option<String>` from the flat `AiProxy` top-level config
- The `target.model` fallback in `translate_to_anthropic`
- The `cfg.model` assertion in `config_flat_minimal`
Added:
- `extract_client_model(body)` — parses the `model` field from an
OpenAI-format request body. Returns `None` for absent body, malformed
JSON, missing field, non-string value, or empty string.
- `model_required_response()` — `400 problem+json` with
`urn:barbacane:error:model_required` and `code: "model_required"`.
Returned by `dispatch()` when the client omits `model`. Matches both
upstream provider contracts (OpenAI Chat Completions and Responses
both require `model`) and ADR-0030 §0's caller-owned-model principle.
- `dispatch_chat_completion()` — path-specific helper that extracts the
client model upfront, short-circuits 400 if missing, otherwise calls
the shared orchestration loop with the client model plumbed through.
- `#[serde(deny_unknown_fields)]` on `AiProxy` and `TargetConfig` —
closes the runtime safety net, so leftover nested `model:` (which
vacuum's auto-generated validator does not recurse into yet) fails
at WASM instance load with a clear "unknown field model" error.
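The `extract_client_model` contract can be illustrated with a minimal stand-in. The real helper parses the body as JSON; this sketch hand-scans a flat `"model"` field purely to show the None-on-anything-odd behavior (absent body, missing field, non-string value, empty string):

```rust
/// Simplified stand-in for `extract_client_model`: returns Some(model)
/// only for a present, string-typed, non-empty `model` field; None
/// otherwise. The real plugin does this over a parsed JSON value.
fn extract_client_model(body: &str) -> Option<String> {
    let key = "\"model\"";
    let start = body.find(key)? + key.len();           // missing field -> None
    let rest = body[start..].trim_start();
    let rest = rest.strip_prefix(':')?.trim_start();
    let rest = rest.strip_prefix('"')?;                // non-string value -> None
    let end = rest.find('"')?;
    let model = &rest[..end];
    if model.is_empty() {
        None                                           // empty string -> None
    } else {
        Some(model.to_string())
    }
}
```

`dispatch()` then maps a `None` to the 400 `model_required` problem+json before any target resolution happens.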
Plumbing:
- `ProtocolHandler` signature now takes `client_model: &str` as a 4th
argument before `streaming`. The orchestration loop extracts the
model once and passes it through to the handler.
- `propagate_context(target, client_model, resp)` now writes `ai.model`
from the client value, not `target.model`. Downstream middlewares
(`ai-cost-tracker`, `ai-token-limit`) read the same identifier the
client requested.
- `anthropic_call` and `translate_to_anthropic` take `client_model: &str`
explicitly; no fallback.
Migration:
- vacuum surfaces leftover `model:` at lint time on the flat config
via the auto-generated `additionalProperties: false` check; the
message names the field. Runtime `deny_unknown_fields` catches
leftover `model:` on `targets.<>` and `fallback[]`.
- All shipped fixtures updated: `tests/fixtures/ai-{proxy,gateway}.yaml`,
`crates/barbacane-test/tests/ai_{proxy,gateway}.rs`,
`docs/rulesets/tests/valid-complete.yaml`.
Tests: 50 unit (was 44; +6 covering model_required, extract_client_model
helper, legacy-model-rejection at top-level and nested target).
14/14 ruleset tests pass.
Considered and rejected: a dedicated `barbacane-ai-proxy-no-model`
vacuum rule. Migration-specific lint rules accumulate forever; the
auto-generated message + CHANGELOG entry is sufficient. The genuine
gap (vacuum doesn't recurse into nested objects) is addressed by a
separate generator improvement that benefits every plugin, not by a
bespoke rule for this one migration.
Adds glob-based dynamic model routing as a third resolution layer between
ai.target context lookup and default_target/flat fallthrough. Each routes
entry binds a glob pattern (e.g. claude-*, gpt-4o*, o[1-4]*) to a
provider + credentials. First match wins. Catalog policy lives on the
target via optional allow/deny glob lists. Critically, allow/deny applies
on every resolution path that produces a target carrying those rules —
including ai.target-driven dispatch — so a cel misconfig that sets
ai.target to a target whose deny covers the requested model still gets
403. Catalog policy is a property of the target, not the resolution path.
Resolution precedence (4-step ladder, ADR-0030 §3):
1. ai.target context key (set by upstream cel)
2. routes glob match against client model
3. default_target → targets[name]
4. flat provider config
Failure modes:
- 400 model_required (PR-2): client omitted model
- 400 no_route (new): routes configured but no entry matched and no
  fallthrough — the operator's catalog doesn't cover the requested model
- 403 model_not_permitted (new): allow/deny rejected the model. Does NOT
  fall through to fallback or to another route — that would silently
  escalate a denied model to a different provider. Escape hatch: tighten
  the route's pattern so non-matching models miss the route entirely and
  reach the catch-all.
- 500 misconfiguration: nothing configured, or a route's glob fails to
  compile (surfaced from ensure_compiled_routes at first dispatch).
Implementation:
- New Route struct (pattern + provider + credentials + allow/deny).
- New CompiledRoute caching the precompiled GlobMatcher per route.
- ensure_compiled_routes runs once per plugin instance, lazily on first
  dispatch — same pattern as cel's compiled CEL program.
- New ResolveOutcome enum distinguishing Resolved / NoRouteMatch /
  NotConfigured so dispatch can map each to the right HTTP shape.
- evaluate_catalog_policy compiles per-target allow/deny on the fly
  (lists are typically <5 entries; cheap). Fails closed on a glob
  compile error rather than silently bypassing the policy.
- Glob library: `globset` 0.4 with case-sensitive, anchored matching;
  default-features = false to keep the WASM binary small.
- Schema: TargetConfig gains optional allow/deny; new RouteEntry; new
  GlobPattern referenced from both. The pattern allowlist regex
  ^[A-Za-z0-9_*?\[\]\-:.+/]+$ pins glob characters at lint time so
  vacuum surfaces nonsense like regex syntax before runtime.
- New resolution_total counter with a
  resolution=context|routes|default|flat label and a debug log
  (ai-proxy: resolved provider=X via=Y) for the "why did my request go
  there?" debugging case.
Tests: 67/67 (was 50; +17 covering routes first-match-wins, catch-all
fallthrough, no_route when no fallthrough, default/flat fallthrough,
ai.target overrides routes, invalid glob compile error, allow pass/
reject, deny pass/match, allow+deny ordering, end-to-end 400 no_route,
end-to-end 403 model_not_permitted, no-fallthrough on deny, the
ai.target+deny subtlety, and resolution_total label emission). 14/14
ruleset tests pass.
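The resolution ladder can be sketched as follows. `Route`, `resolve`, and the one-wildcard `glob_match` are illustrative stand-ins (the real code compiles `globset` matchers), and steps 3 and 4 are collapsed into a single fallthrough parameter for brevity:

```rust
#[derive(Debug, PartialEq)]
enum ResolveOutcome {
    Resolved(String), // target/provider name
    NoRouteMatch,     // routes configured, nothing matched, no fallthrough
    NotConfigured,    // nothing to resolve against
}

struct Route {
    pattern: String,
    provider: String,
}

// Toy matcher: supports only a trailing `*`; the real plugin uses globset.
fn glob_match(pattern: &str, model: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => model.starts_with(prefix),
        None => pattern == model,
    }
}

fn resolve(
    context_target: Option<&str>, // 1. ai.target set by upstream cel
    routes: &[Route],             // 2. glob routes, first match wins
    default_target: Option<&str>, // 3./4. fallthrough (collapsed here)
    model: &str,
) -> ResolveOutcome {
    if let Some(t) = context_target {
        return ResolveOutcome::Resolved(t.to_string());
    }
    for r in routes {
        if glob_match(&r.pattern, model) {
            return ResolveOutcome::Resolved(r.provider.clone());
        }
    }
    match default_target {
        Some(t) => ResolveOutcome::Resolved(t.to_string()),
        None if !routes.is_empty() => ResolveOutcome::NoRouteMatch, // -> 400 no_route
        None => ResolveOutcome::NotConfigured,                      // -> 500 misconfiguration
    }
}
```

Note that catalog policy (allow/deny) is not shown here: it runs after resolution on whatever target came out, which is exactly why an ai.target-driven dispatch still hits a deny.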
…e-by dispatch fix
Adds integration tests, fixture coverage, and a vacuum migration fixture
for the ADR-0030 implementation stack (PR-1 → PR-3). Surfaced and fixed
one bug along the way.
Drive-by fix
============
The path-based dispatch added in PR-1 (#73) was too strict: it returned
404 for any req.path != /v1/chat/completions, breaking every existing
test fixture and any operator-defined operation path. The dispatcher
shouldn't constrain the operator's choice of path when only one protocol
is on offer. Now defaults to chat_completion::handle for any path; PR-4
will narrowly add /v1/responses when there's a real second protocol to
differentiate. Removes the dead 404 arm from error_response(). The unit
test from PR-1 that asserted the rejected behavior is replaced with one
that asserts the dispatcher accepts custom paths today.
Coverage added
==============
Integration tests (crates/barbacane-test/tests/ai_proxy.rs, +5 tests):
- test_ai_proxy_routes_first_match_wins — wiremock with three route
  prefixes proves claude-* / gpt-* / catch-all dispatch to the right
  upstream URL via the actual data plane pipeline.
- test_ai_proxy_400_when_body_omits_model — proves the model_required
  short-circuit fires end-to-end.
- test_ai_proxy_400_no_route_when_model_does_not_match — proves the
  no_route response shape ships through the data plane.
- test_ai_proxy_403_model_not_permitted_does_not_reach_upstream —
  wiremock with .expect(0) proves the upstream is never called when
  catalog policy denies; would catch a regression that leaks through to
  the provider.
- test_ai_proxy_403_does_not_fall_through_to_next_route — proves the
  no-fallthrough rule from ADR-0030 §3 holds in the real pipeline.
End-to-end tests (crates/barbacane-test/tests/ai_gateway.rs, +2 tests):
- cel_driven_target_deny_fires_403_not_silent_pass — the load-bearing
  ADR-0030 §3 subtlety in the actual pipeline. cel writes
  ai.target=anthropic-tier based on a header; the named target carries
  deny:["claude-opus-*"]; a request with claude-opus-4-6 hits 403, not
  silent pass. Mock has .expect(0) so the test proves the upstream is
  never called.
- cel_driven_target_deny_passes_when_model_does_not_match_deny —
  positive control for the same spec; proves a non-denied model still
  reaches upstream.
Compilation smoke (tests/fixtures/ai-proxy.yaml, +2 operations):
- /ai/routed/chat/completions with a full routes table including allow,
  deny, and a catch-all.
- /ai/restricted/chat/completions with default_target + catalog deny on
  a named target.
Vacuum migration UX (docs/rulesets/tests/invalid-ai-proxy-leftover-model.yaml
+ run-tests.sh):
- Regression fixture proving the auto-generated dispatch validator
  surfaces "Unknown config field 'model' for dispatcher 'ai-proxy'"
  with the full list of allowed fields when an operator forgets to
  delete `model:` after upgrading from ADR-0024. The lint message is
  the migration UX promised in PR-2's CHANGELOG.
Known gap (out of scope for this PR): nested-glob lint coverage for
routes[].pattern and allow/deny entries. The auto-generated validator
doesn't recurse into nested objects, so vacuum can't catch a
regex-syntax pattern at lint time today. Runtime catches it via the
globset compile error from ensure_compiled_routes (covered by unit
tests). This is the generator-recursion improvement we discussed
earlier — a separate PR that benefits every plugin's nested schema, not
a migration-specific lint.
Test counts: plugin 793 → 801 (+8: PR-1 unit test relaxed +1 test,
PR-3 +17 already in stack, this PR +8 from the drive-by fix and new
helpers). Integration 275 → 282 (+7). 15/15 ruleset tests pass.
Adds the OpenAI Responses API surface as the second protocol on
ai-proxy. Path-based dispatch routes POST /v1/responses through the new
Responses adapter; everything else still routes to Chat Completions.
Per-provider behavior
=====================
- **OpenAI**: passthrough at /v1/responses upstream. Streaming uses
host_http_stream like Chat Completions.
- **Anthropic**: translate input[] items ↔ Messages content blocks:
input_text / input_image → text / image content blocks
function_call → tool_use block
function_call_output → tool_result block
reasoning → dropped (Anthropic doesn't accept
client-supplied reasoning input);
counted + Warning: 299 surfaced.
`instructions` is hoisted to Anthropic's top-level `system` field.
Streaming is buffered to a single terminal event (mirrors ADR-0024
Chat Completions until true SSE translation lands).
- **Ollama**: 400 problem+json with code:
responses_not_supported_for_provider. Ollama's OpenAI-compat surface
is Chat Completions only as of 2026-04.
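The Anthropic item mapping above reduces to a small lookup. This is a toy string-to-string stand-in for the real serde-based translation, just to pin down the table:

```rust
/// Maps a Responses input[] item type to the Anthropic Messages block
/// type it becomes; None means the item is dropped (with a warning in
/// the real plugin). Strings stand in for the real serde types.
fn map_input_item(item_type: &str) -> Option<&'static str> {
    match item_type {
        "input_text" => Some("text"),
        "input_image" => Some("image"),
        "function_call" => Some("tool_use"),
        "function_call_output" => Some("tool_result"),
        // Anthropic doesn't accept client-supplied reasoning input:
        // dropped, counted, and surfaced via Warning: 299.
        "reasoning" => None,
        // Unknown item types are also dropped (and logged).
        _ => None,
    }
}
```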
Spec-level guards (preflight, before target resolution)
=======================================================
- previous_response_id present (and not null) → 400 problem+json with
code: previous_response_id_not_supported. The stateful Responses API
requires session-scoped storage that ADR-0030 §2 explicitly defers;
this is the forward-compat hook.
- store: true | absent → permissive (process statelessly, attach
Warning: 299 — "store ignored; gateway is stateless"). Most clients
send store: true as an unexamined default; rejecting it would break
them gratuitously. Operators see the downgrade via the
barbacane_plugin_ai_proxy_responses_store_downgrades_total counter.
- store: false → no warning, no counter.
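The preflight decision table can be sketched over an already-parsed body. The `Preflight` enum and function name are illustrative, not the plugin's real types:

```rust
#[derive(Debug, PartialEq)]
enum Preflight {
    RejectPreviousResponseId, // -> 400 previous_response_id_not_supported
    StoreDowngrade,           // -> Warning: 299 + downgrade counter
    Ok,                       // store: false -> silent
}

fn responses_preflight(previous_response_id: Option<&str>, store: Option<bool>) -> Preflight {
    // Present and non-null previous_response_id: stateful surface deferred.
    if previous_response_id.is_some() {
        return Preflight::RejectPreviousResponseId;
    }
    match store {
        Some(false) => Preflight::Ok,
        // store: true or absent; most clients send true as an unexamined
        // default, so downgrade permissively instead of rejecting.
        _ => Preflight::StoreDowngrade,
    }
}
```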
Synthetic Responses id
======================
Format: resp_<uuid-v7>. v7 is time-ordered, so a `resp_*` grep across
access logs comes out chronologically without needing a separate sort
key. Built manually from host_time_now + a per-instance counter — the
wasm32-unknown-unknown target has no system RNG, but the v7 spec only
requires monotonicity within a node, which the counter provides.
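A hedged sketch of the construction, assuming the RFC 9562 v7 field layout (48-bit unix-ms timestamp, 4-bit version, 2-bit variant) with the per-instance counter filling the bits that would normally be random; the real helper's exact bit placement may differ:

```rust
/// Builds a resp_<uuid-v7>-shaped id from a millisecond timestamp and a
/// per-instance counter. Counter bits occupy the random fields, which
/// satisfies the spec's intra-node monotonicity requirement.
fn make_uuid_v7(unix_ms: u64, counter: u64) -> String {
    let time_hi = (unix_ms >> 16) & 0xFFFF_FFFF; // top 32 of 48 timestamp bits
    let time_lo = unix_ms & 0xFFFF;              // bottom 16 timestamp bits
    let ver_field = 0x7000u64 | ((counter >> 50) & 0x0FFF); // version 7 + high counter bits
    let rest = 0x8000_0000_0000_0000u64          // variant bits 10
        | (counter & 0x3FFF_FFFF_FFFF_FFFF);     // low 62 counter bits
    format!(
        "resp_{:08x}-{:04x}-{:04x}-{:04x}-{:012x}",
        time_hi,
        time_lo,
        ver_field,
        (rest >> 48) & 0xFFFF,
        rest & 0xFFFF_FFFF_FFFF,
    )
}
```

Because the timestamp occupies the most significant fields and the counter fills the rest, ids from one instance sort lexicographically in generation order, which is what makes the `resp_*` log grep come out chronological.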
Provider refactor
=================
providers/anthropic.rs gains an anthropic_messages_call_raw helper that
sends a pre-built Messages body and returns the raw upstream Response.
The existing anthropic_call (used by Chat Completions) now layers on
top of it; the new responses module uses it directly. No protocol-side
duplication of auth headers / version pinning.
Tests
=====
- Plugin unit tests: 67 → 90 (+23). 8 preflight tests, 7 input-item
translation cases, 5 output-item translation cases, 4 Warning header
cases, 1 error response shape.
- Integration tests (ai_proxy.rs): 9 → 15 (+6 covering OpenAI
passthrough, previous_response_id 400, Ollama 400, full Anthropic
translation roundtrip, store Warning header, reasoning Warning
header).
Schema description
==================
config-schema.json now documents the path-dispatch matrix and the
forward-compat scope of /v1/responses (stateless, what's rejected,
synthetic id format). The schema for the route table itself is
unchanged — Responses is a path concern, not a config one.
R1 (load-bearing) — OpenAI passthrough was leaking the upstream response
`id` to the client, violating ADR-0030 §2's stateless-uniform contract.
A client could read OpenAI's real id and send it back as
`previous_response_id`, which the gateway rejects with 400 — the bug is
the inconsistency between leaking the real id and rejecting its reuse.
Fix: in the non-streaming passthrough path, parse the 2xx upstream body,
replace `id` with a synthetic `resp_<uuid-v7>`, re-serialize. Streaming
SSE rewrite is harder (the id is buried in `response.created` event
payloads); marked as a known gap inline, inheriting ADR-0030 §2's
existing SSE deferral.
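The R1 fix can be sketched like this, using a naive string splice in place of the real parse/re-serialize round-trip (`rewrite_response_id_if_2xx` here is a simplified stand-in):

```rust
/// On a 2xx passthrough, replace the upstream `id` value with the
/// synthetic one; non-2xx and unparseable bodies pass through untouched.
fn rewrite_response_id_if_2xx(status: u16, body: &str, synthetic_id: &str) -> String {
    if !(200..300).contains(&status) {
        return body.to_string(); // 4xx/5xx: pass the provider error through
    }
    // Naive splice: find "id":"..." and swap in the synthetic id.
    let Some(key_pos) = body.find("\"id\":\"") else {
        return body.to_string(); // no id / unparseable: pass through
    };
    let val_start = key_pos + "\"id\":\"".len();
    let Some(val_len) = body[val_start..].find('"') else {
        return body.to_string();
    };
    format!("{}{}{}", &body[..val_start], synthetic_id, &body[val_start + val_len..])
}
```

The contract that matters is the three-way split the unit tests pin down: 2xx rewritten, 4xx passed through, unparseable passed through.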
R2 — `AnthropicToResponses::translate` had two unused parameters
(`store_downgrade`, `dropped_reasoning_count`) carried forward "for
symmetry" with a `let _ = (...);` discard. Both signals flow through
headers/metrics, never the body. Drop the parameters; less to read,
less to wonder about.
R3 — The Anthropic path was parsing the request body 4 times: preflight,
extract_client_model, the handler's full-body translation, and again
inside `ResponsesPreflight::from_body` with `.expect("preflight already
ran in dispatch")` to recover `store_downgrade`. Stash the flag on
context (`ai.responses.store_downgrade`) in `dispatch_responses`; the
handler reads it back via `host::context_get`. Eliminates the .expect
smell and one parse. Mirrors how `ai.target` already flows from cel
into ai-proxy.
R4 — `make_uuid_v7` doc-comment was over-promising "Two ids generated
in the same millisecond differ in their counter portion." Tightened to
"by the same plugin instance" + made the per-instance reset and 2^64
wrap explicit. Both inconsequential at realistic throughput, but worth
saying out loud.
R5 — `responses_not_supported_for_provider_response` took a `&str` and
the call site passed the literal `"ollama"`. Type as `Provider`, call
`.name()` for the body string. Compile-time guarantee against typos.
R6 — `build_text_or_array_blocks` silently dropped unknown content-part
types while the top-level item handler logs a warning. Made consistent
— part-type drops also log via `host::log_warn`, so a future OpenAI
part-type that Barbacane doesn't know yet stays diagnosable.
R7 — The `flush` closure inside `ResponsesToAnthropic::translate`
captured three `&mut`s and was invoked 4 times. Promoted to a private
`flush_message(role, blocks, messages)` helper. No semantic change.
Reads better; new tests can exercise it directly.
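A sketch of the promoted helper, with simplified stand-in types in place of the real message and block structs:

```rust
struct Message {
    role: String,
    blocks: Vec<String>,
}

/// Drains the buffered content blocks into a finished message. No-op on
/// an empty buffer or a missing role, mirroring the two direct tests.
fn flush_message(
    role: &mut Option<String>,
    blocks: &mut Vec<String>,
    messages: &mut Vec<Message>,
) {
    if blocks.is_empty() {
        return; // nothing buffered: no-op
    }
    let Some(r) = role.take() else {
        return; // no role to attribute the blocks to: no-op
    };
    messages.push(Message {
        role: r,
        blocks: std::mem::take(blocks),
    });
}
```

The translation loop then calls this once per role switch and once at the end, which is what coalesces consecutive same-role items into one message.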
R8 — Added unit test for interleaved [user, assistant, user] role
sequences in `input[]` to lock in `flush_message`'s ordering — the
most plausible regression site for a future change to the translation
loop. Plus a coalescing test (consecutive same-role items land in one
message) and two flush_message direct tests (no-op on empty buffer
and on missing role).
Tests: plugin 90 → 97 (+7: role-switch, coalescing, two flush_message
direct, three rewrite_response_id_if_2xx covering 2xx, 4xx pass-through,
and unparseable body pass-through). Integration 15 → 15 (the existing
test renamed to `..._rewrites_id` and now asserts the id is synthetic +
not equal to the upstream's real id, which would have caught the
original bug).
Summary
Adds the OpenAI Responses API as the second protocol on
ai-proxy, completing ADR-0030 §1's path-aware dispatcher promise.
Stateless only; the stateful surface (previous_response_id,
GET /v1/responses/{id}) is deferred per ADR-0030 §2. This is PR-4 in the
implementation plan, stacked on PR #77 (the coverage PR, which includes
the drive-by path-dispatch fix). Will rebase down the chain as each
lower PR merges.
Per-provider matrix
- OpenAI: passthrough at /v1/responses upstream. Streaming via
  host_http_stream.
- Anthropic: input[] ↔ Messages content blocks; translate the response
  back. Streaming buffered to a single terminal event (true SSE
  translation deferred).
- Ollama: 400 responses_not_supported_for_provider. Ollama's
  OpenAI-compat surface is Chat Completions only as of 2026-04.
Translation map (Anthropic path)
- input_text (string or [{type: input_text, text}]) → text
- input_image (image_url or base64) → image
- function_call (assistant tool call) → tool_use
- function_call_output (client tool result) → tool_result
- reasoning → dropped; Warning: 299 attached
- instructions → hoisted to the top-level system field
Output: Anthropic text blocks → output_text items; tool_use →
function_call; usage.input_tokens/output_tokens map directly. Synthetic
id = "resp_" + uuid-v7 — time-ordered so log greps come out
chronologically.
Stateless guards (preflight, runs before target resolution)
- previous_response_id: <non-null> → 400
  previous_response_id_not_supported
- store: true (or absent — OpenAI server-side default) → Warning: 299 -
  "store ignored; gateway is stateless" +
  barbacane_plugin_ai_proxy_responses_store_downgrades_total counter
- store: false → no warning, no counter
- model missing → 400 model_required (existing PR-2 path)
Synthetic id design
resp_<uuid-v7>. Built manually from host_time_now() + a per-instance
counter — the wasm32-unknown-unknown target has no system RNG, but the
v7 spec only requires monotonicity within a node, which the counter
provides. The id is opaque per the OpenAI Responses contract; clients
reading it as a tracking handle work unchanged.
Provider refactor
providers/anthropic.rs now exposes
anthropic_messages_call_raw(target, body) -> Result<Response, String>,
which sends a pre-built Messages body and returns the raw upstream
Response. The existing anthropic_call (Chat Completions) layers on top;
the new Responses adapter calls it directly. No protocol-side
duplication of auth headers / version pinning.
Test plan
- ai_proxy.rs: 15 integration tests (was 9; +6 covering OpenAI
  passthrough, previous_response_id 400, Ollama 400, full Anthropic
  translation roundtrip, store Warning header, reasoning Warning header)
- cargo build --target wasm32-unknown-unknown --release: clean (uuid
  crate without the js feature; v7 built manually)
- cargo test --workspace --exclude barbacane-test: all green
- cargo clippy --lib --bins: zero warnings
- cargo fmt --all -- --check: clean
- bash docs/rulesets/tests/run-tests.sh: 15/15 pass
Out of scope (deferred per ADR-0030 §2)
The stateful surface (previous_response_id, GET /v1/responses/{id},
cancel). Requires a session-scoped storage capability that doesn't exist
in the WASM runtime. The 400 rejection is the forward-compat hook.