
feat(ai-proxy): /v1/responses translation (ADR-0030 §2) #81

Merged
ndreno merged 2 commits into main from feat/ai-proxy-responses-api on May 5, 2026
Conversation

@ndreno (Contributor) commented May 5, 2026

Replacement for #78 (auto-closed when PR-3's coverage branch was deleted on merge). Same content plus the post-review remediation commit, rebased onto main.

Original PR: #78

ndreno added 2 commits May 5, 2026 08:42
Adds the OpenAI Responses API surface as the second protocol on
ai-proxy. Path-based dispatch routes POST /v1/responses through the new
Responses adapter; everything else still routes to Chat Completions.

Per-provider behavior
=====================

- **OpenAI**: passthrough at /v1/responses upstream. Streaming uses
  host_http_stream like Chat Completions.
- **Anthropic**: translate input[] items ↔ Messages content blocks (sketched after this list):
    input_text / input_image     → text / image content blocks
    function_call                → tool_use block
    function_call_output         → tool_result block
    reasoning                    → dropped (Anthropic doesn't accept
                                   client-supplied reasoning input);
                                   counted + Warning: 299 surfaced.
  `instructions` is hoisted to Anthropic's top-level `system` field.
  Streaming is buffered to a single terminal event (mirrors ADR-0024
  Chat Completions until true SSE translation lands).
- **Ollama**: 400 problem+json with code:
  responses_not_supported_for_provider. Ollama's OpenAI-compat surface
  is Chat Completions only as of 2026-04.
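
A minimal sketch of the Anthropic input-item mapping noted above, using serde_json values; field names such as call_id, arguments, and image_url are approximations for illustration, not the plugin's real item types.

```rust
use serde_json::{json, Value};

/// Translate one Responses `input[]` item into zero or more Anthropic
/// content blocks. `dropped_reasoning` counts items that cannot be
/// forwarded (surfaced later via a Warning: 299 header).
fn translate_input_item(item: &Value, dropped_reasoning: &mut u32) -> Vec<Value> {
    match item["type"].as_str() {
        Some("input_text") => vec![json!({ "type": "text", "text": item["text"] })],
        Some("input_image") => vec![json!({
            "type": "image",
            "source": { "type": "url", "url": item["image_url"] }
        })],
        Some("function_call") => vec![json!({
            "type": "tool_use",
            "id": item["call_id"],
            "name": item["name"],
            "input": item["arguments"]
        })],
        Some("function_call_output") => vec![json!({
            "type": "tool_result",
            "tool_use_id": item["call_id"],
            "content": item["output"]
        })],
        Some("reasoning") => {
            // Anthropic does not accept client-supplied reasoning input:
            // drop the item and count it so the handler can emit Warning: 299.
            *dropped_reasoning += 1;
            vec![]
        }
        _ => vec![], // unknown item types are logged and dropped by the caller
    }
}
```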

Spec-level guards (preflight, before target resolution)
=======================================================

- previous_response_id present (and not null) → 400 problem+json with
  code: previous_response_id_not_supported. The stateful Responses API
  requires session-scoped storage that ADR-0030 §2 explicitly defers;
  this is the forward-compat hook (see the preflight sketch after this list).
- store: true | absent → permissive (process statelessly, attach
  Warning: 299 — "store ignored; gateway is stateless"). Most clients
  send store: true as an unexamined default; rejecting it would break
  them gratuitously. Operators see the downgrade via the
  barbacane_plugin_ai_proxy_responses_store_downgrades_total counter.
- store: false → no warning, no counter.
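
A minimal sketch of the guards above, assuming serde_json and simplified return types; only the error code and the store semantics come from this PR, the rest is illustrative.

```rust
use serde_json::Value;

struct Preflight {
    store_downgrade: bool, // true → attach Warning: 299, bump the downgrade counter
}

enum PreflightError {
    PreviousResponseIdNotSupported, // rendered as 400 problem+json by the caller
}

fn preflight(body: &Value) -> Result<Preflight, PreflightError> {
    // previous_response_id present and not null → reject: the stateful
    // Responses API needs session storage that ADR-0030 §2 defers.
    if body.get("previous_response_id").is_some_and(|v| !v.is_null()) {
        return Err(PreflightError::PreviousResponseIdNotSupported);
    }
    // store: true or absent → process statelessly, record the downgrade;
    // store: false → nothing to warn about.
    let store_downgrade = body.get("store").and_then(Value::as_bool).unwrap_or(true);
    Ok(Preflight { store_downgrade })
}
```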

Synthetic Responses id
======================

Format: resp_<uuid-v7>. v7 is time-ordered, so a `resp_*` grep across
access logs comes out chronologically without needing a separate sort
key. Built manually from host_time_now + a per-instance counter — the
wasm32-unknown-unknown target has no system RNG, but the v7 spec only
requires monotonicity within a node, which the counter provides.
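
A sketch of that construction, assuming the host clock returns unix milliseconds. The layout is standard UUIDv7 (48-bit timestamp, version nibble, variant bits) with the per-instance counter standing in for the random bits; the real make_uuid_v7 may pack the counter differently.

```rust
fn make_uuid_v7(unix_ms: u64, counter: u64) -> String {
    let time_hi = (unix_ms >> 16) as u32;    // top 32 bits of the 48-bit timestamp
    let time_lo = (unix_ms & 0xFFFF) as u16; // bottom 16 bits
    // Version nibble 0x7, then 12 counter bits in what is normally rand_a.
    let ver_and_counter_hi = 0x7000u16 | ((counter >> 62) & 0x0FFF) as u16;
    // Variant bits 0b10, then the low 62 counter bits in place of rand_b,
    // so ids from one instance stay monotonic within a millisecond.
    let variant_and_counter_lo = 0x8000_0000_0000_0000u64 | (counter & 0x3FFF_FFFF_FFFF_FFFF);
    format!(
        "resp_{:08x}-{:04x}-{:04x}-{:04x}-{:012x}",
        time_hi,
        time_lo,
        ver_and_counter_hi,
        (variant_and_counter_lo >> 48) as u16,
        variant_and_counter_lo & 0xFFFF_FFFF_FFFF
    )
}
```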

Provider refactor
=================

providers/anthropic.rs gains an anthropic_messages_call_raw helper that
sends a pre-built Messages body and returns the raw upstream Response.
The existing anthropic_call (used by Chat Completions) now layers on
top of it; the new responses module uses it directly. No protocol-side
duplication of auth headers / version pinning.
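
Roughly the layering described above, with placeholder types so the sketch stands alone; the real signatures, error type, and host HTTP call in providers/anthropic.rs are assumptions, not the shipped code.

```rust
use serde_json::Value;

struct AnthropicConfig { api_key: String, base_url: String }
struct UpstreamResponse { status: u16, body: Vec<u8> }
struct ProxyError(String);

const ANTHROPIC_VERSION: &str = "2023-06-01"; // illustrative pin

// Stand-in for the host HTTP call exposed to the wasm plugin.
fn host_http_call(
    _method: &str, _url: &str, _headers: &[(&str, &str)], _body: &str,
) -> Result<UpstreamResponse, ProxyError> {
    unimplemented!("host ABI")
}

/// Low-level helper: send a pre-built Messages body and return the raw
/// upstream response. Auth headers and version pinning live here, once.
fn anthropic_messages_call_raw(cfg: &AnthropicConfig, body: Value) -> Result<UpstreamResponse, ProxyError> {
    let url = format!("{}/v1/messages", cfg.base_url);
    let headers = [
        ("x-api-key", cfg.api_key.as_str()),
        ("anthropic-version", ANTHROPIC_VERSION),
        ("content-type", "application/json"),
    ];
    host_http_call("POST", &url, &headers, &body.to_string())
}

/// Chat Completions path: translate, then delegate to the raw helper.
/// The Responses module calls anthropic_messages_call_raw directly.
fn anthropic_call(cfg: &AnthropicConfig, chat_body: Value) -> Result<UpstreamResponse, ProxyError> {
    let messages_body = chat_body; // the real code translates Chat → Messages here
    anthropic_messages_call_raw(cfg, messages_body)
}
```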

Tests
=====

- Plugin unit tests: 67 → 90 (+23). 8 preflight tests, 7 input-item
  translation cases, 5 output-item translation cases, 4 Warning header
  cases, 1 error response shape.
- Integration tests (ai_proxy.rs): 9 → 15 (+6 covering OpenAI
  passthrough, previous_response_id 400, Ollama 400, full Anthropic
  translation roundtrip, store Warning header, reasoning Warning
  header).

Schema description
==================

config-schema.json now documents the path-dispatch matrix and the
forward-compat scope of /v1/responses (stateless, what's rejected,
synthetic id format). The schema for the route table itself is
unchanged — Responses is a path concern, not a config one.
Post-review remediation
=======================

R1 (load-bearing) — OpenAI passthrough was leaking the upstream response
`id` to the client, violating ADR-0030 §2's stateless-uniform contract.
A client could read OpenAI's real id and send it back as
`previous_response_id`, which the gateway rejects with 400 — the bug is
the inconsistency between leaking the real id and rejecting its reuse.
Fix: in the non-streaming passthrough path, parse the 2xx upstream body,
replace `id` with a synthetic `resp_<uuid-v7>`, re-serialize. Streaming
SSE rewrite is harder (the id is buried in `response.created` event
payloads); marked as a known gap inline, inheriting ADR-0030 §2's
existing SSE deferral.
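
A sketch of the non-streaming fix; the name rewrite_response_id_if_2xx matches the tests listed below, but this signature and the exact JSON handling are assumptions.

```rust
use serde_json::Value;

fn rewrite_response_id_if_2xx(status: u16, body: &[u8], synthetic_id: &str) -> Vec<u8> {
    // Non-2xx and unparseable bodies pass through untouched so upstream
    // error payloads are never mangled.
    if !(200..300).contains(&status) {
        return body.to_vec();
    }
    match serde_json::from_slice::<Value>(body) {
        Ok(mut json) => {
            if let Some(obj) = json.as_object_mut() {
                // Replace OpenAI's real id with the gateway's synthetic one,
                // keeping /v1/responses uniformly stateless for clients.
                obj.insert("id".into(), Value::String(synthetic_id.into()));
                return serde_json::to_vec(&json).unwrap_or_else(|_| body.to_vec());
            }
            body.to_vec()
        }
        Err(_) => body.to_vec(),
    }
}
```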

R2 — `AnthropicToResponses::translate` had two unused parameters
(`store_downgrade`, `dropped_reasoning_count`) carried forward "for
symmetry" with a `let _ = (...);` discard. Both signals flow through
headers/metrics, never the body. Drop the parameters; less to read,
less to wonder about.

R3 — The Anthropic path was parsing the request body 4 times: preflight,
extract_client_model, the handler's full-body translation, and again
inside `ResponsesPreflight::from_body` with `.expect("preflight already
ran in dispatch")` to recover `store_downgrade`. Stash the flag on
context (`ai.responses.store_downgrade`) in `dispatch_responses`; the
handler reads it back via `host::context_get`. Eliminates the .expect
smell and one parse. Mirrors how `ai.target` already flows from cel
into ai-proxy.
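
A sketch of that handoff; the key name and host::context_get are from this PR, while host::context_set and the in-memory stub below are assumptions for illustration.

```rust
// Stubs for the plugin host ABI so the sketch stands alone.
mod host {
    use std::{cell::RefCell, collections::HashMap};
    thread_local! {
        static CTX: RefCell<HashMap<String, String>> = RefCell::new(HashMap::new());
    }
    pub fn context_set(key: &str, value: &str) {
        CTX.with(|c| { c.borrow_mut().insert(key.into(), value.into()); });
    }
    pub fn context_get(key: &str) -> Option<String> {
        CTX.with(|c| c.borrow().get(key).cloned())
    }
}

const STORE_DOWNGRADE_KEY: &str = "ai.responses.store_downgrade";

/// Dispatch: preflight already ran, so stash its verdict instead of making
/// the handler re-parse the body later to recover it.
fn dispatch_responses(store_downgrade: bool) {
    if store_downgrade {
        host::context_set(STORE_DOWNGRADE_KEY, "1");
    }
}

/// Handler: read the flag back; no second parse, no `.expect`.
fn handler_store_downgrade() -> bool {
    host::context_get(STORE_DOWNGRADE_KEY).is_some()
}
```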

R4 — `make_uuid_v7` doc-comment was over-promising "Two ids generated
in the same millisecond differ in their counter portion." Tightened to
"by the same plugin instance" + made the per-instance reset and 2^64
wrap explicit. Both inconsequential at realistic throughput, but worth
saying out loud.

R5 — `responses_not_supported_for_provider_response` took a `&str` and
the call site passed the literal `"ollama"`. Type as `Provider`, call
`.name()` for the body string. Compile-time guarantee against typos.
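
A sketch of the R5 shape; the Provider variants beyond Ollama and the exact problem+json body are assumptions.

```rust
#[derive(Clone, Copy)]
enum Provider { OpenAi, Anthropic, Ollama }

impl Provider {
    fn name(self) -> &'static str {
        match self {
            Provider::OpenAi => "openai",
            Provider::Anthropic => "anthropic",
            Provider::Ollama => "ollama",
        }
    }
}

/// Taking `Provider` instead of `&str` means a typo like "olama" at a call
/// site is a compile error, not a silently wrong response body.
fn responses_not_supported_for_provider_response(provider: Provider) -> String {
    format!(
        r#"{{"code":"responses_not_supported_for_provider","detail":"/v1/responses is not supported for provider {}"}}"#,
        provider.name()
    )
}
```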

R6 — `build_text_or_array_blocks` silently dropped unknown content-part
types while the top-level item handler logs a warning. Made consistent
— part-type drops also log via `host::log_warn`, so a future OpenAI
part-type that Barbacane doesn't know yet stays diagnosable.

R7 — The `flush` closure inside `ResponsesToAnthropic::translate`
captured three `&mut`s and was invoked 4 times. Promoted to a private
`flush_message(role, blocks, messages)` helper. No semantic change.
Reads better; new tests can exercise it directly.
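
A sketch of the promoted helper, using serde_json values in place of the plugin's real role, block, and message types.

```rust
use serde_json::{json, Value};

/// Drain the buffered content blocks into one Anthropic message with the
/// current role. A no-op when nothing is buffered or no role has been seen,
/// so callers can invoke it unconditionally at every role switch and at the
/// end of the input loop.
fn flush_message(role: Option<&str>, blocks: &mut Vec<Value>, messages: &mut Vec<Value>) {
    let Some(role) = role else { return };
    if blocks.is_empty() {
        return;
    }
    messages.push(json!({
        "role": role,
        "content": std::mem::take(blocks),
    }));
}
```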

R8 — Added unit test for interleaved [user, assistant, user] role
sequences in `input[]` to lock in `flush_message`'s ordering — the
most plausible regression site for a future change to the translation
loop. Plus a coalescing test (consecutive same-role items land in one
message) and two flush_message direct tests (no-op on empty buffer
and on missing role).

Tests: plugin 90 → 97 (+7: role-switch, coalescing, two flush_message
direct, three rewrite_response_id_if_2xx covering 2xx, 4xx pass-through,
and unparseable body pass-through). Integration 15 → 15 (the existing
test renamed to `..._rewrites_id` and now asserts the id is synthetic +
not equal to the upstream's real id, which would have caught the
original bug).
ndreno merged commit e13c261 into main on May 5, 2026
2 checks passed
ndreno deleted the feat/ai-proxy-responses-api branch on May 5, 2026