
feat(ai-proxy): /v1/responses translation (ADR-0030 §2) #81

Merged
ndreno merged 2 commits into main from feat/ai-proxy-responses-api on May 5, 2026
Conversation

@ndreno (Contributor) commented May 5, 2026

Replacement for #78 (auto-closed when PR-3's coverage branch was deleted on merge). Same content plus the post-review remediation commit, rebased onto main.

Original PR: #78

ndreno added 2 commits May 5, 2026 08:42
Adds the OpenAI Responses API surface as the second protocol on
ai-proxy. Path-based dispatch routes POST /v1/responses through the new
Responses adapter; everything else still routes to Chat Completions.

Per-provider behavior
=====================

- **OpenAI**: passthrough at /v1/responses upstream. Streaming uses
  host_http_stream like Chat Completions.
- **Anthropic**: translate input[] items ↔ Messages content blocks (sketched after this list):
    input_text / input_image     → text / image content blocks
    function_call                → tool_use block
    function_call_output         → tool_result block
    reasoning                    → dropped (Anthropic doesn't accept
                                   client-supplied reasoning input);
                                   counted + Warning: 299 surfaced.
  `instructions` is hoisted to Anthropic's top-level `system` field.
  Streaming is buffered to a single terminal event (mirrors ADR-0024
  Chat Completions until true SSE translation lands).
- **Ollama**: 400 problem+json with code:
  responses_not_supported_for_provider. Ollama's OpenAI-compat surface
  is Chat Completions only as of 2026-04.
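
A minimal sketch of the Anthropic input-item mapping noted above, using serde_json values; field names such as call_id, arguments, and image_url are approximations for illustration, not the plugin's real item types.

```rust
use serde_json::{json, Value};

/// Translate one Responses `input[]` item into zero or more Anthropic
/// content blocks. `dropped_reasoning` counts items that cannot be
/// forwarded (surfaced later via a Warning: 299 header).
fn translate_input_item(item: &Value, dropped_reasoning: &mut u32) -> Vec<Value> {
    match item["type"].as_str() {
        Some("input_text") => vec![json!({ "type": "text", "text": item["text"] })],
        Some("input_image") => vec![json!({
            "type": "image",
            "source": { "type": "url", "url": item["image_url"] }
        })],
        Some("function_call") => vec![json!({
            "type": "tool_use",
            "id": item["call_id"],
            "name": item["name"],
            "input": item["arguments"]
        })],
        Some("function_call_output") => vec![json!({
            "type": "tool_result",
            "tool_use_id": item["call_id"],
            "content": item["output"]
        })],
        Some("reasoning") => {
            // Anthropic does not accept client-supplied reasoning input:
            // drop the item and count it so the handler can emit Warning: 299.
            *dropped_reasoning += 1;
            vec![]
        }
        _ => vec![], // unknown item types are logged and dropped by the caller
    }
}
```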

Spec-level guards (preflight, before target resolution)
=======================================================

- previous_response_id present (and not null) → 400 problem+json with
  code: previous_response_id_not_supported. The stateful Responses API
  requires session-scoped storage that ADR-0030 §2 explicitly defers;
  this is the forward-compat hook (see the preflight sketch after this list).
- store: true | absent → permissive (process statelessly, attach
  Warning: 299 — "store ignored; gateway is stateless"). Most clients
  send store: true as an unexamined default; rejecting it would break
  them gratuitously. Operators see the downgrade via the
  barbacane_plugin_ai_proxy_responses_store_downgrades_total counter.
- store: false → no warning, no counter.
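
A minimal sketch of the guards above, assuming serde_json and simplified return types; only the error code and the store semantics come from this PR, the rest is illustrative.

```rust
use serde_json::Value;

struct Preflight {
    store_downgrade: bool, // true → attach Warning: 299, bump the downgrade counter
}

enum PreflightError {
    PreviousResponseIdNotSupported, // rendered as 400 problem+json by the caller
}

fn preflight(body: &Value) -> Result<Preflight, PreflightError> {
    // previous_response_id present and not null → reject: the stateful
    // Responses API needs session storage that ADR-0030 §2 defers.
    if body.get("previous_response_id").is_some_and(|v| !v.is_null()) {
        return Err(PreflightError::PreviousResponseIdNotSupported);
    }
    // store: true or absent → process statelessly, record the downgrade;
    // store: false → nothing to warn about.
    let store_downgrade = body.get("store").and_then(Value::as_bool).unwrap_or(true);
    Ok(Preflight { store_downgrade })
}
```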

Synthetic Responses id
======================

Format: resp_<uuid-v7>. v7 is time-ordered, so a `resp_*` grep across
access logs comes out chronologically without needing a separate sort
key. Built manually from host_time_now + a per-instance counter — the
wasm32-unknown-unknown target has no system RNG, but the v7 spec only
requires monotonicity within a node, which the counter provides.
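
A sketch of that construction, assuming the host clock returns unix milliseconds. The layout is standard UUIDv7 (48-bit timestamp, version nibble, variant bits) with the per-instance counter standing in for the random bits; the real make_uuid_v7 may pack the counter differently.

```rust
fn make_uuid_v7(unix_ms: u64, counter: u64) -> String {
    let time_hi = (unix_ms >> 16) as u32;    // top 32 bits of the 48-bit timestamp
    let time_lo = (unix_ms & 0xFFFF) as u16; // bottom 16 bits
    // Version nibble 0x7, then 12 counter bits in what is normally rand_a.
    let ver_and_counter_hi = 0x7000u16 | ((counter >> 62) & 0x0FFF) as u16;
    // Variant bits 0b10, then the low 62 counter bits in place of rand_b,
    // so ids from one instance stay monotonic within a millisecond.
    let variant_and_counter_lo = 0x8000_0000_0000_0000u64 | (counter & 0x3FFF_FFFF_FFFF_FFFF);
    format!(
        "resp_{:08x}-{:04x}-{:04x}-{:04x}-{:012x}",
        time_hi,
        time_lo,
        ver_and_counter_hi,
        (variant_and_counter_lo >> 48) as u16,
        variant_and_counter_lo & 0xFFFF_FFFF_FFFF
    )
}
```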

Provider refactor
=================

providers/anthropic.rs gains an anthropic_messages_call_raw helper that
sends a pre-built Messages body and returns the raw upstream Response.
The existing anthropic_call (used by Chat Completions) now layers on
top of it; the new responses module uses it directly. No protocol-side
duplication of auth headers / version pinning.
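
Roughly the layering described above, with placeholder types so the sketch stands alone; the real signatures, error type, and host HTTP call in providers/anthropic.rs are assumptions, not the shipped code.

```rust
use serde_json::Value;

struct AnthropicConfig { api_key: String, base_url: String }
struct UpstreamResponse { status: u16, body: Vec<u8> }
struct ProxyError(String);

const ANTHROPIC_VERSION: &str = "2023-06-01"; // illustrative pin

// Stand-in for the host HTTP call exposed to the wasm plugin.
fn host_http_call(
    _method: &str, _url: &str, _headers: &[(&str, &str)], _body: &str,
) -> Result<UpstreamResponse, ProxyError> {
    unimplemented!("host ABI")
}

/// Low-level helper: send a pre-built Messages body and return the raw
/// upstream response. Auth headers and version pinning live here, once.
fn anthropic_messages_call_raw(cfg: &AnthropicConfig, body: Value) -> Result<UpstreamResponse, ProxyError> {
    let url = format!("{}/v1/messages", cfg.base_url);
    let headers = [
        ("x-api-key", cfg.api_key.as_str()),
        ("anthropic-version", ANTHROPIC_VERSION),
        ("content-type", "application/json"),
    ];
    host_http_call("POST", &url, &headers, &body.to_string())
}

/// Chat Completions path: translate, then delegate to the raw helper.
/// The Responses module calls anthropic_messages_call_raw directly.
fn anthropic_call(cfg: &AnthropicConfig, chat_body: Value) -> Result<UpstreamResponse, ProxyError> {
    let messages_body = chat_body; // the real code translates Chat → Messages here
    anthropic_messages_call_raw(cfg, messages_body)
}
```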

Tests
=====

- Plugin unit tests: 67 → 90 (+23). 8 preflight tests, 7 input-item
  translation cases, 5 output-item translation cases, 4 Warning header
  cases, 1 error response shape.
- Integration tests (ai_proxy.rs): 9 → 15 (+6 covering OpenAI
  passthrough, previous_response_id 400, Ollama 400, full Anthropic
  translation roundtrip, store Warning header, reasoning Warning
  header).

Schema description
==================

config-schema.json now documents the path-dispatch matrix and the
forward-compat scope of /v1/responses (stateless, what's rejected,
synthetic id format). The schema for the route table itself is
unchanged — Responses is a path concern, not a config one.
Post-review remediation
=======================

R1 (load-bearing) — OpenAI passthrough was leaking the upstream response
`id` to the client, violating ADR-0030 §2's stateless-uniform contract.
A client could read OpenAI's real id and send it back as
`previous_response_id`, which the gateway rejects with 400 — the bug is
the inconsistency between leaking the real id and rejecting its reuse.
Fix: in the non-streaming passthrough path, parse the 2xx upstream body,
replace `id` with a synthetic `resp_<uuid-v7>`, re-serialize. Streaming
SSE rewrite is harder (the id is buried in `response.created` event
payloads); marked as a known gap inline, inheriting ADR-0030 §2's
existing SSE deferral.
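
A sketch of the non-streaming fix; the name rewrite_response_id_if_2xx matches the tests listed below, but this signature and the exact JSON handling are assumptions.

```rust
use serde_json::Value;

fn rewrite_response_id_if_2xx(status: u16, body: &[u8], synthetic_id: &str) -> Vec<u8> {
    // Non-2xx and unparseable bodies pass through untouched so upstream
    // error payloads are never mangled.
    if !(200..300).contains(&status) {
        return body.to_vec();
    }
    match serde_json::from_slice::<Value>(body) {
        Ok(mut json) => {
            if let Some(obj) = json.as_object_mut() {
                // Replace OpenAI's real id with the gateway's synthetic one,
                // keeping /v1/responses uniformly stateless for clients.
                obj.insert("id".into(), Value::String(synthetic_id.into()));
                return serde_json::to_vec(&json).unwrap_or_else(|_| body.to_vec());
            }
            body.to_vec()
        }
        Err(_) => body.to_vec(),
    }
}
```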

R2 — `AnthropicToResponses::translate` had two unused parameters
(`store_downgrade`, `dropped_reasoning_count`) carried forward "for
symmetry" with a `let _ = (...);` discard. Both signals flow through
headers/metrics, never the body. Drop the parameters; less to read,
less to wonder about.

R3 — The Anthropic path was parsing the request body 4 times: preflight,
extract_client_model, the handler's full-body translation, and again
inside `ResponsesPreflight::from_body` with `.expect("preflight already
ran in dispatch")` to recover `store_downgrade`. Stash the flag on
context (`ai.responses.store_downgrade`) in `dispatch_responses`; the
handler reads it back via `host::context_get`. Eliminates the .expect
smell and one parse. Mirrors how `ai.target` already flows from cel
into ai-proxy.
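
A sketch of that handoff; the key name and host::context_get are from this PR, while host::context_set and the in-memory stub below are assumptions for illustration.

```rust
// Stubs for the plugin host ABI so the sketch stands alone.
mod host {
    use std::{cell::RefCell, collections::HashMap};
    thread_local! {
        static CTX: RefCell<HashMap<String, String>> = RefCell::new(HashMap::new());
    }
    pub fn context_set(key: &str, value: &str) {
        CTX.with(|c| { c.borrow_mut().insert(key.into(), value.into()); });
    }
    pub fn context_get(key: &str) -> Option<String> {
        CTX.with(|c| c.borrow().get(key).cloned())
    }
}

const STORE_DOWNGRADE_KEY: &str = "ai.responses.store_downgrade";

/// Dispatch: preflight already ran, so stash its verdict instead of making
/// the handler re-parse the body later to recover it.
fn dispatch_responses(store_downgrade: bool) {
    if store_downgrade {
        host::context_set(STORE_DOWNGRADE_KEY, "1");
    }
}

/// Handler: read the flag back; no second parse, no `.expect`.
fn handler_store_downgrade() -> bool {
    host::context_get(STORE_DOWNGRADE_KEY).is_some()
}
```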

R4 — `make_uuid_v7` doc-comment was over-promising "Two ids generated
in the same millisecond differ in their counter portion." Tightened to
"by the same plugin instance" + made the per-instance reset and 2^64
wrap explicit. Both inconsequential at realistic throughput, but worth
saying out loud.

R5 — `responses_not_supported_for_provider_response` took a `&str` and
the call site passed the literal `"ollama"`. Type as `Provider`, call
`.name()` for the body string. Compile-time guarantee against typos.
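
A sketch of the R5 shape; the Provider variants beyond Ollama and the exact problem+json body are assumptions.

```rust
#[derive(Clone, Copy)]
enum Provider { OpenAi, Anthropic, Ollama }

impl Provider {
    fn name(self) -> &'static str {
        match self {
            Provider::OpenAi => "openai",
            Provider::Anthropic => "anthropic",
            Provider::Ollama => "ollama",
        }
    }
}

/// Taking `Provider` instead of `&str` means a typo like "olama" at a call
/// site is a compile error, not a silently wrong response body.
fn responses_not_supported_for_provider_response(provider: Provider) -> String {
    format!(
        r#"{{"code":"responses_not_supported_for_provider","detail":"/v1/responses is not supported for provider {}"}}"#,
        provider.name()
    )
}
```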

R6 — `build_text_or_array_blocks` silently dropped unknown content-part
types while the top-level item handler logs a warning. Made consistent
— part-type drops also log via `host::log_warn`, so a future OpenAI
part-type that Barbacane doesn't know yet stays diagnosable.

R7 — The `flush` closure inside `ResponsesToAnthropic::translate`
captured three `&mut`s and was invoked 4 times. Promoted to a private
`flush_message(role, blocks, messages)` helper. No semantic change.
Reads better; new tests can exercise it directly.
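
A sketch of the promoted helper, using serde_json values in place of the plugin's real role, block, and message types.

```rust
use serde_json::{json, Value};

/// Drain the buffered content blocks into one Anthropic message with the
/// current role. A no-op when nothing is buffered or no role has been seen,
/// so callers can invoke it unconditionally at every role switch and at the
/// end of the input loop.
fn flush_message(role: Option<&str>, blocks: &mut Vec<Value>, messages: &mut Vec<Value>) {
    let Some(role) = role else { return };
    if blocks.is_empty() {
        return;
    }
    messages.push(json!({
        "role": role,
        "content": std::mem::take(blocks),
    }));
}
```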

R8 — Added unit test for interleaved [user, assistant, user] role
sequences in `input[]` to lock in `flush_message`'s ordering — the
most plausible regression site for a future change to the translation
loop. Plus a coalescing test (consecutive same-role items land in one
message) and two flush_message direct tests (no-op on empty buffer
and on missing role).

Tests: plugin 90 → 97 (+7: role-switch, coalescing, two flush_message
direct, three rewrite_response_id_if_2xx covering 2xx, 4xx pass-through,
and unparseable body pass-through). Integration 15 → 15 (the existing
test renamed to `..._rewrites_id` and now asserts the id is synthetic +
not equal to the upstream's real id, which would have caught the
original bug).
ndreno merged commit e13c261 into main on May 5, 2026
2 checks passed
ndreno deleted the feat/ai-proxy-responses-api branch on May 5, 2026