Merged
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -22,6 +22,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **`ai-response-guard` middleware plugin**: inspects LLM responses (OpenAI chat-completion format) in `on_response`. Named profiles carry `redact` rules (regex → replacement, scoped to `choices[].message.content` and `delta.content`) and `blocked_patterns` (a match replaces the response with a 502). Streamed responses cannot be redacted after the fact; the plugin emits `redactions_skipped_streaming_total` instead.
- **Named-profile + CEL composition pattern**: all four AI middlewares read a `context_key` (default `ai.policy`, overridable) to select the active profile. A `cel` middleware upstream writes `ai.policy` via `on_match.set_context`; one CEL decision fans out to prompt strictness, token budget, redaction strictness, and the `ai-proxy` dispatcher's named targets (via `ai.target`).

### Added
- **plugin**: `ai-proxy` `POST /v1/responses` — OpenAI Responses API support, stateless only (ADR-0030 §2). For OpenAI provider, the dispatcher passes through to the upstream `/v1/responses` and **rewrites the response `id` to a synthetic `resp_<uuid-v7>`** so the gateway's stateless contract holds uniformly across providers — without this, OpenAI's real id leaks to the client and they could send it back as `previous_response_id` (which we 400 on). For Anthropic, the request is translated to Messages API: `input_text`/`input_image` → `text`/`image` content blocks, `function_call` + `function_call_output` → `tool_use` + `tool_result`, `reasoning` items are dropped (Anthropic doesn't accept client-supplied reasoning). The response is translated back to Responses shape with a synthetic time-ordered `id`. For Ollama, returns 400 `responses_not_supported_for_provider` (Ollama's OpenAI-compat surface is Chat Completions only). Streaming SSE on the OpenAI passthrough does not rewrite the in-event id — true SSE handling is deferred for both protocols (ADR-0030 §2 "Streaming").
- **plugin**: `ai-proxy` `previous_response_id` returns 400 `previous_response_id_not_supported`. The stateful Responses API (`previous_response_id` + `GET /v1/responses/{id}` retrieval) requires session-scoped storage that ADR-0030 §2 explicitly defers; the rejection is the forward-compatibility hook.
- **plugin**: `ai-proxy` `store` flag is permissive — `true`, `false`, and absent all flow through unchanged. When `store ≠ false` (most clients send `true` as an unexamined default), the dispatcher emits a `Warning: 299 - "store ignored; gateway is stateless, see ADR-0030"` header and increments `barbacane_plugin_ai_proxy_responses_store_downgrades_total`. Operators can quantify stateful-API usage and decide whether to prioritize the future session-storage capability.
- **plugin**: `ai-proxy` `reasoning` items dropped on the Responses → Anthropic translation path emit `Warning: 299 - "reasoning items dropped..."` and increment `barbacane_plugin_ai_proxy_responses_reasoning_dropped_total`. Silent reasoning drops can degrade output quality on multi-turn agent flows in ways the client cannot detect.

### Fixed
- **plugin**: `ai-proxy` no longer returns `404 Not Found` when the operation is bound to a path other than `/v1/chat/completions`. The path-based dispatch added in PR-1 was too strict — operators are free to bind `ai-proxy` to any operation path, and the dispatcher routes Chat Completions requests through unchanged. PR-4 will reintroduce path-based dispatch narrowly when `/v1/responses` actually has a second protocol to differentiate.

4 changes: 2 additions & 2 deletions README.md
@@ -10,8 +10,8 @@
<a href="https://github.com/barbacane-dev/barbacane/actions/workflows/ci.yml"><img src="https://github.com/barbacane-dev/barbacane/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
<a href="https://docs.barbacane.dev"><img src="https://img.shields.io/badge/docs-docs.barbacane.dev-blue" alt="Documentation"></a>
<img src="https://img.shields.io/badge/unit%20tests-517%20passing-brightgreen" alt="Unit Tests">
<img src="https://img.shields.io/badge/plugin%20tests-818%20passing-brightgreen" alt="Plugin Tests">
<img src="https://img.shields.io/badge/integration%20tests-282%20passing-brightgreen" alt="Integration Tests">
<img src="https://img.shields.io/badge/plugin%20tests-848%20passing-brightgreen" alt="Plugin Tests">
<img src="https://img.shields.io/badge/integration%20tests-288%20passing-brightgreen" alt="Integration Tests">
<img src="https://img.shields.io/badge/cli%20tests-23%20passing-brightgreen" alt="CLI Tests">
<img src="https://img.shields.io/badge/ui%20tests-44%20passing-brightgreen" alt="UI Tests">
<img src="https://img.shields.io/badge/e2e%20tests-11%20passing-brightgreen" alt="E2E Tests">
325 changes: 325 additions & 0 deletions crates/barbacane-test/tests/ai_proxy.rs
@@ -594,3 +594,328 @@ paths:

assert_eq!(resp.status(), 403, "deny must return 403, not escalate");
}

// =========================================================================
// ADR-0030 §2 — Responses API at POST /v1/responses
// =========================================================================

/// Build a temp spec exposing `/v1/responses` bound to `ai-proxy` with the
/// given provider + base_url. The path is the canonical OpenAI Responses
/// path so the dispatcher's path-match (PR-4) routes through the Responses
/// adapter.
fn create_responses_spec(
    provider: &str,
    base_url: &str,
) -> (tempfile::TempDir, std::path::PathBuf) {
    let temp_dir = tempfile::TempDir::new().expect("temp dir");
    let manifest_dir = std::path::Path::new(env!("CARGO_MANIFEST_DIR"));
    let plugins_dir = manifest_dir
        .parent()
        .unwrap()
        .parent()
        .unwrap()
        .join("plugins");
    let ai_proxy_path = plugins_dir.join("ai-proxy/ai-proxy.wasm");

    std::fs::write(
        temp_dir.path().join("barbacane.yaml"),
        format!(
            "plugins:\n  ai-proxy:\n    path: {}\n",
            ai_proxy_path.display()
        ),
    )
    .unwrap();

    let spec_path = temp_dir.path().join("responses.yaml");
    let api_key_line = match provider {
        "anthropic" | "openai" => "          api_key: \"sk-test\"\n",
        _ => "",
    };
    std::fs::write(
        &spec_path,
        format!(
            r#"openapi: "3.0.3"
info:
  title: Responses API integration
  version: "1.0.0"
paths:
  /v1/responses:
    post:
      operationId: responses
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
      x-barbacane-dispatch:
        name: ai-proxy
        config:
          provider: {provider}
{api_key_line}          base_url: "{base_url}"
          timeout: 10
          max_tokens: 1024
      responses:
        "200":
          description: ok
        "400":
          description: client error
"#,
            provider = provider,
            api_key_line = api_key_line,
            base_url = base_url,
        ),
    )
    .unwrap();
    (temp_dir, spec_path)
}

#[tokio::test]
async fn test_ai_proxy_responses_openai_passthrough_rewrites_id() {
    // ADR-0030 §2 — the gateway is uniformly stateless. Even on the OpenAI
    // passthrough path we must rewrite the upstream `id` to a synthetic
    // `resp_<uuid-v7>`; otherwise OpenAI's real id leaks to the client and
    // they could send it back as `previous_response_id` (which we 400 on).
    let mock_server = MockServer::start().await;
    let upstream_id = "resp_real_openai_should_not_leak";
    Mock::given(method("POST"))
        .and(path("/v1/responses"))
        .respond_with(
            ResponseTemplate::new(200)
                .set_body_string(format!(
                    r#"{{"id":"{}","object":"response","output":[],"usage":{{"input_tokens":1,"output_tokens":1,"total_tokens":2}}}}"#,
                    upstream_id
                ))
                .insert_header("content-type", "application/json"),
        )
        .expect(1)
        .mount(&mock_server)
        .await;

    let (_tmp, spec_path) = create_responses_spec("openai", &mock_server.uri());
    let gateway = TestGateway::from_spec(spec_path.to_str().unwrap())
        .await
        .expect("gateway");

    let resp = gateway
        .post(
            "/v1/responses",
            r#"{"model":"gpt-4o","input":[{"type":"input_text","role":"user","content":"hi"}]}"#,
        )
        .await
        .unwrap();
    assert_eq!(resp.status(), 200);
    let body: serde_json::Value = resp.json().await.unwrap();
    assert_eq!(body["object"], "response");
    let id = body["id"].as_str().unwrap();
    assert!(
        id.starts_with("resp_"),
        "id should be a synthetic resp_<uuid>: {}",
        id
    );
    assert_ne!(
        id, upstream_id,
        "upstream id leaked to client — gateway is no longer stateless"
    );
}

#[tokio::test]
async fn test_ai_proxy_responses_400_on_previous_response_id() {
    // The mock must NOT be reached — the preflight check rejects this body
    // before target resolution.
    let mock_server = MockServer::start().await;
    Mock::given(method("POST"))
        .and(path("/v1/responses"))
        .respond_with(ResponseTemplate::new(200).set_body_string("{}"))
        .expect(0)
        .mount(&mock_server)
        .await;

    let (_tmp, spec_path) = create_responses_spec("openai", &mock_server.uri());
    let gateway = TestGateway::from_spec(spec_path.to_str().unwrap())
        .await
        .expect("gateway");

    let resp = gateway
        .post(
            "/v1/responses",
            r#"{"model":"gpt-4o","input":[],"previous_response_id":"resp_old"}"#,
        )
        .await
        .unwrap();
    assert_eq!(resp.status(), 400);
    let body: serde_json::Value = resp.json().await.unwrap();
    assert_eq!(body["code"], "previous_response_id_not_supported");
}

#[tokio::test]
async fn test_ai_proxy_responses_400_on_ollama_provider() {
    let mock_server = MockServer::start().await;
    // Ollama doesn't have a Responses surface — the mock must NOT be reached.
    Mock::given(method("POST"))
        .and(path("/v1/responses"))
        .respond_with(ResponseTemplate::new(200).set_body_string("{}"))
        .expect(0)
        .mount(&mock_server)
        .await;

    let (_tmp, spec_path) = create_responses_spec("ollama", &mock_server.uri());
    let gateway = TestGateway::from_spec(spec_path.to_str().unwrap())
        .await
        .expect("gateway");

    let resp = gateway
        .post(
            "/v1/responses",
            r#"{"model":"mistral","input":[{"type":"input_text","role":"user","content":"hi"}]}"#,
        )
        .await
        .unwrap();
    assert_eq!(resp.status(), 400);
    let body: serde_json::Value = resp.json().await.unwrap();
    assert_eq!(body["code"], "responses_not_supported_for_provider");
}

#[tokio::test]
async fn test_ai_proxy_responses_anthropic_translation_roundtrip() {
    // Mock Anthropic /v1/messages returning a Messages-format response. The
    // gateway must translate it into Responses format for the client.
    let mock_server = MockServer::start().await;
    let messages_response = r#"{
        "id":"msg_xyz","type":"message","role":"assistant","model":"claude-sonnet-4-6",
        "content":[{"type":"text","text":"Hello!"}],
        "stop_reason":"end_turn",
        "usage":{"input_tokens":4,"output_tokens":2}
    }"#;
    Mock::given(method("POST"))
        .and(path("/v1/messages"))
        .respond_with(
            ResponseTemplate::new(200)
                .set_body_string(messages_response)
                .insert_header("content-type", "application/json"),
        )
        .expect(1)
        .mount(&mock_server)
        .await;

    let (_tmp, spec_path) = create_responses_spec("anthropic", &mock_server.uri());
    let gateway = TestGateway::from_spec(spec_path.to_str().unwrap())
        .await
        .expect("gateway");

    let resp = gateway
        .post(
            "/v1/responses",
            r#"{
                "model":"claude-sonnet-4-6",
                "store":false,
                "input":[{"type":"input_text","role":"user","content":"Hi"}]
            }"#,
        )
        .await
        .unwrap();
    assert_eq!(resp.status(), 200);
    let body: serde_json::Value = resp.json().await.unwrap();
    assert_eq!(body["object"], "response");
    let id = body["id"].as_str().unwrap();
    assert!(id.starts_with("resp_"), "synthetic id: {}", id);
    assert_eq!(body["model"], "claude-sonnet-4-6");
    assert_eq!(body["output"][0]["type"], "output_text");
    assert_eq!(body["output"][0]["text"], "Hello!");
    assert_eq!(body["usage"]["input_tokens"], 4);
    assert_eq!(body["usage"]["output_tokens"], 2);
}

#[tokio::test]
async fn test_ai_proxy_responses_warning_header_on_store_downgrade() {
    let mock_server = MockServer::start().await;
    Mock::given(method("POST"))
        .and(path("/v1/messages"))
        .respond_with(
            ResponseTemplate::new(200)
                .set_body_string(
                    r#"{"id":"msg","model":"claude","content":[{"type":"text","text":"ok"}],"usage":{"input_tokens":1,"output_tokens":1}}"#,
                )
                .insert_header("content-type", "application/json"),
        )
        .mount(&mock_server)
        .await;

    let (_tmp, spec_path) = create_responses_spec("anthropic", &mock_server.uri());
    let gateway = TestGateway::from_spec(spec_path.to_str().unwrap())
        .await
        .expect("gateway");

    // store: true is the OpenAI default — gateway downgrades and tells the client.
    let resp = gateway
        .post(
            "/v1/responses",
            r#"{
                "model":"claude-sonnet-4-6",
                "store":true,
                "input":[{"type":"input_text","role":"user","content":"hi"}]
            }"#,
        )
        .await
        .unwrap();
    assert_eq!(resp.status(), 200);
    let warning = resp
        .headers()
        .get("warning")
        .expect("warning header set")
        .to_str()
        .unwrap();
    assert!(
        warning.contains("store ignored"),
        "warning should announce the store downgrade: {}",
        warning
    );
}

#[tokio::test]
async fn test_ai_proxy_responses_warning_header_on_reasoning_dropped() {
    let mock_server = MockServer::start().await;
    Mock::given(method("POST"))
        .and(path("/v1/messages"))
        .respond_with(
            ResponseTemplate::new(200)
                .set_body_string(
                    r#"{"id":"msg","model":"claude","content":[{"type":"text","text":"ok"}],"usage":{"input_tokens":1,"output_tokens":1}}"#,
                )
                .insert_header("content-type", "application/json"),
        )
        .mount(&mock_server)
        .await;

    let (_tmp, spec_path) = create_responses_spec("anthropic", &mock_server.uri());
    let gateway = TestGateway::from_spec(spec_path.to_str().unwrap())
        .await
        .expect("gateway");

    let resp = gateway
        .post(
            "/v1/responses",
            r#"{
                "model":"claude-sonnet-4-6",
                "store":false,
                "input":[
                    {"type":"reasoning","summary":"thinking..."},
                    {"type":"input_text","role":"user","content":"hi"}
                ]
            }"#,
        )
        .await
        .unwrap();
    assert_eq!(resp.status(), 200);
    let warning = resp
        .headers()
        .get("warning")
        .expect("warning header set")
        .to_str()
        .unwrap();
    assert!(
        warning.contains("reasoning items dropped"),
        "warning should announce reasoning drop: {}",
        warning
    );
}
7 changes: 7 additions & 0 deletions plugins/ai-proxy/Cargo.lock

Some generated files are not rendered by default.

5 changes: 5 additions & 0 deletions plugins/ai-proxy/Cargo.toml
@@ -15,6 +15,11 @@ barbacane-plugin-sdk = { path = "../../crates/barbacane-plugin-sdk" }
globset = { version = "0.4", default-features = false }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
# Used only for formatting bytes as the UUID dashed-hex form. v7 is built
# manually from `host_time_now` + a per-instance counter — the
# wasm32-unknown-unknown target has no system RNG, and the v7 spec only
# requires monotonicity within a node, which the counter provides. ADR-0030 §2.
uuid = { version = "1", default-features = false }

[profile.release]
opt-level = "s"