Merged
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -22,6 +22,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **`ai-response-guard` middleware plugin**: inspects LLM responses (OpenAI chat-completion format) in `on_response`. Named profiles carry `redact` rules (regex → replacement, scoped to `choices[].message.content` and `delta.content`) and `blocked_patterns` (a match replaces the response with a 502). Streamed responses cannot be redacted after the fact; the plugin emits `redactions_skipped_streaming_total` instead.
- **Named-profile + CEL composition pattern**: all four AI middlewares read a `context_key` (default `ai.policy`, overridable) to select the active profile. A `cel` middleware upstream writes `ai.policy` via `on_match.set_context`; one CEL decision fans out to prompt strictness, token budget, redaction strictness, and the `ai-proxy` dispatcher's named targets (via `ai.target`).

### Added
- **plugin**: `ai-proxy` `POST /v1/responses` — OpenAI Responses API support, stateless only (ADR-0030 §2). For OpenAI provider, the dispatcher passes through to the upstream `/v1/responses` and **rewrites the response `id` to a synthetic `resp_<uuid-v7>`** so the gateway's stateless contract holds uniformly across providers — without this, OpenAI's real id leaks to the client and they could send it back as `previous_response_id` (which we 400 on). For Anthropic, the request is translated to Messages API: `input_text`/`input_image` → `text`/`image` content blocks, `function_call` + `function_call_output` → `tool_use` + `tool_result`, `reasoning` items are dropped (Anthropic doesn't accept client-supplied reasoning). The response is translated back to Responses shape with a synthetic time-ordered `id`. For Ollama, returns 400 `responses_not_supported_for_provider` (Ollama's OpenAI-compat surface is Chat Completions only). Streaming SSE on the OpenAI passthrough does not rewrite the in-event id — true SSE handling is deferred for both protocols (ADR-0030 §2 "Streaming").
- **plugin**: `ai-proxy` `previous_response_id` returns 400 `previous_response_id_not_supported`. The stateful Responses API (`previous_response_id` + `GET /v1/responses/{id}` retrieval) requires session-scoped storage that ADR-0030 §2 explicitly defers; the rejection is the forward-compatibility hook.
- **plugin**: `ai-proxy` `store` flag is permissive — `true`, `false`, and absent all flow through unchanged. When `store ≠ false` (most clients send `true` as an unexamined default), the dispatcher emits a `Warning: 299 - "store ignored; gateway is stateless, see ADR-0030"` header and increments `barbacane_plugin_ai_proxy_responses_store_downgrades_total`. Operators can quantify stateful-API usage and decide whether to prioritize the future session-storage capability.
- **plugin**: `ai-proxy` `reasoning` items dropped on the Responses → Anthropic translation path emit `Warning: 299 - "reasoning items dropped..."` and increment `barbacane_plugin_ai_proxy_responses_reasoning_dropped_total`. Silent reasoning drops can degrade output quality on multi-turn agent flows in ways the client cannot detect.

### Fixed
- **plugin**: `ai-proxy` no longer returns `404 Not Found` when the operation is bound to a path other than `/v1/chat/completions`. The path-based dispatch added in PR-1 was too strict — operators are free to bind `ai-proxy` to any operation path, and the dispatcher routes Chat Completions requests through unchanged. PR-4 will reintroduce path-based dispatch narrowly when `/v1/responses` actually has a second protocol to differentiate.

4 changes: 2 additions & 2 deletions README.md
@@ -10,8 +10,8 @@
<a href="https://github.com/barbacane-dev/barbacane/actions/workflows/ci.yml"><img src="https://github.com/barbacane-dev/barbacane/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
<a href="https://docs.barbacane.dev"><img src="https://img.shields.io/badge/docs-docs.barbacane.dev-blue" alt="Documentation"></a>
<img src="https://img.shields.io/badge/unit%20tests-517%20passing-brightgreen" alt="Unit Tests">
<img src="https://img.shields.io/badge/plugin%20tests-818%20passing-brightgreen" alt="Plugin Tests">
<img src="https://img.shields.io/badge/integration%20tests-282%20passing-brightgreen" alt="Integration Tests">
<img src="https://img.shields.io/badge/plugin%20tests-848%20passing-brightgreen" alt="Plugin Tests">
<img src="https://img.shields.io/badge/integration%20tests-288%20passing-brightgreen" alt="Integration Tests">
<img src="https://img.shields.io/badge/cli%20tests-23%20passing-brightgreen" alt="CLI Tests">
<img src="https://img.shields.io/badge/ui%20tests-44%20passing-brightgreen" alt="UI Tests">
<img src="https://img.shields.io/badge/e2e%20tests-11%20passing-brightgreen" alt="E2E Tests">
325 changes: 325 additions & 0 deletions crates/barbacane-test/tests/ai_proxy.rs
@@ -594,3 +594,328 @@ paths:

assert_eq!(resp.status(), 403, "deny must return 403, not escalate");
}

// =========================================================================
// ADR-0030 §2 — Responses API at POST /v1/responses
// =========================================================================

/// Build a temp spec exposing `/v1/responses` bound to `ai-proxy` with the
/// given provider + base_url. The path is the canonical OpenAI Responses
/// path so the dispatcher's path-match (PR-4) routes through the Responses
/// adapter.
fn create_responses_spec(
    provider: &str,
    base_url: &str,
) -> (tempfile::TempDir, std::path::PathBuf) {
    let temp_dir = tempfile::TempDir::new().expect("temp dir");
    let manifest_dir = std::path::Path::new(env!("CARGO_MANIFEST_DIR"));
    let plugins_dir = manifest_dir
        .parent()
        .unwrap()
        .parent()
        .unwrap()
        .join("plugins");
    let ai_proxy_path = plugins_dir.join("ai-proxy/ai-proxy.wasm");

    std::fs::write(
        temp_dir.path().join("barbacane.yaml"),
        format!(
            "plugins:\n  ai-proxy:\n    path: {}\n",
            ai_proxy_path.display()
        ),
    )
    .unwrap();

    let spec_path = temp_dir.path().join("responses.yaml");
    let api_key_line = match provider {
        "anthropic" | "openai" => "          api_key: \"sk-test\"\n",
        _ => "",
    };
    std::fs::write(
        &spec_path,
        format!(
            r#"openapi: "3.0.3"
info:
  title: Responses API integration
  version: "1.0.0"
paths:
  /v1/responses:
    post:
      operationId: responses
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
      x-barbacane-dispatch:
        name: ai-proxy
        config:
          provider: {provider}
{api_key_line}          base_url: "{base_url}"
          timeout: 10
          max_tokens: 1024
      responses:
        "200":
          description: ok
        "400":
          description: client error
"#,
            provider = provider,
            api_key_line = api_key_line,
            base_url = base_url,
        ),
    )
    .unwrap();
    (temp_dir, spec_path)
}

#[tokio::test]
async fn test_ai_proxy_responses_openai_passthrough_rewrites_id() {
    // ADR-0030 §2 — the gateway is uniformly stateless. Even on the OpenAI
    // passthrough path we must rewrite the upstream `id` to a synthetic
    // `resp_<uuid-v7>`; otherwise OpenAI's real id leaks to the client and
    // they could send it back as `previous_response_id` (which we 400 on).
    let mock_server = MockServer::start().await;
    let upstream_id = "resp_real_openai_should_not_leak";
    Mock::given(method("POST"))
        .and(path("/v1/responses"))
        .respond_with(
            ResponseTemplate::new(200)
                .set_body_string(format!(
                    r#"{{"id":"{}","object":"response","output":[],"usage":{{"input_tokens":1,"output_tokens":1,"total_tokens":2}}}}"#,
                    upstream_id
                ))
                .insert_header("content-type", "application/json"),
        )
        .expect(1)
        .mount(&mock_server)
        .await;

    let (_tmp, spec_path) = create_responses_spec("openai", &mock_server.uri());
    let gateway = TestGateway::from_spec(spec_path.to_str().unwrap())
        .await
        .expect("gateway");

    let resp = gateway
        .post(
            "/v1/responses",
            r#"{"model":"gpt-4o","input":[{"type":"input_text","role":"user","content":"hi"}]}"#,
        )
        .await
        .unwrap();
    assert_eq!(resp.status(), 200);
    let body: serde_json::Value = resp.json().await.unwrap();
    assert_eq!(body["object"], "response");
    let id = body["id"].as_str().unwrap();
    assert!(
        id.starts_with("resp_"),
        "id should be a synthetic resp_<uuid>: {}",
        id
    );
    assert_ne!(
        id, upstream_id,
        "upstream id leaked to client — gateway is no longer stateless"
    );
}

#[tokio::test]
async fn test_ai_proxy_responses_400_on_previous_response_id() {
    // The mock must NOT be reached — the preflight check rejects this body
    // before target resolution.
    let mock_server = MockServer::start().await;
    Mock::given(method("POST"))
        .and(path("/v1/responses"))
        .respond_with(ResponseTemplate::new(200).set_body_string("{}"))
        .expect(0)
        .mount(&mock_server)
        .await;

    let (_tmp, spec_path) = create_responses_spec("openai", &mock_server.uri());
    let gateway = TestGateway::from_spec(spec_path.to_str().unwrap())
        .await
        .expect("gateway");

    let resp = gateway
        .post(
            "/v1/responses",
            r#"{"model":"gpt-4o","input":[],"previous_response_id":"resp_old"}"#,
        )
        .await
        .unwrap();
    assert_eq!(resp.status(), 400);
    let body: serde_json::Value = resp.json().await.unwrap();
    assert_eq!(body["code"], "previous_response_id_not_supported");
}

#[tokio::test]
async fn test_ai_proxy_responses_400_on_ollama_provider() {
    let mock_server = MockServer::start().await;
    // Ollama doesn't have a Responses surface — the mock must NOT be reached.
    Mock::given(method("POST"))
        .and(path("/v1/responses"))
        .respond_with(ResponseTemplate::new(200).set_body_string("{}"))
        .expect(0)
        .mount(&mock_server)
        .await;

    let (_tmp, spec_path) = create_responses_spec("ollama", &mock_server.uri());
    let gateway = TestGateway::from_spec(spec_path.to_str().unwrap())
        .await
        .expect("gateway");

    let resp = gateway
        .post(
            "/v1/responses",
            r#"{"model":"mistral","input":[{"type":"input_text","role":"user","content":"hi"}]}"#,
        )
        .await
        .unwrap();
    assert_eq!(resp.status(), 400);
    let body: serde_json::Value = resp.json().await.unwrap();
    assert_eq!(body["code"], "responses_not_supported_for_provider");
}

#[tokio::test]
async fn test_ai_proxy_responses_anthropic_translation_roundtrip() {
    // Mock Anthropic /v1/messages returning a Messages-format response. The
    // gateway must translate it into Responses format for the client.
    let mock_server = MockServer::start().await;
    let messages_response = r#"{
        "id":"msg_xyz","type":"message","role":"assistant","model":"claude-sonnet-4-6",
        "content":[{"type":"text","text":"Hello!"}],
        "stop_reason":"end_turn",
        "usage":{"input_tokens":4,"output_tokens":2}
    }"#;
    Mock::given(method("POST"))
        .and(path("/v1/messages"))
        .respond_with(
            ResponseTemplate::new(200)
                .set_body_string(messages_response)
                .insert_header("content-type", "application/json"),
        )
        .expect(1)
        .mount(&mock_server)
        .await;

    let (_tmp, spec_path) = create_responses_spec("anthropic", &mock_server.uri());
    let gateway = TestGateway::from_spec(spec_path.to_str().unwrap())
        .await
        .expect("gateway");

    let resp = gateway
        .post(
            "/v1/responses",
            r#"{
                "model":"claude-sonnet-4-6",
                "store":false,
                "input":[{"type":"input_text","role":"user","content":"Hi"}]
            }"#,
        )
        .await
        .unwrap();
    assert_eq!(resp.status(), 200);
    let body: serde_json::Value = resp.json().await.unwrap();
    assert_eq!(body["object"], "response");
    let id = body["id"].as_str().unwrap();
    assert!(id.starts_with("resp_"), "synthetic id: {}", id);
    assert_eq!(body["model"], "claude-sonnet-4-6");
    assert_eq!(body["output"][0]["type"], "output_text");
    assert_eq!(body["output"][0]["text"], "Hello!");
    assert_eq!(body["usage"]["input_tokens"], 4);
    assert_eq!(body["usage"]["output_tokens"], 2);
}

#[tokio::test]
async fn test_ai_proxy_responses_warning_header_on_store_downgrade() {
    let mock_server = MockServer::start().await;
    Mock::given(method("POST"))
        .and(path("/v1/messages"))
        .respond_with(
            ResponseTemplate::new(200)
                .set_body_string(
                    r#"{"id":"msg","model":"claude","content":[{"type":"text","text":"ok"}],"usage":{"input_tokens":1,"output_tokens":1}}"#,
                )
                .insert_header("content-type", "application/json"),
        )
        .mount(&mock_server)
        .await;

    let (_tmp, spec_path) = create_responses_spec("anthropic", &mock_server.uri());
    let gateway = TestGateway::from_spec(spec_path.to_str().unwrap())
        .await
        .expect("gateway");

    // store: true is the OpenAI default — gateway downgrades and tells the client.
    let resp = gateway
        .post(
            "/v1/responses",
            r#"{
                "model":"claude-sonnet-4-6",
                "store":true,
                "input":[{"type":"input_text","role":"user","content":"hi"}]
            }"#,
        )
        .await
        .unwrap();
    assert_eq!(resp.status(), 200);
    let warning = resp
        .headers()
        .get("warning")
        .expect("warning header set")
        .to_str()
        .unwrap();
    assert!(
        warning.contains("store ignored"),
        "warning should announce the store downgrade: {}",
        warning
    );
}

#[tokio::test]
async fn test_ai_proxy_responses_warning_header_on_reasoning_dropped() {
    let mock_server = MockServer::start().await;
    Mock::given(method("POST"))
        .and(path("/v1/messages"))
        .respond_with(
            ResponseTemplate::new(200)
                .set_body_string(
                    r#"{"id":"msg","model":"claude","content":[{"type":"text","text":"ok"}],"usage":{"input_tokens":1,"output_tokens":1}}"#,
                )
                .insert_header("content-type", "application/json"),
        )
        .mount(&mock_server)
        .await;

    let (_tmp, spec_path) = create_responses_spec("anthropic", &mock_server.uri());
    let gateway = TestGateway::from_spec(spec_path.to_str().unwrap())
        .await
        .expect("gateway");

    let resp = gateway
        .post(
            "/v1/responses",
            r#"{
                "model":"claude-sonnet-4-6",
                "store":false,
                "input":[
                    {"type":"reasoning","summary":"thinking..."},
                    {"type":"input_text","role":"user","content":"hi"}
                ]
            }"#,
        )
        .await
        .unwrap();
    assert_eq!(resp.status(), 200);
    let warning = resp
        .headers()
        .get("warning")
        .expect("warning header set")
        .to_str()
        .unwrap();
    assert!(
        warning.contains("reasoning items dropped"),
        "warning should announce reasoning drop: {}",
        warning
    );
}
7 changes: 7 additions & 0 deletions plugins/ai-proxy/Cargo.lock

Some generated files are not rendered by default.

5 changes: 5 additions & 0 deletions plugins/ai-proxy/Cargo.toml
@@ -15,6 +15,11 @@ barbacane-plugin-sdk = { path = "../../crates/barbacane-plugin-sdk" }
globset = { version = "0.4", default-features = false }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
# Used only for formatting bytes as the UUID dashed-hex form. v7 is built
# manually from `host_time_now` + a per-instance counter — the
# wasm32-unknown-unknown target has no system RNG, and the v7 spec only
# requires monotonicity within a node, which the counter provides. ADR-0030 §2.
uuid = { version = "1", default-features = false }

[profile.release]
opt-level = "s"