ttlequals0 · May 13, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -5,6 +5,65 @@ All notable changes to the Claude Code OpenAI Wrapper project will be documented
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [2.9.7] - 2026-05-12
+
+### Added
+
+- Active CLI-auth health probe. When `CLAUDE_AUTH_METHOD=claude_cli`,
+  the lifespan schedules a periodic background coroutine that runs
+  the existing `claude_cli.verify_cli()` (a 1-turn
+  `query(prompt="Hello", max_turns=1)`) and updates a shared
+  `cli_health` state. This bounds how long a lost CLI session can go
+  unnoticed to roughly one probe interval, rather than until a real chat request hits it.
+  - Interval is configurable via `CLI_AUTH_PROBE_INTERVAL_SECONDS`
+    (default 600s / 10 min). Set to 0 to disable. Skipped entirely for
+    non-cli auth methods (API key / Bedrock / Vertex), which surface
+    upstream auth failures via the existing
+    `assistant_authentication_failed` -> 401 mapping.
+  - Probe results visible at `GET /v1/auth/status` under a new
+    `cli_health` block: `ok`, `last_probed_at`, `last_ok_at`,
+    `error_kind` (`auth_failure` | `unknown` | `null`),
+    `error_message`.
+
+### Changed
+
+- `POST /v1/chat/completions` and `POST /v1/messages` now return
+  **HTTP 401** with `error.type=authentication_error` and
+  `error.code=claude_cli_not_authenticated` when the latest CLI probe
+  failed, instead of letting the request fall through to a generic
+  502 from the SDK or 503 from the config check. The OpenAI and Anthropic
+  client libraries raise 401 responses as `AuthenticationError`, giving callers a
+  durable signal to rotate keys or re-run `/login` rather than retrying a
+  doomed request.
+- `_build_sdk_error_response` (the
+  `ClaudeResultError.subtype=error_during_execution` path) now scans
+  `error_message` and `stderr_tail` for the same CLI-auth-failure markers
+  the probe uses (`not logged in`, `please run /login`,
+  `invalid api key`, `authentication_error`, `401`). On a match, the
+  response is 401 with `authentication_error`, and `cli_health` is seeded as
+  failed so the next request fails fast without an SDK round-trip.
+- Auth-failure responses now bypass the global `http_exception_handler`
+  (which previously rewrote the body as `error.type=api_error`) by
+  returning `JSONResponse` directly. Required for OpenAI / Anthropic
+  clients to read the authentication signal.
+
+### Tests
+
+- `tests/test_auth_unit.py::TestProbeCliAuth` - three async unit tests
+  covering `probe_cli_auth()`: success (`mark_ok`), `Not logged in`
+  stderr (`auth_failure`), generic exception (`unknown`).
+- `tests/test_endpoints.py::TestChatCompletionsCliHealthGate` and
+  `tests/test_anthropic_messages.py::TestAnthropicMessagesCliHealthGate` -
+  in-process TestClient assertions that both endpoints return 401
+  with `authentication_error` when `cli_health.ok=False`.
+- `tests/test_error_path_unit.py::TestCliAuthFailureToFourOhOne` -
+  four tests for the stderr-marker mapping: 401 on `Not logged in`,
+  401 on `Invalid API key`, 502 regression guard on `connection
+  refused`, and a seeding test confirming a real request flips
+  `cli_health.ok` to False.
+- Suite total: 673 passed, 31 skipped (was 664/31 on v2.9.6; +9 new
+  tests).
+
 ## [2.9.6] - 2026-05-11
 
 ### Changed

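The CHANGELOG entry describes a lifespan-scheduled background coroutine, but the lifespan wiring itself lives outside the files in this diff. The following is only a minimal sketch of how such a loop could be structured, assuming the documented behaviour (default 600 s, `0` disables, non-cli auth methods skip it). The names `read_probe_interval`, `run_probe_loop` and `start_probe_task` are hypothetical, as are the choices to probe once immediately at startup, to treat negative values as disabled, and to fall back to the default on unparseable input.

```python
import asyncio
import logging
import math
import os
from typing import Awaitable, Callable, Mapping, Optional

logger = logging.getLogger(__name__)

DEFAULT_INTERVAL_SECONDS = 600.0  # documented default: 10 minutes


def read_probe_interval(env: Mapping[str, str] = os.environ) -> float:
    """Parse CLI_AUTH_PROBE_INTERVAL_SECONDS; 0 disables the probe."""
    raw = env.get("CLI_AUTH_PROBE_INTERVAL_SECONDS", str(DEFAULT_INTERVAL_SECONDS))
    try:
        value = float(raw)
    except ValueError:
        value = math.nan
    if not math.isfinite(value):
        # Assumption: a bad value falls back to the default instead of failing startup.
        logger.warning("invalid CLI_AUTH_PROBE_INTERVAL_SECONDS=%r, using default", raw)
        return DEFAULT_INTERVAL_SECONDS
    return max(value, 0.0)  # assumption: negative values behave like 0 (disabled)


async def run_probe_loop(probe: Callable[[], Awaitable[bool]], interval: float) -> None:
    """Probe once, then every `interval` seconds, until the task is cancelled."""
    while True:
        # probe_cli_auth() swallows ordinary exceptions, so one failure cannot kill the loop.
        await probe()
        await asyncio.sleep(interval)


def start_probe_task(
    auth_method: str,
    probe: Callable[[], Awaitable[bool]],
    env: Mapping[str, str] = os.environ,
) -> Optional["asyncio.Task[None]"]:
    """Schedule the loop only for claude_cli auth with a non-zero interval.

    Must be called from inside a running event loop (e.g. a FastAPI lifespan).
    """
    interval = read_probe_interval(env)
    if auth_method != "claude_cli" or interval == 0:
        return None
    return asyncio.create_task(run_probe_loop(probe, interval))
```

On shutdown, a lifespan built this way would call `task.cancel()` and then await the task while catching `asyncio.CancelledError`. Because `probe_cli_auth()` only catches `Exception`, cancellation still propagates out of the probe and stops the loop cleanly.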
diff --git a/README.md b/README.md
@@ -4,10 +4,11 @@ OpenAI API-compatible wrapper for Claude Code. Drop it in front of any OpenAI cl
 
 ## Version
 
-**Current:** 2.9.6
+**Current:** 2.9.7
 
 Highlights of recent releases (full history in [CHANGELOG.md](./CHANGELOG.md)):
 
+- **2.9.7** - Active Claude-CLI auth health probe (10-minute default, configurable via `CLI_AUTH_PROBE_INTERVAL_SECONDS`). `/v1/chat/completions` and `/v1/messages` now return **HTTP 401** with `error.type=authentication_error` when the bundled CLI loses its session, so OpenAI / Anthropic client libraries raise `AuthenticationError` instead of treating the failure as a transient 502/503. `/v1/auth/status` exposes the new `cli_health` block. Defense-in-depth: `error_during_execution` results whose error text or stderr contains a CLI-auth marker (for example `Not logged in`, `Please run /login`, or `Invalid API key`) also map to 401 and seed `cli_health` as failed.
 - **2.9.6** - `claude-agent-sdk` 0.1.68 -> 0.1.81. urllib3 floor raised to 2.7.0 and `python-multipart` to 0.0.27 to close three HIGH Dependabot alerts. Pulled in upstream `RichardAtCT#46` so `/v1/models` returns Anthropic's live catalogue when `ANTHROPIC_API_KEY` is set (cached, with a short error TTL so transient outages do not stick for an hour). `check-sdk-version.yml` now opens a draft bump PR on drift instead of writing only to the job summary.
 - **2.9.x** (earlier) - CodeQL hardening: sanitised error responses (no more `str(e)` to clients), `filter_content` rewrite against polynomial ReDoS, `/v1/debug/request` gated behind `DEBUG_MODE`/`VERBOSE`, workflow permissions pinned. Image trimmed via `poetry install --only main` and a real `.dockerignore`.
 - **2.8.x** - Security dep bumps, breaker defaults loosened, CLI stderr capture, structured-log state unmasked.
@@ -17,7 +18,7 @@ Highlights of recent releases (full history in [CHANGELOG.md](./CHANGELOG.md)):
 
 ## Status
 
-Production ready. **664 tests passing (31 skipped)**. Streaming works. Sessions work. JSON mode works. Function calling works. Tools are off by default for speed - pass `enable_tools: true` to turn them on. Auth supports API key, Bedrock, Vertex AI, and CLI.
+Production ready. **673 tests passing (31 skipped)**. Streaming works. Sessions work. JSON mode works. Function calling works. Tools are off by default for speed - pass `enable_tools: true` to turn them on. Auth supports API key, Bedrock, Vertex AI, and CLI.
 
 ## Quick Start
 
@@ -189,6 +190,7 @@ Listed in roughly the order you will reach for them.
 | `ANTHROPIC_MODELS_URL` | Override the live models endpoint. Point at a proxy or staging URL during testing. | `https://api.anthropic.com/v1/models` |
 | `ANTHROPIC_VERSION` | `anthropic-version` header sent to the Models API. | `2023-06-01` |
 | `ANTHROPIC_BETA` / `ANTHROPIC_BETA_HEADER` | Optional `anthropic-beta` header forwarded to the Models API for beta-gated features. | - |
+| `CLI_AUTH_PROBE_INTERVAL_SECONDS` | Background CLI-auth probe cadence when `CLAUDE_AUTH_METHOD=claude_cli`. Each probe is a 1-turn `query` (~$0.001 at Sonnet pricing); a failed probe sets `cli_health.ok` to false, so `/v1/chat/completions` and `/v1/messages` return 401 instead of a generic 502/503 from the SDK or config check. Set `0` to disable. Ignored for non-cli auth methods. | `600` (10 min) |
 | `DEBUG_MODE` | Enable debug logging and unlock `/v1/debug/request` | `false` |
 | `VERBOSE` | Same unlock effect on `/v1/debug/request` | `false` |
 | `CORS_ORIGINS` | Allowed CORS origins (JSON array) | `["*"]` |
@@ -451,7 +453,7 @@ With `json_object` mode, the wrapper adds system prompt instructions for JSON ou
 ## Testing
 
 ```bash
-# Run the full test suite (664 tests, ~3 s on a laptop)
+# Run the full test suite (673 tests, ~3 s on a laptop)
 poetry run pytest tests/
 
 # Quick endpoint test (server must be running)

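The README and CHANGELOG both promise a specific 401 body when the last probe failed. Because the handler code is not part of this diff, here is a stdlib-only sketch of that gate, assuming it only needs the `ok` flag from the `cli_health` block. The function name `cli_auth_gate` and the human-readable `message` text are hypothetical; the `type` and `code` values are the ones documented in the CHANGELOG. The exact envelope returned by `/v1/messages` may differ from this OpenAI-style shape.

```python
from typing import Any, Dict, Optional, Tuple

# Values documented in the CHANGELOG for the CLI-auth 401 response.
AUTH_ERROR_TYPE = "authentication_error"
AUTH_ERROR_CODE = "claude_cli_not_authenticated"


def cli_auth_gate(health: Dict[str, Any]) -> Optional[Tuple[int, Dict[str, Any]]]:
    """Return (status, body) for a 401 when the last probe failed, else None.

    `health` is shaped like the `cli_health` block from /v1/auth/status.
    """
    if health.get("ok", True):
        # Healthy, or never probed (CliHealth starts with ok=True): let the request through.
        return None
    body = {
        "error": {
            # Hypothetical wording; only `type` and `code` are documented.
            "message": "Claude CLI is not authenticated; run /login on the host.",
            "type": AUTH_ERROR_TYPE,
            "code": AUTH_ERROR_CODE,
        }
    }
    return 401, body
```

In a FastAPI handler, the tuple would be returned as `JSONResponse(status_code=status, content=body)` rather than raised as an `HTTPException`. Per the CHANGELOG, raising would send it through the global `http_exception_handler`, which rewrites `error.type` to `api_error` and hides the authentication signal from clients.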
diff --git a/pyproject.toml b/pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "claude-code-openai-wrapper"
-version = "2.9.6"
+version = "2.9.7"
 description = "OpenAI API-compatible wrapper for Claude Code"
 authors = ["Richard Atkinson <richardatk01@gmail.com>"]
 readme = "README.md"

diff --git a/src/__init__.py b/src/__init__.py
@@ -1,3 +1,3 @@
 """Claude Code OpenAI Wrapper - A FastAPI-based OpenAI-compatible API for Claude Code."""
 
-__version__ = "2.9.6"
+__version__ = "2.9.7"
diff --git a/src/auth.py b/src/auth.py
@@ -1,5 +1,7 @@
 import os
 import logging
+from dataclasses import dataclass, field
+from datetime import datetime, timezone
 from typing import Optional, Dict, Any, Tuple
 from fastapi import HTTPException, Request
 from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
@@ -284,3 +286,106 @@ def get_claude_code_auth_info() -> Dict[str, Any]:
         "status": auth_manager.auth_status,
         "environment_variables": list(auth_manager.get_claude_code_env_vars().keys()),
     }
+
+
+# Markers the Claude CLI emits to stderr (or wraps in SDK exceptions) when its
+# stored session is missing or expired. Compared case-insensitively against the
+# concatenation of an exception's str() and any captured stderr_tail.
+_CLI_AUTH_FAILURE_MARKERS = (
+    "not logged in",
+    "please run /login",
+    "invalid api key",
+    "authentication_error",
+    "401",
+)
+
+
+def _classify_probe_error(blob: str) -> str:
+    lowered = (blob or "").lower()  # markers are lowercase, so matching is case-insensitive
+    if any(marker in lowered for marker in _CLI_AUTH_FAILURE_MARKERS):
+        return "auth_failure"
+    return "unknown"
+
+
+@dataclass
+class CliHealth:
+    """Latest observed health of the Claude CLI auth path.
+
+    The probe loop (run only when auth_method == 'claude_cli') refreshes this
+    on an interval; the chat / messages handlers consult `ok` to short-circuit
+    with HTTP 401 before round-tripping through the SDK.
+    """
+
+    ok: bool = True
+    last_probed_at: Optional[datetime] = None
+    last_ok_at: Optional[datetime] = None
+    error_kind: Optional[str] = None
+    error_message: Optional[str] = None
+
+    def mark_ok(self) -> None:
+        now = datetime.now(timezone.utc)
+        self.ok = True
+        self.last_probed_at = now
+        self.last_ok_at = now
+        self.error_kind = None
+        self.error_message = None
+
+    def mark_failed(self, kind: str, message: str) -> None:
+        self.ok = False
+        self.last_probed_at = datetime.now(timezone.utc)
+        self.error_kind = kind
+        # Trim to keep logs and /v1/auth/status compact.
+        self.error_message = (message or "")[:500]
+
+    def as_dict(self) -> Dict[str, Any]:
+        return {
+            "ok": self.ok,
+            "last_probed_at": self.last_probed_at.isoformat() if self.last_probed_at else None,
+            "last_ok_at": self.last_ok_at.isoformat() if self.last_ok_at else None,
+            "error_kind": self.error_kind,
+            "error_message": self.error_message,
+        }
+
+
+cli_health = CliHealth()
+
+
+async def probe_cli_auth(cli=None) -> bool:
+    """Run a 1-turn CLI probe and update `cli_health`.
+
+    Reuses `claude_cli.verify_cli()` (which already issues a short
+    `query(prompt="Hello", max_turns=1)`); on any exception, classifies the
+    failure as auth_failure if the marker set matches, else unknown.
+
+    `cli` is the ClaudeCodeCLI instance to probe. When omitted, lazy-resolves
+    the module-level singleton from `src.main` so a periodic probe exercises
+    exactly the same instance that real requests use. The parameter is
+    primarily there for tests, which inject a mock.
+
+    Returns True when the probe succeeded, False otherwise. Never raises an ordinary exception; asyncio.CancelledError still propagates.
+    """
+    if cli is None:
+        # Lazy import - src.main imports src.auth at module load.
+        from src import main as _main  # noqa: WPS433 - intentional lazy import
+
+        cli = _main.claude_cli
+
+    try:
+        ok = await cli.verify_cli()
+        if ok:
+            cli_health.mark_ok()
+            logger.info("cli_auth_probe_ok")
+            return True
+        cli_health.mark_failed("unknown", "verify_cli returned False")
+        logger.warning("cli_auth_probe_failed kind=unknown reason=verify_cli_returned_false")
+        return False
+    except Exception as exc:  # noqa: BLE001 - the probe must never propagate
+        message = str(exc)
+        kind = _classify_probe_error(message)
+        cli_health.mark_failed(kind, message)
+        logger.warning(
+            "cli_auth_probe_failed kind=%s error=%s",
+            kind,
+            message[:200].replace("\n", " "),
+        )
+        return False
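The `cli` parameter of `probe_cli_auth()` exists so tests can inject a double. The standalone mirror below is not the project's test file. It copies the marker tuple and reproduces the probe's control flow without timestamps or logging, so the three outcomes listed under `TestProbeCliAuth` can be run in isolation. `FakeCli`, `Health`, `classify` and `probe` are hypothetical stand-ins for the real `ClaudeCodeCLI`, `CliHealth`, `_classify_probe_error` and `probe_cli_auth`.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional

# Copied from src/auth.py so this sketch runs on its own.
_CLI_AUTH_FAILURE_MARKERS = (
    "not logged in",
    "please run /login",
    "invalid api key",
    "authentication_error",
    "401",
)


def classify(blob: str) -> str:
    """Same rule as _classify_probe_error: any marker substring means auth_failure."""
    lowered = (blob or "").lower()
    return "auth_failure" if any(m in lowered for m in _CLI_AUTH_FAILURE_MARKERS) else "unknown"


@dataclass
class Health:
    """Trimmed stand-in for CliHealth: only the fields the probe outcome decides."""

    ok: bool = True
    error_kind: Optional[str] = None


class FakeCli:
    """Hypothetical test double exposing the one method the probe calls."""

    def __init__(self, result: bool = True, exc: Optional[Exception] = None):
        self.result = result
        self.exc = exc

    async def verify_cli(self) -> bool:
        if self.exc is not None:
            raise self.exc
        return self.result


async def probe(cli: FakeCli, health: Health) -> bool:
    """Same branches as probe_cli_auth(): ok, returned False, or raised."""
    try:
        if await cli.verify_cli():
            health.ok, health.error_kind = True, None
            return True
        health.ok, health.error_kind = False, "unknown"
        return False
    except Exception as exc:
        health.ok, health.error_kind = False, classify(str(exc))
        return False
```

Because matching is plain substring containment, the `"401"` marker also fires on unrelated text that happens to contain those digits. For example, `classify("listening on 127.0.0.1:4010")` returns `"auth_failure"`.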