diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7d6cbe4..0ceedb9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,65 @@ All notable changes to the Claude Code OpenAI Wrapper project will be documented
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [2.9.7] - 2026-05-12
+
+### Added
+
+- Active CLI-auth health probe. When `CLAUDE_AUTH_METHOD=claude_cli`,
+  the lifespan schedules a periodic background coroutine that runs
+  the existing `claude_cli.verify_cli()` (a 1-turn
+  `query(prompt="Hello", max_turns=1)`) and updates a shared
+  `cli_health` state. Bounds the stale window between the bundled CLI
+  losing its session and a real chat request discovering it.
+  - Interval is configurable via `CLI_AUTH_PROBE_INTERVAL_SECONDS`
+    (default 600s / 10 min). Set to 0 to disable. Skipped entirely for
+    non-cli auth methods (API key / Bedrock / Vertex), which surface
+    upstream auth failures via the existing
+    `assistant_authentication_failed` -> 401 mapping.
+  - Probe results visible at `GET /v1/auth/status` under a new
+    `cli_health` block: `ok`, `last_probed_at`, `last_ok_at`,
+    `error_kind` (`auth_failure` | `unknown` | `null`),
+    `error_message`.
+
+### Changed
+
+- `POST /v1/chat/completions` and `POST /v1/messages` now return
+  **HTTP 401** with `error.type=authentication_error` and
+  `error.code=claude_cli_not_authenticated` when the latest CLI probe
+  failed, instead of letting the request fall through to a generic
+  502 from the SDK or 503 from the config check. OpenAI / Anthropic
+  client libraries route 401 as `AuthenticationError`, giving callers a
+  durable signal to roll keys or re-`/login` rather than retrying a
+  doomed request.
+- `_build_sdk_error_response` (the
+  `ClaudeResultError.subtype=error_during_execution` path) now scans
+  `error_message` + `stderr_tail` for the same CLI-auth-failure markers
+  the probe uses (`not logged in`, `please run /login`,
+  `invalid api key`, `authentication_error`, `401`). On a match the
+  response is 401 + `authentication_error` and `cli_health` is marked
+  failed so the next request fails fast without a round-trip.
+- Auth-failure responses now bypass the global `http_exception_handler`
+  (which previously rewrote the body as `error.type=api_error`) by
+  returning `JSONResponse` directly. Required for OpenAI / Anthropic
+  clients to read the authentication signal.
+
+### Tests
+
+- `tests/test_auth_unit.py::TestProbeCliAuth` - three async unit tests
+  covering `probe_cli_auth()`: success (`mark_ok`), `Not logged in`
+  exception (`auth_failure`), generic exception (`unknown`).
+- `tests/test_endpoints.py::TestChatCompletionsCliHealthGate` and
+  `tests/test_anthropic_messages.py::TestAnthropicMessagesCliHealthGate`
+  - in-process TestClient assertions that both endpoints return 401
+  with `authentication_error` when `cli_health.ok=False`.
+- `tests/test_error_path_unit.py::TestCliAuthFailureToFourOhOne` -
+  four tests for the auth-marker mapping: 401 on `Not logged in`,
+  401 on `Invalid API key`, a 502 regression guard on
+  `connection refused`, and a seeding test confirming that a matching
+  SDK error flips `cli_health.ok` to False.
+- Suite total: 673 passed, 31 skipped (was 664/31 on v2.9.6; +9 new
+  tests).
+
 ## [2.9.6] - 2026-05-11
 
 ### Changed
diff --git a/README.md b/README.md
index 8d22bbe..1981fd2 100644
--- a/README.md
+++ b/README.md
@@ -4,10 +4,11 @@ OpenAI API-compatible wrapper for Claude Code. Drop it in front of any OpenAI cl
 
 ## Version
 
-**Current:** 2.9.6
+**Current:** 2.9.7
 
 Highlights of recent releases (full history in [CHANGELOG.md](./CHANGELOG.md)):
 
+- **2.9.7** - Active Claude-CLI auth health probe (10-minute default, configurable via `CLI_AUTH_PROBE_INTERVAL_SECONDS`). `/v1/chat/completions` and `/v1/messages` now return **HTTP 401** with `error.type=authentication_error` when the bundled CLI loses its session, so OpenAI / Anthropic client libraries route the failure as `AuthenticationError` instead of a transient 502/503. `/v1/auth/status` exposes the new `cli_health` block. Defense-in-depth: `error_during_execution` results whose stderr matches `Not logged in / Please run /login / Invalid API key` also map to 401 and seed `cli_health` failed.
 - **2.9.6** - `claude-agent-sdk` 0.1.68 -> 0.1.81. urllib3 floor raised to 2.7.0 and `python-multipart` to 0.0.27 to close three HIGH Dependabot alerts. Pulled in upstream `RichardAtCT#46` so `/v1/models` returns Anthropic's live catalogue when `ANTHROPIC_API_KEY` is set (cached, with a short error TTL so transient outages do not stick for an hour). `check-sdk-version.yml` now opens a draft bump PR on drift instead of writing only to the job summary.
 - **2.9.x** (earlier) - CodeQL hardening: sanitised error responses (no more `str(e)` to clients), `filter_content` rewrite against polynomial ReDoS, `/v1/debug/request` gated behind `DEBUG_MODE`/`VERBOSE`, workflow permissions pinned. Image trimmed via `poetry install --only main` and a real `.dockerignore`.
 - **2.8.x** - Security dep bumps, breaker defaults loosened, CLI stderr capture, structured-log state unmasked.
@@ -17,7 +18,7 @@ Highlights of recent releases (full history in [CHANGELOG.md](./CHANGELOG.md)):
 
 ## Status
 
-Production ready. **664 tests passing (31 skipped)**. Streaming works. Sessions work. JSON mode works. Function calling works. Tools are off by default for speed - pass `enable_tools: true` to turn them on. Auth supports API key, Bedrock, Vertex AI, and CLI.
+Production ready. **673 tests passing (31 skipped)**. Streaming works. Sessions work. JSON mode works. Function calling works. Tools are off by default for speed - pass `enable_tools: true` to turn them on. Auth supports API key, Bedrock, Vertex AI, and CLI.
 
 ## Quick Start
 
@@ -189,6 +190,7 @@ Listed in roughly the order you will reach for them.
 | `ANTHROPIC_MODELS_URL` | Override the live models endpoint. Point at a proxy or staging URL during testing. | `https://api.anthropic.com/v1/models` |
 | `ANTHROPIC_VERSION` | `anthropic-version` header sent to the Models API. | `2023-06-01` |
 | `ANTHROPIC_BETA` / `ANTHROPIC_BETA_HEADER` | Optional `anthropic-beta` header forwarded to the Models API for beta-gated features. | - |
+| `CLI_AUTH_PROBE_INTERVAL_SECONDS` | Background CLI-auth probe cadence when `CLAUDE_AUTH_METHOD=claude_cli`. Each probe is a 1-turn `query` (~$0.001 at Sonnet pricing); a failure sets `cli_health.ok` to `false` so `/v1/chat/completions` and `/v1/messages` return 401 instead of a generic 502/503. Set `0` to disable. Ignored for non-cli auth methods. | `600` (10 min) |
 | `DEBUG_MODE` | Enable debug logging and unlock `/v1/debug/request` | `false` |
 | `VERBOSE` | Same unlock effect on `/v1/debug/request` | `false` |
 | `CORS_ORIGINS` | Allowed CORS origins (JSON array) | `["*"]` |
@@ -451,7 +453,7 @@ With `json_object` mode, the wrapper adds system prompt instructions for JSON ou
 ## Testing
 
 ```bash
-# Run the full test suite (664 tests, ~3 s on a laptop)
+# Run the full test suite (673 tests, ~3 s on a laptop)
 poetry run pytest tests/
 
 # Quick endpoint test (server must be running)
diff --git a/pyproject.toml b/pyproject.toml
index bad19a8..05e6c9d 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "claude-code-openai-wrapper"
-version = "2.9.6"
+version = "2.9.7"
 description = "OpenAI API-compatible wrapper for Claude Code"
 authors = ["Richard Atkinson <richardatk01@gmail.com>"]
 readme = "README.md"
diff --git a/src/__init__.py b/src/__init__.py
index ace4f50..32465bc 100644
--- a/src/__init__.py
+++ b/src/__init__.py
@@ -1,3 +1,3 @@
 """Claude Code OpenAI Wrapper - A FastAPI-based OpenAI-compatible API for Claude Code."""
 
-__version__ = "2.9.6"
+__version__ = "2.9.7"
diff --git a/src/auth.py b/src/auth.py
index 7b23e69..ed7f83d 100644
--- a/src/auth.py
+++ b/src/auth.py
@@ -1,5 +1,7 @@
 import os
 import logging
+from dataclasses import dataclass, field
+from datetime import datetime, timezone
 from typing import Optional, Dict, Any, Tuple
 from fastapi import HTTPException, Request
 from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
@@ -284,3 +286,106 @@ def get_claude_code_auth_info() -> Dict[str, Any]:
         "status": auth_manager.auth_status,
         "environment_variables": list(auth_manager.get_claude_code_env_vars().keys()),
     }
+
+
+# Markers the Claude CLI emits to stderr (or wraps in SDK exceptions) when its
+# stored session is missing or expired. Matched case-insensitively as substrings
+# of an exception's str() or of a joined error_message + stderr_tail blob.
+_CLI_AUTH_FAILURE_MARKERS = (
+    "not logged in",
+    "please run /login",
+    "invalid api key",
+    "authentication_error",
+    "401",
+)
+
+
+def _classify_probe_error(blob: str) -> str:
+    lowered = (blob or "").lower()
+    if any(marker in lowered for marker in _CLI_AUTH_FAILURE_MARKERS):
+        return "auth_failure"
+    return "unknown"
+
+
+@dataclass
+class CliHealth:
+    """Latest observed health of the Claude CLI auth path.
+
+    The probe loop (run only when auth_method == 'claude_cli') refreshes this
+    on an interval; the chat / messages handlers consult `ok` to short-circuit
+    with HTTP 401 before round-tripping through the SDK.
+    """
+
+    ok: bool = True
+    last_probed_at: Optional[datetime] = None
+    last_ok_at: Optional[datetime] = None
+    error_kind: Optional[str] = None
+    error_message: Optional[str] = None
+
+    def mark_ok(self) -> None:
+        now = datetime.now(timezone.utc)
+        self.ok = True
+        self.last_probed_at = now
+        self.last_ok_at = now
+        self.error_kind = None
+        self.error_message = None
+
+    def mark_failed(self, kind: str, message: str) -> None:
+        self.ok = False
+        self.last_probed_at = datetime.now(timezone.utc)
+        self.error_kind = kind
+        # Trim to keep logs and /v1/auth/status compact.
+        self.error_message = (message or "")[:500]
+
+    def as_dict(self) -> Dict[str, Any]:
+        return {
+            "ok": self.ok,
+            "last_probed_at": self.last_probed_at.isoformat() if self.last_probed_at else None,
+            "last_ok_at": self.last_ok_at.isoformat() if self.last_ok_at else None,
+            "error_kind": self.error_kind,
+            "error_message": self.error_message,
+        }
+
+
+cli_health = CliHealth()
+
+
+async def probe_cli_auth(cli=None) -> bool:
+    """Run a 1-turn CLI probe and update `cli_health`.
+
+    Reuses `claude_cli.verify_cli()` (which already issues a short
+    `query(prompt="Hello", max_turns=1)`); on any exception, classifies the
+    failure as auth_failure if the marker set matches, else unknown.
+
+    `cli` is the ClaudeCodeCLI instance to probe. When omitted, lazy-resolves
+    the module-level singleton from `src.main` so a periodic probe exercises
+    exactly the same instance that real requests use. The parameter is
+    primarily there for tests, which inject a mock.
+
+    Returns True when the probe succeeded, False otherwise. Never raises.
+    """
+    if cli is None:
+        # Lazy import - src.main imports src.auth at module load.
+        from src import main as _main  # noqa: WPS433 - intentional lazy import
+
+        cli = _main.claude_cli
+
+    try:
+        ok = await cli.verify_cli()
+        if ok:
+            cli_health.mark_ok()
+            logger.info("cli_auth_probe_ok")
+            return True
+        cli_health.mark_failed("unknown", "verify_cli returned False")
+        logger.warning("cli_auth_probe_failed kind=unknown reason=verify_cli_returned_false")
+        return False
+    except Exception as exc:  # noqa: BLE001 - the probe must never propagate
+        message = str(exc)
+        kind = _classify_probe_error(message)
+        cli_health.mark_failed(kind, message)
+        logger.warning(
+            "cli_auth_probe_failed kind=%s error=%s",
+            kind,
+            message[:200].replace("\n", " "),
+        )
+        return False
diff --git a/src/main.py b/src/main.py
index 0ce896a..cc24b29 100644
--- a/src/main.py
+++ b/src/main.py
@@ -52,7 +52,17 @@
     convert_tool_messages,
 )
 from src.cpu_watchdog import cpu_watchdog
-from src.auth import verify_api_key, security, validate_claude_code_auth, get_claude_code_auth_info
+from src.auth import (
+    verify_api_key,
+    security,
+    validate_claude_code_auth,
+    get_claude_code_auth_info,
+)
+
+# Import the module (not the singletons) so reloads of src.auth in tests stay
+# in sync with main.py's view of _auth.cli_health / auth_manager / probe_cli_auth.
+from src import auth as _auth
+from src.auth import _classify_probe_error  # pure function, safe to bind once
 from src.parameter_validator import ParameterValidator, CompatibilityReporter
 from src.session_manager import session_manager
 from src.tool_manager import tool_manager
@@ -435,17 +445,21 @@ async def lifespan(app: FastAPI):
 
         if cli_verified:
             logger.info("✅ Claude Agent SDK verified successfully")
+            _auth.cli_health.mark_ok()
         else:
             logger.warning("⚠️  Claude Agent SDK verification returned False")
             logger.warning("The server will start, but requests may fail.")
+            _auth.cli_health.mark_failed("unknown", "startup verify_cli returned False")
     except asyncio.TimeoutError:
         logger.warning("⚠️  Claude Agent SDK verification timed out (30s)")
         logger.warning("This may indicate network issues or SDK configuration problems.")
         logger.warning("The server will start, but first request may be slow.")
+        _auth.cli_health.mark_failed("unknown", "startup verify_cli timed out after 30s")
     except Exception as e:
         logger.error(f"⚠️  Claude Agent SDK verification failed: {e}")
         logger.warning("The server will start, but requests may fail.")
         logger.warning("Check that Claude Code CLI is properly installed and authenticated.")
+        _auth.cli_health.mark_failed(_classify_probe_error(str(e)), str(e))
 
     # Log debug information if debug mode is enabled
     if DEBUG_MODE or VERBOSE:
@@ -490,6 +504,28 @@ async def cost_cleanup_loop():
 
     cost_cleanup_task = asyncio.get_running_loop().create_task(cost_cleanup_loop())
 
+    # Periodic CLI auth probe. Only runs when auth_method == claude_cli because
+    # API key / Bedrock / Vertex failures already surface as
+    # assistant_authentication_failed via _ASSISTANT_ERROR_STATUS. Set the
+    # interval to 0 to disable. Each probe is a 1-turn query and costs ~$0.001
+    # at Sonnet pricing; default 10 min keeps the bill low while still bounding
+    # the stale window.
+    async def cli_auth_probe_loop():
+        interval = int(os.getenv("CLI_AUTH_PROBE_INTERVAL_SECONDS", "600"))
+        if interval <= 0:
+            logger.info("cli_auth_probe disabled (interval=%s)", interval)
+            return
+        try:
+            while True:
+                await asyncio.sleep(interval)
+                if _auth.auth_manager.auth_method != "claude_cli":
+                    continue
+                await _auth.probe_cli_auth()
+        except asyncio.CancelledError:
+            pass
+
+    cli_auth_probe_task = asyncio.get_running_loop().create_task(cli_auth_probe_loop())
+
     # Start CPU watchdog (Linux/Docker only)
     cpu_watchdog.start()
 
@@ -497,6 +533,7 @@ async def cost_cleanup_loop():
 
     cpu_watchdog.stop()
     cost_cleanup_task.cancel()
+    cli_auth_probe_task.cancel()
 
     # Cleanup on shutdown
     logger.info("Shutting down session manager...")
@@ -770,7 +807,12 @@ def _build_sdk_error_response(request_id: str, model: str, err: ClaudeResultErro
     """Non-recoverable SDK result: return 502 so clients know to retry with
     backoff. Structured body includes the SDK subtype and any errors so
     callers can tell the difference between a max-turns overflow and a
-    transport failure."""
+    transport failure.
+
+    Defense-in-depth for the CLI-auth probe loop: when stderr_tail (or the
+    error_message) matches the known auth-failure markers, return 401 instead
+    and seed _auth.cli_health so the next request fails fast without a round-trip.
+    """
     logger.error(
         _kv(
             "claude_sdk_error",
@@ -786,6 +828,33 @@ def _build_sdk_error_response(request_id: str, model: str, err: ClaudeResultErro
         logger.error(
             f"claude_sdk_error stderr tail (request_id={request_id}):\n" f"{err.stderr_tail}"
         )
+
+    blob = " ".join(filter(None, [err.error_message, err.stderr_tail]))
+    if _classify_probe_error(blob) == "auth_failure":
+        _auth.cli_health.mark_failed("auth_failure", blob)
+        logger.warning(
+            _kv(
+                "claude_sdk_cli_auth_failed",
+                request_id=request_id,
+                model=model,
+                subtype=err.subtype,
+            )
+        )
+        return JSONResponse(
+            status_code=401,
+            content={
+                "error": {
+                    "message": (
+                        "Claude CLI is not authenticated. Run `claude /login` "
+                        "on the wrapper host and restart, or set "
+                        "ANTHROPIC_API_KEY."
+                    ),
+                    "type": "authentication_error",
+                    "code": "claude_cli_not_authenticated",
+                }
+            },
+        )
+
     return JSONResponse(
         status_code=502,
         content={
@@ -1351,6 +1420,59 @@ async def generate_streaming_response(
         yield f"data: {json.dumps(error_chunk)}\n\n"
 
 
+def _check_cli_auth_or_401() -> Optional[JSONResponse]:
+    """Gate request handlers on the latest CLI-auth probe + the auth manager.
+
+    Returns a JSONResponse with HTTP 401 (or 503 for non-cli auth methods)
+    when authentication is unhealthy, else None.
+
+    Returning a JSONResponse directly - rather than raising HTTPException -
+    is intentional: the global http_exception_handler wraps all detail bodies
+    as `error.type=api_error`, which clobbers the OpenAI-shaped
+    `authentication_error` literal that clients route on.
+    """
+    if _auth.auth_manager.auth_method == "claude_cli" and not _auth.cli_health.ok:
+        return JSONResponse(
+            status_code=401,
+            content={
+                "error": {
+                    "message": (
+                        "Claude CLI authentication is not healthy. "
+                        "Run `claude /login` on the wrapper host and restart, "
+                        "or set ANTHROPIC_API_KEY."
+                    ),
+                    "type": "authentication_error",
+                    "code": "claude_cli_not_authenticated",
+                    "last_probed_at": (
+                        _auth.cli_health.last_probed_at.isoformat()
+                        if _auth.cli_health.last_probed_at
+                        else None
+                    ),
+                    "error_kind": _auth.cli_health.error_kind,
+                    "error_message": _auth.cli_health.error_message,
+                }
+            },
+        )
+
+    auth_valid, auth_info = validate_claude_code_auth()
+    if not auth_valid:
+        status = 401 if _auth.auth_manager.auth_method == "claude_cli" else 503
+        return JSONResponse(
+            status_code=status,
+            content={
+                "error": {
+                    "message": "Claude Code authentication failed",
+                    "type": "authentication_error" if status == 401 else "service_unavailable",
+                    "code": "claude_cli_not_authenticated" if status == 401 else "auth_unavailable",
+                    "errors": auth_info.get("errors", []),
+                    "method": auth_info.get("method", "none"),
+                }
+            },
+        )
+
+    return None
+
+
 @app.post("/v1/chat/completions")
 @rate_limit_endpoint("chat")
 async def chat_completions(
@@ -1362,17 +1484,10 @@ async def chat_completions(
     # Check FastAPI API key if configured
     await verify_api_key(request, credentials)
 
-    # Validate Claude Code authentication
-    auth_valid, auth_info = validate_claude_code_auth()
-
-    if not auth_valid:
-        error_detail = {
-            "message": "Claude Code authentication failed",
-            "errors": auth_info.get("errors", []),
-            "method": auth_info.get("method", "none"),
-            "help": "Check /v1/auth/status for detailed authentication information",
-        }
-        raise HTTPException(status_code=503, detail=error_detail)
+    # Gate on Claude CLI probe + config-level auth validation.
+    auth_block = _check_cli_auth_or_401()
+    if auth_block is not None:
+        return auth_block
 
     # Circuit breaker check: if the SDK has been failing at >50% for a minute,
     # fail-fast with 503 instead of forwarding another doomed request. The
@@ -1672,17 +1787,10 @@ async def anthropic_messages(
     # Check FastAPI API key if configured
     await verify_api_key(request, credentials)
 
-    # Validate Claude Code authentication
-    auth_valid, auth_info = validate_claude_code_auth()
-
-    if not auth_valid:
-        error_detail = {
-            "message": "Claude Code authentication failed",
-            "errors": auth_info.get("errors", []),
-            "method": auth_info.get("method", "none"),
-            "help": "Check /v1/auth/status for detailed authentication information",
-        }
-        raise HTTPException(status_code=503, detail=error_detail)
+    # Gate on Claude CLI probe + config-level auth validation.
+    auth_block = _check_cli_auth_or_401()
+    if auth_block is not None:
+        return auth_block
 
     try:
         logger.info(f"Anthropic Messages API request: model={request_body.model}")
@@ -2746,6 +2854,7 @@ async def get_auth_status(request: Request):
 
     return {
         "claude_code_auth": auth_info,
+        "cli_health": _auth.cli_health.as_dict(),
         "server_info": {
             "api_key_required": bool(active_api_key),
             "api_key_source": (
diff --git a/tests/test_anthropic_messages.py b/tests/test_anthropic_messages.py
index 1f8d303..d368e44 100644
--- a/tests/test_anthropic_messages.py
+++ b/tests/test_anthropic_messages.py
@@ -211,5 +211,39 @@ def test_response_format_matches_anthropic_sdk(self):
         assert "output_tokens" in result["usage"]
 
 
+class TestAnthropicMessagesCliHealthGate:
+    """In-process gate check: /v1/messages must return 401 (not 503) when the
+    Claude CLI probe failed, so Anthropic SDK clients (VC and similar) route
+    the failure as AuthenticationError instead of a transient server error.
+    """
+
+    def test_messages_returns_401_when_cli_health_unhealthy(self, monkeypatch):
+        from fastapi.testclient import TestClient
+
+        from src import main as main_mod
+        from src import auth as auth_mod
+
+        monkeypatch.setattr(auth_mod.auth_manager, "auth_method", "claude_cli", raising=False)
+        auth_mod.cli_health.mark_failed("auth_failure", "Not logged in - Please run /login")
+
+        try:
+            client = TestClient(main_mod.app)
+            resp = client.post(
+                "/v1/messages",
+                json={
+                    "model": "claude-sonnet-4-6",
+                    "max_tokens": 16,
+                    "messages": [{"role": "user", "content": "hello"}],
+                },
+            )
+        finally:
+            auth_mod.cli_health.mark_ok()
+
+        assert resp.status_code == 401, resp.text
+        body = resp.json()
+        assert body["error"]["type"] == "authentication_error"
+        assert body["error"]["code"] == "claude_cli_not_authenticated"
+
+
 if __name__ == "__main__":
     pytest.main([__file__, "-v"])
diff --git a/tests/test_auth_unit.py b/tests/test_auth_unit.py
index ba9ec92..2f50fb2 100644
--- a/tests/test_auth_unit.py
+++ b/tests/test_auth_unit.py
@@ -491,6 +491,50 @@ def test_returns_runtime_key_when_available(self):
                 assert result in ["env-key", "runtime-key"]
 
 
+class TestProbeCliAuth:
+    """Cover the periodic CLI-auth probe in src.auth.probe_cli_auth()."""
+
+    @pytest.mark.asyncio
+    async def test_probe_cli_auth_success_marks_ok(self):
+        import src.auth
+
+        importlib.reload(src.auth)
+        fake_cli = MagicMock()
+        fake_cli.verify_cli = AsyncMock(return_value=True)
+        result = await src.auth.probe_cli_auth(cli=fake_cli)
+        assert result is True
+        assert src.auth.cli_health.ok is True
+        assert src.auth.cli_health.last_ok_at is not None
+        assert src.auth.cli_health.error_kind is None
+
+    @pytest.mark.asyncio
+    async def test_probe_cli_auth_marker_in_stderr_marks_auth_failure(self):
+        import src.auth
+
+        importlib.reload(src.auth)
+        fake_cli = MagicMock()
+        fake_cli.verify_cli = AsyncMock(
+            side_effect=RuntimeError("Not logged in - Please run /login")
+        )
+        result = await src.auth.probe_cli_auth(cli=fake_cli)
+        assert result is False
+        assert src.auth.cli_health.ok is False
+        assert src.auth.cli_health.error_kind == "auth_failure"
+        assert "Not logged in" in (src.auth.cli_health.error_message or "")
+
+    @pytest.mark.asyncio
+    async def test_probe_cli_auth_generic_error_marks_unknown(self):
+        import src.auth
+
+        importlib.reload(src.auth)
+        fake_cli = MagicMock()
+        fake_cli.verify_cli = AsyncMock(side_effect=RuntimeError("connection refused"))
+        result = await src.auth.probe_cli_auth(cli=fake_cli)
+        assert result is False
+        assert src.auth.cli_health.ok is False
+        assert src.auth.cli_health.error_kind == "unknown"
+
+
 # Reset module state after tests
 @pytest.fixture(autouse=True)
 def reset_auth_module():
diff --git a/tests/test_endpoints.py b/tests/test_endpoints.py
index 3592818..7b7a913 100644
--- a/tests/test_endpoints.py
+++ b/tests/test_endpoints.py
@@ -125,3 +125,37 @@ def main():
 
 if __name__ == "__main__":
     main()
+
+
+class TestChatCompletionsCliHealthGate:
+    """In-process gate check: when auth_method=claude_cli and the latest probe
+    failed, /v1/chat/completions must return 401 with an OpenAI-shaped
+    authentication_error body, without touching the SDK.
+    """
+
+    def test_chat_completions_returns_401_when_cli_health_unhealthy(self, monkeypatch):
+        from fastapi.testclient import TestClient
+
+        from src import main as main_mod
+        from src import auth as auth_mod
+
+        monkeypatch.setattr(auth_mod.auth_manager, "auth_method", "claude_cli", raising=False)
+        auth_mod.cli_health.mark_failed("auth_failure", "Not logged in - Please run /login")
+
+        try:
+            client = TestClient(main_mod.app)
+            resp = client.post(
+                "/v1/chat/completions",
+                json={
+                    "model": "claude-sonnet-4-6",
+                    "messages": [{"role": "user", "content": "hello"}],
+                },
+            )
+        finally:
+            auth_mod.cli_health.mark_ok()
+
+        assert resp.status_code == 401, resp.text
+        body = resp.json()
+        assert body["error"]["type"] == "authentication_error"
+        assert body["error"]["code"] == "claude_cli_not_authenticated"
+        assert body["error"]["error_kind"] == "auth_failure"
diff --git a/tests/test_error_path_unit.py b/tests/test_error_path_unit.py
index 1a06201..eebb8a0 100644
--- a/tests/test_error_path_unit.py
+++ b/tests/test_error_path_unit.py
@@ -147,3 +147,61 @@ def test_assistant_rate_limit_raises(self):
             cli.parse_claude_message(messages)
         assert excinfo.value.subtype == "assistant_rate_limit"
         assert "rate_limit" in excinfo.value.errors
+
+
+class TestCliAuthFailureToFourOhOne:
+    """Defense-in-depth: when ClaudeResultError carries CLI auth markers in
+    its stderr_tail or error_message, _build_sdk_error_response must return
+    HTTP 401 instead of 502, with an OpenAI-shaped authentication_error body.
+    """
+
+    def test_sdk_error_with_auth_marker_in_stderr_maps_to_401(self):
+        err = ClaudeResultError(
+            subtype="error_during_execution",
+            num_turns=0,
+            errors=None,
+            stop_reason=None,
+            error_message=None,
+            stderr_tail="Not logged in - Please run /login",
+        )
+        resp = _build_sdk_error_response("req-cli-auth", "claude-sonnet-4-6", err)
+        assert resp.status_code == 401
+        body = _body(resp)
+        assert body["error"]["type"] == "authentication_error"
+        assert body["error"]["code"] == "claude_cli_not_authenticated"
+
+    def test_sdk_error_with_invalid_api_key_in_message_maps_to_401(self):
+        err = ClaudeResultError(
+            subtype="error_during_execution",
+            errors=["Invalid API key"],
+            error_message="Invalid API key",
+        )
+        resp = _build_sdk_error_response("req-cli-key", "claude-sonnet-4-6", err)
+        assert resp.status_code == 401
+        body = _body(resp)
+        assert body["error"]["type"] == "authentication_error"
+
+    def test_sdk_error_without_auth_marker_still_502(self):
+        err = ClaudeResultError(
+            subtype="error_during_execution",
+            errors=["upstream timeout"],
+            stderr_tail="connection refused",
+        )
+        resp = _build_sdk_error_response("req-generic", "claude-sonnet-4-6", err)
+        assert resp.status_code == 502
+        body = _body(resp)
+        assert body["error"]["type"] == "upstream_sdk_error"
+
+    def test_sdk_error_with_auth_marker_seeds_cli_health(self):
+        import src.auth
+
+        src.auth.cli_health.mark_ok()
+        assert src.auth.cli_health.ok is True
+
+        err = ClaudeResultError(
+            subtype="error_during_execution",
+            stderr_tail="Not logged in - Please run /login",
+        )
+        _build_sdk_error_response("req-cli-seed", "claude-sonnet-4-6", err)
+        assert src.auth.cli_health.ok is False
+        assert src.auth.cli_health.error_kind == "auth_failure"