Skip to content

fix(v2.9.7): active Claude-CLI auth probe + 401 on cli auth failure#48

Closed
ttlequals0 wants to merge 49 commits into
RichardAtCT:mainfrom
ttlequals0:fix/v2.9.7-cli-auth-probe
Closed

fix(v2.9.7): active Claude-CLI auth probe + 401 on cli auth failure#48
ttlequals0 wants to merge 49 commits into
RichardAtCT:mainfrom
ttlequals0:fix/v2.9.7-cli-auth-probe

Conversation

@ttlequals0
Copy link
Copy Markdown

Summary

  • Active CLI-auth health probe: when CLAUDE_AUTH_METHOD=claude_cli, the lifespan now runs a periodic background coroutine that calls the existing claude_cli.verify_cli() (a 1-turn query(prompt="Hello", max_turns=1)) and updates a shared cli_health state. Default interval 600s, configurable via CLI_AUTH_PROBE_INTERVAL_SECONDS, set 0 to disable. Skipped for non-cli auth methods.
  • POST /v1/chat/completions and POST /v1/messages now return HTTP 401 with error.type=authentication_error and error.code=claude_cli_not_authenticated when the most recent probe failed. OpenAI / Anthropic client libraries route 401 as AuthenticationError, giving callers a durable signal instead of a transient 502/503.
  • /v1/auth/status exposes the new cli_health block: ok, last_probed_at, last_ok_at, error_kind (auth_failure | unknown | null), error_message.
  • Defense-in-depth: _build_sdk_error_response scans stderr_tail + error_message for known CLI-auth markers (not logged in, please run /login, invalid api key, authentication_error, 401). On a match it returns 401 instead of 502 and seeds cli_health failed so the next request fails fast.
  • Auth-failure responses bypass the global http_exception_handler (which would rewrite bodies to error.type=api_error) by returning JSONResponse directly.

Version

2.9.7 (bumped in src/__init__.py and pyproject.toml).

Test plan

  • Full suite: 673 passed, 31 skipped (was 664/31 on v2.9.6; +9 new tests).
    • TestProbeCliAuth (3): success / Not logged in stderr / generic exception classification.
    • TestChatCompletionsCliHealthGate + TestAnthropicMessagesCliHealthGate (2): in-process TestClient assertions on the 401 surface.
    • TestCliAuthFailureToFourOhOne (4): stderr-mapping including a 502 regression guard and a real request seeds cli_health.
  • Manual reproduction with CLAUDE_CONFIG_DIR=$(mktemp -d) HOME=$EMPTY CLAUDE_AUTH_METHOD=claude_cli: both /v1/chat/completions and /v1/messages return HTTP 401 with the OpenAI-shaped body before any SDK round-trip; /v1/auth/status shows cli_health.ok=false.
  • Healthy path manually verified: /v1/auth/status returns cli_health.ok=true with last_ok_at populated.

Docker

Image ttlequals0/claude-code-openai-wrapper:2.9.7 already pushed (sha256:39fc12f1dd5fa15b8752a384f03b839f89684d356bc5de40ab05f477975ca22f); :latest repointed to the same digest. Trivy: 14 HIGH/CRITICAL CVEs, all unfixed Debian base-OS packages, identical to v2.9.6's set per prior triage. Portainer webhook fired.

Order-of-operations note: the build-and-push doc puts PR creation as Step 10 (after image push); for this repo the better sequence is PR -> CodeQL green -> build/push, so CodeQL gates the image. Saved as feedback for future runs.

ttlequals0 and others added 30 commits January 30, 2026 19:52
- Add response_format parameter for OpenAI-compatible JSON mode
- Add ModelService for dynamic model fetching from Anthropic API
- Add claude-opus-4-5-20251101 model to supported models
- Add JSON extraction and enforcement methods to MessageAdapter
- Update docker-compose.yml to use published image
- Bump version to 2.3.0
Claude Code SDK was ignoring JSON_MODE_INSTRUCTION in the system prompt
and returning conversational text instead of JSON. Added JSON_PROMPT_SUFFIX
constant that is now appended to the user prompt alongside the system
prompt instruction, ensuring the model follows JSON output requirements.

Changes:
- Add JSON_PROMPT_SUFFIX constant to message_adapter.py
- Append suffix to user prompt in both streaming and non-streaming paths
- Update log messages to reflect dual-prompt approach
- Bump version to 2.3.1
- Updated JSON_MODE_INSTRUCTION with explicit first/last character rules
- Added explicit prohibition of markdown code blocks in instructions
- Updated JSON_PROMPT_SUFFIX with more concise output format
- Added log_json_structure() helper for debugging JSON responses
- Added boundary and structure logging in streaming/non-streaming paths
…-models

Add JSON response format support and dynamic model fetching
- Improve JSON mode instructions with numbered rules and explicit
  prohibition of preambles
- Add COMMON_PREAMBLES constant with 19 common Claude preambles
- Implement balanced brace/bracket matching algorithm that handles
  escaped quotes and braces inside strings correctly
- Add JsonExtractionResult dataclass and extract_json_with_metadata()
  for detailed extraction tracking
- Add enforce_json_format_with_metadata() for metadata-enabled
  JSON enforcement
- Add _log_extraction_diagnostics() for debugging extraction failures
- Create optional request deduplication cache with LRU eviction and TTL
- Add cache management endpoints: GET /v1/cache/stats, POST /v1/cache/clear
- Update version to 2.4.0
- Add comprehensive unit tests for all new functionality

The JSON extraction priority order is now:
1. Pure JSON (fast path)
2. Preamble removal + parse
3. Markdown code block extraction
4. Balanced brace/bracket matching
5. First-to-last fallback
- Add POST /v1/models/refresh to refresh models from Anthropic API at runtime
- Add GET /v1/models/status for service observability (source, count, last refresh)
- Track model source (api/fallback) and last refresh timestamp in ModelService
- Add comprehensive unit tests for refresh functionality

Version 2.4.1
- Model refresh now respects CLAUDE_AUTH_METHOD configuration
- Only 'anthropic' auth supports dynamic API fetch; others use static fallback
- Added auth_method field to /v1/models/refresh and /v1/models/status responses
- Updated CLAUDE_MODELS: added claude-opus-4-6, removed claude-opus-4-5-20250929
- Added model status/refresh endpoint cards to landing page UI
- Comprehensive unit tests for all auth methods
feat: JSON extraction improvements, request cache, and dynamic model refresh
…5.0)

- Add model metadata (context windows, output limits) and pricing from source
- Add claude-sonnet-4-6 and re-enable 3.x models confirmed supported
- Expand tool registry from 15 to 33 tools matching actual inventory
- Add retry module with exponential backoff and Opus-to-Sonnet fallback
- Add cost tracker with per-session accumulation and auto-cleanup
- Add X-Claude-Effort and X-Claude-Thinking header support
- Add model-specific max_tokens validation
- Extract shared options-building helper for streaming/non-streaming paths
- Rewrite README, trim historical migration docs
feat: v2.5.0 - models, tools, pricing from open-sourced Claude Code
- Replace generic landing page with clean utilitarian design
- Fix GitHub URL to ttlequals0/claude-code-openai-wrapper
- Fix OpenAPI docs version (was hardcoded 1.0.0, now dynamic)
- Add all 25 endpoints to landing page grouped by category
- Drop Pico CSS, use DM Sans + JetBrains Mono typography
- Bump version to 2.5.1
…date

feat: redesign landing page and update API docs (v2.5.1)
- Add JSON response mode documentation with usage example
- Expand API endpoints table from 14 to 25 entries, grouped by category
- Fix Installation git clone URL (was RichardAtCT, now ttlequals0)
- Bump version reference to 2.5.1
- Fix SDK version reference (removed pinned version, installed is 0.1.26)
- Fix production command (main.py does not exist, use claude-wrapper)
- Fix test command path (tests/test_endpoints.py not test_endpoints.py)
- Fix MAX_TIMEOUT units in Docker table (ms not seconds, 600000 not 300)
- Add missing env vars to config table (DEBUG_MODE, CORS_ORIGINS, etc.)
- Update temperature/top_p limitation (now applied via system prompt)
- Tighten prose, remove AI-ish phrasing
- Sync pyproject.toml version to 2.5.1
….5.2)

- Remove BashOutput, KillShell, SlashCommand (not in Claude Code registry)
- Add Brief, Config, ListPeers, REPL, Sleep, Monitor, SendUserFile,
  PushNotification, ListMcpResources, ReadMcpResource, VerifyPlanExecution
- Tool count: 33 -> 41, verified against Claude Code src/tools.ts
fix: remove fake tools, add missing real tools (v2.5.2)
…2.6.0)

- OpenAI function calling simulation via system prompt injection and
  response parsing (tools/tool_choice parameters, multi-turn support)
- JSON schema in response_format (type=json_schema with schema definition)
- Real-time streaming markdown fence stripping (JsonFenceStripper)
- CPU watchdog for Docker/Linux (WATCHDOG_ENABLED=true to enable)
- New models: ToolCall, FunctionCall, ToolDefinition, JsonSchema
- Message model extended with tool role, tool_calls, tool_call_id
- Extract duplicated JSON schema instructions to MessageAdapter.JSON_SCHEMA_TEMPLATE
- Remove no-op fence_str=fence assignments in JsonFenceStripper
- Fix filter_content(None) to return "" instead of None (type safety)
- Fix greedy bare JSON regex in parse_tool_calls (use json.loads validation)
- Add log when tools + json_mode both active in streaming
- Add precise return type annotation to parse_tool_calls
- Add tests: json_schema model, dict message conversion, nested array parsing
…-json-schema

feat: function calling, JSON schema, fence stripping, watchdog (v2.6.0)
…87;187;187m �[39msupported�[38;2;187;187;187m �[39mmodel�[38;2;187;187;187m �[39mlist�[38;2;102;102;102m,�[39m�[38;2;187;187;187m �[39madd�[38;2;187;187;187m �[39mClaude�[38;2;187;187;187m �[39mOpus�[38;2;187;187;187m �[39m�[38;2;102;102;102m4.7�[39m�[38;2;187;187;187m �[39m�[38;2;102;102;102m(�[39mv2�[38;2;102;102;102m.�[39m�[38;2;102;102;102m7.0�[39m�[38;2;102;102;102m)�[39m

Align�[38;2;187;187;187m �[39mCLAUDE_MODELS�[38;2;102;102;102m,�[39m�[38;2;187;187;187m �[39mMODEL_METADATA�[38;2;102;102;102m,�[39m�[38;2;187;187;187m �[39mMODEL_PRICING�[38;2;102;102;102m,�[39m�[38;2;187;187;187m �[39mand�[38;2;187;187;187m �[39mMODEL_FALLBACK_MAP
�[38;2;170;34;255;01mwith�[39;00m�[38;2;187;187;187m �[39mthe�[38;2;187;187;187m �[39mAnthropic�[38;2;187;187;187m �[39mmodels�[38;2;187;187;187m �[39mdocs�[38;2;187;187;187m �[39m�[38;2;170;34;255;01mas�[39;00m�[38;2;187;187;187m �[39mof�[38;2;187;187;187m �[39m�[38;2;102;102;102m2026�[39m�[38;2;102;102;102m-�[39m�[38;2;102;102;102m04�[39m�[38;2;102;102;102m-�[39m�[38;2;102;102;102m16�[39m�[38;2;102;102;102m.�[39m�[38;2;187;187;187m �[39mRemove�[38;2;187;187;187m �[39mthree�[38;2;187;187;187m �[39mmodels�[38;2;187;187;187m �[39malready
retired�[38;2;187;187;187m �[39mat�[38;2;187;187;187m �[39mthe�[38;2;187;187;187m �[39mAPI�[38;2;187;187;187m �[39mand�[38;2;187;187;187m �[39madd�[38;2;187;187;187m �[39mthe�[38;2;187;187;187m �[39m�[38;2;170;34;255;01mnew�[39;00m�[38;2;187;187;187m �[39mflagship�[38;2;187;187;187m �[39mOpus�[38;2;187;187;187m �[39m�[38;2;102;102;102m4.7�[39m�[38;2;102;102;102m.�[39m

�[38;2;102;102;102m-�[39m�[38;2;187;187;187m �[39mAdd�[38;2;187;187;187m �[39mclaude�[38;2;102;102;102m-�[39mopus�[38;2;102;102;102m-�[39m�[38;2;102;102;102m4�[39m�[38;2;102;102;102m-�[39m�[38;2;102;102;102m7�[39m�[38;2;187;187;187m �[39m�[38;2;102;102;102m(�[39m�[38;2;102;102;102m1�[39mM�[38;2;187;187;187m �[39mcontext�[38;2;102;102;102m,�[39m�[38;2;187;187;187m �[39m�[38;2;102;102;102m128�[39mK�[38;2;187;187;187m �[39mmax�[38;2;187;187;187m �[39moutput�[38;2;102;102;102m,�[39m�[38;2;187;187;187m �[39m$5�[38;2;102;102;102m/�[39m$25�[38;2;187;187;187m �[39mper�[38;2;187;187;187m �[39mMTok�[38;2;102;102;102m)�[39m
�[38;2;102;102;102m-�[39m�[38;2;187;187;187m �[39mRemove�[38;2;187;187;187m �[39mretired�[38;2;102;102;102m:�[39m�[38;2;187;187;187m �[39mclaude�[38;2;102;102;102m-�[39m�[38;2;102;102;102m3�[39m�[38;2;102;102;102m-�[39m�[38;2;102;102;102m7�[39m�[38;2;102;102;102m-�[39msonnet�[38;2;102;102;102m-�[39m�[38;2;102;102;102m20250219�[39m�[38;2;102;102;102m,�[39m�[38;2;187;187;187m �[39mclaude�[38;2;102;102;102m-�[39m�[38;2;102;102;102m3�[39m�[38;2;102;102;102m-�[39m�[38;2;102;102;102m5�[39m�[38;2;102;102;102m-�[39msonnet�[38;2;102;102;102m-�[39m�[38;2;102;102;102m20241022�[39m�[38;2;102;102;102m,�[39m
�[38;2;187;187;187m  �[39mclaude�[38;2;102;102;102m-�[39m�[38;2;102;102;102m3�[39m�[38;2;102;102;102m-�[39m�[38;2;102;102;102m5�[39m�[38;2;102;102;102m-�[39mhaiku�[38;2;102;102;102m-�[39m�[38;2;102;102;102m20241022�[39m
�[38;2;102;102;102m-�[39m�[38;2;187;187;187m �[39mFix�[38;2;187;187;187m �[39mcontext�[38;2;187;187;187m �[39mwindow�[38;2;187;187;187m �[39mto�[38;2;187;187;187m �[39m�[38;2;102;102;102m1�[39mM�[38;2;187;187;187m �[39m�[38;2;170;34;255;01mfor�[39;00m�[38;2;187;187;187m �[39mopus�[38;2;102;102;102m-�[39m�[38;2;102;102;102m4�[39m�[38;2;102;102;102m-�[39m�[38;2;102;102;102m7�[39m�[38;2;102;102;102m,�[39m�[38;2;187;187;187m �[39mopus�[38;2;102;102;102m-�[39m�[38;2;102;102;102m4�[39m�[38;2;102;102;102m-�[39m�[38;2;102;102;102m6�[39m�[38;2;102;102;102m,�[39m�[38;2;187;187;187m �[39msonnet�[38;2;102;102;102m-�[39m�[38;2;102;102;102m4�[39m�[38;2;102;102;102m-�[39m�[38;2;102;102;102m6�[39m
�[38;2;102;102;102m-�[39m�[38;2;187;187;187m �[39mFix�[38;2;187;187;187m �[39mmax�[38;2;187;187;187m �[39moutput�[38;2;187;187;187m �[39mto�[38;2;187;187;187m �[39m�[38;2;102;102;102m32�[39mK�[38;2;187;187;187m �[39m�[38;2;170;34;255;01mfor�[39;00m�[38;2;187;187;187m �[39mopus�[38;2;102;102;102m-�[39m�[38;2;102;102;102m4�[39m�[38;2;102;102;102m-�[39m�[38;2;102;102;102m1�[39m�[38;2;102;102;102m-�[39m�[38;2;102;102;102m20250805�[39m�[38;2;187;187;187m �[39mand�[38;2;187;187;187m �[39mopus�[38;2;102;102;102m-�[39m�[38;2;102;102;102m4�[39m�[38;2;102;102;102m-�[39m�[38;2;102;102;102m20250514�[39m
�[38;2;102;102;102m-�[39m�[38;2;187;187;187m �[39mFix�[38;2;187;187;187m �[39mmax�[38;2;187;187;187m �[39moutput�[38;2;187;187;187m �[39mto�[38;2;187;187;187m �[39m�[38;2;102;102;102m64�[39mK�[38;2;187;187;187m �[39m�[38;2;170;34;255;01mfor�[39;00m�[38;2;187;187;187m �[39msonnet�[38;2;102;102;102m-�[39m�[38;2;102;102;102m4�[39m�[38;2;102;102;102m-�[39m�[38;2;102;102;102m6�[39m�[38;2;187;187;187m �[39m�[38;2;102;102;102m(�[39msynchronous�[38;2;187;187;187m �[39mMessages�[38;2;187;187;187m �[39mAPI�[38;2;102;102;102m)�[39m
�[38;2;102;102;102m-�[39m�[38;2;187;187;187m �[39mSync�[38;2;187;187;187m �[39m�[38;2;102;102;102m.�[39m�[38;2;187;68;68menv�[39m�[38;2;102;102;102m.�[39m�[38;2;187;68;68mexample�[39m�[38;2;187;187;187m �[39mDEFAULT_MODEL�[38;2;187;187;187m �[39m�[38;2;170;34;255;01mwith�[39;00m�[38;2;187;187;187m �[39mcode�[38;2;187;187;187m �[39m�[38;2;170;34;255;01mdefault�[39;00m�[38;2;187;187;187m �[39m�[38;2;102;102;102m(�[39mclaude�[38;2;102;102;102m-�[39msonnet�[38;2;102;102;102m-�[39m�[38;2;102;102;102m4�[39m�[38;2;102;102;102m-�[39m�[38;2;102;102;102m6�[39m�[38;2;102;102;102m)�[39m
�[38;2;102;102;102m-�[39m�[38;2;187;187;187m �[39mUpdate�[38;2;187;187;187m �[39mlanding�[38;2;102;102;102m-�[39mpage�[38;2;187;187;187m �[39mquickstart�[38;2;187;187;187m �[39mand�[38;2;187;187;187m �[39mdebug�[38;2;187;187;187m �[39mexample�[38;2;187;187;187m �[39mto�[38;2;187;187;187m �[39mclaude�[38;2;102;102;102m-�[39msonnet�[38;2;102;102;102m-�[39m�[38;2;102;102;102m4�[39m�[38;2;102;102;102m-�[39m�[38;2;102;102;102m6�[39m
The existing rule only ignored test_roocode_compatibility.py.hypothesis/.
Pytest-hypothesis regenerates .hypothesis/ at the repo root on every run,
making it repeatedly show up as untracked.
Refresh supported model list, add Claude Opus 4.7 (v2.7.0)
…y user] as response content

Fixes the class of SDK failures where ResultMessage.subtype != "success"
fell through parse_claude_message and the synthetic
UserMessage('[Request interrupted by user]') was returned verbatim as the
assistant response. Raises ClaudeResultError instead, translated by the
HTTP layer to finish_reason="length" (error_max_turns) or a status-coded
error body (assistant rate_limit/billing/auth, generic SDK failure).

Also raises the OpenAI-compat max_turns default from 1 to 3 (env-configurable),
drops the max_tokens -> max_thinking_tokens misleading remap (opt-in via
env var), pins the SDK exactly, adds a circuit breaker and /healthz/deep
end-to-end probe, structured completion_result log line, multi-stage
Dockerfile dev/prod targets, BUILD_INFO stamp, and 13 new regression tests.

Upstream consumer affected was MinusPod; see that project's 2.0.12 notes
for the parallel defensive changes.
…ture CLI stderr

2.8.0 surfaced error_during_execution correctly but opened a tighter
secondary problem: the breaker tripped on a 5/10 intra-episode failure
burst, cascading into 503s for verification windows. Also discovered the
R5 structured-log extras were being dropped by the plain-text formatter
so circuit_breaker_open and completion_result logs shipped to Loki empty.

Three fixes, no new behavior:

- Inline key=value fields into log message strings via a _kv helper so
  default-formatter installs see the structured data.
- Raise CircuitBreakerConfig defaults to min_requests_for_trip=20 and
  failure_ratio_threshold=0.75. Add env-var overrides for every knob
  plus WRAPPER_CIRCUIT_BREAKER_ENABLED kill switch.
- Install stderr callback on ClaudeAgentOptions, ring-buffer 40 lines,
  emit + attach to ResultMessage dict on non-success. Propagates
  through ClaudeResultError.stderr_tail so the HTTP error path logs
  the real subprocess failure reason instead of just "num_turns=2".

12 new unit tests. Full suite 640 passed, 31 skipped.
Adds security-floor pins for starlette, urllib3, cryptography, pyjwt,
authlib, mcp, and nltk -- each is transitive via fastapi or claude-agent-sdk
but required a newer version than the parent's ceiling allowed. Widened
fastapi to >=0.119 to admit starlette 0.49.x (for CVE-2025-62727).

Clears 2 CRITICAL + 18 HIGH from the trivy scan against 2.8.1. Remaining
findings are nltk XML CVEs with no published fix and Debian base-image
packages that need a debian:13 rebase.

No code change. 640 tests pass on the new deps.
ttlequals0 and others added 19 commits April 23, 2026 22:44
47 patch releases worth of CLI and subprocess-handling fixes. Direct
motivation is the silent `error_during_execution` rate observed against
2.8.2 in production (num_turns=2, usage.input_tokens=0, stderr empty —
CLI dying before reaching Claude).

Notable fixes in the range:
- 0.1.52 control_cancel_request handling for hook callbacks
- 0.1.53 string-prompt deadlock fix
- 0.1.57 thinking-config serialization (direct vs max_thinking_tokens)
- 0.1.60 setting_sources=[] no longer dropped
- 0.1.51 ResultMessage.errors field now populated on failure
- Bundled Claude CLI 2.0.72 -> 2.1.118 (46 versions)

Full suite 640 passed, 31 skipped. No test changes required.
…parsing

feat(2.8.0): stop error_max_turns from leaking interrupt sentinel as response content
Closes all 10 code-scanning alerts open on main:

Workflows
- Remove .github/workflows/claude-code-review.yml; the pull_request_target
  + checkout(head.sha) shape was flagged untrusted-checkout/high.
- Pin .github/workflows/ci.yml to permissions: {contents: read}.

Error responses (py/stack-trace-exposure)
- _build_assistant_error_response returns static subtype-keyed messages
  via new _safe_assistant_error_message helper; raw err.errors/str(err)
  stay in server logs only.
- generate_streaming_response error chunk is now a generic
  "Streaming failed" string.
- Chat-completions and Anthropic-messages 500 HTTPException details
  are generic strings; the exception is already logged.
- /v1/debug/request is gated behind DEBUG_MODE or VERBOSE and emits
  only type(e).__name__ for the json-parse and outer-except paths.

filter_content (py/polynomial-redos)
- Replace <tag>.*?</tag> regex stripping with a linear str.find-based
  helper so unterminated tags cannot trigger quadratic scanning even
  before adding a length guard.
- Add 1MB input length guard for defence in depth.
- Rewrite the image/base64 regex with fixed upper bounds instead of
  lazy quantifiers + lookahead.

Tests
- tests/test_redos_safety.py: six pathological inputs each complete in
  under 1s (previously seconds-to-minutes), plus behavioural
  regression coverage for the tag-stripping and image replacement.

Full suite: 650 passed, 31 skipped.
…repo

Dockerfile + .dockerignore
- poetry install now --only main so dev deps (black, bandit, pytest,
  mypy, safety) don't ship in the runtime image. Removes the one
  fixable Trivy HIGH (CVE-2026-32274 black < 26.3.1).
- .dockerignore excludes .git, .venv, .hypothesis, .pytest_cache,
  tests, docs, .env*, editor cruft. Image drops from 1.18 GB to
  775 MB and BUILD_INFO stamp now succeeds at build time.
- 7 remaining Trivy HIGHs are in the Debian 13.4 base (ncurses,
  nghttp2, systemd); all have no upstream fix. Accepted risk until
  python:3.12-slim rebases.

.github/workflows/ci.yml
- timeout-minutes: 15 and fail-fast: false on the test matrix.
- poetry check --lock step catches lockfile drift pre-merge (the
  exact failure mode that produced the 2.9.0 SDK bump).
- Replaced deprecated `safety check` with pip-audit (non-blocking).
- Added a Docker smoke-build job on every PR so Dockerfile
  regressions surface before release.

.github/workflows/claude.yml
- Repo-specific tool allowlist: read-only gh/git + poetry run pytest
  / black --check / bandit. No write commands (no gh pr create, no
  gh pr merge, no git push, no editor invocations).
- Documented why the contains() gate on user-controlled event body
  fields is safe.
CI's `poetry run black --check src tests` was failing on 18 files
(16 pre-existing plus src/main.py and src/message_adapter.py touched
on this branch). Running `black` with the repo config
(line-length=100) to bring everything in line so the linting step
gates going forward. No behavioural changes; full pytest still 650
passed, 31 skipped.
Additional commits landed on this branch after the initial 2.9.1 commit
(Docker --only main, .dockerignore, ci.yml hardening, claude.yml
allowlist, black reformat). Bumping version so the deployed image
surfaces 2.9.2 on /version and the landing page, and CI's lockfile /
docker smoke gates ship under that tag.
CI has no push/deploy role for Docker - images are built and pushed
locally, so a smoke build inside CI just burns runner minutes without
gating anything the local flow doesn't already cover.
Without this, docker compose up (what Portainer runs on webhook
redeploy) reuses the locally cached :latest layer and the updated
image sits on Docker Hub unused. The Portainer stack's own compose
config needs to match for the running stack to pick this up.
After 2.9.2 switched the Docker image to poetry install --only main,
the first chat completion raised at SDK connect:

  File ".../claude_agent_sdk/_internal/transport/subprocess_cli.py",
       line 413, in connect
    from opentelemetry import propagate
  ModuleNotFoundError: No module named 'opentelemetry'

The SDK does an unconditional opentelemetry.propagate import, but
opentelemetry-api is declared on PyPI only as an optional [otel]
extra. Previous images accidentally had it via dev-group transitives;
--only main correctly dropped it.

Pinning claude-agent-sdk = {version = "0.1.65", extras = ["otel"]} so
opentelemetry-api 1.41.1 resolves into the runtime image.
fix(2.9.1): close CodeQL alerts for stack-trace exposure and ReDoS
README
- Version 2.7.0 -> 2.9.3 with 2.8.x and 2.9.x highlights.
- Test count 566 -> 650 passing (31 skipped).
- Add missing env vars: VERBOSE, WRAPPER_DEFAULT_MAX_TURNS,
  WRAPPER_MAP_MAX_TOKENS_TO_THINKING, MAX_REQUEST_SIZE,
  REQUEST_CACHE_TTL/MAX_SIZE, CLAUDE_WRAPPER_HOST, UVICORN_WORKERS,
  WATCHDOG_*, explicit API_KEY row, Bedrock/Vertex vars.
- Add /healthz/deep endpoint row.
- Note /v1/debug/request is gated behind DEBUG_MODE/VERBOSE.
- Add /v1/sessions/* rate-limit row with env var.
- Docker Compose example now matches docker-compose.yml (pull_policy,
  build target, container_name, healthcheck).
- Drop stale limitation claim that OpenAI-style function calling is
  unsupported - 2.6.0 added it and the section below documents it.
- X-Enable-Cache header row added.

docs/
- Delete docs/MIGRATION_STATUS.md and docs/UPGRADE_PLAN.md. Both
  described work that shipped; no current reader needs them.
Two complementary mechanisms to catch claude-agent-sdk drift.

.github/dependabot.yml
- Weekly pip (Poetry) and github-actions scans, grouped minor/patch
  so the review queue stays short.
- commit-message prefixes 'chore(deps)' and 'chore(ci)' for clean
  history. Release notes surface in the PR body automatically.

.github/workflows/check-sdk-version.yml
- Cron (Mondays 14:00 UTC) plus workflow_dispatch.
- Reads the claude-agent-sdk pin from pyproject.toml, fetches the
  latest PyPI version, and opens (or updates) an issue if the pin
  lags. Catches the case where Dependabot PRs pile up unreviewed.
- No event-payload interpolation; only schedule/dispatch triggers
  and explicit step outputs piped through env vars.
docs: audit README against current state, drop stale docs/
chore: monitor claude-agent-sdk drift via Dependabot + weekly sentinel
* fix(2.9.4): close all seven open Dependabot alerts

Bumps:
- black 24.10.0 -> 26.3.1        CVE-2026-32274 (high, dev only)
- filelock 3.20.1 -> 3.29.0      CVE-2026-22701 (medium, dev only)
- requests 2.32.4 -> 2.33.1      CVE-2026-25645 (medium, runtime)
- pytest 8.4.1 -> 9.0.3          CVE-2025-71176 (medium, dev only)
- python-multipart 0.0.22 -> 0.0.26  CVE-2026-40347 (medium, runtime)
- python-dotenv 1.1.1 -> 1.2.2   CVE-2026-28684 (medium, runtime)
- pygments 2.19.2 -> 2.20.0      CVE-2026-4539 (low, transitive)

Secondary:
- pytest-asyncio ^0.23 -> ^1.3.0 (pytest 9 requires it).
- 3 test files reformatted by black 26 so the lint gate passes:
  tests/test_redos_safety.py, tests/test_function_calling_unit.py,
  tests/test_session_complete.py.

Full suite: 650 passed, 31 skipped under pytest 9.0.3.

Supersedes PR #10 (Dependabot's grouped bump) - that PR's CI was red
on black 26 formatting; this consolidates the fix plus adds the
Pygments transitive that Dependabot did not surface.

* chore: refresh retired model reference in compat report

The /v1/compatibility response suggested claude-3-5-haiku as a
'more focused response' alternative when temperature is passed -
that model was retired in 2.7.0. Point at claude-haiku-4-5-20251001,
the current FAST_MODEL.
…#14)

Issues are disabled on this repo, so the weekly check-sdk-version
workflow has been failing at `gh issue create` whenever the pin falls
behind PyPI (run 25001796671). Replace the issue step with a
GITHUB_STEP_SUMMARY write; the existing `::warning::` annotation
still surfaces drift on the run page. Drop the now-unused
`issues: write` permission.
Per the weekly SDK drift check (now passing after PR #14). The 0.1.68
release adds an explicit `sniffio >= 1.0.0` dependency, which the lock
picks up as 1.3.1; otherwise no transitive movement. The `[otel]` extra
stays on the pin because the SDK still imports
`opentelemetry.propagate` unconditionally.

Bumps version to 2.9.5 and rolls in the prior CI-only change from PR #14.

Tests: 650 passed, 31 skipped (unchanged from v2.9.4).
…models from upstream, SDK-drift auto-PR (#17)

* feat: dynamically refresh Anthropic model list (RichardAtCT#46)

* feat: dynamically refresh Anthropic model list

* fix: harden /v1/models cache and resolve default model live

- Lock + double-check refresh path so concurrent requests at TTL
  expiry don't stampede the Anthropic Models API.
- Use a short MODEL_LIST_ERROR_TTL_SECONDS (default 60s) for the
  fallback cache so transient outages don't suppress live discovery
  for a full hour.
- Populate `created` (unix timestamp) on both live and fallback
  /v1/models entries to match OpenAI's model object schema.
- Resolve DEFAULT_MODEL at startup by picking the latest Sonnet from
  the live Models API; honor explicit DEFAULT_MODEL env override.

* docs: clarify ANTHROPIC_API_KEY is optional for live model discovery

- README: expand env vars table with ANTHROPIC_API_KEY (optional),
  DEFAULT_MODEL, FAST_MODEL, CLAUDE_MODELS_OVERRIDE, and the model
  list cache/timeout knobs. Rewrite the Supported Models section to
  explain the live-vs-static behavior and refresh the catalog around
  Claude 4.6 family. Bump model examples to claude-sonnet-4-6.
- .env.example: add a Model Discovery (optional) block documenting
  ANTHROPIC_API_KEY, CLAUDE_MODELS_OVERRIDE, and the cache TTLs;
  comment out DEFAULT_MODEL so live resolution drives it by default.
- main.py: log a single explicit info line at startup when live
  discovery is disabled (no ANTHROPIC_API_KEY) so operators see
  whether the dynamic path activated.
- tests: cover the new disabled-path log and update the env-key gate
  in the existing resolve_default_model test.

* chore(v2.9.6): SDK 0.1.81 bump, urllib3/python-multipart sec fixes, SDK-drift workflow auto-PR

- claude-agent-sdk 0.1.68 -> 0.1.81 (13 patch releases since v2.9.5).
- python-multipart ^0.0.26 -> ^0.0.27 (GHSA-pp6c-gr5w-3c5g, supersedes Dependabot PR #16).
- urllib3 security floor >=2.6.3 -> >=2.7.0 (GHSA-qccp-gfcp-xxvc, GHSA-mf9v-mfxr-j63j).
- check-sdk-version.yml opens a draft chore/sdk-bump-<latest> PR on drift instead
  of only writing to the run summary. Permissions widened to contents: write +
  pull-requests: write; idempotent by head branch; fallback summary still fires.

Lockfile regenerated locally with Poetry 2.3.4. Full suite at 664 passed, 31 skipped
(+14 from upstream test_dynamic_models.py picked up in the prior cherry-pick).

* docs(readme): bump to v2.9.6, document new model-discovery env vars, tighten supported-models intro

- Version 2.9.3 -> 2.9.6 in header and docker pin example
- Test count 650 -> 664 in Status and Testing sections
- Add 2.9.6 highlight bullet covering SDK 0.1.81, urllib3/python-multipart sec
  fixes, upstream PR RichardAtCT#46 dynamic-models sync, and check-sdk-version auto-PR
- Add ANTHROPIC_MODELS_URL, ANTHROPIC_VERSION, ANTHROPIC_BETA/ANTHROPIC_BETA_HEADER
  rows to the env var table (advanced overrides for the new live-discovery path)
- Tighten the Supported Models intro paragraph (was 3 dense sentences)

---------

Co-authored-by: Richard A <richardatk01@gmail.com>
Periodic background coroutine probes claude-agent-sdk via existing
verify_cli() (1-turn Hello query) when CLAUDE_AUTH_METHOD=claude_cli.
Default interval 600s, configurable via CLI_AUTH_PROBE_INTERVAL_SECONDS,
0 to disable. /v1/auth/status exposes new cli_health block (ok,
last_probed_at, last_ok_at, error_kind, error_message).

POST /v1/chat/completions and POST /v1/messages now return HTTP 401 with
error.type=authentication_error and code=claude_cli_not_authenticated
when the most recent probe failed, instead of letting the request reach
the SDK and surface as 502 or fall through to the 503 config check.
OpenAI / Anthropic client libraries route 401 as AuthenticationError.

Defense-in-depth: _build_sdk_error_response now scans stderr_tail +
error_message for known CLI-auth markers (not logged in, please run
/login, invalid api key, authentication_error, 401). On a match it
returns 401 instead of 502 and seeds cli_health failed so the next
request fails fast.

Auth-failure responses bypass the global http_exception_handler (which
rewrites bodies to error.type=api_error) by returning JSONResponse
directly, so the authentication_error literal reaches clients.

Tests: 673 passing, 31 skipped (+9 from v2.9.6 baseline of 664/31).
- TestProbeCliAuth: 3 async tests for probe classification
- TestChatCompletionsCliHealthGate + TestAnthropicMessagesCliHealthGate:
  in-process TestClient assertions on the 401 surface
- TestCliAuthFailureToFourOhOne: 4 stderr-mapping tests including a
  502 regression guard and a cli_health-seeding test
@ttlequals0
Copy link
Copy Markdown
Author

Opened against the wrong fork; will reopen against ttlequals0/claude-code-openai-wrapper.

@ttlequals0 ttlequals0 closed this May 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant