
feat: dynamically refresh Anthropic model list#46

Merged
RichardAtCT merged 3 commits into main from feat/dynamic-anthropic-models on May 4, 2026

Conversation

@RichardAtCT
Owner

@RichardAtCT RichardAtCT commented May 4, 2026

Summary

  • Add Claude 4.6 fallback models and make claude-sonnet-4-6 the default model
  • Teach /v1/models to prefer Anthropic's live Models API when ANTHROPIC_API_KEY is configured
  • Preserve existing OpenAI-compatible response shape while including useful Anthropic metadata for capable clients
  • Add caching, pagination support, env-configurable timeouts/TTL, and graceful fallback to static models
  • Keep CLAUDE_MODELS_OVERRIDE available for deployments that need a pinned/curated list

Test plan

  • poetry run black --check src/constants.py src/main.py tests/test_sdk_migration.py tests/test_dynamic_models.py
  • poetry run pytest tests/test_parameter_validator_unit.py tests/test_property_based.py tests/test_models_unit.py tests/test_dynamic_models.py tests/test_sdk_migration.py

@github-actions

github-actions Bot commented May 4, 2026

Code Review — PR #46: Dynamically refresh Anthropic model list

Overview: This PR replaces the static /v1/models response with a live Anthropic Models API lookup, adds a 1-hour cache, and updates the default model to claude-sonnet-4-6. The structure is clean and the fallback design is solid. One bug needs attention before merge; several smaller items are worth addressing.


🐛 Critical bug: validation/discovery mismatch

ParameterValidator.validate_model (in src/parameter_validator.py) builds its allowed set once at import time from the static CLAUDE_MODELS:

# parameter_validator.py
from src.constants import CLAUDE_MODELS
class ParameterValidator:
    SUPPORTED_MODELS = set(CLAUDE_MODELS)

When the live API returns models not in the static list (which is the whole point of this PR), every subsequent chat request for those models will be rejected at lines 534 and 796 of main.py with a validation error — even though /v1/models just advertised them.

Fix options (pick one):

  1. Make ParameterValidator.validate_model accept any model that is in the live cache (requires async or a cache accessor).
  2. Add a permissive mode: skip strict validation when the model looks like a valid Anthropic model ID (e.g. starts with claude-), since Anthropic's API will reject unknown IDs anyway.
  3. On cache miss in validate_model, fall through to allow rather than deny.
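
A minimal sketch of option 2 (the permissive prefix check). The names here are hypothetical stand-ins, not the existing validator API: `is_model_allowed` and the single-entry `KNOWN_STATIC_MODELS` set illustrate the shape only.

```python
import logging

logger = logging.getLogger(__name__)

# Stand-in for set(CLAUDE_MODELS); hypothetical contents.
KNOWN_STATIC_MODELS = {"claude-sonnet-4-6"}

def is_model_allowed(model: str) -> bool:
    # Accept any plausible Anthropic model ID; a truly unknown ID will
    # be rejected by Anthropic's API anyway, so deny only obvious junk.
    if model in KNOWN_STATIC_MODELS:
        return True
    if model.startswith("claude-"):
        logger.info("Model %s not in static list; deferring to upstream", model)
        return True
    return False
```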

⚠️ Race condition in cache update

_model_list_cache is a plain dict mutated with .update() from an async context. Two concurrent requests that both find the cache expired will both call _fetch_anthropic_models() and both write to the cache. This doubles upstream traffic at TTL boundaries (thundering-herd on a shared dict).

Consider guarding with an asyncio.Lock:

_model_list_lock = asyncio.Lock()

async def get_available_models():
    ...
    async with _model_list_lock:
        # recheck inside lock
        if cached_models and time.time() < _model_list_cache["expires_at"]:
            return cached_models
        ...

⚠️ Stale fallback cached for full TTL

When _fetch_anthropic_models() returns None (network error, missing key), the fallback list is cached for the full MODEL_LIST_CACHE_TTL_SECONDS (default 1 hour). A transient 5-second Anthropic API blip will suppress live model discovery for an hour.

Consider a shorter retry TTL for the error case, e.g.:

error_ttl = min(60, MODEL_LIST_CACHE_TTL_SECONDS)
_model_list_cache.update({"models": fallback_models, "expires_at": now + error_ttl})

🔍 Missing created field in model objects

OpenAI's model object schema requires "created": <unix_timestamp>. The fallback objects generated in get_available_models() omit it entirely. Clients that rely on this field (sorting, display) will break silently.

_openai_model_from_anthropic maps created_at (Anthropic ISO string) but doesn't convert it to a unix int. Consider:

import datetime
if "created_at" in model_info:
    try:
        # replace("Z", "+00:00") yields an aware datetime, so timestamp()
        # is UTC-correct regardless of the server's local timezone
        model["created"] = int(datetime.datetime.fromisoformat(
            model_info["created_at"].replace("Z", "+00:00")).timestamp())
    except ValueError:
        pass

And add a default created to the fallback objects.
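
One possible shape for the fallback entries, assuming a server-side "now" timestamp is acceptable as the default (`fallback_model_object` is a hypothetical helper name, not the PR's function):

```python
import time

def fallback_model_object(model_id: str) -> dict:
    # Default `created` to the current time so fallback entries still
    # satisfy OpenAI's model-object schema without live metadata.
    return {
        "id": model_id,
        "object": "model",
        "created": int(time.time()),
        "owned_by": "anthropic",
    }
```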


Minor issues

FAST_MODEL now configurable without validation
The FAST_MODEL = os.getenv("FAST_MODEL", ...) change is welcome, but operators who set it to an invalid value won't see an error until runtime. A startup check or at least a logged warning would help.
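
A sketch of such a startup check, with hypothetical names — `resolve_fast_model` and the `_KNOWN_MODELS` set stand in for the real constants:

```python
import logging
import os

logger = logging.getLogger(__name__)

# Stand-in for the real CLAUDE_MODELS list.
_KNOWN_MODELS = {"claude-sonnet-4-6", "claude-haiku-4-5"}

def resolve_fast_model(default: str = "claude-haiku-4-5") -> str:
    # Warn (rather than crash) when FAST_MODEL is set to an unknown
    # value, so the misconfiguration is visible at startup.
    value = os.getenv("FAST_MODEL", default)
    if value not in _KNOWN_MODELS:
        logger.warning("FAST_MODEL=%r is not in the known model list", value)
    return value
```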

os.getenv("ANTHROPIC_API_KEY") called per request
_fetch_anthropic_models() calls os.getenv on each invocation. This is fine since it supports runtime config, but note that a key added after server start only takes effect at the next cache expiry — document this in the README's env vars section.

Cache not cleared on CLAUDE_MODELS_OVERRIDE toggle
If an operator sets CLAUDE_MODELS_OVERRIDE after the server starts, the old live-fetched cache won't be invalidated. The early-return path for overrides is correct, but the cache dict will still hold stale live data. This is a minor edge case for runtime reconfiguration.


Test coverage

The new test_dynamic_models.py covers the main code paths. A few gaps worth filling:

  • Cache hit path: no test verifies that a warm cache is returned without calling _fetch_anthropic_models.
  • _fetch_anthropic_models directly: no test exercises the HTTP pagination loop, the has_more branch, or the raise_for_status error path (the validation mismatch bug above also needs a test).
  • Concurrent requests: a quick test using asyncio.gather would confirm the race condition is addressed once fixed.

Summary

| Severity | Issue |
| --- | --- |
| 🐛 Bug | Live-discovered models fail validate_model → chat requests rejected |
| ⚠️ Risk | Thundering-herd race condition on cache expiry |
| ⚠️ Risk | Error fallback cached for full TTL (1 h default) |
| 💡 Polish | Missing created field in OpenAI model objects |
| 💡 Polish | No test for cache hit / HTTP fetch / concurrency |

The overall design is sound — live fetch, graceful degradation, override escape hatch. Fixing the validation mismatch is required before this is safe to merge. The race condition and stale-error-cache issues are worth addressing in the same PR given how straightforward they are.

RichardAtCT pushed a commit — fix: harden /v1/models cache and resolve default model live:

- Lock + double-check refresh path so concurrent requests at TTL
  expiry don't stampede the Anthropic Models API.
- Use a short MODEL_LIST_ERROR_TTL_SECONDS (default 60s) for the
  fallback cache so transient outages don't suppress live discovery
  for a full hour.
- Populate `created` (unix timestamp) on both live and fallback
  /v1/models entries to match OpenAI's model object schema.
- Resolve DEFAULT_MODEL at startup by picking the latest Sonnet from
  the live Models API; honor explicit DEFAULT_MODEL env override.
@github-actions

github-actions Bot commented May 4, 2026

Code Review – PR #46: Dynamically refresh Anthropic model list

Overview

This PR upgrades /v1/models from a static list to a live Anthropic Models API fetch with caching, pagination, graceful fallback, and an operator override. It also makes the default model self-resolving (picks latest Sonnet at startup). The design is solid and the test suite is comprehensive. Below are a few issues worth addressing before merge.


Bugs / Correctness

1. Model validation is stale after live fetch (medium)

ParameterValidator.SUPPORTED_MODELS is a set built once at class-definition time from the static CLAUDE_MODELS list (parameter_validator.py:17). After this PR, /v1/models can return live Anthropic models that aren't in that set. A client that calls /v1/models, picks a live-only model (e.g. a brand-new claude-sonnet-4-7), then sends a /v1/chat/completions request will trigger a spurious warning log on every call – and if validation is ever hardened to reject unknown models, requests will silently break.

The fix is to validate against the live cache rather than the static constant:

# parameter_validator.py – validate against the live model cache
@classmethod
async def validate_model(cls, model: str) -> bool:
    from src.main import get_available_models  # local import avoids a circular import
    live_ids = {m["id"] for m in await get_available_models()}
    if model not in live_ids:
        logger.warning(...)
    return True

Or, minimally, add live model IDs to SUPPORTED_MODELS when the cache is populated.
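
That minimal variant could look like the following sketch — `merge_live_models` is a hypothetical hook name, and the single-entry set stands in for the real `set(CLAUDE_MODELS)`:

```python
class ParameterValidator:
    # Stand-in for the real class; built once at import time.
    SUPPORTED_MODELS = {"claude-sonnet-4-6"}

def merge_live_models(live_models: list[dict]) -> None:
    # Call from the cache-refresh path after a successful fetch so the
    # validator's allowed set tracks live discovery.
    ParameterValidator.SUPPORTED_MODELS |= {m["id"] for m in live_models}
```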

2. Env-var integer/float parsing can crash at import time (medium)

# src/constants.py
MODEL_LIST_CACHE_TTL_SECONDS = int(os.getenv("MODEL_LIST_CACHE_TTL_SECONDS", "3600"))
MODEL_LIST_ERROR_TTL_SECONDS = int(os.getenv("MODEL_LIST_ERROR_TTL_SECONDS", "60"))
MODEL_LIST_REQUEST_TIMEOUT_SECONDS = float(os.getenv("MODEL_LIST_REQUEST_TIMEOUT_SECONDS", "5"))

If an operator sets any of these to a non-numeric string (a common misconfiguration), the app crashes on import with an unhandled ValueError before it can even log a helpful message. Wrap each in a try/except or use a helper:

def _env_int(key: str, default: int) -> int:
    try:
        return int(os.getenv(key, str(default)))
    except ValueError:
        logger.warning("Invalid value for %s, using default %d", key, default)
        return default

Design / Style

3. CLAUDE_MODELS_OVERRIDE is re-read via os.getenv in get_available_models (low)

constants.py already resolves and stores the override in CLAUDE_MODELS. get_available_models (main.py line ~180) then calls os.getenv("CLAUDE_MODELS_OVERRIDE", "").strip() again for the short-circuit check. This creates two sources of truth and could diverge if someone changes the env after process start. Use the already-resolved CLAUDE_MODELS constant or a boolean flag instead:

# at module level
_using_model_override = bool(os.getenv("CLAUDE_MODELS_OVERRIDE", "").strip())

async def get_available_models():
    if _using_model_override:
        return _fallback_model_payload()
    ...

4. resolve_default_model has a redundant outer try/except (trivial)

get_available_models() already catches all exceptions internally and returns the fallback list — it cannot raise. The try/except in resolve_default_model around await get_available_models() is dead code. The one in lifespan around await resolve_default_model() is the right place to be defensive.


Tests

5. Module-global mutation without cleanup fixtures (low)

Several tests directly assign to main._model_list_cache and constants.RESOLVED_DEFAULT_MODEL:

main._model_list_cache = {"expires_at": 0.0, "models": None}
constants.RESOLVED_DEFAULT_MODEL = None

These mutations are not cleaned up via yield fixtures or monkeypatch.setattr, so a test failure mid-suite can poison later tests. Replace with monkeypatch.setattr or an autouse fixture:

@pytest.fixture(autouse=True)
def reset_model_cache():
    main._model_list_cache.update({"expires_at": 0.0, "models": None})
    yield
    main._model_list_cache.update({"expires_at": 0.0, "models": None})

6. Import inside test body (trivial)

# test_dynamic_models.py – test_failed_fetch_uses_short_error_ttl
import time as _time

Move this to the top of the file with the other imports.

7. No HTTP-level integration test for /v1/models (low)

The new helper functions are well-tested in isolation, but there's no test using FastAPI's TestClient (or httpx.AsyncClient) to verify that GET /v1/models actually returns the expected shape. Consider adding one similar to test_models_unit.py that covers the live-fetch path.


Security

8. ANTHROPIC_MODELS_URL is operator-configurable (informational)

Allowing this via env var is a useful escape hatch, but document that setting it to an untrusted URL would allow an attacker (who controls env vars) to supply arbitrary model lists. Not a blocker given that env-var access already implies privileged access in this deployment model.


Positives

  • The double-checked locking pattern in get_available_models correctly prevents thundering-herd on TTL expiry.
  • Short error-TTL on fetch failure is a good reliability touch.
  • Preserving Anthropic metadata fields (display_name, capabilities, etc.) without breaking OpenAI-compatible clients (unknown keys are ignored) is the right approach.
  • resolve_default_model being best-effort with a static fallback means startup never blocks on Anthropic availability.
  • Test coverage for the new helpers is thorough; the concurrent-fetch test (test_concurrent_calls_only_fetch_once) is particularly valuable.

Summary

The two medium-priority issues (stale model validation set, uncaught ValueError on bad env vars) are worth fixing before merge. The rest are polish. Overall this is well-structured code with good defensive design.

RichardAtCT pushed a commit — docs: clarify ANTHROPIC_API_KEY is optional for live model discovery:

- README: expand env vars table with ANTHROPIC_API_KEY (optional),
  DEFAULT_MODEL, FAST_MODEL, CLAUDE_MODELS_OVERRIDE, and the model
  list cache/timeout knobs. Rewrite the Supported Models section to
  explain the live-vs-static behavior and refresh the catalog around
  Claude 4.6 family. Bump model examples to claude-sonnet-4-6.
- .env.example: add a Model Discovery (optional) block documenting
  ANTHROPIC_API_KEY, CLAUDE_MODELS_OVERRIDE, and the cache TTLs;
  comment out DEFAULT_MODEL so live resolution drives it by default.
- main.py: log a single explicit info line at startup when live
  discovery is disabled (no ANTHROPIC_API_KEY) so operators see
  whether the dynamic path activated.
- tests: cover the new disabled-path log and update the env-key gate
  in the existing resolve_default_model test.
@RichardAtCT RichardAtCT merged commit ba9b039 into main May 4, 2026
4 checks passed
@RichardAtCT RichardAtCT deleted the feat/dynamic-anthropic-models branch May 4, 2026 11:42
RichardAtCT added a commit that referenced this pull request May 4, 2026
## New Features (PR #46)
- Live `/v1/models` discovery from Anthropic's Models API when
  `ANTHROPIC_API_KEY` is set, cached for 1 hour with thundering-herd
  protection and a short retry TTL on failure
- Dynamic default model resolution: `DEFAULT_MODEL` resolves to the
  latest Sonnet at startup when `ANTHROPIC_API_KEY` is configured;
  falls back to `claude-sonnet-4-6` otherwise
- New env vars: `CLAUDE_MODELS_OVERRIDE`, `FAST_MODEL`,
  `MODEL_LIST_CACHE_TTL_SECONDS`, `MODEL_LIST_ERROR_TTL_SECONDS`,
  `MODEL_LIST_REQUEST_TIMEOUT_SECONDS`
- `created` (unix timestamp) populated on `/v1/models` entries to
  match OpenAI's model object schema
- Claude 4.6 family added to the static fallback list

## Documentation
- README and `.env.example` document `ANTHROPIC_API_KEY` as optional
  for live discovery; wrapper continues to work with Bedrock, Vertex,
  and Claude CLI subscription auth using the static fallback list
@github-actions

github-actions Bot commented May 4, 2026

Code Review: PR #46 — Dynamic Anthropic Model Discovery

Overview

This PR adds live model discovery via Anthropic's Models API, makes claude-sonnet-4-6 the new default, and introduces caching, pagination, and graceful fallback. The feature design is solid and the test coverage is good. A few issues worth addressing before merge.


Positives

  • Double-checked locking in get_available_models() correctly prevents thundering-herd at cache expiry (the recheck inside the lock is the right pattern).
  • Graceful degradation is well-thought-out: live fetch failure falls back to static list, not a 500.
  • Short error TTL (60 s vs 3600 s) is a good operational touch — transient outages do not suppress live discovery for an hour.
  • Test suite is comprehensive: 14 new tests cover concurrency, override precedence, fallback, and env-var behavior.
  • Pagination support in _fetch_anthropic_models() is correct.

Issues

1. Env var parse errors crash the app at startup

MODEL_LIST_CACHE_TTL_SECONDS = int(os.getenv("MODEL_LIST_CACHE_TTL_SECONDS", "3600"))
MODEL_LIST_ERROR_TTL_SECONDS = int(os.getenv("MODEL_LIST_ERROR_TTL_SECONDS", "60"))
MODEL_LIST_REQUEST_TIMEOUT_SECONDS = float(os.getenv("MODEL_LIST_REQUEST_TIMEOUT_SECONDS", "5"))

A misconfigured value (e.g. MODEL_LIST_CACHE_TTL_SECONDS=1h) raises ValueError at module import time and prevents the server from starting. For a feature designed to be optional and non-breaking, these should have try/except fallbacks with a warning log.

2. RESOLVED_DEFAULT_MODEL mutation is invisible to DEFAULT_MODEL in constants.py

DEFAULT_MODEL = DEFAULT_MODEL_ENV or DEFAULT_MODEL_FALLBACK is evaluated once at import time and never changes. Any code that imported DEFAULT_MODEL directly (rather than calling get_default_model()) will miss the runtime-resolved value. get_default_model() in models.py does this correctly, but it is a footgun for future contributors. A short comment on DEFAULT_MODEL warning about this would help.

3. Inconsistency in the override path

get_available_models() re-reads CLAUDE_MODELS_OVERRIDE from the environment at call time, but CLAUDE_MODELS (which populates _fallback_model_payload()) is baked in at module import time. The asymmetry already forces the test to monkeypatch main.CLAUDE_MODELS separately from the env var:

monkeypatch.setenv("CLAUDE_MODELS_OVERRIDE", "custom-a,custom-b")
monkeypatch.setattr(main, "CLAUDE_MODELS", ["custom-a", "custom-b"])

This could be simplified if _fallback_model_payload() read constants.CLAUDE_MODELS at call time rather than from the module-level import.

4. ID-sort fallback in _pick_latest_sonnet is unreliable for custom model IDs

When CLAUDE_MODELS_OVERRIDE is set alongside ANTHROPIC_API_KEY, resolve_default_model() eventually calls _pick_latest_sonnet on a list without created_at fields and falls back to lexicographic sort on id. This works for date-suffixed built-in names but breaks for arbitrary custom IDs. Low risk since operators with custom lists typically also set DEFAULT_MODEL, but worth documenting.
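
If this ever needs hardening, one possible approach (hypothetical, not in the PR) is to prefer an explicit YYYYMMDD date suffix and only fall back to lexicographic order for undated IDs:

```python
import re

def _sort_key(model_id: str):
    # Dated IDs (e.g. claude-sonnet-4-20250514) rank above undated
    # ones; among dated IDs the date string decides, otherwise the
    # raw ID is compared lexicographically.
    match = re.search(r"-(\d{8})$", model_id)
    return (1, match.group(1)) if match else (0, model_id)

def pick_latest(model_ids):
    return max(model_ids, key=_sort_key)
```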

5. Minor: inline import in test

import time as _time inside test_failed_fetch_uses_short_error_ttl should be at the top of the test file.

6. ANTHROPIC_MODELS_URL is redirectable to arbitrary servers

ANTHROPIC_MODELS_URL = os.getenv("ANTHROPIC_MODELS_URL", "https://api.anthropic.com/v1/models")

Redirecting this to an arbitrary server causes ANTHROPIC_API_KEY to be sent there. Useful for proxying/testing, but worth a security note in .env.example.


Test Observations

  • test_default_model_defined asserts DEFAULT_MODEL == "claude-sonnet-4-6" — fails if the test runner has DEFAULT_MODEL in the environment. Assert only DEFAULT_MODEL in CLAUDE_MODELS, or explicitly clear the env var.
  • The bound < 120 in test_failed_fetch_uses_short_error_ttl is loose; tighten to < 90.
  • No test covers the lifespan path where resolve_default_model() itself raises (the outer except Exception in lifespan). Low priority.

Summary

| Area | Assessment |
| --- | --- |
| Feature design | Solid — override > live > static fallback chain is clear |
| Caching / concurrency | Correct double-checked lock, short error TTL |
| Graceful degradation | No startup failure on missing API key or fetch errors |
| Test coverage | Good — all main paths exercised |
| Startup robustness | Env var parse errors crash at import time (issue 1) |
| Documentation | .env.example and README are thorough |

The only blocker before merging is issue 1 (bare int()/float() on env vars at module import time). The rest are improvements or low-risk notes.

ttlequals0 added a commit to ttlequals0/claude-code-openai-wrapper that referenced this pull request May 11, 2026
…tighten supported-models intro

- Version 2.9.3 -> 2.9.6 in header and docker pin example
- Test count 650 -> 664 in Status and Testing sections
- Add 2.9.6 highlight bullet covering SDK 0.1.81, urllib3/python-multipart sec
  fixes, upstream PR RichardAtCT#46 dynamic-models sync, and check-sdk-version auto-PR
- Add ANTHROPIC_MODELS_URL, ANTHROPIC_VERSION, ANTHROPIC_BETA/ANTHROPIC_BETA_HEADER
  rows to the env var table (advanced overrides for the new live-discovery path)
- Tighten the Supported Models intro paragraph (was 3 dense sentences)
ttlequals0 added a commit to ttlequals0/claude-code-openai-wrapper that referenced this pull request May 11, 2026
…models from upstream, SDK-drift auto-PR (#17)

* feat: dynamically refresh Anthropic model list (RichardAtCT#46)

* feat: dynamically refresh Anthropic model list

* fix: harden /v1/models cache and resolve default model live

- Lock + double-check refresh path so concurrent requests at TTL
  expiry don't stampede the Anthropic Models API.
- Use a short MODEL_LIST_ERROR_TTL_SECONDS (default 60s) for the
  fallback cache so transient outages don't suppress live discovery
  for a full hour.
- Populate `created` (unix timestamp) on both live and fallback
  /v1/models entries to match OpenAI's model object schema.
- Resolve DEFAULT_MODEL at startup by picking the latest Sonnet from
  the live Models API; honor explicit DEFAULT_MODEL env override.

* docs: clarify ANTHROPIC_API_KEY is optional for live model discovery

- README: expand env vars table with ANTHROPIC_API_KEY (optional),
  DEFAULT_MODEL, FAST_MODEL, CLAUDE_MODELS_OVERRIDE, and the model
  list cache/timeout knobs. Rewrite the Supported Models section to
  explain the live-vs-static behavior and refresh the catalog around
  Claude 4.6 family. Bump model examples to claude-sonnet-4-6.
- .env.example: add a Model Discovery (optional) block documenting
  ANTHROPIC_API_KEY, CLAUDE_MODELS_OVERRIDE, and the cache TTLs;
  comment out DEFAULT_MODEL so live resolution drives it by default.
- main.py: log a single explicit info line at startup when live
  discovery is disabled (no ANTHROPIC_API_KEY) so operators see
  whether the dynamic path activated.
- tests: cover the new disabled-path log and update the env-key gate
  in the existing resolve_default_model test.

* chore(v2.9.6): SDK 0.1.81 bump, urllib3/python-multipart sec fixes, SDK-drift workflow auto-PR

- claude-agent-sdk 0.1.68 -> 0.1.81 (13 patch releases since v2.9.5).
- python-multipart ^0.0.26 -> ^0.0.27 (GHSA-pp6c-gr5w-3c5g, supersedes Dependabot PR #16).
- urllib3 security floor >=2.6.3 -> >=2.7.0 (GHSA-qccp-gfcp-xxvc, GHSA-mf9v-mfxr-j63j).
- check-sdk-version.yml opens a draft chore/sdk-bump-<latest> PR on drift instead
  of only writing to the run summary. Permissions widened to contents: write +
  pull-requests: write; idempotent by head branch; fallback summary still fires.

Lockfile regenerated locally with Poetry 2.3.4. Full suite at 664 passed, 31 skipped
(+14 from upstream test_dynamic_models.py picked up in the prior cherry-pick).

* docs(readme): bump to v2.9.6, document new model-discovery env vars, tighten supported-models intro

- Version 2.9.3 -> 2.9.6 in header and docker pin example
- Test count 650 -> 664 in Status and Testing sections
- Add 2.9.6 highlight bullet covering SDK 0.1.81, urllib3/python-multipart sec
  fixes, upstream PR RichardAtCT#46 dynamic-models sync, and check-sdk-version auto-PR
- Add ANTHROPIC_MODELS_URL, ANTHROPIC_VERSION, ANTHROPIC_BETA/ANTHROPIC_BETA_HEADER
  rows to the env var table (advanced overrides for the new live-discovery path)
- Tighten the Supported Models intro paragraph (was 3 dense sentences)

---------

Co-authored-by: Richard A <richardatk01@gmail.com>
