refactor(clients): disambiguate model identity + fix /v1/models stub (#100) #101

Merged
antoinezambelli merged 2 commits into main from refactor/model-identity-disambiguation
Jun 1, 2026

@antoinezambelli antoinezambelli commented Jun 1, 2026


What

Two related changes to model identity, in two commits:

  1. Refactor — disambiguates the overloaded self.model across all five clients into two unambiguously-named roles.
  2. Fix #100 (/v1/models endpoint doesn't pass through request to backend and list models): /v1/models now reports the real backend model instead of a hardcoded "forge" stub, which the refactor makes a clean one-liner.

1. Identity disambiguation

| attribute | meaning |
| --- | --- |
| self.model | the wire "model" field, sent verbatim to the backend |
| self.sampling_key | the registry-lookup key for apply_sampling_defaults |

self.model previously meant a derived registry stem on VLLMClient but the wire id on every other client. VLLMClient is the only client that changes meaningfully: the wire id moves from self.model_path to self.model (byte-identical value), the stem moves from self.model to self.sampling_key, and self.model_path is dropped as an attribute. The other four keep self.model as-is and gain a self.sampling_key alias. The --model-path CLI flag and VLLMClient(model_path=...) ctor param are unchanged; model_path survives only at the locked boundary.
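
A minimal sketch of the split described above, not the actual client code. The class names, the SAMPLING_REGISTRY dict, build_payload, and the stem derivation (last path component, lowercased) are all assumptions made for illustration; only the attribute names self.model and self.sampling_key and the model_path ctor param come from this PR.

```python
from pathlib import PurePosixPath

# Hypothetical registry; the real data behind apply_sampling_defaults is not shown in this PR.
SAMPLING_REGISTRY = {"example-8b": {"temperature": 0.7}}


def derive_stem(model_path: str) -> str:
    """Assumed derivation: last path component, lowercased."""
    return PurePosixPath(model_path).name.lower()


class VLLMClientSketch:
    """Stand-in for VLLMClient: ctor param stays model_path, attributes are renamed."""

    def __init__(self, model_path: str) -> None:
        self.model = model_path                       # wire id, sent verbatim
        self.sampling_key = derive_stem(model_path)   # registry-lookup key


class OllamaClientSketch:
    """Stand-in for the other four clients: sampling_key is a plain alias."""

    def __init__(self, model: str) -> None:
        self.model = model
        self.sampling_key = model


def build_payload(client, messages: list) -> dict:
    """Look up defaults by sampling_key, but send client.model on the wire."""
    defaults = SAMPLING_REGISTRY.get(client.sampling_key, {})
    return {"model": client.model, "messages": messages, **defaults}
```

With this shape, a vLLM-style client constructed from "org/Example-8B" sends that exact string as "model" while looking up sampling defaults under "example-8b".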

2. Fix #100 — /v1/models stub

_handle_models returned {"id": "forge"} regardless of backend. It now reports self._client.model — the real wire id (served-model-name for vLLM, gguf stem for llama.cpp, model tag for ollama). No fallback: a client without .model raises rather than serving a false id. The identity refactor is what makes this correct — before it, self._client.model was inconsistent across backends.

Also elevates model: str to the LLMClient protocol (sibling of api_format), making the wire-id attribute a documented contract now that all five clients set it uniformly.
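
The handler change and the protocol addition can be sketched roughly as follows. This is an illustration under assumptions, not the merged code: ProxySketch is a hypothetical stand-in for the proxy class, and the response envelope follows the OpenAI-style list shape, which this PR does not spell out. The names LLMClient, api_format, model, _client, and _handle_models are taken from the description.

```python
from typing import Protocol


class LLMClient(Protocol):
    """Documented contract: every client exposes api_format and the wire id."""

    api_format: str
    model: str


class ProxySketch:
    """Hypothetical stand-in for the proxy that owns _handle_models."""

    def __init__(self, client: LLMClient) -> None:
        self._client = client

    def _handle_models(self) -> dict:
        # Direct attribute read, deliberately without getattr(..., default):
        # a client lacking .model raises AttributeError instead of serving a false id.
        model_id = self._client.model
        # Envelope shape is an assumption (OpenAI-style model list).
        return {"object": "list", "data": [{"id": model_id, "object": "model"}]}
```

Because Protocol attributes are only checked by a static type checker, the direct read is what enforces the contract at runtime in this sketch.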

Compatibility: zero proxy-user impact

CLI flags, ctor kwargs, the wire "model" value, and output schemas are all untouched. The renames are client-internal variable names; the only externally-visible behavior change is the intended one — /v1/models now tells the truth.

Verification

  • Full tests/unit suite green (1086).
  • Mock proxy smoke (scripts/smoke_test_proxy.py) extended with /v1/models coverage (none before).
  • Live smoke against real backends on an 8B Q4 (llama.cpp + ollama): /v1/models reports the real id on both; tool-call round-trips end-to-end (ollama validates the tag, confirming wire-id integrity).
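
For readers who want to reproduce the /v1/models check by hand, a smoke probe along these lines should work. It is a sketch only: the contents of scripts/smoke_test_proxy.py are not shown in this PR, the function names are hypothetical, the stub server merely imitates a proxy for local testing, and the response envelope (a "data" list of objects with "id") is assumed.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def fetch_model_ids(base_url: str) -> list:
    """GET {base_url}/v1/models and return the reported model ids."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
        body = json.load(resp)
    return [entry["id"] for entry in body["data"]]


def make_stub_server(model_id: str) -> HTTPServer:
    """Local stand-in for the proxy, bound to an ephemeral port on 127.0.0.1."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/v1/models":
                self.send_error(404)
                return
            payload = json.dumps(
                {"object": "list", "data": [{"id": model_id, "object": "model"}]}
            ).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, *args):
            pass  # keep smoke output quiet

    return HTTPServer(("127.0.0.1", 0), Handler)
```

Against a real proxy, only fetch_model_ids is needed: point it at the proxy's base URL and compare the result with the id the backend was configured with.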

Lands in v0.7.4.

🤖 Generated with Claude Code

antoinezambelli and others added 2 commits June 1, 2026 17:20
Every client now uses two unambiguously-named identity attributes:
  - self.model        the wire "model" field, sent verbatim to the backend
  - self.sampling_key  the registry-lookup key for apply_sampling_defaults

Previously self.model meant different things across clients: a derived
registry-lookup stem on VLLMClient, but the wire id (doubling as the key)
on ollama/openai_compat/anthropic/llamafile. That overload was the smell.

VLLMClient is the only client that changes meaningfully: its wire id moves
from self.model_path to self.model (the value sent is byte-identical), and
the derived stem moves from self.model to self.sampling_key. The other four
clients keep self.model exactly as-is and gain a self.sampling_key alias so
the registry lookup reads an unambiguous name.

self.model_path is dropped as an attribute (nothing external read it). The
--model-path CLI flag and VLLMClient(model_path=...) ctor param are
unchanged; model_path lives on only at the locked boundary.

Zero proxy-user impact: CLI flags, ctor kwargs, wire values, and all output
schemas (/v1/models, completion model echo, eval JSONL) are untouched.
Internal-only; no version bump (rides the next release).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…t protocol

Closes #100.

/v1/models previously returned a hardcoded {"id": "forge"} stub regardless of
the backend. It now reports self._client.model — the real wire id every client
carries after the identity refactor (served-model-name for vLLM, gguf stem for
llama.cpp, model tag for ollama). No fallback: a client lacking .model raises
rather than serving a lie.

Elevates model: str to the LLMClient protocol (sibling of api_format), making
the wire-id attribute a real contract now that all five clients set it
uniformly. No type-checker runs in CI today, so this is a documented contract
rather than an enforced one — the direct read is what does the work.

Also extends scripts/smoke_test_proxy.py with /v1/models coverage (absent
before): the external "default" placeholder case and the configured-model case.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@antoinezambelli antoinezambelli changed the title from "refactor(clients): disambiguate model identity into model (wire id) + sampling_key" to "refactor(clients): disambiguate model identity + fix /v1/models stub (#100)" Jun 1, 2026
@antoinezambelli antoinezambelli merged commit ad16280 into main Jun 1, 2026
2 checks passed
@antoinezambelli antoinezambelli deleted the refactor/model-identity-disambiguation branch June 1, 2026 23:05

Development

Successfully merging this pull request may close these issues.

/v1/models endpoint doesn't pass through request to backend and list models
