Model Regression Tests

Purpose

tests/model_regression.rs is a live network regression suite for provider/model compatibility checks.

It verifies that the default model profiles shipped in src/templates/buddy.toml can each complete:

a tiny plain-text round-trip, and
a protocol-valid follow-up turn that includes prior assistant tool-calls plus a tool-error result.

This suite is intentionally separate from normal unit tests because it requires network access and real credentials.

What It Checks

For each profile in the default template:

Select the profile via normal runtime config resolution.
Validate auth prerequisites:
- auth = "api-key": resolved key must be non-empty.
- auth = "login": saved provider login tokens must already exist.
- If a profile does not set api_key_env, the suite falls back to provider defaults: OPENAI_API_KEY, OPENROUTER_API_KEY, MOONSHOT_API_KEY, ANTHROPIC_API_KEY.
Send a minimal prompt (Reply with exactly OK.) with no tools.
Assert:
- response has at least one choice,
- no unexpected tool calls are returned,
- assistant text content is non-empty.
Send a second probe with injected tool history:
- assistant tool-call message,
- matching tool-result message containing an error payload,
- user follow-up requesting a plain text reply.
Assert:
- provider accepts the history shape,
- assistant text content is still non-empty (error recovery path remains viable).

Profile execution is independent:

missing credentials remain failures for that profile,
model_not_found responses are reported as skipped (profile unavailable on the current account/API),
one profile failure does not stop probes for other profiles.
auth-mode expectations are read from src/templates/models.toml capability metadata (for example login-only models are probed with login auth even when profile auth defaults to API key).

Why It Is Ignored By Default

The test is marked #[ignore] so cargo test remains offline, fast, and deterministic.

How To Run

cargo test --test model_regression -- --ignored --nocapture

Required Setup

Ensure template-referenced API key env vars are available for API-key profiles.
- Example defaults: OPENROUTER_API_KEY, MOONSHOT_API_KEY, ANTHROPIC_API_KEY.
Ensure login profiles are already authenticated (for default OpenAI login profiles):

buddy login openai

Unset global runtime override env vars before running this suite:
- BUDDY_API_KEY, AGENT_API_KEY
- BUDDY_BASE_URL, AGENT_BASE_URL
- BUDDY_MODEL, AGENT_MODEL

These overrides intentionally fail the suite because they mask per-profile behavior.

Cost/Token Discipline

The suite uses two short requests per profile and keeps prompts minimal. The tool-error-history probe uses synthetic tool history in the message list (it does not actually execute shell commands).

Failure Triage

Common failure classes:

auth preflight failed (missing env key or missing login tokens)
provider timeout / transport error
provider accepted request but returned empty assistant content
unexpected tool-call response for a no-tools probe
provider rejected prior tool-call/tool-result history shape

When a failure appears, rerun the single suite command with --nocapture to see per-profile logs.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Model Regression Tests

Purpose

What It Checks

Why It Is Ignored By Default

How To Run

Required Setup

Cost/Token Discipline

Failure Triage

FilesExpand file tree

model-regression-tests.md

Latest commit

History

model-regression-tests.md

File metadata and controls

Model Regression Tests

Purpose

What It Checks

Why It Is Ignored By Default

How To Run

Required Setup

Cost/Token Discipline

Failure Triage