Request
Platform QA would like to run Evaluator skill-eval against different Switchyard backend profiles, so we can compare skill-eval behavior under backend routing strategies instead of only evaluating a single direct model.
Desired profiles:
- default model passthrough
- Switchyard random routing, for example GPT 5.4 / DeepSeek V4
- Switchyard dynamic routing, for example GPT 5.4 / DeepSeek V4 with DeepSeek V4 as classifier
What works today
Switchyard / IGW routing itself appears functional. A Switchyard VirtualModel with translate middleware can accept Claude Code style Anthropic /v1/messages requests and route them to OpenAI-compatible backend models successfully.
Current blocker
When running astra-skill-eval evaluate with Harbor and claude-code, the run fails before skill-eval trials start. Harbor agent/model preflight rejects the Platform Switchyard VirtualModel model id.
Observed failure shape:
Error: model not available for claude-code:
default/<switchyard-virtual-model-name>
Available claude-code models for this key:
aws/anthropic/bedrock-claude-opus-4-7
aws/anthropic/bedrock-claude-opus-4-6
...
Source analysis
From astra-skill-eval / Harbor source analysis:
- runner.py resolves ANTHROPIC_MODEL as the claude-code model.
- model_catalog.py validates the selected model against NVIDIA /models catalog.
- For claude-code, the compatibility filter only accepts model ids containing anthropic or claude.
- A Platform Switchyard VM model id such as default/skill-eval-swy-random-... is not in the public NVIDIA catalog and is rejected before Claude Code calls the configured ANTHROPIC_BASE_URL.
The Harbor Claude Code adapter itself appears capable of using custom ANTHROPIC_BASE_URL + ANTHROPIC_MODEL. The blocking issue is the preflight catalog validation path.
Expected behavior
Platform Evaluator skill-eval should support configuring Switchyard backend routing for agent eval runs.
When ANTHROPIC_BASE_URL or another custom gateway base URL points to Platform IGW, astra-skill-eval / Harbor should either:
- validate model availability against the custom gateway /models endpoint, or
- allow bypassing public NVIDIA catalog validation for custom gateway models, or
- provide an explicit config flag for custom agent model preflight.
Impact
QA cannot run end-to-end Evaluator skill-eval coverage through Switchyard backends, even though the gateway route itself works. This blocks testing skill-eval behavior across backend routing modes and makes it hard to compare default passthrough vs random vs dynamic Switchyard routing.
Suggested acceptance criteria
- Evaluator skill-eval can run with a configured Switchyard VirtualModel as the claude-code agent model.
- The run performs with-skill and without-skill baseline trials normally.
- The resulting artifacts show selected backend profile, agent model, and routing stats/distribution.
- Custom gateway models are not rejected solely because they are absent from NVIDIA public model catalog.
Request
Platform QA would like to run Evaluator skill-eval against different Switchyard backend profiles, so we can compare skill-eval behavior under backend routing strategies instead of only evaluating a single direct model.
Desired profiles:
What works today
Switchyard / IGW routing itself appears functional. A Switchyard VirtualModel with translate middleware can accept Claude Code style Anthropic /v1/messages requests and route them to OpenAI-compatible backend models successfully.
Current blocker
When running astra-skill-eval evaluate with Harbor and claude-code, the run fails before skill-eval trials start. Harbor agent/model preflight rejects the Platform Switchyard VirtualModel model id.
Observed failure shape:
Source analysis
From astra-skill-eval / Harbor source analysis:
The Harbor Claude Code adapter itself appears capable of using custom ANTHROPIC_BASE_URL + ANTHROPIC_MODEL. The blocking issue is the preflight catalog validation path.
Expected behavior
Platform Evaluator skill-eval should support configuring Switchyard backend routing for agent eval runs.
When ANTHROPIC_BASE_URL or another custom gateway base URL points to Platform IGW, astra-skill-eval / Harbor should either:
Impact
QA cannot run end-to-end Evaluator skill-eval coverage through Switchyard backends, even though the gateway route itself works. This blocks testing skill-eval behavior across backend routing modes and makes it hard to compare default passthrough vs random vs dynamic Switchyard routing.
Suggested acceptance criteria