[Evaluator][Skill Eval] Support configurable Switchyard backends for skill-eval agent runs

## Request

Platform QA would like to run Evaluator skill-eval against different Switchyard backend profiles, so we can compare skill-eval behavior under backend routing strategies instead of only evaluating a single direct model.

Desired profiles:

- default model passthrough
- Switchyard random routing, for example GPT 5.4 / DeepSeek V4
- Switchyard dynamic routing, for example GPT 5.4 / DeepSeek V4 with DeepSeek V4 as classifier

## What works today

Switchyard / IGW routing itself appears functional. A Switchyard VirtualModel with translate middleware can accept Claude Code style Anthropic /v1/messages requests and route them to OpenAI-compatible backend models successfully.

## Current blocker

When running astra-skill-eval evaluate with Harbor and claude-code, the run fails before skill-eval trials start. Harbor agent/model preflight rejects the Platform Switchyard VirtualModel model id.

Observed failure shape:

    Error: model not available for claude-code:
    default/<switchyard-virtual-model-name>

    Available claude-code models for this key:
      aws/anthropic/bedrock-claude-opus-4-7
      aws/anthropic/bedrock-claude-opus-4-6
      ...

## Source analysis

From astra-skill-eval / Harbor source analysis:

- runner.py resolves ANTHROPIC_MODEL as the claude-code model.
- model_catalog.py validates the selected model against NVIDIA /models catalog.
- For claude-code, the compatibility filter only accepts model ids containing anthropic or claude.
- A Platform Switchyard VM model id such as default/skill-eval-swy-random-... is not in the public NVIDIA catalog and is rejected before Claude Code calls the configured ANTHROPIC_BASE_URL.

The Harbor Claude Code adapter itself appears capable of using custom ANTHROPIC_BASE_URL + ANTHROPIC_MODEL. The blocking issue is the preflight catalog validation path.

## Expected behavior

Platform Evaluator skill-eval should support configuring Switchyard backend routing for agent eval runs.

When ANTHROPIC_BASE_URL or another custom gateway base URL points to Platform IGW, astra-skill-eval / Harbor should either:

- validate model availability against the custom gateway /models endpoint, or
- allow bypassing public NVIDIA catalog validation for custom gateway models, or
- provide an explicit config flag for custom agent model preflight.

## Impact

QA cannot run end-to-end Evaluator skill-eval coverage through Switchyard backends, even though the gateway route itself works. This blocks testing skill-eval behavior across backend routing modes and makes it hard to compare default passthrough vs random vs dynamic Switchyard routing.

## Suggested acceptance criteria

- Evaluator skill-eval can run with a configured Switchyard VirtualModel as the claude-code agent model.
- The run performs with-skill and without-skill baseline trials normally.
- The resulting artifacts show selected backend profile, agent model, and routing stats/distribution.
- Custom gateway models are not rejected solely because they are absent from NVIDIA public model catalog.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Evaluator][Skill Eval] Support configurable Switchyard backends for skill-eval agent runs #151

Request

What works today

Current blocker

Source analysis

Expected behavior

Impact

Suggested acceptance criteria

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Evaluator][Skill Eval] Support configurable Switchyard backends for skill-eval agent runs #151

Description

Request

What works today

Current blocker

Source analysis

Expected behavior

Impact

Suggested acceptance criteria

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions