Problem
When a harness agent is invoked with --model-id to override the model at invoke time, all model inference parameters (temperature, topP, topK, maxTokens) configured in harness.json are dropped from the request. The server treats the invoke-time model override as a full replacement of the deployed model config, so the omitted fields fall back to default inference behavior, which leaves limits such as maxTokens unenforced.
Example: A harness configured with maxTokens: 500 will ignore that limit when invoked with --model-id anthropic.claude-v3, producing responses far exceeding 500 tokens.
Root Cause
buildHarnessBaseOpts in the invoke command constructed the model override object with only the modelId (and apiKeyArn for non-Bedrock providers), without forwarding the inference parameters from the harness spec.
Impact
- Affects all provider types (Bedrock, OpenAI, Gemini)
- Only triggered when using --model-id at invoke time
- Deploy-time configuration is unaffected (the parameters are correctly stored on the server)
Fix
PR #1287: forwards the inference parameters (temperature, topP, topK, maxTokens) from the harness spec when constructing the model override. CLI-provided values (e.g. --api-key-arn) take precedence over the corresponding values in the spec.
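The difference between the old and new behavior can be sketched roughly as follows. This is an illustration only, not the code from the PR: the type names, the two helper functions, and the "deployed-model" identifier are hypothetical, and the way the pre-fix code resolved apiKeyArn is assumed rather than taken from the source.

```typescript
// Hypothetical shapes: the real harness spec and override types are not shown in this report.
interface HarnessModelSpec {
  modelId: string;
  apiKeyArn?: string;
  temperature?: number;
  topP?: number;
  topK?: number;
  maxTokens?: number;
}

// Values supplied on the command line at invoke time.
interface CliModelFlags {
  modelId?: string;   // --model-id
  apiKeyArn?: string; // --api-key-arn
}

type ModelOverride = HarnessModelSpec;

const INFERENCE_KEYS = ["temperature", "topP", "topK", "maxTokens"] as const;

// Pre-fix behavior as described under Root Cause: only modelId
// (and apiKeyArn, when present) end up in the override.
function buildOverrideBeforeFix(
  spec: HarnessModelSpec,
  flags: CliModelFlags
): ModelOverride | undefined {
  if (flags.modelId === undefined) return undefined; // no override requested
  const override: ModelOverride = { modelId: flags.modelId };
  const apiKeyArn = flags.apiKeyArn ?? spec.apiKeyArn; // assumed resolution order
  if (apiKeyArn !== undefined) override.apiKeyArn = apiKeyArn;
  return override;
}

// Post-fix behavior: inference parameters from the spec are forwarded,
// and CLI-provided values win over the spec where both exist.
function buildOverrideAfterFix(
  spec: HarnessModelSpec,
  flags: CliModelFlags
): ModelOverride | undefined {
  if (flags.modelId === undefined) return undefined; // no override requested
  const override: ModelOverride = { modelId: flags.modelId };
  for (const key of INFERENCE_KEYS) {
    const value = spec[key];
    if (value !== undefined) override[key] = value; // skip unset parameters
  }
  const apiKeyArn = flags.apiKeyArn ?? spec.apiKeyArn;
  if (apiKeyArn !== undefined) override.apiKeyArn = apiKeyArn;
  return override;
}

const spec: HarnessModelSpec = { modelId: "deployed-model", temperature: 0.2, maxTokens: 500 };
const flags: CliModelFlags = { modelId: "anthropic.claude-v3" };

console.log(buildOverrideBeforeFix(spec, flags)); // maxTokens and temperature are missing
console.log(buildOverrideAfterFix(spec, flags));  // maxTokens: 500 and temperature: 0.2 are kept
```

Because the server is described as treating the override as a full replacement, the client has to send every parameter it wants preserved. Copying only the parameters that are actually set keeps the request free of explicit undefined values.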
Affected Versions
Preview branch only (harness feature has not shipped to mainline).