Phase 0: default inference parameters — backend (#441)
Conversation
Add a `parameters` field to `ModelConfigSchema` that allows teams to configure default inference parameters (temperature, top_p, max_tokens, service_tier, reasoning) on model aliases. Parameters are injected into request bodies using `??=` semantics, so client values always win. Resolves #425 (Phase 1 — backend). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
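As a hedged sketch (type and field names simplified here, not the actual gateway code), the `??=` injection described above behaves like this:

```typescript
// Simplified sketch of `??=` injection: defaults fill only the fields the
// client left null/undefined, so client-supplied values always win.
type InferenceParams = {
  temperature?: number;
  top_p?: number;
  max_tokens?: number;
};

function applyDefaults(
  body: InferenceParams,
  defaults: InferenceParams,
): InferenceParams {
  body.temperature ??= defaults.temperature;
  body.top_p ??= defaults.top_p;
  body.max_tokens ??= defaults.max_tokens;
  return body;
}

// Client sent temperature only; max_tokens is filled from the model default,
// while the client's temperature is left untouched.
const body = applyDefaults(
  { temperature: 0.2 },
  { temperature: 0.7, max_tokens: 1024 },
);
```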
📝 Walkthrough

Adds a `ModelParameters` schema and type in the API, implements an `injectModelParameters` helper in the gateway that merges model-default inference parameters into request bodies with operation-specific translations, wires injection into model resolution, and adds tests covering the behavior.

Changes: Default Model Parameters (API + Gateway + Tests)
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Gateway
    participant ModelCatalog as Catalog
    participant ModelConfig
    Client->>Gateway: send request (body, operation)
    Gateway->>ModelCatalog: resolve model alias
    ModelCatalog-->>Gateway: model config (may include parameters)
    Gateway->>ModelConfig: validate/parse ModelParameters
    ModelConfig-->>Gateway: ModelParameters
    Gateway->>Gateway: injectModelParameters(body, params, operation)
    Note over Gateway: translates fields per operation<br/>(messages→thinking/max_tokens, chat→max_completion_tokens, responses→max_output_tokens)
    Gateway-->>Client: forward enhanced request
```
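The operation-specific translation noted in the diagram can be sketched as follows. This is illustrative only (the `Operation` type and `maxTokensField` helper are hypothetical names, not the actual hooks implementation); it shows which field each upstream operation expects for the max-tokens default:

```typescript
// Sketch: map the internal max_tokens default onto the field each
// upstream operation expects, per the note in the diagram above.
type Operation = "messages" | "chat" | "responses";

function maxTokensField(operation: Operation): string {
  switch (operation) {
    case "messages":
      return "max_tokens"; // Anthropic Messages
    case "chat":
      return "max_completion_tokens"; // OpenAI Chat Completions
    case "responses":
      return "max_output_tokens"; // OpenAI Responses
  }
}
```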
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
@CodeRabbit please review
I am a bit confused by the implemented approach. What's the purpose of tying defaults to a configured provider that cannot be changed at runtime? This should live on model aliases instead, shouldn't it? (minor: don't see a reason to include
Defaults are introduced at two levels:

1. Model config: configurable per alias, resolved at runtime. These are "user defaults" and will be editable from the console in a different PR (Phase 1 from the issue design).
2. Catalog model default: system-level. These provide out-of-the-box defaults for selected models.

The resolution chain is:
Where the client always wins. In this PR, I'm introducing the

Do we agree on having system-level defaults? Alternatively, we should drop them and only support user-configurable defaults.
I don't see a requirement for catalogue / system-level defaults right now. On the contrary, it can be quite confusing, since consumers will expect defaults similar to those of other gateways / the upstream providers themselves. Correct me if you have a clear customer requirement here. One more setting to add is caching.

Plus: to validate whether preset parameters can be overridden by request parameters in OpenRouter.
This has already been explored. In OpenRouter the client wins. We have investigated other providers too; you can see more details in the parent issue: #425, Industry context section.
Has the OpenRouter behavior been validated or is it just theoretical research? |
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/gateway/src/lib/hooks.ts`:
- Around line 62-70: The code is directly passing internal params.reasoning into
body and injecting both body.reasoning and body.reasoning_effort inconsistently;
update the translation so providers only receive fields they expect: when
operation === "chat" do not set body.reasoning at all and set
body.reasoning_effort ??= params.reasoning?.effort (only if client didn't
already provide reasoning_effort), and in the non-messages/non-chat (responses)
branch construct body.reasoning only from allowed keys (e.g., { effort:
params.reasoning?.effort, summary: /* map/derive if applicable */ } but do not
copy internal-only fields like enabled/max_tokens/exclude); ensure you
nullish-coalesce to avoid overwriting existing body.reasoning or
body.reasoning.effort and mirror how the messages branch translates shapes.
📒 Files selected for processing (4)
- apps/api/src/modules/providers/types.ts
- apps/gateway/src/gateway.ts
- apps/gateway/src/lib/hooks.test.ts
- apps/gateway/src/lib/hooks.ts
✅ Files skipped from review due to trivial changes (2)
- apps/gateway/src/gateway.ts
- apps/gateway/src/lib/hooks.test.ts
```ts
} else {
  if (operation === "chat") {
    body.max_completion_tokens ??= params.max_tokens;
  } else {
    body.max_output_tokens ??= params.max_tokens;
  }
  body.reasoning ??= params.reasoning;
  body.reasoning_effort ??= params.reasoning?.effort;
}
```
🧩 Analysis chain
🌐 Web query:
OpenAI Responses API reasoning parameter accepted fields 2025 — does it allow unknown fields like "enabled" or "max_tokens" in the reasoning object?
💡 Result:
No, the OpenAI Responses API reasoning parameter does not allow unknown fields like "enabled" or "max_tokens". The accepted fields in the reasoning object are "effort" (e.g., "low", "medium") and "summary" (e.g., "auto", "detailed"). Including unknown fields such as "enabled" or "max_tokens" will result in a 400 error with "Unknown parameter" message, as the API strictly validates parameters and rejects unrecognized ones in objects like reasoning. "max_tokens" is a top-level parameter on the responses.create call, not part of reasoning. This behavior is consistent across OpenAI APIs, with evidence from official docs showing only effort and summary used, and multiple error examples for unknown params.
Citations:
- 1: https://developers.openai.com/api/reference/resources/responses
- 2: https://platform.openai.com/docs/guides/reasoning?api-mode=responses
- 3: https://developers.openai.com/api/reference/resources/responses/methods/create
- 4: Responses API: multi-turn conversations 400 on turn 2 when passing response.output back as input openai/openai-python#3008
- 5: https://community.openai.com/t/o1s-reasoning-effort-parameter/1062308
- 6: [Bug] Custom OpenAI-compatible provider errors with "Unknown parameter: 'reasoningSummary'" for GPT-5 series models anomalyco/opencode#13546
- 7: https://stackoverflow.com/questions/79039544/openai-api-error-invalid-parameter-response-format-of-type-json-schema-is
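To make the constraint from the query result concrete, here is a small illustrative guard. The field names come from the result above; the helper itself is hypothetical, not an official SDK type:

```typescript
// Illustrative only: the Responses API `reasoning` object accepts just
// `effort` and `summary` (per the query result above); anything else,
// e.g. `enabled` or `max_tokens`, is rejected upstream with a 400.
const allowedReasoningKeys = new Set(["effort", "summary"]);

function isAllowedReasoningKey(key: string): boolean {
  return allowedReasoningKeys.has(key);
}
```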
Pass internal reasoning through translation layer matching each provider's schema.
The code at lines 62-70 injects `params.reasoning` (hebo's internal format with `enabled`, `max_tokens`, `exclude` fields) directly into the request body without translating to provider formats:
- Responses API: strictly rejects unknown fields in `reasoning` with 400 errors; only `effort` and `summary` are valid
- Chat Completions: has no top-level `reasoning` object; only `reasoning_effort` (string) is accepted
Additionally, line 67 creates a conflict when a client supplies `body.reasoning_effort` but not `body.reasoning`: line 68 still injects `params.reasoning.effort`, causing `body.reasoning.effort` and `body.reasoning_effort` to diverge.
Translate the internal shape into each provider's expected format (mirroring the messages branch handling), and guard injection of body.reasoning against operation === "chat" where the field is invalid.
Suggested translation for the non-messages branch:

```diff
 } else {
   if (operation === "chat") {
     body.max_completion_tokens ??= params.max_tokens;
   } else {
     body.max_output_tokens ??= params.max_tokens;
   }
-  body.reasoning ??= params.reasoning;
-  body.reasoning_effort ??= params.reasoning?.effort;
+  body.reasoning_effort ??= params.reasoning?.effort;
+  // For operations that accept a `reasoning` object (e.g. "responses"),
+  // translate the internal shape into the provider-expected { effort, summary } form.
+  if (operation !== "chat" && params.reasoning && !body.reasoning) {
+    const reasoning: Record<string, unknown> = {};
+    if (params.reasoning.effort) reasoning.effort = params.reasoning.effort;
+    if (params.reasoning.summary && params.reasoning.summary !== "none") {
+      reasoning.summary = params.reasoning.summary;
+    }
+    if (Object.keys(reasoning).length > 0) body.reasoning = reasoning;
+  }
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
} else {
  if (operation === "chat") {
    body.max_completion_tokens ??= params.max_tokens;
  } else {
    body.max_output_tokens ??= params.max_tokens;
  }
  body.reasoning_effort ??= params.reasoning?.effort;
  // For operations that accept a `reasoning` object (e.g. "responses"),
  // translate the internal shape into the provider-expected { effort, summary } form.
  if (operation !== "chat" && params.reasoning && !body.reasoning) {
    const reasoning: Record<string, unknown> = {};
    if (params.reasoning.effort) reasoning.effort = params.reasoning.effort;
    if (params.reasoning.summary && params.reasoning.summary !== "none") {
      reasoning.summary = params.reasoning.summary;
    }
    if (Object.keys(reasoning).length > 0) body.reasoning = reasoning;
  }
}
```
Summary
Phase 0 (backend) of #425 — default inference parameters on model configuration.
- `parameters` field to `ModelConfigSchema` with temperature, top_p, max_tokens, frequency_penalty, presence_penalty, seed, stop, service_tier, and reasoning (including `thinking` translation for Anthropic Messages with display/summary/exclude support)
- Injected in `resolveModelAlias` using `??=` semantics — client values always win
- `additionalProperties` for key models (temperature, reasoning effort)

Resolution chain
Parameter support

| Parameter | Chat Completions | Messages (Anthropic) | Responses |
| --- | --- | --- | --- |
| max_tokens | `max_completion_tokens` | `max_tokens` | `max_output_tokens` |
| stop | `stop` | `stop_sequences` (array) | |
| reasoning | `reasoning` + `reasoning_effort` | `thinking` (with display) | `reasoning` + `reasoning_effort` |

Test plan
Closes #454
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Tests