
Phase 0: default inference parameters — backend #441

Open

turisanapo wants to merge 13 commits into main from feat/default-inference-parameters

Conversation

@turisanapo
Contributor

@turisanapo turisanapo commented Apr 23, 2026

Summary

Phase 0 (backend) of #425 — default inference parameters on model configuration.

  • Add parameters field to ModelConfigSchema with temperature, top_p, max_tokens, frequency_penalty, presence_penalty, seed, stop, service_tier, and reasoning (including thinking translation for Anthropic Messages with display/summary/exclude support)
  • Inject and translate defaults at the end of resolveModelAlias using ??= semantics — client values always win
  • Add per-model catalog defaults via additionalProperties for key models (temperature, reasoning effort)
  • 48 unit tests covering all parameters, client overrides, per-surface translation (Chat Completions, Messages, Responses), reasoning/thinking conversion, and no-op behavior

Resolution chain

request body  →  model config (branch.models[].parameters)  →  catalog model defaults
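
For intuition, here is a minimal sketch of that chain with ??= semantics; the helper name and the field set are simplified for illustration and are not the actual hooks.ts code:

```ts
// Illustrative only: a stripped-down version of the precedence rule.
// The real logic lives in injectModelParameters and the catalog wiring.
type Params = { temperature?: number; top_p?: number };

function applyDefaults(body: Params, aliasDefaults?: Params, catalogDefaults?: Params): Params {
  // ??= only assigns when the left side is null/undefined, so a
  // client-supplied value in the request body is never overwritten.
  body.temperature ??= aliasDefaults?.temperature ?? catalogDefaults?.temperature;
  body.top_p ??= aliasDefaults?.top_p ?? catalogDefaults?.top_p;
  return body;
}

// The client's temperature (0.9) wins; top_p falls back to the alias default.
applyDefaults({ temperature: 0.9 }, { temperature: 0.2, top_p: 0.95 }, { temperature: 1.0 });
// → { temperature: 0.9, top_p: 0.95 }
```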

Parameter support

| Parameter | Chat Completions | Messages | Responses |
| --- | --- | --- | --- |
| temperature | | | |
| top_p | | | |
| max_tokens | max_completion_tokens | max_tokens | max_output_tokens |
| frequency_penalty | | | |
| presence_penalty | | | |
| seed | | | |
| stop | stop | stop_sequences (array) | |
| service_tier | | | |
| reasoning | reasoning + reasoning_effort | ✅ → thinking (with display) | reasoning + reasoning_effort |
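
As a rough sketch of the per-surface remapping in the table (the chat/messages/responses operation names mirror the split used in hooks.ts; the helper itself is illustrative, not the PR's code):

```ts
// Illustrative remapping of the renamed fields; other parameters keep their names.
type Operation = "chat" | "messages" | "responses";

function remapRenamedFields(
  body: Record<string, unknown>,
  params: { max_tokens?: number; stop?: string[] },
  operation: Operation,
) {
  if (params.max_tokens !== undefined) {
    if (operation === "chat") body.max_completion_tokens ??= params.max_tokens;
    else if (operation === "messages") body.max_tokens ??= params.max_tokens;
    else body.max_output_tokens ??= params.max_tokens;
  }
  // The table above leaves the Responses column blank for stop, so it is skipped here.
  if (params.stop !== undefined && operation !== "responses") {
    if (operation === "messages") body.stop_sequences ??= params.stop; // Messages expects an array
    else body.stop ??= params.stop;
  }
}
```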

Test plan

  • Unit tests for every parameter injection + client override
  • Per-surface translation (max_tokens field names, stop → stop_sequences, reasoning → thinking)
  • Reasoning display translation (summary → summarized/omitted, exclude → omitted, exclude precedence)
  • Resolution precedence (client > model config > catalog defaults)
  • No-op on empty params
  • Typecheck clean (no new errors)
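
As one concrete illustration of the first two bullets, a vitest-style case against the exported injectModelParameters(body, params, operation) signature (the test runner, the exact params shape, and the assertions here are illustrative; the suite in hooks.test.ts covers far more cases):

```ts
import { describe, expect, it } from "vitest";
import { injectModelParameters } from "./hooks";

describe("injectModelParameters", () => {
  it("injects missing defaults but keeps client-supplied values (chat)", () => {
    const body: Record<string, unknown> = { temperature: 0.9 };
    injectModelParameters(body, { temperature: 0.2, max_tokens: 1024 }, "chat");
    expect(body.temperature).toBe(0.9);            // client override wins
    expect(body.max_completion_tokens).toBe(1024); // chat-surface field name
  });
});
```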

Closes #454

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Models now include optional default generation parameters (temperature, penalties, max tokens, seed, reasoning) to guide inference.
    • Request parameters are injected when missing while preserving any client-supplied values.
    • Parameter fields are normalized and mapped appropriately across different API operations for consistent behavior.
  • Tests

    • Added comprehensive tests validating injection, preservation, remapping, and reasoning-related behavior across operations.

Add a `parameters` field to ModelConfigSchema that allows teams to
configure default inference parameters (temperature, top_p, max_tokens,
service_tier, reasoning) on model aliases. Parameters are injected into
request bodies using ??= semantics so client values always win.

Resolves #425 (Phase 1 — backend)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented Apr 23, 2026

📝 Walkthrough

Adds a ModelParameters schema and type in the API, implements an injectModelParameters helper in the gateway that merges model-default inference parameters into request bodies with operation-specific translations, wires injection into model resolution, and adds tests covering the behavior.

Changes

Default Model Parameters (API + Gateway + Tests)

| Layer / File(s) | Summary |
| --- | --- |
| Data Shape<br>apps/api/src/modules/providers/types.ts | Adds ModelParametersSchema (temperature, top_p, max_tokens, frequency_penalty, presence_penalty, seed, service_tier, cache_control, optional reasoning with enums/limits) and exposes the ModelParameters type; extends ModelConfigSchema with an optional parameters field. |
| Core Implementation<br>apps/gateway/src/lib/hooks.ts | Adds exported injectModelParameters(body, params, operation) that applies defaults using ??= semantics and translates fields by operation (messages → max_tokens + thinking, chat → max_completion_tokens + reasoning/reasoning_effort, responses → max_output_tokens + reasoning), and maps penalties, seed, cache_control, etc. |
| Wiring / Integration<br>apps/gateway/src/lib/hooks.ts | Calls injectModelParameters(ctx.body, model.parameters, ctx.operation) from resolveModelAlias(...) when the resolved model config contains parameters. |
| Formatting / Config<br>apps/gateway/src/gateway.ts | Reformats provider config object literals (fireworks, deepinfra) into multi-line objects with no behavior changes. |
| Tests<br>apps/gateway/src/lib/hooks.test.ts | Adds a comprehensive injectModelParameters test suite: default injection only when missing, operation-specific token-field remapping, reasoning → thinking translation for messages, preservation of client-supplied values, penalties/seed/cache_control injection, and no-op when params are empty. |
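
The messages-branch reasoning → thinking translation is the least obvious of these mappings. Below is a rough sketch of the idea, assuming the internal reasoning shape (enabled, max_tokens, exclude) mentioned in the review comment further down; Anthropic's Messages thinking parameter takes a type and a budget_tokens count, but the mapping details and the fallback budget here are assumptions, not the PR's exact code:

```ts
// Illustrative translation from internal reasoning defaults to Anthropic's
// Messages `thinking` parameter; display/summary/exclude handling is omitted.
interface ReasoningDefaults {
  enabled?: boolean;
  max_tokens?: number; // budget for reasoning tokens
}

function toThinking(reasoning: ReasoningDefaults) {
  if (reasoning.enabled === false) {
    return { type: "disabled" as const };
  }
  return {
    type: "enabled" as const,
    budget_tokens: reasoning.max_tokens ?? 1024, // Messages requires an explicit budget
  };
}
```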

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Gateway
    participant ModelCatalog as Catalog
    participant ModelConfig as ModelConfig

    Client->>Gateway: send request (body, operation)
    Gateway->>Catalog: resolve model alias
    Catalog-->>Gateway: model config (may include parameters)
    Gateway->>ModelConfig: validate/parse ModelParameters
    ModelConfig-->>Gateway: ModelParameters
    Gateway->>Gateway: injectModelParameters(body, params, operation)
    Note over Gateway: translates fields per operation\n(messages→thinking/max_tokens, chat→max_completion_tokens, responses→max_output_tokens)
    Gateway-->>Client: forward enhanced request
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

  • #454: Phase 0: Default inference parameters — backend — Implements the Phase 0 backend scope (schema + injectModelParameters + per-API translation + tests).
  • #425: Epic for default inference parameters — This PR implements backend/catalog wiring that matches the epic requirements.
  • #422: Adds ModelParameters schema and gateway injection — Matches the feature described in that issue.

Poem

🐰 Hop, nibble, code and play,
Defaults tucked in fields today.
Parameters sprout where requests roam,
Request wins, but defaults guide home.
Tiny rabbit cheers: "Merged and bright!" 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Phase 0: default inference parameters — backend' accurately reflects the main change: adding backend support for default inference parameters on model configurations. |
| Linked Issues check | ✅ Passed | The PR implements all core objectives from #454: ModelParametersSchema with required fields, injectModelParameters with ??= semantics, per-API-surface translations (Chat Completions/Messages/Responses), and comprehensive unit test coverage. |
| Out of Scope Changes check | ✅ Passed | All changes align with the #454 scope. The PR adds schema definitions, injection/translation logic in hooks, and extensive test coverage. No console UI (Phase 1) or observability/telemetry (Phase 2) changes are included. |





@turisanapo turisanapo changed the title support default inference parameters on model configuration Phase 0: default inference parameters — backend May 1, 2026
@turisanapo turisanapo marked this pull request as ready for review May 1, 2026 03:03
@turisanapo
Contributor Author

@CodeRabbit please review

@coderabbitai
Contributor

coderabbitai Bot commented May 1, 2026


@turisanapo Sure! I'll start reviewing the changes in PR #441 right away.

[review]

@turisanapo turisanapo requested a review from heiwen May 1, 2026 03:17
@heiwen
Contributor

heiwen commented May 1, 2026

I am a bit confused by the implemented approach. What's the purpose of tying defaults to a configured provider that cannot be changed at runtime? This should live on model aliases instead, shouldn't it?

(minor: I don't see a reason to include the stop parameter)

@turisanapo
Contributor Author

> I am a bit confused by the implemented approach. What's the purpose of tying defaults to a configured provider that cannot be changed at runtime? This should live on model aliases instead, shouldn't it?
>
> (minor: I don't see a reason to include the stop parameter)

Defaults are introduced at two levels:

1/ Model config: configurable per alias, resolved at runtime. These are "user defaults" and will be editable from the console in a different PR (Phase 1 from the issue design)

2/ Catalog model default: system-level. These provide out-of-the-box defaults for selected models.

The resolution chain is:

  request body  →  model config (alias)  →  catalog defaults
    (highest)        (user-configured)        (system)

Where the client always wins.
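
To make the two levels concrete, here is a hypothetical branch entry; only the models[].parameters shape comes from this PR's schema, and the surrounding structure and values are invented for illustration:

```ts
const branch = {
  models: [
    {
      // ...alias/provider fields as defined elsewhere in ModelConfigSchema...
      parameters: {
        temperature: 0.3,             // user default, editable in the console in Phase 1
        max_tokens: 1024,
        reasoning: { effort: "low" },
      },
    },
  ],
};
// A request that already sets temperature keeps its value; anything still unset
// after this level falls through to the catalog default for the model (if any).
```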

In this PR, I'm introducing the ModelParametersSchema, the injection/translation logic across all three API surfaces, and the resolution chain. The console UI for editing model config parameters comes in Phase 1.

Do we agree on having system-level defaults? If not, we can drop them and only support user-configurable defaults.

@heiwen
Contributor

heiwen commented May 1, 2026

I don't see a requirement for catalogue / system-level defaults right now. On the contrary, they can be quite confusing, since consumers will expect defaults similar to those of other gateways / the upstream providers themselves. Correct me if you have a clear customer requirement here.

One more setting to add is caching.

@heiwen
Contributor

heiwen commented May 1, 2026

Plus: we should validate whether preset parameters can be overridden by request parameters in OpenRouter.

@turisanapo
Contributor Author

> Plus: we should validate whether preset parameters can be overridden by request parameters in OpenRouter.

This has already been explored. In OpenRouter, the client wins.

We have investigated other providers too; see the parent issue (#425, "Industry context" section) for more details.

@heiwen
Copy link
Copy Markdown
Contributor

heiwen commented May 1, 2026

Has the OpenRouter behavior been validated or is it just theoretical research?

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/gateway/src/lib/hooks.ts`:
- Around line 62-70: The code is directly passing internal params.reasoning into
body and injecting both body.reasoning and body.reasoning_effort inconsistently;
update the translation so providers only receive fields they expect: when
operation === "chat" do not set body.reasoning at all and set
body.reasoning_effort ??= params.reasoning?.effort (only if client didn't
already provide reasoning_effort), and in the non-messages/non-chat (responses)
branch construct body.reasoning only from allowed keys (e.g., { effort:
params.reasoning?.effort, summary: /* map/derive if applicable */ } but do not
copy internal-only fields like enabled/max_tokens/exclude); ensure you
nullish-coalesce to avoid overwriting existing body.reasoning or
body.reasoning.effort and mirror how the messages branch translates shapes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 53a4d276-3a0e-4ee7-b7ed-269137a9a30a

📥 Commits

Reviewing files that changed from the base of the PR and between ccd21eb and 4932c6c.

📒 Files selected for processing (4)
  • apps/api/src/modules/providers/types.ts
  • apps/gateway/src/gateway.ts
  • apps/gateway/src/lib/hooks.test.ts
  • apps/gateway/src/lib/hooks.ts
✅ Files skipped from review due to trivial changes (2)
  • apps/gateway/src/gateway.ts
  • apps/gateway/src/lib/hooks.test.ts

Comment on lines +62 to +70

```ts
  } else {
    if (operation === "chat") {
      body.max_completion_tokens ??= params.max_tokens;
    } else {
      body.max_output_tokens ??= params.max_tokens;
    }
    body.reasoning ??= params.reasoning;
    body.reasoning_effort ??= params.reasoning?.effort;
  }
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

OpenAI Responses API reasoning parameter accepted fields 2025 — does it allow unknown fields like "enabled" or "max_tokens" in the reasoning object?

💡 Result:

No, the OpenAI Responses API reasoning parameter does not allow unknown fields like "enabled" or "max_tokens". The accepted fields in the reasoning object are "effort" (e.g., "low", "medium") and "summary" (e.g., "auto", "detailed"). Including unknown fields such as "enabled" or "max_tokens" will result in a 400 error with "Unknown parameter" message, as the API strictly validates parameters and rejects unrecognized ones in objects like reasoning. "max_tokens" is a top-level parameter on the responses.create call, not part of reasoning. This behavior is consistent across OpenAI APIs, with evidence from official docs showing only effort and summary used, and multiple error examples for unknown params.

Pass internal reasoning through translation layer matching each provider's schema.

The code at lines 62-70 injects params.reasoning (hebo's internal format with enabled, max_tokens, exclude fields) directly into the request body without translating to provider formats:

  • Responses API: Strictly rejects unknown fields in reasoning with 400 errors; only effort and summary are valid
  • Chat Completions: Has no top-level reasoning object; only reasoning_effort (string) is accepted

Additionally, line 67 creates a conflict when a client supplies body.reasoning_effort but not body.reasoning—line 68 still injects params.reasoning.effort, causing body.reasoning.effort and body.reasoning_effort to diverge.

Translate the internal shape into each provider's expected format (mirroring the messages branch handling), and guard injection of body.reasoning against operation === "chat" where the field is invalid.

Suggested translation for the non-messages branch

```diff
  } else {
    if (operation === "chat") {
      body.max_completion_tokens ??= params.max_tokens;
    } else {
      body.max_output_tokens ??= params.max_tokens;
    }
-   body.reasoning ??= params.reasoning;
-   body.reasoning_effort ??= params.reasoning?.effort;
+   body.reasoning_effort ??= params.reasoning?.effort;
+   // For operations that accept a `reasoning` object (e.g. "responses"),
+   // translate the internal shape into the provider-expected { effort, summary } form.
+   if (operation !== "chat" && params.reasoning && !body.reasoning) {
+     const reasoning: Record<string, unknown> = {};
+     if (params.reasoning.effort) reasoning.effort = params.reasoning.effort;
+     if (params.reasoning.summary && params.reasoning.summary !== "none") {
+       reasoning.summary = params.reasoning.summary;
+     }
+     if (Object.keys(reasoning).length > 0) body.reasoning = reasoning;
+   }
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
-  } else {
-    if (operation === "chat") {
-      body.max_completion_tokens ??= params.max_tokens;
-    } else {
-      body.max_output_tokens ??= params.max_tokens;
-    }
-    body.reasoning ??= params.reasoning;
-    body.reasoning_effort ??= params.reasoning?.effort;
-  }
+  } else {
+    if (operation === "chat") {
+      body.max_completion_tokens ??= params.max_tokens;
+    } else {
+      body.max_output_tokens ??= params.max_tokens;
+    }
+    body.reasoning_effort ??= params.reasoning?.effort;
+    // For operations that accept a `reasoning` object (e.g. "responses"),
+    // translate the internal shape into the provider-expected { effort, summary } form.
+    if (operation !== "chat" && params.reasoning && !body.reasoning) {
+      const reasoning: Record<string, unknown> = {};
+      if (params.reasoning.effort) reasoning.effort = params.reasoning.effort;
+      if (params.reasoning.summary && params.reasoning.summary !== "none") {
+        reasoning.summary = params.reasoning.summary;
+      }
+      if (Object.keys(reasoning).length > 0) body.reasoning = reasoning;
+    }
+  }
```

