
Phase 0: default inference parameters — backend #441

Open

turisanapo wants to merge 13 commits into main from feat/default-inference-parameters

Conversation

@turisanapo
Contributor

@turisanapo turisanapo commented Apr 23, 2026

Summary

Phase 0 (backend) of #425 — default inference parameters on model configuration.

  • Add parameters field to ModelConfigSchema with temperature, top_p, max_tokens, frequency_penalty, presence_penalty, seed, stop, service_tier, and reasoning (including thinking translation for Anthropic Messages with display/summary/exclude support)
  • Inject and translate defaults at the end of resolveModelAlias using ??= semantics — client values always win
  • Add per-model catalog defaults via additionalProperties for key models (temperature, reasoning effort)
  • 48 unit tests covering all parameters, client overrides, per-surface translation (Chat Completions, Messages, Responses), reasoning/thinking conversion, and no-op behavior

Resolution chain

request body  →  model config (branch.models[].parameters)  →  catalog model defaults
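
For intuition, here is a minimal sketch of that chain with ??= semantics; the helper name and the field set are simplified for illustration and are not the actual hooks.ts code:

```ts
// Illustrative only: a stripped-down version of the precedence rule.
// The real logic lives in injectModelParameters and the catalog wiring.
type Params = { temperature?: number; top_p?: number };

function applyDefaults(body: Params, aliasDefaults?: Params, catalogDefaults?: Params): Params {
  // ??= only assigns when the left side is null/undefined, so a
  // client-supplied value in the request body is never overwritten.
  body.temperature ??= aliasDefaults?.temperature ?? catalogDefaults?.temperature;
  body.top_p ??= aliasDefaults?.top_p ?? catalogDefaults?.top_p;
  return body;
}

// The client's temperature (0.9) wins; top_p falls back to the alias default.
applyDefaults({ temperature: 0.9 }, { temperature: 0.2, top_p: 0.95 }, { temperature: 1.0 });
// → { temperature: 0.9, top_p: 0.95 }
```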

Parameter support

| Parameter | Chat Completions | Messages | Responses |
| --- | --- | --- | --- |
| temperature | | | |
| top_p | | | |
| max_tokens | max_completion_tokens | max_tokens | max_output_tokens |
| frequency_penalty | | | |
| presence_penalty | | | |
| seed | | | |
| stop | stop | stop_sequences (array) | |
| service_tier | | | |
| reasoning | reasoning + reasoning_effort | ✅ → thinking (with display) | reasoning + reasoning_effort |
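
As a rough sketch of the per-surface remapping in the table (the chat/messages/responses operation names mirror the split used in hooks.ts; the helper itself is illustrative, not the PR's code):

```ts
// Illustrative remapping of the renamed fields; other parameters keep their names.
type Operation = "chat" | "messages" | "responses";

function remapRenamedFields(
  body: Record<string, unknown>,
  params: { max_tokens?: number; stop?: string[] },
  operation: Operation,
) {
  if (params.max_tokens !== undefined) {
    if (operation === "chat") body.max_completion_tokens ??= params.max_tokens;
    else if (operation === "messages") body.max_tokens ??= params.max_tokens;
    else body.max_output_tokens ??= params.max_tokens;
  }
  // The table above leaves the Responses column blank for stop, so it is skipped here.
  if (params.stop !== undefined && operation !== "responses") {
    if (operation === "messages") body.stop_sequences ??= params.stop; // Messages expects an array
    else body.stop ??= params.stop;
  }
}
```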

Test plan

  • Unit tests for every parameter injection + client override
  • Per-surface translation (max_tokens field names, stop → stop_sequences, reasoning → thinking)
  • Reasoning display translation (summary → summarized/omitted, exclude → omitted, exclude precedence)
  • Resolution precedence (client > model config > catalog defaults)
  • No-op on empty params
  • Typecheck clean (no new errors)
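
As one concrete illustration of the first two bullets, a vitest-style case against the exported injectModelParameters(body, params, operation) signature (the test runner, the exact params shape, and the assertions here are illustrative; the suite in hooks.test.ts covers far more cases):

```ts
import { describe, expect, it } from "vitest";
import { injectModelParameters } from "./hooks";

describe("injectModelParameters", () => {
  it("injects missing defaults but keeps client-supplied values (chat)", () => {
    const body: Record<string, unknown> = { temperature: 0.9 };
    injectModelParameters(body, { temperature: 0.2, max_tokens: 1024 }, "chat");
    expect(body.temperature).toBe(0.9);            // client override wins
    expect(body.max_completion_tokens).toBe(1024); // chat-surface field name
  });
});
```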

Closes #454

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Models now include optional default generation parameters (temperature, penalties, max tokens, seed, reasoning) to guide inference.
    • Request parameters are injected when missing while preserving any client-supplied values.
    • Parameter fields are normalized and mapped appropriately across different API operations for consistent behavior.
  • Tests

    • Added comprehensive tests validating injection, preservation, remapping, and reasoning-related behavior across operations.

Add a `parameters` field to ModelConfigSchema that allows teams to
configure default inference parameters (temperature, top_p, max_tokens,
service_tier, reasoning) on model aliases. Parameters are injected into
request bodies using ??= semantics so client values always win.

Resolves #425 (Phase 1 — backend)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented Apr 23, 2026

📝 Walkthrough

Adds a ModelParameters schema and type in the API, implements an injectModelParameters helper in the gateway that merges model-default inference parameters into request bodies with operation-specific translations, wires injection into model resolution, and adds tests covering the behavior.

Changes

Default Model Parameters (API + Gateway + Tests)

| Layer / File(s) | Summary |
| --- | --- |
| Data Shape<br>apps/api/src/modules/providers/types.ts | Adds ModelParametersSchema (temperature, top_p, max_tokens, frequency_penalty, presence_penalty, seed, service_tier, cache_control, optional reasoning with enums/limits) and exposes the ModelParameters type; extends ModelConfigSchema with an optional parameters field. |
| Core Implementation<br>apps/gateway/src/lib/hooks.ts | Adds exported injectModelParameters(body, params, operation) that applies defaults using ??= semantics and translates fields by operation (messages → max_tokens + thinking, chat → max_completion_tokens + reasoning/reasoning_effort, responses → max_output_tokens + reasoning), and maps penalties, seed, cache_control, etc. |
| Wiring / Integration<br>apps/gateway/src/lib/hooks.ts | Calls injectModelParameters(ctx.body, model.parameters, ctx.operation) from resolveModelAlias(...) when the resolved model config contains parameters. |
| Formatting / Config<br>apps/gateway/src/gateway.ts | Reformats provider config object literals (fireworks, deepinfra) into multi-line objects with no behavior changes. |
| Tests<br>apps/gateway/src/lib/hooks.test.ts | Adds a comprehensive injectModelParameters test suite: default injection only when missing, operation-specific token-field remapping, reasoning → thinking translation for messages, preservation of client-supplied values, penalties/seed/cache_control injection, and no-op when params are empty. |
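
The messages-branch reasoning → thinking translation is the least obvious of these mappings. Below is a rough sketch of the idea, assuming the internal reasoning shape (enabled, max_tokens, exclude) mentioned in the review comment further down; Anthropic's Messages thinking parameter takes a type and a budget_tokens count, but the mapping details and the fallback budget here are assumptions, not the PR's exact code:

```ts
// Illustrative translation from internal reasoning defaults to Anthropic's
// Messages `thinking` parameter; display/summary/exclude handling is omitted.
interface ReasoningDefaults {
  enabled?: boolean;
  max_tokens?: number; // budget for reasoning tokens
}

function toThinking(reasoning: ReasoningDefaults) {
  if (reasoning.enabled === false) {
    return { type: "disabled" as const };
  }
  return {
    type: "enabled" as const,
    budget_tokens: reasoning.max_tokens ?? 1024, // Messages requires an explicit budget
  };
}
```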

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Gateway
    participant ModelCatalog as Catalog
    participant ModelConfig as ModelConfig

    Client->>Gateway: send request (body, operation)
    Gateway->>Catalog: resolve model alias
    Catalog-->>Gateway: model config (may include parameters)
    Gateway->>ModelConfig: validate/parse ModelParameters
    ModelConfig-->>Gateway: ModelParameters
    Gateway->>Gateway: injectModelParameters(body, params, operation)
    Note over Gateway: translates fields per operation\n(messages→thinking/max_tokens, chat→max_completion_tokens, responses→max_output_tokens)
    Gateway-->>Client: forward enhanced request
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

  • #454: Phase 0: Default inference parameters — backend — Implements the Phase 0 backend scope (schema + injectModelParameters + per-API translation + tests).
  • #425: Epic for default inference parameters — This PR implements backend/catalog wiring that matches the epic requirements.
  • #422: Adds ModelParameters schema and gateway injection — Matches the feature described in that issue.

Poem

🐰 Hop, nibble, code and play,
Defaults tucked in fields today.
Parameters sprout where requests roam,
Request wins, but defaults guide home.
Tiny rabbit cheers: "Merged and bright!" 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Phase 0: default inference parameters — backend' accurately reflects the main change: adding backend support for default inference parameters on model configurations. |
| Linked Issues check | ✅ Passed | The PR implements all core objectives from #454: ModelParametersSchema with required fields, injectModelParameters with ??= semantics, per-API-surface translations (Chat Completions/Messages/Responses), and comprehensive unit test coverage. |
| Out of Scope Changes check | ✅ Passed | All changes align with the #454 scope. The PR adds schema definitions, injection/translation logic in hooks, and extensive test coverage. No console UI (Phase 1) or observability/telemetry (Phase 2) changes are included. |





@turisanapo turisanapo changed the title support default inference parameters on model configuration Phase 0: default inference parameters — backend May 1, 2026
@turisanapo turisanapo marked this pull request as ready for review May 1, 2026 03:03
@turisanapo
Contributor Author

@CodeRabbit please review

@coderabbitai
Contributor

coderabbitai Bot commented May 1, 2026


@turisanapo Sure! I'll start reviewing the changes in PR #441 right away.

[review]

@turisanapo turisanapo requested a review from heiwen May 1, 2026 03:17
@heiwen
Contributor

heiwen commented May 1, 2026

I am a bit confused by the implemented approach. What's the purpose of tying defaults to a configured provider that cannot be changed at runtime? This should live on model aliases instead, shouldn't it?

(minor: I don't see a reason to include the stop parameter)

@turisanapo
Contributor Author

> I am a bit confused by the implemented approach. What's the purpose of tying defaults to a configured provider that cannot be changed at runtime? This should live on model aliases instead, shouldn't it?
>
> (minor: I don't see a reason to include the stop parameter)

Defaults are introduced at two levels:

1/ Model config: configurable per alias, resolved at runtime. These are "user defaults" and will be editable from the console in a different PR (Phase 1 from the issue design)

2/ Catalog model default: system-level. These provide out-of-the-box defaults for selected models.

The resolution chain is:

  request body  →  model config (alias)  →  catalog defaults
    (highest)        (user-configured)        (system)

Where the client always wins.
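
To make the two levels concrete, here is a hypothetical branch entry; only the models[].parameters shape comes from this PR's schema, and the surrounding structure and values are invented for illustration:

```ts
const branch = {
  models: [
    {
      // ...alias/provider fields as defined elsewhere in ModelConfigSchema...
      parameters: {
        temperature: 0.3,             // user default, editable in the console in Phase 1
        max_tokens: 1024,
        reasoning: { effort: "low" },
      },
    },
  ],
};
// A request that already sets temperature keeps its value; anything still unset
// after this level falls through to the catalog default for the model (if any).
```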

In this PR, I'm introducing the ModelParametersSchema, the injection/translation logic across all three API surfaces, and the resolution chain. The console UI for editing model config parameters comes in Phase 1.

Do we agree on having system-level defaults? If not, we can drop them and only support user-configurable defaults.

@heiwen
Contributor

heiwen commented May 1, 2026

I don't see a requirement for catalogue / system-level defaults right now. On the contrary, they can be quite confusing, since consumers will expect defaults similar to those of other gateways / the upstream providers themselves. Correct me if you have a clear customer requirement here.

One more setting to add is caching.

@heiwen
Contributor

heiwen commented May 1, 2026

Plus: we should validate whether preset parameters can be overridden by request parameters in OpenRouter.

@turisanapo
Contributor Author

> Plus: we should validate whether preset parameters can be overridden by request parameters in OpenRouter.

This has already been explored. In OpenRouter, the client wins.

We have investigated other providers too; see the parent issue (#425, "Industry context" section) for more details.

@heiwen
Copy link
Copy Markdown
Contributor

heiwen commented May 1, 2026

Has the OpenRouter behavior been validated or is it just theoretical research?

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/gateway/src/lib/hooks.ts`:
- Around line 62-70: The code is directly passing internal params.reasoning into
body and injecting both body.reasoning and body.reasoning_effort inconsistently;
update the translation so providers only receive fields they expect: when
operation === "chat" do not set body.reasoning at all and set
body.reasoning_effort ??= params.reasoning?.effort (only if client didn't
already provide reasoning_effort), and in the non-messages/non-chat (responses)
branch construct body.reasoning only from allowed keys (e.g., { effort:
params.reasoning?.effort, summary: /* map/derive if applicable */ } but do not
copy internal-only fields like enabled/max_tokens/exclude); ensure you
nullish-coalesce to avoid overwriting existing body.reasoning or
body.reasoning.effort and mirror how the messages branch translates shapes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 53a4d276-3a0e-4ee7-b7ed-269137a9a30a

📥 Commits

Reviewing files that changed from the base of the PR and between ccd21eb and 4932c6c.

📒 Files selected for processing (4)
  • apps/api/src/modules/providers/types.ts
  • apps/gateway/src/gateway.ts
  • apps/gateway/src/lib/hooks.test.ts
  • apps/gateway/src/lib/hooks.ts
✅ Files skipped from review due to trivial changes (2)
  • apps/gateway/src/gateway.ts
  • apps/gateway/src/lib/hooks.test.ts

Comment on lines +62 to +70

```ts
  } else {
    if (operation === "chat") {
      body.max_completion_tokens ??= params.max_tokens;
    } else {
      body.max_output_tokens ??= params.max_tokens;
    }
    body.reasoning ??= params.reasoning;
    body.reasoning_effort ??= params.reasoning?.effort;
  }
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

OpenAI Responses API reasoning parameter accepted fields 2025 — does it allow unknown fields like "enabled" or "max_tokens" in the reasoning object?

💡 Result:

No, the OpenAI Responses API reasoning parameter does not allow unknown fields like "enabled" or "max_tokens". The accepted fields in the reasoning object are "effort" (e.g., "low", "medium") and "summary" (e.g., "auto", "detailed"). Including unknown fields such as "enabled" or "max_tokens" will result in a 400 error with "Unknown parameter" message, as the API strictly validates parameters and rejects unrecognized ones in objects like reasoning. "max_tokens" is a top-level parameter on the responses.create call, not part of reasoning. This behavior is consistent across OpenAI APIs, with evidence from official docs showing only effort and summary used, and multiple error examples for unknown params.

Pass internal reasoning through translation layer matching each provider's schema.

The code at lines 62-70 injects params.reasoning (hebo's internal format with enabled, max_tokens, exclude fields) directly into the request body without translating to provider formats:

  • Responses API: Strictly rejects unknown fields in reasoning with 400 errors; only effort and summary are valid
  • Chat Completions: Has no top-level reasoning object; only reasoning_effort (string) is accepted

Additionally, line 67 creates a conflict when a client supplies body.reasoning_effort but not body.reasoning—line 68 still injects params.reasoning.effort, causing body.reasoning.effort and body.reasoning_effort to diverge.

Translate the internal shape into each provider's expected format (mirroring the messages branch handling), and guard injection of body.reasoning against operation === "chat" where the field is invalid.

Suggested translation for the non-messages branch

```diff
  } else {
    if (operation === "chat") {
      body.max_completion_tokens ??= params.max_tokens;
    } else {
      body.max_output_tokens ??= params.max_tokens;
    }
-   body.reasoning ??= params.reasoning;
-   body.reasoning_effort ??= params.reasoning?.effort;
+   body.reasoning_effort ??= params.reasoning?.effort;
+   // For operations that accept a `reasoning` object (e.g. "responses"),
+   // translate the internal shape into the provider-expected { effort, summary } form.
+   if (operation !== "chat" && params.reasoning && !body.reasoning) {
+     const reasoning: Record<string, unknown> = {};
+     if (params.reasoning.effort) reasoning.effort = params.reasoning.effort;
+     if (params.reasoning.summary && params.reasoning.summary !== "none") {
+       reasoning.summary = params.reasoning.summary;
+     }
+     if (Object.keys(reasoning).length > 0) body.reasoning = reasoning;
+   }
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
-  } else {
-    if (operation === "chat") {
-      body.max_completion_tokens ??= params.max_tokens;
-    } else {
-      body.max_output_tokens ??= params.max_tokens;
-    }
-    body.reasoning ??= params.reasoning;
-    body.reasoning_effort ??= params.reasoning?.effort;
-  }
+  } else {
+    if (operation === "chat") {
+      body.max_completion_tokens ??= params.max_tokens;
+    } else {
+      body.max_output_tokens ??= params.max_tokens;
+    }
+    body.reasoning_effort ??= params.reasoning?.effort;
+    // For operations that accept a `reasoning` object (e.g. "responses"),
+    // translate the internal shape into the provider-expected { effort, summary } form.
+    if (operation !== "chat" && params.reasoning && !body.reasoning) {
+      const reasoning: Record<string, unknown> = {};
+      if (params.reasoning.effort) reasoning.effort = params.reasoning.effort;
+      if (params.reasoning.summary && params.reasoning.summary !== "none") {
+        reasoning.summary = params.reasoning.summary;
+      }
+      if (Object.keys(reasoning).length > 0) body.reasoning = reasoning;
+    }
+  }
```

