feat(backend): surface self-hosted reasoning via @ai-sdk/openai-compatible by MesoX · Pull Request #262 · willdady/platypus

MesoX · 2026-06-15T19:46:39Z

Summary

Chat-completions mode (apiMode=chat) on an OpenAI provider is, in practice, only used against self-hosted / company OpenAI-compatible servers (vLLM, SGLang, llama.cpp, TGI, LM Studio) — real OpenAI is driven via the Responses API. Those servers expose the model's thinking in a reasoning_content field, but @ai-sdk/openai's chat model only parses role/content/tool_calls and silently drops it. So the existing reasoning UI never received anything from them.

This routes chat-mode language models through @ai-sdk/openai-compatible, which reads reasoning_content (and reasoning) natively and emits reasoning stream parts — rendered by the existing collapsible "Thinking…" block. No frontend changes.
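To make the difference concrete, here is a minimal, self-contained sketch of the two parsing behaviors. The delta shape follows the chat-completions streaming format plus the `reasoning_content` / `reasoning` fields described above. The names `ChatDelta`, `StreamPart`, `parseContentOnly`, `parseWithReasoning`, and `collect` are hypothetical illustrations and are not the actual internals of either SDK package.

```typescript
// Hypothetical shape of one streamed chat-completions delta from a
// self-hosted server. `reasoning_content` / `reasoning` are the extra
// fields mentioned above; the rest is the usual delta shape.
interface ChatDelta {
  role?: string;
  content?: string | null;
  reasoning_content?: string | null;
  reasoning?: string | null;
}

// Illustrative stream parts: visible text vs. model thinking.
type StreamPart =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string };

// Mimics a parser that only looks at `content`: reasoning is dropped.
function parseContentOnly(delta: ChatDelta): StreamPart[] {
  return delta.content ? [{ type: "text", text: delta.content }] : [];
}

// Mimics a parser that also reads `reasoning_content` (falling back to
// `reasoning`) and emits it as a separate reasoning part.
function parseWithReasoning(delta: ChatDelta): StreamPart[] {
  const parts: StreamPart[] = [];
  const thinking = delta.reasoning_content ?? delta.reasoning;
  if (thinking) parts.push({ type: "reasoning", text: thinking });
  if (delta.content) parts.push({ type: "text", text: delta.content });
  return parts;
}

// Runs one parser over every delta and gathers the emitted parts.
function collect(
  deltas: ChatDelta[],
  parse: (delta: ChatDelta) => StreamPart[],
): StreamPart[] {
  const out: StreamPart[] = [];
  deltas.forEach((delta) => {
    parse(delta).forEach((part) => out.push(part));
  });
  return out;
}

// Made-up sample stream: two thinking chunks, then the visible answer.
const sampleDeltas: ChatDelta[] = [
  { role: "assistant", reasoning_content: "The user wants a sum. " },
  { reasoning_content: "2 + 2 = 4." },
  { content: "The answer is 4." },
];

const dropped = collect(sampleDeltas, parseContentOnly);
const surfaced = collect(sampleDeltas, parseWithReasoning);

console.log(dropped.length, surfaced.length); // prints: 1 3
```

With the content-only parser the two thinking chunks never reach the UI; with the reasoning-aware parser they arrive as separate `reasoning` parts ahead of the text.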

Changes

Add @ai-sdk/openai-compatible dependency (depends on the same @ai-sdk/provider@3 as the other adapters).
provider.ts: for OpenAI providers with apiMode==="chat", build the language model via createOpenAICompatible(...). Embeddings and native search tools stay on the OpenAI SDK. Responses mode is unchanged.
Set includeUsage: true so streamed turns carry stream_options.include_usage — without it self-hosted servers return no token usage on the streaming path (shows as In:0 / Out:0).
Tests covering chat→compatible routing and responses→OpenAI dispatch.
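The dispatch described in the list above can be sketched as a pure decision function. This is a simplified illustration under stated assumptions, not the actual `provider.ts`: `OpenAIProviderConfig`, `ModelKind`, `Adapter`, and `pickAdapter` are hypothetical names, and only the `apiMode` values `chat` and `responses` come from the PR description.

```typescript
// The two API modes named in the PR description.
type ApiMode = "chat" | "responses";

// Hypothetical labels for what is being built from the provider.
type ModelKind = "language" | "embedding" | "search-tool";

// Which SDK package would construct the model or tool.
type Adapter = "openai" | "openai-compatible";

// Hypothetical, trimmed-down config for an OpenAI-type provider.
interface OpenAIProviderConfig {
  apiMode: ApiMode;
}

// Only chat-mode language models go through the compatible adapter.
// Embeddings, native search tools, and everything in responses mode
// stay on the OpenAI SDK.
function pickAdapter(config: OpenAIProviderConfig, kind: ModelKind): Adapter {
  const isChatLanguageModel = config.apiMode === "chat" && kind === "language";
  return isChatLanguageModel ? "openai-compatible" : "openai";
}

console.log(pickAdapter({ apiMode: "chat" }, "language")); // prints: openai-compatible
console.log(pickAdapter({ apiMode: "chat" }, "embedding")); // prints: openai
console.log(pickAdapter({ apiMode: "responses" }, "language")); // prints: openai
```

In the change itself, the `openai-compatible` branch is where `createOpenAICompatible(...)` is called with `includeUsage: true`; the sketch only captures the routing decision that the new tests exercise.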

Behavior notes

Real OpenAI in chat mode is unaffected — it returns no reasoning_content, so no reasoning parts are emitted.
supportsStructuredOutputs is left at its default (false): a requested JSON schema downgrades to json_object mode (the AI SDK still validates client-side), which is safe across servers that don't support json_schema response_format. This only affects the title/tag generation call.
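The structured-output downgrade can be pictured with a short sketch. It assumes the standard chat-completions `response_format` shapes (`json_schema` and `json_object`). The helper names `buildResponseFormat` and `hasStringTitle` are hypothetical, and the hand-written check merely stands in for the AI SDK's own client-side validation.

```typescript
type JsonSchema = Record<string, unknown>;

// The two chat-completions response_format variants relevant here.
type ResponseFormat =
  | { type: "json_object" }
  | { type: "json_schema"; json_schema: { name: string; schema: JsonSchema } };

// With supportsStructuredOutputs=false the schema is not sent to the
// server; only generic JSON mode is requested.
function buildResponseFormat(
  name: string,
  schema: JsonSchema,
  supportsStructuredOutputs: boolean,
): ResponseFormat {
  return supportsStructuredOutputs
    ? { type: "json_schema", json_schema: { name, schema } }
    : { type: "json_object" };
}

// Stand-in for client-side validation: the server was only asked for
// "some JSON", so the caller still checks the shape it needs.
function hasStringTitle(raw: string): boolean {
  try {
    const parsed: unknown = JSON.parse(raw);
    return (
      typeof parsed === "object" &&
      parsed !== null &&
      typeof (parsed as { title?: unknown }).title === "string"
    );
  } catch {
    return false;
  }
}

// Made-up schema for a title-generation call.
const titleSchema: JsonSchema = {
  type: "object",
  properties: { title: { type: "string" } },
  required: ["title"],
};

console.log(buildResponseFormat("title", titleSchema, false).type); // prints: json_object
console.log(hasStringTitle('{"title":"Weekly sync notes"}')); // prints: true
```

The trade-off is that the server no longer enforces the schema, so a malformed reply is caught only after it arrives.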

Testing

typecheck, lint, and the provider unit tests pass.
Verified end-to-end against a vLLM server (Qwen3, reasoning parser enabled): reasoning streams into the collapsible block and token usage populates.

🤖 Generated with Claude Code

Chat-completions mode (apiMode=chat) is used in practice only against self-hosted/company OpenAI-compatible servers (vLLM, SGLang, llama.cpp, TGI). Those expose model thinking in a `reasoning_content` field that `@ai-sdk/openai`'s chat model silently drops, so the existing reasoning UI never received it.

Route chat-mode language models through `@ai-sdk/openai-compatible`, which reads `reasoning_content` natively and emits reasoning stream parts. Embeddings and native search tools stay on the OpenAI SDK. Responses mode is unchanged.

Real OpenAI in chat mode is unaffected (it returns no `reasoning_content`). Leaves `supportsStructuredOutputs` at default false: a requested JSON schema downgrades to json_object mode (AI SDK still validates client-side) to stay safe across servers lacking json_schema support.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`@ai-sdk/openai-compatible` only sends `stream_options.include_usage` when `includeUsage` is set, so self-hosted servers (vLLM, SGLang, …) returned no token usage on the streaming path, surfacing as In:0/Out:0 in the UI. Non-streaming (e.g. compaction summarize) was unaffected.

Set `includeUsage: true` so streamed turns carry token counts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
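As a rough illustration of what the flag changes on the wire, the sketch below builds a chat-completions request body and attaches `stream_options.include_usage` only when streaming with the flag enabled. The builder `buildChatRequestBody`, its option names, and the model id `qwen3` are hypothetical; only the `stream_options.include_usage` field is taken from the commit message above.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Subset of a chat-completions request body relevant to usage reporting.
interface ChatRequestBody {
  model: string;
  messages: ChatMessage[];
  stream: boolean;
  stream_options?: { include_usage: boolean };
}

// Hypothetical builder: stream_options is attached only when the request
// streams and the caller opted in, mirroring the behavior described above.
function buildChatRequestBody(
  model: string,
  messages: ChatMessage[],
  opts: { stream: boolean; includeUsage: boolean },
): ChatRequestBody {
  const body: ChatRequestBody = { model, messages, stream: opts.stream };
  if (opts.stream && opts.includeUsage) {
    body.stream_options = { include_usage: true };
  }
  return body;
}

const messages: ChatMessage[] = [{ role: "user", content: "Hello" }];

// "qwen3" is a placeholder model id, not taken from the PR.
const withUsage = buildChatRequestBody("qwen3", messages, {
  stream: true,
  includeUsage: true,
});
const withoutUsage = buildChatRequestBody("qwen3", messages, {
  stream: true,
  includeUsage: false,
});

console.log(JSON.stringify(withUsage.stream_options)); // prints: {"include_usage":true}
console.log(withoutUsage.stream_options === undefined); // prints: true
```

Without the field, a server that follows the chat-completions streaming contract has no reason to send a usage chunk, which matches the In:0/Out:0 symptom.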

frantisek.spacek@morosystems.cz and others added 2 commits June 15, 2026 21:18

MesoX wants to merge 2 commits into willdady:main from MesoX:feature/vllm-reasoning
