WIP: Stream reasoning tokens for OpenAI reasoning models #2778
onmete wants to merge 2 commits into openshift:main
Description
Draft/WIP reference implementation for streaming reasoning tokens (chain-of-thought) from OpenAI reasoning models (GPT-5, o-series). This enables the UI to show "thinking" progress and preserves reasoning context between tool-calling rounds.
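To make the round structure concrete, here is a minimal sketch of a tool-calling loop that keeps reasoning in context between rounds and terminates explicitly when a round produces no tool calls. All names are hypothetical illustrations, not the actual OLS code.

```python
# Hypothetical sketch of a tool-calling loop: each round's full response
# (reasoning + text + tool calls) is appended back into the context, and the
# loop breaks explicitly when no tool calls remain.
def run_tool_calling_loop(invoke_model, execute_tool, messages, max_rounds=5):
    """Run model rounds, executing tool calls until none remain."""
    for _ in range(max_rounds):
        response = invoke_model(messages)      # dict with "content"/"tool_calls"
        messages.append(response)              # keep reasoning in context
        tool_calls = response.get("tool_calls") or []
        if not tool_calls:                     # explicit termination check,
            break                              # not finish_reason-based
        for call in tool_calls:
            messages.append({"role": "tool", "content": execute_tool(call)})
    return messages
```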
Key architectural decisions
- **Responses API required** — The default Chat Completions API does not expose reasoning tokens for GPT-5 (they are processed server-side and invisible). Switching to the Responses API (`output_version="responses/v1"`) with `reasoning={"effort": "low", "summary": "auto"}` makes reasoning summaries available as content blocks.
- **Reasoning must be passed between tool-calling rounds** — Per OpenAI's guidance, reasoning items should be kept in context between rounds within a single request. Without this, the model re-reasons from scratch each round, producing repetitive, verbose output. The fix accumulates all `AIMessageChunk`s per round and builds the inter-round `AIMessage` with full content (reasoning + text + tool calls) instead of `content=""`.
- **Reasoning is ephemeral — not cached across requests** — Reasoning context is only relevant within a single request's tool-calling loop. It is NOT stored in the conversation cache between separate question/answer pairs (the cache only stores the final text response).
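The per-round accumulation can be sketched with a toy stand-in for `AIMessageChunk` (langchain chunks merge with `+` in a similar spirit; this is a simplified, hypothetical model, not the PR's code):

```python
# Toy stand-in for streamed message chunks: merging preserves reasoning
# blocks, text blocks, and tool calls, so the inter-round message carries
# full content instead of content="".
from dataclasses import dataclass, field

@dataclass
class Chunk:
    content: list = field(default_factory=list)     # reasoning/text blocks
    tool_calls: list = field(default_factory=list)

    def __add__(self, other):
        return Chunk(self.content + other.content,
                     self.tool_calls + other.tool_calls)

def accumulate(chunks):
    """Merge streamed chunks into one message for the next round's context."""
    merged = None
    for c in chunks:
        merged = c if merged is None else merged + c
    return merged
```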
- **New `StreamedChunk(type="reasoning")` and SSE `event: reasoning`** — Reasoning summaries are yielded as a new chunk type, streamed as `event: reasoning` in JSON mode. In `text/plain` mode, reasoning text is output directly.
- **Loop termination fix for Responses API** — The Chat Completions API uses `finish_reason="stop"` for text-only completions and `finish_reason="tool_calls"` for tool calls. The Responses API uses `chunk_position="last"` for ALL completions indiscriminately, so it cannot be used for early stop detection. Instead, the tool-calling loop now explicitly breaks when no tool calls are present after a round.
- **Token counter resilience** — `GenericTokenCounter.on_llm_new_token` now handles non-string tokens (the Responses API can send structured content objects).

Open items / not yet done
- Tune `reasoning.effort` and `verbosity` levels — `"low"` may be too terse, `"medium"` too verbose
- Decide whether `summary: "auto"`, `"concise"`, or `"detailed"` is optimal
- Evaluate `use_previous_response_id=True` as an alternative to manual message accumulation

Type of change
Related Tickets & Documents
Checklist before requesting a review
Testing
Manually tested with GPT-5 via `curl` against a local OLS instance:

- `text/plain` mode outputs reasoning text directly
- `application/json` mode produces `event: reasoning` SSE events

Made with Cursor
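For illustration, a minimal client-side parser for such an SSE stream (a hypothetical helper, not part of the PR; only the `event: reasoning` name comes from this change, the `token` event name below is an assumption):

```python
# Hypothetical parser for SSE frames of the form
# "event: reasoning\ndata: {...}\n\n", yielding (event, data) pairs.
import json

def parse_sse(raw: str):
    """Yield (event, data) tuples from a raw SSE payload."""
    for frame in raw.strip().split("\n\n"):
        event, data = "message", None
        for line in frame.splitlines():
            if line.startswith("event: "):
                event = line[len("event: "):]
            elif line.startswith("data: "):
                data = json.loads(line[len("data: "):])
        yield event, data
```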