Fix async retry losing temperature→1.0 on LLMNoResponseError by malhotra5 · Pull Request #3356 · OpenHands/software-agent-sdk

malhotra5 · 2026-05-21T20:24:00Z

A human has tested these changes.

Why

The before_sleep callback bumps temperature from 0→1.0 on LLMNoResponseError by mutating retry_state.kwargs. In the sync path, tenacity's @retry decorator wraps _one_attempt and automatically threads retry_state.kwargs into each retry call — so the temperature override is picked up via final_kwargs = {**call_kwargs, **retry_kwargs}.

The async methods (acompletion, aresponses) instead used an async for attempt in AsyncRetrying loop. This pattern has no wrapped function for tenacity to re-invoke with updated kwargs. before_sleep wrote temperature=1.0 to retry_state.kwargs, but the loop body used a local call_kwargs that was never updated — so the bump was silently lost.

With temperature=0 the model tends to repeat the same empty response, so all retries fail identically.

Repro (temperatures per attempt): sync [0, 1.0, 1.0], async [0, 0, 0] (before fix).

Summary

Replace the AsyncRetrying loop in acompletion and aresponses with the same @retry decorator pattern used by the sync methods — tenacity's decorator natively handles async def functions and automatically threads retry_state.kwargs into each retry
Deduplicate shared logic across completion/acompletion and responses/aresponses:
- _current_metrics_snapshot() — replaces 4 identical MetricsSnapshot constructions
- _prepare_completion_params() — shared setup (format messages, tools, mock tools, select_chat_options, telemetry ctx)
- _prepare_responses_params() — shared setup for responses path
- _validate_chat_response() — shared post-processing (mock tools, telemetry, empty-choices check)
Remove the now-unused async_retry() method and AsyncRetrying import from RetryMixin
Add 3 async regression tests

How to Test

uv run pytest tests/sdk/llm/ -v

All 742 LLM tests pass (including 3 new async regression tests):

test_async_no_response_retry_bumps_temperature PASSED
test_async_no_response_retries_then_succeeds PASSED
test_async_no_response_exhausts_retries PASSED

The key regression test (test_async_no_response_retry_bumps_temperature) asserts that on the second attempt, litellm_acompletion is called with temperature=1.0.

Type

Notes

RetryMixin._build_before_sleep and retry_decorator are unchanged
This PR was created by an AI agent (OpenHands) on behalf of the user.

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22-slim`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:de3387c-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-de3387c-python \
  ghcr.io/openhands/agent-server:de3387c-python

All tags pushed for this build

ghcr.io/openhands/agent-server:de3387c-golang-amd64
ghcr.io/openhands/agent-server:de3387c0d81d0e48124deaec084a5777771ed9c6-golang-amd64
ghcr.io/openhands/agent-server:fix-async-retry-temperature-bump-golang-amd64
ghcr.io/openhands/agent-server:de3387c-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:de3387c-golang-arm64
ghcr.io/openhands/agent-server:de3387c0d81d0e48124deaec084a5777771ed9c6-golang-arm64
ghcr.io/openhands/agent-server:fix-async-retry-temperature-bump-golang-arm64
ghcr.io/openhands/agent-server:de3387c-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:de3387c-java-amd64
ghcr.io/openhands/agent-server:de3387c0d81d0e48124deaec084a5777771ed9c6-java-amd64
ghcr.io/openhands/agent-server:fix-async-retry-temperature-bump-java-amd64
ghcr.io/openhands/agent-server:de3387c-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:de3387c-java-arm64
ghcr.io/openhands/agent-server:de3387c0d81d0e48124deaec084a5777771ed9c6-java-arm64
ghcr.io/openhands/agent-server:fix-async-retry-temperature-bump-java-arm64
ghcr.io/openhands/agent-server:de3387c-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:de3387c-python-amd64
ghcr.io/openhands/agent-server:de3387c0d81d0e48124deaec084a5777771ed9c6-python-amd64
ghcr.io/openhands/agent-server:fix-async-retry-temperature-bump-python-amd64
ghcr.io/openhands/agent-server:de3387c-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:de3387c-python-arm64
ghcr.io/openhands/agent-server:de3387c0d81d0e48124deaec084a5777771ed9c6-python-arm64
ghcr.io/openhands/agent-server:fix-async-retry-temperature-bump-python-arm64
ghcr.io/openhands/agent-server:de3387c-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:de3387c-golang
ghcr.io/openhands/agent-server:de3387c0d81d0e48124deaec084a5777771ed9c6-golang
ghcr.io/openhands/agent-server:fix-async-retry-temperature-bump-golang
ghcr.io/openhands/agent-server:de3387c-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:de3387c-java
ghcr.io/openhands/agent-server:de3387c0d81d0e48124deaec084a5777771ed9c6-java
ghcr.io/openhands/agent-server:fix-async-retry-temperature-bump-java
ghcr.io/openhands/agent-server:de3387c-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:de3387c-python
ghcr.io/openhands/agent-server:de3387c0d81d0e48124deaec084a5777771ed9c6-python
ghcr.io/openhands/agent-server:fix-async-retry-temperature-bump-python
ghcr.io/openhands/agent-server:de3387c-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

Each variant tag (e.g., de3387c-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., de3387c-python-amd64) are also available if needed

github-actions · 2026-05-21T20:24:29Z

Python API breakage checks — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-05-21T20:24:37Z

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-05-21T20:26:46Z

Coverage Report •

File	Stmts	Miss	Cover	Missing
openhands-sdk/openhands/sdk/llm
llm.py	688	110	84%	491, 515, 548, 740–741, 749–750, 842–843, 846–850, 852, 860–862, 866, 883–884, 888, 890–891, 893–895, 973, 1036, 1204–1206, 1291, 1332, 1344–1346, 1349–1352, 1358, 1401, 1444, 1457–1459, 1462–1465, 1471, 1627, 1641–1646, 1762–1763, 1995–1996, 2005, 2011, 2016, 2056, 2058–2063, 2065–2082, 2085–2089, 2091–2092, 2098–2107, 2164, 2166
openhands-sdk/openhands/sdk/llm/utils
retry_mixin.py	55	11	80%	44, 47, 61, 100, 104, 108–109, 119, 124–125, 130
TOTAL	27824	8186	70%

@Retry

The async methods (acompletion, aresponses) used an AsyncRetrying 'async for' loop that has no wrapped function for tenacity to re-invoke with updated kwargs. The before_sleep callback wrote temperature=1.0 to retry_state.kwargs, but the loop body used a local call_kwargs that was never updated. Replace the AsyncRetrying loop with the same @Retry decorator pattern used by the sync methods. Tenacity's decorator handles async functions natively and automatically threads retry_state.kwargs into the wrapped function, so the temperature bump works identically for both sync and async paths. Also deduplicate shared logic across completion/acompletion and responses/aresponses: - _current_metrics_snapshot(): replaces 4 identical MetricsSnapshot constructions - _prepare_completion_params(): shared setup (format messages, tools, mock tools, select_chat_options, telemetry ctx) - _prepare_responses_params(): shared setup for responses path - _validate_chat_response(): shared post-processing (mock tools, telemetry, empty-choices check) Removes the now-unused async_retry() method and AsyncRetrying import from RetryMixin. Co-authored-by: openhands <openhands@all-hands.dev>

all-hands-bot

Code Review

The root-cause analysis in the PR description is spot-on: tenacity's AsyncRetrying loop manages the attempt context-manager but does not re-invoke the user's coroutine on each retry — so any retry_state.kwargs mutations written by before_sleep (the temperature bump) were silently dropped. Switching to @retry_decorator, which natively wraps async def functions and threads retry_state.kwargs into each call, is exactly the right fix.

The refactoring into shared helpers (_current_metrics_snapshot, _prepare_completion_params, _prepare_responses_params, _validate_chat_response) eliminates 4× duplication cleanly and the docstrings are accurate. The three new async regression tests directly validate the fixed behaviour.

Two items are worth a follow-up:

_prepare_completion_params mutates its kwargs argument in-place (inline comment below).
The aresponses async retry path received the same structural fix but has no new regression tests (inline comment below).

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation.

all-hands-bot

🟡 QA Report: PARTIAL

I reproduced the base async completion retry bug and verified this PR fixes the observable LLM.acompletion() temperature retry behavior; one adjacent async Responses path could not be independently confirmed.

Does this PR achieve its stated goal?

Yes for the directly reproducible completion-path bug. On origin/main, the same SDK call sequence produced sync_temperatures=[0.0, 1.0] but async_temperatures=[0.0, 0.0], confirming async retry lost the temperature=1.0 bump after LLMNoResponseError. On the PR branch, the same script produced async_temperatures=[0.0, 1.0] and returned successfully after retry, showing the fix works for the user-facing LLM.acompletion() path. I could not separately prove an aresponses() 0→1 retry transition because the Responses path sent temperature=1.0 on the first transport call in this environment even when configured with temperature=0.0.

Phase	Result
Environment Setup	✅ `make build` completed and installed the uv-managed workspace environment.
CI Status	🟡 GitHub checks at review time: 23 successful, 4 skipped, 5 pending, 0 failing.
Functional Verification	✅ Reproduced base bug and verified PR fix via real SDK `LLM.completion()` / `LLM.acompletion()` calls with provider responses simulated at the LiteLLM boundary.

Functional Verification

Test 1: Async completion retry applies the temperature bump after `LLMNoResponseError`

The verification script imported the SDK, created an LLM(temperature=0.0, num_retries=2), and called the public completion() and acompletion() APIs. To avoid external API credentials and make the empty-response condition deterministic, only the LiteLLM transport boundary was patched to return an empty choices=[] response followed by a successful response; the SDK retry, error handling, and response parsing paths were exercised normally.

Step 1 — Reproduce baseline without the fix:
Ran git checkout --detach origin/main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_completion_retry_clean.py; echo qa_exit=$?:

sync_temperatures=[0.0, 1.0]
async_temperatures=[0.0, 0.0]
expected_async_temperatures=[0.0, 1.0]
result=FAIL
qa_exit=1

This confirms the reported bug: the sync retry picked up temperature=1.0, but async retry resent temperature=0.0 on the second attempt.

Step 2 — Apply the PR's changes:
Checked out fix/async-retry-temperature-bump at 321d7ac9abfb3dc1ba0a4dea9082fcbf83cad768.

Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_completion_retry_clean.py; echo qa_exit=$?:

sync_temperatures=[0.0, 1.0]
async_temperatures=[0.0, 1.0]
expected_async_temperatures=[0.0, 1.0]
result=PASS
qa_exit=0

This shows the async retry now receives the same temperature bump as the sync path, and the public SDK call completes successfully after the retry.

Unable to Verify

I attempted a similar before/after script for LLM.aresponses() by simulating LLMNoResponseError then a successful ResponsesAPIResponse. In this environment, the Responses path sent temperature=1.0 on the first transport call even with both instance-level and call-level temperature=0.0, so there was no observable 0.0 → 1.0 retry delta to compare. Future QA guidance could document a concrete provider/model configuration that naturally sends temperature=0.0 through the Responses API and can produce an LLMNoResponseError retry.

Issues Found

None.

Final verdict: PARTIAL (core completion-path bug verified fixed; adjacent aresponses() temperature transition not independently observable in this environment).

This review was created by an AI agent (OpenHands) on behalf of the user.

all-hands-bot

🟡 Taste Rating: Acceptable — the direction is good, but I found retry-kwargs regressions in the async paths.

Risk: 🟡 MEDIUM because this touches core LLM retry behavior and provider-specific parameter filtering; human maintainer/eval judgment is still recommended before approval.

This review was generated by an AI agent (OpenHands) on behalf of the user.

Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26252814170

- Add defensive kwargs = dict(kwargs) at the top of both _prepare_completion_params and _prepare_responses_params so callers are not affected by internal mutations (e.g. kwargs['tools'] = ...). - Add 3 aresponses regression tests covering temperature bump on retry, retry-then-succeed, and retry exhaustion — parallel to the existing acompletion tests. Co-authored-by: openhands <openhands@all-hands.dev>

Extract shared helpers to eliminate code duplication across completion/acompletion/responses/aresponses: Layer 1 — _make_retry_decorator(): replaces 4 copies of the 7-line retry decorator configuration. Layer 2 — _build_completion_result() / _build_responses_result(): shared result-building logic (response → Message → LLMResponse), replaces 2×5 identical lines per pair. Layer 3 — responses stream processing: - _build_responses_call_kwargs(): shared litellm call kwargs dict - _process_stream_event(): event type matching + delta extraction - _finalize_stream_response(): post-stream validation + output patching Net result: -33 lines, and the sync/async responses _one_attempt bodies now differ only in the transport call (litellm_responses vs await litellm_aresponses) and iteration (for vs async for). Co-authored-by: openhands <openhands@all-hands.dev>

all-hands-bot

🟡 QA Report: PARTIAL

The PR fixes the async chat-completion retry temperature regression in a real SDK call path; one Responses-API-specific 0→1.0 delta could not be reproduced in this environment.

Does this PR achieve its stated goal?

Partially verified. The core reported bug for LLM.acompletion() is fixed: on origin/main, the same public SDK call retried with temperatures [0.0, 0.0], while PR commit a7616581 retried with [0.0, 1.0] and returned an LLMResponse. The sync control path stayed [0.0, 1.0], so the refactor did not break the already-working sync retry behavior in this exercised path. I also exercised LLM.aresponses(), but the simulated Responses path sent [1.0, 1.0] on both base and PR, so I could not use it to prove a before/after 0→1.0 bump for that API.

Phase	Result
Environment Setup	✅ `make build` completed successfully and installed the uv workspace environment.
CI Status	⏳ At review time: 22 checks successful, 2 skipped, 6 still in progress.
Functional Verification	✅ `LLM.acompletion()` regression fixed; sync completion still behaves correctly.

Functional Verification

Test 1: Async chat completion retry applies temperature bump

Step 1 — Reproduce / establish baseline without the fix:
Checked out origin/main (8f406a88) and ran a small script that imports the SDK, constructs LLM(temperature=0.0, num_retries=2), calls await llm.acompletion(...), and drives the real SDK retry path with a deterministic empty first model response followed by a successful response.

Ran OPENHANDS_SUPPRESS_BANNER=1 uv run --no-sync python /tmp/qa_async_retry_temperature.py:

{
  "async_completion_under_test": {
    "temperatures": [0.0, 0.0],
    "text": "ok"
  },
  "sync_completion_control": {
    "temperatures": [0.0, 1.0],
    "text": "ok"
  }
}

This confirms the bug exists on base: async retry reaches the successful second attempt, but it reuses temperature=0.0 instead of applying the retry temperature bump. The sync control shows the expected behavior on the same scenario.

Step 2 — Apply the PR's changes:
Checked out PR commit a7616581e188ad8c17e8d3ba1cf6a41d7c0950b1.

Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run --no-sync python /tmp/qa_async_retry_temperature.py:

{
  "async_completion_under_test": {
    "temperatures": [0.0, 1.0],
    "text": "ok"
  },
  "sync_completion_control": {
    "temperatures": [0.0, 1.0],
    "text": "ok"
  }
}

This shows the PR fixes the core async chat-completion regression: the retry attempt now receives temperature=1.0 and the SDK returns a successful LLMResponse. The sync control remained unchanged.

Test 2: Async Responses API path still retries and returns an SDK response

Using the same before/after script, I also exercised await llm.aresponses(...) with a first-call LLMNoResponseError and a second successful Responses API result.

Base and PR both returned:

{
  "async_responses_under_test": {
    "temperatures": [1.0, 1.0],
    "text": "ok"
  }
}

This verifies the Responses API path still retries and returns an LLMResponse, but this scenario did not reproduce a base-branch [0.0, 0.0] failure pattern for aresponses().

Unable to Verify

I could not verify the exact aresponses() 0→1.0 before/after delta claimed in the PR description because the exercised Responses API path supplied temperature=1.0 on both attempts in both base and PR. I also did not use a live LLM provider because the bug requires deterministic empty/no-response behavior, which is not reliably inducible through a normal real-provider user request; instead I used a deterministic fake LiteLLM transport while exercising the real public SDK LLM methods and retry logic.

Suggested future QA guidance: add a documented minimal script or local fake-provider recipe in AGENTS.md for reproducing LLM retry edge cases without relying on nondeterministic external provider behavior.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

Final verdict: PARTIAL

all-hands-bot

Code Review

The root-cause diagnosis and fix are both correct. Tenacity's @retry decorator natively handles async def functions and threads retry_state.kwargs into each re-invocation, whereas the AsyncRetrying loop pattern had no wrapped function for tenacity to call with updated kwargs — so before_sleep's temperature=1.0 mutation was silently lost on every async retry. Switching acompletion and aresponses to @_make_retry_decorator() aligns them with the sync path and fixes the bug cleanly.

The refactoring is solid: the extracted helpers (_prepare_completion_params, _prepare_responses_params, _validate_chat_response, _build_completion_result, _build_responses_result, _build_responses_call_kwargs, _process_stream_event, _finalize_stream_response) eliminate substantial duplication while keeping each public method shallow and readable. The defensive kwargs = dict(kwargs) copies in the two prepare helpers (added in the follow-up commit) are correct.

One observation on the new aresponses tests (inline below).

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation.

all-hands-bot

🟡 Taste Rating: Acceptable — the async retry fix is directionally right, but I found one public SDK API removal that needs addressing.

[RISK ASSESSMENT]

[Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM. This touches core LLM retry behavior and the SDK public surface.

VERDICT: ❌ Needs rework: restore or formally deprecate LLM.async_retry before removal.

This review was generated by an AI agent (OpenHands) on behalf of the user.

Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26254832519

Test that when a streaming response completes without a ResponseCompletedEvent (i.e. _finalize_stream_response raises LLMNoResponseError), the retry mechanism still bumps temperature from 0→1.0. Uses a minimal _FakeAsyncStreamIterator that inherits from ResponsesAPIStreamingIterator to pass isinstance checks. Co-authored-by: openhands <openhands@all-hands.dev>

malhotra5 · 2026-05-22T11:51:06Z

@OpenHands please address the unresolved review comments, and note that we should be retaining the original code comments and logs wherever the original code has moved

openhands-ai · 2026-05-22T11:51:14Z

@malhotra5 your session has expired. Please login again at OpenHands Cloud and try again.

…ed deltas, add sync tests - _validate_chat_response returns just resp (raw_resp is consumed internally by Telemetry.on_response); both call sites updated - Restore the mock-tool logger.debug and the informative comments lost when code moved into _prepare_completion_params / _prepare_responses_params - _process_stream_event gains emit_deltas so the ModelResponseStream chunk is not built when there is no stream callback to receive it - Add sync responses retry + sync streaming retry regression tests

all-hands-bot

🟡 QA Report: PARTIAL

Verified the async chat-completion retry bug before/after: the PR changes acompletion() from retrying at [0.0, 0.0] to retrying at [0.0, 1.0]; the Responses temperature delta was not independently reproducible in this environment.

Does this PR achieve its stated goal?

Partially verified. Yes for the reproducible LLM.acompletion() path: after a simulated empty-choices provider response, the retry warning fired and the second LiteLLM call received temperature=1.0 on the PR, while origin/main retried with temperature=0.0. I also exercised LLM.aresponses() retry/recovery, but in this black-box probe the Responses path sent temperature=1.0 on both base and PR before retry, so I could not establish the exact before/after temperature-bump delta for that path.

Phase	Result
Environment Setup	✅ `make build` completed and installed the uv-managed dev environment
CI Status	🟡 Most checks observed passing; image/binary plus review/QA automation checks were still pending when checked
Functional Verification	🟡 `acompletion()` fix verified before/after; `aresponses()` retry executed but temperature-specific delta was inconclusive

Functional Verification

Test 1: Async chat completion retries with bumped temperature

I used a short SDK script that calls the public LLM.acompletion() method with real SDK message objects. To deterministically trigger the user-facing bug without external provider credentials, the script temporarily replaced the LiteLLM async transport with a provider stand-in that returns an empty ModelResponse(choices=[]) first, then a valid assistant message, while recording the actual kwargs passed into the transport.

Step 1 — Reproduce / establish baseline without the fix:

Ran:

git checkout --detach origin/main
OPENHANDS_SUPPRESS_BANNER=1 uv run python -u /tmp/qa_async_retry_probe.py > /tmp/qa_probe_main.out 2>&1

Relevant output:

LLMNoResponseError with temperature=0, setting temperature to 1.0 for next attempt.
acompletion temperatures=[0.0, 0.0]; content=[TextContent(cache_prompt=False, type='text', text='async chat recovered')]

This confirms the bug on the base branch: the retry callback decided to bump temperature, but the second async chat transport call still received 0.0.

Step 2 — Apply the PR's changes:

Checked out the reviewed PR commit:

git checkout --detach 6daf9a2dc0b140d707330e8e85035bfb88d9d799

Step 3 — Re-run with the fix in place:

Ran the same probe:

OPENHANDS_SUPPRESS_BANNER=1 uv run python -u /tmp/qa_async_retry_probe.py > /tmp/qa_probe_pr.out 2>&1

Relevant output:

LLMNoResponseError with temperature=0, setting temperature to 1.0 for next attempt.
acompletion temperatures=[0.0, 1.0]; content=[TextContent(cache_prompt=False, type='text', text='async chat recovered')]

This shows the PR fixes the observed async retry behavior for chat completions: the retry-modified temperature now reaches the second provider call and the public SDK call recovers successfully.

Test 2: Async Responses retry path executes and recovers

The same probe exercised LLM.aresponses() by making the first transport call raise LLMNoResponseError and the second return a valid ResponsesAPIResponse.

Base and PR both recovered successfully, with this recorded temperature sequence:

origin/main: aresponses temperatures=[1.0, 1.0]; content=[TextContent(cache_prompt=False, type='text', text='async responses recovered')]
PR commit:  aresponses temperatures=[1.0, 1.0]; content=[TextContent(cache_prompt=False, type='text', text='async responses recovered')]

This verifies the async Responses retry path still executes and returns a normal LLMResponse on the PR, but it does not prove the specific 0→1.0 retry mutation because this path already normalized/sent 1.0 before retry in the exercised configuration.

Unable to Verify

I could not produce a before/after aresponses() temperature sequence of [0.0, 0.0] on base versus [0.0, 1.0] on the PR. I tried the public LLM.aresponses() path with multiple model names and explicit temperature=0.0, but the transport-level kwargs recorded temperature=1.0 on the first call for both base and PR. Future QA guidance in AGENTS.md could document a concrete Responses API configuration/provider where temperature=0.0 reaches LiteLLM unchanged, if that path needs manual verification.

Issues Found

None from the verified behavior.

This review was created by an AI agent (OpenHands) on behalf of the user.

all-hands-bot

Code Review

The root-cause diagnosis is precise, and the fix is the correct minimal change: tenacity's @retry decorator natively handles async def functions and threads retry_state.kwargs into every retry call, so the temperature bump written by before_sleep is automatically picked up. The old AsyncRetrying loop pattern had no wrapped function for tenacity to re-invoke, so the bump was silently discarded.

The accompanying refactoring (_prepare_*_params, _validate_chat_response, _build_*_result, _process_stream_event, _finalize_stream_response) is well-scoped and eliminates real duplication without introducing indirection. The defensive kwargs = dict(kwargs) copies in both _prepare_completion_params and _prepare_responses_params correctly guard against the kwargs['tools'] = ... mutation being visible to callers. The emit_deltas optimisation in _process_stream_event is a nice touch.

Test coverage is comprehensive: the three commit rounds added temperature-bump regression tests for acompletion, aresponses, and both streaming paths, plus retry-exhaustion and retry-then-succeed variants across all four call paths.

One minor observation below. Overall this is in good shape.

This review was generated by an AI agent (OpenHands) on behalf of the user via OpenHands Automation.

all-hands-bot

🟢 Taste Rating: Good taste — the retry wrapper change is simple and matches the sync path. I didn’t find actionable code issues in the current diff.

[RISK ASSESSMENT]

[Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM. This touches core LLM retry behavior and can affect agent/eval performance, so I’m leaving a COMMENT rather than approving; human maintainer/eval judgment is recommended.

VERDICT: ✅ Worth merging after maintainer validation/eval sign-off.

KEY INSIGHT: Using Tenacity’s decorator restores one source of truth for retry kwargs across sync and async paths.

This review was generated by an AI agent (OpenHands) on behalf of the user.

Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26374664213

Avoids the per-retry sleep in the two stream-path tests without changing what the temperature-bump assertions verify.

malhotra5 force-pushed the fix/async-retry-temperature-bump branch 3 times, most recently from f18cfb8 to 1846b92 Compare May 21, 2026 20:55

malhotra5 force-pushed the fix/async-retry-temperature-bump branch from 1846b92 to 321d7ac Compare May 21, 2026 20:57

malhotra5 marked this pull request as ready for review May 21, 2026 21:00

all-hands-bot reviewed May 21, 2026

View reviewed changes

Comment thread openhands-sdk/openhands/sdk/llm/llm.py

Comment thread tests/sdk/llm/test_llm_no_response_retry.py

all-hands-bot reviewed May 21, 2026

View reviewed changes

Comment thread openhands-sdk/openhands/sdk/llm/llm.py Outdated

Comment thread openhands-sdk/openhands/sdk/llm/llm.py Outdated

Comment thread tests/sdk/llm/test_llm_no_response_retry.py

openhands-agent added 2 commits May 21, 2026 21:10

malhotra5 requested a review from all-hands-bot May 21, 2026 21:42

all-hands-bot reviewed May 21, 2026

View reviewed changes

Comment thread tests/sdk/llm/test_llm_no_response_retry.py

all-hands-bot reviewed May 21, 2026

View reviewed changes

Comment thread openhands-sdk/openhands/sdk/llm/utils/retry_mixin.py

malhotra5 requested a review from VascoSch92 May 21, 2026 22:00

VascoSch92 reviewed May 22, 2026

View reviewed changes

Comment thread openhands-sdk/openhands/sdk/llm/llm.py Outdated

Comment thread openhands-sdk/openhands/sdk/llm/llm.py

Comment thread openhands-sdk/openhands/sdk/llm/llm.py

Comment thread tests/sdk/llm/test_llm_no_response_retry.py

VascoSch92 and others added 2 commits May 25, 2026 00:34

Merge branch 'main' into fix/async-retry-temperature-bump

6daf9a2

VascoSch92 requested a review from all-hands-bot May 24, 2026 22:35

all-hands-bot reviewed May 24, 2026

View reviewed changes

Comment thread tests/sdk/llm/test_llm_no_response_retry.py Outdated

Comment thread tests/sdk/llm/test_llm_no_response_retry.py Outdated

all-hands-bot reviewed May 24, 2026

View reviewed changes

Use retry_min_wait=0 / retry_max_wait=0 in streaming retry tests

8df29dc

Avoids the per-retry sleep in the two stream-path tests without changing what the temperature-bump assertions verify.

Merge branch 'main' into fix/async-retry-temperature-bump

de3387c

Conversation

malhotra5 commented May 21, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

Summary

How to Test

Type

Notes

Uh oh!

github-actions Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Python API breakage checks — ✅ PASSED

Uh oh!

github-actions Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

REST API breakage checks (OpenAPI) — ✅ PASSED

Uh oh!

github-actions Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

🟡 QA Report: PARTIAL

Does this PR achieve its stated goal?

Test 1: Async completion retry applies the temperature bump after LLMNoResponseError

Issues Found

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

🟡 QA Report: PARTIAL

Does this PR achieve its stated goal?

Test 1: Async chat completion retry applies temperature bump

Test 2: Async Responses API path still retries and returns an SDK response

Issues Found

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

malhotra5 commented May 22, 2026

Uh oh!

openhands-ai Bot commented May 22, 2026

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

🟡 QA Report: PARTIAL

Does this PR achieve its stated goal?

Test 1: Async chat completion retries with bumped temperature

Test 2: Async Responses retry path executes and recovers

Issues Found

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

malhotra5 commented May 21, 2026 •

edited by github-actions Bot

Loading

github-actions Bot commented May 21, 2026 •

edited

Loading

github-actions Bot commented May 21, 2026 •

edited

Loading

github-actions Bot commented May 21, 2026 •

edited

Loading

Test 1: Async completion retry applies the temperature bump after `LLMNoResponseError`