Skip to content

Fix async retry losing temperature→1.0 on LLMNoResponseError#3356

Open
malhotra5 wants to merge 8 commits into
mainfrom
fix/async-retry-temperature-bump
Open

Fix async retry losing temperature→1.0 on LLMNoResponseError#3356
malhotra5 wants to merge 8 commits into
mainfrom
fix/async-retry-temperature-bump

Conversation

@malhotra5
Copy link
Copy Markdown
Member

@malhotra5 malhotra5 commented May 21, 2026

  • A human has tested these changes.

Why

The before_sleep callback bumps temperature from 0→1.0 on LLMNoResponseError by mutating retry_state.kwargs. In the sync path, tenacity's @retry decorator wraps _one_attempt and automatically threads retry_state.kwargs into each retry call — so the temperature override is picked up via final_kwargs = {**call_kwargs, **retry_kwargs}.

The async methods (acompletion, aresponses) instead used an async for attempt in AsyncRetrying loop. This pattern has no wrapped function for tenacity to re-invoke with updated kwargs. before_sleep wrote temperature=1.0 to retry_state.kwargs, but the loop body used a local call_kwargs that was never updated — so the bump was silently lost.

With temperature=0 the model tends to repeat the same empty response, so all retries fail identically.

Repro (temperatures per attempt): sync [0, 1.0, 1.0], async [0, 0, 0] (before fix).

Summary

  • Replace the AsyncRetrying loop in acompletion and aresponses with the same @retry decorator pattern used by the sync methods — tenacity's decorator natively handles async def functions and automatically threads retry_state.kwargs into each retry
  • Deduplicate shared logic across completion/acompletion and responses/aresponses:
    • _current_metrics_snapshot() — replaces 4 identical MetricsSnapshot constructions
    • _prepare_completion_params() — shared setup (format messages, tools, mock tools, select_chat_options, telemetry ctx)
    • _prepare_responses_params() — shared setup for responses path
    • _validate_chat_response() — shared post-processing (mock tools, telemetry, empty-choices check)
  • Remove the now-unused async_retry() method and AsyncRetrying import from RetryMixin
  • Add 3 async regression tests

How to Test

uv run pytest tests/sdk/llm/ -v

All 742 LLM tests pass (including 3 new async regression tests):

test_async_no_response_retry_bumps_temperature PASSED
test_async_no_response_retries_then_succeeds PASSED
test_async_no_response_exhausts_retries PASSED

The key regression test (test_async_no_response_retry_bumps_temperature) asserts that on the second attempt, litellm_acompletion is called with temperature=1.0.

Type

  • Bug fix
  • Feature
  • Refactor
  • Breaking change
  • Docs / chore

Notes

  • RetryMixin._build_before_sleep and retry_decorator are unchanged
  • This PR was created by an AI agent (OpenHands) on behalf of the user.

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22-slim Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:de3387c-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-de3387c-python \
  ghcr.io/openhands/agent-server:de3387c-python

All tags pushed for this build

ghcr.io/openhands/agent-server:de3387c-golang-amd64
ghcr.io/openhands/agent-server:de3387c0d81d0e48124deaec084a5777771ed9c6-golang-amd64
ghcr.io/openhands/agent-server:fix-async-retry-temperature-bump-golang-amd64
ghcr.io/openhands/agent-server:de3387c-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:de3387c-golang-arm64
ghcr.io/openhands/agent-server:de3387c0d81d0e48124deaec084a5777771ed9c6-golang-arm64
ghcr.io/openhands/agent-server:fix-async-retry-temperature-bump-golang-arm64
ghcr.io/openhands/agent-server:de3387c-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:de3387c-java-amd64
ghcr.io/openhands/agent-server:de3387c0d81d0e48124deaec084a5777771ed9c6-java-amd64
ghcr.io/openhands/agent-server:fix-async-retry-temperature-bump-java-amd64
ghcr.io/openhands/agent-server:de3387c-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:de3387c-java-arm64
ghcr.io/openhands/agent-server:de3387c0d81d0e48124deaec084a5777771ed9c6-java-arm64
ghcr.io/openhands/agent-server:fix-async-retry-temperature-bump-java-arm64
ghcr.io/openhands/agent-server:de3387c-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:de3387c-python-amd64
ghcr.io/openhands/agent-server:de3387c0d81d0e48124deaec084a5777771ed9c6-python-amd64
ghcr.io/openhands/agent-server:fix-async-retry-temperature-bump-python-amd64
ghcr.io/openhands/agent-server:de3387c-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:de3387c-python-arm64
ghcr.io/openhands/agent-server:de3387c0d81d0e48124deaec084a5777771ed9c6-python-arm64
ghcr.io/openhands/agent-server:fix-async-retry-temperature-bump-python-arm64
ghcr.io/openhands/agent-server:de3387c-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:de3387c-golang
ghcr.io/openhands/agent-server:de3387c0d81d0e48124deaec084a5777771ed9c6-golang
ghcr.io/openhands/agent-server:fix-async-retry-temperature-bump-golang
ghcr.io/openhands/agent-server:de3387c-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:de3387c-java
ghcr.io/openhands/agent-server:de3387c0d81d0e48124deaec084a5777771ed9c6-java
ghcr.io/openhands/agent-server:fix-async-retry-temperature-bump-java
ghcr.io/openhands/agent-server:de3387c-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:de3387c-python
ghcr.io/openhands/agent-server:de3387c0d81d0e48124deaec084a5777771ed9c6-python
ghcr.io/openhands/agent-server:fix-async-retry-temperature-bump-python
ghcr.io/openhands/agent-server:de3387c-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., de3387c-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., de3387c-python-amd64) are also available if needed

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 21, 2026

Python API breakage checks — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 21, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 21, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-sdk/openhands/sdk/llm
   llm.py68811084%491, 515, 548, 740–741, 749–750, 842–843, 846–850, 852, 860–862, 866, 883–884, 888, 890–891, 893–895, 973, 1036, 1204–1206, 1291, 1332, 1344–1346, 1349–1352, 1358, 1401, 1444, 1457–1459, 1462–1465, 1471, 1627, 1641–1646, 1762–1763, 1995–1996, 2005, 2011, 2016, 2056, 2058–2063, 2065–2082, 2085–2089, 2091–2092, 2098–2107, 2164, 2166
openhands-sdk/openhands/sdk/llm/utils
   retry_mixin.py551180%44, 47, 61, 100, 104, 108–109, 119, 124–125, 130
TOTAL27824818670% 

@malhotra5 malhotra5 force-pushed the fix/async-retry-temperature-bump branch 3 times, most recently from f18cfb8 to 1846b92 Compare May 21, 2026 20:55
The async methods (acompletion, aresponses) used an AsyncRetrying
'async for' loop that has no wrapped function for tenacity to
re-invoke with updated kwargs. The before_sleep callback wrote
temperature=1.0 to retry_state.kwargs, but the loop body used a
local call_kwargs that was never updated.

Replace the AsyncRetrying loop with the same @Retry decorator
pattern used by the sync methods. Tenacity's decorator handles
async functions natively and automatically threads retry_state.kwargs
into the wrapped function, so the temperature bump works identically
for both sync and async paths.

Also deduplicate shared logic across completion/acompletion and
responses/aresponses:
- _current_metrics_snapshot(): replaces 4 identical MetricsSnapshot
  constructions
- _prepare_completion_params(): shared setup (format messages, tools,
  mock tools, select_chat_options, telemetry ctx)
- _prepare_responses_params(): shared setup for responses path
- _validate_chat_response(): shared post-processing (mock tools,
  telemetry, empty-choices check)

Removes the now-unused async_retry() method and AsyncRetrying import
from RetryMixin.

Co-authored-by: openhands <openhands@all-hands.dev>
@malhotra5 malhotra5 force-pushed the fix/async-retry-temperature-bump branch from 1846b92 to 321d7ac Compare May 21, 2026 20:57
@malhotra5 malhotra5 marked this pull request as ready for review May 21, 2026 21:00
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The root-cause analysis in the PR description is spot-on: tenacity's AsyncRetrying loop manages the attempt context-manager but does not re-invoke the user's coroutine on each retry — so any retry_state.kwargs mutations written by before_sleep (the temperature bump) were silently dropped. Switching to @retry_decorator, which natively wraps async def functions and threads retry_state.kwargs into each call, is exactly the right fix.

The refactoring into shared helpers (_current_metrics_snapshot, _prepare_completion_params, _prepare_responses_params, _validate_chat_response) eliminates 4× duplication cleanly and the docstrings are accurate. The three new async regression tests directly validate the fixed behaviour.

Two items are worth a follow-up:

  1. _prepare_completion_params mutates its kwargs argument in-place (inline comment below).
  2. The aresponses async retry path received the same structural fix but has no new regression tests (inline comment below).

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation.

Comment thread openhands-sdk/openhands/sdk/llm/llm.py
Comment thread tests/sdk/llm/test_llm_no_response_retry.py
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 QA Report: PARTIAL

I reproduced the base async completion retry bug and verified this PR fixes the observable LLM.acompletion() temperature retry behavior; one adjacent async Responses path could not be independently confirmed.

Does this PR achieve its stated goal?

Yes for the directly reproducible completion-path bug. On origin/main, the same SDK call sequence produced sync_temperatures=[0.0, 1.0] but async_temperatures=[0.0, 0.0], confirming async retry lost the temperature=1.0 bump after LLMNoResponseError. On the PR branch, the same script produced async_temperatures=[0.0, 1.0] and returned successfully after retry, showing the fix works for the user-facing LLM.acompletion() path. I could not separately prove an aresponses() 0→1 retry transition because the Responses path sent temperature=1.0 on the first transport call in this environment even when configured with temperature=0.0.

Phase Result
Environment Setup make build completed and installed the uv-managed workspace environment.
CI Status 🟡 GitHub checks at review time: 23 successful, 4 skipped, 5 pending, 0 failing.
Functional Verification ✅ Reproduced base bug and verified PR fix via real SDK LLM.completion() / LLM.acompletion() calls with provider responses simulated at the LiteLLM boundary.
Functional Verification

Test 1: Async completion retry applies the temperature bump after LLMNoResponseError

The verification script imported the SDK, created an LLM(temperature=0.0, num_retries=2), and called the public completion() and acompletion() APIs. To avoid external API credentials and make the empty-response condition deterministic, only the LiteLLM transport boundary was patched to return an empty choices=[] response followed by a successful response; the SDK retry, error handling, and response parsing paths were exercised normally.

Step 1 — Reproduce baseline without the fix:
Ran git checkout --detach origin/main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_completion_retry_clean.py; echo qa_exit=$?:

sync_temperatures=[0.0, 1.0]
async_temperatures=[0.0, 0.0]
expected_async_temperatures=[0.0, 1.0]
result=FAIL
qa_exit=1

This confirms the reported bug: the sync retry picked up temperature=1.0, but async retry resent temperature=0.0 on the second attempt.

Step 2 — Apply the PR's changes:
Checked out fix/async-retry-temperature-bump at 321d7ac9abfb3dc1ba0a4dea9082fcbf83cad768.

Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_completion_retry_clean.py; echo qa_exit=$?:

sync_temperatures=[0.0, 1.0]
async_temperatures=[0.0, 1.0]
expected_async_temperatures=[0.0, 1.0]
result=PASS
qa_exit=0

This shows the async retry now receives the same temperature bump as the sync path, and the public SDK call completes successfully after the retry.

Unable to Verify

I attempted a similar before/after script for LLM.aresponses() by simulating LLMNoResponseError then a successful ResponsesAPIResponse. In this environment, the Responses path sent temperature=1.0 on the first transport call even with both instance-level and call-level temperature=0.0, so there was no observable 0.0 → 1.0 retry delta to compare. Future QA guidance could document a concrete provider/model configuration that naturally sends temperature=0.0 through the Responses API and can produce an LLMNoResponseError retry.

Issues Found

None.

Final verdict: PARTIAL (core completion-path bug verified fixed; adjacent aresponses() temperature transition not independently observable in this environment).

This review was created by an AI agent (OpenHands) on behalf of the user.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Taste Rating: Acceptable — the direction is good, but I found retry-kwargs regressions in the async paths.

Risk: 🟡 MEDIUM because this touches core LLM retry behavior and provider-specific parameter filtering; human maintainer/eval judgment is still recommended before approval.

This review was generated by an AI agent (OpenHands) on behalf of the user.


Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26252814170

Comment thread openhands-sdk/openhands/sdk/llm/llm.py Outdated
Comment thread openhands-sdk/openhands/sdk/llm/llm.py Outdated
Comment thread tests/sdk/llm/test_llm_no_response_retry.py
- Add defensive kwargs = dict(kwargs) at the top of both
  _prepare_completion_params and _prepare_responses_params so callers
  are not affected by internal mutations (e.g. kwargs['tools'] = ...).
- Add 3 aresponses regression tests covering temperature bump on retry,
  retry-then-succeed, and retry exhaustion — parallel to the existing
  acompletion tests.

Co-authored-by: openhands <openhands@all-hands.dev>
Extract shared helpers to eliminate code duplication across
completion/acompletion/responses/aresponses:

Layer 1 — _make_retry_decorator(): replaces 4 copies of the 7-line
  retry decorator configuration.

Layer 2 — _build_completion_result() / _build_responses_result():
  shared result-building logic (response → Message → LLMResponse),
  replaces 2×5 identical lines per pair.

Layer 3 — responses stream processing:
  - _build_responses_call_kwargs(): shared litellm call kwargs dict
  - _process_stream_event(): event type matching + delta extraction
  - _finalize_stream_response(): post-stream validation + output patching

Net result: -33 lines, and the sync/async responses _one_attempt
bodies now differ only in the transport call (litellm_responses vs
await litellm_aresponses) and iteration (for vs async for).

Co-authored-by: openhands <openhands@all-hands.dev>
@malhotra5 malhotra5 requested a review from all-hands-bot May 21, 2026 21:42
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 QA Report: PARTIAL

The PR fixes the async chat-completion retry temperature regression in a real SDK call path; one Responses-API-specific 0→1.0 delta could not be reproduced in this environment.

Does this PR achieve its stated goal?

Partially verified. The core reported bug for LLM.acompletion() is fixed: on origin/main, the same public SDK call retried with temperatures [0.0, 0.0], while PR commit a7616581 retried with [0.0, 1.0] and returned an LLMResponse. The sync control path stayed [0.0, 1.0], so the refactor did not break the already-working sync retry behavior in this exercised path. I also exercised LLM.aresponses(), but the simulated Responses path sent [1.0, 1.0] on both base and PR, so I could not use it to prove a before/after 0→1.0 bump for that API.

Phase Result
Environment Setup make build completed successfully and installed the uv workspace environment.
CI Status ⏳ At review time: 22 checks successful, 2 skipped, 6 still in progress.
Functional Verification LLM.acompletion() regression fixed; sync completion still behaves correctly.
Functional Verification

Test 1: Async chat completion retry applies temperature bump

Step 1 — Reproduce / establish baseline without the fix:
Checked out origin/main (8f406a88) and ran a small script that imports the SDK, constructs LLM(temperature=0.0, num_retries=2), calls await llm.acompletion(...), and drives the real SDK retry path with a deterministic empty first model response followed by a successful response.

Ran OPENHANDS_SUPPRESS_BANNER=1 uv run --no-sync python /tmp/qa_async_retry_temperature.py:

{
  "async_completion_under_test": {
    "temperatures": [0.0, 0.0],
    "text": "ok"
  },
  "sync_completion_control": {
    "temperatures": [0.0, 1.0],
    "text": "ok"
  }
}

This confirms the bug exists on base: async retry reaches the successful second attempt, but it reuses temperature=0.0 instead of applying the retry temperature bump. The sync control shows the expected behavior on the same scenario.

Step 2 — Apply the PR's changes:
Checked out PR commit a7616581e188ad8c17e8d3ba1cf6a41d7c0950b1.

Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run --no-sync python /tmp/qa_async_retry_temperature.py:

{
  "async_completion_under_test": {
    "temperatures": [0.0, 1.0],
    "text": "ok"
  },
  "sync_completion_control": {
    "temperatures": [0.0, 1.0],
    "text": "ok"
  }
}

This shows the PR fixes the core async chat-completion regression: the retry attempt now receives temperature=1.0 and the SDK returns a successful LLMResponse. The sync control remained unchanged.

Test 2: Async Responses API path still retries and returns an SDK response

Using the same before/after script, I also exercised await llm.aresponses(...) with a first-call LLMNoResponseError and a second successful Responses API result.

Base and PR both returned:

{
  "async_responses_under_test": {
    "temperatures": [1.0, 1.0],
    "text": "ok"
  }
}

This verifies the Responses API path still retries and returns an LLMResponse, but this scenario did not reproduce a base-branch [0.0, 0.0] failure pattern for aresponses().

Unable to Verify

I could not verify the exact aresponses() 0→1.0 before/after delta claimed in the PR description because the exercised Responses API path supplied temperature=1.0 on both attempts in both base and PR. I also did not use a live LLM provider because the bug requires deterministic empty/no-response behavior, which is not reliably inducible through a normal real-provider user request; instead I used a deterministic fake LiteLLM transport while exercising the real public SDK LLM methods and retry logic.

Suggested future QA guidance: add a documented minimal script or local fake-provider recipe in AGENTS.md for reproducing LLM retry edge cases without relying on nondeterministic external provider behavior.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

Final verdict: PARTIAL

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The root-cause diagnosis and fix are both correct. Tenacity's @retry decorator natively handles async def functions and threads retry_state.kwargs into each re-invocation, whereas the AsyncRetrying loop pattern had no wrapped function for tenacity to call with updated kwargs — so before_sleep's temperature=1.0 mutation was silently lost on every async retry. Switching acompletion and aresponses to @_make_retry_decorator() aligns them with the sync path and fixes the bug cleanly.

The refactoring is solid: the extracted helpers (_prepare_completion_params, _prepare_responses_params, _validate_chat_response, _build_completion_result, _build_responses_result, _build_responses_call_kwargs, _process_stream_event, _finalize_stream_response) eliminate substantial duplication while keeping each public method shallow and readable. The defensive kwargs = dict(kwargs) copies in the two prepare helpers (added in the follow-up commit) are correct.

One observation on the new aresponses tests (inline below).


This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation.

Comment thread tests/sdk/llm/test_llm_no_response_retry.py
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Taste Rating: Acceptable — the async retry fix is directionally right, but I found one public SDK API removal that needs addressing.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM. This touches core LLM retry behavior and the SDK public surface.

VERDICT: ❌ Needs rework: restore or formally deprecate LLM.async_retry before removal.

This review was generated by an AI agent (OpenHands) on behalf of the user.


Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26254832519

Comment thread openhands-sdk/openhands/sdk/llm/utils/retry_mixin.py
Test that when a streaming response completes without a
ResponseCompletedEvent (i.e. _finalize_stream_response raises
LLMNoResponseError), the retry mechanism still bumps temperature
from 0→1.0. Uses a minimal _FakeAsyncStreamIterator that inherits
from ResponsesAPIStreamingIterator to pass isinstance checks.

Co-authored-by: openhands <openhands@all-hands.dev>
@malhotra5 malhotra5 requested a review from VascoSch92 May 21, 2026 22:00
Comment thread openhands-sdk/openhands/sdk/llm/llm.py Outdated
Comment thread openhands-sdk/openhands/sdk/llm/llm.py
Comment thread openhands-sdk/openhands/sdk/llm/llm.py
Comment thread tests/sdk/llm/test_llm_no_response_retry.py
@malhotra5
Copy link
Copy Markdown
Member Author

@OpenHands please address the unresolved review comments, and note that we should be retaining the original code comments and logs wherever the original code has moved

@openhands-ai
Copy link
Copy Markdown

openhands-ai Bot commented May 22, 2026

@malhotra5 your session has expired. Please login again at OpenHands Cloud and try again.

VascoSch92 and others added 2 commits May 25, 2026 00:34
…ed deltas, add sync tests

- _validate_chat_response returns just resp (raw_resp is consumed internally
  by Telemetry.on_response); both call sites updated
- Restore the mock-tool logger.debug and the informative comments lost when
  code moved into _prepare_completion_params / _prepare_responses_params
- _process_stream_event gains emit_deltas so the ModelResponseStream chunk is
  not built when there is no stream callback to receive it
- Add sync responses retry + sync streaming retry regression tests
@VascoSch92 VascoSch92 requested a review from all-hands-bot May 24, 2026 22:35
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 QA Report: PARTIAL

Verified the async chat-completion retry bug before/after: the PR changes acompletion() from retrying at [0.0, 0.0] to retrying at [0.0, 1.0]; the Responses temperature delta was not independently reproducible in this environment.

Does this PR achieve its stated goal?

Partially verified. Yes for the reproducible LLM.acompletion() path: after a simulated empty-choices provider response, the retry warning fired and the second LiteLLM call received temperature=1.0 on the PR, while origin/main retried with temperature=0.0. I also exercised LLM.aresponses() retry/recovery, but in this black-box probe the Responses path sent temperature=1.0 on both base and PR before retry, so I could not establish the exact before/after temperature-bump delta for that path.

Phase Result
Environment Setup make build completed and installed the uv-managed dev environment
CI Status 🟡 Most checks observed passing; image/binary plus review/QA automation checks were still pending when checked
Functional Verification 🟡 acompletion() fix verified before/after; aresponses() retry executed but temperature-specific delta was inconclusive
Functional Verification

Test 1: Async chat completion retries with bumped temperature

I used a short SDK script that calls the public LLM.acompletion() method with real SDK message objects. To deterministically trigger the user-facing bug without external provider credentials, the script temporarily replaced the LiteLLM async transport with a provider stand-in that returns an empty ModelResponse(choices=[]) first, then a valid assistant message, while recording the actual kwargs passed into the transport.

Step 1 — Reproduce / establish baseline without the fix:

Ran:

git checkout --detach origin/main
OPENHANDS_SUPPRESS_BANNER=1 uv run python -u /tmp/qa_async_retry_probe.py > /tmp/qa_probe_main.out 2>&1

Relevant output:

LLMNoResponseError with temperature=0, setting temperature to 1.0 for next attempt.
acompletion temperatures=[0.0, 0.0]; content=[TextContent(cache_prompt=False, type='text', text='async chat recovered')]

This confirms the bug on the base branch: the retry callback decided to bump temperature, but the second async chat transport call still received 0.0.

Step 2 — Apply the PR's changes:

Checked out the reviewed PR commit:

git checkout --detach 6daf9a2dc0b140d707330e8e85035bfb88d9d799

Step 3 — Re-run with the fix in place:

Ran the same probe:

OPENHANDS_SUPPRESS_BANNER=1 uv run python -u /tmp/qa_async_retry_probe.py > /tmp/qa_probe_pr.out 2>&1

Relevant output:

LLMNoResponseError with temperature=0, setting temperature to 1.0 for next attempt.
acompletion temperatures=[0.0, 1.0]; content=[TextContent(cache_prompt=False, type='text', text='async chat recovered')]

This shows the PR fixes the observed async retry behavior for chat completions: the retry-modified temperature now reaches the second provider call and the public SDK call recovers successfully.

Test 2: Async Responses retry path executes and recovers

The same probe exercised LLM.aresponses() by making the first transport call raise LLMNoResponseError and the second return a valid ResponsesAPIResponse.

Base and PR both recovered successfully, with this recorded temperature sequence:

origin/main: aresponses temperatures=[1.0, 1.0]; content=[TextContent(cache_prompt=False, type='text', text='async responses recovered')]
PR commit:  aresponses temperatures=[1.0, 1.0]; content=[TextContent(cache_prompt=False, type='text', text='async responses recovered')]

This verifies the async Responses retry path still executes and returns a normal LLMResponse on the PR, but it does not prove the specific 0→1.0 retry mutation because this path already normalized/sent 1.0 before retry in the exercised configuration.

Unable to Verify

I could not produce a before/after aresponses() temperature sequence of [0.0, 0.0] on base versus [0.0, 1.0] on the PR. I tried the public LLM.aresponses() path with multiple model names and explicit temperature=0.0, but the transport-level kwargs recorded temperature=1.0 on the first call for both base and PR. Future QA guidance in AGENTS.md could document a concrete Responses API configuration/provider where temperature=0.0 reaches LiteLLM unchanged, if that path needs manual verification.

Issues Found

None from the verified behavior.


This review was created by an AI agent (OpenHands) on behalf of the user.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The root-cause diagnosis is precise, and the fix is the correct minimal change: tenacity's @retry decorator natively handles async def functions and threads retry_state.kwargs into every retry call, so the temperature bump written by before_sleep is automatically picked up. The old AsyncRetrying loop pattern had no wrapped function for tenacity to re-invoke, so the bump was silently discarded.

The accompanying refactoring (_prepare_*_params, _validate_chat_response, _build_*_result, _process_stream_event, _finalize_stream_response) is well-scoped and eliminates real duplication without introducing indirection. The defensive kwargs = dict(kwargs) copies in both _prepare_completion_params and _prepare_responses_params correctly guard against the kwargs['tools'] = ... mutation being visible to callers. The emit_deltas optimisation in _process_stream_event is a nice touch.

Test coverage is comprehensive: the three commit rounds added temperature-bump regression tests for acompletion, aresponses, and both streaming paths, plus retry-exhaustion and retry-then-succeed variants across all four call paths.

One minor observation below. Overall this is in good shape.


This review was generated by an AI agent (OpenHands) on behalf of the user via OpenHands Automation.

Comment thread tests/sdk/llm/test_llm_no_response_retry.py Outdated
Comment thread tests/sdk/llm/test_llm_no_response_retry.py Outdated
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Taste Rating: Good taste — the retry wrapper change is simple and matches the sync path. I didn’t find actionable code issues in the current diff.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM. This touches core LLM retry behavior and can affect agent/eval performance, so I’m leaving a COMMENT rather than approving; human maintainer/eval judgment is recommended.

VERDICT: ✅ Worth merging after maintainer validation/eval sign-off.

KEY INSIGHT: Using Tenacity’s decorator restores one source of truth for retry kwargs across sync and async paths.

This review was generated by an AI agent (OpenHands) on behalf of the user.


Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26374664213

Avoids the per-retry sleep in the two stream-path tests without changing
what the temperature-bump assertions verify.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants