fix: surface friendly error message when LLM API key is invalid #3413

Open
erisfully wants to merge 6 commits into main from fix/llm-auth-error-friendly-message-3411

@erisfully erisfully commented May 27, 2026

Fixes #3411

Problem

When a user has an invalid or expired LLM API key, LLMAuthenticationError propagated all the way up to the generic except Exception handler in both run() and arun(). That handler used detail=str(e), which baked the raw litellm error string (e.g. the full AnthropicException JSON blob) directly into the ConversationErrorEvent — which was then shown verbatim as a toast in the UI.

The result: every message the user sent silently failed with an opaque, provider-internal error string and no hint that their API key was the problem.

Fix

Add an explicit except LLMAuthenticationError clause before the catch-all except Exception in both run() and arun() in local_conversation.py. The new handler emits a ConversationErrorEvent with a clear, actionable detail message:

"Your LLM API key appears to be invalid or has expired. Please update it in Settings."

The original exception is still re-raised, wrapped in ConversationRunError with LLMAuthenticationError as its cause, so server logs are unaffected.
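The ordering of the two except clauses is what matters here. The following is a minimal sketch of that pattern, not the code in local_conversation.py: the two exception classes are stand-ins for the SDK types, and `run_step`, `emit_error`, and `bad_key` are hypothetical names introduced only for illustration.

```python
# Stand-in types; the real classes live in the OpenHands SDK and may differ.
class LLMAuthenticationError(Exception):
    """Stand-in for openhands.sdk.llm.exceptions.LLMAuthenticationError."""


class ConversationRunError(Exception):
    """Stand-in for the SDK's wrapper exception."""


FRIENDLY_AUTH_DETAIL = (
    "Your LLM API key appears to be invalid or has expired. "
    "Please update it in Settings."
)


def run_step(step, emit_error):
    """Run one step, emit a UI-facing error, then re-raise wrapped.

    `step` and `emit_error` are hypothetical callables for this sketch.
    """
    try:
        step()
    except LLMAuthenticationError as e:
        # Specific clause first: the UI gets a fixed, actionable message.
        emit_error(code=type(e).__name__, detail=FRIENDLY_AUTH_DETAIL)
        raise ConversationRunError(str(e)) from e
    except Exception as e:
        # Catch-all keeps the old behavior: raw exception text as detail.
        emit_error(code=type(e).__name__, detail=str(e))
        raise ConversationRunError(str(e)) from e


def bad_key():
    # Simulates the provider rejecting the key.
    raise LLMAuthenticationError("litellm.AuthenticationError: invalid x-api-key")


events = []
wrapped = None
try:
    run_step(bad_key, lambda **kw: events.append(kw))
except ConversationRunError as err:
    wrapped = err  # the original error stays reachable via wrapped.__cause__
```

If the catch-all came first, the specific clause would never run, because Python tries except clauses in order and LLMAuthenticationError is also an Exception.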

Changes

  • openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py
    • Import LLMAuthenticationError from openhands.sdk.llm.exceptions
    • Add except LLMAuthenticationError handler in run() (sync path)
    • Add except LLMAuthenticationError handler in arun() (async path)

Before / After

Before — toast shows raw litellm error:

litellm.AuthenticationError: AnthropicException - {"type":"error","error":{"type":"authentication_error","message":"invalid x-api-key"},"request_id":"req_011CbTfF4jtKVAB95FSH6ESb"}

After — toast shows actionable message:

Your LLM API key appears to be invalid or has expired. Please update it in Settings.

This PR was created by an AI agent (OpenHands) based on Datadog log analysis.

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
| --- | --- | --- | --- |
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.13-nodejs22-slim | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:e0fd81b-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-e0fd81b-python \
  ghcr.io/openhands/agent-server:e0fd81b-python

All tags pushed for this build

ghcr.io/openhands/agent-server:e0fd81b-golang-amd64
ghcr.io/openhands/agent-server:e0fd81b807815f45f1a72c4fd2e325d639831922-golang-amd64
ghcr.io/openhands/agent-server:fix-llm-auth-error-friendly-message-3411-golang-amd64
ghcr.io/openhands/agent-server:e0fd81b-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:e0fd81b-golang-arm64
ghcr.io/openhands/agent-server:e0fd81b807815f45f1a72c4fd2e325d639831922-golang-arm64
ghcr.io/openhands/agent-server:fix-llm-auth-error-friendly-message-3411-golang-arm64
ghcr.io/openhands/agent-server:e0fd81b-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:e0fd81b-java-amd64
ghcr.io/openhands/agent-server:e0fd81b807815f45f1a72c4fd2e325d639831922-java-amd64
ghcr.io/openhands/agent-server:fix-llm-auth-error-friendly-message-3411-java-amd64
ghcr.io/openhands/agent-server:e0fd81b-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:e0fd81b-java-arm64
ghcr.io/openhands/agent-server:e0fd81b807815f45f1a72c4fd2e325d639831922-java-arm64
ghcr.io/openhands/agent-server:fix-llm-auth-error-friendly-message-3411-java-arm64
ghcr.io/openhands/agent-server:e0fd81b-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:e0fd81b-python-amd64
ghcr.io/openhands/agent-server:e0fd81b807815f45f1a72c4fd2e325d639831922-python-amd64
ghcr.io/openhands/agent-server:fix-llm-auth-error-friendly-message-3411-python-amd64
ghcr.io/openhands/agent-server:e0fd81b-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:e0fd81b-python-arm64
ghcr.io/openhands/agent-server:e0fd81b807815f45f1a72c4fd2e325d639831922-python-arm64
ghcr.io/openhands/agent-server:fix-llm-auth-error-friendly-message-3411-python-arm64
ghcr.io/openhands/agent-server:e0fd81b-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:e0fd81b-golang
ghcr.io/openhands/agent-server:e0fd81b807815f45f1a72c4fd2e325d639831922-golang
ghcr.io/openhands/agent-server:fix-llm-auth-error-friendly-message-3411-golang
ghcr.io/openhands/agent-server:e0fd81b-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:e0fd81b-java
ghcr.io/openhands/agent-server:e0fd81b807815f45f1a72c4fd2e325d639831922-java
ghcr.io/openhands/agent-server:fix-llm-auth-error-friendly-message-3411-java
ghcr.io/openhands/agent-server:e0fd81b-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:e0fd81b-python
ghcr.io/openhands/agent-server:e0fd81b807815f45f1a72c4fd2e325d639831922-python
ghcr.io/openhands/agent-server:fix-llm-auth-error-friendly-message-3411-python
ghcr.io/openhands/agent-server:e0fd81b-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., e0fd81b-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., e0fd81b-python-amd64) are also available if needed

When LLMAuthenticationError propagates up from the LLM call, the
generic except-Exception handler was baking the raw litellm error
string (e.g. the full AnthropicException JSON) into the
ConversationErrorEvent detail field, which was shown verbatim as a
toast in the UI.

Add an explicit except LLMAuthenticationError clause before the
catch-all in both run() and arun(), emitting a user-readable message
instead. The raw exception is still re-raised as ConversationRunError
so logs remain unaffected.

github-actions Bot commented May 27, 2026

Python API breakage checks — ✅ PASSED

Result: PASSED

github-actions Bot commented May 27, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: PASSED

Four tests covering both the sync (run) and async (arun) paths:
- ConversationRunError is still raised (logs unaffected)
- ConversationErrorEvent.detail contains a friendly message
- ConversationErrorEvent.detail does NOT contain the raw litellm string
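
The test code itself is not shown in this conversation. As a hedged sketch, the three assertions listed above could be expressed as follows; `FakeConversation` is a hypothetical stand-in (not LocalConversation), and the exception classes are local stand-ins for the SDK types.

```python
import asyncio


class LLMAuthenticationError(Exception):
    """Stand-in for the SDK exception."""


class ConversationRunError(Exception):
    """Stand-in for the SDK wrapper exception."""


RAW = 'litellm.AuthenticationError: AnthropicException - {"type":"error"}'
FRIENDLY = (
    "Your LLM API key appears to be invalid or has expired. "
    "Please update it in Settings."
)


class FakeConversation:
    """Hypothetical conversation that always fails authentication."""

    def __init__(self):
        self.error_details = []  # what a UI would display

    def run(self):
        try:
            raise LLMAuthenticationError(RAW)
        except LLMAuthenticationError as e:
            self.error_details.append(FRIENDLY)
            raise ConversationRunError(str(e)) from e

    async def arun(self):
        # Hypothetical async twin; reuses the same handling.
        self.run()


def check(invoke):
    """Apply the three assertions to one entry point and return the detail."""
    conv = FakeConversation()
    try:
        invoke(conv)
    except ConversationRunError as err:
        raised = err
    else:
        raise AssertionError("expected ConversationRunError")
    detail = conv.error_details[-1]
    assert isinstance(raised.__cause__, LLMAuthenticationError)  # logs unaffected
    assert "invalid or has expired" in detail  # friendly message present
    assert "litellm" not in detail  # raw provider string absent
    return detail


sync_detail = check(lambda c: c.run())
async_detail = check(lambda c: asyncio.run(c.arun()))
```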

github-actions Bot commented May 27, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py | 566 | 49 | 91% | 310, 315, 459, 505, 574, 590, 666, 956–957, 960, 1088, 1099–1102, 1109–1110, 1113, 1119–1120, 1123, 1129, 1144, 1147, 1151–1152, 1156–1158, 1165, 1199–1202, 1209, 1266, 1271, 1381, 1383, 1387–1388, 1399–1400, 1425, 1620, 1624, 1694, 1701–1702 |
| TOTAL | 28290 | 8202 | 71% | |

@erisfully erisfully marked this pull request as ready for review May 27, 2026 23:25
@erisfully erisfully added the qa-this label May 27, 2026 — with OpenHands AI

@all-hands-bot all-hands-bot left a comment

⚠️ QA Report: PASS WITH ISSUES

The invalid/expired LLM API key path was exercised through the SDK with a real Anthropic invalid-key response; the PR fixes the UI-facing detail in both sync and async paths, but CI currently has a failing pre-commit check.

Does this PR achieve its stated goal?

Yes. On origin/main, the same invalid-key conversation emitted ConversationErrorEvent.detail containing raw litellm.AuthenticationError: AnthropicException ... invalid x-api-key for both run() and arun(). On PR commit 4e851294b0ac38b4844305b1c28c9a179eab370f, both paths emitted Your LLM API key appears to be invalid or has expired. Please update it in Settings., while still raising ConversationRunError with LLMAuthenticationError as the cause.

| Phase | Result |
| --- | --- |
| Environment Setup | make build completed successfully; no tests, linters, or pre-commit hooks were run locally. |
| CI Status | ⚠️ GitHub reports 20 successful, 1 failing (Pre-commit checks/pre-commit), 8 pending, and 14 skipped checks. |
| Functional Verification | ✅ Real SDK Conversation.run() and Conversation.arun() calls with a bogus Anthropic key produced the expected before/after behavior. |

Functional Verification

Test 1: Invalid LLM API key surfaces a friendly UI-facing conversation error

Step 1 — Reproduce / establish baseline without the fix:

I checked out origin/main at c6347949 and ran a temporary SDK script that creates LLM(model="anthropic/claude-3-haiku-20240307", api_key="qa-invalid-key-not-secret"), builds an Agent and Conversation, sends Say hello once., then executes both conv.run() and conv.arun().

Ran git fetch origin main && git checkout --detach origin/main && uv run python /tmp/qa_llm_auth_behavior.py 2>&1 | tee /tmp/qa_llm_auth_base.log, then extracted the behavior lines:

--- sync run() ---
raised=ConversationRunError
cause=LLMAuthenticationError
cause_is_llm_auth=True
error_event_count=1
event_code=LLMAuthenticationError
event_detail=litellm.AuthenticationError: AnthropicException - {"type":"error","error":{"type":"authentication_error","message":"invalid x-api-key"},"request_id":"req_011CbTwhhoRE8xeYhptJHHEE"}
detail_contains_raw_marker=True
--- async arun() ---
raised=ConversationRunError
cause=LLMAuthenticationError
cause_is_llm_auth=True
error_event_count=1
event_code=LLMAuthenticationError
event_detail=litellm.AuthenticationError: AnthropicException - {"type":"error","error":{"type":"authentication_error","message":"invalid x-api-key"},"request_id":"req_011CbTwhicXeyr9GYuRAPAsq"}
detail_contains_raw_marker=True

This confirms the reported bug exists on main: the UI-facing ConversationErrorEvent.detail includes raw provider/litellm authentication text, including AnthropicException and invalid x-api-key.

Step 2 — Apply the PR's changes:

I checked out the PR branch at 4e851294b0ac38b4844305b1c28c9a179eab370f.

Step 3 — Re-run with the fix in place:

Ran git checkout fix/llm-auth-error-friendly-message-3411 && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_llm_auth_behavior.py 2>&1 | tee /tmp/qa_llm_auth_pr.log, then extracted the behavior lines:

--- sync run() ---
raised=ConversationRunError
cause=LLMAuthenticationError
cause_is_llm_auth=True
error_event_count=1
event_code=LLMAuthenticationError
event_detail=Your LLM API key appears to be invalid or has expired. Please update it in Settings.
detail_contains_raw_marker=False
--- async arun() ---
raised=ConversationRunError
cause=LLMAuthenticationError
cause_is_llm_auth=True
error_event_count=1
event_code=LLMAuthenticationError
event_detail=Your LLM API key appears to be invalid or has expired. Please update it in Settings.
detail_contains_raw_marker=False

This shows the fix works in both user-facing entry points: the raw provider error is no longer placed in ConversationErrorEvent.detail, the friendly actionable message is present, and the raised exception chain remains intact for logs/debugging.
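
The QA script is not included in the review. As a minimal sketch of how the `raised=` and `cause=` lines above could be derived, the snippet below inspects the exception chain that `raise ... from` leaves behind. The exception classes are stand-ins for the SDK types and `chain_lines` is a hypothetical helper.

```python
class LLMAuthenticationError(Exception):
    """Stand-in for the SDK exception."""


class ConversationRunError(Exception):
    """Stand-in for the SDK wrapper exception."""


def chain_lines(exc):
    """Format the outer exception and its direct cause as key=value lines."""
    cause = exc.__cause__  # set by `raise ... from ...`
    return [
        f"raised={type(exc).__name__}",
        f"cause={type(cause).__name__}",
        f"cause_is_llm_auth={isinstance(cause, LLMAuthenticationError)}",
    ]


report = []
try:
    try:
        raise LLMAuthenticationError("invalid x-api-key")
    except LLMAuthenticationError as e:
        # Wrap, but keep the original error attached as the cause.
        raise ConversationRunError("conversation run failed") from e
except ConversationRunError as err:
    report = chain_lines(err)

print("\n".join(report))
```

Because the cause stays attached, a logger that records the full traceback still sees the provider error even though the event detail no longer carries it.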

Issues Found

  • 🟠 Issue: CI is not green at review time: Pre-commit checks/pre-commit is failing, and 8 checks are still pending. Functional QA found no behavior issue with the PR goal, and I did not rerun CI-owned tests/linters locally per QA instructions.

This review was created by an AI agent (OpenHands) on behalf of the user.

@erisfully erisfully requested a review from openhands-agent May 27, 2026 23:33

@all-hands-bot all-hands-bot left a comment

⚠️ QA Report: PASS WITH ISSUES

Functionally, the PR does achieve its invalid/expired LLM API-key goal for both sync and async SDK conversation runs; GitHub currently reports one failing CI check.

Does this PR achieve its stated goal?

Yes. I exercised the SDK as a user would by creating a real LLM with an intentionally invalid Anthropic API key, sending a message through Conversation, and running both run() and arun(). On main, the UI-facing ConversationErrorEvent.detail contained the raw litellm.AuthenticationError: AnthropicException ... invalid x-api-key provider blob; on commit 4e851294b0ac38b4844305b1c28c9a179eab370f, the same flow emitted exactly Your LLM API key appears to be invalid or has expired. Please update it in Settings. while still raising ConversationRunError caused by LLMAuthenticationError.

| Phase | Result |
| --- | --- |
| Environment Setup | make build completed successfully with uv sync --dev. |
| CI Status | ⚠️ gh pr checks showed 32 passing, 1 pending (qa-changes), 16 skipped, and 1 failing (Pre-commit checks). |
| Functional Verification | ✅ Real SDK conversations with an invalid Anthropic key confirmed the before/after behavior in both sync and async paths. |

Functional Verification

Test 1: Invalid LLM API key surfaces friendly UI-facing error detail

Step 1 — Reproduce / establish baseline without the fix:
Checked out origin/main (c6347949c4dacbdf9db364fc902e2be216599747) and ran a temporary SDK script that:

  • constructs LLM(model="anthropic/claude-3-5-haiku-20241022", api_key="invalid-openhands-qa-key")
  • creates Agent(tools=[]) and Conversation(...)
  • sends "Say hello in one short sentence."
  • calls both conv.run() and await conv.arun()
  • prints the observed ConversationErrorEvent details and raised exception cause

Ran:

git checkout --detach origin/main
uv run python /tmp/qa_llm_auth_check.py > /tmp/qa_base_stdout.txt 2> /tmp/qa_base_stderr.txt
tail -80 /tmp/qa_base_stdout.txt

Observed excerpt:

{
  "sync": {
    "raised": {
      "type": "ConversationRunError",
      "cause_type": "LLMAuthenticationError",
      "cause_contains_anthropic": true
    },
    "execution_status": "ConversationExecutionStatus.ERROR",
    "error_count": 1,
    "last_error_code": "LLMAuthenticationError",
    "last_error_detail": "litellm.AuthenticationError: AnthropicException - {\"type\":\"error\",\"error\":{\"type\":\"authentication_error\",\"message\":\"invalid x-api-key\"},\"request_id\":\"req_011CbTx8teE3rm3o6aTGv9v6\"}",
    "friendly_exact_match": false,
    "raw_provider_fragment_present": true
  },
  "async": {
    "raised": {
      "type": "ConversationRunError",
      "cause_type": "LLMAuthenticationError",
      "cause_contains_anthropic": true
    },
    "last_error_code": "LLMAuthenticationError",
    "last_error_detail": "litellm.AuthenticationError: AnthropicException - {\"type\":\"error\",\"error\":{\"type\":\"authentication_error\",\"message\":\"invalid x-api-key\"},\"request_id\":\"req_011CbTx8uBy8WtTtidqNEAMW\"}",
    "friendly_exact_match": false,
    "raw_provider_fragment_present": true
  }
}

This confirms the reported bug existed on the base branch: both sync and async conversation runs exposed the raw provider/litellm authentication blob in the event detail that the UI consumes.

Step 2 — Apply the PR's changes:
Checked out the PR commit:

git checkout --detach 4e851294b0ac38b4844305b1c28c9a179eab370f

Step 3 — Re-run with the fix in place:
Ran the same SDK script:

uv run python /tmp/qa_llm_auth_check.py > /tmp/qa_pr_stdout.txt 2> /tmp/qa_pr_stderr.txt
tail -80 /tmp/qa_pr_stdout.txt

Observed excerpt:

{
  "sync": {
    "raised": {
      "type": "ConversationRunError",
      "cause_type": "LLMAuthenticationError",
      "cause_contains_anthropic": true
    },
    "execution_status": "ConversationExecutionStatus.ERROR",
    "error_count": 1,
    "last_error_code": "LLMAuthenticationError",
    "last_error_detail": "Your LLM API key appears to be invalid or has expired. Please update it in Settings.",
    "friendly_exact_match": true,
    "raw_provider_fragment_present": false
  },
  "async": {
    "raised": {
      "type": "ConversationRunError",
      "cause_type": "LLMAuthenticationError",
      "cause_contains_anthropic": true
    },
    "execution_status": "ConversationExecutionStatus.ERROR",
    "error_count": 1,
    "last_error_code": "LLMAuthenticationError",
    "last_error_detail": "Your LLM API key appears to be invalid or has expired. Please update it in Settings.",
    "friendly_exact_match": true,
    "raw_provider_fragment_present": false
  }
}

This confirms the fix works end-to-end through the SDK conversation path: the UI-facing event detail is friendly and actionable, while the raised ConversationRunError still preserves LLMAuthenticationError as its cause for logging/debugging.
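
The temporary script is not attached to this review. One plausible way to compute the `friendly_exact_match` and `raw_provider_fragment_present` fields is sketched below; the fragment list and the `summarize_detail` helper are assumptions made for illustration, not the script's actual logic.

```python
import json

FRIENDLY = (
    "Your LLM API key appears to be invalid or has expired. "
    "Please update it in Settings."
)

# Assumed markers of a raw provider error; the real script may use others.
RAW_FRAGMENTS = (
    "litellm.AuthenticationError",
    "AnthropicException",
    "invalid x-api-key",
)


def summarize_detail(detail):
    """Classify one ConversationErrorEvent detail string."""
    return {
        "last_error_detail": detail,
        "friendly_exact_match": detail == FRIENDLY,
        "raw_provider_fragment_present": any(f in detail for f in RAW_FRAGMENTS),
    }


raw_detail = (
    "litellm.AuthenticationError: AnthropicException - "
    '{"type":"error","error":{"type":"authentication_error",'
    '"message":"invalid x-api-key"}}'
)

summary = {
    "before": summarize_detail(raw_detail),
    "after": summarize_detail(FRIENDLY),
}

# json.dumps escapes the quotes embedded in the provider message,
# so the printed summary stays valid JSON.
print(json.dumps(summary, indent=2))
```

An exact-match check is stricter than a substring check: it would also flag a detail that appends provider text after the friendly sentence.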

Issues Found

  • 🟠 Issue: GitHub currently reports Pre-commit checks as failing. I did not rerun or diagnose it because this QA pass was explicitly scoped away from tests/linters/formatters, but the PR should have green CI before merge.

This QA review was created by an AI agent (OpenHands) on behalf of the user.

Comment thread openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py Outdated
Comment thread openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py Outdated
@erisfully erisfully requested a review from VascoSch92 May 28, 2026 17:57

Development

Successfully merging this pull request may close these issues.

Invalid LLM API key causes ConversationRunError on every message with no actionable error

3 participants