[Fix] Request Timeout needs to be also fetched from `litellm_settings.request_timeout` #25591
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged: ishaan-berri merged 11 commits into `BerriAI:litellm_ishaan_april15_2` from `harish876:azure-anthropic-timeout-bug` on Apr 16, 2026.
All 11 commits are by harish876:

- ff33fee [Test] Add Azure async chat completion timeout test. WIP
- 645e43b [Refactor] Implement timeout resolution logic in completion function
- 2345baf remove stale test case
- 862b693 remove extra print statement
- de20182 default request timeout value in constants to 600s to match timeout d…
- c2b26ed fix request timeout if using default value from constants.py
- fdccbcb Merge branch 'main' of https://github.com/BerriAI/litellm into azure-…
- 0fa9cd1 update code structure, test cases
- 2c14d42 only override if the global timeout sets timeout to 6000s
- 6a3ca9a update code structure, move hard coded values to const and make the r…
- 9ae30ed modify default timeout values, replacing hard coded ones with default…
New file (`@@ -0,0 +1,83 @@`):

```python
"""Completion HTTP timeout resolution (kept out of ``main.py`` to limit import cycles)."""

from __future__ import annotations

from typing import Callable, Optional, Union

import httpx

from litellm.constants import (
    COMPLETION_HTTP_FALLBACK_SECONDS,
    DEFAULT_REQUEST_TIMEOUT_SECONDS,
)


class CompletionTimeout:
    """Resolves HTTP timeout for ``completion()`` from model vs global settings."""

    @staticmethod
    def _fallback_when_no_explicit_timeout(
        global_timeout: Optional[Union[float, str]],
    ) -> float:
        """
        Used when ``model_timeout`` and kwargs timeouts are all unset.

        ``global_timeout`` is :attr:`litellm.request_timeout` (numeric / string), not
        :class:`httpx.Timeout`.

        If it equals :data:`~litellm.constants.DEFAULT_REQUEST_TIMEOUT_SECONDS` (6000),
        return :data:`~litellm.constants.COMPLETION_HTTP_FALLBACK_SECONDS`. Same if
        ``None``. Otherwise return ``float(global_timeout)``.
        """
        if global_timeout is None:
            return COMPLETION_HTTP_FALLBACK_SECONDS
        if float(global_timeout) == float(DEFAULT_REQUEST_TIMEOUT_SECONDS):
            return COMPLETION_HTTP_FALLBACK_SECONDS
        return float(global_timeout)

    @staticmethod
    def resolve(
        model_timeout: Optional[Union[float, str, httpx.Timeout]],
        kwargs: dict,
        custom_llm_provider: str,
        *,
        global_timeout: Optional[Union[float, str]],
        supports_httpx_timeout: Callable[[str], bool],
    ) -> Union[float, httpx.Timeout]:
        """
        Resolution order (first non-None wins):

        1. ``model_timeout`` (call argument / merged ``litellm_params``)
        2. ``kwargs["timeout"]``
        3. ``kwargs["request_timeout"]``
        4. Fallback from ``global_timeout`` (:attr:`litellm.request_timeout`); if it is
           the package default (6000), use 600 instead.

        Coerce :class:`httpx.Timeout` when the provider does not support it.
        Explicit ``6000`` on the model or in kwargs is kept as ``6000``.
        """
        resolved: Union[float, str, httpx.Timeout]
        if model_timeout is not None:
            resolved = model_timeout
        elif kwargs.get("timeout") is not None:
            resolved = kwargs["timeout"]
        elif kwargs.get("request_timeout") is not None:
            resolved = kwargs["request_timeout"]
        else:
            resolved = CompletionTimeout._fallback_when_no_explicit_timeout(
                global_timeout
            )

        if isinstance(resolved, httpx.Timeout) and not supports_httpx_timeout(
            custom_llm_provider
        ):
            read_timeout = resolved.read
            resolved = (
                float(read_timeout)
                if read_timeout is not None
                else COMPLETION_HTTP_FALLBACK_SECONDS
            )  # default 10 min timeout
        elif not isinstance(resolved, httpx.Timeout):
            resolved = float(resolved)  # type: ignore

        return resolved
```
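To see the resolution order in isolation, here is a minimal, dependency-free sketch of the same logic. It is not the PR's code: `resolve_timeout` and `ReadOnlyTimeout` (a stand-in for `httpx.Timeout` that models only its `read` field) are hypothetical, and the constant values 6000 and 600 are taken from the docstrings above rather than imported from `litellm.constants`.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Values copied from the docstrings in the diff; the real constants live in
# litellm.constants (DEFAULT_REQUEST_TIMEOUT_SECONDS, COMPLETION_HTTP_FALLBACK_SECONDS).
DEFAULT_REQUEST_TIMEOUT_SECONDS = 6000
COMPLETION_HTTP_FALLBACK_SECONDS = 600


@dataclass
class ReadOnlyTimeout:
    """Hypothetical stand-in for httpx.Timeout; only ``read`` is modeled."""

    read: Optional[float]


def resolve_timeout(
    model_timeout: Optional[Union[float, str, ReadOnlyTimeout]],
    kwargs: dict,
    global_timeout: Optional[Union[float, str]],
    provider_supports_timeout_object: bool,
) -> Union[float, ReadOnlyTimeout]:
    # Steps 1-3: the first explicit, non-None value wins.
    candidates = (model_timeout, kwargs.get("timeout"), kwargs.get("request_timeout"))
    explicit = next((c for c in candidates if c is not None), None)

    if explicit is not None:
        resolved = explicit
    elif global_timeout is None or float(global_timeout) == float(
        DEFAULT_REQUEST_TIMEOUT_SECONDS
    ):
        # Step 4a: unset or untouched package default -> completion fallback.
        resolved = COMPLETION_HTTP_FALLBACK_SECONDS
    else:
        # Step 4b: a user-configured global timeout is honored as-is.
        resolved = float(global_timeout)

    if isinstance(resolved, ReadOnlyTimeout):
        if provider_supports_timeout_object:
            return resolved
        # Collapse to the read timeout for providers that need a plain number.
        if resolved.read is not None:
            return float(resolved.read)
        return float(COMPLETION_HTTP_FALLBACK_SECONDS)
    return float(resolved)


print(resolve_timeout(None, {}, 6000, True))                         # 600.0
print(resolve_timeout(None, {}, "120", True))                        # 120.0
print(resolve_timeout(6000, {}, 6000, True))                         # 6000.0
print(resolve_timeout(None, {"request_timeout": 30}, 6000, True))    # 30.0
print(resolve_timeout(ReadOnlyTimeout(read=15.0), {}, None, False))  # 15.0
```

The third call shows the asymmetry the docstring calls out: an explicit `6000` passed as the model timeout is kept, while the same value arriving only through the global setting is treated as the untouched package default and replaced with 600.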
New file (`@@ -0,0 +1,46 @@`):

```python
"""
``_get_httpx_client`` + ``HTTPHandler.post`` (same pattern as Azure Anthropic sync path:
``_get_httpx_client(params={"timeout": ...})`` then ``post(..., timeout=...)``).

Uses https://httpbin.org/delay/10 with ``timeout=5``; the handler must raise :class:`~litellm.exceptions.Timeout`
before the 10s delay completes. Skips if httpbin is unreachable.

Lives under ``local_testing`` (not ``make test-unit``).
"""

import json
import os
import sys

import httpx
import pytest

sys.path.insert(
    0, os.path.abspath(os.path.join(os.path.dirname(__file__), "../.."))
)

from litellm.exceptions import Timeout as LitellmTimeout
from litellm.llms.custom_httpx.http_handler import _get_httpx_client

_HTTPBIN_DELAY_S = 10
_PER_REQUEST_TIMEOUT_S = 5.0
_CLIENT_DEFAULT_TIMEOUT_S = 60.0


def test_post_delay_exceeds_per_request_timeout_raises():
    try:
        httpx.get("https://httpbin.org/get", timeout=5.0)
    except Exception as e:
        pytest.skip(f"httpbin.org unreachable: {e}")

    handler = _get_httpx_client(params={"timeout": _CLIENT_DEFAULT_TIMEOUT_S})
    try:
        with pytest.raises(LitellmTimeout):
            handler.post(
                f"https://httpbin.org/delay/{_HTTPBIN_DELAY_S}",
                headers={"content-type": "application/json"},
                data=json.dumps({"model": "claude", "messages": []}),
                timeout=_PER_REQUEST_TIMEOUT_S,
            )
    finally:
        handler.close()
```
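The test above needs network access to httpbin.org and skips otherwise. The property it checks, that a per-request timeout shorter than the server's delay fails before the delay elapses, can also be sketched hermetically against a local server. This sketch uses only the standard library (`http.server`, `urllib.request`), not litellm's `_get_httpx_client`/`HTTPHandler`, so it illustrates the timing behavior rather than litellm's API; `SlowHandler`, `post_with_timeout`, and the scaled-down delay and timeout values are hypothetical.

```python
import socket
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Tuple

SERVER_DELAY_S = 2.0         # plays the role of httpbin's /delay/10
PER_REQUEST_TIMEOUT_S = 0.5  # plays the role of timeout=_PER_REQUEST_TIMEOUT_S


class SlowHandler(BaseHTTPRequestHandler):
    """Local stand-in for https://httpbin.org/delay/N: stalls before replying."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # drain the JSON body
        time.sleep(SERVER_DELAY_S)
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"{}")
        except (BrokenPipeError, ConnectionResetError):
            pass  # the client already gave up

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging


def post_with_timeout() -> Tuple[bool, float]:
    """POST to the slow server; return (timed_out, elapsed_seconds)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    request = urllib.request.Request(
        f"http://127.0.0.1:{server.server_address[1]}/delay",
        data=b'{"model": "claude", "messages": []}',
        headers={"content-type": "application/json"},
        method="POST",
    )
    timed_out = False
    start = time.monotonic()
    try:
        urllib.request.urlopen(request, timeout=PER_REQUEST_TIMEOUT_S).close()
    except (socket.timeout, TimeoutError):
        timed_out = True  # read timed out while waiting for the response
    except urllib.error.URLError as exc:
        timed_out = isinstance(exc.reason, (socket.timeout, TimeoutError))
    finally:
        elapsed = time.monotonic() - start
        server.shutdown()
        server.server_close()
    return timed_out, elapsed


timed_out, elapsed = post_with_timeout()
print(timed_out, elapsed < SERVER_DELAY_S)  # True True
```

Running against a local server trades realism for determinism: the test no longer depends on a third-party service, but it also no longer exercises litellm's exception mapping to `litellm.exceptions.Timeout`.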
`tests/test_litellm/llms/azure_ai/claude/test_main_azure_anthropic_timeout.py`: 42 changes (42 additions, 0 deletions)
New file (`@@ -0,0 +1,42 @@`):

```python
"""
Ensure litellm.completion() forwards timeout to Azure Anthropic handler (main.py dispatch).
"""

import os
import sys
from unittest.mock import MagicMock, patch

sys.path.insert(
    0, os.path.abspath(os.path.join(os.path.dirname(__file__), "../../../../.."))
)

from litellm import completion
from litellm.types.utils import ModelResponse


def test_main_azure_ai_claude_completion_passes_timeout_to_azure_anthropic_handler():
    captured: dict = {}

    def fake_azure_anthropic_completion(**kwargs):
        captured.update(kwargs)
        return ModelResponse()

    with patch(
        "litellm.main.azure_anthropic_chat_completions"
    ) as mock_azure_anthropic:
        mock_azure_anthropic.completion = MagicMock(
            side_effect=fake_azure_anthropic_completion
        )

        completion(
            model="azure_ai/claude-sonnet-4-5",
            messages=[{"role": "user", "content": "hi"}],
            api_base="https://example.services.ai.azure.com/anthropic",
            api_key="test-key",
            timeout=42.5,
        )

        mock_azure_anthropic.completion.assert_called_once()
        assert captured["timeout"] == 42.5
        assert captured["model"] == "claude-sonnet-4-5"
        assert captured["custom_llm_provider"] == "azure_ai"
```
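The test relies on a patch-and-capture pattern: replace the handler object that `completion()` looks up in `litellm.main`, record the kwargs it receives, and assert on them. A minimal, self-contained sketch of the same pattern follows; `toy_completion` and `provider_handler` are hypothetical stand-ins for `litellm.completion` and `litellm.main.azure_anthropic_chat_completions`, not litellm code.

```python
from types import SimpleNamespace
from typing import Optional
from unittest.mock import patch

# Hypothetical provider module object; plays the role of
# litellm.main.azure_anthropic_chat_completions in the real test.
provider_handler = SimpleNamespace(completion=lambda **kwargs: {"ok": True})


def toy_completion(model: str, messages: list, timeout: Optional[float] = None) -> dict:
    """Hypothetical dispatcher: split 'provider/model' and forward to the handler."""
    provider, _, model_name = model.partition("/")
    return provider_handler.completion(
        model=model_name,
        messages=messages,
        custom_llm_provider=provider,
        timeout=timeout,  # the argument the regression test guards
    )


def capture_forwarded_kwargs() -> dict:
    captured: dict = {}

    def fake_completion(**kwargs):
        captured.update(kwargs)  # record exactly what the dispatcher sent
        return {"ok": True}

    # patch.object swaps the attribute for the duration of the block, so the
    # dispatcher's call-time lookup of provider_handler.completion hits the mock.
    with patch.object(
        provider_handler, "completion", side_effect=fake_completion
    ) as mock_completion:
        toy_completion(
            model="azure_ai/claude-sonnet-4-5",
            messages=[{"role": "user", "content": "hi"}],
            timeout=42.5,
        )
        mock_completion.assert_called_once()
    return captured


print(capture_forwarded_kwargs()["timeout"])  # 42.5
```

Patching the name where it is looked up (here the attribute on `provider_handler`, in the real test the module global in `litellm.main`) is what makes the dispatcher call the mock instead of the real handler.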
`request_timeout` default change widens scope beyond `completion()`

Changing the default from `6000` to `600` fixes the `completion()` regression flagged in the previous thread, but `litellm.request_timeout` is also the default for `Router.__init__` (`self.timeout = timeout or litellm.request_timeout`, `router.py:530`), `speech()` (`main.py:6760`), and the Anthropic / Azure-Anthropic / OpenAI count-token handlers. All of these would silently drop from a 6000 s ceiling to 600 s for users who have not set an explicit timeout. Long-running router calls or TTS jobs that take between 600 s and 6000 s will now time out.

Per the "avoid backwards-incompatible changes without user-controlled flags" rule, consider keeping the constant at `6000` (or introducing a separate `COMPLETION_REQUEST_TIMEOUT` constant) and only using the explicit 600 s fallback inside `_resolve_completion_timeout()` itself, where you control the scope.

Rule Used: What: avoid backwards-incompatible changes without... (source)
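The reviewer's suggestion, leaving the global default at 6000 and applying the 600 s fallback only on the completion path, can be sketched as below. All names are hypothetical stand-ins: `request_timeout` for `litellm.request_timeout`, `completion_timeout` for the completion-side resolution, and `router_timeout` for the `timeout or litellm.request_timeout` default quoted from `Router.__init__`. The point is only that both paths read the same global but apply different defaults; `CompletionTimeout._fallback_when_no_explicit_timeout` in the diff compares against `DEFAULT_REQUEST_TIMEOUT_SECONDS` in a similar way.

```python
from typing import Optional

# Hypothetical constants mirroring the review's suggestion.
GLOBAL_REQUEST_TIMEOUT_DEFAULT = 6000.0  # package-wide default left unchanged
COMPLETION_REQUEST_TIMEOUT = 600.0       # applied only inside the completion path

# Hypothetical stand-in for litellm.request_timeout.
request_timeout: float = GLOBAL_REQUEST_TIMEOUT_DEFAULT


def completion_timeout(explicit: Optional[float] = None) -> float:
    """Completion path: an untouched global default is narrowed to 600 s."""
    if explicit is not None:
        return float(explicit)
    if request_timeout == GLOBAL_REQUEST_TIMEOUT_DEFAULT:
        return COMPLETION_REQUEST_TIMEOUT
    return float(request_timeout)


def router_timeout(explicit: Optional[float] = None) -> float:
    """Router-style path: mirrors ``timeout or litellm.request_timeout``."""
    return float(explicit or request_timeout)


print(completion_timeout())    # 600.0
print(router_timeout())        # 6000.0
print(completion_timeout(30))  # 30.0
```

Because the global keeps its old value, router and TTS callers that never set a timeout still get the 6000 s ceiling, while `completion()` alone gets the tighter default.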