Fix(llm): retry transient failures and make concurrency configurable#29
Fix(llm): retry transient failures and make concurrency configurable#29jhamze7 wants to merge 8 commits into
Conversation
|
Verified on a self-hosted OpenAI-compatible backend (vLLM serving
LGTM for the #10 retry/backoff + configurable concurrency goal. |
rng1995
left a comment
There was a problem hiding this comment.
Thanks for tackling this — adding retry/backoff and making concurrency tunable is a real gap to close, and the retry core is implemented correctly (re-invoking the callable yields a fresh coroutine on each async attempt, and lambda prompt=prompt: avoids late-binding in the sync loop). The retry tests are good and target the right behavior (retry-then-succeed, and raise after exhausting attempts). I'm requesting changes for one concrete defect plus a couple of follow-ups.
Blocking — concurrency env parsing crashes on an empty/invalid value
max_concurrency = int(os.environ.get("SKILLSPECTOR_MAX_CONCURRENCY", "5"))os.environ.get(key, "5") only returns the "5" default when the key is absent. If the variable is present but empty — SKILLSPECTOR_MAX_CONCURRENCY= — it returns "", and int("") raises ValueError, which aborts the scan. The PR description says it defaults to 5 "if the variable is empty," but the code doesn't do that. This isn't a hypothetical: this repo's own .env.example leaves several vars empty by convention (SKILLSPECTOR_PROVIDER=, SKILLSPECTOR_MODEL=, …), so a user emptying this one to "use the default" gets a hard crash. A non-numeric value (=auto) crashes the same way, =0 yields asyncio.Semaphore(0) (hangs forever), and a negative value raises in the Semaphore constructor.
Suggested hardening — fall back on empty/invalid and clamp to a sane minimum:
if max_concurrency is None:
raw = (os.environ.get("SKILLSPECTOR_MAX_CONCURRENCY") or "").strip()
try:
max_concurrency = int(raw) if raw else 5
except ValueError:
logger.warning("Invalid SKILLSPECTOR_MAX_CONCURRENCY=%r; defaulting to 5", raw)
max_concurrency = 5
if max_concurrency < 1:
max_concurrency = 1Please also add a small test for this path (unset → 5, empty → 5, garbage → 5, 0/negative → clamped), since it's currently untested.
Non-blocking suggestions
- Retryable set is narrow. You match
429/ratelimit/timeout. Transient5xconditions are common and currently not retried: HTTP 500/502/503, Anthropic's 529 "overloaded", and connection errors (e.g.APIConnectionError). Consider widening the match (e.g. includeoverloaded,service unavailable,connection, and the relevant status codes) so the "universal across providers" goal actually covers the most common transient cases. Minor caveat: a bare substring check for"429"could in theory match an unrelated message; keying off status/exception type where available is more precise. - No jitter.
wait = 2 ** attemptwith up tomax_concurrencybatches all hitting a 429 at once means they back off in lockstep and re-collide. Adding a small random jitter would smooth that out. - Default concurrency changed 10 → 5. The previous hardcoded default was 10; this lowers it to 5. That's a reasonable, documented choice, just calling it out as a behavior change for existing users.
Happy to re-approve once the env parsing is robust (and ideally covered by a test). The rest is the right direction.
|
I went ahead and fixed the max concurrency logic by adding a more robust helper function in |
rng1995
left a comment
There was a problem hiding this comment.
[Automated SkillSpector Review]
Re-review: blocker resolved — approving.
The concurrency env crash is fully hardened: _resolve_max_concurrency strips, falls back to 5 on empty/whitespace/non-numeric (with a warning), and clamps < 1 to 1 — so SKILLSPECTOR_MAX_CONCURRENCY= no longer raises, =0 no longer hangs on Semaphore(0), and negatives no longer raise. The parametrized test_max_concurrency_env_parsing covers {}/""/" "/"auto"/"0"/"-3"/"8".
My non-blocking suggestions were also taken: the retryable set now includes 5xx/529/overloaded/service-unavailable/connection, and jitter was added to the backoff. Non-blocking residual: retryability is still substring-matched on the error text (a message containing e.g. "500" could retry spuriously) and the sync/async retry predicates are duplicated — both fine to leave. Approving.
This pull request addresses the issue of LLM calls immediately failing due to a lack of retry logic.
Changes:
llm_utils.py:
Created helper functions for the sync and async batch runners that retry LLM calls 4 times by default (this number is configurable with the
max_attemptsargument) with exponential backoff timing. These functions string match the error message to keep the implementation universal across different providers.llm_analyzer_base.py:
Wrapped the LLM calls in the helper functions created in
llm_utils.py. Also set max_concurrency to the SKILLSPECTOR_MAX_CONCURRENCY environment variable, with the value defaulting to 5 if the variable is empty..env.example:
Created a
SKILLSPECTOR_MAX_CONCURRENCYenvironment variable. This change was also noted indocs/DEVELOPMENT.mdtest_llm_analyzer_base.py:
Added sync and async tests verifying that LLM calls are retried when a simulated 429 error occurs. The tests also verify that processing succeeds after a transient failure and raises an exception once the retry limit is exhausted.
Closes #10