Skip to content

Fix(llm): retry transient failures and make concurrency configurable#29

Open
jhamze7 wants to merge 8 commits into
NVIDIA:mainfrom
jhamze7:fix-llm-retry-backoff
Open

Fix(llm): retry transient failures and make concurrency configurable#29
jhamze7 wants to merge 8 commits into
NVIDIA:mainfrom
jhamze7:fix-llm-retry-backoff

Conversation

@jhamze7

@jhamze7 jhamze7 commented Jun 12, 2026

Copy link
Copy Markdown

This pull request addresses the issue of LLM calls immediately failing due to a lack of retry logic.

Changes:

llm_utils.py:

Created helper functions for the sync and async batch runners that retry LLM calls 4 times by default (this number is configurable with the max_attempts argument) with exponential backoff timing. These functions string match the error message to keep the implementation universal across different providers.

llm_analyzer_base.py:

Wrapped the LLM calls in the helper functions created in llm_utils.py. Also set max_concurrency to the SKILLSPECTOR_MAX_CONCURRENCY environment variable, with the value defaulting to 5 if the variable is empty.

.env.example:

Created a SKILLSPECTOR_MAX_CONCURRENCY environment variable. This change was also noted in docs/DEVELOPMENT.md

test_llm_analyzer_base.py:

Added sync and async tests verifying that LLM calls are retried when a simulated 429 error occurs. The tests also verify that processing succeeds after a transient failure and raises an exception once the retry limit is exhausted.


Closes #10

@CharmingGroot

Copy link
Copy Markdown
Contributor

Verified on a self-hosted OpenAI-compatible backend (vLLM serving Qwen/Qwen3.6-35B-A3B-FP8).

  • Configurable concurrency works: scanning the repo's tests/fixtures/malicious_skill/ fixture with LLM analysis completes the full pass at SKILLSPECTOR_MAX_CONCURRENCY=5 (default) and at 3. Being able to tune concurrency down is what a single-instance backend needs.
  • Retry classifier behaves as written: exercising retry_llm_call_sync directly — 429 and timeout errors are retried (4 attempts with backoff), non-retryable errors raise immediately. Matches the PR's stated scope.

LGTM for the #10 retry/backoff + configurable concurrency goal.

@rng1995 rng1995 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this — adding retry/backoff and making concurrency tunable is a real gap to close, and the retry core is implemented correctly (re-invoking the callable yields a fresh coroutine on each async attempt, and lambda prompt=prompt: avoids late-binding in the sync loop). The retry tests are good and target the right behavior (retry-then-succeed, and raise after exhausting attempts). I'm requesting changes for one concrete defect plus a couple of follow-ups.

Blocking — concurrency env parsing crashes on an empty/invalid value

max_concurrency = int(os.environ.get("SKILLSPECTOR_MAX_CONCURRENCY", "5"))

os.environ.get(key, "5") only returns the "5" default when the key is absent. If the variable is present but empty — SKILLSPECTOR_MAX_CONCURRENCY= — it returns "", and int("") raises ValueError, which aborts the scan. The PR description says it defaults to 5 "if the variable is empty," but the code doesn't do that. This isn't a hypothetical: this repo's own .env.example leaves several vars empty by convention (SKILLSPECTOR_PROVIDER=, SKILLSPECTOR_MODEL=, …), so a user emptying this one to "use the default" gets a hard crash. A non-numeric value (=auto) crashes the same way, =0 yields asyncio.Semaphore(0) (hangs forever), and a negative value raises in the Semaphore constructor.

Suggested hardening — fall back on empty/invalid and clamp to a sane minimum:

if max_concurrency is None:
    raw = (os.environ.get("SKILLSPECTOR_MAX_CONCURRENCY") or "").strip()
    try:
        max_concurrency = int(raw) if raw else 5
    except ValueError:
        logger.warning("Invalid SKILLSPECTOR_MAX_CONCURRENCY=%r; defaulting to 5", raw)
        max_concurrency = 5
    if max_concurrency < 1:
        max_concurrency = 1

Please also add a small test for this path (unset → 5, empty → 5, garbage → 5, 0/negative → clamped), since it's currently untested.

Non-blocking suggestions

  • Retryable set is narrow. You match 429 / ratelimit / timeout. Transient 5x conditions are common and currently not retried: HTTP 500/502/503, Anthropic's 529 "overloaded", and connection errors (e.g. APIConnectionError). Consider widening the match (e.g. include overloaded, service unavailable, connection, and the relevant status codes) so the "universal across providers" goal actually covers the most common transient cases. Minor caveat: a bare substring check for "429" could in theory match an unrelated message; keying off status/exception type where available is more precise.
  • No jitter. wait = 2 ** attempt with up to max_concurrency batches all hitting a 429 at once means they back off in lockstep and re-collide. Adding a small random jitter would smooth that out.
  • Default concurrency changed 10 → 5. The previous hardcoded default was 10; this lowers it to 5. That's a reasonable, documented choice, just calling it out as a behavior change for existing users.

Happy to re-approve once the env parsing is robust (and ideally covered by a test). The rest is the right direction.

@jhamze7

jhamze7 commented Jun 23, 2026

Copy link
Copy Markdown
Author

I went ahead and fixed the max concurrency logic by adding a more robust helper function in llm_utils.py that handles the edge cases discussed, along with a test to cover it in test_llm_analyzer_base.py. I've also added jitter to stagger the retry backoff of LLM API requests. Please let me know if there’s anything else that should be addressed!

@rng1995 rng1995 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Automated SkillSpector Review]

Re-review: blocker resolved — approving.

The concurrency env crash is fully hardened: _resolve_max_concurrency strips, falls back to 5 on empty/whitespace/non-numeric (with a warning), and clamps < 1 to 1 — so SKILLSPECTOR_MAX_CONCURRENCY= no longer raises, =0 no longer hangs on Semaphore(0), and negatives no longer raise. The parametrized test_max_concurrency_env_parsing covers {}/""/" "/"auto"/"0"/"-3"/"8".

My non-blocking suggestions were also taken: the retryable set now includes 5xx/529/overloaded/service-unavailable/connection, and jitter was added to the backoff. Non-blocking residual: retryability is still substring-matched on the error text (a message containing e.g. "500" could retry spuriously) and the sync/async retry predicates are duplicated — both fine to leave. Approving.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Stage 2: LLM calls have no retry/backoff on 429 or timeout; default concurrency 10 is unconfigurable

3 participants