Fix(llm): retry transient failures and make concurrency configurable by jhamze7 · Pull Request #29 · NVIDIA/SkillSpector

jhamze7 · 2026-06-12T03:30:14Z

This pull request addresses the issue of LLM calls immediately failing due to a lack of retry logic.

Changes:

llm_utils.py:

Created helper functions for the sync and async batch runners that retry LLM calls 4 times by default (this number is configurable with the max_attempts argument) with exponential backoff timing. These functions string match the error message to keep the implementation universal across different providers.

llm_analyzer_base.py:

Wrapped the LLM calls in the helper functions created in llm_utils.py. Also set max_concurrency to the SKILLSPECTOR_MAX_CONCURRENCY environment variable, with the value defaulting to 5 if the variable is empty.

.env.example:

Created a SKILLSPECTOR_MAX_CONCURRENCY environment variable. This change was also noted in docs/DEVELOPMENT.md

test_llm_analyzer_base.py:

Added sync and async tests verifying that LLM calls are retried when a simulated 429 error occurs. The tests also verify that processing succeeds after a transient failure and raises an exception once the retry limit is exhausted.

Closes #10

…ment variable

CharmingGroot · 2026-06-12T04:54:17Z

Verified on a self-hosted OpenAI-compatible backend (vLLM serving Qwen/Qwen3.6-35B-A3B-FP8).

Configurable concurrency works: scanning the repo's tests/fixtures/malicious_skill/ fixture with LLM analysis completes the full pass at SKILLSPECTOR_MAX_CONCURRENCY=5 (default) and at 3. Being able to tune concurrency down is what a single-instance backend needs.
Retry classifier behaves as written: exercising retry_llm_call_sync directly — 429 and timeout errors are retried (4 attempts with backoff), non-retryable errors raise immediately. Matches the PR's stated scope.

LGTM for the #10 retry/backoff + configurable concurrency goal.

rng1995

Thanks for tackling this — adding retry/backoff and making concurrency tunable is a real gap to close, and the retry core is implemented correctly (re-invoking the callable yields a fresh coroutine on each async attempt, and lambda prompt=prompt: avoids late-binding in the sync loop). The retry tests are good and target the right behavior (retry-then-succeed, and raise after exhausting attempts). I'm requesting changes for one concrete defect plus a couple of follow-ups.

Blocking — concurrency env parsing crashes on an empty/invalid value

max_concurrency = int(os.environ.get("SKILLSPECTOR_MAX_CONCURRENCY", "5"))

os.environ.get(key, "5") only returns the "5" default when the key is absent. If the variable is present but empty — SKILLSPECTOR_MAX_CONCURRENCY= — it returns "", and int("") raises ValueError, which aborts the scan. The PR description says it defaults to 5 "if the variable is empty," but the code doesn't do that. This isn't a hypothetical: this repo's own .env.example leaves several vars empty by convention (SKILLSPECTOR_PROVIDER=, SKILLSPECTOR_MODEL=, …), so a user emptying this one to "use the default" gets a hard crash. A non-numeric value (=auto) crashes the same way, =0 yields asyncio.Semaphore(0) (hangs forever), and a negative value raises in the Semaphore constructor.

Suggested hardening — fall back on empty/invalid and clamp to a sane minimum:

if max_concurrency is None:
    raw = (os.environ.get("SKILLSPECTOR_MAX_CONCURRENCY") or "").strip()
    try:
        max_concurrency = int(raw) if raw else 5
    except ValueError:
        logger.warning("Invalid SKILLSPECTOR_MAX_CONCURRENCY=%r; defaulting to 5", raw)
        max_concurrency = 5
    if max_concurrency < 1:
        max_concurrency = 1

Please also add a small test for this path (unset → 5, empty → 5, garbage → 5, 0/negative → clamped), since it's currently untested.

Non-blocking suggestions

Retryable set is narrow. You match 429 / ratelimit / timeout. Transient 5x conditions are common and currently not retried: HTTP 500/502/503, Anthropic's 529 "overloaded", and connection errors (e.g. APIConnectionError). Consider widening the match (e.g. include overloaded, service unavailable, connection, and the relevant status codes) so the "universal across providers" goal actually covers the most common transient cases. Minor caveat: a bare substring check for "429" could in theory match an unrelated message; keying off status/exception type where available is more precise.
No jitter. wait = 2 ** attempt with up to max_concurrency batches all hitting a 429 at once means they back off in lockstep and re-collide. Adding a small random jitter would smooth that out.
Default concurrency changed 10 → 5. The previous hardcoded default was 10; this lowers it to 5. That's a reasonable, documented choice, just calling it out as a behavior change for existing users.

Happy to re-approve once the env parsing is robust (and ideally covered by a test). The rest is the right direction.

jhamze7 · 2026-06-23T17:47:19Z

I went ahead and fixed the max concurrency logic by adding a more robust helper function in llm_utils.py that handles the edge cases discussed, along with a test to cover it in test_llm_analyzer_base.py. I've also added jitter to stagger the retry backoff of LLM API requests. Please let me know if there’s anything else that should be addressed!

rng1995

[Automated SkillSpector Review]

Re-review: blocker resolved — approving.

The concurrency env crash is fully hardened: _resolve_max_concurrency strips, falls back to 5 on empty/whitespace/non-numeric (with a warning), and clamps < 1 to 1 — so SKILLSPECTOR_MAX_CONCURRENCY= no longer raises, =0 no longer hangs on Semaphore(0), and negatives no longer raise. The parametrized test_max_concurrency_env_parsing covers {}/""/" "/"auto"/"0"/"-3"/"8".

My non-blocking suggestions were also taken: the retryable set now includes 5xx/529/overloaded/service-unavailable/connection, and jitter was added to the backoff. Non-blocking residual: retryability is still substring-matched on the error text (a message containing e.g. "500" could retry spuriously) and the sync/async retry predicates are duplicated — both fine to leave. Approving.

jhamze7 added 4 commits June 11, 2026 22:41

Created retry tests for sync and async calls

4336c9b

Updated documentation to include SKILLSPECTOR_MAX_CONCURRENCY environ…

11492ad

…ment variable

fix(llm): retry transient failures and make concurrency configurable

97bbfb1

Increased default max retries to 4

619d522

rng1995 requested changes Jun 21, 2026

View reviewed changes

rng1995 mentioned this pull request Jun 21, 2026

Bound build_context file reads and fail closed on oversized inputs #19

Open

mimran-khan mentioned this pull request Jun 22, 2026

[Bug] LLM API failures silently produce zero findings with no retry or user notification #149

Closed

jhamze7 added 4 commits June 23, 2026 12:53

Expanded list of retry cases

f7b4951

Implemented suggested hardening for max_concurrency

8a83a37

Moved max_concurrency logic to a helper function

bb824c7

Added coverage for max concurrency edge cases

4500f3c

rng1995 approved these changes Jun 24, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix(llm): retry transient failures and make concurrency configurable#29

Fix(llm): retry transient failures and make concurrency configurable#29
jhamze7 wants to merge 8 commits into
NVIDIA:mainfrom
jhamze7:fix-llm-retry-backoff

jhamze7 commented Jun 12, 2026

Uh oh!

CharmingGroot commented Jun 12, 2026

Uh oh!

rng1995 left a comment

Uh oh!

jhamze7 commented Jun 23, 2026

Uh oh!

rng1995 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

jhamze7 commented Jun 12, 2026

Changes:

llm_utils.py:

llm_analyzer_base.py:

.env.example:

test_llm_analyzer_base.py:

Uh oh!

CharmingGroot commented Jun 12, 2026

Uh oh!

rng1995 left a comment

Choose a reason for hiding this comment

Uh oh!

jhamze7 commented Jun 23, 2026

Uh oh!

rng1995 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants