[Bug] LLM API failures silently produce zero findings with no retry or user notification

## Summary

Imagine going to the doctor for a full-body scan. The MRI machine malfunctions halfway through, but instead of telling you "we couldn't complete the scan," the radiologist hands you a clean report that says "No abnormalities found." You leave thinking you're healthy, but half your body was never actually examined.

When SkillSpector's LLM analyzers encounter an API failure (rate limit, timeout, network error, malformed response), they silently return `{"findings": []}` — zero findings. There is no retry logic, no backoff, no user-visible warning, and no indication in the report that the semantic analysis was incomplete. The user receives a lower risk score and assumes the skill is safe, when in reality the most sophisticated analysis layer simply never ran.

## Relation to Existing Issues

This issue is related to a cluster of reports about LLM failure handling:

- **#10** by @wernerkasselman-au: Documents the lack of retry/backoff (implementation-level root cause). PR #29 by @jhamze7 proposes retry + configurable concurrency — **still open, not merged**.
- **#9** by @wernerkasselman-au: Documents that one failed batch aborts all batches. PR #32 by @nyxst4ck proposes per-batch isolation — **still open, not merged**.
- **#11** by @wernerkasselman-au: Documents that `apply_filter` drops findings the LLM never analyzed.

**What this issue adds that the above do not cover:**

1. **The user-facing report is indistinguishable from a clean scan.** Even if retry and batch isolation are implemented (#29, #32), the user still has no way to know the LLM layer ran at all. There is no `"analysis_complete": false` flag, no `"llm_status": "partial"` field, no warning in terminal output.
2. **No auditability for compliance workflows.** Organizations using SkillSpector as a security gate need to prove that all analysis layers completed. The current output provides no evidence of this.
3. **Score deflation as a security gap.** The aggregate effect: missing LLM findings → lower risk score → skills pass security gates they should fail. This is distinct from the retry/isolation mechanism — it's about what the *user sees* when failures occur.

The existing issues focus on *why* the LLM fails (no retry, batch abort, concurrency). This issue focuses on *what the user experiences* when it fails (silence, indistinguishable output, no recourse).

## Why This Matters — Real-World Scenario

**Scenario: Security gate with LLM analysis as the critical detection layer**

An organization deploys SkillSpector with LLM analysis enabled because static patterns alone miss sophisticated attacks (prompt injection, indirect tool manipulation, semantic data exfiltration). Their threat model specifically relies on LLM-powered semantic analysis to catch what regex cannot.

During a peak usage period, their LLM provider (OpenAI, Anthropic, etc.) returns HTTP 429 (rate limit exceeded) for several minutes. During this window, 12 skills pass through the scanner. Each one:
- Gets static analysis results (perhaps 2-3 low findings)
- Gets LLM analysis results: `[]` (silently failed)
- Scores 15-20 (LOW risk)
- Auto-approved and published

Three of those 12 skills contained sophisticated prompt injection payloads that *only* the LLM semantic analyzer would have caught. They're now live in production.

The security team discovers this a week later during a routine re-scan (when the API is working again). The skills now score 75+ (HIGH risk). Three malicious skills operated in production for a week because the scanner gave them a false clean bill of health instead of saying "I couldn't complete the analysis."

## Reproduction

```bash
# Simulate LLM failure by setting an invalid API key
export OPENAI_API_KEY="sk-invalid-key-12345"

skillspector scan ./any-skill/ --format json -o report.json

python -c "
import json
data = json.load(open('report.json'))
issues = data['issues']
llm_findings = [i for i in issues if 'SQP' in i.get('rule_id', '') or 'semantic' in i.get('rule_id', '').lower()]
print(f'Total findings: {len(issues)}')
print(f'LLM findings: {len(llm_findings)}')  # 0 — silently failed
print(f'Score: {data[\"risk_assessment\"][\"score\"]}')
# No warning, no error message, no 'incomplete scan' flag in the output
"
```

The report looks identical to a scan where LLM analysis ran successfully and found nothing. There is no way to distinguish "LLM found no issues" from "LLM never ran."

## Root Cause

In `src/skillspector/llm_analyzer_base.py`, the `run_batches()` method (lines 366-393):

```python
async def run_batches(self, file_contents: dict[str, str]) -> list[dict]:
    results = []
    for batch in self._create_batches(file_contents):
        try:
            response = await self._invoke_llm(batch)
            findings = self._parse_response(response)
            results.extend(findings)
        except Exception:
            # Silent failure — no retry, no logging, no user notification
            continue
    return results
```

There is no:
- **Retry with backoff**: A transient 429 or 503 could succeed on the second attempt
- **Error propagation**: The caller never knows the LLM failed
- **Report annotation**: The output contains no field indicating incomplete analysis
- **Partial failure tracking**: If 3 of 5 batches fail, the user sees reduced findings but no warning

The same pattern appears in individual analyzers like `semantic_security_discovery.py` (lines 98-102):

```python
except Exception:
    return {"findings": []}  # Identical to "no findings" — indistinguishable
```

## Impact

- **False sense of security**: Users cannot distinguish "LLM found nothing" from "LLM never ran"
- **Silent degradation**: Transient API issues produce permanently incomplete scan results with no way to detect them after the fact
- **No auditability**: Compliance workflows require knowing whether all analysis layers completed — this information is lost
- **Risk score deflation**: Missing LLM findings lower the score, making dangerous skills appear safer
- **No retry opportunity**: Users cannot request "re-run failed LLM batches" because they don't know which ones failed

## Affected Version

SkillSpector v2.2.3

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Bug] LLM API failures silently produce zero findings with no retry or user notification #149

Summary

Relation to Existing Issues

Why This Matters — Real-World Scenario

Reproduction

Root Cause

Impact

Affected Version

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

[Bug] LLM API failures silently produce zero findings with no retry or user notification #149

Description

Summary

Relation to Existing Issues

Why This Matters — Real-World Scenario

Reproduction

Root Cause

Impact

Affected Version

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions