fix(analyzer): parse structured output from text so OpenAI-compatible endpoints don't crash LLM analysis#71
Conversation
…e endpoints The LLM analyzers built their structured-output client with `ChatOpenAI.with_structured_output(schema)`, which defaults to `method="json_schema"` (strict `response_format`). Endpoints that speak the OpenAI API but do not enforce that contract are free to ignore it, and several return the JSON wrapped in a markdown code fence or surrounded by prose. The strict parser then receives non-JSON and raises, so every semantic analyzer fails and the whole scan exits with a JSON decode error. Static analysis still runs, but all LLM findings are lost. Swap the strict client for the canonical langchain chain `llm | PydanticOutputParser`. The parser uses langchain's fence-tolerant `parse_json_markdown` under the hood and validates into the same Pydantic schema, so `parse_response` is unchanged for both the per-file analyzers and the meta-analyzer. Strict endpoints keep working; lenient ones now work too. Inject the parser's `get_format_instructions()` through a single `_with_format` helper at the run-loop chokepoint so the default prompt and the meta-analyzer's overridden prompt both get JSON guidance without duplicating it. Add regression tests that drive fenced and prose-wrapped JSON through the real parser, and update the three tests that asserted the removed `with_structured_output` seam to the direct-mock pattern the other tests use. Closes NVIDIA#69 Signed-off-by: Minh-Duc Bui <minhduc2k31905@gmail.com>
rng1995
left a comment
There was a problem hiding this comment.
LGTM. This is a well-diagnosed fix. Swapping the strict with_structured_output(method="json_schema") client for self._llm | PydanticOutputParser(...) plus appended get_format_instructions() is the right call: langchain's parse_json_markdown tolerates fenced / prose-wrapped JSON, so OpenAI-compatible endpoints that don't honor strict response_format no longer crash the whole semantic stage, while the output still validates into the same Pydantic schema. Centralizing the instruction-append in _with_format so subclasses (the meta-analyzer) inherit it is clean.
Tests are solid — _FakeChatModel | parser exercises the real parser end-to-end across bare JSON, fenced JSON, and prose-wrapped JSON, plus the empty-findings and format-instructions cases, and the meta-analyzer tests were updated consistently.
Minor / optional (non-blocking):
- A genuinely malformed (non-JSON) response will now raise
OutputParserExceptionfrom the chain. Since the stated goal is to stop endpoint quirks from crashing the scan, it's worth confirming there's a try/except around theinvoke/ainvokeso a single bad batch degrades gracefully (skip/log) rather than aborting the run. - Heads-up: the branch is currently conflicted (not mergeable) and will need a rebase.
|
@bmd1905 - Please resolve the conflicts, minor issues and merge the PR. |
What this fixes
Closes #69.
When you point
SKILLSPECTOR_PROVIDER=openaiat an OpenAI-compatible endpoint throughOPENAI_BASE_URL, the semantic stage can crash the whole scan with a JSON decode error. Static analysis still runs, but every LLM finding is silently lost.The cause is in how the analyzers build their structured-output client:
ChatOpenAI.with_structured_outputdefaults tomethod="json_schema", i.e. strictresponse_format. An endpoint can implement the OpenAI API faithfully and still not honor that, and plenty don't. The ones I hit return valid JSON, just wrapped in a markdown fence:PydanticOutputParseralready uses langchain's fence-tolerantparse_json_markdown, and it validates into the same Pydantic schema, soparse_responsedoes not change for either the per-file analyzers or the meta-analyzer. Strict endpoints keep working exactly as before; lenient ones now work too.I also moved the format guidance into one place. A small
_with_formathelper appendsget_format_instructions()at the run-loop chokepoint, so the default prompt and the meta-analyzer's own prompt both get it without each template having to embed JSON instructions by hand. It is a no-op in raw-string mode.I deliberately did not hand-roll a fence stripper or a brace slicer. langchain's parser already covers the fenced, bare, and prose-wrapped cases, and reusing it keeps the surface small.
Tests
PydanticOutputParservia a tiny fakeRunnable, plus an empty-findings case and a check that the format instructions land on the prompt.with_structured_outputseam. They use the same direct_structured_llmmock pattern that the other run-loop tests already rely on.make test: 605 passed, 11 skipped.make lintandmake formatare clean.How I verified it end to end
Against a non-strict OpenAI-compatible endpoint,
mainaborts in the semantic stage with the JSON error above. With this change the same scan completes and returns LLM-backed findings. Repeated runs show no more decode crashes; the only non-zero exits are the intendedrisk_score > 50gate (exit 1), not parser errors (exit 2).