Skip to content

fix(analyzer): parse structured output from text so OpenAI-compatible endpoints don't crash LLM analysis#71

Open
bmd1905 wants to merge 1 commit into
NVIDIA:mainfrom
bmd1905:fix/openai-compatible-structured-output
Open

fix(analyzer): parse structured output from text so OpenAI-compatible endpoints don't crash LLM analysis#71
bmd1905 wants to merge 1 commit into
NVIDIA:mainfrom
bmd1905:fix/openai-compatible-structured-output

Conversation

@bmd1905

@bmd1905 bmd1905 commented Jun 15, 2026

Copy link
Copy Markdown

What this fixes

Closes #69.

When you point SKILLSPECTOR_PROVIDER=openai at an OpenAI-compatible endpoint through OPENAI_BASE_URL, the semantic stage can crash the whole scan with a JSON decode error. Static analysis still runs, but every LLM finding is silently lost.

The cause is in how the analyzers build their structured-output client:

self._structured_llm = self._llm.with_structured_output(self.response_schema)

ChatOpenAI.with_structured_output defaults to method="json_schema", i.e. strict response_format. An endpoint can implement the OpenAI API faithfully and still not honor that, and plenty don't. The ones I hit return valid JSON, just wrapped in a markdown fence:

```json
{"findings": []}

or with a sentence of prose on either side. The strict path hands that straight to `json.loads`, which raises, and the scan exits with `Error: Expecting value: line 1 column 1 (char 0)` or `Invalid JSON ... for LLMAnalysisResult`. Every analyzer that inherits `LLMAnalyzerBase` is affected, including the meta-analyzer.

## The change

I dropped the strict client in favor of the canonical langchain chain:

```python
self._parser = PydanticOutputParser(pydantic_object=self.response_schema)
self._structured_llm = self._llm | self._parser
self._format_instructions = self._parser.get_format_instructions()

PydanticOutputParser already uses langchain's fence-tolerant parse_json_markdown, and it validates into the same Pydantic schema, so parse_response does not change for either the per-file analyzers or the meta-analyzer. Strict endpoints keep working exactly as before; lenient ones now work too.

I also moved the format guidance into one place. A small _with_format helper appends get_format_instructions() at the run-loop chokepoint, so the default prompt and the meta-analyzer's own prompt both get it without each template having to embed JSON instructions by hand. It is a no-op in raw-string mode.

I deliberately did not hand-roll a fence stripper or a brace slicer. langchain's parser already covers the fenced, bare, and prose-wrapped cases, and reusing it keeps the surface small.

Tests

  • Added a regression suite that pushes bare, fenced, and prose-wrapped JSON through the real PydanticOutputParser via a tiny fake Runnable, plus an empty-findings case and a check that the format instructions land on the prompt.
  • Updated the three existing tests that asserted the now-removed with_structured_output seam. They use the same direct _structured_llm mock pattern that the other run-loop tests already rely on.
  • make test: 605 passed, 11 skipped. make lint and make format are clean.

How I verified it end to end

Against a non-strict OpenAI-compatible endpoint, main aborts in the semantic stage with the JSON error above. With this change the same scan completes and returns LLM-backed findings. Repeated runs show no more decode crashes; the only non-zero exits are the intended risk_score > 50 gate (exit 1), not parser errors (exit 2).

…e endpoints

The LLM analyzers built their structured-output client with
`ChatOpenAI.with_structured_output(schema)`, which defaults to
`method="json_schema"` (strict `response_format`). Endpoints that speak the
OpenAI API but do not enforce that contract are free to ignore it, and several
return the JSON wrapped in a markdown code fence or surrounded by prose. The
strict parser then receives non-JSON and raises, so every semantic analyzer
fails and the whole scan exits with a JSON decode error. Static analysis still
runs, but all LLM findings are lost.

Swap the strict client for the canonical langchain chain `llm | PydanticOutputParser`.
The parser uses langchain's fence-tolerant `parse_json_markdown` under the hood
and validates into the same Pydantic schema, so `parse_response` is unchanged
for both the per-file analyzers and the meta-analyzer. Strict endpoints keep
working; lenient ones now work too.

Inject the parser's `get_format_instructions()` through a single `_with_format`
helper at the run-loop chokepoint so the default prompt and the meta-analyzer's
overridden prompt both get JSON guidance without duplicating it.

Add regression tests that drive fenced and prose-wrapped JSON through the real
parser, and update the three tests that asserted the removed
`with_structured_output` seam to the direct-mock pattern the other tests use.

Closes NVIDIA#69

Signed-off-by: Minh-Duc Bui <minhduc2k31905@gmail.com>

@rng1995 rng1995 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. This is a well-diagnosed fix. Swapping the strict with_structured_output(method="json_schema") client for self._llm | PydanticOutputParser(...) plus appended get_format_instructions() is the right call: langchain's parse_json_markdown tolerates fenced / prose-wrapped JSON, so OpenAI-compatible endpoints that don't honor strict response_format no longer crash the whole semantic stage, while the output still validates into the same Pydantic schema. Centralizing the instruction-append in _with_format so subclasses (the meta-analyzer) inherit it is clean.

Tests are solid — _FakeChatModel | parser exercises the real parser end-to-end across bare JSON, fenced JSON, and prose-wrapped JSON, plus the empty-findings and format-instructions cases, and the meta-analyzer tests were updated consistently.

Minor / optional (non-blocking):

  • A genuinely malformed (non-JSON) response will now raise OutputParserException from the chain. Since the stated goal is to stop endpoint quirks from crashing the scan, it's worth confirming there's a try/except around the invoke/ainvoke so a single bad batch degrades gracefully (skip/log) rather than aborting the run.
  • Heads-up: the branch is currently conflicted (not mergeable) and will need a rebase.

@rng1995

rng1995 commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator

@bmd1905 - Please resolve the conflicts, minor issues and merge the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OpenAI-compatible endpoints that don't strictly honor response_format crash all LLM analyzers (fenced/prose JSON)

2 participants