feat: drop ge/le schema bounds on LLM finding confidence and start_line#133
Conversation
Pydantic ge/le constraints emit JSON-schema minimum/maximum. Some OpenAI-compatible structured-output and tool-calling endpoints reject a response schema that carries those keywords, so the LLM call fails before any finding is returned. Express the ranges in the field descriptions and enforce them with runtime field_validators that clamp into range instead. Clamping rather than raising means a slightly out-of-range model value normalises rather than failing the whole structured-output parse and dropping the batch: - LLMFinding.confidence and MetaAnalyzerFinding.confidence clamp to [0.0, 1.0] - LLMFinding.start_line clamps to >= 1 Only the numeric bound is removed, not the requiredness: start_line stays a required field (no default), so a finding with no location is still rejected rather than materialised at line 1. The emitted JSON schema no longer contains minimum/maximum, so it stays portable across structured-output backends, while the in-code guarantee is unchanged. Adds tests asserting the schema carries no numeric bounds, that start_line stays required, and that the validators clamp. Signed-off-by: Werner Kasselman <145896621+wernerkasselman-au@users.noreply.github.com>
rng1995
left a comment
There was a problem hiding this comment.
Review: APPROVE
Thanks for splitting this out of #19 with its own rationale. The change is focused, the intent is well-documented in both the code comments and the description, and the tests lock in the guarantees that matter.
Verification against the stated contract
- (a)
ge/leremoved — Confirmed onLLMFinding.start_line,LLMFinding.confidence, andMetaAnalyzerFinding.confidence; each is now a bareField(description=...)with the range conveyed in the description. - (b) Validators genuinely clamp —
_clamp_confidencereturnsmin(1.0, max(0.0, v))([0.0, 1.0]) and_clamp_start_linereturnsv if v >= 1 else 1(>= 1). Both are@field_validatorin the default ("after") mode, so they run on the already-coerced numeric value, which is the right choice here — a "before" validator would see raw/uncoerced input. Non-numeric input still fails type coercion before reaching the clamp, which is the desired behavior. - (c)
start_linestays required —LLMFinding.start_linehas no default, so a finding with no location is still rejected rather than silently materialized at line 1.test_llm_finding_start_line_is_requiredpins this down. Good, explicit separation between "drop the numeric bound" and "drop requiredness". - (d) Emitted schema is clean — The recursive
_numeric_keywordswalk asserts neither model'smodel_json_schema()containsminimum/maximum, which is exactly the portability goal.
Design trade-off (clamp vs. raise)
Reasonable for this contract. A single value that lands a hair outside the range shouldn't fail the whole structured-output parse and drop the rest of the batch. Confidence is advisory, and snapping a whole-file finding's start_line 0 -> 1 keeps the finding instead of losing it over an off-by-one.
Minor / optional (non-blocking): the clamp is silent, so a genuinely misbehaving model (e.g. confidence=5.0 -> 1.0, or start_line=-50 -> 1) is normalized with no trace. For a security tool, emitting a logger.debug/warning when a value is actually out of range (vs. within range) would preserve that signal without changing the resilience behavior. Not required to merge.
Tests
Good coverage: schema-has-no-bounds for both models, clamp high/low for both, start_line clamp, and the required-field guarantee. Converting the two prior "raises on confidence 1.5" tests into clamp assertions is a deliberate, documented contract change, and the updates are consistent with it.
Code quality / compatibility
Clear and minimal. It reuses the field_validator import already present in the meta-analyzer module (used by the existing assessment parser), the comments explain the "why" rather than the "what", and there's no dead code. The only behavior change for callers is the raise -> clamp contract on out-of-range numeric input, which is intentional and covered by the updated tests.
LGTM.
Closes #132. Split out of #19 at @rng1995's request, since it changes the LLM structured-output contract and deserves its own rationale.
Problem
LLMFindingandMetaAnalyzerFindingexpress their numeric ranges with Pydanticge/le(start_line >= 1, confidence in [0.0, 1.0]). Pydantic emits those as JSON-schemaminimum/maximumin the response schema. Some OpenAI-compatible structured-output and tool-calling endpoints validate that schema and rejectminimum/maximum, so the LLM call fails before a single finding comes back.Change
ge/leField bounds onLLMFinding.start_line,LLMFinding.confidence, andMetaAnalyzerFinding.confidence, and convey the ranges in the field descriptions.field_validators that clamp into range: confidence to [0.0, 1.0], start_line to >= 1.Clamping rather than raising is deliberate: an LLM occasionally returns a value just outside the range (confidence 1.01, or start_line 0 for a whole-file finding), and normalising it is better than a ValidationError that fails the whole structured-output parse and drops the batch.
Result
The emitted JSON schema no longer carries
minimum/maximum, so it stays portable across structured-output backends, and the in-code [0, 1] / >= 1 guarantee is unchanged. Tests assert both: that neither model'smodel_json_schema()containsminimumormaximum, and that the validators clamp.Notes
625 passed, 12 skipped, ruff clean. The two existing tests that asserted a ValueError on out-of-range confidence now assert the clamp.meta_analyzeralready importedfield_validator(for the overall_assessment parser), so this reuses it.