feat: add structured category and severity fields to review findings #29
mvanhorn wants to merge 1 commit
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Please merge this; it is an important feature for end users.
Thanks for the great work, @mvanhorn! 🙏 This is a really solid contribution, adding structured category and severity fields. One thing I'd like to evaluate before merging: this change modifies the review prompts. I want to carefully assess whether the additional prompt instructions for category/severity classification could affect the quality or focus of the review output itself. I'll run some comparative reviews today and get back to you with my findings. Appreciate your patience; expect an update later today!
Thanks @mvanhorn for this well-implemented PR! The code quality is solid, and the structured category and severity fields will be valuable for downstream CI integrations. However, after conducting careful evaluations on our benchmark suite, I've observed that introducing these changes results in a noticeable degradation in the overall review quality of the tool. The additional prompt instructions for category/severity classification appear to be affecting the focus and accuracy of the review output itself. We're currently investigating the root cause of this regression in depth. Once we identify the underlying issue, we'll provide specific improvement suggestions, potentially around prompt engineering, model behavior, or field population strategies. Please keep this PR open: we believe this feature is important and want to work through the quality concerns rather than close it. We'll follow up soon with concrete next steps and any necessary adjustments. Appreciate your patience and contribution!
Summary
Adds two optional structured fields, `category` and `severity`, to every review finding. They flow through the model, tool-call parsing, JSON output, agent output, and the human-readable text renderer, and are populated by the review LLM via the `code_comment` tool schema and a short prompt-template instruction.

Allowed values match the issue's tables:

- `severity`: `critical`, `high`, `medium`, `low`, `info`
- `category`: `bug`, `security`, `performance`, `maintainability`, `test`, `style`, `documentation`, `other`

Why this matters
Per #16, the machine-readable output of `ocr review` exposes finding text, location, and suggestion, but no structured category/severity per finding. CI integrations (GitHub Actions, GitLab CI) currently have to re-parse natural-language comment text to sort, group, filter, or gate builds by importance. The maintainers asked the reporter to open this dedicated issue and laid out the enum tables plus acceptance criteria this PR implements:

- `category` and `severity` are reported per finding when the model provides them.
- Both fields use `omitempty`, and the tool schema does not mark them `required`, so the keys are omitted entirely when empty and older/less-capable models still emit valid tool calls.

The change is backward-compatible by construction (optional + `omitempty` + not `required`).
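For reviewers who want to see the `omitempty` behavior in isolation, here is a minimal sketch. The `Finding` type, its `Comment` field, and the `encode` helper are hypothetical stand-ins for illustration, not the types in this PR; only the `category`/`severity` keys and the `omitempty` option come from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Finding is a hypothetical stand-in for the tool's finding type; the real
// struct and its other fields may differ.
type Finding struct {
	Comment  string `json:"comment"`
	Category string `json:"category,omitempty"` // e.g. "bug", "security"
	Severity string `json:"severity,omitempty"` // e.g. "critical", "low"
}

// encode marshals a finding to its JSON text.
func encode(f Finding) string {
	b, err := json.Marshal(f)
	if err != nil {
		panic(err) // not expected for a struct of plain strings
	}
	return string(b)
}

func main() {
	// The model supplied both fields, so both keys appear in the output.
	fmt.Println(encode(Finding{Comment: "nil deref", Category: "bug", Severity: "high"}))
	// The model supplied neither, so the keys are dropped rather than
	// emitted as empty strings.
	fmt.Println(encode(Finding{Comment: "nil deref"}))
}
```

Under these assumptions, a consumer that never looks at the new keys sees the same JSON shape as before whenever the model leaves them empty.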
Out of scope by design (the issue frames these as follow-ups, design questions #3/#4): no `--severity` CLI filtering flags and no `confidence` field. This PR lands the data first; filtering/gating can be a separate change now that the fields exist.
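As a rough illustration of what a downstream gate could look like once the data is available, the sketch below fails a build when any finding meets a severity threshold. It is not part of this PR. The `shouldFail` helper, the numeric ranks, and the assumption that the JSON output is a top-level array of findings are all hypothetical; only the five severity values come from the description.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// severityRank orders the severities from least to most severe.
// The numeric ranks are an assumption made for this sketch.
var severityRank = map[string]int{
	"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4,
}

// finding mirrors only the key this sketch needs.
type finding struct {
	Severity string `json:"severity"`
}

// shouldFail reports whether any finding is at or above the threshold.
// Findings with a missing or unrecognized severity are ignored, since the
// field is optional.
func shouldFail(raw []byte, threshold string) (bool, error) {
	minRank, ok := severityRank[threshold]
	if !ok {
		return false, fmt.Errorf("unknown threshold %q", threshold)
	}
	var findings []finding
	if err := json.Unmarshal(raw, &findings); err != nil {
		return false, err
	}
	for _, f := range findings {
		if rank, known := severityRank[f.Severity]; known && rank >= minRank {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Assumed output shape: a JSON array of findings (hypothetical).
	raw := []byte(`[{"severity":"low"},{"severity":"critical"},{}]`)
	fail, err := shouldFail(raw, "high")
	fmt.Println(fail, err)
}
```

Treating a missing severity as "do not gate" keeps the check safe with older models that omit the field; a stricter pipeline might choose the opposite default.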
Testing
- `go build ./...`: success
- `go vet ./...`: clean
- `go test ./...`: all packages pass (198 tests)
- `internal/tool/code_comment_test.go`: `category`/`severity` are parsed when present and omitted when empty (no `"category":""`)

Fixes #16
AI was used for assistance.