feat: synthesize rollup findings failures and coverage by rianjs · Pull Request #384 · open-cli-collective/codereview-cli

rianjs · 2026-06-24T21:31:48Z

Summary

require specialist reviewer outputs to include inspected/skipped files and constraints
derive compact reviewer coverage summaries for rollup prompts, dry-run JSON, and final rollup markdown
coerce clean approval to comment when reviewer coverage is incomplete, while preserving isolated reviewer failure behavior

Verification

rtk go test ./internal/pipeline ./internal/reviewplan ./internal/llm ./internal/cmd/benchmarkcmd ./internal/cmd/reviewcmd -count=1
rtk make lint
rtk make build

Closes #374

rianjs · 2026-06-24T21:34:56Z

Findings

Major: Coverage can be marked complete even when the reviewer did not inspect its full assigned scope. buildReviewerCoverage sets complete_broad or complete_constrained whenever there are no SkippedFiles, without checking that InspectedFiles covers Scope (internal/pipeline/pipeline.go). A broad reviewer assigned main.go, other.go can return inspected_files: ["main.go"], skip nothing, and still allow clean approval. Compare scope - inspected - skipped and classify any remainder as incomplete; also reject or normalize inspected/skipped overlap.
Major: The reviewer prompt output contract uses input.Assignment.Files as the allowed file list, but checkout scope and coverage scope use AllowedFiles when present (internal/pipeline/pipeline.go, internal/pipeline/pipeline.go). This assumes allowed_files is always identical to or contained in files, which the selection contract does not enforce. A constrained reviewer can get a prompt schema that excludes the actual readable scope. Derive one effective reviewer scope and use it consistently for prompt contract, decode validation, checkout request, and coverage classification.
Minor: constraints is unbounded in count and length (internal/llm/contracts.go). Since constraints flow into the rollup prompt and public markdown, a model can bloat or destabilize the synthesis step despite the “compact coverage” contract. Add modest caps and tests.

rianjs · 2026-06-24T21:40:57Z

Major

effectiveReviewerScope now treats any reviewer without allowed_files as responsible for every changed file. In a normal multi-reviewer split, alpha may be selected for main.go and beta for other.go; if each reports only the file it actually inspected, both become incomplete_skipped, and a clean approval is coerced to comment even though aggregate reviewer coverage is complete. This conflates read capability with expected assignment coverage. The safer shape is to let broad reviewers read the whole checkout, but compute incomplete coverage from the aggregate union of inspected/skipped files, or use selected.Files as the expected assignment when present. Please add a regression for two broad reviewers splitting changed files without allowed_files.

rianjs · 2026-06-24T21:45:35Z

Major

DecodeFindings rejects any reviewer output with empty inspected_files, even when the reviewer reports all assigned files in skipped_files with constraints. That makes a valid incomplete-coverage report look like an LLM contract failure instead of preserving the intended coverage diagnostic. The product contract says skipped files and constraints are first-class reviewer output, so an all-skipped reviewer should decode successfully and roll up as incomplete coverage. Please allow inspected_files: [] when skipped_files covers the reviewer coverage scope, and add a regression for that path.

Minor

skipped_files is still worded against the broad readable changed-file set, while coverage completeness is now assignment-scoped. A broad reviewer assigned main.go can reasonably report other.go as skipped because it was readable but outside their assignment, and buildReviewerCoverage currently marks any skipped file incomplete. Consider either defining skipped files as assignment-scope only in the prompt, or only using skipped files intersecting reviewerCoverageScope to drive incomplete status.

rianjs · 2026-06-24T21:49:19Z

Minor

buildReviewerCoverage now correctly ignores skipped files outside reviewerCoverageScope for status, but it still stores the unfiltered result.SkippedFiles in the coverage entry. That means a reviewer can be complete_broad while the rollup prompt and final markdown still show skipped files, which conflicts with the assignment-scoped skipped-file contract. Please either filter entry.SkippedFiles to scopedSkipped before persisting/rendering coverage, or reject non-assignment skipped files at the reviewer boundary.

rianjs · 2026-06-24T21:51:41Z

No findings. The read-scope versus coverage-scope split now matches the product intent: all-skipped outputs decode, skipped coverage is assignment-scoped before rollup/rendering, and the added regressions cover the prior false-incomplete paths.

rianjs · 2026-06-24T21:54:14Z

The suite covers the new schema/summary shapes, but the main behavior is still only proven through helper-level tests.

Major: The safety path that prevents silent approval on incomplete coverage is only exercised with synthetic data in internal/reviewplan/summary_test.go and helper-level coverage classification in internal/pipeline/pipeline_test.go. There is no test that drives the real pipeline chain from reviewer output into ReviewerCoverage and then into rollup outcome, so a regression in runReviewers, buildRunSummary, or newReviewSummary could still pass these tests.
Minor: reviewcmd does not have a test that verifies the new ReviewerCoverage field survives into the rendered view model. internal/cmd/reviewcmd/reviewcmd_test.go still only checks the older summary fields, so a mapping omission in internal/cmd/reviewcmd/reviewcmd.go would go unnoticed.
Minor: The rollup prompt test only checks for the presence of reviewer_coverage and a few anti-leak strings in internal/pipeline/pipeline_test.go. It does not assert the absence of full diff hunks or changed-file bodies, so the “compact PR/dossier context” part of the contract is still under-tested.

monit-reviewer

Automated PR Review

Reviewed commit: 8395b81

Summary

Reviewer	Findings
harness-engineering:harness-enforcement-reviewer	1
harness-engineering:harness-knowledge-reviewer	2
harness-engineering:harness-self-documenting-code-reviewer	4

harness-engineering:harness-enforcement-reviewer (1 findings)

💡 Suggestion - internal/reviewplan/reviewplan.go:321

hasIncompleteReviewerCoverage relies on a strings.HasPrefix(entry.Status, "incomplete_") naming convention rather than a type-system invariant. A future contributor adding a new incomplete status (e.g. "partial_coverage") without knowing this convention would silently bypass the approval-coercion guard. Consider a defined string type with an IsIncomplete() method, or an exhaustive switch that returns an error on unrecognized status values.

harness-engineering:harness-knowledge-reviewer (2 findings)

⚠️ Should Fix - docs/checkout-native-review-contract.md:240

The PR introduces five stable coverage status values (complete_broad, complete_constrained, incomplete_skipped, incomplete_failed, incomplete_unassigned) that appear in dry-run JSON output and rollup markdown, but the contract doc does not define them or describe what each means. Consumers of dry-run JSON or rollup markdown will encounter these values without source access; they belong in the contract documentation.

💡 Suggestion - internal/pipeline/pipeline.go:2242

The non-obvious split between reviewerReviewableScope (read access over all changed files) and reviewerCoverageScope (assignment-scoped, uses selected.Files when allowed_files is absent) is what allows two broad reviewers splitting changed files to both be complete_broad. This design decision is not captured in any versioned document; future modifications to coverage logic will need this context to avoid silently regressing the multi-reviewer split case.

harness-engineering:harness-self-documenting-code-reviewer (4 findings)

⚠️ Should Fix - internal/pipeline/pipeline.go:2228

appendSortedStrings is a misleading name: 'append' implies accumulation or mutation of a target, but the function creates and returns a defensive sorted copy of its input. sortedCopy or copySorted would accurately communicate what callers receive without implying a side effect.

💡 Suggestion - internal/pipeline/pipeline.go:2183

reviewerReviewableScope and reviewerCoverageScope are easy to conflate; 'reviewable' vs 'coverage' does not convey the critical distinction (read access vs expected assignment coverage). A reader encountering both in buildReviewerPrompt and buildReviewerCoverage must read both bodies to understand why they diverge. reviewerReadableFiles / reviewerAssignmentScope, or an inline comment on the non-obvious invariant, would make the behavioral split explicit.

💡 Suggestion - internal/pipeline/pipeline.go:2207

In intersectSortedStrings, 'sorted' as a qualifier conventionally describes an input precondition, but here inputs can arrive in any order and it is the output that is sorted. intersectToSorted or sortedIntersect would more clearly signal the direction of the sort guarantee.

💡 Suggestion - internal/reviewplan/summary.go:278

coverageConstraintCell merges both entry.Constraints and entry.Diagnostic into its rendered output, but 'constraint' in the name implies only constraints are included. Diagnostics are a distinct concept (error or missing-file messages) from reviewer-declared scope constraints. coverageDetailsCell or coverageAnnotationCell would reflect that diagnostics are part of the output without requiring a reader to inspect the body.

6 PR discussion threads considered.

Completed in 3m 47s | $3.01 | sonnet | daemon 0.2.132 | Glorfindel

Field	Value
Model	sonnet
Reviewers	hybrid-synthesis, documentation:docs-reviewer, harness-engineering:harness-architecture-reviewer, harness-engineering:harness-enforcement-reviewer, harness-engineering:harness-knowledge-reviewer, harness-engineering:harness-self-documenting-code-reviewer, security:security-code-auditor
Engine	claude · sonnet
Reviewed by	pr-review-daemon · monit-pr-reviewer
Duration	3m 47s wall · 3m 44s compute (Reviewers: 2m 37s · Synthesis: 1m 06s)
Cost	$3.01 (estimated)
Tokens	519.1k in / 27.8k out
Turns	14

Per-workstream usage

Workstream	Model	In	Out	Cache read	Cache create	Cost
hybrid-synthesis	sonnet	40.3k	3.5k	13.1k	27.1k (1h)	$0.22
documentation:docs-reviewer	sonnet	34.3k	1.7k	13.1k	21.2k (1h)	$0.16
harness-engineering:harness-architecture-reviewer	sonnet	85.9k	1.5k	13.1k	72.7k (1h)	$0.46
harness-engineering:harness-enforcement-reviewer	sonnet	87.8k	3.1k	13.1k	74.7k (1h)	$0.50
harness-engineering:harness-knowledge-reviewer	sonnet	87.7k	3.1k	13.1k	74.5k (1h)	$0.50
harness-engineering:harness-self-documenting-code-reviewer	sonnet	94.2k	9.6k	13.1k	81.1k (1h)	$0.63
security:security-code-auditor	sonnet	88.9k	5.2k	13.1k	75.8k (1h)	$0.54

_{Re-reviews only run when @monit-reviewer is re-requested as a reviewer — push as many commits as you need, then re-request when ready. PRs targeting branches other than main, master are skipped, even when @monit-reviewer is re-requested.}

Superseded by updated review

monit-reviewer

Automated PR Review

Reviewed commit: 053a666 | Previous: 8395b81 (incremental)

Summary

No issues found.

13 PR discussion threads considered.

Completed in 1m 29s | $8.04 | sonnet | daemon 0.2.132 | Glorfindel

Field	Value
Model	sonnet
Mode	Re-review · Cycle 2 · Session resumed
Reviewers	hybrid-synthesis, documentation:docs-reviewer, harness-engineering:harness-architecture-reviewer, harness-engineering:harness-enforcement-reviewer, harness-engineering:harness-knowledge-reviewer, harness-engineering:harness-self-documenting-code-reviewer, security:security-code-auditor
Engine	claude · sonnet
Reviewed by	pr-review-daemon · monit-pr-reviewer
Duration	1m 29s wall · 59s compute (Reviewers: 38s · Synthesis: 21s)
Cost	$8.04 (estimated)
Tokens	1425.6k in / 34.5k out
Turns	28

Per-workstream usage

Workstream	Model	In	Out	Cache read	Cache create	Cost
hybrid-synthesis	sonnet	93.5k	4.6k	26.3k	67.2k (1h)	$0.48
documentation:docs-reviewer	sonnet	77.1k	2.6k	26.3k	50.8k (1h)	$0.35
harness-engineering:harness-architecture-reviewer	sonnet	233.0k	2.1k	26.3k	206.7k (1h)	$1.28
harness-engineering:harness-enforcement-reviewer	sonnet	239.0k	3.9k	26.3k	212.7k (1h)	$1.34
harness-engineering:harness-knowledge-reviewer	sonnet	238.7k	4.0k	26.3k	212.4k (1h)	$1.34
harness-engineering:harness-self-documenting-code-reviewer	sonnet	258.3k	11.0k	26.3k	232.0k (1h)	$1.56
security:security-code-auditor	sonnet	241.3k	5.4k	26.3k	215.0k (1h)	$1.38
discussion-summarizer	—	44.6k	1.1k	18.5k	26.1k (1h)	$0.30

_{Re-reviews only run when @monit-reviewer is re-requested as a reviewer — push as many commits as you need, then re-request when ready. PRs targeting branches other than main, master are skipped, even when @monit-reviewer is re-requested.}

rianjs · 2026-06-24T22:05:15Z

No findings. The daemon edits stayed aligned with the ticket architecture: they added coverage contract documentation, dry-run mapping coverage, conservative handling for unknown coverage statuses, and regression coverage for approval coercion without changing the read-scope/assignment-scope split.

Synthesize rollup coverage context

f591e32

Closes #374

Tighten reviewer coverage scope

58ce613

Separate reviewer read and coverage scopes

09d4568

Allow all-skipped reviewer coverage

d37f9d7

Filter skipped coverage to assignment scope

de72189

Cover incomplete coverage rollup flow

8395b81

monit-reviewer previously requested changes Jun 24, 2026

View reviewed changes

Clarify reviewer coverage semantics

053a666

monit-reviewer approved these changes Jun 24, 2026

View reviewed changes

rianjs changed the title ~~Synthesize rollup findings, failures, and coverage~~ feat: synthesize rollup findings failures and coverage Jun 24, 2026

chore: retrigger PR title check

cc95289

rianjs merged commit 3c28be7 into main Jun 24, 2026
10 checks passed

rianjs deleted the issue-374-rollup-findings-failures-coverage branch June 24, 2026 22:08

Uh oh!

Conversation

rianjs commented Jun 24, 2026

Summary

Verification

Uh oh!

rianjs commented Jun 24, 2026

Uh oh!

rianjs commented Jun 24, 2026

Major

Uh oh!

rianjs commented Jun 24, 2026

Major

Minor

Uh oh!

rianjs commented Jun 24, 2026

Minor

Uh oh!

rianjs commented Jun 24, 2026

Uh oh!

rianjs commented Jun 24, 2026

Uh oh!

monit-reviewer left a comment

Choose a reason for hiding this comment

Automated PR Review

Summary

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

monit-reviewer left a comment

Choose a reason for hiding this comment

Automated PR Review

Summary

Uh oh!

rianjs commented Jun 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants