Skip to content

feat: synthesize rollup findings failures and coverage#384

Merged
rianjs merged 8 commits into
mainfrom
issue-374-rollup-findings-failures-coverage
Jun 24, 2026
Merged

feat: synthesize rollup findings failures and coverage#384
rianjs merged 8 commits into
mainfrom
issue-374-rollup-findings-failures-coverage

Conversation

@rianjs

@rianjs rianjs commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Summary

  • require specialist reviewer outputs to include inspected/skipped files and constraints
  • derive compact reviewer coverage summaries for rollup prompts, dry-run JSON, and final rollup markdown
  • coerce clean approval to comment when reviewer coverage is incomplete, while preserving isolated reviewer failure behavior

Verification

  • rtk go test ./internal/pipeline ./internal/reviewplan ./internal/llm ./internal/cmd/benchmarkcmd ./internal/cmd/reviewcmd -count=1
  • rtk make lint
  • rtk make build

Closes #374

@rianjs

rianjs commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

Findings

  • Major: Coverage can be marked complete even when the reviewer did not inspect its full assigned scope. buildReviewerCoverage sets complete_broad or complete_constrained whenever there are no SkippedFiles, without checking that InspectedFiles covers Scope (internal/pipeline/pipeline.go). A broad reviewer assigned main.go, other.go can return inspected_files: ["main.go"], skip nothing, and still allow clean approval. Compare scope - inspected - skipped and classify any remainder as incomplete; also reject or normalize inspected/skipped overlap.

  • Major: The reviewer prompt output contract uses input.Assignment.Files as the allowed file list, but checkout scope and coverage scope use AllowedFiles when present (internal/pipeline/pipeline.go, internal/pipeline/pipeline.go). This assumes allowed_files is always identical to or contained in files, which the selection contract does not enforce. A constrained reviewer can get a prompt schema that excludes the actual readable scope. Derive one effective reviewer scope and use it consistently for prompt contract, decode validation, checkout request, and coverage classification.

  • Minor: constraints is unbounded in count and length (internal/llm/contracts.go). Since constraints flow into the rollup prompt and public markdown, a model can bloat or destabilize the synthesis step despite the “compact coverage” contract. Add modest caps and tests.

@rianjs

rianjs commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

Major

  • effectiveReviewerScope now treats any reviewer without allowed_files as responsible for every changed file. In a normal multi-reviewer split, alpha may be selected for main.go and beta for other.go; if each reports only the file it actually inspected, both become incomplete_skipped, and a clean approval is coerced to comment even though aggregate reviewer coverage is complete. This conflates read capability with expected assignment coverage. The safer shape is to let broad reviewers read the whole checkout, but compute incomplete coverage from the aggregate union of inspected/skipped files, or use selected.Files as the expected assignment when present. Please add a regression for two broad reviewers splitting changed files without allowed_files.

@rianjs

rianjs commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

Major

  • DecodeFindings rejects any reviewer output with empty inspected_files, even when the reviewer reports all assigned files in skipped_files with constraints. That makes a valid incomplete-coverage report look like an LLM contract failure instead of preserving the intended coverage diagnostic. The product contract says skipped files and constraints are first-class reviewer output, so an all-skipped reviewer should decode successfully and roll up as incomplete coverage. Please allow inspected_files: [] when skipped_files covers the reviewer coverage scope, and add a regression for that path.

Minor

  • skipped_files is still worded against the broad readable changed-file set, while coverage completeness is now assignment-scoped. A broad reviewer assigned main.go can reasonably report other.go as skipped because it was readable but outside their assignment, and buildReviewerCoverage currently marks any skipped file incomplete. Consider either defining skipped files as assignment-scope only in the prompt, or only using skipped files intersecting reviewerCoverageScope to drive incomplete status.

@rianjs

rianjs commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

Minor

  • buildReviewerCoverage now correctly ignores skipped files outside reviewerCoverageScope for status, but it still stores the unfiltered result.SkippedFiles in the coverage entry. That means a reviewer can be complete_broad while the rollup prompt and final markdown still show skipped files, which conflicts with the assignment-scoped skipped-file contract. Please either filter entry.SkippedFiles to scopedSkipped before persisting/rendering coverage, or reject non-assignment skipped files at the reviewer boundary.

@rianjs

rianjs commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

No findings. The read-scope versus coverage-scope split now matches the product intent: all-skipped outputs decode, skipped coverage is assignment-scoped before rollup/rendering, and the added regressions cover the prior false-incomplete paths.

@rianjs

rianjs commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

The suite covers the new schema/summary shapes, but the main behavior is still only proven through helper-level tests.

  • Major: The safety path that prevents silent approval on incomplete coverage is only exercised with synthetic data in internal/reviewplan/summary_test.go and helper-level coverage classification in internal/pipeline/pipeline_test.go. There is no test that drives the real pipeline chain from reviewer output into ReviewerCoverage and then into rollup outcome, so a regression in runReviewers, buildRunSummary, or newReviewSummary could still pass these tests.

  • Minor: reviewcmd does not have a test that verifies the new ReviewerCoverage field survives into the rendered view model. internal/cmd/reviewcmd/reviewcmd_test.go still only checks the older summary fields, so a mapping omission in internal/cmd/reviewcmd/reviewcmd.go would go unnoticed.

  • Minor: The rollup prompt test only checks for the presence of reviewer_coverage and a few anti-leak strings in internal/pipeline/pipeline_test.go. It does not assert the absence of full diff hunks or changed-file bodies, so the “compact PR/dossier context” part of the contract is still under-tested.

@monit-reviewer monit-reviewer left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR Review

Reviewed commit: 8395b81

Summary

Reviewer Findings
harness-engineering:harness-enforcement-reviewer 1
harness-engineering:harness-knowledge-reviewer 2
harness-engineering:harness-self-documenting-code-reviewer 4
harness-engineering:harness-enforcement-reviewer (1 findings)

💡 Suggestion - internal/reviewplan/reviewplan.go:321

hasIncompleteReviewerCoverage relies on a strings.HasPrefix(entry.Status, "incomplete_") naming convention rather than a type-system invariant. A future contributor adding a new incomplete status (e.g. "partial_coverage") without knowing this convention would silently bypass the approval-coercion guard. Consider a defined string type with an IsIncomplete() method, or an exhaustive switch that returns an error on unrecognized status values.

harness-engineering:harness-knowledge-reviewer (2 findings)

⚠️ Should Fix - docs/checkout-native-review-contract.md:240

The PR introduces five stable coverage status values (complete_broad, complete_constrained, incomplete_skipped, incomplete_failed, incomplete_unassigned) that appear in dry-run JSON output and rollup markdown, but the contract doc does not define them or describe what each means. Consumers of dry-run JSON or rollup markdown will encounter these values without source access; they belong in the contract documentation.

💡 Suggestion - internal/pipeline/pipeline.go:2242

The non-obvious split between reviewerReviewableScope (read access over all changed files) and reviewerCoverageScope (assignment-scoped, uses selected.Files when allowed_files is absent) is what allows two broad reviewers splitting changed files to both be complete_broad. This design decision is not captured in any versioned document; future modifications to coverage logic will need this context to avoid silently regressing the multi-reviewer split case.

harness-engineering:harness-self-documenting-code-reviewer (4 findings)

⚠️ Should Fix - internal/pipeline/pipeline.go:2228

appendSortedStrings is a misleading name: 'append' implies accumulation or mutation of a target, but the function creates and returns a defensive sorted copy of its input. sortedCopy or copySorted would accurately communicate what callers receive without implying a side effect.

💡 Suggestion - internal/pipeline/pipeline.go:2183

reviewerReviewableScope and reviewerCoverageScope are easy to conflate; 'reviewable' vs 'coverage' does not convey the critical distinction (read access vs expected assignment coverage). A reader encountering both in buildReviewerPrompt and buildReviewerCoverage must read both bodies to understand why they diverge. reviewerReadableFiles / reviewerAssignmentScope, or an inline comment on the non-obvious invariant, would make the behavioral split explicit.

💡 Suggestion - internal/pipeline/pipeline.go:2207

In intersectSortedStrings, 'sorted' as a qualifier conventionally describes an input precondition, but here inputs can arrive in any order and it is the output that is sorted. intersectToSorted or sortedIntersect would more clearly signal the direction of the sort guarantee.

💡 Suggestion - internal/reviewplan/summary.go:278

coverageConstraintCell merges both entry.Constraints and entry.Diagnostic into its rendered output, but 'constraint' in the name implies only constraints are included. Diagnostics are a distinct concept (error or missing-file messages) from reviewer-declared scope constraints. coverageDetailsCell or coverageAnnotationCell would reflect that diagnostics are part of the output without requiring a reader to inspect the body.

6 PR discussion threads considered.


Completed in 3m 47s | $3.01 | sonnet | daemon 0.2.132 | Glorfindel
Field Value
Model sonnet
Reviewers hybrid-synthesis, documentation:docs-reviewer, harness-engineering:harness-architecture-reviewer, harness-engineering:harness-enforcement-reviewer, harness-engineering:harness-knowledge-reviewer, harness-engineering:harness-self-documenting-code-reviewer, security:security-code-auditor
Engine claude · sonnet
Reviewed by pr-review-daemon · monit-pr-reviewer
Duration 3m 47s wall · 3m 44s compute (Reviewers: 2m 37s · Synthesis: 1m 06s)
Cost $3.01 (estimated)
Tokens 519.1k in / 27.8k out
Turns 14

Per-workstream usage

Workstream Model In Out Cache read Cache create Cost
hybrid-synthesis sonnet 40.3k 3.5k 13.1k 27.1k (1h) $0.22
documentation:docs-reviewer sonnet 34.3k 1.7k 13.1k 21.2k (1h) $0.16
harness-engineering:harness-architecture-reviewer sonnet 85.9k 1.5k 13.1k 72.7k (1h) $0.46
harness-engineering:harness-enforcement-reviewer sonnet 87.8k 3.1k 13.1k 74.7k (1h) $0.50
harness-engineering:harness-knowledge-reviewer sonnet 87.7k 3.1k 13.1k 74.5k (1h) $0.50
harness-engineering:harness-self-documenting-code-reviewer sonnet 94.2k 9.6k 13.1k 81.1k (1h) $0.63
security:security-code-auditor sonnet 88.9k 5.2k 13.1k 75.8k (1h) $0.54

Re-reviews only run when @monit-reviewer is re-requested as a reviewer — push as many commits as you need, then re-request when ready. PRs targeting branches other than main, master are skipped, even when @monit-reviewer is re-requested.

Comment thread internal/pipeline/pipeline.go
Comment thread docs/checkout-native-review-contract.md
Comment thread internal/reviewplan/reviewplan.go
Comment thread internal/pipeline/pipeline.go
Comment thread internal/pipeline/pipeline.go Outdated
Comment thread internal/pipeline/pipeline.go
Comment thread internal/reviewplan/summary.go
@monit-reviewer monit-reviewer dismissed their stale review June 24, 2026 22:04

Superseded by updated review

@monit-reviewer monit-reviewer left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR Review

Reviewed commit: 053a666 | Previous: 8395b81 (incremental)

Summary

No issues found.

13 PR discussion threads considered.


Completed in 1m 29s | $8.04 | sonnet | daemon 0.2.132 | Glorfindel
Field Value
Model sonnet
Mode Re-review · Cycle 2 · Session resumed
Reviewers hybrid-synthesis, documentation:docs-reviewer, harness-engineering:harness-architecture-reviewer, harness-engineering:harness-enforcement-reviewer, harness-engineering:harness-knowledge-reviewer, harness-engineering:harness-self-documenting-code-reviewer, security:security-code-auditor
Engine claude · sonnet
Reviewed by pr-review-daemon · monit-pr-reviewer
Duration 1m 29s wall · 59s compute (Reviewers: 38s · Synthesis: 21s)
Cost $8.04 (estimated)
Tokens 1425.6k in / 34.5k out
Turns 28

Per-workstream usage

Workstream Model In Out Cache read Cache create Cost
hybrid-synthesis sonnet 93.5k 4.6k 26.3k 67.2k (1h) $0.48
documentation:docs-reviewer sonnet 77.1k 2.6k 26.3k 50.8k (1h) $0.35
harness-engineering:harness-architecture-reviewer sonnet 233.0k 2.1k 26.3k 206.7k (1h) $1.28
harness-engineering:harness-enforcement-reviewer sonnet 239.0k 3.9k 26.3k 212.7k (1h) $1.34
harness-engineering:harness-knowledge-reviewer sonnet 238.7k 4.0k 26.3k 212.4k (1h) $1.34
harness-engineering:harness-self-documenting-code-reviewer sonnet 258.3k 11.0k 26.3k 232.0k (1h) $1.56
security:security-code-auditor sonnet 241.3k 5.4k 26.3k 215.0k (1h) $1.38
discussion-summarizer 44.6k 1.1k 18.5k 26.1k (1h) $0.30

Re-reviews only run when @monit-reviewer is re-requested as a reviewer — push as many commits as you need, then re-request when ready. PRs targeting branches other than main, master are skipped, even when @monit-reviewer is re-requested.

@rianjs

rianjs commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

No findings. The daemon edits stayed aligned with the ticket architecture: they added coverage contract documentation, dry-run mapping coverage, conservative handling for unknown coverage statuses, and regression coverage for approval coercion without changing the read-scope/assignment-scope split.

@rianjs rianjs changed the title Synthesize rollup findings, failures, and coverage feat: synthesize rollup findings failures and coverage Jun 24, 2026
@rianjs rianjs merged commit 3c28be7 into main Jun 24, 2026
10 checks passed
@rianjs rianjs deleted the issue-374-rollup-findings-failures-coverage branch June 24, 2026 22:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update rollup to synthesize findings, failures, and coverage

2 participants