feat: prepare pinned read-only PR workbench by rianjs · Pull Request #380 · open-cli-collective/codereview-cli

rianjs · 2026-06-24T17:39:16Z

Summary

prepare a per-run workbench repo and scratch area before dossier/selection execution
pin the workbench checkout to the review head SHA and write durable metadata for the checkout inputs
cover the new workbench flow with pipeline tests and update command harnesses to use real local SHAs

Closes #370

Testing

make lint
make test

rianjs · 2026-06-24T17:49:51Z

Findings

Major: The pipeline unlocks workbench/repo before returning from SelectionOnly/execute, so retained artifacts are writable after the run (pipeline.go, pipeline.go). That weakens the “scratch is the only writable directory” workbench contract.
Major: Source repo validation ignores host. resolveWorkbenchBaseRemoteURL discards the parsed remote host and only checks owner/repo (pipeline.go). Same owner/repo on a different host could produce the wrong checkout.
Minor: The read-only test still only overwrites a tracked file; it does not assert workbench/repo/new-file.txt creation fails (pipeline_test.go).

rianjs · 2026-06-24T17:59:45Z

Findings

Major: Host validation still permits a mismatched source origin when both base/head commits already exist locally. The guard only errors when !remoteMatchesBaseHost && (!basePresent || !headPresent), so a checkout from the wrong host can still be accepted as the PR workbench. That weakens the PR provenance invariant for the pinned workbench. See internal/pipeline/pipeline.go and internal/pipeline/pipeline.go.
Minor: The test-only auto-unlock path is wired through production behavior via CR_PIPELINE_TEST_AUTO_UNLOCK_WORKBENCH. Prefer an explicit test option or local test cleanup so production semantics are not affected by a hidden env var. See internal/pipeline/pipeline.go.

rianjs · 2026-06-24T18:09:38Z

Findings

Minor: Several tests now depend on the caller’s real repo history (HEAD^, HEAD~2) instead of hermetic fixtures, which makes make test sensitive to shallow checkouts/source archives and local git state. The workbench temp-repo fixtures already cover real SHAs; prefer using injected fixture SHAs for command/pipeline harnesses too. See internal/pipeline/pipeline_test.go, internal/pipeline/pipeline_test.go, and internal/cmd/reviewcmd/reviewcmd_test.go.

rianjs · 2026-06-24T18:12:48Z

The suite is directionally good, but it still leaves a few explicit ticket guarantees unproven.

Major: [internal/pipeline/pipeline_test.go](/Users/rianjs/dev/codereview-cli/internal/pipeline/pipeline_test.go#L1386) only covers first-time preparation. There is no test that runs the helper twice against the same artifact root and verifies stale files or old metadata are replaced. A no-op “reuse the existing workbench” implementation would still pass the current tests, even though the ticket explicitly requires rerun refresh.
Major: [internal/pipeline/pipeline_test.go](/Users/rianjs/dev/codereview-cli/internal/pipeline/pipeline_test.go#L1424) asserts schema, checkout mode, base/head SHAs, paths, and one changed file, but it never checks PR, SourceRepoRoot, or the nested FingerprintInputs block. That means an incomplete or non-deterministic metadata payload could still slip through.
Major: [internal/pipeline/pipeline_test.go](/Users/rianjs/dev/codereview-cli/internal/pipeline/pipeline_test.go#L1462) and [internal/pipeline/pipeline_test.go](/Users/rianjs/dev/codereview-cli/internal/pipeline/pipeline_test.go#L1521) do not exercise the real DryRun/Live run-artifact path. The new coverage stays at the helper level plus SelectionOnly with a temp dir, so a wiring bug in the allocated run artifact layout or the new workbench hook could still ship. The fork fixture also only proves the final HEAD/diff, not that the helper באמת starts from the base checkout and fetches head from the derived fork remote.

monit-reviewer

Automated PR Review

Reviewed commit: 297e2ea

Summary

Reviewer	Findings
harness-engineering:harness-enforcement-reviewer	2
harness-engineering:harness-knowledge-reviewer	2
harness-engineering:harness-self-documenting-code-reviewer	2
security:security-code-auditor	1

harness-engineering:harness-enforcement-reviewer (2 findings)

⚠️ Should Fix - internal/pipeline/pipeline.go:50

The package-level workbenchAutoUnlockOnExit bool is a test-only escape hatch embedded in production code. shouldAutoUnlockWorkbenchOnExit ORs it with Options.AutoUnlockWorkbenchOnExit, creating hidden production-visible state that bypasses the options pattern used everywhere else. No structural guard prevents non-test code from setting it, and co-loading this package into any in-process test suite (fuzz harness, integration suite) silently disables the read-only workbench guarantee for all pipeline runs in that process. Remove the global and route all unlock control through Options.AutoUnlockWorkbenchOnExit; the testRuntimeOptions / harness-helper path already wires this correctly for most call sites.

💡 Suggestion - internal/cmd/noleak/noleak_test.go:1119

The shouldSkipOwnedPath exclusion for workbench/repo permanently removes this subtree from the owned-files leak detector. The compensating TestDefaultRemoveAllUnlocksReadOnlyArtifactTree covers the unlock+remove primitive in isolation but does not verify the full end-to-end cleanup lifecycle. Consider asserting after the noleak run that the workbench directory inside the artifact root does not exist, so structural cleanup verification is preserved without needing the skip.

harness-engineering:harness-knowledge-reviewer (2 findings)

💡 Suggestion - internal/pipeline/pipeline.go:2855

The workbenchMetadataArtifact struct introduces a durable JSON artifact schema (schema_version: 1) with fields like fingerprint_inputs, checkout_mode, and source_repo_root that are not documented. The checkout_mode constant artifact-clone implies future modes are possible, making this an extensible format. Agents consuming workbench/metadata.json in future runs have no versioned reference for what these fields mean or how fingerprint_inputs is used to identify stale workbenches.

💡 Suggestion - internal/pipeline/pipeline.go:48

The workbench contract — read-only checkout at workbench/repo, writable scratch at workbench/scratch, pinned to head SHA, with source repo provenance validation — is expressed only in code. A short docs/ entry describing the workbench layout and its provenance validation rules would make it discoverable to future agent sessions without requiring reverse-engineering of the pipeline.

harness-engineering:harness-self-documenting-code-reviewer (2 findings)

⚠️ Should Fix - internal/pipeline/pipeline.go:2997

In resolveWorkbenchBaseRemoteURL, the parsed host is silently discarded via blank identifier: _, owner, repo, _, err := parseWorkbenchRemoteURL(originURL). A reader skimming this function may assume host validation occurs here — it does not; only owner/repo are checked. Two open PR discussion threads confirm this is a real correctness gap: host validation is bypassed when both base and head commits already exist locally, meaning a checkout from a different host can be accepted as the PR workbench. Extract the host into a named variable and either assert it or explicitly comment why it is deferred.

💡 Suggestion - internal/cmd/reviewcmd/reviewcmd_test.go:2723

testRuntimeOptions describes where the function is used (tests) but not what it does. The body only sets AutoUnlockWorkbenchOnExit: true; a reader must inspect it to know that. A name like withAutoUnlockWorkbench or runtimeOptsWithAutoUnlock would make the transformation self-evident at every call site.

security:security-code-auditor (1 findings)

💡 Suggestion - internal/pipeline/pipeline.go:3009

branch.Ref (sourced from the GitHub API response) is passed as a positional argument to git fetch --no-tags <remote> <ref> without local validation. exec.Command prevents shell injection, but git treats arguments beginning with - as flags. A ref value such as --upload-pack=<path> would cause git to execute an arbitrary binary. GitHub validates branch names in practice, but adding a guard that asserts branch.Ref starts with refs/ (or at minimum does not start with -) before passing it to git provides defense in depth.

4 PR discussion threads considered.

Completed in 5m 13s | $2.79 | sonnet | daemon 0.2.132 | Glorfindel

Field	Value
Model	sonnet
Reviewers	hybrid-synthesis, harness-engineering:harness-architecture-reviewer, harness-engineering:harness-enforcement-reviewer, harness-engineering:harness-knowledge-reviewer, harness-engineering:harness-self-documenting-code-reviewer, security:security-code-auditor
Engine	claude · sonnet
Reviewed by	pr-review-daemon · monit-pr-reviewer
Duration	5m 13s wall · 5m 10s compute (Reviewers: 3m 40s · Synthesis: 1m 29s)
Cost	$2.79 (estimated)
Tokens	455.9k in / 33.7k out
Turns	12

Per-workstream usage

Workstream	Model	In	Out	Cache read	Cache create	Cost
hybrid-synthesis	sonnet	42.7k	4.8k	13.1k	29.6k (1h)	$0.25
harness-engineering:harness-architecture-reviewer	sonnet	78.4k	1.8k	13.1k	65.2k (1h)	$0.42
harness-engineering:harness-enforcement-reviewer	sonnet	80.8k	3.9k	13.1k	67.7k (1h)	$0.47
harness-engineering:harness-knowledge-reviewer	sonnet	79.1k	2.4k	13.1k	66.0k (1h)	$0.44
harness-engineering:harness-self-documenting-code-reviewer	sonnet	85.4k	7.9k	13.1k	72.2k (1h)	$0.56
security:security-code-auditor	sonnet	89.5k	12.9k	13.1k	76.3k (1h)	$0.66

_{Re-reviews only run when @monit-reviewer is re-requested as a reviewer — push as many commits as you need, then re-request when ready. PRs targeting branches other than main, master are skipped, even when @monit-reviewer is re-requested.}

Superseded by updated review

monit-reviewer

Automated PR Review

Reviewed commit: b0beef8 | Previous: 297e2ea (incremental)

Summary

No issues found.

11 PR discussion threads considered.

Completed in 3m 38s | $8.09 | sonnet | daemon 0.2.132 | Glorfindel

Field	Value
Model	sonnet
Mode	Re-review · Cycle 2 · Session resumed
Reviewers	hybrid-synthesis, documentation:docs-reviewer, harness-engineering:harness-architecture-reviewer, harness-engineering:harness-enforcement-reviewer, harness-engineering:harness-knowledge-reviewer, harness-engineering:harness-self-documenting-code-reviewer, security:security-code-auditor
Engine	claude · sonnet
Reviewed by	pr-review-daemon · monit-pr-reviewer
Duration	3m 38s wall · 1m 31s compute (Reviewers: 1m 10s · Synthesis: 21s)
Cost	$8.09 (estimated)
Tokens	1395.7k in / 46.1k out
Turns	26

Per-workstream usage

Workstream	Model	In	Out	Cache read	Cache create	Cost
hybrid-synthesis	sonnet	101.2k	5.6k	26.3k	74.9k (1h)	$0.54
documentation:docs-reviewer	sonnet	34.5k	2.0k	13.1k	21.3k (1h)	$0.16
harness-engineering:harness-architecture-reviewer	sonnet	238.7k	2.9k	26.3k	212.4k (1h)	$1.33
harness-engineering:harness-enforcement-reviewer	sonnet	247.1k	6.1k	26.3k	220.8k (1h)	$1.42
harness-engineering:harness-knowledge-reviewer	sonnet	240.9k	3.4k	26.3k	214.6k (1h)	$1.35
harness-engineering:harness-self-documenting-code-reviewer	sonnet	261.2k	11.2k	26.3k	234.9k (1h)	$1.59
security:security-code-auditor	sonnet	272.2k	14.9k	26.3k	245.9k (1h)	$1.71

_{Re-reviews only run when @monit-reviewer is re-requested as a reviewer — push as many commits as you need, then re-request when ready. PRs targeting branches other than main, master are skipped, even when @monit-reviewer is re-requested.}

monit-reviewer

Automated PR Review

Reviewed commit: f338537 | Previous: b0beef8 (incremental)

Approved with 1 non-blocking suggestion below. Address at your discretion.

Summary

Reviewer	Findings
harness-engineering:harness-self-documenting-code-reviewer	1

harness-engineering:harness-self-documenting-code-reviewer (1 findings)

💡 Suggestion - internal/pipeline/pipeline_test.go:44

liveForTest internally sets opts.AutoUnlockWorkbenchOnExit = true, but every call site also sets AutoUnlockWorkbenchOnExit: true explicitly in the Options literal, creating redundancy. dryRunForTest and selectionOnlyForTest callers do not set the field — the wrapper handles it. The inconsistency makes a reader wonder whether the explicit field in the liveForTest literals is load-bearing or redundant. Remove the explicit AutoUnlockWorkbenchOnExit: true from liveForTest call sites to match the pattern established by the other two wrappers.

11 PR discussion threads considered.

Completed in 3m 32s | $16.82 | sonnet | daemon 0.2.132 | Glorfindel

Field	Value
Model	sonnet
Mode	Re-review · Cycle 3 · Session resumed
Reviewers	hybrid-synthesis, documentation:docs-reviewer, harness-engineering:harness-architecture-reviewer, harness-engineering:harness-enforcement-reviewer, harness-engineering:harness-knowledge-reviewer, harness-engineering:harness-self-documenting-code-reviewer, security:security-code-auditor
Engine	claude · sonnet
Reviewed by	pr-review-daemon · monit-pr-reviewer
Duration	3m 32s wall · 3m 00s compute (Reviewers: 2m 37s · Synthesis: 23s)
Cost	$16.82 (estimated)
Tokens	2877.5k in / 59.9k out
Turns	40

Per-workstream usage

Workstream	Model	In	Out	Cache read	Cache create	Cost
hybrid-synthesis	sonnet	167.9k	6.6k	39.4k	128.4k (1h)	$0.88
documentation:docs-reviewer	sonnet	76.4k	2.2k	26.3k	50.1k (1h)	$0.34
harness-engineering:harness-architecture-reviewer	sonnet	492.1k	3.1k	39.4k	452.7k (1h)	$2.77
harness-engineering:harness-enforcement-reviewer	sonnet	509.8k	8.1k	39.4k	470.3k (1h)	$2.96
harness-engineering:harness-knowledge-reviewer	sonnet	496.1k	3.6k	39.4k	456.6k (1h)	$2.81
harness-engineering:harness-self-documenting-code-reviewer	sonnet	540.2k	18.9k	39.4k	500.8k (1h)	$3.30
security:security-code-auditor	sonnet	549.6k	16.4k	39.4k	510.1k (1h)	$3.32
discussion-summarizer	—	45.5k	1.0k	3.7k	41.8k (1h)	$0.45

_{Re-reviews only run when @monit-reviewer is re-requested as a reviewer — push as many commits as you need, then re-request when ready. PRs targeting branches other than main, master are skipped, even when @monit-reviewer is re-requested.}

monit-reviewer · 2026-06-24T18:59:53Z

+	return SelectionOnly(ctx, opts, req)
+}
+
+func liveForTest(ctx context.Context, opts Options, req Request, run ledger.Run) (Result, error) {


🔵 Low (harness-engineering:harness-self-documenting-code-reviewer): liveForTest internally sets opts.AutoUnlockWorkbenchOnExit = true, but every call site also sets AutoUnlockWorkbenchOnExit: true explicitly in the Options literal, creating redundancy. dryRunForTest and selectionOnlyForTest callers do not set the field — the wrapper handles it. The inconsistency makes a reader wonder whether the explicit field in the liveForTest literals is load-bearing or redundant. Remove the explicit AutoUnlockWorkbenchOnExit: true from liveForTest call sites to match the pattern established by the other two wrappers.

Reply to this thread when addressed.

rianjs force-pushed the issue-370-pr-workbench branch from 1fb8e3a to c7001a1 Compare June 24, 2026 17:47

rianjs force-pushed the issue-370-pr-workbench branch 2 times, most recently from a1a06ae to 60332b7 Compare June 24, 2026 17:57

rianjs force-pushed the issue-370-pr-workbench branch from 60332b7 to bc11571 Compare June 24, 2026 18:07

rianjs force-pushed the issue-370-pr-workbench branch from bc11571 to 297e2ea Compare June 24, 2026 18:16

monit-reviewer previously requested changes Jun 24, 2026

View reviewed changes

rianjs force-pushed the issue-370-pr-workbench branch from 297e2ea to b0beef8 Compare June 24, 2026 18:35

monit-reviewer previously approved these changes Jun 24, 2026

View reviewed changes

feat: prepare pinned read-only PR workbench

f338537

rianjs dismissed monit-reviewer’s stale review via f338537 June 24, 2026 18:53

rianjs force-pushed the issue-370-pr-workbench branch from b0beef8 to f338537 Compare June 24, 2026 18:53

rianjs marked this pull request as ready for review June 24, 2026 18:55

monit-reviewer approved these changes Jun 24, 2026

View reviewed changes

rianjs merged commit a94a8e6 into main Jun 24, 2026
10 checks passed

rianjs deleted the issue-370-pr-workbench branch June 24, 2026 19:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: prepare pinned read-only PR workbench#380

feat: prepare pinned read-only PR workbench#380
rianjs merged 1 commit into
mainfrom
issue-370-pr-workbench

rianjs commented Jun 24, 2026

Uh oh!

rianjs commented Jun 24, 2026

Uh oh!

rianjs commented Jun 24, 2026

Uh oh!

rianjs commented Jun 24, 2026

Uh oh!

rianjs commented Jun 24, 2026

Uh oh!

monit-reviewer left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

monit-reviewer left a comment

Uh oh!

monit-reviewer left a comment

Uh oh!

monit-reviewer Jun 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

rianjs commented Jun 24, 2026

Summary

Testing

Uh oh!

rianjs commented Jun 24, 2026

Uh oh!

rianjs commented Jun 24, 2026

Uh oh!

rianjs commented Jun 24, 2026

Uh oh!

rianjs commented Jun 24, 2026

Uh oh!

monit-reviewer left a comment

Choose a reason for hiding this comment

Automated PR Review

Summary

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

monit-reviewer left a comment

Choose a reason for hiding this comment

Automated PR Review

Summary

Uh oh!

monit-reviewer left a comment

Choose a reason for hiding this comment

Automated PR Review

Summary

Uh oh!

monit-reviewer Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants