Skip to content

feat: prepare pinned read-only PR workbench#380

Merged
rianjs merged 1 commit into
mainfrom
issue-370-pr-workbench
Jun 24, 2026
Merged

feat: prepare pinned read-only PR workbench#380
rianjs merged 1 commit into
mainfrom
issue-370-pr-workbench

Conversation

@rianjs

@rianjs rianjs commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Summary

  • prepare a per-run workbench repo and scratch area before dossier/selection execution
  • pin the workbench checkout to the review head SHA and write durable metadata for the checkout inputs
  • cover the new workbench flow with pipeline tests and update command harnesses to use real local SHAs

Closes #370

Testing

  • make lint
  • make test

@rianjs rianjs force-pushed the issue-370-pr-workbench branch from 1fb8e3a to c7001a1 Compare June 24, 2026 17:47
@rianjs

rianjs commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

Findings

  • Major: The pipeline unlocks workbench/repo before returning from SelectionOnly/execute, so retained artifacts are writable after the run (pipeline.go, pipeline.go). That weakens the “scratch is the only writable directory” workbench contract.

  • Major: Source repo validation ignores host. resolveWorkbenchBaseRemoteURL discards the parsed remote host and only checks owner/repo (pipeline.go). Same owner/repo on a different host could produce the wrong checkout.

  • Minor: The read-only test still only overwrites a tracked file; it does not assert workbench/repo/new-file.txt creation fails (pipeline_test.go).

@rianjs rianjs force-pushed the issue-370-pr-workbench branch 2 times, most recently from a1a06ae to 60332b7 Compare June 24, 2026 17:57
@rianjs

rianjs commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

Findings

  • Major: Host validation still permits a mismatched source origin when both base/head commits already exist locally. The guard only errors when !remoteMatchesBaseHost && (!basePresent || !headPresent), so a checkout from the wrong host can still be accepted as the PR workbench. That weakens the PR provenance invariant for the pinned workbench. See internal/pipeline/pipeline.go and internal/pipeline/pipeline.go.

  • Minor: The test-only auto-unlock path is wired through production behavior via CR_PIPELINE_TEST_AUTO_UNLOCK_WORKBENCH. Prefer an explicit test option or local test cleanup so production semantics are not affected by a hidden env var. See internal/pipeline/pipeline.go.

@rianjs rianjs force-pushed the issue-370-pr-workbench branch from 60332b7 to bc11571 Compare June 24, 2026 18:07
@rianjs

rianjs commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

Findings

@rianjs

rianjs commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

The suite is directionally good, but it still leaves a few explicit ticket guarantees unproven.

  • Major: [internal/pipeline/pipeline_test.go](/Users/rianjs/dev/codereview-cli/internal/pipeline/pipeline_test.go#L1386) only covers first-time preparation. There is no test that runs the helper twice against the same artifact root and verifies stale files or old metadata are replaced. A no-op “reuse the existing workbench” implementation would still pass the current tests, even though the ticket explicitly requires rerun refresh.
  • Major: [internal/pipeline/pipeline_test.go](/Users/rianjs/dev/codereview-cli/internal/pipeline/pipeline_test.go#L1424) asserts schema, checkout mode, base/head SHAs, paths, and one changed file, but it never checks PR, SourceRepoRoot, or the nested FingerprintInputs block. That means an incomplete or non-deterministic metadata payload could still slip through.
  • Major: [internal/pipeline/pipeline_test.go](/Users/rianjs/dev/codereview-cli/internal/pipeline/pipeline_test.go#L1462) and [internal/pipeline/pipeline_test.go](/Users/rianjs/dev/codereview-cli/internal/pipeline/pipeline_test.go#L1521) do not exercise the real DryRun/Live run-artifact path. The new coverage stays at the helper level plus SelectionOnly with a temp dir, so a wiring bug in the allocated run artifact layout or the new workbench hook could still ship. The fork fixture also only proves the final HEAD/diff, not that the helper באמת starts from the base checkout and fetches head from the derived fork remote.

@rianjs rianjs force-pushed the issue-370-pr-workbench branch from bc11571 to 297e2ea Compare June 24, 2026 18:16

@monit-reviewer monit-reviewer left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR Review

Reviewed commit: 297e2ea

Summary

Reviewer Findings
harness-engineering:harness-enforcement-reviewer 2
harness-engineering:harness-knowledge-reviewer 2
harness-engineering:harness-self-documenting-code-reviewer 2
security:security-code-auditor 1
harness-engineering:harness-enforcement-reviewer (2 findings)

⚠️ Should Fix - internal/pipeline/pipeline.go:50

The package-level workbenchAutoUnlockOnExit bool is a test-only escape hatch embedded in production code. shouldAutoUnlockWorkbenchOnExit ORs it with Options.AutoUnlockWorkbenchOnExit, creating hidden production-visible state that bypasses the options pattern used everywhere else. No structural guard prevents non-test code from setting it, and co-loading this package into any in-process test suite (fuzz harness, integration suite) silently disables the read-only workbench guarantee for all pipeline runs in that process. Remove the global and route all unlock control through Options.AutoUnlockWorkbenchOnExit; the testRuntimeOptions / harness-helper path already wires this correctly for most call sites.

💡 Suggestion - internal/cmd/noleak/noleak_test.go:1119

The shouldSkipOwnedPath exclusion for workbench/repo permanently removes this subtree from the owned-files leak detector. The compensating TestDefaultRemoveAllUnlocksReadOnlyArtifactTree covers the unlock+remove primitive in isolation but does not verify the full end-to-end cleanup lifecycle. Consider asserting after the noleak run that the workbench directory inside the artifact root does not exist, so structural cleanup verification is preserved without needing the skip.

harness-engineering:harness-knowledge-reviewer (2 findings)

💡 Suggestion - internal/pipeline/pipeline.go:2855

The workbenchMetadataArtifact struct introduces a durable JSON artifact schema (schema_version: 1) with fields like fingerprint_inputs, checkout_mode, and source_repo_root that are not documented. The checkout_mode constant artifact-clone implies future modes are possible, making this an extensible format. Agents consuming workbench/metadata.json in future runs have no versioned reference for what these fields mean or how fingerprint_inputs is used to identify stale workbenches.

💡 Suggestion - internal/pipeline/pipeline.go:48

The workbench contract — read-only checkout at workbench/repo, writable scratch at workbench/scratch, pinned to head SHA, with source repo provenance validation — is expressed only in code. A short docs/ entry describing the workbench layout and its provenance validation rules would make it discoverable to future agent sessions without requiring reverse-engineering of the pipeline.

harness-engineering:harness-self-documenting-code-reviewer (2 findings)

⚠️ Should Fix - internal/pipeline/pipeline.go:2997

In resolveWorkbenchBaseRemoteURL, the parsed host is silently discarded via blank identifier: _, owner, repo, _, err := parseWorkbenchRemoteURL(originURL). A reader skimming this function may assume host validation occurs here — it does not; only owner/repo are checked. Two open PR discussion threads confirm this is a real correctness gap: host validation is bypassed when both base and head commits already exist locally, meaning a checkout from a different host can be accepted as the PR workbench. Extract the host into a named variable and either assert it or explicitly comment why it is deferred.

💡 Suggestion - internal/cmd/reviewcmd/reviewcmd_test.go:2723

testRuntimeOptions describes where the function is used (tests) but not what it does. The body only sets AutoUnlockWorkbenchOnExit: true; a reader must inspect it to know that. A name like withAutoUnlockWorkbench or runtimeOptsWithAutoUnlock would make the transformation self-evident at every call site.

security:security-code-auditor (1 findings)

💡 Suggestion - internal/pipeline/pipeline.go:3009

branch.Ref (sourced from the GitHub API response) is passed as a positional argument to git fetch --no-tags <remote> <ref> without local validation. exec.Command prevents shell injection, but git treats arguments beginning with - as flags. A ref value such as --upload-pack=<path> would cause git to execute an arbitrary binary. GitHub validates branch names in practice, but adding a guard that asserts branch.Ref starts with refs/ (or at minimum does not start with -) before passing it to git provides defense in depth.

4 PR discussion threads considered.


Completed in 5m 13s | $2.79 | sonnet | daemon 0.2.132 | Glorfindel
Field Value
Model sonnet
Reviewers hybrid-synthesis, harness-engineering:harness-architecture-reviewer, harness-engineering:harness-enforcement-reviewer, harness-engineering:harness-knowledge-reviewer, harness-engineering:harness-self-documenting-code-reviewer, security:security-code-auditor
Engine claude · sonnet
Reviewed by pr-review-daemon · monit-pr-reviewer
Duration 5m 13s wall · 5m 10s compute (Reviewers: 3m 40s · Synthesis: 1m 29s)
Cost $2.79 (estimated)
Tokens 455.9k in / 33.7k out
Turns 12

Per-workstream usage

Workstream Model In Out Cache read Cache create Cost
hybrid-synthesis sonnet 42.7k 4.8k 13.1k 29.6k (1h) $0.25
harness-engineering:harness-architecture-reviewer sonnet 78.4k 1.8k 13.1k 65.2k (1h) $0.42
harness-engineering:harness-enforcement-reviewer sonnet 80.8k 3.9k 13.1k 67.7k (1h) $0.47
harness-engineering:harness-knowledge-reviewer sonnet 79.1k 2.4k 13.1k 66.0k (1h) $0.44
harness-engineering:harness-self-documenting-code-reviewer sonnet 85.4k 7.9k 13.1k 72.2k (1h) $0.56
security:security-code-auditor sonnet 89.5k 12.9k 13.1k 76.3k (1h) $0.66

Re-reviews only run when @monit-reviewer is re-requested as a reviewer — push as many commits as you need, then re-request when ready. PRs targeting branches other than main, master are skipped, even when @monit-reviewer is re-requested.

Comment thread internal/pipeline/pipeline.go
Comment thread internal/pipeline/pipeline.go
Comment thread internal/cmd/noleak/noleak_test.go
Comment thread internal/pipeline/pipeline.go
Comment thread internal/pipeline/pipeline.go
Comment thread internal/pipeline/pipeline.go
Comment thread internal/cmd/reviewcmd/reviewcmd_test.go
@rianjs rianjs force-pushed the issue-370-pr-workbench branch from 297e2ea to b0beef8 Compare June 24, 2026 18:35
@monit-reviewer monit-reviewer dismissed their stale review June 24, 2026 18:39

Superseded by updated review

monit-reviewer
monit-reviewer previously approved these changes Jun 24, 2026

@monit-reviewer monit-reviewer left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR Review

Reviewed commit: b0beef8 | Previous: 297e2ea (incremental)

Summary

No issues found.

11 PR discussion threads considered.


Completed in 3m 38s | $8.09 | sonnet | daemon 0.2.132 | Glorfindel
Field Value
Model sonnet
Mode Re-review · Cycle 2 · Session resumed
Reviewers hybrid-synthesis, documentation:docs-reviewer, harness-engineering:harness-architecture-reviewer, harness-engineering:harness-enforcement-reviewer, harness-engineering:harness-knowledge-reviewer, harness-engineering:harness-self-documenting-code-reviewer, security:security-code-auditor
Engine claude · sonnet
Reviewed by pr-review-daemon · monit-pr-reviewer
Duration 3m 38s wall · 1m 31s compute (Reviewers: 1m 10s · Synthesis: 21s)
Cost $8.09 (estimated)
Tokens 1395.7k in / 46.1k out
Turns 26

Per-workstream usage

Workstream Model In Out Cache read Cache create Cost
hybrid-synthesis sonnet 101.2k 5.6k 26.3k 74.9k (1h) $0.54
documentation:docs-reviewer sonnet 34.5k 2.0k 13.1k 21.3k (1h) $0.16
harness-engineering:harness-architecture-reviewer sonnet 238.7k 2.9k 26.3k 212.4k (1h) $1.33
harness-engineering:harness-enforcement-reviewer sonnet 247.1k 6.1k 26.3k 220.8k (1h) $1.42
harness-engineering:harness-knowledge-reviewer sonnet 240.9k 3.4k 26.3k 214.6k (1h) $1.35
harness-engineering:harness-self-documenting-code-reviewer sonnet 261.2k 11.2k 26.3k 234.9k (1h) $1.59
security:security-code-auditor sonnet 272.2k 14.9k 26.3k 245.9k (1h) $1.71

Re-reviews only run when @monit-reviewer is re-requested as a reviewer — push as many commits as you need, then re-request when ready. PRs targeting branches other than main, master are skipped, even when @monit-reviewer is re-requested.

@rianjs rianjs force-pushed the issue-370-pr-workbench branch from b0beef8 to f338537 Compare June 24, 2026 18:53
@rianjs rianjs marked this pull request as ready for review June 24, 2026 18:55

@monit-reviewer monit-reviewer left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR Review

Reviewed commit: f338537 | Previous: b0beef8 (incremental)

Approved with 1 non-blocking suggestion below. Address at your discretion.

Summary

Reviewer Findings
harness-engineering:harness-self-documenting-code-reviewer 1
harness-engineering:harness-self-documenting-code-reviewer (1 findings)

💡 Suggestion - internal/pipeline/pipeline_test.go:44

liveForTest internally sets opts.AutoUnlockWorkbenchOnExit = true, but every call site also sets AutoUnlockWorkbenchOnExit: true explicitly in the Options literal, creating redundancy. dryRunForTest and selectionOnlyForTest callers do not set the field — the wrapper handles it. The inconsistency makes a reader wonder whether the explicit field in the liveForTest literals is load-bearing or redundant. Remove the explicit AutoUnlockWorkbenchOnExit: true from liveForTest call sites to match the pattern established by the other two wrappers.

11 PR discussion threads considered.


Completed in 3m 32s | $16.82 | sonnet | daemon 0.2.132 | Glorfindel
Field Value
Model sonnet
Mode Re-review · Cycle 3 · Session resumed
Reviewers hybrid-synthesis, documentation:docs-reviewer, harness-engineering:harness-architecture-reviewer, harness-engineering:harness-enforcement-reviewer, harness-engineering:harness-knowledge-reviewer, harness-engineering:harness-self-documenting-code-reviewer, security:security-code-auditor
Engine claude · sonnet
Reviewed by pr-review-daemon · monit-pr-reviewer
Duration 3m 32s wall · 3m 00s compute (Reviewers: 2m 37s · Synthesis: 23s)
Cost $16.82 (estimated)
Tokens 2877.5k in / 59.9k out
Turns 40

Per-workstream usage

Workstream Model In Out Cache read Cache create Cost
hybrid-synthesis sonnet 167.9k 6.6k 39.4k 128.4k (1h) $0.88
documentation:docs-reviewer sonnet 76.4k 2.2k 26.3k 50.1k (1h) $0.34
harness-engineering:harness-architecture-reviewer sonnet 492.1k 3.1k 39.4k 452.7k (1h) $2.77
harness-engineering:harness-enforcement-reviewer sonnet 509.8k 8.1k 39.4k 470.3k (1h) $2.96
harness-engineering:harness-knowledge-reviewer sonnet 496.1k 3.6k 39.4k 456.6k (1h) $2.81
harness-engineering:harness-self-documenting-code-reviewer sonnet 540.2k 18.9k 39.4k 500.8k (1h) $3.30
security:security-code-auditor sonnet 549.6k 16.4k 39.4k 510.1k (1h) $3.32
discussion-summarizer 45.5k 1.0k 3.7k 41.8k (1h) $0.45

Re-reviews only run when @monit-reviewer is re-requested as a reviewer — push as many commits as you need, then re-request when ready. PRs targeting branches other than main, master are skipped, even when @monit-reviewer is re-requested.

return SelectionOnly(ctx, opts, req)
}

func liveForTest(ctx context.Context, opts Options, req Request, run ledger.Run) (Result, error) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Low (harness-engineering:harness-self-documenting-code-reviewer): liveForTest internally sets opts.AutoUnlockWorkbenchOnExit = true, but every call site also sets AutoUnlockWorkbenchOnExit: true explicitly in the Options literal, creating redundancy. dryRunForTest and selectionOnlyForTest callers do not set the field — the wrapper handles it. The inconsistency makes a reader wonder whether the explicit field in the liveForTest literals is load-bearing or redundant. Remove the explicit AutoUnlockWorkbenchOnExit: true from liveForTest call sites to match the pattern established by the other two wrappers.

Reply to this thread when addressed.

@rianjs rianjs merged commit a94a8e6 into main Jun 24, 2026
10 checks passed
@rianjs rianjs deleted the issue-370-pr-workbench branch June 24, 2026 19:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Prepare pinned read-only PR workbench

2 participants