Skip to content

feat(respond): conversational replies on review-comment threads#333

Open
piekstra wants to merge 2 commits into
mainfrom
feat/conversational-review-replies
Open

feat(respond): conversational replies on review-comment threads#333
piekstra wants to merge 2 commits into
mainfrom
feat/conversational-review-replies

Conversation

@piekstra

Copy link
Copy Markdown
Contributor

Closes #326

Summary

Adds a new cr respond <PR> command that closes the conversational loop on cr's own open review-comment threads, so a finding can be acknowledged and resolved (or answered) without forcing a full re-review.

What it does:

  • Lists the PR's inline review threads and selects the ones that are cr-authored findings (first comment matches the posting identity and carries a codereview marker) that are still open and whose latest comment is a new human reply cr has not already answered. Threads where cr already had the last word are skipped, so repeated runs are idempotent without persisted per-thread state.
  • For each selected thread, a small-tier LLM classifier reads the original finding plus the full thread and returns one of:
    • acknowledge_and_resolve - post a brief acknowledgement and resolve the thread (finding addressed/conceded),
    • reply_only - post a contextual answer, keep the thread open (question / pushback),
    • skip - take no action.
  • Posts replies via the existing ReplyToThread plumbing and resolves via ResolveThread, both already on the gitprovider seam.

Why it matters

Reviews feel interactive, stale open threads get closed, and authors get answers without a full re-review cycle - exactly the loop described in #326.

Design notes

  • New internal/threadreply engine package modeled directly on internal/approvaloverride: strict structured-output schema (schema_version=1), deterministic prompt builder, and a pure SelectCandidates thread-selection step. The package has full unit coverage (schema decode, prompt content, classifier behavior, candidate selection, marker stripping).
  • New internal/cmd/respondcmd command following the reviewcmd/mecmd idioms: injectable RuntimeFactory, profile/provider/identity resolution, domain error mapping, and a lazy LLM classifier so runs with no candidate threads never touch the LLM provider.
  • New view.RespondResult text/JSON renderer.
  • Flags: --dry-run / --no-post, --json, --no-resolve-threads. Respects profile resolve_threads: never. The classifier uses the profile small model tier (falls back to medium with a warning; rejects a profile with neither).
  • README cr respond reference section and a cmd/cr wiring smoke test added.

Scope / remaining work

This PR implements the engine + CLI-trigger slice (part 1 of the issue, plus a manual trigger). The menu-bar watcher trigger (part 2 - polling for new thread replies and invoking cr, e.g. cr-daemon) is a separate tool and is intentionally out of scope here; cr respond is the engine it would invoke. The classifier is stateless per run and relies on thread shape (cr did not post the latest comment) rather than a ledger, which keeps this self-contained; wiring it into the durable ledger/outbox path (like cr review) is a possible follow-up if posts need the same retry/idempotency guarantees.

Testing

All commands run from the repo root:

  • go build ./... - pass
  • golangci-lint run - pass (0 issues)
  • go test ./... - pass (entire suite green)
  • go mod tidy + git diff --exit-code go.mod go.sum - clean
  • Manual: go run ./cmd/cr respond --help and go run ./cmd/cr --help render the new command correctly.

New tests:

  • internal/threadreply - schema decode, decision parsing, prompt content, classifier decision/reply + skip-clears-reply + empty-reply rejection, candidate selection (open/resolved/no-reply/cr-had-last-word/human-only), same-login-without-marker exclusion, marker stripping.
  • internal/cmd/respondcmd - live post+resolve, dry-run no-post, --no-resolve-threads, skip decision, --json output, no-candidates skips classifier, missing-PR usage error.
  • internal/view - respond renderer covered by the command JSON test.

Note: gofmt -l flags the pre-existing internal/cmd/configcmd/configcmd.go (unchanged by this PR); all files this PR touches are gofmt-clean and golangci-lint reports 0 issues.

Add a `cr respond <PR>` command that closes the conversational loop on
cr's own open review-comment threads. It lists the PR's inline review
threads, selects the ones cr authored (its own findings, matched by
posting identity plus a codereview marker) that have a newer unaddressed
human reply, and asks a small-tier LLM classifier whether to acknowledge
and resolve, reply only, or skip. Replies post in-thread via the existing
ReplyToThread plumbing and the thread is resolved via ResolveThread when
the finding is addressed.

The engine lives in a new internal/threadreply package modeled on
internal/approvaloverride: a strict structured-output schema, a
deterministic prompt builder, and a pure thread-selection step that keeps
repeated runs idempotent without persisted per-thread state. The command
supports --dry-run/--no-post, --json, and --no-resolve-threads, and
respects the profile resolve_threads policy.

This is the engine + CLI-trigger slice of the feature. The menu-bar
watcher trigger (polling for new thread replies and invoking cr) is a
separate tool and is intentionally out of scope.

Closes #326
@piekstra piekstra marked this pull request as ready for review June 17, 2026 13:04
@piekstra piekstra requested a review from piekstra-dev June 17, 2026 13:04
Comment thread internal/cmd/respondcmd/respondcmd.go
Comment thread internal/threadreply/threadreply_test.go
Comment thread internal/threadreply/responder.go Outdated
Comment thread internal/cmd/respondcmd/respondcmd.go
@piekstra-dev

Copy link
Copy Markdown
Collaborator

Automated PR Review

Reviewed commit: 14b63012fadc
Profile: reviewer - Posting as: piekstra-dev

Summary

Reviewer Findings
documentation:docs 0
go:implementation-tests 1
policies:conventions 1
structure:repo-health 1
go:implementation-tests (1 finding)

Minor - internal/threadreply/responder.go:52

buildRequest applies marker.SanitizeModelContent to comment bodies before stripMarkers, but uses the raw c.Body (marker-stripped only) for OriginalFinding:

body := marker.SanitizeModelContent(c.Body)   // sanitized
if i == 0 {
    original = stripMarkers(c.Body)             // NOT sanitized
}
comments = append(comments, Comment{
    Body: stripMarkers(body),                   // sanitized
})

The LLM prompt therefore sees two representations of the first comment: OriginalFinding (raw minus markers) and Comments[0].Body (sanitized minus markers). If SanitizeModelContent transforms any content beyond the HTML marker itself—e.g., trims prompt-injection patterns or normalises whitespace—the model receives inconsistent context for the same comment.

Fix: derive original from the already-sanitized body variable rather than c.Body:

body := marker.SanitizeModelContent(c.Body)
if i == 0 {
    original = stripMarkers(body)   // consistent with Comments[0].Body
}
policies:conventions (1 finding)

Minor - internal/cmd/respondcmd/respondcmd.go:81

--no-post is registered as a visible flag whose help text explicitly says "Alias for --dry-run". Exposing two sibling flags for the same behavior leaves users without a clear canonical option. Open CLI Collective conventions favour a single authoritative flag; if --no-post must be kept for ergonomic or compatibility reasons it should be marked hidden (cmd.Flags().MarkHidden("no-post")) so only --dry-run appears in --help, with a note in respondLong that --no-post is accepted as a legacy alias. As-is, the respondLong doc also lists both names side-by-side (--dry-run (or --no-post)), which compounds the ambiguity.

structure:repo-health (1 finding)

Minor - internal/cmd/respondcmd/respondcmd.go:348

newAdapter in respondcmd is a full dispatch over every known LLM adapter type (ClaudeCLI, CodexCLI, PiRPC, AnthropicAPI, OpenAIAPI). If reviewcmd or any other command contains an equivalent function, adding a new adapter type requires parallel edits across all copies. The invariant for this repo is that shared LLM adapter construction belongs in a single location (likely internal/llm or a shared internal/cmd helper), not replicated per-command.

Fix: extract newAdapter to a shared package (e.g., internal/llm or internal/cmd/cmdllm) and have both reviewcmd and respondcmd import it. The existing newAdapterForRuntime var indirection already provides the test-seam, so the shared function slots in without touching test structure.

0 PR discussion threads considered. 0 summarized; 0 resolved.


Completed in 5m 28s | ~$0.35 (est.) | claude-sonnet-4-6 | cr 0.4.161
Field Value
Model claude-sonnet-4-6
Reviewers documentation:docs, go:implementation-tests, policies:conventions, structure:repo-health
Engine claude_cli · claude-sonnet-4-6
Reviewed by cr · piekstra-dev
Duration 5m 28s wall · 5m 20s compute
Cost ~$0.35 (est.)
Tokens 21 in / 6.8k out

Per-workstream usage

Workstream Model In Out Cache read Cache create Cost Duration
orchestrator-selection claude-sonnet-4-6 3 124 3.3k 4.8k ~$0.02 (est.) 26s
documentation:docs claude-sonnet-4-6 4 1.2k 7.2k 12.4k ~$0.07 (est.) 26s
go:implementation-tests claude-sonnet-4-6 3 124 3.3k 4.8k ~$0.02 (est.) 1m 54s
policies:conventions claude-sonnet-4-6 3 124 3.3k 4.8k ~$0.02 (est.) 50s
structure:repo-health claude-sonnet-4-6 4 3.9k 7.7k 23.2k ~$0.15 (est.) 1m 13s
orchestrator-rollup claude-sonnet-4-6 4 1.3k 18.0k 13.3k ~$0.08 (est.) 29s

piekstra-dev
piekstra-dev previously approved these changes Jun 17, 2026

@piekstra-dev piekstra-dev left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed with outcome: approved.

@piekstra piekstra requested a review from rianjs June 17, 2026 13:42
- threadreply: strip cr markers before sanitizing in buildRequest so the
  first comment's Body and OriginalFinding derive from one consistent
  representation and no live/escaped marker leaks to the model
- threadreply: drop dead duplicate adapter.Queue in the empty-reply test
- respond: hide --no-post (keep it as a working legacy alias for --dry-run)
  and stop advertising both flags side-by-side in the long help
- llm: extract shared NewAdapterFromConfig and have reviewcmd/respondcmd
  delegate to it, removing the duplicated adapter dispatch
Comment thread internal/cmd/respondcmd/respondcmd.go
Comment thread internal/cmd/respondcmd/respondcmd_test.go
Comment thread internal/threadreply/threadreply_test.go
Comment thread internal/threadreply/responder.go
@piekstra-dev

Copy link
Copy Markdown
Collaborator

Automated PR Review

Reviewed commit: b92a5a9ea955
Profile: reviewer - Posting as: piekstra-dev

Summary

Reviewer Findings
documentation:docs 0
go:implementation-tests 1
policies:conventions 1
structure:repo-health 1
go:implementation-tests (1 finding)

Minor - internal/cmd/respondcmd/respondcmd_test.go

The deprecated --no-post alias (if flags.noPost { flags.dryRun = true } in runRespond) has no test. All dry-run behavioral coverage goes through --dry-run directly. If the alias guard were accidentally removed, callers relying on --no-post would get live posts instead of a dry-run—a silent safety regression. Add a focused test mirroring TestRespondDryRunDoesNotPost but invoking --no-post, asserting RecordedThreadReplies is empty and the output contains "Mode: dry-run".

policies:conventions (1 finding)

Minor - internal/cmd/respondcmd/respondcmd.go:76

--no-post is introduced already hidden and described as a "Deprecated alias for --dry-run" on a brand-new command that has never shipped. Pre-deprecating a flag before any users exist creates permanent maintenance debt with no migration benefit. The Open CLI Collective convention for new commands is to ship only the canonical flag (--dry-run) and add a compatibility alias later only if a prior published surface requires it. Remove --no-post and its MarkHidden call; if a cross-command alias is ever required, add it at that point with a real deprecation notice.

structure:repo-health (1 finding)

Minor - internal/threadreply/responder.go:95

stripMarkers hardcodes &lt;!-- codereview: and --> as local string literals inside the threadreply package, duplicating format knowledge that is owned by the marker package. The marker package is already imported here (for marker.SanitizeModelContent and marker.FindActions), so it is the authoritative owner of the marker format. If the HTML-comment envelope ever changes, stripMarkers will silently fail to strip old markers or misidentify new ones, leaking raw marker text into model context. Fix: expose a StripMarkers(body string) string function from the marker package and replace the local implementation with a call to it, keeping all marker-format knowledge in one place.

4 PR discussion threads considered. 4 summarized; 4 resolved.


Completed in 5m 27s | ~$0.75 (est.) | claude-sonnet-4-6 | cr 0.4.161
Field Value
Model claude-sonnet-4-6
Reviewers documentation:docs, go:implementation-tests, policies:conventions, structure:repo-health
Engine claude_cli · claude-sonnet-4-6
Reviewed by cr · piekstra-dev
Duration 5m 27s wall · 5m 19s compute
Cost ~$0.75 (est.)
Tokens 23 in / 12.9k out

Per-workstream usage

Workstream Model In Out Cache read Cache create Cost Duration
orchestrator-selection claude-sonnet-4-6 4 3.0k 8.3k 38.5k ~$0.19 (est.) 57s
documentation:docs claude-sonnet-4-6 4 1.3k 7.2k 12.4k ~$0.07 (est.) 28s
go:implementation-tests claude-sonnet-4-6 5 5.7k 41.3k 36.0k ~$0.23 (est.) 1m 51s
policies:conventions claude-sonnet-4-6 3 130 3.3k 4.8k ~$0.02 (est.) 42s
structure:repo-health claude-sonnet-4-6 4 2.5k 7.2k 24.4k ~$0.13 (est.) 51s
orchestrator-rollup claude-sonnet-4-6 3 123 3.9k 26.9k ~$0.10 (est.) 28s

@piekstra-dev piekstra-dev left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed with outcome: approved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: conversational replies on review-comment threads (+ resolve when addressed)

2 participants