Summary
cr review currently aborts the entire multi-reviewer run when a single reviewer returns structured output that fails schema decoding, and the failed run does not persist enough diagnostic state to identify the offending reviewer or recover successful reviewer work.
Observed
While dogfooding PR #359 on commit 70ee83c17c3980a6e1efb24a7a681bcdf54c4c2a, this command failed after the built-in retry:
/tmp/cr-thread-lifecycle review https://github.com/open-cli-collective/codereview-cli/pull/359 \
--dry-run --max-agents 5 --allow-self-review --json --verbose
Failure:
structured output invalid after retry: first: json: unknown field "line"; second: json: unknown field "line"
The run selected/logged four reviewers:
structure:repo-health
go:implementation-tests
policies:conventions
documentation:docs
The ledger run was marked failed, but no sessions or findings were persisted for the failed run. The artifact directory retained only wrapper logs with short Claude background job IDs, and those jobs were no longer available via claude logs, so the raw malformed JSON and offending reviewer could not be recovered.
Example failed run ID:
a53df086-ab7d-4454-b4f6-dc21bdccd5f0
Why this is a bug
Rejecting malformed structured output is correct. The problematic behavior is that one malformed reviewer response:
- aborts the whole review run, including successful sibling reviewers, and
- leaves insufficient durable diagnostics to identify the reviewer, inspect the raw invalid output, or resume/recover from the successful work.
For parallel reviewer execution, a schema failure should be isolated to the reviewer/workstream that produced it, or at minimum reported with enough context to debug without relying on ephemeral provider CLI logs.
Desired behavior
- Include reviewer/workstream ID in the structured-output validation error.
- Persist failed reviewer session metadata and validation error details before aborting.
- Persist or artifact the raw invalid structured output, with normal secret-safety constraints.
- Preserve successful reviewer results where possible instead of discarding the whole run.
- Consider allowing the run to complete with a failed-reviewer diagnostic finding/status when one reviewer fails schema validation.
Related symptoms
This is also a debuggability/recovery gap: failed structured-output runs currently depend on ephemeral Claude background logs for raw response inspection, but those logs may be gone by the time the operator investigates.
Summary
cr reviewcurrently aborts the entire multi-reviewer run when a single reviewer returns structured output that fails schema decoding, and the failed run does not persist enough diagnostic state to identify the offending reviewer or recover successful reviewer work.Observed
While dogfooding PR #359 on commit
70ee83c17c3980a6e1efb24a7a681bcdf54c4c2a, this command failed after the built-in retry:Failure:
The run selected/logged four reviewers:
structure:repo-healthgo:implementation-testspolicies:conventionsdocumentation:docsThe ledger run was marked failed, but no sessions or findings were persisted for the failed run. The artifact directory retained only wrapper logs with short Claude background job IDs, and those jobs were no longer available via
claude logs, so the raw malformed JSON and offending reviewer could not be recovered.Example failed run ID:
Why this is a bug
Rejecting malformed structured output is correct. The problematic behavior is that one malformed reviewer response:
For parallel reviewer execution, a schema failure should be isolated to the reviewer/workstream that produced it, or at minimum reported with enough context to debug without relying on ephemeral provider CLI logs.
Desired behavior
Related symptoms
This is also a debuggability/recovery gap: failed structured-output runs currently depend on ephemeral Claude background logs for raw response inspection, but those logs may be gone by the time the operator investigates.