Improve review agent detection of error-handling and string-safety bugs

## What happened

On PR [#816](https://github.com/fullsend-ai/fullsend/pull/816), the review bot approved 3 times while missing 3 blocking correctness bugs that the human reviewer caught on the first pass:

1. **UTF-8 truncation**: `truncate()` in `postreview.go` sliced strings by byte offset, which could split multi-byte Unicode characters and produce invalid UTF-8. The human suggested `[]rune` conversion.
2. **Overly broad 422 retry**: `CreateIssue` in `github.go` retried on any HTTP 422 response, not just label-validation errors. The bot actually praised this retry logic as "pragmatic" — a false negative that actively mischaracterized a bug as a feature.
3. **Missing label-scope filtering**: `ListOpenIssues` fetched all open issues with `type/chore` label instead of further filtering by the marker label, causing unnecessary API calls and potential false-positive dedup matches.

All three bugs are correctness issues in error handling and API contract usage — areas where the review agent consistently underperformed.

## What could go better

The review agent's code-review skill (`skills/code-review/SKILL.md`) evaluates six dimensions but does not include specific guidance for common Go correctness patterns:

- **String safety**: Operations on `string` in Go operate on bytes, not characters. Any truncation, slicing, or indexing should be flagged for Unicode safety review.
- **Error-handling specificity**: Retry logic that catches broad error categories (e.g., all 422s) without inspecting the error detail is a common anti-pattern. The skill should prompt the agent to check whether error handling is appropriately narrow.
- **API contract alignment**: When code wraps an external API (like GitHub's), the review should verify that error handling matches the API's documented error semantics, not just that error handling exists.

Confidence: **High**. The bot's review text explicitly praised the broad retry pattern, showing it evaluated the code but reached the wrong conclusion. This isn't a case of the bot skipping the code — it's a reasoning gap that better prompting could address.

Uncertainty: Adding review guidance could increase false positives on legitimate broad error handling. The guidance should focus on retry/recovery paths specifically, not all error handling.

## Proposed change

Add a "Common correctness patterns" subsection to the code-review skill (`internal/scaffold/fullsend-repo/skills/code-review/SKILL.md`) under the correctness dimension. Include:

1. **String safety**: "When reviewing string truncation, slicing, or indexing in Go, check whether the operation is byte-safe or rune-safe. Flag any `string[:n]` pattern where `n` could land in the middle of a multi-byte UTF-8 character."
2. **Error-handling specificity**: "When reviewing retry or recovery logic, verify that the error condition is specific enough. Retrying on a broad HTTP status code (e.g., all 422s) without inspecting the response body or error details is a correctness risk — it may mask unrelated failures."
3. **API contract fidelity**: "When code wraps an external API, verify that error handling matches the API's documented error semantics. For example, GitHub's 422 response includes an `errors` array with `field` and `code` entries that should be inspected for specific failure modes."

## Validation criteria

Over the next 10 review agent runs on PRs that modify Go error-handling or string-manipulation code in fullsend-ai/fullsend, track whether the bot flags UTF-8 safety or overly broad error matching. At least 2 of those runs should produce medium-or-higher findings for patterns that would have been missed before this change. No significant increase in false-positive rate (human reviewers should not need to dismiss more than 1 additional bot finding per review on average).

---
_Generated by retro agent from https://github.com/fullsend-ai/fullsend/pull/816_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve review agent detection of error-handling and string-safety bugs #892

What happened

What could go better

Proposed change

Validation criteria

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Improve review agent detection of error-handling and string-safety bugs #892

Description

What happened

What could go better

Proposed change

Validation criteria

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions