Skip to content

Improve review agent detection of error-handling and string-safety bugs #892

@fullsend-ai-retro

Description

@fullsend-ai-retro

What happened

On PR #816, the review bot approved 3 times while missing 3 blocking correctness bugs that the human reviewer caught on the first pass:

  1. UTF-8 truncation: truncate() in postreview.go sliced strings by byte offset, which could split multi-byte Unicode characters and produce invalid UTF-8. The human suggested []rune conversion.
  2. Overly broad 422 retry: CreateIssue in github.go retried on any HTTP 422 response, not just label-validation errors. The bot actually praised this retry logic as "pragmatic" — a false negative that actively mischaracterized a bug as a feature.
  3. Missing label-scope filtering: ListOpenIssues fetched all open issues with type/chore label instead of further filtering by the marker label, causing unnecessary API calls and potential false-positive dedup matches.

All three bugs are correctness issues in error handling and API contract usage — areas where the review agent consistently underperformed.

What could go better

The review agent's code-review skill (skills/code-review/SKILL.md) evaluates six dimensions but does not include specific guidance for common Go correctness patterns:

  • String safety: Operations on string in Go operate on bytes, not characters. Any truncation, slicing, or indexing should be flagged for Unicode safety review.
  • Error-handling specificity: Retry logic that catches broad error categories (e.g., all 422s) without inspecting the error detail is a common anti-pattern. The skill should prompt the agent to check whether error handling is appropriately narrow.
  • API contract alignment: When code wraps an external API (like GitHub's), the review should verify that error handling matches the API's documented error semantics, not just that error handling exists.

Confidence: High. The bot's review text explicitly praised the broad retry pattern, showing it evaluated the code but reached the wrong conclusion. This isn't a case of the bot skipping the code — it's a reasoning gap that better prompting could address.

Uncertainty: Adding review guidance could increase false positives on legitimate broad error handling. The guidance should focus on retry/recovery paths specifically, not all error handling.

Proposed change

Add a "Common correctness patterns" subsection to the code-review skill (internal/scaffold/fullsend-repo/skills/code-review/SKILL.md) under the correctness dimension. Include:

  1. String safety: "When reviewing string truncation, slicing, or indexing in Go, check whether the operation is byte-safe or rune-safe. Flag any string[:n] pattern where n could land in the middle of a multi-byte UTF-8 character."
  2. Error-handling specificity: "When reviewing retry or recovery logic, verify that the error condition is specific enough. Retrying on a broad HTTP status code (e.g., all 422s) without inspecting the response body or error details is a correctness risk — it may mask unrelated failures."
  3. API contract fidelity: "When code wraps an external API, verify that error handling matches the API's documented error semantics. For example, GitHub's 422 response includes an errors array with field and code entries that should be inspected for specific failure modes."

Validation criteria

Over the next 10 review agent runs on PRs that modify Go error-handling or string-manipulation code in fullsend-ai/fullsend, track whether the bot flags UTF-8 safety or overly broad error matching. At least 2 of those runs should produce medium-or-higher findings for patterns that would have been missed before this change. No significant increase in false-positive rate (human reviewers should not need to dismiss more than 1 additional bot finding per review on average).


Generated by retro agent from #816

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    Status

    Todo

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions