fix: normalize CRLF/LF line endings in stats and checkpoint diffing by svarlamov · Pull Request #1075 · git-ai-project/git-ai

svarlamov · 2026-04-14T04:17:21Z

Summary

Fixes inflated AI vs Human stats when files switch between CRLF and LF line endings (common on Windows with core.autocrlf or when AI tools write LF into a CRLF repo)
A 100-line CRLF file that came back with LF endings plus 5 AI-added lines was showing +105 -100 instead of +5 -0
Root cause: compute_line_changes() passed raw content to imara-diff, which tokenizes by \n, so a line like "hello\r\n" produces the token "hello\r", which does not match "hello" from "hello\n"
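To make the root cause concrete, here is a minimal sketch that tokenizes on \n as described above and counts +added/-removed lines. It uses a plain LCS line diff as a stand-in for imara-diff (an assumption; for a minimal diff the counts come out the same), and lcs_len()/line_stats() are hypothetical helpers, not functions from git-ai.

```rust
// Sketch: why raw CRLF vs LF content inflates a line diff.
// A plain LCS diff stands in for imara-diff here (assumption); for a minimal
// diff, added = new_len - lcs and removed = old_len - lcs either way.

/// Length of the longest common subsequence of two token slices.
fn lcs_len(a: &[&str], b: &[&str]) -> usize {
    let mut dp = vec![vec![0usize; b.len() + 1]; a.len() + 1];
    for i in 0..a.len() {
        for j in 0..b.len() {
            dp[i + 1][j + 1] = if a[i] == b[j] {
                dp[i][j] + 1
            } else {
                dp[i][j + 1].max(dp[i + 1][j])
            };
        }
    }
    dp[a.len()][b.len()]
}

/// (+added, -removed) when tokenizing on '\n' only, as in the bug.
fn line_stats(old: &str, new: &str) -> (usize, usize) {
    // split_terminator('\n') leaves a trailing '\r' on CRLF lines: "hello\r".
    let a: Vec<&str> = old.split_terminator('\n').collect();
    let b: Vec<&str> = new.split_terminator('\n').collect();
    let common = lcs_len(&a, &b);
    (b.len() - common, a.len() - common)
}

fn main() {
    // 100-line CRLF file vs. the same 100 lines in LF plus 5 new lines.
    let old: String = (0..100).map(|i| format!("line {i}\r\n")).collect();
    let mut new: String = (0..100).map(|i| format!("line {i}\n")).collect();
    new.extend((0..5).map(|i| format!("ai {i}\n")));

    println!("raw:        {:?}", line_stats(&old, &new)); // (105, 100)
    let old_lf = old.replace("\r\n", "\n");
    println!("normalized: {:?}", line_stats(&old_lf, &new)); // (5, 0)
}
```

Normalizing only the diff input, as the fix does, is enough to bring the stats back to +5 -0.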

Changes

imara_diff_utils.rs: Add normalize_line_endings() (Cow-based, zero-copy when no \r is present). Normalize both inputs in compute_line_changes() before diffing, while still returning references to the original strings
checkpoint.rs: Add content_eq_normalized() and update 3 equality checks to skip files where only line endings differ, which prevents unnecessary checkpoint entries
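A minimal sketch of the two helpers, assuming the signatures shown here (the real ones in imara_diff_utils.rs and checkpoint.rs may differ). It only rewrites \r\n to \n, and it borrows the input unchanged when there is no \r at all, which is the zero-copy fast path described above.

```rust
use std::borrow::Cow;

/// Hypothetical sketch of normalize_line_endings(): rewrite "\r\n" to "\n".
/// Borrows the input (no allocation) when it contains no '\r'.
/// A bare '\r' is left untouched, so the line count never changes.
fn normalize_line_endings(s: &str) -> Cow<'_, str> {
    if !s.contains('\r') {
        return Cow::Borrowed(s);
    }
    Cow::Owned(s.replace("\r\n", "\n"))
}

/// Hypothetical sketch of content_eq_normalized(): true when two contents
/// are identical or differ only in CRLF vs. LF line endings.
fn content_eq_normalized(a: &str, b: &str) -> bool {
    a == b || normalize_line_endings(a) == normalize_line_endings(b)
}

fn main() {
    let lf = "a\nb\n";
    // LF-only input is returned as a borrow, with no copy.
    assert!(matches!(normalize_line_endings(lf), Cow::Borrowed(_)));
    assert_eq!(normalize_line_endings("a\r\nb\r\n"), "a\nb\n");
    // Only the line endings differ, so the checkpoint can skip this file.
    assert!(content_eq_normalized("a\r\nb\r\n", lf));
    // A real content change is still detected.
    assert!(!content_eq_normalized("a\r\nb\r\n", "a\nc\n"));
}
```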

Test coverage

6 unit tests for compute_line_changes with CRLF/LF scenarios
4 unit tests for compute_file_line_stats with CRLF/LF scenarios
3 unit tests for attribution preservation through CRLF changes
2 end-to-end tests exercising the real checkpoint flow with CRLF git blobs vs LF working tree

Test plan

All 1412 unit tests pass
All 2944 integration tests pass
All 54 daemon mode tests pass
All 33 notes sync regression tests pass
CI green on Ubuntu checks

🤖 Generated with Claude Code

When previous_content (from the git blob) and current_content (from the working tree) have different line endings, which is common on Windows with core.autocrlf or when AI tools write LF into a CRLF repo, every line appeared as changed, inflating AI vs Human stats (e.g. +105/-100 instead of +5/-0).

Root cause: compute_line_changes() passed raw content to imara-diff, which tokenizes by \n. Lines like "hello\r\n" produce the token "hello\r", which doesn't match "hello" from "hello\n".

Fix:
- Add normalize_line_endings() that strips \r before \n (with a zero-copy Cow fast path when no \r is present)
- Normalize both inputs in compute_line_changes() before diffing, while still returning references to the original input strings
- Add content_eq_normalized() for CRLF-aware equality checks in the checkpoint pipeline, preventing unnecessary entries when only line endings differ

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


…ment

Bare \r (pre-2001 Mac OS 9) converted to \n would increase the normalized line count, causing hunk indices from the normalized diff to exceed the bounds of the original line arrays from split_lines_with_terminators. Only handle \r\n → \n (Windows CRLF), which preserves the line count.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
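The line-count argument can be checked directly. In this sketch, split_keep_terminators() is a hypothetical stand-in for split_lines_with_terminators() (assumed to end lines only at \n and keep the terminator); CRLF-only normalization keeps the count, while also mapping a bare \r adds a line.

```rust
/// Hypothetical stand-in for split_lines_with_terminators(): each line keeps
/// its terminator, and only '\n' ends a line.
fn split_keep_terminators(s: &str) -> Vec<&str> {
    s.split_inclusive('\n').collect()
}

/// CRLF-only normalization (the approach the commit keeps).
fn crlf_only(s: &str) -> String {
    s.replace("\r\n", "\n")
}

/// Rejected alternative: also map a bare '\r' (classic Mac OS) to '\n'.
fn crlf_and_bare_cr(s: &str) -> String {
    s.replace("\r\n", "\n").replace('\r', "\n")
}

fn main() {
    // Line 1 ends in CRLF; line 2 contains a bare '\r' mid-line.
    let original = "first\r\nsecond\rstill second\n";
    let lines = split_keep_terminators(original);
    assert_eq!(lines.len(), 2);

    // CRLF-only normalization keeps the count, so any hunk index from the
    // normalized diff is a valid index into `lines`.
    assert_eq!(split_keep_terminators(&crlf_only(original)).len(), lines.len());

    // Mapping the bare '\r' too adds a line: a hunk touching index 2 would
    // be out of bounds for `lines`.
    assert_eq!(split_keep_terminators(&crlf_and_bare_cr(original)).len(), 3);
}
```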

svarlamov · 2026-04-14T04:56:53Z

CI Status — Ubuntu Checks

All Ubuntu-based checks are green except Test on ubuntu-latest (daemon), which failed on an unrelated flaky test:

squash_merge::test_prepare_working_log_squash_with_main_changes_standard_human_in_worktree
Line 4: Expected AI author but got 'Test User'

This is pre-existing flakiness in this job: the same Test on ubuntu-latest (daemon) job also fails on main (run 24362817360), there with a different test (graphite::test_gt_create_then_squash_then_fold, a daemon timeout). The failing test is in the squash merge integration path, which is unrelated to the CRLF normalization changes in this PR.

Will re-run the failed job once the current run completes.

When a previous checkpoint stores a CRLF blob and the working tree converts to LF without content changes, the checkpoint now updates the stored blob to LF (remapping attributions via a line-number roundtrip) instead of skipping it entirely. This prevents the next AI checkpoint from seeing every line as changed because of CRLF vs LF byte differences in capture_diff_slices, which, with force_split=true, would re-attribute every line to AI. Addresses Devin PR review feedback on #1075.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

svarlamov · 2026-04-14T05:34:01Z

Addressed Devin review feedback (stale CRLF blob re-attribution)

Fixed in f872ecd. When content differs only in line endings (CRLF ↔ LF), the checkpoint now updates the stored blob to LF instead of skipping entirely. Attributions are preserved via a line-attribution roundtrip (char attrs → line attrs using old content → char attrs using new content), correctly remapping byte offsets from CRLF to LF space.
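A rough sketch of that roundtrip, under stated assumptions: CharAttr, line_ranges(), to_line_attrs() and to_char_attrs() are hypothetical names, attributions are byte ranges, each line takes the author of the first range overlapping it, and the old and new contents have the same line count (which CRLF-only normalization guarantees). It shows byte ranges measured against a CRLF blob landing on the matching LF offsets.

```rust
/// Hypothetical attribution record: `author` owns bytes start..end.
#[derive(Debug, Clone, PartialEq)]
struct CharAttr {
    start: usize,
    end: usize,
    author: &'static str,
}

/// Byte range of each line (terminator included) in `content`.
fn line_ranges(content: &str) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    let mut pos = 0;
    for line in content.split_inclusive('\n') {
        out.push((pos, pos + line.len()));
        pos += line.len();
    }
    out
}

/// char attrs -> one author per line, measured against the old content.
/// Assumption: a line takes the author of the first range overlapping it.
fn to_line_attrs(attrs: &[CharAttr], old: &str) -> Vec<Option<&'static str>> {
    line_ranges(old)
        .iter()
        .map(|&(ls, le)| attrs.iter().find(|a| a.start < le && a.end > ls).map(|a| a.author))
        .collect()
}

/// line attrs -> char attrs, measured against the new content.
fn to_char_attrs(lines: &[Option<&'static str>], new: &str) -> Vec<CharAttr> {
    let mut out: Vec<CharAttr> = Vec::new();
    for (&(start, end), author) in line_ranges(new).iter().zip(lines) {
        let Some(author) = *author else { continue };
        // Extend the previous range when the same author continues.
        if let Some(last) = out.last_mut() {
            if last.author == author && last.end == start {
                last.end = end;
                continue;
            }
        }
        out.push(CharAttr { start, end, author });
    }
    out
}

fn main() {
    // Stale CRLF blob: line 2 was written by AI, lines 1 and 3 by a human.
    let old = "one\r\ntwo\r\nthree\r\n";
    let attrs = vec![
        CharAttr { start: 0, end: 5, author: "human" },
        CharAttr { start: 5, end: 10, author: "ai" },
        CharAttr { start: 10, end: 17, author: "human" },
    ];
    let new = "one\ntwo\nthree\n"; // same text, LF endings
    let remapped = to_char_attrs(&to_line_attrs(&attrs, old), new);
    assert_eq!(remapped, vec![
        CharAttr { start: 0, end: 4, author: "human" },
        CharAttr { start: 4, end: 8, author: "ai" },
        CharAttr { start: 8, end: 14, author: "human" },
    ]);
}
```

Going through whole lines, rather than shifting offsets by the number of removed \r bytes, keeps the remap simple and independent of where the \r bytes sat in the file.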

Added regression test test_checkpoint_stale_crlf_blob_causes_ai_reattribution that demonstrates the exact scenario: stale CRLF blob + AI checkpoint with force_split=true re-attributes all 5 lines to AI when only 1 was actually added.

svarlamov and others added 2 commits April 14, 2026 04:17

style: fix formatting (6d9f924)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


svarlamov merged commit 47de6ad into main Apr 14, 2026
44 of 46 checks passed

svarlamov deleted the worktree-crlf-line-ending-fix branch April 14, 2026 15:33


svarlamov merged 4 commits into main from worktree-crlf-line-ending-fix

svarlamov commented Apr 14, 2026 • edited by devin-ai-integration bot

1 participant
