
fix(read): truncate long lines at 500 chars as documented #38

Open
T0mSIlver wants to merge 1 commit into microsoft:main from T0mSIlver:fix/read-line-length

@T0mSIlver

Problem

read.md states:

Any lines longer than 500 characters will be truncated to 500 characters with '...' appended to the end.

But the code sets MAX_LINE_LENGTH = 2000, so lines of up to 2000 chars were returned in full: a 4x mismatch between the documented and the actual line-truncation length.

Fix

Set MAX_LINE_LENGTH = 500 to honor the prompt. MAX_LINE (the 2000-line file cap, which is correctly documented) is left unchanged. Adds tests/test_read_truncation.py asserting a 600-char line is truncated to 500 + "..." and that the constant matches the doc.
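
As a rough illustration of the behavior the new test is meant to pin down, here is a minimal sketch. Only MAX_LINE_LENGTH and the read.md sentence come from this PR; truncate_line, documented_limit, TRUNCATION_SUFFIX, and DOC_SENTENCE are hypothetical names used for illustration and are not taken from the repository or from tests/test_read_truncation.py.

```python
import re

MAX_LINE_LENGTH = 500      # value proposed by this PR (previously 2000)
TRUNCATION_SUFFIX = "..."  # hypothetical name for the marker read.md describes

# Sentence quoted from read.md in the PR description.
DOC_SENTENCE = (
    "Any lines longer than 500 characters will be truncated "
    "to 500 characters with '...' appended to the end."
)


def truncate_line(line: str, limit: int = MAX_LINE_LENGTH) -> str:
    """Hypothetical helper: keep the first `limit` chars and append the suffix."""
    if len(line) <= limit:
        return line
    return line[:limit] + TRUNCATION_SUFFIX


def documented_limit(doc_text: str) -> int:
    """Hypothetical helper: extract the documented limit from the doc sentence."""
    match = re.search(r"longer than (\d+) characters", doc_text)
    if match is None:
        raise ValueError("no documented line-length limit found")
    return int(match.group(1))


def test_long_line_is_truncated():
    # A 600-char line should come back as 500 chars plus "...".
    assert truncate_line("x" * 600) == "x" * 500 + "..."


def test_line_at_limit_is_unchanged():
    # Exactly 500 chars is not "longer than 500", so it passes through.
    assert truncate_line("x" * 500) == "x" * 500


def test_constant_matches_doc():
    assert documented_limit(DOC_SENTENCE) == MAX_LINE_LENGTH
```

Note that a truncated line is 503 chars long once the suffix is appended, so the per-line cap is slightly above 500 in practice.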

Why 500, not 2000

The paper's Read schema specifies no line-length or line-count truncation values (FastContext paper, arXiv:2606.14066, Appendix E, p. 19), so the exact number isn't spec-mandated. But 500 is the principled choice: it keeps a single Read's worst-case output within the model's context budget.

  • MAX_LINE * MAX_LINE_LENGTH = 2000 * 500 = 1,000,000 chars (1M)
  • At the common ~4 chars/token heuristic → 1M / 4 ≈ 250k tokens
  • ~250k tokens fits comfortably within the model's max sequence length.

Keeping MAX_LINE_LENGTH = 2000 instead would make the worst case 2000 * 2000 = 4M chars ≈ 1M tokens, which is 4x larger and would exceed typical context limits. So beyond matching read.md, 500 is the value that balances the two caps to a sane ~250k-token ceiling per Read call.
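
The worst-case arithmetic above can be reproduced with a short script. This is a back-of-the-envelope sketch: worst_case and CHARS_PER_TOKEN are hypothetical names, the 4 chars/token figure is the heuristic quoted above rather than a measured value, and the three-character "..." suffix per truncated line is ignored.

```python
MAX_LINE = 2000       # line-count cap per Read call, as stated in this PR
CHARS_PER_TOKEN = 4   # rough heuristic from the PR description, not measured


def worst_case(max_line_length: int,
               max_lines: int = MAX_LINE,
               chars_per_token: int = CHARS_PER_TOKEN) -> tuple[int, int]:
    """Return (chars, approx_tokens) for a Read where every line hits the cap."""
    chars = max_lines * max_line_length
    return chars, chars // chars_per_token


for limit in (500, 2000):
    chars, tokens = worst_case(limit)
    print(f"MAX_LINE_LENGTH={limit}: {chars:,} chars, ~{tokens:,} tokens")
```

With these inputs the script reports 1,000,000 chars (about 250,000 tokens) for a limit of 500 and 4,000,000 chars (about 1,000,000 tokens) for a limit of 2000.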

read.md states 'Any lines longer than 500 characters will be truncated
to 500 characters with ...' but MAX_LINE_LENGTH was 2000, so lines up to
2000 chars passed through untruncated. Set MAX_LINE_LENGTH to 500 to
match the documented contract (MAX_LINE, the 2000-line file cap, is
correct and unchanged), and add a test covering the truncation length.
@T0mSIlver T0mSIlver closed this Jun 27, 2026
@T0mSIlver T0mSIlver deleted the fix/read-line-length branch June 27, 2026 11:32
@T0mSIlver T0mSIlver restored the fix/read-line-length branch June 27, 2026 11:56
@T0mSIlver T0mSIlver reopened this Jun 27, 2026