fix(read): truncate long lines at 500 chars as documented #38
Open
T0mSIlver wants to merge 1 commit into
read.md states 'Any lines longer than 500 characters will be truncated to 500 characters with ...' but MAX_LINE_LENGTH was 2000, so lines up to 2000 chars passed through untruncated. Set MAX_LINE_LENGTH to 500 to match the documented contract (MAX_LINE, the 2000-line file cap, is correct and unchanged), and add a test covering the truncation length.
Problem
read.md states:
'Any lines longer than 500 characters will be truncated to 500 characters with ...'
But MAX_LINE_LENGTH = 2000, so lines up to 2000 chars were returned in full: a 4x mismatch between the documented and actual line-truncation length.
Fix
Set MAX_LINE_LENGTH = 500 to honor the prompt. MAX_LINE (the 2000-line file cap, which is correctly documented) is left unchanged. Adds tests/test_read_truncation.py asserting a 600-char line is truncated to 500 + "..." and that the constant matches the doc.
Why 500, not 2000
The paper's Read schema specifies no line-length or line-count truncation values (FastContext paper, arXiv:2606.14066, Appendix E, p. 19), so the exact number isn't spec-mandated. But 500 is the principled choice: it keeps a single Read's worst-case output within the model's context budget.
MAX_LINE * MAX_LINE_LENGTH = 2000 * 500 = 1,000,000 chars (1M)
1M / 4 ≈ 250k tokens (assuming roughly 4 chars per token)
Keeping MAX_LINE_LENGTH = 2000 instead would make the worst case 2000 * 2000 = 4M chars ≈ 1M tokens, 4x larger, which would blow past typical context limits. So beyond matching read.md, 500 is the value that balances the two caps to a sane ~250k-token ceiling per Read call.
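The truncation contract and the worst-case arithmetic above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the repository's actual code: truncate_line and worst_case_chars are hypothetical helpers, the 4 chars/token ratio is only the rough estimate used in this description, and the bound ignores the "..." suffix and any per-line prefix the tool may add.

```python
MAX_LINE = 2000          # file cap: maximum number of lines returned per Read
MAX_LINE_LENGTH = 500    # per-line cap, matching read.md
CHARS_PER_TOKEN = 4      # rough estimate (assumption), not a measured value


def truncate_line(line: str, limit: int = MAX_LINE_LENGTH) -> str:
    """Hypothetical helper: keep at most `limit` chars, then append "..."."""
    if len(line) <= limit:
        return line
    return line[:limit] + "..."


def worst_case_chars(max_lines: int, max_line_length: int) -> int:
    """Upper bound on line content returned by one Read (suffixes ignored)."""
    return max_lines * max_line_length


if __name__ == "__main__":
    # A 600-char line is cut to 500 chars plus the "..." marker.
    truncated = truncate_line("x" * 600)
    print(len(truncated))  # 503

    # Compare the per-Read ceiling for the new and the old line cap.
    for cap in (500, 2000):
        chars = worst_case_chars(MAX_LINE, cap)
        print(cap, chars, chars // CHARS_PER_TOKEN)
```

With the 500-char cap the estimate comes to 1,000,000 chars (about 250k tokens); with the old 2000-char cap it comes to 4,000,000 chars (about 1M tokens).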