perf: buffer-at-a-time search for literal patterns by sylvestre · Pull Request #16 · uutils/grep

sylvestre · 2026-05-31T08:58:21Z

Literal searches were ~50-70x slower than GNU grep because every line paid per-line costs (terminator scan, NUL scan, dispatch) even when a buffer held no match. Add a buffer-at-a-time driver that scans whole chunks with a substring searcher and only locates line boundaries around the matches it finds; a chunk with no match costs a single vectorized sweep and no per-line work.
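As a rough illustration of the idea described above (not the PR's actual code), a buffer-at-a-time scan might look like the sketch below. It uses only the standard library: the hypothetical `find` helper stands in for a SIMD substring search such as the memchr crate's memmem, and `rposition`/`position` stand in for memrchr/memchr. The name `matching_lines` is also an assumption made for this example, and empty needles are out of scope.

```rust
/// Hypothetical stand-in for a vectorized substring search (e.g. memmem).
/// Empty needles are treated as "no match" in this sketch.
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return None;
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Return the byte range (terminator excluded) of every line in `chunk`
/// that contains `needle`. Assumes `needle` contains no newline.
fn matching_lines(chunk: &[u8], needle: &[u8]) -> Vec<(usize, usize)> {
    let mut lines = Vec::new();
    let mut pos = 0;
    while pos < chunk.len() {
        // One sweep over the rest of the chunk; if nothing matches,
        // no line boundary is ever looked for.
        let Some(off) = find(&chunk[pos..], needle) else { break };
        let hit = pos + off;
        // Line start: the byte after the previous newline (memrchr stand-in).
        let start = chunk[..hit]
            .iter()
            .rposition(|&b| b == b'\n')
            .map_or(0, |i| i + 1);
        // Line end: the next newline at or after the hit (memchr stand-in).
        let end = chunk[hit..]
            .iter()
            .position(|&b| b == b'\n')
            .map_or(chunk.len(), |i| hit + i);
        lines.push((start, end));
        // Resume after this line so it is reported only once.
        pos = end + 1;
    }
    lines
}
```

Calling `matching_lines(b"foo\nbar baz\nqux bar\n", b"bar")` yields the ranges of the second and third lines, while a chunk without the needle returns an empty vector after a single search call.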

codspeed-hq · 2026-05-31T09:01:17Z

Merging this PR will improve performance by ×19

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments, which may affect the accuracy of the results.

⚡ 3 improved benchmarks
✅ 7 untouched benchmarks
⏩ 17 skipped benchmarks¹

Performance Changes

	Benchmark	`BASE`	`HEAD`	Efficiency
⚡	`literal_no_match`	27.6 ms	1.3 ms	×21
⚡	`search_pattern`	29.5 ms	1.6 ms	×18
⚡	`fixed_string`	29.5 ms	1.6 ms	×18


Comparing literal-fast-path (56d774f) with main (f4798cb)

¹ 17 benchmarks were skipped, so the baseline results were used instead.

codecov · 2026-05-31T09:02:12Z

Codecov Report

❌ Patch coverage is 98.51632% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.67%. Comparing base (c614a57) to head (b3d70c0).

Files with missing lines	Patch %	Lines
src/searcher.rs	97.02%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #16      +/-   ##
==========================================
+ Coverage   95.28%   95.67%   +0.38%     
==========================================
  Files           6        6              
  Lines        1422     1758     +336     
  Branches      140      188      +48     
==========================================
+ Hits         1355     1682     +327     
- Misses         66       75       +9     
  Partials        1        1

Flag	Coverage Δ
macOS_latest	`96.50% <99.40%> (+0.40%)`	⬆️
ubuntu_latest	`96.50% <99.40%> (+0.40%)`	⬆️
windows_latest	`0.00% <0.00%> (ø)`


Literal searches were ~50-70x slower than GNU grep because every line paid per-line costs (terminator scan, NUL scan, dispatch) even when a buffer held no match. Add a buffer-at-a-time driver that scans whole chunks with a substring searcher and only locates line boundaries around the matches it finds; a chunk with no match costs a single vectorized sweep and no per-line work.

The driver activates only for plain ASCII literal patterns (case sensitive, no metacharacters) in the simpler output modes: -c, -l, -L, -q, and plain line printing with -n/-b/filename/-m. Anything needing match positions, context, inversion, color, or special binary handling falls back to the unchanged line-at-a-time path. Output stays byte-identical to that path, including binary/invalid-UTF-8 behavior.

- line_buffer: read_chunk() yields the largest span of complete lines.
- matcher: expose per-pattern memmem searchers when every pattern is a plain literal (plain_literal()).
- searcher: eligible_for_fast_path(), fast_locate(), fast_print().

All scanning rides on the memchr crate (SIMD memchr/memrchr/memmem). Unit tests for read_chunk and plain_literal; integration tests for prefixes, -m, and multi-chunk line-number correctness.

Benchmarks (31 MB corpus) vs prior release:

- -F (no match): 232ms -> 15ms (15.9x; now faster than GNU)
- -c literal: 229ms -> 15ms (15.2x)
- plain print: 248ms -> 18ms (13.5x)

Regex and -i paths are unchanged (still the line-at-a-time engine).
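To make the "largest span of complete lines" behavior of read_chunk() concrete, here is a minimal standard-library sketch. It is an assumption about how such a reader could work, not the code in line_buffer: `split_complete_lines` and `for_each_chunk` are hypothetical names, and `rposition` stands in for memrchr. The buffer is cut at its last newline, the complete part is handed to the caller, and the partial tail is carried over to the next read.

```rust
use std::io::{self, Read};

/// Hypothetical helper: split `buf` into (complete lines, partial tail).
fn split_complete_lines(buf: &[u8]) -> (&[u8], &[u8]) {
    match buf.iter().rposition(|&b| b == b'\n') {
        Some(i) => buf.split_at(i + 1),
        None => (&buf[..0], buf),
    }
}

/// Hypothetical driver: call `f` with successive spans of complete lines.
/// `cap` is the initial buffer size in bytes and must be non-zero.
fn for_each_chunk<R: Read>(
    mut reader: R,
    cap: usize,
    mut f: impl FnMut(&[u8]),
) -> io::Result<()> {
    assert!(cap > 0, "buffer capacity must be non-zero");
    let mut buf = vec![0u8; cap];
    let mut filled = 0;
    loop {
        if filled == buf.len() {
            // A single line longer than the buffer: grow instead of splitting it.
            buf.resize(buf.len() * 2, 0);
        }
        let n = reader.read(&mut buf[filled..])?;
        if n == 0 {
            // EOF: any leftover bytes form a final line without a terminator.
            if filled > 0 {
                f(&buf[..filled]);
            }
            return Ok(());
        }
        filled += n;
        let complete = split_complete_lines(&buf[..filled]).0.len();
        if complete > 0 {
            f(&buf[..complete]);
            // Move the partial tail to the front for the next read.
            buf.copy_within(complete..filled, 0);
            filled -= complete;
        }
    }
}
```

Because every span passed to `f` ends on a line boundary (except possibly the last one at EOF), a searcher working on it never sees a line cut in half. The sketch propagates every I/O error as-is, including interrupted reads, which a production reader would likely retry.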

The buffer-at-a-time fast path now serves the literal patterns that the existing -l/-L/-q and binary tests used, leaving the line-at-a-time engine's equivalents uncovered. Add bracket-class (non-literal) tests for -l/-L/-q and binary handling (notice, -a text, without-match bail, and the finalize-time notice), plus a fast-path test for a NUL that is only discovered after a line was already printed. No dead code was found: the remaining uncovered lines are writer I/O error-propagation arms and pre-existing filesystem error handlers.

sylvestre · 2026-06-05T06:28:01Z

Auto-merged before they bit-rot with all the contributions coming in.

lhecker · 2026-06-05T14:05:39Z

I apologize for not reviewing it earlier. I'm currently being swamped over in https://github.com/microsoft/coreutils.

I have to extra-apologize for this PR, because I personally do not believe that it is going in an ideal direction. It introduces a distinct path for fixed patterns, but such invocations are not any more common than other invocations as far as I can tell. The added code path is rather large and so is not worth the cost in my opinion. Lastly, it uses the same memrchr approach that ripgrep uses, which I do not consider a good idea from a perf perspective.

The correct approach in my opinion is to always read in full chunks without memrchr. The complex regex cases can continue to use a memchr line iterator while the fixed string path should perform searches that overlap chunks by the length of the needle (thereby making the memrchr unnecessary).
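One way to read the overlapping-chunk suggestion is sketched below for the simplest case, a -q style "does it match at all" search. This is an interpretation, not code from either participant: `contains_across_chunks` and `find` are hypothetical names, and `find` is a standard-library stand-in for a SIMD substring search. Chunks are taken as they come, with no alignment to line boundaries, and only the last `needle.len() - 1` bytes are carried over so that a match straddling two chunks is still found.

```rust
/// Hypothetical stand-in for a vectorized substring search (e.g. memmem).
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return None;
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Return true if `needle` occurs anywhere in the concatenation of `chunks`.
/// Chunks may end in the middle of a line or in the middle of a match.
fn contains_across_chunks<'a>(
    chunks: impl IntoIterator<Item = &'a [u8]>,
    needle: &[u8],
) -> bool {
    assert!(!needle.is_empty(), "empty needles are out of scope here");
    // A match that crosses a chunk boundary has at most needle.len() - 1
    // bytes in the earlier chunk, so that is all that must be retained.
    let overlap = needle.len() - 1;
    let mut window: Vec<u8> = Vec::new();
    for chunk in chunks {
        window.extend_from_slice(chunk);
        if find(&window, needle).is_some() {
            return true;
        }
        // Drop everything except the tail that could still start a match.
        let keep = window.len().min(overlap);
        window.drain(..window.len() - keep);
    }
    false
}
```

Copying each chunk into `window` keeps the sketch short; a real implementation would more likely keep the overlap bytes at the front of the read buffer and read the next chunk in after them. Modes that print lines or line numbers would still need to locate newlines around each hit.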

An alternative approach in the meantime is to adopt more of ripgrep's approach and always use memrchr in the common path and then use memchr in specifically the per-line regex path only.

sylvestre · 2026-06-05T14:11:20Z

@lhecker no worries: revert here: #57

> The added code path is rather large and so is not worth the cost in my opinion.

not sure i agree: see #16 (comment) :)

sylvestre force-pushed the literal-fast-path branch from 0c9bbbe to 0846294 Compare May 31, 2026 09:17

sylvestre requested a review from lhecker May 31, 2026 09:28

sylvestre force-pushed the literal-fast-path branch 2 times, most recently from c3840c2 to b3d70c0 Compare May 31, 2026 16:13

sylvestre added 2 commits June 4, 2026 22:24

sylvestre force-pushed the literal-fast-path branch from b3d70c0 to 56d774f Compare June 4, 2026 20:24

sylvestre merged commit b4980df into main Jun 5, 2026
16 checks passed

sylvestre deleted the literal-fast-path branch June 5, 2026 06:29
