perf: buffer-at-a-time search for literal patterns#16

Merged: sylvestre merged 2 commits into main from literal-fast-path on Jun 5, 2026

@sylvestre
Contributor

Literal searches were ~50-70x slower than GNU grep because every line paid per-line costs (terminator scan, NUL scan, dispatch) even when a buffer held no match. Add a buffer-at-a-time driver that scans whole chunks with a substring searcher and only locates line boundaries around the matches it finds; a chunk with no match costs a single vectorized sweep and no per-line work.
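The idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `find` and `matching_lines` are hypothetical names, the naive `windows` search stands in for a SIMD substring searcher, and the needle is assumed to contain no newline.

```rust
/// Naive substring search. Stand-in (assumption) for a vectorized searcher.
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Returns each line of `chunk` that contains `needle`.
/// Lines without a match are never split out individually: the search runs
/// over the whole remaining chunk, and line boundaries are located only
/// around the hits.
fn matching_lines<'a>(chunk: &'a [u8], needle: &[u8]) -> Vec<&'a [u8]> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < chunk.len() {
        let hit = match find(&chunk[pos..], needle) {
            Some(rel) => pos + rel,
            None => break, // no further match: one sweep, no per-line work
        };
        // Line start: the byte after the previous '\n', or the chunk start.
        let start = chunk[..hit]
            .iter()
            .rposition(|&b| b == b'\n')
            .map_or(0, |i| i + 1);
        // Line end: the next '\n' at or after the hit, or the chunk end.
        let end = chunk[hit..]
            .iter()
            .position(|&b| b == b'\n')
            .map_or(chunk.len(), |i| hit + i);
        out.push(&chunk[start..end]);
        // Resume after this line so it is reported only once.
        pos = end + 1;
    }
    out
}
```

Calling `matching_lines(b"alpha\nbeta needle\ngamma\n", b"needle")` yields the single line `beta needle`; a chunk with no occurrence returns an empty vector after one pass.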

@codspeed-hq

codspeed-hq Bot commented May 31, 2026

Merging this PR will improve performance by ×19

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 3 improved benchmarks
✅ 7 untouched benchmarks
⏩ 17 skipped benchmarks [1]

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- |
| literal_no_match | 27.6 ms | 1.3 ms | ×21 |
| search_pattern | 29.5 ms | 1.6 ms | ×18 |
| fixed_string | 29.5 ms | 1.6 ms | ×18 |

Comparing literal-fast-path (56d774f) with main (f4798cb)

Footnotes

  1. 17 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@codecov

codecov Bot commented May 31, 2026

Codecov Report

❌ Patch coverage is 98.51632% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.67%. Comparing base (c614a57) to head (b3d70c0).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/searcher.rs | 97.02% | 5 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #16      +/-   ##
==========================================
+ Coverage   95.28%   95.67%   +0.38%     
==========================================
  Files           6        6              
  Lines        1422     1758     +336     
  Branches      140      188      +48     
==========================================
+ Hits         1355     1682     +327     
- Misses         66       75       +9     
  Partials        1        1              

| Flag | Coverage Δ |
| --- | --- |
| macOS_latest | 96.50% <99.40%> (+0.40%) ⬆️ |
| ubuntu_latest | 96.50% <99.40%> (+0.40%) ⬆️ |
| windows_latest | 0.00% <0.00%> (ø) |

@sylvestre sylvestre force-pushed the literal-fast-path branch from 0c9bbbe to 0846294 Compare May 31, 2026 09:17
@sylvestre sylvestre requested a review from lhecker May 31, 2026 09:28
@sylvestre sylvestre force-pushed the literal-fast-path branch 2 times, most recently from c3840c2 to b3d70c0 Compare May 31, 2026 16:13
sylvestre added 2 commits June 4, 2026 22:24
Literal searches were ~50-70x slower than GNU grep because every line
paid per-line costs (terminator scan, NUL scan, dispatch) even when a
buffer held no match. Add a buffer-at-a-time driver that scans whole
chunks with a substring searcher and only locates line boundaries
around the matches it finds; a chunk with no match costs a single
vectorized sweep and no per-line work.

The driver activates only for plain ASCII literal patterns (case
sensitive, no metacharacters) in the simpler output modes: -c, -l, -L,
-q, and plain line printing with -n/-b/filename/-m. Anything needing
match positions, context, inversion, color, or special binary handling
falls back to the unchanged line-at-a-time path. Output stays
byte-identical to that path, including binary/invalid-UTF-8 behavior.
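The pattern half of that eligibility rule might look something like the sketch below. It is an illustration only: `is_plain_ascii_literal` is a hypothetical name, and the metacharacter set is an assumption (the exact set depends on the regex syntax in use), so it is not the `plain_literal()` or `eligible_for_fast_path()` logic from this change.

```rust
/// Hypothetical check: true when `pattern` is pure ASCII and contains none
/// of the bytes in `META`. The contents of `META` are an assumption made for
/// illustration; a real implementation would follow its regex dialect.
fn is_plain_ascii_literal(pattern: &str) -> bool {
    const META: &[u8] = b".[]()*+?{}|^$\\";
    pattern
        .bytes()
        .all(|b| b.is_ascii() && !META.contains(&b))
}
```

Under these assumptions `needle` qualifies, while `ne.dle`, `[n]eedle`, and any pattern with non-ASCII bytes fall through to the line-at-a-time path. Output-mode checks (context, inversion, color, and so on) would be a separate test and are not shown.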

- line_buffer: read_chunk() yields the largest span of complete lines.
- matcher: expose per-pattern memmem searchers when every pattern is a
  plain literal (plain_literal()).
- searcher: eligible_for_fast_path(), fast_locate(), fast_print().
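As a rough illustration of what "the largest span of complete lines" means for `read_chunk()`, the sketch below splits a filled buffer at its last newline. The function name `split_complete_lines` is hypothetical, and the reverse scan uses the standard library's `rposition` as a stand-in for a SIMD `memrchr`; the real `line_buffer` code may be organized differently.

```rust
/// Splits `buf` into (complete_lines, partial_tail).
/// `complete_lines` ends with the last '\n' in `buf`; `partial_tail` is the
/// unfinished line that would be carried over to the next read.
fn split_complete_lines(buf: &[u8]) -> (&[u8], &[u8]) {
    match buf.iter().rposition(|&b| b == b'\n') {
        // Everything up to and including the final newline is searchable now.
        Some(i) => buf.split_at(i + 1),
        // No newline yet: nothing is complete, the whole buffer is a tail.
        None => (&buf[..0], buf),
    }
}
```

For the input `a\nbb\ncc` this returns `a\nbb\n` as the searchable span and `cc` as the tail to prepend to the next read.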

All scanning rides on the memchr crate (SIMD memchr/memrchr/memmem).
Unit tests for read_chunk and plain_literal; integration tests for
prefixes, -m, and multi-chunk line-number correctness.

Benchmarks (31 MB corpus) vs prior release:
  -F (no match):  232ms -> 15ms  (15.9x; now faster than GNU)
  -c literal:     229ms -> 15ms  (15.2x)
  plain print:    248ms -> 18ms  (13.5x)
Regex and -i paths are unchanged (still the line-at-a-time engine).

The buffer-at-a-time fast path now serves the literal patterns that the
existing -l/-L/-q and binary tests used, leaving the line-at-a-time
engine's equivalents uncovered. Add bracket-class (non-literal) tests
for -l/-L/-q and binary handling (notice, -a text, without-match bail,
and the finalize-time notice), plus a fast-path test for a NUL that is
only discovered after a line was already printed.

No dead code was found: the remaining uncovered lines are writer I/O
error-propagation arms and pre-existing filesystem error handlers.
@sylvestre sylvestre force-pushed the literal-fast-path branch from b3d70c0 to 56d774f Compare June 4, 2026 20:24
@sylvestre
Contributor Author

Auto-merged before the commits bit-rot under all the incoming contributions.

@sylvestre sylvestre merged commit b4980df into main Jun 5, 2026
16 checks passed
@sylvestre sylvestre deleted the literal-fast-path branch June 5, 2026 06:29
@lhecker
Collaborator

lhecker commented Jun 5, 2026

I apologize for not reviewing it earlier. I'm currently being swamped over in https://github.com/microsoft/coreutils.

I have to extra-apologize for this PR, because I personally do not believe that it is going in an ideal direction. It introduces a distinct path for fixed patterns, but such invocations are not any more common than other invocations as far as I can tell. The added code path is rather large and so is not worth the cost in my opinion. Lastly, it uses the same memrchr approach that ripgrep uses, and I do not consider that a good idea from a perf perspective.

The correct approach in my opinion is to always read in full chunks without memrchr. The complex regex cases can continue to use a memchr line iterator while the fixed string path should perform searches that overlap chunks by the length of the needle (thereby making the memrchr unnecessary).
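The overlapping-chunk idea for fixed strings could be sketched as below. This is only an illustration of the suggestion, not code from either author: `stream_contains` is a hypothetical name, the naive `windows` search stands in for a SIMD searcher, and the overlap used here is `needle.len() - 1`, the smallest carry-over that still catches a match straddling two reads.

```rust
use std::io::{self, Read};

/// Reports whether `needle` occurs anywhere in `reader`.
/// Reads fixed-size chunks with no backward newline scan; the last
/// `needle.len() - 1` bytes of each pass are carried to the front of the
/// buffer so a match split across two reads is still found.
fn stream_contains<R: Read>(
    mut reader: R,
    needle: &[u8],
    chunk_size: usize,
) -> io::Result<bool> {
    if needle.is_empty() {
        return Ok(true);
    }
    let overlap = needle.len() - 1;
    // Room for the carried bytes plus at least one fresh byte per read.
    let mut buf = vec![0u8; overlap + chunk_size.max(1)];
    let mut carried = 0; // bytes kept from the previous pass
    loop {
        let n = reader.read(&mut buf[carried..])?;
        if n == 0 {
            return Ok(false); // end of input, no match seen
        }
        let filled = carried + n;
        if buf[..filled].windows(needle.len()).any(|w| w == needle) {
            return Ok(true);
        }
        // Keep only the tail that could begin a match finished by the next read.
        let keep = overlap.min(filled);
        buf.copy_within(filled - keep..filled, 0);
        carried = keep;
    }
}
```

With a `chunk_size` of 1 and the input `xxneedlexx`, the needle `needle` is split across several reads and is still found. Reporting the matching line, rather than just presence, would additionally need forward and backward newline scans around each hit, which this sketch leaves out.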

An alternative approach in the meantime is to adopt more of ripgrep's approach and always use memrchr in the common path and then use memchr in specifically the per-line regex path only.

@sylvestre
Contributor Author

@lhecker no worries: revert here: #57

> The added code path is rather large and so is not worth the cost in my opinion.

not sure i agree: see #16 (comment) :)
