Optimizations by fr3akX · Pull Request #271 · tafia/quick-protobuf

fr3akX · 2026-02-27T10:14:12Z

No description provided.

Replace all 32 instances of incorrect `#[cfg_attr(std, inline)]` syntax with `#[cfg_attr(feature = "std", inline)]`. The `cfg(std)` predicate was never true, so no inline hints were being applied to reader functions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
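To illustrate the difference: Cargo exposes an enabled feature to the compiler as `feature = "std"`, whereas a bare `std` predicate is only true when `--cfg std` is passed explicitly. A minimal sketch of the corrected attribute follows; `zigzag_decode32` is a hypothetical helper chosen for the example, not a function taken from the PR.

```rust
// Hypothetical helper, used only to show the attribute in context.
// With the `std` feature enabled this expands to `#[inline]`;
// without it the attribute is dropped and the function is unchanged.
#[cfg_attr(feature = "std", inline)]
pub fn zigzag_decode32(n: u32) -> i32 {
    // Standard protobuf zigzag decoding: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
    ((n >> 1) as i32) ^ -((n & 1) as i32)
}
```

The function behaves identically with or without the feature; only the inlining hint changes.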


- Remove unnecessary unsafe in varint fast paths (use safe slice creation)
- Fix read_len bounds check to prevent overflow on 32-bit targets
- Cap read_packed capacity hint to avoid excessive allocation
- Fix read_varint64 fast path to advance self.start on Error::Varint
- Add truncation comment in varint64 fast path matching slow path docs
- Fix cfg_attr(std, inline) -> cfg_attr(feature = "std", inline) in writer.rs
- Add fast-path tests for varint32 (1-4 byte) and varint64 (1-9 byte)
- Add fast-path overflow error tests for both varint32 and varint64
- Improve error assertions to check specific error variants

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
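The read_len fix in the list above is about arithmetic that can wrap: on a 32-bit target, `start + len` may overflow `usize` when `len` comes from untrusted input. One way to avoid that is to compare `len` against the remaining bytes instead. The sketch below assumes that approach; `LenError` and `checked_field_end` are hypothetical names, not the crate's API.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum LenError {
    UnexpectedEndOfBuffer,
}

/// Returns the exclusive end offset of a length-delimited field that
/// starts at `start`, or an error if it would run past `end`.
pub fn checked_field_end(start: usize, end: usize, len: usize) -> Result<usize, LenError> {
    // `end - start` cannot overflow once we know start <= end.
    let remaining = end
        .checked_sub(start)
        .ok_or(LenError::UnexpectedEndOfBuffer)?;
    // Comparing against `remaining` avoids computing `start + len` before
    // it is known to fit, which is where a 32-bit usize could wrap.
    if len > remaining {
        return Err(LenError::UnexpectedEndOfBuffer);
    }
    Ok(start + len)
}
```

With `start = 4` and `end = 10`, a length of 6 is accepted and a length of `usize::MAX` is rejected without any wrapping addition.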




All benchmarks pass with no regressions. Fixed-width reads (~7.6-7.8us/10k) are ~3x faster than varint reads (~24us/10k), confirming from_le_bytes optimization effectiveness.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


- read_fixed*: use .get().ok_or() for defensive bounds check against bytes.len(), restoring graceful error (vs panic) when bytes slice is shorter than self.end
- read_u8: add self.end boundary check to prevent sub-reader escapes
- read_packed: divide capacity hint by size_of::<M>() to avoid over-allocating by up to 8x for wide element types
- tests: add PackedFixed::Borrowed variant coverage to test_packed_fixed_size_hint
- tests: add varint64 fast-path tests for 3-byte, 4-byte, 6-byte, 7-byte encodings
- docs: add Changelog entry for v0.8.2
- docs: create CLAUDE.md with architecture patterns and conventions
- fix: correct doc comment typo and indentation in reader.rs module header

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
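The read_fixed* and read_packed items above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `SketchReader`, `FixedError`, and `packed_capacity_hint` are hypothetical names, and the real reader has more state than two offsets.

```rust
use core::mem::size_of;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum FixedError {
    UnexpectedEndOfBuffer,
}

/// Hypothetical stand-in for a reader that tracks a window into `bytes`.
pub struct SketchReader {
    start: usize,
    end: usize,
}

impl SketchReader {
    pub fn new(start: usize, end: usize) -> Self {
        SketchReader { start, end }
    }

    /// Reads a little-endian u32, failing gracefully instead of panicking.
    pub fn read_fixed32(&mut self, bytes: &[u8]) -> Result<u32, FixedError> {
        let stop = self
            .start
            .checked_add(4)
            .ok_or(FixedError::UnexpectedEndOfBuffer)?;
        // Respect the reader's own boundary (e.g. a sub-message window).
        if stop > self.end {
            return Err(FixedError::UnexpectedEndOfBuffer);
        }
        // `.get()` also covers the case where `bytes` is shorter than `end`.
        let chunk = bytes
            .get(self.start..stop)
            .ok_or(FixedError::UnexpectedEndOfBuffer)?;
        let mut buf = [0u8; 4];
        buf.copy_from_slice(chunk);
        self.start = stop;
        Ok(u32::from_le_bytes(buf))
    }
}

/// Capacity hint in elements rather than bytes for a packed field.
pub fn packed_capacity_hint<M>(byte_len: usize) -> usize {
    // `.max(1)` guards against zero-sized element types.
    byte_len / size_of::<M>().max(1)
}
```

Dividing by the element size means a 64-byte packed field of `u64` values reserves room for 8 elements rather than 64.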

Add a branchless varint32 fast path on aarch64 that loads 8 bytes as a u64 and uses bit manipulation to find the varint length and extract the value without per-byte branches. The existing scalar fast path is preserved under cfg(not(target_arch = "aarch64")). Includes direct tests for the decode_varint32_branchless helper function covering all varint sizes (1-5 bytes) and the negative i32 case.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
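The general idea of decoding from a single 8-byte load can be sketched in portable Rust. This is not the PR's `decode_varint32_branchless`; the function name, signature, and the choice to return `None` for anything longer than 5 bytes (leaving that to a slower path) are assumptions made for the example.

```rust
/// Decodes one varint32 from an 8-byte window without a per-byte loop.
/// Returns the value and the number of bytes consumed, or `None` when the
/// window holds no terminator or the encoding is longer than 5 bytes.
pub fn decode_varint32_sketch(chunk: [u8; 8]) -> Option<(u32, usize)> {
    const MSB: u64 = 0x8080_8080_8080_8080;
    let word = u64::from_le_bytes(chunk);

    // A byte with its top bit clear terminates the varint.
    let stop = !word & MSB;
    if stop == 0 {
        return None;
    }
    // The lowest set bit of `stop` sits at bit 7 of the terminating byte.
    let len = ((stop.trailing_zeros() as usize) >> 3) + 1;
    if len > 5 {
        return None;
    }

    // Drop any bytes that follow the terminator.
    let w = word & (u64::MAX >> (64 - 8 * len));

    // Squeeze the five 7-bit payloads together: byte i moves right by i bits.
    let value = (w & 0x7f)
        | ((w >> 1) & (0x7f << 7))
        | ((w >> 2) & (0x7f << 14))
        | ((w >> 3) & (0x7f << 21))
        | ((w >> 4) & (0x7f << 28));

    // Bits above 32 in a 5-byte encoding are discarded by this cast.
    Some((value as u32, len))
}
```

For the bytes `0xAC 0x02` followed by padding, the terminator is found in the second byte, so the length is 2 and the decoded value is 300.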


Add batch_decode_varint32_neon() function using ARM64 NEON intrinsics (vld1q_u8, vshrq_n_u8, vmulq_u8, vpaddlq_*) to detect varint boundaries in 16-byte chunks in parallel, then decode individual varints using the existing branchless scalar approach. Add read_packed_int32() method to BytesReader that uses the NEON batch path on aarch64 with scalar fallback. Includes 10 new tests covering mixed sizes, edge cases, and scalar equivalence validation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
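The boundary-detection step can be illustrated with a scalar stand-in that runs on any target. It deliberately uses no NEON intrinsics: it builds the same kind of per-byte terminator bitmask with a plain loop, then walks the set bits to decode each complete varint. Both function names are hypothetical and this is not the PR's `batch_decode_varint32_neon()`.

```rust
/// Bit i of the result is set when byte i ends a varint (top bit clear).
/// A SIMD version would compute this for all 16 lanes at once.
pub fn terminator_mask(chunk: &[u8; 16]) -> u16 {
    let mut mask = 0u16;
    for (i, &b) in chunk.iter().enumerate() {
        mask |= (((b >> 7) ^ 1) as u16) << i;
    }
    mask
}

/// Decodes every complete varint in the chunk as a u32.
/// Returns the values and the number of bytes consumed; a trailing
/// incomplete varint is left for the caller to handle.
pub fn decode_chunk(chunk: &[u8; 16]) -> (Vec<u32>, usize) {
    let mut mask = terminator_mask(chunk);
    let mut out = Vec::new();
    let mut start = 0usize;
    while mask != 0 {
        // Index of the next terminating byte.
        let end = mask.trailing_zeros() as usize;
        let mut value = 0u64;
        // Only the first 5 bytes can contribute to a 32-bit value.
        for (k, &b) in chunk[start..=end].iter().enumerate().take(5) {
            value |= ((b & 0x7f) as u64) << (7 * k);
        }
        // Truncate to 32 bits, discarding any higher payload bits.
        out.push(value as u32);
        start = end + 1;
        // Clear the lowest set bit and move to the next varint.
        mask &= mask - 1;
    }
    (out, start)
}
```

A chunk that begins with the encodings of 1, 300, and `u32::MAX` and ends in continuation bytes yields those three values and reports 8 bytes consumed, leaving the unfinished tail for the next chunk.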



Add ARM64/NEON optimization section to CLAUDE.md documenting branchless varint decode, NEON batch decode, and read_packed_int32 patterns. Move completed plan to docs/plans/completed/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add unit tests for all six rewritten fixed-width read methods (read_fixed32, read_fixed64, read_sfixed32, read_sfixed64, read_float, read_double) covering success, insufficient buffer, and sub-message boundary enforcement
- Add unit test for read_u8 sub-message boundary enforcement
- Update Changelog with missing ARM64/NEON optimization entries and new read_packed_int32 public API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fr3akX and others added 26 commits October 16, 2025 12:22

CHRN-27 - Update reader.rs (#1)

9e52ff4

add ralphex entries to .gitignore

233447e

feat: optimize varint32 with fast-path batch bounds check

ccd4f07

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat: optimize varint64 with fast-path batch bounds check

bd8a4bf

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat: add bounds check in read_len and capacity hint in read_packed

65063f5

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat: expand benchmarks for fixed32, fixed64, string, and packed reads

97baaf9

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat: verify acceptance criteria for reader optimizations

7e11452

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: address code review findings

9e29d8d

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

move completed plan: 2026-02-27-optimize-reader-hot-paths.md

07040fc

feat: replace generic read_fixed with direct from_le_bytes implementations

ec45864

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat: add #[cold] annotations to varint slow paths

bc09774

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat: add size_hint and ExactSizeIterator to PackedFixed iterators

d619949

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat: benchmark and validate reader optimizations

67094db

All benchmarks pass with no regressions. Fixed-width reads (~7.6-7.8us/10k) are ~3x faster than varint reads (~24us/10k), confirming from_le_bytes optimization effectiveness.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat: move completed plan to docs/plans/completed

e59d8ec

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat: add branchless varint64 decode for ARM64

9766b79

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat: add ARM64-targeted benchmarks for branchless varint and NEON batch decode

06e4c0b

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat: verify acceptance criteria for ARM64 NEON read path optimizations

1b705ac

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

perf

4b6093d

expose fields

1f072a2


Optimizations #271

fr3akX wants to merge 26 commits into tafia:master from coralogix:optimizations

fr3akX commented Feb 27, 2026


1 participant
