Optimizations #271

Open
fr3akX wants to merge 26 commits into tafia:master from coralogix:optimizations
Conversation

fr3akX commented Feb 27, 2026

No description provided.

fr3akX and others added 26 commits October 16, 2025 12:22
Replace all 32 instances of incorrect `#[cfg_attr(std, inline)]` syntax
with `#[cfg_attr(feature = "std", inline)]`. The `cfg(std)` predicate was
never true, so no inline hints were being applied to reader functions.
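A minimal sketch of the corrected attribute form, assuming the crate declares a Cargo feature named `std`. The function `sketch_read_byte` is hypothetical and is not taken from the PR; it only shows where the attribute sits.

```rust
// Hypothetical helper, for illustration only.
//
// `feature = "std"` is the predicate Cargo sets when the crate is built
// with its `std` feature enabled. A bare `std` predicate is a custom cfg
// name that Cargo never sets, so `#[cfg_attr(std, inline)]` expands to
// nothing and the inline hint is silently dropped.
#[cfg_attr(feature = "std", inline)]
pub fn sketch_read_byte(bytes: &[u8], pos: usize) -> Option<u8> {
    // `get` returns None instead of panicking when `pos` is out of range.
    bytes.get(pos).copied()
}
```

The function behaves the same with or without the feature; only the inline hint changes.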

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove unnecessary unsafe in varint fast paths (use safe slice creation)
- Fix read_len bounds check to prevent overflow on 32-bit targets
- Cap read_packed capacity hint to avoid excessive allocation
- Fix read_varint64 fast path to advance self.start on Error::Varint
- Add truncation comment in varint64 fast path matching slow path docs
- Fix cfg_attr(std, inline) -> cfg_attr(feature = "std", inline) in writer.rs
- Add fast-path tests for varint32 (1-4 byte) and varint64 (1-9 byte)
- Add fast-path overflow error tests for both varint32 and varint64
- Improve error assertions to check specific error variants
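The `read_len` bounds-check item above can be illustrated with a small sketch. It assumes the reader tracks a `start` offset and an `end` limit as `usize` values; the function name and error type are hypothetical, not the crate's actual API.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    UnexpectedEndOfBuffer,
}

/// Returns the end offset of a length-delimited field that begins at
/// `start`, or an error if the field would run past `end`.
pub fn sketch_field_end(start: usize, len: usize, end: usize) -> Result<usize, SketchError> {
    // A plain `start + len` can wrap around when `usize` is 32 bits wide
    // and `len` comes from untrusted input. `checked_add` reports the
    // overflow as None, which is mapped to an error here.
    match start.checked_add(len) {
        Some(field_end) if field_end <= end => Ok(field_end),
        _ => Err(SketchError::UnexpectedEndOfBuffer),
    }
}
```

For example, `sketch_field_end(2, 3, 10)` yields `Ok(5)`, while a length near `usize::MAX` yields the error instead of wrapping to a small offset.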

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All benchmarks pass with no regressions. Fixed-width reads (~7.6-7.8 µs/10k)
are ~3x faster than varint reads (~24 µs/10k), which confirms that the
from_le_bytes optimization is effective.
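A minimal sketch of a fixed-width read built on `from_le_bytes`, assuming a plain byte slice and a start offset. `sketch_read_fixed32` is a hypothetical free function, not the PR's `read_fixed32` method, and it returns `Option` rather than the crate's error type.

```rust
use std::convert::TryInto;

/// Reads a little-endian u32 at `start`, or returns None if fewer than
/// four bytes are available.
pub fn sketch_read_fixed32(bytes: &[u8], start: usize) -> Option<u32> {
    // checked_add avoids wrap-around for offsets close to usize::MAX.
    let end = start.checked_add(4)?;
    // `get` returns None for an out-of-range slice instead of panicking.
    let chunk: [u8; 4] = bytes.get(start..end)?.try_into().ok()?;
    // One conversion for the whole word, with no per-byte continuation checks.
    Some(u32::from_le_bytes(chunk))
}
```

The bytes `[0x78, 0x56, 0x34, 0x12]` decode to `0x12345678`, and a three-byte slice produces `None` rather than a panic.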

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- read_fixed*: use .get().ok_or() for defensive bounds check against bytes.len(),
  restoring graceful error (vs panic) when bytes slice is shorter than self.end
- read_u8: add self.end boundary check to prevent sub-reader escapes
- read_packed: divide capacity hint by size_of::<M>() to avoid over-allocating
  by up to 8x for wide element types
- tests: add PackedFixed::Borrowed variant coverage to test_packed_fixed_size_hint
- tests: add varint64 fast-path tests for 3-byte, 4-byte, 6-byte, 7-byte encodings
- docs: add Changelog entry for v0.8.2
- docs: create CLAUDE.md with architecture patterns and conventions
- fix: correct doc comment typo and indentation in reader.rs module header
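The `read_packed` capacity item above might look roughly like the following. This is a sketch under assumptions: the cap value `SKETCH_MAX_PREALLOC` and the function name are made up for illustration, and the PR's actual limit is not stated in the commit message.

```rust
use std::mem::size_of;

/// Hypothetical upper bound on pre-allocated elements.
const SKETCH_MAX_PREALLOC: usize = 4096;

/// Estimates how many elements of type `M` a packed payload of
/// `payload_len` bytes can hold, capped so that a hostile length prefix
/// cannot force a huge allocation up front.
pub fn sketch_capacity_hint<M>(payload_len: usize) -> usize {
    // Guard against zero-sized types to avoid dividing by zero.
    let elem_size = size_of::<M>().max(1);
    // Using the byte length directly would over-allocate by a factor of
    // `elem_size` (8x for u64 or f64), so divide first and cap afterwards.
    (payload_len / elem_size).min(SKETCH_MAX_PREALLOC)
}
```

The result is only a hint for `Vec::with_capacity`; the vector still grows as needed if the payload holds more elements than the estimate.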

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a branchless varint32 fast path on aarch64 that loads 8 bytes as a
u64 and uses bit manipulation to find the varint length and extract the
value without per-byte branches. The existing scalar fast path is
preserved under cfg(not(target_arch = "aarch64")). Includes direct
tests for the decode_varint32_branchless helper function covering all
varint sizes (1-5 bytes) and the negative i32 case.
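The idea described above can be sketched in portable Rust. This is not the PR's `decode_varint32_branchless`; `sketch_decode_varint32` is a hypothetical stand-in that assumes at least 8 readable bytes and returns `None` for anything longer than 5 bytes (including the 10-byte negative `i32` encoding), leaving those cases to a slower path.

```rust
use std::convert::TryInto;

/// Decodes a varint of 1 to 5 bytes from the first 8 bytes of `bytes`.
/// Returns the value and the number of bytes consumed.
pub fn sketch_decode_varint32(bytes: &[u8]) -> Option<(u32, usize)> {
    // Load 8 bytes as one little-endian word.
    let head: [u8; 8] = bytes.get(..8)?.try_into().ok()?;
    let word = u64::from_le_bytes(head);

    // A byte with its top bit clear terminates the varint. Inverting the
    // word and masking the top bits marks every terminator byte.
    let stops = !word & 0x8080_8080_8080_8080;
    // The lowest marked bit sits at position 8 * i + 7 for terminator
    // index i, so the length is i + 1. With no terminator the result is 9.
    let len = ((stops.trailing_zeros() >> 3) + 1) as usize;
    if len > 5 {
        return None;
    }

    // Keep only the bytes of this varint and drop the continuation bits.
    let x = word & (u64::MAX >> (64 - 8 * len)) & 0x0000_007f_7f7f_7f7f;
    // Pack the 7-bit groups together; group k moves down by k bits.
    let value = (x & 0x7f)
        | ((x >> 1) & 0x3f80)
        | ((x >> 2) & 0x001f_c000)
        | ((x >> 3) & 0x0fe0_0000)
        | ((x >> 4) & 0xf000_0000);
    Some((value as u32, len))
}
```

The length and the value both come from whole-word operations, so the only branches left are the buffer-size check and the fallback for over-long encodings.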

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add batch_decode_varint32_neon() function using ARM64 NEON intrinsics
(vld1q_u8, vshrq_n_u8, vmulq_u8, vpaddlq_*) to detect varint boundaries
in 16-byte chunks in parallel, then decode individual varints using the
existing branchless scalar approach. Add read_packed_int32() method to
BytesReader that uses the NEON batch path on aarch64 with scalar fallback.
Includes 10 new tests covering mixed sizes, edge cases, and scalar
equivalence validation.
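The two-phase structure described above (find varint boundaries in a 16-byte chunk, then decode each value) can be sketched without intrinsics. In this sketch a scalar loop stands in for the NEON boundary detection, so it shows the shape of the algorithm rather than its speed. All names are hypothetical and none of this is the PR's `batch_decode_varint32_neon`.

```rust
/// Phase 1: bit i of the result is set when chunk[i] ends a varint
/// (top bit clear). A NEON version would compute this for all 16 lanes
/// at once; the loop here is a portable stand-in.
pub fn sketch_terminator_mask(chunk: &[u8; 16]) -> u16 {
    let mut mask = 0u16;
    for (i, b) in chunk.iter().enumerate() {
        mask |= (((b >> 7) ^ 1) as u16) << i;
    }
    mask
}

/// Phase 2: decode every complete varint in the chunk. Returns the values
/// and the number of bytes consumed; a trailing incomplete varint is left
/// for the next chunk. Returns None for a varint longer than 10 bytes.
pub fn sketch_decode_chunk(chunk: &[u8; 16]) -> Option<(Vec<u32>, usize)> {
    let mut mask = sketch_terminator_mask(chunk);
    let mut values = Vec::new();
    let mut start = 0usize;
    while mask != 0 {
        // Index of the next terminator byte.
        let end = mask.trailing_zeros() as usize;
        if end - start + 1 > 10 {
            return None; // over-long encoding
        }
        let mut value = 0u64;
        for (k, b) in chunk[start..=end].iter().enumerate() {
            value |= ((b & 0x7f) as u64) << (7 * k);
        }
        // Truncate to 32 bits, as an int32 field reader would.
        values.push(value as u32);
        start = end + 1;
        mask &= mask - 1; // clear the lowest set bit
    }
    Some((values, start))
}
```

The consumed-byte count lets a caller advance its cursor and re-read any incomplete tail together with the next chunk.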

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tch decode

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add ARM64/NEON optimization section to CLAUDE.md documenting branchless
varint decode, NEON batch decode, and read_packed_int32 patterns. Move
completed plan to docs/plans/completed/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add unit tests for all six rewritten fixed-width read methods
  (read_fixed32, read_fixed64, read_sfixed32, read_sfixed64,
  read_float, read_double) covering success, insufficient buffer,
  and sub-message boundary enforcement
- Add unit test for read_u8 sub-message boundary enforcement
- Update Changelog with missing ARM64/NEON optimization entries
  and new read_packed_int32 public API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>