salt: plain_value_and_slot accessor + drop hashbrown/serde feature#131
Merged
Conversation
- PlainStateProvider::plain_value_and_slot() returns the (SlotId, SaltValue) pair without the Vec<u8> copy that plain_value performs; plain_value now delegates to it. - Align SaltValue to 8 bytes so hot-path eq/clone can use aligned u64 loads (adds 2 bytes of trailing padding, 94 -> 96). - Cover plain_value_and_slot with a direct test asserting both the returned SlotId and the SaltValue match the underlying store entry.
Performance Benchmark ComparisonCompared Detailed Comparison
|
The align(8) change introduced in the previous commit caused a perf regression on the update path: -12% on 1-thread, scaling down to -3% at 16-thread `update 10000 KVs` benchmark. Root cause: SaltValue is already 8-aligned in most containers (Vec, Box, BTreeMap<SaltKey, SaltValue>) due to surrounding alignment. The one place align(8) actually changes layout is inside Option<SaltValue> -- where it forces 7 bytes of padding after the discriminant plus 2 bytes of trailing padding inside SaltValue itself, growing Option<SaltValue> from 95 to 104 bytes (+9.5%). Both hot-path containers carry Option<SaltValue>: - EphemeralSaltState::cache: HashMap<SaltKey, Option<SaltValue>> - StateUpdates::data: BTreeMap<SaltKey, (Option<SaltValue>, Option<SaltValue>)> So the alignment penalty hits exactly the working set of `update 10000 KVs`, inflating it by ~10% and pushing it past L1 on single-thread. The "aligned u64 load" win is also negligible on modern x86 (unaligned MOVQ is zero-cost since Sandy Bridge). Keeps plain_value_and_slot from the previous commit -- only reverts the alignment change and the cosmetic SaltValue::new rewrite that was bundled with it.
4c525f2 to
279e687
Compare
Hand-write Serialize/Deserialize for SaltProof::levels so the salt and ipa-multipoint crates can drop the hashbrown/serde feature flag, which transitively pulled in serde_core 1.0.221+ and broke downstream alloy-tx-macros 1.0.23. Entries are emitted in ascending key order so proof bytes stay deterministic across provers.
abelmega
reviewed
Apr 30, 2026
abelmega
reviewed
Apr 30, 2026
abelmega
approved these changes
May 7, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two independent changes to the
saltcrate:PlainStateProvider::plain_value_and_slot— give callers that need the slot, or the encoded record itself, a way to skipplain_value'sVec<u8>copy.hashbrown/serdefeature — hand-writeSerialize/DeserializeforSaltProof::levelsso we no longer pullserde_core ≥ 1.0.221throughhashbrown, which was breaking downstreamalloy-tx-macros 1.0.23.Commit history
This PR is intentionally kept as three commits (not squashed) to preserve the perf investigation:
05cf399— initial change:plain_value_and_slot+#[repr(align(8))]onSaltValue.14f9430— revert just therepr(align(8))part after the perf-pr benchmark flagged it.279e687— drophashbrown/serdefeature flag; hand-rolllevels(de)serialization.Part 1 —
plain_value_and_slot(commits 1 & 2)Changes that stay (commit 1)
salt/src/state/state.rspub fn plain_value_and_slot(plain_key, hint) -> Result<Option<(SlotId, SaltValue)>, S::Error>.plain_valueto delegate to it (no behavior change for existing callers).test_plain_state_provider_plain_value_and_slotcovering: happy path with slot cross-check (verifies the returnedSlotIdmatches where the entry actually lives by readingstore.value((bucket_id, slot).into())); wrong-hint short-circuit; missing-key.Why
Some callers only need
(SlotId, SaltValue)— for example follow-up slot-addressed operations, or reading the encoded record without copying it onto the heap. The existingplain_valuealways allocated aVec<u8>. The new method exposes the underlyingshi_searchresult directly without that allocation.What got reverted (commit 2)
The first commit also added
#[repr(align(8))]toSaltValueon the theory that alignedu64loads would speed upPartialEq/Clone. perf-pr ranupdate 10000 KVson it and showed the opposite:Linear convergence from −12% (1T) toward −3% (16T) is the textbook signature of a memory-bound regression: per-core working set blows past L1, multi-thread parallelism then hides part of the latency.
Root cause: Option layout, not SaltValue itself
SaltValueis already 8-aligned in most containers (Vec,Box,BTreeMap<SaltKey, SaltValue>) because the surrounding alignment naturally puts it on an 8-byte boundary. Sorepr(align(8))is a no-op formem_store'sBTreeMap<SaltKey, SaltValue>— which is the bulk-storage case.The one place where
repr(align(8))actually changes layout is insideOption<SaltValue>.[u8; 94]has no niche (every bit-pattern is a validSaltValue), so theOptiondiscriminant must be stored as a real byte. Withalign=1it sits at offset 0 andSaltValuefollows immediately at offset 1; withalign=8,SaltValuemust start at offset 8, so the compiler inserts 7 bytes of padding after the discriminant — plus 2 bytes of trailing padding insideSaltValueitself to make its size a multiple of 8. Net growth: +9 bytes perOption<SaltValue>:And the hot-path containers all carry
Option<SaltValue>, not rawSaltValue:EphemeralSaltState::cache: HashMap<SaltKey, Option<SaltValue>>— entry size grew 104 → 112 B (+7.7%)StateUpdates::data: BTreeMap<SaltKey, (Option<SaltValue>, Option<SaltValue>)>— entry value grew 190 → 208 B (+9.5%)For
update 10000 KVsthat's roughly +260 KB of working set, enough to push single-threaded execution past per-core L1 (32–64 KB).The "aligned
u64load" benefit was also overstated: unalignedMOVQis zero-cost on x86 since Sandy Bridge, and a 94-byte struct unavoidably spans cache lines either way. Net: paid ~10% memory locality for no real ALU win.The cosmetic
SaltValue::newrewrite that was bundled in the first commit (constructingSelffirst vs. building a local array) is also reverted, since it was added together with the alignment change for the same speculative-perf reason.Part 2 — drop
hashbrown/serdefeature (commit 3)Why
saltandipa-multipointwere enablinghashbrown/serdepurely soSaltProof::levels: FxHashMap<BucketId, u8>could deriveSerialize/Deserialize. That feature transitively bumpsserde_coreto≥ 1.0.221, which in turn breaks downstreamalloy-tx-macros 1.0.23(it doesn't compile against the newerserde_core). Dropping the feature unblocks downstream consumers without forcing them onto a differentalloyversion.Changes
salt/Cargo.toml,ipa-multipoint/Cargo.toml: drop thehashbrown/serdefeature.Cargo.lock: regenerate (loses theserde_core 1.0.221+line).salt/src/proof/prover.rs: hand-writeSerialize/Deserializefor thelevelsfield via a#[serde(with = "fx_hashmap_serde")]module. Implementation:Vec, sort byBucketIdascending, emit as a serde map. Sort is required so the encoded proof bytes are deterministic across provers —FxHashMapiteration order is not stable.Visitorover the map, building anFxHashMapwith the size hint. No order assumption on input.This is the only
FxHashMapin the crate that crossed the serde boundary, so a single hand-rolled module covers it.Test plan
cargo build -p salt --all-targetscleancargo test -p salt --lib state::state::tests::test_plain_state_provider_plain_value_and_slotpassescargo test -p salt proof::passes (coversSaltProofround-trip serde)cargo test -p salt(CI)update 10000 KVsback within noise band after the revert (CI)alloy-tx-macros 1.0.23consumer builds against this branch (CI / manual)