salt: plain_value_and_slot accessor + drop hashbrown/serde feature by deepld · Pull Request #131 · megaeth-labs/salt

deepld · 2026-04-28T10:00:48Z

Summary

Two independent changes to the salt crate:

PlainStateProvider::plain_value_and_slot — give callers that need the slot, or the encoded record itself, a way to skip plain_value's Vec<u8> copy.
Drop the hashbrown/serde feature — hand-write Serialize/Deserialize for SaltProof::levels so we no longer pull serde_core ≥ 1.0.221 through hashbrown, which was breaking downstream alloy-tx-macros 1.0.23.

Commit history

This PR is intentionally kept as three commits (not squashed) to preserve the perf investigation:

05cf399 — initial change: plain_value_and_slot + #[repr(align(8))] on SaltValue.
14f9430 — revert just the repr(align(8)) part after the perf-pr benchmark flagged it.
279e687 — drop hashbrown/serde feature flag; hand-roll levels (de)serialization.

Part 1 — `plain_value_and_slot` (commits 1 & 2)

Changes that stay (commit 1)

`salt/src/state/state.rs`

Add pub fn plain_value_and_slot(plain_key, hint) -> Result<Option<(SlotId, SaltValue)>, S::Error>.
Refactor plain_value to delegate to it (no behavior change for existing callers).
New test test_plain_state_provider_plain_value_and_slot covering: happy path with slot cross-check (verifies the returned SlotId matches where the entry actually lives by reading store.value((bucket_id, slot).into())); wrong-hint short-circuit; missing-key.

Why

Some callers only need (SlotId, SaltValue) — for example follow-up slot-addressed operations, or reading the encoded record without copying it onto the heap. The existing plain_value always allocated a Vec<u8>. The new method exposes the underlying shi_search result directly without that allocation.

What got reverted (commit 2)

The first commit also added #[repr(align(8))] to SaltValue on the theory that aligned u64 loads would speed up PartialEq/Clone. perf-pr ran update 10000 KVs on it and showed the opposite:

Threads	Baseline (Kelem/s)	With align(8)	Δ
1	49.10	43.08	−12.27%
2	88.82	81.17	−8.61%
4	159.12	150.06	−5.69%
8	276.22	265.00	−4.06%
16	405.55	392.11	−3.31%

Linear convergence from −12% (1T) toward −3% (16T) is the textbook signature of a memory-bound regression: per-core working set blows past L1, multi-thread parallelism then hides part of the latency.

Root cause: Option layout, not SaltValue itself

SaltValue is already 8-aligned in most containers (Vec, Box, BTreeMap<SaltKey, SaltValue>) because the surrounding alignment naturally puts it on an 8-byte boundary. So repr(align(8)) is a no-op for mem_store's BTreeMap<SaltKey, SaltValue> — which is the bulk-storage case.

The one place where repr(align(8)) actually changes layout is inside Option<SaltValue>. [u8; 94] has no niche (every bit-pattern is a valid SaltValue), so the Option discriminant must be stored as a real byte. With align=1 it sits at offset 0 and SaltValue follows immediately at offset 1; with align=8, SaltValue must start at offset 8, so the compiler inserts 7 bytes of padding after the discriminant — plus 2 bytes of trailing padding inside SaltValue itself to make its size a multiple of 8. Net growth: +9 bytes per Option<SaltValue>:

align=1 (95 B):
[disc:1][data:94]

align=8 (104 B):
[disc:1][padding:7][data:94][tail-pad:2]
        ↑                    ↑
        forced 7 B           SaltValue size rounded up to align

And the hot-path containers all carry Option<SaltValue>, not raw SaltValue:

EphemeralSaltState::cache: HashMap<SaltKey, Option<SaltValue>> — entry size grew 104 → 112 B (+7.7%)
StateUpdates::data: BTreeMap<SaltKey, (Option<SaltValue>, Option<SaltValue>)> — entry value grew 190 → 208 B (+9.5%)

For update 10000 KVs that's roughly +260 KB of working set, enough to push single-threaded execution past per-core L1 (32–64 KB).

The "aligned u64 load" benefit was also overstated: unaligned MOVQ is zero-cost on x86 since Sandy Bridge, and a 94-byte struct unavoidably spans cache lines either way. Net: paid ~10% memory locality for no real ALU win.

The cosmetic SaltValue::new rewrite that was bundled in the first commit (constructing Self first vs. building a local array) is also reverted, since it was added together with the alignment change for the same speculative-perf reason.

Part 2 — drop `hashbrown/serde` feature (commit 3)

Why

salt and ipa-multipoint were enabling hashbrown/serde purely so SaltProof::levels: FxHashMap<BucketId, u8> could derive Serialize/Deserialize. That feature transitively bumps serde_core to ≥ 1.0.221, which in turn breaks downstream alloy-tx-macros 1.0.23 (it doesn't compile against the newer serde_core). Dropping the feature unblocks downstream consumers without forcing them onto a different alloy version.

Changes

salt/Cargo.toml, ipa-multipoint/Cargo.toml: drop the hashbrown/serde feature.
Cargo.lock: regenerate (loses the serde_core 1.0.221+ line).
salt/src/proof/prover.rs: hand-write Serialize/Deserialize for the levels field via a #[serde(with = "fx_hashmap_serde")] module. Implementation:
- Serialize: collect entries into a Vec, sort by BucketId ascending, emit as a serde map. Sort is required so the encoded proof bytes are deterministic across provers — FxHashMap iteration order is not stable.
- Deserialize: drive a Visitor over the map, building an FxHashMap with the size hint. No order assumption on input.

This is the only FxHashMap in the crate that crossed the serde boundary, so a single hand-rolled module covers it.

Test plan

cargo build -p salt --all-targets clean
cargo test -p salt --lib state::state::tests::test_plain_state_provider_plain_value_and_slot passes
cargo test -p salt proof:: passes (covers SaltProof round-trip serde)
full cargo test -p salt (CI)
perf-pr update 10000 KVs back within noise band after the revert (CI)
downstream alloy-tx-macros 1.0.23 consumer builds against this branch (CI / manual)

- PlainStateProvider::plain_value_and_slot() returns the (SlotId, SaltValue) pair without the Vec<u8> copy that plain_value performs; plain_value now delegates to it. - Align SaltValue to 8 bytes so hot-path eq/clone can use aligned u64 loads (adds 2 bytes of trailing padding, 94 -> 96). - Cover plain_value_and_slot with a direct test asserting both the returned SlotId and the SaltValue match the underlying store entry.

github-actions · 2026-04-28T10:06:03Z

Performance Benchmark Comparison

Compared 5 benchmark(s) against the latest main baseline.

Detailed Comparison

Benchmark	Baseline Throughput (Kelem/s)	New Throughput (Kelem/s)	Change
`update 10000 KVs/1 threads`	`49.10`	`42.41`	`-13.62%`
`update 10000 KVs/2 threads`	`88.82`	`79.73`	`-10.24%`
`update 10000 KVs/4 threads`	`159.12`	`147.59`	`-7.25%`
`update 10000 KVs/8 threads`	`276.22`	`260.90`	`-5.55%`
`update 10000 KVs/16 threads`	`405.55`	`387.58`	`-4.43%`

The align(8) change introduced in the previous commit caused a perf regression on the update path: -12% on 1-thread, scaling down to -3% at 16-thread `update 10000 KVs` benchmark. Root cause: SaltValue is already 8-aligned in most containers (Vec, Box, BTreeMap<SaltKey, SaltValue>) due to surrounding alignment. The one place align(8) actually changes layout is inside Option<SaltValue> -- where it forces 7 bytes of padding after the discriminant plus 2 bytes of trailing padding inside SaltValue itself, growing Option<SaltValue> from 95 to 104 bytes (+9.5%). Both hot-path containers carry Option<SaltValue>: - EphemeralSaltState::cache: HashMap<SaltKey, Option<SaltValue>> - StateUpdates::data: BTreeMap<SaltKey, (Option<SaltValue>, Option<SaltValue>)> So the alignment penalty hits exactly the working set of `update 10000 KVs`, inflating it by ~10% and pushing it past L1 on single-thread. The "aligned u64 load" win is also negligible on modern x86 (unaligned MOVQ is zero-cost since Sandy Bridge). Keeps plain_value_and_slot from the previous commit -- only reverts the alignment change and the cosmetic SaltValue::new rewrite that was bundled with it.

Hand-write Serialize/Deserialize for SaltProof::levels so the salt and ipa-multipoint crates can drop the hashbrown/serde feature flag, which transitively pulled in serde_core 1.0.221+ and broke downstream alloy-tx-macros 1.0.23. Entries are emitted in ascending key order so proof bytes stay deterministic across provers.

flyq

LGTM

deepld force-pushed the deepld/version branch from 162c176 to 05cf399 Compare April 28, 2026 10:01

deepld force-pushed the deepld/version branch from 05cf399 to 32dfaa3 Compare April 28, 2026 10:26

deepld force-pushed the deepld/version branch from 32dfaa3 to 14f9430 Compare April 28, 2026 10:28

deepld changed the title ~~feat: add plain_value_and_slot accessor and align SaltValue to 8 bytes~~ feat: add plain_value_and_slot accessor Apr 28, 2026

deepld force-pushed the deepld/version branch 2 times, most recently from 4c525f2 to 279e687 Compare April 29, 2026 04:55

deepld changed the title ~~feat: add plain_value_and_slot accessor~~ salt: plain_value_and_slot accessor + drop hashbrown/serde feature Apr 29, 2026

deepld force-pushed the deepld/version branch from 279e687 to 459c28e Compare April 29, 2026 07:47

flyq approved these changes Apr 29, 2026

View reviewed changes

abelmega reviewed Apr 30, 2026

View reviewed changes

Comment thread salt/src/proof/prover.rs Outdated

abelmega reviewed Apr 30, 2026

View reviewed changes

Comment thread salt/src/proof/prover.rs Outdated

fix comment

950d81b

abelmega approved these changes May 7, 2026

View reviewed changes

deepld merged commit 6474b3c into main May 8, 2026
9 checks passed

deepld deleted the deepld/version branch May 8, 2026 04:01

flyq mentioned this pull request May 9, 2026

chore: bump salt version to 1.0.3 #132

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

salt: plain_value_and_slot accessor + drop hashbrown/serde feature#131

salt: plain_value_and_slot accessor + drop hashbrown/serde feature#131
deepld merged 4 commits into
mainfrom
deepld/version

deepld commented Apr 28, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 28, 2026 •

edited

Loading

Uh oh!

flyq left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

deepld commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Commit history

Part 1 — plain_value_and_slot (commits 1 & 2)

Changes that stay (commit 1)

salt/src/state/state.rs

Why

What got reverted (commit 2)

Root cause: Option layout, not SaltValue itself

Part 2 — drop hashbrown/serde feature (commit 3)

Why

Changes

Test plan

Uh oh!

github-actions Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Performance Benchmark Comparison

Uh oh!

flyq left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

deepld commented Apr 28, 2026 •

edited

Loading

Part 1 — `plain_value_and_slot` (commits 1 & 2)

`salt/src/state/state.rs`

Part 2 — drop `hashbrown/serde` feature (commit 3)

github-actions Bot commented Apr 28, 2026 •

edited

Loading