Skip to content

salt: plain_value_and_slot accessor + drop hashbrown/serde feature#131

Merged
deepld merged 4 commits into
mainfrom
deepld/version
May 8, 2026
Merged

salt: plain_value_and_slot accessor + drop hashbrown/serde feature#131
deepld merged 4 commits into
mainfrom
deepld/version

Conversation

@deepld
Copy link
Copy Markdown
Collaborator

@deepld deepld commented Apr 28, 2026

Summary

Two independent changes to the salt crate:

  1. PlainStateProvider::plain_value_and_slot — give callers that need the slot, or the encoded record itself, a way to skip plain_value's Vec<u8> copy.
  2. Drop the hashbrown/serde feature — hand-write Serialize/Deserialize for SaltProof::levels so we no longer pull serde_core ≥ 1.0.221 through hashbrown, which was breaking downstream alloy-tx-macros 1.0.23.

Commit history

This PR is intentionally kept as three commits (not squashed) to preserve the perf investigation:

  1. 05cf399 — initial change: plain_value_and_slot + #[repr(align(8))] on SaltValue.
  2. 14f9430 — revert just the repr(align(8)) part after the perf-pr benchmark flagged it.
  3. 279e687 — drop hashbrown/serde feature flag; hand-roll levels (de)serialization.

Part 1 — plain_value_and_slot (commits 1 & 2)

Changes that stay (commit 1)

salt/src/state/state.rs

  • Add pub fn plain_value_and_slot(plain_key, hint) -> Result<Option<(SlotId, SaltValue)>, S::Error>.
  • Refactor plain_value to delegate to it (no behavior change for existing callers).
  • New test test_plain_state_provider_plain_value_and_slot covering: happy path with slot cross-check (verifies the returned SlotId matches where the entry actually lives by reading store.value((bucket_id, slot).into())); wrong-hint short-circuit; missing-key.

Why

Some callers only need (SlotId, SaltValue) — for example follow-up slot-addressed operations, or reading the encoded record without copying it onto the heap. The existing plain_value always allocated a Vec<u8>. The new method exposes the underlying shi_search result directly without that allocation.

What got reverted (commit 2)

The first commit also added #[repr(align(8))] to SaltValue on the theory that aligned u64 loads would speed up PartialEq/Clone. perf-pr ran update 10000 KVs on it and showed the opposite:

Threads Baseline (Kelem/s) With align(8) Δ
1 49.10 43.08 −12.27%
2 88.82 81.17 −8.61%
4 159.12 150.06 −5.69%
8 276.22 265.00 −4.06%
16 405.55 392.11 −3.31%

Linear convergence from −12% (1T) toward −3% (16T) is the textbook signature of a memory-bound regression: per-core working set blows past L1, multi-thread parallelism then hides part of the latency.

Root cause: Option layout, not SaltValue itself

SaltValue is already 8-aligned in most containers (Vec, Box, BTreeMap<SaltKey, SaltValue>) because the surrounding alignment naturally puts it on an 8-byte boundary. So repr(align(8)) is a no-op for mem_store's BTreeMap<SaltKey, SaltValue> — which is the bulk-storage case.

The one place where repr(align(8)) actually changes layout is inside Option<SaltValue>. [u8; 94] has no niche (every bit-pattern is a valid SaltValue), so the Option discriminant must be stored as a real byte. With align=1 it sits at offset 0 and SaltValue follows immediately at offset 1; with align=8, SaltValue must start at offset 8, so the compiler inserts 7 bytes of padding after the discriminant — plus 2 bytes of trailing padding inside SaltValue itself to make its size a multiple of 8. Net growth: +9 bytes per Option<SaltValue>:

align=1 (95 B):
[disc:1][data:94]

align=8 (104 B):
[disc:1][padding:7][data:94][tail-pad:2]
        ↑                    ↑
        forced 7 B           SaltValue size rounded up to align

And the hot-path containers all carry Option<SaltValue>, not raw SaltValue:

  • EphemeralSaltState::cache: HashMap<SaltKey, Option<SaltValue>> — entry size grew 104 → 112 B (+7.7%)
  • StateUpdates::data: BTreeMap<SaltKey, (Option<SaltValue>, Option<SaltValue>)> — entry value grew 190 → 208 B (+9.5%)

For update 10000 KVs that's roughly +260 KB of working set, enough to push single-threaded execution past per-core L1 (32–64 KB).

The "aligned u64 load" benefit was also overstated: unaligned MOVQ is zero-cost on x86 since Sandy Bridge, and a 94-byte struct unavoidably spans cache lines either way. Net: paid ~10% memory locality for no real ALU win.

The cosmetic SaltValue::new rewrite that was bundled in the first commit (constructing Self first vs. building a local array) is also reverted, since it was added together with the alignment change for the same speculative-perf reason.


Part 2 — drop hashbrown/serde feature (commit 3)

Why

salt and ipa-multipoint were enabling hashbrown/serde purely so SaltProof::levels: FxHashMap<BucketId, u8> could derive Serialize/Deserialize. That feature transitively bumps serde_core to ≥ 1.0.221, which in turn breaks downstream alloy-tx-macros 1.0.23 (it doesn't compile against the newer serde_core). Dropping the feature unblocks downstream consumers without forcing them onto a different alloy version.

Changes

  • salt/Cargo.toml, ipa-multipoint/Cargo.toml: drop the hashbrown/serde feature.
  • Cargo.lock: regenerate (loses the serde_core 1.0.221+ line).
  • salt/src/proof/prover.rs: hand-write Serialize/Deserialize for the levels field via a #[serde(with = "fx_hashmap_serde")] module. Implementation:
    • Serialize: collect entries into a Vec, sort by BucketId ascending, emit as a serde map. Sort is required so the encoded proof bytes are deterministic across provers — FxHashMap iteration order is not stable.
    • Deserialize: drive a Visitor over the map, building an FxHashMap with the size hint. No order assumption on input.

This is the only FxHashMap in the crate that crossed the serde boundary, so a single hand-rolled module covers it.


Test plan

  • cargo build -p salt --all-targets clean
  • cargo test -p salt --lib state::state::tests::test_plain_state_provider_plain_value_and_slot passes
  • cargo test -p salt proof:: passes (covers SaltProof round-trip serde)
  • full cargo test -p salt (CI)
  • perf-pr update 10000 KVs back within noise band after the revert (CI)
  • downstream alloy-tx-macros 1.0.23 consumer builds against this branch (CI / manual)

- PlainStateProvider::plain_value_and_slot() returns the (SlotId, SaltValue)
  pair without the Vec<u8> copy that plain_value performs; plain_value now
  delegates to it.
- Align SaltValue to 8 bytes so hot-path eq/clone can use aligned u64 loads
  (adds 2 bytes of trailing padding, 94 -> 96).
- Cover plain_value_and_slot with a direct test asserting both the returned
  SlotId and the SaltValue match the underlying store entry.
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 28, 2026

Performance Benchmark Comparison

Compared 5 benchmark(s) against the latest main baseline.

Detailed Comparison
Benchmark Baseline Throughput (Kelem/s) New Throughput (Kelem/s) Change
update 10000 KVs/1 threads 49.10 42.41 -13.62%
update 10000 KVs/2 threads 88.82 79.73 -10.24%
update 10000 KVs/4 threads 159.12 147.59 -7.25%
update 10000 KVs/8 threads 276.22 260.90 -5.55%
update 10000 KVs/16 threads 405.55 387.58 -4.43%

The align(8) change introduced in the previous commit caused a perf
regression on the update path: -12% on 1-thread, scaling down to -3% at
16-thread `update 10000 KVs` benchmark.

Root cause: SaltValue is already 8-aligned in most containers (Vec, Box,
BTreeMap<SaltKey, SaltValue>) due to surrounding alignment. The one place
align(8) actually changes layout is inside Option<SaltValue> -- where it
forces 7 bytes of padding after the discriminant plus 2 bytes of trailing
padding inside SaltValue itself, growing Option<SaltValue> from 95 to 104
bytes (+9.5%).

Both hot-path containers carry Option<SaltValue>:
  - EphemeralSaltState::cache: HashMap<SaltKey, Option<SaltValue>>
  - StateUpdates::data: BTreeMap<SaltKey, (Option<SaltValue>, Option<SaltValue>)>

So the alignment penalty hits exactly the working set of `update 10000 KVs`,
inflating it by ~10% and pushing it past L1 on single-thread. The "aligned
u64 load" win is also negligible on modern x86 (unaligned MOVQ is zero-cost
since Sandy Bridge).

Keeps plain_value_and_slot from the previous commit -- only reverts the
alignment change and the cosmetic SaltValue::new rewrite that was bundled
with it.
@deepld deepld changed the title feat: add plain_value_and_slot accessor and align SaltValue to 8 bytes feat: add plain_value_and_slot accessor Apr 28, 2026
@deepld deepld force-pushed the deepld/version branch 2 times, most recently from 4c525f2 to 279e687 Compare April 29, 2026 04:55
@deepld deepld changed the title feat: add plain_value_and_slot accessor salt: plain_value_and_slot accessor + drop hashbrown/serde feature Apr 29, 2026
Hand-write Serialize/Deserialize for SaltProof::levels so the salt and
ipa-multipoint crates can drop the hashbrown/serde feature flag, which
transitively pulled in serde_core 1.0.221+ and broke downstream
alloy-tx-macros 1.0.23. Entries are emitted in ascending key order so
proof bytes stay deterministic across provers.
Copy link
Copy Markdown
Member

@flyq flyq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread salt/src/proof/prover.rs Outdated
Comment thread salt/src/proof/prover.rs Outdated
@deepld deepld merged commit 6474b3c into main May 8, 2026
9 checks passed
@deepld deepld deleted the deepld/version branch May 8, 2026 04:01
@flyq flyq mentioned this pull request May 9, 2026
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants