
perf: parallelize attestation signature verification#274

Merged
pablodeymo merged 3 commits into main from parallel-sig-verification
Apr 10, 2026

Conversation

@MegaRedHand
Collaborator

Summary

  • Parallelize attestation signature verification using rayon's par_iter
  • Signature verification was sequential (~40-50ms per attestation), causing blocks with many attestations to take 1.4-1.9s to process
  • Split into two phases: prepare inputs sequentially (cheap), then verify in parallel
  • Observed in devnet: with 36 attestations, block processing took ~1.4-1.9s; expect ~4x speedup on 4 cores

Test plan

  • make fmt passes
  • make lint passes
  • make test passes (all 101 tests, including signature spec tests)
  • Deploy to devnet and verify block processing times drop

@github-actions

🤖 Kimi Code Review

Overall: Good optimization moving signature verification to parallel execution. The approach of separating cheap input preparation from expensive verification is correct. However, there are performance and observability regressions to address.

Critical Issues

crates/blockchain/src/store.rs

1. Performance regression: Unnecessary Vec clone in hot loop (Line ~1201)

verify_aggregated_signature(proof_data, public_keys.clone(), message, *slot)

Each attestation clones the entire public_keys Vec (48 bytes × N validators) inside the parallel worker. With many attestations, this significantly increases memory traffic.

Fix: Change verify_aggregated_signature signature to accept &[PublicKey] instead of Vec<PublicKey>, or wrap the Vec in Arc<[PublicKey]> during input preparation to share ownership without cloning.
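Both options can be sketched with plain std types (a `PublicKey` stand-in here, not the real `ValidatorPublicKey`; the actual `verify_aggregated_signature` takes more parameters):

```rust
use std::sync::Arc;

#[derive(Clone)]
struct PublicKey([u8; 48]);

// Option 1: borrow a slice. Callers pass `&keys`; no clone, no ownership
// transfer, and the parallel workers only need a shared reference.
fn verify_with_slice(keys: &[PublicKey]) -> bool {
    !keys.is_empty()
}

// Option 2: share ownership via Arc<[PublicKey]>. Cloning the Arc copies a
// pointer plus a refcount bump, not the underlying key material.
fn verify_with_arc(keys: Arc<[PublicKey]>) -> bool {
    !keys.is_empty()
}

fn demo() -> (bool, bool) {
    let keys = vec![PublicKey([0u8; 48])];
    let shared: Arc<[PublicKey]> = keys.clone().into();
    (verify_with_slice(&keys), verify_with_arc(Arc::clone(&shared)))
}
```

Option 1 is the smaller API change if the verifier never needs ownership; Option 2 fits when the keys must outlive the caller's scope.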

2. Loss of timing metrics (Lines ~1196-1207)

The original code wrapped verification in:

let _timing = metrics::time_pq_sig_aggregated_signatures_verification();

This metric is entirely removed. For consensus monitoring, you need visibility into aggregate signature verification latency.

Fix: Either:

  • Add the timing metric back around the entire parallel block (measures total time, not per-signature)
  • Use rayon::spawn with manual timing if per-signature timing is needed
  • Document the intentional removal in PR description

Code Quality

3. Inconsistent error handling pattern

The sequential preparation phase uses collect::<Result<_, _>>() (good), but the parallel phase uses try_for_each with manual match on the result. The match is unnecessary since verify_aggregated_signature already returns Result<(), E>.

Suggestion (Line ~1198-1208):

.par_iter()
.try_for_each(|(proof_data, public_keys, message, slot)| {
    verify_aggregated_signature(proof_data, public_keys.clone(), message, *slot)
        .map(|_| metrics::inc_pq_sig_aggregated_signatures_valid())
        .map_err(|e| {
            metrics::inc_pq_sig_aggregated_signatures_invalid();
            StoreError::AggregateVerificationFailed(e)
        })
})

4. Module import placement (Line ~1197)

use rayon::prelude::*;

Placing this inside the function body is unconventional. Move to top-level imports for consistency.

Security & Correctness

5. Verification completeness

Good: Using try_for_each ensures all attestations are verified (not just first N), and short-circuits on first failure, maintaining consensus safety.

6. Thread safety

Good: The verification_inputs Vec is immutable during parallel iteration, and verify_aggregated_signature appears pure (no mutable shared state). Safe for Rayon.

Documentation

7. RELEASE.md & README.md updates

Good: Clear explanation of the new unstable vs latest tagging strategy. The table in README accurately reflects the release process.

Nitpicks

Line ~1184: The comment // cheap: bit checks + pubkey lookups is helpful, but note that validator_indices + pubkey lookups are actually O(N) in the number of validators per attestation. For large committees, this could be non-trivial, but still likely cheaper than crypto verification.

Line ~1203: *slot dereference suggests slot is &u32. Verify that verify_aggregated_signature doesn't actually need u32 by value (if it just reads it, &u32 would suffice).

Summary

  1. Fix the public_keys.clone() overhead - this is the blocker
  2. Restore or document the timing metric removal
  3. Move use rayon::prelude::*; to imports
  4. Consider simplifying the error handling in the parallel block

The parallelization strategy is sound and maintains consensus correctness.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions

🤖 Claude Code Review

Here's my review of PR #274:


PR Review: perf: parallelize attestation signature verification

Summary

The two-phase approach (sequential input preparation → parallel verification) is correct and well-motivated. The Once-based verifier initialization is thread-safe, and prometheus counters use atomics so concurrent inc_* calls are safe. The logic is sound.


Issues

par_iter() forces an unnecessary Vec<ValidatorPublicKey> clone per attestation

store.rs:1217

// Current: par_iter() borrows, forcing .clone() on each parallel call
verification_inputs
    .par_iter()
    .try_for_each(|(proof_data, public_keys, message, slot)| {
        let result =
            verify_aggregated_signature(proof_data, public_keys.clone(), message, *slot);

verify_aggregated_signature takes public_keys: Vec<ValidatorPublicKey> by value, and verification_inputs is not used after this point. Switching to into_par_iter() moves each element instead of borrowing, eliminating the clone:

verification_inputs
    .into_par_iter()
    .try_for_each(|(proof_data, public_keys, message, slot)| {
        let result =
            verify_aggregated_signature(proof_data, public_keys, &message, slot);

Each XMSS pubkey is 52 bytes; with a full attestation committee this is a non-trivial allocation per attestation that is entirely avoidable.


Timing metric dropped without replacement

store.rs (removed context)

The RAII guard let _timing = metrics::time_pq_sig_aggregated_signatures_verification() was removed and no parallel equivalent added. The overall aggregated_elapsed log field covers wall-clock time, but the lean_pq_sig_aggregated_signatures_verification histogram loses all data. Wrapping each closure body with a fresh timing guard still works correctly in parallel (each guard measures one verification call on its thread):

.try_for_each(|(proof_data, public_keys, message, slot)| {
    let _timing = metrics::time_pq_sig_aggregated_signatures_verification();
    let result = verify_aggregated_signature(...);

Given that the stated motivation is observing a 4× speedup, keeping this metric would let you verify it in production without relying solely on log output.


use rayon::prelude::* inside function body

store.rs:1212

This import belongs at the top of the file with the other use declarations, not embedded mid-function. It's not a correctness issue but is inconsistent with the rest of the codebase.


Non-deterministic error attribution on failure

Minor: the sequential version always returned the error from the first failing attestation (by index). try_for_each in rayon returns the error from whichever thread completes first. For consensus correctness this doesn't matter, but it may affect debugging (the error log won't consistently point to the same offending attestation index). Acceptable tradeoff, worth a comment.


Docs changes

README.md and RELEASE.md updates are clear and accurate. The tag convention table is a useful addition.


Verdict

Approve with the into_par_iter() fix applied — that's the one concrete regression introduced by the refactor. The timing metric drop is secondary but worth restoring for observability.


Automated review by Claude (Anthropic) · sonnet · custom prompt

@greptile-apps
Contributor

greptile-apps bot commented Apr 10, 2026

Greptile Summary

This PR parallelizes attestation signature verification in verify_signatures using rayon's par_iter, splitting the work into a cheap sequential preparation phase (bit checks, pubkey lookups) and a parallel verification phase. The implementation is correct and the expected ~4× speedup on 4 cores should address the observed 1.4–1.9 s block processing times with 36 attestations. Documentation changes update container tagging conventions.

Confidence Score: 5/5

Safe to merge — parallelization is correct, error propagation is sound, and all findings are P2 style/observability suggestions.

The two-phase approach (sequential prep, parallel verify) is correctly implemented. Prometheus counters are atomic so calling them from rayon threads is safe. try_for_each correctly short-circuits on the first error. The only findings are a dropped timing histogram (observability regression, not a correctness issue) and an in-function import placement. Both are P2.

crates/blockchain/src/store.rs — dropped per-attestation timing metric worth restoring.

Important Files Changed

Filename: Overview
crates/blockchain/src/store.rs: Parallel attestation signature verification added via rayon; the per-attestation timing metric (time_pq_sig_aggregated_signatures_verification) is silently dropped from this path.
crates/blockchain/Cargo.toml: Added rayon workspace dependency for parallel iteration.
Cargo.toml: Registered rayon = "1.10" in workspace dependencies.
README.md: Added Container Releases section and a new roadmap item for formal STF verification.
RELEASE.md: Updated tag convention: unstable replaces latest for main-branch builds; latest now tracks the current devnet.

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[verify_signatures called] --> B{attestation count\n== signature count?}
    B -- No --> ERR1[Err: AttestationSignatureMismatch]
    B -- Yes --> C[Sequential preparation phase\nbits check + pubkey lookup]
    C --> D{Any input error?}
    D -- Yes --> ERR2[Err: ParticipantsMismatch /\nInvalidValidatorIndex /\nPubkeyDecodingFailed]
    D -- No --> E[verification_inputs Vec collected]
    E --> F[rayon par_iter]
    F --> G1[Thread 1: verify_aggregated_signature]
    F --> G2[Thread 2: verify_aggregated_signature]
    F --> G3[Thread N: verify_aggregated_signature]
    G1 & G2 & G3 --> H{All OK?}
    H -- Any Err --> ERR3[Err: AggregateVerificationFailed]
    H -- All Ok --> I[inc valid metrics]
    I --> J[verify proposer signature sequentially]
    J --> K[Log timing info]
    K --> L[Ok]
```

Signature verification dominated block processing time (~40-50ms per
attestation, sequential). With 36 attestations this meant ~1.4-1.9s per
block, causing the target to fall hundreds of slots behind the head.

Split into two phases:
1. Prepare inputs sequentially (cheap: bit checks, pubkey lookups)
2. Verify signatures in parallel via rayon's par_iter

On a 4-core machine with 36 attestations, expect ~4x speedup
(1.4s -> ~350ms).
@MegaRedHand force-pushed the parallel-sig-verification branch from 1a71cbf to e45113b on April 10, 2026 18:34
Comment on lines +1199 to +1214
verification_inputs
    .par_iter()
    .try_for_each(|(proof_data, public_keys, message, slot)| {
        let result =
            verify_aggregated_signature(proof_data, public_keys.clone(), message, *slot);
        match result {
            Ok(()) => {
                metrics::inc_pq_sig_aggregated_signatures_valid();
                Ok(())
            }
            Err(e) => {
                metrics::inc_pq_sig_aggregated_signatures_invalid();
                Err(StoreError::AggregateVerificationFailed(e))
            }
        }
    })?;

P2 Per-attestation timing metric dropped

The _timing = metrics::time_pq_sig_aggregated_signatures_verification() guard was removed and nothing replaces it inside the parallel closure. The histogram (LEAN_PQ_SIG_AGGREGATED_SIGNATURES_VERIFICATION_TIME_SECONDS) is still registered and still used in the gossip path (line 458), so it will now only reflect gossip-path verifications, silently under-counting block-path verifications. Since TimingGuard is a plain RAII struct with no shared state, it's safe to create per-thread inside par_iter. Consider adding it back:

Suggested change

verification_inputs
    .par_iter()
    .try_for_each(|(proof_data, public_keys, message, slot)| {
        let _timing = metrics::time_pq_sig_aggregated_signatures_verification();
        let result =
            verify_aggregated_signature(proof_data, public_keys.clone(), message, *slot);
        match result {
            Ok(()) => {
                metrics::inc_pq_sig_aggregated_signatures_valid();
                Ok(())
            }
            Err(e) => {
                metrics::inc_pq_sig_aggregated_signatures_invalid();
                Err(StoreError::AggregateVerificationFailed(e))
            }
        }
    })?;

.collect::<Result<_, StoreError>>()?;

// Run expensive signature verification in parallel
use rayon::prelude::*;

P2 Import inside function body

use rayon::prelude::* is placed mid-function. The codebase convention is to keep imports at the top of the file or at least at the top of the function; placing it inline between a comment and the expression it enables makes it easy to miss. Consider moving it to the top of the file alongside the other use statements.

Suggested change

    use rayon::prelude::*;
    // Run expensive signature verification in parallel
    verification_inputs

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@github-actions

🤖 Codex Code Review

  1. High: the new verification_inputs staging turns aggregate verification from streaming to fully buffered, and then clones every pubkey set again inside the Rayon closure. In the old code, each attestation’s Vec<ValidatorPublicKey> was built, verified, and dropped immediately; now one vector is retained for every attestation, and public_keys.clone() duplicates it once more before calling verify_aggregated_signature. Because both the attestation list and signature list are capped at 4096, and each bitlist can itself cover up to 4096 validators, this changes memory from O(participants in one attestation) to O(total participants across the whole block), which is a new block-import DoS surface. See store.rs, store.rs, block.rs, block.rs, attestation.rs. I’d keep this streaming or use into_par_iter() over owned work items in bounded batches so the pubkey vectors are moved, not cloned and retained en masse.

  2. Medium: this makes peer-triggered block validation consume the global Rayon pool from the hot import path, so one large block can now fan verifier work across all cores and contend with fork choice, networking, and proposer duties. Previously the damage from an expensive block was naturally limited to one validation thread; after this change it becomes whole-process CPU amplification. That is a meaningful operational/security regression for a consensus client unless parallelism is explicitly bounded or gated behind a threshold. See store.rs, store.rs.

  3. Low: block import no longer records time_pq_sig_aggregated_signatures_verification() per aggregate proof, while the gossip attestation path still does. That silently changes the meaning of the histogram and will make verification latency dashboards/alerts inconsistent after merge. See store.rs, store.rs, metrics.rs.

No fork-choice, justification/finalization, STF, or SSZ logic changed in this PR; the risk is isolated to XMSS aggregate verification behavior.

I couldn’t run cargo test here because the sandbox blocks rustup from creating temp files under its default home.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

Switch from par_iter() + public_keys.clone() to into_par_iter() which
moves ownership of each tuple into the parallel closure, eliminating
the per-attestation Vec<ValidatorPublicKey> clone.
@MegaRedHand added the performance label ("Performance improvements or possible performance improvements") on Apr 10, 2026
@pablodeymo merged commit e98859d into main on Apr 10, 2026
3 checks passed
@pablodeymo deleted the parallel-sig-verification branch on April 10, 2026 21:50