
Fix oversized blocks during stall recovery (#259)#260

Merged
MegaRedHand merged 5 commits into main from issue-259-oversized-blocks
Apr 7, 2026

Conversation


@MegaRedHand MegaRedHand commented Apr 7, 2026

Summary

  • Cap attestations per block to MAX_ATTESTATIONS_PER_BLOCK = 24 in build_block, preventing blocks from exceeding the 10 MiB MAX_PAYLOAD_SIZE gossip limit during stall recovery
  • Thread the remaining budget into extend_proofs_greedily so proof selection stops once the cap is reached
  • Add a regression test that simulates a stall (50 entries with ~253 KB XMSS proofs each) and verifies the block stays under the wire limit

Closes #259

Context

With XMSS aggregated proofs averaging ~253 KB each, just ~39 attestations exceed the 10 MiB limit. During stall recovery the payload pool can grow well past that, causing `MessageTooLarge` errors on gossip publish and `snappy decoded len exceeds max` errors on receiving peers.

24 attestations produce ~6 MiB, leaving headroom for the block header, proposer signature, and proof size variance.
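The arithmetic behind the cap can be sanity-checked with a small sketch. The constant names mirror the PR; `projected_proof_bytes` is a hypothetical helper for illustration only:

```rust
// Back-of-the-envelope check for the count cap described above.
// MAX_ATTESTATIONS_PER_BLOCK and MAX_PAYLOAD_SIZE mirror the PR;
// projected_proof_bytes is an illustrative helper, not project code.
const MAX_ATTESTATIONS_PER_BLOCK: usize = 24;
const AVG_PROOF_BYTES: usize = 253 * 1024; // ~253 KiB per aggregated XMSS proof
const MAX_PAYLOAD_SIZE: usize = 10 * 1024 * 1024; // 10 MiB gossip wire limit

fn projected_proof_bytes(attestations: usize) -> usize {
    attestations * AVG_PROOF_BYTES
}

fn main() {
    // 41 average-sized proofs alone already exceed the wire limit
    // (the PR cites ~39 once block header and signature are counted).
    assert!(projected_proof_bytes(41) > MAX_PAYLOAD_SIZE);
    // The 24-proof cap lands near 6 MiB, leaving roughly 4 MiB of headroom.
    assert!(projected_proof_bytes(MAX_ATTESTATIONS_PER_BLOCK) < MAX_PAYLOAD_SIZE);
}
```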

build_block() greedily includes every matching attestation from the
payload pool with no size cap. During stall recovery the pool can grow
large enough that the SSZ-encoded SignedBlockWithAttestation exceeds
MAX_PAYLOAD_SIZE (10 MiB), causing gossip publish failures.

The test simulates a stall by populating the pool with 50 entries
carrying ~253 KB proofs each, producing a 12.4 MiB block. It asserts
the block must stay under the limit, so it fails until a size cap is
enforced in build_block.
With XMSS proofs averaging ~253 KB each, an uncapped block builder can
produce blocks exceeding the 10 MiB MAX_PAYLOAD_SIZE gossip limit during
stall recovery. Cap build_block to MAX_ATTESTATIONS_PER_BLOCK = 24,
which yields ~6 MiB and leaves headroom for header and proposer
signature.

The budget is threaded into extend_proofs_greedily so it stops selecting
proofs once the cap is reached.
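The budget threading described above can be sketched roughly as follows. The function and constant names mirror the PR, but the entry and proof types here are simplified placeholders, not the project's actual data model:

```rust
// Simplified sketch of threading a remaining-attestation budget from
// build_block into extend_proofs_greedily. Names mirror the PR; the
// entry/proof types are placeholders for illustration.
const MAX_ATTESTATIONS_PER_BLOCK: usize = 24;

/// Greedily take proofs for one attestation-data entry, never exceeding `budget`.
fn extend_proofs_greedily(selected: &mut Vec<u32>, proofs: &[u32], budget: usize) -> usize {
    let mut added = 0;
    while added < budget && added < proofs.len() {
        selected.push(proofs[added]);
        added += 1;
    }
    added
}

fn build_block(entries: &[Vec<u32>]) -> Vec<u32> {
    let mut attestations = Vec::new();
    for proofs in entries {
        // Recompute the remaining budget before each entry; stop at the cap.
        let remaining = MAX_ATTESTATIONS_PER_BLOCK.saturating_sub(attestations.len());
        if remaining == 0 {
            break;
        }
        extend_proofs_greedily(&mut attestations, proofs, remaining);
    }
    attestations
}

fn main() {
    // 50 entries with one proof each, as in the regression test:
    // only 24 make it into the block.
    let entries: Vec<Vec<u32>> = (0..50).map(|i| vec![i]).collect();
    assert_eq!(build_block(&entries).len(), MAX_ATTESTATIONS_PER_BLOCK);
}
```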

github-actions bot commented Apr 7, 2026

🤖 Kimi Code Review

Overall Assessment: Good fix for a critical DoS vulnerability (issue #259). The implementation correctly caps block size to stay within the 10 MiB gossip limit while maximizing validator coverage via greedy selection.

Minor Issues:

  1. Function signature consistency (crates/blockchain/src/store.rs:928):
    The addition of budget: usize to extend_proofs_greedily is correct, but verify no other call sites exist in the codebase. The diff shows only the build_block call updated (line 1042), which is consistent.

  2. Test assertion strictness (crates/blockchain/src/store.rs:1430-1433):

    assert_eq!(
        block.body.attestations.len(),
        MAX_ATTESTATIONS_PER_BLOCK,
    );

    This assertion assumes the pool always has ≥24 entries. If the test fixtures change to have fewer entries, this will fail even if the cap logic is correct. Consider using assert!(block.body.attestations.len() <= MAX_ATTESTATIONS_PER_BLOCK) or asserting the specific test condition (50 available → 24 selected).

Security & Consensus:

  1. Resource exhaustion in test (crates/blockchain/src/store.rs:1380-1381): The test allocates 50 × 253 KB ≈ 12.6 MB of zeroed memory for proofs. While acceptable for a regression test, consider adding a `#[cfg(test)]` guard or documenting the memory requirement if this grows.

  2. XMSS size assumption (crates/blockchain/src/store.rs:31-32): The comment assumes 253 KB per XMSS proof. If the XMSS parameters change (e.g., tree height), this constant may need adjustment. Consider making this configurable or adding a static assertion linking it to the actual `XmssSignature` size.

Code Quality:

  1. Early termination logic (crates/blockchain/src/store.rs:1022-1024): The `if remaining == 0 { break; }` check inside the `sorted_entries` loop is correct, but could be slightly more idiomatic as:
    for &(data_root, (att_data, proofs)) in &sorted_entries {
        if aggregated_attestations.len() >= MAX_ATTESTATIONS_PER_BLOCK {
            break;
        }
        let remaining = MAX_ATTESTATIONS_PER_BLOCK - aggregated_attestations.len();
    This avoids the saturating_sub in every iteration. Current implementation is functionally correct, just a micro-optimization consideration.

Dependencies:

  1. Test-only dependencies (crates/blockchain/Cargo.toml:36-37): `libssz` and `libssz-types` are added to `[dependencies]` but appear test-only. Move them to `[dev-dependencies]` to avoid bloating the release binary:
    [dev-dependencies]
    libssz.workspace = true
    libssz-types.workspace = true

Consensus Correctness:
The greedy selection (maximizing validator coverage per proof) under the cap preserves liveness while preventing the stall scenario described in issue #259. The 24-attestation limit leaves adequate headroom (~3.8 MiB) for block header, proposer signature, and future protocol additions.

Action Required:

  • Item 6 (move SSZ deps to dev-dependencies) should be fixed before merge.
  • Item 2 (test assertion) is recommended but not blocking.

Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt


github-actions bot commented Apr 7, 2026

🤖 Codex Code Review

Findings

  1. The new MAX_ATTESTATIONS_PER_BLOCK = 24 cap does not actually enforce the 10 MiB gossip ceiling; it only caps count, not bytes. crates/blockchain/src/store.rs:29 crates/blockchain/src/store.rs:1022 crates/common/types/src/block.rs:80
    AggregatedSignatureProof::proof_data is valid up to ByteList<1_048_576>, so a fully valid set of 24 large proofs can still produce a block well above MAX_PAYLOAD_SIZE. In that case the proposer builds and signs a block that local state transition accepts, but gossip peers reject at decode time, which is a liveness and propagation failure. For consensus code, this should be enforced by accumulated SSZ size (or a conservative byte-budget derived from actual proof lengths), not a fixed attestation count based on “average” proof size.

No other material correctness issues stood out in the touched logic.

I could not run the targeted test in this environment because cargo/rustup tried to write under /home/runner/.rustup and hit the sandbox’s read-only restriction.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt


greptile-apps bot commented Apr 7, 2026

Greptile Summary

This PR fixes a block size overflow during stall recovery by capping attestations at MAX_ATTESTATIONS_PER_BLOCK = 24 in build_block and threading a remaining-budget parameter into extend_proofs_greedily. A regression test validates that a 50-entry payload pool (each with a ~253 KB XMSS proof) produces a block that stays under the 10 MiB gossip wire limit.

Confidence Score: 5/5

Safe to merge — the cap logic is correct, the budget is properly threaded, and a regression test validates the fix end-to-end.

No P0 or P1 issues found. The remaining budget is recomputed correctly for each entry in the for-loop, extend_proofs_greedily respects it, the outer loop terminates correctly once the cap is hit, and the new dev-only dependencies are scoped appropriately. All findings are P2 or below.

No files require special attention.

Important Files Changed

  • crates/blockchain/src/store.rs: Adds the MAX_ATTESTATIONS_PER_BLOCK = 24 cap in build_block and threads a remaining-budget parameter into extend_proofs_greedily; includes a regression test for the stall scenario. Logic is correct: remaining is recomputed per entry in the for-loop, extend_proofs_greedily respects the budget, and the outer loop correctly terminates after the cap is hit.
  • crates/blockchain/Cargo.toml: Adds libssz and libssz-types under [dev-dependencies] only, scoped appropriately to the new regression test.
  • Cargo.lock: Lock file updated to record libssz and libssz-types as new dev dependencies for ethlambda-blockchain.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[build_block called] --> B{aggregated_payloads empty?}
    B -- Yes --> Z[Build final block with 0 attestations]
    B -- No --> C[Sort entries by target.slot, data_root]
    C --> D[Outer loop: found_new = false]
    D --> E[For each sorted entry]
    E --> F{remaining = MAX 24 - len\nremaining == 0?}
    F -- Yes --> G[break for-loop]
    F -- No --> H{entry already processed?}
    H -- Yes --> E
    H -- No --> I{source == current_justified\nand head in known_block_roots?}
    I -- No --> E
    I -- Yes --> J[Mark processed, found_new = true]
    J --> K[extend_proofs_greedily\nwith budget = remaining]
    K --> L[Add up to budget attestations\nfor this att_data]
    L --> E
    G --> M{found_new?}
    M -- No --> Z
    M -- Yes --> N[process_slots + process_block\ncheck justification]
    N --> O{justification advanced?}
    O -- No --> Z
    O -- Yes --> D
    Z --> P[Compute final state_root\nReturn Block + proofs]

Reviews (1): Last reviewed commit: "Cap attestations per block to 24 to prev..."


github-actions bot commented Apr 7, 2026

🤖 Claude Code Review

Here is my review of PR #260:


Review: Fix oversized blocks during stall recovery

Overall: Clean, focused fix that correctly addresses the root cause. Logic is sound, test coverage is meaningful.


Correctness

extend_proofs_greedily budget threading — correct. The added < budget guard in the while condition bounds the number of proofs selected per data_root, and added += 1 increments only on an actual proof addition. The budget == 0 early return prevents wasted work.

build_block remaining placement — the cap check (remaining == 0 → break) appears before the processed_data_roots.contains and source-filter checks. Since this is a break (not a continue), it exits the inner loop entirely when the budget is exhausted — correct, though it means we're computing remaining on every iteration even for entries that would be filtered anyway. Negligible cost.

Outer loop extra iteration — when the cap is hit mid-loop and found_new = true, the outer justification loop continues for one extra iteration before remaining == 0 causes found_new = false and breaks. This triggers an extra process_slots + process_block call unnecessarily. Consider short-circuiting:

// after the inner for loop, before the justification check:
if aggregated_attestations.len() >= MAX_ATTESTATIONS_PER_BLOCK {
    break;
}

This is a minor performance concern, not a correctness issue.

AggregatedAttestations capacity — the type is SszList<_, 4096>, so MAX_ATTESTATIONS_PER_BLOCK = 24 is far below the hard SSZ limit. The .expect("attestation count exceeds limit") at lines 1057 and 1081 cannot panic under the new cap.


Count-Based vs Byte-Based Cap

The cap is count-based (24) rather than byte-based. This is pragmatic and avoids expensive SSZ encoding during block construction, but it implicitly assumes proof sizes stay bounded. The constant is well-documented with the ~253 KB average and ~6 MiB projected block size. If the leanSpec proof format changes (e.g., larger aggregation trees), this constant would need revisiting. Consider a comment noting this dependency.


Test

build_block_respects_max_payload_size_during_stall — the regression test is well-constructed:

  • 50 entries with 1 non-overlapping validator each → extend_proofs_greedily adds exactly 1 proof per call, making the count directly proportional to entries processed.
  • The assert_eq!(..., MAX_ATTESTATIONS_PER_BLOCK) assertion verifies the cap was enforced before the SSZ size check.
  • Constructing and encoding a SignedBlockWithAttestation with full-size proof bytes is the right approach — it directly tests what publish_block encodes.

One observation: the test only covers the case where every entry has exactly 1 proof with 1 validator (no greedy merge needed). A case where one data_root contributes multiple proofs (testing budget consumption within a single extend_proofs_greedily call) would provide more coverage of the added < budget path in the inner while loop. Not a blocker, but worth a follow-up.


Dependencies

libssz and libssz-types are correctly placed under [dev-dependencies] and are already workspace dependencies — no issue.


Summary

  • Correctness: good
  • The outer loop minor inefficiency and the count-vs-byte caveat are worth noting but not blocking
  • Test covers the primary regression scenario correctly

Automated review by Claude (Anthropic) · sonnet · custom prompt

Replace the fixed count cap (MAX_ATTESTATIONS_PER_BLOCK = 24) with a
byte budget (MAX_ATTESTATION_PROOF_BYTES = 9 MiB) that tracks cumulative
proof_data.len() as proofs are added. This adapts automatically to
varying proof sizes instead of assuming ~253 KB averages.

The 9 MiB budget leaves 1 MiB margin within the 10 MiB MAX_PAYLOAD_SIZE
gossip limit for block header, proposer signature, attestation metadata,
and SSZ encoding overhead.
@MegaRedHand MegaRedHand merged commit 1345286 into main Apr 7, 2026
3 checks passed
@MegaRedHand MegaRedHand deleted the issue-259-oversized-blocks branch April 7, 2026 21:21


Development

Successfully merging this pull request may close these issues.

Block builder produces oversized blocks during stalls (MAX_PAYLOAD_SIZE violation)

2 participants