fix: self-deliver aggregator attestations to gossip_signatures#270

Merged
pablodeymo merged 5 commits into main from fix/aggregator-self-delivery
Apr 9, 2026

Conversation

@MegaRedHand
Collaborator

Summary

  • Aggregator now calls on_gossip_attestation() locally before publishing to gossip, ensuring its own validator's attestation enters gossip_signatures and is included in aggregated proofs
  • Adds publish_best_effort() to suppress the expected NoPeersSubscribedToTopic warning when the aggregator publishes to attestation subnets it subscribes to
  • Threads is_aggregator from SwarmConfig through BuiltSwarm into P2PServer so the gossipsub handler can use best-effort publishing

Context

Gossipsub does not deliver messages back to the sender. The aggregator subscribes to the attestation subnet and publishes via mesh, but in small devnets no other node subscribes to attestation subnets. This causes the aggregator's own attestation publish to fail silently with NoPeersSubscribedToTopic, meaning its vote never enters gossip_signatures, is never aggregated, and is never included in blocks.

The reference spec handles this explicitly in _produce_attestations():

# Process attestation locally before publishing.
#
# Gossipsub does not deliver messages back to the sender.
# Without local processing, the aggregator node never sees its own
# validator's attestation in gossip_signatures, reducing the
# aggregation count below the 2/3 safe-target threshold.
self.sync_service.store = self.sync_service.store.on_gossip_attestation(
    signed_attestation=signed_attestation,
    is_aggregator=is_aggregator_role,
)

Test plan

  • cargo build passes
  • cargo clippy --workspace -- -D warnings passes
  • cargo test --workspace --release --lib passes (all 32 unit tests)
  • Deploy to devnet and verify:
    • Validator 3's attestation appears in "Attestation processed" logs on the aggregator
    • Blocks from all proposers include aggregator's vote in aggregated proofs (higher attestation counts)
    • No more NoPeersSubscribedToTopic warnings on the aggregator node
    • Finalization continues normally

Gossipsub does not deliver messages back to the sender. The aggregator
subscribes to the attestation subnet and publishes via mesh, but in
small networks no other node subscribes, so the publish fails with
NoPeersSubscribedToTopic and the aggregator's own attestation never
enters gossip_signatures. This means it is excluded from aggregated
proofs in blocks.

Fix by calling on_gossip_attestation() locally before publishing to
gossip, matching the reference spec behavior. Also suppress the
expected NoPeersSubscribedToTopic warning for the aggregator's
attestation publish using a best-effort publish mode.
@github-actions

github-actions bot commented Apr 9, 2026

🤖 Claude Code Review

Review: fix: self-deliver aggregator attestations to gossip_signatures

Summary: This fix correctly addresses a consensus-critical bug where the aggregator's own validator signature was excluded from aggregated proofs. The approach matches the reference Python spec. The implementation is clean and well-reasoned overall.


Correctness

Self-delivery ordering (lib.rs:201-210): Self-delivery happens before the gossip publish — correct, matching the reference spec's "process locally before publishing" pattern. The signature goes into gossip_signatures unconditionally regardless of network state.

Idempotency on re-delivery: insert_gossip_signature at storage/src/store.rs:998-1007 has a dedup check (any(|e| e.validator_id == validator_id)). If the attestation somehow propagates back from another peer, the second insert is a no-op. Good.
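
A self-contained sketch of that dedup behavior (the entry struct and function shape here are illustrative stand-ins for the real `insert_gossip_signature` in storage/src/store.rs):

```rust
#[derive(Debug)]
struct GossipSignatureEntry {
    validator_id: u64,
    signature: Vec<u8>,
}

/// Appends a signature unless one from the same validator is already present,
/// so a re-delivered attestation (e.g. echoed back from a peer) is a no-op.
fn insert_gossip_signature(
    entries: &mut Vec<GossipSignatureEntry>,
    validator_id: u64,
    signature: Vec<u8>,
) {
    if entries.iter().any(|e| e.validator_id == validator_id) {
        return;
    }
    entries.push(GossipSignatureEntry { validator_id, signature });
}

fn main() {
    let mut entries = Vec::new();
    insert_gossip_signature(&mut entries, 3, vec![0xAA]);
    insert_gossip_signature(&mut entries, 3, vec![0xAA]); // duplicate: no-op
    assert_eq!(entries.len(), 1);
}
```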

Full validation preserved: The self-delivery path goes through store::on_gossip_attestation, which runs validate_attestation_data + full XMSS signature verification (lines 355, 376-383). No security checks are bypassed — the aggregator node verifies its own freshly-signed attestation, which is correct.


Concerns

best_effort silently discards all publish errors (swarm_adapter.rs:151-154):

let result = swarm.behaviour_mut().gossipsub.publish(topic, data);
if !best_effort {
    let _ = result.inspect_err(|err| warn!(%err, "Swarm adapter: publish failed"));
}

The PR's intent is to suppress the expected NoPeersSubscribedToTopic warning. But best_effort: true silently discards all error variants — including Duplicate, MessageTooLarge, SigningError, etc. In the aggregator case these other errors would be invisible. A more surgical fix:

if let Err(err) = result {
    if !best_effort || !matches!(err, PublishError::NoPeersSubscribedToTopic) {
        warn!(%err, "Swarm adapter: publish failed");
    }
}

This keeps the rationale precise and avoids silently masking real problems in future.
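
The selective-suppression pattern can be illustrated in isolation; the `PublishError` enum below is a local stand-in for libp2p's type, with only the variants the example needs:

```rust
// Stand-in for libp2p's gossipsub PublishError, for illustration only.
#[derive(Debug)]
enum PublishError {
    NoPeersSubscribedToTopic,
    MessageTooLarge,
}

/// Returns true when a publish error should still be logged: best-effort mode
/// suppresses only the expected no-peers case, never other failures.
fn should_warn(best_effort: bool, err: &PublishError) -> bool {
    !best_effort || !matches!(err, PublishError::NoPeersSubscribedToTopic)
}

fn main() {
    assert!(!should_warn(true, &PublishError::NoPeersSubscribedToTopic));
    assert!(should_warn(true, &PublishError::MessageTooLarge));
    assert!(should_warn(false, &PublishError::NoPeersSubscribedToTopic));
}
```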

is_aggregator coupling in the P2P layer (handler.rs:150-154): The publish_attestation function in the network layer now makes a role-based decision. The comment explains the reasoning well, so this is acceptable for now. Worth noting that if non-aggregator nodes are ever made to subscribe to subnets, they'd have the same NoPeersSubscribedToTopic issue on small networks, and the fix here wouldn't apply to them.


Minor points

  • The publish_best_effort name implies a difference in delivery semantics, but both publish and publish_best_effort are fire-and-forget through the unbounded channel. The only difference is error logging. A name like publish_suppress_no_peers_warning or a silent: bool param on publish itself would more precisely describe the distinction, though the current doc comment does explain it adequately.

  • The warn! level for self-delivery failure (lib.rs:208) seems appropriate — this is a recoverable condition since the gossip publish still proceeds.


Overall

The fix is correct and necessary for consensus progress in small devnets. The self-delivery approach is the right solution for the gossipsub limitation. The main actionable concern is the overly broad best_effort error suppression — worth narrowing to NoPeersSubscribedToTopic specifically to preserve observability for other error variants.


Automated review by Claude (Anthropic) · sonnet · custom prompt

@github-actions

github-actions bot commented Apr 9, 2026

🤖 Codex Code Review

  1. best_effort currently suppresses all gossipsub publish errors for aggregator attestations, not just the benign NoPeersSubscribedToTopic case. In swarm_adapter.rs, the best_effort branch drops the full publish() result, and handler.rs enables that path for every aggregator attestation publish. That means real failures like oversize payloads or other libp2p publish errors become silent, while the caller still logs the attestation as published. For consensus networking this is a liveness risk and makes diagnosis much harder. I’d narrow the suppression to the specific NoPeersSubscribedToTopic error and keep warning on anything else.

No other correctness or consensus regressions stood out from the diff. The self-delivery change in lib.rs is consistent with the current aggregation flow and should fix the local-validator exclusion problem without affecting fork-choice accounting.

I couldn’t run the Rust test suites in this environment because cargo tried to update the toolchain via rustup, which failed on a read-only filesystem under /home/runner/.rustup/tmp.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions

github-actions bot commented Apr 9, 2026

🤖 Kimi Code Review

Overall Assessment: This PR correctly addresses a critical liveness issue where aggregators were excluding their own attestations from aggregated proofs due to gossipsub not echoing messages back to the sender. The implementation is clean and the separation of publish vs publish_best_effort is idiomatic.

Issues and Concerns:

  1. Idempotency Risk in Self-Delivery (crates/blockchain/src/lib.rs:201-207)
    The attestation is self-delivered via on_gossip_attestation before network publication. If the node later receives its own attestation through the network (e.g., due to mesh propagation delays or network loops), store::on_gossip_attestation must be idempotent or perform deduplication. If it returns an error for duplicates, this could spam logs or cause incorrect state transitions.

  2. Silent Failure of Critical Store Operation (crates/blockchain/src/lib.rs:203-205)
    If store::on_gossip_attestation fails (e.g., disk I/O error), the aggregator loses its own attestation permanently for this slot, potentially resulting in an aggregate that excludes the validator's own signature. This is a liveness risk if the validator is the sole aggregator for the committee. Consider:

    • Propagating the error to prevent publishing (if local storage is a prerequisite for aggregation)
    • Or escalating the log level to error instead of warn
  3. Synchronous Store Blocking (crates/blockchain/src/lib.rs:201)
    The self-delivery executes synchronously before network publication. If the store operation is slow (database latency), it delays gossip propagation. Consider whether this should be spawned as a background task, though the current approach ensures ordering.

Code Quality Suggestions:

  • Line 203: The clone() of signed_attestation is necessary but consider documenting the ownership flow explicitly, as this data could be large (XMSS signatures are typically larger than ECDSA).

  • Line 150-154 (crates/net/p2p/src/gossipsub/handler.rs): The logic distinguishing aggregator vs non-aggregator publish paths is correct, but add a comment clarifying that non-aggregators rely on fanout propagation and thus expect different error behaviors than aggregators using mesh.

Security Considerations:

  • No verification bypass: Ensure store::on_gossip_attestation does not skip XMSS signature verification for self-delivered attestations unless explicitly intended. Since the attestation was just created locally, it should be valid, but defensive programming suggests the store layer should remain agnostic to the caller.

  • Topic subscription consistency: Verify that server.is_aggregator accurately reflects the node's subscription state. If a node is marked as aggregator but not actually subscribed to the subnet topic, publish_best_effort could mask legitimate network connectivity errors.

Positive Notes:

  • The best_effort flag in SwarmCommand is a clean, non-breaking way to handle expected mesh isolation errors without changing the default failure behavior for other publishers.
  • Comments explaining the gossipsub mesh/fanout distinction are accurate and helpful for future maintainers.

Recommendation: Approve with minor fixes for error handling severity and idempotency documentation.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@greptile-apps
Contributor

greptile-apps bot commented Apr 9, 2026

Greptile Summary

This PR fixes a critical small-devnet finalization bug: the aggregator now self-delivers its own attestation via on_gossip_attestation before publishing (since gossipsub does not loop messages back to the sender), and introduces publish_best_effort to suppress the expected NoPeersSubscribedToTopic error. It also removes PostBlockCheckpoints and simplifies the proposer attestation path.

  • P1 – XMSS ordering regression (crates/storage/src/store.rs): GossipDataEntry.signatures was changed from BTreeMap<u64, …> (guaranteed ascending order) to Vec<GossipSignatureEntry> (insertion order). The deleted comment explicitly documented that XMSS aggregation is order-dependent and that the bitfield-based verifier reconstructs pubkeys low-to-high. Self-delivery now inserts the aggregator's signature first, potentially producing non-ascending order for subsequent gossip arrivals, causing proof verification to fail on any block with more than one aggregated attestor.

Confidence Score: 3/5

Not safe to merge as-is: the BTreeMap→Vec change likely breaks XMSS proof verification in any multi-validator scenario where the aggregator's validator_id is not the lowest among attestors.

One P1 finding: removing the BTreeMap ordering guarantee for gossip_signatures silently breaks the producer/verifier pubkey-order contract that was explicitly documented in the removed comment. The self-delivery fix itself is correct and valuable, but the Vec change introduces a regression that would surface whenever more than one attestation signature is aggregated and the aggregator is not the lowest-id validator.

crates/storage/src/store.rs — the GossipDataEntry.signatures Vec needs a sort step in iter_gossip_signatures or insert_gossip_signature to restore ascending validator_id order.

Vulnerabilities

No security concerns identified. The best_effort path only suppresses log output; it does not bypass any authentication or signature-verification logic. The self-delivery path reuses the existing on_gossip_attestation signature-verification flow.

Important Files Changed

  • crates/storage/src/store.rs: Changed GossipDataEntry.signatures from BTreeMap (ascending-ordered) to Vec (insertion-ordered); breaks the XMSS aggregation ordering requirement in multi-validator scenarios where the self-delivering aggregator does not have the lowest validator_id.
  • crates/blockchain/src/lib.rs: Adds aggregator self-delivery via on_gossip_attestation before publishing; also simplifies the proposer attestation to use pre-import store state instead of post-STF PostBlockCheckpoints, which may produce a stale source checkpoint when a block advances justification.
  • crates/blockchain/src/store.rs: Removes PostBlockCheckpoints and get_attestation_target_with_checkpoints and simplifies the produce_block_with_signatures return type; mostly a cleanup, but it removes a correctness guard for the proposer attestation source.
  • crates/net/p2p/src/gossipsub/handler.rs: The aggregator now uses publish_best_effort for the attestation subnet, suppressing the expected NoPeersSubscribedToTopic error; the non-aggregator path is unchanged.
  • crates/net/p2p/src/lib.rs: Threads the is_aggregator flag from SwarmConfig through BuiltSwarm into P2PServer cleanly; no logic concerns.
  • crates/net/p2p/src/swarm_adapter.rs: Adds a best_effort flag to SwarmCommand::Publish; silently swallows all publish errors when set (not just NoPeersSubscribedToTopic), reducing observability for unexpected failures.

Sequence Diagram

sequenceDiagram
    participant BC as BlockChain (aggregator)
    participant Store as Store (gossip_signatures)
    participant P2P as P2PServer
    participant Swarm as Swarm (gossipsub)
    participant Peers as Other Nodes

    BC->>BC: produce_attestation(slot, validator_id)
    BC->>BC: sign_attestation → signed_attestation
    Note over BC: NEW: self-deliver before publish
    BC->>Store: on_gossip_attestation(signed_attestation)
    Store->>Store: insert_gossip_signature (Vec, insertion order)
    BC->>P2P: publish_attestation(signed_attestation)
    P2P->>Swarm: publish_best_effort(topic, data)
    Swarm-->>Peers: gossip (may fail: NoPeersSubscribedToTopic, silently ignored)
    Peers-->>Store: insert_gossip_signature (other validators, appended)

    Note over Store: Vec order: [aggregator_id, peer0, peer1, ...]
    Note over Store: Verifier expects ascending order [0, 1, 2, ...]

    BC->>Store: aggregate_committee_signatures (interval 2)
    Store->>Store: iter_gossip_signatures → snapshot in Vec order
    Store->>Store: aggregate_signatures(pubkeys, sigs) → proof
    Note over Store: proof built with non-ascending pubkey order
    Store->>Store: verify_signatures → pubkeys from bitfield (ascending)
    Note over Store: mismatch → verification fails if order differs

Comments Outside Diff (2)

  1. crates/storage/src/store.rs, line 993-1007

    P1 Vec ordering breaks XMSS aggregation for multi-validator devnets

    The BTreeMap that was replaced here guaranteed ascending validator_id order on iteration. The original code included an explicit comment explaining that XMSS aggregate proofs are order-dependent: verification reconstructs pubkeys from the participation bitfield in low-to-high index order, so aggregation must receive pubkeys in the same ascending order. The Vec does not sort on insertion, so when the aggregator self-delivers its attestation first (before network arrivals), its validator_id will appear at position 0 in the Vec regardless of its numeric value.

    In aggregate_committee_signatures, pubkeys and sigs are built by iterating the snapshot in Vec order. In verify_signatures, validator_indices(&attestation.aggregation_bits) reconstructs pubkeys in ascending bitfield order (0 … max). If the aggregator is validator 3 and delivers first, aggregation uses order [3, 0, 1, 2] but verification expects [0, 1, 2, 3]; the proof will fail every block with multiple attestors.

    A minimal fix is to sort entries by validator_id when iterating the snapshot:

    // In iter_gossip_signatures, before mapping:
    let mut sigs: Vec<_> = entry.signatures.iter()
        .map(|e| (e.validator_id, e.signature.clone()))
        .collect();
    sigs.sort_unstable_by_key(|(vid, _)| *vid);

    Or sort in insert_gossip_signature using entry.signatures.sort_unstable_by_key(|e| e.validator_id) after each push.

  2. crates/blockchain/src/lib.rs, line 234-245

    P2 Proposer attestation now uses pre-import justification

    The removed PostBlockCheckpoints struct captured post_state.latest_justified and post_state.latest_finalized from inside build_block, so the proposer's attestation source reflected the checkpoints after the block's attestations were processed. Now self.store.latest_justified() is read from the store before the new block is imported, so if the block's packed attestations cross the 2/3 threshold and advance justification, the proposer's own attestation will carry a stale source checkpoint.

    Is this change intentional? The original comment ("the block's attestations may have advanced justification/finalization but the block hasn't been imported into the store yet") described an active correctness concern, not just a caveat.
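
The ordering concern in the first comment can be checked in isolation; this sketch reproduces the [3, 0, 1, 2] scenario with a stand-in entry struct and the suggested sort step:

```rust
// Stand-in for the real GossipSignatureEntry; field names follow the review.
struct GossipSignatureEntry {
    validator_id: u64,
    signature: Vec<u8>,
}

fn main() {
    // Self-delivery puts validator 3 first; peers 0..=2 arrive afterwards.
    let mut entries: Vec<GossipSignatureEntry> = [3u64, 0, 1, 2]
        .into_iter()
        .map(|validator_id| GossipSignatureEntry { validator_id, signature: vec![] })
        .collect();

    // The suggested fix: restore the ascending order the verifier expects.
    entries.sort_unstable_by_key(|e| e.validator_id);

    let ids: Vec<u64> = entries.iter().map(|e| e.validator_id).collect();
    assert_eq!(ids, vec![0, 1, 2, 3]);
}
```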

Additional inline comment (crates/net/p2p/src/swarm_adapter.rs, lines 143-157):
**`best_effort` silences all errors, not just `NoPeersSubscribedToTopic`**

When `best_effort: true` the entire publish result is dropped silently. `NoPeersSubscribedToTopic` is expected for the aggregator, but other errors (e.g., message-too-large, duplicate `MessageId`) would also be swallowed with no trace. Consider logging at `debug!` or `trace!` even in best-effort mode so unexpected failures remain observable:

```rust
        SwarmCommand::Publish {
            topic,
            data,
            best_effort,
        } => {
            let result = swarm.behaviour_mut().gossipsub.publish(topic, data);
            if best_effort {
                let _ = result.inspect_err(|err| {
                    tracing::debug!(%err, "Swarm adapter: best-effort publish failed (ignored)")
                });
            } else {
                let _ = result.inspect_err(|err| warn!(%err, "Swarm adapter: publish failed"));
            }
        }
```


Collaborator

@pablodeymo pablodeymo left a comment


Style nit (non-blocking): The if let Err pattern here works, but the codebase convention prefers inspect_err for side-effect-only error handling. One option:

let _ = swarm
    .behaviour_mut()
    .gossipsub
    .publish(topic, data)
    .inspect_err(|err| {
        if !(ignore_no_peers && matches!(err, PublishError::NoPeersSubscribedToTopic)) {
            warn!(%err, "Swarm adapter: publish failed");
        }
    });

Purely cosmetic — current version reads fine too.

// Without this the aggregator never sees its own validator's signature in
// gossip_signatures and it is excluded from aggregated proofs.
if self.is_aggregator {
    let _ = store::on_gossip_attestation(&mut self.store, signed_attestation.clone())
Collaborator


can we avoid this clone? we can do this in another PR

Collaborator Author


I think we can: #271

@pablodeymo pablodeymo merged commit 06c4306 into main Apr 9, 2026
2 checks passed
@pablodeymo pablodeymo deleted the fix/aggregator-self-delivery branch April 9, 2026 23:37