Skip to content

Advance XMSS key preparation window before signing#261

Merged
MegaRedHand merged 3 commits intomainfrom
xmss-advance-key-preparation
Apr 8, 2026
Merged

Advance XMSS key preparation window before signing#261
MegaRedHand merged 3 commits intomainfrom
xmss-advance-key-preparation

Conversation

@pablodeymo
Copy link
Copy Markdown
Collaborator

@pablodeymo pablodeymo commented Apr 7, 2026

Summary

  • Expose is_prepared_for(slot) and advance_preparation() on ValidatorSecretKey, delegating to the leansig SignatureSchemeSecretKey trait
  • Before signing in KeyManager::sign_message(), check if the target slot is within the prepared window and advance if needed
  • Return a descriptive error if the key's activation interval is fully exhausted
  • Add a timing test for advance_preparation() (526ms in release mode on Apple Silicon)

Root Cause

XMSS keys use a Top-Bottom Tree Traversal scheme where only two consecutive bottom trees are loaded in memory at any time. Each bottom tree covers sqrt(LIFETIME) = 2^16 = 65,536 slots, so the prepared window spans 131,072 slots (~6 days at 4s/slot).

The leansig library provides advance_preparation() to slide this window forward by computing the next bottom tree, but ethlambda's KeyManager never called it. When the devnet at admin@ethlambda-1 reached slot 131,072, all 4 nodes panicked simultaneously:

Signing: key not yet prepared for this epoch, try calling sk.advance_preparation.

The fix checks the prepared interval before every sign operation and advances the window on demand. This is a lazy approach — advance_preparation is called at signing time rather than proactively in the tick loop — because:

  • It happens once every ~3 days (65,536 slots)
  • The computation (one bottom tree of hash leaves) takes ~526ms in release mode
  • It keeps the change minimal and avoids tick-loop complexity

Test plan

  • make fmt clean
  • make lint clean
  • make test passes (existing tests use small lifetimes or skip verification)
  • test_advance_preparation_duration passes (cargo test -p ethlambda-types test_advance_preparation_duration --release -- --ignored --nocapture)
  • Deploy to devnet with fresh genesis and verify it runs past slot 131,072 without panic

…t bottom-tree boundaries

  The XMSS signing key maintains a sliding window of 2 bottom trees (131,072 slots).
  When the window is exhausted, advance_preparation() must be called to compute the
  next bottom tree. Without this, all nodes panic at the window boundary.
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

🤖 Codex Code Review

  1. crates/blockchain/src/key_manager.rs:108 The new “advance preparation” loop is functionally a single-step check, not a real loop. After one advance_preparation(), line 112 immediately returns an exhaustion error if the slot is still outside the prepared interval, so slots that are more than one bottom-tree ahead can never recover. Example: if a validator is offline long enough to skip more than 65,536 slots, signing will fail even though repeated advances should make the target slot signable. This is a consensus-liveness bug for local validator duties. The fix is to keep advancing until either is_prepared_for(slot) becomes true or you can positively detect that advance_preparation() no longer changes the interval.

  2. crates/blockchain/src/key_manager.rs:113 The new error text says the slot is “beyond the key’s activation interval”, but this branch is also reached whenever the slot is merely still outside the prepared window after the first advance. With the current control flow, that message can be false and will mislead debugging. If the code keeps this API shape, it should distinguish “needs more window advances” from true key exhaustion.

No other issues stood out in the diff itself. I could not fully verify with cargo test because dependency fetching is blocked in this sandbox.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 7, 2026

Greptile Summary

This PR fixes a production panic that occurred at slot 131,072 by checking whether the XMSS key's prepared window covers the target slot before signing, and calling advance_preparation() if it doesn't. The fix is minimal and well-targeted: two new methods are exposed on ValidatorSecretKey in signature.rs, and KeyManager::sign_message uses them before invoking the signer.

Confidence Score: 4/5

Safe to merge for the common case; one logic concern in the advancement loop is worth addressing for robustness.

The while loop contains an inner if !is_prepared_for guard that returns an error immediately after the first advance, making the loop semantically equivalent to a single-shot if. In normal operation (slots advancing monotonically, at most one window behind) this is fine and the PR fully solves the reported panic. The concern only surfaces if a node is offline for more than ~131k slots (~6 days), where two or more advances would be needed but the current code would incorrectly report key exhaustion.

crates/blockchain/src/key_manager.rs — the while loop in the preparation window advancement logic.

Important Files Changed

Filename Overview
crates/blockchain/src/key_manager.rs Adds XMSS key preparation window advancement before signing; the while loop logic has a subtle issue where it only ever advances once before erroring, rather than iterating until prepared or truly exhausted.
crates/common/types/src/signature.rs Exposes is_prepared_for(slot) and advance_preparation() on ValidatorSecretKey by delegating to the leansig SignatureSchemeSecretKey trait; straightforward and correct.

Sequence Diagram

sequenceDiagram
    participant BC as BlockChain
    participant KM as KeyManager
    participant VSK as ValidatorSecretKey
    participant LS as leansig (inner)

    BC->>KM: sign_attestation(validator_id, attestation_data)
    KM->>KM: slot = attestation_data.slot as u32
    KM->>KM: sign_message(validator_id, slot, message_hash)
    KM->>VSK: is_prepared_for(slot)
    VSK->>LS: get_prepared_interval().contains(slot as u64)
    LS-->>VSK: bool
    VSK-->>KM: false (slot outside window)
    note over KM: log info — advancing window
    KM->>VSK: advance_preparation()
    VSK->>LS: advance_preparation() — slides window by 65,536 slots
    KM->>VSK: is_prepared_for(slot)
    VSK-->>KM: true
    KM->>VSK: sign(slot, message)
    VSK->>LS: LeanSignatureScheme::sign(&inner, slot, message)
    LS-->>VSK: ValidatorSignature
    VSK-->>KM: Ok(signature)
    KM-->>BC: Ok(XmssSignature)
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: crates/blockchain/src/key_manager.rs
Line: 110-118

Comment:
**`while` loop only ever advances once**

The inner `if !is_prepared_for` guard returns an error immediately after the first `advance_preparation()` call, so the `while` never actually iterates a second time. If the slot happens to be two or more windows ahead of the current prepared interval (e.g., after an extended outage), the key would be incorrectly reported as exhausted even though another advance would bring it in range.

To detect true exhaustion (where `advance_preparation()` is a no-op), capture the interval before and after and compare:

```rust
while !secret_key.is_prepared_for(slot) {
    let before = secret_key.get_prepared_interval();
    secret_key.advance_preparation();
    if secret_key.get_prepared_interval() == before {
        // No-op: key activation interval is fully exhausted
        return Err(KeyManagerError::SigningError(format!(
            "XMSS key exhausted for validator {validator_id}: \
             slot {slot} is beyond the key's activation interval"
        )));
    }
}
```

This requires exposing `get_prepared_interval()` on `ValidatorSecretKey`, but correctly distinguishes "key exhausted" from "slot is multiple windows ahead".

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "Advance XMSS key preparation window befo..." | Re-trigger Greptile

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

🤖 Claude Code Review

Now I have everything I need to write the review.


Code Review: Advance XMSS key preparation window before signing

Summary: The PR fixes a real production panic and the overall approach is sound. There's one logic bug in the loop structure and a misleading error message that should be addressed.


Bug: while loop is effectively single-iteration (key_manager.rs:110–118)

while !secret_key.is_prepared_for(slot) {
    secret_key.advance_preparation();
    if !secret_key.is_prepared_for(slot) {
        return Err(KeyManagerError::SigningError(format!(
            "XMSS key exhausted for validator {validator_id}: \
             slot {slot} is beyond the key's activation interval"
        )));
    }
}

The while suggests it can loop multiple times, but it cannot. After each advance_preparation() call, if the slot is still not covered, the function immediately returns an error—the while condition is never re-evaluated for a second iteration. The structure is equivalent to a plain if:

secret_key.advance_preparation();
if !secret_key.is_prepared_for(slot) { return Err(...); }

When does this matter? Each advance_preparation() slides the window by one bottom tree (65,536 slots). If a node restarts with an un-advanced key at a high-numbered slot—e.g., slot 400,000 with window [0, 131,071]—five advances are required. The current code bails out after the first failed advance and reports "key exhausted" for a key that is not actually exhausted.

A concrete failure case:

  • Window: [0, 131,071], target slot: 200,000
  • One advance: [65,536, 196,607] — slot 200,000 still not covered → premature error
  • Two advances: [131,072, 262,143] — slot 200,000 covered ✓

A correct implementation loops until prepared or until advance_preparation is provably a no-op. One clean approach—add get_prepared_interval() to ValidatorSecretKey and detect stall:

if !secret_key.is_prepared_for(slot) {
    info!(validator_id, slot, "Advancing XMSS key preparation window");
    loop {
        let prev_end = secret_key.prepared_interval_end(); // new helper
        secret_key.advance_preparation();
        if secret_key.is_prepared_for(slot) {
            break;
        }
        if secret_key.prepared_interval_end() == prev_end {
            // advance_preparation was a no-op; key truly exhausted
            return Err(KeyManagerError::SigningError(format!(
                "XMSS key exhausted for validator {validator_id}: \
                 slot {slot} is beyond the key's activation interval"
            )));
        }
    }
}

Alternatively, a bounded loop with a generous MAX_ADVANCES constant (e.g., 64) keeps it simple without needing to expose the interval:

for _ in 0..MAX_ADVANCES {
    if secret_key.is_prepared_for(slot) { break; }
    secret_key.advance_preparation();
}
if !secret_key.is_prepared_for(slot) {
    return Err(...);
}

Minor: Misleading error text

The error message "slot {slot} is beyond the key's activation interval" can fire (due to the single-advance logic above) for slots that are within the activation interval but just need more than one window advance. The message would mislead operators into thinking the key is permanently dead when they may simply need to advance it further. The message should be revised once the loop logic is fixed.


Rest of the PR

  • signature.rs: The two new delegation methods (is_prepared_for, advance_preparation) are clean and correctly placed. The cast slot as u64 in is_prepared_for is fine (u32 max ≈ 170 years at 4s/slot).
  • key_manager.rs: The tracing::info! log (line 109) fires once per advance event outside the inner loop, which is the right level of verbosity.
  • The lazy-advance approach (vs. proactive tick-loop advancing) is a reasonable trade-off as explained in the PR description.
  • sign_attestation is the only public entry point and routes through sign_message, so the fix covers all signing paths.

Automated review by Claude (Anthropic) · sonnet · custom prompt

Comment on lines +110 to +118
while !secret_key.is_prepared_for(slot) {
secret_key.advance_preparation();
if !secret_key.is_prepared_for(slot) {
return Err(KeyManagerError::SigningError(format!(
"XMSS key exhausted for validator {validator_id}: \
slot {slot} is beyond the key's activation interval"
)));
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 while loop only ever advances once

The inner if !is_prepared_for guard returns an error immediately after the first advance_preparation() call, so the while never actually iterates a second time. If the slot happens to be two or more windows ahead of the current prepared interval (e.g., after an extended outage), the key would be incorrectly reported as exhausted even though another advance would bring it in range.

To detect true exhaustion (where advance_preparation() is a no-op), capture the interval before and after and compare:

while !secret_key.is_prepared_for(slot) {
    let before = secret_key.get_prepared_interval();
    secret_key.advance_preparation();
    if secret_key.get_prepared_interval() == before {
        // No-op: key activation interval is fully exhausted
        return Err(KeyManagerError::SigningError(format!(
            "XMSS key exhausted for validator {validator_id}: \
             slot {slot} is beyond the key's activation interval"
        )));
    }
}

This requires exposing get_prepared_interval() on ValidatorSecretKey, but correctly distinguishes "key exhausted" from "slot is multiple windows ahead".

Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/blockchain/src/key_manager.rs
Line: 110-118

Comment:
**`while` loop only ever advances once**

The inner `if !is_prepared_for` guard returns an error immediately after the first `advance_preparation()` call, so the `while` never actually iterates a second time. If the slot happens to be two or more windows ahead of the current prepared interval (e.g., after an extended outage), the key would be incorrectly reported as exhausted even though another advance would bring it in range.

To detect true exhaustion (where `advance_preparation()` is a no-op), capture the interval before and after and compare:

```rust
while !secret_key.is_prepared_for(slot) {
    let before = secret_key.get_prepared_interval();
    secret_key.advance_preparation();
    if secret_key.get_prepared_interval() == before {
        // No-op: key activation interval is fully exhausted
        return Err(KeyManagerError::SigningError(format!(
            "XMSS key exhausted for validator {validator_id}: \
             slot {slot} is beyond the key's activation interval"
        )));
    }
}
```

This requires exposing `get_prepared_interval()` on `ValidatorSecretKey`, but correctly distinguishes "key exhausted" from "slot is multiple windows ahead".

How can I resolve this? If you propose a fix, please make it concise.

Generates a production-scheme XMSS key with 3 bottom trees and
measures the time to slide the preparation window forward. Marked
#[ignore] since key generation takes ~1.6s even in release mode.
Copy link
Copy Markdown
Collaborator

@MegaRedHand MegaRedHand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The while loop is redundant


// Advance XMSS key preparation window if the slot is outside the current window.
// Each bottom tree covers 65,536 slots; the window holds 2 at a time.
if !secret_key.is_prepared_for(slot) {
Copy link
Copy Markdown
Collaborator

@MegaRedHand MegaRedHand Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #262 to tackle this later

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks for tracking it.

Comment thread crates/blockchain/src/key_manager.rs
…tages

The previous loop returned an error after a single failed advance, but
a node that was offline for more than one window (65,536 slots) needs
multiple advances. Now detects true key exhaustion by comparing the
prepared interval before and after advance_preparation().
@MegaRedHand MegaRedHand merged commit 172d9b0 into main Apr 8, 2026
3 checks passed
@MegaRedHand MegaRedHand deleted the xmss-advance-key-preparation branch April 8, 2026 18:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants