Advance XMSS key preparation window before signing#261
Conversation
…t bottom-tree boundaries The XMSS signing key maintains a sliding window of 2 bottom trees (131,072 slots). When the window is exhausted, advance_preparation() must be called to compute the next bottom tree. Without this, all nodes panic at the window boundary.
🤖 Codex Code Review
No other issues stood out in the diff itself. I could not fully verify with Automated review by OpenAI Codex · gpt-5.4 · custom prompt |
Greptile SummaryThis PR fixes a production panic that occurred at slot 131,072 by checking whether the XMSS key's prepared window covers the target slot before signing, and calling Confidence Score: 4/5Safe to merge for the common case; one logic concern in the advancement loop is worth addressing for robustness. The while loop contains an inner crates/blockchain/src/key_manager.rs — the while loop in the preparation window advancement logic.
|
| Filename | Overview |
|---|---|
| crates/blockchain/src/key_manager.rs | Adds XMSS key preparation window advancement before signing; the while loop logic has a subtle issue where it only ever advances once before erroring, rather than iterating until prepared or truly exhausted. |
| crates/common/types/src/signature.rs | Exposes is_prepared_for(slot) and advance_preparation() on ValidatorSecretKey by delegating to the leansig SignatureSchemeSecretKey trait; straightforward and correct. |
Sequence Diagram
sequenceDiagram
participant BC as BlockChain
participant KM as KeyManager
participant VSK as ValidatorSecretKey
participant LS as leansig (inner)
BC->>KM: sign_attestation(validator_id, attestation_data)
KM->>KM: slot = attestation_data.slot as u32
KM->>KM: sign_message(validator_id, slot, message_hash)
KM->>VSK: is_prepared_for(slot)
VSK->>LS: get_prepared_interval().contains(slot as u64)
LS-->>VSK: bool
VSK-->>KM: false (slot outside window)
note over KM: log info — advancing window
KM->>VSK: advance_preparation()
VSK->>LS: advance_preparation() — slides window by 65,536 slots
KM->>VSK: is_prepared_for(slot)
VSK-->>KM: true
KM->>VSK: sign(slot, message)
VSK->>LS: LeanSignatureScheme::sign(&inner, slot, message)
LS-->>VSK: ValidatorSignature
VSK-->>KM: Ok(signature)
KM-->>BC: Ok(XmssSignature)
Prompt To Fix All With AI
This is a comment left during a code review.
Path: crates/blockchain/src/key_manager.rs
Line: 110-118
Comment:
**`while` loop only ever advances once**
The inner `if !is_prepared_for` guard returns an error immediately after the first `advance_preparation()` call, so the `while` never actually iterates a second time. If the slot happens to be two or more windows ahead of the current prepared interval (e.g., after an extended outage), the key would be incorrectly reported as exhausted even though another advance would bring it in range.
To detect true exhaustion (where `advance_preparation()` is a no-op), capture the interval before and after and compare:
```rust
while !secret_key.is_prepared_for(slot) {
let before = secret_key.get_prepared_interval();
secret_key.advance_preparation();
if secret_key.get_prepared_interval() == before {
// No-op: key activation interval is fully exhausted
return Err(KeyManagerError::SigningError(format!(
"XMSS key exhausted for validator {validator_id}: \
slot {slot} is beyond the key's activation interval"
)));
}
}
```
This requires exposing `get_prepared_interval()` on `ValidatorSecretKey`, but correctly distinguishes "key exhausted" from "slot is multiple windows ahead".
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "Advance XMSS key preparation window befo..." | Re-trigger Greptile
🤖 Claude Code ReviewNow I have everything I need to write the review. Code Review: Advance XMSS key preparation window before signingSummary: The PR fixes a real production panic and the overall approach is sound. There's one logic bug in the loop structure and a misleading error message that should be addressed. Bug:
|
| while !secret_key.is_prepared_for(slot) { | ||
| secret_key.advance_preparation(); | ||
| if !secret_key.is_prepared_for(slot) { | ||
| return Err(KeyManagerError::SigningError(format!( | ||
| "XMSS key exhausted for validator {validator_id}: \ | ||
| slot {slot} is beyond the key's activation interval" | ||
| ))); | ||
| } | ||
| } |
There was a problem hiding this comment.
while loop only ever advances once
The inner if !is_prepared_for guard returns an error immediately after the first advance_preparation() call, so the while never actually iterates a second time. If the slot happens to be two or more windows ahead of the current prepared interval (e.g., after an extended outage), the key would be incorrectly reported as exhausted even though another advance would bring it in range.
To detect true exhaustion (where advance_preparation() is a no-op), capture the interval before and after and compare:
while !secret_key.is_prepared_for(slot) {
let before = secret_key.get_prepared_interval();
secret_key.advance_preparation();
if secret_key.get_prepared_interval() == before {
// No-op: key activation interval is fully exhausted
return Err(KeyManagerError::SigningError(format!(
"XMSS key exhausted for validator {validator_id}: \
slot {slot} is beyond the key's activation interval"
)));
}
}This requires exposing get_prepared_interval() on ValidatorSecretKey, but correctly distinguishes "key exhausted" from "slot is multiple windows ahead".
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/blockchain/src/key_manager.rs
Line: 110-118
Comment:
**`while` loop only ever advances once**
The inner `if !is_prepared_for` guard returns an error immediately after the first `advance_preparation()` call, so the `while` never actually iterates a second time. If the slot happens to be two or more windows ahead of the current prepared interval (e.g., after an extended outage), the key would be incorrectly reported as exhausted even though another advance would bring it in range.
To detect true exhaustion (where `advance_preparation()` is a no-op), capture the interval before and after and compare:
```rust
while !secret_key.is_prepared_for(slot) {
let before = secret_key.get_prepared_interval();
secret_key.advance_preparation();
if secret_key.get_prepared_interval() == before {
// No-op: key activation interval is fully exhausted
return Err(KeyManagerError::SigningError(format!(
"XMSS key exhausted for validator {validator_id}: \
slot {slot} is beyond the key's activation interval"
)));
}
}
```
This requires exposing `get_prepared_interval()` on `ValidatorSecretKey`, but correctly distinguishes "key exhausted" from "slot is multiple windows ahead".
How can I resolve this? If you propose a fix, please make it concise.Generates a production-scheme XMSS key with 3 bottom trees and measures the time to slide the preparation window forward. Marked #[ignore] since key generation takes ~1.6s even in release mode.
MegaRedHand
left a comment
There was a problem hiding this comment.
The while loop is redundant
|
|
||
| // Advance XMSS key preparation window if the slot is outside the current window. | ||
| // Each bottom tree covers 65,536 slots; the window holds 2 at a time. | ||
| if !secret_key.is_prepared_for(slot) { |
There was a problem hiding this comment.
Got it, thanks for tracking it.
…tages The previous loop returned an error after a single failed advance, but a node that was offline for more than one window (65,536 slots) needs multiple advances. Now detects true key exhaustion by comparing the prepared interval before and after advance_preparation().
Summary
is_prepared_for(slot)andadvance_preparation()onValidatorSecretKey, delegating to the leansigSignatureSchemeSecretKeytraitKeyManager::sign_message(), check if the target slot is within the prepared window and advance if neededadvance_preparation()(526ms in release mode on Apple Silicon)Root Cause
XMSS keys use a Top-Bottom Tree Traversal scheme where only two consecutive bottom trees are loaded in memory at any time. Each bottom tree covers
sqrt(LIFETIME) = 2^16 = 65,536slots, so the prepared window spans131,072slots (~6 days at 4s/slot).The leansig library provides
advance_preparation()to slide this window forward by computing the next bottom tree, but ethlambda'sKeyManagernever called it. When the devnet atadmin@ethlambda-1reached slot 131,072, all 4 nodes panicked simultaneously:The fix checks the prepared interval before every sign operation and advances the window on demand. This is a lazy approach —
advance_preparationis called at signing time rather than proactively in the tick loop — because:Test plan
make fmtcleanmake lintcleanmake testpasses (existing tests use small lifetimes or skip verification)test_advance_preparation_durationpasses (cargo test -p ethlambda-types test_advance_preparation_duration --release -- --ignored --nocapture)