Fix scripts multisig rotation race condition #1260
Closed
ryan-hansen wants to merge 5 commits into WebOfTrust:main from
- increase retry limit to 3 seconds, effectively 2 retries over 3 seconds.
Not quite ready yet, apparently.
SmithSamuelM (Collaborator) approved these changes on Mar 1, 2026 and left a comment:
Convert this to a non-draft so it can be accepted
The decision was made to handle retries and waits, as necessary, at the script level rather than in the actual CLI code. Closing this; I will push a new PR for that work.
Problem: The multisig-join.sh CLI script was hanging.
Two causes:
1 - A timing race: when one member queried the other’s KEL after local rotations, the query could be answered before the witness had applied that member’s rotation. The group rotation was then built with stale keys and failed with “invalid rotation, new key set unable to satisfy prior next signing threshold”.
2 - A missing timeout: when the rotate failed, the companion multisig join waited indefinitely for a group event that never came, so the script’s wait never returned.
Fix:
1 - Added a --timeout option to kli multisig join so it exits with an error instead of hanging when no group multisig event arrives.
2 - Added retry logic in the group rotation CLI so transient ValidationErrors (e.g. from stale member KEL state) are retried for a short window before failing.
3 - Hardened the demo script by using --timeout on all join calls and adding short sleeps before cross-queries so witnesses have time to apply rotations.
4 - Added kering.TimeoutError for join timeouts.
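The two code-level fixes above (a bounded wait instead of an indefinite one, and a short retry window for transient validation failures) can be illustrated with a minimal Python sketch. This is not the code from the PR: `retry_within`, `wait_for`, `ValidationError`, and `JoinTimeoutError` are hypothetical stand-ins defined here for illustration, and the window and interval defaults are assumptions rather than the values the PR uses.

```python
import time


class ValidationError(Exception):
    """Hypothetical stand-in for a transient validation failure."""


class JoinTimeoutError(Exception):
    """Hypothetical stand-in for the error raised when a join times out."""


def retry_within(action, window=3.0, interval=1.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Call action() until it succeeds or the retry window is used up.

    Only ValidationError is retried; any other exception propagates
    immediately. The last ValidationError is re-raised once another
    attempt would start after the deadline.
    """
    deadline = clock() + window
    while True:
        try:
            return action()
        except ValidationError:
            if clock() + interval > deadline:
                raise  # window exhausted: surface the failure
            sleep(interval)


def wait_for(poll, timeout, interval=0.25,
             clock=time.monotonic, sleep=time.sleep):
    """Poll until poll() returns a non-None value, or raise on timeout.

    This replaces an unbounded wait: if the expected event never
    arrives, the caller gets an error instead of hanging.
    """
    deadline = clock() + timeout
    while True:
        result = poll()
        if result is not None:
            return result
        if clock() >= deadline:
            raise JoinTimeoutError(f"no event after {timeout} seconds")
        sleep(interval)
```

In this sketch a caller would wrap the step that can see stale state in `retry_within`, and wrap the wait for the group event in `wait_for` so that a failed rotation on one side surfaces as an error on the other. The `clock` and `sleep` parameters exist only so the timing can be driven by a fake clock in tests.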