Fix key ratchet race condition during DAVE epoch transitions#5
Open
MohmmedAshraf wants to merge 1 commit intodisgoorg:masterfrom
Open
Fix key ratchet race condition during DAVE epoch transitions#5MohmmedAshraf wants to merge 1 commit intodisgoorg:masterfrom
MohmmedAshraf wants to merge 1 commit intodisgoorg:masterfrom
Conversation
When a user joins or leaves a voice channel, Discord triggers a DAVE_PROTOCOL_PREPARE_EPOCH event. The prepareEpoch() function calls Session.Init() which destroys the old MLS crypto state. Discord then sends DavePrepareTransition which calls setupKeyRatchetForUser, but at this point GetKeyRatchet() returns a Go struct wrapping a NULL C pointer because the new MLS handshake hasn't completed yet. This causes SetPassthroughMode(false) to be called on the encryptor which expects encryption keys that don't exist yet, resulting in every Encrypt() call failing with ErrMissingKeyRatchet until the new MLS Welcome message arrives (~1 second later). This fix addresses the race in two places: 1. prepareEpoch: Reset encryptor and all decryptors to passthrough mode BEFORE calling Session.Init(), so audio can continue flowing unencrypted during the brief MLS renegotiation window rather than failing entirely. 2. setupKeyRatchetForUser: Guard against nil key ratchets returned by GetKeyRatchet(). If encryption is enabled but no valid key ratchet exists yet (because the MLS handshake is still in progress), skip the transition and keep passthrough mode active. The next MLS Welcome/Commit will call prepareTransition again with valid keys. 3. newKeyRatchet: Return nil when the underlying C handle is nil, matching the existing pattern used by newWelcomeResult. This prevents wrapping NULL C pointers in Go structs that appear non-nil to callers.
Member
|
@davfsa can you review this? |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
When a user joins or leaves a DAVE-enabled voice channel, Discord triggers a
DAVE_PROTOCOL_PREPARE_EPOCHevent. The current code flow creates a race condition:prepareEpoch()callsSession.Init()which destroys the old MLS crypto stateDavePrepareTransition→setupKeyRatchetForUser(self)GetKeyRatchet()returns a Go struct wrapping a NULL C pointer (old state destroyed, new handshake not done yet)SetPassthroughMode(false)is called → encryptor expects encryption but has no keysEncrypt()call fails withErrMissingKeyRatchetfor ~1 second until the new MLS Welcome arrivesThis means every member join/leave in a DAVE-enabled channel causes ~1 second of complete audio failure for all bots using golibdave.
Root Cause
Two issues:
prepareEpochdestroys crypto state without a safety net.Session.Init()invalidates all key ratchets, but the encryptor/decryptors are still set topassthrough=false— so they try to encrypt/decrypt with destroyed keys.setupKeyRatchetForUserunconditionally disables passthrough. Even whenGetKeyRatchet()returns nil (NULL C handle wrapped in a non-nil Go struct), the code setsSetPassthroughMode(false)and proceeds. ThenewKeyRatchetfunction doesn't check for nil C handles, unlike the existingnewWelcomeResultwhich already does.Fix
Three targeted changes:
1.
prepareEpoch— reset to passthrough before destroying crypto state2.
setupKeyRatchetForUser— guard against nil key ratchetsIf encryption is enabled but
GetKeyRatchet()returns nil (MLS handshake still in progress), skip the transition and keep passthrough active. The next MLS Welcome/Commit will callprepareTransitionagain with valid keys.3.
newKeyRatchet— return nil for nil C handlesMatches the existing pattern used by
newWelcomeResult. Prevents wrapping NULL C pointers in Go structs that appear non-nil to callers.Testing
Tested in production with a Discord voice relay bot (listener + 3 speaker bots across multiple guilds):
pkt_dropped=0)ErrMissingKeyRatcheterrorsContext
DAVE becomes mandatory on March 1, 2026. This race condition affects any bot using golibdave in channels where members join/leave while audio is active — which is essentially all real-world usage.