Re-arm wakeup doorbell on multishot poll termination #80
Merged
The per-CPU wakeup doorbell is a single multishot poll (IORING_OP_POLL_ADD with the IORING_POLL_ADD_MULTI flag) submitted once at init: wakeThread writes the eventfd, the multishot poll posts a CQE, and io_uring_enter2 wakes. But a multishot poll is not permanent - the kernel ends it (final CQE with IORING_CQE_F_MORE cleared) on CQ overflow, which IORING_FEAT_NODROP does not prevent (NODROP preserves the CQEs, not the poll's arming). The completion handler never checked F_MORE and never re-armed, so a single termination left the doorbell deaf for the life of the processor: wakeThread's eventfd_write produced no CQE, and the CPU only woke via its park timeout (up to maxWaitNs) - a sticky cross-CPU wakeup-latency cliff.

Factor the arming into ProcessorState::enqueueDoorbell, used by both initialize and handleCompletionQueueSlow. The latter now re-arms when a doorbell CQE arrives with F_MORE cleared; the re-arm happens after the CQ is fully drained, outside the for-each-cqe iteration. Unlike enqueueWakeup, which can skip when the SQ ring is full, enqueueDoorbell retries through a full SQ ring, because a skipped re-arm is unrecoverable.
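The control flow described above can be sketched against a toy ring, as a rough illustration rather than the code in this PR. `FakeRing`, `tryPushSqe`, `handleCompletions`, and `kDoorbellTag` are hypothetical names invented for the sketch; only the bit value of `kCqeFMore` mirrors `IORING_CQE_F_MORE` from `<linux/io_uring.h>`. No real io_uring calls are made.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Same bit value as IORING_CQE_F_MORE in <linux/io_uring.h>.
constexpr std::uint32_t kCqeFMore = 1u << 1;
// Hypothetical user_data tag that identifies the doorbell poll.
constexpr std::uint64_t kDoorbellTag = 0xD00B;

struct Cqe {
    std::uint64_t userData;
    std::uint32_t flags;
};

// Toy stand-in for a ring: models only SQ capacity and pending CQEs.
struct FakeRing {
    std::size_t sqCapacity = 4;
    std::vector<std::uint64_t> sq;  // queued, not yet submitted SQEs
    std::deque<Cqe> cq;             // completions waiting to be reaped
    int doorbellArms = 0;           // doorbell polls the "kernel" has accepted

    bool tryPushSqe(std::uint64_t tag) {
        if (sq.size() >= sqCapacity) return false;  // SQ ring is full
        sq.push_back(tag);
        return true;
    }

    // Models a submit: every queued SQE is consumed.
    void submit() {
        for (std::uint64_t tag : sq) {
            if (tag == kDoorbellTag) ++doorbellArms;
        }
        sq.clear();
    }
};

// Arming must not be skipped: on a full SQ, flush and try again.
void enqueueDoorbell(FakeRing& ring) {
    while (!ring.tryPushSqe(kDoorbellTag)) {
        ring.submit();
    }
}

// Drain the CQ first; re-arm only afterwards, outside the iteration.
void handleCompletions(FakeRing& ring) {
    bool rearmDoorbell = false;
    while (!ring.cq.empty()) {
        Cqe cqe = ring.cq.front();
        ring.cq.pop_front();
        if (cqe.userData == kDoorbellTag && !(cqe.flags & kCqeFMore)) {
            rearmDoorbell = true;  // multishot poll has terminated
        }
        // Other completions would be dispatched here.
    }
    if (rearmDoorbell) enqueueDoorbell(ring);
}
```

Deferring the re-arm until the drain loop has finished keeps SQ submission out of the CQE iteration, and the retry loop reflects the point that a dropped re-arm cannot be recovered later. A real implementation would also have to handle a failing submit, which this sketch does not model.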