bug: ev.Group in .gather mode panics when children exceed io_uring SQ depth #354

@manishrjain

Description

Summary: When an ev.Group in .gather mode has more children than the io_uring submission queue depth (default 256), submitting the group panics with "reached unreachable code" at task.zig:263.

Root cause: loop.addInternal for a .group op walks the children list and calls addInternal on each child (loop.zig:514-520). Each child's addInternal calls getSqe(). When the SQ is full, getSqe flushes via poll(state, .zero) and retries once (io_uring.zig:836-840). If the kernel hasn't completed any in-flight SQEs yet, the retry also returns SubmissionQueueFull. The error propagates up and leaves the group in an inconsistent state: it is marked .running but only some of its children were actually submitted, so the assertion fires when the waiting task tries to yield.
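For illustration, here is a paraphrased sketch of the current getSqe behavior described above (io_uring.zig:836-840) — the type and method names are approximations of zio internals, not the actual code:

```zig
const std = @import("std");
const linux = std.os.linux;

// Sketch only: `Loop`, `self.ring`, and `self.poll` approximate zio internals.
fn getSqe(self: *Loop) !*linux.io_uring_sqe {
    return self.ring.get_sqe() catch {
        // SQ full: flush pending SQEs without blocking...
        try self.poll(.zero);
        // ...then retry exactly once. If the kernel hasn't completed
        // anything yet, no slot has freed and this fails again with
        // error.SubmissionQueueFull.
        return self.ring.get_sqe();
    };
}
```

The single non-blocking retry is the crux: with 300 children and a 256-deep SQ, the 257th child is essentially guaranteed to hit the full queue before any completion has landed.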

Reproduction:

const ev = zio.ev;

fn repro() !void {
    // Create a file with 300 pages
    const file = try std.fs.createFileAbsolute("/tmp/zio_group_bug", .{});
    defer std.fs.deleteFileAbsolute("/tmp/zio_group_bug") catch {};
    var page: [4096]u8 = undefined;
    @memset(&page, 0xAA);
    for (0..300) |_| try file.writeAll(&page);
    file.close();

    const f = try zio.fs.open("/tmp/zio_group_bug", .{});
    defer f.close();

    var reads: [300]ev.FileRead = undefined;
    var iovs: [300][1]zio.os.iovec = undefined;
    var bufs: [300][4096]u8 = undefined;

    var group: ev.Group = .init(.gather);
    for (0..300) |i| {
        reads[i] = ev.FileRead.init(f.fd, .fromSlice(&bufs[i], &iovs[i]), i * 4096);
        group.add(&reads[i].c);
    }
    try zio.waitForIo(&group.c); // panics
}

test "group gather 300 reads" {
    const rt = try zio.Runtime.init(std.testing.allocator, .{});
    defer rt.deinit();
    var h = try rt.spawn(repro, .{});
    try h.join();
}

Actual behavior:

[zio] (err): Failed to get io_uring SQE for file_read
[default] (err): Event loop error during yield: error.SubmissionQueueFull
thread panic: reached unreachable code
  task.zig:263 -- assert(self.state.load(.acquire) == .ready)

Expected behavior: Either all 300 reads complete successfully (by waiting for SQ slots to free up), or the group returns a clean error without panicking.

Suggested fixes (in order of preference):

  1. getSqe should wait for a free slot instead of failing after one retry. When the SQ is full and the flush doesn't free any slots, getSqe could wait for at least one CQE to complete (freeing an SQ slot), then retry. This makes large groups work transparently.
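A minimal sketch of what fix 1 could look like, assuming `self.ring` is a std.os.linux.IoUring and `reapCompletions` stands in for whatever zio-internal helper drains the CQ ring (both names are hypothetical):

```zig
const std = @import("std");
const linux = std.os.linux;

// Hypothetical sketch of fix 1: loop until an SQ slot is available.
fn getSqe(self: *Loop) !*linux.io_uring_sqe {
    while (true) {
        return self.ring.get_sqe() catch {
            // SQ is full: submit the pending SQEs and block until the
            // kernel completes at least one, which frees at least one slot.
            _ = try self.ring.submit_and_wait(1);
            // Reap the CQEs so the completed children make progress
            // (hypothetical helper; zio's actual mechanism may differ).
            self.reapCompletions();
            continue;
        };
    }
}
```

Blocking here is safe for the group case because every in-flight SQE belongs to an operation that will eventually complete, so waiting for one CQE cannot deadlock the submitter.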

  2. The group add loop should handle getSqe failure gracefully: cancel all already-added children, set the group error, and propagate SubmissionQueueFull to the caller instead of panicking.
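A sketch of what fix 2 could look like inside the group branch of loop.addInternal — `cancelChild`, `group.children`/`c.next`, and the `.failed` state are all hypothetical stand-ins for zio's actual structures:

```zig
// Hypothetical sketch of fix 2: roll back partially submitted children
// instead of leaving the group half-running.
var added: usize = 0;
errdefer {
    // Cancel every child that already received an SQE so the group
    // never waits on operations it no longer tracks.
    var node = group.children;
    var i: usize = 0;
    while (node) |c| : (node = c.next) {
        if (i == added) break;
        self.cancelChild(c); // hypothetical cancel helper
        i += 1;
    }
    group.state = .failed; // clean terminal state instead of .running
}
var cur = group.children;
while (cur) |c| : (cur = c.next) {
    try self.addInternal(c); // on failure, the errdefer above rolls back
    added += 1;
}
```

With this in place, the caller of zio.waitForIo would see error.SubmissionQueueFull instead of a panic.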
