3 changes: 3 additions & 0 deletions .jules/bolt.md
@@ -0,0 +1,3 @@
## 2024-05-24 - Branch Misprediction Optimization for Packet Classification
**Learning:** In Zig, standard `switch` statements on integers compile to jump tables. Extracting the dominant case (like data-plane packets `wg_transport` which is 99.9% of traffic) into an explicit `if` branch before a `switch` improves branch prediction and avoids jump table overhead.

Copilot AI Mar 6, 2026

The claim that "In Zig, standard switch statements on integers compile to jump tables" is an oversimplification. Zig uses LLVM as its backend in release modes, and LLVM decides whether to lower a switch to a jump table, a chain or tree of comparisons, or a lookup table based on the number and density of the cases. A small switch on values 1–4 whose arms only return constants, as in classify, can often be turned into a lookup table or plain arithmetic rather than a jump table. This documentation could mislead future developers into unnecessary manual optimizations. Consider qualifying the statement (e.g., "for larger, dense switches") or removing the unverified claim.

Suggested change
**Learning:** In Zig, standard `switch` statements on integers compile to jump tables. Extracting the dominant case (like data-plane packets `wg_transport` which is 99.9% of traffic) into an explicit `if` branch before a `switch` improves branch prediction and avoids jump table overhead.
**Learning:** In Zig, `switch` statements on integers may compile to jump tables, branch chains, or lookup tables depending on the case pattern. Extracting the dominant case (like data-plane packets `wg_transport` which is 99.9% of traffic) into an explicit `if` branch before a `switch` can improve branch prediction and, when a jump table is used, avoid its overhead.
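
For reference, the two shapes being compared can be sketched as follows. This is a minimal illustration, not the project's code: `PacketType` is a reduced, hypothetical stand-in for `Device.PacketType`, and `classifySwitch` / `classifyPeeled` are made-up names. Both functions return the same result for every input, so any difference is purely in how the optimizer lowers them, which is best checked by inspecting the generated assembly for the actual target and optimization mode.

```zig
const std = @import("std");

// Hypothetical, reduced stand-in for Device.PacketType (WireGuard cases only).
const PacketType = enum { wg_handshake_init, wg_handshake_resp, wg_cookie, wg_transport, unknown };

// Variant A: plain switch over the message type.
fn classifySwitch(msg_type: u32) PacketType {
    return switch (msg_type) {
        1 => .wg_handshake_init,
        2 => .wg_handshake_resp,
        3 => .wg_cookie,
        4 => .wg_transport,
        else => .unknown,
    };
}

// Variant B: the dominant case is peeled off into an explicit `if` first.
fn classifyPeeled(msg_type: u32) PacketType {
    if (msg_type == 4) return .wg_transport;
    return switch (msg_type) {
        1 => .wg_handshake_init,
        2 => .wg_handshake_resp,
        3 => .wg_cookie,
        else => .unknown,
    };
}

test "both variants classify identically" {
    var t: u32 = 0;
    while (t < 16) : (t += 1) {
        try std.testing.expectEqual(classifySwitch(t), classifyPeeled(t));
    }
}
```

Because the two variants are semantically identical, the compiler is free to emit the same machine code for both, which is why the jump-table claim needs evidence rather than assertion.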

**Action:** When classifying packets on the hot path, explicitly use an `if` statement for the most common case, falling back to a `switch` for the less common ones. Also, use the `inline` keyword on small, frequently called classification functions to eliminate function call overhead.
43 changes: 24 additions & 19 deletions src/main.zig
@@ -2486,7 +2486,29 @@ fn processIncomingPacket(
) void {
const Device = lib.wireguard.Device;

switch (Device.PacketType.classify(pkt)) {
const pkt_type = Device.PacketType.classify(pkt);

// Optimization: Fast path for data plane transport to avoid jump table overhead
if (pkt_type == .wg_transport) {
if (n_decrypted.* < 64) {
if (wg_dev.decryptTransport(pkt, &decrypt_storage[n_decrypted.*])) |result| {
// Check service filter before buffering
const PolicyMod = lib.services.Policy;
if (PolicyMod.parseTransportHeader(decrypt_storage[n_decrypted.*][0..result.len])) |ti| {
if (wg_dev.peers[result.slot]) |peer| {
const org_pk = if (swim.membership.peers.getPtr(peer.identity_key)) |mp| mp.org_pubkey else null;
if (!service_filter.check(peer.identity_key, org_pk, ti.proto, ti.dst_port)) return;
}
}
decrypt_lens[n_decrypted.*] = result.len;
decrypt_slots[n_decrypted.*] = result.slot;
n_decrypted.* += 1;
} else |_| {}
}
return;
}
Comment on lines +2491 to +2509

Copilot AI Mar 6, 2026

The if fast-path on the enum value here is redundant with the fast-path already added inside classify() at device.zig:35. Since classify is now inline, the compiler will inline the body, and LLVM will fold the two consecutive comparisons against .wg_transport into one.

The real cost of this duplication is maintainability: the transport handling logic is now copy-pasted out of the switch into a separate if block. If the transport handling ever needs to change, a future maintainer could easily update one path and forget the other (or not realize the if block exists above the switch). The .wg_transport arm in the switch on line 2537 is now dead code that silently does nothing, which is especially confusing since the actual handling is 20 lines above.

Consider keeping the transport handling inside the switch statement. The fast-path if (msg_type == 4) inside the inlined classify function is sufficient to get the branch-prediction benefit without duplicating call-site logic.
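
The structure suggested above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: `PacketType` is a reduced stand-in, `Stats` is a hypothetical counter struct standing in for the real decrypt and handshake handlers, and `processIncomingPacket` takes a simplified signature. The point is only that the fast-path comparison lives in `classify`, while each packet type is handled in exactly one switch arm.

```zig
const std = @import("std");

// Hypothetical, reduced stand-in for Device.PacketType.
const PacketType = enum { wg_handshake_init, wg_transport, unknown };

// The only place that knows about the dominant case.
fn classify(data: []const u8) PacketType {
    if (data.len < 4) return .unknown;
    const msg_type = std.mem.readInt(u32, data[0..4], .little);
    if (msg_type == 4) return .wg_transport; // fast path
    return switch (msg_type) {
        1 => .wg_handshake_init,
        else => .unknown,
    };
}

// Hypothetical counters standing in for the real per-type handlers.
const Stats = struct { transport: usize = 0, handshake: usize = 0, dropped: usize = 0 };

// Simplified call site: every packet type has exactly one handling arm.
fn processIncomingPacket(pkt: []const u8, stats: *Stats) void {
    switch (classify(pkt)) {
        .wg_transport => {
            stats.transport += 1;
        },
        .wg_handshake_init => {
            stats.handshake += 1;
        },
        .unknown => {
            stats.dropped += 1;
        },
    }
}
```

With this layout there is no dead `.wg_transport` arm, and a later change to transport handling has a single place to land.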


switch (pkt_type) {
.wg_handshake_init => {
if (pkt.len >= @sizeOf(lib.wireguard.Noise.HandshakeInitiation)) {
const msg: *const lib.wireguard.Noise.HandshakeInitiation = @ptrCast(@alignCast(pkt.ptr));
@@ -2509,27 +2531,10 @@ fn processIncomingPacket(
} else |_| {}
}
},
.wg_transport => {
if (n_decrypted.* < 64) {
if (wg_dev.decryptTransport(pkt, &decrypt_storage[n_decrypted.*])) |result| {
// Check service filter before buffering
const PolicyMod = lib.services.Policy;
if (PolicyMod.parseTransportHeader(decrypt_storage[n_decrypted.*][0..result.len])) |ti| {
if (wg_dev.peers[result.slot]) |peer| {
const org_pk = if (swim.membership.peers.getPtr(peer.identity_key)) |mp| mp.org_pubkey else null;
if (!service_filter.check(peer.identity_key, org_pk, ti.proto, ti.dst_port)) return;
}
}
decrypt_lens[n_decrypted.*] = result.len;
decrypt_slots[n_decrypted.*] = result.slot;
n_decrypted.* += 1;
} else |_| {}
}
},
.wg_cookie => {},
.stun => swim.feedPacket(pkt, sender_addr, sender_port),
.swim => swim.feedPacket(pkt, sender_addr, sender_port),
.unknown => {},
.wg_transport, .unknown => {},

Copilot AI Mar 6, 2026

Combining .wg_transport with .unknown in a no-op arm is misleading. The .wg_transport case here is dead code (it's handled by the early return on line 2508), while .unknown was a legitimate no-op in the original code. Grouping them together obscures this distinction. If the fast-path if is kept, consider adding a comment like // handled above for .wg_transport, or keep it as a separate arm for clarity.

Suggested change
.wg_transport, .unknown => {},
.wg_transport => {
// handled above in fast path
},
.unknown => {},

}
}

8 changes: 6 additions & 2 deletions src/wireguard/device.zig
@@ -24,16 +24,20 @@ pub const PacketType = enum {
stun, // STUN binding response
unknown,

pub fn classify(data: []const u8) PacketType {
/// Optimization: Inline and extract dominant case to avoid jump table overhead
pub inline fn classify(data: []const u8) PacketType {
if (data.len < 4) return .unknown;

// WireGuard messages: first byte is type, next 3 are zeros
const msg_type = std.mem.readInt(u32, data[0..4], .little);

// Fast path for data-plane transport packets (99.9% of traffic)
if (msg_type == 4) return .wg_transport;

return switch (msg_type) {
1 => .wg_handshake_init,
2 => .wg_handshake_resp,
3 => .wg_cookie,
Comment on lines +27 to 40

Copilot AI Mar 6, 2026

The premise that Zig's switch on a small dense integer range (1–4) compiles to a jump table is not generally accurate. LLVM (Zig's backend in ReleaseFast) typically converts small switches with dense, contiguous values into a series of comparisons or a direct lookup, not a jump table. Jump tables are mostly used for larger switches whose case values are dense enough to fill a table; sparse ranges tend to be lowered to comparison trees instead. For a 4-case switch on values 1–4, the generated code is already close to optimal.

Additionally, the inline keyword in Zig forces inlining at all call sites (including meshguard_ffi.zig and wg_interop.zig), which increases code size and may negatively affect instruction cache performance. In ReleaseFast mode, LLVM will already inline small functions when profitable.
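
If inlining is wanted only on the hot path, one option is to leave `classify` as a plain `fn` and request inlining at the specific call site with the `@call` builtin. The sketch below assumes a reduced `PacketType` and uses hypothetical wrapper names (`classifyHot`, `classifyCold`); it is not taken from the repository.

```zig
const std = @import("std");

// Hypothetical, reduced stand-in for Device.PacketType.
const PacketType = enum { wg_handshake_init, wg_transport, unknown };

// Plain `fn`: the optimizer decides per call site whether to inline.
fn classify(data: []const u8) PacketType {
    if (data.len < 4) return .unknown;
    return switch (std.mem.readInt(u32, data[0..4], .little)) {
        1 => .wg_handshake_init,
        4 => .wg_transport,
        else => .unknown,
    };
}

// Hot path (hypothetical name): request inlining at this one call site only.
fn classifyHot(pkt: []const u8) PacketType {
    return @call(.always_inline, classify, .{pkt});
}

// Cold path (hypothetical name): ordinary call, left to the optimizer.
fn classifyCold(pkt: []const u8) PacketType {
    return classify(pkt);
}
```

This keeps the code-size cost confined to the call sites that are known to be hot, instead of every user of `classify`.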

If this optimization is important, it should be validated with benchmarks (e.g., perf stat showing branch misprediction rates before and after). Without data, this adds complexity for a speculative benefit.
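
As a starting point for such a measurement, a rough micro-benchmark harness might look like the sketch below. Everything here is an assumption for illustration: the reduced `PacketType`, the function names, the synthetic 99.9% transport mix, and the use of `std.time.Timer` (available in the Zig standard library versions this was written against). A loop this simple can be vectorized or otherwise transformed by the optimizer, so the numbers are only a sanity check; a meaningful result would come from running the real `processIncomingPacket` on representative traffic under `perf stat`.

```zig
const std = @import("std");

// Hypothetical, reduced stand-in for Device.PacketType.
const PacketType = enum { wg_handshake_init, wg_handshake_resp, wg_cookie, wg_transport, unknown };

fn classifySwitch(msg_type: u32) PacketType {
    return switch (msg_type) {
        1 => .wg_handshake_init,
        2 => .wg_handshake_resp,
        3 => .wg_cookie,
        4 => .wg_transport,
        else => .unknown,
    };
}

fn classifyPeeled(msg_type: u32) PacketType {
    if (msg_type == 4) return .wg_transport;
    return switch (msg_type) {
        1 => .wg_handshake_init,
        2 => .wg_handshake_resp,
        3 => .wg_cookie,
        else => .unknown,
    };
}

// Runs classifier `f` over `types` and returns the elapsed time in nanoseconds.
fn timeClassifier(comptime f: fn (u32) PacketType, types: []const u32) !u64 {
    var timer = try std.time.Timer.start();
    var transport_count: usize = 0;
    for (types) |t| {
        if (f(t) == .wg_transport) transport_count += 1;
    }
    const elapsed = timer.read();
    // Keep the result observable so the loop is not removed entirely.
    std.mem.doNotOptimizeAway(transport_count);
    return elapsed;
}

pub fn main() !void {
    // Synthetic mix (assumption): one handshake per 1000 packets, rest transport.
    var types: [1 << 16]u32 = undefined;
    for (&types, 0..) |*t, i| {
        t.* = if (i % 1000 == 0) 1 else 4;
    }
    const switch_ns = try timeClassifier(classifySwitch, &types);
    const peeled_ns = try timeClassifier(classifyPeeled, &types);
    std.debug.print("switch: {d} ns, peeled: {d} ns\n", .{ switch_ns, peeled_ns });
}
```

Build with the same optimization mode as production (for example ReleaseFast) and repeat the run several times, since single timings of a loop this short are noisy.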

Suggested change
/// Optimization: Inline and extract dominant case to avoid jump table overhead
pub inline fn classify(data: []const u8) PacketType {
if (data.len < 4) return .unknown;
// WireGuard messages: first byte is type, next 3 are zeros
const msg_type = std.mem.readInt(u32, data[0..4], .little);
// Fast path for data-plane transport packets (99.9% of traffic)
if (msg_type == 4) return .wg_transport;
return switch (msg_type) {
1 => .wg_handshake_init,
2 => .wg_handshake_resp,
3 => .wg_cookie,
pub fn classify(data: []const u8) PacketType {
if (data.len < 4) return .unknown;
// WireGuard messages: first byte is type, next 3 are zeros
const msg_type = std.mem.readInt(u32, data[0..4], .little);
return switch (msg_type) {
1 => .wg_handshake_init,
2 => .wg_handshake_resp,
3 => .wg_cookie,
4 => .wg_transport,

4 => .wg_transport,
else => blk: {
// STUN: check for magic cookie at bytes 4-7
if (data.len >= 8) {