
Add L2 bridge interface support#491

Draft
rjarry wants to merge 17 commits into DPDK:main from rjarry:bridge

Conversation

@rjarry
Collaborator

@rjarry rjarry commented Jan 29, 2026

Implement L2 bridging functionality with MAC learning, flooding, and per-bridge configuration options.

This series introduces a new bridge interface type that can attach port and bond interfaces as members. Packets received on member interfaces are forwarded based on learned MAC addresses, with unknown unicast, broadcast and multicast traffic flooded to all other members.

The first commits prepare the infrastructure by moving the control queue to avoid circular dependencies and allowing nexthop creation from datapath workers. This enables the bridge input node to learn MAC addresses on the fly without blocking the control plane.

The L2 nexthop type stores (bridge_id, vlan_id, mac) tuples with automatic ageing based on per-bridge configurable timeout. Static entries can be created via CLI to override learned ones.
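The ageing behaviour described above can be sketched with a minimal, hypothetical entry type (the names `l2_fdb_entry` and `fdb_entry_expired` are illustrative, not grout's actual API):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical entry modelling the (bridge_id, vlan_id, mac) tuple. */
struct l2_fdb_entry {
	uint16_t bridge_id;
	uint16_t vlan_id;
	uint8_t mac[6];
	uint64_t last_seen; /* seconds, refreshed on every matching frame */
	bool is_static;     /* static CLI entries never age out */
};

/* An entry is stale once the per-bridge ageing timeout has elapsed,
 * unless it was created statically via the CLI. */
static bool fdb_entry_expired(const struct l2_fdb_entry *e,
			      uint64_t now, uint64_t ageing_timeout) {
	if (e->is_static)
		return false;
	return now - e->last_seen > ageing_timeout;
}
```

A periodic control-plane timer would walk the table with such a predicate and drop expired entries.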

Bridge configuration supports flags to disable flooding (NO_FLOOD) and MAC learning (NO_LEARN), a custom MAC address, and up to 64 member interfaces. The FRR plugin exposes bridges to zebra as ZEBRA_IF_BRIDGE with the proper slave type set on member interfaces.

Summary by CodeRabbit

  • New Features

    • Full L2 bridge support: bridge interfaces, member management, ageing/MAC handling, L2 nexthops, dataplane input/flood processing, and CLI for bridge and L2 nexthops.
  • Bug Fixes

    • More robust control-plane event delivery: deferred notifications, explicit control-queue draining, and QSBR sync tied to current core for deterministic reclamation.
  • Tests

    • Added smoke test validating L2/L3 bridge setup, learning, forwarding and teardown.


@coderabbitai

coderabbitai bot commented Jan 29, 2026

📝 Walkthrough


Adds a new L2 module (api, cli, control, datapath) implementing bridge interfaces and L2 nexthops: new enums GR_IFACE_TYPE_BRIDGE/GR_IFACE_MODE_BRIDGE, GR_NH_T_L2/GR_NH_ORIGIN_BRIDGE, and L2 API/header types. Introduces bridge control and L2 nexthop management, CLI commands, datapath nodes bridge_input and bridge_flood, and a smoke test. Adds gr_control_queue.h and control_queue_drain plus deferred event delivery from dataplane workers; several rte_rcu_qsbr_synchronize calls now use rte_lcore_id(). Build files and complexity metrics updated.






@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@main/control_queue.c`:
- Around line 89-92: The current code violates the single-consumer ring
contract: ctrlq_ring is created with RING_F_SC_DEQ but control_queue_poll
(consumer in control queue thread) and control_queue_drain can both dequeue
concurrently; fix by either (A) forcing all dequeues onto the control queue
thread — change control_queue_drain to enqueue the drain request (use the same
enqueue API/event mechanism and have control_queue_poll handle the actual
dequeue and processing) so only control_queue_poll performs ring_dequeue, or (B)
switch the ring to multi-consumer mode — drop RING_F_SC_DEQ when creating ctrlq_ring (multi-consumer dequeue is the default when no flag is passed)
and audit callers to ensure they safely call the ring API concurrently; update
references to control_queue_drain and control_queue_poll accordingly.

In `@main/event.c`:
- Around line 42-44: control_queue_push's failure path currently drops
notifications silently; update the error path in the block handling
control_queue_push(notify_subscribers, (void *)obj, ev_type) to record the
failure (e.g., increment a dedicated metric/error counter and emit a descriptive
log entry including notify_subscribers, ev_type and obj identifiers) so drops
are observable; optionally consider a bounded retry or backpressure mechanism
later, but at minimum add the metric increment and a descriptive log/syslog error
for visibility.

In `@modules/l2/datapath/bridge_flood.c`:
- Around line 114-122: The tracing loop dereferences br->id without checking br
for NULL; update the block guarded by gr_mbuf_is_traced(m) (the loop that
creates struct bridge_flood_trace via gr_mbuf_trace_add on clones[]) to first
test whether br is non-NULL (e.g., if (br) { t->bridge_id = br->id; } else {
t->bridge_id = 0; } or skip setting bridge_id) to avoid dereferencing a NULL
pointer returned by iface_from_id(); ensure the rest of the fields
(t->flood_count) and trace addition still happen only after this NULL-safe
handling.
- Line 92: The edge selection is using the source iface type (iface->type) but
should use the destination member type (member->type); update the assignment
that sets edges[flood_count] in the bridge_flood code so it checks member->type
against GR_IFACE_TYPE_VLAN and assigns ETH_IN or OUTPUT accordingly (keep the
special-case behavior for the bridge interface destination at the existing later
check). Locate the edges[flood_count] assignment in bridge_flood and replace the
reference to iface->type with member->type to ensure routing is determined by
destination interface type.

In `@modules/l2/datapath/bridge_input.c`:
- Around line 65-68: The code extracts vlan_id when RTE_MBUF_F_RX_VLAN_STRIPPED
is set but never persists it, so egress sends untagged frames; fix by storing
the extracted vlan_id back into the mbuf metadata (e.g., assign m->vlan_tci =
vlan_id) and mark the mbuf for re-tagging (e.g., set a re-tag flag such as
RTE_MBUF_F_TX_VLAN or a module-specific mbuf flag) in the branch that checks
RTE_MBUF_F_RX_VLAN_STRIPPED so the egress path (which will read m->vlan_tci) can
re-insert the VLAN header; ensure you still clear or adjust
RTE_MBUF_F_RX_VLAN_STRIPPED consistently if needed.

Comment on lines +89 to 92
void control_queue_drain(uint32_t event, const void *obj) {
struct control_queue_drain drain = {event, obj};
control_queue_poll(0, 0, &drain);
}

⚠️ Potential issue | 🔴 Critical



Concurrent dequeue violates SC ring contract

ctrlq_ring is configured with RING_F_SC_DEQ, but two code paths can dequeue concurrently: the libevent event loop (via control_queue_poll callback in the dedicated control queue thread) and control_queue_drain() called directly from module code (e.g., nexthop deletion, interface removal). This is undefined behavior. Either ensure both paths run on the same consumer thread with proper synchronization, or switch to a multi-consumer ring mode.
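Option (A) — keeping all dequeues on the single consumer thread — can be illustrated with a toy queue (the queue and helper names below are stand-ins, not the actual ctrlq_ring API):

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal single-consumer queue model: only consumer_drain() dequeues. */
#define QCAP 64
static int queue[QCAP];
static size_t q_head, q_tail;
static int processed;

static int q_push(int v) {
	if ((q_tail + 1) % QCAP == q_head)
		return -1; /* full: caller must handle the drop */
	queue[q_tail] = v;
	q_tail = (q_tail + 1) % QCAP;
	return 0;
}

static void count_event(int v) { (void)v; processed++; }

/* Producers never dequeue. A "drain" is simply serviced by the single
 * consumer thread when it runs, preserving the SC dequeue contract. */
static size_t consumer_drain(void (*process)(int)) {
	size_t n = 0;
	while (q_head != q_tail) {
		process(queue[q_head]);
		q_head = (q_head + 1) % QCAP;
		n++;
	}
	return n;
}
```

With this shape, control_queue_drain would enqueue a drain request instead of calling the dequeue path directly from module code.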


Comment on lines +42 to +44
if (control_queue_push(notify_subscribers, (void *)obj, ev_type) < 0) {
// XXX: add error stat if push fails?
}

⚠️ Potential issue | 🟡 Minor

Silent drop on queue push failure loses events.

When control_queue_push returns an error (e.g., ring full), the notification is silently discarded. For critical events this could cause inconsistent state between dataplane and control plane. Consider at minimum incrementing an error counter for observability.
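A minimal sketch of the observable-drop pattern suggested here, using a stand-in for the push function (the counter name and helper are hypothetical):

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical drop counter so queue-full failures show up in stats
 * instead of vanishing silently. */
static atomic_ulong ctrlq_push_drops;

/* Stand-in for control_queue_push: fails when the ring is full. */
static int queue_push_stub(int full) { return full ? -1 : 0; }

static void notify_or_count(int queue_full, unsigned ev_type) {
	if (queue_push_stub(queue_full) < 0) {
		/* record the failure instead of discarding the event */
		atomic_fetch_add(&ctrlq_push_drops, 1);
		fprintf(stderr, "control queue full, dropped event type %u\n",
			ev_type);
	}
}
```

The counter can then be exported alongside the existing statistics so operators notice sustained drops.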


if (clone == NULL)
continue;

edges[flood_count] = iface->type == GR_IFACE_TYPE_VLAN ? ETH_IN : OUTPUT;

⚠️ Potential issue | 🟠 Major



Destination routing should check member type, not source interface type.

Packets are being cloned for each bridge member destination (line 88), but the edge selection at line 92 checks the source interface type (iface->type) instead of the destination (member->type). This is inconsistent with the special case at line 102, where the bridge interface destination always routes through ETH_IN regardless of source type, implying that destination routing should depend on destination interface type.

Proposed fix
-			edges[flood_count] = iface->type == GR_IFACE_TYPE_VLAN ? ETH_IN : OUTPUT;
+			edges[flood_count] = member->type == GR_IFACE_TYPE_VLAN ? ETH_IN : OUTPUT;

Comment on lines +114 to +127
if (gr_mbuf_is_traced(m)) {
for (uint16_t i = 0; i < flood_count; i++) {
struct bridge_flood_trace *t = gr_mbuf_trace_add(
clones[i], node, sizeof(*t)
);
t->bridge_id = br->id;
t->flood_count = i;
}
}

⚠️ Potential issue | 🔴 Critical

Null pointer dereference when br is NULL and tracing is enabled.

When iface_from_id() returns NULL at line 72, the code jumps to next: and sets up the drop path. However, if tracing is enabled, line 119 dereferences br->id which will crash.

Proposed fix
 		if (gr_mbuf_is_traced(m)) {
 			for (uint16_t i = 0; i < flood_count; i++) {
 				struct bridge_flood_trace *t = gr_mbuf_trace_add(
 					clones[i], node, sizeof(*t)
 				);
-				t->bridge_id = br->id;
+				t->bridge_id = br ? br->id : 0;
 				t->flood_count = i;
 			}
 		}

Comment on lines 65 to 68
if (m->ol_flags & RTE_MBUF_F_RX_VLAN_STRIPPED) {
vlan_id = m->vlan_tci & 0xfff;
m->ol_flags &= ~RTE_MBUF_F_RX_VLAN_STRIPPED;
} else if (eth->ether_type == RTE_BE16(RTE_ETHER_TYPE_VLAN)) {

⚠️ Potential issue | 🔴 Critical



VLAN tags are lost for bridged traffic when RX stripping is enabled.

When RTE_MBUF_F_RX_VLAN_STRIPPED is set, the VLAN ID is extracted to a local variable for FDB decisions but never stored in mbuf metadata. After forwarding, the TX path has no knowledge of the original VLAN ID and transmits the frame untagged, breaking VLAN isolation. The vlan_id must be persisted (e.g., in mbuf_data) and re-tagged on egress.

Note: Setting RTE_MBUF_F_TX_VLAN alone is insufficient—the TX path does not currently honor this flag or re-insert VLAN headers based on it.
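The re-tagging the egress path would need can be sketched in plain C as a simplified software insertion on a raw buffer (not the actual mbuf-based code; buffer layout and names are illustrative):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDR_LEN 12 /* destination + source MAC */
#define VLAN_TPID 0x8100

/* Re-insert an 802.1Q tag into a frame whose tag was stripped on RX.
 * Copies dst/src MACs, writes TPID + TCI, then the rest of the original
 * frame. Returns the new frame length. Buffers must not alias. */
static size_t vlan_reinsert(const uint8_t *in, size_t in_len,
			    uint16_t vlan_tci, uint8_t *out) {
	memcpy(out, in, ETH_ADDR_LEN);          /* MAC addresses */
	out[12] = VLAN_TPID >> 8;               /* TPID 0x8100 */
	out[13] = VLAN_TPID & 0xff;
	out[14] = vlan_tci >> 8;                /* PCP/DEI/VID */
	out[15] = vlan_tci & 0xff;
	memcpy(out + 16, in + ETH_ADDR_LEN, in_len - ETH_ADDR_LEN);
	return in_len + 4;
}
```

In the real datapath, the stripped vlan_tci would have to travel with the packet (e.g., in mbuf private data) so the TX path can perform this insertion, or hand it to hardware when the port supports TX VLAN insert.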



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@modules/l2/control/nexthop.c`:
- Around line 252-261: In the age-out branch inside nexthop L2 cleanup, you must
remove the nexthop from the hash table before calling nexthop_decref(nh) to
avoid leaving a dangling pointer; change the block that currently calls only
nexthop_decref(nh) to first delete the entry from the nexthop hash (use the same
hash-remove helper as used by nexthop_l2_purge_iface and
nexthop_l2_purge_bridge) and then call nexthop_decref(nh). Ensure the
hash-remove call targets the same key/nh used by the LOG(...) and precedes
nexthop_decref(nh).

@rjarry rjarry force-pushed the bridge branch 2 times, most recently from abf123b to 2155207 on January 29, 2026 at 16:28

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@modules/l2/cli/nexthop.c`:
- Line 83: The VLAN max value is off-by-one: update the ec_node_uint call inside
the with_help invocation that defines "VLAN" to use 4095 as the upper bound
instead of 4096 so the valid range becomes 0–4095; locate the with_help("VLAN
ID.", ec_node_uint("VLAN", 0, 4096, 10)) expression and change the upper limit
parameter passed to ec_node_uint to 4095.

"Create a new L2 entry.",
with_help("Output interface.", ec_node_dyn("IFACE", complete_iface_names, NULL)),
with_help("Nexthop ID.", ec_node_uint("ID", 1, UINT32_MAX - 1, 10)),
with_help("VLAN ID.", ec_node_uint("VLAN", 0, 4096, 10)),

⚠️ Potential issue | 🟡 Minor

VLAN ID range allows invalid value 4096.

VLAN IDs are 12-bit values with a valid range of 0-4095. The max value 4096 allows users to enter an invalid VLAN ID.

🐛 Proposed fix
-		with_help("VLAN ID.", ec_node_uint("VLAN", 0, 4096, 10)),
+		with_help("VLAN ID.", ec_node_uint("VLAN", 0, 4095, 10)),

@rjarry rjarry marked this pull request as draft February 1, 2026 20:47
rjarry added 17 commits February 3, 2026 14:51
Replace hardcoded array sizes and *_COUNT enum values with
UINT_NUM_VALUES() macro for all edge registration arrays. This makes
the array sizing explicit based on the index type and eliminates
tautological comparison warnings.

Update registration functions to use name-based validation instead of
bounds checking where applicable. Use gr_iface_type_name(),
gr_iface_mode_name(), and gr_nh_type_name() for better log messages
and consistent error reporting.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
The interface type name is already available via gr_iface_type_name()
which derives it from the type ID. Storing it again in struct iface_type
is redundant. This field is unused anyway. Remove the name field from
iface_type.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Replace the linked list of interface types with a static array indexed
by type ID. This makes iface_type_get() an O(1) array lookup instead of
O(n) linked list traversal.

This benefits iface_get_eth_addr() which is called in the eth_input hot
path and needs to look up the interface type to find the get_eth_addr
callback.

Since interface types are no longer linked, the STAILQ_ENTRY field is
removed and the iface_type structs can be declared const.

Substituting pointer chasing with direct array indexing reduces LLC
cache misses by 9% and increases throughput from 11.6M to 11.8M pkt/s
(+1.7%).

Signed-off-by: Robin Jarry <rjarry@redhat.com>
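The array-indexed lookup this commit describes can be sketched as follows (array size and struct fields are illustrative, not the actual grout definitions):

```c
#include <stddef.h>

#define IFACE_TYPE_COUNT 8

struct iface_type {
	unsigned id;
	const char *(*get_name)(void);
};

/* Static array indexed by type ID: O(1) lookup replacing the previous
 * O(n) linked-list traversal. Entries can stay const since nothing
 * needs to link them together anymore. */
static const struct iface_type *iface_types[IFACE_TYPE_COUNT];

static void iface_type_register(const struct iface_type *t) {
	iface_types[t->id] = t;
}

static const struct iface_type *iface_type_get(unsigned id) {
	return id < IFACE_TYPE_COUNT ? iface_types[id] : NULL;
}

/* Example registration (hypothetical type ID). */
static const struct iface_type bridge_type = {.id = 3, .get_name = NULL};
```

The bounds check keeps lookups for unknown IDs safe while still being a single branch plus an indexed load on the hot path.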
Instead of updating the shared iface_stats structure for every packet,
accumulate packets and bytes in local variables and flush to the stats
structure only when the interface changes or at the end of the batch.

This reduces memory writes from O(n) to O(number of unique interfaces)
per batch, which is typically O(1) for homogeneous traffic. The shared
stats structure is accessed less frequently, reducing cache line
pressure and store buffer usage.

Reduces LLC cache misses by 11% (3.20M to 2.86M) and eth_input cycles
from 48.0 to 46.2 per packet (-4%). Throughput remains at 11.8M pkt/s
as ip_input (88 cycles/pkt) dominates the critical path.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
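The accumulate-then-flush pattern from this commit can be sketched as (types and names are illustrative, not the actual grout structures):

```c
#include <stdint.h>

#define MAX_IFACES 8

struct iface_stats {
	uint64_t packets, bytes;
};
static struct iface_stats stats[MAX_IFACES]; /* shared structure */

struct pkt {
	uint16_t iface_id;
	uint16_t len;
};

/* Accumulate in locals and flush to the shared stats only when the
 * interface changes or at the end of the batch: O(unique interfaces)
 * shared-memory writes per batch instead of O(packets). */
static void rx_account(const struct pkt *pkts, int n) {
	uint16_t cur = UINT16_MAX;
	uint64_t acc_pkts = 0, acc_bytes = 0;
	for (int i = 0; i < n; i++) {
		if (pkts[i].iface_id != cur) {
			if (cur != UINT16_MAX) {
				stats[cur].packets += acc_pkts;
				stats[cur].bytes += acc_bytes;
			}
			cur = pkts[i].iface_id;
			acc_pkts = 0;
			acc_bytes = 0;
		}
		acc_pkts++;
		acc_bytes += pkts[i].len;
	}
	if (cur != UINT16_MAX) { /* flush the final run */
		stats[cur].packets += acc_pkts;
		stats[cur].bytes += acc_bytes;
	}
}
```

For homogeneous traffic the inner flush never fires, so the shared cache line is touched once per batch.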
RCU quiescent state only needs to be reported periodically to signal
that the thread has passed through a quiescent period. Reporting it on
every graph walk iteration is unnecessarily frequent and adds overhead
from the memory barriers involved.

Move rte_rcu_qsbr_quiescent() from the main loop body into the
housekeeping block that runs every 256 iterations. This reduces the
frequency of memory barriers while still ensuring timely RCU grace
period completion.

Reduces LLC cache misses by 7% (2.86M to 2.67M) and L1 cache misses by
6% (2.14B to 2.01B). Throughput remains at 11.8M pkt/s as ip_input
dominates the critical path.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Replace individual rte_node_enqueue_x1() calls with a batching scheme.
Each rte_node_enqueue_x1() call accesses the destination node's rte_node
structure to append one packet. Even when all packets go to the same
edge, touching these fields once per packet instead of once per batch
causes unnecessary cache pressure.

Introduce two helper macros:

 - NODE_ENQUEUE_NEXT: tracks runs of consecutive packets going to the
   same edge. When the edge changes, flush the previous run using
   rte_node_enqueue() which copies mbuf pointers in bulk.

 - NODE_ENQUEUE_FLUSH: at end of loop, if all packets went to the same
   edge (common case), use rte_node_next_stream_move() which swaps
   pointers and updates idx once. Otherwise flush the final run.

These macros are not used in port_rx (which uses stream move directly), control_input (a source node), or control_output and port_tx (where packets are consumed directly).

Increases throughput by 10.2% (11.8M to 13.0M pkt/s). Cycles per packet
reduced significantly in output nodes: port_output -37%, iface_output
-30%, ip_forward -22%, eth_output -18%, ip_output -14%.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
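The run-tracking idea behind these macros can be sketched as a simplified model that counts bulk flushes instead of calling the real rte_node API (all names here are stand-ins):

```c
/* Stand-in for rte_node_enqueue(): one call flushes a whole run. */
static int bulk_calls;

static void flush_run(int edge, int count) {
	(void)edge;
	(void)count;
	bulk_calls++;
}

/* Track runs of consecutive packets going to the same edge and flush
 * each run once, mirroring the NODE_ENQUEUE_NEXT/NODE_ENQUEUE_FLUSH
 * idea: for homogeneous traffic this collapses to a single call. */
static int enqueue_batched(const int *edges, int n) {
	if (n == 0)
		return 0;
	int run_edge = edges[0], run_len = 1;
	for (int i = 1; i < n; i++) {
		if (edges[i] == run_edge) {
			run_len++;
		} else {
			flush_run(run_edge, run_len);
			run_edge = edges[i];
			run_len = 1;
		}
	}
	flush_run(run_edge, run_len); /* flush the final run */
	return bulk_calls;
}
```

With per-packet enqueue, a 7-packet burst costs 7 calls; with run tracking it costs one call per edge change, which is what drives the cycle savings quoted above.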
The rx_process function was checking the ether_type of every received
packet to detect LACP slow protocol frames (0x8809). This check is only
relevant for bonded interfaces, yet it was performed unconditionally on
all ports.

Move the ether_type check outside the main loop and only execute it when
the interface mode is GR_IFACE_MODE_BOND. For non-bonded ports, the loop
now simply stores the iface pointer without accessing packet data.

This reduces L1 cache misses by 6% (2.26B to 2.13B) and port_rx node
cycles per packet by 10% (47.0 to 42.1). The perf annotate data shows
that the ether_type comparison accounted for 59% of L1 cache misses
within rx_process. Eliminating this access for non-bonded ports cuts
rx_process cache miss samples by 61%.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Requesting RTE_ETH_TX_OFFLOAD_VLAN_INSERT at port configuration time
causes DPDK to select a more expensive TX function that checks every
packet's ol_flags for offload requests. For the ice driver, this means
using ice_xmit_pkts_vec_avx2_offload instead of ice_xmit_pkts_vec_avx2,
adding roughly 4% overhead even when no packets require VLAN insertion.

Introduce add_subinterface and del_subinterface callbacks in struct
iface_type to let interface types react when subinterfaces are created
or destroyed. The callbacks are invoked from iface_add_subinterface()
and iface_del_subinterface() which now return an error code.

Implement these callbacks for ports: when the first VLAN subinterface is
added, enable RTE_ETH_TX_OFFLOAD_VLAN_INSERT and reconfigure the port.
When the last VLAN subinterface is removed, disable it. For bonds,
propagate the setting to all member ports. Also handle the case where
a port joins or leaves a bond that already has VLAN subinterfaces.

Store the negotiated RX and TX offload flags in iface_info_port so that
datapath nodes can check hardware capabilities without calling
rte_eth_dev_info_get() at runtime.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
The eth_output node was resolving VLAN subinterfaces and inserting VLAN
headers inline. This coupled L2 framing with interface type logic and
prevented using hardware TX VLAN offload since eth_output has no access
to port capabilities.

Move VLAN subinterface resolution to iface_output. When the destination
interface is a VLAN, resolve its parent and store the VLAN ID in mbuf
private data via the new iface_inout_mbuf_data structure. Update the
output interface pointer to the parent so that downstream nodes operate
on the physical port or bond.

Move VLAN header insertion to port_output which can check port->tx_offloads.
When RTE_ETH_TX_OFFLOAD_VLAN_INSERT is available, set RTE_MBUF_F_TX_VLAN
and vlan_tci on the mbuf and let the hardware insert the tag. Otherwise,
prepend the VLAN header in software by shifting the Ethernet addresses
and inserting the 802.1Q tag.

Simplify eth_input and eth_output traces to only record the Ethernet
header. The interface and VLAN information that was previously in these
traces moves to iface_output. The eth_input trace temporarily loses
this context; it will be restored once iface_input is introduced on the
RX path.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Introduce iface_input between port_rx and eth_input to provide a L1
interface abstraction on the RX path, mirroring iface_output on TX.

This refactoring prepares for bridge support. When VLAN subinterfaces
are bridge members, the VLAN tag must be stripped before L2 processing
so the bridge can learn and forward based on the inner frame. With
iface_input in place, the processing order becomes: port_rx, iface_input,
bridge_input. Packets only enter eth_input when destined for the local
stack; pure L2 switching bypasses Ethernet header parsing entirely.

Move VLAN tag handling to port_rx which has access to port->rx_offloads
and can check whether the hardware already stripped the tag. When the
VLAN is still inline, strip it by shifting the Ethernet addresses
forward and adjusting the packet offset. Store the resolved interface
and VLAN ID in iface_inout_mbuf_data so downstream nodes always see
a clean frame. Handle bond member ports specially: LACP slow protocol
frames must be processed by the member port, not the bond.
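The software strip is the mirror image of insertion. A minimal sketch, with a hypothetical function name and the caller expected to advance its data offset by 4 bytes:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VLAN_HLEN 4
#define ETH_ADDRS_LEN 12 /* dst mac (6) + src mac (6) */

/* Strip an inline 802.1Q tag: read the TCI, then shift the Ethernet
 * addresses forward over the tag. The resulting frame starts 4 bytes
 * later; returns the extracted TCI. */
static uint16_t vlan_strip_sw(uint8_t *frame, uint8_t **new_frame) {
    uint16_t tci = (uint16_t)(frame[ETH_ADDRS_LEN + 2] << 8)
                 | frame[ETH_ADDRS_LEN + 3];
    memmove(frame + VLAN_HLEN, frame, ETH_ADDRS_LEN);
    *new_frame = frame + VLAN_HLEN;
    return tci;
}
```

When RTE_ETH_RX_OFFLOAD_VLAN_STRIP already removed the tag, the node only needs to read mbuf->vlan_tci instead.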

The new iface_input node handles VLAN subinterface resolution for VRF
mode interfaces, interface up/down checking, and RX statistics
accounting. It routes packets to the next node based on interface mode.

Simplify eth_input to only parse the Ethernet header and classify
packets by destination address domain (local, broadcast, multicast,
other). Move iface_input_mode_register() to iface_input.c and register
bond mode from lacp_input since bond members need eth_input for LACP.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
A future change will require calling control_queue_push() from
gr_event_push() which lives in main/. If control_queue stays in the
infra module, this would create a circular dependency between main and
infra.

Move control_queue.c and gr_control_queue.h to main/ and replace the
event-based drain mechanism with explicit control_queue_drain() calls
from iface_destroy() and nexthop_destroy() after the RCU sync.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Pass rte_lcore_id() to rte_rcu_qsbr_synchronize() instead of
RTE_QSBR_THRID_INVALID to exclude the calling thread from the quiescent
state wait. This is needed to allow creating objects from datapath
workers.

Also move the quiescent state report inside the housekeeping block
to reduce overhead.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Previously, calling nexthop_new() from a datapath thread would trigger
an ABORT(). This was a defensive measure to prevent calling functions
that are not thread-safe from the wrong context.

In practice, the only constraint is that event notifications must be
processed on the control plane thread. Modify gr_event_push() to detect
when it is called from a datapath worker and use the control queue to
defer the notification to the control plane event loop.

This enables datapath nodes (such as bridge MAC learning) to create
nexthops on the fly without blocking the control plane.
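The dispatch logic can be modeled in a few lines. This is a single-threaded toy, not grout's implementation: is_datapath_worker, event_push and control_queue_drain are stand-ins for the real lcore role check, gr_event_push() and the control queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct event { int type; };

#define QUEUE_CAP 16
static struct event queue[QUEUE_CAP];
static size_t queue_len;
static int delivered;           /* events processed on the "control plane" */
static bool is_datapath_worker; /* stand-in for an lcore role check */

static void notify(const struct event *ev) { delivered += 1; (void)ev; }

static void event_push(struct event ev) {
    if (is_datapath_worker) {
        /* wrong thread: defer to the control plane event loop */
        assert(queue_len < QUEUE_CAP);
        queue[queue_len++] = ev;
    } else {
        notify(&ev); /* already on the control plane, notify inline */
    }
}

/* called from the control plane event loop */
static void control_queue_drain(void) {
    for (size_t i = 0; i < queue_len; i++)
        notify(&queue[i]);
    queue_len = 0;
}
```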

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Introduce a new L2 nexthop type (GR_NH_T_L2) and associated origin
(GR_NH_ORIGIN_BRIDGE) to support MAC address learning on bridge
interfaces.

Create a new l2 module with API, control plane and CLI components. The
API defines gr_nexthop_info_l2 containing bridge_id, vlan_id and mac
address fields. The control plane implements a RCU-protected hash table
for fast lookups by (bridge_id, vlan_id, mac) tuple.

The nexthop_learn_l2() function can be called from datapath workers to
create new entries or refresh existing ones. Learned entries have a 300
second ageing timeout after which they are automatically removed. Helper
functions allow purging all entries associated with a given interface or
bridge.
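The learn/refresh/expire life cycle can be sketched like this. grout uses an RCU-protected hash table; a linear array is enough to show the semantics. All names are illustrative, only the key tuple and the 300 s default come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define AGEING_TIMEOUT 300 /* seconds, default mentioned above */
#define MAX_ENTRIES 64

struct l2_key {
    uint16_t bridge_id;
    uint16_t vlan_id;
    uint8_t mac[6];
};

struct l2_entry {
    struct l2_key key;
    uint16_t out_iface;
    uint64_t last_seen; /* refreshed on every learn */
    bool used;
};

static struct l2_entry table[MAX_ENTRIES];

static struct l2_entry *l2_lookup(const struct l2_key *key) {
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (table[i].used && memcmp(&table[i].key, key, sizeof(*key)) == 0)
            return &table[i];
    return NULL;
}

/* create a new entry or refresh the timestamp of an existing one */
static void l2_learn(const struct l2_key *key, uint16_t iface, uint64_t now) {
    struct l2_entry *e = l2_lookup(key);
    if (e == NULL) {
        for (int i = 0; i < MAX_ENTRIES; i++)
            if (!table[i].used) { e = &table[i]; break; }
        if (e == NULL)
            return; /* table full, drop the learn */
        e->used = true;
        e->key = *key;
    }
    e->out_iface = iface;
    e->last_seen = now;
}

/* remove entries not refreshed within the ageing timeout */
static void l2_age(uint64_t now) {
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (table[i].used && now - table[i].last_seen > AGEING_TIMEOUT)
            table[i].used = false;
}
```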

The CLI provides grcli commands to manually add L2 nexthops and display
their info.

No support in the frr plugin for now. It will require some more thinking
to translate grout nexthops into zebra neighbors back and forth.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Implement a new bridge interface type (GR_IFACE_TYPE_BRIDGE) along with
its associated mode (GR_IFACE_MODE_BRIDGE) for member interfaces.

The bridge configuration includes a configurable ageing time for learned
MAC addresses (default 300s), flags to disable flooding and MAC
learning, an optional MAC address (random if not specified), and up to
64 member interfaces (ports or bonds).

When a member interface is attached to a bridge, its mode is set to
GR_IFACE_MODE_BRIDGE and its domain_id points to the bridge. When
detached, it reverts to GR_IFACE_MODE_VRF. On bridge deletion, all
members are detached and all associated L2 nexthops are purged.

Bridge interfaces are always in VRF mode and they reference themselves
via their domain_id. The L2 nexthop ageing callback now uses the
per-bridge ageing_time setting instead of a hardcoded value.
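The attach/detach state transitions boil down to a mode and domain swap. A rough sketch; the enum values and field names approximate the grout API rather than copy it:

```c
#include <assert.h>
#include <stdint.h>

enum iface_mode { IFACE_MODE_VRF, IFACE_MODE_BRIDGE };

struct iface {
    uint16_t id;
    enum iface_mode mode;
    uint16_t domain_id; /* VRF id, or bridge iface id when bridged */
    uint16_t vrf_id;
};

static void bridge_attach(struct iface *member, const struct iface *bridge) {
    member->mode = IFACE_MODE_BRIDGE;
    member->domain_id = bridge->id; /* domain now points at the bridge */
}

static void bridge_detach(struct iface *member) {
    member->mode = IFACE_MODE_VRF;      /* revert to routed mode */
    member->domain_id = member->vrf_id; /* back to its own VRF domain */
}
```

The bridge interface itself stays in VRF mode with domain_id equal to its own id, which is the self-reference mentioned above.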

Update the frr plugin to expose bridge interfaces to zebra as
ZEBRA_IF_BRIDGE and set the bridge_ifindex on member interfaces with
ZEBRA_IF_SLAVE_BRIDGE slave type.

For now, there is no datapath support for bridge interfaces. This will
be added in the next commit.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Add the packet processing nodes for bridge forwarding. Register
GR_IFACE_MODE_BRIDGE and GR_IFACE_TYPE_BRIDGE in the port_rx and
iface_output nodes, respectively, to steer packets to bridge_input.

The bridge_input node handles incoming packets on bridge member
interfaces. It extracts the VLAN ID (if present), learns the source MAC
address (unless learning is disabled), and performs destination lookup.
Known unicast destinations are forwarded directly to the learned output
interface. Unknown unicast, broadcast and multicast packets are sent to
the flood node. Packets destined back to their source interface are
dropped as hairpin.
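The forwarding decision distills to a short classifier. The function and flag names below are placeholders for the real bridge_input logic, with the FDB lookup abstracted behind a callback:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum verdict { FORWARD, FLOOD, DROP_HAIRPIN };

/* group bit of the destination address covers broadcast and multicast */
static bool eth_addr_is_group(const uint8_t mac[6]) {
    return (mac[0] & 1) != 0;
}

/* fdb_lookup returns the learned output iface id, or 0 when unknown */
static enum verdict bridge_input(
    const uint8_t dst_mac[6], uint16_t in_iface,
    uint16_t (*fdb_lookup)(const uint8_t mac[6]),
    uint16_t *out_iface
) {
    if (eth_addr_is_group(dst_mac))
        return FLOOD; /* broadcast/multicast is always flooded */
    uint16_t dst = fdb_lookup(dst_mac);
    if (dst == 0)
        return FLOOD; /* unknown unicast */
    if (dst == in_iface)
        return DROP_HAIRPIN; /* would go back out the source port */
    *out_iface = dst;
    return FORWARD;
}
```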

The bridge_flood node replicates packets to all bridge members except
the source interface. It also floods to the bridge interface itself.
Packets are cloned for each output member.

Both nodes handle the NO_FLOOD and NO_LEARN bridge flags to allow
disabling these behaviors per bridge.
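The replication loop can be sketched as below: one copy per member except the source, plus one for the bridge interface itself, unless flooding is disabled. Names and the flag value are illustrative; emit() stands in for cloning the mbuf and enqueueing it toward iface_output.

```c
#include <assert.h>
#include <stdint.h>

#define BR_NO_FLOOD 0x1

struct bridge {
    uint16_t id; /* also the bridge interface id */
    uint32_t flags;
    uint16_t members[64];
    int n_members;
};

/* returns the number of copies emitted */
static int bridge_flood(
    const struct bridge *br, uint16_t src_iface,
    void (*emit)(uint16_t iface)
) {
    int copies = 0;
    if (br->flags & BR_NO_FLOOD)
        return 0; /* flooding disabled on this bridge */
    for (int i = 0; i < br->n_members; i++) {
        if (br->members[i] == src_iface)
            continue; /* never send back out the source port */
        emit(br->members[i]);
        copies++;
    }
    emit(br->id); /* deliver a copy to the bridge interface itself */
    return copies + 1;
}
```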

When packets exit a bridge via the bridge interface itself, they are
sent to eth_input.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Verify basic bridge functionality with three ports attached to a bridge
interface. The test exercises L2 forwarding between namespaces, MAC
learning, static L2 nexthop creation overriding a learned entry, and L3
connectivity to the bridge interface itself.

Also verify that detaching a member port and deleting the bridge
properly cleans up the associated L2 nexthops.

Signed-off-by: Robin Jarry <rjarry@redhat.com>