Skip to content

infra: add iface_input/iface_output abstraction for bridge support#493

Merged
maxime-leroy merged 9 commits intoDPDK:mainfrom
rjarry:iface-input
Feb 10, 2026
Merged

infra: add iface_input/iface_output abstraction for bridge support#493
maxime-leroy merged 9 commits intoDPDK:mainfrom
rjarry:iface-input

Conversation

@rjarry
Copy link
Copy Markdown
Collaborator

@rjarry rjarry commented Feb 1, 2026

Introduce a proper L1 interface abstraction layer in the datapath, separating interface-level concerns (VLAN handling, interface state, statistics) from L2 Ethernet processing.

The key motivation is bridge support: when VLAN sub-interfaces are bridge members, the VLAN tag must be stripped before L2 processing so the bridge can learn and forward based on the inner frame. This requires processing order: port_rx → iface_input → bridge_input, with packets only entering eth_input when destined for the local stack. Pure L2 switching bypasses Ethernet header parsing entirely.

On the TX path, move VLAN resolution to iface_output and VLAN header insertion to port_output, enabling hardware TX VLAN offload when supported by the NIC.

Summary by CodeRabbit

  • New Features

    • Port VLAN offload toggle and automatic handling of VLAN subinterfaces.
    • New per-interface input/output nodes enabling mode- and VLAN-aware routing and tracing.
  • Performance

    • Batched interface statistics to reduce per-packet overhead.
    • Multiple RX/TX processing paths selected at startup to exploit device offloads and queue sharing.
  • Behavior

    • Simplified VLAN handling and streamlined packet trace format for clearer output.
    • Improved validation and name-based logging for type registrations.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 1, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR adds VLAN offload fields and a port_set_vlan_offload API, introduces iface_input and iface_output datapath nodes for per-interface VLAN-aware routing, and splits RX/TX processing into offload-aware and shared variants. VLAN handling is removed from eth_input/eth_output. iface_type registration is moved to a const array with new add_subinterface/del_subinterface callbacks and int return codes. Per-packet interface stats are replaced by batched IFACE_STATS_* macros across many datapath modules. Several dispatch tables were resized to use UINT_NUM_VALUES(enum_type) and trace formats/structures were simplified.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@modules/infra/datapath/iface_input.c`:
- Around line 88-99: The VLAN cache in iface_input is keyed only by vlan_id
causing cross-interface reuse; update the cache to include the incoming
interface id by tracking last_iface_id alongside last_vlan_id (or replace the
cache entirely). Concretely, when deciding whether to call
vlan_get_iface(iface->id, vlan_id) check both last_iface_id == iface->id and
last_vlan_id == vlan_id (store last_iface_id when you update vlan_iface), so
vlan_iface is only reused for the same (iface->id, vlan_id) pair; alternatively
remove vlan_iface/last_vlan_id caching and always call vlan_get_iface(iface->id,
vlan_id).

In `@modules/infra/datapath/port_output.c`:
- Around line 44-52: The NULL-check branch that handles gr_mbuf_prepend()
failing frees the mbuf but doesn't stop execution, so memmove(data, ...) runs
with data == NULL; update the error path in the routine (the block that calls
gr_mbuf_is_traced(mbuf), gr_mbuf_trace_finish(mbuf), and rte_pktmbuf_free(mbuf))
to return/continue out of the surrounding loop or function after freeing the
mbuf so execution does not reach memmove; ensure you use the same control flow
style as the surrounding code (e.g., add a continue in the loop or a return if
not in a loop) to avoid the null pointer dereference.
- Around line 38-40: The code currently checks port->tx_offloads &
RTE_ETH_TX_OFFLOAD_VLAN_INSERT and sets mbuf->tx_offload, which is wrong; to
request per-packet VLAN insertion set the mbuf offload flag and VLAN TCI:
replace the mbuf->tx_offload write with mbuf->ol_flags |= RTE_MBUF_F_TX_VLAN and
keep mbuf->vlan_tci = d->vlan_id so the hardware will perform VLAN insertion for
this packet (references: port->tx_offloads, RTE_ETH_TX_OFFLOAD_VLAN_INSERT,
mbuf, mbuf->ol_flags, RTE_MBUF_F_TX_VLAN, mbuf->vlan_tci, d->vlan_id).

Comment thread modules/infra/datapath/iface_input.c Outdated
Comment thread modules/infra/datapath/port_output.c Outdated
Comment thread modules/infra/datapath/port_output.c Outdated
@rjarry rjarry force-pushed the iface-input branch 2 times, most recently from f88a2a9 to e58cac1 Compare February 2, 2026 14:50
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@modules/infra/control/bond.c`:
- Around line 440-452: The loop in bond_add_subinterface enables VLAN offload on
members via port_set_vlan_offload but returns on the first failure, leaving
earlier members enabled and creating an inconsistent state; change this to
perform a rollback on failure: when
port_set_vlan_offload(bond->members[i].iface, true) returns < 0, iterate over
previously-successful indices (0..i-1) and call
port_set_vlan_offload(bond->members[j].iface, false) to restore prior state,
then return the original error; ensure you reference bond_add_subinterface,
bond->members, bond->n_members and port_set_vlan_offload when making the change
and optionally add a debug log entry for the error and rollback outcome.

In `@modules/infra/datapath/eth_output.c`:
- Around line 24-59: The trace code can record an uninitialized or stale src_mac
when gr_mbuf_prepend fails or iface_get_eth_addr fails (NO_HEADROOM/NO_MAC); in
eth_output_process ensure src_mac is always valid before any trace: either
zero-initialize src_mac at declaration or set a clear/known value when you take
the NO_HEADROOM/NO_MAC path, or only call gr_mbuf_trace_add after successfully
populating src_mac (i.e., after gr_mbuf_prepend returns non-NULL and
iface_get_eth_addr succeeds); reference eth_output_process, src_mac,
gr_mbuf_prepend, iface_get_eth_addr, NO_HEADROOM, NO_MAC and gr_mbuf_trace_add
to locate and update the logic so traces never include uninitialized src_mac
data.

In `@modules/infra/datapath/loop_xvrf.c`:
- Around line 54-58: When flushing accumulated packets/bytes on interface
change, use the previous interface id (last_iface_id) instead of the new
eth_data->iface->id so stats are credited to the correct interface; replace the
call to iface_get_stats(lcore_id, eth_data->iface->id) with
iface_get_stats(lcore_id, last_iface_id) and then update stats->rx_packets and
stats->rx_bytes as before (ensure last_iface_id is a valid previous id before
calling iface_get_stats).

Comment thread modules/infra/control/bond.c
Comment thread modules/infra/datapath/eth_output.c
Comment thread modules/infra/datapath/loop_xvrf.c Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@modules/infra/control/iface.c`:
- Around line 379-393: The function iface_del_subinterface currently removes the
sub from parent->subinterfaces via gr_vec_del_swap before invoking the optional
hook type->del_subinterface, which can leave state inconsistent if the hook
fails; change the order so you locate the index, call
type->del_subinterface(parent, sub) first (if type and hook exist), and only
call gr_vec_del_swap(parent->subinterfaces, i) to remove the element when the
hook returns success (0); ensure the function returns the hook's error code on
failure and only returns 0 after a successful deletion.
- Line 45: The function iface_type_get currently returns iface_types[type_id]
without validation; update iface_type_get to check that type_id is within valid
bounds (>= 0 and < the number of entries in iface_types) and return NULL for
invalid values to avoid out-of-bounds reads; locate the iface_type_get
implementation and add a guard using the iface_types array length or the
constant/enum that defines the valid type count, and ensure callers handle a
NULL return.

In `@modules/infra/datapath/eth_output.c`:
- Around line 46-51: The src_mac caching never takes effect because
last_iface_id is not updated after a successful iface_get_eth_addr lookup;
modify the branch around iface_get_eth_addr (the block that sets edge = NO_MAC
and goto next) so that when iface_get_eth_addr(priv->iface, &src_mac) succeeds
you assign last_iface_id = priv->iface->id and proceed (only set edge = NO_MAC
and goto next on failure), ensuring subsequent packets reuse the cached src_mac;
reference symbols: last_iface_id, priv->iface, iface_get_eth_addr, src_mac,
edge, NO_MAC, goto next.

Comment thread modules/infra/control/iface.c
Comment thread modules/infra/control/iface.c
Comment thread modules/infra/datapath/eth_output.c
@rjarry rjarry force-pushed the iface-input branch 4 times, most recently from 712b992 to f579eef Compare February 3, 2026 20:22
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@modules/infra/datapath/port_rx.c`:
- Around line 87-93: set_port_vlan currently always calls strip_vlan and
corrupts untagged frames; change it to check the Ethernet type (eth->ether_type
== RTE_BE16(RTE_ETHER_TYPE_VLAN)) before calling strip_vlan and only set
d->vlan_id from strip_vlan when that test passes, otherwise set d->vlan_id to 0
(or an appropriate "no VLAN" value) and leave the packet intact; use the same
pattern as set_bond_vlan and reference functions/structs iface_inout_mbuf_data,
strip_vlan, and the ether_type check with RTE_BE16(RTE_ETHER_TYPE_VLAN) to
locate the code to modify.

Comment thread modules/infra/datapath/port_rx.c Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@modules/infra/control/port.c`:
- Around line 856-862: The callback port_del_subinterface incorrectly calls
port_has_vlan_subinterfaces() while the to-be-removed subinterface is still in
parent->subinterfaces, so the check must ignore the sub being deleted; change
port_del_subinterface to detect whether any VLAN subinterfaces other than the
one passed in remain (e.g., iterate parent->subinterfaces and count VLANs
skipping the 'sub' pointer, or test if the only VLAN subinterface is exactly
'sub'), and only call port_set_vlan_offload(parent, false) when no other VLAN
subinterfaces remain; reference port_del_subinterface,
port_has_vlan_subinterfaces, iface_del_subinterface and port_set_vlan_offload
when locating the code to update.

Comment thread modules/infra/control/port.c
@rjarry rjarry force-pushed the iface-input branch 3 times, most recently from 57bf33a to 30624c8 Compare February 5, 2026 10:50
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@modules/infra/datapath/eth_output.c`:
- Around line 28-49: The variable last_iface_id is used before initialization
causing undefined behavior; initialize it to a sentinel that cannot match a real
interface (e.g., UINT16_MAX) before the for loop in eth_output.c so the first
iteration always forces iface_get_eth_addr(priv->iface, &src_mac) to run; also
consider zero-initializing src_mac before use to be safe. Update the
initialization near the declarations for last_iface_id/src_mac in the function
that contains eth_output_mbuf_data, gr_mbuf_prepend, and iface_get_eth_addr
calls so the MAC lookup is never skipped due to an indeterminate last_iface_id.

In `@modules/infra/datapath/iface_output.c`:
- Line 18: The review notes that the NO_HEADROOM edge is wired but never used in
iface_output_process; either add a headroom check similar to other outputs
(eth_output, icmp_output, srv6_output) or remove the unused edge wiring. Locate
iface_output_process and implement a check for required skb headroom before
transmission; if headroom is insufficient set edge = NO_HEADROOM (and return the
same way other modules do), otherwise proceed with normal transmit logic. If you
prefer removing unused logic, delete the NO_HEADROOM edge wiring and any related
enum/strings so no unused edge remains.

Comment thread modules/infra/datapath/eth_output.c
Comment thread modules/infra/datapath/iface_output.c Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@modules/infra/datapath/gr_rxtx.h`:
- Around line 69-81: The IFACE_STATS_INC macro is multi-statement and can
misbehave when used in a bare if; wrap the entire macro body in a single do {
... } while (0) block so all statements (including dir##_packets += 1 and
dir##_bytes += ...) are executed atomically as one statement; update the
IFACE_STATS_INC definition (referencing IFACE_STATS_INC, dir##_last_iface_id,
dir##_packets, dir##_bytes, and the rte_pktmbuf_pkt_len(mbuf) usage) accordingly
to prevent silent miscounts when the macro is used without braces.

In `@modules/infra/datapath/port_tx.c`:
- Around line 135-152: tx_finish is being called with the original objs array
after tx_add_vlan compacts/filtering into mbufs, causing wrong packets to be
traced/enqueued; update calls in tx_process and tx_shared_process to pass the
mbufs buffer and the tx_ok/nb_objs count that match mbufs (i.e., replace usages
of objs in the tx_finish call with mbufs so tx_finish indexes the driver-owned
mbufs correctly), and ensure the arguments order and counts remain consistent
with tx_finish's signature.
- Around line 121-133: In tx_offload_add_vlan replace writes to m->tx_offload
with m->ol_flags so VLAN insertion requests use the correct offload flag (set
RTE_MBUF_F_TX_VLAN in ol_flags and set m->vlan_tci from iface_inout_mbuf_data),
and in tx_process/tx_shared_process fix the array-indexing bug introduced by
tx_add_vlan: when tx_add_vlan filters/continues, maintain the parallel mapping
between mbufs[] and objs[] (or only use mbufs[] everywhere) so that the value
returned by rte_eth_tx_burst (tx_ok, an index into mbufs[]) is applied to
mbufs[] (not objs[]); update tx_finish to consume the correct array (mbufs) or
use an index translation from mbufs back to objs to avoid misaligned drops for
NO_HEADROOM packets.

Comment thread modules/infra/datapath/gr_rxtx.h Outdated
Comment thread modules/infra/datapath/port_tx.c
Comment thread modules/infra/datapath/port_tx.c
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@modules/infra/control/port.c`:
- Around line 822-840: port_set_vlan_offload currently updates
p->want_vlan_offload before calling iface_port_reconfig, so on reconfig failure
the flag stays changed and future calls short-circuit; modify the function to
avoid leaving p->want_vlan_offload altered on failure by either moving the
assignment until after a successful reconfig when p->started, or by reverting it
when iface_port_reconfig returns non‑zero (e.g., call iface_port_reconfig(iface,
GR_PORT_SET_N_RXQS, NULL, &api) then if (ret) set p->want_vlan_offload =
!enable). Keep references to port_set_vlan_offload, iface_port_reconfig,
p->want_vlan_offload and p->started when making the change.

In `@modules/infra/datapath/xconnect.c`:
- Around line 33-45: peer returned from iface_from_id(iface->domain_id) may be
NULL; add a NULL guard immediately after the call and only dereference peer when
non-NULL. Specifically, after peer = iface_from_id(...), if (!peer) set edge =
NO_PORT (and avoid calling IFACE_STATS_INC(tx, mbuf, peer) or
iface_info_port(peer)); otherwise keep the existing branch that sets port,
mbuf->port, edge = OUTPUT and calls IFACE_STATS_INC(tx, mbuf, peer). This
ensures iface_info_port, mbuf->port assignment and stats increment only run when
peer is valid.
🧹 Nitpick comments (1)
modules/infra/datapath/port_tx.c (1)

135-174: tx_finish receives (void *)mbufs — cast should be (void **).

Lines 149 and 171 cast mbufs to (void *) but tx_finish parameter is void **objs. This works in practice because the underlying pointer value is correct, but the cast silences a type mismatch.

Proposed fix
-	tx_finish(ctx, graph, node, (void *)mbufs, nb_objs, tx_ok);
+	tx_finish(ctx, graph, node, (void **)mbufs, nb_objs, tx_ok);

Comment on lines +822 to +840
int port_set_vlan_offload(struct iface *iface, bool enable) {
struct iface_info_port *p = iface_info_port(iface);
struct gr_iface_info_port api = {0};
int ret = 0;

if (p->want_vlan_offload == enable)
return 0;

p->want_vlan_offload = enable;

if (p->started) {
// Trigger port reconfiguration to apply the new offload setting.
// Use GR_PORT_SET_N_RXQS as a dummy flag to force reconfiguration.
api.n_rxq = p->n_rxq;
ret = iface_port_reconfig(iface, GR_PORT_SET_N_RXQS, NULL, &api);
}

return ret;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Bug: want_vlan_offload not reverted on reconfig failure, blocks future retries.

If iface_port_reconfig fails on line 836, want_vlan_offload remains set to enable. Any subsequent call with the same value (e.g., adding another VLAN subinterface) hits the early return on line 827–828 without reattempting the reconfig. The hardware offload stays unconfigured until an unrelated event triggers port_configure.

Proposed fix
 int port_set_vlan_offload(struct iface *iface, bool enable) {
 	struct iface_info_port *p = iface_info_port(iface);
 	struct gr_iface_info_port api = {0};
 	int ret = 0;
 
 	if (p->want_vlan_offload == enable)
 		return 0;
 
 	p->want_vlan_offload = enable;
 
 	if (p->started) {
 		// Trigger port reconfiguration to apply the new offload setting.
 		// Use GR_PORT_SET_N_RXQS as a dummy flag to force reconfiguration.
 		api.n_rxq = p->n_rxq;
 		ret = iface_port_reconfig(iface, GR_PORT_SET_N_RXQS, NULL, &api);
+		if (ret < 0)
+			p->want_vlan_offload = !enable;
 	}
 
 	return ret;
 }
🤖 Prompt for AI Agents
In `@modules/infra/control/port.c` around lines 822 - 840, port_set_vlan_offload
currently updates p->want_vlan_offload before calling iface_port_reconfig, so on
reconfig failure the flag stays changed and future calls short-circuit; modify
the function to avoid leaving p->want_vlan_offload altered on failure by either
moving the assignment until after a successful reconfig when p->started, or by
reverting it when iface_port_reconfig returns non‑zero (e.g., call
iface_port_reconfig(iface, GR_PORT_SET_N_RXQS, NULL, &api) then if (ret) set
p->want_vlan_offload = !enable). Keep references to port_set_vlan_offload,
iface_port_reconfig, p->want_vlan_offload and p->started when making the change.

Comment on lines 33 to 45
peer = iface_from_id(iface->domain_id);

struct iface_stats *rx_stats = iface_get_stats(rte_lcore_id(), iface->id);
rx_stats->rx_packets++;
rx_stats->rx_bytes += rte_pktmbuf_pkt_len(mbuf);
IFACE_STATS_INC(rx, mbuf, iface);

if (peer->type == GR_IFACE_TYPE_PORT) {
port = iface_info_port(peer);
mbuf->port = port->port_id;
edge = OUTPUT;

struct iface_stats *tx_stats = iface_get_stats(rte_lcore_id(), peer->id);
tx_stats->tx_packets++;
tx_stats->tx_bytes += rte_pktmbuf_pkt_len(mbuf);
IFACE_STATS_INC(tx, mbuf, peer);
} else {
edge = NO_PORT;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, find the definition of iface_from_id
fd -t f -e c -e h | xargs rg -l "iface_from_id" | head -20

Repository: DPDK/grout

Length of output: 699


🏁 Script executed:

# Search for the function definition
rg "^\s*(static\s+)?.*\*?\s+iface_from_id\s*\(" --type c -A 10

Repository: DPDK/grout

Length of output: 49936


🏁 Script executed:

# Find all usages of iface_from_id to see how callers handle it
rg "iface_from_id\(" --type c -B 2 -A 2

Repository: DPDK/grout

Length of output: 27534


🏁 Script executed:

# Verify the xconnect.c code specifically
cat -n modules/infra/datapath/xconnect.c | sed -n '30,50p'

Repository: DPDK/grout

Length of output: 692


🏁 Script executed:

# Check if there are any NULL checks for peer in xconnect.c
grep -n "peer" modules/infra/datapath/xconnect.c

Repository: DPDK/grout

Length of output: 256


Add NULL guard for iface_from_id() return value before dereferencing.

peer is dereferenced on line 37 without checking if iface_from_id() returned NULL. If domain_id references a deleted or invalid interface, this causes a NULL pointer dereference and crash. All other datapath callers of iface_from_id() properly check for NULL before use.

🐛 Proposed fix
 		peer = iface_from_id(iface->domain_id);
+		if (peer == NULL) {
+			edge = NO_PORT;
+			goto enqueue;
+		}
 
 		IFACE_STATS_INC(rx, mbuf, iface);
 
 		if (peer->type == GR_IFACE_TYPE_PORT) {
 			port = iface_info_port(peer);
 			mbuf->port = port->port_id;
 			edge = OUTPUT;
 
 			IFACE_STATS_INC(tx, mbuf, peer);
 		} else {
 			edge = NO_PORT;
 		}
 
 		if (gr_mbuf_is_traced(mbuf)) {
 			gr_mbuf_trace_add(mbuf, node, 0);
 		}
+enqueue:
 		rte_node_enqueue_x1(graph, node, edge, mbuf);
🤖 Prompt for AI Agents
In `@modules/infra/datapath/xconnect.c` around lines 33 - 45, peer returned from
iface_from_id(iface->domain_id) may be NULL; add a NULL guard immediately after
the call and only dereference peer when non-NULL. Specifically, after peer =
iface_from_id(...), if (!peer) set edge = NO_PORT (and avoid calling
IFACE_STATS_INC(tx, mbuf, peer) or iface_info_port(peer)); otherwise keep the
existing branch that sets port, mbuf->port, edge = OUTPUT and calls
IFACE_STATS_INC(tx, mbuf, peer). This ensures iface_info_port, mbuf->port
assignment and stats increment only run when peer is valid.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modules/infra/control/vlan.c (1)

67-80: ⚠️ Potential issue | 🟠 Major

Error paths leave VLAN state inconsistent during reconfig.

If iface_del_subinterface fails on line 72, the old hash key (line 71) is already deleted, making the VLAN unreachable via vlan_get_iface() while the subinterface link still exists on the old parent.

Similarly, if iface_add_subinterface fails on line 79, the interface has already been detached from the old parent (line 72) and its parent_id/vlan_id updated (lines 77-78), but it's neither in the new parent's subinterface list nor in the hash. The VLAN is orphaned.

Consider either reordering operations so the hash deletion happens after successful subinterface manipulation, or adding rollback logic on failure.

🤖 Fix all issues with AI agents
In `@modules/infra/control/bond.c`:
- Around line 462-476: bond_del_subinterface can fail mid-loop disabling VLAN
offload on bond members (using port_set_vlan_offload), leaving an inconsistent
state; change it to do a rollback on failure: when port_set_vlan_offload returns
<0 for member i, iterate over previously-processed members (indices 0..i-1) and
call port_set_vlan_offload(..., true) to restore their original state, then
return the error; use the existing bond_del_subinterface function, the
bond->members array and port_set_vlan_offload calls to implement the rollback.

Comment on lines +462 to +476
static int bond_del_subinterface(struct iface *parent, struct iface *sub) {
struct iface_info_bond *bond = iface_info_bond(parent);

if (sub->type != GR_IFACE_TYPE_VLAN)
return 0;
if (bond_has_vlan_subinterfaces(parent, sub))
return 0;

for (uint8_t i = 0; i < bond->n_members; i++) {
int ret = port_set_vlan_offload(bond->members[i].iface, false);
if (ret < 0)
return ret;
}
return 0;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Same partial-failure risk on the disable path in bond_del_subinterface.

Symmetric to bond_add_subinterface: if disabling VLAN offload fails mid-loop, some members retain the offload while others have it disabled.

🤖 Prompt for AI Agents
In `@modules/infra/control/bond.c` around lines 462 - 476, bond_del_subinterface
can fail mid-loop disabling VLAN offload on bond members (using
port_set_vlan_offload), leaving an inconsistent state; change it to do a
rollback on failure: when port_set_vlan_offload returns <0 for member i, iterate
over previously-processed members (indices 0..i-1) and call
port_set_vlan_offload(..., true) to restore their original state, then return
the error; use the existing bond_del_subinterface function, the bond->members
array and port_set_vlan_offload calls to implement the rollback.

Replace hardcoded array sizes and *_COUNT enum values with
UINT_NUM_VALUES() macro for all edge registration arrays. This makes
the array sizing explicit based on the index type and eliminates
tautological comparison warnings.

Update registration functions to use name-based validation instead of
bounds checking where applicable. Use gr_iface_type_name(),
gr_iface_mode_name(), and gr_nh_type_name() for better log messages
and consistent error reporting.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
The interface type name is already available via gr_iface_type_name()
which derives it from the type ID. Storing it again in struct iface_type
is redundant. This field is unused anyway. Remove the name field from
iface_type.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Replace the linked list of interface types with a static array indexed
by type ID. This makes iface_type_get() an O(1) array lookup instead of
O(n) linked list traversal.

This benefits iface_get_eth_addr() which is called in the eth_input hot
path and needs to look up the interface type to find the get_eth_addr
callback.

Since interface types are no longer linked, the STAILQ_ENTRY field is
removed and the iface_type structs can be declared const.

Substituting pointer chasing with direct array indexing reduces LLC
cache misses by 9% and increases throughput from 11.6M to 11.8M pkt/s
(+1.7%).

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Instead of updating the shared iface_stats structure for every packet,
accumulate packets and bytes in local variables and flush to the stats
structure only when the interface changes or at the end of the batch.

Add helper macros IFACE_STATS_VARS, IFACE_STATS_INC and IFACE_STATS_FLUSH
to reduce boilerplate in datapath nodes.

This reduces memory writes from O(n) to O(number of unique interfaces)
per batch, which is typically O(1) for homogeneous traffic. The shared
stats structure is accessed less frequently, reducing cache line
pressure and store buffer usage.

Reduces LLC cache misses by 11% (3.20M to 2.86M) and eth_input cycles
from 48.0 to 46.2 per packet (-4%). Throughput remains at 11.8M pkt/s
as ip_input (88 cycles/pkt) dominates the critical path.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
RCU quiescent state only needs to be reported periodically to signal
that the thread has passed through a quiescent period. Reporting it on
every graph walk iteration is unnecessarily frequent and adds overhead
from the memory barriers involved.

Move rte_rcu_qsbr_quiescent() from the main loop body into the
housekeeping block that runs every 256 iterations. This reduces the
frequency of memory barriers while still ensuring timely RCU grace
period completion.

Reduces LLC cache misses by 7% (2.86M to 2.67M) and L1 cache misses by
6% (2.14B to 2.01B). Throughput remains at 11.8M pkt/s as ip_input
dominates the critical path.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Store the negotiated RX and TX offload flags in iface_info_port after
port configuration. This allows datapath nodes to check hardware
capabilities without calling rte_eth_dev_info_get() at runtime.

This prepares for dedicated process functions for port_rx and port_tx
nodes that will be selected based on available offloads (e.g. VLAN
stripping on RX, VLAN insertion on TX).

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Introduce add_subinterface and del_subinterface callbacks in struct
iface_type to let interface types react when subinterfaces are created
or destroyed. The callbacks are invoked from iface_add_subinterface()
and iface_del_subinterface() which now return an error code.

Implement these callbacks for ports: when the first VLAN subinterface is
added, enable RTE_ETH_TX_OFFLOAD_VLAN_INSERT and reconfigure the port.
When the last VLAN subinterface is removed, disable it. For bonds,
propagate the setting to all member ports. Also handle the case where
a port joins or leaves a bond that already has VLAN subinterfaces.

Requesting RTE_ETH_TX_OFFLOAD_VLAN_INSERT at port configuration time
may cause DPDK to select a more expensive TX function that checks every
packet ol_flags for offload requests.

E.g. for the ice driver, this means using ice_xmit_pkts_vec_avx2_offload
instead of ice_xmit_pkts_vec_avx2, adding roughly 4% overhead on the
global IPv4 performance even when no packets require VLAN insertion.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Move VLAN subinterface resolution from eth_output to iface_output. When
the destination interface is a VLAN, resolve its parent and store the
VLAN ID in iface_mbuf_data. Update the output interface pointer to the
parent so that downstream nodes operate on the physical port or bond.

Move VLAN header insertion from eth_output to port_tx. Add four
dedicated process functions selected at graph creation time based on
port capabilities and queue sharing:

 tx_process                - software VLAN insertion, exclusive queue
 tx_shared_process         - software VLAN insertion, shared queue (spinlock)
 tx_offload_process        - hardware VLAN offload, exclusive queue
 tx_shared_offload_process - hardware VLAN offload, shared queue (spinlock)

When RTE_ETH_TX_OFFLOAD_VLAN_INSERT is available, set RTE_MBUF_F_TX_VLAN
and vlan_tci on the mbuf and let the hardware insert the tag. Otherwise,
insert the VLAN header in software.

Add rxtx_flags_t to identify which TX path variant handled a packet.
Record these flags in TX traces instead of port/queue, which is already
visible from the node clone name.

Simplify eth_input and eth_output traces to only record the Ethernet
header. The interface and VLAN information moves to iface_output. The
eth_input trace temporarily loses this context; it will be restored
once a similar iface_input node is introduced on the RX path.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Introduce iface_input between port_rx and eth_input to provide a L1
interface abstraction on the RX path, mirroring iface_output on TX.
The new node handles VLAN subinterface resolution for VRF mode, checks
interface up/down state, accounts RX statistics, and routes packets to
the next node based on interface mode.

Move VLAN tag stripping from eth_input to port_rx. Add four dedicated
process functions selected at graph creation time based on port
capabilities and bond membership:

  rx_process              - software VLAN stripping, standalone port
  rx_offload_process      - hardware VLAN stripping, standalone port
  rx_bond_process         - software VLAN stripping, bond member
  rx_bond_offload_process - hardware VLAN stripping, bond member

When RTE_ETH_RX_OFFLOAD_VLAN_STRIP is available, read the VLAN ID from
mbuf vlan_tci. Otherwise, strip the tag by shifting Ethernet addresses
forward. Bond variants handle LACP slow protocol frames specially,
directing them to the member port instead of the bond.

Record rxtx_flags_t in RX traces instead of port/queue, matching the TX
side. Add RXTX_F_BOND to identify bond member paths in traces.

Simplify eth_input to only parse the Ethernet header and classify
packets by destination address domain (local, broadcast, multicast,
other).

This refactoring prepares for bridge support where VLAN tags must be
stripped before L2 processing so the bridge can learn and forward based
on the inner frame.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
@maxime-leroy maxime-leroy merged commit fc2c787 into DPDK:main Feb 10, 2026
10 checks passed
@rjarry rjarry deleted the iface-input branch February 10, 2026 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants