Skip to content

netvsp: Add VLAN support#3417

Open
ben-zen wants to merge 19 commits intomicrosoft:mainfrom
ben-zen:netvsp-vlan
Open

netvsp: Add VLAN support#3417
ben-zen wants to merge 19 commits intomicrosoft:mainfrom
ben-zen:netvsp-vlan

Conversation

@ben-zen
Copy link
Copy Markdown
Contributor

@ben-zen ben-zen commented May 2, 2026

Presently, VTL0 with VF passthrough supports VLANs with the MANA NIC, but when Underhill switches that connection over to synthetic path, the Underhill NetVSP implementation drops VLAN tags and therefore breaks connections for the duration of the synthetic handoff; this is leading to issues for users depending on VLAN support, and is a functionality gap with standard NetVSP.

This change adds full VLAN support to Underhill for NetVSP with the MANA backend, in all directions. It does not address other network interfaces at this time such as the consomme test network backend, the tap interface, or the virtio integration; that is left as future work to build off of the structures established here in the interest of expedience.

Ben Lewis and others added 15 commits April 29, 2026 00:12
Add a new VMM test that validates guest-side VLAN (802.1Q) sub-interface
creation and configuration on the synthetic NIC (netvsp).

The test:
- Boots a Linux VM with a NIC (via consomme backend)
- Finds the NIC by MAC address in sysfs
- Creates a VLAN sub-interface (ID 100) using ip-link
- Verifies 802.1Q protocol and VLAN ID in interface details
- Assigns an IP address and brings the interface up
- Sends TX smoke traffic (ping) to exercise the netvsp VLAN PPI path
- Verifies at least one TX packet was transmitted
- Confirms the parent interface remains operational
- Cleans up the VLAN interface

This exercises the guest netvsc driver's VLAN support and the netvsp
VLAN PPI (Per-Packet Information) metadata extraction path. Full
end-to-end VLAN datapath validation would require a VLAN-aware backend;
the current consomme backend ignores VLAN metadata, so this test focuses
on guest-side configuration correctness and TX path stability.

Also fix a pre-existing build error in gdma/src/resolver.rs where the
cfg(test)-gated reject_tx_with_vlan_error field was referenced
unconditionally in struct initialization.

The test took a little over 2m running under WSL2, which seems long.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the `reject_tx_with_vlan_error` field from VportConfig, Vport,
and TxRxTask, which was gated behind #[cfg(test)] and caused net_mana
tests to fail to compile (gdma is built as a regular dependency, not in
test mode, when net_mana tests run).

Instead, have the BNIC emulator inspect the `inject_vlan_pri_tag` field
in the TX long OOB — matching how real MANA hardware detects and rejects
802.1Q VLAN tag insertion requests with CQE_TX_VLAN_TAGGING_VIOLATION.

On the net_mana side, translate TxMetadata's vlan_enabled flag into the
MANA OOB's inject_vlan_pri_tag/vlan_id/pcp/dei fields, and force the
long OOB format when VLAN tagging is requested.

The VLAN fallback test now sends an actual VLAN-tagged packet rather
than relying on a synthetic per-vport flag.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 2, 2026 05:04
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds VLAN (802.1Q) tagging support end-to-end for Underhill’s NetVSP when using the MANA backend, by plumbing VLAN metadata through net_backend and the NetVSP RNDIS PPI paths, and adding targeted unit/integration tests to prevent regressions.

Changes:

  • Extend net_backend TX/RX metadata to carry VLAN information (and add a transport header offset field for offload parsing).
  • Teach NetVSP to parse/emit VLAN RNDIS PPI and propagate VLAN metadata into TX/RX paths (with stats + tests).
  • Add MANA/GDMA VLAN preservation handling and new tests, plus a VMM test that exercises guest VLAN sub-interface configuration.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
vmm_tests/vmm_tests/tests/tests/multiarch/vlan.rs New multi-arch VMM test that configures a guest VLAN interface and performs a TX smoke check.
vmm_tests/vmm_tests/tests/tests/multiarch.rs Registers the new VLAN test module.
vm/devices/net/netvsp/src/test.rs Enhances test endpoint plumbing to carry RX metadata and capture TX metadata; adds VLAN PPI parsing/tests.
vm/devices/net/netvsp/src/rndisprot.rs Introduces VLAN PPI type + EthVlanInfo encoding helpers.
vm/devices/net/netvsp/src/lib.rs Parses VLAN PPI into TxMetadata, adjusts header-length logic, and adds VLAN counters.
vm/devices/net/netvsp/src/buffers.rs Emits VLAN PPI on RX when RxMetadata.vlan is present.
vm/devices/net/net_tap/src/lib.rs Updates RX metadata construction to include vlan: None.
vm/devices/net/net_mana/src/test.rs Adds a VLAN-preserving loopback endpoint and multiple VLAN-focused tests.
vm/devices/net/net_mana/src/lib.rs Plumbs VLAN into MANA TX OOB (inject) and RX metadata (detect).
vm/devices/net/net_mana/Cargo.toml Adds parking_lot as a dev-dependency for the new tests.
vm/devices/net/net_consomme/src/lib.rs Updates RX metadata construction to include vlan: None.
vm/devices/net/net_backend/src/tests.rs Adds a helper to retrieve written RX metadata in tests.
vm/devices/net/net_backend/src/lib.rs Adds VlanMetadata + VLAN fields on RX/TX metadata; adds tcp_header_offset.
vm/devices/net/gdma/src/bnic.rs Propagates VLAN through GDMA RX flags and TX metadata.
Cargo.lock Records new dependency usage (parking_lot) via dev-deps.

Comment thread vm/devices/net/net_backend/src/lib.rs Outdated
Comment on lines +414 to +415
/// The offset into the buffer where the TCP header begins. Only expected
/// to be set if offload flags are set.
Comment thread vm/devices/net/netvsp/src/rndisprot.rs
Comment thread vm/devices/net/netvsp/src/lib.rs Outdated
Comment thread vm/devices/net/net_mana/src/lib.rs Outdated
@ben-zen ben-zen marked this pull request as ready for review May 4, 2026 17:54
@ben-zen ben-zen requested a review from a team as a code owner May 4, 2026 17:54
Copilot AI review requested due to automatic review settings May 4, 2026 17:54
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 6 comments.

Comment thread vm/devices/net/netvsp/src/lib.rs Outdated
Comment thread vm/devices/net/gdma/src/bnic.rs Outdated
Comment on lines +122 to +138
let _ = cmd!(sh, "ping -I {vlan_iface} -c 1 -W 2 10.100.0.1")
.read()
.await;

// Verify that at least one packet was transmitted through the VLAN
// interface (the ARP request for the ping target).
let tx_packets = cmd!(sh, "cat /sys/class/net/{vlan_iface}/statistics/tx_packets")
.read()
.await?;
let tx_count: u64 = tx_packets
.trim()
.parse()
.context("failed to parse tx_packets")?;
tracing::info!(tx_count, "TX packets through VLAN interface");
assert!(
tx_count > 0,
"expected at least one TX packet through the VLAN interface"
Comment thread vm/devices/net/netvsp/src/test.rs
Comment thread vm/devices/net/net_backend/src/lib.rs
Comment thread vm/devices/net/net_mana/src/lib.rs Outdated
Copilot AI review requested due to automatic review settings May 4, 2026 20:14
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 16 changed files in this pull request and generated 6 comments.

Comment on lines +2654 to +2656
if metadata.vlan.is_some() {
stats.tx_vlan_packets.increment();
}
Comment thread vm/devices/net/gdma/src/bnic.rs Outdated
Comment on lines +580 to +581
priority: 0,
drop_eligible_indicator: false,
Comment on lines +1391 to +1392
let tx_vlan = captured_tx[0].vlan.expect("TX metadata should carry VLAN");
assert_eq!(tx_vlan.vlan_id, 42);
Comment on lines +1396 to +1397
let rx_vlan = rx.vlan.expect("RX metadata should carry VLAN");
assert_eq!(rx_vlan.vlan_id, 42);
Comment on lines +72 to +74
// Load the 8021q kernel module for VLAN support. This is a no-op if the
// module is already loaded or built into the kernel.
cmd!(sh, "modprobe 8021q").run().await?;
tx_lso_packets: Counter,
tx_checksum_packets: Counter,
tx_invalid_lso_packets: Counter,
rx_vlan_packets: Counter,
@github-actions github-actions Bot added the unsafe Related to unsafe code label May 4, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

⚠️ Unsafe Code Detected

This PR modifies files containing unsafe Rust code. Extra scrutiny is required during review.

For more on why we check whole files, instead of just diffs, check out the Rustonomicon

Comment thread vm/devices/net/gdma/src/bnic.rs Outdated
net_backend::ETHERNET_VLAN_HEADER_LEN
} else {
net_backend::ETHERNET_HEADER_LEN
} as u16;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: here you get l2_len as u16, then in TxMetadata, you set l2_len: l2_len as u8. There is an almost zero risk of silent overflow. Given that these are both consts, I feel like my slight preference is to not cast to u16, and then do u8::from in the TxMeta.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked but there's not a good from or into; I could do TryFrom but I just don't think there's an issue with overflow here.

Comment thread vm/devices/net/gdma/src/bnic.rs Outdated
l2_len: 14,
l3_len: oob.s_oob.trans_off().clamp(14, 255) - 14,
l2_len: l2_len as u8,
l3_len: oob.s_oob.trans_off().clamp(l2_len, 255) - l2_len,
Copy link
Copy Markdown
Contributor

@erfrimod erfrimod May 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check this clamp? Like maybe trace a warning if it happens?
Edit: I know you inherited this, so doesn't need to be part of this pr.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, probably... I don't particularly want to add that just now, but if this fails we've got odd packets flowing.

Comment thread vm/devices/net/net_backend/src/lib.rs Outdated

/// A loopback endpoint that preserves VLAN metadata from TX → RX and captures
/// every transmitted packet's [`TxMetadata`] for later inspection.
struct VlanPreservingEndpoint {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked into it, but could you just add Vlan support to the existing TestEndpoint?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look but if it's the consomme endpoint, I need to extend smoltcp to support 802.1q.

stats.tx_lso_packets.increment();
}
if metadata.vlan.is_some() {
stats.tx_vlan_packets.increment();
Copy link
Copy Markdown
Contributor

@erfrimod erfrimod May 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You incremented tx_vlan_packets here and at the bottom of rndisprot::PPI_LSO. Should probably just be here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ripped it out above, but I'm currently questioning the utility of doing this. Both mana and netvsp make it hard to trace when vlan packets are getting transported, so it's not really useful beyond having a rough idea that vlan traffic exists.

Comment on lines +738 to +739
/// VLAN IDs are 12 bits. This will silently reject any bits outside of
/// the range.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was debating whether we should validate / check... but I think in both cases TX and RX were we get the values from sources we should just trust. Do we log the VLAN somewhere for triage?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't at all so far--I'll see about logging state that traffic was going over a vlan on error, though.

drop_eligible_indicator: false,
priority: 0,
vlan_id: rx_oob.flags.rx_vlan_id() as u16,
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Increment the counter. Maybe here?
self.stats.rx_packets_vlan.increment();

l2_len: l2_len as u8,
l3_len: oob.s_oob.trans_off().clamp(l2_len, 255) - l2_len,
l4_len: 0,
transport_header_offset: 0,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oob.s_oob.trans_off() Just so that the test bnic is closely aligned with real behavior.


#[repr(C)]
#[derive(Debug, Copy, Clone, IntoBytes, Immutable, KnownLayout, FromBytes)]
pub struct EthVlanInfo(pub u32);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not spark joy but I can't see a good reason to refactor this file to use bitfields right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

unsafe Related to unsafe code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants