Userspace Dataplane Debug Map

This is the compact file/function map for active debugging on current master. Use it when forwarding is broken, throughput collapses after startup, or the XDP shim and Rust helper disagree about who owns the packet.

1. Start Here

2. Symptom To File Map

Packets never reach bpfrx-userspace-dp

Look at:

Questions:

  • Did Go choose xdp_userspace_prog or xdp_main_prog?
  • Is the capability gate forcing legacy fallback?
  • Is the XDP shim redirecting, cpumap-passing, tail-calling, or dropping?

Helper is up but bindings never become live

Look at:

Questions:

  • Did bootstrap maps get programmed correctly?
  • Did the helper apply the snapshot and arm forwarding?
  • Did AF_XDP bind or rebind fail after a link cycle?

Session opens but reply traffic dies

Look at:

Questions:

  • Is the helper parsing the authoritative 5-tuple from metadata or from the mutated frame?
  • Is reverse NAT lookup hitting nat_reverse_index?
  • Are rebuilt L4 ports coming from the session tuple or from stale frame bytes?
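
The tuple questions above can be made concrete with a small model. This is a minimal sketch, not the helper's actual code: `Tuple`, `NatReverseIndexSketch`, and their methods are hypothetical names, and only the idea of a reverse index keyed by the reply tuple is taken from the `nat_reverse_index` mentioned above. It shows why a reply looked up with stale pre-NAT ports misses the index, while the post-NAT reply tuple hits and yields the ports to rebuild.

```rust
use std::collections::HashMap;
use std::net::Ipv4Addr;

/// Hypothetical 5-tuple; the real helper's layout may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Tuple {
    pub src: Ipv4Addr,
    pub dst: Ipv4Addr,
    pub sport: u16,
    pub dport: u16,
    pub proto: u8,
}

impl Tuple {
    /// The tuple a reply carries on the wire: endpoints swapped.
    pub fn reversed(&self) -> Tuple {
        Tuple {
            src: self.dst,
            dst: self.src,
            sport: self.dport,
            dport: self.sport,
            proto: self.proto,
        }
    }
}

/// Illustrative stand-in for a reverse NAT index (assumption, not the real type).
/// Key: reply tuple as seen after NAT. Value: original pre-NAT forward tuple.
#[derive(Default)]
pub struct NatReverseIndexSketch {
    by_reply: HashMap<Tuple, Tuple>,
}

impl NatReverseIndexSketch {
    /// Record a source-NAT session: `orig` is the pre-NAT forward tuple,
    /// `translated` is the same flow after NAT.
    pub fn insert(&mut self, orig: Tuple, translated: Tuple) {
        self.by_reply.insert(translated.reversed(), orig);
    }

    /// Look up a reply packet. On a hit, return the tuple the reply must be
    /// rewritten to, taken from the stored session rather than frame bytes.
    pub fn lookup_reply(&self, reply: &Tuple) -> Option<Tuple> {
        self.by_reply.get(reply).map(|orig| orig.reversed())
    }
}
```

If the worker derives the lookup key from a frame that was already rewritten, or from leftover bytes of a recycled frame, it behaves like the stale-port case in this model and misses the index.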

Throughput starts high then falls to zero

Look at:

Questions:

  • Is TX backpressure starving RX fill-ring replenishment?
  • Are pending_tx_local or pending_tx_prepared growing without draining?
  • Are completions being reaped fast enough to recycle frames?
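
One way to answer these three questions from counters is to compare two samples taken a short interval apart. The sketch below is an assumption-laden illustration: `pending_tx_local` and `pending_tx_prepared` are the names used above, but the struct, the `fill_level` and `completions_reaped` fields, and the classification rules are hypothetical and would need to be mapped onto whatever the helper actually exports.

```rust
/// Hypothetical per-binding counters sampled at one instant.
#[derive(Clone, Copy, Debug)]
pub struct TxRxSample {
    /// Frames currently posted to the fill ring (assumed field).
    pub fill_level: u32,
    pub pending_tx_local: u32,
    pub pending_tx_prepared: u32,
    /// Cumulative TX completions reaped (assumed field).
    pub completions_reaped: u64,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TxVerdict {
    Healthy,
    /// Pending TX grows, but completions are still being reaped.
    BacklogGrowing,
    /// Pending TX grows and no completions were reaped in the interval.
    CompletionStall,
    /// Fill ring is empty while frames sit in the pending TX queues.
    FillStarved,
}

/// Classify the interval between two samples. Checks run from most to
/// least severe so the worst condition wins.
pub fn classify(prev: &TxRxSample, cur: &TxRxSample) -> TxVerdict {
    let pending_prev = prev.pending_tx_local as u64 + prev.pending_tx_prepared as u64;
    let pending_cur = cur.pending_tx_local as u64 + cur.pending_tx_prepared as u64;
    let reaped = cur.completions_reaped.saturating_sub(prev.completions_reaped);

    if cur.fill_level == 0 && pending_cur > 0 {
        TxVerdict::FillStarved
    } else if pending_cur > pending_prev && reaped == 0 {
        TxVerdict::CompletionStall
    } else if pending_cur > pending_prev {
        TxVerdict::BacklogGrowing
    } else {
        TxVerdict::Healthy
    }
}
```

A `FillStarved` result matches the "starts high then falls to zero" pattern: frames that should return to the fill ring are parked on the TX side, so RX has no buffers to receive into.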

Idle softirq burn or AF_XDP stall

Look at:

Questions:

  • Is the fill ring draining to zero?
  • Are AF_XDP RX buffer allocation errors climbing?
  • Are we spinning in backpressure without refilling?
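
The first two questions can be combined into a single check over a short window of samples. This is a rough sketch under stated assumptions: `FillSample`, its fields, and `looks_stalled` are hypothetical names, and the rule (fill ring empty for the whole window while RX allocation errors rise) is one plausible heuristic, not the project's own detector.

```rust
/// Hypothetical sample of fill-ring state for one AF_XDP binding.
#[derive(Clone, Copy, Debug)]
pub struct FillSample {
    /// Frames currently available in the fill ring (assumed field).
    pub fill_level: u32,
    /// Cumulative RX buffer allocation errors (assumed field).
    pub rx_alloc_errors: u64,
}

/// Return true when the most recent `min_run` samples all show an empty
/// fill ring and the allocation-error counter rose across that window.
/// Windows shorter than two samples cannot show a rising counter.
pub fn looks_stalled(samples: &[FillSample], min_run: usize) -> bool {
    if min_run < 2 || samples.len() < min_run {
        return false;
    }
    let window = &samples[samples.len() - min_run..];
    let ring_empty = window.iter().all(|s| s.fill_level == 0);
    let errors_climbing =
        window[window.len() - 1].rx_alloc_errors > window[0].rx_alloc_errors;
    ring_empty && errors_climbing
}
```

A true result with no forwarded traffic suggests the worker is looping on backpressure without ever returning frames to the fill ring.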

HA/session-sync looks wrong

Look at:

Questions:

  • Is forwarding armed on the actual primary?
  • Are session deltas being drained from Rust and mirrored into Go/cluster sync?
  • Is owner RG being preserved or falling back to zone-based sync?

3. Short Packet-Path Checklist

  1. Did Go arm userspace forwarding?
  2. Did the XDP shim redirect this packet to AF_XDP?
  3. Did the Rust worker parse the expected tuple?
  4. Was there a session hit, shared hit, or NAT-reverse hit?
  5. Did NAT and FIB resolution produce a valid egress?
  6. Did TX enqueue and drain without starving fill-ring recycle?
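
The checklist is ordered, so the useful output of a debugging pass is the first step that fails. A minimal sketch of that idea follows; the `Stage` names simply mirror the six questions above, and how each boolean is obtained (counters, logs, captures) is left open because this document does not specify it.

```rust
/// The six checklist steps, in packet-path order.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Stage {
    ForwardingArmed,
    ShimRedirect,
    TupleParsed,
    SessionOrNatHit,
    EgressResolved,
    TxDrained,
}

pub const STAGE_ORDER: [Stage; 6] = [
    Stage::ForwardingArmed,
    Stage::ShimRedirect,
    Stage::TupleParsed,
    Stage::SessionOrNatHit,
    Stage::EgressResolved,
    Stage::TxDrained,
];

/// Given one pass/fail result per stage (same order as `STAGE_ORDER`),
/// return the earliest failing stage, or `None` if every check passed.
pub fn first_failure(results: &[bool; 6]) -> Option<Stage> {
    STAGE_ORDER
        .iter()
        .zip(results.iter())
        .find(|(_, ok)| !**ok)
        .map(|(stage, _)| *stage)
}
```

Later failures are usually consequences of the earliest one, so start the investigation at the stage this returns.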

4. Validation And Capture Workflow

Use:

That workflow gives you:

  • runtime mode detection
  • sustained-throughput detection
  • perf capture on the active userspace firewall
  • synchronized firewall-side and server-side tcpdump when iperf3 collapses