fix(tcp): bound payload by IPv4 Total Length — drops NIC Ethernet padding (data-phase RST)#110
…adding) parse_tcp_packet took payload as frame[payload_start..] (to end of frame) and never read the IPv4 Total Length. NICs hardware-pad sub-60-byte frames to the 60-byte Ethernet minimum, so a peer's bare 54-byte ACK arrives as 60 bytes; those 6 padding bytes were parsed as TCP payload, advancing rcv_nxt past what the peer actually sent. The DUT then ACKed unsent data and the peer RST the flow — observed as EVERY connection resetting in the data phase during perf run 28557222376 (150k flows opened, 0 sustained). Bound payload by the IPv4 Total Length (frame[ip+2..ip+4]). Regression test parse_ignores_ethernet_padding feeds a 54-byte bare ACK padded to 60 bytes and asserts an empty payload; fails before the fix (6-byte payload), passes after. Offline, no EC2. Found via a diagnostic workflow over the perf-run logs + code. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Synthetic Performance Results - Graviton (run)
Commit:
✅ synthetic UDP socket bound to 10.0.0.1:9000 (MAC: 02:00:00:00:00:01)
Synthetic UDP Performance Results
Measures framework overhead: sync
[tables: IPv4 Baseline; IPv6; IPv6 vs IPv4 Comparison (sync path)]
IPv4 avg sync/async ratio: 0.9x, worst: 1.0x | IPv6 vs IPv4 worst ratio: 1.33x (OK)
Synthetic Performance Results (run)
Commit:
✅ synthetic UDP socket bound to 10.0.0.1:9000 (MAC: 02:00:00:00:00:01)
Synthetic UDP Performance Results
Measures framework overhead: sync
[tables: IPv4 Baseline; IPv6; IPv6 vs IPv4 Comparison (sync path)]
IPv4 avg sync/async ratio: 1.0x, worst: 1.1x | IPv6 vs IPv4 worst ratio: 1.20x (OK)
[CI] Stage: Deploy
Infrastructure ready.
[CI] Stage: Summary
All tests PASSED. ARP seeding: kernel /proc/net/arp (automatic)
✅ Integration Tests Passed - Graviton (run)
Branch:
Test Results
Application Logs (last 20 lines): receiver-echo-server.log, sender-echo-server.log, sender-test-client.log
✅ Integration Tests Passed (Run 28562108531)
Branch:
Test Results
Application Logs (last 20 lines): receiver-echo-server.log, sender-echo-server.log, sender-test-client.log, receiver-test-client-iperf.log, sender-test-client-iperf.log
Full Application Logs (last 200 lines each): receiver-echo-server.log, sender-echo-server.log, sender-test-client.log, receiver-test-client-iperf.log, sender-test-client-iperf.log
Root cause of DPDK-TCP connections resetting in the data phase (perf run 28557222376: TRex opened 150,002 flows, 0 sustained; DUT logs show `connection reset` on every flow).

`parse_tcp_packet` computed the TCP payload as `frame[payload_start..]`, that is, all bytes to the end of the frame, and never bounded it by the IPv4 Total Length. AWS ENA (like any NIC) hardware-pads sub-60-byte Ethernet frames to the 60-byte minimum, so a peer's bare 54-byte ACK arrives as 60 bytes. Those 6 padding bytes were parsed as a 6-byte TCP "payload", so the engine advanced `rcv_nxt` by 6 and ACKed data the peer never sent → per RFC 793 the peer RSTs → every flow reset right after the first echo.

Fix

Bound payload by the IPv4 Total Length. Regression test `parse_ignores_ethernet_padding` builds a bare ACK, pads it to 60 B, and asserts an empty payload: fails before, passes after. All 221 tcp tests pass; offline, no EC2 needed.

This is a genuine stack correctness bug (affects any NIC-padded short frame), not perf-only. Found via a multi-agent diagnostic workflow over the run logs + engine code.
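For readers who want to see the bound concretely, here is a minimal sketch of the idea. It is not the PR's actual `parse_tcp_packet`: the names `tcp_payload`, `padded_bare_ack`, and `ETH_HDR_LEN` are hypothetical, and the sketch assumes an untagged Ethernet II frame carrying IPv4/TCP (it does not check the EtherType or the IP protocol field). The only part taken from the PR is the rule itself: the payload ends at `ip + Total Length`, where Total Length is read from `frame[ip+2..ip+4]`, not at the end of the frame.

```rust
// Hypothetical sketch; assumes untagged Ethernet II + IPv4 + TCP.
const ETH_HDR_LEN: usize = 14;

/// Returns the TCP payload bounded by the IPv4 Total Length, or None if the
/// headers are malformed or the frame is shorter than Total Length claims.
fn tcp_payload(frame: &[u8]) -> Option<&[u8]> {
    let ip = ETH_HDR_LEN;
    // Need at least a minimal (20-byte) IPv4 header.
    if frame.len() < ip + 20 {
        return None;
    }
    // IHL is the low nibble of the first IPv4 byte, in 32-bit words.
    let ihl = ((frame[ip] & 0x0f) as usize) * 4;
    if ihl < 20 {
        return None;
    }
    // IPv4 Total Length: big-endian u16 at offset 2 of the IPv4 header.
    let total_len = u16::from_be_bytes([frame[ip + 2], frame[ip + 3]]) as usize;
    let ip_end = ip + total_len;
    // A truncated frame is rejected; trailing bytes beyond ip_end are padding.
    if ip_end > frame.len() {
        return None;
    }
    let tcp = ip + ihl;
    if tcp + 20 > ip_end {
        return None;
    }
    // TCP data offset is the high nibble of byte 12, in 32-bit words.
    let data_off = ((frame[tcp + 12] >> 4) as usize) * 4;
    if data_off < 20 {
        return None;
    }
    let payload_start = tcp + data_off;
    if payload_start > ip_end {
        return None;
    }
    // The key line: stop at ip_end, not at frame.len().
    Some(&frame[payload_start..ip_end])
}

/// Builds a 54-byte bare ACK (14 Ethernet + 20 IPv4 + 20 TCP) and pads it to
/// the 60-byte Ethernet minimum, as a NIC would.
fn padded_bare_ack() -> Vec<u8> {
    let mut f = vec![0u8; 54];
    f[12] = 0x08; // EtherType 0x0800 (IPv4)
    f[13] = 0x00;
    f[14] = 0x45; // IPv4, IHL = 5 words
    f[16..18].copy_from_slice(&40u16.to_be_bytes()); // Total Length = 20 + 20
    f[23] = 6; // protocol = TCP
    f[46] = 0x50; // TCP data offset = 5 words
    f[47] = 0x10; // ACK flag
    f.resize(60, 0); // 6 bytes of Ethernet padding
    f
}
```

With this sketch, `tcp_payload(&padded_bare_ack())` yields an empty slice, whereas slicing to the end of the frame (`&frame[54..]`) would yield the 6 padding bytes described above.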
Note: real perf numbers also need the separate TRex stats-key fix (coming) — this PR stops the resets; that one reads the counters from the right dict. Follow-up: check whether the UDP/QUIC codecs need the same Total-Length bound.
🤖 Generated with Claude Code