test-plans perf: TCP vs Go/JS, Python QUIC harness work & core QUIC improvements #1302
TCP + yamux: related work in py-libp2p

Separate from the QUIC-focused notes above, TCP throughput in perf/interop is also tied to the yamux multiplexer and window sizing:
So: #1258 is the broader perf/interop stability + window-size direction; #1269 is the deeper yamux alignment with go-yamux for window semantics and streaming behavior. Both are relevant when interpreting test-plans / loopback TCP numbers for Python vs Go.
Summary
This post shares local perf numbers from the libp2p test-plans `perf` harness comparing go-libp2p, js-libp2p, and py-libp2p on loopback (TCP throughput + simple latency), documents what was wired up for Python QUIC in that harness, and lists concrete directions for py-libp2p's core QUIC stack so upload and download can move closer to TCP and to other implementations.

TCP results (one representative local run)
Test style matches the official runner idea: 5 s wall clock per direction, MAX_SAFE_INTEGER bytes on one axis, mean of the per-second intermediate JSON lines (bit/s), plus a 1 byte / 1 byte latency sample.
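For reference, the averaging step can be sketched like this (a minimal sketch; the JSON field names `timeSeconds` / `uploadBytes` and the cumulative-bytes assumption are illustrative, not the exact harness schema):

```python
import json

def mean_throughput_bits(lines: list[str]) -> float:
    """Mean throughput (bit/s) over per-second report lines.

    Assumes each line is a JSON object with cumulative byte counts;
    field names here are placeholders, not the real harness schema.
    """
    rates = []
    prev_t, prev_b = 0.0, 0
    for line in lines:
        rec = json.loads(line)
        dt = rec["timeSeconds"] - prev_t
        db = rec["uploadBytes"] - prev_b
        if dt > 0:
            rates.append(db * 8 / dt)  # bytes/s -> bit/s
        prev_t, prev_b = rec["timeSeconds"], rec["uploadBytes"]
    return sum(rates) / len(rates) if rates else 0.0
```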
Metrics reported per run:
- Throughput (Gbit/s, approximate)
- As % of the fastest implementation in that row
- 1B/1B latency (ms), single sample, lower is better
Takeaway: on loopback TCP, the Python perf binary is well behind Go on sustained throughput (expected for a Python stack), while one-shot latency can still look good depending on scheduling and warmup. These numbers are not CI or multi-host; they’re useful for relative comparison and for QUIC work below.
QUIC in test-plans: what was done (harness / glue)
Work lives under `perf/impl/py-libp2p/v0.6/` in test-plans (CLI aligned with Go/JS: `--run-server`, `--server-address`, `--transport`, `--upload-bytes`, `--download-bytes`, JSON lines to stdout).

Notable items that were necessary for QUIC perf to behave at all, or to remove foot-guns:
Half-close vs full close on QUIC

The perf protocol needs `close_write` on the client after sending the upload leg. Calling a full `close()` on QUIC was closing the stream in a way that broke the download leg (often 0 bytes received). The harness uses the muxed stream's `close_write()` where appropriate.
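To illustrate the failure mode, here is a toy in-memory model (not py-libp2p's actual stream API) of why a full `close()` after the upload leg yields 0 download bytes while `close_write()` does not:

```python
class ModelStream:
    """Toy bidirectional stream; names are illustrative only."""

    def __init__(self):
        self.to_server = bytearray()
        self.to_client = bytearray()
        self.write_closed = False   # client -> server EOF (half-close)
        self.fully_closed = False   # both directions torn down

    def write(self, data: bytes):
        assert not (self.write_closed or self.fully_closed)
        self.to_server += data

    def close_write(self):
        self.write_closed = True    # server can still send the download leg

    def close(self):
        self.fully_closed = True    # models the bug: read side dies too

    def server_respond(self, n: int):
        # Server waits for client EOF, then sends n download bytes.
        if self.write_closed and not self.fully_closed:
            self.to_client += b"\x00" * n

    def read_all(self) -> bytes:
        return b"" if self.fully_closed else bytes(self.to_client)


def run_perf_leg(stream: ModelStream, upload: int, download: int,
                 half_close: bool) -> int:
    """Run one upload-then-download exchange; return download bytes seen."""
    stream.write(b"\x00" * upload)
    (stream.close_write if half_close else stream.close)()
    stream.server_respond(download)
    return len(stream.read_all())
```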
Avoid extra streams right after connect

`BasicHost` can run identify on a new connection and open another stream, which interferes with perf's single-stream assumption. For QUIC, the client path seeds the peerstore so identify does not need to open an extra stream for the ping protocol.
QUIC transport configuration

`QUICTransportConfig` (larger windows, sensible timeouts) and `negotiate_timeout` are passed into `new_host` for both server and client so upgrades are less likely to fail under load.
aioquic flow-control defaults

aioquic's `QuicConfiguration` defaults `max_data` / `max_stream_data` to ~1 MiB unless set. py-libp2p's transport config exposes `CONNECTION_FLOW_CONTROL_WINDOW` / `STREAM_FLOW_CONTROL_WINDOW`, but those were not previously applied to aioquic's limits. The harness installs a narrow patch that wraps `create_server_config_from_base` / `create_client_config_from_base` in both `libp2p.transport.quic.utils` and `libp2p.transport.quic.transport` (the latter re-imports names, so patching only `utils` is insufficient) and assigns `max_data` / `max_stream_data` from the transport config.

After the above, QUIC upload in long `timeout …` runs was broadly in the same ballpark as TCP on loopback in local tests. QUIC download (server → client) remained far below TCP (often tens of MB/s effective vs a multi-GB/s class for TCP in the same setup). Bumping flow-control limits further did not fix download, which points to remaining issues beyond the 1 MiB defaults.
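The wrapping pattern used by the patch looks roughly like the following. This is a hedged sketch with stand-in objects: the stand-in factory, the `SimpleNamespace` config, and the window values are all assumptions, not py-libp2p's actual code; the real patch wraps `create_server_config_from_base` / `create_client_config_from_base` in both modules.

```python
import functools
from types import SimpleNamespace

def create_client_config_from_base(base):
    # Stand-in factory returning an aioquic-like config with ~1 MiB defaults.
    return SimpleNamespace(max_data=1024 * 1024, max_stream_data=1024 * 1024)

def patch_flow_control(factory, transport_config):
    """Wrap a config factory so every returned config gets the larger windows."""
    @functools.wraps(factory)
    def wrapper(*args, **kwargs):
        cfg = factory(*args, **kwargs)
        cfg.max_data = transport_config.CONNECTION_FLOW_CONTROL_WINDOW
        cfg.max_stream_data = transport_config.STREAM_FLOW_CONTROL_WINDOW
        return cfg
    return wrapper

# Illustrative window sizes; the wrapper must be assigned back into BOTH
# modules that hold the factory name (quic.utils and quic.transport),
# since transport re-imports the name rather than calling through utils.
tc = SimpleNamespace(
    CONNECTION_FLOW_CONTROL_WINDOW=16 * 1024 * 1024,
    STREAM_FLOW_CONTROL_WINDOW=8 * 1024 * 1024,
)
create_client_config_from_base = patch_flow_control(create_client_config_from_base, tc)
```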
What to do in py-libp2p core QUIC (upload & download)

These are library-level follow-ups so applications do not rely on monkey-patching and so perf matches other stacks more closely.
1. Apply flow control to aioquic in one place (no duplicate imports)
- In `QUICTransport` / QUIC config construction, map `CONNECTION_FLOW_CONTROL_WINDOW` and `STREAM_FLOW_CONTROL_WINDOW` (and any related knobs) directly onto `QuicConfiguration.max_data` and `max_stream_data` (and stream limits if needed).
- Keep a single definition shared by `quic.utils` and `quic.transport` so behavior cannot diverge.
2. QUIC download / receive path (likely the main gap)

- Profile the `measure_performance` / stream read loop under large server→client transfers: CPU, trio scheduling, aioquic `receive_stream` / datagram handling, and yamux interaction if applicable.
- Revisit read/write block sizes (see the `write_block_size` discussions in py-libp2p PRs).
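One cheap experiment on the receive path is read-size tuning. A toy sketch (not py-libp2p code) showing how the block size changes the number of read calls, and hence per-call overhead, for the same payload:

```python
import io

def drain(stream: io.BufferedIOBase, block_size: int) -> tuple[int, int]:
    """Read a stream to EOF; return (total_bytes, number_of_read_calls)."""
    total, calls = 0, 0
    while True:
        chunk = stream.read(block_size)
        calls += 1  # count the final empty (EOF) read too
        if not chunk:
            break
        total += len(chunk)
    return total, calls
```

For a 1 MiB payload, a 4 KiB block needs 16x more read calls than a 64 KiB block; in an interpreted read loop that per-call overhead adds up.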
3. Scheduling and fairness

4. Tests & regression harness
- Add a benchmark suite (e.g. driven by `perf`) with TCP vs QUIC upload/download on loopback and optionally cross-impl, so regressions are caught in CI where feasible.
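A CI gate for such benchmarks can be as simple as a ratio check against a stored baseline (a sketch; the tolerance value and baseline handling are assumptions, since loopback numbers are noisy):

```python
def check_regression(measured_bps: float, baseline_bps: float,
                     tolerance: float = 0.2) -> bool:
    """Return True if measured throughput is within `tolerance` (as a
    fraction) below the stored baseline.

    A generous tolerance avoids flaky CI on noisy loopback runs;
    persistent drops below the floor should fail the job.
    """
    floor = baseline_bps * (1.0 - tolerance)
    return measured_bps >= floor
```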
5. Documentation

References
- py-libp2p PRs (`write_block_size` and related interop fixes)

Posted from community benchmarking work against the test-plans `perf` harness; numbers vary by machine, treat as directional.