test-plans perf: TCP vs Go/JS, Python QUIC harness work & core QUIC improvements #1302
TCP + yamux: related work in py-libp2p

Separate from the QUIC-focused notes above, TCP throughput in perf/interop is also tied to the yamux multiplexer and window sizing:
So: #1258 is the broader perf/interop stability + window-size direction; #1269 is the deeper yamux alignment with go-yamux for window semantics and streaming behavior. Both are relevant when interpreting test-plans / loopback TCP numbers for Python vs Go.
Summary
This post shares local perf numbers from the libp2p test-plans `perf` harness comparing go-libp2p, js-libp2p, and py-libp2p on loopback (TCP throughput + simple latency), documents what was wired up for Python QUIC in that harness, and lists concrete directions for py-libp2p's core QUIC stack so upload and download can move closer to TCP and to other implementations.

TCP results (one representative local run)
Test style matches the official runner idea: 5 s wall clock per direction, MAX_SAFE_INTEGER bytes on one axis, mean of the per-second intermediate JSON lines (bit/s), plus a 1 byte / 1 byte latency sample.
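For reference, the averaging step can be sketched like this (a minimal sketch; the JSON field names `timeSeconds` / `uploadBytes` and the cumulative-bytes assumption are illustrative, not the exact harness schema):

```python
import json

def mean_throughput_bits(lines: list[str]) -> float:
    """Mean throughput (bit/s) over per-second report lines.

    Assumes each line is a JSON object with cumulative byte counts;
    field names here are placeholders, not the real harness schema.
    """
    rates = []
    prev_t, prev_b = 0.0, 0
    for line in lines:
        rec = json.loads(line)
        dt = rec["timeSeconds"] - prev_t
        db = rec["uploadBytes"] - prev_b
        if dt > 0:
            rates.append(db * 8 / dt)  # bytes/s -> bit/s
        prev_t, prev_b = rec["timeSeconds"], rec["uploadBytes"]
    return sum(rates) / len(rates) if rates else 0.0
```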
Metrics reported per run:
- Throughput (Gbit/s, approximate)
- As % of the fastest implementation in that row
- 1B/1B latency (ms), single sample, lower is better
Takeaway: on loopback TCP, the Python perf binary is well behind Go on sustained throughput (expected for a Python stack), while one-shot latency can still look good depending on scheduling and warmup. These numbers are not CI or multi-host; they’re useful for relative comparison and for QUIC work below.
QUIC in test-plans: what was done (harness / glue)
Work lives under `perf/impl/py-libp2p/v0.6/` in test-plans (CLI aligned with Go/JS: `--run-server`, `--server-address`, `--transport`, `--upload-bytes`, `--download-bytes`, JSON lines to stdout).

Notable items that were necessary for QUIC perf to behave at all, or to remove foot-guns:
Half-close vs full close on QUIC

The perf protocol needs `close_write` on the client after sending the upload leg. Calling a full `close()` on QUIC was closing the stream in a way that broke the download leg (often 0 bytes received). The harness uses the muxed stream's `close_write()` where appropriate.
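To illustrate the failure mode, here is a toy in-memory model (not py-libp2p's actual stream API) of why a full `close()` after the upload leg yields 0 download bytes while `close_write()` does not:

```python
class ModelStream:
    """Toy bidirectional stream; names are illustrative only."""

    def __init__(self):
        self.to_server = bytearray()
        self.to_client = bytearray()
        self.write_closed = False   # client -> server EOF (half-close)
        self.fully_closed = False   # both directions torn down

    def write(self, data: bytes):
        assert not (self.write_closed or self.fully_closed)
        self.to_server += data

    def close_write(self):
        self.write_closed = True    # server can still send the download leg

    def close(self):
        self.fully_closed = True    # models the bug: read side dies too

    def server_respond(self, n: int):
        # Server waits for client EOF, then sends n download bytes.
        if self.write_closed and not self.fully_closed:
            self.to_client += b"\x00" * n

    def read_all(self) -> bytes:
        return b"" if self.fully_closed else bytes(self.to_client)


def run_perf_leg(stream: ModelStream, upload: int, download: int,
                 half_close: bool) -> int:
    """Run one upload-then-download exchange; return download bytes seen."""
    stream.write(b"\x00" * upload)
    (stream.close_write if half_close else stream.close)()
    stream.server_respond(download)
    return len(stream.read_all())
```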
Avoid extra streams right after connect

`BasicHost` can run identify on a new connection and open another stream, which interferes with perf's single-stream assumption. For QUIC, the client path seeds the peerstore so identify does not need to open an extra stream for the ping protocol.
QUIC transport configuration

`QUICTransportConfig` (larger windows, sensible timeouts) and `negotiate_timeout` are passed into `new_host` for both server and client so upgrades are less likely to fail under load.
aioquic flow-control defaults

aioquic's `QuicConfiguration` defaults `max_data` / `max_stream_data` to ~1 MiB unless set. py-libp2p's transport config exposes `CONNECTION_FLOW_CONTROL_WINDOW` / `STREAM_FLOW_CONTROL_WINDOW`, but those were not previously applied to aioquic's limits. The harness installs a narrow patch that wraps `create_server_config_from_base` / `create_client_config_from_base` in both `libp2p.transport.quic.utils` and `libp2p.transport.quic.transport` (the latter re-imports names, so patching only `utils` is insufficient) and assigns `max_data` / `max_stream_data` from the transport config.

After the above, QUIC upload in long `timeout …` runs was broadly in the same ballpark as TCP on loopback in local tests. QUIC download (server → client) remained far below TCP (often tens of MB/s effective vs a multi-GB/s class for TCP in the same setup). Bumping flow-control limits further did not fix download, which points to remaining issues beyond the 1 MiB defaults.
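The wrapping pattern used by the patch looks roughly like the following. This is a hedged sketch with stand-in objects: the stand-in factory, the `SimpleNamespace` config, and the window values are all assumptions, not py-libp2p's actual code; the real patch wraps `create_server_config_from_base` / `create_client_config_from_base` in both modules.

```python
import functools
from types import SimpleNamespace

def create_client_config_from_base(base):
    # Stand-in factory returning an aioquic-like config with ~1 MiB defaults.
    return SimpleNamespace(max_data=1024 * 1024, max_stream_data=1024 * 1024)

def patch_flow_control(factory, transport_config):
    """Wrap a config factory so every returned config gets the larger windows."""
    @functools.wraps(factory)
    def wrapper(*args, **kwargs):
        cfg = factory(*args, **kwargs)
        cfg.max_data = transport_config.CONNECTION_FLOW_CONTROL_WINDOW
        cfg.max_stream_data = transport_config.STREAM_FLOW_CONTROL_WINDOW
        return cfg
    return wrapper

# Illustrative window sizes; the wrapper must be assigned back into BOTH
# modules that hold the factory name (quic.utils and quic.transport),
# since transport re-imports the name rather than calling through utils.
tc = SimpleNamespace(
    CONNECTION_FLOW_CONTROL_WINDOW=16 * 1024 * 1024,
    STREAM_FLOW_CONTROL_WINDOW=8 * 1024 * 1024,
)
create_client_config_from_base = patch_flow_control(create_client_config_from_base, tc)
```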
What to do in py-libp2p core QUIC (upload & download)

These are library-level follow-ups so applications do not rely on monkey-patching and so perf matches other stacks more closely.
1. Apply flow control to aioquic in one place (no duplicate imports)
- In `QUICTransport` / QUIC config construction, map `CONNECTION_FLOW_CONTROL_WINDOW` and `STREAM_FLOW_CONTROL_WINDOW` (and any related knobs) directly onto `QuicConfiguration.max_data` and `max_stream_data` (and stream limits if needed).
- Keep a single definition shared by `quic.utils` and `quic.transport` so behavior cannot diverge.
2. QUIC download / receive path (likely the main gap)

- Profile the `measure_performance` / stream read loop under large server→client transfers: CPU, trio scheduling, aioquic `receive_stream` / datagram handling, and yamux interaction if applicable.
- Revisit read/write block sizes (see the `write_block_size` discussions in py-libp2p PRs).
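One cheap experiment on the receive path is read-size tuning. A toy sketch (not py-libp2p code) showing how the block size changes the number of read calls, and hence per-call overhead, for the same payload:

```python
import io

def drain(stream: io.BufferedIOBase, block_size: int) -> tuple[int, int]:
    """Read a stream to EOF; return (total_bytes, number_of_read_calls)."""
    total, calls = 0, 0
    while True:
        chunk = stream.read(block_size)
        calls += 1  # count the final empty (EOF) read too
        if not chunk:
            break
        total += len(chunk)
    return total, calls
```

For a 1 MiB payload, a 4 KiB block needs 16x more read calls than a 64 KiB block; in an interpreted read loop that per-call overhead adds up.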
3. Scheduling and fairness

4. Tests & regression harness
- Add a benchmark suite (e.g. driven by `perf`) with TCP vs QUIC upload/download on loopback and optionally cross-impl, so regressions are caught in CI where feasible.
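A CI gate for such benchmarks can be as simple as a ratio check against a stored baseline (a sketch; the tolerance value and baseline handling are assumptions, since loopback numbers are noisy):

```python
def check_regression(measured_bps: float, baseline_bps: float,
                     tolerance: float = 0.2) -> bool:
    """Return True if measured throughput is within `tolerance` (as a
    fraction) below the stored baseline.

    A generous tolerance avoids flaky CI on noisy loopback runs;
    persistent drops below the floor should fail the job.
    """
    floor = baseline_bps * (1.0 - tolerance)
    return measured_bps >= floor
```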
5. Documentation

References
- py-libp2p PRs (`write_block_size` and related interop fixes)

Posted from community benchmarking work against the test-plans `perf` harness; numbers vary by machine, treat as directional.