Spec for an OpenTela-native pairwise network profiler that brings up an isolated mesh on remote machines, measures libp2p_ping / HTTP-over-libp2p latency / throughput across every pair, and writes JSONL. Motivated by the relay-circuit NO_RESERVATION debugging effort: raw IP ping/iperf3 numbers don't characterise what production traffic actually experiences (TLS, multiplexing, relay hops). Adds /v1/probe/echo, /v1/probe/run, libp2p ping registration, and otela probe / peer-id CLI commands as always-on diagnostic surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Allows multiple otela processes to coexist on one host with isolated config and key storage. Foundation for the mesh-bench runner.
When --config-dir was set and no cfg.yaml existed yet, initConfig seeded the file at a relative path instead of inside the config dir, breaking otela init --config-dir on fresh dirs (the mesh-bench's happy path).
Lets the mesh-bench runner read the PeerID before any otela start call. Idempotent: existing keys are left alone.
GenerateAndWriteKey now returns the generated key directly so newHost doesn't have to immediately reload from disk. The init_test now restores viper state via t.Cleanup so config_dir doesn't leak into other tests in the same package binary.
Prints the local libp2p PeerID by reading the key from disk. Used by the mesh-bench runner to discover PeerIDs before bringing up the mesh.
Always-on diagnostic surface: every otela node now responds to libp2p ping. Used by the mesh-bench libp2p_ping probe kind.
Streams N zero bytes back, capped at 1 GiB. Throughput counterpart for the mesh-bench probe.
Calling c.Writer.Flush() per chunk ensures the libp2p-http transport sees bytes incrementally; without it, throughput measurements would show inflated tail latency. Also adds a debug log on write errors and asserts the echo body is actually zero bytes for small n.
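To make the echo endpoint's contract concrete, here is a minimal client-side sketch in Python. `clamp_echo_bytes` mirrors the 1 GiB hard cap described above, and `drain` reads a byte stream to exhaustion the way a throughput probe would; both names are illustrative, not functions from this repository.

```python
import io
import time

MAX_ECHO_BYTES = 1 << 30  # mirrors the server's documented 1 GiB hard cap


def clamp_echo_bytes(n: int) -> int:
    """Clamp a requested echo size into the range the server will honour."""
    return max(0, min(n, MAX_ECHO_BYTES))


def drain(stream, chunk_size=64 * 1024):
    """Read a byte stream to exhaustion; return (total_bytes, elapsed_seconds).

    Because the server flushes per chunk, bytes arrive incrementally, so the
    elapsed time reflects sustained transfer rather than one final burst.
    """
    total = 0
    start = time.monotonic()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    return total, time.monotonic() - start
```

A real probe would pass an HTTP response body where `io.BytesIO` stands in here.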
Three kinds: latency (HTTP-over-libp2p RTT to /v1/health), throughput (byte-stream from /v1/probe/echo), libp2p_ping (raw /ipfs/ping/1.0.0 RTT). Reuses the global libp2p HTTP transport from the proxy handler.
ping.Ping(ctx, h, pid) avoids re-registering the global stream handler on every probe request (NewPingService overwrites the host's existing handler as a side effect). Throughput's aggregate mbps is now total_bytes * 8 / total_elapsed across all iterations, rather than the mean of per-iteration mbps. The per-iteration array still records each iteration's instant mbps.
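The bandwidth-weighted aggregate described above can be sketched as a pure function (an illustration of the formula, not code from the repository):

```python
def aggregate_mbps(iterations):
    """Bandwidth-weighted aggregate: total bits over total elapsed seconds.

    `iterations` is a list of (bytes_read, elapsed_seconds) tuples. Unlike a
    mean of per-iteration mbps, a slow iteration is weighted by how long it
    actually ran, so one stalled transfer cannot be averaged away.
    """
    total_bytes = sum(b for b, _ in iterations)
    total_elapsed = sum(e for _, e in iterations)
    return total_bytes * 8 / total_elapsed / 1e6


# Two 1 MB iterations at very different speeds: per-iteration mbps are 80
# and 8, so the naive mean is 44, while the weighted aggregate is
# 16 Mbit / 1.1 s ≈ 14.55 mbps.
```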
Thin client over /v1/probe/run on the local node. Used by the mesh-bench runner to dispatch per-pair probes via rcc.
Pure functions to build per-host opentela cfg.yaml for an isolated bench mesh: ports, bootstrap multiaddrs, feature flags off.
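A minimal sketch of that pure-function shape, assuming illustrative field names (`http_port` and `libp2p_port` appear in the diff; `bootstrap` and the `features` flags are hypothetical stand-ins for the real keys):

```python
def bench_cfg(http_port, libp2p_port, bootstrap_multiaddrs):
    """Build a per-host bench config mapping: ports and bootstrap peers in,
    a serialisable dict out, with discovery-style features switched off so
    the bench mesh stays isolated from any persistent deployment."""
    return {
        "http_port": http_port,
        "libp2p_port": libp2p_port,
        "bootstrap": list(bootstrap_multiaddrs),
        "features": {"mdns": False, "autorelay": False},
    }
```

Keeping this pure (no I/O) is what makes the config generator unit-testable without any remote machines.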
Per-host otela init and peer-id discovery for the mesh-bench runner. Pure orchestration over the existing RemoteRunner.
Renders per-host cfg.yaml with bootstrap multiaddrs and pushes via base64-piped stdin. RemoteRunner gains stdin support.
Background-launch otela on each host, poll /v1/health for readiness, then poll /v1/dnt/table for full mesh convergence.
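Both polls share one shape: retry a predicate until it succeeds or a deadline passes. A hedged sketch (the function name and signature are illustrative, not the repository's API):

```python
import time


def wait_until(check, timeout=60.0, interval=0.5):
    """Poll `check()` until it returns True or `timeout` seconds elapse.

    The bench would use this shape twice: first against /v1/health for
    process readiness, then against /v1/dnt/table for mesh convergence.
    Connection errors are treated as "not ready yet", not failures.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if check():
                return True
        except OSError:
            pass  # node not listening yet
        time.sleep(interval)
    return False
```

Passing the predicate as a callable keeps the retry logic testable without a live node.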
Pairwise (src, dst, kind) sweep that calls otela probe via rcc and writes one JSONL record per measurement.
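The one-record-per-measurement convention can be sketched as follows; the field names here are illustrative, not the tool's actual schema:

```python
import json


def record_line(src, dst, kind, result):
    """Serialise one (src, dst, kind) measurement as a single JSON line.

    One line per measurement keeps the output streamable: a sweep that
    crashes mid-run still leaves every completed record readable.
    """
    return json.dumps({"src": src, "dst": dst, "kind": kind, **result},
                      sort_keys=True)


def append_record(path, line):
    """Append one JSONL record; open-append-close keeps writes crash-safe."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
```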
pkill matched on the unique bench cfg path, then rm -rf the bench dir. Best-effort: pkill returning 1 (no match) is not a failure.
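The exit-code convention is worth pinning down: pkill exits 0 when it signalled at least one process and 1 when nothing matched; only other codes indicate a real error. A hedged sketch with a hypothetical `run` callable standing in for remote execution:

```python
def best_effort_kill(run, pattern):
    """Kill by full-command-line pattern, treating 'no match' as success.

    `run` executes a command (locally or over ssh) and returns its exit
    code. pkill's exit status 1 means "no processes matched", which is the
    expected case on a host where the bench node already exited.
    """
    rc = run(["pkill", "-f", pattern])
    return rc in (0, 1)
```

Matching on the unique bench cfg path (rather than the binary name) is what keeps teardown from killing a co-resident production otela.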
Top-level orchestrator with signal-safe teardown, phase timing, and run.json summary alongside measurements.jsonl.
Local two-node validation runs the full bench pipeline against loopback. Per-machine http_port/libp2p_port overrides allow multiple nodes on a single host. phase_start now passes --config-dir so otela loads the libp2p key from the bench dir (matching what peer-id discovered) rather than the default ~/.config/opentela location.
Pull request overview
This PR adds an OpenTela-native “mesh bench” capability by introducing new probe endpoints and CLI commands in the Go binary, plus a new Python-based contrib/network-profiler tool that can orchestrate pairwise latency/throughput measurements across a set of machines.
Changes:
- Added `/v1/probe/echo` and `/v1/probe/run` endpoints and a keep-alive libp2p HTTP RoundTripper to support efficient HTTP-over-libp2p measurements.
- Added `otela probe` and `otela peer-id` commands plus config/key-path handling updates to support bench orchestration.
- Added a new Python `network-profiler` package with a `bench` workflow (init/discover/configure/push/start/converge/sweep/teardown), tests, and documentation.
Reviewed changes
Copilot reviewed 43 out of 45 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| src/internal/server/server.go | Registers new /v1/probe routes and swaps libp2p transport registration to the new RoundTripper. |
| src/internal/server/proxy_handler.go | Uses the new libp2p HTTP RoundTripper for libp2p:// traffic. |
| src/internal/server/probe_handler.go | Implements echo + probe execution endpoints (latency/throughput/libp2p ping). |
| src/internal/server/probe_handler_test.go | Adds unit tests for probe handler helpers and endpoints. |
| src/internal/server/libp2p_http_transport.go | Adds a custom RoundTripper to enable stdlib keep-alive pooling over libp2p streams. |
| src/internal/protocol/key.go | Adds ResolveKeyPath, exports key load/write helpers, and adds GenerateAndWriteKey. |
| src/internal/protocol/key_test.go | Adds tests for key path resolution behavior. |
| src/internal/protocol/host.go | Registers ping protocol and changes relay/autorelay configuration + relay resource settings. |
| src/internal/protocol/host_test.go | Adds a test that verifies the ping protocol is registered. |
| src/entry/cmd/root.go | Adds --config-dir persistent flag and ensures seeding honors the configured path. |
| src/entry/cmd/root_test.go | Updates config seeding tests and adds a regression test for --config-dir seeding. |
| src/entry/cmd/init.go | Adds libp2p key initialization during otela init. |
| src/entry/cmd/init_test.go | Tests libp2p key creation/idempotence for otela init flow. |
| src/entry/cmd/probe.go | Adds otela probe command to call the local node’s /v1/probe/run. |
| src/entry/cmd/probe_test.go | Adds tests for the probe CLI behavior against an httptest server. |
| src/entry/cmd/peerid.go | Adds otela peer-id command to print PeerID derived from the on-disk key. |
| src/entry/cmd/peerid_test.go | Adds a test for the peer-id command output shape. |
| meta/build_core_docker.sh | Hardens script execution and adds build signing key handling with BuildKit secrets. |
| deploy/tds/opentela.yaml | Updates the deployment image reference to an amd64-tagged image. |
| docs/content/docs/proposals/meta.json | Adds the mesh-bench design proposal to proposals index. |
| docs/content/docs/proposals/2026-05-02-mesh-bench-design.mdx | Adds the mesh-bench design proposal document. |
| docs/content/docs/advanced/security.mdx | Updates security documentation text. |
| docs/content/docs/advanced/network-profiling.mdx | Adds end-user docs for baseline profiling and OpenTela-native bench workflow. |
| docs/content/docs/advanced/meta.json | Adds the new network profiling page to the advanced docs index. |
| contrib/network-profiler/pyproject.toml | Introduces Python package metadata and pytest configuration. |
| contrib/network-profiler/uv.lock | Adds dependency lockfile for the Python tool. |
| contrib/network-profiler/network_profiler/__init__.py | Adds package init + version. |
| contrib/network-profiler/network_profiler/__main__.py | Adds module entrypoint. |
| contrib/network-profiler/network_profiler/cli.py | Implements CLI with plan, collect, heatmap, and new bench command. |
| contrib/network-profiler/network_profiler/model.py | Adds config/machine models and JSON config loading. |
| contrib/network-profiler/network_profiler/remote.py | Adds command template expansion and remote execution wrapper. |
| contrib/network-profiler/network_profiler/measure.py | Implements baseline ping/iperf collection + JSONL recording. |
| contrib/network-profiler/network_profiler/render.py | Adds HTML heatmap rendering from JSONL measurements. |
| contrib/network-profiler/network_profiler/bench_config.py | Builds per-host bench cfg.yaml including bootstrap multiaddrs and ports. |
| contrib/network-profiler/network_profiler/bench.py | Implements bench orchestration phases + JSONL/run.json output. |
| contrib/network-profiler/tests/test_remote.py | Adds tests for remote command templating and stdin support. |
| contrib/network-profiler/tests/test_measure.py | Adds tests for ping/iperf parsing. |
| contrib/network-profiler/tests/test_bench_cli.py | Adds tests that bench is wired into the CLI. |
| contrib/network-profiler/tests/test_bench_config.py | Adds tests for multiaddr and bench config generation. |
| contrib/network-profiler/tests/test_bench_phases.py | Adds tests for bench phase behaviors and record writing. |
| contrib/network-profiler/tests/test_bench_smoke.py | Adds a local two-node smoke test for the end-to-end bench workflow. |
| contrib/network-profiler/README.md | Adds usage and configuration documentation for the Python tool. |
| contrib/network-profiler/Makefile | Adds test and bench-smoke targets. |
| contrib/network-profiler/.gitignore | Ignores Python caches and results/. |
| .gitignore | Adds .claude/ to ignored paths. |
```go
	return err
}
if _, err := os.Stat(keyPath); err == nil {
	return nil
```
Comment on lines +163 to +166

```go
libp2p.EnableRelayService(
	relayv2.WithResources(relayServiceResources()),
	relayv2.WithInfiniteLimits(),
),
```
Comment on lines +217 to +221

```go
probeGroup := v1.Group("/probe")
{
	probeGroup.GET("/echo", echoHandler)
	probeGroup.POST("/run", runHandler)
}
```
Comment on lines +147 to +151

```go
resp, err := client.Do(req)
if err != nil {
	failed++
	lastErr = err.Error()
	continue
```
Comment on lines +22 to +26

```go
const maxEchoBytes = 1 << 30 // 1 GiB hard cap

func echoHandler(c *gin.Context) {
	raw := c.Query("bytes")
	if raw == "" {
```
```markdown
## OpenTela-native: `net-profiler bench`

The bench owns the full lifecycle: it brings up an isolated four-port mesh on each host, runs every (source, destination) pair through three probe kinds, and tears the mesh down. Each run uses a unique config dir at `/tmp/otela-bench-<runID>-<host>/` so it never collides with persistent OpenTela state.
```
Comment on lines +361 to +362

```python
"http_port": http_port,
"libp2p_port": libp2p_port,
```
Comment on lines +215 to +219

```go
resp, err := client.Do(req)
if err != nil {
	failed++
	lastErr = err.Error()
	continue
```
Comment on lines +221 to +225

```go
bytesRead, copyErr := io.Copy(io.Discard, resp.Body)
resp.Body.Close()
elapsed := time.Since(start).Nanoseconds()
if copyErr != nil {
	failed++
```
Comment on lines +135 to +139

```go
func TestRunThroughput_AggregateIsBandwidthWeighted(t *testing.T) {
	// Two iterations, equal bytes, very different elapsed times.
	// Per-iteration mbps: 80 and 8. Mean = 44 mbps.
	// Bandwidth-weighted: total_bytes=2*1MB=16Mbits, total_elapsed=0.1+1.0=1.1s = 16/1.1 ≈ 14.55 mbps.
	// The aggregate should be the bandwidth-weighted number.
```
This pull request introduces a new "network-profiler" tool in the `contrib/network-profiler` directory. The tool enables profiling of pairwise network conditions (latency and bandwidth) across machines managed by the remote-cluster-controller, with support for running isolated libp2p mesh benchmarks. The implementation includes configuration, orchestration, measurement collection, result rendering, and a command-line interface.

The most important changes are:

Core Functionality and CLI:
- A command-line interface (`network_profiler/cli.py`) supporting commands to plan, collect, render heatmaps, and run isolated mesh benchmarks, allowing users to easily profile and visualize network conditions across clusters.
- A module entrypoint (`network_profiler/__main__.py`) and package versioning (`network_profiler/__init__.py`).

Benchmark Orchestration:
- A bench orchestrator (`network_profiler/bench.py`) that automates multi-phase remote setup, configuration, execution, health checking, convergence, measurement collection, and teardown for libp2p mesh network benchmarks.
- A config builder (`network_profiler/bench_config.py`) that generates per-host configuration files for the benchmarking process, supporting flexible bootstrapping and port assignment.

Project Setup and Documentation:
- A `README.md` with instructions, usage examples, and requirements for using the network profiler tool.
- A `Makefile` for running tests and smoke benchmarks, and a `.gitignore` for common Python and results files.