A production-grade, Go-native distributed load-testing platform
Real performance testing for CI/CD pipelines, staging environments, and load balancer validation
Key Design Principles:
- Time-driven load modeling (not request loops)
- Single controller coordination (no peer-to-peer)
- Online metric aggregation with HDR histograms
- Context-driven cancellation throughout
- Lock-free hot paths for performance
- Streaming metrics with backpressure
| Category | Capability |
|---|---|
| Load Profiles | constant, ramp_up, ramp_down, step, spike |
| Metrics | p50, p90, p95, p99, RPS, error_ratio, min/max/mean |
| Distribution | Multiple agents, VUs split automatically |
| Thresholds | < > <= >= on any metric, global or per-scenario |
| Observability | Prometheus export, structured JSON logs, Grafana-ready |
| Reporting | HTML (human) + JSON (machine) reports, time-series data |
| CI/CD | Exit code 0 pass / 1 fail, GitHub Actions friendly |
- Go 1.21+
- `make` (optional, but recommended)
# Linux / macOS
make deps && make build
# Windows (PowerShell)
go mod download
go mod tidy
go build -o bin/controller.exe ./cmd/controller
go build -o bin/agent.exe ./cmd/agent
go build -o bin/indus-tester.exe ./cmd/indus-tester

Binaries are placed in bin/:
| Binary | Role |
|---|---|
| `bin/controller` | Coordinates all agents, aggregates metrics, evaluates thresholds |
| `bin/agent` | Executes Virtual Users and streams metrics back |
| `bin/indus-tester` | CLI — submits plans, streams progress, generates reports |
# Build and start (one-time)
go build -o bin/mock-server ./scripts/mock-server.go
./bin/mock-server    # starts on http://localhost:8080

2026/02/28 17:18:37 🚀 Mock server starting on :8080
2026/02/28 17:18:37 Test with: curl http://localhost:8080/healthz
# Terminal 1
./bin/agent --id=agent-1 --grpc-addr=:50052 --metrics-addr=:9091
# Terminal 2
./bin/agent --id=agent-2 --grpc-addr=:50053 --metrics-addr=:9092

Agent log output:
{"time":"2026-02-28T17:18:57+05:30","level":"INFO","msg":"agent gRPC server starting","agent_id":"agent-1","addr":":50052"}
{"time":"2026-02-28T17:18:57+05:30","level":"INFO","msg":"agent metrics server starting","agent_id":"agent-1","addr":":9091"}

# Terminal 3
./bin/controller \
--grpc-addr=:50051 \
--metrics-addr=:9090 \
--agents=localhost:50052,localhost:50053

./bin/indus-tester --controller=localhost:50051 run examples/basic.yaml

Live output:
Run started: 059a8d29 (state: running)
STATE ELAPSED VUs RPS p95(ms) ERRORS
────────── ────────── ──────── ────────── ────────── ──────────
running 8s 10 1.7 494.67 0.00%
running 16s 10 1.8 491.20 0.00%
running 24s 10 1.9 489.45 0.00%
running 30s 10 2.0 487.85 0.00%
Final result:
Run completed: 059a8d29
THRESHOLD RESULTS:
✓ p95 < 500ms actual: 487.85ms
✓ error_ratio < 1% actual: 0.00%
All thresholds passed. Exit code: 0
Goal: Confirm the API handles 10 concurrent users for 30 seconds.
name: "basic-api-test"
scenarios:
get-users:
duration: "30s"
think_time: "200ms"
profile:
type: constant
vus: 10
steps:
- name: "list-users"
method: GET
url: "http://localhost:8080/api/users"
timeout: "5s"
thresholds:
- metric: p95
condition: "<"
value: 500 # p95 < 500ms
- metric: error_ratio
condition: "<"
value: 0.01       # errors < 1%

Command:
./bin/indus-tester --controller=localhost:50051 run examples/basic.yaml

Result:
STATE ELAPSED VUs RPS p95(ms) ERRORS
running 30s 10 2.0 487.85 0.00%
THRESHOLD RESULTS:
✓ p95 < 500 actual: 487.85
✓ error_ratio < 0.01 actual: 0.00
Exit: 0 ✅
Goal: Find the system's breaking point by ramping 5 → 200 VUs over 2 minutes.
name: "ramp-stress-test"
scenarios:
api-stress:
duration: "120s"
think_time: "100ms"
profile:
type: ramp_up
start_vus: 5
end_vus: 200
steps:
- name: "create-order"
method: POST
url: "http://localhost:8080/api/orders"
body: '{"item": "widget", "quantity": 1}'
headers:
Content-Type: "application/json"
thresholds:
- metric: p95
condition: "<"
value: 1000 # p95 < 1s
- metric: p99
condition: "<"
value: 3000 # p99 < 3s
- metric: error_ratio
condition: "<"
value: 0.05 # errors < 5%
- metric: rps
condition: ">"
value: 100        # must sustain > 100 RPS

Expected live output (showing latency increase as VUs ramp):
STATE ELAPSED VUs RPS p95(ms) ERRORS
running 15s 21 18.2 142.30 0.00%
running 45s 67 55.4 389.10 0.10%
running 90s 133 88.7 742.60 2.30%
running 120s 200 101.2 994.45 4.80%
THRESHOLD RESULTS:
✓ p95 < 1000 actual: 994.45
✓ p99 < 3000 actual: 2104.30
✓ error_ratio < 0.05 actual: 0.048
✓ rps > 100 actual: 101.2
Exit: 0 ✅
Interpretation: The system starts struggling at ~133 VUs (p95 crosses 700ms). At 200 VUs it is at the edge. Set your SLAs accordingly.
Goal: Simulate a flash sale. 5 base VUs → 100 spike VUs for 15s → back to 5.
scenarios:
checkout-spike:
duration: "90s"
profile:
type: spike
base_vus: 5
spike_vus: 100
spike_at: "30s"
spike_duration: "15s"
steps:
- name: "checkout"
method: POST
url: "http://localhost:8080/api/checkout"
body: '{"payment_method": "card"}'
thresholds:
- metric: p95
scenario: "checkout-spike"
condition: "<"
value: 2000 # LB/system may be slow during spike, but keep p95 < 2s
- metric: error_ratio
scenario: "checkout-spike"
condition: "<"
value: 0.05

Expected live output:
STATE ELAPSED VUs RPS p95(ms) ERRORS
running 25s 5 2.1 210.40 0.00% ← base load
running 32s 100 24.3 1840.50 1.10% ← spike hit!
running 47s 100 25.8 1920.30 1.80% ← peak of spike
running 50s 5 2.2 240.10 0.00% ← recovered
THRESHOLD RESULTS:
✓ p95 < 2000 [checkout-spike] actual: 1920.30
✓ error_ratio < 0.05 [checkout-spike] actual: 0.018
Exit: 0 ✅
Goal: Step up VUs every 30s to identify the exact degradation threshold.
scenarios:
step-api:
duration: "120s"
profile:
type: step
start_vus: 10
step_vus: 20
step_duration: "30s" # 10 → 30 → 50 → 70 VUs
steps:
- name: "heavy-report"
method: GET
url: "http://localhost:8080/api/reports/summary"
timeout: "15s"

Expected live output:
STATE ELAPSED VUs RPS p95(ms) ERRORS
running 15s 10 3.1 185.20 0.00% ← step 1
running 45s 30 8.9 420.80 0.00% ← step 2
running 75s 50 14.2 880.10 0.40% ← step 3 — degrading
running 105s 70 17.5 1840.60 3.20% ← step 4 — near limit
Interpretation: Performance degrades significantly at 50 VUs. Set your maximum capacity at ~40 VUs for this endpoint.
Goal: Comprehensive LB validation — all 4 profile types against your load balancer.
./bin/indus-tester --controller=localhost:50051 run examples/lb-test.yaml

Expected output (60s test):
Run started: eb0ef8fb
STATE ELAPSED VUs RPS p95(ms) ERRORS
running 10s 35 6.4 412.30 0.00%
running 30s 90 14.8 525.47 0.00%
running 50s 100 18.2 610.30 0.40%
running 60s 10 3.2 490.10 0.00% ← spike recovered
THRESHOLD RESULTS:
✓ p95 < 1000 actual: 610.30
✓ error_ratio < 0.02 actual: 0.004
✓ p95 < 300 [baseline-health] actual: 241.10
✓ error_ratio < 0.005 [baseline] actual: 0.000
✓ p95 < 2000 [traffic-spike] actual: 1840.20
✓ error_ratio < 0.05 [spike] actual: 0.018
✓ rps > 15                            actual: 18.2
Exit: 0 ✅
name: "my-test" # Human-readable test name
scenarios:
scenario-name: # Unique scenario identifier
duration: "60s" # How long this scenario runs
think_time: "200ms" # Pause between each VU iteration
profile:
type: constant # constant | ramp_up | ramp_down | step | spike
vus: 50 # (constant) fixed VU count
# ramp_up / ramp_down
start_vus: 5
end_vus: 100
# step
start_vus: 10
step_vus: 20 # VUs added per step
step_duration: "30s" # Duration of each step
# spike
base_vus: 10
spike_vus: 200
spike_at: "30s" # When to begin the spike
spike_duration: "15s" # Duration of the spike
steps: # HTTP steps executed per VU iteration
- name: "step-name"
method: GET # GET | POST | PUT | DELETE | PATCH
url: "http://host/path"
timeout: "5s"
headers:
Accept: "application/json"
Content-Type: "application/json"
body: '{"key": "value"}' # Request body (string)
tags:
endpoint: "users" # Custom tags for metric breakdown
thresholds: # Optional pass/fail SLA gates
- metric: p95 # p50 | p90 | p95 | p99 | error_ratio | rps
condition: "<" # < | > | <= | >=
value: 500 # Milliseconds for latency; ratio for error_ratio
scenario: "scenario-name"  # Optional: scope to one scenario
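For reference, here is a sketch of Go structs that could back this format, assuming a YAML library like gopkg.in/yaml.v3. The type and field names are hypothetical; the real definitions live in pkg/plan.

```go
package plan

// Hypothetical types mirroring the reference above; parse with yaml.Unmarshal.
type Plan struct {
	Name       string              `yaml:"name"`
	Scenarios  map[string]Scenario `yaml:"scenarios"`
	Thresholds []Threshold         `yaml:"thresholds"`
}

type Scenario struct {
	Duration  string  `yaml:"duration"`
	ThinkTime string  `yaml:"think_time"`
	Profile   Profile `yaml:"profile"`
	Steps     []Step  `yaml:"steps"`
}

type Profile struct {
	Type     string `yaml:"type"` // constant | ramp_up | ramp_down | step | spike
	VUs      int    `yaml:"vus,omitempty"`
	StartVUs int    `yaml:"start_vus,omitempty"`
	EndVUs   int    `yaml:"end_vus,omitempty"`
	StepVUs  int    `yaml:"step_vus,omitempty"`
	// spike fields (base_vus, spike_vus, spike_at, spike_duration) elided
}

type Step struct {
	Name    string            `yaml:"name"`
	Method  string            `yaml:"method"`
	URL     string            `yaml:"url"`
	Timeout string            `yaml:"timeout,omitempty"`
	Headers map[string]string `yaml:"headers,omitempty"`
	Body    string            `yaml:"body,omitempty"`
	Tags    map[string]string `yaml:"tags,omitempty"`
}

type Threshold struct {
	Metric    string  `yaml:"metric"`
	Condition string  `yaml:"condition"`
	Value     float64 `yaml:"value"`
	Scenario  string  `yaml:"scenario,omitempty"`
}
```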
Usage:

indus-tester [command]
Available Commands:
run Execute a load test plan
status Get run status by ID
agents List connected agents
report Generate HTML or JSON report
Flags:
--controller string Controller gRPC address (default "localhost:50051")
./bin/indus-tester run examples/basic.yaml
./bin/indus-tester --controller=host:50051 run plan.yaml
./bin/indus-tester run plan.yaml --no-stream   # submit without live output

./bin/indus-tester status 059a8d29

Run: 059a8d29
State: completed
Duration: 30s
Requests: 62 Errors: 0 RPS: 2.07
p50: 312ms p95: 487ms p99: 503ms
./bin/indus-tester agents

ID        ADDRESS           STATE    VUs
agent-1 localhost:50052 active 5
agent-2 localhost:50053 active 5
./bin/indus-tester report 059a8d29 --format=html -o my-report.html
./bin/indus-tester report 059a8d29 --format=json -o my-report.json

| Flag | Default | Description |
|---|---|---|
| `--grpc-addr` | `:50051` | gRPC listen address |
| `--metrics-addr` | `:9090` | Prometheus metrics endpoint |
| `--agents` | (required) | Comma-separated host:port list |
| Flag | Default | Description |
|---|---|---|
| `--id` | `agent-0` | Unique agent identifier |
| `--grpc-addr` | `:50052` | gRPC listen address |
| `--metrics-addr` | `:9091` | Prometheus metrics endpoint |
| Flag | Default | Description |
|---|---|---|
| `--controller` | `localhost:50051` | Controller address |
| Endpoint | Component |
|---|---|
| `http://localhost:9090/metrics` | Controller |
| `http://localhost:9091/metrics` | Agent 1 |
| `http://localhost:9092/metrics` | Agent 2 |
Key exported metrics (a registration sketch follows the list):
indus_controller_requests_total
indus_controller_request_duration_seconds{quantile="0.95"}
indus_controller_active_vus
indus_controller_thresholds_passed
indus_controller_thresholds_failed
indus_controller_run_state
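These metric names follow standard prometheus/client_golang patterns. Below is a hedged sketch of how the counter and gauge could be registered and exposed; the names come from the list above, but the wiring (package, `ServeMetrics` function) is an illustrative assumption, not the project's actual code.

```go
package observability

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	requestsTotal = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "indus_controller_requests_total",
		Help: "Total requests executed across all agents.",
	})
	activeVUs = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "indus_controller_active_vus",
		Help: "Currently active virtual users.",
	})
)

// ServeMetrics registers the collectors and serves them on addr,
// e.g. the controller's --metrics-addr (:9090).
func ServeMetrics(addr string) error {
	prometheus.MustRegister(requestsTotal, activeVUs)
	http.Handle("/metrics", promhttp.Handler())
	return http.ListenAndServe(addr, nil)
}
```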
# prometheus.yml (already included in repo)
scrape_configs:
- job_name: 'indus-controller'
static_configs:
- targets: ['localhost:9090']
- job_name: 'indus-agents'
static_configs:
- targets: ['localhost:9091', 'localhost:9092']

# Start Prometheus
# host port 9093 (9090-9092 are already taken by the controller and agent metrics endpoints)
docker run -p 9093:9090 \
-v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
prom/prometheus

All components emit structured logs to stderr:
{"time":"2026-02-28T17:19:54+05:30","level":"INFO","msg":"RunScenario stream opened","component":"agent","agent_id":"agent-1"}
{"time":"2026-02-28T17:19:54+05:30","level":"INFO","msg":"assigning scenario","component":"agent","agent_id":"agent-1","scenario":"get-users","run_id":"059a8d29","target_vus":5}
{"time":"2026-02-28T17:20:24+05:30","level":"INFO","msg":"scenario duration elapsed","component":"agent","agent_id":"agent-1","scenario":"get-users"}

Fields: time, level, msg, component, run_id, agent_id, scenario, addr
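The shape of these lines matches Go's standard log/slog JSON handler (available since Go 1.21, the stated minimum). A sketch of how such a logger could be constructed; the `NewLogger` helper and its field wiring are illustrative assumptions.

```go
package observability

import (
	"log/slog"
	"os"
)

// NewLogger returns a JSON logger writing to stderr with the common
// fields pre-attached, matching the log lines shown above.
func NewLogger(component, agentID string) *slog.Logger {
	handler := slog.NewJSONHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelInfo})
	return slog.New(handler).With("component", component, "agent_id", agentID)
}
```

Per-call fields are then passed as key-value pairs, e.g. `NewLogger("agent", "agent-1").Info("agent gRPC server starting", "addr", ":50052")`.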
| Code | Meaning |
|---|---|
| `0` | All thresholds passed — deploy! |
| `1` | Threshold violation or error — block deploy |
name: Load Test
on: [push, pull_request]
jobs:
load-test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Go
uses: actions/setup-go@v5
with:
go-version: '1.21'
- name: Build
run: make build
- name: Start services
run: |
./bin/agent --id=agent-1 --grpc-addr=:50052 --metrics-addr=:9091 &
./bin/agent --id=agent-2 --grpc-addr=:50053 --metrics-addr=:9092 &
sleep 1
./bin/controller --grpc-addr=:50051 --metrics-addr=:9090 \
--agents=localhost:50052,localhost:50053 &
sleep 2
- name: Run load test
run: ./bin/indus-tester --controller=localhost:50051 run examples/basic.yaml
- name: Upload report
if: always()
uses: actions/upload-artifact@v4
with:
name: load-test-report
path: report.html

docker compose up

Services started:
- controller — localhost:50051
- agent-1 — localhost:50052
- agent-2 — localhost:50053
- prometheus — localhost:9090
Each VU is a goroutine that (see the sketch after this list):
- Runs HTTP steps in sequence
- Records timing + status for each step
- Sends a `Sample` to the per-agent aggregator channel
- Waits `think_time` before repeating
- Shuts down cleanly when context is cancelled
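A minimal sketch of that loop. The `Sample` fields follow the metrics pipeline described below, but the `Step` interface and `RunVU` signature are illustrative assumptions; the real implementation lives in pkg/vu.

```go
package vu

import (
	"context"
	"time"
)

// Sample mirrors the fields described in the metrics pipeline below.
type Sample struct {
	Scenario   string
	Step       string
	DurationNs int64
	StatusCode int
	IsError    bool
}

// Step abstracts one HTTP step of a scenario (hypothetical interface).
type Step interface {
	Name() string
	Execute(ctx context.Context) (statusCode int, err error)
}

// RunVU executes steps in sequence, records a Sample per step, waits
// think_time between iterations, and exits when the context is cancelled.
func RunVU(ctx context.Context, scenario string, steps []Step, thinkTime time.Duration, samples chan<- Sample) {
	for {
		for _, s := range steps {
			start := time.Now()
			code, err := s.Execute(ctx)
			samples <- Sample{
				Scenario:   scenario,
				Step:       s.Name(),
				DurationNs: time.Since(start).Nanoseconds(),
				StatusCode: code,
				IsError:    err != nil || code >= 400,
			}
		}
		select {
		case <-ctx.Done(): // clean shutdown on cancellation
			return
		case <-time.After(thinkTime): // think_time between iterations
		}
	}
}
```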
The scheduler pre-computes a 1-second resolution VU schedule:
t=0s → 5 VUs/agent (constant profile: 10 total, split across 2 agents)
t=1s → 5 VUs/agent
...
For ramp profiles, it computes intermediate values using linear interpolation. For spike, it injects a high-VU window. For step, it increments at each step_duration boundary.
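For a ramp_up profile, the per-second targets could be derived like this. A sketch only: `rampSchedule` is a hypothetical name, and the real logic lives in pkg/scheduler.

```go
package scheduler

import (
	"math"
	"time"
)

// rampSchedule returns one target VU count per second of the run,
// linearly interpolated from startVUs to endVUs.
func rampSchedule(startVUs, endVUs int, duration time.Duration) []int {
	secs := int(duration.Seconds())
	if secs <= 0 {
		return []int{endVUs}
	}
	schedule := make([]int, secs+1)
	for t := 0; t <= secs; t++ {
		frac := float64(t) / float64(secs)
		schedule[t] = startVUs + int(math.Round(frac*float64(endVUs-startVUs)))
	}
	return schedule
}
```

For the ramp-stress example, `rampSchedule(5, 200, 120*time.Second)` yields 5 at t=0, ~103 at t=60, and 200 at t=120; each per-second target is then divided across the connected agents.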
Adjust commands are streamed to agents whenever the target VU count changes.
VU goroutine
└─ executes HTTP request
└─ creates Sample{scenario, step, duration_ns, status_code, is_error}
└─ pushes to buffered channel (capacity 10,000)
Aggregator goroutine (per agent)
└─ drains channel
└─ updates HDR histogram (per scenario, per step)
└─ updates atomic counters (total_reqs, total_errs, active_vus)
└─ emits Snapshot every 1s → gRPC stream to controller
Controller
└─ receives Snapshot from each agent
└─ merges into global HDR histogram
└─ evaluates thresholds
└─ streams ProgressUpdate to CLI
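On the agent side, the push/drain pair could look like the sketch below, with `Sample` trimmed to the fields used. The channel capacity and drop-oldest overflow policy follow the description above; the `aggregator` type and method names are illustrative assumptions.

```go
package metrics

import (
	"context"
	"sync/atomic"
)

type Sample struct {
	DurationNs int64
	IsError    bool
}

type aggregator struct {
	samples   chan Sample // buffered; capacity 10,000 per the pipeline above
	totalReqs atomic.Int64
	totalErrs atomic.Int64
}

// push never blocks a VU goroutine: if the buffer is full, it discards
// the oldest queued sample and retries (the backpressure policy).
func (a *aggregator) push(s Sample) {
	for {
		select {
		case a.samples <- s:
			return
		default:
			select {
			case <-a.samples: // drop oldest
			default:
			}
		}
	}
}

// drain runs in the per-agent aggregator goroutine, updating atomic
// counters (the HDR histogram update is elided) until cancellation.
func (a *aggregator) drain(ctx context.Context) {
	for {
		select {
		case s := <-a.samples:
			a.totalReqs.Add(1)
			if s.IsError {
				a.totalErrs.Add(1)
			}
		case <-ctx.Done():
			return
		}
	}
}
```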
thresholds:
- metric: p95 # Evaluated against global or per-scenario snapshot
condition: "<" # Supports: < > <= >=
value: 500 # Milliseconds for latency metrics; ratio (0.0–1.0) for error_ratio
scenario: "my-sc"    # Optional — scopes check to one scenario only

Threshold evaluation happens (a minimal check is sketched after this list):
- Continuously during the run (every 1s) — for early abort if desired
- Finally when all scenarios complete — determines the exit code
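A minimal version of that check, assuming a hypothetical `Threshold` type; the real engine lives in pkg/threshold.

```go
package threshold

// Threshold mirrors one entry of the thresholds list above.
type Threshold struct {
	Metric    string // p50 | p90 | p95 | p99 | error_ratio | rps
	Condition string // "<" | ">" | "<=" | ">="
	Value     float64
	Scenario  string // empty = global
}

// Pass reports whether the observed value satisfies the condition.
func (t Threshold) Pass(actual float64) bool {
	switch t.Condition {
	case "<":
		return actual < t.Value
	case ">":
		return actual > t.Value
	case "<=":
		return actual <= t.Value
	case ">=":
		return actual >= t.Value
	default:
		return false
	}
}
```

For the basic example, `Threshold{Metric: "p95", Condition: "<", Value: 500}.Pass(487.85)` returns true, so the run exits 0.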
indus-tester/
├── cmd/
│ ├── agent/ # Agent binary entry point
│ ├── controller/ # Controller binary entry point
│ └── indus-tester/ # CLI entry point
├── internal/
│ ├── agent/ # Agent gRPC server + VU orchestration
│ ├── controller/ # Controller gRPC server + scheduler + aggregation
│ └── proto/ # gRPC codec + generated types
├── pkg/
│ ├── executor/ # HTTP execution engine
│ ├── metrics/ # HDR histogram aggregator + snapshot
│ ├── observability/ # Logger + Prometheus metrics
│ ├── plan/ # YAML parser + plan types
│ ├── report/ # HTML + JSON report generation
│ ├── scheduler/ # VU count schedule computation
│ ├── threshold/ # Pass/fail threshold engine
│ └── vu/ # Virtual user goroutine
├── proto/
│ └── indus.proto # gRPC service definition
├── examples/
│ ├── basic.yaml
│ ├── ramp-stress.yaml
│ ├── step-load.yaml
│ ├── multi-scenario-spike.yaml
│ └── lb-test.yaml # Load balancer performance suite
├── docs/
│ ├── architecture.svg
│ ├── load-profiles.svg
│ ├── metrics-pipeline.svg
│ └── sequence-diagram.svg
└── scripts/
└── mock-server.go # Mock HTTP target for local testing
| Characteristic | Implementation |
|---|---|
| Lock-free metric ingestion | Buffered channels + atomic counters |
| Memory-bounded histograms | HDR histogram with fixed bucket count |
| Connection reuse | HTTP client with keep-alive pool per VU |
| Zero goroutine leaks | Context propagation from CLI → VU |
| Graceful shutdown | SIGTERM → context cancel → drain → exit |
| Backpressure | Aggregator channel drops oldest on overflow |
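The graceful-shutdown row maps directly onto the standard library. A minimal sketch of the SIGTERM → context-cancel flow; `runAgent` is a placeholder for the real serve loop.

```go
package main

import (
	"context"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	// SIGTERM / Ctrl-C cancel the root context; every scheduler, VU, and
	// aggregator goroutine derives from it and drains before exiting.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, os.Interrupt)
	defer stop()

	runAgent(ctx)
}

// runAgent stands in for the real serve loop; it blocks until cancellation.
func runAgent(ctx context.Context) {
	<-ctx.Done()
}
```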
# Run all tests
make test
# Format code
make fmt
# Vet code
make vet
# Clean build artifacts
make clean

What it intentionally does not do:

- Browser automation → use Playwright / Selenium
- Custom scripting language → fork and extend in Go
- Plugin system → fork and extend
- Built-in web UI → use Grafana + Prometheus
- Persistent run history → add a database layer
MIT — see LICENSE
This is a reference implementation. For production hardening, consider:
- Persistent state — database-backed run storage
- Controller HA — leader election (etcd / Raft)
- Dynamic agent discovery — service mesh / Consul
- Protocol coverage — gRPC, WebSocket, TCP load testing
- Advanced profiles — custom scripted profiles