
⚡ INDUS Tester

A production-grade, Go-native distributed load-testing platform

Real performance testing for CI/CD pipelines, staging environments, and load balancer validation



System Architecture

(architecture diagram: docs/architecture.svg)

Key Design Principles:

  • Time-driven load modeling (not request loops)
  • Single controller coordination (no peer-to-peer)
  • Online metric aggregation with HDR histograms
  • Context-driven cancellation throughout
  • Lock-free hot paths for performance
  • Streaming metrics with backpressure

Features

| Category | Capability |
| --- | --- |
| Load Profiles | `constant`, `ramp_up`, `ramp_down`, `step`, `spike` |
| Metrics | p50, p90, p95, p99, RPS, error_ratio, min/max/mean |
| Distribution | Multiple agents; VUs split automatically |
| Thresholds | `<` `>` `<=` `>=` on any metric, global or per-scenario |
| Observability | Prometheus export, structured JSON logs, Grafana-ready |
| Reporting | HTML (human) + JSON (machine) reports, time-series data |
| CI/CD | Exit code 0 pass / 1 fail; GitHub Actions friendly |

Load Profiles

(load profiles diagram: docs/load-profiles.svg)


Metrics Pipeline

(metrics pipeline diagram: docs/metrics-pipeline.svg)


Execution Sequence (UML)

(sequence diagram: docs/sequence-diagram.svg)


Quick Start

Prerequisites

  • Go 1.21+
  • make (optional, but recommended)

1. Build

# Linux / macOS
make deps && make build

# Windows (PowerShell)
go mod download
go mod tidy
go build -o bin/controller.exe ./cmd/controller
go build -o bin/agent.exe      ./cmd/agent
go build -o bin/indus-tester.exe ./cmd/indus-tester

Binaries are placed in bin/:

| Binary | Role |
| --- | --- |
| bin/controller | Coordinates all agents, aggregates metrics, evaluates thresholds |
| bin/agent | Executes Virtual Users and streams metrics back |
| bin/indus-tester | CLI: submits plans, streams progress, generates reports |

2. Start the Mock Target Server

# Build and start (one-time)
go build -o bin/mock-server ./scripts/mock-server.go
./bin/mock-server          # starts on http://localhost:8080
2026/02/28 17:18:37 🚀 Mock server starting on :8080
2026/02/28 17:18:37    Test with: curl http://localhost:8080/healthz

3. Start Agents

# Terminal 1
./bin/agent --id=agent-1 --grpc-addr=:50052 --metrics-addr=:9091

# Terminal 2
./bin/agent --id=agent-2 --grpc-addr=:50053 --metrics-addr=:9092

Agent log output:

{"time":"2026-02-28T17:18:57+05:30","level":"INFO","msg":"agent gRPC server starting","agent_id":"agent-1","addr":":50052"}
{"time":"2026-02-28T17:18:57+05:30","level":"INFO","msg":"agent metrics server starting","agent_id":"agent-1","addr":":9091"}

4. Start the Controller

# Terminal 3
./bin/controller \
  --grpc-addr=:50051 \
  --metrics-addr=:9090 \
  --agents=localhost:50052,localhost:50053

5. Run Your First Test

./bin/indus-tester --controller=localhost:50051 run examples/basic.yaml

Live output:

Run started: 059a8d29 (state: running)

STATE      ELAPSED         VUs        RPS    p95(ms)     ERRORS
────────── ────────── ──────── ────────── ────────── ──────────
running    8s               10        1.7     494.67      0.00%
running    16s              10        1.8     491.20      0.00%
running    24s              10        1.9     489.45      0.00%
running    30s              10        2.0     487.85      0.00%

Final result:

Run completed: 059a8d29

THRESHOLD RESULTS:
  ✓  p95 < 500ms          actual: 487.85ms
  ✓  error_ratio < 1%     actual: 0.00%

All thresholds passed. Exit code: 0

Example Test Flows & Results

Example 1 — Basic Constant Load (examples/basic.yaml)

Goal: Confirm the API handles 10 concurrent users for 30 seconds.

name: "basic-api-test"

scenarios:
  get-users:
    duration: "30s"
    think_time: "200ms"
    profile:
      type: constant
      vus: 10
    steps:
      - name: "list-users"
        method: GET
        url: "http://localhost:8080/api/users"
        timeout: "5s"

thresholds:
  - metric: p95
    condition: "<"
    value: 500      # p95 < 500ms
  - metric: error_ratio
    condition: "<"
    value: 0.01     # errors < 1%

Command:

./bin/indus-tester --controller=localhost:50051 run examples/basic.yaml

Result:

STATE      ELAPSED    VUs    RPS    p95(ms)   ERRORS
running    30s        10     2.0    487.85    0.00%

THRESHOLD RESULTS:
  ✓  p95 < 500         actual: 487.85
  ✓  error_ratio < 0.01  actual: 0.00

Exit: 0  ✅

Example 2 — Ramp-Up Stress Test (examples/ramp-stress.yaml)

Goal: Find the system's breaking point by ramping 5 → 200 VUs over 2 minutes.

name: "ramp-stress-test"

scenarios:
  api-stress:
    duration: "120s"
    think_time: "100ms"
    profile:
      type: ramp_up
      start_vus: 5
      end_vus: 200
    steps:
      - name: "create-order"
        method: POST
        url: "http://localhost:8080/api/orders"
        body: '{"item": "widget", "quantity": 1}'
        headers:
          Content-Type: "application/json"

thresholds:
  - metric: p95
    condition: "<"
    value: 1000     # p95 < 1s
  - metric: p99
    condition: "<"
    value: 3000     # p99 < 3s
  - metric: error_ratio
    condition: "<"
    value: 0.05     # errors < 5%
  - metric: rps
    condition: ">"
    value: 100      # must sustain > 100 RPS

Expected live output (showing latency increase as VUs ramp):

STATE      ELAPSED    VUs    RPS     p95(ms)   ERRORS
running    15s        21     18.2    142.30    0.00%
running    45s        67     55.4    389.10    0.10%
running    90s        133    88.7    742.60    2.30%
running    120s       200    101.2   994.45    4.80%

THRESHOLD RESULTS:
  ✓  p95 < 1000        actual: 994.45
  ✓  p99 < 3000        actual: 2104.30
  ✓  error_ratio < 0.05  actual: 0.048
  ✓  rps > 100         actual: 101.2

Exit: 0  ✅

Interpretation: The system starts struggling at around 133 VUs (p95 crosses 700ms). At 200 VUs it is at the edge of its SLA. Set your SLAs accordingly.


Example 3 — Traffic Spike (examples/multi-scenario-spike.yaml)

Goal: Simulate a flash sale. 5 base VUs → 100 spike VUs for 15s → back to 5.

scenarios:
  checkout-spike:
    duration: "90s"
    profile:
      type: spike
      base_vus: 5
      spike_vus: 100
      spike_at: "30s"
      spike_duration: "15s"
    steps:
      - name: "checkout"
        method: POST
        url: "http://localhost:8080/api/checkout"
        body: '{"payment_method": "card"}'

thresholds:
  - metric: p95
    scenario: "checkout-spike"
    condition: "<"
    value: 2000     # LB/system may be slow during spike, but keep p95 < 2s
  - metric: error_ratio
    scenario: "checkout-spike"
    condition: "<"
    value: 0.05

Expected live output:

STATE      ELAPSED    VUs    RPS    p95(ms)   ERRORS
running    25s        5      2.1    210.40    0.00%   ← base load
running    32s        100    24.3   1840.50   1.10%   ← spike hit!
running    47s        100    25.8   1920.30   1.80%   ← peak of spike
running    50s        5      2.2    240.10    0.00%   ← recovered

THRESHOLD RESULTS:
  ✓  p95 < 2000 [checkout-spike]   actual: 1920.30
  ✓  error_ratio < 0.05 [checkout-spike]  actual: 0.018

Exit: 0  ✅

Example 4 — Step Load (examples/step-load.yaml)

Goal: Step up VUs every 30s to identify the exact degradation threshold.

scenarios:
  step-api:
    duration: "120s"
    profile:
      type: step
      start_vus: 10
      step_vus: 20
      step_duration: "30s"   # 10 → 30 → 50 → 70 VUs
    steps:
      - name: "heavy-report"
        method: GET
        url: "http://localhost:8080/api/reports/summary"
        timeout: "15s"

Expected live output:

STATE      ELAPSED    VUs    RPS    p95(ms)   ERRORS
running    15s        10     3.1    185.20    0.00%   ← step 1
running    45s        30     8.9    420.80    0.00%   ← step 2
running    75s        50     14.2   880.10    0.40%   ← step 3 — degrading
running    105s       70     17.5   1840.60   3.20%   ← step 4 — near limit

Interpretation: Performance degrades significantly at 50 VUs, so cap this endpoint at roughly 40 VUs.


Example 5 — Load Balancer Performance Suite (examples/lb-test.yaml)

Goal: Comprehensive LB validation — all 4 profile types against your load balancer.

./bin/indus-tester --controller=localhost:50051 run examples/lb-test.yaml

Expected output (60s test):

Run started: eb0ef8fb

STATE      ELAPSED    VUs    RPS    p95(ms)   ERRORS
running    10s        35     6.4    412.30    0.00%
running    30s        90     14.8   525.47    0.00%
running    50s        100    18.2   610.30    0.40%
running    60s        10     3.2    490.10    0.00%   ← spike recovered

THRESHOLD RESULTS:
  ✓  p95 < 1000                    actual: 610.30
  ✓  error_ratio < 0.02            actual: 0.004
  ✓  p95 < 300 [baseline-health]   actual: 241.10
  ✓  error_ratio < 0.005 [baseline] actual: 0.000
  ✓  p95 < 2000 [traffic-spike]    actual: 1840.20
  ✓  error_ratio < 0.05 [spike]    actual: 0.018
  ✗  rps > 50                      actual: 18.2

Exit: 1  ❌  (the rps threshold fails in this sample run against the mock server)

Test Plan Schema Reference

name: "my-test"                    # Human-readable test name

scenarios:
  scenario-name:                   # Unique scenario identifier
    duration: "60s"                # How long this scenario runs
    think_time: "200ms"            # Pause between each VU iteration

    profile:
      type: constant               # constant | ramp_up | ramp_down | step | spike
      vus: 50                      # (constant) fixed VU count

      # ramp_up / ramp_down
      start_vus: 5
      end_vus: 100

      # step
      start_vus: 10
      step_vus: 20                 # VUs added per step
      step_duration: "30s"         # Duration of each step

      # spike
      base_vus: 10
      spike_vus: 200
      spike_at: "30s"              # When to begin the spike
      spike_duration: "15s"        # Duration of the spike

    steps:                         # HTTP steps executed per VU iteration
      - name: "step-name"
        method: GET                # GET | POST | PUT | DELETE | PATCH
        url: "http://host/path"
        timeout: "5s"
        headers:
          Accept: "application/json"
          Content-Type: "application/json"
        body: '{"key": "value"}'   # Request body (string)
        tags:
          endpoint: "users"        # Custom tags for metric breakdown

thresholds:                        # Optional pass/fail SLA gates
  - metric: p95                    # p50 | p90 | p95 | p99 | error_ratio | rps
    condition: "<"                 # < | > | <= | >=
    value: 500                     # Milliseconds for latency; ratio for error_ratio
    scenario: "scenario-name"      # Optional: scope to one scenario

CLI Reference

Usage:
  indus-tester [command]

Available Commands:
  run      Execute a load test plan
  status   Get run status by ID
  agents   List connected agents
  report   Generate HTML or JSON report

Flags:
  --controller string   Controller gRPC address (default "localhost:50051")

run

./bin/indus-tester run examples/basic.yaml
./bin/indus-tester --controller=host:50051 run plan.yaml
./bin/indus-tester run plan.yaml --no-stream      # submit without live output

status

./bin/indus-tester status 059a8d29
Run: 059a8d29
State: completed
Duration: 30s
Requests: 62  Errors: 0  RPS: 2.07
p50: 312ms  p95: 487ms  p99: 503ms

agents

./bin/indus-tester agents
ID        ADDRESS             STATE     VUs
agent-1   localhost:50052     active    5
agent-2   localhost:50053     active    5

report

./bin/indus-tester report 059a8d29 --format=html -o my-report.html
./bin/indus-tester report 059a8d29 --format=json -o my-report.json

Configuration Reference

Controller Flags

| Flag | Default | Description |
| --- | --- | --- |
| --grpc-addr | :50051 | gRPC listen address |
| --metrics-addr | :9090 | Prometheus metrics endpoint |
| --agents | (required) | Comma-separated host:port list |

Agent Flags

| Flag | Default | Description |
| --- | --- | --- |
| --id | agent-0 | Unique agent identifier |
| --grpc-addr | :50052 | gRPC listen address |
| --metrics-addr | :9091 | Prometheus metrics endpoint |

CLI Flags

| Flag | Default | Description |
| --- | --- | --- |
| --controller | localhost:50051 | Controller address |

Observability

Prometheus Metrics

| Endpoint | Component |
| --- | --- |
| http://localhost:9090/metrics | Controller |
| http://localhost:9091/metrics | Agent 1 |
| http://localhost:9092/metrics | Agent 2 |

Key exported metrics:

indus_controller_requests_total
indus_controller_request_duration_seconds{quantile="0.95"}
indus_controller_active_vus
indus_controller_thresholds_passed
indus_controller_thresholds_failed
indus_controller_run_state

Prometheus + Grafana Setup

# prometheus.yml (already included in repo)
scrape_configs:
  - job_name: 'indus-controller'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'indus-agents'
    static_configs:
      - targets: ['localhost:9091', 'localhost:9092']

# Start Prometheus in Docker. Host port 9099 is used here because 9091 is
# already taken by agent-1's metrics endpoint when everything runs on one
# machine. On macOS/Windows, change the localhost targets above to
# host.docker.internal so the container can reach the host processes.
docker run -p 9099:9090 \
  -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus

Structured JSON Logs

All components emit structured logs to stderr:

{"time":"2026-02-28T17:19:54+05:30","level":"INFO","msg":"RunScenario stream opened","component":"agent","agent_id":"agent-1"}
{"time":"2026-02-28T17:19:54+05:30","level":"INFO","msg":"assigning scenario","component":"agent","agent_id":"agent-1","scenario":"get-users","run_id":"059a8d29","target_vus":5}
{"time":"2026-02-28T17:20:24+05:30","level":"INFO","msg":"scenario duration elapsed","component":"agent","agent_id":"agent-1","scenario":"get-users"}

Fields: time, level, msg, component, run_id, agent_id, scenario, addr


CI/CD Integration

Exit Codes

| Code | Meaning |
| --- | --- |
| 0 | All thresholds passed — deploy! |
| 1 | Threshold violation or error — block deploy |

GitHub Actions

name: Load Test
on: [push, pull_request]

jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.21'

      - name: Build
        run: make build

      - name: Start services
        run: |
          ./bin/agent --id=agent-1 --grpc-addr=:50052 --metrics-addr=:9091 &
          ./bin/agent --id=agent-2 --grpc-addr=:50053 --metrics-addr=:9092 &
          sleep 1
          ./bin/controller --grpc-addr=:50051 --metrics-addr=:9090 \
            --agents=localhost:50052,localhost:50053 &
          sleep 2

      - name: Run load test
        run: ./bin/indus-tester --controller=localhost:50051 run examples/basic.yaml

      - name: Upload report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: load-test-report
          path: report.html

Docker Compose (included)

docker compose up

Services started:

  • controller → localhost:50051
  • agent-1 → localhost:50052
  • agent-2 → localhost:50053
  • prometheus → localhost:9090

Architecture Deep Dive

Virtual User Execution Model

Each VU is a goroutine that:

  1. Runs HTTP steps in sequence
  2. Records timing + status for each step
  3. Sends a Sample to the per-agent aggregator channel
  4. Waits think_time before repeating
  5. Shuts down cleanly when context is cancelled

Scheduler

The scheduler pre-computes a 1-second resolution VU schedule:

t=0s  → 5 VUs/agent  (constant profile: 10 VUs total, split across 2 agents)
t=1s  → 5 VUs/agent
...

For ramp profiles, it computes intermediate values using linear interpolation. For spike, it injects a high-VU window. For step, it increments at each step_duration boundary.

Adjust commands are streamed to agents whenever the target VU count changes.

Metrics Collection

VU goroutine
  └─ executes HTTP request
  └─ creates Sample{scenario, step, duration_ns, status_code, is_error}
  └─ pushes to buffered channel (capacity 10,000)

Aggregator goroutine  (per agent)
  └─ drains channel
  └─ updates HDR histogram (per scenario, per step)
  └─ updates atomic counters (total_reqs, total_errs, active_vus)
  └─ emits Snapshot every 1s → gRPC stream to controller

Controller
  └─ receives Snapshot from each agent
  └─ merges into global HDR histogram
  └─ evaluates thresholds
  └─ streams ProgressUpdate to CLI

Threshold Engine

thresholds:
  - metric: p95          # Evaluated against global or per-scenario snapshot
    condition: "<"       # Supports: < > <= >=
    value: 500           # Milliseconds for latency metrics; ratio (0.0–1.0) for error_ratio
    scenario: "my-sc"    # Optional — scopes check to one scenario only

Threshold evaluation happens:

  1. Continuously during the run (every 1s) — for early abort if desired
  2. Finally when all scenarios complete — determines the exit code
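A comparison engine over those four conditions is small. The sketch below mirrors the YAML fields; it is illustrative, not the repository's actual implementation:

```go
package main

import "fmt"

// Threshold mirrors the YAML threshold fields above.
type Threshold struct {
	Metric    string
	Condition string // one of: "<", ">", "<=", ">="
	Value     float64
}

// evaluate returns true when the actual metric value satisfies the condition.
func (t Threshold) evaluate(actual float64) bool {
	switch t.Condition {
	case "<":
		return actual < t.Value
	case ">":
		return actual > t.Value
	case "<=":
		return actual <= t.Value
	case ">=":
		return actual >= t.Value
	}
	return false // unknown condition: fail closed
}

func main() {
	metrics := map[string]float64{"p95": 487.85, "error_ratio": 0.0}
	checks := []Threshold{
		{"p95", "<", 500},
		{"error_ratio", "<", 0.01},
	}
	pass := true
	for _, c := range checks {
		ok := c.evaluate(metrics[c.Metric])
		fmt.Printf("%s %s %v → %v\n", c.Metric, c.Condition, c.Value, ok)
		pass = pass && ok
	}
	if pass {
		fmt.Println("exit code: 0")
	}
}
```

Failing closed on an unknown condition keeps a typo in the plan from silently passing a gate.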

Project Structure

indus-tester/
├── cmd/
│   ├── agent/           # Agent binary entry point
│   ├── controller/      # Controller binary entry point
│   └── indus-tester/    # CLI entry point
├── internal/
│   ├── agent/           # Agent gRPC server + VU orchestration
│   ├── controller/      # Controller gRPC server + scheduler + aggregation
│   └── proto/           # gRPC codec + generated types
├── pkg/
│   ├── executor/        # HTTP execution engine
│   ├── metrics/         # HDR histogram aggregator + snapshot
│   ├── observability/   # Logger + Prometheus metrics
│   ├── plan/            # YAML parser + plan types
│   ├── report/          # HTML + JSON report generation
│   ├── scheduler/       # VU count schedule computation
│   ├── threshold/       # Pass/fail threshold engine
│   └── vu/              # Virtual user goroutine
├── proto/
│   └── indus.proto      # gRPC service definition
├── examples/
│   ├── basic.yaml
│   ├── ramp-stress.yaml
│   ├── step-load.yaml
│   ├── multi-scenario-spike.yaml
│   └── lb-test.yaml     # Load balancer performance suite
├── docs/
│   ├── architecture.svg
│   ├── load-profiles.svg
│   ├── metrics-pipeline.svg
│   └── sequence-diagram.svg
└── scripts/
    └── mock-server.go   # Mock HTTP target for local testing

Performance Characteristics

| Characteristic | Implementation |
| --- | --- |
| Lock-free metric ingestion | Buffered channels + atomic counters |
| Memory-bounded histograms | HDR histogram with fixed bucket count |
| Connection reuse | HTTP client with keep-alive pool per VU |
| Zero goroutine leaks | Context propagation from CLI → VU |
| Graceful shutdown | SIGTERM → context cancel → drain → exit |
| Backpressure | Aggregator channel drops oldest on overflow |
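The drop-oldest backpressure policy from the table can be sketched with nested non-blocking channel operations. This is an illustrative single-producer version, not the repository's code:

```go
package main

import "fmt"

// offer pushes v into ch; when the buffer is full it evicts the oldest
// element first. Safe only with a single producer — multiple producers
// would need a mutex around the evict-then-send pair.
func offer(ch chan int, v int) {
	select {
	case ch <- v: // fast path: buffer has room
	default:
		select {
		case <-ch: // evict oldest buffered element
		default:
		}
		select {
		case ch <- v: // retry the send
		default:
		}
	}
}

func main() {
	ch := make(chan int, 3)
	for v := 1; v <= 5; v++ {
		offer(ch, v) // 1,2,3 fill the buffer; 4 evicts 1; 5 evicts 2
	}
	close(ch)
	for v := range ch {
		fmt.Print(v, " ")
	}
	fmt.Println()
	// prints: 3 4 5
}
```

Dropping the oldest sample under overload biases the metrics toward recent behavior, which is usually the right trade-off for live progress reporting.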

Development

# Run all tests
make test

# Format code
make fmt

# Vet code
make vet

# Clean build artifacts
make clean

Non-Goals

  • Browser automation → use Playwright / Selenium
  • Custom scripting language → fork and extend in Go
  • Plugin system → fork and extend
  • Built-in web UI → use Grafana + Prometheus
  • Persistent run history → add a database layer

License

MIT — see LICENSE


Contributing

This is a reference implementation. For production hardening, consider:

  • Persistent state — database-backed run storage
  • Controller HA — leader election (etcd / Raft)
  • Dynamic agent discovery — service mesh / Consul
  • Protocol coverage — gRPC, WebSocket, TCP load testing
  • Advanced profiles — custom scripted profiles
