Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
74 changes: 74 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
name: CI

on:
push:
branches: [main, master]
pull_request:
branches: [main, master]

jobs:
test:
strategy:
matrix:
go-version: ['1.21', '1.22', '1.23']
os: [ubuntu-latest, macos-latest]

runs-on: ${{ matrix.os }}

steps:
- uses: actions/checkout@v4

- name: Set up Go
uses: actions/setup-go@v5
with:
go-version: ${{ matrix.go-version }}

- name: Build
run: go build ./...

- name: Test
run: go test ./...

- name: Test with Race Detector
run: go test -race ./...

- name: Benchmark (sanity check)
run: go test -bench=. -benchtime=100ms ./internal/...

lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4

- name: Set up Go
uses: actions/setup-go@v5
with:
go-version: '1.23'

- name: Run golangci-lint
uses: golangci/golangci-lint-action@v4
with:
version: latest

benchmark:
runs-on: ubuntu-latest
needs: test
if: github.event_name == 'push' && github.ref == 'refs/heads/main'

steps:
- uses: actions/checkout@v4

- name: Set up Go
uses: actions/setup-go@v5
with:
go-version: '1.23'

- name: Run Benchmarks
run: |
go test -bench=. -count=5 -benchmem ./internal/... | tee benchmark_results.txt

- name: Upload Benchmark Results
uses: actions/upload-artifact@v4
with:
name: benchmark-results
path: benchmark_results.txt
266 changes: 266 additions & 0 deletions BENCHMARKING.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,266 @@
# Benchmarking Guide

This document provides guidance for running and interpreting benchmarks.

## Quick Start

```bash
# Run all benchmarks
make bench

# Run with multiple iterations for variance analysis
make bench-count

# Run specific package
go test -bench=. -benchmem ./internal/cancel
```

## Environment Setup

### Linux (Recommended)

For consistent, reproducible results:

```bash
# 1. Set CPU governor to performance (prevents frequency scaling)
sudo cpupower frequency-set -g performance

# 2. Disable turbo boost (for consistent clock speed)
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo

# 3. Verify CPU frequency is stable
watch -n1 "cat /proc/cpuinfo | grep MHz | head -4"

# 4. Check for background processes
top -bn1 | head -20
```

### GOMAXPROCS

Control how many OS threads execute Go code:

```bash
# Single-threaded execution (lowest variance, no goroutine scheduling noise)
GOMAXPROCS=1 go test -bench=. ./internal/...

# Match physical cores (no hyperthreading)
GOMAXPROCS=4 go test -bench=. ./internal/...

# Default: uses all logical CPUs (GOMAXPROCS=runtime.NumCPU())
go test -bench=. ./internal/...
```

**When to use:**
- `GOMAXPROCS=1`: Best for measuring raw single-threaded performance
- `GOMAXPROCS=N`: For parallel benchmarks (`b.RunParallel`)
- Default: For realistic multi-core scenarios

### Pinning to Single Core (Lowest Variance)

```bash
# Run on CPU 0 only
taskset -c 0 go test -bench=. ./internal/...

# Combined: single core + single GOMAXPROCS (ultimate isolation)
taskset -c 0 GOMAXPROCS=1 go test -bench=. ./internal/...
```

### Scheduler Priority (nice/renice)

Increase process priority to reduce interference from other processes:

```bash
# Run with highest priority (requires root)
sudo nice -n -20 go test -bench=. ./internal/...

# Or renice an existing process
sudo renice -n -20 -p $(pgrep -f "go test")
```

**Nice values:**
- `-20`: Highest priority (most CPU time)
- `0`: Default priority
- `19`: Lowest priority (least CPU time)

**Combined with CPU pinning for maximum isolation:**

```bash
sudo nice -n -20 taskset -c 0 GOMAXPROCS=1 go test -bench=. ./internal/...
```

> **Note:** High priority alone doesn't prevent context switches. For true isolation, combine with CPU pinning and consider isolating CPU cores from the scheduler (`isolcpus` kernel parameter).

### macOS

```bash
# Disable App Nap (can affect timing)
defaults write NSGlobalDomain NSAppSleepDisabled -bool YES

# Run with elevated priority (macOS equivalent of nice)
sudo nice -n -20 go test -bench=. ./internal/...
```

### Advanced: Kernel-Level CPU Isolation

For the most stable benchmarks on dedicated machines:

```bash
# 1. Add to kernel boot parameters (GRUB)
# isolcpus=2,3 nohz_full=2,3 rcu_nocbs=2,3

# 2. After reboot, CPUs 2-3 are isolated from scheduler
# Run benchmarks on isolated CPU:
sudo taskset -c 2 nice -n -20 GOMAXPROCS=1 go test -bench=. ./internal/...
```

This removes the CPUs from general scheduling entirely.

## Running Benchmarks

### Standard Run

```bash
go test -bench=. -benchmem ./internal/...
```

### With Variance Analysis

Run 10 iterations and analyze with `benchstat`:

```bash
# Install benchstat
go install golang.org/x/perf/cmd/benchstat@latest

# Run benchmarks
go test -bench=. -count=10 ./internal/... > results.txt

# Analyze
benchstat results.txt
```

### Comparing Before/After

```bash
# Before changes
go test -bench=. -count=10 ./internal/... > old.txt

# Make changes...

# After changes
go test -bench=. -count=10 ./internal/... > new.txt

# Compare
benchstat old.txt new.txt
```

## Interpreting Results

### Understanding Output

```
BenchmarkCancel_Atomic_Done_Direct-24 1000000000 0.34 ns/op 0 B/op 0 allocs/op
```

- `-24`: Number of CPUs used (GOMAXPROCS)
- `1000000000`: Iterations run
- `0.34 ns/op`: Time per operation
- `0 B/op`: Bytes allocated per operation
- `0 allocs/op`: Heap allocations per operation

### Expected Variance

- **Good:** < 2% variance
- **Acceptable:** 2-5% variance
- **Investigate:** > 5% variance

High variance causes and mitigations:

| Cause | Mitigation |
|-------|------------|
| Background processes | `nice -n -20`, close browsers/IDEs |
| CPU frequency scaling | Set governor to `performance` |
| Thermal throttling | Let CPU cool between runs |
| Memory pressure | Close memory-heavy apps |
| Goroutine scheduling | `GOMAXPROCS=1` |
| OS scheduler preemption | `taskset -c 0` + `nice -n -20` |
| Hyperthreading noise | Pin to physical core |

### Sanity Checks

1. **Allocations should be 0** for hot-path operations
2. **Relative ordering should be stable** across runs
3. **TSC results may vary** with CPU frequency changes

## CLI Tools

### cmd/context

Compare context cancellation checking:

```bash
go run ./cmd/context -n 10000000
```

### cmd/channel

Compare queue implementations:

```bash
go run ./cmd/channel -n 10000000 -size 1024
```

### cmd/ticker

Compare ticker implementations:

```bash
go run ./cmd/ticker -n 10000000
```

### cmd/context-ticker

Combined benchmark (most realistic):

```bash
go run ./cmd/context-ticker -n 10000000
```

## Typical Results

Results on AMD Ryzen Threadripper PRO 3945WX:

| Component | Standard | Optimized | Speedup |
|-----------|----------|-----------|---------|
| Cancel check | ~10 ns | ~0.3 ns | **30x** |
| Tick check | ~100 ns | ~6 ns (batch) | **16x** |
| Combined | ~96 ns | ~5 ns | **18x** |

## Caveats

1. **Micro-benchmarks measure one dimension** — Real applications have many factors
2. **Results are hardware-dependent** — Your mileage will vary
3. **go:linkname may break** — `runtime.nanotime` is internal
4. **TSC requires calibration** — Accuracy depends on CPU frequency stability

## Profiling

### CPU Profile

```bash
go test -bench=BenchmarkCancel -cpuprofile=cpu.prof ./internal/cancel
go tool pprof -http=:8080 cpu.prof
```

### Memory Profile

```bash
go test -bench=BenchmarkQueue -memprofile=mem.prof ./internal/queue
go tool pprof -http=:8080 mem.prof
```

### Trace

```bash
go test -bench=BenchmarkCombined -trace=trace.out ./internal/combined
go tool trace trace.out
```
Loading
Loading