A distributed task queue with Raft consensus, built from scratch in Go.
Think: mini-SQS + mini-Celery, with the hard parts implemented.
- Raft Consensus - From-scratch implementation with leader election, log replication, and snapshots
- At-Least-Once Delivery - Tasks are never lost; may be processed multiple times
- Visibility Timeout - SQS-style lease mechanism prevents duplicate processing
- Automatic Failover - Leader dies? New one elected in seconds. Zero data loss.
- Idempotency Keys - Built-in deduplication for safe client retries
- Priority Queues - Higher priority tasks processed first
- Dead-Letter Queue - Failed tasks moved after max retries
- Observability - Prometheus metrics, structured logging
```
┌─────────────────────────────────────────────────────────────┐
│                       DQueue Cluster                        │
│                                                             │
│  ┌──────────┐        ┌──────────┐        ┌──────────┐       │
│  │  Node 1  │◀──────▶│  Node 2  │◀──────▶│  Node 3  │       │
│  │ (Leader) │  Raft  │(Follower)│  Raft  │(Follower)│       │
│  └────┬─────┘        └──────────┘        └──────────┘       │
│       │                                                     │
│       │ HTTP API                                            │
└───────┼─────────────────────────────────────────────────────┘
        │
   ┌────┴────┐
   │ Clients │
   │ Workers │
   └─────────┘
```
- Go 1.22+
- Docker & Docker Compose (for cluster mode)
- Make
```bash
# Build
make build

# Run single node (no Raft, for quick testing)
./bin/dqueue --id=dev --bootstrap=true
```

```bash
# Start cluster
docker-compose -f deploy/docker-compose.yml up -d

# Check health
curl http://localhost:8081/health

# View logs
docker-compose -f deploy/docker-compose.yml logs -f
```

```bash
# Using CLI
./bin/dqueue-cli enqueue -type email -payload '{"to":"user@example.com"}'

# Using curl
curl -X POST http://localhost:8081/tasks \
  -H "Content-Type: application/json" \
  -d '{"type":"email","payload":{"to":"user@example.com"}}'
```

Start an example worker:

```bash
./bin/dqueue-worker --server http://localhost:8081
```

POST /tasks
```jsonc
{
  "type": "email",
  "payload": {"to": "user@example.com", "subject": "Hello"},
  "priority": 10,           // optional: higher = more urgent
  "max_retries": 3,         // optional: default 3
  "idempotency_key": "abc"  // optional: for deduplication
}
```

POST /tasks/lease
```json
{
  "worker_id": "worker-1"
}
```

Returns the next available task. The task is "invisible" to other workers until it is ACKed or its visibility timeout expires.
POST /tasks/{id}/ack
```json
{
  "worker_id": "worker-1"
}
```

POST /tasks/{id}/nack
```json
{
  "worker_id": "worker-1",
  "reason": "database connection failed"
}
```

The task will be retried with exponential backoff, or moved to the dead-letter queue after max retries.
GET /tasks/{id}
GET /health
GET /stats

Tasks may be delivered multiple times if:
- Worker crashes after processing but before ACKing
- Network issues cause ACK to be lost
- Visibility timeout expires during processing
Your task handlers should be idempotent.
Exactly-once delivery is impossible in a distributed system without coordination, such as two-phase commit, between the queue and its consumers. We chose at-least-once because:
- It's what SQS and most production queues use
- It's simpler and more performant
- Idempotent handlers are a well-understood pattern
Watch automatic leader election:
```bash
# Terminal 1: Start cluster
docker-compose -f deploy/docker-compose.yml up

# Terminal 2: Run failover demo
./scripts/demo-failover.sh
```

What happens:

- Identify the current leader
- Enqueue tasks
- Kill the leader (`docker kill`)
- Watch a new leader get elected (~3 seconds)
- Verify zero task loss
Run the chaos test to verify reliability under failure:
```bash
./scripts/chaos-test.sh
```

This script:
- Continuously enqueues tasks
- Kills the leader every 10 seconds
- Verifies no data loss after 60 seconds
Results: Zero committed tasks lost. Transient failures during leader transitions are expected (clients should retry).
| Flag | Default | Description |
|---|---|---|
| `--id` | (required) | Unique node ID |
| `--data-dir` | `./data` | Data directory |
| `--http-addr` | `:8080` | HTTP API address |
| `--grpc-addr` | `:9090` | gRPC (Raft) address |
| `--peers` | (empty) | Peer addresses: `id1=addr1,id2=addr2` |
| `--bootstrap` | `false` | Bootstrap a new cluster |
| `--lease-timeout` | `30s` | Visibility timeout |
| `--max-retries` | `3` | Max retries before dead-letter |
The Raft log serves as the write-ahead log. Queue state is derived by replaying committed entries (event sourcing). This avoids double-write complexity.
Leases (visibility timeouts) are NOT replicated via Raft. This is a deliberate trade-off:
- Pro: Much higher throughput (no consensus per lease)
- Con: If leader dies, leased tasks become available again
- This matches how SQS visibility timeout works
We implement core Raft without extensions like pre-vote, joint consensus, or linearizable reads. See docs/raft.md for rationale.
```
├── cmd/
│   ├── dqueue/           # Queue node binary
│   ├── dqueue-worker/    # Example worker
│   └── dqueue-cli/       # CLI tool
├── pkg/
│   ├── raft/             # Raft consensus implementation
│   ├── queue/            # Task queue logic
│   ├── storage/          # Unified WAL storage
│   ├── transport/        # gRPC + HTTP transport
│   └── client/           # Go SDK
├── docs/
│   ├── architecture.md   # System design
│   ├── raft.md           # Raft implementation details
│   └── failure-modes.md  # Failure scenarios
├── deploy/
│   ├── Dockerfile
│   └── docker-compose.yml
└── scripts/
    ├── demo-basic.sh
    ├── demo-failover.sh
    └── chaos-test.sh
```
```bash
# Build all binaries
make build

# Run tests
make test

# Generate protobuf (requires protoc)
make proto

# Build Docker image
make docker-build
```

- Architecture - System design and data flow
- Raft Implementation - Consensus details and non-goals
- Failure Modes - How failures are handled
- Raft Paper - "In Search of an Understandable Consensus Algorithm"
- Amazon SQS - Visibility timeout inspiration
MIT