AwaisAliShahid/dqueue

DQueue

A distributed task queue with Raft consensus, built from scratch in Go.

Think: mini-SQS + mini-Celery, with the hard parts implemented.


Features

  • Raft Consensus - From-scratch implementation with leader election, log replication, and snapshots
  • At-Least-Once Delivery - Tasks are never lost; may be processed multiple times
  • Visibility Timeout - SQS-style lease mechanism prevents duplicate processing
  • Automatic Failover - Leader dies? New one elected in seconds. Zero data loss.
  • Idempotency Keys - Built-in deduplication for safe client retries
  • Priority Queues - Higher priority tasks processed first
  • Dead-Letter Queue - Failed tasks moved after max retries
  • Observability - Prometheus metrics, structured logging

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                          DQueue Cluster                          │
│                                                                  │
│  ┌────────────┐        ┌────────────┐        ┌────────────┐      │
│  │   Node 1   │◀──────▶│   Node 2   │◀──────▶│   Node 3   │      │
│  │  (Leader)  │  Raft  │ (Follower) │  Raft  │ (Follower) │      │
│  └─────┬──────┘        └────────────┘        └────────────┘      │
│        │                                                         │
│        │ HTTP API                                                │
└────────┼─────────────────────────────────────────────────────────┘
         │
    ┌────┴────┐
    │ Clients │
    │ Workers │
    └─────────┘

Quick Start

Prerequisites

  • Go 1.22+
  • Docker & Docker Compose (for cluster mode)
  • Make

Run a Single Node (Development)

# Build
make build

# Run single node (no Raft, for quick testing)
./bin/dqueue --id=dev --bootstrap=true

Run a 3-Node Cluster

# Start cluster
docker-compose -f deploy/docker-compose.yml up -d

# Check health
curl http://localhost:8081/health

# View logs
docker-compose -f deploy/docker-compose.yml logs -f

Enqueue Tasks

# Using CLI
./bin/dqueue-cli enqueue -type email -payload '{"to":"user@example.com"}'

# Using curl
curl -X POST http://localhost:8081/tasks \
  -H "Content-Type: application/json" \
  -d '{"type":"email","payload":{"to":"user@example.com"}}'

Run a Worker

./bin/dqueue-worker --server http://localhost:8081

API Reference

Enqueue Task

POST /tasks
{
  "type": "email",
  "payload": {"to": "user@example.com", "subject": "Hello"},
  "priority": 10,           # Optional: higher = more urgent
  "max_retries": 3,         # Optional: default 3
  "idempotency_key": "abc"  # Optional: for deduplication
}

Lease Task (for workers)

POST /tasks/lease
{
  "worker_id": "worker-1"
}

Returns the next available task. The task is invisible to other workers until it is ACKed or its visibility timeout expires.

Acknowledge Task (success)

POST /tasks/{id}/ack
{
  "worker_id": "worker-1"
}

Negative Acknowledge (failure)

POST /tasks/{id}/nack
{
  "worker_id": "worker-1",
  "reason": "database connection failed"
}

The task is retried with exponential backoff, or moved to the dead-letter queue after max retries.

Get Task Status

GET /tasks/{id}

Health Check

GET /health

Statistics

GET /stats

Delivery Guarantees

At-Least-Once Delivery

Tasks may be delivered multiple times if:

  • Worker crashes after processing but before ACKing
  • Network issues cause ACK to be lost
  • Visibility timeout expires during processing

Your task handlers should be idempotent.
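For example, deduplicating on the task ID (or idempotency key) before performing the side effect makes a handler safe under redelivery. An in-memory sketch; a production handler would persist the seen-set durably (e.g. a unique key in a database):

```go
package main

import (
	"fmt"
	"sync"
)

// seen records task IDs that have already been handled, so redelivered
// tasks become no-ops.
type seen struct {
	mu   sync.Mutex
	done map[string]bool
}

// handleOnce runs fn only the first time taskID is seen; it reports
// whether the side effect actually ran.
func (s *seen) handleOnce(taskID string, fn func()) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.done[taskID] {
		return false // duplicate delivery: skip side effects
	}
	fn()
	s.done[taskID] = true
	return true
}

func main() {
	s := &seen{done: map[string]bool{}}
	sends := 0
	for i := 0; i < 3; i++ { // the same task delivered three times
		s.handleOnce("task-42", func() { sends++ })
	}
	fmt.Println("emails sent:", sends) // side effect happened exactly once
}
```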

Why Not Exactly-Once?

Exactly-once delivery is not achievable end-to-end in a distributed system: a worker can always crash between processing a task and acknowledging it. We chose at-least-once because:

  • It's what SQS and most production queues use
  • It's simpler and more performant
  • Idempotent handlers are a well-understood pattern

Failover Demo

Watch automatic leader election:

# Terminal 1: Start cluster
docker-compose -f deploy/docker-compose.yml up

# Terminal 2: Run failover demo
./scripts/demo-failover.sh

What happens:

  1. Identify current leader
  2. Enqueue tasks
  3. Kill the leader (docker kill)
  4. Watch new leader elected (~3 seconds)
  5. Verify zero task loss

Chaos Testing

Run the chaos test to verify reliability under failure:

./scripts/chaos-test.sh

This script:

  • Continuously enqueues tasks
  • Kills the leader every 10 seconds
  • Verifies no data loss after 60 seconds

Results: Zero committed tasks lost. Transient failures during leader transitions are expected (clients should retry).

Configuration

Flag             Default   Description
--id             required  Unique node ID
--data-dir       ./data    Data directory
--http-addr      :8080     HTTP API address
--grpc-addr      :9090     gRPC (Raft) address
--peers          (empty)   Peer addresses: id1=addr1,id2=addr2
--bootstrap      false     Bootstrap new cluster
--lease-timeout  30s       Visibility timeout
--max-retries    3         Max retries before dead-letter

Design Decisions

Unified Log

The Raft log serves as the write-ahead log. Queue state is derived by replaying committed entries (event sourcing). This avoids double-write complexity.
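The replay step can be sketched as a pure fold over committed entries; the op names below are illustrative, not DQueue's actual command set:

```go
package main

import "fmt"

// entry is a simplified committed Raft log entry.
type entry struct {
	Op     string // "enqueue" or "ack"
	TaskID string
}

// replay derives queue state purely from the committed log — the
// event-sourcing pattern: there is no separate write path to drift
// out of sync with.
func replay(log []entry) map[string]bool {
	pending := map[string]bool{}
	for _, e := range log {
		switch e.Op {
		case "enqueue":
			pending[e.TaskID] = true
		case "ack":
			delete(pending, e.TaskID)
		}
	}
	return pending
}

func main() {
	log := []entry{
		{"enqueue", "t1"},
		{"enqueue", "t2"},
		{"ack", "t1"},
	}
	fmt.Println(replay(log)) // only t2 remains pending
}
```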

Leader-Local Leases

Leases (visibility timeouts) are NOT replicated via Raft. This is a deliberate trade-off:

  • Pro: Much higher throughput (no consensus per lease)
  • Con: If leader dies, leased tasks become available again
  • This matches how SQS visibility timeout works

Scoped Raft

We implement core Raft without extensions like pre-vote, joint consensus, or linearizable reads. See docs/raft.md for rationale.

Project Structure

├── cmd/
│   ├── dqueue/          # Queue node binary
│   ├── dqueue-worker/   # Example worker
│   └── dqueue-cli/      # CLI tool
├── pkg/
│   ├── raft/            # Raft consensus implementation
│   ├── queue/           # Task queue logic
│   ├── storage/         # Unified WAL storage
│   ├── transport/       # gRPC + HTTP transport
│   └── client/          # Go SDK
├── docs/
│   ├── architecture.md  # System design
│   ├── raft.md          # Raft implementation details
│   └── failure-modes.md # Failure scenarios
├── deploy/
│   ├── Dockerfile
│   └── docker-compose.yml
└── scripts/
    ├── demo-basic.sh
    ├── demo-failover.sh
    └── chaos-test.sh

Building

# Build all binaries
make build

# Run tests
make test

# Generate protobuf (requires protoc)
make proto

# Build Docker image
make docker-build

Documentation

References

  • Raft Paper - "In Search of an Understandable Consensus Algorithm"
  • Amazon SQS - Visibility timeout inspiration

License

MIT
