A Redis-compatible, high-performance distributed in-memory cache built in Go.
- Core Store: 256-shard concurrent map with RWMutex per shard, xxHash, 4 data types (string, list, hash, set)
- RESP Protocol: Full Redis wire compatibility; works with `redis-cli`, `redis-benchmark`, and all Redis client libraries
- Persistence: Append-only file (AOF) with configurable fsync + binary snapshots with CRC-32C checksums
- Replication: Async master-slave replication with PSYNC, full/partial resync, and connection hijacking
- Observability: Prometheus metrics on `/metrics` with per-command latency histograms
- 80+ Commands: Complete coverage of string, list, hash, set, key management, and server commands
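Because the wire protocol is standard RESP, existing Redis client libraries work unchanged. A minimal example using go-redis (github.com/redis/go-redis/v9 — the client choice here is just an illustration):

```go
package main

import (
    "context"
    "fmt"

    "github.com/redis/go-redis/v9"
)

func main() {
    ctx := context.Background()
    rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"}) // DCache default port

    if err := rdb.Set(ctx, "mykey", "hello", 0).Err(); err != nil {
        panic(err)
    }
    val, err := rdb.Get(ctx, "mykey").Result()
    if err != nil {
        panic(err)
    }
    fmt.Println(val) // hello
}
```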
| Operation | Throughput | Latency |
|---|---|---|
| GET (parallel) | 50M+ ops/sec | 22.73 ns |
| SET (parallel) | 42M+ ops/sec | 27.23 ns |
| Mixed 80/20 | 33M+ ops/sec | 36.49 ns |
| DELETE | 65M+ ops/sec | 18.07 ns |
Benchmarked on an Apple MacBook Air (M2, 8 cores).
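The parallel figures come from Go's built-in benchmark harness. A representative sketch, where the store constructor and accessors (`store.New`, `Set`, `Get`) are stand-ins for the real API in `internal/store`:

```go
package benchmark

import (
    "testing"

    // Illustrative import path for the sharded store package.
    "github.com/aravinth/distributed-cache/internal/store"
)

// BenchmarkGetParallel hammers a single hot key from all cores, mirroring the
// "GET (parallel)" row above.
func BenchmarkGetParallel(b *testing.B) {
    s := store.New()
    s.Set("key", []byte("value"))
    b.ReportAllocs()
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            s.Get("key")
        }
    })
}
```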
# Clone and build
git clone https://github.com/aravinth/distributed-cache
cd distributed-cache
make build
# Run the server
./dcache
# Or with all options
./dcache --port 6379 --appendonly --aof-fsync everysec --save-interval 300 --metrics-port 9090

redis-cli
> PING
PONG
> SET mykey "hello"
OK
> GET mykey
"hello"
> LPUSH mylist a b c
(integer) 3
> HSET myhash name dcache version 1
(integer) 2
> SADD myset x y z
(integer) 3
> INFO

# Terminal 1: Start master
./dcache --port 6379
# Terminal 2: Start slave
./dcache --port 6380 --replicaof localhost:6379
# Write to master, read from slave
redis-cli -p 6379 SET hello world
redis-cli -p 6380 GET hello   # → "world"

# Start dcache + Prometheus + Grafana
docker compose up --build
# dcache on :6379, metrics on :9090, Prometheus on :9091, Grafana on :3000

```
Client ──→ TCP Server (connection-per-goroutine)
                      │
                      ▼
            RESP Protocol Parser
                      │
                      ▼
         Command Handler (Execute)
           │      │      │      │
           ▼      ▼      ▼      ▼
        String   List   Hash   Set    ← Data Type Operations
           │      │      │      │
           └──────┴───┬──┴──────┘
                      │
                      ▼
 Sharded Map (256 shards, RWMutex per shard)
                      │
           ┌──────────┼──────────┐
           ▼          ▼          ▼
        AOF Log    Backlog    Metrics
        (disk)    (ring buf) (Prometheus)
           │          │
           ▼          ▼
        Recovery    Slaves
```
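The top of the diagram is a plain connection-per-goroutine accept loop. A minimal sketch, where `handleConn` stands in for the real RESP read/parse/execute/reply cycle in `internal/server`:

```go
package server

import (
    "log"
    "net"
)

// Serve accepts TCP connections and hands each one to its own goroutine.
func Serve(addr string) error {
    ln, err := net.Listen("tcp", addr)
    if err != nil {
        return err
    }
    for {
        conn, err := ln.Accept()
        if err != nil {
            return err
        }
        // One goroutine per client keeps parser and reply state private to
        // the connection; goroutines are cheap enough for 10k clients.
        go handleConn(conn)
    }
}

func handleConn(conn net.Conn) {
    defer conn.Close()
    // Real implementation: wrap conn in a RESP reader/writer and loop over
    // commands until the client disconnects or idles out.
    log.Printf("connection from %s", conn.RemoteAddr())
}
```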
Every command passes through `Handler.Execute()`, which is the single hook point (see the sketch after this list) for:
- Slave read-only guard - reject writes when operating as a slave
- Command dispatch - route to the appropriate handler
- Metrics - increment command counter, observe latency histogram
- AOF hook - log successful mutating commands for durability
- Replication hook - feed successful mutations to connected slaves
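A minimal sketch of that hook point; the type and field names here are illustrative, not the exact ones in internal/server/handler.go:

```go
package server

import (
    "errors"
    "time"
)

// Illustrative stand-ins for the real command, reply, and handler types.
type Command struct {
    Name  string
    Args  [][]byte
    Write bool // true for mutating commands
}

type Reply struct {
    Err error
    Val []byte
}

type Handler struct {
    isSlave  bool
    dispatch func(Command) Reply
    observe  func(cmd string, d time.Duration)
    aofLog   func(Command)
    replFeed func(Command)
}

// Execute is the single hook point every command flows through.
func (h *Handler) Execute(cmd Command) Reply {
    // 1. Slave read-only guard: reject writes while acting as a replica.
    if h.isSlave && cmd.Write {
        return Reply{Err: errors.New("READONLY You can't write against a read only replica")}
    }

    // 2. Dispatch to the per-command handler, timing it for the latency histogram.
    start := time.Now()
    reply := h.dispatch(cmd)
    h.observe(cmd.Name, time.Since(start))

    // 3. Durability and replication hooks fire only for successful mutations.
    if cmd.Write && reply.Err == nil {
        h.aofLog(cmd)   // append to the AOF
        h.replFeed(cmd) // stream to connected slaves via the backlog
    }
    return reply
}
```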
| Flag | Default | Description |
|---|---|---|
| `--port` | `6379` | TCP port |
| `--maxclients` | `10000` | Maximum concurrent connections |
| `--timeout` | `300` | Idle connection timeout (seconds) |
| `--dir` | `data/` | Data directory for persistence |
| `--appendonly` | `true` | Enable append-only file |
| `--aof-fsync` | `everysec` | AOF fsync policy: `always`, `everysec`, `no` |
| `--save-interval` | `300` | Snapshot interval in seconds (0 = disabled) |
| `--replicaof` | | Make this server a replica: `host:port` |
| `--repl-backlog-size` | `1048576` | Replication backlog size (bytes) |
| `--metrics-port` | `9090` | Prometheus HTTP port (0 = disabled) |
When `--metrics-port` is set, DCache exposes Prometheus metrics at http://localhost:9090/metrics.
| Metric | Type | Description |
|---|---|---|
| `dcache_commands_total` | Counter | Commands processed (by name) |
| `dcache_command_duration_seconds` | Histogram | Per-command latency |
| `dcache_keys_total` | Gauge | Total keys in store |
| `dcache_cache_hits_total` | Gauge | Cache hit count |
| `dcache_cache_misses_total` | Gauge | Cache miss count |
| `dcache_cache_hit_ratio` | Gauge | Hit ratio (0.0–1.0) |
| `dcache_connections_active` | Gauge | Current connections |
| `dcache_replication_slaves_connected` | Gauge | Connected replicas |
| `dcache_snapshot_in_progress` | Gauge | Snapshot running (0/1) |
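For reference, a sketch of how the per-command latency histogram could be declared with prometheus/client_golang; the actual buckets and registration live in internal/metrics/prometheus.go and may differ:

```go
package metrics

import (
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// commandDuration backs dcache_command_duration_seconds, labelled by command
// name. The bucket layout here is an assumption (1µs up to ~0.5s).
var commandDuration = prometheus.NewHistogramVec(
    prometheus.HistogramOpts{
        Name:    "dcache_command_duration_seconds",
        Help:    "Per-command latency.",
        Buckets: prometheus.ExponentialBuckets(1e-6, 2, 20),
    },
    []string{"command"},
)

func init() {
    prometheus.MustRegister(commandDuration)
}

// ObserveCommand records one command execution in the histogram.
func ObserveCommand(name string, d time.Duration) {
    commandDuration.WithLabelValues(name).Observe(d.Seconds())
}

// Serve exposes /metrics on the configured metrics port.
func Serve(addr string) error {
    http.Handle("/metrics", promhttp.Handler())
    return http.ListenAndServe(addr, nil)
}
```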
- With 1000 concurrent clients, ~4 clients per shard on average
- Power of 2 enables O(1) shard selection via bit masking (`hash & 0xFF`)
- RWMutex provides consistent latency for read-heavy and delete-heavy workloads
- xxHash is ~3x faster than FNV-1a, with excellent distribution (no hot shards)
- Used by ClickHouse, ScyllaDB, and other high-performance systems
- TTLs are checked lazily on access - zero background overhead
- Trade-off: expired keys consume memory until accessed
- Suitable for high-frequency access patterns in trading systems
- All counters use `sync/atomic` - zero lock contention for observability
- Per-shard metrics aggregated on demand
- AOF: Buffered channel (64K), non-blocking `Log()`, configurable fsync
- Snapshots: Shard-by-shard binary encode with CRC-32C checksums
- Recovery: Load latest snapshot, replay AOF tail
- Master streams mutations via bounded ring buffer backlog (1MB default)
- Full sync: snapshot transfer + backlog gap
- Partial sync: backlog slice from last offset
- Connection hijacking: PSYNC switches from request-response to streaming
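A sketch of that backlog as a bounded ring buffer; the names and layout are illustrative, not the exact contents of internal/replication/backlog.go:

```go
package replication

// Backlog is a bounded ring buffer over the replication stream. The master
// appends every mutation; a reconnecting slave asks for "everything since
// offset X" and gets a partial resync only if X is still inside the window.
type Backlog struct {
    buf    []byte
    size   int64
    offset int64 // stream offset of the oldest byte still buffered
    length int64 // bytes currently buffered (<= size)
}

func NewBacklog(size int64) *Backlog {
    return &Backlog{buf: make([]byte, size), size: size}
}

// Append adds replicated bytes, overwriting the oldest data once full.
func (b *Backlog) Append(p []byte) {
    for _, c := range p {
        b.buf[(b.offset+b.length)%b.size] = c
        if b.length < b.size {
            b.length++
        } else {
            b.offset++ // the oldest byte was just overwritten
        }
    }
}

// Slice returns the stream since `from`, or false if the slave has fallen
// behind the window and needs a full resync (snapshot + backlog gap).
func (b *Backlog) Slice(from int64) ([]byte, bool) {
    if from < b.offset || from > b.offset+b.length {
        return nil, false
    }
    n := b.offset + b.length - from
    out := make([]byte, n)
    for i := int64(0); i < n; i++ {
        out[i] = b.buf[(from+i)%b.size]
    }
    return out, true
}
```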
| Type | Implementation | Concurrency |
|---|---|---|
| String | `[]byte` with atomic pointer swap | Lock-free for reads |
| List | Doubly-linked list | Per-list RWMutex |
| Hash | `map[string][]byte` | Per-hash RWMutex |
| Set | `map[string]struct{}` | Per-set RWMutex |
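The lock-free string reads in the table above can be achieved with an atomic pointer swap. A sketch assuming Go 1.19+ and illustrative names:

```go
package datatype

import "sync/atomic"

// StringValue keeps the bytes behind an atomic pointer: readers never lock,
// writers publish a fresh copy with a single pointer swap.
type StringValue struct {
    p atomic.Pointer[[]byte]
}

// Get is lock-free: it only loads the current pointer.
func (s *StringValue) Get() ([]byte, bool) {
    if b := s.p.Load(); b != nil {
        return *b, true
    }
    return nil, false
}

// Set copies the input so later caller mutations can't leak into the store,
// then swaps the pointer in one atomic store.
func (s *StringValue) Set(val []byte) {
    cp := make([]byte, len(val))
    copy(cp, val)
    s.p.Store(&cp)
}
```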
Server: PING, ECHO, QUIT, SELECT, INFO, COMMAND, DBSIZE, FLUSHDB, FLUSHALL
Keys: DEL, EXISTS, EXPIRE, PEXPIRE, TTL, PTTL, PERSIST, TYPE, RENAME, KEYS
Strings: GET, SET (EX/PX/NX/XX), SETNX, GETSET, MGET, MSET, INCR, INCRBY, DECR, DECRBY, INCRBYFLOAT, APPEND, STRLEN, GETRANGE, SETRANGE
Lists: LPUSH, RPUSH, LPOP, RPOP, LLEN, LINDEX, LRANGE, LSET, LTRIM, LREM, LINSERT, LPUSHX, RPUSHX, LMOVE
Hashes: HSET, HGET, HGETALL, HDEL, HEXISTS, HLEN, HKEYS, HVALS, HMGET, HSETNX, HINCRBY, HINCRBYFLOAT, HSTRLEN
Sets: SADD, SREM, SISMEMBER, SMISMEMBER, SMEMBERS, SCARD, SPOP, SRANDMEMBER, SMOVE, SUNION, SUNIONSTORE, SINTER, SINTERSTORE, SINTERCARD, SDIFF, SDIFFSTORE
Persistence: BGSAVE, BGREWRITEAOF, LASTSAVE
Replication: REPLICAOF, SLAVEOF, REPLCONF, PSYNC
distributed-cache/
├── cmd/server/main.go # Entry point with flags + signal handling
├── internal/
│ ├── store/ # 256-shard concurrent map
│ │ ├── entry.go # Cache entry with TTL
│ │ └── shard.go # ShardedMap + per-shard metrics
│ ├── datatype/ # Data type operations
│ │ ├── string.go # String commands
│ │ ├── list.go # Doubly-linked list
│ │ ├── hash.go # Hash table
│ │ └── set.go # Set operations
│ ├── protocol/ # RESP wire format
│ │ ├── resp.go # Types + sentinels
│ │ ├── reader.go # Parser (+ inline support)
│ │ └── writer.go # Serialiser
│ ├── server/ # TCP server
│ │ ├── server.go # Accept loop + graceful shutdown
│ │ ├── connection.go # Connection wrapper
│ │ └── handler.go # Command routing + hooks
│ ├── persistence/ # AOF + snapshots
│ │ ├── config.go # Persistence config
│ │ ├── aof.go # AOF writer (buffered channel)
│ │ ├── aof_rewrite.go # AOF compaction
│ │ ├── snapshot.go # Binary snapshot engine
│ │ ├── snapshot_format.go # Encoder/decoder + CRC-32C
│ │ ├── recovery.go # Startup recovery
│ │ └── mutating.go # Mutating command set
│ ├── replication/ # Master-slave replication
│ │ ├── role.go # ReplState + role management
│ │ ├── backlog.go # Ring buffer for repl stream
│ │ ├── master.go # PSYNC, sync, streaming
│ │ └── slave.go # Connect, handshake, apply
│ ├── metrics/ # Prometheus observability
│ │ ├── prometheus.go # Collectors + metric definitions
│ │ └── server.go # HTTP /metrics endpoint
│ └── pool/ # sync.Pool for bufio buffers
│ └── pool.go
├── test/
│ ├── unit/ # Unit tests
│ └── benchmark/ # Benchmark suite
├── docs/
│ └── design-decisions.md # Technical decisions explained
├── scripts/
│ └── bench.sh # redis-benchmark harness
├── Dockerfile # Multi-stage build
├── docker-compose.yml # dcache + Prometheus + Grafana
├── prometheus.yml # Scrape config
├── Makefile # build, test, bench, docker
└── README.md
| Operation | DCache | Redis | Notes |
|---|---|---|---|
| SET (single) | 24M/s | 110K/s | 218x faster |
| GET (single) | 13M/s | 120K/s | 108x faster |
| SET (parallel) | 42M/s | N/A | |
| GET (parallel) | 50M/s | N/A | |
Note: Redis benchmarks are I/O bound; this comparison shows in-process performance.
# Go benchmarks (in-process)
make bench
# redis-benchmark (over TCP)
make redis-bench

// Fine-grained locking: each shard has its own RWMutex
type ShardedMap struct {
    shards [256]*Shard // 256 independent partitions
}

type Shard struct {
    mu   sync.RWMutex // Per-shard lock
    data map[string]*Entry

    // Lock-free metrics
    gets   atomic.Uint64
    sets   atomic.Uint64
    hits   atomic.Uint64
    misses atomic.Uint64
}

The Rename operation requires locking two shards. Deadlocks are prevented by locking shards in consistent address order:
func (sm *ShardedMap) Rename(oldKey, newKey string) bool {
    // oldShard and newShard are the shards that oldKey and newKey hash to.
    // Locking them in ascending address order means two concurrent renames
    // can never hold the pair in opposite order, so they cannot deadlock.
    first, second := oldShard, newShard
    if uintptr(unsafe.Pointer(oldShard)) > uintptr(unsafe.Pointer(newShard)) {
        first, second = newShard, oldShard
    }
    first.mu.Lock()
    second.mu.Lock()
    // ... perform rename ...
}
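Shard selection and the lazy TTL check, sketched against the types above. The xxHash package (github.com/cespare/xxhash/v2) and the Entry fields (Value, ExpiresAt) are assumptions about the real code:

```go
func (sm *ShardedMap) getShard(key string) *Shard {
    // 256 shards: the low 8 bits of the 64-bit xxHash pick the shard.
    return sm.shards[xxhash.Sum64String(key)&0xFF]
}

func (sm *ShardedMap) Get(key string) ([]byte, bool) {
    s := sm.getShard(key)
    s.gets.Add(1)

    s.mu.RLock()
    e, ok := s.data[key]
    s.mu.RUnlock()
    if !ok {
        s.misses.Add(1)
        return nil, false
    }

    // Lazy expiration: the TTL is only ever checked here, on access.
    if !e.ExpiresAt.IsZero() && time.Now().After(e.ExpiresAt) {
        s.mu.Lock()
        delete(s.data, key) // reclaim memory the first time the key is touched
        s.mu.Unlock()
        s.misses.Add(1)
        return nil, false
    }

    s.hits.Add(1)
    return e.Value, true
}
```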