An ultra-fast, concurrent-safe, sharded bump-pointer memory arena for Go.
memoryArena avoids the immense GC overhead of millions of small object allocations by pre-allocating a large byte buffer and bumping a pointer. It is strictly optimized for high-concurrency environments, utilizing stack-pointer Fibonacci hashing to deterministically assign goroutines to hardware-isolated memory shards, virtually eliminating atomic cache-line contention (False Sharing).
- Zero GC Overhead: Allocate structs and slices directly into the arena.
- Thread-Safe & Lock-Free: Built on top of
atomic.Uint64with no locks or mutexes. - Cache-Line Isolated Shards: Automatically divides the arena into 64 padded shards. Goroutines are deterministically hashed to shards to prevent CPU cache-line bouncing.
- Graceful Linear Probing: If a shard runs out of space, the allocator instantly falls back to the next available shard.
- Architecture Safe: Uses Go 1.19+
atomic.Uint64to prevent 32-bit atomic alignment panics.
package main
import (
"fmt"
arena "github.com/Protocol-Lattice/memoryArena"
)
func main() {
// Create a 32MB arena
a := arena.NewMemoryArena(32 * 1024 * 1024)
// Allocate a single struct
val := arena.New(a, 42)
fmt.Println(*val)
// Allocate a slice
slice := arena.NewSlice[int](a, 100)
slice[0] = 99
// Reset the arena (reclaims all memory instantly)
a.Reset()
}memoryArena is optimized for two primary scenarios: extreme concurrency and high GC pressure.
By utilizing Thread-Local Allocation Buffers (TLABs) via runtime_procPin, the arena eliminates atomic operations on the hot path. Even under extreme multi-threaded contention, allocations take less than 1 nanosecond.
goos: darwin
goarch: arm64
pkg: github.com/Protocol-Lattice/memoryArena
cpu: Apple M2
BenchmarkArena_Contention-8 1000000000 0.9097 ns/op
When the Go heap is crowded with millions of objects, the standard make becomes slower due to GC background marking and scanning. memoryArena remains unaffected as its entire buffer is seen as a single object by the GC.
Benchmark (10M Live Objects, 80M Pointers):
| Workload | Latency | Allocs/Op | GC Overhead |
|---|---|---|---|
| Standard Heap | 20.58 ns/op | 1 | High |
| memoryArena | 5.85 ns/op | 0 | None |
The arena is ~3.5x faster under pressure while maintaining consistent p99 latency.
- Per-P TLABs: Each Go Processor (P) maintains a private 64KB allocation buffer. This allows for "zero-atomic" allocations in the fast path.
- Cache-Line Isolation: All internal structures are padded to 128 bytes (optimized for ARM64/M2) to prevent False Sharing.
- O(1) Reset: Uses a Generation Counter to instantly invalidate all thread-local buffers across all CPU cores during a
Reset(). - Sharded Fallback: A 64-shard global arena handles TLAB refills and large allocations with minimal contention.
package main
import (
"fmt"
arena "github.com/Protocol-Lattice/memoryArena"
)
func main() {
// Create a 32MB arena
a := arena.NewMemoryArena(32 * 1024 * 1024)
// Allocate a single struct
val := arena.New(a, 42)
fmt.Println(*val)
// Allocate a slice
slice := arena.NewSlice[int](a, 100)
slice[0] = 99
// Reset the arena (reclaims all memory instantly)
a.Reset()
}