memoryArena

An ultra-fast, concurrent-safe, sharded bump-pointer memory arena for Go.

Changelog

memoryArena avoids the GC overhead of millions of small object allocations by pre-allocating a large byte buffer and bumping a pointer. It is optimized for high-concurrency workloads: each goroutine's stack pointer is run through a Fibonacci hash to pick one of several cache-line-padded memory shards. Goroutines therefore rarely contend on the same atomic counter, and counters of neighboring shards never share a cache line (no false sharing).
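
To make the bump-pointer idea concrete, here is a minimal single-goroutine sketch. It is not memoryArena's implementation: the `bumpArena` type, `alignUp`, and `alloc` are hypothetical names, and alignment is computed relative to the start of the buffer rather than to the absolute address.

```go
package main

import "fmt"

// bumpArena is a hypothetical single-goroutine arena: one pre-allocated
// buffer and an offset that only moves forward.
type bumpArena struct {
	buf []byte
	off uintptr
}

// alignUp rounds n up to the next multiple of align (a power of two).
func alignUp(n, align uintptr) uintptr {
	return (n + align - 1) &^ (align - 1)
}

// alloc reserves size bytes at the given alignment and returns the
// sub-slice, or nil when the buffer is exhausted.
func (a *bumpArena) alloc(size, align uintptr) []byte {
	start := alignUp(a.off, align)
	end := start + size
	if end > uintptr(len(a.buf)) {
		return nil
	}
	a.off = end
	return a.buf[start:end:end] // cap limited so appends cannot overrun
}

func main() {
	a := &bumpArena{buf: make([]byte, 64)}
	a.alloc(3, 1)                      // occupies [0, 3)
	b := a.alloc(8, 8)                 // aligned up to offset 8, occupies [8, 16)
	fmt.Println(len(b), a.off)         // 8 16
	fmt.Println(a.alloc(64, 1) == nil) // true: only 48 bytes remain
}
```

Allocation is a bounds check and an addition, and freeing everything is a matter of setting `off` back to zero.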

Features

  • Zero GC Overhead: Structs and slices are allocated directly into the arena's pre-allocated buffer, so they create no additional heap objects for the GC to track.
  • Thread-Safe & Lock-Free: Built on top of atomic.Uint64 with no locks or mutexes.
  • Cache-Line Isolated Shards: Automatically divides the arena into 64 padded shards. Goroutines are deterministically hashed to shards to prevent CPU cache-line bouncing.
  • Graceful Linear Probing: If a shard runs out of space, the allocator falls back to the next shard that still has room.
  • Architecture Safe: Uses atomic.Uint64 (Go 1.19+), which is always 64-bit aligned, to prevent misaligned 64-bit atomic panics on 32-bit platforms.
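
The sharding, hashing, and probing features above can be sketched together as follows. This is an illustration under assumptions, not the library's code: `shard`, `sharded`, `shardIndex`, and `reserve` are hypothetical names, and the hash key is passed in by the caller because portable Go offers no way to read the stack pointer.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

const (
	numShards = 64                 // shard count stated in the feature list
	shardBits = 6                  // log2(numShards)
	fibMult   = 0x9E3779B97F4A7C15 // 2^64 divided by the golden ratio
)

// shard holds one atomic cursor, padded so the struct fills 128 bytes
// and neighboring cursors never share a cache line.
type shard struct {
	off   atomic.Uint64
	limit uint64
	_     [112]byte
}

// shardIndex maps a key to [0, numShards) with Fibonacci hashing:
// multiply by 2^64/phi and keep the top shardBits bits.
func shardIndex(key uint64) int {
	return int((key * fibMult) >> (64 - shardBits))
}

type sharded struct {
	shards [numShards]shard
}

func newSharded(perShard uint64) *sharded {
	s := &sharded{}
	for i := range s.shards {
		s.shards[i].limit = perShard
	}
	return s
}

// reserve claims size bytes in the hashed shard, probing linearly into
// the following shards when one is full. It returns the shard index and
// the offset inside that shard.
func (s *sharded) reserve(key, size uint64) (idx int, off uint64, ok bool) {
	start := shardIndex(key)
	for i := 0; i < numShards; i++ {
		idx = (start + i) % numShards
		sh := &s.shards[idx]
		for {
			cur := sh.off.Load()
			if cur+size > sh.limit {
				break // shard full: probe the next one
			}
			if sh.off.CompareAndSwap(cur, cur+size) {
				return idx, cur, true
			}
		}
	}
	return 0, 0, false
}

func main() {
	s := newSharded(16)
	home := shardIndex(12345)
	i1, o1, _ := s.reserve(12345, 16) // fills the home shard
	i2, o2, _ := s.reserve(12345, 8)  // home shard is full, lands in the next
	fmt.Println(i1 == home, o1, i2 == (home+1)%numShards, o2) // true 0 true 0
}
```

The compare-and-swap loop is one lock-free option; it never advances a cursor past its limit, which keeps a failed reservation from wasting space in a full shard.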

Usage

package main

import (
    "fmt"
    arena "github.com/Protocol-Lattice/memoryArena"
)

func main() {
    // Create a 32MB arena
    a := arena.NewMemoryArena(32 * 1024 * 1024)

    // Allocate a single struct
    val := arena.New(a, 42)
    fmt.Println(*val)

    // Allocate a slice
    slice := arena.NewSlice[int](a, 100)
    slice[0] = 99

    // Reset the arena (reclaims all memory instantly)
    a.Reset()
}

Performance & Benchmarks

memoryArena is optimized for two primary scenarios: extreme concurrency and high GC pressure.

1. High Contention (TLAB Optimization)

By using Thread-Local Allocation Buffers (TLABs) pinned to the current processor via runtime_procPin, the arena removes atomic operations from the hot path. In the contention benchmark below, run on an Apple M2 with 8 threads, each allocation takes about 0.91 ns.

goos: darwin
goarch: arm64
pkg: github.com/Protocol-Lattice/memoryArena
cpu: Apple M2
BenchmarkArena_Contention-8             1000000000               0.9097 ns/op

2. GC Pressure (Heap vs Arena)

When the Go heap holds millions of live objects, ordinary heap allocation with make slows down because the GC's background marking and scanning compete with the program for CPU time. memoryArena is unaffected because the GC sees its entire buffer as a single object.

Benchmark (10M Live Objects, 80M Pointers):

Workload        Latency       Allocs/Op   GC Overhead
Standard Heap   20.58 ns/op   1           High
memoryArena     5.85 ns/op    0           None

The arena is ~3.5x faster under pressure while maintaining consistent p99 latency.
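
The Allocs/Op difference can be observed in miniature with the standard library's testing.AllocsPerRun. The sketch below is not the benchmark behind the numbers above: `node`, `sink`, and `measure` are hypothetical, and a pre-allocated typed slice stands in for the arena's byte buffer.

```go
package main

import (
	"fmt"
	"testing"
)

// node is a small pointer-free payload.
type node struct{ a, b, c, d uint64 }

// sink is package-level so that stored pointers escape to the heap.
var sink *node

// measure returns the average heap allocations per call for a plain
// heap allocation and for handing out slots of a pre-allocated buffer.
func measure() (heap, arena float64) {
	heap = testing.AllocsPerRun(1000, func() {
		sink = new(node) // one heap object per call
	})

	buf := make([]node, 2048) // allocated once, before measuring
	next := 0
	arena = testing.AllocsPerRun(1000, func() {
		sink = &buf[next] // bump an index instead of allocating
		next++
	})
	return heap, arena
}

func main() {
	heap, arena := measure()
	fmt.Println("heap allocs/op:", heap, "arena allocs/op:", arena)
}
```

The heap variant should report one allocation per call and the buffer variant none, which mirrors the Allocs/Op column.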

Architecture

  • Per-P TLABs: Each Go Processor (P) maintains a private 64KB allocation buffer. This allows for "zero-atomic" allocations in the fast path.
  • Cache-Line Isolation: All internal structures are padded to 128 bytes (optimized for ARM64/M2) to prevent False Sharing.
  • O(1) Reset: Uses a Generation Counter to instantly invalidate all thread-local buffers across all CPU cores during a Reset().
  • Sharded Fallback: A 64-shard global arena handles TLAB refills and large allocations with minimal contention.
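
One way a generation counter can give an O(1) reset is sketched below. This is an assumed design for illustration only: `arena`, `localBuf`, and `chunk` are hypothetical, the sketch reads the generation with an atomic load on every call (the real fast path may avoid that), requests are assumed to fit in one chunk, and Reset is assumed not to run concurrently with allocations.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

const chunk = 1024 // size of one local buffer in this sketch

type arena struct {
	gen  atomic.Uint64 // incremented by every Reset
	next atomic.Uint64 // shared cursor that local buffers refill from
}

// Reset rewinds the shared cursor and bumps the generation. It never
// visits the local buffers; they notice the change on their next use.
func (a *arena) Reset() {
	a.next.Store(0)
	a.gen.Add(1)
}

// localBuf is a private window tagged with the generation it came from.
type localBuf struct {
	gen      uint64
	cur, end uint64
}

// alloc returns the offset of size bytes (size <= chunk is assumed).
func (l *localBuf) alloc(a *arena, size uint64) uint64 {
	if g := a.gen.Load(); g != l.gen {
		// Stale window: the arena was reset after it was handed out.
		l.gen, l.cur, l.end = g, 0, 0
	}
	if l.cur+size > l.end {
		l.cur = a.next.Add(chunk) - chunk
		l.end = l.cur + chunk
	}
	off := l.cur
	l.cur += size
	return off
}

func main() {
	a := &arena{}
	var l1, l2 localBuf
	fmt.Println(l1.alloc(a, 8), l1.alloc(a, 8), l2.alloc(a, 8)) // 0 8 1024
	a.Reset()
	// Both windows are now stale; whichever allocates first gets offset 0.
	fmt.Println(l2.alloc(a, 8), l1.alloc(a, 8)) // 0 1024
}
```

Reset costs two atomic writes regardless of how many local buffers exist, because invalidation is deferred to each buffer's next allocation.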
