Lode

Structured persistence for object storage.

Lode is an embeddable Go persistence framework that brings snapshots, metadata, and safe write semantics to object storage—without running a database. It offers two coequal persistence paradigms: Datasets for structured collections of named objects, and Volumes for sparse, resumable byte-range storage.


What Lode is (and is not)

Lode is:

  • An embeddable Go library
  • A persistence framework for structured data and sparse byte ranges
  • A layer of snapshots, atomic commits, and metadata guarantees you'd otherwise hand-roll on raw storage APIs
  • Explicit and predictable by design

Lode is not:

  • A database or storage engine
  • A query planner or execution runtime
  • A distributed system or background service

Lode owns structure and lifecycle, not execution.


Why Lode exists

Many systems already use object storage as their primary persistence layer:

  • Events written to S3
  • Parquet files on a filesystem
  • Analytics data dumped “by convention”

What’s usually missing is discipline:

  • No clear snapshots or versions
  • No metadata guarantees
  • Accidental overwrites
  • Ad-hoc directory layouts
  • Unclear read/write safety

Lode formalizes the parts that should be formalized and refuses to do the rest.


Core ideas

Lode is built around a small set of invariants:

Dataset (structured collections):

  • Datasets are immutable — writes produce snapshots
  • Snapshots reference manifests that describe data files and metadata
  • Storage, format, compression, and partitioning are orthogonal

Volume (sparse byte ranges):

  • Volumes are sparse, resumable byte address spaces
  • Volume commits track which byte ranges exist — gaps are explicit, never zero-filled
  • Each commit produces a cumulative manifest covering all committed blocks

Shared invariants:

  • Manifest presence is the commit signal — a snapshot is visible only after its manifest is persisted
  • Metadata is explicit, persisted, and self-describing (never inferred)
  • Single-writer semantics required — Lode does not resolve concurrent writer conflicts

If you know the snapshot ID, you know exactly what data you are reading.
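To make these invariants concrete, here is a minimal sketch (filesystem backend, errors elided, as in the Quick Start below): two writes produce two independent snapshots, and the first snapshot keeps returning the first write's data.

// Sketch: snapshot immutability (errors elided for brevity)
ds, _ := lode.NewDataset("demo", lode.NewFSFactory("/tmp/lode-demo"))

snapA, _ := ds.Write(ctx, []any{[]byte("v1")}, lode.Metadata{"rev": "a"})
snapB, _ := ds.Write(ctx, []any{[]byte("v2")}, lode.Metadata{"rev": "b"})
fmt.Println("snapshots:", snapA.ID, snapB.ID)

// snapA is unaffected by the later write: it still reads "v1"
dataA, _ := ds.Read(ctx, snapA.ID)
fmt.Println(string(dataA[0].([]byte)))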


Quick Start (5 minutes)

Install

go get github.com/justapithecus/lode

Write and read a blob

package main

import (
    "context"
    "fmt"
    "os"

    "github.com/justapithecus/lode/lode"
)

func main() {
    ctx := context.Background()

    // Lode does not create storage infrastructure (see Storage
    // Prerequisites below), so ensure the root directory exists first
    _ = os.MkdirAll("/tmp/lode-demo", 0o755)

    // Create a dataset with filesystem storage (errors elided for brevity)
    ds, _ := lode.NewDataset("mydata", lode.NewFSFactory("/tmp/lode-demo"))

    // Write a blob (default: raw bytes, no codec)
    snap, _ := ds.Write(ctx, []any{[]byte("hello world")}, lode.Metadata{"source": "demo"})
    fmt.Println("Created snapshot:", snap.ID)

    // Read it back
    data, _ := ds.Read(ctx, snap.ID)
    fmt.Println("Data:", string(data[0].([]byte)))
}

Write structured records

// With a codec, Write accepts structured data
ds, _ := lode.NewDataset("events",
    lode.NewFSFactory("/tmp/lode-demo"),
    lode.WithCodec(lode.NewJSONLCodec()),
)

records := lode.R(
    lode.D{"id": 1, "event": "signup"},
    lode.D{"id": 2, "event": "login"},
)
snap, _ := ds.Write(ctx, records, lode.Metadata{"batch": "1"})

Stream large files

// StreamWrite is for large binary payloads (no codec)
ds, _ := lode.NewDataset("backups", lode.NewFSFactory("/tmp/lode-demo"))

sw, _ := ds.StreamWrite(ctx, lode.Metadata{"type": "backup"})
sw.Write([]byte("... large data ..."))
snap, _ := sw.Commit(ctx)

Write sparse byte ranges (Volume)

// Volume: sparse, resumable byte-range persistence
// (requires "bytes" in the import list)
vol, _ := lode.NewVolume("disk-image", lode.NewFSFactory("/tmp/lode-demo"), 1024)

// Stage a byte range at offset 0 and commit it
data := []byte("boot sector")
blk, _ := vol.StageWriteAt(ctx, 0, bytes.NewReader(data))
snap, _ := vol.Commit(ctx, []lode.BlockRef{blk}, lode.Metadata{"step": "boot"})

// Read the committed range back
result, _ := vol.ReadAt(ctx, snap.ID, 0, len(data))

See examples/ for complete runnable code.


Which Write API?

Paradigm   API                     Use Case                         Codec                     Partitioning
Dataset    Write                   In-memory data, small batches    ✅ Optional               ✅ Supported
Dataset    StreamWrite             Large binary payloads (GB+)      ❌ Raw only               ❌ Not supported
Dataset    StreamWriteRecords      Large record streams             ✅ Required (streaming)   ❌ Not supported
Volume     StageWriteAt + Commit   Sparse byte ranges, resumable    ❌ Raw only               ❌ Not applicable

Decision flow:

  1. Is the data already in memory? → Use Write
  2. Is it a large binary blob (no structure)? → Use StreamWrite
  3. Is it a large stream of records? → Use StreamWriteRecords with a streaming codec
  4. Is it a sparse/resumable byte address space? → Use Volume

Guarantees

What Lode commits to:

Guarantee             Detail
Immutable snapshots   Once written, data files and manifests never change
Atomic commits        Manifest presence is the commit signal; no partial visibility
Explicit metadata     Every snapshot has caller-supplied metadata (never inferred)
Safe writes           Overwrites are prevented (atomic for small files; best-effort for large)
Backend-agnostic      Same semantics on filesystem, memory, or S3

What Lode explicitly does NOT provide:

Non-goal                           Why
Multi-writer conflict resolution   Requires distributed coordination; use external locks
Query execution                    Lode structures data; query engines consume it
Background compaction              No implicit mutations; callers control lifecycle
Automatic cleanup                  Partial objects from failed writes may remain

For full contract details: docs/contracts/


Gotchas

Common pitfalls when using Lode:

  • Metadata must be non-nil — Pass lode.Metadata{} for empty metadata, not nil.
  • Raw mode expects []byte — Without a codec, Write expects exactly one []byte element.
  • Single-writer only — Concurrent writers to the same dataset or volume may corrupt history.
  • Cleanup is best-effort — Failed streams may leave partial objects in storage.
  • StreamWriteRecords requires streaming codec — Not all codecs support streaming.
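
A short sketch covering the first two pitfalls, with errors checked rather than elided (ds is a raw-mode dataset as in the Quick Start):

// Raw mode (no codec): pass exactly one []byte element and a non-nil Metadata
snap, err := ds.Write(ctx, []any{[]byte("payload")}, lode.Metadata{})
if err != nil {
    // e.g. nil metadata, multiple elements, or a non-[]byte element
    return fmt.Errorf("raw write: %w", err)
}
fmt.Println("committed:", snap.ID)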

See PUBLIC_API.md for complete usage guidance.


Example: event storage

A canonical Lode workflow looks like this:

  • Dataset: events
  • Rows: timestamped events
  • Partitioning: Hive-style by day (dt=YYYY-MM-DD)
  • Format: JSON Lines
  • Compression: gzip
  • Backend: file system or S3

Each write produces a new snapshot. Reads always target a snapshot explicitly.
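
A sketch of that workflow using only the APIs shown above; partitioning and compression are configured through additional dataset options whose names are not reproduced here (see examples/hive_layout for the real ones).

// Event storage sketch (errors elided). Hive partitioning and gzip
// compression are set via extra dataset options; see examples/hive_layout.
events, _ := lode.NewDataset("events",
    lode.NewFSFactory("/data/lode"),
    lode.WithCodec(lode.NewJSONLCodec()),
)

snap, _ := events.Write(ctx, lode.R(
    lode.D{"ts": "2026-01-02T03:04:05Z", "event": "signup"},
), lode.Metadata{"dt": "2026-01-02"})

// Reads always target an explicit snapshot
rows, _ := events.Read(ctx, snap.ID)
_ = rows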


Supported Backends

  • Filesystem — Local storage via NewFSFactory
  • In-memory — Testing via NewMemoryFactory
  • S3 — AWS S3, MinIO, LocalStack, R2 via lode/s3
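
For tests, the in-memory backend removes provisioning entirely. A minimal sketch, assuming NewMemoryFactory takes no arguments (check PUBLIC_API.md for its exact signature):

// In-memory dataset for tests: nothing to mkdir or provision.
// Assumption: NewMemoryFactory takes no arguments.
ds, err := lode.NewDataset("test-data", lode.NewMemoryFactory())
if err != nil {
    t.Fatal(err)
}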

Storage Prerequisites

Ensure storage exists before constructing a dataset, volume, or reader.

Lode does not create storage infrastructure. This is intentional:

  • No hidden side effects in constructors
  • Explicit provisioning keeps control with the caller
  • Same pattern across all backends

Filesystem

# Create directory before use
mkdir -p /data/lode

// Then construct the dataset
ds, err := lode.NewDataset("events", lode.NewFSFactory("/data/lode"))

S3

# Create bucket before use (via AWS CLI, console, or IaC)
aws s3 mb s3://my-bucket

// Then construct the dataset (wrap the Store in a factory);
// client is an already-configured S3 client value (see lode/s3)
store, err := s3.New(client, s3.Config{Bucket: "my-bucket"})
factory := func() (lode.Store, error) { return store, nil }
ds, err := lode.NewDataset("events", factory)

Bootstrap Helpers

If you need provisioning helpers, implement them outside core APIs:

// Example: ensure directory exists before constructing dataset
func EnsureFSDataset(id, root string, opts ...lode.Option) (lode.Dataset, error) {
    if err := os.MkdirAll(root, 0755); err != nil {
        return nil, fmt.Errorf("create storage root: %w", err)
    }
    return lode.NewDataset(id, lode.NewFSFactory(root), opts...)
}
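
Callers then provision and construct in one explicit call site:

// Provisioning and construction stay visible in application code
ds, err := EnsureFSDataset("events", "/data/lode", lode.WithCodec(lode.NewJSONLCodec()))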

This keeps Lode's core APIs explicit and predictable.


Examples

Example               Purpose                                         Run
default_layout        Write → list → read with default layout        go run ./examples/default_layout
hive_layout           Partition-first layout with Hive partitioner   go run ./examples/hive_layout
blob_upload           Raw blob write/read (no codec, default bundle)  go run ./examples/blob_upload
manifest_driven       Demonstrates manifest-as-commit-signal          go run ./examples/manifest_driven
stream_write_records  Streaming record writes with iterator           go run ./examples/stream_write_records
parquet               Parquet codec with schema-typed fields          go run ./examples/parquet
volume_sparse         Sparse Volume: stage, commit, read with gaps    go run ./examples/volume_sparse
s3_experimental       S3 adapter with LocalStack/MinIO                go run ./examples/s3_experimental

Each example is self-contained and runnable. See the example source for detailed comments.


Status

Lode is at v0.7.0 and under active development. APIs are stabilizing; some changes are possible before v1.0.

If you are evaluating Lode, focus on:

  • snapshot semantics (Dataset and Volume)
  • metadata visibility
  • API clarity

  • Usage overview: PUBLIC_API.md
  • Concrete usage: examples/
  • Implementation milestones: docs/IMPLEMENTATION_PLAN.md


License

Apache License 2.0. Copyright © 2026 Andrew Hu me@andrewhu.nyc.