Structured persistence for object storage.
Lode is an embeddable Go persistence framework that brings snapshots, metadata, and safe write semantics to object storage—without running a database. It offers two coequal persistence paradigms: Datasets for structured collections of named objects, and Volumes for sparse, resumable byte-range storage.
Lode is:
- An embeddable Go library
- A persistence framework for structured data and sparse byte ranges
- Snapshots, atomic commits, and metadata guarantees you'd otherwise hand-roll on raw storage APIs
- Explicit and predictable by design
Lode is not:
- A database or storage engine
- A query planner or execution runtime
- A distributed system or background service
Lode owns structure and lifecycle, not execution.
Many systems already use object storage as their primary persistence layer:
- Events written to S3
- Parquet files on a filesystem
- Analytics data dumped “by convention”
What’s usually missing is discipline:
- No clear snapshots or versions
- No metadata guarantees
- Accidental overwrites
- Ad-hoc directory layouts
- Unclear read/write safety
Lode formalizes the parts that should be formalized and refuses to do the rest.
Lode is built around a small set of invariants:
Dataset (structured collections):
- Datasets are immutable — writes produce snapshots
- Snapshots reference manifests that describe data files and metadata
- Storage, format, compression, and partitioning are orthogonal
Volume (sparse byte ranges):
- Volumes are sparse, resumable byte address spaces
- Volume commits track which byte ranges exist — gaps are explicit, never zero-filled
- Each commit produces a cumulative manifest covering all committed blocks
Shared invariants:
- Manifest presence is the commit signal — a snapshot is visible only after its manifest is persisted
- Metadata is explicit, persisted, and self-describing (never inferred)
- Single-writer semantics required — Lode does not resolve concurrent writer conflicts
If you know the snapshot ID, you know exactly what data you are reading.
```sh
go get github.com/justapithecus/lode
```

```go
package main

import (
	"context"
	"fmt"

	"github.com/justapithecus/lode/lode"
)

func main() {
	ctx := context.Background()

	// Create dataset with filesystem storage
	ds, _ := lode.NewDataset("mydata", lode.NewFSFactory("/tmp/lode-demo"))

	// Write a blob (default: raw bytes, no codec)
	snap, _ := ds.Write(ctx, []any{[]byte("hello world")}, lode.Metadata{"source": "demo"})
	fmt.Println("Created snapshot:", snap.ID)

	// Read it back
	data, _ := ds.Read(ctx, snap.ID)
	fmt.Println("Data:", string(data[0].([]byte)))
}
```

```go
// With a codec, Write accepts structured data
ds, _ := lode.NewDataset("events",
	lode.NewFSFactory("/tmp/lode-demo"),
	lode.WithCodec(lode.NewJSONLCodec()),
)

records := lode.R(
	lode.D{"id": 1, "event": "signup"},
	lode.D{"id": 2, "event": "login"},
)

snap, _ := ds.Write(ctx, records, lode.Metadata{"batch": "1"})
```

```go
// StreamWrite is for large binary payloads (no codec)
ds, _ := lode.NewDataset("backups", lode.NewFSFactory("/tmp/lode-demo"))

sw, _ := ds.StreamWrite(ctx, lode.Metadata{"type": "backup"})
sw.Write([]byte("... large data ..."))
snap, _ := sw.Commit(ctx)
```

```go
// Volume: sparse, resumable byte-range persistence
vol, _ := lode.NewVolume("disk-image", lode.NewFSFactory("/tmp/lode-demo"), 1024)

// Stage a byte range and commit
blk, _ := vol.StageWriteAt(ctx, 0, bytes.NewReader(data))
snap, _ := vol.Commit(ctx, []lode.BlockRef{blk}, lode.Metadata{"step": "boot"})

// Read it back
result, _ := vol.ReadAt(ctx, snap.ID, 0, len(data))
```

See examples/ for complete runnable code.
| Paradigm | API | Use Case | Codec | Partitioning |
|---|---|---|---|---|
| Dataset | `Write` | In-memory data, small batches | ✅ Optional | ✅ Supported |
| Dataset | `StreamWrite` | Large binary payloads (GB+) | ❌ Raw only | ❌ Not supported |
| Dataset | `StreamWriteRecords` | Large record streams | ✅ Required (streaming) | ❌ Not supported |
| Volume | `StageWriteAt` + `Commit` | Sparse byte ranges, resumable | ❌ Raw only | ❌ Not applicable |
Decision flow:
- Is the data already in memory? → Use `Write`
- Is it a large binary blob (no structure)? → Use `StreamWrite`
- Is it a large stream of records? → Use `StreamWriteRecords` with a streaming codec
- Is it a sparse/resumable byte address space? → Use `Volume`
What Lode commits to:
| Guarantee | Detail |
|---|---|
| Immutable snapshots | Once written, data files and manifests never change |
| Atomic commits | Manifest presence is the commit signal; no partial visibility |
| Explicit metadata | Every snapshot has caller-supplied metadata (never inferred) |
| Safe writes | Overwrites are prevented (atomic for small files; best-effort for large) |
| Backend-agnostic | Same semantics on filesystem, memory, or S3 |
What Lode explicitly does NOT provide:
| Non-goal | Why |
|---|---|
| Multi-writer conflict resolution | Requires distributed coordination; use external locks |
| Query execution | Lode structures data; query engines consume it |
| Background compaction | No implicit mutations; callers control lifecycle |
| Automatic cleanup | Partial objects from failed writes may remain |
For full contract details: docs/contracts/
Common pitfalls when using Lode:
- Metadata must be non-nil — pass `lode.Metadata{}` for empty metadata, not `nil`.
- Raw mode expects `[]byte` — without a codec, `Write` expects exactly one `[]byte` element.
- Single-writer only — concurrent writers to the same dataset or volume may corrupt history.
- Cleanup is best-effort — failed streams may leave partial objects in storage.
- `StreamWriteRecords` requires a streaming codec — not all codecs support streaming.
See PUBLIC_API.md for complete usage guidance.
A canonical Lode workflow looks like this:
- Dataset: `events`
- Rows: timestamped events
- Partitioning: Hive-style by day (`dt=YYYY-MM-DD`)
- Format: JSON Lines
- Compression: gzip
- Backend: filesystem or S3
Each write produces a new snapshot. Reads always target a snapshot explicitly.
- Filesystem — local storage via `NewFSFactory`
- In-memory — testing via `NewMemoryFactory`
- S3 — AWS S3, MinIO, LocalStack, R2 via `lode/s3`
Ensure storage exists before constructing a dataset, volume, or reader.
Lode does not create storage infrastructure. This is intentional:
- No hidden side effects in constructors
- Explicit provisioning keeps control with the caller
- Same pattern across all backends
```sh
# Create directory before use
mkdir -p /data/lode
```

```go
// Then construct dataset
ds, err := lode.NewDataset("events", lode.NewFSFactory("/data/lode"))
```

```sh
# Create bucket before use (via AWS CLI, console, or IaC)
aws s3 mb s3://my-bucket
```

```go
// Then construct dataset (wrap Store in factory)
store, err := s3.New(client, s3.Config{Bucket: "my-bucket"})
factory := func() (lode.Store, error) { return store, nil }
ds, err := lode.NewDataset("events", factory)
```

If you need provisioning helpers, implement them outside core APIs:
```go
// Example: ensure directory exists before constructing dataset
func EnsureFSDataset(id, root string, opts ...lode.Option) (lode.Dataset, error) {
	if err := os.MkdirAll(root, 0755); err != nil {
		return nil, fmt.Errorf("create storage root: %w", err)
	}
	return lode.NewDataset(id, lode.NewFSFactory(root), opts...)
}
```

This keeps Lode's core APIs explicit and predictable.
| Example | Purpose | Run |
|---|---|---|
| `default_layout` | Write → list → read with default layout | `go run ./examples/default_layout` |
| `hive_layout` | Partition-first layout with Hive partitioner | `go run ./examples/hive_layout` |
| `blob_upload` | Raw blob write/read (no codec, default bundle) | `go run ./examples/blob_upload` |
| `manifest_driven` | Demonstrates manifest-as-commit-signal | `go run ./examples/manifest_driven` |
| `stream_write_records` | Streaming record writes with iterator | `go run ./examples/stream_write_records` |
| `parquet` | Parquet codec with schema-typed fields | `go run ./examples/parquet` |
| `volume_sparse` | Sparse Volume: stage, commit, read with gaps | `go run ./examples/volume_sparse` |
| `s3_experimental` | S3 adapter with LocalStack/MinIO | `go run ./examples/s3_experimental` |
Each example is self-contained and runnable. See the example source for detailed comments.
Lode is at v0.7.0 and under active development. APIs are stabilizing; some changes are possible before v1.0.
If you are evaluating Lode, focus on:
- snapshot semantics (Dataset and Volume)
- metadata visibility
- API clarity
Usage overview: PUBLIC_API.md
Concrete usage: examples/
Implementation milestones: docs/IMPLEMENTATION_PLAN.md
Apache License 2.0. Copyright © 2026 Andrew Hu me@andrewhu.nyc.