feat(oci): store OCI layers as content-addressed artifacts (layers survive) #71

Merged
jaredLunde merged 1 commit into main from jared/better-layers on Jun 6, 2026
@jaredLunde
Problem

Flattening all OCI layers into one merged ext4 destroys cross-image dedup: a shared base layer lands at different block offsets in each image, so it shares 0 blocks. Proven empirically with the new oci_dedup_measure bin on real debian / python / node images (all share the debian base layer):

  • merged: 207.0 MiB, 0.0% Jaccard on every pair
  • layers-survive: 133.2 MiB → 35.7% smaller (and the win grows with fleet size)

A single linear ext4 and cross-image layer dedup are fundamentally incompatible (deltas don't help; layer-stable layout fights ext4's global metadata). So this stops merging and stores each layer once. Direction matches the platform's beyond/primitives/oci, which composes images with overlayfs over content-addressed layers.
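The two figures above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the code of oci_dedup_measure: it assumes each image is reduced to a set of block identifiers (how those identifiers are derived is not stated here), and the names `jaccard` and `savings` and the sample ids are hypothetical.

```rust
use std::collections::HashSet;

/// Jaccard similarity |A ∩ B| / |A ∪ B| of two sets of block identifiers.
/// Two empty sets are treated as sharing nothing (0.0).
fn jaccard(a: &HashSet<u64>, b: &HashSet<u64>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    a.intersection(b).count() as f64 / union as f64
}

/// Fraction of storage saved when going from `before` to `after` (same unit).
fn savings(before: f64, after: f64) -> f64 {
    (before - after) / before
}

fn main() {
    // Hypothetical block ids: disjoint sets model the merged case.
    let image_a: HashSet<u64> = HashSet::from([1, 2, 3, 4]);
    let image_b: HashSet<u64> = HashSet::from([5, 6, 7, 8]);
    println!("disjoint: {:.1}%", 100.0 * jaccard(&image_a, &image_b));

    // Sets that share three blocks model a base layer that is stored once.
    let image_c: HashSet<u64> = HashSet::from([1, 2, 3, 9]);
    println!("shared base: {:.1}%", 100.0 * jaccard(&image_a, &image_c));

    // Sizes quoted in this description, in MiB.
    println!("savings: {:.1}%", 100.0 * savings(207.0, 133.2));
}
```

With the quoted sizes, (207.0 - 133.2) / 207.0 rounds to the 35.7% reported above.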

What changed

  • ext4::convert_layer_to_ext4: convert one OCI layer to a deterministic ext4 that is a valid overlayfs lower layer: .wh.<f> → char-device 0,0, .wh..wh..opq → trusted.overlay.opaque xattr (instead of flattening).
  • oci::ext4_store: extracted the block→pack→manifest streaming core out of the feature-gated cli::bless into the always-on lib so bless and the layer store share it (store_ext4_stream, deterministic_uuid). cli::bless now delegates to it.
  • oci::layer_store: store each layer once under a global content-addressed namespace layers/{digest} (idempotent via HEAD); record an image as an ordered digest list at images/{name}.
  • cli::bless: run_bless_oci_layered + bless --oci --layered.
  • cli::gc: reconcile_layers ref-counts the shared layer pool by images/* descriptors and reaps orphans after the grace period (which also covers the layered-bless write race). Layers live outside exports/, so the existing per-prefix GC never touches them.
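The whiteout translation in the first bullet can be illustrated with a small classifier. This is a sketch under assumptions, not the body of convert_layer_to_ext4: `LayerEntry` and `classify` are hypothetical names, and it only decides what each tar path should become. The marker names (`.wh.` prefix, `.wh..wh..opq`) follow the OCI image layer convention, and the targets (a 0,0 character device, `trusted.overlay.opaque` set to `y`) are what overlayfs expects in a lower layer.

```rust
#[derive(Debug, PartialEq)]
enum LayerEntry {
    /// Ordinary entry, written to the ext4 unchanged.
    Regular { path: String },
    /// Emit a character device with major 0, minor 0 at `path`.
    Whiteout { path: String },
    /// Set the xattr trusted.overlay.opaque = "y" on directory `dir`.
    OpaqueDir { dir: String },
}

/// Decide how one OCI tar path maps onto an overlayfs lower layer.
fn classify(tar_path: &str) -> LayerEntry {
    // Split into parent directory and final component.
    let (dir, name) = match tar_path.rsplit_once('/') {
        Some((d, n)) => (d, n),
        None => ("", tar_path),
    };
    // The opaque marker also starts with ".wh.", so test it first.
    if name == ".wh..wh..opq" {
        return LayerEntry::OpaqueDir { dir: dir.to_string() };
    }
    if let Some(target) = name.strip_prefix(".wh.") {
        let path = if dir.is_empty() {
            target.to_string()
        } else {
            format!("{}/{}", dir, target)
        };
        return LayerEntry::Whiteout { path };
    }
    LayerEntry::Regular { path: tar_path.to_string() }
}

fn main() {
    for p in ["usr/bin/env", "etc/.wh.passwd", "var/cache/.wh..wh..opq"] {
        println!("{} -> {:?}", p, classify(p));
    }
}
```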

Verification

End-to-end docker_integration test (oci_layer_dedup) pushes three images sharing a base layer to a real registry:2 and asserts:

  1. the shared base layer is stored exactly once (images 2 & 3 reuse it),
  2. layered storage < merged storage of the same three images,
  3. overlay composition == merged filesystem (incl. the whiteout deletion and opaque dir), via ext4::Reader,
  4. the GC reference invariant holds (dropping one image keeps the base referenced).
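Assertions 1 and 4 can be pictured with an in-memory model. This is a hedged sketch, not oci::layer_store or reconcile_layers: `Store`, `put_layer`, `put_image`, and `orphans` are hypothetical names, the maps stand in for the layers/{digest} and images/{name} keys, the `contains_key` check stands in for the HEAD request, and the grace period is left out.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// In-memory stand-in for the object store (hypothetical).
#[derive(Default)]
struct Store {
    layers: BTreeMap<String, Vec<u8>>,     // layers/{digest}
    images: BTreeMap<String, Vec<String>>, // images/{name} -> ordered digests
}

impl Store {
    /// Store a layer unless it already exists; returns true if bytes were written.
    fn put_layer(&mut self, digest: &str, bytes: &[u8]) -> bool {
        if self.layers.contains_key(digest) {
            return false; // already present, nothing to upload
        }
        self.layers.insert(digest.to_string(), bytes.to_vec());
        true
    }

    /// Record an image as an ordered list of layer digests.
    fn put_image(&mut self, name: &str, digests: &[&str]) {
        let list = digests.iter().map(|d| d.to_string()).collect();
        self.images.insert(name.to_string(), list);
    }

    /// Digests that no image descriptor references (candidates for reaping).
    fn orphans(&self) -> BTreeSet<String> {
        let live: BTreeSet<&String> = self.images.values().flatten().collect();
        self.layers
            .keys()
            .filter(|d| !live.contains(d))
            .cloned()
            .collect()
    }
}

fn main() {
    let mut s = Store::default();
    // Placeholder digests; real ones would be full content hashes.
    for (name, top) in [("debian", "sha256:d"), ("python", "sha256:p"), ("node", "sha256:n")] {
        let wrote_base = s.put_layer("sha256:base", b"base");
        s.put_layer(top, b"top");
        s.put_image(name, &["sha256:base", top]);
        println!("{}: wrote base = {}", name, wrote_base);
    }
    s.images.remove("node");
    println!("orphans after dropping node: {:?}", s.orphans());
}
```

Dropping one image orphans only its private layer; the base stays referenced by the remaining descriptors.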

Passes against a real registry. Plus ext4 (52) + bless (8) + layer_store (3) + gc (29) unit tests, all green.
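The composition check in assertion 3 relies on layers being applied bottom to top, with whiteouts and opaque markers hiding only what lower layers contributed. A simplified model of that rule follows. It is an assumption-laden sketch, not the test's code or ext4::Reader: it tracks files only (no directories or metadata), and `Entry`, `Tree`, and `compose` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// One entry of a layer, already translated from its tar form.
enum Entry {
    File { path: &'static str, data: &'static str },
    Whiteout { path: &'static str },
    Opaque { dir: &'static str },
}

/// Path -> contents; a flat stand-in for a filesystem tree.
type Tree = BTreeMap<String, String>;

/// True if `path` lies strictly below directory `dir`.
fn is_under(path: &str, dir: &str) -> bool {
    path.strip_prefix(dir).map_or(false, |rest| rest.starts_with('/'))
}

/// Apply layers from lowest to highest and return the visible files.
fn compose(layers: &[Vec<Entry>]) -> Tree {
    let mut tree = Tree::new();
    for layer in layers {
        // Deletions first: they hide content from lower layers only.
        for e in layer {
            match e {
                Entry::Whiteout { path } => {
                    tree.retain(|p, _| p.as_str() != *path && !is_under(p, path))
                }
                Entry::Opaque { dir } => tree.retain(|p, _| !is_under(p, dir)),
                Entry::File { .. } => {}
            }
        }
        // Then this layer's own files become visible.
        for e in layer {
            if let Entry::File { path, data } = e {
                tree.insert(path.to_string(), data.to_string());
            }
        }
    }
    tree
}

fn main() {
    let base = vec![
        Entry::File { path: "etc/passwd", data: "root" },
        Entry::File { path: "tmp/x", data: "x" },
        Entry::File { path: "var/cache/a", data: "old" },
    ];
    let upper = vec![
        Entry::Whiteout { path: "tmp/x" },
        Entry::Opaque { dir: "var/cache" },
        Entry::File { path: "var/cache/b", data: "new" },
    ];
    println!("{:?}", compose(&[base, upper]));
}
```

Processing deletions before additions within a layer keeps files that the same layer places inside an opaque directory.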

Scope / follow-up

Storage dedup + GC + proofs are in. Phase 2 (not in this PR): the runtime so a VM boots from layered storage via overlayfs over lazy per-layer devices; per-layer hot-sets.

🤖 Generated with Claude Code

…rvive)

Flattening all OCI layers into one merged ext4 destroys cross-image dedup: a
shared base layer lands at different block offsets in each image, so it shares
0 blocks (proven empirically — see the oci_dedup_measure bin: 207 MiB merged
vs 133 MiB layers-survive / 35.7% smaller on debian/python/node). A single
linear ext4 and cross-image layer dedup are fundamentally incompatible, so this
stops merging and stores each layer once.

- ext4: convert_layer_to_ext4 — convert one OCI layer to a deterministic ext4
  that is a valid overlayfs lower layer (.wh.<f> -> char-dev 0,0,
  .wh..wh..opq -> trusted.overlay.opaque xattr) instead of flattening.
- oci::ext4_store: extract the block->pack->manifest streaming core out of the
  feature-gated cli::bless into the always-on lib so bless and the layer store
  share it (store_ext4_stream, deterministic_uuid).
- oci::layer_store: store each layer once under a global content-addressed
  namespace layers/{digest} (idempotent via HEAD); record an image as an
  ordered digest list at images/{name}.
- cli::bless: run_bless_oci_layered + `bless --oci --layered`.
- cli::gc: reconcile_layers — ref-count the shared layer pool by images/*
  descriptors; reap orphans after the grace period (which also covers the
  bless write-race). Layers live outside exports/, untouched by per-prefix GC.

Verification: end-to-end docker_integration test (oci_layer_dedup) pushes three
images sharing a base layer to registry:2 and asserts the base is stored once,
layered storage < merged storage, overlay composition equals the merged
filesystem (incl. whiteout + opaque dir), and the GC reference invariant holds.
Passes against a real registry. Plus ext4 + bless + layer_store + gc unit tests.

Phase 2 (not included): runtime so a VM boots from layered storage via overlayfs
over lazy per-layer devices; per-layer hot-sets.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jaredLunde jaredLunde merged commit c5b8dc6 into main Jun 6, 2026
25 of 35 checks passed
@jaredLunde jaredLunde deleted the jared/better-layers branch June 6, 2026 00:40