- Status: Completed
- Date: 2026-06-16
- Deciders: project maintainer
- Supersedes: —
- Superseded by: —
- Related: ADR 0010 — Lazy decode for 1:1 transform encodings, ADR 0012 — Zero-copy layout decoding: lazy Chunked / Dict
ADR 0010 + 0012 established Lazy as the canonical decode shape for every encoding where per-element transform is feasible. The reader still carries the older eager paths for most of these encodings — they were left in place as a safety net during the lazy rollout, gated only by tests.
The cost compounds:
- Test matrix. Every encoding test now exercises two output shapes, duplicated across unit + integration suites.
- Decoder churn. `ZigZag`, `FoR`, `ALP`, `RunEnd`, `Sparse`, `RLE`, `Dict`, `Chunked`, `VarBinView`, `DateTimeParts`, `DecimalByteParts`, `Decimal`, and `Constant` each have a non-trivial materialised branch that must keep up with format changes without ever firing in production.
- Surprise factor for contributors. New decoders default to the pattern they read in neighbouring files; if the neighbouring file still has a Materialized branch, the new encoding inherits the obsolete shape by mimicry.
- Real-file fragility. The per-column chunking misalignment fix in 0.7.2 (`fix(reader): align per-column chunking via shared decode + sliced views`) had to consult both the Lazy and Materialized output shapes when wiring the `Offset*Array` slice path. One shape would have been enough.
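To make the two shapes concrete, here is a minimal sketch of one encoding decoded both ways. It is not the reader's actual code: `LongArray`, `MaterializedLongArray`, `LazyZigZagLongArray`, and `ZigZagSketch` are hypothetical names chosen for illustration, and plain `long[]` stands in for arena-backed storage.

```java
/** Hypothetical minimal array abstraction, not the reader's real interface. */
interface LongArray {
    int length();
    long get(int index);
}

/** Eager shape: all values live in an already-decoded buffer. */
final class MaterializedLongArray implements LongArray {
    private final long[] values;

    MaterializedLongArray(long[] values) {
        this.values = values;
    }

    @Override public int length() { return values.length; }
    @Override public long get(int index) { return values[index]; }
}

/** Lazy shape: keeps the encoded child and decodes one element per access. */
final class LazyZigZagLongArray implements LongArray {
    private final LongArray encoded;

    LazyZigZagLongArray(LongArray encoded) {
        this.encoded = encoded;
    }

    @Override public int length() { return encoded.length(); }
    @Override public long get(int index) { return ZigZagSketch.decodeOne(encoded.get(index)); }
}

final class ZigZagSketch {
    private ZigZagSketch() {}

    /** Standard ZigZag inverse: 0, 1, 2, 3, 4 maps to 0, -1, 1, -2, 2. */
    static long decodeOne(long n) {
        return (n >>> 1) ^ -(n & 1);
    }

    /** Eager path: allocate, loop over every row, return a materialised array. */
    static LongArray decodeEager(LongArray encoded) {
        long[] out = new long[encoded.length()];
        for (int i = 0; i < out.length; i++) {
            out[i] = decodeOne(encoded.get(i));
        }
        return new MaterializedLongArray(out);
    }

    /** Lazy path: wraps the input; no per-row loop and no row-sized allocation. */
    static LongArray decodeLazy(LongArray encoded) {
        return new LazyZigZagLongArray(encoded);
    }
}
```

Both paths return the same row values; carrying both means every format change has to be applied twice, which is the cost the list above describes.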
When an encoding ships a Lazy variant that meets the criteria below, the Materialized fallback is removed in the same PR or in a follow-up PR within one release cycle. Reverts are allowed if a regression appears; the policy is "stop carrying two shapes by default", not "never go back".
A Materialized branch may be deleted when all of these hold:
- The Lazy variant is the default for every public read path (`ScanIterator`, `Chunk.column`, `decodeFlatSegment`).
- Every integration test that asserts row values passes against the Lazy shape, including the Rust round-trip suite (`RustWritesJavaReadsIntegrationTest`, `RustJavaReaderComparisonIntegrationTest`).
- The encoding's `Decode shape` row in `docs/compatibility.md` reads `Lazy / Lazy`.
- At least one production-shaped workload has decoded the encoding via the Lazy path; the NYC Yellow Taxi fixture set is the canonical stand-in until a broader corpus exists.
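The four criteria are conjunctive, so a reviewer can treat them as a simple checklist. The sketch below is only an illustration of that reading: `RemovalCriteria` and its field names are hypothetical and do not exist in the codebase.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical checklist; the component names paraphrase the criteria above. */
record RemovalCriteria(
        boolean lazyIsDefaultOnPublicReadPaths,
        boolean integrationTestsPassOnLazyShape,
        boolean compatibilityRowReadsLazyLazy,
        boolean productionShapedWorkloadDecoded) {

    /** Names of the criteria that do not hold yet, in the order listed above. */
    List<String> unmet() {
        List<String> missing = new ArrayList<>();
        if (!lazyIsDefaultOnPublicReadPaths) {
            missing.add("lazy default on public read paths");
        }
        if (!integrationTestsPassOnLazyShape) {
            missing.add("integration tests pass on Lazy shape");
        }
        if (!compatibilityRowReadsLazyLazy) {
            missing.add("compatibility row reads Lazy / Lazy");
        }
        if (!productionShapedWorkloadDecoded) {
            missing.add("production-shaped workload decoded via Lazy");
        }
        return missing;
    }

    /** The Materialized branch may go only when every criterion holds. */
    boolean materializedBranchMayBeDeleted() {
        return unmet().isEmpty();
    }
}
```

A removal PR description could list the result of `unmet()` to show which criterion, if any, is still open.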
- The eager output path inside `decode(DecodeContext)` (typically the `ctx.arena().allocate(...)` + per-row loop + `Materialized*Array` return).
- The matching test cases that pin the eager shape (`Materialized*Array` instance-of assertions, byte-level segment comparisons that only the eager path produces).
- Helper methods that exist only to support the eager path (e.g. `applyReference`, per-ptype materialisers in `FrameOfReferenceEncodingDecoder`).
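Tests that pin the eager shape can usually be replaced by assertions on row values, which hold for either shape. The following is a sketch under assumptions: `IntColumn`, `LazyConstantIntColumn`, and `RowValueCheck` are hypothetical stand-ins, not types from the reader or its test suites.

```java
/** Hypothetical stand-in for a decoded int column; not the reader's real type. */
interface IntColumn {
    int length();
    int get(int index);
}

/** Hypothetical lazy column that repeats one value without storing a buffer. */
final class LazyConstantIntColumn implements IntColumn {
    private final int value;
    private final int length;

    LazyConstantIntColumn(int value, int length) {
        this.value = value;
        this.length = length;
    }

    @Override public int length() { return length; }
    @Override public int get(int index) { return value; }
}

final class RowValueCheck {
    private RowValueCheck() {}

    /**
     * Returns -1 when the column holds exactly the expected values.
     * Otherwise returns the index of the first differing row, or the
     * shorter length when only the lengths differ.
     */
    static int firstMismatch(IntColumn actual, int... expected) {
        int common = Math.min(actual.length(), expected.length);
        for (int i = 0; i < common; i++) {
            if (actual.get(i) != expected[i]) {
                return i;
            }
        }
        return actual.length() == expected.length ? -1 : common;
    }
}
```

A test written against `firstMismatch` never mentions the concrete array class, so it survives the removal of the eager branch unchanged.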
- Decompression-style encodings (`bitpacked`, `pco`, `zstd`, `fsst`, `delta`, `patched`) keep their Materialized output. ADR 0010 § "Decompression encodings stay eager" applies; ADR 0015 does not change it.
- Materialisation fallbacks inside `ArraySegments.of(arr, arena)` stay; they exist for callers that explicitly request a `MemorySegment` from a Lazy array, not for default decode.
- One shape per encoding to read, test, and document.
- `docs/compatibility.md` becomes the single source of truth for shape decisions; the code matches.
- New decoders inherit the Lazy pattern by default.
- Loss of an in-tree A/B comparison point. Mitigation: keep the Materialised path live in benchmarks only, never in production decode.
- Encoding-specific micro-optimisations that the eager path enabled (e.g. SIMD-friendly `applyReference` in FoR) need to migrate to the Lazy path or to a `materialiseXxx` helper in `ArraySegments`.
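One way such a migration could look is sketched below, assuming a Frame-of-Reference array whose decode is `delta + reference`. All names here (`LazyForLongArray`, `copyInto`, `ArraySegmentsSketch.materialiseLongs`) are hypothetical, and a plain `long[]` replaces the `MemorySegment` the real helper would presumably produce.

```java
/** Hypothetical lazy Frame-of-Reference view over encoded deltas. */
final class LazyForLongArray {
    private final long[] deltas;   // encoded offsets from the reference
    private final long reference;  // frame-of-reference base value

    LazyForLongArray(long[] deltas, long reference) {
        this.deltas = deltas;
        this.reference = reference;
    }

    int length() {
        return deltas.length;
    }

    /** Per-element path, used by default decode. */
    long get(int index) {
        return deltas[index] + reference;
    }

    /**
     * Bulk path: the tight, branch-free loop that used to live in the eager
     * decoder. Simple loops of this form are candidates for JIT
     * auto-vectorisation.
     */
    void copyInto(long[] out, int outOffset) {
        for (int i = 0; i < deltas.length; i++) {
            out[outOffset + i] = deltas[i] + reference;
        }
    }
}

/** Hypothetical home for explicit materialisation, mirroring the ArraySegments idea. */
final class ArraySegmentsSketch {
    private ArraySegmentsSketch() {}

    /** Called only when a caller explicitly asks for a contiguous buffer. */
    static long[] materialiseLongs(LazyForLongArray lazy) {
        long[] out = new long[lazy.length()];
        lazy.copyInto(out, 0);
        return out;
    }
}
```

The design point is that the bulk loop survives, but it moves behind an explicit request instead of running on every default decode.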
- 0.7.x: policy ratified, no removals yet.
- 0.8.0: eager paths removed from `ZigZag` (all PTypes + broadcast), `FoR` (all integer PTypes), `ALP` (broadcast-without-patches), `Constant` (Decimal → `LazyConstantDecimalArray`; `DecimalArray` interface introduced to replace two concrete `Array` permits entries), `RunEnd` (Bool → `LazyRunEndBoolArray`), `Sparse` (Bool → `LazySparseBoolArray`), `RLE` (validity → `OffsetBoolArray`; empty → `LazyConstantXxxArray`). `Dict` encoding-level path intentionally kept eager (layout-level path is lazy via ADR 0012). `DateTimeParts`, `DecimalByteParts`, `Decimal`, `VarBinView`, `Chunked` were already fully lazy before this sweep.
- 0.9.0: revisit Decompression encodings; if a Lazy variant lands (e.g. window-decoded `Pco`) the same criteria apply.
- Removal PRs must be one encoding per commit so a regression can be reverted in isolation.
- After each removal, refresh the encoding's row in `docs/compatibility.md` and the ADR 0010 / 0012 status header if applicable.