Skip to content

refactor(reader): Array.materialize() — kill ArraySegments duplication#62

Merged
dfa1 merged 1 commit into
mainfrom
worktree-prancy-meandering-meteor
Jun 19, 2026
Merged

refactor(reader): Array.materialize() — kill ArraySegments duplication#62
dfa1 merged 1 commit into
mainfrom
worktree-prancy-meandering-meteor

Conversation

@dfa1

@dfa1 dfa1 commented Jun 19, 2026

Copy link
Copy Markdown
Owner

What

ArraySegments re-stated each encoding's decode formula (FoR / ZigZag / ALP), the chunked concat, and the dict gather in one large central switch — separate from the per-element accessor that already lives on each lazy array. This adds Array.materialize(SegmentAllocator) and pushes every case onto the type that owns it, mirroring the existing Array.limited() polymorphism.

ArraySegments.of() collapses to arr.materialize(arena) (407 → 71 lines). trySegment() — the non-allocating zero-copy probe — is unchanged.

Where materialize lives now

  • Array root: default throws "no primary segment" (struct / list / variant / byte-parts decimal).
  • Primitive interfaces (Long/Int/Double/Float/Short/Byte/Bool): scalar getX fallback — covers every lazy type with no special bulk form (RLE/RunEnd/Sparse/Constant/AlpRd) for free. BoolArray packs an LSB-first bitmap.
  • Materialized* / Generic / VarBin / LazyDecimal: return existing buffer, zero-copy.
  • Lazy For / ZigZag / Alp: vectorisable formula loop, now beside their own getX.
  • Chunked* concat, Dict* gather, LazyConstantDecimal fill, MaskedArray delegate to inner.

Behaviour

Preserved. of() / trySegment() semantics identical, including the broadcast short-buffer edge (a constant column's 1-element buffer, matching prior behaviour — not Arrow-complete; see ADR 0016). New: lazy bool now materialises instead of throwing.

The contiguous little-endian segment materialize() yields is the producer step for the Arrow C-Data values buffer described in ADR 0016 — but this PR is a standalone locality cleanup, not an Arrow feature.

Tests

core 223, reader 540, writer 527, integration 268 (2 env-skipped) — all green. verify packaged clean, javadoc failOnWarnings passed.

🤖 Generated with Claude Code

}
}
return dst.asReadOnly();
return arr.materialize(arena);

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's inline this no?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 2760e5a — removed ArraySegments.of(); callers (ReadRegistry, ScanIterator, DecodeContext, encoder + integration tests) now call arr.materialize(arena) directly. ArraySegments keeps only the non-allocating trySegment() probe.

/// @return the primary {@link MemorySegment}
/// @throws VortexException if this array type has no primary segment
default MemorySegment materialize(SegmentAllocator arena) {
throw new VortexException(getClass().getSimpleName() + " has no primary segment");

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's use a pure abstract method

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 2760e5amaterialize(SegmentAllocator) is now pure abstract. The 8 non-segment leaves (NullArray, StructArray, ListArray, ListViewArray, FixedSizeListArray, VariantArray, UnknownArray, LazyDecimalBytePartsArray) each declare their own throwing impl explicitly rather than inheriting a default.

@dfa1 dfa1 force-pushed the worktree-prancy-meandering-meteor branch 4 times, most recently from 6af0be6 to c5990a9 Compare June 19, 2026 19:02
…witch

ArraySegments.of() re-stated each encoding's decode formula (FoR/ZigZag/ALP),
the chunked concat, and the dict gather in one large switch, separate from the
per-element accessor each lazy array already carries. Introduce
Array.materialize(SegmentAllocator) — a pure abstract method mirroring the
existing Array.limited() polymorphism — and push every case onto the type that
owns it:

- segment-backed arrays (Materialized*, VarBin, Generic, LazyDecimal) return
  their existing buffer with no copy;
- the Lazy FoR/ZigZag/ALP variants bulk-decode through their own getX(i) in a
  tight loop — the formula lives only in getX; the override exists purely to give
  the JIT a monomorphic, inlinable (hence vectorisable) call site that the shared
  megamorphic interface default cannot;
- chunked arrays concatenate children, dict arrays gather, constant-decimal fills;
- MaskedArray delegates to its inner data;
- primitive interfaces provide a scalar getX fallback, and BoolArray packs an
  LSB-first bitmap (the lazy-bool path that previously had no materialisation);
- families with no primary segment (struct, list, list-view, fixed-size list,
  variant, byte-parts decimal, null, unknown) implement it to throw explicitly.

ArraySegments.of() is removed; callers (ReadRegistry, ScanIterator, DecodeContext,
encoder/integration/performance code) invoke arr.materialize(arena) directly.
ArraySegments retains only the non-allocating trySegment() probe used by the scan
layer's dict zip-bomb guard, and is marked @deprecated(forRemoval = true) — it is
deleted once the decode-limits layer owns that bound. Behaviour is preserved,
including the broadcast edge where a constant column materialises to a single-element
buffer.

Adds ArrayMaterializeTest for direct coverage (zero-copy, scalar/bitmap fallbacks,
the Lazy formulas, chunked/dict, constant-decimal, masked delegation, and all
no-primary-segment throw cases). Updates ADR 0016 to document materialize() as the
shipped producer of the Arrow C-Data values buffer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@dfa1 dfa1 force-pushed the worktree-prancy-meandering-meteor branch from c5990a9 to 5607bcf Compare June 19, 2026 19:04
if (getBoolean(i)) {
long byteIndex = i >>> 3;
byte b = dst.get(ValueLayout.JAVA_BYTE, byteIndex);
dst.set(ValueLayout.JAVA_BYTE, byteIndex, (byte) (b | (1 << (i & 7))));

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should call getBoolean(i)

@dfa1 dfa1 merged commit 332b067 into main Jun 19, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant