refactor(reader): Array.materialize() — kill ArraySegments duplication#62
Conversation
| } | ||
| } | ||
| return dst.asReadOnly(); | ||
| return arr.materialize(arena); |
There was a problem hiding this comment.
Done in 2760e5a — removed ArraySegments.of(); callers (ReadRegistry, ScanIterator, DecodeContext, encoder + integration tests) now call arr.materialize(arena) directly. ArraySegments keeps only the non-allocating trySegment() probe.
| /// @return the primary {@link MemorySegment} | ||
| /// @throws VortexException if this array type has no primary segment | ||
| default MemorySegment materialize(SegmentAllocator arena) { | ||
| throw new VortexException(getClass().getSimpleName() + " has no primary segment"); |
There was a problem hiding this comment.
let's use a pure abstract method
There was a problem hiding this comment.
Done in 2760e5a — materialize(SegmentAllocator) is now pure abstract. The 8 non-segment leaves (NullArray, StructArray, ListArray, ListViewArray, FixedSizeListArray, VariantArray, UnknownArray, LazyDecimalBytePartsArray) each declare their own throwing impl explicitly rather than inheriting a default.
6af0be6 to
c5990a9
Compare
…witch ArraySegments.of() re-stated each encoding's decode formula (FoR/ZigZag/ALP), the chunked concat, and the dict gather in one large switch, separate from the per-element accessor each lazy array already carries. Introduce Array.materialize(SegmentAllocator) — a pure abstract method mirroring the existing Array.limited() polymorphism — and push every case onto the type that owns it: - segment-backed arrays (Materialized*, VarBin, Generic, LazyDecimal) return their existing buffer with no copy; - the Lazy FoR/ZigZag/ALP variants bulk-decode through their own getX(i) in a tight loop — the formula lives only in getX; the override exists purely to give the JIT a monomorphic, inlinable (hence vectorisable) call site that the shared megamorphic interface default cannot; - chunked arrays concatenate children, dict arrays gather, constant-decimal fills; - MaskedArray delegates to its inner data; - primitive interfaces provide a scalar getX fallback, and BoolArray packs an LSB-first bitmap (the lazy-bool path that previously had no materialisation); - families with no primary segment (struct, list, list-view, fixed-size list, variant, byte-parts decimal, null, unknown) implement it to throw explicitly. ArraySegments.of() is removed; callers (ReadRegistry, ScanIterator, DecodeContext, encoder/integration/performance code) invoke arr.materialize(arena) directly. ArraySegments retains only the non-allocating trySegment() probe used by the scan layer's dict zip-bomb guard, and is marked @deprecated(forRemoval = true) — it is deleted once the decode-limits layer owns that bound. Behaviour is preserved, including the broadcast edge where a constant column materialises to a single-element buffer. Adds ArrayMaterializeTest for direct coverage (zero-copy, scalar/bitmap fallbacks, the Lazy formulas, chunked/dict, constant-decimal, masked delegation, and all no-primary-segment throw cases). Updates ADR 0016 to document materialize() as the shipped producer of the Arrow C-Data values buffer. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
c5990a9 to
5607bcf
Compare
| if (getBoolean(i)) { | ||
| long byteIndex = i >>> 3; | ||
| byte b = dst.get(ValueLayout.JAVA_BYTE, byteIndex); | ||
| dst.set(ValueLayout.JAVA_BYTE, byteIndex, (byte) (b | (1 << (i & 7)))); |
There was a problem hiding this comment.
this should call getBoolean(i)
What
ArraySegmentsre-stated each encoding's decode formula (FoR / ZigZag / ALP), the chunked concat, and the dict gather in one large central switch — separate from the per-element accessor that already lives on each lazy array. This addsArray.materialize(SegmentAllocator)and pushes every case onto the type that owns it, mirroring the existingArray.limited()polymorphism.ArraySegments.of()collapses toarr.materialize(arena)(407 → 71 lines).trySegment()— the non-allocating zero-copy probe — is unchanged.Where materialize lives now
Arrayroot: default throws "no primary segment" (struct / list / variant / byte-parts decimal).getXfallback — covers every lazy type with no special bulk form (RLE/RunEnd/Sparse/Constant/AlpRd) for free.BoolArraypacks an LSB-first bitmap.Materialized*/Generic/VarBin/LazyDecimal: return existing buffer, zero-copy.For/ZigZag/Alp: vectorisable formula loop, now beside their owngetX.Chunked*concat,Dict*gather,LazyConstantDecimalfill,MaskedArraydelegate to inner.Behaviour
Preserved.
of()/trySegment()semantics identical, including the broadcast short-buffer edge (a constant column's 1-element buffer, matching prior behaviour — not Arrow-complete; see ADR 0016). New: lazy bool now materialises instead of throwing.The contiguous little-endian segment
materialize()yields is the producer step for the Arrow C-Data values buffer described in ADR 0016 — but this PR is a standalone locality cleanup, not an Arrow feature.Tests
core 223, reader 540, writer 527, integration 268 (2 env-skipped) — all green.
verifypackaged clean, javadocfailOnWarningspassed.🤖 Generated with Claude Code