@@ -115,8 +115,7 @@ Three pieces of work beyond just handing over the existing mmap slices:
115115 2. **Lazy materialisation.** Lazy arrays (ZigZag/FoR/ALP/Dict/RLE) store the
116116 *encoded* form, which is not the Arrow values layout, so they must be materialised
117117 into a contiguous LE segment first. This is exactly the producer step that
118- `ArraySegments.of(...)` (or a future `Array.materialize(arena)` delegation seam,
119- see below) performs, so the internal materialise path feeds the `values` buffer
118+ `Array.materialize(arena)` performs (see below), so it feeds the `values` buffer
120119 directly. Primitive values, VarBin data+offsets, and StringView are already
121120 Arrow-shaped (zero-copy).
122121 3. **Lifetime / release contract.** Buffers are zero-copy slices of the mmap'd file
@@ -127,20 +126,33 @@ Three pieces of work beyond just handing over the existing mmap slices:
127126 consumer calls ` release ` is a use-after-unmap → native segfault, not a Java
128127 exception. This is the highest-risk part.
129128
130- ### Relationship to the internal materialise seam
131-
132- `ArraySegments.of(Array, SegmentAllocator)` already centralises "turn any array
133- (lazy or eager) into a contiguous LE primitive segment", and currently re-states each
134- encoding's decode formula (ZigZag/FoR/ALP) in a large switch separate from the
135- per-element accessor on the lazy array. A standalone refactor, moving that bulk
136- materialisation onto the array types as an `Array.materialize(SegmentAllocator)`
137- delegation (mirroring the existing `Array.limited(...)` pattern, kept on a
138- package-private seam to avoid widening the public API), stands on its own as a
139- locality cleanup. It is **not** an Arrow feature, but it is the natural producer of
140- the Arrow `values` buffer, so Option B should build on it rather than duplicate it.
141- The contiguous LE segment it yields already matches Arrow's primitive values-buffer
142- layout; the gap to a full Arrow array is validity + offsets + children, per the table
143- above.
129+ ### Relationship to the `Array.materialize` seam (shipped)
130+
131+ The bulk-materialisation seam Option B builds on now exists. Previously,
132+ `ArraySegments.of(Array, SegmentAllocator)` centralised "turn any array (lazy or eager)
133+ into a contiguous LE primitive segment" in a large switch that re-stated each encoding's
134+ decode formula (ZigZag/FoR/ALP) separately from the per-element accessor on the lazy
135+ array. That switch was pushed onto the array types themselves as
136+ `Array.materialize(SegmentAllocator)`, a pure abstract method (mirroring the existing
137+ `Array.limited(...)` polymorphism); `ArraySegments.of()` was removed and callers invoke
138+ `arr.materialize(arena)` directly. Each type owns its path: segment-backed arrays return
139+ their buffer zero-copy, the `Lazy*` variants apply their inlined formula in a vectorisable
140+ loop, chunked/dict arrays concat/gather, and the families with no primary segment (struct,
141+ list, variant, byte-parts decimal, null, unknown) throw.
142+
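The per-type ownership described above can be illustrated with a minimal sketch. Everything here is hypothetical and is not the project's code: `SketchArray`, `PrimitiveLongs`, `LazyFor`, `LazyZigZag` and `StructLike` are made-up names, only int64 values are modelled, and `java.nio.ByteBuffer` stands in for the `MemorySegment`/`SegmentAllocator` pair so the example also runs on JDKs without the finalised foreign-memory API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class MaterializeSketch {

    /** Hypothetical stand-in for the Array contract; only materialize() is modelled. */
    public interface SketchArray {
        /** Returns the values as a contiguous little-endian int64 buffer. */
        ByteBuffer materialize();
    }

    /** Segment-backed array: values are already contiguous, so hand back a view (no copy). */
    public static final class PrimitiveLongs implements SketchArray {
        private final ByteBuffer segment;
        public PrimitiveLongs(ByteBuffer segment) { this.segment = segment; }
        @Override public ByteBuffer materialize() {
            // duplicate() shares the bytes but resets byte order, so set LE explicitly.
            return segment.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        }
    }

    /** Lazy frame-of-reference: logical value = reference + stored delta. */
    public static final class LazyFor implements SketchArray {
        private final long reference;
        private final long[] deltas;
        public LazyFor(long reference, long[] deltas) {
            this.reference = reference;
            this.deltas = deltas;
        }
        @Override public ByteBuffer materialize() {
            ByteBuffer out = ByteBuffer.allocate(deltas.length * Long.BYTES)
                                       .order(ByteOrder.LITTLE_ENDIAN);
            for (long d : deltas) {
                out.putLong(reference + d); // decode formula lives next to the type
            }
            return out.flip();
        }
    }

    /** Lazy ZigZag: logical value = (n >>> 1) ^ -(n & 1). */
    public static final class LazyZigZag implements SketchArray {
        private final long[] encoded;
        public LazyZigZag(long[] encoded) { this.encoded = encoded; }
        @Override public ByteBuffer materialize() {
            ByteBuffer out = ByteBuffer.allocate(encoded.length * Long.BYTES)
                                       .order(ByteOrder.LITTLE_ENDIAN);
            for (long n : encoded) {
                out.putLong((n >>> 1) ^ -(n & 1));
            }
            return out.flip();
        }
    }

    /** Family with no primary values segment: materialising is an error. */
    public static final class StructLike implements SketchArray {
        @Override public ByteBuffer materialize() {
            throw new UnsupportedOperationException("no primary values segment");
        }
    }
}
```

The point of the shape is locality: each decode formula sits beside the type that owns the encoding, and the caller needs no switch over encodings.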
143+ This was a standalone locality cleanup, **not** an Arrow feature, but it is the natural
144+ producer of the Arrow `values` buffer, so Option B should build on it rather than
145+ duplicate it. The contiguous LE segment it yields already matches Arrow's primitive
146+ values-buffer layout. Two gaps remain to a full Arrow array, both per the table above:
147+ (1) validity + offsets + children, and (2) the broadcast edge, where a constant column
148+ materialises to a single-element buffer (`length != elementCount`), which `materialize()`
149+ returns as-is (matching the prior `ArraySegments` behaviour), so the Arrow producer must
150+ expand it to `length` values.
151+
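As a rough illustration of the broadcast edge, the sketch below expands a single-element buffer to `length` values on the producer side. It is an assumption-laden example, not the project's implementation: `BroadcastExpand` and `expandLongs` are hypothetical names, only int64 is handled, and `java.nio.ByteBuffer` again stands in for a `MemorySegment`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BroadcastExpand {

    /**
     * Returns a little-endian int64 buffer holding exactly {@code length} values.
     * A buffer that already holds {@code length} elements is passed through as a view;
     * a single-element buffer (constant column) is repeated {@code length} times.
     */
    public static ByteBuffer expandLongs(ByteBuffer materialised, int length) {
        ByteBuffer src = materialised.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int elementCount = src.remaining() / Long.BYTES;

        if (elementCount == length) {
            return src; // already full length: no copy needed
        }
        if (elementCount != 1) {
            throw new IllegalArgumentException(
                "expected 1 or " + length + " elements, got " + elementCount);
        }

        long constant = src.getLong(src.position());
        ByteBuffer out = ByteBuffer.allocate(Math.multiplyExact(length, Long.BYTES))
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < length; i++) {
            out.putLong(constant); // repeat the broadcast value
        }
        return out.flip();
    }
}
```

Keeping the expansion in the producer leaves `materialize()` untouched, so existing callers that rely on the single-element result are unaffected.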
152+ `materialize` is intentionally part of the public `Array` contract (not a package-private
153+ seam): it is the documented way to obtain a column's contiguous primitive buffer, and a
154+ future `vortex-arrow` module in a separate package consumes it without further API
155+ widening.
144156
145157 ### Option C — No bridge; document manual conversion
146158