Commit acae23c

dfa1 and claude committed
docs(adr): mark Array.materialize seam as shipped in ADR 0016
The materialise seam Option B (Arrow C-Data) builds on is now implemented: ArraySegments.of() removed, Array.materialize(SegmentAllocator) is a public abstract method. Update the relationship section to past tense, note the public (not package-private) contract decision, and record the broadcast edge the Arrow producer must expand.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
1 parent 1998cea commit acae23c

1 file changed

Lines changed: 28 additions & 16 deletions

docs/adr/0016-vortex-arrow-bridge.md

@@ -115,8 +115,7 @@ Three pieces of work beyond just handing over the existing mmap slices:
 2. **Lazy materialisation.** Lazy arrays (ZigZag/FoR/ALP/Dict/RLE) store the
 *encoded* form, which is not the Arrow values layout, so they must be materialised
 into a contiguous LE segment first. This is exactly the producer step that
-`ArraySegments.of(...)` (or a future `Array.materialize(arena)` delegation seam,
-see below) performs, so the internal materialise path feeds the `values` buffer
+`Array.materialize(arena)` performs (see below), so it feeds the `values` buffer
 directly. Primitive values, VarBin data+offsets, and StringView are already
 Arrow-shaped (zero-copy).
 3. **Lifetime / release contract.** Buffers are zero-copy slices of the mmap'd file
@@ -127,20 +126,33 @@ Three pieces of work beyond just handing over the existing mmap slices:
 consumer calls `release` is a use-after-unmap → native segfault, not a Java
 exception. This is the highest-risk part.

-### Relationship to the internal materialise seam
-
-`ArraySegments.of(Array, SegmentAllocator)` already centralises "turn any array
-(lazy or eager) into a contiguous LE primitive segment", and currently re-states each
-encoding's decode formula (ZigZag/FoR/ALP) in a large switch separate from the
-per-element accessor on the lazy array. A standalone refactor — moving that bulk
-materialisation onto the array types as an `Array.materialize(SegmentAllocator)`
-delegation (mirroring the existing `Array.limited(...)` pattern, kept on a
-package-private seam to avoid widening the public API) — stands on its own as a
-locality cleanup. It is **not** an Arrow feature, but it is the natural producer of
-the Arrow `values` buffer, so Option B should build on it rather than duplicate it.
-The contiguous LE segment it yields already matches Arrow's primitive values-buffer
-layout; the gap to a full Arrow array is validity + offsets + children, per the table
-above.
+### Relationship to the `Array.materialize` seam (shipped)
+
+The bulk-materialisation seam Option B builds on now exists. Previously
+`ArraySegments.of(Array, SegmentAllocator)` centralised "turn any array (lazy or eager)
+into a contiguous LE primitive segment" in a large switch that re-stated each encoding's
+decode formula (ZigZag/FoR/ALP) separately from the per-element accessor on the lazy
+array. That switch was pushed onto the array types themselves as
+`Array.materialize(SegmentAllocator)` — a pure abstract method (mirroring the existing
+`Array.limited(...)` polymorphism); `ArraySegments.of()` was removed and callers invoke
+`arr.materialize(arena)` directly. Each type owns its path: segment-backed arrays return
+their buffer zero-copy, the `Lazy*` variants apply their inlined formula in a vectorisable
+loop, chunked/dict arrays concat/gather, and the families with no primary segment (struct,
+list, variant, byte-parts decimal, null, unknown) throw.
+
+This was a standalone locality cleanup, **not** an Arrow feature — but it is the natural
+producer of the Arrow `values` buffer, so Option B should build on it rather than
+duplicate it. The contiguous LE segment it yields already matches Arrow's primitive
+values-buffer layout. Two gaps remain to a full Arrow array, both per the table above:
+validity + offsets + children; and the broadcast edge — a constant column materialises to
+a single-element buffer (`length != elementCount`), which `materialize()` returns as-is
+(matching the prior `ArraySegments` behaviour), so the Arrow producer must expand it to
+`length` values.
+
+`materialize` is intentionally part of the public `Array` contract (not a package-private
+seam): it is the documented way to obtain a column's contiguous primitive buffer, and a
+future `vortex-arrow` module in a separate package consumes it without further API
+widening.

 ### Option C — No bridge; document manual conversion

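The seam and the broadcast edge described in the diff can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class names (`MaterializeSketch`, `EagerLongArray`, `LazyForArray`, `ConstantLongArray`, `StructArray`) and the helper `toArrowValues` are hypothetical and are not the project's real types. It also uses `java.nio.ByteBuffer` as a stand-in for the `MemorySegment` that the real `Array.materialize(SegmentAllocator)` presumably returns, and handles only 64-bit integers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-ins for the ADR's array types; not the project's real classes.
final class MaterializeSketch {

    // The seam: every array type produces its own contiguous little-endian buffer.
    static abstract class Array {
        final int length;                       // logical row count
        Array(int length) { this.length = length; }
        abstract ByteBuffer materialize();      // the real method takes a SegmentAllocator
    }

    // Segment-backed array: hands back its existing buffer without copying.
    static final class EagerLongArray extends Array {
        final ByteBuffer values;
        EagerLongArray(ByteBuffer values) {
            super(values.remaining() / Long.BYTES);
            this.values = values;
        }
        @Override ByteBuffer materialize() { return values; }
    }

    // Lazy frame-of-reference array: decoded[i] = encoded[i] + reference.
    static final class LazyForArray extends Array {
        final long[] encoded;
        final long reference;
        LazyForArray(long[] encoded, long reference) {
            super(encoded.length);
            this.encoded = encoded;
            this.reference = reference;
        }
        @Override ByteBuffer materialize() {
            ByteBuffer out = ByteBuffer.allocate(length * Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
            for (int i = 0; i < length; i++) {
                out.putLong(i * Long.BYTES, encoded[i] + reference);
            }
            return out;
        }
    }

    // Constant column: materialises to ONE element even when length is larger.
    static final class ConstantLongArray extends Array {
        final long value;
        ConstantLongArray(long value, int length) { super(length); this.value = value; }
        @Override ByteBuffer materialize() {
            return ByteBuffer.allocate(Long.BYTES).order(ByteOrder.LITTLE_ENDIAN).putLong(0, value);
        }
    }

    // A family with no primary segment: materialize throws.
    static final class StructArray extends Array {
        StructArray(int length) { super(length); }
        @Override ByteBuffer materialize() {
            throw new UnsupportedOperationException("struct has no primary segment");
        }
    }

    // Arrow-producer side: expand a single-element (broadcast) buffer to `length` values.
    static ByteBuffer toArrowValues(Array arr) {
        ByteBuffer buf = arr.materialize();
        int elementCount = buf.remaining() / Long.BYTES;
        if (elementCount == arr.length) {
            return buf;                         // already one value per row
        }
        if (elementCount != 1) {
            throw new IllegalStateException("unexpected element count " + elementCount);
        }
        long constant = buf.getLong(buf.position());
        ByteBuffer out = ByteBuffer.allocate(arr.length * Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < arr.length; i++) {
            out.putLong(i * Long.BYTES, constant);
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer expanded = toArrowValues(new ConstantLongArray(7L, 4));
        System.out.println(expanded.remaining() / Long.BYTES + " values");
    }
}
```

In this sketch the expansion lives in the producer rather than in `materialize()`, which matches the diff's statement that `materialize()` returns the single-element buffer as-is. Whether a real Arrow producer should expand eagerly or use a different representation for constants is left open here.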