
feat(inspector): new module + FFM-based TUI #23

Merged
dfa1 merged 37 commits into main from feat/inspector-module
Jun 9, 2026

@dfa1 dfa1 commented Jun 8, 2026

Owner

Summary

  • New vortex-inspector module: text + interactive TUI for inspecting a Vortex file's structure, encodings, segments, stats, dictionary entries, and per-cell data.
  • Zero-dependency TUI — drops Lanterna, drives the terminal directly via ANSI escapes + FFM (tcgetattr/cfmakeraw/ioctl on POSIX, kernel32 on Windows). Saves ~600 KB + the JNA transitive on Windows. CI gains a windows-latest job that runs inspector tests + a smoke test exercising the kernel32 FFM downcalls.
  • CLI: inspect <file|url> for the quick text dump, new tui <file|url> for the interactive viewer. Both accept local paths and http(s):// URLs.
  • TUI features: layout tree on the left, details on the right; per-array min/max stats; segment offset / length / compression / bits/elem per leaf; first 32 decoded values per column; dictionary entries when a vortex.dict node is selected; per-chunk stats children explicitly tagged; hex fallback for non-column nodes; spinner + status row driven by an IoWorker so input never blocks.
  • Decode helpers promoted to core so any reader-jar consumer benefits:
    • GenericArray.getDecimal(long) — handles both vortex.decimal (single buffer) and vortex.decimal_byte_parts (one mantissa child) shapes; element width derived from buffer size, not precision (catches narrower-than-precision encodings).
    • Extensions.localDate(Array, long) / localDate(DType.Extension, Array, long) — decodes vortex.date from any signed-integer storage array; the explicit-dtype overload verifies the extension id rather than trusting any caller-supplied storage.
  • Reader fix: ScanIterator.truncateArray now supports GenericArray via a new GenericArray.withLength so ScanOptions.withLimit works on decimal columns.

Test plan

  • ./mvnw verify (full reactor) green on macOS
  • ./mvnw javadoc:javadoc -pl inspector clean
  • inspector-windows CI job (windows-latest) passes
  • Manual: java -jar vortex.jar inspect <local.vtx> prints schema / used encodings / segment table / stats
  • Manual: java -jar vortex.jar tui <local.vtx> opens TUI, navigates struct → stats → chunked → flat, shows decoded data + dict entries + decimal/date values + bits/elem
  • Manual: java -jar vortex.jar tui https://vortex-compat-fixtures.s3.amazonaws.com/v0.72.0/arrays/tpch_lineitem.regular.vortex — instant open, progressive column fill, no sluggishness on navigation

🤖 Generated with Claude Code

dfa1 and others added 30 commits June 9, 2026 07:17
Moves VortexInspector out of reader into a new vortex-inspector module
and adds InspectorTree (immutable structural snapshot) plus a Lanterna
two-pane TUI (`inspect --tui`). CLI also gains http(s):// URL support.

The text report and TUI now display total row count and per-segment
size, offset, and compression scheme, resolving every segment index
against the footer's segment table.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Drops the lanterna dependency in favour of a small in-tree terminal
abstraction built on Java 25's FFM (MemorySegment / Linker), keeping
the project's no-JNI / no-Unsafe stance and shrinking the runtime
footprint by ~600 KB (plus the transitive JNA on Windows).

New package io.github.dfa1.vortex.inspect.term:

- RawTerminal — sealed AutoCloseable abstraction over POSIX / Windows.
- PosixTerminal — libc tcgetattr/cfmakeraw/tcsetattr/ioctl(TIOCGWINSZ)
  via FFM. Saves and restores the prior termios; a shutdown hook
  guarantees restoration if the caller forgoes try-with-resources.
- WindowsTerminal — kernel32 GetStdHandle / Get/SetConsoleMode and
  GetConsoleScreenBufferInfo via FFM. Enables VT processing on stdout
  and VT input on stdin (Win10 1809+).
- Ansi — CSI escape constants + moveTo / fg / bg helpers.
- Key — sealed key event type (arrows, PgUp/Dn, Home/End, Enter, Esc,
  Eof, Char).
- KeyDecoder — stateless byte-stream → Key decoder covering xterm
  CSI letter and tilde sequences.

The inspector tree and text renderer are unchanged; only
VortexInspectorTui swaps its drawing backend.
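
The CSI letter and tilde forms mentioned for KeyDecoder can be illustrated with a minimal sketch. This is not the PR's implementation: `KeySketch`, its `Key` enum and `decode` are hypothetical names, it decodes one complete sequence per call rather than a byte stream, and the tilde codes (1/4/5/6) are the common xterm assignments assumed here.

```java
/** Hypothetical sketch: decodes one complete key sequence into a key kind. */
final class KeySketch {
    // The PR's Key is a sealed type; a plain enum keeps this sketch short.
    enum Key { UP, DOWN, RIGHT, LEFT, HOME, END, PAGE_UP, PAGE_DOWN, ENTER, ESC, CHAR, UNKNOWN }

    private static final byte ESC_BYTE = 0x1b;

    static Key decode(byte[] seq) {
        if (seq.length == 0) return Key.UNKNOWN;
        if (seq.length == 1) {
            if (seq[0] == ESC_BYTE) return Key.ESC;
            if (seq[0] == '\r' || seq[0] == '\n') return Key.ENTER;
            return Key.CHAR;
        }
        // Everything else handled here must start with CSI: ESC '['.
        if (seq.length < 3 || seq[0] != ESC_BYTE || seq[1] != '[') return Key.UNKNOWN;
        if (seq.length == 3) return letter((char) seq[2]);
        return tilde(seq);
    }

    /** CSI letter form: ESC [ A, ESC [ B, ... */
    private static Key letter(char c) {
        switch (c) {
            case 'A': return Key.UP;
            case 'B': return Key.DOWN;
            case 'C': return Key.RIGHT;
            case 'D': return Key.LEFT;
            case 'H': return Key.HOME;
            case 'F': return Key.END;
            default:  return Key.UNKNOWN;
        }
    }

    /** CSI tilde form: ESC [ <digits> ~ */
    private static Key tilde(byte[] seq) {
        if (seq[seq.length - 1] != '~') return Key.UNKNOWN;
        int code = 0;
        for (int i = 2; i < seq.length - 1; i++) {
            if (seq[i] < '0' || seq[i] > '9') return Key.UNKNOWN;
            code = code * 10 + (seq[i] - '0');
        }
        switch (code) {
            case 1: return Key.HOME;
            case 4: return Key.END;
            case 5: return Key.PAGE_UP;
            case 6: return Key.PAGE_DOWN;
            default: return Key.UNKNOWN;
        }
    }
}
```

Keeping the decoder a pure function of its input bytes is what makes it easy to test without a real terminal.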

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous catch printed only e.getMessage(), which surfaced as
"error: null" whenever the exception had no message (e.g. one
created with a no-argument constructor). The new describe() prints the
simple class name plus the cause chain so failures are diagnosable
without rebuilding. Setting VORTEX_DEBUG=1 still emits the full stack
trace. The catch now also covers RuntimeException so unchecked
failures during TUI rendering surface the same way.
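
A rough sketch of what such a describe() helper could look like follows. The class name, the separator text and the depth cap are assumptions for illustration; only the idea (simple class name plus cause chain) comes from the commit message.

```java
/** Hypothetical sketch of a one-line error description with its cause chain. */
final class ErrorDescriber {
    private static final int MAX_DEPTH = 16; // assumed cap, guards against very long chains

    static String describe(Throwable error) {
        StringBuilder sb = new StringBuilder();
        Throwable current = error;
        for (int depth = 0; current != null && depth < MAX_DEPTH; depth++) {
            if (depth > 0) sb.append(" <- caused by ");
            // The class name is always present, so the output is never just "null".
            sb.append(current.getClass().getSimpleName());
            if (current.getMessage() != null) {
                sb.append(": ").append(current.getMessage());
            }
            current = current.getCause();
        }
        return sb.toString();
    }
}
```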

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`inspect <file|url>` keeps the quick text report; the new `tui
<file|url>` subcommand opens the interactive viewer. Drops the
`--tui` flag from `inspect` - the split matches how the two modes
are used in practice and avoids mixing a one-shot output command
with one that takes over the terminal.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
InspectorTree.Node now carries an ArrayStats record decoded from the
flat segment's FlatBuffer Array root (the same source the scan reader
uses for zone-map pruning). The text renderer aggregates min/max
across each column's leaves and prints them after the per-column
encoding bracket. The TUI details pane shows the selected node's own
min/max under a new 'Stats:' section.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a windows-latest job that runs `./mvnw test -pl inspector -am`
so regressions in the FFM kernel32 bindings surface in CI.

Also adds WindowsTerminalSmokeTest which (1) loads WindowsTerminal to
force every kernel32 downcallHandle to resolve its symbol (a missing
entry point throws UnsatisfiedLinkError during static init) and (2)
verifies the input / output console-mode flag math against the values
WindowsTerminal applies. The test class is gated on
`@EnabledOnOs(OS.WINDOWS)`; it's skipped on the existing Linux job.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Building the inspector tree calls handle.slice() once per Flat segment
to peek the encoding id and per-array stats. On VortexHttpReader that
slice triggers a separate HTTP range request, so on a remote file
with dozens of segments the TUI sits idle for several seconds before
the screen appears.

Adds InspectorTree.Progress (functional interface, NOOP default) and
an InspectorTree.build(handle, progress) overload that fires
(current, total) on each peek. VortexInspectorTui.show gains a
matching overload. TuiCommand wires a stderr progress bar so the
delay is visible. The single-arg variants are kept for callers that
don't want a callback.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When a Flat node is selected, the details pane now shows the first
256 bytes of its first segment alongside encoding / stats / segment
metadata. Output mirrors xxd: 8-digit hex offset, 16 hex bytes split
in two groups of 8, plus a printable-ASCII column.
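
The row layout described above (offset, two groups of eight hex bytes, ASCII column) could be produced roughly as follows. `HexPreview.format` is a hypothetical helper, and the exact spacing is an assumption rather than the TUI's actual output.

```java
/** Hypothetical sketch of an xxd-like preview of the first `limit` bytes. */
final class HexPreview {
    static String format(byte[] data, int limit) {
        int n = Math.min(data.length, limit);
        StringBuilder out = new StringBuilder();
        for (int row = 0; row < n; row += 16) {
            out.append(String.format("%08x  ", row));       // 8-digit hex offset
            StringBuilder ascii = new StringBuilder();
            for (int col = 0; col < 16; col++) {
                if (col == 8) out.append(' ');               // gap between the two groups of 8
                int i = row + col;
                if (i < n) {
                    int b = data[i] & 0xff;                  // treat the byte as unsigned
                    out.append(String.format("%02x ", b));
                    ascii.append(b >= 0x20 && b < 0x7f ? (char) b : '.');
                } else {
                    out.append("   ");                       // pad a short final row
                }
            }
            out.append(' ').append(ascii).append('\n');
        }
        return out.toString();
    }
}
```

Calling `HexPreview.format(bytes, 256)` matches the 256-byte preview size mentioned above.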

Bytes are fetched on demand via VortexHandle.slice and cached per
node so repeated re-renders on the same selection don't re-trigger an
HTTP range request on remote files. Slice failures degrade silently
to "no hex preview" rather than crashing the loop.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Opens the TUI instantly even on remote files by switching to
InspectorTree.buildShallow(handle), which derives the layout tree
from the footer alone (no slice calls). Encoding id, per-array stats,
and a data preview are now fetched per node on demand the first time
the user selects it, then cached for free re-renders.

Adds:
  - InspectorTree.buildShallow(VortexHandle) - structure-only build
  - InspectorTree.Peek (public record) and InspectorTree.peek(Node, VortexHandle)
    for one-shot lazy resolution of encoding + stats
  - VortexInspectorTui now invokes a small scan (limit 32, projected to
    the selected node's owning column) and formats the resulting Array
    via a pattern switch on the Array sealed hierarchy. Raw hex remains
    as a fallback when the selected node isn't inside any column.

The existing eager InspectorTree.build(handle, progress) path is kept
for the text-mode `inspect` command (and the test suite), so the only
behaviour change for non-TUI consumers is the new Peek type.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Data previews now load on virtual threads, so navigating doesn't
stall the input loop on each new column. A small ASCII spinner
shows next to "Data (column 'X'): | loading..." in the details pane
while the fetch is in flight; once a virtual thread completes the
cache entry flips to Loaded and the next render shows the values.
Failed fetches surface as "! <message>" in the same slot.

The main loop now polls via RawTerminal.readKey(timeoutMs) every
80 ms, so the spinner animates and completed fetches paint as soon
as they land — no need for the user to press a key.

A new status row sits between the body and the keybinding footer:
green "ready" when idle, blue "<spinner> I/O N pending" while
fetches are in flight, red "! <message>" sticky on the last error.

Top-level struct columns are pre-fetched in the Loop constructor so
the user can scroll through them with cache hits rather than cold
misses.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous commit moved data fetches onto virtual threads, but
VortexReader/VortexHttpReader use Arena.ofConfined() — every slice()
and scan() must run on the thread that opened the handle. Virtual
threads tripped the FFM scope check ("Attempted access outside
owning thread"), so every column showed up as a Failed entry.

Adds a single-threaded IoWorker that:
  - opens the VortexHandle on its own thread (via runAndAwait at
    startup) so the confined Arena is owned by the worker
  - executes every subsequent peek / hex slice / scan submitted by
    the TUI on that same thread
  - exposes pending() so the status row can show "I/O N pending"
    without the Loop tracking its own counter

The render thread now never touches the handle directly. Peek and
hex preview switched from synchronous to fire-and-forget submit; the
detail pane shows the spinner until the first result lands.
TuiCommand opens, runs, and closes the handle entirely through the
worker.
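
The single-owner-thread pattern described above can be sketched with a plain single-thread executor. This is an illustration under assumptions, not the PR's IoWorker: `SingleThreadWorker` is a hypothetical name, the method shapes are guessed from the commit message, and failures are simply remembered in a field.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical sketch: every task runs on one long-lived thread. */
final class SingleThreadWorker implements AutoCloseable {
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "io-worker");
        t.setDaemon(true);
        return t;
    });
    private final AtomicInteger pending = new AtomicInteger();
    private volatile RuntimeException lastError;

    /** Fire-and-forget: the task runs later on the worker thread. */
    void submit(Runnable task) {
        pending.incrementAndGet();
        executor.execute(() -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                // Swallow here so the executor does not replace its thread,
                // which would break thread-confined resources.
                lastError = e;
            } finally {
                pending.decrementAndGet();
            }
        });
    }

    /** Runs the task on the worker thread and blocks until it returns. */
    <T> T runAndAwait(Supplier<T> task) {
        pending.incrementAndGet();
        try {
            Callable<T> call = task::get;
            return executor.submit(call).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("worker task failed", e.getCause());
        } finally {
            pending.decrementAndGet();
        }
    }

    int pending() { return pending.get(); }

    RuntimeException lastError() { return lastError; }

    @Override public void close() { executor.shutdown(); }
}
```

Opening a thread-confined resource inside `runAndAwait` at startup and touching it only from later `submit` or `runAndAwait` calls keeps every access on the owning thread.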

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ScanIterator.truncateArray throws "limit: truncation not supported
for GenericArray" for the array shape used by decimal_byte_parts
(and likely other fallback dtypes). The TUI's data preview was
asking for withLimit(32) on every column, which made decimal
columns fail outright with that message landing in the status row.

Since the slicing happens inside the format loop anyway
(Math.min(array.length(), DATA_PREVIEW_ROWS)) and chunks are the
natural granularity of a Vortex scan, the withLimit call wasn't
actually saving any work. Removing it sidesteps the reader bug
without losing functionality.

Also makes the default formatValue branch include the dtype so
GenericArray cells render as "<GenericArray decimal(15,2)>" rather
than an opaque "<GenericArray>".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… decimals in TUI

`ScanIterator.truncateArray` previously threw "limit: truncation not
supported for GenericArray" whenever ScanOptions.withLimit was used
on a column decoded into a GenericArray (decimal, ext, datetimeparts,
constant). The new branch calls `GenericArray.withLength(rows)`,
which reuses the same buffers and children — safe because callers
already bound their reads by `length()`.

GenericArray gains a small public surface (`withLength`,
`bufferCount`, `bufferAt`, `childCount`) so renderers can introspect
the underlying buffer without reaching for the package-private
accessor.

The TUI uses the new accessors to decode Decimal cells properly:
read the little-endian two's-complement mantissa from the single
buffer at width derived from `precision`, then format via BigDecimal
with `scale`. Other GenericArray-shaped cells still fall back to the
"<GenericArray dtype>" placeholder.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous commit put the BigInteger/BigDecimal decoding helpers
inside VortexInspectorTui, which meant only the inspector module
could turn a Decimal-shaped GenericArray cell into a number. Anyone
consuming vortex-reader directly (CLI export, JDBC, downstream
applications) still saw an opaque buffer.

Promotes the logic to a public method on GenericArray itself:

    BigDecimal value = a.getDecimal(i);

Width is derived from dtype's precision (1 / 2 / 4 / 8 / 16 bytes
for precision ≤ 2 / 4 / 9 / 18 / 38) and scaled by dtype's scale.
Throws VortexException on misuse (non-decimal dtype, multi-buffer
array). TUI's formatValue now calls a.getDecimal(i).toPlainString()
instead of its own private helpers.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Extends getDecimal to handle the second shape Vortex decoders produce
for decimal columns: zero buffers, one child carrying the
most-significant integer part as a typed array (LongArray /
IntArray / ShortArray / ByteArray, optionally wrapped in a
MaskedArray). This is the shape vortex.decimal_byte_parts emits when
lower_part_count == 0 — i.e. tpch_lineitem.regular's l_quantity,
l_extendedprice, l_discount and l_tax columns.

The TUI's pattern switch drops its single-buffer guard and now calls
getDecimal whenever the dtype is Decimal, falling back to the
placeholder only if getDecimal itself rejects the shape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a small core helper for the most common Vortex extension dtype.
Storage is days since the Unix epoch (Arrow convention) carried in
any signed integer primitive array — ByteArray, ShortArray, IntArray,
LongArray, or a MaskedArray wrapping one of those. Extensions.DATE
holds the canonical "vortex.date" id string so callers don't have to
hard-code it.

  LocalDate d = Extensions.localDate(array, i);

The TUI's data preview now calls localDate() for ext<vortex.date>
columns (l_shipdate / l_commitdate / l_receiptdate in tpch_lineitem),
so the values render as 1996-02-12 instead of "9538". Falls back to
the generic per-array switch if localDate throws on an unexpected
storage shape.
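
The underlying conversion is small enough to show in isolation. The sketch below uses a plain `long[]` as a stand-in for the storage array, and `EpochDays` is a hypothetical name; only `LocalDate.ofEpochDay` is the real JDK call.

```java
import java.time.LocalDate;

/** Hypothetical sketch: days since the Unix epoch to LocalDate. */
final class EpochDays {
    static LocalDate toLocalDate(long[] storage, int index) {
        if (index < 0 || index >= storage.length) {
            throw new IndexOutOfBoundsException("index " + index + " out of range");
        }
        // Day 0 is 1970-01-01; negative values are dates before the epoch.
        return LocalDate.ofEpochDay(storage[index]);
    }
}
```

For example, `EpochDays.toLocalDate(new long[] {9538}, 0)` yields 1996-02-12, the value quoted above.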

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…elected

The TUI's details pane now decodes child[0] (the values layout) of any
selected vortex.dict node and renders the unique entries underneath
"Dictionary (N entries):". Same DataState/spinner machinery as the
column data preview, so the lookup runs on the IoWorker and the UI
stays responsive.

To make this possible without duplicating decoder plumbing, VortexHandle
gets a single `registry()` accessor — same internal-escape-hatch shape
as `slice()`. The actual decode is now one inline call in the TUI:

    FlatSegmentDecoder(handle.registry())
        .decode(handle.slice(...), handle.footer().arraySpecs(),
                dtype, values.rowCount(), arena);

The previous `decodeFlatLayout` method on VortexHandle (added in the
same session) and its duplicated impls in VortexReader / VortexHttpReader
are gone — that method was leaking encoding-decoder plumbing into the
file-handle interface and was duplicated across both readers.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ExtEncoding.decode unwraps the storage child and returns it with its
primitive dtype (I32 for dates), so the column Array no longer carries
the Extension marker by the time it reaches the TUI. The previous
guard `array.dtype() instanceof DType.Extension` therefore never
matched and dates rendered as raw epoch-day integers (9577, 9606, ...)
instead of 1996-03-21.

The TUI now threads the column's declared dtype (looked up in the
top-level struct schema) through to formatValue, so the date check
runs against the schema-level type rather than the post-unwrap array
type. The new helper Extensions.localDateFromStorage decodes from any
signed-integer storage array without re-checking the dtype, since the
caller has already established context.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…egment lines

Renames the segment table's "comp=" column to "compression=" in both
the text inspector and the TUI details pane so the field reads as
plain English. The TUI segment lines also gain a "bits/elem=N.NN"
suffix computed from the owning layout's row count and segment byte
length, which makes the encoding's compression ratio obvious at a
glance (e.g. bitpacked vs flat for the same column).
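
As a worked example of the arithmetic, bits per element is the segment's byte length times 8, divided by the owning layout's row count. The helper below is a hypothetical sketch; the handling of a zero row count is an assumption.

```java
import java.util.Locale;

/** Hypothetical sketch of the bits/elem figure for one segment. */
final class BitsPerElement {
    static double bitsPerElem(long segmentBytes, long rowCount) {
        if (rowCount <= 0) return Double.NaN; // assumed: no meaningful ratio without rows
        return segmentBytes * 8.0 / rowCount;
    }

    static String suffix(long segmentBytes, long rowCount) {
        return String.format(Locale.ROOT, "bits/elem=%.2f", bitsPerElem(segmentBytes, rowCount));
    }
}
```

A 4096-byte segment backing 8192 rows gives `bits/elem=4.00`, compared with 32.00 for the same rows stored as plain 32-bit integers.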

The top-level segment table in the text inspector keeps the same
columns minus bits/elem — that table is global and a single segment
can be reused across layouts with different row counts, so a single
bits/elem number would be misleading.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The TUI now appends ", stats" to the row count in the tree view for
nodes that hold zone-map statistics rather than column data:

- the second child of any vortex.stats (Zoned) node
- the first child of a vortex.chunked layout whose metadata byte 0
  is set to 1 (matches the ScanIterator skip rule)

So instead of two indistinguishable "vortex.flat (8 rows)" siblings
under a vortex.stats node, the stats one renders as
"vortex.flat (8 rows, stats)" — explains the seemingly anomalous
high bits/elem (it's bits per stats row, not per data value).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
GenericArray.getDecimal previously picked the on-disk integer width
from the dtype's precision (1 / 2 / 4 / 8 / 16 bytes for precision
≤ 2 / 4 / 9 / 18 / 38). vortex.decimal is free to pick a narrower
valuesType when the actual values fit, so a decimal(15,2) column
whose values fit in I32 is stored at 4 bytes per element — but the
precision table said 8, and the decoder happily read garbage from
the half-element offset.

The fix derives the width from the single buffer's byteSize divided
by length, then validates the result is 1 / 2 / 4 / 8 / 16. An
unaligned buffer size now throws VortexException rather than silently
truncating. Adds an explicit bounds check on i so callers that don't
respect length() fail fast with IndexOutOfBoundsException rather than
reading past the buffer.
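
The width derivation can be sketched over a plain byte array. This is an illustration only: `DecimalWidth` and its methods are hypothetical, a `byte[]` stands in for the array's buffer, and standard JDK exceptions replace VortexException.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/** Hypothetical sketch: element width comes from buffer size, not precision. */
final class DecimalWidth {
    static int elementWidth(long bufferBytes, long length) {
        if (length <= 0 || bufferBytes % length != 0) {
            throw new IllegalArgumentException(
                "buffer of " + bufferBytes + " bytes does not divide into " + length + " elements");
        }
        long width = bufferBytes / length;
        if (width != 1 && width != 2 && width != 4 && width != 8 && width != 16) {
            throw new IllegalArgumentException("unsupported element width: " + width);
        }
        return (int) width;
    }

    static BigDecimal decode(byte[] buffer, int length, int index, int scale) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + " out of range for length " + length);
        }
        int width = elementWidth(buffer.length, length);
        // Reverse the little-endian bytes: BigInteger expects big-endian two's complement.
        byte[] bigEndian = new byte[width];
        int base = index * width;
        for (int k = 0; k < width; k++) {
            bigEndian[width - 1 - k] = buffer[base + k];
        }
        return new BigDecimal(new BigInteger(bigEndian), scale);
    }
}
```

With two 4-byte elements in an 8-byte buffer, the derived width is 4 regardless of what a precision table would suggest for decimal(15,2).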

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
localDateFromStorage took any Array with no dtype check, so passing a
plain I32 column would silently render as a date. The doc said the
caller had "already established context" but the API didn't enforce
it.

Replaces it with a localDate(DType.Extension, Array, long) overload:
the caller must supply the declared extension dtype, which Extensions
then verifies against the vortex.date id before decoding. The
inspector's call site already had the column's declared dtype in
scope, so threading it through is one line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Loop.indexStatsChildren reads the first byte of vortex.chunked layout
metadata to decide whether child[0] is the per-chunk stats payload.
That ByteBuffer wraps a confined-Arena segment owned by the IoWorker
thread (the only thread that ever calls VortexReader.open under the
TUI), so doing the read on the main render thread tripped
WrongThreadException on local files:

    error: WrongThreadException: Attempted access outside owning thread

The fix dispatches indexStatsChildren via worker.runAndAwait so the
metadata read happens on the owning thread. The set is populated
before runAndAwait returns and only read afterwards, so the
synchronized signal in runAndAwait gives us the happens-before we
need for the subsequent unsynchronized HashSet reads from the render
loop.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Temporary measure to localise the WrongThreadException that still
trips on local-file TUI startup after the indexStatsChildren fix.
Will revert once the root cause is found.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Layout is a record whose components include a ByteBuffer metadata
field. Record-generated equals/hashCode delegate to ByteBuffer's
which reads the underlying bytes — when those bytes wrap a confined
Arena owned by another thread (the IoWorker, in the TUI), the read
throws WrongThreadException.

The TUI dropped into that path on every local-file open: HashSet.add
on the root node → Node.hashCode → Layout.hashCode →
ByteBuffer.hashCode → arena byte read on the render thread.

Nodes are constructed exactly once per shallow build and used as
container keys by reference everywhere in the inspector, so identity
semantics are the correct contract anyway. Overriding equals /
hashCode on the record sidesteps Layout's metadata entirely and
fixes every container — expanded set, peek/hex/dict caches, columnOf
map, statsChildren set — in one shot.

Also reverts the unconditional stack trace from the previous debug
commit; VORTEX_DEBUG=1 gates it again.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…containers

Reverts the equals/hashCode override on InspectorTree.Node — a
record's value semantics are part of its contract, and overriding
them is surprising to readers and tools.

Moves the identity semantics to the call sites instead: every
Node-keyed container in Loop now backs onto IdentityHashMap, wrapped
in synchronizedMap / synchronizedSet for the caches that the IoWorker
writes to. Plain IdentityHashMap is fine for the
constructor-populated containers (columnOf, statsChildren, expanded)
since they're only accessed on the render thread once IoWorker init
returns. dataCache keeps ConcurrentHashMap (String keys, no Node
hashing).

Functionally equivalent to the previous fix: the WrongThreadException
that surfaced through Layout's record-generated hashCode never fires,
because IdentityHashMap, and any Set backed by one, compares keys with
== and hashes them via System.identityHashCode, so Node.hashCode is
never called.
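
The difference is easy to demonstrate with a key type whose hashCode and equals must never be called. `Touchy` below is a hypothetical stand-in for a Node whose record-generated hashCode would read thread-confined memory.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical sketch: identity-based containers never call hashCode or equals. */
final class IdentityKeys {
    /** Stand-in for a key whose value-based hashCode is unsafe to call. */
    static final class Touchy {
        @Override public int hashCode() { throw new IllegalStateException("hashCode called"); }
        @Override public boolean equals(Object o) { throw new IllegalStateException("equals called"); }
    }

    /** A Set that compares elements by reference. */
    static <T> Set<T> identitySet() {
        return Collections.newSetFromMap(new IdentityHashMap<T, Boolean>());
    }

    /** Same, wrapped for containers written from more than one thread. */
    static <T> Set<T> synchronizedIdentitySet() {
        return Collections.synchronizedSet(IdentityKeys.<T>identitySet());
    }
}
```

Adding a `Touchy` to a plain `HashSet` throws immediately, while the identity set accepts it and treats two distinct instances as different keys.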

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The "Raw" prefix was redundant — every terminal abstraction in the
inspect.term package is the raw / low-level one (there's nothing
"non-raw" to disambiguate against). Plain Terminal reads more
naturally at every call site.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PosixTerminal and WindowsTerminal call FFM downcalls
(tcgetattr / cfmakeraw / ioctl / SetConsoleMode), which JEP 472
flags as "restricted methods" in JDK 25. Without an explicit opt-in,
the JVM prints a four-line "WARNING: restricted method has been
called" block on each first invocation and threatens to block such
calls entirely in a future release.

The standard fix for an uber-jar is the Enable-Native-Access manifest
attribute (also from JEP 472): the entry-point module gets native
access on launch without the user passing the corresponding command
flag. Only the cli jar gets the entry; vortex-core / vortex-reader
consumers still have to enable native access in their own deployments
if they touch FFM directly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When vortex.decimal_byte_parts produces a GenericArray whose single
child is a MaskedArray (nullable decimal columns), mantissaFromChild
used to unwrap the MaskedArray and read straight from a.inner() at
index i — silently returning whatever integer happened to occupy the
slot for null cells.

Now consults a.isValid(i) first and throws a VortexException with a
"null cell at index N" message if the bit is clear. The TUI's
tryDecimal recognises that message and renders the cell as "null"
instead of falling back to the generic "<GenericArray ...>"
placeholder.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
GenericArray.getDecimal silently supported 16-byte mantissas for
decimal(>18, _) columns but had no test exercising it; the
precision-table tests stopped at decimal(15,2) / I64. The new test
round-trips ±2^70 through a single-buffer decimal(38,4) so the
ValueLayout.JAVA_BYTE loop in readSignedLe gets actually walked end
to end and the little-endian -> big-endian flip is verified at i128
width.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
dfa1 and others added 2 commits June 9, 2026 07:17
Both localDate overloads now reject indices outside [0, length()) up
front instead of leaking through to storage.getInt and silently
reading whatever the typed-array accessor produces past the end (or
worse: garbage from a half-element offset when the storage layer
doesn't bounds-check).

The class-level doc previously listed vortex.timestamp / vortex.time
as covered types, but only vortex.date was actually implemented.
Trims the doc to match reality and notes timestamp / time as TODO
gated on a public ScalarUnit type.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…on-free

readSignedLe used to allocate a byte[width] per call and copy bytes
one at a time before handing it to BigInteger. Fine for the 32-row
TUI preview, painful for any caller decoding a full decimal column.

Widths 1 / 2 / 4 / 8 now use the corresponding native ValueLayout
(JAVA_BYTE / SHORT_LE / INT_LE / LONG_LE) and feed
BigInteger.valueOf, which boxes via the small-integer cache when the
value fits. Width 16 keeps the heap-byte-array path under a separate
helper — there is no 128-bit ValueLayout and the i128 case only
fires for decimal(>18, _) columns, which are rare.

LE layouts are constructed explicitly via withOrder(LITTLE_ENDIAN)
rather than relying on native byte order, so the code stays correct
on big-endian hosts.
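
The same split between a typed fast path and a byte-array slow path can be sketched with `ByteBuffer`, used here as a stand-in so the example runs on older JDKs; the commit itself uses FFM `ValueLayout` constants on a memory segment. `SignedLeReader` is a hypothetical name.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch: read a signed little-endian integer of a given width. */
final class SignedLeReader {
    static BigInteger readSignedLe(ByteBuffer buffer, int offset, int width) {
        // Set the order explicitly instead of relying on the host's native order.
        ByteBuffer le = buffer.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        switch (width) {
            case 1:  return BigInteger.valueOf(le.get(offset));
            case 2:  return BigInteger.valueOf(le.getShort(offset));
            case 4:  return BigInteger.valueOf(le.getInt(offset));
            case 8:  return BigInteger.valueOf(le.getLong(offset));
            case 16: return readWide(le, offset);
            default: throw new IllegalArgumentException("unsupported width: " + width);
        }
    }

    /** 128-bit path: no primitive type fits, so copy and reverse the bytes. */
    private static BigInteger readWide(ByteBuffer le, int offset) {
        byte[] bigEndian = new byte[16];
        for (int k = 0; k < 16; k++) {
            bigEndian[15 - k] = le.get(offset + k);
        }
        return new BigInteger(bigEndian);
    }
}
```

Only the 16-byte case allocates a temporary array; the narrower widths go straight from a primitive read to `BigInteger.valueOf`.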

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@dfa1 dfa1 force-pushed the feat/inspector-module branch from eb8d1cd to d80fbc4 Compare June 9, 2026 05:17
dfa1 and others added 5 commits June 9, 2026 07:21
Adds an "Extension types" section to docs/compatibility.md mirroring
the encodings table style. Covers the four extensions the Rust
reference defines under vortex-array/src/extension: date, time,
timestamp, uuid. Notes the canonical id, storage shape, metadata
layout, the matching Java decoder (Extensions.localDate is the only
one wired up), and whether it's supported.

Includes the TimeUnit metadata-byte enum table referenced from
extension/datetime/unit.rs so the precision-byte values aren't a
magic constant for future implementers.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wires up the second Vortex extension type (after vortex.date). Storage
shape per the Arrow canonical UUID extension is
FixedSizeList(Primitive(U8), 16); each row is 16 contiguous bytes
interpreted as a big-endian UUID. Extensions.uuid(Array, long) reads
both halves with an explicit & 0xffL mask so ByteArray.getByte's
sign-extension doesn't poison the upper bytes of msb / lsb.

  java.util.UUID id = Extensions.uuid(array, i);

The matching uuid(DType.Extension, Array, long) overload guards
against a non-uuid extension being silently reinterpreted, mirroring
the localDate pattern.
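
The masking detail is worth seeing in isolation. The sketch below assembles a UUID from 16 big-endian bytes in a plain `byte[]`; `UuidBytes` is a hypothetical name and the array stands in for the FixedSizeList storage.

```java
import java.util.UUID;

/** Hypothetical sketch: 16 big-endian bytes to java.util.UUID. */
final class UuidBytes {
    static UUID fromBigEndian(byte[] bytes, int offset) {
        if (offset < 0 || bytes.length - offset < 16) {
            throw new IndexOutOfBoundsException("need 16 bytes at offset " + offset);
        }
        long msb = 0;
        long lsb = 0;
        for (int k = 0; k < 8; k++) {
            // Without the mask, a byte >= 0x80 sign-extends and sets all higher bits.
            msb = (msb << 8) | (bytes[offset + k] & 0xffL);
        }
        for (int k = 8; k < 16; k++) {
            lsb = (lsb << 8) | (bytes[offset + k] & 0xffL);
        }
        return new UUID(msb, lsb);
    }
}
```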

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the third Vortex extension type. Storage is a signed integer in
the TimeUnit recorded in ext.metadata() byte 0:

  - tag 0 / Nanoseconds  -> I64 nanos-of-day
  - tag 1 / Microseconds -> I64 micros-of-day
  - tag 2 / Milliseconds -> I32 millis-of-day
  - tag 3 / Seconds      -> I32 seconds-of-day

Days (tag 4) is rejected: vortex.time needs a unit finer than a day.
Conversion scales raw to nanos via 1e9 / TimeUnit.divisor() so the
existing TimeUnit enum carries all the precision math.

  LocalTime t = Extensions.localTime(ext, storage, i);

Tests cover all four supported units, the Days-tag rejection, the
missing-metadata path, and the wrong-extension-id guard.
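
The unit scaling can be sketched as a lookup from tag to nanoseconds per unit. `TimeOfDay` is a hypothetical helper that takes the tag and raw value directly; it does not reproduce the PR's TimeUnit enum or its metadata parsing.

```java
import java.time.LocalTime;

/** Hypothetical sketch: raw time-of-day count in a tagged unit to LocalTime. */
final class TimeOfDay {
    /** Tags follow the table above: 0 = ns, 1 = us, 2 = ms, 3 = s. */
    static long nanosPerUnit(int tag) {
        switch (tag) {
            case 0: return 1L;
            case 1: return 1_000L;
            case 2: return 1_000_000L;
            case 3: return 1_000_000_000L;
            default:
                // Includes tag 4 (Days), which cannot express a time of day.
                throw new IllegalArgumentException("unsupported time unit tag: " + tag);
        }
    }

    static LocalTime decode(int tag, long raw) {
        // multiplyExact turns silent overflow into an ArithmeticException.
        return LocalTime.ofNanoOfDay(Math.multiplyExact(raw, nanosPerUnit(tag)));
    }
}
```

For instance, `TimeOfDay.decode(3, 3661)` is 01:01:01 and `TimeOfDay.decode(2, 1500)` is one and a half seconds past midnight.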

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…eTime

Final Vortex extension type. Two complementary helpers:

  Instant t = Extensions.instant(ext, storage, i);            // tz-less
  ZonedDateTime z = Extensions.zonedDateTime(ext, storage, i); // tz-aware

Storage is an I64 count of the metadata-recorded TimeUnit since the
Unix epoch (Days rejected, same as vortex.time). Wire format for the
extension metadata, kept binary-compatible with the Rust reference:

  byte[0]   = TimeUnit tag
  bytes[1..3] = tz_len (u16 LE)
  bytes[3..3+tz_len] = tz UTF-8

instantFromRaw uses Math.floorDiv / floorMod for the μs and ns paths
so negative timestamps (pre-1970) split cleanly across the seconds
boundary instead of rounding fractional nanos towards zero.
zonedDateTime defaults to UTC when tz_len == 0; Extensions.timezone
is exposed so callers can ask the column's recorded zone without
materialising an Instant. Truncated metadata (declared tz_len longer
than the buffer can carry) throws rather than silently decoding a
shorter zone string.
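
Both points, the floor-based split and the length-checked metadata read, can be sketched briefly. `TimestampSketch` is hypothetical, it covers only the microsecond path, and it assumes the three-byte header is always present as in the layout above.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;

/** Hypothetical sketch of the microsecond split and the tz metadata read. */
final class TimestampSketch {
    static Instant fromMicros(long micros) {
        // floorDiv/floorMod keep the nano adjustment in [0, 1e9) for negative input.
        long seconds = Math.floorDiv(micros, 1_000_000L);
        long nanos = Math.floorMod(micros, 1_000_000L) * 1_000L;
        return Instant.ofEpochSecond(seconds, nanos);
    }

    /** Returns the recorded zone id, or an empty string when tz_len is 0. */
    static String timezone(byte[] metadata) {
        if (metadata.length < 3) {
            throw new IllegalArgumentException("metadata too short: " + metadata.length + " bytes");
        }
        // bytes[1..3] hold tz_len as an unsigned 16-bit little-endian value.
        int tzLen = (metadata[1] & 0xff) | ((metadata[2] & 0xff) << 8);
        if (3 + tzLen > metadata.length) {
            throw new IllegalArgumentException("declared tz_len " + tzLen + " exceeds metadata size");
        }
        return new String(metadata, 3, tzLen, StandardCharsets.UTF_8);
    }
}
```

With truncating division, -1 microsecond would split into 0 seconds and a negative remainder; the floor variants give -1 second plus 999,999,000 nanoseconds.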

docs/compatibility.md now records all four Vortex extensions as
implemented in vortex-java.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…y class

Mirrors the Encoding / EncodingId pattern. The new sealed interface
io.github.dfa1.vortex.core.Extension fuses the closed-world id
classification with the typed decode behaviour:

  public sealed interface Extension permits Date, Time, Timestamp, Uuid, Custom {
      String id();
      ...
      static Extension of(String id);
  }

Each spec-defined variant (Date, Time, Timestamp, Uuid) is a final
class with its own statically-typed decode methods — no Object
return type, no caller-side downcasts:

  LocalDate    d  = Extension.DATE.decode(storage, i);
  LocalTime    t  = Extension.TIME.decode(ext, storage, i);
  Instant      ts = Extension.TIMESTAMP.instant(ext, storage, i);
  java.util.UUID u = Extension.UUID.decode(storage, i);
  Optional<ZoneId> z = Extension.TIMESTAMP.timezone(ext);

Custom(String id) carries any non-spec id verbatim so unknown
extensions round-trip without loss. DType.Extension.kind() returns
the matching record so callers pattern-match exhaustively:

  switch (ext.kind()) {
      case Extension.Date d      -> d.decode(storage, i);
      case Extension.Time t      -> t.decode(ext, storage, i);
      case Extension.Timestamp ts -> ts.instant(ext, storage, i);
      case Extension.Uuid u      -> u.decode(storage, i);
      case Extension.Custom c    -> renderPlaceholder(c.id());
  }

Drops core/array/Extensions.java entirely. Its String constants
(DATE / TIME / TIMESTAMP / UUID_ID) move onto the records as ID
constants; its static helpers move onto the records as instance
methods; its shared utilities (epochInteger, readUnit, instantFromRaw,
checkBounds) live as private static helpers inside the sealed
interface.

VortexInspectorTui's date format switch now binds the
Extension.Date record and calls date.decode(array, i) directly,
replacing the previous Extensions.localDate(ext, array, i) call.

docs/compatibility.md updated with the new dispatch example and a
table row for Extension.Custom.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@dfa1 dfa1 merged commit 175ad07 into main Jun 9, 2026
3 checks passed
@dfa1 dfa1 deleted the feat/inspector-module branch June 12, 2026 19:46