All notable changes to vortex-java are documented here.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
0.8.3 — 2026-06-23
A Sonar-driven refactoring release: no new file-format capability, but a focused pass using SonarCloud findings to drive cleanups — dead code removed, duplication factored out, and one hot-loop micro-optimisation. Each finding was triaged (lead, not verdict) so the changes preserve behaviour and the JIT vectorisation of the hot decode loops. The interpretation framework behind this is now documented in docs/testing.md.
- FastLanes.transposeIndex / iterateIndex: replaced the per-element %, / and ORDER[] indirection with permutation tables built once in a static initialiser. Faster address generation keeps more outstanding scatter misses in flight; measured 1.4×–3.4× on the transpose/undelta kernels (Apple M5, L1→DRAM working sets). The per-element decode loops stay specialised per width to preserve C2 superword vectorisation. (089b6e36, e683a634)
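The table-building idea in the FastLanes entry above can be sketched as follows. This is an illustration only: the class name, the 1024-element block size, the ORDER values, and the index formula are assumptions chosen for the example, not code taken from vortex-java.

```java
/** Hypothetical sketch: per-element index arithmetic replaced by a table built once. */
final class TransposeTableSketch {
    static final int BLOCK = 1024;

    // Assumed lane order, used only to make the example concrete.
    private static final int[] ORDER = {0, 4, 2, 6, 1, 5, 3, 7};

    // Permutation table filled exactly once, when the class is initialised.
    private static final int[] TRANSPOSE = new int[BLOCK];

    static {
        for (int i = 0; i < BLOCK; i++) {
            TRANSPOSE[i] = computeIndex(i);
        }
    }

    /** Arithmetic variant: modulo, divide, and an ORDER[] lookup per element. */
    static int computeIndex(int i) {
        int lane = i % 16;
        int order = (i / 16) % 8;
        int row = i / 128;
        return lane * 64 + ORDER[order] * 8 + row;
    }

    /** Table-driven variant: a single array load per element. */
    static int transposeIndex(int i) {
        return TRANSPOSE[i];
    }

    /** Scatters one block of src into dst through the precomputed table. */
    static void transpose(long[] src, long[] dst) {
        for (int i = 0; i < BLOCK; i++) {
            dst[TRANSPOSE[i]] = src[i];
        }
    }
}
```

The scatter loop then carries no division or modulo, which is the property the entry credits for the faster address generation.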
- Breaking (read SPI): removed EncodingDecoder.accepts(DType). It was a residual of the ADR-0001 read/write split: encode-selection semantics copied onto the decoder side, where the reader dispatches purely by EncodingId and never called it (dead since the split). EncodingEncoder.accepts is unchanged. Downstream custom EncodingDecoder implementations should delete their accepts override. (7516a544)
- Internal dedup driven by Sonar duplication findings: extracted the shared FastLanes layout + PType.bits and PrimitiveArrays.toLongs / fromLongs into core, hoisted the Materialized* array boilerplate into a shared base, factored the four BitpackedEncodingDecoder unpack loops onto one precomputed per-row schedule, added PType.isUnsigned (dropping three private copies), and deduplicated the CLI inspect plumbing and formatBytes. (ec6b9631, a74263c0, 7af0af2a, 8362a353, 87c77cc9, d8f84088, b557e573, d52e8c0c)
- Dropped dead PType switch arms in the writer's readPrimitiveElement, primitiveArrayLen, and buildTypedUniqueArray: unreachable branches flagged as uncovered. (4c6ab149, 94d2fa49, f89072a6)
- Cleared two SonarCloud-reported bugs in the writer's SUM zone-map stat plumbing. (33798ab9)
- Suppressed java:S1172 on AbstractMaterializedArray.materialize with a reason: the arena parameter is contractual (implements Array#materialize(SegmentAllocator) for the leaf classes), not a removable unused parameter. (9b226f73)
- Filled coverage gaps surfaced by Sonar: the Materialized* materialize defaults, every SchemaCommand.formatDType arm, and the writer's global-dict cardinality fallback with U16 utf8 codes. (8741dad3, 77fad504, c2918eaa)
docs/testing.md: new section on reading Sonar/PIT as data — the uncovered-line triage (missing-test / dead-code / defensive-by-contract), why mutation testing splits what coverage cannot, and when duplication is the deliberate price of the hot-loop rule. (8999661b)
0.8.2 — 2026-06-22
The headline is writer-side zone-map statistics: the writer now emits vortex.stats (zoned) layouts carrying per-chunk MIN/MAX, NULL_COUNT, and SUM — matching the Rust reference — so zone-map chunk pruning and aggregate push-down work on Java-written files (previously the reader could decode these stats but the writer never produced them). The release also continues the test-hardening track: the lowest-covered encoder/decoder paths are filled in, SonarCloud new-code coverage is back to 100% with the quality gate green (overall ~83%, all ratings A, zero bugs/vulnerabilities), and the build toolchain is refreshed across eight dependency bumps.
- Writer: vortex.stats (zoned) layout emission, toggled by WriteOptions.enableZoneMaps. Each column is wrapped with a per-zone (one zone per chunk) statistics table; the stat set follows the Rust reference exactly. (838dba82, f2d74351)
- Writer: per-zone MIN/MAX for primitive columns including F16, extension columns (over their storage primitive), Utf8 columns (full string bounds), and dictionary-encoded columns (computed on the logical values, independent of the dict encoding). (838dba82, fb5d096a, 38ab5c51, c1198253, e51da936)
- Writer: per-zone NULL_COUNT for every column type. (135c9b37, c52d4b83, ab233b86)
- Writer: per-zone SUM for numeric primitive columns (signed → i64, unsigned → u64, float → f64; integer overflow records a null sum). Matches Rust, which sums numeric primitives and decimals but not Utf8/extension columns. (9661f554)
- Reader: RowFilter.isNull / RowFilter.isNotNull predicates with zone-map chunk pruning via the per-chunk null_count: IS NULL skips chunks with zero nulls, IS NOT NULL skips all-null chunks. (2749b6ca)
- Reader: columnStats() aggregates null_count across a column's chunks (reported only when every chunk carries one). (cb844f23)
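A minimal sketch of the per-zone NULL_COUNT and SUM rules described above, including the "overflow records a null sum" behaviour and the two pruning checks. The record, the method names, and the choice to report a sum of 0 for an all-null zone are assumptions made for the example; they are not the writer's or reader's actual API.

```java
/** Hypothetical sketch of per-zone statistics for a signed 64-bit column. */
final class ZoneStatsSketch {
    /** A null sum means "unknown", here because the i64 accumulator overflowed. */
    record Zone(long rowCount, long nullCount, Long sum) {}

    /** Single pass over one chunk; null entries count as nulls and are not summed. */
    static Zone summarize(Long[] chunk) {
        long nulls = 0;
        long sum = 0;
        boolean overflowed = false;
        for (Long v : chunk) {
            if (v == null) {
                nulls++;
                continue;
            }
            if (!overflowed) {
                try {
                    sum = Math.addExact(sum, v); // throws instead of wrapping around
                } catch (ArithmeticException e) {
                    overflowed = true; // keep counting nulls, stop summing
                }
            }
        }
        return new Zone(chunk.length, nulls, overflowed ? null : sum);
    }

    /** IS NULL can skip a zone that contains no nulls. */
    static boolean canSkipForIsNull(Zone zone) {
        return zone.nullCount() == 0;
    }

    /** IS NOT NULL can skip a zone in which every row is null. */
    static boolean canSkipForIsNotNull(Zone zone) {
        return zone.nullCount() == zone.rowCount();
    }
}
```

Recording "unknown" rather than a wrapped value keeps an aggregate push-down from silently returning a wrong total.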
- Reader: the shared default HttpClient behind VortexHttpReader.open(URI, ReadRegistry) is now a package-private non-final field used purely as a unit-test seam, so the default-client overload is driven to a normal return by a mocked client instead of a live network call. Production never reassigns it. (12e46270)
- Coverage for the ten lowest-coverage encode/decode classes: ZigZagEncodingDecoder/Encoder, SequenceEncodingEncoder, VariantEncodingDecoder.dtypeFromProto (every proto→core DType arm), TimeExtensionEncoder, VarBinViewEncodingDecoder, VarBinEncodingDecoder, AlpEncodingDecoder, DateTimePartsEncodingDecoder, and DeltaEncodingDecoder, exercising guards, broadcast/constant paths, and ptype arms. (a3012d4a, c9386eda, 6c9682b8, bbb9d669, 7742ecd3)
- Writer: property-based and mutation-driven round-trips for the Delta and AlpRd encoders. (d3d245a6)
- Reader: HTTP fixtures bumped to v0.75.0 with a smoke test across all encodings; the open(URI, ReadRegistry) overload is now covered via the default-client seam. (8a1b5db2, 12e46270)
- Reader: decoder tests allocate via Arena.ofAuto() instead of the never-freed Arena.global(). (59ec2e2a)
- Dependency refresh: jacoco-maven-plugin 0.8.13→0.8.15, pitest-maven 1.20.0→1.25.5, checkstyle 13.5.0→13.6.0, byte-buddy-agent 1.17.7→1.18.10, central-publishing-maven-plugin 0.10.0→0.11.0, maven-jar-plugin 3.4.1→3.5.0, maven-dependency-plugin 3.7.0→3.11.0, and actions/checkout 6→7. (dab876b7, 7b7c3580, 46659a73, 46a30be1, c6723832, 3e5fa349, c943f81b, af009116)
0.8.1 — 2026-06-20
A hardening release: no new file-format capability, but a large step up in verification rigour. Mutation testing (PIT) now guards the security-critical bounds/parse paths in core, reader, and writer at 99–100% kill rate; the build fails on any javac warning (-Xlint:all -Werror); and property-based round-trips exercise every lossless encoding plus the full cascade-selection pipeline against seeded-random inputs. The one functional addition is boxed-nullable array input on the map writeChunk path.
- Writer: the map-based writeChunk path accepts boxed nullable arrays (Integer[], Long[], Double[], …) alongside primitive arrays, so columns with nulls can be written without manual validity bookkeeping. (4d18939a)
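To show what "manual validity bookkeeping" means here, the following sketch splits a boxed array into a primitive value buffer plus a validity mask. The record and method names are hypothetical, and the sketch does not claim to mirror how the writer stores validity internally.

```java
/** Hypothetical sketch: derive values + validity from a boxed nullable array. */
final class BoxedColumnSketch {
    /** values[i] is meaningful only where valid[i] is true. */
    record NullableInts(int[] values, boolean[] valid) {}

    static NullableInts unbox(Integer[] boxed) {
        int[] values = new int[boxed.length];
        boolean[] valid = new boolean[boxed.length];
        for (int i = 0; i < boxed.length; i++) {
            Integer v = boxed[i];
            if (v != null) {
                values[i] = v;   // auto-unboxing is safe after the null check
                valid[i] = true;
            }
            // null slots keep the default 0 / false
        }
        return new NullableInts(values, valid);
    }
}
```

Accepting Integer[] directly moves this loop out of caller code.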
- Breaking: ExtensionEncoder.encodeAll is now abstract. The default body threw VortexException; every implementation already overrides it, so the contract now fails at compile time rather than at runtime. (2dcd69ce)
- Breaking: Estimate is now an enum { SKIP, ALWAYS_USE, COMPLETE }. The sealed interface with empty Skip / AlwaysUse records, the skip() / alwaysUse() factories, and the null "no verdict" sentinel are gone; COMPLETE is the explicit defer-to-sample-encode verdict. (c355a4bf)
- Reader cleanups: dropped a dead length < 0 blob check and a redundant offset > fileSize bounds clause, reused the shared PTypeIO little-endian layouts, and removed redundant numeric casts flagged by static analysis. (5d5fcc45, 36328285, 04cab707)
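The three-valued Estimate enum described above might be consumed roughly like this. The enum constants come from the entry; the method names and the exact rules (borrowed from the 0.7.0 notes on constant and dictionary selection) are assumptions for illustration, not the project's implementation.

```java
/** Hypothetical sketch of encoder verdicts expressed as an enum. */
final class EstimateSketch {
    enum Estimate { SKIP, ALWAYS_USE, COMPLETE }

    /** Assumed rule: a constant encoder only applies when one distinct value exists. */
    static Estimate constantVerdict(long distinct) {
        return distinct == 1 ? Estimate.ALWAYS_USE : Estimate.SKIP;
    }

    /** Assumed rule: skip dictionaries when more than half of the rows are distinct. */
    static Estimate dictVerdict(long distinct, long rows) {
        return distinct > rows / 2 ? Estimate.SKIP : Estimate.COMPLETE;
    }

    /** Only COMPLETE pays for a sample encode; the other verdicts short-circuit. */
    static boolean needsSampleEncode(Estimate verdict) {
        return switch (verdict) {
            case SKIP, ALWAYS_USE -> false;
            case COMPLETE -> true;
        };
    }
}
```

Because the switch is exhaustive over the enum, adding a constant later becomes a compile error at every call site instead of a null check that can be forgotten.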
- Writer: I8/I16 columns are excluded from the global dictionary — the reader cannot decode a narrow-int dict, so dict-encoding them produced unreadable files. (473256b1)
- Writer: WriteRegistry now iterates encoders in a deterministic order and accepts() reports honestly, fixing a non-deterministic encoder selection that broke the Windows build. (9c4ebb18)
- Reader: Pco decode now guards preDeltaN against int overflow before clamping: the subtraction is widened to long, restoring the overflow-safe path. (b7346e7c)
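The overflow pattern behind the Pco fix can be illustrated in isolation. The method and parameter names below are hypothetical and the sketch does not reproduce the decoder's actual arithmetic; it only shows why widening the subtraction to long before clamping matters.

```java
/** Hypothetical sketch: clamp a difference of two ints into [0, max]. */
final class OverflowClampSketch {
    /** Narrow variant: a - b can wrap around before the clamp ever sees it. */
    static int clampedDiffNarrow(int a, int b, int max) {
        return Math.max(0, Math.min(a - b, max));
    }

    /** Wide variant: subtract in long, clamp, then narrow the in-range result. */
    static int clampedDiffWide(int a, int b, int max) {
        long diff = (long) a - b;
        return (int) Math.max(0L, Math.min(diff, (long) max));
    }
}
```

With a = Integer.MIN_VALUE and b = 1 the narrow variant wraps to a large positive number and clamps to max, while the wide variant sees a negative difference and clamps to 0.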
- Zero-warning rule: -Xlint:all -Werror across all modules. The classfile lint (which only flags missing annotation class files inside third-party Arrow bytecode) is scoped off in the two Arrow-using modules only. (dab467e5, 43f6f840)
- Mutation testing (PIT): opt-in pitest profiles in core, reader, and writer, scoped to the bounds/parse classes (IoBounds, PTypeIO, WriteRegistry, ChunkImpl, …), with common config hoisted into the parent POM. (46904b24, ed8c98a1, 1200c76b, 840cc46a)
- SonarCloud: generated fbs/ and proto/ sources excluded from analysis (machine output, not hand-maintained); the deliberate per-width SIMD-loop duplication is documented in ADR 0005 rather than refactored away. Code smells dropped 857→394; coverage ~81%, all ratings A, zero bugs/vulnerabilities. (6c591293)
- Property-based lossless round-trips added for ALP (f32/f64), Delta/FoR/ZigZag/AlpRd, a bitpacked bit-width sweep, the full CascadingCompressor (every codec × cascade depth 0–3), and a Pco seeded-random distribution sweep. (dbe44aaa, a2cf3443, aede11d7, 115dd6fd, a426c1de)
- Mutation-driven test hardening lifted core/reader/writer bounds and registry classes to 99–100% kill rate. (2235499a, c9243f9a, 912fcaff)
- Integration: added Java↔Rust round-trips for vortex.patched, fastlanes.delta, and masked encodings. (13702764)
- CLI: terminal smoke tests now force class initialization so the FFM libc/kernel32 symbol resolution is actually exercised. (3f741ef7)
0.8.0 — 2026-06-20
Read and write Vortex Variant (semi-structured, JSON-shaped) columns from Java. Internally, transform encodings now decode lazily, trimming per-decode allocation. This release also hardens the reader's bounds handling on untrusted input (ADR 0003 Phase E), fixes CSV-import memory blow-ups on large files, and lifts test coverage to 80% with all Sonar ratings at A.
- Writer: vortex.variant encoder. Encodes a variant column as the canonical vortex.variant container over core_storage (an all-equal column becomes a single vortex.constant, a row-varying column a vortex.chunked of per-run constants) with an optional row-aligned typed shredded child recorded in VariantMetadata.shredded_dtype. Input is VariantData(List<Scalar>) with .constant(n, v) / .shredded(...) factories. Java↔Rust (JNI) round-trip verified for constant, row-varying, and shredded columns. Scalar values only; arbitrary nested objects need vortex.parquet.variant (deferred, ADR 0014). (35da529d, e4e44980, 4566dca0)
- Reader: variant columns now decode Java-side. ConstantEncodingDecoder and ChunkedEncodingDecoder handle DType.Variant (materialising the inner-typed array); VariantEncodingDecoder wraps the result as VariantArray, exposing coreStorage() and shredded(). (76e4c741, 4566dca0)
- Reader bounds hardening (ADR 0003 Phase E): untrusted offsets/lengths from file metadata now flow through a typed IoBounds helper that throws VortexException instead of a raw IndexOutOfBoundsException, and hand-rolled index guards were replaced with Objects.checkIndex. A crafted flat-segment file can no longer trip an unchecked array access during decode. (e9af80d6, 3bcd9881, a5ce8380)
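A bounds helper in the spirit of the entry above could look like the sketch below. It uses the JDK's Objects.checkFromIndexSize (long overload, Java 16+) and rethrows as a domain exception. FormatException is a local stand-in introduced for the example; the real IoBounds and VortexException signatures are not shown in this changelog and are not assumed here.

```java
import java.util.Objects;

/** Hypothetical sketch: validate untrusted offset/length pairs in one place. */
final class BoundsSketch {
    /** Stand-in for a project-specific exception type. */
    static final class FormatException extends RuntimeException {
        FormatException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Accepts only ranges with 0 <= offset and offset + length <= size, overflow-safe. */
    static void checkRange(long offset, long length, long size) {
        try {
            Objects.checkFromIndexSize(offset, length, size);
        } catch (IndexOutOfBoundsException e) {
            throw new FormatException(
                "segment [" + offset + ", +" + length + ") outside file of size " + size, e);
        }
    }
}
```

Routing every check through one helper means callers see a single exception type for malformed input, and the JDK method handles the offset + length overflow case.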
- CSV import: large files no longer OOM. The importer now streams rows in a single pass (buffering only the first chunk for schema inference) and disables the global-dictionary pass by default, which previously accumulated every distinct value in memory. (d5280ae2, 0b6784b5, 62863616)
- CLI: IoWorker.runAndAwait decremented its in-flight counter after signaling completion, so a caller reading pending() right after it returned could still see the task counted; the counter is now decremented before the await returns. The view / tui commands also close the opened VortexHandle on every error path (openOnWorker returns Optional). (95c06b1a, 27446d81)
- Reader: BoolArray.materialize now masks the accumulator byte before the bit-set OR, removing a sign-promotion footgun in the packed-bitmap write. (bc8e9d4e)
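The sign-promotion hazard mentioned in the BoolArray entry is a general Java pitfall and can be shown in a few lines. The method names are hypothetical; the sketch demonstrates the language behaviour, not the reader's actual bitmap code.

```java
/** Hypothetical sketch: OR a bit into a byte accumulator with and without masking. */
final class BitmapOrSketch {
    /** The byte is sign-extended to int first, so a set high bit smears into bits 8..31. */
    static int orUnmasked(byte acc, int bit) {
        return acc | (1 << bit);
    }

    /** Masking with 0xFF keeps only the low eight bits before the OR. */
    static int orMasked(byte acc, int bit) {
        return (acc & 0xFF) | (1 << bit);
    }
}
```

For an accumulator of 0x80, the unmasked form yields a negative int, while the masked form yields 0x81 as intended.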
- Decode shape: transform encodings now decode lazy-only. The eager Materialized*Array fallbacks were removed from vortex.zigzag (all PTypes + broadcast, cd59fefa), fastlanes.for (all integer PTypes, d7953e1f), vortex.alp (broadcast-without-patches, deab8067), vortex.constant (Decimal → LazyConstantDecimalArray, a6a9611e), vortex.runend (Bool → LazyRunEndBoolArray, 0bbcb81f), vortex.sparse (Bool → LazySparseBoolArray, db2e955b), and fastlanes.rle (validity → OffsetBoolArray, empty → LazyConstantXxxArray, 5e83a5c3). Decompression encodings (bitpacked, pco, zstd, fsst, delta, patched), the primitive base, the vortex.dict encoding-level path, and the vortex.alp patches path stay Materialized by design. See ADR 0015.
- Breaking: sealed Array permits changed. DecimalArray is now a non-sealed family interface (decimal arrays moved from implements Array to implements DecimalArray), so decimal joins the per-dtype family layer. Downstream exhaustive switch over Array must add a case DecimalArray. (a6a9611e)
- Breaking: Array API. Array.truncate(rows) renamed to Array.limited(rows) and made an abstract operation implemented by every array (composites slice their children); raw-segment access moved off the ArraySegments utility onto Array.materialize(SegmentAllocator) and Array.segmentIfPresent(). (87ab65e2, 4d9ac1f8, 332b067e, 32a35e03)
- CSV import reports progress every 10K rows instead of per-chunk. (07a056e7)
- Breaking: EmptyArray removed from the sealed Array permits. It was never emitted by the reader (empties are zero-length typed arrays in their own family) and broke the dtype→family invariant (EmptyArray(I64) was not a LongArray). Represent an empty column as a zero-length array of the appropriate family. (3a4dcdfa)
- ADR 0016: captures vortex-arrow bridge interop options (separate module / Arrow C-Data / none); deferred until a concrete downstream need. (a6126f29)
- Test coverage raised from ~74% to 80%: the lazy/chunked/dict/run-end/sparse array families, ChunkImpl, and several decoders (DecimalEncodingDecoder, DictEncodingDecoder, ParquetImporter) reached full line + branch coverage. SonarCloud quality gate green: reliability, security, and maintainability all at A, zero bugs and vulnerabilities.
0.7.3 — 2026-06-17
Parquet ZSTD support, vortex.patched encoder, constant-encoding selection fix, Windows TUI raw-mode fix.
- Parquet: ZSTD-compressed Parquet import now works. zstd-jni was an optional dep in hardwood and had to be declared explicitly. NYC Yellow Taxi 2024-01 (47.6 MB Parquet, 2.96 M rows × 19 cols) imports to 40.7 MB Vortex, 14% smaller than the Rust JNI reference (47 MB) thanks to the global-dict encoder catching low-cardinality F64 columns. (bea15f2d)
- Writer: vortex.patched encoder. Identifies outlier values that exceed the optimal bit width, zeros them in the inner array (exposed as an open cascade child for further bitpacking), and stores their within-chunk U16 indices and raw values separately. (d63ab7c3)
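The outlier-splitting step described for vortex.patched can be sketched as below. For simplicity the bit width is passed in rather than chosen by the encoder, values are assumed non-negative, and all names are hypothetical; this is not the project's encoder.

```java
import java.util.Arrays;

/** Hypothetical sketch: split values into a narrow inner array plus patches. */
final class PatchedSketch {
    /** indices are unsigned 16-bit positions (Java char), patches the raw outliers. */
    record Patched(long[] inner, char[] indices, long[] patches) {}

    static Patched encode(long[] values, int bitWidth) {
        if (values.length > 65536) {
            throw new IllegalArgumentException("U16 indices cover at most 65536 rows");
        }
        if (bitWidth < 0 || bitWidth > 62) {
            throw new IllegalArgumentException("bitWidth must be in [0, 62]");
        }
        long limit = 1L << bitWidth; // first value that no longer fits
        long[] inner = values.clone();
        char[] indices = new char[values.length];
        long[] patches = new long[values.length];
        int count = 0;
        for (int i = 0; i < values.length; i++) {
            if (values[i] >= limit) {
                indices[count] = (char) i;
                patches[count] = values[i];
                inner[i] = 0; // zeroed so the inner array fits in bitWidth bits
                count++;
            }
        }
        return new Patched(inner, Arrays.copyOf(indices, count), Arrays.copyOf(patches, count));
    }

    /** Reapplies the patches on top of the inner array. */
    static long[] decode(Patched p) {
        long[] out = p.inner().clone();
        for (int k = 0; k < p.indices().length; k++) {
            out[p.indices()[k]] = p.patches()[k];
        }
        return out;
    }
}
```

After the split, every entry of the inner array fits in the chosen width, which is what lets a later bitpacking stage compress it tightly.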
- CLI: Windows TUI raw-mode. readKey now calls ReadFile directly on the kernel handle obtained via GetStdHandle instead of reading from System.in. Java's System.in goes through JVM-internal CRT wrappers that ignore SetConsoleMode, so every keypress previously required Enter before the TUI reacted. (31b77acc)
- Writer: constant encoding skipped for single-distinct-value columns. isDictCandidate returned true for distinctCount == 1, routing all-same-value columns through the global-dict path instead of vortex.constant. (0e8b945e)
- CLI: polling loop in Terminal.readKey(Duration) extracted to KeyDecoder.nextWithTimeout(InputStream, Duration), eliminating duplication between PosixTerminal and WindowsTerminal. (35b05d16)
- Integration: TaxiParquetOracleVsJavaIntegrationTest. hardwood reads the taxi Parquet to a CSV (oracle); ParquetImporter → CsvExporter produces a second CSV (SUT); line-by-line diff must be zero. Proves the importer loses no data across 2.96 M rows × 19 columns. (1a1a676e)
0.7.2 — 2026-06-16
CLI usability + reader robustness on real-world files (NYC Yellow Taxi).
- CLI view <file>: scrollable Excel-like grid TUI. Streams rows on demand via a new LazyGridSource (one live chunk at a time, format only the visible window). Title bar shows chunk K/N. Default writes to alt-screen; quit with q / Esc. (1c0311fb, b7f6b6c1, 94e5bff8, 6a8ddd3a)
- CLI export writes to a derived <name>.csv next to the input by default, with a stderr progress bar mirroring the import flow. export <file.vortex> - keeps the old stdout streaming behaviour. (2b26da9a)
- Reader: ScanIterator.chunkRowCounts() returns per-chunk row counts by walking the layout tree, no value decode. Used by the view TUI to plan navigation. (b7f6b6c1)
- Reader: lazy vortex.decimal decode. New LazyDecimalArray record holds a zero-copy mmap slice and produces BigDecimal per getDecimal(i). Replaces the GenericArray wrapper, no buffers / children indirection. (6bc955d2)
- Reader: 7 Offset*Array records (Long / Int / Short / Byte / Double / Float / Bool) + VarBinArray.SlicedMode for offset-based slicing of pre-decoded shared arrays. (5df3d9a9)
- Reader: per-column chunking alignment. Files where one column has 1 mega-flat and another has N small flats (e.g. NYC Yellow Taxi 2024-01 has a 2.96M-row VendorID flat next to 23 × 131072-row datetime flats) now decode the wide column once into a shared Arena and slice it per chunk via Offset*Array. Previously the scan iterator emitted a single chunk whose datetime columns were the first 131072 rows only, silently dropping 95.6 % of the file. (5df3d9a9)
- Reader: FrameOfReferenceEncodingDecoder now takes the arena variant of ArraySegments.of, so lazy children (e.g. LazyRunEndLongArray) materialise instead of throwing "no primary segment". (5df3d9a9)
- Compatibility table: constant, varbinview, alprd, datetimeparts, decimal_byte_parts, decimal rows now reflect their shipped Lazy shape; container encodings (list / listview / fixed_size_list) marked Lazy (inherit child shape); patched pinned Materialized with reasoning. (6e87b74e, 6bc955d2, 2ed32ec8, d8363920)
0.7.1 — 2026-06-16
Cleanup release on top of 0.7.0 — one more lazy encoding, a Windows TUI usability fix, and a fresh round of read benchmarks.
- vortex.constant lazy decode: seven metadata-only LazyConstantXxxArray records (Long / Int / Double / Float / Short / Byte / Bool) replace the one-element broadcast buffer; the per-element broadcast-modulo path is gone (3edf6e8c)
- Top-N read benchmarks (N=10, 100) + README table, refreshed 80M-row numbers (c00fdf7f, 33714d7b, a6fd92fc)
- CLI: schema prints per-row column listing (9b3fe4b5)
- CLI: Terminal.readKey takes Duration instead of long ms (2942a4da)
- Reader: extract TimeDtype + TimestampDtype shared metadata helpers (8f1b9feb)
- CLI: actionable error on Git Bash / MinTTY. GetConsoleMode failure now points users at winpty / Windows Terminal / PowerShell instead of dead-ending on the raw error (6ec42288)
- Reader: ArraySegments.of(arr) typed-accessor fallback for lazy arrays (74ec207b)
- Drop sonar.cpd.exclusions (cde845bf)
0.7.0 — 2026-06-16
pco encoder (Classic + Consecutive delta + IntMult mode, 4-way tANS, multi-chunk, all 8 ptypes),
writer compression (~93% Rust JNI parity on NYC Yellow Taxi: 47.0 MB → 43.4 MB; stratified sampling, stats-driven cascade, sparse-cascade idx/val children, patched bitpacking),
lazy / zero-copy decode (ADR 0010 + ADR 0012: ALP / FoR / ZigZag / Chunked / Dict / RunEnd / RLE / Sparse / ALP-RD / VarBinView / DateTimeParts / DecimalByteParts now defer transform / materialisation until access),
write API ergonomics (DType static factories, structBuilder, typed writeChunk(Consumer<Chunk>) — ADR 0009),
Sonar pass (Codecov → SonarCloud, Javadoc HTML → Markdown, full S6218 / S7474 / S2184 / S3776 sweep).
- vortex.pco encoder (PcoEncodingEncoder): Classic mode + Consecutive delta + IntMult mode (mode=1); 4-way interleaved tANS; histogram + bin-optimization DP; multi-chunk (64K-element chunks); all 8 supported ptypes (I16/U16/I32/U32/F32/I64/U64/F64) (1bb14ab, 086aa52, 30579ed, 7219974, f856559)
- LeBitWriter: LSB-first bit writer, symmetric to LeBitReader; reusable for future bit-oriented encoders (1bb14ab)
- ADR 0009, write API ergonomics: DType static factories + asNullable() (0e9d6703), DType.structBuilder() (63d66eef), typed writeChunk(Consumer<Chunk>) builder (ddb3e21a); design doc (d9c4b99, a57ea70); MemorySegment zero-copy overload split to ADR 0011 (6367eb37)
- ADR 0010, lazy decode for 1:1 transform encodings: LazyAlpFloatArray, lazy FoR / ZigZag arrays defer the transform until first element access (cff3acb5, c47c055c, c3ca6951, 68186f8f)
- ADR 0012, zero-copy decode for compound encodings. ChunkedXxxArray wraps instead of concatenating (dfe7aa34, c557b8fb, e2db153d); DictXxxArray lazy reads (9b97a1a5); lazy RunEnd (210449b5), RLE (f35f9a96), Sparse (b604f21c), ALP-RD (937ade36); VarBinArray.ChunkedMode (b3696f5a) + ViewMode for VarBinView (0eea0405); LazyDateTimePartsLongArray (8ab9ec70); LazyDecimalBytePartsArray (22887cb2); design doc (f6a19c47, 2578f892, 1c7f5950)
- ADR 0013, compute primitives (masks, kernels, no-materialise) design doc (400e5b03)
- forEach* / fold default methods on Short / Byte / Bool array interfaces; chunked overrides iterate children directly (7dc6567e, f500afe3)
- truncateArray preserves zero-copy on ChunkedXxxArray (6f4eaa96)
- ALP size-based exponent search ported from Rust, two-step decode (f9bb7373)
- Decode shape table in docs/compatibility.md (47a91fd1)
- Writer compression closes ~93% of the Java↔Rust file-size gap on NYC Yellow Taxi 2024-01 (2.96M rows, 19 cols): 47.0 MB → 43.4 MB; Rust JNI baseline 42.8 MB. Four coordinated changes ported from vortex-compressor:
  - Global dictionary encoding admitted for F64 columns. Codes assigned in frequency-descending order so the dominant value maps to code 0, which lets SparseEncodingEncoder (fill = 0) compress the codes child. Mirrors Rust FloatDictScheme. Also: FrameOfReferenceEncodingEncoder skips cascade when ref == 0 and ptype is unsigned (residuals == input, FoR adds wrapper overhead for zero benefit) (01fbaa6)
  - Stratified sampling in CascadingCompressor: 32 contiguous strides at evenly-partitioned, non-overlapping offsets. Preserves local run structure so RunEnd / RLE can win on dict codes while covering breadth so cardinality-based encoders see realistic distinct counts. Matches vortex-compressor::sample::stratified_slices (715a697, da16f0d)
  - Stats-driven cascade selection. Single-pass ArrayStats (distinct count, top-frequency value + count) shared across all eligible encoders via merged StatsOptions. New Estimate sealed hierarchy (Skip / AlwaysUse / Ratio) lets encoders short-circuit the cascade without paying the sample-encode cost. ConstantEncodingEncoder.AlwaysUse when distinct == 1; DictEncodingEncoder.Skip when distinct > n/2 (Rust FloatDictScheme / IntDictScheme rule); SparseEncodingEncoder.Skip unless dominant-value bits == 0 and topFreq * 2 >= n; RunEndEncodingEncoder.Skip when every value is distinct. Mirrors vortex-compressor::estimate::EstimateVerdict (2e31265)
  - SparseEncodingEncoder.encodeCascade exposes patch-index and patch-value buffers as ChildSlot entries so the cascade further bitpacks them; biggest single lever (~1.7 MB on dict-coded F64 columns: tolls_amount, Airport_fee, congestion_surcharge, mta_tax, RatecodeID, improvement_surcharge) (2ad275c)
- Patched bitpacking: BitpackedEncodingEncoder picks the best bit_width and stores overflow as sparse patches; on trip_distance-style columns Java is 1.8 MB ahead of Rust JNI (007e6c47)
- Per-chunk zone-map stats shown in the TUI inspector (5e24fb62)
- Per-chunk column row-count consistency validation in the writer (c54c8dab)
- GlobalDictF64Test: round-trip + dict cardinality + sparse-codes-child verification on dominant-F64 columns (01fbaa6)
- TaxiColumnByteDiff: per-column byte attribution diagnostic. Walks the layout tree, prints Java vs JNI bytes side-by-side. Used to locate the sparse-cascade gap that the global file size hid (2ad275c)
- TaxiCsvSize diagnostic: reports CSV / CSV.gz sizes for the taxi corpus (7f851557, 5ba2ae30)
- ADR 0008: domain primitives and unsigned integer representation (806f52f2)
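The stratified sampling mentioned in the writer-compression notes above (contiguous strides at evenly partitioned, non-overlapping offsets) can be approximated as follows. This sketch takes each stride from the start of its partition, which is a simplification chosen for determinism; the class, method, and parameter names are hypothetical, and it is not a port of the Rust stratified_slices function.

```java
/** Hypothetical sketch: sample contiguous strides from evenly spaced partitions. */
final class StratifiedSampleSketch {
    /**
     * Splits values into `strides` equal partitions and copies the first
     * `strideLen` elements of each. Falls back to a full copy when the
     * sample would not be smaller than the input.
     */
    static long[] sample(long[] values, int strides, int strideLen) {
        int n = values.length;
        if ((long) strides * strideLen >= n) {
            return values.clone();
        }
        long[] out = new long[strides * strideLen];
        for (int s = 0; s < strides; s++) {
            // Partition s starts at floor(s * n / strides); long math avoids overflow.
            int start = (int) ((long) s * n / strides);
            System.arraycopy(values, start, out, s * strideLen, strideLen);
        }
        return out;
    }
}
```

Keeping each stride contiguous preserves local runs, while spreading the strides across the array exposes the sample to values from every region.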
- Primitive Array types are non-sealed interfaces; fold / forEach are default methods (aec4d813, f500afe3)
- FoR decode writes in-place when the source segment is writable; applyReference always allocates from the arena (b1906a08, 9955a39f)
- ALP eager fallback collapsed to a single allocate + transform pass (e3a6c21a)
- Arena lifted out of lazy array records into ArraySegments.of (8d6fe4f0)
- ScanIterator drops dead registry field (53dfdcbb)
- Per-chunk zone-map stats shown in TUI inspector (5e24fb62)
- Javadoc HTML tags → Markdown /// (44d2a052); {@code} → backticks (aca51d2f)
- CI: Codecov → SonarCloud, daily scheduled run (c600e3d0, 1e5816a5); failsafe + integration jacoco-it included in Sonar coverage (c59e2f6c, 03b2dfe7)
- CI: Mockito self-attach warning silenced via byte-buddy-agent (2ba7b877, 1183c526)
- OHLC read benchmarks re-run at 80M rows; -Dvortex.bench.ohlc.rows override added (9b7fd61f)
- dev.vortex:vortex-jni 0.74.0 → 0.75.0 (2f55f1c1); hardwood-core and zstd-jni bumped (ff5fe4b3, 2c885d3b)
- CLI: terminal mode restored on TUI exit (60cda920)
- CLI: aircompressor bundled in the uber-jar so the zstd decoder loads (e96c5968)
- CLI: scan-based filter parser; VortexException caught at the boundary (d9cff370, 6165c497)
- CLI: CSV import --delimiter flag (976934b3)
- CLI: IoWorker uses queue.add over offer so submission failures aren't silently dropped (a624d3da)
- Reader: LazySparseXxxArray guards null patchValues when numPatches == 0 (d83ec1b5)
- Sonar pass: explicit widening on int-math feeding long/float (S2184) (3cd23364, ba6ea44d); LazyRle S6218/S3776 + Pco S1905 (a2ea4796); S6218 on internal records with array components (e1d20cc5); S7474/S6218 in new lazy array files (11fe0c41); 2 hotspots + 1 assertion bug (65ccd65d); SonarCloud organization key fix (b95dcf0e)
- Pco encode FloatMult / FloatQuant modes deferred — marginal gain over existing Classic+ALP cascade.
- Remaining 0.6 MB (1.4%) writer gap vs Rust JNI on the taxi benchmark is structural, concentrated in trip_distance (+540 KB, per-chunk ALP encoding) and PULocationID (+250 KB, dict-codes layout shape). Closing it needs vortex.stats outer-layer support or dtype-specialised dict schemes.
0.6.0 — 2026-06-13
proto-rewrite (protobuf-java → in-tree MemorySegment-native codec, CLI −14%),
Extension API split (ExtensionDecoder / ExtensionEncoder SPI, writer auto-route, UUID + nullable support, JDBC extension import),
module boundary cleanup (Array subtypes → reader.array, encode data holders → writer.encode).
- proto-gen module: build-time .proto → Java record/enum generator (ae6c46a, 743278d, b527f84)
- ProtoReader / ProtoWriter: MemorySegment-native proto3 wire primitives (ae6c46a, b527f84)
- Oneof factories on generated records, e.g. ScalarValue.ofInt64Value(v) (b527f84)
- PatchedMetadata / VariantMetadata added to encodings.proto (743278d, b527f84)
- Nullable extension columns (vortex.date / time / timestamp / uuid) via ExtEncoding → MaskedEncoding → primitive (1015f9b)
- Null-preserving decodeAll on all extension decoders: null at invalid positions (24c64a9)
- ExtensionDecoder / ExtensionEncoder SPI with separate ServiceLoader manifests (a560563)
- Spec extension decoders: Date, Time, Timestamp, Uuid in reader.extension (a560563)
- Spec extension encoders: Date, Time, Timestamp, Uuid in writer.encode (a560563)
- Writer auto-routes List<LocalDate>, List<Instant>, List<UUID>, … to extension storage (1d54b57, 75d7b4b)
- vortex.uuid extension: FixedSizeList(U8, 16), big-endian, JDBC vendor detection (89a0a69, cce2d2d)
- JDBC import for DATE / TIME / TIMESTAMP / UUID columns (9f31d9e, cce2d2d)
- Chunk.as(name, Class): typed extension column access (e5cefb0)
- ExtEncoding storage child cascade-compressed (FoR / Bitpacked / ALP / RLE / …) (33cf42e)
- Java → Rust nullable extension integration tests; UUID @Disabled pending vortex-jni upgrade (bb7fcb0)
- EncodingRegistry → ReadRegistry in io.github.dfa1.vortex.reader (834d2f1, a560563)
- core.Extension / core.ExtensionEncoder → reader.ExtensionDecoder / writer.ExtensionEncoder (2a0ed93, a560563)
- VortexHttpReader.open gains HttpClient overload (235826f)
- core.array.* → reader.array.*; update import paths (286715c)
- core.array.NullableData → writer.encode.NullableData (286715c)
- Decode utilities (LeBitReader, PcoBin, PcoTansDecoder, SegmentBroadcast) → reader.decode (d514435)
- Encode data holders (ChunkedData, DateTimePartsData, FixedSizeListData, …) → writer.encode (d514435)
- ExtEncoding unwrap shortcut removed from registry (4d4ab34, 75d7b4b)
- ArrayNode.stats() / ArrayNode.of(…, stats) removed: was dead code in decode path (dc3aa00)
- regenerate-sources profile uses in-process proto-gen; protoc no longer needed (743278d)
- 25 encoding classes migrated to generated record API (meta.bit_width() style) (0132417, 743278d)
- VortexHttpReader throws VortexException on HTTP body length mismatch (235826f)
- vortex.date / vortex.uuid metadata presence fixes Java → Rust cross-compat (bb7fcb0)
- Extension dtype nullable derived from storage dtype, not hardcoded false (1015f9b)
- DType.Extension.metadata capped at 64 KiB on parse (22a5f59)
- CLI startup: silenced dev.hardwood VectorSupport INFO log (57a5a38)
- vortex-reader dependency from vortex-parquet (eca40f4)
- com.google.protobuf:protobuf-java: CLI jar 14 MB → 12 MB; JDK 25 Unsafe warning gone (743278d)
- protoc build dependency (743278d)
1046 unit + 248 integration tests, JDK 25 (2 skipped — UUID cross-compat blocked on vortex-jni 0.74.0).
- ProtoWriter.varintSize branchless via Integer.numberOfLeadingZeros (42177ca)
- ProtoWriter backpatched length-delim writes eliminate per-message temp allocation (c79611e)
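One common way to size a varint without a loop is to derive the bit length from Integer.numberOfLeadingZeros, as sketched below next to a loop-based reference. The sketch treats the argument as an unsigned 32-bit value; whether ProtoWriter.varintSize uses this exact formula is an assumption.

```java
/** Hypothetical sketch: byte length of an unsigned 32-bit varint. */
final class VarintSizeSketch {
    /** Branch-free: bit length (at least 1) divided by 7, rounded up. */
    static int varintSize(int value) {
        int bits = 32 - Integer.numberOfLeadingZeros(value | 1); // | 1 makes zero occupy one bit
        return (bits + 6) / 7;
    }

    /** Reference implementation: count 7-bit groups until none remain. */
    static int varintSizeLoop(int value) {
        int size = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }
}
```

Both variants return 1 for values up to 127, 2 for 128, and 5 for any value with the top bit set.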
- Compatibility doc bumped to Rust reference v0.74.0 (cf73887)
0.5.0 — 2026-06-09
The headline themes are an interactive inspector TUI for navigating Vortex files
(extracted as a dedicated vortex-inspector module), full Vortex extension type
decode (date, time, timestamp, uuid, decimal), and a scan API rewrite that
replaces the silent hasNext() arena-closing footgun with closeable Chunk objects.
- Interactive TUI inspector (vortex-inspector module + tui CLI subcommand). Lazy-loaded layout tree with stats, dictionary entries, hex previews, and decoded data; works against local files and http(s):// URLs. FFM-based ANSI terminal driver, no Lanterna dependency. Documented in docs/how-to.md#inspect-interactively-tui. (aa7561f, 397b64a, d4cd0bc, 8dae240, 00452e4, 7a51165, e8db30a, a43f340)
- Extension type decode: vortex.date → LocalDate, vortex.time → LocalTime, vortex.timestamp → Instant / ZonedDateTime, vortex.uuid → UUID. Routed through a new Extension sealed hierarchy on DType.Extension. See docs/compatibility.md for the coverage matrix. (4963aa9, ca8d687, 99417ad, 9da2a78, 175ad07)
- Decimal decode: GenericArray.getDecimal supports the decimal_byte_parts shape, including i128 (precision > 18). Width 1/2/4/8 reads stay allocation-free. (23d5019, 4735324, ff20a24, f4ae8c0)
- CLI uber-jar deployed to Maven Central under classifier all (io.github.dfa1.vortex:vortex-cli:0.5.0:jar:all). Useful when the consumer environment can't clone from GitHub. The manifest sets Enable-Native-Access so FFM downcalls work without the JVM flag. (3e2c552, cfc5cc8)
- Writer: global dictionary for low-cardinality Utf8. Columns with ≤ 256 distinct values across chunks are now emitted as a shared vortex.dict layout. (b4d1b43)
- CI: Windows runs for the inspector module. (a9c9d4e)
- Breaking (scan API lifecycle): ScanIterator now implements Iterator<Chunk>. next() returns a Chunk that the caller must close (try-with-resources); hasNext() is side-effect-free. Calling next() while a prior Chunk is still open throws IllegalStateException. This removes the previous footgun where iter.hasNext() silently closed the previous chunk's arena, invalidating any Array references the caller still held. Use after close() raises FFM's scope check (IllegalStateException) instead of returning undefined data. See the updated examples in README.md and docs/explanation.md#memory-model. (b45fd98)
- Breaking: EncodingRegistry is immutable. Register via the new builder: EncodingRegistry.builder().registerServiceLoaded().register(myEncoding).build(). (64ffbaa)
- Breaking: inspect split into inspect (text) + tui (interactive). Previous inspect <file> behaviour stays on inspect; interactive use is now on the dedicated tui subcommand. (e8db30a)
- Extension sealed hierarchy replaces the prior Extensions utility class. (175ad07)
- CLI errors always print the exception class + cause chain; the VORTEX_DEBUG environment variable is removed. (6a4464b, f2f85bd)
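The scan lifecycle contract described above can be modelled with a small self-contained sketch. The Chunk and ChunkIterator classes below are simplified stand-ins written for this example (plain long[] batches instead of arena-backed arrays); they mimic the documented behaviour but are not the library's classes.

```java
import java.util.Iterator;
import java.util.List;

/** Simplified model of the closeable-chunk contract; all types here are stand-ins. */
public final class ChunkLifecycleSketch {
    private ChunkLifecycleSketch() {}

    /** Stand-in for a chunk: data is only readable while the chunk is open. */
    public static final class Chunk implements AutoCloseable {
        private final long[] rows;
        private boolean open = true;

        public Chunk(long[] rows) { this.rows = rows; }

        public long sum() {
            if (!open) throw new IllegalStateException("chunk is closed");
            long s = 0;
            for (long r : rows) s += r;
            return s;
        }

        public boolean isOpen() { return open; }

        @Override public void close() { open = false; }
    }

    /** Stand-in iterator: hasNext() has no side effects, next() requires the prior chunk to be closed. */
    public static final class ChunkIterator implements Iterator<Chunk> {
        private final Iterator<long[]> source;
        private Chunk current;

        public ChunkIterator(List<long[]> batches) { this.source = batches.iterator(); }

        @Override public boolean hasNext() { return source.hasNext(); }

        @Override public Chunk next() {
            if (current != null && current.isOpen()) {
                throw new IllegalStateException("previous chunk is still open");
            }
            current = new Chunk(source.next());
            return current;
        }
    }

    /** Typical consumer loop: each chunk is scoped by try-with-resources. */
    public static long total(List<long[]> batches) {
        ChunkIterator it = new ChunkIterator(batches);
        long total = 0;
        while (it.hasNext()) {
            try (Chunk chunk = it.next()) {
                total += chunk.sum();
            }
        }
        return total;
    }
}
```

Scoping each chunk with try-with-resources makes the release point explicit in the caller's code, so nothing is invalidated behind the caller's back.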
- Bitpacked unpack: per-row bookkeeping hoisted out of the inner block loop in unpackLoop8/16/32/64. Measurable win on bitpacked scan benchmarks. (ab3ca3f, ad8a64d)
- Broadcast modulo branch-split: ALP + Dict hot paths gate the ConstantEncoding broadcast modulo behind a cheap cap == n check, restoring C2 vectorization on the common path. ~5–10× recovery on the regressed scans. (442021f)
- Scan fast-path on non-broadcast reads: recovers ~25% on bitpacked scans by skipping the broadcast capacity check when not needed. (051a794)
- GenericArray.getDecimal: width 1/2/4/8 reads stay allocation-free. (f4ae8c0)
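The branch-split idea can be sketched as follows. This is a minimal illustration, assuming cap is the number of stored values and n the number of logical rows; the class and method are hypothetical and do not reproduce the actual ALP or Dict decode loops.

```java
/** Illustrative only: splits a loop so the modulo appears only on the broadcast path. */
public final class BroadcastSplitSketch {
    private BroadcastSplitSketch() {}

    /**
     * Sums n logical rows backed by values.
     * Assumption: when values is shorter than n it is broadcast by wrapping the index,
     * and values is non-empty whenever n > 0.
     */
    public static long sum(long[] values, int n) {
        int cap = values.length;
        long total = 0;
        if (cap == n) {
            // Common path: one stored value per row, a plain counted loop with no modulo.
            for (int i = 0; i < n; i++) {
                total += values[i];
            }
        } else {
            // Broadcast path: fewer stored values than rows, so the index wraps.
            for (int i = 0; i < n; i++) {
                total += values[i % cap];
            }
        }
        return total;
    }
}
```

Both branches compute the same result when cap == n; the split only moves the per-element modulo out of the loop that runs most often.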
- Decimal element width is derived from the buffer size, not the declared precision; this fixes round-trip with the Rust reference implementation for oversized declared precisions. (c798e95)
- Extensions.localDate bounds-check: rejects out-of-range storage values. (a7eab37)
- GenericArray.getDecimal rejects null cells in the mantissa path. (5198115)
- TUI thread safety: Layout.metadata byte reads run on the I/O worker thread that owns the handle. InspectorTree.Node uses identity equality so duplicate subtrees don't collapse. (6c732de, 0cc1137, a47b6fd)
- InspectorTree: vortex.date columns format using the declared dtype; TUI data scan no longer applies withLimit (was rejecting GenericArray). (0e749df, b5ce1d6)
- ScanIterator.truncateArray now supports GenericArray; decimals format correctly in the TUI. (f09f564)
- CLI prints the exception class + full cause chain on inspect errors. (f2f85bd)
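To make the first fix above concrete, here is a sketch of deriving the per-element width from the buffer size and then reading a mantissa of that width. It assumes the width is simply buffer bytes divided by row count and that mantissas are little-endian two's-complement integers; both are assumptions for this example, and the class is hypothetical rather than the library's decoder.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical decimal reader for illustration; not the vortex-java implementation. */
public final class DecimalWidthSketch {
    private DecimalWidthSketch() {}

    /** Infers bytes per element from the buffer size instead of the declared precision. */
    public static int elementWidth(int bufferBytes, int rowCount) {
        if (rowCount <= 0 || bufferBytes % rowCount != 0) {
            throw new IllegalArgumentException("buffer size does not divide evenly by row count");
        }
        int width = bufferBytes / rowCount;
        if (width != 1 && width != 2 && width != 4 && width != 8 && width != 16) {
            throw new IllegalArgumentException("unsupported element width: " + width);
        }
        return width;
    }

    /** Reads element index as a BigDecimal with the given scale (assumed little-endian mantissa). */
    public static BigDecimal get(ByteBuffer buf, int rowCount, int index, int scale) {
        if (index < 0 || index >= rowCount) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int width = elementWidth(buf.remaining(), rowCount);
        ByteBuffer le = buf.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int pos = le.position() + index * width;
        switch (width) {
            case 1: return BigDecimal.valueOf(le.get(pos), scale);
            case 2: return BigDecimal.valueOf(le.getShort(pos), scale);
            case 4: return BigDecimal.valueOf(le.getInt(pos), scale);
            case 8: return BigDecimal.valueOf(le.getLong(pos), scale);
            default:
                // 16-byte mantissa: BigInteger expects big-endian bytes, so reverse them.
                byte[] bigEndian = new byte[16];
                for (int i = 0; i < 16; i++) {
                    bigEndian[15 - i] = le.get(pos + i);
                }
                return new BigDecimal(new BigInteger(bigEndian), scale);
        }
    }
}
```

With this approach a column whose declared precision suggests a wider type still decodes correctly if the writer stored narrower mantissas, because the reader trusts the bytes actually present.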
- ScanResult: renamed to Chunk and given lifecycle methods. Update imports: io.github.dfa1.vortex.scan.ScanResult → io.github.dfa1.vortex.scan.Chunk. (b45fd98)
- Extensions utility class: replaced by the Extension sealed hierarchy. (175ad07)
- Extension.Time#unit / Extension.Timestamp#unit accessors (unused). (2fcb311)
- VORTEX_DEBUG env-var gate: stack traces are always printed on CLI error. (6a4464b)
- Lanterna dependency: replaced by an FFM-based ANSI terminal in cli/src/main/java/.../tui/term. (397b64a)
- Trailer parser extracted, shared by VortexReader (mmap) and VortexHttpReader (range-request) paths. (1ac6f5a)
- VortexHttpReader allocates its own Arena in the constructor and reuses a single HttpClient. (e18dafd)
- Inspector module carved out of cli; TUI + IoWorker + terminal code later moved back into cli (only the inspect package stayed in vortex-inspector). (aa7561f, 77baff1)
- Documentation: layout section expanded (node types, encoding namespaces, pruning); on-disk file-layout diagram in explanation.md. (e696f07, 5c2b410)
0.4.0 — 2026-06-07
The headline themes for this release are a security-hardening sweep of the file-format
parser, a public-API cleanup of the Array hierarchy (the heap-allocated buffer(int) /
segment() accessors are gone from the interface), and cascading writer features that
close the compression gap with the Rust reference implementation on real-world workloads.
Every malformed input now surfaces as VortexException rather than a JDK exception
(IndexOutOfBoundsException, ArrayIndexOutOfBoundsException, StackOverflowError, raw
FlatBuffer/Protobuf runtime exceptions). Regression suite lives under
reader/src/test/java/.../*SecurityTest.
- Zip-bomb protection: ConstantEncoding and dict-layout decode no longer pre-allocate O(rowCount) memory; a 150-byte crafted file claiming 10⁹ rows is now constant-cost. (10a7776)
- Trailer + postscript validation: VortexReader and VortexHttpReader reject unknown file version, postscriptLen == 0, and postscriptLen > fileSize - 8. Footer/layout/dtype blob offsets and Layout.encoding index are bounds-checked at parse time. (f8f89fe)
- Footer segmentSpecs bounds: every spec is validated against fileSize the moment the footer is materialised, eliminating later IndexOutOfBoundsException on MemorySegment.asSlice. (03845ac)
- PType ordinal bounds-check: PType.fromOrdinal(int) replaces all 22 PType.values()[idx] call sites across encodings; crafted Protobuf ptype fields are rejected up front. (b4988c3)
- Layout-tree depth cap: PostscriptParser.convertLayout is capped at depth 64, preventing both unbounded nesting and self-referential FlatBuffer cycles (a ~120-byte cycle attack previously triggered StackOverflowError). (29adbe0)
- Layout metadata size cap: per-layout metadataAsByteBuffer() is capped at 4 MiB (above any real encoding's footprint; FSST's symbol table is the largest at ~32 KiB). (ebbe644)
- Decimal field validation: DType.Decimal is rejected unless precision ∈ [1, 38] and scale ∈ [0, precision], matching the 38-digit limit of a 128-bit integer mantissa. (ebbe644)
- readFlatStats bounds-check: zone-map stats reads now validate the trailing little-endian fbLen field against the segment size, returning empty stats on malformed input rather than throwing IndexOutOfBoundsException from MemorySegment.asSlice. (ebbe644)
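Two of the hardening patterns listed above, the bounds-checked ordinal lookup and the recursion depth cap, can be sketched in a few lines. Everything below is illustrative: MalformedFileException stands in for the library's VortexException, the PType constants are an arbitrary subset in an assumed order, Node is a toy layout tree, and whether depth 64 itself is accepted is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative hardening patterns; none of these types are the library's own. */
public final class ParserHardeningSketch {
    private ParserHardeningSketch() {}

    /** Assumed cap, taken from the changelog entry. */
    public static final int MAX_DEPTH = 64;

    /** Stand-in for a library-specific exception type. */
    public static final class MalformedFileException extends RuntimeException {
        public MalformedFileException(String message) { super(message); }
    }

    /** Arbitrary subset of primitive types; the real enum and its ordering may differ. */
    public enum PType {
        U8, U16, U32, U64, I8, I16, I32, I64;

        private static final PType[] VALUES = values();

        /** Replaces values()[idx]: untrusted ordinals fail with a domain exception. */
        public static PType fromOrdinal(int idx) {
            if (idx < 0 || idx >= VALUES.length) {
                throw new MalformedFileException("unknown ptype ordinal: " + idx);
            }
            return VALUES[idx];
        }
    }

    /** Toy layout node; a crafted file could make a child point back at an ancestor. */
    public static final class Node {
        public final List<Node> children = new ArrayList<>();
    }

    /** Walks the tree, failing fast instead of overflowing the stack on deep or cyclic input. */
    public static int countNodes(Node root) {
        return countNodes(root, 0);
    }

    private static int countNodes(Node node, int depth) {
        if (depth > MAX_DEPTH) {
            throw new MalformedFileException("layout tree deeper than " + MAX_DEPTH);
        }
        int count = 1;
        for (Node child : node.children) {
            count += countNodes(child, depth + 1);
        }
        return count;
    }
}
```

A depth counter handles cycles and excessive nesting with the same check, which avoids tracking visited nodes during parsing.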
- vortex.sequence F16 encode/decode: half-precision floats now round-trip through the sequence encoding. (7b3d7a9)
- Writer: cascading with global dict layout. Low-cardinality columns (≤ 256 distinct values in the chunk sample) are now emitted as a vortex.dict layout, with the dict candidate detection tightened to avoid false positives. (53b2a19, d383765)
- Writer: opt-in Zstd compression. WriteOptions.withZstd(boolean) exposes the size/throughput trade-off. Off by default; turn on for archival workloads. (ea10d37)
- Encoding.decodeSegment extension point: added as part of the typed-segment migration (see Changed below). Provides a typed alternative to Array.segment(). (ed4a0ae)
- DecodeContext.decodeChild(int, DType, long): typed child-decode helper that replaces the per-encoding decodeChildAs(...) private utilities. (d07faf0, a1512da)
- Typed accessors on concrete array types: LongArray.segment(), VarBinArray.offsetsSegment(), MaskedArray.inner(), and friends now live on the concrete types where they fit cleanly, rather than on the Array interface. (84a34f4)
- *SecurityTest test-naming convention: adversarial / robustness tests are now grouped under the *SecurityTest suffix, mirroring the existing *IntegrationTest convention. Run with ./mvnw test -Dtest='*SecurityTest'. (76ba67c)
- FlatSegmentDecoder: extracted from EncodingRegistry; the registry is now pure dispatch. (4a08356)
- Array interface slimmed down. buffer(int), child(int), and segment() are no longer part of the Array interface; consumers should use the typed accessors on the concrete subtype (e.g. LongArray.segment()) or ArraySegments.of(arr) for a generic fallback. buffer(int) is now package-private on the concrete array types. (bdb4e7d, 1283168, ba5957c, df6ab3f, bb7b656)
- VarBinArray no longer keeps a redundant offsetsArr field; consumers read offsets via offsetsSegment(). (96687f8)
- ArrayStats is no longer eagerly stored on decoded array types; statistics are now read on demand from the FlatBuffer node, matching the Rust reference implementation. (9237e28)
- MaskedArray.segment() delegates correctly to its inner array (regression introduced during the typed-accessor migration). (8a16119)
- Constant-encoded array indexing broadcasts the index correctly when scanning multiple rows from a single stored value. (ed658b7)
- Performance benchmark (RustWritesJavaReadsBigFileBenchmark) migrated off the removed Array.buffer(int) accessor, unblocking ./mvnw verify and ./bench. (977a529)
- Array.buffer(int), Array.child(int), and Array.segment() from the public Array interface (see Changed). Callers should migrate to the concrete-type accessors or ArraySegments.of(arr). (bdb4e7d, 1283168)
- Encoding.decodeSegment(...) is removed after the migration to DecodeContext.decodeChild. (977a529)
- ArrayStats field on decoded array types (statistics are now lazy). (9237e28)
- Added CONTRIBUTING.md covering trunk-based workflow, commit conventions, and the three-touch-point rule for adding encodings. (d9825cf)
- Added an internal-architecture diagram set covering the file format, layout tree, and scan path. (75f4cea)
- Added a "Vortex vs Parquet" comparison section to the README. (886bb80)
- Expanded the ## Security section in TODO.md with the open hardening roadmap (resource caps, per-encoding adversarial tests, Jazzer fuzz harness, OSS-Fuzz submission). (1bb1465)
- Dependabot enabled for Maven and GitHub Actions. (dd118a7)
- Numerous dependency bumps: JUnit Jupiter (5.11.4 → 6.1.0, tests now require JUnit 6), Mockito, FastCSV (3 → 4), H2 (2.3 → 2.4), Checkstyle, Zstd-JNI, maven-compiler/surefire/failsafe/javadoc/source/shade/gpg/antrun/exec/build-helper plugins, actions/checkout (4 → 6), actions/setup-java (4 → 5), actions/cache (4 → 5), Sonatype central-publishing plugin. (81be668, 2c8319f, f6dbcf1, dfabd8c, 7b4a718, fb9b404, 7b67000, fd1ea7e)
- pom.xml files now group dependencies under production / testing comment sections with a consistent project-internal-first ordering. (bcbbbfd)
- Checkstyle scope tightened to exclude generated fbs / proto packages. (f5ab433)