Commit bf61c07

Merge vortex-dtype into vortex-array (#6582)
## Summary

Tracking Issue: #6547

Merges `vortex-dtype` into `vortex-array`.

I think it is worth considering combining some more crates, but we can have that discussion another time.

### Implementation Note

I had Claude generate a script that I can run to automatically do most of these changes. This lets me rebase and rerun the script instead of fixing merge conflicts by hand.

<details>

```sh
#!/usr/bin/env bash
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright the Vortex contributors

set -euo pipefail

# Merge vortex-dtype into vortex-array, then delete vortex-dtype entirely.
#
# Run from the repo root:
#   bash scripts/merge-dtype-into-array.sh

# ─── Step 1: Move source files into vortex-array/src/dtype/ ─────────────────

mkdir -p vortex-array/src/dtype
cp -r vortex-dtype/src/* vortex-array/src/dtype/
mv vortex-array/src/dtype/lib.rs vortex-array/src/dtype/mod.rs

# ─── Step 2: Handle dtype.rs module inception ───────────────────────────────
# Rename dtype.rs → dtype_impl.rs to avoid having a file named the same as its
# parent directory module, and update references in mod.rs.

mv vortex-array/src/dtype/dtype.rs vortex-array/src/dtype/dtype_impl.rs
sed -i '' 's/^mod dtype;$/mod dtype_impl;/' vortex-array/src/dtype/mod.rs
sed -i '' 's/^pub use dtype::NativeDType;$/pub use dtype_impl::NativeDType;/' \
    vortex-array/src/dtype/mod.rs

# Move the DType enum definition from dtype_impl.rs into mod.rs so that the
# canonical definition lives at the module root.

# 2a: Extract the DType enum (doc comment through closing brace) to a temp file.
sed -n '/^\/\/\/ The logical types of elements in Vortex arrays\./,/^}$/p' \
    vortex-array/src/dtype/dtype_impl.rs > /tmp/vortex_dtype_enum.rs

# 2b: Remove the enum from dtype_impl.rs.
sed -i '' '/^\/\/\/ The logical types of elements in Vortex arrays\./,/^}$/d' \
    vortex-array/src/dtype/dtype_impl.rs

# 2c: In dtype_impl.rs, DType is now in the parent module. Import it and bring
# all variants into scope for the impl blocks.
sed -i '' 's/^use DType::\*;$/use super::DType;\
use DType::*;/' vortex-array/src/dtype/dtype_impl.rs

# 2d: Build a new mod.rs with the DType enum inserted before the `pub use`
# re-exports, and add the `use std::sync::Arc;` import that the enum needs.
INSERT_LINE=$(rg -n '^pub use bigint::\*;' vortex-array/src/dtype/mod.rs | head -1 | cut -d: -f1)
{
    # Everything before the first `pub use` line.
    head -n $((INSERT_LINE - 1)) vortex-array/src/dtype/mod.rs
    echo "use std::sync::Arc;"
    echo ""
    cat /tmp/vortex_dtype_enum.rs
    echo ""
    # Everything from the first `pub use` line onward.
    tail -n +"$INSERT_LINE" vortex-array/src/dtype/mod.rs
} > /tmp/vortex_dtype_mod.rs
mv /tmp/vortex_dtype_mod.rs vortex-array/src/dtype/mod.rs

# 2e: Remove the old re-export of DType (it is now defined directly in mod.rs).
sed -i '' '/^pub use dtype::DType;$/d' vortex-array/src/dtype/mod.rs
sed -i '' '/^pub use dtype_impl::DType;$/d' vortex-array/src/dtype/mod.rs

rm -f /tmp/vortex_dtype_enum.rs

# ─── Step 3: Strip crate-level attributes from dtype/mod.rs ─────────────────
# Inner #![...] attributes are only valid at the crate root. Drop them.

sed -i '' '/^#!\[cfg(target_endian/d' vortex-array/src/dtype/mod.rs
sed -i '' '/^#!\[deny/d; /^#!\[warn/d' vortex-array/src/dtype/mod.rs

# ─── Step 4: Add `pub mod dtype;` to vortex-array/src/lib.rs ────────────────
# Insert alphabetically (between display and executor).

sed -i '' 's/^mod executor;$/pub mod dtype;\nmod executor;/' \
    vortex-array/src/lib.rs

# ─── Step 5: Fix imports in moved dtype files ───────────────────────────────

# 5a: crate:: → crate::dtype:: (all internal references in the moved files).
fd -e rs . vortex-array/src/dtype -x sed -i '' 's/crate::/crate::dtype::/g'

# 5b: Fix double dtype:: caused by the internal `dtype` module (now `dtype_impl`)
# being referenced as crate::dtype:: in the original, which became crate::dtype::dtype::.
fd -e rs . vortex-array/src/dtype -x sed -i '' 's/crate::dtype::dtype::/crate::dtype::/g'

# 5c: vortex_dtype:: → vortex_array::dtype:: (doc examples in the moved files).
fd -e rs . vortex-array/src/dtype -x sed -i '' 's/vortex_dtype::/vortex_array::dtype::/g'

# 5d: Fix #[macro_export] macro bodies. These macros used literal `vortex_dtype::`
# references which step 5c turned into `vortex_array::dtype::`. But you can't
# use a crate's own name from within that crate; macros need `$crate::`.
# Only apply to non-comment lines to preserve doc examples.
fd -e rs . vortex-array/src/dtype \
    -x sed -i '' '/^[[:space:]]*\/\//!s/vortex_array::/$crate::/g'

# ─── Step 6: Fix imports in existing vortex-array files ─────────────────────

# 6a: vortex_dtype:: → crate::dtype:: in vortex-array/src/ EXCEPT the dtype/ subdirectory.
fd -e rs . vortex-array/src --exclude dtype \
    -x sed -i '' 's/vortex_dtype::/crate::dtype::/g'

# 6b: vortex_dtype:: → vortex_array::dtype:: in bench/test files (these compile as
# separate binaries and use vortex_array::, not crate::).
fd -e rs . vortex-array --exclude src \
    -x sed -i '' 's/vortex_dtype::/vortex_array::dtype::/g'

# ─── Step 7: Fix imports in all other crates ────────────────────────────────
# vortex_dtype:: → vortex_array::dtype:: across the entire workspace,
# excluding vortex-array (handled above) and vortex-dtype (being deleted).

fd -e rs . \
    --exclude vortex-array \
    --exclude vortex-dtype \
    -x sed -i '' 's/vortex_dtype::/vortex_array::dtype::/g'

# ─── Step 7b: Fix #[macro_export] macro imports ─────────────────────────────
# Macros with #[macro_export] are exported at the crate root, not at the module
# where they are defined. The complete list from vortex-dtype:
#   field_path, match_each_native_ptype, match_each_integer_ptype,
#   match_each_unsigned_integer_ptype, match_each_signed_integer_ptype,
#   match_each_float_ptype, match_each_native_simd_ptype,
#   match_smallest_offset_type, match_each_decimal_value, match_each_decimal_value_type

MACRO_SED='s/::dtype::match_each_/::match_each_/g; s/::dtype::match_smallest_/::match_smallest_/g; s/::dtype::field_path/::field_path/g'

# Within vortex-array/src (uses crate::dtype:: → crate::)
fd -e rs . vortex-array/src -x sed -i '' "$MACRO_SED"

# Within vortex-array bench/test files (uses vortex_array::dtype:: → vortex_array::)
fd -e rs . vortex-array --exclude src -x sed -i '' "$MACRO_SED"

# In all other crates (including vortex-duckdb, vortex-python, etc.)
fd -e rs . --exclude vortex-array --exclude vortex-dtype -x sed -i '' "$MACRO_SED"

# Also fix macro imports in vortex-duckdb and vortex-python specifically for any
# non-.rs files (e.g. build scripts, pyo3 bindings) that may use these macros.
fd -e rs . vortex-duckdb -x sed -i '' "$MACRO_SED"
fd -e rs . vortex-python -x sed -i '' "$MACRO_SED"

# ─── Step 8: Update vortex-array/Cargo.toml ─────────────────────────────────

# 8a: Add cudarc (optional), after cfg-if alphabetically.
sed -i '' '/^cfg-if = { workspace = true }/a\
cudarc = { workspace = true, optional = true }
' vortex-array/Cargo.toml

# 8b: Add half with num-traits feature, after goldenfile.
sed -i '' '/^goldenfile = /a\
half = { workspace = true, features = ["num-traits"] }
' vortex-array/Cargo.toml

# 8c: Add jiff, after itertools.
sed -i '' '/^itertools = { workspace = true }/a\
jiff = { workspace = true }
' vortex-array/Cargo.toml

# 8d: Add primitive-types (optional), after pin-project-lite.
sed -i '' '/^pin-project-lite = { workspace = true }/a\
primitive-types = { workspace = true, optional = true, features = ["arbitrary"] }
' vortex-array/Cargo.toml

# 8e: Add "dtype" feature to vortex-flatbuffers.
sed -i '' 's/vortex-flatbuffers = { workspace = true, features = \["array"\] }/vortex-flatbuffers = { workspace = true, features = ["array", "dtype"] }/' \
    vortex-array/Cargo.toml

# 8f: Add "dtype" feature to vortex-proto.
sed -i '' 's/vortex-proto = { workspace = true, features = \["expr", "scalar"\] }/vortex-proto = { workspace = true, features = ["dtype", "expr", "scalar"] }/' \
    vortex-array/Cargo.toml

# 8g: Add "flatbuffers" feature to vortex-error.
sed -i '' 's/vortex-error = { workspace = true }/vortex-error = { workspace = true, features = ["flatbuffers"] }/' \
    vortex-array/Cargo.toml

# 8h: Add "rc" feature to serde (dtype needs serde with "rc" + "derive").
sed -i '' 's/serde = { workspace = true, optional = true, features = \["derive"\] }/serde = { workspace = true, optional = true, features = ["derive", "rc"] }/' \
    vortex-array/Cargo.toml

# 8i: Remove vortex-dtype dependency.
sed -i '' '/^vortex-dtype = /d' vortex-array/Cargo.toml

# 8j: Update arbitrary feature: replace vortex-dtype/arbitrary with dep:primitive-types.
sed -i '' 's/arbitrary = \["dep:arbitrary", "vortex-dtype\/arbitrary"\]/arbitrary = ["dep:arbitrary", "dep:primitive-types"]/' \
    vortex-array/Cargo.toml

# 8k: Add cudarc feature (after canonical_counter).
sed -i '' '/^canonical_counter = \[\]/a\
cudarc = ["dep:cudarc"]
' vortex-array/Cargo.toml

# 8l: Remove "vortex-dtype/serde" from serde feature list.
sed -i '' '/"vortex-dtype\/serde",/d' vortex-array/Cargo.toml

# 8m: Add serde_json and serde_test to dev-dependencies.
sed -i '' '/^rstest = { workspace = true }$/a\
serde_json = { workspace = true }\
serde_test = { workspace = true }
' vortex-array/Cargo.toml

# ─── Step 9: Remove vortex-dtype dep from all other Cargo.toml files ────────

# First, add vortex-array to any Cargo.toml that has vortex-dtype but not
# vortex-array (e.g. vortex-cuda/cub).
for f in $(fd -g Cargo.toml . --exclude vortex-array --exclude vortex-dtype); do
    if rg -q '^vortex-dtype = ' "$f" && ! rg -q '^vortex-array = ' "$f"; then
        sed -i '' '/^vortex-dtype = /a\
vortex-array = { workspace = true }
' "$f"
    fi
done

# Propagate cudarc feature to vortex-cuda's vortex-array dependency.
sed -i '' 's/^vortex-array = { workspace = true }$/vortex-array = { workspace = true, features = ["cudarc"] }/' \
    vortex-cuda/Cargo.toml

# Remove vortex-dtype dependency from all other Cargo.toml files.
fd -g Cargo.toml . \
    --exclude vortex-array \
    --exclude vortex-dtype \
    -x sed -i '' '/^vortex-dtype = /d'

# Remove "vortex-dtype/serde" from the umbrella vortex crate's serde feature.
sed -i '' '/"vortex-dtype\/serde",/d' vortex/Cargo.toml

# ─── Step 10: Update vortex/src/lib.rs (umbrella crate) ─────────────────────

# Point the dtype re-export at vortex-array::dtype instead of vortex_dtype.
sed -i '' 's/pub use vortex_dtype::\*;/pub use vortex_array::dtype::*;/' \
    vortex/src/lib.rs

# Fix the DTypeSession import.
sed -i '' 's/use vortex_dtype::session::DTypeSession;/use vortex_array::dtype::session::DTypeSession;/' \
    vortex/src/lib.rs

# ─── Step 11: Remove vortex-dtype from workspace root Cargo.toml ────────────

# Remove from members list.
sed -i '' '/"vortex-dtype",/d' Cargo.toml

# Remove workspace dependency definition.
sed -i '' '/^vortex-dtype = .*path = /d' Cargo.toml

# ─── Step 12: Delete the vortex-dtype crate entirely ────────────────────────

rm -rf vortex-dtype

# ─── Step 13: Format files ──────────────────────────────────────────────────

cargo +nightly fmt --all
taplo fmt

# ─── Step 14: Update public API lockfile ─────────────────────────────────────

bash ./scripts/public-api.sh
cargo update --manifest-path java/testfiles/Cargo.toml

echo ""
echo "Done! Run these to verify:"
echo " cargo clippy --all-targets --all-features"
echo " cargo nextest run --all-features --no-fail-fast"
```

</details>

## API Change

Removes `vortex-dtype`. Everything can be accessed via `vortex-array` or `vortex::array::dtype`.

## Testing

This is a cosmetic (albeit very large) change, so no functionality is different.

---------

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
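Step 7b of the script relies on a Rust rule that is easy to miss: a `macro_rules!` macro marked `#[macro_export]` is reachable at the crate root, regardless of which module declares it. The sketch below illustrates that rule in isolation. It is not code from this commit: the `dtype` module, the `PType` enum, the `match_each_demo_ptype!` macro, and `byte_width` are hypothetical stand-ins chosen only for illustration.

```rust
// Hypothetical stand-in for a `dtype` module inside a larger crate.
pub mod dtype {
    /// Illustrative primitive-type tag (an assumption, not the Vortex type).
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum PType {
        U8,
        I32,
        F64,
    }

    /// Declared inside `dtype`, but `#[macro_export]` places it at the crate
    /// root. Paths in the body use `$crate::` so they resolve both inside the
    /// defining crate and in downstream crates.
    #[macro_export]
    macro_rules! match_each_demo_ptype {
        ($ptype:expr, | $T:ident | $body:block) => {
            match $ptype {
                $crate::dtype::PType::U8 => {
                    type $T = u8;
                    $body
                }
                $crate::dtype::PType::I32 => {
                    type $T = i32;
                    $body
                }
                $crate::dtype::PType::F64 => {
                    type $T = f64;
                    $body
                }
            }
        };
    }
}

use dtype::PType;

/// Returns the size in bytes of the Rust type behind a `PType`.
pub fn byte_width(ptype: PType) -> usize {
    // The macro is invoked through the crate root. The module path
    // `crate::dtype::match_each_demo_ptype!` would not resolve.
    crate::match_each_demo_ptype!(ptype, |T| { std::mem::size_of::<T>() })
}
```

With these definitions, `byte_width(PType::I32)` returns `4`. The same rule explains why the script rewrites `::dtype::match_each_…` paths to `::match_each_…` after the move, and why step 5d swaps the literal crate name inside macro bodies for `$crate::`.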
1 parent 8fd8b8d commit bf61c07

632 files changed

Lines changed: 6298 additions & 6922 deletions

.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion
@@ -666,7 +666,7 @@ jobs:
 matrix:
   include:
     - { shard: 1, name: "Core foundation", packages: "vortex-buffer vortex-error" }
-    - { shard: 2, name: "Arrays", packages: "vortex-array vortex-dtype", features: "--features _test-harness" }
+    - { shard: 2, name: "Arrays", packages: "vortex-array", features: "--features _test-harness" }
     - { shard: 3, name: "Main library", packages: "vortex" }
     - { shard: 4, name: "Encodings 1", packages: "vortex-alp vortex-bytebool vortex-datetime-parts" }
     - { shard: 5, name: "Encodings 2", packages: "vortex-decimal-byte-parts vortex-fastlanes vortex-fsst", features: "--features _test-harness" }

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@

 * `vortex-buffer` defines zero-copy aligned `Buffer<T>` and `BufferMut<T>` that are guaranteed
   to be aligned to `T` (or whatever requested runtime alignment).
-* `vortex-dtype` contains the basic `DType` logical type enum that is the basis of the Vortex
+* `vortex-array/src/dtype` contains the basic `DType` logical type enum that is the basis of the Vortex
   type system
 * `vortex-array` contains the basic `Array` trait, as well as several encodings which impl
   that trait for each encoding. It includes all of most of the Apache Arrow encodings.

Cargo.lock

Lines changed: 7 additions & 54 deletions
Generated file; diff not rendered.

Cargo.toml

Lines changed: 0 additions & 2 deletions
@@ -10,7 +10,6 @@ members = [
     "vortex-metrics",
     "vortex-io",
     "vortex-proto",
-    "vortex-dtype",
     "vortex-array",
     "vortex-btrblocks",
     "vortex-layout",
@@ -254,7 +253,6 @@ vortex-bytebool = { version = "0.1.0", path = "./encodings/bytebool", default-fe
 vortex-datafusion = { version = "0.1.0", path = "./vortex-datafusion", default-features = false }
 vortex-datetime-parts = { version = "0.1.0", path = "./encodings/datetime-parts", default-features = false }
 vortex-decimal-byte-parts = { version = "0.1.0", path = "encodings/decimal-byte-parts", default-features = false }
-vortex-dtype = { version = "0.1.0", path = "./vortex-dtype", default-features = false }
 vortex-error = { version = "0.1.0", path = "./vortex-error", default-features = false }
 vortex-fastlanes = { version = "0.1.0", path = "./encodings/fastlanes", default-features = false }
 vortex-file = { version = "0.1.0", path = "./vortex-file", default-features = false }

docs/developer-guide/internals/architecture.md

Lines changed: 38 additions & 38 deletions
@@ -24,60 +24,60 @@ query engine integrations build on the file reading and scan APIs exposed throug
 The core crates provide the foundation for the Vortex type system, array representation, file
 format, and I/O.

-| Crate | Role |
-|----------------------|--------------------------------------------------------------------------------|
-| `vortex-error` | `VortexError` and `VortexResult` types, `vortex_err!` / `vortex_bail!` macros |
-| `vortex-buffer` | Zero-copy aligned `Buffer<T>` with guaranteed alignment |
-| `vortex-dtype` | `DType` enum: Null, Bool, Primitive, UTF8, Binary, Struct, List, Extension |
-| `vortex-scalar` | Single-value representations of each dtype |
-| `vortex-mask` | Bitmask operations for validity and selection |
-| `vortex-session` | Session object holding registries for encodings, layouts, and extension types |
-| `vortex-array` | `Array` trait, canonical encodings, vtable system, statistics |
-| `vortex-io` | Async I/O abstraction (local filesystem, object store, HTTP) |
-| `vortex-layout` | Layout traits and built-in layouts (Flat, Struct, Chunked) |
-| `vortex-ipc` | IPC format for inter-process communication |
-| `vortex-file` | `.vortex` file reading and writing |
-| `vortex-scan` | Table scan with filter and projection pushdown |
-| `vortex-expr` | Expression representation and optimization |
-| `vortex-flatbuffers` | FlatBuffer schema definitions |
+| Crate | Role |
+| ------------------------- | ----------------------------------------------------------------------------- |
+| `vortex-error` | `VortexError` and `VortexResult` types, `vortex_err!` / `vortex_bail!` macros |
+| `vortex-buffer` | Zero-copy aligned `Buffer<T>` with guaranteed alignment |
+| `vortex-array/src/dtype` | `DType` enum: Null, Bool, Primitive, UTF8, Binary, Struct, List, Extension |
+| `vortex-array/src/scalar` | Single-value representations of each dtype |
+| `vortex-mask` | Bitmask operations for validity and selection |
+| `vortex-session` | Session object holding registries for encodings, layouts, and extension types |
+| `vortex-array` | `Array` trait, canonical encodings, vtable system, statistics |
+| `vortex-io` | Async I/O abstraction (local filesystem, object store, HTTP) |
+| `vortex-layout` | Layout traits and built-in layouts (Flat, Struct, Chunked) |
+| `vortex-ipc` | IPC format for inter-process communication |
+| `vortex-file` | `.vortex` file reading and writing |
+| `vortex-scan` | Table scan with filter and projection pushdown |
+| `vortex-expr` | Expression representation and optimization |
+| `vortex-flatbuffers` | FlatBuffer schema definitions |

 ## Encodings

 Encodings live in separate crates under `/encodings/`. Each encoding implements the array vtable
 and registers itself with the session. The standard encodings are bundled into the `vortex` crate.

-| Crate | Technique |
-|------------------------------|--------------------------------------------------------|
-| `vortex-alp` | Adaptive Lossless floating-Point compression |
-| `vortex-fastlanes` | FastLanes bit-packing, delta, and frame-of-reference |
-| `vortex-fsst` | Fast Static Symbol Table compression for strings |
-| `vortex-runend` | Run-end encoding for repetitive data |
-| `vortex-sparse` | Sparse array encoding |
-| `vortex-zigzag` | ZigZag encoding for signed integers |
-| `vortex-roaring` | Roaring bitmap encoding |
-| `vortex-dict` | Dictionary encoding |
-| `vortex-bytebool` | Byte-per-boolean encoding |
-| `vortex-datetime-parts` | DateTime field decomposition |
-| `vortex-decimal-byte-parts` | Decimal byte decomposition |
-| `vortex-sequence` | Arithmetic sequence encoding |
+| Crate | Technique |
+| --------------------------- | ---------------------------------------------------- |
+| `vortex-alp` | Adaptive Lossless floating-Point compression |
+| `vortex-fastlanes` | FastLanes bit-packing, delta, and frame-of-reference |
+| `vortex-fsst` | Fast Static Symbol Table compression for strings |
+| `vortex-runend` | Run-end encoding for repetitive data |
+| `vortex-sparse` | Sparse array encoding |
+| `vortex-zigzag` | ZigZag encoding for signed integers |
+| `vortex-roaring` | Roaring bitmap encoding |
+| `vortex-dict` | Dictionary encoding |
+| `vortex-bytebool` | Byte-per-boolean encoding |
+| `vortex-datetime-parts` | DateTime field decomposition |
+| `vortex-decimal-byte-parts` | Decimal byte decomposition |
+| `vortex-sequence` | Arithmetic sequence encoding |

 ## Language Bindings

 Language bindings expose Vortex to non-Rust environments.

-| Directory | Role |
-|--------------------|----------------------------------------|
-| `vortex-python/` | Python bindings via PyO3 and Maturin |
-| `java/vortex-jni/` | Java JNI bindings |
-| `vortex-ffi/` | C FFI bindings (generates `vortex.h`) |
-| `vortex-cxx/` | C++ wrapper around the C FFI |
+| Directory | Role |
+| ------------------ | ------------------------------------- |
+| `vortex-python/` | Python bindings via PyO3 and Maturin |
+| `java/vortex-jni/` | Java JNI bindings |
+| `vortex-ffi/` | C FFI bindings (generates `vortex.h`) |
+| `vortex-cxx/` | C++ wrapper around the C FFI |

 ## Integrations

 Query engine integrations allow Vortex files to be queried through existing analytics engines.

 | Crate / Directory | Engine | Notes |
-|----------------------|------------|----------------------------------------------|
+| -------------------- | ---------- | -------------------------------------------- |
 | `vortex-datafusion/` | DataFusion | `TableProvider` and `FileFormat` integration |
 | `vortex-duckdb/` | DuckDB | Table function integration |
 | `java/vortex-spark/` | Spark | DataSource V2 connector via JNI |
@@ -86,7 +86,7 @@ Query engine integrations allow Vortex files to be queried through existing anal
 ## Other Crates

 | Crate | Role |
-|----------------|--------------------------------------------------------|
+| -------------- | ------------------------------------------------------ |
 | `vortex-cuda` | GPU-accelerated decompression and compute (Linux only) |
 | `vortex-tui` | Terminal UI for inspecting Vortex files |
 | `vortex-bench` | Benchmark harness and data generators |

docs/developer-guide/internals/session.md

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ Each Vortex crate defines a session variable that holds a registry for its exten

 | Session Variable | Crate | Registry Contents |
 |-------------------|------------------|----------------------------------------------|
-| `DTypeSession` | `vortex-dtype` | Extension dtype vtables (Date, Time, ...) |
+| `DTypeSession` | `vortex-array` | Extension dtype vtables (Date, Time, ...) |
 | `ArraySession` | `vortex-array` | Array encoding vtables (ALP, FSST, ...) |
 | `ExprSession` | `vortex-array` | Scalar expression vtables |
 | `LayoutSession` | `vortex-layout` | Layout encoding vtables (Flat, Chunked, ...) |
