Rb pre parse schema #3

Draft
robertbuessow wants to merge 2 commits into main from rb-pre-parse-schema

Conversation

@robertbuessow
Collaborator

No description provided.

robertbuessow and others added 2 commits May 27, 2026 13:23
…hema, CBuffer, COffsets, lazy columns, @generated dispatch

Parse-once schema handling (no per-batch string work):
- Add `SchemaNode` (replaces per-batch unsafe_string + format dispatch) and `TableSchema`
  (pre-built col_names + lookup Dict) so the schema is parsed once via `parse_c_schema`
  and every subsequent batch import skips all string work.
- Add `from_c_data(::TableSchema, array_ptrs)` and `from_c_data(::SchemaNode, ptr)`
  as the fast-path entry points.
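
The parse-once idea above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `MiniNode`, `MiniSchema`, `FORMATS`, and `parse_schema` are hypothetical stand-ins for `SchemaNode`, `TableSchema`, and `parse_c_schema`, and the input is a plain list of name/format pairs rather than a C `ArrowSchema` pointer. The single-letter codes are a small subset of the Arrow C data interface format strings.

```julia
# Hypothetical stand-ins for the PR's SchemaNode / TableSchema.
struct MiniNode
    name::String
    storage_type::DataType   # resolved once, at parse time
end

struct MiniSchema
    nodes::Vector{MiniNode}
    col_names::Vector{Symbol}   # pre-built, reused for every batch
    lookup::Dict{Symbol,Int}    # column name => column index
end

# Subset of Arrow C data interface format strings.
const FORMATS = Dict("i" => Int32, "l" => Int64, "f" => Float32, "g" => Float64)

# All string work (format dispatch, Symbol creation) happens here, once.
function parse_schema(fields::Vector{Pair{String,String}})
    nodes = [MiniNode(name, FORMATS[fmt]) for (name, fmt) in fields]
    col_names = [Symbol(n.name) for n in nodes]
    lookup = Dict(name => i for (i, name) in enumerate(col_names))
    return MiniSchema(nodes, col_names, lookup)
end

schema = parse_schema(["id" => "l", "score" => "g"])
col = schema.lookup[:score]   # later batches only do this cheap lookup
```

Under these assumptions, each later batch import would take the cached `schema` plus the raw array pointers and never touch the format strings again.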

Zero-boxing C buffer wrappers:
- `CBuffer{T} <: AbstractVector{T}`: isbits wrapper around a C pointer that avoids
  the `Vector` header allocation produced by `unsafe_wrap`. Stored inline in
  `Primitive.data`, `List.data`, etc.
- `COffsets{T} <: AbstractOffsets{T}`: isbits wrapper for list/map offset arrays.
  Parameterize `List{T,O,A,OF}` and `Map{T,O,A,OF}` on the offsets type so
  `COffsets` is embedded inline instead of heap-allocated.
- `AbstractOffsets{T}` abstract type + `_raw_offsets` helper keep the IPC write
  path working unchanged.
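
A pointer-backed `AbstractVector` of the kind described above might look like the sketch below. The field names `ptr` and `len` are assumptions, and the PR's actual `CBuffer` may differ in layout and in how it ties the pointer to its owner. A Julia `Vector` stands in for C-owned memory here.

```julia
# Hypothetical sketch; not the PR's definition of CBuffer.
struct CBuffer{T} <: AbstractVector{T}
    ptr::Ptr{T}   # borrowed C pointer; the caller must keep the memory alive
    len::Int      # number of elements
end

Base.size(b::CBuffer) = (b.len,)
Base.IndexStyle(::Type{<:CBuffer}) = IndexLinear()

Base.@propagate_inbounds function Base.getindex(b::CBuffer{T}, i::Int) where {T}
    @boundscheck checkbounds(b, i)
    return unsafe_load(b.ptr, i)   # unsafe_load takes a 1-based element index
end

# Demo: a Julia Vector stands in for memory owned by C.
backing = Int32[10, 20, 30, 40]
buf = GC.@preserve backing begin
    b = CBuffer{Int32}(pointer(backing), length(backing))
    (isbits(b), collect(b), sum(b))
end
```

Because the struct holds only a pointer and an integer, `isbits` reports `true`, which is what lets such a wrapper be stored inline in a parent struct instead of behind a heap-allocated `Vector` header.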

Lazy CImportedTable (no pre-built column vector):
- `CImportedTable` stores `arr_ptrs::Vector{Ptr{ArrowArray}}` (isbits) instead of
  `columns::Vector{CImportedArray}` (abstract, boxed). Column ArrowVectors are
  constructed on demand in `Tables.getcolumn`.
- `shared_handle` field distinguishes stream-owned arrays (release root once) from
  individually-owned flat arrays.

Remove per-column overhead in `_import_arrowvec_fast`:
- Drop `handle` parameter entirely — it was only threaded through for the now-removed
  `CImportedArray` wrapping.
- `ArrowArray` and `ArrowSchema` changed from `mutable struct` to `struct`; `unsafe_load`
  now returns a stack value with zero allocation.
- `_ALL_VALID` singleton: null-free columns reuse one pre-allocated `ValidityBitmap`
  instead of allocating a new one per column.

Type-stability fixes:
- `@generated _import_prim_fast(... ::Val{S})` makes `S = node.storage_type` concrete
  at compile time so `CBuffer{S}` is provably isbits and stack-allocated.
- `@generated _make_dict_indices(... ::Val{S})` applies the same treatment narrowly to
  dict index construction (specialises only on S, avoiding combinatorial blowup over
  dict value types that caused OOM under JULIA_NUM_THREADS=2).
- Split `CKIND_STR32/BIN32` and `CKIND_STR64/BIN64` branches to eliminate the
  `OT::Union{Type{Int32},Type{Int64}}` phi-node union.
- Rename `data_bytes` in the bool branch to prevent it merging with the string branch
  into a type-unstable union.
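
The effect of lifting a runtime storage type into a `Val{S}` parameter can be illustrated with a plain function barrier. Note that this sketch uses ordinary `Val` dispatch rather than the `@generated` functions named above, and that `Node` and `first_value` are hypothetical names, not part of the PR.

```julia
# Hypothetical node type: the storage type is only known at run time.
struct Node
    storage_type::DataType
end

# Inside this method S is a compile-time constant, so Ptr{S} and the
# loaded value have concrete types.
function first_value(ptr::Ptr{Cvoid}, ::Val{S}) where {S}
    return unsafe_load(Ptr{S}(ptr))
end

# The single dynamic dispatch happens here, at the barrier.
first_value(ptr::Ptr{Cvoid}, node::Node) = first_value(ptr, Val(node.storage_type))

backing = Float64[1.5, 2.5]
value = GC.@preserve backing first_value(Ptr{Cvoid}(pointer(backing)), Node(Float64))
```

Specialising only on `S` keeps the number of compiled method instances proportional to the number of storage types, which is the same trade-off the commit describes for the dict index path.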

Benchmark (bench/schema_cache.jl): 10-column mixed table (int+float+string+nullable),
cached path vs. parse-every-call baseline:
  Before: ~16.6 μs baseline, no column access measured
  After:   ~7.7 μs baseline / ~1.6 μs cached (**4.7× speedup**, cached vs. baseline), 34 allocs

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…w pairs

Revert the ValidityBitmap struct from Ptr{UInt8}+ref back to the original
Vector{UInt8}+pos layout, and update all call sites accordingly:

- Restore ValidityBitmap struct fields: bytes::Vector{UInt8}, pos::Int
- Revert _build_validity() back to ValidityBitmap() constructor
- Revert getindex/setindex! to array-indexed access
- Revert writebitmap to view-based approach
- Remove _ALL_VALID singleton (was only valid with pointer-based struct)
- Revert _import_validity() to copy C bytes into Vector{UInt8}
- Revert _validity_ptr() export helper to use pointer(bm.bytes, bm.pos)
- Replace push!(roots, v.validity.ref) with push!(roots, v.validity.bytes)
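
For orientation, array-indexed access into a `Vector{UInt8}` + `pos` layout can be sketched as below. `MiniBitmap` and `isvalid_at` are hypothetical, and the meaning given to `pos` (1-based index of the first bitmap byte) and the extra `len` field are assumptions made for illustration; only the least-significant-bit-first packing is taken from the Arrow format.

```julia
# Hypothetical mini bitmap; field meanings are assumptions for illustration.
struct MiniBitmap
    bytes::Vector{UInt8}  # owning byte buffer
    pos::Int              # 1-based index of the first bitmap byte in `bytes`
    len::Int              # number of logical elements
end

# Arrow packs validity bits least-significant-bit first within each byte.
function isvalid_at(bm::MiniBitmap, i::Int)
    1 <= i <= bm.len || throw(BoundsError(bm, i))
    byte = bm.bytes[bm.pos + ((i - 1) >> 3)]
    return ((byte >> ((i - 1) & 7)) & 0x01) == 0x01
end

bm = MiniBitmap(UInt8[0xff, 0b00000101], 2, 3)  # bitmap starts at the second byte
flags = [isvalid_at(bm, i) for i in 1:3]
```

Since the bitmap owns a Julia array in this layout, rooting `bytes` is enough to keep the data alive during export.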

Separately, collapse unsafe_wrap+view offset patterns into direct
pointer-arithmetic wraps throughout _import_arrowvec:

- Fixed-size binary: unsafe_wrap(dptr + off*N, len*N)
- String/binary offsets: unsafe_wrap(optr + off*sizeof(OT), len+1)
- Generic list offsets: same
- Map offsets: same (Int32)
- Dict index array: unsafe_wrap(iptr + off*sizeof(S), len)
- Primitive data: unsafe_wrap(dptr + off*sizeof(S), len)

Each case advances the pointer by off × element_size bytes and wraps
exactly the needed count, removing the intermediate array allocation
and the conditional view.
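
The pointer-arithmetic pattern above can be sketched as a small helper. `wrap_slice` is a hypothetical name, and a Julia `Vector` again stands in for a C buffer; the only library call relied on is `unsafe_wrap(Array, ptr, len)`.

```julia
# Wrap `len` elements of type T starting `off` elements into the buffer at `ptr`.
# Arithmetic on Ptr is in bytes, hence the sizeof(T) factor.
wrap_slice(ptr::Ptr{T}, off::Integer, len::Integer) where {T} =
    unsafe_wrap(Array, ptr + off * sizeof(T), len)

backing = Int64[1, 2, 3, 4, 5, 6]
result = GC.@preserve backing begin
    slice = wrap_slice(pointer(backing), 2, 3)  # skips 2 elements, wraps 3
    copy(slice)                                 # copy so the result outlives the preserve block
end
```

For offset buffers the same helper would be called with `len + 1`, since an offsets array for `len` elements carries one extra trailing entry.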

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>