Merged
4 changes: 3 additions & 1 deletion docs/language/reference/dataset_methods.md
@@ -19,9 +19,10 @@ The Substrait helper surface behind these methods is split by semantic role:
| `with_column` | `def with_column(self, name: str, expr: ColumnExpr) -> Self` | Add or replace one projected column using a scalar expression. |
| `group_by` | `def group_by(self, columns: list[ColumnExpr]) -> Self` | Define grouping keys using scalar expressions. |
| `agg` | `def agg(self, measures: list[AggregateMeasure]) -> Self` | Apply aggregate measures over the current relation or current grouping. |
| `generate` | `def generate(self, generator: GeneratorApplication) -> Self` | Apply a relation-shaping generator such as `explode(...)` with explicit output aliases. |
| `order_by` | `def order_by(self, columns: list[ColumnExpr]) -> Self` | Sort rows by scalar expressions or ordering helpers such as `asc(...)` and `desc(...)`. |
| `limit` | `def limit(self, n: int) -> Self` | Cap row count. |
-| `explode` | `def explode(self) -> Self` | Expand a nested list column into rows. |
+| `explode` | `def explode(self) -> Self` | Emit the lower-level `EXPLODE` extension boundary without expression/schema metadata. |

## `with_column`

@@ -67,6 +68,7 @@ def enrich(orders: LazyFrame[Order]) -> LazyFrame[Order]:

- `join(...)` is constrained to same-carrier inputs and the boolean join predicate surface shown in the signature.
- `select(...)` preserves projection shape; explicit projection lists are represented today through `with_column(...)` and scalar-expression builders.
- `generate(...)` preserves all input columns and appends generated output aliases for `explode`, `explode_outer`, `posexplode`, `posexplode_outer`, `inline`, `inline_outer`, `flatten`, and `stack` generator applications. Alias collisions are rejected during planning/lowering.
- `DataFrame[T]` exposes materialized metadata and preview text; row-level accessors belong to the materialized DataFrame API surface.
- Query-block and scoped DSL surfaces lower into these builder APIs rather than defining separate method semantics.

43 changes: 43 additions & 0 deletions docs/language/reference/functions/generators.md
@@ -0,0 +1,43 @@
# Generator and Table-Valued Functions (Reference)

Generators are relation-shaping operations. They are registry-backed like scalar and aggregate helpers, but they return
`GeneratorApplication` values and must be applied through a relation method such as `generate(...)`.

```incan
from pub::inql import LazyFrame
from pub::inql.functions import array, col, explode, inline, lit, named_struct
from models import Order

def order_lines(orders: LazyFrame[Order]) -> LazyFrame[Order]:
    return orders.generate(explode(col("line_items"), "line_item"))

def fixed_items(orders: LazyFrame[Order]) -> LazyFrame[Order]:
    rows = array([
        named_struct(["sku", "quantity"], [lit("A"), lit(1)]),
        named_struct(["sku", "quantity"], [lit("B"), lit(2)]),
    ])
    return orders.generate(inline(rows, ["sku", "quantity"]))
```

The explicit generator surface currently includes:

| Function | Output aliases | Relation effect |
| --- | --- | --- |
| `explode(expr, as_)` | one value column | Emits one row per array element; null or empty inputs emit zero rows. |
| `explode_outer(expr, as_)` | one value column | Preserves the input row for null or empty inputs and emits a null generated value. |
| `posexplode(expr, position_as, value_as)` | position and value columns | Emits one row per array element with a zero-based position column. |
| `posexplode_outer(expr, position_as, value_as)` | position and value columns | Outer positional explode with the same zero-based position rule. |
| `inline(expr, output_columns)` | one column per struct field | Expands array-of-struct values into generated rows and declared output columns. |
| `inline_outer(expr, output_columns)` | one column per struct field | Outer inline with the same null/empty row preservation rule. |
| `flatten(expr, as_)` | one value column | Portable table-valued flatten for one array expression. |
| `stack(row_count, values, output_columns)` | declared output columns | Emits `row_count` generated rows from row-major scalar values. |
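
The row-count rules in the table can be modeled in plain Python. This is a minimal sketch rather than InQL code: rows are plain dicts, `explode_rows` and `posexplode_rows` are hypothetical helper names, and the `outer` flag stands in for the `*_outer` variants. The null position emitted for outer positional rows is an assumption; the table only states the null generated value for `explode_outer`.

```python
from typing import Any, Iterator

Row = dict[str, Any]


def explode_rows(rows: list[Row], column: str, alias: str, outer: bool = False) -> Iterator[Row]:
    """Model explode / explode_outer over a list of dict rows."""
    for row in rows:
        items = row.get(column)
        if not items:  # null or empty array
            if outer:
                # Outer form keeps the input row with a null generated value.
                yield {**row, alias: None}
            continue  # Inner form emits zero rows for this input row.
        for item in items:
            yield {**row, alias: item}


def posexplode_rows(
    rows: list[Row], column: str, position_as: str, value_as: str, outer: bool = False
) -> Iterator[Row]:
    """Model posexplode / posexplode_outer with zero-based positions."""
    for row in rows:
        items = row.get(column)
        if not items:
            if outer:
                # Assumption: the position is null alongside the null value.
                yield {**row, position_as: None, value_as: None}
            continue
        for position, item in enumerate(items):  # enumerate starts at 0
            yield {**row, position_as: position, value_as: item}


orders = [
    {"id": 1, "line_items": ["a", "b"]},
    {"id": 2, "line_items": []},
    {"id": 3, "line_items": None},
]

inner = list(explode_rows(orders, "line_items", "line_item"))
outer = list(explode_rows(orders, "line_items", "line_item", outer=True))
positions = list(posexplode_rows(orders, "line_items", "pos", "line_item"))
```

In this model the inner form drops orders 2 and 3 entirely, while the outer form keeps one row for each of them with `line_item` set to `None`.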

Generator applications preserve input columns and append generated columns in declaration order. Generated aliases are
required, must be non-empty, and must not collide with existing input columns.
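
The alias rules and column ordering can be illustrated with a small plain-Python model of `stack(...)`. This is a sketch under stated assumptions, not the planner's implementation: `check_aliases` and `stack_rows` are hypothetical names, values are treated as constants rather than per-row scalar expressions, and the requirement that the value list exactly fills `row_count` rows is an assumption that the text above does not confirm.

```python
from typing import Any

Row = dict[str, Any]


def check_aliases(input_columns: list[str], aliases: list[str]) -> None:
    """Reject missing, empty, or colliding generated aliases."""
    if not aliases:
        raise ValueError("generated aliases are required")
    for alias in aliases:
        if not alias:
            raise ValueError("generated aliases must be non-empty")
        if alias in input_columns:
            raise ValueError(f"alias {alias!r} collides with an input column")


def stack_rows(
    rows: list[Row], row_count: int, values: list[Any], output_columns: list[str]
) -> list[Row]:
    """Model stack: row-major values become row_count generated rows."""
    input_columns = list(rows[0]) if rows else []
    check_aliases(input_columns, output_columns)
    width = len(output_columns)
    # Assumption: the value list must fill row_count rows exactly.
    if len(values) != row_count * width:
        raise ValueError("values must fill row_count rows exactly")
    generated = [values[i * width:(i + 1) * width] for i in range(row_count)]
    # Input columns come first; generated columns follow in declaration order.
    return [
        {**row, **dict(zip(output_columns, cells))}
        for row in rows
        for cells in generated
    ]


orders = [{"id": 1}, {"id": 2}]
stacked = stack_rows(orders, 2, ["A", 1, "B", 2], ["sku", "quantity"])
```

Each input row is paired with both generated rows, so two orders produce four output rows whose columns are ordered `id`, `sku`, `quantity`.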

The zero-argument `DataSet.explode()` method is a lower-level extension-boundary operation. It emits the registered
`EXPLODE` relation extension without carrying a source expression or generated output schema. Generator code should use
`generate(explode(...))` so the relation-shaping function identity, input expression, and output schema are explicit.

Nested scalar helpers such as `array_flatten(...)` remain scalar expressions. They do not expand rows and are documented
on the [nested data functions](nested.md) page. The relation-shaping `flatten(...)` helper is intentionally separate.
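
The difference between a scalar nested helper and a row-expanding generator can be seen in a plain-Python model. This is a hedged sketch, not InQL code: `array_flatten` here is an ordinary Python function mirroring the documented scalar behavior, and the generator side uses `explode` semantics because the exact row behavior of the relation-shaping `flatten(...)` is not spelled out above.

```python
from typing import Any

Row = dict[str, Any]


def array_flatten(nested: list[list[Any]]) -> list[Any]:
    """Scalar model: turn an array-of-arrays into one array value."""
    return [item for inner in nested for item in inner]


rows: list[Row] = [
    {"id": 1, "batches": [["a", "b"], ["c"]]},
    {"id": 2, "batches": [["d"]]},
]

# Scalar helper: one output row per input row, with a new array-valued column.
scalar_result = [{**row, "items": array_flatten(row["batches"])} for row in rows]

# Generator (explode semantics): one output row per array element.
exploded = [{**row, "batch": batch} for row in rows for batch in row["batches"]]
```

The scalar path keeps two rows, while the generator path produces three, one per inner batch.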
6 changes: 4 additions & 2 deletions docs/language/reference/functions/index.md
@@ -7,11 +7,12 @@ Today the concrete shipped surfaces are documented here:
- [Filter builders](../builders/filters.md)
- [Aggregate builders](../builders/aggregates.md)
- [Projection builders](../builders/projections.md)
- [Generator and table-valued functions](generators.md)
- [Nested data functions](nested.md)

The canonical scalar literal helper is `lit(...)`. Typed literal helpers construct the same scalar-expression representation.

-The current registry-backed helper surface covers references, literals, casts, operators, predicates, conditionals, math, ordering, aggregates, and nested data. Each runtime entry exposes a stable function reference such as `inql.functions.col`, namespace, canonical name, typed lifecycle metadata (`since`, versioned changes, and optional deprecation), function policy category, function class, null behavior, alias policy, aggregate modifier policy, and Substrait mapping metadata. Checked public helpers provide the signature and, by default, the canonical name; metadata may override the canonical name only for source spelling constraints such as the reserved-word `mod` case.
+The current registry-backed helper surface covers references, literals, casts, operators, predicates, conditionals, math, ordering, aggregates, generators, and nested data. Each runtime entry exposes a stable function reference such as `inql.functions.col`, namespace, canonical name, typed lifecycle metadata (`since`, versioned changes, and optional deprecation), function policy category, function class, null behavior, alias policy, aggregate modifier policy, and Substrait mapping metadata. Checked public helpers provide the signature and, by default, the canonical name; metadata may override the canonical name only for source spelling constraints such as the reserved-word `mod` case.

The registry is the source for non-derivable machine facts. Public helper declarations are the source for argument names, argument types, and return types. Docstrings remain human-facing explanation, examples, and parameter intent. The `registry-metadata` check validates the checked API metadata projections produced from public facade aliases, registry decorators, and decorated callable signatures. Runtime registry entries are lazy and process-local: they support helper execution and lowering for loaded helpers, while the complete public catalog comes from checked metadata. This matters for generated docs, diagnostics, Prism lowering, and backend capability checks as the catalog grows.

@@ -32,7 +33,8 @@ The registered helper surface currently includes:
| `coalesce(...)`, `nullif(...)`, `case_when(...)` | scalar | registered Substrait mappings; `case_when(...)` lowers as built-in `IfThen` |
| `in_(...)`, `between(...)` | scalar | built-in membership/range lowering (`SingularOrList` and `between`) |
| `abs(...)`, `ceil(...)`, `floor(...)`, `round(...)` | scalar | registered Substrait math scalar mappings; `round(...)` is currently the single-argument form |
-| `array(...)`, `cardinality(...)`, `array_contains(...)`, `arrays_overlap(...)`, `array_position(...)`, `element_at(...)`, `array_sort(...)`, `array_distinct(...)`, `array_except(...)`, `array_intersect(...)`, `array_union(...)`, `array_join(...)`, `array_slice(...)`, `array_reverse(...)`, `array_flatten(...)`, `map_from_arrays(...)`, `map_extract(...)`, `map_contains_key(...)`, `map_keys(...)`, `map_values(...)`, `map_entries(...)`, `named_struct(...)` | scalar | registered nested scalar helpers backed by Substrait extension mappings; `map_contains_key(...)` lowers as a documented predicate rewrite |
+| `array(...)`, `cardinality(...)`, `array_contains(...)`, `arrays_overlap(...)`, `array_position(...)`, `array_range(...)`, `element_at(...)`, `array_sort(...)`, `array_distinct(...)`, `array_except(...)`, `array_intersect(...)`, `array_union(...)`, `array_join(...)`, `array_slice(...)`, `array_reverse(...)`, `array_flatten(...)`, `map_from_arrays(...)`, `map_extract(...)`, `map_contains_key(...)`, `map_keys(...)`, `map_values(...)`, `map_entries(...)`, `named_struct(...)` | scalar | registered nested scalar helpers backed by Substrait extension mappings; `array_range(...)` registers canonical `range` for positional generator lowering and `map_contains_key(...)` lowers as a documented predicate rewrite |
+| `explode(...)`, `explode_outer(...)`, `posexplode(...)`, `posexplode_outer(...)`, `inline(...)`, `inline_outer(...)`, `flatten(...)`, `stack(...)` | generator | relation-extension mappings consumed by `generate(...)`; positional forms use zero-based positions |
| `asc(...)`, `desc(...)`, `asc_nulls_first(...)`, `asc_nulls_last(...)`, `desc_nulls_first(...)`, `desc_nulls_last(...)` | ordering | structural sort-field helpers consumed by `order_by(...)` and lowered to Substrait `SortRel.sorts` |
| `sum(...)`, `count(...)`, `count_expr(...)`, `count_distinct(...)`, `count_if(...)`, `avg(...)`, `min(...)`, `max(...)` | aggregate | registered Substrait extension functions for core aggregates plus compatibility rewrites for `count_expr(...)`, `count_distinct(...)`, and `count_if(...)`; core aggregates allow `DISTINCT` and aggregate-local `FILTER` where the aggregate shape is valid |

3 changes: 2 additions & 1 deletion docs/language/reference/functions/nested.md
@@ -20,6 +20,7 @@ Generator or table-valued operations such as row-expanding `explode(...)` are se
| `array_intersect(left, right)` | Return elements shared by both arrays. |
| `array_union(left, right)` | Return the union of both arrays. |
| `array_join(array_expr, delimiter)` | Join a string array into one string. |
| `array_range(start, stop)` | Build a row-level integer array from `start` inclusive to `stop` exclusive. |
| `array_slice(array_expr, start, stop)` | Return a one-based array slice using the backend adapter's slice contract. |
| `array_reverse(array_expr)` | Reverse one array value. |
| `array_flatten(array_expr)` | Flatten an array-of-arrays into one row-level array value. |
@@ -54,5 +55,5 @@ projected = (

- Array indexing is one-based for `element_at(...)`, `array_position(...)`, and `array_slice(...)`.
- `element_at(...)` currently maps to the portable array-element adapter path. Out-of-range behavior follows the current backend adapter's recoverable result until InQL has a richer static/runtime error-policy split for strict versus try-style element access.
-- `array_flatten(...)` is intentionally named to avoid colliding with future table-valued or generator `flatten(...)` forms.
+- `array_flatten(...)` is intentionally named to stay distinct from the relation-shaping generator `flatten(...)`.
- Grouping or ordering by nested values is not documented as portable until equality and ordering semantics for arrays, maps, and structs are specified.
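
The index conventions above can be checked against a plain-Python model. This is a minimal sketch, not the InQL helpers themselves: the functions below only mirror the documented half-open range of `array_range(...)` and the one-based indexing of `element_at(...)` and `array_position(...)`. Raising on out-of-range or missing values is an assumption made for the sketch, since the notes leave that behavior to the backend adapter.

```python
from typing import Any


def array_range(start: int, stop: int) -> list[int]:
    """Model array_range: start inclusive, stop exclusive."""
    return list(range(start, stop))


def element_at(values: list[Any], index: int) -> Any:
    """Model one-based element access for in-range indexes only."""
    if not 1 <= index <= len(values):
        # Assumption: the real out-of-range result depends on the backend adapter.
        raise IndexError("index outside the modeled one-based range")
    return values[index - 1]


def array_position(values: list[Any], target: Any) -> int:
    """Model one-based position lookup for values that are present."""
    # Assumption: a missing target raises ValueError in this sketch.
    return values.index(target) + 1


ids = array_range(1, 4)
first = element_at([10, 20, 30], 1)
where = array_position([10, 20, 30], 30)
```

Note the contrast with the generator surface: `posexplode(...)` positions are zero-based, while these scalar helpers index from one.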
3 changes: 3 additions & 0 deletions docs/language/reference/substrait/operator_catalog.md
@@ -81,6 +81,9 @@ Core Substrait does not define a portable unnest or explode `Rel` at the logical
Current package-level RFC 002 boundary registration:

- `https://inql.io/extensions/v0.1/unnest.yaml#explode`
- `https://inql.io/extensions/v0.1/unnest.yaml#explode_outer`
- `https://inql.io/extensions/v0.1/unnest.yaml#posexplode`
- `https://inql.io/extensions/v0.1/unnest.yaml#posexplode_outer`

### Pivot / unpivot

1 change: 1 addition & 0 deletions docs/release_notes/v0_1.md
@@ -15,6 +15,7 @@ Entries will be filled in as work lands (link RFCs and PRs when applicable).
- **Core scalar functions:** RFC 015 adds registry-backed scalar function applications and the first core helper slice for casts, comparisons, boolean logic, null/NaN predicates, arithmetic, conditionals, membership/range predicates, and ordering expressions. Implemented helpers lower to Substrait IR through registry metadata, built-in Rex shapes, or structural sort-field lowering; DataFusion remains the first execution adapter rather than the semantic boundary.
- **Common scalar functions:** The first RFC 018 slice adds registry-backed math helpers for `abs(...)`, `ceil(...)`, `floor(...)`, and single-argument `round(...)`, with Substrait mappings and DataFusion-backed execution coverage.
- **Nested data functions:** RFC 020 adds registry-backed scalar helpers for array construction/access, cardinality, containment, overlap, sorting, set-like operations, joining, slicing, reversing, scalar array flattening, map construction/access, map key/value/entry extraction, map key containment, and named struct construction. These helpers lower through Substrait extension metadata without introducing generator semantics, with representative DataFusion-backed Session coverage for composable array projection paths.
- **Generator functions:** RFC 021 adds registry-backed generator applications for `explode(...)`, `explode_outer(...)`, `posexplode(...)`, `posexplode_outer(...)`, `inline(...)`, `inline_outer(...)`, portable `flatten(...)`, and `stack(...)`. Generators remain relation-shaping operations applied with `generate(...)`; they preserve input columns, require explicit output aliases, lower through the current Substrait extension-relation gap encoding, and execute through the DataFusion Session adapter with concrete output-column materialization.
- **Function registry:** RFC 014 adds declaration-site registry decorators for the current public helper surface, including stable function references, checked signature projection, lifecycle metadata, behavior categories, alias policy, Substrait mapping categories, and checked API metadata drift validation.
- **Function extension policy:** InQL RFC 024 policy metadata now distinguishes portable core functions, namespaced extension-only functions, opt-in compatibility aliases, engine-specific functions, and rejected compatibility requests without adding an extension plugin system or backend-owned semantics.
- **Projection:** builder-based `with_column`, `add`, `mul`, and literal expression helpers now lower derived columns through Prism, Substrait, and Session execution.