This page documents the current DataSet[T] method surface. Builder-function details live under reference/builders/.
The Substrait helper surface behind these methods is split by semantic role:
- `src/substrait/relations.incn` builds concrete `Rel` nodes
- `src/substrait/plans.incn` assembles `Plan` envelopes
- `src/substrait/inspect.incn` owns relation/plan inspection and output-column inference
- `src/schema_registry.incn` owns logical named-table schema binding

| Method | Signature | Meaning |
|---|---|---|
| `filter` | `def filter(self, predicate: ColumnExpr) -> Self` | Restrict rows by a boolean scalar expression. |
| `join` | `def join(self, other: Self, on: bool) -> Self` | Combine with another same-carrier relation using the package's boolean join predicate surface. |
| `select` | `def select(self) -> Self` | Preserve the current projection shape as an identity projection. |
| `with_column` | `def with_column(self, name: str, expr: ColumnExpr) -> Self` | Add or replace one projected column using a scalar expression. |
| `group_by` | `def group_by(self, columns: list[ColumnExpr]) -> Self` | Define grouping keys using scalar expressions. |
| `agg` | `def agg(self, measures: list[AggregateMeasure]) -> Self` | Apply aggregate measures over the current relation or current grouping. |
| `generate` | `def generate(self, generator: GeneratorApplication) -> Self` | Apply a relation-shaping generator such as `explode(...)` with explicit output aliases. |
| `with_window_column` | `def with_window_column(self, name: str, application: WindowFunctionApplication) -> Self` | Add or replace one projected column using a placed window function. |
| `order_by` | `def order_by(self, columns: list[ColumnExpr]) -> Self` | Sort rows by scalar expressions or ordering helpers such as `asc(...)` and `desc(...)`. |
| `limit` | `def limit(self, n: int) -> Self` | Cap row count. |

`def with_column(self, name: str, expr: ColumnExpr) -> Self`

- If `name` does not already exist, the new projected column is appended at the end.
- If `name` already exists, that slot is replaced in place.
- Replacement preserves ordinal position.
- The scalar-expression surface is:
  - `col(name)`
  - `lit(value)`
  - `int_expr(...)`
  - `float_expr(...)`
  - `str_expr(...)`
  - `bool_expr(...)`
  - `add(left, right)`
  - `mul(left, right)`
  - `eq(left, right)`
  - `gt(left, right)`

```
from pub::inql import LazyFrame
from pub::inql.functions import add, col, lit, mul
from models import Order

def enrich(orders: LazyFrame[Order]) -> LazyFrame[Order]:
    return (
        orders
            .with_column("amount_x2", mul(col("amount"), lit(2)))
            .with_column("amount_plus_one", add(col("amount"), lit(1)))
    )
```
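
The append-or-replace rule for `with_column(...)` can be illustrated with a small plain-Python model. This is only a sketch of the documented ordering behavior, not the package's implementation: the `with_column_model` function, the `Projection` alias, and the string stand-ins for expressions are all hypothetical names introduced here.

```python
# Hypothetical model: a projection is an ordered list of (name, expression) pairs.
# Expressions are plain strings here; the real API uses ColumnExpr values.
Projection = list[tuple[str, str]]


def with_column_model(projection: Projection, name: str, expr: str) -> Projection:
    """Return a new projection with `name` added or replaced.

    Mirrors the documented rule: an existing name is replaced in its
    current slot; a new name is appended at the end.
    """
    updated = list(projection)  # copy so the input projection is not mutated
    for index, (existing_name, _) in enumerate(updated):
        if existing_name == name:
            updated[index] = (name, expr)  # replace in place, same ordinal
            return updated
    updated.append((name, expr))  # new column goes last
    return updated


base: Projection = [("id", "col(id)"), ("amount", "col(amount)"), ("region", "col(region)")]

appended = with_column_model(base, "amount_x2", "mul(col(amount), lit(2))")
replaced = with_column_model(base, "amount", "add(col(amount), lit(1))")

appended_names = [column_name for column_name, _ in appended]
replaced_names = [column_name for column_name, _ in replaced]
print(appended_names)
print(replaced_names)
```

In this model, adding `amount_x2` yields the order `id, amount, region, amount_x2`, while replacing `amount` leaves the order `id, amount, region` unchanged and only swaps the expression in the second slot.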
- `join(...)` is constrained to same-carrier inputs and the boolean join predicate surface shown in the signature.
- `select(...)` preserves projection shape; explicit projection lists are represented today through `with_column(...)` and scalar-expression builders.
- `generate(...)` preserves all input columns and appends generated output aliases for `explode`, `explode_outer`, `posexplode`, `posexplode_outer`, `inline`, `inline_outer`, `flatten`, and `stack` generator applications. Alias collisions are rejected during planning/lowering.
- `with_window_column(...)` supports placed ranking, distribution, offset, value, and aggregate-over-window helpers over explicit window specs. Portable helpers lower through Substrait window relations and execute through the DataFusion session adapter.
- `DataFrame[T]` exposes materialized metadata and preview text; row-level accessors belong to the materialized DataFrame API surface.
- Query-block and scoped DSL surfaces lower into these builder APIs rather than defining separate method semantics.
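
The output-column rule for `generate(...)` (keep every input column, append the generated aliases, reject collisions) can be sketched in plain Python. This is a model of the documented behavior only, not the planner's code: `generated_output_columns` is a hypothetical name, and treating a repeated alias within the same generator call as a collision is an assumption the source does not spell out.

```python
def generated_output_columns(input_columns: list[str], aliases: list[str]) -> list[str]:
    """Model the column list produced by a generator application.

    All input columns are preserved in order and the generator's output
    aliases are appended. An alias that matches an existing column is
    rejected. Assumption: duplicate aliases within one call are rejected too.
    """
    seen = set(input_columns)
    for alias in aliases:
        if alias in seen:
            raise ValueError(f"generator output alias collides with an existing column: {alias!r}")
        seen.add(alias)
    return [*input_columns, *aliases]


# A posexplode-style generator with two explicit output aliases.
columns = generated_output_columns(["id", "tags"], ["pos", "tag"])
print(columns)

# Reusing an input column name as an alias is rejected.
try:
    generated_output_columns(["id", "tags"], ["id"])
    collision_rejected = False
except ValueError:
    collision_rejected = True
print(collision_rejected)
```

Checking for collisions before building the output list matches the statement that they are rejected during planning/lowering rather than at execution time.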