2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -20,7 +20,7 @@ concurrency:
 env:
   CARGO_TERM_COLOR: always
   INCAN_REF: release/v0.3
-  EXPECTED_INCAN_VERSION: 0.3.0-rc19
+  EXPECTED_INCAN_VERSION: 0.3.0-rc20
   RUST_BACKTRACE: 1
   INCAN_NO_BANNER: 1
 
32 changes: 26 additions & 6 deletions docs/language/reference/builders/aggregates.md
@@ -9,21 +9,36 @@ Current aggregate authoring is explicit and scalar-expression-based.
 | `col` | `def col(name: str) -> ColumnExpr` | Column reference builder used by aggregates, filters, and projections. |
 | `lit` | `def lit(value: int \| float \| str \| bool) -> ColumnExpr` | Canonical scalar literal helper. |
 | `sum` | `def sum(expr: ColumnExpr) -> AggregateMeasure` | Sum one scalar expression. |
-| `count` | `def count() -> AggregateMeasure` | Count rows. |
-| `count_expr` | `def count_expr(expr: ColumnExpr) -> AggregateMeasure` | Count non-null expression values; compatibility spelling for the future `count(expr)` form. |
+| `count` | `def count(*exprs: ColumnExpr) -> AggregateMeasure` | Count rows with no argument, or count non-null expression values with one argument. |
+| `count_expr` | `def count_expr(expr: ColumnExpr) -> AggregateMeasure` | Compatibility spelling for `count(expr)`. |
+| `count_distinct` | `def count_distinct(expr: ColumnExpr) -> AggregateMeasure` | Count distinct non-null expression values. |
+| `count_if` | `def count_if(predicate: ColumnExpr) -> AggregateMeasure` | Count rows where the predicate is true. |
 | `avg` | `def avg(expr: ColumnExpr) -> AggregateMeasure` | Average one numeric scalar expression. |
 | `min` | `def min(expr: ColumnExpr) -> AggregateMeasure` | Return the minimum non-null value for one orderable scalar expression. |
 | `max` | `def max(expr: ColumnExpr) -> AggregateMeasure` | Return the maximum non-null value for one orderable scalar expression. |
 
+## Modifiers
+
+Aggregate measures support method-style modifiers:
+
+| Modifier | Signature | Meaning |
+| --- | --- | --- |
+| `distinct` | `measure.distinct() -> AggregateMeasure` | Apply SQL-style `DISTINCT` to aggregate input values. |
+| `filter` | `measure.filter(predicate: ColumnExpr) -> AggregateMeasure` | Apply an aggregate-local boolean predicate before aggregation. |
+| `order_by` | `measure.order_by(ordering: list[ColumnExpr]) -> AggregateMeasure` | Record ordered aggregate input. Core aggregates reject ordered input until an order-sensitive aggregate lands. |
+
 ## Example
 
 ```incan
-from pub::inql.functions import add, avg, col, count, count_expr, lit, max, min, sum
+from pub::inql.functions import add, avg, col, count, count_distinct, count_if, eq, lit, max, min, str_lit, sum
 
 grouped = orders.group_by([col("customer_id")]).agg([
     sum(add(col("amount"), lit(5))),
     count(),
-    count_expr(col("discount_code")),
+    count(col("discount_code")),
+    count_distinct(col("product_id")),
+    count_if(eq(col("status"), str_lit("paid"))),
+    sum(col("amount")).filter(eq(col("status"), str_lit("paid"))),
     avg(col("amount")),
     min(col("created_at")),
     max(col("created_at")),
@@ -33,7 +48,12 @@ grouped = orders.group_by([col("customer_id")]).agg([
 ## Notes
 
 - Aggregate inputs use the same scalar-expression model as filters, projections, and grouping keys.
-- `count()` counts rows. `count_expr(expr)` counts non-null values produced by the expression and lowers to the same
-canonical `count` Substrait extension function.
+- `count()` counts rows. `count(expr)` counts non-null values produced by the expression.
+- `count(...)` accepts zero or one expression; passing multiple expressions is an error.
+- `count_expr(expr)` is a compatibility spelling for `count(expr)`.
+- `count_distinct(expr)` is compatibility sugar for `count(expr).distinct()`.
+- `count_if(predicate)` is compatibility sugar for `count().filter(predicate)`. Rows where the predicate is false or
+null do not contribute to the aggregate.
 - `sum`, `avg`, `min`, and `max` skip null values. They return backend-null results when no non-null input value exists.
+- Unsupported aggregate modifiers fail at lowering or backend planning; they are not ignored.
 - Future `.column` sugar and scoped aggregate symbols should lower to this same surface rather than replacing its semantics.
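The count-family rules in these notes can be checked against a small plain-Python model. This is a sketch and not InQL's implementation or API. Rows are dicts and `None` stands in for SQL null. Every function name here (`sql_eq`, `count_rows`, `count_non_null`, `count_distinct`, `count_if`) is hypothetical.

```python
# Hypothetical plain-Python model of the count-family semantics described above.
# Rows are dicts; None stands in for SQL null. Not InQL's API or implementation.
from typing import Any, Callable, Dict, Iterable, Optional

Row = Dict[str, Any]
Expr = Callable[[Row], Any]


def sql_eq(left: Any, right: Any) -> Optional[bool]:
    """Three-valued equality: a null on either side yields null (None)."""
    if left is None or right is None:
        return None
    return left == right


def count_rows(rows: Iterable[Row]) -> int:
    """count(): every row counts, whatever its values."""
    return sum(1 for _ in rows)


def count_non_null(rows: Iterable[Row], expr: Expr) -> int:
    """count(expr): only rows where expr evaluates to a non-null value."""
    return sum(1 for row in rows if expr(row) is not None)


def count_distinct(rows: Iterable[Row], expr: Expr) -> int:
    """count(expr).distinct(): number of distinct non-null values."""
    return len({v for v in (expr(row) for row in rows) if v is not None})


def count_if(rows: Iterable[Row], predicate: Expr) -> int:
    """count().filter(predicate): false and null predicate results do not count."""
    return sum(1 for row in rows if predicate(row) is True)


orders = [
    {"discount_code": "SPRING", "product_id": 10, "status": "paid"},
    {"discount_code": None, "product_id": 10, "status": None},
    {"discount_code": "SPRING", "product_id": 11, "status": "open"},
]

print(count_rows(orders))                                       # 3
print(count_non_null(orders, lambda r: r["discount_code"]))     # 2
print(count_distinct(orders, lambda r: r["product_id"]))        # 2
print(count_if(orders, lambda r: sql_eq(r["status"], "paid")))  # 1
```

The second row shows the difference between the spellings. It counts toward `count()` but not toward `count(expr)`. Its null `status` also keeps it out of `count_if`.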
4 changes: 2 additions & 2 deletions docs/language/reference/functions/index.md
@@ -10,7 +10,7 @@ Today the concrete shipped surfaces are documented here:
 
 The canonical scalar literal helper is `lit(...)`. Typed literal helpers construct the same scalar-expression representation.
 
-The current registry-backed helper surface is registered in the package-owned function registry. Registry types live in `src/function_registry.incn`, the shared package registry lives in `src/functions/registry.incn`, and concrete public helper entries are produced by `register_function(...)` decorators in individual `src/functions/<family>/<name>.incn` modules. The registry-backed families are references, literals, casts, operators, predicates, conditionals, math, ordering, and aggregates. Each runtime entry exposes a stable function reference such as `inql.functions.col`, namespace, canonical name, typed lifecycle metadata (`since`, versioned changes, and optional deprecation), InQL RFC 024 policy category, function class, null behavior, alias policy, and Substrait mapping metadata. Checked public helpers provide the signature and, by default, the canonical name; decorator metadata may override the canonical name only for source spelling constraints such as the reserved-word `mod` case.
+The current registry-backed helper surface is registered in the package-owned function registry. Registry types live in `src/function_registry.incn`, the shared package registry lives in `src/functions/registry.incn`, and concrete public helper entries are produced by `register_function(...)` decorators in individual `src/functions/<family>/<name>.incn` modules. The registry-backed families are references, literals, casts, operators, predicates, conditionals, math, ordering, and aggregates. Each runtime entry exposes a stable function reference such as `inql.functions.col`, namespace, canonical name, typed lifecycle metadata (`since`, versioned changes, and optional deprecation), InQL RFC 024 policy category, function class, null behavior, alias policy, aggregate modifier policy, and Substrait mapping metadata. Checked public helpers provide the signature and, by default, the canonical name; decorator metadata may override the canonical name only for source spelling constraints such as the reserved-word `mod` case.
 
 The registry is the source for non-derivable machine facts. Public helper declarations are the source for argument names, argument types, and return types. Docstrings remain human-facing explanation, examples, and parameter intent. The `registry-metadata` check validates the checked API metadata projections produced from public facade aliases, registry decorators, and decorated callable signatures. Runtime registry entries are lazy and process-local: they support helper execution and lowering for loaded helpers, while the complete public catalog comes from checked metadata. This matters for generated docs, diagnostics, Prism lowering, and backend capability checks as the catalog grows.
 
@@ -32,6 +32,6 @@ The registered helper surface currently includes:
 | `in_(...)`, `between(...)` | scalar | built-in membership/range lowering (`SingularOrList` and `between`) |
 | `abs(...)`, `ceil(...)`, `floor(...)`, `round(...)` | scalar | registered Substrait math scalar mappings; `round(...)` is currently the single-argument form |
 | `asc(...)`, `desc(...)`, `asc_nulls_first(...)`, `asc_nulls_last(...)`, `desc_nulls_first(...)`, `desc_nulls_last(...)` | ordering | structural sort-field helpers consumed by `order_by(...)` and lowered to Substrait `SortRel.sorts` |
-| `sum(...)`, `count()`, `count_expr(...)`, `avg(...)`, `min(...)`, `max(...)` | aggregate | registered Substrait extension functions; `count_expr(...)` is a compatibility spelling for future `count(expr)` helper overloading |
+| `sum(...)`, `count(...)`, `count_expr(...)`, `count_distinct(...)`, `count_if(...)`, `avg(...)`, `min(...)`, `max(...)` | aggregate | registered Substrait extension functions for core aggregates plus compatibility rewrites for `count_expr(...)`, `count_distinct(...)`, and `count_if(...)`; core aggregates allow `DISTINCT` and aggregate-local `FILTER` where the aggregate shape is valid |
 
 Future ANSI-style families should grow under this section instead of bloating `dataset_types` or `dataset_methods`.
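The compatibility rewrites in the aggregate row above produce the same canonical measure, and they can be sketched in Python. The `Measure` dataclass, its fields, and the helper bodies are assumptions for illustration. They are not InQL's `AggregateMeasure` or registry types, and expressions are modeled as plain strings.

```python
# Hypothetical sketch: compatibility helpers rewrite to one canonical measure.
# Measure and its fields are illustrative assumptions, not InQL's AggregateMeasure.
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Measure:
    name: str                         # canonical aggregate name, e.g. "count"
    args: Tuple[str, ...] = ()        # input expressions, modeled as strings
    is_distinct: bool = False         # set by .distinct()
    predicate: Optional[str] = None   # set by .filter(predicate)

    def distinct(self) -> "Measure":
        return replace(self, is_distinct=True)

    def filter(self, predicate: str) -> "Measure":
        return replace(self, predicate=predicate)


def count(*exprs: str) -> Measure:
    """count() counts rows; count(expr) counts non-null values of expr."""
    if len(exprs) > 1:
        raise TypeError(f"count accepts zero or one expression, got {len(exprs)}")
    return Measure("count", exprs)


def count_expr(expr: str) -> Measure:
    return count(expr)                    # compatibility spelling


def count_distinct(expr: str) -> Measure:
    return count(expr).distinct()         # sugar for count(expr).distinct()


def count_if(predicate: str) -> Measure:
    return count().filter(predicate)      # sugar for count().filter(predicate)


# Each compatibility helper yields exactly the canonical measure.
assert count_expr("discount_code") == count("discount_code")
assert count_distinct("product_id") == count("product_id").distinct()
assert count_if("status = 'paid'") == count().filter("status = 'paid'")
```

All compatibility spellings rewrite to one canonical `count` entry. Lowering then sees a single shape, and the Substrait mapping never needs per-alias variants.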
2 changes: 1 addition & 1 deletion docs/release_notes/v0_1.md
@@ -10,7 +10,7 @@ Entries will be filled in as work lands (link RFCs and PRs when applicable).
 - **Carriers:** `DataSet[T]` hierarchy including bounded vs unbounded traits and concrete frame/stream types.
 - **Plans:** Apache Substrait as the logical interchange contract.
 - **Authoring:** method-chain lowering into a real Substrait boundary today, with `query {}` work still ahead.
-- **Aggregates:** builder-based `col`, `sum`, `count`, `count_expr`, `avg`, `min`, and `max` helpers now lower grouped and global aggregates through Prism, Substrait, and Session execution. `count()` counts rows, and `count_expr(expr)` counts non-null expression values while preserving the future `count(expr)` semantics.
+- **Aggregates:** builder-based `col`, `sum`, `count`, `count_expr`, `count_distinct`, `count_if`, `avg`, `min`, and `max` helpers now lower grouped and global aggregates through Prism, Substrait, and Session execution. `count()` counts rows, `count(expr)` counts non-null expression values, `count_expr(expr)` remains a compatibility spelling, and the first aggregate modifier slice supports `DISTINCT` plus aggregate-local `FILTER` where valid.
 - **Scalar expressions:** RFC 012 unifies filter predicates, computed projection values, grouping keys, and aggregate inputs around one `ColumnExpr` surface with canonical `lit(...)` and typed literal helpers.
 - **Core scalar functions:** RFC 015 adds registry-backed scalar function applications and the first core helper slice for casts, comparisons, boolean logic, null/NaN predicates, arithmetic, conditionals, membership/range predicates, and ordering expressions. Implemented helpers lower to Substrait IR through registry metadata, built-in Rex shapes, or structural sort-field lowering; DataFusion remains the first execution adapter rather than the semantic boundary.
 - **Common scalar functions:** The first RFC 018 slice adds registry-backed math helpers for `abs(...)`, `ceil(...)`, `floor(...)`, and single-argument `round(...)`, with Substrait mappings and DataFusion-backed execution coverage.
17 changes: 9 additions & 8 deletions docs/rfcs/016_core_aggregate_functions.md
@@ -45,14 +45,14 @@ Without explicit rules, aggregate behavior can drift across authoring surfaces a
 Authors can summarize grouped or whole-relation data with the core aggregate set:
 
 ```incan
-from pub::inql.functions import avg, col, count, count_expr, max, min, sum
+from pub::inql.functions import avg, col, count, max, min, sum
 
 summary = (
     orders
     .group_by([col("customer_id")])
     .agg([
         count(),
-        count_expr(col("discount_code")),
+        count(col("discount_code")),
         sum(col("amount")),
         avg(col("amount")),
         min(col("created_at")),
@@ -61,13 +61,13 @@ summary = (
 )
 ```
 
-`count()` counts rows. `count_expr(col("discount_code"))` counts non-null values in that expression. Numeric aggregates ignore null input values unless every value is null or the group is empty, in which case result behavior follows the rules below.
+`count()` counts rows. `count(col("discount_code"))` counts non-null values in that expression. Numeric aggregates ignore null input values unless every value is null or the group is empty, in which case result behavior follows the rules below.
 
 ## Reference-level explanation (precise rules)
 
 InQL must define canonical aggregate entries for `count`, `sum`, `avg`, `min`, and `max`.
 
-`count()` must count input rows in the current relation or group. The v0.1 helper surface exposes expression-count semantics as `count_expr(expr)` because Incan does not yet support the exact same public helper spelling for both `count()` and `count(expr)`. `count_expr(expr)` must count rows where `expr` evaluates to a non-null value and lower through the same canonical aggregate mapping as `count(expr)`. `count` must return a non-null integer count. For an empty input relation or empty group, `count()` and expression-count semantics must return zero.
+`count()` must count input rows in the current relation or group. `count(expr)` must count rows where `expr` evaluates to a non-null value. `count_expr(expr)` remains available as a compatibility spelling for `count(expr)`, but must lower through the same canonical aggregate mapping. `count` must return a non-null integer count. For an empty input relation or empty group, `count()` and expression-count semantics must return zero.
 
 `sum(expr)` must accept numeric input expressions. It must ignore null input values. If no non-null value exists in the group or relation, `sum` must return null unless a later RFC defines an explicit defaulting aggregate. The result type must be derived from the input numeric type according to the numeric type policy.
 
@@ -87,7 +87,7 @@ This RFC requires importable aggregate functions. Query syntax may support SQL s
 
 ### Semantics
 
-Core aggregates skip null values except for `count()`, which counts rows regardless of null values. Expression-count semantics count non-null expression results and are exposed by `count_expr(expr)` in the v0.1 helper surface.
+Core aggregates skip null values except for `count()`, which counts rows regardless of null values. Expression-count semantics count non-null expression results and are exposed by `count(expr)`.
 
 Aggregate argument expressions must be scalar expressions under InQL RFC 012. Aggregate arguments must not themselves contain aggregate outputs unless a later RFC defines nested aggregate semantics.
 
@@ -97,7 +97,7 @@ Aggregate argument expressions must be scalar expressions under InQL RFC 012. Ag
 
 ### Compatibility / migration
 
-Existing `sum` and `count` helpers should be treated as compatibility-compatible forms of the canonical registry entries. `count()` behavior remains compatible with the current row-count intent. `count_expr(expr)` is the v0.1 compatibility spelling for expression-count semantics until Incan can expose the exact overloaded `count(expr)` helper shape.
+Existing `sum` and `count` helpers should be treated as compatibility-compatible forms of the canonical registry entries. `count()` behavior remains compatible with the current row-count intent. `count_expr(expr)` remains as a compatibility spelling for expression-count semantics.
 
 ## Alternatives considered
 
@@ -109,7 +109,8 @@ Existing `sum` and `count` helpers should be treated as compatibility-compatible
 
 - Null and empty-input behavior can surprise authors coming from APIs that default missing sums to zero.
 - Result type policy for numeric aggregates is a cross-cutting dependency on scalar numeric types.
-- Supporting both `count()` and `count(expr)` increases overload complexity.
+- Supporting both `count()` and `count(expr)` makes one helper carry row-count and expression-count semantics, so
+tests must keep both call shapes covered.
 
 ## Layers affected
 
@@ -129,6 +130,6 @@ Existing `sum` and `count` helpers should be treated as compatibility-compatible
 
 ## Design Decisions
 
-- **Expression-count spelling:** v0.1 exposes `count_expr(expr)` as the public helper spelling for expression-count semantics. This preserves the canonical aggregate distinction while avoiding a same-name overload shape that current Incan cannot express for decorated public helpers.
+- **Expression-count spelling:** v0.1 exposes canonical `count(expr)` for expression-count semantics. `count_expr(expr)` remains a compatibility spelling that returns the same canonical aggregate measure.
 - **Count result type:** the v0.1 package records count as a non-null aggregate count measure and validates concrete execution through DataFusion-backed session tests. A more precise static numeric return type belongs with the broader InQL numeric type policy.
 - **Average result type:** the v0.1 package records `avg` as a numeric aggregate and relies on the backend/interchange path for concrete materialized numeric representation. Static decimal/floating promotion rules remain tied to the broader numeric type policy rather than this aggregate helper slice.
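RFC 016's null and empty-input rules can be modeled in a few lines of Python. This sketch rests on several assumptions. `None` stands in for null, and inputs are plain value lists. The `agg_*` names are hypothetical. Numeric result typing is not modeled because the RFC defers it to the numeric type policy.

```python
# Hypothetical plain-Python model of RFC 016 null and empty-input rules.
# None stands in for SQL null; numeric result typing is deliberately not modeled.
from typing import Any, Iterable, List, Optional


def _non_null(values: Iterable[Any]) -> List[Any]:
    return [v for v in values if v is not None]


def agg_count_rows(values: Iterable[Any]) -> int:
    """count(): every row counts, null or not; zero for an empty group."""
    return len(list(values))


def agg_count(values: Iterable[Any]) -> int:
    """count(expr): non-null values only; never null, zero when none exist."""
    return len(_non_null(values))


def agg_sum(values: Iterable[Any]) -> Optional[Any]:
    vals = _non_null(values)
    return sum(vals) if vals else None  # null, not 0, when no non-null input exists


def agg_avg(values: Iterable[Any]) -> Optional[float]:
    vals = _non_null(values)
    return sum(vals) / len(vals) if vals else None


def agg_min(values: Iterable[Any]) -> Optional[Any]:
    vals = _non_null(values)
    return min(vals) if vals else None


def agg_max(values: Iterable[Any]) -> Optional[Any]:
    vals = _non_null(values)
    return max(vals) if vals else None


all_null = [None, None]
print(agg_count_rows(all_null), agg_count(all_null), agg_sum(all_null))  # 2 0 None
```

The all-null group shows the rule that most often surprises authors. The row count is 2 and the expression count is 0, while `sum` returns null instead of defaulting to zero.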
18 changes: 10 additions & 8 deletions docs/rfcs/017_aggregate_modifiers.md
@@ -1,6 +1,6 @@
 # InQL RFC 017: Aggregate modifiers
 
-- **Status:** Draft
+- **Status:** Implemented
 - **Created:** 2026-04-27
 - **Author(s):** Danny Meijer (@dannymeijer)
 - **Related:**
@@ -10,9 +10,9 @@
 - InQL RFC 014 (function registry and catalog governance)
 - InQL RFC 016 (core aggregate functions)
 - **Issue:** [InQL #34](https://github.com/dannys-code-corner/InQL/issues/34)
-- **RFC PR:**
-- **Written against:** Incan v0.2
-- **Shipped in:**
+- **RFC PR:** [InQL #45](https://github.com/dannys-code-corner/InQL/pull/45)
+- **Written against:** Incan v0.3-era InQL
+- **Shipped in:** v0.1
 
 ## Summary
 
@@ -111,9 +111,11 @@ Existing aggregate helpers remain valid. New compatibility helpers such as `coun
 - **Execution / interchange** — Prism and Substrait lowering must preserve filter, distinct, and ordering semantics or reject unsupported forms.
 - **Documentation** — aggregate docs should prefer the modifier model and list compatibility helper aliases.
 
-## Unresolved questions
+## Design Decisions
 
-- Should `count_if(null)` count zero rows or follow a stricter boolean-null diagnostic rule?
-- Which aggregate functions must allow ordered input in the initial modifier contract, especially `listagg`, `percentile_cont`, and `percentile_disc`?
+### Resolved
 
-<!-- When every question is resolved, rename this section to **Design Decisions**, group answers under ### Resolved, and remove this comment. -->
+- `count_if(predicate)` follows aggregate `FILTER` semantics: rows where the predicate is false or null do not
+contribute to the aggregate.
+- The initial modifier contract records ordered aggregate input but no current core aggregate allows it. Ordered input
+is rejected explicitly until an order-sensitive aggregate such as `listagg` or ordered percentile functions lands.
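The resolved ordered-input decision amounts to an explicit check at lowering time. The Python sketch below is hypothetical. `Measure`, `validate`, `UnsupportedModifier`, and the rule that `DISTINCT` requires an input expression are assumptions for illustration, not InQL code.

```python
# Hypothetical sketch of a lowering-time modifier check. Names and the
# DISTINCT-needs-input rule are illustrative assumptions, not InQL code.
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

# No current core aggregate is order-sensitive; an aggregate such as listagg
# would be added here once it lands.
ORDER_SENSITIVE: FrozenSet[str] = frozenset()


class UnsupportedModifier(ValueError):
    """Raised so an unsupported modifier fails loudly instead of being ignored."""


@dataclass(frozen=True)
class Measure:
    name: str
    args: Tuple[str, ...] = ()
    is_distinct: bool = False
    predicate: Optional[str] = None
    ordering: Tuple[str, ...] = ()   # recorded by .order_by([...]), never dropped


def validate(measure: Measure) -> Measure:
    if measure.ordering and measure.name not in ORDER_SENSITIVE:
        raise UnsupportedModifier(f"{measure.name} does not accept ordered input")
    if measure.is_distinct and not measure.args:
        # Assumption: DISTINCT needs input values, so count().distinct() is rejected.
        raise UnsupportedModifier("DISTINCT requires an input expression")
    return measure


validate(Measure("sum", ("amount",), predicate="status = 'paid'"))  # accepted
validate(Measure("count", ("product_id",), is_distinct=True))       # accepted
try:
    validate(Measure("sum", ("amount",), ordering=("created_at",)))
except UnsupportedModifier as err:
    print(err)  # sum does not accept ordered input
```

The measure keeps the ordering it was given, and validation rejects it. The modifier is therefore never silently dropped, and supporting a future order-sensitive aggregate only means adding its name to the allowed set.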
2 changes: 1 addition & 1 deletion docs/rfcs/README.md
@@ -23,7 +23,7 @@ InQL uses its **own** RFC series (starting at 000), independent of the [Incan la
 | [014][rfc-014] | Implemented | Function registry and catalog governance | |
 | [015][rfc-015] | Implemented | Core scalar functions and operators | |
 | [016][rfc-016] | Implemented | Core aggregate functions | |
-| [017][rfc-017] | Draft | Aggregate modifiers | |
+| [017][rfc-017] | Implemented | Aggregate modifiers | |
 | [018][rfc-018] | In Progress | Common scalar function catalog | |
 | [019][rfc-019] | Draft | Window functions | |
 | [020][rfc-020] | Draft | Nested data functions | |