Skip to content

Latest commit

 

History

History
114 lines (71 loc) · 3.54 KB

File metadata and controls

114 lines (71 loc) · 3.54 KB

Execution context (Explanation)

This page explains how to think about InQL's execution model as it works today.

The mental model

There are two distinct phases:

  1. Build deferred relational work in LazyFrame[T]
  2. Ask a Session to bind and run that work

LazyFrame[T] is not local data in hand. It is deferred relational intent. DataFrame[T] is local materialized data.

Why Session exists

Session owns the parts that are not just logical plan shape:

  • source registration
  • logical-name to physical-source binding
  • backend execution
  • materialization
  • writing to sinks

That keeps the carrier model clean. A LazyFrame[T] describes work. A Session makes that work run.

execute vs collect

These two APIs are related but not interchangeable.

session.execute(...)

Use execute(...) when you want an execution checkpoint:

  • the plan binds successfully
  • lowering succeeds
  • the backend can run it

It returns LazyFrame[T] again because the point is validation and execution success, not local materialization.

session.collect(...)

Use collect(...) when you want local data:

  • it runs the same backend path
  • it materializes a DataFrame[T]
  • it records structured materialization metadata such as resolved columns and row count
  • it may also retain preview text for display/debugging

This is the boundary where deferred relational work becomes local data in hand.

Why active session exists

Some convenience APIs are nicer when they do not force the session parameter through every call site. lazy.collect() is one of those cases.

That convenience needs a real execution context underneath, so it resolves through the active session at call time.

  • session.activate() sets the current active session
  • lazy.collect() uses that active session

If there is no active session, the convenience API fails clearly instead of pretending execution context can be ambient without definition.

Writing is Session-owned

session.write_csv(...) and session.write_parquet(...) remain explicit Session methods because writing is not just a carrier concern. It requires binding, execution, and sink ownership.

The ergonomic split is:

  • convenience materialization: lazy.collect()
  • explicit writes: session.write_csv(...), session.write_parquet(...)

This keeps materialization convenient while leaving sink ownership explicit at the session boundary.

Typical flow

from pub::inql import LazyFrame, Session
from pub::inql.functions import col, gt, lit, mul
from models import Order

session = Session.default()

orders: LazyFrame[Order] = session.read_csv("orders", "orders.csv")?
enriched = orders.with_column("amount_x2", mul(col("amount"), lit(2)))
filtered = enriched.filter(gt(col("amount"), lit(100))).limit(10)

session.activate()
preview = filtered.collect()?
session.write_csv(filtered, "orders_out.csv")?

This pattern is intentionally simple:

  • read returns deferred work
  • projection/filter/aggregate transforms stay declarative
  • transforms stay deferred
  • collect materializes when needed
  • writes remain explicit on the session

For the exact method surface, see Dataset methods (Reference).

Materialized carrier shape

DataFrame[T] is the materialized carrier. The important semantic distinction is:

  • LazyFrame[T] = deferred
  • DataFrame[T] = local materialized

The materialized carrier exposes structured collection metadata:

  • resolved columns
  • row count
  • preview text

For exact API shape, see Execution context (Reference).