Investigation: long-format (Polars) kernel behind linopy's xarray-style API

## Trigger

Upstream PyPSA/linopy#740 compares linopy against [polar-high](https://github.com/nodal-tools/polar-high), a Polars-backed LP/MIP eDSL, and reports 2.3–3.5× faster builds with ~30 % less peak memory. That prompted a deeper question than the benchmark itself: **is the xarray-based kernel the right foundation for linopy going forward?**

This issue documents the investigation: what polar-high actually does, where linopy's costs really are, what a tabular kernel would mean, and what it would depend on. It is a research/tracking issue on my fork — nothing here is a decision.

## What polar-high actually does

Everything is a long-format (tidy) Polars frame, sparse from the first moment:

| Object | Schema | Evaluation |
|---|---|---|
| `Var` | `(*dims, col_id)` — one row per LP column | eager, built once |
| `Param` | `(*dims, value)` | lazy |
| expression term | `(*dims, col_id, coef)` | lazy |

Arithmetic never materialises anything: `Param * Var` is a lazy join, `Sum(expr, over=dims)` is a lazy group_by, `expr1 + expr2` is list concatenation of terms. Matrix assembly happens inside `solve()`: per-constraint-family COO triples → global dedup group_by → lexsort → CSC → `highspy.passModel()` (or streaming `addRows` per family, which bounds peak memory to one family + the growing HiGHS model).

**Where the speed comes from — and what it is *not*:**

1. It is **not Polars multi-threading** — polar-high forces `POLARS_MAX_THREADS=1` at import; their own benchmark shows single-thread is faster and leaner for this workload.
2. **Sparse-first representation.** There is never a dense `(dims × _term)` array.
3. **No LP text file.** Direct in-process matrix assembly into highspy. The benchmark in #740 forces linopy through `io_api="lp"` (multi-GB text written, then re-parsed by HiGHS) and never tests linopy's `io_api="direct"`.
4. Lazy plans with bounded intermediates ("prune-down" joins keep every intermediate at O(constraint rows)).

Two claims in #740 need correction: linopy's LP writer is *already* Polars-based since 0.7.0 (`lp` and `lp-polars` are the same code path), and the time gap is substantially an IO-path artifact, not purely architectural.

## The real question: rectangular-dense vs long-format-sparse

"xarray vs Polars" is the wrong framing. The defining property of linopy's kernel is that **an expression is a rectangular array**: `coeffs`/`vars` with shape `(*coord_dims, _term)`. Rectangularity is what xarray needs, and it is what costs us. The question is whether that shape is right — the container choice follows from the answer.

**Where dense wins — homogeneous expressions.** `2*x + y` over `(time=8760, bus=500)`: every cell genuinely has 2 terms, zero padding. Dense: ~140 MB with coords stored once. Long format: ~420 MB (coordinates repeated per row). Dense regular grids — the PyPSA core case — are ~3× *more* compact in xarray.

**Where dense loses — incidence/balance constraints.** `flow.groupby(dst).sum()` goes through `unstack(group_dim, fill_value=...)` in `LinearExpressionGroupby.sum()`: every group is padded to the size of the **largest** group. A network where the median node has 3 lines but one hub has 60:

- dense: `(10 000 nodes × 168 h × 60 terms)` ≈ 1.6 GB, ~95 % padding
- long format: one row per real (edge, time) entry ≈ 270 MB

The nodal balance is the central constraint of every energy model, and this is why the gap in polar-high's *network* benchmark (3.2×) is larger than in their dense one. It is real, not a methodology artifact.

**Where dense also loses — masking.** `x.where(mask)` keeps the full rectangular shape with `-1` labels; a constraint over 5 % of components costs the same as over 100 %.

## The two coherent architectures

A hybrid — xarray objects with Polars bolted on top/underneath mid-pipeline — was considered and rejected: the dense→long conversion doesn't disappear, it moves into the hot arithmetic path and gets paid per operation, and the library ends up with two mental models. The coherent end states are:

**A — xarray through and through.** Keep the kernel; do the fixes that don't require re-architecting: make `io_api="direct"` the default for HiGHS/Gurobi, fix the groupby padding (an implementation choice, not forced by the representation), accept the representation's limits.

**B — tabular through and through, xarray only at the edges.** All model objects (Variable, LinearExpression, Constraint) are Polars frames from birth. xarray appears in exactly two places: user input (bounds/coefficients/coords arrive as DataArrays → converted once at `add_variables`/`add_constraints`) and solution output (returned as DataArrays). The user-facing API *vocabulary* — named dims, coords, `.sel()`, broadcasting arithmetic, `x - y >= rhs` — is kept; the objects backing it are not xarray.

## Is B feasible? Measured, not guessed

Inventory of every linopy API call in the PyPSA ecosystem (PyPSA `master`, PyPSA-Eur `master`, linopy's own test suite):

| | PyPSA core | PyPSA-Eur |
|---|---|---|
| call sites | ~500 | ~150 |
| trivial (arithmetic, `add_*`, `.solution`, `.dual`) | 11 patterns | 8 patterns |
| mechanical (`sel`/`loc`/`groupby`/`merge`/`shift`/`rolling`/`where`/`reindex`) | 15 patterns | 6 patterns |
| reaching into xarray internals | **2 patterns (3 sites)** | **0** |

The three internal-reaching call sites in the entire ecosystem (all in PyPSA core, none in PyPSA-Eur):

- `(lhs.vars == -1).all("_term")` — `pypsa/optimization/constraints.py:1166` (emptiness check; needs a public `.empty`)
- `soc.data.sel(snapshot=(p, sl)).roll(snapshot=1)` — `constraints.py:1711` and `:1895` (multi-index roll for storage cyclicity; needs a public multi-index-aware `.roll()`)

No dask, no netcdf, no subclassing, no `.coeffs`/`.nterm` access anywhere in either consumer.

linopy's own test suite (~900–1000 tests): ~55–60 % behavioral spec (must pass with any kernel), ~25–30 % representation-coupled (assert on `_term`/Dataset structure; rewritten), ~15–20 % xarray-integration (`stack`/`unstack`/`ffill` forwarding, netcdf, dask — defines what is lost or needs a redesign decision).

The genuinely hard residue is small and concentrated:

1. **DataArray/Series-operand arithmetic semantics** — what alignment means when users write `xr.DataArray(p_max_pu) * dispatch`. This is the silent-failure risk: different alignment = different coefficients, no error.
2. **Multi-index time structures** — rolling/shifting over `(period, timestep)` investment-period indexes.
3. The test-suite migration volume (~350–400 tests) and a kernel-agnostic `assert_linequal`.

## The v1 convention is the prerequisite

Hard-spot #1 above is exactly what the v1 semantic convention (PyPSA/linopy#717) specifies. The convention's object-scope rule makes operand type irrelevant (`pd.Series * x` ≡ `xr.DataArray * x` ≡ constant-expression), §8 replaces implicit xarray alignment with exact-match-or-raise, §6/§13 define absence propagation and reductions. The unlabeled-operand pairing rule is specified but not yet implemented (PyPSA/linopy#736).

The dependency direction matters: **a tabular kernel depends on v1, not the other way around.**

- Without v1, a tabular kernel must bug-for-bug replicate legacy's implicit xarray behaviors (positional fallbacks, silent inner joins, per-operator NaN fills). That is close to impossible and the failure mode is silent wrong coefficients.
- With v1, the semantics are explicit and join-native. They map onto Polars primitives almost 1:1 — and in several places Polars provides natively what v1 must build and police on top of numpy/xarray:

| v1 rule | On numpy/xarray (today) | On Polars |
|---|---|---|
| §1–§2 absence is first-class | `NaN` in float fields + `-1` sentinels in int fields, kept in sync by convention | `null` for every dtype (Arrow validity bitmap) — one mechanism, engine-enforced |
| §5 user NaN ≠ absence, raises | needed because NaN is overloaded | `NaN` and `null` are different things; the ambiguity doesn't exist |
| §6 absence propagates | implemented per operator | null propagation is the arithmetic default |
| §13 reductions skip absent, sum of none = 0 | `skipna` handling | the default behavior of `sum()`/`mean()` |
| §8 exact-match alignment | custom check | join + verify-no-unmatched (~small wrapper) |
| §10 explicit `join=` | custom | the `how=` argument of `join()` |

In short: v1 specifies relational missing-data semantics on top of an array stack that doesn't have them; Polars is built on those semantics. That is both why a tabular kernel would be faster and why it would be *safer* — fewer conventions to police, more invariants the storage physically cannot break.

One real design gap: v1's absence is per-*slot*, and in long format an absent slot is most naturally "no row at all" — which means the kernel must track the coordinate universe (index sets) separately from the data rows, e.g. for `fillna()` reviving slots and for constraint row counting. polar-high solves this by having `Var` frames born with their full index. Solved problem, but a decision to make, not a freebie.

## Sequencing

Independent of any kernel decision (worth doing now):

- [ ] Make `io_api="direct"` the default for solvers with a direct interface; reply to PyPSA/linopy#740 with corrected methodology
- [ ] Fix `groupby().sum()` padding to max group size

The prerequisite chain for any kernel work:

- [ ] v1 convention lands (PyPSA/linopy#717)
- [ ] Unlabeled-operand pairing implemented (PyPSA/linopy#736)
- [ ] v1 becomes the default; legacy removed
- [ ] Conventions for structural-operation results (`groupby`/`rolling`/`sel` result coordinates and ordering) — the part of the spec #717 doesn't cover

Then, and only then, the kernel question becomes decidable:

- [ ] Spike: prototype `add_variables` + arithmetic + `groupby().sum()` + direct HiGHS assembly over Polars; run the #740 benchmarks against it
- [ ] Decision with the upstream team, on numbers

## References

- Upstream trigger: PyPSA/linopy#740
- polar-high: https://github.com/nodal-tools/polar-high (Apache-2.0)
- v1 semantic convention: PyPSA/linopy#717, unlabeled pairing: PyPSA/linopy#736
- Earlier Polars touchpoints: PyPSA/linopy#510, PyPSA/linopy#484


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investigation: long-format (Polars) kernel behind linopy's xarray-style API #24

Trigger

What polar-high actually does

The real question: rectangular-dense vs long-format-sparse

The two coherent architectures

Is B feasible? Measured, not guessed

The v1 convention is the prerequisite

Sequencing

References

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Object	Schema	Evaluation
`Var`	`(*dims, col_id)` — one row per LP column	eager, built once
`Param`	`(*dims, value)`	lazy
expression term	`(*dims, col_id, coef)`	lazy

	PyPSA core	PyPSA-Eur
call sites	~500	~150
trivial (arithmetic, `add_*`, `.solution`, `.dual`)	11 patterns	8 patterns
mechanical (`sel`/`loc`/`groupby`/`merge`/`shift`/`rolling`/`where`/`reindex`)	15 patterns	6 patterns
reaching into xarray internals	2 patterns (3 sites)	0

v1 rule	On numpy/xarray (today)	On Polars
§1–§2 absence is first-class	`NaN` in float fields + `-1` sentinels in int fields, kept in sync by convention	`null` for every dtype (Arrow validity bitmap) — one mechanism, engine-enforced
§5 user NaN ≠ absence, raises	needed because NaN is overloaded	`NaN` and `null` are different things; the ambiguity doesn't exist
§6 absence propagates	implemented per operator	null propagation is the arithmetic default
§13 reductions skip absent, sum of none = 0	`skipna` handling	the default behavior of `sum()`/`mean()`
§8 exact-match alignment	custom check	join + verify-no-unmatched (~small wrapper)
§10 explicit `join=`	custom	the `how=` argument of `join()`

Investigation: long-format (Polars) kernel behind linopy's xarray-style API #24

Description

Trigger

What polar-high actually does

The real question: rectangular-dense vs long-format-sparse

The two coherent architectures

Is B feasible? Measured, not guessed

The v1 convention is the prerequisite

Sequencing

References

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions