You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Upstream PyPSA#740 compares linopy against polar-high, a Polars-backed LP/MIP eDSL, and reports 2.3–3.5× faster builds with ~30 % less peak memory. That prompted a deeper question than the benchmark itself: is the xarray-based kernel the right foundation for linopy going forward?
This issue documents the investigation: what polar-high actually does, where linopy's costs really are, what a tabular kernel would mean, and what it would depend on. It is a research/tracking issue on my fork — nothing here is a decision.
What polar-high actually does
Everything is a long-format (tidy) Polars frame, sparse from the first moment:
Object
Schema
Evaluation
Var
(*dims, col_id) — one row per LP column
eager, built once
Param
(*dims, value)
lazy
expression term
(*dims, col_id, coef)
lazy
Arithmetic never materialises anything: Param * Var is a lazy join, Sum(expr, over=dims) is a lazy group_by, expr1 + expr2 is list concatenation of terms. Matrix assembly happens inside solve(): per-constraint-family COO triples → global dedup group_by → lexsort → CSC → highspy.passModel() (or streaming addRows per family, which bounds peak memory to one family + the growing HiGHS model).
Where the speed comes from — and what it is not:
It is not Polars multi-threading — polar-high forces POLARS_MAX_THREADS=1 at import; their own benchmark shows single-thread is faster and leaner for this workload.
Sparse-first representation. There is never a dense (dims × _term) array.
Lazy plans with bounded intermediates ("prune-down" joins keep every intermediate at O(constraint rows)).
Two claims in PyPSA#740 need correction: linopy's LP writer is already Polars-based since 0.7.0 (lp and lp-polars are the same code path), and the time gap is substantially an IO-path artifact, not purely architectural.
The real question: rectangular-dense vs long-format-sparse
"xarray vs Polars" is the wrong framing. The defining property of linopy's kernel is that an expression is a rectangular array: coeffs/vars with shape (*coord_dims, _term). Rectangularity is what xarray needs, and it is what costs us. The question is whether that shape is right — the container choice follows from the answer.
Where dense wins — homogeneous expressions.2*x + y over (time=8760, bus=500): every cell genuinely has 2 terms, zero padding. Dense: ~140 MB with coords stored once. Long format: ~420 MB (coordinates repeated per row). Dense regular grids — the PyPSA core case — are ~3× more compact in xarray.
Where dense loses — incidence/balance constraints.flow.groupby(dst).sum() goes through unstack(group_dim, fill_value=...) in LinearExpressionGroupby.sum(): every group is padded to the size of the largest group. A network where the median node has 3 lines but one hub has 60:
long format: one row per real (edge, time) entry ≈ 270 MB
The nodal balance is the central constraint of every energy model, and this is why the gap in polar-high's network benchmark (3.2×) is larger than in their dense one. It is real, not a methodology artifact.
Where dense also loses — masking.x.where(mask) keeps the full rectangular shape with -1 labels; a constraint over 5 % of components costs the same as over 100 %.
The two coherent architectures
A hybrid — xarray objects with Polars bolted on top/underneath mid-pipeline — was considered and rejected: the dense→long conversion doesn't disappear, it moves into the hot arithmetic path and gets paid per operation, and the library ends up with two mental models. The coherent end states are:
A — xarray through and through. Keep the kernel; do the fixes that don't require re-architecting: make io_api="direct" the default for HiGHS/Gurobi, fix the groupby padding (an implementation choice, not forced by the representation), accept the representation's limits.
B — tabular through and through, xarray only at the edges. All model objects (Variable, LinearExpression, Constraint) are Polars frames from birth. xarray appears in exactly two places: user input (bounds/coefficients/coords arrive as DataArrays → converted once at add_variables/add_constraints) and solution output (returned as DataArrays). The user-facing API vocabulary — named dims, coords, .sel(), broadcasting arithmetic, x - y >= rhs — is kept; the objects backing it are not xarray.
Is B feasible? Measured, not guessed
Inventory of every linopy API call in the PyPSA ecosystem (PyPSA master, PyPSA-Eur master, linopy's own test suite):
The three internal-reaching call sites in the entire ecosystem (all in PyPSA core, none in PyPSA-Eur):
(lhs.vars == -1).all("_term") — pypsa/optimization/constraints.py:1166 (emptiness check; needs a public .empty)
soc.data.sel(snapshot=(p, sl)).roll(snapshot=1) — constraints.py:1711 and :1895 (multi-index roll for storage cyclicity; needs a public multi-index-aware .roll())
No dask, no netcdf, no subclassing, no .coeffs/.nterm access anywhere in either consumer.
linopy's own test suite (~900–1000 tests): ~55–60 % behavioral spec (must pass with any kernel), ~25–30 % representation-coupled (assert on _term/Dataset structure; rewritten), ~15–20 % xarray-integration (stack/unstack/ffill forwarding, netcdf, dask — defines what is lost or needs a redesign decision).
The genuinely hard residue is small and concentrated:
DataArray/Series-operand arithmetic semantics — what alignment means when users write xr.DataArray(p_max_pu) * dispatch. This is the silent-failure risk: different alignment = different coefficients, no error.
Multi-index time structures — rolling/shifting over (period, timestep) investment-period indexes.
The test-suite migration volume (~350–400 tests) and a kernel-agnostic assert_linequal.
The v1 convention is the prerequisite
Hard-spot #1 above is exactly what the v1 semantic convention (PyPSA#717) specifies. The convention's object-scope rule makes operand type irrelevant (pd.Series * x ≡ xr.DataArray * x ≡ constant-expression), §8 replaces implicit xarray alignment with exact-match-or-raise, §6/§13 define absence propagation and reductions. The unlabeled-operand pairing rule is specified but not yet implemented (PyPSA#736).
The dependency direction matters: a tabular kernel depends on v1, not the other way around.
Without v1, a tabular kernel must bug-for-bug replicate legacy's implicit xarray behaviors (positional fallbacks, silent inner joins, per-operator NaN fills). That is close to impossible and the failure mode is silent wrong coefficients.
With v1, the semantics are explicit and join-native. They map onto Polars primitives almost 1:1 — and in several places Polars provides natively what v1 must build and police on top of numpy/xarray:
v1 rule
On numpy/xarray (today)
On Polars
§1–§2 absence is first-class
NaN in float fields + -1 sentinels in int fields, kept in sync by convention
null for every dtype (Arrow validity bitmap) — one mechanism, engine-enforced
§5 user NaN ≠ absence, raises
needed because NaN is overloaded
NaN and null are different things; the ambiguity doesn't exist
§6 absence propagates
implemented per operator
null propagation is the arithmetic default
§13 reductions skip absent, sum of none = 0
skipna handling
the default behavior of sum()/mean()
§8 exact-match alignment
custom check
join + verify-no-unmatched (~small wrapper)
§10 explicit join=
custom
the how= argument of join()
In short: v1 specifies relational missing-data semantics on top of an array stack that doesn't have them; Polars is built on those semantics. That is both why a tabular kernel would be faster and why it would be safer — fewer conventions to police, more invariants the storage physically cannot break.
One real design gap: v1's absence is per-slot, and in long format an absent slot is most naturally "no row at all" — which means the kernel must track the coordinate universe (index sets) separately from the data rows, e.g. for fillna() reviving slots and for constraint row counting. polar-high solves this by having Var frames born with their full index. Solved problem, but a decision to make, not a freebie.
Sequencing
Independent of any kernel decision (worth doing now):
Conventions for structural-operation results (groupby/rolling/sel result coordinates and ordering) — the part of the spec feat: v1 semantic convention PyPSA/linopy#717 doesn't cover
Then, and only then, the kernel question becomes decidable:
Trigger
Upstream PyPSA#740 compares linopy against polar-high, a Polars-backed LP/MIP eDSL, and reports 2.3–3.5× faster builds with ~30 % less peak memory. That prompted a deeper question than the benchmark itself: is the xarray-based kernel the right foundation for linopy going forward?
This issue documents the investigation: what polar-high actually does, where linopy's costs really are, what a tabular kernel would mean, and what it would depend on. It is a research/tracking issue on my fork — nothing here is a decision.
What polar-high actually does
Everything is a long-format (tidy) Polars frame, sparse from the first moment:
Var(*dims, col_id)— one row per LP columnParam(*dims, value)(*dims, col_id, coef)Arithmetic never materialises anything:
Param * Varis a lazy join,Sum(expr, over=dims)is a lazy group_by,expr1 + expr2is list concatenation of terms. Matrix assembly happens insidesolve(): per-constraint-family COO triples → global dedup group_by → lexsort → CSC →highspy.passModel()(or streamingaddRowsper family, which bounds peak memory to one family + the growing HiGHS model).Where the speed comes from — and what it is not:
POLARS_MAX_THREADS=1at import; their own benchmark shows single-thread is faster and leaner for this workload.(dims × _term)array.polar-high(Polars-backed LP/MIP eDSL on HiGHS) — thoughts? PyPSA/linopy#740 forces linopy throughio_api="lp"(multi-GB text written, then re-parsed by HiGHS) and never tests linopy'sio_api="direct".Two claims in PyPSA#740 need correction: linopy's LP writer is already Polars-based since 0.7.0 (
lpandlp-polarsare the same code path), and the time gap is substantially an IO-path artifact, not purely architectural.The real question: rectangular-dense vs long-format-sparse
"xarray vs Polars" is the wrong framing. The defining property of linopy's kernel is that an expression is a rectangular array:
coeffs/varswith shape(*coord_dims, _term). Rectangularity is what xarray needs, and it is what costs us. The question is whether that shape is right — the container choice follows from the answer.Where dense wins — homogeneous expressions.
2*x + yover(time=8760, bus=500): every cell genuinely has 2 terms, zero padding. Dense: ~140 MB with coords stored once. Long format: ~420 MB (coordinates repeated per row). Dense regular grids — the PyPSA core case — are ~3× more compact in xarray.Where dense loses — incidence/balance constraints.
flow.groupby(dst).sum()goes throughunstack(group_dim, fill_value=...)inLinearExpressionGroupby.sum(): every group is padded to the size of the largest group. A network where the median node has 3 lines but one hub has 60:(10 000 nodes × 168 h × 60 terms)≈ 1.6 GB, ~95 % paddingThe nodal balance is the central constraint of every energy model, and this is why the gap in polar-high's network benchmark (3.2×) is larger than in their dense one. It is real, not a methodology artifact.
Where dense also loses — masking.
x.where(mask)keeps the full rectangular shape with-1labels; a constraint over 5 % of components costs the same as over 100 %.The two coherent architectures
A hybrid — xarray objects with Polars bolted on top/underneath mid-pipeline — was considered and rejected: the dense→long conversion doesn't disappear, it moves into the hot arithmetic path and gets paid per operation, and the library ends up with two mental models. The coherent end states are:
A — xarray through and through. Keep the kernel; do the fixes that don't require re-architecting: make
io_api="direct"the default for HiGHS/Gurobi, fix the groupby padding (an implementation choice, not forced by the representation), accept the representation's limits.B — tabular through and through, xarray only at the edges. All model objects (Variable, LinearExpression, Constraint) are Polars frames from birth. xarray appears in exactly two places: user input (bounds/coefficients/coords arrive as DataArrays → converted once at
add_variables/add_constraints) and solution output (returned as DataArrays). The user-facing API vocabulary — named dims, coords,.sel(), broadcasting arithmetic,x - y >= rhs— is kept; the objects backing it are not xarray.Is B feasible? Measured, not guessed
Inventory of every linopy API call in the PyPSA ecosystem (PyPSA
master, PyPSA-Eurmaster, linopy's own test suite):add_*,.solution,.dual)sel/loc/groupby/merge/shift/rolling/where/reindex)The three internal-reaching call sites in the entire ecosystem (all in PyPSA core, none in PyPSA-Eur):
(lhs.vars == -1).all("_term")—pypsa/optimization/constraints.py:1166(emptiness check; needs a public.empty)soc.data.sel(snapshot=(p, sl)).roll(snapshot=1)—constraints.py:1711and:1895(multi-index roll for storage cyclicity; needs a public multi-index-aware.roll())No dask, no netcdf, no subclassing, no
.coeffs/.ntermaccess anywhere in either consumer.linopy's own test suite (~900–1000 tests): ~55–60 % behavioral spec (must pass with any kernel), ~25–30 % representation-coupled (assert on
_term/Dataset structure; rewritten), ~15–20 % xarray-integration (stack/unstack/ffillforwarding, netcdf, dask — defines what is lost or needs a redesign decision).The genuinely hard residue is small and concentrated:
xr.DataArray(p_max_pu) * dispatch. This is the silent-failure risk: different alignment = different coefficients, no error.(period, timestep)investment-period indexes.assert_linequal.The v1 convention is the prerequisite
Hard-spot #1 above is exactly what the v1 semantic convention (PyPSA#717) specifies. The convention's object-scope rule makes operand type irrelevant (
pd.Series * x≡xr.DataArray * x≡ constant-expression), §8 replaces implicit xarray alignment with exact-match-or-raise, §6/§13 define absence propagation and reductions. The unlabeled-operand pairing rule is specified but not yet implemented (PyPSA#736).The dependency direction matters: a tabular kernel depends on v1, not the other way around.
NaNin float fields +-1sentinels in int fields, kept in sync by conventionnullfor every dtype (Arrow validity bitmap) — one mechanism, engine-enforcedNaNandnullare different things; the ambiguity doesn't existskipnahandlingsum()/mean()join=how=argument ofjoin()In short: v1 specifies relational missing-data semantics on top of an array stack that doesn't have them; Polars is built on those semantics. That is both why a tabular kernel would be faster and why it would be safer — fewer conventions to police, more invariants the storage physically cannot break.
One real design gap: v1's absence is per-slot, and in long format an absent slot is most naturally "no row at all" — which means the kernel must track the coordinate universe (index sets) separately from the data rows, e.g. for
fillna()reviving slots and for constraint row counting. polar-high solves this by havingVarframes born with their full index. Solved problem, but a decision to make, not a freebie.Sequencing
Independent of any kernel decision (worth doing now):
io_api="direct"the default for solvers with a direct interface; reply to Benchmark comparison vspolar-high(Polars-backed LP/MIP eDSL on HiGHS) — thoughts? PyPSA/linopy#740 with corrected methodologygroupby().sum()padding to max group sizeThe prerequisite chain for any kernel work:
groupby/rolling/selresult coordinates and ordering) — the part of the spec feat: v1 semantic convention PyPSA/linopy#717 doesn't coverThen, and only then, the kernel question becomes decidable:
add_variables+ arithmetic +groupby().sum()+ direct HiGHS assembly over Polars; run the Benchmark comparison vspolar-high(Polars-backed LP/MIP eDSL on HiGHS) — thoughts? PyPSA/linopy#740 benchmarks against itReferences
polar-high(Polars-backed LP/MIP eDSL on HiGHS) — thoughts? PyPSA/linopy#740lp-polarsIO API writes different models tolpwhich are incorrect PyPSA/linopy#484