Note: This issue was drafted by an AI agent (Claude Code) during a profiling/investigation session with @FBumann. The measurements (memray C-level profiling, benchmarks, the 284× / 132 MB figures) and the git-history archaeology are real and reproducible, but the framing and proposed directions are a starting point for discussion — please sanity-check before acting on them.
`merge` (`linopy/expressions.py`, `merge(...)`) concatenates several `LinearExpression`s along a new or shared dimension by aligning their `_term` axes. Because the `_term` axis is a dense rectangle, alignment pads every block to the global maximum term count before concatenation, and the concatenation itself then allocates the full padded result.
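A minimal NumPy sketch of that mechanism. It is not linopy's code. Each block is reduced to a `(rows, n_term)` array of variable labels, `-1` marks an empty term slot (an assumption for this sketch), and `pad_to` / `merge_dense` are hypothetical helpers that mimic outer alignment on `_term` followed by `concat`.

```python
import numpy as np

FILL = -1  # assumed sentinel for an empty term slot in this sketch


def pad_to(block: np.ndarray, n_term: int) -> np.ndarray:
    """Right-pad a (rows, k) block of variable labels to (rows, n_term)."""
    rows, k = block.shape
    out = np.full((rows, n_term), FILL, dtype=block.dtype)
    out[:, :k] = block
    return out


def merge_dense(blocks: list[np.ndarray]) -> np.ndarray:
    """Mimic a dense merge: align every block to the global-max term count, then concatenate."""
    n_term = max(b.shape[1] for b in blocks)       # global maximum _term length
    padded = [pad_to(b, n_term) for b in blocks]   # alignment step: every block is padded
    return np.concatenate(padded, axis=0)          # concat allocates the full padded result


# Hypothetical blocks: 1000 rows with 2 terms each, and 10 rows with 50 terms each.
small = np.arange(2_000).reshape(1_000, 2)
wide = np.arange(500).reshape(10, 50)
merged = merge_dense([small, wide])

used = small.size + wide.size
print(merged.shape)       # (1010, 50)
print(used, merged.size)  # 2500 50500
print(f"{1 - used / merged.size:.0%} of the slots are padding")  # 95% of the slots are padding
```

The ten wide rows decide the width of all 1010 rows, which is the "waste the rest" effect described below.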
Evidence
Memray on a full SciGRID-DE `create_model()` (585 buses, 24 snapshots, 59,640 variables, 142,968 constraints): peak C-level memory is 351 MB, and the single largest live allocation at the high-water mark is:
132.6 MB concatenate (xarray duck_array_ops)
<- concat (xarray)
<- merge (linopy/expressions.py)
<- define_kirchhoff_voltage_constraints (pypsa/.../constraints.py)
So `merge` is the peak operation of the whole build: its allocation is the one that actually sets the OOM ceiling for large networks, more so than `groupby` (#745) on realistic group-size distributions.
Two contributing factors:
- The inputs are already densified (see the KVL `@` issue), so `merge` inherits that bloat.
- `merge` pads each block to the global-max `_term` before `concat`; blocks with few terms waste the rest.
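To see how the two factors compound, here is a back-of-the-envelope estimator under stated assumptions: each term slot costs 16 bytes (an int64 variable label plus a float64 coefficient, ignoring the constant and index overhead), and each block is described only by its per-row term counts. "Dense at each block's own maximum" stands in for the first factor; padding everything to the global maximum is the second. The counts are made up for illustration, not taken from SciGRID-DE.

```python
import numpy as np

BYTES_PER_SLOT = 8 + 8  # assumption: int64 variable label + float64 coefficient per term slot


def footprint(blocks: list[np.ndarray]) -> dict[str, int]:
    """Estimate bytes for a merge, given a 1-D array of per-row term counts for each block."""
    ragged = sum(int(c.sum()) for c in blocks)                # only the real terms
    per_block = sum(c.size * int(c.max()) for c in blocks)    # each block dense at its own max
    global_max = max(int(c.max()) for c in blocks)
    merged = sum(c.size for c in blocks) * global_max         # every row padded to the global max
    return {
        "ragged": ragged * BYTES_PER_SLOT,
        "per_block_dense": per_block * BYTES_PER_SLOT,
        "merged_dense": merged * BYTES_PER_SLOT,
    }


# Hypothetical inputs: many rows with few terms, plus a small block of wide rows.
few_terms = np.full(10_000, 2)
few_terms[:100] = 4
many_terms = np.full(100, 40)
many_terms[0] = 60

result = footprint([few_terms, many_terms])
print(result)  # {'ragged': 387520, 'per_block_dense': 736000, 'merged_dense': 9696000}
```

In this toy case, per-block densification roughly doubles the ragged size, and padding the merged result to the global maximum brings it to about 25× the ragged size.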
Why it matters
PyPSA works around exactly this by splitting the nodal-balance constraint into two (strongly- vs weakly-meshed buses), so each `merge`/`groupby` operates on a bucket of similar term counts; see #745 for that evidence. A `merge` that handled a ragged `_term` axis (or operated in long format) would remove the need for that manual bucketing.
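The effect of that bucketing can be illustrated with per-row term counts alone. This sketch splits rows at a hypothetical term-count threshold (PyPSA splits by how strongly meshed a bus is, which a term-count cut only approximates) and compares the slots one dense rectangle needs against two separately padded buckets. The distribution is random and purely illustrative.

```python
import numpy as np


def padded_slots(counts: np.ndarray) -> int:
    """Slots used when all rows share one dense _term axis sized to the largest row."""
    return counts.size * int(counts.max()) if counts.size else 0


def bucketed_slots(counts: np.ndarray, threshold: int) -> int:
    """Slots used when rows are split into two dense buckets at a term-count threshold."""
    weak = counts[counts <= threshold]    # stand-in for weakly meshed buses: few terms each
    strong = counts[counts > threshold]   # stand-in for strongly meshed buses: many terms each
    return padded_slots(weak) + padded_slots(strong)


rng = np.random.default_rng(0)
# Hypothetical skewed distribution: most rows have 2-5 terms, a few have 40-80.
counts = np.concatenate([rng.integers(2, 6, size=5_000), rng.integers(40, 81, size=50)])

print("one rectangle:", padded_slots(counts))
print("two buckets:  ", bucketed_slots(counts, threshold=10))
print("actual terms: ", int(counts.sum()))
```

The single rectangle is dominated by the few wide rows, while each bucket stays within a small factor of its real term count. A ragged or long-format `merge` would get the same benefit without picking a split by hand.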
Possible directions
- A long-format / sparse `_term` kernel, tracked in #756 (Umbrella: long-format / sparse `_term` kernel (dense-`_term` memory cluster)).
Sibling of #745 and the KVL `@` issue: all three are the dense-`_term` representation surfacing at different ops.
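A minimal sketch of why a long-format representation sidesteps the padding entirely. It assumes a hypothetical `LongExpr` layout of flat `(row, var, coeff)` arrays; this is not linopy's current data model, and #756 may settle on something different. Merging then reduces to offsetting row indices and concatenating the real terms.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class LongExpr:
    """Hypothetical long-format expression: one entry per (row, variable, coefficient) term."""
    row: np.ndarray    # row index within this expression
    var: np.ndarray    # variable label
    coeff: np.ndarray  # coefficient
    n_rows: int        # number of rows, including rows that happen to have no terms


def merge_long(exprs: list[LongExpr]) -> LongExpr:
    """Stack non-empty list of expressions along the row dimension without any _term padding."""
    offsets = np.cumsum([0] + [e.n_rows for e in exprs[:-1]])  # shift each block's row indices
    return LongExpr(
        row=np.concatenate([e.row + off for e, off in zip(exprs, offsets)]),
        var=np.concatenate([e.var for e in exprs]),
        coeff=np.concatenate([e.coeff for e in exprs]),
        n_rows=sum(e.n_rows for e in exprs),
    )


# Two rows with 2 and 1 terms, then one row with 4 terms.
a = LongExpr(row=np.array([0, 0, 1]), var=np.array([10, 11, 12]),
             coeff=np.array([1.0, -1.0, 2.0]), n_rows=2)
b = LongExpr(row=np.array([0, 0, 0, 0]), var=np.array([20, 21, 22, 23]),
             coeff=np.ones(4), n_rows=1)
m = merge_long([a, b])
print(m.row.tolist())  # [0, 0, 1, 2, 2, 2, 2]
```

The merged result stores exactly the 7 real terms, whereas a dense `_term` layout would hold 3 rows × 4 slots = 12.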