
Data Shape Invariants

Syed Ibrahim Omer edited this page Apr 13, 2026 · 1 revision

This page describes the shape of data as it moves through the pipeline so tests and downstream tools can rely on consistent expectations.

After source_data

Each package’s data is a Polars LazyFrame with columns (before lowercasing):

  • Date
  • Open, High, Low, Close, Volume
  • Dividends, Stock Splits

These names are obtained by renaming Yahoo’s multi-index columns for that ticker.
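A test can check this invariant directly on the column names. The sketch below is a minimal, hypothetical helper (`check_source_columns` is not a name from the codebase). It checks presence only, since this page does not state a required column order. With Polars 1.x you could pass it `lf.collect_schema().names()`.

```python
# Expected columns after source_data, before lowercasing (as listed on this page).
EXPECTED_SOURCE_COLUMNS = [
    "Date",
    "Open", "High", "Low", "Close", "Volume",
    "Dividends", "Stock Splits",
]


def check_source_columns(columns: list[str]) -> list[str]:
    """Hypothetical test helper: return expected columns missing from `columns`."""
    present = set(columns)
    return [name for name in EXPECTED_SOURCE_COLUMNS if name not in present]
```

An empty result means every expected column is present. Any extra columns are ignored.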

After calculate_indicators rename

Column names are lowercased by zipping each original name with its lower() result and renaming:

  • Date → date
  • Close → close

Spaces are kept in the lowercased names rather than replaced (e.g. Stock Splits → stock splits).
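The zip-based rename can be sketched in plain Python as follows. This is a minimal illustration that assumes the approach described above. `lowercase_mapping` is a hypothetical name. With Polars, the resulting dict could be passed to `LazyFrame.rename`.

```python
def lowercase_mapping(columns: list[str]) -> dict[str, str]:
    """Pair each original column name with its lower() result."""
    # zip keeps original and lowered names aligned; lower() leaves spaces untouched.
    return dict(zip(columns, [name.lower() for name in columns]))


mapping = lowercase_mapping(["Date", "Close", "Stock Splits"])
# mapping == {"Date": "date", "Close": "close", "Stock Splits": "stock splits"}
```

Downstream code must therefore refer to `"stock splits"` (with a space), not `stock_splits`.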

Indicator columns added

See Indicators (Overview). Notable exceptions to the lowercase naming in current code:

  • ATR (uppercase)
  • K, D (stochastic, uppercase)

Intermediate columns used for RSI/ATR (e.g. returns, gains, losses, true_range) remain in the frame unless dropped elsewhere.
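Intermediate columns may remain, so a test is safer checking naming rules than an exact column list. The hypothetical helper below flags names that are neither lowercase nor one of the uppercase exceptions listed above. The sample column list is illustrative only.

```python
# Indicator columns that current code leaves uppercase (per this page).
UPPERCASE_EXCEPTIONS = {"ATR", "K", "D"}


def unexpected_case(columns: list[str]) -> list[str]:
    """Hypothetical test helper: names that are neither lowercase nor a known exception."""
    return [
        name
        for name in columns
        if name != name.lower() and name not in UPPERCASE_EXCEPTIONS
    ]


# Illustrative column list, including intermediates that may remain in the frame.
cols = ["date", "close", "returns", "gains", "losses", "true_range", "ATR", "K", "D"]
print(unexpected_case(cols))               # []
print(unexpected_case(cols + ["Volume"]))  # ['Volume'], a name that missed the rename
```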

Row count

Row count equals the number of history rows returned by Yahoo for that (ticker, period, interval) after filtering invalid tickers.

Leading nulls are normal for rolling indicators until the window is filled.
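Both invariants can be illustrated with a trailing rolling mean. The plain-Python function below stands in for a rolling indicator and is not the pipeline's code. It assumes the input has no nulls of its own. The output has one value per input row, and the first window - 1 values are null (None).

```python
from typing import Optional


def rolling_mean(values: list[float], window: int) -> list[Optional[float]]:
    """Trailing rolling mean; None until `window` values are available."""
    out: list[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # window not yet filled -> leading null
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


closes = [10.0, 11.0, 12.0, 13.0]
result = rolling_mean(closes, window=3)
# result == [None, None, 11.0, 12.0]; len(result) == len(closes)
```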

Lazy vs eager

  • Computation is built up as a LazyFrame query and only executes at collect(engine=...).
  • The returned package’s data is a materialized DataFrame.
