Issue 1: Redesign Storage Layer for Structured Data
- Replace JSON-per-key in HDF5 with structured tables (compound datasets).
- Ensure datasets are appendable (chunked storage with an unlimited first dimension — in h5py, maxshape=(None,) and chunks=True).
- Make columns explicit (date, symbol, open, close, volume, factors).
Example: HDF5 compound dataset or Parquet + Polars.
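A minimal stdlib sketch of the target shape, using a dataclass in place of the compound dtype (the class name `PriceRow`, the `momentum` factor column, and `append_rows` are illustrative, not part of the plan):

```python
from dataclasses import dataclass, fields

# Hypothetical row schema mirroring the compound dataset's explicit columns.
@dataclass
class PriceRow:
    date: str        # ISO date, e.g. "2024-01-02"
    symbol: str
    open: float
    close: float
    volume: int
    momentum: float  # one example factor column

# Stand-in for the appendable dataset: in HDF5 this would be a compound
# dtype created with chunks=True and an unlimited first dimension.
table: list[PriceRow] = []

def append_rows(rows: list[PriceRow]) -> None:
    table.extend(rows)  # HDF5 equivalent: resize the dataset, then write

append_rows([PriceRow("2024-01-02", "AAPL", 185.0, 186.1, 52_000_000, 0.12)])
print([f.name for f in fields(PriceRow)])
```

The same schema maps one-to-one onto a Parquet file read by Polars, so the storage backend can be swapped without changing the column contract.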
Issue 2: Implement Asset Metadata System
- Store asset metadata separately: symbol, class, region, sector, market cap, liquidity, currency, dividend yield, factor scores.
- Allow fast filtering and universe selection without scanning time-series.
- Maintain mapping between metadata and corresponding data storage.
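A sketch of the metadata system with hypothetical field names (`adv`, `value_score`, etc.) and toy values — the point is that universe selection touches only this small table, never the time-series files:

```python
# Hypothetical metadata records; field names and values are illustrative.
metadata = {
    "AAPL": {"asset_class": "equity", "region": "US", "sector": "tech",
             "market_cap": 2.9e12, "adv": 9.0e9, "currency": "USD",
             "dividend_yield": 0.005, "value_score": -0.3},
    "PETR4": {"asset_class": "equity", "region": "BR", "sector": "energy",
              "market_cap": 9.0e10, "adv": 3.0e8, "currency": "BRL",
              "dividend_yield": 0.12, "value_score": 1.1},
}

def select(pred):
    """Universe selection on metadata alone -- no time-series scan."""
    return sorted(s for s, m in metadata.items() if pred(m))

us_equities = select(lambda m: m["region"] == "US" and m["asset_class"] == "equity")

# Mapping from metadata to the corresponding time-series storage location.
dataset_path = {s: f"/prices/{s}" for s in metadata}
print(us_equities, dataset_path["AAPL"])
```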
Issue 3: Create Hierarchical Filtering Pipeline
- Introduce Pipeline and Filter classes.
- Implement a multi-step filtering process before strategy/backtesting:
  - Remove illiquid/extreme assets
  - Select universe by strategy/asset class
  - Compute risk metrics and factor exposures
  - Feed the filtered dataset into the backtest or optimizer
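The Pipeline and Filter classes could be sketched as follows (class and predicate names are a proposal, not an existing API):

```python
class Filter:
    """One named filtering step over a symbol universe."""
    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate  # takes a metadata dict, returns bool

    def apply(self, universe, meta):
        return [s for s in universe if self.predicate(meta[s])]

class Pipeline:
    """Runs filters in order, narrowing the universe at each step."""
    def __init__(self, filters):
        self.filters = filters

    def run(self, universe, meta):
        for f in self.filters:
            universe = f.apply(universe, meta)
        return universe

# Toy metadata to exercise the pipeline.
meta = {"A": {"adv": 5e8, "asset_class": "equity"},
        "B": {"adv": 1e4, "asset_class": "equity"},
        "C": {"adv": 2e9, "asset_class": "bond"}}
pipe = Pipeline([
    Filter("liquidity", lambda m: m["adv"] > 1e6),   # drop illiquid assets
    Filter("equities",  lambda m: m["asset_class"] == "equity"),
])
print(pipe.run(["A", "B", "C"], meta))  # ['A']
```

Risk-metric and factor-exposure steps slot in as additional filters (or transforms) before the result is handed to the backtest or optimizer.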
Issue 4: Define Universe Templates
- Predefine pools of assets for repeated strategy testing (e.g., “Global Equities”, “Brazil Bonds”, “Multi-asset ETFs”).
- Templates should include filtering criteria: liquidity, size, asset class, region.
- Ensure templates are easily selectable and interchangeable in backtests.
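One way to make templates declarative and interchangeable is a plain mapping from template name to criteria (the criteria keys below, like `min_adv`, are assumptions for illustration):

```python
# Hypothetical template definitions: declarative filtering criteria only.
TEMPLATES = {
    "Global Equities":  {"asset_class": "equity", "min_adv": 1e7},
    "Brazil Bonds":     {"asset_class": "bond", "region": "BR", "min_adv": 1e5},
    "Multi-asset ETFs": {"asset_class": "etf", "min_adv": 1e6},
}

def build_universe(template_name, meta):
    """Resolve a named template against the metadata table."""
    t = TEMPLATES[template_name]
    return sorted(
        s for s, m in meta.items()
        if m["asset_class"] == t["asset_class"]
        and m["adv"] >= t["min_adv"]
        and ("region" not in t or m["region"] == t["region"])
    )

meta = {"VALE3": {"asset_class": "equity", "region": "BR", "adv": 4e8},
        "NTN-B": {"asset_class": "bond", "region": "BR", "adv": 2e7}}
print(build_universe("Brazil Bonds", meta))  # swap templates by name only
```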
Issue 5: Strategy-Specific Views
- Each strategy should work on its own filtered subset of the universe.
Examples: Momentum strategy → top 1000 liquid equities; Value strategy → equities by book-to-price ratio; Multi-asset → ETFs across classes/regions.
- Supports modular strategy testing and avoids data contamination across strategies.
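A sketch of per-strategy views over shared metadata — each function returns its own subset, so strategies never see each other's selections (function names and the `top_n`/`min_bp` thresholds are illustrative, with tiny N in place of "top 1000"):

```python
def momentum_view(meta, top_n=2):
    """Top-N most liquid equities (stand-in for the 'top 1000' case)."""
    eq = [s for s, m in meta.items() if m["asset_class"] == "equity"]
    return sorted(eq, key=lambda s: meta[s]["adv"], reverse=True)[:top_n]

def value_view(meta, min_bp=0.5):
    """Equities screened and ranked by book-to-price ratio."""
    eq = [s for s, m in meta.items()
          if m["asset_class"] == "equity" and m["book_to_price"] >= min_bp]
    return sorted(eq, key=lambda s: meta[s]["book_to_price"], reverse=True)

meta = {
    "AAPL": {"asset_class": "equity", "adv": 9e9, "book_to_price": 0.05},
    "F":    {"asset_class": "equity", "adv": 4e8, "book_to_price": 0.9},
    "KO":   {"asset_class": "equity", "adv": 8e8, "book_to_price": 0.4},
}
print(momentum_view(meta))  # ['AAPL', 'KO']
print(value_view(meta))     # ['F']
```

Because each view is a pure function of the shared metadata, no strategy mutates state another strategy reads — which is what keeps the subsets contamination-free.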
Issue 6: Incremental Updates & Factor Computation
- Support appendable time-series updates.
- Precompute and cache factor scores, correlations, and risk metrics monthly/quarterly.
- Maintain optional index mapping symbols to file locations for fast access.
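The three bullets above can be sketched together — incremental append, a staleness-checked factor cache, and a symbol-to-offset index (all structure names here are hypothetical):

```python
import datetime

prices = {"AAPL": []}   # appendable per-symbol series
factor_cache = {}       # (symbol, factor) -> (as_of, value)
offset_index = {}       # symbol -> first row offset, for fast access

def append_prices(symbol, rows):
    offset_index.setdefault(symbol, len(prices[symbol]))
    prices[symbol].extend(rows)  # incremental: no rewrite of history

def cached_factor(symbol, factor, as_of, compute):
    """Recompute only when the cached value is stale (e.g. monthly)."""
    key = (symbol, factor)
    if key not in factor_cache or factor_cache[key][0] < as_of:
        factor_cache[key] = (as_of, compute())
    return factor_cache[key][1]

append_prices("AAPL", [("2024-01-02", 186.1), ("2024-01-03", 184.3)])
mom = cached_factor("AAPL", "momentum", datetime.date(2024, 1, 31),
                    lambda: prices["AAPL"][-1][1] / prices["AAPL"][0][1] - 1)
print(len(prices["AAPL"]), round(mom, 4))
```

A second call with the same `as_of` date returns the cached value without recomputing — the same pattern applies to correlations and risk metrics on a monthly or quarterly schedule.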
Issue 7: Integrate Queryable & Searchable Storage
- Support efficient filtering, sorting, and selection on structured datasets.
- Evaluate HDF5 + PyTables or Parquet + Polars.
- Include examples of common queries (filter by symbol/date, sort by factor).
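To illustrate the query shapes, here is a stdlib `sqlite3` stand-in (used so the example is self-contained; the same operations map to PyTables `where()` conditions or Polars filter/sort expressions):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE prices (date TEXT, symbol TEXT, close REAL, momentum REAL)")
con.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", [
    ("2024-01-02", "AAPL", 186.1, 0.12),
    ("2024-01-02", "MSFT", 370.9, 0.30),
    ("2024-01-03", "AAPL", 184.3, 0.11),
])

# Common query 1: filter by symbol and date range.
rows = con.execute(
    "SELECT date, close FROM prices WHERE symbol = ? AND date >= ?",
    ("AAPL", "2024-01-03")).fetchall()

# Common query 2: sort a cross-section by factor score.
ranked = con.execute(
    "SELECT symbol FROM prices WHERE date = ? ORDER BY momentum DESC",
    ("2024-01-02",)).fetchall()

print(rows, [s for (s,) in ranked])
```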
Issue 8: Testing & Migration Plan
- Plan migration from current JSON-based storage to new system.
- Implement tests to ensure append, query, and filter operations return correct results.
- Benchmark read/write speeds, especially for thousands of assets and years of daily data.
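A minimal shape for the correctness tests and the benchmark harness, using an in-memory list as a placeholder backend (`append`/`query` here are toy helpers, and 100k rows is a small stand-in for the real thousands-of-assets workload):

```python
import time

def append(store, rows):
    store.extend(rows)

def query(store, symbol):
    return [r for r in store if r[1] == symbol]

# Correctness: append then query must round-trip exactly.
store = []
append(store, [("2024-01-02", "AAPL", 186.1), ("2024-01-02", "MSFT", 370.9)])
assert query(store, "AAPL") == [("2024-01-02", "AAPL", 186.1)]

# Benchmark sketch: time a bulk append. Against the real backend, the same
# harness would time HDF5/Parquet writes and filtered reads.
rows = [("2024-01-02", f"SYM{i}", float(i)) for i in range(100_000)]
t0 = time.perf_counter()
append(store, rows)
elapsed = time.perf_counter() - t0
print(f"appended {len(rows)} rows in {elapsed:.3f}s")
```

For the migration itself, running the same query suite against both the old JSON store and the new backend and diffing the results gives a concrete acceptance criterion.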