A high-throughput, low-latency market data analytics and algorithmic trading platform designed for real-time quantitative research and signal generation. This platform specializes in capturing alpha through Order Book Imbalance (OBI) and Micro-price statistical arbitrage.
In High-Frequency Trading (HFT), the standard "Mid-price" (
The Solution: This platform provides an end-to-end, sub-millisecond pipeline that:
- Ingests massive streams of L2 market events.
- Calculates high-fidelity features (Micro-price/OBI) using vectorized in-memory engines.
- Executes statistical models to identify temporary market inefficiencies.
- Persists data into industrial-grade Delta Lakes for continuous strategy refinement.
The platform is engineered around the physics of market microstructure:
OBI quantifies the relative buying vs. selling pressure at the best bid and offer (BBO). It is a predictor of short-term price direction.
- Formula: $$OBI = \frac{Size_{bid} - Size_{ask}}{Size_{bid} + Size_{ask}}$$
- Range: $[-1, 1]$.
  - $OBI \to 1$: Heavy buying pressure; high probability of an upward "tick".
  - $OBI \to -1$: Heavy selling pressure; high probability of a downward "tick".
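The OBI formula translates directly into a few lines of Python. This is a minimal sketch rather than the platform's own implementation; the function name `order_book_imbalance` and the guard against an empty book are assumptions made for illustration.

```python
def order_book_imbalance(bid_size: float, ask_size: float) -> float:
    """Return OBI in [-1, 1] from the sizes at the best bid and best ask."""
    total = bid_size + ask_size
    if total <= 0:
        # Assumption: an empty top-of-book is treated as an error, not as OBI = 0.
        raise ValueError("bid_size + ask_size must be positive")
    return (bid_size - ask_size) / total


print(order_book_imbalance(900, 100))  # 0.8  (heavy buying pressure)
print(order_book_imbalance(100, 900))  # -0.8 (heavy selling pressure)
print(order_book_imbalance(500, 500))  # 0.0  (balanced book)
```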
The Micro-price is a "fair value" estimate that incorporates order sizes, making it less susceptible to "bid-ask bounce" and more reflective of true market intent.
- Formula: $$P_{micro} = \frac{Price_{bid} \times Size_{ask} + Price_{ask} \times Size_{bid}}{Size_{bid} + Size_{ask}}$$
- Key Insight: When the bid size is significantly larger than the ask size, the Micro-price moves closer to the ask price, signaling an imminent upward shift in the mid-price.
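A small worked example makes the cross-weighting concrete: each price is weighted by the size on the opposite side of the book. The sketch below assumes hypothetical helper names (`micro_price`, `mid_price`) and made-up quote values; it is not taken from the platform's code.

```python
def micro_price(bid_price: float, ask_price: float,
                bid_size: float, ask_size: float) -> float:
    """Size-weighted fair value: each price is weighted by the opposite side's size."""
    total = bid_size + ask_size
    if total <= 0:
        raise ValueError("bid_size + ask_size must be positive")
    return (bid_price * ask_size + ask_price * bid_size) / total


def mid_price(bid_price: float, ask_price: float) -> float:
    """Plain average of the best bid and best ask."""
    return (bid_price + ask_price) / 2


# Illustrative quote: 900 units bid at 100.00 versus 100 units offered at 100.02.
print(round(mid_price(100.00, 100.02), 4))              # 100.01
print(round(micro_price(100.00, 100.02, 900, 100), 4))  # 100.018
```

With nine times more size on the bid, the Micro-price sits near the ask (100.018) while the mid-price stays at 100.01.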
- Zero-Copy Pipeline: Utilizes PyArrow and Protobuf to move data from Kafka to the analysis engine with minimal CPU overhead.
- Vectorized Feature Engineering: Leverages the Polars OLAP engine to conduct 100ms windowed aggregations on tens of thousands of messages per second.
- Industrial Storage: Integrated Delta Lake (Hive-partitioned) for immutable, time-travel-capable historical data storage.
- Integrated Research Lab: A custom-built UI and API that allows for real-time backtesting and Auto-Refinement of strategy parameters via grid search.
- High-Frequency Simulation: Includes a high-performance Geometric Brownian Motion (GBM) simulator capable of generating 50,000+ ticks/sec to stress-test the stack.
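For readers unfamiliar with GBM, the price update behind such a simulator can be sketched in a few lines of standard-library Python. This is an illustrative sketch only: the generator name `gbm_ticks` and its parameters are assumptions, it produces a bare price path rather than full L2 events, and it makes no claim about the throughput of the platform's `MarketSimulator`.

```python
import math
import random
from typing import Iterator, Optional


def gbm_ticks(s0: float, mu: float, sigma: float, dt: float,
              n: int, seed: Optional[int] = None) -> Iterator[float]:
    """Yield n prices following S_{t+dt} = S_t * exp((mu - sigma^2/2) dt + sigma sqrt(dt) Z)."""
    rng = random.Random(seed)              # private RNG so runs are reproducible
    drift = (mu - 0.5 * sigma ** 2) * dt   # deterministic part of the log-return
    vol = sigma * math.sqrt(dt)            # scale of the random shock per step
    price = s0
    for _ in range(n):
        price *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
        yield price


# Five ticks starting at 100 with 5% drift and 20% volatility (annualised, assumed).
for tick in gbm_ticks(s0=100.0, mu=0.05, sigma=0.2, dt=1e-6, n=5, seed=42):
    print(f"{tick:.4f}")
```

Because each step multiplies by an exponential, the simulated price can never go negative, which is the main reason GBM is a common choice for synthetic price feeds.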
- Market Producer: Streams L2 data into Kafka. Includes a GBM-based `MarketSimulator`.
- Real-Time Engine: Subscribes to Kafka, maps payloads to PyArrow, and conducts vectorized feature engineering.
- Delta Lake Storage: Immutable Hive-partitioned records for auditing and backtesting.
- Strategy Research Lab (UI): Direct browser-based control for strategy optimization and backtesting.
- React Dashboard: Real-time visualization of spreads, OBI, and generated trade signals.
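To illustrate what the Real-Time Engine's 100ms windowed feature engineering computes, here is a dependency-free sketch of a tumbling-window mean of OBI. It deliberately uses plain Python instead of Polars, so it shows the logic and not the vectorized implementation; the event layout `(timestamp_ms, bid_size, ask_size)` and the name `window_mean_obi` are assumptions.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, Iterable, Tuple

WINDOW_MS = 100  # tumbling window length in milliseconds

Event = Tuple[int, float, float]  # (timestamp_ms, bid_size, ask_size), assumed layout


def window_mean_obi(events: Iterable[Event]) -> Dict[int, float]:
    """Group events into 100 ms buckets and return the mean OBI per bucket."""
    buckets = defaultdict(list)
    for ts_ms, bid_size, ask_size in events:
        total = bid_size + ask_size
        if total <= 0:
            continue  # skip events with an empty top-of-book
        window_start = (ts_ms // WINDOW_MS) * WINDOW_MS
        buckets[window_start].append((bid_size - ask_size) / total)
    return {start: fmean(values) for start, values in sorted(buckets.items())}


events = [(0, 600, 400), (50, 800, 200), (120, 300, 700)]
for start, mean_obi in window_mean_obi(events).items():
    print(start, round(mean_obi, 4))
```

The first two events fall into the window starting at 0 ms and the third into the window starting at 100 ms, so the result has one mean per window.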
- Python 3.11+
- Poetry
- Docker & Docker Compose
    # Start Kafka, Redpanda, Engine, Producer, API, Frontend, and Prometheus
    docker-compose up -d --build
    # Real-Time Dashboard & Research Lab: http://localhost:5173
    # Prometheus Metrics: http://localhost:9090

The platform includes a Strategy Research Lab integrated into the UI, powered by a dedicated BacktestingEngine.
Adjust the OBI Threshold slider in the UI to see how your strategy would have performed on historical data. View Total PNL, Win Rate, and Trade Count instantly.
The AUTO-REFINE feature allows AI agents or researchers to perform an automated grid search to find the optimal threshold that maximizes signal density and PNL.
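The threshold backtest and grid search can be sketched as follows. This is a simplified illustration under stated assumptions, not the `BacktestingEngine` itself: each signal trades one unit and is closed at the next mid-price, there are no fees or slippage, and the search ranks thresholds by total PnL with trade count as a tie-breaker. The names `backtest`, `grid_search`, and `BacktestResult` are hypothetical.

```python
from typing import Iterable, NamedTuple, Sequence


class BacktestResult(NamedTuple):
    threshold: float
    total_pnl: float
    win_rate: float
    trade_count: int


def backtest(obi: Sequence[float], mid: Sequence[float],
             threshold: float) -> BacktestResult:
    """Go long when OBI > threshold, short when OBI < -threshold; exit at the next mid."""
    pnl, wins, trades = 0.0, 0, 0
    for i in range(min(len(obi), len(mid) - 1)):
        if obi[i] > threshold:
            direction = 1
        elif obi[i] < -threshold:
            direction = -1
        else:
            continue  # no signal at this tick
        move = direction * (mid[i + 1] - mid[i])
        pnl += move
        wins += move > 0
        trades += 1
    win_rate = wins / trades if trades else 0.0
    return BacktestResult(threshold, pnl, win_rate, trades)


def grid_search(obi: Sequence[float], mid: Sequence[float],
                thresholds: Iterable[float]) -> BacktestResult:
    """Return the result with the highest PnL, preferring more trades on ties."""
    results = (backtest(obi, mid, t) for t in thresholds)
    return max(results, key=lambda r: (r.total_pnl, r.trade_count))


obi = [0.9, 0.3, -0.8, 0.6, -0.2]
mid = [100.0, 101.0, 100.5, 99.5, 100.5]
print(grid_search(obi, mid, [0.2, 0.5, 0.7]))
# BacktestResult(threshold=0.5, total_pnl=3.0, win_rate=1.0, trade_count=3)
```

On this toy data a threshold of 0.2 admits one losing trade and 0.7 filters out a winning one, so 0.5 comes out on top.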
To use live data, replace MarketSimulator with a broker-specific client (e.g., Alpaca, IBKR, or Binance).
- Create `src/broker_producer.py`.
- Map the broker's WebSocket message to our `MarketData` protobuf schema.
- Update `docker-compose.yml` to point to the new producer.
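The mapping step might look roughly like the sketch below. Everything specific here is an assumption: the dataclass is only a stand-in for the generated `MarketData` protobuf class (whose real fields are defined in the project's schema), and the JSON payload shape is invented, since each broker uses its own message format.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class MarketData:
    """Stand-in for the generated protobuf class; field names are assumptions."""
    symbol: str
    timestamp_ms: int
    bid_price: float
    bid_size: float
    ask_price: float
    ask_size: float


def from_broker_message(raw: str) -> MarketData:
    """Map one hypothetical broker quote (a JSON string) to MarketData."""
    msg = json.loads(raw)
    return MarketData(
        symbol=msg["symbol"],
        timestamp_ms=int(msg["ts"]),
        # Many brokers send numbers as strings, so convert explicitly.
        bid_price=float(msg["bid"]["price"]),
        bid_size=float(msg["bid"]["size"]),
        ask_price=float(msg["ask"]["price"]),
        ask_size=float(msg["ask"]["size"]),
    )


raw = json.dumps({
    "symbol": "AAPL",
    "ts": 1700000000000,
    "bid": {"price": "189.98", "size": "300"},
    "ask": {"price": "190.00", "size": "120"},
})
print(from_broker_message(raw))
```

Keeping the broker-specific parsing in one function like this means the rest of the producer only ever sees the common schema.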
Rishabh Patil Quantitative Developer & Systems Architect