November 29: Ran into an early development issue where HOLD labels dominated our random forest output, leaving a heavily imbalanced label distribution. This was due to our labeling thresholds being too aggressive: for major S&P companies, a 5% return change over a 10-day hold period is unlikely. I've been experimenting with lowering the threshold to 3% and even 2% so that more BUY and SELL labels are produced. We can also have the model use balanced class weights (setting class_weight="balanced" in our training function), but that will lower accuracy, which I'm trying to avoid if possible.
- Gabriel H.
Nov 27th (Happy Thanksgiving): Moved to a 1% change threshold with a 2-day look-ahead.
- Gabriel H.
New work to be done (November 30th, 10:47 pm): We should add a section in the app explaining where our data and decision labels come from. We use 5- and 10-day moving averages, daily return averages, and other features to determine what to do. Each model is trained on these features and the labels we assigned to them during preprocessing, and learns its own weights for the features to make more informed decisions. The quantum models use additional features/weights and adjust them differently than the classical models. This is the general idea, and I'll add more later so we can have an explanation in our front-end. I'd also love to add graphs that visualize each model's agreement with the RF, since the RF is what we used to preprocess; that is why its own output is 1.0 right now. It would also be great to add a section explaining our depth and shots, since these are simulated and not run on actual quantum hardware.
- Gabriel H.
For reference, this was our original label distribution with a 5% return threshold and a 10-day hold period:

| Label | Count |
|---|---|
| HOLD | 2173 |
| BUY | 536 |
| SELL | 273 |
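To make the imbalance concrete, here is a minimal sketch that turns the counts above into class shares and into the weights that scikit-learn documents for class_weight="balanced", namely n_samples / (n_classes * class_count). The function name `balanced_class_weights` is hypothetical and is not taken from the project code.

```python
# Label counts copied from the original distribution (5% threshold, 10-day hold).
label_counts = {"HOLD": 2173, "BUY": 536, "SELL": 273}


def balanced_class_weights(counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


weights = balanced_class_weights(label_counts)
total = sum(label_counts.values())

for label, count in label_counts.items():
    # Rare classes get larger weights, so their errors count for more in training.
    print(f"{label}: share={count / total:.1%}, weight={weights[label]:.2f}")
```

HOLD makes up roughly 73% of the labels, so balanced weighting scales it down and scales SELL up the most.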
After adjusting params:
Frontend to be done:
- Search capability for ticker selection
- Date restriction (2015-2020)
- Search capability for model selection
Steps to retrain model:
- `cd backend`
- `rm -rf data/processed/*`
- `rm -f models/random_forest.pkl`
- `python3 retrain.py`
Already done:
- Create an accuracy model (Random Forest) to serve as our baseline and as the metric for judging the other models. If a given model makes the same determination as the Random Forest, it is counted as accurate.
- Automate it so it can run over a range of dates with different models. Ranges are given as a month and year to a later month and year, not as specific dates (e.g., from March 2018 to Jan 2020).
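As a rough illustration of this accuracy definition, the sketch below scores a model by the fraction of predictions that match the Random Forest baseline. The function name `agreement_rate` and the sample labels are hypothetical, not taken from the backend.

```python
def agreement_rate(baseline_labels, model_labels):
    """Fraction of positions where a model's label equals the baseline's label."""
    if len(baseline_labels) != len(model_labels):
        raise ValueError("label sequences must have the same length")
    if not baseline_labels:
        raise ValueError("label sequences must not be empty")
    matches = sum(b == m for b, m in zip(baseline_labels, model_labels))
    return matches / len(baseline_labels)


# Hypothetical predictions for four dates.
rf_labels = ["BUY", "HOLD", "HOLD", "SELL"]
other_labels = ["BUY", "HOLD", "SELL", "SELL"]

rate = agreement_rate(rf_labels, other_labels)   # 3 of 4 labels match
self_rate = agreement_rate(rf_labels, rf_labels)  # the baseline always agrees with itself
```

Under this definition the Random Forest scores 1.0 against itself, so its own number says nothing about real predictive accuracy.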
TBD:
- Add Logistic Regression model text
- Add Variational Quantum Classifier model with Qiskit text
- Add CircuitQNN imports text
Run via Docker:
- `docker compose up --build` builds the images and starts the app.
- `docker compose up -d` starts the app in the background so your terminal returns. I used this to rerun the backend after making changes.
- `docker compose ps` shows the status of the running containers; `docker compose logs` shows their logs.
This project builds an end-to-end system that predicts Buy, Hold, or Sell signals for 200 publicly traded companies using historical stock market data from 2015–2020. The classification system combines:
- Classical machine learning models
- Quantum machine learning models (simulated via quantum libraries)
The goal is to compare predictive accuracy between classical and quantum approaches using the same feature set and same labeled dataset.
The project includes:
- A data pipeline (download → feature engineering → labeling → dataset creation)
- Training four models (two classical, two quantum)
- A FastAPI backend that serves predictions
- A React frontend that visualizes predictions and comparisons
A ticker is the stock symbol used to identify a publicly traded company.
Examples:
| Company | Ticker |
|---|---|
| Apple | AAPL |
| Microsoft | MSFT |
| Alphabet (Google) | GOOG |
This project uses ~200 such tickers.
All historical market data is retrieved from Yahoo Finance using the yfinance Python library.
For each ticker, we download:
- Date
- Open
- High
- Low
- Close
- Adjusted Close
- Volume
Raw data is saved as: data/raw/_2015_2020.csv
The models are trained only on engineered features, not on raw OHLCV data.
Measures change from the previous day:
\[ \text{return}_t = \frac{Close_t - Close_{t-1}}{Close_{t-1}} \]
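A minimal pandas sketch of the daily return, assuming closing prices are held in a Series ordered by date. The prices are made up for illustration.

```python
import pandas as pd

# Hypothetical closing prices for four consecutive trading days.
close = pd.Series([100.0, 102.0, 96.9, 96.9], name="Close")

# (Close_t - Close_{t-1}) / Close_{t-1}; shift(1) lines up the previous day's close.
prev_close = close.shift(1)
daily_return = (close - prev_close) / prev_close

print(daily_return)
```

The first row has no previous close, so its return is NaN and would need to be dropped before training.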
Trend indicators:
- MA5 = average closing price over the last 5 days
- MA10 = average closing price over the last 10 days
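The two moving averages can be sketched with a pandas rolling window. This assumes the window includes the current day's close, which the description above does not state either way.

```python
import pandas as pd

# Hypothetical closing prices: 1.0, 2.0, ..., 12.0.
close = pd.Series([float(p) for p in range(1, 13)], name="Close")

# Rolling means over the current day and the preceding 4 (or 9) days.
ma5 = close.rolling(window=5).mean()
ma10 = close.rolling(window=10).mean()

features = pd.DataFrame({"Close": close, "MA5": ma5, "MA10": ma10})
print(features)
```

Rows before a full window is available are NaN: the first 4 rows for MA5 and the first 9 for MA10.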
RSI: a normalized momentum indicator ranging from 0 to 100.
OR
\[ \text{momentum}_{14} = Close_t - Close_{t-14} \]
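Both options can be sketched in a few lines of pandas. The README does not say which RSI variant is used, so the version below is an assumption: it averages gains and losses with a simple 14-day rolling mean rather than Wilder's smoothing. The function names are hypothetical.

```python
import pandas as pd


def momentum(close: pd.Series, window: int = 14) -> pd.Series:
    """Close_t - Close_{t-window}."""
    return close - close.shift(window)


def rsi_simple(close: pd.Series, window: int = 14) -> pd.Series:
    """RSI from simple rolling means of gains and losses (assumed variant)."""
    delta = close.diff()
    gains = delta.clip(lower=0)       # positive moves, zero otherwise
    losses = (-delta).clip(lower=0)   # size of negative moves, zero otherwise
    avg_gain = gains.rolling(window).mean()
    avg_loss = losses.rolling(window).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


# Hypothetical prices: alternate +2 and -1 for 14 days, starting at 100.
prices = [100.0]
for i in range(14):
    prices.append(prices[-1] + (2.0 if i % 2 == 0 else -1.0))
close = pd.Series(prices, name="Close")

mom14 = momentum(close)
rsi14 = rsi_simple(close)
```

Momentum keeps the price scale of each stock, while RSI is bounded, which matters when features from different tickers are combined.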
Standardizes trading volume:
\[ z = \frac{Volume_t - \mu_{20}}{\sigma_{20}} \]
Where \( \mu_{20} \) and \( \sigma_{20} \) are the 20-day mean and standard deviation.
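A short sketch of the volume z-score, under two assumptions that the text does not settle: the 20-day window includes the current day, and the standard deviation is the sample standard deviation (the pandas default, ddof=1). The function name `volume_zscore` is hypothetical.

```python
import pandas as pd


def volume_zscore(volume: pd.Series, window: int = 20) -> pd.Series:
    """(Volume_t - rolling mean) / rolling std over the last `window` days."""
    mean = volume.rolling(window).mean()
    std = volume.rolling(window).std()  # sample std (ddof=1)
    return (volume - mean) / std


# Hypothetical volumes: 1.0, 2.0, ..., 20.0.
volume = pd.Series([float(v) for v in range(1, 21)], name="Volume")
z = volume_zscore(volume)
```

Only the last row has a full 20-day window here, so every earlier row is NaN.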
After computing features, we normalize using one of two methods:
- Per-stock: each stock is normalized using only its own mean and standard deviation.
- Global: features are normalized across all stocks combined.
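The difference between the two normalization methods can be seen on a toy frame. This is a sketch only: the column names `ticker` and `ma5` and the values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "ticker": ["AAPL", "AAPL", "AAPL", "MSFT", "MSFT", "MSFT"],
    "ma5": [10.0, 20.0, 30.0, 100.0, 200.0, 300.0],
})

# Per-stock: subtract each ticker's own mean and divide by its own std.
grouped = df.groupby("ticker")["ma5"]
df["ma5_per_stock"] = (df["ma5"] - grouped.transform("mean")) / grouped.transform("std")

# Global: one mean and one std computed over all tickers together.
df["ma5_global"] = (df["ma5"] - df["ma5"].mean()) / df["ma5"].std()

print(df)
```

With per-stock normalization both tickers map to the same -1, 0, 1 pattern; with global normalization the higher-priced ticker keeps systematically larger values.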
Labels are created using only the price data and the features above.
We define a prediction date t and a forecast horizon H = 10 trading days.
Compute:
\[ \text{future\_return} = \frac{Close_{t+H} - Close_t}{Close_t} \]
Then assign labels:
- BUY → future return > +5%
- SELL → future return < −5%
- HOLD → everything between −5% and +5%
These labels become the ground truth for training all models.
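The labeling rule can be sketched in plain Python. The function names are hypothetical; the horizon and threshold are parameters so that other settings can be tried without changing the logic.

```python
def label_from_future_return(future_return, threshold=0.05):
    """Map a forward return to BUY / SELL / HOLD using symmetric thresholds."""
    if future_return > threshold:
        return "BUY"
    if future_return < -threshold:
        return "SELL"
    return "HOLD"


def make_labels(closes, horizon=10, threshold=0.05):
    """Label every day t that still has a close available at t + horizon."""
    labels = []
    for t in range(len(closes) - horizon):
        future_return = (closes[t + horizon] - closes[t]) / closes[t]
        labels.append(label_from_future_return(future_return, threshold))
    return labels


# Hypothetical closes: flat at 100 for 10 days, then 106, 94, 103.
closes = [100.0] * 10 + [106.0, 94.0, 103.0]
labels = make_labels(closes)
```

The last `horizon` days of each series get no label because their future close is not yet known, and a return of exactly +5% or -5% falls into HOLD.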
Stored in a Python list.
Saved to data/raw/.
Using only:
- daily return
- MA5
- MA10
- RSI or momentum
- volume z-score
Buy/Hold/Sell based on +5% / −5% thresholds over a 10-day horizon.
Stored in data/processed/.
This forms the training dataset.
- Logistic Regression or Linear SVM
- Random Forest or XGBoost
- Variational Quantum Classifier (VQC)
- Quantum Neural Network (QNN)
Endpoints:
- `/api/tickers`
- `/api/predict`
Displays predictions and model comparisons.
- stock-quantum-project/
  - backend/
    - app/
      - main.py
      - config.py
      - schemas.py
      - models/
        - classical.py
        - quantum.py
      - data/
        - load_data.py
    - requirements.txt
  - data/
    - raw/
    - processed/
  - frontend/
    - src/
      - App.tsx
      - components/
      - services/api.ts
    - package.json
- Python
- FastAPI
- scikit-learn
- pandas
- numpy
- Qiskit or PennyLane
- React
- TypeScript
- Yahoo Finance (`yfinance`)
- Build a clean, reproducible dataset covering 2015–2020
- Train classical and quantum ML models on identical data
- Evaluate and compare Buy/Hold/Sell classification accuracy
- Provide a full-stack system with an interactive UI