GitHub - raysteezy/raysteezy-learning: Learning projects — ML, financial data pipelines, and Python development

Raysteezy Learning
Machine Learning · Financial Data Pipelines · Python Development

What this is and why

This is my personal learning experience. I'm a first-year college student teaching myself Python, data pipelines, and machine learning by working with real stock market data. There's nothing fancy here; this is just me figuring things out and documenting what I learn along the way.

The main project tracks Planet Labs (PL) stock data using 5 years of daily price history (since their IPO in April 2021). I built a pipeline that grabs financial data every week, and I've been building prediction models in two rounds — V1 and V2. Both versions live in separate files so you can see the progression.

All the data in this repo also gets synced to Airweave for AI-powered search.

V1 — Baseline Models (Grade: C+)

My first attempt. I used basic regression with no proper validation. These models taught me a lot about what NOT to do, so I'm keeping them for reference.

V1 Price Prediction

Model	R² (Training)	Problem
Linear Regression	0.036	Underfits — barely explains anything
Polynomial (deg-3)	0.853	Overfits — curves up forever past training data

V1 Monte Carlo

Detail	Value
Model	Constant-vol Geometric Brownian Motion (GBM)
Stress tests	Made-up multipliers (not based on real data)
Validation	None
2-year median	$31.02
P(Profit)	46.8%
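For readers new to the model in the table above, here is a minimal sketch of a constant-volatility GBM simulation that produces a 2-year median and a P(Profit). It uses only the Python standard library, and the starting price, drift, and volatility are made-up inputs, not the values in monte_carlo_v1.py, so the printed numbers will not match the table.

```python
import math
import random
import statistics


def simulate_gbm_terminal(s0, mu, sigma, years, n_paths, steps_per_year=252, seed=0):
    """Simulate terminal prices under constant-volatility GBM.

    Each step applies S <- S * exp((mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z),
    where Z is a standard normal draw.
    """
    rng = random.Random(seed)
    dt = 1.0 / steps_per_year
    n_steps = int(years * steps_per_year)
    drift = (mu - 0.5 * sigma ** 2) * dt
    shock = sigma * math.sqrt(dt)
    terminals = []
    for _ in range(n_paths):
        log_return = 0.0
        for _ in range(n_steps):
            log_return += drift + shock * rng.gauss(0.0, 1.0)
        terminals.append(s0 * math.exp(log_return))
    return terminals


# Hypothetical inputs for illustration only.
START_PRICE = 30.0
terminals = simulate_gbm_terminal(s0=START_PRICE, mu=0.05, sigma=0.60, years=2, n_paths=2000)
median_price = statistics.median(terminals)
p_profit = sum(t > START_PRICE for t in terminals) / len(terminals)
print(f"2-year median: ${median_price:.2f}, P(Profit): {p_profit:.1%}")
```

The median can sit below the starting price even with positive drift, because the median of GBM grows at mu - sigma²/2 per year. With these inputs that is 0.05 - 0.18 = -0.13, which is one way a P(Profit) under 50% can come out of this kind of model.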

Why V1 Got a C+

R² was only computed on training data (no out-of-sample testing)
Only used the date as input — no volume, no returns, no fundamentals
No confidence intervals — just single-number predictions
Polynomial extrapolation goes to infinity past the training data
Monte Carlo used constant volatility (real stocks don't behave that way)
Stress scenarios were arbitrary multipliers, not grounded in data
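The first point in the list above is easy to see on a toy example. The sketch below fits a straight line to the first 100 days of a hand-made series and then scores it on 30 later days. The series is synthetic (it rises, then reverses) and is not PL data; the helper names are made up for this illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot; negative when worse than predicting the mean."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def synthetic_price(day):
    """Made-up series: rises for 100 days, then reverses."""
    trend = 10 + 0.1 * day if day < 100 else 20 - 0.1 * (day - 100)
    return trend + 0.5 * math.sin(day)


days = list(range(130))
prices = [synthetic_price(d) for d in days]
train_days, test_days = days[:100], days[100:]
train_prices, test_prices = prices[:100], prices[100:]

# Fit on the training window only, using the day index as the single input.
intercept, slope = fit_line(train_days, train_prices)
train_fit = [intercept + slope * d for d in train_days]
test_fit = [intercept + slope * d for d in test_days]

r2_train = r_squared(train_prices, train_fit)
r2_test = r_squared(test_prices, test_fit)
print(f"R^2 on training days: {r2_train:.3f}")
print(f"R^2 on unseen days:   {r2_test:.3f}")
```

The training score looks great and the unseen-day score is negative, so a training-only R² says very little about how a date-only model behaves past the data it was fit on.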

V1 Files

What	File
Price prediction	prediction_v1.py
Monte Carlo simulation	monte_carlo_v1.py

V2 — Upgraded Models (Grade: A-)

After getting a C+ on V1, I rebuilt everything to fix the 7 problems listed in the table below. This section has everything you need to grade V2.

What I Fixed

Problem from V1	How V2 Fixes It
R² only on training data	Walk-forward validation on 63 unseen trading days
Only used date as a feature	10+ features (lagged returns, volatility, momentum, volume, SMA crossovers)
No overfitting control	Ridge regularization + ARIMA with AIC penalty
No confidence intervals	Bootstrap prediction intervals (90% and 50% bands)
Constant-vol Monte Carlo	Heston stochastic volatility + Merton jump diffusion
Made-up stress scenarios	HMM regime-switching (2 regimes detected from real PL data)
Single model, no comparison	3 MC models side-by-side + V1 vs V2 comparison tables
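To make the feature row in the table above concrete, here is a rough sketch of how lagged returns, volatility, momentum, volume, and an SMA crossover could be computed from daily closes and volumes. The feature names, window lengths, and exact formulas are assumptions for illustration; prediction_v2.py may define them differently.

```python
import statistics


def build_features(closes, volumes, short_window=5, long_window=20):
    """Build one feature row per day t, paired with the NEXT day's return.

    Every feature uses data up to and including day t only, so the target
    (the return on day t + 1) never leaks into the inputs.
    """
    # returns[t] is the simple return from day t - 1 to day t; returns[0] is a placeholder.
    returns = [0.0] + [closes[t] / closes[t - 1] - 1 for t in range(1, len(closes))]
    rows, targets = [], []
    for t in range(long_window, len(closes) - 1):
        short_closes = closes[t - short_window + 1 : t + 1]
        long_closes = closes[t - long_window + 1 : t + 1]
        recent_volumes = volumes[t - short_window + 1 : t + 1]
        rows.append({
            "lag1_return": returns[t],
            "lag2_return": returns[t - 1],
            "volatility": statistics.stdev(returns[t - short_window + 1 : t + 1]),
            "momentum": closes[t] / closes[t - short_window] - 1,
            "volume_ratio": volumes[t] / statistics.mean(recent_volumes),
            # Positive when the short moving average is above the long one.
            "sma_cross": statistics.mean(short_closes) - statistics.mean(long_closes),
        })
        targets.append(returns[t + 1])
    return rows, targets


# Toy inputs: prices rising 1% per day with flat volume (not PL data).
closes = [100 * 1.01 ** i for i in range(40)]
volumes = [1_000_000] * 40
rows, targets = build_features(closes, volumes)
print(len(rows), rows[0])
```

The design choice worth copying is the alignment: features stop at day t and the target is day t + 1, which keeps future information out of the inputs.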

V2 Price Prediction Results

Model	R² (Out-of-Sample)	MAE	RMSE	Directional Accuracy
ARIMA	0.703	$1.14	$1.57	45.2%
Ridge + features	0.737	$1.11	$1.47	50.8%
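The four columns in the table above measure different things. Here is a small sketch of each one on a four-day toy example. The directional accuracy definition used here (does the predicted move from the previous close have the same sign as the actual move?) is an assumption; the repo may compute it another way.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, in the same units as the prices."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """1 - SS_res / SS_tot, measured against predicting the mean of `actual`."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def directional_accuracy(previous, actual, predicted):
    """Share of days where predicted and actual moves from the previous close agree in sign."""
    hits = sum((a - prev) * (p - prev) > 0 for prev, a, p in zip(previous, actual, predicted))
    return hits / len(actual)


# Toy numbers, not model output.
previous = [10.0, 12.0, 14.0, 13.0]
actual = [12.0, 14.0, 13.0, 16.0]
predicted = [11.0, 13.0, 15.0, 15.0]

print(f"MAE  = {mae(actual, predicted):.2f}")
print(f"RMSE = {rmse(actual, predicted):.2f}")
print(f"R^2  = {r_squared(actual, predicted):.2f}")
print(f"Directional accuracy = {directional_accuracy(previous, actual, predicted):.0%}")
```

Because R² is scored on price levels and directional accuracy on the sign of the move, a model can post a decent R² while still calling the direction right only about half the time.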

The key difference: V1's polynomial R² of 0.85 was on training data (cheating). V2's R² of 0.74 is on data the model never saw (honest).
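Scoring on data the model never saw can be sketched as a walk-forward loop: for each held-out day, refit on everything before it, predict that one day, then move forward. The sketch below uses a simple AR(1) least-squares fit as a stand-in for the ARIMA and Ridge models, and synthetic random-walk prices instead of PL data. The 63 held-out days follow the number given under What I Fixed; everything else is an assumption.

```python
import random


def fit_ar1(history):
    """Least-squares fit of y[t] = a + b * y[t-1], using the history only."""
    xs, ys = history[:-1], history[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def ar1_forecast(history):
    """Refit on the history, then predict the next value."""
    a, b = fit_ar1(history)
    return a + b * history[-1]


def naive_forecast(history):
    """Baseline: tomorrow equals today."""
    return history[-1]


def walk_forward(series, n_test, forecast):
    """Predict each of the last n_test points using only the points before it."""
    results = []
    for t in range(len(series) - n_test, len(series)):
        history = series[:t]  # strictly earlier than day t, so no look-ahead
        results.append((series[t], forecast(history)))
    return results


# Synthetic random-walk prices for illustration.
rng = random.Random(42)
prices = [30.0]
for _ in range(499):
    prices.append(prices[-1] * (1 + rng.gauss(0.0, 0.03)))

for name, model in [("naive", naive_forecast), ("AR(1)", ar1_forecast)]:
    pairs = walk_forward(prices, 63, model)
    error = sum(abs(actual - pred) for actual, pred in pairs) / len(pairs)
    print(f"{name}: MAE over {len(pairs)} unseen days = ${error:.2f}")
```

Including a naive baseline is a cheap sanity check: a model that cannot beat "tomorrow equals today" on unseen days is not adding much.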

ARIMA 6-month forecast: $36.35 — 90% CI: [$34.22, $39.79]
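An interval like the one above can come from a residual bootstrap. The sketch below shows the general idea under simple assumptions: resample past forecast errors, add them to the point forecast, and read off percentiles. The residuals are made up, the function names are hypothetical, and this is a one-step version, so it is not the exact procedure in prediction_v2.py and will not reproduce the numbers above.

```python
import random


def percentile(sorted_values, q):
    """Linearly interpolated percentile of an already sorted list, q in [0, 100]."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def bootstrap_interval(point_forecast, residuals, coverage, n_boot=5000, seed=0):
    """Resample residuals with replacement and return (low, high) for the given coverage."""
    rng = random.Random(seed)
    draws = sorted(point_forecast + rng.choice(residuals) for _ in range(n_boot))
    tail = (1.0 - coverage) / 2.0 * 100.0
    return percentile(draws, tail), percentile(draws, 100.0 - tail)


# Hypothetical past forecast errors in dollars (actual minus predicted).
residuals = [-1.5, -1.0, -0.5, -0.2, 0.0, 0.1, 0.4, 0.9, 1.6, 2.5]
point = 36.35  # the point forecast quoted above, reused only as a sample input

low90, high90 = bootstrap_interval(point, residuals, coverage=0.90)
low50, high50 = bootstrap_interval(point, residuals, coverage=0.50)
print(f"90% band: [${low90:.2f}, ${high90:.2f}]")
print(f"50% band: [${low50:.2f}, ${high50:.2f}]")
```

One nice property of this approach is that the band does not have to be symmetric: if past errors are skewed to one side, the interval is too.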

V2 Monte Carlo Results

Model	2-Year Median	P(Profit)	P(Double)	VaR 95%
V1 GBM (baseline)	$31.02	46.8%	23.6%	$5.32
V2 Heston	$23.89	39.3%	21.2%	$2.76
V2 Jump Diffusion	$24.84	40.4%	22.0%	$3.22

The V2 models are more pessimistic than V1 GBM (lower 2-year median and lower P(Profit)) because they account for tail risk and volatility clustering, which constant-volatility GBM ignores.
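To show what "stochastic volatility" means in practice, here is a minimal Heston sketch using an Euler scheme with full truncation (negative variance is floored at zero inside the update). All parameter values are invented for the example and are not calibrated to PL; monte_carlo_v2.py may discretize or parameterize the model differently.

```python
import math
import random
import statistics


def simulate_heston_terminal(s0, v0, mu, kappa, theta, xi, rho,
                             years, n_paths, steps_per_year=252, seed=0):
    """Terminal prices under Heston dynamics.

    dS = mu * S * dt + sqrt(v) * S * dW1
    dv = kappa * (theta - v) * dt + xi * sqrt(v) * dW2,  corr(dW1, dW2) = rho
    """
    rng = random.Random(seed)
    dt = 1.0 / steps_per_year
    sqrt_dt = math.sqrt(dt)
    n_steps = int(years * steps_per_year)
    terminals = []
    for _ in range(n_paths):
        log_s, v = math.log(s0), v0
        for _ in range(n_steps):
            z1 = rng.gauss(0.0, 1.0)
            # Build a second shock with correlation rho to the first.
            z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
            v_pos = max(v, 0.0)  # full truncation keeps sqrt() well defined
            log_s += (mu - 0.5 * v_pos) * dt + math.sqrt(v_pos) * sqrt_dt * z1
            v += kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos) * sqrt_dt * z2
        terminals.append(math.exp(log_s))
    return terminals


# Hypothetical parameters: variance starts at 0.36 (60% vol) and reverts toward 0.36.
terminals = simulate_heston_terminal(
    s0=30.0, v0=0.36, mu=0.05, kappa=2.0, theta=0.36, xi=0.5, rho=-0.5,
    years=2, n_paths=1000,
)
print(f"2-year median: ${statistics.median(terminals):.2f}")
```

Setting xi to 0 with v0 equal to theta freezes the variance, and the model collapses back to constant-volatility GBM, which is a handy check when comparing the two.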

HMM Regime Detection: Found 2 regimes in PL's history — Calm (85% of days, 46% annualized vol) and Volatile (15% of days, 134% annualized vol).
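The regime numbers above are annualized, which can make them hard to picture. The sketch below converts between daily and annualized volatility and summarizes returns by regime label. It assumes the common 252-trading-day convention and takes the labels as given (in the repo they would come from the HMM); the toy returns and labels are made up.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed convention for annualizing daily figures


def annualize(daily_vol):
    """Scale a daily standard deviation to an annual one."""
    return daily_vol * math.sqrt(TRADING_DAYS)


def deannualize(annual_vol):
    """Scale an annual standard deviation back to a daily one."""
    return annual_vol / math.sqrt(TRADING_DAYS)


def regime_summary(returns, labels):
    """Share of days and annualized volatility for each regime label."""
    summary = {}
    for regime in sorted(set(labels)):
        subset = [r for r, label in zip(returns, labels) if label == regime]
        summary[regime] = {
            "share_of_days": len(subset) / len(returns),
            "annualized_vol": annualize(statistics.stdev(subset)),
        }
    return summary


# What the quoted annualized figures mean per day.
print(f"46% annualized  -> {deannualize(0.46):.2%} per day")
print(f"134% annualized -> {deannualize(1.34):.2%} per day")

# Toy daily returns with hypothetical regime labels.
toy_returns = [0.01, -0.01, 0.01, -0.01, 0.08, -0.08]
toy_labels = ["calm", "calm", "calm", "calm", "volatile", "volatile"]
print(regime_summary(toy_returns, toy_labels))
```

Under that convention, 46% annualized works out to roughly 2.9% per day and 134% to roughly 8.4% per day.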

V2 Charts

Chart	What It Shows
Forecast	V1 vs V2 prediction side-by-side
Dashboard	Walk-forward validation + scorecard
Fan Chart	GBM vs Heston vs Jump Diffusion
Stress Tests	5 HMM scenarios from crash to bull
Risk Analysis	Distributions, sensitivity, risk comparison

V2 Files

What	File
Price prediction	prediction_v2.py
Monte Carlo simulation	monte_carlo_v2.py
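Prediction writeup	data/planet-labs/predictions/README.md
Monte Carlo writeup	data/planet-labs/predictions/monte-carlo/README.md
Model results	data/planet-labs/predictions/results.json
MC results	data/planet-labs/predictions/monte-carlo/results.json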

Repo Structure

raysteezy-learning/
├── data/
│   └── planet-labs/                     # Planet Labs (NYSE: PL) financial data
│       ├── quote.json                   #   Current stock price and key metrics
│       ├── income_statement.csv         #   Quarterly income statements
│       ├── balance_sheet.csv            #   Quarterly balance sheets
│       ├── cash_flow.csv               #   Quarterly cash flow statements
│       ├── price_history.csv            #   Full daily OHLCV prices (since IPO)
│       ├── README.md                    #   Data dictionary
│       └── predictions/
│           ├── README.md                #   V1 vs V2 model comparison
│           ├── results.json             #   V2 model results
│           ├── prices.csv               #   V2 forecasts with CIs
│           ├── forecast.png             #   V1 vs V2 prediction chart
│           ├── dashboard.png            #   Walk-forward validation dashboard
│           └── monte-carlo/
│               ├── README.md            #   MC methodology
│               ├── results.json         #   Full V1 vs V2 MC comparison
│               ├── paths.csv            #   Heston percentile paths
│               ├── stress_paths.csv     #   Stress scenario paths
│               ├── sensitivity.csv      #   Parameter sensitivity grid
│               ├── fan_chart.png        #   V1 vs V2 fan chart
│               ├── stress.png           #   HMM stress scenarios
│               └── risk.png             #   Risk analysis dashboard
├── scripts/
│   ├── fetch_planet_labs_financials.py  # Grabs financial data from Yahoo Finance
│   ├── prediction_v1.py                # V1 baseline (linear + polynomial)
│   ├── prediction_v2.py                # V2 upgraded (ARIMA + Ridge + walk-forward)
│   ├── monte_carlo_v1.py               # V1 baseline (constant-vol GBM)
│   └── monte_carlo_v2.py               # V2 upgraded (Heston + jumps + HMM)
├── .github/workflows/
│   └── update-planet-labs-data.yml     # Runs the full pipeline every week
├── .gitignore
├── LEGAL.md                            # All legal stuff in one place
├── LICENSE
├── SECURITY.md
└── README.md                           # You are here

Planet Labs (NYSE: PL) — Weekly Data Feed

I picked Planet Labs because they're a space company that's publicly traded, which I think is cool. The pipeline grabs their financial data once a week and saves it here so I can use it for analysis later.

Dataset	Format	How Often	What's In It
Quote & Metrics	JSON	Weekly	Current price, market cap, P/E ratio, margins
Income Statement	CSV	Weekly	Revenue, gross profit, operating income, EPS
Balance Sheet	CSV	Weekly	Assets, liabilities, equity, cash, debt
Cash Flow	CSV	Weekly	Operating, investing, financing cash flows
Price History	CSV	Weekly	Full daily open/high/low/close/volume (since IPO)

How the pipeline works:

A GitHub Actions workflow runs every Monday at 11:00 PM MST
The Python script uses yfinance to pull data from Yahoo Finance (free, no API key needed)
Both V1 and V2 prediction + Monte Carlo scripts run automatically
Updated files get auto-committed back to this repo
Airweave picks up the changes and indexes everything for search
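One detail in the steps above that is easy to get wrong: GitHub Actions evaluates `schedule` cron expressions in UTC, so a local time has to be converted first. The sketch below works out the UTC cron string for Monday 11:00 PM, assuming "MST" means a fixed UTC-7 offset with no daylight saving. The actual expression in update-planet-labs-data.yml is not shown here, so treat this as an illustration and not as the repo's config.

```python
from datetime import datetime, timedelta, timezone

MST = timezone(timedelta(hours=-7), "MST")  # assumption: fixed offset, no DST


def utc_cron_for(local_weekday, hour, minute, tz):
    """Return a 5-field cron string in UTC for a weekly local-time schedule.

    local_weekday uses cron numbering: 0 = Sunday ... 6 = Saturday.
    """
    # 2024-01-07 was a Sunday, so adding local_weekday days lands on the right weekday.
    local = datetime(2024, 1, 7, hour, minute, tzinfo=tz) + timedelta(days=local_weekday)
    utc = local.astimezone(timezone.utc)
    cron_weekday = (utc.weekday() + 1) % 7  # Python: Monday = 0; cron: Sunday = 0
    return f"{utc.minute} {utc.hour} * * {cron_weekday}"


# Monday 11:00 PM MST is Tuesday 06:00 UTC.
print(utc_cron_for(local_weekday=1, hour=23, minute=0, tz=MST))
```

Note that the weekday rolls over to Tuesday in UTC, which is the part that usually trips people up.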

Want to run it yourself? Go to the Actions tab → "Update Planet Labs Data" → "Run workflow"

Tools I Used

Tool	What I Used It For
Python 3.11	Everything — data collection, ML models, charts
GitHub Actions	Automating the weekly data pulls and model runs
Airweave	Syncing this repo for AI-powered search
yfinance	Getting stock data from Yahoo Finance (free)
NumPy / SciPy	Math for simulations and statistics
Matplotlib	Making all the charts and dashboards
scikit-learn	Linear, polynomial, and Ridge regression
pmdarima	Auto-ARIMA model selection
hmmlearn	Hidden Markov Model for regime detection

How to Run This Yourself

# Clone the repo
git clone https://github.com/raysteezy/raysteezy-learning.git
cd raysteezy-learning

# Install the Python packages you need
pip install yfinance pandas numpy scipy matplotlib scikit-learn pmdarima hmmlearn

# Run the data collector
python scripts/fetch_planet_labs_financials.py

# Run V1 models (baseline)
python scripts/prediction_v1.py
python scripts/monte_carlo_v1.py

# Run V2 models (upgraded)
python scripts/prediction_v2.py
python scripts/monte_carlo_v2.py

Legal

This repository is for educational and academic purposes only. Nothing herein constitutes financial advice, investment recommendations, or a solicitation to buy or sell any security. All models, simulations, and predictions are statistical exercises based on historical data with significant limitations. Past performance does not guarantee future results.

Everything legal — disclaimer, data attribution, security policy, and the MIT License — is in one file: LEGAL.md.

Built by @raysteezy
