raysteezy/raysteezy-learning


Raysteezy Learning
Machine Learning · Financial Data Pipelines · Python Development


What this is and why

This is my personal learning experience. I'm a first-year college student teaching myself Python, data pipelines, and machine learning by working with real stock market data. There's nothing fancy here; it's just me figuring things out and documenting what I learn along the way.

The main project tracks Planet Labs (PL) stock data using 5 years of daily price history (since their IPO in April 2021). I built a pipeline that grabs financial data every week, and I've been building prediction models in two rounds — V1 and V2. Both versions live in separate files so you can see the progression.

All the data in this repo also gets synced to Airweave for AI-powered search.

V1 — Baseline Models (Grade: C+)

My first attempt. I used basic regression with no proper validation. These models taught me a lot about what NOT to do, so I'm keeping them for reference.

V1 Price Prediction

| Model | R² (Training) | Problem |
| --- | --- | --- |
| Linear Regression | 0.036 | Underfits: barely explains anything |
| Polynomial (deg-3) | 0.853 | Overfits: curves up forever past the training data |

V1 Monte Carlo

| Detail | Value |
| --- | --- |
| Model | Constant-vol Geometric Brownian Motion (GBM) |
| Stress tests | Made-up multipliers (not based on real data) |
| Validation | None |
| 2-year median | $31.02 |
| P(Profit) | 46.8% |
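For reference, constant-volatility GBM has a closed-form solution, so terminal prices can be sampled directly instead of stepping through every day. This is a minimal sketch of what a V1-style simulation could look like, not the code in `monte_carlo_v1.py`; `estimate_params`, `simulate_gbm_terminal`, and `summarize` are hypothetical names, and 252 trading days per year is an assumed convention.

```python
import numpy as np

TRADING_DAYS = 252  # assumed trading days per year


def estimate_params(prices):
    """Annualized drift (mu) and volatility (sigma) from daily log returns."""
    log_ret = np.diff(np.log(np.asarray(prices, dtype=float)))
    sigma = log_ret.std(ddof=1) * np.sqrt(TRADING_DAYS)
    # E[log return] = (mu - sigma^2 / 2) * dt, so add sigma^2 / 2 back to get mu
    mu = log_ret.mean() * TRADING_DAYS + 0.5 * sigma**2
    return mu, sigma


def simulate_gbm_terminal(s0, mu, sigma, years, n_paths, seed=0):
    """Exact GBM solution: S_T = S_0 * exp((mu - sigma^2/2) T + sigma sqrt(T) Z)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    return s0 * np.exp((mu - 0.5 * sigma**2) * years + sigma * np.sqrt(years) * z)


def summarize(s0, terminal):
    """Headline statistics like the ones in the V1 Monte Carlo table."""
    terminal = np.asarray(terminal, dtype=float)
    return {
        "median": float(np.median(terminal)),
        "p_profit": float(np.mean(terminal > s0)),
        "p_double": float(np.mean(terminal >= 2 * s0)),
        "p5": float(np.percentile(terminal, 5)),  # 5th-percentile terminal price
    }
```

A typical call would be `mu, sigma = estimate_params(closes)` followed by `summarize(closes[-1], simulate_gbm_terminal(closes[-1], mu, sigma, years=2, n_paths=100_000))`. The single constant `sigma` is exactly the weakness listed below: every simulated day draws from the same volatility. The `p5` entry is one common way to express a 95% downside level; the VaR column in the V2 table may be defined differently.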

Why V1 Got a C+

  • R² was only computed on training data (no out-of-sample testing)
  • Only used the date as input — no volume, no returns, no fundamentals
  • No confidence intervals — just single-number predictions
  • Polynomial extrapolation goes to infinity past the training data
  • Monte Carlo used constant volatility (real stocks don't behave that way)
  • Stress scenarios were arbitrary multipliers, not grounded in data
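The training-only R² problem and the extrapolation problem above are easy to reproduce with a toy example. This is a minimal sketch on synthetic prices (not PL data); `r2_score` and `train_vs_holdout_r2` are hypothetical helpers written for illustration, not functions from `prediction_v1.py`. It fits a polynomial in the day index, holds out the last 63 days, and scores both parts.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SSE / SST."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - sse / sst


def train_vs_holdout_r2(prices, degree, holdout=63):
    """Fit price ~ polynomial(day index) on all but the last `holdout` days,
    then score the fit on the training days and on the unseen days."""
    prices = np.asarray(prices, dtype=float)
    x = np.arange(len(prices)) / (len(prices) - 1)  # day index scaled to [0, 1]
    x_train, y_train = x[:-holdout], prices[:-holdout]
    x_test, y_test = x[-holdout:], prices[-holdout:]
    coefs = np.polyfit(x_train, y_train, degree)
    return (
        r2_score(y_train, np.polyval(coefs, x_train)),
        r2_score(y_test, np.polyval(coefs, x_test)),
    )


# Synthetic random-walk prices standing in for ~5 years of daily closes
rng = np.random.default_rng(0)
prices = 20 * np.exp(np.cumsum(rng.normal(0, 0.03, 1250)))
for degree in (1, 3):
    train_r2, test_r2 = train_vs_holdout_r2(prices, degree)
    print(f"degree {degree}: train R^2 = {train_r2:.3f}, holdout R^2 = {test_r2:.3f}")
```

Because the cubic contains the linear model as a special case, its training R² can never be lower; only the holdout score says whether the extra flexibility helped. Holdout R² can also go negative, which means the model did worse than simply predicting the holdout mean.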

V1 Files

| What | File |
| --- | --- |
| Price prediction | scripts/prediction_v1.py |
| Monte Carlo simulation | scripts/monte_carlo_v1.py |

V2 — Upgraded Models (Grade: A-)

After getting a C+ on V1, I rebuilt everything to fix the 7 problems in the table below. This section has everything you need to grade V2.

What I Fixed

| Problem from V1 | How V2 Fixes It |
| --- | --- |
| R² only on training data | Walk-forward validation on 63 unseen trading days |
| Only used date as a feature | 10+ features (lagged returns, volatility, momentum, volume, SMA crossovers) |
| No overfitting control | Ridge regularization + ARIMA with AIC penalty |
| No confidence intervals | Bootstrap prediction intervals (90% and 50% bands) |
| Constant-vol Monte Carlo | Heston stochastic volatility + Merton jump diffusion |
| Made-up stress scenarios | HMM regime-switching (2 regimes detected from real PL data) |
| Single model, no comparison | 3 MC models side-by-side + V1 vs V2 comparison tables |
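To make the "10+ features" row concrete, here is a minimal pandas sketch of lagged-return, volatility, momentum, volume, and SMA-crossover features for next-day prediction. It assumes yfinance-style `Close` and `Volume` columns; the window lengths, feature names, and `build_features` itself are hypothetical, not taken from `prediction_v2.py`. The key property it shows is that row t only uses data through day t, while the target is day t+1's close.

```python
import numpy as np
import pandas as pd


def build_features(df, lags=(1, 2, 3, 5)):
    """Feature table for predicting the next day's close from daily OHLCV data.

    Assumes 'Close' and 'Volume' columns. Row t only uses data up to day t;
    the 'target' column holds day t+1's close.
    """
    close = df["Close"]
    ret = np.log(close).diff()  # daily log return
    feats = pd.DataFrame(index=df.index)
    for k in lags:
        feats[f"ret_lag{k}"] = ret.shift(k - 1)  # ret_lag1 is day t's own return
    feats["vol_20"] = ret.rolling(20).std()  # 20-day realized volatility
    feats["mom_20"] = close / close.shift(20) - 1  # 20-day momentum
    feats["volume_chg"] = np.log1p(df["Volume"]).diff()  # log1p tolerates zero volume
    sma_10 = close.rolling(10).mean()
    sma_50 = close.rolling(50).mean()
    feats["sma_cross"] = sma_10 / sma_50 - 1  # > 0 when the short SMA is above the long one
    feats["target"] = close.shift(-1)  # next day's close
    return feats.dropna()  # drop warm-up rows and the last row (no target yet)
```

Any regressor (for example scikit-learn's `Ridge`) could then be fit on `feats.drop(columns="target")` against `feats["target"]`, as long as the training rows come strictly before the rows it is scored on.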

V2 Price Prediction Results

| Model | R² (Out-of-Sample) | MAE | RMSE | Directional Accuracy |
| --- | --- | --- | --- | --- |
| ARIMA | 0.703 | $1.14 | $1.57 | 45.2% |
| Ridge + features | 0.737 | $1.11 | $1.47 | 50.8% |

The key difference: V1's polynomial R² of 0.85 was on training data (cheating). V2's R² of 0.74 is on data the model never saw (honest).
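Walk-forward validation, as described in the table of fixes above, can be sketched like this: for each of the last 63 trading days, the forecaster only sees earlier prices. The sketch also scores a naive "tomorrow equals today" baseline, which is worth doing because daily prices are so persistent that even this no-skill forecast tends to get a high out-of-sample R² on price levels; directional accuracy and the gap to the naive baseline say more about real skill. `walk_forward`, `score`, and the `fit_predict(history)` interface are hypothetical names for this illustration, not the code in `prediction_v2.py`.

```python
import numpy as np


def walk_forward(prices, fit_predict, test_days=63):
    """One-step-ahead walk-forward evaluation over the last `test_days` prices.

    For test day t, `fit_predict` receives prices[:t] only and returns a forecast
    of prices[t]; the model is refit every day on the expanding history.
    """
    prices = np.asarray(prices, dtype=float)
    start = len(prices) - test_days
    preds = np.array([fit_predict(prices[:t]) for t in range(start, len(prices))])
    actual = prices[start:]
    previous = prices[start - 1 : -1]  # last known price when each forecast was made
    return actual, preds, previous


def score(actual, preds, previous):
    """R^2, MAE, RMSE, and directional accuracy on the held-out days."""
    err = actual - preds
    sst = np.sum((actual - actual.mean()) ** 2)
    same_direction = np.sign(preds - previous) == np.sign(actual - previous)
    return {
        "r2": float(1.0 - np.sum(err**2) / sst),
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err**2))),
        "directional_accuracy": float(np.mean(same_direction)),
    }


def naive_last_price(history):
    """No-skill baseline: predict that tomorrow's price equals today's."""
    return history[-1]
```

`score(*walk_forward(closes, naive_last_price))` gives the bar a real model has to clear on R², MAE, and RMSE. The naive forecast never predicts a move, so its directional accuracy is close to 0 by this definition (it only "matches" on unchanged days); for direction, 50% is the more sensible reference point.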

ARIMA 6-month forecast: $36.35 — 90% CI: [$34.22, $39.79]
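The table of fixes lists bootstrap prediction intervals (90% and 50% bands). One simple way to build such bands, sketched here as an assumption rather than the method behind the ARIMA interval above, is to resample historical daily log returns with replacement, compound them over the horizon, and read off percentiles. `bootstrap_bands` is a hypothetical helper, and 126 trading days is used as an approximation of 6 months.

```python
import numpy as np


def bootstrap_bands(prices, horizon=126, n_paths=10_000, levels=(90, 50), seed=0):
    """Median and central prediction bands for the price `horizon` trading days ahead.

    Each simulated path is `horizon` daily log returns drawn with replacement
    from the observed history, compounded from the last observed price.
    """
    prices = np.asarray(prices, dtype=float)
    log_ret = np.diff(np.log(prices))
    rng = np.random.default_rng(seed)
    draws = rng.choice(log_ret, size=(n_paths, horizon), replace=True)
    terminal = prices[-1] * np.exp(draws.sum(axis=1))
    bands = {}
    for level in levels:
        tail = (100 - level) / 2  # e.g. 5% in each tail for a 90% band
        lo, hi = np.percentile(terminal, [tail, 100 - tail])
        bands[level] = (float(lo), float(hi))
    return float(np.median(terminal)), bands
```

Resampling days independently ignores volatility clustering, so bands like these are likely too narrow in turbulent periods; a block bootstrap (resampling runs of consecutive days) is a common refinement.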

V2 Monte Carlo Results

| Model | 2-Year Median | P(Profit) | P(Double) | VaR 95% |
| --- | --- | --- | --- | --- |
| V1 GBM (baseline) | $31.02 | 46.8% | 23.6% | $5.32 |
| V2 Heston | $23.89 | 39.3% | 21.2% | $2.76 |
| V2 Jump Diffusion | $24.84 | 40.4% | 22.0% | $3.22 |

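The V2 models are more pessimistic because they're more realistic about risk: Heston lets volatility itself change over time (volatility clustering), and jump diffusion adds sudden large price moves (fat tails).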
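Both V2 models can be written compactly. This is an illustration under my own assumptions, not the code in `monte_carlo_v2.py`: the Heston simulator uses an Euler scheme with "full truncation" (negative variance is floored at zero inside the drift and diffusion terms), and the Merton simulator samples terminal prices directly, using the fact that a Poisson number N of normal log-jumps sums to a normal with mean N·m and variance N·s². The function names and parameter names (`kappa`, `theta`, `xi`, `rho`, `lam`, `jump_mean`, `jump_std`) are hypothetical.

```python
import numpy as np


def simulate_heston_terminal(s0, mu, v0, kappa, theta, xi, rho, years, n_paths,
                             steps_per_year=252, seed=0):
    """Heston stochastic volatility, Euler scheme with full truncation.

    dS/S = mu dt + sqrt(v) dW1,  dv = kappa (theta - v) dt + xi sqrt(v) dW2,
    corr(dW1, dW2) = rho.
    """
    rng = np.random.default_rng(seed)
    dt = 1.0 / steps_per_year
    log_s = np.full(n_paths, np.log(s0))
    v = np.full(n_paths, float(v0))
    for _ in range(int(round(years * steps_per_year))):
        z1 = rng.standard_normal(n_paths)
        z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)
        v_pos = np.maximum(v, 0.0)  # full truncation
        log_s += (mu - 0.5 * v_pos) * dt + np.sqrt(v_pos * dt) * z1
        v += kappa * (theta - v_pos) * dt + xi * np.sqrt(v_pos * dt) * z2
    return np.exp(log_s)


def simulate_merton_terminal(s0, mu, sigma, lam, jump_mean, jump_std, years,
                             n_paths, seed=0):
    """Merton jump diffusion: GBM plus Poisson-timed normal jumps in log price.

    The drift is compensated by lam * k, where k = E[e^Y] - 1, so that
    E[S_T] = s0 * exp(mu * years).
    """
    rng = np.random.default_rng(seed)
    k = np.exp(jump_mean + 0.5 * jump_std**2) - 1.0
    n_jumps = rng.poisson(lam * years, n_paths)
    jumps = n_jumps * jump_mean + np.sqrt(n_jumps) * jump_std * rng.standard_normal(n_paths)
    diffusion = sigma * np.sqrt(years) * rng.standard_normal(n_paths)
    drift = (mu - 0.5 * sigma**2 - lam * k) * years
    return s0 * np.exp(drift + diffusion + jumps)
```

Setting `xi = 0` with `v0 = theta` turns the Heston simulator back into constant-volatility GBM, and `lam = 0` does the same for the Merton one, which makes for a convenient sanity check against a V1-style baseline.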

HMM Regime Detection: Found 2 regimes in PL's history — Calm (85% of days, 46% annualized vol) and Volatile (15% of days, 134% annualized vol).
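Once a 2-state HMM has labeled each day, the regime numbers above (share of days and annualized volatility) come from simple per-regime statistics. This sketch assumes the labels already exist, for example from hmmlearn's `GaussianHMM(n_components=2).fit(X).predict(X)` with `X` being the daily returns reshaped to one column; `regime_summary` is a hypothetical helper, and annualizing with √252 is my assumption about the convention used.

```python
import numpy as np


def regime_summary(returns, states, trading_days=252):
    """Share of days and annualized volatility for each HMM state label.

    Returns (summary, calm_label), where calm_label is the lowest-volatility state.
    """
    returns = np.asarray(returns, dtype=float)
    states = np.asarray(states)
    summary = {}
    for label in np.unique(states):
        r = returns[states == label]
        summary[int(label)] = {
            "share_of_days": len(r) / len(returns),
            "annualized_vol": float(r.std(ddof=1) * np.sqrt(trading_days)),
        }
    # HMM state numbers are arbitrary, so identify the calm regime by its volatility
    calm = min(summary, key=lambda s: summary[s]["annualized_vol"])
    return summary, calm
```

Naming the calm regime by its lower volatility matters because an HMM can return the states in either order from one fit to the next.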

V2 Charts

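| Chart | File | What It Shows |
| --- | --- | --- |
| Forecast | forecast.png | V1 vs V2 prediction side-by-side |
| Dashboard | dashboard.png | Walk-forward validation + scorecard |
| Fan Chart | fan_chart.png | GBM vs Heston vs Jump Diffusion |
| Stress Tests | stress.png | 5 HMM scenarios from crash to bull |
| Risk Analysis | risk.png | Distributions, sensitivity, risk comparison |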
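A fan chart is usually drawn from per-day percentiles across all simulated paths, which is also the kind of data a percentile-paths file would store. This minimal sketch assumes simulated paths in a NumPy array of shape (n_paths, n_days); `percentile_bands` and `plot_fan` are hypothetical helpers, and the 5/25/50/75/95 percentile choice is mine.

```python
import numpy as np


def percentile_bands(paths, percentiles=(5, 25, 50, 75, 95)):
    """Per-day percentiles across simulated paths.

    paths has shape (n_paths, n_days); each returned array has length n_days.
    """
    bands = np.percentile(np.asarray(paths, dtype=float), percentiles, axis=0)
    return dict(zip(percentiles, bands))


def plot_fan(bands, ax=None):
    """Shade the 5-95 and 25-75 bands and draw the median (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily so the math above runs without it

    ax = ax or plt.gca()
    days = np.arange(len(bands[50]))
    ax.fill_between(days, bands[5], bands[95], alpha=0.2, label="5th-95th pct")
    ax.fill_between(days, bands[25], bands[75], alpha=0.4, label="25th-75th pct")
    ax.plot(days, bands[50], label="median")
    ax.legend()
    return ax
```

Computing the bands first and plotting second keeps the numbers reusable, for example for writing them to a CSV alongside the chart.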

V2 Files

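| What | File |
| --- | --- |
| Price prediction | scripts/prediction_v2.py |
| Monte Carlo simulation | scripts/monte_carlo_v2.py |
| Prediction writeup | data/planet-labs/predictions/README.md |
| Monte Carlo writeup | data/planet-labs/predictions/monte-carlo/README.md |
| Model results | data/planet-labs/predictions/results.json |
| MC results | data/planet-labs/predictions/monte-carlo/results.json |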

Repo Structure

```
raysteezy-learning/
├── data/
│   └── planet-labs/                     # Planet Labs (NYSE: PL) financial data
│       ├── quote.json                   #   Current stock price and key metrics
│       ├── income_statement.csv         #   Quarterly income statements
│       ├── balance_sheet.csv            #   Quarterly balance sheets
│       ├── cash_flow.csv                #   Quarterly cash flow statements
│       ├── price_history.csv            #   Full daily OHLCV prices (since IPO)
│       ├── README.md                    #   Data dictionary
│       └── predictions/
│           ├── README.md                #   V1 vs V2 model comparison
│           ├── results.json             #   V2 model results
│           ├── prices.csv               #   V2 forecasts with CIs
│           ├── forecast.png             #   V1 vs V2 prediction chart
│           ├── dashboard.png            #   Walk-forward validation dashboard
│           └── monte-carlo/
│               ├── README.md            #   MC methodology
│               ├── results.json         #   Full V1 vs V2 MC comparison
│               ├── paths.csv            #   Heston percentile paths
│               ├── stress_paths.csv     #   Stress scenario paths
│               ├── sensitivity.csv      #   Parameter sensitivity grid
│               ├── fan_chart.png        #   V1 vs V2 fan chart
│               ├── stress.png           #   HMM stress scenarios
│               └── risk.png             #   Risk analysis dashboard
├── scripts/
│   ├── fetch_planet_labs_financials.py  # Grabs financial data from Yahoo Finance
│   ├── prediction_v1.py                 # V1 baseline (linear + polynomial)
│   ├── prediction_v2.py                 # V2 upgraded (ARIMA + Ridge + walk-forward)
│   ├── monte_carlo_v1.py                # V1 baseline (constant-vol GBM)
│   └── monte_carlo_v2.py                # V2 upgraded (Heston + jumps + HMM)
├── .github/workflows/
│   └── update-planet-labs-data.yml      # Runs the full pipeline every week
├── .gitignore
├── LEGAL.md                             # All legal stuff in one place
├── LICENSE
├── SECURITY.md
└── README.md                            # You are here
```

Planet Labs (NYSE: PL) — Weekly Data Feed

I picked Planet Labs because they're a space company that's publicly traded, which I think is cool. The pipeline grabs their financial data once a week and saves it here so I can use it for analysis later.

| Dataset | Format | How Often | What's In It |
| --- | --- | --- | --- |
| Quote & Metrics | JSON | Weekly | Current price, market cap, P/E ratio, margins |
| Income Statement | CSV | Weekly | Revenue, gross profit, operating income, EPS |
| Balance Sheet | CSV | Weekly | Assets, liabilities, equity, cash, debt |
| Cash Flow | CSV | Weekly | Operating, investing, financing cash flows |
| Price History | CSV | Weekly | Full daily open/high/low/close/volume (since IPO) |

How the pipeline works:

  1. A GitHub Actions workflow runs every Monday at 11:00 PM MST
  2. The Python script uses yfinance to pull data from Yahoo Finance (free, no API key needed)
  3. Both V1 and V2 prediction + Monte Carlo scripts run automatically
  4. Updated files get auto-committed back to this repo
  5. Airweave picks up the changes and indexes everything for search

Want to run it yourself? Go to the Actions tab → "Update Planet Labs Data" → "Run workflow"
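Steps 2 through 4 boil down to "download, then write files". Here is a minimal sketch of that step, assuming yfinance's `Ticker` attributes (`info`, `quarterly_income_stmt`, `quarterly_balance_sheet`, `quarterly_cashflow`, `history`). The quote fields picked out of `info`, the file names (taken from the repo structure above), and the helper names `fetch_snapshot` and `write_snapshot` are my assumptions, not the contents of `fetch_planet_labs_financials.py`.

```python
import json
from pathlib import Path

# Assumed subset of yfinance's `info` dict to keep in quote.json
QUOTE_KEYS = ["currentPrice", "marketCap", "trailingPE",
              "grossMargins", "operatingMargins", "profitMargins"]


def fetch_snapshot(symbol="PL"):
    """Download one snapshot from Yahoo Finance via yfinance (needs network)."""
    import yfinance as yf  # lazy import so write_snapshot works without yfinance

    ticker = yf.Ticker(symbol)
    tables = {
        "income_statement.csv": ticker.quarterly_income_stmt,
        "balance_sheet.csv": ticker.quarterly_balance_sheet,
        "cash_flow.csv": ticker.quarterly_cashflow,
        "price_history.csv": ticker.history(period="max"),  # all available daily OHLCV
    }
    return ticker.info, tables


def write_snapshot(out_dir, info, tables):
    """Write quote.json plus one CSV per table; returns the file names written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    quote = {key: info.get(key) for key in QUOTE_KEYS}  # missing keys become null
    (out / "quote.json").write_text(json.dumps(quote, indent=2))
    for name, frame in tables.items():
        frame.to_csv(out / name)
    return sorted(["quote.json", *tables])
```

In a scheduled job this would run as `write_snapshot("data/planet-labs", *fetch_snapshot("PL"))`, after which the workflow commits whatever changed. Keeping the download and the file writing separate means the writer can be tested without network access.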

Tools I Used

| Tool | What I Used It For |
| --- | --- |
| Python 3.11 | Everything: data collection, ML models, charts |
| GitHub Actions | Automating the weekly data pulls and model runs |
| Airweave | Syncing this repo for AI-powered search |
| yfinance | Getting stock data from Yahoo Finance (free) |
| NumPy / SciPy | Math for simulations and statistics |
| Matplotlib | Making all the charts and dashboards |
| scikit-learn | Linear, polynomial, and Ridge regression |
| pmdarima | Auto-ARIMA model selection |
| hmmlearn | Hidden Markov Model for regime detection |

How to Run This Yourself

```bash
# Clone the repo
git clone https://github.com/raysteezy/raysteezy-learning.git
cd raysteezy-learning

# Install the Python packages you need
pip install yfinance pandas numpy scipy matplotlib scikit-learn pmdarima hmmlearn

# Run the data collector
python scripts/fetch_planet_labs_financials.py

# Run V1 models (baseline)
python scripts/prediction_v1.py
python scripts/monte_carlo_v1.py

# Run V2 models (upgraded)
python scripts/prediction_v2.py
python scripts/monte_carlo_v2.py
```

Legal

This repository is for educational and academic purposes only. Nothing herein constitutes financial advice, investment recommendations, or a solicitation to buy or sell any security. All models, simulations, and predictions are statistical exercises based on historical data with significant limitations. Past performance does not guarantee future results.

Everything legal — disclaimer, data attribution, security policy, and the MIT License — is in one file: LEGAL.md.


Built by @raysteezy
