Raysteezy Learning
Machine Learning · Financial Data Pipelines · Python Development
This is my personal learning experience. I'm a first-year college student teaching myself Python, data pipelines, and machine learning by working with real stock market data. There's nothing fancy here; it's just me figuring things out and documenting what I learn along the way.
The main project tracks Planet Labs (PL) stock data using 5 years of daily price history (since their IPO in April 2021). I built a pipeline that grabs financial data every week, and I've been building prediction models in two rounds — V1 and V2. Both versions live in separate files so you can see the progression.
All the data in this repo also gets synced to Airweave for AI-powered search.
V1 was my first attempt. I used basic regression with no proper validation. These models taught me a lot about what NOT to do, so I'm keeping them for reference.
| Model | R² (Training) | Problem |
|---|---|---|
| Linear Regression | 0.036 | Underfits — barely explains anything |
| Polynomial (deg-3) | 0.853 | Overfits — curves up forever past training data |
| V1 Monte Carlo | Value |
|---|---|
| Model | Constant-vol Geometric Brownian Motion (GBM) |
| Stress tests | Made-up multipliers (not based on real data) |
| Validation | None |
| 2-year median | $31.02 |
| P(Profit) | 46.8% |
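For reference, a constant-volatility GBM like the V1 setup can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in monte_carlo_v1.py: the function names, the 252-trading-day year, and any drift/volatility inputs are assumptions. "P(Profit)" here means the share of paths that end above the starting price, and `p5_price` is simply the 5th-percentile ending price (this README doesn't spell out exactly how the repo computes its VaR number).

```python
import numpy as np

TRADING_DAYS = 252  # assumption: 252 trading days per year


def simulate_gbm(s0, mu, sigma, years=2.0, n_paths=5_000, seed=0):
    """Constant-volatility GBM paths: the same sigma applies on every day of every path."""
    rng = np.random.default_rng(seed)
    n_steps = int(years * TRADING_DAYS)
    dt = 1.0 / TRADING_DAYS
    # Exact log-space step: d(log S) = (mu - sigma^2 / 2) dt + sigma dW
    z = rng.standard_normal((n_paths, n_steps))
    log_steps = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    return s0 * np.exp(np.cumsum(log_steps, axis=1))  # shape (n_paths, n_steps)


def summarize(terminal, s0):
    """Summary stats over the ending prices of all paths."""
    terminal = np.asarray(terminal, dtype=float)
    return {
        "median": float(np.median(terminal)),
        "p_profit": float(np.mean(terminal > s0)),       # ends above the start
        "p_double": float(np.mean(terminal >= 2 * s0)),  # at least doubles
        "p5_price": float(np.percentile(terminal, 5)),   # 5th-percentile ending price
    }
```

For example, `summarize(simulate_gbm(30.0, 0.10, 0.70)[:, -1], 30.0)` runs the sketch with made-up inputs, not PL's fitted parameters.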
What went wrong with V1:
- R² was only computed on training data (no out-of-sample testing)
- Only used the date as input — no volume, no returns, no fundamentals
- No confidence intervals — just single-number predictions
- Polynomial extrapolation goes to infinity past the training data
- Monte Carlo used constant volatility (real stocks don't behave that way)
- Stress scenarios were arbitrary multipliers, not grounded in data
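To see the first and fourth problems concretely, here's a hedged sketch that fits price against time on synthetic random-walk data (not PL prices) and scores the fit on both the training window and the last 63 days it never saw. The helper names are hypothetical. Because a cubic can always match or beat a straight line on the data it was fit to, its training R² is guaranteed to be at least as high; that guarantee says nothing about the unseen days, where the cubic has to extrapolate.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def train_vs_test_r2(prices, degree, test_size=63):
    """Fit price ~ polynomial(time) on the training window; score it in- and out-of-sample."""
    prices = np.asarray(prices, dtype=float)
    n_train = len(prices) - test_size
    # Training window maps to [0, 1); the test window extrapolates past 1
    t = np.arange(len(prices)) / n_train
    coefs = np.polyfit(t[:n_train], prices[:n_train], degree)
    fitted = np.polyval(coefs, t)
    return (
        r2_score(prices[:n_train], fitted[:n_train]),
        r2_score(prices[n_train:], fitted[n_train:]),
    )


rng = np.random.default_rng(42)
prices = 30 * np.exp(np.cumsum(rng.normal(0, 0.03, 1250)))  # synthetic prices, not PL data
for degree in (1, 3):
    in_r2, out_r2 = train_vs_test_r2(prices, degree)
    print(f"degree {degree}: train R² = {in_r2:.3f}, test R² = {out_r2:.3f}")
```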
| What | File |
|---|---|
| Price prediction | prediction_v1.py |
| Monte Carlo simulation | monte_carlo_v1.py |
After getting a C+ on V1, I rebuilt everything to fix the 7 problems in the table below. This section has everything you need to grade V2.
| Problem from V1 | How V2 Fixes It |
|---|---|
| R² only on training data | Walk-forward validation on 63 unseen trading days |
| Only used date as a feature | 10+ features (lagged returns, volatility, momentum, volume, SMA crossovers) |
| No overfitting control | Ridge regularization + ARIMA with AIC penalty |
| No confidence intervals | Bootstrap prediction intervals (90% and 50% bands) |
| Constant-vol Monte Carlo | Heston stochastic volatility + Merton jump diffusion |
| Made-up stress scenarios | HMM regime-switching (2 regimes detected from real PL data) |
| Single model, no comparison | 3 MC models side-by-side + V1 vs V2 comparison tables |
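The walk-forward idea from the first three rows can be sketched with plain NumPy: build features that only use information available at each day's close, refit a ridge regression on everything before the test day, predict the next close, then step forward one day. This is an illustration under assumptions, not prediction_v2.py: the specific features (5 lagged returns, 20-day volatility, 10-day momentum, an SMA(5)/SMA(20) ratio), the closed-form ridge solver, `alpha=10`, and predicting next-day returns instead of prices are all choices made for this sketch. Volume features are left out to keep it short.

```python
import numpy as np


def make_features(close):
    """Hypothetical features from daily closes; row i uses only data up to day i's close."""
    close = np.asarray(close, dtype=float)
    ret = np.diff(close) / close[:-1]  # ret[k] is the return from day k to day k+1
    rows, targets, days = [], [], []
    for i in range(20, len(close) - 1):
        feats = list(ret[i - 5:i][::-1])                  # lag-1 ... lag-5 returns
        feats.append(ret[i - 20:i].std())                 # 20-day volatility
        feats.append(close[i] / close[i - 10] - 1.0)      # 10-day momentum
        feats.append(close[i - 4:i + 1].mean() / close[i - 19:i + 1].mean())  # SMA crossover ratio
        rows.append(feats)
        targets.append(ret[i])                            # next-day return (day i -> i+1)
        days.append(i)
    return np.array(rows), np.array(targets), np.array(days)


def ridge_fit(X, y, alpha):
    """Closed-form ridge on standardized features: w = (X'X + alpha*I)^-1 X'y."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0
    Xs = (X - mu) / sd
    y_mean = y.mean()
    w = np.linalg.solve(Xs.T @ Xs + alpha * np.eye(X.shape[1]), Xs.T @ (y - y_mean))
    return mu, sd, w, y_mean


def ridge_predict(model, X):
    mu, sd, w, y_mean = model
    return ((X - mu) / sd) @ w + y_mean


def walk_forward(close, test_size=63, alpha=10.0):
    """Refit on all earlier rows, predict one day ahead, step forward, repeat."""
    close = np.asarray(close, dtype=float)
    X, y, days = make_features(close)
    preds, actuals = [], []
    for k in range(len(y) - test_size, len(y)):
        model = ridge_fit(X[:k], y[:k], alpha)        # only rows before step k
        pred_ret = ridge_predict(model, X[k:k + 1])[0]
        day = days[k]
        preds.append(close[day] * (1.0 + pred_ret))   # turn the return forecast into a price
        actuals.append(close[day + 1])
    return np.array(preds), np.array(actuals)
```

The model never sees a row whose target falls after the day being predicted, which is what makes the resulting R² an out-of-sample number.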
| Model | R² (Out-of-Sample) | MAE | RMSE | Directional Accuracy |
|---|---|---|---|---|
| ARIMA | 0.703 | $1.14 | $1.57 | 45.2% |
| Ridge + features | 0.737 | $1.11 | $1.47 | 50.8% |
The key difference: V1's polynomial R² of 0.85 was measured on the same data the model was fit to (basically cheating, since it rewards memorizing). V2's R² of 0.74 is measured on 63 trading days the model never saw (honest).
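The four scorecard columns can be computed like this. It's a minimal sketch; in particular, counting a "correct direction" as the predicted and actual moves from the previous close having the same sign is an assumption about how directional accuracy is defined here.

```python
import numpy as np


def evaluate(pred, actual, prev_close):
    """R², MAE, RMSE, and directional accuracy for one-step price forecasts."""
    pred, actual, prev_close = (np.asarray(a, dtype=float) for a in (pred, actual, prev_close))
    err = actual - pred
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return {
        "r2": float(1.0 - ss_res / ss_tot),
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        # Share of days where the predicted and actual moves point the same way
        "directional_accuracy": float(
            np.mean(np.sign(pred - prev_close) == np.sign(actual - prev_close))
        ),
    }
```

Passing `pred = prev_close` (predict no change) gives a naive baseline, which is a useful sanity check next to any R² measured on price levels.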
ARIMA 6-month forecast: $36.35 (90% prediction interval: $34.22 to $39.79)
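A bootstrap interval like this one can be sketched by adding resampled historical forecast errors to the point forecast and reading off percentiles: the 5th and 95th for the 90% band, the 25th and 75th for the 50% band. This is a simplified, hypothetical version; it assumes errors are independent and simply add up over the horizon (about 126 trading days for 6 months), which may not match how prediction_v2.py builds its bands.

```python
import numpy as np


def bootstrap_interval(point_forecast, residuals, horizon=1, n_boot=5000, seed=0):
    """Prediction bands from resampled historical residuals.

    Assumption: forecast errors are independent and add up over the horizon,
    so each simulated outcome is the point forecast plus `horizon` resampled residuals.
    """
    rng = np.random.default_rng(seed)
    residuals = np.asarray(residuals, dtype=float)
    draws = rng.choice(residuals, size=(n_boot, horizon), replace=True)
    sims = point_forecast + draws.sum(axis=1)
    lo90, lo50, hi50, hi90 = np.percentile(sims, [5, 25, 75, 95])
    return {"90%": (float(lo90), float(hi90)), "50%": (float(lo50), float(hi50))}
```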
| Model | 2-Year Median | P(Profit) | P(Double) | VaR 95% |
|---|---|---|---|---|
| V1 GBM (baseline) | $31.02 | 46.8% | 23.6% | $5.32 |
| V2 Heston | $23.89 | 39.3% | 21.2% | $2.76 |
| V2 Jump Diffusion | $24.84 | 40.4% | 22.0% | $3.22 |
The V2 models are more pessimistic because they're more realistic about tail risk and volatility clustering.
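Here's a hedged NumPy sketch of the two V2 ideas. Heston lets the variance itself wander and mean-revert toward a long-run level, with `rho` allowing price and variance shocks to be correlated; it uses an Euler scheme with "full truncation" so the variance never goes negative inside the square roots. Merton adds Poisson-timed, normally distributed jumps to log-price, with the drift compensated by `lam * k` so jumps don't change the expected growth rate. Function names, the discretization, and any parameter values are assumptions; none of them are PL calibrations.

```python
import numpy as np

TRADING_DAYS = 252  # assumption: 252 trading days per year


def heston_terminal(s0, mu, v0, kappa, theta, xi, rho, years=2.0, n_paths=5_000, seed=0):
    """Ending prices under Heston stochastic volatility (Euler scheme, full truncation)."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / TRADING_DAYS
    log_s = np.full(n_paths, np.log(s0), dtype=float)
    v = np.full(n_paths, v0, dtype=float)  # variance, not volatility
    for _ in range(int(years * TRADING_DAYS)):
        z_s = rng.standard_normal(n_paths)
        z_v = rho * z_s + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)  # corr = rho
        v_pos = np.maximum(v, 0.0)  # full truncation: only the non-negative part is used
        log_s += (mu - 0.5 * v_pos) * dt + np.sqrt(v_pos * dt) * z_s
        v += kappa * (theta - v_pos) * dt + xi * np.sqrt(v_pos * dt) * z_v  # mean-reverts to theta
    return np.exp(log_s)


def merton_terminal(s0, mu, sigma, lam, jump_mean, jump_std, years=2.0, n_paths=5_000, seed=0):
    """Ending prices under Merton jump diffusion: GBM plus Poisson-timed normal log-jumps."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / TRADING_DAYS
    k = np.exp(jump_mean + 0.5 * jump_std**2) - 1.0  # expected proportional jump size
    drift = (mu - 0.5 * sigma**2 - lam * k) * dt     # compensator keeps E[S_T] = s0 * exp(mu * T)
    log_s = np.full(n_paths, np.log(s0), dtype=float)
    for _ in range(int(years * TRADING_DAYS)):
        n_jumps = rng.poisson(lam * dt, n_paths)
        # Sum of n_jumps independent N(jump_mean, jump_std^2) draws
        jumps = n_jumps * jump_mean + np.sqrt(n_jumps) * jump_std * rng.standard_normal(n_paths)
        log_s += drift + sigma * np.sqrt(dt) * rng.standard_normal(n_paths) + jumps
    return np.exp(log_s)
```

Comparing `np.median`, `np.mean(terminal > s0)`, and `np.percentile(terminal, 5)` across the two functions gives a table shaped like the one above.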
HMM Regime Detection: Found 2 regimes in PL's history — Calm (85% of days, 46% annualized vol) and Volatile (15% of days, 134% annualized vol).
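Once each day has a regime label, the "share of days" and "annualized vol" figures are simple to compute. The labels themselves would come from a 2-state Gaussian HMM (for example hmmlearn's `GaussianHMM(n_components=2)`, then `.fit(X)` and `.predict(X)` on a column of daily returns); that step is left out so this sketch runs with only NumPy. Annualizing assumes 252 trading days, so 46% annualized vol is about 0.46 / √252 ≈ 2.9% per day and 134% is about 8.4% per day.

```python
import numpy as np

TRADING_DAYS = 252  # assumption: 252 trading days per year


def regime_summary(returns, states):
    """Share of days and annualized volatility for each regime label."""
    returns = np.asarray(returns, dtype=float)
    states = np.asarray(states)
    summary = {}
    for s in np.unique(states):
        r = returns[states == s]
        summary[int(s)] = {
            "share_of_days": float(len(r) / len(returns)),
            # Daily standard deviation scaled by sqrt(252)
            "annualized_vol": float(r.std(ddof=1) * np.sqrt(TRADING_DAYS)),
        }
    return summary
```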
| Chart | What It Shows |
|---|---|
| Forecast | V1 vs V2 prediction side-by-side |
| Dashboard | Walk-forward validation + scorecard |
| Fan Chart | GBM vs Heston vs Jump Diffusion |
| Stress Tests | 5 HMM scenarios from crash to bull |
| Risk Analysis | Distributions, sensitivity, risk comparison |
| What | File |
|---|---|
| Price prediction | prediction_v2.py |
| Monte Carlo simulation | monte_carlo_v2.py |
| Prediction writeup | predictions/README.md |
| Monte Carlo writeup | monte-carlo/README.md |
| Model results | predictions/results.json |
| MC results | monte-carlo/results.json |
```
raysteezy-learning/
├── data/
│   └── planet-labs/                  # Planet Labs (NYSE: PL) financial data
│       ├── quote.json                # Current stock price and key metrics
│       ├── income_statement.csv      # Quarterly income statements
│       ├── balance_sheet.csv         # Quarterly balance sheets
│       ├── cash_flow.csv             # Quarterly cash flow statements
│       ├── price_history.csv         # Full daily OHLCV prices (since IPO)
│       ├── README.md                 # Data dictionary
│       └── predictions/
│           ├── README.md             # V1 vs V2 model comparison
│           ├── results.json          # V2 model results
│           ├── prices.csv            # V2 forecasts with CIs
│           ├── forecast.png          # V1 vs V2 prediction chart
│           ├── dashboard.png         # Walk-forward validation dashboard
│           └── monte-carlo/
│               ├── README.md         # MC methodology
│               ├── results.json      # Full V1 vs V2 MC comparison
│               ├── paths.csv         # Heston percentile paths
│               ├── stress_paths.csv  # Stress scenario paths
│               ├── sensitivity.csv   # Parameter sensitivity grid
│               ├── fan_chart.png     # V1 vs V2 fan chart
│               ├── stress.png        # HMM stress scenarios
│               └── risk.png          # Risk analysis dashboard
├── scripts/
│   ├── fetch_planet_labs_financials.py  # Grabs financial data from Yahoo Finance
│   ├── prediction_v1.py                 # V1 baseline (linear + polynomial)
│   ├── prediction_v2.py                 # V2 upgraded (ARIMA + Ridge + walk-forward)
│   ├── monte_carlo_v1.py                # V1 baseline (constant-vol GBM)
│   └── monte_carlo_v2.py                # V2 upgraded (Heston + jumps + HMM)
├── .github/workflows/
│   └── update-planet-labs-data.yml      # Runs the full pipeline every week
├── .gitignore
├── LEGAL.md                             # All legal stuff in one place
├── LICENSE
├── SECURITY.md
└── README.md                            # You are here
```
I picked Planet Labs because they're a space company that's publicly traded, which I think is cool. The pipeline grabs their financial data once a week and saves it here so I can use it for analysis later.
| Dataset | Format | Updated | What's In It |
|---|---|---|---|
| Quote & Metrics | JSON | Weekly | Current price, market cap, P/E ratio, margins |
| Income Statement | CSV | Weekly | Revenue, gross profit, operating income, EPS |
| Balance Sheet | CSV | Weekly | Assets, liabilities, equity, cash, debt |
| Cash Flow | CSV | Weekly | Operating, investing, financing cash flows |
| Price History | CSV | Weekly | Full daily open/high/low/close/volume (since IPO) |
How the pipeline works:
- A GitHub Actions workflow runs every Monday at 11:00 PM MST
- The Python script uses yfinance to pull data from Yahoo Finance (free, no API key needed)
- Both V1 and V2 prediction and Monte Carlo scripts run automatically
- Updated files get auto-committed back to this repo
- Airweave picks up the changes and indexes everything for search
Want to run it yourself? Go to the Actions tab → "Update Planet Labs Data" → "Run workflow"
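A fetch step like the one described above might look roughly like this. It's a hypothetical sketch, not fetch_planet_labs_financials.py: it assumes yfinance's `Ticker.history(period="max")`, `quarterly_income_stmt`, `quarterly_balance_sheet`, `quarterly_cashflow`, and `info` supply the data, and the output file names simply mirror the ones in the directory tree. yfinance is imported inside the function so the saving logic can be exercised with a stand-in object.

```python
import json
from pathlib import Path


def save_planet_labs_data(out_dir="data/planet-labs", symbol="PL", ticker=None):
    """Hypothetical weekly fetch: save price history, statements, and quote data to disk."""
    if ticker is None:
        import yfinance as yf  # imported lazily so a stub ticker can be passed in for testing
        ticker = yf.Ticker(symbol)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Full daily OHLCV history
    ticker.history(period="max").to_csv(out / "price_history.csv")
    # Quarterly statements (returned as pandas DataFrames by yfinance)
    ticker.quarterly_income_stmt.to_csv(out / "income_statement.csv")
    ticker.quarterly_balance_sheet.to_csv(out / "balance_sheet.csv")
    ticker.quarterly_cashflow.to_csv(out / "cash_flow.csv")
    # Quote and key metrics; default=str covers values json can't serialize directly
    (out / "quote.json").write_text(json.dumps(ticker.info, indent=2, default=str))
    return sorted(p.name for p in out.iterdir())
```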
| Tool | What I Used It For |
|---|---|
| Python 3.11 | Everything — data collection, ML models, charts |
| GitHub Actions | Automating the weekly data pulls and model runs |
| Airweave | Syncing this repo for AI-powered search |
| yfinance | Getting stock data from Yahoo Finance (free) |
| NumPy / SciPy | Math for simulations and statistics |
| Matplotlib | Making all the charts and dashboards |
| scikit-learn | Linear, polynomial, and Ridge regression |
| pmdarima | Auto-ARIMA model selection |
| hmmlearn | Hidden Markov Model for regime detection |
```bash
# Clone the repo
git clone https://github.com/raysteezy/raysteezy-learning.git
cd raysteezy-learning

# Install the Python packages you need
pip install yfinance pandas numpy scipy matplotlib scikit-learn pmdarima hmmlearn

# Run the data collector
python scripts/fetch_planet_labs_financials.py

# Run V1 models (baseline)
python scripts/prediction_v1.py
python scripts/monte_carlo_v1.py

# Run V2 models (upgraded)
python scripts/prediction_v2.py
python scripts/monte_carlo_v2.py
```

This repository is for educational and academic purposes only. Nothing herein constitutes financial advice, investment recommendations, or a solicitation to buy or sell any security. All models, simulations, and predictions are statistical exercises based on historical data with significant limitations. Past performance does not guarantee future results.
Everything legal — disclaimer, data attribution, security policy, and the MIT License — is in one file: LEGAL.md.
Built by @raysteezy