[Phase 4] Parameter Regression Models #5

@Sakeeb91

Objective

Predict atmospheric parameters (Teff, log g, [Fe/H], [alpha/Fe]) with target MAE thresholds.

Dependencies

  • Phase 2: Preprocessing Pipeline (requires cleaned features)
  • Phase 3: Classification (can run in parallel)

Tasks

  • Implement src/models/regressor.py with multi-output regression
  • Implement src/evaluation/regression_metrics.py for MAE, R2, scatter
  • Train separate models for each parameter
  • Analyze residuals for systematic bias
  • Create notebooks/03_regression_analysis.ipynb
  • Save trained models to models/parameter_regressor_v1.joblib
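
A minimal sketch of the "train separate models for each parameter" task, assuming the targets arrive as the columns of a 2-D array in the order Teff, log g, [Fe/H], [alpha/Fe]. The `PARAMETERS` list, the `train_per_parameter` helper, and the column order are assumptions made for illustration. It uses `RandomForestRegressor` only, so the example does not depend on xgboost:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Assumed column order of the target matrix Y (hypothetical, not fixed by this issue).
PARAMETERS = ["Teff", "log g", "[Fe/H]", "[alpha/Fe]"]


def train_per_parameter(X, Y, n_estimators=100, random_state=42):
    """Fit one independent regressor per column of Y and return them by name."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    models = {}
    for column, name in enumerate(PARAMETERS):
        model = RandomForestRegressor(
            n_estimators=n_estimators, random_state=random_state
        )
        # Each model sees the same features but only its own target column.
        models[name] = model.fit(X, Y[:, column])
    return models
```

Keeping the models in a dict keyed by parameter name makes it straightforward to evaluate each one against its own MAE target.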

Files to Create

| File | Purpose |
| --- | --- |
| src/models/regressor.py | Regression models |
| src/evaluation/regression_metrics.py | Regression metrics |
| notebooks/03_regression_analysis.ipynb | Analysis |
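
As a starting point for src/evaluation/regression_metrics.py, here is a small NumPy-only sketch. It assumes "scatter" means the standard deviation of the residuals (prediction minus truth) and also reports the mean residual as a simple bias figure. The function name and the returned keys are hypothetical:

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, R^2, mean residual (bias), and residual scatter."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_pred - y_true

    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))

    return {
        "mae": float(np.mean(np.abs(residuals))),
        # R^2 is undefined when the true values have zero variance.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "bias": float(np.mean(residuals)),
        # Assumption: scatter is the standard deviation of the residuals.
        "scatter": float(np.std(residuals)),
    }
```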

Target Metrics

| Parameter | Target MAE |
| --- | --- |
| Teff | < 100 K |
| log g | < 0.2 dex |
| [Fe/H] | < 0.1 dex |
| [alpha/Fe] | < 0.05 dex |
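
The thresholds above could be encoded once and checked automatically. The sketch below is illustrative: the `TARGET_MAE` dict and `check_targets` helper are hypothetical names, and only the numeric limits come from the table.

```python
# Limits copied from the Target Metrics table (K for Teff, dex for the rest).
TARGET_MAE = {
    "Teff": 100.0,
    "log g": 0.2,
    "[Fe/H]": 0.1,
    "[alpha/Fe]": 0.05,
}


def check_targets(measured_mae):
    """Map each measured parameter to True if its MAE is strictly below target."""
    return {
        name: measured_mae[name] < limit
        for name, limit in TARGET_MAE.items()
        if name in measured_mae  # skip parameters that were not evaluated
    }
```

For example, `check_targets({"Teff": 85.0, "log g": 0.25})` marks Teff as passing and log g as failing.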

Starter Code

# src/models/regressor.py
"""Stellar parameter regression models."""

import numpy as np
import joblib
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

class ParameterRegressor:
    """Regressor for stellar atmospheric parameters."""

    def __init__(self, target: str, model_type: str = "xgboost"):
        self.target = target
        self.model_type = model_type

        if model_type == "xgboost":
            self.model = XGBRegressor(
                n_estimators=100,
                max_depth=6,
                learning_rate=0.1,
                random_state=42
            )
        else:
            # Any model_type other than "xgboost" falls back to a random forest.
            self.model = RandomForestRegressor(
                n_estimators=100,
                random_state=42
            )

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ParameterRegressor":
        """Train the regressor."""
        self.model.fit(X, y)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        """Predict parameter values."""
        return self.model.predict(X)

    def save(self, path: str) -> None:
        """Save model to disk."""
        joblib.dump({"model": self.model, "target": self.target}, path)
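
The starter code has `save()` but no way to read a model back. A possible counterpart is sketched below. The `load_regressor` name is hypothetical, and it relies only on the payload layout that `save()` writes, a dict with the keys "model" and "target":

```python
import joblib


def load_regressor(path):
    """Load a payload written by ParameterRegressor.save().

    Returns the fitted estimator and the name of the parameter it predicts.
    """
    payload = joblib.load(path)
    # save() stores exactly these two keys; model_type is not persisted.
    return payload["model"], payload["target"]
```

Since `save()` does not store `model_type`, rebuilding a full `ParameterRegressor` would require either inferring the type from the estimator class or adding that key to the saved dict.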

Definition of Done

  • Teff MAE < 100 K
  • log g MAE < 0.2 dex
  • [Fe/H] MAE < 0.1 dex
  • [alpha/Fe] MAE < 0.05 dex
  • Residual analysis shows no systematic bias
  • All tests passing
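
One way to make "no systematic bias" checkable is to look at the mean residual in bins of the true value: a trend across bins suggests the model over- or under-predicts in part of the parameter range. This is a sketch under that assumption. The `binned_bias` helper and the equal-width binning are illustrative choices, and this issue does not define what size of trend counts as biased:

```python
import numpy as np


def binned_bias(y_true, y_pred, n_bins=5):
    """Mean residual (pred - true) in equal-width bins of the true value.

    Returns a list of (bin_low, bin_high, count, mean_residual) tuples,
    skipping empty bins.
    """
    y_true = np.asarray(y_true, dtype=float)
    residuals = np.asarray(y_pred, dtype=float) - y_true

    edges = np.linspace(y_true.min(), y_true.max(), n_bins + 1)
    # Use interior edges only so the maximum value falls in the last bin.
    idx = np.digitize(y_true, edges[1:-1])

    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append(
                (float(edges[b]), float(edges[b + 1]),
                 int(mask.sum()), float(residuals[mask].mean()))
            )
    return rows
```

Plotting the last column against the bin centers in the analysis notebook gives a quick visual check alongside the overall MAE.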

Technical Notes

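  • Different parameters have different error characteristics, so evaluate and tune each model separately
  • Consider parameter correlations (e.g., the Teff-log g relation)
  • XGBoost often outperforms random forests on tabular regression, but compare both on held-out data before committing to one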
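
To see whether errors in one parameter track errors in another (for instance Teff and log g), one option is to correlate the per-parameter residuals. A minimal sketch, assuming residuals are kept in a dict keyed by parameter name; the `residual_correlations` helper is hypothetical:

```python
import numpy as np


def residual_correlations(residuals):
    """Pearson correlation matrix between per-parameter residual vectors.

    `residuals` maps a parameter name to a 1-D sequence of residuals,
    all of the same length and in the same star order.
    """
    names = list(residuals)
    stacked = np.vstack([np.asarray(residuals[n], dtype=float) for n in names])
    # np.corrcoef treats each row as one variable.
    return names, np.corrcoef(stacked)
```

Strong off-diagonal values would suggest that independently trained models share a common source of error, which may be worth examining in the residual analysis.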

Part of #1 (Meta Issue)
