
[Phase 3] Baseline Stellar Classification Model #4

@Sakeeb91


Objective

Train a stellar-type classifier that achieves >85% accuracy on spectroscopic features.

Dependencies

  • Phase 2: Preprocessing Pipeline (requires cleaned data)

Tasks

  • Define discrete stellar-type labels (dwarf, giant, etc.) from continuous parameters such as log g
  • Implement src/models/classifier.py with StellarClassifier class
  • Implement src/evaluation/metrics.py for evaluation utilities
  • Handle class imbalance (more dwarfs than giants)
  • Extract and analyze feature importance
  • Save trained model to models/stellar_classifier_v1.joblib
  • Write unit tests for classifier
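
A minimal sketch of the label-definition task, assuming labels come only from the log g cut given under Technical Notes (giant below 3.5, dwarf otherwise). The helper name `assign_stellar_type` is hypothetical, and a finer scheme (subgiants, supergiants) would need more inputs such as Teff.

```python
import numpy as np

# Assumed cut taken from the Technical Notes: log g < 3.5 -> giant, else dwarf.
LOGG_GIANT_THRESHOLD = 3.5


def assign_stellar_type(logg, threshold: float = LOGG_GIANT_THRESHOLD) -> np.ndarray:
    """Map surface gravity (log g, cgs) to 'giant' / 'dwarf' labels (hypothetical helper)."""
    logg = np.asarray(logg, dtype=float)
    # NaN < threshold is False, so NaNs would silently become 'dwarf'; reject them instead.
    if np.isnan(logg).any():
        raise ValueError("log g contains NaN; expected cleaned data from Phase 2")
    return np.where(logg < threshold, "giant", "dwarf")


print(assign_stellar_type([4.4, 2.5, 3.5]))  # ['dwarf' 'giant' 'dwarf']
```

Rejecting NaNs rather than labelling them keeps missing surface gravities from inflating the dwarf class.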

Files to Create

| File | Purpose |
|------|---------|
| `src/models/__init__.py` | Subpackage init |
| `src/models/classifier.py` | Classifier implementation |
| `src/evaluation/__init__.py` | Subpackage init |
| `src/evaluation/metrics.py` | Custom metrics |
| `tests/test_classifier.py` | Model tests |
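
One possible shape for `src/evaluation/metrics.py`, sketched under the assumption that "evaluation utilities" means accuracy plus a confusion matrix. The function name `evaluate_classifier` and its return keys are hypothetical; balanced accuracy is included because plain accuracy can look good simply by predicting the majority (dwarf) class.

```python
"""Sketch of evaluation helpers for src/evaluation/metrics.py (hypothetical API)."""

from typing import Dict, Sequence

from sklearn.metrics import accuracy_score, balanced_accuracy_score, confusion_matrix


def evaluate_classifier(y_true: Sequence, y_pred: Sequence, labels: Sequence) -> Dict[str, object]:
    """Return overall accuracy, balanced accuracy and a confusion matrix."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # Mean of per-class recall, so the minority class (giants) counts equally.
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
        # Rows are true classes, columns are predicted classes, both in `labels` order.
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=list(labels)),
    }


report = evaluate_classifier(
    ["dwarf", "dwarf", "dwarf", "giant"],
    ["dwarf", "dwarf", "giant", "giant"],
    labels=["dwarf", "giant"],
)
print(report)
```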

Starter Code

```python
# src/models/classifier.py
"""Stellar type classification model."""

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier


class StellarClassifier:
    """Classifier for stellar types based on spectroscopic features."""

    def __init__(self, n_estimators: int = 100, random_state: int = 42):
        self.model = RandomForestClassifier(
            n_estimators=n_estimators,
            random_state=random_state,
            class_weight="balanced",  # weight classes inversely to their frequency
        )

    def fit(self, X: np.ndarray, y: np.ndarray) -> "StellarClassifier":
        """Train the classifier."""
        self.model.fit(X, y)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        """Predict stellar types."""
        return self.model.predict(X)

    def score(self, X: np.ndarray, y: np.ndarray) -> float:
        """Return mean accuracy on (X, y)."""
        return self.model.score(X, y)

    def get_feature_importance(self) -> np.ndarray:
        """Return impurity-based feature importances (available only after fit)."""
        return self.model.feature_importances_

    def save(self, path: str) -> None:
        """Save the underlying RandomForestClassifier to disk with joblib."""
        joblib.dump(self.model, path)
```
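
For the feature-importance task, a small sketch that pairs `feature_importances_` with column names and sorts them. The helper `rank_feature_importance` and the names `logg`, `teff`, `feh` are hypothetical, and the data is synthetic (only the first column carries signal), not the project's real features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_feature_importance(importances, feature_names):
    """Pair impurity-based importances with names, sorted from most to least important."""
    importances = np.asarray(importances, dtype=float)
    if importances.shape[0] != len(feature_names):
        raise ValueError("one name per feature is required")
    order = np.argsort(importances)[::-1]
    return [(feature_names[i], float(importances[i])) for i in order]


# Synthetic demo: only column 0 (hypothetically named "logg") determines the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X[:, 0] < 0).astype(int)
model = RandomForestClassifier(n_estimators=100, random_state=42, class_weight="balanced").fit(X, y)

ranking = rank_feature_importance(model.feature_importances_, ["logg", "teff", "feh"])
print(ranking)
```

Impurity-based importances can be biased toward high-cardinality features, so `sklearn.inspection.permutation_importance` on a held-out set is a common cross-check before documenting the ranking.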

Definition of Done

  • Classifier achieves >85% accuracy on test set
  • Confusion matrix shows reasonable class separation
  • Model saved and loadable via joblib
  • Feature importance extracted and documented
  • All tests passing
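
A possible test for the "saved and loadable via joblib" criterion, written as a pytest-style function on synthetic data. Because `save()` dumps the underlying `RandomForestClassifier`, this sketch round-trips that estimator directly; the test name and data are assumptions, not part of the issue.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def test_model_roundtrip_via_joblib():
    """A reloaded model should predict exactly like the original (synthetic data)."""
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
    model = RandomForestClassifier(n_estimators=50, random_state=42, class_weight="balanced").fit(X, y)

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stellar_classifier_v1.joblib")
        joblib.dump(model, path)
        reloaded = joblib.load(path)

    np.testing.assert_array_equal(model.predict(X), reloaded.predict(X))
    np.testing.assert_allclose(model.feature_importances_, reloaded.feature_importances_)
```

Using a temporary directory keeps the test from overwriting `models/stellar_classifier_v1.joblib`.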

Technical Notes

  • Use class_weight="balanced" for imbalanced classes
  • Consider stratified cross-validation for robust evaluation
  • Surface gravity log g < 3.5 (cgs units) typically indicates giants; log g ≥ 3.5 indicates dwarfs
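
A sketch combining the first two notes: `class_weight="balanced"` together with stratified 5-fold cross-validation. The data is a synthetic, roughly 80/20 dwarf/giant stand-in (label 0 = dwarf, 1 = giant, column 0 playing the role of log g), so the scores it prints say nothing about the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced stand-in: about 80% dwarfs (0) and 20% giants (1).
rng = np.random.default_rng(42)
n = 1000
y = (rng.random(n) < 0.2).astype(int)
X = rng.normal(size=(n, 3))
X[:, 0] += np.where(y == 1, -1.5, 1.5)  # giants get lower values in the log g-like column

model = RandomForestClassifier(n_estimators=100, random_state=42, class_weight="balanced")
# Stratified folds keep the dwarf/giant ratio roughly constant in every train/test split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
acc = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
bal = cross_val_score(model, X, y, cv=cv, scoring="balanced_accuracy")
print(f"accuracy {acc.mean():.3f} +/- {acc.std():.3f}; balanced accuracy {bal.mean():.3f}")
```

Reporting balanced accuracy alongside plain accuracy shows whether the >85% target is being met on giants as well as on the majority dwarf class.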

Part of #1 (Meta Issue)
