[Phase 3] Baseline Stellar Classification Model

## Objective

Train a stellar type classifier that achieves >85% accuracy on a held-out test set using spectroscopic features.

## Dependencies

- Phase 2: Preprocessing Pipeline (requires cleaned data)

## Tasks

- [ ] Define stellar type labels from continuous parameters (dwarf/giant/etc.)
- [ ] Implement `src/models/classifier.py` with `StellarClassifier` class
- [ ] Implement `src/evaluation/metrics.py` for evaluation utilities
- [ ] Handle class imbalance (more dwarfs than giants)
- [ ] Extract and analyze feature importance
- [ ] Save trained model to `models/stellar_classifier_v1.joblib`
- [ ] Write unit tests for classifier
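
As a starting point for the evaluation utilities, here is a minimal sketch of what `src/evaluation/metrics.py` might contain. The function name `evaluate_predictions` and the shape of the returned dictionary are assumptions for illustration, not part of the issue. It reports balanced accuracy and per-class recall alongside plain accuracy, since plain accuracy can look good on a dwarf-heavy sample even when giants are misclassified.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, confusion_matrix


def evaluate_predictions(y_true, y_pred, labels):
    """Summarize predictions (hypothetical helper, not defined in the issue)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Rows are true classes, columns are predicted classes, in `labels` order.
    cm = confusion_matrix(y_true, y_pred, labels=labels)

    # Per-class recall: diagonal over row sums, leaving 0.0 for empty classes.
    row_sums = cm.sum(axis=1)
    recall = np.divide(
        cm.diagonal(), row_sums,
        out=np.zeros(len(labels)), where=row_sums > 0,
    )

    return {
        "accuracy": float((y_true == y_pred).mean()),
        "balanced_accuracy": float(balanced_accuracy_score(y_true, y_pred)),
        "confusion_matrix": cm,
        "per_class_recall": {lab: float(r) for lab, r in zip(labels, recall)},
    }


# Tiny hand-made example: three dwarfs and one giant.
report = evaluate_predictions(
    ["dwarf", "dwarf", "dwarf", "giant"],
    ["dwarf", "dwarf", "giant", "giant"],
    labels=["dwarf", "giant"],
)
print(report["per_class_recall"])
```

Passing `labels` explicitly keeps the row and column order of the confusion matrix stable even if a class is missing from a given test split.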

## Files to Create

| File | Purpose |
|------|---------|
| `src/models/__init__.py` | Subpackage init |
| `src/models/classifier.py` | Classifier implementation |
| `src/evaluation/__init__.py` | Subpackage init |
| `src/evaluation/metrics.py` | Custom metrics |
| `tests/test_classifier.py` | Model tests |

## Starter Code

```python
# src/models/classifier.py
"""Stellar type classification model."""

import numpy as np
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

class StellarClassifier:
    """Classifier for stellar types based on spectroscopic features."""

    def __init__(self, n_estimators: int = 100, random_state: int = 42):
        self.model = RandomForestClassifier(
            n_estimators=n_estimators,
            random_state=random_state,
            class_weight="balanced"  # Handle imbalance
        )

    def fit(self, X: np.ndarray, y: np.ndarray) -> "StellarClassifier":
        """Train the classifier."""
        self.model.fit(X, y)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        """Predict stellar types."""
        return self.model.predict(X)

    def score(self, X: np.ndarray, y: np.ndarray) -> float:
        """Return accuracy score."""
        return self.model.score(X, y)

    def get_feature_importance(self) -> np.ndarray:
        """Return feature importances."""
        return self.model.feature_importances_

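    def save(self, path: str) -> None:
        """Save the fitted model to disk."""
        joblib.dump(self.model, path)

    @classmethod
    def load(cls, path: str) -> "StellarClassifier":
        """Load a model previously written by `save`."""
        instance = cls()
        instance.model = joblib.load(path)
        return instance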
```
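
For the feature importance task, the raw array returned by `feature_importances_` is easier to document once it is paired with column names. The sketch below is one way to do that. The helper `rank_features`, the feature names, and the synthetic data are all assumptions standing in for the real Phase 2 output.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def rank_features(importances, feature_names):
    """Pair importances with names, sorted from most to least important."""
    order = np.argsort(importances)[::-1]
    return [(feature_names[i], float(importances[i])) for i in order]


# Synthetic, imbalanced stand-in for the cleaned feature matrix.
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=2, n_redundant=0,
    weights=[0.8, 0.2], random_state=42,
)
feature_names = ["teff", "logg", "feh", "snr"]  # hypothetical column names

model = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=42
)
model.fit(X, y)

ranking = rank_features(model.feature_importances_, feature_names)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances from a random forest sum to 1 and can favor high-cardinality features, so they are best read as a relative ranking rather than an absolute measure.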

## Definition of Done

- [ ] Classifier achieves >85% accuracy on test set
- [ ] Confusion matrix shows reasonable class separation
- [ ] Model saved and loadable via joblib
- [ ] Feature importance extracted and documented
- [ ] All tests passing
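
The "saved and loadable" item can be checked with a round-trip test along these lines. This is a sketch only: it uses a bare `RandomForestClassifier` on random data and a temporary directory instead of the real model and `models/` path, so that it runs on its own.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Small synthetic problem; the real test would use fixture data.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = (X[:, 0] > 0).astype(int)

model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "stellar_classifier_v1.joblib"
    joblib.dump(model, path)
    restored = joblib.load(path)

# A restored model should reproduce the original predictions exactly.
roundtrip_ok = np.array_equal(model.predict(X), restored.predict(X))
print("round trip ok:", roundtrip_ok)
```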

## Technical Notes

- Use `class_weight="balanced"` for imbalanced classes
- Consider stratified cross-validation for robust evaluation
- Surface gravity log g < 3.5 (cgs, dex) typically indicates giants; log g > 3.5 indicates dwarfs
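
The notes above can be combined into a short sketch: derive labels from a log g cut, then score a class-weighted forest with stratified folds. Everything here is illustrative. The helper `label_from_logg`, the treatment of log g exactly equal to 3.5 as a dwarf, and the synthetic sample are assumptions, and a two-class cut ignores any finer types the final label scheme may include.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

LOGG_GIANT_THRESHOLD = 3.5  # cut taken from the note above


def label_from_logg(logg):
    """Map log g to 'giant' or 'dwarf' with a single threshold (hypothetical)."""
    logg = np.asarray(logg, dtype=float)
    return np.where(logg < LOGG_GIANT_THRESHOLD, "giant", "dwarf")


# Synthetic sample: roughly 80% dwarfs and 20% giants.
rng = np.random.default_rng(42)
n_dwarf, n_giant = 320, 80
logg_true = np.concatenate([
    rng.normal(4.4, 0.2, n_dwarf),
    rng.normal(2.5, 0.4, n_giant),
])
y = label_from_logg(logg_true)

# Features: a temperature column plus a noisy log g estimate. Using the exact
# value that defined the labels would make the problem trivially easy.
teff = rng.normal(5500, 600, n_dwarf + n_giant)
logg_obs = logg_true + rng.normal(0, 0.3, n_dwarf + n_giant)
X = np.column_stack([teff, logg_obs])

model = RandomForestClassifier(
    n_estimators=50, class_weight="balanced", random_state=42
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="balanced_accuracy")
print(f"balanced accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Stratified folds keep the dwarf-to-giant ratio roughly constant across splits, so no fold ends up with too few giants to evaluate.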

---
Part of #1 (Meta Issue)

