[Phase 1] Data Foundation - APOGEE Data Loading #2

@Sakeeb91

Objective

Download APOGEE allStar file and create data loading infrastructure with quality filtering.

Dependencies

None - this is the foundation phase.

Tasks

  • Download APOGEE DR17 allStar FITS file from SDSS
  • Implement src/data/apogee_loader.py with load_apogee_allstar() function
  • Implement src/data/quality_filters.py for quality flag filtering
  • Create configs/data_config.yaml with data paths
  • Create notebooks/01_data_exploration.ipynb for initial EDA
  • Write unit tests for data loading
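For the download task, a minimal standard-library sketch of a streamed download could look like the following. The URL constant is an assumption about where the DR17 allStar file lives on the SDSS Science Archive Server and should be confirmed against the SDSS data access pages; `download_file` and the `data/raw/` destination in the usage note are hypothetical names, not part of this issue.

```python
from __future__ import annotations

import shutil
import urllib.request
from pathlib import Path

# ASSUMPTION: location of the DR17 allStar file; verify before use.
ALLSTAR_URL = (
    "https://data.sdss.org/sas/dr17/apogee/spectro/aspcap/dr17/"
    "synspec_rev1/allStar-dr17-synspec_rev1.fits"
)


def download_file(url: str, dest: str | Path, chunk_size: int = 1 << 20) -> Path:
    """Stream `url` to `dest` in chunks; skip the download if `dest` exists."""
    dest = Path(dest)
    if dest.exists():
        return dest

    dest.parent.mkdir(parents=True, exist_ok=True)

    # Write to a temporary ".part" file first so an interrupted download
    # never leaves a truncated file at the final path.
    tmp = dest.with_suffix(dest.suffix + ".part")
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as fh:
        shutil.copyfileobj(response, fh, length=chunk_size)
    tmp.replace(dest)
    return dest
```

Usage would be along the lines of `download_file(ALLSTAR_URL, "data/raw/allStar-dr17-synspec_rev1.fits")`. Streaming in chunks keeps memory use flat regardless of file size.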

Files to Create

File                         Purpose
src/__init__.py              Package init
src/data/__init__.py         Subpackage init
src/data/apogee_loader.py    Load APOGEE data
src/data/quality_filters.py  Filter bad data
configs/data_config.yaml     Data paths

Starter Code

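# src/data/apogee_loader.py
"""APOGEE DR17 data loader."""

from __future__ import annotations  # allow `X | None` hints on Python < 3.10

from pathlib import Path

import pandas as pd
from astropy.io import fits
from astropy.table import Table

# Scalar columns loaded by default: identifiers, stellar parameters with
# uncertainties, quality bitmasks, S/N, and J/H/K magnitudes.
DEFAULT_COLUMNS = [
    "APOGEE_ID", "RA", "DEC",
    "TEFF", "TEFF_ERR", "LOGG", "LOGG_ERR",
    "FE_H", "FE_H_ERR", "ALPHA_M", "ALPHA_M_ERR",
    "ASPCAPFLAG", "STARFLAG", "SNR", "J", "H", "K"
]

def load_apogee_allstar(
    filepath: str | Path,
    columns: list[str] | None = None
) -> pd.DataFrame:
    """Load selected columns of an APOGEE allStar FITS file into a DataFrame."""
    filepath = Path(filepath)
    if not filepath.exists():
        raise FileNotFoundError(f"FITS file not found: {filepath}")

    cols_to_load = columns or DEFAULT_COLUMNS

    with fits.open(filepath, memmap=True) as hdul:
        # HDU 1 holds the allStar binary table.
        table = Table.read(hdul[1])

        # Fail early with a readable message if a requested column is absent.
        missing = [c for c in cols_to_load if c not in table.colnames]
        if missing:
            raise KeyError(f"Columns not found in {filepath.name}: {missing}")

        # Select columns before converting: the full table includes
        # multidimensional columns, which to_pandas() rejects.
        df = table[cols_to_load].to_pandas()

    return df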

Definition of Done

  • APOGEE allStar file downloaded (>1GB)
  • Loader returns DataFrame with >600,000 stars
  • Quality filters remove flagged stars
  • EDA notebook shows Teff, log g, [Fe/H] distributions
  • All tests passing

Technical Notes

  • FITS file can be >1GB; use memmap=True for memory efficiency
  • Quality flags are bitmasks - use bitwise AND to check specific flags
  • Missing values encoded as -9999 in APOGEE data
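The bitmask and missing-value notes above can be sketched as a small filter for src/data/quality_filters.py. This is an illustration under stated assumptions, not the final implementation: the function names are hypothetical, and the two bit positions (STAR_BAD as bit 23 of ASPCAPFLAG, BAD_PIXELS as bit 0 of STARFLAG) should be checked against the SDSS DR17 bitmask documentation. Values at or below -9999 and NaN are both treated as missing, which is a defensive choice rather than something stated in this issue.

```python
import pandas as pd

# ASSUMPTION: bit positions; verify against the SDSS bitmask documentation.
ASPCAP_STAR_BAD_BIT = 23     # assumed STAR_BAD bit in ASPCAPFLAG
STARFLAG_BAD_PIXELS_BIT = 0  # assumed BAD_PIXELS bit in STARFLAG

MISSING_SENTINEL = -9999


def bits_to_mask(bits):
    """Combine bit positions into one integer mask, e.g. [0, 2] -> 0b101."""
    mask = 0
    for bit in bits:
        mask |= 1 << bit
    return mask


def has_any_flag(flags: pd.Series, bits) -> pd.Series:
    """Return True where at least one of the given bits is set (bitwise AND)."""
    return (flags.astype("int64") & bits_to_mask(bits)) != 0


def apply_quality_filters(
    df: pd.DataFrame,
    aspcap_bits=(ASPCAP_STAR_BAD_BIT,),
    star_bits=(STARFLAG_BAD_PIXELS_BIT,),
    value_columns=("TEFF", "LOGG", "FE_H"),
) -> pd.DataFrame:
    """Drop flagged stars and stars with missing values in `value_columns`."""
    out = df.copy()
    cols = [c for c in value_columns if c in out.columns]

    # Keep only values above the sentinel; -9999 (and NaN) become NaN.
    out[cols] = out[cols].where(out[cols] > MISSING_SENTINEL)

    flagged = has_any_flag(out["ASPCAPFLAG"], aspcap_bits) | has_any_flag(
        out["STARFLAG"], star_bits
    )
    keep = ~flagged & out[cols].notna().all(axis=1)
    return out.loc[keep].reset_index(drop=True)
```

Passing bit positions as arguments keeps the choice of which flags count as "bad" out of the function body, so it can later be driven from configs/data_config.yaml if that turns out to be useful.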

References


Part of #1 (Meta Issue)
