javihslu/DSPRO1

DSPRO1 - CarMa

CI | Python 3.10+ | License: MIT | W&B

Car Market Analysis — Swiss used car price prediction using machine learning.

HSLU Data Science Project 1 (HS25) | Documentation | W&B Dashboard | References


Results

| Model | R² | MAE (CHF) | RMSE (CHF) | Status |
|---|---|---|---|---|
| Random Forest | 0.68 | 6,490 | 9,850 | Baseline |
| Neural Network | 0.96 | 3,059 | 4,200 | Best |

The best neural network was found via a W&B hyperparameter sweep (s5dzwgec).
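
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how R², MAE, and RMSE are conventionally computed. It is not taken from the project code; the function name `regression_metrics` and the sample prices are made up for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², MAE and RMSE for two equal-length sequences of prices."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # MAE: average absolute error, in the same unit as the target (CHF here).
    mae = sum(abs(e) for e in errors) / n

    # RMSE: square root of the mean squared error; penalises large misses more.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R²: share of the target variance explained by the predictions.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"r2": r2, "mae": mae, "rmse": rmse}


# Made-up prices in CHF, purely illustrative (not project data).
actual = [10000.0, 20000.0, 30000.0]
predicted = [11000.0, 19000.0, 32000.0]
metrics = regression_metrics(actual, predicted)
```

MAE and RMSE are both expressed in CHF, which is why the table reports them alongside the unitless R².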


Architecture

flowchart LR
    subgraph Data
        A[Raw Data] --> B[Cleaned]
        B --> C[Train/Val/Test Split]
    end

    subgraph Pipelines
        C --> D[RF Pipeline]
        C --> E[NN Pipeline]
    end

    subgraph Training
        D --> F[RandomForest]
        E --> G[MLP Neural Net]
    end

    subgraph Evaluation
        F --> H[Metrics & Viz]
        G --> H
        H --> I[W&B Logging]
    end
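
The Train/Val/Test Split step in the flowchart could look roughly like the sketch below. This is an assumption-laden illustration, not the project's implementation: the helper `split_indices`, the 70/15/15 ratios, and the seed are all hypothetical.

```python
import random


def split_indices(n_rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle row indices with a fixed seed and cut them into three parts.

    The fractions and the seed are illustrative defaults, not project settings.
    """
    if not 0.0 <= val_frac + test_frac < 1.0:
        raise ValueError("val_frac + test_frac must be in [0, 1)")

    indices = list(range(n_rows))
    # A dedicated Random instance keeps the split reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(indices)

    n_test = round(n_rows * test_frac)
    n_val = round(n_rows * val_frac)

    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(100)
```

Fixing the seed means both the RF and NN pipelines can be evaluated on the same held-out rows, which keeps their metrics comparable.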

Factory Pattern: All components are instantiated via factories, so every run is built from the same named, configurable entry points and stays reproducible.

from src.pipelines import PipelineFactory

pipeline = PipelineFactory.create("NeuralNetworkPipeline")
result = pipeline.run()  # Returns metrics, model path, artifacts

See Architecture Documentation for detailed diagrams.
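
To illustrate the idea behind the factory pattern, here is a minimal registry-based sketch. It is not the code in `src.pipelines`: the class `ToyPipelineFactory`, the `register` decorator, `BasePipeline`, and the keys of the returned dictionary are hypothetical stand-ins chosen for this example.

```python
from typing import Dict, Type


class BasePipeline:
    """Hypothetical common interface for all pipelines."""

    def run(self) -> dict:
        raise NotImplementedError


class ToyPipelineFactory:
    """Illustrative stand-in for a factory that maps names to pipeline classes."""

    _registry: Dict[str, Type[BasePipeline]] = {}

    @classmethod
    def register(cls, pipeline_cls):
        # Store the class under its own name so it can be created from a string,
        # for example a value read from a YAML config.
        cls._registry[pipeline_cls.__name__] = pipeline_cls
        return pipeline_cls

    @classmethod
    def create(cls, name: str, **kwargs) -> BasePipeline:
        try:
            pipeline_cls = cls._registry[name]
        except KeyError:
            known = ", ".join(sorted(cls._registry))
            raise ValueError(f"Unknown pipeline {name!r}; registered: {known}") from None
        return pipeline_cls(**kwargs)


@ToyPipelineFactory.register
class NeuralNetworkPipeline(BasePipeline):
    def run(self) -> dict:
        # Placeholder result; a real pipeline would train and evaluate a model.
        return {"metrics": {}, "model_path": None, "artifacts": []}


pipeline = ToyPipelineFactory.create("NeuralNetworkPipeline")
result = pipeline.run()
```

Creating components by name keeps scripts and configuration files free of direct class imports, and an unknown name fails early with a list of the registered options.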


Quick Start

# Clone and setup
git clone https://github.com/S4sch/DSPRO1.git && cd DSPRO1
./setup.sh  # Creates venv, installs deps, configures pre-commit

# Run models
python scripts/step05_random_forest.py   # RF baseline
python scripts/step06_neural_network.py  # NN model

Manual Installation

git lfs install && git lfs pull
python3 -m venv venv && source venv/bin/activate
pip install -e ".[dev]"
pre-commit install

Project Structure

DSPRO1/
├── configs/           # YAML configurations (orchestration, model, training)
├── data/              # Raw and processed data (Git LFS)
├── docs/              # Documentation and diagrams
├── notebooks/         # Jupyter notebooks (EDA, baselines, HPO)
├── reports/           # Generated reports and figures
├── scripts/           # Pipeline execution scripts
├── src/               # Source code (factory-based architecture)
│   ├── datasets/      # Dataset classes (RF, NN)
│   ├── models/        # Model architectures (MLP)
│   ├── pipelines/     # ML pipelines with stages
│   ├── trainers/      # Training logic (RF, NN)
│   └── transforms/    # Data transformations
└── tests/             # Unit and integration tests

Key Notebooks

| Notebook | Purpose |
|---|---|
| baseline_rf.ipynb | Random Forest baseline analysis |
| baseline_nn.ipynb | Neural Network baseline analysis |
| hpo_nn.ipynb | NN hyperparameter optimization |
| experiments.ipynb | Central experiment dashboard |
| EDA01_*.ipynb | Exploratory data analysis |

Documentation

| Document | Description |
|---|---|
| Documentation Index | Complete documentation hub |
| Architecture | System design and diagrams |
| Reproducibility Guide | How to reproduce results |
| Technical Report | LaTeX academic report |

Links & Resources

| Resource | Link |
|---|---|
| W&B Dashboard | wandb.ai/hs25_dspro1/dspro1_carma |
| Best NN Sweep | s5dzwgec (R²=0.9577) |
| RF Sweep | 4diy2psz (R²=0.679) |
| GitHub Repository | github.com/S4sch/DSPRO1 |
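
The sweep IDs listed here refer to W&B hyperparameter sweeps. As a rough sketch of how such a sweep is typically defined, the snippet below builds a sweep configuration and shows how it could be launched with `wandb.sweep` and `wandb.agent`. The search space is entirely hypothetical: the metric name `val_mae`, the hyperparameter names, and their ranges are assumptions for illustration and are not the settings behind the sweeps above.

```python
def build_sweep_config():
    """Return a W&B sweep configuration with a made-up search space."""
    return {
        "method": "bayes",  # Bayesian search; "grid" and "random" also exist
        "metric": {"name": "val_mae", "goal": "minimize"},  # assumed metric name
        "parameters": {
            # Hypothetical hyperparameters and ranges, for illustration only.
            "learning_rate": {
                "distribution": "log_uniform_values",
                "min": 1e-4,
                "max": 1e-2,
            },
            "hidden_dim": {"values": [64, 128, 256]},
            "dropout": {"distribution": "uniform", "min": 0.0, "max": 0.5},
        },
    }


def launch_sweep(train_fn, project, count=20):
    """Register the sweep and run `count` trials of `train_fn` locally.

    Requires the `wandb` package and a logged-in W&B account, so the import
    is kept inside the function and nothing runs at import time.
    """
    import wandb

    sweep_id = wandb.sweep(build_sweep_config(), project=project)
    wandb.agent(sweep_id, function=train_fn, count=count)
    return sweep_id


sweep_config = build_sweep_config()
```

The training function passed to `launch_sweep` would need to call `wandb.init()` and log the metric named in the configuration for the search to have something to optimize.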

Development

make lint      # Run Ruff linting
make test      # Run pytest
make ci-local  # Full CI check locally

See CONTRIBUTING.md for development workflow and guidelines.


Authors


Acknowledgments

Supervisors: Dr. Seraina Glaus, Dr. Umberto Michelucci, Dr. Jan Svoboda (HSLU)

AI Assistance: We used AI-assisted tools (ChatGPT, GitHub Copilot, and similar LLMs) to support development and documentation. All AI-generated content was reviewed and verified by the authors, who take full responsibility for the final work.


License

This project is licensed under the MIT License - see LICENSE for details.

About

We present CarMa, a machine-learning system for predicting used-car prices in the Swiss market.
