Car Market Analysis — Swiss used car price prediction using machine learning.
HSLU Data Science Project 1 (HS25) | Documentation | W&B Dashboard | References
| Model | R² | MAE (CHF) | RMSE (CHF) | Status |
|---|---|---|---|---|
| Random Forest | 0.68 | 6,490 | 9,850 | Baseline |
| Neural Network | 0.96 | 3,059 | 4,200 | Best |
The best neural network was found via a W&B hyperparameter sweep (s5dzwgec).
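For readers unfamiliar with the three metrics in the table, the following is a minimal sketch of how R², MAE, and RMSE are conventionally computed from true and predicted prices. The function name `regression_metrics` and the sample prices are illustrative assumptions, not the project's evaluation code.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², MAE and RMSE for two equally long sequences of prices.

    Hypothetical helper for illustration; not taken from the repository.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # MAE: average absolute error, in the same unit as the target (CHF).
    mae = sum(abs(r) for r in residuals) / n

    # RMSE: square root of the mean squared error; penalises large misses more.
    rmse = math.sqrt(sum(r * r for r in residuals) / n)

    # R²: 1 minus the ratio of residual variance to total variance.
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return {"r2": r2, "mae": mae, "rmse": rmse}


# Made-up prices in CHF, purely for demonstration.
metrics = regression_metrics([10_000, 20_000, 30_000], [12_000, 18_000, 30_000])
print(metrics)
```

Because RMSE squares the residuals before averaging, it is never smaller than MAE. This is consistent with both rows of the table.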
```mermaid
flowchart LR
    subgraph Data
        A[Raw Data] --> B[Cleaned]
        B --> C[Train/Val/Test Split]
    end
    subgraph Pipelines
        C --> D[RF Pipeline]
        C --> E[NN Pipeline]
    end
    subgraph Training
        D --> F[RandomForest]
        E --> G[MLP Neural Net]
    end
    subgraph Evaluation
        F --> H[Metrics & Viz]
        G --> H
        H --> I[W&B Logging]
    end
```
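The Train/Val/Test Split step in the diagram could look roughly like the sketch below. This README does not state the split ratios, the seed, or whether the split is random or stratified, so the 70/15/15 proportions, the seed value, and the function name `split_indices` are assumptions made for illustration.

```python
import random


def split_indices(n_rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle row indices once and cut them into train/val/test lists.

    Hypothetical helper; the fractions and seed are assumed, not taken
    from the project's configuration.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be >= 0 and sum to < 1")

    indices = list(range(n_rows))
    # A dedicated, seeded generator keeps the split reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(indices)

    n_test = round(n_rows * test_frac)
    n_val = round(n_rows * val_frac)

    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

Fixing the seed means that both pipelines can be evaluated on the same held-out rows, which keeps the RF and NN metrics comparable.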
Factory Pattern: All components instantiated via factories for reproducibility.
```python
from src.pipelines import PipelineFactory

pipeline = PipelineFactory.create("NeuralNetworkPipeline")
result = pipeline.run()  # Returns metrics, model path, artifacts
```

See Architecture Documentation for detailed diagrams.
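To illustrate the idea behind a name-based factory, here is a self-contained toy version. It is not the project's `PipelineFactory`; the classes `ToyPipelineFactory`, `ToyBasePipeline`, and `ToyNeuralNetworkPipeline`, the `register` decorator, and the `seed` parameter are all hypothetical and exist only to show one common way such a factory can be built.

```python
from typing import Dict, Type


class ToyBasePipeline:
    """Minimal pipeline interface (hypothetical)."""

    def run(self) -> dict:
        raise NotImplementedError


class ToyPipelineFactory:
    """Maps string names to pipeline classes so callers never import them directly."""

    _registry: Dict[str, Type[ToyBasePipeline]] = {}

    @classmethod
    def register(cls, name: str):
        # Decorator that records a pipeline class under the given name.
        def decorator(pipeline_cls: Type[ToyBasePipeline]) -> Type[ToyBasePipeline]:
            cls._registry[name] = pipeline_cls
            return pipeline_cls

        return decorator

    @classmethod
    def create(cls, name: str, **kwargs) -> ToyBasePipeline:
        # Fail loudly on unknown names and list what is available.
        if name not in cls._registry:
            known = ", ".join(sorted(cls._registry))
            raise ValueError(f"Unknown pipeline {name!r}; registered: {known}")
        return cls._registry[name](**kwargs)


@ToyPipelineFactory.register("NeuralNetworkPipeline")
class ToyNeuralNetworkPipeline(ToyBasePipeline):
    def __init__(self, seed: int = 42):
        self.seed = seed

    def run(self) -> dict:
        # Placeholder result; a real pipeline would train and evaluate a model.
        return {"pipeline": "NeuralNetworkPipeline", "seed": self.seed}


pipeline = ToyPipelineFactory.create("NeuralNetworkPipeline", seed=7)
print(pipeline.run())
```

Routing construction through a single `create` call means a run can be described entirely by a name plus keyword arguments, for example from a YAML config, which is one way a factory can support reproducibility.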
```bash
# Clone and setup
git clone https://github.com/S4sch/DSPRO1.git && cd DSPRO1
./setup.sh  # Creates venv, installs deps, configures pre-commit

# Run models
python scripts/step05_random_forest.py   # RF baseline
python scripts/step06_neural_network.py  # NN model
```

Manual Installation
```bash
git lfs install && git lfs pull
python3 -m venv venv && source venv/bin/activate
pip install -e ".[dev]"
pre-commit install
```

```
DSPRO1/
├── configs/          # YAML configurations (orchestration, model, training)
├── data/             # Raw and processed data (Git LFS)
├── docs/             # Documentation and diagrams
├── notebooks/        # Jupyter notebooks (EDA, baselines, HPO)
├── reports/          # Generated reports and figures
├── scripts/          # Pipeline execution scripts
├── src/              # Source code (factory-based architecture)
│   ├── datasets/     # Dataset classes (RF, NN)
│   ├── models/       # Model architectures (MLP)
│   ├── pipelines/    # ML pipelines with stages
│   ├── trainers/     # Training logic (RF, NN)
│   └── transforms/   # Data transformations
└── tests/            # Unit and integration tests
```
| Notebook | Purpose |
|---|---|
| `baseline_rf.ipynb` | Random Forest baseline analysis |
| `baseline_nn.ipynb` | Neural Network baseline analysis |
| `hpo_nn.ipynb` | NN hyperparameter optimization |
| `experiments.ipynb` | Central experiment dashboard |
| `EDA01_*.ipynb` | Exploratory data analysis |
| Document | Description |
|---|---|
| Documentation Index | Complete documentation hub |
| Architecture | System design and diagrams |
| Reproducibility Guide | How to reproduce results |
| Technical Report | LaTeX academic report |
| Resource | Link |
|---|---|
| W&B Dashboard | wandb.ai/hs25_dspro1/dspro1_carma |
| Best NN Sweep | s5dzwgec (R²=0.9577) |
| RF Sweep | 4diy2psz (R²=0.679) |
| GitHub Repository | github.com/S4sch/DSPRO1 |
```bash
make lint      # Run Ruff linting
make test      # Run pytest
make ci-local  # Full CI check locally
```

See CONTRIBUTING.md for development workflow and guidelines.
Supervisors: Dr. Seraina Glaus, Dr. Umberto Michelucci, Dr. Jan Svoboda (HSLU)
AI Assistance: We used AI-assisted tools (ChatGPT, GitHub Copilot, and similar LLMs) to support development and documentation. All AI-generated content was reviewed and verified by the authors, who take full responsibility for the final work.
This project is licensed under the MIT License - see LICENSE for details.