58 commits
b12cba1
initial commit
Mar 27, 2026
3bf5d33
adding workflow
Apr 8, 2026
3c720f0
adding workflow
Apr 9, 2026
011e5fe
adding workflow
Apr 9, 2026
1b7c624
adding workflow
Apr 9, 2026
1080fb9
adding workflow
Apr 9, 2026
16bc5b6
adding workflow
Apr 9, 2026
4e3186e
adding workflow
Apr 9, 2026
36e29f5
adding workflow
Apr 9, 2026
ced1143
adding workflow
Apr 9, 2026
af3e4b2
adding workflow
Apr 9, 2026
3db03be
adding workflow
Apr 9, 2026
2d8ba77
adding workflow
Apr 9, 2026
e3ad618
adding workflow
Apr 9, 2026
1d3e60d
adding workflow
Apr 9, 2026
eb0a5f0
adding workflow
Apr 9, 2026
9c0041b
adding workflow
Apr 9, 2026
bf92e46
adding workflow
Apr 9, 2026
2b57153
adding workflow
Apr 9, 2026
fb1446c
adding workflow
Apr 9, 2026
6452237
adding workflow
Apr 9, 2026
6fbc053
adding workflow
Apr 9, 2026
6f73ba9
adding workflow
Apr 9, 2026
20dd349
adding workflow
Apr 9, 2026
b8d15d9
adding workflow
Apr 9, 2026
e3ed5c6
adding workflow
Apr 10, 2026
a7bea19
adding workflow
Apr 10, 2026
3f0ef96
adding workflow
Apr 10, 2026
6c2212f
adding workflow
Apr 10, 2026
434032a
adding workflow
Apr 10, 2026
72d2c6b
adding workflow
Apr 10, 2026
e480766
adding workflow
Apr 10, 2026
aaeb117
adding workflow
Apr 10, 2026
b1e5d8f
adding workflow
Apr 10, 2026
2693ecd
adding s3 bucket cf template
Apr 21, 2026
156d8c4
adding s3 bucket cf template
Apr 21, 2026
4fb4972
adding s3 bucket cf template
Apr 21, 2026
f54f205
adding s3 bucket cf template
Apr 21, 2026
4fb6f5b
adding s3 bucket cf template
Apr 21, 2026
d8c3d65
adding s3 bucket cf template
Apr 21, 2026
c8a5b08
adding s3 bucket cf template
Apr 21, 2026
439a501
adding s3 bucket cf template
Apr 21, 2026
4e16219
adding s3 bucket cf template
Apr 21, 2026
8f30f86
adding s3 bucket cf template
Apr 22, 2026
78758b0
adding playbook
May 1, 2026
ac6d3ca
adding the ML files
May 1, 2026
71a30d5
adding the file to test on Github
May 2, 2026
df75ea9
adding the file to test on Github
May 2, 2026
1e48bde
adding the file to test on Github
May 2, 2026
27bbf88
adding the file to test on Github
May 2, 2026
ebc3030
adding the file to test on Github
May 2, 2026
c71e0d8
adding the file to test on Github
May 2, 2026
12f87b9
adding the file to test on Github
May 2, 2026
ce3e8f3
adding the file
May 6, 2026
0def107
removing the helpers folder, not necessary
May 7, 2026
c008a12
final reshaping
May 7, 2026
1c18356
final reshaping
May 7, 2026
21cde4f
final reshaping
May 7, 2026
@@ -0,0 +1,30 @@
# Use Python 3.12 slim (already has Python and pip).
FROM python:3.12-slim

# Avoid interactive prompts during apt operations.
ENV DEBIAN_FRONTEND=noninteractive

# Install CA certificates (needed for HTTPS).
RUN apt-get update && apt-get install -y \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Install project specific packages.
RUN mkdir -p /install
COPY requirements.txt /install/requirements.txt
RUN pip install --upgrade pip && \
    pip install --no-cache-dir jupyterlab jupyterlab_vim jupytext -r /install/requirements.txt

# Config.
COPY etc_sudoers /install/
COPY etc_sudoers /etc/sudoers
COPY bashrc /root/.bashrc

# Report package versions.
COPY version.sh /install/
RUN /install/version.sh 2>&1 | tee version.log

# Jupyter (8888) and the Flask API (5000).
EXPOSE 8888
EXPOSE 5000

CMD ["/bin/bash"]
@@ -0,0 +1,149 @@
# Ansible

## Description
- Ansible is an open-source automation tool used for application deployment,
configuration management, and task automation.
- It uses a simple, human-readable YAML syntax to define automation tasks,
making it accessible for users without extensive programming knowledge.
- Ansible is agentless: it needs no dedicated agent software on the target
machines and typically connects to them over SSH, which makes systems easier
to manage.
- It supports a wide range of modules for various tasks, including cloud
provisioning, orchestration, and security compliance, enabling extensive
automation capabilities.
- Ansible is designed around idempotency: most modules make a change only when
the system differs from the declared state, so rerunning the same playbook
leaves an already-configured system unchanged and keeps runs predictable.
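
To make the idea of idempotency concrete, here is a minimal Python sketch. It is not Ansible's implementation; the helper name `ensure_line` and the file name `app.conf` are hypothetical. It only mirrors the behavior that a module such as `ansible.builtin.lineinfile` reports: "changed" on the first run, "ok" afterwards.

```python
from pathlib import Path
import tempfile


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file; return True only if the file changed."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # desired state already holds, so nothing is written
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "app.conf"  # hypothetical config file
        print(ensure_line(target, "PORT=5000"))  # first run changes the file
        print(ensure_line(target, "PORT=5000"))  # second run is a no-op
```

The function describes a desired end state rather than an action, which is why calling it twice is safe.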

## How to run the project


### Step 1 — Build the Docker image

```bash
docker build -t house-price-project .
```

> This installs all dependencies from `requirements.txt` and sets up
> JupyterLab inside the container. Takes ~3–5 minutes on first build.

### Step 2 — Start the container

```bash
docker run -it -p 5001:5000 -p 8888:8888 \
  --name house-price \
  -v "$(pwd)":/project \
  house-price-project
```

| Flag | Meaning |
|------|---------|
| `-it` | Interactive shell |
| `-p 5001:5000` | Host port 5001 → container port 5000 (Flask API) |
| `-p 8888:8888` | Host port 8888 → container port 8888 (JupyterLab) |
| `--name house-price` | Give the container a fixed name |
| `-v "$(pwd)":/project` | Mount the project folder so files persist |

You will land inside the container at `root@container:/project#`.

### Step 3 — Train the model

Inside the container, choose one of the following.

To start the JupyterLab interface, run:
```bash
PORT=5000 jupyter lab --ip=0.0.0.0 --no-browser --allow-root
```

To run the training script, run:
```bash
python template.example.py
```

Expected output:

```text
WARNING: File 'ml_model/train.csv' not found – generating synthetic dataset.
INFO: Dataset shape: (1460, 16)
INFO: Cross-validating GradientBoosting (5 folds)…
INFO: Cross-validating RandomForest (5 folds)…
INFO: Cross-validating Ridge (5 folds)…
INFO: Best model: GradientBoosting
INFO: Test R²: 0.9822
INFO: Model saved to '/project/ml_model/house_price_model.pkl'.
```

> If you have the Kaggle dataset, place `train.csv` in `ml_model/` before
> running this step to train on real data instead of synthetic data.
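
The log above suggests that several candidate models are cross-validated and the best one is kept. The training script itself is not shown here, so the following is only a sketch of that selection step, assuming the winner is the model with the highest mean fold score. The function name `select_best` and the fold scores are placeholders, not results from this project.

```python
from statistics import mean


def select_best(cv_scores: dict[str, list[float]]) -> tuple[str, float]:
    """Return (model name, mean score) for the highest mean cross-validation score."""
    if not cv_scores:
        raise ValueError("no candidate models were scored")
    averages = {name: mean(scores) for name, scores in cv_scores.items()}
    best = max(averages, key=averages.get)
    return best, averages[best]


if __name__ == "__main__":
    # Placeholder fold scores for illustration only; not measured values.
    example = {
        "GradientBoosting": [0.90, 0.92, 0.91, 0.93, 0.89],
        "RandomForest": [0.88, 0.90, 0.87, 0.89, 0.88],
        "Ridge": [0.80, 0.82, 0.79, 0.81, 0.80],
    }
    name, score = select_best(example)
    print(f"Best model: {name} (mean CV score {score:.3f})")
```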

### Step 4 — Start the Flask API

Inside the container (keep this terminal open):

```bash
PORT=5000 python app.py
```

Then, in JupyterLab, open `template.API.py` as a notebook and run its cells.
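
If you prefer to call the API without a notebook, a small standard-library client is enough. This is a sketch under assumptions: the helper names `build_predict_request` and `predict` are hypothetical, and the base URL assumes you call from the host through the `5001:5000` mapping from Step 2 (from inside the container, use port 5000 instead).

```python
import json
import urllib.request

# Host-side address; assumes the 5001 -> 5000 port mapping from Step 2.
BASE_URL = "http://localhost:5001"


def build_predict_request(features: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /predict request carrying `features` as a JSON body."""
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features: dict, base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_predict_request(features, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running, `predict({})` should return a JSON object containing `predicted_price` and `model_version`, because `app.py` fills missing fields from its feature defaults.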

Once everything is working, you can run the whole process with Ansible. Make sure Ansible is installed and configured, then run the following command in your terminal:

```bash
ansible-playbook playbook.yml
```


## Project Objective
The goal of the project is to automate the deployment of a machine learning
model using Ansible. Students will create a playbook that provisions a virtual
machine, installs necessary dependencies, and deploys a pre-trained model to
serve predictions via a REST API. The project will optimize the deployment
process to ensure it is efficient and reproducible.

## Dataset Suggestions
1. **Kaggle House Prices Dataset**
   - **Source Name**: Kaggle
   - **URL**: [Kaggle House Prices](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data)
   - **Data Contains**: Various features of houses in Ames, Iowa, including sale
     prices, which can be used for regression tasks.
   - **Access Requirements**: Free account on Kaggle.

2. **UCI Machine Learning Repository: Wine Quality Dataset**
   - **Source Name**: UCI Machine Learning Repository
   - **URL**: [Wine Quality Dataset](https://archive.ics.uci.edu/ml/datasets/wine+quality)
   - **Data Contains**: Chemical properties of wine samples along with quality
     ratings, suitable for classification tasks.
   - **Access Requirements**: No authentication required.

3. **Open Government Data: NYC Taxi Trip Data**
   - **Source Name**: NYC Open Data
   - **URL**: [NYC Taxi Trip Data](https://opendata.cityofnewyork.us/)
   - **Data Contains**: Trip records including pickup and drop-off locations,
     times, and fares, which can be used for regression or clustering tasks.
   - **Access Requirements**: Publicly available without authentication.

## Tasks
- **Set Up Virtual Environment**: Create a virtual machine using Ansible to host
the machine learning model.
- **Install Dependencies**: Write Ansible tasks to install necessary libraries
and frameworks (e.g., Flask, scikit-learn) for serving the model.
- **Deploy Model**: Use Ansible to copy the pre-trained model files to the
virtual machine and configure the application to serve predictions.
- **Create REST API**: Implement a simple REST API using Flask to handle
incoming prediction requests and return results.
- **Testing and Validation**: Write Ansible tasks to test the deployment and
validate that the API is returning the expected outputs.
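
For the testing and validation task, the core logic is "poll `/health` until it answers 200, or give up". In a playbook this is usually expressed with `ansible.builtin.uri` combined with the `until`, `retries`, and `delay` task keywords; the Python sketch below shows the same logic. The function names `http_status` and `wait_until_healthy` are hypothetical, and the probe is injected so the retry loop can be exercised without a running server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request, including error codes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # e.g. 503 while the model is unavailable


def wait_until_healthy(
    probe: Callable[[], int],
    retries: int = 5,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` until it returns 200 or `retries` attempts are used up."""
    for attempt in range(1, retries + 1):
        try:
            if probe() == 200:
                return True
        except OSError:
            pass  # connection refused or timed out: treat as not ready yet
        if attempt < retries:
            sleep(delay)
    return False
```

A typical call would be `wait_until_healthy(lambda: http_status("http://localhost:5001/health"))`, assuming the port mapping from Step 2.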

## Bonus Ideas
- **Monitoring and Logging**: Extend the project by integrating monitoring tools
(e.g., Prometheus) to keep track of API performance and logs.
- **Scaling Deployment**: Explore how to scale the deployment across multiple
servers using Ansible's orchestration capabilities.
- **CI/CD Pipeline**: Implement a continuous integration/continuous deployment
(CI/CD) pipeline to automate updates to the model and application.

## Useful Resources
- [Ansible Documentation](https://docs.ansible.com/ansible/latest/index.html)
- [Ansible GitHub Repository](https://github.com/ansible/ansible)
- [Kaggle Datasets](https://www.kaggle.com/datasets)
- [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php)
- [Flask Documentation](https://flask.palletsprojects.com/)
@@ -0,0 +1,108 @@
"""
app.py
──────
House Price Prediction – Flask REST API server.

Run:
    python app.py

Endpoints:
    GET  /health         Liveness probe.
    GET  /features       Feature catalogue and defaults.
    POST /predict        Predict price for a single house.
    POST /predict/batch  Predict prices for multiple houses.
"""

import logging
import os
import sys

# Add this file's directory (/project in the container) to the path so
# template_utils can be imported directly.
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

import template_utils as tu
from flask import Flask, jsonify, request

# ── App setup ─────────────────────────────────────────────────
app = Flask(__name__)
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s",
)
_LOG = logging.getLogger(__name__)

# Eager-load the model at startup so the first request is fast.
try:
    _model = tu.load_model_artifact()
    _LOG.info("Model loaded successfully.")
except FileNotFoundError as exc:
    _LOG.warning("Startup model load failed: %s", exc)
    _model = None


# ── Routes ────────────────────────────────────────────────────
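@app.get("/health")
def health():
    """Return API liveness and model status."""
    # Compare against None explicitly; a model object's truthiness is not a
    # reliable "loaded" signal.
    model_ready = _model is not None
    status = "ok" if model_ready else "model_unavailable"
    return jsonify({"status": status}), 200 if model_ready else 503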


@app.get("/features")
def features():
    """Return the feature catalogue and default values."""
    return jsonify({
        "numeric_features": tu.NUMERIC_FEATURES,
        "categorical_features": tu.CATEGORICAL_FEATURES,
        "defaults": tu.FEATURE_DEFAULTS,
    })


@app.post("/predict")
def predict():
    """
    Predict the sale price for a single house.

    All request fields are optional; missing values use FEATURE_DEFAULTS.
    """
    if _model is None:
        return jsonify({"error": "Model not loaded"}), 503
    try:
        payload = request.get_json(force=True) or {}
        errors = tu.validate_features(payload)
        if errors:
            return jsonify({"error": "Validation failed", "details": errors}), 400
        price = tu.predict_price(payload, model=_model)
        _LOG.info("predict price=%.2f payload=%s", price, payload)
        return jsonify({
            "predicted_price": price,
            "model_version": "1.0",
        })
    except Exception as exc:
        _LOG.exception("Prediction error.")
        return jsonify({"error": str(exc)}), 500


@app.post("/predict/batch")
def predict_batch():
    """Predict prices for multiple houses in one call."""
    if _model is None:
        return jsonify({"error": "Model not loaded"}), 503
    try:
        body = request.get_json(force=True) or {}
        instances = body.get("instances", [])
        # Reject a missing, empty, or non-list "instances" field up front.
        if not isinstance(instances, list) or not instances:
            return jsonify({"error": "No instances provided"}), 400
        prices = [tu.predict_price(inst, model=_model) for inst in instances]
        _LOG.info("batch_predict count=%d", len(prices))
        return jsonify({"predictions": prices, "count": len(prices)})
    except Exception as exc:
        _LOG.exception("Batch prediction error.")
        return jsonify({"error": str(exc)}), 500


# ── Entry point ───────────────────────────────────────────────
if __name__ == "__main__":
    port = int(os.getenv("PORT", "5000"))
    debug = os.getenv("FLASK_DEBUG", "false").lower() == "true"
    app.run(host="0.0.0.0", port=port, debug=debug)
@@ -0,0 +1,26 @@
# ~/.bashrc – container shell configuration.

# Prompt.
export PS1='\[\033[01;32m\]\u@container\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ '

# Convenience aliases.
alias ll='ls -alF'
alias la='ls -A'
alias l='ls -CF'

# Set working directory to the project root on login.
cd /project 2>/dev/null || true

# Show available commands on startup.
echo ""
echo " ┌─────────────────────────────────────────────────────┐"
echo " │ House Price Prediction – Docker Container │"
echo " │ │"
echo " │ Run notebooks: jupyter lab --ip=0.0.0.0 │"
echo " │ --no-browser --allow-root │"
echo " │ │"
echo " │ Run API: python template.API.py │"
echo " │ Run example: python template.example.py │"
echo " │ Run Ansible: ansible-playbook playbook.yaml │"
echo " └─────────────────────────────────────────────────────┘"
echo ""
@@ -0,0 +1,31 @@
#
# This file MUST be edited with the 'visudo' command as root.
#
# Please consider adding local content in /etc/sudoers.d/ instead of
# directly modifying this file.
#
# See the man page for details on how to write a sudoers file.
#
Defaults env_reset
Defaults mail_badpass
Defaults secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin"

# Host alias specification

# User alias specification

# Cmnd alias specification

# User privilege specification
root ALL=(ALL:ALL) ALL

# Members of the admin group may gain root privileges
%admin ALL=(ALL) ALL

# Allow members of group sudo to execute any command
%sudo ALL=(ALL:ALL) ALL

# Allow the postgres user to run any command without a password.
postgres ALL=(ALL) NOPASSWD:ALL

# See sudoers(5) for more information on "#include" directives:

#includedir /etc/sudoers.d
Binary file not shown.