gpsaggese · likhongomes · Mar 27, 2026 · Apr 8, 2026 · Apr 9, 2026 · Apr 9, 2026
diff --git a/.../data605/Spring2026/projects/UmdTask405_-DATA605_Spring2026_Ansible_Deployment/Dockerfile b/.../data605/Spring2026/projects/UmdTask405_-DATA605_Spring2026_Ansible_Deployment/Dockerfile
@@ -0,0 +1,30 @@
+# Use Python 3.12 slim (already has Python and pip).
+FROM python:3.12-slim
+
+# Avoid interactive prompts during apt operations.
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Install CA certificates (needed for HTTPS).
+RUN apt-get update && apt-get install -y \
+    ca-certificates \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install project-specific packages.
+RUN mkdir -p /install
+COPY requirements.txt /install/requirements.txt
+RUN pip install --upgrade pip && \
+    pip install --no-cache-dir jupyterlab jupyterlab_vim jupytext -r /install/requirements.txt
+
+# Config.
+COPY etc_sudoers /install/
+COPY etc_sudoers /etc/sudoers
+COPY bashrc /root/.bashrc
+
+# Report package versions.
+COPY version.sh /install/
+RUN /install/version.sh 2>&1 | tee version.log
+
+# Jupyter.
+EXPOSE 8888
+
+CMD ["/bin/bash"]
diff --git a/...Spring2026/projects/UmdTask405_-DATA605_Spring2026_Ansible_Deployment/ReadMe.md b/...Spring2026/projects/UmdTask405_-DATA605_Spring2026_Ansible_Deployment/ReadMe.md
@@ -0,0 +1,149 @@
+# Ansible
+
+## Description
+- Ansible is an open-source automation tool used for application deployment,
+  configuration management, and task automation.
+- It uses a simple, human-readable YAML syntax to define automation tasks,
+  making it accessible for users without extensive programming knowledge.
+- Ansible operates in an agentless manner, meaning it does not require any
+  software to be installed on the target machines, allowing for easier
+  management of systems.
+- It supports a wide range of modules for various tasks, including cloud
+  provisioning, orchestration, and security compliance, enabling extensive
+  automation capabilities.
+- Ansible is designed to be idempotent, which means that running the same
+  playbook multiple times will not change the system beyond the initial
+  application, ensuring stability and predictability.
+
+## How to run the project
+
+Run the following steps in order from the project directory.
+
+### Step 1 — Build the Docker image
+
+```bash
+docker build -t house-price-project .
+```
+
+> This installs all dependencies from `requirements.txt` and sets up
+> JupyterLab inside the container. Takes ~3–5 minutes on first build.
+
+### Step 2 — Start the container
+
+```bash
+docker run -it -p 5001:5000 -p 8888:8888 \
+  --name house-price \
+  -v $(pwd):/project \
+  house-price-project
+```
+| Flag | Meaning |
+|------|---------|
+| `-it` | Interactive shell |
+| `-p 5001:5000` | Host port 5001 → container port 5000 (Flask API) |
+| `-p 8888:8888` | Host port 8888 → container port 8888 (JupyterLab) |
+| `--name house-price` | Give the container a fixed name |
+| `-v $(pwd):/project` | Mount project folder so files persist |
+
+You will land inside the container at `root@container:/project#`.
+
+### Step 3 — Train the model
+
+Inside the container, start JupyterLab if you want to work in notebooks:
+```bash
+PORT=5000 jupyter lab --ip=0.0.0.0 --no-browser --allow-root
+```
+
+To run the training script, execute:
+```bash
+python template.example.py
+```
+
+Expected output:
+```text
+WARNING: File 'ml_model/train.csv' not found – generating synthetic dataset.
+INFO: Dataset shape: (1460, 16)
+INFO: Cross-validating GradientBoosting (5 folds)…
+INFO: Cross-validating RandomForest (5 folds)…
+INFO: Cross-validating Ridge (5 folds)…
+INFO: Best model: GradientBoosting
+INFO: Test R²: 0.9822
+INFO: Model saved to '/project/ml_model/house_price_model.pkl'.
+```
+
+> If you have the Kaggle dataset, place `train.csv` in `ml_model/` before
+> running this step to train on real data instead of synthetic data.
+
+### Step 4 — Start the Flask API
+
+Inside the container (keep this terminal open):
+
+```bash
+PORT=5000 python app.py
+```
+
+Then, in JupyterLab, open `template.API.py` and run its cells.
+
+Once everything works, you can run the whole process with Ansible. Make sure Ansible is installed and configured, then execute the following command in your terminal:
+
+```bash
+ansible-playbook playbook.yml
+```
+
+
+## Project Objective
+The goal of the project is to automate the deployment of a machine learning
+model using Ansible. Students will create a playbook that provisions a virtual
+machine, installs necessary dependencies, and deploys a pre-trained model to
+serve predictions via a REST API. The project will optimize the deployment
+process to ensure it is efficient and reproducible.
+
+## Dataset Suggestions
+1. **Kaggle House Prices Dataset**
+   - **Source Name**: Kaggle
+   - **URL**:
+     [Kaggle House Prices](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data)
+   - **Data Contains**: Various features of houses in Ames, Iowa, including sale
+     prices, which can be used for regression tasks.
+   - **Access Requirements**: Free account on Kaggle.
+
+2. **UCI Machine Learning Repository: Wine Quality Dataset**
+   - **Source Name**: UCI Machine Learning Repository
+   - **URL**:
+     [Wine Quality Dataset](https://archive.ics.uci.edu/ml/datasets/wine+quality)
+   - **Data Contains**: Chemical properties of wine samples along with quality
+     ratings, suitable for classification tasks.
+   - **Access Requirements**: No authentication required.
+
+3. **Open Government Data: NYC Taxi Trip Data**
+   - **Source Name**: NYC Open Data
+   - **URL**: [NYC Taxi Trip Data](https://opendata.cityofnewyork.us/)
+   - **Data Contains**: Trip records including pickup and drop-off locations,
+     times, and fares, which can be used for regression or clustering tasks.
+   - **Access Requirements**: Publicly available without authentication.
+
+## Tasks
+- **Set Up Virtual Environment**: Create a virtual machine using Ansible to host
+  the machine learning model.
+- **Install Dependencies**: Write Ansible tasks to install necessary libraries
+  and frameworks (e.g., Flask, scikit-learn) for serving the model.
+- **Deploy Model**: Use Ansible to copy the pre-trained model files to the
+  virtual machine and configure the application to serve predictions.
+- **Create REST API**: Implement a simple REST API using Flask to handle
+  incoming prediction requests and return results.
+- **Testing and Validation**: Write Ansible tasks to test the deployment and
+  validate that the API is returning the expected outputs.
+
+## Bonus Ideas
+- **Monitoring and Logging**: Extend the project by integrating monitoring tools
+  (e.g., Prometheus) to keep track of API performance and logs.
+- **Scaling Deployment**: Explore how to scale the deployment across multiple
+  servers using Ansible's orchestration capabilities.
+- **CI/CD Pipeline**: Implement a continuous integration/continuous deployment
+  (CI/CD) pipeline to automate updates to the model and application.
+
+## Useful Resources
+- [Ansible Documentation](https://docs.ansible.com/ansible/latest/index.html)
+- [Ansible GitHub Repository](https://github.com/ansible/ansible)
+- [Kaggle Datasets](https://www.kaggle.com/datasets)
+- [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php)
+- [Flask Documentation](https://flask.palletsprojects.com/)
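Step 4 of the ReadMe starts the Flask API but does not show how to call it from the host. The sketch below is one way to do that using only the Python standard library. It assumes the `docker run` port mapping from Step 2 (host port 5001 to container port 5000) and relies on the `/predict` docstring in `app.py`, which says all request fields are optional. The helper names `build_predict_request` and `predict` are hypothetical and are not part of the project.

```python
import json
import urllib.request


def build_predict_request(base_url, features=None):
    """Build (but do not send) a POST request for the /predict endpoint.

    `features` is a dict of house features; an empty payload is assumed to be
    valid because app.py documents every field as optional.
    """
    body = json.dumps(features or {}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, features=None, timeout=10):
    """Send the request and return the decoded JSON response as a dict."""
    req = build_predict_request(base_url, features)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage, with the container from Step 2 running and the API started:
#   result = predict("http://localhost:5001")
#   print(result["predicted_price"])
```

Separating request construction from sending keeps the request easy to inspect or unit-test without a running server.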
diff --git a/...ject/data605/Spring2026/projects/UmdTask405_-DATA605_Spring2026_Ansible_Deployment/app.py b/...ject/data605/Spring2026/projects/UmdTask405_-DATA605_Spring2026_Ansible_Deployment/app.py
@@ -0,0 +1,108 @@
+"""
+app.py
+──────
+House Price Prediction – Flask REST API server.
+
+Run:
+    python app.py
+
+Endpoints:
+    GET  /health            Liveness probe.
+    GET  /features          Feature catalogue and defaults.
+    POST /predict           Predict price for a single house.
+    POST /predict/batch     Predict prices for multiple houses.
+"""
+
+import logging
+import os
+import sys
+
+# Add this file's directory (/project in the container) to the path so template_utils can be imported.
+sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
+
+import template_utils as tu
+from flask import Flask, jsonify, request
+
+# ── App setup ─────────────────────────────────────────────────
+app = Flask(__name__)
+logging.basicConfig(
+    level=logging.INFO,
+    format="%(asctime)s [%(levelname)s] %(message)s",
+)
+_LOG = logging.getLogger(__name__)
+
+# Eager-load the model at startup so the first request is fast.
+try:
+    _model = tu.load_model_artifact()
+    _LOG.info("Model loaded successfully.")
+except FileNotFoundError as exc:
+    _LOG.warning("Startup model load failed: %s", exc)
+    _model = None
+
+
+# ── Routes ────────────────────────────────────────────────────
+@app.get("/health")
+def health():
+    """Return API liveness and model status."""
+    status = "ok" if _model is not None else "model_unavailable"
+    return jsonify({"status": status}), 200 if _model is not None else 503
+
+
+@app.get("/features")
+def features():
+    """Return the feature catalogue and default values."""
+    return jsonify({
+        "numeric_features":     tu.NUMERIC_FEATURES,
+        "categorical_features": tu.CATEGORICAL_FEATURES,
+        "defaults":             tu.FEATURE_DEFAULTS,
+    })
+
+
+@app.post("/predict")
+def predict():
+    """
+    Predict the sale price for a single house.
+
+    All request fields are optional; missing values use FEATURE_DEFAULTS.
+    """
+    if _model is None:
+        return jsonify({"error": "Model not loaded"}), 503
+    try:
+        payload = request.get_json(force=True) or {}
+        errors = tu.validate_features(payload)
+        if errors:
+            return jsonify({"error": "Validation failed", "details": errors}), 400
+        price = tu.predict_price(payload, model=_model)
+        _LOG.info("predict  price=%.2f  payload=%s", price, payload)
+        return jsonify({
+            "predicted_price": price,
+            "model_version":   "1.0",
+        })
+    except Exception as exc:
+        _LOG.exception("Prediction error.")
+        return jsonify({"error": str(exc)}), 500
+
+
+@app.post("/predict/batch")
+def predict_batch():
+    """Predict prices for multiple houses in one call."""
+    if _model is None:
+        return jsonify({"error": "Model not loaded"}), 503
+    try:
+        body      = request.get_json(force=True) or {}
+        instances = body.get("instances", [])
+        if not instances:
+            return jsonify({"error": "No instances provided"}), 400
+        prices = [tu.predict_price(inst, model=_model) for inst in instances]
+        _LOG.info("batch_predict  count=%d", len(prices))
+        return jsonify({"predictions": prices, "count": len(prices)})
+    except Exception as exc:
+        _LOG.exception("Batch prediction error.")
+        return jsonify({"error": str(exc)}), 500
+
+
+# ── Entry point ───────────────────────────────────────────────
+if __name__ == "__main__":
+    port  = int(os.getenv("PORT", 5000))
+    debug = os.getenv("FLASK_DEBUG", "false").lower() == "true"
+    app.run(host="0.0.0.0", port=port, debug=debug)
diff --git a/...ject/data605/Spring2026/projects/UmdTask405_-DATA605_Spring2026_Ansible_Deployment/bashrc b/...ject/data605/Spring2026/projects/UmdTask405_-DATA605_Spring2026_Ansible_Deployment/bashrc
@@ -0,0 +1,26 @@
+# ~/.bashrc – container shell configuration.
+
+# Prompt.
+export PS1='\[\033[01;32m\]\u@container\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ '
+
+# Convenience aliases.
+alias ll='ls -alF'
+alias la='ls -A'
+alias l='ls -CF'
+
+# Set working directory to the project root on login.
+cd /project 2>/dev/null || true
+
+# Show available commands on startup.
+echo ""
+echo "  ┌─────────────────────────────────────────────────────┐"
+echo "  │  House Price Prediction – Docker Container          │"
+echo "  │                                                     │"
+echo "  │  Run notebooks:  jupyter lab --ip=0.0.0.0           │"
+echo "  │                  --no-browser --allow-root          │"
+echo "  │                                                     │"
+echo "  │  Run API:        python template.API.py             │"
+echo "  │  Run example:    python template.example.py         │"
+echo "  │  Run Ansible:    ansible-playbook playbook.yaml     │"
+echo "  └─────────────────────────────────────────────────────┘"
+echo ""
diff --git a/...data605/Spring2026/projects/UmdTask405_-DATA605_Spring2026_Ansible_Deployment/etc_sudoers b/...data605/Spring2026/projects/UmdTask405_-DATA605_Spring2026_Ansible_Deployment/etc_sudoers
@@ -0,0 +1,31 @@
+#
+# This file MUST be edited with the 'visudo' command as root.
+#
+# Please consider adding local content in /etc/sudoers.d/ instead of
+# directly modifying this file.
+#
+# See the man page for details on how to write a sudoers file.
+#
+Defaults        env_reset
+Defaults        mail_badpass
+Defaults        secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin"
+
+# Host alias specification
+
+# User alias specification
+
+# Cmnd alias specification
+
+# User privilege specification
+root    ALL=(ALL:ALL) ALL
+
+# Members of the admin group may gain root privileges
+%admin ALL=(ALL) ALL
+
+# Allow members of group sudo to execute any command
+%sudo   ALL=(ALL:ALL) ALL
+
+# See sudoers(5) for more information on "#include" directives:
+postgres ALL=(ALL) NOPASSWD:ALL
+
+#includedir /etc/sudoers.d
diff --git a/...ct/data605/Spring2026/projects/UmdTask405_-DATA605_Spring2026_Ansible_Deployment/logs.log b/...ct/data605/Spring2026/projects/UmdTask405_-DATA605_Spring2026_Ansible_Deployment/logs.log
diff --git a/...g2026/projects/UmdTask405_-DATA605_Spring2026_Ansible_Deployment/ml/house_price_model.pkl b/...g2026/projects/UmdTask405_-DATA605_Spring2026_Ansible_Deployment/ml/house_price_model.pkl