AC215 - Milestone 5 (MSMBAllstars)

Team Members

Itamar Belson
Kenny Chen
Sam Crowder
Clay Coleman

Group Name

MSMBAllstars

Project Overview

Our project develops a machine learning application that predicts tennis match outcomes using historical ATP match data. The system combines an LSTM-based prediction model with an LLM-powered chat interface for user interaction.

Milestone 5 - Kubernetes Deployment & GPU Acceleration & ML Pipeline

For this milestone, we deployed the application to Kubernetes on Google Cloud Platform (GCP) with the following key features:

Kubernetes Cluster Architecture
- Multi-node GKE cluster with both CPU and GPU nodes
- GPU node pool using NVIDIA L4 GPUs for LLM acceleration
- Load balancing and auto-scaling capabilities
- Resource optimization across nodes

Service Components
- API Service (FastAPI)
- Probability Model Service (tennis prediction model)
- LLM Service (chat interface)
- Ollama Service (GPU-accelerated LLM model)

Infrastructure as Code
- Ansible-based deployment automation
- Kubernetes manifests for all services
- GPU resource management and scheduling
- Container orchestration and scaling

GPU Acceleration
- NVIDIA device plugin integration
- GPU-optimized Ollama container
- Efficient resource allocation for ML workloads

ML Pipeline
- Single pipeline for preprocessing (see run_pipeline.sh in the repository root)
- Training on GCP Vertex AI, with hyperparameter sweeps on Weights & Biases
- Model deployment only if the trained model passes a validation metric threshold
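
The validation gate in the ML pipeline can be pictured as a single comparison that decides whether deployment proceeds. The sketch below is illustrative only: the function name `should_deploy`, the `higher_is_better` switch, and the example metric and threshold values are assumptions, not the project's actual pipeline code.

```python
def should_deploy(metric: float, threshold: float, higher_is_better: bool = True) -> bool:
    """Return True when a candidate model clears the validation threshold.

    `metric` is the validation score of the newly trained model and
    `threshold` is the minimum (or maximum) acceptable value. Both are
    hypothetical inputs supplied by the caller.
    """
    if higher_is_better:
        return metric >= threshold
    return metric <= threshold


# Hypothetical values: a validation accuracy of 0.71 against a 0.65 threshold.
if should_deploy(metric=0.71, threshold=0.65):
    print("deploy")
else:
    print("skip deployment")
```

For a loss-style metric, where lower is better, the same check would be called with `higher_is_better=False`.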

System Architecture

Deployment Architecture

The system is deployed on GKE with the following node configuration:

- 3 CPU nodes (e2-medium) for general workloads
- 1 GPU node (g2-standard-4) with NVIDIA L4 for LLM acceleration

Node Pool Configuration

# CPU node pool
gcloud container node-pools create default-pool \
    --cluster=tennis-predictor-cluster \
    --zone=us-central1-a \
    --machine-type=e2-medium \
    --num-nodes=3

# GPU node pool
gcloud container node-pools create l4-gpu-pool \
    --cluster=tennis-predictor-cluster \
    --zone=us-central1-a \
    --machine-type=g2-standard-4 \
    --accelerator type=nvidia-l4,count=1 \
    --num-nodes=1
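
For a workload to land on the GPU pool, its pod spec typically requests the `nvidia.com/gpu` resource and selects nodes by the GKE accelerator label. The sketch below builds such a Deployment manifest as a Python dict and prints it as JSON, which `kubectl apply -f` accepts. It is not the project's actual manifest: the image name `ollama/ollama`, the `app` label, and the replica count are assumptions, while the port 11434 is the Ollama port used elsewhere in this README.

```python
import json


def gpu_deployment(name: str, image: str, port: int, gpus: int = 1) -> dict:
    """Build a minimal Deployment manifest that requests NVIDIA GPUs."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    # Node label that GKE sets on nodes with L4 accelerators.
                    "nodeSelector": {"cloud.google.com/gke-accelerator": "nvidia-l4"},
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                            # GPUs are requested through resource limits.
                            "resources": {"limits": {"nvidia.com/gpu": gpus}},
                        }
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    # Placeholder image; substitute the image built for the Ollama service.
    print(json.dumps(gpu_deployment("ollama", "ollama/ollama", 11434), indent=2))
```

Saving the output to a file and running `kubectl apply -f <file>` would create the Deployment, provided the NVIDIA device plugin is running on the GPU node.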

Deployment Process

Set Up GCP Project

# Set the project ID
export PROJECT_ID="tennis-match-predictor"
gcloud config set project "$PROJECT_ID"

Create GKE Cluster

# The default node pool serves CPU workloads; the GPU pool is added separately
gcloud container clusters create tennis-predictor-cluster \
    --zone us-central1-a \
    --machine-type e2-medium \
    --num-nodes 3

Deploy Services with Ansible

There are two ways to deploy:

a. Using the deployment script:

cd src/deploy
./deploy.zsh

b. Using GitHub Actions:

- Push to the main branch, or
- Manually trigger the "Deploy to GKE" workflow

The deployment script handles:

- Building and pushing Docker images for all services
- Deploying services to Kubernetes using Ansible

Verify Deployment

kubectl get pods -o wide
kubectl get services

Service Endpoints

The application exposes the following endpoints:

- API Service: http://<external-ip>:8000
  - /predict - Match prediction endpoint
  - /chat - WebSocket chat endpoint
- Probability Model: Internal service on port 8001
- LLM Service: Internal service on port 8002
- Ollama Service: Internal service on port 11434

Monitoring and Maintenance

Check GPU Status

kubectl describe node <gpu-node-name> | grep nvidia
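
The same check can be scripted by reading the node list as JSON and looking at each node's allocatable `nvidia.com/gpu` count. This is a minimal sketch, assuming `kubectl` is on the PATH and already pointed at the cluster; the helper names `gpu_allocatable` and `fetch_nodes` are made up for illustration.

```python
import json
import subprocess


def gpu_allocatable(node_list: dict) -> dict:
    """Map node name to allocatable GPU count, skipping nodes without GPUs.

    `node_list` is the parsed output of `kubectl get nodes -o json`.
    """
    result = {}
    for node in node_list.get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        count = int(allocatable.get("nvidia.com/gpu", "0"))
        if count > 0:
            result[node["metadata"]["name"]] = count
    return result


def fetch_nodes() -> dict:
    """Run kubectl and return the node list as a dict (requires cluster access)."""
    completed = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)
```

Calling `gpu_allocatable(fetch_nodes())` against the cluster should list the GPU node once the NVIDIA device plugin has registered its GPU; an empty result suggests the plugin or drivers are not ready yet.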

View Service Logs

kubectl logs -f deployment/api
kubectl logs -f deployment/ollama

Monitor Resources

kubectl top nodes
kubectl top pods

Testing

To test the deployed services:

Prediction API

curl -X POST "http://<external-ip>:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "player_a_id": "Novak Djokovic",
    "player_b_id": "Roger Federer",
    "lookback": 10
  }'
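
The same prediction request can be sent from Python using only the standard library. This is a sketch rather than an official client: the helper names `build_predict_request` and `predict` are made up for illustration, and because the response schema is not documented here, the parsed JSON is returned as-is.

```python
import json
import urllib.request


def build_predict_request(
    base_url: str, player_a: str, player_b: str, lookback: int = 10
) -> urllib.request.Request:
    """Build the POST request for /predict without sending it."""
    payload = {
        "player_a_id": player_a,
        "player_b_id": player_b,
        "lookback": lookback,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(
    base_url: str,
    player_a: str,
    player_b: str,
    lookback: int = 10,
    timeout: float = 30.0,
):
    """Send the request and return the decoded JSON response body."""
    request = build_predict_request(base_url, player_a, player_b, lookback)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `predict("http://<external-ip>:8000", "Novak Djokovic", "Roger Federer")` mirrors the curl call above once `<external-ip>` is replaced with the API service's external IP.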

Chat API

curl -X POST "http://<external-ip>:8000/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "player_a_id": "Novak Djokovic",
    "player_b_id": "Roger Federer",
    "query": "Who is more likely to win between Federer and Novak?",
    "history": []
  }'

Project Organization

├── README.md
└── src
    ├── ansible/                    # Ansible deployment configuration
    │   ├── inventory/
    │   ├── roles/
    │   └── deploy-k8s.yml
    ├── api/                        # FastAPI application
    ├── llm/                        # LLM service
    ├── probability_model/          # Tennis prediction model
    └── ollama/                     # GPU-accelerated LLM container

Future Improvements

- Implement horizontal pod autoscaling (HPA)
- Add monitoring with Prometheus and Grafana
- Implement CI/CD pipeline for automated deployments
- Add backup and disaster recovery procedures
