samecrowder/ac215_MSMBAllstars

AC215 - Milestone 5 (MSMBAllstars)

Team Members

  • Itamar Belson
  • Kenny Chen
  • Sam Crowder
  • Clay Coleman

Group Name

MSMBAllstars

Project Overview

Our project develops a machine learning application that predicts tennis match outcomes using historical ATP match data. The system combines an LSTM-based prediction model with an LLM-powered chat interface for user interaction.

Milestone 5 - Kubernetes Deployment, GPU Acceleration & ML Pipeline

For this milestone, we deployed the application to a Kubernetes cluster on Google Cloud Platform (GCP), with the following key features:

  1. Kubernetes Cluster Architecture

    • Multi-node GKE cluster with both CPU and GPU nodes
    • GPU node pool using NVIDIA L4 GPUs for LLM acceleration
    • Load balancing and auto-scaling capabilities
    • Resource optimization across nodes
  2. Service Components

    • API Service (FastAPI)
    • Probability Model Service (Tennis prediction model)
    • LLM Service (Chat interface)
    • Ollama Service (GPU-accelerated LLM model)
  3. Infrastructure as Code

    • Ansible-based deployment automation
    • Kubernetes manifests for all services
    • GPU resource management and scheduling
    • Container orchestration and scaling
  4. GPU Acceleration

    • NVIDIA device plugin integration
    • GPU-optimized Ollama container
    • Efficient resource allocation for ML workloads
  5. ML Pipeline

    • Single pipeline for preprocessing (see run_pipeline.sh in root)
    • Training on GCP Vertex AI and sweep optimization on Weights & Biases
    • Deployment of the model only if it passes a validation metric threshold
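
The deployment gate in the last bullet can be sketched as follows. This is a minimal illustration, not the project's pipeline code: the metric name `val_accuracy`, the threshold `0.65`, and the function name `should_deploy` are assumptions chosen for the example.

```python
from __future__ import annotations


def should_deploy(
    metrics: dict[str, float],
    metric_name: str = "val_accuracy",  # assumed metric name
    threshold: float = 0.65,            # assumed threshold
) -> bool:
    """Return True only when the chosen validation metric meets the threshold."""
    value = metrics.get(metric_name)
    if value is None:
        # A missing metric counts as a failed validation, never as a pass.
        return False
    return value >= threshold


if __name__ == "__main__":
    print(should_deploy({"val_accuracy": 0.71}))  # True
    print(should_deploy({"val_accuracy": 0.58}))  # False
```

Treating a missing metric as a failure keeps a broken training run from being deployed by accident.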

System Architecture

System Overview

Deployment Architecture

The system is deployed on GKE with the following node configuration:

  • 3 CPU nodes (e2-medium) for general workloads
  • 1 GPU node (g2-standard-4) with NVIDIA L4 for LLM acceleration

Node Pool Configuration

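# CPU node pool (GKE creates default-pool together with the cluster;
# create it separately only if it does not exist yet)
gcloud container node-pools create default-pool \
    --cluster=tennis-predictor-cluster \
    --zone=us-central1-a \
    --machine-type=e2-medium \
    --num-nodes=3

# GPU node pool
gcloud container node-pools create l4-gpu-pool \
    --cluster=tennis-predictor-cluster \
    --zone=us-central1-a \
    --machine-type=g2-standard-4 \
    --accelerator type=nvidia-l4,count=1 \
    --num-nodes=1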
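
To land a workload on the GPU pool, a pod typically selects the accelerator node label and requests the `nvidia.com/gpu` resource. The sketch below builds such a Deployment manifest as JSON, which `kubectl apply` accepts. It is illustrative only and not taken from the repository's manifests: the image `ollama/ollama`, the `app: ollama` label, and the function name are assumptions. The port 11434 and the deployment name `ollama` follow the rest of this README.

```python
from __future__ import annotations

import json


def ollama_gpu_deployment(gpu_count: int = 1) -> dict:
    """Build a Deployment manifest that requests GPUs on L4 nodes."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "ollama"},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": "ollama"}},  # assumed label
            "template": {
                "metadata": {"labels": {"app": "ollama"}},
                "spec": {
                    # GKE labels GPU nodes with their accelerator type.
                    "nodeSelector": {
                        "cloud.google.com/gke-accelerator": "nvidia-l4"
                    },
                    "containers": [
                        {
                            "name": "ollama",
                            "image": "ollama/ollama",  # assumed image
                            "ports": [{"containerPort": 11434}],
                            # GPUs are requested via the extended resource
                            # exposed by the NVIDIA device plugin.
                            "resources": {
                                "limits": {"nvidia.com/gpu": gpu_count}
                            },
                        }
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(ollama_gpu_deployment(), indent=2))
```

Saved under a hypothetical name such as `gpu_manifest.py`, the output can be piped into the cluster with `python gpu_manifest.py | kubectl apply -f -`.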

Deployment Process

  1. Set up GCP Project
# Set project ID
export PROJECT_ID="tennis-match-predictor"
gcloud config set project "$PROJECT_ID"
  2. Create GKE Cluster
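# The default pool holds the three e2-medium CPU nodes; the GPU node is added as a separate pool
gcloud container clusters create tennis-predictor-cluster \
    --zone us-central1-a \
    --machine-type e2-medium \
    --num-nodes 3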
  3. Deploy Services with Ansible

There are two ways to deploy:

a. Using the deployment script, which first builds and pushes the Docker images for all services, then deploys them to Kubernetes using Ansible.

cd src/deploy
./deploy.zsh

b. Using GitHub Actions:

  • Push to main branch, or
  • Manually trigger the "Deploy to GKE" workflow

The deployment script handles:

  • Building and pushing Docker images for all services
  • Deploying services to Kubernetes using Ansible
  4. Verify Deployment
kubectl get pods -o wide
kubectl get services
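
The same check can be scripted, for example as a post-deploy step. The sketch below is an assumption-laden illustration rather than part of the repository: it reads `kubectl get pods -o json` from the current context and lists pods that are not running with all containers ready. The function names are hypothetical.

```python
from __future__ import annotations

import json
import subprocess


def unready_pods(pod_list: dict) -> list[str]:
    """Return names of pods that are not Running with every container ready."""
    names = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        all_ready = bool(containers) and all(
            c.get("ready", False) for c in containers
        )
        if status.get("phase") != "Running" or not all_ready:
            names.append(pod["metadata"]["name"])
    return names


def fetch_pod_list() -> dict:
    """Read the pod list for the current kubectl context and namespace."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling `unready_pods(fetch_pod_list())` returns an empty list when every pod is healthy. Note that completed one-off pods (phase `Succeeded`) are also reported by this simple check.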

Service Endpoints

The application exposes the following endpoints:

  • API Service: http://<external-ip>:8000

    • /predict - Match prediction endpoint
    • /chat - WebSocket chat endpoint
  • Probability Model: Internal service on port 8001

  • LLM Service: Internal service on port 8002

  • Ollama Service: Internal service on port 11434

Monitoring and Maintenance

  1. Check GPU Status
kubectl describe node <gpu-node-name> | grep nvidia
  2. View Service Logs
kubectl logs -f deployment/api
kubectl logs -f deployment/ollama
  3. Monitor Resources
kubectl top nodes
kubectl top pods

Testing

To test the deployed services:

  1. Prediction API
curl -X POST "http://<external-ip>:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "player_a_id": "Novak Djokovic",
    "player_b_id": "Roger Federer",
    "lookback": 10
  }'
  2. Chat API
curl -X POST "http://<external-ip>:8000/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "player_a_id": "Novak Djokovic",
    "player_b_id": "Roger Federer",
    "query": "Who is more likely to win between Federer and Novak?",
    "history": []
  }'
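
For scripted testing, the prediction request above can also be issued from Python. This is a minimal sketch using only the standard library. The function names are hypothetical, and the response is returned as parsed JSON without assuming its schema, which this README does not specify.

```python
from __future__ import annotations

import json
import urllib.request


def build_predict_request(
    base_url: str, player_a: str, player_b: str, lookback: int = 10
) -> urllib.request.Request:
    """Build the POST request for /predict with the same JSON body as the curl call."""
    payload = {
        "player_a_id": player_a,
        "player_b_id": player_b,
        "lookback": lookback,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(
    base_url: str,
    player_a: str,
    player_b: str,
    lookback: int = 10,
    timeout: float = 30.0,
):
    """Send the request and return the decoded JSON response body."""
    request = build_predict_request(base_url, player_a, player_b, lookback)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `predict("http://<external-ip>:8000", "Novak Djokovic", "Roger Federer")` sends the same body as the first curl command, with `<external-ip>` replaced by the API service's external address.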

Project Organization

├── README.md
└── src
    ├── ansible/                    # Ansible deployment configuration
    │   ├── inventory/
    │   ├── roles/
    │   └── deploy-k8s.yml
    ├── api/                        # FastAPI application
    ├── llm/                        # LLM service
    ├── probability_model/          # Tennis prediction model
    └── ollama/                     # GPU-accelerated LLM container

Future Improvements

  1. Implement horizontal pod autoscaling (HPA)
  2. Add monitoring with Prometheus and Grafana
  3. Implement CI/CD pipeline for automated deployments
  4. Add backup and disaster recovery procedures
