This guide covers different deployment scenarios for CodePilot, from local development to production deployment.
- Local Development Setup
- Local Inference Server
- AWS Deployment
- Docker Deployment
- VSCode Extension Installation
- Production Considerations
## Local Development Setup

### Prerequisites

- Ubuntu 20.04+ (or similar Linux distribution)
- Python 3.8+
- CUDA 11.8+ (for GPU support)
- 16GB+ RAM (32GB recommended for training)
- Node.js 16+ (for VSCode extension)
### Automated Setup

```bash
# Clone repository
git clone https://github.com/sreekar-gajula/code-pilot.git
cd code-pilot
# Run automated setup
chmod +x scripts/setup.sh
./scripts/setup.sh
# Activate environment
source venv/bin/activate
```

### Manual Setup

If the automated setup fails:

```bash
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Create directories
mkdir -p data/{raw,processed} model/checkpoints logs evaluation/results
```

## Local Inference Server

```bash
# Activate environment
source venv/bin/activate
# Set model path (will auto-download from HuggingFace)
export MODEL_PATH="codellama/CodeLlama-13b-Instruct-hf"
# Start server
cd inference
python server.py --model-path $MODEL_PATH --port 8000
```

After training your own model:

```bash
# Set path to your fine-tuned model
export MODEL_PATH="./model/checkpoints/codepilot-13b"
# Start server
cd inference
python server.py --model-path $MODEL_PATH --port 8000
```

Test the server:

```bash
# Health check
curl http://localhost:8000/health
# Test completion
curl -X POST http://localhost:8000/complete \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "void can_transmit_message(can_msg_t* msg) {\n",
    "max_tokens": 150,
    "temperature": 0.2
  }'
```
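The same request can be scripted. Here is a minimal Python client, assuming only the `requests` package and the `/complete` endpoint shown above:

```python
import requests

# Completion endpoint of the locally running inference server
API_URL = "http://localhost:8000/complete"

payload = {
    "prompt": "void can_transmit_message(can_msg_t* msg) {\n",
    "max_tokens": 150,
    "temperature": 0.2,
}

# POST the request and print the raw JSON response
response = requests.post(API_URL, json=payload, timeout=30)
response.raise_for_status()
print(response.json())
```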
### Configuration

Edit `inference/config.yaml`:

```yaml
model:
  path: "./model/codepilot-13b"
  quantization: "4bit"

server:
  host: "0.0.0.0"
  port: 8000
  workers: 4

vllm:
  max_model_len: 4096
  tensor_parallel_size: 1
  dtype: "float16"
```
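How `server.py` consumes this file is not shown here; as a rough sketch, loading it with PyYAML would look like this (the key names match the config above, but the actual loading code may differ):

```python
import yaml

# Parse the YAML configuration used by the inference server
with open("inference/config.yaml") as f:
    config = yaml.safe_load(f)

model_path = config["model"]["path"]  # "./model/codepilot-13b"
port = config["server"]["port"]       # 8000
print(f"Serving {model_path} on port {port}")
```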
## AWS Deployment

1. **Launch EC2 Instance**
   - Instance Type: `g5.2xlarge` (NVIDIA A10G GPU)
   - AMI: Deep Learning AMI (Ubuntu)
   - Storage: 100GB EBS
   - Security Group: Open port 8000
2. **Connect and Setup**

```bash
# SSH into instance
ssh -i your-key.pem ubuntu@<instance-ip>
# Clone repository
git clone https://github.com/sreekar-gajula/code-pilot.git
cd code-pilot
# Run setup
./scripts/setup.sh
# Install NVIDIA drivers (if not using DL AMI)
sudo apt-get install -y nvidia-driver-525
```

3. **Start Service**

```bash
# Run as background service with systemd
sudo cp scripts/codepilot.service /etc/systemd/system/
sudo systemctl enable codepilot
sudo systemctl start codepilot
# Check status
sudo systemctl status codepilot
```

Create `/etc/systemd/system/codepilot.service`:

```ini
[Unit]
Description=CodePilot Inference Server
After=network.target
[Service]
Type=simple
User=ubuntu
WorkingDirectory=/home/ubuntu/code-pilot
Environment="PATH=/home/ubuntu/code-pilot/venv/bin"
ExecStart=/home/ubuntu/code-pilot/venv/bin/python inference/server.py
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
```

For high availability, put the server behind a load balancer such as AWS ELB, or use nginx as a reverse proxy:

```bash
# Install nginx
sudo apt-get install -y nginx
# Configure reverse proxy
sudo nano /etc/nginx/sites-available/codepilot
```

Nginx config:

```nginx
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

## Docker Deployment

```bash
# Build production image
docker build -t codepilot:latest .
# Or build development image
docker build --target development -t codepilot:dev .
```

```bash
# Run with GPU support
docker run --gpus all -p 8000:8000 \
  -v $(pwd)/model:/app/model \
  codepilot:latest
# Run with custom configuration
docker run --gpus all -p 8000:8000 \
  -e MODEL_PATH=/app/model/codepilot-13b \
  -e API_PORT=8000 \
  codepilot:latest
```

Create `docker-compose.yml`:

```yaml
version: '3.8'

services:
  codepilot:
    image: codepilot:latest
    ports:
      - "8000:8000"
    volumes:
      - ./model:/app/model
      - ./logs:/app/logs
    environment:
      - MODEL_PATH=/app/model/codepilot-13b
      - CUDA_VISIBLE_DEVICES=0
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

Run with:

```bash
docker-compose up -d
```

## VSCode Extension Installation

### Development Mode

```bash
cd vscode-extension
# Install dependencies
npm install
# Compile TypeScript
npm run compile
# Open in VSCode and press F5 to launch the extension development host
```

### Package and Install

```bash
cd vscode-extension
# Install VSCE
npm install -g vsce
# Package extension
vsce package
# Install in VSCode
code --install-extension codepilot-automotive-1.0.0.vsix
```

Once published:
- Open VSCode
- Go to Extensions (Ctrl+Shift+X)
- Search "CodePilot Automotive"
- Click Install
After installation, configure in VSCode settings:

```json
{
  "codepilot.apiEndpoint": "http://localhost:8000",
  "codepilot.enableAutoComplete": true,
  "codepilot.maxTokens": 150,
  "codepilot.temperature": 0.2
}
```

## Production Considerations

### Performance Optimization
1. **Model Quantization**
   - Use 4-bit quantization for memory efficiency (see the sketch after this list)
   - Trade-off: roughly 5% accuracy loss for about 4x less memory
2. **Batch Processing**
   - vLLM enables efficient batching
   - Configure `max_num_seqs` based on GPU memory
3. **Caching**
   - Enable KV cache for repeated prefixes
   - Can reduce latency by 30-50%
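To illustrate item 1, a 4-bit model load via Hugging Face Transformers and bitsandbytes might look like the sketch below. This is illustrative, not the project's actual loading code, and uses the base model path from earlier in this guide:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_PATH = "codellama/CodeLlama-13b-Instruct-hf"

# NF4 4-bit quantization: roughly 4x lower weight memory for a small accuracy cost
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available GPUs
)
```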
### Monitoring

1. **Prometheus Metrics**

Add to `inference/server.py`:

```python
from prometheus_client import Counter, Histogram
request_count = Counter('codepilot_requests_total', 'Total requests')
latency = Histogram('codepilot_latency_seconds', 'Request latency')
```
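Defining the metrics is not enough on its own: they must be updated per request and exposed for scraping. One possible wiring, building on the snippet above (the handler name is illustrative):

```python
from prometheus_client import start_http_server

# Expose /metrics on port 9090 for Prometheus to scrape
start_http_server(9090)

@latency.time()  # observe the duration of every decorated call
def handle_completion(prompt: str) -> str:
    request_count.inc()  # count each completion request
    return prompt        # stand-in for the actual model call
```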
2. **Logging**

Configure structured logging:

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('logs/codepilot.log'),
        logging.StreamHandler()
    ]
)
```
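With this configuration in place, any module can log through the standard interface, and the lines land in both `logs/codepilot.log` and the console:

```python
import logging

logger = logging.getLogger("codepilot.inference")  # illustrative logger name
logger.info("Model loaded")
```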
### Security

1. **API Authentication**
   - Add API key validation (see the sketch after this list)
   - Use JWT tokens for user auth
2. **Rate Limiting**
   - Implement per-user rate limits
   - Use Redis for distributed rate limiting
3. **HTTPS**
   - Use Let's Encrypt for SSL certificates
   - Configure nginx with HTTPS
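As a sketch of item 1, an API-key check could be added as a FastAPI dependency. This assumes the server is built on FastAPI, which this guide does not confirm; the header and environment variable names are illustrative:

```python
import os

from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()
API_KEY = os.environ.get("CODEPILOT_API_KEY", "")  # illustrative env var

def require_api_key(x_api_key: str = Header(default="")) -> None:
    # FastAPI binds the X-Api-Key request header to this parameter
    if not API_KEY or x_api_key != API_KEY:
        raise HTTPException(status_code=401, detail="Invalid API key")

@app.get("/health")
def health(_: None = Depends(require_api_key)) -> dict:
    # Reuses the existing health endpoint as a protected example
    return {"status": "ok"}
```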
### Scaling

1. **Horizontal Scaling**
   - Deploy multiple instances
   - Use a load balancer (AWS ELB, nginx)

2. **Model Replication**
   - Tensor parallelism for large models
   - Pipeline parallelism across nodes

3. **Auto-scaling**
   - Configure AWS Auto Scaling groups
   - Scale based on GPU utilization (see the sketch below)
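For item 3, AWS Auto Scaling cannot see GPU utilization by default, so each instance has to publish it as a custom CloudWatch metric that a scaling policy can target. A hedged sketch using `nvidia-smi` and boto3 (the namespace and metric names are illustrative):

```python
import subprocess

import boto3

cloudwatch = boto3.client("cloudwatch")

# Read current GPU utilization (percent) from nvidia-smi
out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=utilization.gpu",
     "--format=csv,noheader,nounits"]
)
gpu_util = float(out.decode().splitlines()[0])

# Publish as a custom metric; an Auto Scaling policy can track it
cloudwatch.put_metric_data(
    Namespace="CodePilot",  # illustrative namespace
    MetricData=[{
        "MetricName": "GPUUtilization",
        "Value": gpu_util,
        "Unit": "Percent",
    }],
)
```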
## Troubleshooting

**Issue**: CUDA out of memory

```bash
# Solution: Reduce batch size or use smaller model
export CUDA_VISIBLE_DEVICES=0
python inference/server.py --max-batch-size 1
```

**Issue**: Port already in use

```bash
# Solution: Change port or kill existing process
lsof -ti:8000 | xargs kill -9
python inference/server.py --port 8001
```

**Issue**: Model loading slow

```bash
# Solution: Use local model cache
export TRANSFORMERS_CACHE=/path/to/cache
```

## Support

- GitHub Issues: Report bugs and feature requests
- Discussions: Ask questions and share ideas
- Email: sreekar.gajula@example.com
## Next Steps

- Training Guide - Train your own model
- API Reference - Detailed API documentation
- Examples - Usage examples