260 changes: 260 additions & 0 deletions 02_ml_inference/02_sentiment_analysis/README.md
@@ -0,0 +1,260 @@
# flash-sentiment

Flash application demonstrating distributed GPU and CPU computing on Runpod's serverless infrastructure.

## About This Template

This project was generated using `flash init`. The `flash-sentiment` placeholder is automatically replaced with your actual project name during initialization.

Comment on lines +1 to +8 (Copilot AI, Feb 13, 2026):

The README reads like the generic flash init template and doesn’t mention Hugging Face / sentiment analysis in the overview. Update the intro/“What this demonstrates” sections to match the actual purpose of this example (Hugging Face sentiment classification).

## Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Configure Environment

Create `.env` file:

```bash
RUNPOD_API_KEY=your_api_key_here
```

Get your API key from [Runpod Settings](https://www.runpod.io/console/user/settings).

### 3. Run Locally

```bash
# Standard run
flash run

# Faster development: pre-provision endpoints (eliminates cold-start delays)
flash run --auto-provision
```

Comment on lines +17 to +30 (Copilot AI, Feb 13, 2026):

The README instructs users to create a .env file manually, but the repo’s other examples typically provide a .env.example to copy from. Consider updating the instructions to cp .env.example .env and adding a matching .env.example file for consistency.

Server starts at **http://localhost:8000**

With `--auto-provision`, all serverless endpoints deploy before testing begins. This is much faster for development because endpoints are cached and reused across server restarts. Subsequent runs skip deployment and start immediately.
Comment on lines +37 to +39 (Copilot AI, Feb 13, 2026):

README states the server starts at http://localhost:8000, but the example code (and other examples) default to FLASH_PORT=8888. Update the README to reflect the actual default port or adjust the code to match the documented port.

### 4. Test the API

```bash
# Health check
curl http://localhost:8000/ping

# GPU worker
curl -X POST http://localhost:8000/gpu/hello \
-H "Content-Type: application/json" \
-d '{"message": "Hello GPU!"}'

# CPU worker
curl -X POST http://localhost:8000/cpu/hello \
-H "Content-Type: application/json" \
-d '{"message": "Hello CPU!"}'
```
Comment on lines +41 to +56 (Copilot AI, Feb 13, 2026):

The README currently only documents /gpu/hello and /cpu/hello, but the PR’s main feature is sentiment analysis via /classify. Add usage docs (curl example + request/response) for /classify, otherwise users won’t know how to run the Hugging Face sentiment demo.
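
As a point of reference, the `/classify` route added in `main.py` accepts a JSON body with a single `text` field (per its `ClassifyRequest` model). A hypothetical local smoke test, with the response shape left open since `sentiment.classify` is not shown here:

```python
# Hypothetical smoke test for POST /classify; assumes the Flash server is running locally.
import requests

resp = requests.post(
    "http://localhost:8000/classify",  # or port 8888, the FLASH_PORT default noted below
    json={"text": "Flash makes serverless inference easy!"},
)
print(resp.json())  # response shape depends on sentiment.classify
```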

Visit **http://localhost:8000/docs** for interactive API documentation.

## What This Demonstrates

### GPU Worker (`workers/gpu/`)
Simple GPU-based serverless function:
- Remote execution with `@remote` decorator
- GPU resource configuration
- Automatic scaling (0-3 workers)
- No external dependencies required

```python
@remote(
    resource_config=LiveServerless(
        name="gpu_worker",
        gpus=[GpuGroup.ADA_24],  # RTX 4090
        workersMin=0,
        workersMax=3,
    )
)
async def gpu_hello(input_data: dict) -> dict:
    # Your GPU code here
    return {"status": "success", "message": "Hello from GPU!"}
```
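
For context, `main.py` mounts `gpu_router` from `workers/gpu/__init__.py` under the `/gpu` prefix. That router is not reproduced in this README, but wiring it to the worker above likely looks something like this sketch (route path `/hello` matching the curl example in Quick Start):

```python
# Illustrative sketch of workers/gpu/__init__.py; the actual file may differ.
from fastapi import APIRouter

from .endpoint import gpu_hello

gpu_router = APIRouter()


@gpu_router.post("/hello")
async def hello(data: dict):
    # Awaiting the @remote function dispatches it to the serverless worker.
    return await gpu_hello(data)
```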

### CPU Worker (`workers/cpu/`)
Simple CPU-based serverless function:
- CPU-only execution (no GPU overhead)
- CpuLiveServerless configuration
- Efficient for API endpoints
- Automatic scaling (0-5 workers)

```python
@remote(
    resource_config=CpuLiveServerless(
        name="cpu_worker",
        instanceIds=[CpuInstanceType.CPU3G_2_8],  # 2 vCPU, 8GB RAM
        workersMin=0,
        workersMax=5,
    )
)
async def cpu_hello(input_data: dict) -> dict:
    # Your CPU code here
    return {"status": "success", "message": "Hello from CPU!"}
```

## Project Structure

```
flash-sentiment/
├── main.py                  # FastAPI application
├── workers/
│   ├── gpu/                 # GPU worker
│   │   ├── __init__.py      # FastAPI router
│   │   └── endpoint.py      # @remote decorated function
│   └── cpu/                 # CPU worker
│       ├── __init__.py      # FastAPI router
│       └── endpoint.py      # @remote decorated function
├── .env                     # Environment variables
├── requirements.txt         # Dependencies
└── README.md                # This file
```

## Key Concepts

### Remote Execution
The `@remote` decorator transparently executes functions on serverless infrastructure:
- Code runs locally during development
- Automatically deploys to Runpod when configured
- Handles serialization, dependencies, and resource management

### Resource Scaling
Both workers scale to zero when idle to minimize costs:
- **idleTimeout**: Minutes before scaling down (default: 5)
- **workersMin**: 0 = completely scales to zero
- **workersMax**: Maximum concurrent workers
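
Putting these knobs together, a scale-to-zero worker might look roughly like the sketch below; treat the exact constructor fields (especially `idleTimeout`, whose name comes from the list above) and the `GpuGroup` import location as assumptions rather than confirmed API.

```python
# Sketch only: a scale-to-zero worker combining the scaling knobs above.
# Assumption: GpuGroup imports from runpod_flash and idleTimeout is accepted by LiveServerless.
from runpod_flash import GpuGroup, LiveServerless, remote

scaled_config = LiveServerless(
    name="scaled_worker",  # hypothetical endpoint name
    gpus=[GpuGroup.ANY],
    workersMin=0,   # 0 = scales all the way down between requests
    workersMax=3,   # cap on concurrent workers
    idleTimeout=5,  # minutes before idle workers are released
)


@remote(resource_config=scaled_config)
async def scaled_hello(input_data: dict) -> dict:
    return {"status": "success"}
```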

### GPU Types
Available GPU options for `LiveServerless`:
- `GpuGroup.ADA_24` - RTX 4090 (24GB)
- `GpuGroup.ADA_48_PRO` - RTX 6000 Ada, L40 (48GB)
- `GpuGroup.AMPERE_80` - A100 (80GB)
- `GpuGroup.ANY` - Any available GPU

### CPU Types
Available CPU options for `CpuLiveServerless`:
- `CpuInstanceType.CPU3G_2_8` - 2 vCPU, 8GB RAM (General Purpose)
- `CpuInstanceType.CPU3C_4_8` - 4 vCPU, 8GB RAM (Compute Optimized)
- `CpuInstanceType.CPU5G_4_16` - 4 vCPU, 16GB RAM (Latest Gen)
- `CpuInstanceType.ANY` - Any available GPU
Copilot AI (Feb 13, 2026):

The CPU instance list includes CpuInstanceType.ANY described as “Any available GPU”, which is incorrect and will confuse readers. Fix the description to reference CPU availability (or remove the line if that enum doesn’t exist).

Suggested change: describe `CpuInstanceType.ANY` as “Any available CPU” instead of “Any available GPU”.

## Development Workflow

### Test Workers Locally
```bash
# Test GPU worker
python -m workers.gpu.endpoint

# Test CPU worker
python -m workers.cpu.endpoint
```
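
For these `python -m` invocations to exercise the workers, each endpoint module would typically end with a small self-test block. The following is a hypothetical sketch (the actual worker files are not shown in this diff):

```python
# Hypothetical tail of workers/gpu/endpoint.py, enabling `python -m workers.gpu.endpoint`.
if __name__ == "__main__":
    import asyncio

    # During local development the @remote function runs locally, so this is a quick smoke test.
    result = asyncio.run(gpu_hello({"message": "local smoke test"}))
    print(result)
```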

### Run the Application
```bash
flash run
```

### Deploy to Production
```bash
# Build and deploy in one step
flash deploy

# Or deploy to a specific environment
flash deploy --env production
```

## Adding New Workers

### Add a GPU Worker

1. Create `workers/my_worker/endpoint.py`:
```python
from runpod_flash import remote, LiveServerless

config = LiveServerless(name="my_worker")

@remote(resource_config=config, dependencies=["torch"])
async def my_function(data: dict) -> dict:
    import torch
    # Your code here
    return {"result": "success"}
```

2. Create `workers/my_worker/__init__.py`:
```python
from fastapi import APIRouter
from .endpoint import my_function

router = APIRouter()

@router.post("/process")
async def handler(data: dict):
    return await my_function(data)
```

3. Add to `main.py`:
```python
from workers.my_worker import router as my_router
app.include_router(my_router, prefix="/my_worker")
```

### Add a CPU Worker

Same pattern but use `CpuLiveServerless`:
```python
from runpod_flash import remote, CpuLiveServerless, CpuInstanceType

config = CpuLiveServerless(
    name="my_cpu_worker",
    instanceIds=[CpuInstanceType.CPU3G_2_8],
)

@remote(resource_config=config, dependencies=["requests"])
async def fetch_data(url: str) -> dict:
    import requests
    return requests.get(url).json()
```

## Adding Dependencies

Specify dependencies in the `@remote` decorator:
```python
@remote(
    resource_config=config,
    dependencies=["torch>=2.0.0", "transformers"],  # Python packages
    system_dependencies=["ffmpeg"],  # System packages
)
async def my_function(data: dict) -> dict:
    # Dependencies are automatically installed
    import torch
    import transformers
```

## Environment Variables

```bash
# Required
RUNPOD_API_KEY=your_api_key

# Optional
FLASH_HOST=localhost # Host to bind the server to (default: localhost)
FLASH_PORT=8888 # Port to bind the server to (default: 8888)
LOG_LEVEL=INFO # Logging level (default: INFO)
```

## Next Steps

- Add your ML models or processing logic
- Configure GPU/CPU resources based on your needs
- Add authentication to your endpoints
- Implement error handling and retries
- Add monitoring and logging
- Deploy to production with `flash deploy`
64 changes: 64 additions & 0 deletions 02_ml_inference/02_sentiment_analysis/main.py
@@ -0,0 +1,64 @@
import logging
import os
import sentiment # noqa: F401
from sentiment import classify


from fastapi import FastAPI

Comment on lines +3 to +8 (Copilot AI, Feb 13, 2026):

`import sentiment  # noqa: F401` is redundant because `from sentiment import classify` already imports the module, and the noqa suppresses a real unused-import warning. Remove the redundant import (or import only what’s needed) to keep the example clean.

Suggested change: drop the `import sentiment  # noqa: F401` line, keeping only `from sentiment import classify` and `from fastapi import FastAPI`.

from workers.cpu import cpu_router
from workers.gpu import gpu_router

logger = logging.getLogger(__name__)


app = FastAPI(
    title="Flash Application",
    description="Distributed GPU and CPU computing with Runpod Flash",
    version="0.1.0",
)

# Include routers
app.include_router(gpu_router, prefix="/gpu", tags=["GPU Workers"])
app.include_router(cpu_router, prefix="/cpu", tags=["CPU Workers"])


@app.get("/")
def home():
    return {
        "message": "Flash Application",
        "docs": "/docs",
        "endpoints": {"gpu_hello": "/gpu/hello", "cpu_hello": "/cpu/hello"},
    }

Comment on lines +29 to +31 (Copilot AI, Feb 13, 2026):

The / response omits the new /classify endpoint, so users won’t discover the main sentiment-analysis functionality from the homepage payload. Add /classify to the returned endpoints (and ideally update the message/description to mention sentiment analysis).

Suggested change: return "Flash Application - Sentiment Analysis" as the message and add a "classify": "/classify" entry alongside gpu_hello and cpu_hello in the endpoints dict.


@app.get("/ping")
def ping():
    return {"status": "healthy"}

from pydantic import BaseModel

Comment on lines +39 to +40 (Copilot AI, Feb 13, 2026):

Imports are split across the file (from pydantic import BaseModel mid-file and additional runpod_flash imports at the bottom). This breaks the import pattern used across other examples and makes it easy to miss unused/duplicate code; move imports to the top and remove unused ones.

class ClassifyRequest(BaseModel):
    text: str

@app.post("/classify", tags=["AI"])
async def classify_endpoint(req: ClassifyRequest):
    # classify() is a Flash remote function, so you must await it
    return await classify(req.text)

if __name__ == "__main__":
    import uvicorn

    host = os.getenv("FLASH_HOST", "localhost")
    port = int(os.getenv("FLASH_PORT", 8888))
    logger.info(f"Starting Flash server on {host}:{port}")

    uvicorn.run(app, host=host, port=port)

from runpod_flash import remote, LiveServerless, CpuInstanceType

cpu_config = LiveServerless(
    name="flash-ai-sentiment",
    instanceIds=[CpuInstanceType.CPU3G_2_8],
    workersMax=1,
)
Comment on lines +56 to +64 (Copilot AI, Feb 13, 2026):

The trailing runpod_flash imports and cpu_config = LiveServerless(...) block at the bottom are unused and duplicate the config in sentiment.py. Because they execute at import time, they add confusion and can cause accidental name collisions; remove this dead code.

Suggested change: end the file at `uvicorn.run(app, host=host, port=port)` and drop the trailing `runpod_flash` import and `cpu_config = LiveServerless(...)` block.
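
For context, `sentiment.py` (imported above but not shown in this section) is where `classify` and the real `flash-ai-sentiment` configuration live. A minimal hypothetical sketch of such a Hugging Face sentiment worker, mirroring the trailing config block above and the dependency mechanism documented in the README, might look like:

```python
# Hypothetical sketch of sentiment.py; the actual implementation is not part of this section.
from runpod_flash import CpuInstanceType, LiveServerless, remote

cpu_config = LiveServerless(
    name="flash-ai-sentiment",
    instanceIds=[CpuInstanceType.CPU3G_2_8],
    workersMax=1,
)


@remote(resource_config=cpu_config, dependencies=["transformers", "torch"])
async def classify(text: str) -> dict:
    # Dependencies are installed on the remote worker, so import them inside the function.
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis")
    # The pipeline returns a list of {"label": ..., "score": ...} dicts; return the first.
    return classifier(text)[0]
```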
55 changes: 55 additions & 0 deletions 02_ml_inference/02_sentiment_analysis/mothership.py
@@ -0,0 +1,55 @@
"""
Mothership Endpoint Configuration

The mothership endpoint serves your FastAPI application routes.
It is automatically deployed as a CPU-optimized load-balanced endpoint.

To customize this configuration:
- Modify worker scaling: change workersMin and workersMax values
- Use GPU load balancer: import LiveLoadBalancer instead of CpuLiveLoadBalancer
- Change endpoint name: update the 'name' parameter

To disable mothership deployment:
- Delete this file, or
- Comment out the 'mothership' variable below

Documentation: https://docs.runpod.io/flash/mothership
"""

from runpod_flash import CpuLiveLoadBalancer

# Mothership endpoint configuration
# This serves your FastAPI app routes from main.py
mothership = CpuLiveLoadBalancer(
    name="mothership",
    workersMin=1,
    workersMax=3,
)
Comment on lines +23 to +27 (Copilot AI, Feb 13, 2026):

The mothership load balancer name is set to the generic "mothership". Other examples use unique, namespaced endpoint names (to avoid collisions across examples/accounts); consider renaming this to something like 02_02_sentiment_analysis-mothership.

# Examples of customization:

# Increase scaling for high traffic
# mothership = CpuLiveLoadBalancer(
#     name="mothership",
#     workersMin=2,
#     workersMax=10,
# )

# Use GPU-based load balancer instead of CPU
# (requires importing LiveLoadBalancer)
# from runpod_flash import LiveLoadBalancer
# mothership = LiveLoadBalancer(
#     name="mothership",
#     gpus=[GpuGroup.ANY],
# )

# Custom endpoint name
# mothership = CpuLiveLoadBalancer(
#     name="my-api-gateway",
#     workersMin=1,
#     workersMax=3,
# )

# To disable mothership:
# - Delete this entire file, or
# - Comment out the 'mothership' variable above