From e897470c16b79d2a00372e1d960b45a9e02447ab Mon Sep 17 00:00:00 2001 From: "promptless[bot]" <179508745+promptless[bot]@users.noreply.github.com> Date: Tue, 20 Jan 2026 18:24:02 +0000 Subject: [PATCH 01/19] Add Flash documentation section --- docs.json | 13 ++ flash/api-endpoints.mdx | 235 ++++++++++++++++++++++ flash/deploy-apps.mdx | 170 ++++++++++++++++ flash/monitoring.mdx | 204 +++++++++++++++++++ flash/overview.mdx | 155 +++++++++++++++ flash/pricing.mdx | 108 +++++++++++ flash/quickstart.mdx | 324 +++++++++++++++++++++++++++++++ flash/remote-functions.mdx | 261 +++++++++++++++++++++++++ flash/resource-configuration.mdx | 268 +++++++++++++++++++++++++ 9 files changed, 1738 insertions(+) create mode 100644 flash/api-endpoints.mdx create mode 100644 flash/deploy-apps.mdx create mode 100644 flash/monitoring.mdx create mode 100644 flash/overview.mdx create mode 100644 flash/pricing.mdx create mode 100644 flash/quickstart.mdx create mode 100644 flash/remote-functions.mdx create mode 100644 flash/resource-configuration.mdx diff --git a/docs.json b/docs.json index 3de1948b..4ec14ccb 100644 --- a/docs.json +++ b/docs.json @@ -118,6 +118,19 @@ } ] }, + { + "group": "Flash", + "pages": [ + "flash/overview", + "flash/quickstart", + "flash/pricing", + "flash/remote-functions", + "flash/api-endpoints", + "flash/deploy-apps", + "flash/resource-configuration", + "flash/monitoring" + ] + }, { "group": "Pods", "pages": [ diff --git a/flash/api-endpoints.mdx b/flash/api-endpoints.mdx new file mode 100644 index 00000000..9d669b3c --- /dev/null +++ b/flash/api-endpoints.mdx @@ -0,0 +1,235 @@ +--- +title: "Create Flash API endpoints" +sidebarTitle: "Create API endpoints" +description: "Build and serve HTTP APIs using FastAPI with Flash." +--- + +Flash API endpoints let you build HTTP APIs with FastAPI that run on Runpod Serverless workers. Use them to deploy production APIs that need GPU or CPU acceleration. + +Unlike standalone scripts that run once and return results, API endpoints create a persistent server that handles incoming HTTP requests. Each request is processed by a Serverless worker using the same remote functions you'd use in a standalone script. + + + +Flash API endpoints are currently available for local testing only. Run `flash run` to start the API server on your local machine. Production deployment support is coming in future updates. + + + +## Step 1: Initialize a new project + +Use the `flash init` command to generate a structured project template with a preconfigured FastAPI application entry point. + +Run this command to initialize a new project directory: + +```bash +flash init my_project +``` + +You can also initialize your current directory: + +```bash +flash init +``` + +## Step 2: Explore the project template + +This is the structure of the project template created by `flash init`: + +```text +my_project/ +├── main.py # FastAPI application entry point +├── workers/ +│ ├── gpu/ # GPU worker example +│ │ ├── __init__.py # FastAPI router +│ │ └── endpoint.py # GPU script with @remote decorated function +│ └── cpu/ # CPU worker example +│ ├── __init__.py # FastAPI router +│ └── endpoint.py # CPU script with @remote decorated function +├── .env # Environment variable template +├── .gitignore # Git ignore patterns +├── .flashignore # Flash deployment ignore patterns +├── requirements.txt # Python dependencies +└── README.md # Project documentation +``` + +This template includes: + +- A FastAPI application entry point and routers. 
+- Templates for Python dependencies, `.env`, `.gitignore`, etc. +- Flash scripts (`endpoint.py`) for both GPU and CPU workers, which include: + - Pre-configured worker scaling limits using the `LiveServerless()` object. + - A `@remote` decorated function that returns a response from a worker. + +When you start the FastAPI server, it creates API endpoints at `/gpu/hello` and `/cpu/hello`, which call the remote function described in their respective `endpoint.py` files. + +## Step 3: Install Python dependencies + +After initializing the project, navigate into the project directory: + +```bash +cd my_project +``` + +Install required dependencies: + +```bash +pip install -r requirements.txt +``` + +## Step 4: Configure your API key + +Open the `.env` template file in a text editor and add your [Runpod API key](/get-started/api-keys): + +```bash +# Use your text editor of choice, e.g. +cursor .env +``` + +Remove the `#` symbol from the beginning of the `RUNPOD_API_KEY` line and replace `your_api_key_here` with your actual Runpod API key: + +```text +RUNPOD_API_KEY=your_api_key_here +# FLASH_HOST=localhost +# FLASH_PORT=8888 +# LOG_LEVEL=INFO +``` + +Save the file and close it. + +## Step 5: Start the local API server + +Use `flash run` to start the API server: + +```bash +flash run +``` + +Open a new terminal tab or window and test your GPU API using cURL: + +```bash +curl -X POST http://localhost:8888/gpu/hello \ + -H "Content-Type: application/json" \ + -d '{"message": "Hello from the GPU!"}' +``` + +If you switch back to the terminal tab where you used `flash run`, you'll see the details of the job's progress. + +### Faster testing with auto-provisioning + +For development with multiple endpoints, use `--auto-provision` to deploy all resources before testing: + +```bash +flash run --auto-provision +``` + +This eliminates cold-start delays by provisioning all serverless endpoints upfront. Endpoints are cached and reused across server restarts, making subsequent runs faster. Resources are identified by name, so the same endpoint won't be re-deployed if the configuration hasn't changed. + +## Step 6: Open the API explorer + +Besides starting the API server, `flash run` also starts an interactive API explorer. Point your web browser at [http://localhost:8888/docs](http://localhost:8888/docs) to explore the API. + +To run remote functions in the explorer: + +1. Expand one of the functions under **GPU Workers** or **CPU Workers**. +2. Click **Try it out** and then **Execute**. + +You'll get a response from your workers right in the explorer. + +## Step 7: Customize your API + +To customize your API endpoint and functionality: + +1. Add or edit remote functions in your `endpoint.py` files. +2. Test the scripts individually by running `python endpoint.py`. +3. Configure your FastAPI routers by editing the `__init__.py` files. +4. Add any new endpoints to your `main.py` file. + +### Example: Adding a custom endpoint + +To add a new GPU endpoint for image generation: + +1. 
Create a new file at `workers/gpu/image_gen.py`: + +```python +from tetra_rp import remote, LiveServerless, GpuGroup + +config = LiveServerless( + name="image-generator", + gpus=[GpuGroup.AMPERE_24], + workersMax=2 +) + +@remote( + resource_config=config, + dependencies=["diffusers", "torch", "transformers"] +) +def generate_image(prompt: str, width: int = 512, height: int = 512): + import torch + from diffusers import StableDiffusionPipeline + import base64 + import io + + pipeline = StableDiffusionPipeline.from_pretrained( + "runwayml/stable-diffusion-v1-5", + torch_dtype=torch.float16 + ).to("cuda") + + image = pipeline(prompt=prompt, width=width, height=height).images[0] + + buffered = io.BytesIO() + image.save(buffered, format="PNG") + img_str = base64.b64encode(buffered.getvalue()).decode() + + return {"image": img_str, "prompt": prompt} +``` + +2. Add a route in `workers/gpu/__init__.py`: + +```python +from fastapi import APIRouter +from .image_gen import generate_image + +router = APIRouter() + +@router.post("/generate") +async def generate(prompt: str, width: int = 512, height: int = 512): + result = await generate_image(prompt, width, height) + return result +``` + +3. Include the router in `main.py` if not already included. + +## Load-balanced endpoints + +For API endpoints requiring low-latency HTTP access with direct routing, use load-balanced endpoints: + +```python +from tetra_rp import LiveLoadBalancer, remote + +api = LiveLoadBalancer(name="api-service") + +@remote(api, method="POST", path="/api/process") +async def process_data(x: int, y: int): + return {"result": x + y} + +@remote(api, method="GET", path="/api/health") +def health_check(): + return {"status": "ok"} + +# Call functions directly +result = await process_data(5, 3) # → {"result": 8} +``` + +Key differences from queue-based endpoints: + +- **Direct HTTP routing**: Requests routed directly to workers, no queue. +- **Lower latency**: No queuing overhead. +- **Custom HTTP methods**: GET, POST, PUT, DELETE, PATCH support. +- **No automatic retries**: Users handle errors directly. + +Load-balanced endpoints are ideal for REST APIs, webhooks, and real-time services. Queue-based endpoints are better for batch processing and fault-tolerant workflows. + +## Next steps + +- [Deploy Flash applications](/flash/deploy-apps) for production use. +- [Configure resources](/flash/resource-configuration) for your endpoints. +- [Monitor and debug](/flash/monitoring) your endpoints. diff --git a/flash/deploy-apps.mdx b/flash/deploy-apps.mdx new file mode 100644 index 00000000..31d49ee4 --- /dev/null +++ b/flash/deploy-apps.mdx @@ -0,0 +1,170 @@ +--- +title: "Deploy Flash apps" +sidebarTitle: "Deploy apps" +description: "Build and deploy Flash applications for production." +--- + +Flash uses a build process to package your application for deployment. This page covers how the build process works, including handler generation, cross-platform builds, and troubleshooting common issues. + +## Build process and handler generation + +When you run `flash build`, the following happens: + +1. **Discovery**: Flash scans your code for `@remote` decorated functions. +2. **Grouping**: Functions are grouped by their `resource_config`. +3. **Handler generation**: For each resource config, Flash generates a lightweight handler file. +4. **Manifest creation**: A `flash_manifest.json` file maps functions to their endpoints. +5. **Dependency installation**: Python packages are installed with Linux `x86_64` compatibility. +6. 
**Packaging**: Everything is bundled into `archive.tar.gz` for deployment. + +### Handler architecture + +Flash uses a factory pattern for handlers to eliminate code duplication: + +```python +# Generated handler (handler_gpu_config.py) +from tetra_rp.runtime.generic_handler import create_handler +from workers.gpu import process_data + +FUNCTION_REGISTRY = { + "process_data": process_data, +} + +handler = create_handler(FUNCTION_REGISTRY) +``` + +This approach provides: + +- **Single source of truth**: All handler logic in one place. +- **Easier maintenance**: Bug fixes don't require rebuilding projects. + +## Cross-platform builds + +Flash automatically handles cross-platform builds, ensuring your deployments work correctly regardless of your development platform: + +- **Automatic platform targeting**: Dependencies are installed for Linux `x86_64` (Runpod's serverless platform), even when building on macOS or Windows. +- **Python version matching**: The build uses your current Python version to ensure package compatibility. +- **Binary wheel enforcement**: Only pre-built binary wheels are used, preventing platform-specific compilation issues. + +This means you can build on macOS ARM64, Windows, or any other platform, and the resulting package will run correctly on Runpod Serverless. + +## Cross-endpoint function calls + +Flash enables functions on different endpoints to call each other. The runtime automatically discovers endpoints using the manifest and routes calls appropriately: + +```python +# CPU endpoint function +@remote(resource_config=cpu_config) +def preprocess(data): + # Preprocessing logic + return clean_data + +# GPU endpoint function +@remote(resource_config=gpu_config) +async def inference(data): + # Can call CPU endpoint function + clean = await preprocess(data) + # Run inference on clean data + return result +``` + +The runtime wrapper handles service discovery and routing automatically. This allows you to build pipelines that use CPU workers for preprocessing and GPU workers for inference, optimizing costs by using appropriate hardware for each task. + +## Build artifacts + +After `flash build` completes, you'll find these artifacts: + +| Artifact | Description | +|----------|-------------| +| `.flash/.build/` | Temporary build directory (removed unless `--keep-build`) | +| `.flash/archive.tar.gz` | Deployment package | +| `.flash/flash_manifest.json` | Service discovery configuration | + +### Managing bundle size + +Runpod Serverless has a **500MB deployment limit**. Exceeding this limit will cause deployment failures. + +Use `--exclude` to skip packages already in your worker-tetra Docker image: + +```bash +# For GPU deployments (PyTorch pre-installed) +flash build --exclude torch,torchvision,torchaudio +``` + +Which packages to exclude depends on your resource config: + +- **GPU resources**: PyTorch images have `torch`, `torchvision`, and `torchaudio` pre-installed. +- **CPU resources**: Python slim images have no ML frameworks pre-installed. +- **Load-balanced**: Same as above, depends on GPU vs CPU variant. + +## Troubleshooting + +### No @remote functions found + +If the build process can't find your remote functions: + +- Ensure your functions are decorated with `@remote(resource_config)`. +- Check that Python files are not excluded by `.gitignore` or `.flashignore`. +- Verify function decorators have valid syntax. + +### Handler generation failed + +If handler generation fails: + +- Check for syntax errors in your Python files (these will be logged). 
- Verify all imports in your worker modules are available.
- Ensure resource config variables (e.g., `gpu_config`) are defined before functions reference them.
- Use `--keep-build` to inspect generated handler files in `.flash/.build/`.

### Build succeeded but deployment failed

If the build succeeds but deployment fails:

- Verify all function imports work in the deployment environment.
- Check that environment variables required by your functions are available.
- Review the generated `flash_manifest.json` for correct function mappings.

### Dependency installation failed

If dependency installation fails during the build:

- If a package doesn't have pre-built Linux `x86_64` wheels, the build will fail with an error.
- For newer Python versions (3.13+), some packages may require `manylinux_2_27` or higher.
- Ensure you have standard pip installed (`python -m ensurepip --upgrade`) for best compatibility.
- Check PyPI to verify the package supports your Python version on Linux.

### Authentication errors

If you're seeing authentication errors, verify your API key is set correctly:

```bash
echo $RUNPOD_API_KEY  # Should show your key
```

### Import errors in remote functions

Remember to import packages inside remote functions:

```python
@remote(dependencies=["requests"])
def fetch_data(url):
    import requests  # Import here, not at top of file
    return requests.get(url).json()
```

## Performance optimization

To optimize performance:

- Set `workersMin=1` to keep workers warm and avoid cold starts.
- Use `idleTimeout` to balance cost and responsiveness.
- Choose appropriate GPU types for your workload.
- Use `--auto-provision` with `flash run` to eliminate cold-start delays during development.

## Next steps

- [View the resource configuration reference](/flash/resource-configuration) for all available options.
- [Monitor and debug](/flash/monitoring) your deployments.
- [Learn about pricing](/flash/pricing) to optimize costs.
diff --git a/flash/monitoring.mdx b/flash/monitoring.mdx
new file mode 100644
index 00000000..b9f2589f
--- /dev/null
+++ b/flash/monitoring.mdx
@@ -0,0 +1,204 @@
---
title: "Monitoring and debugging"
sidebarTitle: "Monitoring and debugging"
description: "Monitor, debug, and troubleshoot Flash deployments."
---

This page covers how to monitor and debug your Flash deployments, including viewing logs, troubleshooting common issues, and optimizing performance.

## Viewing logs

When running Flash functions, logs are displayed in your terminal. The output includes:

- Endpoint creation and reuse status.
- Job submission and queue status.
- Execution progress.
- Worker information (delay time, execution time).
+ +Example output: + +```text +2025-11-19 12:35:15,109 | INFO | Created endpoint: rb50waqznmn2kg - flash-quickstart-fb +2025-11-19 12:35:15,112 | INFO | URL: https://console.runpod.io/serverless/user/endpoint/rb50waqznmn2kg +2025-11-19 12:35:15,114 | INFO | LiveServerless:rb50waqznmn2kg | API /run +2025-11-19 12:35:15,655 | INFO | LiveServerless:rb50waqznmn2kg | Started Job:b0b341e7-e460-4305-9acd-fc2dfd1bd65c-u2 +2025-11-19 12:35:15,762 | INFO | Job:b0b341e7-e460-4305-9acd-fc2dfd1bd65c-u2 | Status: IN_QUEUE +2025-11-19 12:36:09,983 | INFO | Job:b0b341e7-e460-4305-9acd-fc2dfd1bd65c-u2 | Status: COMPLETED +2025-11-19 12:36:10,068 | INFO | Worker:icmkdgnrmdf8gz | Delay Time: 51842 ms +2025-11-19 12:36:10,068 | INFO | Worker:icmkdgnrmdf8gz | Execution Time: 1533 ms +``` + +### Log levels + +You can control log verbosity using the `LOG_LEVEL` environment variable: + +```bash +LOG_LEVEL=DEBUG python your_script.py +``` + +Available log levels: `DEBUG`, `INFO`, `WARNING`, `ERROR`. + +## Monitoring in the Runpod console + +View detailed metrics and logs in the [Runpod console](https://www.runpod.io/console/serverless): + +1. Navigate to the **Serverless** section. +2. Click on your endpoint to view: + - Active workers and queue depth. + - Request history and job status. + - Worker logs and execution details. + - Metrics (requests, latency, errors). + +### Endpoint metrics + +The console provides metrics including: + +- **Request rate**: Number of requests per minute. +- **Queue depth**: Number of pending requests. +- **Latency**: Average response time. +- **Worker count**: Active and idle workers. +- **Error rate**: Failed requests percentage. + +## Debugging common issues + +### Cold start delays + +If you're experiencing slow initial responses: + +- **Cause**: Workers need time to start, load dependencies, and initialize models. +- **Solutions**: + - Set `workersMin=1` to keep at least one worker warm. + - Use smaller models or optimize model loading. + - Use `--auto-provision` with `flash run` for development. + +```python +config = LiveServerless( + name="always-warm", + workersMin=1, # Keep one worker always running + idleTimeout=30 # Longer idle timeout +) +``` + +### Timeout errors + +If requests are timing out: + +- **Cause**: Execution taking longer than the timeout limit. +- **Solutions**: + - Increase `executionTimeoutMs` in your configuration. + - Optimize your function to run faster. + - Break long operations into smaller chunks. + +```python +config = LiveServerless( + name="long-running", + executionTimeoutMs=600000 # 10 minutes +) +``` + +### Memory errors + +If you're seeing out-of-memory errors: + +- **Cause**: Model or data too large for available GPU/CPU memory. +- **Solutions**: + - Use a larger GPU type (e.g., `GpuGroup.AMPERE_80` for 80GB VRAM). + - Use model quantization or smaller batch sizes. + - Clear GPU memory between operations. + +```python +config = LiveServerless( + name="large-model", + gpus=[GpuGroup.AMPERE_80], # A100 80GB + template=PodTemplate(containerDiskInGb=100) # More disk space +) +``` + +### Dependency errors + +If packages aren't being installed correctly: + +- **Cause**: Missing or incompatible dependencies. +- **Solutions**: + - Verify package names and versions in the `dependencies` list. + - Check that packages have Linux `x86_64` wheels available. + - Import packages inside the function, not at the top of the file. 
+ +```python +@remote( + resource_config=config, + dependencies=["torch==2.0.0", "transformers==4.36.0"] # Pin versions +) +def my_function(data): + import torch # Import inside the function + import transformers + # ... +``` + +### Authentication errors + +If you're seeing API key errors: + +- **Cause**: Missing or invalid Runpod API key. +- **Solutions**: + - Verify your API key is set in the environment. + - Check that the `.env` file is in the correct directory. + - Ensure the API key has the required permissions. + +```bash +# Check if API key is set +echo $RUNPOD_API_KEY + +# Set API key directly +export RUNPOD_API_KEY=your_api_key_here +``` + +## Performance optimization + +### Reducing cold starts + +- Set `workersMin=1` for endpoints that need fast responses. +- Use `idleTimeout` to balance cost and warm worker availability. +- Cache models on network volumes to reduce loading time. + +### Optimizing execution time + +- Profile your functions to identify bottlenecks. +- Use appropriate GPU types for your workload. +- Batch multiple inputs into a single request when possible. +- Use async operations to parallelize independent tasks. + +### Managing costs + +- Set appropriate `workersMax` limits to control scaling. +- Use CPU workers for non-GPU tasks. +- Monitor usage in the console to identify optimization opportunities. +- Use shorter `idleTimeout` for sporadic workloads. + +## Endpoint management + +As you work with Flash, endpoints accumulate in your Runpod account. To manage them: + +1. Go to the [Serverless section](https://www.runpod.io/console/serverless) in the Runpod console. +2. Review your endpoints and delete unused ones. +3. Note that a `flash undeploy` command is in development for easier cleanup. + + + +Endpoints persist until manually deleted through the Runpod console. Regularly clean up unused endpoints to avoid unnecessary charges. + + + +## Getting help + +If you're encountering issues not covered here: + +- Check the [Flash examples repository](https://github.com/runpod/flash-examples) for working examples. +- Review the [tetra-rp GitHub repository](https://github.com/runpod/tetra-rp) for the latest documentation. +- Contact [Runpod support](https://www.runpod.io/contact) for additional assistance. + +## Next steps + +- [View the resource configuration reference](/flash/resource-configuration) for all available options. +- [Learn about pricing](/flash/pricing) to optimize costs. +- [Deploy Flash applications](/flash/deploy-apps) for production. diff --git a/flash/overview.mdx b/flash/overview.mdx new file mode 100644 index 00000000..8b573b53 --- /dev/null +++ b/flash/overview.mdx @@ -0,0 +1,155 @@ +--- +title: "Flash overview" +sidebarTitle: "Overview" +description: "Develop and deploy AI workflows on Runpod Serverless with Python." +--- + +Flash is a Python SDK for developing and deploying AI workflows on Runpod Serverless. You write Python functions locally, and Flash handles infrastructure management, GPU/CPU provisioning, dependency installation, and data transfer automatically. + +Flash provides two ways to run workloads: + +- **Standalone scripts**: Use the `@remote` decorator to run Python functions on Runpod cloud infrastructure. +- **API endpoints**: Build and serve HTTP APIs using FastAPI that compute responses with GPU and CPU Serverless workers. + +You can find prebuilt Flash examples at [runpod/flash-examples](https://github.com/runpod/flash-examples). + +## Why use Flash? 
Flash deploys Python functions to Runpod's Serverless infrastructure without requiring you to manage servers, configure networking, or handle scaling. You write functions, specify your dependencies in the decorator, and Flash installs them automatically when the function runs on remote workers.

You can specify the exact GPU hardware you need, from RTX 4090s to A100 80GB GPUs, for AI inference, training, and other compute-intensive tasks. Functions scale automatically based on demand and can run in parallel across multiple workers.

Flash uses Serverless pricing with per-second billing. You're only charged for actual compute time—there are no costs when your code isn't running.

Follow the [quickstart](/flash/quickstart) to create your first Flash function in minutes.

## Install Flash

Install Flash with `pip`:

```bash
pip install tetra_rp
```

Then configure your Runpod API key as an environment variable.

## Concepts

### Remote functions

The `@remote` decorator marks functions for execution on Runpod's infrastructure. Code inside the decorated function runs remotely on a Serverless worker, while code outside the function runs locally on your machine.

```python
@remote(resource_config=config, dependencies=["pandas"])
def process_data(data):
    # This code runs remotely on Runpod
    import pandas as pd
    df = pd.DataFrame(data)
    return df.describe().to_dict()

async def main():
    # This code runs locally
    result = await process_data(my_data)
```

### Resource configuration

Flash provides fine-grained control over hardware allocation through configuration objects. You can configure GPU types, worker counts, idle timeouts, environment variables, and more.

```python
from tetra_rp import LiveServerless, GpuGroup

gpu_config = LiveServerless(
    name="ml-inference",
    gpus=[GpuGroup.AMPERE_80],  # A100 80GB
    workersMax=5
)
```

### Dependency management

Specify Python packages in the decorator, and Flash installs them automatically on the remote worker:

```python
@remote(
    resource_config=gpu_config,
    dependencies=["transformers==4.36.0", "torch", "pillow"]
)
def generate_image(prompt):
    # Import inside the function
    from transformers import pipeline
    # ...
```

Imports should be placed inside the function body because they need to happen on the remote worker, not in your local environment.

### Parallel execution

Run multiple remote functions concurrently using Python's async capabilities:

```python
results = await asyncio.gather(
    process_item(item1),
    process_item(item2),
    process_item(item3)
)
```

## How it works

Flash orchestrates workflow execution through a multi-step process:

1. **Function identification**: The `@remote` decorator marks functions for remote execution, enabling Flash to distinguish between local and remote operations.
2. **Dependency analysis**: Flash automatically analyzes function dependencies to construct an optimal execution order.
3. **Resource provisioning and execution**: For each remote function, Flash:
   - Dynamically provisions endpoint and worker resources on Runpod's infrastructure.
   - Serializes and securely transfers input data to the remote worker.
   - Executes the function on the remote infrastructure with the specified GPU or CPU resources.
   - Returns results to your local environment.
4. **Data orchestration**: Results flow seamlessly between functions according to your local Python code structure.
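The sketch below shows the whole flow in one minimal script. It mirrors the quickstart example; the endpoint name, GPU group, and function body are illustrative choices rather than required values:

```python
import asyncio
from tetra_rp import remote, LiveServerless, GpuGroup

# Steps 1-2: declare resources and mark a function for remote execution.
config = LiveServerless(name="overview-demo", gpus=[GpuGroup.AMPERE_24])

@remote(resource_config=config, dependencies=["torch"])
def check_gpu():
    # Runs on a Serverless worker, so dependencies are imported here.
    import torch
    return torch.cuda.get_device_name(0)

# Steps 3-4: provision, execute remotely, and receive the result locally.
async def main():
    print(await check_gpu())

if __name__ == "__main__":
    asyncio.run(main())
```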
## Use cases

Flash is well-suited for a range of AI and data processing workloads:

- **Multi-modal AI pipelines**: Orchestrate unified workflows combining text, image, and audio models with GPU acceleration.
- **Distributed model training**: Scale training operations across multiple GPU workers for faster model development.
- **AI research experimentation**: Rapidly prototype and test complex model combinations without infrastructure overhead.
- **Production inference systems**: Deploy multi-stage inference pipelines for real-world applications.
- **Data processing workflows**: Process large datasets using CPU workers for general computation and GPU workers for accelerated tasks.
- **Hybrid GPU/CPU workflows**: Optimize cost and performance by combining CPU preprocessing with GPU inference.

## Development workflow

A typical Flash development workflow looks like this:

1. Write Python functions with the `@remote` decorator.
2. Specify resource requirements and dependencies in the decorator.
3. Run your script locally—Flash handles remote execution automatically.
4. For API deployments, use `flash init` to create a project and `flash run` to start your server.

## Limitations

- Serverless deployments using Flash are currently restricted to the `EU-RO-1` datacenter.
- Flash is designed primarily for local development and live-testing workflows.
- Endpoints created by Flash persist until manually deleted through the Runpod console. A `flash undeploy` command is in development.
- Be aware of your account's maximum worker capacity limits. Flash can rapidly scale workers across multiple endpoints, and you may hit capacity constraints. Contact Runpod support to increase your account's capacity allocation if needed.

## Next steps

- [Quickstart](/flash/quickstart): Get started with your first Flash function.
- [Create remote functions](/flash/remote-functions): Learn about resource configuration, dependencies, and parallel execution.
- [Create API endpoints](/flash/api-endpoints): Build HTTP APIs with FastAPI and Flash.
- [Configuration reference](/flash/resource-configuration): Complete reference for resource configuration options.
diff --git a/flash/pricing.mdx b/flash/pricing.mdx
new file mode 100644
index 00000000..f2c05944
--- /dev/null
+++ b/flash/pricing.mdx
@@ -0,0 +1,108 @@
---
title: "Pricing"
sidebarTitle: "Pricing"
description: "Understand Flash pricing and optimize your costs."
---

Flash follows the same pricing model as [Runpod Serverless](/serverless/pricing). You pay per second of compute time, with no charges when your code isn't running. Pricing depends on the GPU or CPU type you configure for your endpoints.

## How pricing works

You're billed from when a worker starts until it completes your request, plus any idle time before scaling down. If a worker is already warm, you skip the cold start and only pay for execution time.

### Compute cost breakdown

Flash workers incur charges during these periods:

1. **Start time**: The time required to initialize a worker and load models into GPU memory. This includes starting the container, installing dependencies, and preparing the runtime environment.
2. **Execution time**: The time spent processing your request (running your `@remote` decorated function).
3. **Idle time**: The period a worker remains active after completing a request, waiting for additional requests before scaling down.

### Pricing by resource type

Flash supports both GPU and CPU workers. Pricing varies based on the hardware type:

- **GPU workers**: Use `LiveServerless` or `ServerlessEndpoint` with GPU configurations.
Pricing depends on the GPU type (e.g., RTX 4090, A100 80GB). +- **CPU workers**: Use `LiveServerless` or `CpuServerlessEndpoint` with CPU configurations. Pricing depends on the CPU instance type. + +See the [Serverless pricing page](/serverless/pricing) for current rates by GPU and CPU type. + +## How to estimate and optimize costs + +To estimate costs for your Flash workloads, consider: + +- How long each function takes to execute. +- How many concurrent workers you need (`workersMax` setting). +- Which GPU or CPU types you'll use. +- Your idle timeout configuration (`idleTimeout` setting). + +### Cost optimization strategies + +#### Choose appropriate hardware + +Select the smallest GPU or CPU that meets your performance requirements. For example, if your workload fits in 24GB of VRAM, use `GpuGroup.ADA_24` or `GpuGroup.AMPERE_24` instead of larger GPUs. + +```python +# Cost-effective configuration for workloads that fit in 24GB VRAM +config = LiveServerless( + name="cost-optimized", + gpus=[GpuGroup.ADA_24, GpuGroup.AMPERE_24], # RTX 4090, L4, A5000, 3090 +) +``` + +#### Configure idle timeouts + +Balance responsiveness and cost by adjusting the `idleTimeout` parameter. Shorter timeouts reduce idle costs but increase cold starts for sporadic traffic. + +```python +# Lower idle timeout for cost savings (more cold starts) +config = LiveServerless( + name="low-idle", + idleTimeout=5, # 5 seconds (default) +) + +# Higher idle timeout for responsiveness (higher idle costs) +config = LiveServerless( + name="responsive", + idleTimeout=30, # 30 seconds +) +``` + +#### Use CPU workers for non-GPU tasks + +For data preprocessing, postprocessing, or other tasks that don't require GPU acceleration, use CPU workers instead of GPU workers. + +```python +from tetra_rp import LiveServerless, CpuInstanceType + +# CPU configuration for non-GPU tasks +cpu_config = LiveServerless( + name="data-processor", + instanceIds=[CpuInstanceType.CPU5C_2_4], # 2 vCPU, 4GB RAM +) +``` + +#### Limit maximum workers + +Set `workersMax` to prevent runaway scaling and unexpected costs: + +```python +config = LiveServerless( + name="controlled-scaling", + workersMax=3, # Limit to 3 concurrent workers +) +``` + +### Monitoring costs + +Monitor your usage in the [Runpod console](https://www.runpod.io/console/serverless) to track: + +- Total compute time across endpoints. +- Worker utilization and idle time. +- Cost breakdown by endpoint. + +## Next steps + +- [Create remote functions](/flash/remote-functions) with optimized resource configurations. +- [View Serverless pricing details](/serverless/pricing) for current rates. +- [Configure resources](/flash/resource-configuration) for your workloads. diff --git a/flash/quickstart.mdx b/flash/quickstart.mdx new file mode 100644 index 00000000..3ee79bd8 --- /dev/null +++ b/flash/quickstart.mdx @@ -0,0 +1,324 @@ +--- +title: "Get started with Flash" +sidebarTitle: "Quickstart" +description: "Set up your development environment and run your first GPU workload with Flash." +--- + +This tutorial shows you how to set up Flash and run a GPU workload on Runpod Serverless. You'll create a remote function that performs matrix operations on a GPU and returns the results to your local machine. + +## What you'll learn + +In this tutorial you'll learn how to: + +- Set up your development environment for Flash. +- Configure a Serverless endpoint using a `LiveServerless` object. +- Create and define remote functions with the `@remote` decorator. 
+- Deploy a GPU-based workload using Runpod resources. +- Pass data between your local environment and remote workers. +- Run multiple operations in parallel. + +## Requirements + +- You've [created a Runpod account](/get-started/manage-accounts). +- You've [created a Runpod API key](/get-started/api-keys). +- You've installed [Python 3.9 or greater](https://www.python.org/downloads/). + +## Step 1: Install Flash + +Use `pip` to install Flash: + +```bash +pip install tetra_rp +``` + +## Step 2: Add your API key to the environment + +Add your Runpod API key to your development environment before using Flash to run workloads. + +Run this command to create a `.env` file, replacing `YOUR_API_KEY` with your Runpod API key: + +```bash +touch .env && echo "RUNPOD_API_KEY=YOUR_API_KEY" > .env +``` + + + +You can create this in your project's root directory or in the `/examples` folder. Make sure your `.env` file is in the same folder as the Python file you create in the next step. + + + +## Step 3: Create your project file + +Create a new file called `matrix_operations.py` in the same directory as your `.env` file: + +```bash +touch matrix_operations.py +``` + +Open this file in your code editor. The following steps walk through building a matrix multiplication example that demonstrates Flash's remote execution and parallel processing capabilities. + +## Step 4: Add imports and load the .env file + +Add the necessary import statements: + +```python +import asyncio +from dotenv import load_dotenv +from tetra_rp import remote, LiveServerless, GpuGroup + +# Load environment variables from .env file +load_dotenv() +``` + +This imports: + +- `asyncio`: Python's asynchronous programming library, which Flash uses for non-blocking execution. +- `dotenv`: Loads environment variables from your `.env` file, including your Runpod API key. +- `remote` and `LiveServerless`: The core Flash components for defining remote functions and their resource requirements. + +`load_dotenv()` reads your API key from the `.env` file and makes it available to Flash. + +## Step 5: Add Serverless endpoint configuration + +Define the Serverless endpoint configuration for your Flash workload: + +```python +# Configuration for a Serverless endpoint using GPU workers +gpu_config = LiveServerless( + gpus=[GpuGroup.AMPERE_24, GpuGroup.ADA_24], # Use any 24GB GPU + workersMax=3, + name="tetra_gpu", +) +``` + +This `LiveServerless` object defines: + +- `gpus=[GpuGroup.AMPERE_24, GpuGroup.ADA_24]`: The GPUs that can be used by workers on this endpoint. This restricts workers to using any 24 GB GPU (L4, A5000, 3090, or 4090). See [GPU pools](/references/gpu-types#gpu-pools) for available GPU pool IDs. Removing this parameter allows the endpoint to use any available GPUs. +- `workersMax=3`: The maximum number of worker instances. +- `name="tetra_gpu"`: The name of the endpoint that will be created/used in the Runpod console. + +If you run a Flash function that uses an identical `LiveServerless` configuration to a prior run, Runpod reuses your existing endpoint rather than creating a new one. However, if any configuration values have changed (not just the `name` parameter), a new endpoint will be created. 
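For example, re-running your script with the exact configuration above reuses the `tetra_gpu` endpoint, while any changed value provisions a new endpoint on the next run. The `workersMax=5` variant below is a hypothetical illustration:

```python
# Identical to the previous run: Flash reuses the existing "tetra_gpu" endpoint.
gpu_config = LiveServerless(
    gpus=[GpuGroup.AMPERE_24, GpuGroup.ADA_24],
    workersMax=3,
    name="tetra_gpu",
)

# Changing any value (here, workersMax) means a new endpoint is created.
gpu_config = LiveServerless(
    gpus=[GpuGroup.AMPERE_24, GpuGroup.ADA_24],
    workersMax=5,  # Changed from 3
    name="tetra_gpu",
)
```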
+ +## Step 6: Define your remote function + +Define the function that will run on the GPU worker: + +```python +@remote( + resource_config=gpu_config, + dependencies=["numpy", "torch"] +) +def tetra_matrix_operations(size): + """Perform large matrix operations using NumPy and check GPU availability.""" + import numpy as np + import torch + + # Get GPU count and name + device_count = torch.cuda.device_count() + device_name = torch.cuda.get_device_name(0) + + # Create large random matrices + A = np.random.rand(size, size) + B = np.random.rand(size, size) + + # Perform matrix multiplication + C = np.dot(A, B) + + return { + "matrix_size": size, + "result_shape": C.shape, + "result_mean": float(np.mean(C)), + "result_std": float(np.std(C)), + "device_count": device_count, + "device_name": device_name + } +``` + +This code demonstrates several key concepts: + +- `@remote`: The decorator that marks the function for remote execution on Runpod's infrastructure. +- `resource_config=gpu_config`: The function runs using the GPU configuration defined earlier. +- `dependencies=["numpy", "torch"]`: Python packages that must be installed on the remote worker. + +The `tetra_matrix_operations` function: + +- Gets GPU details using PyTorch's CUDA utilities. +- Creates two large random matrices using NumPy. +- Performs matrix multiplication. +- Returns statistics about the result and information about the GPU. + +Notice that `numpy` and `torch` are imported inside the function, not at the top of the file. These imports need to happen on the remote worker, not in your local environment. + +## Step 7: Add the main function + +Add a `main` function to execute your GPU workload: + +```python +async def main(): + # Run the GPU matrix operations + print("Starting large matrix operations on GPU...") + result = await tetra_matrix_operations(1000) + + # Print the results + print("\nMatrix operations results:") + print(f"Matrix size: {result['matrix_size']}x{result['matrix_size']}") + print(f"Result shape: {result['result_shape']}") + print(f"Result mean: {result['result_mean']:.4f}") + print(f"Result standard deviation: {result['result_std']:.4f}") + + # Print GPU information + print("\nGPU Information:") + print(f"GPU device count: {result['device_count']}") + print(f"GPU device name: {result['device_name']}") + +if __name__ == "__main__": + asyncio.run(main()) +``` + +The `main` function: + +- Calls the remote function with `await`, which runs it asynchronously on Runpod's infrastructure. +- Prints the results of the matrix operations. +- Displays information about the GPU that was used. + +`asyncio.run(main())` is Python's standard way to execute an asynchronous `main` function from synchronous code. + +All code outside of the `@remote` decorated function runs on your local machine. The `main` function acts as a bridge between your local environment and Runpod's cloud infrastructure, allowing you to send input data to remote functions, wait for remote execution to complete without blocking your local process, and process returned results locally. + +The `await` keyword pauses execution of the `main` function until the remote operation completes, but doesn't block the entire Python process. + +## Step 8: Run your GPU example + +Run the example: + +```bash +python matrix_operations.py +``` + +You should see output similar to this: + +```text +Starting large matrix operations on GPU... +Resource LiveServerless_33e1fa59c64b611c66c5a778e120c522 already exists, reusing. 
Registering RunPod endpoint: server_LiveServerless_33e1fa59c64b611c66c5a778e120c522 at https://api.runpod.ai/xvf32dan8rcilp
Initialized RunPod stub for endpoint: https://api.runpod.ai/xvf32dan8rcilp (ID: xvf32dan8rcilp)
Executing function on RunPod endpoint ID: xvf32dan8rcilp
Initial job status: IN_QUEUE
Job completed, output received

Matrix operations results:
Matrix size: 1000x1000
Result shape: (1000, 1000)
Result mean: 249.8286
Result standard deviation: 6.8704

GPU Information:
GPU device count: 1
GPU device name: NVIDIA GeForce RTX 4090
```

If you're having trouble running your code due to authentication issues:

1. Verify your `.env` file is in the same directory as your `matrix_operations.py` file.
2. Check that the API key in your `.env` file is correct and properly formatted.

Alternatively, you can set the API key directly in your terminal:

On macOS or Linux:

```bash
export RUNPOD_API_KEY=[YOUR_API_KEY]
```

On Windows:

```bash
set RUNPOD_API_KEY=[YOUR_API_KEY]
```

## Step 9: Understand what's happening

When you run this script:

1. Flash reads your GPU resource configuration and provisions a worker on Runpod.
2. It installs the required dependencies (NumPy and PyTorch) on the worker.
3. Your `tetra_matrix_operations` function runs on the remote worker.
4. The function creates and multiplies large matrices, then calculates statistics.
5. Your local `main` function receives these results and displays them in your terminal.

## Step 10: Run multiple operations in parallel

Flash makes it easy to run multiple remote operations in parallel.

Replace your `main` function with this code:

```python
async def main():
    # Run multiple matrix operations in parallel
    print("Starting large matrix operations on GPU...")

    # Run all matrix operations in parallel
    results = await asyncio.gather(
        tetra_matrix_operations(500),
        tetra_matrix_operations(1000),
        tetra_matrix_operations(2000)
    )

    print("\nMatrix operations results:")

    # Print the results for each matrix size
    for result in results:
        print(f"\nMatrix size: {result['matrix_size']}x{result['matrix_size']}")
        print(f"Result shape: {result['result_shape']}")
        print(f"Result mean: {result['result_mean']:.4f}")
        print(f"Result standard deviation: {result['result_std']:.4f}")

if __name__ == "__main__":
    asyncio.run(main())
```

This updated `main` function demonstrates Flash's ability to run multiple operations in parallel using `asyncio.gather()`. Instead of running one matrix operation at a time, you're launching three operations with different matrix sizes (500, 1000, and 2000) simultaneously. This parallel execution significantly improves efficiency when you have multiple independent tasks.

Run the example again:

```bash
python matrix_operations.py
```

You should see results for all three matrix sizes after the operations complete:

```text
Initial job status: IN_QUEUE
Initial job status: IN_QUEUE
Initial job status: IN_QUEUE
Job completed, output received
Job completed, output received
Job completed, output received

Matrix size: 500x500
Result shape: (500, 500)
Result mean: 125.3097
Result standard deviation: 5.0425

Matrix size: 1000x1000
Result shape: (1000, 1000)
Result mean: 249.9442
Result standard deviation: 7.1072

Matrix size: 2000x2000
Result shape: (2000, 2000)
Result mean: 500.1321
Result standard deviation: 9.8879
```

## Next steps

You've successfully used Flash to run a GPU workload on Runpod.
Now you can:

- [Create more complex remote functions](/flash/remote-functions) with custom dependencies and resource configurations.
- [Build API endpoints](/flash/api-endpoints) using FastAPI.
- [Deploy Flash applications](/flash/deploy-apps) for production use.
- Explore more examples on the [runpod/flash-examples](https://github.com/runpod/flash-examples) GitHub repository.
diff --git a/flash/remote-functions.mdx b/flash/remote-functions.mdx
new file mode 100644
index 00000000..f3f2c4f5
--- /dev/null
+++ b/flash/remote-functions.mdx
@@ -0,0 +1,261 @@
---
title: "Create remote functions"
sidebarTitle: "Create remote functions"
description: "Learn how to create and configure remote functions with Flash."
---

Remote functions are the core building blocks of Flash. The `@remote` decorator marks Python functions for execution on Runpod's Serverless infrastructure, handling resource provisioning, dependency installation, and data transfer automatically.

## Resource configuration

Every remote function requires a resource configuration that specifies the compute resources to use. Flash provides several configuration classes for different use cases.

### LiveServerless

`LiveServerless` is the primary configuration class for Flash. It supports full remote code execution, allowing you to run arbitrary Python functions on Runpod's infrastructure.

```python
from tetra_rp import remote, LiveServerless, GpuGroup

gpu_config = LiveServerless(
    name="ml-inference",
    gpus=[GpuGroup.AMPERE_80],  # A100 80GB
    workersMax=5,
    idleTimeout=10
)

@remote(resource_config=gpu_config, dependencies=["torch"])
def run_inference(data):
    import torch
    # Your inference code here
    return result
```

Common configuration options:

| Parameter | Description | Default |
|-----------|-------------|---------|
| `name` | Name for your endpoint (required) | - |
| `gpus` | GPU pool IDs that can be used | `[GpuGroup.ANY]` |
| `workersMax` | Maximum number of workers | 3 |
| `workersMin` | Minimum number of workers | 0 |
| `idleTimeout` | Seconds before scaling down | 5 |

See the [resource configuration reference](/flash/resource-configuration) for all available options.

### CPU configuration

For CPU-only workloads, specify `instanceIds` instead of `gpus`:

```python
from tetra_rp import remote, LiveServerless, CpuInstanceType

cpu_config = LiveServerless(
    name="data-processor",
    instanceIds=[CpuInstanceType.CPU5C_4_8],  # 4 vCPU, 8GB RAM
    workersMax=3
)

@remote(resource_config=cpu_config, dependencies=["pandas"])
def process_data(data):
    import pandas as pd
    df = pd.DataFrame(data)
    return df.describe().to_dict()
```

## Dependency management

Specify Python packages in the `dependencies` parameter of the `@remote` decorator. Flash installs these packages on the remote worker before executing your function.

```python
@remote(
    resource_config=config,
    dependencies=["transformers==4.36.0", "torch", "pillow"]
)
def generate_image(prompt):
    from transformers import pipeline
    import torch
    from PIL import Image
    # Your code here
```

### Important notes about dependencies

**Import inside the function**: Always import packages inside the decorated function body, not at the top of your file. These imports need to happen on the remote worker, not in your local environment.
+ +```python +# Correct - imports inside the function +@remote(resource_config=config, dependencies=["numpy"]) +def compute(data): + import numpy as np # Import here + return np.sum(data) + +# Incorrect - imports at top of file won't work +import numpy as np # This import happens locally, not on the worker + +@remote(resource_config=config, dependencies=["numpy"]) +def compute(data): + return np.sum(data) # numpy not available on worker +``` + +**Version pinning**: You can pin specific versions using standard pip syntax: + +```python +dependencies=["transformers==4.36.0", "torch>=2.0.0"] +``` + +**Pre-installed packages**: Some packages (like PyTorch) are pre-installed on GPU workers. Including them in dependencies ensures the correct version is available. + +## Parallel execution + +Flash functions are asynchronous by default. Use Python's `asyncio` to run multiple functions in parallel: + +```python +import asyncio + +async def main(): + # Run three functions in parallel + results = await asyncio.gather( + process_item(item1), + process_item(item2), + process_item(item3) + ) + return results +``` + +This is particularly useful for: + +- Batch processing multiple inputs. +- Running different models on the same data. +- Parallelizing independent pipeline stages. + +### Example: Parallel batch processing + +```python +import asyncio +from tetra_rp import remote, LiveServerless, GpuGroup + +config = LiveServerless( + name="batch-processor", + gpus=[GpuGroup.ADA_24], + workersMax=5 # Allow up to 5 parallel workers +) + +@remote(resource_config=config, dependencies=["torch"]) +def process_batch(batch_id, data): + import torch + # Process batch + return {"batch_id": batch_id, "result": len(data)} + +async def main(): + batches = [ + (1, [1, 2, 3]), + (2, [4, 5, 6]), + (3, [7, 8, 9]) + ] + + # Process all batches in parallel + results = await asyncio.gather(*[ + process_batch(batch_id, data) + for batch_id, data in batches + ]) + + print(results) + +if __name__ == "__main__": + asyncio.run(main()) +``` + +## Custom Docker images + +For specialized environments that require a custom Docker image, use `ServerlessEndpoint` or `CpuServerlessEndpoint` instead of `LiveServerless`: + +```python +from tetra_rp import ServerlessEndpoint, GpuGroup + +custom_gpu = ServerlessEndpoint( + name="custom-ml-env", + imageName="pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime", + gpus=[GpuGroup.AMPERE_80] +) +``` + + + +Unlike `LiveServerless`, `ServerlessEndpoint` and `CpuServerlessEndpoint` only support dictionary payloads in the form of `{"input": {...}}` (similar to a traditional [Serverless endpoint request](/serverless/endpoints/send-requests)). They cannot execute arbitrary Python functions remotely. + + + +Use custom Docker images when you need: + +- Pre-installed system-level dependencies. +- Specific CUDA or cuDNN versions. +- Custom base images with large models baked in. + +## Using persistent storage + +Attach [network volumes](/storage/network-volumes) for persistent storage across workers and endpoints. This is useful for sharing large models or datasets between workers without downloading them each time. + +```python +config = LiveServerless( + name="model-server", + networkVolumeId="vol_abc123", # Your network volume ID + template=PodTemplate(containerDiskInGb=100) +) +``` + +To find your network volume ID: + +1. Go to the [Storage page](https://www.runpod.io/console/storage) in the Runpod console. +2. Click on your network volume. +3. Copy the volume ID from the URL or volume details. 
### Example: Using a network volume for model storage

```python
from tetra_rp import remote, LiveServerless, GpuGroup, PodTemplate

config = LiveServerless(
    name="model-inference",
    gpus=[GpuGroup.AMPERE_80],
    networkVolumeId="vol_abc123",
    template=PodTemplate(containerDiskInGb=100)
)

@remote(resource_config=config, dependencies=["torch", "transformers"])
def run_inference(prompt):
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load model from network volume
    model_path = "/runpod-volume/models/llama-7b"
    model = AutoModelForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    # Run inference
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs)
    return tokenizer.decode(outputs[0])
```

## Environment variables

Pass environment variables to remote functions using the `env` parameter:

```python
config = LiveServerless(
    name="api-worker",
    env={"HF_TOKEN": "your_token", "MODEL_ID": "gpt2"}
)
```

Environment variables are excluded from configuration hashing. Changing environment values won't trigger endpoint recreation, which allows different processes to load environment variables from `.env` files without causing false drift detection.

## Next steps

- [Create API endpoints](/flash/api-endpoints) using FastAPI.
- [Deploy Flash applications](/flash/deploy-apps) for production.
- [View the resource configuration reference](/flash/resource-configuration) for all available options.
diff --git a/flash/resource-configuration.mdx b/flash/resource-configuration.mdx
new file mode 100644
index 00000000..6241ea4e
--- /dev/null
+++ b/flash/resource-configuration.mdx
@@ -0,0 +1,268 @@
---
title: "Resource configuration reference"
sidebarTitle: "Configuration reference"
description: "Complete reference for Flash resource configuration options."
---

Flash provides several resource configuration classes for different use cases. This reference covers all available parameters and options.

## LiveServerless

`LiveServerless` is the primary configuration class for Flash. It supports full remote code execution, allowing you to run arbitrary Python functions on Runpod's infrastructure.
```python
from tetra_rp import LiveServerless, GpuGroup, CpuInstanceType, PodTemplate

gpu_config = LiveServerless(
    name="ml-inference",
    gpus=[GpuGroup.AMPERE_80],
    workersMax=5,
    idleTimeout=10,
    template=PodTemplate(containerDiskInGb=100)
)
```

### Parameters

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `name` | `string` | Name for your endpoint (required) | - |
| `gpus` | `list[GpuGroup]` | GPU pool IDs that can be used by workers | `[GpuGroup.ANY]` |
| `gpuCount` | `int` | Number of GPUs per worker | 1 |
| `instanceIds` | `list[CpuInstanceType]` | CPU instance types (forces CPU endpoint) | `None` |
| `workersMin` | `int` | Minimum number of workers | 0 |
| `workersMax` | `int` | Maximum number of workers | 3 |
| `idleTimeout` | `int` | Seconds before scaling down | 5 |
| `env` | `dict` | Environment variables | `None` |
| `networkVolumeId` | `string` | Persistent storage volume ID | `None` |
| `executionTimeoutMs` | `int` | Max execution time in milliseconds | 0 (no limit) |
| `scalerType` | `string` | Scaling strategy | `QUEUE_DELAY` |
| `scalerValue` | `int` | Scaling parameter value | 4 |
| `locations` | `string` | Preferred datacenter locations | `None` |
| `template` | `PodTemplate` | Pod template overrides | `None` |

### GPU configuration example

```python
from tetra_rp import LiveServerless, GpuGroup, PodTemplate

config = LiveServerless(
    name="gpu-inference",
    gpus=[GpuGroup.AMPERE_80],  # A100 80GB
    gpuCount=1,
    workersMin=0,
    workersMax=5,
    idleTimeout=10,
    template=PodTemplate(containerDiskInGb=100),
    env={"MODEL_ID": "llama-7b"}
)
```

### CPU configuration example

```python
from tetra_rp import LiveServerless, CpuInstanceType

config = LiveServerless(
    name="cpu-processor",
    instanceIds=[CpuInstanceType.CPU5C_4_8],  # 4 vCPU, 8GB RAM
    workersMax=3,
    idleTimeout=5
)
```

## ServerlessEndpoint

`ServerlessEndpoint` is for GPU workloads that require custom Docker images. Unlike `LiveServerless`, it only supports dictionary payloads and cannot execute arbitrary Python functions.

```python
from tetra_rp import ServerlessEndpoint, GpuGroup

config = ServerlessEndpoint(
    name="custom-ml-env",
    imageName="pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime",
    gpus=[GpuGroup.AMPERE_80]
)
```

### Parameters

All parameters from `LiveServerless` are available, plus:

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `imageName` | `string` | Custom Docker image | - |

### Limitations

- Only supports dictionary payloads in the form of `{"input": {...}}`.
- Cannot execute arbitrary Python functions remotely.
- Requires a custom Docker image with a handler that processes the input dictionary.

### Example

```python
from tetra_rp import ServerlessEndpoint, GpuGroup

# Custom image with pre-installed models
config = ServerlessEndpoint(
    name="stable-diffusion",
    imageName="my-registry/stable-diffusion:v1.0",
    gpus=[GpuGroup.AMPERE_24],
    workersMax=3
)

# Send requests as dictionaries
result = await config.run({
    "input": {
        "prompt": "A beautiful sunset over mountains",
        "width": 512,
        "height": 512
    }
})
```

## CpuServerlessEndpoint

`CpuServerlessEndpoint` is for CPU workloads that require custom Docker images. Like `ServerlessEndpoint`, it only supports dictionary payloads.
```python
from tetra_rp import CpuServerlessEndpoint, CpuInstanceType

config = CpuServerlessEndpoint(
    name="data-processor",
    imageName="python:3.11-slim",
    instanceIds=[CpuInstanceType.CPU5C_4_8]
)
```

### Parameters

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `name` | `string` | Name for your endpoint (required) | - |
| `imageName` | `string` | Custom Docker image | - |
| `instanceIds` | `list[CpuInstanceType]` | CPU instance types | - |
| `workersMin` | `int` | Minimum number of workers | 0 |
| `workersMax` | `int` | Maximum number of workers | 3 |
| `idleTimeout` | `int` | Seconds before scaling down | 5 |
| `env` | `dict` | Environment variables | `None` |
| `networkVolumeId` | `string` | Persistent storage volume ID | `None` |
| `executionTimeoutMs` | `int` | Max execution time in milliseconds | 0 (no limit) |

## Resource class comparison

| Feature | LiveServerless | ServerlessEndpoint | CpuServerlessEndpoint |
|---------|----------------|--------------------|-----------------------|
| Remote code execution | ✅ Full Python function execution | ❌ Dictionary payload only | ❌ Dictionary payload only |
| Custom Docker images | ❌ Fixed optimized images | ✅ Any Docker image | ✅ Any Docker image |
| Use case | Dynamic remote functions | Traditional API endpoints | Traditional CPU endpoints |
| Function returns | Any Python object | Dictionary only | Dictionary only |
| `@remote` decorator | Full functionality | Limited to payload passing | Limited to payload passing |

## Available GPU types

The `GpuGroup` enum provides access to GPU pools. Some common options:

| GpuGroup | Description | VRAM |
|----------|-------------|------|
| `GpuGroup.ANY` | Any available GPU (default) | Varies |
| `GpuGroup.ADA_24` | RTX 4090 | 24GB |
| `GpuGroup.AMPERE_24` | RTX A5000, L4, RTX 3090 | 24GB |
| `GpuGroup.AMPERE_48` | A40, RTX A6000 | 48GB |
| `GpuGroup.AMPERE_80` | A100 80GB | 80GB |

See [GPU types](/references/gpu-types#gpu-pools) for the complete list of available GPU pools.
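You can also list multiple pools in a single configuration so workers run on whichever pool has capacity. For example, mirroring the quickstart, this sketch accepts any 24GB-class GPU (the endpoint name is illustrative):

```python
from tetra_rp import LiveServerless, GpuGroup

# Workers may be scheduled on any GPU from either 24GB pool.
config = LiveServerless(
    name="any-24gb-gpu",
    gpus=[GpuGroup.AMPERE_24, GpuGroup.ADA_24],
)
```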
+ +## Available CPU instance types + +The `CpuInstanceType` enum provides access to CPU configurations: + +### 3rd generation general purpose + +| CpuInstanceType | ID | vCPU | RAM | +|-----------------|-----|------|-----| +| `CPU3G_1_4` | cpu3g-1-4 | 1 | 4GB | +| `CPU3G_2_8` | cpu3g-2-8 | 2 | 8GB | +| `CPU3G_4_16` | cpu3g-4-16 | 4 | 16GB | +| `CPU3G_8_32` | cpu3g-8-32 | 8 | 32GB | + +### 3rd generation compute-optimized + +| CpuInstanceType | ID | vCPU | RAM | +|-----------------|-----|------|-----| +| `CPU3C_1_2` | cpu3c-1-2 | 1 | 2GB | +| `CPU3C_2_4` | cpu3c-2-4 | 2 | 4GB | +| `CPU3C_4_8` | cpu3c-4-8 | 4 | 8GB | +| `CPU3C_8_16` | cpu3c-8-16 | 8 | 16GB | + +### 5th generation compute-optimized + +| CpuInstanceType | ID | vCPU | RAM | +|-----------------|-----|------|-----| +| `CPU5C_1_2` | cpu5c-1-2 | 1 | 2GB | +| `CPU5C_2_4` | cpu5c-2-4 | 2 | 4GB | +| `CPU5C_4_8` | cpu5c-4-8 | 4 | 8GB | +| `CPU5C_8_16` | cpu5c-8-16 | 8 | 16GB | + +## PodTemplate + +Use `PodTemplate` to configure additional pod settings: + +```python +from tetra_rp import LiveServerless, PodTemplate + +config = LiveServerless( + name="custom-template", + template=PodTemplate( + containerDiskInGb=100, + env=[{"key": "PYTHONPATH", "value": "/workspace"}] + ) +) +``` + +### Parameters + +| Parameter | Type | Description | Default | +|-----------|------|-------------|---------| +| `containerDiskInGb` | `int` | Container disk size in GB | 20 | +| `env` | `list[dict]` | Environment variables as key-value pairs | `None` | + +## Environment variables + +Environment variables can be set in two ways: + +### Using the `env` parameter + +```python +config = LiveServerless( + name="api-worker", + env={"HF_TOKEN": "your_token", "MODEL_ID": "gpt2"} +) +``` + +### Using PodTemplate + +```python +config = LiveServerless( + name="api-worker", + template=PodTemplate( + env=[ + {"key": "HF_TOKEN", "value": "your_token"}, + {"key": "MODEL_ID", "value": "gpt2"} + ] + ) +) +``` + + + +Environment variables are excluded from configuration hashing. Changing environment values won't trigger endpoint recreation, which allows different processes to load environment variables from `.env` files without causing false drift detection. Only structural changes (like GPU type, image, or template modifications) trigger endpoint updates. + + + +## Next steps + +- [Create remote functions](/flash/remote-functions) using these configurations. +- [Deploy Flash applications](/flash/deploy-apps) for production. +- [Learn about pricing](/flash/pricing) to optimize costs. From 60d135a20ea8f1671b51cce66b449e04fa7f4a37 Mon Sep 17 00:00:00 2001 From: Mo King Date: Thu, 22 Jan 2026 10:59:11 -0500 Subject: [PATCH 02/19] Update flash overview --- flash/api-endpoints.mdx | 7 +-- flash/deploy-apps.mdx | 37 ++++++++------- flash/monitoring.mdx | 19 ++------ flash/overview.mdx | 80 +++++++++++++++++++++++--------- flash/pricing.mdx | 1 + flash/quickstart.mdx | 3 +- flash/remote-functions.mdx | 1 + flash/resource-configuration.mdx | 1 + 8 files changed, 91 insertions(+), 58 deletions(-) diff --git a/flash/api-endpoints.mdx b/flash/api-endpoints.mdx index 9d669b3c..b8f04631 100644 --- a/flash/api-endpoints.mdx +++ b/flash/api-endpoints.mdx @@ -1,12 +1,13 @@ --- -title: "Create Flash API endpoints" -sidebarTitle: "Create API endpoints" +title: "Create a Flash API endpoint" +sidebarTitle: "Create an endpoint" description: "Build and serve HTTP APIs using FastAPI with Flash." 
+tag: "BETA" --- Flash API endpoints let you build HTTP APIs with FastAPI that run on Runpod Serverless workers. Use them to deploy production APIs that need GPU or CPU acceleration. -Unlike standalone scripts that run once and return results, API endpoints create a persistent server that handles incoming HTTP requests. Each request is processed by a Serverless worker using the same remote functions you'd use in a standalone script. +Unlike standalone scripts that run once and return results, this lets you create a persistent endpoint for handling incoming HTTP requests. Each request is processed by a Serverless worker using the same remote functions you'd use in a standalone script. diff --git a/flash/deploy-apps.mdx b/flash/deploy-apps.mdx index 31d49ee4..ff01b646 100644 --- a/flash/deploy-apps.mdx +++ b/flash/deploy-apps.mdx @@ -1,7 +1,8 @@ --- -title: "Deploy Flash apps" -sidebarTitle: "Deploy apps" -description: "Build and deploy Flash applications for production." +title: "Build and deploy Flash apps" +sidebarTitle: "Deploy Flash apps" +description: "Package and deploy Flash applications for production with `flash build`." +tag: "BETA" --- Flash uses a build process to package your application for deployment. This page covers how the build process works, including handler generation, cross-platform builds, and troubleshooting common issues. @@ -42,15 +43,15 @@ This approach provides: Flash automatically handles cross-platform builds, ensuring your deployments work correctly regardless of your development platform: -- **Automatic platform targeting**: Dependencies are installed for Linux `x86_64` (Runpod's serverless platform), even when building on macOS or Windows. +- **Automatic platform targeting**: Dependencies are installed for Linux `x86_64` (required for [Runpod Serverless](/serverless/overview)), even when building on macOS or Windows. - **Python version matching**: The build uses your current Python version to ensure package compatibility. - **Binary wheel enforcement**: Only pre-built binary wheels are used, preventing platform-specific compilation issues. -This means you can build on macOS ARM64, Windows, or any other platform, and the resulting package will run correctly on Runpod Serverless. +This means you can build on macOS ARM64, Windows, or any other platform, and the resulting package will run correctly on [Runpod Serverless](/serverless/overview). ## Cross-endpoint function calls -Flash enables functions on different endpoints to call each other. The runtime automatically discovers endpoints using the manifest and routes calls appropriately: +Flash enables functions on different endpoints to call each other: ```python # CPU endpoint function @@ -68,11 +69,11 @@ async def inference(data): return result ``` -The runtime wrapper handles service discovery and routing automatically. This allows you to build pipelines that use CPU workers for preprocessing and GPU workers for inference, optimizing costs by using appropriate hardware for each task. +The runtime automatically discovers endpoints and routes calls appropriately using the [`flash_manifest.json`](#build-artifacts) file generated during the build process. This lets you build pipelines that use CPU workers for preprocessing and GPU workers for inference, optimizing costs by using the appropriate hardware for each task. 
## Build artifacts -After `flash build` completes, you'll find these artifacts: +After running `flash build`, you'll find these artifacts in the `.flash/` directory: | Artifact | Description | |----------|-------------| @@ -82,20 +83,24 @@ After `flash build` completes, you'll find these artifacts: ### Managing bundle size -Runpod Serverless has a **500MB deployment limit**. Exceeding this limit will cause deployment failures. +Runpod Serverless has a **500MB deployment limit**. Exceeding this limit will cause your build to fail. -Use `--exclude` to skip packages already in your worker-tetra Docker image: +Use `--exclude` to skip packages that are already included in your base worker image: ```bash # For GPU deployments (PyTorch pre-installed) flash build --exclude torch,torchvision,torchaudio ``` -Which packages to exclude depends on your resource config: +Which packages to exclude depends on your [resource config](/flash/resource-configuration): -- **GPU resources**: PyTorch images have `torch`, `torchvision`, and `torchaudio` pre-installed. -- **CPU resources**: Python slim images have no ML frameworks pre-installed. -- **Load-balanced**: Same as above, depends on GPU vs CPU variant. +- **GPU resources** use PyTorch as the base image, which has `torch`, `torchvision`, and `torchaudio` pre-installed. +- **CPU resources** use Python slim images, which have no ML frameworks pre-installed. +- **Load-balancer** resources use the same base image as their GPU/CPU counterparts. + + + You can find details about the Flash worker image in the [runpod-workers/worker-tetra](https://github.com/runpod/worker-tetra) repository. Find the `Dockerfile` for your endpoint type: `Dockerfile` (for GPU workers), `Dockerfile-cpu` (for CPU workers), or `Dockerfile-lb` (for load balancing workers). + ## Troubleshooting @@ -111,9 +116,9 @@ If the build process can't find your remote functions: If handler generation fails: -- Check for syntax errors in your Python files (these will be logged). +- Check for syntax errors in your Python files (they should be logged in the terminal). - Verify all imports in your worker modules are available. -- Ensure resource config variables (e.g., `gpu_config`) are defined before functions reference them. +- Ensure resource config variables (e.g., `gpu_config`) are defined before a function references them. - Use `--keep-build` to inspect generated handler files in `.flash/.build/`. ### Build succeeded but deployment failed diff --git a/flash/monitoring.mdx b/flash/monitoring.mdx index b9f2589f..fb206f58 100644 --- a/flash/monitoring.mdx +++ b/flash/monitoring.mdx @@ -2,6 +2,7 @@ title: "Monitoring and debugging" sidebarTitle: "Monitoring and debugging" description: "Monitor, debug, and troubleshoot Flash deployments." +tag: "BETA" --- This page covers how to monitor and debug your Flash deployments, including viewing logs, troubleshooting common issues, and optimizing performance. @@ -185,20 +186,6 @@ As you work with Flash, endpoints accumulate in your Runpod account. To manage t -Endpoints persist until manually deleted through the Runpod console. Regularly clean up unused endpoints to avoid unnecessary charges. +Endpoints persist until manually deleted through the Runpod console. Regularly clean up unused endpoints to avoid hitting your account's maximum worker capacity limits. - - -## Getting help - -If you're encountering issues not covered here: - -- Check the [Flash examples repository](https://github.com/runpod/flash-examples) for working examples. 
-- Review the [tetra-rp GitHub repository](https://github.com/runpod/tetra-rp) for the latest documentation. -- Contact [Runpod support](https://www.runpod.io/contact) for additional assistance. - -## Next steps - -- [View the resource configuration reference](/flash/resource-configuration) for all available options. -- [Learn about pricing](/flash/pricing) to optimize costs. -- [Deploy Flash applications](/flash/deploy-apps) for production. + \ No newline at end of file diff --git a/flash/overview.mdx b/flash/overview.mdx index 8b573b53..c9fe2641 100644 --- a/flash/overview.mdx +++ b/flash/overview.mdx @@ -2,28 +2,54 @@ title: "Flash overview" sidebarTitle: "Overview" description: "Develop and deploy AI workflows on Runpod Serverless with Python." +tag: "BETA" --- -Flash is a Python SDK for developing and deploying AI workflows on Runpod Serverless. You write Python functions locally, and Flash handles infrastructure management, GPU/CPU provisioning, dependency installation, and data transfer automatically. + +Flash is currently in beta. [Join our Discord](https://discord.gg/cUpRmau42V) to provide feedback and get support. + -Flash provides two ways to run workloads: +Flash is a Python SDK for developing and deploying AI workflows on [Runpod Serverless](/serverless/overview). You write Python functions locally, and Flash handles infrastructure management, GPU/CPU provisioning, dependency installation, and data transfer automatically. -- **Standalone scripts**: Use the `@remote` decorator to run Python functions on Runpod cloud infrastructure. -- **API endpoints**: Build and serve HTTP APIs using FastAPI that compute responses with GPU and CPU Serverless workers. +There are two ways to run workloads with Flash: + +- **Standalone scripts:** Add the `@remote` decorator to Python functions, and they'll run automatically on Runpod's cloud infrastructure when you run the script locally. +- **API endpoints:** Convert those functions into persistent endpoints that can be accessed via HTTP, scaling GPU/CPU resources automatically based on demand. + +Ready to try it out? Check out the quickstart guide and examples repository: + + + + Follow the quickstart to create your first Flash function in minutes. + + + + Check out our repository of prebuilt Flash applications. + + + + Learn about resource configuration, dependencies, and parallel execution. + + + Build HTTP APIs with FastAPI and Flash. + + -You can find prebuilt Flash examples at [runpod/flash-examples](https://github.com/runpod/flash-examples). ## Why use Flash? -Flash deploys Python functions to Runpod's Serverless infrastructure without requiring you to manage servers, configure networking, or handle scaling. You write functions, specify your dependencies in the decorator, and Flash installs them automatically when the function runs on remote workers. +**Flash is the easiest and fastest way to test and deploy AI/ML workloads on Runpod.** It's designed for local development and live-testing workflows, but can also be used to deploy production-ready applications. -You can specify the exact GPU hardware you need, from RTX 4090s to A100 80GB GPUs, for AI inference, training, and other compute-intensive tasks. Functions scale automatically based on demand and can run in parallel across multiple workers. +When you run a `@remote` function, Flash: +- Automatically provisions resources on Runpod's infrastructure. +- Installs your dependencies automatically. +- Runs your function on a remote GPU/CPU. +- Returns the result to your local environment. 
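+
+For example, a minimal sketch of this flow (the function and values are illustrative):
+
+```python
+import asyncio
+from tetra_rp import remote, LiveServerless
+
+config = LiveServerless(name="hello-flash")
+
+@remote(resource_config=config, dependencies=["numpy"])
+def double(values):
+    import numpy as np
+    return (np.array(values) * 2).tolist()
+
+async def main():
+    # Executes on a Runpod worker; the result is returned locally.
+    print(await double([1, 2, 3]))
+
+asyncio.run(main())
+```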
-Flash uses Serverless pricing with per-second billing. You're only charged for actual compute time—there are no costs when your code isn't running. +You can specify the exact GPU hardware you need, from RTX 4090s to A100 80GB GPUs, for AI inference, training, and other compute-intensive tasks. Functions scale automatically based on demand and can run in parallel across multiple resources. + +Flash uses [Runpod's Serverless pricing](/serverless/pricing) with per-second billing. You're only charged for actual compute time; there are no costs when your code isn't running. - - Follow the quickstart to create your first Flash function in minutes. - ## Install Flash @@ -33,7 +59,11 @@ Install Flash with `pip`: pip install tetra_rp`. ``` -Then configure your Runpod API key as an environment variable. +In your project directory, create a `.env` file and add your Runpod API key, replacing `YOUR_API_KEY` with your actual API key: + +```bash +touch .env && echo "RUNPOD_API_KEY=YOUR_API_KEY" > .env +``` ## Concepts @@ -127,15 +157,16 @@ A typical Flash development workflow looks like this: 1. Write Python functions with the `@remote` decorator. 2. Specify resource requirements and dependencies in the decorator. -3. Run your script locally—Flash handles remote execution automatically. -4. For API deployments, use `flash init` to create a project and `flash run` to start your server. +3. Run your script locally. Flash handles remote execution automatically. + +For API deployments, use `flash init` to create a project, then `flash run` to start your server. For a full walkthrough, see [Create a Flash API endpoint](/flash/api-endpoints). ## Limitations - Serverless deployments using Flash are currently restricted to the `EU-RO-1` datacenter. - Flash is designed primarily for local development and live-testing workflows. -- Endpoints created by Flash persist until manually deleted through the Runpod console. A `flash undeploy` command is in development. -- Be aware of your account's maximum worker capacity limits. Flash can rapidly scale workers across multiple endpoints, and you may hit capacity constraints. Contact Runpod support to increase your account's capacity allocation if needed. +- Endpoints created by Flash persist until manually deleted through the Runpod console. A `flash undeploy` command is currently in development to clean up unused endpoints. +- Be aware of your account's maximum worker capacity limits. Flash can rapidly scale workers across multiple endpoints, and you may hit capacity constraints. Contact [Runpod support](https://www.runpod.io/contact) to increase your account's capacity allocation if needed. ## Next steps @@ -143,13 +174,18 @@ A typical Flash development workflow looks like this: Get started with your first Flash function. - - Learn about resource configuration, dependencies, and parallel execution. - - - Build HTTP APIs with FastAPI and Flash. - Complete reference for resource configuration options. + + +## Getting help + +- Join the [Runpod community on Discord](https://discord.gg/cUpRmau42V) for support and discussion. + +## Next steps + +- [View the resource configuration reference](/flash/resource-configuration) for all available options. +- [Learn about pricing](/flash/pricing) to optimize costs. +- [Deploy Flash applications](/flash/deploy-apps) for production. 
diff --git a/flash/pricing.mdx b/flash/pricing.mdx index f2c05944..f6312c72 100644 --- a/flash/pricing.mdx +++ b/flash/pricing.mdx @@ -2,6 +2,7 @@ title: "Pricing" sidebarTitle: "Pricing" description: "Understand Flash pricing and optimize your costs." +tag: "BETA" --- Flash follows the same pricing model as [Runpod Serverless](/serverless/pricing). You pay per second of compute time, with no charges when your code isn't running. Pricing depends on the GPU or CPU type you configure for your endpoints. diff --git a/flash/quickstart.mdx b/flash/quickstart.mdx index 3ee79bd8..7bdfc665 100644 --- a/flash/quickstart.mdx +++ b/flash/quickstart.mdx @@ -2,6 +2,7 @@ title: "Get started with Flash" sidebarTitle: "Quickstart" description: "Set up your development environment and run your first GPU workload with Flash." +tag: "BETA" --- This tutorial shows you how to set up Flash and run a GPU workload on Runpod Serverless. You'll create a remote function that performs matrix operations on a GPU and returns the results to your local machine. @@ -21,7 +22,7 @@ In this tutorial you'll learn how to: - You've [created a Runpod account](/get-started/manage-accounts). - You've [created a Runpod API key](/get-started/api-keys). -- You've installed [Python 3.9 or greater](https://www.python.org/downloads/). +- You've installed [Python 3.9 (or higher)](https://www.python.org/downloads/). ## Step 1: Install Flash diff --git a/flash/remote-functions.mdx b/flash/remote-functions.mdx index f3f2c4f5..b8cc20bf 100644 --- a/flash/remote-functions.mdx +++ b/flash/remote-functions.mdx @@ -2,6 +2,7 @@ title: "Create remote functions" sidebarTitle: "Create remote functions" description: "Learn how to create and configure remote functions with Flash." +tag: "BETA" --- Remote functions are the core building blocks of Flash. The `@remote` decorator marks Python functions for execution on Runpod's Serverless infrastructure, handling resource provisioning, dependency installation, and data transfer automatically. diff --git a/flash/resource-configuration.mdx b/flash/resource-configuration.mdx index 6241ea4e..4623953e 100644 --- a/flash/resource-configuration.mdx +++ b/flash/resource-configuration.mdx @@ -2,6 +2,7 @@ title: "Resource configuration reference" sidebarTitle: "Configuration reference" description: "Complete reference for Flash resource configuration options." +tag: "BETA" --- Flash provides several resource configuration classes for different use cases. This reference covers all available parameters and options. From 8f0edc7a0f2da9cdc837f560f5e2ad83e432460b Mon Sep 17 00:00:00 2001 From: Mo King Date: Thu, 22 Jan 2026 12:34:55 -0500 Subject: [PATCH 03/19] Update overview description --- flash/overview.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/flash/overview.mdx b/flash/overview.mdx index c9fe2641..8845b165 100644 --- a/flash/overview.mdx +++ b/flash/overview.mdx @@ -1,7 +1,7 @@ --- -title: "Flash overview" +title: "Overview" sidebarTitle: "Overview" -description: "Develop and deploy AI workflows on Runpod Serverless with Python." +description: "Rapidly develop and deploy AI/ML apps with the Flash Python SDK." 
tag: "BETA" --- From b5453eb5bad48d6157f3e1a4e06ded4c4186db1e Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 6 Feb 2026 10:56:31 -0500 Subject: [PATCH 04/19] Tetra -> Flash --- .cursor/rules/rp-styleguide.mdc | 2 +- CLAUDE.md | 2 +- flash/api-endpoints.mdx | 4 ++-- flash/deploy-apps.mdx | 4 ++-- flash/overview.mdx | 4 ++-- flash/pricing.mdx | 2 +- flash/quickstart.mdx | 24 ++++++++++++------------ flash/remote-functions.mdx | 10 +++++----- flash/resource-configuration.mdx | 14 +++++++------- 9 files changed, 33 insertions(+), 33 deletions(-) diff --git a/.cursor/rules/rp-styleguide.mdc b/.cursor/rules/rp-styleguide.mdc index 9c4fefc5..e6529a1e 100644 --- a/.cursor/rules/rp-styleguide.mdc +++ b/.cursor/rules/rp-styleguide.mdc @@ -5,7 +5,7 @@ alwaysApply: true --- Always use sentence case for headings and titles. -These are proper nouns: Runpod, Pods, Serverless, Hub, Instant Clusters, Secure Cloud, Community Cloud, Tetra. +These are proper nouns: Runpod, Pods, Serverless, Hub, Instant Clusters, Secure Cloud, Community Cloud, Flash. These are generic terms: endpoint, worker, cluster, template, handler, fine-tune, network volume. Prefer using paragraphs to bullet points unless directly asked. diff --git a/CLAUDE.md b/CLAUDE.md index 3e7dae0c..0be9f6d3 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -98,7 +98,7 @@ Follow the Runpod style guide (`.cursor/rules/rp-styleguide.mdc`) and Google Dev ### Capitalization and Terminology - **Always use sentence case** for headings and titles -- **Proper nouns**: Runpod, Pods, Serverless, Hub, Instant Clusters, Secure Cloud, Community Cloud, Tetra +- **Proper nouns**: Runpod, Pods, Serverless, Hub, Instant Clusters, Secure Cloud, Community Cloud, Flash - **Generic terms** (lowercase): endpoint, worker, cluster, template, handler, fine-tune, network volume ### Writing Style diff --git a/flash/api-endpoints.mdx b/flash/api-endpoints.mdx index b8f04631..d7091c54 100644 --- a/flash/api-endpoints.mdx +++ b/flash/api-endpoints.mdx @@ -151,7 +151,7 @@ To add a new GPU endpoint for image generation: 1. Create a new file at `workers/gpu/image_gen.py`: ```python -from tetra_rp import remote, LiveServerless, GpuGroup +from runpod_flash import remote, LiveServerless, GpuGroup config = LiveServerless( name="image-generator", @@ -204,7 +204,7 @@ async def generate(prompt: str, width: int = 512, height: int = 512): For API endpoints requiring low-latency HTTP access with direct routing, use load-balanced endpoints: ```python -from tetra_rp import LiveLoadBalancer, remote +from runpod_flash import LiveLoadBalancer, remote api = LiveLoadBalancer(name="api-service") diff --git a/flash/deploy-apps.mdx b/flash/deploy-apps.mdx index ff01b646..ff62e7ac 100644 --- a/flash/deploy-apps.mdx +++ b/flash/deploy-apps.mdx @@ -24,7 +24,7 @@ Flash uses a factory pattern for handlers to eliminate code duplication: ```python # Generated handler (handler_gpu_config.py) -from tetra_rp.runtime.generic_handler import create_handler +from runpod_flash.runtime.generic_handler import create_handler from workers.gpu import process_data FUNCTION_REGISTRY = { @@ -99,7 +99,7 @@ Which packages to exclude depends on your [resource config](/flash/resource-conf - **Load-balancer** resources use the same base image as their GPU/CPU counterparts. - You can find details about the Flash worker image in the [runpod-workers/worker-tetra](https://github.com/runpod/worker-tetra) repository. 
Find the `Dockerfile` for your endpoint type: `Dockerfile` (for GPU workers), `Dockerfile-cpu` (for CPU workers), or `Dockerfile-lb` (for load balancing workers). + You can find details about the Flash worker image in the [runpod-workers/flash](https://github.com/runpod-workers/flash) repository. Find the `Dockerfile` for your endpoint type: `Dockerfile` (for GPU workers), `Dockerfile-cpu` (for CPU workers), or `Dockerfile-lb` (for load balancing workers). ## Troubleshooting diff --git a/flash/overview.mdx b/flash/overview.mdx index 8845b165..fe97982e 100644 --- a/flash/overview.mdx +++ b/flash/overview.mdx @@ -56,7 +56,7 @@ Flash uses [Runpod's Serverless pricing](/serverless/pricing) with per-second bi Install Flash with `pip`: ```bash -pip install tetra_rp`. +pip install runpod-flash ``` In your project directory, create a `.env` file and add your Runpod API key, replacing `YOUR_API_KEY` with your actual API key: @@ -89,7 +89,7 @@ async def main(): Flash provides fine-grained control over hardware allocation through configuration objects. You can configure GPU types, worker counts, idle timeouts, environment variables, and more. ```python -from tetra_rp import LiveServerless, GpuGroup +from runpod_flash import remote, LiveServerless, GpuGroup gpu_config = LiveServerless( name="ml-inference", diff --git a/flash/pricing.mdx b/flash/pricing.mdx index f6312c72..28ca0df8 100644 --- a/flash/pricing.mdx +++ b/flash/pricing.mdx @@ -74,7 +74,7 @@ config = LiveServerless( For data preprocessing, postprocessing, or other tasks that don't require GPU acceleration, use CPU workers instead of GPU workers. ```python -from tetra_rp import LiveServerless, CpuInstanceType +from runpod_flash import LiveServerless, CpuInstanceType # CPU configuration for non-GPU tasks cpu_config = LiveServerless( diff --git a/flash/quickstart.mdx b/flash/quickstart.mdx index 7bdfc665..417b7b04 100644 --- a/flash/quickstart.mdx +++ b/flash/quickstart.mdx @@ -29,7 +29,7 @@ In this tutorial you'll learn how to: Use `pip` to install Flash: ```bash -pip install tetra_rp +pip install runpod-flash ``` ## Step 2: Add your API key to the environment @@ -65,7 +65,7 @@ Add the necessary import statements: ```python import asyncio from dotenv import load_dotenv -from tetra_rp import remote, LiveServerless, GpuGroup +from runpod_flash import remote, LiveServerless, GpuGroup # Load environment variables from .env file load_dotenv() @@ -88,7 +88,7 @@ Define the Serverless endpoint configuration for your Flash workload: gpu_config = LiveServerless( gpus=[GpuGroup.AMPERE_24, GpuGroup.ADA_24], # Use any 24GB GPU workersMax=3, - name="tetra_gpu", + name="flash_gpu", ) ``` @@ -96,7 +96,7 @@ This `LiveServerless` object defines: - `gpus=[GpuGroup.AMPERE_24, GpuGroup.ADA_24]`: The GPUs that can be used by workers on this endpoint. This restricts workers to using any 24 GB GPU (L4, A5000, 3090, or 4090). See [GPU pools](/references/gpu-types#gpu-pools) for available GPU pool IDs. Removing this parameter allows the endpoint to use any available GPUs. - `workersMax=3`: The maximum number of worker instances. -- `name="tetra_gpu"`: The name of the endpoint that will be created/used in the Runpod console. +- `name="flash_gpu"`: The name of the endpoint that will be created/used in the Runpod console. If you run a Flash function that uses an identical `LiveServerless` configuration to a prior run, Runpod reuses your existing endpoint rather than creating a new one. 
However, if any configuration values have changed (not just the `name` parameter), a new endpoint will be created. @@ -109,7 +109,7 @@ Define the function that will run on the GPU worker: resource_config=gpu_config, dependencies=["numpy", "torch"] ) -def tetra_matrix_operations(size): +def flash_matrix_operations(size): """Perform large matrix operations using NumPy and check GPU availability.""" import numpy as np import torch @@ -141,7 +141,7 @@ This code demonstrates several key concepts: - `resource_config=gpu_config`: The function runs using the GPU configuration defined earlier. - `dependencies=["numpy", "torch"]`: Python packages that must be installed on the remote worker. -The `tetra_matrix_operations` function: +The `flash_matrix_operations` function: - Gets GPU details using PyTorch's CUDA utilities. - Creates two large random matrices using NumPy. @@ -158,7 +158,7 @@ Add a `main` function to execute your GPU workload: async def main(): # Run the GPU matrix operations print("Starting large matrix operations on GPU...") - result = await tetra_matrix_operations(1000) + result = await flash_matrix_operations(1000) # Print the results print("\nMatrix operations results:") @@ -246,7 +246,7 @@ When you run this script: 1. Flash reads your GPU resource configuration and provisions a worker on Runpod. 2. It installs the required dependencies (NumPy and PyTorch) on the worker. -3. Your `tetra_matrix_operations` function runs on the remote worker. +3. Your `flash_matrix_operations` function runs on the remote worker. 4. The function creates and multiplies large matrices, then calculates statistics. 5. Your local `main` function receives these results and displays them in your terminal. @@ -263,9 +263,9 @@ async def main(): # Run all matrix operations in parallel results = await asyncio.gather( - tetra_matrix_operations(500), - tetra_matrix_operations(1000), - tetra_matrix_operations(2000) + flash_matrix_operations(500), + flash_matrix_operations(1000), + flash_matrix_operations(2000) ) print("\nMatrix operations results:") @@ -322,4 +322,4 @@ You've successfully used Flash to run a GPU workload on Runpod. Now you can: - [Create more complex remote functions](/flash/remote-functions) with custom dependencies and resource configurations. - [Build API endpoints](/flash/api-endpoints) using FastAPI. - [Deploy Flash applications](/flash/deploy-apps) for production use. -- Explore more examples on the [runpod/flash-examples](https://github.com/runpod/flash-examples) GitHub repository. +- Explore more examples on the [runpod-workers/flash](https://github.com/runpod-workers/flash) GitHub repository. diff --git a/flash/remote-functions.mdx b/flash/remote-functions.mdx index b8cc20bf..60d035d5 100644 --- a/flash/remote-functions.mdx +++ b/flash/remote-functions.mdx @@ -16,7 +16,7 @@ Every remote function requires a resource configuration that specifies the compu `LiveServerless` is the primary configuration class for Flash. It supports full remote code execution, allowing you to run arbitrary Python functions on Runpod's infrastructure. 
```python -from tetra_rp import LiveServerless, GpuGroup +from runpod_flash import LiveServerless, GpuGroup gpu_config = LiveServerless( name="ml-inference", @@ -49,7 +49,7 @@ See the [resource configuration reference](/flash/resource-configuration) for al For CPU-only workloads, specify `instanceIds` instead of `gpus`: ```python -from tetra_rp import LiveServerless, CpuInstanceType +from runpod_flash import LiveServerless, CpuInstanceType cpu_config = LiveServerless( name="data-processor", @@ -134,7 +134,7 @@ This is particularly useful for: ```python import asyncio -from tetra_rp import remote, LiveServerless, GpuGroup +from runpod_flash import remote, LiveServerless, GpuGroup config = LiveServerless( name="batch-processor", @@ -172,7 +172,7 @@ if __name__ == "__main__": For specialized environments that require a custom Docker image, use `ServerlessEndpoint` or `CpuServerlessEndpoint` instead of `LiveServerless`: ```python -from tetra_rp import ServerlessEndpoint, GpuGroup +from runpod_flash import ServerlessEndpoint, GpuGroup custom_gpu = ServerlessEndpoint( name="custom-ml-env", @@ -214,7 +214,7 @@ To find your network volume ID: ### Example: Using a network volume for model storage ```python -from tetra_rp import LiveServerless, GpuGroup, PodTemplate +from runpod_flash import LiveServerless, GpuGroup, PodTemplate config = LiveServerless( name="model-inference", diff --git a/flash/resource-configuration.mdx b/flash/resource-configuration.mdx index 4623953e..886c0f90 100644 --- a/flash/resource-configuration.mdx +++ b/flash/resource-configuration.mdx @@ -12,7 +12,7 @@ Flash provides several resource configuration classes for different use cases. T `LiveServerless` is the primary configuration class for Flash. It supports full remote code execution, allowing you to run arbitrary Python functions on Runpod's infrastructure. ```python -from tetra_rp import LiveServerless, GpuGroup, CpuInstanceType, PodTemplate +from runpod_flash import LiveServerless, GpuGroup, CpuInstanceType, PodTemplate gpu_config = LiveServerless( name="ml-inference", @@ -45,7 +45,7 @@ gpu_config = LiveServerless( ### GPU configuration example ```python -from tetra_rp import LiveServerless, GpuGroup, PodTemplate +from runpod_flash import LiveServerless, GpuGroup, PodTemplate config = LiveServerless( name="gpu-inference", @@ -62,7 +62,7 @@ config = LiveServerless( ### CPU configuration example ```python -from tetra_rp import LiveServerless, CpuInstanceType +from runpod_flash import LiveServerless, CpuInstanceType config = LiveServerless( name="cpu-processor", @@ -77,7 +77,7 @@ config = LiveServerless( `ServerlessEndpoint` is for GPU workloads that require custom Docker images. Unlike `LiveServerless`, it only supports dictionary payloads and cannot execute arbitrary Python functions. ```python -from tetra_rp import ServerlessEndpoint, GpuGroup +from runpod_flash import ServerlessEndpoint, GpuGroup config = ServerlessEndpoint( name="custom-ml-env", @@ -103,7 +103,7 @@ All parameters from `LiveServerless` are available, plus: ### Example ```python -from tetra_rp import ServerlessEndpoint, GpuGroup +from runpod_flash import ServerlessEndpoint, GpuGroup # Custom image with pre-installed models config = ServerlessEndpoint( @@ -128,7 +128,7 @@ result = await config.run({ `CpuServerlessEndpoint` is for CPU workloads that require custom Docker images. Like `ServerlessEndpoint`, it only supports dictionary payloads. 
```python -from tetra_rp import CpuServerlessEndpoint, CpuInstanceType +from runpod_flash import CpuServerlessEndpoint, CpuInstanceType config = CpuServerlessEndpoint( name="data-processor", @@ -211,7 +211,7 @@ The `CpuInstanceType` enum provides access to CPU configurations: Use `PodTemplate` to configure additional pod settings: ```python -from tetra_rp import LiveServerless, PodTemplate +from runpod_flash import LiveServerless, PodTemplate config = LiveServerless( name="custom-template", From 4620f5f47183a6c7e416d369624ea96dfce96143 Mon Sep 17 00:00:00 2001 From: Mo King Date: Wed, 11 Feb 2026 11:32:24 -0500 Subject: [PATCH 05/19] Add environment variable instructions --- flash/overview.mdx | 4 +++- flash/quickstart.mdx | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/flash/overview.mdx b/flash/overview.mdx index fe97982e..e49cdb76 100644 --- a/flash/overview.mdx +++ b/flash/overview.mdx @@ -53,9 +53,11 @@ Flash uses [Runpod's Serverless pricing](/serverless/pricing) with per-second bi ## Install Flash -Install Flash with `pip`: +Create a Python virtual environment and use `pip` to install Flash: ```bash +python3 -m venv venv +source venv/bin/activate pip install runpod-flash ``` diff --git a/flash/quickstart.mdx b/flash/quickstart.mdx index 417b7b04..6579fc39 100644 --- a/flash/quickstart.mdx +++ b/flash/quickstart.mdx @@ -26,9 +26,11 @@ In this tutorial you'll learn how to: ## Step 1: Install Flash -Use `pip` to install Flash: +Create a Python virtual environment and use `pip` to install Flash: ```bash +python3 -m venv venv +source venv/bin/activate pip install runpod-flash ``` From 9b9c31428c01bed653d697075c86501e6488bd19 Mon Sep 17 00:00:00 2001 From: Mo King Date: Wed, 18 Feb 2026 08:11:32 -0500 Subject: [PATCH 06/19] Add CLI reference, expand app guides --- docs.json | 28 ++- flash/apps-and-environments.mdx | 218 +++++++++++++++++ flash/{api-endpoints.mdx => build-app.mdx} | 115 +++++---- flash/build-apps-overview.mdx | 255 ++++++++++++++++++++ flash/cli/app.mdx | 224 ++++++++++++++++++ flash/cli/build.mdx | 184 +++++++++++++++ flash/cli/deploy.mdx | 247 +++++++++++++++++++ flash/cli/env.mdx | 255 ++++++++++++++++++++ flash/cli/init.mdx | 89 +++++++ flash/cli/overview.mdx | 121 ++++++++++ flash/cli/run.mdx | 156 ++++++++++++ flash/cli/undeploy.mdx | 213 +++++++++++++++++ flash/deploy-apps.mdx | 262 ++++++++++++--------- flash/initialize-project.mdx | 209 ++++++++++++++++ flash/local-testing.mdx | 174 ++++++++++++++ flash/monitoring.mdx | 58 ++++- flash/overview.mdx | 221 ++++++++++++----- flash/quickstart.mdx | 4 +- flash/remote-functions.mdx | 2 +- flash/resource-configuration.mdx | 2 +- 20 files changed, 2821 insertions(+), 216 deletions(-) create mode 100644 flash/apps-and-environments.mdx rename flash/{api-endpoints.mdx => build-app.mdx} (65%) create mode 100644 flash/build-apps-overview.mdx create mode 100644 flash/cli/app.mdx create mode 100644 flash/cli/build.mdx create mode 100644 flash/cli/deploy.mdx create mode 100644 flash/cli/env.mdx create mode 100644 flash/cli/init.mdx create mode 100644 flash/cli/overview.mdx create mode 100644 flash/cli/run.mdx create mode 100644 flash/cli/undeploy.mdx create mode 100644 flash/initialize-project.mdx create mode 100644 flash/local-testing.mdx diff --git a/docs.json b/docs.json index bc8939df..0f92f47e 100644 --- a/docs.json +++ b/docs.json @@ -126,10 +126,32 @@ "flash/quickstart", "flash/pricing", "flash/remote-functions", - "flash/api-endpoints", - "flash/deploy-apps", "flash/resource-configuration", - 
"flash/monitoring" + { + "group": "Build apps", + "pages": [ + "flash/build-apps-overview", + "flash/build-app", + "flash/initialize-project", + "flash/local-testing", + "flash/apps-and-environments", + "flash/deploy-apps" + ] + }, + "flash/monitoring", + { + "group": "CLI reference", + "pages": [ + "flash/cli/overview", + "flash/cli/init", + "flash/cli/run", + "flash/cli/build", + "flash/cli/deploy", + "flash/cli/env", + "flash/cli/app", + "flash/cli/undeploy" + ] + } ] }, { diff --git a/flash/apps-and-environments.mdx b/flash/apps-and-environments.mdx new file mode 100644 index 00000000..6f0c42c6 --- /dev/null +++ b/flash/apps-and-environments.mdx @@ -0,0 +1,218 @@ +--- +title: "Manage apps and environments" +sidebarTitle: "Manage apps and environments" +description: "Understand the Flash deployment hierarchy and learn how to manage your apps." +tag: "BETA" +--- + +Flash organizes deployments using a two-level hierarchy: **apps** and **environments**. This structure enables standard development workflows where you can test changes in development, validate in staging, and deploy to production. + +## What is a Flash app? + +A **Flash app** is a cloud-side container that groups everything related to a single project. Think of it as a project namespace in Runpod that keeps your deployments organized together. + +Each app contains: + +- **Environments**: Deployment contexts like `dev`, `staging`, and `production`. +- **Builds**: Versioned artifacts created from your code. +- **Configuration**: App-wide settings and metadata. + +### App hierarchy + +```text +Flash App (my-project) +│ +├── Environments +│ ├── dev +│ │ ├── Endpoints (gpu-worker, cpu-worker) +│ │ └── Volumes (model-cache) +│ ├── staging +│ │ ├── Endpoints (gpu-worker, cpu-worker) +│ │ └── Volumes (model-cache) +│ └── production +│ ├── Endpoints (gpu-worker, cpu-worker) +│ └── Volumes (model-cache) +│ +└── Builds + ├── build_v1 (2024-01-15) + ├── build_v2 (2024-01-18) + └── build_v3 (2024-01-20) +``` + +### Creating apps + +Apps are created automatically when you first run `flash deploy`. You can also create them explicitly: + +```bash +flash app create my-project +``` + +### Managing apps + +Use `flash app` commands to manage your apps: + +```bash +# List all apps +flash app list + +# Get app details +flash app get my-project + +# Delete an app and all its resources +flash app delete --app my-project +``` + + + +Deleting an app removes all environments, builds, endpoints, and volumes associated with it. This operation is irreversible. + + + +## What is an environment? + +An **environment** is an isolated deployment context within a Flash app. Each environment is a separate "stage" that contains its own: + +- **Deployed endpoints**: Serverless endpoints provisioned from your `@remote` functions. +- **Active build version**: The specific version of your code running in this environment. +- **Network volumes**: Persistent storage for models, caches, and data. +- **Deployment state**: Current status (PENDING, DEPLOYING, DEPLOYED, etc.). + +Environments are completely independent. Deploying to one environment has no effect on others. 
+ +### Creating environments + +Environments are created automatically when you deploy with `--env`: + +```bash +# Creates 'staging' environment if it doesn't exist +flash deploy --env staging +``` + +You can also create them explicitly: + +```bash +flash env create staging +``` + +### Managing environments + +Use `flash env` commands to manage environments: + +```bash +# List all environments +flash env list + +# Get environment details +flash env get production + +# Delete an environment +flash env delete dev +``` + +### Environment states + +| State | Description | +|-------|-------------| +| PENDING | Environment created but not deployed | +| DEPLOYING | Deployment in progress | +| DEPLOYED | Successfully deployed and running | +| FAILED | Deployment or health check failed | +| DELETING | Deletion in progress | + +## Deployment workflows + +### Single environment (simple projects) + +For simple projects, use a single `production` environment: + +```bash +# First deployment creates app and environment +flash deploy +``` + +### Multiple environments (team projects) + +For team projects, use multiple environments: + +```bash +# Create environments +flash env create dev +flash env create staging +flash env create production + +# Deploy to each +flash deploy --env dev # Development testing +flash deploy --env staging # QA validation +flash deploy --env production # Live deployment +``` + +### Feature branch deployments + +Create temporary environments for feature testing: + +```bash +# Create feature environment +flash env create feature-auth + +# Deploy feature branch +git checkout feature-auth +flash deploy --env feature-auth + +# Clean up after merge +flash env delete feature-auth +``` + +## Best practices + +### Naming conventions + +Use clear, descriptive names: + +```bash +# Good +flash env create dev +flash env create staging +flash env create production + +# Avoid +flash env create env1 +flash env create test123 +``` + +### Environment strategy + +**Three-tier approach** (recommended for teams): + +| Environment | Purpose | +|-------------|---------| +| `dev` | Active development, frequent deploys | +| `staging` | Pre-production testing, QA validation | +| `production` | Live user-facing deployment | + +**Simple approach** (small projects): + +| Environment | Purpose | +|-------------|---------| +| `dev` | Development and testing | +| `production` | Live deployment | + +### Workflow recommendations + +1. **Develop locally**: Test with `flash run` before deploying. +2. **Deploy to dev**: `flash deploy --env dev` for initial testing. +3. **Deploy to staging**: `flash deploy --env staging` for QA. +4. **Deploy to production**: `flash deploy --env production` after approval. + +### Resource management + +- Monitor environments regularly with `flash env list`. +- Clean up unused environments to avoid resource accumulation. +- Check resource usage with `flash env get `. +- Delete environments carefully as deletion is irreversible. + +## Next steps + +- [Deploy your first app](/flash/deploy-apps) with `flash deploy`. +- [Learn about the CLI](/flash/cli/overview) for all available commands. +- [View the env command reference](/flash/cli/env) for detailed options. +- [View the app command reference](/flash/cli/app) for detailed options. 
diff --git a/flash/api-endpoints.mdx b/flash/build-app.mdx similarity index 65% rename from flash/api-endpoints.mdx rename to flash/build-app.mdx index d7091c54..a225d058 100644 --- a/flash/api-endpoints.mdx +++ b/flash/build-app.mdx @@ -1,42 +1,75 @@ --- -title: "Create a Flash API endpoint" -sidebarTitle: "Create an endpoint" -description: "Build and serve HTTP APIs using FastAPI with Flash." +title: "Build a Flash app" +sidebarTitle: "Build a Flash app" +description: "Create a Flash app, test it locally, and deploy it to production." tag: "BETA" --- -Flash API endpoints let you build HTTP APIs with FastAPI that run on Runpod Serverless workers. Use them to deploy production APIs that need GPU or CPU acceleration. +Flash apps let you build FastAPI apps to serve AI/ML workloads on Runpod Serverless. This guide walks you through the process of building a Flash app from scratch, from project initialization and local testing to production deployment. -Unlike standalone scripts that run once and return results, this lets you create a persistent endpoint for handling incoming HTTP requests. Each request is processed by a Serverless worker using the same remote functions you'd use in a standalone script. + +If you haven't already, we recommend starting with the [Quickstart](/flash/quickstart) guide to get a feel for how Flash `@remote` functions work. + - +## Requirements: -Flash API endpoints are currently available for local testing only. Run `flash run` to start the API server on your local machine. Production deployment support is coming in future updates. +- You've [created a Runpod account](/get-started/manage-accounts). +- You've [created a Runpod API key](/get-started/api-keys). +- You've installed [Python 3.10 (or higher)](https://www.python.org/downloads/). - +## What you'll learn -## Step 1: Initialize a new project +In this tutorial you'll learn how to: + +- Create a new Flash project with a template structure. +- Explore the project template. +- Install Python dependencies. +- Add your API key to the environment. +- Start the local development server. +- Test the API endpoint using cURL. +- Open the API explorer. +- Customize your API endpoint. +- Deploy to production. -Use the `flash init` command to generate a structured project template with a preconfigured FastAPI application entry point. +## Step 1: Initialize a new project -Run this command to initialize a new project directory: +Create a new directory and Python virtual environment: ```bash -flash init my_project +# Create the project directory and navigate into it: +mkdir flash_app +cd flash_app + +# Install Flash: +python3 -m venv venv +source venv/bin/activate +pip install runpod-flash ``` -You can also initialize your current directory: +Use the `flash init` command to generate a structured project template with a preconfigured FastAPI application entry point: ```bash flash init ``` +Make sure your API key is set in the environment, either by creating a `.env` file or exporting the `RUNPOD_API_KEY` environment variable: + +```bash +# Set the API key as an environment variable: +export RUNPOD_API_KEY=YOUR_API_KEY + +# Or create a `.env` file: +touch .env && echo "RUNPOD_API_KEY=YOUR_API_KEY" > .env +``` + +Replace `YOUR_API_KEY` with your actual Runpod API key. 
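+
+To confirm the key is visible before you continue, you can run a quick check. A sketch, assuming `python-dotenv` is installed (it's used the same way in the quickstart):
+
+```python
+import os
+from dotenv import load_dotenv
+
+load_dotenv()  # reads RUNPOD_API_KEY from .env in the current directory
+assert os.getenv("RUNPOD_API_KEY"), "RUNPOD_API_KEY is not set"
+print("Runpod API key loaded.")
+```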
+ ## Step 2: Explore the project template This is the structure of the project template created by `flash init`: ```text -my_project/ +flash_app/ ├── main.py # FastAPI application entry point ├── workers/ │ ├── gpu/ # GPU worker example @@ -45,7 +78,6 @@ my_project/ │ └── cpu/ # CPU worker example │ ├── __init__.py # FastAPI router │ └── endpoint.py # CPU script with @remote decorated function -├── .env # Environment variable template ├── .gitignore # Git ignore patterns ├── .flashignore # Flash deployment ignore patterns ├── requirements.txt # Python dependencies @@ -55,7 +87,7 @@ my_project/ This template includes: - A FastAPI application entry point and routers. -- Templates for Python dependencies, `.env`, `.gitignore`, etc. +- Templates for `requirements.txt`, `.env`, `.gitignore`, etc. - Flash scripts (`endpoint.py`) for both GPU and CPU workers, which include: - Pre-configured worker scaling limits using the `LiveServerless()` object. - A `@remote` decorated function that returns a response from a worker. @@ -64,12 +96,6 @@ When you start the FastAPI server, it creates API endpoints at `/gpu/hello` and ## Step 3: Install Python dependencies -After initializing the project, navigate into the project directory: - -```bash -cd my_project -``` - Install required dependencies: ```bash @@ -199,35 +225,42 @@ async def generate(prompt: str, width: int = 512, height: int = 512): 3. Include the router in `main.py` if not already included. -## Load-balanced endpoints +## Step 8: Deploy to Runpod -For API endpoints requiring low-latency HTTP access with direct routing, use load-balanced endpoints: +When you're ready to deploy your app to Runpod, use `flash deploy`: -```python -from runpod_flash import LiveLoadBalancer, remote +```bash +flash deploy +``` -api = LiveLoadBalancer(name="api-service") +This command: -@remote(api, method="POST", path="/api/process") -async def process_data(x: int, y: int): - return {"result": x + y} +1. Builds your application into a deployment artifact. +2. Uploads it to Runpod's storage. +3. Provisions Serverless endpoints for your `@remote` functions. +4. Deploys your FastAPI application as the "mothership" endpoint. -@remote(api, method="GET", path="/api/health") -def health_check(): - return {"status": "ok"} +After deployment, you'll receive a public URL for your API: -# Call functions directly -result = await process_data(5, 3) # → {"result": 8} +```text +Your mothership is deployed at: +https://api-xxxxx.runpod.net + +Available Routes: +POST /gpu/hello +POST /cpu/hello ``` -Key differences from queue-based endpoints: +All requests to the deployed app require authentication with your Runpod API key: -- **Direct HTTP routing**: Requests routed directly to workers, no queue. -- **Lower latency**: No queuing overhead. -- **Custom HTTP methods**: GET, POST, PUT, DELETE, PATCH support. -- **No automatic retries**: Users handle errors directly. +```bash +curl -X POST https://api-xxxxx.runpod.net/gpu/hello \ + -H "Authorization: Bearer $RUNPOD_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{"message": "Hello from production!"}' +``` -Load-balanced endpoints are ideal for REST APIs, webhooks, and real-time services. Queue-based endpoints are better for batch processing and fault-tolerant workflows. +For detailed deployment options including environment management, see [Deploy Flash apps](/flash/deploy-apps). 
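+
+You can also call the deployed app from Python instead of cURL. A sketch using `requests` (replace the URL with the one printed by `flash deploy`):
+
+```python
+import os
+import requests
+
+resp = requests.post(
+    "https://api-xxxxx.runpod.net/gpu/hello",
+    headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
+    json={"message": "Hello from production!"},
+)
+print(resp.status_code, resp.json())
+```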
 
 ## Next steps
 
diff --git a/flash/build-apps-overview.mdx b/flash/build-apps-overview.mdx
new file mode 100644
index 00000000..488fbfd1
--- /dev/null
+++ b/flash/build-apps-overview.mdx
@@ -0,0 +1,255 @@
---
title: "Development lifecycle"
sidebarTitle: "Development lifecycle"
description: "Understand the Flash development lifecycle and how to build and deploy your applications."
tag: "BETA"
---

Flash provides a complete development and deployment workflow to build AI/ML applications and services using Runpod's GPU/CPU infrastructure. This page explains the key concepts and processes you'll use when building Flash apps.

<Tip>
If you prefer to learn by doing, follow this tutorial to [build your first Flash app](/flash/build-app).
</Tip>

## App development overview

Building a Flash application follows a clear progression from initialization to production deployment:
```mermaid
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%%

flowchart TB
    Init["flash init<br/>Create project"]
    Code["Define endpoints with<br/>@remote functions"]
    Run["Test locally with<br/>flash run"]
    Deploy["Deploy to Runpod with<br/>flash deploy"]
    Manage["Manage apps and<br/>environments with<br/>flash app and flash env"]

    Init --> Code
    Code --> Run
    Run -->|"Ready for production"| Deploy
    Deploy --> Manage
    Run -->|"Continue development"| Code

    style Init fill:#5F4CFE,stroke:#5F4CFE,color:#fff
    style Code fill:#22C55E,stroke:#22C55E,color:#000
    style Run fill:#4D38F5,stroke:#4D38F5,color:#fff
    style Deploy fill:#AE6DFF,stroke:#AE6DFF,color:#000
    style Manage fill:#9289FE,stroke:#9289FE,color:#fff
```
+ + + + Use `flash init` to create a new project with a FastAPI server and example workers: + + ```bash + flash init my-project + cd my-project + pip install -r requirements.txt + ``` + + This gives you a working project structure with GPU and CPU worker examples. [Learn more about project initialization](/flash/initialize-project). + + + + Write your application code by defining `@remote` functions that execute on Runpod workers: + + ```python + from runpod_flash import remote, LiveServerless, GpuGroup + + config = LiveServerless( + name="inference-worker", + gpus=[GpuGroup.ADA_24], + workersMax=3, + ) + + @remote(resource_config=config, dependencies=["torch"]) + def run_inference(prompt: str) -> dict: + import torch + # Your inference logic here + return {"result": "..."} + ``` + + [Learn more about building apps](/flash/build-app). + + + + Start a local development server to test your application: + + ```bash + flash run + ``` + + Your FastAPI app runs locally and updates automatically, while `@remote` functions execute on real Runpod workers. This hybrid architecture lets you iterate quickly without deploying after every change. [Learn more about local testing](/flash/local-testing). + + + + When ready for production, deploy your application to Runpod Serverless: + + ```bash + flash deploy + ``` + + Your entire application—including the FastAPI server and all worker functions—runs on Runpod infrastructure. [Learn more about deployment](/flash/deploy-apps). + + + + Use apps and environments to organize and manage your deployments across different stages (dev, staging, production). [Learn more about apps and environments](/flash/apps-and-environments). + + + +## Apps and environments + +Flash uses a two-level organizational structure to manage deployments: **apps** and **environments**. + +### What is a Flash app? + +A **Flash app** is a logical container for all resources related to a single project. Think of it as a namespace that groups together: + +- **Environments**: Different deployment stages (dev, staging, production). +- **Builds**: Versioned artifacts of your application code. +- **Configuration**: App-wide settings and metadata. + +Apps are created automatically when you first run `flash deploy`, or you can create them explicitly with `flash app create`. + +### What is an environment? + +An **environment** is an isolated deployment stage within an app. Each environment has its own: + +- **Deployed endpoints**: Serverless workers for your `@remote` functions. +- **Build version**: The specific code version running in this environment. +- **State**: Current deployment status (deploying, deployed, failed, etc.). + +Environments are completely independent—deploying to `dev` has no effect on `production`. You can create and manage environments with the `flash env` command. + +## Local vs production deployment + +Flash supports two modes of operation: + +### Local development (`flash run`) + +```mermaid +%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%% + +flowchart TB + subgraph Local ["YOUR MACHINE"] + FastAPI["FastAPI App
• Updates automatically
• localhost:8888"] + end + + subgraph Runpod ["RUNPOD SERVERLESS"] + Workers["Workers
• @remote functions
• live- prefix"] + end + + FastAPI -->|"HTTPS"| Workers + + style Local fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff + style Runpod fill:#1a1a2e,stroke:#22C55E,stroke-width:2px,color:#fff + style FastAPI fill:#5F4CFE,stroke:#5F4CFE,color:#fff + style Workers fill:#22C55E,stroke:#22C55E,color:#000 +``` + +**How it works:** +- FastAPI runs on your machine and updates automatically +- `@remote` functions run on Runpod workers +- Endpoints prefixed with `live-` for easy identification +- No authentication required for local testing +- Fast iteration on application logic + +### Production deployment (`flash deploy`) + +```mermaid +%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%% + +flowchart TB + Users(["USERS"]) + + subgraph Runpod ["RUNPOD SERVERLESS"] + Mothership["Mothership
• FastAPI app
• Public URL"] + Workers["Workers
• @remote functions"] + + Mothership -->|"internal"| Workers + end + + Users -->|"HTTPS (auth required)"| Mothership + + style Runpod fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff + style Users fill:#4D38F5,stroke:#4D38F5,color:#fff + style Mothership fill:#5F4CFE,stroke:#5F4CFE,color:#fff + style Workers fill:#22C55E,stroke:#22C55E,color:#000 +``` + +**How it works:** +- Entire application runs on Runpod Serverless +- FastAPI "mothership" endpoint orchestrates worker calls +- Public HTTPS URL with API key authentication +- Automatic scaling based on load +- Production-grade reliability and performance + +## Common workflows + +### Simple projects (single environment) + +For solo projects or simple applications: + +```bash +# Initialize and develop +flash init my-project +cd my-project + +# Test locally +flash run + +# Deploy to production (creates 'production' environment by default) +flash deploy +``` + +### Team projects (multiple environments) + +For team collaboration with dev, staging, and production stages: + +```bash +# Create environments +flash env create dev +flash env create staging +flash env create production + +# Development cycle +flash run # Test locally +flash deploy --env dev # Deploy to dev for integration testing +flash deploy --env staging # Deploy to staging for QA +flash deploy --env production # Deploy to production after approval +``` + +### Feature development + +For testing new features in isolation: + +```bash +# Create temporary feature environment +flash env create feature-new-model + +# Deploy and test +flash deploy --env feature-new-model + +# Clean up after merging +flash env delete feature-new-model +``` + +## Next steps + + + + Create a Flash app, test it locally, and deploy it to production. + + + Create boilerplate code for a new Flash project with `flash init`. + + + Use `flash run` for local development and testing. + + + Deploy your application to production with `flash deploy`. + + diff --git a/flash/cli/app.mdx b/flash/cli/app.mdx new file mode 100644 index 00000000..371ae7ca --- /dev/null +++ b/flash/cli/app.mdx @@ -0,0 +1,224 @@ +--- +title: "app" +sidebarTitle: "app" +--- + +Manage Flash applications. An app is the top-level container that groups your deployment environments, build artifacts, and configuration. + +```bash Command +flash app [OPTIONS] +``` + +## Subcommands + +| Subcommand | Description | +|------------|-------------| +| `list` | Show all apps in your account | +| `create` | Create a new app | +| `get` | Show details of an app | +| `delete` | Delete an app and all its resources | + +--- + +## app list + +Show all Flash apps under your account. + +```bash Command +flash app list +``` + +### Example + +```bash +flash app list +``` + +### Output + +```text +┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓ +┃ Name ┃ ID ┃ Environments ┃ Builds ┃ +┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩ +│ my-project │ app_abc123 │ dev, staging, prod │ build_1, build_2 │ +│ demo-api │ app_def456 │ production │ build_3 │ +│ ml-inference │ app_ghi789 │ dev, production │ build_4, build_5 │ +└────────────────┴──────────────────────┴─────────────────────────┴──────────────────┘ +``` + +--- + +## app create + +Create a new Flash app. + +```bash Command +flash app create +``` + +### Example + +```bash +flash app create my-project +``` + +### Arguments + + +Name for the new Flash app. Must be unique within your account. 
+ + +### Notes + +- App names must be unique within your account. +- Apps are namespaced to your account, so different users can have apps with the same name. + + + +Most users don't need to run `flash app create` explicitly. Apps are created automatically when you first run `flash deploy`. This command is primarily for CI/CD pipelines that need to pre-register apps before deployment. + + + +--- + +## app get + +Get detailed information about a Flash app. + +```bash Command +flash app get +``` + +### Example + +```bash +flash app get my-project +``` + +### Arguments + + +Name of the Flash app to inspect. + + +### Output + +```text +╭─────────────────────────────────╮ +│ Flash App: my-project │ +├─────────────────────────────────┤ +│ Name: my-project │ +│ ID: app_abc123 │ +│ Environments: 3 │ +│ Builds: 5 │ +╰─────────────────────────────────╯ + + Environments +┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓ +┃ Name ┃ ID ┃ State ┃ Active Build ┃ Created ┃ +┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩ +│ dev │ env_dev123 │ DEPLOYED│ build_xyz789 │ 2024-01-15 10:30 │ +│ staging │ env_stg456 │ DEPLOYED│ build_xyz789 │ 2024-01-16 14:20 │ +│ production │ env_prd789 │ DEPLOYED│ build_abc123 │ 2024-01-20 09:15 │ +└────────────┴────────────────────┴─────────┴──────────────────┴──────────────────┘ + + Builds +┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓ +┃ ID ┃ Status ┃ Created ┃ +┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩ +│ build_abc123 │ COMPLETED │ 2024-01-20 09:00 │ +│ build_xyz789 │ COMPLETED │ 2024-01-18 15:45 │ +│ build_def456 │ COMPLETED │ 2024-01-15 11:20 │ +└────────────────────┴──────────────────────────┴──────────────────┘ +``` + +--- + +## app delete + +Delete a Flash app and all its associated resources. + +```bash Command +flash app delete --app +``` + +### Example + +```bash +flash app delete --app my-project +``` + +### Flags + + +Name of the Flash app to delete. Required explicitly for safety. + + + + +Unlike other subcommands, `delete` requires the `--app` flag explicitly. This is a safety measure for destructive operations. + + + +### Process + +1. Shows app details and resources to be deleted. +2. Prompts for confirmation (required). +3. Deletes all environments and their resources. +4. Deletes all builds. +5. Deletes the app. + + + +This operation is irreversible. All environments, builds, endpoints, volumes, and configuration will be permanently deleted. 
+ + + +--- + +## App hierarchy + +A Flash app contains environments and builds: + +```text +Flash App (my-project) +│ +├── Environments +│ ├── dev +│ │ ├── Endpoints (ep1, ep2) +│ │ └── Volumes (vol1) +│ ├── staging +│ │ ├── Endpoints (ep1, ep2) +│ │ └── Volumes (vol1) +│ └── production +│ ├── Endpoints (ep1, ep2) +│ └── Volumes (vol1) +│ +└── Builds + ├── build_v1 (2024-01-15) + ├── build_v2 (2024-01-18) + └── build_v3 (2024-01-20) +``` + +## Auto-detection + +Flash CLI automatically detects the app name from your current directory: + +```bash +cd /path/to/my-project +flash deploy # Deploys to 'my-project' app +flash env list # Lists 'my-project' environments +``` + +Override with the `--app` flag: + +```bash +flash deploy --app other-project +flash env list --app other-project +``` + +## Related commands + +- [`flash env`](/flash/cli/env) - Manage environments within an app +- [`flash deploy`](/flash/cli/deploy) - Deploy to an app's environment +- [`flash init`](/flash/cli/init) - Create a new project diff --git a/flash/cli/build.mdx b/flash/cli/build.mdx new file mode 100644 index 00000000..fa125112 --- /dev/null +++ b/flash/cli/build.mdx @@ -0,0 +1,184 @@ +--- +title: "build" +sidebarTitle: "build" +--- + +Build a deployment-ready artifact for your Flash application without deploying. Use this for more control over the build process or to inspect the artifact before deploying. + +```bash +flash build [OPTIONS] +``` + +## Example + +Build with all dependencies: + +```bash +flash build +``` + +Build and launch local preview environment: + +```bash +flash build --preview +``` + +Build with excluded packages (for smaller deployment size): + +```bash +flash build --exclude torch,torchvision,torchaudio +``` + +Keep the build directory for inspection: + +```bash +flash build --keep-build +``` + +## Flags + + +Skip transitive dependencies during pip install. Only installs direct dependencies specified in `@remote` decorators. Useful when the base image already includes dependencies. + + + +Keep the `.flash/.build` directory after creating the archive. Useful for debugging build issues or inspecting generated files. + + + +Custom name for the output archive file. + + + +Comma-separated list of packages to exclude from the build (e.g., `torch,torchvision`). Use this to skip packages already in the base image. + + + +Launch a local Docker-based test environment after building. Automatically enables `--keep-build`. + + +## What happens during build + +1. **Function discovery**: Finds all `@remote` decorated functions. +2. **Grouping**: Groups functions by their `resource_config`. +3. **Manifest generation**: Creates `.flash/flash_manifest.json` with endpoint definitions. +4. **Dependency installation**: Installs Python packages for Linux x86_64. +5. **Packaging**: Bundles everything into `.flash/artifact.tar.gz`. + +## Build artifacts + +After running `flash build`: + +| File/Directory | Description | +|----------------|-------------| +| `.flash/artifact.tar.gz` | Deployment package ready for Runpod | +| `.flash/flash_manifest.json` | Service discovery configuration | +| `.flash/.build/` | Temporary build directory (removed unless `--keep-build`) | + +## Cross-platform builds + +Flash automatically handles cross-platform builds: + +- **Automatic platform targeting**: Dependencies are installed for Linux x86_64, regardless of your build platform. +- **Python version matching**: Uses your current Python version for package compatibility. 
+- **Binary wheel enforcement**: Only pre-built wheels are used, preventing compilation issues. + +You can build on macOS, Windows, or Linux, and the deployment will work on Runpod. + +## Managing deployment size + +Runpod Serverless has a **500MB deployment limit**. Use `--exclude` to skip packages already in your base image: + +```bash +# For GPU deployments (PyTorch pre-installed) +flash build --exclude torch,torchvision,torchaudio +``` + +### Base image reference + +| Resource type | Base image | Safe to exclude | +|--------------|------------|-----------------| +| GPU | PyTorch base | `torch`, `torchvision`, `torchaudio` | +| CPU | Python slim | Do not exclude ML packages | + + + +Check the [worker-flash repository](https://github.com/runpod-workers/worker-flash) for current base images and pre-installed packages. + + + +## Preview environment + +Test your deployment locally before pushing to Runpod: + +```bash +flash build --preview +``` + +This: + +1. Builds your project (creates archive and manifest). +2. Creates a Docker network for inter-container communication. +3. Starts one container per resource config (mothership + workers). +4. Exposes the mothership on `localhost:8000`. +5. On shutdown (`Ctrl+C`), stops and removes all containers. + +### When to use preview + +- Test deployment configuration before production. +- Validate manifest structure. +- Debug resource provisioning. +- Verify cross-endpoint function calls. + +## Troubleshooting + +### Build fails with "functions not found" + +Ensure your project has `@remote` decorated functions: + +```python +from runpod_flash import remote, LiveServerless + +config = LiveServerless(name="my-worker") + +@remote(resource_config=config) +def my_function(data): + return {"result": data} +``` + +### Archive is too large + +Use `--exclude` or `--no-deps`: + +```bash +flash build --exclude torch,torchvision,torchaudio +``` + +### Dependency installation fails + +If a package doesn't have Linux x86_64 wheels: + +1. Ensure standard pip is installed: `python -m ensurepip --upgrade` +2. Check PyPI for Linux wheel availability. +3. For Python 3.13+, some packages may require newer manylinux versions. + +### Need to examine generated files + +Use `--keep-build`: + +```bash +flash build --keep-build +ls .flash/.build/ +``` + +## Related commands + +- [`flash deploy`](/flash/cli/deploy) - Build and deploy in one step +- [`flash run`](/flash/cli/run) - Start development server +- [`flash env`](/flash/cli/env) - Manage environments + + + +Most users should use `flash deploy` instead, which runs build and deploy in one step. Use `flash build` when you need more control or want to inspect the artifact. + + diff --git a/flash/cli/deploy.mdx b/flash/cli/deploy.mdx new file mode 100644 index 00000000..00ee5544 --- /dev/null +++ b/flash/cli/deploy.mdx @@ -0,0 +1,247 @@ +--- +title: "deploy" +sidebarTitle: "deploy" +--- + +Build and deploy your Flash application to Runpod Serverless endpoints in one step. This is the primary command for getting your application running in the cloud. 
+ +```bash +flash deploy [OPTIONS] +``` + +## Example + +Build and deploy (auto-selects environment if only one exists): + +```bash +flash deploy +``` + +Deploy to a specific environment: + +```bash +flash deploy --env production +``` + +Deploy with excluded packages to reduce size: + +```bash +flash deploy --exclude torch,torchvision,torchaudio +``` + +Build and test locally before deploying: + +```bash +flash deploy --preview +``` + +## Flags + + +Target environment name (e.g., `dev`, `staging`, `production`). Auto-selected if only one exists. Creates the environment if it doesn't exist. + + + +Flash app name. Auto-detected from the current directory if not specified. + + + +Skip transitive dependencies during pip install. Useful when the base image already includes dependencies. + + + +Comma-separated packages to exclude (e.g., `torch,torchvision`). Use this to stay under the 500MB deployment limit. + + + +Custom archive name for the build artifact. + + + +Build and launch a local Docker-based preview environment instead of deploying to Runpod. + + + +Bundle local `runpod_flash` source instead of the PyPI version. For development and testing only. + + +## What happens during deployment + +1. **Build phase**: Creates the deployment artifact (same as `flash build`). +2. **Environment resolution**: Detects or creates the target environment. +3. **Upload**: Sends the artifact to Runpod storage. +4. **Provisioning**: Creates or updates Serverless endpoints. +5. **Configuration**: Sets up environment variables and service discovery. +6. **Verification**: Confirms endpoints are healthy. + +## Architecture + +After deployment, your entire application runs on Runpod Serverless: + +
+```mermaid +%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%% + +flowchart TB + Users(["USERS"]) + + subgraph Runpod ["RUNPOD SERVERLESS"] + Mothership["MOTHERSHIP ENDPOINT
(your FastAPI app from main.py)
• Your HTTP routes
• Orchestrates @remote calls
• Public URL for users"] + GPU["gpu-worker
(your @remote function)"] + CPU["cpu-worker
(your @remote function)"] + + Mothership -->|"internal"| GPU + Mothership -->|"internal"| CPU + end + + Users -->|"HTTPS (authenticated)"| Mothership + + style Runpod fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff + style Users fill:#4D38F5,stroke:#4D38F5,color:#fff + style Mothership fill:#5F4CFE,stroke:#5F4CFE,color:#fff + style GPU fill:#22C55E,stroke:#22C55E,color:#000 + style CPU fill:#22C55E,stroke:#22C55E,color:#000 +``` +
+ +## Environment management + +### Automatic creation + +If the specified environment doesn't exist, `flash deploy` creates it: + +```bash +# Creates 'staging' if it doesn't exist +flash deploy --env staging +``` + +### Auto-selection + +When you have only one environment, it's selected automatically: + +```bash +# Auto-selects the only available environment +flash deploy +``` + +When multiple environments exist, you must specify one: + +```bash +# Required when multiple environments exist +flash deploy --env staging +``` + +### Default environment + +If no environment exists and none is specified, Flash creates a `production` environment by default. + +## Post-deployment + +After successful deployment, Flash displays: + +```text +✓ Deployment Complete + +Your mothership is deployed at: +https://api-xxxxx.runpod.net + +Available Routes: +POST /api/hello +POST /gpu/process + +All endpoints require authentication: +curl -X POST https://api-xxxxx.runpod.net/api/hello \ + -H "Authorization: Bearer $RUNPOD_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{"param": "value"}' +``` + +### Authentication + +All deployed endpoints require authentication with your Runpod API key: + +```bash +export RUNPOD_API_KEY="your_key_here" + +curl -X POST https://YOUR_ENDPOINT_URL/path \ + -H "Authorization: Bearer $RUNPOD_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{"param": "value"}' +``` + +## Preview mode + +Test locally before deploying: + +```bash +flash deploy --preview +``` + +This builds your project and runs it in Docker containers locally: + +- Mothership exposed on `localhost:8000`. +- All containers communicate via Docker network. +- Press `Ctrl+C` to stop. + +## Managing deployment size + +Runpod Serverless has a **500MB limit**. Use `--exclude` to skip packages in the base image: + +```bash +# GPU deployments (PyTorch pre-installed) +flash deploy --exclude torch,torchvision,torchaudio +``` + +| Resource type | Safe to exclude | +|--------------|-----------------| +| GPU | `torch`, `torchvision`, `torchaudio` | +| CPU | Do not exclude ML packages | + +## flash run vs flash deploy + +| Aspect | `flash run` | `flash deploy` | +|--------|-------------|----------------| +| FastAPI app runs on | Your machine | Runpod Serverless | +| `@remote` functions run on | Runpod Serverless | Runpod Serverless | +| Endpoint naming | `live-` prefix | No prefix | +| Automatic updates | Yes | No | +| Use case | Development | Production | + +## Troubleshooting + +### Multiple environments error + +```text +Error: Multiple environments found: dev, staging, production +``` + +Specify the target environment: + +```bash +flash deploy --env staging +``` + +### Deployment size limit + +Use `--exclude` to reduce size: + +```bash +flash deploy --exclude torch,torchvision,torchaudio +``` + +### Authentication fails + +Ensure your API key is set: + +```bash +echo $RUNPOD_API_KEY +export RUNPOD_API_KEY="your_key_here" +``` + +## Related commands + +- [`flash build`](/flash/cli/build) - Build without deploying +- [`flash run`](/flash/cli/run) - Local development server +- [`flash env`](/flash/cli/env) - Manage environments +- [`flash app`](/flash/cli/app) - Manage applications +- [`flash undeploy`](/flash/cli/undeploy) - Remove endpoints diff --git a/flash/cli/env.mdx b/flash/cli/env.mdx new file mode 100644 index 00000000..00215404 --- /dev/null +++ b/flash/cli/env.mdx @@ -0,0 +1,255 @@ +--- +title: "env" +sidebarTitle: "env" +--- + +Manage deployment environments for Flash applications. 
Environments are isolated deployment contexts (like `dev`, `staging`, `production`) within a Flash app. + +```bash Command +flash env [OPTIONS] +``` + +## Subcommands + +| Subcommand | Description | +|------------|-------------| +| `list` | Show all environments for an app | +| `create` | Create a new environment | +| `get` | Show details of an environment | +| `delete` | Delete an environment and its resources | + +--- + +## env list + +Show all available environments for an app. + +```bash Command +flash env list [OPTIONS] +``` + +### Example + +```bash +# List environments for current app +flash env list + +# List environments for specific app +flash env list --app my-project +``` + +### Flags + + +Flash app name. Auto-detected from current directory if not specified. + + +### Output + +```text +┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓ +┃ Name ┃ ID ┃ Active Build ┃ Created At ┃ +┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩ +│ dev │ env_abc123 │ build_xyz789 │ 2024-01-15 10:30 │ +│ staging │ env_def456 │ build_uvw456 │ 2024-01-16 14:20 │ +│ production │ env_ghi789 │ build_rst123 │ 2024-01-20 09:15 │ +└────────────┴─────────────────────┴───────────────────┴──────────────────┘ +``` + +--- + +## env create + +Create a new deployment environment. + +```bash Command +flash env create [OPTIONS] +``` + +### Example + +```bash +# Create staging environment +flash env create staging + +# Create environment in specific app +flash env create production --app my-project +``` + +### Arguments + + +Name for the new environment (e.g., `dev`, `staging`, `production`). + + +### Flags + + +Flash app name. Auto-detected from current directory if not specified. + + +### Notes + +- If the app doesn't exist, it's created automatically. +- Environment names must be unique within an app. +- Newly created environments have no active build until first deployment. + + + +You don't always need to create environments explicitly. Running `flash deploy --env ` creates the environment automatically if it doesn't exist. + + + +--- + +## env get + +Show detailed information about a deployment environment. + +```bash Command +flash env get [OPTIONS] +``` + +### Example + +```bash +# Get details for production environment +flash env get production + +# Get details for specific app's environment +flash env get staging --app my-project +``` + +### Arguments + + +Name of the environment to inspect. + + +### Flags + + +Flash app name. Auto-detected from current directory if not specified. + + +### Output + +```text +╭────────────────────────────────────╮ +│ Environment: production │ +├────────────────────────────────────┤ +│ ID: env_ghi789 │ +│ State: DEPLOYED │ +│ Active Build: build_rst123 │ +│ Created: 2024-01-20 09:15:00 │ +╰────────────────────────────────────╯ + + Associated Endpoints +┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓ +┃ Name ┃ ID ┃ +┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩ +│ my-gpu │ ep_abc123 │ +│ my-cpu │ ep_def456 │ +└────────────────┴────────────────────┘ + + Associated Network Volumes +┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓ +┃ Name ┃ ID ┃ +┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩ +│ model-cache │ nv_xyz789 │ +└────────────────┴────────────────────┘ +``` + +--- + +## env delete + +Delete a deployment environment and all its associated resources. 
+ +```bash Command +flash env delete [OPTIONS] +``` + +### Example + +```bash +# Delete development environment +flash env delete dev + +# Delete environment in specific app +flash env delete staging --app my-project +``` + +### Arguments + + +Name of the environment to delete. + + +### Flags + + +Flash app name. Auto-detected from current directory if not specified. + + +### Process + +1. Shows environment details and resources to be deleted. +2. Prompts for confirmation (required). +3. Undeploys all associated endpoints. +4. Removes all associated network volumes. +5. Deletes the environment from the app. + + + +This operation is irreversible. All endpoints, volumes, and configuration associated with the environment will be permanently deleted. + + + +--- + +## Environment states + +| State | Description | +|-------|-------------| +| PENDING | Environment created but not deployed | +| DEPLOYING | Deployment in progress | +| DEPLOYED | Successfully deployed and running | +| FAILED | Deployment or health check failed | +| DELETING | Deletion in progress | + +## Common workflows + +### Three-tier deployment + +```bash +# Create environments +flash env create dev +flash env create staging +flash env create production + +# Deploy to each +flash deploy --env dev +flash deploy --env staging +flash deploy --env production +``` + +### Feature branch testing + +```bash +# Create feature environment +flash env create feature-auth + +# Deploy feature branch +git checkout feature-auth +flash deploy --env feature-auth + +# Clean up after merge +flash env delete feature-auth +``` + +## Related commands + +- [`flash deploy`](/flash/cli/deploy) - Deploy to an environment +- [`flash app`](/flash/cli/app) - Manage applications +- [`flash undeploy`](/flash/cli/undeploy) - Remove specific endpoints diff --git a/flash/cli/init.mdx b/flash/cli/init.mdx new file mode 100644 index 00000000..6fcf1511 --- /dev/null +++ b/flash/cli/init.mdx @@ -0,0 +1,89 @@ +--- +title: "init" +sidebarTitle: "init" +--- + +Create a new Flash project with a ready-to-use template structure including a FastAPI server, example GPU and CPU workers, and configuration files. + +```bash +flash init [PROJECT_NAME] [OPTIONS] +``` + +## Example + +Create a new project directory: + +```bash +flash init my-project +cd my-project +pip install -r requirements.txt +flash run +``` + +Initialize in the current directory: + +```bash +flash init . +``` + +## Arguments + + +Name of the project directory to create. If omitted or set to `.`, initializes in the current directory. + + +## Flags + + +Overwrite existing files if they already exist in the target directory. + + +## What it creates + +The command creates the following project structure: + +```text +my-project/ +├── main.py # FastAPI application entry point +├── workers/ +│ ├── gpu/ # GPU worker example +│ │ ├── __init__.py +│ │ └── endpoint.py +│ └── cpu/ # CPU worker example +│ ├── __init__.py +│ └── endpoint.py +├── .env # Environment variables template +├── .gitignore # Git ignore patterns +├── .flashignore # Flash deployment ignore patterns +├── requirements.txt # Python dependencies +└── README.md # Project documentation +``` + +### Template contents + +- **main.py**: FastAPI application that imports routers from the `workers/` directory. +- **workers/gpu/endpoint.py**: Example GPU worker with a `@remote` decorated function using `LiveServerless`. +- **workers/cpu/endpoint.py**: Example CPU worker with a `@remote` decorated function using CPU configuration. 
+- **.env**: Template for environment variables including `RUNPOD_API_KEY`. + +## Next steps + +After initialization: + +1. Copy `.env.example` to `.env` (if needed) and add your `RUNPOD_API_KEY`. +2. Install dependencies: `pip install -r requirements.txt` +3. Start the development server: `flash run` +4. Open http://localhost:8888/docs to explore the API. +5. Customize the workers for your use case. +6. Deploy with `flash deploy` when ready. + + + +This command only creates local files. It doesn't interact with Runpod or create any cloud resources. Cloud resources are created when you run `flash run` or `flash deploy`. + + + +## Related commands + +- [`flash run`](/flash/cli/run) - Start the development server +- [`flash deploy`](/flash/cli/deploy) - Build and deploy to Runpod diff --git a/flash/cli/overview.mdx b/flash/cli/overview.mdx new file mode 100644 index 00000000..aa44caba --- /dev/null +++ b/flash/cli/overview.mdx @@ -0,0 +1,121 @@ +--- +title: "CLI overview" +sidebarTitle: "Overview" +description: "Learn how to use the Flash CLI for local development and deployment." +--- + +The Flash CLI provides commands for initializing projects, running local development servers, building deployment artifacts, and managing your applications on Runpod Serverless. + +## Install Flash + +Create a Python virtual environment and install Flash using pip: + +```bash +python3 -m venv venv +source venv/bin/activate +pip install runpod-flash +``` + +## Configure your API key + +Flash requires a Runpod API key to provision and manage Serverless endpoints. Create a `.env` file in your project directory: + +```bash +echo "RUNPOD_API_KEY=your_api_key_here" > .env +``` + +You can also set the API key as an environment variable: + + + +```bash +export RUNPOD_API_KEY=your_api_key_here +``` + + +```bash +set RUNPOD_API_KEY=your_api_key_here +``` + + + +## Available commands + +| Command | Description | +|---------|-------------| +| [`flash init`](/flash/cli/init) | Create a new Flash project with a template structure | +| [`flash run`](/flash/cli/run) | Start the local development server with automatic updates | +| [`flash build`](/flash/cli/build) | Build a deployment artifact without deploying | +| [`flash deploy`](/flash/cli/deploy) | Build and deploy your application to Runpod | +| [`flash env`](/flash/cli/env) | Manage deployment environments | +| [`flash app`](/flash/cli/app) | Manage Flash applications | +| [`flash undeploy`](/flash/cli/undeploy) | Remove deployed endpoints | + +## Getting help + +View help for any command by adding `--help`: + +```bash +flash --help +flash deploy --help +flash env --help +``` + +## Common workflows + +### Local development + +```bash +# Create a new project +flash init my-project +cd my-project + +# Install dependencies +pip install -r requirements.txt + +# Add your API key to .env +# Start the development server +flash run +``` + +### Deploy to production + +```bash +# Build and deploy +flash deploy + +# Deploy to a specific environment +flash deploy --env production +``` + +### Manage deployments + +```bash +# List environments +flash env list + +# Check environment status +flash env get production + +# Remove an environment +flash env delete staging +``` + +### Clean up endpoints + +```bash +# List deployed endpoints +flash undeploy list + +# Remove specific endpoint +flash undeploy my-api + +# Remove all endpoints +flash undeploy --all +``` + +## Next steps + +- [Create a project](/flash/cli/init) with `flash init`. 
+- [Start developing](/flash/cli/run) with `flash run`. +- [Deploy your app](/flash/cli/deploy) with `flash deploy`. diff --git a/flash/cli/run.mdx b/flash/cli/run.mdx new file mode 100644 index 00000000..4dab9e6c --- /dev/null +++ b/flash/cli/run.mdx @@ -0,0 +1,156 @@ +--- +title: "run" +sidebarTitle: "run" +--- + +Start the Flash development server for local testing with automatic updates. Your FastAPI app runs locally while `@remote` functions execute on Runpod Serverless. + +```bash +flash run [OPTIONS] +``` + +## Example + +Start the development server with defaults: + +```bash +flash run +``` + +Start with auto-provisioning to eliminate cold-start delays: + +```bash +flash run --auto-provision +``` + +Start on a custom port: + +```bash +flash run --port 3000 +``` + +## Flags + + +Host address to bind the server to. + + + +Port number to bind the server to. + + + +Enable or disable auto-reload on code changes. Enabled by default. + + + +Auto-provision all Serverless endpoints on startup instead of lazily on first call. Eliminates cold-start delays during development. + + +## Architecture + +With `flash run`, your system runs in a hybrid architecture: + +```mermaid +%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%% + +flowchart TB + subgraph Local ["YOUR MACHINE (localhost:8888)"] + FastAPI["FastAPI App (main.py)
• Your HTTP routes
• Orchestrates @remote calls
• Updates automatically"] + end + + subgraph Runpod ["RUNPOD SERVERLESS"] + GPU["live-gpu-worker
(your @remote function)"] + CPU["live-cpu-worker
(your @remote function)"] + end + + FastAPI -->|"HTTPS"| GPU + FastAPI -->|"HTTPS"| CPU + + style Local fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff + style Runpod fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff + style FastAPI fill:#5F4CFE,stroke:#5F4CFE,color:#fff + style GPU fill:#22C55E,stroke:#22C55E,color:#000 + style CPU fill:#22C55E,stroke:#22C55E,color:#000 +``` + +**Key points:** + +- Your FastAPI app runs locally and updates automatically for rapid iteration. +- `@remote` functions run on Runpod as Serverless endpoints. +- Endpoints are prefixed with `live-` to distinguish from production. +- Changes to local code are picked up instantly. + +This is different from `flash deploy`, where everything runs on Runpod. + +## Auto-provisioning + +By default, endpoints are provisioned lazily on first `@remote` function call. Use `--auto-provision` to provision all endpoints at server startup: + +```bash +flash run --auto-provision +``` + +### How it works + +1. **Discovery**: Scans your app for `@remote` decorated functions. +2. **Deployment**: Deploys resources concurrently (up to 3 at a time). +3. **Confirmation**: Asks for confirmation if deploying more than 5 endpoints. +4. **Caching**: Stores deployed resources in `.runpod/resources.pkl` for reuse. +5. **Updates**: Recognizes existing endpoints and updates if configuration changed. + +### Benefits + +- **Zero cold start**: All endpoints ready before you test them. +- **Faster development**: No waiting for deployment on first HTTP call. +- **Resource reuse**: Cached endpoints are reused across server restarts. + +### When to use + +- Local development with multiple endpoints. +- Testing workflows that call multiple remote functions. +- Debugging where you want deployment separated from handler logic. + +## Provisioning modes + +| Mode | When endpoints are deployed | +|------|----------------------------| +| Default (lazy) | On first `@remote` function call | +| `--auto-provision` | At server startup | + +## Testing your API + +Once the server is running, test your endpoints: + +```bash +# Health check +curl http://localhost:8888/ + +# Call a GPU endpoint +curl -X POST http://localhost:8888/gpu/hello \ + -H "Content-Type: application/json" \ + -d '{"message": "Hello from GPU!"}' +``` + +Open http://localhost:8888/docs for the interactive API explorer. + +## Requirements + +- `RUNPOD_API_KEY` must be set in your `.env` file or environment. +- A valid Flash project structure (created by `flash init` or manually). + +## flash run vs flash deploy + +| Aspect | `flash run` | `flash deploy` | +|--------|-------------|----------------| +| FastAPI app runs on | Your machine (localhost) | Runpod Serverless | +| `@remote` functions run on | Runpod Serverless | Runpod Serverless | +| Endpoint naming | `live-` prefix | No prefix | +| Automatic updates | Yes | No | +| Use case | Development | Production | + +## Related commands + +- [`flash init`](/flash/cli/init) - Create a new project +- [`flash deploy`](/flash/cli/deploy) - Deploy to production +- [`flash undeploy`](/flash/cli/undeploy) - Remove endpoints diff --git a/flash/cli/undeploy.mdx b/flash/cli/undeploy.mdx new file mode 100644 index 00000000..870e4ad1 --- /dev/null +++ b/flash/cli/undeploy.mdx @@ -0,0 +1,213 @@ +--- +title: "undeploy" +sidebarTitle: "undeploy" +--- + +Manage and delete Runpod Serverless endpoints deployed via Flash. Use this command to clean up endpoints created during local development with `flash run`. 
+ +```bash +flash undeploy [NAME|list] [OPTIONS] +``` + +## Example + +List all tracked endpoints: + +```bash +flash undeploy list +``` + +Remove a specific endpoint: + +```bash +flash undeploy my-api +``` + +Remove all endpoints: + +```bash +flash undeploy --all +``` + +## Usage modes + +### List endpoints + +Display all tracked endpoints with their current status: + +```bash +flash undeploy list +``` + +Output includes: + +- **Name**: Endpoint name +- **Endpoint ID**: Runpod endpoint identifier +- **Status**: Current health status (Active/Inactive/Unknown) +- **Type**: Resource type (Live Serverless, Cpu Live Serverless, etc.) + +**Status indicators:** + +| Status | Meaning | +|--------|---------| +| Active | Endpoint is running and responding | +| Inactive | Tracking exists but endpoint deleted externally | +| Unknown | Error during health check | + +### Undeploy by name + +Delete a specific endpoint: + +```bash +flash undeploy my-api +``` + +This: + +1. Searches for endpoints matching the name. +2. Shows endpoint details. +3. Prompts for confirmation. +4. Deletes the endpoint from Runpod. +5. Removes from local tracking. + +### Undeploy all + +Delete all tracked endpoints (requires double confirmation): + +```bash +flash undeploy --all +``` + +Safety features: + +1. Shows total count of endpoints. +2. First confirmation: Yes/No prompt. +3. Second confirmation: Type "DELETE ALL" exactly. +4. Deletes all endpoints from Runpod. +5. Removes all from tracking. + +### Interactive selection + +Select endpoints to undeploy using checkboxes: + +```bash +flash undeploy --interactive +``` + +Use arrow keys to navigate, space bar to select/deselect, and Enter to confirm. + +### Clean up stale tracking + +Remove inactive endpoints from tracking without API deletion: + +```bash +flash undeploy --cleanup-stale +``` + +Use this when endpoints were deleted via the Runpod console or API (not through Flash). The local tracking file (`.runpod/resources.pkl`) becomes stale, and this command cleans it up. + +## Flags + + +Undeploy all tracked endpoints. Requires double confirmation for safety. + + + +Interactive checkbox selection mode. Select multiple endpoints to undeploy. + + + +Remove inactive endpoints from local tracking without attempting API deletion. Use when endpoints were deleted externally. + + +## Arguments + + +Name of the endpoint to undeploy. Use `list` to show all endpoints. + + +## undeploy vs env delete + +| Command | Scope | When to use | +|---------|-------|-------------| +| `flash undeploy` | Individual endpoints from local tracking | Development cleanup, granular control | +| `flash env delete` | Entire environment + all resources | Production cleanup, full teardown | + +For production deployments, use `flash env delete` to remove entire environments and all associated resources. + +## How tracking works + +Flash tracks deployed endpoints in `.runpod/resources.pkl`. Endpoints are added when you: + +- Run `flash run --auto-provision` +- Run `flash run` and call `@remote` functions +- Run `flash deploy` + +The tracking file is in `.gitignore` and should never be committed. It contains local deployment state. 
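If you need to see exactly what Flash has recorded, you can inspect the tracking file directly. The following is a minimal sketch that assumes `.runpod/resources.pkl` is a standard pickle file; its payload structure is internal to Flash and may change between versions:

```python
# Sketch only: inspect Flash's local endpoint tracking.
# Assumes .runpod/resources.pkl is a plain pickle; the payload
# structure is internal to Flash and not a stable interface.
import pickle
from pathlib import Path

tracking_file = Path(".runpod/resources.pkl")
if tracking_file.exists():
    with tracking_file.open("rb") as f:
        resources = pickle.load(f)
    print(resources)  # What Flash currently believes is deployed
else:
    print("No tracked endpoints yet. Run `flash run` or `flash deploy` first.")
```

For day-to-day use, `flash undeploy list` is the supported way to view this state.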
## Common workflows

### Basic cleanup

```bash
# Check what's deployed
flash undeploy list

# Remove a specific endpoint
flash undeploy my-api

# Clean up stale tracking
flash undeploy --cleanup-stale
```

### Bulk operations

```bash
# Undeploy all endpoints
flash undeploy --all

# Interactive selection
flash undeploy --interactive
```

### Managing external deletions

If you delete endpoints via the Runpod console:

```bash
# Check status - will show as "Inactive"
flash undeploy list

# Remove stale tracking entries
flash undeploy --cleanup-stale
```

## Troubleshooting

### Endpoint shows as "Inactive"

The endpoint was deleted via the Runpod console or API. Clean up:

```bash
flash undeploy --cleanup-stale
```

### Can't find endpoint by name

Check the exact name:

```bash
flash undeploy list
```

### Undeploy fails with API error

1. Check `RUNPOD_API_KEY` in `.env`.
2. Verify network connectivity.
3. Check if the endpoint still exists on Runpod.

## Related commands

- [`flash run`](/flash/cli/run) - Development server (creates endpoints)
- [`flash deploy`](/flash/cli/deploy) - Deploy to Runpod
- [`flash env delete`](/flash/cli/env) - Delete entire environment
diff --git a/flash/deploy-apps.mdx b/flash/deploy-apps.mdx
index ff62e7ac..8497fe28 100644
--- a/flash/deploy-apps.mdx
+++ b/flash/deploy-apps.mdx
@@ -1,175 +1,227 @@
---
-title: "Build and deploy Flash apps"
-sidebarTitle: "Deploy Flash apps"
-description: "Package and deploy Flash applications for production with `flash build`."
+title: "Deploy Flash apps to Runpod"
+sidebarTitle: "Deploy to Runpod"
+description: "Build and deploy your FastAPI app to Runpod."
tag: "BETA"
---

-Flash uses a build process to package your application for deployment. This page covers how the build process works, including handler generation, cross-platform builds, and troubleshooting common issues.
+Flash provides a complete deployment workflow for taking your local development project to production. Use `flash deploy` to build and deploy your application in a single command, or use `flash build` for more control over the build process.

-## Build process and handler generation
-
-When you run `flash build`, the following happens:
+## Deployment workflow

-1. **Discovery**: Flash scans your code for `@remote` decorated functions.
-2. **Grouping**: Functions are grouped by their `resource_config`.
-3. **Handler generation**: For each resource config, Flash generates a lightweight handler file.
-4. **Manifest creation**: A `flash_manifest.json` file maps functions to their endpoints.
-5. **Dependency installation**: Python packages are installed with Linux `x86_64` compatibility.
-6. **Packaging**: Everything is bundled into `archive.tar.gz` for deployment.
+A typical deployment workflow looks like this (the command sketch below walks through the same steps):

-### Handler architecture
+1. **Create a new project**: Use [`flash init`](/flash/cli/init) to create a new project.
+2. **Develop locally**: Use [`flash run`](/flash/cli/run) to test your application. Functions decorated with `@remote` run on Runpod Serverless workers.
+3. **Preview** (optional): Use [`flash deploy --preview`](/flash/cli/deploy) to test locally with Docker.
+4. **Deploy**: Use [`flash deploy`](/flash/cli/deploy) to push to Runpod Serverless.
+5. **Manage**: Use [`flash env`](/flash/cli/env) and [`flash app`](/flash/cli/app) to manage your deployments.
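+
+End to end, the workflow maps to a short sequence of CLI commands. This is a sketch: the project and environment names are examples, and each command is documented on the linked pages:
+
+```bash
+# 1. Scaffold a new project
+flash init my-project
+cd my-project
+
+# 2. Iterate locally: FastAPI runs on your machine,
+#    while @remote functions run on Runpod Serverless
+flash run
+
+# 3. Optional: rehearse the production layout in local Docker containers
+flash deploy --preview
+
+# 4. Ship to Runpod Serverless
+flash deploy --env production
+
+# 5. Check what's deployed
+flash env list
+```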
-Flash uses a factory pattern for handlers to eliminate code duplication: +## Deploy your application -```python -# Generated handler (handler_gpu_config.py) -from runpod_flash.runtime.generic_handler import create_handler -from workers.gpu import process_data +When you're satisfied with your `@remote` functions and ready to move to production, use `flash deploy` to build and deploy your Flash application: + +```bash +flash deploy +``` + +This command performs the following steps: + +1. **Build**: Packages your code, dependencies, and manifest. +2. **Upload**: Sends the artifact to Runpod's storage. +3. **Provision**: Creates or updates Serverless endpoints. +4. **Configure**: Sets up environment variables and service discovery. +5. **Verify**: Confirms endpoints are healthy. + +### Deployment architecture -FUNCTION_REGISTRY = { - "process_data": process_data, -} +After deployment, your entire application runs on Runpod Serverless: -handler = create_handler(FUNCTION_REGISTRY) +```mermaid +%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%% + +flowchart TB + Users(["USERS"]) + + subgraph Runpod ["RUNPOD SERVERLESS"] + Mothership["MOTHERSHIP ENDPOINT
(your FastAPI app from main.py)
• Your HTTP routes
• Orchestrates @remote calls
• Public URL for users"] + GPU["gpu-worker
(your @remote function)"] + CPU["cpu-worker
(your @remote function)"] + + Mothership -->|"internal"| GPU + Mothership -->|"internal"| CPU + end + + Users -->|"HTTPS (authenticated)"| Mothership + + style Runpod fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff + style Users fill:#4D38F5,stroke:#4D38F5,color:#fff + style Mothership fill:#5F4CFE,stroke:#5F4CFE,color:#fff + style GPU fill:#22C55E,stroke:#22C55E,color:#000 + style CPU fill:#22C55E,stroke:#22C55E,color:#000 ``` -This approach provides: +### Deploy to an environment -- **Single source of truth**: All handler logic in one place. -- **Easier maintenance**: Bug fixes don't require rebuilding projects. +Flash organizes deployments using [apps and environments](/flash/apps-and-environments). Deploy to a specific environment using the `--env` flag: -## Cross-platform builds +```bash +# Deploy to staging +flash deploy --env staging -Flash automatically handles cross-platform builds, ensuring your deployments work correctly regardless of your development platform: +# Deploy to production +flash deploy --env production +``` -- **Automatic platform targeting**: Dependencies are installed for Linux `x86_64` (required for [Runpod Serverless](/serverless/overview)), even when building on macOS or Windows. -- **Python version matching**: The build uses your current Python version to ensure package compatibility. -- **Binary wheel enforcement**: Only pre-built binary wheels are used, preventing platform-specific compilation issues. +If the specified environment doesn't exist, Flash creates it automatically. -This means you can build on macOS ARM64, Windows, or any other platform, and the resulting package will run correctly on [Runpod Serverless](/serverless/overview). +### Post-deployment -## Cross-endpoint function calls +After a successful deployment, Flash displays: -Flash enables functions on different endpoints to call each other: +- The public URL for your application. +- Available routes from your `@remote` decorated functions. +- Instructions for authenticating requests. -```python -# CPU endpoint function -@remote(resource_config=cpu_config) -def preprocess(data): - # Preprocessing logic - return clean_data - -# GPU endpoint function -@remote(resource_config=gpu_config) -async def inference(data): - # Can call CPU endpoint function - clean = await preprocess(data) - # Run inference on clean data - return result +```text +✓ Deployment Complete + +Your mothership is deployed at: +https://api-xxxxx.runpod.net + +Available Routes: +POST /api/hello +POST /gpu/process + +All endpoints require authentication: +curl -X POST https://api-xxxxx.runpod.net/api/hello \ + -H "Authorization: Bearer $RUNPOD_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{"message": "Hello"}' ``` -The runtime automatically discovers endpoints and routes calls appropriately using the [`flash_manifest.json`](#build-artifacts) file generated during the build process. This lets you build pipelines that use CPU workers for preprocessing and GPU workers for inference, optimizing costs by using the appropriate hardware for each task. 
+## Preview before deploying + +Test your deployment locally using Docker before pushing to production: -## Build artifacts +```bash +flash deploy --preview +``` -After running `flash build`, you'll find these artifacts in the `.flash/` directory: +This command: -| Artifact | Description | -|----------|-------------| -| `.flash/.build/` | Temporary build directory (removed unless `--keep-build`) | -| `.flash/archive.tar.gz` | Deployment package | -| `.flash/flash_manifest.json` | Service discovery configuration | +1. Builds your project (creates the archive and manifest). +2. Creates a Docker network for inter-container communication. +3. Starts one container per resource config (mothership + workers). +4. Exposes the mothership on `localhost:8000`. + +Use preview mode to: -### Managing bundle size +- Validate your deployment configuration. +- Test cross-endpoint function calls. +- Debug resource provisioning issues. +- Verify the manifest structure. -Runpod Serverless has a **500MB deployment limit**. Exceeding this limit will cause your build to fail. +Press `Ctrl+C` to stop the preview environment. -Use `--exclude` to skip packages that are already included in your base worker image: +## Managing deployment size + +Runpod Serverless has a **500MB deployment limit**. If your deployment exceeds this limit, use the `--exclude` flag to skip packages already included in your base worker image: ```bash -# For GPU deployments (PyTorch pre-installed) -flash build --exclude torch,torchvision,torchaudio +# Exclude PyTorch packages (pre-installed in GPU images) +flash deploy --exclude torch,torchvision,torchaudio ``` -Which packages to exclude depends on your [resource config](/flash/resource-configuration): +### Base image packages + +Which packages to exclude depends on your resource configuration: -- **GPU resources** use PyTorch as the base image, which has `torch`, `torchvision`, and `torchaudio` pre-installed. -- **CPU resources** use Python slim images, which have no ML frameworks pre-installed. -- **Load-balancer** resources use the same base image as their GPU/CPU counterparts. +| Resource type | Base image | Pre-installed packages | +|--------------|------------|------------------------| +| GPU (`LiveServerless` with `gpus`) | PyTorch base | `torch`, `torchvision`, `torchaudio` | +| CPU (`LiveServerless` with `instanceIds`) | Python slim | None | +| Load-balanced | Same as GPU/CPU | Same as GPU/CPU | - You can find details about the Flash worker image in the [runpod-workers/flash](https://github.com/runpod-workers/flash) repository. Find the `Dockerfile` for your endpoint type: `Dockerfile` (for GPU workers), `Dockerfile-cpu` (for CPU workers), or `Dockerfile-lb` (for load balancing workers). + +Check the [worker-flash repository](https://github.com/runpod-workers/worker-flash) for current base images and pre-installed packages. + -## Troubleshooting +## Build process -### No @remote functions found +When you run `flash deploy` (or `flash build`), Flash: -If the build process can't find your remote functions: +1. **Discovers** all `@remote` decorated functions. +2. **Groups** functions by their `resource_config`. +3. **Generates** handler files for each resource config. +4. **Creates** a `flash_manifest.json` file for service discovery. +5. **Installs** dependencies with Linux x86_64 compatibility. +6. **Packages** everything into `.flash/artifact.tar.gz`. -- Ensure your functions are decorated with `@remote(resource_config)`. 
-- Check that Python files are not excluded by `.gitignore` or `.flashignore`. -- Verify function decorators have valid syntax. +### Cross-platform builds -### Handler generation failed +Flash automatically handles cross-platform builds. You can build on macOS, Windows, or Linux, and the resulting package will run correctly on Runpod's Linux x86_64 infrastructure. -If handler generation fails: +### Build artifacts -- Check for syntax errors in your Python files (they should be logged in the terminal). -- Verify all imports in your worker modules are available. -- Ensure resource config variables (e.g., `gpu_config`) are defined before a function references them. -- Use `--keep-build` to inspect generated handler files in `.flash/.build/`. +After building, these artifacts are created in the `.flash/` directory: -### Build succeeded but deployment failed +| Artifact | Description | +|----------|-------------| +| `.flash/artifact.tar.gz` | Deployment package | +| `.flash/flash_manifest.json` | Service discovery configuration | +| `.flash/.build/` | Temporary build directory (removed by default) | -If the build succeeds but deployment fails: +## Troubleshooting -- Verify all function imports work in the deployment environment. -- Check that environment variables required by your functions are available. -- Review the generated `flash_manifest.json` for correct function mappings. +### No @remote functions found -### Dependency installation failed +If the build process can't find your remote functions: -If dependency installation fails during the build: +- Ensure functions are decorated with `@remote(resource_config=...)`. +- Check that Python files aren't excluded by `.gitignore` or `.flashignore`. +- Verify decorator syntax is correct. -- If a package doesn't have pre-built Linux `x86_64`` wheels, the build will fail with an error. -- For newer Python versions (3.13+), some packages may require `manylinux_2_27`` or higher. -- Ensure you have standard pip installed (`python -m ensurepip --upgrade`) for best compatibility. -- Check PyPI to verify the package supports your Python version on Linux. +### Deployment size limit exceeded -### Authentication errors +If your deployment exceeds 500MB: + +```bash +# Exclude packages already in base image +flash deploy --exclude torch,torchvision,torchaudio +``` -If you're seeing authentication errors: +### Authentication errors Verify your API key is set correctly: ```bash -echo $RUNPOD_API_KEY # Should show your key +echo $RUNPOD_API_KEY +``` + +If not set, add it to your `.env` file or export it: + +```bash +export RUNPOD_API_KEY=your_api_key_here ``` ### Import errors in remote functions -Remember to import packages inside remote functions: +Import packages inside the remote function, not at the top of the file: ```python -@remote(dependencies=["requests"]) +@remote(resource_config=config, dependencies=["requests"]) def fetch_data(url): - import requests # Import here, not at top of file + import requests # Import here return requests.get(url).json() ``` -## Performance optimization - -To optimize performance: - -- Set `workersMin=1` to keep workers warm and avoid cold starts. -- Use `idleTimeout` to balance cost and responsiveness. -- Choose appropriate GPU types for your workload. -- Use `--auto-provision` with `flash run` to eliminate cold-start delays during development. - ## Next steps -- [View the resource configuration reference](/flash/resource-configuration) for all available options. 
+- [Learn about apps and environments](/flash/apps-and-environments) for managing deployments. +- [View the CLI reference](/flash/cli/overview) for all available commands. +- [Configure resources](/flash/resource-configuration) for your endpoints. - [Monitor and debug](/flash/monitoring) your deployments. -- [Learn about pricing](/flash/pricing) to optimize costs. diff --git a/flash/initialize-project.mdx b/flash/initialize-project.mdx new file mode 100644 index 00000000..b00d4ea9 --- /dev/null +++ b/flash/initialize-project.mdx @@ -0,0 +1,209 @@ +--- +title: "Initialize a Flash app project" +sidebarTitle: "Initialize a project" +description: "Use flash init to create a new Flash project with a ready-to-use structure." +tag: "BETA" +--- + +The `flash init` command creates a new Flash project with a complete project structure, including a FastAPI server, example GPU and CPU workers, and configuration files. This gives you a working starting point for building Flash applications. + +Use `flash init` whenever you want to start a new Flash project, fully configured for you to run `flash run` and `flash deploy`. + +## Create a new project + +Create a new project in a new directory: + +```bash +flash init my-project +cd my-project +``` + +Or initialize in your current directory: + +```bash +flash init . +``` + +## Project structure + +`flash init` creates the following structure: + +```text +my-project/ +├── main.py # FastAPI application entry point +├── mothership.py # Mothership endpoint configuration +├── workers/ +│ ├── gpu/ # GPU worker +│ │ ├── __init__.py +│ │ └── endpoint.py +│ └── cpu/ # CPU worker +│ ├── __init__.py +│ └── endpoint.py +├── .env.example # Environment variables template +├── .flashignore # Files to exclude from deployment +├── .gitignore # Git ignore patterns +├── pyproject.toml # Python project configuration +├── requirements.txt # Python dependencies +└── README.md # Project documentation +``` + +### Key files + +**main.py**: The FastAPI application that imports and registers your worker routers. + +**mothership.py**: Configuration for the "mothership" endpoint—the main entry point that orchestrates calls to your workers when deployed. + +**workers/gpu/endpoint.py**: An example GPU worker with a `@remote` decorated function. This is where you define functions that run on GPU workers. + +**workers/cpu/endpoint.py**: An example CPU worker for tasks that don't require GPU acceleration. + +**.flashignore**: Lists files and directories to exclude from the deployment artifact (similar to `.gitignore`). + +## Set up the project + +After initialization, complete the setup: + +```bash +# Install dependencies +pip install -r requirements.txt + +# Copy environment template +cp .env.example .env + +# Add your API key to .env +# RUNPOD_API_KEY=your_api_key_here +``` + +## How it fits into the workflow + +`flash init` is the first step in the Flash development workflow: + +```mermaid +%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%% + +flowchart LR + Init["flash init"] + Dev["flash run"] + Deploy["flash deploy"] + + Init -->|"Create project"| Dev + Dev -->|"Test locally"| Deploy + + style Init fill:#5F4CFE,stroke:#5F4CFE,color:#fff + style Dev fill:#22C55E,stroke:#22C55E,color:#000 + style Deploy fill:#4D38F5,stroke:#4D38F5,color:#fff +``` + +1. 
**`flash init`**: Creates the project structure and boilerplate.
2. **`flash run`**: Starts the local development server for testing.
3. **`flash deploy`**: Builds and deploys to Runpod Serverless.

## Customize your project

### Add a new GPU endpoint

To add a new GPU endpoint, create a new file in the `workers/gpu/` directory. This file contains the endpoint code and is automatically included in the FastAPI app.

1. Create a new file in `workers/gpu/` with the name of the endpoint. For example, `inference.py`:

```python
# workers/gpu/inference.py
from runpod_flash import remote, LiveServerless, GpuGroup

config = LiveServerless(
    name="inference-worker",
    gpus=[GpuGroup.ADA_24],
    workersMax=3,
)

@remote(resource_config=config, dependencies=["transformers", "torch"])
def run_inference(prompt: str) -> dict:
    # Remember to import endpoint dependencies inside the function.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    result = generator(prompt, max_length=50)
    return {"output": result[0]["generated_text"]}
```

2. Add a route in `workers/gpu/__init__.py`:

```python
from fastapi import APIRouter
from .inference import run_inference

router = APIRouter(prefix="/gpu", tags=["GPU Workers"])

@router.post("/inference")
async def inference_endpoint(prompt: str):
    result = await run_inference(prompt)
    return result
```

3. The router is automatically included via `main.py`.

### Add a CPU endpoint

Follow the same pattern in `workers/cpu/`. CPU endpoints use `instanceIds` instead of `gpus`:

```python
from runpod_flash import remote, LiveServerless, CpuInstanceType

config = LiveServerless(
    name="data-processor",
    instanceIds=[CpuInstanceType.CPU5C_4_8],
    workersMax=2,
)

@remote(resource_config=config, dependencies=["pandas"])
def process_data(data: list) -> dict:
    import pandas as pd  # Import inside the remote function
    df = pd.DataFrame(data)
    return df.describe().to_dict()
```

## Handle existing files

If you run `flash init` in a directory with existing files, Flash detects conflicts and prompts for confirmation:

```text
┌─ File Conflicts Detected ─────────────────────┐
│ Warning: The following files will be          │
│ overwritten:                                  │
│                                               │
│ • main.py                                     │
│ • requirements.txt                            │
└───────────────────────────────────────────────┘
Continue and overwrite these files? [y/N]:
```

Use `--force` to skip the prompt and overwrite files:

```bash
flash init . --force
```

## Start developing

Once your project is set up:

```bash
# Start the development server
flash run

# Open the API explorer
# http://localhost:8888/docs
```

Make changes to your workers, and the server reloads automatically. When you're ready, deploy with:

```bash
flash deploy
```

## Next steps

- [Test locally](/flash/local-testing) with `flash run`.
- [Build your app](/flash/build-app) by customizing workers.
- [Deploy to production](/flash/deploy-apps) with `flash deploy`.
- [View the flash init reference](/flash/cli/init) for all options.
diff --git a/flash/local-testing.mdx b/flash/local-testing.mdx
new file mode 100644
index 00000000..d19beeb9
--- /dev/null
+++ b/flash/local-testing.mdx
@@ -0,0 +1,174 @@
---
title: "Test Flash apps locally"
sidebarTitle: "Test locally"
description: "Use flash run to test your Flash application locally before deploying."
+tag: "BETA" +--- + +The `flash run` command starts a local development server that lets you test your Flash application before deploying to production. Your FastAPI app runs locally with automatic reloading, while `@remote` functions execute on Runpod Serverless workers. + +Use `flash run` when you want to: + +- Iterate quickly on your API logic with automatic reloading. +- Test `@remote` functions against real GPU/CPU workers. +- Debug request/response handling before deployment. +- Develop new endpoints without deploying after every change. + +## Start the development server + +From inside your [project directory](/flash/initialize-project), run: + +```bash +flash run +``` + +The server starts at `http://localhost:8888` by default. Your FastAPI routes are available immediately, and `@remote` functions provision Serverless endpoints on first call. + +### Custom host and port + +```bash +# Change port +flash run --port 3000 + +# Make accessible on network +flash run --host 0.0.0.0 +``` + +## Test your endpoints + +### Using curl + +```bash +curl -X POST http://localhost:8888/gpu/hello \ + -H "Content-Type: application/json" \ + -d '{"name": "Flash"}' +``` + +### Using the API explorer + +Open [http://localhost:8888/docs](http://localhost:8888/docs) in your browser to access the interactive Swagger UI. You can test all endpoints directly from the browser. + +### Using Python + +```python +import requests + +response = requests.post( + "http://localhost:8888/gpu/hello", + json={"name": "Flash"} +) +print(response.json()) +``` + +## Reduce cold-start delays + +The first call to a `@remote` function provisions a Serverless endpoint, which takes 30-60 seconds. Use `--auto-provision` to provision all endpoints at startup: + +```bash +flash run --auto-provision +``` + +This scans your project for `@remote` functions and deploys them before the server starts accepting requests. Endpoints are cached in `.runpod/resources.pkl` and reused across server restarts. + +## How it works + +With `flash run`, your system runs in a hybrid architecture: + +```mermaid +%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%% + +flowchart TB + subgraph Local ["YOUR MACHINE (localhost:8888)"] + FastAPI["FastAPI App
• Updates automatically
• Your HTTP routes"] + end + + subgraph Runpod ["RUNPOD SERVERLESS"] + GPU["live-gpu-worker"] + CPU["live-cpu-worker"] + end + + FastAPI -->|"HTTPS"| GPU + FastAPI -->|"HTTPS"| CPU + + style Local fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff + style Runpod fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff + style FastAPI fill:#5F4CFE,stroke:#5F4CFE,color:#fff + style GPU fill:#22C55E,stroke:#22C55E,color:#000 + style CPU fill:#22C55E,stroke:#22C55E,color:#000 +``` + +**What runs where:** + +| Component | Location | Automatic updates | +|-----------|----------|------------| +| FastAPI app (`main.py`) | Your machine | Yes | +| HTTP routes | Your machine | Yes | +| `@remote` functions | Runpod Serverless | No | + +Endpoints created by `flash run` are prefixed with `live-` to distinguish them from production endpoints. + +## Development workflow + +A typical development cycle looks like this: + +1. Start the server: `flash run` +2. Make changes to your code. +3. The server reloads automatically. +4. Test your changes via curl or the API explorer. +5. Repeat until ready to deploy. + +When you're done, use `flash undeploy` to clean up the `live-` endpoints created during development. + +## Differences from production + +| Aspect | `flash run` | `flash deploy` | +|--------|-------------|----------------| +| FastAPI app runs on | Your machine | Runpod Serverless | +| Endpoint naming | `live-` prefix | No prefix | +| Automatic updates | Yes | No | +| Authentication | Not required | Required | + +## Clean up after testing + +Endpoints created by `flash run` persist until you delete them. To clean up: + +```bash +# List all endpoints +flash undeploy list + +# Remove a specific endpoint +flash undeploy live-YOUR_ENDPOINT_NAME + +# Remove all endpoints +flash undeploy --all +``` + +## Troubleshooting + +**Port already in use** + +```bash +flash run --port 3000 +``` + +**Slow first request** + +Use `--auto-provision` to eliminate cold-start delays: + +```bash +flash run --auto-provision +``` + +**Authentication errors** + +Ensure `RUNPOD_API_KEY` is set in your `.env` file or environment: + +```bash +export RUNPOD_API_KEY=your_api_key_here +``` + +## Next steps + +- [Deploy to production](/flash/deploy-apps) when your app is ready. +- [Clean up endpoints](/flash/cli/undeploy) after testing. +- [View the flash run reference](/flash/cli/run) for all options. diff --git a/flash/monitoring.mdx b/flash/monitoring.mdx index fb206f58..33da1d53 100644 --- a/flash/monitoring.mdx +++ b/flash/monitoring.mdx @@ -1,6 +1,6 @@ --- -title: "Monitoring and debugging" -sidebarTitle: "Monitoring and debugging" +title: "Monitor and debug remote functions" +sidebarTitle: "Monitor and debug" description: "Monitor, debug, and troubleshoot Flash deployments." tag: "BETA" --- @@ -178,14 +178,54 @@ export RUNPOD_API_KEY=your_api_key_here ## Endpoint management -As you work with Flash, endpoints accumulate in your Runpod account. To manage them: +As you work with Flash, endpoints accumulate in your Runpod account. Use `flash undeploy` to manage and clean up endpoints. -1. Go to the [Serverless section](https://www.runpod.io/console/serverless) in the Runpod console. -2. Review your endpoints and delete unused ones. -3. Note that a `flash undeploy` command is in development for easier cleanup. +### List deployed endpoints - +View all tracked endpoints with their status: -Endpoints persist until manually deleted through the Runpod console. 
Regularly clean up unused endpoints to avoid hitting your account's maximum worker capacity limits. +```bash +flash undeploy list +``` + +This shows each endpoint's name, ID, status (Active/Inactive), and resource type. + +### Undeploy specific endpoints + +Remove a specific endpoint by name: + +```bash +flash undeploy my-api +``` + +### Undeploy all endpoints + +Remove all tracked endpoints (requires confirmation): + +```bash +flash undeploy --all +``` + +### Interactive selection + +Select endpoints to undeploy using an interactive interface: + +```bash +flash undeploy --interactive +``` + +### Clean up stale tracking + +If you delete endpoints through the Runpod console, the local tracking file becomes stale. Clean it up with: + +```bash +flash undeploy --cleanup-stale +``` + +For production deployments, use `flash env delete` to remove an entire environment and all its associated resources. + + + +For detailed documentation on the undeploy command, see the [flash undeploy CLI reference](/flash/cli/undeploy). - \ No newline at end of file +
\ No newline at end of file diff --git a/flash/overview.mdx b/flash/overview.mdx index e49cdb76..115fbac9 100644 --- a/flash/overview.mdx +++ b/flash/overview.mdx @@ -11,34 +11,18 @@ Flash is currently in beta. [Join our Discord](https://discord.gg/cUpRmau42V) to Flash is a Python SDK for developing and deploying AI workflows on [Runpod Serverless](/serverless/overview). You write Python functions locally, and Flash handles infrastructure management, GPU/CPU provisioning, dependency installation, and data transfer automatically. -There are two ways to run workloads with Flash: - -- **Standalone scripts:** Add the `@remote` decorator to Python functions, and they'll run automatically on Runpod's cloud infrastructure when you run the script locally. -- **API endpoints:** Convert those functions into persistent endpoints that can be accessed via HTTP, scaling GPU/CPU resources automatically based on demand. - -Ready to try it out? Check out the quickstart guide and examples repository: - - - Follow the quickstart to create your first Flash function in minutes. - - - - Check out our repository of prebuilt Flash applications. - - - - Learn about resource configuration, dependencies, and parallel execution. + + Write a standalone Flash script for instant access to Runpod infrastructure. - - Build HTTP APIs with FastAPI and Flash. + + Create a Flash app with a FastAPI server and deploy it on Runpod to serve production endpoints. - ## Why use Flash? -**Flash is the easiest and fastest way to test and deploy AI/ML workloads on Runpod.** It's designed for local development and live-testing workflows, but can also be used to deploy production-ready applications. +**Flash is the easiest and fastest way to test and deploy AI/ML workloads on Runpod.** Whether you're prototyping a new model or deploying a production API, Flash handles the infrastructure complexity so you can focus on your code. When you run a `@remote` function, Flash: - Automatically provisions resources on Runpod's infrastructure. @@ -50,9 +34,12 @@ You can specify the exact GPU hardware you need, from RTX 4090s to A100 80GB GPU Flash uses [Runpod's Serverless pricing](/serverless/pricing) with per-second billing. You're only charged for actual compute time; there are no costs when your code isn't running. - ## Install Flash + +Flash requires Python 3.10 or higher. + + Create a Python virtual environment and use `pip` to install Flash: ```bash @@ -67,7 +54,7 @@ In your project directory, create a `.env` file and add your Runpod API key, rep touch .env && echo "RUNPOD_API_KEY=YOUR_API_KEY" > .env ``` -## Concepts +## Core concepts ### Remote functions @@ -100,6 +87,8 @@ gpu_config = LiveServerless( ) ``` +[View the complete configuration reference](/flash/resource-configuration). + ### Dependency management Specify Python packages in the decorator, and Flash installs them automatically on the remote worker: @@ -129,18 +118,154 @@ results = await asyncio.gather( ) ``` -## How it works +## Development workflows + +Flash supports two main methods for running workloads on Runpod: standalone scripts and Flash apps. + + +### Standalone scripts + +This is the fastest way to get started with Flash. Just write a Python script with `@remote` decorated functions and run it locally with `python script.py`. 
+ +```python +import asyncio +from runpod_flash import remote, LiveServerless, GpuGroup + +config = LiveServerless( + name="gpu-inference", + gpus=[GpuGroup.ADA_24], +) + +@remote(resource_config=config, dependencies=["torch"]) +def process_on_gpu(data): + import torch + # Your GPU workload here + return {"result": "processed"} + +async def main(): + result = await process_on_gpu({"input": "data"}) + print(result) + +if __name__ == "__main__": + asyncio.run(main()) +``` + +Run the script locally, and Flash executes the `@remote` function on Runpod's infrastructure: + +```bash +python my_script.py +``` + +**Use this approach for:** +- Quick prototypes and experiments. +- Batch processing jobs. +- One-off data processing tasks. +- Local development and testing. + +[Follow the quickstart](/flash/quickstart) to create your first Flash script. + +### Flash apps + +Build FastAPI applications with HTTP endpoints that run on Runpod Serverless. Flash apps provide a complete development and deployment workflow with local testing and production deployment. + +```python +# main.py +from fastapi import FastAPI +from runpod_flash import remote, LiveServerless, GpuGroup + +app = FastAPI() + +config = LiveServerless( + name="api-worker", + gpus=[GpuGroup.ADA_24], +) + +@remote(resource_config=config, dependencies=["torch"]) +def inference(prompt: str): + import torch + # Your inference logic + return {"output": "result"} + +@app.post("/inference") +async def inference_endpoint(prompt: str): + result = await inference(prompt) + return result +``` + +Develop and test locally with automatic updates: + +```bash +flash run +``` + +Deploy to production when ready: + +```bash +flash deploy +``` + +**Use this approach for:** + +- Production HTTP APIs. +- Persistent endpoints. +- Long-running services. +- Team collaboration with staging/production environments. + +[Follow this tutorial](/flash/build-app) to build your first Flash app. + + +### Flash apps + +1. **Initialize**: Create a project with `flash init` +2. **Develop**: Write your FastAPI app with `@remote` functions +3. **Test locally**: Run `flash run` to test with automatic updates +4. **Deploy**: Run `flash deploy` to push to production + +This workflow is ideal for production APIs and services that need persistent endpoints. + +```mermaid +%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%% + +flowchart LR + Init["flash init"] + Dev["Write code"] + Run["flash run
(test locally)"] + Deploy["flash deploy
(production)"] + + Init --> Dev + Dev --> Run + Run -->|"Ready"| Deploy + Run -->|"Continue developing"| Dev + + style Init fill:#5F4CFE,stroke:#5F4CFE,color:#fff + style Dev fill:#22C55E,stroke:#22C55E,color:#000 + style Run fill:#4D38F5,stroke:#4D38F5,color:#fff + style Deploy fill:#AE6DFF,stroke:#AE6DFF,color:#000 +``` + +[Learn more about the Flash app workflow](/flash/build-apps-overview). + -Flash orchestrates workflow execution through a multi-step process: -1. **Function identification**: The `@remote` decorator marks functions for remote execution, enabling Flash to distinguish between local and remote operations. -2. **Dependency analysis**: Flash automatically analyzes function dependencies to construct an optimal execution order. -3. **Resource provisioning and execution**: For each remote function, Flash: - - Dynamically provisions endpoint and worker resources on Runpod's infrastructure. - - Serializes and securely transfers input data to the remote worker. - - Executes the function on the remote infrastructure with the specified GPU or CPU resources. - - Returns results to your local environment. -4. **Data orchestration**: Results flow seamlessly between functions according to your local Python code structure. +## CLI commands + +Flash provides CLI commands for managing Flash apps: + +| Command | Description | +|---------|-------------| +| [`flash init`](/flash/cli/init) | Create a new Flash app project | +| [`flash run`](/flash/cli/run) | Start the local development server | +| [`flash build`](/flash/cli/build) | Build a deployment artifact | +| [`flash deploy`](/flash/cli/deploy) | Build and deploy to Runpod | +| [`flash env`](/flash/cli/env) | Manage deployment environments | +| [`flash app`](/flash/cli/app) | Manage Flash applications | +| [`flash undeploy`](/flash/cli/undeploy) | Remove deployed endpoints | + + +CLI commands are primarily for Flash apps. Standalone scripts don't require the CLI—just run them with `python`. + + +See the [CLI reference](/flash/cli/overview) for detailed documentation on each command. ## Use cases @@ -153,41 +278,29 @@ Flash is well-suited for a range of AI and data processing workloads: - **Data processing workflows**: Process large datasets using CPU workers for general computation and GPU workers for accelerated tasks. - **Hybrid GPU/CPU workflows**: Optimize cost and performance by combining CPU preprocessing with GPU inference. -## Development workflow - -A typical Flash development workflow looks like this: - -1. Write Python functions with the `@remote` decorator. -2. Specify resource requirements and dependencies in the decorator. -3. Run your script locally. Flash handles remote execution automatically. - -For API deployments, use `flash init` to create a project, then `flash run` to start your server. For a full walkthrough, see [Create a Flash API endpoint](/flash/api-endpoints). - ## Limitations - Serverless deployments using Flash are currently restricted to the `EU-RO-1` datacenter. -- Flash is designed primarily for local development and live-testing workflows. -- Endpoints created by Flash persist until manually deleted through the Runpod console. A `flash undeploy` command is currently in development to clean up unused endpoints. - Be aware of your account's maximum worker capacity limits. Flash can rapidly scale workers across multiple endpoints, and you may hit capacity constraints. Contact [Runpod support](https://www.runpod.io/contact) to increase your account's capacity allocation if needed. 
## Next steps - Get started with your first Flash function. + Write your first standalone script with Flash + + + Create a FastAPI app with Flash - Complete reference for resource configuration options. + Complete reference for resource configuration + + + Learn about Flash CLI commands ## Getting help -- Join the [Runpod community on Discord](https://discord.gg/cUpRmau42V) for support and discussion. - -## Next steps - -- [View the resource configuration reference](/flash/resource-configuration) for all available options. -- [Learn about pricing](/flash/pricing) to optimize costs. -- [Deploy Flash applications](/flash/deploy-apps) for production. +Join the [Runpod community on Discord](https://discord.gg/cUpRmau42V) for support and discussion. diff --git a/flash/quickstart.mdx b/flash/quickstart.mdx index 6579fc39..7813a5e9 100644 --- a/flash/quickstart.mdx +++ b/flash/quickstart.mdx @@ -22,7 +22,7 @@ In this tutorial you'll learn how to: - You've [created a Runpod account](/get-started/manage-accounts). - You've [created a Runpod API key](/get-started/api-keys). -- You've installed [Python 3.9 (or higher)](https://www.python.org/downloads/). +- You've installed [Python 3.10 or higher](https://www.python.org/downloads/). ## Step 1: Install Flash @@ -322,6 +322,6 @@ Result standard deviation: 9.8879 You've successfully used Flash to run a GPU workload on Runpod. Now you can: - [Create more complex remote functions](/flash/remote-functions) with custom dependencies and resource configurations. -- [Build API endpoints](/flash/api-endpoints) using FastAPI. +- [Build a Flash app](/flash/build-app) using FastAPI. - [Deploy Flash applications](/flash/deploy-apps) for production use. - Explore more examples on the [runpod-workers/flash](https://github.com/runpod-workers/flash) GitHub repository. diff --git a/flash/remote-functions.mdx b/flash/remote-functions.mdx index 60d035d5..a139bde0 100644 --- a/flash/remote-functions.mdx +++ b/flash/remote-functions.mdx @@ -257,6 +257,6 @@ Environment variables are excluded from configuration hashing. Changing environm ## Next steps -- [Create API endpoints](/flash/api-endpoints) using FastAPI. +- [Create API endpoints](/flash/build-app) using FastAPI. - [Deploy Flash applications](/flash/deploy-apps) for production. - [View the resource configuration reference](/flash/resource-configuration) for all available options. diff --git a/flash/resource-configuration.mdx b/flash/resource-configuration.mdx index 886c0f90..023354af 100644 --- a/flash/resource-configuration.mdx +++ b/flash/resource-configuration.mdx @@ -1,7 +1,7 @@ --- title: "Resource configuration reference" sidebarTitle: "Configuration reference" -description: "Complete reference for Flash resource configuration options." +description: "A complete reference for Flash GPU/CPU resource configuration options." 
tag: "BETA" --- From c7e2f5e499ff754cc3ff95c4e98fb9ed67d94310 Mon Sep 17 00:00:00 2001 From: Mo King Date: Thu, 19 Feb 2026 08:35:25 -0500 Subject: [PATCH 07/19] Update filestructure to include apps/ section --- docs.json | 12 +- flash/{ => apps}/apps-and-environments.mdx | 70 +----- flash/{ => apps}/build-app.mdx | 4 +- flash/{ => apps}/deploy-apps.mdx | 4 +- flash/{ => apps}/initialize-project.mdx | 6 +- flash/{ => apps}/local-testing.mdx | 4 +- flash/apps/overview.mdx | 255 +++++++++++++++++++++ flash/overview.mdx | 8 +- flash/quickstart.mdx | 4 +- flash/remote-functions.mdx | 4 +- flash/resource-configuration.mdx | 2 +- 11 files changed, 286 insertions(+), 87 deletions(-) rename flash/{ => apps}/apps-and-environments.mdx (60%) rename flash/{ => apps}/build-app.mdx (98%) rename flash/{ => apps}/deploy-apps.mdx (97%) rename flash/{ => apps}/initialize-project.mdx (96%) rename flash/{ => apps}/local-testing.mdx (96%) create mode 100644 flash/apps/overview.mdx diff --git a/docs.json b/docs.json index 0f92f47e..bf665420 100644 --- a/docs.json +++ b/docs.json @@ -130,12 +130,12 @@ { "group": "Build apps", "pages": [ - "flash/build-apps-overview", - "flash/build-app", - "flash/initialize-project", - "flash/local-testing", - "flash/apps-and-environments", - "flash/deploy-apps" + "flash/apps/overview", + "flash/apps/build-app", + "flash/apps/initialize-project", + "flash/apps/local-testing", + "flash/apps/apps-and-environments", + "flash/apps/deploy-apps" ] }, "flash/monitoring", diff --git a/flash/apps-and-environments.mdx b/flash/apps/apps-and-environments.mdx similarity index 60% rename from flash/apps-and-environments.mdx rename to flash/apps/apps-and-environments.mdx index 6f0c42c6..16f74a36 100644 --- a/flash/apps-and-environments.mdx +++ b/flash/apps/apps-and-environments.mdx @@ -1,21 +1,15 @@ --- title: "Manage apps and environments" sidebarTitle: "Manage apps and environments" -description: "Understand the Flash deployment hierarchy and learn how to manage your apps." +description: "Manage Flash apps and environments with the flash app and flash env commands." tag: "BETA" --- -Flash organizes deployments using a two-level hierarchy: **apps** and **environments**. This structure enables standard development workflows where you can test changes in development, validate in staging, and deploy to production. +This page covers practical commands and workflows for managing Flash apps and environments. For a conceptual overview of the deployment hierarchy, see the [development lifecycle guide](/flash/apps/overview). -## What is a Flash app? +## Flash apps -A **Flash app** is a cloud-side container that groups everything related to a single project. Think of it as a project namespace in Runpod that keeps your deployments organized together. - -Each app contains: - -- **Environments**: Deployment contexts like `dev`, `staging`, and `production`. -- **Builds**: Versioned artifacts created from your code. -- **Configuration**: App-wide settings and metadata. +A **Flash app** groups all resources for a single project, including environments, builds, and configuration. ### App hierarchy @@ -68,16 +62,9 @@ Deleting an app removes all environments, builds, endpoints, and volumes associa -## What is an environment? - -An **environment** is an isolated deployment context within a Flash app. Each environment is a separate "stage" that contains its own: - -- **Deployed endpoints**: Serverless endpoints provisioned from your `@remote` functions. 
-- **Active build version**: The specific version of your code running in this environment. -- **Network volumes**: Persistent storage for models, caches, and data. -- **Deployment state**: Current status (PENDING, DEPLOYING, DEPLOYED, etc.). +## Environments -Environments are completely independent. Deploying to one environment has no effect on others. +An **environment** is an isolated deployment stage within a Flash app (e.g., `dev`, `staging`, `production`). Each environment has its own endpoints, build version, volumes, and deployment state. Environments are completely independent. ### Creating environments @@ -119,49 +106,6 @@ flash env delete dev | FAILED | Deployment or health check failed | | DELETING | Deletion in progress | -## Deployment workflows - -### Single environment (simple projects) - -For simple projects, use a single `production` environment: - -```bash -# First deployment creates app and environment -flash deploy -``` - -### Multiple environments (team projects) - -For team projects, use multiple environments: - -```bash -# Create environments -flash env create dev -flash env create staging -flash env create production - -# Deploy to each -flash deploy --env dev # Development testing -flash deploy --env staging # QA validation -flash deploy --env production # Live deployment -``` - -### Feature branch deployments - -Create temporary environments for feature testing: - -```bash -# Create feature environment -flash env create feature-auth - -# Deploy feature branch -git checkout feature-auth -flash deploy --env feature-auth - -# Clean up after merge -flash env delete feature-auth -``` - ## Best practices ### Naming conventions @@ -212,7 +156,7 @@ flash env create test123 ## Next steps -- [Deploy your first app](/flash/deploy-apps) with `flash deploy`. +- [Deploy your first app](/flash/apps/deploy-apps) with `flash deploy`. - [Learn about the CLI](/flash/cli/overview) for all available commands. - [View the env command reference](/flash/cli/env) for detailed options. - [View the app command reference](/flash/cli/app) for detailed options. diff --git a/flash/build-app.mdx b/flash/apps/build-app.mdx similarity index 98% rename from flash/build-app.mdx rename to flash/apps/build-app.mdx index a225d058..5009330e 100644 --- a/flash/build-app.mdx +++ b/flash/apps/build-app.mdx @@ -260,10 +260,10 @@ curl -X POST https://api-xxxxx.runpod.net/gpu/hello \ -d '{"message": "Hello from production!"}' ``` -For detailed deployment options including environment management, see [Deploy Flash apps](/flash/deploy-apps). +For detailed deployment options including environment management, see [Deploy Flash apps](/flash/apps/deploy-apps). ## Next steps -- [Deploy Flash applications](/flash/deploy-apps) for production use. +- [Deploy Flash applications](/flash/apps/deploy-apps) for production use. - [Configure resources](/flash/resource-configuration) for your endpoints. - [Monitor and debug](/flash/monitoring) your endpoints. diff --git a/flash/deploy-apps.mdx b/flash/apps/deploy-apps.mdx similarity index 97% rename from flash/deploy-apps.mdx rename to flash/apps/deploy-apps.mdx index 8497fe28..3d244519 100644 --- a/flash/deploy-apps.mdx +++ b/flash/apps/deploy-apps.mdx @@ -64,7 +64,7 @@ flowchart TB ### Deploy to an environment -Flash organizes deployments using [apps and environments](/flash/apps-and-environments). Deploy to a specific environment using the `--env` flag: +Flash organizes deployments using [apps and environments](/flash/apps/apps-and-environments). 
Deploy to a specific environment using the `--env` flag:
 
 ```bash
 # Deploy to staging
@@ -221,7 +221,7 @@ def fetch_data(url):
 
 ## Next steps
 
-- [Learn about apps and environments](/flash/apps-and-environments) for managing deployments.
+- [Learn about apps and environments](/flash/apps/apps-and-environments) for managing deployments.
 - [View the CLI reference](/flash/cli/overview) for all available commands.
 - [Configure resources](/flash/resource-configuration) for your endpoints.
 - [Monitor and debug](/flash/monitoring) your deployments.
diff --git a/flash/initialize-project.mdx b/flash/apps/initialize-project.mdx
similarity index 96%
rename from flash/initialize-project.mdx
rename to flash/apps/initialize-project.mdx
index b00d4ea9..d7df884d 100644
--- a/flash/initialize-project.mdx
+++ b/flash/apps/initialize-project.mdx
@@ -203,7 +203,7 @@ flash deploy
 
 ## Next steps
 
-- [Test locally](/flash/local-testing) with `flash run`.
-- [Build your app](/flash/build-app) by customizing workers.
-- [Deploy to production](/flash/deploy-apps) with `flash deploy`.
+- [Test locally](/flash/apps/local-testing) with `flash run`.
+- [Build your app](/flash/apps/build-app) by customizing workers.
+- [Deploy to production](/flash/apps/deploy-apps) with `flash deploy`.
 - [View the flash init reference](/flash/cli/init) for all options.
diff --git a/flash/local-testing.mdx b/flash/apps/local-testing.mdx
similarity index 96%
rename from flash/local-testing.mdx
rename to flash/apps/local-testing.mdx
index d19beeb9..cde93ae9 100644
--- a/flash/local-testing.mdx
+++ b/flash/apps/local-testing.mdx
@@ -16,7 +16,7 @@
 
 ## Start the development server
 
-From inside your [project directory](/flash/initialize-project), run:
+From inside your [project directory](/flash/apps/initialize-project), run:
 
 ```bash
 flash run
@@ -169,6 +169,6 @@ export RUNPOD_API_KEY=your_api_key_here
 
 ## Next steps
 
-- [Deploy to production](/flash/deploy-apps) when your app is ready.
+- [Deploy to production](/flash/apps/deploy-apps) when your app is ready.
 - [Clean up endpoints](/flash/cli/undeploy) after testing.
 - [View the flash run reference](/flash/cli/run) for all options.
diff --git a/flash/apps/overview.mdx b/flash/apps/overview.mdx
new file mode 100644
index 00000000..b538cd51
--- /dev/null
+++ b/flash/apps/overview.mdx
@@ -0,0 +1,255 @@
+---
+title: "Overview"
+sidebarTitle: "Overview"
+description: "Understand the Flash development lifecycle and how to build and deploy your applications."
+tag: "BETA"
+---
+
+Flash provides a complete development and deployment workflow to build AI/ML applications and services using Runpod's GPU/CPU infrastructure. This page explains the key concepts and processes you'll use when building Flash apps.
+
+If you prefer to learn by doing, follow this tutorial to [build your first Flash app](/flash/apps/build-app).
+
+## App development overview
+
+Building a Flash application follows a clear progression from initialization to production deployment:
+
+```mermaid +%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%% + +flowchart TB + Init["flash init
Create project"] + Code["Define endpoints with
@remote functions"] + Run["Test locally with
flash run"] + Deploy["Deploy to Runpod with
flash deploy"] + Manage["Manage apps and
environments with
flash app and flash env"] + + Init --> Code + Code --> Run + Run -->|"Ready for production"| Deploy + Deploy --> Manage + Run -->|"Continue development"| Code + + style Init fill:#5F4CFE,stroke:#5F4CFE,color:#fff + style Code fill:#22C55E,stroke:#22C55E,color:#000 + style Run fill:#4D38F5,stroke:#4D38F5,color:#fff + style Deploy fill:#AE6DFF,stroke:#AE6DFF,color:#000 + style Manage fill:#9289FE,stroke:#9289FE,color:#fff +``` +
+ + + + Use `flash init` to create a new project with a FastAPI server and example workers: + + ```bash + flash init my-project + cd my-project + pip install -r requirements.txt + ``` + + This gives you a working project structure with GPU and CPU worker examples. [Learn more about project initialization](/flash/apps/initialize-project). + + + + Write your application code by defining `@remote` functions that execute on Runpod workers: + + ```python + from runpod_flash import remote, LiveServerless, GpuGroup + + config = LiveServerless( + name="inference-worker", + gpus=[GpuGroup.ADA_24], + workersMax=3, + ) + + @remote(resource_config=config, dependencies=["torch"]) + def run_inference(prompt: str) -> dict: + import torch + # Your inference logic here + return {"result": "..."} + ``` + + [Learn more about building apps](/flash/apps/build-app). + + + + Start a local development server to test your application: + + ```bash + flash run + ``` + + Your FastAPI app runs locally and updates automatically, while `@remote` functions execute on real Runpod workers. This hybrid architecture lets you iterate quickly without deploying after every change. [Learn more about local testing](/flash/apps/local-testing). + + + + When ready for production, deploy your application to Runpod Serverless: + + ```bash + flash deploy + ``` + + Your entire application—including the FastAPI server and all worker functions—runs on Runpod infrastructure. [Learn more about deployment](/flash/apps/deploy-apps). + + + + Use apps and environments to organize and manage your deployments across different stages (dev, staging, production). [Learn more about apps and environments](/flash/apps/apps-and-environments). + + + +## Apps and environments + +Flash uses a two-level organizational structure to manage deployments: **apps** and **environments**. + +### What is a Flash app? + +A **Flash app** is a logical container for all resources related to a single project. Think of it as a namespace that groups together: + +- **Environments**: Different deployment stages (dev, staging, production). +- **Builds**: Versioned artifacts of your application code. +- **Configuration**: App-wide settings and metadata. + +Apps are created automatically when you first run `flash deploy`, or you can create them explicitly with `flash app create`. + +### What is an environment? + +An **environment** is an isolated deployment stage within an app. Each environment has its own: + +- **Deployed endpoints**: Serverless workers for your `@remote` functions. +- **Build version**: The specific code version running in this environment. +- **State**: Current deployment status (deploying, deployed, failed, etc.). + +Environments are completely independent—deploying to `dev` has no effect on `production`. You can create and manage environments with the `flash env` command. + +## Local vs production deployment + +Flash supports two modes of operation: + +### Local development (`flash run`) + +```mermaid +%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%% + +flowchart TB + subgraph Local ["YOUR MACHINE"] + FastAPI["FastAPI App
• Updates automatically
• localhost:8888"] + end + + subgraph Runpod ["RUNPOD SERVERLESS"] + Workers["Workers
• @remote functions
• live- prefix"] + end + + FastAPI -->|"HTTPS"| Workers + + style Local fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff + style Runpod fill:#1a1a2e,stroke:#22C55E,stroke-width:2px,color:#fff + style FastAPI fill:#5F4CFE,stroke:#5F4CFE,color:#fff + style Workers fill:#22C55E,stroke:#22C55E,color:#000 +``` + +**How it works:** +- FastAPI runs on your machine and updates automatically +- `@remote` functions run on Runpod workers +- Endpoints prefixed with `live-` for easy identification +- No authentication required for local testing +- Fast iteration on application logic + +### Production deployment (`flash deploy`) + +```mermaid +%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%% + +flowchart TB + Users(["USERS"]) + + subgraph Runpod ["RUNPOD SERVERLESS"] + Mothership["Mothership
• FastAPI app
• Public URL"] + Workers["Workers
• @remote functions"] + + Mothership -->|"internal"| Workers + end + + Users -->|"HTTPS (auth required)"| Mothership + + style Runpod fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff + style Users fill:#4D38F5,stroke:#4D38F5,color:#fff + style Mothership fill:#5F4CFE,stroke:#5F4CFE,color:#fff + style Workers fill:#22C55E,stroke:#22C55E,color:#000 +``` + +**How it works:** +- Entire application runs on Runpod Serverless +- FastAPI "mothership" endpoint orchestrates worker calls +- Public HTTPS URL with API key authentication +- Automatic scaling based on load +- Production-grade reliability and performance + +## Common workflows + +### Simple projects (single environment) + +For solo projects or simple applications: + +```bash +# Initialize and develop +flash init my-project +cd my-project + +# Test locally +flash run + +# Deploy to production (creates 'production' environment by default) +flash deploy +``` + +### Team projects (multiple environments) + +For team collaboration with dev, staging, and production stages: + +```bash +# Create environments +flash env create dev +flash env create staging +flash env create production + +# Development cycle +flash run # Test locally +flash deploy --env dev # Deploy to dev for integration testing +flash deploy --env staging # Deploy to staging for QA +flash deploy --env production # Deploy to production after approval +``` + +### Feature development + +For testing new features in isolation: + +```bash +# Create temporary feature environment +flash env create feature-new-model + +# Deploy and test +flash deploy --env feature-new-model + +# Clean up after merging +flash env delete feature-new-model +``` + +## Next steps + + + + Create a Flash app, test it locally, and deploy it to production. + + + Create boilerplate code for a new Flash project with `flash init`. + + + Use `flash run` for local development and testing. + + + Deploy your application to production with `flash deploy`. + + diff --git a/flash/overview.mdx b/flash/overview.mdx index 115fbac9..317c38cb 100644 --- a/flash/overview.mdx +++ b/flash/overview.mdx @@ -15,7 +15,7 @@ Flash is a Python SDK for developing and deploying AI workflows on [Runpod Serve Write a standalone Flash script for instant access to Runpod infrastructure. - + Create a Flash app with a FastAPI server and deploy it on Runpod to serve production endpoints. @@ -211,7 +211,7 @@ flash deploy - Long-running services. - Team collaboration with staging/production environments. -[Follow this tutorial](/flash/build-app) to build your first Flash app. +[Follow this tutorial](/flash/apps/build-app) to build your first Flash app. ### Flash apps @@ -243,7 +243,7 @@ flowchart LR style Deploy fill:#AE6DFF,stroke:#AE6DFF,color:#000 ``` -[Learn more about the Flash app workflow](/flash/build-apps-overview). +[Learn more about the Flash app workflow](/flash/apps/overview). @@ -289,7 +289,7 @@ Flash is well-suited for a range of AI and data processing workloads: Write your first standalone script with Flash - + Create a FastAPI app with Flash diff --git a/flash/quickstart.mdx b/flash/quickstart.mdx index 7813a5e9..83c95583 100644 --- a/flash/quickstart.mdx +++ b/flash/quickstart.mdx @@ -322,6 +322,6 @@ Result standard deviation: 9.8879 You've successfully used Flash to run a GPU workload on Runpod. Now you can: - [Create more complex remote functions](/flash/remote-functions) with custom dependencies and resource configurations. -- [Build a Flash app](/flash/build-app) using FastAPI. 
-- [Deploy Flash applications](/flash/deploy-apps) for production use. +- [Build a Flash app](/flash/apps/build-app) using FastAPI. +- [Deploy Flash applications](/flash/apps/deploy-apps) for production use. - Explore more examples on the [runpod-workers/flash](https://github.com/runpod-workers/flash) GitHub repository. diff --git a/flash/remote-functions.mdx b/flash/remote-functions.mdx index a139bde0..1378c8fa 100644 --- a/flash/remote-functions.mdx +++ b/flash/remote-functions.mdx @@ -257,6 +257,6 @@ Environment variables are excluded from configuration hashing. Changing environm ## Next steps -- [Create API endpoints](/flash/build-app) using FastAPI. -- [Deploy Flash applications](/flash/deploy-apps) for production. +- [Create API endpoints](/flash/apps/build-app) using FastAPI. +- [Deploy Flash applications](/flash/apps/deploy-apps) for production. - [View the resource configuration reference](/flash/resource-configuration) for all available options. diff --git a/flash/resource-configuration.mdx b/flash/resource-configuration.mdx index 023354af..00bb1710 100644 --- a/flash/resource-configuration.mdx +++ b/flash/resource-configuration.mdx @@ -265,5 +265,5 @@ Environment variables are excluded from configuration hashing. Changing environm ## Next steps - [Create remote functions](/flash/remote-functions) using these configurations. -- [Deploy Flash applications](/flash/deploy-apps) for production. +- [Deploy Flash applications](/flash/apps/deploy-apps) for production. - [Learn about pricing](/flash/pricing) to optimize costs. From 688281e55b4a4c72676e309b9d0a21170cfad06a Mon Sep 17 00:00:00 2001 From: Mo King Date: Thu, 19 Feb 2026 09:59:17 -0500 Subject: [PATCH 08/19] Improve flash apps and run docs, remove endpoint management section --- flash/apps/apps-and-environments.mdx | 12 +- flash/apps/initialize-project.mdx | 6 +- flash/apps/local-testing.mdx | 20 +-- flash/apps/overview.mdx | 18 +- flash/build-apps-overview.mdx | 255 --------------------------- flash/cli/app.mdx | 45 ++--- flash/cli/build.mdx | 2 +- flash/cli/deploy.mdx | 4 +- flash/cli/env.mdx | 18 +- flash/cli/init.mdx | 6 +- flash/cli/overview.mdx | 12 +- flash/cli/undeploy.mdx | 8 +- flash/monitoring.mdx | 56 +----- flash/quickstart.mdx | 18 +- flash/remote-functions.mdx | 1 + 15 files changed, 86 insertions(+), 395 deletions(-) delete mode 100644 flash/build-apps-overview.mdx diff --git a/flash/apps/apps-and-environments.mdx b/flash/apps/apps-and-environments.mdx index 16f74a36..f1a6bb03 100644 --- a/flash/apps/apps-and-environments.mdx +++ b/flash/apps/apps-and-environments.mdx @@ -9,7 +9,7 @@ This page covers practical commands and workflows for managing Flash apps and en ## Flash apps -A **Flash app** groups all resources for a single project, including environments, builds, and configuration. +A **Flash app** is a namespace registered on Runpod's backend that groups all resources for a single project, including environments, builds, and configuration. The app itself is just metadata—actual cloud resources (endpoints, volumes) are created when you deploy to an environment. ### App hierarchy @@ -35,12 +35,14 @@ Flash App (my-project) ### Creating apps -Apps are created automatically when you first run `flash deploy`. You can also create them explicitly: +Apps are created automatically when you first run `flash deploy`. 
You can also register them explicitly: ```bash -flash app create my-project +flash app create APP_NAME ``` +This registers the app namespace on Runpod's backend but doesn't create any cloud resources or local files. + ### Managing apps Use `flash app` commands to manage your apps: @@ -50,10 +52,10 @@ Use `flash app` commands to manage your apps: flash app list # Get app details -flash app get my-project +flash app get APP_NAME # Delete an app and all its resources -flash app delete --app my-project +flash app delete --app APP_NAME ``` diff --git a/flash/apps/initialize-project.mdx b/flash/apps/initialize-project.mdx index d7df884d..e20b035d 100644 --- a/flash/apps/initialize-project.mdx +++ b/flash/apps/initialize-project.mdx @@ -14,8 +14,8 @@ Use `flash init` whenever you want to start a new Flash project, fully configure Create a new project in a new directory: ```bash -flash init my-project -cd my-project +flash init PROJECT_NAME +cd PROJECT_NAME ``` Or initialize in your current directory: @@ -29,7 +29,7 @@ flash init . `flash init` creates the following structure: ```text -my-project/ +PROJECT_NAME/ ├── main.py # FastAPI application entry point ├── mothership.py # Mothership endpoint configuration ├── workers/ diff --git a/flash/apps/local-testing.mdx b/flash/apps/local-testing.mdx index cde93ae9..3ae547de 100644 --- a/flash/apps/local-testing.mdx +++ b/flash/apps/local-testing.mdx @@ -5,14 +5,14 @@ description: "Use flash run to test your Flash application locally before deploy tag: "BETA" --- -The `flash run` command starts a local development server that lets you test your Flash application before deploying to production. Your FastAPI app runs locally with automatic reloading, while `@remote` functions execute on Runpod Serverless workers. +The `flash run` command starts a local development server that lets you test your Flash application before deploying to production. Your FastAPI app runs locally and updates automatically as you edit files. When you call a `@remote` function, Flash sends the latest function code to Serverless workers on Runpod, so your changes are reflected immediately. Use `flash run` when you want to: -- Iterate quickly on your API logic with automatic reloading. +- Iterate quickly with automatic code updates. - Test `@remote` functions against real GPU/CPU workers. - Debug request/response handling before deployment. -- Develop new endpoints without deploying after every change. +- Develop without redeploying after every change. ## Start the development server @@ -99,13 +99,13 @@ flowchart TB **What runs where:** -| Component | Location | Automatic updates | -|-----------|----------|------------| -| FastAPI app (`main.py`) | Your machine | Yes | -| HTTP routes | Your machine | Yes | -| `@remote` functions | Runpod Serverless | No | +| Component | Location | +|-----------|----------| +| FastAPI app (`main.py`) | Your machine | +| HTTP routes | Your machine | +| `@remote` function code | Runpod Serverless | -Endpoints created by `flash run` are prefixed with `live-` to distinguish them from production endpoints. +Your code updates automatically as you edit files. Endpoints created by `flash run` are prefixed with `live-` to distinguish them from production endpoints. ## Development workflow @@ -137,7 +137,7 @@ Endpoints created by `flash run` persist until you delete them. 
To clean up: flash undeploy list # Remove a specific endpoint -flash undeploy live-YOUR_ENDPOINT_NAME +flash undeploy ENDPOINT_NAME # Remove all endpoints flash undeploy --all diff --git a/flash/apps/overview.mdx b/flash/apps/overview.mdx index b538cd51..bf4d7474 100644 --- a/flash/apps/overview.mdx +++ b/flash/apps/overview.mdx @@ -45,8 +45,8 @@ flowchart TB Use `flash init` to create a new project with a FastAPI server and example workers: ```bash - flash init my-project - cd my-project + flash init PROJECT_NAME + cd PROJECT_NAME pip install -r requirements.txt ``` @@ -72,7 +72,7 @@ flowchart TB return {"result": "..."} ``` - [Learn more about building apps](/flash/apps/build-app). + [Learn more about remote functions](/flash/remote-functions). @@ -82,7 +82,7 @@ flowchart TB flash run ``` - Your FastAPI app runs locally and updates automatically, while `@remote` functions execute on real Runpod workers. This hybrid architecture lets you iterate quickly without deploying after every change. [Learn more about local testing](/flash/apps/local-testing). + Your FastAPI app runs locally and updates automatically. When you call a `@remote` function, Flash sends the latest code to Runpod workers. This hybrid architecture lets you iterate quickly without redeploying. [Learn more about local testing](/flash/apps/local-testing). @@ -195,8 +195,8 @@ For solo projects or simple applications: ```bash # Initialize and develop -flash init my-project -cd my-project +flash init PROJECT_NAME +cd PROJECT_NAME # Test locally flash run @@ -228,13 +228,13 @@ For testing new features in isolation: ```bash # Create temporary feature environment -flash env create feature-new-model +flash env create FEATURE_NAME # Deploy and test -flash deploy --env feature-new-model +flash deploy --env FEATURE_NAME # Clean up after merging -flash env delete feature-new-model +flash env delete FEATURE_NAME ``` ## Next steps diff --git a/flash/build-apps-overview.mdx b/flash/build-apps-overview.mdx deleted file mode 100644 index 488fbfd1..00000000 --- a/flash/build-apps-overview.mdx +++ /dev/null @@ -1,255 +0,0 @@ ---- -title: "Development lifecycle" -sidebarTitle: "Development lifecycle" -description: "Understand the Flash development lifecycle and how to build and deploy your applications." -tag: "BETA" ---- - -Flash provides a complete development and deployment workflow to build AI/ML applications and services using Runpod's GPU/CPU infrastructure. This page explains the key concepts and processes you'll use when building Flash apps. - - -If you prefer to learn by doing, follow this tuturial to [build your first Flash app](/flash/build-app). - - -## App development overview - -Building a Flash application follows a clear progression from initialization to production deployment: - -
-```mermaid -%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%% - -flowchart TB - Init["flash init
Create project"] - Code["Define endpoints with
@remote functions"] - Run["Test locally with
flash run"] - Deploy["Deploy to Runpod with
flash deploy"] - Manage["Manage apps and
environments with
flash app and flash env"] - - Init --> Code - Code --> Run - Run -->|"Ready for production"| Deploy - Deploy --> Manage - Run -->|"Continue development"| Code - - style Init fill:#5F4CFE,stroke:#5F4CFE,color:#fff - style Code fill:#22C55E,stroke:#22C55E,color:#000 - style Run fill:#4D38F5,stroke:#4D38F5,color:#fff - style Deploy fill:#AE6DFF,stroke:#AE6DFF,color:#000 - style Manage fill:#9289FE,stroke:#9289FE,color:#fff -``` -
- - - - Use `flash init` to create a new project with a FastAPI server and example workers: - - ```bash - flash init my-project - cd my-project - pip install -r requirements.txt - ``` - - This gives you a working project structure with GPU and CPU worker examples. [Learn more about project initialization](/flash/initialize-project). - - - - Write your application code by defining `@remote` functions that execute on Runpod workers: - - ```python - from runpod_flash import remote, LiveServerless, GpuGroup - - config = LiveServerless( - name="inference-worker", - gpus=[GpuGroup.ADA_24], - workersMax=3, - ) - - @remote(resource_config=config, dependencies=["torch"]) - def run_inference(prompt: str) -> dict: - import torch - # Your inference logic here - return {"result": "..."} - ``` - - [Learn more about building apps](/flash/build-app). - - - - Start a local development server to test your application: - - ```bash - flash run - ``` - - Your FastAPI app runs locally and updates automatically, while `@remote` functions execute on real Runpod workers. This hybrid architecture lets you iterate quickly without deploying after every change. [Learn more about local testing](/flash/local-testing). - - - - When ready for production, deploy your application to Runpod Serverless: - - ```bash - flash deploy - ``` - - Your entire application—including the FastAPI server and all worker functions—runs on Runpod infrastructure. [Learn more about deployment](/flash/deploy-apps). - - - - Use apps and environments to organize and manage your deployments across different stages (dev, staging, production). [Learn more about apps and environments](/flash/apps-and-environments). - - - -## Apps and environments - -Flash uses a two-level organizational structure to manage deployments: **apps** and **environments**. - -### What is a Flash app? - -A **Flash app** is a logical container for all resources related to a single project. Think of it as a namespace that groups together: - -- **Environments**: Different deployment stages (dev, staging, production). -- **Builds**: Versioned artifacts of your application code. -- **Configuration**: App-wide settings and metadata. - -Apps are created automatically when you first run `flash deploy`, or you can create them explicitly with `flash app create`. - -### What is an environment? - -An **environment** is an isolated deployment stage within an app. Each environment has its own: - -- **Deployed endpoints**: Serverless workers for your `@remote` functions. -- **Build version**: The specific code version running in this environment. -- **State**: Current deployment status (deploying, deployed, failed, etc.). - -Environments are completely independent—deploying to `dev` has no effect on `production`. You can create and manage environments with the `flash env` command. - -## Local vs production deployment - -Flash supports two modes of operation: - -### Local development (`flash run`) - -```mermaid -%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%% - -flowchart TB - subgraph Local ["YOUR MACHINE"] - FastAPI["FastAPI App
• Updates automatically
• localhost:8888"] - end - - subgraph Runpod ["RUNPOD SERVERLESS"] - Workers["Workers
• @remote functions
• live- prefix"] - end - - FastAPI -->|"HTTPS"| Workers - - style Local fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff - style Runpod fill:#1a1a2e,stroke:#22C55E,stroke-width:2px,color:#fff - style FastAPI fill:#5F4CFE,stroke:#5F4CFE,color:#fff - style Workers fill:#22C55E,stroke:#22C55E,color:#000 -``` - -**How it works:** -- FastAPI runs on your machine and updates automatically -- `@remote` functions run on Runpod workers -- Endpoints prefixed with `live-` for easy identification -- No authentication required for local testing -- Fast iteration on application logic - -### Production deployment (`flash deploy`) - -```mermaid -%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%% - -flowchart TB - Users(["USERS"]) - - subgraph Runpod ["RUNPOD SERVERLESS"] - Mothership["Mothership
• FastAPI app
• Public URL"] - Workers["Workers
• @remote functions"] - - Mothership -->|"internal"| Workers - end - - Users -->|"HTTPS (auth required)"| Mothership - - style Runpod fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff - style Users fill:#4D38F5,stroke:#4D38F5,color:#fff - style Mothership fill:#5F4CFE,stroke:#5F4CFE,color:#fff - style Workers fill:#22C55E,stroke:#22C55E,color:#000 -``` - -**How it works:** -- Entire application runs on Runpod Serverless -- FastAPI "mothership" endpoint orchestrates worker calls -- Public HTTPS URL with API key authentication -- Automatic scaling based on load -- Production-grade reliability and performance - -## Common workflows - -### Simple projects (single environment) - -For solo projects or simple applications: - -```bash -# Initialize and develop -flash init my-project -cd my-project - -# Test locally -flash run - -# Deploy to production (creates 'production' environment by default) -flash deploy -``` - -### Team projects (multiple environments) - -For team collaboration with dev, staging, and production stages: - -```bash -# Create environments -flash env create dev -flash env create staging -flash env create production - -# Development cycle -flash run # Test locally -flash deploy --env dev # Deploy to dev for integration testing -flash deploy --env staging # Deploy to staging for QA -flash deploy --env production # Deploy to production after approval -``` - -### Feature development - -For testing new features in isolation: - -```bash -# Create temporary feature environment -flash env create feature-new-model - -# Deploy and test -flash deploy --env feature-new-model - -# Clean up after merging -flash env delete feature-new-model -``` - -## Next steps - - - - Create a Flash app, test it locally, and deploy it to production. - - - Create boilerplate code for a new Flash project with `flash init`. - - - Use `flash run` for local development and testing. - - - Deploy your application to production with `flash deploy`. - - diff --git a/flash/cli/app.mdx b/flash/cli/app.mdx index 371ae7ca..15f1bfdc 100644 --- a/flash/cli/app.mdx +++ b/flash/cli/app.mdx @@ -28,12 +28,6 @@ Show all Flash apps under your account. flash app list ``` -### Example - -```bash -flash app list -``` - ### Output ```text @@ -50,28 +44,29 @@ flash app list ## app create -Create a new Flash app. +Register a new Flash app on Runpod's backend. ```bash Command flash app create ``` -### Example - -```bash -flash app create my-project -``` - ### Arguments Name for the new Flash app. Must be unique within your account. -### Notes +### What it creates + +This command registers a Flash app in Runpod's backend—essentially creating a namespace for your environments and builds. It does not: -- App names must be unique within your account. -- Apps are namespaced to your account, so different users can have apps with the same name. +- Create local files (use `flash init` for that) +- Provision cloud resources (endpoints, volumes, etc.) +- Deploy any code + +The app is just a container that groups environments and builds together. + +### When to use @@ -89,12 +84,6 @@ Get detailed information about a Flash app. flash app get ``` -### Example - -```bash -flash app get my-project -``` - ### Arguments @@ -142,12 +131,6 @@ Delete a Flash app and all its associated resources. 
flash app delete --app ``` -### Example - -```bash -flash app delete --app my-project -``` - ### Flags @@ -205,9 +188,9 @@ Flash App (my-project) Flash CLI automatically detects the app name from your current directory: ```bash -cd /path/to/my-project -flash deploy # Deploys to 'my-project' app -flash env list # Lists 'my-project' environments +cd /path/to/APP_NAME +flash deploy # Deploys to 'APP_NAME' app +flash env list # Lists 'APP_NAME' environments ``` Override with the `--app` flag: diff --git a/flash/cli/build.mdx b/flash/cli/build.mdx index fa125112..fb6da58f 100644 --- a/flash/cli/build.mdx +++ b/flash/cli/build.mdx @@ -9,7 +9,7 @@ Build a deployment-ready artifact for your Flash application without deploying. flash build [OPTIONS] ``` -## Example +## Examples Build with all dependencies: diff --git a/flash/cli/deploy.mdx b/flash/cli/deploy.mdx index 00ee5544..bd4224fa 100644 --- a/flash/cli/deploy.mdx +++ b/flash/cli/deploy.mdx @@ -9,9 +9,9 @@ Build and deploy your Flash application to Runpod Serverless endpoints in one st flash deploy [OPTIONS] ``` -## Example +## Examples -Build and deploy (auto-selects environment if only one exists): +Build and deploy a Flash app from the current directory (auto-selects environment if only one exists): ```bash flash deploy diff --git a/flash/cli/env.mdx b/flash/cli/env.mdx index 00215404..7d4494ba 100644 --- a/flash/cli/env.mdx +++ b/flash/cli/env.mdx @@ -35,7 +35,7 @@ flash env list [OPTIONS] flash env list # List environments for specific app -flash env list --app my-project +flash env list --app APP_NAME ``` ### Flags @@ -73,7 +73,7 @@ flash env create [OPTIONS] flash env create staging # Create environment in specific app -flash env create production --app my-project +flash env create production --app APP_NAME ``` ### Arguments @@ -117,7 +117,7 @@ flash env get [OPTIONS] flash env get production # Get details for specific app's environment -flash env get staging --app my-project +flash env get staging --app APP_NAME ``` ### Arguments @@ -170,14 +170,14 @@ Delete a deployment environment and all its associated resources. flash env delete [OPTIONS] ``` -### Example +### Examples ```bash # Delete development environment flash env delete dev # Delete environment in specific app -flash env delete staging --app my-project +flash env delete staging --app APP_NAME ``` ### Arguments @@ -238,14 +238,14 @@ flash deploy --env production ```bash # Create feature environment -flash env create feature-auth +flash env create FEATURE_NAME # Deploy feature branch -git checkout feature-auth -flash deploy --env feature-auth +git checkout FEATURE_NAME +flash deploy --env FEATURE_NAME # Clean up after merge -flash env delete feature-auth +flash env delete FEATURE_NAME ``` ## Related commands diff --git a/flash/cli/init.mdx b/flash/cli/init.mdx index 6fcf1511..12f93b93 100644 --- a/flash/cli/init.mdx +++ b/flash/cli/init.mdx @@ -14,8 +14,8 @@ flash init [PROJECT_NAME] [OPTIONS] Create a new project directory: ```bash -flash init my-project -cd my-project +flash init PROJECT_NAME +cd PROJECT_NAME pip install -r requirements.txt flash run ``` @@ -43,7 +43,7 @@ Overwrite existing files if they already exist in the target directory. 
The command creates the following project structure: ```text -my-project/ +PROJECT_NAME/ ├── main.py # FastAPI application entry point ├── workers/ │ ├── gpu/ # GPU worker example diff --git a/flash/cli/overview.mdx b/flash/cli/overview.mdx index aa44caba..6f1b0d66 100644 --- a/flash/cli/overview.mdx +++ b/flash/cli/overview.mdx @@ -67,8 +67,8 @@ flash env --help ```bash # Create a new project -flash init my-project -cd my-project +flash init PROJECT_NAME +cd PROJECT_NAME # Install dependencies pip install -r requirements.txt @@ -85,7 +85,7 @@ flash run flash deploy # Deploy to a specific environment -flash deploy --env production +flash deploy --env ENVIRONMENT_NAME ``` ### Manage deployments @@ -95,10 +95,10 @@ flash deploy --env production flash env list # Check environment status -flash env get production +flash env get ENVIRONMENT_NAME # Remove an environment -flash env delete staging +flash env delete ENVIRONMENT_NAME ``` ### Clean up endpoints @@ -108,7 +108,7 @@ flash env delete staging flash undeploy list # Remove specific endpoint -flash undeploy my-api +flash undeploy ENDPOINT_NAME # Remove all endpoints flash undeploy --all diff --git a/flash/cli/undeploy.mdx b/flash/cli/undeploy.mdx index 870e4ad1..8225182f 100644 --- a/flash/cli/undeploy.mdx +++ b/flash/cli/undeploy.mdx @@ -20,7 +20,7 @@ flash undeploy list Remove a specific endpoint: ```bash -flash undeploy my-api +flash undeploy ENDPOINT_NAME ``` Remove all endpoints: @@ -59,8 +59,8 @@ Output includes: Delete a specific endpoint: ```bash -flash undeploy my-api -``` +flash undeploy ENDPOINT_NAME +``` This: @@ -154,7 +154,7 @@ The tracking file is in `.gitignore` and should never be committed. It contains flash undeploy list # Remove a specific endpoint -flash undeploy my-api +flash undeploy ENDPOINT_NAME # Clean up stale tracking flash undeploy --cleanup-stale diff --git a/flash/monitoring.mdx b/flash/monitoring.mdx index 33da1d53..96212791 100644 --- a/flash/monitoring.mdx +++ b/flash/monitoring.mdx @@ -174,58 +174,4 @@ export RUNPOD_API_KEY=your_api_key_here - Set appropriate `workersMax` limits to control scaling. - Use CPU workers for non-GPU tasks. - Monitor usage in the console to identify optimization opportunities. -- Use shorter `idleTimeout` for sporadic workloads. - -## Endpoint management - -As you work with Flash, endpoints accumulate in your Runpod account. Use `flash undeploy` to manage and clean up endpoints. - -### List deployed endpoints - -View all tracked endpoints with their status: - -```bash -flash undeploy list -``` - -This shows each endpoint's name, ID, status (Active/Inactive), and resource type. - -### Undeploy specific endpoints - -Remove a specific endpoint by name: - -```bash -flash undeploy my-api -``` - -### Undeploy all endpoints - -Remove all tracked endpoints (requires confirmation): - -```bash -flash undeploy --all -``` - -### Interactive selection - -Select endpoints to undeploy using an interactive interface: - -```bash -flash undeploy --interactive -``` - -### Clean up stale tracking - -If you delete endpoints through the Runpod console, the local tracking file becomes stale. Clean it up with: - -```bash -flash undeploy --cleanup-stale -``` - -For production deployments, use `flash env delete` to remove an entire environment and all its associated resources. - - - -For detailed documentation on the undeploy command, see the [flash undeploy CLI reference](/flash/cli/undeploy). - - \ No newline at end of file +- Use shorter `idleTimeout` for sporadic workloads. 
\ No newline at end of file diff --git a/flash/quickstart.mdx b/flash/quickstart.mdx index 83c95583..2eaaa675 100644 --- a/flash/quickstart.mdx +++ b/flash/quickstart.mdx @@ -317,11 +317,25 @@ Result mean: 500.1321 Result standard deviation: 9.8879 ``` +## Clean up + +When you're done testing, you can clean up the endpoints created during this tutorial. Use the [`flash undeploy`](/flash/cli/undeploy) command to remove development endpoints: + +```bash +# List all endpoints +flash undeploy list + +# Remove a specific endpoint +flash undeploy live-ENDPOINT_NAME + +# Remove all endpoints +flash undeploy --all +``` + ## Next steps You've successfully used Flash to run a GPU workload on Runpod. Now you can: - [Create more complex remote functions](/flash/remote-functions) with custom dependencies and resource configurations. -- [Build a Flash app](/flash/apps/build-app) using FastAPI. -- [Deploy Flash applications](/flash/apps/deploy-apps) for production use. +- [Build and deploy Flash apps](/flash/apps/overview) for production use. - Explore more examples on the [runpod-workers/flash](https://github.com/runpod-workers/flash) GitHub repository. diff --git a/flash/remote-functions.mdx b/flash/remote-functions.mdx index 1378c8fa..dff3baca 100644 --- a/flash/remote-functions.mdx +++ b/flash/remote-functions.mdx @@ -260,3 +260,4 @@ Environment variables are excluded from configuration hashing. Changing environm - [Create API endpoints](/flash/apps/build-app) using FastAPI. - [Deploy Flash applications](/flash/apps/deploy-apps) for production. - [View the resource configuration reference](/flash/resource-configuration) for all available options. +- [Clean up development endpoints](/flash/cli/undeploy) when you're done testing. From 317a8467cfe351d4e0a1a88c59bc69348c9b945f Mon Sep 17 00:00:00 2001 From: Mo King Date: Thu, 19 Feb 2026 13:14:18 -0500 Subject: [PATCH 09/19] Add section for coding agent skills --- flash/overview.mdx | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/flash/overview.mdx b/flash/overview.mdx index 317c38cb..9824ef64 100644 --- a/flash/overview.mdx +++ b/flash/overview.mdx @@ -301,6 +301,18 @@ Flash is well-suited for a range of AI and data processing workloads: +## Coding agent integration + +Flash provides a skill package for AI coding agents like Claude Code, Cline, and Cursor. The skill gives these agents detailed context about the Flash SDK, CLI, best practices, and common patterns. + +Install the Flash skill by running the following command in your terminal: + +```bash +npx skills add runpod/skills +``` + +This allows your coding agent to provide more accurate Flash code suggestions and troubleshooting help. See the [runpod/skills repository](https://github.com/runpod/skills) for more details. + ## Getting help Join the [Runpod community on Discord](https://discord.gg/cUpRmau42V) for support and discussion. From 8a12edf7e686b75914797314f5bc5f674e6ae843 Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 20 Feb 2026 09:27:09 -0500 Subject: [PATCH 10/19] Update overview, remove duplicate info from the quickstart. 
--- flash/cli/overview.mdx | 41 +-------------- flash/overview.mdx | 111 ++++++++--------------------------------- flash/quickstart.mdx | 6 +-- 3 files changed, 25 insertions(+), 133 deletions(-) diff --git a/flash/cli/overview.mdx b/flash/cli/overview.mdx index 6f1b0d66..db53b4bb 100644 --- a/flash/cli/overview.mdx +++ b/flash/cli/overview.mdx @@ -6,38 +6,7 @@ description: "Learn how to use the Flash CLI for local development and deploymen The Flash CLI provides commands for initializing projects, running local development servers, building deployment artifacts, and managing your applications on Runpod Serverless. -## Install Flash - -Create a Python virtual environment and install Flash using pip: - -```bash -python3 -m venv venv -source venv/bin/activate -pip install runpod-flash -``` - -## Configure your API key - -Flash requires a Runpod API key to provision and manage Serverless endpoints. Create a `.env` file in your project directory: - -```bash -echo "RUNPOD_API_KEY=your_api_key_here" > .env -``` - -You can also set the API key as an environment variable: - - - -```bash -export RUNPOD_API_KEY=your_api_key_here -``` - - -```bash -set RUNPOD_API_KEY=your_api_key_here -``` - - +Before using the CLI, make sure you've [installed Flash](/flash/overview#install-flash) and set your [Runpod API key](/get-started/api-keys) in your environment. ## Available commands @@ -112,10 +81,4 @@ flash undeploy ENDPOINT_NAME # Remove all endpoints flash undeploy --all -``` - -## Next steps - -- [Create a project](/flash/cli/init) with `flash init`. -- [Start developing](/flash/cli/run) with `flash run`. -- [Deploy your app](/flash/cli/deploy) with `flash deploy`. +``` \ No newline at end of file diff --git a/flash/overview.mdx b/flash/overview.mdx index 9824ef64..1a9f044f 100644 --- a/flash/overview.mdx +++ b/flash/overview.mdx @@ -166,39 +166,25 @@ python my_script.py ### Flash apps -Build FastAPI applications with HTTP endpoints that run on Runpod Serverless. Flash apps provide a complete development and deployment workflow with local testing and production deployment. +When you're ready to build a production-ready API, you can build a Flash app with FastAPI and deploy it to Runpod. Flash apps provide a complete development and deployment workflow with local testing and production deployment. -```python -# main.py -from fastapi import FastAPI -from runpod_flash import remote, LiveServerless, GpuGroup - -app = FastAPI() - -config = LiveServerless( - name="api-worker", - gpus=[GpuGroup.ADA_24], -) +Flash comes with a [comprehensive CLI](/flash/cli/overview) that makes getting started with Flash apps easy: -@remote(resource_config=config, dependencies=["torch"]) -def inference(prompt: str): - import torch - # Your inference logic - return {"output": "result"} +Initialize a new Flash app project in your current directory: -@app.post("/inference") -async def inference_endpoint(prompt: str): - result = await inference(prompt) - return result +```bash +flash init ``` -Develop and test locally with automatic updates: +This creates a new project with a FastAPI server and example workers. Remote functions are defined in the `workers/` directory. + +Start a local development server to test your app: ```bash flash run ``` -Deploy to production when ready: +Deploy your app to production when ready: ```bash flash deploy @@ -213,60 +199,6 @@ flash deploy [Follow this tutorial](/flash/apps/build-app) to build your first Flash app. - -### Flash apps - -1. **Initialize**: Create a project with `flash init` -2. 
**Develop**: Write your FastAPI app with `@remote` functions -3. **Test locally**: Run `flash run` to test with automatic updates -4. **Deploy**: Run `flash deploy` to push to production - -This workflow is ideal for production APIs and services that need persistent endpoints. - -```mermaid -%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%% - -flowchart LR - Init["flash init"] - Dev["Write code"] - Run["flash run
(test locally)"] - Deploy["flash deploy
(production)"] - - Init --> Dev - Dev --> Run - Run -->|"Ready"| Deploy - Run -->|"Continue developing"| Dev - - style Init fill:#5F4CFE,stroke:#5F4CFE,color:#fff - style Dev fill:#22C55E,stroke:#22C55E,color:#000 - style Run fill:#4D38F5,stroke:#4D38F5,color:#fff - style Deploy fill:#AE6DFF,stroke:#AE6DFF,color:#000 -``` - -[Learn more about the Flash app workflow](/flash/apps/overview). - - - -## CLI commands - -Flash provides CLI commands for managing Flash apps: - -| Command | Description | -|---------|-------------| -| [`flash init`](/flash/cli/init) | Create a new Flash app project | -| [`flash run`](/flash/cli/run) | Start the local development server | -| [`flash build`](/flash/cli/build) | Build a deployment artifact | -| [`flash deploy`](/flash/cli/deploy) | Build and deploy to Runpod | -| [`flash env`](/flash/cli/env) | Manage deployment environments | -| [`flash app`](/flash/cli/app) | Manage Flash applications | -| [`flash undeploy`](/flash/cli/undeploy) | Remove deployed endpoints | - - -CLI commands are primarily for Flash apps. Standalone scripts don't require the CLI—just run them with `python`. - - -See the [CLI reference](/flash/cli/overview) for detailed documentation on each command. - ## Use cases Flash is well-suited for a range of AI and data processing workloads: @@ -278,6 +210,18 @@ Flash is well-suited for a range of AI and data processing workloads: - **Data processing workflows**: Process large datasets using CPU workers for general computation and GPU workers for accelerated tasks. - **Hybrid GPU/CPU workflows**: Optimize cost and performance by combining CPU preprocessing with GPU inference. +## Coding agent integration + +Flash provides a skill package for AI coding agents like Claude Code, Cline, and Cursor. The skill gives these agents detailed context about the Flash SDK, CLI, best practices, and common patterns. + +Install the Flash skill by running the following command in your terminal: + +```bash +npx skills add runpod/skills +``` + +This allows your coding agent to provide more accurate Flash code suggestions and troubleshooting help. See the [runpod/skills repository](https://github.com/runpod/skills) for more details. + ## Limitations - Serverless deployments using Flash are currently restricted to the `EU-RO-1` datacenter. @@ -300,19 +244,6 @@ Flash is well-suited for a range of AI and data processing workloads:
- -## Coding agent integration - -Flash provides a skill package for AI coding agents like Claude Code, Cline, and Cursor. The skill gives these agents detailed context about the Flash SDK, CLI, best practices, and common patterns. - -Install the Flash skill by running the following command in your terminal: - -```bash -npx skills add runpod/skills -``` - -This allows your coding agent to provide more accurate Flash code suggestions and troubleshooting help. See the [runpod/skills repository](https://github.com/runpod/skills) for more details. - ## Getting help Join the [Runpod community on Discord](https://discord.gg/cUpRmau42V) for support and discussion. diff --git a/flash/quickstart.mdx b/flash/quickstart.mdx index 2eaaa675..cebc681c 100644 --- a/flash/quickstart.mdx +++ b/flash/quickstart.mdx @@ -36,9 +36,7 @@ pip install runpod-flash ## Step 2: Add your API key to the environment -Add your Runpod API key to your development environment before using Flash to run workloads. - -Run this command to create a `.env` file, replacing `YOUR_API_KEY` with your Runpod API key: +Add your Runpod API key to your development environment before using Flash to run workloads. Run this command to create a `.env` file, replacing `YOUR_API_KEY` with your Runpod API key: ```bash touch .env && echo "RUNPOD_API_KEY=YOUR_API_KEY" > .env @@ -46,7 +44,7 @@ touch .env && echo "RUNPOD_API_KEY=YOUR_API_KEY" > .env -You can create this in your project's root directory or in the `/examples` folder. Make sure your `.env` file is in the same folder as the Python file you create in the next step. +Make sure your `.env` file is in the same folder as the Python file you create in the next step. From 486dae2802dbc0578cb4e48ab7b95e19f8fe9e24 Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 20 Feb 2026 10:04:45 -0500 Subject: [PATCH 11/19] Improve explanation of Flash apps --- flash/apps/apps-and-environments.mdx | 40 +++++++++++++ flash/apps/deploy-apps.mdx | 89 ++++++++++++++++++++++++++++ flash/apps/overview.mdx | 20 +++++-- flash/overview.mdx | 14 +++-- 4 files changed, 152 insertions(+), 11 deletions(-) diff --git a/flash/apps/apps-and-environments.mdx b/flash/apps/apps-and-environments.mdx index f1a6bb03..837e452c 100644 --- a/flash/apps/apps-and-environments.mdx +++ b/flash/apps/apps-and-environments.mdx @@ -64,6 +64,46 @@ Deleting an app removes all environments, builds, endpoints, and volumes associa +## Understanding builds and deployments + +When you run `flash deploy`, three things happen on Runpod: + +### 1. Build artifact is uploaded + +Flash creates a **tarball** (`.flash/artifact.tar.gz`) containing: + +- Your Python code (`main.py`, `workers/`, etc.) +- Pre-installed dependencies (bundled during build) +- Deployment manifest (`flash_manifest.json`) +- Auto-generated handler code + +This tarball is uploaded to Runpod's storage and associated with your app as a "build." + +### 2. Serverless endpoints are provisioned + +For each resource in the manifest, Flash creates a Serverless endpoint: + +**Mothership endpoint** (Load-Balanced): +- Runs your FastAPI app from `main.py` +- Provides the public HTTPS URL for users +- Orchestrates calls to worker endpoints +- Uses pre-built image: `runpod/flash-lb-cpu:latest` + +**Worker endpoints** (Queue-Based): +- Execute your `@remote` functions +- Scale automatically based on load +- Run on GPUs or CPUs based on configuration +- Uses pre-built images: `runpod/flash:latest` (GPU) or `runpod/flash-cpu:latest` (CPU) + +### 3. 
Environment is activated + +The environment is linked to: +- The uploaded build (specific version of your code) +- The provisioned endpoints (running infrastructure) +- Deployment state (health, status, metrics) + +**Key insight:** You're **not** building custom Docker images. The Flash images are pre-built and generic—they extract your tarball and run your code. This is why deployments are fast (no image build step) and limited to 500 MB (only code and dependencies, not full Docker images). + ## Environments An **environment** is an isolated deployment stage within a Flash app (e.g., `dev`, `staging`, `production`). Each environment has its own endpoints, build version, volumes, and deployment state. Environments are completely independent. diff --git a/flash/apps/deploy-apps.mdx b/flash/apps/deploy-apps.mdx index 3d244519..98e4db36 100644 --- a/flash/apps/deploy-apps.mdx +++ b/flash/apps/deploy-apps.mdx @@ -175,6 +175,95 @@ After building, these artifacts are created in the `.flash/` directory: | `.flash/flash_manifest.json` | Service discovery configuration | | `.flash/.build/` | Temporary build directory (removed by default) | +## What gets deployed to Runpod + +When you deploy a Flash app, you're deploying a **build artifact** (tarball) onto pre-built Flash Docker images. This architecture is similar to AWS Lambda layers: the base runtime is pre-built, and your code and dependencies are layered on top. + +### The build artifact + +The `.flash/artifact.tar.gz` file (max 500 MB) contains: + +```text +artifact.tar.gz/ +├── main.py # Your FastAPI application +├── workers/ # Your worker modules +│ ├── gpu/ +│ │ └── endpoint.py # Functions with @remote +│ └── cpu/ +│ └── endpoint.py +├── flash_manifest.json # Deployment manifest (critical!) +├── requirements.txt # (empty or minimal) +└── [installed dependencies]/ # All pip packages bundled + ├── numpy/ + ├── torch/ + └── ... +``` + +Dependencies are installed locally during the build process and bundled into the tarball. They are **not** installed at runtime on Runpod workers. + +### The deployment manifest + +The `flash_manifest.json` file is the brain of your deployment. It tells each endpoint: + +- Which functions to execute +- What Docker image to use +- How to configure resources (GPUs, workers, scaling) +- How to route HTTP requests (for load balancer endpoints) + +```json +{ + "resources": { + "mothership": { + "resource_type": "CpuLiveLoadBalancer", + "is_mothership": true, + "imageName": "runpod/flash-lb-cpu:latest", + "main_file": "main.py", + "app_variable": "app", + "functions": [...] + }, + "gpu-worker": { + "resource_type": "LiveServerless", + "imageName": "runpod/flash:latest", + "gpuIds": "...", + "workersMax": 5, + "functions": [ + { + "name": "process_on_gpu", + "module": "workers.gpu.endpoint" + } + ] + } + }, + "routes": { + "mothership": { + "POST /api/process": "process_endpoint" + } + } +} +``` + +### What gets created on Runpod + +For each resource in the manifest, Flash creates a Serverless endpoint: + +**Mothership/orchestrator endpoint ([load balancer](/serverless/load-balancing/overview))** +- **Purpose**: Receives HTTP requests, orchestrates `@remote` calls +- **Image**: Pre-built `runpod/flash-lb-cpu:latest` or `runpod/flash-lb:latest` +- **Startup process**: + 1. Container extracts your tarball + 2. Auto-generated handler imports your `main.py` + 3. FastAPI routes are mounted + 4. 
Uvicorn server starts
+
+**Worker endpoints (queue-based by default)**
+- **Purpose**: Execute compute-intensive `@remote` functions
+- **Image**: Pre-built `runpod/flash:latest` (GPU) or `runpod/flash-cpu:latest` (CPU)
+- **Startup process**:
+  1. Container extracts your tarball
+  2. Your worker modules are imported
+  3. Function registry is created (maps function names to actual function objects)
+  4. Workers listen for jobs with function name + serialized arguments
+
 ## Troubleshooting
 
 ### No @remote functions found
diff --git a/flash/apps/overview.mdx b/flash/apps/overview.mdx
index bf4d7474..d8f2b722 100644
--- a/flash/apps/overview.mdx
+++ b/flash/apps/overview.mdx
@@ -1,16 +1,25 @@
 ---
 title: "Overview"
 sidebarTitle: "Overview"
-description: "Understand the Flash development lifecycle and how to build and deploy your applications."
+description: "Understand the Flash app development lifecycle."
 tag: "BETA"
 ---
 
-Flash provides a complete development and deployment workflow to build AI/ML applications and services using Runpod's GPU/CPU infrastructure. This page explains the key concepts and processes you'll use when building Flash apps.
+A Flash app is a **FastAPI application with GPU/CPU workers** deployed to Runpod Serverless. When you deploy an app, Runpod:
+
+1. Packages your code, dependencies, and deployment manifest into a tarball (max 500 MB).
+2. Uploads the tarball to Runpod.
+3. Provisions Serverless endpoints:
+   - Orchestrator endpoint: Runs your FastAPI app on a [load balancing](/serverless/load-balancing/overview) endpoint.
+   - Worker endpoints: Execute your `@remote` functions on GPU/CPU endpoints.
+
+This page explains the key concepts and processes you'll use when building Flash apps.
+
+<Tip>
+If you prefer to learn by doing, follow this tutorial to [build your first Flash app](/flash/apps/build-app).
+</Tip>
+
 ## App development overview
 
 Building a Flash application follows a clear progression from initialization to production deployment:
@@ -106,11 +115,12 @@ Flash uses a two-level organizational structure to manage deployments: **apps**
 ### What is a Flash app?
 
-A **Flash app** is a logical container for all resources related to a single project. Think of it as a namespace that groups together:
+A **Flash app** is a logical container for all resources related to a single project. It consists of:
 
+- **App registry entry**: Metadata in Runpod's system (just a namespace).
 - **Environments**: Different deployment stages (dev, staging, production).
-- **Builds**: Versioned artifacts of your application code.
-- **Configuration**: App-wide settings and metadata.
+- **Builds**: Versioned tarball artifacts containing your code and dependencies.
+- **Serverless endpoints**: The actual running infrastructure (mothership + workers).
 
 Apps are created automatically when you first run `flash deploy`, or you can create them explicitly with `flash app create`.
 
diff --git a/flash/overview.mdx b/flash/overview.mdx
index 1a9f044f..14ae4d1b 100644
--- a/flash/overview.mdx
+++ b/flash/overview.mdx
@@ -24,13 +24,13 @@ Flash is a Python SDK for developing and deploying AI workflows on [Runpod Serve
 
 **Flash is the easiest and fastest way to test and deploy AI/ML workloads on Runpod.** Whether you're prototyping a new model or deploying a production API, Flash handles the infrastructure complexity so you can focus on your code.
 
-When you run a `@remote` function, Flash:
+When you run a [remote function](#remote-functions), Flash:
 - Automatically provisions resources on Runpod's infrastructure.
- Installs your dependencies automatically. - Runs your function on a remote GPU/CPU. - Returns the result to your local environment. -You can specify the exact GPU hardware you need, from RTX 4090s to A100 80GB GPUs, for AI inference, training, and other compute-intensive tasks. Functions scale automatically based on demand and can run in parallel across multiple resources. +You can specify the exact GPU hardware you need, from RTX 4090s to A100 80GB GPUs, for AI inference, training, and other compute-intensive tasks. Serverless workers scale automatically based on demand and can run in parallel across multiple GPUs or CPUs. Flash uses [Runpod's Serverless pricing](/serverless/pricing) with per-second billing. You're only charged for actual compute time; there are no costs when your code isn't running. @@ -73,6 +73,8 @@ async def main(): result = await process_data(my_data) ``` +[Learn more about remote functions](/flash/remote-functions). + ### Resource configuration Flash provides fine-grained control over hardware allocation through configuration objects. You can configure GPU types, worker counts, idle timeouts, environment variables, and more. @@ -120,8 +122,7 @@ results = await asyncio.gather( ## Development workflows -Flash supports two main methods for running workloads on Runpod: standalone scripts and Flash apps. - +Flash supports two primary workflows for running workloads on Runpod: standalone scripts and Flash apps. ### Standalone scripts @@ -166,7 +167,8 @@ python my_script.py ### Flash apps -When you're ready to build a production-ready API, you can build a Flash app with FastAPI and deploy it to Runpod. Flash apps provide a complete development and deployment workflow with local testing and production deployment. +When you're ready to build a production-ready API, you can create a [Flash app](/flash/apps/overview) with FastAPI and deploy it to Runpod. Flash apps provide a complete development and deployment workflow with local testing and production deployment. + Flash comes with a [comprehensive CLI](/flash/cli/overview) that makes getting started with Flash apps easy: @@ -220,7 +222,7 @@ Install the Flash skill by running the following command in your terminal: npx skills add runpod/skills ``` -This allows your coding agent to provide more accurate Flash code suggestions and troubleshooting help. See the [runpod/skills repository](https://github.com/runpod/skills) for more details. +This allows your coding agent to provide more accurate Flash code suggestions and troubleshooting help. You can find the Flash `SKILL.md` file in the [runpod/skills repository](https://github.com/runpod/skills/blob/main/flash/SKILL.md). ## Limitations From 6f87d4a892d74377b1b7c66bdb449d4bf28369ef Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 20 Feb 2026 10:12:29 -0500 Subject: [PATCH 12/19] Update flash examples url --- flash/overview.mdx | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/flash/overview.mdx b/flash/overview.mdx index 14ae4d1b..454a25ee 100644 --- a/flash/overview.mdx +++ b/flash/overview.mdx @@ -169,10 +169,7 @@ python my_script.py When you're ready to build a production-ready API, you can create a [Flash app](/flash/apps/overview) with FastAPI and deploy it to Runpod. Flash apps provide a complete development and deployment workflow with local testing and production deployment. 
- -Flash comes with a [comprehensive CLI](/flash/cli/overview) that makes getting started with Flash apps easy: - -Initialize a new Flash app project in your current directory: +To get started, initialize a new Flash app project in your current directory: ```bash flash init From aa2ba7051081c0fe66aa4153229adec1cb4622b7 Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 20 Feb 2026 10:12:40 -0500 Subject: [PATCH 13/19] Update flash examples url --- flash/quickstart.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/flash/quickstart.mdx b/flash/quickstart.mdx index cebc681c..9869ca27 100644 --- a/flash/quickstart.mdx +++ b/flash/quickstart.mdx @@ -336,4 +336,4 @@ You've successfully used Flash to run a GPU workload on Runpod. Now you can: - [Create more complex remote functions](/flash/remote-functions) with custom dependencies and resource configurations. - [Build and deploy Flash apps](/flash/apps/overview) for production use. -- Explore more examples on the [runpod-workers/flash](https://github.com/runpod-workers/flash) GitHub repository. +- Explore more examples on the [runpod/flash-examples](https://github.com/runpod/flash-examples/) GitHub repository. From e08dd61d64b1c1bb54259cce380d48963e52ec69 Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 20 Feb 2026 10:35:48 -0500 Subject: [PATCH 14/19] Improve "why use flash" --- flash/overview.mdx | 26 +++++++++++++++----------- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/flash/overview.mdx b/flash/overview.mdx index 454a25ee..12dccaae 100644 --- a/flash/overview.mdx +++ b/flash/overview.mdx @@ -5,6 +5,8 @@ description: "Rapidly develop and deploy AI/ML apps with the Flash Python SDK." tag: "BETA" --- +import { ServerlessTooltip, PodsTooltip } from "/snippets/tooltips.jsx"; + Flash is currently in beta. [Join our Discord](https://discord.gg/cUpRmau42V) to provide feedback and get support. @@ -24,22 +26,14 @@ Flash is a Python SDK for developing and deploying AI workflows on [Runpod Serve **Flash is the easiest and fastest way to test and deploy AI/ML workloads on Runpod.** Whether you're prototyping a new model or deploying a production API, Flash handles the infrastructure complexity so you can focus on your code. -When you run a [remote function](#remote-functions), Flash: -- Automatically provisions resources on Runpod's infrastructure. -- Installs your dependencies automatically. -- Runs your function on a remote GPU/CPU. -- Returns the result to your local environment. +Unlike traditional Runpod (which requires you to build custom Docker images and write handler code) or (which require manual management and bill 24/7), Flash automatically handles infrastructure using simple Python decorators. Just write [remote functions](#remote-functions) in a local Python script and Flash will provision endpoints, install dependencies, and scale GPU/CPU workers automatically. Code updates deploy instantly without your needing to rebuild the worker image. -You can specify the exact GPU hardware you need, from RTX 4090s to A100 80GB GPUs, for AI inference, training, and other compute-intensive tasks. Serverless workers scale automatically based on demand and can run in parallel across multiple GPUs or CPUs. +You can specify the exact hardware you need for each function, from RTX 4090s to A100 80GB GPUs, enabling you to optimize for cost and performance for AI inference, training, and other compute-intensive tasks. 
Serverless workers scale automatically based on demand and can run in parallel across multiple GPUs or CPUs. -Flash uses [Runpod's Serverless pricing](/serverless/pricing) with per-second billing. You're only charged for actual compute time; there are no costs when your code isn't running. +Flash uses the exact same per-second pricing model as [Runpod Serverless](/serverless/pricing). You're only charged for actual compute time—there are no costs when your code isn't running. ## Install Flash - -Flash requires Python 3.10 or higher. - - Create a Python virtual environment and use `pip` to install Flash: ```bash @@ -48,6 +42,10 @@ source venv/bin/activate pip install runpod-flash ``` + +Flash requires Python 3.10 or higher. + + In your project directory, create a `.env` file and add your Runpod API key, replacing `YOUR_API_KEY` with your actual API key: ```bash @@ -73,6 +71,12 @@ async def main(): result = await process_data(my_data) ``` +When you run a remote function, Flash: +- Automatically provisions resources on Runpod's infrastructure. +- Installs your dependencies automatically. +- Runs your function on a remote GPU/CPU. +- Returns the result to your local environment. + [Learn more about remote functions](/flash/remote-functions). ### Resource configuration From 196da4d42e33823b040162f66a6982a949cef2cc Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 20 Feb 2026 10:48:55 -0500 Subject: [PATCH 15/19] Fix tooltip, improve "why use flash" --- flash/overview.mdx | 12 ++++++++---- snippets/tooltips.jsx | 2 +- 2 files changed, 9 insertions(+), 5 deletions(-) diff --git a/flash/overview.mdx b/flash/overview.mdx index 12dccaae..6fd4c526 100644 --- a/flash/overview.mdx +++ b/flash/overview.mdx @@ -5,7 +5,7 @@ description: "Rapidly develop and deploy AI/ML apps with the Flash Python SDK." tag: "BETA" --- -import { ServerlessTooltip, PodsTooltip } from "/snippets/tooltips.jsx"; +import { ServerlessTooltip, PodsTooltip, WorkersTooltip } from "/snippets/tooltips.jsx"; Flash is currently in beta. [Join our Discord](https://discord.gg/cUpRmau42V) to provide feedback and get support. @@ -24,11 +24,15 @@ Flash is a Python SDK for developing and deploying AI workflows on [Runpod Serve ## Why use Flash? -**Flash is the easiest and fastest way to test and deploy AI/ML workloads on Runpod.** Whether you're prototyping a new model or deploying a production API, Flash handles the infrastructure complexity so you can focus on your code. +**Flash is the easiest and fastest way to test and deploy AI/ML workloads on Runpod.** -Unlike traditional Runpod (which requires you to build custom Docker images and write handler code) or (which require manual management and bill 24/7), Flash automatically handles infrastructure using simple Python decorators. Just write [remote functions](#remote-functions) in a local Python script and Flash will provision endpoints, install dependencies, and scale GPU/CPU workers automatically. Code updates deploy instantly without your needing to rebuild the worker image. +Unlike traditional Runpod (which requires you to build custom Docker images and write handler code) or (which require manual management and bill 24/7), Flash automatically handles infrastructure using simple Python decorators. -You can specify the exact hardware you need for each function, from RTX 4090s to A100 80GB GPUs, enabling you to optimize for cost and performance for AI inference, training, and other compute-intensive tasks. 
Serverless workers scale automatically based on demand and can run in parallel across multiple GPUs or CPUs. +With Flash, you write [remote functions](#remote-functions) in a local Python script. Run the script, and Flash provisions endpoints, installs dependencies, and scales GPU/CPU automatically. Code updates deploy instantly without you needing to rebuild/deploy the worker image—just run the script again. + +You can specify the [exact hardware](#resource-configuration) you need for each function, from RTX 4090s to A100 80GB GPUs, enabling you to optimize for cost and performance for AI inference, training, and other compute-intensive tasks. + +When you're ready to deploy your code to production, build a [Flash app](/flash/apps/overview) with a FastAPI server that routes requests between GPU/CPU workers automatically. The [Flash CLI](/flash/cli/overview) gives you full control over the app's development and deployment lifecycle. Flash uses the exact same per-second pricing model as [Runpod Serverless](/serverless/pricing). You're only charged for actual compute time—there are no costs when your code isn't running. diff --git a/snippets/tooltips.jsx b/snippets/tooltips.jsx index 1751ca50..ace6967f 100644 --- a/snippets/tooltips.jsx +++ b/snippets/tooltips.jsx @@ -83,7 +83,7 @@ export const WorkerTooltip = () => { export const WorkersTooltip = () => { return ( - worker + workers ); }; From 88d95144d98d54273a6cadc9fda3836376a65033 Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 20 Feb 2026 19:17:28 -0500 Subject: [PATCH 16/19] Add custom docker images guide --- docs.json | 1 + flash/custom-docker-images.mdx | 324 +++++++++++++++++++++++++++++++ flash/remote-functions.mdx | 82 ++++---- flash/resource-configuration.mdx | 26 ++- 4 files changed, 392 insertions(+), 41 deletions(-) create mode 100644 flash/custom-docker-images.mdx diff --git a/docs.json b/docs.json index 903bbcb7..4441b111 100644 --- a/docs.json +++ b/docs.json @@ -126,6 +126,7 @@ "flash/pricing", "flash/remote-functions", "flash/resource-configuration", + "flash/custom-docker-images", { "group": "Build apps", "pages": [ diff --git a/flash/custom-docker-images.mdx b/flash/custom-docker-images.mdx new file mode 100644 index 00000000..983b92d3 --- /dev/null +++ b/flash/custom-docker-images.mdx @@ -0,0 +1,324 @@ +--- +title: "Use custom Docker images with Flash" +sidebarTitle: "Custom Docker images" +description: "Deploy pre-built Docker images with Flash using ServerlessEndpoint." +tag: "BETA" +--- + +Flash's `LiveServerless` configuration handles most use cases by automatically managing dependencies and executing arbitrary Python code. However, for specialized environments that require custom Docker images—such as pre-built ML frameworks, specific CUDA versions, or system-level dependencies—you can use `ServerlessEndpoint` or `CpuServerlessEndpoint`. + +## When to use custom Docker images + +Use custom Docker images when you need: + +- **Pre-built inference servers**: vLLM, TensorRT-LLM, or other specialized serving frameworks. +- **System-level dependencies**: Custom CUDA versions, cuDNN, or system libraries not installable via `pip`. +- **Baked-in models**: Large models pre-downloaded in the image to avoid runtime downloads. +- **Existing Serverless workers**: You already have a working Runpod Serverless Docker image that you want to use with Flash. + + +For most use cases, you should use `LiveServerless` and [remote functions](/flash/remote-functions). 
It's simpler, faster, and lets you execute arbitrary Python code remotely. + + +## How it works + +Unlike `LiveServerless` (which delivers your Python code to pre-built Flash workers), you can use `ServerlessEndpoint` to create a traditional [Runpod Serverless endpoint](/serverless/overview) using any Docker image you specify. + + + +Here are the key differences between `ServerlessEndpoint` and `LiveServerless` resources: + +| Aspect | LiveServerless | ServerlessEndpoint | +|--------|---------------|-------------------| +| **Code execution** | Delivers Python code with each request | Uses the [handler function](/serverless/workers/handler-functions) in your Docker image | +| **Input format** | Any Python arguments | Dictionary: `{"input": {...}}` | +| **Docker image** | Pre-built Flash images | Your custom image | +| **Dependencies** | Specified in decorator | Baked into Docker image | +| **Use case** | Dynamic Python functions | Pre-built inference servers | + +## Basic usage + + + +Create a `ServerlessEndpoint` resource configuration pointing to your Docker image. For example: + +```python +from runpod_flash import ServerlessEndpoint, GpuGroup + +config = ServerlessEndpoint( + name="my-custom-worker", + imageName="your-registry/your-image:tag", + gpus=[GpuGroup.AMPERE_24], + workersMax=3 +) +``` + + + + +Call `.run()` with a dictionary payload in the format `{"input": {...}}`: + +```python +import asyncio +from runpod_flash import ServerlessEndpoint, GpuGroup, ResourceManager + +async def main(): + # Explicitly provision the endpoint if it doesn't already exist + manager = ResourceManager() + deployed_endpoint = await manager.get_or_deploy_resource(config) + + # Send a request to the endpoint + result = await config.run({ + "input": { + "prompt": "Your input data", + "param1": "value1" + } + }) + print(result) + +if __name__ == "__main__": + asyncio.run(main()) +``` + +**No `@remote` decorator is needed**. The endpoint will process the request using the [handler function](/serverless/workers/handler-functions) that's baked into your Docker image. + + + + +## Complete example: vLLM inference + +This example uses Runpod's official [vLLM worker](/serverless/vllm/overview) to deploy the `microsoft/Phi-3.5-mini-instruct` language model: + +```python title="vllm_example.py" +import asyncio +from runpod_flash import ServerlessEndpoint, GpuGroup, ResourceManager + +# Configure vLLM endpoint +vllm_config = ServerlessEndpoint( + name="vllm-small-model", + imageName="runpod/worker-vllm:stable-cuda12.1.0", + gpus=[GpuGroup.AMPERE_24], # RTX 4090 or similar (24GB) + workersMax=3, + env={ + "MODEL_NAME": "microsoft/Phi-3.5-mini-instruct", + "MAX_MODEL_LEN": "4096", + "GPU_MEMORY_UTILIZATION": "0.9", + "MAX_CONCURRENCY": "30", + } +) + +async def main(): + # Explicitly provision the endpoint if it doesn't exist + manager = ResourceManager() + deployed_endpoint = await manager.get_or_deploy_resource(vllm_config) + + print(f"Endpoint deployed at: {deployed_endpoint.endpoint_url}") + + # Generate text + result = await deployed_endpoint.run({ + "input": { + "prompt": "Explain quantum computing in simple terms:", + "max_tokens": 100, + "temperature": 0.7 + } + }) + + # Extract the generated text + text = result.output[0]['choices'][0]['tokens'][0] + print(f"\nGenerated text: {text}") + +if __name__ == "__main__": + asyncio.run(main()) +``` + +Here's what happens when you run this code: + + +1. 
**Resource configuration**: The `ServerlessEndpoint` configuration specifies the official Runpod [vLLM worker](/serverless/vllm/overview) Docker image and GPU requirements.
+2. **Environment variables**: Model and vLLM settings are configured via `env`.
+3. **Provisioning**: In `main()`, `ResourceManager.get_or_deploy_resource()` creates the endpoint if it doesn't already exist.
+4. **Request**: The input is sent as a dictionary via `.run()` to the deployed vLLM endpoint, matching the worker's expected input format.
+5. **Response**: The results are extracted from the nested response structure.
+
+## Available Docker images
+
+### Official Runpod workers
+
+Runpod provides pre-built worker images for common frameworks:
+
+| Framework | Image | Image link |
+|-----------|-------|---------------|
+| vLLM | `runpod/worker-vllm` | [Link](https://hub.docker.com/r/runpod/worker-vllm) |
+| Automatic1111 | `runpod/worker-a1111:stable` | [A1111 docs](/serverless/workers/sdxl-a1111) |
+| ComfyUI | `runpod/worker-comfyui` | [Link](https://hub.docker.com/r/runpod/worker-comfyui) |
+
+### Custom images
+
+To use your own Docker image:
+
+1. **Build a handler**: Follow the [Serverless handler guide](/serverless/workers/handler-functions).
+2. **Create a Dockerfile**: Package your handler with dependencies.
+3. **Push to registry**: Upload to Docker Hub, GitHub Container Registry, or Runpod's registry.
+4. **Use in Flash**: Reference the image in `imageName`.
+
+See [Deploy custom workers](/serverless/workers/deploy) for details.
+
+## Configuration options
+
+All parameters from `LiveServerless` are available:
+
+```python
+config = ServerlessEndpoint(
+    name="custom-worker",
+    imageName="your-registry/image:tag",  # Required
+    gpus=[GpuGroup.AMPERE_80],
+    workersMin=0,
+    workersMax=5,
+    idleTimeout=10,
+    env={
+        "MODEL_PATH": "/models/llama",
+        "MAX_BATCH_SIZE": "32"
+    },
+    networkVolumeId="vol_abc123",  # Optional: persistent storage
+    executionTimeoutMs=300000  # 5 minutes
+)
+```
+
+See the [resource configuration reference](/flash/resource-configuration) for all available options.
+
+## CPU endpoints
+
+For CPU workloads, use `CpuServerlessEndpoint`:
+
+```python
+from runpod_flash import CpuServerlessEndpoint, CpuInstanceType
+
+config = CpuServerlessEndpoint(
+    name="cpu-worker",
+    imageName="your-registry/cpu-worker:latest",
+    instanceIds=[CpuInstanceType.CPU5C_4_8]  # 4 vCPU, 8GB RAM
+)
+```
+
+## Environment variables
+
+Pass configuration to your Docker image via environment variables. For example:
+
+```python
+config = ServerlessEndpoint(
+    name="vllm-worker",
+    imageName="runpod/worker-vllm:stable-cuda12.1.0",
+    env={
+        "MODEL_NAME": "meta-llama/Llama-3.2-3B-Instruct",
+        "MAX_MODEL_LEN": "8192",
+        "HF_TOKEN": "hf_...",  # For gated models
+        "TRUST_REMOTE_CODE": "True"
+    }
+)
+```
+
+## Explicit provisioning
+
+If the endpoint doesn't already exist, you'll need to provision it before you can make requests. For example:
+
+```python
+from runpod_flash import ResourceManager
+
+async def main():
+    manager = ResourceManager()
+    deployed = await manager.get_or_deploy_resource(config)
+
+    print(f"Endpoint ID: {deployed.id}")
+    print(f"Endpoint URL: {deployed.endpoint_url}")
+
+    # Now make requests
+    result = await deployed.run({"input": {...}})
+```
+
+## Request/response format
+
+### Request structure
+
+All requests must use the format `{"input": {...}}`.
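+
+On the worker side, the Docker image's handler receives this payload as the job's `input` field. As a rough sketch only (the `prompt` key and the processing logic here are hypothetical placeholders; `runpod.serverless.start` and the `job["input"]` access pattern come from the standard [handler function](/serverless/workers/handler-functions) contract):
+
+```python
+import runpod
+
+def handler(job):
+    # job["input"] is the dictionary you passed to .run()
+    job_input = job["input"]
+    prompt = job_input.get("prompt", "")
+    # ... run your model or processing logic here ...
+    return {"output": f"processed: {prompt}"}
+
+# Register the handler and start the Serverless worker loop
+runpod.serverless.start({"handler": handler})
+```
+
+Whatever dictionary you pass to `.run()` arrives in the handler as `job["input"]`.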
For example: + +```python +{ + "input": { + # Your worker-specific parameters + "param1": "value1", + "param2": "value2" + } +} +``` + +### Response structure + +The response is a `JobOutput` object with these attributes: + +```python +result.id # Job ID +result.workerId # Worker that processed the request +result.status # COMPLETED, IN_PROGRESS, FAILED +result.delayTime # Queue delay in ms +result.executionTime # Execution time in ms +result.output # Worker response (structure varies by worker) +result.error # Error message if failed +``` + +Extract data from `result.output` based on your worker's output format. + + + +## Limitations + +- **Input format**: Only supports dictionary payloads `{"input": {...}}`. You cannot pass arbitrary Python arguments like with `LiveServerless`. +- **Code execution**: Cannot execute arbitrary Python code remotely. Your Docker image must include all logic. +- **@remote decorator**: The `@remote` decorator does not work with `ServerlessEndpoint`. Use `.run()` directly. +- **Handler required**: Your Docker image must implement a Runpod Serverless [handler function](/serverless/workers/handler-functions). + +## Troubleshooting + +### Endpoint fails to initialize + +**Problem**: Workers fail to start or crash immediately. + +**Solutions**: +- Check that your Docker image is compatible with [Runpod Serverless](/serverless/overview) +- Verify environment variables are correct +- Ensure the image includes a valid handler function +- Check worker logs in the Runpod console + +### Out of memory errors + +**Problem**: Workers crash with CUDA OOM or RAM errors. + +**Solutions**: +- Use a larger GPU: `gpus=[GpuGroup.AMPERE_80]` +- Reduce `GPU_MEMORY_UTILIZATION` (for vLLM/ML frameworks) +- Lower `MAX_MODEL_LEN` or batch size +- Reduce `workersMax` to limit parallel execution + +### Wrong response format + +**Problem**: Cannot extract data from `result.output`. + +**Solutions**: +- Check your worker's documentation for response format +- Print the full `result` to see the structure +- Look at worker logs for errors + +### Authentication errors + +**Problem**: Cannot download gated models or private images. + +**Solutions**: +- Add `HF_TOKEN` to `env` for Hugging Face gated models +- Configure Docker registry authentication in Runpod console for private images +- Verify API keys are correct + +## Next steps + +- [View the resource configuration reference](/flash/resource-configuration) for all `ServerlessEndpoint` options +- [Learn about vLLM deployment](/serverless/vllm/overview) for LLM inference +- [Build custom Serverless workers](/serverless/workers/overview) for specialized use cases +- [Create Flash apps](/flash/apps/build-app) combining custom images with FastAPI diff --git a/flash/remote-functions.mdx b/flash/remote-functions.mdx index dff3baca..bca83da0 100644 --- a/flash/remote-functions.mdx +++ b/flash/remote-functions.mdx @@ -7,14 +7,30 @@ tag: "BETA" Remote functions are the core building blocks of Flash. The `@remote` decorator marks Python functions for execution on Runpod's Serverless infrastructure, handling resource provisioning, dependency installation, and data transfer automatically. -## Resource configuration +## How remote functions work + +A remote function is just a Python function that's been marked with the `@remote` decorator. For example: -Every remote function requires a resource configuration that specifies the compute resources to use. Flash provides several configuration classes for different use cases. 
+```python +@remote(resource_config=config, dependencies=["torch"]) +def run_inference(data): + import torch + # Your inference code here + return result +``` -### LiveServerless +When you call a remote function from a local Python script or [Flash app](/flash/apps/overview), the function code is sent to a Runpod worker. The worker then executes the function code and returns the result to your local environment. + +## Resource configuration + +Every remote function requires a resource configuration that specifies the compute resources to use. `LiveServerless` is the primary configuration class for Flash. It supports full remote code execution, allowing you to run arbitrary Python functions on Runpod's infrastructure. +### GPU configuration + +For GPU workloads, create a `LiveServerless` configuration and specify the [GPU pool(s)](/references/gpu-types#gpu-pools) that your workers will use with the `gpus` parameter. + ```python from runpod_flash import LiveServerless, GpuGroup @@ -46,7 +62,7 @@ See the [resource configuration reference](/flash/resource-configuration) for al ### CPU configuration -For CPU-only workloads, specify `instanceIds` instead of `gpus`: +For CPU workloads, specify `instanceIds` instead of `gpus`: ```python from runpod_flash import LiveServerless, CpuInstanceType @@ -64,6 +80,13 @@ def process_data(data): return df.describe().to_dict() ``` +### Custom Docker images + +For specialized environments that require pre-built Docker images—such as vLLM, TensorRT, or images with custom system dependencies—you'll need to use the `ServerlessEndpoint` configuration. + +See [Custom Docker images](/flash/custom-docker-images) for details. + + ## Dependency management Specify Python packages in the `dependencies` parameter of the `@remote` decorator. Flash installs these packages on the remote worker before executing your function. @@ -80,33 +103,41 @@ def generate_image(prompt): # Your code here ``` -### Important notes about dependencies + +Some packages (like PyTorch) are pre-installed on GPU workers, but including them in dependencies ensures the correct version is available. + + -**Import inside the function**: Always import packages inside the decorated function body, not at the top of your file. These imports need to happen on the remote worker, not in your local environment. +### Import packages inside the function body +You must import packages **inside the decorated function body,** not at the top of your file. This will ensure the imports happen on the remote worker, not in your local environment. + + +**Correct:** imports inside the function. ```python -# Correct - imports inside the function @remote(resource_config=config, dependencies=["numpy"]) def compute(data): import numpy as np # Import here return np.sum(data) +``` +**Incorrect:** imports at top of file won't work. -# Incorrect - imports at top of file won't work +```python import numpy as np # This import happens locally, not on the worker @remote(resource_config=config, dependencies=["numpy"]) def compute(data): - return np.sum(data) # numpy not available on worker + return np.sum(data) # numpy not available on the remote worker ``` -**Version pinning**: You can pin specific versions using standard pip syntax: +### Version pinning + +You can pin specific versions using standard pip syntax: ```python dependencies=["transformers==4.36.0", "torch>=2.0.0"] ``` -**Pre-installed packages**: Some packages (like PyTorch) are pre-installed on GPU workers. 
Including them in dependencies ensures the correct version is available. - ## Parallel execution Flash functions are asynchronous by default. Use Python's `asyncio` to run multiple functions in parallel: @@ -167,32 +198,6 @@ if __name__ == "__main__": asyncio.run(main()) ``` -## Custom Docker images - -For specialized environments that require a custom Docker image, use `ServerlessEndpoint` or `CpuServerlessEndpoint` instead of `LiveServerless`: - -```python -from runpod_flash import ServerlessEndpoint, GpuGroup - -custom_gpu = ServerlessEndpoint( - name="custom-ml-env", - imageName="pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime", - gpus=[GpuGroup.AMPERE_80] -) -``` - - - -Unlike `LiveServerless`, `ServerlessEndpoint` and `CpuServerlessEndpoint` only support dictionary payloads in the form of `{"input": {...}}` (similar to a traditional [Serverless endpoint request](/serverless/endpoints/send-requests)). They cannot execute arbitrary Python functions remotely. - - - -Use custom Docker images when you need: - -- Pre-installed system-level dependencies. -- Specific CUDA or cuDNN versions. -- Custom base images with large models baked in. - ## Using persistent storage Attach [network volumes](/storage/network-volumes) for persistent storage across workers and endpoints. This is useful for sharing large models or datasets between workers without downloading them each time. @@ -255,6 +260,7 @@ Environment variables are excluded from configuration hashing. Changing environm + ## Next steps - [Create API endpoints](/flash/apps/build-app) using FastAPI. diff --git a/flash/resource-configuration.mdx b/flash/resource-configuration.mdx index 00bb1710..8048800c 100644 --- a/flash/resource-configuration.mdx +++ b/flash/resource-configuration.mdx @@ -74,9 +74,16 @@ config = LiveServerless( ## ServerlessEndpoint -`ServerlessEndpoint` is for GPU workloads that require custom Docker images. Unlike `LiveServerless`, it only supports dictionary payloads and cannot execute arbitrary Python functions. +`ServerlessEndpoint` is for GPU workloads that require custom Docker images. -```python +These resources work similarly to [traditional Serverless endpoints](/serverless/overview). Before you can run your function, you'll need to: +- Write a [handler function](/serverless/workers/handler-functions) that processes the input dictionary. +- [Create a Dockerfile](/serverless/workers/create-dockerfile) that packages your handler function and its dependencies. +- [Push the image to a container registry](/serverless/workers/deploy). + +You'll then add the image name to your resource configuration: + +```python highlight="5" from runpod_flash import ServerlessEndpoint, GpuGroup config = ServerlessEndpoint( @@ -86,6 +93,19 @@ config = ServerlessEndpoint( ) ``` + +### Request structure + +When you make requests to the endpoint, you'll need to provide the input as a dictionary in the form of `{"input": {...}}`. For example: + +```json +{ + "input": { + "prompt": "Hello, world!" + } +} +``` + ### Parameters All parameters from `LiveServerless` are available, plus: @@ -98,7 +118,7 @@ All parameters from `LiveServerless` are available, plus: - Only supports dictionary payloads in the form of `{"input": {...}}`. - Cannot execute arbitrary Python functions remotely. -- Requires a custom Docker image with a handler that processes the input dictionary. +- Requires a custom Docker image with a [handler function](/serverless/workers/handler-functions) that processes the input dictionary. 
### Example From 2a6297835fb9482ca63225414c554ff6939293b9 Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 20 Feb 2026 19:19:29 -0500 Subject: [PATCH 17/19] Update "why use flash" --- flash/overview.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/flash/overview.mdx b/flash/overview.mdx index 6fd4c526..1a3a5414 100644 --- a/flash/overview.mdx +++ b/flash/overview.mdx @@ -28,11 +28,11 @@ Flash is a Python SDK for developing and deploying AI workflows on [Runpod Serve Unlike traditional Runpod (which requires you to build custom Docker images and write handler code) or (which require manual management and bill 24/7), Flash automatically handles infrastructure using simple Python decorators. -With Flash, you write [remote functions](#remote-functions) in a local Python script. Run the script, and Flash provisions endpoints, installs dependencies, and scales GPU/CPU automatically. Code updates deploy instantly without you needing to rebuild/deploy the worker image—just run the script again. +With Flash, you write [remote functions](#remote-functions) in a local Python script. Run the script, and Flash provisions endpoints, installs dependencies, and scales GPU/CPU automatically. When you update your code, the changes are deployed instantly without requiring you to rebuild/redeploy the worker image—just run the script again. You can specify the [exact hardware](#resource-configuration) you need for each function, from RTX 4090s to A100 80GB GPUs, enabling you to optimize for cost and performance for AI inference, training, and other compute-intensive tasks. -When you're ready to deploy your code to production, build a [Flash app](/flash/apps/overview) with a FastAPI server that routes requests between GPU/CPU workers automatically. The [Flash CLI](/flash/cli/overview) gives you full control over the app's development and deployment lifecycle. +When you're ready to deploy your code to production, build a [Flash app](/flash/apps/overview) with a FastAPI server to route requests between GPU/CPU workers. The [Flash CLI](/flash/cli/overview) gives you full control over the app's development and deployment lifecycle. Flash uses the exact same per-second pricing model as [Runpod Serverless](/serverless/pricing). You're only charged for actual compute time—there are no costs when your code isn't running. From 822b7bda90db4e4447ac4699a6c4f4e5af6ef864 Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 20 Feb 2026 19:26:22 -0500 Subject: [PATCH 18/19] Update --- flash/overview.mdx | 8 ++++++-- flash/remote-functions.mdx | 2 +- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/flash/overview.mdx b/flash/overview.mdx index 1a3a5414..cb4eb85c 100644 --- a/flash/overview.mdx +++ b/flash/overview.mdx @@ -28,7 +28,9 @@ Flash is a Python SDK for developing and deploying AI workflows on [Runpod Serve Unlike traditional Runpod (which requires you to build custom Docker images and write handler code) or (which require manual management and bill 24/7), Flash automatically handles infrastructure using simple Python decorators. -With Flash, you write [remote functions](#remote-functions) in a local Python script. Run the script, and Flash provisions endpoints, installs dependencies, and scales GPU/CPU automatically. When you update your code, the changes are deployed instantly without requiring you to rebuild/redeploy the worker image—just run the script again. +With Flash, you write [remote functions](#remote-functions) using local Python scripts. 
Run the script, and Flash provisions endpoints, installs dependencies, and scales GPU/CPU automatically.
+
+When you update your code, the changes are deployed instantly without requiring you to rebuild/redeploy the worker image—just run the script again.
 
 You can specify the [exact hardware](#resource-configuration) you need for each function, from RTX 4090s to A100 80GB GPUs, enabling you to optimize for cost and performance for AI inference, training, and other compute-intensive tasks.
 
@@ -97,7 +99,7 @@ gpu_config = LiveServerless(
 )
 ```
 
-[View the complete configuration reference](/flash/resource-configuration).
+[View the complete resource configuration reference](/flash/resource-configuration).
 
 ### Dependency management
 
@@ -116,6 +118,8 @@ def generate_image(prompt):
 
 Imports should be placed inside the function body because they need to happen on the remote worker, not in your local environment.
 
+[Learn more about dependency management](/flash/remote-functions#dependency-management).
+
 ### Parallel execution
 
 Run multiple remote functions concurrently using Python's async capabilities:
diff --git a/flash/remote-functions.mdx b/flash/remote-functions.mdx
index bca83da0..7a9f0772 100644
--- a/flash/remote-functions.mdx
+++ b/flash/remote-functions.mdx
@@ -19,7 +19,7 @@ def run_inference(data):
     return result
 ```
 
-When you call a remote function from a local Python script or [Flash app](/flash/apps/overview), the function code is sent to a Runpod worker. The worker then executes the function code and returns the result to your local environment.
+When you call a remote function from a local Python script or [Flash app](/flash/apps/overview), the function code is sent to a Runpod worker. The worker executes the function code and returns the result to your local environment.
 
 ## Resource configuration
 

From 222fb0564f3830567dfbc7df8334cd99294ff0e3 Mon Sep 17 00:00:00 2001
From: Mo King
Date: Fri, 20 Feb 2026 19:27:40 -0500
Subject: [PATCH 19/19] Update

---
 flash/remote-functions.mdx | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/flash/remote-functions.mdx b/flash/remote-functions.mdx
index 7a9f0772..f1aef608 100644
--- a/flash/remote-functions.mdx
+++ b/flash/remote-functions.mdx
@@ -18,7 +18,7 @@ def run_inference(data):
     # Your inference code here
     return result
 ```
-
+
 When you call a remote function from a local Python script or [Flash app](/flash/apps/overview), the function code is sent to a Runpod worker. The worker executes the function code and returns the result to your local environment.
 
 ## Resource configuration
 
@@ -48,12 +48,12 @@ def run_inference(data):
     return result
 ```
 
-Common configuration options:
+Here are the common configuration options for `LiveServerless`:
 
 | Parameter | Description | Default |
 |-----------|-------------|---------|
 | `name` | Name for your endpoint (required) | - |
-| `gpus` | GPU pool IDs that can be used | `[GpuGroup.ANY]` |
+| `gpus` | [GPU pool IDs](/references/gpu-types#gpu-pools) that can be used by workers | `[GpuGroup.ANY]` |
 | `workersMax` | Maximum number of workers | 3 |
 | `workersMin` | Minimum number of workers | 0 |
 | `idleTimeout` | Minutes before scaling down | 5 |