+```
+
+### Flags
+
+
+Name of the Flash app to delete. Required explicitly for safety.
+
+
+
+
+Unlike other subcommands, `delete` requires the `--app` flag explicitly. This is a safety measure for destructive operations.
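+
+For example, assuming an app named `my-project` (the name is illustrative), the deletion command looks like this:
+
+```bash
+flash app delete --app my-project
+```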
+
+
+
+### Process
+
+1. Shows app details and resources to be deleted.
+2. Prompts for confirmation (required).
+3. Deletes all environments and their resources.
+4. Deletes all builds.
+5. Deletes the app.
+
+
+
+This operation is irreversible. All environments, builds, endpoints, volumes, and configuration will be permanently deleted.
+
+
+
+---
+
+## App hierarchy
+
+A Flash app contains environments and builds:
+
+```text
+Flash App (my-project)
+│
+├── Environments
+│   ├── dev
+│   │   ├── Endpoints (ep1, ep2)
+│   │   └── Volumes (vol1)
+│   ├── staging
+│   │   ├── Endpoints (ep1, ep2)
+│   │   └── Volumes (vol1)
+│   └── production
+│       ├── Endpoints (ep1, ep2)
+│       └── Volumes (vol1)
+│
+└── Builds
+    ├── build_v1 (2024-01-15)
+    ├── build_v2 (2024-01-18)
+    └── build_v3 (2024-01-20)
+```
+
+## Auto-detection
+
+Flash CLI automatically detects the app name from your current directory:
+
+```bash
+cd /path/to/APP_NAME
+flash deploy # Deploys to 'APP_NAME' app
+flash env list # Lists 'APP_NAME' environments
+```
+
+Override with the `--app` flag:
+
+```bash
+flash deploy --app other-project
+flash env list --app other-project
+```
+
+## Related commands
+
+- [`flash env`](/flash/cli/env) - Manage environments within an app
+- [`flash deploy`](/flash/cli/deploy) - Deploy to an app's environment
+- [`flash init`](/flash/cli/init) - Create a new project
diff --git a/flash/cli/build.mdx b/flash/cli/build.mdx
new file mode 100644
index 00000000..fb6da58f
--- /dev/null
+++ b/flash/cli/build.mdx
@@ -0,0 +1,184 @@
+---
+title: "build"
+sidebarTitle: "build"
+---
+
+Build a deployment-ready artifact for your Flash application without deploying. Use this for more control over the build process or to inspect the artifact before deploying.
+
+```bash
+flash build [OPTIONS]
+```
+
+## Examples
+
+Build with all dependencies:
+
+```bash
+flash build
+```
+
+Build and launch local preview environment:
+
+```bash
+flash build --preview
+```
+
+Build with excluded packages (for smaller deployment size):
+
+```bash
+flash build --exclude torch,torchvision,torchaudio
+```
+
+Keep the build directory for inspection:
+
+```bash
+flash build --keep-build
+```
+
+## Flags
+
+
+Skip transitive dependencies during pip install. Only installs direct dependencies specified in `@remote` decorators. Useful when the base image already includes dependencies.
+
+
+
+Keep the `.flash/.build` directory after creating the archive. Useful for debugging build issues or inspecting generated files.
+
+
+
+Custom name for the output archive file.
+
+
+
+Comma-separated list of packages to exclude from the build (e.g., `torch,torchvision`). Use this to skip packages already in the base image.
+
+
+
+Launch a local Docker-based test environment after building. Automatically enables `--keep-build`.
+
+
+## What happens during build
+
+1. **Function discovery**: Finds all `@remote` decorated functions.
+2. **Grouping**: Groups functions by their `resource_config`.
+3. **Manifest generation**: Creates `.flash/flash_manifest.json` with endpoint definitions.
+4. **Dependency installation**: Installs Python packages for Linux x86_64.
+5. **Packaging**: Bundles everything into `.flash/artifact.tar.gz`.
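+
+As an illustration of steps 1 and 2, the sketch below (function and config names are hypothetical) shows how functions that share a `resource_config` are grouped into a single endpoint, while a function with its own config becomes a separate endpoint:
+
+```python
+from runpod_flash import remote, LiveServerless, GpuGroup
+
+# Hypothetical configs for illustration.
+gpu_config = LiveServerless(name="gpu-worker", gpus=[GpuGroup.ADA_24])
+cpu_config = LiveServerless(name="cpu-worker")
+
+@remote(resource_config=gpu_config, dependencies=["torch"])
+def embed(text):
+    import torch
+    return {"embedding": []}
+
+@remote(resource_config=gpu_config, dependencies=["torch"])
+def generate(prompt):
+    import torch
+    return {"output": prompt}
+
+@remote(resource_config=cpu_config)
+def postprocess(result):
+    return {"cleaned": result}
+
+# embed and generate share gpu_config, so they are grouped into one endpoint;
+# postprocess gets its own endpoint from cpu_config.
+```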
+
+## Build artifacts
+
+After running `flash build`:
+
+| File/Directory | Description |
+|----------------|-------------|
+| `.flash/artifact.tar.gz` | Deployment package ready for Runpod |
+| `.flash/flash_manifest.json` | Service discovery configuration |
+| `.flash/.build/` | Temporary build directory (removed unless `--keep-build`) |
+
+## Cross-platform builds
+
+Flash automatically handles cross-platform builds:
+
+- **Automatic platform targeting**: Dependencies are installed for Linux x86_64, regardless of your build platform.
+- **Python version matching**: Uses your current Python version for package compatibility.
+- **Binary wheel enforcement**: Only pre-built wheels are used, preventing compilation issues.
+
+You can build on macOS, Windows, or Linux, and the deployment will work on Runpod.
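+
+Conceptually, this is similar to asking pip for Linux binary wheels yourself. The command below only illustrates that idea; it is not the exact invocation Flash runs:
+
+```bash
+# Illustration: install Linux x86_64 binary wheels into a local directory
+pip install \
+  --platform manylinux2014_x86_64 \
+  --only-binary=:all: \
+  --target ./build-deps \
+  -r requirements.txt
+```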
+
+## Managing deployment size
+
+Runpod Serverless has a **500MB deployment limit**. Use `--exclude` to skip packages already in your base image:
+
+```bash
+# For GPU deployments (PyTorch pre-installed)
+flash build --exclude torch,torchvision,torchaudio
+```
+
+### Base image reference
+
+| Resource type | Base image | Safe to exclude |
+|--------------|------------|-----------------|
+| GPU | PyTorch base | `torch`, `torchvision`, `torchaudio` |
+| CPU | Python slim | Do not exclude ML packages |
+
+
+
+Check the [worker-flash repository](https://github.com/runpod-workers/worker-flash) for current base images and pre-installed packages.
+
+
+
+## Preview environment
+
+Test your deployment locally before pushing to Runpod:
+
+```bash
+flash build --preview
+```
+
+This:
+
+1. Builds your project (creates archive and manifest).
+2. Creates a Docker network for inter-container communication.
+3. Starts one container per resource config (mothership + workers).
+4. Exposes the mothership on `localhost:8000`.
+5. On shutdown (`Ctrl+C`), stops and removes all containers.
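+
+Once the preview is running, you can exercise the mothership locally. The route shown below is an example; substitute one of your own app's routes:
+
+```bash
+curl -X POST http://localhost:8000/api/hello \
+  -H "Content-Type: application/json" \
+  -d '{"message": "preview test"}'
+```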
+
+### When to use preview
+
+- Test deployment configuration before production.
+- Validate manifest structure.
+- Debug resource provisioning.
+- Verify cross-endpoint function calls.
+
+## Troubleshooting
+
+### Build fails with "functions not found"
+
+Ensure your project has `@remote` decorated functions:
+
+```python
+from runpod_flash import remote, LiveServerless
+
+config = LiveServerless(name="my-worker")
+
+@remote(resource_config=config)
+def my_function(data):
+ return {"result": data}
+```
+
+### Archive is too large
+
+Use `--exclude` or `--no-deps`:
+
+```bash
+flash build --exclude torch,torchvision,torchaudio
+```
+
+### Dependency installation fails
+
+If a package doesn't have Linux x86_64 wheels:
+
+1. Ensure standard pip is installed: `python -m ensurepip --upgrade`
+2. Check PyPI for Linux wheel availability.
+3. For Python 3.13+, some packages may require newer manylinux versions.
+
+### Need to examine generated files
+
+Use `--keep-build`:
+
+```bash
+flash build --keep-build
+ls .flash/.build/
+```
+
+## Related commands
+
+- [`flash deploy`](/flash/cli/deploy) - Build and deploy in one step
+- [`flash run`](/flash/cli/run) - Start development server
+- [`flash env`](/flash/cli/env) - Manage environments
+
+
+
+Most users should use `flash deploy` instead, which runs build and deploy in one step. Use `flash build` when you need more control or want to inspect the artifact.
+
+
diff --git a/flash/cli/deploy.mdx b/flash/cli/deploy.mdx
new file mode 100644
index 00000000..bd4224fa
--- /dev/null
+++ b/flash/cli/deploy.mdx
@@ -0,0 +1,247 @@
+---
+title: "deploy"
+sidebarTitle: "deploy"
+---
+
+Build and deploy your Flash application to Runpod Serverless endpoints in one step. This is the primary command for getting your application running in the cloud.
+
+```bash
+flash deploy [OPTIONS]
+```
+
+## Examples
+
+Build and deploy a Flash app from the current directory (auto-selects environment if only one exists):
+
+```bash
+flash deploy
+```
+
+Deploy to a specific environment:
+
+```bash
+flash deploy --env production
+```
+
+Deploy with excluded packages to reduce size:
+
+```bash
+flash deploy --exclude torch,torchvision,torchaudio
+```
+
+Build and test locally before deploying:
+
+```bash
+flash deploy --preview
+```
+
+## Flags
+
+
+Target environment name (e.g., `dev`, `staging`, `production`). Auto-selected if only one exists. Creates the environment if it doesn't exist.
+
+
+
+Flash app name. Auto-detected from the current directory if not specified.
+
+
+
+Skip transitive dependencies during pip install. Useful when the base image already includes dependencies.
+
+
+
+Comma-separated packages to exclude (e.g., `torch,torchvision`). Use this to stay under the 500MB deployment limit.
+
+
+
+Custom archive name for the build artifact.
+
+
+
+Build and launch a local Docker-based preview environment instead of deploying to Runpod.
+
+
+
+Bundle local `runpod_flash` source instead of the PyPI version. For development and testing only.
+
+
+## What happens during deployment
+
+1. **Build phase**: Creates the deployment artifact (same as `flash build`).
+2. **Environment resolution**: Detects or creates the target environment.
+3. **Upload**: Sends the artifact to Runpod storage.
+4. **Provisioning**: Creates or updates Serverless endpoints.
+5. **Configuration**: Sets up environment variables and service discovery.
+6. **Verification**: Confirms endpoints are healthy.
+
+## Architecture
+
+After deployment, your entire application runs on Runpod Serverless:
+
+
+```mermaid
+%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%%
+
+flowchart TB
+ Users(["USERS"])
+
+ subgraph Runpod ["RUNPOD SERVERLESS"]
+ Mothership["MOTHERSHIP ENDPOINT
+(your FastAPI app from main.py)
+• Your HTTP routes
+• Orchestrates @remote calls
+• Public URL for users"]
+ GPU["gpu-worker
+(your @remote function)"]
+ CPU["cpu-worker
+(your @remote function)"]
+
+ Mothership -->|"internal"| GPU
+ Mothership -->|"internal"| CPU
+ end
+
+ Users -->|"HTTPS (authenticated)"| Mothership
+
+ style Runpod fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff
+ style Users fill:#4D38F5,stroke:#4D38F5,color:#fff
+ style Mothership fill:#5F4CFE,stroke:#5F4CFE,color:#fff
+ style GPU fill:#22C55E,stroke:#22C55E,color:#000
+ style CPU fill:#22C55E,stroke:#22C55E,color:#000
+```
+
+
+## Environment management
+
+### Automatic creation
+
+If the specified environment doesn't exist, `flash deploy` creates it:
+
+```bash
+# Creates 'staging' if it doesn't exist
+flash deploy --env staging
+```
+
+### Auto-selection
+
+When you have only one environment, it's selected automatically:
+
+```bash
+# Auto-selects the only available environment
+flash deploy
+```
+
+When multiple environments exist, you must specify one:
+
+```bash
+# Required when multiple environments exist
+flash deploy --env staging
+```
+
+### Default environment
+
+If no environment exists and none is specified, Flash creates a `production` environment by default.
+
+## Post-deployment
+
+After successful deployment, Flash displays:
+
+```text
+✓ Deployment Complete
+
+Your mothership is deployed at:
+https://api-xxxxx.runpod.net
+
+Available Routes:
+POST /api/hello
+POST /gpu/process
+
+All endpoints require authentication:
+curl -X POST https://api-xxxxx.runpod.net/api/hello \
+ -H "Authorization: Bearer $RUNPOD_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"param": "value"}'
+```
+
+### Authentication
+
+All deployed endpoints require authentication with your Runpod API key:
+
+```bash
+export RUNPOD_API_KEY="your_key_here"
+
+curl -X POST https://YOUR_ENDPOINT_URL/path \
+ -H "Authorization: Bearer $RUNPOD_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"param": "value"}'
+```
+
+## Preview mode
+
+Test locally before deploying:
+
+```bash
+flash deploy --preview
+```
+
+This builds your project and runs it in Docker containers locally:
+
+- Mothership exposed on `localhost:8000`.
+- All containers communicate via Docker network.
+- Press `Ctrl+C` to stop.
+
+## Managing deployment size
+
+Runpod Serverless has a **500MB limit**. Use `--exclude` to skip packages in the base image:
+
+```bash
+# GPU deployments (PyTorch pre-installed)
+flash deploy --exclude torch,torchvision,torchaudio
+```
+
+| Resource type | Safe to exclude |
+|--------------|-----------------|
+| GPU | `torch`, `torchvision`, `torchaudio` |
+| CPU | Do not exclude ML packages |
+
+## flash run vs flash deploy
+
+| Aspect | `flash run` | `flash deploy` |
+|--------|-------------|----------------|
+| FastAPI app runs on | Your machine | Runpod Serverless |
+| `@remote` functions run on | Runpod Serverless | Runpod Serverless |
+| Endpoint naming | `live-` prefix | No prefix |
+| Automatic updates | Yes | No |
+| Use case | Development | Production |
+
+## Troubleshooting
+
+### Multiple environments error
+
+```text
+Error: Multiple environments found: dev, staging, production
+```
+
+Specify the target environment:
+
+```bash
+flash deploy --env staging
+```
+
+### Deployment size limit
+
+Use `--exclude` to reduce size:
+
+```bash
+flash deploy --exclude torch,torchvision,torchaudio
+```
+
+### Authentication fails
+
+Ensure your API key is set:
+
+```bash
+echo $RUNPOD_API_KEY
+export RUNPOD_API_KEY="your_key_here"
+```
+
+## Related commands
+
+- [`flash build`](/flash/cli/build) - Build without deploying
+- [`flash run`](/flash/cli/run) - Local development server
+- [`flash env`](/flash/cli/env) - Manage environments
+- [`flash app`](/flash/cli/app) - Manage applications
+- [`flash undeploy`](/flash/cli/undeploy) - Remove endpoints
diff --git a/flash/cli/env.mdx b/flash/cli/env.mdx
new file mode 100644
index 00000000..7d4494ba
--- /dev/null
+++ b/flash/cli/env.mdx
@@ -0,0 +1,255 @@
+---
+title: "env"
+sidebarTitle: "env"
+---
+
+Manage deployment environments for Flash applications. Environments are isolated deployment contexts (like `dev`, `staging`, `production`) within a Flash app.
+
+```bash Command
+flash env [OPTIONS]
+```
+
+## Subcommands
+
+| Subcommand | Description |
+|------------|-------------|
+| `list` | Show all environments for an app |
+| `create` | Create a new environment |
+| `get` | Show details of an environment |
+| `delete` | Delete an environment and its resources |
+
+---
+
+## env list
+
+Show all available environments for an app.
+
+```bash Command
+flash env list [OPTIONS]
+```
+
+### Example
+
+```bash
+# List environments for current app
+flash env list
+
+# List environments for specific app
+flash env list --app APP_NAME
+```
+
+### Flags
+
+
+Flash app name. Auto-detected from current directory if not specified.
+
+
+### Output
+
+```text
+┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
+┃ Name       ┃ ID                  ┃ Active Build      ┃ Created At       ┃
+┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
+│ dev        │ env_abc123          │ build_xyz789      │ 2024-01-15 10:30 │
+│ staging    │ env_def456          │ build_uvw456      │ 2024-01-16 14:20 │
+│ production │ env_ghi789          │ build_rst123      │ 2024-01-20 09:15 │
+└────────────┴─────────────────────┴───────────────────┴──────────────────┘
+```
+
+---
+
+## env create
+
+Create a new deployment environment.
+
+```bash Command
+flash env create [OPTIONS]
+```
+
+### Example
+
+```bash
+# Create staging environment
+flash env create staging
+
+# Create environment in specific app
+flash env create production --app APP_NAME
+```
+
+### Arguments
+
+
+Name for the new environment (e.g., `dev`, `staging`, `production`).
+
+
+### Flags
+
+
+Flash app name. Auto-detected from current directory if not specified.
+
+
+### Notes
+
+- If the app doesn't exist, it's created automatically.
+- Environment names must be unique within an app.
+- Newly created environments have no active build until first deployment.
+
+
+
+You don't always need to create environments explicitly. Running `flash deploy --env ENVIRONMENT_NAME` creates the environment automatically if it doesn't exist.
+
+
+
+---
+
+## env get
+
+Show detailed information about a deployment environment.
+
+```bash Command
+flash env get [OPTIONS]
+```
+
+### Example
+
+```bash
+# Get details for production environment
+flash env get production
+
+# Get details for specific app's environment
+flash env get staging --app APP_NAME
+```
+
+### Arguments
+
+
+Name of the environment to inspect.
+
+
+### Flags
+
+
+Flash app name. Auto-detected from current directory if not specified.
+
+
+### Output
+
+```text
+╭────────────────────────────────────╮
+│ Environment: production │
+├────────────────────────────────────┤
+│ ID: env_ghi789 │
+│ State: DEPLOYED │
+│ Active Build: build_rst123 │
+│ Created: 2024-01-20 09:15:00 │
+╰────────────────────────────────────╯
+
+ Associated Endpoints
+┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
+┃ Name ┃ ID ┃
+┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
+│ my-gpu │ ep_abc123 │
+│ my-cpu │ ep_def456 │
+└────────────────┴────────────────────┘
+
+ Associated Network Volumes
+┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
+┃ Name ┃ ID ┃
+┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
+│ model-cache │ nv_xyz789 │
+└────────────────┴────────────────────┘
+```
+
+---
+
+## env delete
+
+Delete a deployment environment and all its associated resources.
+
+```bash Command
+flash env delete [OPTIONS]
+```
+
+### Examples
+
+```bash
+# Delete development environment
+flash env delete dev
+
+# Delete environment in specific app
+flash env delete staging --app APP_NAME
+```
+
+### Arguments
+
+
+Name of the environment to delete.
+
+
+### Flags
+
+
+Flash app name. Auto-detected from current directory if not specified.
+
+
+### Process
+
+1. Shows environment details and resources to be deleted.
+2. Prompts for confirmation (required).
+3. Undeploys all associated endpoints.
+4. Removes all associated network volumes.
+5. Deletes the environment from the app.
+
+
+
+This operation is irreversible. All endpoints, volumes, and configuration associated with the environment will be permanently deleted.
+
+
+
+---
+
+## Environment states
+
+| State | Description |
+|-------|-------------|
+| PENDING | Environment created but not deployed |
+| DEPLOYING | Deployment in progress |
+| DEPLOYED | Successfully deployed and running |
+| FAILED | Deployment or health check failed |
+| DELETING | Deletion in progress |
+
+## Common workflows
+
+### Three-tier deployment
+
+```bash
+# Create environments
+flash env create dev
+flash env create staging
+flash env create production
+
+# Deploy to each
+flash deploy --env dev
+flash deploy --env staging
+flash deploy --env production
+```
+
+### Feature branch testing
+
+```bash
+# Create feature environment
+flash env create FEATURE_NAME
+
+# Deploy feature branch
+git checkout FEATURE_NAME
+flash deploy --env FEATURE_NAME
+
+# Clean up after merge
+flash env delete FEATURE_NAME
+```
+
+## Related commands
+
+- [`flash deploy`](/flash/cli/deploy) - Deploy to an environment
+- [`flash app`](/flash/cli/app) - Manage applications
+- [`flash undeploy`](/flash/cli/undeploy) - Remove specific endpoints
diff --git a/flash/cli/init.mdx b/flash/cli/init.mdx
new file mode 100644
index 00000000..12f93b93
--- /dev/null
+++ b/flash/cli/init.mdx
@@ -0,0 +1,89 @@
+---
+title: "init"
+sidebarTitle: "init"
+---
+
+Create a new Flash project with a ready-to-use template structure including a FastAPI server, example GPU and CPU workers, and configuration files.
+
+```bash
+flash init [PROJECT_NAME] [OPTIONS]
+```
+
+## Example
+
+Create a new project directory:
+
+```bash
+flash init PROJECT_NAME
+cd PROJECT_NAME
+pip install -r requirements.txt
+flash run
+```
+
+Initialize in the current directory:
+
+```bash
+flash init .
+```
+
+## Arguments
+
+
+Name of the project directory to create. If omitted or set to `.`, initializes in the current directory.
+
+
+## Flags
+
+
+Overwrite existing files if they already exist in the target directory.
+
+
+## What it creates
+
+The command creates the following project structure:
+
+```text
+PROJECT_NAME/
+├── main.py              # FastAPI application entry point
+├── workers/
+│   ├── gpu/             # GPU worker example
+│   │   ├── __init__.py
+│   │   └── endpoint.py
+│   └── cpu/             # CPU worker example
+│       ├── __init__.py
+│       └── endpoint.py
+├── .env                 # Environment variables template
+├── .gitignore           # Git ignore patterns
+├── .flashignore         # Flash deployment ignore patterns
+├── requirements.txt     # Python dependencies
+└── README.md            # Project documentation
+```
+
+### Template contents
+
+- **main.py**: FastAPI application that imports routers from the `workers/` directory.
+- **workers/gpu/endpoint.py**: Example GPU worker with a `@remote` decorated function using `LiveServerless`.
+- **workers/cpu/endpoint.py**: Example CPU worker with a `@remote` decorated function using CPU configuration.
+- **.env**: Template for environment variables including `RUNPOD_API_KEY`.
+
+## Next steps
+
+After initialization:
+
+1. Copy `.env.example` to `.env` (if needed) and add your `RUNPOD_API_KEY`.
+2. Install dependencies: `pip install -r requirements.txt`
+3. Start the development server: `flash run`
+4. Open http://localhost:8888/docs to explore the API.
+5. Customize the workers for your use case.
+6. Deploy with `flash deploy` when ready.
+
+
+
+This command only creates local files. It doesn't interact with Runpod or create any cloud resources. Cloud resources are created when you run `flash run` or `flash deploy`.
+
+
+
+## Related commands
+
+- [`flash run`](/flash/cli/run) - Start the development server
+- [`flash deploy`](/flash/cli/deploy) - Build and deploy to Runpod
diff --git a/flash/cli/overview.mdx b/flash/cli/overview.mdx
new file mode 100644
index 00000000..6f1b0d66
--- /dev/null
+++ b/flash/cli/overview.mdx
@@ -0,0 +1,121 @@
+---
+title: "CLI overview"
+sidebarTitle: "Overview"
+description: "Learn how to use the Flash CLI for local development and deployment."
+---
+
+The Flash CLI provides commands for initializing projects, running local development servers, building deployment artifacts, and managing your applications on Runpod Serverless.
+
+## Install Flash
+
+Create a Python virtual environment and install Flash using pip:
+
+```bash
+python3 -m venv venv
+source venv/bin/activate
+pip install runpod-flash
+```
+
+## Configure your API key
+
+Flash requires a Runpod API key to provision and manage Serverless endpoints. Create a `.env` file in your project directory:
+
+```bash
+echo "RUNPOD_API_KEY=your_api_key_here" > .env
+```
+
+You can also set the API key as an environment variable (use `export` on macOS/Linux, or `set` in the Windows Command Prompt):
+
+
+
+```bash
+export RUNPOD_API_KEY=your_api_key_here
+```
+
+
+```bash
+set RUNPOD_API_KEY=your_api_key_here
+```
+
+
+
+## Available commands
+
+| Command | Description |
+|---------|-------------|
+| [`flash init`](/flash/cli/init) | Create a new Flash project with a template structure |
+| [`flash run`](/flash/cli/run) | Start the local development server with automatic updates |
+| [`flash build`](/flash/cli/build) | Build a deployment artifact without deploying |
+| [`flash deploy`](/flash/cli/deploy) | Build and deploy your application to Runpod |
+| [`flash env`](/flash/cli/env) | Manage deployment environments |
+| [`flash app`](/flash/cli/app) | Manage Flash applications |
+| [`flash undeploy`](/flash/cli/undeploy) | Remove deployed endpoints |
+
+## Getting help
+
+View help for any command by adding `--help`:
+
+```bash
+flash --help
+flash deploy --help
+flash env --help
+```
+
+## Common workflows
+
+### Local development
+
+```bash
+# Create a new project
+flash init PROJECT_NAME
+cd PROJECT_NAME
+
+# Install dependencies
+pip install -r requirements.txt
+
+# Add your API key to .env
+# Start the development server
+flash run
+```
+
+### Deploy to production
+
+```bash
+# Build and deploy
+flash deploy
+
+# Deploy to a specific environment
+flash deploy --env ENVIRONMENT_NAME
+```
+
+### Manage deployments
+
+```bash
+# List environments
+flash env list
+
+# Check environment status
+flash env get ENVIRONMENT_NAME
+
+# Remove an environment
+flash env delete ENVIRONMENT_NAME
+```
+
+### Clean up endpoints
+
+```bash
+# List deployed endpoints
+flash undeploy list
+
+# Remove specific endpoint
+flash undeploy ENDPOINT_NAME
+
+# Remove all endpoints
+flash undeploy --all
+```
+
+## Next steps
+
+- [Create a project](/flash/cli/init) with `flash init`.
+- [Start developing](/flash/cli/run) with `flash run`.
+- [Deploy your app](/flash/cli/deploy) with `flash deploy`.
diff --git a/flash/cli/run.mdx b/flash/cli/run.mdx
new file mode 100644
index 00000000..4dab9e6c
--- /dev/null
+++ b/flash/cli/run.mdx
@@ -0,0 +1,156 @@
+---
+title: "run"
+sidebarTitle: "run"
+---
+
+Start the Flash development server for local testing with automatic updates. Your FastAPI app runs locally while `@remote` functions execute on Runpod Serverless.
+
+```bash
+flash run [OPTIONS]
+```
+
+## Example
+
+Start the development server with defaults:
+
+```bash
+flash run
+```
+
+Start with auto-provisioning to eliminate cold-start delays:
+
+```bash
+flash run --auto-provision
+```
+
+Start on a custom port:
+
+```bash
+flash run --port 3000
+```
+
+## Flags
+
+
+Host address to bind the server to.
+
+
+
+Port number to bind the server to.
+
+
+
+Enable or disable auto-reload on code changes. Enabled by default.
+
+
+
+Auto-provision all Serverless endpoints on startup instead of lazily on first call. Eliminates cold-start delays during development.
+
+
+## Architecture
+
+With `flash run`, your system runs in a hybrid architecture:
+
+```mermaid
+%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%%
+
+flowchart TB
+ subgraph Local ["YOUR MACHINE (localhost:8888)"]
+ FastAPI["FastAPI App (main.py)
+• Your HTTP routes
+• Orchestrates @remote calls
+• Updates automatically"]
+ end
+
+ subgraph Runpod ["RUNPOD SERVERLESS"]
+ GPU["live-gpu-worker
+(your @remote function)"]
+ CPU["live-cpu-worker
+(your @remote function)"]
+ end
+
+ FastAPI -->|"HTTPS"| GPU
+ FastAPI -->|"HTTPS"| CPU
+
+ style Local fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff
+ style Runpod fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff
+ style FastAPI fill:#5F4CFE,stroke:#5F4CFE,color:#fff
+ style GPU fill:#22C55E,stroke:#22C55E,color:#000
+ style CPU fill:#22C55E,stroke:#22C55E,color:#000
+```
+
+**Key points:**
+
+- Your FastAPI app runs locally and updates automatically for rapid iteration.
+- `@remote` functions run on Runpod as Serverless endpoints.
+- Endpoints are prefixed with `live-` to distinguish from production.
+- Changes to local code are picked up instantly.
+
+This is different from `flash deploy`, where everything runs on Runpod.
+
+## Auto-provisioning
+
+By default, endpoints are provisioned lazily on first `@remote` function call. Use `--auto-provision` to provision all endpoints at server startup:
+
+```bash
+flash run --auto-provision
+```
+
+### How it works
+
+1. **Discovery**: Scans your app for `@remote` decorated functions.
+2. **Deployment**: Deploys resources concurrently (up to 3 at a time).
+3. **Confirmation**: Asks for confirmation if deploying more than 5 endpoints.
+4. **Caching**: Stores deployed resources in `.runpod/resources.pkl` for reuse.
+5. **Updates**: Recognizes existing endpoints and updates if configuration changed.
+
+### Benefits
+
+- **Zero cold start**: All endpoints ready before you test them.
+- **Faster development**: No waiting for deployment on first HTTP call.
+- **Resource reuse**: Cached endpoints are reused across server restarts.
+
+### When to use
+
+- Local development with multiple endpoints.
+- Testing workflows that call multiple remote functions.
+- Debugging where you want deployment separated from handler logic.
+
+## Provisioning modes
+
+| Mode | When endpoints are deployed |
+|------|----------------------------|
+| Default (lazy) | On first `@remote` function call |
+| `--auto-provision` | At server startup |
+
+## Testing your API
+
+Once the server is running, test your endpoints:
+
+```bash
+# Health check
+curl http://localhost:8888/
+
+# Call a GPU endpoint
+curl -X POST http://localhost:8888/gpu/hello \
+ -H "Content-Type: application/json" \
+ -d '{"message": "Hello from GPU!"}'
+```
+
+Open http://localhost:8888/docs for the interactive API explorer.
+
+## Requirements
+
+- `RUNPOD_API_KEY` must be set in your `.env` file or environment.
+- A valid Flash project structure (created by `flash init` or manually).
+
+## flash run vs flash deploy
+
+| Aspect | `flash run` | `flash deploy` |
+|--------|-------------|----------------|
+| FastAPI app runs on | Your machine (localhost) | Runpod Serverless |
+| `@remote` functions run on | Runpod Serverless | Runpod Serverless |
+| Endpoint naming | `live-` prefix | No prefix |
+| Automatic updates | Yes | No |
+| Use case | Development | Production |
+
+## Related commands
+
+- [`flash init`](/flash/cli/init) - Create a new project
+- [`flash deploy`](/flash/cli/deploy) - Deploy to production
+- [`flash undeploy`](/flash/cli/undeploy) - Remove endpoints
diff --git a/flash/cli/undeploy.mdx b/flash/cli/undeploy.mdx
new file mode 100644
index 00000000..8225182f
--- /dev/null
+++ b/flash/cli/undeploy.mdx
@@ -0,0 +1,213 @@
+---
+title: "undeploy"
+sidebarTitle: "undeploy"
+---
+
+Manage and delete Runpod Serverless endpoints deployed via Flash. Use this command to clean up endpoints created during local development with `flash run`.
+
+```bash
+flash undeploy [NAME|list] [OPTIONS]
+```
+
+## Example
+
+List all tracked endpoints:
+
+```bash
+flash undeploy list
+```
+
+Remove a specific endpoint:
+
+```bash
+flash undeploy ENDPOINT_NAME
+```
+
+Remove all endpoints:
+
+```bash
+flash undeploy --all
+```
+
+## Usage modes
+
+### List endpoints
+
+Display all tracked endpoints with their current status:
+
+```bash
+flash undeploy list
+```
+
+Output includes:
+
+- **Name**: Endpoint name
+- **Endpoint ID**: Runpod endpoint identifier
+- **Status**: Current health status (Active/Inactive/Unknown)
+- **Type**: Resource type (Live Serverless, Cpu Live Serverless, etc.)
+
+**Status indicators:**
+
+| Status | Meaning |
+|--------|---------|
+| Active | Endpoint is running and responding |
+| Inactive | Tracking exists but endpoint deleted externally |
+| Unknown | Error during health check |
+
+### Undeploy by name
+
+Delete a specific endpoint:
+
+```bash
+flash undeploy ENDPOINT_NAME
+```
+
+This:
+
+1. Searches for endpoints matching the name.
+2. Shows endpoint details.
+3. Prompts for confirmation.
+4. Deletes the endpoint from Runpod.
+5. Removes from local tracking.
+
+### Undeploy all
+
+Delete all tracked endpoints (requires double confirmation):
+
+```bash
+flash undeploy --all
+```
+
+Safety features:
+
+1. Shows total count of endpoints.
+2. First confirmation: Yes/No prompt.
+3. Second confirmation: Type "DELETE ALL" exactly.
+4. Deletes all endpoints from Runpod.
+5. Removes all from tracking.
+
+### Interactive selection
+
+Select endpoints to undeploy using checkboxes:
+
+```bash
+flash undeploy --interactive
+```
+
+Use arrow keys to navigate, space bar to select/deselect, and Enter to confirm.
+
+### Clean up stale tracking
+
+Remove inactive endpoints from tracking without API deletion:
+
+```bash
+flash undeploy --cleanup-stale
+```
+
+Use this when endpoints were deleted via the Runpod console or API (not through Flash). The local tracking file (`.runpod/resources.pkl`) becomes stale, and this command cleans it up.
+
+## Flags
+
+
+Undeploy all tracked endpoints. Requires double confirmation for safety.
+
+
+
+Interactive checkbox selection mode. Select multiple endpoints to undeploy.
+
+
+
+Remove inactive endpoints from local tracking without attempting API deletion. Use when endpoints were deleted externally.
+
+
+## Arguments
+
+
+Name of the endpoint to undeploy. Use `list` to show all endpoints.
+
+
+## undeploy vs env delete
+
+| Command | Scope | When to use |
+|---------|-------|-------------|
+| `flash undeploy` | Individual endpoints from local tracking | Development cleanup, granular control |
+| `flash env delete` | Entire environment + all resources | Production cleanup, full teardown |
+
+For production deployments, use `flash env delete` to remove entire environments and all associated resources.
+
+## How tracking works
+
+Flash tracks deployed endpoints in `.runpod/resources.pkl`. Endpoints are added when you:
+
+- Run `flash run --auto-provision`
+- Run `flash run` and call `@remote` functions
+- Run `flash deploy`
+
+The tracking file is in `.gitignore` and should never be committed. It contains local deployment state.
+
+## Common workflows
+
+### Basic cleanup
+
+```bash
+# Check what's deployed
+flash undeploy list
+
+# Remove a specific endpoint
+flash undeploy ENDPOINT_NAME
+
+# Clean up stale tracking
+flash undeploy --cleanup-stale
+```
+
+### Bulk operations
+
+```bash
+# Undeploy all endpoints
+flash undeploy --all
+
+# Interactive selection
+flash undeploy --interactive
+```
+
+### Managing external deletions
+
+If you delete endpoints via the Runpod console:
+
+```bash
+# Check status - will show as "Inactive"
+flash undeploy list
+
+# Remove stale tracking entries
+flash undeploy --cleanup-stale
+```
+
+## Troubleshooting
+
+### Endpoint shows as "Inactive"
+
+The endpoint was deleted via Runpod console or API. Clean up:
+
+```bash
+flash undeploy --cleanup-stale
+```
+
+### Can't find endpoint by name
+
+Check the exact name:
+
+```bash
+flash undeploy list
+```
+
+### Undeploy fails with API error
+
+1. Check `RUNPOD_API_KEY` in `.env`.
+2. Verify network connectivity.
+3. Check if the endpoint still exists on Runpod.
+
+## Related commands
+
+- [`flash run`](/flash/cli/run) - Development server (creates endpoints)
+- [`flash deploy`](/flash/cli/deploy) - Deploy to Runpod
+- [`flash env delete`](/flash/cli/env) - Delete entire environment
diff --git a/flash/monitoring.mdx b/flash/monitoring.mdx
new file mode 100644
index 00000000..96212791
--- /dev/null
+++ b/flash/monitoring.mdx
@@ -0,0 +1,177 @@
+---
+title: "Monitor and debug remote functions"
+sidebarTitle: "Monitor and debug"
+description: "Monitor, debug, and troubleshoot Flash deployments."
+tag: "BETA"
+---
+
+This page covers how to monitor and debug your Flash deployments, including viewing logs, troubleshooting common issues, and optimizing performance.
+
+## Viewing logs
+
+When running Flash functions, logs are displayed in your terminal. The output includes:
+
+- Endpoint creation and reuse status.
+- Job submission and queue status.
+- Execution progress.
+- Worker information (delay time, execution time).
+
+Example output:
+
+```text
+2025-11-19 12:35:15,109 | INFO | Created endpoint: rb50waqznmn2kg - flash-quickstart-fb
+2025-11-19 12:35:15,112 | INFO | URL: https://console.runpod.io/serverless/user/endpoint/rb50waqznmn2kg
+2025-11-19 12:35:15,114 | INFO | LiveServerless:rb50waqznmn2kg | API /run
+2025-11-19 12:35:15,655 | INFO | LiveServerless:rb50waqznmn2kg | Started Job:b0b341e7-e460-4305-9acd-fc2dfd1bd65c-u2
+2025-11-19 12:35:15,762 | INFO | Job:b0b341e7-e460-4305-9acd-fc2dfd1bd65c-u2 | Status: IN_QUEUE
+2025-11-19 12:36:09,983 | INFO | Job:b0b341e7-e460-4305-9acd-fc2dfd1bd65c-u2 | Status: COMPLETED
+2025-11-19 12:36:10,068 | INFO | Worker:icmkdgnrmdf8gz | Delay Time: 51842 ms
+2025-11-19 12:36:10,068 | INFO | Worker:icmkdgnrmdf8gz | Execution Time: 1533 ms
+```
+
+### Log levels
+
+You can control log verbosity using the `LOG_LEVEL` environment variable:
+
+```bash
+LOG_LEVEL=DEBUG python your_script.py
+```
+
+Available log levels: `DEBUG`, `INFO`, `WARNING`, `ERROR`.
+
+## Monitoring in the Runpod console
+
+View detailed metrics and logs in the [Runpod console](https://www.runpod.io/console/serverless):
+
+1. Navigate to the **Serverless** section.
+2. Click on your endpoint to view:
+ - Active workers and queue depth.
+ - Request history and job status.
+ - Worker logs and execution details.
+ - Metrics (requests, latency, errors).
+
+### Endpoint metrics
+
+The console provides metrics including:
+
+- **Request rate**: Number of requests per minute.
+- **Queue depth**: Number of pending requests.
+- **Latency**: Average response time.
+- **Worker count**: Active and idle workers.
+- **Error rate**: Failed requests percentage.
+
+## Debugging common issues
+
+### Cold start delays
+
+If you're experiencing slow initial responses:
+
+- **Cause**: Workers need time to start, load dependencies, and initialize models.
+- **Solutions**:
+ - Set `workersMin=1` to keep at least one worker warm.
+ - Use smaller models or optimize model loading.
+ - Use `--auto-provision` with `flash run` for development.
+
+```python
+config = LiveServerless(
+ name="always-warm",
+ workersMin=1, # Keep one worker always running
+ idleTimeout=30 # Longer idle timeout
+)
+```
+
+### Timeout errors
+
+If requests are timing out:
+
+- **Cause**: Execution taking longer than the timeout limit.
+- **Solutions**:
+ - Increase `executionTimeoutMs` in your configuration.
+ - Optimize your function to run faster.
+ - Break long operations into smaller chunks.
+
+```python
+config = LiveServerless(
+ name="long-running",
+ executionTimeoutMs=600000 # 10 minutes
+)
+```
+
+### Memory errors
+
+If you're seeing out-of-memory errors:
+
+- **Cause**: Model or data too large for available GPU/CPU memory.
+- **Solutions**:
+ - Use a larger GPU type (e.g., `GpuGroup.AMPERE_80` for 80GB VRAM).
+ - Use model quantization or smaller batch sizes.
+ - Clear GPU memory between operations.
+
+```python
+config = LiveServerless(
+ name="large-model",
+ gpus=[GpuGroup.AMPERE_80], # A100 80GB
+ template=PodTemplate(containerDiskInGb=100) # More disk space
+)
+```
+
+### Dependency errors
+
+If packages aren't being installed correctly:
+
+- **Cause**: Missing or incompatible dependencies.
+- **Solutions**:
+ - Verify package names and versions in the `dependencies` list.
+ - Check that packages have Linux `x86_64` wheels available.
+ - Import packages inside the function, not at the top of the file.
+
+```python
+@remote(
+ resource_config=config,
+ dependencies=["torch==2.0.0", "transformers==4.36.0"] # Pin versions
+)
+def my_function(data):
+ import torch # Import inside the function
+ import transformers
+ # ...
+```
+
+### Authentication errors
+
+If you're seeing API key errors:
+
+- **Cause**: Missing or invalid Runpod API key.
+- **Solutions**:
+ - Verify your API key is set in the environment.
+ - Check that the `.env` file is in the correct directory.
+ - Ensure the API key has the required permissions.
+
+```bash
+# Check if API key is set
+echo $RUNPOD_API_KEY
+
+# Set API key directly
+export RUNPOD_API_KEY=your_api_key_here
+```
+
+## Performance optimization
+
+### Reducing cold starts
+
+- Set `workersMin=1` for endpoints that need fast responses.
+- Use `idleTimeout` to balance cost and warm worker availability.
+- Cache models on network volumes to reduce loading time.
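+
+A minimal sketch combining these settings (the values shown are illustrative):
+
+```python
+from runpod_flash import LiveServerless
+
+config = LiveServerless(
+    name="low-latency",
+    workersMin=1,    # keep one worker warm to avoid cold starts
+    idleTimeout=60,  # keep workers alive longer between requests
+)
+```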
+
+### Optimizing execution time
+
+- Profile your functions to identify bottlenecks.
+- Use appropriate GPU types for your workload.
+- Batch multiple inputs into a single request when possible.
+- Use async operations to parallelize independent tasks.
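+
+For independent remote calls, `asyncio.gather` is a straightforward way to parallelize work. A sketch assuming a `process_item` function decorated with `@remote`:
+
+```python
+import asyncio
+
+async def run_batch(items):
+    # Each call executes on its own Serverless worker; results return in input order.
+    return await asyncio.gather(*(process_item(item) for item in items))
+```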
+
+### Managing costs
+
+- Set appropriate `workersMax` limits to control scaling.
+- Use CPU workers for non-GPU tasks.
+- Monitor usage in the console to identify optimization opportunities.
+- Use shorter `idleTimeout` for sporadic workloads.
\ No newline at end of file
diff --git a/flash/overview.mdx b/flash/overview.mdx
new file mode 100644
index 00000000..9824ef64
--- /dev/null
+++ b/flash/overview.mdx
@@ -0,0 +1,318 @@
+---
+title: "Overview"
+sidebarTitle: "Overview"
+description: "Rapidly develop and deploy AI/ML apps with the Flash Python SDK."
+tag: "BETA"
+---
+
+
+Flash is currently in beta. [Join our Discord](https://discord.gg/cUpRmau42V) to provide feedback and get support.
+
+
+Flash is a Python SDK for developing and deploying AI workflows on [Runpod Serverless](/serverless/overview). You write Python functions locally, and Flash handles infrastructure management, GPU/CPU provisioning, dependency installation, and data transfer automatically.
+
+
+
+ Write a standalone Flash script for instant access to Runpod infrastructure.
+
+
+ Create a Flash app with a FastAPI server and deploy it on Runpod to serve production endpoints.
+
+
+
+## Why use Flash?
+
+**Flash is the easiest and fastest way to test and deploy AI/ML workloads on Runpod.** Whether you're prototyping a new model or deploying a production API, Flash handles the infrastructure complexity so you can focus on your code.
+
+When you run a `@remote` function, Flash:
+- Automatically provisions resources on Runpod's infrastructure.
+- Installs your dependencies automatically.
+- Runs your function on a remote GPU/CPU.
+- Returns the result to your local environment.
+
+You can specify the exact GPU hardware you need, from RTX 4090s to A100 80GB GPUs, for AI inference, training, and other compute-intensive tasks. Functions scale automatically based on demand and can run in parallel across multiple resources.
+
+Flash uses [Runpod's Serverless pricing](/serverless/pricing) with per-second billing. You're only charged for actual compute time; there are no costs when your code isn't running.
+
+## Install Flash
+
+
+Flash requires Python 3.10 or higher.
+
+
+Create a Python virtual environment and use `pip` to install Flash:
+
+```bash
+python3 -m venv venv
+source venv/bin/activate
+pip install runpod-flash
+```
+
+In your project directory, create a `.env` file and add your Runpod API key, replacing `YOUR_API_KEY` with your actual API key:
+
+```bash
+touch .env && echo "RUNPOD_API_KEY=YOUR_API_KEY" > .env
+```
+
+## Core concepts
+
+### Remote functions
+
+The `@remote` decorator marks functions for execution on Runpod's infrastructure. Code inside the decorated function runs remotely on a Serverless worker, while code outside the function runs locally on your machine.
+
+```python
+@remote(resource_config=config, dependencies=["pandas"])
+def process_data(data):
+ # This code runs remotely on Runpod
+ import pandas as pd
+ df = pd.DataFrame(data)
+ return df.describe().to_dict()
+
+async def main():
+ # This code runs locally
+ result = await process_data(my_data)
+```
+
+### Resource configuration
+
+Flash provides fine-grained control over hardware allocation through configuration objects. You can configure GPU types, worker counts, idle timeouts, environment variables, and more.
+
+```python
+from runpod_flash import remote, LiveServerless, GpuGroup
+
+gpu_config = LiveServerless(
+ name="ml-inference",
+ gpus=[GpuGroup.AMPERE_80], # A100 80GB
+ workersMax=5
+)
+```
+
+[View the complete configuration reference](/flash/resource-configuration).
+
+### Dependency management
+
+Specify Python packages in the decorator, and Flash installs them automatically on the remote worker:
+
+```python
+@remote(
+ resource_config=gpu_config,
+ dependencies=["transformers==4.36.0", "torch", "pillow"]
+)
+def generate_image(prompt):
+ # Import inside the function
+ from transformers import pipeline
+ # ...
+```
+
+Imports should be placed inside the function body because they need to happen on the remote worker, not in your local environment.
+
+### Parallel execution
+
+Run multiple remote functions concurrently using Python's async capabilities:
+
+```python
+results = await asyncio.gather(
+ process_item(item1),
+ process_item(item2),
+ process_item(item3)
+)
+```
+
+## Development workflows
+
+Flash supports two main methods for running workloads on Runpod: standalone scripts and Flash apps.
+
+
+### Standalone scripts
+
+This is the fastest way to get started with Flash. Just write a Python script with `@remote` decorated functions and run it locally with `python script.py`.
+
+```python
+import asyncio
+from runpod_flash import remote, LiveServerless, GpuGroup
+
+config = LiveServerless(
+ name="gpu-inference",
+ gpus=[GpuGroup.ADA_24],
+)
+
+@remote(resource_config=config, dependencies=["torch"])
+def process_on_gpu(data):
+ import torch
+ # Your GPU workload here
+ return {"result": "processed"}
+
+async def main():
+ result = await process_on_gpu({"input": "data"})
+ print(result)
+
+if __name__ == "__main__":
+ asyncio.run(main())
+```
+
+Run the script locally, and Flash executes the `@remote` function on Runpod's infrastructure:
+
+```bash
+python my_script.py
+```
+
+**Use this approach for:**
+- Quick prototypes and experiments.
+- Batch processing jobs.
+- One-off data processing tasks.
+- Local development and testing.
+
+[Follow the quickstart](/flash/quickstart) to create your first Flash script.
+
+### Flash apps
+
+Build FastAPI applications with HTTP endpoints that run on Runpod Serverless. Flash apps provide a complete development and deployment workflow with local testing and production deployment.
+
+```python
+# main.py
+from fastapi import FastAPI
+from runpod_flash import remote, LiveServerless, GpuGroup
+
+app = FastAPI()
+
+config = LiveServerless(
+ name="api-worker",
+ gpus=[GpuGroup.ADA_24],
+)
+
+@remote(resource_config=config, dependencies=["torch"])
+def inference(prompt: str):
+ import torch
+ # Your inference logic
+ return {"output": "result"}
+
+@app.post("/inference")
+async def inference_endpoint(prompt: str):
+ result = await inference(prompt)
+ return result
+```
+
+Develop and test locally with automatic updates:
+
+```bash
+flash run
+```
+
+Deploy to production when ready:
+
+```bash
+flash deploy
+```
+
+**Use this approach for:**
+
+- Production HTTP APIs.
+- Persistent endpoints.
+- Long-running services.
+- Team collaboration with staging/production environments.
+
+[Follow this tutorial](/flash/apps/build-app) to build your first Flash app.
+
+
+### Flash app workflow
+
+1. **Initialize**: Create a project with `flash init`
+2. **Develop**: Write your FastAPI app with `@remote` functions
+3. **Test locally**: Run `flash run` to test with automatic updates
+4. **Deploy**: Run `flash deploy` to push to production
+
+This workflow is ideal for production APIs and services that need persistent endpoints.
+
+```mermaid
+%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%%
+
+flowchart LR
+ Init["flash init"]
+ Dev["Write code"]
+ Run["flash run
+(test locally)"]
+ Deploy["flash deploy
+(production)"]
+
+ Init --> Dev
+ Dev --> Run
+ Run -->|"Ready"| Deploy
+ Run -->|"Continue developing"| Dev
+
+ style Init fill:#5F4CFE,stroke:#5F4CFE,color:#fff
+ style Dev fill:#22C55E,stroke:#22C55E,color:#000
+ style Run fill:#4D38F5,stroke:#4D38F5,color:#fff
+ style Deploy fill:#AE6DFF,stroke:#AE6DFF,color:#000
+```
+
+[Learn more about the Flash app workflow](/flash/apps/overview).
+
+
+
+## CLI commands
+
+Flash provides CLI commands for managing Flash apps:
+
+| Command | Description |
+|---------|-------------|
+| [`flash init`](/flash/cli/init) | Create a new Flash app project |
+| [`flash run`](/flash/cli/run) | Start the local development server |
+| [`flash build`](/flash/cli/build) | Build a deployment artifact |
+| [`flash deploy`](/flash/cli/deploy) | Build and deploy to Runpod |
+| [`flash env`](/flash/cli/env) | Manage deployment environments |
+| [`flash app`](/flash/cli/app) | Manage Flash applications |
+| [`flash undeploy`](/flash/cli/undeploy) | Remove deployed endpoints |
+
+
+CLI commands are primarily for Flash apps. Standalone scripts don't require the CLI—just run them with `python`.
+
+
+See the [CLI reference](/flash/cli/overview) for detailed documentation on each command.
+
+## Use cases
+
+Flash is well-suited for a range of AI and data processing workloads:
+
+- **Multi-modal AI pipelines**: Orchestrate unified workflows combining text, image, and audio models with GPU acceleration.
+- **Distributed model training**: Scale training operations across multiple GPU workers for faster model development.
+- **AI research experimentation**: Rapidly prototype and test complex model combinations without infrastructure overhead.
+- **Production inference systems**: Deploy multi-stage inference pipelines for real-world applications.
+- **Data processing workflows**: Process large datasets using CPU workers for general computation and GPU workers for accelerated tasks.
+- **Hybrid GPU/CPU workflows**: Optimize cost and performance by combining CPU preprocessing with GPU inference.
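+
+As a sketch of the hybrid pattern (names and instance types are illustrative), a CPU worker can handle preprocessing and feed a GPU worker for inference:
+
+```python
+from runpod_flash import remote, LiveServerless, GpuGroup, CpuInstanceType
+
+cpu_config = LiveServerless(name="preprocess", instanceIds=[CpuInstanceType.CPU5C_2_4])
+gpu_config = LiveServerless(name="inference", gpus=[GpuGroup.ADA_24])
+
+@remote(resource_config=cpu_config, dependencies=["pandas"])
+def preprocess(raw_rows):
+    import pandas as pd
+    return pd.DataFrame(raw_rows).dropna().to_dict("records")
+
+@remote(resource_config=gpu_config, dependencies=["torch"])
+def infer(records):
+    import torch
+    return {"processed": len(records)}
+
+async def pipeline(raw_rows):
+    return await infer(await preprocess(raw_rows))
+```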
+
+## Limitations
+
+- Serverless deployments using Flash are currently restricted to the `EU-RO-1` datacenter.
+- Be aware of your account's maximum worker capacity limits. Flash can rapidly scale workers across multiple endpoints, and you may hit capacity constraints. Contact [Runpod support](https://www.runpod.io/contact) to increase your account's capacity allocation if needed.
+
+## Next steps
+
+
+
+ Write your first standalone script with Flash
+
+
+ Create a FastAPI app with Flash
+
+
+ Complete reference for resource configuration
+
+
+ Learn about Flash CLI commands
+
+
+
+
+## Coding agent integration
+
+Flash provides a skill package for AI coding agents like Claude Code, Cline, and Cursor. The skill gives these agents detailed context about the Flash SDK, CLI, best practices, and common patterns.
+
+Install the Flash skill by running the following command in your terminal:
+
+```bash
+npx skills add runpod/skills
+```
+
+This allows your coding agent to provide more accurate Flash code suggestions and troubleshooting help. See the [runpod/skills repository](https://github.com/runpod/skills) for more details.
+
+## Getting help
+
+Join the [Runpod community on Discord](https://discord.gg/cUpRmau42V) for support and discussion.
diff --git a/flash/pricing.mdx b/flash/pricing.mdx
new file mode 100644
index 00000000..28ca0df8
--- /dev/null
+++ b/flash/pricing.mdx
@@ -0,0 +1,109 @@
+---
+title: "Pricing"
+sidebarTitle: "Pricing"
+description: "Understand Flash pricing and optimize your costs."
+tag: "BETA"
+---
+
+Flash follows the same pricing model as [Runpod Serverless](/serverless/pricing). You pay per second of compute time, with no charges when your code isn't running. Pricing depends on the GPU or CPU type you configure for your endpoints.
+
+## How pricing works
+
+You're billed from when a worker starts until it completes your request, plus any idle time before scaling down. If a worker is already warm, you skip the cold start and only pay for execution time.
+
+### Compute cost breakdown
+
+Flash workers incur charges during these periods:
+
+1. **Start time**: The time required to initialize a worker and load models into GPU memory. This includes starting the container, installing dependencies, and preparing the runtime environment.
+2. **Execution time**: The time spent processing your request (running your `@remote` decorated function).
+3. **Idle time**: The period a worker remains active after completing a request, waiting for additional requests before scaling down.
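+
+As a rough worked example (the per-second rate below is hypothetical; see the Serverless pricing page for real rates):
+
+```python
+# Hypothetical GPU rate of $0.00044 per second
+rate_per_second = 0.00044
+
+start_time = 20      # seconds to start the worker and load the model
+execution_time = 5   # seconds running the @remote function
+idle_time = 5        # seconds of idle before scale-down (idleTimeout)
+
+cost_per_request = (start_time + execution_time + idle_time) * rate_per_second
+print(f"~${cost_per_request:.4f} per cold request")  # ~$0.0132
+```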
+
+### Pricing by resource type
+
+Flash supports both GPU and CPU workers. Pricing varies based on the hardware type:
+
+- **GPU workers**: Use `LiveServerless` or `ServerlessEndpoint` with GPU configurations. Pricing depends on the GPU type (e.g., RTX 4090, A100 80GB).
+- **CPU workers**: Use `LiveServerless` or `CpuServerlessEndpoint` with CPU configurations. Pricing depends on the CPU instance type.
+
+See the [Serverless pricing page](/serverless/pricing) for current rates by GPU and CPU type.
+
+## How to estimate and optimize costs
+
+To estimate costs for your Flash workloads, consider:
+
+- How long each function takes to execute.
+- How many concurrent workers you need (`workersMax` setting).
+- Which GPU or CPU types you'll use.
+- Your idle timeout configuration (`idleTimeout` setting).
+
+### Cost optimization strategies
+
+#### Choose appropriate hardware
+
+Select the smallest GPU or CPU that meets your performance requirements. For example, if your workload fits in 24GB of VRAM, use `GpuGroup.ADA_24` or `GpuGroup.AMPERE_24` instead of larger GPUs.
+
+```python
+# Cost-effective configuration for workloads that fit in 24GB VRAM
+config = LiveServerless(
+ name="cost-optimized",
+ gpus=[GpuGroup.ADA_24, GpuGroup.AMPERE_24], # RTX 4090, L4, A5000, 3090
+)
+```
+
+#### Configure idle timeouts
+
+Balance responsiveness and cost by adjusting the `idleTimeout` parameter. Shorter timeouts reduce idle costs but increase cold starts for sporadic traffic.
+
+```python
+# Lower idle timeout for cost savings (more cold starts)
+config = LiveServerless(
+ name="low-idle",
+ idleTimeout=5, # 5 seconds (default)
+)
+
+# Higher idle timeout for responsiveness (higher idle costs)
+config = LiveServerless(
+ name="responsive",
+ idleTimeout=30, # 30 seconds
+)
+```
+
+#### Use CPU workers for non-GPU tasks
+
+For data preprocessing, postprocessing, or other tasks that don't require GPU acceleration, use CPU workers instead of GPU workers.
+
+```python
+from runpod_flash import LiveServerless, CpuInstanceType
+
+# CPU configuration for non-GPU tasks
+cpu_config = LiveServerless(
+ name="data-processor",
+ instanceIds=[CpuInstanceType.CPU5C_2_4], # 2 vCPU, 4GB RAM
+)
+```
+
+#### Limit maximum workers
+
+Set `workersMax` to prevent runaway scaling and unexpected costs:
+
+```python
+config = LiveServerless(
+ name="controlled-scaling",
+ workersMax=3, # Limit to 3 concurrent workers
+)
+```
+
+### Monitoring costs
+
+Monitor your usage in the [Runpod console](https://www.runpod.io/console/serverless) to track:
+
+- Total compute time across endpoints.
+- Worker utilization and idle time.
+- Cost breakdown by endpoint.
+
+## Next steps
+
+- [Create remote functions](/flash/remote-functions) with optimized resource configurations.
+- [View Serverless pricing details](/serverless/pricing) for current rates.
+- [Configure resources](/flash/resource-configuration) for your workloads.
diff --git a/flash/quickstart.mdx b/flash/quickstart.mdx
new file mode 100644
index 00000000..2eaaa675
--- /dev/null
+++ b/flash/quickstart.mdx
@@ -0,0 +1,341 @@
+---
+title: "Get started with Flash"
+sidebarTitle: "Quickstart"
+description: "Set up your development environment and run your first GPU workload with Flash."
+tag: "BETA"
+---
+
+This tutorial shows you how to set up Flash and run a GPU workload on Runpod Serverless. You'll create a remote function that performs matrix operations on a GPU and returns the results to your local machine.
+
+## What you'll learn
+
+In this tutorial you'll learn how to:
+
+- Set up your development environment for Flash.
+- Configure a Serverless endpoint using a `LiveServerless` object.
+- Create and define remote functions with the `@remote` decorator.
+- Deploy a GPU-based workload using Runpod resources.
+- Pass data between your local environment and remote workers.
+- Run multiple operations in parallel.
+
+## Requirements
+
+- You've [created a Runpod account](/get-started/manage-accounts).
+- You've [created a Runpod API key](/get-started/api-keys).
+- You've installed [Python 3.10 or higher](https://www.python.org/downloads/).
+
+## Step 1: Install Flash
+
+Create a Python virtual environment and use `pip` to install Flash:
+
+```bash
+python3 -m venv venv
+source venv/bin/activate
+pip install runpod-flash
+```
+
+## Step 2: Add your API key to the environment
+
+Add your Runpod API key to your development environment before using Flash to run workloads.
+
+Run this command to create a `.env` file, replacing `YOUR_API_KEY` with your Runpod API key:
+
+```bash
+touch .env && echo "RUNPOD_API_KEY=YOUR_API_KEY" > .env
+```
+
+
+
+You can create this in your project's root directory or in the `/examples` folder. Make sure your `.env` file is in the same folder as the Python file you create in the next step.
+
+
+
+## Step 3: Create your project file
+
+Create a new file called `matrix_operations.py` in the same directory as your `.env` file:
+
+```bash
+touch matrix_operations.py
+```
+
+Open this file in your code editor. The following steps walk through building a matrix multiplication example that demonstrates Flash's remote execution and parallel processing capabilities.
+
+## Step 4: Add imports and load the .env file
+
+Add the necessary import statements:
+
+```python
+import asyncio
+from dotenv import load_dotenv
+from runpod_flash import remote, LiveServerless, GpuGroup
+
+# Load environment variables from .env file
+load_dotenv()
+```
+
+This imports:
+
+- `asyncio`: Python's asynchronous programming library, which Flash uses for non-blocking execution.
+- `dotenv`: Loads environment variables from your `.env` file, including your Runpod API key.
+- `remote` and `LiveServerless`: The core Flash components for defining remote functions and their resource requirements.
+
+`load_dotenv()` reads your API key from the `.env` file and makes it available to Flash.
+
+## Step 5: Add Serverless endpoint configuration
+
+Define the Serverless endpoint configuration for your Flash workload:
+
+```python
+# Configuration for a Serverless endpoint using GPU workers
+gpu_config = LiveServerless(
+ gpus=[GpuGroup.AMPERE_24, GpuGroup.ADA_24], # Use any 24GB GPU
+ workersMax=3,
+ name="flash_gpu",
+)
+```
+
+This `LiveServerless` object defines:
+
+- `gpus=[GpuGroup.AMPERE_24, GpuGroup.ADA_24]`: The GPU pools that workers on this endpoint can use. This restricts workers to 24 GB GPUs (L4, A5000, 3090, or 4090). See [GPU pools](/references/gpu-types#gpu-pools) for available GPU pool IDs. Removing this parameter allows the endpoint to use any available GPU.
+- `workersMax=3`: The maximum number of worker instances.
+- `name="flash_gpu"`: The name of the endpoint that Flash creates or reuses in the Runpod console.
+
+If you run a Flash function with a `LiveServerless` configuration identical to a previous run, Runpod reuses the existing endpoint rather than creating a new one. If any configuration value changes (not only the `name` parameter), a new endpoint is created.
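+
+For example, a configuration that matches `gpu_config` exactly reuses the same endpoint, while a sketch like the following (identical except for `workersMax`) would provision a new one:
+
+```python
+# Identical to gpu_config above except workersMax, so Runpod creates a new endpoint
+gpu_config_v2 = LiveServerless(
+    gpus=[GpuGroup.AMPERE_24, GpuGroup.ADA_24],
+    workersMax=5,  # changed from 3
+    name="flash_gpu",
+)
+```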
+
+## Step 6: Define your remote function
+
+Define the function that will run on the GPU worker:
+
+```python
+@remote(
+ resource_config=gpu_config,
+ dependencies=["numpy", "torch"]
+)
+def flash_matrix_operations(size):
+ """Perform large matrix operations using NumPy and check GPU availability."""
+ import numpy as np
+ import torch
+
+ # Get GPU count and name
+ device_count = torch.cuda.device_count()
+ device_name = torch.cuda.get_device_name(0)
+
+ # Create large random matrices
+ A = np.random.rand(size, size)
+ B = np.random.rand(size, size)
+
+ # Perform matrix multiplication
+ C = np.dot(A, B)
+
+ return {
+ "matrix_size": size,
+ "result_shape": C.shape,
+ "result_mean": float(np.mean(C)),
+ "result_std": float(np.std(C)),
+ "device_count": device_count,
+ "device_name": device_name
+ }
+```
+
+This code demonstrates several key concepts:
+
+- `@remote`: The decorator that marks the function for remote execution on Runpod's infrastructure.
+- `resource_config=gpu_config`: The function runs using the GPU configuration defined earlier.
+- `dependencies=["numpy", "torch"]`: Python packages that must be installed on the remote worker.
+
+The `flash_matrix_operations` function:
+
+- Gets GPU details using PyTorch's CUDA utilities.
+- Creates two large random matrices using NumPy.
+- Performs matrix multiplication.
+- Returns statistics about the result and information about the GPU.
+
+Notice that `numpy` and `torch` are imported inside the function, not at the top of the file. These imports need to happen on the remote worker, not in your local environment.
+
+## Step 7: Add the main function
+
+Add a `main` function to execute your GPU workload:
+
+```python
+async def main():
+ # Run the GPU matrix operations
+ print("Starting large matrix operations on GPU...")
+ result = await flash_matrix_operations(1000)
+
+ # Print the results
+ print("\nMatrix operations results:")
+ print(f"Matrix size: {result['matrix_size']}x{result['matrix_size']}")
+ print(f"Result shape: {result['result_shape']}")
+ print(f"Result mean: {result['result_mean']:.4f}")
+ print(f"Result standard deviation: {result['result_std']:.4f}")
+
+ # Print GPU information
+ print("\nGPU Information:")
+ print(f"GPU device count: {result['device_count']}")
+ print(f"GPU device name: {result['device_name']}")
+
+if __name__ == "__main__":
+ asyncio.run(main())
+```
+
+The `main` function:
+
+- Calls the remote function with `await`, which runs it asynchronously on Runpod's infrastructure.
+- Prints the results of the matrix operations.
+- Displays information about the GPU that was used.
+
+`asyncio.run(main())` is Python's standard way to execute an asynchronous `main` function from synchronous code.
+
+All code outside of the `@remote` decorated function runs on your local machine. The `main` function acts as a bridge between your local environment and Runpod's cloud infrastructure, allowing you to send input data to remote functions, wait for remote execution to complete without blocking your local process, and process returned results locally.
+
+The `await` keyword pauses execution of the `main` function until the remote operation completes, but doesn't block the entire Python process.
+
+## Step 8: Run your GPU example
+
+Run the example:
+
+```bash
+python matrix_operations.py
+```
+
+You should see output similar to this:
+
+```text
+Starting large matrix operations on GPU...
+Resource LiveServerless_33e1fa59c64b611c66c5a778e120c522 already exists, reusing.
+Registering RunPod endpoint: server_LiveServerless_33e1fa59c64b611c66c5a778e120c522 at https://api.runpod.ai/xvf32dan8rcilp
+Initialized RunPod stub for endpoint: https://api.runpod.ai/xvf32dan8rcilp (ID: xvf32dan8rcilp)
+Executing function on RunPod endpoint ID: xvf32dan8rcilp
+Initial job status: IN_QUEUE
+Job completed, output received
+
+Matrix operations results:
+Matrix size: 1000x1000
+Result shape: (1000, 1000)
+Result mean: 249.8286
+Result standard deviation: 6.8704
+
+GPU Information:
+GPU device count: 1
+GPU device name: NVIDIA GeForce RTX 4090
+```
+
+
+If you're having trouble running your code due to authentication issues:
+
+1. Verify your `.env` file is in the same directory as your `matrix_operations.py` file.
+2. Check that the API key in your `.env` file is correct and properly formatted.
+
+Alternatively, you can set the API key directly in your terminal:
+
+
+On macOS or Linux:
+
+```bash
+export RUNPOD_API_KEY=[YOUR_API_KEY]
+```
+
+On Windows (Command Prompt):
+
+```bash
+set RUNPOD_API_KEY=[YOUR_API_KEY]
+```
+
+
+
+
+## Step 9: Understand what's happening
+
+When you run this script:
+
+1. Flash reads your GPU resource configuration and provisions a worker on Runpod.
+2. It installs the required dependencies (NumPy and PyTorch) on the worker.
+3. Your `flash_matrix_operations` function runs on the remote worker.
+4. The function creates and multiplies large matrices, then calculates statistics.
+5. Your local `main` function receives these results and displays them in your terminal.
+
+## Step 10: Run multiple operations in parallel
+
+Flash makes it easy to run multiple remote operations in parallel.
+
+Replace your `main` function with this code:
+
+```python
+async def main():
+ # Run multiple matrix operations in parallel
+ print("Starting large matrix operations on GPU...")
+
+ # Run all matrix operations in parallel
+ results = await asyncio.gather(
+ flash_matrix_operations(500),
+ flash_matrix_operations(1000),
+ flash_matrix_operations(2000)
+ )
+
+ print("\nMatrix operations results:")
+
+ # Print the results for each matrix size
+ for result in results:
+ print(f"\nMatrix size: {result['matrix_size']}x{result['matrix_size']}")
+ print(f"Result shape: {result['result_shape']}")
+ print(f"Result mean: {result['result_mean']:.4f}")
+ print(f"Result standard deviation: {result['result_std']:.4f}")
+
+if __name__ == "__main__":
+ asyncio.run(main())
+```
+
+This updated `main` function demonstrates Flash's ability to run multiple operations in parallel using `asyncio.gather()`. Instead of running one matrix operation at a time, you're launching three operations with different matrix sizes (500, 1000, and 2000) simultaneously. This parallel execution significantly improves efficiency when you have multiple independent tasks.
+
+Run the example again:
+
+```bash
+python matrix_operations.py
+```
+
+You should see results for all three matrix sizes after the operations complete:
+
+```text
+Initial job status: IN_QUEUE
+Initial job status: IN_QUEUE
+Initial job status: IN_QUEUE
+Job completed, output received
+Job completed, output received
+Job completed, output received
+
+Matrix size: 500x500
+Result shape: (500, 500)
+Result mean: 125.3097
+Result standard deviation: 5.0425
+
+Matrix size: 1000x1000
+Result shape: (1000, 1000)
+Result mean: 249.9442
+Result standard deviation: 7.1072
+
+Matrix size: 2000x2000
+Result shape: (2000, 2000)
+Result mean: 500.1321
+Result standard deviation: 9.8879
+```
+
+## Clean up
+
+When you're done testing, you can clean up the endpoints created during this tutorial. Use the [`flash undeploy`](/flash/cli/undeploy) command to remove development endpoints:
+
+```bash
+# List all endpoints
+flash undeploy list
+
+# Remove a specific endpoint
+flash undeploy live-ENDPOINT_NAME
+
+# Remove all endpoints
+flash undeploy --all
+```
+
+## Next steps
+
+You've successfully used Flash to run a GPU workload on Runpod. Now you can:
+
+- [Create more complex remote functions](/flash/remote-functions) with custom dependencies and resource configurations.
+- [Build and deploy Flash apps](/flash/apps/overview) for production use.
+- Explore more examples on the [runpod-workers/flash](https://github.com/runpod-workers/flash) GitHub repository.
diff --git a/flash/remote-functions.mdx b/flash/remote-functions.mdx
new file mode 100644
index 00000000..dff3baca
--- /dev/null
+++ b/flash/remote-functions.mdx
@@ -0,0 +1,263 @@
+---
+title: "Create remote functions"
+sidebarTitle: "Create remote functions"
+description: "Learn how to create and configure remote functions with Flash."
+tag: "BETA"
+---
+
+Remote functions are the core building blocks of Flash. The `@remote` decorator marks Python functions for execution on Runpod's Serverless infrastructure, handling resource provisioning, dependency installation, and data transfer automatically.
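+
+Here's a minimal sketch of the pattern (the endpoint name is illustrative); the sections below cover each piece in detail:
+
+```python
+import asyncio
+from runpod_flash import remote, LiveServerless
+
+# With no gpus parameter, the endpoint can use any available GPU
+config = LiveServerless(name="hello-flash")
+
+@remote(resource_config=config)
+def hello(name):
+    # Runs on a Runpod Serverless worker, not your local machine
+    return f"Hello from the worker, {name}!"
+
+async def main():
+    print(await hello("Flash"))
+
+if __name__ == "__main__":
+    asyncio.run(main())
+```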
+
+## Resource configuration
+
+Every remote function requires a resource configuration that specifies the compute resources to use. Flash provides several configuration classes for different use cases.
+
+### LiveServerless
+
+`LiveServerless` is the primary configuration class for Flash. It supports full remote code execution, allowing you to run arbitrary Python functions on Runpod's infrastructure.
+
+```python
+from runpod_flash import remote, LiveServerless, GpuGroup
+
+gpu_config = LiveServerless(
+ name="ml-inference",
+ gpus=[GpuGroup.AMPERE_80], # A100 80GB
+ workersMax=5,
+ idleTimeout=10
+)
+
+@remote(resource_config=gpu_config, dependencies=["torch"])
+def run_inference(data):
+ import torch
+    # Your inference code here; this placeholder just sums the input as a tensor
+    result = torch.tensor(data).sum().item()
+    return result
+```
+
+Common configuration options:
+
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `name` | Name for your endpoint (required) | - |
+| `gpus` | GPU pool IDs that can be used | `[GpuGroup.ANY]` |
+| `workersMax` | Maximum number of workers | 3 |
+| `workersMin` | Minimum number of workers | 0 |
+| `idleTimeout` | Minutes before scaling down | 5 |
+
+See the [resource configuration reference](/flash/resource-configuration) for all available options.
+
+### CPU configuration
+
+For CPU-only workloads, specify `instanceIds` instead of `gpus`:
+
+```python
+from runpod_flash import remote, LiveServerless, CpuInstanceType
+
+cpu_config = LiveServerless(
+ name="data-processor",
+ instanceIds=[CpuInstanceType.CPU5C_4_8], # 4 vCPU, 8GB RAM
+ workersMax=3
+)
+
+@remote(resource_config=cpu_config, dependencies=["pandas"])
+def process_data(data):
+ import pandas as pd
+ df = pd.DataFrame(data)
+ return df.describe().to_dict()
+```
+
+## Dependency management
+
+Specify Python packages in the `dependencies` parameter of the `@remote` decorator. Flash installs these packages on the remote worker before executing your function.
+
+```python
+@remote(
+ resource_config=config,
+ dependencies=["transformers==4.36.0", "torch", "pillow"]
+)
+def generate_image(prompt):
+ from transformers import pipeline
+ import torch
+ from PIL import Image
+ # Your code here
+```
+
+### Important notes about dependencies
+
+**Import inside the function**: Always import packages inside the decorated function body, not at the top of your file. These imports need to happen on the remote worker, not in your local environment.
+
+```python
+# Correct - imports inside the function
+@remote(resource_config=config, dependencies=["numpy"])
+def compute(data):
+ import numpy as np # Import here
+ return np.sum(data)
+
+# Incorrect - imports at top of file won't work
+import numpy as np # This import happens locally, not on the worker
+
+@remote(resource_config=config, dependencies=["numpy"])
+def compute(data):
+ return np.sum(data) # numpy not available on worker
+```
+
+**Version pinning**: You can pin specific versions using standard pip syntax:
+
+```python
+dependencies=["transformers==4.36.0", "torch>=2.0.0"]
+```
+
+**Pre-installed packages**: Some packages (like PyTorch) are pre-installed on GPU workers. Including them in dependencies ensures the correct version is available.
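+
+For example, pinning a pre-installed package guarantees the build your code was tested against (the version below is illustrative):
+
+```python
+@remote(resource_config=config, dependencies=["torch==2.1.2"])  # illustrative pin
+def check_torch_version():
+    import torch
+    # Confirm the worker is running the pinned build
+    return torch.__version__
+```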
+
+## Parallel execution
+
+Flash functions are asynchronous by default. Use Python's `asyncio` to run multiple functions in parallel:
+
+```python
+import asyncio
+
+async def main():
+    # Run three @remote functions (defined elsewhere) in parallel
+ results = await asyncio.gather(
+ process_item(item1),
+ process_item(item2),
+ process_item(item3)
+ )
+ return results
+```
+
+This is particularly useful for:
+
+- Batch processing multiple inputs.
+- Running different models on the same data.
+- Parallelizing independent pipeline stages.
+
+### Example: Parallel batch processing
+
+```python
+import asyncio
+from runpod_flash import remote, LiveServerless, GpuGroup
+
+config = LiveServerless(
+ name="batch-processor",
+ gpus=[GpuGroup.ADA_24],
+ workersMax=5 # Allow up to 5 parallel workers
+)
+
+@remote(resource_config=config, dependencies=["torch"])
+def process_batch(batch_id, data):
+ import torch
+ # Process batch
+ return {"batch_id": batch_id, "result": len(data)}
+
+async def main():
+ batches = [
+ (1, [1, 2, 3]),
+ (2, [4, 5, 6]),
+ (3, [7, 8, 9])
+ ]
+
+ # Process all batches in parallel
+ results = await asyncio.gather(*[
+ process_batch(batch_id, data)
+ for batch_id, data in batches
+ ])
+
+ print(results)
+
+if __name__ == "__main__":
+ asyncio.run(main())
+```
+
+## Custom Docker images
+
+For specialized environments that require a custom Docker image, use `ServerlessEndpoint` or `CpuServerlessEndpoint` instead of `LiveServerless`:
+
+```python
+from runpod_flash import ServerlessEndpoint, GpuGroup
+
+custom_gpu = ServerlessEndpoint(
+ name="custom-ml-env",
+ imageName="pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime",
+ gpus=[GpuGroup.AMPERE_80]
+)
+```
+
+
+
+Unlike `LiveServerless`, `ServerlessEndpoint` and `CpuServerlessEndpoint` only support dictionary payloads in the form of `{"input": {...}}` (similar to a traditional [Serverless endpoint request](/serverless/endpoints/send-requests)). They cannot execute arbitrary Python functions remotely.
+
+
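+A request to an endpoint like `custom_gpu` is just a dictionary payload; a sketch using the `run()` pattern shown in the [configuration reference](/flash/resource-configuration):
+
+```python
+import asyncio
+
+async def main():
+    # ServerlessEndpoint accepts a standard {"input": {...}} payload
+    result = await custom_gpu.run({"input": {"prompt": "hello"}})
+    print(result)
+
+if __name__ == "__main__":
+    asyncio.run(main())
+```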
+
+Use custom Docker images when you need:
+
+- Pre-installed system-level dependencies.
+- Specific CUDA or cuDNN versions.
+- Custom base images with large models baked in.
+
+## Using persistent storage
+
+Attach [network volumes](/storage/network-volumes) for persistent storage across workers and endpoints. This is useful for sharing large models or datasets between workers without downloading them each time.
+
+```python
+from runpod_flash import LiveServerless, PodTemplate
+
+config = LiveServerless(
+ name="model-server",
+ networkVolumeId="vol_abc123", # Your network volume ID
+ template=PodTemplate(containerDiskInGb=100)
+)
+```
+
+To find your network volume ID:
+
+1. Go to the [Storage page](https://www.runpod.io/console/storage) in the Runpod console.
+2. Click on your network volume.
+3. Copy the volume ID from the URL or volume details.
+
+### Example: Using a network volume for model storage
+
+```python
+from runpod_flash import LiveServerless, GpuGroup, PodTemplate
+
+config = LiveServerless(
+ name="model-inference",
+ gpus=[GpuGroup.AMPERE_80],
+ networkVolumeId="vol_abc123",
+ template=PodTemplate(containerDiskInGb=100)
+)
+
+@remote(resource_config=config, dependencies=["torch", "transformers"])
+def run_inference(prompt):
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load model from network volume
+ model_path = "/runpod-volume/models/llama-7b"
+ model = AutoModelForCausalLM.from_pretrained(model_path)
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+
+ # Run inference
+ inputs = tokenizer(prompt, return_tensors="pt")
+ outputs = model.generate(**inputs)
+ return tokenizer.decode(outputs[0])
+```
+
+## Environment variables
+
+Pass environment variables to remote functions using the `env` parameter:
+
+```python
+config = LiveServerless(
+ name="api-worker",
+ env={"HF_TOKEN": "your_token", "MODEL_ID": "gpt2"}
+)
+```
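+
+Inside the remote function, these values are available through the worker's environment. A minimal sketch (the variable names match the config above):
+
+```python
+@remote(resource_config=config)
+def load_model_id():
+    import os
+    # MODEL_ID is set on the worker by the env parameter above
+    return os.environ.get("MODEL_ID")
+```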
+
+
+
+Environment variables are excluded from configuration hashing. Changing environment values won't trigger endpoint recreation, which allows different processes to load environment variables from `.env` files without causing false drift detection.
+
+
+
+## Next steps
+
+- [Create API endpoints](/flash/apps/build-app) using FastAPI.
+- [Deploy Flash applications](/flash/apps/deploy-apps) for production.
+- [View the resource configuration reference](/flash/resource-configuration) for all available options.
+- [Clean up development endpoints](/flash/cli/undeploy) when you're done testing.
diff --git a/flash/resource-configuration.mdx b/flash/resource-configuration.mdx
new file mode 100644
index 00000000..00bb1710
--- /dev/null
+++ b/flash/resource-configuration.mdx
@@ -0,0 +1,269 @@
+---
+title: "Resource configuration reference"
+sidebarTitle: "Configuration reference"
+description: "A complete reference for Flash GPU/CPU resource configuration options."
+tag: "BETA"
+---
+
+Flash provides several resource configuration classes for different use cases. This reference covers all available parameters and options.
+
+## LiveServerless
+
+`LiveServerless` is the primary configuration class for Flash. It supports full remote code execution, allowing you to run arbitrary Python functions on Runpod's infrastructure.
+
+```python
+from runpod_flash import LiveServerless, GpuGroup, CpuInstanceType, PodTemplate
+
+gpu_config = LiveServerless(
+ name="ml-inference",
+ gpus=[GpuGroup.AMPERE_80],
+ workersMax=5,
+ idleTimeout=10,
+ template=PodTemplate(containerDiskInGb=100)
+)
+```
+
+### Parameters
+
+| Parameter | Type | Description | Default |
+|-----------|------|-------------|---------|
+| `name` | `string` | Name for your endpoint (required) | - |
+| `gpus` | `list[GpuGroup]` | GPU pool IDs that can be used by workers | `[GpuGroup.ANY]` |
+| `gpuCount` | `int` | Number of GPUs per worker | 1 |
+| `instanceIds` | `list[CpuInstanceType]` | CPU instance types (forces CPU endpoint) | `None` |
+| `workersMin` | `int` | Minimum number of workers | 0 |
+| `workersMax` | `int` | Maximum number of workers | 3 |
+| `idleTimeout` | `int` | Minutes before scaling down | 5 |
+| `env` | `dict` | Environment variables | `None` |
+| `networkVolumeId` | `string` | Persistent storage volume ID | `None` |
+| `executionTimeoutMs` | `int` | Max execution time in milliseconds | 0 (no limit) |
+| `scalerType` | `string` | Scaling strategy | `QUEUE_DELAY` |
+| `scalerValue` | `int` | Scaling parameter value | 4 |
+| `locations` | `string` | Preferred datacenter locations | `None` |
+| `template` | `PodTemplate` | Pod template overrides | `None` |
+
+### GPU configuration example
+
+```python
+from runpod_flash import LiveServerless, GpuGroup, PodTemplate
+
+config = LiveServerless(
+ name="gpu-inference",
+ gpus=[GpuGroup.AMPERE_80], # A100 80GB
+ gpuCount=1,
+ workersMin=0,
+ workersMax=5,
+ idleTimeout=10,
+ template=PodTemplate(containerDiskInGb=100),
+ env={"MODEL_ID": "llama-7b"}
+)
+```
+
+### CPU configuration example
+
+```python
+from runpod_flash import LiveServerless, CpuInstanceType
+
+config = LiveServerless(
+ name="cpu-processor",
+ instanceIds=[CpuInstanceType.CPU5C_4_8], # 4 vCPU, 8GB RAM
+ workersMax=3,
+ idleTimeout=5
+)
+```
+
+## ServerlessEndpoint
+
+`ServerlessEndpoint` is for GPU workloads that require custom Docker images. Unlike `LiveServerless`, it only supports dictionary payloads and cannot execute arbitrary Python functions.
+
+```python
+from runpod_flash import ServerlessEndpoint, GpuGroup
+
+config = ServerlessEndpoint(
+ name="custom-ml-env",
+ imageName="pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime",
+ gpus=[GpuGroup.AMPERE_80]
+)
+```
+
+### Parameters
+
+All parameters from `LiveServerless` are available, plus:
+
+| Parameter | Type | Description | Default |
+|-----------|------|-------------|---------|
+| `imageName` | `string` | Custom Docker image | - |
+
+### Limitations
+
+- Only supports dictionary payloads in the form of `{"input": {...}}`.
+- Cannot execute arbitrary Python functions remotely.
+- Requires a custom Docker image with a handler that processes the input dictionary.
+
+### Example
+
+```python
+from runpod_flash import ServerlessEndpoint, GpuGroup
+
+# Custom image with pre-installed models
+config = ServerlessEndpoint(
+ name="stable-diffusion",
+ imageName="my-registry/stable-diffusion:v1.0",
+ gpus=[GpuGroup.AMPERE_24],
+ workersMax=3
+)
+
+# Send requests as dictionaries
+result = await config.run({
+ "input": {
+ "prompt": "A beautiful sunset over mountains",
+ "width": 512,
+ "height": 512
+ }
+})
+```
+
+## CpuServerlessEndpoint
+
+`CpuServerlessEndpoint` is for CPU workloads that require custom Docker images. Like `ServerlessEndpoint`, it only supports dictionary payloads.
+
+```python
+from runpod_flash import CpuServerlessEndpoint, CpuInstanceType
+
+config = CpuServerlessEndpoint(
+ name="data-processor",
+ imageName="python:3.11-slim",
+ instanceIds=[CpuInstanceType.CPU5C_4_8]
+)
+```
+
+### Parameters
+
+| Parameter | Type | Description | Default |
+|-----------|------|-------------|---------|
+| `name` | `string` | Name for your endpoint (required) | - |
+| `imageName` | `string` | Custom Docker image | - |
+| `instanceIds` | `list[CpuInstanceType]` | CPU instance types | - |
+| `workersMin` | `int` | Minimum number of workers | 0 |
+| `workersMax` | `int` | Maximum number of workers | 3 |
+| `idleTimeout` | `int` | Minutes before scaling down | 5 |
+| `env` | `dict` | Environment variables | `None` |
+| `networkVolumeId` | `string` | Persistent storage volume ID | `None` |
+| `executionTimeoutMs` | `int` | Max execution time in milliseconds | 0 (no limit) |
+
+## Resource class comparison
+
+| Feature | LiveServerless | ServerlessEndpoint | CpuServerlessEndpoint |
+|---------|----------------|--------------------|-----------------------|
+| Remote code execution | ✅ Full Python function execution | ❌ Dictionary payload only | ❌ Dictionary payload only |
+| Custom Docker images | ❌ Fixed optimized images | ✅ Any Docker image | ✅ Any Docker image |
+| Use case | Dynamic remote functions | Traditional API endpoints | Traditional CPU endpoints |
+| Function returns | Any Python object | Dictionary only | Dictionary only |
+| `@remote` decorator | Full functionality | Limited to payload passing | Limited to payload passing |
+
+## Available GPU types
+
+The `GpuGroup` enum provides access to GPU pools. Some common options:
+
+| GpuGroup | Description | VRAM |
+|----------|-------------|------|
+| `GpuGroup.ANY` | Any available GPU (default) | Varies |
+| `GpuGroup.ADA_24` | RTX 4090 | 24GB |
+| `GpuGroup.AMPERE_24` | RTX A5000, L4, RTX 3090 | 24GB |
+| `GpuGroup.AMPERE_48` | A40, RTX A6000 | 48GB |
+| `GpuGroup.AMPERE_80` | A100 80GB | 80GB |
+
+See [GPU types](/references/gpu-types#gpu-pools) for the complete list of available GPU pools.
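+
+For example, listing several pools of the same memory class broadens availability (the endpoint name is illustrative):
+
+```python
+from runpod_flash import LiveServerless, GpuGroup
+
+# Accept any 24GB GPU by allowing both the Ada and Ampere 24GB pools
+config = LiveServerless(
+    name="any-24gb-worker",
+    gpus=[GpuGroup.ADA_24, GpuGroup.AMPERE_24],
+)
+```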
+
+## Available CPU instance types
+
+The `CpuInstanceType` enum provides access to CPU configurations:
+
+### 3rd generation general purpose
+
+| CpuInstanceType | ID | vCPU | RAM |
+|-----------------|-----|------|-----|
+| `CPU3G_1_4` | cpu3g-1-4 | 1 | 4GB |
+| `CPU3G_2_8` | cpu3g-2-8 | 2 | 8GB |
+| `CPU3G_4_16` | cpu3g-4-16 | 4 | 16GB |
+| `CPU3G_8_32` | cpu3g-8-32 | 8 | 32GB |
+
+### 3rd generation compute-optimized
+
+| CpuInstanceType | ID | vCPU | RAM |
+|-----------------|-----|------|-----|
+| `CPU3C_1_2` | cpu3c-1-2 | 1 | 2GB |
+| `CPU3C_2_4` | cpu3c-2-4 | 2 | 4GB |
+| `CPU3C_4_8` | cpu3c-4-8 | 4 | 8GB |
+| `CPU3C_8_16` | cpu3c-8-16 | 8 | 16GB |
+
+### 5th generation compute-optimized
+
+| CpuInstanceType | ID | vCPU | RAM |
+|-----------------|-----|------|-----|
+| `CPU5C_1_2` | cpu5c-1-2 | 1 | 2GB |
+| `CPU5C_2_4` | cpu5c-2-4 | 2 | 4GB |
+| `CPU5C_4_8` | cpu5c-4-8 | 4 | 8GB |
+| `CPU5C_8_16` | cpu5c-8-16 | 8 | 16GB |
+
+## PodTemplate
+
+Use `PodTemplate` to configure additional pod settings:
+
+```python
+from runpod_flash import LiveServerless, PodTemplate
+
+config = LiveServerless(
+ name="custom-template",
+ template=PodTemplate(
+ containerDiskInGb=100,
+ env=[{"key": "PYTHONPATH", "value": "/workspace"}]
+ )
+)
+```
+
+### Parameters
+
+| Parameter | Type | Description | Default |
+|-----------|------|-------------|---------|
+| `containerDiskInGb` | `int` | Container disk size in GB | 20 |
+| `env` | `list[dict]` | Environment variables as key-value pairs | `None` |
+
+## Environment variables
+
+Environment variables can be set in two ways:
+
+### Using the `env` parameter
+
+```python
+config = LiveServerless(
+ name="api-worker",
+ env={"HF_TOKEN": "your_token", "MODEL_ID": "gpt2"}
+)
+```
+
+### Using PodTemplate
+
+```python
+config = LiveServerless(
+ name="api-worker",
+ template=PodTemplate(
+ env=[
+ {"key": "HF_TOKEN", "value": "your_token"},
+ {"key": "MODEL_ID", "value": "gpt2"}
+ ]
+ )
+)
+```
+
+
+
+Environment variables are excluded from configuration hashing. Changing environment values won't trigger endpoint recreation, which allows different processes to load environment variables from `.env` files without causing false drift detection. Only structural changes (like GPU type, image, or template modifications) trigger endpoint updates.
+
+
+
+## Next steps
+
+- [Create remote functions](/flash/remote-functions) using these configurations.
+- [Deploy Flash applications](/flash/apps/deploy-apps) for production.
+- [Learn about pricing](/flash/pricing) to optimize costs.