diff --git a/README.md b/README.md index 8b9c3ea9..cce420ce 100644 --- a/README.md +++ b/README.md @@ -141,7 +141,7 @@ These endpoints use the same Python `@remote` decorators [demonstrated above](#g ### Step 1: Initialize a new project -Use the `flash init` command to generate a structured project template with a preconfigured FastAPI application entry point. +Use the `flash init` command to generate a project template with example worker files. Run this command to initialize a new project directory: @@ -162,30 +162,24 @@ This is the structure of the project template created by `flash init`: ```txt my_project/ -├── main.py # FastAPI application entry point -├── workers/ -│ ├── gpu/ # GPU worker example -│ │ ├── __init__.py # FastAPI router -│ │ └── endpoint.py # GPU script @remote decorated function -│ └── cpu/ # CPU worker example -│ ├── __init__.py # FastAPI router -│ └── endpoint.py # CPU script with @remote decorated function -├── .env # Environment variable template +├── gpu_worker.py # GPU worker with @remote function +├── cpu_worker.py # CPU worker with @remote function +├── .env # Environment variable template ├── .gitignore # Git ignore patterns ├── .flashignore # Flash deployment ignore patterns -├── requirements.txt # Python dependencies +├── pyproject.toml # Python dependencies (uv/pip compatible) └── README.md # Project documentation ``` This template includes: -- A FastAPI application entry point and routers. +- Example worker files with `@remote` decorated functions. - Templates for Python dependencies, `.env`, `.gitignore`, etc. -- Flash scripts (`endpoint.py`) for both GPU and CPU workers, which include: +- Each worker file contains: - Pre-configured worker scaling limits using the `LiveServerless()` object. - A `@remote` decorated function that returns a response from a worker. 
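As a rough illustration of that shape — the exact template contents, the `@remote` decorator signature, and the stub fallback are assumptions for the sketch, not the literal generated code — a `gpu_worker.py` might look like:

```python
# Hypothetical sketch of a generated worker file (gpu_worker.py).
# The @remote signature and LiveServerless fields are assumptions;
# the ImportError fallback exists only so the sketch runs standalone.
try:
    from runpod_flash import LiveServerless, remote
except ImportError:  # package not installed: fall back to no-op stubs
    class LiveServerless:
        def __init__(self, name):
            self.name = name

    def remote(resource):
        def decorator(fn):
            return fn  # locally, just run the function in-process
        return decorator

# Pre-configured worker scaling limits live on this object.
gpu = LiveServerless(name="example-gpu")

@remote(gpu)
def hello(message: str) -> dict:
    # Returns a response from a (remote) worker when deployed.
    return {"echo": message}

if __name__ == "__main__":
    print(hello("Hello from the GPU!"))
```

Running the file directly (`python gpu_worker.py`) exercises the function in-process, which is handy for a quick check before starting the dev server.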
-When you start the FastAPI server, it creates API endpoints at `/gpu/hello` and `/cpu/hello`, which call the remote function described in their respective `endpoint.py` files. +When you run `flash run`, it auto-discovers all `@remote` functions and generates a local development server at `.flash/server.py`. Queue-based workers are exposed at `/{file_prefix}/run_sync` (e.g., `/gpu_worker/run_sync`). ### Step 3: Install Python dependencies @@ -195,9 +189,11 @@ After initializing the project, navigate into the project directory: cd my_project ``` -Install required dependencies: +Install required dependencies using uv (recommended) or pip: ```bash +uv sync # recommended +# or -pip install -r requirements.txt +pip install -e . ``` @@ -232,7 +228,7 @@ flash run Open a new terminal tab or window and test your GPU API using cURL: ```bash -curl -X POST http://localhost:8888/gpu/hello \ +curl -X POST http://localhost:8888/gpu_worker/run_sync \ -H "Content-Type: application/json" \ -d '{"message": "Hello from the GPU!"}' ``` @@ -257,19 +253,18 @@ Besides starting the API server, `flash run` also starts an interactive API explorer To run remote functions in the explorer: -1. Expand one of the functions under **GPU Workers** or **CPU Workers**. -2. Click **Try it out** and then **Execute** +1. Expand one of the available endpoints (e.g., `/gpu_worker/run_sync`). +2. Click **Try it out** and then **Execute**. You'll get a response from your workers right in the explorer. ### Step 7: Customize your API -To customize your API endpoint and functionality: +To customize your API: -1. Add/edit remote functions in your `endpoint.py` files. -2. Test the scripts individually by running `python endpoint.py`. -3. Configure your FastAPI routers by editing the `__init__.py` files. -4. Add any new endpoints to your `main.py` file. +1. Create new `.py` files with `@remote` decorated functions. +2. Test the scripts individually by running `python your_worker.py`. +3. 
Run `flash run` to auto-discover all `@remote` functions and serve them. ## CLI Reference @@ -541,7 +536,7 @@ After `flash build` completes: - `.flash/artifact.tar.gz`: Deployment package - `.flash/flash_manifest.json`: Service discovery configuration -For information on load-balanced endpoints (required for Mothership and HTTP services), see [docs/Load_Balancer_Endpoints.md](docs/Load_Balancer_Endpoints.md). +For information on load-balanced endpoints (required for HTTP services), see [docs/Load_Balancer_Endpoints.md](docs/Load_Balancer_Endpoints.md). #### Troubleshooting Build Issues diff --git a/VERIFICATION.md b/VERIFICATION.md deleted file mode 100644 index 02def0ec..00000000 --- a/VERIFICATION.md +++ /dev/null @@ -1,303 +0,0 @@ -# Docker Image Constants Fix - Verification Guide - -This document provides step-by-step instructions for verifying the Docker image constant configuration fix. - -## Overview - -**Commit**: `1f3a6fd` - "refactor(resources): centralize docker image configuration" - -The fix centralizes all Docker image references into constants that support environment variable overrides. This eliminates hardcoded image names and enables flexible configuration for local development, testing, and production deployment. 
- -## Quick Start - -### Run All Tests - -```bash -cd /Users/deanquinanola/Github/python/runpod-flash - -# Run the verification script -uv run python3 scripts/test-image-constants.py -``` - -Expected output: -``` -✓ 20/20 tests passed -✓ ALL TESTS PASSED - -The Docker image configuration fix is working correctly: - ✓ Constants are properly centralized - ✓ Manifest builder uses constants - ✓ LiveServerless classes use constants - ✓ Environment variables override constants - ✓ No hardcoded values remain -``` - -## Individual Test Scenarios - -### Test 1: Constants Are Defined - -```bash -uv run python3 << 'EOF' -import sys -sys.path.insert(0, 'src') - -from runpod_flash.core.resources.constants import ( - FLASH_IMAGE_TAG, - FLASH_GPU_IMAGE, - FLASH_CPU_IMAGE, - FLASH_LB_IMAGE, - FLASH_CPU_LB_IMAGE, - DEFAULT_WORKERS_MIN, - DEFAULT_WORKERS_MAX, -) - -print(f"FLASH_IMAGE_TAG: {FLASH_IMAGE_TAG}") -print(f"FLASH_GPU_IMAGE: {FLASH_GPU_IMAGE}") -print(f"FLASH_CPU_IMAGE: {FLASH_CPU_IMAGE}") -print(f"FLASH_LB_IMAGE: {FLASH_LB_IMAGE}") -print(f"FLASH_CPU_LB_IMAGE: {FLASH_CPU_LB_IMAGE}") -print(f"DEFAULT_WORKERS_MIN: {DEFAULT_WORKERS_MIN}") -print(f"DEFAULT_WORKERS_MAX: {DEFAULT_WORKERS_MAX}") -EOF -``` - -### Test 2: Environment Variable Override (FLASH_IMAGE_TAG=local) - -```bash -FLASH_IMAGE_TAG=local uv run python3 << 'EOF' -import sys -sys.path.insert(0, 'src') - -from runpod_flash.core.resources.constants import ( - FLASH_IMAGE_TAG, - FLASH_GPU_IMAGE, - FLASH_LB_IMAGE, - FLASH_CPU_LB_IMAGE, -) - -print(f"With FLASH_IMAGE_TAG={FLASH_IMAGE_TAG}:") -print(f" FLASH_GPU_IMAGE: {FLASH_GPU_IMAGE}") -print(f" FLASH_LB_IMAGE: {FLASH_LB_IMAGE}") -print(f" FLASH_CPU_LB_IMAGE: {FLASH_CPU_LB_IMAGE}") - -assert ":local" in FLASH_GPU_IMAGE -assert ":local" in FLASH_LB_IMAGE -assert ":local" in FLASH_CPU_LB_IMAGE -print("✓ All images use :local tag") -EOF -``` - -### Test 3: Individual Image Override - -```bash -FLASH_CPU_LB_IMAGE=custom/lb-cpu:v1 uv run python3 << 'EOF' -import sys 
-sys.path.insert(0, 'src') - -from runpod_flash.core.resources.constants import FLASH_CPU_LB_IMAGE - -print(f"FLASH_CPU_LB_IMAGE: {FLASH_CPU_LB_IMAGE}") -assert FLASH_CPU_LB_IMAGE == "custom/lb-cpu:v1" -print("✓ Custom override works") -EOF -``` - -### Test 4: Manifest Builder Uses Constants - -```bash -uv run python3 << 'EOF' -import sys -sys.path.insert(0, 'src') - -from pathlib import Path -from runpod_flash.cli.commands.build_utils.manifest import ManifestBuilder -from runpod_flash.core.resources.constants import ( - FLASH_CPU_LB_IMAGE, - DEFAULT_WORKERS_MIN, - DEFAULT_WORKERS_MAX, -) - -builder = ManifestBuilder(project_name="test", remote_functions=[]) -mothership = builder._create_mothership_resource({ - "file_path": Path("main.py"), - "app_variable": "app" -}) - -print(f"Mothership configuration:") -print(f" imageName: {mothership['imageName']} (expected: {FLASH_CPU_LB_IMAGE})") -print(f" workersMin: {mothership['workersMin']} (expected: {DEFAULT_WORKERS_MIN})") -print(f" workersMax: {mothership['workersMax']} (expected: {DEFAULT_WORKERS_MAX})") - -assert mothership['imageName'] == FLASH_CPU_LB_IMAGE -assert mothership['workersMin'] == DEFAULT_WORKERS_MIN -assert mothership['workersMax'] == DEFAULT_WORKERS_MAX - -print("✓ Manifest builder uses constants correctly") -EOF -``` - -### Test 5: LiveServerless Uses Constants - -```bash -uv run python3 << 'EOF' -import sys -sys.path.insert(0, 'src') - -from runpod_flash import LiveServerless, LiveLoadBalancer, CpuLiveLoadBalancer -from runpod_flash.core.resources.constants import ( - FLASH_GPU_IMAGE, - FLASH_LB_IMAGE, - FLASH_CPU_LB_IMAGE, -) - -gpu_ls = LiveServerless(name="test-gpu") -gpu_lb = LiveLoadBalancer(name="test-gpu-lb") -cpu_lb = CpuLiveLoadBalancer(name="test-cpu-lb") - -print(f"Resource image configuration:") -print(f" LiveServerless: {gpu_ls.imageName} (expected: {FLASH_GPU_IMAGE})") -print(f" LiveLoadBalancer: {gpu_lb.imageName} (expected: {FLASH_LB_IMAGE})") -print(f" CpuLiveLoadBalancer: 
{cpu_lb.imageName} (expected: {FLASH_CPU_LB_IMAGE})") - -assert gpu_ls.imageName == FLASH_GPU_IMAGE -assert gpu_lb.imageName == FLASH_LB_IMAGE -assert cpu_lb.imageName == FLASH_CPU_LB_IMAGE - -print("✓ All LiveServerless classes use correct image constants") -EOF -``` - -### Test 6: No Hardcoded Values Remain - -```bash -# Verify no hardcoded image names in manifest.py -grep -n "runpod/runpod-flash-lb" src/runpod_flash/cli/commands/build_utils/manifest.py || echo "✓ No hardcoded images found" - -# Verify constants are imported -grep "FLASH_CPU_LB_IMAGE\|FLASH_LB_IMAGE\|DEFAULT_WORKERS" src/runpod_flash/cli/commands/build_utils/manifest.py -``` - -### Test 7: Unit Tests Pass - -```bash -# Run manifest mothership tests -uv run pytest tests/unit/cli/commands/build_utils/test_manifest_mothership.py -v - -# Run all tests -uv run pytest --tb=short -``` - -## Test Coverage - -The verification tests cover: - -1. **Constants Definition** (✓ 7 tests) - - All 7 constants properly defined - - Default values correct - - Support environment variable overrides - -2. **Manifest Builder Integration** (✓ 3 tests) - - `_create_mothership_resource()` uses constants - - `_create_mothership_from_explicit()` uses constants - - Worker count constants used correctly - -3. **LiveServerless Integration** (✓ 3 tests) - - `LiveServerless` uses `FLASH_GPU_IMAGE` - - `LiveLoadBalancer` uses `FLASH_LB_IMAGE` - - `CpuLiveLoadBalancer` uses `FLASH_CPU_LB_IMAGE` - -4. **Environment Variable Overrides** (✓ 1 test) - - `FLASH_IMAGE_TAG=dev` works correctly - - Individual image overrides work - -5. 
**Code Quality** (✓ 6 tests) - - No hardcoded image names remain - - Constants are properly imported - - Code follows project patterns - -## Environment Variables - -### Global Override: FLASH_IMAGE_TAG - -Affects all images at once: - -```bash -export FLASH_IMAGE_TAG=local -# or -export FLASH_IMAGE_TAG=dev -# or -export FLASH_IMAGE_TAG=staging -``` - -### Individual Overrides - -Override specific images: - -```bash -export FLASH_GPU_IMAGE=my-registry/runpod-flash:custom -export FLASH_CPU_IMAGE=my-registry/runpod-flash-cpu:custom -export FLASH_LB_IMAGE=my-registry/runpod-flash-lb:custom -export FLASH_CPU_LB_IMAGE=my-registry/runpod-flash-lb-cpu:custom -``` - -## Files Modified - -- `src/runpod_flash/cli/commands/build_utils/manifest.py` - Uses constants -- `src/runpod_flash/cli/commands/test_mothership.py` - Uses constants -- `src/runpod_flash/core/resources/constants.py` - Centralizes constants -- `src/runpod_flash/core/resources/live_serverless.py` - Imports from constants -- `tests/unit/cli/commands/build_utils/test_manifest_mothership.py` - Updated tests - -## Related Documentation - -- **Commit**: `1f3a6fd` - Full diff of changes -- **CLAUDE.md**: Project development guidelines -- **README**: Project overview - -## Future Verification - -To re-run this verification after future changes: - -```bash -cd /Users/deanquinanola/Github/python/runpod-flash -uv run python3 scripts/test-image-constants.py -``` - -This script can be retained indefinitely and re-run to ensure the fix remains intact. 
- -## Troubleshooting - -### Test Fails with "Module not found" - -Make sure you're running from the runpod-flash directory: -```bash -cd /Users/deanquinanola/Github/python/runpod-flash -``` - -### Constants Have Unexpected Values - -Check if environment variables are set: -```bash -echo $FLASH_IMAGE_TAG -echo $FLASH_CPU_LB_IMAGE -``` - -Unset them if they're interfering: -```bash -unset FLASH_IMAGE_TAG FLASH_CPU_LB_IMAGE FLASH_LB_IMAGE -``` - -### Manifest Not Using Constants - -Verify imports in manifest.py: -```bash -grep "from runpod_flash.core.resources.constants import" src/runpod_flash/cli/commands/build_utils/manifest.py -``` - -## Summary - -✅ All hardcoded image names have been eliminated -✅ Constants are centralized with environment variable support -✅ All tests pass (856 passed, 68.74% coverage) -✅ Backward compatible (defaults unchanged) -✅ Ready for production deployment diff --git a/docs/Cross_Endpoint_Routing.md b/docs/Cross_Endpoint_Routing.md index aa851705..b9318c9b 100644 --- a/docs/Cross_Endpoint_Routing.md +++ b/docs/Cross_Endpoint_Routing.md @@ -342,7 +342,7 @@ graph TD B -->|"load service configuration"| C["ServiceRegistry"] C -->|"if not cached"| D["ManifestClient"] - D -->|"query mothership API"| E["Manifest
Endpoint URLs"] + D -->|"query State Manager API"| E["Manifest
Endpoint URLs"] E -->|"cache result
TTL 300s"| C C -->|"lookup in manifest
flash_manifest.json"| F{"Routing
Decision"} @@ -465,7 +465,7 @@ class ServiceRegistry: Environment Variables (for local vs remote detection): RUNPOD_API_KEY: API key for State Manager GraphQL access (peer-to-peer). - FLASH_RESOURCE_NAME: Resource config name for this endpoint (child endpoints). + FLASH_RESOURCE_NAME: Resource config name for this endpoint (worker endpoints). Identifies which resource config this endpoint represents in the manifest. RUNPOD_ENDPOINT_ID: Endpoint ID (used as fallback for identification). """ @@ -473,7 +473,7 @@ class ServiceRegistry: self._state_manager_client = state_manager_client or StateManagerClient() self._endpoint_registry = {} # Cached endpoint URLs self._endpoint_registry_lock = asyncio.Lock() - # Child endpoints use FLASH_RESOURCE_NAME to identify which resource they represent + # Worker endpoints use FLASH_RESOURCE_NAME to identify which resource they represent # Falls back to RUNPOD_ENDPOINT_ID if not set self._current_endpoint = os.getenv("FLASH_RESOURCE_NAME") or os.getenv( "RUNPOD_ENDPOINT_ID" @@ -531,7 +531,7 @@ class ServiceRegistry: **Location**: `src/runpod_flash/runtime/state_manager_client.py` -GraphQL client for State Manager manifest persistence (used by mothership auto-provisioning): +GraphQL client for State Manager manifest persistence (used by endpoint auto-provisioning): ```python class StateManagerClient: @@ -542,10 +542,13 @@ class StateManagerClient: """ async def get_persisted_manifest( - self, mothership_id: str + self, flash_environment_id: str ) -> Optional[Dict[str, Any]]: """Fetch persisted manifest from State Manager. + Args: + flash_environment_id: ID of the Flash environment. + Returns: Manifest dict or None if not found (first boot). 
@@ -556,7 +559,7 @@ class StateManagerClient: async def update_resource_state( self, - mothership_id: str, + flash_environment_id: str, resource_name: str, resource_data: Dict[str, Any], ) -> None: @@ -815,7 +818,7 @@ class JsonSerializer: #### Adding New Manifest Backends -To support directories other than mothership: +To support alternative manifest backends: 1. Create client class with `get_manifest()` method: ```python @@ -974,7 +977,7 @@ print(f"RUNPOD_ENDPOINT_ID: {os.getenv('RUNPOD_ENDPOINT_ID')}") # Check state manager client directly client = StateManagerClient() -manifest = await client.get_persisted_manifest(mothership_id) +manifest = await client.get_persisted_manifest(flash_environment_id) ``` ## Peer-to-Peer Architecture with StateManagerClient @@ -983,7 +986,7 @@ manifest = await client.get_persisted_manifest(mothership_id) Cross-endpoint routing uses a **peer-to-peer architecture** where all endpoints query State Manager directly for service discovery. This eliminates single points of failure and simplifies the system architecture compared to previous hub-and-spoke models. -**Key Difference**: No mothership endpoint exposing a `/manifest` HTTP endpoint. Instead, all endpoints use `StateManagerClient` to query the Runpod GraphQL API directly. +**Key Difference**: No dedicated endpoint exposing a `/manifest` HTTP endpoint. Instead, all endpoints use `StateManagerClient` to query the Runpod GraphQL API directly. 
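The lookup behavior described in this document — a 300-second TTL cache with exponential-backoff retries (default 3 attempts) around the State Manager query — can be sketched as follows. This is an illustrative pattern, not the real `StateManagerClient`; `fetch` stands in for the GraphQL call and all names are invented:

```python
# Illustrative sketch of TTL caching + exponential backoff around a
# service-discovery lookup. Names are assumptions, not the real client.
import time


class CachedLookup:
    def __init__(self, fetch, ttl_seconds=300.0, attempts=3, base_delay=0.5):
        self._fetch = fetch          # callable that queries State Manager
        self._ttl = ttl_seconds
        self._attempts = attempts
        self._base_delay = base_delay
        self._value = None
        self._fetched_at = None

    def get(self):
        # Serve the cached result while the TTL is still valid.
        if self._fetched_at is not None and time.monotonic() - self._fetched_at < self._ttl:
            return self._value
        delay = self._base_delay
        for attempt in range(self._attempts):
            try:
                self._value = self._fetch()
                self._fetched_at = time.monotonic()
                return self._value
            except Exception:
                if attempt == self._attempts - 1:
                    raise  # out of retries, surface the failure
                time.sleep(delay)
                delay *= 2  # exponential backoff
```

Repeated calls within the TTL return the cached manifest without touching the API; transient failures are retried with growing delays before giving up.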
### Architecture @@ -1034,7 +1037,7 @@ export RUNPOD_ENDPOINT_ID=gpu-endpoint-123 - **Caching**: 300-second TTL cache to minimize API calls - **Retry Logic**: Exponential backoff on failures (default 3 attempts) - **Thread-Safe**: Uses `asyncio.Lock` for concurrent operations -- **Auto-Provisioning**: Used by mothership provisioner to update resource state +- **Auto-Provisioning**: Used by endpoint provisioner to update resource state ## Key Implementation Highlights diff --git a/docs/Deployment_Architecture.md b/docs/Deployment_Architecture.md index cc395aaf..c483debf 100644 --- a/docs/Deployment_Architecture.md +++ b/docs/Deployment_Architecture.md @@ -1,7 +1,7 @@ # Flash App Deployment Architecture Specification ## Overview -A deployed Flash App consists of a Mothership coordinator and distributed Child Endpoints, where functions are partitioned across endpoints. The system uses a manifest-driven approach to route requests and coordinate execution across the distributed topology. +A deployed Flash App consists of peer endpoints, where functions are partitioned across endpoints. The system uses a manifest-driven approach to route requests and coordinate execution across the distributed topology. ## Build and Deploy Flow @@ -11,33 +11,31 @@ graph TD B -->|"Write"| C["flash_manifest.json"] B -->|"Archive"| D["artifact.tar.gz"] - D -->|"flash deploy"| E["Push Archive +
Provision Resources"] + D -->|"flash deploy"| E["Push Archive +
Load Manifest"] - E -->|"CLI provisions
upfront"| F["Child Endpoints
Deployed"] - - G["🎯 Mothership
Endpoint"] -->|"Load from
.flash/"| H["Load Local
Manifest"] - - H --> I["reconcile_children()"] + E --> I["Reconcile:
Compute Diff"] I --> J["Categorize:
New, Changed,
Removed, Unchanged"] - J --> K["Verify NEW
Endpoints"] - J --> L["Verify CHANGED
Endpoints"] - J --> M["Verify REMOVED
Endpoints"] + J --> K["Provision NEW
Endpoints"] + J --> L["Update CHANGED
Endpoints"] + J --> M["Remove DELETED
Endpoints"] J --> N["Skip UNCHANGED
Endpoints"] - K -->|"Healthy?"| O["Update State"] - L -->|"Healthy?"| O - M -->|"Decommissioned?"| O + K -->|"Deployed"| O["Update State"] + L -->|"Updated"| O + M -->|"Decommissioned"| O O --> P["Persist to State Manager"] - P --> Q["🚀 Reconciliation
Complete"] + P --> Q["🚀 Deploy
Complete"] + + Q -.->|"Endpoints boot"| F["Peer Endpoints
Running"] F -.->|"Peer-to-peer
Service Discovery"| R["Query State Manager
GraphQL API"] style A fill:#1976d2,stroke:#0d47a1,stroke-width:3px,color:#fff - style G fill:#1976d2,stroke:#0d47a1,stroke-width:3px,color:#fff + style E fill:#1976d2,stroke:#0d47a1,stroke-width:3px,color:#fff style I fill:#f57c00,stroke:#bf360c,stroke-width:3px,color:#fff style K fill:#f57c00,stroke:#bf360c,stroke-width:3px,color:#fff style L fill:#f57c00,stroke:#bf360c,stroke-width:3px,color:#fff @@ -51,9 +49,9 @@ graph TD ```mermaid graph TD - A["Request arrives at
Mothership for funcA"] -->|"Consult manifest"| B{"Function
Location?"} + A["Request arrives at
Endpoint for funcA"] -->|"Consult manifest"| B{"Function
Location?"} - B -->|"Local to Mothership"| C["Execute locally"] + B -->|"Local to Endpoint"| C["Execute locally"] B -->|"On Endpoint1"| D["Route request to
Endpoint1 with payload"] D --> E["Endpoint1 receives
Endpoint1>funcA"] @@ -69,7 +67,7 @@ graph TD L --> J J --> M["funcA completes
with all results"] - M --> N["Response back
to Mothership"] + M --> N["Response back
to Endpoint"] N --> O["Return to client"] style A fill:#1976d2,stroke:#0d47a1,stroke-width:3px,color:#fff @@ -82,7 +80,7 @@ graph TD ```mermaid graph LR - subgraph Mothership["🎯 Mothership
(Coordinator)"] + subgraph CoordinatorNode["🎯 Manifest Store"] MF["Manifest Store
Function Map"] end @@ -105,7 +103,7 @@ graph LR E1F1 -.->|"Local execution"| E1F2 E1F1 -.->|"Remote call"| E2F1 - style Mothership fill:#1976d2,stroke:#0d47a1,stroke-width:3px,color:#fff + style CoordinatorNode fill:#1976d2,stroke:#0d47a1,stroke-width:3px,color:#fff style EP1 fill:#f57c00,stroke:#bf360c,stroke-width:3px,color:#fff style EP2 fill:#f57c00,stroke:#bf360c,stroke-width:3px,color:#fff style MF fill:#1565c0,stroke:#0d47a1,stroke-width:2px,color:#fff @@ -118,8 +116,8 @@ graph LR - **Smart Routing**: System automatically determines if execution is local (in-process) or remote (inter-endpoint) - **Deployed Mode**: Unlike Live mode, endpoints are aware they're in distributed deployment with explicit role assignments - **Transparent Execution**: Functions can call other functions without knowing deployment topology; manifest handles routing -- **State Synchronization**: Mothership maintains single source of truth, synced with GQL State Manager -- **Reconciliation**: On each boot, Mothership reconciles local manifest with persisted state to deploy/update/undeploy resources +- **State Synchronization**: State Manager maintains the source of truth; endpoints sync via GraphQL +- **Reconciliation**: The CLI reconciles the manifest with persisted state during `flash deploy` - **Peer-to-Peer Discovery**: All endpoints query State Manager GraphQL API directly for service discovery ## Actual Manifest Structure @@ -285,12 +283,8 @@ Each reconciliation action updates State Manager: ## Environment Variables -### Mothership -- `FLASH_IS_MOTHERSHIP=true` - Identifies this endpoint as mothership -- `RUNPOD_API_KEY` - For State Manager authentication -- `FLASH_MANIFEST_PATH` - Optional explicit path to manifest - -### Child Endpoints +### All Endpoints - `RUNPOD_API_KEY` - For State Manager GraphQL access (peer-to-peer service discovery) - `FLASH_RESOURCE_NAME` - Which resource config this endpoint represents -- `RUNPOD_ENDPOINT_ID` - This child's endpoint ID +- 
`RUNPOD_ENDPOINT_ID` - This endpoint's ID (set by Runpod) +- `FLASH_MANIFEST_PATH` - Optional explicit path to manifest diff --git a/docs/Flash_Deploy_Guide.md b/docs/Flash_Deploy_Guide.md index 234e5f33..1cd39780 100644 --- a/docs/Flash_Deploy_Guide.md +++ b/docs/Flash_Deploy_Guide.md @@ -30,12 +30,7 @@ graph TB subgraph Cloud["Runpod Cloud"] S3["S3 Storage
artifact.tar.gz"] - subgraph Mothership["Mothership Endpoint
(FLASH_IS_MOTHERSHIP=true)"] - MothershipReconciler["MothershipsProvisioner
Reconcile Children"] - MothershipState["State Sync
to State Manager"] - end - - subgraph ChildEndpoints["Child Endpoints
(Resource Configs)"] + subgraph Endpoints["Peer Endpoints
(one per resource config)"] Handler1["GPU Handler
@remote functions"] Handler2["CPU Handler
@remote functions"] StateQuery["Service Registry
Query State Manager"] @@ -47,22 +42,19 @@ graph TB Developer -->|flash build| Build Build -->|archive| S3 Developer -->|flash deploy --env| S3 - CLI -->|provision upfront
before activation| ChildEndpoints - Mothership -->|reconcile_children
on boot| ChildEndpoints - MothershipReconciler -->|update state| Database - ChildEndpoints -->|query manifest
peer-to-peer| Database - Developer -->|call @remote| ChildEndpoints - - style Mothership fill:#1976d2,stroke:#0d47a1,stroke-width:3px,color:#fff - style ChildEndpoints fill:#388e3c,stroke:#1b5e20,stroke-width:3px,color:#fff + CLI -->|provision all endpoints| Endpoints + Endpoints -->|query manifest
peer-to-peer| Database + Developer -->|call @remote| Endpoints + + style Endpoints fill:#388e3c,stroke:#1b5e20,stroke-width:3px,color:#fff style Build fill:#f57c00,stroke:#e65100,stroke-width:3px,color:#fff ``` ### Key Concepts -**Mothership**: The orchestration endpoint responsible for deployment, resource provisioning, and manifest distribution. Created via `flash env create `. +**Endpoints**: All deployed endpoints are peers. The CLI provisions them upfront during `flash deploy`. Each endpoint loads the manifest from its `.flash/` directory and queries State Manager for peer discovery. -**Child Endpoints**: Worker endpoints that execute `@remote` functions. One per resource config (e.g., `gpu_config`, `cpu_config`). +**Worker Endpoints**: Endpoints that execute `@remote` functions. One per resource config (e.g., `gpu_config`, `cpu_config`). **Manifest**: JSON document describing all deployed functions, their resource configs, routing rules, and metadata. Built at compile-time, distributed to all endpoints. @@ -76,7 +68,7 @@ graph TB ### flash env create -Create a new deployment environment (mothership). +Create a new deployment environment. ```bash flash env create [--app ] @@ -91,7 +83,7 @@ flash env create [--app ] **What it does:** 1. Creates a FlashApp in Runpod (if first environment for the app) 2. Creates FlashEnvironment with the specified name -3. Provisions a mothership serverless endpoint +3. Provisions serverless endpoints **Example:** ```bash @@ -277,93 +269,37 @@ sequenceDiagram **Upload Process** (`src/runpod_flash/cli/commands/deploy.py:197-224`): 1. Archive uploaded to Runpod's built-in S3 storage 2. URL generated with temporary access -3. URL passed to mothership endpoint creation +3. 
URL passed to endpoint creation **Key Files:** - `src/runpod_flash/cli/commands/deploy.py` - Deploy CLI commands --- -### Phase 3: Mothership Boot & Reconciliation +### Phase 3: Endpoint Boot & Service Discovery -The mothership runs on each boot to perform reconcile_children() - reconciling desired state (manifest) with current state (local resources). Note: All resources are provisioned upfront by the CLI before environment activation. +Each endpoint boots independently. Endpoints that make cross-endpoint calls (i.e., call `@remote` functions deployed on a different resource config) query State Manager to discover peer endpoint URLs. Endpoints that only execute local functions do not need State Manager access. ```mermaid sequenceDiagram - Runpod->>Mothership: Boot endpoint - Mothership->>Mothership: Initialize runtime - Mothership->>ManifestFetcher: Load manifest from .flash/ - ManifestFetcher->>ManifestFetcher: Read flash_manifest.json - Mothership->>MothershipsProvisioner: Execute reconcile_children() - MothershipsProvisioner->>StateManager: Fetch persisted state - StateManager->>GraphQL: Query persisted manifest - GraphQL->>StateManager: Return persisted manifest - MothershipsProvisioner->>MothershipsProvisioner: Compute diff:
new, changed, removed - MothershipsProvisioner->>StateManager: Update state after
reconciliation - StateManager->>GraphQL: Mutation:
updateFlashBuildManifest - MothershipsProvisioner->>Mothership: Reconciliation complete -``` - -**Key Components:** - -**MothershipsProvisioner** (`src/runpod_flash/runtime/mothership_provisioner.py`): -- `is_mothership()`: Check if endpoint is mothership (FLASH_IS_MOTHERSHIP=true) -- `reconcile_children()`: Compute diff between desired and current state -- Verifies child endpoints are deployed and healthy -- Updates State Manager with reconciliation results - -**ResourceManager** (`src/runpod_flash/core/resources/resource_manager.py`): -- Singleton pattern (global resource registry) -- Stores state in `.runpod/resources.pkl` with file locking -- Tracks config hashes for drift detection (hash comparison) -- Provisioned upfront by CLI before environment activation -- Auto-migrates legacy resources - -**StateManagerClient** (`src/runpod_flash/runtime/state_manager_client.py`): -- GraphQL client for persisting manifest state -- Read-modify-write pattern for updates (3 GQL roundtrips) -- Thread-safe with asyncio.Lock for concurrent updates -- Retries with exponential backoff (3 attempts) - -**Reconciliation Logic**: -1. **Fetch persisted manifest**: Query State Manager for previous reconciliation state -2. **Compare with current manifest**: Detect new, changed, and removed resources -3. **Verify new resources**: Check that new endpoints are deployed and healthy -4. **Verify changed resources**: Check if hash differs, verify endpoint health -5. **Verify removed resources**: Check that deleted endpoints are decommissioned -6. 
**Persist new state**: Update State Manager with current reconciliation results - -**Key Files:** -- `src/runpod_flash/runtime/mothership_provisioner.py` - Reconciliation logic -- `src/runpod_flash/core/resources/resource_manager.py` - Resource provisioning -- `src/runpod_flash/runtime/state_manager_client.py` - State persistence - ---- - -### Phase 4: Child Endpoint Initialization - -Each child endpoint boots independently and prepares for function execution. - -```mermaid -sequenceDiagram - Runpod->>Child: Boot with handler_gpu_config.py - Child->>Child: Initialize runtime - Child->>ManifestFetcher: Load manifest from .flash/ + Runpod->>Endpoint: Boot with handler + Endpoint->>Endpoint: Initialize runtime + Endpoint->>ManifestFetcher: Load manifest from .flash/ ManifestFetcher->>ManifestFetcher: Check cache
(TTL: 300s) alt Cache expired - ManifestFetcher->>StateManager: Query GraphQL API
State Manager + ManifestFetcher->>StateManager: Query GraphQL API StateManager->>ManifestFetcher: Return manifest else Cache valid ManifestFetcher->>ManifestFetcher: Return cached end - ManifestFetcher->>Child: Manifest loaded - Child->>ServiceRegistry: Load manifest + ManifestFetcher->>Endpoint: Manifest loaded + Endpoint->>ServiceRegistry: Load manifest ServiceRegistry->>ServiceRegistry: Build function_registry ServiceRegistry->>ServiceRegistry: Build resource_mapping - Child->>StateManager: Query State Manager
peer-to-peer discovery - StateManager->>Child: Return peer endpoints - Child->>ServiceRegistry: Cache endpoint URLs - Child->>Ready: Ready to execute functions + Endpoint->>StateManager: Query State Manager
peer-to-peer discovery + StateManager->>Endpoint: Return peer endpoints + Endpoint->>ServiceRegistry: Cache endpoint URLs + Endpoint->>Ready: Ready to execute functions ``` **ManifestFetcher** (`src/runpod_flash/runtime/manifest_fetcher.py`): @@ -394,7 +330,7 @@ sequenceDiagram --- -### Phase 5: Runtime Function Execution +### Phase 4: Runtime Function Execution When client calls `@remote function`: @@ -530,7 +466,7 @@ The manifest is the contract between build-time and runtime. It defines all depl ### Runtime: Distribution & Caching -**Mothership Side** - `ManifestFetcher`: +**Endpoint Side** - `ManifestFetcher`: 1. **Check cache**: Is manifest cached and TTL valid? - Cache TTL: 300 seconds (configurable) @@ -547,7 +483,7 @@ The manifest is the contract between build-time and runtime. It defines all depl **Code Reference**: `src/runpod_flash/runtime/manifest_fetcher.py:47-118` -**Child Endpoint Side** - `ServiceRegistry`: +**Worker Endpoint Side** - `ServiceRegistry`: 1. **Load manifest**: From local file - Searches multiple locations (cwd, module dir, etc) @@ -558,7 +494,7 @@ The manifest is the contract between build-time and runtime. It defines all depl 3. **Query State Manager**: Get endpoint URLs via GraphQL - Queries Runpod State Manager GraphQL API directly - - Returns: Resource endpoints for all deployed child endpoints + - Returns: Resource endpoints for all deployed worker endpoints - Retries with exponential backoff 4. **Cache endpoints**: Store for routing decisions @@ -608,7 +544,7 @@ Write: Mutation updateFlashBuildManifest ## Resource Provisioning -Resources are dynamically provisioned by the mothership during boot, based on the manifest. +Resources are provisioned by the CLI during `flash deploy`, based on the manifest. 
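Reconciliation categorizes resources as new, changed, removed, or unchanged by comparing config hashes. A minimal sketch of that diff step, assuming both the desired manifest and the persisted state reduce to `{resource_name: config_hash}` mappings (the real `compute_manifest_diff` may operate on richer structures):

```python
# Minimal sketch of hash-based manifest diffing. Assumes plain
# {resource_name: config_hash} mappings; the real implementation
# may differ in shape and field names.
def compute_manifest_diff(desired, persisted):
    new = sorted(n for n in desired if n not in persisted)
    removed = sorted(n for n in persisted if n not in desired)
    changed = sorted(n for n in desired
                     if n in persisted and desired[n] != persisted[n])
    unchanged = sorted(n for n in desired
                       if n in persisted and desired[n] == persisted[n])
    return {"new": new, "changed": changed,
            "removed": removed, "unchanged": unchanged}
```

For example, diffing `{"gpu_config": "abc"}` against a persisted `{"gpu_config": "old"}` categorizes `gpu_config` as changed, which triggers an update while identical hashes are skipped.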
### ResourceManager: Local State @@ -646,14 +582,14 @@ Resources are provisioned by the CLI during th ### Deployment Orchestration -**MothershipsProvisioner** reconciles manifest with local state: +During `flash deploy`, the CLI reconciles the manifest with the state persisted in State Manager: ```python # 1. Load manifest from flash_manifest.json manifest = load_manifest() # 2. Fetch persisted state from State Manager -persisted = await StateManagerClient.get_persisted_manifest(mothership_id) +persisted = await StateManagerClient.get_persisted_manifest(flash_environment_id) # 3. Compute diff diff = compute_manifest_diff(manifest, persisted) @@ -674,7 +610,7 @@ for resource_config in diff.removed: delete_resource(resource_config) # 7. Persist new state -await StateManagerClient.update_resource_state(mothership_id, resources) +await StateManagerClient.update_resource_state(flash_environment_id, resources) ``` **Parallel Deployment**: @@ -686,8 +622,6 @@ await StateManagerClient.update_resource_state(flash_environment_id, resources) - If hashes differ: Resource has been modified, trigger update - Prevents unnecessary updates when resource unchanged -**Code Reference**: `src/runpod_flash/runtime/mothership_provisioner.py:1-150` - --- ## Remote Execution @@ -870,21 +804,15 @@ graph TB Archive["Archive Builder
(tar.gz)"]
    end

-    subgraph Upload["Upload"]
+    subgraph Deploy["Deploy (CLI)"]
        S3["S3 Storage"]
+        Provisioner["ResourceManager
(provision endpoints)"]
+        StateMgr["StateManagerClient
(persist state)"]
    end

-    subgraph MothershipBoot["Mothership Boot"]
-        Fetcher["ManifestFetcher
(cache + GQL)"]
-        MProvisioner["MothershipsProvisioner
(reconciliation)"]
-        ResMgr["ResourceManager
(state)"]
-        StateMgr["StateManagerClient
(persistence)"]
-    end
-
-    subgraph ChildBoot["Child Endpoint Boot"]
-        ChildFetcher["ManifestFetcher
(local file)"]
+    subgraph EndpointBoot["Endpoint Boot"]
+        Fetcher["ManifestFetcher
(local file + GQL)"]
        Registry["ServiceRegistry
(function mapping)"]
-        ManifestC["ManifestClient
(query mothership)"]
    end

    subgraph Runtime["Runtime Execution"]

@@ -896,20 +824,16 @@ graph TB
    Scanner --> ManifestB
    ManifestB --> Archive
    Archive --> S3
-    S3 --> Fetcher
-    Fetcher --> MProvisioner
-    MProvisioner --> ResMgr
-    ResMgr --> StateMgr
-    StateMgr -->|update| S3
-    ChildFetcher --> Registry
-    ManifestC -->|query| Fetcher
-    Registry --> ManifestC
+    S3 --> Provisioner
+    Provisioner --> StateMgr
+    Fetcher --> Registry
+    Registry -->|query State Manager
peer-to-peer| StateMgr
    Handler --> Serial
    Serial --> Exec

    style Build fill:#f57c00,stroke:#e65100,stroke-width:3px,color:#fff
-    style MothershipBoot fill:#1976d2,stroke:#0d47a1,stroke-width:3px,color:#fff
-    style ChildBoot fill:#388e3c,stroke:#1b5e20,stroke-width:3px,color:#fff
+    style Deploy fill:#1976d2,stroke:#0d47a1,stroke-width:3px,color:#fff
+    style EndpointBoot fill:#388e3c,stroke:#1b5e20,stroke-width:3px,color:#fff
    style Runtime fill:#7b1fa2,stroke:#4a148c,stroke-width:3px,color:#fff
```

@@ -921,14 +845,12 @@ graph TB
graph LR
    A["Build Time
ManifestBuilder"] -->|Generate| B["flash_manifest.json
(embedded in archive)"]
    B -->|Upload| C["S3
(artifact.tar.gz)"]
-    C -->|Provision upfront
before activation| D["Child Endpoints
(deployed)"]
+    C -->|CLI provisions
endpoints| D["Endpoints
(deployed)"]
    D -->|Extract from
.flash/ directory| E["LocalManifest
(from archive)"]
-    Mothership -->|Load from
.flash/| E
    E -->|Build registry| F["ServiceRegistry
(function mapping)"]
    F -->|Query State Manager
peer-to-peer| G["StateManager
(GraphQL API)"]
    G -->|Return endpoints| F
    F -->|Route calls| H["Handler
(execute)"]
-    Mothership -->|reconcile_children
on boot| D

    style A fill:#f57c00,stroke:#e65100,stroke-width:2px,color:#fff
    style B fill:#ff6f00,stroke:#e65100,stroke-width:2px,color:#fff

@@ -938,7 +860,6 @@ graph LR
    style F fill:#388e3c,stroke:#1b5e20,stroke-width:2px,color:#fff
    style G fill:#0d47a1,stroke:#051c66,stroke-width:2px,color:#fff
    style H fill:#388e3c,stroke:#1b5e20,stroke-width:2px,color:#fff
-    style Mothership fill:#1976d2,stroke:#0d47a1,stroke-width:2px,color:#fff
```

---

@@ -947,7 +868,7 @@ graph LR

```mermaid
graph LR
-    A["Mothership Boots"] -->|Load manifest| B["Desired State"]
+    A["CLI: flash deploy"] -->|Load manifest| B["Desired State"]
    B -->|Fetch persisted| C["Current State"]
    C -->|Compute diff| D{"Reconciliation"}
    D -->|new| E["Create Resource"]

@@ -959,7 +880,7 @@ graph LR
    D -->|removed| J["Delete Resource"]
    J -->|Decommission| K["Deleted"]
    K -->|Remove state| G
-    G -->|On next boot| C
+    G -->|On next deploy| C

    style A fill:#1976d2,stroke:#0d47a1,stroke-width:2px,color:#fff
    style B fill:#1976d2,stroke:#0d47a1,stroke-width:2px,color:#fff

@@ -974,39 +895,26 @@ graph LR

## Environment Variables Reference

-### Mothership Configuration
-
-**FLASH_IS_MOTHERSHIP** (Required on mothership)
-- Value: `"true"`
-- Enables mothership auto-provisioning logic
-- Triggers manifest reconciliation on boot
-
-**RUNPOD_ENDPOINT_ID** (Required on mothership)
-- Runpod serverless endpoint ID
-- Used to construct mothership URL: `https://{RUNPOD_ENDPOINT_ID}.api.runpod.ai`
-- Set automatically by Runpod platform
+### All Endpoints

-**RUNPOD_API_KEY** (Required for State Manager)
+**RUNPOD_API_KEY** (Required)
- Runpod API authentication token
- Used by StateManagerClient for GraphQL queries
-- Enables manifest persistence
+- Enables peer-to-peer service discovery and manifest persistence

-### Child Endpoint Configuration
-
-**FLASH_RESOURCE_NAME** (Required on child endpoints)
+**FLASH_RESOURCE_NAME** (Required)
- Resource config name (e.g., "gpu_config", "cpu_config")
- Identifies which resource config this
endpoint represents
- Used by ServiceRegistry for local vs remote detection

-**RUNPOD_API_KEY** (Required for peer-to-peer discovery)
-- API key for State Manager GraphQL access
-- Enables endpoints to query manifest peer-to-peer
-- Used by all endpoints for service discovery
+**RUNPOD_ENDPOINT_ID** (Set by Runpod)
+- Runpod serverless endpoint ID
+- Used to construct endpoint URL: `https://{RUNPOD_ENDPOINT_ID}.api.runpod.ai`
+- Set automatically by Runpod platform

**FLASH_MANIFEST_PATH** (Optional)
- Override default manifest file location
- If not set, searches: cwd, module dir, parent dirs
-- Useful for testing or non-standard layouts

### Runtime Configuration

@@ -1048,7 +956,7 @@ Flash Deploy uses a dual-layer state system for reliability and consistency.

### Remote State: Runpod State Manager (GraphQL API)

-**Purpose**: Persist deployment state across mothership boots
+**Purpose**: Persist deployment state across endpoint boots

**Data Model**:
```graphql

@@ -1092,7 +1000,7 @@ async with state_manager_lock:
```

**Reconciliation**:
-On mothership boot:
+On deploy:
1. Load local manifest from .flash/ (desired state)
2. Fetch persisted manifest from State Manager (previous reconciliation state)
3. Compare → detect new, changed, removed resources

@@ -1117,9 +1025,9 @@ flash build --preview

1. Builds your project (creates archive, manifest)
2. Creates a Docker network for inter-container communication
3. Starts one Docker container per resource config:
-   - Mothership container (orchestrator)
+   - Application container
   - All worker containers (GPU, CPU, etc.)
-4. Exposes mothership on `localhost:8000`
+4. Exposes application on `localhost:8000`
5. All containers communicate via Docker DNS
6.
Auto-cleanup on exit (Ctrl+C)

@@ -1132,25 +1040,6 @@ flash build --preview

**Code Reference**: `src/runpod_flash/cli/commands/preview.py`

-### Local Docker Testing
-
-For testing complete deployment flow locally:
-
-```bash
-# Build project
-flash build
-
-# Start local mothership simulator
-docker run -it \
-  -e FLASH_IS_MOTHERSHIP=true \
-  -e RUNPOD_API_KEY=$RUNPOD_API_KEY \
-  -v $(pwd)/.flash:/workspace/.flash \
-  runpod-flash:latest
-
-# Run provisioner
-python -m runpod_flash.runtime.mothership_provisioner
-```
-
### Debugging Tips

**Enable Debug Logging**:

@@ -1189,7 +1078,6 @@ logging.getLogger("runpod_flash.runtime.service_registry").setLevel(logging.DEBU

|------|---------|
| `src/runpod_flash/cli/commands/deploy.py` | Deploy environment management commands |
| `src/runpod_flash/cli/commands/build.py` | Build packaging and archive creation |
-| `src/runpod_flash/cli/commands/test_mothership.py` | Local mothership testing |

### Build System

@@ -1212,7 +1100,6 @@ logging.getLogger("runpod_flash.runtime.service_registry").setLevel(logging.DEBU

|------|---------|
| `src/runpod_flash/runtime/manifest_fetcher.py` | Manifest loading from local .flash/ directory |
| `src/runpod_flash/runtime/state_manager_client.py` | GraphQL client for peer-to-peer service discovery |
-| `src/runpod_flash/runtime/mothership_provisioner.py` | Auto-provisioning logic |

### Runtime: Execution

@@ -1235,7 +1122,7 @@ logging.getLogger("runpod_flash.runtime.service_registry").setLevel(logging.DEBU

## Common Issues & Solutions

-### Issue: Manifest not found on child endpoint
+### Issue: Manifest not found on worker endpoint

**Cause**: flash_manifest.json not included in archive or not found at runtime

@@ -1255,13 +1142,13 @@ logging.getLogger("runpod_flash.runtime.service_registry").setLevel(logging.DEBU

### Issue: Remote function calls fail with endpoint not found

-**Cause**: ServiceRegistry unable to query mothership or manifest outdated
+**Cause**: ServiceRegistry unable to query State
Manager or manifest outdated

**Solution**:
1. Verify `RUNPOD_API_KEY` environment variable is set
2. Check State Manager GraphQL API is accessible
3. Verify manifest includes the resource config: `grep resource_name flash_manifest.json`
-4. Check that child endpoints are deployed and healthy
+4. Check that worker endpoints are deployed and healthy

### Issue: Manifest cache staleness

diff --git a/docs/Load_Balancer_Endpoints.md b/docs/Load_Balancer_Endpoints.md
index 091e1893..77ba38bc 100644
--- a/docs/Load_Balancer_Endpoints.md
+++ b/docs/Load_Balancer_Endpoints.md
@@ -4,7 +4,7 @@

The `LoadBalancerSlsResource` class enables provisioning and management of Runpod load-balanced serverless endpoints. Unlike queue-based endpoints that process requests sequentially, load-balanced endpoints expose HTTP servers directly to clients, enabling REST APIs, webhooks, and real-time communication patterns.

-This resource type is used for specialized endpoints like the Mothership. Cross-endpoint service discovery now uses State Manager GraphQL API (peer-to-peer) rather than HTTP endpoints.
+This resource type is used for specialized entry-point endpoints that serve HTTP traffic directly. Cross-endpoint service discovery now uses the State Manager GraphQL API (peer-to-peer) rather than HTTP endpoints.

## Design Context

@@ -35,10 +35,10 @@ Load-balanced endpoints require different provisioning and health check logic th

### Why This Matters

-The Mothership coordinates resource deployment and reconciliation. This requires:
-- Peer-to-peer service discovery via State Manager GraphQL API (not HTTP-based)
-- Ability to expose custom endpoints (HTTP routes like `/ping`, user-defined routes)
-- Health checking to verify children are ready before routing traffic
+Load-balanced endpoints expose HTTP servers directly to clients.
This enables:
+- Custom HTTP routes (user-defined REST endpoints, `/ping` for health checks)
+- Direct request routing to workers (lower latency than queue-based)
+- Health check polling to verify workers are ready before routing traffic

## Architecture

@@ -147,9 +147,9 @@ This document focuses on the `LoadBalancerSlsResource` class implementation and

from runpod_flash import LoadBalancerSlsResource

# Create a load-balanced endpoint
-mothership = LoadBalancerSlsResource(
-    name="mothership",
-    imageName="my-mothership-app:latest",
+api_endpoint = LoadBalancerSlsResource(
+    name="api-endpoint",
+    imageName="my-api-app:latest",
    workersMin=1,
    workersMax=3,
    env={

@@ -159,7 +159,7 @@ mothership = LoadBalancerSlsResource(
)

# Deploy endpoint (returns immediately)
-deployed = await mothership.deploy()
+deployed = await api_endpoint.deploy()

# Endpoint is now deployed (may still be initializing)
print(f"Endpoint ID: {deployed.id}")

@@ -246,7 +246,7 @@ except ValueError as e:

```python
try:
    endpoint = LoadBalancerSlsResource(
-        name="mothership",
+        name="api-endpoint",
        imageName="my-image:latest",
    )
    deployed = await endpoint.deploy()

@@ -294,10 +294,10 @@ If you need to verify the endpoint is ready before routing traffic:

```python
# Deploy returns immediately
-mothership = await LoadBalancerSlsResource(name="my-lb", ...).deploy()
+endpoint = await LoadBalancerSlsResource(name="my-lb", ...).deploy()

# Optional: Wait for endpoint to become healthy
-healthy = await mothership._wait_for_health(max_retries=10, retry_interval=5)
+healthy = await endpoint._wait_for_health(max_retries=10, retry_interval=5)
if not healthy:
    print("Warning: Endpoint deployed but not yet healthy")
```

@@ -319,7 +319,7 @@ Default health check configuration:

| Scalability | Per-function | Per-worker |
| Health checks | Runpod SDK | `/ping` endpoint |
| Use cases | Batch processing | APIs, webhooks, real-time |
-| Suitable for | Workers | Mothership, services |
+| Suitable for | Workers | APIs, services
|

## Implementation Details

@@ -411,7 +411,6 @@ endpoint = LoadBalancerSlsResource(

## Next Steps

-- **Mothership integration**: Use LoadBalancerSlsResource for Mothership endpoints
+- **Entry-point integration**: Use LoadBalancerSlsResource for entry-point endpoints
- **Upfront provisioning**: CLI provisions all resources before environment activation
-- **Reconciliation**: Mothership performs reconcile_children() on boot
- **Cross-endpoint routing**: Route requests using State Manager GraphQL API (peer-to-peer)

diff --git a/src/runpod_flash/cli/docs/README.md b/src/runpod_flash/cli/docs/README.md
index a9a70853..1a1b4dfe 100644
--- a/src/runpod_flash/cli/docs/README.md
+++ b/src/runpod_flash/cli/docs/README.md
@@ -15,7 +15,7 @@ Create a new project, navigate to it, and install dependencies:

```bash
flash init my-project
cd my-project
-pip install -r requirements.txt
+uv sync  # or: pip install -r requirements.txt
```

Add your Runpod API key to `.env`:

@@ -295,16 +295,10 @@ Default location: `.flash/logs/activity.log`

```
my-project/
-├── main.py            # Flash Server (FastAPI)
-├── workers/
-│   ├── gpu/           # GPU worker
-│   │   ├── __init__.py
-│   │   └── endpoint.py
-│   └── cpu/           # CPU worker
-│       ├── __init__.py
-│       └── endpoint.py
+├── gpu_worker.py      # GPU worker with @remote function
+├── cpu_worker.py      # CPU worker with @remote function
├── .env
-├── requirements.txt
+├── pyproject.toml     # Python dependencies (uv/pip compatible)
└── README.md
```

@@ -322,12 +316,12 @@ RUNPOD_API_KEY=your_api_key_here

curl http://localhost:8888/ping

# Call GPU worker
-curl -X POST http://localhost:8888/gpu/hello \
+curl -X POST http://localhost:8888/gpu_worker/run_sync \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello GPU!"}'

# Call CPU worker
-curl -X POST http://localhost:8888/cpu/hello \
+curl -X POST http://localhost:8888/cpu_worker/run_sync \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello CPU!"}'
```

diff --git a/src/runpod_flash/cli/docs/flash-app.md
b/src/runpod_flash/cli/docs/flash-app.md
index 3abc29a2..00cecaff 100644
--- a/src/runpod_flash/cli/docs/flash-app.md
+++ b/src/runpod_flash/cli/docs/flash-app.md
@@ -444,8 +444,7 @@ flash deploy --app my-project
```

Or ensure you're in a valid Flash project directory with:
-- `main.py` with Flash server
-- `workers/` directory
+- Python files containing `@remote` decorated functions
- Proper project structure

### Multiple Apps With Same Name

diff --git a/src/runpod_flash/cli/docs/flash-build.md b/src/runpod_flash/cli/docs/flash-build.md
index 120fe60e..deb0e633 100644
--- a/src/runpod_flash/cli/docs/flash-build.md
+++ b/src/runpod_flash/cli/docs/flash-build.md
@@ -108,9 +108,9 @@ Launch a local Docker-based test environment immediately after building. This al

1. Builds your project (creates archive, manifest)
2. Creates a Docker network for inter-container communication
3. Starts one Docker container per resource config:
-   - Mothership container (orchestrator)
+   - Application container
   - All worker containers (GPU, CPU, etc.)
-4. Exposes the mothership on `localhost:8000`
+4. Exposes the application on `localhost:8000`
5. All containers communicate via Docker DNS
6.
On shutdown (Ctrl+C), automatically stops and removes all containers @@ -192,7 +192,7 @@ Successful build displays: ### Build fails with "functions not found" -Ensure your project has `@remote` decorated functions in `workers/` directory: +Ensure your project has `@remote` decorated functions in your `.py` files: ```python from runpod_flash import remote, LiveServerless diff --git a/src/runpod_flash/cli/docs/flash-deploy.md b/src/runpod_flash/cli/docs/flash-deploy.md index 504ad874..d0fcb6a7 100644 --- a/src/runpod_flash/cli/docs/flash-deploy.md +++ b/src/runpod_flash/cli/docs/flash-deploy.md @@ -27,25 +27,25 @@ The `flash deploy` command is the primary way to get your Flash application runn ## Architecture: Fully Deployed to Runpod -With `flash deploy`, your **entire application** runs on Runpod Serverless—both your FastAPI app (the "orchestrator") and all `@remote` worker functions: +With `flash deploy`, your **entire application** runs on Runpod Serverless—all `@remote` functions deploy as peer serverless endpoints: ``` ┌─────────────────────────────────────────────────────────────────┐ │ RUNPOD SERVERLESS │ │ │ -│ ┌─────────────────────────────────────┐ │ -│ │ MOTHERSHIP ENDPOINT │ │ -│ │ (your FastAPI app from main.py) │ │ -│ │ - Your HTTP routes │ │ -│ │ - Orchestrates @remote calls │───────────┐ │ -│ │ - Public URL for users │ │ │ -│ └─────────────────────────────────────┘ │ │ -│ │ internal │ -│ ▼ │ +│ All endpoints deployed as peers, using manifest for discovery │ +│ │ │ ┌─────────────────────────┐ ┌─────────────────────────┐ │ │ │ gpu-worker │ │ cpu-worker │ │ │ │ (your @remote function) │ │ (your @remote function) │ │ │ └─────────────────────────┘ └─────────────────────────┘ │ +│ │ +│ ┌─────────────────────────┐ │ +│ │ lb-worker │ │ +│ │ (load-balanced endpoint)│ │ +│ └─────────────────────────┘ │ +│ │ +│ Service discovery: flash_manifest.json + State Manager GraphQL │ └─────────────────────────────────────────────────────────────────┘ ▲ │ HTTPS 
(authenticated) @@ -56,9 +56,8 @@ With `flash deploy`, your **entire application** runs on Runpod Serverless—bot ``` **Key points:** -- **Your FastAPI app runs on Runpod** as the "mothership" endpoint -- **`@remote` functions run on Runpod** as separate worker endpoints -- **Users call the mothership URL** directly (e.g., `https://xyz123.api.runpod.ai/api/hello`) +- **All `@remote` functions run on Runpod** as serverless endpoints +- **Users call endpoint URLs** directly (e.g., `https://xyz123.api.runpod.ai/api/hello`) - **No `live-` prefix** on endpoint names (these are production endpoints) - **No hot reload:** code changes require a new deployment @@ -68,7 +67,7 @@ This is different from `flash run`, where your FastAPI app runs locally on your | Aspect | `flash run` | `flash deploy` | |--------|-------------|----------------| -| **FastAPI app runs on** | Your machine (localhost) | Runpod Serverless (mothership) | +| **App runs on** | Your machine (localhost) | Runpod Serverless | | **`@remote` functions run on** | Runpod Serverless | Runpod Serverless | | **Endpoint naming** | `live-` prefix (e.g., `live-gpu-worker`) | No prefix (e.g., `gpu-worker`) | | **Hot reload** | Yes | No | @@ -183,9 +182,9 @@ Builds your project and launches a local Docker-based test environment instead o 1. Builds your project (creates the archive and manifest) 2. Creates a Docker network for inter-container communication 3. Starts one Docker container per resource config: - - Mothership container (orchestrator) + - Application container - All worker containers (GPU, CPU, etc.) -4. Exposes the mothership on `localhost:8000` +4. Exposes the application on `localhost:8000` 5. All containers communicate via Docker DNS 6. On shutdown (Ctrl+C), automatically stops and removes all containers @@ -350,7 +349,7 @@ Next Steps: variable... 2. Call Your Functions - Your mothership is deployed at: + Your application is deployed at: https://api-xxxxx.runpod.net 3. 
Available Routes

diff --git a/src/runpod_flash/cli/docs/flash-env.md b/src/runpod_flash/cli/docs/flash-env.md
index c3f87744..81ce7993 100644
--- a/src/runpod_flash/cli/docs/flash-env.md
+++ b/src/runpod_flash/cli/docs/flash-env.md
@@ -464,8 +464,7 @@ flash env delete

**Problem**: Command requires `--app` flag even when in project directory

**Solution**: Ensure you're in a Flash project directory with:
-- `main.py` with Flash server
-- `workers/` directory
+- Python files containing `@remote` decorated functions
- `.env` file with `RUNPOD_API_KEY`

Or specify app explicitly:

diff --git a/src/runpod_flash/cli/docs/flash-init.md b/src/runpod_flash/cli/docs/flash-init.md
index 082b619a..19c32f13 100644
--- a/src/runpod_flash/cli/docs/flash-init.md
+++ b/src/runpod_flash/cli/docs/flash-init.md
@@ -4,7 +4,7 @@

Create a new Flash project with a ready-to-use template structure.

## Overview

-The `flash init` command scaffolds a new Flash project with everything you need to get started: a main server (mothership), example GPU and CPU workers, and the directory structure that Flash expects. It's the fastest way to go from zero to a working distributed application.
+The `flash init` command scaffolds a new Flash project with everything you need to get started: example GPU and CPU worker files with `@remote` functions and the project structure that Flash expects. It's the fastest way to go from zero to a working distributed application.

> **Note:** This command only creates **local files**. It doesn't interact with Runpod or create any cloud resources. Cloud resources (apps, environments, endpoints) are created later when you run `flash deploy`.
@@ -51,16 +51,10 @@ flash init my-project --force ``` my-project/ -├── main.py # Flash Server (FastAPI) -├── workers/ -│ ├── gpu/ # GPU worker example -│ │ ├── __init__.py -│ │ └── endpoint.py -│ └── cpu/ # CPU worker example -│ ├── __init__.py -│ └── endpoint.py +├── gpu_worker.py # GPU worker with @remote function +├── cpu_worker.py # CPU worker with @remote function ├── .env -├── requirements.txt +├── pyproject.toml # Python dependencies (uv/pip compatible) └── README.md ``` @@ -68,7 +62,7 @@ my-project/ ```bash cd my-project -pip install -r requirements.txt # or use your preferred environment manager +uv sync # or: pip install -r requirements.txt # Add RUNPOD_API_KEY to .env flash run ``` diff --git a/src/runpod_flash/cli/docs/flash-run.md b/src/runpod_flash/cli/docs/flash-run.md index 0b9cfd73..70976d6c 100644 --- a/src/runpod_flash/cli/docs/flash-run.md +++ b/src/runpod_flash/cli/docs/flash-run.md @@ -4,7 +4,7 @@ Start the Flash development server for testing/debugging/development. ## Overview -The `flash run` command starts a local development server that hosts your FastAPI app on your machine while deploying `@remote` functions to Runpod Serverless. This hybrid architecture lets you rapidly iterate on your application with hot-reload while testing real GPU/CPU workloads in the cloud. +The `flash run` command starts a local development server that auto-discovers your `@remote` functions and serves them on your machine while deploying them to Runpod Serverless. This hybrid architecture lets you rapidly iterate on your application with hot-reload while testing real GPU/CPU workloads in the cloud. Use `flash run` when you want to skip the build step and test/develop/debug your remote functions rapidly before deploying your full application with `flash deploy`. (See [Flash Deploy](./flash-deploy.md) for details.) 
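The route convention described above — each discovered worker file is exposed at `/{file_prefix}/run_sync` — can be illustrated with a tiny helper. This is a sketch of the naming rule only (`run_sync_route` is a hypothetical name, not part of the flash CLI):

```python
from pathlib import Path

def run_sync_route(worker_file: str) -> str:
    """Derive the queue-based route for a worker file.

    Mirrors the documented convention that a file like
    gpu_worker.py is exposed at /gpu_worker/run_sync.
    """
    prefix = Path(worker_file).stem  # filename without the .py extension
    return f"/{prefix}/run_sync"

print(run_sync_route("gpu_worker.py"))  # -> /gpu_worker/run_sync
print(run_sync_route("cpu_worker.py"))  # -> /cpu_worker/run_sync
```

So renaming a worker file changes its route: the file prefix is the routing key for queue-based workers.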
@@ -16,10 +16,10 @@ With `flash run`, your system runs in a **hybrid architecture**: ┌─────────────────────────────────────────────────────────────────┐ │ YOUR MACHINE (localhost:8888) │ │ ┌─────────────────────────────────────┐ │ -│ │ FastAPI App (main.py) │ │ -│ │ - Your HTTP routes │ │ -│ │ - Orchestrates @remote calls │─────────┐ │ -│ │ - Hot-reload enabled │ │ │ +│ │ Auto-generated server │ │ +│ │ (.flash/server.py) │ │ +│ │ - Discovers @remote functions │─────────┐ │ +│ │ - Hot-reload via watchfiles │ │ │ │ └─────────────────────────────────────┘ │ │ └──────────────────────────────────────────────────│──────────────┘ │ HTTPS @@ -34,10 +34,11 @@ With `flash run`, your system runs in a **hybrid architecture**: ``` **Key points:** -- **Your FastAPI app runs locally** on your machine (uvicorn at `localhost:8888`) +- **`flash run` auto-discovers `@remote` functions** and generates `.flash/server.py` +- **Queue-based (QB) routes execute locally** at `/{file_prefix}/run_sync` +- **Load-balanced (LB) routes dispatch remotely** via `LoadBalancerSlsStub` - **`@remote` functions run on Runpod** as serverless endpoints -- **Your machine is the orchestrator** that calls remote endpoints when you invoke `@remote` functions -- **Hot reload works** because your app code is local—changes are picked up instantly +- **Hot reload** watches for `.py` file changes via watchfiles - **Endpoints are prefixed with `live-`** to distinguish development endpoints from production (e.g., `gpu-worker` becomes `live-gpu-worker`) This is different from `flash deploy`, where **everything** (including your FastAPI app) runs on Runpod. See [flash deploy](./flash-deploy.md) for the fully-deployed architecture. @@ -73,9 +74,9 @@ flash run --host 0.0.0.0 --port 8000 ## What It Does -1. Discovers `main.py` (or `app.py`, `server.py`) -2. Checks for FastAPI app -3. Starts uvicorn server with hot reload +1. Scans project files for `@remote` decorated functions +2. 
Generates `.flash/server.py` with QB and LB routes +3. Starts uvicorn server with hot-reload via watchfiles 4. GPU workers use LiveServerless (no packaging needed) ### How It Works @@ -84,8 +85,11 @@ When you call a `@remote` function using `flash run`, Flash deploys a **Serverle ``` flash run │ + ├── Scans project for @remote functions + ├── Generates .flash/server.py ├── Starts local server (e.g. localhost:8888) - │ └── Hosts your FastAPI mothership + │ ├── QB routes: /{file_prefix}/run_sync (local execution) + │ └── LB routes: /{file_prefix}/{path} (remote dispatch) │ └── On @remote function call: └── Deploys a Serverless endpoint (if not cached) @@ -106,7 +110,7 @@ Auto-provisioning discovers and deploys Serverless endpoints before the Flash de ### How It Works -1. **Resource Discovery**: Scans your FastAPI app for `@remote` decorated functions +1. **Resource Discovery**: Scans project files for `@remote` decorated functions 2. **Parallel Deployment**: Deploys resources concurrently (up to 3 at a time) 3. **Confirmation**: Asks for confirmation if deploying more than 5 endpoints 4. **Caching**: Stores deployed resources in `.runpod/resources.pkl` for reuse across runs
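The resource cache in step 4 can be illustrated with a minimal pickle round-trip. This is a sketch assuming a simple dict payload; the actual on-disk format of flash's `.runpod/resources.pkl` is not specified here:

```python
import pickle
from pathlib import Path

CACHE_PATH = Path(".runpod/resources.pkl")  # path named in the docs

def save_resources(resources: dict, path: Path = CACHE_PATH) -> None:
    # Persist deployed-resource metadata so later runs can reuse endpoints
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(pickle.dumps(resources))

def load_resources(path: Path = CACHE_PATH) -> dict:
    # Return the cached resources, or an empty dict on a first run
    return pickle.loads(path.read_bytes()) if path.exists() else {}

save_resources({"live-gpu-worker": {"endpoint_id": "abc123"}})
print(load_resources())  # -> {'live-gpu-worker': {'endpoint_id': 'abc123'}}
```

On a subsequent `flash run`, a cache hit like this is what lets auto-provisioning skip redeploying an endpoint that already exists.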