diff --git a/README.md b/README.md
index 8b9c3ea9..cce420ce 100644
--- a/README.md
+++ b/README.md
@@ -141,7 +141,7 @@ These endpoints use the same Python `@remote` decorators [demonstrated above](#g
### Step 1: Initialize a new project
-Use the `flash init` command to generate a structured project template with a preconfigured FastAPI application entry point.
+Use the `flash init` command to generate a project template with example worker files.
Run this command to initialize a new project directory:
@@ -162,30 +162,24 @@ This is the structure of the project template created by `flash init`:
```txt
my_project/
-├── main.py              # FastAPI application entry point
-├── workers/
-│   ├── gpu/             # GPU worker example
-│   │   ├── __init__.py  # FastAPI router
-│   │   └── endpoint.py  # GPU script with @remote decorated function
-│   └── cpu/             # CPU worker example
-│       ├── __init__.py  # FastAPI router
-│       └── endpoint.py  # CPU script with @remote decorated function
-├── .env                 # Environment variable template
+├── gpu_worker.py        # GPU worker with @remote function
+├── cpu_worker.py        # CPU worker with @remote function
+├── .env                 # Environment variable template
├── .gitignore           # Git ignore patterns
├── .flashignore         # Flash deployment ignore patterns
-├── requirements.txt     # Python dependencies
+├── pyproject.toml       # Python dependencies (uv/pip compatible)
└── README.md            # Project documentation
```
This template includes:
-- A FastAPI application entry point and routers.
+- Example worker files with `@remote` decorated functions.
- Templates for Python dependencies, `.env`, `.gitignore`, etc.
-- Flash scripts (`endpoint.py`) for both GPU and CPU workers, which include:
+- Worker files, each of which contains:
- Pre-configured worker scaling limits using the `LiveServerless()` object.
- A `@remote` decorated function that returns a response from a worker.
-When you start the FastAPI server, it creates API endpoints at `/gpu/hello` and `/cpu/hello`, which call the remote function described in their respective `endpoint.py` files.
+When you run `flash run`, it auto-discovers all `@remote` functions and generates a local development server at `.flash/server.py`. Queue-based workers are exposed at `/{file_prefix}/run_sync` (e.g., `/gpu_worker/run_sync`).
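+
+For reference, a generated worker file has roughly this shape (a sketch: the exact import path, `LiveServerless` fields, and `@remote` signature come from the template and may differ by version):
+
+```python
+# gpu_worker.py
+from runpod_flash import LiveServerless, remote
+
+# Worker scaling limits are pre-configured on the LiveServerless() object
+gpu = LiveServerless(name="gpu-worker")
+
+@remote(resource_config=gpu)
+def hello(message: str) -> dict:
+    # Returns a response from a worker
+    return {"response": f"Worker received: {message}"}
+```
+
+Because this file is named `gpu_worker.py`, the generated server exposes it at `/gpu_worker/run_sync`.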
### Step 3: Install Python dependencies
@@ -195,9 +189,11 @@ After initializing the project, navigate into the project directory:
cd my_project
```
-Install required dependencies:
+Install required dependencies using uv (recommended) or pip:
```bash
+uv sync # recommended
+# or
-pip install -r requirements.txt
+pip install -e .  # pip reads dependencies from pyproject.toml
```
@@ -232,7 +228,7 @@ flash run
Open a new terminal tab or window and test your GPU API using cURL:
```bash
-curl -X POST http://localhost:8888/gpu/hello \
+curl -X POST http://localhost:8888/gpu_worker/run_sync \
-H "Content-Type: application/json" \
-d '{"message": "Hello from the GPU!"}'
```
@@ -257,19 +253,18 @@ Besides starting the API server, `flash run` also starts an interactive API expl
To run remote functions in the explorer:
-1. Expand one of the functions under **GPU Workers** or **CPU Workers**.
-2. Click **Try it out** and then **Execute**
+1. Expand one of the available endpoints (e.g., `/gpu_worker/run_sync`).
+2. Click **Try it out** and then **Execute**.
You'll get a response from your workers right in the explorer.
### Step 7: Customize your API
-To customize your API endpoint and functionality:
+To customize your API:
-1. Add/edit remote functions in your `endpoint.py` files.
-2. Test the scripts individually by running `python endpoint.py`.
-3. Configure your FastAPI routers by editing the `__init__.py` files.
-4. Add any new endpoints to your `main.py` file.
+1. Create new `.py` files with `@remote` decorated functions.
+2. Test the scripts individually by running `python your_worker.py`.
+3. Run `flash run` to auto-discover all `@remote` functions and serve them.
## CLI Reference
@@ -541,7 +536,7 @@ After `flash build` completes:
- `.flash/artifact.tar.gz`: Deployment package
- `.flash/flash_manifest.json`: Service discovery configuration
-For information on load-balanced endpoints (required for Mothership and HTTP services), see [docs/Load_Balancer_Endpoints.md](docs/Load_Balancer_Endpoints.md).
+For information on load-balanced endpoints (required for HTTP services), see [docs/Load_Balancer_Endpoints.md](docs/Load_Balancer_Endpoints.md).
#### Troubleshooting Build Issues
diff --git a/VERIFICATION.md b/VERIFICATION.md
deleted file mode 100644
index 02def0ec..00000000
--- a/VERIFICATION.md
+++ /dev/null
@@ -1,303 +0,0 @@
-# Docker Image Constants Fix - Verification Guide
-
-This document provides step-by-step instructions for verifying the Docker image constant configuration fix.
-
-## Overview
-
-**Commit**: `1f3a6fd` - "refactor(resources): centralize docker image configuration"
-
-The fix centralizes all Docker image references into constants that support environment variable overrides. This eliminates hardcoded image names and enables flexible configuration for local development, testing, and production deployment.
-
-## Quick Start
-
-### Run All Tests
-
-```bash
-cd /Users/deanquinanola/Github/python/runpod-flash
-
-# Run the verification script
-uv run python3 scripts/test-image-constants.py
-```
-
-Expected output:
-```
-✓ 20/20 tests passed
-✓ ALL TESTS PASSED
-
-The Docker image configuration fix is working correctly:
- ✓ Constants are properly centralized
- ✓ Manifest builder uses constants
- ✓ LiveServerless classes use constants
- ✓ Environment variables override constants
- ✓ No hardcoded values remain
-```
-
-## Individual Test Scenarios
-
-### Test 1: Constants Are Defined
-
-```bash
-uv run python3 << 'EOF'
-import sys
-sys.path.insert(0, 'src')
-
-from runpod_flash.core.resources.constants import (
- FLASH_IMAGE_TAG,
- FLASH_GPU_IMAGE,
- FLASH_CPU_IMAGE,
- FLASH_LB_IMAGE,
- FLASH_CPU_LB_IMAGE,
- DEFAULT_WORKERS_MIN,
- DEFAULT_WORKERS_MAX,
-)
-
-print(f"FLASH_IMAGE_TAG: {FLASH_IMAGE_TAG}")
-print(f"FLASH_GPU_IMAGE: {FLASH_GPU_IMAGE}")
-print(f"FLASH_CPU_IMAGE: {FLASH_CPU_IMAGE}")
-print(f"FLASH_LB_IMAGE: {FLASH_LB_IMAGE}")
-print(f"FLASH_CPU_LB_IMAGE: {FLASH_CPU_LB_IMAGE}")
-print(f"DEFAULT_WORKERS_MIN: {DEFAULT_WORKERS_MIN}")
-print(f"DEFAULT_WORKERS_MAX: {DEFAULT_WORKERS_MAX}")
-EOF
-```
-
-### Test 2: Environment Variable Override (FLASH_IMAGE_TAG=local)
-
-```bash
-FLASH_IMAGE_TAG=local uv run python3 << 'EOF'
-import sys
-sys.path.insert(0, 'src')
-
-from runpod_flash.core.resources.constants import (
- FLASH_IMAGE_TAG,
- FLASH_GPU_IMAGE,
- FLASH_LB_IMAGE,
- FLASH_CPU_LB_IMAGE,
-)
-
-print(f"With FLASH_IMAGE_TAG={FLASH_IMAGE_TAG}:")
-print(f" FLASH_GPU_IMAGE: {FLASH_GPU_IMAGE}")
-print(f" FLASH_LB_IMAGE: {FLASH_LB_IMAGE}")
-print(f" FLASH_CPU_LB_IMAGE: {FLASH_CPU_LB_IMAGE}")
-
-assert ":local" in FLASH_GPU_IMAGE
-assert ":local" in FLASH_LB_IMAGE
-assert ":local" in FLASH_CPU_LB_IMAGE
-print("✓ All images use :local tag")
-EOF
-```
-
-### Test 3: Individual Image Override
-
-```bash
-FLASH_CPU_LB_IMAGE=custom/lb-cpu:v1 uv run python3 << 'EOF'
-import sys
-sys.path.insert(0, 'src')
-
-from runpod_flash.core.resources.constants import FLASH_CPU_LB_IMAGE
-
-print(f"FLASH_CPU_LB_IMAGE: {FLASH_CPU_LB_IMAGE}")
-assert FLASH_CPU_LB_IMAGE == "custom/lb-cpu:v1"
-print("✓ Custom override works")
-EOF
-```
-
-### Test 4: Manifest Builder Uses Constants
-
-```bash
-uv run python3 << 'EOF'
-import sys
-sys.path.insert(0, 'src')
-
-from pathlib import Path
-from runpod_flash.cli.commands.build_utils.manifest import ManifestBuilder
-from runpod_flash.core.resources.constants import (
- FLASH_CPU_LB_IMAGE,
- DEFAULT_WORKERS_MIN,
- DEFAULT_WORKERS_MAX,
-)
-
-builder = ManifestBuilder(project_name="test", remote_functions=[])
-mothership = builder._create_mothership_resource({
- "file_path": Path("main.py"),
- "app_variable": "app"
-})
-
-print(f"Mothership configuration:")
-print(f" imageName: {mothership['imageName']} (expected: {FLASH_CPU_LB_IMAGE})")
-print(f" workersMin: {mothership['workersMin']} (expected: {DEFAULT_WORKERS_MIN})")
-print(f" workersMax: {mothership['workersMax']} (expected: {DEFAULT_WORKERS_MAX})")
-
-assert mothership['imageName'] == FLASH_CPU_LB_IMAGE
-assert mothership['workersMin'] == DEFAULT_WORKERS_MIN
-assert mothership['workersMax'] == DEFAULT_WORKERS_MAX
-
-print("✓ Manifest builder uses constants correctly")
-EOF
-```
-
-### Test 5: LiveServerless Uses Constants
-
-```bash
-uv run python3 << 'EOF'
-import sys
-sys.path.insert(0, 'src')
-
-from runpod_flash import LiveServerless, LiveLoadBalancer, CpuLiveLoadBalancer
-from runpod_flash.core.resources.constants import (
- FLASH_GPU_IMAGE,
- FLASH_LB_IMAGE,
- FLASH_CPU_LB_IMAGE,
-)
-
-gpu_ls = LiveServerless(name="test-gpu")
-gpu_lb = LiveLoadBalancer(name="test-gpu-lb")
-cpu_lb = CpuLiveLoadBalancer(name="test-cpu-lb")
-
-print(f"Resource image configuration:")
-print(f" LiveServerless: {gpu_ls.imageName} (expected: {FLASH_GPU_IMAGE})")
-print(f" LiveLoadBalancer: {gpu_lb.imageName} (expected: {FLASH_LB_IMAGE})")
-print(f" CpuLiveLoadBalancer: {cpu_lb.imageName} (expected: {FLASH_CPU_LB_IMAGE})")
-
-assert gpu_ls.imageName == FLASH_GPU_IMAGE
-assert gpu_lb.imageName == FLASH_LB_IMAGE
-assert cpu_lb.imageName == FLASH_CPU_LB_IMAGE
-
-print("✓ All LiveServerless classes use correct image constants")
-EOF
-```
-
-### Test 6: No Hardcoded Values Remain
-
-```bash
-# Verify no hardcoded image names in manifest.py
-grep -n "runpod/runpod-flash-lb" src/runpod_flash/cli/commands/build_utils/manifest.py || echo "✓ No hardcoded images found"
-
-# Verify constants are imported
-grep "FLASH_CPU_LB_IMAGE\|FLASH_LB_IMAGE\|DEFAULT_WORKERS" src/runpod_flash/cli/commands/build_utils/manifest.py
-```
-
-### Test 7: Unit Tests Pass
-
-```bash
-# Run manifest mothership tests
-uv run pytest tests/unit/cli/commands/build_utils/test_manifest_mothership.py -v
-
-# Run all tests
-uv run pytest --tb=short
-```
-
-## Test Coverage
-
-The verification tests cover:
-
-1. **Constants Definition** (✓ 7 tests)
- - All 7 constants properly defined
- - Default values correct
- - Support environment variable overrides
-
-2. **Manifest Builder Integration** (✓ 3 tests)
- - `_create_mothership_resource()` uses constants
- - `_create_mothership_from_explicit()` uses constants
- - Worker count constants used correctly
-
-3. **LiveServerless Integration** (✓ 3 tests)
- - `LiveServerless` uses `FLASH_GPU_IMAGE`
- - `LiveLoadBalancer` uses `FLASH_LB_IMAGE`
- - `CpuLiveLoadBalancer` uses `FLASH_CPU_LB_IMAGE`
-
-4. **Environment Variable Overrides** (✓ 1 test)
- - `FLASH_IMAGE_TAG=dev` works correctly
- - Individual image overrides work
-
-5. **Code Quality** (✓ 6 tests)
- - No hardcoded image names remain
- - Constants are properly imported
- - Code follows project patterns
-
-## Environment Variables
-
-### Global Override: FLASH_IMAGE_TAG
-
-Affects all images at once:
-
-```bash
-export FLASH_IMAGE_TAG=local
-# or
-export FLASH_IMAGE_TAG=dev
-# or
-export FLASH_IMAGE_TAG=staging
-```
-
-### Individual Overrides
-
-Override specific images:
-
-```bash
-export FLASH_GPU_IMAGE=my-registry/runpod-flash:custom
-export FLASH_CPU_IMAGE=my-registry/runpod-flash-cpu:custom
-export FLASH_LB_IMAGE=my-registry/runpod-flash-lb:custom
-export FLASH_CPU_LB_IMAGE=my-registry/runpod-flash-lb-cpu:custom
-```
-
-## Files Modified
-
-- `src/runpod_flash/cli/commands/build_utils/manifest.py` - Uses constants
-- `src/runpod_flash/cli/commands/test_mothership.py` - Uses constants
-- `src/runpod_flash/core/resources/constants.py` - Centralizes constants
-- `src/runpod_flash/core/resources/live_serverless.py` - Imports from constants
-- `tests/unit/cli/commands/build_utils/test_manifest_mothership.py` - Updated tests
-
-## Related Documentation
-
-- **Commit**: `1f3a6fd` - Full diff of changes
-- **CLAUDE.md**: Project development guidelines
-- **README**: Project overview
-
-## Future Verification
-
-To re-run this verification after future changes:
-
-```bash
-cd /Users/deanquinanola/Github/python/runpod-flash
-uv run python3 scripts/test-image-constants.py
-```
-
-This script can be retained indefinitely and re-run to ensure the fix remains intact.
-
-## Troubleshooting
-
-### Test Fails with "Module not found"
-
-Make sure you're running from the runpod-flash directory:
-```bash
-cd /Users/deanquinanola/Github/python/runpod-flash
-```
-
-### Constants Have Unexpected Values
-
-Check if environment variables are set:
-```bash
-echo $FLASH_IMAGE_TAG
-echo $FLASH_CPU_LB_IMAGE
-```
-
-Unset them if they're interfering:
-```bash
-unset FLASH_IMAGE_TAG FLASH_CPU_LB_IMAGE FLASH_LB_IMAGE
-```
-
-### Manifest Not Using Constants
-
-Verify imports in manifest.py:
-```bash
-grep "from runpod_flash.core.resources.constants import" src/runpod_flash/cli/commands/build_utils/manifest.py
-```
-
-## Summary
-
-✅ All hardcoded image names have been eliminated
-✅ Constants are centralized with environment variable support
-✅ All tests pass (856 passed, 68.74% coverage)
-✅ Backward compatible (defaults unchanged)
-✅ Ready for production deployment
diff --git a/docs/Cross_Endpoint_Routing.md b/docs/Cross_Endpoint_Routing.md
index aa851705..b9318c9b 100644
--- a/docs/Cross_Endpoint_Routing.md
+++ b/docs/Cross_Endpoint_Routing.md
@@ -342,7 +342,7 @@ graph TD
B -->|"load service configuration"| C["ServiceRegistry"]
C -->|"if not cached"| D["ManifestClient"]
- D -->|"query mothership API"| E["Manifest<br/>Endpoint URLs"]
+ D -->|"query State Manager API"| E["Manifest<br/>Endpoint URLs"]
E -->|"cache result<br/>TTL 300s"| C
C -->|"lookup in manifest<br/>flash_manifest.json"| F{"Routing<br/>Decision"}
@@ -465,7 +465,7 @@ class ServiceRegistry:
Environment Variables (for local vs remote detection):
RUNPOD_API_KEY: API key for State Manager GraphQL access (peer-to-peer).
- FLASH_RESOURCE_NAME: Resource config name for this endpoint (child endpoints).
+ FLASH_RESOURCE_NAME: Resource config name for this endpoint (worker endpoints).
Identifies which resource config this endpoint represents in the manifest.
RUNPOD_ENDPOINT_ID: Endpoint ID (used as fallback for identification).
"""
@@ -473,7 +473,7 @@ class ServiceRegistry:
self._state_manager_client = state_manager_client or StateManagerClient()
self._endpoint_registry = {} # Cached endpoint URLs
self._endpoint_registry_lock = asyncio.Lock()
- # Child endpoints use FLASH_RESOURCE_NAME to identify which resource they represent
+ # Worker endpoints use FLASH_RESOURCE_NAME to identify which resource they represent
# Falls back to RUNPOD_ENDPOINT_ID if not set
self._current_endpoint = os.getenv("FLASH_RESOURCE_NAME") or os.getenv(
"RUNPOD_ENDPOINT_ID"
@@ -531,7 +531,7 @@ class ServiceRegistry:
**Location**: `src/runpod_flash/runtime/state_manager_client.py`
-GraphQL client for State Manager manifest persistence (used by mothership auto-provisioning):
+GraphQL client for State Manager manifest persistence (used by endpoint auto-provisioning):
```python
class StateManagerClient:
@@ -542,10 +542,13 @@ class StateManagerClient:
"""
async def get_persisted_manifest(
- self, mothership_id: str
+ self, flash_environment_id: str
) -> Optional[Dict[str, Any]]:
"""Fetch persisted manifest from State Manager.
+ Args:
+ flash_environment_id: ID of the Flash environment.
+
Returns:
Manifest dict or None if not found (first boot).
@@ -556,7 +559,7 @@ class StateManagerClient:
async def update_resource_state(
self,
- mothership_id: str,
+ flash_environment_id: str,
resource_name: str,
resource_data: Dict[str, Any],
) -> None:
@@ -815,7 +818,7 @@ class JsonSerializer:
#### Adding New Manifest Backends
-To support directories other than mothership:
+To support alternative manifest backends:
1. Create client class with `get_manifest()` method:
```python
@@ -974,7 +977,7 @@ print(f"RUNPOD_ENDPOINT_ID: {os.getenv('RUNPOD_ENDPOINT_ID')}")
# Check state manager client directly
client = StateManagerClient()
-manifest = await client.get_persisted_manifest(mothership_id)
+manifest = await client.get_persisted_manifest(flash_environment_id)
```
## Peer-to-Peer Architecture with StateManagerClient
@@ -983,7 +986,7 @@ manifest = await client.get_persisted_manifest(mothership_id)
Cross-endpoint routing uses a **peer-to-peer architecture** where all endpoints query State Manager directly for service discovery. This eliminates single points of failure and simplifies the system architecture compared to previous hub-and-spoke models.
-**Key Difference**: No mothership endpoint exposing a `/manifest` HTTP endpoint. Instead, all endpoints use `StateManagerClient` to query the Runpod GraphQL API directly.
+**Key Difference**: No dedicated endpoint exposing a `/manifest` HTTP endpoint. Instead, all endpoints use `StateManagerClient` to query the Runpod GraphQL API directly.
### Architecture
@@ -1034,7 +1037,7 @@ export RUNPOD_ENDPOINT_ID=gpu-endpoint-123
- **Caching**: 300-second TTL cache to minimize API calls
- **Retry Logic**: Exponential backoff on failures (default 3 attempts)
- **Thread-Safe**: Uses `asyncio.Lock` for concurrent operations
-- **Auto-Provisioning**: Used by mothership provisioner to update resource state
+- **Auto-Provisioning**: Used by endpoint provisioner to update resource state
## Key Implementation Highlights
diff --git a/docs/Deployment_Architecture.md b/docs/Deployment_Architecture.md
index cc395aaf..c483debf 100644
--- a/docs/Deployment_Architecture.md
+++ b/docs/Deployment_Architecture.md
@@ -1,7 +1,7 @@
# Flash App Deployment Architecture Specification
## Overview
-A deployed Flash App consists of a Mothership coordinator and distributed Child Endpoints, where functions are partitioned across endpoints. The system uses a manifest-driven approach to route requests and coordinate execution across the distributed topology.
+A deployed Flash App consists of peer endpoints, where functions are partitioned across endpoints. The system uses a manifest-driven approach to route requests and coordinate execution across the distributed topology.
## Build and Deploy Flow
@@ -11,33 +11,31 @@ graph TD
B -->|"Write"| C["flash_manifest.json"]
B -->|"Archive"| D["artifact.tar.gz"]
- D -->|"flash deploy"| E["Push Archive +<br/>Provision Resources"]
+ D -->|"flash deploy"| E["Push Archive +<br/>Load Manifest"]
- E -->|"CLI provisions<br/>upfront"| F["Child Endpoints<br/>Deployed"]
-
- G["🎯 Mothership<br/>Endpoint"] -->|"Load from<br/>.flash/"| H["Load Local<br/>Manifest"]
-
- H --> I["reconcile_children()"]
+ E --> I["Reconcile:<br/>Compute Diff"]
I --> J["Categorize:<br/>New, Changed,<br/>Removed, Unchanged"]
- J --> K["Verify NEW<br/>Endpoints"]
- J --> L["Verify CHANGED<br/>Endpoints"]
- J --> M["Verify REMOVED<br/>Endpoints"]
+ J --> K["Provision NEW<br/>Endpoints"]
+ J --> L["Update CHANGED<br/>Endpoints"]
+ J --> M["Remove DELETED<br/>Endpoints"]
J --> N["Skip UNCHANGED<br/>Endpoints"]
- K -->|"Healthy?"| O["Update State"]
- L -->|"Healthy?"| O
- M -->|"Decommissioned?"| O
+ K -->|"Deployed"| O["Update State"]
+ L -->|"Updated"| O
+ M -->|"Decommissioned"| O
O --> P["Persist to State Manager"]
- P --> Q["🚀 Reconciliation<br/>Complete"]
+ P --> Q["🚀 Deploy<br/>Complete"]
+
+ Q -.->|"Endpoints boot"| F["Peer Endpoints<br/>Running"]
F -.->|"Peer-to-peer<br/>Service Discovery"| R["Query State Manager<br/>GraphQL API"]
style A fill:#1976d2,stroke:#0d47a1,stroke-width:3px,color:#fff
- style G fill:#1976d2,stroke:#0d47a1,stroke-width:3px,color:#fff
+ style E fill:#1976d2,stroke:#0d47a1,stroke-width:3px,color:#fff
style I fill:#f57c00,stroke:#bf360c,stroke-width:3px,color:#fff
style K fill:#f57c00,stroke:#bf360c,stroke-width:3px,color:#fff
style L fill:#f57c00,stroke:#bf360c,stroke-width:3px,color:#fff
@@ -51,9 +49,9 @@ graph TD
```mermaid
graph TD
- A["Request arrives at<br/>Mothership for funcA"] -->|"Consult manifest"| B{"Function<br/>Location?"}
+ A["Request arrives at<br/>Endpoint for funcA"] -->|"Consult manifest"| B{"Function<br/>Location?"}
- B -->|"Local to Mothership"| C["Execute locally"]
+ B -->|"Local to Endpoint"| C["Execute locally"]
B -->|"On Endpoint1"| D["Route request to<br/>Endpoint1 with payload"]
D --> E["Endpoint1 receives<br/>Endpoint1>funcA"]
@@ -69,7 +67,7 @@ graph TD
L --> J
J --> M["funcA completes<br/>with all results"]
- M --> N["Response back<br/>to Mothership"]
+ M --> N["Response back<br/>to Endpoint"]
N --> O["Return to client"]
style A fill:#1976d2,stroke:#0d47a1,stroke-width:3px,color:#fff
@@ -82,7 +80,7 @@ graph TD
```mermaid
graph LR
- subgraph Mothership["🎯 Mothership<br/>(Coordinator)"]
+ subgraph CoordinatorNode["🎯 Manifest Store"]
MF["Manifest Store<br/>Function Map"]
end
@@ -105,7 +103,7 @@ graph LR
E1F1 -.->|"Local execution"| E1F2
E1F1 -.->|"Remote call"| E2F1
- style Mothership fill:#1976d2,stroke:#0d47a1,stroke-width:3px,color:#fff
+ style CoordinatorNode fill:#1976d2,stroke:#0d47a1,stroke-width:3px,color:#fff
style EP1 fill:#f57c00,stroke:#bf360c,stroke-width:3px,color:#fff
style EP2 fill:#f57c00,stroke:#bf360c,stroke-width:3px,color:#fff
style MF fill:#1565c0,stroke:#0d47a1,stroke-width:2px,color:#fff
@@ -118,8 +116,8 @@ graph LR
- **Smart Routing**: System automatically determines if execution is local (in-process) or remote (inter-endpoint)
- **Deployed Mode**: Unlike Live mode, endpoints are aware they're in distributed deployment with explicit role assignments
- **Transparent Execution**: Functions can call other functions without knowing deployment topology; manifest handles routing
-- **State Synchronization**: Mothership maintains single source of truth, synced with GQL State Manager
-- **Reconciliation**: On each boot, Mothership reconciles local manifest with persisted state to deploy/update/undeploy resources
+- **State Synchronization**: State Manager maintains the source of truth; endpoints sync via GraphQL
+- **Reconciliation**: The CLI reconciles the manifest with persisted state during `flash deploy`
- **Peer-to-Peer Discovery**: All endpoints query State Manager GraphQL API directly for service discovery
## Actual Manifest Structure
@@ -285,12 +283,8 @@ Each reconciliation action updates State Manager:
## Environment Variables
-### Mothership
-- `FLASH_IS_MOTHERSHIP=true` - Identifies this endpoint as mothership
-- `RUNPOD_API_KEY` - For State Manager authentication
-- `FLASH_MANIFEST_PATH` - Optional explicit path to manifest
-
-### Child Endpoints
+### All Endpoints
- `RUNPOD_API_KEY` - For State Manager GraphQL access (peer-to-peer service discovery)
- `FLASH_RESOURCE_NAME` - Which resource config this endpoint represents
-- `RUNPOD_ENDPOINT_ID` - This child's endpoint ID
+- `RUNPOD_ENDPOINT_ID` - This endpoint's ID (set by Runpod)
+- `FLASH_MANIFEST_PATH` - Optional explicit path to manifest
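+
+For local testing, these can be exported before starting the endpoint runtime (the values below are placeholders):
+
+```bash
+export RUNPOD_API_KEY=rpa_XXXXXXXX                     # State Manager GraphQL access
+export FLASH_RESOURCE_NAME=gpu_config                  # resource config this endpoint represents
+export FLASH_MANIFEST_PATH=.flash/flash_manifest.json  # optional manifest override
+```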
diff --git a/docs/Flash_Deploy_Guide.md b/docs/Flash_Deploy_Guide.md
index 234e5f33..1cd39780 100644
--- a/docs/Flash_Deploy_Guide.md
+++ b/docs/Flash_Deploy_Guide.md
@@ -30,12 +30,7 @@ graph TB
subgraph Cloud["Runpod Cloud"]
S3["S3 Storage<br/>artifact.tar.gz"]
- subgraph Mothership["Mothership Endpoint<br/>(FLASH_IS_MOTHERSHIP=true)"]
- MothershipReconciler["MothershipsProvisioner<br/>Reconcile Children"]
- MothershipState["State Sync<br/>to State Manager"]
- end
-
- subgraph ChildEndpoints["Child Endpoints<br/>(Resource Configs)"]
+ subgraph Endpoints["Peer Endpoints<br/>(one per resource config)"]
Handler1["GPU Handler<br/>@remote functions"]
Handler2["CPU Handler<br/>@remote functions"]
StateQuery["Service Registry<br/>Query State Manager"]
@@ -47,22 +42,19 @@ graph TB
Developer -->|flash build| Build
Build -->|archive| S3
Developer -->|flash deploy --env| S3
- CLI -->|provision upfront<br/>before activation| ChildEndpoints
- Mothership -->|reconcile_children<br/>on boot| ChildEndpoints
- MothershipReconciler -->|update state| Database
- ChildEndpoints -->|query manifest<br/>peer-to-peer| Database
- Developer -->|call @remote| ChildEndpoints
-
- style Mothership fill:#1976d2,stroke:#0d47a1,stroke-width:3px,color:#fff
- style ChildEndpoints fill:#388e3c,stroke:#1b5e20,stroke-width:3px,color:#fff
+ CLI -->|provision all endpoints| Endpoints
+ Endpoints -->|query manifest<br/>peer-to-peer| Database
+ Developer -->|call @remote| Endpoints
+
+ style Endpoints fill:#388e3c,stroke:#1b5e20,stroke-width:3px,color:#fff
style Build fill:#f57c00,stroke:#e65100,stroke-width:3px,color:#fff
```
### Key Concepts
-**Mothership**: The orchestration endpoint responsible for deployment, resource provisioning, and manifest distribution. Created via `flash env create `.
+**Endpoints**: All deployed endpoints are peers. The CLI provisions them upfront during `flash deploy`. Each endpoint loads the manifest from its `.flash/` directory and queries State Manager for peer discovery.
-**Child Endpoints**: Worker endpoints that execute `@remote` functions. One per resource config (e.g., `gpu_config`, `cpu_config`).
+**Worker Endpoints**: Endpoints that execute `@remote` functions. One per resource config (e.g., `gpu_config`, `cpu_config`).
**Manifest**: JSON document describing all deployed functions, their resource configs, routing rules, and metadata. Built at compile-time, distributed to all endpoints.
@@ -76,7 +68,7 @@ graph TB
### flash env create
-Create a new deployment environment (mothership).
+Create a new deployment environment.
```bash
flash env create <env_name> [--app <app_name>]
@@ -91,7 +83,7 @@ flash env create [--app ]
**What it does:**
1. Creates a FlashApp in Runpod (if first environment for the app)
2. Creates FlashEnvironment with the specified name
-3. Provisions a mothership serverless endpoint
+3. Provisions serverless endpoints
**Example:**
```bash
@@ -277,93 +269,37 @@ sequenceDiagram
**Upload Process** (`src/runpod_flash/cli/commands/deploy.py:197-224`):
1. Archive uploaded to Runpod's built-in S3 storage
2. URL generated with temporary access
-3. URL passed to mothership endpoint creation
+3. URL passed to endpoint creation
**Key Files:**
- `src/runpod_flash/cli/commands/deploy.py` - Deploy CLI commands
---
-### Phase 3: Mothership Boot & Reconciliation
+### Phase 3: Endpoint Boot & Service Discovery
-The mothership runs on each boot to perform reconcile_children() - reconciling desired state (manifest) with current state (local resources). Note: All resources are provisioned upfront by the CLI before environment activation.
+Each endpoint boots independently. Endpoints that make cross-endpoint calls (i.e., call `@remote` functions deployed on a different resource config) query State Manager to discover peer endpoint URLs. Endpoints that only execute local functions do not need State Manager access.
```mermaid
sequenceDiagram
- Runpod->>Mothership: Boot endpoint
- Mothership->>Mothership: Initialize runtime
- Mothership->>ManifestFetcher: Load manifest from .flash/
- ManifestFetcher->>ManifestFetcher: Read flash_manifest.json
- Mothership->>MothershipsProvisioner: Execute reconcile_children()
- MothershipsProvisioner->>StateManager: Fetch persisted state
- StateManager->>GraphQL: Query persisted manifest
- GraphQL->>StateManager: Return persisted manifest
- MothershipsProvisioner->>MothershipsProvisioner: Compute diff:<br/>new, changed, removed
- MothershipsProvisioner->>StateManager: Update state after<br/>reconciliation
- StateManager->>GraphQL: Mutation:<br/>updateFlashBuildManifest
- MothershipsProvisioner->>Mothership: Reconciliation complete
-```
-
-**Key Components:**
-
-**MothershipsProvisioner** (`src/runpod_flash/runtime/mothership_provisioner.py`):
-- `is_mothership()`: Check if endpoint is mothership (FLASH_IS_MOTHERSHIP=true)
-- `reconcile_children()`: Compute diff between desired and current state
-- Verifies child endpoints are deployed and healthy
-- Updates State Manager with reconciliation results
-
-**ResourceManager** (`src/runpod_flash/core/resources/resource_manager.py`):
-- Singleton pattern (global resource registry)
-- Stores state in `.runpod/resources.pkl` with file locking
-- Tracks config hashes for drift detection (hash comparison)
-- Provisioned upfront by CLI before environment activation
-- Auto-migrates legacy resources
-
-**StateManagerClient** (`src/runpod_flash/runtime/state_manager_client.py`):
-- GraphQL client for persisting manifest state
-- Read-modify-write pattern for updates (3 GQL roundtrips)
-- Thread-safe with asyncio.Lock for concurrent updates
-- Retries with exponential backoff (3 attempts)
-
-**Reconciliation Logic**:
-1. **Fetch persisted manifest**: Query State Manager for previous reconciliation state
-2. **Compare with current manifest**: Detect new, changed, and removed resources
-3. **Verify new resources**: Check that new endpoints are deployed and healthy
-4. **Verify changed resources**: Check if hash differs, verify endpoint health
-5. **Verify removed resources**: Check that deleted endpoints are decommissioned
-6. **Persist new state**: Update State Manager with current reconciliation results
-
-**Key Files:**
-- `src/runpod_flash/runtime/mothership_provisioner.py` - Reconciliation logic
-- `src/runpod_flash/core/resources/resource_manager.py` - Resource provisioning
-- `src/runpod_flash/runtime/state_manager_client.py` - State persistence
-
----
-
-### Phase 4: Child Endpoint Initialization
-
-Each child endpoint boots independently and prepares for function execution.
-
-```mermaid
-sequenceDiagram
- Runpod->>Child: Boot with handler_gpu_config.py
- Child->>Child: Initialize runtime
- Child->>ManifestFetcher: Load manifest from .flash/
+ Runpod->>Endpoint: Boot with handler
+ Endpoint->>Endpoint: Initialize runtime
+ Endpoint->>ManifestFetcher: Load manifest from .flash/
ManifestFetcher->>ManifestFetcher: Check cache<br/>(TTL: 300s)
alt Cache expired
- ManifestFetcher->>StateManager: Query GraphQL API<br/>State Manager
+ ManifestFetcher->>StateManager: Query GraphQL API
StateManager->>ManifestFetcher: Return manifest
else Cache valid
ManifestFetcher->>ManifestFetcher: Return cached
end
- ManifestFetcher->>Child: Manifest loaded
- Child->>ServiceRegistry: Load manifest
+ ManifestFetcher->>Endpoint: Manifest loaded
+ Endpoint->>ServiceRegistry: Load manifest
ServiceRegistry->>ServiceRegistry: Build function_registry
ServiceRegistry->>ServiceRegistry: Build resource_mapping
- Child->>StateManager: Query State Manager<br/>peer-to-peer discovery
- StateManager->>Child: Return peer endpoints
- Child->>ServiceRegistry: Cache endpoint URLs
- Child->>Ready: Ready to execute functions
+ Endpoint->>StateManager: Query State Manager<br/>peer-to-peer discovery
+ StateManager->>Endpoint: Return peer endpoints
+ Endpoint->>ServiceRegistry: Cache endpoint URLs
+ Endpoint->>Ready: Ready to execute functions
```
**ManifestFetcher** (`src/runpod_flash/runtime/manifest_fetcher.py`):
@@ -394,7 +330,7 @@ sequenceDiagram
---
-### Phase 5: Runtime Function Execution
+### Phase 4: Runtime Function Execution
When client calls `@remote function`:
@@ -530,7 +466,7 @@ The manifest is the contract between build-time and runtime. It defines all depl
### Runtime: Distribution & Caching
-**Mothership Side** - `ManifestFetcher`:
+**Endpoint Side** - `ManifestFetcher`:
1. **Check cache**: Is manifest cached and TTL valid?
- Cache TTL: 300 seconds (configurable)
@@ -547,7 +483,7 @@ The manifest is the contract between build-time and runtime. It defines all depl
**Code Reference**: `src/runpod_flash/runtime/manifest_fetcher.py:47-118`
-**Child Endpoint Side** - `ServiceRegistry`:
+**Worker Endpoint Side** - `ServiceRegistry`:
1. **Load manifest**: From local file
- Searches multiple locations (cwd, module dir, etc)
@@ -558,7 +494,7 @@ The manifest is the contract between build-time and runtime. It defines all depl
3. **Query State Manager**: Get endpoint URLs via GraphQL
- Queries Runpod State Manager GraphQL API directly
- - Returns: Resource endpoints for all deployed child endpoints
+ - Returns: Resource endpoints for all deployed worker endpoints
- Retries with exponential backoff
4. **Cache endpoints**: Store for routing decisions
@@ -608,7 +544,7 @@ Write: Mutation updateFlashBuildManifest
## Resource Provisioning
-Resources are dynamically provisioned by the mothership during boot, based on the manifest.
+Resources are provisioned by the CLI during `flash deploy`, based on the manifest.
### ResourceManager: Local State
@@ -646,14 +582,14 @@ Resources are dynamically provisioned by the mothership during boot, based on th
### Deployment Orchestration
-**MothershipsProvisioner** reconciles manifest with local state:
+The reconciler compares the local manifest (desired state) with the state persisted in the State Manager:
```python
# 1. Load manifest from flash_manifest.json
manifest = load_manifest()
# 2. Fetch persisted state from State Manager
-persisted = await StateManagerClient.get_persisted_manifest(mothership_id)
+persisted = await StateManagerClient.get_persisted_manifest(flash_environment_id)
# 3. Compute diff
diff = compute_manifest_diff(manifest, persisted)
@@ -674,7 +610,7 @@ for resource_config in diff.removed:
delete_resource(resource_config)
# 7. Persist new state
-await StateManagerClient.update_resource_state(mothership_id, resources)
+await StateManagerClient.update_resource_state(flash_environment_id, resources)
```
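The diff computation above can be sketched as a set comparison, assuming for illustration that each manifest maps resource names to config hashes (the real manifest schema may differ):

```python
from dataclasses import dataclass, field

@dataclass
class ManifestDiff:
    new: list = field(default_factory=list)
    changed: list = field(default_factory=list)
    removed: list = field(default_factory=list)

def compute_manifest_diff(desired: dict, persisted: dict) -> ManifestDiff:
    """Compare desired vs persisted {resource_name: config_hash} mappings."""
    diff = ManifestDiff()
    for name, config_hash in desired.items():
        if name not in persisted:
            diff.new.append(name)           # in manifest, never deployed
        elif persisted[name] != config_hash:
            diff.changed.append(name)       # deployed, but config changed
    diff.removed = [name for name in persisted if name not in desired]
    return diff
```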
**Parallel Deployment**:
@@ -686,8 +622,6 @@ await StateManagerClient.update_resource_state(mothership_id, resources)
- If hashes differ: Resource has been modified, trigger update
- Prevents unnecessary updates when resource unchanged
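Content-hash change detection can be illustrated by hashing a canonical JSON serialization of each resource config (illustrative; the library's actual hashing scheme is not shown here):

```python
import hashlib
import json

def config_hash(resource_config: dict) -> str:
    """Hash a resource config deterministically via sorted-key JSON."""
    canonical = json.dumps(resource_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def needs_update(local: dict, persisted: dict) -> bool:
    # Equal hashes mean the resource is unchanged: skip the update.
    return config_hash(local) != config_hash(persisted)
```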
-**Code Reference**: `src/runpod_flash/runtime/mothership_provisioner.py:1-150`
-
---
## Remote Execution
@@ -870,21 +804,15 @@ graph TB
Archive["Archive Builder (tar.gz)"]
end
- subgraph Upload["Upload"]
+ subgraph Deploy["Deploy (CLI)"]
S3["S3 Storage"]
+ Provisioner["ResourceManager (provision endpoints)"]
+ StateMgr["StateManagerClient (persist state)"]
end
- subgraph MothershipBoot["Mothership Boot"]
- Fetcher["ManifestFetcher (cache + GQL)"]
- MProvisioner["MothershipsProvisioner (reconciliation)"]
- ResMgr["ResourceManager (state)"]
- StateMgr["StateManagerClient (persistence)"]
- end
-
- subgraph ChildBoot["Child Endpoint Boot"]
- ChildFetcher["ManifestFetcher (local file)"]
+ subgraph EndpointBoot["Endpoint Boot"]
+ Fetcher["ManifestFetcher (local file + GQL)"]
Registry["ServiceRegistry (function mapping)"]
- ManifestC["ManifestClient (query mothership)"]
end
subgraph Runtime["Runtime Execution"]
@@ -896,20 +824,16 @@ graph TB
Scanner --> ManifestB
ManifestB --> Archive
Archive --> S3
- S3 --> Fetcher
- Fetcher --> MProvisioner
- MProvisioner --> ResMgr
- ResMgr --> StateMgr
- StateMgr -->|update| S3
- ChildFetcher --> Registry
- ManifestC -->|query| Fetcher
- Registry --> ManifestC
+ S3 --> Provisioner
+ Provisioner --> StateMgr
+ Fetcher --> Registry
+ Registry -->|query State Manager peer-to-peer| StateMgr
Handler --> Serial
Serial --> Exec
style Build fill:#f57c00,stroke:#e65100,stroke-width:3px,color:#fff
- style MothershipBoot fill:#1976d2,stroke:#0d47a1,stroke-width:3px,color:#fff
- style ChildBoot fill:#388e3c,stroke:#1b5e20,stroke-width:3px,color:#fff
+ style Deploy fill:#1976d2,stroke:#0d47a1,stroke-width:3px,color:#fff
+ style EndpointBoot fill:#388e3c,stroke:#1b5e20,stroke-width:3px,color:#fff
style Runtime fill:#7b1fa2,stroke:#4a148c,stroke-width:3px,color:#fff
```
@@ -921,14 +845,12 @@ graph TB
graph LR
A["Build Time ManifestBuilder"] -->|Generate| B["flash_manifest.json (embedded in archive)"]
B -->|Upload| C["S3 (artifact.tar.gz)"]
- C -->|Provision upfront before activation| D["Child Endpoints (deployed)"]
+ C -->|CLI provisions endpoints| D["Endpoints (deployed)"]
D -->|Extract from .flash/ directory| E["LocalManifest (from archive)"]
- Mothership -->|Load from .flash/| E
E -->|Build registry| F["ServiceRegistry (function mapping)"]
F -->|Query State Manager peer-to-peer| G["StateManager (GraphQL API)"]
G -->|Return endpoints| F
F -->|Route calls| H["Handler (execute)"]
- Mothership -->|reconcile_children on boot| D
style A fill:#f57c00,stroke:#e65100,stroke-width:2px,color:#fff
style B fill:#ff6f00,stroke:#e65100,stroke-width:2px,color:#fff
@@ -938,7 +860,6 @@ graph LR
style F fill:#388e3c,stroke:#1b5e20,stroke-width:2px,color:#fff
style G fill:#0d47a1,stroke:#051c66,stroke-width:2px,color:#fff
style H fill:#388e3c,stroke:#1b5e20,stroke-width:2px,color:#fff
- style Mothership fill:#1976d2,stroke:#0d47a1,stroke-width:2px,color:#fff
```
---
@@ -947,7 +868,7 @@ graph LR
```mermaid
graph LR
- A["Mothership Boots"] -->|Load manifest| B["Desired State"]
+ A["CLI: flash deploy"] -->|Load manifest| B["Desired State"]
B -->|Fetch persisted| C["Current State"]
C -->|Compute diff| D{"Reconciliation"}
D -->|new| E["Create Resource"]
@@ -959,7 +880,7 @@ graph LR
D -->|removed| J["Delete Resource"]
J -->|Decommission| K["Deleted"]
K -->|Remove state| G
- G -->|On next boot| C
+ G -->|On next deploy| C
style A fill:#1976d2,stroke:#0d47a1,stroke-width:2px,color:#fff
style B fill:#1976d2,stroke:#0d47a1,stroke-width:2px,color:#fff
@@ -974,39 +895,26 @@ graph LR
## Environment Variables Reference
-### Mothership Configuration
-
-**FLASH_IS_MOTHERSHIP** (Required on mothership)
-- Value: `"true"`
-- Enables mothership auto-provisioning logic
-- Triggers manifest reconciliation on boot
-
-**RUNPOD_ENDPOINT_ID** (Required on mothership)
-- Runpod serverless endpoint ID
-- Used to construct mothership URL: `https://{RUNPOD_ENDPOINT_ID}.api.runpod.ai`
-- Set automatically by Runpod platform
+### All Endpoints
-**RUNPOD_API_KEY** (Required for State Manager)
+**RUNPOD_API_KEY** (Required)
- Runpod API authentication token
- Used by StateManagerClient for GraphQL queries
-- Enables manifest persistence
+- Enables peer-to-peer service discovery and manifest persistence
-### Child Endpoint Configuration
-
-**FLASH_RESOURCE_NAME** (Required on child endpoints)
+**FLASH_RESOURCE_NAME** (Required)
- Resource config name (e.g., "gpu_config", "cpu_config")
- Identifies which resource config this endpoint represents
- Used by ServiceRegistry for local vs remote detection
-**RUNPOD_API_KEY** (Required for peer-to-peer discovery)
-- API key for State Manager GraphQL access
-- Enables endpoints to query manifest peer-to-peer
-- Used by all endpoints for service discovery
+**RUNPOD_ENDPOINT_ID** (Set by Runpod)
+- Runpod serverless endpoint ID
+- Used to construct endpoint URL: `https://{RUNPOD_ENDPOINT_ID}.api.runpod.ai`
+- Set automatically by Runpod platform
**FLASH_MANIFEST_PATH** (Optional)
- Override default manifest file location
- If not set, searches: cwd, module dir, parent dirs
-- Useful for testing or non-standard layouts
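The resolution order can be sketched as follows (a sketch only; the module-dir step is omitted and the real search logic in `manifest_fetcher.py` may differ):

```python
import os
from pathlib import Path

def find_manifest(filename: str = "flash_manifest.json"):
    """Resolve the manifest path: explicit override first, then search upward from cwd."""
    override = os.environ.get("FLASH_MANIFEST_PATH")
    if override:
        return Path(override)
    # No override: walk cwd and its parents looking for the file.
    for base in (Path.cwd(), *Path.cwd().parents):
        candidate = base / filename
        if candidate.exists():
            return candidate
    return None
```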
### Runtime Configuration
@@ -1048,7 +956,7 @@ Flash Deploy uses a dual-layer state system for reliability and consistency.
### Remote State: Runpod State Manager (GraphQL API)
-**Purpose**: Persist deployment state across mothership boots
+**Purpose**: Persist deployment state across endpoint boots
**Data Model**:
```graphql
@@ -1092,7 +1000,7 @@ async with state_manager_lock:
```
**Reconciliation**:
-On mothership boot:
+On deploy:
1. Load local manifest from .flash/ (desired state)
2. Fetch persisted manifest from State Manager (previous reconciliation state)
3. Compare → detect new, changed, removed resources
@@ -1117,9 +1025,9 @@ flash build --preview
1. Builds your project (creates archive, manifest)
2. Creates a Docker network for inter-container communication
3. Starts one Docker container per resource config:
- - Mothership container (orchestrator)
+ - Application container
- All worker containers (GPU, CPU, etc.)
-4. Exposes mothership on `localhost:8000`
+4. Exposes application on `localhost:8000`
5. All containers communicate via Docker DNS
6. Auto-cleanup on exit (Ctrl+C)
@@ -1132,25 +1040,6 @@ flash build --preview
**Code Reference**: `src/runpod_flash/cli/commands/preview.py`
-### Local Docker Testing
-
-For testing complete deployment flow locally:
-
-```bash
-# Build project
-flash build
-
-# Start local mothership simulator
-docker run -it \
- -e FLASH_IS_MOTHERSHIP=true \
- -e RUNPOD_API_KEY=$RUNPOD_API_KEY \
- -v $(pwd)/.flash:/workspace/.flash \
- runpod-flash:latest
-
-# Run provisioner
-python -m runpod_flash.runtime.mothership_provisioner
-```
-
### Debugging Tips
**Enable Debug Logging**:
@@ -1189,7 +1078,6 @@ logging.getLogger("runpod_flash.runtime.service_registry").setLevel(logging.DEBU
|------|---------|
| `src/runpod_flash/cli/commands/deploy.py` | Deploy environment management commands |
| `src/runpod_flash/cli/commands/build.py` | Build packaging and archive creation |
-| `src/runpod_flash/cli/commands/test_mothership.py` | Local mothership testing |
### Build System
@@ -1212,7 +1100,6 @@ logging.getLogger("runpod_flash.runtime.service_registry").setLevel(logging.DEBU
|------|---------|
| `src/runpod_flash/runtime/manifest_fetcher.py` | Manifest loading from local .flash/ directory |
| `src/runpod_flash/runtime/state_manager_client.py` | GraphQL client for peer-to-peer service discovery |
-| `src/runpod_flash/runtime/mothership_provisioner.py` | Auto-provisioning logic |
### Runtime: Execution
@@ -1235,7 +1122,7 @@ logging.getLogger("runpod_flash.runtime.service_registry").setLevel(logging.DEBU
## Common Issues & Solutions
-### Issue: Manifest not found on child endpoint
+### Issue: Manifest not found on worker endpoint
**Cause**: flash_manifest.json not included in archive or not found at runtime
@@ -1255,13 +1142,13 @@ logging.getLogger("runpod_flash.runtime.service_registry").setLevel(logging.DEBU
### Issue: Remote function calls fail with endpoint not found
-**Cause**: ServiceRegistry unable to query mothership or manifest outdated
+**Cause**: ServiceRegistry unable to query State Manager or manifest outdated
**Solution**:
1. Verify `RUNPOD_API_KEY` environment variable is set
2. Check State Manager GraphQL API is accessible
3. Verify manifest includes the resource config: `grep resource_name flash_manifest.json`
-4. Check that child endpoints are deployed and healthy
+4. Check that worker endpoints are deployed and healthy
### Issue: Manifest cache staleness
diff --git a/docs/Load_Balancer_Endpoints.md b/docs/Load_Balancer_Endpoints.md
index 091e1893..77ba38bc 100644
--- a/docs/Load_Balancer_Endpoints.md
+++ b/docs/Load_Balancer_Endpoints.md
@@ -4,7 +4,7 @@
The `LoadBalancerSlsResource` class enables provisioning and management of Runpod load-balanced serverless endpoints. Unlike queue-based endpoints that process requests sequentially, load-balanced endpoints expose HTTP servers directly to clients, enabling REST APIs, webhooks, and real-time communication patterns.
-This resource type is used for specialized endpoints like the Mothership. Cross-endpoint service discovery now uses State Manager GraphQL API (peer-to-peer) rather than HTTP endpoints.
+This resource type is used for entry-point endpoints that serve HTTP traffic directly. Cross-endpoint service discovery now uses the State Manager GraphQL API (peer-to-peer) rather than HTTP endpoints.
## Design Context
@@ -35,10 +35,10 @@ Load-balanced endpoints require different provisioning and health check logic th
### Why This Matters
-The Mothership coordinates resource deployment and reconciliation. This requires:
-- Peer-to-peer service discovery via State Manager GraphQL API (not HTTP-based)
-- Ability to expose custom endpoints (HTTP routes like `/ping`, user-defined routes)
-- Health checking to verify children are ready before routing traffic
+Load-balanced endpoints expose HTTP servers directly to clients. This enables:
+- Custom HTTP routes (user-defined REST endpoints, `/ping` for health checks)
+- Direct request routing to workers (lower latency than queue-based)
+- Health check polling to verify workers are ready before routing traffic
## Architecture
@@ -147,9 +147,9 @@ This document focuses on the `LoadBalancerSlsResource` class implementation and
from runpod_flash import LoadBalancerSlsResource
# Create a load-balanced endpoint
-mothership = LoadBalancerSlsResource(
- name="mothership",
- imageName="my-mothership-app:latest",
+api_endpoint = LoadBalancerSlsResource(
+ name="api-endpoint",
+ imageName="my-api-app:latest",
workersMin=1,
workersMax=3,
env={
@@ -159,7 +159,7 @@ mothership = LoadBalancerSlsResource(
)
# Deploy endpoint (returns immediately)
-deployed = await mothership.deploy()
+deployed = await api_endpoint.deploy()
# Endpoint is now deployed (may still be initializing)
print(f"Endpoint ID: {deployed.id}")
@@ -246,7 +246,7 @@ except ValueError as e:
```python
try:
endpoint = LoadBalancerSlsResource(
- name="mothership",
+ name="api-endpoint",
imageName="my-image:latest",
)
deployed = await endpoint.deploy()
@@ -294,10 +294,10 @@ If you need to verify the endpoint is ready before routing traffic:
```python
# Deploy returns immediately
-mothership = await LoadBalancerSlsResource(name="my-lb", ...).deploy()
+endpoint = await LoadBalancerSlsResource(name="my-lb", ...).deploy()
# Optional: Wait for endpoint to become healthy
-healthy = await mothership._wait_for_health(max_retries=10, retry_interval=5)
+healthy = await endpoint._wait_for_health(max_retries=10, retry_interval=5)
if not healthy:
print("Warning: Endpoint deployed but not yet healthy")
```
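The wait can be pictured as a simple polling loop (illustrative; `_wait_for_health` is an internal method and its implementation may differ), assuming an async `ping` callable that returns `True` once the endpoint is healthy:

```python
import asyncio

async def wait_for_health(ping, max_retries: int = 10, retry_interval: float = 5.0) -> bool:
    """Poll the health check until it succeeds or retries are exhausted."""
    for _ in range(max_retries):
        if await ping():
            return True
        await asyncio.sleep(retry_interval)
    return False  # still unhealthy after max_retries attempts
```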
@@ -319,7 +319,7 @@ Default health check configuration:
| Scalability | Per-function | Per-worker |
| Health checks | Runpod SDK | `/ping` endpoint |
| Use cases | Batch processing | APIs, webhooks, real-time |
-| Suitable for | Workers | Mothership, services |
+| Suitable for | Workers | APIs, services |
## Implementation Details
@@ -411,7 +411,6 @@ endpoint = LoadBalancerSlsResource(
## Next Steps
-- **Mothership integration**: Use LoadBalancerSlsResource for Mothership endpoints
+- **Entry-point integration**: Use LoadBalancerSlsResource for entry-point endpoints
- **Upfront provisioning**: CLI provisions all resources before environment activation
-- **Reconciliation**: Mothership performs reconcile_children() on boot
- **Cross-endpoint routing**: Route requests using State Manager GraphQL API (peer-to-peer)
diff --git a/src/runpod_flash/cli/docs/README.md b/src/runpod_flash/cli/docs/README.md
index a9a70853..1a1b4dfe 100644
--- a/src/runpod_flash/cli/docs/README.md
+++ b/src/runpod_flash/cli/docs/README.md
@@ -15,7 +15,7 @@ Create a new project, navigate to it, and install dependencies:
```bash
flash init my-project
cd my-project
-pip install -r requirements.txt
+uv sync # or: pip install .
```
Add your Runpod API key to `.env`:
@@ -295,16 +295,10 @@ Default location: `.flash/logs/activity.log`
```
my-project/
-├── main.py # Flash Server (FastAPI)
-├── workers/
-│ ├── gpu/ # GPU worker
-│ │ ├── __init__.py
-│ │ └── endpoint.py
-│ └── cpu/ # CPU worker
-│ ├── __init__.py
-│ └── endpoint.py
+├── gpu_worker.py # GPU worker with @remote function
+├── cpu_worker.py # CPU worker with @remote function
├── .env
-├── requirements.txt
+├── pyproject.toml # Python dependencies (uv/pip compatible)
└── README.md
```
@@ -322,12 +316,12 @@ RUNPOD_API_KEY=your_api_key_here
curl http://localhost:8888/ping
# Call GPU worker
-curl -X POST http://localhost:8888/gpu/hello \
+curl -X POST http://localhost:8888/gpu_worker/run_sync \
-H "Content-Type: application/json" \
-d '{"message": "Hello GPU!"}'
# Call CPU worker
-curl -X POST http://localhost:8888/cpu/hello \
+curl -X POST http://localhost:8888/cpu_worker/run_sync \
-H "Content-Type: application/json" \
-d '{"message": "Hello CPU!"}'
```
diff --git a/src/runpod_flash/cli/docs/flash-app.md b/src/runpod_flash/cli/docs/flash-app.md
index 3abc29a2..00cecaff 100644
--- a/src/runpod_flash/cli/docs/flash-app.md
+++ b/src/runpod_flash/cli/docs/flash-app.md
@@ -444,8 +444,7 @@ flash deploy --app my-project
```
Or ensure you're in a valid Flash project directory with:
-- `main.py` with Flash server
-- `workers/` directory
+- Python files containing `@remote` decorated functions
- Proper project structure
### Multiple Apps With Same Name
diff --git a/src/runpod_flash/cli/docs/flash-build.md b/src/runpod_flash/cli/docs/flash-build.md
index 120fe60e..deb0e633 100644
--- a/src/runpod_flash/cli/docs/flash-build.md
+++ b/src/runpod_flash/cli/docs/flash-build.md
@@ -108,9 +108,9 @@ Launch a local Docker-based test environment immediately after building. This al
1. Builds your project (creates archive, manifest)
2. Creates a Docker network for inter-container communication
3. Starts one Docker container per resource config:
- - Mothership container (orchestrator)
+ - Application container
- All worker containers (GPU, CPU, etc.)
-4. Exposes the mothership on `localhost:8000`
+4. Exposes the application on `localhost:8888`
5. All containers communicate via Docker DNS
6. On shutdown (Ctrl+C), automatically stops and removes all containers
@@ -192,7 +192,7 @@ Successful build displays:
### Build fails with "functions not found"
-Ensure your project has `@remote` decorated functions in `workers/` directory:
+Ensure your project has `@remote` decorated functions in your `.py` files:
```python
from runpod_flash import remote, LiveServerless
diff --git a/src/runpod_flash/cli/docs/flash-deploy.md b/src/runpod_flash/cli/docs/flash-deploy.md
index 504ad874..d0fcb6a7 100644
--- a/src/runpod_flash/cli/docs/flash-deploy.md
+++ b/src/runpod_flash/cli/docs/flash-deploy.md
@@ -27,25 +27,25 @@ The `flash deploy` command is the primary way to get your Flash application runn
## Architecture: Fully Deployed to Runpod
-With `flash deploy`, your **entire application** runs on Runpod Serverless—both your FastAPI app (the "orchestrator") and all `@remote` worker functions:
+With `flash deploy`, your **entire application** runs on Runpod Serverless—all `@remote` functions deploy as peer serverless endpoints:
```
┌─────────────────────────────────────────────────────────────────┐
│ RUNPOD SERVERLESS │
│ │
-│ ┌─────────────────────────────────────┐ │
-│ │ MOTHERSHIP ENDPOINT │ │
-│ │ (your FastAPI app from main.py) │ │
-│ │ - Your HTTP routes │ │
-│ │ - Orchestrates @remote calls │───────────┐ │
-│ │ - Public URL for users │ │ │
-│ └─────────────────────────────────────┘ │ │
-│ │ internal │
-│ ▼ │
+│ All endpoints deployed as peers, using manifest for discovery │
+│ │
│ ┌─────────────────────────┐ ┌─────────────────────────┐ │
│ │ gpu-worker │ │ cpu-worker │ │
│ │ (your @remote function) │ │ (your @remote function) │ │
│ └─────────────────────────┘ └─────────────────────────┘ │
+│ │
+│ ┌─────────────────────────┐ │
+│ │ lb-worker │ │
+│ │ (load-balanced endpoint)│ │
+│ └─────────────────────────┘ │
+│ │
+│ Service discovery: flash_manifest.json + State Manager GraphQL │
└─────────────────────────────────────────────────────────────────┘
▲
│ HTTPS (authenticated)
@@ -56,9 +56,8 @@ With `flash deploy`, your **entire application** runs on Runpod Serverless—bot
```
**Key points:**
-- **Your FastAPI app runs on Runpod** as the "mothership" endpoint
-- **`@remote` functions run on Runpod** as separate worker endpoints
-- **Users call the mothership URL** directly (e.g., `https://xyz123.api.runpod.ai/api/hello`)
+- **All `@remote` functions run on Runpod** as serverless endpoints
+- **Users call endpoint URLs** directly (e.g., `https://xyz123.api.runpod.ai/api/hello`)
- **No `live-` prefix** on endpoint names (these are production endpoints)
- **No hot reload:** code changes require a new deployment
@@ -68,7 +67,7 @@ This is different from `flash run`, where your FastAPI app runs locally on your
| Aspect | `flash run` | `flash deploy` |
|--------|-------------|----------------|
-| **FastAPI app runs on** | Your machine (localhost) | Runpod Serverless (mothership) |
+| **App runs on** | Your machine (localhost) | Runpod Serverless |
| **`@remote` functions run on** | Runpod Serverless | Runpod Serverless |
| **Endpoint naming** | `live-` prefix (e.g., `live-gpu-worker`) | No prefix (e.g., `gpu-worker`) |
| **Hot reload** | Yes | No |
@@ -183,9 +182,9 @@ Builds your project and launches a local Docker-based test environment instead o
1. Builds your project (creates the archive and manifest)
2. Creates a Docker network for inter-container communication
3. Starts one Docker container per resource config:
- - Mothership container (orchestrator)
+ - Application container
- All worker containers (GPU, CPU, etc.)
-4. Exposes the mothership on `localhost:8000`
+4. Exposes the application on `localhost:8000`
5. All containers communicate via Docker DNS
6. On shutdown (Ctrl+C), automatically stops and removes all containers
@@ -350,7 +349,7 @@ Next Steps:
variable...
2. Call Your Functions
- Your mothership is deployed at:
+ Your application is deployed at:
https://api-xxxxx.runpod.net
3. Available Routes
diff --git a/src/runpod_flash/cli/docs/flash-env.md b/src/runpod_flash/cli/docs/flash-env.md
index c3f87744..81ce7993 100644
--- a/src/runpod_flash/cli/docs/flash-env.md
+++ b/src/runpod_flash/cli/docs/flash-env.md
@@ -464,8 +464,7 @@ flash env delete
**Problem**: Command requires `--app` flag even when in project directory
**Solution**: Ensure you're in a Flash project directory with:
-- `main.py` with Flash server
-- `workers/` directory
+- Python files containing `@remote` decorated functions
- `.env` file with `RUNPOD_API_KEY`
Or specify app explicitly:
diff --git a/src/runpod_flash/cli/docs/flash-init.md b/src/runpod_flash/cli/docs/flash-init.md
index 082b619a..19c32f13 100644
--- a/src/runpod_flash/cli/docs/flash-init.md
+++ b/src/runpod_flash/cli/docs/flash-init.md
@@ -4,7 +4,7 @@ Create a new Flash project with a ready-to-use template structure.
## Overview
-The `flash init` command scaffolds a new Flash project with everything you need to get started: a main server (mothership), example GPU and CPU workers, and the directory structure that Flash expects. It's the fastest way to go from zero to a working distributed application.
+The `flash init` command scaffolds a new Flash project with everything you need to get started: example GPU and CPU worker files with `@remote` functions and the project structure that Flash expects. It's the fastest way to go from zero to a working distributed application.
> **Note:** This command only creates **local files**. It doesn't interact with Runpod or create any cloud resources. Cloud resources (apps, environments, endpoints) are created later when you run `flash deploy`.
@@ -51,16 +51,10 @@ flash init my-project --force
```
my-project/
-├── main.py # Flash Server (FastAPI)
-├── workers/
-│ ├── gpu/ # GPU worker example
-│ │ ├── __init__.py
-│ │ └── endpoint.py
-│ └── cpu/ # CPU worker example
-│ ├── __init__.py
-│ └── endpoint.py
+├── gpu_worker.py # GPU worker with @remote function
+├── cpu_worker.py # CPU worker with @remote function
├── .env
-├── requirements.txt
+├── pyproject.toml # Python dependencies (uv/pip compatible)
└── README.md
```
@@ -68,7 +62,7 @@ my-project/
```bash
cd my-project
-pip install -r requirements.txt # or use your preferred environment manager
+uv sync # or: pip install .
# Add RUNPOD_API_KEY to .env
flash run
```
diff --git a/src/runpod_flash/cli/docs/flash-run.md b/src/runpod_flash/cli/docs/flash-run.md
index 0b9cfd73..70976d6c 100644
--- a/src/runpod_flash/cli/docs/flash-run.md
+++ b/src/runpod_flash/cli/docs/flash-run.md
@@ -4,7 +4,7 @@ Start the Flash development server for testing/debugging/development.
## Overview
-The `flash run` command starts a local development server that hosts your FastAPI app on your machine while deploying `@remote` functions to Runpod Serverless. This hybrid architecture lets you rapidly iterate on your application with hot-reload while testing real GPU/CPU workloads in the cloud.
+The `flash run` command starts a local development server that auto-discovers your `@remote` functions and serves them on your machine while deploying them to Runpod Serverless. This hybrid architecture lets you rapidly iterate on your application with hot-reload while testing real GPU/CPU workloads in the cloud.
Use `flash run` when you want to skip the build step and test/develop/debug your remote functions rapidly before deploying your full application with `flash deploy`. (See [Flash Deploy](./flash-deploy.md) for details.)
@@ -16,10 +16,10 @@ With `flash run`, your system runs in a **hybrid architecture**:
┌─────────────────────────────────────────────────────────────────┐
│ YOUR MACHINE (localhost:8888) │
│ ┌─────────────────────────────────────┐ │
-│ │ FastAPI App (main.py) │ │
-│ │ - Your HTTP routes │ │
-│ │ - Orchestrates @remote calls │─────────┐ │
-│ │ - Hot-reload enabled │ │ │
+│ │ Auto-generated server │ │
+│ │ (.flash/server.py) │ │
+│ │ - Discovers @remote functions │─────────┐ │
+│ │ - Hot-reload via watchfiles │ │ │
│ └─────────────────────────────────────┘ │ │
└──────────────────────────────────────────────────│──────────────┘
│ HTTPS
@@ -34,10 +34,11 @@ With `flash run`, your system runs in a **hybrid architecture**:
```
**Key points:**
-- **Your FastAPI app runs locally** on your machine (uvicorn at `localhost:8888`)
+- **`flash run` auto-discovers `@remote` functions** and generates `.flash/server.py`
+- **Queue-based (QB) routes execute locally** at `/{file_prefix}/run_sync`
+- **Load-balanced (LB) routes dispatch remotely** via `LoadBalancerSlsStub`
- **`@remote` functions run on Runpod** as serverless endpoints
-- **Your machine is the orchestrator** that calls remote endpoints when you invoke `@remote` functions
-- **Hot reload works** because your app code is local—changes are picked up instantly
+- **Hot reload** watches for `.py` file changes via watchfiles
- **Endpoints are prefixed with `live-`** to distinguish development endpoints from production (e.g., `gpu-worker` becomes `live-gpu-worker`)
This is different from `flash deploy`, where **everything** runs on Runpod. See [flash deploy](./flash-deploy.md) for the fully-deployed architecture.
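The per-file route layout and `live-` prefixing described above can be illustrated with two small helpers (hypothetical names; the generated `.flash/server.py` is produced by the CLI):

```python
from pathlib import Path

def qb_route(worker_file: str) -> str:
    """Queue-based route for a worker file, e.g. gpu_worker.py -> /gpu_worker/run_sync."""
    return f"/{Path(worker_file).stem}/run_sync"

def dev_endpoint_name(name: str) -> str:
    """flash run prefixes development endpoints with 'live-'."""
    return f"live-{name}"
```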
@@ -73,9 +74,9 @@ flash run --host 0.0.0.0 --port 8000
## What It Does
-1. Discovers `main.py` (or `app.py`, `server.py`)
-2. Checks for FastAPI app
-3. Starts uvicorn server with hot reload
+1. Scans project files for `@remote` decorated functions
+2. Generates `.flash/server.py` with QB and LB routes
+3. Starts uvicorn server with hot-reload via watchfiles
4. GPU workers use LiveServerless (no packaging needed)
### How It Works
@@ -84,8 +85,11 @@ When you call a `@remote` function using `flash run`, Flash deploys a **Serverle
```
flash run
│
+ ├── Scans project for @remote functions
+ ├── Generates .flash/server.py
├── Starts local server (e.g. localhost:8888)
- │ └── Hosts your FastAPI mothership
+ │ ├── QB routes: /{file_prefix}/run_sync (local execution)
+ │ └── LB routes: /{file_prefix}/{path} (remote dispatch)
│
└── On @remote function call:
└── Deploys a Serverless endpoint (if not cached)
@@ -106,7 +110,7 @@ Auto-provisioning discovers and deploys Serverless endpoints before the Flash de
### How It Works
-1. **Resource Discovery**: Scans your FastAPI app for `@remote` decorated functions
+1. **Resource Discovery**: Scans project files for `@remote` decorated functions
2. **Parallel Deployment**: Deploys resources concurrently (up to 3 at a time)
3. **Confirmation**: Asks for confirmation if deploying more than 5 endpoints
4. **Caching**: Stores deployed resources in `.runpod/resources.pkl` for reuse across runs