refactor: simplify flash init skeleton for zero-boilerplate flash run #208
Pull request overview
Refactors the Flash “init + run” experience to be zero-boilerplate by removing the FastAPI-first skeleton, auto-discovering @remote functions, and generating a local dev server under .flash/ with hot-reload and LB dispatch support.
Changes:
- Replace the skeleton template with a flat `gpu_worker.py` / `cpu_worker.py` / `lb_worker.py` layout and rewrite the skeleton README accordingly.
- Rework `flash run` to scan for `@remote` functions, generate `.flash/server.py`, run uvicorn with targeted reload, and clean up live endpoints on Ctrl+C.
- Update scanner/manifest/build plumbing to support file-path-derived routing fields and LB handler generation; adjust/add unit tests.
Reviewed changes
Copilot reviewed 33 out of 34 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tests/unit/test_skeleton.py | Updates skeleton expectations to the new flat-file template layout. |
| tests/unit/resources/test_serverless.py | Adds unit coverage for live-provisioning deploy checks and payload exclude behavior. |
| tests/unit/cli/test_run.py | Adds unit coverage for flash run server generation, reload behavior, watcher, and LB route generation. |
| tests/unit/cli/commands/build_utils/test_path_utilities.py | New tests for file-path → URL/module/resource naming utilities and LB handler detection. |
| tests/unit/cli/commands/build_utils/test_manifest_mothership.py | Removes legacy mothership manifest tests (mothership concept removed). |
| tests/integration/test_run_auto_provision.py | Removes old integration tests tied to the FastAPI entrypoint model. |
| src/runpod_flash/stubs/registry.py | Adds stubbing support for CpuLiveLoadBalancer via LoadBalancerSlsStub. |
| src/runpod_flash/stubs/load_balancer_sls.py | Broadens /execute routing decision to cover all live resources via LiveServerlessMixin. |
| src/runpod_flash/core/resources/serverless.py | Skips health check during live provisioning; excludes template when templateId set; injects FLASH_MODULE_PATH for LB deploys. |
| src/runpod_flash/core/resources/resource_manager.py | Removes noisy URL logging during deploy/get-or-deploy flows. |
| src/runpod_flash/core/resources/load_balancer_sls_resource.py | Promotes LB deploy logs to info with endpoint URL output. |
| src/runpod_flash/core/api/runpod.py | Tweaks GraphQL logging messages for endpoint save operations. |
| src/runpod_flash/client.py | Adds LB route-handler passthrough behavior for @remote(method=..., path=...) LB functions. |
| src/runpod_flash/cli/utils/skeleton_template/workers/gpu/endpoint.py | Removes legacy GPU worker skeleton module. |
| src/runpod_flash/cli/utils/skeleton_template/workers/gpu/__init__.py | Removes legacy FastAPI router wrapper for GPU worker. |
| src/runpod_flash/cli/utils/skeleton_template/workers/cpu/endpoint.py | Removes legacy CPU worker skeleton module. |
| src/runpod_flash/cli/utils/skeleton_template/workers/cpu/__init__.py | Removes legacy FastAPI router wrapper for CPU worker. |
| src/runpod_flash/cli/utils/skeleton_template/pyproject.toml | Simplifies skeleton pyproject (removes FastAPI/uvicorn + dev tooling from template). |
| src/runpod_flash/cli/utils/skeleton_template/mothership.py | Removes legacy mothership config from the skeleton template. |
| src/runpod_flash/cli/utils/skeleton_template/main.py | Removes legacy main.py FastAPI app from the skeleton template. |
| src/runpod_flash/cli/utils/skeleton_template/lb_worker.py | New skeleton LB worker example using @remote(method=..., path=...). |
| src/runpod_flash/cli/utils/skeleton_template/gpu_worker.py | New skeleton GPU QB worker example. |
| src/runpod_flash/cli/utils/skeleton_template/cpu_worker.py | New skeleton CPU QB worker example. |
| src/runpod_flash/cli/utils/skeleton_template/README.md | Rewrites template docs for uv + flat files + new routes; adds GPU/CPU references. |
| src/runpod_flash/cli/utils/skeleton_template/.gitignore | Adds .flash/ to template gitignore. |
| src/runpod_flash/cli/commands/run.py | Implements scanner-driven .flash/server.py generation, targeted reload, auto-provision, and Ctrl+C cleanup. |
| src/runpod_flash/cli/commands/init.py | Updates init output to reflect the new skeleton structure and removes mothership steps. |
| src/runpod_flash/cli/commands/build_utils/scanner.py | Adds file-path utilities and LB route-handler metadata; excludes __init__.py from scanning. |
| src/runpod_flash/cli/commands/build_utils/manifest.py | Removes mothership detection; adds file_path, local_path_prefix, module_path fields per resource. |
| src/runpod_flash/cli/commands/build_utils/lb_handler_generator.py | Simplifies LB handler lifespan (removes mothership reconciliation logic). |
| src/runpod_flash/cli/commands/build.py | Generates LB handlers during build; loosens project validation to any Python-containing dir. |
| src/runpod_flash/cli/commands/_run_server_helpers.py | Adds helper layer for LB route dispatch and body→kwargs mapping in generated server. |
| PRD.md | Adds a PRD/spec describing the intended zero-boilerplate model and route conventions. |
Copilot reviewed 36 out of 37 changed files in this pull request and generated 1 comment.
Copilot reviewed 37 out of 38 changed files in this pull request and generated 3 comments.
Copilot reviewed 43 out of 44 changed files in this pull request and generated 9 comments.
…covery
LB @remote functions (with method= and path=) now return the decorated function unwrapped with __is_lb_route_handler__=True. The function body executes directly on the LB endpoint server rather than being dispatched as a remote stub. QB stubs inside the body are unaffected.
Scanner gains three path utilities (file_to_url_prefix, file_to_resource_name, file_to_module_path) that convert file paths to URL prefixes, resource names, and dotted module paths respectively.
RemoteFunctionMetadata gains is_lb_route_handler to distinguish LB route handlers from QB remote stubs during discovery.
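The passthrough behavior described in this commit can be illustrated with a minimal sketch. The `remote` decorator below is a hypothetical stand-in, not the project's implementation: only the `__is_lb_route_handler__` marker and the `method=`/`path=` keywords come from the commit message, and the QB branch is deliberately left as a placeholder.

```python
import asyncio
import functools


def remote(resource_config=None, method=None, path=None):
    """Hypothetical stand-in for the real decorator (illustration only)."""

    def decorator(fn):
        if method is not None and path is not None:
            # LB route handler: return the function unwrapped and tag it
            # so a scanner can tell it apart from a QB stub.
            fn.__is_lb_route_handler__ = True
            return fn

        @functools.wraps(fn)
        async def stub(*args, **kwargs):
            # QB path: the real stub would dispatch remotely; not modelled here.
            raise NotImplementedError("remote dispatch is out of scope for this sketch")

        return stub

    return decorator


@remote(method="POST", path="/process")
async def process(data: dict) -> dict:
    return {"echo": data}


@remote()
async def gpu_task(x: int) -> int:
    return x * 2


# The LB handler is the original coroutine function and runs directly.
result = asyncio.run(process({"a": 1}))
```

Because the LB handler is returned unchanged, calling it runs the body in the current process, which is the property later commits in this PR have to work around for the local dev server.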
Remove _serialize_routes, _create_mothership_resource, and _create_mothership_from_explicit — all referenced unimported symbols and caused F821 lint errors. The manifest now emits a flat resources dict with file_path, local_path_prefix, and module_path per resource; no is_mothership flag.
flash run now scans the project for all @remote functions, generates .flash/server.py with routes derived from file paths, and starts uvicorn with --app-dir .flash/.
Route convention: gpu_worker.py -> /gpu_worker/run and /gpu_worker/run_sync; subdirectory files produce matching URL prefixes.
Cleanup on Ctrl+C is fixed: _cleanup_live_endpoints now reads .runpod/resources.pkl written by the uvicorn subprocess and deprovisions all live-prefixed endpoints, removing the dead in-process _SESSION_ENDPOINTS approach which never received data from the subprocess.
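The route convention above can be sketched as follows. The function names `file_to_url_prefix` and `file_to_module_path` appear in the PR, but the bodies here are assumptions written for illustration, as is the `qb_routes` helper and the exact prefix produced for subdirectory files.

```python
from pathlib import PurePosixPath


def file_to_url_prefix(rel_path: str) -> str:
    """Assumed mapping: gpu_worker.py -> /gpu_worker, a/b.py -> /a/b."""
    return "/" + PurePosixPath(rel_path).with_suffix("").as_posix()


def file_to_module_path(rel_path: str) -> str:
    """Assumed mapping: longruns/stage1.py -> longruns.stage1."""
    return ".".join(PurePosixPath(rel_path).with_suffix("").parts)


def qb_routes(rel_path: str) -> list:
    """Hypothetical helper: the two QB routes named in this commit message."""
    prefix = file_to_url_prefix(rel_path)
    return [f"{prefix}/run", f"{prefix}/run_sync"]


routes = qb_routes("gpu_worker.py")
```

This reflects the convention as stated in this commit; a later commit in the same PR drops `/run` and keeps only `/run_sync`.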
…project validation
LBHandlerGenerator is now called from run_build() for all is_load_balanced resources, wiring the build pipeline to the new module_path-based handler generation.
validate_project_structure switches from glob to rglob so projects with files only in subdirectories (e.g. 00_multi_resource) are not incorrectly rejected.
lb_handler_generator loses the mothership reconciliation lifespan (StateManagerClient, reconcile_children) in favour of a clean startup/shutdown lifespan.
is_deployed skips the health check when FLASH_IS_LIVE_PROVISIONING=true. Newly created endpoints can fail RunPod's health API for a few seconds after creation (propagation delay), causing get_or_deploy_resource to trigger a spurious re-deploy on the second request (e.g. /run_sync immediately after /run). _payload_exclude now excludes template when templateId is already set. After first deployment _do_deploy sets templateId on the config object while the set_serverless_template validator has already set template at construction time. Sending both fields in the same payload causes RunPod to return 'You can only provide one of templateId or template.' Also adds _get_module_path helper and injects FLASH_MODULE_PATH into LB endpoint environment at deploy time so the deployed handler can import the correct user module.
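A minimal sketch of the exclusion rule, assuming the config is a plain dict rather than the project's actual model class. `payload_exclude` and `build_payload` are hypothetical names; only the `templateId`/`template` precedence comes from the commit message.

```python
def payload_exclude(config: dict) -> set:
    """Return field names to omit from the deploy payload."""
    exclude = set()
    if config.get("templateId"):
        # templateId takes precedence; sending both is rejected by the API.
        exclude.add("template")
    return exclude


def build_payload(config: dict) -> dict:
    """Drop excluded and unset fields before sending."""
    skip = payload_exclude(config)
    return {k: v for k, v in config.items() if k not in skip and v is not None}


first_deploy = build_payload({"name": "worker", "template": {"imageName": "img"}, "templateId": None})
redeploy = build_payload({"name": "worker", "template": {"imageName": "img"}, "templateId": "tpl-123"})
```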
Parent process watches project .py files via watchfiles and regenerates .flash/server.py on change. Uvicorn now watches only .flash/server.py instead of the whole project, so it reloads exactly once per change with the updated routes visible.
- Add _watch_and_regenerate() background thread using watchfiles
- Change --reload-dir from '.' to '.flash', --reload-include to 'server.py'
- Start watcher thread when reload=True, stop on KeyboardInterrupt/Exception
- Add TestRunCommandHotReload and TestWatchAndRegenerate test classes
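The targeted reload setup can be pictured as a uvicorn command line. The sketch below only builds the argument list; `--app-dir`, `--reload`, `--reload-dir`, and `--reload-include` are uvicorn options, while the `server:app` target, the default host and port, and the `build_uvicorn_cmd` name are assumptions.

```python
import sys


def build_uvicorn_cmd(host: str = "localhost", port: int = 8888, reload: bool = True) -> list:
    """Hypothetical helper: assemble the uvicorn invocation for the generated server."""
    cmd = [
        sys.executable, "-m", "uvicorn", "server:app",  # app target is assumed
        "--app-dir", ".flash",
        "--host", host,
        "--port", str(port),
    ]
    if reload:
        # Watch only the generated file so each project change causes one reload.
        cmd += ["--reload", "--reload-dir", ".flash", "--reload-include", "server.py"]
    return cmd


cmd = build_uvicorn_cmd()
```

Keeping uvicorn's watch scope to the generated file means the parent process decides when a reload happens: it rewrites `.flash/server.py` first, and the reload then sees the updated routes.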
watchfiles emits DEBUG-level messages ("all changes filtered out",
"rust notify timeout") that are correct behavior but should not be
visible to users. Silence the watchfiles logger at WARNING in
_watch_and_regenerate() — scoped to that namespace only.
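Scoping the change to one logger namespace looks roughly like this with the standard `logging` module. The function name is hypothetical; the logger name `watchfiles` is the package's top-level namespace.

```python
import logging


def silence_watchfiles_debug() -> None:
    """Raise only the watchfiles namespace to WARNING; other loggers are untouched."""
    logging.getLogger("watchfiles").setLevel(logging.WARNING)


silence_watchfiles_debug()
```

Child loggers under that namespace inherit the effective level, so their DEBUG messages are filtered without changing the root logger.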
FastAPI treats `body: dict` as a required JSON body. GET/HEAD routes must be zero-arg so Swagger UI and browsers do not attempt to send a body, which triggers a fetch TypeError. Split the LB route code generator in _generate_flash_server() on method: get/head emit no-arg handlers; all other methods keep body: dict.
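The split on HTTP method can be sketched as a small code generator. This is an illustration under assumptions, not the actual `_generate_flash_server()` output: the handler naming and the generated call expressions are made up for the example.

```python
def generate_lb_handler(func_name: str, method: str, path: str) -> str:
    """Hypothetical generator: emit handler source for one LB route."""
    method = method.lower()
    if method in ("get", "head"):
        # No body parameter, so clients are never asked to send one.
        signature = f"async def {func_name}_handler():"
        call = f"    return await {func_name}()"
    else:
        signature = f"async def {func_name}_handler(body: dict):"
        call = f"    return await {func_name}(body)"
    return "\n".join([f'@app.{method}("{path}")', signature, call]) + "\n"


get_src = generate_lb_handler("health", "GET", "/lb_worker/health")
post_src = generate_lb_handler("process", "POST", "/lb_worker/process")
```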
…ision
LB route handlers were executing locally in the dev server process instead of forwarding to the deployed LB endpoint. The @remote decorator returns LB handlers unwrapped (passthrough) because in a deployed pod the body IS the HTTP handler, but in flash run there is no deployed pod.
Changes:
- Add _run_server_helpers.py with lb_proxy() that uses ResourceManager.get_or_deploy_resource() for on-demand provisioning and get_authenticated_httpx_client() for auth headers
- Generate proxy handlers for all LB routes (any HTTP method) that forward requests to the deployed endpoint transparently
- Import resource config variables (not function bodies) for LB workers so the actual DeployableResource object is passed to lb_proxy
- Restore --auto-provision flag dropped in 35cfa6e, using existing ResourceDiscovery and DeploymentOrchestrator to provision all endpoints upfront and eliminate cold-start latency
- Replace TestGenerateFlashServer tests with proxy-aware assertions
ResourceDiscovery._import_module() uses importlib to execute each file, but cross-module imports (e.g. "from longruns.stage1 import ...") fail when the project root isn't on sys.path. This caused --auto-provision to silently skip LB endpoints whose files import from sibling packages.
Cleanup on server stop now prints a summary line with undeployed count and wall-clock duration, matching the provisioning output format.
…proxy
Replace lb_proxy (transparent HTTP forwarding) with lb_execute which uses LoadBalancerSlsStub's /execute dispatch path. This fixes 404s on CpuLiveLoadBalancer resources where the remote container has no user routes, only the /execute endpoint that accepts serialized function code.
- Change isinstance check from LiveLoadBalancer to LiveServerlessMixin so all live resource types (including CpuLiveLoadBalancer) use /execute
- Add explicit CpuLiveLoadBalancer singledispatch registration in registry
- Generate server.py imports for both config var and function reference
- Clean up redundant URL debug logs in resource_manager
Restore lb_execute to dispatch through LoadBalancerSlsStub instead of calling functions locally — LB resources require Live Serverless containers and cannot execute on a local machine. Keep _map_body_to_params and body: dict signatures for OpenAPI/Swagger compatibility while dispatching remotely via the stub's /execute path. Remove /run from generated QB routes, retaining only /run_sync since the dev server executes synchronously.
Replace the old multi-directory skeleton (main.py, mothership.py, workers/) with three flat files: gpu_worker.py, cpu_worker.py, and lb_worker.py. flash run auto-discovers @remote functions so the FastAPI boilerplate and router structure are no longer needed.
- Remove main.py, mothership.py, workers/, .ruff_cache from skeleton
- Add gpu_worker.py (QB GPU), cpu_worker.py (QB CPU), lb_worker.py (LB)
- Simplify pyproject.toml deps (drop fastapi/uvicorn)
- Add .flash/ to .gitignore
- Rewrite README with uv setup, QB/LB examples, GpuType reference
- Update init command panel output and next steps
- Add Ctrl+C cleanup hint to flash run startup output
- Update skeleton tests for new file structure
Directory names starting with digits (e.g. 01_getting_started/) produce invalid Python when used in import statements and function names.
- Add _flash_import helper to generated server.py that uses importlib.import_module() with scoped sys.path so sibling imports (e.g. `from cpu_worker import ...`) resolve to the correct directory
- Prefix generated function names with '_' when they start with a digit
- Scope sys.path per-import to prevent name collisions when multiple directories contain files with the same name (e.g. cpu_worker.py)
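A rough sketch of both fixes, assuming a simplified helper rather than the generated `_flash_import`. The names `safe_identifier` and `flash_import` are hypothetical, and evicting the cached module before importing is an assumption made here so that two same-named files can be loaded in turn.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def safe_identifier(name: str) -> str:
    """Prefix names that start with a digit so they are valid Python identifiers."""
    return f"_{name}" if name[:1].isdigit() else name


def flash_import(directory: str, module_name: str):
    """Import module_name with `directory` on sys.path only for the duration of the import."""
    sys.path.insert(0, directory)
    try:
        importlib.invalidate_caches()
        # Assumption: drop a same-named module cached from another directory.
        sys.modules.pop(module_name, None)
        return importlib.import_module(module_name)
    finally:
        if directory in sys.path:
            sys.path.remove(directory)


# Usage: two directories that each contain cpu_worker.py.
path_before = list(sys.path)
with tempfile.TemporaryDirectory() as root:
    first = Path(root) / "01_getting_started"
    second = Path(root) / "02_other"
    for folder, value in ((first, "a"), (second, "b")):
        folder.mkdir()
        (folder / "cpu_worker.py").write_text(f"VALUE = {value!r}\n")
    mod_a = flash_import(str(first), "cpu_worker")
    mod_b = flash_import(str(second), "cpu_worker")
```

Using `importlib.import_module()` with a string sidesteps the syntax problem entirely, since `import 01_getting_started.cpu_worker` is not valid Python.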
The skeleton template was replaced with flat worker files (cpu_worker.py, gpu_worker.py, lb_worker.py, pyproject.toml) but the wheel validation script still expected the old multi-directory structure (main.py, workers/**). This caused the Build Package CI check to fail.
- Guard watcher_thread.join() with is_alive() check for --no-reload
- Wrap watchfiles import in try/except for missing dependency
- Fix debug log to show actual type instead of hardcoded class name
- Fix invalid dict addition in skeleton README example
- Fix PRD spec to match actual /run_sync-only behavior
…tures
The dev server codegen always generated `await fn(body.get("input", body))`
regardless of actual function signature. This crashed zero-param functions
with TypeError and incorrectly passed a dict to multi-param functions.
Scanner changes:
- Extract param_names from function AST nodes (excluding self)
- Extract class_method_params per public method for @remote classes
Codegen changes:
- 0 params: `await fn()` with no `body: dict` in handler signature
- 1 param: `await fn(body.get("input", body))` (preserves current behavior)
- 2+ params: `await fn(**body.get("input", body))` (kwargs spread)
- LB GET routes with path params (e.g. `/images/{file_id}`) now declare
typed parameters in handler signature for proper Swagger UI rendering
- LB POST routes with path params merge body and path params
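The three-way branch on parameter count can be sketched with the `ast` module. The generated expressions are the ones quoted in the commit message; `extract_param_names` and `build_call_expr` as written here are illustrative assumptions, not the scanner's code.

```python
import ast


def extract_param_names(source: str, func_name: str) -> list:
    """Return positional parameter names of func_name, excluding self."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == func_name:
            return [a.arg for a in node.args.args if a.arg != "self"]
    raise ValueError(f"function {func_name!r} not found")


def build_call_expr(func_name: str, param_names: list) -> str:
    """Pick the call expression based on how many parameters the function takes."""
    if not param_names:
        return f"await {func_name}()"
    if len(param_names) == 1:
        return f'await {func_name}(body.get("input", body))'
    return f'await {func_name}(**body.get("input", body))'


SOURCE = """
async def ping():
    return "ok"

async def embed(text):
    return text

async def resize(image, width, height):
    return image
"""

exprs = {name: build_call_expr(name, extract_param_names(SOURCE, name)) for name in ("ping", "embed", "resize")}
```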
Use pydantic.create_model() at server startup to dynamically build input models from @remote function signatures. Swagger UI now shows typed form fields instead of a generic JSON text area.
- Add make_input_model(), call_with_body(), to_dict() helpers
- Codegen emits model creation lines and typed handler signatures
- Simplify _build_call_expr to 2-way branch (zero-param vs body)
- Fix class method introspection: use _class_type to bypass RemoteClassWrapper proxy signatures (*args, **kwargs)
- Skip VAR_POSITIONAL/VAR_KEYWORD params in model creation as safety net
- Fall back to dict when model creation fails (zero disruption)
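The signature introspection step can be sketched with the standard library alone. The helper below is hypothetical: it only collects field definitions in the `(type, default)` tuple form that `pydantic.create_model()` accepts as keyword arguments, with `...` marking a required field, and it does not call pydantic itself.

```python
import inspect
from typing import Any


def collect_model_fields(fn) -> dict:
    """Map each named parameter to a (annotation, default) tuple; skip *args/**kwargs."""
    fields = {}
    for name, param in inspect.signature(fn).parameters.items():
        if param.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
            continue  # variadic parameters cannot become model fields
        annotation = Any if param.annotation is inspect.Parameter.empty else param.annotation
        default = ... if param.default is inspect.Parameter.empty else param.default
        fields[name] = (annotation, default)
    return fields


async def generate(prompt: str, steps: int = 20, *args, **kwargs):
    return prompt


fields = collect_model_fields(generate)
```

With pydantic installed, the result could be passed along the lines of `create_model("GenerateInput", **fields)`; the model name is an assumption.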
When @remote funcA calls @remote funcB, the worker receives only funcA's source via exec(). funcB is undefined in that namespace, causing NameError.
This adds dependency_resolver.py which AST-detects calls to other @remote functions, provisions their endpoints via ResourceManager, and generates async dispatch stubs that are prepended to the caller's source code. The worker's exec() then defines both the stubs and the caller in the same namespace, allowing stacked @remote calls to dispatch correctly.
- Add dependency_resolver.py with detect, resolve, generate, and build
- Change prepare_request to async in LiveServerlessStub and LoadBalancerSlsStub
- Move LB stub timeout to constants.py as DEFAULT_LB_STUB_TIMEOUT (60s)
- Update registry.py to await prepare_request
- Add 24 unit tests for dependency resolver
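The detection step can be illustrated with a small AST walk. This is a sketch under assumptions: it only recognizes decorators spelled `remote` or `remote(...)` and direct calls by bare name, and the function bodies are not the ones in `dependency_resolver.py`.

```python
import ast


def find_remote_functions(source: str) -> set:
    """Names of functions decorated with @remote or @remote(...)."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for dec in node.decorator_list:
                target = dec.func if isinstance(dec, ast.Call) else dec
                if isinstance(target, ast.Name) and target.id == "remote":
                    names.add(node.name)
    return names


def detect_remote_dependencies(source: str, caller: str, remote_names: set) -> list:
    """Remote functions that `caller` invokes by bare name."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == caller:
            called = {
                n.func.id
                for n in ast.walk(node)
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
            }
            return sorted((called & remote_names) - {caller})
    return []


SOURCE = """
@remote(cfg)
async def func_b(x):
    return x + 1

@remote(cfg)
async def func_a(x):
    y = await func_b(x)
    return helper(y)

def helper(v):
    return v
"""

remote_names = find_remote_functions(SOURCE)
deps = detect_remote_dependencies(SOURCE, "func_a", remote_names)
```

Each name in `deps` is a function for which a dispatch stub would need to be generated and prepended to the caller's source before it is sent to the worker.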
Tags now use the parent directory path instead of per-file worker type labels. Routes from the same project appear under a single collapsible group in the Swagger UI, making multi-worker projects easier to navigate.
- Guard watcher_thread creation behind reload flag
- Fix project_root derivation in build mode (use build_dir, not parent)
- Update PRD QB route spec to match /run_sync-only implementation
- Harden _cleanup_live_endpoints with granular error handling
- Replace bare except in _watch_and_regenerate with specific handlers
- Differentiate error types in lb_execute (422 for app errors, 500 for infra)
- Add templateId/template mutual exclusivity validation
- Improve _flash_import sys.path cleanup with index-based pop
- Use explicit event loop in cleanup to avoid nested loop errors
- Parallelize dependency provisioning with asyncio.gather
- Clarify null-safety in detect_remote_dependencies
…oth templateId and template
_payload_exclude() raised ValueError after _do_deploy() set templateId on a config object that already had template from initialization. Remove the raise in favor of silently excluding template (templateId takes precedence), and clear self.template after deploy mutation to prevent the inconsistent state at its source.
50ee92e to 9f1928d
Not meant to be committed; internal planning document.
Summary
- Replace the multi-directory skeleton (`main.py`, `mothership.py`, `workers/`) with three flat files: `gpu_worker.py`, `cpu_worker.py`, `lb_worker.py`
- `flash run` auto-discovers `@remote` functions, so no FastAPI boilerplate, routers, or `main.py` is needed
Changes
Skeleton template:
- Remove `main.py`, `mothership.py`, `workers/`, `.ruff_cache/`
- Add `gpu_worker.py` (QB GPU), `cpu_worker.py` (QB CPU), `lb_worker.py` (LB HTTP)
- Simplify `pyproject.toml` (remove fastapi/uvicorn deps)
- Add `.flash/` to `.gitignore`
- Rewrite `README.md` for flat-file approach
CLI:
- Update `flash init` panel output and next steps for new structure
- Add Ctrl+C cleanup hint to `flash run` startup output
flash run engine (prior commits):
- Scans for `@remote` functions, generates `.flash/server.py`
- `LoadBalancerSlsStub`
- `--auto-provision` flag
Tests:
- `flash run` unit tests
Test plan
- `make quality-check` passes (1043 tests, 68% coverage)
- `flash init test_project` creates flat structure
- `cd test_project && flash run` starts dev server
- `/gpu_worker/run_sync` and `/cpu_worker/run_sync`
- `/lb_worker/process` and `/lb_worker/health`