
refactor: simplify flash init skeleton for zero-boilerplate flash run #208

Merged
deanq merged 25 commits into main from refactor/ae-2210-simplified-starter
Feb 20, 2026

deanq (Member) commented Feb 19, 2026

Summary

  • Replace the old multi-directory skeleton (main.py, mothership.py, workers/) with three flat files: gpu_worker.py, cpu_worker.py, lb_worker.py
  • flash run auto-discovers @remote functions — no FastAPI boilerplate, routers, or main.py needed
  • Rewrite skeleton README with uv setup, QB/LB worker examples, GpuType reference table, and auto-provision tips

Changes

Skeleton template:

  • Delete main.py, mothership.py, workers/, .ruff_cache/
  • Add gpu_worker.py (QB GPU), cpu_worker.py (QB CPU), lb_worker.py (LB HTTP)
  • Simplify pyproject.toml (remove fastapi/uvicorn deps)
  • Add .flash/ to .gitignore
  • Rewrite README.md for flat-file approach

CLI:

  • Update flash init panel output and next steps for new structure
  • Add Ctrl+C cleanup hint to flash run startup output

flash run engine (prior commits):

  • Zero-boilerplate dev server: scans @remote functions, generates .flash/server.py
  • LB route dispatch through LoadBalancerSlsStub
  • Hot-reload on file changes
  • Auto-provision with --auto-provision flag
  • Endpoint cleanup on Ctrl+C

Tests:

  • Update skeleton tests for new file structure
  • Add flash run unit tests

Test plan

  • make quality-check passes (1043 tests, 68% coverage)
  • flash init test_project creates flat structure
  • cd test_project && flash run starts dev server
  • QB endpoints respond at /gpu_worker/run_sync and /cpu_worker/run_sync
  • LB endpoints respond at /lb_worker/process and /lb_worker/health
  • Ctrl+C cleans up provisioned endpoints
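
The QB checks in the test plan can be exercised with a plain HTTP client. This is a minimal sketch using only the standard library; the base URL `http://localhost:8000`, the use of POST, and the `{"input": ...}` envelope are assumptions (the envelope matches the `body.get("input", body)` lookup the generated handlers are described as using), and `build_run_sync_request` / `call_run_sync` are hypothetical helpers, not part of the Flash CLI.

```python
import json
import urllib.request

# Assumption: the dev server listens here; use whatever address `flash run` prints.
DEFAULT_BASE_URL = "http://localhost:8000"


def build_run_sync_request(worker: str, payload: dict, base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build a POST request for /<worker>/run_sync (hypothetical helper)."""
    url = f"{base_url.rstrip('/')}/{worker}/run_sync"
    # Wrap the arguments in an "input" envelope; the generated handlers are
    # described as falling back to the raw body when "input" is absent.
    data = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_run_sync(worker: str, payload: dict, base_url: str = DEFAULT_BASE_URL, timeout: float = 120.0):
    """Send the request and decode the JSON response (requires a running dev server)."""
    request = build_run_sync_request(worker, payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage, with a hypothetical argument name: `call_run_sync("gpu_worker", {"message": "hello"})`. The payload keys must match the worker function's parameters.
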

deanq changed the title from "Simplify flash init skeleton for zero-boilerplate flash run" to "refactor: simplify flash init skeleton for zero-boilerplate flash run" on Feb 19, 2026
deanq requested a review from Copilot on February 19, 2026 22:12
Copilot AI (Contributor) left a comment

Pull request overview

Refactors the Flash “init + run” experience to be zero-boilerplate by removing the FastAPI-first skeleton, auto-discovering @remote functions, and generating a local dev server under .flash/ with hot-reload and LB dispatch support.

Changes:

  • Replace the skeleton template with a flat gpu_worker.py / cpu_worker.py / lb_worker.py layout and rewrite the skeleton README accordingly.
  • Rework flash run to scan for @remote functions, generate .flash/server.py, run uvicorn with targeted reload, and clean up live endpoints on Ctrl+C.
  • Update scanner/manifest/build plumbing to support file-path-derived routing fields and LB handler generation; adjust/add unit tests.

Reviewed changes

Copilot reviewed 33 out of 34 changed files in this pull request and generated 6 comments.

Summary per file:

  • tests/unit/test_skeleton.py: Updates skeleton expectations to the new flat-file template layout.
  • tests/unit/resources/test_serverless.py: Adds unit coverage for live-provisioning deploy checks and payload exclude behavior.
  • tests/unit/cli/test_run.py: Adds unit coverage for flash run server generation, reload behavior, watcher, and LB route generation.
  • tests/unit/cli/commands/build_utils/test_path_utilities.py: New tests for file-path → URL/module/resource naming utilities and LB handler detection.
  • tests/unit/cli/commands/build_utils/test_manifest_mothership.py: Removes legacy mothership manifest tests (mothership concept removed).
  • tests/integration/test_run_auto_provision.py: Removes old integration tests tied to the FastAPI entrypoint model.
  • src/runpod_flash/stubs/registry.py: Adds stubbing support for CpuLiveLoadBalancer via LoadBalancerSlsStub.
  • src/runpod_flash/stubs/load_balancer_sls.py: Broadens /execute routing decision to cover all live resources via LiveServerlessMixin.
  • src/runpod_flash/core/resources/serverless.py: Skips health check during live provisioning; excludes template when templateId set; injects FLASH_MODULE_PATH for LB deploys.
  • src/runpod_flash/core/resources/resource_manager.py: Removes noisy URL logging during deploy/get-or-deploy flows.
  • src/runpod_flash/core/resources/load_balancer_sls_resource.py: Promotes LB deploy logs to info with endpoint URL output.
  • src/runpod_flash/core/api/runpod.py: Tweaks GraphQL logging messages for endpoint save operations.
  • src/runpod_flash/client.py: Adds LB route-handler passthrough behavior for @remote(method=..., path=...) LB functions.
  • src/runpod_flash/cli/utils/skeleton_template/workers/gpu/endpoint.py: Removes legacy GPU worker skeleton module.
  • src/runpod_flash/cli/utils/skeleton_template/workers/gpu/__init__.py: Removes legacy FastAPI router wrapper for GPU worker.
  • src/runpod_flash/cli/utils/skeleton_template/workers/cpu/endpoint.py: Removes legacy CPU worker skeleton module.
  • src/runpod_flash/cli/utils/skeleton_template/workers/cpu/__init__.py: Removes legacy FastAPI router wrapper for CPU worker.
  • src/runpod_flash/cli/utils/skeleton_template/pyproject.toml: Simplifies skeleton pyproject (removes FastAPI/uvicorn + dev tooling from template).
  • src/runpod_flash/cli/utils/skeleton_template/mothership.py: Removes legacy mothership config from the skeleton template.
  • src/runpod_flash/cli/utils/skeleton_template/main.py: Removes legacy main.py FastAPI app from the skeleton template.
  • src/runpod_flash/cli/utils/skeleton_template/lb_worker.py: New skeleton LB worker example using @remote(method=..., path=...).
  • src/runpod_flash/cli/utils/skeleton_template/gpu_worker.py: New skeleton GPU QB worker example.
  • src/runpod_flash/cli/utils/skeleton_template/cpu_worker.py: New skeleton CPU QB worker example.
  • src/runpod_flash/cli/utils/skeleton_template/README.md: Rewrites template docs for uv + flat files + new routes; adds GPU/CPU references.
  • src/runpod_flash/cli/utils/skeleton_template/.gitignore: Adds .flash/ to template gitignore.
  • src/runpod_flash/cli/commands/run.py: Implements scanner-driven .flash/server.py generation, targeted reload, auto-provision, and Ctrl+C cleanup.
  • src/runpod_flash/cli/commands/init.py: Updates init output to reflect the new skeleton structure and removes mothership steps.
  • src/runpod_flash/cli/commands/build_utils/scanner.py: Adds file-path utilities and LB route-handler metadata; excludes __init__.py from scanning.
  • src/runpod_flash/cli/commands/build_utils/manifest.py: Removes mothership detection; adds file_path, local_path_prefix, module_path fields per resource.
  • src/runpod_flash/cli/commands/build_utils/lb_handler_generator.py: Simplifies LB handler lifespan (removes mothership reconciliation logic).
  • src/runpod_flash/cli/commands/build.py: Generates LB handlers during build; loosens project validation to any Python-containing dir.
  • src/runpod_flash/cli/commands/_run_server_helpers.py: Adds helper layer for LB route dispatch and body→kwargs mapping in generated server.
  • PRD.md: Adds a PRD/spec describing the intended zero-boilerplate model and route conventions.


Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 36 out of 37 changed files in this pull request and generated 1 comment.



Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 37 out of 38 changed files in this pull request and generated 3 comments.



Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 43 out of 44 changed files in this pull request and generated 9 comments.



deanq added 15 commits February 20, 2026 14:02
…covery

LB @remote functions (with method= and path=) now return the decorated
function unwrapped with __is_lb_route_handler__=True. The function body
executes directly on the LB endpoint server rather than being dispatched
as a remote stub. QB stubs inside the body are unaffected.

Scanner gains three path utilities (file_to_url_prefix,
file_to_resource_name, file_to_module_path) that convert file paths to
URL prefixes, resource names, and dotted module paths respectively.
RemoteFunctionMetadata gains is_lb_route_handler to distinguish LB route
handlers from QB remote stubs during discovery.
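
The three scanner utilities named above can be illustrated with a short sketch. This is not the scanner's actual code: the function names come from the commit message, but the bodies, the `_` separator used for resource names, and the assumption that inputs are project-relative POSIX-style paths are illustrative.

```python
from pathlib import PurePosixPath


def _stem_parts(file_path: str) -> tuple:
    """Split a project-relative .py path into its parts, minus the extension."""
    return PurePosixPath(file_path).with_suffix("").parts


def file_to_url_prefix(file_path: str) -> str:
    """'longruns/stage1.py' -> '/longruns/stage1'"""
    return "/" + "/".join(_stem_parts(file_path))


def file_to_resource_name(file_path: str) -> str:
    """'longruns/stage1.py' -> 'longruns_stage1' (separator is an assumption)"""
    return "_".join(_stem_parts(file_path))


def file_to_module_path(file_path: str) -> str:
    """'longruns/stage1.py' -> 'longruns.stage1'"""
    return ".".join(_stem_parts(file_path))
```

Under these rules `gpu_worker.py` maps to the `/gpu_worker` prefix behind routes such as `/gpu_worker/run_sync`, and a file in a subdirectory gets a matching nested prefix and dotted module path.
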
Remove _serialize_routes, _create_mothership_resource, and
_create_mothership_from_explicit — all referenced unimported symbols and
caused F821 lint errors. The manifest now emits a flat resources dict
with file_path, local_path_prefix, and module_path per resource; no
is_mothership flag.

flash run now scans the project for all @remote functions, generates
.flash/server.py with routes derived from file paths, and starts uvicorn
with --app-dir .flash/. Route convention: gpu_worker.py -> /gpu_worker/run
and /gpu_worker/run_sync; subdirectory files produce matching URL prefixes.

Cleanup on Ctrl+C is fixed: _cleanup_live_endpoints now reads
.runpod/resources.pkl written by the uvicorn subprocess and deprovisions
all live- prefixed endpoints, removing the dead in-process _SESSION_ENDPOINTS
approach which never received data from the subprocess.
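
A rough sketch of the cleanup flow described above. It assumes the state file is a pickled dict keyed by resource name (the real `.runpod/resources.pkl` layout is not shown in this PR) and that the caller supplies an `undeploy` callable; `cleanup_live_endpoints` here is a hypothetical stand-in for `_cleanup_live_endpoints`. Failures are handled per endpoint so one bad endpoint does not block the rest. Only unpickle state files you wrote yourself.

```python
import pickle
from pathlib import Path
from typing import Any, Callable, Dict


def cleanup_live_endpoints(state_file: Path, undeploy: Callable[[Any], None]) -> int:
    """Undeploy every resource whose key starts with 'live-'.

    Assumes (hypothetically) that the pickle holds a dict mapping resource
    names to resource objects; returns the number of successful undeploys.
    """
    if not state_file.exists():
        return 0
    try:
        with state_file.open("rb") as fh:
            resources: Dict[str, Any] = pickle.load(fh)
    except (OSError, pickle.UnpicklingError, EOFError) as exc:
        print(f"warning: could not read {state_file}: {exc}")
        return 0

    undeployed = 0
    for name, resource in resources.items():
        if not name.startswith("live-"):
            continue  # leave non-live endpoints alone
        try:
            undeploy(resource)
            undeployed += 1
        except Exception as exc:  # keep going so one failure doesn't block the rest
            print(f"warning: failed to undeploy {name}: {exc}")
    return undeployed
```
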
…project validation

LBHandlerGenerator is now called from run_build() for all is_load_balanced
resources, wiring the build pipeline to the new module_path-based handler
generation. validate_project_structure switches from glob to rglob so
projects with files only in subdirectories (e.g. 00_multi_resource) are
not incorrectly rejected.

lb_handler_generator loses the mothership reconciliation lifespan
(StateManagerClient, reconcile_children) in favour of a clean
startup/shutdown lifespan.

is_deployed skips the health check when FLASH_IS_LIVE_PROVISIONING=true.
Newly created endpoints can fail RunPod's health API for a few seconds
after creation (propagation delay), causing get_or_deploy_resource to
trigger a spurious re-deploy on the second request (e.g. /run_sync
immediately after /run).

_payload_exclude now excludes template when templateId is already set.
After first deployment _do_deploy sets templateId on the config object
while the set_serverless_template validator has already set template at
construction time. Sending both fields in the same payload causes RunPod
to return 'You can only provide one of templateId or template.'

Also adds _get_module_path helper and injects FLASH_MODULE_PATH into LB
endpoint environment at deploy time so the deployed handler can import
the correct user module.
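
The health-check short-circuit can be sketched as an environment gate in front of the health call. The variable name `FLASH_IS_LIVE_PROVISIONING` comes from the commit; the function signature, the injected `health_check` callable, and the case-insensitive comparison are assumptions for illustration.

```python
import os
from typing import Callable, Optional


def is_deployed(endpoint_id: Optional[str], health_check: Callable[[str], bool]) -> bool:
    """Hypothetical sketch of the live-provisioning short-circuit."""
    if not endpoint_id:
        return False
    # During `flash run`, a freshly created endpoint may briefly fail the
    # health API; trust the stored ID instead of triggering a re-deploy.
    if os.environ.get("FLASH_IS_LIVE_PROVISIONING", "").lower() == "true":
        return True
    return health_check(endpoint_id)
```
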
Parent process watches project .py files via watchfiles and regenerates
.flash/server.py on change. Uvicorn now watches only .flash/server.py
instead of the whole project, so it reloads exactly once per change
with the updated routes visible.

- Add _watch_and_regenerate() background thread using watchfiles
- Change --reload-dir from '.' to '.flash', --reload-include to 'server.py'
- Start watcher thread when reload=True, stop on KeyboardInterrupt/Exception
- Add TestRunCommandHotReload and TestWatchAndRegenerate test classes
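
The parent-side filter can be sketched as a small predicate over the paths that watchfiles reports. Skipping `.flash/` is an assumption here (otherwise regenerating `server.py` would re-trigger the watcher), and `should_regenerate` is a hypothetical name, not the function in `run.py`.

```python
from pathlib import Path
from typing import Iterable


def should_regenerate(changed: Iterable[str], project_root: Path) -> bool:
    """Return True if any changed path is a user .py file outside .flash/."""
    root = project_root.resolve()
    for raw in changed:
        path = Path(raw).resolve()
        if path.suffix != ".py":
            continue
        try:
            rel = path.relative_to(root)
        except ValueError:
            continue  # outside the project
        if rel.parts and rel.parts[0] == ".flash":
            continue  # generated output; ignoring it avoids a feedback loop
        return True
    return False
```

With watchfiles, each batch yielded by `watch()` is a set of `(change, path)` tuples, so a caller could pass `(p for _, p in changes)`.
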
watchfiles emits DEBUG-level messages ("all changes filtered out",
"rust notify timeout") that are correct behavior but should not be
visible to users. Silence the watchfiles logger at WARNING in
_watch_and_regenerate() — scoped to that namespace only.
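
Scoping the change to a single logger namespace needs only the standard `logging` module. A minimal sketch; the helper name is hypothetical.

```python
import logging


def silence_watchfiles_debug() -> None:
    """Raise only the 'watchfiles' logger namespace to WARNING."""
    # Child loggers such as 'watchfiles.main' inherit this level unless they
    # set their own; other namespaces and the root logger are untouched.
    logging.getLogger("watchfiles").setLevel(logging.WARNING)
```
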
FastAPI treats `body: dict` as a required JSON body. GET/HEAD routes
must be zero-arg so Swagger UI and browsers do not attempt to send a
body, which triggers a fetch TypeError.

Split the LB route code generator in _generate_flash_server() on
method: get/head emit no-arg handlers; all other methods keep body: dict.
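
The method split in the LB route generator can be sketched as a function that emits handler source. The parameter names and the `**body` call line are assumptions; later commits in this PR route LB calls through `lb_execute` rather than calling the target directly, so the call line is only a placeholder. The example paths come from the test plan.

```python
def render_lb_route(handler_name: str, target: str, method: str, path: str) -> str:
    """Emit source for one FastAPI route (hypothetical codegen sketch)."""
    method = method.lower()
    lines = [f'@app.{method}("{path}")']
    if method in ("get", "head"):
        # Zero-arg handler: no request body for Swagger UI or browsers to send.
        lines.append(f"async def {handler_name}():")
        lines.append(f"    return await {target}()")
    else:
        # Other methods accept a JSON body that FastAPI parses into a dict.
        lines.append(f"async def {handler_name}(body: dict):")
        lines.append(f"    return await {target}(**body)")
    return "\n".join(lines) + "\n"
```

Usage: `render_lb_route("health_route", "health", "GET", "/lb_worker/health")` yields a handler with no `body` parameter, while a POST to `/lb_worker/process` keeps `body: dict`.
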
…ision

LB route handlers were executing locally in the dev server process
instead of forwarding to the deployed LB endpoint. The @remote decorator
returns LB handlers unwrapped (passthrough) because in a deployed pod
the body IS the HTTP handler, but in flash run there is no deployed pod.

Changes:
- Add _run_server_helpers.py with lb_proxy() that uses
  ResourceManager.get_or_deploy_resource() for on-demand provisioning
  and get_authenticated_httpx_client() for auth headers
- Generate proxy handlers for all LB routes (any HTTP method) that
  forward requests to the deployed endpoint transparently
- Import resource config variables (not function bodies) for LB workers
  so the actual DeployableResource object is passed to lb_proxy
- Restore --auto-provision flag dropped in 35cfa6e, using existing
  ResourceDiscovery and DeploymentOrchestrator to provision all
  endpoints upfront and eliminate cold-start latency
- Replace TestGenerateFlashServer tests with proxy-aware assertions

ResourceDiscovery._import_module() uses importlib to execute each file,
but cross-module imports (e.g. "from longruns.stage1 import ...") fail
when the project root isn't on sys.path. This caused --auto-provision
to silently skip LB endpoints whose files import from sibling packages.

Cleanup on server stop now prints a summary line with undeployed count
and wall-clock duration, matching the provisioning output format.

…proxy

Replace lb_proxy (transparent HTTP forwarding) with lb_execute which
uses LoadBalancerSlsStub's /execute dispatch path. This fixes 404s on
CpuLiveLoadBalancer resources where the remote container has no user
routes — only the /execute endpoint that accepts serialized function
code.

- Change isinstance check from LiveLoadBalancer to LiveServerlessMixin
  so all live resource types (including CpuLiveLoadBalancer) use /execute
- Add explicit CpuLiveLoadBalancer singledispatch registration in registry
- Generate server.py imports for both config var and function reference
- Clean up redundant URL debug logs in resource_manager
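
The registry change can be illustrated with `functools.singledispatch`. The class names come from this PR, but the hierarchy below (two sibling classes sharing `LiveServerlessMixin`) is an assumption, and the returned string stands in for a real `LoadBalancerSlsStub` instance. With sibling classes, registering `LiveLoadBalancer` alone would not cover `CpuLiveLoadBalancer`, which is why an explicit registration matters.

```python
from functools import singledispatch


# Hypothetical stand-ins for the real resource classes.
class LiveServerlessMixin:
    pass


class LiveLoadBalancer(LiveServerlessMixin):
    pass


class CpuLiveLoadBalancer(LiveServerlessMixin):
    pass


def uses_execute_path(resource: object) -> bool:
    """Any live resource is dispatched through the /execute endpoint."""
    return isinstance(resource, LiveServerlessMixin)


@singledispatch
def stub_resource(resource: object) -> str:
    raise TypeError(f"no stub registered for {type(resource).__name__}")


@stub_resource.register(LiveLoadBalancer)
@stub_resource.register(CpuLiveLoadBalancer)
def _(resource: object) -> str:
    # Both live load-balancer types share the same stub.
    return "LoadBalancerSlsStub"
```
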
Restore lb_execute to dispatch through LoadBalancerSlsStub instead of
calling functions locally — LB resources require Live Serverless
containers and cannot execute on a local machine.

Keep _map_body_to_params and body: dict signatures for OpenAPI/Swagger
compatibility while dispatching remotely via the stub's /execute path.

Remove /run from generated QB routes, retaining only /run_sync since
the dev server executes synchronously.

Replace the old multi-directory skeleton (main.py, mothership.py,
workers/) with three flat files: gpu_worker.py, cpu_worker.py, and
lb_worker.py. flash run auto-discovers @remote functions so the
FastAPI boilerplate and router structure are no longer needed.

- Remove main.py, mothership.py, workers/, .ruff_cache from skeleton
- Add gpu_worker.py (QB GPU), cpu_worker.py (QB CPU), lb_worker.py (LB)
- Simplify pyproject.toml deps (drop fastapi/uvicorn)
- Add .flash/ to .gitignore
- Rewrite README with uv setup, QB/LB examples, GpuType reference
- Update init command panel output and next steps
- Add Ctrl+C cleanup hint to flash run startup output
- Update skeleton tests for new file structure

Directory names starting with digits (e.g. 01_getting_started/) produce
invalid Python when used in import statements and function names.

- Add _flash_import helper to generated server.py that uses
  importlib.import_module() with scoped sys.path so sibling imports
  (e.g. `from cpu_worker import ...`) resolve to the correct directory
- Prefix generated function names with '_' when they start with a digit
- Scope sys.path per-import to prevent name collisions when multiple
  directories contain files with the same name (e.g. cpu_worker.py)
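
A sketch of both ideas: turning a digit-leading directory name into a valid identifier, and importing a module with its directory placed first on `sys.path` only for the duration of the import. `safe_identifier` and `flash_import` are hypothetical names; the body of `flash_import` is an assumption about how a scoped import like `_flash_import` could work, including the index-based removal and the eviction of a same-named cached module.

```python
import importlib
import re
import sys
from types import ModuleType


def safe_identifier(name: str) -> str:
    """Turn a path fragment like '01_getting_started' into a valid identifier."""
    ident = re.sub(r"\W", "_", name)
    return f"_{ident}" if ident[:1].isdigit() else ident


def flash_import(module_name: str, search_dir: str) -> ModuleType:
    """Import module_name with search_dir temporarily first on sys.path."""
    # Evict a same-named module cached from another directory (assumption).
    sys.modules.pop(module_name, None)
    sys.path.insert(0, search_dir)
    try:
        return importlib.import_module(module_name)
    finally:
        # Remove the entry we added, even if the import mutated sys.path.
        if sys.path and sys.path[0] == search_dir:
            sys.path.pop(0)
        elif search_dir in sys.path:
            sys.path.pop(sys.path.index(search_dir))
```

This only scopes the path for the module being imported; it does not try to handle every module-caching case across sibling imports.
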
The skeleton template was replaced with flat worker files (cpu_worker.py,
gpu_worker.py, lb_worker.py, pyproject.toml) but the wheel validation
script still expected the old multi-directory structure (main.py,
workers/**). This caused the Build Package CI check to fail.

- Guard watcher_thread.join() with is_alive() check for --no-reload
- Wrap watchfiles import in try/except for missing dependency
- Fix debug log to show actual type instead of hardcoded class name
- Fix invalid dict addition in skeleton README example
- Fix PRD spec to match actual /run_sync-only behavior

…tures

The dev server codegen always generated `await fn(body.get("input", body))`
regardless of actual function signature. This crashed zero-param functions
with TypeError and incorrectly passed a dict to multi-param functions.

Scanner changes:
- Extract param_names from function AST nodes (excluding self)
- Extract class_method_params per public method for @remote classes

Codegen changes:
- 0 params: `await fn()` with no `body: dict` in handler signature
- 1 param: `await fn(body.get("input", body))` (preserves current behavior)
- 2+ params: `await fn(**body.get("input", body))` (kwargs spread)
- LB GET routes with path params (e.g. `/images/{file_id}`) now declare
  typed parameters in handler signature for proper Swagger UI rendering
- LB POST routes with path params merge body and path params
Use pydantic.create_model() at server startup to dynamically build input
models from @remote function signatures. Swagger UI now shows typed form
fields instead of a generic JSON text area.

- Add make_input_model(), call_with_body(), to_dict() helpers
- Codegen emits model creation lines and typed handler signatures
- Simplify _build_call_expr to 2-way branch (zero-param vs body)
- Fix class method introspection: use _class_type to bypass
  RemoteClassWrapper proxy signatures (*args, **kwargs)
- Skip VAR_POSITIONAL/VAR_KEYWORD params in model creation as safety net
- Fall back to dict when model creation fails (zero disruption)
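
A sketch of the signature-to-model step. `make_input_model` is named in the commit, but this body is an assumption: it collects `(annotation, default)` pairs with `inspect`, skips `self`, `*args`, and `**kwargs`, passes them to `pydantic.create_model`, and falls back to `dict` on any failure, including pydantic being unavailable. The `_input` model-name suffix and returning `None` for zero-param functions are illustrative choices.

```python
import inspect
from typing import Any, Callable, Dict, Optional, Tuple


def signature_fields(fn: Callable) -> Dict[str, Tuple[Any, Any]]:
    """Collect (annotation, default) pairs in the shape pydantic.create_model accepts."""
    fields: Dict[str, Tuple[Any, Any]] = {}
    for name, param in inspect.signature(fn).parameters.items():
        if name == "self" or param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue  # *args/**kwargs cannot become model fields
        annotation = Any if param.annotation is param.empty else param.annotation
        default = ... if param.default is param.empty else param.default  # ... = required
        fields[name] = (annotation, default)
    return fields


def make_input_model(fn: Callable) -> Optional[type]:
    """Build a request model for fn, falling back to dict if that fails."""
    fields = signature_fields(fn)
    if not fields:
        return None  # zero-param functions take no body
    try:
        from pydantic import create_model

        return create_model(f"{fn.__name__}_input", **fields)
    except Exception:
        return dict
```
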
When @remote funcA calls @remote funcB, the worker receives only funcA's
source via exec(). funcB is undefined in that namespace, causing NameError.

This adds dependency_resolver.py which AST-detects calls to other @remote
functions, provisions their endpoints via ResourceManager, and generates
async dispatch stubs that are prepended to the caller's source code. The
worker's exec() then defines both the stubs and the caller in the same
namespace, allowing stacked @Remote calls to dispatch correctly.

- Add dependency_resolver.py with detect, resolve, generate, and build
- Change prepare_request to async in LiveServerlessStub and LoadBalancerSlsStub
- Move LB stub timeout to constants.py as DEFAULT_LB_STUB_TIMEOUT (60s)
- Update registry.py to await prepare_request
- Add 24 unit tests for dependency resolver
Tags now use the parent directory path instead of per-file worker type
labels. Routes from the same project appear under a single collapsible
group in the Swagger UI, making multi-worker projects easier to navigate.

- Guard watcher_thread creation behind reload flag
- Fix project_root derivation in build mode (use build_dir, not parent)
- Update PRD QB route spec to match /run_sync-only implementation
- Harden _cleanup_live_endpoints with granular error handling
- Replace bare except in _watch_and_regenerate with specific handlers
- Differentiate error types in lb_execute (422 for app errors, 500 for infra)
- Add templateId/template mutual exclusivity validation
- Improve _flash_import sys.path cleanup with index-based pop
- Use explicit event loop in cleanup to avoid nested loop errors
- Parallelize dependency provisioning with asyncio.gather
- Clarify null-safety in detect_remote_dependencies
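
Parallel provisioning with `asyncio.gather` can be sketched as below. The `provision` callable is a hypothetical stand-in for the ResourceManager call; `gather` returns results in the order the awaitables were passed, which is what makes the `zip` back to names safe.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def provision_all(
    names: List[str], provision: Callable[[str], Awaitable[str]]
) -> Dict[str, str]:
    """Provision every dependency concurrently; returns name -> endpoint id."""
    results = await asyncio.gather(*(provision(name) for name in names))
    return dict(zip(names, results))  # gather preserves input order
```
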
…oth templateId and template

_payload_exclude() raised ValueError after _do_deploy() set templateId on
a config object that already had template from initialization. Remove the
raise in favor of silently excluding template (templateId takes precedence),
and clear self.template after deploy mutation to prevent the inconsistent
state at its source.
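
The precedence rule can be sketched with a dataclass. The field names `templateId` and `template` come from the commits; `EndpointConfig`, `payload_exclude`, `to_payload`, and `mark_deployed` are hypothetical stand-ins, and the real config is not claimed to be a dataclass.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional, Set


@dataclass
class EndpointConfig:
    """Hypothetical, trimmed-down stand-in for a serverless resource config."""

    name: str
    templateId: Optional[str] = None
    template: Optional[Dict[str, Any]] = None

    def payload_exclude(self) -> Set[str]:
        # templateId takes precedence: never send both fields.
        return {"template"} if self.templateId else set()

    def to_payload(self) -> Dict[str, Any]:
        excluded = self.payload_exclude()
        return {k: v for k, v in asdict(self).items() if k not in excluded and v is not None}

    def mark_deployed(self, template_id: str) -> None:
        # Clear the inline template at the source once a templateId exists.
        self.templateId = template_id
        self.template = None
```
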
deanq force-pushed the refactor/ae-2210-simplified-starter branch from 50ee92e to 9f1928d on February 20, 2026 22:06
Not meant to be committed; internal planning document.
deanq merged commit 22894d4 into main on Feb 20, 2026
6 checks passed
deanq deleted the refactor/ae-2210-simplified-starter branch on February 20, 2026 23:19

2 participants