The Great Server Refactor#122

Merged
philpax merged 105 commits into main from
the-great-server-refactor
May 6, 2026
Conversation

@philpax
Contributor

philpax commented Apr 30, 2026

Fixes #78.

Why

Our server is an agent-grown monolith that suffers from the following:

  • module-level globals
  • a loosely-specified, unversioned protocol prone to desynchronisation with the consuming code
  • no static type discipline
  • unstructured logging
  • unclear flow
  • local imports everywhere
  • multiple sources of truth

This PR resolves these issues by aggressively refactoring the server: breaking it apart and moving it to a form that may be easier to port to another language in future. The end result is a server that is much easier to maintain and generally more robust.

What changed

Python server: from monolith to packages

  • server.py was split into many files, each with its own focus. main.py is now a thin entry point.
  • Module globals are gone.
    • Per-connection state lives on a Connection dataclass.
    • Engine state lives on a WorldEngineManager.
    • Cross-cutting deps (Engines, ServerStartup, SystemMonitor) hang off app.state and are pulled via typed FastAPI Depends accessors.
  • Concrete exception classes replace bare RuntimeError("long message…"); _require_X() helpers replace the repeated if x is None: raise boilerplate.

Static-type discipline

  • ruff + basedpyright (strict) added, with a per-line / per-file suppression strategy and zero project-wide suppressions, so any new violation in pure-Python code surfaces immediately.
  • The CI's lint-backend job runs both on every PR.
  • Pydantic models everywhere on the wire. server/protocol.py is the single source of truth for all protocol messages.

Cross-language type sharing

  • Custom Pydantic -> TypeScript codegen at server-components/scripts/codegen_ts.py walks protocol.py and recording/video_recorder.py and emits src/types/protocol.generated.ts containing both Zod schemas and z.infer-derived TS types.
  • CI's lint-backend runs codegen_ts.py --check and fails if the generated TS is stale.
  • Compile-time drift gates connect the protocol to the renderer's i18n: every MessageId value must have a translation under app.server.{error,warning}.* (and vice versa); every StageId must have a STAGE_PERCENTS entry and a stage.* translation.
  • Receive-path validation: every incoming WS message and binary frame header is run through the Zod schema; bad payloads are dropped with a logged Zod error rather than fed downstream as garbage.
  • An RPC request map links each *Request discriminator to its *ResponseData, so request('init', params) returns Promise<InitResponseData> without any call-site casts.
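As a rough illustration of the codegen idea — not the actual `codegen_ts.py`, which walks Pydantic models — here is a miniature walker that turns a typed Python model into Zod-schema text plus a `z.infer`-derived type:

```python
# Hypothetical miniature of Pydantic -> TypeScript codegen: walk a model's
# typed fields and emit a Zod schema plus an inferred TS type as text.
import dataclasses

# Mapping from Python field types to Zod expressions (illustrative subset).
_TS = {int: "z.number().int()", float: "z.number()", str: "z.string()", bool: "z.boolean()"}


@dataclasses.dataclass
class StatusMessage:  # stand-in for a protocol model
    type: str
    stage: str
    percent: float


def emit_zod(model: type) -> str:
    fields = ",\n".join(f"  {f.name}: {_TS[f.type]}" for f in dataclasses.fields(model))
    name = model.__name__
    return (
        f"export const {name}Schema = z.object({{\n{fields}\n}});\n"
        f"export type {name} = z.infer<typeof {name}Schema>;"
    )


print(emit_zod(StatusMessage))
```

The real generator handles unions, enums, and nested models; the point here is only the shape of the output — one runtime schema and one static type per wire model, derived from a single Python source of truth.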

Structured logging, end-to-end

  • Server-side logs go through structlog, with per-connection and per-thread scope. All logs are structured, improving consistency and making programmatic ingest more viable. Electron produces similarly-structured logs.
  • WS log broadcast is split off TeeStream into a dedicated LogBroadcast and ships typed LogMessage records (event + level + logger + timestamp + exception + fields) to every connected client.
  • Electron and server logs are now de-duplicated: server logs forwarded to Electron no longer appear twice in the resulting output.
  • ServerLogDisplay pretty-renders them with a fixed visual hierarchy.

Protocol versioning

  • PROTOCOL_VERSION is a module-level constant in protocol.py, codegen'd to TS as a runtime constant. The client appends ?protocol_version=N to the WS URL on connect; the server reads websocket.query_params right after accept() and rejects mismatches with an error message rendered by the client using the existing translatable-error infrastructure.
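A stdlib-only sketch of the handshake, with an assumed version number and URL (the real server reads `websocket.query_params` via FastAPI after `accept()`):

```python
# Sketch of the version handshake: the client appends ?protocol_version=N
# to the WS URL; the server parses it and rejects mismatches. Names and
# the version value are illustrative, not the PR's exact API.
from urllib.parse import parse_qs, urlsplit

PROTOCOL_VERSION = 3  # assumed value for the sketch


def check_protocol_version(ws_url: str) -> tuple[bool, dict[str, int]]:
    """Return (ok, {client, server}) for the version query param."""
    params = parse_qs(urlsplit(ws_url).query)
    # A client that predates versioning sends no param at all -> treat as 0.
    client = int(params.get("protocol_version", ["0"])[0])
    return client == PROTOCOL_VERSION, {"client": client, "server": PROTOCOL_VERSION}


ok, detail = check_protocol_version("ws://localhost/ws?protocol_version=2")
print(ok, detail)  # False {'client': 2, 'server': 3}
```

On mismatch the server would send the `{client, server}` pair in a typed error message so the renderer can localise it.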

Single-session gate

  • The shared WorldEngine is single-tenant (rolling frame history, seed slot, and progress callback are all process-wide singletons). This constraint was previously implicit and unchecked; the server now enforces it, rejecting any additional clients with an appropriate error.
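A minimal sketch of such a gate, with invented names:

```python
# Hypothetical sketch of the single-session gate: the shared engine is
# single-tenant, so a second concurrent client is rejected with a typed
# error instead of silently corrupting process-wide state.
class SessionInProgressError(RuntimeError):
    """A client is already attached to the single-tenant WorldEngine."""


class SessionGate:
    def __init__(self) -> None:
        self._active = False

    def acquire(self) -> None:
        if self._active:
            raise SessionInProgressError("another session is already running")
        self._active = True

    def release(self) -> None:
        self._active = False


gate = SessionGate()
gate.acquire()          # first client: accepted
try:
    gate.acquire()      # second client: rejected
except SessionInProgressError:
    print("rejected second client")
gate.release()
```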

Diagnostics export

  • The diagnostic JSON copied from "Copy Report" / "Report on GitHub" / Debug-tab "Copy diagnostics" now carries electron_logs and server_logs as separate buckets.
  • Mid-session exports from the Debug settings tab now actually include the running session's history (was logs: []).
  • Cross-bucket de-duplication: subprocess pass-through no longer records into the Electron buffer (Python events would otherwise appear twice - once via stdout pipe, once via WS broadcast).

Fixes spotted along the way

  • Stale-file unpacker: the package split (server.py -> server/, server_logging.py -> util/server_logging.py, …) broke pre-refactor installs because the unpacker only copied files in and never pruned them. The structure is now mirrored, with a carve-out for files and folders that should persist (e.g. the cache).
  • Recording-induced frame stalls: the video recorder was writing frames on the same thread as the WebSocket transmission, so a slow frame write could stall frame delivery to the client. Frames to record are now submitted to a separate thread for processing, resolving this.
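The fix can be sketched with a single-worker executor standing in for the recorder's dedicated thread (names are illustrative):

```python
# Sketch of the frame-stall fix: frame writes are handed to a dedicated
# worker thread so the WebSocket sender never blocks on the slow write.
from concurrent.futures import ThreadPoolExecutor


class VideoRecorder:
    def __init__(self) -> None:
        # max_workers=1 keeps frames ordered while staying off the WS thread
        self._writer = ThreadPoolExecutor(max_workers=1)
        self.written: list[bytes] = []

    def submit_frame(self, frame: bytes) -> None:
        self._writer.submit(self._write, frame)  # returns immediately

    def _write(self, frame: bytes) -> None:
        self.written.append(frame)  # stands in for the slow encode/disk write

    def close(self) -> None:
        self._writer.shutdown(wait=True)


rec = VideoRecorder()
for i in range(3):
    rec.submit_frame(bytes([i]))
rec.close()
print(len(rec.written))  # 3
```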

For testing, on both Windows and Linux, ensure that all functionality works, as described in CONTRIBUTING.md.

philpax and others added 30 commits April 30, 2026 16:48
Strict-from-day-one ruleset (F/E/W/I/N/UP/B/A/C4/SIM/RET/PIE/TC/ICN/ARG/FBT/
BLE/DTZ/ASYNC/TRY/RUF/PLE/PLW/PERF) with two ignore tiers in pyproject.toml:
permanent (library naming conventions, FastAPI patterns) and step-tracked
(each remaining violation tagged with the refactor step that removes it),
so subsequent commits delete ignore lines rather than add them.

41 errors auto-fixed (imports sorted, pyupgrade rewrites, unused imports,
redundant else-after-return, etc.). Remaining 101 are all suppressed via
tracked ignores; ruff format pass follows in the next commit.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mechanical formatter pass over server-components/ — quotes, line breaks,
trailing commas. No semantic changes; isolated commit so subsequent diffs
aren't muddied by formatter churn.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Strict mode globally; legacy violations suppressed in tiers — permanent
(third-party stub gaps, FastAPI/Pydantic re-export pattern) and step-tracked
(each rule annotated with the refactor step that removes it). New modules
added in subsequent steps are written to pass strict mode without relying
on these suppressions; the suppressions only let the existing legacy code
through while we incrementally tighten it.

800 → 0 errors with the suppressions applied. Each later step's exit
criterion includes deleting the corresponding ignore lines.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Already a transitive dep of fastapi, but step 2 onward writes Pydantic
models directly throughout the codebase — promoting it to a direct
dependency makes the version explicit and survives any upstream change
to fastapi's transitive set.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Server-side ruff + basedpyright now exist (see prior commits in this
branch). Document them alongside the JS lint commands so contributors
know to run both, and explain the tiered-ignore policy that lets the
refactor land incrementally.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pydantic-based wire protocol layer that every WS message will route
through. Discriminated unions on `type: Literal[...]` for both directions:

- ClientMessage: ControlNotif | PauseNotif | ResumeNotif | ResetNotif |
  PromptNotif | InitRequest | SceneEditRequest | GenerateSceneRequest |
  CheckSeedSafetyRequest
- ServerPushMessage: StatusMessage | SystemInfoMessage | ErrorMessage |
  WarningMessage | LogMessage
- RpcSuccess[T] | RpcError as the response envelope (PEP 695 generic)

Translation keys consolidated into MessageId StrEnum so the renderer and
server can't disagree silently. SystemInfo / ErrorSnapshot promoted from
TypedDict to BaseModel since they cross the wire.

No consumers in this commit — exception to the "land with consumer" rule
because the next four commits *are* the consumers, and bundling them
into one diff would be unreviewable. The module is fully strict-typed
under basedpyright (zero rule suppressions needed) and round-trips
parse/serialize cleanly per smoke test.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pre-init loop now uses ClientMessageAdapter.validate_json + match
instead of json.loads + msg.get(...):

- handle_check_seed_safety takes a typed CheckSeedSafetyRequest and
  returns RpcSuccess[CheckSeedSafetyResponseData] | RpcError; sends
  responses via model_dump_json directly.
- handle_init takes a typed InitRequest; partial-delta semantics
  preserved via Pydantic's model_fields_set.
- SceneEditRequest / GenerateSceneRequest sent before init now reject
  with a typed RpcError instead of being silently dropped.
- Notifications during pre-init are explicitly enumerated (no wildcard),
  so adding a new ClientMessage variant breaks compilation here.
- dispatch_request becomes a thin adapter that the unconverted game-loop
  receiver still calls; deleted in commit 5 of this step.
- The game-loop handle_init call site is wrapped via
  InitRequest.model_validate(msg) — temporary scaffolding, removed in
  commit 3 when the receiver itself is converted.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The receiver now uses ClientMessageAdapter.validate_json + an exhaustive
match over the discriminated union — every ClientMessage variant has its
own arm, so adding a new variant breaks compilation here. Net 36 lines
removed; the three near-duplicate `if isinstance(result, BinaryResponse)`
dispatch blocks are gone.

Each RPC arm constructs typed RpcSuccess[T] / RpcError directly via the
protocol helpers; ControlNotif's resolution of buttons → button_codes
runs against typed fields rather than `msg.get(...)`. SafetyRejection-
Error's `message_id` attribute is now mapped through `MessageId(...)`
so the wire value is enforceably one of the enumerated keys.

Outstanding:
- BinaryResponse and dispatch_request are now unreferenced; commit 5
  deletes them along with the now-unused build_init_response_data dict
  shape.
- queue_send / send_json still take dict; commit 4 (next) converts them
  to BaseModel and unifies serialisation at the wire boundary.
- The future returned by the generator thread for scene_edit /
  generate_scene is still a dict; step 5 (scene_authoring extraction)
  types it directly so the `**preview` / `**data` splat goes away.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every server-to-client push now flows through Pydantic models — no more
ad-hoc dicts assembled at call sites:

- Single send_message(msg: BaseModel) helper at the WS edge; all of
  send_json / send_warning / send_stage now build a typed model and
  delegate to it.
- frame_queue / queue_send retyped from `dict | bytes` to `BaseModel |
  bytes`; the sender's branch picks model_dump_json or send_bytes by
  type. Receiver call sites drop their .model_dump(exclude_none=True)
  noise.
- _error_payload becomes build_error_message returning ErrorMessage,
  with ErrorSnapshot constructed directly from system_info_module.
- send_warning's signature is `MessageId` instead of `str`; every call
  site previously passing the raw i18n key string now passes the
  enumerated value, so a renamed key fails compilation rather than
  silently desynchronising.
- _broadcast_startup_stage and progress_callback enqueue StatusMessage
  directly; the dict sentinel `{"type": "_startup_done"}` becomes a
  typed `None` over the typed `Queue[StatusMessage | None]`.
- All 11 outbound `json.dumps({"type": ...})` literals removed; the
  remaining two are binary frame headers (struct-prefixed envelopes).

Net 21 lines removed; the protocol seam is now genuinely "edges only" —
no string-keyed dict ever crosses the WS in either direction.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both have been unreferenced since the receiver was converted to use
ClientMessageAdapter directly. Drop them along with the ARG001 entry
in the dispatch_request comment of pyproject.toml; the per-file ignore
now only covers lifespan(app) (FastAPI-mandated unused param), which
is step 3's job.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New `app_state.py` defines the AppState dataclass (process-wide mutable
state container) plus typed accessors (`get_app_state` for HTTP via
Depends, `get_app_state_ws` for WebSocket, `attach_app_state` for the
lifespan).

This commit absorbs the four startup-state globals — `startup_complete`,
`startup_error`, `startup_stages`, `ws_startup_waiters` — into AppState.
`_broadcast_startup_stage` and `_heavy_init` take AppState as their first
argument; `_signal_startup_done` extracted to dedupe the pair of waiter-
notification loops. /health and the websocket endpoint receive the typed
state via Depends instead of touching module-level names.

Subsequent commits in this step absorb the remaining globals
(world_engine / image_gen / safety_checker, safety_hash_cache,
_parent_pid) using the same plumbing.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
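A FastAPI-free sketch of the attach/retrieve shape, using `SimpleNamespace` in place of the app object (names echo the commit but are illustrative):

```python
# Sketch: a dataclass holds process-wide mutable state, an attach function
# stamps it onto app.state, and a typed accessor retrieves it — explicit
# attach + typed read, no module-level globals.
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class AppState:
    startup_complete: bool = False
    startup_stages: list[str] = field(default_factory=list)


def attach_app_state(app) -> AppState:
    app.state.app_state = AppState()
    return app.state.app_state


def get_app_state(app) -> AppState:
    state = app.state.app_state
    assert isinstance(state, AppState)  # narrows the untyped app.state slot
    return state


app = SimpleNamespace(state=SimpleNamespace())  # stands in for the FastAPI app
attach_app_state(app)
get_app_state(app).startup_complete = True
print(get_app_state(app).startup_complete)  # True
```

In the real server the accessor is wired through FastAPI's `Depends` so endpoints receive the typed state as a parameter.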
Moves world_engine, image_gen, safety_checker, and safety_hash_cache
onto AppState. _heavy_init now writes through `state.X = ...`; consumers
read from state. The four module-level globals are deleted.

SafetyCacheEntry moves to app_state.py since it types an AppState field;
it stays a TypedDict for now (pickle compatibility) — step 7 swaps the
cache file format to JSON-with-Pydantic.

Inside websocket_endpoint and the two `_run_*_on_generator` helpers,
non-None aliases are bound once after the startup gate so the bodies
keep their existing world_engine.X / image_gen.X access patterns
without `assert` noise on every call. Step 6 replaces the local-asserts
trick with an explicit "ready" state machine on AppState; step 5 moves
the two helpers out of server.py entirely.

handle_check_seed_safety takes AppState as its first argument. Both
call sites (pre-init and game-loop) updated.

/health receives state via Depends(get_app_state); no module-level
mutable globals remain in server.py.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drops the last module-level mutable global. ParentWatchdog encapsulates
both the synchronous one-shot startup check (called from __main__ in
case the parent is already gone) and the continuous async polling
(scheduled by the lifespan).

Configuration flows in via a frozen StartupConfig dataclass attached to
`app.state.startup_config` from __main__ before uvicorn.run, retrieved
back inside the lifespan via the typed `get_startup_config(app)`
accessor. Same handoff shape as AppState — explicit attach + typed
read, no `globals()` access.

After this commit `grep -n "^[a-z_]* = " server-components/server.py`
returns nothing mutable.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 40-line `resolve_hf_token` block sat at the top of server.py with
five env-var / file-path fallbacks; it's process-launch infrastructure,
not server logic. Move it to its own module behind a single
`apply_resolved_token()` entry point that resolves and stamps HF_TOKEN
into os.environ before the heavy imports run.

Drops the last "side effect at module-import time" idiom from
server.py; the call is now an explicit one-liner that file-readers
can grep for.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 11-line default prompt string was load-bearing in name only — none
of the shipped Waypoint models actually act on it (the prompt path is
gated behind `has_prompt_conditioning`, and even when set, the engine
ignores the content). Drop the constant rather than ceremonially
relocating it.

`current_prompt` defaults to `""` instead; the PromptNotif handler now
forwards the client's prompt verbatim (empty string included) without
substituting a fallback.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The two `_run_*_on_generator` helpers move to a new scene_authoring.py
module, returning typed `SceneEditResponseData` / `GenerateSceneResponseData`
directly instead of dicts — the receiver's `Pydantic(**dict)` splat goes
away too. The function-scoped `from image_gen import ...` is preserved
inside the new module so importing it from server.py stays light (the
diffusers/llama_cpp import graph still loads lazily on first use).

Also drops the dead prompt path: `WorldEngineManager.current_prompt`
and every `engine.set_prompt(...)` call across `engine_manager.py` and
the new `scene_authoring.py`. None of the shipped Waypoint models
condition on the prompt, so this code never had effect on generation.
The PromptNotif handler still flips the reset flag (a prompt change
forces a session reset) but no longer mutates engine state.

Net 195 lines deleted; server.py drops to ~1700 lines (down from 1735
at the start of the branch, despite all the new imports).

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…oad state

`MODEL_CFG` (a `dict[str, dict[str, Any]]` of per-model defaults) becomes
a frozen `ModelConfig` dataclass with typed fields and an `is_multiframe`
property. The new `model_config_from_engine_cfg(engine_model_cfg)` is the
sole place we touch the third-party world_engine config object's
untyped attributes — every other consumer reads typed fields off the
returned `ModelConfig`.

`WorldEngineManager` now holds `model_config: ModelConfig | None`. The
six scattered cached fields (`n_frames`, `temporal_compression`,
`is_multiframe`, `seed_target_size`, `has_prompt_conditioning`,
`inference_fps`) become @Property delegators that raise on access
pre-load — so the implicit "engine has fake waypoint-1 defaults until
you load something" intermediate state is gone. `is_loaded` formalises
the gate.

`_unload_engine_sync` resets `model_config = None` instead of restoring
fake defaults; accidental access on an unloaded manager now fails loud.

External call sites continue to work via the @property delegators; no
churn outside engine_manager.py for this commit.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every `getattr(world_engine, "X", default)` site in server.py now
reads `world_engine.X` directly. The defensive fallbacks were vestigial
from the era when `world_engine` was a module-level `None` global; with
the AppState refactor these always run downstream of the asserted
local binding, so the fields are guaranteed present.

Issue #78 explicitly calls out "no `getattr`" as a goal — `world_engine`
is now a typed attribute access target. The remaining `getattr` callsite
in the codebase (`model_config_from_engine_cfg` reading the third-party
world_engine model_cfg object) is genuinely defensive against an
external library and stays.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both methods drop their `async` keyword and submit to the cuda_executor
synchronously via `submit(...).result()` instead of the awaitable
`_run_on_cuda_thread` helper. Their only caller is the generator
thread, which previously had to bounce through `run_coro()` (i.e.
`asyncio.run_coroutine_threadsafe(...).result()`) — a cross-thread
synchronous wait on the asyncio loop that would deadlock if the loop
ever blocked. Direct sync calls are simpler, avoid the thread-hop, and
remove one of the three coroutine round-trips called out by issue #78.

The two remaining `run_coro` sites (both calling the websocket-defined
`reset_engine` closure) stay until step 4 lifts that closure out of
the endpoint.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
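The change can be sketched with a plain `ThreadPoolExecutor` standing in for the CUDA executor:

```python
# Sketch: instead of wrapping CUDA work in a coroutine and bouncing through
# asyncio.run_coroutine_threadsafe from the generator thread, the caller
# submits to the executor and waits synchronously — no asyncio loop in the
# path, so a blocked loop can no longer deadlock the call.
from concurrent.futures import ThreadPoolExecutor

cuda_executor = ThreadPoolExecutor(max_workers=1)  # pinned "CUDA" thread stand-in


def load_engine_sync(model: str) -> str:
    # Direct synchronous hop onto the executor thread and back.
    return cuda_executor.submit(lambda: f"loaded:{model}").result()


print(load_engine_sync("waypoint-1"))  # loaded:waypoint-1
```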
Step 6 narrowed the surface that the optional-access / argument-type
suppressions cover but didn't eliminate them: the lingering sites live
in image_gen.py, safety.py, and scene_authoring.py's CUDA closures
where third-party torch / PIL / diffusers APIs are deeply Any-typed.
Move those rules from the step-6 tier to the step-7 tier with an
honest note about what remains.

The reportFunctionMemberAccess one (the `_flush_pending.work`
function-attribute hack) stays as step-4-tracked.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…aclass

The deferred-encode batch state was kludged onto a function attribute
(`_flush_pending.work = (cpu_frames, gen_time, ...)`) — a 7-tuple of
positional values, mutated in place by both the inference path and
the flush path. Replace with a `PendingFlush` dataclass and a
`pending: PendingFlush | None` local in the generator scope; the flush
function reads/writes via `nonlocal pending`.

Each field is now named and typed at the declaration, the assignment
sites are kwarg-explicit, and the basedpyright suppression for
`reportFunctionMemberAccess` (which only existed to allow this hack)
goes away.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
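A reduced sketch of this refactor, with invented fields (the real `PendingFlush` carries seven):

```python
# Sketch: the tuple stuffed onto a function attribute becomes a named,
# typed dataclass held in a `pending` local that the flush closure
# rebinds via `nonlocal`.
from dataclasses import dataclass


@dataclass
class PendingFlush:
    cpu_frames: list[bytes]
    gen_time: float


def make_generator():
    pending: PendingFlush | None = None

    def record(frames: list[bytes], gen_time: float) -> None:
        nonlocal pending
        pending = PendingFlush(cpu_frames=frames, gen_time=gen_time)  # kwarg-explicit

    def flush() -> int:
        nonlocal pending
        if pending is None:
            return 0
        count = len(pending.cpu_frames)
        pending = None
        return count

    return record, flush


record, flush = make_generator()
record([b"f0", b"f1"], gen_time=0.02)
print(flush())  # 2
```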
New `ws_session.py` defines the `Connection` dataclass — a typed
container for the ~25 nonlocals currently scattered across the
receiver / sender / generator closures inside `websocket_endpoint`.
Fields cover init flags, recorder instances, seed metadata, the
pending-init-RPC tracker, game-loop running/paused/reset/prompt
state, the scene-authoring future handoff, and the inter-thread
channels (frame queue, control input, GPU metrics cache).

This commit only adds the type — exception to the "land with
consumer" rule because the next commits in this step *are* the
consumers, and bundling everything into one diff would be
unreviewable. Same exception we made introducing protocol.py at
the start of step 2.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
websocket_endpoint instantiates a single `conn = Connection(...)` and
the eleven previously-nonlocal per-connection variables (init flags,
recorder instances, seed metadata, init_req_id) become `conn.X`
attributes. The long `nonlocal` declarations in `handle_init` and
`load_seed_from_data` go away — mutating attributes on a closed-over
Connection doesn't require nonlocal, only reassigning a name does.

Connection's scope was also narrowed in this commit (from the wider
draft introduced one commit earlier) — channels, control input, and
GPU metric cache stay as locals for now. They migrate onto Connection
(or the extracted Receiver/Generator classes) in subsequent step-4
commits, where the construction order vs. the asyncio loop is naturally
respected.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the seven game-loop variables (running, paused, reset_flag,
prompt_pending, scene_edit_request, generate_scene_request,
last_generated_cpu_frames) to Connection. The receiver, sender, and
generator no longer declare them as nonlocals — closures mutate
attributes on the closed-over `conn` directly.

Net: 17 of the original ~25 nonlocals are now Connection fields.
The remaining locals (frame_queue / frame_ready / progress_queue /
ctrl_state / ctrl_lock / cached_gpu_metrics / main_loop) are
construction-bound to the asyncio loop or to inter-thread channel
semantics; they migrate onto the extracted Receiver / Generator
classes in subsequent step-4 commits.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The remaining channel/queue locals (frame_queue, frame_ready,
progress_queue, log_queue, main_loop, ctrl_lock) and the inline
ctrl_state dict move onto Connection. ctrl_state becomes a typed
ControlState dataclass — `conn.ctrl.buttons` instead of `ctrl_state["buttons"]`.

Loop-bound fields are constructed in `Connection.__post_init__` so
they're non-Optional from the consumer's POV — Connection's only
construction-time invariant is "must be inside a running asyncio loop"
(which `websocket_endpoint` always satisfies).

Net: every nonlocal-style shared state for the WS lifetime is now a
typed Connection field. The receiver / sender / generator closures
remain as nested defs that close over `conn`; the next step-4 commits
extract them as standalone classes/files in `ws_session.py`.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
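A reduced sketch of loop-bound construction in `__post_init__` (field names echo the commit; the class is heavily cut down):

```python
# Sketch: queue/loop fields are created when the Connection is built, so
# consumers see them as non-Optional; the only construction-time invariant
# is "inside a running asyncio loop".
import asyncio
from dataclasses import dataclass, field


@dataclass
class ControlState:
    buttons: set[str] = field(default_factory=set)


@dataclass
class Connection:
    client_host: str
    ctrl: ControlState = field(default_factory=ControlState)

    def __post_init__(self) -> None:
        # Must run inside a loop — as websocket_endpoint always does.
        self.frame_queue: asyncio.Queue[bytes] = asyncio.Queue()
        self.main_loop = asyncio.get_running_loop()


async def main() -> str:
    conn = Connection(client_host="10.0.0.1")
    conn.ctrl.buttons.add("jump")        # typed access, no string keys
    await conn.frame_queue.put(b"frame")
    return (await conn.frame_queue.get()).decode()


print(asyncio.run(main()))  # frame
```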
Last shared-state local — `_cached_gpu_metrics` (a 2-key dict) — moves
to the typed `cached_vram_used_bytes` / `cached_gpu_util_percent` int
fields on Connection. `build_binary_frame` reads them inline rather
than splatting a dict into the header. Same for `_update_gpu_metrics`.

After this commit, every per-WS-connection nonlocal is a Connection
field. The receiver / sender / generator closures still live inside
`websocket_endpoint`; their extraction into standalone classes/files
is the next step-4 commit.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`send_message`, `send_warning`, `send_stage`, and `queue_send` move
from local closures inside `websocket_endpoint` to instance methods on
Connection. The closures were 1-3 line wrappers reading nothing from
the websocket-endpoint scope they didn't already have via `conn`, so
the lift is mechanical: 30+ call sites switch from `await
send_X(...)` to `await conn.send_X(...)`, and the closure defs go away.

This is groundwork for extracting the receiver / sender / generator
closures themselves — the fewer enclosing-scope helpers they reference,
the smaller their dependency closure becomes when lifted out.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The sender closure (~17 lines) moves out of `websocket_endpoint` to
a standalone `run_sender(conn)` coroutine in ws_session.py. The
endpoint now does `asyncio.create_task(run_sender(conn))` instead of
defining + spawning a closure. All references the sender used were
already on `conn`; no other dependencies survive the lift.

First of three closure extractions (sender / receiver / generator).
The receiver and generator are 100+ and 280+ lines respectively, with
deeper helper-closure dependencies that need lifting too — those
follow in subsequent commits.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The closure that bridged engine_manager's CUDA-thread progress
callbacks to the asyncio drain queue was the last shared-helper
inside `websocket_endpoint`. Move it to a `Connection.push_progress`
method — bound methods satisfy the `Callable[[Stage], None]`
signature `WorldEngineManager.set_progress_callback` expects.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The four `_action_logger_*` / `_video_recorder_*` closures (lazy
construct + start/end segment, ~30 lines combined) move to
`Connection` methods: `start_action_log_segment(world_engine)`,
`end_action_log_segment()`, `start_video_segment(world_engine)`,
`end_video_segment()`. WorldEngineManager is imported under
TYPE_CHECKING so ws_session.py stays light.

Net 14 closures inside `websocket_endpoint` are now Connection
methods or extracted free functions; what remains is the four big
async/sync closures (handle_init, load_seed_from_data, receiver,
generator) plus build_binary_frame and build_init_response_data.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
philpax added 14 commits May 4, 2026 19:07
Two issues with the previous setup made it ambiguous where each log
line came from:

1. The format string `%(asctime)s [%(levelname)s] %(message)s` didn't
   include `%(name)s` — even files using `logging.getLogger(__name__)`
   were rendering as bare `[INFO]` lines.
2. Five files (`main.py`, `recording/video_recorder.py`,
   `recording/action_logger.py`, `server/routes.py`,
   `util/system_info.py`) imported the shared `from util.server_logging
   import logger` (named `biome_server`) instead of declaring their own
   per-module logger, so even the name field would have been wrong.

Fix:
- `_LOG_FORMAT` becomes `"%(asctime)s [%(levelname)s] [%(name)s] %(message)s"`.
  Lines now render like `12:34:56 [INFO] [engine.manager] Loading model: …`.
- Every module declares `logger = logging.getLogger(__name__)`. The
  shared `logger` export from `util/server_logging.py` is gone; the
  module keeps its own internal logger for its crash hooks.
- `main.py` triggers the `util.server_logging` side-effect import
  (TeeStream install, `logging.basicConfig`) explicitly with an
  unused-import noqa.

Per-event prefixes like `[client_host]`, `[1/3]`, `[RECV]`, `[ENGINE]`
are kept inside messages — those are operation-level context that the
module name doesn't capture.

Two cascading lint adjustments fall out of the change:

- ruff's `TRY400` and `BLE001` rules can now see that `logger` is a
  `logging.Logger` (the assignment is local), so a couple of previously
  necessary `# noqa: BLE001` directives became unused (logging via
  `exc_info=True` exempts the line) and two previously silent `TRY400`s
  surfaced for `OSError`-as-status-signal call sites that legitimately
  don't want the traceback. Both handled per-line.
Switch the Python server from stdlib `logging` to `structlog`, with the
shape Rust's `tracing` would take if/when this server is ported:

- Module loggers via `structlog.stdlib.get_logger(__name__)`. Output runs
  through a processor pipeline ending in stdlib `logging`, so existing
  transports (TeeStream → stdout / server.log / WebSocket broadcast)
  still work unchanged.

- Per-connection scope via `structlog.contextvars.bind_contextvars(
  client_host=…)` at WS accept. Asyncio tasks inherit; the generator
  thread copies the context at submit time so its logs stay attributed:

      gen_ctx = contextvars.copy_context()
      threading.Thread(target=gen_ctx.run, args=(run_generator, …))

  Every `f"[{conn.client_host}] …"` prefix in the message body was
  dropped — `client_host` now lands in the structured suffix automatically.

- Per-operation scope via bound loggers: `log = logger.bind(operation="warmup")`
  for ladders that distinguish phases inside a single module
  (`reset`, `init_session`, `device_recovery`, `warmup`, `scene_edit`,
  `generate_scene`). Replaces hand-typed `[RESET]` / `[INIT]` / `[WARMUP]`
  / `[SCENE_EDIT]` / `[DEVICE RECOVERY]` prefixes.

- Step counters as structured fields: `current_step` / `total_steps`
  with module-level constants (`LOAD_ENGINE_TOTAL_STEPS`,
  `WARMUP_TOTAL_STEPS`). Replaces `[1/3]` / `[2/3]` / `[3/3]` substrings
  buried in the message text.

- Dynamic values move from f-string interpolation to kwargs: `model`,
  `quant`, `dtype`, `duration_s`, `elapsed_ms`, `frame_id`, `version`,
  `parent_pid`, etc. Renderer's k=v suffix surfaces them; future
  consumers (Rust port, observability ingest) get them as fields.

- Console renderer (`structlog.dev.ConsoleRenderer(colors=False)`) for
  human-readable stdout / WS stream. Sample line:

      12:34:56 [info] [engine.manager] Loading model client_host=10.0.0.1 model=waypoint-1.5 current_step=1 total_steps=3

- Redundant subsystem prefixes (`[SERVER]`, `[ENGINE]`, `[DEVICE]`)
  dropped — module name covers them. Decorative `=== … ===` banner
  blocks dropped — added noise without context.

- `logger.error(..., exc_info=True)` → `logger.exception(...)` where the
  traceback adds signal; per-line `# noqa: TRY400` with rationale on the
  status-only sites that legitimately don't want the traceback.

`structlog==25.5.0` added as a dependency.
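The thread-attribution trick from this commit works with plain stdlib `contextvars`, independent of structlog; a self-contained sketch:

```python
# Sketch: a contextvar bound at WS accept is copied into the generator
# thread at submit time, so values read there stay attributed to the
# connection rather than falling back to defaults.
import contextvars
import threading

client_host: contextvars.ContextVar[str] = contextvars.ContextVar("client_host", default="-")
seen: list[str] = []


def run_generator() -> None:
    # Reads the value bound in the accepting task, not the default.
    seen.append(client_host.get())


client_host.set("10.0.0.1")            # bound at WS accept
gen_ctx = contextvars.copy_context()   # snapshot at submit time
t = threading.Thread(target=gen_ctx.run, args=(run_generator,))
t.start()
t.join()
print(seen)  # ['10.0.0.1']
```

structlog's `bind_contextvars` layers its per-event fields on top of exactly this mechanism.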
Move WS log fan-out off TeeStream and into a new LogBroadcast,
fed by a structlog processor that snapshots the event_dict before
the final renderer runs. Each broadcast LogMessage carries the
rendered line plus level / logger / timestamp / fields so a future
consumer (diagnostics export, structured filter UI) can use the
typed payload — TeeStream is back to its single job of mirroring
stdout/stderr to server.log.

The new fields on LogMessage are all optional, so existing clients
that only read `line` keep working until they're updated to consume
the structured form.
Promote the renderer's log buffers from string[] to a new LogRecord[]
shape that mirrors the WS LogMessage minus the discriminator. WS log
messages now keep their level / logger / timestamp / fields all the
way through to the diagnostics export; engine-log IPC events from the
Electron main process project to the same shape with just `line` and
a derived level from is_stderr.

DiagnosticsPayload.logs is now LogRecord[] — bug reports surface the
structured form so external triagers see logger / fields / level
alongside the rendered line, without re-parsing the text.
Cover the bits a new contributor needs to match the post-refactor
style: per-module loggers, kwargs over f-strings, per-connection
contextvars, sub-operation `bind(operation=...)`, the
current_step/total_steps idiom, the no-bracket-prefixes rule, and
the LogBroadcast / TeeStream split. Also refresh the WS protocol's
`log` bullet to reflect the enriched LogMessage shape and its
LogRecord projection on the renderer.
Define a `PROTOCOL_VERSION` constant in `server/protocol.py`, ship it
to the renderer via the codegen, and tag every WS connection with
`?protocol_version=N`. The server reads the query param right after
`accept()` and on mismatch pushes a typed `ErrorMessage` carrying
`message_id: app.server.error.protocolVersionMismatch` and
`params: {client, server}`, then closes — so the existing
`resolveServerMessage` path surfaces a localised "update Biome"
error in the UI without a special case. Translation key added to
all six locales; CONTRIBUTING.md documents the bump rules and the
codegen's new constants section.

The codegen now scans the primary protocol module for
UPPER_SNAKE_CASE primitive constants and emits them verbatim;
extra modules (`video_recorder.py`) still contribute models but
not their internal implementation constants — that gate exists
because an unfiltered scan pulled in a hard-coded local FFmpeg
path on first run.
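
The constant scan can be illustrated with `ast`; this is an assumed implementation of the gate, not the codegen's actual code, but it shows why computed values (like a `Path(...)` expression) never qualify:

```python
import ast

def scan_primitive_constants(source: str) -> dict[str, object]:
    """Collect top-level UPPER_SNAKE_CASE assignments of primitive literals.

    Only str/int/float/bool literals qualify, so complex or computed
    values, such as a hard-coded local path built from Path(...), are
    skipped rather than emitted into the generated TypeScript.
    """
    constants: dict[str, object] = {}
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign) or len(node.targets) != 1:
            continue
        target = node.targets[0]
        if not isinstance(target, ast.Name) or not target.id.isupper():
            continue
        if isinstance(node.value, ast.Constant) and isinstance(
            node.value.value, (str, int, float, bool)
        ):
            constants[target.id] = node.value.value
    return constants

src = "PROTOCOL_VERSION = 2\nFFMPEG_PATH = Path('/home/me/ffmpeg')\n_private = 1\n"
print(scan_primitive_constants(src))  # → {'PROTOCOL_VERSION': 2}
```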
…ale files

The unpacker used to copy server-components in over the top of any
existing world_engine/ layout but never removed entries that no
longer existed in source. After the great-server-refactor moved
single-file modules into packages (`server.py` → `server/`,
`server_logging.py` → `util/server_logging.py`, etc.), pre-refactor
installs kept the old top-level `server.py` shadowing the new
`server/` package and crashed at import time with
`ModuleNotFoundError: No module named 'server.protocol'`.

Replace `copyDirRecursive` with `mirrorDirRecursive`: copy source
in, then prune anything in dest that's not in source. Names in
`SERVER_COMPONENT_EXCLUDES` are protected on both sides so the
synced `.venv`, the canonical `server.log`, and library caches
(`.safety_cache.json`, `gemlite_config.json`, `.cache/`) survive
the mirror pass. Verified on a real broken install: the stale
top-level modules get pruned, the venv survives, and the server
boots cleanly under nix-shell.
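
The mirror semantics (copy in, then prune, with excludes protected on both sides) can be sketched in Python; the real `mirrorDirRecursive` lives in the Electron unpacker's TypeScript, so this is an analogue for illustration only:

```python
import shutil
import tempfile
from pathlib import Path

def mirror_dir(src: Path, dest: Path, excludes: set[str]) -> None:
    """Copy src over dest, then prune dest entries absent from src.

    Names in `excludes` are skipped in both passes, so a synced .venv or
    a canonical server.log in dest survives the mirror.
    """
    dest.mkdir(parents=True, exist_ok=True)
    for entry in src.iterdir():
        if entry.name in excludes:
            continue
        target = dest / entry.name
        if entry.is_dir():
            mirror_dir(entry, target, excludes)
        else:
            shutil.copy2(entry, target)
    for entry in list(dest.iterdir()):  # materialise before deleting
        if entry.name in excludes or (src / entry.name).exists():
            continue
        shutil.rmtree(entry) if entry.is_dir() else entry.unlink()

# Reproduce the pre-refactor breakage: a stale top-level server.py in dest
# shadowing the new server/ package in src.
tmp = Path(tempfile.mkdtemp())
src, dest = tmp / "src", tmp / "dest"
(src / "server").mkdir(parents=True)
(src / "server" / "protocol.py").write_text("PROTOCOL_VERSION = 2\n")
dest.mkdir()
(dest / "server.py").write_text("stale pre-refactor module")
(dest / ".venv").mkdir()
mirror_dir(src, dest, excludes={".venv"})
print((dest / "server.py").exists())  # → False (stale module pruned)
```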
Promote the wire shape of every log event from a pre-rendered text line
to a structured `LogMessage` (renamed `line`→`event`, plus a typed
`exception` slot and primitive-typed `fields`). Server-side, the
structlog pipeline picks a renderer per `_resolve_log_format()`:
`BIOME_LOG_FORMAT=text|json` overrides; otherwise TTY → text, non-TTY
→ JSON-Lines. The WS broadcast always carries the typed `LogMessage`,
regardless of the local renderer.

The text mode uses a custom `_text_renderer` (replacing
`ConsoleRenderer`) with the layout `HH:MM:SS [level] [logger] event
k=v` — fixed `[level] [logger]` prefix block keeps lines scannable
while the event sits at a variable column. JSON mode is the standard
`structlog.processors.JSONRenderer`. `read_log_tail_records` parses
each replayed `server.log` line back into a `LogMessage` so the WS
log replay keeps the same fidelity as live events.
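The resolution and text layout described above can be sketched as follows; the env-var name, override values, and line layout are from the text, while the function bodies are assumed, simplified stand-ins for `_resolve_log_format()` and `_text_renderer`:

```python
import os
import sys

def resolve_log_format() -> str:
    # BIOME_LOG_FORMAT=text|json overrides; otherwise TTY → text,
    # non-TTY → JSON-Lines.
    override = os.environ.get("BIOME_LOG_FORMAT")
    if override in ("text", "json"):
        return override
    return "text" if sys.stderr.isatty() else "json"

def render_text(timestamp: str, level: str, logger: str, event: str, **fields) -> str:
    # Fixed "[level] [logger]" prefix block keeps lines scannable; the
    # event sits at a variable column, fields appended as k=v.
    kv = " ".join(f"{k}={v}" for k, v in fields.items())
    line = f"{timestamp} [{level}] [{logger}] {event}"
    return f"{line} {kv}" if kv else line

print(render_text("12:34:56", "info", "engine.manager", "Loading model",
                  model="waypoint-1.5", current_step=1, total_steps=3))
# → 12:34:56 [info] [engine.manager] Loading model model=waypoint-1.5 current_step=1 total_steps=3
```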

Also fix a pre-existing double-traceback bug: a `_BoundLogger`
override routes `.exception()` through stdlib's `error` instead of
`exception`, because stdlib's `Logger.exception()` defaults
`exc_info=True` and re-renders the traceback on top of structlog's
already-rendered string.

Bump `PROTOCOL_VERSION` to 2 (the `event`/`exception` field rename is
wire-incompatible). The version handshake we just shipped surfaces
the mismatch as a localised "update Biome" error for any client
still on v1.

Renderer side, `LogRecord` mirrors the new shape (event/exception/
typed-primitive fields). `ServerLogDisplay.LogLine` renders each
record with a fixed visual hierarchy — timestamp dim, level
color-coded, logger pill subdued, event body, fields k=v, exception
preformatted — matching the text-mode formatter.

Electron-side, `parseLogLine` (`electron/lib/logRecord.ts`) projects
each subprocess stdout/stderr line onto the same `LogRecord`: JSON-
parse if possible, fallback to `{event: line}` otherwise. The IPC
`engine-log` payload type is now `LogRecord`, replacing
`{line, is_stderr}`. The codegen also gained a small dedup so
`int | float` in a Pydantic union collapses to a single `number` in
both the TS type and the Zod schema.
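
The `parseLogLine` projection is TypeScript in the repo; a Python analogue of the same JSON-parse-with-fallback idea, with the fallback logger name assumed for illustration:

```python
import json

def parse_log_line(line: str, fallback_logger: str = "engine.server") -> dict:
    """Project a subprocess stdout/stderr line onto a LogRecord-shaped dict.

    JSON-parse when the line is a JSON object carrying an `event` key
    (the JSON-Lines renderer's output); otherwise fall back to wrapping
    the raw text as {event: line}.
    """
    try:
        record = json.loads(line)
        if isinstance(record, dict) and "event" in record:
            return record
    except json.JSONDecodeError:
        pass
    return {"event": line, "logger": fallback_logger, "level": "info"}

print(parse_log_line('{"event": "Loading model", "level": "info"}'))
print(parse_log_line("plain stdout noise"))
```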
…rop bracketed prefixes

Add `electron/lib/logger.ts`: a hand-rolled mirror of the Python
server's `util/server_logging.py` setup. `getLogger(name, opts?)`
returns a `{debug, info, warning, error}` object backed by the same
text-vs-JSON `BIOME_LOG_FORMAT` resolution (env override + TTY
heuristic) and the same `LogRecord` shape on the wire. Loggers with
`defaultBroadcast: true` mirror every event onto the renderer's
`engine-log` IPC channel; per-call `broadcast: true|false` overrides.

Sweep every `console.log/warn/error('[ENGINE] …')` /
`'[SERVER] …'` / `'[UV] …'` / etc. across `electron/` and replace
with `log.info/warning/error(...)`. Logger names are dotted paths
matching the Python convention so the renderer's log buffer and the
diagnostic export visually distinguish "this came from Electron"
from "this came from Python" by the logger pill, without any
prefix glued onto the event text:

  - electron.main      — app lifecycle, signals
  - electron.settings  — load / migrate / validate
  - electron.seeds     — seed thumbnail loading
  - electron.recordings — recording open
  - electron.update    — update checks
  - engine.setup       — install / sync / nuke (defaultBroadcast)
  - engine.diagnostics — `check-engine-status` internals
  - engine.server      — server lifecycle + Python-stdout fallback

Subprocess pass-through stays a separate path (`parseLogLine`):
`runUvSyncWithMirroredLogs` no longer takes `logPrefix` and no
longer glues a prefix onto the line passed to `onLine`; the line
flows raw to our stdout/stderr and through `parseLogLine` (with
`engine.uv-sync` as the fallback logger) to the renderer.
`electron/ipc/server.ts` similarly drops the `[SERVER]` glue and
attributes Python-stdout fallbacks via `engine.server`.

CONTRIBUTING.md gains an "Electron (`getLogger`)" subsection
mirroring the Python conventions, and the lead paragraph of
"Structured logging" now frames the logger field as the Python ↔
Electron distinguisher.
Override `BIOME_LOG_FORMAT=json` in the standalone server's
environment so the inherited value from the dev's shell can't
silently degrade the renderer's `engineLogs` buffer to text-only
fallback records when they've set `BIOME_LOG_FORMAT=text` for
direct-terminal work. The dev's terminal still gets the JSON via
the existing raw pass-through write to our stdout — `jq` recovers
the human-readable form.
`VideoRecorder.write_frames` was writing raw RGB bytes (~10 MB per
batch) to ffmpeg's stdin pipe synchronously on the gen thread.
Linux pipe buffers are ~64 KB, so any time libx264 fell behind, the
write blocked and held up the next iteration's `submit_gen_frame`
plus the deferred `_flush_pending` that ships JPEGs to the WS —
gameplay frame rate dropped whenever recording was on.

Hand each batch off to a per-segment writer thread via an unbounded
`queue.Queue` instead. The gen-thread `write_frames` is now a
near-zero-cost `q.put(frames)`; the writer thread feeds ffmpeg
sequentially. RAM grows by a few MB per backlogged batch — easily
absorbed even on long sessions, and only when libx264 actually
falls behind.

`end_segment` likewise dispatches the drain (queue sentinel +
writer join + ffmpeg-stdin close + ffmpeg-wait + short-recording
cleanup) to a background daemon thread, so the asyncio handler
toggling recording off and the gen-thread reset path don't block
on encoder backpressure either. The recording finishes a couple of
seconds after gameplay ends in the worst case, which is fine.

Overlay anchoring now uses `_frames_queued` (gen-thread counter)
so `note_edit` aligns the overlay with the user's edit moment in
the recording. The writer thread tracks frames-written locally
and passes the index into `_apply_overlay`, so both counters use
the same numbering scheme and the elapsed-frames math stays
correct.

Verified end-to-end: 150 frames at 320x240 enqueued in 0.07 ms,
end_segment returns in 0.17 ms, the background drain produces a
valid 6 MB H.264 MP4 with all 150 frames at 30 fps.
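
The hand-off pattern reduces to a queue, a writer thread, and a sentinel drain. A minimal sketch, with the class name and the list stand-in for ffmpeg's stdin both assumed:

```python
import queue
import threading

_SENTINEL = object()

class SegmentWriter:
    """Per-segment writer thread: the gen thread does a near-zero-cost
    q.put(); this thread feeds the encoder (stubbed as a list here)
    sequentially, absorbing encoder backpressure in RAM."""

    def __init__(self) -> None:
        self._q: queue.Queue = queue.Queue()  # unbounded on purpose
        self.encoded: list[bytes] = []        # stand-in for ffmpeg stdin
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def write_frames(self, frames: bytes) -> None:
        self._q.put(frames)  # returns immediately; never blocks the gen thread

    def end_segment(self) -> None:
        # Sentinel + join is the drain; the real code dispatches this to a
        # background daemon thread so the caller doesn't block either.
        self._q.put(_SENTINEL)
        self._thread.join()

    def _run(self) -> None:
        while True:
            item = self._q.get()
            if item is _SENTINEL:
                return
            self.encoded.append(item)

w = SegmentWriter()
for _ in range(150):
    w.write_frames(b"frame")
w.end_segment()
print(len(w.encoded))  # → 150
```

The unbounded queue is a deliberate trade: a bounded queue would reintroduce the blocking `put` the change exists to remove.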
`Copy diagnostics` from the Debug settings tab was passing `logs: []`
to `buildDiagnosticsPayload`, so the resulting JSON was missing the
running session's log history — exactly the thing a triager needs.
Thread `wsAllLogs` (the unbounded WS-broadcast buffer from
`useStreaming`) through instead, so a mid-session export captures
everything the WS has emitted so far.
Renames `DiagnosticsPayload.logs` → `electron_logs` + `server_logs`
so the dump separates the two sources clearly, and adds the missing
half: a rolling buffer of every Electron-side `LogRecord`, exposed
via a new `get-electron-log-tail` IPC, that the builder pulls
automatically.  Captures internal-debug events that never broadcast
onto the `engine-log` channel — `electron.update`,
`electron.settings`, `engine.diagnostics`, etc. — so they ride into
diagnostic exports too.

`BuildDiagnosticsOptions.logs` becomes `serverLogs` to match the
new contract: callers supply the WS-sourced server events; the
builder pulls Electron's tail itself.  Each call site now uses the
unbounded `wsAllLogs` for the server bucket — `DebugTab`,
`TerminalDisplay`, `ConnectionLostOverlay` — and `EngineInstallModal`
passes `[]` since it runs pre-WS.  The synthetic `[ERROR] {message}`
log records appended in TerminalDisplay and ConnectionLostOverlay
are dropped — `error.message` already carries the same string at
the top of the payload.

Two sources of cross-bucket / within-bucket duplication that
showed up in a real session dump are also fixed:

1. `parseLogLine` no longer records into the Electron rolling
   buffer.  Python-server stdout pass-through went into the buffer
   AND the same events came in via WS broadcast, producing exact
   duplicates across `electron_logs` and `server_logs`.  The
   uv-sync pass-through (which exists nowhere else) calls
   `recordElectronLog` explicitly so install-time errors survive.

2. `electron/ipc/server.ts` no longer writes to `server.log`.
   Python's `TeeStream` already mirrors stdout/stderr into that
   file; an extra `fs.appendFileSync` per subprocess line was
   doubling every entry and racing with Python's writes — the
   visible artefacts were the duplicated `Initializing
   WorldEngine...` block and the partial `: "__main__",
   "timestamp": "22:14:12"}` line in the user's dump after WS
   replay re-read the doubled file.
@philpax philpax marked this pull request as ready for review May 5, 2026 15:01

@Clydingus Clydingus left a comment


lgtm (i did not look closely) 👍

philpax added 4 commits May 5, 2026 18:55
The shared `WorldEngine` is single-tenant — its rolling frame history,
seed slot, and progress callback are all process-wide singletons —
and serialising two clients through it would just produce incoherent
output for both.  Rather than try to multiplex, gate the WS endpoint
to one active session at a time.

Right after the protocol-version check, the WS endpoint reads
`app.state.active_session`.  If something's already there, it pushes
a typed `ErrorMessage` carrying `app.server.error.serverBusy` and
closes; otherwise it claims the slot.  Check + claim are both sync,
so two concurrent handshakes can't both pass under asyncio's
cooperative scheduling.  The existing `finally` block clears the
slot with an identity check.  Reconnection (same client returning
after disconnect) works — each session releases its slot on teardown
and the next handshake claims it.
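The check-then-claim and identity-checked release can be sketched in isolation. A hedged analogue of the gate described above (class and method names are assumptions), showing why no lock is needed under asyncio's cooperative scheduling:

```python
import asyncio

class SessionGate:
    """One-active-session gate. Check and claim happen synchronously,
    with no await between them, so two concurrent handshakes can't
    both pass."""

    def __init__(self) -> None:
        self.active_session: object | None = None

    def try_claim(self, session: object) -> bool:
        if self.active_session is not None:
            return False  # caller pushes the serverBusy ErrorMessage and closes
        self.active_session = session
        return True

    def release(self, session: object) -> None:
        # Identity check: only the current owner clears the slot.
        if self.active_session is session:
            self.active_session = None

async def handshake(gate: SessionGate, results: list[bool]) -> None:
    session = object()
    claimed = gate.try_claim(session)
    results.append(claimed)
    if claimed:
        await asyncio.sleep(0.01)  # the session's lifetime
        gate.release(session)      # teardown frees the slot for reconnection

async def main() -> list[bool]:
    gate = SessionGate()
    results: list[bool] = []
    await asyncio.gather(handshake(gate, results), handshake(gate, results))
    return results

print(sorted(asyncio.run(main())))  # → [False, True]
```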

Translation key added to every locale; the `MessageId` ↔ i18n
drift gate would have failed compilation if any were missed.

While here, move the `app.state.system_monitor` and
`app.state.active_session` initialisation into `lifespan` rather
than running it at module scope, and drop the `system_monitor`
module-level local that was just feeding the assignment.
`useEngine.checkServerRunning` decides whether to refresh the full
engine status by comparing the live `is-server-running` IPC return
against `status?.server_running` from its closure.  But
`waitForHealthy` captures one specific callback reference and reuses
it for the entire poll loop — so the closure's snapshot of `status`
never updates, the comparison always reports a delta, and every
500 ms tick fires a fresh `check-engine-status` IPC, which spawns a
`uv run python --version` subprocess (~50–100 ms each) just to
confirm dependencies are still synced.  In standalone mode this
shows up as a wall of `[engine.diagnostics]` spam in the logs while
the server's still loading.

Track the previous `is-server-running` value via a ref so the delta
detection compares against the actual previous-call result, not a
stale closure.  Callback identity is now stable too (no `status`
dep), so consumers that hold `checkServerRunning` references don't
need to re-grab on every status update.
The 120 s wall-clock deadline in `waitForHealthy` turns "user is
patient" into a false-positive timeout when the standalone server's
first-time init has long synchronous work — model downloads, kernel
JIT, device-graph compilation can each blow past it on their own.
The user-cancel signal and the process-died signal are the only
meaningful exit conditions; if neither fires, the server is still
working towards readiness and we should keep polling.  Cut the
deadline.

The orphaned `app.server.startupTimeout` translation key goes away
in every locale.
@philpax philpax force-pushed the the-great-server-refactor branch 3 times, most recently from 45d34b7 to 1d66822 Compare May 5, 2026 20:59
The first time Scene Authoring loads, the user was stuck staring at
"Model loaded!" while the (slow) VLM + editing pipeline came up:
SESSION_INPAINTING_LOAD was sent *after* configure_for_session
returned, so its label never won the race against SESSION_LOADING_DONE.

- Drop the terminal SESSION_LOADING_DONE and SESSION_INPAINTING_READY
  stages. The previous in-progress label holds until the next stage
  takes over.
- Emit the SA-load stage before configure_for_session so the user sees
  "Loading Scene Authoring models..." during the load.
- Rename SESSION_INPAINTING_LOAD -> SESSION_SCENE_AUTHORING_LOAD
  (wire: session.scene_authoring.load) to match the feature name.
- Capitalise + pluralise the user-facing label in en/goose.
@philpax philpax force-pushed the the-great-server-refactor branch from 1d66822 to 54dd5e9 Compare May 5, 2026 21:06
philpax added 3 commits May 5, 2026 23:08
`Path(__file__).parent.parent / "world_engine" / ".safety_cache.json"`
landed the cache at `server-components/world_engine/.safety_cache.json`
in dev (a phantom dir auto-created on first cache write) and at
`<exe>/world_engine/world_engine/.safety_cache.json` in prod (a nested
duplicate). Drop the extra segment so the cache lives next to main.py
in both contexts — matching where gemlite_config.json already lands.

Existing prod caches at the old nested path are orphaned; they rebuild
on demand on the next session.
The MDL row only had the model URI; the quantization mode is set
independently in settings and is the most common reason two sessions
on the same model perform differently. Append it as `(quant)` when
non-none.
Quant changes weren't reliably triggering a reload: switching from a
quantized mode back to no-quant sent `quant: undefined`, JSON.stringify
dropped the field, the server's `model_fields_set` didn't see "quant"
present, and `quant_changed` stayed false — so the engine kept the old
quant loaded and the renderer flashed past the loading screen.

Root cause was the partial-update protocol: `InitRequest` carried
optional deltas where field absence meant "don't change", and the
renderer's `'none'` UI sentinel had to be encoded as the absence of a
field rather than as a value. Easy to get wrong and the server's
six-field-at-a-time `if "X" in present` machinery was ugly to extend.

Pull the live tunables out into a `SessionConfig` nested struct that's
required on every InitRequest. The renderer always sends the full
config; the server diffs each field against current state and applies
deltas. Quant goes back to `Literal["fp8w8a8", "intw8a8"] | None = None`
(None = no quantization) — no enum sentinel needed because the wrapper
struct itself signals presence.

- protocol.py: add SessionConfig, move quant + 5 live flags into it,
  drop their top-level optional twins on InitRequest.
- handlers.py: drop model_fields_set; six guards collapse to direct
  assignments. Quant comparison is now `cfg.quant != world_engine.quant`.
- StreamingContext.tsx: one buildSessionConfig helper, called from all
  four sendInit sites (bootstrap, live re-apply, selectSeed, …). The
  live-toggle effect uses a settings ref so engine_model / quant /
  scene_authoring changes don't race the lifecycle reconnect path.
@philpax philpax merged commit 6d6f6ae into main May 6, 2026
8 checks passed
@philpax philpax deleted the the-great-server-refactor branch May 6, 2026 13:12
Successfully merging this pull request may close these issues: Refactor server code.