The Great Server Refactor#122

Merged
philpax merged 105 commits into main from
the-great-server-refactor
May 6, 2026
Conversation

@philpax
Contributor

philpax commented Apr 30, 2026

Fixes #78.

Why

Our server is an agent-grown monolith that suffers from the following:

  • module-level globals
  • a loosely-specified, unversioned protocol prone to desynchronisation with the consuming code
  • no static type discipline
  • unstructured logging
  • unclear flow
  • local imports everywhere
  • multiple sources of truth

This PR resolves these issues by aggressively refactoring the server: breaking it apart and moving it to a form that may be easier to port to another language in future. The end result is a server that is much easier to maintain and generally more robust.

What changed

Python server: from monolith to packages

  • server.py was split into many files, each with its own focus. main.py is now a thin entry point.
  • Module globals are gone.
    • Per-connection state lives on a Connection dataclass.
    • Engine state lives on a WorldEngineManager.
    • Cross-cutting deps (Engines, ServerStartup, SystemMonitor) hang off app.state and are pulled via typed FastAPI Depends accessors.
  • Concrete exception classes replace bare RuntimeError("long message…"); _require_X() helpers replace the repeated if x is None: raise boilerplate.

Static-type discipline

  • ruff + basedpyright (strict) added, with a per-line / per-file suppression strategy and zero project-wide suppressions, so any new violation in pure-Python code surfaces immediately.
  • The CI's lint-backend job runs both on every PR.
  • Pydantic models everywhere on the wire. server/protocol.py is the single source of truth for all protocol messages.

Cross-language type sharing

  • Custom Pydantic -> TypeScript codegen at server-components/scripts/codegen_ts.py walks protocol.py and recording/video_recorder.py and emits src/types/protocol.generated.ts containing both Zod schemas and z.infer-derived TS types.
  • CI's lint-backend runs codegen_ts.py --check and fails if the generated TS is stale.
  • Compile-time drift gates connect the protocol to the renderer's i18n: every MessageId value must have a translation under app.server.{error,warning}.* (and vice versa); every StageId must have a STAGE_PERCENTS entry and a stage.* translation.
  • Receive-path validation: every incoming WS message and binary frame header is run through the Zod schema; bad payloads are dropped with a logged Zod error rather than fed downstream as garbage.
  • An RPC request map links each *Request discriminator to its *ResponseData, so request('init', params) returns Promise<InitResponseData> without any call-site casts.
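As a rough illustration of the codegen idea — not the actual `codegen_ts.py`, which walks Pydantic models — here is a miniature walker that turns a typed Python model into Zod-schema text plus a `z.infer`-derived type:

```python
# Hypothetical miniature of Pydantic -> TypeScript codegen: walk a model's
# typed fields and emit a Zod schema plus an inferred TS type as text.
import dataclasses

# Mapping from Python field types to Zod expressions (illustrative subset).
_TS = {int: "z.number().int()", float: "z.number()", str: "z.string()", bool: "z.boolean()"}


@dataclasses.dataclass
class StatusMessage:  # stand-in for a protocol model
    type: str
    stage: str
    percent: float


def emit_zod(model: type) -> str:
    fields = ",\n".join(f"  {f.name}: {_TS[f.type]}" for f in dataclasses.fields(model))
    name = model.__name__
    return (
        f"export const {name}Schema = z.object({{\n{fields}\n}});\n"
        f"export type {name} = z.infer<typeof {name}Schema>;"
    )


print(emit_zod(StatusMessage))
```

The real generator handles unions, enums, and nested models; the point here is only the shape of the output — one runtime schema and one static type per wire model, derived from a single Python source of truth.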

Structured logging, end-to-end

  • Server-side logs go through structlog, with per-connection and per-thread scope. All logs are structured, improving consistency and making programmatic ingest more viable. Electron produces similarly-structured logs.
  • WS log broadcast is split off TeeStream into a dedicated LogBroadcast and ships typed LogMessage records (event + level + logger + timestamp + exception + fields) to every connected client.
  • Electron and server logs are now de-duplicated: server logs forwarded to Electron no longer appear twice in the resulting output.
  • ServerLogDisplay pretty-renders them with a fixed visual hierarchy.

Protocol versioning

  • PROTOCOL_VERSION is a module-level constant in protocol.py, codegen'd to TS as a runtime constant. The client appends ?protocol_version=N to the WS URL on connect; the server reads websocket.query_params right after accept() and rejects mismatches with an error message rendered by the client using the existing translatable-error infrastructure.
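A stdlib-only sketch of the handshake, with an assumed version number and URL (the real server reads `websocket.query_params` via FastAPI after `accept()`):

```python
# Sketch of the version handshake: the client appends ?protocol_version=N
# to the WS URL; the server parses it and rejects mismatches. Names and
# the version value are illustrative, not the PR's exact API.
from urllib.parse import parse_qs, urlsplit

PROTOCOL_VERSION = 3  # assumed value for the sketch


def check_protocol_version(ws_url: str) -> tuple[bool, dict[str, int]]:
    """Return (ok, {client, server}) for the version query param."""
    params = parse_qs(urlsplit(ws_url).query)
    # A client that predates versioning sends no param at all -> treat as 0.
    client = int(params.get("protocol_version", ["0"])[0])
    return client == PROTOCOL_VERSION, {"client": client, "server": PROTOCOL_VERSION}


ok, detail = check_protocol_version("ws://localhost/ws?protocol_version=2")
print(ok, detail)  # False {'client': 2, 'server': 3}
```

On mismatch the server would send the `{client, server}` pair in a typed error message so the renderer can localise it.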

Single-session gate

  • The shared WorldEngine is single-tenant (rolling frame history, seed slot, and progress callback are all process-wide singletons). This constraint was previously implicit and unchecked; the server now enforces it, rejecting any additional clients with an appropriate error.
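A minimal sketch of such a gate, with invented names:

```python
# Hypothetical sketch of the single-session gate: the shared engine is
# single-tenant, so a second concurrent client is rejected with a typed
# error instead of silently corrupting process-wide state.
class SessionInProgressError(RuntimeError):
    """A client is already attached to the single-tenant WorldEngine."""


class SessionGate:
    def __init__(self) -> None:
        self._active = False

    def acquire(self) -> None:
        if self._active:
            raise SessionInProgressError("another session is already running")
        self._active = True

    def release(self) -> None:
        self._active = False


gate = SessionGate()
gate.acquire()          # first client: accepted
try:
    gate.acquire()      # second client: rejected
except SessionInProgressError:
    print("rejected second client")
gate.release()
```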

Diagnostics export

  • The diagnostic JSON copied from "Copy Report" / "Report on GitHub" / Debug-tab "Copy diagnostics" now carries electron_logs and server_logs as separate buckets.
  • Mid-session exports from the Debug settings tab now actually include the running session's history (was logs: []).
  • Cross-bucket de-duplication: subprocess pass-through no longer records into the Electron buffer (Python events would otherwise appear twice - once via stdout pipe, once via WS broadcast).

Fixes spotted along the way

  • Stale-file unpacker: the package split (server.py -> server/, server_logging.py -> util/server_logging.py, …) broke pre-refactor installs because the unpacker only copied files in and never pruned them. The structure is now mirrored, with a carve-out for files and folders that should persist (e.g. the cache).
  • Recording-induced frame stalls: the video recorder was writing frames on the same thread as the WebSocket transmission, so a slow frame write could stall frame delivery to the client. Frames to record are now submitted to a separate thread for processing, resolving this.
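The fix can be sketched with a single-worker executor standing in for the recorder's dedicated thread (names are illustrative):

```python
# Sketch of the frame-stall fix: frame writes are handed to a dedicated
# worker thread so the WebSocket sender never blocks on the slow write.
from concurrent.futures import ThreadPoolExecutor


class VideoRecorder:
    def __init__(self) -> None:
        # max_workers=1 keeps frames ordered while staying off the WS thread
        self._writer = ThreadPoolExecutor(max_workers=1)
        self.written: list[bytes] = []

    def submit_frame(self, frame: bytes) -> None:
        self._writer.submit(self._write, frame)  # returns immediately

    def _write(self, frame: bytes) -> None:
        self.written.append(frame)  # stands in for the slow encode/disk write

    def close(self) -> None:
        self._writer.shutdown(wait=True)


rec = VideoRecorder()
for i in range(3):
    rec.submit_frame(bytes([i]))
rec.close()
print(len(rec.written))  # 3
```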

For testing, on both Windows and Linux, ensure that all functionality works, as described in CONTRIBUTING.md.

philpax and others added 30 commits April 30, 2026 16:48
Strict-from-day-one ruleset (F/E/W/I/N/UP/B/A/C4/SIM/RET/PIE/TC/ICN/ARG/FBT/
BLE/DTZ/ASYNC/TRY/RUF/PLE/PLW/PERF) with two ignore tiers in pyproject.toml:
permanent (library naming conventions, FastAPI patterns) and step-tracked
(each remaining violation tagged with the refactor step that removes it),
so subsequent commits delete ignore lines rather than add them.

41 errors auto-fixed (imports sorted, pyupgrade rewrites, unused imports,
redundant else-after-return, etc.). Remaining 101 are all suppressed via
tracked ignores; ruff format pass follows in the next commit.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mechanical formatter pass over server-components/ — quotes, line breaks,
trailing commas. No semantic changes; isolated commit so subsequent diffs
aren't muddied by formatter churn.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Strict mode globally; legacy violations suppressed in tiers — permanent
(third-party stub gaps, FastAPI/Pydantic re-export pattern) and step-tracked
(each rule annotated with the refactor step that removes it). New modules
added in subsequent steps are written to pass strict mode without relying
on these suppressions; the suppressions only let the existing legacy code
through while we incrementally tighten it.

800 → 0 errors with the suppressions applied. Each later step's exit
criterion includes deleting the corresponding ignore lines.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Already a transitive dep of fastapi, but step 2 onward writes Pydantic
models directly throughout the codebase — promoting it to a direct
dependency makes the version explicit and survives any upstream change
to fastapi's transitive set.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Server-side ruff + basedpyright now exist (see prior commits in this
branch). Document them alongside the JS lint commands so contributors
know to run both, and explain the tiered-ignore policy that lets the
refactor land incrementally.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pydantic-based wire protocol layer that every WS message will route
through. Discriminated unions on `type: Literal[...]` for both directions:

- ClientMessage: ControlNotif | PauseNotif | ResumeNotif | ResetNotif |
  PromptNotif | InitRequest | SceneEditRequest | GenerateSceneRequest |
  CheckSeedSafetyRequest
- ServerPushMessage: StatusMessage | SystemInfoMessage | ErrorMessage |
  WarningMessage | LogMessage
- RpcSuccess[T] | RpcError as the response envelope (PEP 695 generic)

Translation keys consolidated into MessageId StrEnum so the renderer and
server can't disagree silently. SystemInfo / ErrorSnapshot promoted from
TypedDict to BaseModel since they cross the wire.

No consumers in this commit — exception to the "land with consumer" rule
because the next four commits *are* the consumers, and bundling them
into one diff would be unreviewable. The module is fully strict-typed
under basedpyright (zero rule suppressions needed) and round-trips
parse/serialize cleanly per smoke test.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pre-init loop now uses ClientMessageAdapter.validate_json + match
instead of json.loads + msg.get(...):

- handle_check_seed_safety takes a typed CheckSeedSafetyRequest and
  returns RpcSuccess[CheckSeedSafetyResponseData] | RpcError; sends
  responses via model_dump_json directly.
- handle_init takes a typed InitRequest; partial-delta semantics
  preserved via Pydantic's model_fields_set.
- SceneEditRequest / GenerateSceneRequest sent before init now reject
  with a typed RpcError instead of being silently dropped.
- Notifications during pre-init are explicitly enumerated (no wildcard),
  so adding a new ClientMessage variant breaks compilation here.
- dispatch_request becomes a thin adapter that the unconverted game-loop
  receiver still calls; deleted in commit 5 of this step.
- The game-loop handle_init call site is wrapped via
  InitRequest.model_validate(msg) — temporary scaffolding, removed in
  commit 3 when the receiver itself is converted.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The receiver now uses ClientMessageAdapter.validate_json + an exhaustive
match over the discriminated union — every ClientMessage variant has its
own arm, so adding a new variant breaks compilation here. Net 36 lines
removed; the three near-duplicate `if isinstance(result, BinaryResponse)`
dispatch blocks are gone.

Each RPC arm constructs typed RpcSuccess[T] / RpcError directly via the
protocol helpers; ControlNotif's resolution of buttons → button_codes
runs against typed fields rather than `msg.get(...)`. SafetyRejection-
Error's `message_id` attribute is now mapped through `MessageId(...)`
so the wire value is enforceably one of the enumerated keys.

Outstanding:
- BinaryResponse and dispatch_request are now unreferenced; commit 5
  deletes them along with the now-unused build_init_response_data dict
  shape.
- queue_send / send_json still take dict; commit 4 (next) converts them
  to BaseModel and unifies serialisation at the wire boundary.
- The future returned by the generator thread for scene_edit /
  generate_scene is still a dict; step 5 (scene_authoring extraction)
  types it directly so the `**preview` / `**data` splat goes away.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every server-to-client push now flows through Pydantic models — no more
ad-hoc dicts assembled at call sites:

- Single send_message(msg: BaseModel) helper at the WS edge; all of
  send_json / send_warning / send_stage now build a typed model and
  delegate to it.
- frame_queue / queue_send retyped from `dict | bytes` to `BaseModel |
  bytes`; the sender's branch picks model_dump_json or send_bytes by
  type. Receiver call sites drop their .model_dump(exclude_none=True)
  noise.
- _error_payload becomes build_error_message returning ErrorMessage,
  with ErrorSnapshot constructed directly from system_info_module.
- send_warning's signature is `MessageId` instead of `str`; every call
  site previously passing the raw i18n key string now passes the
  enumerated value, so a renamed key fails compilation rather than
  silently desynchronising.
- _broadcast_startup_stage and progress_callback enqueue StatusMessage
  directly; the dict sentinel `{"type": "_startup_done"}` becomes a
  typed `None` over the typed `Queue[StatusMessage | None]`.
- All 11 outbound `json.dumps({"type": ...})` literals removed; the
  remaining two are binary frame headers (struct-prefixed envelopes).

Net 21 lines removed; the protocol seam is now genuinely "edges only" —
no string-keyed dict ever crosses the WS in either direction.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both have been unreferenced since the receiver was converted to use
ClientMessageAdapter directly. Drop them along with the ARG001 entry
in the dispatch_request comment of pyproject.toml; the per-file ignore
now only covers lifespan(app) (FastAPI-mandated unused param), which
is step 3's job.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New `app_state.py` defines the AppState dataclass (process-wide mutable
state container) plus typed accessors (`get_app_state` for HTTP via
Depends, `get_app_state_ws` for WebSocket, `attach_app_state` for the
lifespan).

This commit absorbs the four startup-state globals — `startup_complete`,
`startup_error`, `startup_stages`, `ws_startup_waiters` — into AppState.
`_broadcast_startup_stage` and `_heavy_init` take AppState as their first
argument; `_signal_startup_done` extracted to dedupe the pair of waiter-
notification loops. /health and the websocket endpoint receive the typed
state via Depends instead of touching module-level names.

Subsequent commits in this step absorb the remaining globals
(world_engine / image_gen / safety_checker, safety_hash_cache,
_parent_pid) using the same plumbing.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
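A FastAPI-free sketch of the attach/retrieve shape, using `SimpleNamespace` in place of the app object (names echo the commit but are illustrative):

```python
# Sketch: a dataclass holds process-wide mutable state, an attach function
# stamps it onto app.state, and a typed accessor retrieves it — explicit
# attach + typed read, no module-level globals.
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class AppState:
    startup_complete: bool = False
    startup_stages: list[str] = field(default_factory=list)


def attach_app_state(app) -> AppState:
    app.state.app_state = AppState()
    return app.state.app_state


def get_app_state(app) -> AppState:
    state = app.state.app_state
    assert isinstance(state, AppState)  # narrows the untyped app.state slot
    return state


app = SimpleNamespace(state=SimpleNamespace())  # stands in for the FastAPI app
attach_app_state(app)
get_app_state(app).startup_complete = True
print(get_app_state(app).startup_complete)  # True
```

In the real server the accessor is wired through FastAPI's `Depends` so endpoints receive the typed state as a parameter.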
Moves world_engine, image_gen, safety_checker, and safety_hash_cache
onto AppState. _heavy_init now writes through `state.X = ...`; consumers
read from state. The four module-level globals are deleted.

SafetyCacheEntry moves to app_state.py since it types an AppState field;
it stays a TypedDict for now (pickle compatibility) — step 7 swaps the
cache file format to JSON-with-Pydantic.

Inside websocket_endpoint and the two `_run_*_on_generator` helpers,
non-None aliases are bound once after the startup gate so the bodies
keep their existing world_engine.X / image_gen.X access patterns
without `assert` noise on every call. Step 6 replaces the local-asserts
trick with an explicit "ready" state machine on AppState; step 5 moves
the two helpers out of server.py entirely.

handle_check_seed_safety takes AppState as its first argument. Both
call sites (pre-init and game-loop) updated.

/health receives state via Depends(get_app_state); no module-level
mutable globals remain in server.py.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drops the last module-level mutable global. ParentWatchdog encapsulates
both the synchronous one-shot startup check (called from __main__ in
case the parent is already gone) and the continuous async polling
(scheduled by the lifespan).

Configuration flows in via a frozen StartupConfig dataclass attached to
`app.state.startup_config` from __main__ before uvicorn.run, retrieved
back inside the lifespan via the typed `get_startup_config(app)`
accessor. Same handoff shape as AppState — explicit attach + typed
read, no `globals()` access.

After this commit `grep -n "^[a-z_]* = " server-components/server.py`
returns nothing mutable.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 40-line `resolve_hf_token` block sat at the top of server.py with
five env-var / file-path fallbacks; it's process-launch infrastructure,
not server logic. Move it to its own module behind a single
`apply_resolved_token()` entry point that resolves and stamps HF_TOKEN
into os.environ before the heavy imports run.

Drops the last "side effect at module-import time" idiom from
server.py; the call is now an explicit one-liner that file-readers
can grep for.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 11-line default prompt string was load-bearing in name only — none
of the shipped Waypoint models actually act on it (the prompt path is
gated behind `has_prompt_conditioning`, and even when set, the engine
ignores the content). Drop the constant rather than ceremonially
relocating it.

`current_prompt` defaults to `""` instead; the PromptNotif handler now
forwards the client's prompt verbatim (empty string included) without
substituting a fallback.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The two `_run_*_on_generator` helpers move to a new scene_authoring.py
module, returning typed `SceneEditResponseData` / `GenerateSceneResponseData`
directly instead of dicts — the receiver's `Pydantic(**dict)` splat goes
away too. The function-scoped `from image_gen import ...` is preserved
inside the new module so importing it from server.py stays light (the
diffusers/llama_cpp import graph still loads lazily on first use).

Also drops the dead prompt path: `WorldEngineManager.current_prompt`
and every `engine.set_prompt(...)` call across `engine_manager.py` and
the new `scene_authoring.py`. None of the shipped Waypoint models
condition on the prompt, so this code never had effect on generation.
The PromptNotif handler still flips the reset flag (a prompt change
forces a session reset) but no longer mutates engine state.

Net 195 lines deleted; server.py drops to ~1700 lines (down from 1735
at the start of the branch, despite all the new imports).

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…oad state

`MODEL_CFG` (a `dict[str, dict[str, Any]]` of per-model defaults) becomes
a frozen `ModelConfig` dataclass with typed fields and an `is_multiframe`
property. The new `model_config_from_engine_cfg(engine_model_cfg)` is the
sole place we touch the third-party world_engine config object's
untyped attributes — every other consumer reads typed fields off the
returned `ModelConfig`.

`WorldEngineManager` now holds `model_config: ModelConfig | None`. The
six scattered cached fields (`n_frames`, `temporal_compression`,
`is_multiframe`, `seed_target_size`, `has_prompt_conditioning`,
`inference_fps`) become @Property delegators that raise on access
pre-load — so the implicit "engine has fake waypoint-1 defaults until
you load something" intermediate state is gone. `is_loaded` formalises
the gate.

`_unload_engine_sync` resets `model_config = None` instead of restoring
fake defaults; accidental access on an unloaded manager now fails loud.

External call sites continue to work via the @property delegators; no
churn outside engine_manager.py for this commit.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every `getattr(world_engine, "X", default)` site in server.py now
reads `world_engine.X` directly. The defensive fallbacks were vestigial
from the era when `world_engine` was a module-level `None` global; with
the AppState refactor these always run downstream of the asserted
local binding, so the fields are guaranteed present.

Issue #78 explicitly calls out "no `getattr`" as a goal — `world_engine`
is now a typed attribute access target. The remaining `getattr` callsite
in the codebase (`model_config_from_engine_cfg` reading the third-party
world_engine model_cfg object) is genuinely defensive against an
external library and stays.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both methods drop their `async` keyword and submit to the cuda_executor
synchronously via `submit(...).result()` instead of the awaitable
`_run_on_cuda_thread` helper. Their only caller is the generator
thread, which previously had to bounce through `run_coro()` (i.e.
`asyncio.run_coroutine_threadsafe(...).result()`) — a cross-thread
synchronous wait on the asyncio loop that would deadlock if the loop
ever blocked. Direct sync calls are simpler, avoid the thread-hop, and
remove one of the three coroutine round-trips called out by issue #78.

The two remaining `run_coro` sites (both calling the websocket-defined
`reset_engine` closure) stay until step 4 lifts that closure out of
the endpoint.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
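The change can be sketched with a plain `ThreadPoolExecutor` standing in for the CUDA executor:

```python
# Sketch: instead of wrapping CUDA work in a coroutine and bouncing through
# asyncio.run_coroutine_threadsafe from the generator thread, the caller
# submits to the executor and waits synchronously — no asyncio loop in the
# path, so a blocked loop can no longer deadlock the call.
from concurrent.futures import ThreadPoolExecutor

cuda_executor = ThreadPoolExecutor(max_workers=1)  # pinned "CUDA" thread stand-in


def load_engine_sync(model: str) -> str:
    # Direct synchronous hop onto the executor thread and back.
    return cuda_executor.submit(lambda: f"loaded:{model}").result()


print(load_engine_sync("waypoint-1"))  # loaded:waypoint-1
```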
Step 6 narrowed the surface that the optional-access / argument-type
suppressions cover but didn't eliminate them: the lingering sites live
in image_gen.py, safety.py, and scene_authoring.py's CUDA closures
where third-party torch / PIL / diffusers APIs are deeply Any-typed.
Move those rules from the step-6 tier to the step-7 tier with an
honest note about what remains.

The reportFunctionMemberAccess one (the `_flush_pending.work`
function-attribute hack) stays as step-4-tracked.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…aclass

The deferred-encode batch state was kludged onto a function attribute
(`_flush_pending.work = (cpu_frames, gen_time, ...)`) — a 7-tuple of
positional values, mutated in place by both the inference path and
the flush path. Replace with a `PendingFlush` dataclass and a
`pending: PendingFlush | None` local in the generator scope; the flush
function reads/writes via `nonlocal pending`.

Each field is now named and typed at the declaration, the assignment
sites are kwarg-explicit, and the basedpyright suppression for
`reportFunctionMemberAccess` (which only existed to allow this hack)
goes away.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
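A reduced sketch of this refactor, with invented fields (the real `PendingFlush` carries seven):

```python
# Sketch: the tuple stuffed onto a function attribute becomes a named,
# typed dataclass held in a `pending` local that the flush closure
# rebinds via `nonlocal`.
from dataclasses import dataclass


@dataclass
class PendingFlush:
    cpu_frames: list[bytes]
    gen_time: float


def make_generator():
    pending: PendingFlush | None = None

    def record(frames: list[bytes], gen_time: float) -> None:
        nonlocal pending
        pending = PendingFlush(cpu_frames=frames, gen_time=gen_time)  # kwarg-explicit

    def flush() -> int:
        nonlocal pending
        if pending is None:
            return 0
        count = len(pending.cpu_frames)
        pending = None
        return count

    return record, flush


record, flush = make_generator()
record([b"f0", b"f1"], gen_time=0.02)
print(flush())  # 2
```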
New `ws_session.py` defines the `Connection` dataclass — a typed
container for the ~25 nonlocals currently scattered across the
receiver / sender / generator closures inside `websocket_endpoint`.
Fields cover init flags, recorder instances, seed metadata, the
pending-init-RPC tracker, game-loop running/paused/reset/prompt
state, the scene-authoring future handoff, and the inter-thread
channels (frame queue, control input, GPU metrics cache).

This commit only adds the type — exception to the "land with
consumer" rule because the next commits in this step *are* the
consumers, and bundling everything into one diff would be
unreviewable. Same exception we made introducing protocol.py at
the start of step 2.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
websocket_endpoint instantiates a single `conn = Connection(...)` and
the eleven previously-nonlocal per-connection variables (init flags,
recorder instances, seed metadata, init_req_id) become `conn.X`
attributes. The long `nonlocal` declarations in `handle_init` and
`load_seed_from_data` go away — mutating attributes on a closed-over
Connection doesn't require nonlocal, only reassigning a name does.

Connection's scope was also narrowed in this commit (from the wider
draft introduced one commit earlier) — channels, control input, and
GPU metric cache stay as locals for now. They migrate onto Connection
(or the extracted Receiver/Generator classes) in subsequent step-4
commits, where the construction order vs. the asyncio loop is naturally
respected.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the seven game-loop variables (running, paused, reset_flag,
prompt_pending, scene_edit_request, generate_scene_request,
last_generated_cpu_frames) to Connection. The receiver, sender, and
generator no longer declare them as nonlocals — closures mutate
attributes on the closed-over `conn` directly.

Net: 17 of the original ~25 nonlocals are now Connection fields.
The remaining locals (frame_queue / frame_ready / progress_queue /
ctrl_state / ctrl_lock / cached_gpu_metrics / main_loop) are
construction-bound to the asyncio loop or to inter-thread channel
semantics; they migrate onto the extracted Receiver / Generator
classes in subsequent step-4 commits.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The remaining channel/queue locals (frame_queue, frame_ready,
progress_queue, log_queue, main_loop, ctrl_lock) and the inline
ctrl_state dict move onto Connection. ctrl_state becomes a typed
ControlState dataclass — `conn.ctrl.buttons` instead of `ctrl_state["buttons"]`.

Loop-bound fields are constructed in `Connection.__post_init__` so
they're non-Optional from the consumer's POV — Connection's only
construction-time invariant is "must be inside a running asyncio loop"
(which `websocket_endpoint` always satisfies).

Net: every nonlocal-style shared state for the WS lifetime is now a
typed Connection field. The receiver / sender / generator closures
remain as nested defs that close over `conn`; the next step-4 commits
extract them as standalone classes/files in `ws_session.py`.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
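A reduced sketch of loop-bound construction in `__post_init__` (field names echo the commit; the class is heavily cut down):

```python
# Sketch: queue/loop fields are created when the Connection is built, so
# consumers see them as non-Optional; the only construction-time invariant
# is "inside a running asyncio loop".
import asyncio
from dataclasses import dataclass, field


@dataclass
class ControlState:
    buttons: set[str] = field(default_factory=set)


@dataclass
class Connection:
    client_host: str
    ctrl: ControlState = field(default_factory=ControlState)

    def __post_init__(self) -> None:
        # Must run inside a loop — as websocket_endpoint always does.
        self.frame_queue: asyncio.Queue[bytes] = asyncio.Queue()
        self.main_loop = asyncio.get_running_loop()


async def main() -> str:
    conn = Connection(client_host="10.0.0.1")
    conn.ctrl.buttons.add("jump")        # typed access, no string keys
    await conn.frame_queue.put(b"frame")
    return (await conn.frame_queue.get()).decode()


print(asyncio.run(main()))  # frame
```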
Last shared-state local — `_cached_gpu_metrics` (a 2-key dict) — moves
to the typed `cached_vram_used_bytes` / `cached_gpu_util_percent` int
fields on Connection. `build_binary_frame` reads them inline rather
than splatting a dict into the header. Same for `_update_gpu_metrics`.

After this commit, every per-WS-connection nonlocal is a Connection
field. The receiver / sender / generator closures still live inside
`websocket_endpoint`; their extraction into standalone classes/files
is the next step-4 commit.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`send_message`, `send_warning`, `send_stage`, and `queue_send` move
from local closures inside `websocket_endpoint` to instance methods on
Connection. The closures were 1-3 line wrappers reading nothing from
the websocket-endpoint scope they didn't already have via `conn`, so
the lift is mechanical: 30+ call sites switch from `await
send_X(...)` to `await conn.send_X(...)`, and the closure defs go away.

This is groundwork for extracting the receiver / sender / generator
closures themselves — the fewer enclosing-scope helpers they reference,
the smaller their dependency closure becomes when lifted out.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The sender closure (~17 lines) moves out of `websocket_endpoint` to
a standalone `run_sender(conn)` coroutine in ws_session.py. The
endpoint now does `asyncio.create_task(run_sender(conn))` instead of
defining + spawning a closure. All references the sender used were
already on `conn`; no other dependencies survive the lift.

First of three closure extractions (sender / receiver / generator).
The receiver and generator are 100+ and 280+ lines respectively, with
deeper helper-closure dependencies that need lifting too — those
follow in subsequent commits.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The closure that bridged engine_manager's CUDA-thread progress
callbacks to the asyncio drain queue was the last shared-helper
inside `websocket_endpoint`. Move it to a `Connection.push_progress`
method — bound methods satisfy the `Callable[[Stage], None]`
signature `WorldEngineManager.set_progress_callback` expects.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The four `_action_logger_*` / `_video_recorder_*` closures (lazy
construct + start/end segment, ~30 lines combined) move to
`Connection` methods: `start_action_log_segment(world_engine)`,
`end_action_log_segment()`, `start_video_segment(world_engine)`,
`end_video_segment()`. WorldEngineManager is imported under
TYPE_CHECKING so ws_session.py stays light.

Net 14 closures inside `websocket_endpoint` are now Connection
methods or extracted free functions; what remains is the four big
async/sync closures (handle_init, load_seed_from_data, receiver,
generator) plus build_binary_frame and build_init_response_data.

Refs #78.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
philpax added 14 commits May 4, 2026 19:07
Two issues with the previous setup made it ambiguous where each log
line came from:

1. The format string `%(asctime)s [%(levelname)s] %(message)s` didn't
   include `%(name)s` — even files using `logging.getLogger(__name__)`
   were rendering as bare `[INFO]` lines.
2. Five files (`main.py`, `recording/video_recorder.py`,
   `recording/action_logger.py`, `server/routes.py`,
   `util/system_info.py`) imported the shared `from util.server_logging
   import logger` (named `biome_server`) instead of declaring their own
   per-module logger, so even the name field would have been wrong.

Fix:
- `_LOG_FORMAT` becomes `"%(asctime)s [%(levelname)s] [%(name)s] %(message)s"`.
  Lines now render like `12:34:56 [INFO] [engine.manager] Loading model: …`.
- Every module declares `logger = logging.getLogger(__name__)`. The
  shared `logger` export from `util/server_logging.py` is gone; the
  module keeps its own internal logger for its crash hooks.
- `main.py` triggers the `util.server_logging` side-effect import
  (TeeStream install, `logging.basicConfig`) explicitly with an
  unused-import noqa.

Per-event prefixes like `[client_host]`, `[1/3]`, `[RECV]`, `[ENGINE]`
are kept inside messages — those are operation-level context that the
module name doesn't capture.

Two cascading lint adjustments fall out of the change:

- ruff's `TRY400` and `BLE001` rules can now see that `logger` is a
  `logging.Logger` (the assignment is local), so a couple of previously
  necessary `# noqa: BLE001` directives became unused (logging via
  `exc_info=True` exempts the line) and two previously silent `TRY400`s
  surfaced for `OSError`-as-status-signal call sites that legitimately
  don't want the traceback. Both handled per-line.
Switch the Python server from stdlib `logging` to `structlog`, with the
shape Rust's `tracing` would take if/when this server is ported:

- Module loggers via `structlog.stdlib.get_logger(__name__)`. Output runs
  through a processor pipeline ending in stdlib `logging`, so existing
  transports (TeeStream → stdout / server.log / WebSocket broadcast)
  still work unchanged.

- Per-connection scope via `structlog.contextvars.bind_contextvars(
  client_host=…)` at WS accept. Asyncio tasks inherit; the generator
  thread copies the context at submit time so its logs stay attributed:

      gen_ctx = contextvars.copy_context()
      threading.Thread(target=gen_ctx.run, args=(run_generator, …))

  Every `f"[{conn.client_host}] …"` prefix in the message body was
  dropped — `client_host` now lands in the structured suffix automatically.

- Per-operation scope via bound loggers: `log = logger.bind(operation="warmup")`
  for ladders that distinguish phases inside a single module
  (`reset`, `init_session`, `device_recovery`, `warmup`, `scene_edit`,
  `generate_scene`). Replaces hand-typed `[RESET]` / `[INIT]` / `[WARMUP]`
  / `[SCENE_EDIT]` / `[DEVICE RECOVERY]` prefixes.

- Step counters as structured fields: `current_step` / `total_steps`
  with module-level constants (`LOAD_ENGINE_TOTAL_STEPS`,
  `WARMUP_TOTAL_STEPS`). Replaces `[1/3]` / `[2/3]` / `[3/3]` substrings
  buried in the message text.

- Dynamic values move from f-string interpolation to kwargs: `model`,
  `quant`, `dtype`, `duration_s`, `elapsed_ms`, `frame_id`, `version`,
  `parent_pid`, etc. Renderer's k=v suffix surfaces them; future
  consumers (Rust port, observability ingest) get them as fields.

- Console renderer (`structlog.dev.ConsoleRenderer(colors=False)`) for
  human-readable stdout / WS stream. Sample line:

      12:34:56 [info] [engine.manager] Loading model client_host=10.0.0.1 model=waypoint-1.5 current_step=1 total_steps=3

- Redundant subsystem prefixes (`[SERVER]`, `[ENGINE]`, `[DEVICE]`)
  dropped — module name covers them. Decorative `=== … ===` banner
  blocks dropped — added noise without context.

- `logger.error(..., exc_info=True)` → `logger.exception(...)` where the
  traceback adds signal; per-line `# noqa: TRY400` with rationale on the
  status-only sites that legitimately don't want the traceback.

`structlog==25.5.0` added as a dependency.
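The thread-attribution trick from this commit works with plain stdlib `contextvars`, independent of structlog; a self-contained sketch:

```python
# Sketch: a contextvar bound at WS accept is copied into the generator
# thread at submit time, so values read there stay attributed to the
# connection rather than falling back to defaults.
import contextvars
import threading

client_host: contextvars.ContextVar[str] = contextvars.ContextVar("client_host", default="-")
seen: list[str] = []


def run_generator() -> None:
    # Reads the value bound in the accepting task, not the default.
    seen.append(client_host.get())


client_host.set("10.0.0.1")            # bound at WS accept
gen_ctx = contextvars.copy_context()   # snapshot at submit time
t = threading.Thread(target=gen_ctx.run, args=(run_generator,))
t.start()
t.join()
print(seen)  # ['10.0.0.1']
```

structlog's `bind_contextvars` layers its per-event fields on top of exactly this mechanism.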
Move WS log fan-out off TeeStream and into a new LogBroadcast,
fed by a structlog processor that snapshots the event_dict before
the final renderer runs. Each broadcast LogMessage carries the
rendered line plus level / logger / timestamp / fields so a future
consumer (diagnostics export, structured filter UI) can use the
typed payload — TeeStream is back to its single job of mirroring
stdout/stderr to server.log.

The new fields on LogMessage are all optional, so existing clients
that only read `line` keep working until they're updated to consume
the structured form.
Promote the renderer's log buffers from string[] to a new LogRecord[]
shape that mirrors the WS LogMessage minus the discriminator. WS log
messages now keep their level / logger / timestamp / fields all the
way through to the diagnostics export; engine-log IPC events from the
Electron main process project to the same shape with just `line` and
a derived level from is_stderr.

DiagnosticsPayload.logs is now LogRecord[] — bug reports surface the
structured form so external triagers see logger / fields / level
alongside the rendered line, without re-parsing the text.
Cover the bits a new contributor needs to match the post-refactor
style: per-module loggers, kwargs over f-strings, per-connection
contextvars, sub-operation `bind(operation=...)`, the
current_step/total_steps idiom, the no-bracket-prefixes rule, and
the LogBroadcast / TeeStream split. Also refresh the WS protocol's
`log` bullet to reflect the enriched LogMessage shape and its
LogRecord projection on the renderer.
Define a `PROTOCOL_VERSION` constant in `server/protocol.py`, ship it
to the renderer via the codegen, and tag every WS connection with
`?protocol_version=N`. The server reads the query param right after
`accept()` and on mismatch pushes a typed `ErrorMessage` carrying
`message_id: app.server.error.protocolVersionMismatch` and
`params: {client, server}`, then closes — so the existing
`resolveServerMessage` path surfaces a localised "update Biome"
error in the UI without a special case. Translation key added to
all six locales; CONTRIBUTING.md documents the bump rules and the
codegen's new constants section.

The codegen now scans the primary protocol module for
UPPER_SNAKE_CASE primitive constants and emits them verbatim;
extra modules (`video_recorder.py`) still contribute models but
not their internal implementation constants — that gate exists
because an unfiltered scan pulled in a hard-coded local FFmpeg
path on first run.
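
The constant scan can be illustrated with `ast`; this is an assumed implementation of the gate, not the codegen's actual code, but it shows why computed values (like a `Path(...)` expression) never qualify:

```python
import ast

def scan_primitive_constants(source: str) -> dict[str, object]:
    """Collect top-level UPPER_SNAKE_CASE assignments of primitive literals.

    Only str/int/float/bool literals qualify, so complex or computed
    values, such as a hard-coded local path built from Path(...), are
    skipped rather than emitted into the generated TypeScript.
    """
    constants: dict[str, object] = {}
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign) or len(node.targets) != 1:
            continue
        target = node.targets[0]
        if not isinstance(target, ast.Name) or not target.id.isupper():
            continue
        if isinstance(node.value, ast.Constant) and isinstance(
            node.value.value, (str, int, float, bool)
        ):
            constants[target.id] = node.value.value
    return constants

src = "PROTOCOL_VERSION = 2\nFFMPEG_PATH = Path('/home/me/ffmpeg')\n_private = 1\n"
print(scan_primitive_constants(src))  # → {'PROTOCOL_VERSION': 2}
```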
…ale files

The unpacker used to copy server-components in over the top of any
existing world_engine/ layout but never removed entries that no
longer existed in source. After the great-server-refactor moved
single-file modules into packages (`server.py` → `server/`,
`server_logging.py` → `util/server_logging.py`, etc.), pre-refactor
installs kept the old top-level `server.py` shadowing the new
`server/` package and crashed at import time with
`ModuleNotFoundError: No module named 'server.protocol'`.

Replace `copyDirRecursive` with `mirrorDirRecursive`: copy source
in, then prune anything in dest that's not in source. Names in
`SERVER_COMPONENT_EXCLUDES` are protected on both sides so the
synced `.venv`, the canonical `server.log`, and library caches
(`.safety_cache.json`, `gemlite_config.json`, `.cache/`) survive
the mirror pass. Verified on a real broken install: the stale
top-level modules get pruned, the venv survives, and the server
boots cleanly under nix-shell.
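
The mirror semantics (copy in, then prune, with excludes protected on both sides) can be sketched in Python; the real `mirrorDirRecursive` lives in the Electron unpacker's TypeScript, so this is an analogue for illustration only:

```python
import shutil
import tempfile
from pathlib import Path

def mirror_dir(src: Path, dest: Path, excludes: set[str]) -> None:
    """Copy src over dest, then prune dest entries absent from src.

    Names in `excludes` are skipped in both passes, so a synced .venv or
    a canonical server.log in dest survives the mirror.
    """
    dest.mkdir(parents=True, exist_ok=True)
    for entry in src.iterdir():
        if entry.name in excludes:
            continue
        target = dest / entry.name
        if entry.is_dir():
            mirror_dir(entry, target, excludes)
        else:
            shutil.copy2(entry, target)
    for entry in list(dest.iterdir()):  # materialise before deleting
        if entry.name in excludes or (src / entry.name).exists():
            continue
        shutil.rmtree(entry) if entry.is_dir() else entry.unlink()

# Reproduce the pre-refactor breakage: a stale top-level server.py in dest
# shadowing the new server/ package in src.
tmp = Path(tempfile.mkdtemp())
src, dest = tmp / "src", tmp / "dest"
(src / "server").mkdir(parents=True)
(src / "server" / "protocol.py").write_text("PROTOCOL_VERSION = 2\n")
dest.mkdir()
(dest / "server.py").write_text("stale pre-refactor module")
(dest / ".venv").mkdir()
mirror_dir(src, dest, excludes={".venv"})
print((dest / "server.py").exists())  # → False (stale module pruned)
```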
Promote the wire shape of every log event from a pre-rendered text line
to a structured `LogMessage` (renamed `line`→`event`, plus a typed
`exception` slot and primitive-typed `fields`). Server-side, the
structlog pipeline picks a renderer per `_resolve_log_format()`:
`BIOME_LOG_FORMAT=text|json` overrides; otherwise TTY → text, non-TTY
→ JSON-Lines. The WS broadcast always carries the typed `LogMessage`,
regardless of the local renderer.

The text mode uses a custom `_text_renderer` (replacing
`ConsoleRenderer`) with the layout `HH:MM:SS [level] [logger] event
k=v` — fixed `[level] [logger]` prefix block keeps lines scannable
while the event sits at a variable column. JSON mode is the standard
`structlog.processors.JSONRenderer`. `read_log_tail_records` parses
each replayed `server.log` line back into a `LogMessage` so the WS
log replay keeps the same fidelity as live events.
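The resolution and text layout described above can be sketched as follows; the env-var name, override values, and line layout are from the text, while the function bodies are assumed, simplified stand-ins for `_resolve_log_format()` and `_text_renderer`:

```python
import os
import sys

def resolve_log_format() -> str:
    # BIOME_LOG_FORMAT=text|json overrides; otherwise TTY → text,
    # non-TTY → JSON-Lines.
    override = os.environ.get("BIOME_LOG_FORMAT")
    if override in ("text", "json"):
        return override
    return "text" if sys.stderr.isatty() else "json"

def render_text(timestamp: str, level: str, logger: str, event: str, **fields) -> str:
    # Fixed "[level] [logger]" prefix block keeps lines scannable; the
    # event sits at a variable column, fields appended as k=v.
    kv = " ".join(f"{k}={v}" for k, v in fields.items())
    line = f"{timestamp} [{level}] [{logger}] {event}"
    return f"{line} {kv}" if kv else line

print(render_text("12:34:56", "info", "engine.manager", "Loading model",
                  model="waypoint-1.5", current_step=1, total_steps=3))
# → 12:34:56 [info] [engine.manager] Loading model model=waypoint-1.5 current_step=1 total_steps=3
```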

Also fix a pre-existing double-traceback bug: a `_BoundLogger`
override routes `.exception()` through stdlib's `error` instead of
`exception`, because stdlib's `Logger.exception()` defaults
`exc_info=True` and re-renders the traceback on top of structlog's
already-rendered string.

Bump `PROTOCOL_VERSION` to 2 (the `event`/`exception` field rename is
wire-incompatible). The version handshake we just shipped surfaces
the mismatch as a localised "update Biome" error for any client
still on v1.

Renderer side, `LogRecord` mirrors the new shape (event/exception/
typed-primitive fields). `ServerLogDisplay.LogLine` renders each
record with a fixed visual hierarchy — timestamp dim, level
color-coded, logger pill subdued, event body, fields k=v, exception
preformatted — matching the text-mode formatter.

Electron-side, `parseLogLine` (`electron/lib/logRecord.ts`) projects
each subprocess stdout/stderr line onto the same `LogRecord`: JSON-
parse if possible, fallback to `{event: line}` otherwise. The IPC
`engine-log` payload type is now `LogRecord`, replacing
`{line, is_stderr}`. The codegen also gained a small dedup so
`int | float` in a Pydantic union collapses to a single `number` in
both the TS type and the Zod schema.
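
The `parseLogLine` projection is TypeScript in the repo; a Python analogue of the same JSON-parse-with-fallback idea, with the fallback logger name assumed for illustration:

```python
import json

def parse_log_line(line: str, fallback_logger: str = "engine.server") -> dict:
    """Project a subprocess stdout/stderr line onto a LogRecord-shaped dict.

    JSON-parse when the line is a JSON object carrying an `event` key
    (the JSON-Lines renderer's output); otherwise fall back to wrapping
    the raw text as {event: line}.
    """
    try:
        record = json.loads(line)
        if isinstance(record, dict) and "event" in record:
            return record
    except json.JSONDecodeError:
        pass
    return {"event": line, "logger": fallback_logger, "level": "info"}

print(parse_log_line('{"event": "Loading model", "level": "info"}'))
print(parse_log_line("plain stdout noise"))
```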
…rop bracketed prefixes

Add `electron/lib/logger.ts`: a hand-rolled mirror of the Python
server's `util/server_logging.py` setup. `getLogger(name, opts?)`
returns a `{debug, info, warning, error}` object backed by the same
text-vs-JSON `BIOME_LOG_FORMAT` resolution (env override + TTY
heuristic) and the same `LogRecord` shape on the wire. Loggers with
`defaultBroadcast: true` mirror every event onto the renderer's
`engine-log` IPC channel; per-call `broadcast: true|false` overrides.

Sweep every `console.log/warn/error('[ENGINE] …')` /
`'[SERVER] …'` / `'[UV] …'` / etc. across `electron/` and replace
with `log.info/warning/error(...)`. Logger names are dotted paths
matching the Python convention so the renderer's log buffer and the
diagnostic export visually distinguish "this came from Electron"
from "this came from Python" by the logger pill, without any
prefix glued onto the event text:

  - electron.main      — app lifecycle, signals
  - electron.settings  — load / migrate / validate
  - electron.seeds     — seed thumbnail loading
  - electron.recordings — recording open
  - electron.update    — update checks
  - engine.setup       — install / sync / nuke (defaultBroadcast)
  - engine.diagnostics — `check-engine-status` internals
  - engine.server      — server lifecycle + Python-stdout fallback

Subprocess pass-through stays a separate path (`parseLogLine`):
`runUvSyncWithMirroredLogs` no longer takes `logPrefix` and no
longer glues a prefix onto the line passed to `onLine`; the line
flows raw to our stdout/stderr and through `parseLogLine` (with
`engine.uv-sync` as the fallback logger) to the renderer.
`electron/ipc/server.ts` similarly drops the `[SERVER]` glue and
attributes Python-stdout fallbacks via `engine.server`.

CONTRIBUTING.md gains an "Electron (`getLogger`)" subsection
mirroring the Python conventions, and the lead paragraph of
"Structured logging" now frames the logger field as the Python ↔
Electron distinguisher.
Override `BIOME_LOG_FORMAT=json` in the standalone server's
environment so the inherited value from the dev's shell can't
silently degrade the renderer's `engineLogs` buffer to text-only
fallback records when they've set `BIOME_LOG_FORMAT=text` for
direct-terminal work. The dev's terminal still gets the JSON via
the existing raw pass-through write to our stdout — `jq` recovers
the human-readable form.
`VideoRecorder.write_frames` was writing raw RGB bytes (~10 MB per
batch) to ffmpeg's stdin pipe synchronously on the gen thread.
Linux pipe buffers are ~64 KB, so any time libx264 fell behind, the
write blocked and held up the next iteration's `submit_gen_frame`
plus the deferred `_flush_pending` that ships JPEGs to the WS —
gameplay frame rate dropped whenever recording was on.

Hand each batch off to a per-segment writer thread via an unbounded
`queue.Queue` instead. The gen-thread `write_frames` is now a
near-zero-cost `q.put(frames)`; the writer thread feeds ffmpeg
sequentially. RAM grows by a few MB per backlogged batch — easily
absorbed even on long sessions, and only when libx264 actually
falls behind.

`end_segment` likewise dispatches the drain (queue sentinel +
writer join + ffmpeg-stdin close + ffmpeg-wait + short-recording
cleanup) to a background daemon thread, so the asyncio handler
toggling recording off and the gen-thread reset path don't block
on encoder backpressure either. The recording finishes a couple of
seconds after gameplay ends in the worst case, which is fine.

Overlay anchoring now uses `_frames_queued` (gen-thread counter)
so `note_edit` aligns the overlay with the user's edit moment in
the recording. The writer thread tracks frames-written locally
and passes the index into `_apply_overlay`, so both counters use
the same numbering scheme and the elapsed-frames math stays
correct.

Verified end-to-end: 150 frames at 320x240 enqueued in 0.07 ms,
end_segment returns in 0.17 ms, the background drain produces a
valid 6 MB H.264 MP4 with all 150 frames at 30 fps.
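
The hand-off pattern reduces to a queue, a writer thread, and a sentinel drain. A minimal sketch, with the class name and the list stand-in for ffmpeg's stdin both assumed:

```python
import queue
import threading

_SENTINEL = object()

class SegmentWriter:
    """Per-segment writer thread: the gen thread does a near-zero-cost
    q.put(); this thread feeds the encoder (stubbed as a list here)
    sequentially, absorbing encoder backpressure in RAM."""

    def __init__(self) -> None:
        self._q: queue.Queue = queue.Queue()  # unbounded on purpose
        self.encoded: list[bytes] = []        # stand-in for ffmpeg stdin
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def write_frames(self, frames: bytes) -> None:
        self._q.put(frames)  # returns immediately; never blocks the gen thread

    def end_segment(self) -> None:
        # Sentinel + join is the drain; the real code dispatches this to a
        # background daemon thread so the caller doesn't block either.
        self._q.put(_SENTINEL)
        self._thread.join()

    def _run(self) -> None:
        while True:
            item = self._q.get()
            if item is _SENTINEL:
                return
            self.encoded.append(item)

w = SegmentWriter()
for _ in range(150):
    w.write_frames(b"frame")
w.end_segment()
print(len(w.encoded))  # → 150
```

The unbounded queue is a deliberate trade: a bounded queue would reintroduce the blocking `put` the change exists to remove.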
`Copy diagnostics` from the Debug settings tab was passing `logs: []`
to `buildDiagnosticsPayload`, so the resulting JSON was missing the
running session's log history — exactly the thing a triager needs.
Thread `wsAllLogs` (the unbounded WS-broadcast buffer from
`useStreaming`) through instead, so a mid-session export captures
everything the WS has emitted so far.
Renames `DiagnosticsPayload.logs` → `electron_logs` + `server_logs`
so the dump separates the two sources clearly, and adds the missing
half: a rolling buffer of every Electron-side `LogRecord`, exposed
via a new `get-electron-log-tail` IPC, that the builder pulls
automatically.  Captures internal-debug events that never broadcast
onto the `engine-log` channel — `electron.update`,
`electron.settings`, `engine.diagnostics`, etc. — so they ride into
diagnostic exports too.

`BuildDiagnosticsOptions.logs` becomes `serverLogs` to match the
new contract: callers supply the WS-sourced server events; the
builder pulls Electron's tail itself.  Each call site now uses the
unbounded `wsAllLogs` for the server bucket — `DebugTab`,
`TerminalDisplay`, `ConnectionLostOverlay` — and `EngineInstallModal`
passes `[]` since it runs pre-WS.  The synthetic `[ERROR] {message}`
log records appended in TerminalDisplay and ConnectionLostOverlay
are dropped — `error.message` already carries the same string at
the top of the payload.

Two sources of cross-bucket / within-bucket duplication that
showed up in a real session dump are also fixed:

1. `parseLogLine` no longer records into the Electron rolling
   buffer.  Python-server stdout pass-through went into the buffer
   AND the same events came in via WS broadcast, producing exact
   duplicates across `electron_logs` and `server_logs`.  The
   uv-sync pass-through (which exists nowhere else) calls
   `recordElectronLog` explicitly so install-time errors survive.

2. `electron/ipc/server.ts` no longer writes to `server.log`.
   Python's `TeeStream` already mirrors stdout/stderr into that
   file; an extra `fs.appendFileSync` per subprocess line was
   doubling every entry and racing with Python's writes — the
   visible artefacts were the duplicated `Initializing
   WorldEngine...` block and the partial `: "__main__",
   "timestamp": "22:14:12"}` line in the user's dump after WS
   replay re-read the doubled file.
@philpax philpax marked this pull request as ready for review May 5, 2026 15:01

@Clydingus Clydingus left a comment


lgtm (i did not look closely) 👍

philpax added 4 commits May 5, 2026 18:55
The shared `WorldEngine` is single-tenant — its rolling frame history,
seed slot, and progress callback are all process-wide singletons —
and serialising two clients through it would just produce incoherent
output for both.  Rather than try to multiplex, gate the WS endpoint
to one active session at a time.

Right after the protocol-version check, the WS endpoint reads
`app.state.active_session`.  If something's already there, it pushes
a typed `ErrorMessage` carrying `app.server.error.serverBusy` and
closes; otherwise it claims the slot.  Check + claim are both sync,
so two concurrent handshakes can't both pass under asyncio's
cooperative scheduling.  The existing `finally` block clears the
slot with an identity check.  Reconnection (same client returning
after disconnect) works — each session releases its slot on teardown
and the next handshake claims it.
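The check-then-claim and identity-checked release can be sketched in isolation. A hedged analogue of the gate described above (class and method names are assumptions), showing why no lock is needed under asyncio's cooperative scheduling:

```python
import asyncio

class SessionGate:
    """One-active-session gate. Check and claim happen synchronously,
    with no await between them, so two concurrent handshakes can't
    both pass."""

    def __init__(self) -> None:
        self.active_session: object | None = None

    def try_claim(self, session: object) -> bool:
        if self.active_session is not None:
            return False  # caller pushes the serverBusy ErrorMessage and closes
        self.active_session = session
        return True

    def release(self, session: object) -> None:
        # Identity check: only the current owner clears the slot.
        if self.active_session is session:
            self.active_session = None

async def handshake(gate: SessionGate, results: list[bool]) -> None:
    session = object()
    claimed = gate.try_claim(session)
    results.append(claimed)
    if claimed:
        await asyncio.sleep(0.01)  # the session's lifetime
        gate.release(session)      # teardown frees the slot for reconnection

async def main() -> list[bool]:
    gate = SessionGate()
    results: list[bool] = []
    await asyncio.gather(handshake(gate, results), handshake(gate, results))
    return results

print(sorted(asyncio.run(main())))  # → [False, True]
```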

Translation key added to every locale; the `MessageId` ↔ i18n
drift gate would have failed compilation if any were missed.

While here, move the `app.state.system_monitor` and
`app.state.active_session` initialisation into `lifespan` rather
than running it at module scope, and drop the `system_monitor`
module-level local that was just feeding the assignment.
`useEngine.checkServerRunning` decides whether to refresh the full
engine status by comparing the live `is-server-running` IPC return
against `status?.server_running` from its closure.  But
`waitForHealthy` captures one specific callback reference and reuses
it for the entire poll loop — so the closure's snapshot of `status`
never updates, the comparison always reports a delta, and every
500 ms tick fires a fresh `check-engine-status` IPC, which spawns a
`uv run python --version` subprocess (~50–100 ms each) just to
confirm dependencies are still synced.  In standalone mode this
shows up as a wall of `[engine.diagnostics]` spam in the logs while
the server's still loading.

Track the previous `is-server-running` value via a ref so the delta
detection compares against the actual previous-call result, not a
stale closure.  Callback identity is now stable too (no `status`
dep), so consumers that hold `checkServerRunning` references don't
need to re-grab on every status update.
The 120 s wall-clock deadline in `waitForHealthy` turns "user is
patient" into a false-positive timeout when the standalone server's
first-time init has long synchronous work — model downloads, kernel
JIT, device-graph compilation can each blow past it on their own.
The user-cancel signal and the process-died signal are the only
meaningful exit conditions; if neither fires, the server is still
working towards readiness and we should keep polling.  Cut the
deadline.

The orphaned `app.server.startupTimeout` translation key goes away
in every locale.
@philpax philpax force-pushed the the-great-server-refactor branch 3 times, most recently from 45d34b7 to 1d66822 Compare May 5, 2026 20:59
The first time Scene Authoring loads, the user was stuck staring at
"Model loaded!" while the (slow) VLM + editing pipeline came up:
SESSION_INPAINTING_LOAD was sent *after* configure_for_session
returned, so its label never won the race against SESSION_LOADING_DONE.

- Drop the terminal SESSION_LOADING_DONE and SESSION_INPAINTING_READY
  stages. The previous in-progress label holds until the next stage
  takes over.
- Emit the SA-load stage before configure_for_session so the user sees
  "Loading Scene Authoring models..." during the load.
- Rename SESSION_INPAINTING_LOAD -> SESSION_SCENE_AUTHORING_LOAD
  (wire: session.scene_authoring.load) to match the feature name.
- Capitalise + pluralise the user-facing label in en/goose.
@philpax philpax force-pushed the the-great-server-refactor branch from 1d66822 to 54dd5e9 Compare May 5, 2026 21:06
philpax added 3 commits May 5, 2026 23:08
`Path(__file__).parent.parent / "world_engine" / ".safety_cache.json"`
landed the cache at `server-components/world_engine/.safety_cache.json`
in dev (a phantom dir auto-created on first cache write) and at
`<exe>/world_engine/world_engine/.safety_cache.json` in prod (a nested
duplicate). Drop the extra segment so the cache lives next to main.py
in both contexts — matching where gemlite_config.json already lands.

Existing prod caches at the old nested path are orphaned; they rebuild
on demand on the next session.
The MDL row only had the model URI; the quantization mode is set
independently in settings and is the most common reason two sessions
on the same model perform differently. Append it as `(quant)` when
non-none.
Quant changes weren't reliably triggering a reload: switching from a
quantized mode back to no-quant sent `quant: undefined`, JSON.stringify
dropped the field, the server's `model_fields_set` didn't see "quant"
present, and `quant_changed` stayed false — so the engine kept the old
quant loaded and the renderer flashed past the loading screen.

Root cause was the partial-update protocol: `InitRequest` carried
optional deltas where field absence meant "don't change", and the
renderer's `'none'` UI sentinel had to be encoded as the absence of a
field rather than as a value. Easy to get wrong and the server's
six-field-at-a-time `if "X" in present` machinery was ugly to extend.

Pull the live tunables out into a `SessionConfig` nested struct that's
required on every InitRequest. The renderer always sends the full
config; the server diffs each field against current state and applies
deltas. Quant goes back to `Literal["fp8w8a8", "intw8a8"] | None = None`
(None = no quantization) — no enum sentinel needed because the wrapper
struct itself signals presence.

- protocol.py: add SessionConfig, move quant + 5 live flags into it,
  drop their top-level optional twins on InitRequest.
- handlers.py: drop model_fields_set; six guards collapse to direct
  assignments. Quant comparison is now `cfg.quant != world_engine.quant`.
- StreamingContext.tsx: one buildSessionConfig helper, called from all
  four sendInit sites (bootstrap, live re-apply, selectSeed, …). The
  live-toggle effect uses a settings ref so engine_model / quant /
  scene_authoring changes don't race the lifecycle reconnect path.
@philpax philpax merged commit 6d6f6ae into main May 6, 2026
8 checks passed
@philpax philpax deleted the the-great-server-refactor branch May 6, 2026 13:12
Successfully merging this pull request may close these issues: Refactor server code.