Queued offline updates#1430
Conversation
…e devices. When a firmware job completes successfully for a device that is currently offline, we set a new `queued_update` flag on the device. This indicates that the compiled firmware is ready and waiting to be flashed via OTA the next time the device checks in via mDNS.
…7/esphome-device-builder into queued-offline-updates
for more information, see https://pre-commit.ci
|
Will submit frontend to show queued status and option to cancel. Feedback appreciated. |
Core gap: |
PR Review — Queued offline updatesPromising feature, but incomplete and not safely landable as-is.
🔴 Blocking1. Queued-update callback is never wired, and the flag is never cleared (`esphome_device_builder/controllers/_device_state_monitor/controller.py`, L248-267)This adds the Even once wired, two problems remain:
Please include the consuming side (the devices controller wiring + the flag reset on successful upload) in this PR, plus a test that drives 2. Wrong event payload shape and unpersisted direct device mutation (`esphome_device_builder/controllers/firmware/runner.py`, L218-237)Two convention violations here:
3. Tests reimplement production logic instead of exercising it (`tests/controllers/firmware/test_offline_queue.py`, L64-116)
for d in [mock_device]:
if d.state != DeviceState.ONLINE and getattr(d, "queued_update", False):
trigger_queued_update = TrueThis tests the duplicated code, not Drive the real Also note 🟡 Important1. Cargo-cult exception handling around bus.fire (`esphome_device_builder/controllers/firmware/runner.py`, L234-236)
2. Duplicated device lookup; inconsistent enum-vs-string state checks (`esphome_device_builder/controllers/firmware/controller.py`, L217-237)The configuration→device lookup loop is duplicated verbatim here and in Style: 3. Offline install silently changes semantics with no user signal (`esphome_device_builder/controllers/firmware/controller.py`, L224-237)When the device is offline, Make the queued state explicit in the job/response so the frontend can show "queued, will flash when online" rather than a stalled install. The author notes a companion frontend PR — ensure the WS response shape conveys this state, not just the new Checklist
To rebase specific severity levels, mention me: Silent Failure Analysis🟠 **HIGH** — swallowed exception (`esphome_device_builder/controllers/firmware/runner.py:234-237`)Risk: The flag is mutated in RAM but the event announcing it is caught and logged only at DEBUG (invisible in production), so a misconfigured bus or a typo'd EventType.DEVICE_UPDATED member silently drops the notification and the frontend never learns the firmware is queued. Fix: Let the event fire unguarded (or log at WARNING/ERROR and re-raise) so a broken bus/event-type surfaces instead of being swallowed; the event is load-bearing, not best-effort cleanup. 🟡 **MEDIUM** — unpersisted state hidden behind in-RAM flag (`esphome_device_builder/controllers/firmware/runner.py:230-232`)Risk: queued_update is set only on the in-RAM Device and is not written to the metadata sidecar, so a backend restart before the device returns online silently loses the queued flash and the compiled firmware is never installed. Fix: Persist queued_update through the device metadata Store (or reconcile against pending COMPILE jobs on startup) so the queued install survives a restart. 🟡 **MEDIUM** — discarded failure result (`esphome_device_builder/controllers/firmware/runner.py:222-228`)Risk: queued_update is set whenever the compile job succeeded, but the surrounding branch only inspects Fix: Gate the flag on the artifact actually existing (binary present / job type that produces a flashable image), not solely on the subprocess exit code. 🟡 **MEDIUM** — optional masking via getattr default (`esphome_device_builder/controllers/_device_state_monitor/controller.py:256-261`)Risk: queued_update is now a declared Device field, so the getattr-with-default defensively swallows a future rename/typo of the attribute into a permanent silent False, meaning queued updates would stop firing with no error. Fix: Access d.queued_update directly so a renamed/removed field raises instead of silently disabling the trigger. Automated review by Kōan (Claude) |
esphbot
left a comment
There was a problem hiding this comment.
Blocking issues found — see the review comment above.
What does this implement/fix?
Add detection when device is offline to local build and add a queue flag that triggers an OTA when the device wakes.
Related issue or feature (if applicable):
Types of changes
bugfixnew-featureenhancementbreaking-changerefactordocsmaintenancecidependenciesFrontend coordination
Checklist
ruff,codespell, yaml/json/python checks).tests/where applicable.components.index.json/definitions/components/*.jsonhave not been hand-edited (regenerate viascript/sync_components.pyif a sync is needed).docs/ARCHITECTURE.mdand/ordocs/API.md.