fix: don't flap a lock_offline repair during integration startup (#1257) #1260

Merged
raman325 merged 2 commits into main from fix/matter-startup-lock-offline-flap on Jun 15, 2026
@raman325 raman325 commented Jun 15, 2026

Proposed change

Closes the repair-issue flapping in #1257: after a Home Assistant restart, Lock Code Manager begins polling/writing before the lock's underlying integration has finished starting up. For Matter this surfaces as transient InvalidState: Not connected errors. Those failures feed the lock circuit breaker, and at POLL_FAILURE_ALERT_THRESHOLD (12) consecutive failures the coordinator raises the lock_offline repair — which is then auto-cleared the instant the integration finishes loading. The repair is created and dismissed entirely within the startup window, so the user sees a repair appear and vanish for a lock that was never actually offline.
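The flap described above can be reproduced with a small simulation. This is only a sketch of the pre-fix behaviour as the description states it, not the integration's code: `NaiveCoordinator`, `record_poll`, and `simulate_startup` are hypothetical names, and only `POLL_FAILURE_ALERT_THRESHOLD` and its value of 12 come from the PR.

```python
POLL_FAILURE_ALERT_THRESHOLD = 12  # value quoted in the PR description


class NaiveCoordinator:
    """Hypothetical stand-in for the pre-fix behaviour; not the real class."""

    def __init__(self) -> None:
        self.consecutive_failures = 0
        self.repair_active = False
        self.events: list[str] = []

    def record_poll(self, ok: bool) -> None:
        if ok:
            # Any success resets the counter and auto-clears the repair.
            self.consecutive_failures = 0
            if self.repair_active:
                self.repair_active = False
                self.events.append("lock_offline cleared")
            return
        self.consecutive_failures += 1
        if (
            self.consecutive_failures >= POLL_FAILURE_ALERT_THRESHOLD
            and not self.repair_active
        ):
            self.repair_active = True
            self.events.append("lock_offline raised")


def simulate_startup() -> list[str]:
    """Fail until the threshold is hit, then succeed once the integration loads."""
    coordinator = NaiveCoordinator()
    for _ in range(POLL_FAILURE_ALERT_THRESHOLD):
        coordinator.record_poll(ok=False)
    coordinator.record_poll(ok=True)
    return coordinator.events
```

Running `simulate_startup()` yields a raise immediately followed by a clear, which is the repair appearing and vanishing for a lock that was never offline.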

Fix

A lock that has never been reached is not "offline" — "offline" presupposes it was once online. The coordinator now tracks _reached_once (set on the first successful poll/push, in _reset_backoff) and only raises lock_offline once the lock has actually been reached:

  • Startup window (never reached, integration still loading): transient failures accumulate on the breaker as before, but no lock_offline repair is raised — nothing to flap.
  • Genuine outage (reached, then drops): _reached_once is True, so a sustained failure raises lock_offline exactly as it does today, and recovery clears it.
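The two cases above can be sketched as a gate on the alert. This is a simplified illustration under assumptions: `_reached_once`, `_reset_backoff`, and `POLL_FAILURE_ALERT_THRESHOLD` are named in the PR, while `SketchCoordinator`, `_record_failure`, and `lock_offline_raised` are hypothetical and the real coordinator is certainly more involved.

```python
POLL_FAILURE_ALERT_THRESHOLD = 12  # value quoted in the PR description


class SketchCoordinator:
    """Hypothetical, reduced model of the gating logic; not the real class."""

    def __init__(self) -> None:
        self._reached_once = False
        self._consecutive_failures = 0
        self.lock_offline_raised = False

    def _reset_backoff(self) -> None:
        # Called on a successful poll/push: the lock has now been reached.
        self._reached_once = True
        self._consecutive_failures = 0
        self.lock_offline_raised = False

    def _record_failure(self) -> None:
        # Failures always accumulate, but the repair is only raised
        # for a lock that was reached at least once.
        self._consecutive_failures += 1
        if (
            self._reached_once
            and self._consecutive_failures >= POLL_FAILURE_ALERT_THRESHOLD
        ):
            self.lock_offline_raised = True
```

In this model a lock that only ever fails never sets `lock_offline_raised`, while one that succeeds once and then fails 12 times in a row does.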

Relationship to #1258 / #1257

#1258 (released in 4.0.6) already reclassified the underlying transient Matter startup errors (unknown(133), InvalidState: Not connected) from fatal (CodeRejectedError → disable slot / unexpected error → suspend slot) to the retry path (LockDisconnected). That stopped the slot_disabled / slot_suspended repairs. But it rerouted those same failures into the connectivity breaker, which is what feeds lock_offline — so the flap moved there. This PR closes that remaining startup-window repair.
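As a rough illustration of the rerouting described above, the sketch below maps the two transient error strings onto a retryable exception and leaves everything else untouched. The matching-by-substring approach and the `reclassify` helper are assumptions made for the example; `LockDisconnected` here is a local stand-in for the class named in the PR, and #1258's actual implementation may differ.

```python
class LockDisconnected(Exception):
    """Local stand-in for the retryable error named in the PR."""


# The two transient startup errors quoted in the PR description.
TRANSIENT_STARTUP_MARKERS = ("unknown(133)", "InvalidState: Not connected")


def reclassify(exc: Exception) -> Exception:
    """Return a retryable error for transient startup failures (hypothetical helper)."""
    if any(marker in str(exc) for marker in TRANSIENT_STARTUP_MARKERS):
        return LockDisconnected(str(exc))
    # Anything else keeps its original, possibly fatal, classification.
    return exc
```

Because the reclassified error lands on the retry path, it counts toward the connectivity breaker, which is why the flap moved to `lock_offline`.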

This is unrelated to the verified-credential lifecycle work (#1259) and branches off main.

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New feature (which adds functionality)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • A previously-reachable lock that goes offline still alerts; only a never-yet-reached lock is suppressed (such a lock is "still starting up / misconfigured", surfaced by its own entity being unavailable and slots not syncing, not by a spurious "offline" repair).
  • Full Python suite green (1251). New tests: lock_offline not created when never reached (even past the threshold); created normally after a reach-then-drop; push_update marks the lock reached. The four existing lock_offline threshold tests were updated to establish a prior reach, reflecting the new semantic.
  • Accepted trade-off: _reached_once is per-coordinator in-memory state. A lock that is dead across an HA-restart boundary and is never reached afterward will not raise lock_offline (it was never "online" this session). This is intentional — it's the startup-flap fix — and is surfaced instead by the lock entity being unavailable and slots not syncing.
  • Fixes [ISSUE] Matter lock slots disabled after HA restart due to startup credential sync failures #1257.
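The accepted trade-off can be made concrete with a small model of the per-session state. `SessionState` and `should_raise_lock_offline` are hypothetical names used only for this sketch; the point is that the flag lives in memory and starts as `False` after every restart.

```python
from dataclasses import dataclass

POLL_FAILURE_ALERT_THRESHOLD = 12  # value quoted in the PR description


@dataclass
class SessionState:
    """Hypothetical per-coordinator state, rebuilt from scratch on each HA start."""

    reached_once: bool = False
    consecutive_failures: int = 0


def should_raise_lock_offline(state: SessionState) -> bool:
    # Both conditions must hold: a prior reach and a sustained failure run.
    return (
        state.reached_once
        and state.consecutive_failures >= POLL_FAILURE_ALERT_THRESHOLD
    )


# Same dead lock, before and after a restart.
before_restart = SessionState(reached_once=True, consecutive_failures=12)
after_restart = SessionState(consecutive_failures=12)  # flag was not persisted
```

Here `before_restart` qualifies for the repair and `after_restart` does not, which matches the behaviour the PR accepts for a lock that stays dead across the restart boundary.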

🤖 Generated with Claude Code

…reached

Closes the repair-issue flapping in #1257: after a Home Assistant restart,
LCM polls/writes before the lock's integration has finished starting up
(Matter surfaces this as `InvalidState: Not connected`). Those failures feed
the lock breaker, and at POLL_FAILURE_ALERT_THRESHOLD (12) the coordinator
raises the `lock_offline` repair -- which is then auto-cleared the instant the
integration finishes loading. The repair is created and dismissed entirely
within the startup window.

A lock that has never been reached is not "offline" -- "offline" presupposes it
was once online. Track `_reached_once` (set on the first successful poll/push
via `_reset_backoff`) and only raise `lock_offline` once the lock has actually
been reached. A lock that is reached and then drops still alerts normally; a
lock still coming up at startup no longer flaps a repair.

#1258 already routes the underlying transient Matter startup errors
(`unknown(133)`, `InvalidState: Not connected`) to the retry path, so they no
longer disable/suspend slots; this closes the remaining startup-window repair
they fed into via the connectivity breaker.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 15, 2026 02:07

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@github-actions github-actions Bot added the python (Pull requests that update Python code) and bug (Something isn't working) labels Jun 15, 2026
codecov Bot commented Jun 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.15%. Comparing base (ebac273) to head (e8330cc).
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1260   +/-   ##
=======================================
  Coverage   97.15%   97.15%           
=======================================
  Files          54       54           
  Lines        6434     6437    +3     
  Branches      461      461           
=======================================
+ Hits         6251     6254    +3     
  Misses        183      183           
Flag Coverage Δ
python 97.71% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
...components/lock_code_manager/domain/coordinator.py 96.63% <100.00%> (+0.08%) ⬆️

Review follow-up: a successful drift-detection hard refresh is a genuine
contact with the lock, but it did not flow through _reset_backoff, so it
never set _reached_once. Set it on drift success too, so a lock whose only
successful contact was via drift can still raise lock_offline on a later
outage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
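The follow-up commit above can be pictured as a second place where the flag is set. In this sketch `on_drift_refresh` and `DriftSketch` are hypothetical names; only `_reached_once` and `_reset_backoff` come from the PR, and the real drift-detection path is assumed rather than quoted.

```python
class DriftSketch:
    """Hypothetical, reduced model of the two success paths; not the real class."""

    def __init__(self) -> None:
        self._reached_once = False
        self._backoff_resets = 0

    def _reset_backoff(self) -> None:
        # Regular poll/push success path.
        self._reached_once = True
        self._backoff_resets += 1

    def on_drift_refresh(self, succeeded: bool) -> None:
        # A successful hard refresh is genuine contact with the lock, but it
        # does not go through _reset_backoff, so the flag is set here as well.
        if succeeded:
            self._reached_once = True
```

With this in place, a lock whose only successful contact was a drift refresh still counts as reached, so a later outage can raise `lock_offline`.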
@raman325 raman325 merged commit a7fb854 into main Jun 15, 2026
20 of 22 checks passed
@raman325 raman325 deleted the fix/matter-startup-lock-offline-flap branch June 15, 2026 03:14