fix: don't flap a lock_offline repair during integration startup (#1257) by raman325 · Pull Request #1260 · raman325/lock_code_manager

raman325 · 2026-06-15T02:07:00Z

Proposed change

Closes the repair-issue flapping in #1257: after a Home Assistant restart, Lock Code Manager begins polling/writing before the lock's underlying integration has finished starting up. For Matter this surfaces as transient InvalidState: Not connected errors. Those failures feed the lock circuit breaker, and at POLL_FAILURE_ALERT_THRESHOLD (12) consecutive failures the coordinator raises the lock_offline repair — which is then auto-cleared the instant the integration finishes loading. The repair is created and dismissed entirely within the startup window, so the user sees a repair appear and vanish for a lock that was never actually offline.
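The flap described above can be reproduced with a small model. This is only a sketch of the pre-fix behaviour, not the integration's code: `PreFixCoordinator`, `record_failure`, `record_success` and `simulate_startup` are hypothetical names, and only `POLL_FAILURE_ALERT_THRESHOLD` (12) and the `lock_offline` repair come from the PR.

```python
POLL_FAILURE_ALERT_THRESHOLD = 12  # value quoted in the PR description


class PreFixCoordinator:
    """Hypothetical stand-in for the coordinator's behaviour before this PR."""

    def __init__(self) -> None:
        self.consecutive_failures = 0
        self.repair_active = False
        self.events: list[str] = []  # ordered record of repair actions

    def record_failure(self) -> None:
        # Every failed poll/write counts toward the alert threshold.
        self.consecutive_failures += 1
        if (
            self.consecutive_failures >= POLL_FAILURE_ALERT_THRESHOLD
            and not self.repair_active
        ):
            self.repair_active = True
            self.events.append("create lock_offline")

    def record_success(self) -> None:
        # The first success resets the counter and auto-clears the repair.
        self.consecutive_failures = 0
        if self.repair_active:
            self.repair_active = False
            self.events.append("clear lock_offline")


def simulate_startup(failures_before_ready: int) -> list[str]:
    """Fail N times while the integration loads, then succeed once."""
    coordinator = PreFixCoordinator()
    for _ in range(failures_before_ready):
        coordinator.record_failure()
    coordinator.record_success()
    return coordinator.events
```

In this model, `simulate_startup(12)` records a create followed immediately by a clear, which is the flap; any startup that recovers in fewer than 12 failures records nothing.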

Fix

A lock that has never been reached is not "offline" — "offline" presupposes it was once online. The coordinator now tracks _reached_once (set on the first successful poll/push, in _reset_backoff) and only raises lock_offline once the lock has actually been reached:

- Startup window (never reached, integration still loading): transient failures accumulate on the breaker as before, but no lock_offline repair is raised, so there is nothing to flap.
- Genuine outage (reached, then drops): _reached_once is True, so a sustained failure raises lock_offline exactly as it does today, and recovery clears it.
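The two cases above can be sketched as a gate on the alert. This is a minimal illustration under assumptions, not the actual coordinator: `_reached_once`, `_reset_backoff`, `lock_offline` and `POLL_FAILURE_ALERT_THRESHOLD` are named in the PR, while `GatedCoordinator`, `_record_failure` and the `lock_offline_raised` flag are hypothetical.

```python
POLL_FAILURE_ALERT_THRESHOLD = 12  # value quoted in the PR description


class GatedCoordinator:
    """Hypothetical model of the post-fix alerting rule."""

    def __init__(self) -> None:
        self._reached_once = False  # in-memory, starts False each session
        self._consecutive_failures = 0
        self.lock_offline_raised = False

    def _reset_backoff(self) -> None:
        # Called on a successful poll/push: the lock has now been reached.
        self._reached_once = True
        self._consecutive_failures = 0
        self.lock_offline_raised = False  # recovery clears the repair

    def _record_failure(self) -> None:
        # Failures always accumulate, but only alert for a reached lock.
        self._consecutive_failures += 1
        if (
            self._reached_once
            and self._consecutive_failures >= POLL_FAILURE_ALERT_THRESHOLD
        ):
            self.lock_offline_raised = True
```

A coordinator that only ever sees failures never sets `lock_offline_raised`, however many accumulate; one that succeeds once and then fails 12 times in a row does.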

Relationship to #1258 / #1257

#1258 (released in 4.0.6) already reclassified the underlying transient Matter startup errors (unknown(133), InvalidState: Not connected) from fatal (CodeRejectedError → disable slot / unexpected error → suspend slot) to the retry path (LockDisconnected). That stopped the slot_disabled / slot_suspended repairs. But it rerouted those same failures into the connectivity breaker, which is what feeds lock_offline — so the flap moved there. This PR closes that remaining startup-window repair.
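As a rough illustration of the rerouting that #1258 introduced, the sketch below sends the two quoted error strings to the retry path. The exception names `LockDisconnected` and `CodeRejectedError` come from the PR, but the classes here are local stand-ins; `classify_error`, the marker tuple, the substring matching, and the fallback to `CodeRejectedError` for everything else are simplifying assumptions rather than the integration's logic.

```python
class LockDisconnected(Exception):
    """Stand-in for the retryable error named in the PR."""


class CodeRejectedError(Exception):
    """Stand-in for the fatal error named in the PR."""


# Error fragments quoted in the PR; matching by substring is an assumption.
TRANSIENT_STARTUP_MARKERS = ("unknown(133)", "InvalidState: Not connected")


def classify_error(message: str) -> type[Exception]:
    """Return the exception type a failure message would be routed to."""
    if any(marker in message for marker in TRANSIENT_STARTUP_MARKERS):
        return LockDisconnected  # retry path, feeds the connectivity breaker
    return CodeRejectedError  # simplified fatal path for everything else
```

Because the retry path feeds the connectivity breaker, every message classified as `LockDisconnected` during startup counts toward the `lock_offline` threshold, which is where the flap reappeared.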

This is unrelated to the verified-credential lifecycle work (#1259) and branches off main.

Type of change

Dependency upgrade
Bugfix (non-breaking change which fixes an issue)
New feature (which adds functionality)
Breaking change (fix/feature causing existing functionality to break)
Code quality improvements to existing code or addition of tests

Additional information

A previously-reachable lock that goes offline still alerts; only a never-yet-reached lock is suppressed (such a lock is "still starting up / misconfigured", surfaced by its own entity being unavailable and slots not syncing, not by a spurious "offline" repair).
Full Python suite green (1251). New tests: lock_offline not created when never reached (even past the threshold); created normally after a reach-then-drop; push_update marks the lock reached. The four existing lock_offline threshold tests were updated to establish a prior reach, reflecting the new semantic.
Accepted trade-off: _reached_once is per-coordinator in-memory state. A lock that is dead across an HA-restart boundary and is never reached afterward will not raise lock_offline (it was never "online" this session). This is intentional — it's the startup-flap fix — and is surfaced instead by the lock entity being unavailable and slots not syncing.
Fixes [ISSUE] Matter lock slots disabled after HA restart due to startup credential sync failures #1257.

🤖 Generated with Claude Code

…reached

Closes the repair-issue flapping in #1257: after a Home Assistant restart, LCM polls/writes before the lock's integration has finished starting up (Matter surfaces this as `InvalidState: Not connected`). Those failures feed the lock breaker, and at POLL_FAILURE_ALERT_THRESHOLD (12) the coordinator raises the `lock_offline` repair -- which is then auto-cleared the instant the integration finishes loading. The repair is created and dismissed entirely within the startup window.

A lock that has never been reached is not "offline" -- "offline" presupposes it was once online. Track `_reached_once` (set on the first successful poll/push via `_reset_backoff`) and only raise `lock_offline` once the lock has actually been reached. A lock that is reached and then drops still alerts normally; a lock still coming up at startup no longer flaps a repair.

#1258 already routes the underlying transient Matter startup errors (`unknown(133)`, `InvalidState: Not connected`) to the retry path, so they no longer disable/suspend slots; this closes the remaining startup-window repair they fed into via the connectivity breaker.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

codecov · 2026-06-15T02:10:21Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.15%. Comparing base (ebac273) to head (e8330cc).
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1260   +/-   ##
=======================================
  Coverage   97.15%   97.15%           
=======================================
  Files          54       54           
  Lines        6434     6437    +3     
  Branches      461      461           
=======================================
+ Hits         6251     6254    +3     
  Misses        183      183

| Flag | Coverage Δ | |
|---|---|---|
| python | `97.71% <100.00%> (+<0.01%)` | ⬆️ |

| Files with missing lines | Coverage Δ | |
|---|---|---|
| ...components/lock_code_manager/domain/coordinator.py | `96.63% <100.00%> (+0.08%)` | ⬆️ |

Review follow-up: a successful drift-detection hard refresh is a genuine contact with the lock, but it did not flow through _reset_backoff, so it never set _reached_once. Set it on drift success too, so a lock whose only successful contact was via drift can still raise lock_offline on a later outage. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
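The follow-up can be pictured as a second place where the reached flag is set. This is a sketch under assumptions: `_reached_once`, `lock_offline` and the threshold come from the PR, while `DriftAwareCoordinator`, `on_drift_refresh_success` and `record_failure` are hypothetical names for illustration.

```python
POLL_FAILURE_ALERT_THRESHOLD = 12  # value quoted in the PR description


class DriftAwareCoordinator:
    """Hypothetical model: drift success also counts as reaching the lock."""

    def __init__(self) -> None:
        self._reached_once = False
        self._consecutive_failures = 0
        self.lock_offline_raised = False

    def on_drift_refresh_success(self) -> None:
        # A successful hard refresh is genuine contact with the lock, even
        # though it does not go through the normal backoff reset.
        self._reached_once = True

    def record_failure(self) -> None:
        self._consecutive_failures += 1
        if (
            self._reached_once
            and self._consecutive_failures >= POLL_FAILURE_ALERT_THRESHOLD
        ):
            self.lock_offline_raised = True
```

Without the assignment in `on_drift_refresh_success`, a lock whose only contact was a drift refresh would stay in the never-reached state and a later outage would go unreported.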

Copilot AI review requested due to automatic review settings June 15, 2026 02:07

Copilot AI reviewed Jun 15, 2026

github-actions Bot added the python (Pull requests that update Python code) and bug (Something isn't working) labels Jun 15, 2026

raman325 merged commit a7fb854 into main Jun 15, 2026
20 of 22 checks passed

raman325 deleted the fix/matter-startup-lock-offline-flap branch June 15, 2026 03:14

raman325 merged 2 commits into main from fix/matter-startup-lock-offline-flap
