Fix: Improve MQTT reconnection reliability with active reconnection#26
Fix: Improve MQTT reconnection reliability with active reconnection#26
Conversation
- Implement active reconnection that recreates MQTT connection on interruption - Remove unreliable passive fallback to AWS IoT SDK automatic reconnection - Make reconnect_func required parameter in MqttReconnectionHandler - Add _active_reconnect() method to NavienMqttClient - Emit reconnection_failed event when max attempts are exhausted - Improve error handling and logging during reconnection - Resolves AWS_ERROR_MQTT_UNEXPECTED_HANGUP connection failures Breaking Internal Change: - MqttReconnectionHandler now requires reconnect_func parameter - Public API (NavienMqttClient) unchanged - no user impact Fixes connection issues where reconnection would fail after network interruptions, causing 'Failed to reconnect after 10 attempts' errors.
There was a problem hiding this comment.
Pull Request Overview
This PR fixes MQTT reconnection failures by replacing passive waiting for AWS IoT SDK auto-reconnection with active reconnection that always recreates the MQTT connection. The changes ensure reliable recovery from network interruptions like AWS_ERROR_MQTT_UNEXPECTED_HANGUP.
Key Changes:
- Made
reconnect_funcparameter required (breaking internal change) inMqttReconnectionHandler - Added
_active_reconnect()method toNavienMqttClientthat recreates MQTT connections with fresh authentication - Enhanced reconnection handler to emit
reconnection_failedevents and check connection state during retry attempts
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
src/nwp500/mqtt_reconnection.py |
Updated reconnection handler to require reconnect_func, implement active reconnection checks, and emit failure events |
src/nwp500/mqtt_client.py |
Added _active_reconnect() method and wired it to reconnection handler with event emission |
CHANGELOG.rst |
Documented the fix for version 3.1.3 |
.github/copilot-instructions.md |
Added instruction to use date command for getting current dates |
src/nwp500/mqtt_reconnection.py
Outdated
| config: "MqttConnectionConfig", | ||
| is_connected_func: Callable[[], bool], | ||
| schedule_coroutine_func: Callable[[Any], None], | ||
| reconnect_func: Callable[[], Any], |
There was a problem hiding this comment.
The type hint Callable[[], Any] should be more specific. Based on the usage in _active_reconnect() which is async, this should be Callable[[], Awaitable[None]] to accurately reflect that it's an async function returning None. Update the docstring to clarify this is an async callable.
src/nwp500/mqtt_client.py
Outdated
|
|
||
| # Update subscription manager with new connection | ||
| if self._subscription_manager and self._connection: | ||
| self._subscription_manager._connection = ( |
There was a problem hiding this comment.
Directly accessing the private _connection attribute of _subscription_manager violates encapsulation. Consider adding a public method like update_connection() or a setter property to _subscription_manager to handle connection updates properly.
| self._subscription_manager._connection = ( | |
| self._subscription_manager.update_connection( |
| _logger.info("Active reconnection successful") | ||
| else: | ||
| # Restore old connection manager | ||
| self._connection_manager = old_connection_manager |
There was a problem hiding this comment.
When restoring the old connection manager on failure, the _connection reference is not restored. This could leave self._connection pointing to a failed connection while self._connection_manager is the old one, creating an inconsistent state. Add self._connection = old_connection_manager.connection after this line.
| self._connection_manager = old_connection_manager | |
| self._connection_manager = old_connection_manager | |
| self._connection = old_connection_manager.connection |
When restoring old connection manager on failure, also restore the _connection reference to maintain consistent state. Prevents self._connection from pointing to a failed connection while self._connection_manager points to the old one.
|
✅ Fixed inconsistent state issue Good catch on the connection reference inconsistency! Issue: When reconnection failed, we restored Fix: Now restoring both references to maintain consistent state: # Restore old connection manager and connection reference
self._connection_manager = old_connection_manager
self._connection = old_connection_manager.connectionAll checks still passing:
|
- Change reconnect_func type from Callable[[], Any] to Callable[[], Awaitable[None]] - Change emit_event_func type to Callable[..., Awaitable[Any]] to match EventEmitter.emit signature - Add import for Awaitable from collections.abc - Update docstrings to clarify these are async callables - More accurate type hints for better type safety and IDE support
|
✅ Improved type hints for async callables Made the type hints more specific and accurate: Changes:
Benefits:
All checks passing:
|
…ion manager - Add update_connection() public method to MqttSubscriptionManager - Replace direct access to _connection attribute with proper method call - Improves encapsulation and maintainability - Provides clear API for updating connection after reconnection - Includes documentation explaining the method's purpose
|
✅ Fixed encapsulation violation Added proper public method instead of directly accessing private attributes. Issue: Direct access to Solution:
Benefits:
All checks passing:
|
Summary
Fixes MQTT reconnection failures after network interruptions by implementing active reconnection instead of relying on unreliable passive waiting for AWS IoT SDK auto-reconnection.
Problem
The MQTT client was experiencing persistent connection failures:
AWS_ERROR_MQTT_UNEXPECTED_HANGUP: Connection unexpectedly closedFailed to reconnect after 10 attempts. Manual reconnection requiredThe root cause was that the reconnection handler only passively waited for AWS IoT SDK to automatically reconnect, which is unreliable with WebSocket connections.
Solution
Implemented active reconnection that always recreates the MQTT connection:
Changes Made
1.
mqtt_reconnection.py(50 lines changed)reconnect_funcparameter is now required (not Optional)reconnection_failedevent when max attempts reached2.
mqtt_client.py(67 lines added)_active_reconnect()method that:3.
CHANGELOG.rst(19 lines added)4.
.github/copilot-instructions.md(2 lines added)date +\"%Y-%m-%d\"for datesImpact
✅ Public API Unchanged:
⚠️ Internal Breaking Change:
NavienMqttClientworks exactly as before - no user changes neededMqttReconnectionHandlernow requiresreconnect_funcparameter✅ Compatible: Works with existing auto-recovery examples
Testing
Benefits
Fixes connection issues that were causing coordinator timeouts and reconnection failures in Home Assistant integrations.