Rework MQTT Connection Logic #2

Open

yulivee wants to merge 5 commits into master from feature/mqtt-rework-issues

Conversation

yulivee (Contributor) commented Feb 8, 2026

  • Replace hand-rolled exponential backoff reconnection (ScheduleReconnectionAsync + TrackBackgroundTask chaining)
    with a Polly ResiliencePipeline, eliminating unbounded task chain growth when the broker is unreachable (a rough
    sketch of such a pipeline is shown after this list)
  • Add connection loss detection: PublishAsync and SubscribeAsync now catch MqttClientNotConnectedException, update
    connection state, and automatically trigger the reconnection loop
  • Add bounded command queue (CommandQueueDepth, default 20) that discards and logs commands when full, preventing
    unbounded memory growth while disconnected
  • Make MaxReconnectionAttempts configurable (default 20) instead of retrying indefinitely
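
For readers unfamiliar with Polly's v8 API, here is a minimal sketch of what such a reconnection pipeline can look like. The option values mirror the defaults described above; ConnectCoreAsync is a placeholder for a single connection attempt, not the actual method from this PR.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.Retry;

// Placeholder for whatever performs one connection attempt in the real service.
static Task ConnectCoreAsync(CancellationToken ct) => Task.CompletedTask;

// Bounded retries with exponential backoff and jitter replace the hand-rolled
// ScheduleReconnectionAsync chaining described in the PR.
var reconnectPipeline = new ResiliencePipelineBuilder()
    .AddRetry(new RetryStrategyOptions
    {
        ShouldHandle = new PredicateBuilder().Handle<Exception>(),
        MaxRetryAttempts = 20,                      // MaxReconnectionAttempts default
        BackoffType = DelayBackoffType.Exponential,
        Delay = TimeSpan.FromSeconds(2),
        UseJitter = true
    })
    .Build();

// A single awaited pipeline execution replaces the chained background tasks.
await reconnectPipeline.ExecuteAsync(
    async ct => await ConnectCoreAsync(ct),
    CancellationToken.None);
```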

greptile-apps bot commented Feb 8, 2026

Greptile Overview

Greptile Summary

This PR successfully replaces hand-rolled exponential backoff reconnection logic with a Polly ResiliencePipeline, addressing unbounded task chain growth when the MQTT broker is unreachable.

Key improvements:

  • Polly-based retry strategy with configurable MaxReconnectionAttempts (default 20) prevents indefinite reconnection attempts
  • Connection loss detection via MqttClientNotConnectedException in PublishAsync and SubscribeAsync automatically triggers reconnection
  • Bounded command queue (CommandQueueDepth, default 20) prevents unbounded memory growth during disconnection
  • Enhanced disposal with timeout-based cleanup of background tasks

Issues found:

  • Critical race condition in EnqueueCommand (lines 359-369): the check-then-increment pattern is not atomic, allowing multiple threads to exceed the configured queue depth (see the sketch after this list)
  • Potential race condition in HandleConnectionLost (lines 229-244): _isConnected is modified without holding _connectionLock, which could cause inconsistent state during concurrent connection attempts
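
One way to close that check-then-increment race, sketched here under the assumption that the count and queue live in fields like those named in the review (this is not the PR's code): reserve a slot with Interlocked.Increment first and roll the reservation back if the bound was crossed.

```csharp
using System.Collections.Concurrent;
using System.Threading;

// Hedged sketch (not the PR's code): a bounded queue whose admission check cannot
// over-admit. A slot is reserved with Interlocked.Increment before enqueuing and
// released again if the depth was exceeded or when an item is dequeued.
sealed class BoundedCommandQueue
{
    private readonly ConcurrentQueue<(string Topic, string Payload)> _queue = new();
    private readonly int _depth;
    private int _count;

    public BoundedCommandQueue(int depth) => _depth = depth;

    public bool TryEnqueue(string topic, string payload)
    {
        if (Interlocked.Increment(ref _count) > _depth)
        {
            Interlocked.Decrement(ref _count);   // give the reserved slot back
            return false;                        // caller logs and discards the command
        }
        _queue.Enqueue((topic, payload));
        return true;
    }

    public bool TryDequeue(out (string Topic, string Payload) command)
    {
        if (_queue.TryDequeue(out command))
        {
            Interlocked.Decrement(ref _count);
            return true;
        }
        return false;
    }
}
```

System.Threading.Channels with Channel.CreateBounded and BoundedChannelFullMode.DropWrite would achieve the same bound with the bookkeeping handled by the framework.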

Minor improvements:

  • Added ArgumentNullException.ThrowIfNull validation across MQTT services
  • Refactored DimmerCommandPublisher to extract PublishPowerCommandAsync helper
  • Improved disposal of Rx subjects with OnCompleted() calls
  • Added double-dispose protection in AlarmStateMachine (the disposal pattern is sketched after this list)
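
A rough illustration of the disposal pattern these two bullets refer to; the class and subject names are placeholders rather than LumiRise types.

```csharp
using System;
using System.Reactive.Subjects;
using System.Threading;

// Rough illustration only: complete the Rx subject before disposing it, and use an
// Interlocked flag so a second Dispose() call is a no-op.
sealed class ConnectionStatePublisher : IDisposable
{
    private readonly Subject<bool> _connectionStateSubject = new();
    private int _disposed;

    public IObservable<bool> ConnectionState => _connectionStateSubject;

    public void Dispose()
    {
        if (Interlocked.Exchange(ref _disposed, 1) == 1)
        {
            return;                              // already disposed
        }
        _connectionStateSubject.OnCompleted();   // let subscribers complete gracefully
        _connectionStateSubject.Dispose();
    }
}
```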

Confidence Score: 3/5

  • This PR has solid architectural improvements but contains two race conditions that could cause issues under concurrent load
  • The Polly integration follows project conventions and solves the unbounded task chain problem. However, the race condition in EnqueueCommand could allow the queue to exceed its configured depth under concurrent access, and the unsynchronized _isConnected modification in HandleConnectionLost could lead to inconsistent connection state. These threading issues need to be resolved before merging.
  • Pay close attention to src/LumiRise.Api/Services/Mqtt/Implementation/MqttConnectionManager.cs lines 229-244 and 359-369 for race condition fixes

greptile-apps[bot]: This comment was marked as outdated.

greptile-apps bot commented Feb 8, 2026

Greptile Overview

Greptile Summary

Replaced hand-rolled exponential backoff reconnection with Polly ResiliencePipeline, eliminating unbounded task chain growth from the previous approach. Added connection loss detection via MqttClientNotConnectedException catching in PublishAsync/SubscribeAsync, and implemented bounded command queue with configurable depth to prevent unbounded memory growth while disconnected.

Confidence Score: 3/5

  • This PR has important improvements but contains critical concurrency bugs that need resolution before merge
  • The Polly integration is clean and eliminates unbounded task chains as intended. However, the deadlock risk from awaiting _queueDrainTask while holding _connectionLock (lines 155-156) is a blocking issue (see the sketch after this list), and the race condition in EnqueueCommand could cause count mismatches. The unsynchronized _isConnected check in the drain loop is minor but should be fixed. Test coverage with ErrorFailingLogger is excellent.
  • Pay close attention to src/LumiRise.Api/Services/Mqtt/Implementation/MqttConnectionManager.cs — the deadlock and race conditions need resolution
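
For context on that blocking issue, a hedged sketch of the usual remedy follows: snapshot the drain task while the lock is held, release the lock, then await outside it. The field names mirror the review, _connectionLock is assumed to be a SemaphoreSlim, and this is not the actual MqttConnectionManager code.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hedged sketch: never await the drain task while _connectionLock is held, since
// the drain work may itself need that lock and the await could then never complete.
sealed class DrainExample
{
    private readonly SemaphoreSlim _connectionLock = new(1, 1);
    private Task? _queueDrainTask;

    public async Task WaitForDrainAsync(CancellationToken ct)
    {
        Task? drainTask;

        await _connectionLock.WaitAsync(ct);
        try
        {
            drainTask = _queueDrainTask;   // snapshot under the lock
            _queueDrainTask = null;
        }
        finally
        {
            _connectionLock.Release();     // release before awaiting
        }

        if (drainTask is not null)
        {
            await drainTask;               // safe now that the lock is no longer held
        }
    }
}
```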

greptile-apps[bot]: This comment was marked as outdated.

greptile-apps bot commented Feb 8, 2026

Greptile Overview

Greptile Summary

Replaces hand-rolled exponential backoff reconnection with Polly ResiliencePipeline, eliminating unbounded task chain growth. Adds connection loss detection in PublishAsync and SubscribeAsync that catches MqttClientNotConnectedException, updates state, and triggers reconnection. Implements bounded command queue (default 20 items) that discards commands when full to prevent unbounded memory growth during disconnection. Makes MaxReconnectionAttempts configurable (default 20) instead of retrying indefinitely.

Key improvements:

  • Polly-based resilience replaces custom reconnection chaining (aligns with CLAUDE.md convention)
  • Bounded queue prevents memory leaks during extended disconnection
  • Automatic connection loss detection in publish/subscribe paths

Critical issue found:

  • Infinite recursion risk in the ProcessQueuedCommandsAsync → PublishAsync → EnqueueCommand loop when publish repeatedly fails with non-connection errors (e.g., timeouts)

Confidence Score: 2/5

  • Critical infinite recursion risk requires resolution before merge
  • Found critical infinite recursion vulnerability in ProcessQueuedCommandsAsync that can cause unbounded retries when publish fails with timeouts or other non-connection errors. Multiple race conditions noted in previous review threads remain unaddressed. The Polly integration and bounded queue are good improvements, but the drain loop logic has fundamental flaws
  • src/LumiRise.Api/Services/Mqtt/Implementation/MqttConnectionManager.cs requires attention - specifically the queue drain loop and command retry logic

greptile-apps bot left a comment

1 file reviewed, 5 comments

  private async Task ProcessQueuedCommandsAsync(CancellationToken ct)
  {
-     while (!ct.IsCancellationRequested && _commandQueue.TryDequeue(out var command))
+     while (!ct.IsCancellationRequested && Volatile.Read(ref _isConnected) && _commandQueue.TryDequeue(out var command))

Volatile.Read(ref _isConnected) checks volatile boolean, but _isConnected is not marked as volatile, making this read behave like a normal read

Suggested change
-     while (!ct.IsCancellationRequested && Volatile.Read(ref _isConnected) && _commandQueue.TryDequeue(out var command))
+     while (!ct.IsCancellationRequested && _isConnected && _commandQueue.TryDequeue(out var command))
Path: src/LumiRise.Api/Services/Mqtt/Implementation/MqttConnectionManager.cs, line 395

      {
          _logger.LogInformation("Processing queued command: {Topic} = {Payload}",
              command.Topic, command.Payload);
          await PublishAsync(command.Topic, command.Payload, ct);

Infinite recursion risk: ProcessQueuedCommandsAsync calls PublishAsync, which on failure calls EnqueueCommand to re-queue the item and may continue the drain loop without exiting. If PublishAsync fails repeatedly (e.g., timeout), the same command gets re-queued and re-processed indefinitely

Path: src/LumiRise.Api/Services/Mqtt/Implementation/MqttConnectionManager.cs, line 402

          _logger.LogError(ex, "Error processing queued command");
          // Re-queue on error
-         _commandQueue.Enqueue(command);
+         EnqueueCommand(command.Topic, command.Payload);

Re-queuing on generic exceptions could cause infinite retry loop for commands that will always fail (e.g., malformed payload). Consider logging and discarding instead of re-queuing, or add retry limit per command

Path: src/LumiRise.Api/Services/Mqtt/Implementation/MqttConnectionManager.cs, line 417
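
A hedged sketch of the per-command retry limit suggested above; QueuedCommand and the limit of 3 are illustrative rather than the PR's actual types.

```csharp
// Hedged sketch: carry an attempt counter with each queued command so the drain
// loop drops it after a few failures instead of re-queuing it forever.
public sealed record QueuedCommand(string Topic, string Payload, int Attempts = 0);

public static class CommandRetryPolicy
{
    public const int MaxAttemptsPerCommand = 3;

    // Returns the command to re-queue, or null when it should be logged and dropped.
    public static QueuedCommand? NextAttemptOrDrop(QueuedCommand failed) =>
        failed.Attempts + 1 >= MaxAttemptsPerCommand
            ? null
            : failed with { Attempts = failed.Attempts + 1 };
}
```

The drain loop's catch block would then call NextAttemptOrDrop and only re-queue a non-null result, logging and discarding the command otherwise.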

result.ResultCode);
_connectionStateSubject.OnNext(connectionState);
// Process any queued commands (fire-and-forget; tracked via _queueDrainTask for disposal)
_queueDrainTask = ProcessQueuedCommandsAsync(_disposalCts.Token);

Fire-and-forget assignment creates cleanup uncertainty. _queueDrainTask is assigned a running task without awaiting, and later disposal logic depends on this field being set. If ConnectCoreAsync is called multiple times or disposal happens before assignment completes, tracking could fail

Path: src/LumiRise.Api/Services/Mqtt/Implementation/MqttConnectionManager.cs, line 155
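
One hedged way to make that tracking deterministic, assuming ConnectCoreAsync already runs while _connectionLock is held so this check is not itself racy (names are illustrative, not the PR's code):

```csharp
// Start a new drain task only when the previous one has finished, and keep the
// handle so disposal always has exactly one task to await.
if (_queueDrainTask is null || _queueDrainTask.IsCompleted)
{
    _queueDrainTask = Task.Run(
        () => ProcessQueuedCommandsAsync(_disposalCts.Token),
        _disposalCts.Token);
}
```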

try
// Check capacity before enqueuing (slight over-admit is acceptable
// under contention; the queue depth is a soft limit)
if (Volatile.Read(ref _queuedCommandCount) >= _options.CommandQueueDepth)

Volatile.Read used for _queuedCommandCount but Interlocked operations on same field. Since Interlocked already provides memory barriers, Volatile.Read is unnecessary here

Path: src/LumiRise.Api/Services/Mqtt/Implementation/MqttConnectionManager.cs, line 378
