Ensure that message IDs are used in the correct order#42
Merged
saml-dev merged 7 commits intosaml-dev:mainfrom Dec 10, 2025
Merged
Ensure that message IDs are used in the correct order#42saml-dev merged 7 commits intosaml-dev:mainfrom
saml-dev merged 7 commits intosaml-dev:mainfrom
Conversation
Previously, some were named `a` and some were named `app`. Name them all `app` for consistency.
The functions `SubscribeToEventType()` and `SubscribeToStateChangedEvents()` didn't use their `ctx` parameters, and they don't really make sense. So remove them.
For now, it only includes a single method, `WriteMessage()`, provided by `*Conn`. But that will change soon.
There's soon going to be another lock, so give this one a more specific name.
Use new `App` methods to call HA services and fire HA events. This is not super interesting yet, but the work done by `Call()` will increase soon, making this helper method more of a win.
Instead of using a global function `internal.GetId()` to generate message IDs, make this a feature of `websocket.Conn`. This makes more logical sense, and it will also make it possible to fix a race in the next commit.
The websocket API insists that message IDs be monotonically increasing (i.e., not just unique). That means that we need to hold a lock the whole time from when we allocate the message ID until we send the message. Change the interface for sending messages to make this safe: * Move the "write" methods of `Conn` to a new type, `LockedConn`. * Add a method `Conn.Send()`, which takes a callback function as an argument. The callback gets a `LockedConn` as an argument. (This is the only way to get a hold of a `LockedConn`.) The callback is invoked while `Conn.writeLock` is held. The callback can thus allocate new message IDs and send messages with those IDs without fear that another thread interferes with it.
This was referenced Dec 7, 2025
Owner
|
Thanks for calling out the caveat. I will keep updating and running my home automations on the latest master as I merge your PRs, and wait to cut a release until later. That way we can get some testing in. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Home Assistant insists that message IDs be monotonically increasing. But with Go's concurrency, that wasn't being guaranteed. It could happen that
Aallocates aninternal.NextID()equal to 42.Ballocates aninternal.NextID()equal to 43.Bsends a message with message ID 43.Atries to send a message with message ID 42, but it fails, because Home Assistant has already seen message ID 43.I was indeed occasionally seeing problems like this.
To fix the problem, a connection-level lock has to be held from the time that the message ID is allocated until the message is sent, to prevent another thread from allocating its own message ID between those two actions. Ensure that the locking is done correctly using a trick similar to that used by
syscall.RawConn:websocket.Conn. This is the correct scope, because different websocket connections wouldn't have a problem about using overlapping message IDs.LockedConn. Move methodsNextMessageID()andSendMessage()fromConntoLockedConn. These are the methods that are critical with respect to message ID ordering.(*Conn).Send(), which locks theConn, creates a temporaryLockedConn, and passes it to a callback. The callback thus has exclusive access to the message IDs while it is running, and (via theLockedConninterface) has access to the critical methods.The hope is that this (somewhat subtle) mechanism only needs to be used in the implementation of
App, and not by client code. In that spirit, add methodsApp.Call()andApp.FireEvent(), so that even the service definitions don't need to know aboutLockedConn.This PR is orthogonal to #41 and they don't conflict.
Caveat: I still can't run my own home setup on this branch, because it relies on other changes in #21 that haven't been re-rolled yet. So, while every commit compiles and
example_live_test.goruns correctly at the tip of this branch, I can't do a lot more testing than that./cc @saml-dev