For any future developers of the client, here's some general info, including dev workflow, bugs, and future improvements.
Tips
- We noticed that lowering the log level (effectively disabling logging) significantly improves performance and allows for much higher framerates. To do this, open idf.py menuconfig (described in the client README), search (press "/") for "LOG_DEFAULT_LEVEL", and change "Default log verbosity" to "No output". To re-enable logging, change this setting back to a level such as "Debug".
- I highly recommend using clangd for LSP support in any editor when developing the client. For more information on how to get started, read the client README.
- We initially used Arduino, then switched to esp-idf with the Arduino library, then finally switched to just esp-idf. For more information on our reasoning, check out this PR.
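For the logging tip above, the same setting can probably be pinned in a defaults file so that a fresh checkout starts with logging off. This is only a sketch: it assumes the project keeps (or is willing to add) an sdkconfig.defaults file in the project root, which the client may not currently do. An already generated sdkconfig generally takes precedence, so it may need to be regenerated for the default to apply.

```
# sdkconfig.defaults (assumed file, not confirmed to exist in this repo)
# Equivalent of choosing "No output" for "Default log verbosity" in menuconfig.
CONFIG_LOG_DEFAULT_LEVEL_NONE=y
```

To get logs back, swap the line for a different level such as CONFIG_LOG_DEFAULT_LEVEL_DEBUG=y, or change it through menuconfig as described above.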
Bugs
- Random crashes at higher framerates.
- We know what triggers the crash, but not why it happens. Sometimes reading the message size here returns either 0 or an extremely large number. This can happen for two reasons: either we are reading misaligned (starting partway through a message instead of at its beginning), or the data is being corrupted somewhere. It's not entirely clear which. We've printed the bytes on the server and confirmed it is sending the correct ones, and also confirmed we are reading the correct bytes on the client, but it's possible we are missing something.
- Here is a very hacky fix for the issue, added temporarily so the device wouldn't crash. We close the socket when the issue occurs because otherwise there's a good chance we'd read the next message starting at the wrong byte.
- This issue also extends to set_leds_batched: sometimes we read everything normally, but some of the batch lengths in the header are, again, very large and incorrect numbers.
- Some panels lag behind and need to play catch up.
- Occasionally the network lags behind and we read a lot of packets in one big burst. This is expected, but the way we handle this scenario is not robust. The problem is that we use a TCP socket for everything, from sending general messages to updating frames on-demand. TCP guarantees in-order delivery, meaning we must handle all previous frames before we can handle the new frames. Ideally we'd open a separate UDP socket (in conjunction with the existing TCP socket) when we start streaming so we have the choice to drop older frames and instantly catch up.
- This issue is exacerbated when the TCP receive buffer fills up, which is especially noticeable at higher framerates. Interestingly, when this occurs, it usually also triggers the "random crash" bug, described earlier. Note that you can technically increase the TCP buffer size further (we've already done this) but eventually it will fill up again.
- Synchronization issues.
- This issue builds on the previous one, but focuses on improving general timing/synchronization outside of the TCP issues. If the microcontrollers receive the "set_leds" message for the same frame at different times, the stream may appear out of sync. We never got around to solving this issue, but we had a few ideas on how to go about it.
- As a quick solution, we initially developed a separate "redraw" message that we'd send to all microcontrollers after the "set_leds" message. We found that adding this extra message slowed down the microcontroller substantially, for reasons that aren't clear. It's not a perfect solution, but it is definitely an improvement.
- Another option is to have a clock on each microcontroller via network time protocol (NTP), and redraw each frame at a specific timestamp so each microcontroller has enough time to handle the "set_leds." This isn't a bad solution but I'm also not sure if it's the best one.
- The last option we thought of was to have each microcontroller run its own "render" loop, where it maintains its own clock and redraws at a precise interval. If that interval is out of sync (updating 1ms ahead of the other microcontrollers), we can sync it back up with the server. Note that this "sync with server" step would usually happen automatically at a particular interval and this solution would also utilize NTP.
- These are all just potential improvements. We generally found that with a high enough framerate and a stable enough network, desync was unnoticeable to the eye. However, I expect this issue to be more obvious when running over the school network.
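For the "random crashes" bug, one thing that may be worth ruling out is short reads: a TCP read can return fewer bytes than requested, and a reader that assumes otherwise ends up misaligned on the next message. Below is a minimal sketch of a read-exactly loop plus a sanity check on the size field. It is not the client's actual code. The 4-byte big-endian header, MAX_MESSAGE_SIZE, read_fn, and the in-memory mem_stream stand-in for a socket are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical upper bound on one message; choose one that fits the real protocol. */
#define MAX_MESSAGE_SIZE 4096u

/* Reader with recv()-like semantics: bytes read (> 0), 0 on close, negative
 * on error. It is allowed to return fewer bytes than requested. */
typedef long (*read_fn)(void *ctx, void *buf, size_t len);

/* Keep calling the reader until exactly len bytes have arrived. */
static int read_exact(read_fn reader, void *ctx, void *buf, size_t len)
{
    assert(reader != NULL && buf != NULL);
    uint8_t *out = buf;
    size_t got = 0;
    while (got < len) {
        long n = reader(ctx, out + got, len - got);
        if (n <= 0) {
            return -1; /* closed or failed partway through */
        }
        got += (size_t)n;
    }
    return 0;
}

/* Read one message: assumed 4-byte big-endian size, then the payload.
 * Returns the payload length, or -1 if the connection should be dropped. */
static long read_message(read_fn reader, void *ctx, uint8_t *payload, size_t cap)
{
    uint8_t header[4];
    if (read_exact(reader, ctx, header, sizeof header) != 0) {
        return -1;
    }
    uint32_t size = ((uint32_t)header[0] << 24) | ((uint32_t)header[1] << 16) |
                    ((uint32_t)header[2] << 8) | (uint32_t)header[3];
    /* Reject impossible sizes before allocating or reading anything. */
    if (size == 0 || size > MAX_MESSAGE_SIZE || size > cap) {
        return -1;
    }
    if (read_exact(reader, ctx, payload, size) != 0) {
        return -1;
    }
    return (long)size;
}

/* In-memory stand-in for a socket that hands out at most `chunk` bytes per call. */
struct mem_stream {
    const uint8_t *data;
    size_t len;
    size_t pos;
    size_t chunk;
};

static long mem_read(void *ctx, void *buf, size_t len)
{
    struct mem_stream *s = ctx;
    size_t left = s->len - s->pos;
    if (left == 0) {
        return 0; /* behaves like an orderly close */
    }
    size_t n = len < left ? len : left;
    if (n > s->chunk) {
        n = s->chunk;
    }
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return (long)n;
}
```

On the device, the reader callback would presumably be a thin wrapper around recv() on the existing socket. If the size field still comes back as garbage with a loop like this in place, short reads can at least be crossed off the list.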
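For the "panels lag behind" bug, the UDP idea boils down to keeping only the newest frame and discarding anything older. Here is a rough sketch of that "latest frame wins" logic. It assumes each frame carries a 32-bit sequence number, which the current protocol may not have. frame_slot, FRAME_BYTES, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_BYTES 64u /* hypothetical payload size per frame */

struct frame_slot {
    bool have_seq;   /* has any frame been accepted yet? */
    bool pending;    /* is there a frame waiting to be drawn? */
    uint32_t seq;    /* sequence number of the newest accepted frame */
    uint8_t pixels[FRAME_BYTES];
};

/* Wraparound-safe "a is newer than b" for 32-bit sequence numbers. */
static bool seq_newer(uint32_t a, uint32_t b)
{
    uint32_t diff = a - b;
    return diff != 0 && diff < 0x80000000u;
}

/* Offer a received frame. Returns true if it replaced the stored frame,
 * false if it was dropped as stale or duplicate. */
static bool slot_offer(struct frame_slot *slot, uint32_t seq, const uint8_t *pixels)
{
    assert(slot != NULL && pixels != NULL);
    if (slot->have_seq && !seq_newer(seq, slot->seq)) {
        return false;
    }
    memcpy(slot->pixels, pixels, FRAME_BYTES);
    slot->seq = seq;
    slot->have_seq = true;
    slot->pending = true;
    return true;
}

/* Take the newest frame for drawing, if one is waiting. */
static bool slot_take(struct frame_slot *slot, uint8_t *out)
{
    assert(slot != NULL && out != NULL);
    if (!slot->pending) {
        return false;
    }
    memcpy(out, slot->pixels, FRAME_BYTES);
    slot->pending = false;
    return true;
}
```

After a burst, the receive loop would call slot_offer for every datagram it drains and the draw step would call slot_take once, so only the most recent frame reaches the LEDs. If the receiving and drawing code run on different tasks, the slot would also need a lock, which this sketch leaves out.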
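For the render loop idea, the core calculation is small: each microcontroller converts its local time to a shared timeline and sleeps until the next frame boundary on that timeline. The sketch below assumes the offset between local and shared time is already known (from NTP or a sync message from the server). All names are hypothetical and times are in microseconds.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a local timestamp to the shared timeline using the offset
 * learned from the last sync (assumed to come from NTP or the server). */
static int64_t to_shared_us(int64_t local_us, int64_t offset_us)
{
    return local_us + offset_us;
}

/* Next redraw deadline on the shared timeline: the first multiple of
 * interval_us strictly after shared_now_us. */
static int64_t next_tick_us(int64_t shared_now_us, int64_t interval_us)
{
    assert(interval_us > 0 && shared_now_us >= 0);
    return (shared_now_us / interval_us + 1) * interval_us;
}

/* How long this device should wait before its next redraw. */
static int64_t delay_until_next_tick_us(int64_t local_now_us, int64_t offset_us,
                                        int64_t interval_us)
{
    int64_t shared_now = to_shared_us(local_now_us, offset_us);
    return next_tick_us(shared_now, interval_us) - shared_now;
}
```

Because every device rounds up to the same boundary, two microcontrollers with different local clocks but accurate offsets end up redrawing at the same shared instant. How accurate the offsets can be in practice depends on the network, so this only narrows the desync rather than removing it.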
If you have any questions about the client, feel free to leave a comment on this issue and I will respond when I get the chance.