voice-command: optional LED indicator with listening/thinking states #14

Open
jckras wants to merge 9 commits into main from voice-command-led-indicator

Contributor

@jckras jckras commented May 21, 2026

Summary

  • Adds a new model viam:conversation-bundle:led-bridge (rdk:service:generic) that opens a USB-serial port and forwards DoCommand payloads as line-delimited JSON to an external LED indicator firmware. The bridge itself is payload-agnostic — semantics live entirely in the firmware on the other end.
  • Adds an optional led_indicator field on viam:conversation-bundle:voice-command (resource name of a generic resource). When set, voice-command fires three states at the indicator on lifecycle transitions:
    • {"state": "listening"} — STT actively capturing speech (post wake-word, or in conversation follow-up mode)
    • {"state": "thinking"} — LLM call, command dispatch, and TTS playback
    • {"state": "idle"} — wake-mode idle, or conversation closed
  • Transitions are designed to avoid flickering: when a conversation continues across turns, the LED stays on through thinking and is replaced by listening on the next STT cycle (no intermediate idle flash). idle only fires when a turn ends without continuation.
  • Field is optional and defaults to off, so existing configs without an indicator continue to work unchanged.

Companion firmware

The corresponding ESP32 firmware (an Arduino sketch) for a WS2812B strip is not in this repo; it's a standalone sketch that users flash to their indicator hardware. To drive any other indicator (single LED, BlinkStick, etc.), a user can write a generic resource that consumes the same {"state": "..."} protocol.

Test plan

  • go build ./... passes (verified locally)
  • led-bridge resource opens the configured serial port on Reconfigure and logs the port + baud
  • DoCommand on led-bridge sends payload + \n to the device
  • In isolation: sending {"state": "listening"} to a flashed ESP32 produces a blue pulse; {"state": "thinking"} produces an amber pulse; {"state": "idle"} clears the LEDs
  • End-to-end: wake word triggers a blue pulse, transitions to amber during LLM/TTS, returns to blue on continued conversation, goes dark on conversation end
  • Existing voice-command configs without led_indicator set still work unchanged

@jckras jckras requested review from EshaMaharishi and NickPPC and removed request for EshaMaharishi May 21, 2026 18:57
// STT stream cycle (e.g. Google's 305s reopens) while only waiting for
// a wake word.
var firedListening bool
if state == stateListening {

Hmm, are we sure firedListening is needed? Can we just check stateListening?

ale7714 and others added 8 commits May 22, 2026 12:01
The deferred idle in listenForCommand fired before run() set thinking on
every captured utterance, flickering the LED idle between listening and
thinking. Hand all idle transitions to run() and handleLull so the LED
can hold thinking across the listen→capture→interpret boundary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

multi-led and other off-the-shelf LED modules register under
rdk:component:generic, not rdk:service:generic. Try the component API
first at lookup, fall back to the service API so this module's own
led-bridge still works as an indicator target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
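The component-first, service-fallback lookup from the commit above can be sketched without the RDK as follows. The `lookup` type, `resolveIndicator`, and `fromSet` are hypothetical stand-ins for whatever dependency lookups the module actually uses.

```go
package main

import "fmt"

// resource is a hypothetical handle recording which API a name resolved under.
type resource struct {
	api, name string
}

// lookup resolves a resource name under one API, or returns an error.
type lookup func(name string) (resource, error)

// resolveIndicator tries the component lookup first and falls back to the
// service lookup, reporting both errors if neither succeeds.
func resolveIndicator(name string, fromComponent, fromService lookup) (resource, error) {
	r, compErr := fromComponent(name)
	if compErr == nil {
		return r, nil
	}
	r, svcErr := fromService(name)
	if svcErr == nil {
		return r, nil
	}
	return resource{}, fmt.Errorf("led_indicator %q: component lookup: %v; service lookup: %v", name, compErr, svcErr)
}

// fromSet builds a lookup backed by a fixed set of names (demo only).
func fromSet(api string, names ...string) lookup {
	set := map[string]bool{}
	for _, n := range names {
		set[n] = true
	}
	return func(name string) (resource, error) {
		if !set[name] {
			return resource{}, fmt.Errorf("%s %q not found", api, name)
		}
		return resource{api: api, name: name}, nil
	}
}

func main() {
	components := fromSet("rdk:component:generic", "multi-led")
	services := fromSet("rdk:service:generic", "led-bridge")
	for _, n := range []string{"multi-led", "led-bridge", "missing"} {
		r, err := resolveIndicator(n, components, services)
		fmt.Println(n, "->", r.api, err)
	}
}
```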
signalLED ran synchronously on the conversation worker, so a slow or
hung LED resource (e.g. USB-serial bridge stuck on a disconnected
device) would stall STT, wake-word detection, and TTS. Move dispatch to
a dedicated ledLoop goroutine fed by a small buffered channel; enqueue
is non-blocking with drop-on-full semantics, and each DoCommand gets a
500ms timeout. The LED becomes purely best-effort — the conversation
loop never waits on it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Status now returns the configured serial port, baud rate, cumulative
write counters, and last-error info — surfaced through DoCommand's
reserved "status" key since Status isn't on resource.Resource. A
deferred debug log on every DoCommand prints the status snapshot so a
debug-level tail shows counters and last_error advancing in real time.
Stats are guarded by a separate mutex so Status queries don't block
behind a slow port.Write.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
speak() blocks for the full TTS playback duration, so firing the idle
signal after speakEndCue left the LED stuck in "thinking" for the
entire end-cue audio. Move the idle signal in front of every blocking
speak path on conversation-ending branches so the LED transitions
immediately while the audio plays out — covering the run-loop
end-of-turn, the interpret-error fallback, and both handleLull exits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The errSilence path that ends a conversation cleanly (silent + no
command in progress) was missing the idle signal, so the LED stayed
in its previous state for the full duration of the "Moving along" end
cue and beyond. Fire idle before the blocking speakEndCue, matching
every other conversation-ending branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

When the wake word fires listening but no follow-up speech arrives
before listenTimeout, errSilence is returned but the wake-mode branch
in run() did nothing — leaving the LED stuck in listening until the
next wake word or a conversation actually ended. Signal idle on that
path so the LED tracks the actual listening state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

When Google STT closes the stream with IsFinal and no captured speech
(wake word fired but the user said nothing else), listenForCommand
returns "" with no error — bypassing the errSilence path. That left
the LED stuck in listening and gave the user no audible feedback
because no utterance reached the LLM.

Reset the LED to idle on that branch in wake mode, and add a new
wake_silence_cue config (default "Sorry, I didn't catch that."; opt
out with an explicit empty string) so the user hears that the wake
word was detected but the listen timed out. Conversation mode is left
alone since the next iteration re-signals listening.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3 participants