demo: Use Kroko ASR, show icon in tray#42

Draft
Mathnerd314 wants to merge 1 commit into richiejp:main from Mathnerd314:develop

@Mathnerd314
Contributor

I wanted to show you what I have in my fork, in case you have any ideas on project evolution. My thoughts are:

  • There definitely needs to be a better configuration mechanism, such as a JSON file; the current one-environment-variable-per-setting approach does not scale as settings are added.
  • Resampling like I did here seems like a good idea, especially if the device capture rate is configurable.
  • Supporting multiple models/APIs may be possible with Go's build tags (build constraints). The alternative is compiling in support for all of them, but that could lead to dependency hell.
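
To make the resampling point concrete, here is a minimal sketch of converting a captured buffer to a model's input rate with linear interpolation. This is not the code from the fork: `resampleLinear` is a hypothetical helper, and the 48 kHz capture rate and 16 kHz target rate are assumptions chosen for illustration.

```go
package main

import "fmt"

// resampleLinear converts mono PCM samples from srcRate to dstRate using
// linear interpolation. It applies no low-pass filter, so downsampling can
// alias; a production resampler would filter first. (Hypothetical helper.)
func resampleLinear(in []float32, srcRate, dstRate int) []float32 {
	if len(in) == 0 || srcRate <= 0 || dstRate <= 0 {
		return nil
	}
	if srcRate == dstRate {
		// Return a copy so the caller can modify the result freely.
		return append([]float32(nil), in...)
	}
	outLen := int(int64(len(in)) * int64(dstRate) / int64(srcRate))
	out := make([]float32, outLen)
	step := float64(srcRate) / float64(dstRate) // input samples per output sample
	for i := range out {
		pos := float64(i) * step
		j := int(pos)
		frac := float32(pos - float64(j))
		if j+1 < len(in) {
			// Blend the two neighbouring input samples.
			out[i] = in[j] + (in[j+1]-in[j])*frac
		} else {
			// Past the last pair: hold the final input sample.
			out[i] = in[len(in)-1]
		}
	}
	return out
}

func main() {
	// One second of silence at an assumed 48 kHz capture rate.
	captured := make([]float32, 48000)
	resampled := resampleLinear(captured, 48000, 16000)
	fmt.Println(len(resampled))
}
```

If the capture rate becomes a setting, the same function can be called with whatever rate the device reports, so the model always receives its expected rate.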

Also, I went for a minimalist approach with the GUI: just a tray icon that changes. I see you have gone in a different direction with the toast notifications.

@richiejp
Owner

richiejp commented Jan 9, 2026

Thanks for sharing! This looks really interesting. What you are doing with Sherpa here could also be really useful in LocalAI, where the Kokoro backend is presently implemented in Python, so a Sherpa backend could be really nice. I've only had a brief look so far, but here are my thoughts on the topics this brings up in general.

  • I'd be in favor of having JSON/YAML/TOML etc. as soon as we have a setting that requires hierarchical data (as in an array of structs), and I would still want to keep env vars for the settings that are flat data. Env vars are good for external system managers (e.g. SystemD, Kubernetes/Docker Compose, Nix). See https://github.com/NixOS/nixpkgs/pull/475019/changes#diff-f79247f9a5a519d871c7c9df661fe337a1235da96ecf612fcaef3ff4df252872R17, where voxinput is configured completely inline within someone's NixOS configuration using standard SystemD and Nix mechanisms.
  • Resampling is done in LocalAI presently.
  • Having multiple models/backends is also currently outsourced to LocalAI or whatever implements the OpenAI API; I'll go into that more below.
  • The tray-only approach is good; I didn't try it because Fyne's tray doesn't work on Sway and I was getting annoyed with Fyne. Having said that, I'd still accept the tray-only approach as an option in its own PR.
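
As a rough illustration of the split described above (a file for hierarchical settings, env vars for flat ones), a loader could parse JSON first and then let env vars override the flat fields. Everything named here is hypothetical: the `Config` and `Backend` structs, the JSON keys, the 16000 default, and the `EXAMPLE_BASE_URL` variable are illustrative and are not VoxInput's actual settings.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Backend is a hypothetical entry in a hierarchical setting (an array of
// structs), which is awkward to express as env vars.
type Backend struct {
	Name  string `json:"name"`
	Model string `json:"model"`
}

// Config is a hypothetical settings struct; field names are illustrative.
type Config struct {
	BaseURL    string    `json:"base_url"`
	SampleRate int       `json:"sample_rate"`
	Backends   []Backend `json:"backends"`
}

// loadConfig applies defaults, then the JSON document (if any), then env var
// overrides for flat settings. getenv is injected so it can be faked in tests.
func loadConfig(data []byte, getenv func(string) string) (Config, error) {
	cfg := Config{SampleRate: 16000} // assumed default
	if len(data) > 0 {
		if err := json.Unmarshal(data, &cfg); err != nil {
			return cfg, fmt.Errorf("parse config: %w", err)
		}
	}
	// Flat settings stay overridable by external system managers.
	if v := getenv("EXAMPLE_BASE_URL"); v != "" {
		cfg.BaseURL = v
	}
	return cfg, nil
}

func main() {
	data := []byte(`{"base_url":"http://localhost:8080","backends":[{"name":"realtime","model":"example"}]}`)
	cfg, err := loadConfig(data, os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

With this ordering, an inline service definition can still set the flat values through its environment, and the file is only needed once a setting becomes hierarchical.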

As for having multiple APIs and running models inside VoxInput:

  • I have nothing against supporting multiple APIs, and I think they could all be compiled in at once unless some of them use CGO, in which case yes, build tags sound like a good idea to support distros that don't have the underlying C libraries. Having said that, my personal current focus is on the realtime API.
  • I don't want any mandatory or default features that require running an AI model inside VoxInput. The basic assumption is that the machine running VoxInput is a low-spec laptop with an already saturated CPU and no GPU/TPU.
  • I'm not against having the option of running models inside VoxInput on CPU, especially for something like wake words, where you possibly don't want to stream all your mic audio to OpenAI or even your own LocalAI instance.
  • I don't want to maintain a large number of models in VoxInput. If we adopt Kokoro and then a better model comes along, we replace it. Having a large choice of models is outsourced to LocalAI (or the user can directly pass through settings to Sherpa).
  • AFAICT you are using ONNX Runtime with Sherpa, which is not pure Go? If so, it should be behind a build tag for distros that don't have the underlying libraries.
