demo: Use Kroko ASR, show icon in tray by Mathnerd314 · Pull Request #42 · richiejp/VoxInput

Mathnerd314 · 2026-01-09T03:06:58Z

I wanted to show you what I have in my fork, in case you have any ideas on project evolution. My thoughts are:

- There definitely needs to be a better configuration mechanism, such as a JSON file. The current approach of one environment variable per setting does not scale.
- Resampling as I did here seems like a good idea, especially if the device capture rate is configurable.
- Supporting multiple models/APIs may be possible with Go's "build tags" / "build constraints" system. The alternative is compiling in support for all of them, but that could lead to dependency hell.
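
To make the resampling point concrete, here is a minimal sketch of converting captured audio to a model's expected rate. It is not the code from this PR: the function name `resampleLinear`, the 48 kHz capture rate, and the 16 kHz target are assumptions for illustration, and it uses plain linear interpolation with no anti-aliasing filter, so a real implementation would likely want a low-pass step before downsampling.

```go
package main

import "fmt"

// resampleLinear converts mono samples from srcRate to dstRate using
// linear interpolation between neighbouring input samples.
// Hypothetical helper: no low-pass filter is applied, so downsampling
// can alias high-frequency content.
func resampleLinear(in []float32, srcRate, dstRate int) []float32 {
	if len(in) == 0 || srcRate <= 0 || dstRate <= 0 {
		return nil
	}
	if srcRate == dstRate {
		// Return a copy so the caller can modify the result freely.
		return append([]float32(nil), in...)
	}
	outLen := int(int64(len(in)) * int64(dstRate) / int64(srcRate))
	out := make([]float32, outLen)
	step := float64(srcRate) / float64(dstRate) // input samples per output sample
	for i := range out {
		pos := float64(i) * step
		j := int(pos)
		if j >= len(in)-1 {
			// Past the last interpolable pair: hold the final sample.
			out[i] = in[len(in)-1]
			continue
		}
		frac := float32(pos - float64(j))
		out[i] = in[j] + (in[j+1]-in[j])*frac
	}
	return out
}

func main() {
	captured := make([]float32, 48000) // one second at an assumed 48 kHz capture rate
	resampled := resampleLinear(captured, 48000, 16000)
	fmt.Println(len(resampled)) // prints 16000
}
```

Taking the source rate as a parameter is what would let the capture rate become a setting rather than a constant.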

I also went for a minimalist approach with the GUI: just a tray icon that changes. I see you have gone in a different direction with the toast notifications.

richiejp · 2026-01-09T09:57:14Z

Thanks for sharing! This looks really interesting, and what you are doing with Sherpa here could be really useful in LocalAI as well, where the Kokoro backend is presently implemented in Python, so a Sherpa backend could be really nice. I've only had a brief look so far, but here are my general thoughts on the topics this brings up.

- I'd be in favor of having JSON/YAML/TOML etc. as soon as we have a setting that requires hierarchical data (such as an array of structs), but I would still want to keep env vars for the settings that are flat data. Env vars are good for external system managers, e.g. SystemD, Kubernetes/Docker Compose, and Nix. See https://github.com/NixOS/nixpkgs/pull/475019/changes#diff-f79247f9a5a519d871c7c9df661fe337a1235da96ecf612fcaef3ff4df252872R17 where voxinput is configured completely inline within someone's NixOS configuration using standard SystemD and Nix mechanisms.
- Resampling is presently done in LocalAI.
- Having multiple models/backends is also currently outsourced to LocalAI or whatever implements the OpenAI API. I'll go into that more below.
- The tray-only approach is good; I didn't try it because Fyne's tray doesn't work on Sway and I was getting annoyed with Fyne. Having said that, I'd still accept the tray-only approach as an option in its own PR.
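
One way to read the split described above (a file for hierarchical settings, env vars for flat ones) is sketched below. This is only an illustration under assumptions: the `Config` and `Backend` types, the JSON field names, and the `EXAMPLE_CAPTURE_RATE` variable are hypothetical and are not VoxInput's actual settings.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strconv"
)

// Backend is a hypothetical hierarchical setting: an array of structs
// is awkward to express as environment variables.
type Backend struct {
	Name    string `json:"name"`
	BaseURL string `json:"base_url"`
}

// Config mixes one flat setting with one hierarchical setting.
type Config struct {
	CaptureRate int       `json:"capture_rate"`
	Backends    []Backend `json:"backends"`
}

// loadConfig applies defaults, then the optional JSON document, then
// environment overrides for flat settings. getenv is injected so the
// function can be exercised without touching the real environment.
func loadConfig(fileJSON []byte, getenv func(string) string) (Config, error) {
	cfg := Config{CaptureRate: 16000} // assumed default, for illustration only
	if len(fileJSON) > 0 {
		if err := json.Unmarshal(fileJSON, &cfg); err != nil {
			return cfg, fmt.Errorf("parse config: %w", err)
		}
	}
	// EXAMPLE_CAPTURE_RATE is a made-up variable name.
	if v := getenv("EXAMPLE_CAPTURE_RATE"); v != "" {
		n, err := strconv.Atoi(v)
		if err != nil {
			return cfg, fmt.Errorf("EXAMPLE_CAPTURE_RATE: %w", err)
		}
		cfg.CaptureRate = n
	}
	return cfg, nil
}

func main() {
	file := []byte(`{"capture_rate": 44100, "backends": [{"name": "local", "base_url": "http://localhost:8080/v1"}]}`)
	cfg, err := loadConfig(file, os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Letting the environment win over the file keeps flat settings controllable from a SystemD unit or a Nix module without the user having to edit a file.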

As to having multiple APIs and running models inside VoxInput:

- I have nothing against supporting multiple APIs, and I think they could all be compiled in at once unless some of them use CGO, in which case build tags sound like a good idea to support distros that don't have the underlying C libraries. Having said that, my personal current focus is on the realtime API.
- I don't want any mandatory or default features that require running an AI model inside VoxInput. The basic assumption is that the machine running VoxInput is a low-spec laptop with an already saturated CPU and no GPU/TPU.
- I'm not against having the option of running models inside VoxInput on CPU, especially for something like wake-words, where you possibly don't want to stream all your mic audio to OpenAI or even your own LocalAI instance.
- I don't want to maintain a large number of models in VoxInput. If we adopt kokoro and then a better model comes along, we replace it. Having a large choice of models is outsourced to LocalAI (or the user can pass settings directly through to Sherpa).
- AFAICT you are using ONNX Runtime with Sherpa, which is not pure Go? If so, it should be behind a build tag for distros that don't have the underlying libraries.
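
As a rough sketch of how optional backends could sit behind build tags: each backend registers itself from its own file, and a CGO-dependent file starts with a constraint line such as `//go:build sherpa`, so it is only compiled with `go build -tags sherpa`. The tag name, the `Transcriber` interface, and the registry functions below are all hypothetical, and everything is collapsed into a single file here so that it builds without any tag.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Transcriber is a hypothetical interface that every backend implements.
type Transcriber interface {
	Transcribe(pcm []float32) (string, error)
}

// backends maps a backend name to its constructor. Only the backends
// that were compiled into the binary ever appear in it.
var backends = map[string]func() Transcriber{}

// register is called from each backend file's init function.
func register(name string, ctor func() Transcriber) {
	backends[name] = ctor
}

// available lists the compiled-in backends in a stable order.
func available() []string {
	names := make([]string, 0, len(backends))
	for name := range backends {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// The part below would live in its own pure Go file with no build
// constraint, so it is always compiled in.
type realtimeStub struct{}

func (realtimeStub) Transcribe([]float32) (string, error) {
	return "", errors.New("realtime: not implemented in this sketch")
}

func init() {
	register("realtime", func() Transcriber { return realtimeStub{} })
}

// A CGO-backed backend would live in a separate file carrying a build
// constraint on its first line and would call register("sherpa", ...)
// from its own init. It is omitted here so this file builds anywhere.

func main() {
	fmt.Println(available()) // prints [realtime]
}
```

With this layout a default build stays pure Go, and distros that ship the C libraries can opt in by passing the tag.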

feat: Use Kroko ASR, update icons in tray

84bf8ea

Mathnerd314 wants to merge 1 commit into richiejp:main from Mathnerd314:develop
