A better STT would make auto-gain not needed in ESPHome Voice Assistants (VPE+others) #152

@laupalombi

Problem statement

Speech recognition is consistently one of the most reported problems, and it has been hard to prove that any change actually solves it. One main issue (but not the only one) is that auto gain can do as much harm as good, depending on the situation. A better speech-to-text (STT) engine is an opportunity to improve this.

To set expectations: there is a significant chance this makes the Voice experience better, but it is not a single solution, just another brick in the wall.

Community signals

The community survey, although biased towards high-tech users and not focused on a specific problem, showed that satisfaction with voice recognition and with getting the task done is not high. It is not conclusive, but it justifies at least exploring this opportunity.

Scope & Boundaries

In scope

  • New STT
  • The ability to test it with and without auto gain

Not in scope

  • Bigger architectural changes to Voice

Foreseen solution

Add a second audio channel for voice.

esphome PR - esphome/esphome#16265
aioesphomeapi PR - esphome/aioesphomeapi#1625

Risks & open questions

  • Is the solution really better?
  • Can we test it properly?
  • How many biases will we run into when testing?
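
One way to approach the first two questions is to score transcripts with word error rate (WER) for each condition, ideally on the same utterances with auto gain on and off. The sketch below is only an illustration of that idea, not part of the linked PRs: the function names and the sample transcripts are hypothetical, and real test data would have to come from actual recordings.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def mean_wer(pairs) -> float:
    """Average WER over an iterable of (reference, hypothesis) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (reference, hypothesis) pair")
    return sum(word_error_rate(r, h) for r, h in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical, made-up transcripts purely to show the call shape.
    references = ["turn on the kitchen light", "set a timer for ten minutes"]
    with_auto_gain = ["turn on the kitchen light", "set a timer for two minutes"]
    without_auto_gain = ["turn on kitchen light", "set a timer for ten minutes"]

    print("auto gain on: ", mean_wer(zip(references, with_auto_gain)))
    print("auto gain off:", mean_wer(zip(references, without_auto_gain)))
```

Scoring the same utterances in both conditions keeps speaker, room, and phrasing constant, which removes some (not all) of the biases listed above. WER only covers recognition, so it says nothing about whether the requested task was actually completed.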

Appetite

Small - Should be done in one cycle of 2 releases.

Execution issues

No response

Decision log

Date Decision Outcome

Status

Awaiting approval
