Open
Description
Summary
When an external microphone (e.g. a webcam mic such as the Brio 301) is used together with the built-in MacBook speakers, AEC (Acoustic Echo Cancellation) causes the transcript to fragment into single words that alternate between speakers. Each word becomes its own segment with a separate speaker label, which makes the transcript unreadable.
Setup
- Input: External USB webcam mic (Brio 301), set as priority 1
- Output: MacBook Pro Speakers (built-in)
- Result: the transcript shows word-by-word segments that alternate between "John" and "Speaker 1", with identical or overlapping timestamps
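To make the symptom easier to spot in logs or tests, here is a minimal sketch of a check for it. It is not the app's actual code: the `Segment` type, its fields, and the `min_run` threshold are all hypothetical, chosen only to mirror the description above (single-word segments whose speaker label flips on every word).

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical shape of a transcript segment; field names are assumptions.
    speaker: str
    text: str
    start: float  # seconds
    end: float    # seconds


def looks_fragmented(segments, min_run=4):
    """Return True if at least `min_run` consecutive single-word segments
    alternate speaker labels, which is the pattern described in this issue."""
    run = 0
    prev_speaker = None
    for seg in segments:
        if len(seg.text.split()) != 1:
            # A multi-word segment breaks the run.
            run, prev_speaker = 0, None
            continue
        if run > 0 and seg.speaker != prev_speaker:
            run += 1  # single word, speaker flipped: the run continues
        else:
            run = 1   # single word, but same speaker (or first in a run)
        prev_speaker = seg.speaker
        if run >= min_run:
            return True
    return False


fragmented = [
    Segment("John", "When", 0.0, 0.3),
    Segment("Speaker 1", "using", 0.3, 0.6),
    Segment("John", "an", 0.6, 0.7),
    Segment("Speaker 1", "external", 0.7, 1.1),
]
healthy = [Segment("John", "When using an external mic", 0.0, 1.5)]
```

With the sample data, `looks_fragmented(fragmented)` reports the pattern and `looks_fragmented(healthy)` does not. The threshold of four is arbitrary and would need tuning against real transcripts.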
Expected behavior
AEC should handle the external mic + built-in speaker combination without fragmenting the transcript. Words from the same speaker should be grouped into continuous segments.
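As an illustration of the grouping described above, the following sketch merges consecutive words with the same speaker label into one segment. It is an assumption-laden example rather than the project's implementation: the `Word` type, the dict layout of a segment, and the `max_gap` value are hypothetical. It only shows the expected output shape and would not by itself correct speaker labels that AEC has already assigned wrongly.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word-level transcript entry; field names are assumptions.
    speaker: str
    text: str
    start: float  # seconds
    end: float    # seconds


def group_words(words, max_gap=1.0):
    """Merge consecutive words from the same speaker into one segment,
    starting a new segment on a speaker change or a pause over `max_gap`."""
    segments = []
    for w in words:
        last = segments[-1] if segments else None
        if (
            last is not None
            and last["speaker"] == w.speaker
            and w.start - last["end"] <= max_gap
        ):
            # Same speaker and no long pause: extend the current segment.
            last["text"] += " " + w.text
            last["end"] = max(last["end"], w.end)
        else:
            segments.append(
                {"speaker": w.speaker, "text": w.text, "start": w.start, "end": w.end}
            )
    return segments


words = [
    Word("John", "hello", 0.0, 0.4),
    Word("John", "there", 0.5, 0.9),
    Word("Speaker 1", "hi", 1.5, 1.8),
]
segments = group_words(words)
```

Here `segments` holds two entries: one for John covering both of his words, and one for Speaker 1.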
Notes
- Likely related to AEC mishandling the input/output device pairing when mic and speakers are on different hardware
- May also be related to #4444 (bug: Live transcript renders word-by-word as separate speaker segments)
- Previous AEC issues: #4314 (perf: High CPU usage due to AEC inference), #3096 (echo cancellation), #1428 (Implement AEC for pro speech-to-text models)
Metadata
- Labels: No labels
- Status: Backlog