
Silero adapter cuts off prefixPaddingSamples in non-streaming STT #1572

@ianjaku

Description


Describe the bug

In non-streaming mode, the Silero adapter cuts off the prefixPaddingSamples from the speech buffer before _recognize is called.

This causes the following:

  1. We set prefixPaddingDuration to 500 ms so that the STT receives the audio from just before the VAD fires.
  2. Right before _recognize is called, the adapter's vad.ts slices that prefix off, so the padded audio never reaches the STT.
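For reference, this is roughly how much audio goes missing. The sketch below assumes the adapter converts prefixPaddingDuration (in milliseconds) to a sample count by truncating duration × sample rate / 1000. The helper name paddingSamples is hypothetical and is not the plugin's API; the real code may round differently.

```typescript
// Hypothetical helper: convert a padding duration in milliseconds into a
// sample count at the given sample rate (truncating, as an assumption).
function paddingSamples(durationMs: number, sampleRate: number): number {
  return Math.trunc((durationMs * sampleRate) / 1000);
}

// 500 ms of prefix padding at 16 kHz is 8000 samples of pre-speech audio
// that should reach _recognize but currently does not.
console.log(paddingSamples(500, 16000)); // 8000
```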

The culprit is line 306 at commit 77a8355: this.#speechBuffer.subarray(this.#prefixPaddingSamples, speechBufferIndex)
It should instead be this.#speechBuffer.subarray(0, speechBufferIndex), so that the prefix padding at the start of the buffer is kept.
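A minimal standalone sketch of the difference between the two slices. It assumes, as the line above implies, that the speech buffer begins with prefixPaddingSamples of pre-speech audio and that speechBufferIndex marks the end of valid data. The function names are hypothetical and only mirror the logic of line 306; they are not the plugin's code.

```typescript
// Build a buffer laid out as [prefix padding | speech | unused capacity].
// Padding samples are marked 1 and speech samples 2 so the slices are easy to inspect.
function buildSpeechBuffer(prefixSamples: number, speechSamples: number): Int16Array {
  const buffer = new Int16Array(prefixSamples + speechSamples + 100);
  buffer.fill(1, 0, prefixSamples);
  buffer.fill(2, prefixSamples, prefixSamples + speechSamples);
  return buffer;
}

// Current behavior: the slice starts after the padding, dropping pre-speech audio.
function sliceCurrent(buffer: Int16Array, prefixPaddingSamples: number, speechBufferIndex: number): Int16Array {
  return buffer.subarray(prefixPaddingSamples, speechBufferIndex);
}

// Proposed behavior: the slice starts at 0, keeping the padding.
function sliceProposed(buffer: Int16Array, speechBufferIndex: number): Int16Array {
  return buffer.subarray(0, speechBufferIndex);
}

const prefixSamples = 8000; // 500 ms at 16 kHz
const speechSamples = 16000; // 1 s of speech at 16 kHz
const speechBuffer = buildSpeechBuffer(prefixSamples, speechSamples);
const speechBufferIndex = prefixSamples + speechSamples;

console.log(sliceCurrent(speechBuffer, prefixSamples, speechBufferIndex).length); // 16000
console.log(sliceProposed(speechBuffer, speechBufferIndex).length); // 24000
```

Note that subarray returns a view over the same memory, so neither version copies samples; the fix only changes where the view starts.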

Relevant log output

No response

Describe your environment

We're running agents framework 1.2.7 with the Silero adapter version 1.2.7, and we verified that the same issue exists in the latest version of the Silero adapter.

Minimal reproducible example

  1. Set up the most basic LiveKit pipeline.
  2. Create a custom STT handler by extending the stt.STT class.
  3. Implement the _recognize method.
  4. Pass prefixPaddingDuration: 500 in the Silero VAD options.
  5. The audio buffer received by _recognize will not include the 500 ms preceding the point where the VAD fires.
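One way to check step 5 without listening to the audio is to compare the duration of what _recognize receives against the detected speech duration plus the padding. This is a sketch under assumptions: FrameLike is a hypothetical structural type assumed to mirror the sampleRate and samplesPerChannel fields of the frames passed to _recognize, toleranceMs is a hypothetical allowance for frame-size rounding, and how you obtain the speech duration is up to your pipeline.

```typescript
// Hypothetical structural stand-in for the frames handed to _recognize.
interface FrameLike {
  sampleRate: number;
  samplesPerChannel: number;
}

// Total duration of a list of frames, in milliseconds.
function totalDurationMs(frames: FrameLike[]): number {
  return frames.reduce((ms, f) => ms + (f.samplesPerChannel * 1000) / f.sampleRate, 0);
}

// True if the received audio is at least prefixPaddingMs longer than the
// detected speech, i.e. the prefix padding survived the adapter.
function includesPrefixPadding(
  frames: FrameLike[],
  speechDurationMs: number,
  prefixPaddingMs: number,
  toleranceMs = 1,
): boolean {
  return totalDurationMs(frames) + toleranceMs >= speechDurationMs + prefixPaddingMs;
}

// 1 s of speech at 16 kHz as 10 ms frames (160 samples each), without and with 500 ms of padding.
const frame: FrameLike = { sampleRate: 16000, samplesPerChannel: 160 };
const speechOnly: FrameLike[] = Array.from({ length: 100 }, () => frame);
const withPadding: FrameLike[] = Array.from({ length: 150 }, () => frame);

console.log(includesPrefixPadding(speechOnly, 1000, 500)); // false: what we observe today
console.log(includesPrefixPadding(withPadding, 1000, 500)); // true: what we expect
```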

Additional information

No response

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Development: No branches or pull requests