Describe the bug
The audio-to-text pipeline is not returning word-level timestamps.
@RUFFY-369 is there a way to switch to sdpa when word-level timestamps are requested, without reloading the pipeline onto the GPU?

Reproduction steps
- Download a new audio-to-text pipeline with flash attention 2 enabled
- Send a request to the pipeline including `return_timestamps=word`:
  ```shell
  curl -X POST http://172.17.0.1:6666/audio-to-text -F "audio=@test-audio.mp4" -F "model_id=openai/whisper-large-v3" -F "return_timestamps=word"
  ```
- See the error returned:
  ```json
  {"error":{"message":": Error during model execution: WhisperFlashAttention2 attention does not support output_attentions."}}
  ```
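For context, a minimal sketch of the workaround: word-level timestamps need attention weights (`output_attentions`), which the Flash Attention 2 backend cannot return, so the model has to be loaded with `sdpa` (or `eager`) attention instead. This assumes the Hugging Face `transformers` pipeline API (`attn_implementation` is a real `from_pretrained` kwarg, forwarded via `model_kwargs`); the helper name `load_asr_pipeline` is hypothetical, and whether the backend can be swapped in place without a reload is exactly the open question above.

```python
def load_asr_pipeline(attn_implementation: str = "sdpa"):
    """Build a Whisper ASR pipeline with a chosen attention backend.

    Flash Attention 2 cannot emit the attention weights that word-level
    timestamp alignment relies on, hence the WhisperFlashAttention2 /
    output_attentions error above. Loading with "sdpa" or "eager" avoids it.
    """
    # Imports kept local so the sketch can be read without the
    # heavyweight dependencies installed.
    import torch
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model="openai/whisper-large-v3",
        torch_dtype=torch.float16,
        device=0,  # first GPU
        # attn_implementation is forwarded to from_pretrained
        model_kwargs={"attn_implementation": attn_implementation},
    )


if __name__ == "__main__":
    asr = load_asr_pipeline("sdpa")
    result = asr("test-audio.mp4", return_timestamps="word")
    print(result["chunks"])  # per-word start/end timestamps
```

With this loading path, the same `return_timestamps="word"` request should succeed instead of raising the error shown in the reproduction steps.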
Expected behaviour
Return word-level timestamps.
Severity
None
Screenshots / Live demo link
No response
OS
None
Running on
None
AI-worker version
No response
Additional context
No response