Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@jozefchutka, would appreciate it if you could check this out! 🙏 We'll put out a new release tomorrow and are aiming to fit this in.
I tested this branch with:
Please consider adding these three cases to the integration tests; I provided minimal repro code and a .pcm file in the first post of each issue.
Thanks so much for testing! Strangely, I tested your exact inputs from those issues and got much better results... Let me revisit.
the python version can also run into some weird issues, like

```python
from transformers import pipeline
import numpy as np

pipe = pipeline("automatic-speech-recognition", "openai/whisper-base")

audio = "src.pcm"
audio = np.fromfile(audio, dtype=np.float32)
print(audio.shape)

result = pipe(audio, return_timestamps=True, num_beams=1, do_sample=False)
print(result)
```

producing output with an empty chunk + a backwards timestamp. The current version I've got in my dev branch, after making a few fixes, looks like this. Would you say that's a suitable transcription?
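For context, the snippet above reads `src.pcm` as raw 32-bit float samples with no container or header. A minimal sketch of producing such a file from scratch (the filename, tone frequency, and duration here are illustrative; 16 kHz matches Whisper's expected sampling rate):

```python
import numpy as np

# One second of a 440 Hz sine tone at 16 kHz, as 32-bit floats.
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
tone = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

# Write raw samples with no header, matching np.fromfile(..., dtype=np.float32).
tone.tofile("tone.pcm")

# Round-trip check: reading it back yields the same float32 samples.
loaded = np.fromfile("tone.pcm", dtype=np.float32)
print(loaded.shape)  # (16000,)
```

This round-trips losslessly because `tofile`/`fromfile` copy the raw bytes with no conversion.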
Here's the current version for #1590 (the "Oh, no" is a hallucination which also appears in the Python version):
and here are the outputs for src.pcm, for both phrase-level and word-level timestamps:

phrase-level:

word-level:

@jozefchutka let me know if that looks good now! :) And if any other test cases of yours are still having issues.
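The two failure modes discussed in this thread (empty chunks and backwards timestamps) are easy to check mechanically. With `return_timestamps`, the pipeline result is a dict with `"text"` and a `"chunks"` list of `{"text", "timestamp": (start, end)}` entries; the checker below is a sketch of a sanity test over that shape, and the sample dicts are illustrative, not real model output:

```python
def check_chunks(result):
    """Flag empty-text chunks and backwards/overlapping timestamps."""
    problems = []
    prev_end = 0.0
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        if not chunk["text"].strip():
            problems.append(f"empty chunk at {chunk['timestamp']}")
        if start is not None and end is not None and end < start:
            problems.append(f"backwards timestamp {chunk['timestamp']}")
        if start is not None and start < prev_end:
            problems.append(f"overlapping chunk at {chunk['timestamp']}")
        if end is not None:
            prev_end = end
    return problems

# Illustrative output shapes (not from a real run):
good = {"text": " hello world", "chunks": [
    {"text": " hello", "timestamp": (0.0, 1.2)},
    {"text": " world", "timestamp": (1.2, 2.0)},
]}
bad = {"text": "", "chunks": [
    {"text": "", "timestamp": (3.0, 1.0)},
]}

print(check_chunks(good))  # []
print(check_chunks(bad))   # flags the empty chunk and the backwards timestamp
```

A check like this could run over every integration-test transcription rather than comparing exact timestamp values, which vary between implementations.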
Thanks for putting so much effort into this, @xenova. I re-tested the latest commit.
Do you think #1590 can somehow be fixed?
Did a deep dive recently to uncover and fix issues with Whisper models.
Also added spectrogram unit tests.
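A spectrogram unit test in that spirit might look like the sketch below. The `stft_magnitude` helper is a stand-in written here for illustration, not the library's implementation; the 400-sample frame and 160-sample hop mirror Whisper's 25 ms window and 10 ms hop at 16 kHz:

```python
import numpy as np

def stft_magnitude(signal, n_fft=400, hop=160):
    """Naive magnitude STFT: Hann-windowed frames -> |rFFT|."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

# Unit-test-style checks on a pure 1 kHz tone.
sr = 16_000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t).astype(np.float32)
spec = stft_magnitude(tone)

# Frequency axis has n_fft // 2 + 1 = 201 bins.
assert spec.shape[1] == 201

# Bin resolution is sr / n_fft = 40 Hz, so the 1 kHz tone
# should peak in bin 1000 / 40 = 25 in every frame.
assert np.all(spec.argmax(axis=1) == 25)
```

Testing against analytically known inputs (pure tones, silence) keeps the test independent of any reference implementation's exact output.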