Status: Open
Labels: bug
Description
When I try to describe an English video with clear speech (from Wikitongues), I get the following output (caution: very long):
Starting transcription for /tmp/video.mkv...
Installed 121 packages in 319ms
/home/***/.cache/uv/archive-v0/9hYUItw3c5aKXEwGssDe5/lib/python3.10/site-packages/pyannote/audio/core/io.py:212: UserWarning: torchaudio._backend.list_audio_backends has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. The decoding and encoding capabilities of PyTorch for both audio and video are being consolidated into TorchCodec. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.
  torchaudio.list_audio_backends()
/home/***/.cache/uv/archive-v0/9hYUItw3c5aKXEwGssDe5/lib/python3.10/site-packages/speechbrain/utils/torch_audio_backend.py:57: UserWarning: torchaudio._backend.list_audio_backends has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. The decoding and encoding capabilities of PyTorch for both audio and video are being consolidated into TorchCodec. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.
  available_backends = torchaudio.list_audio_backends()
2026-01-11 23:28:01 - whisperx.asr - INFO - No language specified, language will be detected for each audio file (increases inference time)
2026-01-11 23:28:01 - whisperx.vads.silero - INFO - Performing voice activity detection using Silero...
Downloading: "https://github.com/snakers4/silero-vad/zipball/master" to /home/***/.cache/torch/hub/master.zip
2026-01-11 23:28:08 - whisperx.transcribe - INFO - Performing transcription...
2026-01-11 23:28:09 - whisperx.asr - INFO - Detected language: cy (0.51) in first 30s of audio
Transcript: [0.706 --> 30.526] Felly mae'n gwneud gen i, ac mae'n Scottish. Felly mae'n gweithio'r rhaglen cyfle o'r idea yng Nghymru. Felly mae'n gweithio'n... Felly mae'n gweithio'n... Felly mae'n gweithio'n... Felly mae'n gweithio'n... Felly mae'n Gweithio'n... Felly mae'n Gweithio'n... Felly mae'n Gweithio'n... Felly mae'n Gweithio'n... Felly mae'n Gweithio'n... Felly mae'n Gweithio'n... Felly mae'n Gweithio'n... Felly mae'n Gweithio'n... Felly mae'n Gweithio'n... Felly mae'n Gweithio'n... Felly mae'n Gweithio'n...
Transcript: [30.658 --> 58.398] Ac rwy'n meddwl, rwy'n Llywodraeth, ond rwy'n Llywodraeth, ond rwy'n meddwl i'n meddwl i'n meddwl i'n meddwl i Llywodraeth, ond rwy'n meddwl i'n meddwl i'n meddwl i'n meddwl i'n meddwl i'n meddwl i'n meddwl i'n meddwl i'n meddwl i'n meddwl i'n meddwl i'n meddwl i'n meddwl i'n meddwl i'n meddwl i'n meddwl i'n meddwl i'n meddwl i'n meddwl i'n meddwl i'n meddwl i'n meddwl i'n meddwl i'n meddwl i'n meddwl i'n
Transcript: [59.17 --> 88.926] Felly y Pynonpa i'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio
Transcript: [89.89 --> 117.758] ac wrth gwrs rydyn ni'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod i'n gwybod
Transcript: [118.37 --> 145.47] Rydyn ni'n mynd i'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio'n gweithio
Transcript: [147.138 --> 169.438] Felly mae'r ffordd o'r ffordd o'r ffordd. Mae'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r ffordd o'r fford
Downloading: "https://download.pytorch.org/torchaudio/models/wav2vec2_fairseq_large_lv60k_asr_ls960.pth" to /home/***/.cache/torch/hub/checkpoints/wav2vec2_fairseq_large_lv60k_asr_ls960.pth
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.18G/1.18G [03:13<00:00, 6.52MB/s]
2026-01-11 23:31:36 - whisperx.transcribe - INFO - New language found (cy)! Previous was (en), loading new alignment model for new language...
2026-01-11 23:31:36 - whisperx.alignment - ERROR - No default alignment model for language: cy. Please find a wav2vec2.0 model finetuned on this language at https://huggingface.co/models, then pass the model name via --align_model [MODEL_NAME]
Traceback (most recent call last):
  File "/home/***/.cache/uv/archive-v0/9hYUItw3c5aKXEwGssDe5/bin/whisperx", line 12, in <module>
    sys.exit(cli())
  File "/home/***/.cache/uv/archive-v0/9hYUItw3c5aKXEwGssDe5/lib/python3.10/site-packages/whisperx/__main__.py", line 97, in cli
    transcribe_task(args, parser)
  File "/home/***/.cache/uv/archive-v0/9hYUItw3c5aKXEwGssDe5/lib/python3.10/site-packages/whisperx/transcribe.py", line 184, in transcribe_task
    align_model, align_metadata = load_align_model(
  File "/home/***/.cache/uv/archive-v0/9hYUItw3c5aKXEwGssDe5/lib/python3.10/site-packages/whisperx/alignment.py", line 91, in load_align_model
    raise ValueError(f"No default align-model for language: {language_code}")
ValueError: No default align-model for language: cy
Error handling video command: Failed to run WhisperX for /home/***/.cache/instant/video/e2db1d2995c3a86fe3070478ef1f5802ac98f27beec6c9d8b7514b3ac6d73cd9/e2db1d2995c3a86fe3070478ef1f5802ac98f27beec6c9d8b7514b3ac6d73cd9.mkv
Caused by: command ["uvx", "whisperx", "/home/***/.cache/instant/video/e2db1d2995c3a86fe3070478ef1f5802ac98f27beec6c9d8b7514b3ac6d73cd9/e2db1d2995c3a86fe3070478ef1f5802ac98f27beec6c9d8b7514b3ac6d73cd9.mkv", "--output_format", "json", "--output_dir", "/home/***/.cache/instant/video/e2db1d2995c3a86fe3070478ef1f5802ac98f27beec6c9d8b7514b3ac6d73cd9", "--vad_method", "silero", "--compute_type", "int8", "--device", "cpu", "--align_model", "WAV2VEC2_ASR_LARGE_LV60K_960H", "--batch_size", "4", "--segment_resolution", "chunk", "--beam_size", "5", "--patience", "1.0", "--max_line_width", "42", "--threads", "8"] exited with code 1
Error: Failed to run WhisperX for /home/***/.cache/instant/video/e2db1d2995c3a86fe3070478ef1f5802ac98f27beec6c9d8b7514b3ac6d73cd9/e2db1d2995c3a86fe3070478ef1f5802ac98f27beec6c9d8b7514b3ac6d73cd9.mkv
Caused by:
command ["uvx", "whisperx", "/home/***/.cache/instant/video/e2db1d2995c3a86fe3070478ef1f5802ac98f27beec6c9d8b7514b3ac6d73cd9/e2db1d2995c3a86fe3070478ef1f5802ac98f27beec6c9d8b7514b3ac6d73cd9.mkv", "--output_format", "json", "--output_dir", "/home/***/.cache/instant/video/e2db1d2995c3a86fe3070478ef1f5802ac98f27beec6c9d8b7514b3ac6d73cd9", "--vad_method", "silero", "--compute_type", "int8", "--device", "cpu", "--align_model", "WAV2VEC2_ASR_LARGE_LV60K_960H", "--batch_size", "4", "--segment_resolution", "chunk", "--beam_size", "5", "--patience", "1.0", "--max_line_width", "42", "--threads", "8"] exited with code 1
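At first, everything works perfectly: all required dependencies are installed (uv reports "Installed 121 packages"). But then:
WhisperX detects the language as Welsh instead of English ("Detected language: cy (0.51) in first 30s of audio"; "cy" is the ISO 639-1 code for Welsh, not Scottish) and turns the audio into repetitive gibberish. At the end the whole run fails: "uvx whisperx ..." exits with code 1 because WhisperX has no default alignment model for "cy" ("ValueError: No default align-model for language: cy"). The "--align_model WAV2VEC2_ASR_LARGE_LV60K_960H" passed on the command line does not prevent this, because after detecting a language other than the previous one (en), WhisperX tries to load the default alignment model for the new language. Why do we need the uv tool runner (uvx) to run the model at all? If it is necessary, it should probably be installed as a dependency as well.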
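A minimal sketch of how the calling tool could sidestep both failures, assuming it knows (or can fall back to) the spoken language before invoking WhisperX. "--language" and "--no_align" are existing WhisperX command-line options; everything else is hypothetical: the helper names, the 0.8 confidence threshold, the idea of re-running with a fallback language after a low-confidence detection, and the "alignable" set, which the caller has to supply with languages it knows have an alignment model. The regex only matches the "Detected language: cy (0.51)" log format shown above, and most flags from the original command are omitted for brevity.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Matches the WhisperX log line seen in this issue, e.g.
# "Detected language: cy (0.51) in first 30s of audio".
_DETECTED = re.compile(r"Detected language: (\w+) \(([0-9.]+)\)")


def parse_detection(log_text: str) -> Optional[Tuple[str, float]]:
    """Return (language_code, probability) from WhisperX output, if present."""
    match = _DETECTED.search(log_text)
    if match is None:
        return None
    return match.group(1), float(match.group(2))


def choose_language(detected: Optional[Tuple[str, float]], fallback: str,
                    min_probability: float = 0.8) -> str:
    """Trust auto-detection only when it is confident (threshold is hypothetical)."""
    if detected is not None and detected[1] >= min_probability:
        return detected[0]
    return fallback


def build_whisperx_args(media: str, output_dir: str, language: Optional[str],
                        alignable: Iterable[str]) -> List[str]:
    """Build an argv for `uvx whisperx`; most flags from the log are omitted."""
    args = ["uvx", "whisperx", media,
            "--output_format", "json", "--output_dir", output_dir,
            "--device", "cpu", "--compute_type", "int8"]
    if language is None:
        # Leave detection to WhisperX (first 30 s), as in the failing run.
        return args
    # Pinning the language skips WhisperX's own detection entirely.
    args += ["--language", language]
    if language not in set(alignable):
        # No alignment model known for this language: skip alignment instead
        # of failing with "No default align-model for language".
        args.append("--no_align")
    return args


if __name__ == "__main__":
    detection = parse_detection("Detected language: cy (0.51) in first 30s of audio")
    language = choose_language(detection, fallback="en")  # 0.51 < 0.8, so "en"
    print(build_whisperx_args("/tmp/video.mkv", "/tmp/out", language, alignable={"en"}))
```

Passing "--language en" directly is the simplest option when the caller already knows the video is in English, since it avoids detection on the first 30 s altogether; the confidence fallback is only one possible retry policy after a run like the one above.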