Skip to content

[bugfix] guard load_audio against bytes input#9606

Open
he-yufeng wants to merge 2 commits into
modelscope:mainfrom
he-yufeng:fix/load-audio-bytes-startswith
Open

[bugfix] guard load_audio against bytes input#9606
he-yufeng wants to merge 2 commits into
modelscope:mainfrom
he-yufeng:fix/load-audio-bytes-startswith

Conversation

@he-yufeng

Copy link
Copy Markdown

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

load_audio() accepts Union[str, bytes]. When the first librosa.load() fails (e.g. an audio format soundfile can't read on its own), it falls into the except block and checks audio.startswith(('http://', 'https://')). For bytes input this raises TypeError: a bytes-like object is required, not 'str', masking the real decode error — and bytes is a documented input type (the dataset can hand the template raw audio bytes, and load_file already handles bytes).

This guards the scheme check with isinstance(audio, str), so byte audio takes the non-URL fallback path instead of crashing on the type mismatch. String/URL behavior is unchanged.

Experiment results

Added test_load_audio_bytes_input_does_not_crash_on_fallback in tests/general/test_template.py: it injects a stub librosa whose first load raises, passes bytes audio, and asserts load_audio reaches the fallback without a TypeError. The test fails on the previous code (TypeError at the startswith line) and passes with this change. flake8 / yapf / isort are clean on both files.

$ python -m pytest tests/general/test_template.py::test_load_audio_bytes_input_does_not_crash_on_fallback -q
1 passed

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request aims to prevent a TypeError in load_audio when fallback decoding is triggered with bytes input, by ensuring audio is a string before calling startswith. It also adds a unit test to verify this behavior. However, the review feedback correctly points out that a similar TypeError will still occur in the subsequent else block when _check_path(audio) is called with bytes input, and that mocking _check_path in the test masks this underlying issue.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

res = librosa.load(audio_io, sr=sampling_rate)
except Exception:
if audio.startswith(('http://', 'https://')):
if isinstance(audio, str) and audio.startswith(('http://', 'https://')):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

While guarding the startswith check with isinstance(audio, str) prevents a TypeError on this line, a similar TypeError will still occur on line 309 in the else block:

audio_io = _check_path(audio) or audio

If audio is of type bytes, _check_path(audio) is called. _check_path expects a str and performs string-specific operations (like data.startswith('data:') on line 118), which will raise TypeError: startswith first arg must be bytes or a tuple of bytes, not str.

To fully support bytes input without crashing, we should also avoid calling _check_path when audio is bytes.

Comment thread tests/general/test_template.py Outdated

fake_librosa.load = fake_load
monkeypatch.setitem(sys.modules, 'librosa', fake_librosa)
monkeypatch.setattr(vision_utils, '_check_path', lambda p: None)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Mocking _check_path here masks a secondary TypeError bug in vision_utils.py line 309. When audio is bytes, _check_path(audio) is called in the else block, which crashes because _check_path performs string operations (e.g., data.startswith('data:')).

If we don't mock _check_path here, the test will correctly fail and expose this issue. Consider removing this mock and fixing the underlying bug in vision_utils.py by ensuring _check_path is only called for str inputs.

…back

When load_audio's first decode fails for bytes input, the except branch
calls _check_path(audio) with bytes, which then crashed on the str-only
checks (e.g. data.startswith('data:')). Return None for non-str input so the
caller falls back to the raw bytes, completing the bytes guard started in the
previous commit. The existing fallback test now exercises the real
_check_path (the patch is removed) and a focused _check_path bytes test is
added.
@he-yufeng he-yufeng changed the title [bugfix] guard load_audio URL check against bytes input [bugfix] guard load_audio against bytes input Jun 20, 2026
@he-yufeng

Copy link
Copy Markdown
Author

Extended this to fully guard load_audio against bytes input. The first commit fixed the audio.startswith(...) check (line 305); but on the same except-branch fallback, _check_path(audio) was still called with bytes and crashed on its str-only checks (data.startswith('data:')). The second commit makes _check_path return None for non-str input so the caller falls back to the raw bytes. The existing fallback test now exercises the real _check_path (the mock is removed) and a focused _check_path bytes test is added.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant