[bug] Qwen2.5-VL vllm infer video failed on transformers v5 #9582

Description

@Tohrusky

Checklist

  • I have searched existing issues, and this is a new bug report.

Bug Description

The previous one, #9357, was closed automatically, so I am opening a new issue.

Run inference on any video:

uv run --no-sync swift infer \
--model Qwen/Qwen2.5-VL-3B-Instruct \
--use_hf True \
--infer_backend vllm \
--val_dataset_sample 4 \
--val_dataset ./video.jsonl \
--temperature 0

This fails with the vllm backend but works with the transformers (tf) backend.

qwen-vl-utils using decord to read video.

WARNING 05-15 11:34:45 [input_processor.py:235] Passing raw prompts to InputProcessor is deprecated and will be removed in v0.18. You should instead pass the outputs of Renderer.render_cmpl() or Renderer.render_chat().
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/huggingface_hub/dataclasses.py", line 144, in __strict_setattr__
[rank0]:     validator(value)
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/huggingface_hub/dataclasses.py", line 625, in validator
[rank0]:     type_validator(field.name, value, field.type)
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/huggingface_hub/dataclasses.py", line 482, in type_validator
[rank0]:     type_validator(name, value, args[0])
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/huggingface_hub/dataclasses.py", line 470, in type_validator
[rank0]:     validator(name, value, args)
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/huggingface_hub/dataclasses.py", line 507, in _validate_union
[rank0]:     raise TypeError(
[rank0]: TypeError: Field 'fps' with value [1.991761570147807] doesn't match any type in (<class 'int'>, <class 'float'>, <class 'NoneType'>). Errors: Field 'fps' expected int, got list (value: [1.991761570147807]); Field 'fps' expected float, got list (value: [1.991761570147807]); Field 'fps' expected NoneType, got list (value: [1.991761570147807])

[rank0]: The above exception was the direct cause of the following exception:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/multimodal/processing/context.py", line 269, in call_hf_processor
[rank0]:     output = hf_processor(**data, **allowed_kwargs)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py", line 82, in __call__
[rank0]:     output_kwargs = self._merge_kwargs(
[rank0]:                     ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/transformers/processing_utils.py", line 1377, in _merge_kwargs
[rank0]:     validate_typed_dict(typed_dict_obj, output_kwargs[key])
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/huggingface_hub/dataclasses.py", line 333, in validate_typed_dict
[rank0]:     strict_cls(**data)  # will raise if validation fails
[rank0]:     ^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/huggingface_hub/dataclasses.py", line 275, in init_with_validate
[rank0]:     initial_init(self, *args, **kwargs)  # type: ignore [call-arg]
[rank0]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "<string>", line 21, in __init__
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/huggingface_hub/dataclasses.py", line 146, in __strict_setattr__
[rank0]:     raise StrictDataclassFieldValidationError(field=name, cause=e) from e
[rank0]: huggingface_hub.errors.StrictDataclassFieldValidationError: Validation error for field 'fps':
[rank0]:     TypeError: Field 'fps' with value [1.991761570147807] doesn't match any type in (<class 'int'>, <class 'float'>, <class 'NoneType'>). Errors: Field 'fps' expected int, got list (value: [1.991761570147807]); Field 'fps' expected float, got list (value: [1.991761570147807]); Field 'fps' expected NoneType, got list (value: [1.991761570147807])

[rank0]: The above exception was the direct cause of the following exception:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/ms-swift/swift/cli/infer.py", line 5, in <module>
[rank0]:     infer_main()
[rank0]:   File "/home/ms-swift/swift/pipelines/infer/infer.py", line 308, in infer_main
[rank0]:     return SwiftInfer(args).main()
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ms-swift/swift/pipelines/base.py", line 52, in main
[rank0]:     result = self.run()
[rank0]:              ^^^^^^^^^^
[rank0]:   File "/home/ms-swift/swift/pipelines/infer/infer.py", line 97, in run
[rank0]:     result = self.infer_dataset()
[rank0]:              ^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ms-swift/swift/pipelines/infer/infer.py", line 257, in infer_dataset
[rank0]:     result = self._batch_infer(shard_dataset, request_config)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ms-swift/swift/pipelines/infer/infer.py", line 296, in _batch_infer
[rank0]:     resp_list = self.infer(val_dataset, request_config, use_tqdm=True, **self.infer_kwargs)
[rank0]:                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ms-swift/swift/infer_engine/vllm_engine.py", line 784, in infer
[rank0]:     self._add_request(inputs, generation_config, request_id, adapter_request=adapter_request)
[rank0]:   File "/home/ms-swift/swift/infer_engine/vllm_engine.py", line 453, in _add_request
[rank0]:     return self.engine.add_request(request_id, llm_inputs, generation_config, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 248, in add_request
[rank0]:     request = self.input_processor.process_inputs(
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/v1/engine/input_processor.py", line 244, in process_inputs
[rank0]:     processed_inputs = self.input_preprocessor.preprocess(
[rank0]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 288, in preprocess
[rank0]:     return self._process_decoder_only_prompt(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 269, in _process_decoder_only_prompt
[rank0]:     return self._prompt_to_llm_inputs(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 220, in _prompt_to_llm_inputs
[rank0]:     return self._process_tokens(prompt)  # type: ignore[arg-type]
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 144, in _process_tokens
[rank0]:     inputs = self._process_multimodal(
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 103, in _process_multimodal
[rank0]:     return self.renderer._process_multimodal(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/renderers/base.py", line 650, in _process_multimodal
[rank0]:     mm_inputs = mm_processor.apply(mm_processor_inputs, mm_timing_ctx)
[rank0]:                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/multimodal/processing/processor.py", line 1685, in apply
[rank0]:     ) = self._cached_apply_hf_processor(inputs, timing_ctx)
[rank0]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/multimodal/processing/processor.py", line 1474, in _cached_apply_hf_processor
[rank0]:     ) = self._apply_hf_processor_main(
[rank0]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/multimodal/processing/processor.py", line 1291, in _apply_hf_processor_main
[rank0]:     mm_processed_data = self._apply_hf_processor_mm_only(
[rank0]:                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/multimodal/processing/processor.py", line 1232, in _apply_hf_processor_mm_only
[rank0]:     _, mm_processed_data, _ = self._apply_hf_processor_text_mm(
[rank0]:                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/multimodal/processing/processor.py", line 1153, in _apply_hf_processor_text_mm
[rank0]:     processed_data = self._call_hf_processor(
[rank0]:                      ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2_5_vl.py", line 941, in _call_hf_processor
[rank0]:     return super()._call_hf_processor(prompt, mm_data, mm_kwargs, tok_kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/multimodal/processing/processor.py", line 1110, in _call_hf_processor
[rank0]:     return self.info.ctx.call_hf_processor(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/multimodal/processing/context.py", line 298, in call_hf_processor
[rank0]:     raise ValueError(msg) from exc
[rank0]: ValueError: Failed to apply Qwen2_5_VLProcessor on data={'text': '<|video_pad|>', 'videos': [array([[[[  2,   2,   2, ...,   0,   0,   0],
[rank0]:          [  2,   2,   2, ...,   0,   0,   0],
[rank0]:          [  2,   2,   2, ...,   0,   0,   0],
[rank0]:          ...,
[rank0]:          [  5,   5,   5, ...,   4,   4,   4],
[rank0]:          [  5,   5,   5, ...,   7,   7,   7],
[rank0]:          [  5,   5,   5, ...,  10,  10,  10]],

[rank0]:         [[  2,   2,   2, ...,   0,   0,   0],
[rank0]:          [  2,   2,   2, ...,   0,   0,   0],
[rank0]:          [  2,   2,   2, ...,   0,   0,   0],
[rank0]:          ...,
[rank0]:          [  4,   4,   4, ...,   4,   4,   4],
[rank0]:          [  4,   4,   4, ...,   7,   7,   7],
[rank0]:          [  4,   4,   4, ...,  10,  10,  10]],

[rank0]:         [[  2,   2,   2, ...,   0,   0,   0],
[rank0]:          [  2,   2,   2, ...,   0,   0,   0],
[rank0]:          [  2,   2,   2, ...,   0,   0,   0],
[rank0]:          ...,
[rank0]:          [  9,   9,   9, ...,  12,  12,  12],
[rank0]:          [  9,   9,   9, ...,  15,  15,  15],
[rank0]:          [  9,   9,   9, ...,  18,  18,  18]]],


[rank0]:        [[[  0,   0,   0, ...,   0,   0,   0],
[rank0]:          [  0,   0,   0, ...,   0,   0,   0],
[rank0]:          [  0,   0,   0, ...,   0,   0,   0],
[rank0]:          ...,
[rank0]:          [  3,   3,   4, ...,   7,   7,   7],
[rank0]:          [  3,   3,   4, ...,   6,   6,   6],
[rank0]:          [  3,   3,   4, ...,   6,   6,   6]],

[rank0]:         [[  5,   5,   5, ...,   0,   0,   0],
[rank0]:          [  5,   5,   5, ...,   0,   0,   0],
[rank0]:          [  4,   4,   4, ...,   0,   0,   0],
[rank0]:          ...,
[rank0]:          [  2,   2,   3, ...,   7,   7,   7],
[rank0]:          [  2,   2,   3, ...,   6,   6,   6],
[rank0]:          [  2,   2,   3, ...,   6,   6,   6]],

[rank0]:         [[  4,   4,   4, ...,   0,   0,   0],
[rank0]:          [  4,   4,   4, ...,   0,   0,   0],
[rank0]:          [  3,   3,   3, ...,   0,   0,   0],
[rank0]:          ...,
[rank0]:          [  7,   7,   8, ...,  17,  17,  17],
[rank0]:          [  7,   7,   8, ...,  16,  16,  16],
[rank0]:          [  7,   7,   8, ...,  16,  16,  16]]],


[rank0]:        [[[  2,   1,   0, ...,   0,   0,   0],
[rank0]:          [  3,   2,   1, ...,   0,   0,   0],
[rank0]:          [  5,   4,   2, ...,   0,   0,   0],
[rank0]:          ...,
[rank0]:          [  2,   2,   2, ...,   8,   8,   8],
[rank0]:          [  2,   2,   2, ...,   9,   9,   9],
[rank0]:          [  2,   2,   2, ...,   9,   9,   9]],

[rank0]:         [[  7,   6,   4, ...,   1,   1,   1],
[rank0]:          [  8,   7,   6, ...,   1,   1,   1],
[rank0]:          [ 10,   9,   7, ...,   1,   1,   1],
[rank0]:          ...,
[rank0]:          [  1,   1,   1, ...,   8,   8,   8],
[rank0]:          [  1,   1,   1, ...,   9,   9,   9],
[rank0]:          [  1,   1,   1, ...,   9,   9,   9]],

[rank0]:         [[  1,   0,   0, ...,   1,   1,   1],
[rank0]:          [  2,   1,   0, ...,   1,   1,   1],
[rank0]:          [  4,   3,   1, ...,   1,   1,   1],
[rank0]:          ...,
[rank0]:          [  6,   6,   6, ...,  18,  18,  18],
[rank0]:          [  6,   6,   6, ...,  19,  19,  19],
[rank0]:          [  6,   6,   6, ...,  19,  19,  19]]],


[rank0]:        ...,


[rank0]:        [[[165, 165, 165, ...,  79,  79,  79],
[rank0]:          [165, 165, 165, ...,  79,  79,  79],
[rank0]:          [165, 165, 165, ...,  79,  79,  79],
[rank0]:          ...,
[rank0]:          [ 97,  97,  97, ..., 181, 182, 183],
[rank0]:          [ 97,  97,  97, ..., 181, 182, 183],
[rank0]:          [ 97,  97,  97, ..., 181, 183, 183]],

[rank0]:         [[ 32,  32,  32, ...,  29,  29,  29],
[rank0]:          [ 32,  32,  32, ...,  29,  29,  29],
[rank0]:          [ 32,  32,  32, ...,  29,  29,  29],
[rank0]:          ...,
[rank0]:          [ 23,  23,  23, ...,  91,  92,  93],
[rank0]:          [ 23,  23,  23, ...,  91,  92,  93],
[rank0]:          [ 23,  23,  23, ...,  91,  93,  93]],

[rank0]:         [[ 24,  24,  24, ...,  12,  12,  12],
[rank0]:          [ 24,  24,  24, ...,  12,  12,  12],
[rank0]:          [ 24,  24,  24, ...,  12,  12,  12],
[rank0]:          ...,
[rank0]:          [ 14,  14,  14, ...,  47,  48,  49],
[rank0]:          [ 14,  14,  14, ...,  47,  48,  49],
[rank0]:          [ 14,  14,  14, ...,  47,  49,  49]]],


[rank0]:        [[[160, 160, 160, ..., 145, 169, 192],
[rank0]:          [160, 160, 160, ..., 156, 181, 204],
[rank0]:          [160, 160, 160, ..., 168, 194, 218],
[rank0]:          ...,
[rank0]:          [ 80,  80,  80, ..., 143, 143, 143],
[rank0]:          [ 80,  80,  80, ..., 143, 143, 143],
[rank0]:          [ 80,  80,  80, ..., 143, 143, 143]],

[rank0]:         [[ 24,  24,  24, ..., 111, 135, 158],
[rank0]:          [ 24,  24,  24, ..., 122, 147, 170],
[rank0]:          [ 24,  24,  24, ..., 134, 160, 184],
[rank0]:          ...,
[rank0]:          [ 25,  25,  25, ...,  55,  55,  55],
[rank0]:          [ 25,  25,  25, ...,  55,  55,  55],
[rank0]:          [ 25,  25,  25, ...,  55,  55,  55]],

[rank0]:         [[ 22,  22,  22, ...,  89, 113, 136],
[rank0]:          [ 22,  22,  22, ..., 100, 125, 148],
[rank0]:          [ 22,  22,  22, ..., 112, 138, 162],
[rank0]:          ...,
[rank0]:          [ 12,  12,  12, ...,  37,  37,  37],
[rank0]:          [ 12,  12,  12, ...,  37,  37,  37],
[rank0]:          [ 12,  12,  12, ...,  37,  37,  37]]],


[rank0]:        [[[129, 129, 128, ...,  40,  40,  40],
[rank0]:          [129, 129, 128, ...,  40,  40,  40],
[rank0]:          [129, 129, 128, ...,  40,  40,  40],
[rank0]:          ...,
[rank0]:          [122, 167, 207, ..., 178, 178, 178],
[rank0]:          [141, 182, 212, ..., 178, 178, 178],
[rank0]:          [156, 195, 211, ..., 178, 178, 178]],

[rank0]:         [[ 16,  16,  17, ...,  34,  34,  34],
[rank0]:          [ 16,  16,  17, ...,  34,  34,  34],
[rank0]:          [ 16,  16,  17, ...,  34,  34,  34],
[rank0]:          ...,
[rank0]:          [ 55, 100, 139, ...,  19,  19,  19],
[rank0]:          [ 75, 116, 146, ...,  19,  19,  19],
[rank0]:          [ 90, 129, 147, ...,  19,  19,  19]],

[rank0]:         [[ 11,  11,  11, ...,  11,  11,  11],
[rank0]:          [ 11,  11,  11, ...,  11,  11,  11],
[rank0]:          [ 11,  11,  11, ...,  11,  11,  11],
[rank0]:          ...,
[rank0]:          [  8,  49,  88, ...,  20,  20,  20],
[rank0]:          [ 22,  61,  89, ...,  20,  20,  20],
[rank0]:          [ 35,  72,  86, ...,  20,  20,  20]]]],
[rank0]:       shape=(274, 3, 560, 1008), dtype=uint8)]} with kwargs={'do_resize': False, 'fps': [1.991761570147807], 'return_tensors': 'pt'}
[rank0]:[W515 11:34:50.839824072 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())

How to Reproduce

As far as I recall, this problem has been present since the later transformers v4 releases. Currently installed: transformers 5.8.1, swift at the latest commit, vllm 0.19.1.

Additional Information

fix

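A recent PR refactored the code around this spot; I hope this can be fixed along the way. It is worth checking whether fps is passed in correctly.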
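To illustrate what "passed in correctly" means for the traceback above, here is a minimal sketch that only mimics the union check reported in the error message. The name `matches_fps_type` is hypothetical and is not part of huggingface_hub or transformers; the accepted types are copied from the error text.

```python
from typing import Any

# Types accepted for 'fps', as listed in the error message in the traceback.
ALLOWED_FPS_TYPES = (int, float, type(None))


def matches_fps_type(value: Any) -> bool:
    """Hypothetical stand-in for the strict union check on the 'fps' field."""
    return isinstance(value, ALLOWED_FPS_TYPES)


fps_from_log = [1.991761570147807]  # value taken from the traceback

print(matches_fps_type(fps_from_log))     # False: a list is rejected
print(matches_fps_type(fps_from_log[0]))  # True: the bare float is accepted
```

So the value itself looks plausible; it is the list wrapper that the validation rejects.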

swift/template/templates/qwen.py

            if self.version == 'v2_5':
                if self.mode != 'vllm':  # vllm may need to be bypassed here too; judging from the newer transformers code, video_metadata apparently also has to be passed, as for qwen3vl
                    inputs.mm_processor_kwargs.setdefault('fps', []).append(video_kwargs)
            elif self.version == 'v3':
                if self.mode != 'vllm':
                    video, video_metadata = video
                    inputs.mm_processor_kwargs.setdefault('video_metadata', []).append(video_metadata)
                    tokens = ['<|video_pad|>']
                inputs.mm_processor_kwargs['do_sample_frames'] = False
            if isinstance(video, torch.Tensor):
                video = video.to(torch.uint8)
            inputs.videos[index] = video
            return tokens

EvolvingLMMs-Lab/lmms-eval#1269

Some frameworks previously failed to pass video_metadata here, which is quite disastrous for temporal understanding of videos.
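As a possible stopgap for the fps kwarg seen in the traceback, the per-video list could be reduced to a scalar before the processor is called. This is a sketch under stated assumptions: `normalize_fps` is a hypothetical helper, not part of swift, vllm, or transformers, and whether a single scalar is sufficient for correct video timing in Qwen2.5-VL (as opposed to passing video_metadata) is not confirmed here.

```python
from typing import Optional, Sequence, Union

Number = Union[int, float]


def normalize_fps(fps: Union[Number, Sequence[Number], None]) -> Optional[Number]:
    """Hypothetical helper: coerce a per-video fps list into the scalar
    (or None) that the strict kwargs validation in the traceback accepts."""
    if fps is None or isinstance(fps, (int, float)):
        return fps
    values = list(fps)
    if not values:
        return None
    if len(set(values)) == 1:
        return values[0]
    # Assumption: several different rates cannot be expressed as one scalar.
    raise ValueError(f'cannot reduce differing fps values to a scalar: {values}')


# kwargs copied from the final ValueError message in the traceback.
kwargs = {'do_resize': False, 'fps': [1.991761570147807], 'return_tensors': 'pt'}
kwargs['fps'] = normalize_fps(kwargs['fps'])
print(kwargs['fps'])
```

The helper raises rather than guessing when a batch contains videos with different frame rates, since silently picking one value would distort timing for the others.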
