qwen-vl-utils using decord to read video.
WARNING 05-15 11:34:45 [input_processor.py:235] Passing raw prompts to InputProcessor is deprecated and will be removed in v0.18. You should instead pass the outputs of Renderer.render_cmpl() or Renderer.render_chat().
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/huggingface_hub/dataclasses.py", line 144, in __strict_setattr__
[rank0]: validator(value)
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/huggingface_hub/dataclasses.py", line 625, in validator
[rank0]: type_validator(field.name, value, field.type)
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/huggingface_hub/dataclasses.py", line 482, in type_validator
[rank0]: type_validator(name, value, args[0])
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/huggingface_hub/dataclasses.py", line 470, in type_validator
[rank0]: validator(name, value, args)
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/huggingface_hub/dataclasses.py", line 507, in _validate_union
[rank0]: raise TypeError(
[rank0]: TypeError: Field 'fps' with value [1.991761570147807] doesn't match any type in (<class 'int'>, <class 'float'>, <class 'NoneType'>). Errors: Field 'fps' expected int, got list (value: [1.991761570147807]); Field 'fps' expected float, got list (value: [1.991761570147807]); Field 'fps' expected NoneType, got list (value: [1.991761570147807])
[rank0]: The above exception was the direct cause of the following exception:
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/multimodal/processing/context.py", line 269, in call_hf_processor
[rank0]: output = hf_processor(**data, **allowed_kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py", line 82, in __call__
[rank0]: output_kwargs = self._merge_kwargs(
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/transformers/processing_utils.py", line 1377, in _merge_kwargs
[rank0]: validate_typed_dict(typed_dict_obj, output_kwargs[key])
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/huggingface_hub/dataclasses.py", line 333, in validate_typed_dict
[rank0]: strict_cls(**data) # will raise if validation fails
[rank0]: ^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/huggingface_hub/dataclasses.py", line 275, in init_with_validate
[rank0]: initial_init(self, *args, **kwargs) # type: ignore [call-arg]
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "<string>", line 21, in __init__
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/huggingface_hub/dataclasses.py", line 146, in __strict_setattr__
[rank0]: raise StrictDataclassFieldValidationError(field=name, cause=e) from e
[rank0]: huggingface_hub.errors.StrictDataclassFieldValidationError: Validation error for field 'fps':
[rank0]: TypeError: Field 'fps' with value [1.991761570147807] doesn't match any type in (<class 'int'>, <class 'float'>, <class 'NoneType'>). Errors: Field 'fps' expected int, got list (value: [1.991761570147807]); Field 'fps' expected float, got list (value: [1.991761570147807]); Field 'fps' expected NoneType, got list (value: [1.991761570147807])
[rank0]: The above exception was the direct cause of the following exception:
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/ms-swift/swift/cli/infer.py", line 5, in <module>
[rank0]: infer_main()
[rank0]: File "/home/ms-swift/swift/pipelines/infer/infer.py", line 308, in infer_main
[rank0]: return SwiftInfer(args).main()
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ms-swift/swift/pipelines/base.py", line 52, in main
[rank0]: result = self.run()
[rank0]: ^^^^^^^^^^
[rank0]: File "/home/ms-swift/swift/pipelines/infer/infer.py", line 97, in run
[rank0]: result = self.infer_dataset()
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ms-swift/swift/pipelines/infer/infer.py", line 257, in infer_dataset
[rank0]: result = self._batch_infer(shard_dataset, request_config)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ms-swift/swift/pipelines/infer/infer.py", line 296, in _batch_infer
[rank0]: resp_list = self.infer(val_dataset, request_config, use_tqdm=True, **self.infer_kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ms-swift/swift/infer_engine/vllm_engine.py", line 784, in infer
[rank0]: self._add_request(inputs, generation_config, request_id, adapter_request=adapter_request)
[rank0]: File "/home/ms-swift/swift/infer_engine/vllm_engine.py", line 453, in _add_request
[rank0]: return self.engine.add_request(request_id, llm_inputs, generation_config, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 248, in add_request
[rank0]: request = self.input_processor.process_inputs(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/v1/engine/input_processor.py", line 244, in process_inputs
[rank0]: processed_inputs = self.input_preprocessor.preprocess(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 288, in preprocess
[rank0]: return self._process_decoder_only_prompt(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 269, in _process_decoder_only_prompt
[rank0]: return self._prompt_to_llm_inputs(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 220, in _prompt_to_llm_inputs
[rank0]: return self._process_tokens(prompt) # type: ignore[arg-type]
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 144, in _process_tokens
[rank0]: inputs = self._process_multimodal(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 103, in _process_multimodal
[rank0]: return self.renderer._process_multimodal(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/renderers/base.py", line 650, in _process_multimodal
[rank0]: mm_inputs = mm_processor.apply(mm_processor_inputs, mm_timing_ctx)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/multimodal/processing/processor.py", line 1685, in apply
[rank0]: ) = self._cached_apply_hf_processor(inputs, timing_ctx)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/multimodal/processing/processor.py", line 1474, in _cached_apply_hf_processor
[rank0]: ) = self._apply_hf_processor_main(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/multimodal/processing/processor.py", line 1291, in _apply_hf_processor_main
[rank0]: mm_processed_data = self._apply_hf_processor_mm_only(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/multimodal/processing/processor.py", line 1232, in _apply_hf_processor_mm_only
[rank0]: _, mm_processed_data, _ = self._apply_hf_processor_text_mm(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/multimodal/processing/processor.py", line 1153, in _apply_hf_processor_text_mm
[rank0]: processed_data = self._call_hf_processor(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2_5_vl.py", line 941, in _call_hf_processor
[rank0]: return super()._call_hf_processor(prompt, mm_data, mm_kwargs, tok_kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/multimodal/processing/processor.py", line 1110, in _call_hf_processor
[rank0]: return self.info.ctx.call_hf_processor(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ms-swift/.venv/lib/python3.12/site-packages/vllm/multimodal/processing/context.py", line 298, in call_hf_processor
[rank0]: raise ValueError(msg) from exc
[rank0]: ValueError: Failed to apply Qwen2_5_VLProcessor on data={'text': '<|video_pad|>', 'videos': [array([[[[ 2, 2, 2, ..., 0, 0, 0],
[rank0]: ...,
[rank0]: [ 35, 72, 86, ..., 20, 20, 20]]]],
[rank0]: shape=(274, 3, 560, 1008), dtype=uint8)]} with kwargs={'do_resize': False, 'fps': [1.991761570147807], 'return_tensors': 'pt'}
[rank0]:[W515 11:34:50.839824072 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
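Checklist
Bug Description
#9357: the previous PR was closed automatically, so I am opening a new one.
Run inference on any video.
It fails with the vLLM backend but works with the transformers backend.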
How to Reproduce
As far as I recall, this problem has been present since the later transformers v4 releases. Currently installed: transformers 5.8.1, swift at the latest commit, vllm 0.19.1.
Additional Information
fix
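A recent PR refactored the code around this area; I hope this can be fixed along the way. It may be worth checking whether fps is being passed in correctly.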
swift/template/templates/qwen.py
EvolvingLMMs-Lab/lmms-eval#1269
Some frameworks previously forgot to pass video_metadata here, which is fairly disastrous for temporal understanding of videos.
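For illustration only, here is a minimal sketch of one possible direction. It assumes the failure comes solely from fps arriving as a one-element list ([1.991761570147807]) while the processor's strict validation accepts only int, float, or None, as the traceback reports. normalize_fps is a hypothetical helper, not an existing ms-swift, vLLM, or transformers function, and where it would be called (for example somewhere in swift/template/templates/qwen.py) is also an assumption. It only handles plain Python numbers.

```python
from collections.abc import Sequence
from typing import Optional, Union

Number = Union[int, float]


def normalize_fps(fps: Union[Number, Sequence[Number], None]) -> Optional[Number]:
    """Hypothetical helper: collapse a per-video fps list into a scalar.

    The traceback shows the processor rejecting a list for 'fps', so a
    one-element list is unwrapped. Lists with differing values cannot be
    represented by a single scalar and are rejected explicitly.
    """
    if fps is None or isinstance(fps, (int, float)):
        return fps  # already in an accepted form
    values = list(fps)
    if not values:
        return None  # nothing to pass along
    if all(v == values[0] for v in values):
        return values[0]  # single video, or several videos sharing one fps
    raise ValueError(f"Cannot reduce differing fps values to a scalar: {values}")


# kwargs copied from the error message above
mm_kwargs = {'do_resize': False, 'fps': [1.991761570147807], 'return_tensors': 'pt'}
mm_kwargs['fps'] = normalize_fps(mm_kwargs['fps'])
```

After this call, mm_kwargs['fps'] is a plain float. Whether unwrapping is the right behaviour for batches that mix videos with different frame rates is left open here; the sketch raises in that case instead of guessing.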