MiniCPM-V-4_5 fails on vLLM with larger batch sizes (>=16) #304

@zhm-algo

Description

Test script: https://github.com/Zjq9409/intel_benchmark_vlm/blob/master/performance_benchmark/online/intel_benchmark_server.sh

GPU: B60 x2 and B70 x1
vLLM: b7
vLLM settings: max model length 32768, max num batched tokens 8192
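
For anyone trying to reproduce this, the settings above can be written out as a server command line. The sketch below is an assumption: it combines the two values listed here with fields visible in the engine config dump in the error log (model path, tensor parallel size 2, float16, eager mode, trust remote code). The linked benchmark script may pass different or additional options.

```python
import shlex

# Values taken from the issue text and from the engine config dump in the log.
# The reporter's actual invocation is not shown, so this argv is an assumption.
settings = {
    "--max-model-len": "32768",
    "--max-num-batched-tokens": "8192",
    "--tensor-parallel-size": "2",
    "--dtype": "float16",
    "--served-model-name": "MiniCPM-V-4_5",
}
flags = ["--trust-remote-code", "--enforce-eager"]

argv = ["vllm", "serve", "/llm/models/MiniCPM-V-4_5"]
for name, value in settings.items():
    argv += [name, value]
argv += flags

# Print a copy-pasteable shell command.
print(shlex.join(argv))
```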

Error log:
(APIServer pid=1112) INFO 03-02 02:46:13 [loggers.py:236] Engine 000: Avg prompt throughput: 147.0 tokens/s, Avg generation throughput: 12.9 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.5%, Prefix cache hit rate: 0.0%, MM cache hit rate: 80.5%
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] WorkerProc hit an exception.
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] Traceback (most recent call last):
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 810, in worker_busy_loop
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] output = func(*args, **kwargs)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 367, in execute_model
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return self.worker.execute_model(scheduler_output, *args, **kwargs)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return func(*args, **kwargs)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 563, in execute_model
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] output = self.model_runner.execute_model(
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return func(*args, **kwargs)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2756, in execute_model
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ) = self._preprocess(
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2331, in _preprocess
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] self._execute_mm_encoder(scheduler_output)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1978, in _execute_mm_encoder
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] curr_group_outputs = model.embed_multimodal(**mm_kwargs_group)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/minicpmv.py", line 1147, in embed_multimodal
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return self._process_multimodal_inputs(modalities)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/minicpmv.py", line 1130, in _process_multimodal_inputs
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] image_embeddings = self._process_vision_input(image_input)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/minicpmv.py", line 1115, in _process_vision_input
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] image_features_flat = self.get_vision_hidden_states(image_input)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/minicpmv.py", line 1691, in get_vision_hidden_states
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return self.resampler(vision_embedding, tgt_sizes, all_temporal_ids)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return forward_call(*args, **kwargs)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/minicpmv.py", line 426, in forward
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] out = self.attn(
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return forward_call(*args, **kwargs)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/activation.py", line 1488, in forward
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] attn_output, attn_output_weights = F.multi_head_attention_forward(
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/functional.py", line 6307, in multi_head_attention_forward
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] q, k, v = _in_projection_packed(query, key, value, in_proj_weight, in_proj_bias)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/functional.py", line 5733, in _in_projection_packed
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return linear(q, w_q, b_q), linear(k, w_k, b_k), linear(v, w_v, b_v)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] RuntimeError: UR backend failed. UR backend returns:40 (UR_RESULT_ERROR_OUT_OF_RESOURCES)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] Traceback (most recent call last):
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 810, in worker_busy_loop
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] output = func(*args, **kwargs)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 367, in execute_model
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return self.worker.execute_model(scheduler_output, *args, **kwargs)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return func(*args, **kwargs)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 563, in execute_model
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] output = self.model_runner.execute_model(
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return func(*args, **kwargs)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2756, in execute_model
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ) = self._preprocess(
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2331, in _preprocess
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] self._execute_mm_encoder(scheduler_output)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1978, in _execute_mm_encoder
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] curr_group_outputs = model.embed_multimodal(**mm_kwargs_group)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/minicpmv.py", line 1147, in embed_multimodal
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return self._process_multimodal_inputs(modalities)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/minicpmv.py", line 1130, in _process_multimodal_inputs
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] image_embeddings = self._process_vision_input(image_input)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/minicpmv.py", line 1115, in _process_vision_input
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] image_features_flat = self.get_vision_hidden_states(image_input)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/minicpmv.py", line 1691, in get_vision_hidden_states
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return self.resampler(vision_embedding, tgt_sizes, all_temporal_ids)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return forward_call(*args, **kwargs)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/minicpmv.py", line 426, in forward
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] out = self.attn(
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return forward_call(*args, **kwargs)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/activation.py", line 1488, in forward
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] attn_output, attn_output_weights = F.multi_head_attention_forward(
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/functional.py", line 6307, in multi_head_attention_forward
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] q, k, v = _in_projection_packed(query, key, value, in_proj_weight, in_proj_bias)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/functional.py", line 5733, in _in_projection_packed
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return linear(q, w_q, b_q), linear(k, w_k, b_k), linear(v, w_v, b_v)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] RuntimeError: UR backend failed. UR backend returns:40 (UR_RESULT_ERROR_OUT_OF_RESOURCES)
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815]
(EngineCore_DP0 pid=1258) ERROR 03-02 02:46:13 [dump_input.py:72] Dumping input data for V1 LLM engine (v0.11.2.dev0+g439368496.d20260115) with config: model='/llm/models/MiniCPM-V-4_5', speculative_config=None, tokenizer='/llm/models/MiniCPM-V-4_5', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=2, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=xpu, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=MiniCPM-V-4_5, enable_prefix_caching=False, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': None, 'mode': <CompilationMode.NONE: 0>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['all'], 'splitting_ops': None, 'compile_mm_encoder': False, 'use_inductor': None, 'compile_sizes': [], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.NONE: 0>, 'cudagraph_num_of_warmups': 0, 'cudagraph_capture_sizes': None, 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {}, 'max_cudagraph_capture_size': None, 'local_cache_dir': None},
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] WorkerProc hit an exception.
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] Traceback (most recent call last):
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 810, in worker_busy_loop
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] output = func(*args, **kwargs)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 367, in execute_model
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return self.worker.execute_model(scheduler_output, *args, **kwargs)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return func(*args, **kwargs)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 563, in execute_model
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] output = self.model_runner.execute_model(
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return func(*args, **kwargs)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2756, in execute_model
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ) = self._preprocess(
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2331, in _preprocess
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] self._execute_mm_encoder(scheduler_output)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1978, in _execute_mm_encoder
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] curr_group_outputs = model.embed_multimodal(**mm_kwargs_group)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/minicpmv.py", line 1147, in embed_multimodal
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return self._process_multimodal_inputs(modalities)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/minicpmv.py", line 1130, in _process_multimodal_inputs
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] image_embeddings = self._process_vision_input(image_input)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/minicpmv.py", line 1115, in _process_vision_input
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] image_features_flat = self.get_vision_hidden_states(image_input)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/minicpmv.py", line 1691, in get_vision_hidden_states
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return self.resampler(vision_embedding, tgt_sizes, all_temporal_ids)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return self._call_impl(*args, **kwargs)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return forward_call(*args, **kwargs)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/minicpmv.py", line 426, in forward
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] out = self.attn(
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return self._call_impl(*args, **kwargs)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return forward_call(*args, **kwargs)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/activation.py", line 1488, in forward
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] attn_output, attn_output_weights = F.multi_head_attention_forward(
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/functional.py", line 6439, in multi_head_attention_forward
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] q_scaled = q * math.sqrt(1.0 / float(E))
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] RuntimeError: UR backend failed. UR backend returns:40 (UR_RESULT_ERROR_OUT_OF_RESOURCES)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] Traceback (most recent call last):
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 810, in worker_busy_loop
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] output = func(*args, **kwargs)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 367, in execute_model
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return self.worker.execute_model(scheduler_output, *args, **kwargs)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return func(*args, **kwargs)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 563, in execute_model
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] output = self.model_runner.execute_model(
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return func(*args, **kwargs)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2756, in execute_model
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ) = self._preprocess(
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2331, in _preprocess
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] self._execute_mm_encoder(scheduler_output)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1978, in _execute_mm_encoder
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] curr_group_outputs = model.embed_multimodal(**mm_kwargs_group)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/minicpmv.py", line 1147, in embed_multimodal
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return self._process_multimodal_inputs(modalities)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/minicpmv.py", line 1130, in _process_multimodal_inputs
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] image_embeddings = self._process_vision_input(image_input)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/minicpmv.py", line 1115, in _process_vision_input
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] image_features_flat = self.get_vision_hidden_states(image_input)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/minicpmv.py", line 1691, in get_vision_hidden_states
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return self.resampler(vision_embedding, tgt_sizes, all_temporal_ids)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return self._call_impl(*args, **kwargs)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return forward_call(*args, **kwargs)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/minicpmv.py", line 426, in forward
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] out = self.attn(
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return self._call_impl(*args, **kwargs)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] return forward_call(*args, **kwargs)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/activation.py", line 1488, in forward
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] attn_output, attn_output_weights = F.multi_head_attention_forward(
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/functional.py", line 6439, in multi_head_attention_forward
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] q_scaled = q * math.sqrt(1.0 / float(E))
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] RuntimeError: UR backend failed. UR backend returns:40 (UR_RESULT_ERROR_OUT_OF_RESOURCES)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815]
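
Each worker prints its traceback twice, and the two workers fail at different lines of `torch/nn/functional.py` (5733 on TP0, 6439 on TP1) with the same `UR_RESULT_ERROR_OUT_OF_RESOURCES` error. A small helper along the following lines might make that easier to see when triaging similar logs. It is only a sketch: the regular expressions assume the log prefix format shown above, and `summarize_failures` is a hypothetical name, not part of vLLM.

```python
import re
from collections import defaultdict

# Log prefix as it appears in this issue, e.g.
# "(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] <message>"
PREFIX = re.compile(
    r"^\((?P<proc>\w+) pid=\d+\) ERROR \S+ \S+ \[[^\]]+\]\s?(?P<msg>.*)$"
)
# A Python traceback frame line.
FRAME = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\w+)')


def summarize_failures(log_text):
    """Map each process to the set of (innermost frame, exception line) pairs."""
    summary = defaultdict(set)
    last_frame = {}
    for raw in log_text.splitlines():
        m = PREFIX.match(raw.strip())
        if not m:
            continue
        proc, msg = m.group("proc"), m.group("msg").strip()
        frame = FRAME.search(msg)
        if frame:
            # Remember the most recent frame; the last one before the
            # exception line is the innermost frame.
            filename = frame["path"].rsplit("/", 1)[-1]
            last_frame[proc] = "{}:{} in {}".format(
                filename, frame["line"], frame["func"]
            )
        elif msg.startswith("RuntimeError:"):
            # Repeated tracebacks collapse because the value is a set.
            summary[proc].add((last_frame.get(proc, "?"), msg))
    return dict(summary)


# Four lines copied from the log above, one frame and one error per worker.
sample = """\
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/functional.py", line 5733, in _in_projection_packed
(Worker_TP0 pid=1400) ERROR 03-02 02:46:13 [multiproc_executor.py:815] RuntimeError: UR backend failed. UR backend returns:40 (UR_RESULT_ERROR_OUT_OF_RESOURCES)
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/functional.py", line 6439, in multi_head_attention_forward
(Worker_TP1 pid=1401) ERROR 03-02 02:46:13 [multiproc_executor.py:815] RuntimeError: UR backend failed. UR backend returns:40 (UR_RESULT_ERROR_OUT_OF_RESOURCES)
"""

for proc, failures in sorted(summarize_failures(sample).items()):
    for frame, error in sorted(failures):
        print(proc, "|", frame, "|", error)
```

Run over the full log, this would report one entry per worker, which suggests both ranks run out of device resources inside the resampler's attention call rather than in the language model itself.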
