
[Bug]: AMD's MiniMax MXFP4 `trust_remote_code` bug #38307

@functionstackx

Description

Your current environment

image: vllm/vllm-openai-rocm:v0.17.1

🐛 Describe the bug

Already filed via Slack last Friday, but filing here as well to track it.

This is a blocker for merging SemiAnalysisAI/InferenceX#827.

Even when passing `trust_remote_code=true`, MiniMax MXFP4 does not use it, leading to this bug.

It seems @hongxiayang is already working on a fix in #37698.

https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23326389246/job/67848378566?pr=827

+ vllm serve amd/MiniMax-M2.5-MXFP4 --port 8888 --tensor-parallel-size=2 --gpu-memory-utilization 0.95 --max-model-len 2248 --block-size=32 --trust-remote-code
WARNING 03-20 02:24:48 [gpt_oss_triton_kernels_moe.py:56] Using legacy triton_kernels on ROCm
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/chat_completion/protocol.py:346: SyntaxWarning: invalid escape sequence '\e'
  "(e.g. 'abcdabcdabcd...' or '\emoji \emoji \emoji ...'). This feature "
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/completion/protocol.py:176: SyntaxWarning: invalid escape sequence '\e'
  "(e.g. 'abcdabcdabcd...' or '\emoji \emoji \emoji ...'). This feature "
(APIServer pid=1169773) INFO 03-20 02:24:49 [utils.py:302] 
(APIServer pid=1169773) INFO 03-20 02:24:49 [utils.py:302]        █     █     █▄   ▄█
(APIServer pid=1169773) INFO 03-20 02:24:49 [utils.py:302]  ▄▄ ▄█ █     █     █ ▀▄▀ █  version 0.17.1
(APIServer pid=1169773) INFO 03-20 02:24:49 [utils.py:302]   █▄█▀ █     █     █     █  model   amd/MiniMax-M2.5-MXFP4
(APIServer pid=1169773) INFO 03-20 02:24:49 [utils.py:302]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀
(APIServer pid=1169773) INFO 03-20 02:24:49 [utils.py:302] 
(APIServer pid=1169773) INFO 03-20 02:24:49 [utils.py:238] non-default args: {'model_tag': 'amd/MiniMax-M2.5-MXFP4', 'port': 8888, 'model': 'amd/MiniMax-M2.5-MXFP4', 'trust_remote_code': True, 'max_model_len': 2248, 'tensor_parallel_size': 2, 'block_size': 32, 'gpu_memory_utilization': 0.95}
(APIServer pid=1169773) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1169773) [2026-03-20 02:24:49] WARNING configuration_utils.py:697: The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1169773) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1169773) [2026-03-20 02:24:49] WARNING configuration_utils.py:697: The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1169773) INFO 03-20 02:24:56 [model.py:531] Resolved architecture: MiniMaxM2ForCausalLM
(APIServer pid=1169773) INFO 03-20 02:24:56 [model.py:1554] Using max model len 2248
(APIServer pid=1169773) [aiter] start build [module_aiter_enum] under /usr/local/lib/python3.12/dist-packages/aiter/jit/build/module_aiter_enum
(APIServer pid=1169773) [2026-03-20 02:24:56] INFO core.py:549: start build [module_aiter_enum] under /usr/local/lib/python3.12/dist-packages/aiter/jit/build/module_aiter_enum
(APIServer pid=1169773) [aiter] finish build [module_aiter_enum], cost 7.8s 
(APIServer pid=1169773) [2026-03-20 02:25:04] INFO core.py:699: finish build [module_aiter_enum], cost 7.8s 
(APIServer pid=1169773) [aiter] import [module_aiter_enum] under /usr/local/lib/python3.12/dist-packages/aiter/jit/module_aiter_enum.so
(APIServer pid=1169773) [2026-03-20 02:25:04] INFO core.py:501: import [module_aiter_enum] under /usr/local/lib/python3.12/dist-packages/aiter/jit/module_aiter_enum.so
(APIServer pid=1169773) INFO 03-20 02:25:04 [scheduler.py:231] Chunked prefill is enabled with max_num_batched_tokens=8192.
(APIServer pid=1169773) Traceback (most recent call last):
(APIServer pid=1169773)   File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=1169773)     sys.exit(main())
(APIServer pid=1169773)              ^^^^^^
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=1169773)     args.dispatch_function(args)
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 112, in cmd
(APIServer pid=1169773)     uvloop.run(run_server(args))
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=1169773)     return __asyncio.run(
(APIServer pid=1169773)            ^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=1169773)     return runner.run(main)
(APIServer pid=1169773)            ^^^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1169773)     return self._loop.run_until_complete(task)
(APIServer pid=1169773)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1169773)     return await main
(APIServer pid=1169773)            ^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 471, in run_server
(APIServer pid=1169773)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 490, in run_server_worker
(APIServer pid=1169773)     async with build_async_engine_client(
(APIServer pid=1169773)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1169773)     return await anext(self.gen)
(APIServer pid=1169773)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 96, in build_async_engine_client
(APIServer pid=1169773)     async with build_async_engine_client_from_engine_args(
(APIServer pid=1169773)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1169773)     return await anext(self.gen)
(APIServer pid=1169773)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 122, in build_async_engine_client_from_engine_args
(APIServer pid=1169773)     vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=1169773)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1890, in create_engine_config
(APIServer pid=1169773)     config = VllmConfig(
(APIServer pid=1169773)              ^^^^^^^^^^^
(APIServer pid=1169773)   File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
(APIServer pid=1169773)     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=1169773) pydantic_core._pydantic_core.ValidationError: 1 validation error for VllmConfig
(APIServer pid=1169773)   Value error, The repository amd/MiniMax-M2.5-MXFP4 contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/amd/MiniMax-M2.5-MXFP4 .
(APIServer pid=1169773)  You can inspect the repository content at https://hf.co/amd/MiniMax-M2.5-MXFP4.
(APIServer pid=1169773) Please pass the argument `trust_remote_code=True` to allow custom code to be run. [type=value_error, input_value=ArgsKwargs((), {'model_co...transfer_config': None}), input_type=ArgsKwargs]
(APIServer pid=1169773)     For further information visit https://errors.pydantic.dev/2.12/v/value_error
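The two repeated warnings in the log (`configuration_utils.py:697`) hint at the likely mechanism: in transformers, `trust_remote_code` only takes effect on `Auto*` classes, which dispatch to the custom code shipped in the repo; a concrete config class ignores the flag with a warning, so the later custom-code check fails as if the flag were never set. The sketch below is a hypothetical, self-contained model of that behavior — `ConcreteConfig` and `AutoConfigSketch` are stand-ins written for illustration, not the actual transformers or vLLM source:

```python
import warnings

class ConcreteConfig:
    """Stand-in for a concrete config class (e.g. a model-specific config)."""

    @classmethod
    def from_pretrained(cls, repo, trust_remote_code=None):
        if trust_remote_code is not None:
            # Mirrors the configuration_utils.py:697 warning in the log:
            # the flag is accepted but has no effect here.
            warnings.warn(
                "The argument `trust_remote_code` is to be used with Auto "
                "classes. It has no effect here and is ignored."
            )
        return cls()


class AutoConfigSketch:
    """Stand-in for an Auto class, which does honor trust_remote_code."""

    @classmethod
    def from_pretrained(cls, repo, trust_remote_code=False):
        repo_has_custom_code = True  # e.g. the repo ships configuration_*.py
        if repo_has_custom_code and not trust_remote_code:
            # Mirrors the ValueError surfaced through pydantic in the log.
            raise ValueError(
                f"The repository {repo} contains custom code which must be "
                "executed to correctly load the model."
            )
        return ConcreteConfig()


if __name__ == "__main__":
    repo = "amd/MiniMax-M2.5-MXFP4"

    # Passing the flag to the concrete class: warned about and dropped.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        ConcreteConfig.from_pretrained(repo, trust_remote_code=True)
    print("warnings from concrete class:", len(caught))

    # Passing the flag through the Auto path: honored, load succeeds.
    cfg = AutoConfigSketch.from_pretrained(repo, trust_remote_code=True)
    print("loaded via Auto path:", type(cfg).__name__)
```

Under this model, the fix would be to route the flag through the Auto-class entry point rather than a concrete class — which appears to be the direction of #37698.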


Metadata

Labels

bug (Something isn't working), rocm (Related to AMD ROCm)

Status

Done
