[Hackathon 9th No.93][RFC] Add MiniMax-M1 model to FastDeploy by ZhijunLStudio · Pull Request #1156 · PaddlePaddle/community

ZhijunLStudio · 2025-09-16T01:23:05Z

This document is the RFC for adding the MiniMax-M1 model. It lays out the technical plan from CUDA operator development through full model integration.

paddle-bot · 2025-09-16T01:23:11Z

Your PR has been submitted. Thanks for your contribution!
Please check its format and content. For this, you can refer to Template and Demo.

luotao1 · 2025-09-16T03:18:02Z

@chang-wenbin

chang-wenbin · 2025-09-16T06:34:22Z

rfcs/FastDeploy/20250916_add_minimax_m1_for_fastdeploy.md

+
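+**Core technical path**:
+1.  **Reuse**: Reuse as much as possible the existing Partial RoPE and standard GQA Attention components from the GLM-4.5 PR.
+2.  **Port and develop**: Port vLLM's `lightning_attn.py` (Triton) to a high-performance CUDA C++ operator to support MiniMax-M1's linear attention layers.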


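You could validate quickly with the Triton operator first and develop the high-performance CUDA kernel in parallel. FD (FastDeploy) currently supports Triton operators.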

chang-wenbin · 2025-09-16T06:37:39Z

rfcs/FastDeploy/20250916_add_minimax_m1_for_fastdeploy.md

+
+---
+
+### **Phase 1: [Core development] Implement the Mamba/linear attention CUDA operator (2-4 weeks)**


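The main development work appears to be in linear attention. Try integrating the Triton operator first for quick validation; if attention performance is poor, consider implementing a CUDA kernel to improve end-to-end performance.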
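
As an illustration of the quick validation suggested in the review comments above, here is a minimal NumPy sketch of a correctness baseline for linear attention with decay. It is not the vLLM `lightning_attn.py` kernel or the FastDeploy implementation. It assumes a single head and one scalar decay factor, and the function names are hypothetical. It checks that the recurrent form (a running state updated token by token) matches the quadratic causal form, which is the kind of reference a Triton or CUDA kernel's output could be compared against.

```python
import numpy as np


def linear_attention_recurrent(q, k, v, decay):
    """Recurrent form: S_t = decay * S_{t-1} + k_t v_t^T, o_t = q_t S_t."""
    seq_len, d_k = q.shape
    d_v = v.shape[1]
    state = np.zeros((d_k, d_v), dtype=np.float64)
    out = np.empty((seq_len, d_v), dtype=np.float64)
    for t in range(seq_len):
        # Decay the accumulated key-value state, then add the current token.
        state = decay * state + np.outer(k[t], v[t])
        out[t] = q[t] @ state
    return out


def linear_attention_masked(q, k, v, decay):
    """Quadratic form: o_t = sum over s <= t of decay^(t-s) * (q_t . k_s) * v_s."""
    seq_len = q.shape[0]
    idx = np.arange(seq_len)
    diff = idx[:, None] - idx[None, :]  # diff[t, s] = t - s
    # Causal mask: positions with s > t get weight zero.
    weights = np.where(diff >= 0, decay ** np.maximum(diff, 0), 0.0)
    return ((q @ k.T) * weights) @ v


# Hypothetical small shapes chosen only for the check.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
ref = linear_attention_recurrent(q, k, v, decay=0.9)
alt = linear_attention_masked(q, k, v, decay=0.9)
max_abs_err = float(np.abs(ref - alt).max())
assert max_abs_err < 1e-9
```

A real kernel would typically run in lower precision and with per-head decay rates, so a comparison against it would need a looser tolerance than the one used here.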

cloudforge1 · 2026-03-30T11:56:19Z

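Based on this RFC, I completed the FastDeploy implementation of the MiniMax-M1 model. Code PR: PaddlePaddle/FastDeploy#6994 (2022 lines added, 32/32 tests passing, CI green).

Main technical deviations from the original design:

- Linear attention: reuses the Triton operator directly (711 lines) rather than porting it to CUDA C++, following the suggestion in this PR's review to "validate quickly with the Triton operator first".
- MambaBackend: the Mamba logic is integrated directly into MiniMaxM1LinearAttention; no separate Backend abstraction was added.
- Weight loading: marked "TBD" in the original RFC; now implemented as two paths, v0 (set_state_dict) and v1 (load_weights).
- Quantization: not covered in the original RFC; added w4a8, w4afp8 (static/dynamic), tensor_wise_fp8, and block_wise_fp8.

The design document has been updated to match the actual implementation: #1252. Thanks to the original RFC for providing the architectural framework.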

add minimax rfc

6c4fda7

paddle-bot bot added the contributor label Sep 16, 2025

luotao1 mentioned this pull request Sep 16, 2025

[Hackathon 9th] Open Source Contribution Individual Challenge PaddlePaddle/Paddle#74773

Closed

luotao1 self-assigned this Sep 16, 2025

chang-wenbin reviewed Sep 16, 2025


chang-wenbin approved these changes Sep 19, 2025


luotao1 approved these changes Sep 19, 2025


luotao1 merged commit cbd6610 into PaddlePaddle:master Sep 19, 2025
1 check passed

This was referenced Mar 20, 2026

[Hackathon 10th Spring No.47][RFC] Add MiniMax-M1 model design document #1252

Open

[Feature][Hackathon 10th Spring No.47] Add MiniMax-M1 model support PaddlePaddle/FastDeploy#6994

Open

luotao1 merged 1 commit into PaddlePaddle:master from ZhijunLStudio:minimax-rfc
