diff --git a/README.md b/README.md index 98d700b4ec..4922d80384 100644 --- a/README.md +++ b/README.md @@ -77,7 +77,7 @@ You can contact us and communicate with us by adding our group: ## 🎉 News -- 🎁 2026.01.15: **ms-swift v4.0** major version update is in progress. It is recommended to use the stable branch [release/3.12](https://github.com/modelscope/ms-swift/tree/release/3.12). You can provide your feedback in [this issue](https://github.com/modelscope/ms-swift/issues/7250). Thank you for your support. +- 🎁 2026.03.03: **ms-swift v4.0** major version is officially released. For release notes, please refer to [here](https://github.com/modelscope/ms-swift/releases/tag/v4.0.0). You can provide your suggestions to us in [this issue](https://github.com/modelscope/ms-swift/issues/7250). Thank you for your support. - 🎁 2025.11.14: Megatron GRPO is now available! Check out the [docs](./docs/source_en/Megatron-SWIFT/GRPO.md) and [examples](examples/megatron/grpo). - 🎁 2025.11.04: Support for [Mcore-Bridge](docs/source_en/Megatron-SWIFT/Mcore-Bridge.md), making Megatron training as simple and easy to use as transformers. - 🎁 2025.10.28: Ray [here](docs/source_en/Instruction/Ray.md). 
diff --git a/README_CN.md b/README_CN.md index 7bfe927c3c..d02362e67c 100644 --- a/README_CN.md +++ b/README_CN.md @@ -73,7 +73,7 @@ - **模型量化**:支持AWQ、GPTQ、FP8和BNB的量化导出,导出的模型支持使用vLLM/SGLang/LmDeploy推理加速。 ## 🎉 新闻 -- 🎁 2026.01.15: **ms-swift v4.0**大版本更新进行中,建议使用稳定分支[release/3.12](https://github.com/modelscope/ms-swift/tree/release/3.12),您的建议可以在[这个issue](https://github.com/modelscope/ms-swift/issues/7250)中反馈给我们,感谢您的支持。 +- 🎁 2026.03.03: **ms-swift v4.0**大版本正式发布,release note参考[这里](https://github.com/modelscope/ms-swift/releases/tag/v4.0.0),您的建议可以在[这个issue](https://github.com/modelscope/ms-swift/issues/7250)中反馈给我们,感谢您的支持。 - 🎁 2025.11.14: Megatron GRPO现已支持!查看[文档](./docs/source/Megatron-SWIFT/GRPO.md)和[示例](examples/megatron/grpo)。 - 🎁 2025.11.04: 支持[Mcore-Bridge](docs/source/Megatron-SWIFT/Mcore-Bridge.md),使Megatron训练像transformers一样简单易用。 - 🎁 2025.10.28: Ray [已支持](docs/source/Instruction/Ray.md)。 diff --git a/docs/source/BestPractices/Qwen3-VL-Best-Practice.md b/docs/source/BestPractices/Qwen3-VL-Best-Practice.md index 8ff2c9640a..aef70e6e4d 100644 --- a/docs/source/BestPractices/Qwen3-VL-Best-Practice.md +++ b/docs/source/BestPractices/Qwen3-VL-Best-Practice.md @@ -153,7 +153,7 @@ Here’s a breakdown of what unfolds: Overall, this is a sweet, lighthearted video that showcases the innocence and imagination of early childhood. The child’s engagement with the book, combined with their glasses and playful demeanor, creates a delightful and memorable scene. 
``` -- 其中特定模型参数,例如 `VIDEO_MAX_TOKEN_NUM` 等环境变量的含义参考[命令行参数文档](../Instruction/Command-line-parameters.md#qwen3_vl)。 +- 其中特定模型参数,例如 `VIDEO_MAX_TOKEN_NUM` 等环境变量的含义参考[命令行参数文档](../Instruction/Command-line-parameters.md#qwen3_vl-qwen3_5)。 ## 训练 diff --git a/docs/source/Customization/Architecture.md b/docs/source/Customization/Architecture.md new file mode 100644 index 0000000000..736ce9e061 --- /dev/null +++ b/docs/source/Customization/Architecture.md @@ -0,0 +1,233 @@ +# 架构介绍 + +ms-swift 4.0 采用模块化设计,各功能模块分布在一级目录下,便于开发者进行自定义扩展。本文档将详细介绍各模块的功能及自定义方法。 + +## Agent Template + +agent模板的mapping文件可以参考[这里](https://github.com/modelscope/ms-swift/blob/main/swift/agent_template/mapping.py)。agent template设计目标是,基于统一的Agent数据集格式,可以灵活切换不同模型进行训练,无需修改数据。训练时使用`--agent_template`指定对应的agent模板。 + +所有的AgentTemplate需要继承自`BaseAgentTemplate`,并实现其中的几个方法: `_format_tools`, `_format_tool_calls`, `_format_tool_responses`, `get_toolcall`。 +- _format_tools: 将`tools`和`system`格式化,组成完整的system。 +- _format_tool_calls: 将tool_call部分 `[{"role": "tool_call", "content": "..."}, {"role": "tool_call", "content": "..."}]`进行格式化,最后返回字符串。 +- _format_tool_responses: 对tool(也称为tool_response)部分 `[{"role": "tool", "content": "..."}, {"role": "tool", "content": "..."}]`进行格式化。 +- get_toolcall: 在部署的时候使用,用于解析模型输出内容中的工具名和参数,返回`List[Function]`。 + + +如何debug: +```python +data = {"tools": "[{\"type\": \"function\", \"function\": {\"name\": \"realtime_aqi\", \"description\": \"天气预报。获取实时空气质量。当前空气质量,PM2.5,PM10信息\", \"parameters\": {\"type\": \"object\", \"properties\": {\"city\": {\"type\": \"string\", \"description\": \"城市名,例如:上海\"}}, \"required\": [\"city\"]}}}]", "messages": [{"role": "user", "content": "北京和上海今天的天气情况"}, {"role": "tool_call", "content": "{\"name\": \"realtime_aqi\", \"arguments\": {\"city\": \"北京\"}}"}, {"role": "tool_call", "content": "{\"name\": \"realtime_aqi\", \"arguments\": {\"city\": \"上海\"}}"}, {"role": "tool_response", "content": "{\"city\": \"北京\", \"aqi\": \"10\", \"unit\": \"celsius\"}"}, {"role": 
"tool_response", "content": "{\"city\": \"上海\", \"aqi\": \"72\", \"unit\": \"fahrenheit\"}"}, {"role": "assistant", "content": "根据天气预报工具,北京今天的空气质量指数为10,属于良好水平;上海今天的空气质量指数为72,属于轻度污染水平。"}]} + + +from swift import get_processor, get_template + +tokenizer = get_processor('Qwen/Qwen3.5-2B') +template = get_template(tokenizer) # 使用默认agent模板 +# template = get_template(tokenizer, agent_template='qwen3_5') +print(f'agent_template: {template._agent_template}') +template.set_mode('train') +encoded = template.encode(data) +print(f'[INPUT_IDS] {template.safe_decode(encoded["input_ids"])}\n') +print(f'[LABELS] {template.safe_decode(encoded["labels"])}') +``` + +如果你想要给我们提供PR,请参考[这里](https://github.com/modelscope/ms-swift/blob/main/tests/test_align/test_template/test_agent.py)书写你的测试案例。 + +## Callbacks + +callbacks的mapping文件可以参考[这里](https://github.com/modelscope/ms-swift/blob/main/swift/callbacks/mapping.py)。callbacks可以对trainer中的关键节点的行为进行自定义。自定义后,你需要在mapping中进行注册,训练时使用`--callbacks`指定对应的回调类。例如,你可以自定义: + +```python +class CustomCallback(TrainerCallback): + + def on_train_begin(self, args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs): + # Doing something when the training begins. 
+ pass + + def on_save(self, args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs): + # Doing something when save checkpoint + pass +``` + +所有的回调类需继承自base.py中的`TrainerCallback`,并覆盖其方法。接口与transformers的`TrainerCallback`一致,请参考transformers的[callback文档](https://huggingface.co/docs/transformers/main_classes/callback)。 + + +## Loss + +Loss的mapping文件可以参考[这里](https://github.com/modelscope/ms-swift/blob/main/swift/loss/mapping.py)。 +swift支持自定义loss(当前只支持sft/pretrain/reranker/embedding任务),注册后在训练时设置`--loss_type `使用你定制的loss方法。 + +自定义Loss需继承自`BaseLoss`,并实现`__call__`方法,返回标量Tensor。你可以参考[CustomCrossEntropyLoss](https://github.com/modelscope/ms-swift/blob/0d7c9f5bc0e7e7d67d914ce6edeb9ce24f60746f/swift/loss/causal_lm.py#L5)进行定制。例如: + +```python +class CustomLoss(BaseLoss): + + def __call__(self, outputs, labels, **kwargs) -> torch.Tensor: + pass +``` + +## Loss Scale + +loss scale的mapping文件可以参考[这里](https://github.com/modelscope/ms-swift/blob/main/swift/loss_scale/mapping.py)。在pretrain和sft任务中,可训练token的loss是平均的,即每个token平等地对待。但在某些情况下,某些token需要被额外关注,并设置更高的权重或者对某些token不进行训练。loss_scale可以让开发者自由地定义自己的token权重。(预训练和SFT支持使用loss_scale控制token是否参与训练以及和其权重大小,RLHF中只支持控制token是否参与训练) + +你可以通过继承LossScale基类,并实现`get_loss_scale`方法来自定义loss scale。 +```python +class CustomLossScale(LossScale): + + def get_loss_scale(self, context: str, **kwargs) -> Tuple[List[str], List[float]]: + ... 
+``` +`get_loss_scale`函数需要返回了一个Tuple,第一个返回是拆解后的字符串的列表,第二个参数是字符串对应的loss_scale的列表,float值代表了权重。例如下面的权重设置: +```text +["学习", "好", "数学", "是", "重要", "的"] +[1.0, 0.5, 2.0, 0.5, 2.0, 0.1] +``` +例子中,我们更看重数学和重要两个词,因为其loss_scale为2.0。 + + +当然我们也需要关注`__call__`方法的核心逻辑,即loss_scale基本策略(base_strategy)all/default/last_round 对loss_scale的影响,具体参考[命令行参数文档](../Instruction/Command-line-parameters.md)的介绍。以及数据集中的'loss'字段对loss_scale的影响,参考[自定义数据集文档](../Customization/Custom-dataset.md)。 +```python +if loss or loss is None and (self.base_strategy == 'all' or + (self.base_strategy == 'default' and is_assistant) or + (self.base_strategy == 'last_round' and is_assistant and is_last_round)): + new_context, loss_scale = self.get_loss_scale(context, query=query) +else: + new_context, loss_scale = [context], [0.] +``` + +此外你也可以使用[json配置文件](https://github.com/modelscope/ms-swift/tree/main/swift/loss_scale/config),继承内置的ConfigLossScale类,来自定义loss_scale。目前支持两种配置方式:字符串精确匹配和正则表达式匹配。你可以参考[Agent支持文档](../Instruction/Agent-support.md#loss_scale的使用)的内容进行理解。 + +- 字符串精确匹配,例如参考`react.json`, `qwen.json`。json中需要书写`Dict[str, List[float]]`的映射。字符串代表关键词,列表中需要有两个值。我们会根据关键词,将字符串切分成多段字符串。列表的第一个值代表关键词的权重,列表的第二个值代表该关键值后,下一关键词前的内容的权重。 + +- 正则表达式匹配,例如参考`ignore_empty_think.json`, `hermes.json`。json中需要书写`Dict[str, float]`的映射。字符串代表正则表达式pattern,浮点数代表匹配字符串的权重。 + + +如何debug: +```python +from swift import get_processor, get_template + +data = {"messages": [ + {"role": "user", "content": "今天的日期是多少?"}, + {"role": "assistant", "content": ( + "\n我可以通过调用`get_date`函数来获取当前时间。\n\n" + '\n{"name": "get_date", "arguments": {}}\n' + )} +]} + +template = get_template(get_processor('Qwen/Qwen3-8B'), loss_scale='hermes') +template.set_mode('train') +inputs = template.encode(data) + +print(template.safe_decode(inputs['labels'])) +print(inputs['loss_scale']) +``` + +## Metrics + +metrics的mapping文件可以参考[这里](https://github.com/modelscope/ms-swift/blob/main/swift/metrics/mapping.py)。该组件在ms-swift/Megatron-SWIFT中都有被使用。 +- 如果是在ms-swift中被使用,你需要继承 base.py 
中`EvalMetrics`基类,并实现`compute_metrics`函数,返回字典`Dict[str, float]`。你可以参考[NlgMetrics](https://github.com/modelscope/ms-swift/blob/0d7c9f5bc0e7e7d67d914ce6edeb9ce24f60746f/swift/metrics/nlg.py#L33)进行定制。 +- 如果是在Megatron-SWIFT中被使用,你需要继承 utils.py 中`Metric`基类,并实现`update`和`compute`方法,compute方法需返回字典`Dict[str, float]`。 + +你可以自定义metrics(当前只支持sft/pretrain/reranker/embedding任务),在训练时设置`--eval_metric `使用你定制的metrics。 + +## Optimizers + +optimizer的mapping文件可以参考[这里](https://github.com/modelscope/ms-swift/blob/main/swift/optimizers/mapping.py)。如果你需要自定义优化器,你需要继承`OptimizerCallback`基类,并覆盖`create_optimizer`函数。训练时使用`--optimizer `指定自定义的优化器。 +- 你可以参考[MultimodalOptimizerCallback](https://github.com/modelscope/ms-swift/blob/0d7c9f5bc0e7e7d67d914ce6edeb9ce24f60746f/swift/optimizers/multimodal.py#L43)进行实现,该类实现了vit_lr, aligner_lr的功能,即对vit, aligner和LLM分别使用不同的学习率。 + + + +## Tuner Plugin + +Tuner插件的mapping文件可以参考[这里](https://github.com/modelscope/ms-swift/blob/main/swift/tuner_plugin/mapping.py)。如果你需要自定义tuner,你需要继承`Tuner`基类,并覆盖`prepare_model`, `save_pretrained`, `from_pretrained`函数。 +- prepare_model: 该函数在训练前被调用,将原始模型进行处理与准备,使用tuner封装,并设置可训练参数。例如:你可以对某些层附加LoRA,对某些层进行冻结等。 +- save_pretrained: 该函数在训练中被调用,对模型进行保存。 +- from_pretrained: 该函数在推理/断点续训时被调用,准备模型并读取权重。 + +你可以参考[LoRALLMTuner](https://github.com/modelscope/ms-swift/blob/0d7c9f5bc0e7e7d67d914ce6edeb9ce24f60746f/swift/tuner_plugin/lora_llm.py#L24)进行实现,该类实现了对LLM进行LoRA训练,对ViT进行全参数训练的功能。 + + +## ORM + +example参考[这里](https://github.com/modelscope/ms-swift/blob/main/swift/rewards/orm.py)。 + +ORM是结果奖励模型。ORM一般使用正则表达式来进行,ORM决定了response是否是正确的。例如: + +```python +class MathORM(ORM): + + @staticmethod + def extract_boxed_result(text): + pattern = r'\\boxed{([^}]*)}' + match = re.search(pattern, text) + if match: + return match.group(1).strip() + else: + return None + + def __call__(self, infer_requests: List[InferRequest], ground_truths: List[str], + **kwargs) -> List[float]: + rewards = [] + predictions = [request.messages[-1]['content'] for request in 
infer_requests] + for prediction, ground_truth in zip(predictions, ground_truths): + res1 = MathORM.extract_boxed_result(prediction) or '' + res2 = MathORM.extract_boxed_result(ground_truth) or '' + rewards.append(float(res1.strip() == res2.strip())) + + return rewards + + +orms = { + 'math': MathORM, +} +``` + +在上面的代码中,我们定义了一个对数学response进行解析的过程,如果结果相同则返回score为1.0,否则为0.0。和PRM不同,这个类的infer中有一个额外参数`ground_truths`, +该参数是对应的infer_requests的实际label(数据集中定义的标准response)。 + + +## PRM + +example参考[这里](https://github.com/modelscope/ms-swift/blob/main/swift/rewards/prm.py)。 + +PRM是过程奖励模型,PRM会在`swift sample`命令中使用。PRM需要支持的接口比较简单: +```python +class PRM: + + def __init__(self): + # init here + pass + + def __call__(self, infer_requests: List[InferRequest], **kwargs) -> List[Union[float, List[float]]]: + raise NotImplementedError +``` + +其中的InferRequest来自于`swift.infer_engine`,返回的`List[Union[float, List[float]]]`,列表中可能是reward也可能是若干reward。开发者可以在infer_requests中拿到queries和responses,并按照自己的方式进行切分,例如: +```text +Let's think step by step. + +Step1: xxx + +Step2: xxx + +So, the answer is ... 
+``` +开发者可以在这里对过程进行切分,并按batch传入PRM中进行推理并返回rewards。更通用来说,开发者可以在这里调用一个远端URL,例如一个闭源PRM大模型并返回rewards。 + + +## 其他目录结构介绍 + +- arguments: 命令行参数定义,例如:`SftArguments`, `RLHFArguments`等。 +- cli: swift命令行机制以及启动文件。例如`swift sft ...`等价于`python swift/cli/main.py sft ...`也等价于`python swift/cli/sft.py ...`。 +- config: deepspeed/fsdp2配置文件。 +- dataloader: dataloader的实现,包括shard/dispatcher两种方式。 +- dataset: 数据集相关模块实现,包括数据预处理、packing、流式数据等。内置数据集的注册在`dataset/dataset`和`dataset/data`文件夹内。具体参考[自定义数据集文档](Custom-dataset.md)。 +- infer_engine: 推理引擎实现。包括transformers/vllm/sglang/lmdeploy为后端的推理引擎实现。 +- megatron: Megatron-SWIFT 实现。 +- model: 模型加载与注册。具体参考[自定义模型文档](Custom-model.md),[多模态模型注册最佳实践](../BestPractices/MLLM-Registration.md)。 +- pipelines: `swift sft/rlhf/infer`等主函数pipeline实现,包括`sft_main/rlhf_main/infer_main`等。 +- rlhf_trainers: GRPO/GKD/DPO/KTO/RM等算法的Trainer实现。 +- rollout: RL算法中rollout过程的采样实现。 +- rewards: RL算法中的奖励函数实现,支持自定义奖励计算逻辑。 +- template: 对话模板的实现与注册,包含各个任务将messages转换成input_ids的逻辑,以及data_collator相关逻辑。具体参考[自定义模型文档](Custom-model.md),[多模态模型注册最佳实践](../BestPractices/MLLM-Registration.md)。 +- trainers: 预训练/SFT/Embedding/Reranker/序列分类任务的Trainer实现。 +- ui: `swift web-ui`界面训练与推理实现。 diff --git a/docs/source/Customization/Pluginization.md b/docs/source/Customization/Pluginization.md deleted file mode 100644 index 0eadfa3d36..0000000000 --- a/docs/source/Customization/Pluginization.md +++ /dev/null @@ -1,219 +0,0 @@ -# 插件化 - -> [!WARNING] -> 该文档待更新到ms-swift4.0 - -插件化是SWIFT3.0中新增的重要能力。我们希望通过插件化的方式,让开发者对开发流程的定制更加自然。 - -## callback回调 - -example在[这里](https://github.com/modelscope/ms-swift/blob/main/swift/callbacks). - -`callback`机制是transformers Trainer中的一种训练定制化机制。开发者可以在callback中控制训练流程。通常来说,callback的定制化类似下面的样子: -```python -class CustomCallback(TrainerCallback): - - def on_train_begin(self, args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs): - # Doing something when the training begins. 
- pass - - def on_save(self, args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs): - # Doing something when save checkpoint - pass -``` -callback会在trainer构造前注册进trainer中,example中给出了一个简单版本的EarlyStop方案。注册你自己的callback的方式比较简单: -```python -extra_callbacks = [CustomCallback()] -``` -开发者可以在plugin/callback.py中增加新的callback,并定制自己的训练流程。callback的具体参数可以查看[这里](https://huggingface.co/docs/transformers/main_classes/callback)。 - - -## 定制化loss - -example在[这里](https://github.com/modelscope/ms-swift/blob/main/swift/loss/mapping.py). - -SWIFT支持在plugin中定制loss。如果不使用这个能力,默认会使用交叉熵Loss(CE Loss)。开发者可以在这个文件中编写代码,注册后在训练时设置`--loss_type custom_loss`使用你定制的loss方法。 -例如在plugin/loss.py中添加下面的代码: -```python -def custom_loss_func(outputs, labels, loss_scale=None, num_items_in_batch=None) -> torch.Tensor: - # Write your own loss calculating here - return loss - -loss_map['custom_loss'] = custom_loss_func -``` -需要注意的是,loss和trainer训练的任务是强相关的,目前的loss定制针对pt和sft任务,如果是人类对齐任务(例如DPO、PPO等)或分类任务(seq_cls)任务在插件中是无法定制的。 - -## 定制化loss_scale - -example在[这里](https://github.com/modelscope/ms-swift/blob/main/swift/loss_scale/mapping.py). 
- -loss_scale机制在SWIFT中是非常重要的机制之一。在pt和sft任务中,可训练token的loss是均匀的,即每个token平等的进行bp。但在某些情况下,某些token的权重比较大,需要被额外关注, -在这种情况下就需要更高的权重。loss_scale可以让开发者自由地定义自己的token权重。 -```python -class LastRoundLossScale(LossScale): - - def get_loss_scale(self, context: str, context_type: ContextType, is_last_round: bool, **kwargs): - if context_type == ContextType.RESPONSE: - return [context], [float(is_last_round)] - return super().get_loss_scale(context, context_type, is_last_round) -``` -在上面的代码中,返回了一个Tuple,第一个返回是context(或拆解后的context),第二个参数是context对应的loss_scale,float值代表了权重。例如下面的权重设置: -```text -["学习", "好", "数学", "是", "重要", "的"] -[1.0, 0.5, 2.0, 0.5, 2.0, 0.1] -``` -我们更看重数学和重要两个词,因此我们把它们的权重提升到2.0。 -回到上面的代码,我们判断了传入的context是否是response,如果是response且如果是多轮对话的最后一轮才返回[1],在其他情况下使用基类的实现(在本场景下loss_scale时[0])。使用这种方案, -我们做到了只有最后一轮的response参与训练,其他response不参与训练。使用这种方式,可以让所有token(prompt、response)参与训练,或针对agent某些特殊字符重点训练等。 -在pt和sft中,loss_scale是整体支持(是否参与训练,以及权重大小)的,而人类对齐中只能支持某些token是否参与训练,无法支持权重大小。 - -## 定制化metric - -example在[这里](https://github.com/modelscope/ms-swift/blob/main/swift/metrics). - -metric可以定制训练时使用的评测参数: -```python -metric_mapping = { - 'acc': (compute_acc_metrics, preprocess_logits_for_acc), - 'nlg': (compute_nlg_metrics, None), - 'custom': (custom_metric, custom_preprocess), -} - - -def get_metric(metric: str): - return metric_mapping[metric] -``` -在上面的定义中,我们添加了新的custom metric,它的value有两个值,第一个值是计算metric的过程,返回一个包含metric key-value对的dict,第二个值是针对logits做前处理,返回实际的predictions。 - -## 定制化optimizer - -example在[这里](https://github.com/modelscope/ms-swift/blob/main/swift/optimizers/mapping.py). -- 对模型不同部分采用不同的学习率,例如:ViT和LLM分别使用不同的学习率,参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/lora_llm_full_vit/custom_plugin.py)。 - -用户可以在这里增加自己的optimizer和lr_scheduler实现: -```python -def create_custom_optimizers(args, model, dataset): - # 创建自己的optimizer - return CustomOptimizer(optimizer_grouped_parameters, **optimizer_kwargs), CustomScheduler(...) 
- -optimizers_map = { - 'custom': create_custom_optimizers, - ... -} -``` - -当开发者需要使用其他optimizer,例如某些新论文中定义的optimizer时,可以在这里定义其创建过程,并在参数中使用: -```shell ---optimizer custom -``` -就可以实际调用了。 - -## 定制化agent template - -example在[这里](https://github.com/modelscope/ms-swift/blob/main/swift/agent_template/mapping.py). - -## 定制化tuner - -example在[这里](https://github.com/modelscope/ms-swift/blob/main/swift/tuner_plugin). -- 多模态模型对ViT部分使用全参数训练,LLM部分使用LoRA训练,参考[这里](https://github.com/modelscope/ms-swift/tree/main/examples/train/multimodal/lora_llm_full_vit)。 -- Phi4-multimodal,直接对其已有LoRA进行训练而不额外附加LoRA,参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/train/plugins/tuner_phi4_mm.sh)。 - -tuner定制也是swift中有特色的能力之一,开发者可以无视复杂的tuner初始化流程和代码整合成本,将新的tuner注册在这里: -```python -class IA3(Tuner): - - @staticmethod - def prepare_model(args: 'SftArguments', model: torch.nn.Module) -> torch.nn.Module: - model_arch: ModelKeys = model.model_meta.model_arch - ia3_config = IA3Config( - target_modules=find_all_linears(model), feedforward_modules='.*' + model_arch.mlp.split('{}.')[1] + '.*') - return get_peft_model(model, ia3_config) - - @staticmethod - def save_pretrained( - model: torch.nn.Module, - save_directory: str, - state_dict: Optional[dict] = None, - safe_serialization: bool = True, - **kwargs, - ) -> None: - model: PeftModel - model.save_pretrained(save_directory, state_dict=state_dict, safe_serialization=safe_serialization, **kwargs) - - @staticmethod - def from_pretrained(model: torch.nn.Module, model_id: str, **kwargs) -> torch.nn.Module: - return PeftModel.from_pretrained(model, model_id, **kwargs) -``` - -上面的例子中,我们将peft的IA3应用于模型训练中,在这个类中包含了三个方法: -- prepare_model: 如何将原始模型使用tuner进行封装,并设置好可训练参数 -- save_pretrained: 如何在训练中保存模型 -- from_pretrained: 如何在后续训练和推理中将之前存下来的checkpoint重新拉起 - -上面的三个方法会在swift训练流程中被调用,这样就做到了开发者可以不阅读复杂的训练代码而使用自己的tuner。 - -## PRM - -example在[这里](https://github.com/modelscope/ms-swift/blob/main/swift/rewards/prm.py)。 - -PRM是过程奖励模型,PRM会在`swift 
sample`命令中使用。PRM需要支持的接口比较简单: -```python -class PRM: - - def __init__(self): - # init here - pass - - def __call__(self, infer_requests: List[InferRequest], **kwargs) -> List[Union[float, List[float]]]: - raise NotImplementedError -``` - -其中的InferRequest来自于`swift.infer_engine`,返回的`List[Union[float, List[float]]]`,列表中可能是reward也可能是若干reward。开发者可以在infer_requests中拿到queries和responses,并按照自己的方式进行切分,例如: -```text -Let's think step by step. - -Step1: xxx - -Step2: xxx - -So, the answer is ... -``` -开发者可以在这里对过程进行切分,并按batch传入PRM中进行推理并返回rewards。更通用来说,开发者可以在这里调用一个远端URL,例如一个闭源PRM大模型并返回rewards。 - -## ORM - -example在[这里](https://github.com/modelscope/ms-swift/blob/main/swift/rewards/orm.py)。 - -ORM是结果奖励模型。ORM一般使用正则表达式来进行,ORM决定了response是否是正确的。例如: - -```python -class MathORM(ORM): - - @staticmethod - def extract_boxed_result(text): - pattern = r'\\boxed{([^}]*)}' - match = re.search(pattern, text) - if match: - return match.group(1).strip() - else: - return None - - def __call__(self, infer_requests: List[InferRequest], ground_truths: List[str], - **kwargs) -> List[float]: - rewards = [] - predictions = [request.messages[-1]['content'] for request in infer_requests] - for prediction, ground_truth in zip(predictions, ground_truths): - res1 = MathORM.extract_boxed_result(prediction) or '' - res2 = MathORM.extract_boxed_result(ground_truth) or '' - rewards.append(float(res1.strip() == res2.strip())) - - return rewards - - -orms = { - 'math': MathORM, -} -``` - -在上面的代码中,我们定义了一个对数学response进行解析的过程,如果结果相同则返回score为1.0,否则为0.0。和PRM不同,这个类的infer中有一个额外参数`ground_truths`, -该参数是对应的infer_requests的实际label(数据集中定义的标准response)。 diff --git a/docs/source/Instruction/Agent-support.md b/docs/source/Instruction/Agent-support.md index b3764f3a68..6883a18198 100644 --- a/docs/source/Instruction/Agent-support.md +++ b/docs/source/Instruction/Agent-support.md @@ -223,7 +223,7 @@ print(template.safe_decode(inputs['labels'])) # '[-100 * 14]abc\n\n\n\n123<|im_end|>\n' ``` 
-更多的loss_scale插件设计,请参考[插件化](../Customization/Pluginization.md)文档. +更多的loss_scale插件设计,请参考[架构](../Customization/Architecture.md#loss-scale)文档. ## 训练 - 训练Base模型的Agent能力,通过修改`--model`切换不同模型,参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/train/agent/qwen2_5.sh)。 diff --git a/docs/source/Instruction/Command-line-parameters.md b/docs/source/Instruction/Command-line-parameters.md index 9c15432779..63a2b3a537 100644 --- a/docs/source/Instruction/Command-line-parameters.md +++ b/docs/source/Instruction/Command-line-parameters.md @@ -795,7 +795,7 @@ qwen2_5_omni除了包含qwen2_5_vl和qwen2_audio的模型特定参数外,还 - 🔥ENABLE_AUDIO_OUTPUT: 默认为None,即使用`config.json`中的值。若使用zero3进行训练,请设置为False。 - 提示:ms-swift只对thinker部分进行微调,建议设置为False以降低显存占用(只创建thinker部分的模型结构)。 -### qwen3_vl +### qwen3_vl, qwen3_5 参数含义与`qwen_vl_utils>=0.0.14`库中的含义一致,可以查看[这里](https://github.com/QwenLM/Qwen2.5-VL/blob/main/qwen-vl-utils/src/qwen_vl_utils/vision_process.py#L24)。通过传入以下环境变量,可以修改该库的全局变量默认值。(也兼容使用`qwen2_5_vl`的环境变量,例如:`MAX_PIXELS`、`VIDEO_MAX_PIXELS`,会做自动转换。) - SPATIAL_MERGE_SIZE: 默认为2。 diff --git a/docs/source/Instruction/GRPO/DeveloperGuide/multi_turn.md b/docs/source/Instruction/GRPO/DeveloperGuide/multi_turn.md index 00a776cb7a..d25772f6f6 100644 --- a/docs/source/Instruction/GRPO/DeveloperGuide/multi_turn.md +++ b/docs/source/Instruction/GRPO/DeveloperGuide/multi_turn.md @@ -192,7 +192,7 @@ swift rollout \ **第一种:设置 loss_scale** -ms-swift 提供 loss_scale 参数来对模型回复部分的内容进行损失缩放设置。比如设置`--loss_scale last_round`,可以将非最后一轮的模型回复的损失置零。我们也可以实现自定义 loss_scale,具体请参考[定制化 loss_scale 文档](../../../Customization/Pluginization.md#定制化loss_scale)。 +ms-swift 提供 loss_scale 参数来对模型回复部分的内容进行损失缩放设置。比如设置`--loss_scale last_round`,可以将非最后一轮的模型回复的损失置零。我们也可以实现自定义 loss_scale,具体请参考[定制化 loss_scale 文档](../../../Customization/Architecture.md#loss-scale)。 > 注:在GRPO中,loss_scale 只提供掩码功能,不提供缩放功能。 diff --git a/docs/source/index.rst b/docs/source/index.rst index 9b99e18a1f..bfa14a60da 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ 
-51,9 +51,9 @@ Swift DOCUMENTATION :maxdepth: 2 :caption: Customization + Customization/Architecture.md Customization/Custom-model.md Customization/Custom-dataset.md - Customization/Pluginization.md .. toctree:: diff --git a/docs/source_en/BestPractices/Qwen3-VL-Best-Practice.md b/docs/source_en/BestPractices/Qwen3-VL-Best-Practice.md index fafa93e344..dc5b458e2b 100644 --- a/docs/source_en/BestPractices/Qwen3-VL-Best-Practice.md +++ b/docs/source_en/BestPractices/Qwen3-VL-Best-Practice.md @@ -153,7 +153,7 @@ Here's a breakdown of what unfolds: Overall, this is a sweet, lighthearted video that showcases the innocence and imagination of early childhood. The child's engagement with the book, combined with their glasses and playful demeanor, creates a delightful and memorable scene. ``` -- For model-specific parameters, such as environment variables like `VIDEO_MAX_TOKEN_NUM`, please refer to the [Command Line Parameters Documentation](../Instruction/Command-line-parameters.md#qwen3_vl). +- For model-specific parameters, such as environment variables like `VIDEO_MAX_TOKEN_NUM`, please refer to the [Command Line Parameters Documentation](../Instruction/Command-line-parameters.md#qwen3_vl-qwen3_5). ## Training diff --git a/docs/source_en/Customization/Architecture.md b/docs/source_en/Customization/Architecture.md new file mode 100644 index 0000000000..8570f82b61 --- /dev/null +++ b/docs/source_en/Customization/Architecture.md @@ -0,0 +1,233 @@ +# Architecture Introduction + +ms-swift 4.0 adopts a modular design, with functional modules distributed in first-level directories, making it convenient for developers to perform custom extensions. This document will provide a detailed introduction to the functions of each module and customization methods. + +## Agent Template + +The mapping file for agent templates can be found [here](https://github.com/modelscope/ms-swift/blob/main/swift/agent_template/mapping.py). 
The design goal of agent template is to flexibly switch between different models for training based on a unified Agent dataset format, without modifying the data. During training, use `--agent_template` to specify the corresponding agent template. + +All AgentTemplates need to inherit from `BaseAgentTemplate` and implement several methods: `_format_tools`, `_format_tool_calls`, `_format_tool_responses`, `get_toolcall`. +- _format_tools: Format `tools` and `system` to compose a complete system. +- _format_tool_calls: Format the tool_call part `[{"role": "tool_call", "content": "..."}, {"role": "tool_call", "content": "..."}]` and finally return a string. +- _format_tool_responses: Format the tool (also called tool_response) part `[{"role": "tool", "content": "..."}, {"role": "tool", "content": "..."}]`. +- get_toolcall: Used during deployment to parse the tool name and parameters from the model output content, returning `List[Function]`. + + +How to debug: +```python +data = {"tools": "[{\"type\": \"function\", \"function\": {\"name\": \"realtime_aqi\", \"description\": \"天气预报。获取实时空气质量。当前空气质量,PM2.5,PM10信息\", \"parameters\": {\"type\": \"object\", \"properties\": {\"city\": {\"type\": \"string\", \"description\": \"城市名,例如:上海\"}}, \"required\": [\"city\"]}}}]", "messages": [{"role": "user", "content": "北京和上海今天的天气情况"}, {"role": "tool_call", "content": "{\"name\": \"realtime_aqi\", \"arguments\": {\"city\": \"北京\"}}"}, {"role": "tool_call", "content": "{\"name\": \"realtime_aqi\", \"arguments\": {\"city\": \"上海\"}}"}, {"role": "tool_response", "content": "{\"city\": \"北京\", \"aqi\": \"10\", \"unit\": \"celsius\"}"}, {"role": "tool_response", "content": "{\"city\": \"上海\", \"aqi\": \"72\", \"unit\": \"fahrenheit\"}"}, {"role": "assistant", "content": "根据天气预报工具,北京今天的空气质量指数为10,属于良好水平;上海今天的空气质量指数为72,属于轻度污染水平。"}]} + + +from swift import get_processor, get_template + +tokenizer = get_processor('Qwen/Qwen3.5-2B') +template = get_template(tokenizer) # Use default agent template 
+# template = get_template(tokenizer, agent_template='qwen3_5') +print(f'agent_template: {template._agent_template}') +template.set_mode('train') +encoded = template.encode(data) +print(f'[INPUT_IDS] {template.safe_decode(encoded["input_ids"])}\n') +print(f'[LABELS] {template.safe_decode(encoded["labels"])}') +``` + +If you want to provide us with a PR, please refer to [here](https://github.com/modelscope/ms-swift/blob/main/tests/test_align/test_template/test_agent.py) to write your test cases. + +## Callbacks + +The mapping file for callbacks can be found [here](https://github.com/modelscope/ms-swift/blob/main/swift/callbacks/mapping.py). Callbacks can customize the behavior at key points in the trainer. After customization, you need to register them in the mapping and use `--callbacks` to specify the corresponding callback class during training. For example, you can customize: + +```python +class CustomCallback(TrainerCallback): + + def on_train_begin(self, args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs): + # Doing something when the training begins. + pass + + def on_save(self, args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs): + # Doing something when save checkpoint + pass +``` + +All callback classes need to inherit from `TrainerCallback` in base.py and override its methods. The interface is consistent with transformers' `TrainerCallback`, please refer to transformers' [callback documentation](https://huggingface.co/docs/transformers/main_classes/callback). + +## Loss + +The mapping file for Loss can be found [here](https://github.com/modelscope/ms-swift/blob/main/swift/loss/mapping.py). +Swift supports custom loss (currently only supports sft/pretrain/reranker/embedding tasks). After registration, set `--loss_type ` during training to use your custom loss method. + +Custom Loss needs to inherit from `BaseLoss` and implement the `__call__` method, returning a scalar Tensor. 
You can refer to [CustomCrossEntropyLoss](https://github.com/modelscope/ms-swift/blob/0d7c9f5bc0e7e7d67d914ce6edeb9ce24f60746f/swift/loss/causal_lm.py#L5) for customization. For example: + +```python +class CustomLoss(BaseLoss): + + def __call__(self, outputs, labels, **kwargs) -> torch.Tensor: + pass +``` + +## Loss Scale + +The mapping file for loss scale can be found [here](https://github.com/modelscope/ms-swift/blob/main/swift/loss_scale/mapping.py). In pretrain and sft tasks, the loss of trainable tokens is averaged, meaning each token is treated equally. However, in some cases, certain tokens need extra attention and should be assigned higher weights, or some tokens should not be trained. loss_scale allows developers to freely define their own token weights. (Pretrain and SFT support using loss_scale to control both whether tokens participate in training and their weights, while RLHF only supports controlling whether tokens participate in training.) + +You can customize loss scale by inheriting the LossScale base class and implementing the `get_loss_scale` method. + +```python +class CustomLossScale(LossScale): + + def get_loss_scale(self, context: str, **kwargs) -> Tuple[List[str], List[float]]: + ... +``` + +The `get_loss_scale` function returns a Tuple: the first element is the list of decomposed substrings, and the second is the list of loss_scale values corresponding to those substrings, where each float represents a weight. For example, the following weight setting: + +```text +["学习", "好", "数学", "是", "重要", "的"] +[1.0, 0.5, 2.0, 0.5, 2.0, 0.1] +``` +In the example, we place more emphasis on the words "数学" and "重要" because their loss_scale is 2.0. + +Of course, we also need to pay attention to the core logic of the `__call__` method, namely the influence of the loss_scale base strategy (base_strategy) all/default/last_round on loss_scale.
For details, refer to the introduction in the [Command-line Parameters Documentation](../Instruction/Command-line-parameters.md). Also, refer to the influence of the 'loss' field in the dataset on loss_scale in the [Custom Dataset Documentation](../Customization/Custom-dataset.md). + +```python +if loss or loss is None and (self.base_strategy == 'all' or + (self.base_strategy == 'default' and is_assistant) or + (self.base_strategy == 'last_round' and is_assistant and is_last_round)): + new_context, loss_scale = self.get_loss_scale(context, query=query) +else: + new_context, loss_scale = [context], [0.] +``` + +In addition, you can also use [JSON configuration files](https://github.com/modelscope/ms-swift/tree/main/swift/loss_scale/config) and inherit the built-in ConfigLossScale class to customize loss_scale. Currently, two configuration methods are supported: exact string matching and regular expression matching. You can refer to the content in [Agent Support Documentation](../Instruction/Agent-support.md#usage-of-loss_scale) for understanding. + +- Exact string matching, for example, refer to `react.json`, `qwen.json`. The JSON needs to contain a mapping of `Dict[str, List[float]]`. The string represents a keyword, and the list needs to have two values. We will split the string into multiple segments based on the keyword. The first value in the list represents the weight of the keyword, and the second value represents the weight of the content after this keyword and before the next keyword. +- Regular expression matching, for example, refer to `ignore_empty_think.json`, `hermes.json`. The JSON needs to contain a mapping of `Dict[str, float]`. The string represents a regular expression pattern, and the float represents the weight of the matching string. 
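The exact-string-matching rule can be illustrated with a small standalone sketch. This is not the actual `ConfigLossScale` implementation; the function name, the default weight of 1.0 for text before the first keyword, and the example keywords below are assumptions for illustration only:

```python
import re
from typing import Dict, List, Tuple


def split_with_keyword_weights(text: str,
                               config: Dict[str, List[float]]) -> Tuple[List[str], List[float]]:
    """Sketch of the exact-string-matching rule: each keyword gets config[kw][0]
    as its own weight; the span after it (up to the next keyword) gets
    config[kw][1]. Text before the first keyword keeps weight 1.0."""
    if not config:
        return [text], [1.0]
    pattern = '|'.join(re.escape(kw) for kw in config)
    pieces, weights = [], []
    last_end, next_weight = 0, 1.0
    for m in re.finditer(pattern, text):
        if m.start() > last_end:
            # content between the previous keyword and this one
            pieces.append(text[last_end:m.start()])
            weights.append(next_weight)
        kw = m.group(0)
        pieces.append(kw)
        weights.append(config[kw][0])
        next_weight = config[kw][1]
        last_end = m.end()
    if last_end < len(text):
        pieces.append(text[last_end:])
        weights.append(next_weight)
    return pieces, weights
```

With a config such as `{"Action:": [2.0, 2.0], "Observation:": [2.0, 0.0]}`, the agent keywords and the tool-call content after `Action:` receive weight 2.0, while the tool result after `Observation:` is masked out with weight 0.0.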
+
+How to debug:
+
+```python
+from swift import get_processor, get_template
+
+data = {"messages": [
+    {"role": "user", "content": "What is today's date?"},
+    {"role": "assistant", "content": (
+        "\nI can get the current time by calling the `get_date` function.\n\n"
+        '\n{"name": "get_date", "arguments": {}}\n'
+    )}
+]}
+
+template = get_template(get_processor('Qwen/Qwen3-8B'), loss_scale='hermes')
+template.set_mode('train')
+inputs = template.encode(data)
+
+print(template.safe_decode(inputs['labels']))
+print(inputs['loss_scale'])
+```
+
+## Metrics
+
+The mapping file for metrics can be found [here](https://github.com/modelscope/ms-swift/blob/main/swift/metrics/mapping.py). This component is used in both ms-swift and Megatron-SWIFT.
+
+- In ms-swift, inherit the `EvalMetrics` base class from base.py and implement the `compute_metrics` function, returning a dictionary `Dict[str, float]`. You can refer to [NlgMetrics](https://github.com/modelscope/ms-swift/blob/0d7c9f5bc0e7e7d67d914ce6edeb9ce24f60746f/swift/metrics/nlg.py#L33) for customization.
+- In Megatron-SWIFT, inherit the `Metric` base class from utils.py and implement the `update` and `compute` methods. The `compute` method should return a dictionary `Dict[str, float]`.
+
+You can customize metrics (currently only supported for sft/pretrain/reranker/embedding tasks) and set `--eval_metric ` during training to use your custom metrics.
+
+## Optimizers
+
+The mapping file for optimizers can be found [here](https://github.com/modelscope/ms-swift/blob/main/swift/optimizers/mapping.py). To customize an optimizer, inherit the `OptimizerCallback` base class and override the `create_optimizer` function. Use `--optimizer ` during training to specify the custom optimizer.
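As a framework-free sketch of what a `create_optimizer` override typically does, parameters can be grouped by module prefix so that different parts of the model receive different learning rates. The function name, prefix, and learning rates below are hypothetical, not ms-swift APIs:

```python
# Illustration only: build optimizer param groups with a separate
# learning rate for parameters under a given module prefix.

def group_parameters(named_params, vit_lr, base_lr, vit_prefix='visual.'):
    """Return param groups in the shape torch optimizers accept."""
    vit, rest = [], []
    for name, param in named_params:
        (vit if name.startswith(vit_prefix) else rest).append(param)
    return [
        {'params': vit, 'lr': vit_lr},
        {'params': rest, 'lr': base_lr},
    ]

groups = group_parameters(
    [('visual.blocks.0.weight', 'p0'), ('lm_head.weight', 'p1')],
    vit_lr=1e-5, base_lr=1e-4)
print([g['lr'] for g in groups])  # → [1e-05, 0.0001]
```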
+
+- You can refer to [MultimodalOptimizerCallback](https://github.com/modelscope/ms-swift/blob/0d7c9f5bc0e7e7d67d914ce6edeb9ce24f60746f/swift/optimizers/multimodal.py#L43) for implementation. This class implements the vit_lr and aligner_lr functionality, which applies different learning rates to the ViT, the aligner, and the LLM.
+
+## Tuner Plugin
+
+The mapping file for tuner plugins can be found [here](https://github.com/modelscope/ms-swift/blob/main/swift/tuner_plugin/mapping.py). To customize a tuner, inherit the `Tuner` base class and override the `prepare_model`, `save_pretrained`, and `from_pretrained` functions.
+
+- prepare_model: Called before training to process and prepare the original model, wrap it with the tuner, and set trainable parameters. For example, you can attach LoRA to certain layers and freeze others.
+- save_pretrained: Called during training to save the model.
+- from_pretrained: Called during inference or when resuming training to prepare the model and load weights.
+
+You can refer to [LoRALLMTuner](https://github.com/modelscope/ms-swift/blob/0d7c9f5bc0e7e7d67d914ce6edeb9ce24f60746f/swift/tuner_plugin/lora_llm.py#L24) for implementation. This class performs LoRA training on the LLM and full-parameter training on the ViT.
+
+## ORM
+
+Examples can be found [here](https://github.com/modelscope/ms-swift/blob/main/swift/rewards/orm.py).
+
+ORM stands for Outcome Reward Model. An ORM determines whether a response is correct, and is generally implemented using regular expressions.
For example:
+
+```python
+class MathORM(ORM):
+
+    @staticmethod
+    def extract_boxed_result(text):
+        pattern = r'\\boxed{([^}]*)}'
+        match = re.search(pattern, text)
+        if match:
+            return match.group(1).strip()
+        else:
+            return None
+
+    def __call__(self, infer_requests: List[InferRequest], ground_truths: List[str],
+                 **kwargs) -> List[float]:
+        rewards = []
+        predictions = [request.messages[-1]['content'] for request in infer_requests]
+        for prediction, ground_truth in zip(predictions, ground_truths):
+            res1 = MathORM.extract_boxed_result(prediction) or ''
+            res2 = MathORM.extract_boxed_result(ground_truth) or ''
+            rewards.append(float(res1.strip() == res2.strip()))
+
+        return rewards
+
+
+orms = {
+    'math': MathORM,
+}
+```
+
+In the code above, we define a process for parsing mathematical responses: if the parsed results are the same, it returns a score of 1.0, otherwise 0.0. Unlike PRM, this class's `__call__` takes an additional parameter `ground_truths`, which contains the actual labels (the standard responses defined in the dataset) corresponding to the infer_requests.
+
+## PRM
+
+Examples can be found [here](https://github.com/modelscope/ms-swift/blob/main/swift/rewards/prm.py).
+
+PRM stands for Process Reward Model, which is used in the `swift sample` command. The interface a PRM needs to support is relatively simple:
+
+```python
+class PRM:
+
+    def __init__(self):
+        # init here
+        pass
+
+    def __call__(self, infer_requests: List[InferRequest], **kwargs) -> List[Union[float, List[float]]]:
+        raise NotImplementedError
+```
+
+The InferRequest comes from `swift.infer_engine`, and the returned `List[Union[float, List[float]]]` can contain either a single reward or multiple rewards per request. Developers can obtain queries and responses from infer_requests and split them using their own methods. For example:
+
+```text
+Let's think step by step.
+
+Step1: xxx
+
+Step2: xxx
+
+So, the answer is ...
+```
+
+Developers can split the process here, batch the steps into the PRM for inference, and return rewards. More generally, developers can call a remote URL here, such as a closed-source PRM large model, and return the rewards.
+
+## Introduction to Other Directory Structures
+
+- arguments: Command-line parameter definitions, such as `SftArguments`, `RLHFArguments`, etc.
+- cli: The swift command-line mechanism and entry files. For example, `swift sft ...` is equivalent to `python swift/cli/main.py sft ...` and also to `python swift/cli/sft.py ...`.
+- config: deepspeed/fsdp2 configuration files.
+- dataloader: Dataloader implementation, including shard/dispatcher methods.
+- dataset: Dataset-related modules, including data preprocessing, packing, streaming data, etc. Built-in datasets are registered in the `dataset/dataset` and `dataset/data` folders. For details, refer to the [Custom Dataset Documentation](Custom-dataset.md).
+- infer_engine: Inference engine implementations, with transformers/vllm/sglang/lmdeploy as backends.
+- megatron: Megatron-SWIFT implementation.
+- model: Model loading and registration. For details, refer to the [Custom Model Documentation](Custom-model.md) and [Multimodal Model Registration Best Practices](../BestPractices/MLLM-Registration.md).
+- pipelines: Main pipeline implementations for `swift sft/rlhf/infer`, etc., including `sft_main/rlhf_main/infer_main`, etc.
+- rlhf_trainers: Trainer implementations for algorithms such as GRPO/GKD/DPO/KTO/RM.
+- rollout: Sampling implementation of the rollout process in RL algorithms.
+- rewards: Reward function implementations for RL algorithms, supporting custom reward calculation logic.
+- template: Implementation and registration of chat templates, including the logic for converting messages to input_ids for various tasks, as well as data_collator-related logic. For details, refer to the [Custom Model Documentation](Custom-model.md) and [Multimodal Model Registration Best Practices](../BestPractices/MLLM-Registration.md).
+- trainers: Trainer implementations for pretrain/SFT/embedding/reranker/sequence-classification tasks.
+- ui: `swift web-ui` interface training and inference implementation. diff --git a/docs/source_en/Customization/Pluginization.md b/docs/source_en/Customization/Pluginization.md deleted file mode 100644 index a16508e27c..0000000000 --- a/docs/source_en/Customization/Pluginization.md +++ /dev/null @@ -1,238 +0,0 @@ -# Pluginization - -> [!WARNING] -> This document is pending update to ms-swift 4.0 - -Pluginization is a significant new feature introduced in SWIFT 3.0. We aim to make the customization of the development process more natural for developers through a plugin-based approach. - -## Callback Mechanism - -An example can be found [here](https://github.com/modelscope/ms-swift/blob/main/swift/callbacks). - -The `callback` mechanism is a customization feature in the Transformers Trainer that allows developers to control the training process. Typically, customizing a callback looks like the following: - -```python -class CustomCallback(TrainerCallback): - - def on_train_begin(self, args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs): - # Doing something when the training begins. - pass - - def on_save(self, args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs): - # Doing something when saving a checkpoint. - pass -``` - -Callbacks are registered with the trainer before it is instantiated. The example provided demonstrates a simple version of an EarlyStopping mechanism. Registering your own callback is straightforward: - -```python -extra_callbacks = [CustomCallback()] -``` - -Developers can add new callbacks in `plugin/callback.py` and customize their training process. For detailed parameters of callbacks, refer to [this documentation](https://huggingface.co/docs/transformers/main_classes/callback). - -## Customizing Loss - -An example can be found [here](https://github.com/modelscope/ms-swift/blob/main/swift/loss/mapping.py). - -SWIFT supports customizing the loss function in a plugin. 
If you don’t use this capability, cross-entropy loss (CE Loss) will be used by default. You can write your code in this file, register it, and then enable your custom loss during training by setting `--loss_type custom_loss` to use your customized loss method. - -For example, adding the following code in `plugin/loss.py`: - -```python -def custom_loss_func(outputs, labels, loss_scale=None, num_items_in_batch=None) -> torch.Tensor: - # Write your own loss calculation here - return loss - -loss_map['custom_loss'] = custom_loss_func -``` - -It is important to note that the loss function is strongly related to the training task. Currently, loss customization supports PT and SFT tasks. For human alignment tasks (e.g., DPO, PPO) or classification tasks (seq_cls), loss customization through plugins is not supported. - -## Customizing Loss Scale - -An example can be found [here](https://github.com/modelscope/ms-swift/blob/main/swift/loss_scale/mapping.py). - -The `loss_scale` mechanism is one of the crucial features in SWIFT. In PT and SFT tasks, the loss for trainable tokens is uniform, meaning each token is equally involved in backpropagation. However, in certain situations, some tokens require higher weights and extra attention. In such cases, `loss_scale` allows developers to define custom token weights. - -```python -class LastRoundLossScale(LossScale): - - def get_loss_scale(self, context: str, context_type: ContextType, is_last_round: bool, **kwargs): - if context_type == ContextType.RESPONSE: - return [context], [float(is_last_round)] - return super().get_loss_scale(context, context_type, is_last_round) -``` - -In the above code, a `Tuple` is returned where the first element is the `context` (or its split parts), and the second element is the corresponding `loss_scale`. The float value represents the weight. 
For example, the following weight settings: - -```text -["学习", "好", "数学", "是", "重要", "的"] -[1.0, 0.5, 2.0, 0.5, 2.0, 0.1] -``` - -Here, we place more emphasis on the words "数学" (mathematics) and "重要" (important) by increasing their weights to 2.0. - -Referring back to the code, we check if the provided `context` is a response. If it is a response and is the last round in a multi-turn dialogue, we return a `loss_scale` of `[1]`. In other cases, we use the base implementation (which sets `loss_scale` to `[0]`). This approach ensures that only the responses from the last round participate in training, while other responses do not. Using this method, we can make all tokens (prompts and responses) participate in training or focus on specific special characters of the agent for training, etc. - -In PT and SFT, `loss_scale` is uniformly supported (whether to participate in training and the size of the weights). However, in human alignment tasks, only the participation of certain tokens in training is supported, not the size of the weights. - -## Customizing Metrics - -An example can be found [here](https://github.com/modelscope/ms-swift/blob/main/swift/metrics). - -Metrics can be customized to evaluate the training process: - -```python -metric_mapping = { - 'acc': (compute_acc_metrics, preprocess_logits_for_acc), - 'nlg': (compute_nlg_metrics, None), - 'custom': (custom_metric, custom_preprocess), -} - -def get_metric(metric: str): - return metric_mapping[metric] -``` - -In the above definition, we added a new `custom` metric. Its value consists of two parts: the first is the metric computation process, which returns a dictionary containing metric key-value pairs, and the second is the preprocessing step for logits, which returns the actual predictions. - -## Customizing Optimizers - -An example can be found [here](https://github.com/modelscope/ms-swift/blob/main/swift/optimizers/mapping.py). -- Apply different learning rates to different parts of the model. 
For example, use separate learning rates for ViT and LLM, as referenced [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/lora_llm_full_vit/custom_plugin.py). - -Users can add their own optimizers and learning rate schedulers here: - -```python -def create_custom_optimizers(args, model, dataset): - # Create your own optimizer - return CustomOptimizer(optimizer_grouped_parameters, **optimizer_kwargs), CustomScheduler(...) - -optimizers_map = { - 'custom': create_custom_optimizers, - ... -} -``` - -When developers need to use other optimizers, such as those defined in new research papers, they can define their creation process here and specify the parameter: - -```shell ---optimizer custom -``` - -This will invoke the custom optimizer. - - -## Customizing Agent Template - -The example is [here](https://github.com/modelscope/ms-swift/blob/main/swift/agent_template/mapping.py). - -## Customizing Tuners - -An example can be found [here](https://github.com/modelscope/ms-swift/blob/main/swift/tuner_plugin). -- For the multimodal model, full-parameter training is applied to the ViT part, while LoRA training is used for the LLM part. Refer to [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/multimodal/lora_llm_full_vit). -- For Phi4-multimodal, train its existing LoRA directly without adding extra LoRA. Refer to [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/plugins/tuner_phi4_mm.sh). - -Tuner customization is another unique feature of SWIFT. 
Developers can bypass the complex tuner initialization process and code integration costs by registering new tuners here: - -```python -class IA3(Tuner): - - @staticmethod - def prepare_model(args: 'SftArguments', model: torch.nn.Module) -> torch.nn.Module: - model_arch: ModelKeys = model.model_meta.model_arch - ia3_config = IA3Config( - target_modules=find_all_linears(model), feedforward_modules='.*' + model_arch.mlp.split('{}.')[1] + '.*') - return get_peft_model(model, ia3_config) - - @staticmethod - def save_pretrained( - model: torch.nn.Module, - save_directory: str, - state_dict: Optional[dict] = None, - safe_serialization: bool = True, - **kwargs, - ) -> None: - model: PeftModel - model.save_pretrained(save_directory, state_dict=state_dict, safe_serialization=safe_serialization, **kwargs) - - @staticmethod - def from_pretrained(model: torch.nn.Module, model_id: str, **kwargs) -> torch.nn.Module: - return PeftModel.from_pretrained(model, model_id, **kwargs) -``` - -In the above example, we apply PEFT's IA3 to model training. This class includes three methods: - -- `prepare_model`: How to wrap the original model using the tuner and set up trainable parameters. -- `save_pretrained`: How to save the model during training. -- `from_pretrained`: How to reload checkpoints saved earlier for subsequent training and inference. - -These three methods are invoked during the SWIFT training process, allowing developers to use their tuners without reading the complex training code. - -## PRM (Process Reward Model) - -An example can be found [here](https://github.com/modelscope/ms-swift/blob/main/swift/rewards/prm.py). - -PRM stands for Process Reward Model, which is used in the `swift sample` command. 
PRM needs to support simple interfaces: - -```python -class PRM: - - def __init__(self): - # init here - pass - - def __call__(self, infer_requests: List[InferRequest], **kwargs) -> List[Union[float, List[float]]]: - raise NotImplementedError -``` - -The InferRequest comes from `swift.infer_engine`, and the returned `List[Union[float, List[float]]]` may contain a reward or several rewards. Developers can access queries and responses in infer_requests and split them according to their own methods, for example: - -```text -Let's think step by step. - -Step1: xxx - -Step2: xxx - -So, the answer is ... -``` - -Developers can split the process here, batch them into PRM for inference, and return rewards. More generally, developers can call a remote URL here, such as a closed-source PRM large model, and return rewards. - -## ORM (Outcome Reward Model) - -An example can be found [here](https://github.com/modelscope/ms-swift/blob/main/swift/rewards/orm.py). - -ORM stands for Outcome Reward Model. ORM typically uses regular expressions to determine whether a response is correct. For example: - -```python -class MathORM(ORM): - - @staticmethod - def extract_boxed_result(text): - pattern = r'\\boxed{([^}]*)}' - match = re.search(pattern, text) - if match: - return match.group(1).strip() - else: - return None - - def __call__(self, infer_requests: List[InferRequest], ground_truths: List[str], - **kwargs) -> List[float]: - rewards = [] - predictions = [request.messages[-1]['content'] for request in infer_requests] - for prediction, ground_truth in zip(predictions, ground_truths): - res1 = MathORM.extract_boxed_result(prediction) or '' - res2 = MathORM.extract_boxed_result(ground_truth) or '' - rewards.append(float(res1.strip() == res2.strip())) - - return rewards - - -orms = { - 'math': MathORM, -} -``` - -In the above code, we define a process to parse mathematical responses. If the results are the same, it returns a score of `1.0`; otherwise, it returns `0.0`. 
Unlike PRM, this class's `infer` method includes an additional parameter `ground_truths`, which corresponds to the actual labels (standard responses defined in the dataset) for the `infer_requests`. diff --git a/docs/source_en/Instruction/Agent-support.md b/docs/source_en/Instruction/Agent-support.md index 2072f30f94..6cf6b5ce18 100644 --- a/docs/source_en/Instruction/Agent-support.md +++ b/docs/source_en/Instruction/Agent-support.md @@ -239,7 +239,7 @@ print(template.safe_decode(inputs['labels'])) ``` -For more `loss_scale` plugin designs, please refer to the [Pluginization](../Customization/Pluginization.md) documentation. +For more `loss_scale` plugin designs, please refer to the [Architecture](../Customization/Architecture.md) documentation. ## Training diff --git a/docs/source_en/Instruction/Command-line-parameters.md b/docs/source_en/Instruction/Command-line-parameters.md index bc0d5b441c..49ceff538b 100644 --- a/docs/source_en/Instruction/Command-line-parameters.md +++ b/docs/source_en/Instruction/Command-line-parameters.md @@ -819,7 +819,7 @@ qwen2_5_omni not only includes the model-specific parameters of qwen2_5_vl and q - Tip: ms-swift only fine-tunes the "thinker" component; it is recommended to set this to `False` to reduce GPU memory usage (only the thinker part of the model structure will be created). -### qwen3_vl +### qwen3_vl, qwen3_5 The parameter meanings are the same as in the `qwen_vl_utils>=0.0.14` library — see here: https://github.com/QwenLM/Qwen2.5-VL/blob/main/qwen-vl-utils/src/qwen_vl_utils/vision_process.py#L24. By passing the following environment variables you can override the library's global default values: (It is also compatible with environment variables used by `qwen2_5_vl`, such as: `MAX_PIXELS`, `VIDEO_MAX_PIXELS`, and will perform automatic conversion.) 
diff --git a/docs/source_en/Instruction/GRPO/DeveloperGuide/multi_turn.md b/docs/source_en/Instruction/GRPO/DeveloperGuide/multi_turn.md index 6c065483a8..2dba696e24 100644 --- a/docs/source_en/Instruction/GRPO/DeveloperGuide/multi_turn.md +++ b/docs/source_en/Instruction/GRPO/DeveloperGuide/multi_turn.md @@ -206,7 +206,7 @@ You can set the loss mask in two ways. ms-swift provides the `loss_scale` parameter to scale or mask parts of the response. For example, `--loss_scale last_round` zeroes out the loss for all but the last round. -Custom `loss_scale` can also be implemented; see the [customisation guide](../../../Customization/Pluginization.md#customizing-loss-scale). +Custom `loss_scale` can also be implemented; see the [customisation guide](../../../Customization/Architecture.md#loss-scale). > Note: In GRPO, `loss_scale` serves only as a mask; it does not scale the loss. diff --git a/docs/source_en/index.rst b/docs/source_en/index.rst index 9b99e18a1f..e2af21c715 100644 --- a/docs/source_en/index.rst +++ b/docs/source_en/index.rst @@ -51,9 +51,10 @@ Swift DOCUMENTATION :maxdepth: 2 :caption: Customization + Customization/Architecture.md Customization/Custom-model.md Customization/Custom-dataset.md - Customization/Pluginization.md + .. toctree:: diff --git a/swift/metrics/acc.py b/swift/metrics/acc.py index 225ee94970..e513d285bc 100644 --- a/swift/metrics/acc.py +++ b/swift/metrics/acc.py @@ -43,7 +43,7 @@ def compute_acc(preds, class AccMetrics(EvalMetrics): - def compute_acc_metrics(self, eval_prediction: EvalPrediction) -> Dict[str, float]: + def compute_metrics(self, eval_prediction: EvalPrediction) -> Dict[str, float]: metric = compute_acc( eval_prediction.predictions, eval_prediction.label_ids,