diff --git a/README.md b/README.md
index fecaea4ac5..bba9d77a58 100644
--- a/README.md
+++ b/README.md
@@ -39,7 +39,7 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob
* [2026-01] Introducing [R3L](https://github.com/shiweijiezero/R3L): a systematic reflect-then-retry RL mechanism with efficient language-guided exploration and stable off-policy learning ([paper](https://arxiv.org/abs/2601.03715)).
* [2025-12] [[Release Notes]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.4.0) Trinity-RFT v0.4.0 released: added the [Tinker](https://thinkingmachines.ai/tinker/) backend for users **without GPUs**, more benchmarks, enhanced online RL, and more.
* [2025-12] Trinity-RFT powers the medical and health business of "Taobao Shangou", enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)).
-* [2025-11] Introducing [Learn-to-Ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
+* [2025-11] Introducing [Learn-to-Ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
* [2025-11] Introducing [BOTS](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/bots): online RL task selection for efficient LLM fine-tuning ([paper](https://arxiv.org/pdf/2510.26374)).
* [2025-09] [Our paper](https://arxiv.org/pdf/2509.24203) reveals a novel off-policy interpretation for group-relative REINFORCE and its variants like GRPO and AsymRE ([implementation](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/rec_gsm8k)).
* [2025-08] Introducing [CHORD](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/mix_chord): dynamic SFT + RL integration for advanced LLM fine-tuning ([paper](https://arxiv.org/pdf/2508.11408)).
@@ -70,6 +70,15 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob
| *Benchmarks* | • [Benchmark toolkit (quick verification & experimentation)](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/README.md)
• [Guru-Math benchmark & comparison with veRL](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)
• [FrozenLake benchmark & comparison with rLLM](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)
• [Alfworld benchmark & comparison with rLLM](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) |
| *Going deeper into Trinity-RFT* | • [Full configurations](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)
• [GPU resource and training configuration guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)
• [Training VLM](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_vlm)
• [Understand the coordination between explorer and trainer](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)
• [How to align configuration with veRL](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) |
+> [!TIP]
+> **Recommended Learning Paths**
+>
+> 🆕 **New users:** [Installation](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_installation.html) → [Quick Start (GSM8K)](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html) → [Configuration Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html) → [GPU Resource Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)
+>
+> 🔬 **Algorithm researchers:** [Developer Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_overview.html) → [Algorithm Development Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_algorithm.html) → [CHORD Algorithm Example](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)
+>
+> 🤖 **Agent developers:** [Developer Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_overview.html) → [Workflow Development](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_workflow.html) → [General Multi-step Workflow Example](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html)
+
> [!NOTE]
> For more tutorials, please refer to the [Trinity-RFT documentation](https://agentscope-ai.github.io/Trinity-RFT/).
@@ -366,12 +375,12 @@ For studio users, click "Run" in the web interface.
## Contribution Guide
-This project is currently under active development, and we welcome contributions from the community!
+This project is under active development. Star the repo to follow releases and stay up to date!
-We welcome contributions of all kinds, including:
+We welcome all kinds of contributions from the community, including:
* Documentation improvements
-* Example workflows
+* Example workflows, algorithms, and data pipelines
* Bug fixes and performance optimizations
If you're new to the project, documentation and example updates are a great place to start.
diff --git a/README_zh.md b/README_zh.md
index 5b46002dcb..246dda0110 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -47,10 +47,9 @@ Trinity-RFT provides features for users with different backgrounds and goals:
* [2026-01] [[Release Notes]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.4.1) Trinity-RFT v0.4.1 released: upgraded verl to v0.7.0, added OpenAI API support to the Tinker backend, and fixed several bugs.
* [2026-01] Introducing [R3L](https://github.com/shiweijiezero/R3L): a reflect-then-retry RL mechanism with efficient exploration guided by natural-language feedback and stable off-policy learning ([paper](https://arxiv.org/abs/2601.03715)).
* [2025-12] [[Release Notes]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.4.0) Trinity-RFT v0.4.0 released: added the [Tinker](https://thinkingmachines.ai/tinker/) backend for training on devices **without GPUs**, more benchmarks, enhanced online RL, and more.
-* [2025-12] Trinity-RFT now supports the [tinker](https://thinkingmachines.ai/tinker/) training backend, enabling model training on devices **without GPUs**.
* [2025-12] Trinity-RFT powers the medical and health business of "Taobao Shangou", enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)).
* [2025-11] [[Release Notes](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.3.3)] Trinity-RFT v0.3.3 released: fixed several bugs.
-* [2025-11] Introducing [Learn-to-Ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
+* [2025-11] Introducing [Learn-to-Ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
* [2025-11] Introducing [BOTS](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/bots): online RL task selection for efficient LLM fine-tuning ([paper](https://arxiv.org/pdf/2510.26374)).
* [2025-09] [Our paper](https://arxiv.org/pdf/2509.24203) reveals a novel off-policy interpretation for group-relative REINFORCE and its variants like GRPO and AsymRE ([code](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/rec_gsm8k)).
* [2025-08] Introducing [CHORD](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/mix_chord): dynamic SFT + RL integration for advanced LLM fine-tuning ([paper](https://arxiv.org/pdf/2508.11408)).
@@ -84,6 +83,15 @@ Trinity-RFT provides features for users with different backgrounds and goals:
| *Going deeper into Trinity-RFT* | + [Full configuration guide](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_configs.html)
+ [GPU resource and training configuration guide](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)
+ [Training multimodal models](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_vlm)
+ [Understand the explorer-trainer synchronization logic](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/synchronizer.html)
+ [How to align configuration with verl](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/align_with_verl.html) |
+> [!TIP]
+> **Recommended Learning Paths**
+>
+> 🆕 **New users:** [Installation](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_installation.html) → [Quick Start (GSM8K)](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_reasoning_basic.html) → [Configuration Guide](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_configs.html) → [GPU Resource Guide](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)
+>
+> 🔬 **Algorithm researchers:** [Developer Guide](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/develop_overview.html) → [Algorithm Development Guide](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/develop_algorithm.html) → [CHORD Algorithm Example](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_mix_algo.html)
+>
+> 🤖 **Agent developers:** [Developer Guide](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/develop_overview.html) → [Workflow Development](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/develop_workflow.html) → [General Multi-step Workflow Example](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_multi_turn.html)
+
> [!NOTE]
> For more tutorials, please refer to the [Trinity-RFT documentation](https://agentscope-ai.github.io/Trinity-RFT/).
@@ -149,6 +157,7 @@ Trinity-RFT provides features for users with different backgrounds and goals:
- [Quick Start](#快速上手)
+ - [Quick Start on CPU](#使用-cpu-快速上手)
- [Step 1: Installation](#第一步安装)
- [Step 2: Prepare the Dataset and Model](#第二步准备数据集和模型)
- [Step 3: Prepare the Configuration File](#第三步准备配置文件)
@@ -161,14 +170,31 @@ Trinity-RFT provides features for users with different backgrounds and goals:
## Quick Start
-
> [!NOTE]
> This project is under active development. Comments and suggestions are welcome!
->
-> **No GPU? No problem!** You can still try it out:
-> 1. Follow the installation steps (GPU-only packages such as `flash-attn` can be skipped)
-> 2. Run the **[Tinker training example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/tinker)**, which is designed for CPU-only systems.
+### Quick Start on CPU
+
+If you do not have a GPU, you can still try Trinity-RFT via the Tinker backend.
+
+```bash
+# Create and activate the environment
+python3.10 -m venv .venv
+source .venv/bin/activate
+
+# Install Trinity-RFT with the CPU-only backend
+pip install -e ".[tinker]"
+```
+
+Run a simple example:
+
+```bash
+trinity run --config examples/tinker/tinker.yaml
+```
+
+This example is designed for CPU-only devices. For more details, see the full [Tinker training example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/tinker).
+
+To run Trinity-RFT on GPU devices, follow the steps below.
### Step 1: Installation
@@ -178,22 +204,26 @@ Trinity-RFT provides features for users with different backgrounds and goals:
- **CUDA**: version >= 12.8
- **GPU**: at least one NVIDIA GPU with [compute capability](https://developer.nvidia.com/cuda/gpus) 8.0 or higher (e.g., RTX 30 series, A100, H100)
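To quickly verify these requirements, you can run the following illustrative check (the `compute_cap` query field assumes a reasonably recent NVIDIA driver):

```bash
# Show the installed CUDA toolkit version
nvcc --version

# List each GPU's name and compute capability
# (the compute_cap field requires a recent NVIDIA driver)
nvidia-smi --query-gpu=name,compute_cap --format=csv
```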
-## Install from Source (Recommended)
+**Recommended installation paths:**
+
+* No GPU → use the Tinker backend
+* Want a quick setup → use Docker
+* Want to develop and contribute → use Conda / venv
+
+#### Install from Source (Recommended)
This method is recommended if you want to modify or extend Trinity-RFT.
-### 1. Clone the repository
+First, clone the repository:
```bash
git clone https://github.com/agentscope-ai/Trinity-RFT
cd Trinity-RFT
```
-### 2. Set up the environment
-
-Choose any one of the following options:
+Then, set up the environment in any one of the following ways:
-#### Using the pre-built Docker image (recommended for beginners)
+**Using the pre-built Docker image (recommended for beginners)**
```bash
@@ -211,7 +241,7 @@ docker run -it \
> The image has Trinity-RFT and all GPU-related dependencies installed via `uv`, and the virtual environment is activated automatically (it can also be activated manually with `source /opt/venv/bin/activate`). Additional packages can be added with `uv pip install` if needed.
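For example, a minimal session inside the container might look as follows (an illustrative sketch; `/opt/venv` is the venv path from the note above, and `matplotlib` stands in for any extra package you might need):

```bash
# Activate the virtual environment manually if it is not already active
source /opt/venv/bin/activate

# Add an extra package with uv (matplotlib is just a placeholder example)
uv pip install matplotlib
```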
-#### Using Conda
+**Using Conda**
```bash
conda create -n trinity python=3.12
@@ -228,7 +258,7 @@ pip install -e ".[vllm,flash_attn]"
pip install -e ".[dev]" # 用于调试和开发
```
-#### Using venv
+**Using venv**
```bash
python3.10 -m venv .venv
@@ -245,7 +275,7 @@ pip install -e ".[vllm,flash_attn]"
pip install -e ".[dev]" # 用于调试和开发
```
-#### Using `uv`
+**Using uv**
[`uv`](https://github.com/astral-sh/uv) is a modern Python package manager.
@@ -256,7 +286,7 @@ uv sync --extra vllm --extra dev --extra flash_attn
# uv sync --extra tinker --extra dev
```
-## Install via PyPI
+#### Install via PyPI
If you only want to use Trinity-RFT without modifying the code:
@@ -382,12 +412,17 @@ trinity run --config examples/grpo_gsm8k/gsm8k.yaml
## Contribution Guide
+This project is under active development. Star the repo to follow releases and stay up to date!
-This project is currently under active development, and we welcome contributions from the community!
+We welcome all kinds of contributions from the community, including:
+* Documentation improvements
+* Example workflows, algorithms, and data pipelines
+* Bug fixes and performance optimizations
-Please refer to the [Contribution Guide](./CONTRIBUTING.md) for details.
+If you're new to the project, documentation and example updates are a great place to start.
+For detailed contribution guidelines, see [CONTRIBUTING.md](./CONTRIBUTING.md) and our [good-first-issue list](https://github.com/agentscope-ai/Trinity-RFT/issues/470).
## Acknowledgements
@@ -399,7 +434,7 @@ trinity run --config examples/grpo_gsm8k/gsm8k.yaml
+ [Data-Juicer](https://github.com/datajuicer/data-juicer) for data processing pipelines;
+ [AgentScope](https://github.com/agentscope-ai/agentscope) for agentic workflows;
+ [Ray](https://github.com/ray-project/ray) for distributed systems;
-+ we have also drawn inspiration from frameworks such as [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl) and [ChatLearn](https://github.com/alibaba/ChatLearn);
++ we have also drawn inspiration from frameworks such as [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl), [ChatLearn](https://github.com/alibaba/ChatLearn) and [rLLM](https://github.com/rllm-org/rllm);
+ ......
## Citation
diff --git a/docs/sphinx_doc/source/main.md b/docs/sphinx_doc/source/main.md
index b2dfb3b371..066fbbc0f4 100644
--- a/docs/sphinx_doc/source/main.md
+++ b/docs/sphinx_doc/source/main.md
@@ -27,7 +27,7 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob
| Category | Tutorial / Guideline |
| --- | ----|
-| *Run diverse RFT modes* | + [Quick start: GRPO on GSM8k](/tutorial/example_reasoning_basic.md)
+ [Off-policy RFT](/tutorial/example_reasoning_advanced.md)
+ [Fully asynchronous RFT](/tutorial/example_async_mode.md)
+ [Offline learning by DPO or SFT](/tutorial/example_dpo.md) |
+| *Run diverse RFT modes* | + [Quick start: GRPO on GSM8k](/tutorial/example_reasoning_basic.md)
+ [Off-policy RFT](/tutorial/example_reasoning_advanced.md)
+ [Fully asynchronous RFT](/tutorial/example_async_mode.md)
+ [Offline learning by DPO or SFT](/tutorial/example_dpo.md)
+ [RFT without a local GPU (Tinker backend)](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/tinker) |
| *Multi-step agentic RL* | + [Concatenated multi-turn workflow](/tutorial/example_multi_turn.md)
+ [General multi-step workflow](/tutorial/example_step_wise.md)
+ [ReAct workflow with an agent framework](/tutorial/example_react.md)
+ [Example: train a web-search agent](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/agentscope_websearch) |
| *Full-lifecycle data pipelines* | + [Rollout task mixing and selection](/tutorial/develop_selector.md)
+ [Online task curriculum](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/bots) (📝 [paper](https://arxiv.org/pdf/2510.26374))
+ [Research project: learn-to-ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [paper](https://arxiv.org/pdf/2510.25441))
+ [Experience replay with prioritization](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)
+ [Advanced data processing & human-in-the-loop](/tutorial/example_data_functionalities.md) |
| *Algorithm development* | + [RL algorithm development with Trinity-RFT](/tutorial/example_mix_algo.md) (📝 [paper](https://arxiv.org/pdf/2508.11408))
+ [Research project: R3L (reflect-then-retry RL)](https://github.com/shiweijiezero/R3L) (📝 [paper](https://arxiv.org/abs/2601.03715))
+ [Research project: group-relative REINFORCE](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [paper](https://arxiv.org/abs/2509.24203))
+ Non-verifiable domains: [RULER](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) |
@@ -98,12 +98,12 @@ We list some algorithms supported by Trinity-RFT in the following table. For mor
This project is built upon many excellent open-source projects, including:
-+ [verl](https://github.com/volcengine/verl) and [PyTorch's FSDP](https://pytorch.org/docs/stable/fsdp.html) for LLM training;
++ [verl](https://github.com/volcengine/verl), [FSDP](https://pytorch.org/docs/stable/fsdp.html) and [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) for LLM training;
+ [vLLM](https://github.com/vllm-project/vllm) for LLM inference;
+ [Data-Juicer](https://github.com/datajuicer/data-juicer) for data processing pipelines;
+ [AgentScope](https://github.com/agentscope-ai/agentscope) for agentic workflow;
+ [Ray](https://github.com/ray-project/ray) for distributed systems;
-+ we have also drawn inspirations from RL frameworks like [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl) and [ChatLearn](https://github.com/alibaba/ChatLearn);
++ we have also drawn inspiration from RL frameworks like [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl), [ChatLearn](https://github.com/alibaba/ChatLearn) and [rLLM](https://github.com/rllm-org/rllm);
+ ......
diff --git a/docs/sphinx_doc/source/tutorial/faq.md b/docs/sphinx_doc/source/tutorial/faq.md
index 137b3ad961..58b041f833 100644
--- a/docs/sphinx_doc/source/tutorial/faq.md
+++ b/docs/sphinx_doc/source/tutorial/faq.md
@@ -80,7 +80,7 @@ ImportError: ...
UsageError: api_key not configured (no-tty). call wandb.login(key=[your_api_key]) ...
```
-**A:** Try to log in to WandB before starting Ray and running the experiment. One way to do this is run the command `export WANDB_API_KEY=[your_api_key]`. Yoy may also try using other monitors instead of WandB by setting `monitor.monitor_type=tensorboard/mlflow`.
+**A:** Try to log in to WandB before starting Ray and running the experiment. One way to do this is to run the command `export WANDB_API_KEY=[your_api_key]`. You may also try using other monitors instead of WandB by setting `monitor.monitor_type=tensorboard/mlflow`.
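For example, a minimal config snippet for switching the monitor might look like this (an illustrative sketch; it assumes the `monitor` section sits at the top level of your YAML config, as the dotted path above suggests):

```yaml
monitor:
  monitor_type: tensorboard  # alternatives: mlflow, wandb
```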
---
diff --git a/docs/sphinx_doc/source_zh/main.md b/docs/sphinx_doc/source_zh/main.md
index 58ab96ad0e..9ed175386c 100644
--- a/docs/sphinx_doc/source_zh/main.md
+++ b/docs/sphinx_doc/source_zh/main.md
@@ -26,7 +26,7 @@ Trinity-RFT provides features for users with different backgrounds and goals:
| Category | Tutorial / Guideline |
| --- | ----|
-| *Run diverse RFT modes* | + [Quick start: GRPO on GSM8k](/tutorial/example_reasoning_basic.md)
+ [Off-policy RFT](/tutorial/example_reasoning_advanced.md)
+ [Fully asynchronous RFT](/tutorial/example_async_mode.md)
+ [Offline learning by DPO or SFT](/tutorial/example_dpo.md) |
+| *Run diverse RFT modes* | + [Quick start: GRPO on GSM8k](/tutorial/example_reasoning_basic.md)
+ [Off-policy RFT](/tutorial/example_reasoning_advanced.md)
+ [Fully asynchronous RFT](/tutorial/example_async_mode.md)
+ [Offline learning by DPO or SFT](/tutorial/example_dpo.md)
+ [RFT without a local GPU (Tinker backend)](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/tinker) |
| *Multi-step agentic RL* | + [Concatenated multi-turn workflow](/tutorial/example_multi_turn.md)
+ [General multi-step workflow](/tutorial/example_step_wise.md)
+ [ReAct workflow with an agent framework](/tutorial/example_react.md)
+ [Example: train a web-search agent](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/agentscope_websearch) |
| *Full-lifecycle data pipelines* | + [Rollout task mixing and selection](/tutorial/develop_selector.md)
+ [Online task curriculum](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/bots) (📝 [paper](https://arxiv.org/pdf/2510.26374))
+ [Research project: learn-to-ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [paper](https://arxiv.org/pdf/2510.25441))
+ [Experience replay with prioritization](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)
+ [Advanced data processing & human-in-the-loop](/tutorial/example_data_functionalities.md) |
| *Algorithm development* | + [RL algorithm development with Trinity-RFT](/tutorial/example_mix_algo.md) (📝 [paper](https://arxiv.org/pdf/2508.11408))
+ [Research project: R3L (reflect-then-retry RL)](https://github.com/shiweijiezero/R3L) (📝 [paper](https://arxiv.org/abs/2601.03715))
+ [Research project: group-relative REINFORCE](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [paper](https://arxiv.org/abs/2509.24203))
+ Non-verifiable domains: [RULER](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) |
@@ -98,7 +98,7 @@ Trinity-RFT provides features for users with different backgrounds and goals:
+ [Data-Juicer](https://github.com/datajuicer/data-juicer?tab=readme-ov-file) for data processing pipelines;
+ [AgentScope](https://github.com/agentscope-ai/agentscope) for agentic workflows;
+ [Ray](https://github.com/ray-project/ray) for distributed systems;
-+ we have also drawn inspiration from frameworks such as [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl) and [ChatLearn](https://github.com/alibaba/ChatLearn);
++ we have also drawn inspiration from frameworks such as [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl), [ChatLearn](https://github.com/alibaba/ChatLearn) and [rLLM](https://github.com/rllm-org/rllm);
+ ......
## Citation
diff --git a/docs/sphinx_doc/source_zh/tutorial/faq.md b/docs/sphinx_doc/source_zh/tutorial/faq.md
index a79b4dc2bc..f1a719dc8d 100644
--- a/docs/sphinx_doc/source_zh/tutorial/faq.md
+++ b/docs/sphinx_doc/source_zh/tutorial/faq.md
@@ -104,7 +104,7 @@ ray start --head
- For the trainer: when `trainer.use_dynamic_bsz=false`, tune `trainer.max_token_len_per_gpu`; when `trainer.use_dynamic_bsz=true`, tune `trainer.ppo_max_token_len_per_gpu` and `trainer.ulysses_sequence_parallel_size`. Setting `trainer.trainer_config.actor_rollout_ref.actor.entropy_from_logits_with_chunking=true` may also help (see the sketch below).
- For the explorer: tune `explorer.rollout_model.tensor_parallel_size`.
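A minimal YAML sketch of where these knobs live (the parameter names come from the tips above; the values are hypothetical placeholders, not recommendations):

```yaml
trainer:
  use_dynamic_bsz: false
  max_token_len_per_gpu: 8192          # lower this first if the trainer OOMs
  ulysses_sequence_parallel_size: 2    # consulted when use_dynamic_bsz=true
explorer:
  rollout_model:
    tensor_parallel_size: 2            # shard rollout model weights across more GPUs
```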
-In addition, Trinity-RFT provides a [GPU configuration guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html) whose recommendations you can consult.
+In addition, Trinity-RFT provides a [GPU configuration guide](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html) whose recommendations you can consult.
## Part 3: Debugging Methods
diff --git a/examples/grpo_gsm8k/gsm8k.yaml b/examples/grpo_gsm8k/gsm8k.yaml
index 150cc68497..ac3762c565 100644
--- a/examples/grpo_gsm8k/gsm8k.yaml
+++ b/examples/grpo_gsm8k/gsm8k.yaml
@@ -12,7 +12,7 @@ model:
max_model_len: 2048
cluster:
node_num: 1
- gpu_per_node: 8
+ gpu_per_node: 2
buffer:
total_epochs: 1
batch_size: 96
@@ -47,7 +47,7 @@ explorer:
eval_interval: 50
runner_per_model: 8
rollout_model:
- engine_num: 2
+ engine_num: 1
tensor_parallel_size: 1
enable_prefix_caching: false
enforce_eager: true
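For reference, a hedged sketch of the GPU accounting implied by these settings, assuming the non-colocated setup in which rollout engines and the trainer occupy separate GPUs:

```yaml
# Illustrative accounting for the values above (assumed split):
#   rollout GPUs = engine_num * tensor_parallel_size = 1 * 1 = 1
#   trainer GPUs = node_num * gpu_per_node - rollout GPUs = 2 - 1 = 1
# A hypothetical scale-up to an 8-GPU node could look like:
cluster:
  node_num: 1
  gpu_per_node: 8
explorer:
  rollout_model:
    engine_num: 2
    tensor_parallel_size: 1  # 2 GPUs serve rollouts, leaving 6 for the trainer
```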