diff --git a/README.md b/README.md index fecaea4ac5..bba9d77a58 100644 --- a/README.md +++ b/README.md @@ -39,7 +39,7 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob * [2026-01] Introducing [R3L](https://github.com/shiweijiezero/R3L): a systematic reflect-then-retry RL mechanism with efficient language-guided exploration and stable off-policy learning ([paper](https://arxiv.org/abs/2601.03715)). * [2025-12] [[Release Notes]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.4.0) Trinity-RFT v0.4.0 released: added [Tinker](https://thinkingmachines.ai/tinker/) backend for users **without GPUs**, add more benchmarks, enhance online RL and more. * [2025-12] Trinity-RFT powers the medical and health business of "Taobao Shangou", enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)). -* [2025-11] Introducing [Learn-to-Ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)). +* [2025-11] Introducing [Learn-to-Ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)). * [2025-11] Introducing [BOTS](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/bots): online RL task selection for efficient LLM fine-tuning ([paper](https://arxiv.org/pdf/2510.26374)). * [2025-09] [Our paper](https://arxiv.org/pdf/2509.24203) reveals a novel off-policy interpretation for group-relative REINFORCE and its variants like GRPO and AsymRE ([implementation](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/rec_gsm8k)). * [2025-08] Introducing [CHORD](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/mix_chord): dynamic SFT + RL integration for advanced LLM fine-tuning ([paper](https://arxiv.org/pdf/2508.11408)). @@ -70,6 +70,15 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob | *Benchmarks* | • [Benchmark toolkit (quick verification & experimentation)](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/README.md)
• [Guru-Math benchmark & comparison with veRL](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)
• [FrozenLake benchmark & comparison with rLLM](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)
• [Alfworld benchmark & comparison with rLLM](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) | | *Going deeper into Trinity-RFT* | • [Full configurations](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)
• [GPU resource and training configuration guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)
• [Training VLM](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_vlm)
• [Understand the coordination between explorer and trainer](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)
• [How to align configuration with veRL](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) |
+> [!TIP]
+> **Recommended Learning Paths**
+>
+> 🆕 **New users:** [Installation](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_installation.html) → [Quick Start (GSM8K)](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html) → [Configuration Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html) → [GPU Resource Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)
+>
+> 🔬 **Algorithm researchers:** [Developer Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_overview.html) → [Algorithm Development Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_algorithm.html) → [CHORD Algorithm Example](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)
+>
+> 🤖 **Agent developers:** [Developer Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_overview.html) → [Workflow Development](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_workflow.html) → [General Multi-step Workflow Example](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html)
+
 > [!NOTE]
 > For more tutorials, please refer to the [Trinity-RFT documentation](https://agentscope-ai.github.io/Trinity-RFT/).
@@ -366,12 +375,12 @@ For studio users, click "Run" in the web interface.
 ## Contribution Guide
-This project is currently under active development, and we welcome contributions from the community!
+This project is currently under active development. Star the repo and watch releases for the latest updates!
-We welcome contributions of all kinds, including:
+We welcome all kinds of contributions from the community, including:
 * Documentation improvements
-* Example workflows
+* Example workflows, algorithms, and data pipelines
 * Bug fixes and performance optimizations
 If you're new to the project, documentation and example updates are a great place to start.
diff --git a/README_zh.md b/README_zh.md
index 5b46002dcb..246dda0110 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -47,10 +47,9 @@ Trinity-RFT 面向不同背景和目标的用户提供相应功能:
 * [2026-01] [[发布说明]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.4.1) Trinity-RFT v0.4.1 发布：升级 verl 至 v0.7.0，Tinker 后端支持 OpenAI API，修复若干 Bug。
 * [2026-01] 推出 [R3L](https://github.com/shiweijiezero/R3L)：基于反思-重试的强化学习机制，由自然语言反馈引导高效探索，并达成稳定的 off-policy 学习([论文](https://arxiv.org/abs/2601.03715))。
 * [2025-12] [[发布说明]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.4.0) Trinity-RFT v0.4.0 发布：新增[Tinker](https://thinkingmachines.ai/tinker/) 后端以支持在 **无 GPU** 的设备上训练，增加更多基准测试，增强在线 RL 等功能。
-* [2025-12] Trinity-RFT 已支持 [tinker](https://thinkingmachines.ai/tinker/) 训练后端，可在**无 GPU 的设备**上进行模型训练。
 * [2025-12] Trinity-RFT 助力淘宝闪购医药健康业务，让 AI 智能体能够理解模糊症状、主动询问后续问题，并提供精准推荐([新闻](https://tech.china.com.cn/sx/20251201/411376.shtml))。
 * [2025-11] [[发布说明](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.3.3)] Trinity-RFT v0.3.3 发布：修复若干 Bug。
-* [2025-11] 推出 [Learn-to-Ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask)：利用离线专家数据，训练具备主动问询能力的对话智能体([论文](https://arxiv.org/pdf/2510.25441)). 
+* [2025-11] 推出 [Learn-to-Ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask):利用离线专家数据,训练具备主动问询能力的对话智能体([论文](https://arxiv.org/pdf/2510.25441))。 * [2025-11] 推出 [BOTS](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/bots):在线 RL 任务选择,实现高效 LLM 微调([论文](https://arxiv.org/pdf/2510.26374))。 * [2025-09] 我们的 [论文](https://arxiv.org/pdf/2509.24203) 揭示了 group-relative REINFORCE 及其变种(如 GRPO 和 AsymRE)的 off-policy 解释([代码](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/rec_gsm8k))。 * [2025-08] 推出 [CHORD](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/mix_chord):动态 SFT + RL 集成,实现进阶 LLM 微调([论文](https://arxiv.org/pdf/2508.11408))。 @@ -84,6 +83,15 @@ Trinity-RFT 面向不同背景和目标的用户提供相应功能: | *深入了解 Trinity-RFT* | + [完整配置指南](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_configs.html)
+ [GPU 资源与训练配置对应指南](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)
+ [训练多模态模型](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_vlm)
+ [理解 explorer-trainer 同步逻辑](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/synchronizer.html)
+ [如何与 verl 对齐配置](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/align_with_verl.html) |
+> [!TIP]
+> **推荐阅读顺序**
+>
+> 🆕 **新手入门:** [安装](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_installation.html) → [快速开始 (GSM8K)](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_reasoning_basic.html) → [参数配置指南](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_configs.html) → [GPU 资源配置指南](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)
+>
+> 🔬 **算法研究者:** [开发者指南](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/develop_overview.html) → [算法开发指南](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/develop_algorithm.html) → [CHORD 算法示例](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_mix_algo.html)
+>
+> 🤖 **Agent 开发者:** [开发者指南](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/develop_overview.html) → [Workflow 开发](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/develop_workflow.html) → [通用多轮 Workflow 示例](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_multi_turn.html)
+
 > [!NOTE]
 > 更多教程请参考 [Trinity-RFT 文档](https://agentscope-ai.github.io/Trinity-RFT/)。
@@ -149,6 +157,7 @@ Trinity-RFT 面向不同背景和目标的用户提供相应功能:
   - [快速上手](#快速上手)
+  - [使用 CPU 快速上手](#使用-cpu-快速上手)
   - [第一步：安装](#第一步安装)
   - [第二步：准备数据集和模型](#第二步准备数据集和模型)
   - [第三步：准备配置文件](#第三步准备配置文件)
@@ -161,14 +170,31 @@ Trinity-RFT 面向不同背景和目标的用户提供相应功能:
 ## 快速上手
-
 > [!NOTE]
 > 本项目正处于活跃开发阶段。欢迎提出意见和建议!
->
-> **没有 GPU？没问题！** 您仍然可以尝试使用:
-> 1. 按照安装步骤进行操作（可跳过 `flash-attn` 等 GPU 专用的软件包）
-> 2. 运行 **[Tinker 训练示例](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/tinker)**，该示例专为仅使用 CPU 的系统设计。
+### 使用 CPU 快速上手
+
+如果您没有 GPU，仍然可以通过 Tinker 后端体验 Trinity-RFT。首先克隆仓库并进入项目根目录（克隆命令见下文「第一步：安装」），然后执行：
+
+```bash
+# 创建并激活环境
+python3.10 -m venv .venv
+source .venv/bin/activate
+
+# 安装支持仅 CPU 运行的 Trinity-RFT（Tinker 后端）
+pip install -e ".[tinker]"
+```
+
+运行一个简单示例：
+
+```bash
+trinity run --config examples/tinker/tinker.yaml
+```
+
+该示例专为仅使用 CPU 的设备设计。更多细节请参见完整的 [Tinker 训练示例](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/tinker)。
+
+如需在 GPU 设备上运行 Trinity-RFT，请按照以下步骤操作。
 ### 第一步：安装
@@ -178,22 +204,26 @@ Trinity-RFT 面向不同背景和目标的用户提供相应功能:
 - **CUDA**：版本 >= 12.8
 - **GPU**: 至少一块 [compute capability](https://developer.nvidia.com/cuda/gpus) 为 8.0 或更高的 NVIDIA GPU（例如 RTX 30 系列、A100、H100）
-## 源码安装（推荐）
+**推荐安装方式：**
+
+* 没有 GPU → 使用 Tinker 后端
+* 希望快速搭建 → 使用 Docker
+* 希望开发和贡献 → 使用 Conda / venv
+
+#### 源码安装（推荐）
 如需修改、扩展 Trinity-RFT，推荐使用此方法。
-### 1. 克隆仓库
+首先，克隆仓库：
 ```bash
 git clone https://github.com/agentscope-ai/Trinity-RFT
 cd Trinity-RFT
 ```
-### 2. 
构建环境 - -可选择以下任一方式: +然后,通过以下任一方式构建环境: -#### 使用预构建 Docker 镜像(推荐初学者使用该方法) +**使用预构建 Docker 镜像(推荐初学者使用该方法)** ```bash @@ -211,7 +241,7 @@ docker run -it \ > 该镜像已经通过 `uv` 安装了 Trinity-RFT 以及所有 GPU 相关依赖,且会自动激活虚拟环境(也可通过 `source /opt/venv/bin/activate` 手动激活)。必要时可使用 `uv pip install` 添加额外的包。 -#### 使用 Conda +**使用 Conda** ```bash conda create -n trinity python=3.12 @@ -228,7 +258,7 @@ pip install -e ".[vllm,flash_attn]" pip install -e ".[dev]" # 用于调试和开发 ``` -#### 使用 venv +**使用 venv** ```bash python3.10 -m venv .venv @@ -245,7 +275,7 @@ pip install -e ".[vllm,flash_attn]" pip install -e ".[dev]" # 用于调试和开发 ``` -#### 使用 `uv` +**使用 uv** [`uv`](https://github.com/astral-sh/uv) 是现代的 Python 包管理工具。 @@ -256,7 +286,7 @@ uv sync --extra vllm --extra dev --extra flash_attn # uv sync --extra tinker --extra dev ``` -## 通过 PyPI 安装 +#### 通过 PyPI 安装 如果您只需使用 Trinity-RFT 而不打算修改代码: @@ -382,12 +412,17 @@ trinity run --config examples/grpo_gsm8k/gsm8k.yaml ## 贡献指南 +本项目正处于活跃开发阶段——点击 Star 关注本仓库以获取最新更新! -本项目正处于活跃开发阶段,我们欢迎来自社区的贡献! +我们欢迎来自社区的各种贡献,包括: +* 文档改进 +* 工作流、算法和数据处理流水线 +* Bug 修复和性能优化 -请参阅 [贡献指南](./CONTRIBUTING.md) 了解详情。 +如果您是项目新手,文档和例子的更新是很好的入手点。 +详细的贡献指南请参见 [CONTRIBUTING.md](./CONTRIBUTING.md),以及我们的 [good-first-issue 列表](https://github.com/agentscope-ai/Trinity-RFT/issues/470)。 ## 致谢 @@ -399,7 +434,7 @@ trinity run --config examples/grpo_gsm8k/gsm8k.yaml + [Data-Juicer](https://github.com/datajuicer/data-juicer) 用于数据处理流水线; + [AgentScope](https://github.com/agentscope-ai/agentscope) 用于智能体工作流; + [Ray](https://github.com/ray-project/ray) 用于分布式系统; -+ 我们也从 [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF)、[TRL](https://github.com/huggingface/trl) 和 [ChatLearn](https://github.com/alibaba/ChatLearn) 等框架中汲取了灵感; ++ 我们也从 [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF)、[TRL](https://github.com/huggingface/trl)、[ChatLearn](https://github.com/alibaba/ChatLearn) 和 [rLLM](https://github.com/rllm-org/rllm) 等框架中汲取了灵感; + ...... ## 引用 diff --git a/docs/sphinx_doc/source/main.md b/docs/sphinx_doc/source/main.md index b2dfb3b371..066fbbc0f4 100644 --- a/docs/sphinx_doc/source/main.md +++ b/docs/sphinx_doc/source/main.md @@ -27,7 +27,7 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob | Category | Tutorial / Guideline | | --- | ----| -| *Run diverse RFT modes* | + [Quick start: GRPO on GSM8k](/tutorial/example_reasoning_basic.md)
+ [Off-policy RFT](/tutorial/example_reasoning_advanced.md)
+ [Fully asynchronous RFT](/tutorial/example_async_mode.md)
+ [Offline learning by DPO or SFT](/tutorial/example_dpo.md) | +| *Run diverse RFT modes* | + [Quick start: GRPO on GSM8k](/tutorial/example_reasoning_basic.md)
+ [Off-policy RFT](/tutorial/example_reasoning_advanced.md)
+ [Fully asynchronous RFT](/tutorial/example_async_mode.md)
+ [Offline learning by DPO or SFT](/tutorial/example_dpo.md)
+ [RFT without local GPU (Tinker Backend)](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/tinker) | | *Multi-step agentic RL* | + [Concatenated multi-turn workflow](/tutorial/example_multi_turn.md)
+ [General multi-step workflow](/tutorial/example_step_wise.md)
+ [ReAct workflow with an agent framework](/tutorial/example_react.md)
+ [Example: train a web-search agent](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/agentscope_websearch) | | *Full-lifecycle data pipelines* | + [Rollout task mixing and selection](/tutorial/develop_selector.md)
+ [Online task curriculum](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/bots) (📝 [paper](https://arxiv.org/pdf/2510.26374))
+ [Research project: learn-to-ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [paper](https://arxiv.org/pdf/2510.25441))
+ [Experience replay with prioritization](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)
+ [Advanced data processing & human-in-the-loop](/tutorial/example_data_functionalities.md) | | *Algorithm development* | + [RL algorithm development with Trinity-RFT](/tutorial/example_mix_algo.md) (📝 [paper](https://arxiv.org/pdf/2508.11408))
+ [Research project: R3L (reflect-then-retry RL)](https://github.com/shiweijiezero/R3L) (📝 [paper](https://arxiv.org/abs/2601.03715))
+ [Research project: group-relative REINFORCE](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [paper](https://arxiv.org/abs/2509.24203))
+ Non-verifiable domains: [RULER](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) |
@@ -98,12 +98,12 @@ We list some algorithms supported by Trinity-RFT in the following table. For mor
 This project is built upon many excellent open-source projects, including:
-+ [verl](https://github.com/volcengine/verl) and [PyTorch's FSDP](https://pytorch.org/docs/stable/fsdp.html) for LLM training;
++ [verl](https://github.com/volcengine/verl), [FSDP](https://pytorch.org/docs/stable/fsdp.html) and [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) for LLM training;
 + [vLLM](https://github.com/vllm-project/vllm) for LLM inference;
 + [Data-Juicer](https://github.com/datajuicer/data-juicer) for data processing pipelines;
 + [AgentScope](https://github.com/agentscope-ai/agentscope) for agentic workflow;
 + [Ray](https://github.com/ray-project/ray) for distributed systems;
-+ we have also drawn inspirations from RL frameworks like [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl) and [ChatLearn](https://github.com/alibaba/ChatLearn);
++ we have also drawn inspiration from RL frameworks like [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl), [ChatLearn](https://github.com/alibaba/ChatLearn) and [rLLM](https://github.com/rllm-org/rllm);
 + ......
diff --git a/docs/sphinx_doc/source/tutorial/faq.md b/docs/sphinx_doc/source/tutorial/faq.md
index 137b3ad961..58b041f833 100644
--- a/docs/sphinx_doc/source/tutorial/faq.md
+++ b/docs/sphinx_doc/source/tutorial/faq.md
@@ -80,7 +80,7 @@ ImportError: ...
 UsageError: api_key not configured (no-tty). call wandb.login(key=[your_api_key]) ...
 ```
-**A:** Try to log in to WandB before starting Ray and running the experiment. One way to do this is run the command `export WANDB_API_KEY=[your_api_key]`. Yoy may also try using other monitors instead of WandB by setting `monitor.monitor_type=tensorboard/mlflow`.
+**A:** Try to log in to WandB before starting Ray and running the experiment. One way to do this is to run the command `export WANDB_API_KEY=[your_api_key]`. You may also try using other monitors instead of WandB by setting `monitor.monitor_type=tensorboard/mlflow`.
---
diff --git a/docs/sphinx_doc/source_zh/main.md b/docs/sphinx_doc/source_zh/main.md
index 58ab96ad0e..9ed175386c 100644
--- a/docs/sphinx_doc/source_zh/main.md
+++ b/docs/sphinx_doc/source_zh/main.md
@@ -26,7 +26,7 @@ Trinity-RFT 面向不同背景和目标的用户提供相应功能:
 | 类别 | 教程 / 指南 |
 | --- | ----|
-| *运行各种 RFT 模式* | + [快速开始：在 GSM8k 上运行 GRPO](/tutorial/example_reasoning_basic.md)<br>
+ [Off-policy RFT](/tutorial/example_reasoning_advanced.md)
+ [全异步 RFT](/tutorial/example_async_mode.md)
+ [通过 DPO 或 SFT 进行离线学习](/tutorial/example_dpo.md) | +| *运行各种 RFT 模式* | + [快速开始:在 GSM8k 上运行 GRPO](/tutorial/example_reasoning_basic.md)
+ [Off-policy RFT](/tutorial/example_reasoning_advanced.md)
+ [全异步 RFT](/tutorial/example_async_mode.md)
+ [通过 DPO 或 SFT 进行离线学习](/tutorial/example_dpo.md)
+ [在无 GPU 环境下运行 RFT 训练（Tinker 后端）](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/tinker) | | *多轮智能体强化学习* | + [拼接多轮任务](/tutorial/example_multi_turn.md)<br>
+ [通用多轮任务](/tutorial/example_step_wise.md)
+ [调用智能体框架中的 ReAct 工作流](/tutorial/example_react.md)
+ [例子:训练一个网络搜索智能体](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/agentscope_websearch) | | *全生命周期的数据流水线* | + [Rollout 任务混合与选取](/tutorial/develop_selector.md)
+ [在线任务选择](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/bots) (📝 [论文](https://arxiv.org/pdf/2510.26374))
+ [研究项目:learn-to-ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [论文](https://arxiv.org/pdf/2510.25441))
+ [经验回放机制](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)
+ [高级数据处理能力 & Human-in-the-loop](/tutorial/example_data_functionalities.md) | | *强化学习算法开发* | + [使用 Trinity-RFT 进行 RL 算法开发](/tutorial/example_mix_algo.md) (📝 [论文](https://arxiv.org/pdf/2508.11408))
+ [研究项目: R3L (基于反思-重试的强化学习)](https://github.com/shiweijiezero/R3L) (📝 [论文](https://arxiv.org/abs/2601.03715))
+ [研究项目: group-relative REINFORCE](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [论文](https://arxiv.org/abs/2509.24203))
+ 不可验证的领域: [RULER](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [可训练 RULER](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) | @@ -98,7 +98,7 @@ Trinity-RFT 面向不同背景和目标的用户提供相应功能: + [Data-Juicer](https://github.com/datajuicer/data-juicer?tab=readme-ov-file) 用于数据处理流水线; + [AgentScope](https://github.com/agentscope-ai/agentscope) 用于智能体工作流; + [Ray](https://github.com/ray-project/ray) 用于分布式系统; -+ 我们也从 [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF)、[TRL](https://github.com/huggingface/trl) 和 [ChatLearn](https://github.com/alibaba/ChatLearn) 等框架中汲取了灵感; ++ 我们也从 [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF)、[TRL](https://github.com/huggingface/trl)、[ChatLearn](https://github.com/alibaba/ChatLearn) 和 [rLLM](https://github.com/rllm-org/rllm) 等框架中汲取了灵感; + ...... ## 引用 diff --git a/docs/sphinx_doc/source_zh/tutorial/faq.md b/docs/sphinx_doc/source_zh/tutorial/faq.md index a79b4dc2bc..f1a719dc8d 100644 --- a/docs/sphinx_doc/source_zh/tutorial/faq.md +++ b/docs/sphinx_doc/source_zh/tutorial/faq.md @@ -104,7 +104,7 @@ ray start --head - 对于 trainer,当 `trainer.use_dynamic_bsz=false` 时,调整 `trainer.max_token_len_per_gpu`;当 `trainer.use_dynamic_bsz=true` 时,调整 `trainer.ppo_max_token_len_per_gpu` 和 `trainer.ulysses_sequence_parallel_size`。设置 `trainer.trainer_config.actor_rollout_ref.actor.entropy_from_logits_with_chunking=true` 也可能有帮助。 - 对于 explorer,调整 `explorer.rollout_model.tensor_parallel_size`。 -此外,Trinity-RFT 提供了[GPU 相关配置指南](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html),可参考其中建议。 +此外,Trinity-RFT 提供了[GPU 相关配置指南](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html),可参考其中建议。 ## 第三部分:调试方法 diff --git a/examples/grpo_gsm8k/gsm8k.yaml b/examples/grpo_gsm8k/gsm8k.yaml index 150cc68497..ac3762c565 100644 --- a/examples/grpo_gsm8k/gsm8k.yaml +++ b/examples/grpo_gsm8k/gsm8k.yaml @@ -12,7 +12,7 @@ model: max_model_len: 2048 cluster: node_num: 1 - gpu_per_node: 8 + gpu_per_node: 2 buffer: total_epochs: 1 batch_size: 96 @@ -47,7 +47,7 @@ explorer: eval_interval: 50 runner_per_model: 8 rollout_model: - engine_num: 2 + engine_num: 1 tensor_parallel_size: 1 enable_prefix_caching: false enforce_eager: true
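
For context on the `examples/grpo_gsm8k/gsm8k.yaml` change above: Trinity-RFT divides the GPUs declared under `cluster` between the explorer's rollout engines and the trainer, so the rollout side occupies roughly `engine_num × tensor_parallel_size` GPUs and the trainer takes the remainder (see the GPU resource and training configuration guide linked earlier). The sketch below is an illustrative annotation of the new values under that assumption, not part of the patch:

```yaml
# Illustrative annotation of the updated example config (assumes the
# explorer/trainer GPU split described in the GPU resource guide).
cluster:
  node_num: 1
  gpu_per_node: 2            # 2 GPUs available on the node in total
explorer:
  rollout_model:
    engine_num: 1            # rollout GPUs = engine_num * tensor_parallel_size = 1
    tensor_parallel_size: 1
# Remaining GPUs go to the trainer: 2 - 1 = 1.
# The previous values (gpu_per_node: 8, engine_num: 2) implied 2 rollout GPUs
# and left 6 GPUs for the trainer.
```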