17 changes: 13 additions & 4 deletions README.md
@@ -39,7 +39,7 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob
* [2026-01] Introducing [R3L](https://github.com/shiweijiezero/R3L): a systematic reflect-then-retry RL mechanism with efficient language-guided exploration and stable off-policy learning ([paper](https://arxiv.org/abs/2601.03715)).
* [2025-12] [[Release Notes]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.4.0) Trinity-RFT v0.4.0 released: added the [Tinker](https://thinkingmachines.ai/tinker/) backend for users **without GPUs**, added more benchmarks, enhanced online RL, and more.
* [2025-12] Trinity-RFT powers the medical and health business of "Taobao Shangou", enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)).
* [2025-11] Introducing [Learn-to-Ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
* [2025-11] Introducing [BOTS](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/bots): online RL task selection for efficient LLM fine-tuning ([paper](https://arxiv.org/pdf/2510.26374)).
* [2025-09] [Our paper](https://arxiv.org/pdf/2509.24203) reveals a novel off-policy interpretation for group-relative REINFORCE and its variants like GRPO and AsymRE ([implementation](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/rec_gsm8k)).
* [2025-08] Introducing [CHORD](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/mix_chord): dynamic SFT + RL integration for advanced LLM fine-tuning ([paper](https://arxiv.org/pdf/2508.11408)).
@@ -70,6 +70,15 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob
| *Benchmarks* | • [Benchmark toolkit (quick verification & experimentation)](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/README.md)<br>• [Guru-Math benchmark & comparison with veRL](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)<br>• [FrozenLake benchmark & comparison with rLLM](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)<br>• [Alfworld benchmark & comparison with rLLM](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) |
| *Going deeper into Trinity-RFT* | • [Full configurations](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)<br>• [GPU resource and training configuration guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)<br>• [Training VLM](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_vlm)<br>• [Understand the coordination between explorer and trainer](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)<br>• [How to align configuration with veRL](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) |

> [!TIP]
> **Recommended Learning Paths**
>
> 🆕 **New users:** [Installation](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_installation.html) → [Quick Start (GSM8K)](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html) → [Configuration Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html) → [GPU Resource Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)
>
> 🔬 **Algorithm researchers:** [Developer Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_overview.html) → [Algorithm Development Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_algorithm.html) → [CHORD Algorithm Example](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)
>
> 🤖 **Agent developers:** [Developer Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_overview.html) → [Workflow Development](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_workflow.html) → [General Multi-step Workflow Example](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html)

> [!NOTE]
> For more tutorials, please refer to the [Trinity-RFT documentation](https://agentscope-ai.github.io/Trinity-RFT/).

@@ -366,12 +375,12 @@ For studio users, click "Run" in the web interface.

## Contribution Guide

This project is currently under active development, and we welcome contributions from the community!
This project is currently under active development; star the repo and watch releases for the latest updates!

We welcome contributions of all kinds, including:
We welcome all kinds of contributions from the community, including:

* Documentation improvements
* Example workflows
* Example workflows, algorithms, and data pipelines
* Bug fixes and performance optimizations

If you're new to the project, documentation and example updates are a great place to start.
75 changes: 55 additions & 20 deletions README_zh.md
@@ -47,10 +47,9 @@ Trinity-RFT provides features for users with different backgrounds and goals:
* [2026-01] [[Release Notes]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.4.1) Trinity-RFT v0.4.1 released: upgraded verl to v0.7.0, added OpenAI API support to the Tinker backend, and fixed several bugs.
* [2026-01] Introducing [R3L](https://github.com/shiweijiezero/R3L): a reflect-then-retry RL mechanism that uses natural-language feedback to guide efficient exploration and achieve stable off-policy learning ([paper](https://arxiv.org/abs/2601.03715)).
* [2025-12] [[Release Notes]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.4.0) Trinity-RFT v0.4.0 released: added the [Tinker](https://thinkingmachines.ai/tinker/) backend for training on devices **without GPUs**, added more benchmarks, enhanced online RL, and more.
* [2025-12] Trinity-RFT now supports the [tinker](https://thinkingmachines.ai/tinker/) training backend, enabling model training on devices **without GPUs**.
* [2025-12] Trinity-RFT powers the medical and health business of "Taobao Shangou", enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)).
* [2025-11] [[Release Notes]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.3.3) Trinity-RFT v0.3.3 released: fixed several bugs.
* [2025-11] Introducing [Learn-to-Ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask): training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
* [2025-11] Introducing [BOTS](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/bots): online RL task selection for efficient LLM fine-tuning ([paper](https://arxiv.org/pdf/2510.26374)).
* [2025-09] Our [paper](https://arxiv.org/pdf/2509.24203) reveals an off-policy interpretation of group-relative REINFORCE and its variants such as GRPO and AsymRE ([code](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/rec_gsm8k)).
* [2025-08] Introducing [CHORD](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/mix_chord): dynamic SFT + RL integration for advanced LLM fine-tuning ([paper](https://arxiv.org/pdf/2508.11408)).
@@ -84,6 +83,15 @@ Trinity-RFT provides features for users with different backgrounds and goals:
| *Going deeper into Trinity-RFT* | + [Full configuration guide](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_configs.html)<br>+ [GPU resource and training configuration guide](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)<br>+ [Training multimodal models](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_vlm)<br>+ [Understanding explorer-trainer synchronization](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/synchronizer.html)<br>+ [How to align configuration with verl](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/align_with_verl.html) |


> [!TIP]
> **Recommended Learning Paths**
>
> 🆕 **New users:** [Installation](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_installation.html) → [Quick Start (GSM8K)](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_reasoning_basic.html) → [Configuration Guide](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_configs.html) → [GPU Resource Guide](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)
>
> 🔬 **Algorithm researchers:** [Developer Guide](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/develop_overview.html) → [Algorithm Development Guide](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/develop_algorithm.html) → [CHORD Algorithm Example](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_mix_algo.html)
>
> 🤖 **Agent developers:** [Developer Guide](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/develop_overview.html) → [Workflow Development](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/develop_workflow.html) → [General Multi-step Workflow Example](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_multi_turn.html)

> [!NOTE]
> For more tutorials, please refer to the [Trinity-RFT documentation](https://agentscope-ai.github.io/Trinity-RFT/).

@@ -149,6 +157,7 @@ Trinity-RFT provides features for users with different backgrounds and goals:


- [Quick Start](#快速上手)
- [Quick Start on CPU](#使用-cpu-快速上手)
- [Step 1: Installation](#第一步安装)
- [Step 2: Prepare Dataset and Model](#第二步准备数据集和模型)
- [Step 3: Prepare Configuration File](#第三步准备配置文件)
@@ -161,14 +170,31 @@ Trinity-RFT provides features for users with different backgrounds and goals:

## Quick Start


> [!NOTE]
> This project is under active development. Comments and suggestions are welcome!
>
> **No GPU? No problem!** You can still try it out:
> 1. Follow the installation steps (GPU-only packages such as `flash-attn` can be skipped)
> 2. Run the **[Tinker training example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/tinker)**, which is designed for CPU-only systems.

### Quick Start on CPU

If you don't have a GPU, you can still try out Trinity-RFT through the Tinker backend.

```bash
# Create and activate a virtual environment
python3.10 -m venv .venv
source .venv/bin/activate

# Install Trinity-RFT with the CPU-only Tinker backend
pip install -e ".[tinker]"
```

Run a simple example:

```bash
trinity run --config examples/tinker/tinker.yaml
```

This example is designed for CPU-only devices. For more details, see the full [Tinker training example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/tinker).

To run Trinity-RFT on GPU devices, follow the steps below.

### Step 1: Installation

@@ -178,22 +204,26 @@ Trinity-RFT provides features for users with different backgrounds and goals:
- **CUDA**: version >= 12.8
- **GPU**: at least one NVIDIA GPU with [compute capability](https://developer.nvidia.com/cuda/gpus) 8.0 or higher (e.g., RTX 30 series, A100, H100); see the check below
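To confirm that a machine meets these requirements, you can query the GPU directly. A minimal check (the `compute_cap` query field is only available in newer NVIDIA driver releases; fall back to the compute capability table linked above if your driver does not support it):

```bash
# Show driver version, CUDA version, and GPU model
nvidia-smi

# Query the compute capability directly (supported by recent nvidia-smi versions)
nvidia-smi --query-gpu=name,compute_cap --format=csv
```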

## Install from Source (Recommended)
**Recommended installation methods:**

* No GPU → use the Tinker backend
* Want a quick setup → use Docker
* Want to develop and contribute → use Conda / venv

#### Install from Source (Recommended)

This method is recommended if you want to modify or extend Trinity-RFT.

### 1. Clone the Repository
First, clone the repository:

```bash
git clone https://github.com/agentscope-ai/Trinity-RFT
cd Trinity-RFT
```

### 2. Set Up the Environment

Choose any of the following methods:
Then, set up the environment using any of the following methods:

#### Using a Pre-built Docker Image (recommended for beginners)
**Using a Pre-built Docker Image (recommended for beginners)**


```bash
@@ -211,7 +241,7 @@ docker run -it \

> The image already has Trinity-RFT and all GPU-related dependencies installed via `uv`, and the virtual environment is activated automatically (it can also be activated manually with `source /opt/venv/bin/activate`). Use `uv pip install` to add extra packages when needed.
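For example, a session inside the container might look like the following (the package name is purely illustrative):

```bash
# Activate the virtual environment manually if it is not already active
source /opt/venv/bin/activate

# Add an extra package with uv (replace some-package with what you need)
uv pip install some-package
```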

#### Using Conda
**Using Conda**

```bash
conda create -n trinity python=3.12
@@ -228,7 +258,7 @@ pip install -e ".[vllm,flash_attn]"
pip install -e ".[dev]" # for debugging and development
```

#### Using venv
**Using venv**

```bash
python3.10 -m venv .venv
@@ -245,7 +275,7 @@ pip install -e ".[vllm,flash_attn]"
pip install -e ".[dev]" # for debugging and development
```

#### Using `uv`
**Using uv**

[`uv`](https://github.com/astral-sh/uv) is a modern Python package manager.

@@ -256,7 +286,7 @@ uv sync --extra vllm --extra dev --extra flash_attn
# uv sync --extra tinker --extra dev
```
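After `uv sync` completes, the environment lives in `.venv` by default. A sketch of the two standard ways to use it (this is generic `uv` behavior, not Trinity-specific instructions):

```bash
# Option 1: activate the environment created by uv sync
source .venv/bin/activate

# Option 2: run commands through uv without activating
uv run trinity run --config examples/grpo_gsm8k/gsm8k.yaml
```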

## Install via PyPI
#### Install via PyPI

If you only need to use Trinity-RFT and don't plan to modify the code:
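A minimal sketch, assuming the package is published on PyPI under the name `trinity-rft`:

```bash
# Install the released package from PyPI (package name assumed to be trinity-rft)
pip install trinity-rft

# Optionally check that the CLI is available (assuming a standard --help flag)
trinity --help
```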

@@ -382,12 +412,12 @@ trinity run --config examples/grpo_gsm8k/gsm8k.yaml

## Contribution Guide

This project is under active development; star the repo and watch releases for the latest updates!

This project is under active development, and we welcome contributions from the community!
We welcome all kinds of contributions from the community, including:

* Documentation improvements
* Example workflows, algorithms, and data pipelines
* Bug fixes and performance optimizations

See the [contribution guide](./CONTRIBUTING.md) for details.
If you're new to the project, documentation and example updates are a great place to start.

For a detailed contribution guide, see [CONTRIBUTING.md](./CONTRIBUTING.md) and our [good-first-issue list](https://github.com/agentscope-ai/Trinity-RFT/issues/470).

## Acknowledgements

@@ -399,7 +434,7 @@ trinity run --config examples/grpo_gsm8k/gsm8k.yaml
+ [Data-Juicer](https://github.com/datajuicer/data-juicer) for data processing pipelines;
+ [AgentScope](https://github.com/agentscope-ai/agentscope) for agentic workflows;
+ [Ray](https://github.com/ray-project/ray) for distributed systems;
+ we have also drawn inspiration from frameworks such as [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl) and [ChatLearn](https://github.com/alibaba/ChatLearn);
+ we have also drawn inspiration from frameworks such as [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl), [ChatLearn](https://github.com/alibaba/ChatLearn) and [rLLM](https://github.com/rllm-org/rllm);
+ ......

## Citation
6 changes: 3 additions & 3 deletions docs/sphinx_doc/source/main.md
@@ -27,7 +27,7 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob

| Category | Tutorial / Guideline |
| --- | ----|
| *Run diverse RFT modes* | + [Quick start: GRPO on GSM8k](/tutorial/example_reasoning_basic.md)<br>+ [Off-policy RFT](/tutorial/example_reasoning_advanced.md)<br>+ [Fully asynchronous RFT](/tutorial/example_async_mode.md)<br>+ [Offline learning by DPO or SFT](/tutorial/example_dpo.md) |
| *Run diverse RFT modes* | + [Quick start: GRPO on GSM8k](/tutorial/example_reasoning_basic.md)<br>+ [Off-policy RFT](/tutorial/example_reasoning_advanced.md)<br>+ [Fully asynchronous RFT](/tutorial/example_async_mode.md)<br>+ [Offline learning by DPO or SFT](/tutorial/example_dpo.md)<br>+ [RFT without local GPU (Tinker Backend)](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/tinker) |
| *Multi-step agentic RL* | + [Concatenated multi-turn workflow](/tutorial/example_multi_turn.md)<br>+ [General multi-step workflow](/tutorial/example_step_wise.md)<br>+ [ReAct workflow with an agent framework](/tutorial/example_react.md) <br>+ [Example: train a web-search agent](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/agentscope_websearch) |
| *Full-lifecycle data pipelines* | + [Rollout task mixing and selection](/tutorial/develop_selector.md)<br>+ [Online task curriculum](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/bots) (📝 [paper](https://arxiv.org/pdf/2510.26374))<br>+ [Research project: learn-to-ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [paper](https://arxiv.org/pdf/2510.25441)) <br>+ [Experience replay with prioritization](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)<br>+ [Advanced data processing & human-in-the-loop](/tutorial/example_data_functionalities.md) |
| *Algorithm development* | + [RL algorithm development with Trinity-RFT](/tutorial/example_mix_algo.md) (📝 [paper](https://arxiv.org/pdf/2508.11408))<br>+ [Research project: R3L (reflect-then-retry RL)](https://github.com/shiweijiezero/R3L) (📝 [paper](https://arxiv.org/abs/2601.03715))<br>+ [Research project: group-relative REINFORCE](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [paper](https://arxiv.org/abs/2509.24203)) <br>+ Non-verifiable domains: [RULER](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) |
@@ -98,12 +98,12 @@ We list some algorithms supported by Trinity-RFT in the following table. For mor

This project is built upon many excellent open-source projects, including:

+ [verl](https://github.com/volcengine/verl) and [PyTorch's FSDP](https://pytorch.org/docs/stable/fsdp.html) for LLM training;
+ [verl](https://github.com/volcengine/verl), [FSDP](https://pytorch.org/docs/stable/fsdp.html) and [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) for LLM training;
+ [vLLM](https://github.com/vllm-project/vllm) for LLM inference;
+ [Data-Juicer](https://github.com/datajuicer/data-juicer) for data processing pipelines;
+ [AgentScope](https://github.com/agentscope-ai/agentscope) for agentic workflow;
+ [Ray](https://github.com/ray-project/ray) for distributed systems;
+ we have also drawn inspirations from RL frameworks like [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl) and [ChatLearn](https://github.com/alibaba/ChatLearn);
+ we have also drawn inspiration from RL frameworks like [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl), [ChatLearn](https://github.com/alibaba/ChatLearn) and [rLLM](https://github.com/rllm-org/rllm);
+ ......


2 changes: 1 addition & 1 deletion docs/sphinx_doc/source/tutorial/faq.md
@@ -80,7 +80,7 @@ ImportError: ...
UsageError: api_key not configured (no-tty). call wandb.login(key=[your_api_key]) ...
```

**A:** Try to log in to WandB before starting Ray and running the experiment. One way to do this is run the command `export WANDB_API_KEY=[your_api_key]`. Yoy may also try using other monitors instead of WandB by setting `monitor.monitor_type=tensorboard/mlflow`.
**A:** Try to log in to WandB before starting Ray and running the experiment. One way to do this is to run the command `export WANDB_API_KEY=[your_api_key]`. You may also try using other monitors instead of WandB by setting `monitor.monitor_type=tensorboard/mlflow`.
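A minimal sketch of that sequence (the `ray start` invocation is illustrative; use whatever launch command your setup normally requires):

```bash
# Make the WandB credentials visible to every Ray worker
export WANDB_API_KEY=your_api_key

# Optional: verify the key works before launching anything
wandb login "$WANDB_API_KEY"

# Start Ray (illustrative), then run the experiment
ray start --head
trinity run --config examples/grpo_gsm8k/gsm8k.yaml
```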

---
