RL Infra Notes

Deep source-code walkthroughs of LLM RL training infrastructure: async RL scheduling, weight synchronization, FP8 mixed precision, MoE routing precision, and more. The notes go beyond what each component is and focus on why it was designed that way and what the code actually does.

Why This Repo?

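Open-source RL frameworks keep multiplying, but most of their documentation only tells you how to call the API. When you need to understand:

How exactly are rollout and training scheduled in async RL training?
What happens to the inference engine during weight synchronization: abort or drain?
Which operators does FP8 training actually quantize, and what is the scale format?
What goes wrong with topk in the MoE router under bf16?

the answers are only in the source code. This repo records the process of reading that source in a structured way, with code locations, comparison tables, and architecture diagrams.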

Notes

Async RL Training

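A comparative analysis of three frameworks' design choices for asynchronous RL training, covering the four core dimensions of the HuggingFace Async RL Survey: rollout buffer, weight synchronization, staleness management, and partial rollout.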

Note	Framework	Highlights
SLIME Async RL Walkthrough	THUDM/slime	Double-buffer scheduling, TIS + OPSM staleness correction, abort + recycle mechanism
veRL Async RL Walkthrough	volcengine/verl	Bounded queue + backpressure, NCCL bucketed broadcast, MIS multi-version IS, prefix continuation
NeMo-RL Async RL Walkthrough	NVIDIA/NeMo-RL	Replay buffer + target weight matching, in-flight weight update, TIS / ICE-POP / seq-mask-TIS
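
As a rough illustration of the "bounded queue + backpressure" idea listed above, here is a minimal sketch using Python's standard `queue.Queue`. It is not veRL's implementation; the `offer` helper, the `rollout_buffer` name, and the capacity of 2 are assumptions made up for this example. The point is only that a full buffer refuses new rollouts until the trainer consumes one, which caps how far generation can run ahead of training.

```python
import queue


def offer(buffer: queue.Queue, sample) -> bool:
    """Hypothetical helper: try to enqueue without blocking.

    Returns False when the buffer is full, which is the backpressure
    signal telling the rollout side to pause.
    """
    try:
        buffer.put_nowait(sample)
        return True
    except queue.Full:
        return False


# Capacity of 2 is an arbitrary value chosen for the illustration.
rollout_buffer = queue.Queue(maxsize=2)

# The rollout side produces four samples; only the first two fit.
results = [offer(rollout_buffer, sample_id) for sample_id in range(4)]

# The trainer consumes one sample (FIFO order), which frees a slot.
consumed = rollout_buffer.get_nowait()

# The rollout side can now enqueue again.
resumed = offer(rollout_buffer, 99)
```

A real system would typically block or sleep instead of dropping the rejected samples; the non-blocking form is used here only to keep the example deterministic.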

FP8 Mixed-Precision Training & Inference

Details of FP8 training and inference: quantization scope, scale format, communication precision, and more.

Note	Framework	Highlights
Megatron Overview	Megatron-LM / Bridge / TE	Component relationships, FP8 blockwise quantization scope
fp8_param_gather Explained	Megatron-LM	FP8 all-gather communication optimization, parameter update flow comparison
FP8 Blockwise Scale Analysis	vLLM	DeepGEMM UE8M0 vs FP32 scale, kernel dispatch priority
MoE Router Dtype Analysis	Megatron-LM + vLLM	End-to-end router dtype tracing (training vs inference), bf16 topk precision risk
MoE Unpermute Nondeterminism Analysis	Megatron-LM + TE	Root cause of scatter_add_ nondeterminism, TE gather-reduce Triton kernel, 3-pass row_id_map construction
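
To give a feel for the "bf16 topk precision risk" mentioned above, the sketch below emulates bf16 rounding in pure Python and shows how router scores that are distinct in fp32 can collapse to the same bf16 value, so the top-k selection ends up decided by tie-breaking rather than by the scores. This is not code from Megatron-LM or vLLM; `to_bf16`, `topk_indices`, and the logit values are hypothetical, and the tie-breaking rule (lowest index first) is an assumption of this sketch.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bf16 value (round-to-nearest-even).

    bf16 keeps the top 16 bits of the fp32 bit pattern, so only 7 explicit
    mantissa bits survive. Assumes finite inputs far from overflow.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def topk_indices(scores, k):
    """Indices of the k largest scores; ties go to the lower index."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]


# Hypothetical router logits for four experts.
logits_fp32 = [1.001, 1.002, 0.5, 1.003]
logits_bf16 = [to_bf16(v) for v in logits_fp32]

top2_fp32 = topk_indices(logits_fp32, 2)  # experts 3 and 1 win on score
top2_bf16 = topk_indices(logits_bf16, 2)  # three logits tie at 1.0
```

Near 1.0 the spacing between adjacent bf16 values is 2^-7, so 1.001, 1.002, and 1.003 all round to 1.0. Which experts are chosen then depends on how the particular topk implementation breaks ties, which is one way training and inference can disagree on routing.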

🌐 English Translations

English translations of all notes live in the docs-en/ directory, whose structure mirrors docs/ exactly.

Frameworks Studied

Framework	Focus
NVIDIA NeMo-RL	RL training pipeline, async GRPO
veRL	Async RL, weight sync
SLIME	Async RL, TIS/OPSM
Megatron-LM	Distributed training, FP8, MoE
Megatron-Bridge	HF↔Megatron conversion
TransformerEngine	FP8 kernels
vLLM	Inference, FP8, MoE routing

Contributing

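Issues for discussion or additional analysis are welcome. If you find that a code reference in the notes is out of date (the frameworks move quickly), PRs with fixes are welcome too.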

License

MIT
