# [Don't Merge] Roadmap and Dev Plan #3
```
source scripts/models/qwen2.5-0.5B.sh
PYTHONPATH=/root/Megatron-LM python tools/convert_hf_to_torch_dist.py \
    ${MODEL_ARGS[@]} \
    --hf-checkpoint /root/Qwen2.5-0.5B-Instruct \
    --save /root/Qwen2.5-0.5B-Instruct_torch_dist/
```
### Phase 1: Get Qwen2.5-0.5B GRPO 8-GPU synchronous/asynchronous training working (train.py and train_async.py) on the GSM8K dataset, with loss/reward convergence essentially matching the SGLang backend and with fully deterministic computation: repeated runs must produce identical loss curves.
First Design and RFC by 03/06
#### Initial plan:
- Match SGLang: Slime manages the full vLLM lifecycle inside Ray, including process launch, weight sync, and inference pause/resume
- No router for now: the SGLang Model Gateway supports only SGLang workers; SlimeRouter is needed only for R3 / radix-tree caching, and Qwen2.5-0.5B is not MoE and uses token-in/token-out
- Single vLLM instance, no router: a vLLMClient connects directly to the local vLLM process port
- First support separate training/inference GPUs (non-colocate), with weight sync via NCCL broadcast, matching SGLang's update_weights_from_distributed (the default)
- Then support and validate colocation, with weight sync via GPU IPC (vLLM update_weights_from_ipc, update_weights_from_tensor), matching SGLang's update_weights_from_tensor, to verify reproducibility. **IPC depends on vLLM 0.17**
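The NCCL-broadcast weight sync in the plan above can be sketched as follows. This is an illustrative outline, not slime's actual API: the function names and the bucketing helper (with its 512 MiB default) are assumptions, and a real implementation would also pause rollout and handle placement/dtype.

```python
def bucket_params(named_params, max_bytes=512 << 20):
    """Group (name, param) pairs into buckets of at most max_bytes each,
    so weight sync moves a few large chunks rather than thousands of
    tiny tensors. A single oversized param still gets its own bucket."""
    buckets, current, size = [], [], 0
    for name, p in named_params:
        nbytes = p.numel() * p.element_size()
        if current and size + nbytes > max_bytes:
            buckets.append(current)
            current, size = [], 0
        current.append((name, p))
        size += nbytes
    if current:
        buckets.append(current)
    return buckets

def sync_weights(model, group, src_rank=0):
    """Trainer rank broadcasts each parameter; rollout ranks receive in
    place. Requires an initialized torch.distributed process group that
    spans both the trainer and the rollout engine."""
    import torch.distributed as dist  # deferred so bucket_params stays importable without torch
    for bucket in bucket_params(model.named_parameters()):
        for _, param in bucket:
            dist.broadcast(param.data, src=src_rank, group=group)
```

In the non-colocated setup the trainer would call `sync_weights` after each optimizer step, while the rollout side pauses generation, receives, and resumes.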
#### Risks:
- Version-dependency conflicts between slime/sglang and vLLM 0.16 (numpy, torch, transformers, etc.)
- The slime codebase is rough and unreliable, with a hard dependency on the preset Docker image
- Compute availability
#### Reference

https://thudm.github.io/slime/advanced/reproducibility.html
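The Phase 1 determinism bar (repeated runs must yield identical loss curves) can be checked mechanically. A minimal sketch, assuming each run's per-step losses are available as a list of floats (e.g. exported from wandb); the function name is ours:

```python
def first_divergence(run_a, run_b):
    """Return the first step at which two loss curves differ, or None if
    they are identical. Comparison is exact equality, not approximate:
    the requirement is bit-identical results, so any tolerance would
    defeat the check."""
    for step, (a, b) in enumerate(zip(run_a, run_b)):
        if a != b:
            return step
    if len(run_a) != len(run_b):
        return min(len(run_a), len(run_b))  # one run is a strict prefix of the other
    return None
```

When the check fails, the returned step number localizes where nondeterminism first enters (e.g. rollout sampling vs. the training step).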
### Phase 2: Integrate vllm-project/router to support multiple vLLM instances

- vllm router forked from SGLang Model Gateway
### Phase 3: Multi-node, large-scale validation with MoE models; optional: validate MTP speculative decoding, FP8 rollout, and other advanced features

- Model: Qwen/Qwen3-30B-A3B or GLM4.7
- Parallelism: 16 or 128 GPUs; training with mixed EP+FSDP, rollout with EP+DP
- Verify more features:
  - BF16 training, FP8 rollout
  - MTP speculative decoding
```
#!/bin/bash
# Non-colocate version of run-qwen2.5-0.5B-reproducibility.sh
# 2 GPUs: 1 for training, 1 for SGLang rollout

# clean up any previous run before starting
pkill -9 sglang
sleep 3
ray stop --force
pkill -9 ray
pkill -9 python
sleep 3
pkill -9 ray
pkill -9 python

set -ex

export PYTHONUNBUFFERED=1

SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"
source "${SCRIPT_DIR}/scripts/models/qwen2.5-0.5B.sh"

CKPT_ARGS=(
    --hf-checkpoint /root/Qwen2.5-0.5B-Instruct/
    --ref-load /root/Qwen2.5-0.5B-Instruct_torch_dist/
)

ROLLOUT_ARGS=(
    --prompt-data /root/gsm8k/train.parquet
    --input-key messages
    --label-key label
    --apply-chat-template
    --rollout-shuffle
    --rm-type math
    --num-rollout 100
    --rollout-batch-size 32
    --n-samples-per-prompt 8
    --rollout-max-response-len 1024
    --rollout-temperature 1

    --global-batch-size 256
)

EVAL_ARGS=(
    --eval-interval 20
    --eval-prompt-data gsm8k /root/gsm8k/test.parquet
    --n-samples-per-eval-prompt 1
    --eval-max-response-len 1024
    --eval-top-k 1
)

PERF_ARGS=(
    --tensor-model-parallel-size 1
    --sequence-parallel
    --pipeline-model-parallel-size 1
    --context-parallel-size 1
    --expert-model-parallel-size 1
    --expert-tensor-parallel-size 1

    --use-dynamic-batch-size
    --max-tokens-per-gpu 9216
)

GRPO_ARGS=(
    --advantage-estimator grpo
    --use-kl-loss
    --kl-loss-coef 0.00
    --kl-loss-type low_var_kl
    --kl-coef 0.00
    --entropy-coef 0.00
    --eps-clip 0.2
    --eps-clip-high 0.28
)

OPTIMIZER_ARGS=(
    --optimizer adam
    --lr 1e-6
    --lr-decay-style constant
    --weight-decay 0.1
    --adam-beta1 0.9
    --adam-beta2 0.98
)

WANDB_ARGS=(
    --use-wandb
    --wandb-host https://wandb.ai/
    --wandb-entity samithuang
    --wandb-project slime-rl
    --wandb-group qwen2.5-0.5B-gsm8k-noncolocate
)

SGLANG_ARGS=(
    --rollout-num-gpus-per-engine 1
    --sglang-mem-fraction-static 0.7

    --sglang-enable-deterministic-inference
    --sglang-attention-backend flashinfer

    --deterministic-mode
)

MISC_ARGS=(
    --attention-dropout 0.0
    --hidden-dropout 0.0
    --accumulate-allreduce-grads-in-fp32
    --attention-softmax-in-fp32
    --attention-backend flash
)

ray start --head --node-ip-address 127.0.0.1 --num-gpus 2 --disable-usage-stats

ray job submit --address="http://127.0.0.1:8265" \
    --runtime-env-json='{
        "env_vars": {
            "PYTHONPATH": "/root/Megatron-LM",
            "CUDA_DEVICE_MAX_CONNECTIONS": "1",
            "NCCL_ALGO": "Ring",
            "NVTE_ALLOW_NONDETERMINISTIC_ALGO": "0",
            "CUBLAS_WORKSPACE_CONFIG": ":4096:8"
        }
    }' \
    -- python3 train.py \
    --actor-num-nodes 1 \
    --actor-num-gpus-per-node 1 \
    --num-gpus-per-node 2 \
    --rollout-num-gpus 1 \
    --calculate-per-token-loss \
    --use-slime-router \
    ${MODEL_ARGS[@]} \
    ${CKPT_ARGS[@]} \
    ${ROLLOUT_ARGS[@]} \
    ${OPTIMIZER_ARGS[@]} \
    ${GRPO_ARGS[@]} \
    ${WANDB_ARGS[@]} \
    ${PERF_ARGS[@]} \
    ${EVAL_ARGS[@]} \
    ${SGLANG_ARGS[@]} \
    ${MISC_ARGS[@]}
```
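The determinism-related environment variables in the launcher above (`NCCL_ALGO`, `NVTE_ALLOW_NONDETERMINISTIC_ALGO`, `CUBLAS_WORKSPACE_CONFIG`) can be factored into one helper so every entry point sets them consistently. A sketch; the helper name and the default-without-overriding policy are our choices, the values come from the script:

```python
import os

DETERMINISM_ENV = {
    "NCCL_ALGO": "Ring",                      # pin the NCCL collective algorithm
    "NVTE_ALLOW_NONDETERMINISTIC_ALGO": "0",  # TransformerEngine: deterministic kernels only
    "CUBLAS_WORKSPACE_CONFIG": ":4096:8",     # required for deterministic cuBLAS GEMMs
}

def apply_determinism_env(env=None):
    """Fill in the determinism knobs without clobbering explicit user settings."""
    env = os.environ if env is None else env
    for key, value in DETERMINISM_ENV.items():
        env.setdefault(key, value)
    return env
```

Calling it before process launch (or feeding the dict into the Ray `runtime-env-json`) keeps the trainer and rollout sides in agreement.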
```
#!/bin/bash
# vLLM rollout backend validation script (Phase 1)
# Based on run-qwen2.5-0.5B-reproducibility.sh

# clean up any previous run before starting
pkill -9 vllm
pkill -9 sglang
sleep 3
ray stop --force
pkill -9 ray
pkill -9 python
sleep 3
pkill -9 ray
pkill -9 python

set -ex

export PYTHONUNBUFFERED=1

SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"
source "${SCRIPT_DIR}/scripts/models/qwen2.5-0.5B.sh"

CKPT_ARGS=(
    --hf-checkpoint /root/Qwen2.5-0.5B-Instruct/
    --ref-load /root/Qwen2.5-0.5B-Instruct_torch_dist/
)

# num-rollout: 100 in the baseline script
ROLLOUT_ARGS=(
    --prompt-data /root/gsm8k/train.parquet
    --input-key messages
    --label-key label
    --apply-chat-template
    --rollout-shuffle
    --rm-type math
    --num-rollout 500
    --rollout-batch-size 32
    --n-samples-per-prompt 8
    --rollout-max-response-len 1024
    --rollout-temperature 1

    --global-batch-size 256
)

EVAL_ARGS=(
    --eval-interval 20
    --eval-prompt-data gsm8k /root/gsm8k/test.parquet
    --n-samples-per-eval-prompt 1
    --eval-max-response-len 1024
    --eval-top-k 1
)

PERF_ARGS=(
    --tensor-model-parallel-size 1
    --sequence-parallel
    --pipeline-model-parallel-size 1
    --context-parallel-size 1
    --expert-model-parallel-size 1
    --expert-tensor-parallel-size 1

    --use-dynamic-batch-size
    --max-tokens-per-gpu 9216
)

GRPO_ARGS=(
    --advantage-estimator grpo
    --use-kl-loss
    --kl-loss-coef 0.00
    --kl-loss-type low_var_kl
    --kl-coef 0.00
    --entropy-coef 0.00
    --eps-clip 0.2
    --eps-clip-high 0.28
)

OPTIMIZER_ARGS=(
    --optimizer adam
    --lr 1e-6
    --lr-decay-style constant
    --weight-decay 0.1
    --adam-beta1 0.9
    --adam-beta2 0.98
)

WANDB_ARGS=(
    --use-wandb
    --wandb-host https://wandb.ai/
    --wandb-entity samithuang
    --wandb-project slime-rl
    --wandb-group qwen2.5-0.5B-gsm8k-vllm
)

VLLM_ARGS=(
    --rollout-backend vllm
    --rollout-num-gpus-per-engine 1
    --sglang-server-concurrency 512
    --use-slime-router
    --slime-router-middleware-paths slime.router.middleware_hub.radix_tree_middleware.RadixTreeMiddleware
)

MISC_ARGS=(
    --attention-dropout 0.0
    --hidden-dropout 0.0
    --accumulate-allreduce-grads-in-fp32
    --attention-softmax-in-fp32
    --attention-backend flash
    --deterministic-mode
)

ray start --head --node-ip-address 127.0.0.1 --num-gpus 2 --disable-usage-stats

ray job submit --address="http://127.0.0.1:8265" \
    --runtime-env-json='{
        "env_vars": {
            "PYTHONPATH": "/root/Megatron-LM",
            "CUDA_DEVICE_MAX_CONNECTIONS": "1",
            "NCCL_ALGO": "Ring",
            "NCCL_IB_DISABLE": "1",
            "NCCL_P2P_DISABLE": "1",
            "NCCL_SHM_DISABLE": "1",
            "NCCL_NET_GDR_LEVEL": "0",
            "NCCL_DEBUG": "INFO",
            "NVTE_ALLOW_NONDETERMINISTIC_ALGO": "0",
            "CUBLAS_WORKSPACE_CONFIG": ":4096:8"
        }
    }' \
    -- python3 train.py \
    --actor-num-nodes 1 \
    --actor-num-gpus-per-node 1 \
    --num-gpus-per-node 2 \
    --rollout-num-gpus 1 \
    --calculate-per-token-loss \
    ${MODEL_ARGS[@]} \
    ${CKPT_ARGS[@]} \
    ${ROLLOUT_ARGS[@]} \
    ${OPTIMIZER_ARGS[@]} \
    ${GRPO_ARGS[@]} \
    ${WANDB_ARGS[@]} \
    ${PERF_ARGS[@]} \
    ${EVAL_ARGS[@]} \
    ${VLLM_ARGS[@]} \
    ${MISC_ARGS[@]}
```

> Review comment (on lines +25 to +31): This script contains hardcoded absolute paths for model checkpoints and datasets (e.g., …)
```
docker pull slimerl/slime:latest
```

```
docker run -itd --gpus all --ipc=host --shm-size=128g --net=host --privileged=true --restart=always \
    --ulimit memlock=-1 --ulimit stack=67108864 \
    --ulimit nofile=65536:65536 \
    --name DNAME \
    slimerl/slime:latest /bin/bash
```

```
docker exec -it --user root DNAME bash
```

```
pip install vllm==0.16

# for compatibility
pip install numpy==1.26.4
```
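The roadmap lists version-dependency conflicts (numpy, torch, transformers) between slime/sglang and vLLM 0.16 as a risk. A hedged fail-fast check against a pin set can catch drift right after installation; only the `numpy==1.26.4` pin appears above, so any other pins you pass are your own:

```python
from importlib.metadata import PackageNotFoundError, version

def check_pins(pins):
    """Return {package: (wanted, found)} for every missing or mismatched
    pin; an empty dict means the environment matches exactly."""
    mismatches = {}
    for pkg, wanted in pins.items():
        try:
            found = version(pkg)
        except PackageNotFoundError:
            found = None  # package not installed at all
        if found != wanted:
            mismatches[pkg] = (wanted, found)
    return mismatches
```

For example, `check_pins({"numpy": "1.26.4"})` after the pip commands above, aborting the run if the result is non-empty.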
```
@@ -128,7 +128,8 @@ def init(
         if self.args.vocab_size is None:
             self.args.vocab_size = self.tokenizer.vocab_size

-        update_weight_cls = UpdateWeightFromTensor if self.args.colocate else UpdateWeightFromDistributed
+        use_tensor_update = self.args.colocate and getattr(self.args, "rollout_backend", "sglang") != "vllm"
+        update_weight_cls = UpdateWeightFromTensor if use_tensor_update else UpdateWeightFromDistributed

         self.weight_updater = update_weight_cls(
             self.args,
             self.model,
```

> Review comment (on lines +131 to +132): The logic here seems to disable the use of …
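The two-line change in the diff above reduces to a pure selection rule. A sketch that returns the class names as strings (the `"sglang"` default mirrors the `getattr` fallback in the diff); per the roadmap, the vLLM IPC path requires vLLM 0.17, hence the forced fallback:

```python
def select_weight_updater(colocate, rollout_backend="sglang"):
    """Colocated runs normally share weights via tensor handles, but the
    vLLM backend cannot use that path yet, so it falls back to the
    distributed (NCCL broadcast) updater even when colocated."""
    use_tensor_update = colocate and rollout_backend != "vllm"
    return "UpdateWeightFromTensor" if use_tensor_update else "UpdateWeightFromDistributed"
```

Isolating the condition like this also makes the backend-specific behavior straightforward to unit-test.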
> Review comment: The script contains hardcoded absolute paths for `PYTHONPATH`, `--hf-checkpoint`, and `--save`. This makes the script not portable and difficult to use in different environments. It's recommended to use environment variables or script arguments to specify these paths.