[PZ COMPETITION] UCI001(liximeng0824)#168
Conversation
There was a problem hiding this comment.
Summary of Changes
Hello @lxmkobe, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
本次拉取请求引入了一套完整的强化学习训练流程,专注于数学推理任务。它涵盖了从原始数据到可用于模型训练的结构化数据的预处理步骤,定义了用于评估模型响应的奖励计算逻辑,并配置了一个分布式 PPO 训练框架,以支持大规模语言模型的优化。这些更改旨在为数学问题解决任务建立一个高效且可扩展的训练环境。
Highlights
- GSM8K 评分工具: 新增了
gsm8k.py文件,其中包含用于从解决方案字符串中提取答案并计算 GSM8K 数据集分数的函数。它支持严格和灵活的答案提取方法,并处理数值等价性。 - GSM8K 数据预处理: 新增了
gsm8k_lxm2_newprompt_trainval.py脚本,用于将 GSM8K 数据集预处理为 parquet 格式。该脚本会根据来源(如 orca_math、cn_k12、gsm8k)过滤数据,应用新的提示指令格式(要求逐步推理并将最终答案放入\boxed{}中),并将数据集划分为训练集和验证集。 - PPO 训练配置: 新增了
ppo_trainer.yaml配置文件,这是一个全面的 PPO (Proximal Policy Optimization) 训练器配置,包含了数据加载、actor/critic 模型、rollout 策略、奖励模型以及训练过程的详细参数。 - Ray 分布式 PPO 训练器: 新增了
ray_trainer.py文件,实现了基于 Ray 的 FSDP (Fully Sharded Data Parallel) PPO 训练器。该训练器负责管理分布式训练工作节点、检查点保存与加载、批处理平衡、奖励计算以及训练指标日志记录。
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Code Review
本次提交为UCI001团队的初赛/决赛内容,包含了数据预处理脚本、训练配置文件和PPO训练器代码。整体来看,代码结构清晰,功能完整。
我在审查中发现了一些需要关注的问题:
- 严重配置错误:
ppo_trainer.yaml文件中存在多处关键配置项为null,这将导致训练程序在启动时因验证失败而崩溃。 - 代码缺陷:
gsm8k.py中的奖励计算函数存在一个bug,硬编码了返回值,忽略了函数参数。gsm8k_lxm2_newprompt_trainval.py数据预处理脚本在特定条件下(使用HDFS时)会因缺少导入而崩溃。
- 潜在风险:
ray_trainer.py中的验证逻辑可能会一次性加载整个验证集,存在内存溢出的风险。- 代码中多处使用了过于宽泛的异常捕获(
except Exception),这不利于问题的排查和调试。
此外,我还提供了一些关于代码健壮性、可读性和最佳实践的建议。修复这些问题将有助于提高代码的稳定性和可维护性。请查看具体的审查评论以获取详细信息和修改建议。
| ppo_micro_batch_size: null | ||
|
|
||
| # Local per-GPU micro batch size | ||
| ppo_micro_batch_size_per_gpu: null |
| ppo_micro_batch_size: null | ||
|
|
||
| # Local per-GPU micro batch size | ||
| ppo_micro_batch_size_per_gpu: null |
| import datasets | ||
| from glob import glob | ||
|
|
||
| # from verl.utils.hdfs_io import copy, makedirs |
| log_prob_micro_batch_size: null | ||
|
|
||
| # The batch size for one forward pass in the computation of log_prob. Local batch size per GPU. | ||
| log_prob_micro_batch_size_per_gpu: null |
| log_prob_micro_batch_size: null | ||
|
|
||
| # The batch size for one forward pass in the computation of log_prob. Local batch size per GPU. | ||
| log_prob_micro_batch_size_per_gpu: null |
| return final_answer | ||
|
|
||
|
|
||
| def compute_score(solution_str, ground_truth, method="strict", format_score=0.2, score=1.0): |
There was a problem hiding this comment.
| # Save frequency (by iteration) for model checkpoints | ||
| save_freq: -1 | ||
|
|
||
| # ESI redundant time (in seconds) for model checkpointsAdd commentMore actions |
| train_candidates = glob(os.path.join(ms_base_dir, "**", "big-math-rl-verified-processed-train.arrow"), recursive=True) | ||
| # test_candidates = glob(os.path.join(ms_base_dir, "**", "gsm8k-test.arrow"), recursive=True) | ||
|
|
||
| assert len(train_candidates) > 0, f"未在 {ms_base_dir} 下找到 gsm8k-train.arrow" |
There was a problem hiding this comment.
此处的断言消息硬编码为 gsm8k-train.arrow,但实际搜索的文件名是 big-math-rl-verified-processed-train.arrow(见第48行)。这可能会在调试时引起困惑。建议将断言消息与实际搜索的文件名保持一致。
| assert len(train_candidates) > 0, f"未在 {ms_base_dir} 下找到 gsm8k-train.arrow" | |
| assert len(train_candidates) > 0, f"未在 {ms_base_dir} 下找到 big-math-rl-verified-processed-train.arrow" |
| solution = re.search("#### (\\-?[0-9\\.\\,]+)", solution_str) | ||
| assert solution is not None | ||
| final_solution = solution.group(0) | ||
| final_solution = final_solution.split("#### ")[1].replace(",", "") | ||
| return final_solution |
There was a problem hiding this comment.
使用 solution.group(0) 获取整个匹配(例如 "#### 123"),然后再用 split 来提取数字,这种方式有点迂回且不够健壮。直接使用 solution.group(1) 可以更简洁、直接地获取正则表达式中捕获组匹配到的数字部分。同时,建议在断言失败时提供更有用的错误信息。
| solution = re.search("#### (\\-?[0-9\\.\\,]+)", solution_str) | |
| assert solution is not None | |
| final_solution = solution.group(0) | |
| final_solution = final_solution.split("#### ")[1].replace(",", "") | |
| return final_solution | |
| solution = re.search("#### (\\-?[0-9\\.\\,]+)", solution_str) | |
| assert solution is not None, f"无法在字符串中找到解决方案格式 '#### ...':{solution_str}" | |
| final_solution = solution.group(1).replace(",", "") | |
| return final_solution |
| rollout_data_dir = self.config.trainer.get("rollout_data_dir", None) | ||
| if rollout_data_dir: | ||
| with marked_timer("dump_rollout_generations", timing_raw, color="green"): | ||
| print(batch.batch.keys()) |
团队名称:UCI001
提交账号:liximeng0824
PR提交内容:初赛/决赛报告、决赛verl的diff代码、训练数据预处理代码