[PZ COMPETITION] UCI001(liximeng0824)#168

Open
lxmkobe wants to merge 2 commits into FlagAI-Open:main from lxmkobe:UCI001_submit
Conversation

@lxmkobe lxmkobe commented Sep 12, 2025

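Team name: UCI001
Submission account: liximeng0824
PR contents: preliminary/final round reports, the verl diff code used in the final round, and the training data preprocessing code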


@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @lxmkobe, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

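This pull request introduces a complete reinforcement learning training pipeline focused on mathematical reasoning tasks. It covers the preprocessing steps that turn raw data into structured data usable for model training, defines the reward computation logic used to evaluate model responses, and configures a distributed PPO training framework to support the optimization of large language models. These changes aim to establish an efficient and scalable training environment for math problem solving.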

Highlights

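  • GSM8K scoring utility: Adds gsm8k.py, which contains functions for extracting the answer from a solution string and computing the score on the GSM8K dataset. It supports strict and flexible answer extraction methods and handles numerical equivalence.
  • GSM8K data preprocessing: Adds the script gsm8k_lxm2_newprompt_trainval.py, which preprocesses the GSM8K dataset into parquet format. The script filters data by source (such as orca_math, cn_k12, gsm8k), applies a new prompt instruction format (requiring step-by-step reasoning with the final answer placed in \boxed{}), and splits the dataset into training and validation sets.
  • PPO training configuration: Adds the ppo_trainer.yaml configuration file, a comprehensive PPO (Proximal Policy Optimization) trainer configuration covering data loading, the actor/critic models, the rollout strategy, the reward model, and detailed parameters of the training process.
  • Ray distributed PPO trainer: Adds ray_trainer.py, which implements a Ray-based FSDP (Fully Sharded Data Parallel) PPO trainer. The trainer manages distributed training workers, checkpoint saving and loading, batch balancing, reward computation, and training metric logging.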

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

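This submission contains team UCI001's preliminary/final round materials, including data preprocessing scripts, training configuration files, and PPO trainer code. Overall, the code is clearly structured and functionally complete.

I found several issues during the review that need attention:

  1. Critical configuration errors: several key settings in ppo_trainer.yaml are null, which will cause the training program to crash at startup when validation fails.
  2. Code defects
    • The reward computation function in gsm8k.py has a bug: it hardcodes the return value and ignores a function parameter.
    • The data preprocessing script gsm8k_lxm2_newprompt_trainval.py crashes under a specific condition (when HDFS is used) because of a missing import.
  3. Potential risks
    • The validation logic in ray_trainer.py may load the entire validation set at once, which risks running out of memory.
    • Several places in the code use overly broad exception handling (except Exception), which makes problems harder to trace and debug.

In addition, I have provided some suggestions on code robustness, readability, and best practices. Fixing these issues will help improve the stability and maintainability of the code. Please see the individual review comments for details and suggested changes.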

Comment on lines +687 to +690
ppo_micro_batch_size: null

# Local per-GPU micro batch size
ppo_micro_batch_size_per_gpu: null

critical

As with the actor configuration, when use_dynamic_bsz is false (line 699), the critic's ppo_micro_batch_size and ppo_micro_batch_size_per_gpu cannot both be null. This will cause startup to fail. Please set a value for ppo_micro_batch_size_per_gpu.

Comment on lines +163 to +166
ppo_micro_batch_size: null

# Local per-GPU micro batch size
ppo_micro_batch_size_per_gpu: null

critical

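The actor configuration here is invalid. When use_dynamic_bsz is false (line 169), ppo_micro_batch_size and ppo_micro_batch_size_per_gpu cannot both be null. This causes ray_trainer.py to fail at startup because configuration validation does not pass. Please set a suitable non-null value for ppo_micro_batch_size_per_gpu.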
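
The constraint described in these configuration comments can be sketched as a small check. This is a minimal illustration, not verl's actual validation code: the function name check_micro_batch_settings and the example value 4 are hypothetical, and only the rule itself (both fields null while dynamic batch sizing is off is an error) comes from the review.

```python
from typing import Optional


def check_micro_batch_settings(
    name: str,
    use_dynamic_bsz: bool,
    micro_batch_size: Optional[int],
    micro_batch_size_per_gpu: Optional[int],
) -> None:
    """Raise ValueError if a fixed-batch-size section has no micro batch size set."""
    if use_dynamic_bsz:
        # With dynamic batch sizing enabled, the constraint from the review does not apply.
        return
    if micro_batch_size is None and micro_batch_size_per_gpu is None:
        raise ValueError(
            f"[{name}] set either the micro batch size or the per-GPU micro batch size "
            f"when use_dynamic_bsz is false"
        )


# Hypothetical value: choose one that fits the available GPU memory.
check_micro_batch_settings("actor", False, None, 4)  # passes silently
```

Running the same check with both fields set to None and use_dynamic_bsz set to False raises ValueError, which mirrors the startup failure the review warns about.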

import datasets
from glob import glob

# from verl.utils.hdfs_io import copy, makedirs

critical

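At the end of the script (lines 166-168), the makedirs and copy functions are called, but their import statement on line 25 is commented out. If the hdfs_dir argument is provided, this raises a NameError. Please uncomment this line to fix the error.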

Suggested change
- # from verl.utils.hdfs_io import copy, makedirs
+ from verl.utils.hdfs_io import copy, makedirs
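
The failure is conditional because Python resolves names only when a line actually runs. The sketch below imitates the tail of such a script with a hypothetical finish function and made-up paths; it does not import verl, so makedirs and copy are deliberately left undefined.

```python
def finish(local_dir, hdfs_dir=None):
    """Imitates the end of a preprocessing script whose HDFS import is commented out."""
    if hdfs_dir is not None:
        # `makedirs` and `copy` were never imported, so this branch raises NameError.
        makedirs(hdfs_dir)  # noqa: F821
        copy(src=local_dir, dst=hdfs_dir)  # noqa: F821
    return local_dir


finish("/tmp/data")  # local-only run: the broken branch is never reached

try:
    finish("/tmp/data", hdfs_dir="hdfs://example/path")
except NameError as exc:
    print(f"NameError: {exc}")
```

A run without hdfs_dir therefore succeeds, which is why the problem can go unnoticed until the HDFS option is first used.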

Comment on lines +356 to +359
log_prob_micro_batch_size: null

# The batch size for one forward pass in the computation of log_prob. Local batch size per GPU.
log_prob_micro_batch_size_per_gpu: null

critical

When the reference policy is enabled and log_prob_use_dynamic_bsz is false (line 362), log_prob_micro_batch_size and log_prob_micro_batch_size_per_gpu cannot both be null. Please provide a value for log_prob_micro_batch_size_per_gpu to avoid a runtime error when the reference model is needed.

Comment on lines +449 to +452
log_prob_micro_batch_size: null

# The batch size for one forward pass in the computation of log_prob. Local batch size per GPU.
log_prob_micro_batch_size_per_gpu: null

critical

As with the reference model configuration, when log_prob_use_dynamic_bsz is false (line 455), the rollout's log_prob_micro_batch_size and log_prob_micro_batch_size_per_gpu cannot both be null. Please set a value for log_prob_micro_batch_size_per_gpu to avoid a runtime error.

return final_answer


def compute_score(solution_str, ground_truth, method="strict", format_score=0.2, score=1.0):

medium

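The method parameter of compute_score is never used in the function body, because extract_solution is always called with method="strict". Consider removing this unused parameter to simplify the function signature and avoid confusion.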

Suggested change
- def compute_score(solution_str, ground_truth, method="strict", format_score=0.2, score=1.0):
+ def compute_score(solution_str, ground_truth, format_score=0.2, score=1.0):
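
If callers do rely on the flexible mode, an alternative to dropping the parameter is to forward it. The sketch below assumes this and uses a simplified stand-in for extract_solution (it does not reproduce the PR's extractor or its numerical-equivalence handling); the scoring rule of 0 for no answer, format_score for a wrong answer, and score for a match is likewise an assumption.

```python
import re


def extract_solution(solution_str, method="strict"):
    """Simplified stand-in extractor; returns None when nothing matches."""
    if method == "strict":
        match = re.search(r"#### (-?[0-9.,]+)", solution_str)
        return match.group(1).replace(",", "") if match else None
    if method == "flexible":
        # Take the last number-like token anywhere in the text.
        numbers = re.findall(r"-?\d[\d,]*(?:\.\d+)?", solution_str)
        return numbers[-1].replace(",", "") if numbers else None
    raise ValueError(f"unknown method: {method!r}")


def compute_score(solution_str, ground_truth, method="strict", format_score=0.2, score=1.0):
    # Forward the caller's choice instead of hardcoding method="strict".
    answer = extract_solution(solution_str, method=method)
    if answer is None:
        return 0.0
    return score if answer == ground_truth else format_score


print(compute_score("The answer is 42", "42", method="flexible"))
print(compute_score("The answer is 42", "42"))
```

With forwarding in place, the two calls differ only in method, so the parameter has an observable effect on the returned score.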

# Save frequency (by iteration) for model checkpoints
save_freq: -1

# ESI redundant time (in seconds) for model checkpointsAdd commentMore actions

medium

The end of this comment line appears to contain text accidentally copied from a UI (Add commentMore actions). Consider removing it to keep the configuration file clean.

  # ESI redundant time (in seconds) for model checkpoints

train_candidates = glob(os.path.join(ms_base_dir, "**", "big-math-rl-verified-processed-train.arrow"), recursive=True)
# test_candidates = glob(os.path.join(ms_base_dir, "**", "gsm8k-test.arrow"), recursive=True)

assert len(train_candidates) > 0, f"未在 {ms_base_dir} 下找到 gsm8k-train.arrow"

medium

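The assertion message here hardcodes gsm8k-train.arrow, but the file name actually searched for is big-math-rl-verified-processed-train.arrow (see line 48). This can cause confusion when debugging. Consider making the assertion message match the file name that is actually searched for.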

Suggested change
- assert len(train_candidates) > 0, f"未在 {ms_base_dir} 下找到 gsm8k-train.arrow"
+ assert len(train_candidates) > 0, f"未在 {ms_base_dir} 下找到 big-math-rl-verified-processed-train.arrow"
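
One way to keep the message and the search pattern from drifting apart again is to define the file name once. This is a sketch under that assumption; the constant TRAIN_FILENAME and the helper find_train_files are hypothetical names, and the temporary directory only stands in for ms_base_dir.

```python
import os
import tempfile
from glob import glob

TRAIN_FILENAME = "big-math-rl-verified-processed-train.arrow"


def find_train_files(base_dir):
    """Recursively locate the training file; the error message reuses the same constant."""
    candidates = glob(os.path.join(base_dir, "**", TRAIN_FILENAME), recursive=True)
    assert len(candidates) > 0, f"{TRAIN_FILENAME} not found under {base_dir}"
    return candidates


# Demo with a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    open(os.path.join(tmp, "sub", TRAIN_FILENAME), "w").close()
    print(find_train_files(tmp))
```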

Comment on lines +29 to +33
solution = re.search("#### (\\-?[0-9\\.\\,]+)", solution_str)
assert solution is not None
final_solution = solution.group(0)
final_solution = final_solution.split("#### ")[1].replace(",", "")
return final_solution

medium

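Using solution.group(0) to get the whole match (for example "#### 123") and then calling split to extract the number is roundabout and not very robust. Using solution.group(1) retrieves the number matched by the regular expression's capture group more concisely and directly. Also consider providing a more useful error message when the assertion fails.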

Suggested change
- solution = re.search("#### (\\-?[0-9\\.\\,]+)", solution_str)
- assert solution is not None
- final_solution = solution.group(0)
- final_solution = final_solution.split("#### ")[1].replace(",", "")
- return final_solution
+ solution = re.search("#### (\\-?[0-9\\.\\,]+)", solution_str)
+ assert solution is not None, f"无法在字符串中找到解决方案格式 '#### ...':{solution_str}"
+ final_solution = solution.group(1).replace(",", "")
+ return final_solution
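
The two variants can be compared side by side. In this sketch the function names and the sample string are made up for illustration; the regular expression is the same pattern as in the snippet above, written as a raw string.

```python
import re

PATTERN = r"#### (-?[0-9.,]+)"


def extract_via_split(text):
    """Original approach: take the whole match, then split off the marker."""
    match = re.search(PATTERN, text)
    assert match is not None
    return match.group(0).split("#### ")[1].replace(",", "")


def extract_via_group(text):
    """Suggested approach: read the capture group directly."""
    match = re.search(PATTERN, text)
    assert match is not None, f"no '#### <number>' marker found in: {text!r}"
    return match.group(1).replace(",", "")


sample = "She earns 1,234 dollars.\n#### 1,234"
assert extract_via_split(sample) == extract_via_group(sample) == "1234"
```

Both return the same value on well-formed input; the capture-group version is shorter and its assertion message shows the offending string when the marker is missing.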

rollout_data_dir = self.config.trainer.get("rollout_data_dir", None)
if rollout_data_dir:
    with marked_timer("dump_rollout_generations", timing_raw, color="green"):
        print(batch.batch.keys())

medium

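This print statement appears to be for debugging; it prints all the keys of the batch data. Consider removing debug output like this before submitting the final code to keep console output clean and professional.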
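
If the key listing is still useful during development, one option is to route it through the standard logging module at DEBUG level instead of deleting it. The sketch below assumes this; describe_batch_keys is a hypothetical helper, and a plain dict stands in for the trainer's batch object.

```python
import logging

logger = logging.getLogger(__name__)


def describe_batch_keys(batch_dict):
    """Log the batch keys at DEBUG level and return them sorted."""
    keys = sorted(batch_dict.keys())
    logger.debug("rollout batch keys: %s", keys)
    return keys


logging.basicConfig(level=logging.INFO)
describe_batch_keys({"responses": [], "input_ids": []})  # silent at INFO level
```

Setting the level to logging.DEBUG makes the message appear again without touching the training code.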

@ftgreat changed the title from "UCI001(liximeng0824)提交" to "[PZ COMPETITION] UCI001(liximeng0824)" on Sep 12, 2025