Question: train/ratio_min=1, train/ratio_max=1, train/ratio_mean=1, train/ratio_std=0 when training Z-image-turbo with GRPO (LoRA)

The config file is attached below:

[default.yaml](https://github.com/user-attachments/files/27578822/default.yaml)

<img width="1636" height="678" alt="Image" src="https://github.com/user-attachments/assets/ccec0624-cd89-47d5-809b-1e392f23e74e" />

<img width="1634" height="599" alt="Image" src="https://github.com/user-attachments/assets/f0eb1ad1-78ee-43ad-843d-547b90f0c4ec" />

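I have trained for a little over 400 steps. In the first figure above, the ratio stays at exactly 1 the whole time, which I would read as the policy not being updated. However, train_reward in the second figure is clearly rising, so the model does seem to be updating and improving. Why is that? When debugging, is it enough to monitor only the trend of train_reward?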
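For reference, here is a minimal sketch of one situation in which a ratio of exactly 1 and a rising reward can coexist. It assumes the trainer takes a single optimizer step per rollout batch, so the "old" and "new" log-probabilities come from the same weights when the ratio is logged. Whether that applies here depends on the attached config, which I have not verified. The toy 1-D Gaussian policy and all names (`log_prob`, `ratios`, `loss_grad`) are hypothetical and are not taken from the training code. The advantage is also simplified to reward minus group mean, without the std normalization GRPO normally uses.

```python
import math

def log_prob(action, mu):
    # Log-density of a unit-variance Gaussian policy with mean mu (toy policy).
    return -0.5 * (action - mu) ** 2 - 0.5 * math.log(2 * math.pi)

def ratios(actions, mu_new, mu_old):
    # Importance ratio pi_new(a) / pi_old(a), computed from log-probabilities.
    return [math.exp(log_prob(a, mu_new) - log_prob(a, mu_old)) for a in actions]

def loss_grad(actions, advantages, mu_new, mu_old):
    # Gradient w.r.t. mu_new of the unclipped surrogate loss -mean(ratio * A).
    # d(ratio)/d(mu_new) = ratio * (a - mu_new) for this Gaussian policy.
    r = ratios(actions, mu_new, mu_old)
    terms = [ri * adv * (a - mu_new) for ri, adv, a in zip(r, advantages, actions)]
    return -sum(terms) / len(actions)

# One "group" of samples drawn from the old policy; the reward is the action itself.
actions = [-1.0, 0.0, 1.0, 2.0]
rewards = list(actions)
mean_reward = sum(rewards) / len(rewards)
advantages = [r - mean_reward for r in rewards]  # simplified group-relative advantage

mu_old = 0.0

# Before the optimizer step the new policy equals the old one, so every ratio is 1.
ratio_values = ratios(actions, mu_old, mu_old)

# The gradient at ratio == 1 is still non-zero: it equals -mean(A * grad log pi).
grad = loss_grad(actions, advantages, mu_old, mu_old)

# One plain gradient-descent step moves the policy toward higher-reward actions.
learning_rate = 0.1
mu_after = mu_old - learning_rate * grad

print(ratio_values, grad, mu_after)
```

In this toy setting the logged ratio would have min = max = mean = 1 and std = 0 at every step, yet `mu_after` moves toward the higher-reward actions. If the setup instead reuses each rollout batch for several optimizer steps, the ratio should drift away from 1 after the first step, and a constant 1 would be more suspicious.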
