Skip to content

Fix get_wandb_tags tag-limit handling and log all GRPO configs#1727

Open
mnoukhov wants to merge 3 commits into
mainfrom
wandb-tags-and-grpo-config
Open

Fix get_wandb_tags tag-limit handling and log all GRPO configs#1727
mnoukhov wants to merge 3 commits into
mainfrom
wandb-tags-and-grpo-config

Conversation

@mnoukhov

Copy link
Copy Markdown
Contributor

What

Two small fixes ported over from another branch:

  1. get_wandb_tags tag-limit handling. get_wandb_tags now accepts an extra_tags argument (e.g. the experiment name). Extra tags are prepended and, like all tags, truncated to W&B's 64-char limit. Previously callers did [args.exp_name] + get_wandb_tags(), which bypassed truncation and could exceed W&B's limit. All callers (grpo_fast.py, finetune.py, dpo_tune_cache.py, reward_modeling.py) are updated, and the OLMo-core GRPO WandBCallback now also receives tags.

  2. Log all GRPO config dataclasses. grpo.py now logs tc, model_config, streaming_config, and vllm_config into the wandb config in addition to args, instead of only args.

Testing

  • make style && make quality pass.

🤖 Generated with Claude Code

@mnoukhov mnoukhov force-pushed the wandb-tags-and-grpo-config branch from de66fa9 to bf5e0f9 Compare June 17, 2026 20:28

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the Weights & Biases (W&B) integration across multiple training scripts to use an updated get_wandb_tags function that prepends extra tags. It also updates grpo.py to log multiple configuration dataclasses to the W&B config. Feedback was provided on grpo.py regarding the risk of unpacking multiple dataclasses using ** in a single .update() call, which could cause a runtime TypeError if there are overlapping field names; updating the dictionary sequentially is recommended instead.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread open_instruct/grpo.py
Comment on lines +302 to +308
json_config.update(
**dataclasses.asdict(args),
**dataclasses.asdict(tc),
**dataclasses.asdict(model_config),
**dataclasses.asdict(streaming_config),
**dataclasses.asdict(vllm_config),
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Unpacking multiple dataclasses as keyword arguments using ** in a single update() call is risky. If any of these dataclasses (args, tc, model_config, streaming_config, vllm_config) share any field names (either now or in the future), Python will raise a TypeError: update() got multiple values for keyword argument ... at runtime.

To prevent potential runtime crashes and ensure robustness, update the dictionary sequentially by passing the dictionaries directly to .update().

    for config in (args, tc, model_config, streaming_config, vllm_config):
        json_config.update(dataclasses.asdict(config))

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would silently overwrite some configs with others. I prefer to have it fail if two dataclasses share configs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant