Fix get_wandb_tags tag-limit handling and log all GRPO configs#1727
Fix get_wandb_tags tag-limit handling and log all GRPO configs#1727mnoukhov wants to merge 3 commits into
Conversation
de66fa9 to
bf5e0f9
Compare
There was a problem hiding this comment.
Code Review
This pull request refactors the Weights & Biases (W&B) integration across multiple training scripts to use an updated get_wandb_tags function that prepends extra tags. It also updates grpo.py to log multiple configuration dataclasses to the W&B config. Feedback was provided on grpo.py regarding the risk of unpacking multiple dataclasses using ** in a single .update() call, which could cause a runtime TypeError if there are overlapping field names; updating the dictionary sequentially is recommended instead.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| json_config.update( | ||
| **dataclasses.asdict(args), | ||
| **dataclasses.asdict(tc), | ||
| **dataclasses.asdict(model_config), | ||
| **dataclasses.asdict(streaming_config), | ||
| **dataclasses.asdict(vllm_config), | ||
| ) |
There was a problem hiding this comment.
Unpacking multiple dataclasses as keyword arguments using ** in a single update() call is risky. If any of these dataclasses (args, tc, model_config, streaming_config, vllm_config) share any field names (either now or in the future), Python will raise a TypeError: update() got multiple values for keyword argument ... at runtime.
To prevent potential runtime crashes and ensure robustness, update the dictionary sequentially by passing the dictionaries directly to .update().
for config in (args, tc, model_config, streaming_config, vllm_config):
json_config.update(dataclasses.asdict(config))There was a problem hiding this comment.
This would silently overwrite some configs with others. I prefer to have it fail if two dataclasses share configs
What
Two small fixes ported over from another branch:
get_wandb_tagstag-limit handling.get_wandb_tagsnow accepts anextra_tagsargument (e.g. the experiment name). Extra tags are prepended and, like all tags, truncated to W&B's 64-char limit. Previously callers did[args.exp_name] + get_wandb_tags(), which bypassed truncation and could exceed W&B's limit. All callers (grpo_fast.py,finetune.py,dpo_tune_cache.py,reward_modeling.py) are updated, and the OLMo-core GRPOWandBCallbacknow also receives tags.Log all GRPO config dataclasses.
grpo.pynow logstc,model_config,streaming_config, andvllm_configinto the wandb config in addition toargs, instead of onlyargs.Testing
make style && make qualitypass.🤖 Generated with Claude Code