Add use_cpu_adam flag for GRPO fast training #1737
Expose DeepSpeedCPUAdam as an optional optimizer when running grpo_fast with CPU optimizer offload on memory-constrained setups. Fixes allenai#1031. Co-authored-by: Cursor <cursoragent@cursor.com>
Code Review
This pull request introduces a new '--use_cpu_adam' flag to 'grpo_fast.py' to support 'DeepSpeedCPUAdam' when training with CPU optimizer offload. The feedback suggests importing 'DeepSpeedCPUAdam' lazily at the point of use rather than at the top level of the file to prevent import-time errors or startup delays in environments where DeepSpeed is not installed or compiled.
with contextlib.suppress(Exception):
    import deepspeed
    from deepspeed.ops.adam import DeepSpeedCPUAdam
    from deepspeed.runtime.sequence_parallel.ulysses_sp import UlyssesSPAttentionHF
    from deepspeed.utils import groups
Importing DeepSpeedCPUAdam at the top level can cause import-time failures or significant startup delays, especially in environments where DeepSpeed is not installed or where its C++ extensions are not compiled (e.g., local CPU-only development environments).
It is highly recommended to lazily import DeepSpeedCPUAdam only when it is actually needed (i.e., when args.use_cpu_adam is enabled).
- with contextlib.suppress(Exception):
-     import deepspeed
-     from deepspeed.ops.adam import DeepSpeedCPUAdam
-     from deepspeed.runtime.sequence_parallel.ulysses_sp import UlyssesSPAttentionHF
-     from deepspeed.utils import groups
+ with contextlib.suppress(Exception):
+     import deepspeed
+     from deepspeed.runtime.sequence_parallel.ulysses_sp import UlyssesSPAttentionHF
+     from deepspeed.utils import groups
if args.use_cpu_adam:
    self.optimizer = DeepSpeedCPUAdam(optim_params, lr=args.learning_rate)
else:
    self.optimizer = torch.optim.AdamW(optim_params, lr=args.learning_rate, fused=args.fused_optimizer)
Import DeepSpeedCPUAdam lazily here to prevent import-time errors and provide a clear, actionable error message if DeepSpeed is not installed or compiled with CPU Adam support.
if args.use_cpu_adam:
    try:
        from deepspeed.ops.adam import DeepSpeedCPUAdam
    except ImportError as e:
        raise ImportError(
            "DeepSpeedCPUAdam requires deepspeed to be installed and compiled with CPU Adam support. "
            "Please ensure deepspeed is installed correctly."
        ) from e
    self.optimizer = DeepSpeedCPUAdam(optim_params, lr=args.learning_rate)
else:
    self.optimizer = torch.optim.AdamW(optim_params, lr=args.learning_rate, fused=args.fused_optimizer)
Co-authored-by: Cursor <cursoragent@cursor.com>
Reject incompatible combinations early and warn when CPU Adam is enabled without optimizer offload. Fixes allenai#1031 Co-authored-by: Cursor <cursoragent@cursor.com>
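The early validation described in this commit could look roughly like the sketch below. It is not the code from the PR: the `OptimizerArgs` dataclass and `validate_optimizer_args` function are hypothetical names, and treating `use_cpu_adam` together with `fused_optimizer` as the rejected combination is an assumption, since the commit message does not say which combinations are incompatible. Only the warning for CPU Adam without optimizer offload comes directly from the commit message.

```python
import warnings
from dataclasses import dataclass


@dataclass
class OptimizerArgs:
    # Hypothetical container; field names mirror flags mentioned in this PR.
    use_cpu_adam: bool = False
    fused_optimizer: bool = False
    deepspeed_offload_optimizer: bool = False


def validate_optimizer_args(args: OptimizerArgs) -> None:
    """Fail fast on conflicting optimizer flags before any model is built."""
    # Assumption: fused AdamW and CPU Adam are treated as mutually exclusive,
    # because `fused` is only passed to torch.optim.AdamW in the diff above.
    if args.use_cpu_adam and args.fused_optimizer:
        raise ValueError(
            "use_cpu_adam and fused_optimizer cannot both be enabled; pick one."
        )
    # From the commit message: warn when CPU Adam is on without optimizer offload.
    if args.use_cpu_adam and not args.deepspeed_offload_optimizer:
        warnings.warn(
            "use_cpu_adam is enabled without deepspeed_offload_optimizer; "
            "CPU Adam is intended for CPU optimizer offload.",
            stacklevel=2,
        )
```

Calling such a check once, right after argument parsing, would surface a bad flag combination before any GPU memory is allocated.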
Summary
- --use_cpu_adam CLI flag to grpo_fast.py to use DeepSpeedCPUAdam instead of fused AdamW
- --deepspeed_offload_optimizer / --deepspeed_offload_param flags for low-VRAM local training
Test plan
- grpo_fast.py with --use_cpu_adam true --deepspeed_offload_optimizer true on a single GPU
- --use_cpu_adam is omitted
Fixes #1031
Made with Cursor