Add use_cpu_adam flag for GRPO fast training #1737

Open
Chessing234 wants to merge 3 commits into allenai:main from Chessing234:fix/grpo-cpu-adam

@Chessing234

Summary

  • Add --use_cpu_adam CLI flag to grpo_fast.py to use DeepSpeedCPUAdam instead of fused AdamW
  • Complements existing --deepspeed_offload_optimizer / --deepspeed_offload_param flags for low-VRAM local training

Test plan

  • Run grpo_fast.py with --use_cpu_adam true --deepspeed_offload_optimizer true on a single GPU
  • Confirm default behavior unchanged when --use_cpu_adam is omitted
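The two checks in the test plan can be approximated at the argument-parsing level without a GPU. The sketch below is a stand-in only: the script's real argument parser is not shown in this PR, so `str2bool`, `build_parser`, and both default values are assumptions, with stdlib `argparse` used for illustration.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse values such as 'true'/'false' into a bool (hypothetical helper)."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Stand-in parser; the defaults here are assumed, not taken from grpo_fast.py."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--use_cpu_adam", type=str2bool, default=False)
    parser.add_argument("--deepspeed_offload_optimizer", type=str2bool, default=False)
    return parser


# First test-plan item: both flags passed explicitly.
with_flag = build_parser().parse_args(
    ["--use_cpu_adam", "true", "--deepspeed_offload_optimizer", "true"]
)
# Second test-plan item: flag omitted, so the default applies.
without_flag = build_parser().parse_args([])
```

This only covers flag parsing; confirming that training itself is unchanged still requires the single-GPU run described above.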

Fixes #1031

Made with Cursor

Expose DeepSpeedCPUAdam as an optional optimizer when running grpo_fast
with CPU optimizer offload on memory-constrained setups.

Fixes allenai#1031.

Co-authored-by: Cursor <cursoragent@cursor.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new '--use_cpu_adam' flag to 'grpo_fast.py' to support 'DeepSpeedCPUAdam' when training with CPU optimizer offload. The feedback suggests importing 'DeepSpeedCPUAdam' lazily at the point of use rather than at the top level of the file to prevent import-time errors or startup delays in environments where DeepSpeed is not installed or compiled.

Comment on lines 38 to 42
with contextlib.suppress(Exception):
    import deepspeed
    from deepspeed.ops.adam import DeepSpeedCPUAdam
    from deepspeed.runtime.sequence_parallel.ulysses_sp import UlyssesSPAttentionHF
    from deepspeed.utils import groups

Severity: medium

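Importing DeepSpeedCPUAdam in this top-level block is fragile in environments where DeepSpeed is not installed or where its C++ extensions are not compiled (e.g., local CPU-only development environments). Because the block is wrapped in contextlib.suppress(Exception), a failed import is silently swallowed, the imports that follow it in the block are skipped, and DeepSpeedCPUAdam is left undefined until it is first used. The extra import can also add significant startup delay.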

It is highly recommended to lazily import DeepSpeedCPUAdam only when it is actually needed (i.e., when args.use_cpu_adam is enabled).

Suggested change
- with contextlib.suppress(Exception):
-     import deepspeed
-     from deepspeed.ops.adam import DeepSpeedCPUAdam
-     from deepspeed.runtime.sequence_parallel.ulysses_sp import UlyssesSPAttentionHF
-     from deepspeed.utils import groups
+ with contextlib.suppress(Exception):
+     import deepspeed
+     from deepspeed.runtime.sequence_parallel.ulysses_sp import UlyssesSPAttentionHF
+     from deepspeed.utils import groups

Comment on lines +376 to +379
if args.use_cpu_adam:
    self.optimizer = DeepSpeedCPUAdam(optim_params, lr=args.learning_rate)
else:
    self.optimizer = torch.optim.AdamW(optim_params, lr=args.learning_rate, fused=args.fused_optimizer)

Severity: medium

Import DeepSpeedCPUAdam lazily here to prevent import-time errors and provide a clear, actionable error message if DeepSpeed is not installed or compiled with CPU Adam support.

        if args.use_cpu_adam:
            try:
                from deepspeed.ops.adam import DeepSpeedCPUAdam
            except ImportError as e:
                raise ImportError(
                    "DeepSpeedCPUAdam requires deepspeed to be installed and compiled with CPU Adam support. "
                    "Please ensure deepspeed is installed correctly."
                ) from e
            self.optimizer = DeepSpeedCPUAdam(optim_params, lr=args.learning_rate)
        else:
            self.optimizer = torch.optim.AdamW(optim_params, lr=args.learning_rate, fused=args.fused_optimizer)
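
The lazy-import pattern in the suggestion above could also be factored into a small helper so that the error path can be unit-tested without DeepSpeed installed. This is a sketch, not code from the PR: `load_optimizer_class` and its parameters are hypothetical, and it uses stdlib `importlib` to resolve the class by name.

```python
import importlib


def load_optimizer_class(
    module_name: str = "deepspeed.ops.adam",
    attr: str = "DeepSpeedCPUAdam",
):
    """Import an optimizer class on demand (hypothetical helper).

    Raises ImportError with an actionable message if the module or the
    attribute cannot be found.
    """
    try:
        module = importlib.import_module(module_name)
        return getattr(module, attr)
    except (ImportError, AttributeError) as e:
        raise ImportError(
            f"{attr} could not be loaded from {module_name}. "
            "Please ensure deepspeed is installed with CPU Adam support."
        ) from e
```

Under this sketch, the `use_cpu_adam` branch would call `load_optimizer_class()` and instantiate the returned class, while the default branch stays untouched and never imports the module.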

Chessing234 and others added 2 commits June 24, 2026 21:03
Co-authored-by: Cursor <cursoragent@cursor.com>
Reject incompatible combinations early and warn when CPU Adam is enabled
without optimizer offload.

Fixes allenai#1031

Co-authored-by: Cursor <cursoragent@cursor.com>
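
The validation described in this commit message might look roughly like the sketch below. The diff for this commit is not shown on the page, so the details are assumptions: `validate_optimizer_flags` is a hypothetical name, and treating `use_cpu_adam` together with `fused_optimizer` as the rejected combination is a guess at what "incompatible" means here.

```python
import warnings


def validate_optimizer_flags(
    use_cpu_adam: bool,
    fused_optimizer: bool,
    deepspeed_offload_optimizer: bool,
) -> None:
    """Fail fast on conflicting optimizer flags (hypothetical helper)."""
    # Assumed incompatibility: the fused option applies to torch.optim.AdamW,
    # which is not used when CPU Adam is selected.
    if use_cpu_adam and fused_optimizer:
        raise ValueError(
            "--use_cpu_adam cannot be combined with --fused_optimizer; "
            "disable one of them."
        )
    # CPU Adam without optimizer offload is allowed but probably unintended.
    if use_cpu_adam and not deepspeed_offload_optimizer:
        warnings.warn(
            "--use_cpu_adam is enabled without --deepspeed_offload_optimizer.",
            stacklevel=2,
        )
```

Running a check like this before any model or optimizer is constructed keeps a misconfigured run from spending time on setup before it fails.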
Development

Successfully merging this pull request may close these issues.

Feature request: Add CLI toggles for CPU offloading in grpo_fast.py
