Skip to content

Drop stale async rollout results#1738

Open
hamishivi wants to merge 4 commits into
mainfrom
hamishivi/drop-stale-rollouts
Open

Drop stale async rollout results#1738
hamishivi wants to merge 4 commits into
mainfrom
hamishivi/drop-stale-rollouts

Conversation

@hamishivi

Copy link
Copy Markdown
Collaborator

Summary

  • Add max_result_age_steps to accumulate_inference_batches: in async GRPO, a rollout whose generating policy is more than that many steps behind the current trainer step (training_step - result.model_step > max_result_age_steps) is dropped instead of trained on.
  • When a stale result is dropped, a fresh prompt is replenished into the generator (when replenish_prompts is on) so throughput is unaffected, and the result is not kept for requeue-on-timeout.
  • Logs a stale_results_dropped reward metric per accumulation.
  • Wired from DataPreparationActor using async_steps as the staleness threshold (results more than one async window behind are dropped).

Notes

  • model_step tracking and the model_step_min/max/mean / num_steps_off_policy metrics already existed on main; this only adds the drop policy on top.
  • The staleness predicate is factored into a small pure result_is_stale(...) helper for testability.

Test plan

  • New unit tests TestResultIsStale cover: disabled when max_result_age_steps/training_step/model_step is None, stale when lag exceeds the threshold, not stale at the threshold boundary, and not stale when fresh.
  • ruff, formatting, and whole-project ty check pass.

GPU_TESTS=bypass

Made with Cursor

hamishivi and others added 2 commits June 25, 2026 09:20
Add max_result_age_steps to accumulate_inference_batches: rollouts whose
generating policy is more than that many steps behind the trainer are dropped
(and a fresh prompt is replenished) rather than trained on, with a
stale_results_dropped metric. Wired from the DataPreparationActor using
async_steps as the threshold. Adds a result_is_stale helper + unit tests.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism to drop stale async rollout results whose generating policy lags too far behind the trainer, replenishing them with fresh prompts and logging a stale_results_dropped metric. Unit tests were also added to verify the staleness logic. The reviewer identified a critical issue where dropping stale results when replenish_prompts is disabled could lead to a silent deadlock or hang due to a drained pipeline, and suggested adding a validation check to prevent this configuration.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +1082 to +1104
if result_is_stale(result.model_step, training_step, max_result_age_steps):
stale_results_dropped += 1
logger.warning(
"[accumulate_inference_batches] Dropping stale result for index=%s at training_step=%s: "
"model_step=%s lag=%s max_result_age_steps=%s",
result.index,
training_step,
result.model_step,
training_step - result.model_step,
max_result_age_steps,
)
if replenish_prompts:
assert iter_dataloader is not None and param_prompt_Q is not None
add_prompt_to_generator(
next(iter_dataloader),
iter_dataloader._epoch,
param_prompt_Q,
generation_config,
is_eval=False,
base_env_config=base_env_config,
ground_truth_overrides=ground_truth_overrides,
)
continue

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If max_result_age_steps is configured but replenish_prompts is False, dropping stale results without replenishing them will permanently reduce the number of prompts in flight. Over time, this will drain the pipeline and cause a silent deadlock/hang as the accumulator waits indefinitely for num_prompts results that will never arrive.

To prevent this, we should explicitly validate that replenish_prompts is enabled when staleness checking is active.

        if result_is_stale(result.model_step, training_step, max_result_age_steps):
            if not replenish_prompts:
                raise ValueError("max_result_age_steps requires replenish_prompts to be True to avoid deadlocks.")
            stale_results_dropped += 1
            logger.warning(
                "[accumulate_inference_batches] Dropping stale result for index=%s at training_step=%s: "
                "model_step=%s lag=%s max_result_age_steps=%s",
                result.index,
                training_step,
                result.model_step,
                training_step - result.model_step,
                max_result_age_steps,
            )
            assert iter_dataloader is not None and param_prompt_Q is not None
            add_prompt_to_generator(
                next(iter_dataloader),
                iter_dataloader._epoch,
                param_prompt_Q,
                generation_config,
                is_eval=False,
                base_env_config=base_env_config,
                ground_truth_overrides=ground_truth_overrides,
            )
            continue

hamishivi and others added 2 commits June 25, 2026 09:22
Co-authored-by: Cursor <cursoragent@cursor.com>
Dropping stale results without replenishing would steadily drain the in-flight
prompt pool and hang the accumulator, so validate up front that
max_result_age_steps is only used with replenish_prompts=True. The drop path now
replenishes unconditionally. Adds a unit test for the guard.

Co-authored-by: Cursor <cursoragent@cursor.com>
@hamishivi

Copy link
Copy Markdown
Collaborator Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism to drop stale async rollout results whose generating policy lags too far behind the trainer (exceeding max_result_age_steps). When a stale result is detected, it is discarded, a fresh prompt is replenished to prevent the pipeline from draining, and a stale_results_dropped metric is logged. Unit tests have been added to verify the staleness detection logic and the associated configuration validation. There are no review comments, so no additional feedback is provided.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@hamishivi hamishivi requested a review from farhatkevin June 25, 2026 16:48

@farhatkevin farhatkevin left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants