Drop stale async rollout results#1738
Conversation
Add max_result_age_steps to accumulate_inference_batches: rollouts whose generating policy is more than that many steps behind the trainer are dropped (and a fresh prompt is replenished) rather than trained on, with a stale_results_dropped metric. Wired from the DataPreparationActor using async_steps as the threshold. Adds a result_is_stale helper + unit tests. Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to drop stale async rollout results whose generating policy lags too far behind the trainer, replenishing them with fresh prompts and logging a stale_results_dropped metric. Unit tests were also added to verify the staleness logic. The reviewer identified a critical issue where dropping stale results when replenish_prompts is disabled could lead to a silent deadlock or hang due to a drained pipeline, and suggested adding a validation check to prevent this configuration.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if result_is_stale(result.model_step, training_step, max_result_age_steps): | ||
| stale_results_dropped += 1 | ||
| logger.warning( | ||
| "[accumulate_inference_batches] Dropping stale result for index=%s at training_step=%s: " | ||
| "model_step=%s lag=%s max_result_age_steps=%s", | ||
| result.index, | ||
| training_step, | ||
| result.model_step, | ||
| training_step - result.model_step, | ||
| max_result_age_steps, | ||
| ) | ||
| if replenish_prompts: | ||
| assert iter_dataloader is not None and param_prompt_Q is not None | ||
| add_prompt_to_generator( | ||
| next(iter_dataloader), | ||
| iter_dataloader._epoch, | ||
| param_prompt_Q, | ||
| generation_config, | ||
| is_eval=False, | ||
| base_env_config=base_env_config, | ||
| ground_truth_overrides=ground_truth_overrides, | ||
| ) | ||
| continue |
There was a problem hiding this comment.
If max_result_age_steps is configured but replenish_prompts is False, dropping stale results without replenishing them will permanently reduce the number of prompts in flight. Over time, this will drain the pipeline and cause a silent deadlock/hang as the accumulator waits indefinitely for num_prompts results that will never arrive.
To prevent this, we should explicitly validate that replenish_prompts is enabled when staleness checking is active.
if result_is_stale(result.model_step, training_step, max_result_age_steps):
if not replenish_prompts:
raise ValueError("max_result_age_steps requires replenish_prompts to be True to avoid deadlocks.")
stale_results_dropped += 1
logger.warning(
"[accumulate_inference_batches] Dropping stale result for index=%s at training_step=%s: "
"model_step=%s lag=%s max_result_age_steps=%s",
result.index,
training_step,
result.model_step,
training_step - result.model_step,
max_result_age_steps,
)
assert iter_dataloader is not None and param_prompt_Q is not None
add_prompt_to_generator(
next(iter_dataloader),
iter_dataloader._epoch,
param_prompt_Q,
generation_config,
is_eval=False,
base_env_config=base_env_config,
ground_truth_overrides=ground_truth_overrides,
)
continueCo-authored-by: Cursor <cursoragent@cursor.com>
Dropping stale results without replenishing would steadily drain the in-flight prompt pool and hang the accumulator, so validate up front that max_result_age_steps is only used with replenish_prompts=True. The drop path now replenishes unconditionally. Adds a unit test for the guard. Co-authored-by: Cursor <cursoragent@cursor.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to drop stale async rollout results whose generating policy lags too far behind the trainer (exceeding max_result_age_steps). When a stale result is detected, it is discarded, a fresh prompt is replenished to prevent the pipeline from draining, and a stale_results_dropped metric is logged. Unit tests have been added to verify the staleness detection logic and the associated configuration validation. There are no review comments, so no additional feedback is provided.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
Summary
max_result_age_stepstoaccumulate_inference_batches: in async GRPO, a rollout whose generating policy is more than that many steps behind the current trainer step (training_step - result.model_step > max_result_age_steps) is dropped instead of trained on.replenish_promptsis on) so throughput is unaffected, and the result is not kept for requeue-on-timeout.stale_results_droppedreward metric per accumulation.DataPreparationActorusingasync_stepsas the staleness threshold (results more than one async window behind are dropped).Notes
model_steptracking and themodel_step_min/max/mean/num_steps_off_policymetrics already existed on main; this only adds the drop policy on top.result_is_stale(...)helper for testability.Test plan
TestResultIsStalecover: disabled whenmax_result_age_steps/training_step/model_stepisNone, stale when lag exceeds the threshold, not stale at the threshold boundary, and not stale when fresh.ruff, formatting, and whole-projectty checkpass.GPU_TESTS=bypass
Made with Cursor