Enhanced run_immediately functionality #64162

Open
manipatnam wants to merge 2 commits into apache:main from manipatnam:fix/run-immediately-respects-start-date

@manipatnam
Contributor

@manipatnam manipatnam commented Mar 24, 2026

Summary

Fix CronTriggerTimetable ignoring run_immediately when start_date is set, simplify its
semantics, and document the parameter properly.

Bug fix

run_immediately was only respected on the very first run when start_date=None.
When start_date was set (the common case), _calc_first_run() was skipped and the
timetable always fell back to _align_to_prev(now), behaving as run_immediately=True
regardless of the value passed. The fix ensures _calc_first_run() is always called on
the first run, whether or not start_date is set.

Semantic simplification (justified by the bug)

Because run_immediately was silently ignored whenever start_date was set, the False
path in _calc_first_run() was effectively dead code for the common case. Auditing that
dead code revealed two inconsistencies:

  • run_immediately=False applied an undocumented auto-buffer window (10% of the cron
    interval, minimum 5 min), identical to None, making False not mean what users expected.
    Since timedelta already covers the "run past if within a window" use case, the buffer
    is removed. False now cleanly means "always skip the past cron point and wait for the
    next future one".
  • run_immediately was not documented at all, so no user could have knowingly depended on
    the buffer behaviour.
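
For reference, the auto-buffer described above (10% of the cron interval, with a 5-minute floor) can be sketched in a few lines. This is a standalone illustration, not the Airflow source: `auto_buffer` and `runs_past_point` are hypothetical names, and treating the edge of the window as inclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone


def auto_buffer(cron_interval: timedelta) -> timedelta:
    """Grace window as described in this PR: 10% of the interval, at least 5 minutes."""
    return max(cron_interval / 10, timedelta(minutes=5))


def runs_past_point(now: datetime, past_point: datetime, cron_interval: timedelta) -> bool:
    """Return True if the most recent past cron point still falls inside the buffer.

    Assumption: a gap exactly equal to the buffer counts as inside the window.
    """
    return now - past_point <= auto_buffer(cron_interval)


if __name__ == "__main__":
    midnight = datetime(2026, 3, 24, tzinfo=timezone.utc)
    daily = timedelta(days=1)
    print(auto_buffer(daily))
    print(runs_past_point(midnight + timedelta(hours=1), midnight, daily))
```

For a daily schedule the window works out to 2 hours 24 minutes, so under the old behaviour described here a timetable evaluated one hour after the cron point would still have run it even with run_immediately=False.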

Backward compatibility

The default is changed from False to True so that existing DAGs without an explicit
run_immediately continue to run the most recent past cron point immediately. This preserves
the de-facto behaviour that all users with start_date set were already seeing (since the
bug made False act like True in that case). Only DAGs that explicitly pass
run_immediately=False without a start_date are affected — and for those, False now
means exactly what the name implies.

Changes

  • _TriggerTimetable.next_dagrun_info: on first run (no prior runs), always call
    _calc_first_run() regardless of start_date; after a pause/resume keep the existing
    "pick most recent past boundary" logic unchanged
  • CronTriggerTimetable._calc_first_run: simplified to three clean cases —
    True (run past), False (wait for next), timedelta (run past if within window)
  • Default run_immediately changed False → True in CronTriggerTimetable,
    MultipleCronTriggerTimetable, and CronPartitionTimetable
  • Docs: added full run_immediately documentation with examples and versionadded:: 3.0.0
  • Docs: corrected the paused-then-unpaused example (both timetables skip only Feb 1 and
    immediately trigger Feb 2 — the claim that CronTriggerTimetable skips Feb 2 as well was wrong)
  • Tests: updated existing tests and added test_run_immediately_false_with_start_date,
    test_run_immediately_true_with_start_date, test_run_immediately_false_after_unpause

Was generative AI tooling used to co-author this PR?
  • Yes — Claude Sonnet 4.6

@potiuk added the "ready for maintainer review" label Mar 24, 2026
@collinmcnulty
Contributor

Can you fill in this table so the intent is clear?

| start_date | run_immediately | behavior before this PR | behavior after this PR |
|---|---|---|---|
| set | default | | |
| set | explicit True | | |
| set | explicit False | | |
| None | default | | |
| None | explicit True | | |
| None | explicit False | | |

@manipatnam
Contributor Author

| start_date | run_immediately | Before this PR | After this PR |
|---|---|---|---|
| set | default (True) | Old default was False; ran past boundary immediately | Runs past boundary immediately |
| set | explicit True | Ran past boundary immediately | Runs past boundary immediately |
| set | explicit False | Ran past boundary immediately | Waits for next future boundary |
| None | default (True) | Old default was False; waited for next future boundary | Runs past boundary immediately |
| None | explicit True | Ran past boundary immediately | Runs past boundary immediately |
| None | explicit False | Waited for next future boundary | Waits for next future boundary |

cc: @collinmcnulty
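
The six rows of the table above can be reproduced with a toy model of the first-run decision before and after this PR. It is a deliberately simplified sketch: `first_run_before` and `first_run_after` are hypothetical names, only boolean values of run_immediately are modelled, and the old auto-buffer and any clamping to start_date are left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

OLD_DEFAULT = False  # default run_immediately before this PR
NEW_DEFAULT = True   # default run_immediately after this PR


def first_run_before(
    start_date: Optional[datetime],
    run_immediately: bool,
    past_point: datetime,
    next_point: datetime,
) -> datetime:
    """Old behaviour as described in this PR: the flag was only read without start_date."""
    if start_date is not None:
        # Bug: run_immediately was never consulted on this path.
        return past_point
    return past_point if run_immediately else next_point


def first_run_after(
    start_date: Optional[datetime],
    run_immediately: bool,
    past_point: datetime,
    next_point: datetime,
) -> datetime:
    """New behaviour: the flag is honoured whether or not start_date is set."""
    return past_point if run_immediately else next_point


if __name__ == "__main__":
    past = datetime(2026, 3, 24, tzinfo=timezone.utc)
    nxt = past + timedelta(days=1)
    start = datetime(2026, 1, 1, tzinfo=timezone.utc)
    for sd in (start, None):
        for flag in (True, False):
            print(sd is not None, flag,
                  first_run_before(sd, flag, past, nxt) == past,
                  first_run_after(sd, flag, past, nxt) == past)
```

In this model the only rows where the outcome changes are "set, explicit False" and "None, default", which matches the table.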

Contributor

@SameerMesiah97 SameerMesiah97 left a comment

I think this PR is mixing a legitimate bug fix (ensuring that setting start_date does not subvert run_immediately in the timetable logic) with a behavioral change that feels a bit opinionated. Always routing the first run through _calc_first_run() makes sense, since previously run_immediately could effectively be ignored when start_date was set, which is clearly inconsistent.

The changes around run_immediately=False and removing the implicit buffer feel more like a semantic change than a bug fix. The buffer logic looks quite intentional and provides a small grace window for slightly late scheduling, and now False strictly skips the past run. Also, changing the default from False to True is a pretty big shift. I will caveat this by saying that I may be missing context, so I am happy to be corrected, but a semantic shift like this in scheduler-adjacent code needs stronger justification than what you have provided.

@manipatnam
Contributor Author

> The changes around run_immediately=False and removing the implicit buffer feel more like a semantic change than a bug fix. The buffer logic looks quite intentional and provides a small grace window for slightly late scheduling, and now False strictly skips the past run. Also, changing the default from False to True is a pretty big shift. I will caveat this by saying that I may be missing context, so I am happy to be corrected, but a semantic shift like this in scheduler-adjacent code needs stronger justification than what you have provided.

Thanks for the review @SameerMesiah97

Why the auto-buffer was removed from False

In the old code, False and None had identical implementations — both applied the same
10% grace window. So passing False did not guarantee skipping the past cron point; it
could still run it if the scheduler evaluated within the buffer window. Since timedelta
already exists for users who want a grace window, False is cleaned up to strictly mean
what it says: always skip the past cron point.

Why the default changed from False to True

Because of the bug, run_immediately=False was silently ignored whenever start_date was
set. Since almost every production DAG sets start_date, the default False always behaved
like True for the vast majority of users in practice. Changing the default to True preserves
what users have always observed. If we kept the default as False and only fixed the bug,
every existing DAG without an explicit run_immediately would suddenly stop running the past
cron point on first enable — a much larger breaking change.

> I think this PR is mixing a legitimate bug fix (ensuring that setting start_date does not subvert run_immediately in the timetable logic) with a behavioral change that feels a bit opinionated. Always routing the first run through _calc_first_run() makes sense, since previously run_immediately could effectively be ignored when start_date was set, which is clearly inconsistent.

This is my actual pain point. I have updated the description to state this.

@eladkal added this to the Airflow 3.2.0 milestone Mar 25, 2026
@eladkal added the "type:bug-fix" label Mar 25, 2026

Labels

kind:documentation, ready for maintainer review, type:bug-fix

5 participants