Fix backfill marked complete before DagRuns are created #62561

Open
shivaam wants to merge 3 commits into apache:main from shivaam:fix/backfill-race-61375
Conversation

@shivaam
Contributor

@shivaam shivaam commented Feb 27, 2026

What

The scheduler's _mark_backfills_complete() prematurely marks a backfill
as completed when it runs during the window between the Backfill row
commit and the DagRun creation in _create_backfill().

closes: #61375

Why

_create_backfill() works in two steps:

  1. First it commits the Backfill row to the DB
  2. Then it creates the DagRuns

The scheduler runs _mark_backfills_complete() every 30 seconds. If it happens to run between step 1 and step 2, it sees a backfill with no running DagRuns (because they don't exist yet) and marks it done. The DagRuns get created after, but the backfill is already completed.
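The interleaving described above can be reproduced with a tiny in-memory model. This is only an illustrative sketch: `FakeDB` and `naive_mark_complete` are hypothetical stand-ins, not Airflow code, and the completion rule is a simplified reading of the behavior described in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDB:
    # Hypothetical stand-in for the metadata DB (not Airflow's real models).
    backfills: dict = field(default_factory=dict)  # backfill_id -> completed flag
    dag_runs: list = field(default_factory=list)   # (backfill_id, state) tuples


def naive_mark_complete(db: FakeDB) -> None:
    """Complete every open backfill that has no queued or running DagRuns."""
    for backfill_id, completed in db.backfills.items():
        if completed:
            continue
        unfinished = [
            state
            for bid, state in db.dag_runs
            if bid == backfill_id and state in ("queued", "running")
        ]
        if not unfinished:
            # Overwriting an existing key is safe while iterating over the dict.
            db.backfills[backfill_id] = True


db = FakeDB()
db.backfills[1] = False            # step 1: the Backfill row is committed
naive_mark_complete(db)            # a scheduler pass lands inside the window
db.dag_runs.append((1, "queued"))  # step 2: the DagRuns are created too late
race_result = db.backfills[1]      # the backfill is already flagged complete
```

With no DagRuns present yet, "no unfinished DagRuns" is trivially true, which is why the unguarded check completes the backfill.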

How

Added an EXISTS check on the backfill_dag_run table in the completion query. Now a backfill needs at least one BackfillDagRun row before it can be marked complete. If it has zero, it means the backfill is still being set up, so we skip it.
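As a rough illustration of the guard, the sketch below runs an equivalent query against an in-memory SQLite database. The three-table schema and the raw SQL are simplified assumptions made for this example; the actual change is written with SQLAlchemy against Airflow's models in `scheduler_job_runner.py`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed schema: only the columns this example needs.
conn.executescript("""
CREATE TABLE backfill (id INTEGER PRIMARY KEY, completed_at TEXT);
CREATE TABLE backfill_dag_run (id INTEGER PRIMARY KEY, backfill_id INTEGER);
CREATE TABLE dag_run (id INTEGER PRIMARY KEY, backfill_id INTEGER, state TEXT);
""")

COMPLETABLE = """
SELECT b.id FROM backfill b
WHERE b.completed_at IS NULL
  -- pre-existing condition: nothing is still queued or running
  AND NOT EXISTS (SELECT 1 FROM dag_run d
                  WHERE d.backfill_id = b.id AND d.state IN ('queued', 'running'))
  -- new guard: at least one association row must exist
  AND EXISTS (SELECT 1 FROM backfill_dag_run bdr WHERE bdr.backfill_id = b.id)
"""


def completable_ids(conn):
    """Return the ids of backfills the guarded query would mark complete."""
    return [row[0] for row in conn.execute(COMPLETABLE)]


conn.execute("INSERT INTO backfill (id) VALUES (1)")
initializing = completable_ids(conn)  # Backfill row only, no associations yet

conn.execute("INSERT INTO backfill_dag_run (backfill_id) VALUES (1)")
conn.execute("INSERT INTO dag_run (backfill_id, state) VALUES (1, 'running')")
running = completable_ids(conn)       # associations exist, run still active

conn.execute("UPDATE dag_run SET state = 'success' WHERE backfill_id = 1")
finished = completable_ids(conn)      # all runs finished
```

Only the last call returns the backfill; the first two leave it open, matching the "skip while initializing, complete after the DagRuns finish" behavior the test describes.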

Tests

  • test_mark_backfills_complete_skips_initializing_backfill: verifies that a backfill without any DagRuns is skipped, then completed after its DagRuns finish. Without the fix, the test fails.

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below): Kiro

@shivaam shivaam force-pushed the fix/backfill-race-61375 branch from dcaf372 to 3372139 Compare February 27, 2026 10:01
@shivaam shivaam marked this pull request as ready for review February 27, 2026 10:11
@shivaam shivaam requested review from XD-DENG and ashb as code owners February 27, 2026 10:11
@eladkal eladkal added this to the Airflow 3.1.8 milestone Feb 28, 2026
@eladkal eladkal added the type:bug-fix (Changelog: Bug Fixes) and backport-to-v3-1-test (Mark PR with this label to backport to v3-1-test branch) labels Feb 28, 2026
@eladkal eladkal requested a review from dstandish February 28, 2026 04:55
Contributor

@eladkal eladkal left a comment

LGTM
will need a 2nd reviewer as this is scheduler core area

@eladkal eladkal requested review from Lee-W, kaxil and uranusjr March 3, 2026 13:37
@shivaam
Contributor Author

shivaam commented Mar 7, 2026

Is it possible to add the Copilot reviewer to this PR as well?

@kaxil kaxil requested a review from Copilot March 12, 2026 19:03
@kaxil kaxil changed the title from "Fix backfill marked complete before DagRuns are created (#61375)" to "Fix backfill marked complete before DagRuns are created" Mar 12, 2026
Contributor

Copilot AI left a comment

Pull request overview

Fixes a scheduler race where _mark_backfills_complete() could mark a Backfill as complete in the gap between the Backfill row being committed and its DagRuns being created.

Changes:

  • Add a scheduler-side guard requiring a Backfill to have at least one BackfillDagRun association before it can be marked complete.
  • Add a unit test covering the “initializing backfill” window (no associated DagRuns/BackfillDagRun rows yet), ensuring it is not prematurely completed.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| airflow-core/src/airflow/jobs/scheduler_job_runner.py | Updates the backfill completion query to require existence of at least one BackfillDagRun row before completing. |
| airflow-core/tests/unit/jobs/test_scheduler_job.py | Adds a regression test that reproduces the initialization window and asserts the scheduler skips completion until associations exist. |

shivaam added 2 commits March 22, 2026 19:55
The scheduler's _mark_backfills_complete() could mark a backfill as
completed during the window between the Backfill row commit and DagRun
creation. Add an EXISTS guard on BackfillDagRun so backfills still
being initialized are skipped.

Add a 2-minute age-based fallback to the backfill completion guard so
orphaned backfills (those that failed during initialization and never
got any BackfillDagRun associations) are auto-completed instead of
permanently blocking new backfills for the same DAG.

Also adds tests for: orphan cleanup after cutoff, old backfill with
running DagRuns stays open, young backfill with finished runs completes
immediately, and multiple backfills processed independently.
@shivaam shivaam force-pushed the fix/backfill-race-61375 branch from f450cd7 to 2eb0314 Compare March 23, 2026 04:18
# that failed during initialization and never got any associations.
or_(
    exists(select(BackfillDagRun.id).where(BackfillDagRun.backfill_id == Backfill.id)),
    Backfill.created_at < initializing_cutoff,
Contributor Author

Added this guard, which cleans up any backfill that still has no associated DagRuns after 2 minutes, so we won't end up with stuck backfills.
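A minimal sketch of how the age-based fallback behaves, again using an in-memory SQLite database rather than Airflow's SQLAlchemy models. The schema, the `INITIALIZING_GRACE` constant, and the `completable_ids` helper are assumptions made for illustration; the check for unfinished DagRuns is left out to keep the example focused on the `or_` condition.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Assumed name for the 2-minute window mentioned in the commit message.
INITIALIZING_GRACE = timedelta(minutes=2)

conn = sqlite3.connect(":memory:")
# Simplified, assumed schema: only the columns this example needs.
conn.executescript("""
CREATE TABLE backfill (id INTEGER PRIMARY KEY, created_at TEXT, completed_at TEXT);
CREATE TABLE backfill_dag_run (id INTEGER PRIMARY KEY, backfill_id INTEGER);
""")

QUERY = """
SELECT b.id FROM backfill b
WHERE b.completed_at IS NULL
  AND (EXISTS (SELECT 1 FROM backfill_dag_run bdr WHERE bdr.backfill_id = b.id)
       OR b.created_at < :cutoff)
ORDER BY b.id
"""


def completable_ids(conn, now):
    """Ids that pass the guard: associated rows exist, or the backfill is stale."""
    # ISO-8601 strings in the same timezone and precision sort chronologically.
    cutoff = (now - INITIALIZING_GRACE).isoformat()
    return [row[0] for row in conn.execute(QUERY, {"cutoff": cutoff})]


now = datetime(2026, 3, 23, 12, 0, tzinfo=timezone.utc)
rows = [
    (1, now - timedelta(seconds=10)),  # young, still initializing
    (2, now - timedelta(minutes=5)),   # orphaned: old and never got associations
]
conn.executemany(
    "INSERT INTO backfill (id, created_at) VALUES (?, ?)",
    [(backfill_id, created.isoformat()) for backfill_id, created in rows],
)
result = completable_ids(conn, now)
```

Here only the 5-minute-old backfill passes the guard; the 10-second-old one stays open until it either gains an association row or ages past the cutoff.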

Member

@kaxil kaxil left a comment

lgtm but it would be good to get your review @dstandish

Labels

backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch type:bug-fix Changelog: Bug Fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Backfill job marked as completed immediately, though backfill dag runs still are executing

5 participants