fix: prune deadline alerts when DAG run fails, not just on success#62918
fix: prune deadline alerts when DAG run fails, not just on success#62918YoannAbriel wants to merge 3 commits into
Conversation
46eb8e1 to
10f4217
Compare
10f4217 to
411a87f
Compare
|
No, not quite the answer we need here. When you call The right fix would be: when the DagRun transitions to You can keep the change you made, but you'll have to wrap the new location in a state check to only call prune on SUCCESS and also add a block for state==FAILED to call handle_miss as well. |
|
Good catch — will update to call handle_miss() on FAILED and only prune on SUCCESS. |
|
The change of location isn't necessarily bad, but I think it's somewhat irrelevant. We can either do or I don't think the location itself matters, but maybe inside the existing "if terminal state" block is slightly easier to find/read. |
4faf4e9 to
b5a52f1
Compare
|
Thanks @ferruzzi — updated to keep the logic inside the existing terminal-state if-block as you suggested: |
35c9dc2 to
feac1a4
Compare
|
Don't you want to add the handle_miss branch in that same place, so it gets run immediately at failure as well? |
e98fa49 to
4d8e8a4
Compare
|
Done — SUCCESS prunes, FAILURE calls handle_miss() immediately. |
3e6e577 to
c964a37
Compare
c964a37 to
dd18c08
Compare
ac96d0a to
262c533
Compare
There was a problem hiding this comment.
Pull request overview
Fixes a scheduler-side cleanup bug where DagRun deadline alerts could still fire after a DagRun had already failed, by ensuring deadlines are pruned whenever the DagRun reaches a terminal state.
Changes:
- Move
Deadline.prune_deadlines()invocation from the success-only branch to the shared terminal-state block (SUCCESS/FAILED). - Add a unit test asserting DagRun deadline pruning also occurs on failure.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| airflow-core/src/airflow/models/dagrun.py | Prunes pending DagRun deadlines when a DagRun finishes (FAILED or SUCCESS) so alerts don’t trigger after completion. |
| airflow-core/tests/unit/models/test_dagrun.py | Adds a regression test covering deadline pruning for failed DagRuns. |
| self, mock_get_by_id, mock_prune, session, deadline_test_dag | ||
| ): | ||
| """Deadlines should be pruned when a DAG run fails, not just on success.""" | ||
| mock_deadline_alert = mock.MagicMock() |
There was a problem hiding this comment.
mock.MagicMock() is created without a spec/autospec, which can hide attribute/typo mistakes in tests. Consider using a spec’d mock (or an autospecced patch) matching the DeadlineAlert model so the test fails if it relies on non-existent attributes.
| mock_deadline_alert = mock.MagicMock() | |
| mock_deadline_alert = mock.create_autospec(DeadlineAlertModel, instance=True) |
| deadline_alerts = [ | ||
| DeadlineAlertModel.get_by_id(alert_id, session) for alert_id in dag.deadline | ||
| ] | ||
|
|
||
| if any( | ||
| deadline_alert.reference_class in SerializedReferenceModels.TYPES.DAGRUN | ||
| for deadline_alert in deadline_alerts |
There was a problem hiding this comment.
deadline_alerts = [DeadlineAlertModel.get_by_id(...)] issues one DB query per deadline ID. Since this code runs on every terminal DagRun (now including failures), consider fetching all referenced DeadlineAlert rows in a single query (id IN (...)) and then checking whether any reference_class is DagRun-related, to avoid N queries when multiple deadlines are configured.
| deadline_alerts = [ | |
| DeadlineAlertModel.get_by_id(alert_id, session) for alert_id in dag.deadline | |
| ] | |
| if any( | |
| deadline_alert.reference_class in SerializedReferenceModels.TYPES.DAGRUN | |
| for deadline_alert in deadline_alerts | |
| deadline_alert_reference_classes = session.execute( | |
| select(DeadlineAlertModel.reference_class).where( | |
| DeadlineAlertModel.id.in_(set(dag.deadline)) | |
| ) | |
| ).scalars() | |
| if any( | |
| reference_class in SerializedReferenceModels.TYPES.DAGRUN | |
| for reference_class in deadline_alert_reference_classes |
There was a problem hiding this comment.
Honestly, I don't really think this is necessary and I'd just resolve it. Seems like over-optimizing around an edge case where a dagrun has many deadlines as opposed to just using the exiting helper that serves the same purpose.
d65ff0e to
89dc314
Compare
Previously, deadline alerts were only pruned when a DAG run succeeded. When a DAG run failed, pending deadlines remained and would still fire even though the run had already reached a terminal state. Moved deadline pruning from the success-only code path to the shared terminal state block (covers both SUCCESS and FAILED). Closes: apache#60927
89dc314 to
6856b62
Compare
|
@YoannAbriel can you address failing static checks? |
|
Current head still deletes failed-run deadlines instead of handling them as missed. |
Problem
Deadline alerts fire even after a DAG run has failed. A user configured a 180-minute deadline, the DAG failed within 10 minutes, but the deadline alert still triggered 3 hours later.
Root Cause
DagRun.update_state()only callsDeadline.prune_deadlines()in the success path. When a DAG run fails (task failure or deadlock), pending deadlines are never cleaned up, so the scheduler still fires them later.Fix
Moved deadline pruning from the success-only block to the shared terminal state block that runs for both SUCCESS and FAILED states. Added a test for the failure case.
Closes: #60927
Was generative AI tooling used to co-author this PR?
Generated-by: Claude Code following the guidelines
{pr_number}.significant.rstor{issue_number}.significant.rst, in airflow-core/newsfragments.Important
🛠️ Maintainer triage note for @YoannAbriel · by
@potiuk· 2026-06-22 06:31 UTCHelpful heads-up from the maintainers — please address before this PR can be reviewed (see the Pull Request quality criteria):
The ball is in your court — you've been assigned to this PR. Fix the above, then mark it Ready for review.
Automated triage — may be imperfect; a maintainer takes the next look.