Description
Apache Airflow version
2.10.5
If "Other Airflow 2 version" selected, which one?
No response
What happened?
Two DAGs each receive a large batch of DAG runs. The number of runs for each DAG exceeds max_dagruns_per_loop_to_schedule. Each DAG run is very short, shorter than the scheduler heartrate of this Airflow deployment. Both DAGs have a max_active_runs that is far less than max_dagruns_per_loop_to_schedule.
So: max_active_runs < max_dagruns_per_loop_to_schedule < number of queued DAG runs.
On each scheduler loop, the first DAG has only a very small number of free DAG run "slots", but it is still below its max_active_runs, so the check coalesce(running_drs.c.num_running, text("0")) < coalesce(Backfill.max_active_runs, DagModel.max_active_runs) does not exclude it. As a result, all of the DAG runs considered in that loop come from the first DAG. The second DAG effectively has to wait for nearly all of the first DAG's runs to complete before any of its own runs are moved from queued to running.
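To make the starvation concrete, here is a minimal pure-Python model of a single scheduler loop. It is a simplified sketch, not Airflow's actual code: the names `select_candidates` and `start_runs` are hypothetical, and it assumes that the first DAG's queued runs sort ahead of the second DAG's in the query result.

```python
from collections import Counter
from typing import NamedTuple


class QueuedRun(NamedTuple):
    dag_id: str
    run_number: int


def select_candidates(queued, running, max_active_runs, loop_limit):
    """Model the yes/no filter: a DAG is dropped only if it is already at max_active_runs."""
    eligible = [r for r in queued if running.get(r.dag_id, 0) < max_active_runs]
    # Assumption: the result is truncated to the per-loop limit after filtering.
    return eligible[:loop_limit]


def start_runs(candidates, running, max_active_runs):
    """Move candidates to running only while their DAG still has free slots."""
    active = Counter(running)
    started = []
    for run in candidates:
        if active[run.dag_id] < max_active_runs:
            active[run.dag_id] += 1
            started.append(run)
    return started


# 5000 queued runs per DAG; the first DAG has 98 of its 100 slots in use.
queued = [QueuedRun("first_dag", i) for i in range(5000)]
queued += [QueuedRun("second_dag", i) for i in range(5000)]
running = {"first_dag": 98, "second_dag": 0}

candidates = select_candidates(queued, running, max_active_runs=100, loop_limit=2000)
started = start_runs(candidates, running, max_active_runs=100)

candidate_dags = Counter(r.dag_id for r in candidates)
started_dags = Counter(r.dag_id for r in started)
print(candidate_dags)  # only first_dag appears among the 2000 candidates
print(started_dags)    # only 2 runs start, none of them from second_dag
```

In this model the second DAG has 100 free slots but never gets a run examined, because the first DAG fills the whole per-loop window while only being able to start two runs.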
What you think should happen instead?
I think the "most correct" thing to do is to replace the all-or-nothing decision about whether a DAG is included (based on max_active_runs) with a per-DAG limit on how many of that DAG's runs can be included in the query result. I can't see a good way to do this in SQL but others may have insight.
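One possible direction for a per-DAG limit in SQL is a window function: rank each DAG's queued runs with `ROW_NUMBER() OVER (PARTITION BY dag_id ...)` and keep only as many as the DAG has free slots. The sketch below tries this idea with the standard-library `sqlite3` module. The `dag` and `dag_run` tables are simplified stand-ins, not Airflow's real schema, ordering by `id` is an assumption, and this has not been tested against Airflow's actual query or its supported databases.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag (dag_id TEXT PRIMARY KEY, max_active_runs INTEGER);
    CREATE TABLE dag_run (id INTEGER PRIMARY KEY, dag_id TEXT, state TEXT);
    """
)
conn.executemany(
    "INSERT INTO dag VALUES (?, ?)",
    [("first_dag", 100), ("second_dag", 100)],
)

# first_dag: 98 running and 5000 queued; second_dag: 5000 queued.
rows = [("first_dag", "running")] * 98
rows += [("first_dag", "queued")] * 5000
rows += [("second_dag", "queued")] * 5000
conn.executemany("INSERT INTO dag_run (dag_id, state) VALUES (?, ?)", rows)

PER_DAG_CAP_QUERY = """
WITH running AS (
    SELECT dag_id, COUNT(*) AS num_running
    FROM dag_run
    WHERE state = 'running'
    GROUP BY dag_id
),
ranked AS (
    -- Number each DAG's queued runs independently, oldest first.
    SELECT id, dag_id,
           ROW_NUMBER() OVER (PARTITION BY dag_id ORDER BY id) AS rn
    FROM dag_run
    WHERE state = 'queued'
)
SELECT ranked.id, ranked.dag_id
FROM ranked
JOIN dag ON dag.dag_id = ranked.dag_id
LEFT JOIN running ON running.dag_id = ranked.dag_id
-- Keep at most (max_active_runs - num_running) queued runs per DAG.
WHERE ranked.rn <= dag.max_active_runs - COALESCE(running.num_running, 0)
ORDER BY ranked.id
LIMIT :loop_limit
"""

result = conn.execute(PER_DAG_CAP_QUERY, {"loop_limit": 2000}).fetchall()
selected = Counter(dag_id for _, dag_id in result)
print(selected)
```

With the per-DAG cap, the first DAG contributes only as many runs as it has free slots, so the second DAG's runs fit inside the same per-loop limit instead of being crowded out.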
Alternatively, because this is predominantly a problem when a single DAG dominates the scheduler's attention, we could add an explicit check to see whether the result of the DAG run query contains only a single DAG, and if so, re-run the query with that DAG excluded.
How to reproduce
- Create two DAGs, each with a single, simple task.
- Set max_active_runs=100
- Set max_dagruns_per_loop_to_schedule=2000
- Start 5000 Runs of the first DAG
- Start 5000 Runs of the second DAG
- Hard to reproduce: keep the heartrate of the scheduler low enough that Runs complete within one scheduler loop.
Operating System
Debian GNU/Linux 12 (bookworm)
Versions of Apache Airflow Providers
No response
Deployment
Astronomer
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct