Skip to content

Resolve VariableInterval deadlines safely at DagRun creation#68917

Open
seanghaeli wants to merge 2 commits into
apache:mainfrom
aws-mwaa:feature/variable-interval-resolution
Open

Resolve VariableInterval deadlines safely at DagRun creation#68917
seanghaeli wants to merge 2 commits into
apache:mainfrom
aws-mwaa:feature/variable-interval-resolution

Conversation

@seanghaeli

@seanghaeli seanghaeli commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Why

This re-introduces the VariableInterval resolution portion of #66608.

A DeadlineAlert configured with a VariableInterval is resolved when the scheduler creates a DagRun, inside DAG._process_dagrun_deadline_alerts (which runs under the prohibit_commit guard). Two problems are fixed:

  1. Resolution goes through the full secrets chain (env vars, configured secrets backends, then the metadata DB) via a dedicated _resolve_variable_interval helper, rather than Variable.get / begin_nested. Variable.get and a SAVEPOINT release both commit on the scheduler's session, tripping prohibit_commit (UNEXPECTED COMMIT) and silently dropping deadlines for every scheduled DagRun. The helper passes the scheduler session through to the metastore backend so the DB read does not commit, and reading via the secrets chain (not the variable table directly) means AIRFLOW_VAR_* env vars and secrets-backend-backed Variables resolve too.

  2. Each deadline alert is isolated with a plain try/except (deliberately not begin_nested, which would commit a SAVEPOINT and trip the same guard). Creating a deadline is auxiliary to creating the DagRun; a single bad alert — a missing/invalid backing Variable, or an undecodable serialized blob — must never abort the DagRun and stop the DAG from scheduling.

VariableInterval.resolve is split into resolve + coerce_to_timedelta so the scheduler-side reader reuses the exact same validation (including the OverflowError -> ValueError translation) without going through Variable.get.

Tests

  • airflow-core/tests/unit/models/test_dagrun.py: VariableInterval resolves from a real Variable row and from an AIRFLOW_VAR_* env var; a missing Variable and an undecodable alert are isolated (DagRun still created, no Deadline row, error logged).
  • task-sdk/tests/task_sdk/definitions/test_deadline.py: coerce_to_timedelta validation (non-integer, <= 0, overflow).

Verified locally in Breeze: 8 passed (dagrun deadline/variable) + 14 passed (SDK TestVariableInterval).

Generated-by: Claude Code (Opus via Claude Code) on behalf of Sean Ghaeli

@seanghaeli seanghaeli force-pushed the feature/variable-interval-resolution branch 2 times, most recently from 7aee60c to 94825ec Compare June 23, 2026 20:54
@seanghaeli seanghaeli requested a review from potiuk as a code owner June 23, 2026 20:54
@seanghaeli seanghaeli marked this pull request as draft June 23, 2026 20:57
A ``DeadlineAlert`` configured with a ``VariableInterval`` is resolved when the
scheduler creates a DagRun, inside ``DAG._process_dagrun_deadline_alerts`` (which
runs under the ``prohibit_commit`` guard). Two problems are fixed:

1. Resolution now goes through the full secrets chain (env vars, configured
   secrets backends, then the metadata DB) via a dedicated
   ``_resolve_variable_interval`` helper, rather than ``Variable.get`` /
   ``begin_nested``. ``Variable.get`` and a SAVEPOINT release both commit on the
   scheduler's session, tripping ``prohibit_commit`` ("UNEXPECTED COMMIT") and
   silently dropping deadlines for every scheduled DagRun. The helper passes the
   scheduler session through to the metastore backend so the DB read happens
   without committing, and reading via the secrets chain (not the variable table
   directly) means ``AIRFLOW_VAR_*`` env vars and secrets backends resolve too.

2. Each deadline alert is isolated with a plain ``try``/``except`` (NOT
   ``begin_nested``). Creating a deadline is auxiliary to creating the DagRun; a
   single bad alert -- a missing/invalid backing Variable, or an undecodable
   serialized blob -- must never abort the DagRun and stop the DAG scheduling.

``VariableInterval.resolve`` is split into ``resolve`` + ``coerce_to_timedelta``
so the scheduler-side reader reuses the exact same validation (including the
OverflowError -> ValueError translation) without going through ``Variable.get``.

Generated-by: Claude Code (Sonnet/Opus via Claude Code) on behalf of Sean Ghaeli
@seanghaeli seanghaeli force-pushed the feature/variable-interval-resolution branch from 94825ec to 2342c03 Compare June 23, 2026 21:31
@seanghaeli seanghaeli marked this pull request as ready for review June 23, 2026 23:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant