Under which category would you file this issue?
Task SDK
Apache Airflow version
3.2.1
What happened and how to reproduce it?
When a task whose operator sets overwrite_rtif_after_execution = True raises an exception during execute(), the task supervisor/finalize path attempts to update the rendered template fields after the failure has already been reported. The SDK then sends a request to the API server that is no longer valid for the TI's state and gets back:
AirflowRuntimeError: API_SERVER_ERROR: {'status_code': 404, 'message': 'Not Found', 'detail': {'detail': 'Not Found'}}
This surfaces as a top-level error in the task log right after the original RuntimeError, so the user sees two stacked tracebacks where they should see only the original failure. It also appears to affect remote logging: after a retry, users do not see the remote logs for the earlier failed attempts, possibly because the upload to remote logging is aborted or never happens (not confirmed).
This was originally reported by Cosmos users in astronomer/astronomer-cosmos#2021 because the Cosmos local execution operator opts in to overwrite_rtif_after_execution = True on Airflow 3.x to refresh the rendered compiled_sql after the dbt invocation. However, the failure is not Cosmos-specific: any operator that sets this flag and then raises will hit the same path.
#62070 wrapped the SetRenderedFields call in finalize() with try/except so the original task failure is not masked. #63705 then simplified the error logging to avoid a RecursionError in the structlog JSON fallback when the error context is logged. Even with both merged, on 3.2.1 we still see Failed to set rendered fields during finalization followed by AirflowRuntimeError: API_SERVER_ERROR: 404 Not Found. #63719 ("Only update RTIF for terminal task states") is another attempted fix, but it is still in draft.
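To illustrate why the log line still appears even though the original failure is no longer masked, here is a minimal, self-contained sketch of the try/except pattern described above. It is not the Task SDK code: ApiServerError, push_rendered_fields, finalize and run_task are hypothetical stand-ins, and the stand-in request is hard-coded to be rejected.

```python
import logging

log = logging.getLogger("rtif_sketch")


class ApiServerError(RuntimeError):
    """Hypothetical stand-in for the SDK's API_SERVER_ERROR failure."""


def push_rendered_fields(fields: dict) -> None:
    """Hypothetical stand-in for the SetRenderedFields request.

    Assumption: the server rejects the update, as in the report.
    """
    raise ApiServerError("API_SERVER_ERROR: 404 Not Found")


def finalize(fields: dict, overwrite_rtif: bool) -> bool:
    """Return True if the rendered fields were pushed, False otherwise."""
    if not overwrite_rtif:
        return False
    try:
        push_rendered_fields(fields)
    except Exception:
        # The secondary error is swallowed so it cannot replace the task's
        # own exception, but it is still logged with its traceback.
        log.exception("Failed to set rendered fields during finalization")
        return False
    return True


def run_task(execute, fields: dict, overwrite_rtif: bool = True):
    """Run execute(), always finalize, and return the original exception or None."""
    original = None
    try:
        execute()
    except Exception as exc:
        original = exc
    finally:
        finalize(fields, overwrite_rtif)
    return original


def boom():
    raise RuntimeError("Intentional failure")


err = run_task(boom, {"message": "hello"})
```

In this sketch the RuntimeError is preserved, yet the finalize step still issues the request and logs a second traceback, which is the behaviour the report describes on 3.2.1.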
Minimal reproduction (no Cosmos required)
# dags/repro_rtif_finalize.py
from __future__ import annotations

import pendulum

from airflow.sdk import DAG
from airflow.sdk.bases.operator import BaseOperator


class FailingOverwriteRTIFOperator(BaseOperator):
    """Minimal operator that triggers the finalize-time RTIF update path."""

    template_fields = ("message",)
    overwrite_rtif_after_execution = True

    def __init__(self, *, message: str = "hello {{ ds }}", **kwargs):
        super().__init__(**kwargs)
        self.message = message

    def execute(self, context):
        # Simulate any runtime failure during execute (DB error, network, etc.)
        raise RuntimeError("Intentional failure to reproduce RTIF finalize bug")


with DAG(
    dag_id="repro_rtif_finalize",
    start_date=pendulum.datetime(2026, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
):
    FailingOverwriteRTIFOperator(task_id="boom")
How to reproduce
- Drop the DAG above into a fresh Airflow 3.x environment (no special executor or logging configuration required).
- Trigger repro_rtif_finalize once.
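- Look at the task log for the first attempt. You will see the intended RuntimeError from execute(), followed by:
  Failed to set rendered fields during finalization
  ...
  AirflowRuntimeError: API_SERVER_ERROR: 404 Not Found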
What you think should happen instead?
A failing task whose operator declares overwrite_rtif_after_execution = True should not produce a finalize-time AirflowRuntimeError. Conceptually, rendered template fields should not be re-pushed to the API server when the TI has already moved into a failure state for which that endpoint is not valid. This matches the direction of #63719, though a better solution may exist.
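The expected behaviour can be sketched as a guard evaluated before the update is sent. This is only an illustration under stated assumptions: TIState, RTIF_UPDATE_REJECTED and should_push_rtif are hypothetical names, and the set of states for which the endpoint rejects the update is assumed here, not taken from the API server.

```python
from enum import Enum


class TIState(str, Enum):
    """Hypothetical subset of task instance states, for illustration only."""

    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    UP_FOR_RETRY = "up_for_retry"


# Assumption: states for which the rendered-fields endpoint is no longer valid.
RTIF_UPDATE_REJECTED = frozenset({TIState.FAILED, TIState.UP_FOR_RETRY})


def should_push_rtif(overwrite_rtif: bool, state: TIState) -> bool:
    """Decide whether the finalize step should send the rendered fields."""
    return overwrite_rtif and state not in RTIF_UPDATE_REJECTED
```

With such a guard the finalize step would skip the request for the failing task in the reproduction, so only the original RuntimeError would reach the task log.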
Operating System
No response
Deployment
None
Apache Airflow Provider(s)
No response
Versions of Apache Airflow Providers
No response
Official Helm Chart version
Not Applicable
Kubernetes Version
No response
Helm Chart configuration
No response
Docker Image customizations
No response
Anything else?
cosmos/operators/local.py, _override_rtif (self.overwrite_rtif_after_execution = True on Airflow 3.x) -> https://github.com/astronomer/astronomer-cosmos/blob/d33115b69da5573b33123c310a5a7b6fbc02a364/cosmos/operators/local.py#L420
Are you willing to submit PR?
Code of Conduct