Task with overwrite_rtif_after_execution = True raises API_SERVER_ERROR: 404 Not Found in finalize when execute() fails #66416

Description

@pankajkoti

Under which category would you file this issue?

Task SDK

Apache Airflow version

3.2.1

What happened and how to reproduce it?

When a task whose operator sets overwrite_rtif_after_execution = True raises an exception during execute(), the task supervisor/finalize path attempts to update the rendered template fields after the failure has already been reported. The SDK then sends a request to the API server that is no longer valid for the TI's state and gets back:
AirflowRuntimeError: API_SERVER_ERROR: {'status_code': 404, 'message': 'Not Found', 'detail': {'detail': 'Not Found'}}

This surfaces as a top-level error in the task log right after the original RuntimeError, so the user sees two stacked tracebacks where they should see only the original failure. It also appears to affect remote logging: on retries, users do not see the remote logs for the earlier failed attempts, possibly because the upload to remote logging is aborted or never happens (not yet confirmed).

This was originally reported by Cosmos users in astronomer/astronomer-cosmos#2021 because the Cosmos local execution operator opts in to overwrite_rtif_after_execution = True on Airflow 3.x to refresh the rendered compiled_sql after the dbt invocation. However, the failure is not Cosmos-specific: any operator that sets this flag and then raises will hit the same path.

#62070 wrapped the SetRenderedFields call in finalize() with try/except so the original task failure is not masked. #63705 then simplified the error logging to avoid a RecursionError in the structlog JSON fallback when the error context is logged. Even with both merged, on 3.2.1 we still see "Failed to set rendered fields during finalization" followed by AirflowRuntimeError: API_SERVER_ERROR: 404 Not Found. #63719 ("Only update RTIF for terminal task states") is another attempted fix, but it is still in draft.
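To illustrate why the 404 still shows up in the log even with the try/except in place, here is a minimal, self-contained sketch of the pattern described above. It is not the Airflow source: `ApiServerError`, `set_rendered_fields`, and this `finalize` signature are hypothetical stand-ins, and the assumption that the server answers 404 once the TI is reported as failed is taken from the behaviour observed in this issue.

```python
import logging

log = logging.getLogger("rtif_sketch")


class ApiServerError(RuntimeError):
    """Hypothetical stand-in for the SDK's AirflowRuntimeError."""


def set_rendered_fields(ti_state: str) -> None:
    """Hypothetical stand-in for the SetRenderedFields request.

    Assumption: the server rejects the update with a 404 once the
    task instance has already been reported as failed.
    """
    if ti_state == "failed":
        raise ApiServerError("API_SERVER_ERROR: 404 Not Found")


def finalize(ti_state: str, overwrite_rtif_after_execution: bool) -> bool:
    """Return True if the RTIF update went through, False otherwise."""
    if not overwrite_rtif_after_execution:
        return False
    try:
        set_rendered_fields(ti_state)
    except Exception:
        # The original task failure is no longer masked, but the
        # traceback of the 404 is still written to the task log.
        log.exception("Failed to set rendered fields during finalization")
        return False
    return True
```

In this sketch the wrapper keeps finalize() from raising, yet the request is still sent for a failed TI, so the second traceback remains visible to the user.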

Minimal reproduction (no Cosmos required)

# dags/repro_rtif_finalize.py
from __future__ import annotations

import pendulum
from airflow.sdk import DAG
from airflow.sdk.bases.operator import BaseOperator


class FailingOverwriteRTIFOperator(BaseOperator):
    """Minimal operator that triggers the finalize-time RTIF update path."""

    template_fields = ("message",)
    overwrite_rtif_after_execution = True

    def __init__(self, *, message: str = "hello {{ ds }}", **kwargs):
        super().__init__(**kwargs)
        self.message = message

    def execute(self, context):
        # Simulate any runtime failure during execute (DB error, network, etc.)
        raise RuntimeError("Intentional failure to reproduce RTIF finalize bug")


with DAG(
    dag_id="repro_rtif_finalize",
    start_date=pendulum.datetime(2026, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
):
    FailingOverwriteRTIFOperator(task_id="boom")

How to reproduce

  1. Drop the DAG above into a fresh Airflow 3.x environment (no special executor or logging configuration required).
  2. Trigger repro_rtif_finalize once.
  3. Look at the task log for the first attempt. You will see the intended RuntimeError from execute(), followed by the finalize-time AirflowRuntimeError: API_SERVER_ERROR: 404 Not Found quoted above.

What you think should happen instead?

A failing task whose operator declares overwrite_rtif_after_execution = True should not produce a finalize-time AirflowRuntimeError. Conceptually, rendered template fields should not be re-pushed to the API server when the TI has already moved into a failure state for which that endpoint is not valid. This matches the direction of #63719, though a better solution may exist.
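One possible shape of such a guard, as a rough sketch only: decide before sending the request whether the TI's final state still allows an RTIF update. The `TIState` enum, the `RTIF_UPDATE_BLOCKED` set, and `should_update_rtif` are hypothetical names for illustration, and the exact set of states the endpoint rejects is an assumption, not something confirmed in this issue or in #63719.

```python
from enum import Enum


class TIState(str, Enum):
    """Hypothetical subset of task instance states used for this sketch."""

    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    UP_FOR_RETRY = "up_for_retry"


# Assumption: states in which the RTIF endpoint no longer accepts the update.
RTIF_UPDATE_BLOCKED = frozenset({TIState.FAILED, TIState.UP_FOR_RETRY})


def should_update_rtif(overwrite_flag: bool, state: TIState) -> bool:
    """Send the RTIF update only if the operator opted in and the state allows it."""
    return overwrite_flag and state not in RTIF_UPDATE_BLOCKED
```

With a check like this, a task that failed in execute() would skip the request entirely, so neither the 404 nor its traceback would reach the task log.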

Operating System

No response

Deployment

None

Apache Airflow Provider(s)

No response

Versions of Apache Airflow Providers

No response

Official Helm Chart version

Not Applicable

Kubernetes Version

No response

Helm Chart configuration

No response

Docker Image customizations

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Labels

area:core, area:task-sdk, kind:bug, priority:high
