The most expensive mistake teams make with Airflow is using it as a compute engine. Pandas transforms inside PythonOperator. Business logic in task callables. 5 GB payloads pushed through XCom. At scale, every one of those choices becomes an operational tax.

Airflow's job is to supervise work that happens elsewhere. The work itself runs on the system built to run it.

The five things a supervisor does

  1. Decides when a unit of work should run — schedule, event, asset update, or manual trigger.
  2. Triggers the right external system — Databricks, dbt, an API, a Lambda.
  3. Tracks success, failure, and retry state — the metadata database is the authoritative source of "did this run".
  4. Coordinates dependencies across systems — DAG A, then B, then C, where each runs on different compute.
  5. Emits metadata downstream — lineage, run results, SLA signals, observability events.

That is the whole surface. Everything else should happen elsewhere.
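
The five duties map almost one-to-one onto lines of a DAG file. A minimal sketch, assuming Airflow 3's Asset API and a Databricks job; the job IDs, connection name, and asset URI are all hypothetical:

```python
# Each supervisor duty is one line of DAG code; no data is processed here.
import pendulum
from airflow.decorators import dag
from airflow.sdk import Asset  # Airflow 3; Dataset on 2.x
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

@dag(
    schedule="@daily",                       # 1. decides when to run
    start_date=pendulum.datetime(2024, 1, 1),
    catchup=False,
)
def orders_pipeline():
    ingest = DatabricksRunNowOperator(       # 2. triggers the external system
        task_id="ingest_orders",
        databricks_conn_id="databricks_default",
        job_id="{{ var.value.ingest_orders_job_id }}",
        deferrable=True,
        retries=2,                           # 3. tracks success/failure/retry state
    )
    transform = DatabricksRunNowOperator(
        task_id="transform_orders",
        databricks_conn_id="databricks_default",
        job_id="{{ var.value.transform_orders_job_id }}",
        deferrable=True,
        outlets=[Asset("s3://bucket/orders_enriched.parquet")],  # 5. emits metadata
    )
    ingest >> transform                      # 4. coordinates dependencies

orders_pipeline()
```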

The engine anti-pattern

Compare two implementations of the same pipeline.

The anti-pattern: work inside the worker

@task
def transform_orders():
    import pandas as pd
    # The entire 5 GB file is pulled into the worker's memory...
    df = pd.read_parquet("s3://bucket/orders_raw.parquet")
    df["revenue"] = df["amount"] * df["fx_rate"]
    df["is_active"] = df["status"] == "active"
    # ...and written back out, holding the worker slot the whole time
    df.to_parquet("s3://bucket/orders_enriched.parquet")

What is wrong:

  1. The 5 GB DataFrame lives in the worker's memory, so every worker must be sized for the largest task it might run.
  2. The worker slot is occupied for the full duration of the transform, starving other tasks.
  3. A retry reruns the entire download-transform-upload cycle from scratch.
  4. Scaling the pipeline now means scaling Airflow workers instead of the compute that should own the work.

The right pattern: supervise, don't execute

from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

transform = DatabricksRunNowOperator(
    task_id="transform_orders",
    databricks_conn_id="databricks_default",
    job_id="{{ var.value.transform_orders_job_id }}",
    deferrable=True,   # release the worker slot while Databricks runs
)

The 5 GB transform runs on a Databricks cluster sized for it. Airflow holds no memory. The task is deferrable, so the worker slot is free for other work while Databricks does the heavy lifting. If the task fails, Airflow retries; if the Databricks job fails, Databricks's own retry logic kicks in first.
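
The Airflow-side retry policy stays on the operator. A sketch of layering exponential backoff on top of the Databricks job's own retries, using standard BaseOperator parameters (job ID and connection name as in the example above):

```python
from datetime import timedelta
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

transform = DatabricksRunNowOperator(
    task_id="transform_orders",
    databricks_conn_id="databricks_default",
    job_id="{{ var.value.transform_orders_job_id }}",
    deferrable=True,
    retries=2,                          # Airflow retries only after Databricks gives up
    retry_delay=timedelta(minutes=5),
    retry_exponential_backoff=True,     # 5 min, then 10 min between attempts
)
```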

Important

This distinction drives every other decision in Airflow. Deferrable operators, pools, Assets, concurrency knobs, retry backoff: all of them make sense only once you accept that Airflow is supervising work, not doing it. Teams that treat Airflow as a compute engine spend the bulk of their time fighting the scheduler. Teams that treat it as a supervisor spend their time building pipelines.

The rule of thumb

If a task takes 45 minutes and processes 5 GB of data inside the worker, you have built an ETL engine out of a scheduler. Stop; find the right engine.

Right engines, common cases:

What the task does           | Right engine
-----------------------------|------------------------------------------------------
Transforms on lakehouse data | Databricks job (via DatabricksRunNowOperator)
SQL on a warehouse           | Warehouse job or dbt model, triggered from Airflow
Python batch on small data   | Still Airflow, but tight and short
ML training                  | Dedicated GPU compute (Modal, SageMaker, Databricks GPU cluster)
API call with retries        | Airflow, but deferrable with an async operator
File processing at scale     | Databricks, Glue, or a Lambda fleet; Airflow triggers and waits
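
For the warehouse row, a sketch of triggering a dbt Cloud job and deferring while it runs; the connection name and job ID here are hypothetical:

```python
from airflow.providers.dbt.cloud.operators.dbt import DbtCloudRunJobOperator

run_dbt = DbtCloudRunJobOperator(
    task_id="run_dbt_models",
    dbt_cloud_conn_id="dbt_cloud_default",  # hypothetical connection
    job_id=12345,                           # hypothetical dbt Cloud job
    wait_for_termination=True,
    deferrable=True,    # free the worker slot while dbt Cloud runs the models
)
```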

The "what goes in Airflow" test

Three questions for any logic you are considering putting in a DAG file or task callable:

  1. Is this decision logic (when to run, what to run, what came before)? → Airflow.
  2. Is this metadata emission (writing lineage, logging business signal, alerting)? → Airflow.
  3. Is this computational work on non-trivial data (joins, aggregations, transforms)? → Not Airflow. Trigger the system that owns that work and wait on completion.

When in doubt, err on pushing work out. Airflow's value is consistency as an orchestrator. Taking on compute responsibilities erodes that consistency.
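
Question 1, decision logic, is worth an example. A sketch of routing that belongs in Airflow: a pure function an @task.branch task would call to pick a downstream path based on metadata, not data. The names here are hypothetical.

```python
# Decision logic: choose which downstream task runs, using a row count
# from metadata -- no data is touched. Task IDs are hypothetical.
def choose_engine(row_count: int) -> str:
    """Return the task_id of the path to follow."""
    return "warehouse_load" if row_count < 100_000 else "databricks_load"

# Inside a DAG this would be wrapped as:
#
#   @task.branch
#   def route(row_count: int) -> str:
#       return choose_engine(row_count)
```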

A legitimate small exception

Not every piece of work deserves its own cluster. A 30-second Python script that touches a few hundred rows and writes a status record is fine inside a task. The rule is "one unit of recoverable work per task," not "never touch data in Python".

The heuristic: if the task runtime dominates the scheduler orchestration overhead, find the right engine. If the task is ten seconds of trivial glue, keep it in Airflow.
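
What "ten seconds of trivial glue" looks like in practice: a small helper assembling the status record a glue task would write. The record schema is hypothetical.

```python
# A legitimate in-worker task: seconds of work on a handful of rows.
import datetime

def build_status_record(run_id: str, rows_loaded: int) -> dict:
    """Assemble the small status payload a glue task would persist."""
    return {
        "run_id": run_id,
        "rows_loaded": rows_loaded,
        "recorded_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

# In the DAG this stays a short task:
#
#   @task
#   def write_status(run_id: str, rows_loaded: int):
#       record = build_status_record(run_id, rows_loaded)
#       ...  # insert into the status table
```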

What this means for DAG design

Once you internalize the supervisor model, DAG design becomes mostly a question of shape: which external systems to trigger, in what order, and with what dependencies between them.

Every other section of the Airflow documentation makes more sense with this frame in place.
