Airflow has six distinct concurrency controls. They interact. Tuning the wrong one is the most common cause of "why is my DAG queueing even though workers are idle".
## The six knobs
| # | Knob | Scope | What it limits |
|---|---|---|---|
| 1 | `parallelism` | Whole cluster | Total task instances across all DAGs |
| 2 | `max_active_tasks_per_dag` | One DAG | Tasks of one DAG running simultaneously |
| 3 | `max_active_runs_per_dag` | One DAG | Runs of one DAG live in parallel |
| 4 | `max_active_tis_per_dagrun` | One task | Task instances within a single run (dynamic mapping) |
| 5 | Pools | Across DAGs | Gates against an external rate-limited resource |
| 6 | `pool_slots` | Task-level | Slots this task consumes from its pool |
Tune in order: cluster → DAG → run → mapping → pool. A slot consumed at any higher level blocks every lower one.
## 1. `parallelism`

Cluster-wide upper bound.

```ini
# airflow.cfg
[core]
parallelism = 64
```

Or via environment variable:

```shell
AIRFLOW__CORE__PARALLELISM=64
```
- Default: 32 (OSS); often higher on Astronomer.
- Set to the actual capacity of your worker fleet. If you have 4 workers with 16 concurrency each, `parallelism = 64` is the ceiling.
- Raising it above worker capacity does nothing; tasks queue.
- Lowering it throttles the whole cluster regardless of worker availability.
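The "set to worker capacity" rule reduces to a `min()`: the scheduler stops at `parallelism`, and the workers stop at their combined concurrency. A minimal sketch (the worker counts here are hypothetical, not read from any Airflow API):

```python
def effective_ceiling(parallelism: int, workers: int, worker_concurrency: int) -> int:
    """Tasks actually able to run at once: the lower of the scheduler's
    `parallelism` cap and the worker fleet's total concurrency."""
    return min(parallelism, workers * worker_concurrency)

# 4 workers x 16 slots each, parallelism = 64: the two limits meet exactly.
print(effective_ceiling(parallelism=64, workers=4, worker_concurrency=16))   # 64

# Raising parallelism to 128 changes nothing; the workers are the bottleneck.
print(effective_ceiling(parallelism=128, workers=4, worker_concurrency=16))  # 64
```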
## 2. `max_active_tasks_per_dag`

Per-DAG ceiling on concurrent tasks.

```python
with DAG(
    dag_id="fat_dag",
    max_active_tasks=8,
    ...
) as dag:
    ...
```
- Default: 16 (inherits from `dag_concurrency` in some versions).
- Use to prevent one DAG from monopolizing the cluster during a burst.
- Not a substitute for pools. If the goal is "do not hit Salesforce with more than 5 concurrent requests", use a pool, not a per-DAG limit.
## 3. `max_active_runs_per_dag`

How many runs of one DAG can be in flight at the same time.

```python
with DAG(
    dag_id="stateful_dag",
    max_active_runs=1,
    ...
) as dag:
    ...
```
- Default: 16.
- Set to 1 for stateful DAGs. An hourly DAG that writes to `fct_orders` partitioned by hour should never overlap runs; if it does, the two runs compete to write the same partition and corrupt state.
- Leave higher for DAGs whose runs are truly independent (per-tenant ingest, fan-out over a list).
> **Warning**
> `max_active_runs=1` is the most commonly missed setting in the wild. Backfills that execute runs in parallel regularly race on shared state. If two runs of the same DAG write to the same table, you need `max_active_runs=1`, always.
## 4. `max_active_tis_per_dagrun`

Limits the concurrency of a single task within one run. Important for dynamically mapped tasks.

```python
from airflow.decorators import task

@task(max_active_tis_per_dagrun=5)
def process_file(uri: str):
    ...

process_file.expand(uri=get_file_list())
```
If `get_file_list` returns 100 URIs, Airflow creates 100 mapped task instances; without this limit it runs as many of them concurrently as the other limits allow. With `max_active_tis_per_dagrun=5`, at most 5 are active at once.
- Use when a dynamic map could fan out beyond the cluster's appetite.
- Combine with pools when the bottleneck is external (API rate limit, warehouse capacity).
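Back-of-envelope: with N mapped instances and a per-run cap, the map drains in roughly ceil(N / cap) sequential "waves", assuming uniform task duration. A throwaway helper to size the cap:

```python
import math

def waves(mapped_instances: int, cap: int) -> int:
    # Sequential waves needed to drain a dynamically mapped task when
    # at most `cap` instances run at once (uniform-duration assumption).
    return math.ceil(mapped_instances / cap)

print(waves(100, 5))    # 20 waves of 5 concurrent instances
print(waves(100, 100))  # 1 wave: everything at once
```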
## 5. Pools

Cross-DAG gates against a shared external resource.

```python
# Define the pool via UI, CLI, or a seeding DAG:
#   airflow pools set salesforce_api 5 "Max concurrent SF API calls"

# Reference it from every task that hits the resource
extract = PythonOperator(
    task_id="extract_from_sf",
    pool="salesforce_api",
    pool_slots=1,
    python_callable=fetch,
)
```
Pool semantics:

- Slots represent concurrent capacity against the resource; `slots=5` means 5 concurrent tasks max.
- Scoped across DAGs: every DAG that references the pool shares its budget.
- Tasks queue when the pool is exhausted; they wait for a slot rather than fail.
- The default pool (`default_pool`) has 128 slots; tasks that don't specify a pool use it.
### When to use a pool, not per-DAG concurrency
| Goal | Use |
|---|---|
| "Do not run more than 5 concurrent tasks against Salesforce, across all DAGs" | Pool `salesforce_api` with 5 slots |
| "Do not let this DAG run more than 5 tasks at once" | `max_active_tasks=5` on the DAG |
| "Do not overlap runs of the stateful daily ETL" | `max_active_runs=1` on the DAG |
## 6. `pool_slots`

A single task can claim multiple slots from a pool. Useful for heavy tasks that should count as multiple:

```python
heavy_export = PythonOperator(
    task_id="heavy_export",
    pool="warehouse_slots",
    pool_slots=3,  # this task counts as 3 concurrent tasks
    python_callable=export,
)
```
With pool_slots=3 in a pool of 10, heavy_export occupies 3 of 10 slots; lighter tasks in the same pool share the remaining 7.
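The slot arithmetic can be sketched as greedy admission: admit queued tasks while the pool has room for their `pool_slots` claim. This is illustrative only; the real scheduler's ordering rules (priority weights, etc.) are more involved:

```python
def admit(queued, capacity: int):
    """Greedy slot accounting: each queued (name, pool_slots) pair is
    admitted if it fits in the remaining capacity, else it waits."""
    running, waiting, used = [], [], 0
    for name, slots in queued:
        if used + slots <= capacity:
            running.append(name)
            used += slots
        else:
            waiting.append(name)
    return running, waiting

# heavy_export claims 3 of 10 slots; seven 1-slot tasks fill the rest.
queued = [("heavy_export", 3)] + [(f"light_{i}", 1) for i in range(8)]
running, waiting = admit(queued, capacity=10)
print(len(running), waiting)  # 8 ['light_7']
```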
## The interaction matrix

Tasks must satisfy every applicable limit to run. A task in DAG A that references pool P with 1 slot requires:

- A free slot in `parallelism` (cluster).
- A free slot in DAG A's `max_active_tasks_per_dag`.
- The run being within `max_active_runs_per_dag`.
- If dynamically mapped: a free slot in `max_active_tis_per_dagrun`.
- Enough slots in P for its `pool_slots`.

Any bottleneck queues the task.
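The checklist above amounts to a conjunction: every limit must have headroom, and any one without it queues the task. A sketch of that predicate — the dict shapes and names here are illustrative, not Airflow internals:

```python
def can_run(task, cluster, dag, run, pool) -> bool:
    """True only if every applicable limit has headroom. Each argument
    is a dict of used/limit counters; `task` carries the slots it would
    claim. Purely illustrative, not the scheduler's actual logic."""
    checks = [
        cluster["used"] + 1 <= cluster["limit"],          # parallelism
        dag["used"] + 1 <= dag["limit"],                  # max_active_tasks_per_dag
        run["active_runs"] <= run["max_active_runs"],     # run itself admitted
        task.get("mapped_active", 0) + 1
            <= task.get("max_active_tis_per_dagrun", float("inf")),
        pool["used"] + task.get("pool_slots", 1) <= pool["limit"],
    ]
    return all(checks)

# Pool P has 1 slot and it is taken: the task queues even though the
# cluster, the DAG, and the run all have plenty of headroom.
blocked = can_run(
    task={"pool_slots": 1},
    cluster={"used": 10, "limit": 64},
    dag={"used": 2, "limit": 16},
    run={"active_runs": 1, "max_active_runs": 16},
    pool={"used": 1, "limit": 1},
)
print(blocked)  # False
```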
## Schedule choices

Related to concurrency: what triggers the DAG at all.

| Schedule | Meaning | Best for |
|---|---|---|
| `"@hourly"`, `"@daily"`, cron | Time-based | Clock-driven pipelines |
| `timedelta(minutes=30)` | Interval since last run | "Every X time" with backpressure |
| `[asset]` or `[asset_a, asset_b]` | Asset-based | Cross-DAG event-driven |
| `AssetWatcher(...)` | External event | Outside-world triggers |
| `None` | Manual only | On-demand workflows |
See dependency types for which to pick.
## Catchup

```python
catchup=False  # always, in production
```
`catchup=True` replays every missed interval since `start_date`. A DAG paused for a week dumps 168 hourly runs into the queue the moment you unpause. Always `catchup=False` in production; handle intentional backfills explicitly with `airflow dags backfill`.
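The 168 figure is just the number of schedule intervals that elapsed while the DAG was paused — worth computing before unpausing anything with `catchup=True`:

```python
from datetime import datetime, timedelta

def missed_runs(paused_at: datetime, unpaused_at: datetime, interval: timedelta) -> int:
    # Schedule intervals elapsed while paused; with catchup=True each
    # one becomes a queued DAG run the moment you unpause.
    return int((unpaused_at - paused_at) / interval)

paused = datetime(2024, 1, 1)
unpaused = paused + timedelta(days=7)
print(missed_runs(paused, unpaused, timedelta(hours=1)))  # 168
```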
## Sensor discipline

Sensors interact badly with concurrency if you get them wrong. The rules:

- Always use deferrable (async) variants. They release the worker slot via the Triggerer process.
- Set a realistic `timeout`; 2× expected upstream latency is a good start.
- Set a sane `poke_interval`. Every 60 seconds is usually right; every 5 seconds usually DoSes the upstream system.
> **Danger**
> A synchronous sensor with `timeout=None` is a worker-starvation bomb. It holds a slot forever, and the first time it stalls, your Airflow cluster becomes a sensor-holding pen for every remaining DAG. Deferrable sensors are not optional; they are the baseline.
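What "deferrable" buys you, in miniature: the waiting happens in an async event loop (the Triggerer), where thousands of idle waits cost almost nothing, instead of each one pinning a worker slot. A toy poll-with-timeout loop to show the shape — this is not Airflow's actual trigger API:

```python
import asyncio

async def wait_for_condition(check, poke_interval: float, timeout: float) -> bool:
    """Poll `check()` every `poke_interval` seconds until it returns True
    or `timeout` elapses. Many such coroutines share one event loop,
    which is why deferred sensors don't hold worker slots."""
    async def poll():
        while not check():
            await asyncio.sleep(poke_interval)
        return True
    try:
        return await asyncio.wait_for(poll(), timeout=timeout)
    except asyncio.TimeoutError:
        return False  # a real deferrable sensor would fail the task here

# Hypothetical condition that becomes true on the third poll.
state = {"polls": 0}
def check():
    state["polls"] += 1
    return state["polls"] >= 3

print(asyncio.run(wait_for_condition(check, poke_interval=0.01, timeout=1.0)))  # True
```

Note the timeout is enforced in the same place as the polling; that is the discipline the bullet list above asks for.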
## Tuning recipes

### Symptom: tasks queued even though workers are idle

Diagnosis: one of the six limits is capping. Check in order:

1. `airflow config get-value core parallelism` for the cluster ceiling.
2. Review the DAG's `max_active_tasks` and `max_active_runs`.
3. Check pools with `airflow pools list`. Any at 100% utilization?
4. Check for `max_active_tis_per_dagrun` on dynamically mapped tasks.
### Symptom: one DAG monopolizing workers

Diagnosis: probably no `max_active_tasks_per_dag` set; one DAG bursts to fill the cluster.

Fix: add `max_active_tasks=N` to the greedy DAG.
### Symptom: duplicate writes in a backfill

Diagnosis: `max_active_runs > 1` on a stateful DAG.

Fix: set `max_active_runs=1`.
### Symptom: API rate limits hit during simultaneous DAG runs

Diagnosis: no pool; multiple DAGs independently exceed the rate limit.

Fix: create a pool for the API; every task referencing it shares the budget.
### Symptom: async sensors never fire

Diagnosis: no Triggerer process running.

Fix: deploy the Triggerer. Astronomer defaults it on; OSS installs must enable it explicitly.
## Config file reference

Common settings:

```ini
[core]
parallelism = 64
default_task_retries = 2
max_active_tasks_per_dag = 16
max_active_runs_per_dag = 16
dagbag_import_timeout = 120

[scheduler]
scheduler_heartbeat_sec = 5
min_file_process_interval = 60
dag_dir_list_interval = 300
parsing_processes = 4
schedule_after_task_execution = True

[celery]  # for CeleryExecutor
worker_concurrency = 16
worker_autoscale = 16,4

[webserver]
rbac = True
web_server_worker_timeout = 120
expose_config = False

[logging]
remote_logging = True
remote_log_conn_id = aws_default
remote_base_log_folder = s3://airflow-logs/{env}/
```
On Astronomer, set these via environment variables with the `AIRFLOW__` prefix.
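The config-to-environment-variable mapping is mechanical — `AIRFLOW__<SECTION>__<KEY>`, both upper-cased — so it can be generated rather than typed:

```python
def to_env_var(section: str, key: str) -> str:
    # airflow.cfg [section] key  ->  AIRFLOW__SECTION__KEY
    return f"AIRFLOW__{section.upper()}__{key.upper()}"

print(to_env_var("core", "parallelism"))             # AIRFLOW__CORE__PARALLELISM
print(to_env_var("scheduler", "parsing_processes"))  # AIRFLOW__SCHEDULER__PARSING_PROCESSES
```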
## See also
- Error recovery — pools, retries, sensors in context.
- DAG authoring guide — where these knobs are set on a real DAG.
- Production readiness — the items on the checklist that cover concurrency tuning.