These standards bind every DAG deployed to a Causeway Airflow environment. They are not recommendations. Exceptions require a documented waiver in the PR.

1. DAG shape

2. Idempotency (non-negotiable)

Danger

datetime.now() in task code is banned outright. A backfill that runs the task for last Tuesday but writes with "today's" date corrupts data in ways that take months to surface. If you need "right now" semantics, call pendulum.now() only at the task boundary (the start of execution); never embed it in the payload written to storage.
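A minimal sketch of the rule (the bucket and path are illustrative, not real): the partition key derives from the logical date Airflow injects into the task, so a backfill for last Tuesday writes to exactly the same place Tuesday's scheduled run would have.

```python
from datetime import datetime, timezone

def write_partition(logical_date: datetime) -> str:
    """Hypothetical task callable: the partition key comes from the
    Airflow-supplied logical date, never from the wall clock."""
    # Idempotent: re-running a backfill for any date yields the same path.
    partition = logical_date.strftime("%Y-%m-%d")
    return f"s3://causeway-data/events/ds={partition}/part-0000.parquet"

# Banned alternative: partition = datetime.now().strftime("%Y-%m-%d")
# would stamp a backfill of last Tuesday with today's date.
```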

3. start_date, schedule, catchup

4. Dependencies

See dependency types.

5. Retries

from datetime import timedelta

default_args = {
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
    "retry_exponential_backoff": True,
    "max_retry_delay": timedelta(hours=1),
    "on_failure_callback": alert_pagerduty,  # failure callback, defined elsewhere in the repo
}
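With retry_exponential_backoff enabled, the wait roughly doubles per attempt up to max_retry_delay. A back-of-envelope sketch of the schedule implied by the settings above (Airflow's actual backoff adds jitter; this is illustrative only):

```python
from datetime import timedelta

def backoff_delays(retries: int, base: timedelta, cap: timedelta) -> list[timedelta]:
    """Approximate wait before retry n: base * 2**n, capped at cap."""
    return [min(base * 2 ** n, cap) for n in range(retries)]

# retries=3, retry_delay=5 min, max_retry_delay=1 h
delays = backoff_delays(3, timedelta(minutes=5), timedelta(hours=1))
# waits of roughly 5, 10, and 20 minutes
```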

6. Sensors

7. Pools

8. Task callables

Warning

The single most common Airflow performance regression is a DAG file that imports pandas or boto3 at module scope. Every scheduler parse cycle (every 30 seconds by default) pays the import cost. Multiply that by 100 DAGs and the scheduler spends most of its time parsing, not scheduling. Imports inside task callables are deferred until actual task execution and cost nothing at parse time.
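The fix is mechanical. A sketch (load_to_s3 and its body are illustrative):

```python
# Module scope: cheap stdlib imports only. The scheduler re-parses
# this file every cycle and pays for everything imported here.
from datetime import timedelta

def load_to_s3(**context):
    # Heavy imports live inside the callable, so only a worker
    # actually executing the task pays the cost -- never the scheduler.
    import boto3  # hypothetical; any heavy dependency follows the same rule

    s3 = boto3.client("s3")
    ...
```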

9. Error handling

See error recovery.

10. Environment

11. Versioning

12. Deployment

13. Observability

14. Databricks integration (if applicable)

See the dbt + Cosmos guide for the full Cosmos config.

15. Unstructured data

16. Review checklist

PRs touching DAG code must satisfy:

See also