A dbt project is production-ready when a steward can step away from it and nothing catches fire in their absence. This document is the checklist that makes that true.
## 1. Project hygiene
- [ ] Three-layer structure (`staging/`, `intermediate/`, `marts/`); no fourth layer.
- [ ] All models conform to naming standards (see Model authoring).
- [ ] `dbt_project.yml` declares per-layer defaults (materialization, schema, tags).
- [ ] `packages.yml` pins versions; no floating `latest`.
- [ ] `dbt compile` passes on a fresh clone with no warnings.
- [ ] `dbt parse` runs in under 5 seconds at the current project size.
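The per-layer defaults item can be sketched in `dbt_project.yml`. This is a sketch, not a template: the project name `causeway_analytics` and the schema/tag values are hypothetical.

```yaml
# Hypothetical project name and values; adjust to your layout.
models:
  causeway_analytics:
    staging:
      +materialized: view
      +schema: staging
      +tags: ["staging"]
    intermediate:
      +materialized: ephemeral
      +tags: ["intermediate"]
    marts:
      +materialized: table
      +schema: marts
      +tags: ["marts"]
```

Per-model overrides then become visible diffs against these defaults, which is exactly what the "exceptions documented" items below rely on.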
## 2. Environment isolation
- [ ] Unity Catalog catalogs per environment: `dev`, `staging`, `prod`.
- [ ] Schemas carve domains inside each catalog.
- [ ] `generate_schema_name` macro produces per-developer schemas in dev, unprefixed schemas in prod.
- [ ] `profiles.yml` uses env vars for secrets; no literal tokens in any committed file.
```jinja
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if target.name == 'prod' -%}
        {{ custom_schema_name | default(target.schema, true) }}
    {%- else -%}
        {{ target.schema }}{% if custom_schema_name %}_{{ custom_schema_name }}{% endif %}
    {%- endif -%}
{%- endmacro %}
```
> **Danger**
> Before any prod deploy, confirm `generate_schema_name` is in place. Without it, a developer's `dbt run` can silently write to prod schemas named the same as their dev schemas. The fastest way to lose confidence in a data platform is to find out a developer's test data ended up in the executive dashboard.
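The env-var rule for `profiles.yml` can be sketched as follows. The profile name and env-var names are hypothetical; the field names are the dbt-databricks adapter's standard connection fields.

```yaml
causeway:
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: dev
      schema: "dev_{{ env_var('USER') }}"
      host: "{{ env_var('DBX_HOST') }}"
      http_path: "{{ env_var('DBX_HTTP_PATH') }}"
      token: "{{ env_var('DBX_TOKEN') }}"  # never a literal token
```

`env_var` fails loudly when the variable is unset, which is the behavior you want: a missing secret should stop the run, not fall back to something shared.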
## 3. Materialization choices are justified
- [ ] Staging: all `view`. Any exceptions documented.
- [ ] Intermediate: `ephemeral` unless referenced by 3+ downstream models (then `view`).
- [ ] Marts: `table` by default; `incremental` only when a full refresh exceeds ~10 minutes or costs more than the team is willing to absorb nightly; `materialized_view` for large aggregations Databricks can maintain; `streaming_table` for latency-critical paths.
- [ ] Every `partition_by` has an isolation justification; otherwise use liquid clustering.
- [ ] Named compute defined in `profiles.yml`; heavy models pinned to a larger warehouse.
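An incremental mart justified under the ~10-minute rule might be configured like this. Model, column, and key names are hypothetical, and `liquid_clustered_by` is the dbt-databricks config name in recent adapter versions; verify against the version you run.

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='order_id',
    liquid_clustered_by='order_date'
) }}

select *
from {{ ref('stg_orders') }}
{% if is_incremental() %}
-- Only process rows newer than what the target table already holds
where order_date >= (select max(order_date) from {{ this }})
{% endif %}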
## 4. Testing
- [ ] Generic data tests on every staging primary key (`not_null` + `unique`).
- [ ] Generic data tests on every mart grain key.
- [ ] `relationships` tests on every foreign key in marts.
- [ ] Unit tests (dbt 1.8+) on every model containing non-trivial logic: window functions, case-when chains, regex, date math, SCD.
- [ ] Singular tests for domain-specific invariants (e.g., "no refund greater than its order").
- [ ] Model contracts enforced on every mart that BI or services consume; versioned via `versions:` for breaking changes.
- [ ] `store_failures: true` at the project level so failed rows are inspectable in `dbt_test_audit`.
Unit tests run in dev and CI. Never in prod. dbt Labs is explicit on this.
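A dbt 1.8+ unit test on a hypothetical date-math model might look like the sketch below; the model, columns, and expected rows are illustrative.

```yaml
unit_tests:
  - name: test_fct_orders_late_flag
    model: fct_orders
    given:
      - input: ref('stg_orders')
        rows:
          - {order_id: 1, shipped_at: "2024-01-10", promised_at: "2024-01-08"}
          - {order_id: 2, shipped_at: "2024-01-05", promised_at: "2024-01-08"}
    expect:
      rows:
        - {order_id: 1, is_late: true}
        - {order_id: 2, is_late: false}
```

Because the inputs are fixtures, the test runs against compiled SQL without touching warehouse data, which is why it belongs in dev and CI rather than prod.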
## 5. CI pipeline
- [ ] Every push runs `dbt parse && dbt compile` in under 30 seconds.
- [ ] Every PR runs SQLFluff with the `dbt` templater.
- [ ] Every PR runs unit tests: `dbt test --select test_type:unit`.
- [ ] Every PR runs Slim CI: `dbt build --select state:modified+ --defer --state ./prod-manifest/ --favor-state` into a PR-specific schema.
- [ ] PR schemas are torn down on merge or timeout.
- [ ] PR CI finishes within 5 minutes for an average change.
See Slim CI guide for the canonical pipeline.
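Under the hood, the Slim CI step amounts to roughly the following; the `ci` target is hypothetical (its schema should encode the PR number, via `generate_schema_name` or similar), and the manifest path matches the artifact location used in section 6.

```
# Fetch the prod manifest uploaded by the last successful deploy
aws s3 cp s3://dbt-artifacts/prod/manifest.json ./prod-manifest/manifest.json

# Build only modified nodes and their children;
# unchanged refs are deferred to the prod relations
dbt build \
  --select state:modified+ \
  --defer --state ./prod-manifest/ --favor-state \
  --target ci
```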
## 6. Prod deploy
- [ ] Single source of truth for when prod runs (Airflow DAG, dbt Cloud job, Databricks Workflow).
- [ ] Prod run uses `dbt build` (never a `dbt run` + `dbt test` split).
- [ ] `manifest.json` uploaded to object storage after every successful prod run. CI reads this file; it is not optional.
- [ ] A dated copy of `manifest.json` is kept for 30+ days for audit and rollback.
- [ ] `partial_parse.msgpack` shipped alongside the manifest for orchestrator consumers (Cosmos).
```bash
# Trailing steps of the prod deploy
dbt build --target prod
aws s3 cp target/manifest.json s3://dbt-artifacts/prod/manifest.json
aws s3 cp target/manifest.json s3://dbt-artifacts/prod/archive/$(date -u +%FT%H%MZ)/manifest.json
aws s3 cp target/partial_parse.msgpack s3://dbt-artifacts/prod/partial_parse.msgpack
```
## 7. Orchestration
- [ ] One Airflow task per dbt node via Cosmos; not `BashOperator("dbt run")`.
- [ ] `LoadMode.DBT_MANIFEST` in production; never `DBT_LS`.
- [ ] `TestBehavior.AFTER_EACH` so test failures block downstream tasks.
- [ ] `emit_datasets=True` so downstream DAGs can trigger on model updates.
- [ ] dbt in its own virtualenv, separate from Airflow's package set.
- [ ] `partial_parse=True` and `partial_parse.msgpack` present to cut per-task parse cost.
See Orchestrate with Cosmos for the full configuration.
> **Warning**
> `BashOperator("dbt run")` is banned in Causeway production code. It gives you one Airflow task for an entire dbt DAG: no per-model retry, no lineage, no parallelism, no per-task log isolation. If a team cannot adopt Cosmos for compatibility reasons, get an exception on file before shipping.
## 8. Source freshness monitoring
- [ ] Every source declares `warn_after` and `error_after` freshness thresholds.
- [ ] `dbt source freshness` runs in orchestration on a schedule appropriate to each source's cadence.
- [ ] Freshness failures alert the data team, not the dbt CI pipeline.
- [ ] Source freshness results uploaded as artifacts (`target/sources.json`) for tools like Elementary.
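The thresholds item can be sketched in a sources yml; the source, table, and column names here are hypothetical, and the counts should track each source's actual load cadence.

```yaml
sources:
  - name: payments
    loaded_at_field: _ingested_at
    freshness:
      warn_after: {count: 6, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: transactions
      - name: refunds
        freshness:
          # Slower upstream cadence, looser threshold
          warn_after: {count: 1, period: day}
```

Table-level `freshness` overrides the source-level default, so one block covers the common case and per-table exceptions stay visible.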
## 9. Observability
- [ ] `manifest.json` + `run_results.json` + `catalog.json` uploaded after every run.
- [ ] A dashboard (Elementary, custom, or OTel-based) shows model run success/failure over time.
- [ ] Model execution time is tracked per node; regressions flag in review.
- [ ] On-call rotation knows where to find the last-run logs without asking.
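The "which models errored last run" question (sections 9 and 11) can be answered straight from `run_results.json`. A minimal Python sketch; the inline fixture is hypothetical but shaped like the real artifact's `results` array.

```python
def failed_nodes(run_results: dict) -> list[str]:
    """Return unique_ids of nodes whose status was error or fail."""
    bad = {"error", "fail"}
    return [
        r["unique_id"]
        for r in run_results.get("results", [])
        if r.get("status") in bad
    ]

# In real use: run_results = json.load(open("target/run_results.json"))
# Inline fixture for illustration only:
sample = {
    "results": [
        {"unique_id": "model.proj.stg_orders", "status": "success"},
        {"unique_id": "model.proj.fct_orders", "status": "error"},
        {"unique_id": "test.proj.unique_fct_orders_order_id", "status": "fail"},
    ]
}
print(failed_nodes(sample))
# → ['model.proj.fct_orders', 'test.proj.unique_fct_orders_order_id']
```

The resulting ids feed directly into `dbt run --select`; on dbt 1.6+, `dbt retry` does this bookkeeping for you from the previous run's artifacts.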
## 10. Documentation and contracts
- [ ] Every public mart has a `description` field in its yml.
- [ ] Every public mart's columns have `description` fields.
- [ ] Every public mart has an enforced `contract:`.
- [ ] `dbt docs generate` runs in CI; the output is published somewhere internal consumers can browse.
> **Note**
> "Documented" means in the yml file, not in a Notion page. dbt's docs regenerate on every run; Notion pages do not. When yml and Notion disagree, yml wins. The fastest way to keep docs honest is to make them part of the artifact pipeline.
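Sections 4 and 10 meet in the mart's yml. A minimal sketch with hypothetical model and column names; note that an enforced contract requires a `data_type` on every column.

```yaml
models:
  - name: fct_orders
    description: One row per order; grain is order_id.
    config:
      contract:
        enforced: true
    columns:
      - name: order_id
        description: Primary key.
        data_type: bigint
        constraints:
          - type: not_null
        data_tests:
          - unique
          - not_null
      - name: order_total
        description: Order value in USD, net of discounts.
        data_type: decimal(18, 2)
```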
## 11. Backfills and recovery
- [ ] Project has a documented runbook for `--full-refresh`ing a specific model.
- [ ] Project has a documented runbook for backfilling a date range on an incremental model.
- [ ] On-call knows how to identify which models errored last run and how to rerun just those.
- [ ] For high-value marts, backfill windows are tested at least once per quarter.
See Failure triage for the 5-minute incident procedure.
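A runbook for this section usually reduces to a handful of commands. The model name and the `backfill_*` vars below are hypothetical; the vars approach only works if the model's incremental filter actually reads them.

```
# Full refresh of one model and everything downstream of it
dbt build --select fct_orders+ --full-refresh --target prod

# Rerun only the nodes that errored or were skipped last run (dbt 1.6+)
dbt retry --target prod

# Backfill a date range on an incremental model, assuming the model's SQL
# reads var('backfill_start') / var('backfill_end') in its incremental filter
dbt run --select fct_orders --target prod \
  --vars '{backfill_start: "2024-01-01", backfill_end: "2024-01-31"}'
```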
## 12. Security
- [ ] No secrets in committed files. All sensitive values come from env vars.
- [ ] The service principal used for prod runs has scope-limited grants: `USE CATALOG` + `USE SCHEMA` + `SELECT` on sources; `CREATE TABLE` on target schemas; nothing broader.
- [ ] Dev tokens rotated on a schedule (90 days typical).
- [ ] PII-bearing models declare classification in Causeway's contract system (see the contract triple) and are materialized into Restricted-tier schemas.
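The grant set above translates to Unity Catalog SQL roughly as follows; the principal and schema names are hypothetical, and schema-level `SELECT` inherits to all current and future tables in that schema.

```sql
-- Read side: source schemas only
GRANT USE CATALOG ON CATALOG prod TO `sp-dbt-prod`;
GRANT USE SCHEMA ON SCHEMA prod.raw_payments TO `sp-dbt-prod`;
GRANT SELECT ON SCHEMA prod.raw_payments TO `sp-dbt-prod`;

-- Write side: target schemas only
GRANT USE SCHEMA ON SCHEMA prod.marts TO `sp-dbt-prod`;
GRANT CREATE TABLE ON SCHEMA prod.marts TO `sp-dbt-prod`;
```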
## 13. Scale readiness
When the project crosses these thresholds, extra attention is warranted:
| Model count | Action |
|---|---|
| 100 | Partial parse must be on (default since 1.4). Verify. |
| 300 | `--empty` CI path for sub-minute PR feedback. |
| 500 | Cosmos `TestBehavior.BUILD` or per-model task groups to keep DAG parse tractable. |
| 800 | Dedicated manifest archival, retention policy, and a documented recovery procedure. |
| 1500 | Split into multiple dbt projects, glued by exposures and manifest handoffs (see dbt-loom). |
## 14. The promote-to-prod gate
Before a dbt project (or a new mart) is considered production-ready, a reviewer confirms each section above with a concrete artifact: a PR review note, a link to the CI run, a screenshot of the object-storage manifest, etc. The gate is the checklist; there is no other ceremony.
Deviations require an RFD and a dated waiver in the project's README. Waivers expire.
## See also
- Model authoring standards — the rules that apply per model.
- Slim CI guide — the pipeline that makes section 5 cheap.
- Failure triage — the procedure that makes section 11 survivable.
- The contract triple — the governance layer above model contracts.