Naively, every pull request rebuilds the full dbt DAG. In a project with eight hundred models that takes an hour and costs real money. Slim CI is dbt's answer: read prod's manifest.json, figure out what actually changed in the PR, and build only those models plus their downstream. This guide walks through the setup.
The core idea
Three flags do the work:
- state:modified+ — select only models whose compiled state differs from a stored reference (plus their downstream).
- --defer — when a model references an upstream that was not modified, read prod's table directly instead of rebuilding it.
- --favor-state — when the CI schema has a stale copy of a model, prefer the state's reference.
All three read the same reference: a manifest.json produced by the most recent successful prod run.
dbt build \
--select state:modified+ \
--defer \
--state ./prod-manifest/ \
--favor-state
1. Upload manifest.json after every prod run
Whatever CI/CD you use, extend the prod job to upload target/manifest.json to object storage. S3 is the canonical destination; ADLS and GCS work identically.
# At the end of the prod deploy job, after `dbt build` succeeds:
aws s3 cp target/manifest.json \
s3://dbt-artifacts/prod/manifest.json
# And a dated copy for auditability:
aws s3 cp target/manifest.json \
s3://dbt-artifacts/prod/archive/$(date -u +%Y-%m-%dT%H%MZ)/manifest.json
Important
The manifest.json is the single most important artifact in a dbt project. Everything downstream — Slim CI, Cosmos DAG rendering, exposures, data contracts, docs — reads it. Treat it as a build output.
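Because so much reads this file, it helps to be able to inspect it quickly. A crude, dependency-free sketch (the function name is illustrative) that pulls out the dbt version recorded in the manifest's metadata — useful because a large version gap between the prod manifest and the CI environment can make state:modified+ over-select:

```shell
# Extract the producing dbt version from a manifest.json.
# sed is crude compared to jq, but works anywhere; manifest.json
# records this under .metadata.dbt_version.
manifest_dbt_version() {
  sed -n 's/.*"dbt_version": *"\([^"]*\)".*/\1/p' "$1" | head -n1
}
```

Run it against the fetched copy, e.g. `manifest_dbt_version ./prod-manifest/manifest.json`, and compare with `dbt --version` in CI.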
2. Fetch the manifest at the start of every CI run
In the PR CI job, before dbt runs:
mkdir -p ./prod-manifest
aws s3 cp s3://dbt-artifacts/prod/manifest.json \
./prod-manifest/manifest.json
Warning
If the prod manifest is missing, state:modified+ cannot compute a diff and dbt treats every model as modified. CI falls back to a full build and takes an hour. Always guard with an explicit check that the file exists and fail fast if it does not.
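The guard can be made concrete as a fail-fast fetch function. A sketch — plain cp stands in for aws s3 cp so it runs anywhere, and the function name is illustrative:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Fail fast when the prod manifest is missing or empty, instead of
# letting state:modified+ silently fall back to a full rebuild.
fetch_manifest() {
  local src="$1" dest_dir="$2"
  mkdir -p "$dest_dir"
  # In CI this would be: aws s3 cp "$src" "$dest_dir/manifest.json"
  cp "$src" "$dest_dir/manifest.json" 2>/dev/null || {
    echo "ERROR: prod manifest not found at $src" >&2
    return 1
  }
  # An empty manifest is as useless as a missing one.
  [ -s "$dest_dir/manifest.json" ] || {
    echo "ERROR: prod manifest at $src is empty" >&2
    return 1
  }
}
```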
3. Build against a PR-specific schema
Never let CI builds write into prod's schemas. Add a ci target to the profile that generates a PR-scoped schema, and select it with --target ci:
# profiles.yml — add a `ci` target
causeway_demo:
  outputs:
    ci:
      type: databricks
      catalog: dev
      schema: "ci_pr_{{ env_var('PR_NUMBER') }}"
      host: "{{ env_var('DATABRICKS_HOST') }}"
      http_path: "/sql/1.0/warehouses/{{ env_var('DATABRICKS_WAREHOUSE_ID') }}"
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
      threads: 8
Set PR_NUMBER from the CI system's variables (GitHub Actions: ${{ github.event.pull_request.number }}).
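Since PR_NUMBER ends up inside a schema name, it is worth validating before dbt runs. A hypothetical guard (the function name is illustrative) that rejects anything other than a plain integer:

```shell
# Build the CI schema name only from a validated PR number; anything
# that is not all digits is refused rather than interpolated.
ci_schema_for() {
  case "$1" in
    ''|*[!0-9]*)
      echo "PR_NUMBER must be a positive integer, got: '$1'" >&2
      return 1
      ;;
  esac
  echo "ci_pr_$1"
}
```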
4. The CI job itself
# .github/workflows/dbt-pr.yml
name: dbt PR
on:
  pull_request:
    paths: ['models/**', 'macros/**', 'tests/**', 'packages.yml']
jobs:
  build:
    runs-on: ubuntu-latest
    env:
      PR_NUMBER: ${{ github.event.pull_request.number }}
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
      DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
      DATABRICKS_WAREHOUSE_ID: ${{ secrets.DATABRICKS_WAREHOUSE_ID }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.11' }
      - run: pip install dbt-databricks==1.11.0
      - name: Fetch prod manifest
        run: |
          mkdir -p ./prod-manifest
          aws s3 cp s3://dbt-artifacts/prod/manifest.json \
            ./prod-manifest/manifest.json
          test -s ./prod-manifest/manifest.json
      - run: dbt deps
      - name: Slim CI build
        run: |
          dbt build \
            --target ci \
            --select state:modified+ \
            --defer \
            --state ./prod-manifest/ \
            --favor-state
      - name: Drop PR schema
        if: always()
        run: |
          # Placeholder: see step 6 for the drop macro. Many teams keep the
          # schema after a failure for debugging and drop it when the PR closes.
          true
5. Even faster with --empty
--empty (dbt 1.8+) builds zero-row versions of every selected model. That validates SQL and references without scanning any data. Pair it with Slim CI for sub-minute PR feedback:
dbt build --empty --select state:modified+ --defer --state ./prod-manifest/
Gate the real-data build on a label or commit trailer. A common pattern:
- Every PR runs --empty on push.
- PRs tagged [data] or run-full-ci additionally run the real Slim CI build.
- Nightly cron runs the full real build regardless.
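The tag check itself is a one-liner. A sketch of the gate as a shell function (the name is illustrative), fed the head commit message:

```shell
# Opt in to the real-data build when the commit message carries a
# [data] tag or the run-full-ci marker.
wants_full_ci() {
  printf '%s' "$1" | grep -qE '\[data\]|run-full-ci'
}

# In the CI step, something like:
#   if wants_full_ci "$(git log -1 --pretty=%B)"; then
#     dbt build --target ci --select state:modified+ --defer \
#       --state ./prod-manifest/ --favor-state
#   fi
```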
Tip
--empty catches around 80% of regressions (ref errors, column drift, Jinja typos, contract violations) for about 1% of the cost of a data build. It is the right default for most PRs.
6. Teardown
PR schemas accumulate if you do not clean them up. Add an on-close workflow that drops the schema when the PR merges or closes:
dbt run-operation drop_schema_if_exists \
--args "{schema: ci_pr_${PR_NUMBER}}"
You need to define drop_schema_if_exists as a macro in macros/:
{% macro drop_schema_if_exists(schema) %}
  {% set sql %}
    drop schema if exists {{ target.catalog }}.{{ schema }} cascade
  {% endset %}
  {% do run_query(sql) %}
{% endmacro %}
Danger
Never run drop_schema_if_exists with a value from user input without validation. A malicious or accidental value like prod.silver would drop a production schema. Constrain the macro to only accept names matching ci_pr_*.
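One way to enforce that constraint is in the CI script before the macro is ever invoked. A bash sketch (the function name is illustrative) that refuses anything not shaped exactly like ci_pr_<digits>:

```shell
# Allow-list guard: only schemas matching ci_pr_<digits> may be dropped.
# Uses bash's [[ =~ ]] regex match.
assert_droppable_schema() {
  [[ "$1" =~ ^ci_pr_[0-9]+$ ]] || {
    echo "refusing to drop schema '$1'" >&2
    return 1
  }
}
```

Call it right before the run-operation, e.g. `assert_droppable_schema "ci_pr_${PR_NUMBER}"`, so a bad value stops the job instead of reaching the warehouse.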
Common mistakes
| Symptom | Root cause |
|---|---|
| CI rebuilds everything, not just changes | state:modified+ cannot read the stored manifest (missing or wrong path). |
| CI fails on ref() to an unmodified upstream | --defer is missing; CI tries to build the upstream from scratch. |
| CI succeeds but prod fails the same code | CI ran against a stale manifest; upstream changed between the manifest upload and the PR. |
| Model builds but tests fail in CI | Data quality issues, not SQL issues. Check the test output; do not just re-run. |
See also
- CLI commands — full selector syntax, including the state: family.
- Production readiness — Causeway's checklist for shipping a dbt project.
- Orchestrate with Cosmos — once CI is solid, how the scheduled prod job should look.