Naively, every pull request rebuilds the full dbt DAG. In a project with eight hundred models that takes an hour and costs real money. Slim CI is dbt's answer: read prod's manifest.json, figure out what actually changed in the PR, and build only those models plus their downstream. This guide walks through the setup.
The core idea
Three flags do the work:
- state:modified+ — select only models whose compiled state differs from a stored reference (plus their downstream).
- --defer — when a model references an upstream that was not modified, read prod's table directly instead of rebuilding it.
- --favor-state — when the CI schema has a stale copy of a model, prefer the state's reference.
All three read the same reference: a manifest.json produced by the most recent successful prod run.
dbt build \
--select state:modified+ \
--defer \
--state ./prod-manifest/ \
--favor-state
1. Upload manifest.json after every prod run
Whatever CI/CD you use, extend the prod job to upload target/manifest.json to object storage. S3 is the canonical destination; ADLS and GCS work identically.
# At the end of the prod deploy job, after `dbt build` succeeds:
aws s3 cp target/manifest.json \
s3://dbt-artifacts/prod/manifest.json
# And a dated copy for auditability:
aws s3 cp target/manifest.json \
s3://dbt-artifacts/prod/archive/$(date -u +%Y-%m-%dT%H%MZ)/manifest.json
Important
The manifest.json is the single most important artifact in a dbt project. Everything downstream — Slim CI, Cosmos DAG rendering, exposures, data contracts, docs — reads it. Treat it as a build output.
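Because so much reads this file, it helps to be able to inspect it quickly. A crude, dependency-free sketch (the function name is illustrative) that pulls out the dbt version recorded in the manifest's metadata — useful because a large version gap between the prod manifest and the CI environment can make state:modified+ over-select:

```shell
# Extract the producing dbt version from a manifest.json.
# sed is crude compared to jq, but works anywhere; manifest.json
# records this under .metadata.dbt_version.
manifest_dbt_version() {
  sed -n 's/.*"dbt_version": *"\([^"]*\)".*/\1/p' "$1" | head -n1
}
```

Run it against the fetched copy, e.g. `manifest_dbt_version ./prod-manifest/manifest.json`, and compare with `dbt --version` in CI.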
2. Fetch the manifest at the start of every CI run
In the PR CI job, before dbt runs:
mkdir -p ./prod-manifest
aws s3 cp s3://dbt-artifacts/prod/manifest.json \
./prod-manifest/manifest.json
Warning
If the prod manifest is missing, state:modified+ cannot compute a diff and dbt treats every model as modified. CI falls back to a full build and takes an hour. Always guard with an explicit check that the file exists and fail fast if it does not.
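The guard can be made concrete as a fail-fast fetch function. A sketch — plain cp stands in for aws s3 cp so it runs anywhere, and the function name is illustrative:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Fail fast when the prod manifest is missing or empty, instead of
# letting state:modified+ silently fall back to a full rebuild.
fetch_manifest() {
  local src="$1" dest_dir="$2"
  mkdir -p "$dest_dir"
  # In CI this would be: aws s3 cp "$src" "$dest_dir/manifest.json"
  cp "$src" "$dest_dir/manifest.json" 2>/dev/null || {
    echo "ERROR: prod manifest not found at $src" >&2
    return 1
  }
  # An empty manifest is as useless as a missing one.
  [ -s "$dest_dir/manifest.json" ] || {
    echo "ERROR: prod manifest at $src is empty" >&2
    return 1
  }
}
```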
3. Build against a PR-specific schema
Never let CI builds write into prod's schemas. Add a ci target to the profile that generates a PR-scoped schema, and select it with --target ci:
# profiles.yml — add a `ci` target
causeway_demo:
  outputs:
    ci:
      type: databricks
      catalog: dev
      schema: "ci_pr_{{ env_var('PR_NUMBER') }}"
      host: "{{ env_var('DATABRICKS_HOST') }}"
      http_path: "/sql/1.0/warehouses/{{ env_var('DATABRICKS_WAREHOUSE_ID') }}"
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
      threads: 8
Set PR_NUMBER from the CI system's variables (GitHub Actions: ${{ github.event.pull_request.number }}).
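Since PR_NUMBER ends up inside a schema name, it is worth validating before dbt runs. A hypothetical guard (the function name is illustrative) that rejects anything other than a plain integer:

```shell
# Build the CI schema name only from a validated PR number; anything
# that is not all digits is refused rather than interpolated.
ci_schema_for() {
  case "$1" in
    ''|*[!0-9]*)
      echo "PR_NUMBER must be a positive integer, got: '$1'" >&2
      return 1
      ;;
  esac
  echo "ci_pr_$1"
}
```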
4. The CI job itself
# .github/workflows/dbt-pr.yml
name: dbt PR
on:
  pull_request:
    paths: ['models/**', 'macros/**', 'tests/**', 'packages.yml']
jobs:
  build:
    runs-on: ubuntu-latest
    env:
      PR_NUMBER: ${{ github.event.pull_request.number }}
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
      DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
      DATABRICKS_WAREHOUSE_ID: ${{ secrets.DATABRICKS_WAREHOUSE_ID }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.11' }
      - run: pip install dbt-databricks==1.11.0
      - name: Fetch prod manifest
        run: |
          mkdir -p ./prod-manifest
          aws s3 cp s3://dbt-artifacts/prod/manifest.json \
            ./prod-manifest/manifest.json
          test -s ./prod-manifest/manifest.json
      - run: dbt deps
      - name: Slim CI build
        run: |
          dbt build \
            --target ci \
            --select state:modified+ \
            --defer \
            --state ./prod-manifest/ \
            --favor-state
      - name: Drop PR schema
        if: always()
        run: |
          # Placeholder: see step 6 for the drop macro. Many teams keep the
          # schema after a failure for debugging and drop it when the PR closes.
          true
5. Even faster with --empty
--empty (dbt 1.8+) builds zero-row versions of every selected model. That validates SQL and references without scanning any data. Pair it with Slim CI for sub-minute PR feedback:
dbt build --empty --select state:modified+ --defer --state ./prod-manifest/
Gate the real-data build on a label or commit trailer. A common pattern:
- Every PR runs --empty on push.
- PRs tagged [data] or run-full-ci additionally run the real Slim CI build.
- Nightly cron runs the full real build regardless.
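The tag check itself is a one-liner. A sketch of the gate as a shell function (the name is illustrative), fed the head commit message:

```shell
# Opt in to the real-data build when the commit message carries a
# [data] tag or the run-full-ci marker.
wants_full_ci() {
  printf '%s' "$1" | grep -qE '\[data\]|run-full-ci'
}

# In the CI step, something like:
#   if wants_full_ci "$(git log -1 --pretty=%B)"; then
#     dbt build --target ci --select state:modified+ --defer \
#       --state ./prod-manifest/ --favor-state
#   fi
```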
Tip
--empty catches around 80% of regressions (ref errors, column drift, Jinja typos, contract violations) for about 1% of the cost of a data build. It is the right default for most PRs.
6. Teardown
PR schemas accumulate if you do not clean them up. Add an on-close workflow that drops the schema when the PR merges or closes:
dbt run-operation drop_schema_if_exists \
--args "{schema: ci_pr_${PR_NUMBER}}"
You need to define drop_schema_if_exists as a macro in macros/:
{% macro drop_schema_if_exists(schema) %}
  {% set sql %}
    drop schema if exists {{ target.catalog }}.{{ schema }} cascade
  {% endset %}
  {% do run_query(sql) %}
{% endmacro %}
Danger
Never run drop_schema_if_exists with a value from user input without validation. A malicious or accidental value like prod.silver would drop a production schema. Constrain the macro to only accept names matching ci_pr_*.
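One way to enforce that constraint is in the CI script before the macro is ever invoked. A bash sketch (the function name is illustrative) that refuses anything not shaped exactly like ci_pr_<digits>:

```shell
# Allow-list guard: only schemas matching ci_pr_<digits> may be dropped.
# Uses bash's [[ =~ ]] regex match.
assert_droppable_schema() {
  [[ "$1" =~ ^ci_pr_[0-9]+$ ]] || {
    echo "refusing to drop schema '$1'" >&2
    return 1
  }
}
```

Call it right before the run-operation, e.g. `assert_droppable_schema "ci_pr_${PR_NUMBER}"`, so a bad value stops the job instead of reaching the warehouse.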
Common mistakes
| Symptom | Root cause |
|---|---|
| CI rebuilds everything, not just changes | state:modified+ cannot read the stored manifest (missing or wrong path). |
| CI fails on ref() to an unmodified upstream | --defer is missing; CI tries to build the upstream from scratch. |
| CI succeeds but prod fails the same code | CI ran against a stale manifest; upstream changed between the manifest upload and the PR. |
| Model builds but tests fail in CI | Data quality issues, not SQL issues. Check the test output; do not just re-run. |
See also
- CLI commands — full selector syntax, including the state: family.
- Production readiness — Causeway's checklist for shipping a dbt project.
- Orchestrate with Cosmos — once CI is solid, how the scheduled prod job should look.