Naively, every pull request rebuilds the full dbt DAG. In a project with eight hundred models, that takes an hour and costs real money. Slim CI is dbt's answer: read prod's manifest.json, figure out what actually changed in the PR, and build only those models plus everything downstream of them. This guide walks through the setup.

The core idea

Three flags do the work:

- --select state:modified+ selects every node whose definition differs from the stored state, plus everything downstream of it.
- --defer resolves ref()s to unselected models against prod instead of expecting them in the CI schema.
- --state points dbt at the directory holding the stored manifest.

A fourth, --favor-state, prefers the prod relation for unselected nodes even when a stale copy exists in the CI schema. All of them read the same reference: a manifest.json produced by the most recent successful prod run.

dbt build \
  --select state:modified+ \
  --defer \
  --state ./prod-manifest/ \
  --favor-state

1. Upload manifest.json after every prod run

Whatever CI/CD you use, extend the prod job to upload target/manifest.json to object storage. S3 is the canonical destination; ADLS and GCS work identically.

# At the end of the prod deploy job, after `dbt build` succeeds:
aws s3 cp target/manifest.json \
  s3://dbt-artifacts/prod/manifest.json

# And a dated copy for auditability:
aws s3 cp target/manifest.json \
  s3://dbt-artifacts/prod/archive/$(date -u +%Y-%m-%dT%H%MZ)/manifest.json

Important

The manifest.json is the single most important artifact in a dbt project. Everything downstream — Slim CI, Cosmos DAG rendering, exposures, data contracts, docs — reads it. Treat it as a build output.

2. Fetch the manifest at the start of every CI run

In the PR CI job, before dbt runs:

mkdir -p ./prod-manifest
aws s3 cp s3://dbt-artifacts/prod/manifest.json \
  ./prod-manifest/manifest.json

Warning

If the prod manifest is missing, state:modified+ has nothing to diff against. Depending on how the job is wired, dbt either errors out or the pipeline silently falls back to a full build and takes an hour. Always guard with an explicit check that the file exists and is non-empty, and fail fast if it is not.
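A minimal guard, as a sketch (same bucket path as above; the function name is my own):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Download the prod manifest and fail fast if it is missing or empty,
# rather than silently degrading to a full rebuild.
fetch_prod_manifest() {
  mkdir -p ./prod-manifest
  aws s3 cp s3://dbt-artifacts/prod/manifest.json \
    ./prod-manifest/manifest.json || true
  if [ ! -s ./prod-manifest/manifest.json ]; then
    echo "ERROR: prod manifest missing or empty; aborting CI" >&2
    return 1
  fi
}
```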

3. Build against a PR-specific schema

Never let CI builds write into prod's schemas. Point --target at a profiles.yml output that generates a PR-scoped schema:

# profiles.yml — add a `ci` target
causeway_demo:
  outputs:
    ci:
      type: databricks
      catalog: dev
      schema: "ci_pr_{{ env_var('PR_NUMBER') }}"
      host: "{{ env_var('DATABRICKS_HOST') }}"
      http_path: "/sql/1.0/warehouses/{{ env_var('DATABRICKS_WAREHOUSE_ID') }}"
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
      threads: 8

Set PR_NUMBER from the CI system's variables (GitHub Actions: ${{ github.event.pull_request.number }}).

4. The CI job itself

# .github/workflows/dbt-pr.yml
name: dbt PR
on:
  pull_request:
    paths: ['models/**', 'macros/**', 'tests/**', 'packages.yml']

jobs:
  build:
    runs-on: ubuntu-latest
    env:
      PR_NUMBER: ${{ github.event.pull_request.number }}
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
      DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
      DATABRICKS_WAREHOUSE_ID: ${{ secrets.DATABRICKS_WAREHOUSE_ID }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.11' }

      - run: pip install dbt-databricks==1.11.0

      - name: Fetch prod manifest
        # Assumes the runner already has AWS credentials, e.g. from an
        # earlier aws-actions/configure-aws-credentials step.
        run: |
          mkdir -p ./prod-manifest
          aws s3 cp s3://dbt-artifacts/prod/manifest.json \
            ./prod-manifest/manifest.json
          test -s ./prod-manifest/manifest.json

      - run: dbt deps

      - name: Slim CI build
        run: |
          dbt build \
            --target ci \
            --select state:modified+ \
            --defer \
            --state ./prod-manifest/ \
            --favor-state

      - name: Drop PR schema on success
        if: success()
        run: |
          # Optional immediate teardown. On failure, keep the schema so the
          # broken state can be inspected; the on-close workflow in step 6
          # is the backstop either way.
          dbt run-operation drop_schema_if_exists \
            --args "{schema: ci_pr_${PR_NUMBER}}" --target ci

5. Even faster with --empty

--empty (dbt 1.8+) runs every selected model with its ref() and source() inputs limited to zero rows. That validates the SQL, the references, and any contracts without scanning data. Pair it with Slim CI for sub-minute PR feedback:

dbt build --empty --select state:modified+ --defer --state ./prod-manifest/

Gate the real-data build on a label or commit trailer, so most PRs get the cheap --empty pass and only opted-in PRs pay for a full data build.
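In GitHub Actions, that gate could look like the following sketch (the full-ci label name is an assumption; pick whatever convention fits your team):

```yaml
      - name: Slim CI build (--empty, every PR)
        run: |
          dbt build --empty --target ci --select state:modified+ \
            --defer --state ./prod-manifest/

      - name: Full-data build (only when the PR carries the "full-ci" label)
        if: contains(github.event.pull_request.labels.*.name, 'full-ci')
        run: |
          dbt build --target ci --select state:modified+ \
            --defer --state ./prod-manifest/ --favor-state
```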

Tip

--empty catches around 80% of regressions (ref errors, column drift, Jinja typos, contract violations) for about 1% of the cost of a data build. It is the right default for most PRs.

6. Teardown

PR schemas accumulate if you do not clean them up. Add an on-close workflow that drops the schema when the PR merges or closes:

dbt run-operation drop_schema_if_exists \
  --args "{schema: ci_pr_${PR_NUMBER}}"
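Wired into a workflow, the teardown might look like this sketch (filename is my own; it reuses the secrets and version pin from the PR job):

```yaml
# .github/workflows/dbt-pr-teardown.yml
name: dbt PR teardown
on:
  pull_request:
    types: [closed]   # fires on merge and on plain close

jobs:
  teardown:
    runs-on: ubuntu-latest
    env:
      PR_NUMBER: ${{ github.event.pull_request.number }}
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
      DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
      DATABRICKS_WAREHOUSE_ID: ${{ secrets.DATABRICKS_WAREHOUSE_ID }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.11' }
      - run: pip install dbt-databricks==1.11.0
      - run: dbt deps
      - run: |
          dbt run-operation drop_schema_if_exists \
            --args "{schema: ci_pr_${PR_NUMBER}}" --target ci
```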

You need to define drop_schema_if_exists as a macro in macros/:

{% macro drop_schema_if_exists(schema) %}
  {# Guardrail: refuse anything that is not a CI schema. #}
  {% if not schema.startswith('ci_pr_') %}
    {{ exceptions.raise_compiler_error("Refusing to drop non-CI schema: " ~ schema) }}
  {% endif %}
  {% set sql %}
    drop schema if exists {{ target.catalog }}.{{ schema }} cascade
  {% endset %}
  {% do run_query(sql) %}
{% endmacro %}

Danger

Never run drop_schema_if_exists with a value from user input without validation. A malicious or accidental value like prod.silver would drop a production schema. Constrain the macro to only accept names matching ci_pr_*.

Common mistakes

Symptom: CI rebuilds everything, not just changes.
Root cause: state:modified+ cannot read the stored manifest (missing or wrong path).

Symptom: CI fails on a ref() to an unmodified upstream.
Root cause: --defer is missing; CI tries to build the upstream from scratch.

Symptom: CI succeeds but prod fails on the same code.
Root cause: CI ran against a stale manifest; an upstream changed between the manifest upload and the PR.

Symptom: a model builds but its tests fail in CI.
Root cause: data quality issues, not SQL issues. Check the test output; do not just re-run.

See also