This walks you from an empty directory to a dbt project that builds against Databricks, compiles cleanly, runs one staging model, and shows up in Unity Catalog lineage. It assumes you can read SQL and have run a command line before; it does not assume prior dbt experience.
What you need
- A Databricks workspace with a SQL warehouse you can query.
- A Unity Catalog catalog named `dev` and a schema you own.
- Python 3.11 and `pip`.
- A personal access token from the workspace (Settings → Developer → Access tokens).
1. Install the adapter
```shell
pip install dbt-databricks==1.11.0
```
Note
Install dbt-databricks, not dbt-core. The Databricks adapter pulls dbt-core in as a dependency and stays version-aligned with Databricks runtime features like liquid clustering and streaming tables.
2. Create a project
```shell
dbt init causeway_demo
```
dbt init prompts for an adapter and a profile name. Pick databricks as the adapter and name the profile causeway_demo so it matches the project.
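When init finishes, the scaffold looks roughly like this (exact contents vary slightly by dbt version):

```
causeway_demo/
├── dbt_project.yml
├── models/
│   └── example/
├── seeds/
├── snapshots/
└── tests/
```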
3. Configure the profile
dbt reads connection settings from ~/.dbt/profiles.yml. Create or edit that file:
```yaml
causeway_demo:
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: dev
      schema: "{{ env_var('DBT_SCHEMA', 'dbt_' ~ env_var('USER')) }}"
      host: "{{ env_var('DATABRICKS_HOST') }}"
      http_path: "/sql/1.0/warehouses/{{ env_var('DATABRICKS_WAREHOUSE_ID') }}"
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
      threads: 4
```
Export the three environment variables your profile reads:
```shell
export DATABRICKS_HOST=your-workspace.cloud.databricks.com
export DATABRICKS_WAREHOUSE_ID=123abc456def
export DATABRICKS_TOKEN=dapi...
```
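An unset or empty variable surfaces later as a confusing Jinja error when dbt parses the profile, so it can help to fail fast first. A minimal pre-flight sketch — the `missing_vars` helper is hypothetical, not part of dbt:

```python
import os

# The three variables profiles.yml reads via env_var().
REQUIRED = ("DATABRICKS_HOST", "DATABRICKS_WAREHOUSE_ID", "DATABRICKS_TOKEN")


def missing_vars(env=None):
    """Return the names of required connection variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        raise SystemExit("Set these before running dbt: " + ", ".join(missing))
```

Running it before `dbt debug` turns a cryptic parse failure into an explicit list of what to export.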
Test the connection:
```shell
dbt debug
```
Warning
dbt debug talks to your workspace and your warehouse. If the warehouse is stopped, dbt waits for it to start, which can take 30 to 60 seconds the first time.
4. Write a staging model
Inside your project, create models/staging/stg_orders.sql:
```sql
with source as (
    select * from {{ source('bronze', 'raw_orders') }}
),

renamed as (
    select
        order_id,
        customer_id,
        cast(amount as decimal(12, 2)) as amount_usd,
        cast(placed_at as timestamp) as placed_at,
        _loaded_at as ingested_at
    from source
)

select * from renamed
```
Declare the source it reads in models/staging/_sources.yml:
```yaml
version: 2

sources:
  - name: bronze
    database: dev
    schema: bronze
    tables:
      - name: raw_orders
```
Tip
Staging models are conventionally named stg_<source>__<table> (this guide shortens it to stg_orders because there is only one source) and are always built as views. They exist so every column rename and type cast lives in exactly one place. Later models reference ref('stg_orders'), never the source directly.
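To illustrate that last point, a hypothetical downstream model selects from the staging view via ref, never from the raw source:

```sql
-- e.g. models/marts/orders_daily.sql (hypothetical downstream model)
select
    date_trunc('day', placed_at) as order_date,
    sum(amount_usd) as gross_revenue
from {{ ref('stg_orders') }}
group by 1
```

If the raw column names or types ever change, only stg_orders needs to be edited; every downstream ref picks up the fix.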
5. Build it
```shell
dbt build --select stg_orders
```
dbt build compiles the SQL, runs it against your warehouse, and executes any tests you declared on the model. A successful run prints 1 of 1 OK and exits with code zero.
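The model has no tests declared yet, so build only compiles and runs it. A minimal, assumed models/staging/_models.yml adding two generic tests to the key column would look like this (newer dbt versions also accept data_tests: in place of tests:):

```yaml
version: 2

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
```

With that file in place, the same dbt build command runs the view and then both tests against it.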
6. See it in Unity Catalog
Open the Databricks UI, navigate to Data → Catalog → dev → your schema. You should see stg_orders as a view, with column-level lineage back to bronze.raw_orders. dbt-databricks emits this lineage automatically; you do not need to configure anything.
Important
Unity Catalog lineage is the real source of truth for how data flows through your warehouse. Resist building a parallel lineage system. If a downstream query does not show up in the UC lineage panel, the column is probably constructed dynamically (SELECT * expansion, unioned columns) and needs to be listed explicitly in the model.
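As an illustrative sketch of that fix, replacing a wildcard projection with explicitly named columns gives Unity Catalog a direct column-to-column mapping to trace:

```sql
-- Harder to trace: output columns come from wildcard expansion
-- select * from {{ ref('stg_orders') }}

-- Lineage-friendly: each output column maps to a named input column
select
    order_id,
    customer_id,
    amount_usd
from {{ ref('stg_orders') }}
```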
Next steps
- The contract triple — how Causeway layers governance onto the dbt project.
- Project layers — where to put the next model: intermediate, mart, or stay in staging.
- Your first incremental model — when the nightly full refresh starts to hurt.