This walks you from an empty directory to a dbt project that builds against Databricks, compiles cleanly, runs one staging model, and shows up in Unity Catalog lineage. It assumes you can read SQL and have run a command line before; it does not assume prior dbt experience.
What you need
- A Databricks workspace with a SQL warehouse you can query.
- A Unity Catalog catalog named `dev` and a schema you own.
- Python 3.11 and `pip`.
- A personal access token from the workspace (Settings → Developer → Access tokens).
1. Install the adapter
```shell
pip install dbt-databricks==1.11.0
```
Note
Install dbt-databricks, not dbt-core. The Databricks adapter pulls dbt-core in as a dependency and stays version-aligned with Databricks runtime features like liquid clustering and streaming tables.
2. Create a project
```shell
dbt init causeway_demo
```
dbt init prompts for an adapter and a profile name. Pick databricks as the adapter and name the profile causeway_demo so it matches the project.
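When init finishes, the scaffold looks roughly like this (exact contents vary slightly by dbt version):

```
causeway_demo/
├── dbt_project.yml
├── models/
│   └── example/
├── seeds/
├── snapshots/
└── tests/
```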
3. Configure the profile
dbt reads connection settings from ~/.dbt/profiles.yml. Create or edit that file:
```yaml
causeway_demo:
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: dev
      schema: "{{ env_var('DBT_SCHEMA', 'dbt_' ~ env_var('USER')) }}"
      host: "{{ env_var('DATABRICKS_HOST') }}"
      http_path: "/sql/1.0/warehouses/{{ env_var('DATABRICKS_WAREHOUSE_ID') }}"
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
      threads: 4
```
Export the three environment variables your profile reads:
```shell
export DATABRICKS_HOST=your-workspace.cloud.databricks.com
export DATABRICKS_WAREHOUSE_ID=123abc456def
export DATABRICKS_TOKEN=dapi...
```
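An unset or empty variable surfaces later as a confusing Jinja error when dbt parses the profile, so it can help to fail fast first. A minimal pre-flight sketch — the `missing_vars` helper is hypothetical, not part of dbt:

```python
import os

# The three variables profiles.yml reads via env_var().
REQUIRED = ("DATABRICKS_HOST", "DATABRICKS_WAREHOUSE_ID", "DATABRICKS_TOKEN")


def missing_vars(env=None):
    """Return the names of required connection variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        raise SystemExit("Set these before running dbt: " + ", ".join(missing))
```

Running it before `dbt debug` turns a cryptic parse failure into an explicit list of what to export.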
Test the connection:
```shell
dbt debug
```
Warning
dbt debug talks to your workspace and your warehouse. If the warehouse is stopped, dbt waits for it to start, which can take 30 to 60 seconds the first time.
4. Write a staging model
Inside your project, create models/staging/stg_orders.sql:
```sql
with source as (
    select * from {{ source('bronze', 'raw_orders') }}
),

renamed as (
    select
        order_id,
        customer_id,
        cast(amount as decimal(12, 2)) as amount_usd,
        cast(placed_at as timestamp) as placed_at,
        _loaded_at as ingested_at
    from source
)

select * from renamed
```
Declare the source it reads in models/staging/_sources.yml:
```yaml
version: 2

sources:
  - name: bronze
    database: dev
    schema: bronze
    tables:
      - name: raw_orders
```
Tip
Staging models are conventionally named stg_<source>__<table> (this guide shortens it to stg_orders because there is only one source) and are always built as views. They exist so every column rename and type cast lives in exactly one place. Later models reference ref('stg_orders'), never the source directly.
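To illustrate that last point, a hypothetical downstream model selects from the staging view via ref, never from the raw source:

```sql
-- e.g. models/marts/orders_daily.sql (hypothetical downstream model)
select
    date_trunc('day', placed_at) as order_date,
    sum(amount_usd) as gross_revenue
from {{ ref('stg_orders') }}
group by 1
```

If the raw column names or types ever change, only stg_orders needs to be edited; every downstream ref picks up the fix.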
5. Build it
```shell
dbt build --select stg_orders
```
dbt build compiles the SQL, runs it against your warehouse, and executes any tests you declared on the model. A successful run prints 1 of 1 OK and exits with code zero.
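The model has no tests declared yet, so build only compiles and runs it. A minimal, assumed models/staging/_models.yml adding two generic tests to the key column would look like this (newer dbt versions also accept data_tests: in place of tests:):

```yaml
version: 2

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
```

With that file in place, the same dbt build command runs the view and then both tests against it.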
6. See it in Unity Catalog
Open the Databricks UI, navigate to Data → Catalog → dev → your schema. You should see stg_orders as a view, with column-level lineage back to bronze.raw_orders. dbt-databricks emits this lineage automatically; you do not need to configure anything.
Important
Unity Catalog lineage is the real source of truth for how data flows through your warehouse. Resist building a parallel lineage system. If a downstream query does not show up in the UC lineage panel, the column is probably constructed dynamically (SELECT * expansion, unioned columns) and needs to be listed explicitly in the model.
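As an illustrative sketch of that fix, replacing a wildcard projection with explicitly named columns gives Unity Catalog a direct column-to-column mapping to trace:

```sql
-- Harder to trace: output columns come from wildcard expansion
-- select * from {{ ref('stg_orders') }}

-- Lineage-friendly: each output column maps to a named input column
select
    order_id,
    customer_id,
    amount_usd
from {{ ref('stg_orders') }}
```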
Next steps
- The contract triple — how Causeway layers governance onto the dbt project.
- Project layers — where to put the next model: intermediate, mart, or stay in staging.
- Your first incremental model — when the nightly full refresh starts to hurt.