A materialization is how dbt physically persists your model. dbt-databricks supports five: view, table, incremental, materialized_view, and streaming_table. Each has a sweet spot; mismatches waste money and confuse debugging.

The five

| Materialization | What it is | Cost profile |
| --- | --- | --- |
| view | A SQL view; computed on read | Zero storage; query-time cost |
| table | A full table rebuild every run | Write cost every run; cheap reads |
| incremental | Merge/append/overwrite of new rows only | Low write per run after first; complex |
| materialized_view | Databricks-managed MV (delegated to DLT) | Databricks decides refresh cadence |
| streaming_table | Reads a Delta change feed or stream | Continuous; latency-over-cost |
| ephemeral | Not a materialization; a CTE inlined into callers | No storage; inflates downstream SQL |

Decision framework

Walk the questions in order. Stop at the first "yes".

  1. Is the model 1:1 with a source table and only renames/casts? → view (staging default).
  2. Is the model a small reusable building block not exposed downstream? → ephemeral or view (intermediate default).
  3. Does the full-refresh cost hurt, and does the source have a reliable watermark? → incremental.
  4. Is the model an aggregation of a large source that Databricks could maintain for you? → materialized_view.
  5. Does latency matter more than cost, and is there a streaming source? → streaming_table.
  6. Otherwise → table.

Defaults per layer

```yaml
# dbt_project.yml
models:
  my_project:
    staging:
      +materialized: view
    intermediate:
      +materialized: ephemeral
    marts:
      +materialized: table
```

Override per-model with a config() block when the rule of thumb does not fit.
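For example, if one intermediate model is reused widely enough that inlining it everywhere hurts, you can override the layer's ephemeral default in that model's file. A minimal sketch; the model and column names are illustrative:

```sql
-- models/intermediate/int_customer_orders.sql
-- overrides the intermediate-layer ephemeral default for this one model
{{ config(materialized='view') }}

select
    customer_id,
    count(*) as order_count
from {{ ref('stg_orders') }}
group by customer_id
```

The config() block always wins over dbt_project.yml, so the project-level defaults stay clean while exceptions live next to the SQL they affect.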

view — the default you should reach for

Views cost nothing to maintain. They always reflect the latest source data. They are the right default for staging and most intermediates.

Views stop being right when the underlying query is expensive and read often: every read re-pays the full compute cost, and at some read frequency a scheduled write becomes cheaper.

table — when writes are cheaper than the reads would be

A table materialization does CREATE OR REPLACE TABLE ... AS ... every run. You pay the full rewrite cost on the schedule, in exchange for cheap reads the rest of the time.
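Conceptually, each run compiles to a single atomic statement along these lines (schema and names are illustrative, not the adapter's exact output):

```sql
-- what a table materialization amounts to on Databricks, per run
create or replace table analytics.fct_orders
as
select * from analytics_staging.stg_orders
```

Because the replace is atomic, readers never see a half-built table; they see the old version until the new one commits.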

table is the right mart default when the model is read far more often than it is rebuilt, so the scheduled write cost is amortized across many cheap reads.

Move to incremental when the full-refresh cost becomes painful.

incremental — the one that earns its complexity

Incremental models merge, append, or overwrite only the new data. They are the single largest source of pain in a production dbt project. They are also the single largest source of cost savings when the source is big.

Read the dedicated incremental model guide before building your first one. The failure modes (drift, schema change, unique-key violations) are shape-specific.
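The basic shape looks like this; a minimal sketch assuming a hypothetical stg_events source with an event_id key and an event_ts watermark:

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='event_id'
) }}

select event_id, user_id, event_ts, payload
from {{ ref('stg_events') }}
{% if is_incremental() %}
  -- on incremental runs, only scan rows past the current high-water mark
  where event_ts > (select max(event_ts) from {{ this }})
{% endif %}
```

On the first run (and on --full-refresh) the is_incremental() branch is skipped and the whole source is rebuilt; on subsequent runs only new rows are merged on event_id.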

materialized_view — hand the refresh logic to Databricks

materialized_view delegates to Delta Live Tables under the hood. Databricks owns the incrementalization: no is_incremental(), no unique_key debate, no on_schema_change config. You write the aggregation query; DLT figures out what to change.

The trade-off is control: Databricks decides when and how the view refreshes, so you cannot pin an exact freshness SLA or tune the refresh logic yourself.

Good default for aggregation-heavy gold tables where "eventually consistent, roughly hourly" is fine.
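A sketch of such a gold-layer aggregation, assuming a hypothetical fct_orders source:

```sql
-- models/marts/daily_order_counts.sql
{{ config(materialized='materialized_view') }}

-- Databricks (via DLT) decides when and how to refresh this aggregation
select
    order_date,
    count(*) as order_count
from {{ ref('fct_orders') }}
group by order_date
```

Note there is no is_incremental() block and no unique_key: the model is just the aggregation query, and incrementalization is Databricks' problem.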

streaming_table — when latency beats cost

streaming_table reads from a Delta change feed or a streaming source (Kafka, Kinesis). Use it when latency genuinely matters more than cost and the upstream is a streaming source; if a batch refresh every few minutes would do, a cheaper materialization usually wins.

Not appropriate for dashboards. Very appropriate for operational surfaces (fraud detection, live product counts).
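A sketch, assuming a hypothetical raw_events Delta source; on Databricks the stream() table-valued function is what marks the read as streaming:

```sql
{{ config(materialized='streaming_table') }}

-- stream() makes this a continuous, incremental read of the Delta source
select
    event_id,
    event_ts,
    amount
from stream({{ ref('raw_events') }})
```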

ephemeral — useful, sometimes

An ephemeral model is not materialized at all. dbt inlines it into the callers' compiled SQL as a CTE.

Pros: no storage or run cost; always as current as its sources; keeps the DAG modular without cluttering the warehouse with extra objects.

Cons: inflates every caller's compiled SQL; is recomputed separately by each caller; cannot be queried directly in the warehouse when debugging.

Rule of thumb: ephemeral for intermediates used by one or two downstream models, view when used by more.
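Declaring one is a single config line; the body is then compiled into each caller as a CTE. Names here are illustrative:

```sql
-- models/intermediate/int_orders_enriched.sql
{{ config(materialized='ephemeral') }}

-- this select becomes a CTE inside every downstream model's compiled SQL
select
    o.order_id,
    o.customer_id,
    c.region
from {{ ref('stg_orders') }} o
join {{ ref('stg_customers') }} c
  on o.customer_id = c.customer_id
```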

Picking a strategy for incremental

Five strategies, all Delta-only unless noted:

| Strategy | Use when |
| --- | --- |
| merge (default) | Updates happen to existing rows |
| append | Events are immutable (pure inserts) |
| insert_overwrite | You rewrite whole partitions, like "today" |
| replace_where | You rewrite a predicate-scoped slice |
| microbatch (1.9+) | Late-arriving events in a time-series fact |

Start with merge if you have updates, append if you do not. Move to insert_overwrite or replace_where if the merge cost hurts and the rewrite boundary is clean.
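For an immutable event stream, append is the cheapest correct choice because it skips the merge scan of the target entirely. A sketch with hypothetical names:

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='append'
) }}

-- pure inserts: no unique_key, no merge against existing rows
select event_id, event_ts, payload
from {{ ref('stg_click_events') }}
{% if is_incremental() %}
  where event_ts > (select max(event_ts) from {{ this }})
{% endif %}
```

If the same event_id can ever arrive twice, append will duplicate it; that is the signal to switch back to merge with a unique_key.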

Physical layer: clustering

On Databricks, every table and incremental should use liquid clustering unless you have a specific reason to partition.

```sql
{{ config(
    materialized='table',
    liquid_clustered_by=['customer_id', 'event_date']
) }}
```

Liquid clustering adapts as query patterns shift and eliminates the "wrong partition key" mistake. Fall back to partition_by only when you need hard isolation (GDPR region boundaries, for instance) or you are on a runtime too old for clustering.

See also