Databricks exposes three compute tiers. They look similar in the UI and they bill very differently. Using the wrong one is the single most expensive mistake teams make on the platform.

| Compute | Workload | Cost profile |
|---|---|---|
| SQL warehouse | BI, dbt, ad-hoc SQL | DBU/s, scales per-query |
| Serverless jobs | Scheduled Python / notebook tasks | DBU/s for task runtime only |
| Job cluster | Jobs that need specific instance types or init scripts | DBU/s for task + cluster lifetime |
| All-purpose cluster | Interactive notebook development | 2–3× job compute; pays while idle |

Set three rules before anything else:

  1. SQL is a SQL warehouse.
  2. A scheduled job is jobs compute.
  3. A notebook you are typing in is all-purpose compute.

Violating any of the three is a billing leak.

SQL warehouses

Three flavors exist: Serverless, Pro, and Classic. In 2026 there is essentially one right answer: Serverless, with Pro or Classic only where it is unavailable.

Sizing, counterintuitively

Start bigger than you think, then size down. Small warehouses saturate and queue; medium and large warehouses finish queries so fast they idle and cost less overall.

The metric that matters is Peak Queued Queries. If it is above zero under normal load, the warehouse is not undersized, it is under-parallelized: raise max_num_clusters before bumping the t-shirt size.
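The tuning order above can be sketched as a small helper. This is an illustrative rule of thumb, not a Databricks API: the t-shirt names and the spill flag are assumptions standing in for whatever per-query signals you monitor.

```python
# Hypothetical sizing helper encoding the rule above: queueing means more
# clusters, not a bigger t-shirt; per-query pressure means a bigger t-shirt.
T_SHIRTS = ["2X-Small", "X-Small", "Small", "Medium", "Large", "X-Large"]

def next_sizing_action(peak_queued_queries: int,
                       max_num_clusters: int,
                       current_size: str,
                       queries_spill_to_disk: bool) -> str:
    """Return the next tuning step for a warehouse under normal load."""
    if peak_queued_queries > 0:
        # Concurrency problem: add clusters before touching the t-shirt.
        return f"raise max_num_clusters to {max_num_clusters + 1}"
    if queries_spill_to_disk and current_size != T_SHIRTS[-1]:
        # Per-query problem: individual queries need more memory/compute.
        bigger = T_SHIRTS[T_SHIRTS.index(current_size) + 1]
        return f"bump t-shirt size to {bigger}"
    return "leave as-is; consider sizing down if idle"
```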

One warehouse per workload class, not per team

wh-bi     — many concurrent users, generous min/max clusters, scale-to-zero
wh-elt    — one dbt run at a time, single cluster, large t-shirt
wh-adhoc  — small, aggressive auto-stop

This separation prevents the "Tableau refresh slowed down our pipeline" problem. Teams share workload-class warehouses; they do not each get their own.
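The three workload-class warehouses might look roughly like this as SQL Warehouses API payloads. Field names follow the Warehouses API; the sizes, cluster counts, and auto-stop values are illustrative, not prescribed.

```json
[
  {
    "name": "wh-bi",
    "cluster_size": "Medium",
    "min_num_clusters": 1,
    "max_num_clusters": 6,
    "auto_stop_mins": 10,
    "enable_serverless_compute": true
  },
  {
    "name": "wh-elt",
    "cluster_size": "Large",
    "min_num_clusters": 1,
    "max_num_clusters": 1,
    "auto_stop_mins": 10,
    "enable_serverless_compute": true
  },
  {
    "name": "wh-adhoc",
    "cluster_size": "Small",
    "min_num_clusters": 1,
    "max_num_clusters": 2,
    "auto_stop_mins": 5,
    "enable_serverless_compute": true
  }
]
```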

Serverless compute for jobs

In 2026 this is the default for scheduled jobs. Autoscaling is on. Photon is on. Cold starts measure in single-digit seconds. You pay only while tasks run.

Use it unless one of the following is true:

  - The job requires a specific instance type.
  - The job depends on init scripts.

When either applies, use a job cluster (a classic cluster dedicated to one run and terminated on completion). Job clusters remain the fallback; serverless is the first choice.
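In a Jobs API payload the two options look like this: a task with no cluster spec runs on serverless jobs compute (where it is enabled), while a task that pins an instance type declares a new_cluster. The notebook paths and instance type below are illustrative.

```json
{
  "name": "nightly-elt",
  "tasks": [
    {
      "task_key": "serverless_task",
      "notebook_task": {"notebook_path": "/Jobs/nightly_elt"}
    },
    {
      "task_key": "pinned_task",
      "notebook_task": {"notebook_path": "/Jobs/gpu_step"},
      "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "m5.2xlarge",
        "num_workers": 4,
        "custom_tags": {"team": "data-engineering"}
      }
    }
  ]
}
```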

All-purpose clusters

All-purpose clusters are for humans at notebooks. They stay alive between invocations, which is what makes interactive development feel fast. They are also priced at 2–3× job compute, which is what makes them a cost leak when anything automated attaches to one.

Danger

Never attach a production job to an all-purpose cluster. It is the single most common cost overrun surfaced in Databricks billing reviews. If a job is scheduled, it runs on serverless jobs compute or a job cluster. No exceptions without a documented review and waiver.

Photon

Photon is Databricks' vectorized C++ execution engine. It replaces the JVM query engine and delivers roughly 2–3× throughput for Parquet scans and aggregations.
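On serverless compute Photon is already on. On a classic cluster you opt in via the runtime_engine field of the Clusters API; the rest of this spec is an illustrative sketch.

```json
{
  "cluster_name": "photon-job-cluster",
  "spark_version": "15.4.x-scala2.12",
  "node_type_id": "m5.2xlarge",
  "num_workers": 4,
  "runtime_engine": "PHOTON"
}
```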

Picking compute: a decision tree

Walk this in order. Stop at the first match.

  1. Is the workload a SQL query (BI, dbt, ad-hoc)? → SQL warehouse. Serverless unless unavailable.
  2. Is the workload a scheduled job (Airflow, Lakeflow Jobs, cron)? → Serverless jobs compute. Job cluster if serverless cannot support it.
  3. Are you typing into a notebook while looking at the output? → All-purpose cluster. Prefer one provisioned by an instance pool.
  4. Are you building a streaming pipeline with declarative semantics? → Serverless Lakeflow Declarative Pipelines. See the LDP guide.
  5. Are you serving low-latency transactional queries to an application? → Not compute. Lakebase. See the Lakebase guide.
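The decision tree above can be sketched as a first-match function. The boolean flags are hypothetical inputs describing the workload, not Databricks API fields.

```python
# First-match walk of the compute decision tree: evaluate in order, stop at
# the first condition that holds.
def pick_compute(is_sql: bool = False,
                 is_scheduled: bool = False,
                 is_interactive: bool = False,
                 is_declarative_streaming: bool = False,
                 is_transactional_serving: bool = False,
                 serverless_available: bool = True) -> str:
    """Walk the decision tree in order; stop at the first match."""
    if is_sql:
        return ("SQL warehouse (serverless)" if serverless_available
                else "SQL warehouse (pro/classic)")
    if is_scheduled:
        return "serverless jobs" if serverless_available else "job cluster"
    if is_interactive:
        return "all-purpose cluster (from an instance pool)"
    if is_declarative_streaming:
        return "serverless Lakeflow Declarative Pipelines"
    if is_transactional_serving:
        return "Lakebase"
    return "no match: re-check the workload"
```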

Cost attribution

Every compute resource should carry custom tags for billing attribution:

{
  "custom_tags": {
    "team": "data-engineering",
    "cost_center": "DE-001",
    "project": "customer-360",
    "environment": "prod"
  }
}

Tags propagate to AWS Cost Explorer (or Azure / GCP equivalent) so Finance can cross-reference DBU spend against the team that owns it.

Cluster policies

Cluster policies enforce guardrails: allowed instance types, maximum worker counts, required tags, mandatory autotermination. They are the mechanism that prevents a rogue config from turning into a surprise on the invoice.

{
  "node_type_id": {
    "type": "allowlist",
    "values": ["m5.xlarge", "m5.2xlarge", "r5.xlarge"]
  },
  "autoscale.max_workers": {
    "type": "range",
    "minValue": 1,
    "maxValue": 20,
    "defaultValue": 8
  },
  "autotermination_minutes": {
    "type": "range",
    "minValue": 10,
    "maxValue": 120
  },
  "custom_tags.cost_center": {
    "type": "fixed",
    "value": "data-engineering"
  }
}
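To reason about what the policy above permits, its allowlist/range/fixed semantics can be re-implemented in a few lines. This is only an illustrative sketch: real enforcement happens server-side in Databricks, and the dict below simply mirrors the policy JSON.

```python
# Illustrative re-implementation of allowlist/range/fixed policy semantics.
# Mirrors the example policy above; not the actual Databricks policy engine.
CAUSEWAY_POLICY = {
    "node_type_id": {"type": "allowlist",
                     "values": ["m5.xlarge", "m5.2xlarge", "r5.xlarge"]},
    "autoscale.max_workers": {"type": "range", "minValue": 1,
                              "maxValue": 20, "defaultValue": 8},
    "autotermination_minutes": {"type": "range", "minValue": 10,
                                "maxValue": 120},
    "custom_tags.cost_center": {"type": "fixed", "value": "data-engineering"},
}

def violates(policy: dict, config: dict) -> list[str]:
    """Return policy violations for a flattened cluster config dict."""
    errors = []
    for key, rule in policy.items():
        value = config.get(key)
        if rule["type"] == "allowlist" and value not in rule["values"]:
            errors.append(f"{key}: {value!r} not in allowlist")
        elif rule["type"] == "range":
            # Missing values fall back to the policy default, if any.
            value = value if value is not None else rule.get("defaultValue")
            if value is None or not rule["minValue"] <= value <= rule["maxValue"]:
                errors.append(f"{key}: {value!r} outside range")
        elif rule["type"] == "fixed" and value != rule["value"]:
            errors.append(f"{key}: must be {rule['value']!r}")
    return errors
```

A config that names an instance type outside the allowlist, or omits a fixed tag, fails the check; omitted range attributes pass if the policy supplies a default in range.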

Every Causeway workspace applies a default policy to its all-purpose clusters. You cannot opt out; you can only request an exception.

See also