DirectQuery means every visual interaction runs a SQL query against Databricks. Done wrong, it feels like dial-up; done right, it feels close to Import. This guide is the tuning playbook.
Why you are reading this
You ended up on DirectQuery because one of the following applied (see storage modes):
- The fact table is tens of TB and does not fit in capacity memory.
- Users need minute-level freshness.
- Policy keeps data in Databricks.
If none of those is true, go back to Import. DirectQuery tuning is tractable but not free; Import is already fast.
1. Warehouse
DirectQuery performance is floored by the warehouse. Three things must be true:
Serverless SQL warehouse
Serverless only. Pro and Classic cold-start too slowly for the "click a slicer, get a result" UX. See SQL warehouse reference.
Photon on
Photon accelerates the aggregation-heavy SQL Power BI emits by 2-8×. On Serverless it is always on; on Pro, turn it on.
Warm during report hours
Set min_num_clusters: 1 and auto_stop_mins: 5-10. Cold starts are visible to end users; a scale-from-zero warehouse on an 8am dashboard hit means every executive sees 6 seconds of spinner.
Warning
"Auto-stop" is not the enemy; long cold-start is. Serverless scale-to-zero cold-starts in 2-6 seconds, which is acceptable for interactive BI. Pro cold-starts in ~4 minutes, which is not. If you are on Pro, keep the warehouse warm during report hours; if you are on Serverless, aggressive auto-stop is fine.
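Concretely, these settings live in the warehouse definition. A hedged sketch of the payload for the Databricks SQL Warehouses REST API (`POST /api/2.0/sql/warehouses`); the field names follow the current API, but the warehouse name and sizes are illustrative:

```json
{
  "name": "bi-directquery",
  "cluster_size": "Medium",
  "enable_serverless_compute": true,
  "enable_photon": true,
  "min_num_clusters": 1,
  "max_num_clusters": 4,
  "auto_stop_mins": 10
}
```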
2. Model the fact right
The single largest DirectQuery performance determinant is the shape of the underlying tables.
Star schema, not snowflake
Power BI's query engine optimizes specifically for stars. Snowflakes produce worse query plans even with a perfect warehouse.
- Facts in the middle, dimensions radiating out.
- Wide dimensions, narrow facts.
- Conformed dimensions (`dim_date`, `dim_customer`) shared across facts.
See model authoring standards for the full rule set.
Surrogate keys, integer-typed
Every relationship should be on an integer surrogate key. Strings compress worse and join slower; VertiPaq's dictionary encoding works best on small integer domains.
No calculated columns on the fact
Calculated columns materialize at refresh (Import) or per query (DirectQuery). Either way, the correct home for derived columns is upstream: a dbt model, a Databricks view, a metric view. They compress better and do not bloat query plans.
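A sketch of the upstream alternative, with illustrative table and column names:

```sql
-- Derived column computed once in the warehouse instead of as a
-- Power BI calculated column. (The prod.gold names are illustrative.)
CREATE OR REPLACE VIEW prod.gold.vw_fct_orders AS
SELECT
  *,
  amount - cost AS margin
FROM prod.gold.fct_orders;
```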
3. Make the warehouse help
The warehouse can only go as fast as the layout of the underlying data lets it. Four knobs.
Liquid clustering on fact tables
```sql
ALTER TABLE prod.gold.fct_orders CLUSTER BY (customer_id, order_date);
```
Liquid clustering adapts as query patterns shift. Reserve partition_by for hard isolation requirements (GDPR region partitions); liquid clustering is the 2026 default.
Keep statistics current
```sql
ANALYZE TABLE prod.gold.fct_orders COMPUTE STATISTICS FOR ALL COLUMNS;
```
Run this after a schema change or a large data load. Stale statistics make the optimizer pick wrong plans; no warehouse size fixes that.
Narrow the scan
Every DirectQuery visual emits SQL with a WHERE clause derived from the slicers and filters on the page. If those filters do not map to columns the warehouse can prune on, you scan the whole fact.
Audit: for every user-facing visual, check the compiled SQL in the query profile (Databricks SQL Editor → Query History → Query Profile). If the scan reads all 10B rows to aggregate one region, add a filter the warehouse can prune on, or extend the table's clustering keys to cover the filter column.
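A hedged way to audit pruning outside the profile UI, reusing the `fct_orders` clustering columns from earlier (the `amount` column is illustrative):

```sql
-- EXPLAIN shows whether the WHERE clause prunes files on the
-- clustering columns or forces a full scan of the fact table.
EXPLAIN
SELECT customer_id, SUM(amount) AS total
FROM prod.gold.fct_orders
WHERE order_date >= '2026-01-01'
GROUP BY customer_id;
```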
Respect the row budget
A DirectQuery visual's result set is capped by Power BI (default 1M rows). If you return more, the visual silently truncates. Keep aggregations in the SQL, not in DAX.
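As a sketch (table and column names assumed), a plain aggregate measure folds into the generated SQL's GROUP BY, so the warehouse returns a handful of grouped rows instead of a million detail rows:

```dax
-- Folds to SELECT ..., SUM(amount) ... GROUP BY ... in the generated
-- SQL; Power BI only renders the grouped result.
Total Sales := SUM ( fct_orders[amount] )
```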
4. Automatic aggregations
Turn on automatic aggregations for any DirectQuery semantic model with more than a few hundred million fact rows. Power BI watches query patterns, materializes the top-N aggregation tables in memory, and serves matching queries from cache.
Model view → Manage aggregations → Enable automatic aggregations
This is free performance if your model is a clean star. If it is not, aggregations will be subtly wrong; fix the model shape first.
Tip
After enabling automatic aggregations, let them bake for a week against real usage. The service chooses which aggregation tables to build based on actual query patterns, not your predictions. You get better recommendations from a week of prod traffic than from any amount of desktop modeling.
5. Dual on the dims
One table's storage mode is not the whole model's storage mode. Put your facts in DirectQuery and your dimensions in Dual. Slicer changes resolve instantly against in-memory dim copies; only fact scans round-trip.
See storage modes for the canonical deployment.
6. Shape the DAX
The five-rule DAX tour from the DAX reference matters more in DirectQuery than in Import. Specifically:
- `CALCULATE(SUM(x), filter)` is faster than `SUMX(FILTER(table, predicate), x)` because the first pushes the filter to storage.
- `DIVIDE()` instead of `/` handles zeros and is storage-engine-friendly.
- Measures, not calculated columns; `VAR` for anything reused.
- No bidirectional relationships; use `CROSSFILTER()` inside a specific measure when the behavior is truly local.
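A minimal sketch of the first rule, with illustrative table and column names:

```dax
-- Filter argument pushed to the storage engine; folds into the SQL.
West Sales :=
CALCULATE ( SUM ( fct_orders[amount] ), dim_region[region] = "West" )

-- Same result, but the FILTER iteration runs in the formula engine
-- and pulls far more rows back from the warehouse.
West Sales Slow :=
SUMX (
    FILTER ( fct_orders, RELATED ( dim_region[region] ) = "West" ),
    fct_orders[amount]
)
```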
7. Avoid the Databricks.Query trap
Danger
`Databricks.Query` (custom SQL) is not supported in DirectQuery. A model that uses it builds locally and publishes, then fails at query time in the service. Always model against `Databricks.Catalogs(...)` and let Power BI generate the SQL. Put custom SQL in a Databricks view; the connector reads it like any other table.
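The fix is mechanical; a hedged sketch with illustrative names:

```sql
-- Move the custom SQL into a view; the connector reads it like any
-- other table and DirectQuery query generation works normally.
CREATE OR REPLACE VIEW prod.gold.vw_orders_enriched AS
SELECT o.order_id, o.order_date, o.amount, c.segment
FROM prod.gold.fct_orders AS o
JOIN prod.gold.dim_customer AS c USING (customer_id);
```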
8. Measure it
"I tested it in the report" is not a performance check. Use real tools.
DAX Studio
- Server timings view: milliseconds per storage engine call, per formula engine step.
- Query plan view: the actual plan Power BI builds for a given query.
- Run common visuals' underlying DAX and read the timings before shipping.
Tabular Editor 3
- Best Practice Analyzer (BPA) rules flag anti-patterns before they hit prod. The default rule set is strong; extend with org-specific rules.
- Gate PRs on BPA: no new measures merge with open BPA violations.
Query history
In Databricks SQL → Query History, filter by the service principal your gateway uses. Sort by duration and look at the p95. A p95 above 3-5 seconds means at least one visual is doing work that a tune-up would eliminate.
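The same check can be scripted against Unity Catalog system tables, if they are enabled in your workspace. A hedged sketch; column names can vary by release, and the principal is a placeholder:

```sql
-- p95 query duration (seconds) over the last 7 days for the
-- gateway's service principal.
SELECT approx_percentile(total_duration_ms, 0.95) / 1000 AS p95_seconds
FROM system.query.history
WHERE executed_by = '<gateway-service-principal>'
  AND start_time >= date_sub(current_date(), 7);
```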
9. Monitoring
Production DirectQuery models need SLIs. A reasonable starter:
| Metric | Target |
|---|---|
| p95 visual render time | < 3s |
| p99 visual render time | < 8s |
| Warehouse queued-queries peak | 0 sustained |
| Refresh success rate (for hybrid models with Imported dims) | > 99.5% |
Wire these into your observability stack (Grafana, Datadog, or the Causeway-internal dashboard) as first-class SLOs, not post-hoc investigations.
Common mistakes
| Symptom | Root cause |
|---|---|
| Visuals take 10+ seconds | Cold warehouse, or DirectQuery scanning the full fact |
| One slicer change freezes the whole report | Dims in DirectQuery instead of Dual |
| Different visuals show different numbers | Databricks.Query source is bypassing Power BI's query generation |
| Everything slow after a Power BI upgrade | Connection flipped back to ODBC; check the driver |
| "Too many rows" errors | Aggregation happening in DAX instead of SQL; push it to the warehouse |
See also
- Storage modes — should you be on DirectQuery at all?
- Connectivity — ADBC is step zero.
- SQL warehouse reference — sizing and config.
- DAX reference — the rules for measures.