This walkthrough gets you from a clean Power BI Desktop install to a published report reading from a Databricks gold table. It assumes you have used SQL before and are comfortable installing desktop software. It does not assume prior Power BI experience.
What you need
- Power BI Desktop (December 2025 build or later, which ships ADBC support).
- Access to a Databricks workspace: the workspace URL and a serverless SQL warehouse ID.
- SELECT grants on at least one gold table.
- A Power BI workspace to publish into (Fabric or Premium capacity).
Install Power BI Desktop from the Microsoft Store or the download page. Launch it; pick Blank report.
1. Connect to Databricks
Home → Get data → Databricks (under Database). Enter:
- Server hostname: adb-1234567.azuredatabricks.net (no https://, no trailing slash)
- HTTP path: /sql/1.0/warehouses/abc123xyz
- Data Connectivity mode: Import (you can change later)
Note
In the 2026 connector, the Advanced options panel exposes the driver. For new connections prefer ADBC over ODBC (Simba). ADBC is Arrow-native end-to-end: lower connection latency, faster metadata discovery, dramatically faster row transfer on large result sets.
Authenticate. The canonical identity for BI users is OAuth (Microsoft Entra ID). Power BI opens a browser window; sign in with your work account.
2. Pick a table
After auth, you see the Unity Catalog navigator: catalogs → schemas → tables. Expand prod → gold and select a curated fact and its dimensions, say fct_orders and dim_customers.
Click Transform Data (not Load). The Power Query editor opens.
3. Clean up in Power Query
The only transformations that belong here are shape fixes that cannot live upstream:
- Correct obviously wrong data types that survived the Databricks cast.
- Promote headers if the source was a flat file.
- Remove columns the semantic model will not need.
Warning
Do not do business logic in Power Query. Every GROUP BY, every merged join, every derived column here bakes the logic into the report rather than the platform. It compresses worse than a Databricks-computed version, it hides from lineage, and it makes the model hard to hand off. If you are reaching for Power Query for calculations, stop; the answer is "push it into dbt or a Databricks view."
Click Close & Apply. The data loads into Power BI's in-memory VertiPaq engine.
4. Define relationships
Model view (left sidebar, third icon down). If relationships did not auto-detect, drag customer_id from fct_orders onto dim_customers.customer_id. Cardinality: many-to-one. Cross-filter direction: single.
Do this for every dimension your fact references. Three rules:
- One relationship per fact-to-dim edge. No "alternative paths".
- Single direction (dim → fact). Bidirectional filters are a query-plan explosion; reach for them only with CROSSFILTER() inside a specific measure when you need them locally.
- No snowflakes. If dim_customers has its own foreign key, denormalize it into dim_customers upstream rather than following the chain.
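When you do need a bidirectional filter, CROSSFILTER() lets you scope it to a single measure while the relationship itself stays single-direction. A sketch using the tables from this walkthrough (the measure name is illustrative):

```dax
-- Count customers who have at least one order, letting the fact table
-- filter the dimension for this one measure only. The model relationship
-- remains dim → fact.
Customers With Orders =
CALCULATE (
    DISTINCTCOUNT ( 'dim_customers'[customer_id] ),
    CROSSFILTER (
        'fct_orders'[customer_id],    -- many side
        'dim_customers'[customer_id], -- one side
        BOTH
    )
)
```

This keeps the query-plan cost of bidirectional filtering confined to the one measure that pays for it.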
5. Write one measure
In the report view, right-click fct_orders → New measure:
```dax
Total Revenue =
CALCULATE (
    SUM ( 'fct_orders'[amount_usd] ),
    'fct_orders'[status] = "completed"
)
```
Format: $#,##0. Category: Currency.
Two reflexes from the start:
- Measures, not calculated columns. Columns materialize at refresh and bloat memory; measures compute only for the visible grain.
- Variables for anything reused:
```dax
Avg Order Value =
VAR Revenue = [Total Revenue]
VAR Orders = DISTINCTCOUNT ( 'fct_orders'[order_id] )
RETURN DIVIDE ( Revenue, Orders )
```
DIVIDE() instead of / handles zeros cleanly.
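DIVIDE() also accepts an optional third argument: the value returned when the denominator is zero or BLANK. A hypothetical variant of the measure above, for reports where a blank cell would read as missing data rather than "no orders":

```dax
-- Returns 0 instead of BLANK when a filter context contains no orders
Avg Order Value Safe =
VAR Revenue = [Total Revenue]
VAR Orders = DISTINCTCOUNT ( 'fct_orders'[order_id] )
RETURN DIVIDE ( Revenue, Orders, 0 )
```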
6. Build one visual
Insert → Clustered bar chart. Drag dim_customers[region] to Y-axis, [Total Revenue] to X-axis. You now have a revenue-by-region chart fueled by an in-memory columnar store.
7. Save as PBIP (not .pbix)
File → Save As → change Save as type to Power BI project (.pbip). Pick a folder inside your Git-backed repository.
Important
Never commit .pbix files. They are binary and produce unreadable diffs. PBIP is the folder format: .Report holds visuals, .SemanticModel holds the model as TMDL (Tabular Model Definition Language), which is human-readable, diff-friendly, and review-friendly. Always save as PBIP for anything going into source control.
8. Publish
Home → Publish. Pick a workspace (ideally a per-user dev workspace named workspace-dev-<you>). Power BI uploads the model plus the report.
The published model is called a semantic model (formerly "dataset"). Reports connect to semantic models via live connection.
9. Schedule a refresh
In the browser, open the workspace → your semantic model → Settings → Scheduled refresh. Add credentials for Databricks (OAuth / service principal), set a daily refresh.
Tip
The UI scheduler is fine for learning. For anything real, drive refresh from your orchestrator via the Enhanced Refresh REST API so the dashboard cannot show stale data after a failed upstream pipeline. See the Enhanced Refresh guide.
What just happened
You:
- Connected Power BI to a Databricks SQL warehouse via the ADBC driver.
- Pulled a gold-layer fact and dim into VertiPaq (Import mode).
- Built one measure and one visual using measures-not-columns discipline.
- Saved as PBIP so the project is version-controllable.
- Published to a workspace and set up refresh.
That is the whole development loop. The same workflow scales from "revenue by region" to a governed, shared semantic model with thirty measures and five fact tables.
Next steps
- Connectivity — ADBC vs ODBC vs native; when each matters.
- Storage modes — when Import stops being the right default.
- Model authoring standards — the rules the rest of the team follows.
- PBIP + Git workflow — how concurrent edits actually work.