The built-in scheduled refresh dropdown is fine for demos. For production it is the single most common reason dashboards show stale data after a "successful" refresh: the schedule drifted from the data pipeline, the data pipeline failed, and Power BI refreshed anyway. This guide is how to stop having that conversation.
The pattern
dbt build (Databricks)
↓
success signal
↓
Power BI Enhanced Refresh API
↓
success or failure alert
Power BI refreshes after the data pipeline succeeds, never on its own wall-clock schedule. If the pipeline fails, the refresh never runs; the dashboard keeps yesterday's data (correctly) instead of showing a fresh-but-wrong view.
1. Enhanced Refresh REST API
The Enhanced Refresh API (POST /datasets/{id}/refreshes) gives you four things the UI scheduler does not:
- Async execution (no HTTP timeout when the refresh takes 30 minutes).
- Table- and partition-scoped refresh (`objects: [{table, partition}]`).
- Commit modes (`transactional` vs `partialBatch`).
- Retry and timeout control per invocation.
Minimal call:
curl -X POST "https://api.powerbi.com/v1.0/myorg/groups/{group_id}/datasets/{dataset_id}/refreshes" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"type": "full",
"commitMode": "transactional",
"objects": [
{"table": "fct_sales", "partition": "Current"}
],
"maxParallelism": 2,
"retryCount": 2,
"notifyOption": "NoNotification"
}'
Note
Scope the refresh to the partitions that actually changed, not the whole model. Most fact tables have a Current partition (today) and historical partitions that stay frozen. Refreshing only Current takes seconds instead of the minutes a full refresh takes, and costs the capacity proportionally less.
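Because the Enhanced Refresh POST is asynchronous (202 Accepted), a production caller needs to poll for the terminal status rather than trust the HTTP response. A minimal Python sketch, assuming the 202 response's `Location` header points at the in-flight refresh and reusing the `fct_sales`/`Current` scoping from the curl example; retry and alerting logic is omitted:

```python
import time
import requests

POWERBI_BASE = "https://api.powerbi.com/v1.0/myorg"

def is_terminal(status: str) -> bool:
    # Enhanced Refresh reports "Unknown" while in flight; anything else is terminal.
    return status != "Unknown"

def trigger_and_wait(group_id: str, dataset_id: str, token: str,
                     poll_seconds: int = 30, timeout_seconds: int = 3600) -> str:
    """POST an Enhanced Refresh, then poll its status URL until it finishes."""
    headers = {"Authorization": f"Bearer {token}"}
    url = f"{POWERBI_BASE}/groups/{group_id}/datasets/{dataset_id}/refreshes"
    resp = requests.post(url, headers=headers, json={
        "type": "full",
        "commitMode": "transactional",
        "objects": [{"table": "fct_sales", "partition": "Current"}],
    })
    resp.raise_for_status()                 # expect 202 Accepted
    status_url = resp.headers["Location"]   # URI of this refresh operation

    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        detail = requests.get(status_url, headers=headers)
        detail.raise_for_status()
        status = detail.json()["status"]
        if is_terminal(status):
            return status                   # e.g. "Completed" or "Failed"
        time.sleep(poll_seconds)
    raise TimeoutError(f"refresh still running after {timeout_seconds}s")
```

Treat anything other than `Completed` as a pipeline failure so the orchestrator can alert.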
2. Incremental refresh policy
For any fact table that grows over time, set an incremental refresh policy in the model. The policy:
- Keep N years of history.
- Refresh only the trailing M days (the "incremental window").
- Power BI does the partition math; you configure it once in Desktop.
In Power BI Desktop, select the fact table → Incremental refresh policy:
- Archive data starting 3 years before refresh date.
- Incrementally refresh data starting 7 days before refresh date.
- Detect data changes via a modified-at column (optional; reduces the refreshed slice further).
- Only refresh complete days (skip today, which is still arriving).
On Databricks, pair this with a source query that respects RangeStart and RangeEnd:
let
    Source = Databricks.Catalogs(...),
    Filtered = Table.SelectRows(
        Source,
        each [order_date] >= RangeStart and [order_date] < RangeEnd
    )
in
    Filtered
Power BI substitutes RangeStart/RangeEnd at runtime, so Databricks only reads the window that needs refreshing, not the whole table.
Warning
Without RangeStart/RangeEnd in the M query, incremental refresh still works but Databricks scans the full table every run. The whole point of incremental is that the scan scopes to the window. Verify the compiled SQL in Databricks Query History shows a WHERE order_date BETWEEN ... predicate; if it does not, the M query is not parameterized.
3. Airflow integration
The Microsoft Azure provider for Apache Airflow ships PowerBIDatasetRefreshOperator. On Astronomer, the provider is already in the standard package.
from airflow.providers.microsoft.azure.operators.powerbi import (
PowerBIDatasetRefreshOperator,
)
refresh = PowerBIDatasetRefreshOperator(
task_id="refresh_sales_mart",
conn_id="powerbi_default",
dataset_id="{{ var.value.sales_semantic_model_id }}",
group_id="{{ var.value.sales_workspace_id }}",
wait_for_termination=True,
request_body={
"type": "full",
"commitMode": "transactional",
"objects": [{"table": "fact_sales", "partition": "Current"}],
},
)
dbt_build_task >> refresh
wait_for_termination=True uses Airflow's deferrable trigger so the worker does not block while Power BI churns. The operator reports success, failure, or timeout back to the DAG.
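The "success or failure alert" leg of the pattern can hang off the operator via Airflow's standard `on_failure_callback`. A sketch posting to a Slack incoming webhook; the webhook URL and message format are assumptions, not part of the provider:

```python
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/..."  # hypothetical webhook


def build_alert_message(context: dict) -> str:
    # context is the standard Airflow task context handed to callbacks
    ti = context["task_instance"]
    return (f"Power BI refresh failed: dag={ti.dag_id} "
            f"task={ti.task_id} run={context['run_id']}")


def alert_on_failure(context: dict) -> None:
    # Fires only when the refresh task fails or times out
    requests.post(SLACK_WEBHOOK_URL, json={"text": build_alert_message(context)})
```

Attach it with `PowerBIDatasetRefreshOperator(..., on_failure_callback=alert_on_failure)`, or set it in the DAG's `default_args` so every task in the chain alerts the same way.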
Authentication
Power BI connections in Airflow use a service principal (an Entra ID app registration) with the tenant setting that allows service principals to use Fabric (Power BI) APIs enabled, and the principal added to the target workspace as Member or higher.
# Airflow Connection (powerbi_default)
Conn Type: Microsoft Azure Power BI
Login: <service-principal-client-id>
Password: <service-principal-client-secret>
Extra: {"tenantId": "<tenant-id>"}
Danger
Do not use a named user identity for scheduled refreshes. When that person leaves, every scheduled refresh breaks at 2am on a weeknight. Service principals are the correct identity; their rotation is an Entra concern, not a people concern.
4. Lakeflow Jobs alternative
Lakeflow Jobs does not yet have a native Power BI trigger. If your entire orchestration is on Databricks and you do not want to spin up Airflow just for this, shell out to the Enhanced Refresh API from a small Python task:
import requests
def refresh_powerbi_model(group_id: str, dataset_id: str):
token = get_entra_token() # see the token helper below
response = requests.post(
f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}/datasets/{dataset_id}/refreshes",
headers={
"Authorization": f"Bearer {token}",
"Content-Type": "application/json",
},
json={
"type": "full",
"commitMode": "transactional",
"objects": [{"table": "fact_sales", "partition": "Current"}],
},
)
response.raise_for_status()
Works; less elegant than the Airflow operator but operationally fine.
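The `get_entra_token` helper referenced above can be sketched with the client-credentials grant against Entra ID. The environment-variable names are assumptions; MSAL is an equally valid choice if you prefer a library over a raw token request:

```python
import os
import requests

# Scope for the Power BI REST API under the client-credentials flow
POWERBI_SCOPE = "https://analysis.windows.net/powerbi/api/.default"


def get_entra_token() -> str:
    """Fetch a client-credentials bearer token for the Power BI REST API."""
    tenant_id = os.environ["AZURE_TENANT_ID"]  # assumed env var names
    token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    resp = requests.post(token_url, data={
        "grant_type": "client_credentials",
        "client_id": os.environ["AZURE_CLIENT_ID"],
        "client_secret": os.environ["AZURE_CLIENT_SECRET"],
        "scope": POWERBI_SCOPE,
    })
    resp.raise_for_status()
    return resp.json()["access_token"]
```

On Databricks, store the client secret in a secret scope rather than a plain environment variable.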
5. MWAA on AWS
For teams on Managed Workflows for Apache Airflow: same operators, same service-principal auth. One useful pattern: S3 event trigger → conditional Power BI refresh. When an upstream dbt pipeline lands its run artifacts in S3, the event triggers the DAG; the DAG checks whether the fact table it reads actually changed, and only refreshes Power BI if it did.
from airflow.operators.python import BranchPythonOperator
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor
wait_for_dbt = S3KeySensor(
task_id="wait_for_dbt_run_results",
bucket_name="dbt-artifacts",
bucket_key="prod/run_results.json",
mode="reschedule",
)
check_changes = BranchPythonOperator(
task_id="check_fact_changed",
python_callable=compare_row_counts,
)
wait_for_dbt >> check_changes >> refresh_sales_mart
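The `compare_row_counts` callable is left undefined above; one way to implement the decision, assuming dbt's `run_results.json` artifact schema (`results[].unique_id`, `status`, and `adapter_response.rows_affected`, which not every adapter populates). The `skip_refresh` task id is an assumption for the no-op branch target you would add to the DAG:

```python
import json


def choose_refresh_task(run_results: dict, model_name: str = "fact_sales") -> str:
    """Return the downstream task_id for BranchPythonOperator.

    Refresh Power BI only if the fact model built successfully AND
    actually changed rows; otherwise take the no-op branch.
    """
    for result in run_results.get("results", []):
        changed = result.get("adapter_response", {}).get("rows_affected", 0) > 0
        if (result.get("unique_id", "").endswith(f".{model_name}")
                and result.get("status") == "success" and changed):
            return "refresh_sales_mart"
    return "skip_refresh"


def compare_row_counts(**context) -> str:
    import boto3  # imported lazily so the pure helper stays dependency-free
    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket="dbt-artifacts", Key="prod/run_results.json")
    return choose_refresh_task(json.loads(obj["Body"].read()))
```

Keeping the branching logic in a pure function (`choose_refresh_task`) makes it unit-testable without S3 or Airflow in the loop.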
6. Monitoring
Three signals to wire into observability:
| Signal | Threshold |
|---|---|
| Refresh success rate per model | > 99.5% |
| p95 refresh duration | Under the interval between refreshes |
| Stale data (latest refresh older than expected) | Alert if beyond SLA |
Refresh history comes from GET /groups/{id}/datasets/{id}/refreshes; scrape it into your stack.
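The staleness signal can be computed straight from that refresh-history payload, whose entries carry `status` and an ISO-8601 `endTime`. A sketch; the SLA threshold is whatever you committed to consumers:

```python
from datetime import datetime, timezone
from typing import Optional


def staleness_hours(refresh_history: list,
                    now: Optional[datetime] = None) -> Optional[float]:
    """Hours since the last *successful* refresh; None if nothing ever completed."""
    now = now or datetime.now(timezone.utc)
    completed = [
        datetime.fromisoformat(e["endTime"].replace("Z", "+00:00"))
        for e in refresh_history
        if e.get("status") == "Completed" and e.get("endTime")
    ]
    if not completed:
        return None
    return (now - max(completed)).total_seconds() / 3600


def breaches_sla(refresh_history: list, sla_hours: float,
                 now: Optional[datetime] = None) -> bool:
    # No successful refresh at all also counts as a breach
    hours = staleness_hours(refresh_history, now)
    return hours is None or hours > sla_hours
```

Run this on a schedule independent of the refresh pipeline itself, so a silently dead orchestrator still trips the staleness alert.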
7. Rollback and failure handling
When a refresh fails mid-run, commitMode: "transactional" rolls the model back to the previous state. Users see yesterday's data (correct), not a half-refreshed model (dangerous).
commitMode: "partialBatch" commits table-by-table; useful for very large models where a single table's failure should not invalidate the rest. Use with caution; partial models can confuse cross-table measures.
Important
Default to transactional. Reach for partialBatch only when you have a specific reason and have briefed consumers on what partial means. Consistency in analytics is worth the occasional refresh rollback.
Common mistakes
| Symptom | Root cause |
|---|---|
| UI scheduler runs even when pipeline fails | Schedule not chained to pipeline; disable UI schedule, drive from orchestrator |
| Incremental refresh scans full table | M query missing RangeStart/RangeEnd parameterization |
| Refresh fails at 2am after person leaves org | Named user credential; switch to service principal |
| 429 Too Many Requests | Concurrent refreshes exceeding capacity; lower maxParallelism or stagger |
| Half-refreshed model | commitMode: "partialBatch" with a failure; switch to transactional |
See also
- Semantic models — the artifact you are refreshing.
- CI/CD for Power BI — deploying the model that gets refreshed.
- dbt + Databricks quickstart — the upstream pipeline this refresh chains from.