The built-in scheduled refresh dropdown is fine for demos. For production it is the single most common reason dashboards show stale data after a "successful" refresh: the schedule drifted from the data pipeline, the data pipeline failed, and Power BI refreshed anyway. This guide is how to stop having that conversation.
The pattern
dbt build (Databricks)
↓
success signal
↓
Power BI Enhanced Refresh API
↓
success or failure alert
Power BI refreshes after the data pipeline succeeds, never on its own wall-clock schedule. If the pipeline fails, the refresh never runs; the dashboard keeps yesterday's data (correctly) instead of showing a fresh-but-wrong view.
1. Enhanced Refresh REST API
The Enhanced Refresh API (POST /datasets/{id}/refreshes) gives you four things the UI scheduler does not:
- Async execution (no HTTP timeout when the refresh takes 30 minutes).
- Table- and partition-scoped refresh (`objects: [{table, partition}]`).
- Commit modes (`transactional` vs `partialBatch`).
- Retry and timeout control per invocation.
Minimal call:
curl -X POST "https://api.powerbi.com/v1.0/myorg/groups/{group_id}/datasets/{dataset_id}/refreshes" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"type": "full",
"commitMode": "transactional",
"objects": [
{"table": "fct_sales", "partition": "Current"}
],
"maxParallelism": 2,
"retryCount": 2,
"notifyOption": "NoNotification"
}'
Note
Scope the refresh to the partitions that actually changed, not the whole model. Most fact tables have a Current partition (today) and historical partitions that stay frozen. Refreshing only Current takes seconds instead of the minutes a full refresh takes, and costs the capacity proportionally less.
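Because the Enhanced Refresh POST is asynchronous (202 Accepted), a production caller needs to poll for the terminal status rather than trust the HTTP response. A minimal Python sketch, assuming the 202 response's `Location` header points at the in-flight refresh and reusing the `fct_sales`/`Current` scoping from the curl example; retry and alerting logic is omitted:

```python
import time
import requests

POWERBI_BASE = "https://api.powerbi.com/v1.0/myorg"

def is_terminal(status: str) -> bool:
    # Enhanced Refresh reports "Unknown" while in flight; anything else is terminal.
    return status != "Unknown"

def trigger_and_wait(group_id: str, dataset_id: str, token: str,
                     poll_seconds: int = 30, timeout_seconds: int = 3600) -> str:
    """POST an Enhanced Refresh, then poll its status URL until it finishes."""
    headers = {"Authorization": f"Bearer {token}"}
    url = f"{POWERBI_BASE}/groups/{group_id}/datasets/{dataset_id}/refreshes"
    resp = requests.post(url, headers=headers, json={
        "type": "full",
        "commitMode": "transactional",
        "objects": [{"table": "fct_sales", "partition": "Current"}],
    })
    resp.raise_for_status()                 # expect 202 Accepted
    status_url = resp.headers["Location"]   # URI of this refresh operation

    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        detail = requests.get(status_url, headers=headers)
        detail.raise_for_status()
        status = detail.json()["status"]
        if is_terminal(status):
            return status                   # e.g. "Completed" or "Failed"
        time.sleep(poll_seconds)
    raise TimeoutError(f"refresh still running after {timeout_seconds}s")
```

Treat anything other than `Completed` as a pipeline failure so the orchestrator can alert.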
2. Incremental refresh policy
For any fact table that grows over time, set an incremental refresh policy in the model. The policy:
- Keep N years of history.
- Refresh only the trailing M days (the "incremental window").
- Power BI does the partition math; you configure it once in Desktop.
In Power BI Desktop, select the fact table → Incremental refresh policy:
- Archive data starting 3 years before refresh date.
- Incrementally refresh data starting 7 days before refresh date.
- Detect data changes via a modified-at column (optional; reduces the refreshed slice further).
- Only refresh complete days (skip today, which is still arriving).
On Databricks, pair this with a source query that respects RangeStart and RangeEnd:
let
    Source = Databricks.Catalogs(...),
    Filtered = Table.SelectRows(
        Source,
        each [order_date] >= RangeStart and [order_date] < RangeEnd
    )
in
    Filtered
Power BI substitutes RangeStart/RangeEnd at runtime, so Databricks only reads the window that needs refreshing, not the whole table.
Warning
Without RangeStart/RangeEnd in the M query, incremental refresh still works but Databricks scans the full table every run. The whole point of incremental is that the scan scopes to the window. Verify the compiled SQL in Databricks Query History shows a WHERE order_date BETWEEN ... predicate; if it does not, the M query is not parameterized.
3. Airflow integration
The Microsoft Azure provider for Apache Airflow ships PowerBIDatasetRefreshOperator. On Astronomer, the provider is already in the standard package.
from airflow.providers.microsoft.azure.operators.powerbi import (
PowerBIDatasetRefreshOperator,
)
refresh = PowerBIDatasetRefreshOperator(
task_id="refresh_sales_mart",
conn_id="powerbi_default",
dataset_id="{{ var.value.sales_semantic_model_id }}",
group_id="{{ var.value.sales_workspace_id }}",
wait_for_termination=True,
request_body={
"type": "full",
"commitMode": "transactional",
"objects": [{"table": "fact_sales", "partition": "Current"}],
},
)
dbt_build_task >> refresh
wait_for_termination=True uses Airflow's deferrable trigger so the worker does not block while Power BI churns. The operator reports success, failure, or timeout back to the DAG.
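The "success or failure alert" leg of the pattern can hang off the operator via Airflow's standard `on_failure_callback`. A sketch posting to a Slack incoming webhook; the webhook URL and message format are assumptions, not part of the provider:

```python
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/..."  # hypothetical webhook


def build_alert_message(context: dict) -> str:
    # context is the standard Airflow task context handed to callbacks
    ti = context["task_instance"]
    return (f"Power BI refresh failed: dag={ti.dag_id} "
            f"task={ti.task_id} run={context['run_id']}")


def alert_on_failure(context: dict) -> None:
    # Fires only when the refresh task fails or times out
    requests.post(SLACK_WEBHOOK_URL, json={"text": build_alert_message(context)})
```

Attach it with `PowerBIDatasetRefreshOperator(..., on_failure_callback=alert_on_failure)`, or set it in the DAG's `default_args` so every task in the chain alerts the same way.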
Authentication
Power BI connections in Airflow use a service principal (an Entra ID app registration) with the tenant setting that allows service principals to use Fabric (Power BI) APIs enabled, and the principal added to the target workspace as Member or higher.
# Airflow Connection (powerbi_default)
Conn Type: Microsoft Azure Power BI
Login: <service-principal-client-id>
Password: <service-principal-client-secret>
Extra: {"tenantId": "<tenant-id>"}
Danger
Do not use a named user identity for scheduled refreshes. When that person leaves, every scheduled refresh breaks at 2am on a weeknight. Service principals are the correct identity; their rotation is an Entra concern, not a people concern.
4. Lakeflow Jobs alternative
Lakeflow Jobs does not yet have a native Power BI trigger. If your entire orchestration is on Databricks and you do not want to spin up Airflow just for this, shell out to the Enhanced Refresh API from a small Python task:
import requests
def refresh_powerbi_model(group_id: str, dataset_id: str):
token = get_entra_token() # see the token helper below
response = requests.post(
f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}/datasets/{dataset_id}/refreshes",
headers={
"Authorization": f"Bearer {token}",
"Content-Type": "application/json",
},
json={
"type": "full",
"commitMode": "transactional",
"objects": [{"table": "fact_sales", "partition": "Current"}],
},
)
response.raise_for_status()
Works; less elegant than the Airflow operator but operationally fine.
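The `get_entra_token` helper referenced above can be sketched with the client-credentials grant against Entra ID. The environment-variable names are assumptions; MSAL is an equally valid choice if you prefer a library over a raw token request:

```python
import os
import requests

# Scope for the Power BI REST API under the client-credentials flow
POWERBI_SCOPE = "https://analysis.windows.net/powerbi/api/.default"


def get_entra_token() -> str:
    """Fetch a client-credentials bearer token for the Power BI REST API."""
    tenant_id = os.environ["AZURE_TENANT_ID"]  # assumed env var names
    token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    resp = requests.post(token_url, data={
        "grant_type": "client_credentials",
        "client_id": os.environ["AZURE_CLIENT_ID"],
        "client_secret": os.environ["AZURE_CLIENT_SECRET"],
        "scope": POWERBI_SCOPE,
    })
    resp.raise_for_status()
    return resp.json()["access_token"]
```

On Databricks, store the client secret in a secret scope rather than a plain environment variable.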
5. MWAA on AWS
For teams on Managed Workflows for Apache Airflow: same operators, same service-principal auth. One useful pattern: S3 event trigger → conditional Power BI refresh. When an upstream dbt pipeline lands its run artifacts in S3, the event triggers the DAG; the DAG checks whether the fact table it reads actually changed, and only refreshes Power BI if it did.
from airflow.operators.python import BranchPythonOperator
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor
wait_for_dbt = S3KeySensor(
task_id="wait_for_dbt_run_results",
bucket_name="dbt-artifacts",
bucket_key="prod/run_results.json",
mode="reschedule",
)
check_changes = BranchPythonOperator(
task_id="check_fact_changed",
python_callable=compare_row_counts,
)
wait_for_dbt >> check_changes >> refresh_sales_mart
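The `compare_row_counts` callable is left undefined above; one way to implement the decision, assuming dbt's `run_results.json` artifact schema (`results[].unique_id`, `status`, and `adapter_response.rows_affected`, which not every adapter populates). The `skip_refresh` task id is an assumption for the no-op branch target you would add to the DAG:

```python
import json


def choose_refresh_task(run_results: dict, model_name: str = "fact_sales") -> str:
    """Return the downstream task_id for BranchPythonOperator.

    Refresh Power BI only if the fact model built successfully AND
    actually changed rows; otherwise take the no-op branch.
    """
    for result in run_results.get("results", []):
        changed = result.get("adapter_response", {}).get("rows_affected", 0) > 0
        if (result.get("unique_id", "").endswith(f".{model_name}")
                and result.get("status") == "success" and changed):
            return "refresh_sales_mart"
    return "skip_refresh"


def compare_row_counts(**context) -> str:
    import boto3  # imported lazily so the pure helper stays dependency-free
    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket="dbt-artifacts", Key="prod/run_results.json")
    return choose_refresh_task(json.loads(obj["Body"].read()))
```

Keeping the branching logic in a pure function (`choose_refresh_task`) makes it unit-testable without S3 or Airflow in the loop.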
6. Monitoring
Three signals to wire into observability:
| Signal | Threshold |
|---|---|
| Refresh success rate per model | > 99.5% |
| p95 refresh duration | Under the interval between refreshes |
| Stale data (latest refresh older than expected) | Alert if beyond SLA |
Refresh history comes from GET /groups/{id}/datasets/{id}/refreshes; scrape it into your stack.
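The staleness signal can be computed straight from that refresh-history payload, whose entries carry `status` and an ISO-8601 `endTime`. A sketch; the SLA threshold is whatever you committed to consumers:

```python
from datetime import datetime, timezone
from typing import Optional


def staleness_hours(refresh_history: list,
                    now: Optional[datetime] = None) -> Optional[float]:
    """Hours since the last *successful* refresh; None if nothing ever completed."""
    now = now or datetime.now(timezone.utc)
    completed = [
        datetime.fromisoformat(e["endTime"].replace("Z", "+00:00"))
        for e in refresh_history
        if e.get("status") == "Completed" and e.get("endTime")
    ]
    if not completed:
        return None
    return (now - max(completed)).total_seconds() / 3600


def breaches_sla(refresh_history: list, sla_hours: float,
                 now: Optional[datetime] = None) -> bool:
    # No successful refresh at all also counts as a breach
    hours = staleness_hours(refresh_history, now)
    return hours is None or hours > sla_hours
```

Run this on a schedule independent of the refresh pipeline itself, so a silently dead orchestrator still trips the staleness alert.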
7. Rollback and failure handling
When a refresh fails mid-run, commitMode: "transactional" rolls the model back to the previous state. Users see yesterday's data (correct), not a half-refreshed model (dangerous).
commitMode: "partialBatch" commits table-by-table; useful for very large models where a single table's failure should not invalidate the rest. Use with caution; partial models can confuse cross-table measures.
Important
Default to transactional. Reach for partialBatch only when you have a specific reason and have briefed consumers on what partial means. Consistency in analytics is worth the occasional refresh rollback.
Common mistakes
| Symptom | Root cause |
|---|---|
| UI scheduler runs even when pipeline fails | Schedule not chained to pipeline; disable UI schedule, drive from orchestrator |
| Incremental refresh scans full table | M query missing RangeStart/RangeEnd parameterization |
| Refresh fails at 2am after person leaves org | Named user credential; switch to service principal |
| 429 Too Many Requests | Concurrent refreshes exceeding capacity; lower maxParallelism or stagger |
| Half-refreshed model | commitMode: "partialBatch" with a failure; switch to transactional |
See also
- Semantic models — the artifact you are refreshing.
- CI/CD for Power BI — deploying the model that gets refreshed.
- dbt + Databricks quickstart — the upstream pipeline this refresh chains from.