How-tos · Docs · Causeway

Airflow

The idempotency, atomicity, and DAG-shaping rules that let an Airflow pipeline survive contact with a backfill. Walks through a realistic hourly ingest DAG.

Error recovery: retries, pools, sensors, callbacks

The hard part of orchestration is what happens when things go wrong. Retries are for transient failures, pools are for rate-limited resources, and sensors are for dependencies that are not ready yet. Get the distinction wrong and you build flakiness.

Triage a failed Airflow DAG

The on-call procedure for a failed task: classify, check blast radius, pick a recovery path. Five minutes to the first decision.

Event-driven DAGs with Assets and AssetWatchers

Stop polling. Trigger DAGs when the data arrives, whether the producer is another DAG or an external system.

dbt

Build your first incremental dbt model

Moving a model from full refresh to incremental without breaking downstream consumers. Step-by-step with Databricks-specific choices.

Set up Slim CI for your dbt project

Running only modified models and their downstream dependents, deferring everything else to prod. The pattern that keeps CI under five minutes as the project grows.

Orchestrate dbt with Airflow + Astronomer Cosmos

One Airflow task per dbt model, data-aware scheduling, and how to avoid the single-BashOperator trap.

Triage a failed dbt model

The first five minutes after a dbt run fails: classify the error, identify the blast radius, and pick the right recovery path.

Databricks

Ship jobs with Databricks Asset Bundles

YAML in, deployed resources out. The only sanctioned way to ship anything to a Causeway Databricks workspace.

Lakebase: OLTP in the lakehouse

Managed Postgres inside Databricks. When to reach for it, the adoption ladder, and the Postgres fundamentals that a managed service does not absolve you of.

Build a Lakeflow Declarative Pipeline

A walkthrough of building an LDP (formerly Delta Live Tables) from source to Silver, with expectations, CDC, and the modes that trip teams up.

Troubleshoot a failing Databricks cluster

Cluster will not start, cluster terminates unexpectedly, cluster runs but queries hang. The 5-minute triage and the 20 minutes after it.

Power BI

Make DirectQuery fast on Databricks

DirectQuery performance is a joint responsibility between your Power BI model and your warehouse. The full tuning checklist.

Chain Power BI refresh to your data pipeline

Stop using the UI scheduler. The Enhanced Refresh REST API, incremental policies, and Airflow integration that prevent stale dashboards after a failed upstream build.

Version-control Power BI with PBIP and TMDL

Stop committing .pbix. The folder format and TMDL files that finally make Power BI diff-friendly, review-friendly, and merge-friendly.

CI/CD for Power BI with fabric-cicd

Deploy PBIP artifacts through dev, staging, and prod workspaces with a service principal, Best Practice Analyzer (BPA) gates, and post-deploy refresh tests.

VS Code

Devcontainers and DevPod

Reproducible dev environments, shipped in the repo, identical for every engineer. The fix for 'works on my machine' that actually holds.

The inner loop: Python, PySpark, dbt, Airflow

What fast feedback actually looks like per stack: the 5-to-30-second write-run-inspect-fix cycle that separates productive data creators from slow ones.

Debugging in VS Code

Launch configs per workload, conditional breakpoints, data breakpoints, query-profile reading. The single highest-leverage skill for data engineers.

Wiring MCP servers into your agent

Give Claude Code, Cursor, or Copilot real tools: Databricks managed MCP, dbt Power User embedded MCP, GitHub, and the scoping rules that keep agents safe.

Cross-cutting

Promoting to Gold

Walk a Silver dataset through review and into Gold.