Docs · By type
Airflow
Author your first production DAG
The idempotency, atomicity, and DAG-shaping rules that let an Airflow pipeline survive contact with a backfill. Walks through a realistic hourly ingest DAG.
Error recovery: retries, pools, sensors, callbacks
The hard part of orchestration is what happens when things go wrong. Retries are for transient failures, pools are for rate-limited resources, sensors are for dependencies that are not ready yet. Get the distinction right or you build flakiness.
Triage a failed Airflow DAG
The on-call procedure for a failed task: classify, check blast radius, pick a recovery path. Five minutes to the first decision.
Event-driven DAGs with Assets and AssetWatchers
Stop polling. Trigger DAGs when the data arrives, whether the producer is another DAG or an external system.
dbt
Build your first incremental dbt model
Moving a model from full refresh to incremental without breaking downstream consumers. Step-by-step with Databricks-specific choices.
Set up Slim CI for your dbt project
Running only modified models and their downstream dependents, deferring everything else to prod. The pattern that keeps CI under five minutes as the project grows.
Orchestrate dbt with Airflow + Astronomer Cosmos
One Airflow task per dbt model, data-aware scheduling, and how to avoid the single-BashOperator trap.
Triage a failed dbt model
The first five minutes after a dbt run fails: classify the error, identify the blast radius, and pick the right recovery path.
Databricks
Ship jobs with Databricks Asset Bundles
YAML in, deployed resources out. The only sanctioned way to ship anything to a Causeway Databricks workspace.
Lakebase: OLTP in the lakehouse
Managed Postgres inside Databricks. When to reach for it, the adoption ladder, and the Postgres fundamentals that a managed service does not absolve you of.
Build a Lakeflow Declarative Pipeline
A walkthrough of building an LDP (formerly Delta Live Tables) from source to silver, with expectations, CDC, and the modes that trip teams up.
Troubleshoot a failing Databricks cluster
Cluster will not start, cluster terminates unexpectedly, cluster runs but queries hang. The five-minute triage, and what to check in the 20 minutes after that.
Power BI
Make DirectQuery fast on Databricks
DirectQuery performance is a joint responsibility between your Power BI model and your warehouse. The full tuning checklist.
Chain Power BI refresh to your data pipeline
Stop using the UI scheduler. The Enhanced Refresh REST API, incremental policies, and Airflow integration that prevent stale dashboards after a failed upstream build.
Version-control Power BI with PBIP and TMDL
Stop committing .pbix. The folder format and TMDL files that finally make Power BI diff-friendly, review-friendly, and merge-friendly.
CI/CD for Power BI with fabric-cicd
Deploy PBIP artifacts through dev, staging, and prod workspaces with a service principal, BPA gates, and post-deploy refresh tests.
VS Code
Devcontainers and DevPod
Reproducible dev environments, shipped in the repo, identical for every engineer. The fix for 'works on my machine' that actually holds.
The inner loop: Python, PySpark, dbt, Airflow
What fast feedback actually looks like per stack — the 5-to-30-second write-run-inspect-fix cycle that divides productive data creators from slow ones.
Debugging in VS Code
Launch configs per workload, conditional breakpoints, data breakpoints, query-profile reading. The single highest-leverage skill for data engineers.
Wiring MCP servers into your agent
Give Claude Code, Cursor, or Copilot real tools: Databricks managed MCP, dbt Power User embedded MCP, GitHub, and the scoping rules that keep agents safe.