Guides, concepts, and references for creators and operators. Every doc is written to be read end to end, not scanned for a single line.
From a CSV on disk to a Silver contract in fifteen minutes.
From zero to a running job on a Causeway Databricks workspace. Twenty minutes, no notebooks required.
Your first dbt model against the Causeway lakehouse, start to finish in about twenty minutes.
Connect Power BI Desktop to a Databricks SQL warehouse, build a model against a gold table, publish a thin report. About thirty minutes.
From empty directory to a running DAG on local Astro in about fifteen minutes. Assumes little prior Airflow experience.
From a fresh laptop to a working data-creator environment: extensions installed, interpreter wired, first dbt model running, in under thirty minutes.
Schema, SLA, policy: why the three live together.
Staging, intermediate, marts — what each layer is for, why you should resist inventing a fourth, and how it maps to the medallion architecture on Databricks.
dbt-databricks supports five materializations. This is the decision framework for picking the right one per model.
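One such choice, sketched as it would look in a model file; the model name, unique key, and timestamp column are illustrative, not from any Causeway project:

```sql
-- models/marts/fct_orders.sql (illustrative)
{{
  config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='order_id',
    on_schema_change='fail'
  )
}}

select * from {{ ref('stg_orders') }}
{% if is_incremental() %}
  -- only rows newer than what the target table already holds
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

The `on_schema_change='fail'` line is the part teams forget: it turns silent schema drift into a loud build failure.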
Three physically distinct compute offerings, one right default per workload class, and the decision framework for getting it right the first time.
Three products consolidated into one data-engineering plane. Knowing which piece does what prevents architectural flailing.
The object model that governs every table, view, volume, and function on a Causeway Databricks workspace. What lives where, who can touch what, and why lineage comes for free.
Three ways Power BI can talk to Databricks, one right default, and the traps that trip teams up on AWS deployments and DirectQuery.
Four modes, increasingly overlapping, with a clear decision tree. When Import stops being the right default.
Datasets are called semantic models now and the term matters. The canonical pattern for one model per domain, many thin reports, and how Databricks metric views close the double-definition problem.
The most common Airflow mistake is treating it as a compute engine. Internalize this one distinction and every other decision follows.
Airflow 3.0 shipped in April 2025 and reshaped the model. The migration reality, the net-new capabilities, and the things that will break your old DAGs.
Three ways to say 'run B after A'. They are not equivalent. The decision framework for which to reach for when.
The organizing pattern behind a modern VS Code data-creator setup: the editor stays on the laptop, the compute stays in the cloud, and the extensions wire the two together invisibly.
Copilot, Claude Code, Cursor, Continue, Cline, Amazon Q: what each is good at, how they coexist, and how to avoid paying for four of them.
How extensions extend, where they live, how to pin and audit them, and why the marketplace is a supply-chain surface that deserves governance.
Walk a Silver dataset through review and into Gold.
Moving a model from full refresh to incremental without breaking downstream consumers. Step-by-step with Databricks-specific choices.
Running only modified models and their downstream, deferring everything else to prod. The pattern that keeps CI under five minutes as the project grows.
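The core of the pattern fits in one command; the artifact path is illustrative, and fetching prod's `manifest.json` into it is left to your CI:

```shell
# Build only changed models and their downstream; resolve everything
# else against production via deferral.
dbt build --select state:modified+ --defer --state prod-artifacts/
```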
One Airflow task per dbt model, data-aware scheduling, and how to avoid the single-BashOperator trap.
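The task-per-model mapping can be derived from dbt's `manifest.json`. This toy helper (hypothetical, not a Causeway utility) extracts one upstream-edge list per model; each key becomes one Airflow task, and in practice Astronomer Cosmos automates exactly this translation:

```python
import json

def model_edges(manifest_path):
    """Map each dbt model to its upstream model dependencies.
    Sketch only: seeds, sources, and tests are deliberately skipped."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    models = {uid for uid, node in manifest["nodes"].items()
              if node["resource_type"] == "model"}
    return {uid: [dep for dep in manifest["nodes"][uid]["depends_on"]["nodes"]
                  if dep in models]
            for uid in models}
```

Wiring each entry to its own operator (rather than one BashOperator running `dbt build`) is what gives you per-model retries and per-model visibility.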
YAML in, deployed resources out. The only sanctioned way to ship anything to a Causeway Databricks workspace.
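A minimal bundle sketch; every name, host, and path below is a placeholder:

```yaml
# databricks.yml (illustrative)
bundle:
  name: causeway_example

targets:
  dev:
    mode: development
    workspace:
      host: https://example.cloud.databricks.com

resources:
  jobs:
    nightly_silver:
      name: nightly-silver
      tasks:
        - task_key: build
          notebook_task:
            notebook_path: ../src/build_silver.ipynb
```

`databricks bundle validate` catches schema mistakes before `databricks bundle deploy -t dev` ships anything.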
The first five minutes after a dbt run fails: classify the error, identify the blast radius, and pick the right recovery path.
Managed Postgres inside Databricks. When to reach for it, the adoption ladder, and the Postgres fundamentals that a managed service does not absolve you from.
A walkthrough of building a Lakeflow Declarative Pipeline (formerly Delta Live Tables) from source to silver, with expectations, CDC, and the modes that trip teams up.
Cluster will not start, cluster terminates unexpectedly, cluster runs but queries hang. The 5-minute triage and the next 20.
DirectQuery performance is a joint responsibility between your Power BI model and your warehouse. The full tuning checklist.
Stop using the UI scheduler. The Enhanced Refresh REST API, incremental policies, and Airflow integration that prevent stale dashboards after a failed upstream build.
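A sketch of what calling the Enhanced Refresh endpoint looks like; the helper name and IDs are illustrative, and token acquisition and status polling are omitted:

```python
def enhanced_refresh_request(workspace_id, dataset_id, token):
    """Build (not send) an Enhanced Refresh call for a semantic model.
    Passing a JSON body is what upgrades the plain refresh endpoint to an
    enhanced refresh, which returns a pollable refresh operation."""
    url = (f"https://api.powerbi.com/v1.0/myorg/groups/{workspace_id}"
           f"/datasets/{dataset_id}/refreshes")
    return {
        "url": url,
        "headers": {"Authorization": f"Bearer {token}",
                    "Content-Type": "application/json"},
        "json": {"type": "full",
                 "commitMode": "transactional",
                 "retryCount": 2},
    }
```

From an Airflow task, send this with your HTTP client of choice only after the upstream build has succeeded; that ordering is what prevents the stale-dashboard failure mode.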
Stop committing .pbix. The folder format and TMDL files that finally make Power BI diff-friendly, review-friendly, and merge-friendly.
Deploy PBIP artifacts through dev, staging, and prod workspaces with a service principal, BPA gates, and post-deploy refresh tests.
The idempotency, atomicity, and DAG-shaping rules that let an Airflow pipeline survive contact with a backfill. Walks through a realistic hourly ingest DAG.
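The delete-then-insert idempotency rule fits in a few lines; `store` is a stand-in for a real table and `ts` for its event-time column (both assumptions of this sketch):

```python
def write_partition(store, interval_start, interval_end, rows):
    """Idempotent write keyed on the data interval: rerunning the same
    interval (retry or backfill) leaves the same rows, never duplicates."""
    # First remove anything previously written for this interval...
    store[:] = [r for r in store
                if not (interval_start <= r["ts"] < interval_end)]
    # ...then insert the fresh load, clipped to the interval. Doing the
    # delete first is what makes a mid-task crash safe to retry.
    store.extend(r for r in rows
                 if interval_start <= r["ts"] < interval_end)
```

Bound both steps by the task's data interval and a backfill becomes just many independent reruns.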
The hard part of orchestration is what happens when things go wrong. Retries are for transient failures, pools are for rate-limited resources, sensors are for not-ready-yet. Get the distinction right or you build flakiness.
The on-call procedure for a failed task: classify, check blast radius, pick a recovery path. Five minutes to the first decision.
Stop polling. Trigger DAGs when the data arrives, whether the producer is another DAG or an external system.
Reproducible dev environments, shipped in the repo, identical for every engineer. The fix for 'works on my machine' that actually holds.
What fast feedback actually looks like per stack — the 5-to-30-second write-run-inspect-fix cycle that divides productive data creators from slow ones.
Launch configs per workload, conditional breakpoints, data breakpoints, query-profile reading. The single highest-leverage skill for data engineers.
Give Claude Code, Cursor, or Copilot real tools: Databricks managed MCP, dbt Power User embedded MCP, GitHub, and the scoping rules that keep agents safe.
The subset of dbt commands you will use daily, with selector syntax and the flags that actually matter.
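A taste of the selector syntax; model names and tags are illustrative:

```shell
dbt build --select my_model        # run + test one model
dbt build --select +my_model       # ...plus everything upstream
dbt build --select my_model+       # ...plus everything downstream
dbt run --select tag:nightly --full-refresh   # rebuild incrementals from scratch
```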
The Databricks CLI subcommands you actually use day-to-day, in the order you hit them.
Three production tools, distinct roles, and the commands you actually use day to day.
The subset of Airflow and Astro CLI commands you use daily, plus the REST API fallback when the CLI does not cover something.
The `code` CLI, the command palette, the shortcuts that save seconds per iteration, and the Tasks that save minutes per day.
The Causeway rules for how a dbt model should look. Naming, structure, materialization, testing, and contracts. Enforced at review time.
What a Causeway dbt project must satisfy before the first prod deploy, and what it must keep satisfying after. Enforced at the Gold-promotion gate.
Causeway's rules for how a Databricks workload is structured, named, and deployed. Enforced at review time; exceptions require an RFD.
What a Causeway Databricks workload must satisfy before its first prod deploy, and what it must keep satisfying. Enforced at the promote-to-prod gate.
Python coding standards and best practices for the Causeway data platform.
The Causeway rules for a Power BI semantic model: shape, connectivity, DAX, storage mode, versioning. Enforced in PR review via BPA.
SQL coding standards and best practices for the Causeway data platform.
What a shared semantic model must satisfy before its first prod deploy, and keep satisfying after. Enforced at the Certified-endorsement gate.
The Causeway rules for a DAG: shape, dependencies, retries, resource gating, deployment. Enforced at review time.
What an Airflow deployment must satisfy before the first production DAG runs, and keep satisfying after. Enforced at the promote-to-prod gate.
The Causeway rules for a data-creator repo's `.vscode/`, `.devcontainer/`, and agent configuration. Enforced in PR review.
What the paved-path VS Code environment must satisfy before a team calls it 'done,' and keep satisfying after. The operational gate platform owns.
Reference documentation for the Causeway platform data model, including core entities and relationships.
The full config matrix for every dbt-databricks materialization, including incremental strategies, clustering, compute routing, and schema-change handling.
Symptom-first lookup for the errors you hit weekly: compilation, database, incremental schema drift, permissions, package conflicts.
Detailed reference for configuring the Causeway platform, including configuration files and environment variables.
Exhaustive config options for Databricks SQL warehouses: sizing, scaling, auto-stop, Photon, type selection.
Symptom-first lookup for the errors you hit most on a Databricks workspace: cluster launches, UC permissions, Delta writes, init scripts, Lakeflow pipelines.
The handful of DAX rules that cover 80% of production performance and correctness issues, plus the patterns to avoid.
The six concurrency knobs, how they interact, and how to tune for the bottleneck actually limiting you.
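The configuration-file knobs among them look like this in `airflow.cfg`; the values are illustrative, not recommendations:

```ini
[core]
parallelism = 32                # ceiling on running tasks across the deployment
max_active_tasks_per_dag = 16   # default per-DAG ceiling
max_active_runs_per_dag = 1     # serialize runs of the same DAG

[celery]
worker_concurrency = 16         # task slots per worker
```

The remaining knobs live elsewhere: pools are defined as resources and referenced per task, and `max_active_tis_per_dag` is set on the task itself.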
Symptom-first lookup for the errors on-call hits weekly. Task failures, scheduler issues, resource exhaustion.
Every extension ID the paved path ships, what it does, what settings it introduces, and the known sharp edges.
The settings.json, launch.json, and tasks.json recipes every data-creator workspace ships, plus the anti-patterns to avoid.