These standards bind every dbt model in the Causeway monorepo. They are not recommendations. If a model needs to deviate, deviate in the model's yml file with a comment explaining why and a link to the RFD or decision.
1. Project layout
Three layers, no fourth.
models/
staging/
intermediate/
marts/
- Staging (
stg_<source>__<table>.sql): one-to-one with a source table. Renames, casts, coerces nulls. No joins, no aggregations, no filters. Always aview. - Intermediate (
int_<verb>__<noun>.sql): composes staging into reusable building blocks. Defaultephemeral; promote toviewif referenced by three or more downstream models. Never exposed to BI or external consumers. - Marts (
fct_<grain>.sql,dim_<entity>.sql,agg_<metric>.sql): what downstream reads. Defaulttable; promote toincrementalwhen full-refresh cost hurts;materialized_viewfor DLT-managed aggregations;streaming_tablewhen latency beats cost.
Warning
Do not invent a fourth layer. "metrics/", "presentation/", "semantic/" always feel clarifying the day they are introduced and always become naming arguments within a quarter. The three layers compose everything.
2. Naming
| Thing | Pattern | Example |
|---|---|---|
| Staging | stg_<source>__<table> | stg_salesforce__accounts |
| Intermediate | int_<verb>__<noun> | int_joined__orders_customers |
| Fact mart | fct_<grain> | fct_orders |
| Dimension mart | dim_<entity> | dim_customers |
| Aggregate mart | agg_<metric> | agg_revenue_daily |
| Test file | test_<assertion>.sql | test_no_orphan_orders.sql |
| Column | snake_case, no abbrev. except id | customer_id, placed_at, revenue_usd |
| Boolean | is_ / has_ prefix | is_active, has_subscription |
| Timestamp | _at suffix | created_at, updated_at |
| Date | _date suffix | event_date, placed_date |
| Ingestion timestamp | _loaded_at | reserved; never use for anything else |
Currency columns carry the currency in the name: amount_usd, spend_eur.
3. File structure inside a model
Every non-trivial model follows this shape:
with <input1> as ( select * from {{ ref('…') }} ),
<input2> as ( select * from {{ ref('…') }} ),
<transform_step_1> as (
...
),
<transform_step_2> as (
...
),
final as (
...
)
select * from final
Rules:
- Read every
ref()into a named CTE at the top, before any logic. Never subquery aref(). Keeps compiled SQL readable in Databricks query history. - One CTE per logical step. Name the step, not the output table:
renamed,filtered,deduplicated,joined,final. - Always end with
select * from final. The pattern is a promise about shape; it also makes the final column list debuggable with a trailing-- select * from joined limit 100during development.
4. Jinja discipline
- Macros are for repetition, not cleverness. If a macro is used once, inline it. If it is used twice, wait for a third. Intermediate models observe as nodes in the DAG; macros do not.
- Reach for packages before writing macros.
dbt_utilsfor pivots, surrogate keys, date spines.dbt_expectationsfor expressive tests.dbt-labs/codegenfor boilerplate. - Env gating via
target.name. Differences between dev and prod go in{% if target.name == 'prod' %}blocks, not environment variables read in SQL. - No per-developer
vars. Dev scoping happens viagenerate_schema_name. If two developers need different values at build time, they need different models.
5. Materialization defaults
Baked into dbt_project.yml:
models:
causeway:
staging:
+materialized: view
+schema: staging
+tags: [staging]
intermediate:
+materialized: ephemeral
+tags: [intermediate]
marts:
+materialized: table
+schema: gold
+tags: [marts]
Override per-model only when the default is wrong. Justify the deviation in a comment next to config().
Note
"Justify" means one line pointing at a measurable reason: "full refresh takes 42m, incremental needed" or "consumed by four dashboards that poll every minute, needs to be a table". Opinions about "feels right" do not count.
6. Incremental models
If you write materialized='incremental', you commit to all of these:
- A real, reliable watermark column.
- A deduplication CTE before any
merge. is_incremental()filter anchored tomax(watermark) from {{ this }}, never to wall-clock time.on_schema_changeset explicitly.append_new_columnsis the Causeway default;ignoreis banned without a reviewed RFD.- A
uniquetest on theunique_key. - A
not_nulltest on the watermark column.
See Build your first incremental model for the worked example.
Danger
on_schema_change: 'ignore' silently drops columns that disappear upstream. It is a data-loss path that hides until someone asks why a column is empty. Never set it. If you have a reason, write an RFD first.
7. Physical layer
- Liquid clustering by default for every
tableandincremental. Pick clustering keys from the columns consumers filter and join on, typically the primary entity key and a time column. partition_byonly for hard isolation requirements (GDPR region separation, multi-tenant where tenants must never co-locate).- Compute routing via named compute in
profiles.yml. Serverless SQL warehouses for everything SQL. All-purpose clusters only for Python models.
{{ config(
materialized='table',
liquid_clustered_by=['customer_id', 'event_date'],
tblproperties={'delta.autoOptimize.optimizeWrite': 'true'}
) }}
8. Tests
Tests fall into four categories. Do not conflate them.
| Category | Where declared | Runs against | Catches |
|---|---|---|---|
| Generic data tests | schema.yml | Real data | Data quality issues |
| Singular tests | .sql in tests/ | Real data | One-off business rules |
| Unit tests | schema.yml | Static mocks | Logic regressions |
| Model contracts | schema.yml | Schema metadata | Breaking changes to public interfaces |
Required per model:
| Model type | Required tests |
|---|---|
| Staging | not_null + unique on the primary key; not_null on the ingestion timestamp |
| Intermediate | not_null on any required join keys |
| Fact mart | not_null + unique on the grain key; not_null on all required dims; relationships to referenced dims |
| Dim mart | not_null + unique on the entity key; contract enforced |
| Incremental model | All of the above plus not_null on the watermark |
Run tests with dbt build, not dbt run + dbt test split. build interleaves so a failing test blocks downstream, which is almost always what you want.
9. Model contracts
Contracts are for public interfaces: marts that BI, other dbt projects, or services consume. Contracts are not for internal intermediates.
models:
- name: dim_customers
config:
contract:
enforced: true
columns:
- name: customer_id
data_type: bigint
constraints:
- type: not_null
- type: primary_key
- name: email
data_type: string
constraints: [{type: not_null}]
When a contract breaks, create dim_customers_v2 alongside and version with versions: rather than breaking consumers in place. Consumers migrate on their own cadence.
10. Sources and freshness
Every external table enters the project through a declared source. Every source has freshness thresholds.
sources:
- name: bronze
database: prod
schema: bronze
freshness:
warn_after: {count: 12, period: hour}
error_after: {count: 24, period: hour}
loaded_at_field: _loaded_at
tables:
- name: raw_orders
Source freshness checks run out-of-band via dbt source freshness, wired into orchestration (Airflow DAG, Cosmos task group), not into CI. A stale source should page someone, not fail a PR.
11. Packages
Prefer community packages over writing macros. Vendor-controlled and version-pinned in packages.yml.
Acceptable packages in Causeway projects:
dbt-labs/dbt_utils— date spines, surrogate keys, pivots.calogica/dbt_expectations— statistical and distributional tests.elementary-data/elementary— anomaly detection and test-run history.dbt-labs/codegen— generateschema.ymland source declarations.
Any package not on that list requires review in a PR with explicit justification.
12. Review checklist
PRs touching models/ must satisfy:
- [ ] Naming conforms to the table in section 2.
- [ ] File structure follows the CTE pattern in section 3.
- [ ] No macro defined that is used fewer than twice.
- [ ] Materialization matches layer defaults, or deviation is justified.
- [ ] Liquid clustering on any new
table/incremental; nopartition_bywithout an isolation justification. - [ ] Required tests present per section 8.
- [ ] Public marts have
contract: enforced: true. - [ ] Sources declared with freshness thresholds.
- [ ] No new
vars:for per-developer values. - [ ] Slim CI build passes against the stored prod manifest.
See also
- Production readiness — what it takes to promote a project or model to prod.
- Materializations concepts — the framework these standards rest on.
- The contract triple — how Causeway's contract spec layers onto dbt model contracts.