These standards bind every job, pipeline, notebook, and bundle that deploys to a Causeway Databricks workspace. They are not recommendations.

1. Workspace and environment isolation

Danger

Never share a workspace across dev/staging/prod. The blast radius of a bad deploy is unbounded: one bad notebook can drop prod tables, one cluster policy mistake can burn the prod monthly budget. Three workspaces are the cheapest insurance you will ever buy.
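In Asset Bundle terms, this isolation maps to one bundle target per environment, each pointing at its own workspace. A minimal sketch (hostnames and target names are illustrative, not Causeway's actual values):

```yaml
# bundle.yml -- one target per workspace; hosts are placeholders
targets:
  dev:
    workspace:
      host: https://dev-causeway.cloud.databricks.com
  staging:
    workspace:
      host: https://staging-causeway.cloud.databricks.com
  prod:
    mode: production   # stricter deployment behavior for prod targets
    workspace:
      host: https://prod-causeway.cloud.databricks.com
```

Because each target carries its own host, a misconfigured deploy fails authentication against the wrong workspace instead of silently landing in prod.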

2. Notebooks vs. Python packages

Logic lives in .py packages. Notebooks are thin bootstrappers.

my_project/
  src/mypkg/
    transforms.py        # pure functions; pytest-friendly; no spark session coupling
    io.py                # readers, writers, side effects
  notebooks/
    run_daily.py         # a dozen lines that import mypkg and call transforms
  tests/
    test_transforms.py   # runs in plain CI via chispa or pytest
  bundle.yml

Rules:

Warning

Two notebook pitfalls catch every team. First, a git pull inside a Databricks Git folder destroys cell outputs and in-memory state; write every notebook so it can be re-run from a cold start, because the floor can be yanked at any time. Second, notebook source formats (.py, .sql, .scala) strip outputs on commit; do not treat a notebook as a document of record.

3. Asset Bundles are the deployment unit

Every job, pipeline, warehouse, and dashboard deploys via a bundle. Exceptions require a written waiver.
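As a sketch of what "deploys via a bundle" means in practice, here is a minimal bundle.yml wrapping a single job (the bundle name, job name, and notebook path are illustrative):

```yaml
# Minimal bundle: the job exists only because this file declares it.
bundle:
  name: customer-360

resources:
  jobs:
    transform_daily:
      name: transform_daily
      tasks:
        - task_key: run
          notebook_task:
            notebook_path: ./notebooks/run_daily.py
```

Everything the job needs is declared here and versioned with the code, so `databricks bundle deploy` is the only path a resource takes into a workspace.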

Standards for every bundle:

4. Naming

Resource           Pattern                  Example
Catalog            Environment              prod, staging, dev
Schema             Domain or data layer     bronze, silver, gold, finance
Table              snake_case, descriptive  customer_transactions, daily_revenue
Column             snake_case               customer_id, created_at
Volume             Purpose-first            raw_files, model_artifacts
Job                <purpose>_<scope>        transform_daily, ingest_hourly
Pipeline           <domain>_<layer>         customers_silver, events_bronze
Warehouse          wh-<workload-class>      wh-bi, wh-elt, wh-adhoc
Cluster policy     <team>-<purpose>         de-standard, ds-gpu
Service principal  sp-<project>-<env>       sp-analytics-prod
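Applied to a bundle, the conventions compose: the catalog comes from the environment, the schema from the layer, and the pipeline name from <domain>_<layer>. A hypothetical pipeline declaration following the table above:

```yaml
# Illustrative only -- names follow the conventions in the table above.
resources:
  pipelines:
    customers_silver:
      name: customers_silver
      catalog: prod       # catalog = environment
      target: silver      # schema = data layer
```

Tables the pipeline writes then resolve to fully qualified names like prod.silver.customer_transactions, so the name alone tells a reader the environment, layer, and contents.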

5. Tags are mandatory

Every deployable resource carries tags for cost attribution and governance:

tags:
  environment: ${bundle.target}
  owner: data-engineering
  cost_center: DE-001
  project: customer-360

The four required tags are the ones shown above: environment, owner, cost_center, and project.

Tables additionally carry governance tags:

6. Compute

See compute types for the framework. Standards summarized:

7. Authentication

Important

Personal access tokens (PATs) in CI secrets are banned in Causeway projects. OIDC federation takes an afternoon to set up per workspace and saves every rotation and credential review thereafter. If your project still uses PATs, migrate before your next security review.
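A sketch of what the federated setup looks like, assuming GitHub Actions and a Databricks CLI version with OIDC workload identity federation; the host, secret name, and service principal are placeholders:

```yaml
# Hypothetical GitHub Actions job: the runner mints a short-lived OIDC
# token and exchanges it for Databricks access -- no stored PAT anywhere.
permissions:
  id-token: write   # allows the runner to request an OIDC token
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main
      - run: databricks bundle deploy -t prod
        env:
          DATABRICKS_HOST: https://prod-causeway.cloud.databricks.com
          DATABRICKS_AUTH_TYPE: github-oidc
          DATABRICKS_CLIENT_ID: ${{ secrets.SP_ANALYTICS_PROD_CLIENT_ID }}
```

The only stored value is the service principal's client ID, which is not a credential; there is nothing to rotate.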

8. Secrets

9. Unity Catalog layout

10. Lakeflow usage

See Lakeflow concepts for the decision framework.

11. Git and concurrent development

12. CI/CD pipeline

Every PR runs:

databricks bundle validate          # catches YAML / permission issues
pytest src/ tests/                  # pure-Python unit tests
dbt parse && dbt compile            # if the project has dbt
dbt test --select test_type:unit
sqlfluff lint                       # SQL style
databricks bundle deploy -t dev     # deploy to ephemeral dev target
<smoke tests against dev>

On merge to main:

databricks bundle deploy -t staging
<integration tests>
databricks bundle deploy -t prod    # service-principal auth

Every prod deploy tags the commit. Rollback is checking out the previous tag and running bundle deploy from it.

Danger

Lock prod deploys to CI only. A team member running bundle deploy -t prod from a laptop can clobber production. The deploying service principal's grants must be reachable only through CI's workload identity; no human identity should hold them.

13. Review checklist

PRs touching Databricks resources must satisfy:

See also