These standards bind every job, pipeline, notebook, and bundle that deploys to a Causeway Databricks workspace. They are not recommendations.
1. Workspace and environment isolation
- Three separate workspaces: `dev`, `staging`, `prod`. One Databricks workspace per environment, never shared.
- One Unity Catalog catalog per environment: `dev`, `staging`, `prod`. Schemas carve domains inside a catalog.
- Service principals own prod, not humans. `mode: production` bundles require `run_as.service_principal_name`.
- Human identity deploys to dev only. Dev targets use `mode: development` so twenty engineers share the workspace without collision.
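A minimal `databricks.yml` sketch of this target layout. The bundle name, workspace hosts, and service principal name are placeholders, not Causeway values:

```yaml
bundle:
  name: analytics            # placeholder project name

targets:
  dev:
    mode: development        # prefixes resources with [dev <user>]
    default: true
    workspace:
      host: https://dev-workspace.cloud.databricks.com
  prod:
    mode: production
    workspace:
      host: https://prod-workspace.cloud.databricks.com
    run_as:
      service_principal_name: sp-analytics-prod   # placeholder SP
```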
Danger
Never share a workspace across dev/staging/prod. The blast radius of a bad deploy is unbounded: one bad notebook can drop prod tables, one cluster policy mistake can burn the prod monthly budget. Three workspaces are the cheapest insurance you will ever buy.
2. Notebooks vs. Python packages
Logic lives in .py packages. Notebooks are thin bootstrappers.
```
my_project/
  src/mypkg/
    transforms.py        # pure functions; pytest-friendly; no Spark session coupling
    io.py                # readers, writers, side effects
  notebooks/
    run_daily.py         # a dozen lines that import mypkg and call transforms
  tests/
    test_transforms.py   # runs in plain CI via chispa or pytest
  databricks.yml
```
Rules:
- Transformation logic is unit-testable Python. No business rule lives only in a notebook.
- Ship `mypkg` as a wheel. Attach the wheel to the job; let the notebook bootstrap.
- Notebooks do not ship to production in 2026. If a notebook is scheduled, its logic belongs in a wheel.
- Unit tests run in plain CI, on a runner without a Databricks workspace. Chispa and pytest handle the Spark parts.
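A sketch of the pattern, with hypothetical names (`daily_revenue` is not a real Causeway function): the business rule is a pure function, so pytest can exercise it on a plain CI runner with no Spark session or workspace in sight.

```python
# src/mypkg/transforms.py (illustrative) -- pure logic, no Spark coupling
def daily_revenue(transactions):
    """Sum transaction amounts per day. Plain Python in, plain Python out."""
    totals = {}
    for tx in transactions:
        totals[tx["date"]] = totals.get(tx["date"], 0.0) + tx["amount"]
    return totals


# tests/test_transforms.py (illustrative) -- runs under plain pytest
def test_daily_revenue():
    txs = [
        {"date": "2026-01-01", "amount": 10.0},
        {"date": "2026-01-01", "amount": 5.0},
        {"date": "2026-01-02", "amount": 7.5},
    ]
    assert daily_revenue(txs) == {"2026-01-01": 15.0, "2026-01-02": 7.5}
```

The scheduled notebook then reduces to an import and a call; the same function runs unchanged against rows collected from a DataFrame.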
Warning
Two notebook pitfalls catch every team. First, `git pull` inside a Databricks Git folder destroys cell output and in-memory state; build with the assumption that the floor can be yanked. Second, notebook source formats (`.py`, `.sql`, `.scala`) strip outputs on commit; do not treat a notebook like a document.
3. Asset Bundles are the deployment unit
Every job, pipeline, warehouse, and dashboard deploys via a bundle. Exceptions require a written waiver.
Standards for every bundle:
- `databricks.yml` defines bundle + variables + targets. One target per environment.
- `resources/` holds one YAML per resource: `jobs/`, `pipelines/`, `warehouses/`.
- `src/` holds Python packages. `tests/` holds pytest tests that run in plain CI.
- `databricks bundle validate` passes on every PR, before anything else runs.
- Variables carry per-env values; no hard-coded workspace hosts, warehouse IDs, or catalog names in resource YAML.
- Dev target uses `mode: development`; staging and prod use `mode: production` with `run_as.service_principal_name`.
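A sketch of one such resource file, with variables instead of hard-coded per-env values. The job name and variable names here are illustrative, not prescribed:

```yaml
# resources/jobs/transform_daily.yml -- illustrative resource YAML
resources:
  jobs:
    transform_daily:
      name: transform_daily
      tasks:
        - task_key: transform
          notebook_task:
            notebook_path: ../src/notebooks/run_daily.py
      tags:
        environment: ${bundle.target}     # resolved per target
        cost_center: ${var.cost_center}   # declared in databricks.yml
```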
4. Naming
| Resource | Pattern | Example |
|---|---|---|
| Catalog | Environment | prod, staging, dev |
| Schema | Domain or data layer | bronze, silver, gold, finance |
| Table | snake_case, descriptive | customer_transactions, daily_revenue |
| Column | snake_case | customer_id, created_at |
| Volume | Purpose-first | raw_files, model_artifacts |
| Job | <purpose>_<scope> | transform_daily, ingest_hourly |
| Pipeline | <domain>_<layer> | customers_silver, events_bronze |
| Warehouse | wh-<workload-class> | wh-bi, wh-elt, wh-adhoc |
| Cluster policy | <team>-<purpose> | de-standard, ds-gpu |
| Service principal | sp-<project>-<env> | sp-analytics-prod |
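The patterns above are mechanical enough to lint. An illustrative pre-review check (not an official tool; the pattern set is an assumption drawn from the table):

```python
import re

# Illustrative name patterns derived from the naming table above.
PATTERNS = {
    "table": re.compile(r"^[a-z][a-z0-9_]*$"),                      # snake_case
    "warehouse": re.compile(r"^wh-[a-z][a-z0-9-]*$"),               # wh-<workload-class>
    "service_principal": re.compile(r"^sp-[a-z0-9-]+-(dev|staging|prod)$"),
}


def valid_name(kind: str, name: str) -> bool:
    """Return True if `name` matches the convention for `kind`."""
    return bool(PATTERNS[kind].fullmatch(name))
```

Run against resource YAML in CI and the naming section becomes enforceable rather than advisory.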
5. Tags are mandatory
Every deployable resource carries tags for cost attribution and governance:
```yaml
tags:
  environment: ${bundle.target}
  owner: data-engineering
  cost_center: DE-001
  project: customer-360
```
The four required tags:
- `environment`: matches the bundle target (`dev`, `staging`, `prod`).
- `owner`: the owning team.
- `cost_center`: the billing cost center.
- `project`: the product or initiative.
Tables additionally carry governance tags:
- `classification`: `public`, `internal`, or `confidential`.
- `pii`: `true` or `false`.
- `domain`: the business domain.
- `tier`: `bronze`, `silver`, or `gold`.
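Table tags are applied with Unity Catalog's `SET TAGS` DDL; the table name here is illustrative:

```sql
ALTER TABLE prod.gold.customer_transactions
  SET TAGS ('classification' = 'internal',
            'pii'            = 'true',
            'domain'         = 'finance',
            'tier'           = 'gold');
```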
6. Compute
See compute types for the framework. Standards summarized:
- SQL warehouses are Serverless. Pro only when Serverless is unavailable in the region. Classic is banned for new work.
- Job compute is Serverless. Job clusters only when Serverless cannot support the workload.
- All-purpose compute is for notebooks only. No scheduled jobs on all-purpose compute; this is the single largest Databricks cost leak.
- Photon is on by default. Disable only for jobs with sub-2-second queries where the startup tax hurts.
- Every resource tagged for cost attribution.
7. Authentication
- CI authenticates via workload identity federation (OIDC). GitHub Actions / Azure DevOps to a Databricks service principal. No long-lived PATs in CI secrets.
- Human developers authenticate via `databricks auth login`, which uses OAuth against the workspace.
- Service principals scope narrowly. A prod service principal gets `USE CATALOG prod`, `CREATE JOB`, and `MODIFY` on target schemas, and nothing broader.
Important
PAT tokens in CI secrets are banned in Causeway projects. OIDC federation takes an afternoon to set up per workspace; it saves every rotation-review thereafter. If your project still uses PATs, migrate before your next security review.
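A sketch of what the federated setup looks like in a GitHub Actions workflow, assuming the service principal has already been configured for GitHub OIDC; the secret name and host are placeholders:

```yaml
# .github/workflows/deploy.yml -- sketch; IDs and host are placeholders
permissions:
  id-token: write   # lets the job mint a GitHub OIDC token

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main
      - run: databricks bundle deploy -t prod
        env:
          DATABRICKS_HOST: https://prod-workspace.cloud.databricks.com
          DATABRICKS_AUTH_TYPE: github-oidc
          DATABRICKS_CLIENT_ID: ${{ vars.SP_CLIENT_ID }}
```

No long-lived token appears anywhere: the CLI exchanges the short-lived OIDC token for Databricks credentials at run time.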
8. Secrets
- Secrets live in Databricks Secret Scopes or the cloud's KMS/Key Vault. Never in bundle YAML. Never in notebooks. Never as `vars:` in dbt.
- Reference secrets from bundles via `${secrets.<scope>.<key>}`; from notebooks via `dbutils.secrets.get(scope, key)`.
- Rotate on a schedule (90 days for human-scope secrets; service principals are the default path precisely because they avoid this).
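The 90-day rule is easy to check mechanically. An illustrative helper (an assumption, not a Databricks API) that an audit script could run against secret metadata:

```python
from datetime import date, timedelta

# Policy from the standard: human-scope secrets rotate every 90 days.
ROTATION_PERIOD = timedelta(days=90)


def needs_rotation(last_rotated: date, today: date) -> bool:
    """True if a secret's last rotation is older than the 90-day policy."""
    return today - last_rotated > ROTATION_PERIOD
```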
9. Unity Catalog layout
- Layer schemas: `bronze`, `silver`, `gold`, `sandbox`.
- Domain schemas: `finance`, `marketing`, `product`, `ops`.
- A table belongs in one schema. Cross-domain facts go in `gold`, not in each domain's schema.
- Managed tables by default. External tables only when an existing consumer or compliance requirement dictates the storage path.
- Every public table (schema `gold`) has a `description` and column-level `comments`.
10. Lakeflow usage
- LDP owns silver streaming tables with quality gates and CDC ingestion.
- dbt owns gold marts and batch SQL transformations.
- Lakeflow Jobs orchestrates Databricks-internal workloads.
- Airflow (+ Cosmos) orchestrates cross-platform DAGs; the job itself is defined in a DAB and called via `DatabricksRunNowOperator`.
See Lakeflow concepts for the decision framework.
11. Git and concurrent development
- Trunk-based development. Short-lived feature branches; merge to main in hours, not days.
- Git Folders for notebook-heavy work; external Git + IDE + Databricks Connect for pure-code projects.
- Per-developer isolation:
  - DAB `mode: development` prepends `[dev ${user}]` to resource names.
  - Unity Catalog schema-per-user for dbt (`dbt_<user>`) and ad-hoc output.
  - Lakebase branching for ephemeral test OLTP.
- Feature flags via DAB variables or target selectors, not long-running branches.
12. CI/CD pipeline
Every PR runs:
```
databricks bundle validate          # catches YAML / permission issues
pytest src/ tests/                  # pure-Python unit tests
dbt parse && dbt compile            # if the project has dbt
dbt test --select test_type:unit
sqlfluff lint                       # SQL style
databricks bundle deploy -t dev     # deploy to ephemeral dev target
<smoke tests against dev>
```
On merge to main:
```
databricks bundle deploy -t staging
<integration tests>
databricks bundle deploy -t prod    # service-principal auth
```
Every prod deploy tags the commit. Rollback is `databricks bundle deploy` at the previous tag.
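In practice the rollback is two commands; the tag name here is illustrative:

```shell
git checkout v2026.02.14            # the tag the last good prod deploy created
databricks bundle deploy -t prod    # redeploy that exact state (CI identity)
```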
Danger
Lock prod deploys to CI-only. A team member running `bundle deploy -t prod` from a laptop can clobber production. The deploying service principal's grants live in CI's workload identity; no human identity should hold them.
13. Review checklist
PRs touching Databricks resources must satisfy:
- [ ] Bundle validates cleanly (`databricks bundle validate`).
- [ ] Python logic is unit-tested; the notebook is a bootstrapper.
- [ ] Resources named per section 4.
- [ ] Tags per section 5.
- [ ] Compute per section 6 (Serverless defaults; no prod on all-purpose).
- [ ] Secrets per section 8 (scopes, not YAML).
- [ ] UC layout per section 9 (catalog per env, schemas for domain/layer).
- [ ] LDP / dbt / Lakeflow Jobs / Airflow chosen per section 10.
- [ ] PR's CI deploys to dev and runs smoke tests before merge.
See also
- Production readiness — what it takes to promote a workload to prod.
- Unity Catalog — the governance plane these standards rest on.
- Asset Bundles guide — the mechanism for most of section 3.