Unity Catalog (UC) is the governance plane under every Databricks workspace. If you have used Postgres, it is the metastore you always wish Postgres had: hierarchical, cross-workspace, grant-based, lineage-aware, and queryable via information schema.
The hierarchy
Metastore (one per region)
└── Catalog (per environment: dev, staging, prod)
└── Schema (per domain or data layer)
├── Table (managed or external)
├── View
├── Volume (file storage)
├── Function
└── Model (ML models registered via MLflow)
Three levels of fully-qualified name: prod.silver.customers = catalog.schema.table. Databricks SQL treats the catalog as part of the path; you do not switch "databases" like you might in Postgres.
Object types
| Object | What it is | Where the bytes live |
|---|---|---|
| Managed table | UC owns storage | S3 path managed by UC |
| External table | You own storage | Your S3 path |
| Volume | Non-tabular file storage | S3 path (managed or external) |
| View | SQL view referencing tables | No storage; query definition only |
| Materialized view | Pre-computed view (LDP-backed) | Managed storage |
| Streaming table | LDP streaming target | Managed storage |
| Function | UDF or SQL function | Catalog metadata only |
Default to managed tables. External tables exist for cases where the data must live at a specific path for compliance, cost allocation, or an existing consumer you cannot migrate.
Catalog-per-environment
The Causeway convention:
prodcatalog for production data.stagingcatalog for pre-prod validation.devcatalog for development.
Each catalog mirrors the same schema shape:
prod/
bronze/ raw, ingested, as-is
silver/ cleaned, joined, deduplicated
gold/ business-ready, aggregated, consumer-facing
sandbox/ scratch space for ad-hoc exploration
system/ (reserved; do not use)
A dbt or LDP pipeline that works in dev promotes to staging and prod by changing only the target catalog. Everything else stays identical.
Note
One catalog per environment, not per team. Teams carve domains inside schemas (prod.finance, prod.product), not inside catalogs. Teams-as-catalogs leads to permission sprawl and cross-catalog joins that break lineage tooling.
Grants
UC's privilege model is hierarchical and explicit.
-- Catalog access
GRANT USE CATALOG ON CATALOG prod TO `data-engineering`;
-- Schema access
GRANT USE SCHEMA ON SCHEMA prod.silver TO `data-engineering`;
-- Table read
GRANT SELECT ON TABLE prod.gold.revenue_summary TO `bi-team`;
-- Table write
GRANT SELECT, MODIFY ON TABLE prod.silver.customers TO `data-engineering`;
-- All privileges on a schema (use sparingly)
GRANT ALL PRIVILEGES ON SCHEMA prod.sandbox TO `data-science`;
Three rules:
- Grants flow down.
USE CATALOGis required before any schema grant has effect.USE SCHEMAis required before any table grant has effect. - Grants are additive. If you want to remove access, explicit
REVOKEor change ownership. - Ownership is separate from grants. Owners can grant to others; non-owners cannot, even if they have
ALL PRIVILEGES.
-- Change ownership
ALTER TABLE prod.gold.revenue_summary OWNER TO `data-engineering`;
Common permission errors
| Error | You need |
|---|---|
User does not have USE CATALOG on catalog 'prod' | GRANT USE CATALOG ON CATALOG prod TO <principal> |
User does not have USE SCHEMA on schema | GRANT USE SCHEMA ON SCHEMA ... TO <principal> |
User does not have SELECT on table | GRANT SELECT ON TABLE ... TO <principal> |
User does not have CREATE TABLE | GRANT CREATE TABLE ON SCHEMA ... TO <principal> |
External locations and storage credentials
UC manages storage through two related objects:
- Storage credential: an IAM role (AWS) or managed identity (Azure / GCP equivalent) that UC assumes to access object storage.
- External location: a specific S3 path plus the storage credential UC uses for it.
CREATE STORAGE CREDENTIAL prod_s3_credential
WITH (AWS_IAM_ROLE = 'arn:aws:iam::ACCOUNT:role/databricks-unity-catalog-role');
CREATE EXTERNAL LOCATION bronze_landing
URL 's3://data-lake-prod/bronze/'
WITH (STORAGE CREDENTIAL prod_s3_credential);
-- Test it
VALIDATE EXTERNAL LOCATION bronze_landing;
Every external table maps to an external location. Every managed table lives in UC-managed storage, accessed via the catalog's storage root.
Lineage, automatic
UC captures table-level and column-level lineage automatically from:
- Spark SQL queries (any notebook, any warehouse).
- Lakeflow Declarative Pipelines.
- dbt models (via the dbt-databricks adapter).
You do not configure anything. A SELECT that joins two tables produces a lineage edge between them on the next UC refresh cycle.
Databricks UI → Catalog → Navigate to table → Lineage tab
Via REST:
GET /api/2.1/unity-catalog/lineage/table-lineage?table_name=prod.silver.customers&direction=DOWNSTREAM
Note
Do not build a parallel lineage system on top of UC. You will fall behind within a quarter. If a downstream query is missing from the lineage panel, the cause is almost always dynamic column expansion (SELECT *) or an external system reading via JDBC without UC credentials. Fix those; do not recreate the lineage layer.
Information schema
Every catalog exposes an information_schema that mirrors the ANSI SQL standard:
-- Every table in a schema
SELECT table_name, table_type, data_source_format, comment
FROM prod.information_schema.tables
WHERE table_schema = 'silver'
ORDER BY table_name;
-- Columns in a table
SELECT column_name, data_type, is_nullable, comment
FROM prod.information_schema.columns
WHERE table_schema = 'silver' AND table_name = 'customers'
ORDER BY ordinal_position;
-- Who has access to what
SELECT *
FROM prod.information_schema.schema_privileges
WHERE schema_name = 'gold';
Information schema is queryable by anyone with USE CATALOG and USE SCHEMA, which makes it a natural source for internal catalogs, lineage dashboards, and automated audits.
Audit logging
Every UC action lands in system.access.audit:
-- Who touched prod.silver.* in the last week
SELECT event_time, user_identity.email, action_name,
request_params.full_name_arg AS table_name
FROM system.access.audit
WHERE action_name IN ('getTable', 'createTable', 'deleteTable',
'generateTemporaryTableCredential')
AND request_params.full_name_arg LIKE 'prod.silver.%'
AND event_date >= CURRENT_DATE() - INTERVAL 7 DAYS
ORDER BY event_time DESC;
-- Permission changes in the last month
SELECT event_time, user_identity.email, action_name, request_params
FROM system.access.audit
WHERE action_name IN ('updatePermissions', 'grantPermission', 'revokePermission')
AND event_date >= CURRENT_DATE() - INTERVAL 30 DAYS
ORDER BY event_time DESC;
Audit tables retain by default for 365 days; extend via metastore settings.
Tagging
Tags live alongside the object and propagate through information schema, lineage, and the REST API. The Causeway policy engine reads tags to make enforcement decisions.
ALTER TABLE prod.gold.customer_360
SET TAGS ('pii' = 'true', 'classification' = 'confidential');
ALTER TABLE prod.gold.revenue_summary
SET TAGS ('classification' = 'internal', 'domain' = 'finance');
The convention for Causeway workspaces:
| Tag | Meaning | Values |
|---|---|---|
classification | RAG tier | public, internal, confidential |
pii | Does the table contain personally identifiable information | true, false |
domain | Business domain | finance, marketing, product, ops, … |
tier | Quality tier | bronze, silver, gold |
Migration from Hive metastore
Any workspace with a pre-UC history has tables in hive_metastore. Migrate them:
-- Upgrade a Hive table to a UC managed table
CREATE TABLE prod.silver.customers
AS SELECT * FROM hive_metastore.silver.customers;
-- Or use SYNC for an in-place reference
SYNC TABLE prod.silver.customers FROM hive_metastore.silver.customers;
The per-workspace migration checklist:
- [ ] Create UC catalogs and schemas.
- [ ] Create storage credentials and external locations.
- [ ] Migrate tables (SYNC or recreate).
- [ ] Set up grants.
- [ ] Update every
hive_metastore.x.yreference in dbt, pipelines, notebooks to<catalog>.x.y. - [ ] Update downstream consumers (BI tools, Dremio, applications).
- [ ] Switch cluster access mode to Shared (UC-compatible).
- [ ] Retire the hive_metastore objects.
See also
- Workspace standards — Causeway's rules for catalog/schema layout.
- Common errors — permission errors and their fixes.
- The contract triple — how Causeway's contract spec layers on top of UC tags and grants.