Unity Catalog (UC) is the governance plane under every Databricks workspace. If you have used Postgres, it is the metastore you always wish Postgres had: hierarchical, cross-workspace, grant-based, lineage-aware, and queryable via information schema.

The hierarchy

Metastore (one per region)
  └── Catalog (per environment: dev, staging, prod)
        └── Schema (per domain or data layer)
              ├── Table (managed or external)
              ├── View
              ├── Volume (file storage)
              ├── Function
              └── Model (ML models registered via MLflow)

Three levels of fully-qualified name: prod.silver.customers = catalog.schema.table. Databricks SQL treats the catalog as part of the path; you do not switch "databases" like you might in Postgres.

Object types

ObjectWhat it isWhere the bytes live
Managed tableUC owns storageS3 path managed by UC
External tableYou own storageYour S3 path
VolumeNon-tabular file storageS3 path (managed or external)
ViewSQL view referencing tablesNo storage; query definition only
Materialized viewPre-computed view (LDP-backed)Managed storage
Streaming tableLDP streaming targetManaged storage
FunctionUDF or SQL functionCatalog metadata only

Default to managed tables. External tables exist for cases where the data must live at a specific path for compliance, cost allocation, or an existing consumer you cannot migrate.

Catalog-per-environment

The Causeway convention:

Each catalog mirrors the same schema shape:

prod/
  bronze/          raw, ingested, as-is
  silver/          cleaned, joined, deduplicated
  gold/            business-ready, aggregated, consumer-facing
  sandbox/         scratch space for ad-hoc exploration
  system/          (reserved; do not use)

A dbt or LDP pipeline that works in dev promotes to staging and prod by changing only the target catalog. Everything else stays identical.

Note

One catalog per environment, not per team. Teams carve domains inside schemas (prod.finance, prod.product), not inside catalogs. Teams-as-catalogs leads to permission sprawl and cross-catalog joins that break lineage tooling.

Grants

UC's privilege model is hierarchical and explicit.

-- Catalog access
GRANT USE CATALOG ON CATALOG prod TO `data-engineering`;

-- Schema access
GRANT USE SCHEMA ON SCHEMA prod.silver TO `data-engineering`;

-- Table read
GRANT SELECT ON TABLE prod.gold.revenue_summary TO `bi-team`;

-- Table write
GRANT SELECT, MODIFY ON TABLE prod.silver.customers TO `data-engineering`;

-- All privileges on a schema (use sparingly)
GRANT ALL PRIVILEGES ON SCHEMA prod.sandbox TO `data-science`;

Three rules:

  1. Grants flow down. USE CATALOG is required before any schema grant has effect. USE SCHEMA is required before any table grant has effect.
  2. Grants are additive. If you want to remove access, explicit REVOKE or change ownership.
  3. Ownership is separate from grants. Owners can grant to others; non-owners cannot, even if they have ALL PRIVILEGES.
-- Change ownership
ALTER TABLE prod.gold.revenue_summary OWNER TO `data-engineering`;

Common permission errors

ErrorYou need
User does not have USE CATALOG on catalog 'prod'GRANT USE CATALOG ON CATALOG prod TO <principal>
User does not have USE SCHEMA on schemaGRANT USE SCHEMA ON SCHEMA ... TO <principal>
User does not have SELECT on tableGRANT SELECT ON TABLE ... TO <principal>
User does not have CREATE TABLEGRANT CREATE TABLE ON SCHEMA ... TO <principal>

External locations and storage credentials

UC manages storage through two related objects:

CREATE STORAGE CREDENTIAL prod_s3_credential
  WITH (AWS_IAM_ROLE = 'arn:aws:iam::ACCOUNT:role/databricks-unity-catalog-role');

CREATE EXTERNAL LOCATION bronze_landing
  URL 's3://data-lake-prod/bronze/'
  WITH (STORAGE CREDENTIAL prod_s3_credential);

-- Test it
VALIDATE EXTERNAL LOCATION bronze_landing;

Every external table maps to an external location. Every managed table lives in UC-managed storage, accessed via the catalog's storage root.

Lineage, automatic

UC captures table-level and column-level lineage automatically from:

You do not configure anything. A SELECT that joins two tables produces a lineage edge between them on the next UC refresh cycle.

Databricks UI → Catalog → Navigate to table → Lineage tab

Via REST:

GET /api/2.1/unity-catalog/lineage/table-lineage?table_name=prod.silver.customers&direction=DOWNSTREAM

Note

Do not build a parallel lineage system on top of UC. You will fall behind within a quarter. If a downstream query is missing from the lineage panel, the cause is almost always dynamic column expansion (SELECT *) or an external system reading via JDBC without UC credentials. Fix those; do not recreate the lineage layer.

Information schema

Every catalog exposes an information_schema that mirrors the ANSI SQL standard:

-- Every table in a schema
SELECT table_name, table_type, data_source_format, comment
FROM prod.information_schema.tables
WHERE table_schema = 'silver'
ORDER BY table_name;

-- Columns in a table
SELECT column_name, data_type, is_nullable, comment
FROM prod.information_schema.columns
WHERE table_schema = 'silver' AND table_name = 'customers'
ORDER BY ordinal_position;

-- Who has access to what
SELECT *
FROM prod.information_schema.schema_privileges
WHERE schema_name = 'gold';

Information schema is queryable by anyone with USE CATALOG and USE SCHEMA, which makes it a natural source for internal catalogs, lineage dashboards, and automated audits.

Audit logging

Every UC action lands in system.access.audit:

-- Who touched prod.silver.* in the last week
SELECT event_time, user_identity.email, action_name,
       request_params.full_name_arg AS table_name
FROM system.access.audit
WHERE action_name IN ('getTable', 'createTable', 'deleteTable',
                      'generateTemporaryTableCredential')
  AND request_params.full_name_arg LIKE 'prod.silver.%'
  AND event_date >= CURRENT_DATE() - INTERVAL 7 DAYS
ORDER BY event_time DESC;

-- Permission changes in the last month
SELECT event_time, user_identity.email, action_name, request_params
FROM system.access.audit
WHERE action_name IN ('updatePermissions', 'grantPermission', 'revokePermission')
  AND event_date >= CURRENT_DATE() - INTERVAL 30 DAYS
ORDER BY event_time DESC;

Audit tables retain by default for 365 days; extend via metastore settings.

Tagging

Tags live alongside the object and propagate through information schema, lineage, and the REST API. The Causeway policy engine reads tags to make enforcement decisions.

ALTER TABLE prod.gold.customer_360
  SET TAGS ('pii' = 'true', 'classification' = 'confidential');

ALTER TABLE prod.gold.revenue_summary
  SET TAGS ('classification' = 'internal', 'domain' = 'finance');

The convention for Causeway workspaces:

TagMeaningValues
classificationRAG tierpublic, internal, confidential
piiDoes the table contain personally identifiable informationtrue, false
domainBusiness domainfinance, marketing, product, ops, …
tierQuality tierbronze, silver, gold

Migration from Hive metastore

Any workspace with a pre-UC history has tables in hive_metastore. Migrate them:

-- Upgrade a Hive table to a UC managed table
CREATE TABLE prod.silver.customers
  AS SELECT * FROM hive_metastore.silver.customers;

-- Or use SYNC for an in-place reference
SYNC TABLE prod.silver.customers FROM hive_metastore.silver.customers;

The per-workspace migration checklist:

See also