Lakebase is Databricks' managed, serverless Postgres. It closes the one gap the Lakehouse had: low-latency transactional reads and writes without shipping data off-platform.
Under the hood: Postgres compute is decoupled from storage, data lives as open-format files in the lake, and sync tables mirror Delta into Lakebase and back without bespoke ETL. The 2026 Autoscaling generation added scale-to-zero, branching (yes, Postgres branching), and instant restore.
This guide is how to adopt Lakebase without burning yourself.
When to reach for it
Use Lakebase when you need Postgres wire-protocol access to lakehouse data:
- Feature stores and vector-adjacent serving that need low-latency reads on a hot subset of Delta data.
- Agent state and checkpoints that want ephemeral Postgres per agent with scale-to-zero when idle.
- Application backends that already need to see analytical data and otherwise would build their own mirror.
- Dev / test OLTP where branching and instant restore replace a zoo of RDS instances.
Do not use Lakebase for:
- Pure analytical queries. SQL warehouses still win on those.
- High-write OLTP workloads whose semantics rely on a single-node Postgres. Lakebase is managed distributed Postgres; some tuning assumptions change.
The adoption ladder
Do not migrate an existing OLTP system on day one. The Databricks-recommended ladder is:
- Read-mostly synced copy. Mirror Delta into Lakebase with sync tables; let apps read from Postgres, keep writes on the existing system. Lowest risk; most teams stop here for quite a while.
- Hot-path mirror. Mirror a slice of existing OLTP into Lakebase; compare; hold the old system as fallback.
- Primary writes. Cut over once latency, governance, and dev velocity wins justify it.
Warning
Step 3 is a point-of-no-return migration. Run steps 1 and 2 long enough to have real load, real latency numbers, and real cutover dry runs. Cutting over because "it works in dev" has derailed more than one team's quarter.
1. Provision an instance
```bash
databricks lakebase create-instance --json '{
  "name": "prod-serving",
  "catalog": "prod",
  "schema": "serving",
  "size": "MEDIUM",
  "storage_size_gb": 100
}'
```
Sizes roughly match what RDS users expect:
| Size | vCPUs | Memory | Max connections |
|---|---|---|---|
| Small | 2 | 8 GB | 100 |
| Medium | 4 | 16 GB | 200 |
| Large | 8 | 32 GB | 500 |
| X-Large | 16 | 64 GB | 1000 |
Start at Medium unless you are confident the workload fits comfortably in Small. Scaling down is easier than scaling up.
2. Connect from your application
The connection string is standard Postgres:

```
postgresql://<username>:<password>@<lakebase-host>:<port>/<database>?sslmode=require
```
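Service-principal tokens often contain characters that are illegal in a URL. A small sketch (pure stdlib; the helper name and host are ours, not a Lakebase API) builds the DSN with proper escaping:

```python
from urllib.parse import quote

def build_dsn(user: str, password: str, host: str, database: str,
              port: int = 5432) -> str:
    """Build a Postgres DSN, percent-encoding the credentials so that
    characters like '@' or '/' in a token do not break the URL."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}?sslmode=require"
    )

# '@' and '/' in the password are escaped as %40 and %2F
dsn = build_dsn("app-svc", "p@ss/word", "my-instance.lakebase.example", "prod")
```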
Use a connection pool: Lakebase has a finite max connections (see the table above), and a Python app without a pool opens one connection per request.
```python
import psycopg2.pool

pool = psycopg2.pool.ThreadedConnectionPool(
    minconn=5,
    maxconn=20,
    host="<lakebase-host>",
    port=5432,
    database="prod",
    user="<service-principal>",
    password="<token>",
    sslmode="require",
)
```
Rules of thumb:
- Pool `maxconn` at 50-80% of the instance's max connections.
- Connection timeout around 30 seconds.
- Validation query on checkout (`SELECT 1`).
- Close idle connections after five minutes.
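These rules can be encoded once instead of re-derived per service. A minimal sketch (the 50-80% fraction comes from the guidance above; the helper name and return shape are ours):

```python
def pool_settings(instance_max_connections: int,
                  fraction: float = 0.6) -> dict:
    """Derive pool settings from an instance's max connections,
    following the 50-80% rule of thumb."""
    if not 0.5 <= fraction <= 0.8:
        raise ValueError("fraction should stay within 0.5-0.8")
    maxconn = round(instance_max_connections * fraction)
    return {
        "minconn": max(2, maxconn // 4),
        "maxconn": maxconn,
        "connect_timeout_s": 30,      # connection timeout ~30 seconds
        "idle_timeout_s": 300,        # close idle connections after 5 min
        "validation_query": "SELECT 1",
    }

# A Medium instance (200 max connections) at the default 60%:
settings = pool_settings(200)  # maxconn == 120, minconn == 30
```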
Note
For production, prefer service principal + OAuth for authentication over a long-lived password. The OAuth flow integrates with Unity Catalog identity, so you get the same grant-based access control as the rest of the platform.
3. Sync a Delta table into Lakebase
Sync tables mirror a Delta table into a Postgres table, incrementally:
```sql
-- Inside the Lakebase instance
CREATE SYNC TABLE customers_hot
FROM prod.silver.customers
WITH (
  sync_mode = 'INCREMENTAL',
  sync_schedule = 'EVERY 5 MINUTES',
  filter = 'is_active = true AND last_seen_at > CURRENT_DATE - INTERVAL 30 DAYS'
);
The sync target is a regular Postgres table: you can index it, query it with the full Postgres dialect, and join it against application-owned tables.
From the app's perspective the mirror is read-only. If you need to write back to Delta, that is a separate reverse sync: Postgres to Delta.
4. Indexes and schema migrations
Managed does not absolve you of Postgres fundamentals.
- Indexes matter. Lakebase adds them the same way Postgres does. The sync process does not auto-infer indexes; you declare them per the query patterns of your application.
- Schema migrations use a real tool. Flyway, Atlas, and sqitch all work. Do not migrate schemas by clicking in the UI.
- Observe everything. Postgres `pg_stat_statements`, connection counts, the slow-query log. Lakebase surfaces these in the UC observability tables.
```sql
-- Standard Postgres indexing
CREATE INDEX idx_customers_email ON customers_hot (email);
CREATE INDEX idx_customers_active ON customers_hot (is_active) WHERE is_active = true;
```
Danger
Do not skip connection pooling. A Lambda or Cloud Run function opening a fresh connection per invocation will exhaust the instance's connection limit in minutes under load. Managed Postgres still cares about connection pressure.
5. Branching and restore
Lakebase Autoscaling (March 2026 rollout) added two features that change how you think about dev / test OLTP:
Branching
```bash
# Create an instant branch from prod
databricks lakebase create-branch \
  --source prod-serving \
  --name pr-42-branch

# Point an app at the branch for integration tests
export DATABASE_URL=postgresql://.../pr-42-branch
# ... run tests ...

# Tear down
databricks lakebase delete-branch pr-42-branch
```
Branches share storage with the parent until they diverge; they are cheap and fast to create, which makes per-PR integration tests against a real-shaped database practical.
Instant restore
Point-in-time restore to any moment in the last 7 days (configurable). Recovering from a 2am incident is not point-and-click archaeology: a literal `databricks lakebase restore --at 2026-04-20T03:15:00Z` gives you a branch representing that moment.
6. Governance
Lakebase tables live under Unity Catalog, same as Delta tables. Every GRANT you understand from the UC model applies:
```sql
GRANT USE CATALOG ON CATALOG prod TO `app-service`;
GRANT USE SCHEMA ON SCHEMA prod.serving TO `app-service`;
GRANT SELECT ON TABLE prod.serving.customers_hot TO `app-service`;
```
Audit trails land in `system.access.audit`. Lineage tracks from the Delta source through the sync table to every application that queries it.
7. Cost
Lakebase bills for compute time and storage. Two things matter:
- Scale-to-zero works. If an instance has no active queries and no connections holding it open, it suspends. The cold start when the next query lands is a few seconds. For dev / test instances, let them scale to zero.
- Sync frequency matters. `EVERY 5 MINUTES` vs. `EVERY 1 HOUR` is a real cost difference on a large source. Set the sync cadence to the slowest the consumer can tolerate.
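Scale-to-zero has one client-side consequence: the first connection after an idle period can hit the few-second cold start. A generic retry with exponential backoff (a sketch in plain Python, not a Lakebase feature) absorbs it:

```python
import time

def with_retry(fn, attempts: int = 4, base_delay_s: float = 0.5):
    """Call fn, retrying with exponential backoff on ConnectionError.
    Useful when the first connection after scale-to-zero hits a cold start."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay_s * (2 ** attempt))

# Simulate an instance that wakes up on the third attempt:
calls = {"n": 0}
def connect():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("instance resuming")
    return "connected"

assert with_retry(connect, base_delay_s=0.01) == "connected"
```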
Common mistakes
| Symptom | Root cause |
|---|---|
| App errors: "too many connections" | No connection pool, or pool size bigger than instance max. |
| Sync table rows stale | Sync schedule is slower than consumer expects. Tighten cadence or switch to continuous sync. |
| Slow queries despite small tables | No index on the column the query filters by. Create one; verify with EXPLAIN. |
| Instance costs more than expected | Continuous sync on a table that could be hourly, or scale-to-zero is disabled. |
| Schema drift between Delta and Postgres | Delta schema evolved; sync table did not. ALTER SYNC TABLE ... REFRESH SCHEMA. |
See also
- Unity Catalog concepts — how Lakebase plugs into the governance plane.
- Compute types — why Lakebase is not a substitute for a SQL warehouse.
- Production readiness — what Lakebase items land on the checklist.