Lakebase is Databricks' managed, serverless Postgres. It closes the one gap the Lakehouse had: low-latency transactional reads and writes without shipping data off-platform.
Under the hood: Postgres compute is decoupled from storage, data lives as open-format files in the lake, and sync tables mirror Delta into Lakebase and back without bespoke ETL. The 2026 Autoscaling generation added scale-to-zero, branching (yes, Postgres branching), and instant restore.
This guide is how to adopt Lakebase without burning yourself.
When to reach for it
Use Lakebase when you need Postgres wire-protocol access to lakehouse data:
- Feature stores and vector-adjacent serving that need low-latency reads on a hot subset of Delta data.
- Agent state and checkpoints that want ephemeral Postgres per agent with scale-to-zero when idle.
- Application backends that already need to see analytical data and otherwise would build their own mirror.
- Dev / test OLTP where branching and instant restore replace a zoo of RDS instances.
Do not use Lakebase for:
- Pure analytical queries. SQL warehouses still win on those.
- High-write OLTP workloads whose semantics rely on a single-node Postgres. Lakebase is managed distributed Postgres; some tuning assumptions change.
The adoption ladder
Do not migrate an existing OLTP system on day one. The Databricks-recommended ladder is:
- Read-mostly synced copy. Mirror Delta into Lakebase with sync tables; let apps read from Postgres, keep writes on the existing system. Lowest risk; most teams stop here for quite a while.
- Hot-path mirror. Mirror a slice of existing OLTP into Lakebase; compare; hold the old system as fallback.
- Primary writes. Cut over once latency, governance, and dev velocity wins justify it.
Warning
Step 3 is a point-of-no-return migration. Run steps 1 and 2 long enough to have real load, real latency numbers, and real cutover dry runs. Cutting over because "it works in dev" has derailed more than one team's quarter.
1. Provision an instance
```bash
databricks lakebase create-instance --json '{
  "name": "prod-serving",
  "catalog": "prod",
  "schema": "serving",
  "size": "MEDIUM",
  "storage_size_gb": 100
}'
```
Sizes roughly match what RDS users expect:
| Size | vCPUs | Memory | Max connections |
|---|---|---|---|
| Small | 2 | 8 GB | 100 |
| Medium | 4 | 16 GB | 200 |
| Large | 8 | 32 GB | 500 |
| X-Large | 16 | 64 GB | 1000 |
Start at Medium unless you are confident the workload fits comfortably in Small. Scaling down is easier than scaling up.
2. Connect from your application
The connection string is standard Postgres:

```
postgresql://<username>:<password>@<lakebase-host>:<port>/<database>?sslmode=require
```
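Service-principal tokens often contain characters that are illegal in a URL. A small sketch (pure stdlib; the helper name and host are ours, not a Lakebase API) builds the DSN with proper escaping:

```python
from urllib.parse import quote

def build_dsn(user: str, password: str, host: str, database: str,
              port: int = 5432) -> str:
    """Build a Postgres DSN, percent-encoding the credentials so that
    characters like '@' or '/' in a token do not break the URL."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}?sslmode=require"
    )

# '@' and '/' in the password are escaped as %40 and %2F
dsn = build_dsn("app-svc", "p@ss/word", "my-instance.lakebase.example", "prod")
```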
Use a connection pool: Lakebase has a finite max connections (see the table above), and a Python app without a pool opens one connection per request.
```python
import psycopg2.pool

pool = psycopg2.pool.ThreadedConnectionPool(
    minconn=5,
    maxconn=20,
    host="<lakebase-host>",
    port=5432,
    database="prod",
    user="<service-principal>",
    password="<token>",
    sslmode="require",
)
```
Rules of thumb:
- Pool `maxconn` at 50-80% of the instance's max connections.
- Connection timeout around 30 seconds.
- Validation query on checkout (`SELECT 1`).
- Close idle connections after five minutes.
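These rules can be encoded once instead of re-derived per service. A minimal sketch (the 50-80% fraction comes from the guidance above; the helper name and return shape are ours):

```python
def pool_settings(instance_max_connections: int,
                  fraction: float = 0.6) -> dict:
    """Derive pool settings from an instance's max connections,
    following the 50-80% rule of thumb."""
    if not 0.5 <= fraction <= 0.8:
        raise ValueError("fraction should stay within 0.5-0.8")
    maxconn = round(instance_max_connections * fraction)
    return {
        "minconn": max(2, maxconn // 4),
        "maxconn": maxconn,
        "connect_timeout_s": 30,      # connection timeout ~30 seconds
        "idle_timeout_s": 300,        # close idle connections after 5 min
        "validation_query": "SELECT 1",
    }

# A Medium instance (200 max connections) at the default 60%:
settings = pool_settings(200)  # maxconn == 120, minconn == 30
```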
Note
For production, prefer service principal + OAuth for authentication over a long-lived password. The OAuth flow integrates with Unity Catalog identity, so you get the same grant-based access control as the rest of the platform.
3. Sync a Delta table into Lakebase
Sync tables mirror a Delta table into a Postgres table, incrementally:
```sql
-- Inside the Lakebase instance
CREATE SYNC TABLE customers_hot
FROM prod.silver.customers
WITH (
  sync_mode = 'INCREMENTAL',
  sync_schedule = 'EVERY 5 MINUTES',
  filter = 'is_active = true AND last_seen_at > CURRENT_DATE - INTERVAL 30 DAYS'
);
The sync target is a regular Postgres table: you can index it, query it with the full Postgres dialect, and join it against application-owned tables.
From the app's perspective the mirror is read-only. If you need to write back to Delta, that is a separate reverse sync: Postgres to Delta.
4. Indexes and schema migrations
Managed does not absolve you of Postgres fundamentals.
- Indexes matter. Lakebase adds them the same way Postgres does. The sync process does not auto-infer indexes; you declare them per the query patterns of your application.
- Schema migrations use a real tool. Flyway, Atlas, and sqitch all work. Do not migrate schemas by clicking in the UI.
- Observe everything. Postgres `pg_stat_statements`, connection counts, the slow-query log. Lakebase surfaces these in the UC observability tables.
```sql
-- Standard Postgres indexing
CREATE INDEX idx_customers_email ON customers_hot (email);
CREATE INDEX idx_customers_active ON customers_hot (is_active) WHERE is_active = true;
```
Danger
Do not skip connection pooling. A Lambda or Cloud Run function opening a fresh connection per invocation will exhaust the instance's connection limit in minutes under load. Managed Postgres still cares about connection pressure.
5. Branching and restore
Lakebase Autoscaling (March 2026 rollout) added two features that change how you think about dev / test OLTP:
Branching
```bash
# Create an instant branch from prod
databricks lakebase create-branch \
  --source prod-serving \
  --name pr-42-branch

# Point an app at the branch for integration tests
export DATABASE_URL=postgresql://.../pr-42-branch
# ... run tests ...

# Tear down
databricks lakebase delete-branch pr-42-branch
```
Branches share storage with the parent until they diverge; they are cheap and fast to create, which makes per-PR integration tests against a real-shaped database practical.
Instant restore
Point-in-time restore to any moment in the last 7 days (configurable). Recovering from a 2am incident is not point-and-click archaeology: a literal `databricks lakebase restore --at 2026-04-20T03:15:00Z` gives you a branch representing that moment.
6. Governance
Lakebase tables live under Unity Catalog, same as Delta tables. Every GRANT you understand from the UC model applies:
```sql
GRANT USE CATALOG ON CATALOG prod TO `app-service`;
GRANT USE SCHEMA ON SCHEMA prod.serving TO `app-service`;
GRANT SELECT ON TABLE prod.serving.customers_hot TO `app-service`;
```
Audit trails land in `system.access.audit`. Lineage tracks from the Delta source through the sync table to every application that queries it.
7. Cost
Lakebase bills for compute time and storage. Two things matter:
- Scale-to-zero works. If an instance has no active queries and no connections holding it open, it suspends. The cold start when the next query lands is a few seconds. For dev / test instances, let them scale to zero.
- Sync frequency matters. `EVERY 5 MINUTES` vs. `EVERY 1 HOUR` is a real cost difference on a large source. Set the sync cadence to the slowest the consumer can tolerate.
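Scale-to-zero has one client-side consequence: the first connection after an idle period can hit the few-second cold start. A generic retry with exponential backoff (a sketch in plain Python, not a Lakebase feature) absorbs it:

```python
import time

def with_retry(fn, attempts: int = 4, base_delay_s: float = 0.5):
    """Call fn, retrying with exponential backoff on ConnectionError.
    Useful when the first connection after scale-to-zero hits a cold start."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay_s * (2 ** attempt))

# Simulate an instance that wakes up on the third attempt:
calls = {"n": 0}
def connect():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("instance resuming")
    return "connected"

assert with_retry(connect, base_delay_s=0.01) == "connected"
```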
Common mistakes
| Symptom | Root cause |
|---|---|
| App errors: "too many connections" | No connection pool, or pool size bigger than instance max. |
| Sync table rows stale | Sync schedule is slower than consumer expects. Tighten cadence or switch to continuous sync. |
| Slow queries despite small tables | No index on the column the query filters by. Create one; verify with EXPLAIN. |
| Instance costs more than expected | Continuous sync on a table that could be hourly, or scale-to-zero is disabled. |
| Schema drift between Delta and Postgres | Delta schema evolved; sync table did not. ALTER SYNC TABLE ... REFRESH SCHEMA. |
See also
- Unity Catalog concepts — how Lakebase plugs into the governance plane.
- Compute types — why Lakebase is not a substitute for a SQL warehouse.
- Production readiness — what Lakebase items land on the checklist.