In 2025–2026, Databricks consolidated three previously separate products under the Lakeflow umbrella. The pieces have clean boundaries once you see them, and mushing them together is the source of most "why are we building this ourselves" conversations.
| Piece | Previously | Does |
|---|---|---|
| Lakeflow Connect | (Fivetran-style) | Managed connectors from SaaS and databases into Unity Catalog |
| Lakeflow Declarative Pipelines (LDP) | Delta Live Tables | Declarative streaming/batch transformations with quality expectations and CDC |
| Lakeflow Jobs | Databricks Workflows | Native orchestration of Databricks tasks |
## Lakeflow Connect
Managed connectors for SaaS and databases: Salesforce, Workday, SQL Server CDC, Google Analytics, and a growing catalog. Think Fivetran-inside-Databricks.
Use Connect when the source is supported and you want the bytes in Unity Catalog without owning ingest infrastructure. Do not build your own Salesforce connector in Python if Connect already speaks to it.
The output lands as Delta tables in a bronze schema of your choice. From there, dbt or LDP take over.
## Lakeflow Declarative Pipelines (LDP)
Formerly Delta Live Tables. A declarative framework for streaming and incremental batch transformations with:
- Expectations: data quality constraints that warn, drop, or fail a pipeline.
- Automatic lineage: every table's upstream and downstream inferred from the pipeline code.
- Change data capture: `AUTO CDC` (replacing `APPLY CHANGES INTO`) handles SCD-1 and SCD-2 from CDC feeds.
- Automatic backfill and retry: the framework owns the plumbing.
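What `AUTO CDC` does for an SCD-1 target can be illustrated outside Databricks with a small pure-Python sketch. The helper and field names here are hypothetical; the real implementation is Delta-native and declarative, but the ordering semantics are the point:

```python
def apply_scd1(target: dict, cdc_rows: list[dict]) -> dict:
    """SCD-1 sketch: keep only each key's latest change, then upsert/delete.

    target maps business key -> current row. cdc_rows carry an op
    ('upsert' or 'delete'), a business key, and a monotonically
    increasing sequence number.
    """
    latest: dict = {}
    for row in cdc_rows:
        key = row["customer_id"]
        # Keep the highest sequence number per key, mirroring
        # AUTO CDC's SEQUENCE BY out-of-order handling.
        if key not in latest or row["seq"] > latest[key]["seq"]:
            latest[key] = row
    for key, row in latest.items():
        if row["op"] == "delete":
            target.pop(key, None)
        else:
            target[key] = {k: v for k, v in row.items() if k not in ("op", "seq")}
    return target
```

SCD-2 differs only in that superseded rows are end-dated rather than overwritten; the framework owns that bookkeeping too.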
The model is SQL or Python:
```python
import dlt

@dlt.table(
    comment="Customer segments, behavioral",
    table_properties={"quality": "silver"},
)
@dlt.expect_or_drop("valid_customer_id", "customer_id IS NOT NULL")
@dlt.expect("reasonable_ltv", "ltv_usd BETWEEN 0 AND 1000000")
def stg_customer_segments():
    return (
        spark.readStream.table("prod.bronze.customer_segments_raw")
        .select("customer_id", "segment_id", "ltv_usd", "_loaded_at")
    )
```
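The two expectation flavors in the snippet differ only in what happens to violating rows. A minimal sketch of the semantics on plain dicts (a hypothetical helper, not the dlt API):

```python
def apply_expectations(rows, drop_rules, warn_rules):
    """Sketch of LDP expectation semantics.

    drop_rules: name -> predicate; violating rows are dropped
      (like dlt.expect_or_drop).
    warn_rules: name -> predicate; violations are only counted
      (like dlt.expect).
    Returns (kept_rows, violation_counts).
    """
    counts = {name: 0 for name in list(drop_rules) + list(warn_rules)}
    kept = []
    for row in rows:
        dropped = False
        for name, pred in drop_rules.items():
            if not pred(row):
                counts[name] += 1
                dropped = True
        for name, pred in warn_rules.items():
            if not pred(row):
                counts[name] += 1
        if not dropped:
            kept.append(row)
    return kept, counts
```

A third mode, `expect_or_fail`, aborts the pipeline on the first violation instead of counting or dropping.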
### When to reach for LDP
- Streaming or near-real-time silver tables with quality gates enforced at write time.
- CDC ingestion (SCD-1 or SCD-2) from a Delta change feed or a CDC source.
- Pipelines where you want Databricks to own the incrementalization rather than coding `is_incremental()` by hand in dbt.
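The hand-rolled alternative that the last bullet refers to is, in spirit, a high-water-mark filter you maintain yourself. A pure-Python sketch with hypothetical names (in dbt this is the `is_incremental()` block wrapped around a timestamp predicate):

```python
def incremental_batch(source_rows, high_water_mark):
    """Select only rows newer than the last processed timestamp,
    then advance the mark -- the bookkeeping LDP does for you."""
    new_rows = [r for r in source_rows if r["_loaded_at"] > high_water_mark]
    new_mark = max((r["_loaded_at"] for r in new_rows), default=high_water_mark)
    return new_rows, new_mark
```

The hard parts in production are exactly what this sketch omits: late-arriving data, backfills, and schema changes, all of which LDP handles for you.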
### 2026 notes

Four things shifted in the last year that matter:

- `AUTO CDC` has replaced `APPLY CHANGES INTO`. New pipelines should use it.
- Queued execution mode serializes concurrent triggers cleanly; you no longer hit conflict errors from overlapping runs.
- Type widening lets you broaden column types without a full pipeline reset.
- Pipeline configs can live as Unity Catalog table properties, unifying governance between pipeline metadata and table metadata.
### LDP vs. dbt
Both exist on Databricks. They are not competitors.
| LDP is right when | dbt is right when |
|---|---|
| Streaming / near-real-time matters | Batch SQL transformations |
| You want expectations enforced at write time | Your team lives in SQL and wants macros/packages |
| CDC is a first-class concern | You need the manifest for Slim CI and Cosmos |
| Incrementalization should be Databricks' problem | You want full control of the incremental logic |
The pattern most mature Causeway teams land on: LDP owns silver streaming tables with quality gates; dbt owns gold marts. See the dbt-on-Databricks quickstart for the dbt half.
## Lakeflow Jobs
The native orchestrator. Formerly Databricks Workflows. A Lakeflow Job is one or more tasks with dependencies, schedules, retries, SLAs, and alerts.
### When Lakeflow Jobs is the right orchestrator
- Everything you orchestrate is on Databricks. No cross-platform DAGs.
- You want the best UX for retries and lineage within the platform.
- You do not want to pay for a separate orchestration control plane.
### When Airflow is the right orchestrator
- Your DAG crosses platforms. Databricks after a Snowflake sync, after an SFTP drop, after a Salesforce export, after a dbt-on-Redshift build. Airflow is a cross-platform workflow engine; Lakeflow is a data-first engine.
- You already have a mature Airflow estate you do not want to split.
### The 2026 canonical pattern
Lakeflow Jobs and Airflow coexist when the graph crosses platforms:
- DAB owns the job definition. The job is declared in YAML, deployed by `bundle deploy`.
- Airflow triggers by job ID via `DatabricksRunNowOperator`. It does not redefine the job.
- Both live in Git.
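The bundle side of that split might look like the following. This is a hedged sketch of a Databricks Asset Bundle resource file; the resource key, job name, and task are illustrative, so check the DAB schema for your CLI version:

```yaml
# resources/silver_pipeline_job.yml -- deployed with `databricks bundle deploy`
resources:
  jobs:
    silver_pipeline:            # illustrative resource key
      name: silver-pipeline
      tasks:
        - task_key: refresh_silver
          notebook_task:
            notebook_path: ../src/refresh_silver.py
```

Airflow then needs only the deployed job's ID, which keeps the definition in one place.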
```python
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

run = DatabricksRunNowOperator(
    task_id="run_silver_pipeline",
    databricks_conn_id="databricks_default",
    job_id="{{ var.value.silver_pipeline_job_id }}",
)
```
> **Warning:** `DatabricksSubmitRunOperator` submits a JSON job spec at trigger time. It exists for one-off cases; avoid it in production. Using it means you have two sources of truth for the job definition: the bundle in Git and the JSON in the Airflow DAG. They will drift.
## Putting it together
A typical Causeway pipeline stitches all three:
```text
Salesforce → [Lakeflow Connect] → prod.bronze.sf_accounts
                   ↓
[Lakeflow Declarative Pipeline]
    stg_accounts (streaming table + expectations)
    int_accounts_joined (materialized view)
                   ↓
[dbt] prod.gold.dim_customers
                   ↓
[Lakeflow Jobs]
    schedule → refresh BI dashboards
                   ↓
[Airflow]
    sequence with upstream dbt-on-Redshift run
```
Each tool does the part it is best at. None of them overlaps with another.
## See also
- Asset Bundles guide — the deployment unit for jobs and pipelines.
- Declarative Pipelines guide — building an LDP.
- dbt + Databricks quickstart — the dbt half of the pattern.