The single most important idea in modern data tooling: edit locally, execute remotely. Internalize it and every other choice — extension, language server, inner loop, debugger — falls out naturally. Ignore it and you end up with a slow laptop running small samples of big data, or a browser tab for every compute system, or both.

What lives where

The physics are simple. Two categories of work, two places they belong:

| Work | Where it belongs | Why |
| --- | --- | --- |
| Editing, linting, formatting, navigation, autocomplete | Laptop | Sub-second feedback. Latency to a remote machine is a productivity killer. |
| Git operations, PR review, diff viewing, commit authoring | Laptop | Git is local by design; round-tripping to a cloud IDE for git commit is absurd. |
| Type checking via Pylance, Ruff formatting | Laptop | Language servers need fast filesystem access. Cloud-hosted LSPs add tens to hundreds of milliseconds to every keystroke. |
| AI autocomplete (Copilot, Cursor) | Laptop for the input, cloud for the model | The agent runs on the user's screen. The model runs where the provider hosts it. |
| Fast unit tests (pure-Python, in-memory) | Laptop | Seconds matter. Don't send an assert 1 + 1 == 2 to a Spark cluster. |
| Spark / PySpark jobs that touch real data | Cloud | Real data does not fit on your laptop. Databricks Connect handles the handoff. |
| dbt builds against prod-sized warehouses | Cloud | The warehouse runs the SQL. The editor compiles the Jinja and inspects results. |
| Airflow DAG scheduling in production | Cloud | Airflow on a laptop is a local development convenience, not a production deployment. |
| Model training, LLM inference, GPU workloads | Cloud | GPUs are expensive. Keep them centralized and scheduled. |
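The fast-unit-test row is worth making concrete: a pure-Python test needs no cluster at all, and keeping it local is what keeps the inner loop sub-second. A minimal sketch (the function and test names are illustrative, not from any real codebase):

```python
# A pure-Python transformation: no Spark, no network, no cluster.
def normalize_status(raw: str) -> str:
    """Map free-form status strings to a canonical value."""
    return raw.strip().lower().replace("-", "_")


# A fast local unit test: runs in milliseconds on the laptop.
def test_normalize_status():
    assert normalize_status("  Active ") == "active"
    assert normalize_status("On-Hold") == "on_hold"


test_normalize_status()
```

Logic like this belongs in plain functions precisely so it can be tested locally; only the DataFrame plumbing around it needs a cluster.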

VS Code becomes the one surface where both halves meet. The laptop provides the keyboard and screen. The cloud provides the compute and data. Extensions hide the boundary so cleanly that most of the time you forget which side runs what.

Why this pattern wins

Before VS Code + remote-execution extensions, data creators faced three bad options:

  1. All local. Install a tiny Spark cluster on your laptop, sample the data down to something that fits in RAM, run against the sample, pray the real data behaves the same. Sampling bias hides half of production bugs until the code meets the real data.
  2. All remote. Live in a cloud IDE (Databricks workspace UI, Snowflake Snowsight, AWS Cloud9). Lose every editor affordance you cared about: real keyboard shortcuts, a real Git client, real language servers, real AI agents. Productivity collapses to the lowest common denominator.
  3. Tabbed browsers. Edit in VS Code, tab to the Databricks UI to run, tab to Snowsight to query, tab to GitHub to review, tab to Slack to ask a question. Every tab switch fragments attention and spawns errors.

The edit-local-execute-remote pattern dissolves the trade-off. The editor experience is as rich as a desktop IDE can make it. The compute is as real as production can get. You do not choose between them.

How the seams are hidden

Three examples of how the pattern works in practice.

Databricks Connect

The Databricks extension plus the databricks-connect Python client gives you a local SparkSession that is really a remote one:

from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()
df = spark.table("main.silver.subscription_events")
df.filter(df.status == "active").count()

The DatabricksSession.builder call returns an object that looks like a local Spark driver but forwards every DataFrame operation to the configured remote cluster. Pandas-style exploration runs on cluster memory. Debugger breakpoints work against the local Python code. df.show() streams results back and prints them in the integrated terminal.

You write what reads like local PySpark. The cluster runs it. Inspection feels local because the Python process on your laptop is what holds the driver state.
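Normally the Databricks extension supplies the connection details, but the session can also be configured explicitly. A hedged sketch assuming databricks-connect is installed and a running cluster exists — the host, token, and cluster ID below are placeholders, not real values:

```python
from databricks.connect import DatabricksSession

# Explicit configuration; in practice the VS Code extension or a
# Databricks config profile injects these. Placeholder values only.
spark = (
    DatabricksSession.builder
    .remote(
        host="https://example.cloud.databricks.com",
        token="dapi-...",                    # personal access token
        cluster_id="0000-000000-abcdefgh",   # target cluster
    )
    .getOrCreate()
)
```

Whichever way the session is built, everything after getOrCreate() is the same: local-looking code, remote execution.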

dbt compile-and-inspect

dbt work splits along the same seam: Jinja compilation happens locally, while the warehouse executes the resulting SQL:

The dbt extensions (Fusion LSP or dbt Power User) show the compiled SQL inline before you save. You read the final SQL, hover to inspect CTEs, and confirm the compile output matches your mental model — all without leaving the editor.

Note

The compile step is the "local" in edit-local-execute-remote. That is what makes the inner loop fast for dbt: you catch half of your bugs in compiled SQL before any warehouse credits are spent.
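What "compile locally" means can be illustrated with a toy substitution. This is not dbt's actual compiler — dbt uses full Jinja — just a sketch of the idea that resolving ref() calls is pure string work that needs no warehouse connection:

```python
import re

# Toy stand-in for dbt's manifest: model name -> qualified table name.
CATALOG = {"subscription_events": "main.silver.subscription_events"}


def compile_model(sql: str) -> str:
    """Replace {{ ref('name') }} with the fully qualified table name."""
    def resolve(match: re.Match) -> str:
        return CATALOG[match.group(1)]
    return re.sub(r"\{\{\s*ref\('([^']+)'\)\s*\}\}", resolve, sql)


model = "select count(*) from {{ ref('subscription_events') }}"
print(compile_model(model))
```

Because this step is local and instant, the compiled SQL is inspectable before a single warehouse credit is spent.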

Astro CLI

Airflow is different: the compute is light, but the scheduler is heavy. The pattern there is develop locally, deploy remotely:

The laptop is the iteration surface. The cloud is the production surface. You never iterate on production DAGs by editing them in the web UI.
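The develop-locally, deploy-remotely loop looks roughly like this with the Astro CLI — a sketch assuming the CLI is installed and a hosted deployment is already configured:

```shell
# Spin up a local Airflow (scheduler, webserver, metadata DB) in containers.
astro dev start

# Iterate: edit DAG files locally and run the project's tests
# inside the local environment.
astro dev pytest

# When the DAG is ready, ship it to the hosted deployment.
astro deploy
```

The web UI on the remote side is for observing runs, not for editing code.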

When the pattern breaks

The pattern has limits. Recognize them and work around them.

These are exceptions. The default is still edit-local-execute-remote for the 95% of work that tolerates it.

Implications for the toolchain

Once the pattern is settled, the rest of the toolchain arranges itself:

The discipline

The pattern works only if you stay in the editor. The moment you tab to a browser to "just quickly" check something in the Databricks UI, the loop breaks. Every tool and convention in the rest of this documentation exists to make sure you do not have to leave.

See also