This walkthrough gets you from a fresh checkout to a running Databricks job, deployed from your laptop, observable in the workspace UI. It assumes you can read Python and have used a shell, but not that you have ever touched Databricks before.

What you need

Install the CLI:

brew tap databricks/tap && brew install databricks   # macOS
# or, on any platform:
curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
# (avoid the legacy `pip install databricks-cli`; it predates bundles)

Authenticate once:

databricks auth login --host https://your-workspace.cloud.databricks.com

Test:

databricks current-user me

Note

databricks auth login opens a browser and walks you through an OAuth flow; in CI, where there is no browser, organizations typically pair the CLI with OIDC federation or a service principal instead. The end result is the same either way: your CLI can talk to the workspace without you managing a long-lived token.
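
Whichever flow you use, a successful login records a profile in ~/.databrickscfg. The values below are illustrative of what an OAuth login leaves behind:

```ini
[DEFAULT]
host      = https://your-workspace.cloud.databricks.com
auth_type = databricks-cli
```

Subsequent CLI commands pick this profile up automatically, so you never paste a token again.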

1. Create a bundle

A Databricks Asset Bundle (DAB) is a declarative description of jobs, pipelines, warehouses, and notebooks. In 2026 it is the only sanctioned way to ship work to Databricks.

databricks bundle init default-python

The scaffolding creates:

my_project/
  databricks.yml             # top-level bundle definition
  resources/
    my_project.job.yml       # one example job
  src/
    my_project/
      main.py                # Python code the job runs

The bundle, not the notebook, is the unit of deployment. You edit Python files in your editor of choice, push to Git, and let the bundle deploy command do the rest.

2. Edit the bundle

Open databricks.yml and make it look like this:

bundle:
  name: my_project

artifacts:
  default:
    type: whl
    build: python -m build    # builds the wheel the job installs; swap in your build tool
    path: .

variables:
  catalog:
    description: Unity Catalog for this env
    default: dev

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://your-workspace.cloud.databricks.com
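
Targets are additive: when you are ready for more environments, a prod target sits alongside dev in the same file. A sketch (the host and catalog value are placeholders for your own):

```yaml
targets:
  prod:
    mode: production
    workspace:
      host: https://your-workspace.cloud.databricks.com
    variables:
      catalog: prod
```

mode: production disables the per-user prefixing you get in development mode, so the job name is shared by the whole workspace.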

Open resources/my_project.job.yml:

resources:
  jobs:
    hello_job:
      name: hello_${bundle.target}
      tasks:
        - task_key: run_main
          python_wheel_task:
            package_name: my_project
            entry_point: main
          libraries:
            - whl: ../dist/*.whl
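
The catalog variable declared in databricks.yml is unused so far. One way to thread it into the job is via task parameters; a sketch, assuming your main() parses a --catalog flag:

```yaml
          python_wheel_task:
            package_name: my_project
            entry_point: main
            parameters: ["--catalog", "${var.catalog}"]
```

The ${var.catalog} reference is resolved at deploy time, so dev and prod deployments of the same bundle run against different catalogs.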

And src/my_project/main.py:

def main():
    print("hello from causeway via databricks")
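
If you do pass task parameters through to the entry point, main() can read them with argparse. A minimal sketch (the --catalog flag is an assumption from the variable defined earlier, not part of the scaffold):

```python
import argparse


def main(argv=None):
    # Parse the parameters the job passes in, e.g. ["--catalog", "dev"].
    parser = argparse.ArgumentParser()
    parser.add_argument("--catalog", default="dev")
    args = parser.parse_args(argv)
    print(f"hello from causeway via databricks (catalog={args.catalog})")
    return args.catalog


if __name__ == "__main__":
    main()
```

Returning the parsed value is only a convenience for local testing; the job itself just reads the printed stdout.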

3. Validate

Never deploy a bundle you have not validated first:

databricks bundle validate

Validation catches YAML drift, missing variables, undefined references, and permission mismatches. It takes about three seconds.

Warning

bundle validate is step one of every CI run. If you ever skip it and go straight to deploy, you will get mysterious failures that take far longer to unpick than the few seconds validate would have cost. Make it muscle memory.
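
In CI this is a single step. A sketch for GitHub Actions (workflow layout and secret names are illustrative; adapt to your CI system):

```yaml
# .github/workflows/validate.yml (illustrative)
name: validate-bundle
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main
      - run: databricks bundle validate --target dev
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
```

Failing the pull request on a validate error keeps broken bundle definitions from ever reaching a deploy.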

4. Deploy to dev

databricks bundle deploy --target dev

Deployment uploads your code, creates the job in the workspace, and wires everything up. Because targets.dev.mode is development, the job is scoped to your user: it appears in the UI as [dev your.email@company.com] hello_dev.

5. Run the job

databricks bundle run hello_job

The CLI prints a run URL. Open it to watch the task's progress in the UI, read the stdout of your main() function, and inspect the run metadata.

6. Clean up

Deleting a dev deployment is one command:

databricks bundle destroy --target dev

What just happened

You:

  1. installed and authenticated the CLI,
  2. scaffolded a bundle,
  3. edited its job definition and code,
  4. validated, deployed, and ran it in dev,
  5. destroyed it with a single command.

That is the whole development loop. The same commands scale from "hello world" to a production pipeline with thirty tasks and three environments.

What to resist

Danger

Two anti-patterns to kill on sight:

  1. Clicking to create a job in the UI, then referencing it from a bundle. The next bundle deploy overwrites your UI edits. Treat the UI as read-only for bundle-managed resources.

  2. Attaching a production job to an all-purpose cluster. All-purpose compute costs 2–3× job compute. It is the single most common line item on Databricks billing reviews. Serverless job compute or a dedicated job cluster are the right defaults.

Next steps