These standards bind every repository a data creator opens as their primary workspace. They are not recommendations. Exceptions require a documented waiver in the PR.
1. Repo artifacts
Every data-product repo ships the following files. Missing any of them blocks PR merge:
- [ ] .vscode/extensions.json with the paved-path recommendations array.
- [ ] .vscode/settings.json with format-on-save, Python interpreter discovery, and YAML schema wiring.
- [ ] .vscode/launch.json with at least one debugger config for the repo's primary language.
- [ ] .vscode/tasks.json with the commands the engineer runs dozens of times a day.
- [ ] .devcontainer/devcontainer.json (or a documented exemption in the repo README).
- [ ] .editorconfig at the repo root enforcing indent and line endings.
- [ ] A project-level AI agent config (.claude/CLAUDE.md, .cursor/rules, or equivalent) if the team uses agents.
2. Extension recommendations
- Use the published Extension Pack ID in recommendations when one exists. Do not enumerate individual IDs per repo when the pack already bundles them.
- Include unwantedRecommendations for extensions that conflict with the pack (e.g., ms-python.black-formatter when the pack uses Ruff).
- Do not include personal-preference extensions (themes, icon packs). Those belong in user Settings Sync.
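As a rough illustration of these rules, a committed .vscode/extensions.json might look like the sketch below. The pack ID your-org.data-paved-path is a placeholder, not a real published extension; substitute whatever pack your platform team publishes.

```jsonc
// .vscode/extensions.json (sketch; VS Code accepts comments in this file)
{
  "recommendations": [
    // Hypothetical Extension Pack ID; replace with your org's published pack.
    "your-org.data-paved-path"
  ],
  "unwantedRecommendations": [
    // Conflicts with a Ruff-based pack, per the rule above.
    "ms-python.black-formatter"
  ]
}
```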
3. Workspace settings
- Format-on-save on for Python, SQL, YAML, and Markdown. Failure to save-format is the single most common cause of CI lint failures.
- python.defaultInterpreterPath pointing at the project venv. Do not rely on system Python.
- Test framework configured (pytest, in most cases). Not configuring it means half the team's Test Explorer is blank.
- YAML schemas wired for at least GitHub Actions, dbt schema.yml, and Databricks databricks.yml.
- Watcher exclusions set for .venv, target, dbt_packages, __pycache__. Missing exclusions cause laggy editors on medium repos.
Important
Workspace settings commit to the repo. Never place a secret, a personal auth token, or a user-specific path (with /Users/you/... in it) in .vscode/settings.json. Use ${env:VAR} or ${workspaceFolder} substitutions.
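A minimal sketch of a .vscode/settings.json that follows these rules is shown below. It assumes the Ruff extension (charliermarsh.ruff) is part of the paved-path pack, that the venv lives in .venv at the repo root, and that tests live in tests/; none of these are stated by the standard itself. Only the GitHub Actions schema is wired here, because the schema locations for dbt and Databricks depend on the tooling versions in use.

```jsonc
// .vscode/settings.json (sketch under the assumptions stated above)
{
  // Format-on-save for the four languages the standard names.
  "[python]": {
    "editor.formatOnSave": true,
    // Assumption: Ruff's extension is installed via the pack.
    "editor.defaultFormatter": "charliermarsh.ruff"
  },
  "[sql]": { "editor.formatOnSave": true },
  "[yaml]": { "editor.formatOnSave": true },
  "[markdown]": { "editor.formatOnSave": true },

  // Project venv via substitution, never an absolute user path.
  // Assumes a POSIX venv layout; Windows venvs use a Scripts directory instead of bin.
  "python.defaultInterpreterPath": "${workspaceFolder}/.venv/bin/python",

  // pytest discovery for Test Explorer (assumes tests live in tests/).
  "python.testing.pytestEnabled": true,
  "python.testing.unittestEnabled": false,
  "python.testing.pytestArgs": ["tests"],

  // Schema wiring (requires a YAML language extension such as redhat.vscode-yaml).
  // Add entries for dbt schema.yml and databricks.yml alongside this one.
  "yaml.schemas": {
    "https://json.schemastore.org/github-workflow.json": ".github/workflows/*.yml"
  },

  // Keep the file watcher away from generated and vendored directories.
  "files.watcherExclude": {
    "**/.venv/**": true,
    "**/target/**": true,
    "**/dbt_packages/**": true,
    "**/__pycache__/**": true
  }
}
```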
4. Launch configs
- At least one runnable config per primary entry point. "Run the current file," "run a specific module," and "run pytest on the current file" cover 90% of cases.
- At least one debug config for the stack's remote-execution path. Databricks Connect for PySpark work, attach-to-container for dbt Python models, attach-to-Astro for Airflow work.
- Launch configs reference settings, not literals. Use ${workspaceFolder}, ${command:python.interpreterPath}, and ${env:VAR}. Hard-coded paths break on other engineers' laptops.
- No console: externalTerminal in committed configs. Breaks on every OS except the one the author tested on.
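The sketch below shows one way a .vscode/launch.json could cover "run the current file" and "run pytest on the current file" using only substitutions. The environment variable name APP_ENV is hypothetical and stands in for whatever your project reads at runtime.

```jsonc
// .vscode/launch.json (sketch; config names and APP_ENV are illustrative)
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Current File",
      "type": "debugpy",
      "request": "launch",
      "program": "${file}",
      "cwd": "${workspaceFolder}",
      // Interpreter comes from the selected environment, not a literal path.
      "python": "${command:python.interpreterPath}",
      // integratedTerminal rather than externalTerminal, per the rule above.
      "console": "integratedTerminal",
      "env": {
        // Hypothetical variable, forwarded from the developer's shell.
        "APP_ENV": "${env:APP_ENV}"
      }
    },
    {
      "name": "Python: pytest (current file)",
      "type": "debugpy",
      "request": "launch",
      "module": "pytest",
      "args": ["${file}"],
      "cwd": "${workspaceFolder}",
      "console": "integratedTerminal"
    }
  ]
}
```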
5. Tasks
- Tasks for every command the engineer runs more than three times a day. Rule of thumb: if it has a cadence, it gets a task.
- The default build task ("group": { "kind": "build", "isDefault": true }) runs the repo's primary pipeline. Usually dbt build --select state:modified+ --defer for a dbt repo, pytest for a Python package, or databricks bundle deploy --target dev for a DAB repo.
- No presentation: { "reveal": "never" } on tasks that can fail. Silent failures waste hours.
- No task shells out to curl | sh. Ever.
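For a dbt repo, a .vscode/tasks.json along these lines would satisfy the rules above. The task labels are illustrative, and the sketch assumes the comparison state for state:modified+ is supplied separately (for example with --state or through your dbt configuration).

```jsonc
// .vscode/tasks.json (sketch for a dbt repo; labels are illustrative)
{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "dbt: build modified",
      "type": "shell",
      // Command taken from the rule above; state location is assumed
      // to be configured outside this file.
      "command": "dbt build --select state:modified+ --defer",
      // Marks this as the default build task (Ctrl/Cmd+Shift+B).
      "group": { "kind": "build", "isDefault": true },
      // Keep output visible so a failing run is never silent.
      "presentation": { "reveal": "always", "panel": "shared" },
      "problemMatcher": []
    },
    {
      "label": "pytest",
      "type": "shell",
      // Uses the interpreter selected in the workspace, not system Python.
      "command": "${command:python.interpreterPath} -m pytest",
      "group": "test",
      "presentation": { "reveal": "always" },
      "problemMatcher": []
    }
  ]
}
```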
6. Devcontainers
- Devcontainer exists unless the team has an RFD-documented reason otherwise.
- Base image pinned by tag, not latest. mcr.microsoft.com/devcontainers/python:3.12@sha256:... is better than :3.12.
- Features pinned by version. "ghcr.io/devcontainers/features/aws-cli:1.0.3", not :1 or unversioned.
- postCreateCommand is idempotent. Re-running it on an existing container does not break anything.
- No secrets in devcontainer.json. Credentials come from host-mount passthroughs or OAuth.
- The devcontainer Python version matches the repo's pyproject.toml. Drift is a bug.
- The devcontainer JDK version matches the Spark version's requirement (for PySpark projects).
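A devcontainer.json in the spirit of these rules might look like the sketch below. The container name, the editable install, the vscode container user, and the host ~/.aws directory are all assumptions made for illustration. The image digest is deliberately left out because it must be looked up for the image you actually use.

```jsonc
// .devcontainer/devcontainer.json (sketch under the assumptions stated above)
{
  "name": "data-product", // hypothetical name
  // Pinned tag; append the image digest (@sha256:...) for a fully reproducible base.
  // The Python version here should match the one declared in pyproject.toml.
  "image": "mcr.microsoft.com/devcontainers/python:3.12",
  "features": {
    // Full version, not ":1" or unversioned.
    "ghcr.io/devcontainers/features/aws-cli:1.0.3": {}
  },
  // Intended to be safe to re-run: python -m venv reuses an existing .venv,
  // and the editable install can be repeated.
  "postCreateCommand": "python -m venv .venv && .venv/bin/pip install -e .",
  // Credentials are passed through from the host instead of being written here.
  // Assumes the container user is vscode and the host keeps AWS config in ~/.aws.
  "mounts": [
    "source=${localEnv:HOME}/.aws,target=/home/vscode/.aws,type=bind,readonly"
  ]
}
```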
7. Agent configuration
If the team uses AI agents (most do), additional rules apply:
- CLAUDE.md, .cursor/rules, or equivalent in the repo root, describing:
  - The project's tech stack and conventions.
  - The test and lint commands the agent should run.
  - Which files the agent should never modify (e.g., dbt_packages/, target/).
- MCP config scoped to the project (.mcp.json, .cursor/mcp.json) rather than global.
- No secrets in MCP config. Use env-var substitution.
- Pinned MCP server versions. A floating version is an unreproducible agent.
See the MCP servers guide and AI agents for the specifics.
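To make the MCP rules concrete, here is a hedged sketch of a project-scoped .mcp.json. The server name, the package @example/warehouse-mcp, its version, and the WAREHOUSE_TOKEN variable are all hypothetical. The sketch also assumes the agent expands ${VAR} references in this file; the exact substitution syntax varies by agent, so check the documentation for the one your team uses.

```jsonc
// .mcp.json at the repo root (sketch; strip comments if your agent requires strict JSON)
{
  "mcpServers": {
    // Hypothetical server name and package; the point is the exact version pin.
    "warehouse": {
      "command": "npx",
      "args": ["-y", "@example/warehouse-mcp@1.4.2"],
      "env": {
        // Resolved from the developer's environment; no literal token in the repo.
        "WAREHOUSE_TOKEN": "${WAREHOUSE_TOKEN}"
      }
    }
  }
}
```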
8. Git hygiene
- .gitignore excludes .venv/, target/, logs/, dbt_packages/, .DS_Store, and editor scratch files.
- .vscode/ is committed. It is project infrastructure, not a scratch directory.
- .devcontainer/ is committed.
- User-specific caches are not committed (.ruff_cache/, .pytest_cache/, .mypy_cache/).
Warning
Committing .venv/ to Git is the classic junior-engineer mistake that bloats the repo forever. Audit the initial commit. If it lands, prune with git filter-repo and force-push before the repo gets cloned widely.
9. Pre-commit hooks
Every repo ships a .pre-commit-config.yaml running, at minimum:
- Ruff check and Ruff format for Python.
- SQLFluff lint and fix for SQL.
- dbt parse for dbt repos (catches compile errors before push).
- YAML schema validation for schema.yml, databricks.yml, and CI workflows.
- Trailing-whitespace and end-of-file-fixer checks.
VS Code runs pre-commit hooks when you commit from the Source Control panel; a failing hook blocks the commit, and its output is available in the Git output log.
10. Shared formatters
- Ruff for Python. Not Black, not autopep8. Ruff's formatter is stable and 10× faster.
- SQLFluff for SQL. Not sqlformat, not a bespoke formatter.
- terraform fmt for Terraform via the HashiCorp extension.
- markdownlint for Markdown, via the official extension. Only enable the rules the team has agreed to.
Mixing formatters within a repo is a lint-noise generator. Pick one per language and pin its version in both the repo config (pyproject.toml / .sqlfluff) and CI.
11. Per-stack rules
Python
- [ ] Pylance installed with typeCheckingMode: basic.
- [ ] Ruff configured via pyproject.toml's [tool.ruff] block.
- [ ] pytest discovery working in Test Explorer.
- [ ] debugpy available; F5 on any test file runs under the debugger.
dbt
- [ ] One of: dbtLabsInc.dbt (Fusion) or innoverio.vscode-dbt-power-user (Core).
- [ ] DBT_PROFILES_DIR set via terminal.integrated.env.* or .envrc.
- [ ] schema.yml files have yaml-language-server schema comments.
- [ ] Jinja-SQL highlighting via samuelcolvin.jinjahtml.
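The workspace-settings half of the dbt checklist could be sketched as follows. The profiles/ directory is an assumed location for profiles.yml, and the jinja-sql language ID is assumed to be the one contributed by samuelcolvin.jinjahtml; verify both against your repo and installed extensions.

```jsonc
// .vscode/settings.json fragment for a dbt repo (sketch)
{
  // Point dbt at a repo-local profiles directory in every integrated terminal.
  // Assumes profiles.yml lives in profiles/ at the repo root.
  "terminal.integrated.env.linux": {
    "DBT_PROFILES_DIR": "${workspaceFolder}/profiles"
  },
  "terminal.integrated.env.osx": {
    "DBT_PROFILES_DIR": "${workspaceFolder}/profiles"
  },
  "terminal.integrated.env.windows": {
    "DBT_PROFILES_DIR": "${workspaceFolder}/profiles"
  },
  // Open SQL models with Jinja-aware highlighting.
  // Assumes the jinja-sql language ID is available from the installed extension.
  "files.associations": {
    "*.sql": "jinja-sql"
  }
}
```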
Databricks
- [ ] Databricks extension authenticated via OAuth (U2M), not a PAT.
- [ ] databricks-connect version pinned to the cluster's DBR version in pyproject.toml.
- [ ] databricks.yml schema wired in yaml.schemas.
- [ ] A launch config named "Python: Databricks Connect" ready to run the current file.
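One possible shape for the "Python: Databricks Connect" launch config is sketched below. It assumes databricks-connect is installed in the selected interpreter and that authentication is resolved from a Databricks config profile named by the DATABRICKS_CONFIG_PROFILE environment variable; your Databricks extension setup may inject the connection differently.

```jsonc
// .vscode/launch.json fragment (sketch under the assumptions stated above)
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Databricks Connect",
      "type": "debugpy",
      "request": "launch",
      // Runs whichever file is open in the editor.
      "program": "${file}",
      "cwd": "${workspaceFolder}",
      "console": "integratedTerminal",
      "env": {
        // Profile name is read from the developer's environment;
        // no token or workspace URL is committed here.
        "DATABRICKS_CONFIG_PROFILE": "${env:DATABRICKS_CONFIG_PROFILE}"
      }
    }
  ]
}
```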
Airflow
- [ ] Astro CLI on PATH; astro dev start command is a Task.
- [ ] astro dev run dags test is a Task parameterized with the DAG id.
- [ ] DAG files open with Python extension's language mode (not plain text).
12. Anti-patterns
- No .vscode/ at all, "everyone sets their own." A repo without committed editor config is a repo where every contributor pays a configuration tax on day one.
- Workspace settings overriding user preferences. editor.fontSize, themes, and key rebindings in workspace config impose one engineer's taste on the team.
- Launch configs with hard-coded /Users/you/... paths. Breaks on every other laptop.
- Dev containers that diverge from production. Different Python versions, different JDK versions, different CLI versions. Drift is a bug.
- MCP servers with broad write scopes. Mostly because someone needed repo scope on one script and never narrowed.
13. The review checklist
PRs that touch workspace configuration must satisfy:
- [ ] No secrets, tokens, or user-specific paths in committed files.
- [ ] Launch configs use ${workspaceFolder} and ${env:*} substitutions.
- [ ] Tasks do not fetch-and-execute untrusted scripts.
- [ ] Devcontainer features are version-pinned.
- [ ] Extension recommendations match the paved-path pack.
- [ ] Settings do not introduce personal-preference values (themes, font sizes).
- [ ] Pre-commit hooks match the CI pipeline's linter set.
14. Governance
- Platform team owns the paved-path Extension Pack and publishes it to the org's internal marketplace (or, for smaller shops, a public marketplace entry).
- Quarterly review of the pack, the default settings, and the devcontainer base images.
- A documented waiver path for repos that need to opt out of a specific rule. Waivers expire.
- CI enforcement of the minimum file set (.vscode/extensions.json, .devcontainer/, .pre-commit-config.yaml).
Important
Workspace standards are what turn onboarding from weeks into hours. The discipline of committing the paved path to every repo compounds: new engineers clone, open, and are productive within an hour. Teams that skip this consistently onboard in weeks.
See also
- The extension pack reference — the specific extension IDs the pack ships.
- Settings reference — the JSON recipes these rules reference.
- Production readiness — the operational gate that follows this baseline.
- Devcontainers — the reproducible environments these standards assume.