SQL Standards
This document outlines the SQL standards and best practices for the Causeway data platform. Following these standards ensures consistency, maintainability, and optimal performance across all database interactions.
Consistent SQL standards are critical for maintaining a well-functioning data ecosystem. Whether you're creating data models, writing queries, or developing data pipelines, these guidelines will help ensure your SQL code is readable, performant, and follows best practices.
Naming Conventions
| Database Object | Convention | Example |
|---|---|---|
| Tables | Snake case, plural nouns | customer_orders, product_categories |
| Columns | Snake case, singular descriptive names | first_name, created_at |
| Primary Keys | Table name (singular) + _id | customer_id, order_id |
| Foreign Keys | Referenced table (singular) + _id | customer_id in orders table |
| Views | Prefix vw_ + descriptive name | vw_customer_orders_summary |
| Functions | Prefix fn_ + verb + noun | fn_calculate_order_total |
| Stored Procedures | Prefix sp_ + verb + noun | sp_update_inventory |
| Indexes | Prefix idx_ + table + columns | idx_customers_last_name |
| Constraints | Prefix based on type (pk_, fk_, uq_, ck_) | pk_customers, fk_orders_customers |
| Temporary Tables | Prefix tmp_ or # (for SQL Server) | tmp_order_processing, #order_processing |
SQL Style Guidelines
Capitalization
- Use uppercase for SQL keywords (SELECT, FROM, WHERE, etc.)
- Use lowercase for table names, column names, and aliases
- Use CamelCase for alias names when needed for readability
Indentation and Formatting
- Use consistent indentation (2 or 4 spaces recommended)
- Place each major SQL clause on a new line
- Include line breaks for readability, especially for complex queries
- Align JOIN clauses and conditions for better readability
- Use table aliases when joining multiple tables
-- Good formatting example
SELECT
c.customer_id,
c.first_name,
c.last_name,
o.order_id,
o.order_date,
o.total_amount
FROM
customers c
JOIN
orders o ON c.customer_id = o.customer_id
WHERE
o.order_date >= '2023-01-01'
AND o.total_amount > 100
ORDER BY
o.order_date DESC
LIMIT 100;
Comments
- Add a header comment for complex queries explaining the purpose
- Comment on complex logic or business rules
- Use inline comments sparingly and only when necessary
- Use
--for single-line comments and/* */for multi-line comments
/*
* Query to find high-value customers who made purchases in the last 30 days
* Used for the monthly marketing campaign
* Created by: Data Team
* Last updated: 2023-05-15
*/
SELECT
c.customer_id,
c.email,
SUM(o.total_amount) AS total_spent
FROM
customers c
JOIN
orders o ON c.customer_id = o.customer_id
WHERE
o.order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY
c.customer_id,
c.email
HAVING
SUM(o.total_amount) > 1000 -- High-value threshold defined by marketing
ORDER BY
total_spent DESC;
Query Best Practices
Select Only Required Columns
Avoid using SELECT * in production code. Explicitly specify the columns you need to improve performance and make dependencies clear.
Instead of:
SELECT * FROM customers;
Use:
SELECT
customer_id,
first_name,
last_name,
email
FROM
customers;
Use Appropriate JOINs
Always use the appropriate JOIN type (INNER, LEFT, RIGHT, FULL) based on your requirements. Be explicit with JOIN conditions.
-- Example of explicit JOIN
SELECT
c.customer_id,
c.first_name,
o.order_id
FROM
customers c
LEFT JOIN -- Using LEFT JOIN because we want all customers even without orders
orders o ON c.customer_id = o.customer_id;
Use Table Aliases
Use meaningful table aliases, especially when working with multiple tables. Use single-letter aliases only for simple queries.
Instead of:
SELECT
a.id,
a.name,
b.id,
b.date
FROM
customers a
JOIN
orders b ON a.id = b.customer_id;
Use:
SELECT
cust.customer_id,
cust.customer_name,
ord.order_id,
ord.order_date
FROM
customers cust
JOIN
orders ord ON cust.customer_id = ord.customer_id;
Optimize WHERE Clauses
Place the most restrictive conditions first. Avoid functions on indexed columns in WHERE clauses as they can prevent index usage.
Instead of:
SELECT
*
FROM
orders
WHERE
EXTRACT(YEAR FROM order_date) = 2023;
Use:
SELECT
*
FROM
orders
WHERE
order_date >= '2023-01-01'
AND order_date < '2024-01-01';
Use CTEs for Complex Queries
Use Common Table Expressions (CTEs) for complex queries to improve readability and maintainability.
WITH monthly_sales AS (
SELECT
EXTRACT(MONTH FROM order_date) AS month,
SUM(total_amount) AS revenue
FROM
orders
WHERE
order_date >= '2023-01-01'
GROUP BY
EXTRACT(MONTH FROM order_date)
)
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue,
revenue - LAG(revenue) OVER (ORDER BY month) AS revenue_growth
FROM
monthly_sales
ORDER BY
month;
Data Types
Use appropriate data types for the data being stored. Here are the recommended data types for common scenarios:
| Type of Data | Recommended Data Type | Notes |
|---|---|---|
| Primary Keys | INTEGER or BIGINT | Use with AUTO_INCREMENT or IDENTITY |
| Small Text | VARCHAR(n) | For variable-length text with a reasonable maximum size |
| Large Text | TEXT | For large text content without a defined limit |
| Integer Numbers | INTEGER or BIGINT | Choose based on expected range of values |
| Decimal Numbers | DECIMAL(p,s) | Use for exact numeric values like money |
| Floating Point | FLOAT or DOUBLE | Use when exact decimal precision is not required |
| Boolean | BOOLEAN | Use for true/false flags |
| Dates | DATE | For date values without time |
| Date and Time | TIMESTAMP | With time zone awareness when needed |
| Time | TIME | For time values without a date |
Common Patterns
Pagination
Use the following pattern for paginated results:
SELECT
column1,
column2
FROM
table_name
ORDER BY
sort_column
LIMIT
page_size
OFFSET
(page_number - 1) * page_size;
Hierarchical Data
For hierarchical data (like organizational structures), use a recursive CTE:
WITH RECURSIVE hierarchy AS (
-- Base case: top-level categories
SELECT
id,
name,
parent_id,
1 AS level
FROM
categories
WHERE
parent_id IS NULL
UNION ALL
-- Recursive case: child categories
SELECT
c.id,
c.name,
c.parent_id,
h.level + 1
FROM
categories c
JOIN
hierarchy h ON c.parent_id = h.id
)
SELECT
*
FROM
hierarchy
ORDER BY
level, name;
Pivot Tables
For dynamic pivot tables, use conditional aggregation:
SELECT
product_id,
SUM(CASE WHEN month = 1 THEN sales ELSE 0 END) AS jan_sales,
SUM(CASE WHEN month = 2 THEN sales ELSE 0 END) AS feb_sales,
SUM(CASE WHEN month = 3 THEN sales ELSE 0 END) AS mar_sales,
SUM(sales) AS total_sales
FROM
monthly_sales
WHERE
year = 2023
GROUP BY
product_id
ORDER BY
total_sales DESC;
Anti-Patterns
SELECT *
Never use SELECT * in production code. It creates implicit dependencies, reduces query performance, and returns unnecessary data.
Instead, explicitly list required columns:
SELECT
customer_id,
first_name,
last_name,
email
FROM
customers;
Implicit JOINs
Avoid old-style implicit joins in the WHERE clause. They are harder to read and can lead to accidental cross joins.
Avoid:
SELECT
c.customer_id,
o.order_id
FROM
customers c,
orders o
WHERE
c.customer_id = o.customer_id;
Use explicit JOINs:
SELECT
c.customer_id,
o.order_id
FROM
customers c
JOIN
orders o ON c.customer_id = o.customer_id;
Functions on Indexed Columns
Avoid using functions on indexed columns in WHERE clauses as they prevent the database from using indexes.
Avoid:
SELECT
*
FROM
customers
WHERE
LOWER(email) = 'customer@example.com';
Better approach:
SELECT
*
FROM
customers
WHERE
email = 'customer@example.com';
Correlated Subqueries
Avoid correlated subqueries when possible, as they execute for each row in the outer query, leading to poor performance.
Avoid:
SELECT
c.customer_id,
c.first_name,
(SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.customer_id) AS order_count
FROM
customers c;
Use JOIN and GROUP BY:
SELECT
c.customer_id,
c.first_name,
COUNT(o.order_id) AS order_count
FROM
customers c
LEFT JOIN
orders o ON c.customer_id = o.customer_id
GROUP BY
c.customer_id,
c.first_name;
Examples
Basic Query
-- Basic query with proper formatting and style
SELECT
customer_id,
first_name,
last_name,
email,
created_at
FROM
customers
WHERE
status = 'active'
AND created_at >= '2023-01-01'
ORDER BY
created_at DESC
LIMIT 100;
JOIN Query
-- JOIN query with table aliases and comments
SELECT
c.customer_id,
c.first_name,
c.last_name,
o.order_id,
o.order_date,
o.total_amount,
p.payment_method
FROM
customers c
INNER JOIN -- INNER JOIN because we only want customers with orders
orders o ON c.customer_id = o.customer_id
LEFT JOIN -- LEFT JOIN because some orders might not have payments yet
payments p ON o.order_id = p.order_id
WHERE
o.order_date BETWEEN '2023-01-01' AND '2023-12-31'
AND o.status = 'completed'
ORDER BY
o.order_date DESC,
o.total_amount DESC;
Aggregation and Grouping
-- Aggregation with grouping and having
SELECT
c.customer_id,
c.first_name,
c.last_name,
COUNT(o.order_id) AS order_count,
SUM(o.total_amount) AS total_spent,
AVG(o.total_amount) AS avg_order_value,
MIN(o.order_date) AS first_order_date,
MAX(o.order_date) AS last_order_date
FROM
customers c
JOIN
orders o ON c.customer_id = o.customer_id
WHERE
o.order_date >= '2023-01-01'
AND o.status = 'completed'
GROUP BY
c.customer_id,
c.first_name,
c.last_name
HAVING
COUNT(o.order_id) >= 3 -- Only include customers with at least 3 orders
AND SUM(o.total_amount) > 1000 -- And who spent over $1000
ORDER BY
total_spent DESC;
Complex Query with CTEs
-- Complex query with Common Table Expressions (CTEs)
WITH customer_orders AS (
SELECT
c.customer_id,
c.first_name,
c.last_name,
c.email,
COUNT(o.order_id) AS order_count,
SUM(o.total_amount) AS total_spent
FROM
customers c
JOIN
orders o ON c.customer_id = o.customer_id
WHERE
o.order_date >= '2023-01-01'
AND o.status = 'completed'
GROUP BY
c.customer_id,
c.first_name,
c.last_name,
c.email
),
customer_segments AS (
SELECT
customer_id,
first_name,
last_name,
email,
order_count,
total_spent,
CASE
WHEN total_spent > 5000 THEN 'Premium'
WHEN total_spent > 1000 THEN 'Standard'
ELSE 'Basic'
END AS customer_segment
FROM
customer_orders
)
SELECT
customer_segment,
COUNT(*) AS customer_count,
SUM(order_count) AS total_orders,
SUM(total_spent) AS total_revenue,
AVG(total_spent) AS avg_customer_value
FROM
customer_segments
GROUP BY
customer_segment
ORDER BY
total_revenue DESC;
Window Functions
-- Query using window functions
SELECT
c.customer_id,
c.first_name,
c.last_name,
o.order_id,
o.order_date,
o.total_amount,
ROW_NUMBER() OVER (PARTITION BY c.customer_id ORDER BY o.order_date) AS order_sequence,
SUM(o.total_amount) OVER (PARTITION BY c.customer_id ORDER BY o.order_date) AS running_total,
AVG(o.total_amount) OVER (PARTITION BY c.customer_id) AS customer_avg_order,
o.total_amount - LAG(o.total_amount) OVER (PARTITION BY c.customer_id ORDER BY o.order_date) AS order_amount_change
FROM
customers c
JOIN
orders o ON c.customer_id = o.customer_id
WHERE
o.order_date >= '2023-01-01'
AND o.status = 'completed'
ORDER BY
c.customer_id,
o.order_date;
Note: These SQL standards may vary slightly depending on your specific database platform (PostgreSQL, MySQL, SQL Server, etc.). Always consult the specific database documentation for platform-specific best practices.