🟢 How to Document Your SQL Work So Anyone Can Understand It


Hey Reader,

Quick question: Could a colleague open your most recent SQL query and understand what it does without asking you a single question?

If the answer is "probably not," this newsletter is for you.

Writing SQL that works is step one. Writing SQL that someone else can read, trust, and maintain — that's the skill that changes how people see your work.

Why This Matters More Than You Think

Here's what I've noticed over 15+ years in analytics: the analysts who get promoted aren't always the ones who write the most complex queries. They're the ones whose work is so clear that other people can build on it.

When your analysis is easy to follow:

  • Colleagues trust the results without interrogating you
  • Leadership can reference your work in meetings
  • New team members can pick up where you left off
  • You can revisit your own work 6 months later without confusion

When it's not: every analysis becomes a bottleneck that requires you to explain it. That limits your impact.

Three Documentation Habits That Make a Difference

Habit 1: Comment the "Why," Not the "What"

-- BAD: Comments that describe what SQL does (obvious from reading it)
-- Select first name and last name from customers
SELECT first_name, last_name FROM customers;

-- GOOD: Comments that explain WHY and business context
-- Marketing requested VIP list for Q2 retention campaign
-- VIP defined as: $5K+ lifetime spend, active in last 6 months
SELECT 
    c.first_name || ' ' || c.last_name AS customer_name,
    c.email,
    SUM(p.amount) AS lifetime_spend,
    MAX(b.booking_date)::date AS last_booking
FROM customers c
    INNER JOIN bookings b ON c.customer_id = b.customer_id
    INNER JOIN payments p ON b.booking_id = p.booking_id
WHERE p.payment_status = 'completed'
    AND b.status IN ('completed', 'confirmed')
GROUP BY c.customer_id, c.first_name, c.last_name, c.email
HAVING SUM(p.amount) >= 5000
    AND MAX(b.booking_date) >= CURRENT_DATE - INTERVAL '6 months'
ORDER BY lifetime_spend DESC;

The "what" is visible in the code itself. The "why" disappears unless you write it down. Six months later, you'll need the why more than the what.

Habit 2: Use CTEs to Tell a Story

Compare these two versions of the same analysis:

Version A: One big query

SELECT e.expedition_type, COUNT(DISTINCT b.booking_id),
SUM(p.amount), ROUND(AVG(p.amount), 2),
COUNT(DISTINCT b.booking_id) FILTER (WHERE b.status = 'cancelled') * 100.0 / COUNT(DISTINCT b.booking_id)
FROM expeditions e INNER JOIN expedition_instances ei
ON e.expedition_id = ei.expedition_id INNER JOIN bookings b
ON ei.instance_id = b.instance_id INNER JOIN payments p
ON b.booking_id = p.booking_id WHERE p.payment_status = 'completed'
GROUP BY e.expedition_type ORDER BY SUM(p.amount) DESC;

Version B: CTEs with clear steps

-- Expedition category performance analysis
-- Requested by: Marcus (Operations)
-- Purpose: Q2 planning - identify which categories to expand
-- Step 1: Calculate revenue and volume per category
WITH category_performance AS (
    SELECT 
        e.expedition_type,
        COUNT(DISTINCT b.booking_id) AS total_bookings,
        SUM(p.amount) AS total_revenue,
        ROUND(AVG(p.amount), 2) AS avg_booking_value
    FROM expeditions e
        INNER JOIN expedition_instances ei ON e.expedition_id = ei.expedition_id
        INNER JOIN bookings b ON ei.instance_id = b.instance_id
        INNER JOIN payments p ON b.booking_id = p.booking_id
    WHERE p.payment_status = 'completed'
    GROUP BY e.expedition_type
),
-- Step 2: Calculate cancellation rates per category
category_cancellations AS (
    SELECT 
        e.expedition_type,
        ROUND(
            COUNT(*) FILTER (WHERE b.status = 'cancelled') * 100.0 / COUNT(*),
            1
        ) AS cancellation_rate
    FROM expeditions e
        INNER JOIN expedition_instances ei ON e.expedition_id = ei.expedition_id
        INNER JOIN bookings b ON ei.instance_id = b.instance_id
    GROUP BY e.expedition_type
)
-- Step 3: Combine for complete picture
SELECT 
    cp.expedition_type,
    cp.total_bookings,
    cp.total_revenue,
    cp.avg_booking_value,
    cc.cancellation_rate
FROM category_performance cp
    INNER JOIN category_cancellations cc ON cp.expedition_type = cc.expedition_type
ORDER BY cp.total_revenue DESC;

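Same analysis. Version B is readable by anyone on your team. (One difference worth knowing: Step 2 calculates the cancellation rate across all bookings, while Version A only counts bookings that have a completed payment, so the two rates can differ.)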

Each CTE has a name that describes what it calculates. The comments label each step. A new analyst joining your team can follow the logic without asking you what it does.
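
One practical payoff of named steps is that you can point the final SELECT at a single CTE and check its output before combining it with anything else. Here's a minimal sketch in PostgreSQL. The bookings rows are made-up sample data (an assumption for illustration, not real tables), included so the query runs on its own:

```sql
-- Hypothetical sample rows so the sketch runs without any tables
WITH bookings (booking_id, expedition_type, status) AS (
    VALUES
        (1, 'climbing', 'completed'),
        (2, 'climbing', 'cancelled'),
        (3, 'rafting',  'completed'),
        (4, 'rafting',  'completed')
),
-- Step 1: Cancellation rate per category
category_cancellations AS (
    SELECT
        expedition_type,
        ROUND(
            COUNT(*) FILTER (WHERE status = 'cancelled') * 100.0 / COUNT(*),
            1
        ) AS cancellation_rate
    FROM bookings
    GROUP BY expedition_type
)
-- Check this step on its own before joining it to anything else
SELECT *
FROM category_cancellations
ORDER BY expedition_type;
```

With these four sample rows, climbing should come back at 50.0 and rafting at 0.0. Once a step looks right, swap the final SELECT back to the combined output.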

Habit 3: Name Things for Business Users

Column names your manager won't understand:

SELECT 
    cnt_bid,
    sum_amt,
    avg_val,
    exp_typ
FROM ...

Column names anyone can read:

SELECT 
    total_bookings,
    total_revenue,
    avg_booking_value,
    expedition_type
FROM ...

This takes 10 extra seconds when writing the query and saves 10 minutes every time someone reads it. Including you.
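
In practice, the renaming usually happens with AS on each calculated column. A small sketch of that, assuming PostgreSQL and using hypothetical sample payments rows so it runs without any tables:

```sql
-- Hypothetical payments rows so the sketch runs without any tables
WITH payments (booking_id, expedition_type, amount) AS (
    VALUES
        (1, 'climbing', 1200.00),
        (2, 'climbing',  800.00),
        (3, 'rafting',   450.00)
)
SELECT
    expedition_type,
    COUNT(DISTINCT booking_id) AS total_bookings,     -- instead of cnt_bid
    SUM(amount)                AS total_revenue,      -- instead of sum_amt
    ROUND(AVG(amount), 2)      AS avg_booking_value   -- instead of avg_val
FROM payments
GROUP BY expedition_type
ORDER BY total_revenue DESC;
```

The aliases also carry into ORDER BY, so the sort line reads like the business question it answers.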

A Template for Documented Analysis

Here's a pattern to use for any analysis that might be shared or revisited:

-- ============================================
-- Analysis: [What question this answers]
-- Requested by: [Who asked for it]
-- Date: [When you wrote it]
-- Data source: [Which database/tables]
-- Assumptions: [Any filters or business rules]
-- ============================================
-- Step 1: [Description of first step]
WITH step_one AS (
    SELECT ...
),
-- Step 2: [Description of second step]
step_two AS (
    SELECT ...
)
-- Final output: [What the results show]
SELECT ...
FROM step_one
    JOIN step_two ON ...
ORDER BY ...;

It takes 2 minutes to add this header. That 2-minute investment saves hours of "wait, what does this query do?" conversations.
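
To make the template concrete, here is one way it might look filled in. Treat it as a sketch: the date, the sample rows, and the 'refunded' status are hypothetical stand-ins so the query runs on its own in PostgreSQL, and your own steps will differ.

```sql
-- ============================================
-- Analysis: Which expedition categories bring in the most revenue?
-- Requested by: Marcus (Operations)
-- Date: 2025-01-15 (hypothetical)
-- Data source: sample rows below (stand-ins for bookings and payments)
-- Assumptions: only completed payments count as revenue
-- ============================================
-- Sample data (hypothetical) so the sketch runs without any tables
WITH bookings (booking_id, expedition_type) AS (
    VALUES (1, 'climbing'), (2, 'climbing'), (3, 'rafting')
),
payments (booking_id, amount, payment_status) AS (
    VALUES
        (1, 1200.00, 'completed'),
        (2,  800.00, 'refunded'),
        (3,  450.00, 'completed')
),
-- Step 1: Keep only payments that count as revenue
completed_payments AS (
    SELECT booking_id, amount
    FROM payments
    WHERE payment_status = 'completed'
),
-- Step 2: Total bookings and revenue per expedition category
category_revenue AS (
    SELECT
        b.expedition_type,
        COUNT(DISTINCT b.booking_id) AS total_bookings,
        SUM(cp.amount) AS total_revenue
    FROM bookings b
        INNER JOIN completed_payments cp ON b.booking_id = cp.booking_id
    GROUP BY b.expedition_type
)
-- Final output: one row per category, highest revenue first
SELECT expedition_type, total_bookings, total_revenue
FROM category_revenue
ORDER BY total_revenue DESC;
```

Notice that the Assumptions line in the header and the Step 1 comment say the same thing in two places. That's deliberate: someone skimming the header and someone reading the code both see the business rule.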

The Speed of Trust

A documented query isn't just easier to read. It's easier to trust.

When leadership reviews your analysis, clean documentation signals: "This person is thorough. I can rely on their numbers." Messy, uncommented queries signal the opposite — even if the results are correct.

This isn't about perfection. It's about building a reputation for clear, reliable work. Over time, that reputation compounds into bigger projects, more visibility, and greater career impact.

Start Today

Pick your most recent SQL query and apply these three habits:

  1. Add a comment explaining why (not what) the query exists
  2. Break it into named CTEs if it has more than 2 JOINs
  3. Rename columns to plain business language

It won't take long. And the next time someone opens your work, they'll notice the difference.

Until next time,

Brian

Brian Graves, creator of Analytics in Action

Say 👋 on X/Twitter, LinkedIn, or book a call with me. You can always reply to these emails. I check them all.


P.S. Clear, well-structured queries are a running theme throughout SQL for Business Impact. Every module teaches not just the SQL pattern, but how to structure and present it clearly. It's the difference between writing queries and doing analysis. Check it out at sqlforbusinessimpact.com.

P.P.S. What does your team's SQL documentation look like? Are queries well-documented or is everything tribal knowledge? Hit reply — I'm curious how different teams handle this. I read every response.

Starting With Data

Learn to build analytics projects with SQL, Tableau, Excel, and Python. For data analysts looking to level up their career and complete beginners looking to get started. No fluff. No theory. Just step-by-step tutorials anyone can follow.
