🟢 How to Compare Records Within the Same Table (Self-JOINs)


Hello Reader,

A common analytics question that sounds simple but isn't:

"Which customers have booked the same type of expedition more than once?"

You can't answer this with a regular JOIN between two different tables. The information lives in one table. You need to compare rows within that table to find patterns.

That's what a self-JOIN does. It joins a table to itself.

It sounds unusual, but once you see the pattern, you'll recognize situations where it's exactly what you need.

The Business Problem

Marcus, your VP of Operations, wants to understand customer loyalty at a deeper level than just "repeat customers." He wants to know: which customers are so loyal to a specific expedition type that they've booked it three or more times?

This matters because a customer who booked three photography tours might become a brand advocate for that category. They might respond to premium photography packages, refer friends, or provide testimonials.

Gold.

The Query: Finding Category Loyalists

Here's what I ran against the Summit Adventures database (the fake adventure tourism company I created to help people learn business analytics):

-- Find customers who booked the same expedition type 3+ times
SELECT 
    c.first_name || ' ' || c.last_name AS customer_name,
    e.expedition_type,
    COUNT(DISTINCT b.booking_id) AS times_booked,
    SUM(p.amount) AS total_spent_on_type
FROM customers c
    INNER JOIN bookings b ON c.customer_id = b.customer_id
    INNER JOIN expedition_instances ei ON b.instance_id = ei.instance_id
    INNER JOIN expeditions e ON ei.expedition_id = e.expedition_id
    INNER JOIN payments p ON b.booking_id = p.booking_id
WHERE b.status IN ('completed', 'confirmed')
    AND p.payment_status = 'completed'
GROUP BY c.customer_id, c.first_name, c.last_name, e.expedition_type
HAVING COUNT(DISTINCT b.booking_id) >= 3
ORDER BY times_booked DESC, total_spent_on_type DESC;

Eight customers have booked the same expedition type three or more times. That's real loyalty. And real business intelligence.

Alexandrie Ullrich has spent over $18,600 on hiking alone. That's a customer who should be getting early access to every new hiking expedition, a direct email from a guide, and perhaps an invitation to a premium experience.

Now: The Self-JOIN Pattern

The query above uses GROUP BY and HAVING — powerful, but it doesn't compare individual records to each other. Let's look at where self-JOINs shine: comparing rows within the same table.

Business question: Which guides have worked together on the same expedition instance? This tells operations which guide pairings work well and which guides have experience collaborating.

-- Find guide pairs who've been assigned to the same trip
SELECT 
    g1.first_name || ' ' || g1.last_name AS guide_1,
    g2.first_name || ' ' || g2.last_name AS guide_2,
    COUNT(DISTINCT ga1.instance_id) AS instances_together
FROM guide_assignments ga1
    INNER JOIN guide_assignments ga2 
        ON ga1.instance_id = ga2.instance_id
        AND ga1.guide_id < ga2.guide_id
    INNER JOIN guides g1 ON ga1.guide_id = g1.guide_id
    INNER JOIN guides g2 ON ga2.guide_id = g2.guide_id
GROUP BY g1.guide_id, g1.first_name, g1.last_name, 
         g2.guide_id, g2.first_name, g2.last_name
ORDER BY instances_together DESC
LIMIT 10;

Lewis Morris has worked with several different guides across multiple trips. Laurent Neveu is another frequent collaborator. These are the guides you'd pair with newcomers because they have experience working in teams.

How the Self-JOIN (Actually) Works

The key is this part:

FROM guide_assignments ga1
    INNER JOIN guide_assignments ga2 
        ON ga1.instance_id = ga2.instance_id    -- Same trip
        AND ga1.guide_id < ga2.guide_id          -- Different guides

We're joining guide_assignments to itself. Same table, two aliases (ga1 and ga2).

  • ga1.instance_id = ga2.instance_id → matches guides assigned to the same expedition instance
  • ga1.guide_id < ga2.guide_id → ensures we don't count a guide paired with themselves, and avoids duplicates (Lewis-Aenne only appears once, not also as Aenne-Lewis)

That < instead of != is a subtle but important detail. Using != would give you both (Lewis, Aenne) and (Aenne, Lewis). Using < gives you each pair exactly once.

The Self-JOIN Pattern

Here's the general structure:

-- Compare rows within the same table
SELECT 
    a.column_value AS record_1,
    b.column_value AS record_2,
    -- comparison metrics
FROM table_name a
    INNER JOIN table_name b 
        ON a.shared_attribute = b.shared_attribute
        AND a.id < b.id   -- Avoid duplicates and self-matches

Three components:

  1. Same table, two aliases — give the table two names (a, b) so SQL can tell them apart
  2. Shared attribute — what connects the rows you want to compare (same city, same trip, same department)
  3. De-duplicationa.id < b.id prevents self-matches and duplicate pairs

When to Use Self-JOINs in Business

Finding customers in the same city: "Which of our high-value customers live in the same city?" → Useful for local events and meetups.

Comparing employees: "Which team members started in the same month?" → Useful for cohort mentoring programs.

Product pairings: "Which products are frequently bought together?" → Useful for bundle pricing.

Sequential events: "Which bookings happened within 30 days of each other for the same customer?" → Useful for understanding purchase velocity.

Common Mistakes to Avoid

Mistake 1: Forgetting the de-duplication condition

Without a.id < b.id, you'll get every pair twice and every row paired with itself. Your result set explodes in size and your numbers are wrong.

Mistake 2: Using a self-JOIN when GROUP BY would work

If you're counting "how many times did X happen?" That's GROUP BY + HAVING. Self-JOINs are for comparing one row to another row. Make sure you're using the right tool for the right job.

Mistake 3: Not limiting results

Self-JOINs can produce enormous result sets (every possible pair of rows). Always add filters and LIMIT while developing your query.

Try This at Your Job

Think about comparisons within a single dataset:

  • HR: Which employees in the same department have overlapping job titles? (Role duplication audit)
  • Sales: Which deals in the same quarter had the same sales rep? (Workload analysis)
  • Support: Which tickets from the same customer came in within 24 hours? (Escalation patterns)

The self-JOIN pattern works whenever you need to find relationships between rows in the same table.

Until next time,

Brian

Brian Graves, creator of Analytics in Action

Say 👋 on X/Twitter, LinkedIn, or book a call with me. You can always reply to these emails. I check them all.

P.S. Multi-table analysis is the core of Module 4 (The FBI Evidence Board Blueprint) in SQL for Business Impact. The course builds from simple JOINs to complex self-JOINs, giving you the full toolkit for connecting data across any source. Check it out at sqlforbusinessimpact.com.

P.P.S. What kind of "same-table comparison" would be most useful at your company? Hit reply and let me know. These are fun puzzles to solve. I read every response.

Starting With Data

Learn to build analytics projects with SQL, Tableau, Excel, and Python. For data analysts looking to level up their career and complete beginners looking to get started. No fluff. No theory. Just step-by-step tutorials anyone can follow.

Read more from Starting With Data

Hello Reader, This week's newsletter is a bit different. There's SQL in here, but the real topic is career strategy. Because one of the most common questions I get from readers is some version of: "I've been learning SQL for a few months. How do I prove I can actually do this job?" The honest answer: a portfolio of 2-3 projects that demonstrate business thinking, not just technical syntax. The problem is most portfolios I've reviewed look the same. A Kaggle competition. A tutorial from...

Hello Reader, You've built a clean customer report. Locations, booking history, revenue by region. You send it off to whoever asked for it. Ten minutes later: "Why does this report show blank cells in the location column? And why do the totals not add up?" Welcome to the NULL problem. You're going to learn how to solve it once and for all today. In the Summit Adventures database (the fake adventure tourism company I created to help people learn business analytics), 78.2% of customers don't...

Hello Reader, Imagine you're an analyst at Summit Adventures. Monday morning, you get called into a meeting and your VP asks: "How's revenue looking across our expedition types?" You pull up your analysis. You've got clean numbers. You present them: "Cultural expeditions brought in $374K. Photography brought in $370K. Hiking was $352K. Safari was $349K. Climbing was $298K." Silence. Then someone asks: "Okay... but what should we do about it?" You've answered the question accurately. But you...