|
Hello Reader, A common analytics question that sounds simple but isn't: "Which customers have booked the same type of expedition more than once?" You can't answer this with a regular JOIN between two different tables. The information lives in one table. You need to compare rows within that table to find patterns. That's what a self-JOIN does. It joins a table to itself. It sounds unusual, but once you see the pattern, you'll recognize situations where it's exactly what you need. The Business ProblemMarcus, your VP of Operations, wants to understand customer loyalty at a deeper level than just "repeat customers." He wants to know: which customers are so loyal to a specific expedition type that they've booked it three or more times? This matters because a customer who booked three photography tours might become a brand advocate for that category. They might respond to premium photography packages, refer friends, or provide testimonials. Gold. The Query: Finding Category LoyalistsHere's what I ran against the Summit Adventures database (the fake adventure tourism company I created to help people learn business analytics):
Eight customers have booked the same expedition type three or more times. That's real loyalty. And real business intelligence. Alexandrie Ullrich has spent over $18,600 on hiking alone. That's a customer who should be getting early access to every new hiking expedition, a direct email from a guide, and perhaps an invitation to a premium experience. Now: The Self-JOIN PatternThe query above uses GROUP BY and HAVING — powerful, but it doesn't compare individual records to each other. Let's look at where self-JOINs shine: comparing rows within the same table. Business question: Which guides have worked together on the same expedition instance? This tells operations which guide pairings work well and which guides have experience collaborating.
Lewis Morris has worked with several different guides across multiple trips. Laurent Neveu is another frequent collaborator. These are the guides you'd pair with newcomers because they have experience working in teams. How the Self-JOIN (Actually) WorksThe key is this part:
We're joining
That The Self-JOIN PatternHere's the general structure:
Three components:
When to Use Self-JOINs in BusinessFinding customers in the same city: "Which of our high-value customers live in the same city?" → Useful for local events and meetups. Comparing employees: "Which team members started in the same month?" → Useful for cohort mentoring programs. Product pairings: "Which products are frequently bought together?" → Useful for bundle pricing. Sequential events: "Which bookings happened within 30 days of each other for the same customer?" → Useful for understanding purchase velocity. Common Mistakes to AvoidMistake 1: Forgetting the de-duplication condition Without Mistake 2: Using a self-JOIN when GROUP BY would work If you're counting "how many times did X happen?" That's GROUP BY + HAVING. Self-JOINs are for comparing one row to another row. Make sure you're using the right tool for the right job. Mistake 3: Not limiting results Self-JOINs can produce enormous result sets (every possible pair of rows). Always add filters and LIMIT while developing your query. Try This at Your JobThink about comparisons within a single dataset:
The self-JOIN pattern works whenever you need to find relationships between rows in the same table. Until next time, Brian P.S. Multi-table analysis is the core of Module 4 (The FBI Evidence Board Blueprint) in SQL for Business Impact. The course builds from simple JOINs to complex self-JOINs, giving you the full toolkit for connecting data across any source. Check it out at sqlforbusinessimpact.com. P.P.S. What kind of "same-table comparison" would be most useful at your company? Hit reply and let me know. These are fun puzzles to solve. I read every response. |
Learn to build analytics projects with SQL, Tableau, Excel, and Python. For data analysts looking to level up their career and complete beginners looking to get started. No fluff. No theory. Just step-by-step tutorials anyone can follow.
Hello Reader, A while back, I shared some tips about formatting SQL output for spreadsheets. Today we're going the other direction. Doing the kind of analysis that makes spreadsheet formulas feel clunky. Here's the kind of request that analysts get every single day: "Show me monthly revenue and how the trend is moving." If you pull monthly revenue into a spreadsheet, you'd probably create a running total column, then a 3-month moving average column, then a month-over-month change column....
Hello Reader, Here's a fact of analyst life: most of the people who use your analysis will never use SQL. Instead, they'll use spreadsheets and reports. Your director doesn't open pgAdmin. Your marketing team doesn't "connect to the database". They open the Excel file or Google Sheet you sent them. And if the data you exported requires 20 minutes of cleanup before it's usable (reformatting dates, splitting columns, fixing number formats) you've done extra work that didn't need to happen....
Hello Reader, Most analysts panic when they get a note like this on a random Tuesday afternoon (from the CEO): "Revenue feels off this quarter. Can you figure out what's happening before the board call at 4:30?" Great. You have 90 minutes. No time to build a comprehensive analysis. No time to make it pretty. You need to diagnose the problem (super fast) and deliver a clear answer. Lucky you. You're going to learn the way a seasoned pro handles this situation. I call it The Gordon Ramsay...