Hello Reader,

Here's a scenario that happens more often than anyone admits: You build a report. Your director uses it in a board presentation. A week later, someone notices the customer count includes 58 records with obviously fake email addresses.

The report wasn't wrong. The data underneath it was dirty. And now your credibility takes a hit. Bummer. Not because of your SQL skills, but because you didn't check the final product before sending it off.

Data quality auditing is one of the most practical skills you can develop. It takes 10 minutes per query, it saves you from embarrassing situations, and the same patterns work in every database you'll ever touch.

Build Your Own Data Quality Audit Process

The idea here is to build data quality checks for what you need (invalid emails are the example I use here). Then, combine those into a unified audit script that runs all checks at once.

I ran these against the Summit Adventures database (the fake adventure tourism company I created to help people learn business analytics). This database has intentional data quality issues built in: about 15% of records have some kind of anomaly, specifically so students can practice finding them.

Check #1: Invalid Email Addresses
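The original query isn't reproduced here, so what follows is a minimal sketch of what a check like this could look like. It uses Python's built-in sqlite3 module with an in-memory database so it runs anywhere. The customers table, its columns, and the sample rows are hypothetical, not the real Summit Adventures schema, and the "valid email" rule is deliberately crude (non-empty, with an @ followed somewhere by a dot).

```python
import sqlite3

# Hypothetical stand-in for a customers table (not the real Summit Adventures schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customers (customer_id, name, email) VALUES (?, ?, ?)",
    [
        (1, "Ana", "ana@example.com"),
        (2, "Ben", None),               # missing email
        (3, "Cal", ""),                 # empty string
        (4, "Dee", "dee.example.com"),  # no @ sign
        (5, "Eli", "eli@example"),      # no dot after the @
    ],
)

# Crude validity rule (an assumption): non-empty and shaped like x@y.z
INVALID_EMAIL_SQL = """
SELECT customer_id, name, email
FROM customers
WHERE email IS NULL
   OR TRIM(email) = ''
   OR email NOT LIKE '%_@_%._%'
ORDER BY customer_id
"""

invalid_rows = conn.execute(INVALID_EMAIL_SQL).fetchall()
for row in invalid_rows:
    print(row)
```

A LIKE pattern only catches shape problems. It will not flag a well-formed address that is fake, so treat the result as a floor, not a full count.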
These are records where no valid email was provided. In a real company, these customers are unreachable by email, which means any marketing campaign that targets them is wasting resources.

Check #2: The Full Audit Summary

Here's the powerful part. One query that summarizes all major data quality issues:
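As a sketch of the idea (not the original query, which isn't reproduced here), a summary like this can be built from one conditional count per issue. The table, columns, sample rows, and the three checks chosen below are all assumptions for illustration.

```python
import sqlite3

# Hypothetical customers table with a few deliberately bad rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, email TEXT, phone TEXT, signup_date TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "ana@example.com", "555-0101", "2023-04-01"),
        (2, None, "555-0102", "2023-05-12"),            # missing email
        (3, "not-an-email", None, "2023-06-30"),        # bad email, missing phone
        (4, "dee@example.com", None, "2999-01-01"),     # missing phone, future date
    ],
)

# One row out, one column per issue: each SUM(CASE ...) counts rows failing that check.
AUDIT_SQL = """
SELECT
    COUNT(*) AS total_customers,
    SUM(CASE WHEN email IS NULL OR email NOT LIKE '%_@_%._%' THEN 1 ELSE 0 END) AS invalid_emails,
    SUM(CASE WHEN phone IS NULL OR TRIM(phone) = '' THEN 1 ELSE 0 END) AS missing_phones,
    SUM(CASE WHEN signup_date > DATE('now') THEN 1 ELSE 0 END) AS future_signup_dates
FROM customers
"""

cursor = conn.execute(AUDIT_SQL)
columns = [col[0] for col in cursor.description]
scorecard = dict(zip(columns, cursor.fetchone()))
for check, count in scorecard.items():
    print(f"{check}: {count}")
```

Because every check is just another column, adding a new one later means adding one more SUM(CASE ...) line.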
This is your data quality scorecard. In one glance, you know:
Over time, you can update it based on new findings or requirements and build your own audit query.

How to Build Your Own Audit Query

The pattern is the same for any database:
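The original steps are not reproduced here. As one possible reading of the pattern, each check can be written as a label plus a condition that is true for a bad row, and the audit query is assembled by wrapping every condition in a conditional count. The sketch below assumes that shape. The helper name build_audit_query, the bookings table, and its columns are hypothetical.

```python
import sqlite3


def build_audit_query(table, checks):
    """Build one summary query from {label: SQL condition that flags a bad row}.

    The table name and conditions are pasted into the SQL as-is, so only pass
    trusted, hand-written strings (never user input).
    """
    counted = [
        f"    SUM(CASE WHEN {condition} THEN 1 ELSE 0 END) AS {label}"
        for label, condition in checks.items()
    ]
    select_list = ",\n".join(["    COUNT(*) AS total_rows"] + counted)
    return f"SELECT\n{select_list}\nFROM {table}"


# Hypothetical bookings table with a few deliberately bad rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (booking_id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?)",
    [(1, 10, 250.0), (2, None, 99.0), (3, 11, -40.0), (4, 12, None)],
)

checks = {
    "missing_customer": "customer_id IS NULL",
    "negative_amount": "amount < 0",
    "missing_amount": "amount IS NULL",
}

query = build_audit_query("bookings", checks)
print(query)

cursor = conn.execute(query)
result = dict(zip([col[0] for col in cursor.description], cursor.fetchone()))
print(result)
```

Keeping the checks in a dictionary makes the audit easy to extend: a new finding becomes one new entry, and the generated SQL stays a single query.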
What to Check in Any Database

Here's a checklist you can adapt for your specific needs:

Contact information:
Dates and ages:
Financial data:
Duplicates:
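For the duplicates category, one common approach (shown here as a sketch, not necessarily the checks from the original list) is to group on the field that should be unique and keep only the groups with more than one row. The customers table and the choice of a normalized email as the duplicate key are assumptions for illustration.

```python
import sqlite3

# Hypothetical customers table; rows 1 and 2 are the same person entered twice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "Ana Ruiz", "ana@example.com"),
        (2, "Ana R.", "ANA@example.com "),  # same email, different case and a trailing space
        (3, "Ben Lee", "ben@example.com"),
        (4, "Cal Wu", None),                # NULL emails are skipped, not counted as duplicates
        (5, "Dee Ng", None),
    ],
)

# Normalize the key first, then keep only the keys that appear more than once.
DUPLICATE_SQL = """
SELECT LOWER(TRIM(email)) AS email_key,
       COUNT(*) AS record_count,
       GROUP_CONCAT(customer_id) AS customer_ids
FROM customers
WHERE email IS NOT NULL
GROUP BY LOWER(TRIM(email))
HAVING COUNT(*) > 1
"""

duplicates = conn.execute(DUPLICATE_SQL).fetchall()
for email_key, record_count, customer_ids in duplicates:
    print(email_key, record_count, customer_ids)
```

Normalizing before grouping matters: without LOWER and TRIM, the two Ana records would look like different customers.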
When to Run Audits
Common Mistakes to Avoid

Mistake 1: Fixing data silently
Don't just exclude bad records without mentioning it. Document what you found and what you excluded. Transparency builds more trust than a "clean" report that hides problems.

Mistake 2: Treating all NULLs as errors
A missing dietary restriction isn't a data quality issue; it's an optional field. Missing email addresses on records that need email outreach IS a quality issue. Context matters.

Mistake 3: Auditing once and never again
Data quality degrades over time as new records come in. Build your audit as a saved query you can re-run monthly.

Try This Monday Morning

Before your next analysis:
It takes less than 15 minutes. And the first time it catches something before your boss does, you'll never skip it again.

Until next time,
Brian

P.S. Data quality thinking is woven throughout SQL for Business Impact. Every module uses real-world data with intentional messiness, because that's what you'll face at work. The course teaches you to handle it confidently. Check it out at sqlforbusinessimpact.com.

P.P.S. What's the worst data quality surprise you've found at work? Hit reply. I collect these stories because they're genuinely fascinating. I read every response.