🟢 Python Data Blending Made Easy: 3 Simple Steps to Combine Data Sets in Python (Even if You’re Not Sure Where to Start)


Starting with Data

Actionable data analytics tips, tricks, and tutorials to increase your earning potential. Delivered right to your inbox every Saturday.

Python Data Blending Made Easy: 3 Simple Steps to Combine Data Sets in Python (Even if You’re Not Sure Where to Start)

Hey Reader,

Have you ever been given a few different spreadsheets and had to combine them into one?

  • Use clunky VLOOKUPs in Excel
  • Copy and paste (hoping you don't break something)
  • Give up and ask a more technical coworker for help

It's a waste of time.

And it doesn't take you any closer to becoming a six-figure data analyst. Never mind what your boss thinks of you not being able to work with data. Embarrassing.

Blending data is a breeze in Python.

People see Python and run for the hills.

I get it. And I used to be the same way. But this is expertise you can build up over time.

Start small with this tutorial. Master the fundamentals. Go from being clueless to solving real problems and turning messy data into something useful.

By the way, if you haven't snagged your copy of The Data Analytics Portfolio Playbook, now's the time! Get up and running in two weeks or less. Data visualizations are awesome for your portfolio. The playbook includes everything you need to create an awesome portfolio -- including how to host for free.

Here's a specific example of what's possible when you follow the proven playbook.


Introduction to Data Blending in Python

This step-by-step tutorial covers the basics of blending different data sets in Python.

Here's why:

  • simplify your data management
  • become the go-to Python data person
  • level up your Python data analytics skills
  • hands-on experience with a real-world example
  • increased confidence in your data career journey

Prerequisites

Alright, let's create this thing.

Step 1: Setup and Data Loading

Start by importing necessary libraries and loading our data.

Paste this code into a new Google Collab notebook cell and run it.

# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
# Load data from CSV files
sales_data = pd.read_csv('sales_data.csv')
customer_details = pd.read_csv('customer_details.csv')
# Quick view of the datasets
print(sales_data.head())
print(customer_details.head())

Step 2: Data Blending with Different Joins

Next, let's explore various join operations: left, right, and inner joins.

Left join keeps all rows from the left table, right join from the right table, and inner join only the rows that match in both tables.

# Left join
left_joined_data = pd.merge(sales_data, customer_details, on='customer_id', how='left')
print(left_joined_data.head())
# Right join
right_joined_data = pd.merge(sales_data, customer_details, on='customer_id', how='right')
print(right_joined_data.head())
# Inner join
inner_joined_data = pd.merge(sales_data, customer_details, on='customer_id', how='inner')
print(inner_joined_data.head())

Step 3: Data Analysis and Visualization

Now that our joins are created, let’s create a visualization to understand the impact of different join types.

  • Left Join: Includes all records from the left table (sales data) and matches with the right table (customer details) wherever possible.
  • Right Join: Includes all records from the right table (customer details) and matches with the left table (sales data) wherever possible.
  • Inner Join: Includes only the records that have matching entries in both tables.
# Counting records for each join type
left_count = left_joined_data['customer_id'].count()
right_count = right_joined_data['customer_id'].count()
inner_count = inner_joined_data['customer_id'].count()
# Creating a bar plot with descriptive labels
join_descriptions = ['Left Join \n(All Sales, Matched Customers)',
                     'Right Join \n(All Customers, Matched Sales)',
                     'Inner Join \n(Only Matched Sales and Customers)']
counts = [left_count, right_count, inner_count]
colors = ['#FF9999', '#99CC99', '#9999FF']  # Soft red, green, and blue
plt.figure(figsize=(10, 6))
bars = plt.bar(join_descriptions, counts, color=colors)
# Adding data labels
for bar in bars:
    yval = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2, yval, int(yval), 
             verticalalignment='bottom', horizontalalignment='center')
plt.xlabel('Type of Join Operation')
plt.ylabel('Record Count')
plt.title('Comparative Analysis of Data Blending Techniques')
plt.show()

Looks great!

Did this tutorial help? Hit reply and let me know or fill out this quick survey.

Until next time, keep exploring and happy blending!

Brian

Whenever you're ready, here's how I can help you:

  1. Get more data analytics tips in my previous newsletter articles
  2. Build your data analytics portfolio in 2 weeks with The Data Analytics Portfolio Playbook

You are receiving this because you signed up for Starting with Data, purchased one of my data analytics products, or enrolled in one of my data analytics courses. Unsubscribe at any time using the link below.

113 Cherry St #92768, Seattle, WA 98104
UnsubscribePreferences

Starting With Data

Learn to build analytics projects with SQL, Tableau, Excel, and Python. For data analysts looking to level up their career and complete beginners looking to get started. No fluff. No theory. Just step-by-step tutorials anyone can follow.

Read more from Starting With Data

Hey Reader, Last week I showed you The Mister Rogers Blueprint, the step-by-step guide to approaching unfamiliar dataset and delivering actionable insights. You might be thinking: “Okay, cool framework. But does this actually work?” Fair question. Let me show you what happens when people learn SQL this way. First, let’s talk about what doesn’t work. Jenn (a student and data coaching client of mine) paid $2,500 for an SQL course from an Ivy League university. She thought it would be robust,...

Hey Reader, Remember Mister Rogers? "Won't you be my neighbor?" That show taught me back in the day (and my now-grown kids more recently) along with millions of others young and old how to approach new situations with curiosity instead of fear. Turns out, that's exactly the mindset you need when someone drops an unfamiliar database on your desk. Here's the situation every analyst will face: You start a new job or project. Your manager says "Pull the customer data for Monday's meeting." You...

Business leaders are dumb. That's what a lot of business analysts think because they create something "awesome" with a dataset and then it gets ignored. Unfortunately, these types of business analysts don't realize that leaders aren't dumb. They are just busy. They are responsible for making decisions and making them quickly. And leaders need answers (based on data) more than anything. Think about it: if they need answers and you have the skills to provide those answers... You become their...