Navigating the world of HR analytics can feel like stepping into a maze without a map. Sifting through mountains of raw employee data, trying to see patterns, predict turnovers, or even just understand how long your employees stick around. Not just daunting. But overwhelming at times.
The sheer weight makes you question if HR analytics is even worth the effort. But it can be different. An organized, systematic approach to HR analytics where employee tenure analysis becomes a breeze. That's right. With just a touch of Python magic, you can turn the tide.
Dive into this guide, and I'll walk you through the transformative world of HR analytics, one step at a time. By the end, you'll have the tools and knowledge to confidently tackle any HR analytics challenge that comes your way. βFollow along using my Google Colab notebook (free)β βStruggling with something else? Let me know!β Step 1: Install Required Libraries and Preview the DatasetFirst, we load the dataset from a URL using pandas and display the first few rows
Step 2: Data Cleansing: Remove Specific RecordsNext, filter out rows from the dataset where the department is labeled as 'Human Resources'. This is just an example to show how filtering works with Pandas.
Step 3: Date Parsing with Timezone ConversionConverting 'hire_date' and 'termdate' columns into datetime format with UTC timezone and handling potential invalid dates. This is an important skill to learn because often times dates and time stamps are poorly formatted when working with data.
Step 4: Combine Location Data and Remove RedundanciesHere, we create a unified 'location' column by merging 'location_city' and 'location_state' columns and then drop the originals to streamline the dataframe. This can be useful when combining any columns. For example, combining first name and last name into a full name field. Or combining business unit and sub-business unit. Very handy!
Step 5: Calculate Employee Tenure in DaysNext, let's calculate the tenure of employees in terms of days. If an employee has a termination date (termdate), the tenure is the difference between the hire_date and termdate. Otherwise, it's the difference between the hire_date and the current date.
Step 6: Categorize Employee TenureThis code segments employees based on their tenure, classifying them as 'New', 'Mid-level', or 'Experienced'. Notice how we use a Python function here. First, we create the categorize_tenure function with one parameter (the number of days that each employee has been employed). Then we call that function while creating a new field in the data frame.
Step 7: Visualization of Employee Tenure DistributionFor the final step, let's take a look at everything we've done to turn raw data into a useful analysis. This code visualizes the distribution of employees across the 'New', 'Mid-level', and 'Experienced' tenure categories using a histogram.
And here's the final result: Great job! The idea for this newsletter came directly from a reader β just like you! βTake 3 minutes to let me know what you want help with next.β Until next time, keep exploring and happy analyzing! Brian Whenever you're ready, here's how I can help you:
|
You are receiving this because you signed up for Starting with Data, purchased one of my data analytics products, or enrolled in one of my data analytics courses. Unsubscribe at any time using the link below. 600 1st Ave, Ste 330 PMB 92768, Seattle, WA 98104-2246 |
Learn to build analytics projects with SQL, Tableau, Excel, and Python. For data analysts looking to level up their career and complete beginners looking to get started. No fluff. No theory. Just step-by-step tutorials anyone can follow.
Hey Reader, Imagine you're presenting a finding to your team: "Customer A spent $4,200 with us." The first question from the room: "Is that a lot?" Most analysts learning SQL don't realize that, without context, the number means nothing. You need comparison. Is the average customer spend $500 or $5,000? Is $4,200 in the top 10% or the middle of the pack? This is the "compared to what?" problem, and subqueries solve it elegantly. Numbers without context don't drive decisions. Executives don't...
Hello Reader, Almost every business question boils down to one of two things: "How many?" (COUNT) "How much?" (SUM) How many customers booked this month? How much revenue did we generate? How many trips were cancelled? How much did we lose? Once you're comfortable with COUNT and SUM, you can answer the majority of questions that come your way (like sales analytics using SQL). Let me show you both patterns using a real business scenario. The Scenario Your operations manager asks: "Give me a...
HeyReader, Most analysts never think to ask: "Why are our customers cancelling?" Everyone looks at bookings. Everyone reports revenue. But the data hiding in your cancelled orders often tells a more important story than your completed ones. And analysts that want to move from beginner to intermediate SQL skill level have an unfair advantage here. Because there's a sequence to work through this kind of problem. That's what we're covering here. At Summit Adventures (the fake adventure tourism...