How To Merge Categories of a Categorical Variable Of a Pandas DataFrame In Python

In this article, I will show you how to merge the categories of a categorical variable in a pandas DataFrame in Python.

To achieve this, I will do the following:

1. Define what a categorical variable is;

2. Give you three examples of a categorical variable, so you can contextualize it better;

3. Finally, demonstrate how to merge the categories of a categorical variable in a pandas DataFrame in Python.

1. WHAT IS A CATEGORICAL VARIABLE

A categorical variable is a variable that can take only one of a limited, fixed set of values, known as categories or options.
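To make the definition concrete, here is a minimal sketch of how pandas represents a categorical variable. The values are invented for illustration and are not taken from the survey data used later in this article.

```python
import pandas as pd

# Hypothetical toy values, invented for illustration only.
ses = pd.Series(["Poorest", "Middle", "Richest", "Middle"], dtype="category")

# The dtype confirms that pandas treats this column as categorical.
print(ses.dtype)

# The categories are the distinct values the variable can take.
print(list(ses.cat.categories))
```

However many rows the column has, every value must be one of the listed categories.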

 

 

2. EXAMPLES OF CATEGORICAL VARIABLES

Examples of categorical variables are: sex, highest educational level, and socioeconomic status.

For the variable sex, there are only two possible categories: male and female.

For the variable highest educational level, there are only five possible categories: no education, pre-primary education, primary education, secondary education, and tertiary education.

Finally, for the variable socio-economic status (SES), there are only five possible categories: poorest-class, poorer-class, middle-class, richer-class, and richest-class.

 

 

 

3. HOW TO MERGE TWO OR MORE CATEGORIES OF A CATEGORICAL VARIABLE IN PYTHON

In this section, I will show you how to merge two or more categories of a categorical variable in Python.

For this demonstration, we will use the socio-economic status variable.

Recall that the socio-economic status (SES) variable has five categories, namely: poorest-class, poorer-class, middle-class, richer-class, and richest-class.

In this instance, we will collapse the five categories of the socio-economic status variable into three.

 

To do this, we will merge the poorer-class and the poorest-class into one category, Poor, and merge the richer-class and the richest-class into one category, Rich. The socio-economic status variable will then have three categories, namely: poor-class, middle-class, and rich-class. Let’s see how to do this in Python.

 

4. DEMONSTRATING HOW TO MERGE THE CATEGORIES OF A CATEGORICAL VARIABLE IN PYTHON PANDAS DATAFRAME

 

a. Import and Convert Our Dataset into a Pandas DataFrame

To do this, we first import the pandas library into our Jupyter notebook using the ‘import pandas as pd’ command.

Then, we declare the path to our dataset and read it into a pandas DataFrame using one of the pandas read functions (for example, read_csv or read_stata, depending on the file format).
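As a rough sketch of this step, the snippet below imports pandas and reads a file into a DataFrame. The file name, its CSV format, and the rows are assumptions made so the example can run on its own. The real survey file is not included here and may be in a different format.

```python
import os
import tempfile

import pandas as pd

# Build a tiny stand-in file so the example runs without the real survey data.
# The column name v190 matches the article; the rows are invented.
toy = pd.DataFrame({"v190": ["Poorest", "Poorer", "Middle", "Richer", "Richest"]})
path = os.path.join(tempfile.mkdtemp(), "dhs_sample.csv")  # hypothetical path
toy.to_csv(path, index=False)

# Read the file into a DataFrame. For a Stata file you would call
# pd.read_stata(path) instead of pd.read_csv(path).
DHS_dataset = pd.read_csv(path)
print(DHS_dataset.head())
```

With your own data, point the path at your dataset and pick the read function that matches its format.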

Note: The data for this analysis is from the 2018 Nigeria Demographic and Health Survey (NDHS 2018).

 

b. Check the No. of Observations and Variables In The Pandas Dataframe

The second step is to check the number of observations and variables in our DHS_dataset DataFrame using the .shape attribute. Based on this exploratory analysis, our dataset has 41,821 observations and 5,394 variables.
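The check can be sketched as follows. The DataFrame here is a small invented stand-in, so it reports 4 observations and 2 variables rather than the survey's real dimensions. The column other_var is a hypothetical placeholder.

```python
import pandas as pd

# Invented stand-in for DHS_dataset: 4 observations and 2 variables.
DHS_dataset = pd.DataFrame(
    {
        "v190": ["Poorest", "Middle", "Richer", "Richest"],
        "other_var": [1, 2, 3, 4],  # hypothetical second column
    }
)

# .shape is an attribute, not a method, so it takes no parentheses.
# It returns a tuple: (number of rows, number of columns).
n_obs, n_vars = DHS_dataset.shape
print(f"{n_obs} observations, {n_vars} variables")
```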

 

c. Identify The Categorical Variable (v190) For This Demonstration

The third step is to identify the variable we are going to use for our demonstration.

Based on domain knowledge, the variable that we will use as our socio-economic status variable is v190.

Using the .name and .dtypes attributes, we can understand our variable better. The .name attribute tells us the name of our variable, while the .dtypes attribute reports its data type, which for a categorical variable includes its categories.
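A minimal sketch of this inspection is shown below. It assumes v190 is stored as a categorical column with the five labels used in this article. The rows are invented, and the labels in the real survey file may be spelled differently.

```python
import pandas as pd

# The five category labels as described in the article (assumed spelling).
levels = ["Poorest", "Poorer", "Middle", "Richer", "Richest"]

# Invented stand-in for the survey column v190.
DHS_dataset = pd.DataFrame(
    {"v190": pd.Categorical(["Poorer", "Richest", "Middle"], categories=levels)}
)

ses = DHS_dataset["v190"]
print(ses.name)    # the name of the variable
print(ses.dtypes)  # the data type, which lists the categories

# The categories can also be pulled out directly as a list.
print(list(ses.cat.categories))
```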

 

d. Commence Merging of Richest & Richer as Rich, and Poorest & Poorer as Poor

To do this, we create a dictionary for the categories of our variable, such that the categories we want to merge serve as keys, and the new category they are being merged into serves as the value.

In our case, “Poorest” and “Poorer” serve as keys to the same value, “Poor”. The category “Middle” serves as a key to the value “Middle”. The categories “Richer” and “Richest” serve as keys to the value “Rich”.

Thereafter, we pass the dictionary to the replace method and apply it to our v190 variable.

Our v190 categorical variable will now have three categories, namely: “Poor”, “Middle”, and “Rich”.
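The dictionary-and-replace approach described above can be sketched as follows. The rows are invented, and the category labels are assumed to be spelled exactly as in this article. Converting the column to plain strings before calling replace, and back to a categorical afterwards, is my own choice here, because the behaviour of replace on categorical columns has varied between pandas versions.

```python
import pandas as pd

# Invented stand-in for the v190 column; the real data comes from the survey file.
DHS_dataset = pd.DataFrame(
    {"v190": pd.Categorical(["Poorest", "Poorer", "Middle", "Richer", "Richest", "Poorer"])}
)

# Old categories are the keys; the merged category each one joins is the value.
merge_map = {
    "Poorest": "Poor",
    "Poorer": "Poor",
    "Middle": "Middle",
    "Richer": "Rich",
    "Richest": "Rich",
}

# Replace on plain strings, then convert back to a categorical that holds
# the three merged categories in a meaningful order.
merged = DHS_dataset["v190"].astype(str).replace(merge_map)
DHS_dataset["v190"] = merged.astype(
    pd.CategoricalDtype(categories=["Poor", "Middle", "Rich"], ordered=True)
)

# Count the observations in each merged category, in category order.
print(DHS_dataset["v190"].value_counts(sort=False))
```

If a label in your data is missing from the dictionary, replace leaves it unchanged, and the final conversion turns it into a missing value. It is worth checking for missing values after the merge.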

 

