How to Collapse Multiple Columns in Python Pandas?

Python Pandas Server Side Programming Programming

Pandas is a popular data manipulation library in Python that is widely used for working with structured data. One of the common tasks when working with data is to clean and transform it in order to prepare it for analysis.

Sometimes, the data might contain multiple columns that have similar information or are related to each other. In such cases, it might be useful to collapse these columns into a single column for easier analysis or visualization.

Pandas provides several methods to collapse multiple columns into a single column. In this tutorial, we will explore these methods in detail and provide examples to demonstrate how to use them. By the end of this tutorial, you will have a better understanding of how to collapse multiple columns in Pandas and be able to apply these techniques to your own datasets.

Now let's talk about two different approaches that we can make use of to collapse multiple columns in pandas.

Collapsing Multiple Columns Using the melt() Method in Pandas

The melt() method in Pandas is a powerful tool for transforming wide data into long data. We can use this method to collapse multiple columns into a single column.

To use the melt() method, we need to specify the columns that we want to keep as identifier variables and the columns that we want to collapse into a single column using the value_vars parameter.
The resulting DataFrame will have a new column that contains the values from the collapsed columns, and the old columns will be transformed into two columns, one for the identifier variable and one for the value variable.

Here are the steps to collapse multiple columns using the melt() method

Import the Pandas library.
Load the dataset into a DataFrame.
Use the melt() method on the DataFrame and specify the identifier variables using the id_vars parameter and the columns to collapse using the value_vars parameter.
Rename the columns as necessary.

Example

Now let's consider the code for the same.

import pandas as pd
from tabulate import tabulate

data = {
   'name': ['Alice', 'Bob', 'Charlie'],
   'age': [25, 30, 35],
   'income_2019': [50000, 60000, 70000],
   'income_2020': [55000, 65000, 75000]}
df = pd.DataFrame(data)
print(tabulate(df, headers='keys', tablefmt='psql'))

print("\n Data Frame After Collapsing Similar Columns")

melted_df = df.melt(
   id_vars=['name', 'age'],
   value_vars=['income_2019', 'income_2020'],
   var_name='year',
   value_name='income')

print(tabulate(melted_df, headers = 'keys', tablefmt = 'psql'))

Output

On executing this code, you will get the following output

Observe the original table and how it looks after we collapsed and merged its similar columns.

Collapsing Multiple Columns Using the stack() Method in Pandas

The stack() method in Pandas is another way to collapse multiple columns into a single column. We can use this method when the columns to be collapsed have a hierarchical index.

To use the stack() method, we first need to create a MultiIndex on the DataFrame using the set_index() method.
We can then use the stack() method to collapse the columns and create a Series object.

Here are the steps to collapse multiple columns using the stack() method

Import the Pandas library.
Load the dataset into a DataFrame.
Use the set_index() method to create a MultiIndex on the DataFrame using the columns that we want to keep as identifier variables.
Use the stack() method to collapse the columns.
Convert the resulting Series object back into a DataFrame using the reset_index() method.
Rename the columns as necessary.

Example

Consider the code shown below.

import pandas as pd
from tabulate import tabulate

data = {
   'name': ['Alice', 'Bob', 'Charlie'],
   'age': [25, 30, 35],
   'income': [(50000, 55000), (60000, 65000), (70000, 75000)]
}
df = pd.DataFrame(data)
print(tabulate(df, headers='keys', tablefmt='fancy_grid'))

df.set_index(['name', 'age'], inplace=True)
stacked_df = df['income'].apply(pd.Series)
print(tabulate(stacked_df, headers='keys', tablefmt='fancy_grid'))

Output

On executing this code, you will get the following output

Observe how the Pandas dataframe looks after we collapsed and merged some of its columns using the stack() method.

Conclusion

In conclusion, collapsing multiple columns in Pandas can be done using various methods, such as using the melt() and stack() methods. Both of these methods are powerful tools for transforming wide data into long data, and they can be used to collapse multiple columns into a single column. The choice of method depends on the specific dataset and the structure of the columns that need to be collapsed. By using these methods, we can create a more organised and efficient dataset, which is crucial for data analysis and machine learning applications.

Pandas offers a variety of powerful tools for data manipulation, and mastering these tools can greatly enhance your ability to analyze and process data.

Mukul Latiyan

Updated on: 2023-09-28T14:46:52+05:30

2K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started