
Data Structure
Networking
RDBMS
Operating System
Java
MS Excel
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
Collapse Multiple Columns in Python Pandas
Pandas is a popular data manipulation library in Python that is widely used for working with structured data. One of the common tasks when working with data is to clean and transform it in order to prepare it for analysis.
Sometimes, the data might contain multiple columns that have similar information or are related to each other. In such cases, it might be useful to collapse these columns into a single column for easier analysis or visualization.
Pandas provides several methods to collapse multiple columns into a single column. In this tutorial, we will explore these methods in detail and provide examples to demonstrate how to use them. By the end of this tutorial, you will have a better understanding of how to collapse multiple columns in Pandas and be able to apply these techniques to your own datasets.
Now let's talk about two different approaches that we can make use of to collapse multiple columns in pandas.
Collapsing Multiple Columns Using the melt() Method in Pandas
The melt() method in Pandas is a powerful tool for transforming wide data into long data. We can use this method to collapse multiple columns into a single column.
To use the melt() method, we need to specify the columns that we want to keep as identifier variables and the columns that we want to collapse into a single column using the value_vars parameter.
The resulting DataFrame will have a new column that contains the values from the collapsed columns, and the old columns will be transformed into two columns, one for the identifier variable and one for the value variable.
Here are the steps to collapse multiple columns using the melt() method
Import the Pandas library.
Load the dataset into a DataFrame.
Use the melt() method on the DataFrame and specify the identifier variables using the id_vars parameter and the columns to collapse using the value_vars parameter.
Rename the columns as necessary.
Example
Now let's consider the code for the same.
import pandas as pd from tabulate import tabulate data = { 'name': ['Alice', 'Bob', 'Charlie'], 'age': [25, 30, 35], 'income_2019': [50000, 60000, 70000], 'income_2020': [55000, 65000, 75000]} df = pd.DataFrame(data) print(tabulate(df, headers='keys', tablefmt='psql')) print("\n Data Frame After Collapsing Similar Columns") melted_df = df.melt( id_vars=['name', 'age'], value_vars=['income_2019', 'income_2020'], var_name='year', value_name='income') print(tabulate(melted_df, headers = 'keys', tablefmt = 'psql'))
Output
On executing this code, you will get the following output

Observe the original table and how it looks after we collapsed and merged its similar columns.
Collapsing Multiple Columns Using the stack() Method in Pandas
The stack() method in Pandas is another way to collapse multiple columns into a single column. We can use this method when the columns to be collapsed have a hierarchical index.
To use the stack() method, we first need to create a MultiIndex on the DataFrame using the set_index() method.
We can then use the stack() method to collapse the columns and create a Series object.
Here are the steps to collapse multiple columns using the stack() method
Import the Pandas library.
Load the dataset into a DataFrame.
Use the set_index() method to create a MultiIndex on the DataFrame using the columns that we want to keep as identifier variables.
Use the stack() method to collapse the columns.
Convert the resulting Series object back into a DataFrame using the reset_index() method.
Rename the columns as necessary.
Example
Consider the code shown below.
import pandas as pd from tabulate import tabulate data = { 'name': ['Alice', 'Bob', 'Charlie'], 'age': [25, 30, 35], 'income': [(50000, 55000), (60000, 65000), (70000, 75000)] } df = pd.DataFrame(data) print(tabulate(df, headers='keys', tablefmt='fancy_grid')) df.set_index(['name', 'age'], inplace=True) stacked_df = df['income'].apply(pd.Series) print(tabulate(stacked_df, headers='keys', tablefmt='fancy_grid'))
Output
On executing this code, you will get the following output

Observe how the Pandas dataframe looks after we collapsed and merged some of its columns using the stack() method.
Conclusion
In conclusion, collapsing multiple columns in Pandas can be done using various methods, such as using the melt() and stack() methods. Both of these methods are powerful tools for transforming wide data into long data, and they can be used to collapse multiple columns into a single column. The choice of method depends on the specific dataset and the structure of the columns that need to be collapsed. By using these methods, we can create a more organised and efficient dataset, which is crucial for data analysis and machine learning applications.
Pandas offers a variety of powerful tools for data manipulation, and mastering these tools can greatly enhance your ability to analyze and process data.