How to sort a Pandas DataFrame by multiple columns?
Last Updated: 05 Apr, 2025
We are given a DataFrame and our task is to sort it based on multiple columns. This means organizing the data first by one column and then by another within that sorted order. For example, if we want to sort by ‘Rank’ in ascending order and then by ‘Age’ in descending order, the output will be a DataFrame ordered according to those rules, with NaN values placed at the end if specified.
Using nlargest()
The nlargest() method returns the first n rows ordered by the given columns in descending order. It is equivalent to sorting with sort_values(ascending=False) and then taking head(n), but it is usually faster when you only need the top few rows.
Python
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
'Age': [20, 22, 21, 19, 17, 23],
'Rank': [1, np.nan, 8, 9, 4, np.nan]
})
# Selecting top 3 rows with highest 'Rank'
res = df.nlargest(3, ['Rank'])
print(res)
Output
    Name  Age  Rank
3  Tilak   19   9.0
2  Sonum   21   8.0
4  Divya   17   4.0
Explanation: nlargest(n, columns) returns the n rows with the highest values in the specified columns, ordered from highest to lowest. Here, df.nlargest(3, ['Rank']) returns the three rows with the highest 'Rank'; the two rows whose 'Rank' is NaN are not selected.
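Since the topic is sorting by multiple columns, it may help to see nlargest() with more than one column. The sketch below uses a small hypothetical DataFrame named scores (its tied 'Rank' values are invented for illustration and are not part of the example above). The second column is used only to break ties in the first.

```python
import pandas as pd

# Hypothetical data with tied 'Rank' values, chosen only for illustration
scores = pd.DataFrame({
    'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak'],
    'Age': [20, 22, 21, 19],
    'Rank': [9, 9, 9, 4]
})

# Top 2 rows by 'Rank'; ties in 'Rank' are broken by the larger 'Age'
top2 = scores.nlargest(2, ['Rank', 'Age'])

# The same rows obtained with a full descending sort followed by head()
top2_sorted = scores.sort_values(['Rank', 'Age'], ascending=False).head(2)

print(top2)
print(top2_sorted)
```

Both approaches should select the same two rows; nlargest() simply avoids sorting the whole DataFrame when only a few rows are needed.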
Using nsmallest()
nsmallest() method works similarly to nlargest() but retrieves the lowest n values instead.
Python
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
'Age': [20, 22, 21, 19, 17, 23],
'Rank': [1, np.nan, 8, 9, 4, np.nan]
})
# Selecting bottom 3 rows with lowest 'Rank'
res = df.nsmallest(3, ['Rank'])
print(res)
Output
    Name  Age  Rank
0    Raj   20   1.0
4  Divya   17   4.0
2  Sonum   21   8.0
Explanation: nsmallest(n, columns) returns the n rows with the lowest values in the specified columns, ordered from lowest to highest. Here, df.nsmallest(3, ['Rank']) returns the three rows with the lowest 'Rank'; the rows whose 'Rank' is NaN are not selected.
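When several rows share the value at the cut-off, the keep parameter decides which of them are returned. The following is a minimal sketch with a hypothetical DataFrame named ranks (invented for illustration): keep='first', the default, returns exactly n rows, while keep='all' also returns every row tied with the last selected value.

```python
import pandas as pd

# Hypothetical data with a tie at Rank 4, chosen only for illustration
ranks = pd.DataFrame({
    'Name': ['Raj', 'Divya', 'Megha', 'Sonum'],
    'Rank': [1, 4, 4, 8]
})

# Default keep='first': exactly 2 rows, the first tied row wins
first_only = ranks.nsmallest(2, 'Rank')

# keep='all': rows tied with the last selected value are kept as well
with_ties = ranks.nsmallest(2, 'Rank', keep='all')

print(first_only)
print(with_ties)
```

With keep='all' the result can contain more than n rows, which is worth remembering if later code assumes a fixed length.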
Using sort_values()
The sort_values() method is the most flexible and widely used way to sort a DataFrame by multiple columns. Each column can be sorted in ascending or descending order independently, and the na_position parameter controls whether missing values are placed first or last.
Python
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
'Age': [20, 22, 21, 19, 17, 23],
'Rank': [1, np.nan, 8, 9, 4, np.nan]
})
# Sorting by 'Rank' in ascending order and 'Age' in descending order
res = df.sort_values(by=['Rank', 'Age'], ascending=[True, False], na_position='last')
print(res)
Output
    Name  Age  Rank
0    Raj   20   1.0
4  Divya   17   4.0
2  Sonum   21   8.0
3  Tilak   19   9.0
5  Megha   23   NaN
1  Akhil   22   NaN
Explanation: sort_values(by, ascending, na_position) sorts a DataFrame based on multiple columns. Here, df.sort_values() sorts ‘Rank’ in ascending order and, for equal ranks, sorts ‘Age’ in descending order while pushing NaN values to the end.
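As a variation on the example above, the sketch below reuses the same data but places missing ranks first and renumbers the rows. The options na_position='first' and ignore_index=True are standard sort_values() parameters; combining them here is only one possible choice.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
    'Age': [20, 22, 21, 19, 17, 23],
    'Rank': [1, np.nan, 8, 9, 4, np.nan]
})

# 'Rank' ascending, 'Age' descending, NaN ranks placed first,
# and the result re-indexed 0..n-1 instead of keeping the old labels
res = df.sort_values(
    by=['Rank', 'Age'],
    ascending=[True, False],
    na_position='first',
    ignore_index=True
)
print(res)
```

The rows with a missing 'Rank' now come first, and within that group the secondary key 'Age' still applies in descending order.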
Using sort_index()
sort_index() method sorts the DataFrame based on its index rather than its column values. It is useful when you want to reorder rows by their index, such as after setting a custom index.
Python
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
'Age': [20, 22, 21, 19, 17, 23],
'Rank': [1, np.nan, 8, 9, 4, np.nan]
})
# Sorting the DataFrame by index in descending order
res = df.sort_index(ascending=False)
print(res)
Output
    Name  Age  Rank
5  Megha   23   NaN
4  Divya   17   4.0
3  Tilak   19   9.0
2  Sonum   21   8.0
1  Akhil   22   NaN
0    Raj   20   1.0
Explanation: sort_index(ascending) sorts a DataFrame based on its index. Here, df.sort_index(ascending=False) arranges the rows in descending order of their index values.
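The custom-index case mentioned above can be sketched as follows, assuming 'Name' is a sensible index for this data (that choice is made here for illustration only). After set_index('Name'), sort_index() orders the rows alphabetically by name.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
    'Age': [20, 22, 21, 19, 17, 23],
    'Rank': [1, np.nan, 8, 9, 4, np.nan]
})

# Use 'Name' as the index, then sort the rows by that index (A to Z)
by_name = df.set_index('Name').sort_index()
print(by_name)
```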
Using argsort()
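If you prefer to work with NumPy arrays directly, argsort() returns the integer positions that would sort an array, and those positions can then be passed to .iloc to reorder the DataFrame. Note that argsort() orders by a single array, so this example sorts by one column only.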
Python
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
'Age': [20, 22, 21, 19, 17, 23],
'Rank': [1, np.nan, 8, 9, 4, np.nan]
})
# Sorting DataFrame by 'Rank' using NumPy's argsort
sorted_idx = np.argsort(df['Rank'].values, kind='quicksort')
res = df.iloc[sorted_idx]
print(res)
Output
    Name  Age  Rank
0    Raj   20   1.0
4  Divya   17   4.0
2  Sonum   21   8.0
3  Tilak   19   9.0
1  Akhil   22   NaN
5  Megha   23   NaN
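Explanation: np.argsort(df['Rank'].values, kind='quicksort') returns the integer positions that would sort the 'Rank' column in ascending order. NumPy does not ignore NaNs; it places them at the end of the sorted order. Using .iloc[sorted_idx], the DataFrame is reordered accordingly. Because quicksort is not a stable algorithm, the relative order of tied values is not guaranteed; kind='stable' can be passed when that order matters.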
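For a NumPy-based sort on more than one column, np.lexsort() is one option. The sketch below is an illustration under two assumptions that are not part of the original example: missing ranks are replaced with +inf so that they sort last explicitly, and 'Age' is negated to obtain a descending order. Note that np.lexsort() treats the last key in the tuple as the primary key.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
    'Age': [20, 22, 21, 19, 17, 23],
    'Rank': [1, np.nan, 8, 9, 4, np.nan]
})

# Assumption: treat a missing rank as +inf so it sorts after every real rank
rank_key = df['Rank'].fillna(np.inf).to_numpy()

# Negating 'Age' turns lexsort's ascending order into descending by age
age_key = -df['Age'].to_numpy()

# The LAST key is the primary one: sort by rank, then by age within equal ranks
order = np.lexsort((age_key, rank_key))
res = df.iloc[order]
print(res)
```

This should produce the same row order as the sort_values() example with ascending=[True, False] and na_position='last'.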