How to Sort Pandas DataFrame?

Last Updated : 02 Dec, 2024

Pandas provides a powerful method called sort_values() that lets you sort a DataFrame by one or more columns. The method can sort in ascending or descending order, handle missing values, and apply custom sorting logic. To see how sorting works, let’s look at a simple example:

1. Sort DataFrame by One Column Value

To sort a DataFrame by a single column, you use the sort_values() method and specify the column name using the by parameter.

Python

import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'Score': [85, 90, 95, 80]}
df = pd.DataFrame(data)

# Sorting by 'Age' in ascending order
sorted_df = df.sort_values(by='Age')
print(sorted_df)

Output:

      Name  Age  Score
0    Alice   25     85
1      Bob   30     90
2  Charlie   35     95
3    David   40     80

In this example, the DataFrame is sorted by the Age column in ascending order. Now let’s dive deeper into how this works.

Sorting helps you organize and interpret data, especially in large datasets. In Pandas, the sort_values() method sorts a DataFrame by one or more columns. By default, it sorts in ascending order and returns a new DataFrame, but this can be customized with various parameters.

Key Parameters of sort_values():

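by: Column name or list of column names to sort by.
ascending: Boolean or list of booleans (default True). If False, sorts in descending order; a list sets the order for each column in by.
inplace: If True, modifies the original DataFrame and returns None; otherwise returns a new sorted DataFrame (default False).
na_position: Specifies whether to place NaN values at the beginning ('first') or end ('last'). The default is 'last'.
ignore_index: If True, relabels the sorted rows 0, 1, ..., n-1 instead of keeping the original index labels (default False).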
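
As a minimal sketch of how ignore_index and inplace behave, the snippet below reuses the sample data from the first example. The variable names by_score and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Score': [85, 90, 95, 80],
})

# ignore_index=True relabels the sorted rows 0, 1, ..., n-1
by_score = df.sort_values(by='Score', ignore_index=True)
print(by_score)

# inplace=True sorts df itself and returns None
result = df.sort_values(by='Score', ascending=False, inplace=True)
print(result)

# Without ignore_index, the original labels travel with their rows
print(df.index.tolist())
```

Returning a new DataFrame (the default) is usually easier to reason about than inplace=True, since the original order stays available.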

By default, the sorting is done in ascending order. If you want to sort in descending order, you can set the ascending parameter to False.

Python

import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'Score': [85, 90, 95, 80]}
df = pd.DataFrame(data)

# Sorting by 'Age' in descending order
sorted_df = df.sort_values(by='Age', ascending=False)
print(sorted_df)

Output

      Name  Age  Score
3    David   40     80
2  Charlie   35     95
1      Bob   30     90
0    Alice   25     85

2. Sort DataFrame by Multiple Columns

Sometimes, you need to sort your data based on multiple criteria. For example, you might want to sort by age and then by name. You can achieve this by passing a list of column names to the by parameter.

Python

import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'Score': [85, 90, 95, 80]}
df = pd.DataFrame(data)

# Sorting by 'Age', then by 'Score' to break ties (both ascending)
sorted_df = df.sort_values(by=['Age', 'Score'])
print(sorted_df)

Output

      Name  Age  Score
0    Alice   25     85
1      Bob   30     90
2  Charlie   35     95
3    David   40     80

This first sorts by Age and, where rows tie on Age, sorts those rows by Score. The sample data has no duplicate ages, so the output here is ordered by Age alone. You can also specify a different sort order for each column by passing a list of boolean values to the ascending parameter.
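
To see the second column actually break ties, here is a small sketch with hypothetical data in which ages repeat. It also passes a list to ascending so that Age is sorted in ascending order and Score in descending order.

```python
import pandas as pd

# Hypothetical data with repeated ages so the second key matters
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [30, 25, 30, 25],
    'Score': [85, 90, 95, 80],
})

# Age ascending; within each age, Score descending
sorted_df = df.sort_values(by=['Age', 'Score'], ascending=[True, False])
print(sorted_df)
```

The list given to ascending must have the same length as the list given to by.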

3. Sort DataFrame with Missing Values

When a dataset contains missing values, you can control where they end up using the na_position parameter of sort_values(). By default, missing values are placed last, but you can place them first if needed.

Python

import pandas as pd
data_with_nan = {"Name": ["Alice", "Bob", "Charlie", "David"],
                 "Age": [28, 22, None, 22]}
df_nan = pd.DataFrame(data_with_nan)

# Sort by 'Age', placing missing values first
sorted_df = df_nan.sort_values(by="Age", na_position="first")
print(sorted_df)

Output

      Name   Age
2  Charlie   NaN
1      Bob  22.0
3    David  22.0
0    Alice  28.0

The na_position='first' option moves rows with NaN values to the top during sorting.
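
The placement of NaN rows is independent of the sort direction. The following sketch reuses the same data and sorts in descending order while leaving na_position at its default of 'last'; the name desc_df is illustrative.

```python
import pandas as pd

df_nan = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [28, 22, None, 22],
})

# Descending sort; the NaN row still goes last by default
desc_df = df_nan.sort_values(by="Age", ascending=False)
print(desc_df)
```

Here the row with the missing age stays at the bottom even though the largest ages come first.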

Choosing the Sorting Algorithm

Pandas allows you to specify the sorting algorithm using the kind parameter (default 'quicksort'). For DataFrames, this option only takes effect when sorting on a single column or label. The available options are:

'quicksort': Quicksort is a divide-and-conquer sorting algorithm that is fast on average. It selects a “pivot” element and partitions the data into two parts: elements smaller than the pivot and elements greater than the pivot. It is not guaranteed to be stable.
'mergesort': Divides the dataset into smaller subarrays, sorts them, and then merges them back together in sorted order. This algorithm is stable.
'heapsort': Heapsort is another comparison-based sorting algorithm that builds a heap data structure to systematically extract the largest or smallest element and reorder the dataset. It is not stable.
'stable': Requests a stable sort without naming a specific algorithm. Together with 'mergesort', it is the only option that guarantees stability.

To better demonstrate the behavior and benefits of using the 'mergesort' algorithm, particularly its stability, let’s modify the example to include duplicate values in the column being sorted.

Python

import pandas as pd

# Create a DataFrame with duplicate 'Age' values
data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [28, 22, 25, 22, 28],
    "Score": [85, 90, 95, 80, 88]
}
df = pd.DataFrame(data)

# Sort the DataFrame by 'Age' using the 'mergesort' algorithm
sorted_df = df.sort_values(by='Age', kind='mergesort')
print(sorted_df)

Output:

      Name  Age  Score
1      Bob   22     90
3    David   22     80
2  Charlie   25     95
0    Alice   28     85
4      Eve   28     88

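Stability ensures that the relative order of rows with equal values in the sorting column is preserved. In this example, Bob (index 1) stays ahead of David (index 3), and Alice (index 0) stays ahead of Eve (index 4), because that is the order in which they appear in the original DataFrame.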
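
One practical use of stability is sorting in two passes. The sketch below is an illustration under that assumption, not the only way to do it: it first orders rows by Score, then applies a stable sort by Age, so rows with the same age keep their score order. The names by_score and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [28, 22, 25, 22, 28],
    "Score": [85, 90, 95, 80, 88],
})

# First pass: highest scores first
by_score = df.sort_values(by="Score", ascending=False)

# Second pass: a stable sort by Age keeps the score order within each age
result = by_score.sort_values(by="Age", kind="mergesort")
print(result)
```

The same result can be obtained in one call with by=['Age', 'Score'] and ascending=[True, False]; the two-pass form mainly shows what the stability guarantee provides.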

Custom Sorting with Key Functions

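You can also apply custom sorting logic using the key parameter (available since pandas 1.1.0). The key function receives the column being sorted as a Series and must return a Series of the same shape. For example, let’s say you want to sort strings while ignoring case: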

Python

import pandas as pd
data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [28, 22, 25, 22, 28],
    "Score": [85, 90, 95, 80, 88]
}
df = pd.DataFrame(data)

sorted_df = df.sort_values(by='Name', key=lambda col: col.str.lower())
print(sorted_df)

Output

      Name  Age  Score
0    Alice   28     85
1      Bob   22     90
2  Charlie   25     95
3    David   22     80
4      Eve   28     88

This sorts names alphabetically regardless of case. Because every name in this sample is already capitalized, the result is the same as a default sort by Name.
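
The effect of the key function is easier to see when the capitalization is mixed. The sketch below uses hypothetical names and compares a default sort with a case-insensitive one; default_sorted and ci_sorted are illustrative variable names.

```python
import pandas as pd

# Hypothetical names with mixed capitalization
df = pd.DataFrame({"Name": ["bob", "Alice", "david", "Charlie"]})

# Default string sort: uppercase letters sort before lowercase ones
default_sorted = df.sort_values(by="Name")

# key receives the whole column as a Series and returns a Series of the same length
ci_sorted = df.sort_values(by="Name", key=lambda col: col.str.lower())

print(default_sorted["Name"].tolist())
print(ci_sorted["Name"].tolist())
```

The key function only affects the comparison; the values stored in the returned DataFrame keep their original capitalization.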

Key Takeaways:

sort_values() is versatile and allows sorting by one or multiple columns.
You can control whether sorting is ascending or descending using the ascending parameter.
Missing values (NaN) can be placed at either the beginning or end using the na_position parameter.
Custom sorting logic can be applied using the key parameter.
