
Replace Values in Columns Based on Condition in Pandas
In pandas, we can replace values in a column based on a condition using the loc indexer, the where and mask methods, apply with a lambda function, map, or numpy.where(). Pandas is a Python library for manipulating and analyzing structured data. In this article, we will walk through each of these approaches.
Method 1: Using loc
loc is an indexer (not a function) that accesses a group of rows and columns in a DataFrame by label or by boolean mask. Assigning to a loc selection replaces the values in a column for the rows that satisfy a condition.
Syntax
df.loc[row_labels, column_labels]
Here, row_labels is a label, a list of labels, or a boolean array that selects rows from the DataFrame, and column_labels is a label or a list of labels that selects columns.
Example
In the example below, we set the gender to 'M' for everyone aged 50 or older. The expression df.loc[df['age'] >= 50, 'gender'] selects the 'gender' column of all rows where age is greater than or equal to 50, and the assignment sets those values to 'M'.
import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'age': [25, 35, 45, 55, 65],
    'gender': ['F', 'M', 'M', 'F', 'F']
}
df = pd.DataFrame(data)

# Set gender to 'M' in every row where age is at least 50
df.loc[df['age'] >= 50, 'gender'] = 'M'
print(df)
Output
      name  age gender
0    Alice   25      F
1      Bob   35      M
2  Charlie   45      M
3    David   55      M
4    Emily   65      M
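The row selector passed to loc can also combine several conditions. The sketch below is an illustration rather than part of the original example: it uses the same data, and the replacement value 'X' is a hypothetical placeholder chosen only to make the change visible.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'age': [25, 35, 45, 55, 65],
    'gender': ['F', 'M', 'M', 'F', 'F'],
})

# Wrap each comparison in parentheses: & binds more tightly than >= and ==.
both = (df['age'] >= 50) & (df['gender'] == 'F')

# 'X' is a hypothetical marker value, not something from the article.
df.loc[both, 'gender'] = 'X'
print(df)
```

Use & for "and", | for "or", and ~ for "not" when building the boolean mask; the Python keywords and, or, and not do not work element by element on a Series.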
Method 2: Using where and mask
The where and mask functions are used to replace values based on a condition. The where function replaces values where the condition is False, and the mask function replaces values where the condition is True.
Syntax
df.where(cond, other=nan, inplace=False, axis=None, level=None)
df.mask(cond, other=nan, inplace=False, axis=None, level=None)
Here, the condition (the cond parameter) is a boolean Series, DataFrame, or array, or a callable that returns one. other is the replacement value and defaults to NaN. If inplace is True, the original object is modified instead of a new one being returned. axis and level control how other is aligned with the data when alignment is needed. Older pandas releases also accepted an errors argument; it was deprecated and then removed in pandas 2.0, so it is omitted here.
Example
In the example below, we set the age to 0 for everyone whose gender is 'M'. Because where replaces values where the condition is False, we pass the opposite condition: df['age'].where(df['gender'] != 'M', 0) keeps the age where the gender is not 'M' and replaces it with 0 everywhere else.
import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'age': [25, 35, 45, 55, 65],
    'gender': ['F', 'M', 'M', 'F', 'F']
}
df = pd.DataFrame(data)

# Keep age where gender is not 'M'; use 0 elsewhere
df['age'] = df['age'].where(df['gender'] != 'M', 0)
print(df)
Output
      name  age gender
0    Alice   25      F
1      Bob    0      M
2  Charlie    0      M
3    David   55      F
4    Emily   65      F
We can also perform the same operation using the mask method.
import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'age': [25, 35, 45, 55, 65],
    'gender': ['F', 'M', 'M', 'F', 'F']
}
df = pd.DataFrame(data)

# Replace age with 0 where gender is 'M'
df['age'] = df['age'].mask(df['gender'] == 'M', 0)
print(df)
Output
      name  age gender
0    Alice   25      F
1      Bob    0      M
2  Charlie    0      M
3    David   55      F
4    Emily   65      F
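Since where and mask differ only in which side of the condition they replace, negating the condition with ~ turns one into the other. The following minimal sketch checks this on the article's data; the variable names are our own.

```python
import pandas as pd

age = pd.Series([25, 35, 45, 55, 65])
gender = pd.Series(['F', 'M', 'M', 'F', 'F'])
is_male = gender == 'M'

# where keeps values where the condition is True, so negate it here.
via_where = age.where(~is_male, 0)

# mask replaces values where the condition is True.
via_mask = age.mask(is_male, 0)

print(via_where.tolist())
print(via_where.equals(via_mask))
```

Which of the two to use is mostly a readability choice: pick the one that lets the condition be written without a negation.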
Method 3: Using Apply and Lambda
We can also use the apply function along with a lambda function to replace values in a column based on some condition.
Syntax
df.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)
lambda arguments: expression
The apply method applies a function along an axis of a DataFrame. A lambda function is an anonymous function, which makes it a convenient way to pass a short piece of logic to apply. Here, func is the function to apply. axis specifies what is passed to the function: with axis=0 (the default) it receives each column, and with axis=1 it receives each row. If raw is True, the function receives NumPy arrays instead of Series. result_type controls how list-like results are expanded when axis=1. args is a tuple of extra positional arguments to pass to the function, and **kwds holds extra keyword arguments to pass to it.
Example
In the example below, df.apply(lambda x: 'F' if x['name'].startswith('A') else x['gender'], axis=1) applies a lambda function to each row of the DataFrame. The lambda returns 'F' when the name starts with 'A' and the row's existing gender otherwise. In this data set the only matching name, Alice, already has gender 'F', so the output is the same as the input.
import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'age': [25, 35, 45, 55, 65],
    'gender': ['F', 'M', 'M', 'F', 'F']
}
df = pd.DataFrame(data)

# axis=1 passes each row to the lambda as a Series
df['gender'] = df.apply(lambda x: 'F' if x['name'].startswith('A') else x['gender'], axis=1)
print(df)
Output
      name  age gender
0    Alice   25      F
1      Bob   35      M
2  Charlie   45      M
3    David   55      F
4    Emily   65      F
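A row-wise apply calls the Python function once per row, so a vectorized condition is often a reasonable alternative when one exists. The sketch below compares the two on the same data. To make the replacement visible it matches names starting with 'B' instead of 'A'; that prefix is our own choice for illustration.

```python
import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'age': [25, 35, 45, 55, 65],
    'gender': ['F', 'M', 'M', 'F', 'F'],
}

# Row-wise version: the lambda runs once for every row.
via_apply = pd.DataFrame(data)
via_apply['gender'] = via_apply.apply(
    lambda row: 'F' if row['name'].startswith('B') else row['gender'],
    axis=1,
)

# Vectorized version: build a boolean mask with the .str accessor, then assign with loc.
via_loc = pd.DataFrame(data)
via_loc.loc[via_loc['name'].str.startswith('B'), 'gender'] = 'F'

print(via_apply['gender'].tolist())
print(via_apply['gender'].tolist() == via_loc['gender'].tolist())
```

apply remains the more flexible choice when the logic depends on several columns in ways that are awkward to express as a mask.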
Method 4: Using map method
The map method is used to replace values in a DataFrame column based on a dictionary.
Syntax
df['column'] = df['column'].map(dict)
Here, column is the column to replace values in and dict is a dictionary that maps the old values to the new values. Any value that has no matching key in the dictionary becomes NaN.
Example
Suppose we want to replace the gender codes with full words, so that 'F' becomes 'Female' and 'M' becomes 'Male'. Here the condition is the current value itself, which is the case map handles directly. We can use the map method like this:
import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'age': [25, 35, 45, 55, 65],
    'gender': ['F', 'M', 'M', 'F', 'F']
}
df = pd.DataFrame(data)

# Look up each gender code in the dictionary
df['gender'] = df['gender'].map({'F': 'Female', 'M': 'Male'})
print(df)
Output
      name  age  gender
0    Alice   25  Female
1      Bob   35    Male
2  Charlie   45    Male
3    David   55  Female
4    Emily   65  Female
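One detail worth checking with map is what happens to values the dictionary does not cover. The sketch below uses a deliberately incomplete dictionary (our own assumption, for illustration) and then falls back to the original values with fillna.

```python
import pandas as pd

gender = pd.Series(['F', 'M', 'M', 'F', 'F'])

# Only 'M' has a key, so every 'F' is mapped to NaN.
partial = gender.map({'M': 'Male'})
print(partial.isna().tolist())

# Fill the gaps with the original values to leave unmatched rows unchanged.
filled = partial.fillna(gender)
print(filled.tolist())
```

When only some values should change, this fallback (or a dictionary that lists every value) avoids silently losing data.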
Method 5: Using numpy.where() method
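The numpy.where() function builds a new array by choosing between two values, element by element, based on a condition. Assigning the result to a DataFrame column replaces the values in that column.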
Syntax
df['column'] = np.where(condition, x, y)
Here, condition is a boolean array that specifies the condition for the replacement. x is the value used where the condition is True, and y is the value used where the condition is False. To leave the non-matching rows unchanged, pass the original column as y.
Example
Suppose we want to replace the age of all people whose gender is 'M' with 0. We can use the numpy.where() function like this:
import pandas as pd
import numpy as np

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'age': [25, 35, 45, 55, 65],
    'gender': ['F', 'M', 'M', 'F', 'F']
}
df = pd.DataFrame(data)

# Use 0 where gender is 'M'; otherwise keep the existing age
df['age'] = np.where(df['gender'] == 'M', 0, df['age'])
print(df)
Output
      name  age gender
0    Alice   25      F
1      Bob    0      M
2  Charlie    0      M
3    David   55      F
4    Emily   65      F
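Both x and y can also be plain scalars, which is handy for deriving a new label column from a condition. In the sketch below, the column name 'age_group' and the labels 'senior' and 'adult' are hypothetical names introduced only for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'age': [25, 35, 45, 55, 65],
    'gender': ['F', 'M', 'M', 'F', 'F'],
})

# The scalars are broadcast: 'senior' where age >= 50, 'adult' everywhere else.
# 'age_group', 'senior' and 'adult' are hypothetical names for this sketch.
df['age_group'] = np.where(df['age'] >= 50, 'senior', 'adult')
print(df)
```

Because numpy.where() always needs a value for both branches, it suits cases where every row gets an explicit result.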
Conclusion
In this article, we discussed how to replace values in columns based on a condition in pandas using loc, where and mask, apply with a lambda function, map, and numpy.where(). Depending on the scenario and the type of data, one method may be more suitable than the others. It's always good practice to choose a method that is efficient and easy to understand.