
Subset R Data Frame Based on String Match in Two Columns with OR Condition
To subset an R data frame based on a string match in either of two columns, combine two grepl() calls with the element-wise OR operator |, selecting each column with double square brackets. For example, if a data frame called df contains two string columns, x and y, the rows in which a particular string matches in either column can be selected with the syntax below.
Syntax
df[grepl("text", df[["x"]]) | grepl("text", df[["y"]]), ]
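As a minimal sketch of how the syntax works, the snippet below builds a small data frame and shows the two logical vectors that grepl returns before they are combined with |. The data frame contents and the names match_x, match_y and result are hypothetical and used only for illustration.

```r
# Hypothetical data frame for illustration; not part of the examples in this article.
df <- data.frame(
  x = c("apple", "banana", "cherry", "apple pie"),
  y = c("kiwi", "apple", "mango", "plum"),
  stringsAsFactors = FALSE
)

# grepl returns one TRUE/FALSE value per row of the column it is given.
match_x <- grepl("apple", df[["x"]])
match_y <- grepl("apple", df[["y"]])

# The element-wise OR keeps a row when either column matches.
result <- df[match_x | match_y, ]
print(result)
```

Rows 1, 2 and 4 are kept here: row 4 matches because grepl looks for the pattern anywhere inside the string, so "apple pie" counts as a match for "apple".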
Check out the below examples to understand how it works.
Example 1
Consider the below data frame −
f1 <- sample(c("India", "China", "Egypt", "UK"), 20, replace = TRUE)
f2 <- sample(c("India", "China", "Egypt", "UK"), 20, replace = TRUE)
v1 <- rnorm(20)
df1 <- data.frame(f1, f2, v1)
df1
Output
      f1    f2          v1
1  India India  0.58383357
2     UK Egypt -0.71045054
3  India China -0.07848666
4  Egypt India  1.21017481
5  Egypt    UK -0.81991817
6  Egypt China  1.98979283
7  India India  0.36160374
8  Egypt China -1.77619986
9  China    UK -0.05397712
10 India Egypt -0.30372078
11 Egypt India -1.68623489
12 India India -0.41997104
13 India China -0.97064798
14    UK Egypt  2.02704796
15    UK Egypt -0.47732133
16 China China  0.53153059
17 Egypt    UK -1.71608164
18 Egypt India -0.73298689
19    UK    UK  1.83674440
20 China China -1.12186527
Subsetting df1 to the rows in which India appears in either of the first two columns −
df1 <- df1[grepl("India", df1[["f1"]]) | grepl("India", df1[["f2"]]), ]
df1
Output
      f1    f2          v1
1  India India  0.58383357
3  India China -0.07848666
4  Egypt India  1.21017481
7  India India  0.36160374
10 India Egypt -0.30372078
11 Egypt India -1.68623489
12 India India -0.41997104
13 India China -0.97064798
18 Egypt India -0.73298689
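Because sample() and rnorm() draw random values, rerunning the example will generally produce different rows from those shown. The sketch below is one way to make the run repeatable and to check the result; the seed value 123 and the names countries, keep and india_rows are assumptions introduced here, not part of the original example.

```r
set.seed(123)  # assumed seed, chosen only to make the random draw repeatable

countries <- c("India", "China", "Egypt", "UK")
f1 <- sample(countries, 20, replace = TRUE)
f2 <- sample(countries, 20, replace = TRUE)
v1 <- rnorm(20)
df1 <- data.frame(f1, f2, v1)

# Logical index: TRUE where India appears in f1 or in f2.
keep <- grepl("India", df1[["f1"]]) | grepl("India", df1[["f2"]])
india_rows <- df1[keep, ]

# Sanity check: every retained row has India in at least one of the two columns.
stopifnot(all(india_rows$f1 == "India" | india_rows$f2 == "India"))
print(india_rows)
```

Storing the subset under a new name (india_rows) rather than overwriting df1 keeps the full data frame available for further filtering.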
Example 2
g1 <- sample(c("Male", "Female"), 20, replace = TRUE)
g2 <- sample(c("Male", "Female"), 20, replace = TRUE)
v2 <- rpois(20, 5)  # created here but not included in df2
df2 <- data.frame(g1, g2)
df2
Output
       g1     g2
1  Female   Male
2  Female   Male
3  Female Female
4    Male   Male
5    Male Female
6  Female Female
7  Female   Male
8    Male   Male
9    Male Female
10   Male Female
11 Female Female
12   Male   Male
13   Male   Male
14   Male Female
15 Female   Male
16 Female   Male
17 Female   Male
18   Male Female
19 Female Female
20   Male Female
Subsetting df2 to the rows in which Female appears in either of the first two columns −
df2 <- df2[grepl("Female", df2[["g1"]]) | grepl("Female", df2[["g2"]]), ]
df2
Output
       g1     g2
1  Female   Male
2  Female   Male
3  Female Female
5    Male Female
6  Female Female
7  Female   Male
9    Male Female
10   Male Female
11 Female Female
14   Male Female
15 Female   Male
16 Female   Male
17 Female   Male
18   Male Female
19 Female Female
20   Male Female
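One point worth keeping in mind is that grepl treats its pattern as a regular expression and matches it anywhere inside the string, so a longer value that merely contains the text is also selected. The sketch below contrasts that behaviour with a whole-value match using the anchors ^ and $; the data frame and the names partial and exact are hypothetical and used only for illustration.

```r
# Hypothetical data: "Indiana" contains the substring "India".
df <- data.frame(
  f1 = c("India", "Indiana", "China"),
  f2 = c("Egypt", "UK", "India"),
  stringsAsFactors = FALSE
)

# Substring match: row 2 is kept because "Indiana" contains "India".
partial <- df[grepl("India", df[["f1"]]) | grepl("India", df[["f2"]]), ]

# Whole-value match: ^ and $ require the entire string to be "India".
exact <- df[grepl("^India$", df[["f1"]]) | grepl("^India$", df[["f2"]]), ]

print(partial)
print(exact)
```

Here partial keeps all three rows, while exact keeps only rows 1 and 3. When the values are known to be complete labels, as in the examples above, the anchored pattern is the safer choice.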