
Data Structure
Networking
RDBMS
Operating System
Java
MS Excel
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
Replace Missing Values in R Data Frame Column
Often, we get missing data for analysis and we need to replace those missing values with something. Sometimes, we might want to replace them with the corresponding values in other column especially in situations when both the columns depict similar characteristics. This type of replacement can be easily done with the help of mutate function of dplyr package as shown in the below examples.
Example1
Consider the below data frame:
> set.seed(214) > x1<-rpois(20,5) > x2<-sample(c(NA,24,38,75),20,replace=TRUE) > df1<-data.frame(x1,x2) > df1
Output
x1 x2 1 4 75 2 8 24 3 5 38 4 4 38 5 7 NA 6 6 24 7 10 75 8 4 75 9 6 38 10 4 75 11 6 NA 12 6 75 13 7 38 14 3 24 15 6 NA 16 4 NA 17 1 NA 18 4 NA 19 10 NA 20 4 24
Loading dplyr package then replacing NAs in x2 with the corresponding values in x1 and creating a new column to store them:
Example
> library(dplyr) > df1 %>% mutate(x3=coalesce(x2,x1))
Output
x1 x2 x3 1 4 75 75 2 8 24 24 3 5 38 38 4 4 38 38 5 7 NA 7 6 6 24 24 7 10 75 75 8 4 75 75 9 6 38 38 10 4 75 75 11 6 NA 6 12 6 75 75 13 7 38 38 14 3 24 24 15 6 NA 6 16 4 NA 4 17 1 NA 1 18 4 NA 4 19 10 NA 10 20 4 24 24
Example2
> y1<-LETTERS[1:20] > y2<-sample(c(NA,"E","G","J"),20,replace=TRUE) > df2<-data.frame(y1,y2) > df2
Output
y1 y2 1 A E 2 B J 3 C E 4 D G 5 E E 6 F E 7 G J 8 H 9 I J 10 J E 11 K G 12 L J 13 M 14 N J 15 O G 16 P E 17 Q 18 R J 19 S J 20 T E
Replacing NAs in y2 with the corresponding values in y1 and creating a new column to store them:
Example
> df2 %>% mutate(y3=coalesce(y2,y1))
Output
y1 y2 y3 1 A E E 2 B J J 3 C E E 4 D G G 5 E E E 6 F E E 7 G J J 8 H H 9 I J J 10 J E E 11 K G G 12 L J J 13 M M 14 N J J 15 O G G 16 P E E 17 Q Q 18 R J J 19 S J J 20 T E E
Example3
> z1<-rnorm(20) > z2<-sample(c(NA,rnorm(5,2,1)),20,replace=TRUE) > df3<-data.frame(z1,z2) > df3
Output
z1 z2 1 -0.17901883 2.472282 2 -1.08901004 3.952225 3 0.42542338 NA 4 -0.60317450 NA 5 -0.03915205 1.437245 6 -0.16003909 3.952225 7 -0.46318505 2.472282 8 -1.42080605 1.372504 9 0.15067345 NA 10 0.19228563 2.472282 11 0.51799822 2.472282 12 0.50620920 2.472282 13 -0.21986645 1.437245 14 -0.17156659 1.372504 15 -1.28965823 2.472282 16 0.94734362 2.520788 17 1.78005478 NA 18 0.62910182 1.372504 19 -0.37591062 2.472282 20 -0.80137780 2.520788
Example
> df3 %>% mutate(z3=coalesce(z2,z1))
Output
z1 z2 z3 1 -0.17901883 2.472282 2.4722816 2 -1.08901004 3.952225 3.9522252 3 0.42542338 NA 0.4254234 4 -0.60317450 NA -0.6031745 5 -0.03915205 1.437245 1.4372447 6 -0.16003909 3.952225 3.9522252 7 -0.46318505 2.472282 2.4722816 8 -1.42080605 1.372504 1.3725039 9 0.15067345 NA 0.1506735 10 0.19228563 2.472282 2.4722816 11 0.51799822 2.472282 2.4722816 12 0.50620920 2.472282 2.4722816 13 -0.21986645 1.437245 1.4372447 14 -0.17156659 1.372504 1.3725039 15 -1.28965823 2.472282 2.4722816 16 0.94734362 2.520788 2.5207880 17 1.78005478 NA 1.7800548 18 0.62910182 1.372504 1.3725039 19 -0.37591062 2.472282 2.4722816 20 -0.80137780 2.520788 2.5207880
Advertisements