
Data Structure
Networking
RDBMS
Operating System
Java
MS Excel
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
Roll Up R Data Frame Columns for Summation by Group with Missing Values
The summation of column values if missing values exist in the R data frame can be found with the help of summarise_each function of dplyr package where we can remove missing values by setting na.rm argument to TRUE.
Since, we we will have groups in the data frame hence group_by function of the same package will help the summarise_each function to perform the summation by group. Check out the below Examples to understand how it works.
Example 1
Following snippet creates a sample data frame −
Grp<-sample(c("I","II","III"),20,replace=TRUE) Rate<-sample(c(NA,2,4),20,replace=TRUE) df1<-data.frame(Grp,Rate) df1
The following dataframe is created
Grp Rate 1 II 2 2 I 4 3 II NA 4 II 4 5 II 2 6 I NA 7 III 2 8 III 4 9 I NA 10 I 4 11 III 4 12 II 4 13 II NA 14 I 4 15 III 4 16 III 4 17 III 2 18 II NA 19 III 2 20 III 2
To load dplyr package and rolling up columns in df1 to find the summation by ignoring missing values on the above created data frame, add the following code to the above snippet −
Grp<-sample(c("I","II","III"),20,replace=TRUE) Rate<-sample(c(NA,2,4),20,replace=TRUE) df1<-data.frame(Grp,Rate) library(dplyr) df1%%group_by(Grp)%%summarise_each(funs(sum(.,na.rm=TRUE))) # A tibble: 3 x 2
Output
If you execute all the above given snippets as a single program, it generates the following Output −
Grp Rate <chr <dbl 1 I 12 2 II 12 3 III 24
Example 2
Following snippet creates a sample data frame −
Class<-sample(c("First","Second","Third"),20,replace=TRUE) Price<-sample(c(NA,1.25,2.75),20,replace=TRUE) df2<-data.frame(Class,Price) df2
The following dataframe is created
Class Price 1 Third 1.25 2 First NA 3 Third 1.25 4 Third 2.75 5 Second NA 6 Second 1.25 7 Second 1.25 8 Third NA 9 First 2.75 10 Second NA 11 Second NA 12 Second NA 13 Third NA 14 Third NA 15 Third NA 16 Third 2.75 17 First 2.75 18 Third NA 19 Third 1.25 20 Third NA
To roll up columns in df2 to find the summation by ignoring missing values on the above created data frame, add the following code to the above snippet −
Class<-sample(c("First","Second","Third"),20,replace=TRUE) Price<-sample(c(NA,1.25,2.75),20,replace=TRUE) df2<-data.frame(Class,Price) df2%%group_by(Class)%%summarise_each(funs(sum(.,na.rm=TRUE))) # A tibble: 3 x 2
Output
If you execute all the above given snippets as a single program, it generates the following Output −
Class Price <chr <dbl 1 First 5.5 2 Second 2.5 3 Third 9.25
Example 3
Following snippet creates a sample data frame −
Category<-sample(c("Small","Medium","Large"),20,replace=TRUE) Quantity<-sample(c(NA,500,1000),20,replace=TRUE) df3<-data.frame(Category,Quantity) df3
The following dataframe is created
Category Quantity 1 Large 1000 2 Small 1000 3 Small 500 4 Large 500 5 Small 1000 6 Medium NA 7 Small 500 8 Medium 500 9 Large NA 10 Medium 500 11 Medium NA 12 Large NA 13 Small 500 14 Medium 1000 15 Large NA 16 Medium 500 17 Small 500 18 Medium NA 19 Small NA 20 Medium 1000
To roll up columns in df3 to find the summation by ignoring missing values on the above created data frame, add the following code to the above snippet −
Category<-sample(c("Small","Medium","Large"),20,replace=TRUE) Quantity<-sample(c(NA,500,1000),20,replace=TRUE) df3<-data.frame(Category,Quantity) df3%%group_by(Category)%%summarise_each(funs(sum(.,na.rm=TRUE))) # A tibble: 3 x 2
Output
If you execute all the above given snippets as a single program, it generates the following Output −
Category Quantity <chr <dbl 1 Large 1500 2 Medium 3500 3 Small 4000