Working with Sparse Matrices in R Programming
Last Updated: 08 Jun, 2023
Sparse matrices are matrices in which only a small fraction of the elements are non-zero. Storing such data in a fully dense matrix wastes memory and computation time on the zero entries. Sparse matrix data structures therefore store only the non-zero elements, which reduces memory use and can speed up operations on the data.
Creating a Sparse Matrix
R ships with the recommended package “Matrix”, which provides classes for creating and working with sparse matrices in R.
library(Matrix)
The following code snippet illustrates the usage of the Matrix package:
R
library ( 'Matrix' )
mat1 <- Matrix (0, nrow = 1000,
ncol = 1000,
sparse = TRUE )
# Set the element in row 1, column 1
mat1[1, 1] <- 5
print ( "Size of sparse mat1" )
print ( object.size (mat1))
Output:
[1] "Size of sparse mat1"
5440 bytes
The sparse matrix occupies far less space because it stores only the non-zero values, together with their index information.
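To put that figure in context, the sketch below builds the same 1000 x 1000 matrix once as a base R dense matrix and once as a sparse matrix, then compares their sizes. It uses sparseMatrix(), the triplet-style constructor from the Matrix package, instead of the Matrix() call shown above. The variable names are illustrative, and the exact byte counts depend on the R and Matrix versions in use.

```r
library(Matrix)

# Dense 1000 x 1000 matrix with a single non-zero entry
dense_mat <- matrix(0, nrow = 1000, ncol = 1000)
dense_mat[1, 1] <- 5

# Same content built from (row, column, value) triplets
sparse_mat <- sparseMatrix(i = 1, j = 1, x = 5, dims = c(1000, 1000))

dense_bytes <- as.numeric(object.size(dense_mat))
sparse_bytes <- as.numeric(object.size(sparse_mat))

print(dense_bytes)
print(sparse_bytes)
print(dense_bytes / sparse_bytes)  # how many times larger the dense copy is
```

The dense copy has to hold one million doubles regardless of content, while the sparse copy grows mainly with the number of non-zero entries.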
Constructing Sparse Matrices From Dense
A dense matrix can be created with the built-in matrix() command in R. The dense matrix is then passed to the as() function, which comes from the methods package that R loads by default. The call has the following form:
Syntax: as(dense_matrix, Class)
Parameters:
dense_matrix : A numeric or logical matrix.
Class : The name of the target class. Passing "sparseMatrix" yields a dgCMatrix for numeric input, that is, the compressed sparse column (CSC) format. The other common type is dgRMatrix, which stores the matrix in compressed sparse row (CSR) format.
The following code snippet indicates the conversion of the dense matrix to Sparse Matrices in R:
R
library (Matrix)
set.seed (0)
rows <- 4L
cols <- 6L
vals <- sample (
x = c (0, 6, 7),
prob = c (0.8, 0.1, 0.1),
size = rows * cols,
replace = TRUE
)
dense_mat <- matrix (vals, nrow = rows)
print ( "Dense Matrix" )
print (dense_mat)
sparse_mat <- as (dense_mat,
"sparseMatrix" )
print ( "Sparse Matrix" )
print (sparse_mat)
Output:
[1] "Dense Matrix"
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 7 6 0 0 0 0
[2,] 0 0 0 0 0 6
[3,] 0 7 0 0 6 0
[4,] 0 6 0 0 0 0
[1] "Sparse Matrix"
4 x 6 sparse Matrix of class "dgCMatrix"
[1,] 7 6 . . . .
[2,] . . . . . 6
[3,] . 7 . . 6 .
[4,] . 6 . . . .
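To make the compressed sparse column layout more concrete, here is a minimal sketch that inspects the slots of a small dgCMatrix. The 4 x 2 input matrix and the variable names are made up for illustration. The last step assumes that the virtual class "RsparseMatrix" can be used as a conversion target to obtain row-oriented storage.

```r
library(Matrix)

# Small 4 x 2 dense matrix, filled column by column
dense_mat <- matrix(c(7, 0, 0, 0,
                      6, 0, 7, 6), nrow = 4)
csc_mat <- as(dense_mat, "sparseMatrix")

print(class(csc_mat))
print(csc_mat@x)  # non-zero values, column by column
print(csc_mat@i)  # 0-based row index of each stored value
print(csc_mat@p)  # offsets marking where each column starts in x and i

# Assumption: "RsparseMatrix" requests the row-oriented counterpart
csr_mat <- as(csc_mat, "RsparseMatrix")
print(class(csr_mat))
```

Only the four non-zero values are stored in x; the i and p slots are what allow the package to locate them by row and column.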
Operations on Sparse Matrices
Various arithmetic and binding operations can be performed on Sparse Matrices in R:
Addition and subtraction by Scalar Value
A scalar value is added to or subtracted from every element of the sparse matrix, including the zeros. The result is therefore a dense matrix (class dgeMatrix), because the former zeros now hold non-zero values. The following code shows the use of the + and - operators:
R
library (Matrix)
set.seed (0)
rows <- 4L
cols <- 6L
vals <- sample (
x = c (0, 10),
prob = c (0.85, 0.15),
size = rows * cols,
replace = TRUE
)
dense_mat <- matrix (vals, nrow = rows)
sparse_mat <- as (dense_mat, "sparseMatrix" )
print ( "Sparse Matrix" )
print (sparse_mat)
print ( "Addition" )
print (sparse_mat + 5)
print ( "Subtraction" )
print (sparse_mat - 1)
Output:
[1] "Sparse Matrix"
4 x 6 sparse Matrix of class "dgCMatrix"
[1,] 10 10 . . . .
[2,] . . . . . 10
[3,] . 10 . . 10 .
[4,] . 10 . . . .
[1] "Addition"
4 x 6 Matrix of class "dgeMatrix"
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 15 15 5 5 5 5
[2,] 5 5 5 5 5 15
[3,] 5 15 5 5 15 5
[4,] 5 15 5 5 5 5
[1] "Subtraction"
4 x 6 Matrix of class "dgeMatrix"
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 9 9 -1 -1 -1 -1
[2,] -1 -1 -1 -1 -1 9
[3,] -1 9 -1 -1 9 -1
[4,] -1 9 -1 -1 -1 -1
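The loss of sparsity can also be checked programmatically. The following sketch, using an illustrative 4 x 6 matrix with two non-zero entries, tests the class of the result with is() and counts non-zero entries with nnzero() from the Matrix package.

```r
library(Matrix)

# Illustrative sparse matrix with two non-zero entries
sparse_mat <- sparseMatrix(i = c(1, 3), j = c(1, 2),
                           x = c(10, 10), dims = c(4, 6))
shifted <- sparse_mat + 5

print(is(sparse_mat, "sparseMatrix"))  # input is sparse
print(is(shifted, "sparseMatrix"))     # result of the addition is not
print(nnzero(sparse_mat))              # non-zero entries before
print(nnzero(shifted))                 # non-zero entries after
```

Because every one of the 24 cells becomes non-zero after the shift, there is nothing left for a sparse format to skip.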
Multiplication or Division by Scalar
These operations change only the non-zero elements, because a zero multiplied or divided by a non-zero scalar remains zero. The result is therefore still a sparse matrix:
R
library (Matrix)
set.seed (0)
rows <- 4L
cols <- 6L
vals <- sample (
x = c (0, 10),
prob = c (0.85, 0.15),
size = rows * cols,
replace = TRUE
)
dense_mat <- matrix (vals, nrow = rows)
sparse_mat <- as (dense_mat, "sparseMatrix" )
print ( "Sparse Matrix" )
print (sparse_mat)
print ( "Multiplication" )
print (sparse_mat * 10)
print ( "Division" )
print (sparse_mat / 10)
Output:
[1] "Sparse Matrix"
4 x 6 sparse Matrix of class "dgCMatrix"
[1,] 10 10 . . . .
[2,] . . . . . 10
[3,] . 10 . . 10 .
[4,] . 10 . . . .
[1] "Multiplication"
4 x 6 sparse Matrix of class "dgCMatrix"
[1,] 100 100 . . . .
[2,] . . . . . 100
[3,] . 100 . . 100 .
[4,] . 100 . . . .
[1] "Division"
4 x 6 sparse Matrix of class "dgCMatrix"
[1,] 1 1 . . . .
[2,] . . . . . 1
[3,] . 1 . . 1 .
[4,] . 1 . . . .
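As a quick check that scaling keeps the matrix sparse, the sketch below compares the class and the number of non-zero entries before and after the operation. The small input matrix and the variable names are illustrative only.

```r
library(Matrix)

# Illustrative sparse matrix with two non-zero entries
sparse_mat <- sparseMatrix(i = c(1, 3), j = c(1, 2),
                           x = c(10, 10), dims = c(4, 6))

scaled  <- sparse_mat * 10
divided <- sparse_mat / 10

print(is(scaled, "sparseMatrix"))
print(is(divided, "sparseMatrix"))
print(nnzero(scaled) == nnzero(sparse_mat))  # same number of non-zeros
print(scaled[1, 1])
print(divided[1, 1])
```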
Matrix Multiplication
Matrices can be multiplied with each other using the %*% operator, whether they are sparse or dense. However, the number of columns of the first matrix must equal the number of rows of the second.
R
library (Matrix)
set.seed (0)
rows <- 4L
cols <- 6L
vals <- sample (
x = c (0, 10),
prob = c (0.85, 0.15),
size = rows * cols,
replace = TRUE
)
dense_mat <- matrix (vals, nrow = rows)
sparse_mat <- as (dense_mat, "sparseMatrix" )
print ( "Sparse Matrix" )
print (sparse_mat)
transpose_mat <- t (sparse_mat)
mul_mat <- sparse_mat %*% transpose_mat
print ( "Multiplication of Matrices" )
print (mul_mat)
Output:
[1] "Sparse Matrix"
4 x 6 sparse Matrix of class "dgCMatrix"
[1,] 10 10 . . . .
[2,] . . . . . 10
[3,] . 10 . . 10 .
[4,] . 10 . . . .
[1] "Multiplication of Matrices"
4 x 4 sparse Matrix of class "dgCMatrix"
[1,] 200 . 100 100
[2,] . 100 . .
[3,] 100 . 200 100
[4,] 100 . 100 100
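The product of a matrix with its own transpose can also be obtained with tcrossprod(), which avoids forming the transpose explicitly. The sketch below rebuilds the matrix from the example with sparseMatrix() and checks that both routes give the same values. The class of the tcrossprod() result may differ from that of the %*% result (for instance, a symmetric sparse class), so the comparison is done on the converted dense values.

```r
library(Matrix)

# Same non-zero pattern as the example matrix above
sparse_mat <- sparseMatrix(i = c(1, 1, 2, 3, 3, 4),
                           j = c(1, 2, 6, 2, 5, 2),
                           x = rep(10, 6), dims = c(4, 6))

prod_explicit <- sparse_mat %*% t(sparse_mat)
prod_tcross   <- tcrossprod(sparse_mat)

print(dim(prod_explicit))
print(all(as.matrix(prod_explicit) == as.matrix(prod_tcross)))
```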
Multiplication by a Vector
A sparse matrix can be multiplied element-wise by a one-dimensional vector using the * operator. The vector is recycled down the columns of the matrix, so with four rows and a vector of length 2, rows 1 and 3 are multiplied by the first element and rows 2 and 4 by the second. This is different from the matrix-vector product computed by %*%.
R
library (Matrix)
set.seed (0)
rows <- 4L
cols <- 6L
vals <- sample (
x = c (0, 10),
prob = c (0.85, 0.15),
size = rows * cols,
replace = TRUE
)
dense_mat <- matrix (vals, nrow = rows)
sparse_mat <- as (dense_mat, "sparseMatrix" )
print ( "Sparse Matrix" )
print (sparse_mat)
vec <- c (3, 2)
print ( "Multiplication by vector" )
print (sparse_mat * vec)
Output:
[1] "Sparse Matrix"
4 x 6 sparse Matrix of class "dgCMatrix"
[1,] 10 10 . . . .
[2,] . . . . . 10
[3,] . 10 . . 10 .
[4,] . 10 . . . .
[1] "Multiplication by vector"
4 x 6 sparse Matrix of class "dgCMatrix"
[1,] 30 30 . . . .
[2,] . . . . . 20
[3,] . 30 . . 30 .
[4,] . 20 . . . .
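Element-wise multiplication with * should not be confused with the matrix-vector product. The following sketch contrasts the two on the matrix from the example; the length-6 vector used with %*% is an illustrative choice that matches the number of columns.

```r
library(Matrix)

# Same non-zero pattern as the example matrix above
sparse_mat <- sparseMatrix(i = c(1, 1, 2, 3, 3, 4),
                           j = c(1, 2, 6, 2, 5, 2),
                           x = rep(10, 6), dims = c(4, 6))

# Element-wise: c(3, 2) is recycled down each column
elementwise <- sparse_mat * c(3, 2)
print(elementwise)

# Matrix-vector product: vector length must equal the number of columns
mat_vec <- sparse_mat %*% c(1, 2, 3, 4, 5, 6)
print(dim(mat_vec))
print(as.vector(as.matrix(mat_vec)))
```

The element-wise result keeps the 4 x 6 shape, while the matrix-vector product collapses each row to a single number and returns a 4 x 1 result.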
Combination of Matrices
Matrices can be combined with vectors or other matrices using the column-bind cbind() or row-bind rbind() operations. With rbind(), the number of rows in the result is the sum of the rows of the inputs; with cbind(), the number of columns in the result is the sum of the columns of the inputs.
R
library (Matrix)
set.seed (0)
rows <- 4L
cols <- 6L
vals <- sample (
x = c (0, 10),
prob = c (0.85, 0.15),
size = rows * cols,
replace = TRUE
)
dense_mat <- matrix (vals, nrow = rows)
sparse_mat <- as (dense_mat, "sparseMatrix" )
print ( "Sparse Matrix" )
print (sparse_mat)
row_bind <- rbind (sparse_mat,
sparse_mat)
print ( "Row Bind" )
print (row_bind)
Output:
[1] "Sparse Matrix"
4 x 6 sparse Matrix of class "dgCMatrix"
[1,] 10 10 . . . .
[2,] . . . . . 10
[3,] . 10 . . 10 .
[4,] . 10 . . . .
[1] "Row Bind"
8 x 6 sparse Matrix of class "dgCMatrix"
[1,] 10 10 . . . .
[2,] . . . . . 10
[3,] . 10 . . 10 .
[4,] . 10 . . . .
[5,] 10 10 . . . .
[6,] . . . . . 10
[7,] . 10 . . 10 .
[8,] . 10 . . . .
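The example above covers rbind() only. A matching sketch for cbind(), using an illustrative matrix built with sparseMatrix(), shows how the dimensions change in each case and that the combined result stays sparse.

```r
library(Matrix)

# Same non-zero pattern as the example matrix above
sparse_mat <- sparseMatrix(i = c(1, 1, 2, 3, 3, 4),
                           j = c(1, 2, 6, 2, 5, 2),
                           x = rep(10, 6), dims = c(4, 6))

col_bind <- cbind(sparse_mat, sparse_mat)
row_bind <- rbind(sparse_mat, sparse_mat)

print(dim(col_bind))  # columns add up, rows unchanged
print(dim(row_bind))  # rows add up, columns unchanged
print(is(col_bind, "sparseMatrix"))
```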
Properties of Sparse Matrices
- NA Values
NA values are not treated as zeros: they are stored explicitly, just like non-zero entries. In arithmetic they behave as they do elsewhere in R, so a result that depends on an NA entry is itself NA.
R
library (Matrix)
mat <- matrix (data = c (5.5, 0, NA ,
0, 0, NA ), nrow = 3)
print ( "Original Matrix" )
print (mat)
sparse_mat <- as (mat, "sparseMatrix" )
print ( "Sparse Matrix" )
print (sparse_mat)
Output:
[1] "Original Matrix"
[,1] [,2]
[1,] 5.5 0
[2,] 0.0 0
[3,] NA NA
[1] "Sparse Matrix"
3 x 2 sparse Matrix of class "dgCMatrix"
[1,] 5.5 .
[2,] . .
[3,] NA NA
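A short sketch can show how the NA entries are held and how they behave in a calculation. It reuses the matrix from the example; the na.counted argument of nnzero() and the na.rm argument of sum() are used here to make the handling of NA explicit.

```r
library(Matrix)

dense_mat <- matrix(c(5.5, 0, NA,
                      0,   0, NA), nrow = 3)
sparse_mat <- as(dense_mat, "sparseMatrix")

print(sparse_mat@x)                           # stored values include the NAs
print(nnzero(sparse_mat, na.counted = TRUE))  # count NAs as non-zero

doubled <- sparse_mat * 2
print(doubled)                                # NA entries stay NA

print(sum(sparse_mat, na.rm = TRUE))          # skip NAs when summing
```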
- Sparse matrix data can be written to an ordinary file in the Matrix Market format (.mtx). The writeMM() function transfers the data of a sparse matrix into a file.
writeMM(obj = sparse_mat, file = "fname.mtx")
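A minimal round-trip sketch follows, assuming the file is read back with readMM() from the same package. The temporary file path is a placeholder chosen for illustration, and the class of the matrix returned by readMM() may differ from the class that was written, so the comparison is done on the dense values.

```r
library(Matrix)

sparse_mat <- sparseMatrix(i = c(1, 3, 4), j = c(1, 2, 2),
                           x = c(10, 10, 10), dims = c(4, 6))

# Hypothetical file location: a temporary file with the .mtx extension
mtx_path <- tempfile(fileext = ".mtx")
writeMM(sparse_mat, file = mtx_path)

# Read the file back and compare with the original
restored <- readMM(mtx_path)
print(class(restored))
print(all(as.matrix(restored) == as.matrix(sparse_mat)))

unlink(mtx_path)  # remove the temporary file
```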