Data Visualisation using ggplot2 (Scatter Plots)
Last Updated: 24 Apr, 2025
A correlation scatter plot is a useful data visualization tool for identifying the relationship between two continuous variables. In this article, we will discuss how to create a correlation scatter plot using ggplot2 in R. ggplot2 is a popular package for creating clear, informative data visualizations in the R Programming Language.
- Scatter Plot: A scatter plot is a graphical representation of the relationship between two variables, where each observation is represented by a point on a 2D plane.
- Correlation: Correlation measures the strength and direction of the association between two variables; the Pearson correlation coefficient specifically measures linear association. It ranges from -1 to 1, where -1 indicates a perfect negative linear correlation, 0 indicates no linear correlation, and 1 indicates a perfect positive linear correlation.
- ggplot2: ggplot2 is a widely used data visualization library in R. It provides a simple and intuitive syntax for creating complex visualizations.
- Load the ggplot2 library: Before creating a Correlation Scatter Plot, you need to load the ggplot2 library by using the following command: “library(ggplot2)”.
- Prepare the data: Put the data you want to visualize in a data frame. It should contain at least two numeric columns, one for each of the variables you want to compare.
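To make the correlation definition above concrete, here is a minimal sketch that builds a small two-column data frame (the values are made up purely for illustration) and computes Pearson's r both from its formula, r = cov(x, y) / (sd(x) * sd(y)), and with base R's cor(). The two results should agree.

```r
# Hypothetical example data: two numeric columns in a data frame
df_demo <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)
)

# Pearson's r from its definition: covariance scaled by both standard deviations
r_manual <- cov(df_demo$x, df_demo$y) / (sd(df_demo$x) * sd(df_demo$y))

# The same quantity using base R's cor() (Pearson is the default method)
r_builtin <- cor(df_demo$x, df_demo$y)

print(c(manual = r_manual, builtin = r_builtin))
```

Since y grows almost exactly linearly with x here, both values are very close to 1.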
Basic correlation Scatter Plot using ggplot2:
First, we load the necessary package and prepare the data. For this example, we use the built-in mtcars dataset, which contains information on various car models and their specifications.
R
library(ggplot2)
data(mtcars)
df <- mtcars[, c("mpg", "wt")]
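Since the goal is to visualise a correlation, it can also help to quantify it before plotting. A minimal sketch using base R's cor.test() (from the stats package, which is attached by default) reports Pearson's r for mpg and wt together with a 95% confidence interval and a p-value:

```r
# Same two columns as above, rebuilt so the snippet runs on its own
df <- mtcars[, c("mpg", "wt")]

# Pearson correlation test between fuel economy and weight
ct <- cor.test(df$mpg, df$wt, method = "pearson")

ct$estimate  # estimated correlation coefficient
ct$conf.int  # 95% confidence interval for the correlation
ct$p.value   # p-value for the null hypothesis of zero correlation
```

A strongly negative estimate here is what the downward-sloping point cloud in the plot should reflect: heavier cars tend to have lower mpg.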
Next, we call ggplot() to create a plot object and add geom_point() to draw one point per car, with mpg on the x-axis and wt on the y-axis:
R
ggplot(df, aes(x = mpg, y = wt)) +
  geom_point()
Output:

Scatter plot using ggplot2
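One optional addition, not part of the original walkthrough, is to print the correlation coefficient on the chart itself. The sketch below computes r with cor() and places it in the top-right corner with annotate(); the position and label format are arbitrary choices you may want to adjust.

```r
library(ggplot2)

df <- mtcars[, c("mpg", "wt")]

# Pearson correlation between the two plotted variables
r <- cor(df$mpg, df$wt)

p <- ggplot(df, aes(x = mpg, y = wt)) +
  geom_point() +
  # Put the rounded coefficient at the top-right of the data range,
  # right- and top-aligned so the text stays inside the panel
  annotate("text",
           x = max(df$mpg), y = max(df$wt),
           label = paste0("r = ", round(r, 2)),
           hjust = 1, vjust = 1)

print(p)
```

Because mpg and wt are negatively correlated, the top-right corner is typically empty, which keeps the label clear of the points.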
It is often useful to add a regression line to the plot to show the overall trend in the data. We can do this with the geom_smooth() function:
R
ggplot(df, aes(x = mpg, y = wt)) +
  geom_point() +
  geom_smooth(method = "lm")
Output:
The snippet above adds a line fitted by linear regression (method = "lm"), with a shaded confidence band around it by default. Here's another example of a correlation scatter plot using ggplot2. This time, we'll use the iris dataset, which contains petal and sepal measurements for three species of iris flowers.
R
data(iris)
df <- iris[, c("Sepal.Length", "Sepal.Width",
               "Petal.Length", "Petal.Width",
               "Species")]
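With four numeric columns available, a correlation matrix is a quick way to see which pair of variables is most strongly related before choosing what to plot. The sketch below simply passes the numeric columns to cor(); Species is left out because it is a factor, not a number.

```r
data(iris)
df <- iris[, c("Sepal.Length", "Sepal.Width",
               "Petal.Length", "Petal.Width",
               "Species")]

# Pairwise Pearson correlations of the four numeric columns
cor_mat <- cor(df[, 1:4])

# Rounded for readability; the diagonal is always 1
round(cor_mat, 2)
```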
Then, we use ggplot() to create a plot object and geom_point() to add points with Sepal.Length on the x-axis and Petal.Length on the y-axis. Inside aes(), we also map point color to Species so that each iris species is drawn in a different color.
R
ggplot(df, aes(x = Sepal.Length,
               y = Petal.Length,
               color = Species)) +
  geom_point()
Output:
Now, to add regression lines to the plot, we use the geom_smooth() function with the method argument set to "lm" for linear regression. Because color is mapped to Species, geom_smooth() fits a separate line for each species:
R
ggplot(df, aes(x = Sepal.Length,
               y = Petal.Length,
               color = Species)) +
  geom_point() +
  geom_smooth(method = "lm")
Output:
To customize the plot further, we can use the facet_wrap() function to draw a separate panel for each iris species:
R
ggplot(df, aes(x = Sepal.Length,
               y = Petal.Length,
               color = Species)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_wrap(~Species, ncol = 2)
Output:
To summarize this example: we loaded ggplot2, selected columns from the iris dataset, used ggplot() to initialize a plot object, and added points with geom_point(), mapping point color to Species through aes(). We then added per-species regression lines with geom_smooth(method = "lm"). Finally, we used facet_wrap() to draw a separate panel for each species, setting the number of columns with the ncol argument.
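Because the regression lines and facets are drawn per species, it can be informative to compare the correlation within each species with the correlation across all flowers. The following sketch uses split() and sapply() from base R. The within-species coefficients can differ noticeably from the pooled one, a reminder that grouping matters when reading a correlation plot.

```r
data(iris)

# Split the rows by species and compute Pearson's r within each group
by_species <- sapply(split(iris, iris$Species), function(d) {
  cor(d$Sepal.Length, d$Petal.Length)
})

# Correlation over all 150 flowers, ignoring species
overall <- cor(iris$Sepal.Length, iris$Petal.Length)

round(by_species, 3)
round(overall, 3)
```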
Scatter Plot of the mpg Dataset using ggplot2
As before, we first load the necessary package and prepare the data. This example uses the mpg dataset, which ships with ggplot2 and contains information on various cars and their fuel economy:
R
library(ggplot2)
data(mpg)
df <- mpg[, c("displ", "hwy", "cyl", "class")]
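Each distinct value of cyl and class becomes its own color or shape in the legend, so counting them first shows how many shapes have to be supplied when setting shapes by hand. A small check (df is rebuilt here so the snippet runs on its own):

```r
library(ggplot2)

df <- mpg[, c("displ", "hwy", "cyl", "class")]

# Number of distinct values that will become color and shape levels
n_cyl   <- length(unique(df$cyl))
n_class <- length(unique(df$class))

c(cylinders = n_cyl, classes = n_class)
```

The class count matches the seven shape codes passed to scale_shape_manual() in the next step.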
Next, we use ggplot() to create a plot object, with aes() mapping the displ column to the x-axis and the hwy column to the y-axis. We add points with geom_point(), mapping point color to the cyl column and point shape to the class column (both converted to factors). Because ggplot2's default shape palette supports at most six distinct shapes and class has seven values, we use scale_shape_manual() to set the shapes explicitly; the colors come from ggplot2's default discrete palette.
R
ggplot(df, aes(x = displ, y = hwy,
               color = factor(cyl),
               shape = factor(class))) +
  geom_point() +
  scale_shape_manual(values = c(15, 16, 17, 18, 19, 24, 25))
Output:
To add regression lines to the plot, we can use the stat_smooth() function with method = "lm" for linear regression; se = FALSE hides the confidence band around each line:
R
ggplot(df, aes(x = displ, y = hwy,
               color = factor(cyl),
               shape = factor(class))) +
  geom_point() +
  scale_shape_manual(values = c(15, 16, 17, 18, 19, 24, 25)) +
  stat_smooth(method = "lm", se = FALSE)
Output:
To customize the plot further, we change the color palette with scale_color_brewer(palette = "Set1"), which applies one of the ColorBrewer qualitative palettes:
R
ggplot(df, aes(x = displ, y = hwy,
               color = factor(cyl),
               shape = factor(class))) +
  geom_point() +
  scale_shape_manual(values = c(15, 16, 17, 18, 19, 24, 25)) +
  stat_smooth(method = "lm", se = FALSE) +
  scale_color_brewer(palette = "Set1")
Output:
Finally, we can use the labs() function to add custom axis and legend labels:
R
ggplot(df, aes(x = displ, y = hwy,
               color = factor(cyl),
               shape = factor(class))) +
  geom_point() +
  scale_shape_manual(values = c(15, 16, 17, 18, 19, 24, 25)) +
  stat_smooth(method = "lm", se = FALSE) +
  scale_color_brewer(palette = "Set1") +
  labs(x = "Engine displacement (L)",
       y = "Highway fuel economy (mpg)",
       color = "Number of cylinders",
       shape = "Vehicle class")
To summarize this example: we created a correlation scatter plot with engine displacement (displ) on the x-axis and highway fuel economy (hwy) on the y-axis, with point color mapped to the number of cylinders (cyl) and point shape mapped to vehicle class. The plot also includes linear regression lines from stat_smooth(method = "lm"), drawn without confidence bands because se = FALSE, and custom axis and legend labels set with labs(). The point shapes are specified manually with scale_shape_manual(), and the colors come from the Set1 palette via scale_color_brewer().
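Because color and shape are mapped in the global aes(), ggplot2 groups the data by every cylinder/class combination, so stat_smooth() fits its lines within each of those groups rather than across all cars. If a single overall trend line is preferred, one option (a variation, not the plot built above) is to map color and shape only inside geom_point(), so they no longer split the data for stat_smooth():

```r
library(ggplot2)

df <- mpg[, c("displ", "hwy", "cyl", "class")]

# Color and shape are mapped only in geom_point(), so stat_smooth() sees
# a single group and fits one regression line to all points
p <- ggplot(df, aes(x = displ, y = hwy)) +
  geom_point(aes(color = factor(cyl), shape = factor(class))) +
  scale_shape_manual(values = c(15, 16, 17, 18, 19, 24, 25)) +
  scale_color_brewer(palette = "Set1") +
  # A fixed (non-mapped) line color keeps the trend line out of the legend
  stat_smooth(method = "lm", se = FALSE, color = "black") +
  labs(x = "Engine displacement (L)",
       y = "Highway fuel economy (mpg)",
       color = "Number of cylinders",
       shape = "Vehicle class")

print(p)
```

Which version is more useful depends on the question: per-group lines show how the trend varies between cylinder counts and classes, while the single line summarises the overall displacement/fuel-economy relationship.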
Conclusion:
In this article, we demonstrated how to create a correlation scatter plot in R using the ggplot2 library. We’ve discussed the concepts of scatter plots, correlation, and ggplot2, and provided step-by-step instructions on how to create a scatter plot. Three detailed examples were also provided to showcase the capabilities of ggplot2. The information in the article should be useful for anyone looking to visualize the relationship between two variables using a scatter plot in R.