Plotting in R with ggplot2


Graphics in R

The R language has extensive graphical capabilities.

Graphics in R may be created by many different methods including base graphics and more advanced plotting packages such as lattice.

ggplot2

The ggplot2 package was created by Hadley Wickham to provide an intuitive plotting system to rapidly generate publication quality graphics.

ggplot2 builds on the concept of the “Grammar of Graphics” (Wilkinson 2005, Bertin 1983) which describes a consistent syntax for the construction of a wide range of complex graphics by a concise description of their components.

ggplot2 is a core part of the Tidyverse, a group of packages designed to make data science easy and functional in R. To get an introduction to the core concepts of Tidyverse check out our training materials here.

Why use ggplot2

The structured syntax and high level of abstraction used by ggplot2 let the user concentrate on the visualization rather than on the underlying code.

On top of this central philosophy ggplot2 has:

  • Increased flexibility over many plotting systems.
  • An advanced theme system for professional/publication level graphics.
  • Large developer base – Many libraries extending its flexibility.
  • Large user base – Great documentation and active mailing list.

Grammar of Graphics


How ggplot2 builds a plot

Example scatter plot

Overview of example code for the ggplot2 scatter plot.

ggplot(data = <default data set>, 
       aes(x = <default x axis variable>,
           y = <default y axis variable>,
           ... <other default aesthetic mappings>),
       ... <other plot defaults>) +

       geom_point(aes(size = <size variable for this geom>, 
                      ... <other aesthetic mappings>),
                  data = <data for this point geom>,
                  stat = <statistic string or function>,
                  position = <position string or function>,
                  color = <"fixed color specification">,
                  <other arguments, possibly passed to the stat function>) +

  scale_<aesthetic>_<type>(name = <"scale label">,
                     breaks = <where to put tick marks>,
                     labels = <labels for tick marks>,
                     ... <other options for the scale>) +
  
  ggtitle("Graphics/Plot")+
  xlab("Weight")+
  ylab("Height")+

  theme(plot.title = element_text(color = "gray"),
        ... <other theme elements>)

What users are required to specify

Actual example scatter plot

ggplot(data = patients_clean, aes(y = Weight, x = Height, color = Sex, size = BMI,
    shape = Pet)) + geom_point()

Getting Started With ggplot2


Setting the Working directory

First we need a dataset. If you downloaded the course material, you will have some data in there.

Your current working directory is typically your HOME directory. We want to set our working directory to be in the downloaded course material, so everyone is in the same place.

Session -> Set Working Directory -> Choose Directory

or in the console.

setwd("/PathToMyDownload/Plotting_In_R-master/r_course")
# e.g. setwd('/Users/mattpaul/Downloads/Intro_To_R_1Day/r_course')

Getting our data

Now let's get our data set. Here we read some data from the data directory using the read.delim() function.

We can use the class() function to get the class of our table and the dim() function to get the number of rows and columns.

library(ggplot2)
patients_clean <- read.delim("data/patient-data-cleaned.txt", sep = "\t")

class(patients_clean)
## [1] "data.frame"
dim(patients_clean)
## [1] 100  17

Review the data

We can review the first two rows to get an idea of the content of the data.

patients_clean[1:2, ]
##          ID    Name  Race  Sex     Smokes Height Weight      Birth    State Pet
## 1 AC/AH/001 Michael White Male Non-Smoker 182.87  76.57 1972-02-06  Georgia Dog
## 2 AC/AH/017   Derek White Male Non-Smoker 179.12  80.43 1972-06-15 Missouri Dog
##   Grade  Died Count Date.Entered.Study Age   BMI Overweight
## 1     2 FALSE  0.01         2015-12-01  44 22.90      FALSE
## 2     2 FALSE -1.31         2015-12-01  43 25.07       TRUE

Review the data.frame

By default, R’s read.delim function has read in the data as a data.frame.

Data.frames are essential for ggplot2 as we can have mixes of numerical, character and categorical data in one table.

patients_clean$Smokes[1:5]
## [1] "Non-Smoker" "Non-Smoker" "Non-Smoker" "Non-Smoker" "Non-Smoker"
patients_clean$Height[1:5]
## [1] 182.87 179.12 169.15 175.66 164.47
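
A quick way to see this mix of column types is to ask for the class of every column at once. The sketch below uses a small made-up data frame (toy_patients and its values are hypothetical, not the course data) so that it runs on its own.

```r
# Hypothetical toy data standing in for patients_clean
toy_patients <- data.frame(
  Name   = c("Ann", "Bob", "Cy"),
  Height = c(165.2, 180.1, 172.4),
  Sex    = factor(c("Female", "Male", "Male")),
  stringsAsFactors = FALSE
)

# One class per column: character, numeric and factor sit side by side
column_classes <- sapply(toy_patients, class)
column_classes
```

The same call on patients_clean should report the class of each of its 17 columns.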

Review the data.frame with summary

We can get an overview of the data in all columns of data.frame using the summary() function

summary(patients_clean)
##       ID                Name               Race               Sex           
##  Length:100         Length:100         Length:100         Length:100        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##     Smokes              Height          Weight         Birth          
##  Length:100         Min.   :157.0   Min.   :63.54   Length:100        
##  Class :character   1st Qu.:161.5   1st Qu.:68.17   Class :character  
##  Mode  :character   Median :165.7   Median :72.27   Mode  :character  
##                     Mean   :167.9   Mean   :74.89                     
##                     3rd Qu.:174.5   3rd Qu.:80.56                     
##                     Max.   :185.4   Max.   :97.67                     
##                                                                       
##     State               Pet                Grade          Died        
##  Length:100         Length:100         Min.   :1.000   Mode :logical  
##  Class :character   Class :character   1st Qu.:1.000   FALSE:46       
##  Mode  :character   Mode  :character   Median :2.000   TRUE :54       
##                                        Mean   :2.054                  
##                                        3rd Qu.:3.000                  
##                                        Max.   :3.000                  
##                                        NA's   :7                      
##      Count         Date.Entered.Study      Age             BMI       
##  Min.   :-3.1400   Length:100         Min.   :42.00   Min.   :21.41  
##  1st Qu.:-0.8100   Class :character   1st Qu.:42.75   1st Qu.:25.07  
##  Median :-0.0550   Mode  :character   Median :43.00   Median :26.51  
##  Mean   :-0.1066                      Mean   :43.09   Mean   :26.54  
##  3rd Qu.: 0.6150                      3rd Qu.:44.00   3rd Qu.:27.90  
##  Max.   : 1.7900                      Max.   :44.00   Max.   :31.70  
##                                                                      
##  Overweight     
##  Mode :logical  
##  FALSE:23       
##  TRUE :77       
##                 
##                 
##                 
## 

Our first ggplot2 graph

As seen above, in order to produce a ggplot2 graph we need a minimum of:

  • Data to be used in graph
  • Mappings of data to the graph (aesthetic mapping)
  • What type of graph we want to use (The geom to use).

Our first ggplot2 graph

In the code below we define the data as our cleaned patients data frame.

pcPlot <- ggplot(data = patients_clean)
class(pcPlot)
## [1] "gg"     "ggplot"
pcPlot$data[1:4, ]
##          ID    Name  Race  Sex     Smokes Height Weight      Birth        State
## 1 AC/AH/001 Michael White Male Non-Smoker 182.87  76.57 1972-02-06      Georgia
## 2 AC/AH/017   Derek White Male Non-Smoker 179.12  80.43 1972-06-15     Missouri
## 3 AC/AH/020    Todd Black Male Non-Smoker 169.15  75.48 1972-07-09 Pennsylvania
## 4 AC/AH/022  Ronald White Male Non-Smoker 175.66  94.54 1972-08-17      Florida
##    Pet Grade  Died Count Date.Entered.Study Age   BMI Overweight
## 1  Dog     2 FALSE  0.01         2015-12-01  44 22.90      FALSE
## 2  Dog     2 FALSE -1.31         2015-12-01  43 25.07       TRUE
## 3 None     2 FALSE -0.17         2015-12-01  43 26.38       TRUE
## 4  Cat     1 FALSE -1.10         2015-12-01  43 30.64       TRUE

Now we can see that we have a gg/ggplot object (pcPlot).

Our first ggplot2 graph

Within this gg/ggplot object the data has been defined.

Our first ggplot2 graph

However, important information on how to map the data to the visual properties (aesthetics) of the plot, as well as what type of plot to use (geom), has yet to be specified.

pcPlot$mapping
## Aesthetic mapping: 
## <empty>
pcPlot$theme
## list()
pcPlot$layers
## list()

Our first ggplot2 graph

The information to map the data to the plot can be added now using the aes() function.

pcPlot <- ggplot(data = patients_clean)

pcPlot <- pcPlot + aes(x = Height, y = Weight)

pcPlot$mapping
## Aesthetic mapping: 
## * `x` -> `Height`
## * `y` -> `Weight`
pcPlot$theme
## list()
pcPlot$layers
## list()

But we are still missing the final component of our plot, the type of plot to use (geom).

Our first ggplot2 graph

Below, the geom_point function is used to specify a point plot: a scatter plot of Height values on the x-axis versus Weight values on the y-axis.

pcPlot <- ggplot(data = patients_clean)

pcPlot <- pcPlot + aes(x = Height, y = Weight)
pcPlot <- pcPlot + geom_point()
pcPlot

pcPlot$mapping
## Aesthetic mapping: 
## * `x` -> `Height`
## * `y` -> `Weight`
pcPlot$theme
## list()
pcPlot$layers
## [[1]]
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity

Our first ggplot2 graph

Now that we have all the components of our plot, we can display the results.

pcPlot

Geoms


Our first ggplot2 graph

More typically, the data and aesthetics are defined within ggplot function and geoms applied afterwards. This makes it easier to switch between plot types to find the best way to visualize your data.

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight))
pcPlot + geom_point()

Plot types

There are many geoms available in ggplot2:

  • geom_point() - Scatter plots
  • geom_line() - Line plots
  • geom_smooth() - Fitted line plots
  • geom_bar() - Bar plots
  • geom_boxplot() - Boxplots
  • geom_jitter() - Jitter to plots
  • geom_histogram() - Histogram plots
  • geom_density() - Density plots
  • geom_text() - Text to plots
  • geom_errorbar() - Errorbars to plots
  • geom_violin() - Violin plots

Line plots

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight))

pcPlot_line <- pcPlot + geom_line()

pcPlot_line

Smoothed line plots

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight))

pcPlot_smooth <- pcPlot + geom_smooth()

pcPlot_smooth
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Bar plots

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Sex))

pcPlot_bar <- pcPlot + geom_bar()

pcPlot_bar

Histograms

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height))

pcPlot_hist <- pcPlot + geom_histogram()

pcPlot_hist
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
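
The message above is a prompt to choose a bin width yourself. As a minimal sketch, assuming the built-in faithful data set rather than the course data, the binwidth argument of geom_histogram() sets the width of each bar in the units of the x variable.

```r
library(ggplot2)

# faithful is a built-in R data set; waiting is measured in minutes
hist_plot <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(binwidth = 5)   # each bar spans 5 minutes
hist_plot
```

The value 5 is only an illustration; a sensible width depends on the spread of your own data.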

Density plots

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height))

pcPlot_density <- pcPlot + geom_density()

pcPlot_density

Box plots

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Sex, y = Height))

pcPlot_boxplot <- pcPlot + geom_boxplot()

pcPlot_boxplot

Violin plots

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Sex, y = Height))

pcPlot_violin <- pcPlot + geom_violin()

pcPlot_violin

Multiple Geoms

We can also provide multiple geoms, often to add detail to a plot. Here we would like to see the individual data points behind our violin plot, so we add a jitter layer on top.

ggplot(data = patients_clean, mapping = aes(x = Sex, y = Height)) + geom_violin() +
    geom_jitter()

There is a world of geoms

An overview of geoms and their arguments can be found in the ggplot2 documentation or within the ggplot2 quick reference guides.

Aesthetics


Aesthetics

To set an aesthetic of a plot to a constant value (e.g. make all points red), we can supply the color argument to the geom_point() function.

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight))
pcPlot + geom_point(color = "red")
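
A common slip is to put the constant inside aes() instead. The sketch below, which assumes the built-in mtcars data set rather than the course data, contrasts the two so the difference is visible.

```r
library(ggplot2)

# Setting: the color is a constant given outside aes(), so every point is red
set_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "red")

# Mapping: inside aes() the string "red" is treated as data (a one-level
# category), so ggplot2 picks a default color and draws a legend for it
mapped_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = "red"))

set_plot
mapped_plot
```

If a plot shows an unexpected color and a one-entry legend, a constant inside aes() is a likely cause.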

Plot properties

As we discussed earlier however, ggplot2 makes use of aesthetic mappings to assign variables in the data to the properties/aesthetics of the plot. This allows the properties of the plot to reflect variables in the data dynamically.

In these examples we supply additional information to the aes() function to define what information to display and how it is represented in the plot.

First we can recreate the plot we saw earlier.

pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Height,
                             y=Weight))
pcPlot+geom_point()

Color

Now we can adjust the aes mapping by supplying a variable to the color parameter in the aes function. (Note that ggplot2 accepts “color” or “colour” as the parameter name.)

This simple adjustment allows for identification of the separation between male and female measurements for height and weight.

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight, color = Sex))
pcPlot + geom_point()

Point shape

Similarly the shape of points may be adjusted.

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight, shape = Sex))
pcPlot + geom_point()

Aesthetics in geom

Aesthetic mappings may also be set directly in the geom_point() function, in the same place where we previously set the constant color red. This allows the same ggplot object to be reused with different aesthetic mappings and different geoms.

pcPlot <- ggplot(data = patients_clean)
pcPlot + geom_point(aes(x = Height, y = Weight, color = Sex))

pcPlot + geom_point(aes(x = Height, y = Weight, color = Smokes))

pcPlot + geom_point(aes(x = Height, y = Weight, color = Smokes, shape = Sex))

pcPlot + geom_violin(aes(x = Sex, y = Height, fill = Smokes))

Geom parameters

Many geoms have parameters specific to that plot type. In this case we may want to reduce the width of our jitter to make it narrower and more focused.

ggplot(data = patients_clean, mapping = aes(x = Sex, y = Height, fill = Sex)) + geom_violin() +
    geom_jitter(width = 0.2)

Aesthetics in geom

Again, for a comprehensive list of parameters and aesthetic mappings used in geom_type functions, see the ggplot2 documentation for individual geoms by using ?geom_type.

?geom_point

or visit the ggplot2 documentation pages and quick reference:

Exercise on the principles of ggplot can be found here

Answers for the principles of ggplot can be found here

Facets


Facets

One very useful feature of ggplot is faceting. This allows you to produce several plots, each showing a subset of the data defined by variables in your data.

To facet our data into multiple plots we can use the facet_wrap or facet_grid function specifying the variable we split by.

The facet_grid function is well suited to splitting the data by two factors while facet_wrap simply wraps the data in a 2d format based on factor levels.

(https://ggplot2-book.org/)

Split by 2 factors

Here we can plot the data with the Smokes variable as rows and Sex variable as columns.

facet_grid(Rows~Columns)

pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, color = Sex)) +
    geom_point()
pcPlot + facet_grid(Smokes ~ Sex)

Split by 1 factor

To split by one factor we use the facet_grid() function again, but omit the variable before the “~”. This will facet along columns in the plot.

facet_grid(~Columns)

pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, color = Sex)) +
    geom_point()
pcPlot + facet_grid(~Sex)

Split by 1 factor

Similarly, to split along rows in plot, the variable is placed before the “~.”.

facet_grid(Rows~.)

pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, color = Sex)) +
    geom_point()
pcPlot + facet_grid(Sex ~ .)

facet_wrap()

The facet_wrap() function offers a less grid-based structure but is well suited to faceting data by one variable.

For facet_wrap() we follow a similar syntax to facet_grid().

pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, color = Sex)) +
    geom_point()
pcPlot + facet_wrap(~Smokes)
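
facet_wrap() also takes arguments that control the layout of the panels. As a small sketch, assuming the built-in mtcars data set rather than the course data, ncol sets the number of panel columns and scales lets each panel have its own axis range.

```r
library(ggplot2)

# mtcars is a built-in data set; cyl (number of cylinders) takes 3 values
facet_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~cyl, ncol = 1, scales = "free_y")  # stacked panels, separate y ranges
facet_plot
```

Free scales make patterns within a panel easier to see, at the cost of making panels harder to compare with each other.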

Multiple variables

For more complex faceting both facet_grid and facet_wrap can accept combinations of variables. Here we use facet_wrap.

pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, color = Sex)) +
    geom_point()
pcPlot + facet_wrap(~Pet + Smokes + Sex)

Multiple variables

Or in a nice grid format using facet_grid() and the Smokes variable against a combination of Sex and Pet.

pcPlot + facet_grid(Smokes ~ Sex + Pet)

Plotting Order


Plotting order in ggplot

We will shortly discuss how to change various aspects of the plot layout and appearance. However, a commonly asked question is how to change the order in which R plots a categorical variable. Consider the boxplot to compare weights of males and females:

ggplot(patients_clean, aes(x = Sex, y = Weight)) + geom_boxplot()

Plotting order and factors

Here, R decides the order to arrange the boxes according to the levels of the categorical variable. If there are no levels, or the levels are not ordered, it defaults to alphabetical order, i.e. Female before Male.

levels(patients_clean$Sex)
## NULL

Plotting order and factors

Depending on the message we want the plot to convey, we might want control over the order of boxes. The factor() function allows us to explicitly change the order of the levels.

patients_clean$Sex <- factor(patients_clean$Sex, levels = c("Male", "Female"))
ggplot(patients_clean, aes(x = Sex, y = Weight)) + geom_boxplot()
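
The effect of factor() can be checked without drawing anything. The sketch below uses a short made-up vector (sex_chr is hypothetical) to show that only the order of the categories changes, not the values themselves.

```r
# Character vectors carry no levels, so plotting falls back to alphabetical order
sex_chr <- c("Male", "Female", "Female", "Male")
levels(sex_chr)                       # NULL

# factor() with explicit levels fixes the order used for plotting
sex_fct <- factor(sex_chr, levels = c("Male", "Female"))
levels(sex_fct)                       # "Male" "Female"

# The stored values are unchanged; only the ordering of the categories differs
as.character(sex_fct)
```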

Scales


Scales

Scales and their legends have so far been handled using ggplot2 defaults. ggplot2 offers functionality to have finer control over scales and legends using the scale methods.

Scale methods are divided into functions by combinations of

  • the aesthetics they control.

  • the type of data mapped to scale.

    scale_aesthetic_type

    Try typing in scale_ then tab to autocomplete. This will provide some examples of the scale functions available in ggplot2.
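
If tab completion is not available, the same list can be produced in code. This is a minimal sketch that searches the functions exported by ggplot2 for names starting with scale_; the variable name scale_functions is just an illustration.

```r
# List exported ggplot2 functions whose names start with "scale_"
scale_functions <- grep("^scale_", getNamespaceExports("ggplot2"), value = TRUE)

# Narrow down to the scales that control the color aesthetic
sort(grep("^scale_color_", scale_functions, value = TRUE))
```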

Arguments

Although different scale functions accept some variety in their arguments, common arguments to scale functions include:

  • name - The axis or legend title

  • limits - Minimum and maximum of the scale

  • breaks - Label/tick positions along an axis

  • labels - Label names at each break

Controlling the X and Y scale.

Both continuous and discrete X/Y scales can be controlled in ggplot2 using:

scale_(x/y)_(continuous/discrete)

Continuous axes scales

In this example we control the continuous scale on the x-axis by providing a name, x-axis limits, the positions of breaks (ticks/labels) and the labels to place at breaks.

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight, color = Sex)) +
    geom_point()
pcPlot + scale_x_continuous(name = "height ('cm')", limits = c(150, 200),
    breaks = c(160, 180), labels = c("Short", "Tall"))

Discrete axes scales

Similarly control over discrete scales is shown below.

pcPlot <- ggplot(data = patients_clean, aes(x = Sex, y = Height))
pcPlot + geom_violin(aes(x = Sex, y = Height)) + scale_x_discrete(labels = c("Men",
    "Women"))

Combining axes scales

Multiple X/Y scales can be combined to give full control of axis marks.

pcPlot <- ggplot(data = patients_clean, aes(x = Sex, y = Height, fill = Smokes))
pcPlot + geom_violin(aes(x = Sex, y = Height)) + scale_x_discrete(labels = c("Men",
    "Women")) + scale_y_continuous(breaks = c(160, 180), labels = c("Short", "Tall"))

Controlling other scales

When using fill, color, linetype, shape, size or alpha aesthetic mappings the scales are automatically selected for you and the appropriate legends created.

pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, color = Sex))
pcPlot + geom_point(size = 4)

In the above example the discrete colors for the Sex variable were selected by default.

Manual discrete color scale

Manual control of discrete variables can be performed using scale_<aesthetic>_manual (for example scale_color_manual) with the values parameter. Additionally in this example an updated name for the legend is provided.

pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, color = Sex))
pcPlot + geom_point(size = 4) + scale_color_manual(values = c("Green", "Purple"),
    name = "Gender")
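
When values is an unnamed vector, colors are assigned in the order of the variable's levels, which is easy to get wrong. One option is a named vector, so each color is tied to a level by name. The sketch below uses a small made-up data frame (toy and sex_colors are hypothetical) so that it runs on its own.

```r
library(ggplot2)

# Hypothetical toy data standing in for patients_clean
toy <- data.frame(
  Height = c(160, 165, 178, 183),
  Weight = c(64, 70, 80, 90),
  Sex    = c("Female", "Female", "Male", "Male")
)

# Names in `values` are matched to the levels of Sex, so the assignment
# does not depend on the order in which the levels happen to be sorted
sex_colors <- c(Male = "green", Female = "purple")

named_plot <- ggplot(toy, aes(x = Height, y = Weight, color = Sex)) +
  geom_point(size = 4) +
  scale_color_manual(values = sex_colors, name = "Gender")
named_plot
```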

Named Colors and hex codes

Most colors can simply be defined by name as a character string, e.g. “green”. There is a wide variety of named colors available in R, from “darkgoldenrod” to “bisque”, plus around 100 shades of gray. You can find an extensive list of R colors here.

You can also use hex codes: a hexadecimal format for identifying colors. This gives a greater variety of options as they cover the full color spectrum. Each pair of characters corresponds to the Red, Green and Blue content of the color, e.g. #ffe4c4 (also known as Bisque) is composed of 100% red, 89.4% green and 76.9% blue. Resources like this color picker can be used to help you find specific shades, and even create complementary palettes.
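
The percentages quoted for Bisque can be reproduced in base R. This short sketch uses col2rgb() from grDevices, which splits a color into red, green and blue values on a 0 to 255 scale; the variable names are only illustrative.

```r
# col2rgb() (base R, grDevices) splits a color into red, green, blue on 0-255
bisque_rgb <- col2rgb("#ffe4c4")
bisque_rgb

# Express each channel as a percentage of the maximum value, 255
bisque_pct <- round(bisque_rgb[, 1] / 255 * 100, 1)
bisque_pct   # red 100.0, green 89.4, blue 76.9

# The named color and the hex code describe the same color
all(col2rgb("bisque") == col2rgb("#ffe4c4"))
```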

Packages for color scales

Here we have specified the colors to be used (hence the scale_color_manual), but when a variable has many levels this may be impractical, and often we would like ggplot2 to choose colors from a scale of our choice.

There are a number of built-in and installable palette collections that work with ggplot2, namely colorbrewer, paletteer and viridis. Palettes are prebuilt collections of colors. They can consist of various numbers of colors and can have different properties, e.g. continuous, discrete or divergent.

scale_color_brewer

Colorbrewer comes with ggplot

pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, color = Pet))
pcPlot + geom_point(size = 4) + scale_color_brewer(palette = "Set2")

scale_color_paletteer_d

Paletteer is a collection of user-defined palettes that have been collated. They have wildly different rationales including famous artworks, Wes Anderson movies and Birds.

library(paletteer)
pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, color = Pet))
pcPlot + geom_point(size = 4) + scale_color_paletteer_d(palette = "wesanderson::Zissou1")

scale_color_viridis_d

Viridis is a collection of scientifically designed color palettes that help with color blindness and accurately show the dynamic range in data sets.

pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, color = Pet))
pcPlot + geom_point(size = 4) + scale_color_viridis_d()

Color blindness + Uniform Perception

Using custom colors is great and can really help make a piece of work cohesive and stand out. But you have to be careful.

~4% of people are color blind. In white males this number rises to ~10%. Considering the demographics in science, there will likely be someone with color blindness in your meeting.

Furthermore, when we pick gradients the ability to see patterns in the data varies depending on the color scales used, even for people with typical color vision.

Color blindness + Uniform Perception

(Crameri et al, 2020)

Color blindness and ggplot2 defaults

(@dichromat-chloe.bsky.social)

Picking Colors

Sticking with default colors may be easy, but for several reasons it is worth reconsidering them before you present your data. There’s no single correct answer, but remember the aims:

  • Visually faithful to the dynamic scale of the underlying data
  • Translates well to greyscale (printing)
  • Is accessible to those with color blindness.

Palettes

For more details on palette sizes and styles visit the colorbrewer website and ggplot2 reference page.

Continuous Scales


Continuous scales

So far we have looked at qualitative scales, but ggplot2 offers much functionality for continuous scales such as for size, alpha (transparency), color and fill.

  • scale_alpha_continuous() - For transparency

  • scale_size_continuous() - For control of size.

Alpha

Both these functions accept the range of alpha/size to be used in plotting.

Below the range of alpha to be used in plot is limited to between 0.5 and 1.

pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, alpha = BMI))
pcPlot + geom_point(size = 4) + scale_alpha_continuous(range = c(0.5, 1))

Size

Below the range of sizes to be used in plot is limited to between 3 and 6.

pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, size = BMI))
pcPlot + geom_point(alpha = 0.8) + scale_size_continuous(range = c(3, 6))

Limits

The limits of the scale can also be controlled but it is important to note data outside of scale is removed from plot.

pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, size = BMI))
pcPlot + geom_point() + scale_size_continuous(range = c(3, 6), limits = c(25, 40))
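
If dropping points is not wanted, one option is the oob ("out of bounds") argument of the scale. The sketch below assumes the built-in mtcars data set rather than the course data, and uses scales::squish from the scales package (installed alongside ggplot2) to clamp out-of-range values to the nearest limit instead of removing them.

```r
library(ggplot2)

# Default: values outside the limits become NA and the points are dropped
dropped <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
  geom_point() +
  scale_size_continuous(range = c(3, 6), limits = c(100, 200))

# oob = scales::squish clamps out-of-range values to the nearest limit instead
squished <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
  geom_point() +
  scale_size_continuous(range = c(3, 6), limits = c(100, 200),
                        oob = scales::squish)
squished
```

With squish, every car is drawn, but those below 100 or above 200 horsepower share the smallest or largest point size, so the legend should be read with that in mind.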

Labels

Which points on the scale are labeled, and the label text, can also be controlled.

pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, size = BMI))
pcPlot + geom_point() + scale_size_continuous(range = c(3, 6), breaks = c(25, 30),
    labels = c("Good", "Good but not 25"))

Color

Control of color/fill scales can be best achieved through the gradient subfunctions of scale.

  • scale_(color/fill)_gradient - 2 color gradient (e.g. low to high BMI)

  • scale_(color/fill)_gradient2 - Diverging color scale with a midpoint color (e.g. Down, No Change, Up)

Both functions take a common set of arguments:

  • low - color for low end of gradient scale
  • high - color for high end of gradient scale.
  • na.value - color for any NA values.

Color

An example using scale_color_gradient below sets the low and high end colors to White and Red respectively

pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, color = BMI))
pcPlot + geom_point(size = 4, alpha = 0.8) + scale_color_gradient(low = "White",
    high = "Red")

Color

Similarly we can use the scale_color_gradient2 function which allows for the specification of a midpoint value and its associated color.

pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, color = BMI))
pcPlot + geom_point(size = 4, alpha = 0.8) + scale_color_gradient2(low = "Blue",
    mid = "Black", high = "Red", midpoint = median(patients_clean$BMI))

Labels

As with previous continuous scales, limits and custom labels in scale legend can be added.

pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, color = BMI))
pcPlot + geom_point(size = 4, alpha = 0.8) + scale_color_gradient2(low = "Blue",
    mid = "Black", high = "Red", midpoint = median(patients_clean$BMI), breaks = c(25,
        30), labels = c("Low", "High"), name = "Body Mass Index")

Scales are very customizable

Multiple scales may be combined to create highly customizable plots and scales.

pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, color = BMI,
    shape = Sex))
pcPlot + geom_point(size = 4, alpha = 0.8) + scale_shape_discrete(name = "Gender") +
    scale_color_gradient2(low = "Blue", mid = "Black", high = "Red", midpoint = median(patients_clean$BMI),
        breaks = c(25, 30), labels = c("Low", "High"), name = "Body Mass Index")

Conditional scales and colors

We can also use an ifelse conditional statement to apply discrete color cutoffs for groups of data points that aren’t represented by categorical variables in the data set.

pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, shape = Sex,
    color = ifelse(BMI > 30, "High", ifelse(BMI < 25, "Low", "Middle"))))
pcPlot + geom_point(size = 4, alpha = 0.8) + scale_shape_discrete(name = "Gender") +
    scale_color_manual(name = "BMI category", values = c("red", "blue", "grey"))
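
An alternative to computing the groups inside aes() is to store them in their own column first. The sketch below uses a small made-up data frame (toy and BMI_group are hypothetical) with the same cutoffs; making the column a factor lets us choose the legend order, and naming the colors ties each one to its group.

```r
library(ggplot2)

# Hypothetical toy data standing in for patients_clean
toy <- data.frame(
  Height = c(160, 165, 170, 178, 183),
  Weight = c(55, 70, 75, 80, 105),
  BMI    = c(21.5, 25.7, 26.0, 25.2, 31.4)
)

# Same cutoffs as the nested ifelse(), stored as a factor so that the
# legend order (Low, Middle, High) is chosen explicitly
toy$BMI_group <- factor(
  ifelse(toy$BMI > 30, "High", ifelse(toy$BMI < 25, "Low", "Middle")),
  levels = c("Low", "Middle", "High")
)

group_plot <- ggplot(toy, aes(x = Height, y = Weight, color = BMI_group)) +
  geom_point(size = 4) +
  scale_color_manual(name = "BMI category",
                     values = c(Low = "blue", Middle = "grey", High = "red"))
group_plot
```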

Continuous scales and palettes

There are similar continuous palettes to use with ggplot2. For paletteer and viridis you can just switch to the scale function ending in _c.

pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, color = BMI))
pcPlot + geom_point(size = 4, alpha = 0.8) + scale_color_viridis_c()

pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, color = BMI))
pcPlot + geom_point(size = 4, alpha = 0.8) + scale_color_paletteer_c(palette = "grDevices::Temps")

Transformations


Statistical transformations

In ggplot2, many of the statistical transformations are performed without any direct specification e.g. geom_histogram will use stat_bin function to generate bin counts to be used in plot.

Two very useful examples of statistical methods in ggplot2 are the stat_smooth and stat_summary functions.

Fitting lines

The stat_smooth function can be used to fit a line to the data being displayed.

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight))
pcPlot + geom_point() + stat_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Loess and more

By default a “loess” smooth line is plotted by stat_smooth. Other methods available include lm, glm, gam, rlm.

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight))
pcPlot + geom_point() + stat_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'
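
To see the numbers behind a straight-line fit, the same model can be fitted directly with lm(). This is a minimal sketch assuming the built-in mtcars data set rather than the course data; se = FALSE hides the confidence band, and giving the formula explicitly avoids the informational message.

```r
library(ggplot2)

# Fit the same straight line directly with lm() to see the numbers behind it
fit <- lm(mpg ~ wt, data = mtcars)   # mtcars is a built-in data set
coef(fit)                            # intercept and slope

# The plotted line comes from the same kind of model
line_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE)
line_plot
```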

Fitting lines in groups

A useful feature of ggplot2 is that it uses previously defined grouping when performing smoothing. If coloring by Sex is an aesthetic mapping then two smooth lines are drawn, one for each sex.

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight, color = Sex))
pcPlot + geom_point() + stat_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'

Fitting lines in groups

This behavior can be overridden by specifying an aes within the stat_smooth function and setting inherit.aes to FALSE.

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight, color = Sex))
pcPlot + geom_point() + stat_smooth(aes(x = Height, y = Weight), method = "lm", inherit.aes = FALSE)
## `geom_smooth()` using formula = 'y ~ x'

Displaying fitted line statistics

The ggpubr package contains functions to help display statistics on the plot. Here we add the equation for the line of best fit using the stat_regline_equation function.

The after_stat function is required to tell ggplot to map the aesthetics after a function has computed the relevant statistics.

library(ggpubr)
pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight)) +
    geom_point() + stat_smooth(method = "lm", formula = y ~ x)
pcPlot + stat_regline_equation(aes(label = after_stat(eq.label)))

Displaying fitted line statistics

We can also get the R-squared value from the stat_regline_equation transformation

pcPlot + stat_regline_equation(label.y = 94, aes(label = after_stat(eq.label))) +
    stat_regline_equation(label.y = 91, aes(label = after_stat(rr.label)))

Displaying fitted line statistics

By giving subsets of the data to the stat_regline_equation function, we can display statistics for each group for which we fit a line.

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight, color = Sex)) +
    geom_point() + stat_smooth(aes(x = Height, y = Weight), method = "lm", formula = y ~
    x)
pcPlot + stat_regline_equation(data = patients_clean[patients_clean$Sex == "Male",
    ], label.y = 90, aes(label = after_stat(rr.label))) + stat_regline_equation(data = patients_clean[patients_clean$Sex ==
    "Female", ], label.x = 175, label.y = 65, aes(label = after_stat(rr.label)))

Displaying stats on the plot

The ggpubr package also has useful functions that allow p-values to be displayed on plots when combined with the rstatix package.

Here we use rstatix to create an object with relevant statistics for our desired comparison, and then we add x and y position information. Check out the many other functions rstatix has to add information to this object (adjusted p-values, other statistical tests, etc.).

library(rstatix)
# https://rpkgs.datanovia.com/rstatix/

stat_test <- t_test(patients_clean, Height ~ Sex)
stat_test <- add_xy_position(stat_test, x = "Sex", dodge = 0.8)

data.frame(stat_test)  # show object as dataframe
##      .y. group1 group2 n1 n2 statistic       df        p y.position
## 1 Height   Male Female 45 55  14.01073 60.55786 1.16e-20    187.278
##         groups xmin xmax
## 1 Male, Female    1    2
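
The non-integer degrees of freedom in the output above suggest a Welch (unequal variance) t-test. For comparison, here is a minimal base R sketch of the same kind of test, using the built-in ToothGrowth data as a stand-in for patients_clean (an assumption).

```r
# Welch two-sample t-test: var.equal = FALSE is the default in base R's t.test()
welch <- t.test(len ~ supp, data = ToothGrowth, var.equal = FALSE)

welch$statistic  # t statistic
welch$parameter  # degrees of freedom (typically non-integer for Welch)
welch$p.value    # p-value
```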

Displaying stats on the plot

The output from the rstatix functions can then be used in the stat_pvalue_manual function from ggpubr to add the p-value.

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Sex, y = Height)) + geom_boxplot()

pcPlot + stat_pvalue_manual(stat_test, label = "p")

Displaying stats for grouped data

By grouping the initial dataframe, we can look for differences between smokers and non-smokers within each sex. We also add an adjusted p-value.

grouped_data <- group_by(patients_clean, Sex)
stat_test_grp <- t_test(grouped_data, formula = Height ~ Smokes)
stat_test_grp <- adjust_pvalue(stat_test_grp, method = "BH")
stat_test_grp <- add_xy_position(stat_test_grp, x = "Sex", dodge = 0.8)

data.frame(stat_test_grp)
##      Sex    .y.     group1 group2 n1 n2  statistic       df      p  p.adj
## 1   Male Height Non-Smoker Smoker 35 10 -0.6542605 14.42848 0.5230 0.5230
## 2 Female Height Non-Smoker Smoker 43 12 -1.8289039 16.20010 0.0859 0.1718
##   y.position       groups x xmin xmax
## 1   187.4928 Non-Smok.... 1  0.8  1.2
## 2   172.0928 Non-Smok.... 2  1.8  2.2
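
The p.adj column can be reproduced with base R's p.adjust(), which is a quick way to see what the Benjamini-Hochberg ("BH") correction does. This sketch simply reuses the two p-values printed above.

```r
# Raw p-values taken from the table above (Male, Female)
raw_p <- c(0.5230, 0.0859)

# Benjamini-Hochberg adjustment across the two tests
adj_p <- p.adjust(raw_p, method = "BH")
adj_p
```

With only two tests, the smaller p-value is doubled and the larger one is left unchanged, which matches the p.adj values in the table.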

Displaying stats for grouped data

We can also modify the label of the p-values by putting the column used for the p-value inside curly brackets {} within a string.

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Sex, y = Height, fill = Smokes)) +
    geom_boxplot()

# stat_test_grp has no Smokes column, so do not inherit the fill aesthetic
pcPlot + stat_pvalue_manual(stat_test_grp, label = "p = {p.adj}", inherit.aes = FALSE)

Summary statistics

Another useful method is stat_summary which allows for a custom statistical function to be performed and then visualized.

The fun parameter specifies a function to apply to the y values for every value of x. In this example we use it to plot the quantiles of the Female and Male Height data.

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Sex, y = Height)) + geom_jitter()
pcPlot + stat_summary(fun = quantile, geom = "point", color = "purple", size = 8)
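
To see the numbers behind such a summary, the same function can be applied per group in base R. This is only a sketch: iris and its Species column stand in for patients_clean and Sex.

```r
# Split the y values by the x grouping, then apply the summary function to each group
by_group <- split(iris$Sepal.Length, iris$Species)
group_quantiles <- lapply(by_group, quantile)

# Each group yields five values: 0%, 25%, 50%, 75% and 100%
group_quantiles$setosa
```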

Marginal plots with ggExtra

Another way to highlight differences between groups is with a marginal plot for the X or Y axis variables.

By default this is a density line, and the groupColour and groupFill arguments carry over the color aesthetic of the main plot.

library(ggExtra)
pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight, color = Sex)) +
    geom_point()
ggMarginal(pcPlot, groupColour = TRUE, groupFill = TRUE)

Marginal plots with ggExtra

We can easily turn this into a histogram and display it for only the X or the Y axis.

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight, color = Sex)) +
    geom_point()
ggMarginal(pcPlot, groupColour = TRUE, groupFill = TRUE, type = "histogram", margins = "x")

Exercise on scales and transformations in ggplot can be found here

Themes


Themes

Themes specify the details of data-independent elements of the plot. This includes titles, background color, text fonts etc.

The graphs created so far have all used the default theme, theme_grey(), but ggplot2 allows a different theme to be specified.

Predefined themes

Predefined themes can be applied to a ggplot2 object using a family of functions theme_style()

Here is a scatter with the default theme…

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight)) +
    geom_point()
pcPlot

…and the same scatter plot with the minimal theme.

pcPlot+theme_minimal()

Predefined themes

Several predefined themes are available within ggplot2 including:

  • theme_bw

  • theme_classic

  • theme_dark

  • theme_gray

  • theme_light

  • theme_linedraw

  • theme_minimal

You can review them here

Theme packages

There are many themes you can gain access to through installing additional packages. Packages such as ggthemes also contain many useful collections of predefined theme_style functions.

Other alternatives include:

  • hrbrthemes - focuses on controlling typography
  • ggthemr - a large collection of themes, including dark options
  • ggpomological - hand-drawn style plots
  • tvthemes - themes based on TV shows

ggthemes

After we have loaded the ggthemes package we can simply use the new themes it makes available, following the same theme_style function convention.

Here is a scatter with the default theme…

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight)) +
    geom_point()
pcPlot

…and the same scatter plot with the Wall Street Journal theme.

install.packages("ggthemes")
library(ggthemes)
pcPlot+theme_wsj()

Custom themes

As well as making use of predefined theme styles, ggplot2 allows for control over the attributes and elements within a plot through a collection of related functions and attributes.

theme() is the global function used to set attributes for the collections of elements/components making up the current plot.

Within the theme functions there are 4 general graphic elements which may be controlled…

  • rect
  • line
  • text
  • title

…and 5 groups of related elements:

  • axis
  • legend
  • strip
  • panel (plot panel)
  • plot (global plot parameters)

Custom themes

These elements may be specified by the use of their appropriate element functions including:

  • element_line()
  • element_text()
  • element_rect()

and additionally element_blank() to set an element to “blank”.
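
As a minimal sketch of how these element functions pair with theme arguments, the example below styles a scatter plot of the built-in mtcars data (standing in for patients_clean). The specific colors and sizes are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

p_styled <- p + theme(
  panel.grid.minor = element_blank(),                      # remove a line element entirely
  panel.background = element_rect(fill = "white",
                                  colour = "grey50"),      # rect element
  axis.line = element_line(colour = "black"),              # line element
  axis.title = element_text(face = "bold", size = 12)      # text element
)
p_styled
```

Each theme argument expects the element function that matches its type, so a background takes element_rect() while a title takes element_text().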

Custom themes

A detailed description of controlling elements within a theme can be seen at the ggplot2 vignette and by typing ?theme into the console.

Customizing your theme

To demonstrate customizing a theme, in the example below we alter one element of theme. Here we will change the text color for the plot.

  • Note because we are changing a text element we use the element_text() function.

A detailed description of which elements are available and their associated element functions can be found by typing ?theme.

pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Height,y=Weight))+
  geom_point()
pcPlot+
  theme(
    text = element_text(color="red")
      )

Customizing your theme

If we wish to rotate the y-axis title, for example to make it horizontal, we can adjust its angle as well.

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight)) +
    geom_point()
pcPlot + theme(text = element_text(color = "red"), axis.title.y = element_text(angle = 0))

Customizing your theme

Finally we may wish to remove the axis lines, set the background of plot panels to be white and give the strips (titles above facets) a cyan background color.

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight)) +
    geom_point() + facet_grid(Sex ~ Smokes)
pcPlot + theme(text = element_text(color = "red"), axis.title.y = element_text(angle = 0),
    axis.line = element_line(linetype = 0), panel.background = element_rect(fill = "white"),
    strip.background = element_rect(fill = "cyan"))

Useful example for legend

A useful example of using the theme can be seen in controlling the legend. By default the legend is placed to the right of the plot.

pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, color = Sex)) +
    geom_point()
pcPlot

Useful example for legend

By modifying the theme we can control the legend positioning.

pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, color = Sex)) +
    geom_point()
pcPlot + theme(legend.position = "left")

Useful example for legend

We can control all aspects of a legend as we can for other theme elements.

pcPlot <- ggplot(data = patients_clean, aes(x = Height, y = Weight, color = Sex)) +
    geom_point()
pcPlot + theme(legend.text = element_text(color = "darkred"), legend.title = element_text(size = 20),
    legend.position = "bottom")
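
The same theme argument can also move the legend above the plot or hide it altogether. A small sketch, using the built-in iris data in place of patients_clean:

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point()

p_top <- p + theme(legend.position = "top")    # legend above the plot
p_none <- p + theme(legend.position = "none")  # legend removed entirely

p_top
p_none
```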

+ and %+replace%

When altering themes we have been using the + operator to add themes, as we would when adding geoms, scales and stats.

When using the + operator:

  • Theme elements specified in the new theme replace elements in the old theme.

  • Theme elements in the old theme which have not been specified in the new theme are maintained.

This makes the + operator useful for building up from old themes.

The + operator

In the example below, we maintain all elements set by theme_bw() but overwrite the theme element attribute of the color of text.

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight)) +
    geom_point() + theme_bw()
pcPlot + theme(text = element_text(color = "red"))

%+replace%

In contrast, %+replace% replaces each specified element in its entirety, regardless of how it was previously specified in the old theme.

When using the %+replace% operator:

  • Theme elements specified in the new theme replace elements in the old theme.

  • Properties of those elements which have not been specified in the new theme are not carried over from the old theme; they are reset to NULL.

oldTheme <- theme_bw()

newTheme_Plus <- theme_bw() + theme(text = element_text(color = "red"))

newTheme_Replace <- theme_bw() %+replace% theme(text = element_text(color = "red"))

oldTheme$text
## List of 11
##  $ family       : chr ""
##  $ face         : chr "plain"
##  $ colour       : chr "black"
##  $ size         : num 11
##  $ hjust        : num 0.5
##  $ vjust        : num 0.5
##  $ angle        : num 0
##  $ lineheight   : num 0.9
##  $ margin       : 'margin' num [1:4] 0points 0points 0points 0points
##   ..- attr(*, "unit")= int 8
##  $ debug        : logi FALSE
##  $ inherit.blank: logi TRUE
##  - attr(*, "class")= chr [1:2] "element_text" "element"
newTheme_Plus$text
## List of 11
##  $ family       : chr ""
##  $ face         : chr "plain"
##  $ colour       : chr "red"
##  $ size         : num 11
##  $ hjust        : num 0.5
##  $ vjust        : num 0.5
##  $ angle        : num 0
##  $ lineheight   : num 0.9
##  $ margin       : 'margin' num [1:4] 0points 0points 0points 0points
##   ..- attr(*, "unit")= int 8
##  $ debug        : logi FALSE
##  $ inherit.blank: logi FALSE
##  - attr(*, "class")= chr [1:2] "element_text" "element"
newTheme_Replace$text
## List of 11
##  $ family       : NULL
##  $ face         : NULL
##  $ colour       : chr "red"
##  $ size         : NULL
##  $ hjust        : NULL
##  $ vjust        : NULL
##  $ angle        : NULL
##  $ lineheight   : NULL
##  $ margin       : NULL
##  $ debug        : NULL
##  $ inherit.blank: logi FALSE
##  - attr(*, "class")= chr [1:2] "element_text" "element"

+ and %+replace%

Original theme

+ and %+replace%

Theme modified with +

+ and %+replace%

Theme modified with %+replace%

This means that %+replace% is most useful when creating new themes.

theme_get and theme_set

In the examples we have shown you we have been modifying the theme for a specific plot. But once you have a theme that you really like you may want it to apply to every plot you draw.

The active theme is automatically applied to every plot you draw. Use theme_get to get the current theme, and theme_set to completely override it.
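
A minimal sketch of this workflow is shown below. It relies on theme_set() returning the previously active theme, which makes it easy to restore afterwards; mtcars is used as stand-in data.

```r
library(ggplot2)

# Make theme_minimal() the active theme and keep the previous one
old_theme <- theme_set(theme_minimal())

# Any plot drawn now uses the new active theme without adding it explicitly
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p

# Restore the previous theme when finished
theme_set(old_theme)
```

Restoring the old theme at the end avoids surprising later plots in the same session.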

Titles and Labels


Adding titles for plot and labels

So far no plot titles have been specified. Plot titles can be specified using the labs function.

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight)) +
    geom_point()
pcPlot + labs(title = "Weight vs Height", x = "Height (cm)")

Adding titles for plot and labels

You can also specify titles using the ggtitle and xlab/ylab functions.

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight)) +
    geom_point()
pcPlot + ggtitle("Weight vs Height") + xlab("Height (cm)")
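
labs() can also set a subtitle, a caption and the title of a legend by naming the relevant aesthetic. The following sketch uses the built-in iris data, and the label text is purely illustrative.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point() +
  labs(
    title = "Petal length vs sepal length",
    subtitle = "Built-in iris data",        # smaller line under the title
    caption = "Illustrative example only",  # text at the bottom of the plot
    x = "Sepal length (cm)",
    y = "Petal length (cm)",
    color = "Iris species"                  # legend title for the color aesthetic
  )
p
```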

Saving Plots


Saving plots

Plots produced by ggplot can be saved in the same way as base plots.

The ggsave() function allows for additional arguments to be specified including the type, resolution and size of plot.

By default ggsave() will use the size of your current graphics window when saving plots, so it is often worth specifying the width and height arguments explicitly.

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight)) +
    geom_point()

ggsave(filename = "anExampleplot.png", plot = pcPlot, width = 15, height = 15, units = "cm")

Saving in pdf format

PDFs are perhaps the most useful format to export into. PDFs are vector-based, so each part of the plot is saved as scalable coordinates as opposed to specific pixels.

PDFs can then be opened in imaging software like Illustrator or Inkscape (a free, open-source equivalent). When you open a PDF in these programs you can fully customize the plots to your aesthetic with a graphical user interface. Furthermore, as they are vector-based, they can be easily assembled into publication quality figures without resolution issues and pixelation.

pdf(file = "anExampleplot.pdf", paper = "A4")
print(pcPlot)
dev.off()
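
ggsave() can also write PDFs, choosing the device from the file extension. The sketch below writes to a temporary directory so it does not clutter the working directory; the file name and the mtcars stand-in data are assumptions for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# The .pdf extension tells ggsave() to use a PDF device
out_file <- file.path(tempdir(), "example_plot.pdf")
ggsave(filename = out_file, plot = p, width = 15, height = 15, units = "cm")

file.exists(out_file)
```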

Other external packages


Extending ggplot

We have introduced you to several packages that extend the capabilities of ggplot2 either by adding additional plotting capabilities or providing pre-built color/theme options.

There are also several packages that specialize in creating specific plots or additional geoms not included within ggplot2, e.g. UpSetR, ggbeeswarm and ggridges.

We will quickly review UpSet plots.

Upset Plots

Upset plots are used for looking at overlaps between sets. They are often considered a better way to visualize such data than the more traditionally used Venn diagram.

We can use the package's demo data, which records the presence or absence of mutants in different clones.

library(UpSetR)
mutations <- read.csv(system.file("extdata", "mutations.csv", package = "UpSetR"),
    header = TRUE, sep = ",")
mutations[1:10, 1:10]
##    Identifier TTN PTEN TP53 EGFR MUC16 FLG RYR2 PCLO PIK3R1
## 1     02-0003   0    0    1    1     0   0    0    0      1
## 2     02-0033   0    0    1    0     0   0    0    0      0
## 3     02-0047   0    0    0    0     0   0    1    0      0
## 4     02-0055   1    1    1    0     0   0    0    0      0
## 5     02-2470   0    1    0    0     0   0    1    0      0
## 6     02-2483   0    0    1    0     0   0    0    1      0
## 7     02-2485   0    0    1    1     1   1    0    0      0
## 8     02-2486   0    0    0    0     0   0    0    0      0
## 9     06-0119   1    0    0    0     0   0    0    0      0
## 10    06-0122   1    0    0    0     0   0    1    1      0

Upset Plots

The Upset Plot will show us the number of intersections between different groups. For example EGFR and TTN only are present in 12 clones. The size of the set is also shown on the left to give additional context. This gives a good quantitative basis from which to understand set intersections.

upset(mutations)

Upset Plots customization

You may have noticed that only a few genes are present. This is because by default only the 5 biggest sets are included. We can easily increase this number using the nsets parameter, or directly specify which sets to include using the sets parameter.

upset(mutations, nsets = 10)
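
To pick particular sets by name rather than by size, upset() also accepts a sets argument. A short sketch follows; the four genes are chosen arbitrarily from the columns shown earlier.

```r
library(UpSetR)

mutations <- read.csv(system.file("extdata", "mutations.csv", package = "UpSetR"),
                      header = TRUE, sep = ",")

# Column names of the sets we want to compare (an arbitrary selection)
chosen_sets <- c("TTN", "PTEN", "TP53", "EGFR")

# order.by = "freq" sorts the intersections from largest to smallest
upset(mutations, sets = chosen_sets, order.by = "freq")
```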

UpsetR and ggplot2

Though ggplot2 is being used under the hood, the ggplot identity is lost in the final plot. Instead we have made an upset object. Unfortunately this means we cannot modify our upset plots using what we have learnt. Luckily the upset function has lots of arguments to allow for customization.

myupset <- upset(mutations)
class(myupset)
## [1] "upset"

Modifying plots from other packages

As with UpSetR, many packages make plots using ggplot2. There are >2000 packages on CRAN that use ggplot2 and many more on Bioconductor. If a package is using ggplot2, most of the time the ggplot identity is maintained. This means that you can interact with and modify that plot just as if you had made the ggplot object yourself.

Let's look at some examples. We will look at the popular package: Seurat.

Seurat

Seurat is one of the most popular packages for analyzing scRNAseq data. Often we look at the dimension reduction (tSNE or UMAP typically).

library(Seurat)
data("pbmc_small")
DimPlot(object = pbmc_small)

Seurat

We created this using the DimPlot() function. This is a plotting function specific to Seurat, but it is using ggplot under the hood.

mydimplot <- DimPlot(object = pbmc_small)
class(mydimplot)
## [1] "patchwork" "gg"        "ggplot"

Seurat

As it is a ggplot object, we can customize it. Here we can change the theme, title and colors.

mydimplot + ggtitle("tSNE of scRNAseq - PBMC") + scale_color_viridis_d() + theme_bw()

Seurat

You can typically only modify the scale of variables for which there are aesthetic mappings. So if the function does not specify that aesthetic, e.g. size, we cannot modify it.

Instead we can go back and redefine our ggplot. The information regarding setup can be found in the layers slot of the ggplot object.

mydimplot$layers
## [[1]]
## mapping: colour = ~ident, shape = NULL, alpha = NULL, x = ~tSNE_1, y = ~tSNE_2 
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity

Seurat

We can use the layers information to recreate the original ggplot, and then add some modifications, e.g. a size argument. We must also remove the original plot stored in the layers slot first.

mydimplot$layers <- NULL

# size is a fixed value, so it is set outside aes() to avoid creating a size legend
mydimplot + geom_point(aes(x = tSNE_1, y = tSNE_2, color = ident), size = 2) + ggtitle("tSNE of scRNAseq - PBMC") +
    scale_color_viridis_d() + theme_bw()

Interactive Plots


Interactive Plots

Most of the time when we are drawing plots the aim is to generate a saved file we can use for figures later. Sometimes though we may want to generate an interactive plot. This can be really useful to help parse our data, or if we want to generate dynamic reports to share our work.

library(plotly)

Interactive Plots

pcPlot <- ggplot(data = patients_clean, mapping = aes(x = Height, y = Weight)) +
    geom_point()

ggplotly(pcPlot)

Adding Annotation

To show how we add additional annotation or work with more complex examples, we have a ggplot object to load in. You can use the load() function to read in this object: data/pcPlot.RData

load("data/pcPlot.RData")

Review the data

We can quickly check what is in the data.

head(pcPlot$data)
##           PC1       PC2         PC3        PC4         PC5        PC6 Sample
## A_1 0.6367280 0.6163906  0.08484202 -0.2844121  0.07083379 -0.5845692    A_1
## A_2 0.6090663 0.5384380  0.51330734 -0.4480145 -0.36444039  0.4824065    A_2
## B_1 0.4596307 0.5424412 -0.20859717  0.1143239  0.67195612  0.5775545    B_1
## B_2 0.4190463 0.4812430  0.49898282  0.7603890  0.04205184 -0.2294964    B_2
## C_1 0.3220796 0.5434719 -0.59606139  0.2760484 -0.64316800  0.1058694    C_1
## C_2 0.3130591 0.4426442 -0.31284107 -0.3514908  0.21796029 -0.3125952    C_2
##     Time Rep
## A_1  0hr   1
## A_2  0hr   2
## B_1  2hr   1
## B_2  2hr   2
## C_1 12hr   1
## C_2 12hr   2

Review the data

We can also check the aesthetic mappings, and any layers already in the ggplot object.

pcPlot$mapping
## Aesthetic mapping: 
## * `x`      -> `PC1`
## * `y`      -> `PC2`
## * `colour` -> `Time`
## * `shape`  -> `Rep`
pcPlot$layers
## [[1]]
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity

Let's make a plot

The aesthetic mappings are naturally inherited as labels. If we want to add custom labels we can specify it in our plot.

ggplotly(pcPlot)

Extra Annotation

As the aesthetic mappings are naturally inherited as labels, we can just add an additional aesthetic to show extra information in the label.

ggplotly(pcPlot + geom_point(aes(label = Sample)))
## Warning in geom_point(aes(label = Sample)): Ignoring unknown aesthetics: label

Extra Annotation

If we want full control over the labeling we can instead use the ggplotly function to take care of this for us by specifying when/what/where labels are shown.

ggplotly(pcPlot + geom_point(aes(text = Sample)), source = "select", tooltip = c("Sample"))
## Warning in geom_point(aes(text = Sample)): Ignoring unknown aesthetics: text
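
A common pattern, sketched here with the built-in mtcars data rather than the PCA object above, is to map a column to the unofficial text aesthetic and then ask ggplotly() to show only that aesthetic in the tooltip. The model column is created just for this example.

```r
library(ggplot2)
library(plotly)

# Add the row names as a regular column so they can be mapped (illustrative)
cars <- mtcars
cars$model <- rownames(cars)

# Mapping text in ggplot() makes it available to the interactive tooltip
p <- ggplot(cars, aes(x = wt, y = mpg, text = model)) + geom_point()

# tooltip = "text" limits the hover label to the text aesthetic
widget <- ggplotly(p, tooltip = "text")
widget
```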

Exercises on themes, saving and interactive plots in ggplot can be found here

Answers on themes, saving and interactive plots in ggplot can be found here

References

Contact

For any suggestions, comments, edits or questions (about content or the slides themselves), please raise an issue on our GitHub.