Plotting in R

Set Up

All prerequisites, links to material and slides for this course can be found on github.

Plotting_In_R

Or can be downloaded as a zip archive from here.

Download zip

Course content

Once the zip file in unarchived. All presentations as HTML slides and pages, their R code and HTML practical sheets will be available in the directories underneath.

presentations/slides/ Presentations as an HTML slide show.
presentations/singlepage/ Presentations as an HTML single page.
presentations/r_code/ R code in presentations.
exercises/ Practicals as HTML pages.
answers/ Practicals with answers as HTML pages and R code solutions.

Set the Working directory

Before running any of the code in the practicals or slides we need to set the working directory to the folder we unarchived.

You may navigate to the unarchived Plotting_In_R folder in the Rstudio menu.

Session -> Set Working Directory -> Choose Directory

or in the console.

setwd("/PathToMyDownload/Plotting_In_R-master/r_course")
# e.g. setwd('~/Downloads/Plotting_In_R-master/r_course')

Introduction

R has excellent graphics and plotting capabilities. In fact this is commonly seen as one of the advantages of R over other competing languages like python and matlab. They are mostly found in following three sources. + base graphics + the lattice package + the ggplot2 package

Base R graphics uses a pen and paper model for plotting while Lattice and ggplot2 packages are built on the routines first used in grid graphics.

A pen and paper model

Once plot is produced, can only add more elements, cannot remove.
Makes it hard to update.
But faster than more complex plotting systems.

Building a new plot is often a stepwise process with gradual addition of features. It will likely require replotting many times.

What do we use?

Base Plot - Quick and easy plots while we are initially reviewing data
ggplot2 - Producing publication quality figures

Scatter and line Charts

First we’ll produce a very simple graph using the values in a numeric vector:

treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)

Scatter Plot

Now we plot the treatment vector with default parameters.

plot(treatment)

Plot Customization

Parameters

Many features of pots can be customized, by providing addtional arguments to the plot() function.

Type

First we can plot treatment using points overlayed by a line. We control this with the type argument.

plot(treatment, type = "o")

Type

To see a complete list we can use ?plot

plot(treatment, type = "l")

plot(treatment, type = "p")

Point size

We can control the size of points in our plot using the cex parameter.

plot(treatment, cex = 2)

plot(treatment, cex = 0.5)

Point shape

We can control the type of points in our plot using the pch parameter.

plot(treatment, pch = 1)

plot(treatment, pch = 20)

Line weight

Similarly when plotting a line we control size with lwd parameter.

plot(treatment, type = "l", lwd = 10)

plot(treatment, type = "l", lwd = 0.5)

Line type

We can also control the type of line with lty parameter.

plot(treatment, type = "l", lty = 1)

plot(treatment, type = "l", lty = 2)

Color

An important parameter we can control is color. We can control color or lines or points using the col argument.

plot(treatment, type = "l", col = "red")

plot(treatment, type = "l", col = "blue")

Title

We add a title with main argument and or a sub-title with the sub argument.

plot(treatment, main = "My Plot", sub = "a plot")

Axis labels

We can customize our x and y axis label with the xlab and ylab arguments respectively.

plot(treatment, xlab = "Position", ylab = "score")

Axis labels

We can control the orientation of labels on axis using las argument.

plot(treatment, las = 1)

plot(treatment, las = 2)

Other parameters

Review ?plot and ?par for complete list of options.

Plot multiple vectors

The plot function vector will accept two vectors to be plotted against each other.

control <- c(0, 20, 40, 60, 80, 100)
plot(treatment, control)

Plot multiple vectors

We often want multiple lines in same plot. So if we want to plot scores for control and treatment against position we will need a new method.

We can add an additional line to our existing plot using the lines() function.

plot(treatment, type = "o", col = "blue")
lines(control, type = "o", pch = 22, lty = 2, col = "red")

Setting plot limits

The new line doesn’t quite fit into our original plot.

We can extend our x or y axis by specifying values to xlim ylim arguments directly.

control <- c(0, 20, 40, 60, 80, 100)
plot(treatment, type = "o", col = "blue", ylim = c(0, 100))
lines(control, type = "o", pch = 22, lty = 2, col = "red")

Defining your limits

Instead of defining the axis limits explicitly we can compute the y-axis values using the range function. This means any updates to our data will be automatically reflected in our graph.

range() returns a vector containing the minimum and maximum of all the given arguments.

Calculate range from 0 to max value of data.

g_range <- range(0, treatment, control)
g_range

## [1]   0 100

Custom axes

To be able to customize axes we need to turn off axes and annotations (axis labels). We will then be able to specify them ourselves. We turn of axis and annotation plotting using axes=FALSE and ann=FALSE

plot(treatment, type = "o", col = "blue", ylim = g_range, axes = FALSE, ann = FALSE)

Creating axes

We can create our own X axis by using the axis() function. We specify the side argument for where to place axis, the at argument to specify where to put axis ticks and lab argument to specify labels for axis ticks.

axis(side = 1, at = 1:6, lab = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))

Creating axes

We can make our y axis with horizontal labels that display ticks at every 20 marks in a similar way.

We specify our side and use seq() function to make axis tick postions for at argument. We can use our y-axis range again to help define how many ticks we need.

axis(2, las = 1, at = seq(0, g_range[2], by = 20))

Framing plots

We can now add a box around our plot using the box() function.

box()

Plot multiple lines

Now I can add my control data using lines argument.

lines(control, type = "o", pch = 22, lty = 2, col = "red")

Legends

Finally we may wish to add a legend to out plot. We can add a legend to current plot using the legend() function.

We need to specify where to place legend in plot, the names in legend to legend argument and any additional point/line type configuration we used e.g the color and shape.

legend("topleft", legend = c("treatment", "control"), col = c("blue", "red"), pch = 21:22,
    lty = 1:2)

Put it all together

Though we can make a okay plot in a single line, this example demonstrates how you can make something custom and ready to share with others. To make that final plot you can see that there are many lines of code we put together.

plot(treatment, type = "o", col = "blue", lwd = 1, ylim = g_range, axes = FALSE,
    ann = FALSE)
axis(1, at = 1:6, lab = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
axis(2, las = 1, at = 20 * 0:g_range[2])
box()

lines(control, type = "o", pch = 22, lty = 2, col = "red", lwd = 2.5)
legend("topleft", legend = c("treatment", "control"), col = c("blue", "red"), pch = 21:22,
    lty = 1:2, lwd = c(1, 2.5))

Colors

Named Colors and hex codes

Most colors can simply be defined by writing them in as a character vector i.e. “green”. There are a wide variety of named colors available in R. From “darkgoldenrod” to “bisque”. And 100 different shades of gray. You can find an extensive list of R colors here.

You can also use hex codes: a hexadecimal format for identifying colors. This gives greater variety of options as they use the full color spectrum. Each pair of characters corresponds to the Red, Green and Blue content for the color i.e. #ffe4c4 (also known as Bisque) is composed of 100% red, 89.4% green and 76.9% blue. Resources like this color picker can be used to help you find specific shades, and even create complementary palettes.

Palettes

Palettes are prebuilt collections of colors. They can consist of a various numbers of colors and can have different properties i.e. continuous/discrete or divergent

Here we can look at the rainbow palette that comes with R. Often the palettes are functions where we can simply ask for how many colors we want and it pull appropriate numbers back.

rainbow(2)

## [1] "#FF0000" "#00FFFF"

Palettes

Rainbow is continuous so we it will pull as many colors as you ask from the color space, along a specific gradient defined by the palette.

rainbow(6)

## [1] "#FF0000" "#FFFF00" "#00FF00" "#00FFFF" "#0000FF" "#FF00FF"

Palettes

Many people have created there own palette packages along many themes including famous artworks, Wes Anderson movies and Birds. Paleteer is a package in which many different palettes across many themes are collated. In this case we can grab a discrete palettes. This means it will have a limited number of options.

library(paletteer)
my_colors <- paletteer_d("wesanderson::Zissou1")

my_colors

## <colors>
## #3B9AB2FF #78B7C5FF #EBCC2AFF #E1AF00FF #F21A00FF

Palettes

R color brewer is another popular option. It has many built-in palettes. What makes this package special though is the ability to customize continuous color palettes with the colorRampPalette() function.

library(RColorBrewer)

my_pal <- colorRampPalette(c("Red", "White", "Blue"))(25)

Color blindness + Uniform Perception

Using custom colors is great and can really help a make a piece of work cohesive and stand out. But you have to be careful.

~4% of people are color blind. In white males this number raises to ~10%. Considering the demographics in science, there will likely be someone with color blindness in your meeting.

Furthermore, when we pick gradients the ability to see patterns in the data varies depending on the color scales used, even in sighted people.

Color blindness + Uniform Perception

palettes

(Crameri et al, 2020)

Viridis

The viridis color palette was designed by color scientists to be perceptually uniform, to have a wide dynamic range, to be accessible to the various forms of color blindness and to work even when converted into gray scale.

install.packages("viridis")
library(viridis)
viridis(5)

## [1] "#440154FF" "#3B528BFF" "#21908CFF" "#5DC863FF" "#FDE725FF"

viridis_palettes

Viridis alternatives

Viridis gradients are great, but there is not an good divergent color scheme.

The khroma package is a similarly derived collection of colors, with even more options like divergent palettes.

Good data visualization

There are often a trade offs in creating good plots.

Is it easy to digest and accessible to everyone?
Is it engaging and appealing?
Does it contain all the information with nothing superfluous?
Is it the best way to tell the story I want to tell, without being misleading?

Fundamentals of Data Visualization by Claus O. Wilke is a good resource on the theory of making data visualizations the right way.

Bar Charts

Base graphics has a useful built in function for bar charts. The barplot() function. We can simply pass our numeric vector to this function to get our barchart.

barplot(treatment)

Labels

The barplot() function hasn’t added any labels by default. We can speciy our own however using the names.arg argument. names.arg is a vector of names to be plotted below each bar or group of bars.

barplot(treatment, names.arg = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))

Labels

If my vector was named however, then my vectors names would be used for labels. We use names() function to add names to our vector then we replot.

names(treatment) <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
barplot(treatment)

Stacking

Sometimes you want to have several data series stacked in a single barplot. The barplot() function handles this readily.

Let’s read the data from the example_plot.txt data file.

Read values from comma-seperated example_plot.txt

data <- read.table("data/example_plot.txt", header = T, row.names = 1, sep = ",")

Stacking

To build a stacked barplot we need to give the barplot funcion a matrix. We can use as.matrix() function to convert our data frame to a matrix.

barplot(as.matrix(data))

Grouping

Now we can plot data from a matrix with grouped barchart using the beside argument.

barplot(as.matrix(data), beside = TRUE)

Customization

Though a different function to plot(), barplot can be customized in much the same way. Most of the parameters have the same names.

barplot(as.matrix(data), main = "Daily progression of X in\nControl and Treatment",
    ylab = "Total", beside = TRUE, col = viridis(6))
legend("topleft", c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat"), cex = 0.8, fill = viridis(6))

Histograms and Density Plots

Histograms

Base graphics has a useful built-in function for histograms too. This is the hist() function, which just needs a numeric vector.

hist(treatment)

Customization

Similar customization exists as for other plots.

hist(treatment, col = "lightblue", ylim = c(0, 5), cex.main = 0.8)

Breaks

We can create more fine grained histogram by specify the number of required bins to the breaks argument.

hist(treatment, col = "lightblue", ylim = c(0, 5), cex.main = 0.8, breaks = 2)

hist(treatment, col = "lightblue", ylim = c(0, 5), cex.main = 0.8, breaks = 10)

Density Plot

Unlike histograms, there is not a density plot function within base R. Instead we call the density() to calculate the density values. We can then just pass this straight to the basic plot function.

d <- density(treatment)  # returns the density data
plot(d)

Density Plot

Similar to bins, we can alter the resolution using the width argument.

d <- density(treatment, width = 20)  # returns the density data
plot(d)

Dot Charts

Dot charts

Base graphics also has a dotchart() function. Dot charts help compare paired data. First though we need to modify the matrix, as we are comparing in pairs as opposed to all control versus treatment.

We use the function t to return the transpose of a matrix. This means rows are now columns and the columns are now rows.

dotchart(t(data))

Customization

Again we can use the arguments to modify the layout and appearance.

Now we create a colored dotchart for autos with smaller labels.

dotchart(t(data), color = c("red", "blue"), main = "Dotchart", cex = 0.5)

Box Plots

Box plots

The final plot we will look at is a box and whisker plot.

Boxplots allow you to quickly review data distributions, showing the median and 1st/3rd quartile.

Read in bigger data

First lets read in the gene expression data. This a matrix where each column corresponds toa group of observations that we want to make a single boxplot.

exprs <- read.delim("data/gene_data.txt", sep = "\t", h = T, row.names = 1)
head(exprs)

##                    Untreated1 Untreated2  Treated1   Treated2
## ENSDARG00000093639  0.8616832  1.9311442 0.1041508 0.14055604
## ENSDARG00000094508  0.9857575  2.0256352 0.1549917 0.20301609
## ENSDARG00000095893  0.8498889  1.9875580 0.2317969 0.20925123
## ENSDARG00000095252  0.9242996  2.0857620 0.2562264 0.24669079
## ENSDARG00000078878  0.3571734  0.4653908 0.1167221 0.09710237
## ENSDARG00000079403  1.0604071  1.2581398 0.3884836 0.31567299

Boxplots

Now we can use the boxplot() function on our data.frame to get our boxplot

boxplot(exprs)

Rescaling

Perhaps it would look better on a log scale. We can add addition colors and labels as with other plots.

boxplot(log2(exprs), ylab = "log2 Expression", col = c("red", "red", "blue", "blue"))

Heatmaps

Heatmaps are really easy to make with the heatmap() function. The input required is a numeric matrix, so we can use our expression data set again. But first we must coerce it from a data frame.

exprs_mat <- as.matrix(exprs)
heatmap(exprs_mat, cexCol = 0.75)

Heatmaps

An issue with this heatmap is there no color scale bar. Unfortunately there is no argument or straightforward alternative for adding it to our plot. Luckily there are specialist packages that compensate for some of the holes in base R.

Going Beyond Base Plotting

Specialist packages

There are many packages that extend the capabilities of base R. We have mentioned ggplot2 which is a whole different philosophy. But there are many packages that focus on making specific plot types that aren’t handled well in base R:

Heatmaps
Violin Plots
Venn Diagrams/Euler Plots

Heatmaps

There are several options for making heatmaps in R beyond base R. We will show you pheatmap. The result looks similar, but unlike base heatmap, pheatmap does not scale by row.

library(pheatmap)
pheatmap(exprs_mat)

Heatmaps

We can turn the scaling on using the scale argument. This will do a by row Zscore. Lets also turn the rownames off as we have too many to read them properly.

library(pheatmap)
pheatmap(exprs_mat, scale = "row", show_rownames = F)

Heatmaps

We can use our knowledge of color palettes to customize the heatmaps. In this case we are recoloring it so the 0 values are white.

To do this we need to specify the number of colors and the breaks for those colors. The breaks are the numeric value for each color on the gradient.

library(RColorBrewer)

my_pal <- colorRampPalette(c("Blue", "White", "Red"))(60)
my_breaks = seq(-1.5, 1.5, 0.05)

Heatmaps

We can give our new colors and breaks to the pheatmap function.

pheatmap(exprs_mat, scale = "row", show_rownames = F, breaks = my_breaks, color = my_pal)

Violin Plots

Violin plots are often preferred to box plots as they show the underlying distribution.

library(vioplot)

vioplot(log2(exprs), main = "distribution of log2(expression)")

Venn Diagram

We can show overlaps between groups by creating a venn diagram. We can use the eulerr package to create a variant of this, the Euler plot. This provides a good approximation of the relative sizes of overlaps, so the plot is more visually informative.

library(eulerr)
mat <- cbind(A = sample(c(TRUE, TRUE, FALSE), 50, TRUE), B = sample(c(TRUE, FALSE),
    50, TRUE), C = sample(c(TRUE, FALSE, FALSE, FALSE), 50, TRUE))
head(mat)

##          A     B     C
## [1,] FALSE FALSE FALSE
## [2,]  TRUE FALSE FALSE
## [3,]  TRUE  TRUE FALSE
## [4,] FALSE  TRUE FALSE
## [5,]  TRUE  TRUE  TRUE
## [6,]  TRUE  TRUE FALSE

Venn Diagram

First we cacluate the euler fit with the euler() function. Then we can use plot() to make the plot.

fit2 <- euler(mat)
plot(fit2)

Combining Plots

R makes it easy to combine multiple plots into one overall graph, using either the par( ) or layout( ) function. With the par( ) function, you can include the option mfrow=c(nrows, ncols) to create a matrix of nrows x ncols plots that are filled in by row. mfcol=c(nrows, ncols) fills in the matrix by columns.

Define a layout with 2 rows and 2 columns

par(mfrow = c(2, 2))

Combining Plots

Plot histograms for different columns in the data frame separately. This is not very efficient.

par(mfrow = c(2, 2))
hist(exprs$Untreated1)
hist(exprs$Untreated2)
hist(exprs$Treated1)
hist(exprs$Treated2)

Combining Plots

You could also do it more efficiently using a for loop.

par(mfrow = c(2, 2))
for (i in 1:4) {
    hist(exprs[, i])
}

Other parameter options

The par() function can control a variety of other graph parameters.

mar - size of plot margins
mgp - spacing between margin elements i.e. axis labels and titles
fig - dimensions of whole plot

Other Customizations

Text

Custom text can be added to you plot using the text() function. Simply provide the position and the label.

You can use the data itself to label data points. The adj argument allows you to nudge the annotation a constant amount away from the defined position.

Any labels to be added to the margin need to use mtext() instead.

plot(control, treatment)
text(20, 60, "THIS IS MY PLOT", col = "red")
text(control, treatment, letters[1:6], adj = c(0, -1), col = "blue")

Lines

abline() allows you to add specific straight lines. This is often useful to help demonstrate known linear relationships or thresholds as reference points for your data. * h = horizontal line with y-intercept * v = vertical line with x-intercept * a,b = intercept and slope

plot(control, treatment)
abline(h = 10, col = "blue")
abline(v = 50, col = "red", lwd = 2)
abline(a = 0, b = 1, lty = 2)

Shapes

polygon() allows you to draw specific polygons. You just need to give it the coordinates of each vertex. Again this is often to highlight specific parts of the plot. This can be filled, or if you give the density argument there will be a hash fill.

plot(control, treatment)
polygon(c(50, 50, 100, 100), c(50, 80, 80, 50), col = "gray", density = 5)

Saving Plots

Saving your plots

There are many different ways of saving your plots in R.

The easiest way is to use the export button in the plot pane in RStudio. This is not good reproducible practice though as the code is not tied to the plot.

To save plots through the console, the argument you would need is name of file in which you want to save the plot. Plotting commands then can be entered as usual. The output is then redirected to the file, instead of the plotting window.

When you’re done with your plotting commands, enter the dev.off() command to close the file.

jpeg(filename, width = 480, height = 480, units = "px", pointsize = 12, quality = 75)
plot(control)
dev.off()

Saving in pdf format

PDFs are maybe the most useful format to export into. PDFs are vector-based so each part of the plot is saved as scalable cooridnates as opposed to specific pixels.

PDFs can then be opened in imaging software like illustrator or inkscape (this is a open source and free equivalent). When you open a PDF in these programs you can fully customize the plots to your aesthetic with a graphic user interface. Furthermore as they are vector-based, they can be easily assembled into publication quality figures without resolution issues and pixelation.

pdf(file = "control.pdf", paper = "A4")
plot(control)
dev.off()

Exercises on base plotting can be found here

Answers for base plotting can be found here

Help while plotting

Data visualization theory - Fundamentals of Data Visualization
Example plots - R Graph Gallery

Contact

Any suggestions, comments, edits or questions (about content or the slides themselves) please reach out to our GitHub and raise an issue.

Introduction to Plotting

Rockefeller University, Bioinformatics Resource Centre

http://rockefelleruniversity.github.io/Plotting_In_R/

Plotting in R

Set Up

Course content

Set the Working directory

Introduction

A pen and paper model

What do we use?

Scatter and line Charts

Scatter and line Charts

Scatter Plot

Plot Customization

Parameters

Type

Type

Point size

Point shape

Line weight

Line type

Color

Title

Axis labels

Axis labels

Other parameters

Plot multiple vectors

Plot multiple vectors

Setting plot limits

Defining your limits

Custom axes

Creating axes

Creating axes

Framing plots

Plot multiple lines

Legends

Put it all together

Colors

Named Colors and hex codes

Palettes

Palettes

Palettes

Palettes

Color blindness + Uniform Perception

Color blindness + Uniform Perception

Viridis

Viridis alternatives

Good data visualization

Bar Charts

Bar Charts

Labels

Labels

Stacking

Stacking

Grouping

Customization

Histograms and Density Plots

Histograms

Customization

Breaks

Density Plot

Density Plot

Dot Charts

Dot charts

Customization

Box Plots

Box plots

Read in bigger data

Boxplots

Rescaling

Heatmaps

Heatmaps

Heatmaps

Going Beyond Base Plotting

Specialist packages

Heatmaps

Heatmaps

Heatmaps

Heatmaps

Violin Plots