Plotting in R


Set Up

All prerequisites, links to material and slides for this course can be found on GitHub.

They can also be downloaded as a zip archive from here.

Course content

Once the zip file is unarchived, all presentations (as HTML slides and single pages), their R code, and the HTML practical sheets will be available in the directories underneath.

  • presentations/slides/ Presentations as an HTML slide show.
  • presentations/singlepage/ Presentations as an HTML single page.
  • presentations/r_code/ R code in presentations.
  • exercises/ Practicals as HTML pages.
  • answers/ Practicals with answers as HTML pages and R code solutions.

Set the Working directory

Before running any of the code in the practicals or slides we need to set the working directory to the folder we unarchived.

You can navigate to the unarchived Plotting_In_R folder using the RStudio menu.

Session -> Set Working Directory -> Choose Directory

or in the console.

setwd("/PathToMyDownload/Plotting_In_R-master/r_course")
# e.g. setwd("~/Downloads/Plotting_In_R-master/r_course")

Introduction

R has excellent graphics and plotting capabilities. In fact, this is commonly seen as one of the advantages of R over competing languages such as Python and MATLAB. These capabilities are mostly found in the following three sources:

  • base graphics
  • the lattice package
  • the ggplot2 package

Base R graphics uses a pen and paper model for plotting, while the lattice and ggplot2 packages are built on the routines first used in grid graphics.

A pen and paper model

  • Once plot is produced, can only add more elements, cannot remove.
  • Makes it hard to update.
  • But faster than more complex plotting systems.

Building a new plot is often a stepwise process with gradual addition of features. It will likely require replotting many times.
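
The pen and paper model can be seen in a minimal sketch like the one below. It uses a made-up vector y that is not part of the course data, and pdf(NULL) is used only so the sketch runs without opening a window or writing a file. Each call after plot() draws on top of what is already there; to get rid of the points, the only option is to call plot() again and rebuild the figure.

```r
pdf(NULL)                      # null device: draw without creating a file
y <- c(2, 4, 3, 6, 5)          # made-up example values (assumption)

plot(y, type = "l")            # step 1: start a new plot
points(y, pch = 19)            # step 2: add points on top of the line
abline(h = mean(y), lty = 2)   # step 3: add a dashed reference line at the mean

# Nothing drawn above can be erased. To drop the points we replot from scratch.
plot(y, type = "l")
abline(h = mean(y), lty = 2)

invisible(dev.off())           # close the device
```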

Scatter and line Charts

First we’ll produce a very simple graph using the values in a numeric vector:

treatment <- c(0.02,1.8, 17.5, 55,75.7, 80)

Scatter Plot

Now we plot the treatment vector with default parameters.

plot(treatment)

Plot Customization


Type

First we can plot treatment using points overlaid by a line. We control this with the type argument.

plot(treatment, type="o")

Type

To see the complete list of accepted values for type we can use ?plot

plot(treatment, type="l")

plot(treatment, type="p")
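
To compare several type values side by side, a loop along the following lines may help. It is only a sketch: the selection of types is arbitrary, par(mfrow=...) (covered under Combining Plots) is used to place the panels on one page, and pdf(NULL) is there so nothing is written to disk.

```r
pdf(NULL)                                   # null device, no file is created
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)

# A few of the values documented in ?plot:
# points, lines, both, overplotted, histogram-like bars, stair steps
plot_types <- c("p", "l", "b", "o", "h", "s")

op <- par(mfrow = c(2, 3))                  # 2 x 3 grid of panels
for (tp in plot_types) {
  plot(treatment, type = tp, main = paste0('type = "', tp, '"'))
}
par(op)                                     # restore the previous layout
invisible(dev.off())
```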

Title

We add a title with the main argument and a subtitle with the sub argument.

plot(treatment, main="My Plot", sub="a plot")

Axis labels

We can customize our x and y axis labels with the xlab and ylab arguments respectively.

plot(treatment, xlab="Position", ylab="score")

Axis labels

We can control the orientation of the tick labels on each axis using the las argument.

plot(treatment, las=1)

plot(treatment, las=2)

Point size

We can control the size of points in our plot using the cex parameter.

plot(treatment, cex=2)

plot(treatment, cex=0.5)

Point shape

We can control the shape of points in our plot using the pch parameter.

plot(treatment, pch=1)

plot(treatment, pch=20)
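
The built-in symbols are numbered 0 to 25, so a quick reference chart can be drawn with a sketch like this one. The layout choices (label position, sizes) are arbitrary, and pdf(NULL) is used only to avoid writing a file.

```r
pdf(NULL)                          # null device, no file is created
shapes <- 0:25                     # the built-in plotting symbols

# Draw each symbol once along a single row
plot(shapes, rep(1, length(shapes)), pch = shapes, cex = 1.5,
     ylim = c(0.5, 1.5), yaxt = "n", xlab = "pch value", ylab = "")

# Print the pch number underneath each symbol
text(shapes, 0.8, labels = shapes, cex = 0.7)
invisible(dev.off())
```

Symbols 21 to 25 can additionally be filled with a separate color through the bg argument.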

Line weight

Similarly, when plotting a line we control its thickness with the lwd parameter.

plot(treatment, type="l",lwd=10)

plot(treatment, type="l",lwd=0.5)

Line type

We can also control the type of line with lty parameter.

plot(treatment, type="l",lty=1)

plot(treatment, type="l",lty=2)

Color

An important parameter we can control is color. We can control the color of lines or points using the col argument.

plot(treatment, type="l", col="red")

plot(treatment, type="l", col="dodgerblue")

Choosing colors

You can find an extensive list of R colors here.

R color Chart
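
The named colors can also be listed from within R. The sketch below uses colors() to get every built-in name, grep() to narrow the list down (the search term "blue" is just an example), and col2rgb() to look up the red, green and blue components of a name.

```r
all_colors <- colors()             # every color name R recognizes
head(all_colors)

# Example search: names containing "blue"
blues <- grep("blue", all_colors, value = TRUE)
head(blues)

# Red, green and blue components (0-255) of a named color
col2rgb("dodgerblue")
```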

Other parameters

Review ?plot and ?par for a complete list of options.

Plot multiple vectors

The plot() function will accept two vectors to be plotted against each other. The first is drawn on the x axis and the second on the y axis.

control <- c(0, 20, 40, 60, 80,100)
plot(treatment,control)

Plot multiple vectors

We often want multiple lines in the same plot. To plot the scores for both control and treatment against position we need a different approach.

We can add an additional line to our existing plot using the lines() function.

plot(treatment, type="o", col="blue")
lines(control, type="o", pch=22, lty=2, col="red")

Setting plot limits

The new line doesn’t quite fit into our original plot.

We can extend our x or y axis by passing values to the xlim or ylim arguments directly.

control <- c(0, 20, 40, 60, 80,100)
plot(treatment, type="o", col="blue",ylim=c(0,100))
lines(control, type="o", pch=22, lty=2, col="red")

Defining your limits

Instead of defining the axis limits explicitly we can compute the y-axis values using the range function. This means any updates to our data will be automatically reflected in our graph.

range() returns a vector containing the minimum and maximum of all the given arguments.

Calculate range from 0 to max value of data.

g_range <- range(0, treatment, control)
g_range
## [1]   0 100
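
To illustrate why computed limits are convenient, the sketch below adds a hypothetical seventh control measurement (the value 120 is invented for this example) and recomputes the range. The plot limits follow the data without any hard-coded numbers having to change.

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)
control <- c(0, 20, 40, 60, 80, 100)

# Hypothetical update: one extra control value above the old maximum
control_updated <- c(control, 120)

g_range_updated <- range(0, treatment, control_updated)
g_range_updated                    # 0 120

pdf(NULL)                          # null device, no file is created
plot(treatment, type = "o", col = "blue",
     xlim = c(1, length(control_updated)), ylim = g_range_updated)
lines(control_updated, type = "o", pch = 22, lty = 2, col = "red")
invisible(dev.off())
```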

Custom axes

To be able to customize axes we need to turn off axes and annotations (axis labels). We will then be able to specify them ourselves. We turn off axis and annotation plotting using axes=FALSE and ann=FALSE.

plot(treatment, type="o", col="blue", 
     ylim=g_range, axes=FALSE, ann=FALSE)

Creating axes

We can create our own x axis by using the axis() function. We specify the side argument for where to place the axis, the at argument for where to put the axis ticks, and the labels argument for the tick labels.

axis(side=1, at=1:6, labels=c("Mon","Tue","Wed","Thu","Fri","Sat"))

Creating axes

We can make our y axis with horizontal labels that display ticks at every 20 marks in a similar way.

We specify our side and use the seq() function to make axis tick positions for the at argument. We can use our y-axis range again to help define how many ticks we need.

axis(2, las=1, at=seq(0,g_range[2],by=20))

Framing plots

We can now add a box around our plot using the box() function.

box()

Plot multiple lines

Now I can add my control data using the lines() function.

lines(control, type="o", pch=22, lty=2, col="red")

Legends

Finally we may wish to add a legend to our plot. We can add a legend to the current plot using the legend() function.

We need to specify where to place the legend in the plot, pass the names to the legend argument, and repeat any point or line settings we used, e.g. the color and shape.

legend("topleft",legend=c("treatment","control"),
       col=c("blue","red"), pch=21:22, lty=1:2)

Making plots readable

In our line plot we have already done a good job of making it easier to differentiate the lines as we have different line styles and different shape points.

We can also differentiate the lines by thickness.

Put it all together

To make that final plot you can see that there are many lines of code we put together.

plot(treatment, type="o", col="blue", lwd=1, ylim=g_range,axes=FALSE, ann=FALSE)
axis(1, at=1:6, lab=c("Mon","Tue","Wed","Thu","Fri","Sat"))
axis(2, las=1, at=seq(0, g_range[2], by=20))
box()

lines(control, type="o", pch=22, lty=2, col="red", lwd=2.5)
legend("topleft",legend=c("treatment","control"),col=c("blue","red"), pch=21:22, lty=1:2, lwd=c(1,2.5))

Color blindness and palettes

~4% of people are color blind. In white males this number rises to ~10%. Considering the demographics in science, there will likely be someone with color blindness in your meeting.

Palette packages exist that contain curated collections of colors. These can be themed for anything, from La Croix flavors to Pokemon. A list of palettes can be found here. Some of the more useful palettes are designed to be color blind friendly, like viridis. To get colors from the package you just have to call the function with the number of colors you want.

install.packages('viridis')
library(viridis)
viridis(5)
## [1] "#440154FF" "#3B528BFF" "#21908CFF" "#5DC863FF" "#FDE725FF"
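
If installing a package is not an option, base R (version 3.6.0 and later) provides hcl.colors(), which includes a palette named "viridis". The sketch below is an alternative to, not a replacement for, the viridis package used in this course. It requests three colors and keeps the first two, on the assumption that the palest color at the far end of the scale would be hard to see on a white background.

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)
control <- c(0, 20, 40, 60, 80, 100)

# Base R palette; keep the two darker colors of three
pal <- hcl.colors(3, palette = "viridis")[1:2]

pdf(NULL)                          # null device, no file is created
plot(treatment, type = "o", col = pal[1], ylim = c(0, 100))
lines(control, type = "o", pch = 22, lty = 2, col = pal[2])
legend("topleft", legend = c("treatment", "control"),
       col = pal, pch = 21:22, lty = 1:2)
invisible(dev.off())
```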

Good data visualization

There are often trade-offs in creating good plots.

  • Is it easy to digest and accessible to everyone?
  • Is it engaging and appealing?
  • Does it contain all the information with nothing superfluous?
  • Is it the best way to tell the story I want to tell?

Fundamentals of Data Visualization by Claus O. Wilke is a good resource on the theory of making data visualizations the right way.

Bar Charts

Base graphics has a useful built-in function for bar charts: the barplot() function. We can simply pass our numeric vector to this function to get our bar chart.

barplot(treatment)

Labels

The barplot() function hasn’t added any labels by default. We can specify our own, however, using the names.arg argument. names.arg is a vector of names to be plotted below each bar or group of bars.

barplot(treatment,
        names.arg=c("Mon","Tue","Wed","Thu","Fri","Sat"))

Labels

If my vector were named, however, its names would be used as labels. We use the names() function to add names to our vector and then replot.

names(treatment) <- c("Mon","Tue","Wed","Thu","Fri","Sat")
barplot(treatment)

Stacking

Sometimes you want to have several data series stacked in a single barplot. The barplot() function handles this readily.

Let’s read the data from the example_plot.txt data file. The call below reads it as comma-separated values (sep=",") and uses the first column as row names.

data <- read.table("data/example_plot.txt", header=TRUE, row.names=1, sep=",")

Stacking

To build a stacked barplot we need to give the barplot() function a matrix. We can use the as.matrix() function to convert our data frame to a matrix.

barplot(as.matrix(data))

Grouping

We can instead draw a grouped bar chart from the matrix by setting the beside argument to TRUE.

barplot(as.matrix(data),beside=TRUE)
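
For readers without the data file to hand, the difference between stacked and grouped bars can be sketched with a small matrix built from the treatment and control vectors. The matrix m is a stand-in and its layout (days as rows, one column per series) is an assumption about what example_plot.txt looks like. barplot() also returns the bar midpoints, which shows how the two layouts differ.

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)
control <- c(0, 20, 40, 60, 80, 100)

# Stand-in for the course data (assumed layout: days x series)
m <- cbind(control, treatment)
rownames(m) <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

pdf(NULL)                          # null device, no file is created
op <- par(mfrow = c(1, 2))
mid_stacked <- barplot(m, main = "Stacked")                 # one bar per column
mid_grouped <- barplot(m, beside = TRUE, main = "Grouped")  # one bar per cell
par(op)
invisible(dev.off())

length(mid_stacked)                # number of stacked bars
dim(mid_grouped)                   # bars per group x number of groups
```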

Customization

Though a different function to plot(), barplot can be customized in much the same way. Most of the parameters have the same names.

barplot(as.matrix(data), main="Daily progression of X in\nControl and Treatment",
        ylab="Total", beside=TRUE, col=viridis(6))
legend("topleft", c("Mon","Tue","Wed","Thu","Fri","Sat"), cex=0.8,
       fill=viridis(6))

Histograms

Base graphics has a useful built-in function for histograms too. This is the hist() function, which just needs a numeric vector.

hist(treatment)  

Customization

Similar customization exists as for other plots.

hist(treatment, col="lightblue", ylim=c(0,5),cex.main=0.8)

Breaks

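We can make the histogram coarser or finer by passing the desired number of bins to the breaks argument. R treats a single number as a suggestion only and may choose a nearby number that gives tidy break points.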

hist(treatment, col="lightblue", 
     ylim=c(0,5), cex.main=0.8, 
     breaks = 2)

hist(treatment, col="lightblue", 
     ylim=c(0,5), cex.main=0.8, 
     breaks = 10)
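
When exact bin edges matter, breaks also accepts a vector of break points. The sketch below uses edges every 25 units (an arbitrary choice for this example) and sets plot=FALSE first to inspect the result without drawing, since hist() returns the breaks and counts it used.

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)

# Explicit bin edges: 0-25, 25-50, 50-75, 75-100
edges <- seq(0, 100, by = 25)

h <- hist(treatment, breaks = edges, plot = FALSE)
h$breaks                           # the edges that were used
h$counts                           # 3 0 1 2

pdf(NULL)                          # null device, no file is created
hist(treatment, breaks = edges, col = "lightblue")
invisible(dev.off())
```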

Dot Charts

Base graphics also has a dotchart() function. Dot charts help compare paired data. First, though, we need to reshape the data so that each pair of control and treatment values is grouped together, rather than all control values being grouped against all treatment values.

We use the t() function to return the transpose of a matrix. This means rows are now columns and the columns are now rows.

dotchart(t(data))  

Customization

Again we can use the arguments to modify the layout and appearance.

Now we create a colored dot chart with smaller labels.

dotchart(t(data), color=c("red","blue"),main="Dotchart", cex=0.5)

Box Plots

The final plot we will look at is a box and whisker plot.

Boxplots allow you to quickly review data distributions, showing the median and 1st/3rd quartile.

Read in bigger data

First let’s read in the gene expression data.

exprs <- read.delim("data/gene_data.txt", sep="\t", header=TRUE, row.names=1)
head(exprs)
##                    Untreated1 Untreated2  Treated1   Treated2
## ENSDARG00000093639  0.8616832  1.9311442 0.1041508 0.14055604
## ENSDARG00000094508  0.9857575  2.0256352 0.1549917 0.20301609
## ENSDARG00000095893  0.8498889  1.9875580 0.2317969 0.20925123
## ENSDARG00000095252  0.9242996  2.0857620 0.2562264 0.24669079
## ENSDARG00000078878  0.3571734  0.4653908 0.1167221 0.09710237
## ENSDARG00000079403  1.0604071  1.2581398 0.3884836 0.31567299

Boxplots

Now we can use the boxplot() function on our data.frame to get our boxplot

boxplot(exprs)

Rescaling

Perhaps it would look better on a log scale. We can add additional colors and labels as with other plots.

boxplot(log2(exprs),ylab="log2 Expression",
        col=c("red","red","blue","blue"))
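
The numbers behind each box can be inspected by calling boxplot() with plot=FALSE. The sketch below runs on simulated values, since gene_data.txt may not be to hand; the data frame sim, its column names and the distribution parameters are all invented for illustration.

```r
set.seed(1)
# Simulated stand-in for the expression table (assumption)
sim <- data.frame(Untreated = rlnorm(100, meanlog = 0),
                  Treated   = rlnorm(100, meanlog = -1.5))

# Compute the box statistics without drawing anything
b <- boxplot(log2(sim), plot = FALSE)

# One column per box; rows run from the lower to the upper whisker
rownames(b$stats) <- c("lower whisker", "lower hinge", "median",
                       "upper hinge", "upper whisker")
b$stats
```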

Combining Plots

R makes it easy to combine multiple plots into one overall graph, using either the par( ) or layout( ) function. With the par( ) function, you can include the option mfrow=c(nrows, ncols) to create a matrix of nrows x ncols plots that are filled in by row. mfcol=c(nrows, ncols) fills in the matrix by columns.

Define a layout with 2 rows and 2 columns

par(mfrow=c(2,2))

Combining Plots

Plot histograms for different columns in the data frame separately. This is not very efficient.

par(mfrow=c(2,2))
hist(exprs$Untreated1)
hist(exprs$Untreated2)
hist(exprs$Treated1)
hist(exprs$Treated2)

Combining Plots

You could also do it more efficiently using a for loop.

par(mfrow=c(2,2))
for (i in 1:4) {
  hist(exprs[,i])
}
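
A variation on the loop is to iterate over the column names, so each panel gets a meaningful title, and to restore the previous layout afterwards. This is a sketch on simulated data: the data frame sim and its column names are invented, and pdf(NULL) is used only to avoid writing a file.

```r
set.seed(42)
# Simulated stand-in with four numeric columns (assumption)
sim <- data.frame(Untreated1 = rnorm(50), Untreated2 = rnorm(50),
                  Treated1   = rnorm(50), Treated2   = rnorm(50))

pdf(NULL)                          # null device, no file is created
op <- par(mfrow = c(2, 2))         # par() returns the previous settings
for (nm in names(sim)) {
  hist(sim[[nm]], main = nm, xlab = "value")
}
par(op)                            # back to one plot per page
layout_after <- par("mfrow")
invisible(dev.off())
```

Saving the return value of par() and passing it back later means subsequent plots are not unexpectedly squeezed into the 2 x 2 grid.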

Other parameter options

The par() function can control a variety of other graph parameters.

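  • mar - size of the plot margins, in lines of text, as c(bottom, left, top, right)
  • mgp - spacing between margin elements, i.e. the margin lines used for the axis title, axis labels and axis line
  • fig - position of the figure region within the plotting device, as c(x1, x2, y1, y2)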
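
As a rough illustration of mar and mgp, the sketch below tightens the margins and pulls the axis titles closer to the axes. The specific values are arbitrary choices for this example; R's defaults are mar=c(5.1, 4.1, 4.1, 2.1) and mgp=c(3, 1, 0).

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)

pdf(NULL)                          # null device, no file is created
op <- par(mar = c(4, 4, 2, 1),     # bottom, left, top, right (lines of text)
          mgp = c(2.2, 0.7, 0))    # axis title, tick labels, axis line
plot(treatment, type = "o", xlab = "Position", ylab = "score",
     main = "Tighter margins")
par(op)                            # restore the previous settings
invisible(dev.off())

op$mar                             # the values that were in effect before
```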

Other Customizations


Text

Custom text can be added to your plot using the text() function. Simply provide the position and the label.

You can use the data itself to label data points. The adj argument allows you to nudge the annotation a constant amount away from the defined position.

Any labels to be added to the margin need to use mtext() instead.

plot(control, treatment)
text(20,60, 'THIS IS MY PLOT', col='red')
text(control, treatment, letters[1:6], adj=c(0,-1), col='blue')
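
Since mtext() is mentioned but not shown, here is a minimal sketch of margin text. The label strings and line offsets are made up for illustration. side picks the margin (1 = bottom, 2 = left, 3 = top, 4 = right) and line sets how far out from the plot edge the text sits.

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)
control <- c(0, 20, 40, 60, 80, 100)

pdf(NULL)                          # null device, no file is created
plot(control, treatment)
mtext("Label in the top margin", side = 3, line = 1)
mtext("Right margin", side = 4, line = 0.5, col = "red")
# adj = 0 left-aligns the text along the margin
mtext("Left-aligned note", side = 1, line = 4, adj = 0, cex = 0.8)
invisible(dev.off())
```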

Lines

abline() allows you to add specific straight lines. This is often useful to help demonstrate known linear relationships or thresholds as reference points for your data.

  • h = horizontal line with y-intercept
  • v = vertical line with x-intercept
  • a,b = intercept and slope

plot(control, treatment)
abline(h=10, col='blue')
abline(v=50, col='red', lwd=2)
abline(a=0, b=1, lty=2)

Shapes

polygon() allows you to draw specific polygons. You just need to give it the coordinates of each vertex. Again, this is often used to highlight specific parts of the plot. The polygon can be filled with a solid color, or, if you give the density argument, with hatched shading lines.

plot(control, treatment)
polygon(c(50,50,100,100),c(50,80,80,50), col='gray', density=5)

Saving Plots


Saving your plots

There are many different ways of saving your plots in R.

The easiest way is to use the export button in the plot pane in RStudio. This is not good reproducible practice though as the code is not tied to the plot.

To save plots from the console, first open a graphics device with a function such as bmp(), jpeg(), postscript() or pdf(). The one argument you need is the name of the file in which you want to save the plot. Plotting commands can then be entered as usual, and their output is redirected to the file.

When you’re done with your plotting commands, enter the dev.off() command.

bmp(filename, width = 480, height = 480, units = "px", 
    pointsize = 12)
jpeg(filename, width = 480, height = 480, units = "px", 
     pointsize  = 12, quality = 75)

Saving in bitmap format

bmp(filename = "control.bmp")
plot(control)
dev.off()

Saving in jpeg format

jpeg(filename = "control.jpg", quality = 20)
plot(control)
dev.off()

Saving in postscript format

postscript(file = "control.ps")
plot(control)
dev.off()

Saving in pdf format

PDFs are maybe the most useful format to export into. PDFs are vector-based, so each part of the plot is saved as scalable coordinates as opposed to specific pixels.

PDFs can then be opened in imaging software like Illustrator or Inkscape (which is free and open source). When you open a PDF in these programs you can fully customize the plots to your aesthetic with a graphical user interface. Furthermore, as they are vector-based, they can be easily assembled into publication quality figures without resolution issues and pixelation.

pdf(file = "control.pdf", paper = "A4")
plot(control)
dev.off()
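
The same pattern can be sketched with an explicit page size and a check that the file was written. The file name control_sketch.pdf and the use of tempdir() are choices made for this example so that nothing lands in the working directory; width and height are given in inches.

```r
control <- c(0, 20, 40, 60, 80, 100)

# Write to a temporary location (example file name, not from the course)
out_file <- file.path(tempdir(), "control_sketch.pdf")

pdf(file = out_file, width = 7, height = 5)   # size in inches
plot(control, type = "o", main = "control")
invisible(dev.off())                          # the file is complete only after this

file.exists(out_file)
file.size(out_file)                           # size in bytes
```

Forgetting dev.off() is a common reason for empty or unreadable output files, because the device is still open and later plots keep being sent to it.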

Exercises on base plotting can be found here

Answers for base plotting can be found here

Help while plotting

Contact

For any suggestions, comments, edits or questions (about the content or the slides themselves), please raise an issue on our GitHub.