All prerequisites, links to material and slides for this course can be found on GitHub.
They can also be downloaded as a zip archive from here.
Once the zip file is unarchived, all presentations (as HTML slides and pages), their R code and the HTML practical sheets will be available in the directories underneath.
Before running any of the code in the practicals or slides we need to set the working directory to the folder we unarchived.
You can navigate to the unarchived Plotting_In_R folder using the RStudio menu:
Session -> Set Working Directory -> Choose Directory
or set it in the console:
setwd("/PathToMyDownload/Plotting_In_R-master/r_course")
# e.g. setwd("~/Downloads/Plotting_In_R-master/r_course")
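A quick way to confirm the working directory is set correctly is sketched below. The path is only the example location used above, and the check for a data folder assumes the course files are laid out as the later read commands suggest.

```r
# Hypothetical location: adjust to wherever you unarchived the course.
course_dir <- "~/Downloads/Plotting_In_R-master/r_course"

if (dir.exists(course_dir)) {
  setwd(course_dir)
} else {
  message("Directory not found: ", course_dir)
}

getwd()              # print the current working directory
dir.exists("data")   # TRUE if the course data folder is visible from here
```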
R has excellent graphics and plotting capabilities. In fact, this is commonly seen as one of the advantages of R over competing languages like Python and MATLAB. These capabilities are mostly found in the following three sources:
+ base graphics
+ the lattice package
+ the ggplot2 package
Base R graphics uses a pen-and-paper model for plotting, while the lattice and ggplot2 packages are built on the routines first used in grid graphics.
Building a new plot is often a stepwise process with gradual addition of features. It will likely require replotting many times.
First we’ll produce a very simple graph using the values in a numeric vector:
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)
Now we plot the treatment vector with default parameters.
plot(treatment)
First we can plot treatment using points overlaid by a line. We control this with the type argument.
plot(treatment, type="o")
Other values of type include "l" for lines only and "p" for points only. To see the complete list of types we can use ?plot
plot(treatment, type="l")
plot(treatment, type="p")
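To compare several type values side by side, one option is to loop over them in a grid of panels. This is a small sketch rather than part of the course code; the names types and old_par are introduced here for illustration.

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)

# A few of the type codes documented in ?plot
types <- c("p", "l", "b", "o", "h", "s")

old_par <- par(mfrow = c(2, 3))   # 2 x 3 grid; remember the old setting
for (tp in types) {
  plot(treatment, type = tp, main = paste0('type = "', tp, '"'))
}
par(old_par)                      # restore the previous layout
```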
We can add a title with the main argument and a subtitle with the sub argument.
plot(treatment, main="My Plot", sub="a plot")
We can customize our x and y axis labels with the xlab and ylab arguments respectively.
plot(treatment, xlab="Position", ylab="score")
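We can control the orientation of the axis tick labels using the las argument (0 = parallel to the axis, the default; 1 = always horizontal; 2 = perpendicular to the axis; 3 = always vertical).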
plot(treatment, las=1)
plot(treatment, las=2)
We can control the size of points in our plot using the cex parameter.
plot(treatment, cex=2)
plot(treatment, cex=0.5)
We can control the type of points in our plot using the pch parameter.
plot(treatment, pch=1)
plot(treatment, pch=20)
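The available point symbols can be previewed by plotting each pch code once. The sketch below assumes the standard symbol codes 0 to 25 described in ?points; pch_codes is a name made up for this example.

```r
pch_codes <- 0:25   # symbol codes described in ?points

# Draw every symbol at its own x position, on a single row.
plot(pch_codes, rep(1, length(pch_codes)),
     pch = pch_codes, cex = 2,
     yaxt = "n", xlab = "pch value", ylab = "",
     main = "Point symbols 0 to 25")
```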
Similarly, when plotting a line we control its width with the lwd parameter.
plot(treatment, type="l",lwd=10)
plot(treatment, type="l",lwd=0.5)
We can also control the type of line with the lty parameter.
plot(treatment, type="l",lty=1)
plot(treatment, type="l",lty=2)
An important parameter we can control is color. We can control the color of lines or points using the col argument.
plot(treatment, type="l", col="red")
plot(treatment, type="l", col="dodgerblue")
Review ?plot and ?par for the complete list of options.
The plot function will also accept two vectors to be plotted against each other.
control <- c(0, 20, 40, 60, 80, 100)
plot(treatment, control)
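Color names such as "dodgerblue" come from R's built-in list, which colors() returns. The sketch below also shows two things not covered above, offered as illustration only: colors given as hex strings, and a vector of colors with one entry per point. The name point_cols is made up here.

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)

head(colors())               # first few built-in color names
"dodgerblue" %in% colors()   # check that a name is valid

# One color per point; the last entry is a hex string rather than a name.
point_cols <- c("red", "orange", "gold", "green", "dodgerblue", "#800080")
plot(treatment, pch = 20, cex = 2, col = point_cols)
```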
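When two vectors are supplied, the first is used for the x axis and the second for the y axis. Naming the arguments makes the roles explicit, as in this short sketch.

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)
control   <- c(0, 20, 40, 60, 80, 100)

# Same plot as plot(treatment, control), with the roles spelled out.
plot(x = treatment, y = control)

# Swapping the arguments swaps the axes.
plot(x = control, y = treatment, xlab = "control", ylab = "treatment")
```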
We often want multiple lines in the same plot. To plot the scores for both control and treatment against position, we need a new method.
We can add an additional line to our existing plot using the lines() function.
plot(treatment, type="o", col="blue")
lines(control, type="o", pch=22, lty=2, col="red")
The new line doesn’t quite fit into our original plot.
We can extend our x or y axis by passing values to the xlim and ylim arguments directly.
control <- c(0, 20, 40, 60, 80, 100)
plot(treatment, type="o", col="blue", ylim=c(0,100))
lines(control, type="o", pch=22, lty=2, col="red")
Instead of defining the axis limits explicitly we can compute the y-axis values using the range function. This means any updates to our data will be automatically reflected in our graph.
range() returns a vector containing the minimum and maximum of all the given arguments.
Calculate range from 0 to max value of data.
g_range <- range(0, treatment, control)
g_range
## [1] 0 100
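The benefit of computing the limits is easiest to see when the data change. In this sketch the extra control measurement of 120 is invented purely for illustration.

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)
control   <- c(0, 20, 40, 60, 80, 100)

g_range <- range(0, treatment, control)
g_range                                # 0 100

# Hypothetical update: a seventh control measurement of 120
control_updated <- c(control, 120)
range(0, treatment, control_updated)   # 0 120
```

Any plot that uses ylim=g_range would pick up the new upper limit once g_range is recomputed.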
To be able to customize the axes we need to turn off the default axes and annotations (axis labels). We will then be able to specify them ourselves. We turn off axis and annotation plotting using axes=FALSE and ann=FALSE.
plot(treatment, type="o", col="blue",
ylim=g_range, axes=FALSE, ann=FALSE)
We can create our own x axis by using the axis() function. We specify the side argument for where to place the axis, the at argument for where to put the axis ticks and the labels argument for the tick labels.
axis(side=1, at=1:6, labels=c("Mon","Tue","Wed","Thu","Fri","Sat"))
We can make our y axis with horizontal labels and a tick every 20 units in a similar way.
We specify our side and use the seq() function to make the axis tick positions for the at argument. We can use our y-axis range again to help define how many ticks we need.
axis(2, las=1, at=seq(0,g_range[2],by=20))
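To see exactly which tick positions that seq() call produces, it can be run on its own. The sketch assumes g_range holds the 0 to 100 range computed earlier; tick_positions is a name introduced here.

```r
g_range <- c(0, 100)   # the range computed earlier in the text

tick_positions <- seq(0, g_range[2], by = 20)
tick_positions           # 0 20 40 60 80 100
length(tick_positions)   # 6 ticks
```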
We can now add a box around our plot using the box() function.
box()
Now we can add our control data using the lines() function.
lines(control, type="o", pch=22, lty=2, col="red")
Finally we may wish to add a legend to our plot. We can add a legend to the current plot using the legend() function.
We need to specify where to place the legend in the plot, pass the names to the legend argument, and repeat any point or line configuration we used, e.g. the color and shape.
legend("topleft", legend=c("treatment","control"),
col=c("blue","red"), pch=21:22, lty=1:2)
In our line plot we have already done a good job of making it easier to differentiate the lines as we have different line styles and different shape points.
We can also differentiate the lines by thickness.
To make that final plot you can see that there are many lines of code we put together.
plot(treatment, type="o", col="blue", lwd=1, ylim=g_range,axes=FALSE, ann=FALSE)
axis(1, at=1:6, labels=c("Mon","Tue","Wed","Thu","Fri","Sat"))
axis(2, las=1, at=seq(0, g_range[2], by=20))
box()
lines(control, type="o", pch=22, lty=2, col="red", lwd=2.5)
legend("topleft",legend=c("treatment","control"),col=c("blue","red"), pch=21:22, lty=1:2, lwd=c(1,2.5))
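Because the final plot takes several calls, it can be convenient to wrap them in a function so the whole figure is redrawn with one line. The helper below, plot_scores(), is a hypothetical name and only a sketch of that idea; it reuses the same calls as above.

```r
# Hypothetical helper: redraws the treatment/control line plot in one call.
plot_scores <- function(treatment, control, day_labels) {
  y_range <- range(0, treatment, control)
  plot(treatment, type = "o", col = "blue", lwd = 1,
       ylim = y_range, axes = FALSE, ann = FALSE)
  axis(1, at = seq_along(day_labels), labels = day_labels)
  axis(2, las = 1, at = seq(0, y_range[2], by = 20))
  box()
  lines(control, type = "o", pch = 22, lty = 2, col = "red", lwd = 2.5)
  legend("topleft", legend = c("treatment", "control"),
         col = c("blue", "red"), pch = 21:22, lty = 1:2, lwd = c(1, 2.5))
  invisible(y_range)   # return the y range without printing it
}

treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)
control   <- c(0, 20, 40, 60, 80, 100)
used_range <- plot_scores(treatment, control,
                          c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
```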
~4% of people are color blind. In white males this number rises to ~10%. Considering the demographics in science, there will likely be someone with color blindness in your meeting.
Palette packages exist that contain curated collections of colors. These can be themed for anything, from La Croix flavors to Pokemon. A list of palettes can be found here. Some of the more useful palettes are designed to be color blind friendly, like viridis. To get colors from the package you just have to call the function with the number of colors you want.
install.packages('viridis')
library(viridis)
viridis(5)
## [1] "#440154FF" "#3B528BFF" "#21908CFF" "#5DC863FF" "#FDE725FF"
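The returned hex codes can be passed straight to col. The sketch below redraws the two-line plot with viridis colors; it falls back to base R's hcl.colors() (available from R 3.6) if the viridis package is not installed, and it asks for three colors but uses only the first two, since the pale yellow end of the palette can be hard to see on a white background. Both choices are assumptions made for this example.

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)
control   <- c(0, 20, 40, 60, 80, 100)

# Three palette colors; keep the two darker ones for the lines.
palette3 <- if (requireNamespace("viridis", quietly = TRUE)) {
  viridis::viridis(3)
} else {
  hcl.colors(3, palette = "viridis")   # base R fallback
}
line_cols <- palette3[1:2]

plot(treatment, type = "o", col = line_cols[1],
     ylim = range(0, treatment, control))
lines(control, type = "o", pch = 22, lty = 2, col = line_cols[2])
```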
There are often trade-offs in creating good plots.
Fundamentals of Data Visualization by Claus O. Wilke is a good resource on the theory of making data visualizations the right way.
Base graphics has a useful built-in function for bar charts, the barplot() function. We can simply pass our numeric vector to this function to get our bar chart.
barplot(treatment)
The barplot() function hasn’t added any labels by default. We can specify our own, however, using the names.arg argument. names.arg is a vector of names to be plotted below each bar or group of bars.
barplot(treatment,
names.arg=c("Mon","Tue","Wed","Thu","Fri","Sat"))
If our vector were named, however, then the vector’s names would be used for the labels. We use the names() function to add names to our vector and then replot.
names(treatment) <- c("Mon","Tue","Wed","Thu","Fri","Sat")
barplot(treatment)
Sometimes you want to have several data series stacked in a single barplot. The barplot() function handles this readily.
Let’s read the data from the example_plot.txt data file.
Read values from the comma-separated example_plot.txt
data <- read.table("data/example_plot.txt", header=TRUE, row.names=1, sep=",")
To build a stacked barplot we need to give the barplot function a matrix. We can use the as.matrix() function to convert our data frame to a matrix.
barplot(as.matrix(data))
We can instead plot the matrix as a grouped bar chart, with the bars side by side, using the beside argument.
barplot(as.matrix(data),beside=TRUE)
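If the data file is not to hand, the difference between stacked and grouped bars can be reproduced with a small matrix built from the vectors used earlier. This sketch assumes a layout with one row per day and one column per group, which may not match example_plot.txt exactly; scores and days are names made up here.

```r
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

# Stand-in matrix: rows are days, columns are the two groups.
scores <- cbind(treatment = c(0.02, 1.8, 17.5, 55, 75.7, 80),
                control   = c(0, 20, 40, 60, 80, 100))
rownames(scores) <- days

barplot(scores)                  # stacked: one bar per column
barplot(scores, beside = TRUE)   # grouped: one bar per value
```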
Though a different function to plot(), barplot can be customized in much the same way. Most of the parameters have the same names.
barplot(as.matrix(data), main="Daily progression of X in\nControl and Treatment", ylab= "Total", beside=TRUE,
col= viridis(6))
legend("topleft", c("Mon","Tue","Wed","Thu","Fri","Sat"), cex=0.8,
fill= viridis(6))
Base graphics has a useful built-in function for histograms too. This is the hist() function, which just needs a numeric vector.
hist(treatment)
Similar customization exists as for other plots.
hist(treatment, col="lightblue", ylim=c(0,5),cex.main=0.8)
We can create a more fine-grained histogram by passing the desired number of bins to the breaks argument. R treats a single number as a suggestion and picks convenient break points close to it.
hist(treatment, col="lightblue",
ylim=c(0,5), cex.main=0.8,
breaks = 2)
hist(treatment, col="lightblue",
ylim=c(0,5), cex.main=0.8,
breaks = 10)
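To check which break points were actually chosen, the histogram can be computed without drawing it by setting plot=FALSE. The sketch also shows the alternative of passing explicit break points as a vector; the 20-unit bin width is an arbitrary choice for illustration.

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)

# Compute the histogram without plotting and inspect the result.
h <- hist(treatment, breaks = 10, plot = FALSE)
h$breaks   # the break points R settled on
h$counts   # number of values in each bin

# Explicit break points give full control over the bins.
hist(treatment, breaks = seq(0, 80, by = 20), col = "lightblue")
```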
Base graphics also has a dotchart() function. Dot charts help compare paired data. First though we need to modify the matrix, as we are comparing in pairs as opposed to all control versus treatment.
We use the function t to return the transpose of a matrix. This means rows are now columns and the columns are now rows.
dotchart(t(data))
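The effect of t() is easy to confirm by comparing dimensions before and after. As a sketch, the stand-in matrix below assumes the same day-by-group layout as the data file; scores is a name introduced for this example.

```r
scores <- cbind(treatment = c(0.02, 1.8, 17.5, 55, 75.7, 80),
                control   = c(0, 20, 40, 60, 80, 100))
rownames(scores) <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

dim(scores)      # 6 2: six days by two groups
dim(t(scores))   # 2 6: two groups by six days

dotchart(t(scores))   # one panel per day, one dot per group
```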
Again we can use the arguments to modify the layout and appearance.
Now we create a colored dotchart of our data with smaller labels.
dotchart(t(data), color=c("red","blue"),main="Dotchart", cex=0.5)
The final plot we will look at is a box and whisker plot.
Boxplots allow you to quickly review data distributions, showing the median and 1st/3rd quartile.
First let’s read in the gene expression data.
exprs <- read.delim("data/gene_data.txt", sep="\t", header=TRUE, row.names=1)
head(exprs)
## Untreated1 Untreated2 Treated1 Treated2
## ENSDARG00000093639 0.8616832 1.9311442 0.1041508 0.14055604
## ENSDARG00000094508 0.9857575 2.0256352 0.1549917 0.20301609
## ENSDARG00000095893 0.8498889 1.9875580 0.2317969 0.20925123
## ENSDARG00000095252 0.9242996 2.0857620 0.2562264 0.24669079
## ENSDARG00000078878 0.3571734 0.4653908 0.1167221 0.09710237
## ENSDARG00000079403 1.0604071 1.2581398 0.3884836 0.31567299
Now we can use the boxplot() function on our data.frame to get our boxplot
boxplot(exprs)
Perhaps it would look better on a log scale. We can add additional colors and labels as with other plots.
boxplot(log2(exprs),ylab="log2 Expression",
col=c("red","red","blue","blue"))
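The numbers behind each box can be retrieved by calling boxplot() with plot=FALSE. The sketch below uses only the six rows printed by head(exprs) above, so the values will differ from those for the full data set; exprs_head is a name made up for this example.

```r
# The six rows shown by head(exprs), typed in as a small data frame.
exprs_head <- data.frame(
  Untreated1 = c(0.8616832, 0.9857575, 0.8498889, 0.9242996, 0.3571734, 1.0604071),
  Untreated2 = c(1.9311442, 2.0256352, 1.9875580, 2.0857620, 0.4653908, 1.2581398),
  Treated1   = c(0.1041508, 0.1549917, 0.2317969, 0.2562264, 0.1167221, 0.3884836),
  Treated2   = c(0.14055604, 0.20301609, 0.20925123, 0.24669079, 0.09710237, 0.31567299)
)

bp <- boxplot(log2(exprs_head), plot = FALSE)

# One column per sample; rows are lower whisker, lower hinge,
# median, upper hinge and upper whisker.
bp$stats

# The third row matches the per-column medians.
apply(log2(exprs_head), 2, median)
```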
R makes it easy to combine multiple plots into one overall graph, using either the par( ) or layout( ) function. With the par( ) function, you can include the option mfrow=c(nrows, ncols) to create a matrix of nrows x ncols plots that are filled in by row. mfcol=c(nrows, ncols) fills in the matrix by columns.
Define a layout with 2 rows and 2 columns
par(mfrow=c(2,2))
Plot histograms for different columns in the data frame separately. This is not very efficient.
par(mfrow=c(2,2))
hist(exprs$Untreated1)
hist(exprs$Untreated2)
hist(exprs$Treated1)
hist(exprs$Treated2)
You could also do it more efficiently using a for loop.
par(mfrow=c(2,2))
for (i in 1:4){
hist(exprs[,i])
}
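The layout() function mentioned above allows panels of unequal size, which par(mfrow) cannot do. In this sketch the matrix says which plot goes in which cell, so plot 1 spans the whole top row. The treatment and control vectors stand in for the expression data, and panel_layout is a name introduced here.

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)
control   <- c(0, 20, 40, 60, 80, 100)

# Plot 1 fills the top row; plots 2 and 3 share the bottom row.
panel_layout <- matrix(c(1, 1,
                         2, 3), nrow = 2, byrow = TRUE)
layout(panel_layout)

plot(control, treatment, type = "o", main = "treatment vs control")
hist(treatment, main = "treatment")
hist(control, main = "control")

layout(1)   # back to a single panel for later plots
```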
The par() function can control a variety of other graph parameters.
Custom text can be added to your plot using the text() function. Simply provide the position and the label.
You can use the data itself to label data points. The adj argument allows you to nudge the annotation a constant amount away from the defined position.
Any labels to be added to the margin need to use mtext() instead.
plot(control, treatment)
text(20,60, 'THIS IS MY PLOT', col='red')
text(control, treatment, letters[1:6], adj=c(0,-1), col='blue')
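For labels in the margin, mtext() takes a side (1 = bottom, 2 = left, 3 = top, 4 = right) and a line giving the distance from the plot edge in margin lines. The label text and positions below are illustrative only.

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)
control   <- c(0, 20, 40, 60, 80, 100)

plot(control, treatment)

# Text in the top margin, half a line above the plot region.
mtext("Added with mtext()", side = 3, line = 0.5, col = "red")

# Smaller text in the right-hand margin.
mtext("right-hand margin", side = 4, line = 0.5, cex = 0.8)
```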
abline() allows you to add specific straight lines. This is often useful to help demonstrate known linear relationships or thresholds as reference points for your data.
* h = horizontal line with y-intercept
* v = vertical line with x-intercept
* a,b = intercept and slope
plot(control, treatment)
abline(h=10, col='blue')
abline(v=50, col='red', lwd=2)
abline(a=0, b=1, lty=2)
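abline() also accepts a fitted linear model, which is a convenient way to draw a line of best fit. The use of lm() here goes beyond the text above and is only a sketch; fit is a name introduced for the example.

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)
control   <- c(0, 20, 40, 60, 80, 100)

# Fit treatment as a linear function of control.
fit <- lm(treatment ~ control)

plot(control, treatment)
abline(fit, col = "darkgreen", lwd = 2)   # uses the fitted intercept and slope
coef(fit)                                 # the two coefficients being drawn
```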
polygon() allows you to draw specific polygons. You just need to give it the coordinates of each vertex. Again, this is often used to highlight specific parts of the plot. The polygon can be filled with a color, or, if you give the density argument, with shading lines instead.
plot(control, treatment)
polygon(c(50,50,100,100),c(50,80,80,50), col='gray', density=5)
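A solid fill can hide the points underneath it. One way around this, sketched here as an option rather than part of the course code, is a semi-transparent fill made with adjustcolor(); the 0.4 opacity is an arbitrary choice and shade is a name introduced for the example.

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)
control   <- c(0, 20, 40, 60, 80, 100)

plot(control, treatment)

# Gray at 40% opacity, so points inside the region stay visible.
shade <- adjustcolor("gray", alpha.f = 0.4)
polygon(x = c(50, 50, 100, 100), y = c(50, 80, 80, 50),
        col = shade, border = "black")
```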
There are many different ways of saving your plots in R.
The easiest way is to use the export button in the plot pane in RStudio. This is not good reproducible practice though as the code is not tied to the plot.
To save plots from the console, first open a graphics device with a function such as bmp(), jpeg(), postscript() or pdf(). The main argument you need is the name of the file in which you want to save the plot. Plotting commands can then be entered as usual, and the output is redirected to the file.
When you’re done with your plotting commands, enter the dev.off() command.
bmp(filename, width = 480, height = 480, units = "px",
pointsize = 12)
jpeg(filename, width = 480, height = 480, units = "px",
pointsize = 12, quality = 75)
bmp(file = "control.bmp")
plot(control)
dev.off()
jpeg(file = "control.jpg", quality = 20)
plot(control)
dev.off()
postscript(file = "control.ps")
plot(control)
dev.off()
PDFs are perhaps the most useful format to export to. PDFs are vector-based, so each part of the plot is saved as scalable coordinates as opposed to specific pixels.
PDFs can then be opened in graphics software like Illustrator or Inkscape (which is free and open source). When you open a PDF in these programs you can fully customize the plots to your aesthetic with a graphical user interface. Furthermore, as they are vector-based, they can be easily assembled into publication-quality figures without resolution issues and pixelation.
pdf(file = "control.pdf", paper = "A4")
plot(control)
dev.off()
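The page size of a PDF can also be set directly with width and height, which pdf() measures in inches. The sketch below writes to a temporary folder so it does not clutter the working directory; out_file is a hypothetical name and the 7 by 5 size is arbitrary.

```r
control <- c(0, 20, 40, 60, 80, 100)

# Hypothetical output location inside R's temporary folder.
out_file <- file.path(tempdir(), "control.pdf")

pdf(file = out_file, width = 7, height = 5)   # size in inches
plot(control)
dev.off()                                     # close the device to finish the file

file.exists(out_file)                         # confirm the file was written
```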
Exercises on base plotting can be found here
Answers for base plotting can be found here
Data visualization theory - Fundamentals of Data Visualization
Example plots - R Graph Gallery
For any suggestions, comments, edits or questions (about the content or the slides themselves), please reach out via our GitHub and raise an issue.