All prerequisites, links to material and slides for this course can be found on GitHub.
Alternatively, they can be downloaded as a zip archive from here.
Once the zip file is unarchived, all presentations (as HTML slides and pages), their R code and the HTML practical sheets will be available in the directories underneath.
Before running any of the code in the practicals or slides we need to set the working directory to the folder we unarchived.
You may navigate to the unarchived Plotting_In_R folder using the RStudio menu.
Session -> Set Working Directory -> Choose Directory
or in the console.
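The console route uses setwd(). A minimal sketch, assuming a hypothetical location for the unarchived folder (the path below is a placeholder, not the course's actual path):

```r
# Hypothetical path: replace with wherever you unarchived the course folder.
course_dir <- "~/Downloads/Plotting_In_R"

# Only change directory if the folder exists, so the sketch runs anywhere.
if (dir.exists(course_dir)) {
  setwd(course_dir)
}

# Confirm the current working directory.
getwd()
```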
R has excellent graphics and plotting capabilities. In fact, this is commonly seen as one of the advantages of R over competing languages such as Python and MATLAB. These capabilities are mostly found in the following three sources:
+ base graphics
+ the lattice package
+ the ggplot2 package
Base R graphics uses a pen-and-paper model for plotting, while the lattice and ggplot2 packages are built on the routines first used in grid graphics.
Building a new plot is often a stepwise process with gradual addition of features. It will likely require replotting many times.
Base Plot - Quick and easy plots while we are initially reviewing data
ggplot2 - Producing publication quality figures
First we’ll produce a very simple graph using the values in a numeric vector:
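As a minimal sketch, the vector below uses illustrative values (an assumption, not necessarily the course's own data):

```r
# Illustrative values (assumed for this sketch).
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)

# With a single numeric vector, plot() uses the index (1 to 6) as the x values.
plot(treatment)
```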
Many features of plots can be customized by providing additional arguments to the plot() function.
First we can plot treatment using points overlaid by a line. We control this with the type argument.
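A short sketch of the type argument, again with assumed example values for treatment:

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)  # assumed example values

# type = "p" draws points, "l" lines, "b" both, and "o" both overplotted.
plot(treatment, type = "o")
```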
We can control the size of points in our plot using the cex parameter.
We can control the type of points in our plot using the pch parameter.
Similarly, when plotting a line we control its width with the lwd parameter.
We can also control the type of line with the lty parameter.
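The point and line parameters can be tried out as follows. This is only a sketch with assumed data; the particular values of cex, pch, lwd and lty are arbitrary choices for illustration:

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)  # assumed example values

# Larger points (cex = 2) drawn as filled triangles (pch = 17).
plot(treatment, type = "o", cex = 2, pch = 17)

# Thicker line (lwd = 3) drawn dashed (lty = 2).
plot(treatment, type = "o", lwd = 3, lty = 2)
```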
An important parameter we can control is color. We can control the color of lines or points using the col argument.
We add a title with the main argument and/or a subtitle with the sub argument.
We can customize our x and y axis labels with the xlab and ylab arguments respectively.
We can control the orientation of the axis tick labels using the las argument.
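Color, titles, axis labels and label orientation can be combined in a single call. A sketch with assumed data and made-up title text:

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)  # assumed example values

plot(treatment, type = "o", col = "blue",
     main = "Treatment response",   # illustrative title
     sub = "Illustrative data",     # illustrative subtitle
     xlab = "Position", ylab = "Score",
     las = 1)                       # las = 1 keeps all tick labels horizontal
```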
Review ?plot and ?par for complete list of options.
The plot() function will also accept two vectors to be plotted against each other.
We often want multiple lines in the same plot. So if we want to plot scores for control and treatment against position we will need a new method.
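For example, with a hypothetical position vector (both vectors below are assumed values for illustration):

```r
position  <- c(2, 3, 3.5, 5, 6, 7)             # hypothetical x values
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)  # assumed example values

# The first vector supplies x, the second supplies y.
plot(position, treatment, type = "o")
```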
We can add an additional line to our existing plot using the lines() function.
The new line doesn’t quite fit into our original plot.
We can extend our x or y axis by passing values directly to the xlim and ylim arguments.
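A sketch of both steps, assuming example values for treatment and control: the first plot shows the added line running off the top, and the second fixes it with explicit limits.

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)  # assumed example values
control   <- c(0, 20, 40, 60, 80, 100)         # assumed example values

# The y axis only spans the treatment values, so the top of control is clipped.
plot(treatment, type = "o", col = "blue")
lines(control, type = "o", pch = 22, lty = 2, col = "red")

# Redraw with explicit y limits wide enough for both series.
plot(treatment, type = "o", col = "blue", ylim = c(0, 100))
lines(control, type = "o", pch = 22, lty = 2, col = "red")
```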
Instead of defining the axis limits explicitly we can compute the y-axis values using the range function. This means any updates to our data will be automatically reflected in our graph.
range() returns a vector containing the minimum and maximum of all the given arguments.
Calculate range from 0 to max value of data.
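A sketch of this calculation, assuming example values for the two series (g_range is the name used in the plotting code later on):

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)  # assumed example values
control   <- c(0, 20, 40, 60, 80, 100)         # assumed example values

# Including 0 forces the lower limit to zero; the upper limit is the data maximum.
g_range <- range(0, treatment, control)
g_range
```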
## [1] 0 100
To customize the axes we first need to turn off the default axes and annotations (axis labels) so that we can specify them ourselves. We turn off axis and annotation plotting using axes=FALSE and ann=FALSE.
We can create our own x axis using the axis() function. We specify the side argument for where to place the axis, the at argument for where to put the axis ticks and the labels argument (which can be abbreviated to lab) for the tick labels.
We can make our y axis, with horizontal labels and a tick every 20 units, in a similar way.
We specify the side and use the seq() function to make the tick positions for the at argument. We can use our y-axis range again to help define how many ticks we need.
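Putting the axis steps together gives something like the sketch below. The data values are assumed for illustration; the day labels follow the ones used in the final plot code.

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)  # assumed example values
control   <- c(0, 20, 40, 60, 80, 100)         # assumed example values
g_range   <- range(0, treatment, control)

# Draw the data with no default axes or axis labels.
plot(treatment, type = "o", col = "blue", ylim = g_range,
     axes = FALSE, ann = FALSE)

# x axis (side 1) with one labelled tick per observation.
axis(1, at = 1:6, labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))

# y axis (side 2) with horizontal labels and a tick every 20 units.
y_ticks <- seq(g_range[1], g_range[2], by = 20)
axis(2, las = 1, at = y_ticks)

# Redraw the frame that axes = FALSE removed.
box()
```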
Now we can add our control data using the lines() function.
Finally we may wish to add a legend to our plot. We can add a legend to the current plot using the legend() function.
We need to specify where to place the legend in the plot, pass the names to the legend argument, and repeat any point/line configuration we used, e.g. the color and shape.
In our line plot we have already done a good job of making it easier to differentiate the lines as we have different line styles and different shape points.
We can also differentiate the lines by thickness.
To make that final plot you can see that there are many lines of code we put together.
# Assumes treatment, control and g_range have been defined in the earlier steps
plot(treatment, type="o", col="blue", lwd=1, ylim=g_range, axes=FALSE, ann=FALSE)
axis(1, at=1:6, labels=c("Mon","Tue","Wed","Thu","Fri","Sat"))
# Tick every 20 units up to the top of the y-axis range
axis(2, las=1, at=seq(0, g_range[2], by=20))
box()
lines(control, type="o", pch=22, lty=2, col="red", lwd=2.5)
legend("topleft", legend=c("treatment","control"), col=c("blue","red"), pch=21:22, lty=1:2, lwd=c(1,2.5))
Most colors can simply be specified by name as a character string, e.g. “green”. There is a wide variety of named colors available in R, from “darkgoldenrod” to “bisque”, plus over 100 named shades of gray (“gray0” to “gray100”). You can find an extensive list of R colors here.
You can also use hex codes: a hexadecimal format for identifying colors. This gives a greater variety of options, as hex codes cover the full color spectrum. Each pair of characters corresponds to the red, green and blue content of the color, e.g. #ffe4c4 (also known as bisque) is composed of 100% red, 89.4% green and 76.9% blue. Resources like this color picker can be used to help you find specific shades, and even create complementary palettes.
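The relationship between color names, hex codes and red/green/blue content can be checked with base R functions. A small sketch:

```r
# Names of the colors R knows about.
head(colors())
"bisque" %in% colors()

# Red, green and blue components (0 to 255) of a named color and of a hex code.
col2rgb("bisque")
col2rgb("#FFE4C4")

# The same components as percentages of the maximum channel value.
round(100 * col2rgb("#FFE4C4") / 255, 1)

# Build a hex code back from its components.
bisque_hex <- rgb(255, 228, 196, maxColorValue = 255)
bisque_hex
```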
Palettes are prebuilt collections of colors. They can consist of various numbers of colors and can have different properties, e.g. continuous, discrete or divergent.
Here we can look at the rainbow palette that comes with R. Often palettes are functions: we simply ask for how many colors we want and the function returns that number of colors.
## [1] "#FF0000" "#00FFFF"
Rainbow is continuous, so it will pull as many colors as you ask for from the color space, along a specific gradient defined by the palette.
## [1] "#FF0000" "#FFFF00" "#00FF00" "#00FFFF" "#0000FF" "#FF00FF"
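The two results above come from asking the rainbow palette for different numbers of colors. A sketch of those calls, with a quick bar plot (an added illustration) to preview the colors:

```r
# Ask the palette function for 2 colors, then for 6.
rainbow(2)
six_cols <- rainbow(6)
six_cols

# Preview the colors as equal-height bars.
barplot(rep(1, 6), col = six_cols)
```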
Many people have created their own palette packages along many themes, including famous artworks, Wes Anderson movies and birds. paletteer is a package in which many different palettes across many themes are collated. In this case we can grab a discrete palette. This means it will have a limited number of colors.
## <colors>
## #3B9AB2FF #78B7C5FF #EBCC2AFF #E1AF00FF #F21A00FF
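The hex codes above appear to match the Zissou1 palette from the wesanderson collection, so the sketch below assumes that is the palette being requested. It also assumes the paletteer package is installed and skips the example otherwise.

```r
if (requireNamespace("paletteer", quietly = TRUE)) {
  # Discrete palettes are requested as "package::palette".
  zissou <- paletteer::paletteer_d("wesanderson::Zissou1")
  print(zissou)

  # Preview the palette as equal-height bars.
  barplot(rep(1, length(zissou)), col = as.character(zissou))
} else {
  message("Install paletteer with install.packages('paletteer') to run this example.")
}
```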
RColorBrewer is another popular option. It has many built-in palettes. These palettes are especially useful in combination with the base R colorRampPalette() function, which interpolates between a set of colors to build a customized continuous palette.
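A sketch of building a continuous palette from a Brewer palette. The "Blues" palette is an arbitrary choice for illustration, and the fallback colors are an assumption used only when RColorBrewer is not installed:

```r
if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  # Nine discrete shades from the sequential "Blues" palette.
  base_cols <- RColorBrewer::brewer.pal(9, "Blues")
} else {
  # Stand-in colors so the sketch still runs without the package.
  base_cols <- c("white", "steelblue", "navy")
}

# colorRampPalette() returns a function that interpolates between the colors.
blues_ramp <- colorRampPalette(base_cols)
ramp_cols <- blues_ramp(20)
ramp_cols

# Preview the 20 interpolated colors.
barplot(rep(1, 20), col = ramp_cols, border = NA)
```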
Using custom colors is great and can really help make a piece of work cohesive and stand out. But you have to be careful.
~4% of people are color blind. In white males this number rises to ~10%. Considering the demographics in science, there will likely be someone with color blindness in your meeting.
Furthermore, when we pick gradients, the ability to see patterns in the data varies depending on the color scale used, even for people with typical color vision.
The viridis color palette was designed by color scientists to be perceptually uniform, to have a wide dynamic range, to be accessible to the various forms of color blindness and to work even when converted into gray scale.
## [1] "#440154FF" "#3B528BFF" "#21908CFF" "#5DC863FF" "#FDE725FF"
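Five viridis colors like those above can be requested as in the sketch below. It assumes the viridis package is installed, and otherwise falls back to the similar (not identical) "viridis" palette that base R provides through hcl.colors():

```r
if (requireNamespace("viridis", quietly = TRUE)) {
  pal <- viridis::viridis(5)
} else {
  # Base R approximation of the viridis palette.
  pal <- hcl.colors(5, palette = "viridis")
}
pal

# Preview the gradient on a small grid of increasing values.
image(matrix(1:25, nrow = 5), col = pal, axes = FALSE)
```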
There are often trade-offs in creating good plots.
Fundamentals of Data Visualization by Claus O. Wilke is a good resource on the theory of making data visualizations the right way.
Base graphics has a useful built-in function for bar charts: the barplot() function. We can simply pass our numeric vector to this function to get our bar chart.
The barplot() function hasn’t added any labels by default. We can specify our own, however, using the names.arg argument. names.arg is a vector of names to be plotted below each bar or group of bars.
If the vector were named, however, then the vector’s names would be used as labels. We use the names() function to add names to our vector and then replot.
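The three variants can be sketched as follows, using assumed example values and the day labels from the line plot:

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)  # assumed example values
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

# Unnamed vector: bars are drawn without labels.
barplot(treatment)

# Labels supplied at plotting time.
barplot(treatment, names.arg = days)

# Labels stored on the vector itself are picked up automatically.
names(treatment) <- days
barplot(treatment)
```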
Sometimes you want to have several data series stacked in a single barplot. The barplot() function handles this readily.
Let’s read the data from the example_plot.txt data file.
Read values from tab-delimited example_plot.txt
To build a stacked barplot we need to give the barplot() function a matrix. We can use the as.matrix() function to convert our data frame to a matrix.
We can also plot the matrix as a grouped bar chart, with bars side by side rather than stacked, using the beside argument.
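The contents of example_plot.txt are not shown here, so the sketch below writes a small stand-in file (assumed column names and values) to a temporary location and then follows the same read, convert and plot steps:

```r
# Write a hypothetical tab-delimited stand-in for example_plot.txt.
example_file <- tempfile(fileext = ".txt")
writeLines(c("Control\tTreatment",
             "0\t0.02", "20\t1.8", "40\t17.5",
             "60\t55", "80\t75.7", "100\t80"),
           example_file)

# Read the tab-delimited file; the first row holds the column names.
data <- read.table(example_file, header = TRUE, sep = "\t")

# barplot() needs a matrix (or vector), not a data frame.
data_matrix <- as.matrix(data)

barplot(data_matrix)                 # stacked: one bar per column
barplot(data_matrix, beside = TRUE)  # grouped: bars side by side within each column
```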
Though a different function to plot(), barplot can be customized in much the same way. Most of the parameters have the same names.
Base graphics has a useful built-in function for histograms too. This is the hist() function, which just needs a numeric vector.
Similar customization exists as for other plots.
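A sketch using simulated values (the data and the customization choices are assumptions for illustration):

```r
set.seed(1)
scores <- rnorm(200, mean = 50, sd = 10)  # simulated example data

# Default histogram.
hist(scores)

# Customized with the same kinds of arguments used for plot().
hist(scores, breaks = 20, col = "steelblue",
     main = "Simulated scores", xlab = "Score")
```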
Base graphics also has a dotchart() function. Dot charts help compare paired data. First though we need to modify the matrix, as we are comparing in pairs as opposed to all control versus treatment.
We use the t() function to return the transpose of a matrix. This means the rows become columns and the columns become rows.
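A sketch of the transpose step, using an assumed matrix with one column per condition and one row per day:

```r
# Assumed example data.
data_matrix <- cbind(Control   = c(0, 20, 40, 60, 80, 100),
                     Treatment = c(0.02, 1.8, 17.5, 55, 75.7, 80))
rownames(data_matrix) <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

# dotchart() groups a matrix by column, so this gives a Control group and a
# Treatment group, each listing all six days.
dotchart(data_matrix)

# After transposing, each day forms a group holding its Control and Treatment
# values, which makes the paired comparison easier to read.
paired <- t(data_matrix)
dotchart(paired)
```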
The final plot we will look at is a box and whisker plot.
Boxplots allow you to quickly review data distributions, showing the median and 1st/3rd quartile.
First let’s read in the gene expression data.
## Untreated1 Untreated2 Treated1 Treated2
## ENSDARG00000093639 0.8616832 1.9311442 0.1041508 0.14055604
## ENSDARG00000094508 0.9857575 2.0256352 0.1549917 0.20301609
## ENSDARG00000095893 0.8498889 1.9875580 0.2317969 0.20925123
## ENSDARG00000095252 0.9242996 2.0857620 0.2562264 0.24669079
## ENSDARG00000078878 0.3571734 0.4653908 0.1167221 0.09710237
## ENSDARG00000079403 1.0604071 1.2581398 0.3884836 0.31567299
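As a sketch, the six rows shown above can be entered by hand and passed to boxplot(). The full data set would normally be read from file; the axis label and las setting are illustrative choices.

```r
# The six rows shown above, typed in as a matrix.
exprs <- matrix(
  c(0.8616832, 1.9311442, 0.1041508, 0.14055604,
    0.9857575, 2.0256352, 0.1549917, 0.20301609,
    0.8498889, 1.9875580, 0.2317969, 0.20925123,
    0.9242996, 2.0857620, 0.2562264, 0.24669079,
    0.3571734, 0.4653908, 0.1167221, 0.09710237,
    1.0604071, 1.2581398, 0.3884836, 0.31567299),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("ENSDARG00000093639", "ENSDARG00000094508", "ENSDARG00000095893",
      "ENSDARG00000095252", "ENSDARG00000078878", "ENSDARG00000079403"),
    c("Untreated1", "Untreated2", "Treated1", "Treated2")))

# One box per column (sample); las = 2 turns the sample names perpendicular.
boxplot(exprs, ylab = "Expression", las = 2)
```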
R makes it easy to combine multiple plots into one overall graph, using either the par( ) or layout( ) function. With the par( ) function, you can include the option mfrow=c(nrows, ncols) to create a matrix of nrows x ncols plots that are filled in by row. mfcol=c(nrows, ncols) fills in the matrix by columns.
Define a layout with 2 rows and 2 columns
Plot histograms for different columns in the data frame separately. This is not very efficient.
You could also do it more efficiently using a for loop.
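A sketch of the loop version on a 2 x 2 layout. The data frame here is simulated (an assumption), with column names borrowed from the expression table:

```r
set.seed(42)
# Simulated stand-in for the expression data.
exprs <- as.data.frame(matrix(
  rexp(400), ncol = 4,
  dimnames = list(NULL, c("Untreated1", "Untreated2", "Treated1", "Treated2"))))

# Define a layout with 2 rows and 2 columns, remembering the old settings.
old_par <- par(mfrow = c(2, 2))

# One histogram per column, filling the layout row by row.
for (sample_name in colnames(exprs)) {
  hist(exprs[[sample_name]], main = sample_name, xlab = "Expression")
}

# Restore the previous layout.
par(old_par)
```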
The par() function can control a variety of other graph parameters.
Custom text can be added to your plot using the text() function. Simply provide the position and the label.
You can use the data itself to label data points. The adj argument allows you to nudge the annotation a constant amount away from the defined position.
Any labels to be added to the margin need to use mtext() instead.
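A sketch of the three kinds of annotation, with assumed data and made-up label text:

```r
position  <- 1:6
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)  # assumed example values

plot(position, treatment, type = "o", ylim = c(0, 100))

# A single label at chosen coordinates.
text(2, 90, "Custom label")

# Label every point with its own value. adj = c(0, 1) anchors each label's
# top-left corner at the point, so the text sits just below and to the right.
text(position, treatment, labels = treatment, adj = c(0, 1))

# Text in the top margin (side = 3).
mtext("Margin note", side = 3, line = 0.5)
```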
abline() allows you to add specific straight lines. This is often useful to help demonstrate known linear relationships or thresholds as reference points for your data.
* h = horizontal line at the given y value
* v = vertical line at the given x value
* a, b = intercept and slope
polygon() allows you to draw specific polygons. You just need to give it the coordinates of each vertex. Again, this is often used to highlight specific parts of the plot. The polygon can be filled, or if you give the density argument it will be drawn with a hatched fill.
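A sketch of reference lines and a hatched polygon on the same plot. The data, thresholds and coordinates are arbitrary values chosen for illustration:

```r
position  <- 1:6
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)  # assumed example values

plot(position, treatment, type = "o", ylim = c(0, 100))

abline(h = 50, lty = 2)             # horizontal line at y = 50
abline(v = 4, lty = 3)              # vertical line at x = 4
abline(a = 0, b = 15, col = "red")  # line with intercept 0 and slope 15

# Rectangle highlighting the upper-right region; density gives hatching,
# and col sets the color of the hatch lines.
polygon(x = c(4, 6, 6, 4), y = c(50, 50, 100, 100),
        density = 10, col = "grey40")
```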
There are many different ways of saving your plots in R.
The easiest way is to use the export button in the plot pane in RStudio. This is not good reproducible practice though as the code is not tied to the plot.
To save plots from the console, first open a graphics device such as pdf() or png(). The main argument you need is the name of the file in which to save the plot. Plotting commands can then be entered as usual, and their output is redirected to the file.
When you’re done with your plotting commands, enter the dev.off() command.
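A sketch of the whole sequence using the pdf() device. The file name is hypothetical and the file is written to a temporary directory so the example does not clutter your working directory:

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)  # assumed example values

# Hypothetical output file in a temporary directory.
out_file <- file.path(tempdir(), "treatment_plot.pdf")

# Open the PDF device; width and height are in inches.
pdf(out_file, width = 6, height = 4)

# Plotting commands now draw into the file instead of the screen.
plot(treatment, type = "o", col = "blue")

# Close the device to finish writing the file.
dev.off()

file.exists(out_file)
```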
PDFs are arguably the most useful format to export to. PDFs are vector-based, so each part of the plot is saved as scalable coordinates as opposed to specific pixels.
PDFs can then be opened in imaging software like Illustrator or Inkscape (a free, open-source equivalent). When you open a PDF in these programs you can fully customize the plots to your aesthetic with a graphical user interface. Furthermore, as they are vector-based, they can be easily assembled into publication-quality figures without resolution issues or pixelation.
Exercises on base plotting can be found here
Answers for base plotting can be found here
Data visualization theory - Fundamentals of Data Visualization
Example plots - R Graph Gallery
If you have any suggestions, comments, edits or questions (about the content or the slides themselves), please raise an issue on our GitHub.