Reproducible Reports


Set Up


All prerequisites, links to material, and slides for this course can be found on GitHub.

They can also be downloaded as a zip archive from here.

Course materials

Once the zip file is unarchived, all presentations (as HTML slides and single pages), their R code, and the HTML practical sheets will be available in the directories underneath.

  • presentations/slides/ Presentations as an HTML slide show.
  • presentations/singlepage/ Presentations as an HTML single page.
  • presentations/r_code/ R code in presentations.
  • exercises/ Practicals as HTML pages.
  • answers/ Practicals with answers as HTML pages and R code solutions.

Set the Working directory

Before running any of the code in the practicals or slides we need to set the working directory to the folder we unarchived.

You may navigate to the unarchived Reproducible_R folder using the RStudio menu.

Session -> Set Working Directory -> Choose Directory

Alternatively, set it in the console.

setwd("/PathToMyDownload/Reproducible_R-master/r_course")
# e.g. setwd('~/Downloads/Reproducible_R-master/r_course')
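
A slightly more defensive version of the console approach is sketched here: it checks that the folder exists before changing into it, then confirms where R ended up. The path is the example one from above and is only an assumption about where you unarchived the download.

```r
# Assumed location of the unarchived course; change this to match your download
courseDir <- "~/Downloads/Reproducible_R-master/r_course"

if (dir.exists(courseDir)) {
  # Only change directory when the folder is really there
  setwd(courseDir)
} else {
  message("Could not find ", courseDir, "; working directory left unchanged.")
}

# Confirm the current working directory
getwd()
```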

Reproducible Research


“Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to humans what we want the computer to do.” – Donald E. Knuth, Literate Programming, 1984

Reproducible Research in R

Sometime in the future, I, or someone else, will need to understand what analysis I did here.

There is a growing push to ensure all research is open and reproducible. New NIH guidelines are going to require plans for preservation and sharing of data, which includes your code.

Using RStudio to make reproducible documents is very easy, so why not?

A very quick reproducible document in R

  • Find your R script of interest.
  • Add comments with # to describe what you are doing.
  • Add the sessionInfo() function to the last line.
  • Click the “Compile Notebook” button -> Select HTML document as the output format.
# Generate some random numbers and plot them
myRandNumbers <- rnorm(100, 10, 2)

plot(myRandNumbers)

sessionInfo()
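
The same compilation can probably be run from the console instead of the button. This is a minimal sketch, assuming the rmarkdown package and pandoc are installed (pandoc ships with RStudio). The script written to the temporary directory is a stand-in for your own script, and its file name is made up.

```r
library(rmarkdown)

# Write a small stand-in script to a temporary file (your own script would go here)
scriptPath <- file.path(tempdir(), "myAnalysis.R")
writeLines(c(
  "# Generate some random numbers and plot them",
  "myRandNumbers <- rnorm(100, 10, 2)",
  "plot(myRandNumbers)",
  "sessionInfo()"
), scriptPath)

# render() accepts an R script as input and compiles it to an HTML report;
# it needs pandoc, so only run it when pandoc can be found
if (pandoc_available()) {
  reportPath <- render(scriptPath, output_format = "html_document", quiet = TRUE)
  print(reportPath)
}
```

The report is written next to the script, so here it lands in the temporary directory.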

R Markdown


The Gold Standard

When we do some analysis, the ideal situation is the preservation of ALL aspects of the analysis in a single document.

Rmarkdown can help with this.

Here is an example of a report generated from Rmd. The Rmd from which this is made is in the scripts directory.

scripts/markdownExample.html scripts/markdownExample.Rmd

R Markdown

R Markdown is built on Markdown.

GitHub and SourceForge make use of Markdown syntax in their README files and render these in their web pages. It is also used by other notebook tools like Jupyter.

https://github.com/github/markup/blob/master/README.md

Markdown syntax

Markdown uses simple syntax to control text output.

This allows for the inclusion of font styles, text structures, images and code chunks.

R Markdown Structure

There are 3 main parts

  1. Header - YAML format
  2. Body - Markdown
  3. Code Chunks - R, python, shell etc.

R Markdown Header


In R Markdown the options for document processing are stored in YAML format at the top of the document. Most of this is automatically generated when you open a new Rmd.

---
title: "My New Analysis"
author: "Tom Carroll"
date: "19th October 2022"
output: html_document
---

Controlling output type

The output YAML option specifies the document type to be produced. There are several options.

---
output: html_document
---
---
output: pdf_document
---
---
output: ioslides_presentation
---

Figure options in YAML

Global default options can also be set in the YAML. For example figure sizes can be set within the YAML metadata.

---
output: 
  html_document:
    fig_width: 7
    fig_height: 6
---

Adding styles

Styles for HTML can be applied using the theme option, and syntax highlighting styles are controlled by the highlight option.

---
output: 
  html_document:
    theme: journal
    highlight: espresso
---

For a full list of theme options see - http://rmarkdown.rstudio.com/html_document_format.html
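
The same settings can also be supplied from R at render time, which leaves the YAML in the Rmd untouched. This is a sketch assuming the rmarkdown package is installed; "myReport.Rmd" is a hypothetical file name.

```r
library(rmarkdown)

# Build an output format carrying the same settings as the YAML examples above
myFormat <- html_document(
  fig_width = 7,
  fig_height = 6,
  theme = "journal",
  highlight = "espresso"
)

# "myReport.Rmd" is a hypothetical file name; render only if it exists
if (file.exists("myReport.Rmd")) {
  render("myReport.Rmd", output_format = myFormat)
}
```

This can be handy when one Rmd needs to be rendered with different styles without editing the file each time.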

R Markdown Body


The free form text and annotation is set in Markdown. This is written as plain text and has a specific set of formatting rules. For example, it ignores new lines.

To include a new line in markdown, you need to end the previous line with two spaces.

This is my first line.  # the comment here shows the line end  
This would be a new line.
This wouldn't be a new line.

This is my first line.
This would be a new line. This wouldn’t be a new line.
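
To see this rule outside of a report, a snippet can be converted to HTML directly. This sketch assumes the commonmark package is installed; it is not otherwise used in this course.

```r
library(commonmark)

# Two trailing spaces before the newline: Markdown inserts a line break
withSpaces <- markdown_html("This is my first line.  \nThis would be a new line.")

# No trailing spaces: the newline does not start a new line when displayed
withoutSpaces <- markdown_html("This would be a new line.\nThis wouldn't be a new line.")

# Check each result for an HTML line break tag
grepl("<br />", withSpaces, fixed = TRUE)
grepl("<br />", withoutSpaces, fixed = TRUE)
```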

Font emphasis

Emphasis can be added to text in Markdown documents using either the _ or * character.

Italics = _Italics_ or *Italics*
Bold  =  __Bold__ or **Bold**

Italics
Bold

Including images

Figures or external images can be used in markdown documents.
Files may be local or accessible from an HTTP URL.

![alt text](imgs/Dist.jpg)
![alt text](https://rockefelleruniversity.github.io//r_course/imgs/Dist.jpg)

Creating headers

Section headers can be added to Markdown documents.

Headers follow the same conventions as used in HTML markup and can be implemented at multiple levels of size. Section headers in Markdown are created by using the # symbol.

# Top level section
## Middle level section
### Bottom level section

Rendered Bottom level section

Lists

Lists can be created in Markdown using the - symbol.
Nested lists can be specified with the + symbol.

- First item
- Second item
    + Second item A
    + Second item B

My list rendered:

  • First item
  • Second item
    • Second item A
    • Second item B

R Markdown Code Chunks


Code chunks

In R Markdown, text may be highlighted as code by placing it between backticks in triplicate: ```. The engine used to evaluate the code is given in curly brackets, in this case R.

```{r}
hist(rnorm(100))
```

This is what the code chunk renders to in your report:

hist(rnorm(100))

Code chunk options

Many other options can be supplied to an individual code chunk.

R can produce a lot of output not related to your results. To control whether messages and warnings are reported in the rendered document, we can specify the message and warning arguments.

Chunks that load libraries are a common place to set both of these to FALSE.

```{r, warning=FALSE, message=FALSE}
library(ggplot2)
```

Code chunk options

There are many chunk control options:

  • eval - Run the code?
  • echo - Include code in final report?
  • tidy - Tidy up the code?
  • cache - Save a cache?
  • fig.height/fig.width - Size of plot made by code.
  • fig.path and dev - Save plots to specified path in specified format.
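
Rather than repeating options on every chunk, defaults can be set once in a chunk near the top of the document. This is a minimal sketch using knitr's opts_chunk; the particular values are only illustrative.

```r
library(knitr)

# Set defaults for every chunk that follows; individual chunks can still override them
opts_chunk$set(
  echo = TRUE,        # include code in the final report
  warning = FALSE,    # hide warnings
  message = FALSE,    # hide messages, e.g. from library() calls
  fig.width = 7,      # default plot width in inches
  fig.height = 6      # default plot height in inches
)

# Check a current default
opts_chunk$get("warning")
```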

Other R Markdown tricks


Inserting tables

The results of printing data frames or matrices in the console aren’t neat.

We can insert tables into R Markdown by using the knitr function kable().

library(knitr)

temp <- rnorm(3)
temp2 <- rnorm(3)
dfExample <- data.frame(temp, temp2)
kable(dfExample)

      temp       temp2
----------  ----------
-0.5657569  -0.4546865
-0.8049686   0.2702011
-0.0391181  -0.6841633
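
kable() also takes arguments that tidy the table further. The sketch below rounds the values and adds a caption. The seed and the caption text are illustrative choices, not part of the original example.

```r
library(knitr)

# Fix the seed so the random numbers are the same each time the report is built
set.seed(42)
dfExample <- data.frame(temp = rnorm(3), temp2 = rnorm(3))

# digits rounds numeric columns; caption adds a title to the table
myTable <- kable(dfExample, digits = 2, caption = "Example values")
myTable
```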

Evaluating code in the body

Most of your code will be in code chunks, but it may be useful to report the results of R within a block of Markdown. This can be done by placing the code to evaluate between single backticks, starting with the letter r.

Here is some freeform _markdown_ and the first result
from an rnorm call is `r rnorm(3)[1]`, followed by some
more free form text.

Here is some freeform markdown and the first result from an rnorm call is 1.2933661, followed by some more free form text.
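
Inline results often read better when they are rounded first, and a fixed seed keeps the reported number stable between renders. The sketch below prepares such a value in a chunk; the object names are made up. In the body you would then refer to roundedValue with inline R code.

```r
# Fix the seed so the reported value is reproducible between renders
set.seed(1)
firstValue <- rnorm(3)[1]

# Round before reporting so the sentence does not show seven decimal places
roundedValue <- round(firstValue, 2)
roundedValue
```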

Interactivity


As we are summarizing data in a dynamic document, we can make data exploration easier by creating interactive versions of plots and tables.

Plotly

To do this we use the plotly package. Let's load in the package and the plot we want to make interactive. We have an R object that contains a ggplot2 object in our data directory.

library(plotly)
load("data/pcPlot.RData")
pcPlot

Plotly labels

The ggplotly function will wrap our ggplot object and add interactivity so we can hover over the points.

ggplotly(pcPlot)

Plotly labels

The information associated with the plot is naturally inherited as labels. If we want to add custom labels we can specify them in our plot.

ggplotly(pcPlot + geom_point(aes(label = Sample)))
## Warning in geom_point(aes(label = Sample)): Ignoring unknown aesthetics: label

Plotly labels

If we want full control over the labeling we can instead use the ggplotly function to take care of this for us by specifying when/what/where labels are shown.

ggplotly(pcPlot + geom_point(aes(text = Sample)), source = "select", tooltip = c("Sample"))
## Warning in geom_point(aes(text = Sample)): Ignoring unknown aesthetics: text
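
For a self-contained version of this idea, the sketch below builds a small plot from scratch. The data frame is made up to stand in for the PCA data, and its column names are assumptions. Here the text aesthetic is mapped in ggplot() rather than in geom_point(), and tooltip = "text" asks ggplotly to show only that label.

```r
library(ggplot2)
library(plotly)

# Made-up stand-in for the PCA data; column names are assumptions
pcData <- data.frame(
  PC1 = c(-2.1, -1.8, 2.0, 2.3),
  PC2 = c(0.5, -0.4, 0.7, -0.6),
  Sample = c("Ctrl_1", "Ctrl_2", "Treated_1", "Treated_2")
)

# Map the hover label through the text aesthetic at the top level of the plot
myPlot <- ggplot(pcData, aes(x = PC1, y = PC2, text = Sample)) +
  geom_point()

# tooltip = "text" restricts the hover box to the text aesthetic
interactivePlot <- ggplotly(myPlot, tooltip = "text")
interactivePlot
```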

Tables - DT

Similar to plots, we can create interactive tables that are both searchable and sortable. We do this with the DT package. Let's make some quick demo data.

label <- c("Gene1", "Gene2", "Gene3")
temp <- rnorm(3)
temp2 <- rnorm(3)
dfExample <- data.frame(label, temp, temp2)
dfExample
##   label       temp      temp2
## 1 Gene1  0.9082745 -0.1489549
## 2 Gene2 -1.8589354  0.7825425
## 3 Gene3 -0.6779331  0.3106761

DT

We can simply use the datatable function to then create our interactive table.

library(DT)
datatable(dfExample)

DT

There are many ways to customize our table, e.g. removing rownames, adding titles, default sorting, etc.

datatable(dfExample, rownames = FALSE, caption = "My Table")
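
Default sorting and number formatting can be sketched in the same way. This example assumes the DT package is installed; the seed, page length and sort column are illustrative choices.

```r
library(DT)

# Demo data with a fixed seed so the table is the same on every render
set.seed(42)
dfExample <- data.frame(
  label = c("Gene1", "Gene2", "Gene3"),
  temp = rnorm(3),
  temp2 = rnorm(3)
)

myDT <- datatable(
  dfExample,
  rownames = FALSE,
  caption = "My Table",
  # Columns are counted from 0 when rownames = FALSE, so 1 is the temp column
  options = list(order = list(list(1, "desc")), pageLength = 10)
)

# Show the numeric columns to two decimal places
myDT <- formatRound(myDT, columns = c("temp", "temp2"), digits = 2)
myDT
```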

Quarto


Rmarkdown has been an incredibly useful tool for building reports for a long time, but it is predominantly focused on R (though it can be used indirectly for Python).

We have also mentioned Jupyter Notebook. It is a similar notebook system. It was built for Python (though it can be used indirectly for R).

Quarto is a new form of report built on Rmds that can handle Python and other languages in a more native way.

Rmarkdown and Quarto

Quarto is supported by Posit, the people behind RStudio (who also developed Rmarkdown). Right now if you work in R, Rmarkdown will suffice, but the use of Quarto is growing and ultimately it will be the successor.

Luckily everything you have just learnt about Rmd formatting will work for Quarto Markdowns (qmds) too.

Why Quarto?

It will natively allow you to run code for: R, Python, Julia, JavaScript.

Bioinformatics has a rich history with R, while Python has a wealth of machine learning packages. With the more recent developments in deep learning and AI, more folks are dabbling with specific Python tools for parts of their analysis. As a result, bioinformatics is becoming increasingly multi-language.

Why Quarto?

There is a lot of active development to add new features not found in Rmds.

How to make a Qmd?

In RStudio it is the same as making an Rmd. Just pick a different option:

New > Quarto Document

This should largely look the same.

Quarto Code Chunks

Quarto


```{r}
#| warning: false
#| message: false
library(ggplot2)
```

Rmarkdown


```{r, warning=FALSE, message=FALSE}
library(ggplot2)
```

Quarto will accept the Rmarkdown syntax regardless, as there is backwards compatibility.

Quarto YAML

Most of the YAML will work in the same way, though there are slight differences in the preferred names.

Quarto

---
format: 
  html:
    fig-width: 7
    fig-height: 6
---

Rmarkdown

---
output: 
  html_document:
    fig_width: 7
    fig_height: 6
---

Quarto will accept the Rmarkdown version regardless, as there is some backwards compatibility.

A key difference

Resources like images are not embedded by default into your report when you are using Quarto. This means that if you move your report to share it, images and other resources will be missing unless you also move the files associated with the rendered HTML, i.e. MyReport.html and MyReport_files.

To ensure everything you need is embedded into the final document you just need to add embed-resources: true to the YAML.

---
format: 
  html:
    embed-resources: true
    fig-width: 7
    fig-height: 6
---

An example YAML

---
title: "My_Project_Name"
author: 
  - "Matt Paul"
  - "Bioinformatics Resource Center"
  - "Rockefeller University"
date: "`r format(Sys.Date(), '%m/%d/%Y')`"
format:
  html:
    embed-resources: true
    code_folding: hide
    number_sections: true
    theme: yeti
    highlight: tango
    toc: true
    toc_float: true
---

Quarto and Jupyter

As mentioned, many people who already make reports use Jupyter. These are typically .ipynb files.

Quarto can be used to render these files directly. To do this you just need to add a new raw cell at the top containing our YAML. Code arguments can also be added.


Quarto and Jupyter

You just need to run Quarto directly to render the Jupyter notebook. You will run this on the command line.


quarto render notebook.ipynb --execute
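
If you would rather stay in an R session, the same command can be issued with base R's system2(). This is a sketch that assumes the quarto command line tool is installed and on your PATH; notebook.ipynb is the example file name from above.

```r
# Locate the quarto executable; an empty string means it was not found
quartoBin <- Sys.which("quarto")
notebook <- "notebook.ipynb"

if (nzchar(quartoBin) && file.exists(notebook)) {
  # Equivalent to running: quarto render notebook.ipynb --execute
  exitStatus <- system2(quartoBin, c("render", notebook, "--execute"))
  print(exitStatus)
} else {
  message("Skipping render: quarto or ", notebook, " was not found.")
}
```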

Quarto and VS Code

We have mostly been showing you how to use Quarto with RStudio. But if you are a VS Code user you can just install the Quarto extension there and use it within VS Code.

Quarto and the future

Rmarkdown and Jupyter are great and widely supported. They will likely be superseded by Quarto over time.

To summarize:

  • Rmarkdown and Jupyter will largely work with Quarto already.
  • New features and increased customization options.
  • You will probably want to use embed-resources.

Exercises

The exercise on Reproducible Reports can be found here.

Contact

If you have any suggestions, comments, edits or questions (about the content or the slides themselves), please reach out on our GitHub and raise an issue.