All prerequisites, links to material and slides for this course can be found on GitHub.
They can also be downloaded as a zip archive from here.
Once the zip file is unarchived, all presentations (as HTML slides and pages), their R code and HTML practical sheets will be available in the directories underneath.
Before running any of the code in the practicals or slides we need to set the working directory to the folder we unarchived.
You may navigate to the unarchived Reproducible_R folder using the RStudio menu.
Session -> Set Working Directory -> Choose Directory
or in the console.
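A minimal sketch of the console route. The path here is a hypothetical placeholder, not the real download location, so replace it with wherever you unarchived the folder.

```r
# Hypothetical location of the unarchived folder; replace with your own path.
course_dir <- file.path("~", "Downloads", "Reproducible_R")

# Only change directory if the folder exists, so setwd() does not throw an error.
if (dir.exists(course_dir)) {
  setwd(course_dir)
}

# Confirm which directory R is now working in.
getwd()
```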
“Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to humans what we want the computer to do.” – Donald E. Knuth, Literate Programming, 1984
Sometime in the future, I, or someone else, will need to understand what analysis I did here.
There is a growing push to ensure all research is open and reproducible. New NIH guidelines are going to require plans for preservation and sharing of data, which includes your code.
Using RStudio to make reproducible documents is very easy, so why not?
When we do some analysis the ideal situation is the preservation of ALL aspects of the analysis in a single document.
Rmarkdown can help with this.
Here is an example of a report generated from Rmd. The Rmd from which this is made is in the scripts directory.
scripts/markdownExample.html scripts/markdownExample.Rmd
R Markdown is built on Markdown.
GitHub and SourceForge use Markdown syntax in their README files and render these on their web pages. It is also used by other notebook tools like Jupyter.
Markdown uses simple syntax to control text output.
This allows for the inclusion of font styles, text structures, images and code chunks.
There are 3 main parts to an R Markdown document: the YAML header, the Markdown text and the code chunks.
In R Markdown the options for document processing are stored in YAML format at the top of the document. Most of this is automatically generated when you open a new Rmd.
The output YAML option specifies the document type to be produced. There are several options.
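As a rough illustration, a YAML header of the kind RStudio generates might look like the sketch below. The title, author and date values are placeholders. html_document, pdf_document and word_document are some of the standard output types.

```yaml
---
title: "My Report"         # placeholder title
author: "Your Name"        # placeholder author
date: "2024-01-01"         # placeholder date
output: html_document      # alternatives include pdf_document and word_document
---
```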
Global default options can also be set in the YAML. For example figure sizes can be set within the YAML metadata.
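A sketch of how default figure sizes could be set for an HTML report. The sizes are in inches and the values here are arbitrary.

```yaml
---
title: "My Report"         # placeholder title
output:
  html_document:
    fig_width: 7           # default figure width in inches
    fig_height: 5          # default figure height in inches
---
```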
Styles for HTML can be applied using the theme option, and syntax highlighting styles are controlled by the highlight option.
For a full list of theme options see - http://rmarkdown.rstudio.com/html_document_format.html
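For example, a header along these lines would pick a theme and a highlighting style. The particular choices (cerulean, tango) are only illustrative picks from the lists on that page.

```yaml
---
title: "My Report"         # placeholder title
output:
  html_document:
    theme: cerulean        # HTML theme
    highlight: tango       # syntax highlighting style for code
---
```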
The free form text and annotation is set in Markdown. This is written as plain text and has a specific set of formatting rules. For example, it ignores new lines.
To include a new line in markdown, you need to end the previous line with two spaces.
This is my first line. # the comment here shows the line end
This would be a new line.
This wouldn't be a new line.
This is my first line.
This would be a new line. This wouldn’t be a new line.
Emphasis can be added to text in markdown documents by wrapping it in either _ or * characters: one for italics, two for bold.
Italics
Bold
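A short sketch of both forms in Markdown source:

```markdown
_Italics_ or *Italics*

__Bold__ or **Bold**
```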
Figures or external images can be used in markdown documents.
Files may be local or accessible from http URL.
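A sketch of the image syntax. Both the local path and the URL are hypothetical placeholders.

```markdown
![A local figure](imgs/myFigure.png)

![A figure from the web](http://example.com/myFigure.png)
```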
HTML links can be included in Markdown documents either by simply including the address in the text or by using [] for the phrase to add the link to, followed by the link in ()
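Both styles, sketched with the R project homepage as an example address:

```markdown
https://www.r-project.org/

[The R Project](https://www.r-project.org/)
```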
Section headers can be added to Markdown documents.
Headers follow the same conventions as used in HTML markup and can be implemented at multiple levels of size. Section headers in Markdown are created by using the # symbol, with more # symbols giving a smaller header.
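For instance, three levels of header could be written as:

```markdown
# Top level section

## Second level section

### Third level section
```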
In RMarkdown, text may be highlighted as if code by placing the text between lines of three backticks. For a code chunk, the engine used to evaluate the code is given in curly brackets after the opening backticks, in this case r.
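A minimal sketch of an R code chunk as it would be typed in the Rmd. The code inside is an arbitrary example.

````markdown
```{r}
x <- rnorm(10)
summary(x)
```
````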
This is what the code chunk renders to in your report:
Many other options can be supplied to an individual code chunk
R can produce a lot of output not related to your results. To control whether messages and warnings are reported in the rendered document we can specify the message and warning arguments.
Chunks that load libraries are a common place to set these to FALSE.
There are many chunk control options:
* eval - Run the code?
* echo - Include code in final report?
* tidy - tidy up the code?
* cache - save a cache
* fig.height/fig.width - size of plot made by code
* fig.path and dev - save plots to specified path in specified format
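Options like these can be given in an individual chunk header, for example {r, message=FALSE, warning=FALSE}, or set once as defaults for every chunk. The sketch below shows the second approach using knitr's opts_chunk. It assumes knitr is installed and that the call sits in a setup chunk near the top of the document. The specific values are arbitrary.

```r
library(knitr)

# Set default options for every chunk that follows in the document.
opts_chunk$set(
  echo = TRUE,       # show the code in the report
  eval = TRUE,       # run the code
  message = FALSE,   # hide messages (e.g. from loading libraries)
  warning = FALSE,   # hide warnings
  fig.width = 6,     # default plot width in inches
  fig.height = 4     # default plot height in inches
)

# Check the value that is now the default.
opts_chunk$get("message")
```

Options given in a chunk header override these defaults for that chunk only.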
The results of printing data frames or matrices in the console aren’t neat.
We can insert tables into R Markdown by using the knitr function kable().
temp | temp2 |
---|---|
-0.5657569 | -0.4546865 |
-0.8049686 | 0.2702011 |
-0.0391181 | -0.6841633 |
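A minimal sketch of how a table like the one above could be produced. The data frame and seed are made up for illustration, so the values will differ from those shown.

```r
library(knitr)

# Hypothetical demo data: two columns of random numbers.
set.seed(1)
dfTable <- data.frame(temp = rnorm(3), temp2 = rnorm(3))

# kable() turns the data frame into a Markdown table for the report.
tab <- kable(dfTable)
tab
```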
Most of your code will be in code chunks, but it may be useful to report the results of R within a block of Markdown. This can be done by adding the code to evaluate within backticks, prefixed with r.
Here is some freeform _markdown_ and the first result
from an rnorm call is `r rnorm(3)[1]`, followed by some
more free form text.
Here is some freeform markdown and the first result from an rnorm call is 1.2933661, followed by some more free form text.
As we are summarizing data in a dynamic document, we can make data exploration easier by creating interactive versions of plots and tables.
To do this we use the plotly package. Let's load in the package and the plot we want to make interactive. We have an R object that contains a ggplot2 object in our data directory.
The ggplotly function will wrap our ggplot object and add interactivity so we can hover over the points.
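A minimal sketch of the wrapping step. The name of the saved object in the data directory is not given here, so this example builds a small ggplot from made-up data instead. The column names (Sample, PC1, PC2) are assumptions for illustration.

```r
library(ggplot2)
library(plotly)

# Hypothetical demo data standing in for the saved plot object.
set.seed(1)
plotData <- data.frame(
  Sample = paste0("Sample", 1:6),
  PC1 = rnorm(6),
  PC2 = rnorm(6)
)

# An ordinary static ggplot.
p <- ggplot(plotData, aes(x = PC1, y = PC2)) +
  geom_point()

# Wrap it with ggplotly() to get an interactive version with hover labels.
interactivePlot <- ggplotly(p)
interactivePlot
```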
The information associated with the plot is naturally inherited as labels. If we want to add custom labels we can specify them in our plot.
## Warning in geom_point(aes(label = Sample)): Ignoring unknown aesthetics: label
If we want full control over the labeling we can instead use the ggplotly function to take care of this for us by specifying when/what/where labels are shown.
## Warning in geom_point(aes(text = Sample)): Ignoring unknown aesthetics: text
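A sketch of one way to control the labels, using made-up data with an assumed Sample column. The text aesthetic is not used by ggplot2 itself, which is why it warns about an unknown aesthetic, but ggplotly can pick it up through its tooltip argument.

```r
library(ggplot2)
library(plotly)

# Hypothetical demo data; the column names are assumptions.
set.seed(1)
plotData <- data.frame(
  Sample = paste0("Sample", 1:6),
  PC1 = rnorm(6),
  PC2 = rnorm(6)
)

# Attach the sample name as a text aesthetic.
# ggplot2 warns that it ignores the unknown aesthetic "text"; this is expected.
p <- ggplot(plotData, aes(x = PC1, y = PC2)) +
  geom_point(aes(text = Sample))

# Show only the text aesthetic when hovering over a point.
labelledPlot <- ggplotly(p, tooltip = "text")
labelledPlot
```

Passing a vector such as c("text", "x") to tooltip would show more than one field.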
Tables - DT
Similar to plots, we can create interactive tables that are both searchable and sortable. We do this with the DT package. Let's make some quick demo data.
label <- c("Gene1", "Gene2", "Gene3")
temp <- rnorm(3)
temp2 <- rnorm(3)
dfExample <- data.frame(label, temp, temp2)
dfExample
## label temp temp2
## 1 Gene1 0.9082745 -0.1489549
## 2 Gene2 -1.8589354 0.7825425
## 3 Gene3 -0.6779331 0.3106761
DT
We can simply use the datatable function to then create our interactive table.
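A minimal sketch of that call. It rebuilds the demo data so the example is self-contained.

```r
library(DT)

# Recreate the demo data frame.
label <- c("Gene1", "Gene2", "Gene3")
temp <- rnorm(3)
temp2 <- rnorm(3)
dfExample <- data.frame(label, temp, temp2)

# datatable() renders the data frame as a searchable, sortable HTML table.
interactiveTable <- datatable(dfExample)
interactiveTable
```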
DT
There are many ways to customize our table, e.g. removing rownames, adding titles, default sorting etc.
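A sketch of a few such customizations, using the same kind of demo data. The caption text and the choice of sort column are arbitrary.

```r
library(DT)

# Demo data frame.
dfExample <- data.frame(
  label = c("Gene1", "Gene2", "Gene3"),
  temp = rnorm(3),
  temp2 = rnorm(3)
)

customTable <- datatable(
  dfExample,
  rownames = FALSE,                      # drop the row name column
  caption = "Demo expression values",    # title shown with the table
  options = list(
    # Column indices are 0-based here; with rownames removed, 1 is "temp".
    order = list(list(1, "desc"))        # sort by temp, largest first
  )
)
customTable
```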
Rmarkdown has been an incredibly useful tool for building reports for a long time, but it is predominantly focused on R (though it can be used indirectly for Python).
We have also mentioned Jupyter Notebook. It is a similar notebook system. It was built for Python (though it can be used indirectly for R).
Quarto is a new form of report built on Rmds that can handle Python and other languages in a more native way.
Quarto is supported by Posit, the people behind RStudio (who also developed Rmarkdown). Right now if you work in R, Rmarkdown will suffice, but the use of Quarto is growing and ultimately it will be the successor.
Luckily everything you have just learnt about Rmd formatting will work for Quarto Markdowns (qmds) too.
It will natively allow you to run code for: R, Python, Julia, JavaScript.
Bioinformatics has a rich history with R, while Python has a wealth of machine learning packages. With the more recent developments in deep learning and AI, more folks are dabbling with specific Python tools for parts of their analysis. As a result bioinformatics is becoming increasingly multi-language.
There is a lot of active development to add new features not found in Rmds:
In RStudio it is the same as making an Rmd. Just pick a different option:
New > Quarto Document
This should largely look the same.
Quarto
Rmarkdown
This will be accepted by Quarto regardless. There is backwards compatibility.
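Assuming this comparison concerns code chunk options, the two styles can be sketched as follows. The chunk contents are an arbitrary example. Quarto prefers options written as special comments inside the chunk:

````markdown
```{r}
#| echo: false
#| warning: false
summary(cars)
```
````

Rmarkdown puts them in the chunk header:

````markdown
```{r, echo=FALSE, warning=FALSE}
summary(cars)
```
````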
Most of the YAML will work in the same way, though there are slight differences in the preferred names.
Quarto
Rmarkdown
This will be accepted by Quarto regardless. There is some backwards compatibility.
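As an illustration of the naming difference, the output type is the most visible case. The titles are placeholders. A Quarto header would typically use format:

```yaml
---
title: "My Report"
format: html
---
```

while the Rmarkdown equivalent uses output:

```yaml
---
title: "My Report"
output: html_document
---
```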
Resources like images are not embedded into your report by default when you are using Quarto. This means that if you move your report to share it, images and other resources will be missing unless you also move the files associated with the rendered HTML, i.e. both MyReport.html and MyReport_files.
To ensure everything you need is embedded into the final document you just need to add embed-resources: true to the YAML.
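In a Quarto header this could look like the sketch below. The title is a placeholder.

```yaml
---
title: "My Report"
format:
  html:
    embed-resources: true   # bundle images, CSS and scripts into the single HTML file
---
```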
As mentioned, many people who already make reports use Jupyter. These are typically .ipynb files.
Quarto can be used to render these files directly. To do this you just need to add a new cell at the top containing our YAML. Code arguments can also be added.
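A sketch of what such a cell might contain, assuming it is added as a raw cell at the very top of the notebook. The title is a placeholder.

```yaml
---
title: "My Notebook Report"
format:
  html:
    embed-resources: true
---
```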
You just need to run Quarto directly to render the Jupyter notebook. You will run this on the command line.
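On the command line this is quarto render followed by the notebook file. The sketch below issues the same command from R, purely so it matches the other examples here. The notebook name is a hypothetical placeholder.

```r
# Hypothetical notebook file name; replace with your own.
notebook <- "MyNotebook.ipynb"

# Equivalent to typing on the command line:  quarto render MyNotebook.ipynb
quartoArgs <- c("render", notebook)

# Only run if the quarto program is on the PATH and the notebook exists.
if (nzchar(Sys.which("quarto")) && file.exists(notebook)) {
  system2("quarto", quartoArgs)
}
```

By default Quarto does not re-run the cells of an .ipynb when rendering it. Adding --execute to the command asks it to do so.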
We have mostly been showing you how to use Quarto with RStudio. But if you are a VS Code user you can just install the Quarto extension there and use it within VS Code.
Rmarkdown and Jupyter are great and widely supported. They will likely be superseded by Quarto over time.
To summarize:
* Rmarkdown and Jupyter will largely work with Quarto already
* New features and increased customization options
* You will probably want to use embed-resources
Exercise on Reproducibility Reports can be found here
For any suggestions, comments, edits or questions (about content or the slides themselves), please reach out on our GitHub and raise an issue.