These exercises follow the material in the Docker section of Reproducible
R.
- Build a Docker image using a Dockerfile with the following
components:
- RStudio with R 4.4
- The following packages from CRAN: ggplot2 and BiocManager
- The following package from Bioconductor: Herper
- The pseudo-aligner ‘salmon’ from conda
- Confirm that this image is available on your computer
# the dockerfile to build this image is at './data/docker_exercise/Dockerfile_exercise' within the course folders
docker build -t rstudio_4.4.0_salmon -f ./data/docker_exercise/Dockerfile_exercise ./data/docker_exercise/
# Confirm that this image is available on your computer
docker images
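For orientation, a Dockerfile covering the components listed above might look roughly like the sketch below. It is not the course's Dockerfile_exercise: the rocker/rstudio:4.4.0 base image, the file name Dockerfile_sketch, and the use of Herper::install_CondaTools to create the pipe_env environment under /home/miniconda are assumptions, chosen to match the names used later in these exercises.

```shell
# Write an illustrative Dockerfile to the current folder.
# ASSUMPTION: this is a sketch, not the course's Dockerfile_exercise.
cat > Dockerfile_sketch <<'EOF'
# Base image (assumed): RStudio Server with R 4.4.0 from the Rocker project
FROM rocker/rstudio:4.4.0

# CRAN packages
RUN R -e "install.packages(c('ggplot2', 'BiocManager'))"

# Bioconductor package, installed without interactive prompts
RUN R -e "BiocManager::install('Herper', ask = FALSE, update = FALSE)"

# salmon from conda, placed in an environment named pipe_env under /home/miniconda
RUN R -e "Herper::install_CondaTools('salmon', 'pipe_env', pathToMiniConda = '/home/miniconda')"
EOF

# Show what was written
cat Dockerfile_sketch
```

If the sketch were used in place of the provided file, it could be built from the same folder with docker build -t rstudio_4.4.0_salmon -f Dockerfile_sketch .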
- Launch a container from this image and mount the ‘data’ folder from
the course folder structure into the container. Then open RStudio in
your browser.
- Make sure that the files in the ‘data’ folder are visible in the
RStudio session running inside the container.
# launch from the r_course directory in the course materials
cd ~/Downloads/RU_reproducibleR-master/r_course
# use an absolute host path for the mount; some Docker versions reject relative paths such as ./data
docker run --rm \
  -v "$(pwd)/data":/home/rstudio \
  -p 8787:8787 \
  -e PASSWORD=password123 \
  rstudio_4.4.0_salmon
# then open http://localhost:8787/ in your browser
- Activate the conda environment that contains salmon and confirm the
version that you have installed
Herper::local_CondaEnv("pipe_env", "/home/miniconda")
system("salmon --version")
- Use salmon to get counts for the fastq files present in the
‘data/docker_exercise’ folder within the course files.
- Command for building the salmon index: salmon index -t /path/to/fasta_file.fa -i /path/to/index_destination
- Command for pseudo-alignment: salmon quant -i /path/to/index_destination -l A -1 path/to/reads_1.fastq -2 path/to/reads_2.fastq --output path/to/output_dir
- Learn more in the salmon manual
system("salmon index -t docker_exercise/transcripts.fasta -i docker_exercise/transcripts_index")
system("salmon quant -i docker_exercise/transcripts_index -l A -1 docker_exercise/reads_1.fastq -2 docker_exercise/reads_2.fastq --output docker_exercise/sample_counts")
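Before reading the result into R, the salmon output can be sanity-checked from a terminal, for example the Terminal tab in RStudio. The sketch below defines a hypothetical helper, top_transcripts, that lists each transcript with its read count, highest first. It assumes the standard tab-separated quant.sf layout with the columns Name, Length, EffectiveLength, TPM and NumReads.

```shell
# Hypothetical helper (not part of salmon): print Name and NumReads from a
# quant.sf file, sorted by NumReads in descending order.
top_transcripts() {
  quant_file="$1"
  if [ ! -f "$quant_file" ]; then
    echo "quant file not found: $quant_file" >&2
    return 1
  fi
  # Skip the header row, keep columns 1 (Name) and 5 (NumReads),
  # then sort numerically on the read count, largest first.
  tail -n +2 "$quant_file" | cut -f1,5 | sort -t "$(printf '\t')" -k2,2nr
}
```

For example, calling top_transcripts docker_exercise/sample_counts/quant.sf from the mounted home directory would print the transcript with the most reads on the first line.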
- Read the quant.sf result file into R and plot a bar graph showing
the number of reads for each of the genes in the fasta file.
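library(ggplot2)
# quant.sf is a tab-separated table with a header row
counts <- read.table("docker_exercise/sample_counts/quant.sf", header = TRUE)
# one bar per sequence, with bar height equal to the estimated number of reads
ggplot(counts, aes(x = Name, y = NumReads)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))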

- Push this image to Docker Hub
- HINT: Make sure you create a repository on Docker Hub before
pushing.

# Code in terminal
# log in and provide credentials used to sign into Docker Hub
# this will prompt you to enter username and password
docker login -u username
# get image ID to tag and push
docker images
# tag the image as <Docker Hub username>/<repository>:<tag>; replace 'rubrc' with your own username
# the image ID comes from the output of the 'docker images' command
docker tag 2c5152e60109 rubrc/rstudio_4.4.0_salmon:topush
# push to Docker Hub
docker push rubrc/rstudio_4.4.0_salmon:topush
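As an alternative to copying the image ID, docker tag also accepts the local image name. The sketch below builds the remote name from variables and only tags and pushes if Docker is installed and the local image exists. The DOCKER_USER value is a placeholder to replace with your own Docker Hub username, and the sketch assumes that docker login has already been run and that the repository exists on Docker Hub.

```shell
# Placeholder: replace with your own Docker Hub username
DOCKER_USER="username"
# Local image name as given to 'docker build -t' earlier in these exercises
LOCAL_IMAGE="rstudio_4.4.0_salmon"
# Remote name in the form <username>/<repository>:<tag>
REMOTE_IMAGE="${DOCKER_USER}/${LOCAL_IMAGE}:topush"

echo "Local image:  ${LOCAL_IMAGE}"
echo "Remote image: ${REMOTE_IMAGE}"

# Only tag and push when Docker is available and the local image is present
if command -v docker >/dev/null 2>&1 && docker image inspect "$LOCAL_IMAGE" >/dev/null 2>&1; then
  docker tag "$LOCAL_IMAGE" "$REMOTE_IMAGE"
  docker push "$REMOTE_IMAGE"
else
  echo "Skipping tag and push: docker or the local image was not found" >&2
fi
```

Tagging by name avoids looking up the ID, at the cost of relying on the local name still pointing at the image you intend to push.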