These exercises follow the material in the GitHub section of Reproducible
R.
- Command line Git
- Create a new folder called GitHub_Training, and turn it into a git
repository using the git command line.
- Create a README.md and write the name of a TV show in it.
- Copy the dayOfWeek R script from the data directory to the
GitHub_Training directory.
- Add and commit the README and the R script separately.
- Create a repository on GitHub and add it as a remote to your local
GitHub_Training repository.
- Push changes up to the remote.
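```shell
# create the folder and turn it into a Git repository
mkdir GitHub_Training
cd GitHub_Training
git init
# write the name of a TV show into the README
echo 'The Twilight Zone' >> README.md
cp ../data/dayOfWeek.r .
# add and commit each file separately
git add README.md
git commit -m "create README"
git add dayOfWeek.r
git commit -m "add days of week script"
# link the GitHub repository as the remote, then push
git remote add origin https://github.com/BRC-RU/GitHub_Training.git
# use main instead of master if that is your local branch name
git push origin master
```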
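You can rehearse the same steps offline before pushing to GitHub. The sketch below is a minimal, hypothetical setup that assumes only that git is installed. A local bare repository stands in for the GitHub remote, and a one-line stand-in file replaces the real dayOfWeek.r. The point to notice is that each `git commit` is preceded by its own `git add`, so each commit contains exactly one file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch area; nothing here touches your real GitHub_Training folder.
demo=$(mktemp -d)

# Hypothetical stand-in for the GitHub repository: a local bare repo.
git init -q --bare "$demo/remote.git"

# Hypothetical stand-in for ../data/dayOfWeek.r
mkdir -p "$demo/data"
echo 'weekdays(Sys.Date())' > "$demo/data/dayOfWeek.r"

mkdir "$demo/GitHub_Training"
cd "$demo/GitHub_Training"
git init -q
# Name the branch explicitly so the push below works whatever Git's default is.
git symbolic-ref HEAD refs/heads/master
git config user.name "Trainee"
git config user.email "trainee@example.com"

echo 'The Twilight Zone' > README.md
cp ../data/dayOfWeek.r .

# One commit per file: stage exactly what each commit should contain.
git add README.md
git commit -q -m "create README"
git add dayOfWeek.r
git commit -q -m "add days of week script"

# Point origin at the stand-in remote and push the branch.
git remote add origin "$demo/remote.git"
git push -q origin master
```

With a real GitHub repository, only the `git remote add` URL changes.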
- RStudio Git Integration
- Create a version-controlled R project and link it to your
GitHub_Training remote. Save it in a different location from the Git
repository you created in section 1 with command line Git.
- Update the README.md to include the name of a movie.
- Create a new R Markdown document. Within it, source the dayOfWeek R
script and add session information, then compile the R Markdown document to HTML.
- Update the .gitignore so Rmd files are ignored.
- Stage and commit changes through RStudio. Don't commit every change
at once; commit changes that belong together.
- Push changes up to GitHub.
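RStudio's Git pane runs ordinary Git underneath, so you can also check the .gitignore and staging steps from a terminal. This is a minimal sketch in a throwaway repository. The file names analysis.Rmd and analysis.html and the movie name are hypothetical. `git check-ignore` exits successfully and prints the path when an ignore rule matches it. The ignore rule and the README change go into separate commits because they are unrelated changes.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository with hypothetical project files.
proj=$(mktemp -d)
cd "$proj"
git init -q
git config user.name "Trainee"
git config user.email "trainee@example.com"
echo 'Casablanca' > README.md
echo '# analysis' > analysis.Rmd
echo '<html></html>' > analysis.html

# Ignore R Markdown sources, as the exercise asks (creates .gitignore if missing).
echo '*.Rmd' >> .gitignore

# Prints "analysis.Rmd" because the *.Rmd rule matches it.
git check-ignore analysis.Rmd

# Commit the ignore rule on its own, then the README change separately.
git add .gitignore
git commit -q -m "ignore Rmd files"
git add README.md
git commit -q -m "add movie to README"
```

Afterwards `git status --porcelain` lists only the untracked HTML file. The Rmd file no longer appears because it is ignored.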
- Workflow and Collaboration
Next we will practice the GitHub workflow using the repository here.
This is my collection of GIFs showing how this training session makes me
feel, and you can add to it.
Contribute to the issues: either create a new issue or comment on
existing ones.
Fork the repository. You can create a local copy of your fork
through RStudio or command line Git.
Find a GIF that you like (GIPHY has lots of options) and copy its URL.
Open the README.md document and add a line that inserts your GIF.
An example line is: <img src="https://my_gif_url_here.gif" width="300"/>
Create a pull request to merge your fork back into the main
repository.
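You can also drive the fork-and-pull-request cycle from the command line. This is a minimal offline sketch under stated assumptions. A local bare repository, seeded with a README, is a hypothetical stand-in for your fork on GitHub. The branch name add-my-gif is illustrative, and the GIF URL is the placeholder from the example line. Working on a topic branch keeps the pull request limited to your change. The pull request itself is still opened on GitHub.

```shell
#!/usr/bin/env bash
set -euo pipefail

demo=$(mktemp -d)
# Hypothetical stand-in for your fork on GitHub: a local bare repository.
git init -q --bare "$demo/fork.git"
git clone -q "$demo/fork.git" "$demo/gifs" 2>/dev/null
cd "$demo/gifs"
git config user.name "Trainee"
git config user.email "trainee@example.com"

# Seed the stand-in fork with a README, as the real repository already has one.
git symbolic-ref HEAD refs/heads/main
echo '# Training GIFs' > README.md
git add README.md
git commit -q -m "seed README"
git push -q origin main

# Work on a topic branch so the pull request contains only your change.
git checkout -q -b add-my-gif
echo '<img src="https://my_gif_url_here.gif" width="300"/>' >> README.md
git commit -q -am "add my GIF to README"

# Push the branch to the fork; then open a pull request on GitHub.
git push -q origin add-my-gif
```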
- Build a Docker image from GitHub
Fork this repository: https://github.com/RockefellerUniversity/reproducibility_exercise
Create a local copy by cloning your fork.
You’ll notice there is a Dockerfile in the directory that you
have created. Currently, this Dockerfile only installs the
pseudo-aligner software Salmon.
Make the same plot that we made in the Docker exercises using the
Docker image from Docker Hub.
- This Dockerfile does not install ggplot2. Add a line of code so that
ggplot2 will be installed in an image built from this
Dockerfile.
- Build an image from this revised Dockerfile
- Run a container from this image and open an RStudio session in your
browser. HINT: make sure you mount the directory of the cloned
repository (on your computer) onto the Docker container.
- Activate the conda environment that contains salmon and confirm the
version that you have installed
- Use salmon to get counts for the fastq files present in the folder
from the cloned repository.
- code for indexing salmon: salmon index -t /path/to/fasta_file.fa -i /path/to/index_destination
- code for pseudo-alignment: salmon quant -i /path/to/index_destination -l A -1 path/to/reads_1.fastq -2 path/to/reads_2.fastq --output path/to/output_dir
- learn more in the Salmon manual
- Read the quant.sf result file into your RStudio session and make
a bar graph showing the number of reads for each of the genes in the
fasta file.
- save a png or pdf of the plot in the cloned repository
directory
- Add the revised Dockerfile and the plot to the Git repository on your
computer.
- Push these changes to your forked repository on GitHub.
```shell
# clone the fork using git on the terminal
git clone https://github.com/dougbarrows/reproducibility_exercise
# navigate to the directory created by cloning
cd reproducibility_exercise
```
## Edited Dockerfile

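One way to add ggplot2 is a RUN instruction that calls `install.packages()` during the build. This sketch assumes the base image already provides R's `Rscript`. The RStudio port 8787 and /home/rstudio used below suggest a rocker/rstudio-style image, but check the FROM line of your own Dockerfile. The block works on a hypothetical stand-in Dockerfile in a scratch directory. In the cloned repository you would run only the guarded append. The `grep -qxF` guard makes the edit idempotent, so re-running it does not add a duplicate line.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory with a hypothetical stand-in Dockerfile.
work=$(mktemp -d)
cd "$work"
printf 'FROM rocker/rstudio\n' > Dockerfile

# Build-time install of ggplot2 from CRAN (assumes Rscript exists in the image).
line="RUN Rscript -e \"install.packages('ggplot2', repos = 'https://cloud.r-project.org')\""

# Append only if this exact line is not already present.
grep -qxF "$line" Dockerfile || echo "$line" >> Dockerfile

cat Dockerfile
```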
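```shell
# build the image from the revised Dockerfile
docker image build -t salmon_with_ggplot2 .
# run a container with the cloned repository mounted as the RStudio home directory
docker container run --rm \
  -v "$(pwd)":/home/rstudio \
  -p 8787:8787 \
  -e PASSWORD=password123 \
  salmon_with_ggplot2
# open up the RStudio session at http://localhost:8787/
```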
```r
# activate the conda environment that contains salmon
Herper::local_CondaEnv("pipe_env", "/home/miniconda")
system("salmon --version")
# build the salmon index, then quantify the paired-end reads against it
system("salmon index -t transcripts.fasta -i transcripts_index")
system("salmon quant -i transcripts_index -l A -1 reads_1.fastq -2 reads_2.fastq --output sample_counts")
# read the salmon results (tab-separated, with a header row)
library(ggplot2)
counts <- read.table("sample_counts/quant.sf", header = TRUE, sep = "\t")
# bar graph of reads per sequence, with rotated x-axis labels
ggplot(counts, aes(x = Name, y = NumReads)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
# save the last plot into the cloned repository directory
ggsave("exercise_plot.pdf")
```

```shell
# look at files that have changed in the git repo
git status

# stage and commit the revised Dockerfile and the plot, then push to the fork
git add Dockerfile exercise_plot.pdf
git commit -m "added ggplot and plot"
git push origin main
```
