class: center, middle, inverse, title-slide

.title[
# Using containers with Docker
]
.author[
### Rockefeller University, Bioinformatics Resource Centre
]
.date[
### https://rockefelleruniversity.github.io/RU_reproducibleR/
]

---

## Set Up

All prerequisites, links to material and slides for this course can be found on GitHub.

* [Reproducible_R](https://rockefelleruniversity.github.io/RU_reproducibleR/)

Or they can be downloaded as a zip archive from here.

* [Download zip](https://github.com/RockefellerUniversity/RU_reproducibleR/archive/master.zip)

---

## Course materials

Once the zip file is unarchived, all presentations as HTML slides and pages, their R code and HTML practical sheets will be available in the directories underneath.

* **presentations/slides/** Presentations as an HTML slide show.
* **presentations/singlepage/** Presentations as an HTML single page.
* **presentations/r_code/** R code in presentations.
* **exercises/** Practicals as HTML pages.
* **answers/** Practicals with answers as HTML pages and R code solutions.
* **data/** Data used in this presentation.

---
class: inverse, center, middle

# What are containers? <br> Why should we use them?

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---

## The problem

Something works on your computer (e.g. a bioinformatics analysis or a software deployment), and you want to make sure that it will work on another computer.

<img src="imgs/jhu_docker_rationale.png" width="75%" />

<font size='1'><a href="https://jhudatascience.org/Adv_Reproducibility_in_Cancer_Informatics/launching-a-docker-image.html">https://jhudatascience.org/Adv_Reproducibility_in_Cancer_Informatics/launching-a-docker-image.html - </a><a href="https://creativecommons.org/licenses/by/4.0">CC-BY 4.0</a></font>

---

## The solution - Docker!

Docker allows for the creation of an isolated environment that can be shipped across different users, machines, or operating systems, and to virtual machines or the cloud.
<img src="imgs/jhu_docker_rationale2.png" width="75%" />

<font size='1'><a href="https://jhudatascience.org/Adv_Reproducibility_in_Cancer_Informatics/launching-a-docker-image.html">https://jhudatascience.org/Adv_Reproducibility_in_Cancer_Informatics/launching-a-docker-image.html - </a><a href="https://creativecommons.org/licenses/by/4.0">CC-BY 4.0</a></font>

---

## Docker client and host

Once installed on your computer, Docker runs a process called the Docker daemon. A daemon is a program that runs in the background and is not under the direct control of the user. The Docker daemon is the engine that manages Docker services and objects (such as images and containers) by communicating with the client.

The Docker client, typically the command line interface, sends [user commands](https://docs.docker.com/engine/reference/commandline/cli/) to the Docker daemon.

<img src="imgs/docker_schema_empty.png" width="85%" style="display: block; margin: auto;" />

---

## Creating Docker images

The 'docker image build' command uses a Dockerfile to create an image. A Docker image is a read-only, isolated file system that contains all the software, dependencies, scripts, and metadata required to run a container.

<img src="imgs/docker_schema_addBuild.png" width="85%" style="display: block; margin: auto;" />

---

## Launching Docker containers

Once an image is built, an instance of this image can be launched as a stand-alone application, known as a container.

<img src="imgs/docker_schema_addRun.png" width="85%" style="display: block; margin: auto;" />

---

## Pulling Docker images

There are public repositories of Docker images (e.g. [Docker Hub](https://hub.docker.com/)), and typically you start with an existing image and build on top of it.

<img src="imgs/docker_schema_all.png" width="75%" style="display: block; margin: auto;" />

---

## Installing Docker

Use [this link](https://www.docker.com/get-started/) to install Docker.
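
The 'docker --version' check used on this slide only reports the version of the client, so it does not confirm that the daemon described above is running. A minimal sketch of a check that does, assuming a POSIX shell: 'docker info' has to contact the daemon and exits with a non-zero status when it cannot. The function name 'docker_daemon_status' is hypothetical, not a Docker command.

``` sh
# Hypothetical helper: report whether the Docker daemon is reachable.
# 'docker info' queries the daemon, so it fails when Docker (e.g. Docker Desktop)
# is not running, and also when the docker CLI is not installed at all.
docker_daemon_status() {
  if docker info >/dev/null 2>&1; then
    echo "Docker daemon is running"
  else
    echo "Docker daemon is not reachable"
  fi
}

docker_daemon_status
```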
* Click on the Docker Desktop icon and make an account with Docker.
* Docker must be open and running to use the command line interface (CLI), which is how we will primarily use Docker.
* [See here](https://docs.docker.com/engine/reference/commandline/cli/) for Docker CLI commands.

Check the Docker version to make sure the Docker CLI is installed.

*Code (terminal):*

``` sh
docker --version
```

*Output:*

<img src="imgs/docker_version.png" width="40%" style="display: block; margin: auto auto auto 0;" />

---

## Installing Docker

If the previous command is not found, check the Docker Desktop advanced settings and make sure the CLI tools are available system-wide.

<img src="imgs/docker_config.png" width="100%" style="display: block; margin: auto;" />

---
class: inverse, center, middle

# Running Docker containers<br>from Docker Hub images

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---

## Pulling Docker images - Rocker

[Rocker](https://rocker-project.org/) is a very useful source of images on Docker Hub for R and RStudio. We can pull these images immediately after installing Docker. Here we pull an image containing RStudio and a specific version of R.

*Code (terminal):*

``` sh
docker image pull rocker/rstudio:4.4.0 # alias of 'docker pull' from older Docker versions
```

---

## Viewing local Docker images

After pulling, the image is available on our system to run. Images have names, tags, and image IDs as shown in the output. The ID is a hash of the metadata and filesystem of the Docker image.

*Code (terminal):*

``` sh
docker images
```

*Output:*

<img src="imgs/docker_images.png" width="100%" style="display: block; margin: auto auto auto 0;" />

---

## Viewing local Docker images

After pulling, the image is available on our system to run. Images have names, tags, and image IDs as shown in the output. The ID is a hash of the metadata and filesystem of the Docker image.
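
The full ID of a single image can also be printed from the terminal. A minimal sketch using 'docker image inspect' with a Go template passed to '--format'; the helper name 'image_id' is hypothetical. The IMAGE ID column of 'docker images' typically shows a shortened form of this value.

``` sh
# Hypothetical helper: print the full ID (a sha256 hash) of a local image.
image_id() {
  docker image inspect --format '{{.Id}}' "$1"
}

# Usage (the image must have been pulled first):
#   image_id rocker/rstudio:4.4.0
```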
We can also view the image in Docker Desktop:

<img src="imgs/docker_desktop_images.png" width="80%" style="display: block; margin: auto;" />

---

## Running Docker containers

Once the image is on our system, we can launch a container with the ['docker container run' command](https://docs.docker.com/engine/reference/commandline/run/).

Components of the run command:

* --rm: automatically removes the container when it exits; otherwise old, unused containers build up and take up space on your computer
* -p: maps a port; the number before the colon is the port exposed on your computer and the number after the colon is the port inside the container
* -e: sets an environment variable when the container is run; here PASSWORD sets the password used to log in
* the last argument is the image name and tag (both seen with 'docker images')

*Code (terminal):*

``` sh
docker container run --rm \
  -p 8787:8787 \
  -e PASSWORD=password123 \
  rocker/rstudio:4.4.0 # alias of 'docker run' from older Docker versions
```

---

## Running Docker containers

While the container is running, we can go to 'http://localhost:8787' in a browser and log in with the user 'rstudio' and the password set in 'docker container run'.

<img src="imgs/rstudio_interface.png" width="100%" style="display: block; margin: auto;" />

---

## Listing active Docker containers

To see all containers running in the local environment, use the 'docker container ls' command.

*Code (terminal):*

``` sh
docker container ls # alias of 'docker ps' from older Docker versions
```

*Output:*

<img src="imgs/docker_ps.png" width="100%" style="display: block; margin: auto;" />

---

## Stopping Docker containers

To stop the running container, press Ctrl+C in the terminal tab where it was launched.
Alternatively, open another terminal tab and use the 'docker container stop' command with the ID listed by 'docker container ls'.

*Code (terminal):*

``` sh
docker container stop 8b1619f9189d # this is the ID from 'docker container ls'
docker container ls
```

*Output:*

<img src="imgs/docker_stop.png" width="100%" style="display: block; margin: auto;" />

---

## Adding volumes to containers

The Docker container has its own file system, and we can mount a local directory onto that file system with the '-v' flag of the 'docker container run' command.

* Navigate to the 'r_course' directory within the downloaded course material using the 'cd' command in the terminal
* Use the 'docker container run' command with the '-v' flag
    + the left side of the colon is the path on your computer to mount
    + the right side is the location within the Docker container file system where that data will be accessible
    + '/home/rstudio' is set by the container to be the working directory of RStudio

*Code (terminal):*

``` sh
# navigate to 'r_course' directory in downloaded material
cd ~/Downloads/RU_reproducibleR-master/r_course

# launch docker container
docker container run --rm \
  -v ./data:/home/rstudio \
  -p 8787:8787 \
  -e PASSWORD=password123 \
  rocker/rstudio:4.4.0
```

---

## Adding volumes to containers

The RStudio interface now shows the files in the 'data' directory.

<img src="imgs/rstudio_interface_volume.png" width="100%" style="display: block; margin: auto;" />

---

## Adding volumes to containers

These files can be read into R, and new files can also be written back to the local directory.

*Code (R in docker image):*

``` r
dataIn <- read.csv("readThisTable.csv")
head(dataIn, 2)

# add gene IDs and write to new file on local computer
dataIn$Gene_ID <- seq(nrow(dataIn))
write.csv(dataIn, "rnaseq_table_withIDs.csv")
```

---

## Adding volumes to containers

These files can be read into R, and new files can also be written back to the local directory.

*Output:*

<img src="imgs/rstudio_interface_volume_write.png" width="100%"
style="display: block; margin: auto auto auto 0;" />

---

## Adding volumes to containers

In addition to the files deliberately written to the local directory, RStudio writes the session and settings files for this R environment to the working directory in the container. Because that directory is the mounted volume, these files also appear in the local directory as hidden folders (.config and .local). This R environment will then be loaded the next time you launch an RStudio container with this volume mounted. This is normally fine, but if you want a fresh RStudio session with the same mounted volume, remove these hidden directories first.

*Code (terminal):*

``` sh
# For Windows use: dir /a
ls -a data
rm -r data/.local data/.config
```

*Output:*

<img src="imgs/docker_hidden_files.png" width="100%" style="display: block; margin: auto;" />

---
class: inverse, center, middle

# Building custom images<br>from a Dockerfile

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---

## Dockerfile basics and commands

The image we pulled from Rocker contains base R and its associated packages. To customize the image, we need to write a Dockerfile that adds to the Rocker image.

A Dockerfile provides the recipe to make the image. Using [specialized commands](https://docs.docker.com/engine/reference/builder/), this file provides instructions to install the R packages and their dependencies. Some examples:

* FROM: sets the base image that all further instructions build on
* RUN: executes a command as if in a terminal, while the image is being built
* LABEL: adds metadata to the image
* COPY: copies files from the host system to the image file system
* CMD: sets the command that will be run when the container is launched

---

## Dockerfile components

<img src="imgs/dockerfile1_all.png" width="85%" style="display: block; margin: auto auto auto 0;" />

---

## Dockerfile components

Here we start with the same RStudio base image we used previously, and then add some key R packages.
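
For reference while stepping through the components, here is a minimal sketch of a Dockerfile with the same overall shape (base image, system libraries, R packages, port and start-up command), written from the terminal with a heredoc. The directory 'my_rstudio', the system libraries and the R package are illustrative assumptions, not the contents of the course's Dockerfile.

``` sh
# Hypothetical build directory for this sketch
mkdir -p my_rstudio

# The quoted 'EOF' stops the shell from expanding anything inside the Dockerfile
cat > my_rstudio/Dockerfile <<'EOF'
FROM rocker/rstudio:4.4.0

# System libraries that many R packages need to compile (illustrative, not exhaustive)
RUN apt-get update && \
    apt-get install -y --no-install-recommends libxml2-dev libcurl4-openssl-dev libssl-dev && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# warn=2 turns warnings into errors, so a failed package install stops the build
RUN Rscript -e "options(warn=2); install.packages('BiocManager'); BiocManager::install('GenomicRanges', update = FALSE, ask = FALSE)"

# RStudio Server listens on port 8787; /init starts it
EXPOSE 8787
CMD ["/init"]
EOF
```

It could then be built with 'docker image build -t my_rstudio ./my_rstudio' (the tag is also hypothetical).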
<img src="imgs/dockerfile1_FROM.png" width="85%" style="display: block; margin: auto auto auto 0;" />

---

## Dockerfile components

The first RUN command installs system dependencies that are commonly needed by R packages. This command updates the package lists, installs the libraries, and cleans up unnecessary files.

<img src="imgs/dockerfile1_sys_deps.png" width="85%" style="display: block; margin: auto auto auto 0;" />

---

## Dockerfile components

This list of libraries is not comprehensive, and adding more R packages could result in missing dependencies. These can be identified in the log of the build command and manually added to the *apt-get* command. Alternatively, the dependencies of a specific R or Python package can be found using the [Posit package manager](https://packagemanager.posit.co). This resource can also be used to install only the libraries that are necessary, which helps to limit the size of the image.

<img src="imgs/dockerfile1_sys_deps.png" width="85%" style="display: block; margin: auto auto auto 0;" />

---

## Dockerfile components

Then the R packages are installed using 'install.packages', or 'BiocManager::install' for Bioconductor packages.

Note: 'options(warn=2)' at the beginning of the R command turns warnings into errors, so the installation stops at the first warning, making it easier to debug.

<img src="imgs/dockerfile1_Rpackages.png" width="85%" style="display: block; margin: auto auto auto 0;" />

---

## Dockerfile components

Port 8787 is exposed, and the 'init' script included with the base RStudio image is set as the command to run at launch. These commands are specific to this image and will vary depending on what the Docker container is meant to do when launched.
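
For example, a container meant to run a single R script and then exit needs no exposed port and ends with a different CMD. A minimal sketch that also uses the LABEL and COPY instructions; the directory 'script_demo', the script 'hello.R', the '/opt/scripts' path and the label text are all hypothetical.

``` sh
# Hypothetical build context containing a Dockerfile and one R script
mkdir -p script_demo

# The script that will be copied into the image
cat > script_demo/hello.R <<'EOF'
cat("Running", R.version.string, "inside the container\n")
EOF

cat > script_demo/Dockerfile <<'EOF'
# R only, no RStudio
FROM rocker/r-ver:4.4.0

# Metadata attached to the image (visible with 'docker image inspect')
LABEL description="Runs hello.R with a fixed R version"

# Copy the script from the build context into the image
COPY hello.R /opt/scripts/hello.R

# No port is exposed: the container runs the script and then exits
CMD ["Rscript", "/opt/scripts/hello.R"]
EOF
```

Building with 'docker image build -t script_demo ./script_demo' and then running 'docker container run --rm script_demo' should print the R version and exit.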
<img src="imgs/dockerfile1_EXPOSE_CMD.png" width="85%" style="display: block; margin: auto auto auto 0;" />

---

## Building an image with a Dockerfile

* A tag is added with '-t' to distinguish this image
* The directory that contains the Dockerfile (the build context) is the last argument
* If no file name is given, Docker looks for a file called 'Dockerfile' in that directory
    + 'Dockerfile' is in the data directory of the course materials

*Code (terminal):*

``` sh
cd ~/Downloads/RU_reproducibleR-master/r_course
docker image build -t rocker/rstudio:4.4.0_v2 ./data
```

*Output:*

<img src="imgs/dockerV1_build_log.png" width="100%" style="display: block; margin: auto auto auto 0;" />

---

## Building an image with a Dockerfile

Use the *docker images* command to see the new image.

*Code (terminal):*

``` sh
docker images
```

*Output:*

<img src="imgs/docker_images_v1.png" width="65%" style="display: block; margin: auto auto auto 0;" />

---

## Running the custom container

As done previously, use the *docker container run* command to launch a container with our customized RStudio session.

*Code (terminal):*

``` sh
docker container run --rm \
  -v ./data:/home/rstudio \
  -p 8787:8787 \
  -e PASSWORD=password123 \
  rocker/rstudio:4.4.0_v2
```

---

## Running the custom container

As done previously, use the *docker container run* command to launch a container with our customized RStudio session.

*Output:*

<img src="imgs/docker_image_v1_interface.png" width="80%" style="display: block; margin: auto auto auto 0;" />

---
class: inverse, center, middle

# Install conda packages<br>in a Docker image

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---

## Use Herper for conda packages

<img src="imgs/dockerfile_samtools_all.png" width="100%" style="display: block; margin: auto auto auto 0;" />

---

## Use Herper for conda packages

The directory that contains the Dockerfile is the last argument. This Dockerfile is not named 'Dockerfile', so we specify its exact path with the '-f' argument.

*Code (terminal):*

``` sh
docker image build -t rocker/rstudio:4.4.0_samtools -f ./data/Dockerfile_samtools ./data/
```

*Output:*

<img src="imgs/docker_samtools_build_log.png" width="100%" style="display: block; margin: auto auto auto 0;" />

---

## Use Herper for conda packages

*Code (terminal):*

``` sh
docker images
```

*Output:*

<img src="imgs/docker_images_samtools.png" width="70%" style="display: block; margin: auto auto auto 0;" />

*Code (terminal):*

``` sh
docker container run --rm \
  -v ./data:/home/rstudio \
  -p 8787:8787 \
  -e PASSWORD=password123 \
  rocker/rstudio:4.4.0_samtools
```

---

## Use Herper for conda packages

*Code (R in docker image):*

``` r
library(Herper)

# the environment name and miniconda path set in the Dockerfile
Herper::local_CondaEnv(new = "pipe_env", pathToMiniConda = "/home/miniconda")

# test out samtools
system("samtools --help")
```

*Output:*

<img src="imgs/docker_image_samtools_interface.png" width="75%" style="display: block; margin: auto auto auto 0;" />

---
class: inverse, center, middle

# Run container from Docker Desktop

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---

## Run container from Docker Desktop

We can also run containers from Docker Desktop.

<img src="imgs/docker_desktop_samtools.png" width="100%" style="display: block; margin: auto auto auto 0;" />

---

## Run container from Docker Desktop

We can also run containers from Docker Desktop.

<img src="imgs/docker_desktop_samtools_running.png" width="100%" style="display: block; margin: auto auto auto 0;" />

---
class: inverse, center, middle

# Run Docker containers interactively in the terminal

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---

## Run R interactively in container

Docker is often used to run a specific set of software interactively within the terminal. This also makes it possible to run custom scripts contained within the Docker container.
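
Anything placed after the image name in 'docker container run' replaces the image's default command (its CMD), which is one way to run a script in a container without opening an interactive session. A minimal sketch, assuming an R script in the current directory and the 'rocker/r-ver:4.4.0' image; the helper name 'run_rscript' and the mount point '/work' are hypothetical, and '-w' sets the working directory inside the container.

``` sh
# Hypothetical helper: run an R script from the current directory with R 4.4.0.
# The current directory is mounted at /work and used as the working directory,
# so any files the script writes there also appear on the host.
run_rscript() {
  docker container run --rm \
    -v "$(pwd)":/work \
    -w /work \
    rocker/r-ver:4.4.0 \
    Rscript "$1"
}

# Usage (hypothetical script name):
#   run_rscript my_analysis.R
```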
So far we have only been using the RStudio Docker image, which provides its own user interface. Here we will use Docker to run a specific version of R and R packages interactively in the terminal. The Rocker suite also has images that contain only R, without RStudio.

---

## Run R interactively in container

All we have to do is change the base Docker image that we pull, and change the command at the end of the Dockerfile to 'R' so that it runs R once we launch a container.

<img src="imgs/dockerfile_R.png" width="70%" style="display: block; margin: auto auto auto 0;" />

---

## Run R interactively in container

After building this image, it should be run with the '-it' flags, which keep standard input open (-i) and allocate a terminal (-t) so the container runs within an interactive terminal. With these flags, the 'R' command at the end of the Dockerfile opens an interactive session of the version of R in the image when the container launches.

First we build this image from the Dockerfile like we've done before.

*Code (terminal):*

``` sh
cd ~/Downloads/RU_reproducibleR-master/r_course
docker image build -t rocker/r-ver:4.4.0_cust -f ./data/Dockerfile_R ./data/
```

Then we run a container with the *data* directory mounted and the '-it' flags.

*Code (terminal):*

``` sh
cd ~/Downloads/RU_reproducibleR-master/r_course
docker container run -it -v ./data:/data rocker/r-ver:4.4.0_cust
```

*Output on next slide*

---

## Run R interactively in container

*Output (terminal):*

<img src="imgs/onlyR_docker_Routput.png" width="80%" style="display: block; margin: auto auto auto 0;" />

---

## Run Python interactively in container

Python also has [images available](https://hub.docker.com/_/python) on Docker Hub, and we can make a Dockerfile that has a specific version of Python and any other packages we want available in that environment.

Here we pull the Python image, use pip to install a Python package (scanpy), then end with the 'python' command to open a Python session in the terminal.
Dockerfile:

<img src="imgs/dockerfile_scanpy.png" width="30%" style="display: block; margin: auto auto auto 0;" />

---

## Run Python interactively in container

*Code (terminal):*

``` sh
cd ~/Downloads/RU_reproducibleR-master/r_course
docker image build -t python_scanpy -f ./data/Dockerfile_scanpy ./data/
docker container run -it -v ./data:/data python_scanpy
```

<img src="imgs/python_console.png" width="100%" style="display: block; margin: auto auto auto 0;" />

---
class: inverse, center, middle

# Share images with Docker Hub

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---

## Pushing images to Docker Hub

If we want to share our images with someone else, or simply store them elsewhere for future use, we can push them to Docker Hub. Make sure you have an account on [Docker Hub](https://hub.docker.com/). Then create a repository with the same name as the image we want to push. To share the RStudio image with samtools, the repository would be called 'rstudio_4.4.0_samtools'. A repository must exist on Docker Hub before pushing to it.
<img src="imgs/dockerhub_create_repo.png" width="100%" style="display: block; margin: auto auto auto 0;" />

---

## Pushing images to Docker Hub

Once the repository is created on Docker Hub, we do the following steps to push the image:

* log in to Docker with your credentials
* tag the image you want to push with your Docker Hub username, the repository name and a version tag
* push to Docker Hub

*Code (terminal):*

``` sh
# log in and provide credentials used to sign into Docker Hub
# include the username you used for Docker Hub; this will prompt you to enter the password
docker login -u YOUR_DOCKERHUB_USERNAME

# tag the image you want to push with your Docker Hub username and a tag name after the colon
# the ID is from the 'docker images' command
docker image tag 292c85d1812f rubrc/rstudio_4.4.0_samtools:topush

# push to Docker Hub
docker image push rubrc/rstudio_4.4.0_samtools:topush
```

---

## Pushing images to Docker Hub

If we want to share our images with someone else, or simply store them elsewhere for future use, we can push them to Docker Hub. Make sure you have an account on [Docker Hub](https://hub.docker.com/).

<img src="imgs/dockerhub_after_push.png" width="100%" style="display: block; margin: auto auto auto 0;" />

---
class: inverse, center, middle

# Use renv and Docker together

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---

## Make a lock file

renv and Docker can be used together to easily recreate an R environment. There is a renv lock file in the 'r_course/data/renv_docker' folder within the course materials. This lock file, generated by renv, records the versions of R and of the packages that were part of my project.

<img src="imgs/lock_file_docker.png" width="70%" style="display: block; margin: auto auto auto 0;" />

---

## Dockerfile with renv

We will now use renv within a Dockerfile. R still needs to be installed to use renv, so we use Rocker again to install a specific version of R that matches the renv lock file.
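
A minimal sketch of how these pieces could fit together; it is not the course's own Dockerfile. It assumes an 'renv.lock' recorded with R 4.4.0 is placed in a hypothetical 'renv_demo' directory, and uses a '/project' working directory outside '/home/rstudio' so that a volume mounted there does not hide it. Passing 'library = .libPaths()[1]' to 'renv::restore()' is an assumption made here so the packages land in the image's site library and are visible from any RStudio session; renv's documentation on Docker describes other setups.

``` sh
# Hypothetical build context; copy your renv.lock into it before building
mkdir -p renv_demo

cat > renv_demo/Dockerfile <<'EOF'
# The R version here should match the R version recorded in renv.lock
FROM rocker/rstudio:4.4.0

# renv itself is needed to read the lock file
RUN Rscript -e "install.packages('renv')"

# Create a working directory and copy the lock file into it
WORKDIR /project
COPY renv.lock renv.lock

# Install the locked package versions into the first library path (the site library)
RUN Rscript -e "renv::restore(library = .libPaths()[1], prompt = FALSE)"

# EXPOSE 8787 and CMD ["/init"] are inherited from the base image
EOF
```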
<img src="imgs/dockerfile_renv_rver.png" width="80%" style="display: block; margin: auto auto auto 0;" />

---

## Dockerfile with renv

When building the image, the lock file is copied into a directory in the image that is created and set as the working directory with the WORKDIR instruction.

<img src="imgs/dockerfile_renv_restore.png" width="80%" style="display: block; margin: auto auto auto 0;" />

---

## Building image and running container

Build the image with the build context (last argument) set to the directory containing the Dockerfile and the lock file, then launch a container.

``` sh
cd ~/Downloads/RU_reproducibleR-master/r_course

# build the image
docker image build -t rocker/rstudio:4.4.0_renv ./data/renv_docker

# launch a container
docker container run --rm \
  -v ./data/renv_docker:/home/rstudio \
  -p 8787:8787 \
  -e PASSWORD=password123 \
  rocker/rstudio:4.4.0_renv
```

---

## Using Docker on HPC

Docker is generally not allowed on HPC clusters because it requires root access. You can still use Docker images with another containerization tool called Apptainer. Apptainer was designed to work well with Docker images, and the [Apptainer manual](https://apptainer.org/docs/user/main/docker_and_oci.html) goes into detail about how to use them.

Apptainer is not installed on the head nodes, so either run it interactively if your lab has its own node, or submit jobs to the HPC scheduler.

``` sh
# log in to your lab node

# pull image from Docker Hub
apptainer pull docker://rocker/r-ver:4.4.0

# open an R console using the resulting Apptainer image
apptainer exec r-ver_4.4.0.sif R
```

---

## Exercises

Exercises on Reproducibility in R can be found [here](../../exercises/exercises/Docker_exercise.html)

---

## Contact

For any suggestions, comments, edits or questions (about content or the slides themselves), please reach out to us on [GitHub](https://github.com/RockefellerUniversity/Reproducible_R/issues) and raise an issue.