Conda is an open-source and cross-platform package and environment management system.
It was originally developed to help manage packages for Python, but it has since expanded beyond Python and is also very popular with R users.
There are several related tools in the Conda ecosystem. They all have slight differences.
Conda - The core package manager
Miniconda - A minimal install of Conda
Anaconda - A full install of Conda with many packages preinstalled
Mamba - A faster reimplementation of Conda written in C++
Micromamba - A minimal, standalone version of Mamba
Herper is an R package that provides a simple toolset to install and manage Conda packages and environments from within the R console.
It builds on the reticulate package, which is used to run Python from within R.
The install_CondaTools() function allows the user to specify required Conda software and the desired environment to install into.
Miniconda is installed as part of the process (by default into reticulate's default Conda location, /github/home/.local/share/r-miniconda), and the user's requested Conda environment is built within the same directory (by default /github/home/.local/share/r-miniconda/envs/USERS_ENVIRONMENT_HERE).
library(Herper)
install_CondaTools(tools = "star", env = "rnaseq")
$pathToConda
[1] "/Users/mattpaul/Desktop/My_Conda/bin/conda"
$environment
[1] "rnaseq"
$pathToEnvBin
[1] "/Users/mattpaul/Desktop/My_Conda/envs/rnaseq/bin"
If you already have Miniconda installed or you would like to install to a custom location, you can specify the path with the pathToMiniConda parameter.
my_miniconda <- "~/Desktop/My_Conda"
install_CondaTools(tools = "star", env = "rnaseq", pathToMiniConda = my_miniconda)
It's easy to add more tools: run install_CondaTools() again with the extra argument updateEnv = TRUE.
By default, this command returns the path to Conda, the environment name, and the path to the environment's bin directory. We can save these in a variable for later.
conda_paths <- install_CondaTools(tools = c("salmon", "kallisto"), env = "rnaseq",
    updateEnv = TRUE, pathToMiniConda = my_miniconda)
conda_paths
$pathToConda
[1] "/Users/mattpaul/Desktop/My_Conda/bin/conda"
$environment
[1] "rnaseq"
$pathToEnvBin
[1] "/Users/mattpaul/Desktop/My_Conda/envs/rnaseq/bin"
The list_CondaPkgs function allows users to check what packages are installed in a given environment.
list_CondaPkgs("rnaseq", pathToMiniConda = my_miniconda)
boost-cpp 1.78.0 conda-forge osx-64
bzip2 1.0.8 pkgs/main osx-64
c-ares 1.19.0 pkgs/main osx-64
ca-certificates 2023.05.30 pkgs/main osx-64
hdf5 1.12.2 conda-forge osx-64
htslib 1.17 bioconda osx-64
icu 70.1 conda-forge osx-64
kallisto 0.50.0 bioconda osx-64
krb5 1.20.1 pkgs/main osx-64
libaec 1.0.6 conda-forge osx-64
If the user is unsure of the exact name or version of a tool available on Conda, they can use the conda_search function. When a query has no exact match, the search returns close matches instead.
conda_search("kall", pathToMiniConda = my_miniconda)
There are no exact matches for the query 'kall', but multiple packages contain this text:
- kallisto
- r-merge-kallisto
If you have the exact name you can search for what versions are available on Conda.
conda_search("kallisto", pathToMiniConda = my_miniconda)
2 kallisto 0.43.1 https://conda.anaconda.org/bioconda/osx-64
4 kallisto 0.44.0 https://conda.anaconda.org/bioconda/osx-64
5 kallisto 0.45.0 https://conda.anaconda.org/bioconda/osx-64
6 kallisto 0.45.1 https://conda.anaconda.org/bioconda/osx-64
8 kallisto 0.46.0 https://conda.anaconda.org/bioconda/osx-64
9 kallisto 0.46.1 https://conda.anaconda.org/bioconda/osx-64
12 kallisto 0.46.2 https://conda.anaconda.org/bioconda/osx-64
15 kallisto 0.48.0 https://conda.anaconda.org/bioconda/osx-64
16 kallisto 0.50.0 https://conda.anaconda.org/bioconda/osx-64
Specific package versions can be searched for using Conda's version syntax, e.g. “kallisto==0.46”, “kallisto>=0.48” or “kallisto<=0.45”.
conda_search("kallisto<=0.45", pathToMiniConda = my_miniconda)
2 kallisto 0.43.1 https://conda.anaconda.org/bioconda/osx-64
4 kallisto 0.44.0 https://conda.anaconda.org/bioconda/osx-64
We can use the same version syntax when installing tools. Here we will downgrade the version of Kallisto we have installed.
conda_paths <- install_CondaTools(tools = "kallisto<=0.45", env = "rnaseq", updateEnv = TRUE,
    pathToMiniConda = my_miniconda)
We can now see that we have downgraded Kallisto.
library(magrittr)
library(dplyr)
list_CondaPkgs("rnaseq", pathToMiniConda = my_miniconda) %>%
    dplyr::filter(name == "kallisto")
Once installed within a Conda environment, many external tools can be executed directly from the Conda environment's bin directory without any additional steps.
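As a rough illustration of calling a tool by its full path, the sketch below builds the path to salmon from an environment's bin directory. The example_paths list is a stand-in copied from the install_CondaTools() output shown earlier, not a live result, so the call is guarded in case the binary is not present on your machine.

```r
# Stand-in for the list returned by install_CondaTools(); the values are copied
# from the example output above (an assumption: your own paths will differ).
example_paths <- list(
  pathToConda  = "/Users/mattpaul/Desktop/My_Conda/bin/conda",
  environment  = "rnaseq",
  pathToEnvBin = "/Users/mattpaul/Desktop/My_Conda/envs/rnaseq/bin"
)

# Full path to the tool inside the environment's bin directory.
salmon_bin <- file.path(example_paths$pathToEnvBin, "salmon")

# Only run the tool if it actually exists at that location.
if (file.exists(salmon_bin)) {
  salmon_help <- system2(command = salmon_bin, args = "help", stdout = TRUE)
  print(head(salmon_help))
} else {
  message("salmon not found at: ", salmon_bin)
}
```

Calling a tool by full path works for simple cases, but tools that call other programs or rely on environment variables usually need the PATH set as well.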
Herper provides withr-style functions (with_CondaEnv() and local_CondaEnv()) that temporarily alter the system PATH and add or update any required environment variables. This is done without formally activating the environment or initializing Conda.
The with_CondaEnv function allows users to run R code with the required PATH and environment variables automatically set. It simply requires the name of the Conda environment and the code to be executed within it. We can also use the pathToMiniConda argument to specify a custom Miniconda install location.
Here we use with_CondaEnv to access the tools installed in our “rnaseq” environment. We can then use the R function system2() to run terminal/command line code from within R. In this case we want to run the salmon help.
res <- with_CondaEnv("rnaseq",
    system2(command = "salmon", args = "help", stdout = TRUE),
    pathToMiniConda = my_miniconda)
res
[1] "salmon v1.10.2"
[2] ""
[3] "Usage: salmon -h|--help or "
[4] " salmon -v|--version or "
[5] " salmon -c|--cite or "
[6] " salmon [--no-version-check] <COMMAND> [-h | options]"
[7] ""
[8] "Commands:"
[9] " index : create a salmon index"
[10] " quant : quantify a sample"
[11] " alevin : single cell analysis"
[12] " swim : perform super-secret operation"
[13] " quantmerge : merge multiple quantifications into a single file"
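To make the mechanism concrete, here is a minimal base R sketch of the with_-style pattern: put a directory at the front of PATH, evaluate some code, then restore the original PATH. The helper with_bin_on_path() and the placeholder directory are hypothetical names for illustration. This is not Herper's actual implementation, which also handles other environment variables.

```r
# Hypothetical helper (not part of Herper): evaluate `code` with `bin_dir`
# placed at the front of PATH, then restore the original PATH.
with_bin_on_path <- function(bin_dir, code) {
  old_path <- Sys.getenv("PATH")
  # Restore the original PATH when this function exits, even after an error.
  on.exit(Sys.setenv(PATH = old_path), add = TRUE)
  Sys.setenv(PATH = paste(bin_dir, old_path, sep = .Platform$path.sep))
  # `code` is evaluated lazily, so it only runs here, with the modified PATH.
  force(code)
}

# Placeholder directory standing in for an environment's bin directory.
env_bin <- file.path(tempdir(), "envs", "rnaseq", "bin")

path_before <- Sys.getenv("PATH")
path_inside <- with_bin_on_path(env_bin, Sys.getenv("PATH"))
path_after  <- Sys.getenv("PATH")

startsWith(path_inside, env_bin)    # was the directory first on PATH inside?
identical(path_before, path_after)  # was PATH restored afterwards?
```

The design relies on R's lazy evaluation of arguments: the code passed in is not run until the helper has already changed the PATH.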
The local_CondaEnv function acts in a similar fashion to the with_CondaEnv function: it updates the required PATH and environment variables so you can access the tools you need. Unlike with_CondaEnv, however, the changes persist until the current function ends.
local_CondaEnv is best used within a user-created function, allowing access to the Conda environment's PATH and variables from within the function itself but resetting all environment variables once complete.
salmonHelp <- function() {
    local_CondaEnv("rnaseq", pathToMiniConda = my_miniconda)
    helpMessage <- system2(command = "salmon", args = "help", stdout = TRUE)
    helpMessage
}
salmonHelp()
[1] "salmon v1.10.2"
[2] ""
[3] "Usage: salmon -h|--help or "
[4] " salmon -v|--version or "
[5] " salmon -c|--cite or "
[6] " salmon [--no-version-check] <COMMAND> [-h | options]"
[7] ""
[8] "Commands:"
[9] " index : create a salmon index"
[10] " quant : quantify a sample"
[11] " alevin : single cell analysis"
[12] " swim : perform super-secret operation"
[13] " quantmerge : merge multiple quantifications into a single file"
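The difference in scope can be sketched in base R as well. In this hypothetical function (again, not Herper's implementation), the PATH is changed once at the top and stays changed for every later statement in the body; on.exit() resets it only when the function returns.

```r
# Hypothetical function illustrating local_-style scoping with base R only.
show_scope <- function(bin_dir) {
  old_path <- Sys.getenv("PATH")
  # Reset PATH only when show_scope() itself returns.
  on.exit(Sys.setenv(PATH = old_path), add = TRUE)
  Sys.setenv(PATH = paste(bin_dir, old_path, sep = .Platform$path.sep))

  # Both checks run with the modified PATH; nothing is reset in between.
  first_look  <- startsWith(Sys.getenv("PATH"), bin_dir)
  second_look <- startsWith(Sys.getenv("PATH"), bin_dir)
  c(first = first_look, second = second_look)
}

# Placeholder directory standing in for an environment's bin directory.
env_bin <- file.path(tempdir(), "envs", "rnaseq", "bin")

path_before <- Sys.getenv("PATH")
inside      <- show_scope(env_bin)
path_after  <- Sys.getenv("PATH")

inside                              # both statements saw the modified PATH
identical(path_before, path_after)  # PATH is back to normal after the call
```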
Once you have an environment that is functioning well, you will want to save it. One way to back it up is to export a snapshot. For Conda this snapshot is a .yml file. These files contain all the information about the environment you would need in order to rebuild it or share it for collaboration.
yml_name <- paste0("rnaseq_", format(Sys.Date(), "%Y%m%d"), ".yml")
export_CondaEnv("rnaseq", yml_name, pathToMiniConda = my_miniconda)
The yml that is output contains all the information you need to rebuild the environment. It is also a nice resource when it comes to writing your methods.
The import_CondaEnv function allows the user to create a new Conda environment from a .yml file. These files can be exported by export_CondaEnv, Conda, or renv, or created manually.
Users can simply provide a path to the YAML file for import. They can also specify the environment name, but by default the name will be taken from the YAML.
testYML <- system.file("extdata/test.yml", package = "Herper")
import_CondaEnv(yml_import = testYML, pathToMiniConda = my_miniconda)
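For a manually created file, a minimal sketch might look like the following. The package and channel names are taken from the examples above, but the file contents, the file name and the version pin are illustrative assumptions; an exported file will usually list many more packages. The import call is left commented out because it needs a working Conda install.

```r
# Hand-written environment definition (illustrative; not an exported snapshot).
yml_lines <- c(
  "name: rnaseq",
  "channels:",
  "  - conda-forge",
  "  - bioconda",
  "dependencies:",
  "  - salmon=1.10.2",
  "  - kallisto"
)

# Write the definition to a temporary file (hypothetical file name).
yml_file <- file.path(tempdir(), "rnaseq_example.yml")
writeLines(yml_lines, yml_file)

# Check what was written.
cat(readLines(yml_file), sep = "\n")

# With Herper and Conda available, the file could then be imported:
# import_CondaEnv(yml_import = yml_file, pathToMiniConda = my_miniconda)
```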
Pip is an alternative system for installing Python tools, but it typically only works with Python tools.
Pip doesn’t handle conflicting dependencies, and will upgrade/downgrade software without checking if something else depends on it.
Even if this doesn’t break your tools, it may give different results and hinder reproducibility as unsupported versions of tools could be used.
Conda is more careful: it checks each install to make sure that your dependencies do not conflict.
Sometimes we have to use pip. But that does not mean we have to leave our environment behind.
First we need to make sure we have pip installed. Often this is not the case in new environments. Then we provide the direct path to pip in the environment's bin directory and the install should run.
install_CondaTools("pip", "rnaseq", pathToMiniConda = my_miniconda, updateEnv = TRUE)
with_CondaEnv("rnaseq", system2(command = paste0(conda_paths$pathToEnvBin, "/pip"),
args = c("install", "scanpy"), stdout = TRUE), pathToMiniConda = my_miniconda)
list_CondaPkgs("rnaseq", pathToMiniConda = my_miniconda) %>%
    dplyr::filter(name == "scanpy")
scanpy 1.9.3 pypi pypi
Exercise on Conda and Herper can be found here
For any suggestions, comments, edits or questions (about the content or the slides themselves), please reach out on our GitHub and raise an issue.