The Herper package is a simple toolset to install and manage Conda packages and environments from within the R console.
Unfortunately, many tools for data analysis are not available in R but are present in public repositories such as Conda. With Herper, users can install, manage, record and run Conda tools from the comfort of their R session.
Furthermore, many R packages require external dependencies of this kind, which can also be installed and managed through the Conda package repository. For example, 169 Bioconductor packages list external dependencies in their System Requirements field, often several per package [03 September, 2020].
Herper provides an ad-hoc approach to handling external system requirements for R packages. For people developing packages with Python Conda dependencies, we recommend using basilisk to support these system requirements internally, pre-hoc.
The Herper package was developed by Matt Paul, Doug Barrows and Thomas Carroll at the Rockefeller University Bioinformatics Resources Center with contributions from Kathryn Rozen-Gagnon.
Use the BiocManager package to download and install the Herper package from Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("Herper")
Once installed, load it into your R session:
library(Herper)
## Warning: package 'reticulate' was built under R version 4.1.2
The install_CondaTools() function allows the user to specify required Conda software and the desired environment to install into.
Miniconda is installed as part of the process (by default into r-reticulate’s default Conda location, /Users/mattpaul/Library/r-miniconda), and the user’s requested Conda environment is built within the same directory (by default /Users/mattpaul/Library/r-miniconda/envs/USERS_ENVIRONMENT_HERE).
If you already have Miniconda installed or you would like to install to a custom location, you can specify the path with the pathToMiniConda parameter. In this example we are installing in a temporary directory, but most likely you will want to install/use a stable version of Miniconda.
# Temporary location used only for this example; prefer a persistent path in practice
myMiniconda <- file.path(tempdir(), "rr", "Test")
myMiniconda
## [1] "/var/folders/zy/x35d37h50sq2_fp3zrjydcl00000gn/T//RtmpBk134k/rr/Test"
install_CondaTools("samtools", "herper", pathToMiniConda = myMiniconda)
## $pathToConda
## [1] "/var/folders/zy/x35d37h50sq2_fp3zrjydcl00000gn/T//RtmpBk134k/rr/Test/bin/conda"
##
## $environment
## [1] "herper"
##
## $pathToEnvBin
## [1] "/var/folders/zy/x35d37h50sq2_fp3zrjydcl00000gn/T//RtmpBk134k/rr/Test/envs/herper/bin"
We can add additional tools to our Conda environment by specifying updateEnv = TRUE. A vector of tools can be used to install several at once.
pathToConda <- install_CondaTools(c("salmon", "kallisto"), "herper", updateEnv = TRUE, pathToMiniConda = myMiniconda)
pathToConda
## $pathToConda
## [1] "/var/folders/zy/x35d37h50sq2_fp3zrjydcl00000gn/T//RtmpBk134k/rr/Test/bin/conda"
##
## $environment
## [1] "herper"
##
## $pathToEnvBin
## [1] "/var/folders/zy/x35d37h50sq2_fp3zrjydcl00000gn/T//RtmpBk134k/rr/Test/envs/herper/bin"
Specific package versions can be installed by passing conda-formatted version specifications to the tools argument, e.g. “salmon==1.3”, “salmon>=1.3” or “salmon<=1.3”. This can also be used to upgrade or downgrade existing tools in the chosen environment.
pathToConda <- install_CondaTools("salmon<=1.3", "herper", updateEnv = TRUE, pathToMiniConda = myMiniconda)
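The version specifications described above are plain strings, so they can be assembled programmatically. The following is a minimal sketch in base R; make_conda_spec() is a hypothetical helper written for illustration and is not part of Herper.

```r
# Hypothetical helper (not part of Herper): build a conda-style version
# specification from a tool name, a comparison operator and a version.
make_conda_spec <- function(tool, version = NULL, op = "==") {
  op <- match.arg(op, c("==", ">=", "<="))
  if (is.null(version)) {
    # No version requested: the bare tool name is a valid specification
    return(tool)
  }
  paste0(tool, op, version)
}

specs <- c(
  make_conda_spec("salmon", "1.3", "<="),
  make_conda_spec("kallisto")
)
specs
```

A vector such as specs could then be supplied as the tools argument of install_CondaTools(), in the same way as the calls shown above.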
The install_CondaSysReqs() function checks the System Requirements of the specified R package and uses Conda to install this software. Here we will use a test package contained within Herper. This test package has two System Requirements:
testPkg <- system.file("extdata/HerperTestPkg", package = "Herper")
install.packages(testPkg, type = "source", repos = NULL)
utils::packageDescription("HerperTestPkg", fields = "SystemRequirements")
## [1] "samtools==1.10, rmats>=v4.1.0"
The user can simply supply the name of an installed R package, and install_CondaSysReqs will install the System Requirements through conda.
install_CondaSysReqs("HerperTestPkg", pathToMiniConda = myMiniconda)
## $pathToConda
## [1] "/var/folders/zy/x35d37h50sq2_fp3zrjydcl00000gn/T//RtmpBk134k/rr/Test/bin/conda"
##
## $environment
## [1] "HerperTestPkg_0.1.0"
##
## $pathToEnvBin
## [1] "/var/folders/zy/x35d37h50sq2_fp3zrjydcl00000gn/T//RtmpBk134k/rr/Test/envs/HerperTestPkg_0.1.0/bin"
By default these packages are installed in a new environment, which is named after the R package and its version number. Users can control the environment name using the env parameter. As with install_CondaTools(), users can choose which Miniconda installation is used with the pathToMiniConda parameter, and whether to amend an existing environment with the updateEnv parameter.
Note: install_CondaSysReqs can handle standard System Requirement formats, but will not work if the package's field contains free-form text. In this case, just use install_CondaTools.
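To make the distinction between a standard format and free-form text concrete, here is a rough sketch in base R. It is an illustration only and is not how Herper parses the field internally; parse_sysreqs() and looks_like_conda_spec() are hypothetical helpers, and the pattern is an assumption about what a simple conda specification looks like.

```r
# Hypothetical helpers (not part of Herper), for illustration only.

# Split a comma-separated SystemRequirements string into trimmed entries
parse_sysreqs <- function(sysreqs, sep = ",") {
  parts <- trimws(strsplit(sysreqs, sep, fixed = TRUE)[[1]])
  parts[nzchar(parts)]
}

# Crude check: a single token, optionally followed by ==, >= or <= and a
# version. A match does not guarantee that the package exists on conda.
looks_like_conda_spec <- function(x) {
  grepl("^[A-Za-z0-9_.-]+((==|>=|<=)v?[0-9][A-Za-z0-9_.]*)?$", x)
}

reqs <- parse_sysreqs("samtools==1.10, rmats>=v4.1.0")
reqs
looks_like_conda_spec(reqs)

# Free-form entries do not fit the pattern
looks_like_conda_spec(parse_sysreqs("GNU make, C++11"))
```

Entries that fail a check like this are the kind that are better installed by name with install_CondaTools.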
Once installed within a Conda environment, many external tools can be executed directly from the environment’s bin directory without any additional steps.
pathToSamtools <- file.path(pathToConda$pathToEnvBin, "samtools")
Res <- system2(command = pathToSamtools, args = "help", stdout = TRUE)
Res
##
## Program: samtools (Tools for alignments in the SAM format)
## Version: 1.15 (using htslib 1.15)
##
## Usage: samtools <command> [options]
##
## Commands:
## -- Indexing
## dict create a sequence dictionary file
## faidx index/extract FASTA
## fqidx index/extract FASTQ
## index index alignment
##
## -- Editing
## calmd recalculate MD/NM tags and '=' bases
## fixmate fix mate information
## reheader replace BAM header
## targetcut cut fosmid regions (for fosmid pool only)
## addreplacerg adds or replaces RG tags
## markdup mark duplicates
## ampliconclip clip oligos from the end of reads
##
## -- File operations
## collate shuffle and group alignments by name
## cat concatenate BAMs
## consensus produce a consensus Pileup/FASTA/FASTQ
## merge merge sorted alignments
## mpileup multi-way pileup
## sort sort alignment file
## split splits a file by read group
## quickcheck quickly check if SAM/BAM/CRAM file appears intact
## fastq converts a BAM to a FASTQ
## fasta converts a BAM to a FASTA
## import Converts FASTA or FASTQ files to SAM/BAM/CRAM
##
## -- Statistics
## bedcov read depth per BED region
## coverage alignment depth and percent coverage
## depth compute the depth
## flagstat simple stats
## idxstats BAM index stats
## phase phase heterozygotes
## stats generate stats (former bamcheck)
## ampliconstats generate amplicon specific stats
##
## -- Viewing
## flags explain BAM flags
## head header viewer
## tview text alignment viewer
## view SAM<->BAM<->CRAM conversion
## depad convert padded BAM to unpadded BAM
## samples list the samples in a set of SAM/BAM/CRAM files
##
## -- Misc
## help [cmd] display this help message or help for [cmd]
## version detailed version information
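When calling tools by their full path like this, it can help to fail early with a clear message if the executable is missing. The wrapper below is a small sketch in base R; run_env_tool() is a hypothetical name and not a Herper function.

```r
# Hypothetical wrapper (not part of Herper): run a tool from an
# environment's bin directory, stopping early if it is not there.
run_env_tool <- function(binDir, tool, args = character()) {
  exe <- file.path(binDir, tool)
  if (!file.exists(exe)) {
    stop("Tool not found in environment bin directory: ", exe)
  }
  # Capture standard output as a character vector, one element per line
  system2(command = exe, args = args, stdout = TRUE)
}

# A missing tool gives an informative error rather than a shell failure
missing_msg <- tryCatch(
  run_env_tool(tempdir(), "no_such_tool"),
  error = function(e) conditionMessage(e)
)
missing_msg
```

With the environment built above, run_env_tool(pathToConda$pathToEnvBin, "samtools", "help") would be expected to behave like the system2() call shown earlier.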
Some external software, however, requires additional environment variables to be set in order to execute correctly. An example is Cytoscape, which requires the Java home directory and Java library paths to be set prior to its execution.
The Herper package uses the withr family of functions (with_CondaEnv() and local_CondaEnv()) to provide methods to temporarily alter the system PATH and to add or update any required environment variables. This is done without formally activating your environment or initializing your conda.
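The general idea of temporarily altering the PATH can be sketched in base R. This is a simplified illustration under stated assumptions and not Herper's implementation: with_prepended_path() is a hypothetical function, and the directory used is a made-up placeholder.

```r
# Hypothetical function (not part of Herper): evaluate `code` with `dir`
# prepended to PATH, restoring the original PATH afterwards.
with_prepended_path <- function(dir, code) {
  old_path <- Sys.getenv("PATH")
  # Restore PATH when this function exits, even if `code` throws an error
  on.exit(Sys.setenv(PATH = old_path), add = TRUE)
  Sys.setenv(PATH = paste(dir, old_path, sep = .Platform$path.sep))
  # `code` is a lazily evaluated argument, so it runs here with the new PATH
  force(code)
}

path_before <- Sys.getenv("PATH")
first_entry <- with_prepended_path(
  "/hypothetical/envs/herper/bin",
  strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]][1]
)
path_after <- Sys.getenv("PATH")

first_entry
identical(path_before, path_after)
```

Inside the call the placeholder directory is the first PATH entry, and once the call returns the PATH is back to its previous value.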
The with_CondaEnv function allows users to run R code with the required PATH and environment variables automatically set. It simply requires the name of the Conda environment and the code to be executed within it. We can also use the pathToMiniConda argument to specify a custom Miniconda install location.
Because with_CondaEnv updates the PATH, we can now run the above samtools command without specifying the full directory path to samtools.
res <- with_CondaEnv("herper",
    system2(command = "samtools", args = "help", stdout = TRUE),
    pathToMiniConda = myMiniconda)
res
##
## Program: samtools (Tools for alignments in the SAM format)
## Version: 1.15 (using htslib 1.15)
##
## Usage: samtools <command> [options]
##
## Commands:
## -- Indexing
## dict create a sequence dictionary file
## faidx index/extract FASTA
## fqidx index/extract FASTQ
## index index alignment
##
## -- Editing
## calmd recalculate MD/NM tags and '=' bases
## fixmate fix mate information
## reheader replace BAM header
## targetcut cut fosmid regions (for fosmid pool only)
## addreplacerg adds or replaces RG tags
## markdup mark duplicates
## ampliconclip clip oligos from the end of reads
##
## -- File operations
## collate shuffle and group alignments by name
## cat concatenate BAMs
## consensus produce a consensus Pileup/FASTA/FASTQ
## merge merge sorted alignments
## mpileup multi-way pileup
## sort sort alignment file
## split splits a file by read group
## quickcheck quickly check if SAM/BAM/CRAM file appears intact
## fastq converts a BAM to a FASTQ
## fasta converts a BAM to a FASTA
## import Converts FASTA or FASTQ files to SAM/BAM/CRAM
##
## -- Statistics
## bedcov read depth per BED region
## coverage alignment depth and percent coverage
## depth compute the depth
## flagstat simple stats
## idxstats BAM index stats
## phase phase heterozygotes
## stats generate stats (former bamcheck)
## ampliconstats generate amplicon specific stats
##
## -- Viewing
## flags explain BAM flags
## head header viewer
## tview text alignment viewer
## view SAM<->BAM<->CRAM conversion
## depad convert padded BAM to unpadded BAM
## samples list the samples in a set of SAM/BAM/CRAM files
##
## -- Misc
## help [cmd] display this help message or help for [cmd]
## version detailed version information
The local_CondaEnv function acts in a similar fashion to with_CondaEnv and allows the user to temporarily update the required PATH and environment variables from within a function. The PATH and environment variables are modified only until the current function ends.
local_CondaEnv is best used within a user-created function, allowing access to the Conda environment’s PATH and variables from within the function itself but resetting all environment variables once it completes.
samtoolsHelp <- function(){
    local_CondaEnv("herper", pathToMiniConda = myMiniconda)
    helpMessage <- system2(command = "samtools", args = "help", stdout = TRUE)
    helpMessage
}
samtoolsHelp()
##
## Program: samtools (Tools for alignments in the SAM format)
## Version: 1.15 (using htslib 1.15)
##
## Usage: samtools <command> [options]
##
## Commands:
## -- Indexing
## dict create a sequence dictionary file
## faidx index/extract FASTA
## fqidx index/extract FASTQ
## index index alignment
##
## -- Editing
## calmd recalculate MD/NM tags and '=' bases
## fixmate fix mate information
## reheader replace BAM header
## targetcut cut fosmid regions (for fosmid pool only)
## addreplacerg adds or replaces RG tags
## markdup mark duplicates
## ampliconclip clip oligos from the end of reads
##
## -- File operations
## collate shuffle and group alignments by name
## cat concatenate BAMs
## consensus produce a consensus Pileup/FASTA/FASTQ
## merge merge sorted alignments
## mpileup multi-way pileup
## sort sort alignment file
## split splits a file by read group
## quickcheck quickly check if SAM/BAM/CRAM file appears intact
## fastq converts a BAM to a FASTQ
## fasta converts a BAM to a FASTA
## import Converts FASTA or FASTQ files to SAM/BAM/CRAM
##
## -- Statistics
## bedcov read depth per BED region
## coverage alignment depth and percent coverage
## depth compute the depth
## flagstat simple stats
## idxstats BAM index stats
## phase phase heterozygotes
## stats generate stats (former bamcheck)
## ampliconstats generate amplicon specific stats
##
## -- Viewing
## flags explain BAM flags
## head header viewer
## tview text alignment viewer
## view SAM<->BAM<->CRAM conversion
## depad convert padded BAM to unpadded BAM
## samples list the samples in a set of SAM/BAM/CRAM files
##
## -- Misc
## help [cmd] display this help message or help for [cmd]
## version detailed version information
To further demonstrate this, we will use the first command from the seqCNA vignette. This step requires samtools. If samtools is not installed and available, the command fails with an error.
library(seqCNA)
## Warning: package 'doSNOW' was built under R version 4.1.2
## Warning: package 'foreach' was built under R version 4.1.2
## Warning: package 'iterators' was built under R version 4.1.2
## Warning: package 'MASS' was built under R version 4.1.2
data(seqsumm_HCC1143)
try(rco <- readSeqsumm(tumour.data = seqsumm_HCC1143), silent = FALSE)
Samtools is listed as a System Requirement for seqCNA, so we can first use install_CondaSysReqs() to install samtools. In this case we are installing samtools in the environment: seqCNA_env. We can then run the seqCNA command using with_CondaEnv specifying that we want to use our environment containing samtools. seqCNA can then find samtools and execute successfully.
install_CondaSysReqs(pkg = "seqCNA", env = "seqCNA_env", pathToMiniConda = myMiniconda)
rco <- with_CondaEnv(new = "seqCNA_env",
    readSeqsumm(tumour.data = seqsumm_HCC1143),
    pathToMiniConda = myMiniconda)
summary(rco)
## Basic information:
## SeqCNAInfo object with 5314 200Kbp-long windows.
## PEM information is not available.
## Paired normal is not available.
## Genome and build unknown (chromosomes chr1 to chr5).
## The profile is not yet normalized and not yet segmented.
If the user is unsure of the exact name or version of a tool available on conda, they can use the conda_search function.
conda_search("salmon", pathToMiniConda = myMiniconda)
## name version channel
## 2 salmon 0.8.2 https://conda.anaconda.org/bioconda/osx-64
## 3 salmon 0.9.0 https://conda.anaconda.org/bioconda/osx-64
## 5 salmon 0.9.1 https://conda.anaconda.org/bioconda/osx-64
## 6 salmon 0.10.0 https://conda.anaconda.org/bioconda/osx-64
## 7 salmon 0.10.1 https://conda.anaconda.org/bioconda/osx-64
## 9 salmon 0.10.2 https://conda.anaconda.org/bioconda/osx-64
## 11 salmon 0.11.3 https://conda.anaconda.org/bioconda/osx-64
## 12 salmon 0.12.0 https://conda.anaconda.org/bioconda/osx-64
## 14 salmon 0.13.0 https://conda.anaconda.org/bioconda/osx-64
## 15 salmon 0.13.1 https://conda.anaconda.org/bioconda/osx-64
## 17 salmon 0.14.0 https://conda.anaconda.org/bioconda/osx-64
## 20 salmon 0.14.1 https://conda.anaconda.org/bioconda/osx-64
## 22 salmon 0.14.2 https://conda.anaconda.org/bioconda/osx-64
## 23 salmon 0.15.0 https://conda.anaconda.org/bioconda/osx-64
## 24 salmon 1.0.0 https://conda.anaconda.org/bioconda/osx-64
## 25 salmon 1.1.0 https://conda.anaconda.org/bioconda/osx-64
## 26 salmon 1.2.0 https://conda.anaconda.org/bioconda/osx-64
## 27 salmon 1.2.1 https://conda.anaconda.org/bioconda/osx-64
## 28 salmon 1.3.0 https://conda.anaconda.org/bioconda/osx-64
## 30 salmon 1.4.0 https://conda.anaconda.org/bioconda/osx-64
## 31 salmon 1.5.0 https://conda.anaconda.org/bioconda/osx-64
## 32 salmon 1.5.1 https://conda.anaconda.org/bioconda/osx-64
## 33 salmon 1.5.2 https://conda.anaconda.org/bioconda/osx-64
## 34 salmon 1.6.0 https://conda.anaconda.org/bioconda/osx-64
## 36 salmon 1.7.0 https://conda.anaconda.org/bioconda/osx-64
## 38 salmon 1.8.0 https://conda.anaconda.org/bioconda/osx-64
## [1] TRUE
Specific package versions can be searched for using the conda format, e.g. “salmon==1.3”, “salmon>=1.3” or “salmon<=1.3”. Searches will also find close matches for incorrect queries. The channels to search can be controlled with the channels parameter.
conda_search("salmon<=1.0", pathToMiniConda = myMiniconda)
## name version channel
## 2 salmon 0.8.2 https://conda.anaconda.org/bioconda/osx-64
## 3 salmon 0.9.0 https://conda.anaconda.org/bioconda/osx-64
## 5 salmon 0.9.1 https://conda.anaconda.org/bioconda/osx-64
## 6 salmon 0.10.0 https://conda.anaconda.org/bioconda/osx-64
## 7 salmon 0.10.1 https://conda.anaconda.org/bioconda/osx-64
## 9 salmon 0.10.2 https://conda.anaconda.org/bioconda/osx-64
## 11 salmon 0.11.3 https://conda.anaconda.org/bioconda/osx-64
## 12 salmon 0.12.0 https://conda.anaconda.org/bioconda/osx-64
## 14 salmon 0.13.0 https://conda.anaconda.org/bioconda/osx-64
## 15 salmon 0.13.1 https://conda.anaconda.org/bioconda/osx-64
## 17 salmon 0.14.0 https://conda.anaconda.org/bioconda/osx-64
## 20 salmon 0.14.1 https://conda.anaconda.org/bioconda/osx-64
## 22 salmon 0.14.2 https://conda.anaconda.org/bioconda/osx-64
## 23 salmon 0.15.0 https://conda.anaconda.org/bioconda/osx-64
## [1] TRUE
conda_search("salmo", pathToMiniConda = myMiniconda)
## [1] FALSE
The export_CondaEnv function allows the user to export the environment information to a .yml file. These environment YAML files contain all essential information about the package, allowing for reproducibility and easy distribution of Conda system configuration for collaboration.
yml_name <- paste0("herper_", format(Sys.Date(), "%Y%m%d"), ".yml")
export_CondaEnv("herper", yml_name, pathToMiniConda = myMiniconda)
## [1] "herper_20220329.yml"
The YAML export will contain all packages in the environment by default. If the user wants to export only the packages that were specifically installed, and not their dependencies, they can use the depends parameter.
yml_name <- paste0("herper_nodeps_", format(Sys.Date(), "%Y%m%d"), ".yml")
export_CondaEnv("herper", yml_name, depends = FALSE, pathToMiniConda = myMiniconda)
## [1] "herper_nodeps_20220329.yml"
The import_CondaEnv function allows the user to create a new conda environment from a .yml file. These files may have been exported by export_CondaEnv, conda or renv, or created manually.
Users can simply provide a path to the YAML file for import. They can also specify the environment name, but by default the name will be taken from the YAML.
testYML <- system.file("extdata/test.yml",package="Herper")
import_CondaEnv(yml_import=testYML, pathToMiniConda = myMiniconda)
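For a manually created file, the sketch below writes a minimal conda environment YAML from R. The environment name, channels and pinned version are assumptions chosen for illustration; only the name, channels and dependencies keys reflect the standard conda environment file layout.

```r
# Illustrative only: the environment name, channels and version pin below
# are assumptions, not values taken from Herper.
yml_lines <- c(
  "name: my_manual_env",
  "channels:",
  "  - bioconda",
  "  - conda-forge",
  "dependencies:",
  "  - samtools=1.15"
)

manual_yml <- file.path(tempdir(), "my_manual_env.yml")
writeLines(yml_lines, manual_yml)
readLines(manual_yml)

# The file could then be imported in the same way as the test YAML above:
# import_CondaEnv(yml_import = manual_yml, pathToMiniConda = myMiniconda)
```

Because no environment name is passed to the import call, the name would be taken from the name field of the YAML, as described above.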
The list_CondaEnv function allows users to check what environments already exist within the given conda build.
If the user has multiple builds of conda and wants to check environments across all of them, they can include the parameter allCondas = TRUE.
list_CondaEnv(pathToMiniConda = myMiniconda)
## conda path env
## 1 /private/var/folders/zy/x35d37h50sq2_fp3zrjydcl00000gn/T/RtmpBk134k/rr/Test base
## 2 /private/var/folders/zy/x35d37h50sq2_fp3zrjydcl00000gn/T/RtmpBk134k/rr/Test HerperTestPkg_0.1.0
## 3 /private/var/folders/zy/x35d37h50sq2_fp3zrjydcl00000gn/T/RtmpBk134k/rr/Test herper
## 4 /private/var/folders/zy/x35d37h50sq2_fp3zrjydcl00000gn/T/RtmpBk134k/rr/Test seqCNA_env
The list_CondaPkgs function allows users to check what packages are installed in a given environment.
list_CondaPkgs("herper", pathToMiniConda = myMiniconda)
## name version channel platform
## 1 boost-cpp 1.74.0 conda-forge osx-64
## 2 bzip2 1.0.8 pkgs/main osx-64
## 3 c-ares 1.18.1 pkgs/main osx-64
## 4 ca-certificates 2022.3.18 pkgs/main osx-64
## 5 hdf5 1.10.6 pkgs/main osx-64
## 6 htslib 1.15 bioconda osx-64
## 7 icu 69.1 conda-forge osx-64
## 8 kallisto 0.48.0 bioconda osx-64
## 9 krb5 1.19.2 pkgs/main osx-64
## 10 libcurl 7.82.0 conda-forge osx-64
## 11 libcxx 13.0.1 conda-forge osx-64
## 12 libdeflate 1.10 conda-forge osx-64
## 13 libedit 3.1.20210910 pkgs/main osx-64
## 14 libev 4.33 pkgs/main osx-64
## 15 libgfortran 3.0.1 pkgs/main osx-64
## 16 libjemalloc 5.2.1 conda-forge osx-64
## 17 libnghttp2 1.47.0 conda-forge osx-64
## 18 libssh2 1.10.0 conda-forge osx-64
## 19 libzlib 1.2.11 conda-forge osx-64
## 20 lz4-c 1.9.3 pkgs/main osx-64
## 21 ncurses 6.3 pkgs/main osx-64
## 22 openssl 1.1.1n pkgs/main osx-64
## 23 salmon 0.14.2 bioconda osx-64
## 24 samtools 1.15 bioconda osx-64
## 25 tbb 2021.5.0 pkgs/main osx-64
## 26 xz 5.2.5 pkgs/main osx-64
## 27 zlib 1.2.11 conda-forge osx-64
## 28 zstd 1.5.0 pkgs/main osx-64
Thank you to Ji-Dung Luo and Wei Wang for testing/vignette review/critical feedback and Ziwei Liang for their support.
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS 12.2.1
##
## Matrix products: default
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] seqCNA_1.38.0 seqCNA.annot_1.28.0 adehabitatLT_0.3.25 CircStats_0.2-6
## [5] boot_1.3-28 MASS_7.3-55 adehabitatMA_0.3.14 ade4_1.7-18
## [9] sp_1.4-6 doSNOW_1.0.20 snow_0.4-4 iterators_1.0.14
## [13] foreach_1.5.2 GLAD_2.56.0 Herper_1.1.2 reticulate_1.24
## [17] rmarkdown_2.11 yaml_2.2.2 ymlthis_0.1.5 magrittr_2.0.2
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.8 lattice_0.20-45 tidyr_1.2.0 png_0.1-7 assertthat_0.2.1
## [6] digest_0.6.29 utf8_1.2.2 R6_2.5.1 backports_1.4.1 evaluate_0.14
## [11] highr_0.9 pillar_1.7.0 rlang_1.0.1 jquerylib_0.1.4 Matrix_1.4-0
## [16] stringr_1.4.0 bit_4.0.4 broom_0.7.12 compiler_4.1.0 xfun_0.29
## [21] pkgconfig_2.0.3 htmltools_0.5.2 tidyselect_1.1.1 tibble_3.1.6 codetools_0.2-18
## [26] fansi_1.0.2 crayon_1.4.2 dplyr_1.0.8 tzdb_0.2.0 withr_2.4.3
## [31] grid_4.1.0 jsonlite_1.7.3 lifecycle_1.0.1 DBI_1.1.2 cli_3.1.1
## [36] stringi_1.7.6 vroom_1.5.7 bslib_0.3.1 ellipsis_0.3.2 vctrs_0.3.8
## [41] generics_0.1.2 rjson_0.2.21 tools_4.1.0 bit64_4.0.5 glue_1.6.1
## [46] purrr_0.3.4 parallel_4.1.0 fastmap_1.1.0 knitr_1.37 sass_0.4.0