The Bioinformatics Resource Center develops, maintains and hosts a range of pipelines and downstream analysis apps for use by The Rockefeller University community.
So far these pipelines have:
Important features of our pipelines include:
To get learn more about our pipelines please contact us at brc@rockefeller.edu
The Bioinformatics Resource Center focuses on the development of bioinformatics software for high-throughput analysis, quality control and visualization of biological data relevant to our work at The Rockefeller University.
All software developed is publicly available through our github site or from online, peer-reviewed repositories.
We have developed and maintain five Bioconductor packages within the
team: profileplyr, basecallQC, ChIPQC, tracktablesm, soGGi, rfastp and
Herper.
The basecallQC package provides tools to work with Illumina bcl2Fastq (versions >= 2.1.7) software.Prior to basecalling and demultiplexing using the bcl2Fastq software, basecallQC functions allow the user to update Illumina sample sheets from versions <= 1.8.9 to >= 2.1.7 standards, clean sample sheets of common problems such as invalid sample names and IDs, create read and index basemasks and the bcl2Fastq command.
Following the generation of basecalled and demultiplexed data, the basecallQC packages allows the user to generate HTML tables, plots and a self contained report of summary metrics from Illumina XML output files.
Author: Thomas Carroll and Marian Dore
Bioconductor site - Link
ChIPQC will automatically compute a number of quality metrics, and provides simple ways to generate a ChIP-seq experiment quality report which can be examined to asses the absolute and relative quality of individual ChIP-seq samples (and their associated controls, as well as overall quality of the experimental data.)
Author: Thomas Carroll and Rory Stark
Bioconductor site - Link
The soGGi package provides a toolset to create genomic interval aggregate/summary plots of signal or motif occurence from BAM and bigWig files as well as PWM, rlelist, GRanges and GAlignments Bioconductor objects.
soGGi allows for normalisation, transformation and arithmetic operation on and between summary plot objects as well as grouping and subsetting of plots by GRanges objects and user supplied metadata.
Plots are created using the GGplot2 libary to allow user defined manipulation of the returned plot object. Coupled together, soGGi features a broad set of methods to visualise genomics data in the context of groups of genomic intervals such as genes, superenhancers and transcription factor binding events.
Author: Thomas Carroll and Gopuraja Dharmaligham
Bioconductor site - Link
Visualising genomics data in genome browsers is a key step in both quality control and the initial interrogation of any hypothesis under investigation.
The organisation of large collections of genomics data (such as from large scale high-thoughput sequencing experiments) alongside their associated sample or experimental metadata allows for the rapid evaluation of patterns across exper- imental groups. Broad’s Intergrative Genome Viewer (IGV) provides a set of methods to make use of sample metadata when visualising genomics data. As well as identifying sample metadata within the genome browser, this sample information can be used in IGV to group, sort and filter samples.
This use of sample metadata, alongside the ability to control IGV through ports, provides the desired framework to rapidly interrogate large cohorts of genomics data once the appropriate file structure is built.
The tracktables package provides a set of tools to build IGV session files from data-frames of sample files and their associated metadata as well as produce IGV linked HTML reports for high-throughput visualisation of sample data in IGV.superenhancers and transcription factor binding events.
Author: Thomas Carroll and Anne Pajon
Bioconductor site - Link
## Rfastp
Rfastp is an R wrapper of fastp developed in c++. fastp performs quality control for fastq files. including low quality bases trimming, polyX trimming, adapter auto-detection and trimming, paired-end reads merging, UMI sequence/id handling.
Rfastp can concatenate multiple files into one file (like shell command cat) and accept multiple files as input.
Author: Thomas Carroll and Wei Wang
Bioconductor site - Link
Quick and straightforward visualization of read signal over genomic intervals is key for generating hypotheses from sequencing data sets (e.g. ChIP-seq, ATAC-seq, bisulfite/methyl-seq).
Many tools both inside and outside of R and Bioconductor are available to explore these types of data, and they typically start with a bigWig or BAM file and end with some representation of the signal (e.g. heatmap).
profileplyr leverages many Bioconductor tools to allow for both flexibility and additional functionality in workflows that end with visualization of the read signal.
Author: Thomas Carroll and Doug Barrows
Bioconductor site - Link
Many tools for data analysis are not available in R, but are present in public repositories like conda. The Herper package provides a comprehensive set of functions to interact with the conda package managament system.
With Herper users can install, manage and run conda packages from the comfort of their R session. Herper also provides an ad-hoc approach to handling external system requirements for R packages.
For people developing packages with python conda dependencies we recommend using basilisk (https://bioconductor.org/packages/release/bioc/html/basilisk.html) to internally support these system requirments pre-hoc.
Author: Thomas Carroll and Matt Paul
Bioconductor site - Link
CLIPflexR is a R wrapper for the CLIP Tool Kit (CTK) and additional functions to call other external libraries into an R environment.
CLIPflexR makes use of the Herper library to install conda dependencies to your machine within a conda environment using the reticulate libraries.
Author: Kathryn Rozen-Gagnon and Thomas Carroll
GitHub site - Link