ChIPseq (part 4)

Data

In todays session we will continue to review the Myc ChIPseq we were working on in our last sessions.

This include Myc ChIPseq for MEL and Ch12 celllines.

Information and files for the Myc ChIPseq in MEL cell line can be found here

Information and files for the Myc ChIPseq in Ch12 cell line can be found here

Data

I have provided peak calls from MACS2 following the processing steps outlined in our last session.

Peak calls for Myc in MEL and Ch12 cellines can be found in

data/peaks/

data/peaks/Mel_1_peaks.xls
data/peaks/Mel_2_peaks.xls
data/peaks/Ch12_1_peaks.xls
data/peaks/Ch12_1_peaks.xls

TF binding and epigenetic states

A common goal in ChIPseq is characterise genome wide transcription factor binding sites or epigenetic states.

The presence of transcription factor binding sites and epigenetics events is often further analyzed in the context of their putative targets genes to characterize the transcription factor’s and epigenetic event’s function and/or biological role.

offset

TF binding and epigenetic states

With the release of ENCODE’s wide scale mapping of transcription factor binding sites or epigenetic states and the advent of multiplexing technologies for high throughput sequencing, in has become common practice to have replicated ChIPseq experiments so as to have higher confidence in identified epigenetic events.

offset

TF binding and epigenetic states

In addition to the genome wide characterization of epigenetic events, ChIPseq had been increasingly used to identify changes in epigenetic events between conditions and/or cell lines.

$offset$

Myc ChIPseq in Mel and Ch12 cell lines

We have been working to process and a characterize a Myc ChIPseq replicate in the Mel cell line.

In this session we will look at how we can define a high confidence/reproducible set of Myc peaks in the Mel cell line as well as identify Myc binding events unique or common between Mel and Ch12 cell lines.

Consensus Peaks

Reading in a set of peaks

First we need to read our peak calls from MACS2 into R.

The Myc peak calls we will review are within the peaks directory, so here we list all files matching our expected file pattern using the dir() function.

peakFiles <- dir("data/peaks/", pattern = "*.peaks", full.names = TRUE)
peakFiles

## [1] "data/peaks//ch12_1_peaks.xls" "data/peaks//ch12_2_peaks.xls"
## [3] "data/peaks//Mel_1_peaks.xls"  "data/peaks//Mel_2_peaks.xls"

Reading in a set of peaks

We can loop through our tab separated files (disguised as .xls functions) and import them into R as a list of data.frames using a loop.

macsPeaks_DF <- list()
for (i in 1:length(peakFiles)) {
    macsPeaks_DF[[i]] <- read.delim(peakFiles[i], comment.char = "#")
}
length(macsPeaks_DF)

## [1] 4

Reading in a set of peaks

Now with our list of data.frames of peak calls, we loop through the list and create a GRanges for each of our peak calls.

Remember you can also do this with the import function from rtracklayer.

library(GenomicRanges)
macsPeaks_GR <- list()
for (i in 1:length(macsPeaks_DF)) {
    peakDFtemp <- macsPeaks_DF[[i]]
    macsPeaks_GR[[i]] <- GRanges(seqnames = peakDFtemp[, "chr"], IRanges(peakDFtemp[, "start"],
        peakDFtemp[, "end"]))
}
macsPeaks_GR[[1]]

## GRanges object with 13910 ranges and 0 metadata columns:
##           seqnames            ranges strand
##              <Rle>         <IRanges>  <Rle>
##       [1]     chr1   4490620-4490773      *
##       [2]     chr1   4671581-4671836      *
##       [3]     chr1   4785359-4785661      *
##       [4]     chr1   5283235-5283439      *
##       [5]     chr1   6262317-6262572      *
##       ...      ...               ...    ...
##   [13906]     chrY 90742039-90742407      *
##   [13907]     chrY 90742425-90742646      *
##   [13908]     chrY 90742687-90742878      *
##   [13909]     chrY 90742905-90743091      *
##   [13910]     chrY 90825379-90825714      *
##   -------
##   seqinfo: 21 sequences from an unspecified genome; no seqlengths

Reading in a set of peaks

We will want to assign a sensible set of names to our peak calls.

We can use the gsub() and basename() function with our file names to create some samplenames.

The basename() function accepts a file path (such as the path to our bam files) and returns just the file name (removing directory paths).

The gsub() function accepts the text to replace, the replacement text and a character vector to replace within.

fileNames <- basename(peakFiles)
fileNames

## [1] "ch12_1_peaks.xls" "ch12_2_peaks.xls" "Mel_1_peaks.xls"  "Mel_2_peaks.xls"

sampleNames <- gsub("_peaks.xls", "", fileNames)
sampleNames

## [1] "ch12_1" "ch12_2" "Mel_1"  "Mel_2"

Reading in a set of peaks

Now we have a named list of our peak calls as GRanges objects.

We can convert our list of GRanges objects to a GRangesList using the GRangesList() function.

macsPeaks_GRL <- GRangesList(macsPeaks_GR)
names(macsPeaks_GRL) <- sampleNames
class(macsPeaks_GRL)

## [1] "CompressedGRangesList"
## attr(,"package")
## [1] "GenomicRanges"

names(macsPeaks_GRL)

## [1] "ch12_1" "ch12_2" "Mel_1"  "Mel_2"

GRangesList objects

The GRangesList object can behave just as our standard lists. Here we use the lengths() function to a get the number of peaks in each replicate.

lengths(macsPeaks_GRL)

## ch12_1 ch12_2  Mel_1  Mel_2 
##  13910  28420  13777  13512

GRangesList objects

A major advantage of GRangesList objects is that we can apply many of the GRanges accessor and operator functions directly to our GRangesList.

This means there is no need to lapply and convert back to GRangesList if we wish to alter our GRanges by a common method.

library(rtracklayer)
macsPeaks_GRLCentred <- resize(macsPeaks_GRL, 10, fix = "center")
width(macsPeaks_GRLCentred)

## IntegerList of length 4
## [["ch12_1"]] 10 10 10 10 10 10 10 10 10 10 10 10 10 ... 10 10 10 10 10 10 10 10 10 10 10 10
## [["ch12_2"]] 10 10 10 10 10 10 10 10 10 10 10 10 10 ... 10 10 10 10 10 10 10 10 10 10 10 10
## [["Mel_1"]] 10 10 10 10 10 10 10 10 10 10 10 10 10 ... 10 10 10 10 10 10 10 10 10 10 10 10
## [["Mel_2"]] 10 10 10 10 10 10 10 10 10 10 10 10 10 ... 10 10 10 10 10 10 10 10 10 10 10 10

GRangesList objects

Now we have our GRangesList we can extract the peak calls for the Mel replicates.

Mel_1_Peaks <- macsPeaks_GRL$Mel_1
Mel_2_Peaks <- macsPeaks_GRL$Mel_2
length(Mel_1_Peaks)

## [1] 13777

length(Mel_2_Peaks)

## [1] 13512

Finding unique peaks

We can extract peak calls unique to replicate 1 or 2 using the %over% operator.

Mel_1_Unique <- Mel_1_Peaks[!Mel_1_Peaks %over% Mel_2_Peaks]
Mel_2_Unique <- Mel_2_Peaks[!Mel_2_Peaks %over% Mel_1_Peaks]
length(Mel_1_Unique)

## [1] 4668

length(Mel_2_Unique)

## [1] 4263

export.bed(Mel_1_Unique, "Mel_1_Unique.bed")
export.bed(Mel_2_Unique, "Mel_2_Unique.bed")

Finding unique peaks

Finding common peaks

Similarly we can extract peak calls common to replicate 1 or 2.

The numbers in common however differ. This is because 2 peak calls in one sample can overlap 1 peak call in the other replicate.

Mel_1_Common <- Mel_1_Peaks[Mel_1_Peaks %over% Mel_2_Peaks]
Mel_2_Common <- Mel_2_Peaks[Mel_2_Peaks %over% Mel_1_Peaks]
length(Mel_1_Common)

## [1] 9109

length(Mel_2_Common)

## [1] 9249

export.bed(Mel_1_Common, "Mel_1_Common.bed")
export.bed(Mel_2_Common, "Mel_2_Common.bed")

Finding common peaks

Despite overlapping these peaks are not identical. So how do we determine a common consensus peak several samples.

Define a consensus, redundant set

To address this problem, a common operation in ChIPseq is to define a nonredundant set of peaks across all samples.

To do this we first pool all our peaks across all replicates, here Mel and Ch12, into one set of redundant, overlapping peaks.

allPeaksSet_Overlapping <- unlist(macsPeaks_GRL)
allPeaksSet_Overlapping

## GRanges object with 69619 ranges and 0 metadata columns:
##          seqnames              ranges strand
##             <Rle>           <IRanges>  <Rle>
##   ch12_1     chr1     4490620-4490773      *
##   ch12_1     chr1     4671581-4671836      *
##   ch12_1     chr1     4785359-4785661      *
##   ch12_1     chr1     5283235-5283439      *
##   ch12_1     chr1     6262317-6262572      *
##      ...      ...                 ...    ...
##    Mel_2     chrX 169107487-169107684      *
##    Mel_2     chrX 169299643-169299795      *
##    Mel_2     chrY       897622-897763      *
##    Mel_2     chrY     1245573-1245845      *
##    Mel_2     chrY     1286522-1286823      *
##   -------
##   seqinfo: 21 sequences from an unspecified genome; no seqlengths

Define a consensus, nonredundant set

We can then use the reduce() function to collapse our peaks into nonredundant, distinct peaks representing peaks present in any sample.

allPeaksSet_nR <- reduce(allPeaksSet_Overlapping)
allPeaksSet_nR

## GRanges object with 38937 ranges and 0 metadata columns:
##           seqnames            ranges strand
##              <Rle>         <IRanges>  <Rle>
##       [1]     chr1   4416806-4417001      *
##       [2]     chr1   4490620-4490773      *
##       [3]     chr1   4671488-4671836      *
##       [4]     chr1   4785341-4785781      *
##       [5]     chr1   4857628-4857783      *
##       ...      ...               ...    ...
##   [38933]     chrY 90741011-90741723      *
##   [38934]     chrY 90741934-90743197      *
##   [38935]     chrY 90744329-90744494      *
##   [38936]     chrY 90825195-90825796      *
##   [38937]     chrY 90828711-90828830      *
##   -------
##   seqinfo: 21 sequences from an unspecified genome; no seqlengths

export.bed(allPeaksSet_nR, "allPeaksSet_nR.bed")

Define a consensus, nonredundant set

Defining a common set of peaks

With our newly defined nonredundant peak set we can now identify from this set which peaks were present in both our replicates using the %over% operator and a logical expression.

commonPeaks <- allPeaksSet_nR[allPeaksSet_nR %over% Mel_1_Peaks & allPeaksSet_nR %over% Mel_2_Peaks]
commonPeaks

## GRanges object with 8888 ranges and 0 metadata columns:
##          seqnames              ranges strand
##             <Rle>           <IRanges>  <Rle>
##      [1]     chr1     4785341-4785781      *
##      [2]     chr1     7397557-7398093      *
##      [3]     chr1     9545120-9545648      *
##      [4]     chr1     9943560-9943856      *
##      [5]     chr1     9943893-9944649      *
##      ...      ...                 ...    ...
##   [8884]     chrX 166535216-166535529      *
##   [8885]     chrX 168673273-168673855      *
##   [8886]     chrX 169047634-169047962      *
##   [8887]     chrY     1245573-1245871      *
##   [8888]     chrY     1286522-1286823      *
##   -------
##   seqinfo: 21 sequences from an unspecified genome; no seqlengths

export.bed(commonPeaks, "commonPeaks.bed")

Defining a common set of peaks

Defining a unique set of peaks

Similarly we can identify which peaks are present only in one replicate.

mel1_Only <- allPeaksSet_nR[allPeaksSet_nR %over% Mel_1_Peaks & !allPeaksSet_nR %over% Mel_2_Peaks]
mel2_Only <- allPeaksSet_nR[!allPeaksSet_nR %over% Mel_1_Peaks & allPeaksSet_nR %over% Mel_2_Peaks]
length(mel1_Only)

## [1] 4445

length(mel2_Only)

## [1] 4185

export.bed(mel1_Only, "mel1_Only.bed")
export.bed(mel2_Only, "mel2_Only.bed")

Defining a unique set of peaks

Complex overlaps

When working with larger numbers of peaks we will often define a logical matrix describing in which samples our nonredundant peaks were present.

First then we use a loop to generate a logical vector for the occurence of nonredundant peaks in each sample.

overlap <- list()
for (i in 1:length(macsPeaks_GRL)) {
    overlap[[i]] <- allPeaksSet_nR %over% macsPeaks_GRL[[i]]
}
overlap[[1]][1:2]

## [1] FALSE  TRUE

Complex overlaps

We can now use to do.call and cbind function to column bind our list of overlaps into our matrix of peak occurrence.

overlapMatrix <- do.call(cbind, overlap)
colnames(overlapMatrix) <- names(macsPeaks_GRL)
overlapMatrix[1:2, ]

##      ch12_1 ch12_2 Mel_1 Mel_2
## [1,]  FALSE   TRUE FALSE FALSE
## [2,]   TRUE  FALSE FALSE FALSE

Complex overlaps

We can add the matrix back into the metadata columns of our GRanges() of nonredundant peaks using the mcols() accessor.

Now we have our nonredundant peaks and the occurence of these peaks in every sample we can easily identify peaks unique or common to replicates and conditions/cell lines.

mcols(allPeaksSet_nR) <- overlapMatrix
allPeaksSet_nR[1:2, ]

## GRanges object with 2 ranges and 4 metadata columns:
##       seqnames          ranges strand |    ch12_1    ch12_2     Mel_1     Mel_2
##          <Rle>       <IRanges>  <Rle> | <logical> <logical> <logical> <logical>
##   [1]     chr1 4416806-4417001      * |     FALSE      TRUE     FALSE     FALSE
##   [2]     chr1 4490620-4490773      * |      TRUE     FALSE     FALSE     FALSE
##   -------
##   seqinfo: 21 sequences from an unspecified genome; no seqlengths

Complex overlaps

The limma package is commonly used in the analysis of RNAseq and microarray data and contains many useful helpful functions.

One very useful function is the vennDiagram function which allows us to plot overlaps from a logical matrix, just like the one we created.

library(limma)
vennDiagram(mcols(allPeaksSet_nR))

Complex overlaps

The limma package’s vennCounts function allows us to retrieve the counts displayed in the Venn diagram as a data.frame.

vennCounts(mcols(allPeaksSet_nR))

##    ch12_1 ch12_2 Mel_1 Mel_2 Counts
## 1       0      0     0     0      0
## 2       0      0     0     1   3361
## 3       0      0     1     0   2435
## 4       0      0     1     1   3876
## 5       0      1     0     0  13042
## 6       0      1     0     1    378
## 7       0      1     1     0   1275
## 8       0      1     1     1   1195
## 9       1      0     0     0   1446
## 10      1      0     0     1     46
## 11      1      0     1     0     24
## 12      1      0     1     1     91
## 13      1      1     0     0   6931
## 14      1      1     0     1    400
## 15      1      1     1     0    711
## 16      1      1     1     1   3726
## attr(,"class")
## [1] "VennCounts"

High confidence peaks

With our nonredundant set of peaks and our matrix of peak occurrence, we can define replicated peaks within conditions.

Here we define the peaks which occur in both the Ch12 replicates.

Since logical matrix is equivalent to a 1 or 0 matrix (1 = TRUE and 0 = FALSE), we can use the rowSums function to extract peaks in at least 2 of the Ch12 replicates.

ch12_HC_Peaks <- allPeaksSet_nR[rowSums(as.data.frame(mcols(allPeaksSet_nR)[, c("ch12_1", "ch12_2")])) >=
    2]

export.bed(ch12_HC_Peaks, "ch12_HC_Peaks.bed")

ch12_HC_Peaks[1:2, ]

## GRanges object with 2 ranges and 4 metadata columns:
##       seqnames          ranges strand |    ch12_1    ch12_2     Mel_1     Mel_2
##          <Rle>       <IRanges>  <Rle> | <logical> <logical> <logical> <logical>
##   [1]     chr1 4671488-4671836      * |      TRUE      TRUE     FALSE     FALSE
##   [2]     chr1 4785341-4785781      * |      TRUE      TRUE      TRUE      TRUE
##   -------
##   seqinfo: 21 sequences from an unspecified genome; no seqlengths

High confidence peaks

High confidence unique peaks

Similarly we can define peaks which are replicated in Ch12 but absent in Mel samples.

ch12_HC_UniquePeaks <- allPeaksSet_nR[rowSums(as.data.frame(mcols(allPeaksSet_nR)[, c("ch12_1",
    "ch12_2")])) >= 2 & rowSums(as.data.frame(mcols(allPeaksSet_nR)[, c("Mel_1", "Mel_2")])) ==
    0]
export.bed(ch12_HC_UniquePeaks, "ch12_HC_UniquePeaks.bed")
ch12_HC_UniquePeaks[1, ]

## GRanges object with 1 range and 4 metadata columns:
##       seqnames          ranges strand |    ch12_1    ch12_2     Mel_1     Mel_2
##          <Rle>       <IRanges>  <Rle> | <logical> <logical> <logical> <logical>
##   [1]     chr1 4671488-4671836      * |      TRUE      TRUE     FALSE     FALSE
##   -------
##   seqinfo: 21 sequences from an unspecified genome; no seqlengths

High confidence unique peaks

Differential Peaks

Finding differential regions

Identifying peaks specific to cell lines or conditions however does not capture the full range of changes in epigenetic events.

To identify differences in epigenetic events we can attempt to quantify the changes in fragment abundance from IP samples across our nonredundant set of peaks.

Finding differential regions

We first must establish a set of regions within which to quantify IP-ed fragments.

An established technique is to produce a set of nonredundant peaks which occur in the majority of at least one experimental condition under evaluation.

Here we identify peaks which occurred in both replicates in either Mel or Ch12 cell lines.

HC_Peaks <- allPeaksSet_nR[rowSums(as.data.frame(mcols(allPeaksSet_nR)[, c("ch12_1", "ch12_2")])) >=
    2 | rowSums(as.data.frame(mcols(allPeaksSet_nR)[, c("Mel_1", "Mel_2")])) >= 2]
HC_Peaks

## GRanges object with 16930 ranges and 4 metadata columns:
##           seqnames            ranges strand |    ch12_1    ch12_2     Mel_1     Mel_2
##              <Rle>         <IRanges>  <Rle> | <logical> <logical> <logical> <logical>
##       [1]     chr1   4671488-4671836      * |      TRUE      TRUE     FALSE     FALSE
##       [2]     chr1   4785341-4785781      * |      TRUE      TRUE      TRUE      TRUE
##       [3]     chr1   5283217-5283439      * |      TRUE      TRUE     FALSE     FALSE
##       [4]     chr1   6262317-6262572      * |      TRUE      TRUE     FALSE     FALSE
##       [5]     chr1   6406474-6406849      * |      TRUE      TRUE     FALSE     FALSE
##       ...      ...               ...    ... .       ...       ...       ...       ...
##   [16926]     chrY 90739739-90740228      * |      TRUE      TRUE     FALSE     FALSE
##   [16927]     chrY 90740354-90740695      * |      TRUE      TRUE     FALSE     FALSE
##   [16928]     chrY 90741011-90741723      * |      TRUE      TRUE     FALSE     FALSE
##   [16929]     chrY 90741934-90743197      * |      TRUE      TRUE     FALSE     FALSE
##   [16930]     chrY 90825195-90825796      * |      TRUE      TRUE     FALSE     FALSE
##   -------
##   seqinfo: 21 sequences from an unspecified genome; no seqlengths

export.bed(HC_Peaks, "HC_Peaks.bed")

Finding differential regions

Counting regions

We will count from our aligned BAM files to quantify IP fragments.

As we have seen previously we can use the BamFileList() function to specify which BAMs to count and importantly, to control memory we specify the number of reads to be held in memory at one time using the yield() parameter.

library(Rsamtools)

bams <- c("~/Projects/Results/chipseq/testRun/BAMs/Sorted_Myc_Ch12_1.bam", "~/Projects/Results/chipseq/testRun/BAMs/Sorted_Myc_Ch12_2.bam",
    "~/Projects/Results/chipseq/testRun/BAMs/Sorted_Myc_Mel_1.bam", "~/Projects/Results/chipseq/testRun/BAMs/Sorted_Myc_Mel_2.bam")
bamFL <- BamFileList(bams, yieldSize = 5e+06)
bamFL

Counting regions

We can count the number of fragments overlapping our peaks using the summarizeOverlaps function. Since ChIPseq is strandless, we set the ignore.strand parameter to TRUE.

The returned object is a familiar RangedSummarizedExperiment containing our GRanges of nonredundant peaks and the counts in these regions for our BAM files.

library(GenomicAlignments)
myMycCounts <- summarizeOverlaps(HC_Peaks, reads = bamFL, ignore.strand = TRUE)
class(myMycCounts)
save(myMycCounts, file = "data/MycCounts.RData")

## [1] "RangedSummarizedExperiment"
## attr(,"package")
## [1] "SummarizedExperiment"

Differential regions using DESeq2

To assess changes in ChIPseq signal across cell lines we will use the DESeq2 package.

The DESeq2 package contains a workflow for assessing local changes in fragment/read abundance between replicated conditions. This workflow includes normalization, variance estimation, outlier removal/replacement as well as significance testing suited to high throughput sequencing data (i.e. integer counts).

To make use of DESeq2 workflow we must first create a data.frame of conditions of interest with rownames set as our BAM file names.

metaDataFrame <- data.frame(CellLine = c("Ch12", "Ch12", "Mel", "Mel"))
rownames(metaDataFrame) <- colnames(myMycCounts)
metaDataFrame

##                       CellLine
## Sorted_Myc_Ch12_1.bam     Ch12
## Sorted_Myc_Ch12_2.bam     Ch12
## Sorted_Myc_Mel_1.bam       Mel
## Sorted_Myc_Mel_2.bam       Mel

Differential regions using DESeq2

We can use the DESeqDataSetFromMatrix() function to create a DESeq2 object.

We must provide our matrix of counts to countData parameter, our metadata data.frame to colData parameter and we include to an optional parameter of rowRanges the nonredundant peak set we can counted on.

Finally we provide the name of the column in our metadata data.frame within which we wish to test to the design parameter.

library(DESeq2)
deseqMyc <- DESeqDataSetFromMatrix(countData = assay(myMycCounts), colData = metaDataFrame, design = ~CellLine,
    rowRanges = HC_Peaks)

Differential regions using DESeq2

We can now run the DESeq2 workflow on our DESeq2 object using the DESeq() function.

deseqMyc <- DESeq(deseqMyc)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

Differential regions using DESeq2

Our DESeq2 object is updated to include useful statistics such our normalised values and variance of signal within each nonredundant peak call.

deseqMyc

## class: DESeqDataSet 
## dim: 16930 4 
## metadata(1): version
## assays(4): counts mu H cooks
## rownames: NULL
## rowData names(26): ch12_1 ch12_2 ... deviance maxCooks
## colnames(4): Sorted_Myc_Ch12_1.bam Sorted_Myc_Ch12_2.bam Sorted_Myc_Mel_1.bam
##   Sorted_Myc_Mel_2.bam
## colData names(2): CellLine sizeFactor

Differential regions using DESeq2

We can extract our information of differential regions using the results() function.

We provide to the results() function the DESeq2 object, the comparison of interest to the contrast parameter and the type of output to return to the format parameter.

The comparison to contrast parameter is provided as a vector of length 3 including the metadata column of interest and groups to test.

We can sort the results by pvalue using the order() function to rank by the most significant changes.

MelMinusCh12 <- results(deseqMyc, contrast = c("CellLine", "Mel", "Ch12"), format = "GRanges")
MelMinusCh12 <- MelMinusCh12[order(MelMinusCh12$pvalue), ]
class(MelMinusCh12)

## [1] "GRanges"
## attr(,"package")
## [1] "GenomicRanges"

Differential regions using DESeq2

The GRanges object contains information on the comparison made in DESeq2.

Most useful it contains the the difference in IP signal as log2 fold change in log2FoldChange, the significance of the change in the pvalue column and an adjusted p-value to address multiple correction in padj column.

MelMinusCh12[1, ]

## GRanges object with 1 range and 6 metadata columns:
##       seqnames            ranges strand |  baseMean log2FoldChange     lfcSE      stat
##          <Rle>         <IRanges>  <Rle> | <numeric>      <numeric> <numeric> <numeric>
##   [1]     chr4 45953273-45954170      * |   613.942        6.09357  0.297353   20.4927
##            pvalue        padj
##         <numeric>   <numeric>
##   [1] 2.50058e-93 4.23348e-89
##   -------
##   seqinfo: 21 sequences from an unspecified genome; no seqlengths

Differential regions using DESeq2

We can now filter our nonredundant peaks to those with significantly more signal in Mel or Ch12 cell lines by filtering by log2FoldChange and padj (p-value adjusted for multiple correction) less than 0.05.

MelMinusCh12Filt <- MelMinusCh12[!is.na(MelMinusCh12$pvalue) | !is.na(MelMinusCh12$padj)]
UpinMel <- MelMinusCh12[MelMinusCh12$padj < 0.05 & MelMinusCh12$log2FoldChange > 0]
DowninMel <- MelMinusCh12[MelMinusCh12$padj < 0.05 & MelMinusCh12$log2FoldChange < 0]
export.bed(UpinMel, "UpinMel.bed")
export.bed(DowninMel, "DowninMel.bed")

Differential regions using DESeq2

Finally we can make our reviewing of sites in IGV a little easier using the tracktables package.

The tracktables package’s makebedtable() function accepts a GRanges object and writes an HTML report contains links to IGV.

An example can be found here

library(tracktables)
myReport <- makebedtable(MelMinusCh12Filt, "MelMinusCh12.html", basedirectory = getwd())

browseURL(myReport)

Time for an exercise!

Exercise on ChIPseq data can be found here

Answers to exercise

Answers can be found here

ChIPseq In Bioconductor (part4)

Rockefeller University, Bioinformatics Resource Center

http://rockefelleruniversity.github.io/RU_ChIPseq/

ChIPseq (part 4)

Data

Data

TF binding and epigenetic states

TF binding and epigenetic states

TF binding and epigenetic states

Myc ChIPseq in Mel and Ch12 cell lines

Consensus Peaks

Reading in a set of peaks

Reading in a set of peaks

Reading in a set of peaks

Reading in a set of peaks

Reading in a set of peaks

GRangesList objects

GRangesList objects

GRangesList objects

Finding unique peaks

Finding unique peaks

Finding common peaks

Finding common peaks

Finding common peaks

Define a consensus, redundant set

Define a consensus, nonredundant set

Define a consensus, nonredundant set

Defining a common set of peaks

Defining a common set of peaks

Defining a unique set of peaks

Defining a unique set of peaks

Complex overlaps

Complex overlaps

Complex overlaps

Complex overlaps

Complex overlaps

High confidence peaks

High confidence peaks

High confidence unique peaks

High confidence unique peaks

Differential Peaks

Finding differential regions

Finding differential regions

Finding differential regions

Counting regions

Counting regions

Differential regions using DESeq2

Differential regions using DESeq2

Differential regions using DESeq2

Differential regions using DESeq2

Differential regions using DESeq2

Differential regions using DESeq2

Differential regions using DESeq2

Differential regions using DESeq2

Differential regions using DESeq2

Time for an exercise!

Answers to exercise