ChIPseq In Bioconductor (part4)

class: center, middle, inverse, title-slide

.title[
# ChIPseq In Bioconductor (part4)<br />
<html><br />
<br />
<hr color='#EB811B' size=1px width=796px><br />
</html>
]
.author[
### Rockefeller University, Bioinformatics Resource Center
]
.date[
### <a href="http://rockefelleruniversity.github.io/RU_ChIPseq/" class="uri">http://rockefelleruniversity.github.io/RU_ChIPseq/</a>
]

---

## Data

In todays session we will continue to review the Myc ChIPseq we were working on in our last sessions.

This include Myc ChIPseq for MEL and Ch12 celllines.

Information and files for the [Myc ChIPseq in MEL cell line can be found here](https://www.encodeproject.org/experiments/ENCSR000EUA/)

Information and files for the [Myc ChIPseq in Ch12 cell line can be found here](https://www.encodeproject.org/experiments/ENCSR000ERN/)

---

## Data

I have provided peak calls from MACS2 following the processing steps outlined in our last session.

Peak calls for Myc in MEL and Ch12 cellines can be found in

**data/peaks/**

* **data/peaks/Mel_1_peaks.xls**
* **data/peaks/Mel_2_peaks.xls**
* **data/peaks/Ch12_1_peaks.xls**
* **data/peaks/Ch12_1_peaks.xls**

---
## TF binding and epigenetic states

A common goal in ChIPseq is characterise genome wide transcription factor binding sites or epigenetic states.

The presence of transcription factor binding sites and epigenetics events is often further analyzed in the context of their putative targets genes to characterize the transcription factor's and epigenetic event's function and/or biological role.

---
## TF binding and epigenetic states

With the release of ENCODE's wide scale mapping of transcription factor binding sites or epigenetic states and the advent of multiplexing technologies for high throughput sequencing, in has become common practice to have replicated ChIPseq experiments so as to have higher confidence in identified epigenetic events.

---
## TF binding and epigenetic states

In addition to the genome wide characterization of epigenetic events, ChIPseq had been increasingly used to identify changes in epigenetic events between conditions and/or cell lines.

---
## Myc ChIPseq in Mel and Ch12 cell lines

We have been working to process and a characterize a Myc ChIPseq replicate in the Mel cell line.

In this session we will look at how we can define a high confidence/reproducible set of Myc peaks in the Mel cell line as well as identify Myc binding events unique or common between Mel and Ch12 cell lines.

---
class: inverse, center, middle

# Consensus Peaks

---

## Reading in a set of peaks

First we need to read our peak calls from MACS2 into R.

The Myc peak calls we will review are within the peaks directory, so here we list all files matching our expected file pattern using the **dir() function.**

``` r
peakFiles <- dir("data/peaks/", pattern = "*.peaks", full.names = TRUE)
peakFiles
```

```
## [1] "data/peaks//ch12_1_peaks.xls" "data/peaks//ch12_2_peaks.xls"
## [3] "data/peaks//Mel_1_peaks.xls"  "data/peaks//Mel_2_peaks.xls"
```

---
## Reading in a set of peaks

We can loop through our tab separated files (disguised as *.xls* functions) and import them into R as a list of data.frames using a loop.

``` r
macsPeaks_DF <- list()
for (i in 1:length(peakFiles)) {
    macsPeaks_DF[[i]] <- read.delim(peakFiles[i], comment.char = "#")
}
length(macsPeaks_DF)
```

```
## [1] 4
```

---
## Reading in a set of peaks

Now with our list of data.frames of peak calls, we loop through the list and create a **GRanges** for each of our peak calls.

Remember you can also do this with the [import function from rtracklayer](https://rockefelleruniversity.github.io/RU_ChIPseq/presentations/slides/ChIPseq_In_Bioconductor2.html#46).

``` r
library(GenomicRanges)
macsPeaks_GR <- list()
for (i in 1:length(macsPeaks_DF)) {
    peakDFtemp <- macsPeaks_DF[[i]]
    macsPeaks_GR[[i]] <- GRanges(seqnames = peakDFtemp[, "chr"], IRanges(peakDFtemp[, "start"],
        peakDFtemp[, "end"]))
}
macsPeaks_GR[[1]]
```

```
## GRanges object with 13910 ranges and 0 metadata columns:
##           seqnames            ranges strand
##              <Rle>         <IRanges>  <Rle>
##       [1]     chr1   4490620-4490773      *
##       [2]     chr1   4671581-4671836      *
##       [3]     chr1   4785359-4785661      *
##       [4]     chr1   5283235-5283439      *
##       [5]     chr1   6262317-6262572      *
##       ...      ...               ...    ...
##   [13906]     chrY 90742039-90742407      *
##   [13907]     chrY 90742425-90742646      *
##   [13908]     chrY 90742687-90742878      *
##   [13909]     chrY 90742905-90743091      *
##   [13910]     chrY 90825379-90825714      *
##   -------
##   seqinfo: 21 sequences from an unspecified genome; no seqlengths
```

---
## Reading in a set of peaks

We will want to assign a sensible set of names to our peak calls.

We can use the **gsub()** and **basename()** function with our file names to create some samplenames.

The **basename()** function accepts a file path (such as the path to our bam files) and returns just the file name (removing directory paths).

The **gsub()** function accepts the text to replace, the replacement text and a character vector to replace within.

``` r
fileNames <- basename(peakFiles)
fileNames
```

```
## [1] "ch12_1_peaks.xls" "ch12_2_peaks.xls" "Mel_1_peaks.xls"  "Mel_2_peaks.xls"
```

``` r
sampleNames <- gsub("_peaks.xls", "", fileNames)
sampleNames
```

```
## [1] "ch12_1" "ch12_2" "Mel_1"  "Mel_2"
```

---
## Reading in a set of peaks

Now we have a named list of our peak calls as **GRanges** objects.

We can convert our list of **GRanges** objects to a **GRangesList** using the **GRangesList()** function.

``` r
macsPeaks_GRL <- GRangesList(macsPeaks_GR)
names(macsPeaks_GRL) <- sampleNames
class(macsPeaks_GRL)
```

```
## [1] "CompressedGRangesList"
## attr(,"package")
## [1] "GenomicRanges"
```

``` r
names(macsPeaks_GRL)
```

```
## [1] "ch12_1" "ch12_2" "Mel_1"  "Mel_2"
```

---
## GRangesList objects

The GRangesList object can behave just as our standard lists. Here we use the **lengths()** function to a get the number of peaks in each replicate.

``` r
lengths(macsPeaks_GRL)
```

```
## ch12_1 ch12_2  Mel_1  Mel_2 
##  13910  28420  13777  13512
```

---
## GRangesList objects

A major advantage of **GRangesList** objects is that we can apply many of the **GRanges** accessor and operator functions directly to our **GRangesList**.

This means there is no need to lapply and convert back to **GRangesList** if we wish to alter our **GRanges** by a common method.

``` r
library(rtracklayer)
macsPeaks_GRLCentred <- resize(macsPeaks_GRL, 10, fix = "center")
width(macsPeaks_GRLCentred)
```

```
## IntegerList of length 4
## [["ch12_1"]] 10 10 10 10 10 10 10 10 10 10 10 10 10 ... 10 10 10 10 10 10 10 10 10 10 10 10
## [["ch12_2"]] 10 10 10 10 10 10 10 10 10 10 10 10 10 ... 10 10 10 10 10 10 10 10 10 10 10 10
## [["Mel_1"]] 10 10 10 10 10 10 10 10 10 10 10 10 10 ... 10 10 10 10 10 10 10 10 10 10 10 10
## [["Mel_2"]] 10 10 10 10 10 10 10 10 10 10 10 10 10 ... 10 10 10 10 10 10 10 10 10 10 10 10
```

---
## GRangesList objects

Now we have our GRangesList we can extract the peak calls for the Mel replicates.

``` r
Mel_1_Peaks <- macsPeaks_GRL$Mel_1
Mel_2_Peaks <- macsPeaks_GRL$Mel_2
length(Mel_1_Peaks)
```

```
## [1] 13777
```

``` r
length(Mel_2_Peaks)
```

```
## [1] 13512
```

---

## Finding unique peaks

We can extract peak calls unique to replicate 1 or 2 using the **%over%** operator.

``` r
Mel_1_Unique <- Mel_1_Peaks[!Mel_1_Peaks %over% Mel_2_Peaks]
Mel_2_Unique <- Mel_2_Peaks[!Mel_2_Peaks %over% Mel_1_Peaks]
length(Mel_1_Unique)
```

```
## [1] 4668
```

``` r
length(Mel_2_Unique)
```

```
## [1] 4263
```

``` r
export.bed(Mel_1_Unique, "Mel_1_Unique.bed")
export.bed(Mel_2_Unique, "Mel_2_Unique.bed")
```

---

## Finding unique peaks

![](imgs/uniquePeaks.png)
---

## Finding common peaks

Similarly we can extract peak calls common to replicate 1 or 2.

The numbers in common however differ. This is because 2 peak calls in one sample can overlap 1 peak call in the other replicate.

``` r
Mel_1_Common <- Mel_1_Peaks[Mel_1_Peaks %over% Mel_2_Peaks]
Mel_2_Common <- Mel_2_Peaks[Mel_2_Peaks %over% Mel_1_Peaks]
length(Mel_1_Common)
```

```
## [1] 9109
```

``` r
length(Mel_2_Common)
```

```
## [1] 9249
```

``` r
export.bed(Mel_1_Common, "Mel_1_Common.bed")
export.bed(Mel_2_Common, "Mel_2_Common.bed")
```

---
## Finding common peaks
![](imgs/commonP.png)

---
## Finding common peaks
Despite overlapping these peaks are not identical. So how do we determine a common consensus peak several samples. 
![](imgs/2for1.png)

---
## Define a consensus, redundant set

To address this problem, a common operation in ChIPseq is to define a nonredundant set of peaks across all samples.

To do this we first pool all our peaks across all replicates, here Mel and Ch12, into one set of redundant, overlapping peaks.

``` r
allPeaksSet_Overlapping <- unlist(macsPeaks_GRL)
allPeaksSet_Overlapping
```

```
## GRanges object with 69619 ranges and 0 metadata columns:
##          seqnames              ranges strand
##             <Rle>           <IRanges>  <Rle>
##   ch12_1     chr1     4490620-4490773      *
##   ch12_1     chr1     4671581-4671836      *
##   ch12_1     chr1     4785359-4785661      *
##   ch12_1     chr1     5283235-5283439      *
##   ch12_1     chr1     6262317-6262572      *
##      ...      ...                 ...    ...
##    Mel_2     chrX 169107487-169107684      *
##    Mel_2     chrX 169299643-169299795      *
##    Mel_2     chrY       897622-897763      *
##    Mel_2     chrY     1245573-1245845      *
##    Mel_2     chrY     1286522-1286823      *
##   -------
##   seqinfo: 21 sequences from an unspecified genome; no seqlengths
```

---
## Define a consensus, nonredundant set

We can then use the **reduce()** function to collapse our peaks into nonredundant, distinct peaks representing peaks present in any sample.

``` r
allPeaksSet_nR <- reduce(allPeaksSet_Overlapping)
allPeaksSet_nR
```

```
## GRanges object with 38937 ranges and 0 metadata columns:
##           seqnames            ranges strand
##              <Rle>         <IRanges>  <Rle>
##       [1]     chr1   4416806-4417001      *
##       [2]     chr1   4490620-4490773      *
##       [3]     chr1   4671488-4671836      *
##       [4]     chr1   4785341-4785781      *
##       [5]     chr1   4857628-4857783      *
##       ...      ...               ...    ...
##   [38933]     chrY 90741011-90741723      *
##   [38934]     chrY 90741934-90743197      *
##   [38935]     chrY 90744329-90744494      *
##   [38936]     chrY 90825195-90825796      *
##   [38937]     chrY 90828711-90828830      *
##   -------
##   seqinfo: 21 sequences from an unspecified genome; no seqlengths
```

``` r
export.bed(allPeaksSet_nR, "allPeaksSet_nR.bed")
```

---
## Define a consensus, nonredundant set
![](imgs/nrp.png)

---
## Defining a common set of peaks

With our newly defined nonredundant peak set we can now identify from this set which peaks were present in both our replicates using the **%over%** operator and a logical expression.

``` r
commonPeaks <- allPeaksSet_nR[allPeaksSet_nR %over% Mel_1_Peaks & allPeaksSet_nR %over% Mel_2_Peaks]
commonPeaks
```

```
## GRanges object with 8888 ranges and 0 metadata columns:
##          seqnames              ranges strand
##             <Rle>           <IRanges>  <Rle>
##      [1]     chr1     4785341-4785781      *
##      [2]     chr1     7397557-7398093      *
##      [3]     chr1     9545120-9545648      *
##      [4]     chr1     9943560-9943856      *
##      [5]     chr1     9943893-9944649      *
##      ...      ...                 ...    ...
##   [8884]     chrX 166535216-166535529      *
##   [8885]     chrX 168673273-168673855      *
##   [8886]     chrX 169047634-169047962      *
##   [8887]     chrY     1245573-1245871      *
##   [8888]     chrY     1286522-1286823      *
##   -------
##   seqinfo: 21 sequences from an unspecified genome; no seqlengths
```

``` r
export.bed(commonPeaks, "commonPeaks.bed")
```

---
## Defining a common set of peaks
![](imgs/commonPR.png)

---
## Defining a unique set of peaks

Similarly we can identify which peaks are present only in one replicate.

``` r
mel1_Only <- allPeaksSet_nR[allPeaksSet_nR %over% Mel_1_Peaks & !allPeaksSet_nR %over% Mel_2_Peaks]
mel2_Only <- allPeaksSet_nR[!allPeaksSet_nR %over% Mel_1_Peaks & allPeaksSet_nR %over% Mel_2_Peaks]
length(mel1_Only)
```

```
## [1] 4445
```

``` r
length(mel2_Only)
```

```
## [1] 4185
```

``` r
export.bed(mel1_Only, "mel1_Only.bed")
export.bed(mel2_Only, "mel2_Only.bed")
```

---
## Defining a unique set of peaks
![](imgs/newMelOnly.png)

---
## Complex overlaps

When working with larger numbers of peaks we will often define a logical matrix describing in which samples our nonredundant peaks were present.

First then we use a loop to generate a logical vector for the occurence of nonredundant peaks in each sample.

``` r
overlap <- list()
for (i in 1:length(macsPeaks_GRL)) {
    overlap[[i]] <- allPeaksSet_nR %over% macsPeaks_GRL[[i]]
}
overlap[[1]][1:2]
```

```
## [1] FALSE  TRUE
```

---
## Complex overlaps

We can now use to **do.call** and **cbind** function to column bind our list of overlaps into our matrix of peak occurrence.

``` r
overlapMatrix <- do.call(cbind, overlap)
colnames(overlapMatrix) <- names(macsPeaks_GRL)
overlapMatrix[1:2, ]
```

```
##      ch12_1 ch12_2 Mel_1 Mel_2
## [1,]  FALSE   TRUE FALSE FALSE
## [2,]   TRUE  FALSE FALSE FALSE
```

---
## Complex overlaps

We can add the matrix back into the metadata columns of our **GRanges()** of nonredundant peaks using the **mcols()** accessor.

Now we have our nonredundant peaks and the occurence of these peaks in every sample we can easily identify peaks unique or common to replicates and conditions/cell lines.

``` r
mcols(allPeaksSet_nR) <- overlapMatrix
allPeaksSet_nR[1:2, ]
```

```
## GRanges object with 2 ranges and 4 metadata columns:
##       seqnames          ranges strand |    ch12_1    ch12_2     Mel_1     Mel_2
##          <Rle>       <IRanges>  <Rle> | <logical> <logical> <logical> <logical>
##   [1]     chr1 4416806-4417001      * |     FALSE      TRUE     FALSE     FALSE
##   [2]     chr1 4490620-4490773      * |      TRUE     FALSE     FALSE     FALSE
##   -------
##   seqinfo: 21 sequences from an unspecified genome; no seqlengths
```

---
## Complex overlaps

The **limma** package is commonly used in the analysis of RNAseq and microarray data and contains many useful helpful functions.

One very useful function is the **vennDiagram** function which allows us to plot overlaps from a logical matrix, just like the one we created.

``` r
library(limma)
vennDiagram(mcols(allPeaksSet_nR))
```

![](ChIPseq_In_Bioconductor4_files/figure-html/unnamed-chunk-22-1.png)

---
## Complex overlaps

The **limma** package's **vennCounts** function allows us to retrieve the counts displayed in the Venn diagram as a data.frame.

``` r
vennCounts(mcols(allPeaksSet_nR))
```

```
##    ch12_1 ch12_2 Mel_1 Mel_2 Counts
## 1       0      0     0     0      0
## 2       0      0     0     1   3361
## 3       0      0     1     0   2435
## 4       0      0     1     1   3876
## 5       0      1     0     0  13042
## 6       0      1     0     1    378
## 7       0      1     1     0   1275
## 8       0      1     1     1   1195
## 9       1      0     0     0   1446
## 10      1      0     0     1     46
## 11      1      0     1     0     24
## 12      1      0     1     1     91
## 13      1      1     0     0   6931
## 14      1      1     0     1    400
## 15      1      1     1     0    711
## 16      1      1     1     1   3726
## attr(,"class")
## [1] "VennCounts"
```

---
## High confidence peaks

With our nonredundant set of peaks and our matrix of peak occurrence, we can define replicated peaks within conditions.

Here we define the peaks which occur in both the Ch12 replicates.

Since logical matrix is equivalent to a 1 or 0  matrix (1 = TRUE and 0 = FALSE), we can use the rowSums function to extract peaks in at least 2 of the Ch12 replicates.

``` r
ch12_HC_Peaks <- allPeaksSet_nR[rowSums(as.data.frame(mcols(allPeaksSet_nR)[, c("ch12_1", "ch12_2")])) >=
    2]

export.bed(ch12_HC_Peaks, "ch12_HC_Peaks.bed")

ch12_HC_Peaks[1:2, ]
```

```
## GRanges object with 2 ranges and 4 metadata columns:
##       seqnames          ranges strand |    ch12_1    ch12_2     Mel_1     Mel_2
##          <Rle>       <IRanges>  <Rle> | <logical> <logical> <logical> <logical>
##   [1]     chr1 4671488-4671836      * |      TRUE      TRUE     FALSE     FALSE
##   [2]     chr1 4785341-4785781      * |      TRUE      TRUE      TRUE      TRUE
##   -------
##   seqinfo: 21 sequences from an unspecified genome; no seqlengths
```

---
## High confidence peaks

![](imgs/ch12HC.png)
---
## High confidence unique peaks

Similarly we can define peaks which are replicated in Ch12 but absent in Mel samples.

``` r
ch12_HC_UniquePeaks <- allPeaksSet_nR[rowSums(as.data.frame(mcols(allPeaksSet_nR)[, c("ch12_1",
    "ch12_2")])) >= 2 & rowSums(as.data.frame(mcols(allPeaksSet_nR)[, c("Mel_1", "Mel_2")])) ==
    0]
export.bed(ch12_HC_UniquePeaks, "ch12_HC_UniquePeaks.bed")
ch12_HC_UniquePeaks[1, ]
```

```
## GRanges object with 1 range and 4 metadata columns:
##       seqnames          ranges strand |    ch12_1    ch12_2     Mel_1     Mel_2
##          <Rle>       <IRanges>  <Rle> | <logical> <logical> <logical> <logical>
##   [1]     chr1 4671488-4671836      * |      TRUE      TRUE     FALSE     FALSE
##   -------
##   seqinfo: 21 sequences from an unspecified genome; no seqlengths
```

---
## High confidence unique peaks
![](imgs/ch12U.png)

---
class: inverse, center, middle

# Differential Peaks

---

## Finding differential regions

Identifying peaks specific to cell lines or conditions however does not capture the full range of changes in epigenetic events.

To identify differences in epigenetic events we can attempt to quantify the changes in fragment abundance from IP samples across our nonredundant set of peaks.

![](imgs/igsss_1.png)

---
## Finding differential regions

We first must establish a set of regions within which to quantify IP-ed fragments.

An established technique is to produce a set of nonredundant peaks which occur in the majority of at least one experimental condition under evaluation.

Here we identify peaks which occurred in both replicates in either Mel or Ch12 cell lines.

``` r
HC_Peaks <- allPeaksSet_nR[rowSums(as.data.frame(mcols(allPeaksSet_nR)[, c("ch12_1", "ch12_2")])) >=
    2 | rowSums(as.data.frame(mcols(allPeaksSet_nR)[, c("Mel_1", "Mel_2")])) >= 2]
HC_Peaks
```

```
## GRanges object with 16930 ranges and 4 metadata columns:
##           seqnames            ranges strand |    ch12_1    ch12_2     Mel_1     Mel_2
##              <Rle>         <IRanges>  <Rle> | <logical> <logical> <logical> <logical>
##       [1]     chr1   4671488-4671836      * |      TRUE      TRUE     FALSE     FALSE
##       [2]     chr1   4785341-4785781      * |      TRUE      TRUE      TRUE      TRUE
##       [3]     chr1   5283217-5283439      * |      TRUE      TRUE     FALSE     FALSE
##       [4]     chr1   6262317-6262572      * |      TRUE      TRUE     FALSE     FALSE
##       [5]     chr1   6406474-6406849      * |      TRUE      TRUE     FALSE     FALSE
##       ...      ...               ...    ... .       ...       ...       ...       ...
##   [16926]     chrY 90739739-90740228      * |      TRUE      TRUE     FALSE     FALSE
##   [16927]     chrY 90740354-90740695      * |      TRUE      TRUE     FALSE     FALSE
##   [16928]     chrY 90741011-90741723      * |      TRUE      TRUE     FALSE     FALSE
##   [16929]     chrY 90741934-90743197      * |      TRUE      TRUE     FALSE     FALSE
##   [16930]     chrY 90825195-90825796      * |      TRUE      TRUE     FALSE     FALSE
##   -------
##   seqinfo: 21 sequences from an unspecified genome; no seqlengths
```

``` r
export.bed(HC_Peaks, "HC_Peaks.bed")
```

---
## Finding differential regions
![](imgs/hcPeaks2.png)

---
## Counting regions

We will count from our aligned BAM files to quantify IP fragments.

As we have seen previously we can use the **BamFileList()** function to specify which BAMs to count and importantly, to control memory we specify the number of reads to be held in memory at one time using the **yield()** parameter.

``` r
library(Rsamtools)

bams <- c("~/Projects/Results/chipseq/testRun/BAMs/Sorted_Myc_Ch12_1.bam", "~/Projects/Results/chipseq/testRun/BAMs/Sorted_Myc_Ch12_2.bam",
    "~/Projects/Results/chipseq/testRun/BAMs/Sorted_Myc_Mel_1.bam", "~/Projects/Results/chipseq/testRun/BAMs/Sorted_Myc_Mel_2.bam")
bamFL <- BamFileList(bams, yieldSize = 5e+06)
bamFL
```

---
## Counting regions

We can count the number of fragments overlapping our peaks using the **summarizeOverlaps** function. Since ChIPseq is strandless, we set the **ignore.strand** parameter to **TRUE**.

The returned object is a familiar **RangedSummarizedExperiment** containing our GRanges of nonredundant peaks and the counts in these regions for our BAM files.

``` r
library(GenomicAlignments)
myMycCounts <- summarizeOverlaps(HC_Peaks, reads = bamFL, ignore.strand = TRUE)
class(myMycCounts)
save(myMycCounts, file = "data/MycCounts.RData")
```

```
## [1] "RangedSummarizedExperiment"
## attr(,"package")
## [1] "SummarizedExperiment"
```

---
## Differential regions using DESeq2

To assess changes in ChIPseq signal across cell lines we will use the **DESeq2** package.

The DESeq2 package contains a workflow for assessing local changes in fragment/read abundance between replicated conditions. This workflow includes normalization, variance estimation, outlier removal/replacement as well as significance testing suited to high throughput sequencing data (i.e. integer counts).

To make use of DESeq2 workflow we must first create a data.frame of conditions of interest with rownames set as our BAM file names.

``` r
metaDataFrame <- data.frame(CellLine = c("Ch12", "Ch12", "Mel", "Mel"))
rownames(metaDataFrame) <- colnames(myMycCounts)
metaDataFrame
```

```
##                       CellLine
## Sorted_Myc_Ch12_1.bam     Ch12
## Sorted_Myc_Ch12_2.bam     Ch12
## Sorted_Myc_Mel_1.bam       Mel
## Sorted_Myc_Mel_2.bam       Mel
```

---
## Differential regions using DESeq2

We can use the **DESeqDataSetFromMatrix()** function to create a **DESeq2** object.

We must provide our matrix of counts to **countData** parameter, our metadata data.frame to **colData** parameter and we include to an optional parameter of **rowRanges** the nonredundant peak set we can counted on.

Finally we provide the name of the column in our metadata data.frame within which we wish to test to the **design** parameter.

``` r
library(DESeq2)
deseqMyc <- DESeqDataSetFromMatrix(countData = assay(myMycCounts), colData = metaDataFrame, design = ~CellLine,
    rowRanges = HC_Peaks)
```

---
## Differential regions using DESeq2

We can now run the DESeq2 workflow on our **DESeq2** object using the **DESeq()** function.

``` r
deseqMyc <- DESeq(deseqMyc)
```

```
## estimating size factors
```

```
## estimating dispersions
```

```
## gene-wise dispersion estimates
```

```
## mean-dispersion relationship
```

```
## final dispersion estimates
```

```
## fitting model and testing
```

---
## Differential regions using DESeq2

Our **DESeq2** object is updated to include useful statistics such our normalised values and variance of signal within each nonredundant peak call.

``` r
deseqMyc
```

```
## class: DESeqDataSet 
## dim: 16930 4 
## metadata(1): version
## assays(4): counts mu H cooks
## rownames: NULL
## rowData names(26): ch12_1 ch12_2 ... deviance maxCooks
## colnames(4): Sorted_Myc_Ch12_1.bam Sorted_Myc_Ch12_2.bam Sorted_Myc_Mel_1.bam
##   Sorted_Myc_Mel_2.bam
## colData names(2): CellLine sizeFactor
```
---
## Differential regions using DESeq2

We can extract our information of differential regions using the **results()** function.

We provide to the **results()** function the **DESeq2** object, the comparison of interest to the **contrast** parameter and the type of output to return to the **format** parameter.

The comparison to **contrast** parameter is provided as a vector of length 3 including the metadata column of interest and groups to test.

We can sort the results by pvalue using the **order()** function to rank by the most significant changes.

``` r
MelMinusCh12 <- results(deseqMyc, contrast = c("CellLine", "Mel", "Ch12"), format = "GRanges")
MelMinusCh12 <- MelMinusCh12[order(MelMinusCh12$pvalue), ]
class(MelMinusCh12)
```

```
## [1] "GRanges"
## attr(,"package")
## [1] "GenomicRanges"
```

---
## Differential regions using DESeq2

The GRanges object contains information on the comparison made in DESeq2.

Most useful it contains the the difference in IP signal as log2 fold change in **log2FoldChange**, the significance of the change in the **pvalue** column and an adjusted p-value to address multiple correction in **padj** column.

``` r
MelMinusCh12[1, ]
```

```
## GRanges object with 1 range and 6 metadata columns:
##       seqnames            ranges strand |  baseMean log2FoldChange     lfcSE      stat
##          <Rle>         <IRanges>  <Rle> | <numeric>      <numeric> <numeric> <numeric>
##   [1]     chr4 45953273-45954170      * |   613.942        6.09357  0.297353   20.4927
##            pvalue        padj
##         <numeric>   <numeric>
##   [1] 2.50058e-93 4.23348e-89
##   -------
##   seqinfo: 21 sequences from an unspecified genome; no seqlengths
```

---
## Differential regions using DESeq2

We can now filter our nonredundant peaks to those with significantly more signal in Mel or Ch12 cell lines by filtering by log2FoldChange and padj (p-value adjusted for multiple correction) less than 0.05.

``` r
MelMinusCh12Filt <- MelMinusCh12[!is.na(MelMinusCh12$pvalue) | !is.na(MelMinusCh12$padj)]
UpinMel <- MelMinusCh12[MelMinusCh12$padj < 0.05 & MelMinusCh12$log2FoldChange > 0]
DowninMel <- MelMinusCh12[MelMinusCh12$padj < 0.05 & MelMinusCh12$log2FoldChange < 0]
export.bed(UpinMel, "UpinMel.bed")
export.bed(DowninMel, "DowninMel.bed")
```

---
## Differential regions using DESeq2
![](imgs/DB.png)

---
## Differential regions using DESeq2

Finally we can make our reviewing of sites in IGV a little easier using the **tracktables** package.

The **tracktables** package's **makebedtable()** function accepts a **GRanges** object and writes an HTML report contains links to IGV.

An example can be found [here](../../data/MelMinusCh12.html)

``` r
library(tracktables)
myReport <- makebedtable(MelMinusCh12Filt, "MelMinusCh12.html", basedirectory = getwd())

browseURL(myReport)
```

---
## Time for an exercise!

Exercise on ChIPseq data can be found [here](../../exercises/exercises/chipseq_part4_exercise.html)

---
## Answers to exercise

Answers can be found [here](../../exercises/answers/chipseq_part4_answers.html)