+ - 0:00:00
Notes for current slide
Notes for next slide

ChIPseq In Bioconductor (part4)


Rockefeller University, Bioinformatics Resource Center

http://rockefelleruniversity.github.io/RU_ChIPseq/

1 / 54

Data

In todays session we will continue to review the Myc ChIPseq we were working on in our last sessions.

This include Myc ChIPseq for MEL and Ch12 celllines.

Information and files for the Myc ChIPseq in MEL cell line can be found here

Information and files for the Myc ChIPseq in Ch12 cell line can be found here

2 / 54

Data

I have provided peak calls from MACS2 following the processing steps outlined in our last session.

Peak calls for Myc in MEL and Ch12 cellines can be found in

data/peaks/

  • data/peaks/Mel_1_peaks.xls
  • data/peaks/Mel_2_peaks.xls
  • data/peaks/Ch12_1_peaks.xls
  • data/peaks/Ch12_1_peaks.xls
3 / 54

TF binding and epigenetic states

A common goal in ChIPseq is characterise genome wide transcription factor binding sites or epigenetic states.

The presence of transcription factor binding sites and epigenetics events is often further analyzed in the context of their putative targets genes to characterize the transcription factor's and epigenetic event's function and/or biological role.

offset
4 / 54

TF binding and epigenetic states

With the release of ENCODE's wide scale mapping of transcription factor binding sites or epigenetic states and the advent of multiplexing technologies for high throughput sequencing, in has become common practice to have replicated ChIPseq experiments so as to have higher confidence in identified epigenetic events.

offset
5 / 54

TF binding and epigenetic states

In addition to the genome wide characterization of epigenetic events, ChIPseq had been increasingly used to identify changes in epigenetic events between conditions and/or cell lines.

offset
6 / 54

Myc ChIPseq in Mel and Ch12 cell lines

We have been working to process and a characterize a Myc ChIPseq replicate in the Mel cell line.

In this session we will look at how we can define a high confidence/reproducible set of Myc peaks in the Mel cell line as well as identify Myc binding events unique or common between Mel and Ch12 cell lines.

7 / 54

Consensus Peaks


8 / 54

Reading in a set of peaks

First we need to read our peak calls from MACS2 into R.

The Myc peak calls we will review are within the peaks directory, so here we list all files matching our expected file pattern using the dir() function.

peakFiles <- dir("data/peaks/", pattern = "*.peaks", full.names = TRUE)
peakFiles
## [1] "data/peaks//ch12_1_peaks.xls" "data/peaks//ch12_2_peaks.xls"
## [3] "data/peaks//Mel_1_peaks.xls" "data/peaks//Mel_2_peaks.xls"
9 / 54

Reading in a set of peaks

We can loop through our tab separated files (disguised as .xls functions) and import them into R as a list of data.frames using a loop.

macsPeaks_DF <- list()
for (i in 1:length(peakFiles)) {
macsPeaks_DF[[i]] <- read.delim(peakFiles[i], comment.char = "#")
}
length(macsPeaks_DF)
## [1] 4
10 / 54

Reading in a set of peaks

Now with our list of data.frames of peak calls, we loop through the list and create a GRanges for each of our peak calls.

Remember you can also do this with the import function from rtracklayer.

library(GenomicRanges)
macsPeaks_GR <- list()
for (i in 1:length(macsPeaks_DF)) {
peakDFtemp <- macsPeaks_DF[[i]]
macsPeaks_GR[[i]] <- GRanges(seqnames = peakDFtemp[, "chr"], IRanges(peakDFtemp[,
"start"], peakDFtemp[, "end"]))
}
macsPeaks_GR[[1]]
## GRanges object with 13910 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 4490620-4490773 *
## [2] chr1 4671581-4671836 *
## [3] chr1 4785359-4785661 *
## [4] chr1 5283235-5283439 *
## [5] chr1 6262317-6262572 *
## ... ... ... ...
## [13906] chrY 90742039-90742407 *
## [13907] chrY 90742425-90742646 *
## [13908] chrY 90742687-90742878 *
## [13909] chrY 90742905-90743091 *
## [13910] chrY 90825379-90825714 *
## -------
## seqinfo: 21 sequences from an unspecified genome; no seqlengths
11 / 54

Reading in a set of peaks

We will want to assign a sensible set of names to our peak calls.

We can use the gsub() and basename() function with our file names to create some samplenames.

The basename() function accepts a file path (such as the path to our bam files) and returns just the file name (removing directory paths).

The gsub() function accepts the text to replace, the replacement text and a character vector to replace within.

fileNames <- basename(peakFiles)
fileNames
## [1] "ch12_1_peaks.xls" "ch12_2_peaks.xls" "Mel_1_peaks.xls" "Mel_2_peaks.xls"
sampleNames <- gsub("_peaks.xls", "", fileNames)
sampleNames
## [1] "ch12_1" "ch12_2" "Mel_1" "Mel_2"
12 / 54

Reading in a set of peaks

Now we have a named list of our peak calls as GRanges objects.

We can convert our list of GRanges objects to a GRangesList using the GRangesList() function.

macsPeaks_GRL <- GRangesList(macsPeaks_GR)
names(macsPeaks_GRL) <- sampleNames
class(macsPeaks_GRL)
## [1] "CompressedGRangesList"
## attr(,"package")
## [1] "GenomicRanges"
names(macsPeaks_GRL)
## [1] "ch12_1" "ch12_2" "Mel_1" "Mel_2"
13 / 54

GRangesList objects

The GRangesList object can behave just as our standard lists. Here we use the lengths() function to a get the number of peaks in each replicate.

lengths(macsPeaks_GRL)
## ch12_1 ch12_2 Mel_1 Mel_2
## 13910 28420 13777 13512
14 / 54

GRangesList objects

A major advantage of GRangesList objects is that we can apply many of the GRanges accessor and operator functions directly to our GRangesList.

This means there is no need to lapply and convert back to GRangesList if we wish to alter our GRanges by a common method.

library(rtracklayer)
macsPeaks_GRLCentred <- resize(macsPeaks_GRL, 10, fix = "center")
width(macsPeaks_GRLCentred)
## IntegerList of length 4
## [["ch12_1"]] 10 10 10 10 10 10 10 10 10 10 10 ... 10 10 10 10 10 10 10 10 10 10
## [["ch12_2"]] 10 10 10 10 10 10 10 10 10 10 10 ... 10 10 10 10 10 10 10 10 10 10
## [["Mel_1"]] 10 10 10 10 10 10 10 10 10 10 10 ... 10 10 10 10 10 10 10 10 10 10
## [["Mel_2"]] 10 10 10 10 10 10 10 10 10 10 10 ... 10 10 10 10 10 10 10 10 10 10
15 / 54

GRangesList objects

Now we have our GRangesList we can extract the peak calls for the Mel replicates.

Mel_1_Peaks <- macsPeaks_GRL$Mel_1
Mel_2_Peaks <- macsPeaks_GRL$Mel_2
length(Mel_1_Peaks)
## [1] 13777
length(Mel_2_Peaks)
## [1] 13512
16 / 54

Finding unique peaks

We can extract peak calls unique to replicate 1 or 2 using the %over% operator.

Mel_1_Unique <- Mel_1_Peaks[!Mel_1_Peaks %over% Mel_2_Peaks]
Mel_2_Unique <- Mel_2_Peaks[!Mel_2_Peaks %over% Mel_1_Peaks]
length(Mel_1_Unique)
## [1] 4668
length(Mel_2_Unique)
## [1] 4263
export.bed(Mel_1_Unique, "Mel_1_Unique.bed")
export.bed(Mel_2_Unique, "Mel_2_Unique.bed")
17 / 54

Finding unique peaks

18 / 54

Finding common peaks

Similarly we can extract peak calls common to replicate 1 or 2.

The numbers in common however differ. This is because 2 peak calls in one sample can overlap 1 peak call in the other replicate.

Mel_1_Common <- Mel_1_Peaks[Mel_1_Peaks %over% Mel_2_Peaks]
Mel_2_Common <- Mel_2_Peaks[Mel_2_Peaks %over% Mel_1_Peaks]
length(Mel_1_Common)
## [1] 9109
length(Mel_2_Common)
## [1] 9249
export.bed(Mel_1_Common, "Mel_1_Common.bed")
export.bed(Mel_2_Common, "Mel_2_Common.bed")
19 / 54

Finding common peaks

20 / 54

Finding common peaks

Despite overlapping these peaks are not identical. So how do we determine a common consensus peak several samples.

21 / 54

Define a consensus, redundant set

To address this problem, a common operation in ChIPseq is to define a nonredundant set of peaks across all samples.

To do this we first pool all our peaks across all replicates, here Mel and Ch12, into one set of redundant, overlapping peaks.

allPeaksSet_Overlapping <- unlist(macsPeaks_GRL)
allPeaksSet_Overlapping
## GRanges object with 69619 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## ch12_1 chr1 4490620-4490773 *
## ch12_1 chr1 4671581-4671836 *
## ch12_1 chr1 4785359-4785661 *
## ch12_1 chr1 5283235-5283439 *
## ch12_1 chr1 6262317-6262572 *
## ... ... ... ...
## Mel_2 chrX 169107487-169107684 *
## Mel_2 chrX 169299643-169299795 *
## Mel_2 chrY 897622-897763 *
## Mel_2 chrY 1245573-1245845 *
## Mel_2 chrY 1286522-1286823 *
## -------
## seqinfo: 21 sequences from an unspecified genome; no seqlengths
22 / 54

Define a consensus, nonredundant set

We can then use the reduce() function to collapse our peaks into nonredundant, distinct peaks representing peaks present in any sample.

allPeaksSet_nR <- reduce(allPeaksSet_Overlapping)
allPeaksSet_nR
## GRanges object with 38937 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 4416806-4417001 *
## [2] chr1 4490620-4490773 *
## [3] chr1 4671488-4671836 *
## [4] chr1 4785341-4785781 *
## [5] chr1 4857628-4857783 *
## ... ... ... ...
## [38933] chrY 90741011-90741723 *
## [38934] chrY 90741934-90743197 *
## [38935] chrY 90744329-90744494 *
## [38936] chrY 90825195-90825796 *
## [38937] chrY 90828711-90828830 *
## -------
## seqinfo: 21 sequences from an unspecified genome; no seqlengths
export.bed(allPeaksSet_nR, "allPeaksSet_nR.bed")
23 / 54

Define a consensus, nonredundant set

24 / 54

Defining a common set of peaks

With our newly defined nonredundant peak set we can now identify from this set which peaks were present in both our replicates using the %over% operator and a logical expression.

commonPeaks <- allPeaksSet_nR[allPeaksSet_nR %over% Mel_1_Peaks & allPeaksSet_nR %over%
Mel_2_Peaks]
commonPeaks
## GRanges object with 8888 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 4785341-4785781 *
## [2] chr1 7397557-7398093 *
## [3] chr1 9545120-9545648 *
## [4] chr1 9943560-9943856 *
## [5] chr1 9943893-9944649 *
## ... ... ... ...
## [8884] chrX 166535216-166535529 *
## [8885] chrX 168673273-168673855 *
## [8886] chrX 169047634-169047962 *
## [8887] chrY 1245573-1245871 *
## [8888] chrY 1286522-1286823 *
## -------
## seqinfo: 21 sequences from an unspecified genome; no seqlengths
export.bed(commonPeaks, "commonPeaks.bed")
25 / 54

Defining a common set of peaks

26 / 54

Defining a unique set of peaks

Similarly we can identify which peaks are present only in one replicate.

mel1_Only <- allPeaksSet_nR[allPeaksSet_nR %over% Mel_1_Peaks & !allPeaksSet_nR %over%
Mel_2_Peaks]
mel2_Only <- allPeaksSet_nR[!allPeaksSet_nR %over% Mel_1_Peaks & allPeaksSet_nR %over%
Mel_2_Peaks]
length(mel1_Only)
## [1] 4445
length(mel2_Only)
## [1] 4185
export.bed(mel1_Only, "mel1_Only.bed")
export.bed(mel2_Only, "mel2_Only.bed")
27 / 54

Defining a unique set of peaks

28 / 54

Complex overlaps

When working with larger numbers of peaks we will often define a logical matrix describing in which samples our nonredundant peaks were present.

First then we use a loop to generate a logical vector for the occurence of nonredundant peaks in each sample.

overlap <- list()
for (i in 1:length(macsPeaks_GRL)) {
overlap[[i]] <- allPeaksSet_nR %over% macsPeaks_GRL[[i]]
}
overlap[[1]][1:2]
## [1] FALSE TRUE
29 / 54

Complex overlaps

We can now use to do.call and cbind function to column bind our list of overlaps into our matrix of peak occurrence.

overlapMatrix <- do.call(cbind, overlap)
colnames(overlapMatrix) <- names(macsPeaks_GRL)
overlapMatrix[1:2, ]
## ch12_1 ch12_2 Mel_1 Mel_2
## [1,] FALSE TRUE FALSE FALSE
## [2,] TRUE FALSE FALSE FALSE
30 / 54

Complex overlaps

We can add the matrix back into the metadata columns of our GRanges() of nonredundant peaks using the mcols() accessor.

Now we have our nonredundant peaks and the occurence of these peaks in every sample we can easily identify peaks unique or common to replicates and conditions/cell lines.

mcols(allPeaksSet_nR) <- overlapMatrix
allPeaksSet_nR[1:2, ]
## GRanges object with 2 ranges and 4 metadata columns:
## seqnames ranges strand | ch12_1 ch12_2 Mel_1 Mel_2
## <Rle> <IRanges> <Rle> | <logical> <logical> <logical> <logical>
## [1] chr1 4416806-4417001 * | FALSE TRUE FALSE FALSE
## [2] chr1 4490620-4490773 * | TRUE FALSE FALSE FALSE
## -------
## seqinfo: 21 sequences from an unspecified genome; no seqlengths
31 / 54

Complex overlaps

The limma package is commonly used in the analysis of RNAseq and microarray data and contains many useful helpful functions.

One very useful function is the vennDiagram function which allows us to plot overlaps from a logical matrix, just like the one we created.

library(limma)
vennDiagram(mcols(allPeaksSet_nR))

32 / 54

Complex overlaps

The limma package's vennCounts function allows us to retrieve the counts displayed in the Venn diagram as a data.frame.

vennCounts(mcols(allPeaksSet_nR))
## ch12_1 ch12_2 Mel_1 Mel_2 Counts
## 1 0 0 0 0 0
## 2 0 0 0 1 3361
## 3 0 0 1 0 2435
## 4 0 0 1 1 3876
## 5 0 1 0 0 13042
## 6 0 1 0 1 378
## 7 0 1 1 0 1275
## 8 0 1 1 1 1195
## 9 1 0 0 0 1446
## 10 1 0 0 1 46
## 11 1 0 1 0 24
## 12 1 0 1 1 91
## 13 1 1 0 0 6931
## 14 1 1 0 1 400
## 15 1 1 1 0 711
## 16 1 1 1 1 3726
## attr(,"class")
## [1] "VennCounts"
33 / 54

High confidence peaks

With our nonredundant set of peaks and our matrix of peak occurrence, we can define replicated peaks within conditions.

Here we define the peaks which occur in both the Ch12 replicates.

Since logical matrix is equivalent to a 1 or 0 matrix (1 = TRUE and 0 = FALSE), we can use the rowSums function to extract peaks in at least 2 of the Ch12 replicates.

ch12_HC_Peaks <- allPeaksSet_nR[rowSums(as.data.frame(mcols(allPeaksSet_nR)[, c("ch12_1",
"ch12_2")])) >= 2]
export.bed(ch12_HC_Peaks, "ch12_HC_Peaks.bed")
ch12_HC_Peaks[1:2, ]
## GRanges object with 2 ranges and 4 metadata columns:
## seqnames ranges strand | ch12_1 ch12_2 Mel_1 Mel_2
## <Rle> <IRanges> <Rle> | <logical> <logical> <logical> <logical>
## [1] chr1 4671488-4671836 * | TRUE TRUE FALSE FALSE
## [2] chr1 4785341-4785781 * | TRUE TRUE TRUE TRUE
## -------
## seqinfo: 21 sequences from an unspecified genome; no seqlengths
34 / 54

High confidence peaks

35 / 54

High confidence unique peaks

Similarly we can define peaks which are replicated in Ch12 but absent in Mel samples.

ch12_HC_UniquePeaks <- allPeaksSet_nR[rowSums(as.data.frame(mcols(allPeaksSet_nR)[,
c("ch12_1", "ch12_2")])) >= 2 & rowSums(as.data.frame(mcols(allPeaksSet_nR)[,
c("Mel_1", "Mel_2")])) == 0]
export.bed(ch12_HC_UniquePeaks, "ch12_HC_UniquePeaks.bed")
ch12_HC_UniquePeaks[1, ]
## GRanges object with 1 range and 4 metadata columns:
## seqnames ranges strand | ch12_1 ch12_2 Mel_1 Mel_2
## <Rle> <IRanges> <Rle> | <logical> <logical> <logical> <logical>
## [1] chr1 4671488-4671836 * | TRUE TRUE FALSE FALSE
## -------
## seqinfo: 21 sequences from an unspecified genome; no seqlengths
36 / 54

High confidence unique peaks

37 / 54

Differential Peaks


38 / 54

Finding differential regions

Identifying peaks specific to cell lines or conditions however does not capture the full range of changes in epigenetic events.

To identify differences in epigenetic events we can attempt to quantify the changes in fragment abundance from IP samples across our nonredundant set of peaks.

39 / 54

Finding differential regions

We first must establish a set of regions within which to quantify IP-ed fragments.

An established technique is to produce a set of nonredundant peaks which occur in the majority of at least one experimental condition under evaluation.

Here we identify peaks which occurred in both replicates in either Mel or Ch12 cell lines.

HC_Peaks <- allPeaksSet_nR[rowSums(as.data.frame(mcols(allPeaksSet_nR)[, c("ch12_1",
"ch12_2")])) >= 2 | rowSums(as.data.frame(mcols(allPeaksSet_nR)[, c("Mel_1",
"Mel_2")])) >= 2]
HC_Peaks
## GRanges object with 16930 ranges and 4 metadata columns:
## seqnames ranges strand | ch12_1 ch12_2 Mel_1
## <Rle> <IRanges> <Rle> | <logical> <logical> <logical>
## [1] chr1 4671488-4671836 * | TRUE TRUE FALSE
## [2] chr1 4785341-4785781 * | TRUE TRUE TRUE
## [3] chr1 5283217-5283439 * | TRUE TRUE FALSE
## [4] chr1 6262317-6262572 * | TRUE TRUE FALSE
## [5] chr1 6406474-6406849 * | TRUE TRUE FALSE
## ... ... ... ... . ... ... ...
## [16926] chrY 90739739-90740228 * | TRUE TRUE FALSE
## [16927] chrY 90740354-90740695 * | TRUE TRUE FALSE
## [16928] chrY 90741011-90741723 * | TRUE TRUE FALSE
## [16929] chrY 90741934-90743197 * | TRUE TRUE FALSE
## [16930] chrY 90825195-90825796 * | TRUE TRUE FALSE
## Mel_2
## <logical>
## [1] FALSE
## [2] TRUE
## [3] FALSE
## [4] FALSE
## [5] FALSE
## ... ...
## [16926] FALSE
## [16927] FALSE
## [16928] FALSE
## [16929] FALSE
## [16930] FALSE
## -------
## seqinfo: 21 sequences from an unspecified genome; no seqlengths
export.bed(HC_Peaks, "HC_Peaks.bed")
40 / 54

Finding differential regions

41 / 54

Counting regions

We will count from our aligned BAM files to quantify IP fragments.

As we have seen previously we can use the BamFileList() function to specify which BAMs to count and importantly, to control memory we specify the number of reads to be held in memory at one time using the yield() parameter.

library(Rsamtools)
bams <- c("~/Projects/Results/chipseq/testRun/BAMs/Sorted_Myc_Ch12_1.bam", "~/Projects/Results/chipseq/testRun/BAMs/Sorted_Myc_Ch12_2.bam",
"~/Projects/Results/chipseq/testRun/BAMs/Sorted_Myc_Mel_1.bam", "~/Projects/Results/chipseq/testRun/BAMs/Sorted_Myc_Mel_2.bam")
bamFL <- BamFileList(bams, yieldSize = 5e+06)
bamFL
42 / 54

Counting regions

We can count the number of fragments overlapping our peaks using the summarizeOverlaps function. Since ChIPseq is strandless, we set the ignore.strand parameter to TRUE.

The returned object is a familiar RangedSummarizedExperiment containing our GRanges of nonredundant peaks and the counts in these regions for our BAM files.

library(GenomicAlignments)
myMycCounts <- summarizeOverlaps(HC_Peaks, reads = bamFL, ignore.strand = TRUE)
class(myMycCounts)
save(myMycCounts, file = "data/MycCounts.RData")
## [1] "RangedSummarizedExperiment"
## attr(,"package")
## [1] "SummarizedExperiment"
43 / 54

Differential regions using DESeq2

To assess changes in ChIPseq signal across cell lines we will use the DESeq2 package.

The DESeq2 package contains a workflow for assessing local changes in fragment/read abundance between replicated conditions. This workflow includes normalization, variance estimation, outlier removal/replacement as well as significance testing suited to high throughput sequencing data (i.e. integer counts).

To make use of DESeq2 workflow we must first create a data.frame of conditions of interest with rownames set as our BAM file names.

metaDataFrame <- data.frame(CellLine = c("Ch12", "Ch12", "Mel", "Mel"))
rownames(metaDataFrame) <- colnames(myMycCounts)
metaDataFrame
## CellLine
## Sorted_Myc_Ch12_1.bam Ch12
## Sorted_Myc_Ch12_2.bam Ch12
## Sorted_Myc_Mel_1.bam Mel
## Sorted_Myc_Mel_2.bam Mel
44 / 54

Differential regions using DESeq2

We can use the DESeqDataSetFromMatrix() function to create a DESeq2 object.

We must provide our matrix of counts to countData parameter, our metadata data.frame to colData parameter and we include to an optional parameter of rowRanges the nonredundant peak set we can counted on.

Finally we provide the name of the column in our metadata data.frame within which we wish to test to the design parameter.

library(DESeq2)
deseqMyc <- DESeqDataSetFromMatrix(countData = assay(myMycCounts), colData = metaDataFrame,
design = ~CellLine, rowRanges = HC_Peaks)
45 / 54

Differential regions using DESeq2

We can now run the DESeq2 workflow on our DESeq2 object using the DESeq() function.

deseqMyc <- DESeq(deseqMyc)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
46 / 54

Differential regions using DESeq2

Our DESeq2 object is updated to include useful statistics such our normalised values and variance of signal within each nonredundant peak call.

deseqMyc
## class: DESeqDataSet
## dim: 16930 4
## metadata(1): version
## assays(4): counts mu H cooks
## rownames: NULL
## rowData names(26): ch12_1 ch12_2 ... deviance maxCooks
## colnames(4): Sorted_Myc_Ch12_1.bam Sorted_Myc_Ch12_2.bam
## Sorted_Myc_Mel_1.bam Sorted_Myc_Mel_2.bam
## colData names(2): CellLine sizeFactor
47 / 54

Differential regions using DESeq2

We can extract our information of differential regions using the results() function.

We provide to the results() function the DESeq2 object, the comparison of interest to the contrast parameter and the type of output to return to the format parameter.

The comparison to contrast parameter is provided as a vector of length 3 including the metadata column of interest and groups to test.

We can sort the results by pvalue using the order() function to rank by the most significant changes.

MelMinusCh12 <- results(deseqMyc, contrast = c("CellLine", "Mel", "Ch12"), format = "GRanges")
MelMinusCh12 <- MelMinusCh12[order(MelMinusCh12$pvalue), ]
class(MelMinusCh12)
## [1] "GRanges"
## attr(,"package")
## [1] "GenomicRanges"
48 / 54

Differential regions using DESeq2

The GRanges object contains information on the comparison made in DESeq2.

Most useful it contains the the difference in IP signal as log2 fold change in log2FoldChange, the significance of the change in the pvalue column and an adjusted p-value to address multiple correction in padj column.

MelMinusCh12[1, ]
## GRanges object with 1 range and 6 metadata columns:
## seqnames ranges strand | baseMean log2FoldChange lfcSE
## <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric>
## [1] chr4 45953273-45954170 * | 613.942 6.09357 0.297353
## stat pvalue padj
## <numeric> <numeric> <numeric>
## [1] 20.4927 2.50058e-93 4.23348e-89
## -------
## seqinfo: 21 sequences from an unspecified genome; no seqlengths
49 / 54

Differential regions using DESeq2

We can now filter our nonredundant peaks to those with significantly more signal in Mel or Ch12 cell lines by filtering by log2FoldChange and padj (p-value adjusted for multiple correction) less than 0.05.

MelMinusCh12Filt <- MelMinusCh12[!is.na(MelMinusCh12$pvalue) | !is.na(MelMinusCh12$padj)]
UpinMel <- MelMinusCh12[MelMinusCh12$padj < 0.05 & MelMinusCh12$log2FoldChange >
0]
DowninMel <- MelMinusCh12[MelMinusCh12$padj < 0.05 & MelMinusCh12$log2FoldChange <
0]
export.bed(UpinMel, "UpinMel.bed")
export.bed(DowninMel, "DowninMel.bed")
50 / 54

Differential regions using DESeq2

51 / 54

Differential regions using DESeq2

Finally we can make our reviewing of sites in IGV a little easier using the tracktables package.

The tracktables package's makebedtable() function accepts a GRanges object and writes an HTML report contains links to IGV.

An example can be found here

library(tracktables)
myReport <- makebedtable(MelMinusCh12Filt, "MelMinusCh12.html", basedirectory = getwd())
browseURL(myReport)
52 / 54

Time for an exercise!

Exercise on ChIPseq data can be found here

53 / 54

Answers to exercise

Answers can be found here

54 / 54

Data

In todays session we will continue to review the Myc ChIPseq we were working on in our last sessions.

This include Myc ChIPseq for MEL and Ch12 celllines.

Information and files for the Myc ChIPseq in MEL cell line can be found here

Information and files for the Myc ChIPseq in Ch12 cell line can be found here

2 / 54
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow