+ - 0:00:00
Notes for current slide
Notes for next slide

ATACseq In Bioconductor


Rockefeller University, Bioinformatics Resource Centre

http://rockefelleruniversity.github.io/RU_ATACseq/

1 / 67

Data

In this session we will also keep working with the larger dataset from the ENCODE consortium generated by Bing Ren at UCSD. This dataset has multiple samples for multiple tissues (Liver, Kidney and Hindbrain), allowing us to perform a differential analysis on open regions.

More details are here

2 / 67

ATACseq

offset
  • ATACseq - Uses transposases and offers a method to simultaneously extract signal from transcription factors binding sites and nucleosome positions from a single sample.
3 / 67

ATACseq

Unlike with ChIPseq where we know the target of IP, with ATACseq we know that a region is open/accessible but have no knowledge of what transcription factors may be present.

We briefly saw at the end of our last session how we may scan for a specific motif under our peaks using matchPWM.

In the upcoming sessions we will review how we can investigate motifs in ATACseq using motifmatchr.

4 / 67

Motif Databases


5 / 67

Known Motif sources

Bioconductor provides two major sources of motifs as database packages.

These include:

6 / 67

MotifDb

The MotifDB package collects motif information from a wide range of sources and stores them in a DB object for use with other Bioconductor packages.

library(MotifDb)
MotifDb
## MotifDb object of length 10701
## | Created from downloaded public sources: 2013-Aug-30
## | 10701 position frequency matrices from 21 sources:
## | cisbp_1.02: 874
## | FlyFactorSurvey: 614
## | HOCOMOCOv10: 1066
## | HOCOMOCOv11-core-A: 181
## | HOCOMOCOv11-core-B: 84
## | HOCOMOCOv11-core-C: 135
## | HOCOMOCOv11-secondary-A: 46
## | HOCOMOCOv11-secondary-B: 19
## | HOCOMOCOv11-secondary-C: 13
## | HOCOMOCOv11-secondary-D: 290
## | HOMER: 332
## | hPDI: 437
## | JASPAR_2014: 592
## | JASPAR_CORE: 459
## | jaspar2016: 1209
## | jaspar2018: 1564
## | jolma2013: 843
## | ScerTF: 196
## | stamlab: 683
## | SwissRegulon: 684
## | UniPROBE: 380
## | 61 organism/s
## | Hsapiens: 5384
## | Mmusculus: 1411
## | Dmelanogaster: 1287
## | Scerevisiae: 1051
## | Athaliana: 803
## | Celegans: 90
## | other: 675
## Scerevisiae-cisbp_1.02-M0001_1.02
## Scerevisiae-cisbp_1.02-M0002_1.02
## Scerevisiae-cisbp_1.02-M0003_1.02
## Csativa-cisbp_1.02-M0004_1.02
## Athaliana-cisbp_1.02-M0005_1.02
## ...
## Mmusculus-UniPROBE-Zfp740.UP00022
## Mmusculus-UniPROBE-Zic1.UP00102
## Mmusculus-UniPROBE-Zic2.UP00057
## Mmusculus-UniPROBE-Zic3.UP00006
## Mmusculus-UniPROBE-Zscan4.UP00026
7 / 67

MotifDb

MotifDb object is special class of object called a MotifList.

class(MotifDb)
## [1] "MotifList"
## attr(,"package")
## [1] "MotifDb"
8 / 67

MotifDb

Like standard List objects we can use length and names to get some information on our object

length(MotifDb)
## [1] 10701
MotifNames <- names(MotifDb)
MotifNames[1:10]
## [1] "Scerevisiae-cisbp_1.02-M0001_1.02" "Scerevisiae-cisbp_1.02-M0002_1.02"
## [3] "Scerevisiae-cisbp_1.02-M0003_1.02" "Csativa-cisbp_1.02-M0004_1.02"
## [5] "Athaliana-cisbp_1.02-M0005_1.02" "Athaliana-cisbp_1.02-M0006_1.02"
## [7] "Athaliana-cisbp_1.02-M0007_1.02" "Athaliana-cisbp_1.02-M0008_1.02"
## [9] "Athaliana-cisbp_1.02-M0009_1.02" "Athaliana-cisbp_1.02-M0010_1.02"
9 / 67

Accesing MotifDb contents

We can also access information directly from our list using standard list accessors.

Here a [ will subset to a single MotifList. Now we can see the information held in the MotifList a little more clearly.

MotifDb[1]
## MotifDb object of length 1
## | Created from downloaded public sources: 2013-Aug-30
## | 1 position frequency matrices from 1 source:
## | cisbp_1.02: 1
## | 1 organism/s
## | Scerevisiae: 1
## Scerevisiae-cisbp_1.02-M0001_1.02
10 / 67

Accesing MotifDb contents

A [[ will subset to object within the element as with standard lists. Here we extract the position probability matrix.

MotifDb[[1]]
## 1 2 3 4 5 6
## A 0.009615385 0.009615385 0.971153846 0.009615385 0.009615385 0.971153846
## C 0.009615385 0.009615385 0.009615385 0.009615385 0.971153846 0.009615385
## G 0.009615385 0.009615385 0.009615385 0.009615385 0.009615385 0.009615385
## T 0.971153846 0.971153846 0.009615385 0.971153846 0.009615385 0.009615385
## 7 8
## A 0.009615385 0.009615385
## C 0.971153846 0.009615385
## G 0.009615385 0.009615385
## T 0.009615385 0.971153846
colSums(MotifDb[[1]])
## 1 2 3 4 5 6 7 8
## 1 1 1 1 1 1 1 1
11 / 67

Accesing MotifDb contents

We can extract a DataFrame of all the motif metadata information using the values() function.

values(MotifDb)[1:2, ]
## DataFrame with 2 rows and 15 columns
## providerName providerId dataSource
## <character> <character> <character>
## Scerevisiae-cisbp_1.02-M0001_1.02 M0001_1.02 MA0000004_1.02 cisbp_1.02
## Scerevisiae-cisbp_1.02-M0002_1.02 M0002_1.02 MA0000135_1.02 cisbp_1.02
## geneSymbol geneId geneIdType
## <character> <character> <character>
## Scerevisiae-cisbp_1.02-M0001_1.02 ABF1 NA NA
## Scerevisiae-cisbp_1.02-M0002_1.02 AFT1 NA NA
## proteinId proteinIdType organism
## <character> <character> <character>
## Scerevisiae-cisbp_1.02-M0001_1.02 YKL112W SGD Scerevisiae
## Scerevisiae-cisbp_1.02-M0002_1.02 YGL071W SGD Scerevisiae
## sequenceCount bindingSequence bindingDomain
## <character> <character> <character>
## Scerevisiae-cisbp_1.02-M0001_1.02 NA NA NA
## Scerevisiae-cisbp_1.02-M0002_1.02 NA NA NA
## tfFamily experimentType pubmedID
## <character> <character> <character>
## Scerevisiae-cisbp_1.02-M0001_1.02 ABF1 NA 19111667
## Scerevisiae-cisbp_1.02-M0002_1.02 AFT NA 19158363
12 / 67

Accesing MotifDb contents

We can use the query function to subset our MotifList by infomation in the metadata.

CTCFMotifs <- query(MotifDb, "CTCF")
CTCFMotifs
## MotifDb object of length 19
## | Created from downloaded public sources: 2013-Aug-30
## | 19 position frequency matrices from 9 sources:
## | HOCOMOCOv10: 3
## | HOCOMOCOv11-core-A: 2
## | HOMER: 3
## | JASPAR_2014: 2
## | JASPAR_CORE: 1
## | jaspar2016: 2
## | jaspar2018: 3
## | jolma2013: 1
## | SwissRegulon: 2
## | 4 organism/s
## | Hsapiens: 12
## | Dmelanogaster: 3
## | Mmusculus: 1
## | other: 3
## Hsapiens-HOCOMOCOv10-CTCFL_HUMAN.H10MO.A
## Hsapiens-HOCOMOCOv10-CTCF_HUMAN.H10MO.A
## Mmusculus-HOCOMOCOv10-CTCF_MOUSE.H10MO.A
## Hsapiens-HOCOMOCOv11-core-A-CTCFL_HUMAN.H11MO.0.A
## Hsapiens-HOCOMOCOv11-core-A-CTCF_HUMAN.H11MO.0.A
## ...
## Dmelanogaster-jaspar2018-CTCF-MA0531.1
## Hsapiens-jaspar2018-CTCFL-MA1102.1
## Hsapiens-jolma2013-CTCF
## Hsapiens-SwissRegulon-CTCFL.SwissRegulon
## Hsapiens-SwissRegulon-CTCF.SwissRegulon
13 / 67

Accesing MotifDb contents

For more specific queries, multiple words can be used for filtering.

CTCFMotifs <- query(MotifDb, c("CTCF", "hsapiens", "jaspar2018"))
CTCFMotifs
## MotifDb object of length 2
## | Created from downloaded public sources: 2013-Aug-30
## | 2 position frequency matrices from 1 source:
## | jaspar2018: 2
## | 1 organism/s
## | Hsapiens: 2
## Hsapiens-jaspar2018-CTCF-MA0139.1
## Hsapiens-jaspar2018-CTCFL-MA1102.1
14 / 67

JASPAR2020 package

The JASPAR packages are updated more frequently and so may contain motifs not characterized in the MotifDb package.

We can load the JASPAR package as usual and review the object as before.

library(JASPAR2020)
JASPAR2020
## An object of class "JASPAR2020"
## Slot "db":
## [1] "/usr/local/lib/R/host-site-library/JASPAR2020/extdata/JASPAR2020.sqlite"
15 / 67

JASPAR2020 and TFBSTools

To interact with JASPAR package we will make use of the TFBSTools package from the same lab.

Whereas the JASPAR package holds the information on Motifs and Position Probability Matrices (PPMs), TFBSTools has the functionality to manipulate and interact with these tools.

Three useful functions available from TFBStools to interact with the JASPAR databases are the getMatrixSet, getMatrixByID and getMatrixByID.

library(TFBSTools)
`?`(getMatrixByID)
16 / 67

TFBSTools for TFs

The getMatrixByID and getMatrixByName take the JASPAR DB object and a JASPAR ID or transcription factor name respectively.

Here we are using the transcription factor GATA2. The result is a new object class PFMatrix.

GATA2mat <- getMatrixByName(JASPAR2020, "GATA2")
class(GATA2mat)
## [1] "PFMatrix"
## attr(,"package")
## [1] "TFBSTools"
17 / 67

TFBSTools for motifs

JASPAR IDs are unique, so we can use them to select the exact motif we wanted.

GATA2mat <- getMatrixByID(JASPAR2020, "MA0036.2")
18 / 67

TFBSTools for motifs

List accessors will not work here but we can retrieve names using the ID function.

ID(GATA2mat)
## [1] "MA0036.2"
19 / 67

Position Frequency Matrix

To get hold of the Position Frequency Matrix (PFM) we can use the Matrix or as.matrix functions.

myMatrix <- Matrix(GATA2mat)
myMatrixToo <- as.matrix(myMatrix)
myMatrix
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## A 1569 1125 1555 1051 32 9 0 51 4380 0 0 1225 104 1256
## C 567 768 729 525 1057 3015 192 0 0 0 4380 0 1765 1185
## G 754 1709 1220 427 765 914 0 0 0 0 0 0 1967 224
## T 1490 778 876 2377 2526 442 4188 4329 0 4380 0 3155 544 1715
20 / 67

TFBSTools for motif sets

We can also use the getMatrixSet() function to retrieve sets of motifs. We can specify a list of options for motifs we want to retrieve.

To see the available filters use the help for getMatrixSet() function, ?getMatrixSet.

opts <- list()
opts[["collection"]] <- "CORE"
opts[["tax_group"]] <- "vertebrates"
motifList <- getMatrixSet(JASPAR2020, opts)
21 / 67

Visualizing motifs


22 / 67

Visualizing motifs

The seqLogo package offers a simple and intuitive way of visualizing our base frequencies within out motifs using the seqLogo function.

A simple seqLogo shows the relative frequency of a base at each motif position by the relative size of the base compared to other bases at the same position.

23 / 67

Visualizing motifs

To create a seqLogo we must supply a PPM to the seqLogo function. Here we grab a PPM straight from MotidDb for CTCF.

Here we set the ic.scale to FALSE to show the probability across the Y-axis

library(seqLogo)
CTCFMotifs <- query(MotifDb, "CTCF")
seqLogo::seqLogo(CTCFMotifs[[1]], ic.scale = FALSE)

24 / 67

Visualizing motifs

It may be useful to plot the probabilities as information content. Information content here will range from 0 to 2 Bits. Positions with equal probabilities for each base will score 0 and positions with only 1 possible base will score 2.

This allows you to quickly identify the important bases.

seqLogo::seqLogo(CTCFMotifs[[1]])

25 / 67

Visualizing motifs

The TFBSTools package and JASPAR provide us with point frequency matrices which we can't use directly in the seqLogo package

myMatrix
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## A 1569 1125 1555 1051 32 9 0 51 4380 0 0 1225 104 1256
## C 567 768 729 525 1057 3015 192 0 0 0 4380 0 1765 1185
## G 754 1709 1220 427 765 914 0 0 0 0 0 0 1967 224
## T 1490 778 876 2377 2526 442 4188 4329 0 4380 0 3155 544 1715
26 / 67

Visualizing motifs

We can convert our point frequency matrix to a point probabilty matrix by simply dividing columns by their sum.

ppm <- myMatrix/colSums(myMatrix)
ppm
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## A 0.3582192 0.2568493 0.3550228 0.23995434 0.007305936 0.002054795 0.00000000
## C 0.1294521 0.1753425 0.1664384 0.11986301 0.241324201 0.688356164 0.04383562
## G 0.1721461 0.3901826 0.2785388 0.09748858 0.174657534 0.208675799 0.00000000
## T 0.3401826 0.1776256 0.2000000 0.54269406 0.576712329 0.100913242 0.95616438
## [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## A 0.01164384 1 0 0 0.2796804 0.02374429 0.28675799
## C 0.00000000 0 0 1 0.0000000 0.40296804 0.27054795
## G 0.00000000 0 0 0 0.0000000 0.44908676 0.05114155
## T 0.98835616 0 1 0 0.7203196 0.12420091 0.39155251
27 / 67

Visualizing motifs

We can then plot the result matrix using seqLogo.

seqLogo::seqLogo(ppm)

28 / 67

Visualizing motifs

Fortunately, TFBSTools has its own version of seqLogo function we can use with one of its own ICMatrix classes. We simply need to convert our object with the toICM() function.

GATA2_IC <- toICM(GATA2mat)
TFBSTools::seqLogo(GATA2_IC)

29 / 67

Visualizing with ggplot

We can also use a ggplot2 style grammar of graphics with the ggseqlogo package.

library(ggseqlogo)
library(ggplot2)
ggseqlogo(myMatrix) + theme_minimal()
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

30 / 67

Identifying Motifs in ATAC


31 / 67

Known motif identification

Since ATACseq only tells us which regions are open/accessible, we can use motifs within our ATACseq peak regions to identify potential transcription factors which may be acting there.

The databases such as MotifDb and JASPAR2020, provide sets of known motif patterns for us to use our motif identification.

32 / 67

motifmatchr

To identify known motifs we will use the motifmatchr package which is a wrapper for the MOODS c++ library.

This means that motifmatchr offers us a fast methods to identify motifs within our ATACseq data.

We will use motifmatchr/MOODs with a default p-value cut-off.

33 / 67

motifmatchr

First we can retrieve a sensible set of motifs to scan for within our mouse tissue ATACseq data.

Here we retrieve the vertebrate, JASPAR CORE motifs. We additional specify all_versions is FALSE to only include the latest version of a motif.

opts <- list()
opts[["tax_group"]] <- "vertebrates"
opts[["collection"]] <- "CORE"
opts[["all_versions"]] <- FALSE
motifsToScan <- getMatrixSet(JASPAR2020, opts)
34 / 67

motifmatchr

The motifmatchr package main function is matchMotifs().

As with many Bioconductor functions , matchMotifs makes use of other Bioconductor objects such the BSGenome, GRanges and summarizedExperiment objects.

We can review the full functionality by reading the help for matchMotifs, ?matchMotifs.

35 / 67

ENCODE ATAC data

We need some data to review so we can make use of the ENCODE mouse tissue ATAC data.

We can load in the counts in a summarizedExperiment object from the data directory.

load("data/myCounts.RData")
myCounts
## class: RangedSummarizedExperiment
## dim: 36320 6
## metadata(0):
## assays(1): counts
## rownames: NULL
## rowData names(6):
## Sorted_Hindbrain_day_12_1_Small_Paired_peaks.narrowPeak
## Sorted_Hindbrain_day_12_2_Small_Paired_peaks.narrowPeak ...
## Sorted_Liver_day_12_1_Small_Paired_peaks.narrowPeak
## Sorted_Liver_day_12_2_Small_Paired_peaks.narrowPeak
## colnames(6): Sorted_Hindbrain_day_12_1_Small_Paired.bam
## Sorted_Hindbrain_day_12_2_Small_Paired.bam ...
## Sorted_Liver_day_12_1_Small_Paired.bam
## Sorted_Liver_day_12_2_Small_Paired.bam
## colData names(0):
36 / 67

ENCODE ATAC peaks

We can retrieve the ranges of our peaks directly from our SummarizedExperiment object by using the standard accessor rowRanges.

peakRanges <- rowRanges(myCounts)
peakRanges[1, ]
## GRanges object with 1 range and 6 metadata columns:
## seqnames ranges strand |
## <Rle> <IRanges> <Rle> |
## [1] chr1 3433954-3434095 * |
## Sorted_Hindbrain_day_12_1_Small_Paired_peaks.narrowPeak
## <logical>
## [1] FALSE
## Sorted_Hindbrain_day_12_2_Small_Paired_peaks.narrowPeak
## <logical>
## [1] FALSE
## Sorted_Kidney_day_15_1_Small_Paired_peaks.narrowPeak
## <logical>
## [1] TRUE
## Sorted_Kidney_day_15_2_Small_Paired_peaks.narrowPeak
## <logical>
## [1] TRUE
## Sorted_Liver_day_12_1_Small_Paired_peaks.narrowPeak
## <logical>
## [1] FALSE
## Sorted_Liver_day_12_2_Small_Paired_peaks.narrowPeak
## <logical>
## [1] FALSE
## -------
## seqinfo: 22 sequences from an unspecified genome; no seqlengths
37 / 67

ENCODE ATAC peaks

We can use the getSeq() function to extract sequences from a BSGenome object using a GRanges object to define the regions of interest. This is a similar approach to de novo motif discovery for ChIPseq.

We load the BSgenome.Mmusculus.UCSC.mm10 library and re-center our peaks to 100bp using the resize function. Once resized we can extract the required sequences using the getSeq() function.

library(BSgenome.Mmusculus.UCSC.mm10)
peakRangesCentered <- resize(peakRanges, fix = "center", width = 100)
peakSeqs <- getSeq(BSgenome.Mmusculus.UCSC.mm10, peakRangesCentered)
names(peakSeqs) <- as.character(peakRangesCentered)
peakSeqs
## DNAStringSet object of length 36320:
## width seq names
## [1] 100 GGTTGTGCACTGTAAGCTGAC...CACTTCTGCTTCTGTCTGGCT chr1:3433975-3434074
## [2] 100 AGTGCCTAGTCTTACTGCAAC...AGAAGAGGAGTAGGTGGGAGT chr1:3575937-3576036
## [3] 100 CGCCGCCGCCGCCAGCAGCAG...GCGCGCTCCCTCCTCCCGCAG chr1:3671660-3671759
## [4] 100 CTTCCCCCACCGCTGACACAC...TCTTCCCTTCACCCACCCTGA chr1:3994882-3994981
## [5] 100 GAGGCAGCACCCTTTGTGGGC...GGCCTAAGCCACAGCAGCAGC chr1:4560256-4560355
## ... ... ...
## [36316] 100 GCCATGTGCTCGCAGCTCCGC...CACGGCCCCGCCGGCTCCCCT chrY:90784203-907...
## [36317] 100 GGAGTTGTAGTCCTGCCGTAG...TGGGAGTTGCAGCGCCATGTT chrY:90784972-907...
## [36318] 100 CTCTGGAGACCAGGCTGACCT...ACCACGCCCGGATTTTTACCA chrY:90801528-908...
## [36319] 100 CTTCTCTGCCTCATGGGGACG...CGTCCCGGGCAGGCGGCGTCT chrY:90807427-908...
## [36320] 100 GGGGGGTTAGGTTAGGGTTAG...GGGGGTTAGGTTAGGGTTAGG chrY:90813049-908...
38 / 67

Finding motif positions

As we saw in its help, The matchMotifs function can provide output of motif matches as matches, scores or positions.

Here we scan for 4 selected motifs from JASPAR2020 using the sequence under the first 100 ATACseq peaks and specify the out as positions.

motif_positions <- matchMotifs(motifsToScan[1:4], peakSeqs[1:100], out = "positions")
class(motif_positions)
## [1] "list"
length(motif_positions)
## [1] 4
39 / 67

Finding motif positions

The result contains a list of the same length as the number of motifs tested.

Each element contains a IRangeslist with an entry for every sequence tested and a IRanges of motif positions within the peak sequences.

motif_positions$MA0029.1
## IRangesList object of length 100:
## [[1]]
## IRanges object with 0 ranges and 2 metadata columns:
## start end width | strand score
## <integer> <integer> <integer> | <character> <numeric>
##
## [[2]]
## IRanges object with 0 ranges and 2 metadata columns:
## start end width | strand score
## <integer> <integer> <integer> | <character> <numeric>
##
## [[3]]
## IRanges object with 0 ranges and 2 metadata columns:
## start end width | strand score
## <integer> <integer> <integer> | <character> <numeric>
##
## ...
## <97 more elements>
40 / 67

Finding motif positions

We can unlist our IRangeslist to a standard list for easier working.

MA0029hits <- motif_positions$MA0029.1
names(MA0029hits) <- names(peakSeqs[1:100])
unlist(MA0029hits, use.names = TRUE)
## IRanges object with 1 range and 2 metadata columns:
## start end width | strand score
## <integer> <integer> <integer> | <character> <numeric>
## chr1:13566347-13566446 56 69 14 | + 10.0096
41 / 67

Finding motif hits

We may simply want to map motifs to their ATACseq peaks.

To do this we can set the out parameter to matches. This will return a SummarizedExperiment object.

motifHits <- matchMotifs(motifsToScan, peakSeqs, out = "matches")
class(motifHits)
## [1] "SummarizedExperiment"
## attr(,"package")
## [1] "SummarizedExperiment"
motifHits
## class: SummarizedExperiment
## dim: 36320 746
## metadata(0):
## assays(1): motifMatches
## rownames: NULL
## rowData names(0):
## colnames(746): MA0004.1 MA0006.1 ... MA0528.2 MA0609.2
## colData names(1): name
42 / 67

Finding motif hits

We can retrieve a matrix of matches by motif and peak using the motifMatches function.

mmMatrix <- motifMatches(motifHits)
dim(mmMatrix)
## [1] 36320 746
mmMatrix[1:8, 1:8]
## 8 x 8 sparse Matrix of class "lgCMatrix"
## MA0004.1 MA0006.1 MA0019.1 MA0029.1 MA0030.1 MA0031.1 MA0040.1 MA0041.1
## [1,] . . . . . . . .
## [2,] . . . . . . . .
## [3,] . . . . . . . .
## [4,] . . . . . . . .
## [5,] . . . . . . . .
## [6,] . . . . . . . .
## [7,] . | . . . . . .
## [8,] . . . . . . . .
43 / 67

Finding motif hits

Although a sparse matrix, we can still use our matrix operations to extract useful information from this object.

We can use the colSums() to identify the total occurrence of motifs in our peak sequences.

totalMotifOccurence <- colSums(mmMatrix)
totalMotifOccurence[1:4]
## MA0004.1 MA0006.1 MA0019.1 MA0029.1
## 1137 1303 184 1163
44 / 67

Finding motif hits

We can also identify peaks which contain a hit for a selected motif.

peaksWithMA0912 <- peakRangesCentered[mmMatrix[, "MA0912.2"] == 1]
peaksWithMA0912
## GRanges object with 361 ranges and 6 metadata columns:
## seqnames ranges strand |
## <Rle> <IRanges> <Rle> |
## [1] chr1 5232911-5233010 * |
## [2] chr1 9783650-9783749 * |
## [3] chr1 15644761-15644860 * |
## [4] chr1 31233772-31233871 * |
## [5] chr1 39393610-39393709 * |
## ... ... ... ... .
## [357] chrX 101332594-101332693 * |
## [358] chrX 101429501-101429600 * |
## [359] chrX 109012641-109012740 * |
## [360] chrX 166385018-166385117 * |
## [361] chrX 166672780-166672879 * |
## Sorted_Hindbrain_day_12_1_Small_Paired_peaks.narrowPeak
## <logical>
## [1] TRUE
## [2] FALSE
## [3] FALSE
## [4] TRUE
## [5] FALSE
## ... ...
## [357] TRUE
## [358] TRUE
## [359] FALSE
## [360] FALSE
## [361] FALSE
## Sorted_Hindbrain_day_12_2_Small_Paired_peaks.narrowPeak
## <logical>
## [1] TRUE
## [2] FALSE
## [3] FALSE
## [4] TRUE
## [5] FALSE
## ... ...
## [357] TRUE
## [358] TRUE
## [359] FALSE
## [360] FALSE
## [361] FALSE
## Sorted_Kidney_day_15_1_Small_Paired_peaks.narrowPeak
## <logical>
## [1] FALSE
## [2] FALSE
## [3] TRUE
## [4] FALSE
## [5] FALSE
## ... ...
## [357] TRUE
## [358] TRUE
## [359] FALSE
## [360] TRUE
## [361] FALSE
## Sorted_Kidney_day_15_2_Small_Paired_peaks.narrowPeak
## <logical>
## [1] FALSE
## [2] FALSE
## [3] TRUE
## [4] FALSE
## [5] FALSE
## ... ...
## [357] TRUE
## [358] TRUE
## [359] FALSE
## [360] TRUE
## [361] FALSE
## Sorted_Liver_day_12_1_Small_Paired_peaks.narrowPeak
## <logical>
## [1] FALSE
## [2] TRUE
## [3] FALSE
## [4] FALSE
## [5] TRUE
## ... ...
## [357] TRUE
## [358] TRUE
## [359] TRUE
## [360] FALSE
## [361] TRUE
## Sorted_Liver_day_12_2_Small_Paired_peaks.narrowPeak
## <logical>
## [1] FALSE
## [2] TRUE
## [3] FALSE
## [4] FALSE
## [5] TRUE
## ... ...
## [357] TRUE
## [358] TRUE
## [359] TRUE
## [360] FALSE
## [361] TRUE
## -------
## seqinfo: 22 sequences from an unspecified genome; no seqlengths
45 / 67

summarizing ATAC signal to Motifs


46 / 67

summarizing ATAC signal to Motifs

The chromVar package allows for the summarization of ATACseq signal changes to the motifs within peaks.

With this summarization we can potentially identify which motifs may have an important role in a set of ATAC samples compared to other samples.

The chromVar package comes from the same lab and author as the motifmatchr package and so will work well together.

First we load the library.

library(chromVAR)
47 / 67

Setting up for chromVar

To identify motifs for chomVar we will use motifMatchr with a different set of inputs.

Here we will provide our RangedSummarizedExperiment object contaitng our counts over peaks directly to the matchMotifs function.

First we will remove any peaks with less than 5 reads across all samples.

myCounts <- myCounts[rowSums(assay(myCounts)) > 5, ]
48 / 67

Correct GC bias

Next we can correct for any potential GC bias which may have arisen in sequencing. Previously we have seen that we have GC bias observable in our FastQ quality checks.

For comparing across differing sets of peaks with differing sequences compositions we will need to correct for this bias. We can use the addGCBias and specify our genome as a BSgenome object to correct in chromVar.

myCounts <- addGCBias(myCounts, genome = BSgenome.Mmusculus.UCSC.mm10)
49 / 67

ID motifs in peaks

Having corrected for bias, we can use the matchMotifs function again to identify motifs under our ATACseq peaks.

Here we supply our RangedSummarizedExperiment of counts in peaks and the genome of interest to the matchMotifs function and use the default out of matches.

motif_ix <- matchMotifs(motifsToScan, myCounts, genome = BSgenome.Mmusculus.UCSC.mm10)
motif_ix
## class: RangedSummarizedExperiment
## dim: 36320 746
## metadata(0):
## assays(1): motifMatches
## rownames: NULL
## rowData names(7):
## Sorted_Hindbrain_day_12_1_Small_Paired_peaks.narrowPeak
## Sorted_Hindbrain_day_12_2_Small_Paired_peaks.narrowPeak ...
## Sorted_Liver_day_12_2_Small_Paired_peaks.narrowPeak bias
## colnames(746): MA0004.1 MA0006.1 ... MA0528.2 MA0609.2
## colData names(1): name
50 / 67

Run chromVar

Following the identification of motifs in our peaks, we can perform the summarization of ATACseq signal to motifs using the computeDeviations and the computeVariability functions.

deviations <- computeDeviations(object = myCounts, annotations = motif_ix)
variability_Known <- computeVariability(deviations)
51 / 67

chromVar deviations

The deviations result contains a SummarizedExperiment object with the Z-scores showing the enrichment for ATACseq signal in each sample for every motif.

devZscores <- deviationScores(deviations)
devZscores[1:2, ]
## Sorted_Hindbrain_day_12_1_Small_Paired.bam
## MA0004.1 3.679156
## MA0006.1 8.046464
## Sorted_Hindbrain_day_12_2_Small_Paired.bam
## MA0004.1 5.431137
## MA0006.1 7.633895
## Sorted_Kidney_day_15_1_Small_Paired.bam
## MA0004.1 4.114432
## MA0006.1 6.570759
## Sorted_Kidney_day_15_2_Small_Paired.bam
## MA0004.1 4.564610
## MA0006.1 6.004668
## Sorted_Liver_day_12_1_Small_Paired.bam
## MA0004.1 -4.276874
## MA0006.1 -8.348624
## Sorted_Liver_day_12_2_Small_Paired.bam
## MA0004.1 -6.653514
## MA0006.1 -6.824266
52 / 67

chromVar variability

The variability result contains the ranking of motifs by their variability across samples. Highly variable motifs may indicate motifs associated with a particular sample group or variable across all groups.

variability_Known <- variability_Known[order(variability_Known$p_value), ]
variability_Known[1:10, ]
## name variability bootstrap_lower_bound bootstrap_upper_bound
## MA0140.2 GATA1::TAL1 31.47098 3.1948304 34.67008
## MA0600.2 RFX2 17.77072 7.3944769 21.22610
## MA0036.3 GATA2 32.01213 0.3921849 34.17913
## MA0037.3 GATA3 25.31902 0.8039623 27.44567
## MA0766.2 GATA5 25.01678 0.4178051 26.75562
## MA0035.4 GATA1 26.04815 0.8122387 27.98797
## MA0482.2 GATA4 27.45124 0.8642210 29.45822
## MA1104.2 GATA6 29.73974 0.5806377 31.84163
## MA0509.2 RFX1 17.52837 8.5454333 21.34927
## MA0867.2 SOX4 16.47110 5.8375559 19.59171
## p_value p_value_adj
## MA0140.2 0.000000e+00 0.000000e+00
## MA0600.2 0.000000e+00 0.000000e+00
## MA0036.3 0.000000e+00 0.000000e+00
## MA0037.3 0.000000e+00 0.000000e+00
## MA0766.2 0.000000e+00 0.000000e+00
## MA0035.4 0.000000e+00 0.000000e+00
## MA0482.2 0.000000e+00 0.000000e+00
## MA1104.2 0.000000e+00 0.000000e+00
## MA0509.2 0.000000e+00 0.000000e+00
## MA0867.2 3.690799e-291 2.749645e-289
53 / 67

chromVar results

We can use the results from the variability with our Z-score deviations to identify in which sample our motifs are enriched.

topVariable <- variability_Known[1:20, ]
devTop <- merge(topVariable[, 1, drop = FALSE], devZscores, by = 0)
devTop[1:2, ]
## Row.names name Sorted_Hindbrain_day_12_1_Small_Paired.bam
## 1 MA0035.4 GATA1 -24.29333
## 2 MA0036.3 GATA2 -29.55701
## Sorted_Hindbrain_day_12_2_Small_Paired.bam
## 1 -22.56358
## 2 -30.63070
## Sorted_Kidney_day_15_1_Small_Paired.bam
## 1 -25.01967
## 2 -30.24737
## Sorted_Kidney_day_15_2_Small_Paired.bam
## 1 -24.19630
## 2 -29.73203
## Sorted_Liver_day_12_1_Small_Paired.bam Sorted_Liver_day_12_2_Small_Paired.bam
## 1 25.81764 26.97208
## 2 31.46003 32.42450
54 / 67

chromVar results

A useful way to visualize these results is using a heatmap. Although we cover this in a later session, here we can use the pheatmap library with default settings to illustrate where our most variable motifs are active.

devToPlot <- as.matrix(devTop[, -c(1:2)])
rownames(devToPlot) <- devTop[, 2]
library(pheatmap)
pheatmap(devToPlot)

55 / 67

Identifying Motifs de novo in ATAC


56 / 67

De novo motif discovery

So far we have reviewed how to identify known motifs within our ATACseq peaks.

In our ChIPseq training we cover howt we can use the Meme-ChIP software to perform De novo motif identification.

Here we wll use the Meme-ChIP software in a discriminative mode to identify De novo motifs enriched in one set of samples over another.

57 / 67

Differential peaks

First we will need to identify a set of peaks enriched for ATACseq signal in one group over another. To do this we will use DESeq2 to identify ATACseq peaks enriched in 1 set over another.

We can load our Liver, Kidney and Brain counts from earlier into a DEseq object for differential analysis.

require(DESeq2)
load("data/myCounts.RData")
Group <- factor(c("HindBrain", "HindBrain", "Kidney", "Kidney", "Liver", "Liver"))
colData(myCounts) <- DataFrame(data.frame(Group, row.names = colnames(myCounts)))
dds <- DESeqDataSet(myCounts, design = ~Group)
dds <- DESeq(dds)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## -- note: fitType='parametric', but the dispersion trend was not well captured by the
## function: y = a/x + b, and a local regression fit was automatically substituted.
## specify fitType='local' or 'mean' to avoid this message next time.
## final dispersion estimates
## fitting model and testing
58 / 67

Differential peaks

We can then use the DESeq2 function results to identify differential ATAC regions and set the format parameter to GRanges to allow the return of the result as a GRanges object.

myRes <- results(dds, contrast = c("Group", "Liver", "Kidney"), format = "GRanges")
myRes <- myRes[order(myRes$padj), ]
upRegions <- myRes[myRes$log2FoldChange > 0][1:1000]
downRegions <- myRes[myRes$log2FoldChange < 0, ][1:1000]
upRegions
## GRanges object with 1000 ranges and 6 metadata columns:
## seqnames ranges strand | baseMean log2FoldChange
## <Rle> <IRanges> <Rle> | <numeric> <numeric>
## [1] chr7 100467072-100467556 * | 101.6408 3.99023
## [2] chr18 32542348-32542812 * | 96.8956 4.11485
## [3] chr10 30763197-30763615 * | 94.4364 3.78759
## [4] chr15 103258321-103258745 * | 99.6945 3.59581
## [5] chr8 123978898-123979311 * | 84.3669 4.13890
## ... ... ... ... . ... ...
## [996] chr5 148897184-148897609 * | 24.8790 3.10487
## [997] chr5 147888548-147888858 * | 21.1643 3.45870
## [998] chr7 131425006-131425298 * | 25.7261 3.01058
## [999] chr3 14886438-14886963 * | 63.2840 1.59576
## [1000] chr8 122719754-122720057 * | 26.7802 2.94430
## lfcSE stat pvalue padj
## <numeric> <numeric> <numeric> <numeric>
## [1] 0.290132 13.7532 4.87419e-43 1.77031e-38
## [2] 0.304919 13.4949 1.67615e-41 3.04389e-37
## [3] 0.281546 13.4528 2.96263e-41 3.58676e-37
## [4] 0.281400 12.7783 2.16807e-37 1.96861e-33
## [5] 0.330165 12.5359 4.75193e-36 3.45180e-32
## ... ... ... ... ...
## [996] 0.487194 6.37297 1.85407e-10 5.79017e-09
## [997] 0.543067 6.36883 1.90469e-10 5.94317e-09
## [998] 0.472801 6.36754 1.92089e-10 5.98855e-09
## [999] 0.250669 6.36602 1.93994e-10 6.04275e-09
## [1000] 0.462528 6.36567 1.94435e-10 6.05130e-09
## -------
## seqinfo: 22 sequences from an unspecified genome; no seqlengths
59 / 67

Resize peaks

For Meme-ChIP we will want to resize the regions to 100bp. By default Meme-ChIP will perform this trimming for us but to ensure it is in the centre we will do this upfront.

upRegions <- resize(upRegions, fix = "center", width = 100)
downRegions <- resize(downRegions, fix = "center", width = 100)
60 / 67

Peak sequences

We can now use the getSeq function as previously to extract the signal from around within the GRange regions and write to a FASTA file for use in Meme-ChIP using the writeXStringSet function.

library(BSgenome.Mmusculus.UCSC.mm10)
upStrings <- getSeq(BSgenome.Mmusculus.UCSC.mm10, upRegions)
downStrings <- getSeq(BSgenome.Mmusculus.UCSC.mm10, downRegions)
names(upStrings) <- as.character(upRegions)
names(downStrings) <- as.character(downRegions)
writeXStringSet(upStrings, file = "UpRegions.fa")
writeXStringSet(downStrings, file = "DownStrings.fa")
61 / 67

Meme-ChIP

We can then submit the sample to the Meme-ChIP online submission form found here.

In contrast to for ChIPseq, here we will run this in Differential mode to compare our sequences in Liver enriched regions to those depleted in Liver.

The results can be found in data/memeResult_LiverVsBrain

62 / 67

Using De novo motifs

We may want to use the motifs discovered by Meme-ChIP in other Bioconductor software.

To do this we can use the universalmotif package.

library(universalmotif)
63 / 67

Importing motifs

The universalmotif package provides lots of useful functions for importing motif sets.

One useful and relevant function is the read_meme() function.

memeMotifs <- read_meme("data/memeResult_LiverVsBrain/combined.meme")
memeMotifs
## [[1]]
##
## Motif name: 1
## Alternate name: HGATA-DREME-1
## Alphabet: DNA
## Type: PPM
## Strands: +-
## Total IC: 8.78
## Pseudocount: 1
## Consensus: WGATA
## Target sites: 1236
## E-value: 7.6e-275
## Extra info: [eval.string] 7.6e-275
##
## W G A T A
## A 0.58 0 1 0 1
## C 0.04 0 0 0 0
## G 0.00 1 0 0 0
## T 0.38 0 0 1 0
##
## [[2]]
##
## Motif name: 2
## Alternate name: WGATAAG-MEME-1
## Alphabet: DNA
## Type: PPM
## Strands: +-
## Total IC: 10.2
## Pseudocount: 1
## Consensus: WGATAAG
## Target sites: 786
## E-value: 1.4e-96
## Extra info: [eval.string] 1.4e-096
##
## W G A T A A G
## A 0.64 0 1 0 0.96 0.82 0.15
## C 0.03 0 0 0 0.00 0.00 0.17
## G 0.01 1 0 0 0.00 0.12 0.63
## T 0.32 0 0 1 0.04 0.05 0.05
##
## [[3]]
##
## Motif name: 3
## Alternate name: AYACH-DREME-2
## Alphabet: DNA
## Type: PPM
## Strands: +-
## Total IC: 7.73
## Pseudocount: 1
## Consensus: ACACH
## Target sites: 1084
## E-value: 1.5e-34
## Extra info: [eval.string] 1.5e-034
##
## A C A C H
## A 1 0.00 1 0 0.38
## C 0 0.83 0 1 0.34
## G 0 0.00 0 0 0.00
## T 0 0.17 0 0 0.28
##
## [[4]]
##
## Motif name: 4
## Alternate name: ACAK-DREME-3
## Alphabet: DNA
## Type: PPM
## Strands: +-
## Total IC: 7.19
## Pseudocount: 1
## Consensus: ACAG
## Target sites: 1719
## E-value: 3.7e-17
## Extra info: [eval.string] 3.7e-017
##
## A C A G
## A 1 0 1 0.00
## C 0 1 0 0.00
## G 0 0 0 0.76
## T 0 0 0 0.24
##
## [[5]]
##
## Motif name: 5
## Alternate name: CTHATC-DREME-4
## Alphabet: DNA
## Type: PPM
## Strands: +-
## Total IC: 10.13
## Pseudocount: 1
## Consensus: CTHATC
## Target sites: 138
## E-value: 1.1e-10
## Extra info: [eval.string] 1.1e-010
##
## C T H A T C
## A 0 0 0.39 1 0 0
## C 1 0 0.32 0 0 1
## G 0 0 0.00 0 0 0
## T 0 1 0.29 0 1 0
##
## [[6]]
##
## Motif name: 6
## Alternate name: YTATCW-MEME-2
## Alphabet: DNA
## Type: PPM
## Strands: +-
## Total IC: 6.89
## Pseudocount: 1
## Consensus: YTATCW
## Target sites: 761
## E-value: 7.4e-09
## Extra info: [eval.string] 7.4e-009
##
## Y T A T C W
## A 0.04 0 0.59 0 0.05 0.37
## C 0.35 0 0.00 0 0.86 0.05
## G 0.19 0 0.17 0 0.00 0.01
## T 0.42 1 0.23 1 0.09 0.57
##
## [[7]]
##
## Motif name: 7
## Alternate name: ASWA-DREME-5
## Alphabet: DNA
## Type: PPM
## Strands: +-
## Total IC: 6.11
## Pseudocount: 1
## Consensus: ASWA
## Target sites: 1492
## E-value: 6e-07
## Extra info: [eval.string] 6.0e-007
##
## A S W A
## A 1 0.00 0.7 1
## C 0 0.42 0.0 0
## G 0 0.58 0.0 0
## T 0 0.00 0.3 0
##
## [[8]]
##
## Motif name: 8
## Alternate name: ACCCAGW-DREME-6
## Alphabet: DNA
## Type: PPM
## Strands: +-
## Total IC: 12.13
## Pseudocount: 1
## Consensus: ACCCAGW
## Target sites: 51
## E-value: 0.00036
## Extra info: [eval.string] 3.6e-004
##
## A C C C A G W
## A 1 0 0 0 1 0 0.53
## C 0 1 1 1 0 0 0.00
## G 0 0 0 0 0 1 0.00
## T 0 0 0 0 0 0 0.47
##
## [[9]]
##
## Motif name: 9
## Alternate name: CMCACCCW-MEME-3
## Alphabet: DNA
## Type: PPM
## Strands: +-
## Total IC: 12.73
## Pseudocount: 1
## Consensus: CMCACCCW
## Target sites: 164
## E-value: 0.029
## Extra info: [eval.string] 2.9e-002
##
## C M C A C C C W
## A 0.01 0.63 0 0.95 0.00 0 0 0.48
## C 0.96 0.37 1 0.00 0.99 1 1 0.07
## G 0.00 0.00 0 0.05 0.01 0 0 0.01
## T 0.03 0.00 0 0.00 0.00 0 0 0.45
64 / 67

Importing motifs

The universalmotif package also provides functions to convert between motif objects from different package. Here we convert to TFBStools motif objects.

memeMotifsTFBStools <- convert_motifs(memeMotifs, "TFBSTools-PWMatrix")
memeMotifsTFBStools
## [[1]]
## An object of class PWMatrix
## ID: HGATA-DREME-1
## Name: 1
## Matrix Class: Unknown
## strand: *
## Pseudocounts: 1
## Tags:
## $eval.string
## [1] "7.6e-275"
##
## Background:
## A C G T
## 0.2304 0.2696 0.2696 0.2304
## Matrix:
## W G A T A
## A 1.3294375 -10.272630 2.11689 -10.27263 2.11689
## C -2.5931079 -10.272630 -10.27263 -10.27263 -10.27263
## G -10.2726298 1.890255 -10.27263 -10.27263 -10.27263
## T 0.7069592 -10.272630 -10.27263 2.11689 -10.27263
##
## [[2]]
## An object of class PWMatrix
## ID: WGATAAG-MEME-1
## Name: 2
## Matrix Class: Unknown
## strand: *
## Pseudocounts: 1
## Tags:
## $eval.string
## [1] "1.4e-096"
##
## Background:
## A C G T
## 0.2304 0.2696 0.2696 0.2304
## Matrix:
## W G A T A A G
## A 1.4668977 -9.620220 2.116376 -9.620220 2.060250 1.835703 -0.6292295
## C -3.1887377 -9.620220 -9.620220 -9.620220 -9.620220 -6.019998 -0.6709083
## G -4.5166051 1.889768 -9.620220 -9.620220 -9.620220 -1.155168 1.2171203
## T 0.4818744 -9.620220 -9.620220 2.116376 -2.584504 -2.136796 -2.0684582
##
## [[3]]
## An object of class PWMatrix
## ID: AYACH-DREME-2
## Name: 3
## Matrix Class: Unknown
## strand: *
## Pseudocounts: 1
## Tags:
## $eval.string
## [1] "1.5e-034"
##
## Background:
## A C G T
## 0.2304 0.2696 0.2696 0.2304
## Matrix:
## A C A C H
## A 2.116764 -10.0834793 2.116764 -10.083479 0.7390085
## C -10.083479 1.6154172 -10.083479 1.890136 0.3243904
## G -10.083479 -10.0834793 -10.083479 -10.083479 -10.0834793
## T -10.083479 -0.4093361 -10.083479 -10.083479 0.2690316
##
## [[4]]
## An object of class PWMatrix
## ID: ACAK-DREME-3
## Name: 4
## Matrix Class: Unknown
## strand: *
## Pseudocounts: 1
## Tags:
## $eval.string
## [1] "3.7e-017"
##
## Background:
## A C G T
## 0.2304 0.2696 0.2696 0.2304
## Matrix:
## A C A G
## A 2.117142 -10.748193 2.117142 -10.74819285
## C -10.748193 1.890495 -10.748193 -10.74819285
## G -10.748193 -10.748193 -10.748193 1.49966337
## T -10.748193 -10.748193 -10.748193 0.04283434
##
## [[5]]
## An object of class PWMatrix
## ID: CTHATC-DREME-4
## Name: 5
## Matrix Class: Unknown
## strand: *
## Pseudocounts: 1
## Tags:
## $eval.string
## [1] "1.1e-010"
##
## Background:
## A C G T
## 0.2304 0.2696 0.2696 0.2304
## Matrix:
## C T H A T C
## A -7.118941 -7.118941 0.7598762 2.109777 -7.118941 -7.118941
## C 1.883507 -7.118941 0.2404110 -7.118941 -7.118941 1.883507
## G -7.118941 -7.118941 -7.1189411 -7.118941 -7.118941 -7.118941
## T -7.118941 2.109777 0.3290605 -7.118941 2.109777 -7.118941
##
## [[6]]
## An object of class PWMatrix
## ID: YTATCW-MEME-2
## Name: 6
## Matrix Class: Unknown
## strand: *
## Pseudocounts: 1
## Tags:
## $eval.string
## [1] "7.4e-009"
##
## Background:
## A C G T
## 0.2304 0.2696 0.2696 0.2304
## Matrix:
## Y T A T C W
## A -2.6366821 -9.573647 1.365054357 -9.573647 -2.125646 0.6642764
## C 0.3796124 -9.573647 -9.573647187 -9.573647 1.671202 -2.3871986
## G -0.5199509 -9.573647 -0.624335673 -9.573647 -9.573647 -4.3222315
## T 0.8805593 2.116330 0.005459193 2.116330 -1.384818 1.3064571
##
## [[7]]
## An object of class PWMatrix
## ID: ASWA-DREME-5
## Name: 7
## Matrix Class: Unknown
## strand: *
## Pseudocounts: 1
## Tags:
## $eval.string
## [1] "6.0e-007"
##
## Background:
## A C G T
## 0.2304 0.2696 0.2696 0.2304
## Matrix:
## A S W A
## A 2.117044 -10.5439985 1.5978626 2.117044
## C -10.543998 0.6423491 -10.5439985 -10.543998
## G -10.543998 1.1024468 -10.5439985 -10.543998
## T -10.543998 -10.5439985 0.3915094 -10.543998
##
## [[8]]
## An object of class PWMatrix
## ID: ACCCAGW-DREME-6
## Name: 8
## Matrix Class: Unknown
## strand: *
## Pseudocounts: 1
## Tags:
## $eval.string
## [1] "3.6e-004"
##
## Background:
## A C G T
## 0.2304 0.2696 0.2696 0.2304
## Matrix:
## A C C C A G W
## A 2.096276 -5.70044 -5.70044 -5.70044 2.096276 -5.70044 1.184494
## C -5.700440 1.87070 1.87070 1.87070 -5.700440 -5.70044 -5.700440
## G -5.700440 -5.70044 -5.70044 -5.70044 -5.700440 1.87070 -5.700440
## T -5.700440 -5.70044 -5.70044 -5.70044 -5.700440 -5.70044 1.016094
##
## [[9]]
## An object of class PWMatrix
## ID: CMCACCCW-MEME-3
## Name: 9
## Matrix Class: Unknown
## strand: *
## Pseudocounts: 1
## Tags:
## $eval.string
## [1] "2.9e-002"
##
## Background:
## A C G T
## 0.2304 0.2696 0.2696 0.2304
## Matrix:
## C M C A C C C
## A -4.949407 1.4411892 -7.366322 2.029732 -7.366322 -7.366322 -7.366322
## C 1.831026 0.4618849 1.884707 -7.366322 1.867034 1.884707 1.884707
## G -7.366322 -7.3663222 -7.366322 -2.262708 -4.292777 -7.366322 -7.366322
## T -2.861614 -7.3663222 -7.366322 -7.366322 -7.366322 -7.366322 -7.366322
## W
## A 1.0411226
## C -1.9808502
## G -4.2927766
## T 0.9458359
65 / 67

Time for an exercise!

Exercise on ATACseq data can be found here

66 / 67

Answers to exercise

Answers can be found here

67 / 67

Data

In this session we will also keep working with the larger dataset from the ENCODE consortium generated by Bing Ren at UCSD. This dataset has multiple samples for multiple tissues (Liver, Kidney and Hindbrain), allowing us to perform a differential analysis on open regions.

More details are here

2 / 67
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow