ATACseq (part 2)

Data

In this session we will also keep working with the larger dataset from the ENCODE consortium generated by Bing Ren at UCSD. This dataset has multiple samples for multiple tissues (Liver, Kidney and Hindbrain), allowing us to perform a differential analysis on open regions.

More details are here

ATACseq

offset

ATACseq - Uses transposases and offers a method to simultaneously extract signal from transcription factors binding sites and nucleosome positions from a single sample.

ATACseq

Unlike with ChIPseq where we know the target of IP, with ATACseq we know that a region is open/accessible but have no knowledge of what transcription factors may be present.

We briefly saw at the end of our last session how we may scan for a specific motif under our peaks using matchPWM.

In the upcoming sessions we will review how we can investigate motifs in ATACseq using motifmatchr.

Motif Databases

Known Motif sources

Bioconductor provides two major sources of motifs as database packages.

These include:

The MotifDb package.
The JASPAR databases, JASPAR2020 being latest.

MotifDb

The MotifDB package collects motif information from a wide range of sources and stores them in a DB object for use with other Bioconductor packages.

library(MotifDb)
MotifDb

## MotifDb object of length 10701
## | Created from downloaded public sources: 2013-Aug-30
## | 10701 position frequency matrices from 21 sources:
## |         cisbp_1.02:  874
## |    FlyFactorSurvey:  614
## |        HOCOMOCOv10: 1066
## | HOCOMOCOv11-core-A:  181
## | HOCOMOCOv11-core-B:   84
## | HOCOMOCOv11-core-C:  135
## | HOCOMOCOv11-secondary-A:   46
## | HOCOMOCOv11-secondary-B:   19
## | HOCOMOCOv11-secondary-C:   13
## | HOCOMOCOv11-secondary-D:  290
## |              HOMER:  332
## |               hPDI:  437
## |        JASPAR_2014:  592
## |        JASPAR_CORE:  459
## |         jaspar2016: 1209
## |         jaspar2018: 1564
## |          jolma2013:  843
## |             ScerTF:  196
## |            stamlab:  683
## |       SwissRegulon:  684
## |           UniPROBE:  380
## | 61 organism/s
## |           Hsapiens: 5384
## |          Mmusculus: 1411
## |      Dmelanogaster: 1287
## |        Scerevisiae: 1051
## |          Athaliana:  803
## |           Celegans:   90
## |              other:  675
## Scerevisiae-cisbp_1.02-M0001_1.02 
## Scerevisiae-cisbp_1.02-M0002_1.02 
## Scerevisiae-cisbp_1.02-M0003_1.02 
## Csativa-cisbp_1.02-M0004_1.02 
## Athaliana-cisbp_1.02-M0005_1.02 
## ...
## Mmusculus-UniPROBE-Zfp740.UP00022 
## Mmusculus-UniPROBE-Zic1.UP00102 
## Mmusculus-UniPROBE-Zic2.UP00057 
## Mmusculus-UniPROBE-Zic3.UP00006 
## Mmusculus-UniPROBE-Zscan4.UP00026

MotifDb

MotifDb object is special class of object called a MotifList.

class(MotifDb)

## [1] "MotifList"
## attr(,"package")
## [1] "MotifDb"

MotifDb

Like standard List objects we can use length and names to get some information on our object

length(MotifDb)

## [1] 10701

MotifNames <- names(MotifDb)
MotifNames[1:10]

##  [1] "Scerevisiae-cisbp_1.02-M0001_1.02" "Scerevisiae-cisbp_1.02-M0002_1.02"
##  [3] "Scerevisiae-cisbp_1.02-M0003_1.02" "Csativa-cisbp_1.02-M0004_1.02"    
##  [5] "Athaliana-cisbp_1.02-M0005_1.02"   "Athaliana-cisbp_1.02-M0006_1.02"  
##  [7] "Athaliana-cisbp_1.02-M0007_1.02"   "Athaliana-cisbp_1.02-M0008_1.02"  
##  [9] "Athaliana-cisbp_1.02-M0009_1.02"   "Athaliana-cisbp_1.02-M0010_1.02"

Accesing MotifDb contents

We can also access information directly from our list using standard list accessors.

Here a [ will subset to a single MotifList. Now we can see the information held in the MotifList a little more clearly.

MotifDb[1]

## MotifDb object of length 1
## | Created from downloaded public sources: 2013-Aug-30
## | 1 position frequency matrices from 1 source:
## |         cisbp_1.02:    1
## | 1 organism/s
## |        Scerevisiae:    1
## Scerevisiae-cisbp_1.02-M0001_1.02

Accesing MotifDb contents

A [[ will subset to object within the element as with standard lists. Here we extract the position probability matrix.

MotifDb[[1]]

##             1           2           3           4           5           6
## A 0.009615385 0.009615385 0.971153846 0.009615385 0.009615385 0.971153846
## C 0.009615385 0.009615385 0.009615385 0.009615385 0.971153846 0.009615385
## G 0.009615385 0.009615385 0.009615385 0.009615385 0.009615385 0.009615385
## T 0.971153846 0.971153846 0.009615385 0.971153846 0.009615385 0.009615385
##             7           8
## A 0.009615385 0.009615385
## C 0.971153846 0.009615385
## G 0.009615385 0.009615385
## T 0.009615385 0.971153846

colSums(MotifDb[[1]])

## 1 2 3 4 5 6 7 8 
## 1 1 1 1 1 1 1 1

Accesing MotifDb contents

We can extract a DataFrame of all the motif metadata information using the values() function.

values(MotifDb)[1:2, ]

## DataFrame with 2 rows and 15 columns
##                                   providerName     providerId  dataSource
##                                    <character>    <character> <character>
## Scerevisiae-cisbp_1.02-M0001_1.02   M0001_1.02 MA0000004_1.02  cisbp_1.02
## Scerevisiae-cisbp_1.02-M0002_1.02   M0002_1.02 MA0000135_1.02  cisbp_1.02
##                                    geneSymbol      geneId  geneIdType
##                                   <character> <character> <character>
## Scerevisiae-cisbp_1.02-M0001_1.02        ABF1          NA          NA
## Scerevisiae-cisbp_1.02-M0002_1.02        AFT1          NA          NA
##                                     proteinId proteinIdType    organism
##                                   <character>   <character> <character>
## Scerevisiae-cisbp_1.02-M0001_1.02     YKL112W           SGD Scerevisiae
## Scerevisiae-cisbp_1.02-M0002_1.02     YGL071W           SGD Scerevisiae
##                                   sequenceCount bindingSequence bindingDomain
##                                     <character>     <character>   <character>
## Scerevisiae-cisbp_1.02-M0001_1.02            NA              NA            NA
## Scerevisiae-cisbp_1.02-M0002_1.02            NA              NA            NA
##                                      tfFamily experimentType    pubmedID
##                                   <character>    <character> <character>
## Scerevisiae-cisbp_1.02-M0001_1.02        ABF1             NA    19111667
## Scerevisiae-cisbp_1.02-M0002_1.02         AFT             NA    19158363

Accesing MotifDb contents

We can use the query function to subset our MotifList by infomation in the metadata.

CTCFMotifs <- query(MotifDb, "CTCF")
CTCFMotifs

## MotifDb object of length 19
## | Created from downloaded public sources: 2013-Aug-30
## | 19 position frequency matrices from 9 sources:
## |        HOCOMOCOv10:    3
## | HOCOMOCOv11-core-A:    2
## |              HOMER:    3
## |        JASPAR_2014:    2
## |        JASPAR_CORE:    1
## |         jaspar2016:    2
## |         jaspar2018:    3
## |          jolma2013:    1
## |       SwissRegulon:    2
## | 4 organism/s
## |           Hsapiens:   12
## |      Dmelanogaster:    3
## |          Mmusculus:    1
## |              other:    3
## Hsapiens-HOCOMOCOv10-CTCFL_HUMAN.H10MO.A 
## Hsapiens-HOCOMOCOv10-CTCF_HUMAN.H10MO.A 
## Mmusculus-HOCOMOCOv10-CTCF_MOUSE.H10MO.A 
## Hsapiens-HOCOMOCOv11-core-A-CTCFL_HUMAN.H11MO.0.A 
## Hsapiens-HOCOMOCOv11-core-A-CTCF_HUMAN.H11MO.0.A 
## ...
## Dmelanogaster-jaspar2018-CTCF-MA0531.1 
## Hsapiens-jaspar2018-CTCFL-MA1102.1 
## Hsapiens-jolma2013-CTCF 
## Hsapiens-SwissRegulon-CTCFL.SwissRegulon 
## Hsapiens-SwissRegulon-CTCF.SwissRegulon

Accesing MotifDb contents

For more specific queries, multiple words can be used for filtering.

CTCFMotifs <- query(MotifDb, c("CTCF", "hsapiens", "jaspar2018"))
CTCFMotifs

## MotifDb object of length 2
## | Created from downloaded public sources: 2013-Aug-30
## | 2 position frequency matrices from 1 source:
## |         jaspar2018:    2
## | 1 organism/s
## |           Hsapiens:    2
## Hsapiens-jaspar2018-CTCF-MA0139.1 
## Hsapiens-jaspar2018-CTCFL-MA1102.1

JASPAR2020 package

The JASPAR packages are updated more frequently and so may contain motifs not characterized in the MotifDb package.

We can load the JASPAR package as usual and review the object as before.

library(JASPAR2020)
JASPAR2020

## An object of class "JASPAR2020"
## Slot "db":
## [1] "/usr/local/lib/R/host-site-library/JASPAR2020/extdata/JASPAR2020.sqlite"

JASPAR2020 and TFBSTools

To interact with JASPAR package we will make use of the TFBSTools package from the same lab.

Whereas the JASPAR package holds the information on Motifs and Position Probability Matrices (PPMs), TFBSTools has the functionality to manipulate and interact with these tools.

Three useful functions available from TFBStools to interact with the JASPAR databases are the getMatrixSet, getMatrixByID and getMatrixByID.

library(TFBSTools)
`?`(getMatrixByID)

TFBSTools for TFs

The getMatrixByID and getMatrixByName take the JASPAR DB object and a JASPAR ID or transcription factor name respectively.

Here we are using the transcription factor GATA2. The result is a new object class PFMatrix.

GATA2mat <- getMatrixByName(JASPAR2020, "GATA2")
class(GATA2mat)

## [1] "PFMatrix"
## attr(,"package")
## [1] "TFBSTools"

TFBSTools for motifs

JASPAR IDs are unique, so we can use them to select the exact motif we wanted.

GATA2mat <- getMatrixByID(JASPAR2020, "MA0036.2")

TFBSTools for motifs

List accessors will not work here but we can retrieve names using the ID function.

ID(GATA2mat)

## [1] "MA0036.2"

Position Frequency Matrix

To get hold of the Position Frequency Matrix (PFM) we can use the Matrix or as.matrix functions.

myMatrix <- Matrix(GATA2mat)
myMatrixToo <- as.matrix(myMatrix)
myMatrix

##   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## A 1569 1125 1555 1051   32    9    0   51 4380     0     0  1225   104  1256
## C  567  768  729  525 1057 3015  192    0    0     0  4380     0  1765  1185
## G  754 1709 1220  427  765  914    0    0    0     0     0     0  1967   224
## T 1490  778  876 2377 2526  442 4188 4329    0  4380     0  3155   544  1715

TFBSTools for motif sets

We can also use the getMatrixSet() function to retrieve sets of motifs. We can specify a list of options for motifs we want to retrieve.

To see the available filters use the help for getMatrixSet() function, ?getMatrixSet.

opts <- list()
opts[["collection"]] <- "CORE"
opts[["tax_group"]] <- "vertebrates"
motifList <- getMatrixSet(JASPAR2020, opts)

Visualizing motifs

The seqLogo package offers a simple and intuitive way of visualizing our base frequencies within out motifs using the seqLogo function.

A simple seqLogo shows the relative frequency of a base at each motif position by the relative size of the base compared to other bases at the same position.

Visualizing motifs

To create a seqLogo we must supply a PPM to the seqLogo function. Here we grab a PPM straight from MotidDb for CTCF.

Here we set the ic.scale to FALSE to show the probability across the Y-axis

library(seqLogo)
CTCFMotifs <- query(MotifDb, "CTCF")
seqLogo::seqLogo(CTCFMotifs[[1]], ic.scale = FALSE)

Visualizing motifs

It may be useful to plot the probabilities as information content. Information content here will range from 0 to 2 Bits. Positions with equal probabilities for each base will score 0 and positions with only 1 possible base will score 2.

This allows you to quickly identify the important bases.

seqLogo::seqLogo(CTCFMotifs[[1]])

Visualizing motifs

The TFBSTools package and JASPAR provide us with point frequency matrices which we can’t use directly in the seqLogo package

myMatrix

##   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## A 1569 1125 1555 1051   32    9    0   51 4380     0     0  1225   104  1256
## C  567  768  729  525 1057 3015  192    0    0     0  4380     0  1765  1185
## G  754 1709 1220  427  765  914    0    0    0     0     0     0  1967   224
## T 1490  778  876 2377 2526  442 4188 4329    0  4380     0  3155   544  1715

Visualizing motifs

We can convert our point frequency matrix to a point probabilty matrix by simply dividing columns by their sum.

ppm <- myMatrix/colSums(myMatrix)
ppm

##        [,1]      [,2]      [,3]       [,4]        [,5]        [,6]       [,7]
## A 0.3582192 0.2568493 0.3550228 0.23995434 0.007305936 0.002054795 0.00000000
## C 0.1294521 0.1753425 0.1664384 0.11986301 0.241324201 0.688356164 0.04383562
## G 0.1721461 0.3901826 0.2785388 0.09748858 0.174657534 0.208675799 0.00000000
## T 0.3401826 0.1776256 0.2000000 0.54269406 0.576712329 0.100913242 0.95616438
##         [,8] [,9] [,10] [,11]     [,12]      [,13]      [,14]
## A 0.01164384    1     0     0 0.2796804 0.02374429 0.28675799
## C 0.00000000    0     0     1 0.0000000 0.40296804 0.27054795
## G 0.00000000    0     0     0 0.0000000 0.44908676 0.05114155
## T 0.98835616    0     1     0 0.7203196 0.12420091 0.39155251

Visualizing motifs

We can then plot the result matrix using seqLogo.

seqLogo::seqLogo(ppm)

Visualizing motifs

Fortunately, TFBSTools has its own version of seqLogo function we can use with one of its own ICMatrix classes. We simply need to convert our object with the toICM() function.

GATA2_IC <- toICM(GATA2mat)
TFBSTools::seqLogo(GATA2_IC)

Visualizing with ggplot

We can also use a ggplot2 style grammar of graphics with the ggseqlogo package.

library(ggseqlogo)
library(ggplot2)
ggseqlogo(myMatrix) + theme_minimal()

## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

Identifying Motifs in ATAC

Known motif identification

Since ATACseq only tells us which regions are open/accessible, we can use motifs within our ATACseq peak regions to identify potential transcription factors which may be acting there.

The databases such as MotifDb and JASPAR2020, provide sets of known motif patterns for us to use our motif identification.

motifmatchr

To identify known motifs we will use the motifmatchr package which is a wrapper for the MOODS c++ library.

This means that motifmatchr offers us a fast methods to identify motifs within our ATACseq data.

We will use motifmatchr/MOODs with a default p-value cut-off.

motifmatchr

First we can retrieve a sensible set of motifs to scan for within our mouse tissue ATACseq data.

Here we retrieve the vertebrate, JASPAR CORE motifs. We additional specify all_versions is FALSE to only include the latest version of a motif.

opts <- list()
opts[["tax_group"]] <- "vertebrates"
opts[["collection"]] <- "CORE"
opts[["all_versions"]] <- FALSE
motifsToScan <- getMatrixSet(JASPAR2020, opts)

motifmatchr

The motifmatchr package main function is matchMotifs().

As with many Bioconductor functions , matchMotifs makes use of other Bioconductor objects such the BSGenome, GRanges and summarizedExperiment objects.

We can review the full functionality by reading the help for matchMotifs, ?matchMotifs.

ENCODE ATAC data

We need some data to review so we can make use of the ENCODE mouse tissue ATAC data.

We can load in the counts in a summarizedExperiment object from the data directory.

load("data/myCounts.RData")
myCounts

## class: RangedSummarizedExperiment 
## dim: 36320 6 
## metadata(0):
## assays(1): counts
## rownames: NULL
## rowData names(6):
##   Sorted_Hindbrain_day_12_1_Small_Paired_peaks.narrowPeak
##   Sorted_Hindbrain_day_12_2_Small_Paired_peaks.narrowPeak ...
##   Sorted_Liver_day_12_1_Small_Paired_peaks.narrowPeak
##   Sorted_Liver_day_12_2_Small_Paired_peaks.narrowPeak
## colnames(6): Sorted_Hindbrain_day_12_1_Small_Paired.bam
##   Sorted_Hindbrain_day_12_2_Small_Paired.bam ...
##   Sorted_Liver_day_12_1_Small_Paired.bam
##   Sorted_Liver_day_12_2_Small_Paired.bam
## colData names(0):

ENCODE ATAC peaks

We can retrieve the ranges of our peaks directly from our SummarizedExperiment object by using the standard accessor rowRanges.

peakRanges <- rowRanges(myCounts)
peakRanges[1, ]

## GRanges object with 1 range and 6 metadata columns:
##       seqnames          ranges strand |
##          <Rle>       <IRanges>  <Rle> |
##   [1]     chr1 3433954-3434095      * |
##       Sorted_Hindbrain_day_12_1_Small_Paired_peaks.narrowPeak
##                                                     <logical>
##   [1]                                                   FALSE
##       Sorted_Hindbrain_day_12_2_Small_Paired_peaks.narrowPeak
##                                                     <logical>
##   [1]                                                   FALSE
##       Sorted_Kidney_day_15_1_Small_Paired_peaks.narrowPeak
##                                                  <logical>
##   [1]                                                 TRUE
##       Sorted_Kidney_day_15_2_Small_Paired_peaks.narrowPeak
##                                                  <logical>
##   [1]                                                 TRUE
##       Sorted_Liver_day_12_1_Small_Paired_peaks.narrowPeak
##                                                 <logical>
##   [1]                                               FALSE
##       Sorted_Liver_day_12_2_Small_Paired_peaks.narrowPeak
##                                                 <logical>
##   [1]                                               FALSE
##   -------
##   seqinfo: 22 sequences from an unspecified genome; no seqlengths

ENCODE ATAC peaks

We can use the getSeq() function to extract sequences from a BSGenome object using a GRanges object to define the regions of interest. This is a similar approach to de novo motif discovery for ChIPseq.

We load the BSgenome.Mmusculus.UCSC.mm10 library and re-center our peaks to 100bp using the resize function. Once resized we can extract the required sequences using the getSeq() function.

library(BSgenome.Mmusculus.UCSC.mm10)
peakRangesCentered <- resize(peakRanges, fix = "center", width = 100)
peakSeqs <- getSeq(BSgenome.Mmusculus.UCSC.mm10, peakRangesCentered)
names(peakSeqs) <- as.character(peakRangesCentered)
peakSeqs

## DNAStringSet object of length 36320:
##         width seq                                           names               
##     [1]   100 GGTTGTGCACTGTAAGCTGAC...CACTTCTGCTTCTGTCTGGCT chr1:3433975-3434074
##     [2]   100 AGTGCCTAGTCTTACTGCAAC...AGAAGAGGAGTAGGTGGGAGT chr1:3575937-3576036
##     [3]   100 CGCCGCCGCCGCCAGCAGCAG...GCGCGCTCCCTCCTCCCGCAG chr1:3671660-3671759
##     [4]   100 CTTCCCCCACCGCTGACACAC...TCTTCCCTTCACCCACCCTGA chr1:3994882-3994981
##     [5]   100 GAGGCAGCACCCTTTGTGGGC...GGCCTAAGCCACAGCAGCAGC chr1:4560256-4560355
##     ...   ... ...
## [36316]   100 GCCATGTGCTCGCAGCTCCGC...CACGGCCCCGCCGGCTCCCCT chrY:90784203-907...
## [36317]   100 GGAGTTGTAGTCCTGCCGTAG...TGGGAGTTGCAGCGCCATGTT chrY:90784972-907...
## [36318]   100 CTCTGGAGACCAGGCTGACCT...ACCACGCCCGGATTTTTACCA chrY:90801528-908...
## [36319]   100 CTTCTCTGCCTCATGGGGACG...CGTCCCGGGCAGGCGGCGTCT chrY:90807427-908...
## [36320]   100 GGGGGGTTAGGTTAGGGTTAG...GGGGGTTAGGTTAGGGTTAGG chrY:90813049-908...

Finding motif positions

As we saw in its help, The matchMotifs function can provide output of motif matches as matches, scores or positions.

Here we scan for 4 selected motifs from JASPAR2020 using the sequence under the first 100 ATACseq peaks and specify the out as positions.

motif_positions <- matchMotifs(motifsToScan[1:4], peakSeqs[1:100], out = "positions")
class(motif_positions)

## [1] "list"

length(motif_positions)

## [1] 4

Finding motif positions

The result contains a list of the same length as the number of motifs tested.

Each element contains a IRangeslist with an entry for every sequence tested and a IRanges of motif positions within the peak sequences.

motif_positions$MA0029.1

## IRangesList object of length 100:
## [[1]]
## IRanges object with 0 ranges and 2 metadata columns:
##        start       end     width |      strand     score
##    <integer> <integer> <integer> | <character> <numeric>
## 
## [[2]]
## IRanges object with 0 ranges and 2 metadata columns:
##        start       end     width |      strand     score
##    <integer> <integer> <integer> | <character> <numeric>
## 
## [[3]]
## IRanges object with 0 ranges and 2 metadata columns:
##        start       end     width |      strand     score
##    <integer> <integer> <integer> | <character> <numeric>
## 
## ...
## <97 more elements>

Finding motif positions

We can unlist our IRangeslist to a standard list for easier working.

MA0029hits <- motif_positions$MA0029.1
names(MA0029hits) <- names(peakSeqs[1:100])
unlist(MA0029hits, use.names = TRUE)

## IRanges object with 1 range and 2 metadata columns:
##                              start       end     width |      strand     score
##                          <integer> <integer> <integer> | <character> <numeric>
##   chr1:13566347-13566446        56        69        14 |           +   10.0096

Finding motif hits

We may simply want to map motifs to their ATACseq peaks.

To do this we can set the out parameter to matches. This will return a SummarizedExperiment object.

motifHits <- matchMotifs(motifsToScan, peakSeqs, out = "matches")
class(motifHits)

## [1] "SummarizedExperiment"
## attr(,"package")
## [1] "SummarizedExperiment"

motifHits

## class: SummarizedExperiment 
## dim: 36320 746 
## metadata(0):
## assays(1): motifMatches
## rownames: NULL
## rowData names(0):
## colnames(746): MA0004.1 MA0006.1 ... MA0528.2 MA0609.2
## colData names(1): name

Finding motif hits

We can retrieve a matrix of matches by motif and peak using the motifMatches function.

mmMatrix <- motifMatches(motifHits)
dim(mmMatrix)

## [1] 36320   746

mmMatrix[1:8, 1:8]

## 8 x 8 sparse Matrix of class "lgCMatrix"
##      MA0004.1 MA0006.1 MA0019.1 MA0029.1 MA0030.1 MA0031.1 MA0040.1 MA0041.1
## [1,]        .        .        .        .        .        .        .        .
## [2,]        .        .        .        .        .        .        .        .
## [3,]        .        .        .        .        .        .        .        .
## [4,]        .        .        .        .        .        .        .        .
## [5,]        .        .        .        .        .        .        .        .
## [6,]        .        .        .        .        .        .        .        .
## [7,]        .        |        .        .        .        .        .        .
## [8,]        .        .        .        .        .        .        .        .

Finding motif hits

Although a sparse matrix, we can still use our matrix operations to extract useful information from this object.

We can use the colSums() to identify the total occurrence of motifs in our peak sequences.

totalMotifOccurence <- colSums(mmMatrix)
totalMotifOccurence[1:4]

## MA0004.1 MA0006.1 MA0019.1 MA0029.1 
##     1137     1303      184     1163

Finding motif hits

We can also identify peaks which contain a hit for a selected motif.

peaksWithMA0912 <- peakRangesCentered[mmMatrix[, "MA0912.2"] == 1]
peaksWithMA0912

## GRanges object with 361 ranges and 6 metadata columns:
##         seqnames              ranges strand |
##            <Rle>           <IRanges>  <Rle> |
##     [1]     chr1     5232911-5233010      * |
##     [2]     chr1     9783650-9783749      * |
##     [3]     chr1   15644761-15644860      * |
##     [4]     chr1   31233772-31233871      * |
##     [5]     chr1   39393610-39393709      * |
##     ...      ...                 ...    ... .
##   [357]     chrX 101332594-101332693      * |
##   [358]     chrX 101429501-101429600      * |
##   [359]     chrX 109012641-109012740      * |
##   [360]     chrX 166385018-166385117      * |
##   [361]     chrX 166672780-166672879      * |
##         Sorted_Hindbrain_day_12_1_Small_Paired_peaks.narrowPeak
##                                                       <logical>
##     [1]                                                    TRUE
##     [2]                                                   FALSE
##     [3]                                                   FALSE
##     [4]                                                    TRUE
##     [5]                                                   FALSE
##     ...                                                     ...
##   [357]                                                    TRUE
##   [358]                                                    TRUE
##   [359]                                                   FALSE
##   [360]                                                   FALSE
##   [361]                                                   FALSE
##         Sorted_Hindbrain_day_12_2_Small_Paired_peaks.narrowPeak
##                                                       <logical>
##     [1]                                                    TRUE
##     [2]                                                   FALSE
##     [3]                                                   FALSE
##     [4]                                                    TRUE
##     [5]                                                   FALSE
##     ...                                                     ...
##   [357]                                                    TRUE
##   [358]                                                    TRUE
##   [359]                                                   FALSE
##   [360]                                                   FALSE
##   [361]                                                   FALSE
##         Sorted_Kidney_day_15_1_Small_Paired_peaks.narrowPeak
##                                                    <logical>
##     [1]                                                FALSE
##     [2]                                                FALSE
##     [3]                                                 TRUE
##     [4]                                                FALSE
##     [5]                                                FALSE
##     ...                                                  ...
##   [357]                                                 TRUE
##   [358]                                                 TRUE
##   [359]                                                FALSE
##   [360]                                                 TRUE
##   [361]                                                FALSE
##         Sorted_Kidney_day_15_2_Small_Paired_peaks.narrowPeak
##                                                    <logical>
##     [1]                                                FALSE
##     [2]                                                FALSE
##     [3]                                                 TRUE
##     [4]                                                FALSE
##     [5]                                                FALSE
##     ...                                                  ...
##   [357]                                                 TRUE
##   [358]                                                 TRUE
##   [359]                                                FALSE
##   [360]                                                 TRUE
##   [361]                                                FALSE
##         Sorted_Liver_day_12_1_Small_Paired_peaks.narrowPeak
##                                                   <logical>
##     [1]                                               FALSE
##     [2]                                                TRUE
##     [3]                                               FALSE
##     [4]                                               FALSE
##     [5]                                                TRUE
##     ...                                                 ...
##   [357]                                                TRUE
##   [358]                                                TRUE
##   [359]                                                TRUE
##   [360]                                               FALSE
##   [361]                                                TRUE
##         Sorted_Liver_day_12_2_Small_Paired_peaks.narrowPeak
##                                                   <logical>
##     [1]                                               FALSE
##     [2]                                                TRUE
##     [3]                                               FALSE
##     [4]                                               FALSE
##     [5]                                                TRUE
##     ...                                                 ...
##   [357]                                                TRUE
##   [358]                                                TRUE
##   [359]                                                TRUE
##   [360]                                               FALSE
##   [361]                                                TRUE
##   -------
##   seqinfo: 22 sequences from an unspecified genome; no seqlengths

summarizing ATAC signal to Motifs

The chromVar package allows for the summarization of ATACseq signal changes to the motifs within peaks.

With this summarization we can potentially identify which motifs may have an important role in a set of ATAC samples compared to other samples.

The chromVar package comes from the same lab and author as the motifmatchr package and so will work well together.

First we load the library.

library(chromVAR)

Setting up for chromVar

To identify motifs for chomVar we will use motifMatchr with a different set of inputs.

Here we will provide our RangedSummarizedExperiment object contaitng our counts over peaks directly to the matchMotifs function.

First we will remove any peaks with less than 5 reads across all samples.

myCounts <- myCounts[rowSums(assay(myCounts)) > 5, ]

Correct GC bias

Next we can correct for any potential GC bias which may have arisen in sequencing. Previously we have seen that we have GC bias observable in our FastQ quality checks.

For comparing across differing sets of peaks with differing sequences compositions we will need to correct for this bias. We can use the addGCBias and specify our genome as a BSgenome object to correct in chromVar.

myCounts <- addGCBias(myCounts, genome = BSgenome.Mmusculus.UCSC.mm10)

ID motifs in peaks

Having corrected for bias, we can use the matchMotifs function again to identify motifs under our ATACseq peaks.

Here we supply our RangedSummarizedExperiment of counts in peaks and the genome of interest to the matchMotifs function and use the default out of matches.

motif_ix <- matchMotifs(motifsToScan, myCounts, genome = BSgenome.Mmusculus.UCSC.mm10)
motif_ix

## class: RangedSummarizedExperiment 
## dim: 36320 746 
## metadata(0):
## assays(1): motifMatches
## rownames: NULL
## rowData names(7):
##   Sorted_Hindbrain_day_12_1_Small_Paired_peaks.narrowPeak
##   Sorted_Hindbrain_day_12_2_Small_Paired_peaks.narrowPeak ...
##   Sorted_Liver_day_12_2_Small_Paired_peaks.narrowPeak bias
## colnames(746): MA0004.1 MA0006.1 ... MA0528.2 MA0609.2
## colData names(1): name

Run chromVar

Following the identification of motifs in our peaks, we can perform the summarization of ATACseq signal to motifs using the computeDeviations and the computeVariability functions.

deviations <- computeDeviations(object = myCounts, annotations = motif_ix)
variability_Known <- computeVariability(deviations)

chromVar deviations

The deviations result contains a SummarizedExperiment object with the Z-scores showing the enrichment for ATACseq signal in each sample for every motif.

devZscores <- deviationScores(deviations)
devZscores[1:2, ]

##          Sorted_Hindbrain_day_12_1_Small_Paired.bam
## MA0004.1                                   3.679156
## MA0006.1                                   8.046464
##          Sorted_Hindbrain_day_12_2_Small_Paired.bam
## MA0004.1                                   5.431137
## MA0006.1                                   7.633895
##          Sorted_Kidney_day_15_1_Small_Paired.bam
## MA0004.1                                4.114432
## MA0006.1                                6.570759
##          Sorted_Kidney_day_15_2_Small_Paired.bam
## MA0004.1                                4.564610
## MA0006.1                                6.004668
##          Sorted_Liver_day_12_1_Small_Paired.bam
## MA0004.1                              -4.276874
## MA0006.1                              -8.348624
##          Sorted_Liver_day_12_2_Small_Paired.bam
## MA0004.1                              -6.653514
## MA0006.1                              -6.824266

chromVar variability

The variability result contains the ranking of motifs by their variability across samples. Highly variable motifs may indicate motifs associated with a particular sample group or variable across all groups.

variability_Known <- variability_Known[order(variability_Known$p_value), ]
variability_Known[1:10, ]

##                 name variability bootstrap_lower_bound bootstrap_upper_bound
## MA0140.2 GATA1::TAL1    31.47098             3.1948304              34.67008
## MA0600.2        RFX2    17.77072             7.3944769              21.22610
## MA0036.3       GATA2    32.01213             0.3921849              34.17913
## MA0037.3       GATA3    25.31902             0.8039623              27.44567
## MA0766.2       GATA5    25.01678             0.4178051              26.75562
## MA0035.4       GATA1    26.04815             0.8122387              27.98797
## MA0482.2       GATA4    27.45124             0.8642210              29.45822
## MA1104.2       GATA6    29.73974             0.5806377              31.84163
## MA0509.2        RFX1    17.52837             8.5454333              21.34927
## MA0867.2        SOX4    16.47110             5.8375559              19.59171
##                p_value   p_value_adj
## MA0140.2  0.000000e+00  0.000000e+00
## MA0600.2  0.000000e+00  0.000000e+00
## MA0036.3  0.000000e+00  0.000000e+00
## MA0037.3  0.000000e+00  0.000000e+00
## MA0766.2  0.000000e+00  0.000000e+00
## MA0035.4  0.000000e+00  0.000000e+00
## MA0482.2  0.000000e+00  0.000000e+00
## MA1104.2  0.000000e+00  0.000000e+00
## MA0509.2  0.000000e+00  0.000000e+00
## MA0867.2 3.690799e-291 2.749645e-289

chromVar results

We can use the results from the variability with our Z-score deviations to identify in which sample our motifs are enriched.

topVariable <- variability_Known[1:20, ]
devTop <- merge(topVariable[, 1, drop = FALSE], devZscores, by = 0)
devTop[1:2, ]

##   Row.names  name Sorted_Hindbrain_day_12_1_Small_Paired.bam
## 1  MA0035.4 GATA1                                  -24.29333
## 2  MA0036.3 GATA2                                  -29.55701
##   Sorted_Hindbrain_day_12_2_Small_Paired.bam
## 1                                  -22.56358
## 2                                  -30.63070
##   Sorted_Kidney_day_15_1_Small_Paired.bam
## 1                               -25.01967
## 2                               -30.24737
##   Sorted_Kidney_day_15_2_Small_Paired.bam
## 1                               -24.19630
## 2                               -29.73203
##   Sorted_Liver_day_12_1_Small_Paired.bam Sorted_Liver_day_12_2_Small_Paired.bam
## 1                               25.81764                               26.97208
## 2                               31.46003                               32.42450

chromVar results

A useful way to visualize these results is using a heatmap. Although we cover this in a later session, here we can use the pheatmap library with default settings to illustrate where our most variable motifs are active.

devToPlot <- as.matrix(devTop[, -c(1:2)])
rownames(devToPlot) <- devTop[, 2]
library(pheatmap)
pheatmap(devToPlot)

Identifying Motifs de novo in ATAC

De novo motif discovery

So far we have reviewed how to identify known motifs within our ATACseq peaks.

In our ChIPseq training we cover howt we can use the Meme-ChIP software to perform De novo motif identification.

Here we wll use the Meme-ChIP software in a discriminative mode to identify De novo motifs enriched in one set of samples over another.

Differential peaks

First we will need to identify a set of peaks enriched for ATACseq signal in one group over another. To do this we will use DESeq2 to identify ATACseq peaks enriched in 1 set over another.

We can load our Liver, Kidney and Brain counts from earlier into a DEseq object for differential analysis.

require(DESeq2)
load("data/myCounts.RData")
Group <- factor(c("HindBrain", "HindBrain", "Kidney", "Kidney", "Liver", "Liver"))
colData(myCounts) <- DataFrame(data.frame(Group, row.names = colnames(myCounts)))
dds <- DESeqDataSet(myCounts, design = ~Group)
dds <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## -- note: fitType='parametric', but the dispersion trend was not well captured by the
##    function: y = a/x + b, and a local regression fit was automatically substituted.
##    specify fitType='local' or 'mean' to avoid this message next time.

## final dispersion estimates

## fitting model and testing

Differential peaks

We can then use the DESeq2 function results to identify differential ATAC regions and set the format parameter to GRanges to allow the return of the result as a GRanges object.

myRes <- results(dds, contrast = c("Group", "Liver", "Kidney"), format = "GRanges")
myRes <- myRes[order(myRes$padj), ]
upRegions <- myRes[myRes$log2FoldChange > 0][1:1000]
downRegions <- myRes[myRes$log2FoldChange < 0, ][1:1000]
upRegions

## GRanges object with 1000 ranges and 6 metadata columns:
##          seqnames              ranges strand |  baseMean log2FoldChange
##             <Rle>           <IRanges>  <Rle> | <numeric>      <numeric>
##      [1]     chr7 100467072-100467556      * |  101.6408        3.99023
##      [2]    chr18   32542348-32542812      * |   96.8956        4.11485
##      [3]    chr10   30763197-30763615      * |   94.4364        3.78759
##      [4]    chr15 103258321-103258745      * |   99.6945        3.59581
##      [5]     chr8 123978898-123979311      * |   84.3669        4.13890
##      ...      ...                 ...    ... .       ...            ...
##    [996]     chr5 148897184-148897609      * |   24.8790        3.10487
##    [997]     chr5 147888548-147888858      * |   21.1643        3.45870
##    [998]     chr7 131425006-131425298      * |   25.7261        3.01058
##    [999]     chr3   14886438-14886963      * |   63.2840        1.59576
##   [1000]     chr8 122719754-122720057      * |   26.7802        2.94430
##              lfcSE      stat      pvalue        padj
##          <numeric> <numeric>   <numeric>   <numeric>
##      [1]  0.290132   13.7532 4.87419e-43 1.77031e-38
##      [2]  0.304919   13.4949 1.67615e-41 3.04389e-37
##      [3]  0.281546   13.4528 2.96263e-41 3.58676e-37
##      [4]  0.281400   12.7783 2.16807e-37 1.96861e-33
##      [5]  0.330165   12.5359 4.75193e-36 3.45180e-32
##      ...       ...       ...         ...         ...
##    [996]  0.487194   6.37297 1.85407e-10 5.79017e-09
##    [997]  0.543067   6.36883 1.90469e-10 5.94317e-09
##    [998]  0.472801   6.36754 1.92089e-10 5.98855e-09
##    [999]  0.250669   6.36602 1.93994e-10 6.04275e-09
##   [1000]  0.462528   6.36567 1.94435e-10 6.05130e-09
##   -------
##   seqinfo: 22 sequences from an unspecified genome; no seqlengths

Resize peaks

For Meme-ChIP we will want to resize the regions to 100bp. By default Meme-ChIP will perform this trimming for us but to ensure it is in the centre we will do this upfront.

upRegions <- resize(upRegions, fix = "center", width = 100)
downRegions <- resize(downRegions, fix = "center", width = 100)

Peak sequences

We can now use the getSeq function as previously to extract the signal from around within the GRange regions and write to a FASTA file for use in Meme-ChIP using the writeXStringSet function.

library(BSgenome.Mmusculus.UCSC.mm10)
upStrings <- getSeq(BSgenome.Mmusculus.UCSC.mm10, upRegions)
downStrings <- getSeq(BSgenome.Mmusculus.UCSC.mm10, downRegions)
names(upStrings) <- as.character(upRegions)
names(downStrings) <- as.character(downRegions)
writeXStringSet(upStrings, file = "UpRegions.fa")
writeXStringSet(downStrings, file = "DownStrings.fa")

Meme-ChIP

We can then submit the sample to the Meme-ChIP online submission form found here.

In contrast to for ChIPseq, here we will run this in Differential mode to compare our sequences in Liver enriched regions to those depleted in Liver.

The results can be found in data/memeResult_LiverVsBrain

Using De novo motifs

We may want to use the motifs discovered by Meme-ChIP in other Bioconductor software.

To do this we can use the universalmotif package.

library(universalmotif)

Importing motifs

The universalmotif package provides lots of useful functions for importing motif sets.

One useful and relevant function is the read_meme() function.

memeMotifs <- read_meme("data/memeResult_LiverVsBrain/combined.meme")
memeMotifs

## [[1]]
## 
##        Motif name:   1
##    Alternate name:   HGATA-DREME-1
##          Alphabet:   DNA
##              Type:   PPM
##           Strands:   +-
##          Total IC:   8.78
##       Pseudocount:   1
##         Consensus:   WGATA
##      Target sites:   1236
##           E-value:   7.6e-275
##        Extra info:   [eval.string] 7.6e-275 
## 
##      W G A T A
## A 0.58 0 1 0 1
## C 0.04 0 0 0 0
## G 0.00 1 0 0 0
## T 0.38 0 0 1 0
## 
## [[2]]
## 
##        Motif name:   2
##    Alternate name:   WGATAAG-MEME-1
##          Alphabet:   DNA
##              Type:   PPM
##           Strands:   +-
##          Total IC:   10.2
##       Pseudocount:   1
##         Consensus:   WGATAAG
##      Target sites:   786
##           E-value:   1.4e-96
##        Extra info:   [eval.string] 1.4e-096 
## 
##      W G A T    A    A    G
## A 0.64 0 1 0 0.96 0.82 0.15
## C 0.03 0 0 0 0.00 0.00 0.17
## G 0.01 1 0 0 0.00 0.12 0.63
## T 0.32 0 0 1 0.04 0.05 0.05
## 
## [[3]]
## 
##        Motif name:   3
##    Alternate name:   AYACH-DREME-2
##          Alphabet:   DNA
##              Type:   PPM
##           Strands:   +-
##          Total IC:   7.73
##       Pseudocount:   1
##         Consensus:   ACACH
##      Target sites:   1084
##           E-value:   1.5e-34
##        Extra info:   [eval.string] 1.5e-034 
## 
##   A    C A C    H
## A 1 0.00 1 0 0.38
## C 0 0.83 0 1 0.34
## G 0 0.00 0 0 0.00
## T 0 0.17 0 0 0.28
## 
## [[4]]
## 
##        Motif name:   4
##    Alternate name:   ACAK-DREME-3
##          Alphabet:   DNA
##              Type:   PPM
##           Strands:   +-
##          Total IC:   7.19
##       Pseudocount:   1
##         Consensus:   ACAG
##      Target sites:   1719
##           E-value:   3.7e-17
##        Extra info:   [eval.string] 3.7e-017 
## 
##   A C A    G
## A 1 0 1 0.00
## C 0 1 0 0.00
## G 0 0 0 0.76
## T 0 0 0 0.24
## 
## [[5]]
## 
##        Motif name:   5
##    Alternate name:   CTHATC-DREME-4
##          Alphabet:   DNA
##              Type:   PPM
##           Strands:   +-
##          Total IC:   10.13
##       Pseudocount:   1
##         Consensus:   CTHATC
##      Target sites:   138
##           E-value:   1.1e-10
##        Extra info:   [eval.string] 1.1e-010 
## 
##   C T    H A T C
## A 0 0 0.39 1 0 0
## C 1 0 0.32 0 0 1
## G 0 0 0.00 0 0 0
## T 0 1 0.29 0 1 0
## 
## [[6]]
## 
##        Motif name:   6
##    Alternate name:   YTATCW-MEME-2
##          Alphabet:   DNA
##              Type:   PPM
##           Strands:   +-
##          Total IC:   6.89
##       Pseudocount:   1
##         Consensus:   YTATCW
##      Target sites:   761
##           E-value:   7.4e-09
##        Extra info:   [eval.string] 7.4e-009 
## 
##      Y T    A T    C    W
## A 0.04 0 0.59 0 0.05 0.37
## C 0.35 0 0.00 0 0.86 0.05
## G 0.19 0 0.17 0 0.00 0.01
## T 0.42 1 0.23 1 0.09 0.57
## 
## [[7]]
## 
##        Motif name:   7
##    Alternate name:   ASWA-DREME-5
##          Alphabet:   DNA
##              Type:   PPM
##           Strands:   +-
##          Total IC:   6.11
##       Pseudocount:   1
##         Consensus:   ASWA
##      Target sites:   1492
##           E-value:   6e-07
##        Extra info:   [eval.string] 6.0e-007 
## 
##   A    S   W A
## A 1 0.00 0.7 1
## C 0 0.42 0.0 0
## G 0 0.58 0.0 0
## T 0 0.00 0.3 0
## 
## [[8]]
## 
##        Motif name:   8
##    Alternate name:   ACCCAGW-DREME-6
##          Alphabet:   DNA
##              Type:   PPM
##           Strands:   +-
##          Total IC:   12.13
##       Pseudocount:   1
##         Consensus:   ACCCAGW
##      Target sites:   51
##           E-value:   0.00036
##        Extra info:   [eval.string] 3.6e-004 
## 
##   A C C C A G    W
## A 1 0 0 0 1 0 0.53
## C 0 1 1 1 0 0 0.00
## G 0 0 0 0 0 1 0.00
## T 0 0 0 0 0 0 0.47
## 
## [[9]]
## 
##        Motif name:   9
##    Alternate name:   CMCACCCW-MEME-3
##          Alphabet:   DNA
##              Type:   PPM
##           Strands:   +-
##          Total IC:   12.73
##       Pseudocount:   1
##         Consensus:   CMCACCCW
##      Target sites:   164
##           E-value:   0.029
##        Extra info:   [eval.string] 2.9e-002 
## 
##      C    M C    A    C C C    W
## A 0.01 0.63 0 0.95 0.00 0 0 0.48
## C 0.96 0.37 1 0.00 0.99 1 1 0.07
## G 0.00 0.00 0 0.05 0.01 0 0 0.01
## T 0.03 0.00 0 0.00 0.00 0 0 0.45

Importing motifs

The universalmotif package also provides functions to convert between motif objects from different package. Here we convert to TFBStools motif objects.

memeMotifsTFBStools <- convert_motifs(memeMotifs, "TFBSTools-PWMatrix")
memeMotifsTFBStools

## [[1]]
## An object of class PWMatrix
## ID: HGATA-DREME-1
## Name: 1
## Matrix Class: Unknown
## strand: *
## Pseudocounts: 1
## Tags: 
## $eval.string
## [1] "7.6e-275"
## 
## Background: 
##      A      C      G      T 
## 0.2304 0.2696 0.2696 0.2304 
## Matrix: 
##             W          G         A         T         A
## A   1.3294375 -10.272630   2.11689 -10.27263   2.11689
## C  -2.5931079 -10.272630 -10.27263 -10.27263 -10.27263
## G -10.2726298   1.890255 -10.27263 -10.27263 -10.27263
## T   0.7069592 -10.272630 -10.27263   2.11689 -10.27263
## 
## [[2]]
## An object of class PWMatrix
## ID: WGATAAG-MEME-1
## Name: 2
## Matrix Class: Unknown
## strand: *
## Pseudocounts: 1
## Tags: 
## $eval.string
## [1] "1.4e-096"
## 
## Background: 
##      A      C      G      T 
## 0.2304 0.2696 0.2696 0.2304 
## Matrix: 
##            W         G         A         T         A         A          G
## A  1.4668977 -9.620220  2.116376 -9.620220  2.060250  1.835703 -0.6292295
## C -3.1887377 -9.620220 -9.620220 -9.620220 -9.620220 -6.019998 -0.6709083
## G -4.5166051  1.889768 -9.620220 -9.620220 -9.620220 -1.155168  1.2171203
## T  0.4818744 -9.620220 -9.620220  2.116376 -2.584504 -2.136796 -2.0684582
## 
## [[3]]
## An object of class PWMatrix
## ID: AYACH-DREME-2
## Name: 3
## Matrix Class: Unknown
## strand: *
## Pseudocounts: 1
## Tags: 
## $eval.string
## [1] "1.5e-034"
## 
## Background: 
##      A      C      G      T 
## 0.2304 0.2696 0.2696 0.2304 
## Matrix: 
##            A           C          A          C           H
## A   2.116764 -10.0834793   2.116764 -10.083479   0.7390085
## C -10.083479   1.6154172 -10.083479   1.890136   0.3243904
## G -10.083479 -10.0834793 -10.083479 -10.083479 -10.0834793
## T -10.083479  -0.4093361 -10.083479 -10.083479   0.2690316
## 
## [[4]]
## An object of class PWMatrix
## ID: ACAK-DREME-3
## Name: 4
## Matrix Class: Unknown
## strand: *
## Pseudocounts: 1
## Tags: 
## $eval.string
## [1] "3.7e-017"
## 
## Background: 
##      A      C      G      T 
## 0.2304 0.2696 0.2696 0.2304 
## Matrix: 
##            A          C          A            G
## A   2.117142 -10.748193   2.117142 -10.74819285
## C -10.748193   1.890495 -10.748193 -10.74819285
## G -10.748193 -10.748193 -10.748193   1.49966337
## T -10.748193 -10.748193 -10.748193   0.04283434
## 
## [[5]]
## An object of class PWMatrix
## ID: CTHATC-DREME-4
## Name: 5
## Matrix Class: Unknown
## strand: *
## Pseudocounts: 1
## Tags: 
## $eval.string
## [1] "1.1e-010"
## 
## Background: 
##      A      C      G      T 
## 0.2304 0.2696 0.2696 0.2304 
## Matrix: 
##           C         T          H         A         T         C
## A -7.118941 -7.118941  0.7598762  2.109777 -7.118941 -7.118941
## C  1.883507 -7.118941  0.2404110 -7.118941 -7.118941  1.883507
## G -7.118941 -7.118941 -7.1189411 -7.118941 -7.118941 -7.118941
## T -7.118941  2.109777  0.3290605 -7.118941  2.109777 -7.118941
## 
## [[6]]
## An object of class PWMatrix
## ID: YTATCW-MEME-2
## Name: 6
## Matrix Class: Unknown
## strand: *
## Pseudocounts: 1
## Tags: 
## $eval.string
## [1] "7.4e-009"
## 
## Background: 
##      A      C      G      T 
## 0.2304 0.2696 0.2696 0.2304 
## Matrix: 
##            Y         T            A         T         C          W
## A -2.6366821 -9.573647  1.365054357 -9.573647 -2.125646  0.6642764
## C  0.3796124 -9.573647 -9.573647187 -9.573647  1.671202 -2.3871986
## G -0.5199509 -9.573647 -0.624335673 -9.573647 -9.573647 -4.3222315
## T  0.8805593  2.116330  0.005459193  2.116330 -1.384818  1.3064571
## 
## [[7]]
## An object of class PWMatrix
## ID: ASWA-DREME-5
## Name: 7
## Matrix Class: Unknown
## strand: *
## Pseudocounts: 1
## Tags: 
## $eval.string
## [1] "6.0e-007"
## 
## Background: 
##      A      C      G      T 
## 0.2304 0.2696 0.2696 0.2304 
## Matrix: 
##            A           S           W          A
## A   2.117044 -10.5439985   1.5978626   2.117044
## C -10.543998   0.6423491 -10.5439985 -10.543998
## G -10.543998   1.1024468 -10.5439985 -10.543998
## T -10.543998 -10.5439985   0.3915094 -10.543998
## 
## [[8]]
## An object of class PWMatrix
## ID: ACCCAGW-DREME-6
## Name: 8
## Matrix Class: Unknown
## strand: *
## Pseudocounts: 1
## Tags: 
## $eval.string
## [1] "3.6e-004"
## 
## Background: 
##      A      C      G      T 
## 0.2304 0.2696 0.2696 0.2304 
## Matrix: 
##           A        C        C        C         A        G         W
## A  2.096276 -5.70044 -5.70044 -5.70044  2.096276 -5.70044  1.184494
## C -5.700440  1.87070  1.87070  1.87070 -5.700440 -5.70044 -5.700440
## G -5.700440 -5.70044 -5.70044 -5.70044 -5.700440  1.87070 -5.700440
## T -5.700440 -5.70044 -5.70044 -5.70044 -5.700440 -5.70044  1.016094
## 
## [[9]]
## An object of class PWMatrix
## ID: CMCACCCW-MEME-3
## Name: 9
## Matrix Class: Unknown
## strand: *
## Pseudocounts: 1
## Tags: 
## $eval.string
## [1] "2.9e-002"
## 
## Background: 
##      A      C      G      T 
## 0.2304 0.2696 0.2696 0.2304 
## Matrix: 
##           C          M         C         A         C         C         C
## A -4.949407  1.4411892 -7.366322  2.029732 -7.366322 -7.366322 -7.366322
## C  1.831026  0.4618849  1.884707 -7.366322  1.867034  1.884707  1.884707
## G -7.366322 -7.3663222 -7.366322 -2.262708 -4.292777 -7.366322 -7.366322
## T -2.861614 -7.3663222 -7.366322 -7.366322 -7.366322 -7.366322 -7.366322
##            W
## A  1.0411226
## C -1.9808502
## G -4.2927766
## T  0.9458359

Time for an exercise!

Exercise on ATACseq data can be found here

Answers to exercise

Answers can be found here

ATACseq In Bioconductor

Rockefeller University, Bioinformatics Resource Centre

http://rockefelleruniversity.github.io/RU_ATACseq/

ATACseq (part 2)

Data

ATACseq

ATACseq

Motif Databases

Known Motif sources

MotifDb

MotifDb

MotifDb

Accesing MotifDb contents

Accesing MotifDb contents

Accesing MotifDb contents

Accesing MotifDb contents

Accesing MotifDb contents

JASPAR2020 package

JASPAR2020 and TFBSTools

TFBSTools for TFs

TFBSTools for motifs

TFBSTools for motifs

Position Frequency Matrix

TFBSTools for motif sets

Visualizing motifs

Visualizing motifs

Visualizing motifs

Visualizing motifs

Visualizing motifs

Visualizing motifs

Visualizing motifs

Visualizing motifs

Visualizing with ggplot

Identifying Motifs in ATAC

Known motif identification

motifmatchr

motifmatchr

motifmatchr

ENCODE ATAC data

ENCODE ATAC peaks

ENCODE ATAC peaks

Finding motif positions

Finding motif positions

Finding motif positions

Finding motif hits

Finding motif hits

Finding motif hits

Finding motif hits

summarizing ATAC signal to Motifs

summarizing ATAC signal to Motifs

Setting up for chromVar

Correct GC bias

ID motifs in peaks

Run chromVar

chromVar deviations

chromVar variability

chromVar results

chromVar results

Identifying Motifs de novo in ATAC

De novo motif discovery

Differential peaks

Differential peaks

Resize peaks

Peak sequences

Meme-ChIP

Using De novo motifs

Importing motifs

Importing motifs

Time for an exercise!

Answers to exercise