Epigenomics, Session 6

class: middle, inverse, title-slide

.title[
# Epigenomics, Session 6
]
.subtitle[
## <html><br />
<br />
<hr color='#EB811B' size=1px width=796px><br />
</html><br />
Bioinformatics Resource Center - Rockefeller University
]
.author[
### <a href="http://rockefelleruniversity.github.io/ATAC.Cut-Run.ChIP/" class="uri">http://rockefelleruniversity.github.io/ATAC.Cut-Run.ChIP/</a>
]
.author[
### <a href="mailto:brc@rockefeller.edu" class="email">brc@rockefeller.edu</a>
]

---

##  Recap

1) Fastq QC
2) Alignment
3) Peak calling
4) Technique QC
5) Consensus Building
6) Counting
7) Differentials
8) Annotation and enrichment

---
##  This Session

- [Motif Databases](https://rockefelleruniversity.github.io/Intro_To_R_1Day/r_course/presentations/singlepage/introToR_Session1.html#motif-databases)
- [Visualizing motifs](https://rockefelleruniversity.github.io/Intro_To_R_1Day/r_course/presentations/singlepage/introToR_Session1.html#visualizing-motifs)
- [Motif enrichment analysis](https://rockefelleruniversity.github.io/Intro_To_R_1Day/r_course/presentations/singlepage/introToR_Session1.html#motif-enrichment-analysis)
- [De novo motif analysis](https://rockefelleruniversity.github.io/Intro_To_R_1Day/r_course/presentations/singlepage/introToR_Session1.html#de-novo-motif-analysis)
- [Finding Motifs](https://rockefelleruniversity.github.io/Intro_To_R_1Day/r_course/presentations/singlepage/introToR_Session1.html#finding-motifs)

---
## Our Data

We have been processing and characterizing developmental changes in the context of the transcription factor Sox9, using data from the Fuchs lab: [*The pioneer factor SOX9 competes for epigenetic factors to switch stem cell fates*](https://www.nature.com/articles/s41556-023-01184-y)

---
## Motifs

Once we have identified regions of interest from our ATAC or Cut&Run data, a common next step is to investigate which motifs are enriched under the peaks.
Motif analysis like this can help identify the drivers of epigenomic changes and build a more mechanistic understanding of your experiment.

For Cut&Run against a known transcription factor, the value of this analysis may seem less obvious, since we already expect a particular motif, i.e. that of Sox9. It is still useful, though: it can validate our IP, reveal cofactors and indirect binding, and identify specific motif variants.

---
class: inverse, center, middle

# Motif Databases

---

## Known Motif sources

Bioconductor provides two major sources of motifs as database packages.

These include:

* The [MotifDb](https://www.bioconductor.org/packages/release/bioc/html/MotifDb.html) package.
* The JASPAR databases, with [JASPAR2024](https://www.bioconductor.org/packages/release/data/annotation/html/JASPAR2024.html) being the latest (a new release comes out every two years).

---
## MotifDb

The MotifDb package collects motifs from a wide range of sources and stores them, together with their metadata, in a single object for use with other Bioconductor packages.

``` r
library(MotifDb)
```

```
## See system.file("LICENSE", package="MotifDb") for use restrictions.
```

``` r
MotifDb
```

---
## MotifDb

The MotifDb object is a special class of object called a **MotifList**.

``` r
class(MotifDb)
```

```
## [1] "MotifList"
## attr(,"package")
## [1] "MotifDb"
```

---
## MotifDb

As with standard list objects, we can use **length()** and **names()** to get some basic information about our object.

``` r
length(MotifDb)
```

```
## [1] 12657
```

``` r
MotifNames <- names(MotifDb)
MotifNames[1:10]
```

```
##  [1] "Scerevisiae-ScerTF-ABF2-badis"  "Scerevisiae-ScerTF-CAT8-badis" 
##  [3] "Scerevisiae-ScerTF-CST6-badis"  "Scerevisiae-ScerTF-ECM23-badis"
##  [5] "Scerevisiae-ScerTF-EDS1-badis"  "Scerevisiae-ScerTF-FKH2-badis" 
##  [7] "Scerevisiae-ScerTF-FZF1-badis"  "Scerevisiae-ScerTF-GIS1-badis" 
##  [9] "Scerevisiae-ScerTF-GSM1-badis"  "Scerevisiae-ScerTF-GZF3-badis"
```

---
## Accessing MotifDb contents

We can also access information directly from our list using standard list accessors.

Here, **[** subsets to a MotifList containing a single motif, which lets us see the information held in the MotifList a little more clearly.

``` r
MotifDb[1]
```
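A MotifList follows the same **[** versus **[[** behaviour as an ordinary R list. As a rough illustration in base R only, the sketch below uses a small made-up named list of matrices in place of a MotifList; `motifs`, `motifA` and `motifB` are hypothetical names, not MotifDb entries.

``` r
# Hypothetical stand-in for a MotifList: a named list of 4-row base matrices
motifs <- list(
  motifA = matrix(c(0.7, 0.1, 0.1, 0.1,
                    0.1, 0.7, 0.1, 0.1), nrow = 4,
                  dimnames = list(c("A", "C", "G", "T"), NULL)),
  motifB = matrix(c(0.25, 0.25, 0.25, 0.25), nrow = 4,
                  dimnames = list(c("A", "C", "G", "T"), NULL))
)

# `[` keeps the container: the result is still a list, now of length 1
sub <- motifs[1]
class(sub)   # "list"
names(sub)   # "motifA"

# `[[` drops the container and returns the element itself, here a matrix
pfm <- motifs[[1]]
dim(pfm)     # 4 rows (bases) by 2 columns (positions)
```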

---
## Accessing MotifDb contents

As with standard lists, **[[** extracts the object stored within an element. Here we extract the position probability matrix, in which each column holds the probabilities of A, C, G and T at one position of the motif.

``` r
MotifDb[[1]]
```

```
##      1    2    3    4    5    6
## A 0.09 0.01 0.01 0.97 0.01 0.94
## C 0.09 0.97 0.01 0.01 0.01 0.02
## G 0.02 0.01 0.01 0.01 0.97 0.02
## T 0.80 0.01 0.97 0.01 0.01 0.02
```

``` r
colSums(MotifDb[[1]])
```

```
## 1 2 3 4 5 6 
## 1 1 1 1 1 1
```
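Columns sum to 1 because each one is a probability distribution over the four bases. Motifs are often first recorded as counts of each base at each position (a position frequency matrix); a minimal base R sketch with a made-up count matrix shows how dividing each column by its total gives a probability matrix, and how a consensus sequence can be read off it. The matrix values here are hypothetical, not taken from MotifDb.

``` r
# Hypothetical position frequency matrix: base counts at 4 positions
# across 10 aligned binding sites (made-up numbers)
pfm <- matrix(c(8, 1, 1, 0,
                0, 9, 0, 1,
                1, 0, 0, 9,
                9, 0, 1, 0),
              nrow = 4,
              dimnames = list(c("A", "C", "G", "T"), 1:4))

# Divide each column by its total to get a position probability matrix
ppm <- sweep(pfm, 2, colSums(pfm), "/")

# Every column now sums to 1, as for MotifDb[[1]] above
colSums(ppm)

# Consensus: the most probable base at each position
consensus <- paste(rownames(ppm)[apply(ppm, 2, which.max)], collapse = "")
consensus
```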

---
## Accessing MotifDb contents

We can extract a DataFrame of all of the motif metadata using the **values()** function.

``` r
values(MotifDb)[1:2, ]
```

```
## DataFrame with 2 rows and 15 columns
##                               providerName  providerId  dataSource  geneSymbol
##                                <character> <character> <character> <character>
## Scerevisiae-ScerTF-ABF2-badis   badis.ABF2        ABF2      ScerTF        ABF2
## Scerevisiae-ScerTF-CAT8-badis   badis.CAT8        CAT8      ScerTF        CAT8
##                                    geneId  geneIdType   proteinId proteinIdType
##                               <character> <character> <character>   <character>
## Scerevisiae-ScerTF-ABF2-badis     YMR072W         SGD      Q02486       UNIPROT
## Scerevisiae-ScerTF-CAT8-badis     YMR280C         SGD      P39113       UNIPROT
##                                  organism sequenceCount bindingSequence
##                               <character>   <character>     <character>
## Scerevisiae-ScerTF-ABF2-badis Scerevisiae            NA              NA
## Scerevisiae-ScerTF-CAT8-badis Scerevisiae            NA              NA
##                               bindingDomain    tfFamily experimentType
##                                 <character> <character>    <character>
## Scerevisiae-ScerTF-ABF2-badis            NA          NA             NA
## Scerevisiae-ScerTF-CAT8-badis            NA          NA             NA
##                                  pubmedID
##                               <character>
## Scerevisiae-ScerTF-ABF2-badis    19111667
## Scerevisiae-ScerTF-CAT8-badis    19111667
```

---
## Accessing MotifDb contents

We can use the **query** function to subset our MotifList by information in the metadata.

``` r
Sox9Motifs <- query(MotifDb, "Sox9")
```

``` r
Sox9Motifs
```

```
## MotifDb object of length 18
## | Created from downloaded public sources, last update: 2022-Mar-04
## | 18 position frequency matrices from 11 sources:
## |        HOCOMOCOv10:    2
## | HOCOMOCOv11-core-B:    1
## | HOCOMOCOv11-secondary-B:    1
## |              HOMER:    1
## |        JASPAR_2014:    1
## |        JASPAR_CORE:    1
## |       SwissRegulon:    1
## |         jaspar2016:    1
## |         jaspar2018:    1
## |         jaspar2022:    1
## |          jolma2013:    7
## | 3 organism/s
## |           Hsapiens:   16
## |          Mmusculus:    1
## |              other:    1
## Hsapiens-SwissRegulon-SOX9.SwissRegulon 
## Hsapiens-HOCOMOCOv10-SOX9_HUMAN.H10MO.B 
## Mmusculus-HOCOMOCOv10-SOX9_MOUSE.H10MO.B 
## Hsapiens-HOCOMOCOv11-core-B-SOX9_HUMAN.H11MO.0.B 
## Hsapiens-HOCOMOCOv11-secondary-B-SOX9_HUMAN.H11MO.1.B 
## ...
## Hsapiens-jolma2013-SOX9-3 
## Hsapiens-jolma2013-SOX9-4 
## Hsapiens-jolma2013-SOX9-5 
## Hsapiens-jolma2013-SOX9-6 
## Hsapiens-jolma2013-SOX9-7
```
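As far as we can tell, **query** searches each metadata column for the supplied string using a case-insensitive pattern match (it calls `grep()` internally). The rough base R sketch below imitates that idea on a tiny made-up metadata table; `meta` and `match_any_column()` are hypothetical names, not part of MotifDb.

``` r
# Hypothetical metadata table, with a few columns named like values(MotifDb)
meta <- data.frame(
  geneSymbol = c("SOX9", "Sox9", "ABF2", "FOXA1"),
  organism   = c("Hsapiens", "Mmusculus", "Scerevisiae", "Hsapiens"),
  dataSource = c("jaspar2022", "HOCOMOCOv10", "ScerTF", "jaspar2022"),
  row.names  = c("m1", "m2", "m3", "m4")
)

# TRUE for each row where `term` matches (ignoring case) in any column
match_any_column <- function(meta, term) {
  per_column <- lapply(meta, function(col) grepl(term, col, ignore.case = TRUE))
  Reduce(`|`, per_column)
}

sox9_rows <- meta[match_any_column(meta, "sox9"), ]
rownames(sox9_rows)  # "m1" "m2"
```

Because this is a pattern match rather than an exact match, a term also matches any longer string that contains it, so it is worth checking what comes back.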

---
## Accessing MotifDb contents

For more specific queries, we can supply multiple search terms; only motifs whose metadata match all of the terms are returned.

``` r
Sox9Motifs <- query(MotifDb, c("Sox9", "hsapiens", "jaspar2022"))
```
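To see why adding terms narrows the result, here is a rough base R sketch of that behaviour on a made-up metadata table: a row is kept only if every term matches, ignoring case, somewhere in its metadata. `meta` and `match_all_terms()` are hypothetical names for illustration, not MotifDb functions.

``` r
# Hypothetical metadata table, with a few columns named like values(MotifDb)
meta <- data.frame(
  geneSymbol = c("SOX9", "Sox9", "SOX9", "FOXA1"),
  organism   = c("Hsapiens", "Mmusculus", "Hsapiens", "Hsapiens"),
  dataSource = c("jaspar2022", "HOCOMOCOv10", "HOCOMOCOv10", "jaspar2022"),
  row.names  = c("m1", "m2", "m3", "m4")
)

# TRUE for rows where EVERY term matches (ignoring case) in at least one column
match_all_terms <- function(meta, terms) {
  per_term <- lapply(terms, function(term) {
    Reduce(`|`, lapply(meta, function(col) grepl(term, col, ignore.case = TRUE)))
  })
  Reduce(`&`, per_term)
}

hits <- meta[match_all_terms(meta, c("Sox9", "hsapiens", "jaspar2022")), ]
rownames(hits)  # "m1"
```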