We have been working to process and a characterize developmental changes in the context of the TF Sox9 using data from the Fuchs lab: The pioneer factor SOX9 competes for epigenetic factors to switch stem cell fates
Once we have identified regions of interest from our ATAC or Cut&Run often the next step is to investigate the motifs enriched under peaks. Motif analysis like this can help find the drivers of epigenomic changes and help create a more mechanistic understanding of your experiment.
For Cut&Run with a known transcription factor this kind of analysis may be less obvious as we have an expected target i.e. Sox9. That said it is still useful to validate our IP, find cofactors, indirect effects and also find specific motif variants.
Bioconductor provides two major sources of motifs as database packages.
These include:
The MotifDB package collects motif information from a wide range of sources and stores them in a DB object for use with other Bioconductor packages.
## MotifDb object of length 12657
## | Created from downloaded public sources, last update: 2022-Mar-04
## | 12657 position frequency matrices from 22 sources:
## | FlyFactorSurvey: 614
## | HOCOMOCOv10: 1066
## | HOCOMOCOv11-core-A: 181
## | HOCOMOCOv11-core-B: 84
## | HOCOMOCOv11-core-C: 135
## | HOCOMOCOv11-secondary-A: 46
## | HOCOMOCOv11-secondary-B: 19
## | HOCOMOCOv11-secondary-C: 13
## | HOCOMOCOv11-secondary-D: 290
## | HOMER: 332
## | JASPAR_2014: 592
## | JASPAR_CORE: 459
## | ScerTF: 196
## | SwissRegulon: 684
## | UniPROBE: 380
## | cisbp_1.02: 874
## | hPDI: 437
## | jaspar2016: 1209
## | jaspar2018: 1564
## | jaspar2022: 1956
## | jolma2013: 843
## | stamlab: 683
## | 62 organism/s
## | Hsapiens: 6075
## | Mmusculus: 1554
## | Dmelanogaster: 1437
## | Athaliana: 1371
## | Scerevisiae: 1221
## | NA: 184
## | other: 815
## Scerevisiae-ScerTF-ABF2-badis
## Scerevisiae-ScerTF-CAT8-badis
## Scerevisiae-ScerTF-CST6-badis
## Scerevisiae-ScerTF-ECM23-badis
## Scerevisiae-ScerTF-EDS1-badis
## ...
## Mmusculus-UniPROBE-Zfp740.UP00022
## Mmusculus-UniPROBE-Zic1.UP00102
## Mmusculus-UniPROBE-Zic2.UP00057
## Mmusculus-UniPROBE-Zic3.UP00006
## Mmusculus-UniPROBE-Zscan4.UP00026
MotifDb object is special class of object called a MotifList.
## [1] "MotifList"
## attr(,"package")
## [1] "MotifDb"
Like standard List objects we can use length and names to get some information on our object
## [1] 12657
## [1] "Scerevisiae-ScerTF-ABF2-badis" "Scerevisiae-ScerTF-CAT8-badis"
## [3] "Scerevisiae-ScerTF-CST6-badis" "Scerevisiae-ScerTF-ECM23-badis"
## [5] "Scerevisiae-ScerTF-EDS1-badis" "Scerevisiae-ScerTF-FKH2-badis"
## [7] "Scerevisiae-ScerTF-FZF1-badis" "Scerevisiae-ScerTF-GIS1-badis"
## [9] "Scerevisiae-ScerTF-GSM1-badis" "Scerevisiae-ScerTF-GZF3-badis"
We can also access information directly from our list using standard list accessors.
Here a [ will subset to a single MotifList. Now we can see the information held in the MotifList a little more clearly.
## MotifDb object of length 1
## | Created from downloaded public sources, last update: 2022-Mar-04
## | 1 position frequency matrices from 1 source:
## | ScerTF: 1
## | 1 organism/s
## | Scerevisiae: 1
## Scerevisiae-ScerTF-ABF2-badis
A [[ will subset to object within the element as with standard lists. Here we extract the position probability matrix.
## 1 2 3 4 5 6
## A 0.09 0.01 0.01 0.97 0.01 0.94
## C 0.09 0.97 0.01 0.01 0.01 0.02
## G 0.02 0.01 0.01 0.01 0.97 0.02
## T 0.80 0.01 0.97 0.01 0.01 0.02
## 1 2 3 4 5 6
## 1 1 1 1 1 1
We can extract a DataFrame of all the motif metadata information using the values() function.
## DataFrame with 2 rows and 15 columns
## providerName providerId dataSource geneSymbol
## <character> <character> <character> <character>
## Scerevisiae-ScerTF-ABF2-badis badis.ABF2 ABF2 ScerTF ABF2
## Scerevisiae-ScerTF-CAT8-badis badis.CAT8 CAT8 ScerTF CAT8
## geneId geneIdType proteinId proteinIdType
## <character> <character> <character> <character>
## Scerevisiae-ScerTF-ABF2-badis YMR072W SGD Q02486 UNIPROT
## Scerevisiae-ScerTF-CAT8-badis YMR280C SGD P39113 UNIPROT
## organism sequenceCount bindingSequence
## <character> <character> <character>
## Scerevisiae-ScerTF-ABF2-badis Scerevisiae NA NA
## Scerevisiae-ScerTF-CAT8-badis Scerevisiae NA NA
## bindingDomain tfFamily experimentType
## <character> <character> <character>
## Scerevisiae-ScerTF-ABF2-badis NA NA NA
## Scerevisiae-ScerTF-CAT8-badis NA NA NA
## pubmedID
## <character>
## Scerevisiae-ScerTF-ABF2-badis 19111667
## Scerevisiae-ScerTF-CAT8-badis 19111667
We can use the query function to subset our MotifList by infomation in the metadata.
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6084 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6140 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6281 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6598 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6622 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## MotifDb object of length 18
## | Created from downloaded public sources, last update: 2022-Mar-04
## | 18 position frequency matrices from 11 sources:
## | HOCOMOCOv10: 2
## | HOCOMOCOv11-core-B: 1
## | HOCOMOCOv11-secondary-B: 1
## | HOMER: 1
## | JASPAR_2014: 1
## | JASPAR_CORE: 1
## | SwissRegulon: 1
## | jaspar2016: 1
## | jaspar2018: 1
## | jaspar2022: 1
## | jolma2013: 7
## | 3 organism/s
## | Hsapiens: 16
## | Mmusculus: 1
## | other: 1
## Hsapiens-SwissRegulon-SOX9.SwissRegulon
## Hsapiens-HOCOMOCOv10-SOX9_HUMAN.H10MO.B
## Mmusculus-HOCOMOCOv10-SOX9_MOUSE.H10MO.B
## Hsapiens-HOCOMOCOv11-core-B-SOX9_HUMAN.H11MO.0.B
## Hsapiens-HOCOMOCOv11-secondary-B-SOX9_HUMAN.H11MO.1.B
## ...
## Hsapiens-jolma2013-SOX9-3
## Hsapiens-jolma2013-SOX9-4
## Hsapiens-jolma2013-SOX9-5
## Hsapiens-jolma2013-SOX9-6
## Hsapiens-jolma2013-SOX9-7
For more specific queries, multiple words can be used for filtering.
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6084 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6140 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6281 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6598 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6622 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6084 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6140 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6281 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6598 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6622 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6084 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6140 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6281 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6598 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6622 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## MotifDb object of length 1
## | Created from downloaded public sources, last update: 2022-Mar-04
## | 1 position frequency matrices from 1 source:
## | jaspar2022: 1
## | 1 organism/s
## | Hsapiens: 1
## Hsapiens-jaspar2022-SOX9-MA0077.1
The JASPAR packages are updated more frequently and so may contain motifs not characterized in the MotifDb package.
We can load the JASPAR package as usual. This is a little more complex then MotifDb, but ultimately we are connecting to a SQL database.
To interact with JASPAR package we will make use of the TFBSTools package from the same lab.
Whereas the JASPAR package holds the information on Motifs and Position Probability Matrices (PPMs), TFBSTools has the functionality to manipulate and interact with these tools.
Three useful functions available from TFBStools to interact with the JASPAR databases are the getMatrixSet, getMatrixByID and getMatrixByID.
The getMatrixByID and getMatrixByName take the JASPAR DB object and a JASPAR ID or transcription factor name respectively. If you are unsure of exact names/IDs you can review the Jaspar website first.
Here we are using the transcription factor SOX9 again. The result is a new object class PFMatrix.
## [1] "PFMatrix"
## attr(,"package")
## [1] "TFBSTools"
JASPAR IDs are unique, so we can use them to select the exact motif we wanted. Sox9 has two very subtlely different motifs
List accessors will not work here but we can retrieve names using the ID function.
## [1] "MA0077.2"
To get hold of the Position Frequency Matrix (PFM) we can use the Matrix or as.matrix functions.
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## A 3 8 71 2 7 2 13 9
## C 55 38 1 1 0 2 0 7
## G 9 6 0 2 4 72 4 6
## T 9 24 4 71 65 0 59 54
We can also use the getMatrixSet() function to retrieve sets of motifs. We can specify a list of options for motifs we want to retrieve.
To see the available filters use the help for getMatrixSet() function, ?getMatrixSet.
Here we retrieve the vertebrate, JASPAR CORE motifs. We additional specify all_versions is FALSE to only include the latest version of a motif.
The seqLogo package offers a simple and intuitive way of visualizing our base frequencies within out motifs using the seqLogo function.
A simple seqLogo shows the relative frequency of a base at each motif position by the relative size of the base compared to other bases at the same position.
To create a seqLogo we must supply a PPM to the seqLogo function. Here we grab a PPM straight from MotifdDb for Sox9.
Here we set the ic.scale to FALSE to show the probability across the Y-axis
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6084 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6140 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6281 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6598 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): input string 6622 is invalid
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
## Warning in grep(queryString, mcols(object)[, colname], ignore.case =
## ignore.case): unable to translate 'Three-zinc finger Kr<c3><bc>ppel-related
## factors' to a wide string
It may be useful to plot the probabilities as information content. Information content here will range from 0 to 2 Bits. Positions with equal probabilities for each base will score 0 and positions with only 1 possible base will score 2.
This allows you to quickly identify the important bases.
The TFBSTools package and JASPAR provide us with point frequency matrices which we can’t use directly in the seqLogo package
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## A 3 8 71 2 7 2 13 9
## C 55 38 1 1 0 2 0 7
## G 9 6 0 2 4 72 4 6
## T 9 24 4 71 65 0 59 54
We can convert our point frequency matrix to a point probabilty matrix by simply dividing columns by their sum.
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## A 0.03947368 0.10526316 0.93421053 0.02631579 0.09210526 0.02631579 0.17105263
## C 0.72368421 0.50000000 0.01315789 0.01315789 0.00000000 0.02631579 0.00000000
## G 0.11842105 0.07894737 0.00000000 0.02631579 0.05263158 0.94736842 0.05263158
## T 0.11842105 0.31578947 0.05263158 0.93421053 0.85526316 0.00000000 0.77631579
## [,8]
## A 0.11842105
## C 0.09210526
## G 0.07894737
## T 0.71052632
Fortunately, TFBSTools has its own version of seqLogo function we can use with one of its own ICMatrix classes. We simply need to convert our object with the toICM() function.
Motif enrichment can be performed using the MEME suite of tools. They have developed a huge collection of algorithms with situation specificity for finding and matching motifs or detecting differential enrichment of motifs.
You can directly run these tools sing the web interface (and we show you how to work with the results here), but in these sessions we will show you how to run this through R.
As we did for MACS, we can also install MEME from the Conda repository using the Herper package.
Now we have MEME installed, the next step is to grab some Motifs as our reference. We can follow similar steps to before.
We specifically will want to convert this to a new format: the universalmotif class. We can do this by grabbing our motifs like we did for TFBStools, then using the convert_motifs function. The result is a list of universalmotif PCMs.
library(universalmotif)
motifs <- getMatrixSet(sq24, opts)
motifs_um <- convert_motifs(motifs)
class(motifs_um)
## [1] "list"
## [1] "universalmotif"
## attr(,"package")
## [1] "universalmotif"
We need a collection of sequences that we are querying. In this case our query is: what motifs are enriched in our regulated regions at W6 time point. We can start by importing the ranges from the data folder UpinW6.bed.
Next we need to grab the corresponding sequence from under these ranges. Where you are grabbing sequences from needs to match your alignment reference. If you have updated your BSGenome or are using a different build you can grab the sequence ifnromation from any fasta using readDNAStringSet
The last thing we need is background sequences. Though you can use a random shuffled background it is often better to give something a little more representative of your data i.e. consensus peaks. Otherwise common motifs found in regions like promoters may seem enriched, even if they are not enriched in the specific promoters you care about.
Though there is no R pacakge for MEME, there is a wrapper package that will run MEME for you from within R. This is called memes. We just need to be able to tell memes where MEME is.
We will use the Ame algorithm (Analysis of Motif Enrichment) to look for enriched motifs.
NOTE: This will take some time. We will provide the result
Once this has run the MEME results are saved in the output directory we specified in the function. data/ame_diff. We can review the report to see our top hits.
The memes package also returns our results into R. We have saved the R result object data/ame_results.RData so you can load it in.
This object is a special kind of data frame called a tibble. This means we can use a combination of tidyverse and memes functions to extract our result and generate some nice plots.
## [1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
First lets look at our top 10 motifs associated with the W6 timepoint. They are predominately the SOX family, which is a good result.
We can then use what we know about making motifs to quickly grab our seqlogo.
top_res_motif <- getMatrixByID(sq24, ame_results$motif_alt_id[1])
top_res_motif_mat <- Matrix(top_res_motif)
ggseqlogo(top_res_motif_mat) + theme_minimal()
As you can see we have a high degree of redundancy in our results i.e. all the top hits are different Sox!
This can make it a little trickier to aprse your results. Two factors can help with this.
Filter out unexpressed genes. It is possible to use matching RNAseq (or even ATAC) to filter out TF motifs if they are not expressed. This will also boost your padj scores.
There are JASPAR familial profiles that contain a consensus motif for TF families. This will simplify the result, though it may blunt and obfuscate some findings.
Databases of motifs are a great resource, but sometimes they fall short or miss aditional biology for various reasons.
By doing de novo motif characterization you can account for this. The DREME algorithm from MEME is used for this.
We can run DREME (Discriminative Regular Expression Motif Elicitation) in a very similar way to AME, using the runDreme() function. The only change is we do not include a motif reference database.
As before, once this has run the MEME results are saved in the output directory we specified in the function. data/dreme_diff. We can review the report to see our top hits.
The memes package also returns our results into R. We have saved the R result object data/dreme_results.RData so you can load it in.
This object again is a special kind of data frame, but this time it is a universalmotif data frame as it has motifs stored inside.
## [1] "universalmotif_df" "data.frame"
DREME results will be sightly different to AME as we do not know the identity of the motif. But we can view the motifs easily using some universalmotif functions.
Once you have derived a list of de novo motifs, the next step is then to compare them to a list of known motifs. We can use the TomTom algorithm from MEME to do this.
The best_match_motif column contains all the top hits from TOMTOM for each de novo motif.
Though there is a top hit, each motif can have multiple hits. These are some of the many potential hits:
## [1] "SOX4" "Sox6" "SOX10" "SOX13" "SOX15"
## [6] "Sox3" "SOX2" "Sox11" "Sox5" "Sox7"
## [11] "Sox17" "Dmrt1" "Sox1" "SOX8" "POU2F1::SOX2"
## [16] "SOX18" "SOX14" "SOX21" "SOX9" "SRY"
## [21] "Hnf1A" "Lef1" "TCF7L1" "TCF7"
If you wanted to review these motifs you can use the view_tomtom_hits function. This allows you to compare the motifs easily to the original. Though our result is pretty clear, this is an important step as the “best match” is not always the right match. You need to make sure it amkes sense given your domain knowledge.
We now have a couple of tools in our toolkit to characterize which motifs are interesting to us. Often the next step is to actually determine where those motifs are.
Unsurprisingly, MEME also has a tool for this: FIMO. You can use the runFIMO() function to do this through memes.
There is also a nice alternative tool called motifmatchr, which we will show you as it is very quick.
To identify known motifs we will use the motifmatchr package which is a wrapper for the MOODS c++ library.
This means that motifmatchr offers us a fast methods to identify motifs within our data.
We will use motifmatchr/MOODs with a default p-value cut-off.
Again lets use the JASPAR database as our reference for this. For motifmatchr we want a PFMatrixList using TFBSTools.
The motifmatchr package main function is matchMotifs().
As with many Bioconductor functions , matchMotifs makes use of other Bioconductor objects such the BSGenome, GRanges and summarizedExperiment objects.
We can review the full functionality by reading the help for matchMotifs, ?matchMotifs.
motifmatchr needs some search space. Though we could run this genome-wide, for efficiency and speeds sake we normally just run it on regions of interest i.e. our W6-specific peaks.
Specifically we need the sequence associated with these peaks. Luckily we generated all this while running MEME.
## DNAStringSet object of length 4652:
## width seq names
## [1] 513 AAGGCCTGTCAAAAATCTTAAC...GGAGTTTGAGCCTTGAAAGCT peak_1
## [2] 1017 ATCCCCATGAATTTCACTACCT...GCTTGTGAGTAGAGATCTCCT peak_2
## [3] 448 TGAGGAGTCAATGTTTTTTAGC...CTATGCCTTTAACCCATGGAT peak_3
## [4] 744 AGACAAAATGAGCCCTAGTGGT...CCTGATAAACACTGCCTGTCT peak_4
## [5] 573 TGCCTCCTCTAAACTCCTCTGC...TCCAAAGGCATCTTCAGCTGT peak_5
## ... ... ...
## [4648] 595 AGAAACACAGTCGGACACAGGG...ACGGAGATGATTCACATCCCT peak_4648
## [4649] 350 ATGCAAACATACCTTCCCCCTA...AAAACAAAACACCCAAAACAT peak_4649
## [4650] 403 AGCAGCTGGAGAGAGAGAGATG...AAGCAATCATTCGTCTTCTCT peak_4650
## [4651] 383 AGGTTTGTGTGCAGGAAGAACT...AGCCATCGTGACTGTTGCCCA peak_4651
## [4652] 171 TAAGGTAGTCATTTTCTGGGAT...TGGTATGCATCCTAGTACACT peak_4652
As we saw in its help, The matchMotifs function can provide output of motif matches as matches, scores or positions.
Here we scan for the top 3 motifs from our AME results using the sequence under the first 100 peaks and specify the out as positions.
library(motifmatchr)
motifs_subset <- motifs[names(motifs) %in% ame_results$motif_alt_id[1:3]]
motif_positions <- matchMotifs(motifs_subset, peaksSequences[1:100], out = "positions")
class(motif_positions)
## [1] "list"
## [1] 3
The result contains a list of the same length as the number of motifs tested.
Each element contains a IRangeslist with an entry for every sequence tested and a IRanges of motif positions within the peak sequences.
## IRangesList object of length 100:
## [[1]]
## IRanges object with 0 ranges and 2 metadata columns:
## start end width | strand score
## <integer> <integer> <integer> | <character> <numeric>
##
## [[2]]
## IRanges object with 0 ranges and 2 metadata columns:
## start end width | strand score
## <integer> <integer> <integer> | <character> <numeric>
##
## [[3]]
## IRanges object with 1 range and 2 metadata columns:
## start end width | strand score
## <integer> <integer> <integer> | <character> <numeric>
## [1] 295 301 7 | + 11.582
##
## ...
## <97 more elements>
We can unlist our IRangeslist to a standard list for easier working.
MA1152.2hits <- motif_positions$MA1152.2
names(MA1152.2hits) <- names(peaksSequences[1:100])
unlist(MA1152.2hits, use.names = TRUE)
## IRanges object with 24 ranges and 2 metadata columns:
## start end width | strand score
## <integer> <integer> <integer> | <character> <numeric>
## peak_3 295 301 7 | + 11.582
## peak_7 87 93 7 | + 11.582
## peak_9 654 660 7 | - 11.582
## peak_14 313 319 7 | + 11.582
## peak_19 320 326 7 | + 11.582
## ... ... ... ... . ... ...
## peak_79 132 138 7 | + 11.582
## peak_89 164 170 7 | + 11.582
## peak_94 111 117 7 | - 11.582
## peak_94 186 192 7 | - 11.582
## peak_96 79 85 7 | + 11.582
Once motif instances have been identified they can then be used to generate metaplots like we saw in session 2 with soGGi, or we could use profileplyr.
These metaplots would show changes to accessibility/enrichment directly over the motifs. The only changes to what we should in session 2 would be to replace the TSS ranges with motif ranges.
We may simply want to map motifs to their peaks.
To do this we can set the out parameter to matches. This will return a SummarizedExperiment object.
## [1] "SummarizedExperiment"
## attr(,"package")
## [1] "SummarizedExperiment"
## class: SummarizedExperiment
## dim: 4652 879
## metadata(0):
## assays(1): motifMatches
## rownames: NULL
## rowData names(0):
## colnames(879): MA0004.1 MA0069.1 ... MA1602.2 MA1722.2
## colData names(1): name
We can retrieve a matrix of matches by motif and peak using the motifMatches function.
## [1] 4652 879
## 8 x 8 sparse Matrix of class "lgCMatrix"
## MA0004.1 MA0069.1 MA0071.1 MA0074.1 MA0101.1 MA0107.1 MA0111.1 MA0115.1
## [1,] . . . . . . . .
## [2,] . . | . . . | |
## [3,] . . . . . . . .
## [4,] . . | . . . . |
## [5,] | . . . | | . .
## [6,] . . . . . . . .
## [7,] . . . . . . | .
## [8,] . . . . | . . .
Although a sparse matrix, we can still use our matrix operations to extract useful information from this object.
We can use the colSums() to identify the total occurrence of motifs in our peak sequences.
Fittingly our top hit from AME has a high number of motifs.
## MA0004.1 MA0069.1 MA0071.1 MA0074.1
## 242 106 183 147
## MA1152.2
## 674
We can also identify peaks which contain a hit for a selected motif.
## GRanges object with 674 ranges and 2 metadata columns:
## seqnames ranges strand | name score
## <Rle> <IRanges> <Rle> | <character> <numeric>
## [1] chr7 114358259-114358706 * | <NA> 0
## [2] chr6 52684371-52684890 * | <NA> 0
## [3] chr6 134329482-134330318 * | <NA> 0
## [4] chr14 123530967-123531638 * | <NA> 0
## [5] chr10 24407882-24408394 * | <NA> 0
## ... ... ... ... . ... ...
## [670] chr14 70211676-70212181 * | <NA> 0
## [671] chr4 124400836-124401853 * | <NA> 0
## [672] chr14 78531771-78532053 * | <NA> 0
## [673] chr4 100174849-100175341 * | <NA> 0
## [674] chr15 91037086-91037371 * | <NA> 0
## -------
## seqinfo: 20 sequences from an unspecified genome; no seqlengths
We have focussed primarily on using MEME to conduct our analysis, there are several other popular tools.
Exercise on functions can be found here
Answers can be found here here