In this practical we will investigate CTCF ChIP-seq data for the MEL and Ch12 cell lines.
Peak calls can be found in data/CTCFpeaks/.
I have also pre-counted CTCF signal in high-confidence peaks and saved the results as CTCFcounts in an RData file called CTCFcounts.RData.
Information and raw/processed files can be found for the Ch12 cell line here and for the MEL cell line here.
## [1] "data/CTCFpeaks//CTCF_Ch12_1_peaks.xls" "data/CTCFpeaks//CTCF_Ch12_2_peaks.xls"
## [3] "data/CTCFpeaks//CTCF_MEL_1_peaks.xls" "data/CTCFpeaks//CTCF_MEL_2_peaks.xls"
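The listing above can be turned into one set of genomic ranges per sample. The following is a minimal sketch, assuming the `_peaks.xls` files are MACS2 output (tab-delimited, `#` comment lines, columns named `chr`, `start`, `end` and `fold_enrichment`); the helper `macsToGRanges` and the object `peakList` are hypothetical names introduced for illustration.

```r
library(GenomicRanges)

# Hypothetical helper: read one MACS2-style *_peaks.xls file into a GRanges.
# Assumes a tab-delimited file with '#' comment lines and the columns
# chr, start, end and fold_enrichment.
macsToGRanges <- function(path) {
  peaks <- read.delim(path, comment.char = "#")
  GRanges(peaks$chr,
          IRanges(peaks$start, peaks$end),
          fold_enrichment = peaks$fold_enrichment)
}

# List the peak files and read each one
peakFiles <- dir("data/CTCFpeaks/", pattern = "\\.xls$", full.names = TRUE)
peakList <- lapply(peakFiles, macsToGRanges)

# Name each element after its sample, e.g. "CTCF_Ch12_1"
names(peakList) <- gsub("_peaks\\.xls$", "", basename(peakFiles))
```

Keeping the samples in a named list makes it straightforward to loop over replicates in the later steps.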
Extract the peaks common to all replicates and cell lines.
Annotate the common peaks to mm10 genes (TSS +/- 500 bp) using ChIPseeker and create an UpSet plot of the annotation.
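One possible way to find common peaks is to merge the peaks from every sample into a non-redundant set and keep the merged regions that overlap a peak in all samples. The sketch below uses small made-up ranges rather than the real peak calls, and the names `peakList`, `allPeaks`, `overlapMat` and `commonPeaks` are illustrative assumptions.

```r
library(GenomicRanges)

# Toy peak sets standing in for the four real samples
peakList <- list(
  Ch12_1 = GRanges("chr1", IRanges(c(100, 500), width = 100)),
  Ch12_2 = GRanges("chr1", IRanges(c(120, 900), width = 100)),
  MEL_1  = GRanges("chr1", IRanges(c(90, 510), width = 100)),
  MEL_2  = GRanges("chr1", IRanges(c(110, 905), width = 100))
)

# Merge overlapping peaks from all samples into one non-redundant set
allPeaks <- reduce(unlist(GRangesList(peakList)))

# Logical matrix: one row per merged peak, one column per sample
overlapMat <- sapply(peakList, function(x) allPeaks %over% x)

# Keep merged peaks that overlap a peak in every sample
commonPeaks <- allPeaks[rowSums(overlapMat) == length(peakList)]
```

In this toy example only the first merged region is covered by all four samples, so it is the only one retained.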
## >> preparing features information... 2025-06-05 20:19:54
## >> identifying nearest features... 2025-06-05 20:19:54
## >> calculating distance from peak to TSS... 2025-06-05 20:19:55
## >> assigning genomic annotation... 2025-06-05 20:19:55
## >> adding gene annotation... 2025-06-05 20:19:57
## 'select()' returned 1:many mapping between keys and columns
## >> assigning chromosome lengths 2025-06-05 20:19:58
## >> done... 2025-06-05 20:19:58
## Warning: Removed 121 rows containing non-finite outside the scale range (`stat_count()`).
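Output like that shown above could be produced by a call along the following lines. This is a sketch only: it assumes the `TxDb.Mmusculus.UCSC.mm10.knownGene` and `org.Mm.eg.db` packages are installed, and it uses a few made-up chr1 coordinates as a stand-in for the real common peak set.

```r
library(GenomicRanges)
library(ChIPseeker)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)

# Toy stand-in for the common peaks (coordinates are illustrative only)
commonPeaks <- GRanges("chr1",
                       IRanges(start = c(4807800, 4857800, 5000000, 6214000),
                               width = 200))

# Annotate peaks to mm10 genes, defining the TSS region as +/- 500 bp
peakAnno <- annotatePeak(commonPeaks,
                         tssRegion = c(-500, 500),
                         TxDb = TxDb.Mmusculus.UCSC.mm10.knownGene,
                         annoDb = "org.Mm.eg.db")

# UpSet plot of the overlapping annotation categories
upsetplot(peakAnno)
```

With the real peak set, `commonPeaks` would be the ranges extracted in the previous step rather than the toy ranges defined here.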
5 Resize the common peaks to 100 bp around their geometric centre, extract the sequence under each region to a FASTA file and submit it to MEME-ChIP. To save time, randomly sample 1000 sequences to submit.
6 Load the CTCF counts, identify peaks with higher signal in the Ch12 cell line (padj < 0.05, log2FoldChange > 3) and create an HTML report with tracktables.
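The resizing, sampling and sequence extraction could look roughly like this. It is a sketch under assumptions: the `BSgenome.Mmusculus.UCSC.mm10` package is installed, the toy ranges stand in for the real common peaks, and the output file name is hypothetical. Submission to MEME-ChIP itself is not shown.

```r
library(GenomicRanges)
library(Biostrings)
library(BSgenome.Mmusculus.UCSC.mm10)

# Toy stand-in for the common peaks (coordinates are illustrative only)
commonPeaks <- GRanges("chr1",
                       IRanges(start = c(4807800, 4857800, 5000000),
                               width = c(250, 400, 180)))

# Resize every peak to 100 bp, anchored on its centre
centred <- resize(commonPeaks, width = 100, fix = "center")

# Randomly sample up to 1000 regions
set.seed(1)
nSample <- min(1000, length(centred))
sampled <- centred[sample(length(centred), nSample)]

# Extract the genomic sequence under each region and name it by its coordinates
seqs <- getSeq(BSgenome.Mmusculus.UCSC.mm10, sampled)
names(seqs) <- as.character(sampled)

# Write to FASTA (hypothetical file name)
fastaFile <- "commonPeaks_100bp.fa"
writeXStringSet(seqs, fastaFile)
```

Setting a seed before sampling keeps the selected subset reproducible between runs.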
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in design formula
## are characters, converting to factors
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## [1] "/Users/thomascarroll/Github/RU_ChIPseq/extdata/CTCF_UpinCh12.html"
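The differential signal test can be sketched with DESeq2 as below. Because the structure of the CTCFcounts object is not shown here, the example simulates a small count matrix instead; the sample names, the `CellLine` column and the object names are assumptions, and the tracktables report step is left out.

```r
library(DESeq2)

# Simulated counts standing in for the real CTCFcounts data
set.seed(42)
nPeaks <- 500
sampleNames <- c("Ch12_1", "Ch12_2", "MEL_1", "MEL_2")
counts <- matrix(rnbinom(nPeaks * 4, mu = 200, size = 10), ncol = 4,
                 dimnames = list(paste0("peak", seq_len(nPeaks)), sampleNames))

# Make the first 20 toy peaks much stronger in the two Ch12 samples
counts[1:20, 1:2] <- counts[1:20, 1:2] * 32L

# Sample metadata; using a factor avoids the character-to-factor warning
colData <- data.frame(
  CellLine = factor(c("Ch12", "Ch12", "MEL", "MEL"), levels = c("MEL", "Ch12")),
  row.names = sampleNames
)

dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = colData,
                              design = ~ CellLine)
dds <- DESeq(dds)

# Ch12 versus MEL: positive log2FoldChange means higher signal in Ch12
res <- results(dds, contrast = c("CellLine", "Ch12", "MEL"))

# which() drops peaks whose padj is NA
keep <- which(res$padj < 0.05 & res$log2FoldChange > 3)
upInCh12 <- res[keep, ]
```

Stating the contrast explicitly makes the direction of the fold change unambiguous, whatever the factor level order.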
7 Using rGREAT, identify MSigDB pathways enriched among the target genes of the CTCF peaks that are significantly higher in Ch12.
## Warning in submitGreatJob(UpinCh12, species = "mm10", request_interval = 1, : GREAT gives a warning:
## Your set hits a large fraction of the genes in the genome, which often does not
## work well with the GREAT Significant by Both view due to a saturation of the
## gene-based hypergeometric test.
##
## See our tips for handling large datasets or try the Significant By Region-based
## Binomial view.
## The default enrichment table does not contain informatin of associated genes for
## each input region. You can set `download_by = 'tsv'` to download the complete
## table, but note only the top 500 regions can be retreived. See the following
## link:
##
## https://great-help.atlassian.net/wiki/spaces/GREAT/pages/655401/Export#Export-GlobalExport
##
## Except the additional gene-region association column if taking 'tsv' as the
## source of result, all other columns are the same if you choose 'json' (the
## default) as the source. Or you can try the local GREAT analysis with the function
## `great()`.
## [1] "PANTHER Pathway" "BioCyc Pathway" "MSigDB Pathway"