ChIPseq with replicates

In this practical we will investigate CTCF ChIPseq data for the MEL and Ch12 cell lines.

Peak calls can be found in data/CTCFpeaks/.

I have also pre-counted CTCF signal in high-confidence peaks and saved the results as CTCFcounts in an RData file called CTCFcounts.RData.

Information and raw/processed files can be found for the Ch12 cell line here and for the MEL cell line here.

  1. Load the CTCF peaks into a GRangesList object.
## [1] "data/CTCFpeaks//CTCF_Ch12_1_peaks.xls" "data/CTCFpeaks//CTCF_Ch12_2_peaks.xls"
## [3] "data/CTCFpeaks//CTCF_MEL_1_peaks.xls"  "data/CTCFpeaks//CTCF_MEL_2_peaks.xls"
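
The practical itself is done in R/Bioconductor. For example, you could read each file with `read.delim(..., comment.char = "#")`, convert it with `makeGRangesFromDataFrame()`, and wrap the results in `GRangesList()`. The sketch below is only a dependency-free Python illustration of the same parsing logic. It assumes the files are MACS-style tab-separated `*_peaks.xls` tables with `#` comment lines and a header row containing `chr`, `start` and `end`. The helper names and the toy file contents are hypothetical.

```python
import io
import os


def sample_name(path):
    """Derive a sample label such as 'CTCF_Ch12_1' from a *_peaks.xls path."""
    return os.path.basename(path).replace("_peaks.xls", "")


def read_macs_xls(handle):
    """Parse a MACS-style *_peaks.xls table into (chrom, start, end) tuples.

    Assumes tab-separated text, '#' comment lines, blank lines to skip, and a
    header row with 'chr', 'start' and 'end' columns (coordinates kept as given).
    """
    header = None
    peaks = []
    for line in handle:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip MACS comment lines and blank lines
        fields = line.split("\t")
        if header is None:
            header = fields  # first real line holds the column names
            continue
        row = dict(zip(header, fields))
        peaks.append((row["chr"], int(row["start"]), int(row["end"])))
    return peaks


def read_peak_sets(paths_to_handles):
    """Map sample name -> peak list, a plain-Python stand-in for a GRangesList."""
    return {sample_name(path): read_macs_xls(handle)
            for path, handle in paths_to_handles.items()}


# Toy usage: an in-memory file replaces the real data/CTCFpeaks/ files.
toy_xls = (
    "# This file is generated by MACS\n"
    "\n"
    "chr\tstart\tend\tlength\n"
    "chr1\t101\t300\t200\n"
    "chr2\t51\t150\t100\n"
)
peak_sets = read_peak_sets(
    {"data/CTCFpeaks//CTCF_Ch12_1_peaks.xls": io.StringIO(toy_xls)}
)
print(peak_sets)
```
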
  2. Create a bar chart of the number of peaks in each sample.
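
The number of peaks per sample is just the length of each sample's peak set. In R, `lengths()` on the GRangesList gives this directly. The minimal Python sketch below takes a dict of sample name -> peak list and renders the counts as a plain-text bar chart. The toy peak numbers and function names are hypothetical.

```python
def peak_counts(peak_sets):
    """Number of peaks per sample, in the samples' original order."""
    return {name: len(peaks) for name, peaks in peak_sets.items()}


def text_bar_chart(counts, width=40):
    """Render counts as horizontal text bars scaled to the largest count."""
    largest = max(counts.values())
    label_width = max(len(name) for name in counts)
    rows = []
    for name, n in counts.items():
        # Scale each bar relative to the largest sample; avoid dividing by zero.
        bar = "#" * round(width * n / largest) if largest else ""
        rows.append(f"{name.ljust(label_width)} | {bar} {n}")
    return "\n".join(rows)


# Toy peak sets with hypothetical numbers of peaks per sample.
toy_sets = {
    "CTCF_Ch12_1": [("chr1", 1, 10)] * 4,
    "CTCF_MEL_1": [("chr1", 1, 10)] * 2,
}
counts = peak_counts(toy_sets)
print(text_bar_chart(counts, width=8))
```
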

  3. Extract the peaks common to all replicates and cell lines.
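
One common way to define consensus peaks, and presumably close to what this step intends, is a three-stage procedure:

- pool all peaks;
- merge overlapping ones into a non-redundant set (`reduce(unlist(peaks))` in GenomicRanges);
- keep the merged regions that overlap a peak in every sample (for example via `%over%`).

The Python sketch below shows that logic on hypothetical toy peaks, using 1-based closed coordinates as GRanges does. It is a quadratic illustration, not a replacement for GenomicRanges.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or book-ended closed (chrom, start, end) intervals per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged = []
    for chrom in sorted(by_chrom):
        cur_start = cur_end = None
        for start, end in sorted(by_chrom[chrom]):
            if cur_end is not None and start <= cur_end + 1:
                cur_end = max(cur_end, end)  # extend the current merged region
            else:
                if cur_end is not None:
                    merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end  # start a new merged region
        if cur_end is not None:
            merged.append((chrom, cur_start, cur_end))
    return merged


def overlaps_any(region, peaks):
    """True if the closed interval `region` overlaps at least one peak."""
    chrom, start, end = region
    return any(c == chrom and s <= end and start <= e for c, s, e in peaks)


def common_peaks(peak_sets):
    """Merged regions that overlap a peak in every sample."""
    pooled = [peak for peaks in peak_sets.values() for peak in peaks]
    return [region for region in merge_intervals(pooled)
            if all(overlaps_any(region, peaks) for peaks in peak_sets.values())]


# Hypothetical toy peaks for the four samples.
toy_sets = {
    "CTCF_Ch12_1": [("chr1", 100, 200), ("chr1", 1000, 1100)],
    "CTCF_Ch12_2": [("chr1", 150, 250)],
    "CTCF_MEL_1": [("chr1", 180, 220), ("chr2", 10, 20)],
    "CTCF_MEL_2": [("chr1", 90, 160)],
}
print(common_peaks(toy_sets))
```
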

  4. Annotate the common peaks to mm10 genes (TSS +/- 500 bp) using ChIPseeker and create an UpSet plot of the annotation.

## >> preparing features information...      2025-06-05 20:19:54 
## >> identifying nearest features...        2025-06-05 20:19:54 
## >> calculating distance from peak to TSS...   2025-06-05 20:19:55 
## >> assigning genomic annotation...        2025-06-05 20:19:55 
## >> adding gene annotation...          2025-06-05 20:19:57
## 'select()' returned 1:many mapping between keys and columns
## >> assigning chromosome lengths           2025-06-05 20:19:58 
## >> done...                    2025-06-05 20:19:58
## Warning: Removed 121 rows containing non-finite outside the scale range (`stat_count()`).
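
ChIPseeker does the real annotation, for example `annotatePeak(peaks, TxDb = TxDb.Mmusculus.UCSC.mm10.knownGene, tssRegion = c(-500, 500))` followed by `upsetplot()`. It assigns richer categories than shown here (promoter, UTRs, exons, introns, downstream, intergenic).

The toy Python sketch below only illustrates the underlying idea with two simplified categories:

- "Promoter": the peak overlaps a TSS +/- 500 bp window.
- "Genic": the peak overlaps a gene body.

Peaks matching neither are labelled "Intergenic". Because a peak can fall into several categories at once, an UpSet plot counts peaks per exact combination, which is what `upset_counts` returns. The gene and peak coordinates and the function names are hypothetical.

```python
from collections import Counter


def tss(gene):
    """Transcription start site of a (chrom, start, end, strand) gene."""
    _, start, end, strand = gene
    return start if strand == "+" else end


def peak_categories(peak, genes, flank=500):
    """Set of simplified annotation categories hit by a closed (chrom, start, end) peak."""
    chrom, start, end = peak
    categories = set()
    for gene in genes:
        g_chrom, g_start, g_end, _ = gene
        if g_chrom != chrom:
            continue
        site = tss(gene)
        if start <= site + flank and site - flank <= end:
            categories.add("Promoter")  # overlaps TSS +/- flank
        if start <= g_end and g_start <= end:
            categories.add("Genic")  # overlaps the gene body
    return frozenset(categories) or frozenset({"Intergenic"})


def upset_counts(peaks, genes, flank=500):
    """Count peaks per exact combination of categories, the quantity an UpSet plot shows."""
    return Counter(peak_categories(peak, genes, flank) for peak in peaks)


toy_genes = [("chr1", 1000, 5000, "+"), ("chr1", 20000, 30000, "-")]
toy_peaks = [("chr1", 700, 800), ("chr1", 1200, 1300), ("chr1", 3000, 3100),
             ("chr1", 10000, 10100), ("chr1", 29800, 29900)]
for combo, n in upset_counts(toy_peaks, toy_genes).items():
    print(sorted(combo), n)
```
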

  5. Resize the common peaks to 100 bp around their geometric centre, extract the sequences under these regions to a FASTA file and submit them to MEME-ChIP. To save time, randomly sample 1000 sequences to submit.
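
In R this step would typically combine three calls: `resize(peaks, width = 100, fix = "center")`, `getSeq()` on the `BSgenome.Mmusculus.UCSC.mm10` genome, and `writeXStringSet()` to write the FASTA. The minimal Python sketch below shows the same three operations (centre-resize, reproducible random subsample, FASTA export) on a hypothetical 400 bp toy chromosome. Its rounding for odd widths may differ slightly from GenomicRanges' `resize`, and it does not trim regions that run off a chromosome end.

```python
import random


def centre_resize(peak, width=100):
    """Resize a closed (chrom, start, end) interval to `width` bp around its midpoint.

    Near chromosome starts this can return start < 1; a real pipeline would
    need to trim or drop such regions.
    """
    chrom, start, end = peak
    old_width = end - start + 1
    new_start = start + (old_width - width) // 2
    return (chrom, new_start, new_start + width - 1)


def sample_regions(regions, n=1000, seed=1):
    """Randomly sample up to n regions with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return rng.sample(regions, min(n, len(regions)))


def to_fasta(regions, genome):
    """FASTA text for closed 1-based regions, slicing sequences from a dict genome."""
    lines = []
    for chrom, start, end in regions:
        lines.append(f">{chrom}:{start}-{end}")
        lines.append(genome[chrom][start - 1:end])
    return "\n".join(lines) + "\n"


toy_genome = {"chr1": "ACGT" * 100}  # hypothetical 400 bp chromosome
toy_common = [("chr1", 101, 300), ("chr1", 201, 310)]
centred = [centre_resize(p) for p in toy_common]
fasta = to_fasta(sample_regions(centred, n=1000), toy_genome)
print(fasta)
```
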

  6. Load the CTCF counts, identify peaks with higher signal in the Ch12 cell line (padj < 0.05, log2FoldChange > 3) and create an HTML report with tracktables.

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in design formula
## are characters, converting to factors
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## [1] "/Users/thomascarroll/Github/RU_ChIPseq/extdata/CTCF_UpinCh12.html"
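
The significance filter is simple, but two details matter:

- DESeq2 reports `padj` as NA for some peaks (for example those removed by independent filtering), so missing values must be dropped explicitly.
- A positive log2FoldChange only means "higher in Ch12" if the contrast puts Ch12 in the numerator. An example would be `results(dds, contrast = c("CellLine", "Ch12", "MEL"))`, where `CellLine` is a hypothetical name for the sample-group column.

The minimal Python sketch below applies the thresholds from the task to hypothetical toy result rows.

```python
import math


def up_in_ch12(results, padj_cutoff=0.05, lfc_cutoff=3):
    """Rows with padj < padj_cutoff and log2FoldChange > lfc_cutoff.

    Rows whose padj or log2FoldChange is missing (None or NaN, standing in
    for R's NA) are dropped. Assumes positive fold changes mean higher in Ch12.
    """
    keep = []
    for row in results:
        padj, lfc = row.get("padj"), row.get("log2FoldChange")
        if padj is None or lfc is None or math.isnan(padj) or math.isnan(lfc):
            continue
        if padj < padj_cutoff and lfc > lfc_cutoff:
            keep.append(row)
    return keep


toy_results = [
    {"peak": "p1", "padj": 0.01, "log2FoldChange": 4.2},
    {"peak": "p2", "padj": 0.2, "log2FoldChange": 5.0},
    {"peak": "p3", "padj": float("nan"), "log2FoldChange": 6.0},
    {"peak": "p4", "padj": 0.001, "log2FoldChange": -4.0},
    {"peak": "p5", "padj": None, "log2FoldChange": 3.5},
]
print([row["peak"] for row in up_in_ch12(toy_results)])
```
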

  7. Using rGREAT, identify MSigDB pathways enriched for target genes of the CTCF peaks with significantly higher signal in Ch12.

## Warning in submitGreatJob(UpinCh12, species = "mm10", request_interval = 1, : GREAT gives a warning:
## Your set hits a large fraction of the genes in the genome, which often does not
## work well with the GREAT Significant by Both view due to a saturation of the
## gene-based hypergeometric test.
## 
## See our tips for handling large datasets or try the Significant By Region-based
## Binomial view.
## The default enrichment table does not contain informatin of associated genes for
## each input region. You can set `download_by = 'tsv'` to download the complete
## table, but note only the top 500 regions can be retreived. See the following
## link:
## 
## https://great-help.atlassian.net/wiki/spaces/GREAT/pages/655401/Export#Export-GlobalExport
## 
## Except the additional gene-region association column if taking 'tsv' as the
## source of result, all other columns are the same if you choose 'json' (the
## default) as the source. Or you can try the local GREAT analysis with the function
## `great()`.
## [1] "PANTHER Pathway" "BioCyc Pathway"  "MSigDB Pathway"
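
The GREAT warning above suggests the region-based binomial view. As described in GREAT's published method, that test works as follows:

- Let n be the number of input regions.
- Let k be the number of those regions falling inside the regulatory domains of genes annotated with a term.
- Let p be the fraction of the genome those domains cover.

The p-value is P(X >= k) with X ~ Binomial(n, p). The minimal Python sketch below computes only that tail probability (in log space to avoid overflow). It does not compute regulatory domains or multiple-testing correction, and the numbers in the usage line are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_pmf)
    return min(total, 1.0)


def great_binomial_p(regions_in_term_domains, total_regions, term_genome_fraction):
    """GREAT-style region-based binomial p-value.

    term_genome_fraction: fraction of the genome covered by the regulatory
    domains of genes annotated with the term (0 < fraction < 1).
    """
    return binomial_upper_tail(regions_in_term_domains, total_regions,
                               term_genome_fraction)


# Hypothetical numbers: 40 of 200 peaks fall in domains covering 10% of the genome.
print(great_binomial_p(40, 200, 0.10))
```

With 200 regions and 10% coverage about 20 hits are expected by chance, so 40 hits gives a small p-value. This is why the binomial view is less prone to the gene-level saturation the warning describes.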