ChIPseq with replicates

In this practical we will investigate some CTCF ChIPseq data for the Mel and Ch12 cell lines.

Peak calls can be found in data/CTCFpeaks/

I have also precounted the CTCF signal in high-confidence peaks and saved the results as CTCFcounts in an RData file called CTCFcounts.RData.

Information and raw/processed files can be found for the Ch12 cell line here and for the Mel cell line here.

  1. Load the CTCF peaks into a GRangesList object.
## [1] "data/CTCFpeaks//CTCF_Ch12_1_peaks.xls"
## [2] "data/CTCFpeaks//CTCF_Ch12_2_peaks.xls"
## [3] "data/CTCFpeaks//CTCF_MEL_1_peaks.xls" 
## [4] "data/CTCFpeaks//CTCF_MEL_2_peaks.xls"
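
One possible way to approach step 1 is sketched below. It assumes the `.xls` files are MACS2 peak tables (tab-delimited text with `#` comment lines and `chr`, `start`, `end`, `abs_summit` and `fold_enrichment` columns), which this sheet does not state. The helper name `macsToGRanges` and the object name `peakList` are made up for illustration.

```r
library(GenomicRanges)

# Assumption: each file is a MACS2 "_peaks.xls" table, i.e. tab-delimited text
# with '#' comment lines and columns chr, start, end, abs_summit, fold_enrichment.
macsToGRanges <- function(peakFile) {
  peaks <- read.delim(peakFile, comment.char = "#")
  GRanges(seqnames = peaks$chr,
          ranges = IRanges(peaks$start, peaks$end),
          abs_summit = peaks$abs_summit,
          fold_enrichment = peaks$fold_enrichment)
}

# List the peak files and read each one into a GRanges object.
peakFiles <- dir("data/CTCFpeaks/", pattern = "peaks\\.xls$", full.names = TRUE)
peakList <- GRangesList(lapply(peakFiles, macsToGRanges))

# Name each element after its sample, e.g. "CTCF_Ch12_1".
names(peakList) <- gsub("_peaks\\.xls$", "", basename(peakFiles))
```

If the files turn out to be in another format, only `macsToGRanges` needs to change.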
  2. Create a bar chart of number of peaks in each sample.
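
A minimal sketch of the bar chart, using a toy `GRangesList` in place of the real peak calls. The sample names mirror the file names listed above, but the ranges are invented purely so the example runs on its own; `peakList` and `peakNum` are illustrative names.

```r
library(GenomicRanges)

# Toy stand-in for the real peak list: the ranges are made up for illustration.
peakList <- GRangesList(
  CTCF_Ch12_1 = GRanges("chr1", IRanges(c(100, 500, 900), width = 50)),
  CTCF_Ch12_2 = GRanges("chr1", IRanges(c(110, 510), width = 50)),
  CTCF_MEL_1  = GRanges("chr1", IRanges(c(120, 900), width = 50)),
  CTCF_MEL_2  = GRanges("chr1", IRanges(c(105, 520), width = 50))
)

# lengths() gives the number of peaks in each element of the list.
peakNum <- lengths(peakList)

# Base R bar chart; las = 2 rotates the sample labels so they stay readable.
barplot(peakNum, ylab = "Number of peaks", las = 2)
```

The same two final lines should apply unchanged to a `GRangesList` built from the real files.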

  3. Extract the peaks common to all replicates and cell lines.
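
"Common" can be defined in more than one way. The sketch below takes one reading: merge all peaks into a non-redundant set, then keep the merged regions that overlap a peak in every sample. The toy ranges are invented so the example is self-contained, and the object names are illustrative.

```r
library(GenomicRanges)

# Toy stand-in for the real peak list: the ranges are made up for illustration.
peakList <- GRangesList(
  CTCF_Ch12_1 = GRanges("chr1", IRanges(c(100, 500, 900), width = 50)),
  CTCF_Ch12_2 = GRanges("chr1", IRanges(c(110, 510), width = 50)),
  CTCF_MEL_1  = GRanges("chr1", IRanges(c(120, 900), width = 50)),
  CTCF_MEL_2  = GRanges("chr1", IRanges(c(105, 520), width = 50))
)

# Merge overlapping peaks from all samples into one non-redundant set.
allPeaks <- reduce(unlist(peakList))

# Logical matrix: one row per merged peak, one column per sample.
overlapMat <- sapply(as.list(peakList), function(x) allPeaks %over% x)

# Keep merged peaks that are overlapped by a peak in every sample.
commonPeaks <- allPeaks[rowSums(overlapMat) == length(peakList)]
commonPeaks
```

In this toy data only the region around position 100 is present in all four samples, so `commonPeaks` holds a single range.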

  4. Annotate common peaks to mm10 genes (TSS +/- 500bp) using ChIPseeker and create an upset plot of the annotation.

## >> preparing features information...      2021-08-02 03:32:17 PM 
## >> identifying nearest features...        2021-08-02 03:32:17 PM 
## >> calculating distance from peak to TSS...   2021-08-02 03:32:19 PM 
## >> assigning genomic annotation...        2021-08-02 03:32:19 PM 
## >> adding gene annotation...          2021-08-02 03:32:22 PM
## 'select()' returned 1:many mapping between keys and columns
## >> assigning chromosome lengths           2021-08-02 03:32:22 PM 
## >> done...                    2021-08-02 03:32:22 PM
## Warning: Removed 121 rows containing non-finite values (stat_count).
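
The annotation step could be sketched as follows. It assumes the UCSC knownGene TxDb and the `org.Mm.eg.db` annotation packages for mm10 are installed; the sheet names only ChIPseeker, so these package choices and the wrapper name `annotateCommonPeaks` are assumptions.

```r
library(ChIPseeker)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
library(org.Mm.eg.db)

# Hypothetical wrapper: annotate a GRanges of peaks to mm10 genes,
# defining the TSS region as 500bp either side of the TSS.
annotateCommonPeaks <- function(peaks) {
  annotatePeak(peaks,
               tssRegion = c(-500, 500),
               TxDb = TxDb.Mmusculus.UCSC.mm10.knownGene,
               annoDb = "org.Mm.eg.db")
}

# Usage, where commonPeaks is the GRanges of shared peaks from step 3:
# peakAnno <- annotateCommonPeaks(commonPeaks)
# upsetplot(peakAnno)
```

Wrapping the call in a function keeps the TSS window in one place if you want to try other definitions.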

  5. Resize the common peaks to 100bp around their geometric centre, extract the sequence under each region to a FASTA file and submit it to Meme-ChIP. To save time, randomly sample 1000 sequences for the submission.
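
A sketch of the sequence extraction, assuming the `BSgenome.Mmusculus.UCSC.mm10` package is installed. The function name `peaksToFasta`, its arguments and the fixed random seed are illustrative choices, not part of the exercise.

```r
library(GenomicRanges)
library(Biostrings)
library(BSgenome.Mmusculus.UCSC.mm10)

# Hypothetical helper: centre peaks, optionally subsample, write sequences to FASTA.
peaksToFasta <- function(peaks, fastaFile, n = 1000, width = 100, seed = 1) {
  # Resize every peak to a fixed width around its midpoint.
  centred <- resize(peaks, width = width, fix = "center")

  # Randomly keep at most n regions; the seed makes the sample reproducible.
  if (length(centred) > n) {
    set.seed(seed)
    centred <- centred[sample(length(centred), n)]
  }

  # Pull the genomic sequence under each region and label it by its coordinates.
  seqs <- getSeq(BSgenome.Mmusculus.UCSC.mm10, centred)
  names(seqs) <- as.character(centred)

  writeXStringSet(seqs, fastaFile)
  invisible(centred)
}

# Usage, where commonPeaks is the GRanges of shared peaks from step 3:
# peaksToFasta(commonPeaks, "commonPeaks.fa")
```

Sampling before calling `getSeq()` means only the sequences that will actually be submitted are extracted.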

  6. Load in the CTCF counts, identify peaks with higher signal in the Ch12 cell line (padj < 0.05, log2FoldChange > 3) and create an HTML report with tracktables.

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## [1] "/__w/RU_ChIPseq/RU_ChIPseq/extdata/CTCF_UpinCh12.html"
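
The differential step might look like the sketch below. The structure of `CTCFcounts` is not described in this sheet, so the helper takes a plain count matrix, a vector of cell line labels and the peak ranges; the name `upInCh12` and its arguments are hypothetical.

```r
library(DESeq2)

# Hypothetical helper: return peaks with significantly higher signal in Ch12.
# counts:     peaks x samples matrix of read counts
# cellLine:   "Ch12" or "Mel" for each column of counts
# peakRanges: GRanges with one range per row of counts
upInCh12 <- function(counts, cellLine, peakRanges) {
  stopifnot(ncol(counts) == length(cellLine))

  # Mel is set as the reference level so fold changes read as Ch12 over Mel.
  colData <- data.frame(CellLine = factor(cellLine, levels = c("Mel", "Ch12")),
                        row.names = colnames(counts))

  dds <- DESeqDataSetFromMatrix(countData = counts,
                                colData = colData,
                                design = ~ CellLine,
                                rowRanges = peakRanges)
  dds <- DESeq(dds)

  # Results as GRanges so they can be passed on to reporting and GREAT.
  res <- results(dds, contrast = c("CellLine", "Ch12", "Mel"), format = "GRanges")

  # padj can be NA (independent filtering), so exclude those explicitly.
  keep <- !is.na(res$padj) & res$padj < 0.05 & res$log2FoldChange > 3
  res[keep]
}

# Usage, assuming CTCFcounts is a RangedSummarizedExperiment whose columns are
# ordered Ch12, Ch12, Mel, Mel (an assumption; check colnames(CTCFcounts)):
# UpinCh12 <- upInCh12(assay(CTCFcounts), c("Ch12", "Ch12", "Mel", "Mel"),
#                      rowRanges(CTCFcounts))
# The report step, as I understand tracktables' makebedtable(); check its help page:
# tracktables::makebedtable(UpinCh12, "CTCF_UpinCh12.html", getwd())
```

Building the factor explicitly also avoids the "converting to factors" warning shown above and makes the direction of the fold change unambiguous.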

  7. Using rGREAT, identify MSigDB pathways enriched for target genes of the CTCF peaks that are significantly higher in Ch12.

## Warning in submitGreatJob(UpinCh12, species = "mm10", request_interval = 1, : GREAT gives a warning:
## Your set hits a large fraction of the genes in the genome, which often
## does not work well with the GREAT Significant by Both view due to a
## saturation of the gene-based hypergeometric test.
## 
## See our tips for handling large datasets or try the Significant By
## Region-based Binomial view.
## The default enrichment tables contain no associated genes for the input
## regions. You can set `download_by = 'tsv'` to download the complete
## table, but note only the top 500 regions can be retreived. See the
## following link:
## 
## https://great-help.atlassian.net/wiki/spaces/GREAT/pages/655401/Export#Export-GlobalExport
## [1] "PANTHER Pathway" "BioCyc Pathway"  "MSigDB Pathway"
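
A sketch of the GREAT query follows. The `species` and `request_interval` values are taken from the call shown in the warning above; the `version` argument and the `"Pathway Data"` category are assumptions, chosen because MSigDB tables are, to my knowledge, only offered by older GREAT releases. The wrapper name `greatPathways` is made up. Running it needs network access to the GREAT server.

```r
library(rGREAT)

# Hypothetical wrapper: submit peaks to GREAT and fetch the pathway tables.
# version = "3.0.0" is an assumption; the truncated call in the warning above
# does not show which GREAT version was requested.
greatPathways <- function(peaks, species = "mm10", version = "3.0.0") {
  job <- submitGreatJob(peaks,
                        species = species,
                        request_interval = 1,
                        version = version)

  # Returns a named list of data frames, one per ontology in the category.
  getEnrichmentTables(job, category = "Pathway Data")
}

# Usage, where UpinCh12 is the GRanges of peaks higher in Ch12 from step 6:
# pathwayTables <- greatPathways(UpinCh12)
# names(pathwayTables)
# head(pathwayTables[["MSigDB Pathway"]])
```

The available categories depend on the GREAT version, so `availableCategories(job)` is worth checking if `"Pathway Data"` is not found.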