In this session we will review some CTCF ChIP-seq data from the ENCODE consortium.

These include the replicated peak calls for the Lung, Heart and Kidney samples.

Exercise 1 - Download the BED files and create a single non-redundant set of peaks for the Lung, Heart and Kidney samples. Write the set of peaks to a BED file.

## GRangesList object of length 3:
## $Lung
## GRanges object with 64742 ranges and 0 metadata columns:
##           seqnames              ranges strand
##              <Rle>           <IRanges>  <Rle>
##       [1]    chr11   53317313-53317593      *
##       [2]     chr7   34746183-34746463      *
##       [3]     chr1 154103315-154103595      *
##       [4]     chr1 179799028-179799308      *
##       [5]    chr12   72732024-72732304      *
##       ...      ...                 ...    ...
##   [64738]     chr7 137478266-137478622      *
##   [64739]     chr2   32912428-32912796      *
##   [64740]     chr3   96580185-96580539      *
##   [64741]     chr8   70629724-70630132      *
##   [64742]     chr5 120102766-120103127      *
##   -------
##   seqinfo: 31 sequences from an unspecified genome; no seqlengths
## 
## $Heart
## GRanges object with 44533 ranges and 0 metadata columns:
##           seqnames              ranges strand
##              <Rle>           <IRanges>  <Rle>
##       [1]    chr15   86261377-86261601      *
##       [2]     chr1   36580890-36581114      *
##       [3]    chr11   75510348-75510572      *
##       [4]     chr7 105400308-105400532      *
##       [5]    chr11   74408767-74408991      *
##       ...      ...                 ...    ...
##   [44529]    chr10   77130640-77130878      *
##   [44530]     chr6   90737496-90737712      *
##   [44531]     chr2   32912500-32912723      *
##   [44532]     chr3 122638720-122638954      *
##   [44533]     chr7 137478331-137478557      *
##   -------
##   seqinfo: 31 sequences from an unspecified genome; no seqlengths
## 
## $Kidney
## GRanges object with 65482 ranges and 0 metadata columns:
##           seqnames              ranges strand
##              <Rle>           <IRanges>  <Rle>
##       [1]     chr7 116309784-116310114      *
##       [2]    chr11   95105506-95105836      *
##       [3]     chr4   98091490-98091820      *
##       [4]     chr7     4996229-4996559      *
##       [5]     chr1 133054622-133054952      *
##       ...      ...                 ...    ...
##   [65478]     chr1 181888785-181889231      *
##   [65479]     chr4 138319502-138319898      *
##   [65480]     chr2   32912430-32912817      *
##   [65481]     chr5 114667471-114667974      *
##   [65482]     chr8   70629657-70630155      *
##   -------
##   seqinfo: 31 sequences from an unspecified genome; no seqlengths
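One way to approach Exercise 1 is sketched below on a tiny toy example rather than the real ENCODE files. The coordinates, the object names `lung`, `heart`, `kidney` and `peakList`, and the output file name are all made up for illustration. With the real data each GRanges would presumably come from `import.bed()` on a downloaded file.

```r
library(GenomicRanges)
library(rtracklayer)

# Toy stand-ins for the three tissue peak sets (coordinates are invented).
lung   <- GRanges("chr1", IRanges(start = c(100, 500, 3000, 4000), width = 101))
heart  <- GRanges("chr1", IRanges(start = c(150, 900, 3050), width = 101))
kidney <- GRanges("chr1", IRanges(start = c(180, 550, 2000), width = 101))

# Keep the per-tissue peaks together in a named GRangesList
peakList <- GRangesList(Lung = lung, Heart = heart, Kidney = kidney)

# Pool all peaks, then merge any overlapping peaks into a single range
nrSet <- reduce(unlist(peakList))

# Write the non-redundant set to a BED file (a temporary location here)
bedFile <- file.path(tempdir(), "nrCTCF_toy.bed")
export.bed(nrSet, bedFile)
```

Here `reduce()` collapses the ten toy peaks into six non-redundant ranges. Note that it also merges directly adjacent ranges by default.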

Exercise 2 - Create a Venn diagram of the overlaps between the Lung, Heart and Kidney peaks and our non-redundant set of peaks.
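A minimal sketch of one possible approach, again on invented toy peaks: score each non-redundant peak for overlap with each tissue, then pass the resulting 0/1 matrix to limma's `vennDiagram()`. The object names and coordinates are assumptions for illustration only.

```r
library(GenomicRanges)
library(limma)

# Toy peak sets (invented coordinates) and their non-redundant union
lung   <- GRanges("chr1", IRanges(start = c(100, 500, 3000, 4000), width = 101))
heart  <- GRanges("chr1", IRanges(start = c(150, 900, 3050), width = 101))
kidney <- GRanges("chr1", IRanges(start = c(180, 550, 2000), width = 101))
nrSet  <- reduce(c(lung, heart, kidney))

# One column per tissue: 1 if the non-redundant peak overlaps a peak there
overlapMat <- cbind(
  Lung   = as.integer(nrSet %over% lung),
  Heart  = as.integer(nrSet %over% heart),
  Kidney = as.integer(nrSet %over% kidney)
)

# vennDiagram() tabulates the 0/1 patterns and draws the diagram
vennDiagram(overlapMat)
```

Counting overlaps against the non-redundant set means every peak is counted once, even when the tissue-level peaks differ slightly in width.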


Exercise 3 - Create BED files containing peaks unique to Lung, Heart and Kidney, as well as peaks common to all samples. Also save these as GRanges objects for later use.
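The following sketch shows one way this could be done with logical overlap vectors, using the three tissues from the peak list (Lung, Heart, Kidney). The toy coordinates, the object names and the output file names are illustrative assumptions, not taken from the original solution.

```r
library(GenomicRanges)
library(rtracklayer)

# Toy peak sets (invented coordinates) and their non-redundant union
lung   <- GRanges("chr1", IRanges(start = c(100, 500, 3000, 4000), width = 101))
heart  <- GRanges("chr1", IRanges(start = c(150, 900, 3050), width = 101))
kidney <- GRanges("chr1", IRanges(start = c(180, 550, 2000), width = 101))
nrSet  <- reduce(c(lung, heart, kidney))

# Logical vectors marking which non-redundant peaks occur in each tissue
inLung   <- nrSet %over% lung
inHeart  <- nrSet %over% heart
inKidney <- nrSet %over% kidney

# Keep the subsets as GRanges objects for later use
lungUnique   <- nrSet[inLung & !inHeart & !inKidney]
heartUnique  <- nrSet[!inLung & inHeart & !inKidney]
kidneyUnique <- nrSet[!inLung & !inHeart & inKidney]
commonPeaks  <- nrSet[inLung & inHeart & inKidney]

# Write each subset to its own BED file (a temporary directory here)
outDir <- tempdir()
export.bed(lungUnique,   file.path(outDir, "lungUnique.bed"))
export.bed(heartUnique,  file.path(outDir, "heartUnique.bed"))
export.bed(kidneyUnique, file.path(outDir, "kidneyUnique.bed"))
export.bed(commonPeaks,  file.path(outDir, "commonPeaks.bed"))
```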

Exercise 4 - Sample 2000 peaks from the non-redundant set of peaks and write them to a new BED file.

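# Draw 2000 peaks at random (without replacement) from the non-redundant set
nrSet <- nrSet[sample(length(nrSet), 2000)]
# Write the sampled peaks to a BED file (export.bed comes from rtracklayer)
export.bed(nrSet, "nrCTCF.bed")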

Exercise 5 - With the newly sampled BED file, produce a heatmap of signal over peaks.

## Loading bigwig files.
## Making ChIPprofile object from signal files.
## Importing rlelist..Done
## Filtering regions which extend outside of genome boundaries.....Done
## Filtered 0 of 2000 regions
## Splitting regions by Watson and Crick strand....Done
## ..Done
## Found 2000 Watson strand regions
## Found 0 Crick strand regions
## Extending regions.....done
## Calculating coverage across regions
## Calculating per contig. 
## contig: 1
## contig: 2
## contig: 3
## contig: 4
## contig: 5
## contig: 6
## contig: 7
## contig: 8
## contig: 9
## contig: 10
## contig: 11
## contig: 12
## contig: 13
## contig: 14
## contig: 15
## contig: 16
## contig: 17
## contig: 18
## contig: 19
## contig: 20
## Creating ChIPprofile.
## class: profileplyr 
## dim: 2000 40 
## metadata(0):
## assays(3): ENCFF193CUU.bigWig ENCFF633LEJ.bigWig ENCFF688LXO.bigWig
## rownames(2000): giID1790 giID1791 ... giID1999 giID2000
## rowData names(5): name score sgGroup giID names
## colnames: NULL
## colData names(0):

Exercise 5 - Provide more sensible sample names and colour the heatmaps by tissue.

## K means clustering used.
## A column has been added to the range metadata with the column name 'cluster', and the 'rowGroupsInUse' has been set to this column.

Exercise 6 - Create a violin plot of signal in peaks across samples and clusters.
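As a rough illustration of the plotting step only, the sketch below assumes the per-peak signal has already been summarised into a long-format data frame with one row per peak per sample. The data are simulated, and the column names `Sample`, `cluster` and `Signal` are assumptions rather than guaranteed output of any particular function.

```r
library(ggplot2)

set.seed(1)
# Simulated long-format table: 3 samples x 3 clusters x 20 peaks each
signalLong <- data.frame(
  Sample  = rep(c("Lung", "Heart", "Kidney"), each = 60),
  cluster = factor(rep(rep(1:3, each = 20), times = 3)),
  Signal  = rexp(180, rate = 0.5)
)

# One violin per sample, with a separate panel for each cluster
violinPlot <- ggplot(signalLong, aes(x = Sample, y = Signal, fill = Sample)) +
  geom_violin() +
  facet_wrap(~cluster) +
  theme_bw()

print(violinPlot)
```

Faceting by cluster keeps the samples side by side within each cluster, which makes tissue-specific clusters easier to spot.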


Exercise 7 - Annotate the clusters using rGREAT and write the annotated cluster 5 to a data.frame.