In this session we will review CTCF ChIP-seq data from the ENCODE consortium.
These include the replicated peak calls and BigWig signal files for the Lung, Heart and Kidney samples:
Kidney CTCF peak calls - ENCFF784YSO
Lung CTCF peak calls - ENCFF116ZIX
Heart CTCF peak calls - ENCFF409RRE
Kidney CTCF BigWig - ENCFF193CUU
Lung CTCF BigWig - ENCFF633LEJ
Heart CTCF BigWig - ENCFF688LXO
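ENCODE peak calls are typically distributed as BED-based narrowPeak files, which use 0-based, half-open coordinates, whereas the GRanges objects printed below are 1-based and closed (rtracklayer's import() performs this conversion). As a minimal standard-library Python sketch of that conversion, assuming the standard 10-column narrowPeak layout (the example line is invented, not taken from the ENCODE files):

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int   # 1-based, inclusive (GRanges convention)
    end: int     # 1-based, inclusive
    summit: int  # 1-based summit position, or -1 if the file has none


def parse_narrowpeak_line(line: str) -> Peak:
    """Convert one BED/narrowPeak line (0-based, half-open) to 1-based closed coordinates."""
    fields = line.rstrip("\n").split("\t")
    chrom, bed_start, bed_end = fields[0], int(fields[1]), int(fields[2])
    # BED start is 0-based, so the first base is bed_start + 1;
    # BED end is exclusive, which is the same number as the 1-based inclusive end.
    start, end = bed_start + 1, bed_end
    summit = -1
    # narrowPeak column 10 is the summit offset from the BED start (-1 = not called).
    if len(fields) >= 10 and int(fields[9]) != -1:
        summit = bed_start + int(fields[9]) + 1
    return Peak(chrom, start, end, summit)


def width(peak: Peak) -> int:
    """Width in bp, counted the way GenomicRanges::width() counts it."""
    return peak.end - peak.start + 1


# Invented example line, not an ENCODE record:
example = "chr1\t999\t1280\tpeak1\t1000\t.\t12.5\t-1\t2.3\t140"
peak = parse_narrowpeak_line(example)
print(peak, width(peak))  # Peak(chrom='chr1', start=1000, end=1280, summit=1140) 281
```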
## Loading required package: rtracklayer
## Loading required package: GenomicRanges
## GRangesList object of length 3:
## $Lung
## GRanges object with 64742 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr11 53317313-53317593 *
## [2] chr7 34746183-34746463 *
## [3] chr1 154103315-154103595 *
## [4] chr1 179799028-179799308 *
## [5] chr12 72732024-72732304 *
## ... ... ... ...
## [64738] chr7 137478266-137478622 *
## [64739] chr2 32912428-32912796 *
## [64740] chr3 96580185-96580539 *
## [64741] chr8 70629724-70630132 *
## [64742] chr5 120102766-120103127 *
## -------
## seqinfo: 31 sequences from an unspecified genome; no seqlengths
##
## $Heart
## GRanges object with 44533 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr15 86261377-86261601 *
## [2] chr1 36580890-36581114 *
## [3] chr11 75510348-75510572 *
## [4] chr7 105400308-105400532 *
## [5] chr11 74408767-74408991 *
## ... ... ... ...
## [44529] chr10 77130640-77130878 *
## [44530] chr6 90737496-90737712 *
## [44531] chr2 32912500-32912723 *
## [44532] chr3 122638720-122638954 *
## [44533] chr7 137478331-137478557 *
## -------
## seqinfo: 31 sequences from an unspecified genome; no seqlengths
##
## $Kidney
## GRanges object with 65482 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr7 116309784-116310114 *
## [2] chr11 95105506-95105836 *
## [3] chr4 98091490-98091820 *
## [4] chr7 4996229-4996559 *
## [5] chr1 133054622-133054952 *
## ... ... ... ...
## [65478] chr1 181888785-181889231 *
## [65479] chr4 138319502-138319898 *
## [65480] chr2 32912430-32912817 *
## [65481] chr5 114667471-114667974 *
## [65482] chr8 70629657-70630155 *
## -------
## seqinfo: 31 sequences from an unspecified genome; no seqlengths
## Loading required package: limma
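A common way to compare the three tissues' peak sets is to merge all peaks into one non-redundant set of regions (as GenomicRanges' reduce() does) and record, for each merged region, which tissues have an overlapping peak. limma's vennCounts() and vennDiagram() can then tabulate and draw that 0/1 matrix, which is presumably why limma is loaded here. The Python sketch below mirrors that logic on invented toy intervals; the helper names are hypothetical.

```python
from collections import Counter
from itertools import groupby


def reduce_intervals(intervals):
    """Merge overlapping or adjacent (chrom, start, end) intervals, 1-based closed,
    similar in spirit to GenomicRanges::reduce() with its default min.gapwidth = 1."""
    merged = []
    for chrom, group in groupby(sorted(intervals), key=lambda iv: iv[0]):
        cur_start = cur_end = None
        for _, start, end in group:
            if cur_end is not None and start <= cur_end + 1:
                cur_end = max(cur_end, end)  # overlaps or touches: extend the current run
            else:
                if cur_end is not None:
                    merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))  # every group has at least one interval
    return merged


def overlaps(a, b):
    """True if two 1-based closed intervals share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def membership_matrix(peak_sets):
    """One 0/1 row per merged region: does each sample have an overlapping peak?"""
    names = list(peak_sets)
    regions = reduce_intervals([iv for ivs in peak_sets.values() for iv in ivs])
    # Brute-force overlap test for clarity; real tools use sorted/indexed searches.
    rows = [
        tuple(int(any(overlaps(region, p) for p in peak_sets[n])) for n in names)
        for region in regions
    ]
    return names, regions, rows


# Toy peak sets (invented coordinates):
peaks = {
    "Lung": [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)],
    "Heart": [("chr1", 150, 250), ("chr2", 300, 400)],
    "Kidney": [("chr1", 190, 220), ("chr1", 550, 560)],
}
names, regions, rows = membership_matrix(peaks)
counts = Counter(rows)  # counts[(1, 1, 1)] = number of regions shared by all three tissues
```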
Exercise 5 Using the newly sampled BED file, produce a heatmap of signal over the peaks.
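The sampling step itself is not shown in this section. One plausible approach, sketched below in Python under stated assumptions, is to draw a reproducible random subset of peaks (2,000, matching the region count reported in the log below), centre a fixed-width window on each peak and write the windows back out as BED. The seed, the 1.5 kb flank and all helper names are hypothetical choices for illustration.

```python
import random


def sample_peaks(peaks, n, seed=2024):
    """Draw n peaks without replacement; a fixed seed makes the sample reproducible."""
    return random.Random(seed).sample(peaks, n)


def centre_window(peak, flank):
    """Window of 2*flank + 1 bp centred on the peak midpoint (1-based closed).
    Starts are clipped at 1, so windows near a chromosome start may be shorter."""
    chrom, start, end = peak
    mid = (start + end) // 2
    return (chrom, max(1, mid - flank), mid + flank)


def write_bed(windows, path):
    """Write (chrom, start, end) windows as BED: 0-based start, exclusive end."""
    with open(path, "w") as fh:
        for chrom, start, end in windows:
            fh.write(f"{chrom}\t{start - 1}\t{end}\n")


# Hypothetical usage: 5,000 invented peaks -> 2,000 sampled 3,001 bp windows
peaks = [("chr1", i * 10_000 + 1, i * 10_000 + 300) for i in range(5_000)]
windows = [centre_window(p, 1_500) for p in sample_peaks(peaks, 2_000)]
# write_bed(windows, "sampled_peaks.bed")  # hypothetical output path
```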
## Loading required package: profileplyr
## Loading bigwig files.
## Making ChIPprofile object from signal files.
## Importing rlelist..Done
## Filtering regions which extend outside of genome boundaries.....Done
## Filtered 0 of 2000 regions
## Splitting regions by Watson and Crick strand....Done
## ..Done
## Found 2000 Watson strand regions
## Found 0 Crick strand regions
## Extending regions.....done
## Calculating coverage across regions
## Calculating per contig.
## Creating ChIPprofile.
## Importing rlelist..Done
## Filtering regions which extend outside of genome boundaries.....Done
## Filtered 0 of 2000 regions
## Splitting regions by Watson and Crick strand....Done
## ..Done
## Found 2000 Watson strand regions
## Found 0 Crick strand regions
## Extending regions.....done
## Calculating coverage across regions
## Calculating per contig.
## Creating ChIPprofile.
## Importing rlelist..Done
## Filtering regions which extend outside of genome boundaries.....Done
## Filtered 0 of 2000 regions
## Splitting regions by Watson and Crick strand....Done
## ..Done
## Found 2000 Watson strand regions
## Found 0 Crick strand regions
## Extending regions.....done
## Calculating coverage across regions
## Calculating per contig.
## Creating ChIPprofile.
## class: profileplyr
## dim: 2000 40
## metadata(0):
## assays(3): ENCFF193CUU.bigWig ENCFF633LEJ.bigWig ENCFF688LXO.bigWig
## rownames(2000): giID1790 giID1791 ... giID1999 giID2000
## rowData names(5): name score sgGroup giID names
## colnames: NULL
## colData names(0):
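The profileplyr summary reports a 2000 x 40 matrix for each of the three bigWig assays, presumably one row per sampled region and one column per bin across the window. A minimal Python sketch of how such a binned matrix can be computed, assuming the signal is available as bedGraph-style (start, end, value) runs on a single chromosome and that uncovered bases count as zero (the real tools may treat gaps and uneven bin widths differently):

```python
def binned_signal(signal, window_start, window_end, n_bins):
    """Mean signal per bin across [window_start, window_end], 1-based closed.

    `signal`: non-overlapping (start, end, value) runs, 1-based closed, on one
    chromosome; bases not covered by any run count as 0.
    """
    width = window_end - window_start + 1
    if width % n_bins:
        raise ValueError("this sketch requires the window width to be divisible by n_bins")
    bin_size = width // n_bins
    totals = [0.0] * n_bins
    for run_start, run_end, value in signal:
        # Clip the run to the window; runs entirely outside it never enter the loop.
        lo, hi = max(run_start, window_start), min(run_end, window_end)
        pos = lo
        while pos <= hi:
            b = (pos - window_start) // bin_size            # bin containing `pos`
            bin_last = window_start + (b + 1) * bin_size - 1
            seg_end = min(hi, bin_last)                     # stop at the bin edge
            totals[b] += value * (seg_end - pos + 1)        # value x bases covered
            pos = seg_end + 1
    return [t / bin_size for t in totals]


def signal_matrix(signal, windows, n_bins):
    """One row of binned signal per (start, end) window."""
    return [binned_signal(signal, start, end, n_bins) for start, end in windows]


# Invented bedGraph-style runs on one chromosome
signal = [(1, 10, 2.0), (11, 20, 4.0)]
print(binned_signal(signal, 1, 20, 4))  # [2.0, 2.0, 4.0, 4.0]
```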
Exercise 5 Provide more sensible sample names and colour the heatmaps by tissue.
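The bigWig assay names are ENCODE accessions, and the accession list at the top of this session maps each one to a tissue. A small Python sketch of that renaming and of a per-tissue colour lookup (the hex colours are arbitrary, hypothetical choices):

```python
ACCESSION_TO_TISSUE = {
    "ENCFF193CUU": "Kidney",
    "ENCFF633LEJ": "Lung",
    "ENCFF688LXO": "Heart",
}
# Arbitrary example palette, one colour per tissue (hypothetical choice).
TISSUE_COLOURS = {"Kidney": "#1b9e77", "Lung": "#d95f02", "Heart": "#7570b3"}


def tidy_sample_name(assay_name, suffix=".bigWig"):
    """'ENCFF193CUU.bigWig' -> 'Kidney'; unknown accessions are returned as-is,
    minus the suffix if present."""
    accession = assay_name[: -len(suffix)] if assay_name.endswith(suffix) else assay_name
    return ACCESSION_TO_TISSUE.get(accession, accession)


assay_names = ["ENCFF193CUU.bigWig", "ENCFF633LEJ.bigWig", "ENCFF688LXO.bigWig"]
sample_names = [tidy_sample_name(a) for a in assay_names]  # ['Kidney', 'Lung', 'Heart']
sample_colours = [TISSUE_COLOURS[n] for n in sample_names]
```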
## K means clustering used.
## A column has been added to the range metadata with the column name 'cluster', and the 'rowGroupsInUse' has been set to this column.
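The messages above indicate that the regions were grouped by k-means clustering. To illustrate the technique, the Python sketch below implements plain Lloyd's k-means on numeric rows (for example, one binned signal profile per region). R's stats::kmeans() defaults to the Hartigan-Wong algorithm, and this sketch is not necessarily what profileplyr runs internally, so labels and cluster boundaries may differ.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, max_iter=100, seed=1):
    """Lloyd's k-means on equal-length numeric rows; returns (labels, centres)."""
    rng = random.Random(seed)
    centres = [list(r) for r in rng.sample(rows, k)]  # k random rows as initial centres
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest centre (ties go to the lower index).
        new_labels = [min(range(k), key=lambda c: sq_dist(row, centres[c])) for row in rows]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Toy "profiles": three low-signal and three high-signal regions
profiles = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centres = kmeans(profiles, k=2)
```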
Exercise 6 Create a violin plot of the signal in peaks across samples and clusters.
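ggplot2's geom_violin() expects long-format data: one row per observation, with sample and cluster as grouping variables. A Python sketch of that reshaping, assuming each region's signal is summarised as its mean across bins (a hypothetical choice; a maximum or a sum would also work):

```python
from statistics import mean, median


def to_long(matrices, clusters):
    """Flatten {sample: region-by-bin matrix} into (sample, cluster, mean signal) records."""
    records = []
    for sample, matrix in matrices.items():
        # Rows are assumed to be in the same region order as `clusters`.
        for row, cluster in zip(matrix, clusters):
            records.append((sample, cluster, mean(row)))
    return records


def median_by_group(records):
    """Median per (sample, cluster) group, e.g. to sanity-check a violin plot."""
    groups = {}
    for sample, cluster, value in records:
        groups.setdefault((sample, cluster), []).append(value)
    return {key: median(values) for key, values in groups.items()}


# Invented 2-region x 2-bin matrices for two samples
matrices = {"Lung": [[1, 3], [5, 5]], "Heart": [[0, 2], [8, 10]]}
records = to_long(matrices, clusters=[1, 2])
```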
## Loading required package: ggplot2
Exercise 7 Annotate the clusters using rGREAT and write the annotations for cluster 5 to a data.frame.
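GREAT (which rGREAT interfaces with) links each region to nearby genes through "basal plus extension" regulatory domains: by default a basal domain of 5 kb upstream and 1 kb downstream of each TSS, extended towards the nearest neighbouring basal domains for at most 1 Mb. The simplified single-chromosome Python sketch below illustrates that assignment only; it ignores GREAT's curated gene set and its enrichment statistics, and the genes, regions and helper names are invented.

```python
def regulatory_domains(genes, upstream=5_000, downstream=1_000, max_extension=1_000_000):
    """Simplified GREAT-style 'basal plus extension' domains for genes on ONE chromosome.

    genes: (name, tss, strand) tuples with 1-based TSS and strand '+' or '-'.
    Returns {name: (start, end)}, 1-based closed, with starts clipped at 1.
    In this sketch the maximum extension is measured from the TSS.
    """
    basal = {}
    for name, tss, strand in genes:
        if strand == "+":
            basal[name] = (tss - upstream, tss + downstream)
        else:  # on the minus strand, "upstream" lies at higher coordinates
            basal[name] = (tss - downstream, tss + upstream)

    domains = {}
    for name, tss, _ in genes:
        b_start, b_end = basal[name]
        # Nearest edges of other genes' basal domains on either side of this TSS.
        left_ends = [basal[o][1] for o, t, _ in genes if t < tss]
        right_starts = [basal[o][0] for o, t, _ in genes if t > tss]
        left_limit = max(left_ends) + 1 if left_ends else tss - max_extension
        right_limit = min(right_starts) - 1 if right_starts else tss + max_extension
        # Extend the basal domain up to those edges, but never beyond max_extension.
        start = min(b_start, max(left_limit, tss - max_extension))
        end = max(b_end, min(right_limit, tss + max_extension))
        domains[name] = (max(1, start), end)
    return domains


def associate(regions, domains):
    """Map each (start, end) region to the sorted names of genes whose domain overlaps it."""
    return {
        region: sorted(g for g, (s, e) in domains.items() if region[0] <= e and s <= region[1])
        for region in regions
    }


# Invented genes and regions on a single chromosome
genes = [("A", 100_000, "+"), ("B", 200_000, "-"), ("C", 2_000_000, "+")]
domains = regulatory_domains(genes)
hits = associate([(150_000, 150_300), (1_100_000, 1_100_500)], domains)
```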