##The Course

Plotting Epigenomic data
Plotting Motifs occurrence

##Reminder of file types

In this session we will be dealing with a range of data types. For more information on file types you can revisit our material.

File Formats.

For more information on visualizing genomics data in browsers you can visit our IGV course.

IGV.

##Reminder of data types in Bioconductor

We will also encounter and make use of many data structures and data types which we have seen throughout our courses on HTS data. You can revisit this material to refresh on HTS data analysis in Bioconductor and R below.

##Materials

All material for this course can be found on github. * Visualizing Genomics Data

Or can be downloaded as a zip archive from here.
* Download zip

##Materials

Once the zip file in unarchived. All presentations as HTML slides and pages, their R code and HTML practical sheets will be available in the directories underneath.

r_course/presentations/Slides/ Presentations as an HTML slide show.
r_course/presentations/exercises/ Some tasks/examples to work through.

##Materials

All data to run code in the presentations and in the practicals is available in the zip archive. This includes coverage as bigWig files, aligned reads as BAM files and genomic intervals stored as BED files.

We also include some RData files containing precompiled results from querying database (in case of external server downtime).

All data can be found under the data directory

data/

##Set the Working directory

Before running any of the code in the practicals or slides we need to set the working directory to the folder we unarchived.

You may navigate to the unarchived visualizingGenomicsData folder in the Rstudio menu

Session -> Set Working Directory -> Choose Directory

or in the console.

setwd("/PathToMyDownload/VisualizingGenomicsData/r_course/") 
# e.g. setwd("~/Downloads/VisualizingGenomicsData/r_course")

Recap

Previously we have reviewed how we can review signal and annotation over individual sites using and genome browser such as IGV and or programmatically using Gviz.

]

Recap

With many high throughput sequencing experiments we are able to gain a genome wide view of our assays (in RNA-seq we measure every gene’s expression, in ChIPseq we identify all events for a user-defined transcription factor/mark, in ATAC-seq we evaluate all open/accessible regions in the genome.)

Recap

Dimension reduction and clustering are very useful tools for simplifying the interpretation of this highly complex and dimensional biological data.

Vizualizing Biological data

As with RNAseq, epigenomics data can be displayed once reduced in dimension, such as with meta gene and profile plots, as well as with heatmaps.

igv

Vizualizing Biological data

In this session will review a few approaches to visualize epigenetics data.

There are many alternate packages for visualizing epigenetic data in Bioconductor (soGGi,heatmaps and enrichmentmap packages) and in R (ngsPlot).

Useful software outside R include Homer and Deeptools.

CTCF ChIPseq

In our ChIPseq session, we identified differential occupancy of Myc transcription factor between Mel and Ch12 cell lines. Here we will assess CTCF signal difference between Mel and Ch12.

First lets read in some CTCF peaks into a GRangesList

peakFiles <- dir("data/CTCFpeaks/",pattern="*.peaks",
                 full.names = TRUE)
macsPeaks_GR <- list()
for(i in 1:length(peakFiles)){
  macsPeaks_DF <- read.delim(peakFiles[i],
                                  comment.char="#")
     macsPeaks_GR[[i]] <- GRanges(
     seqnames=macsPeaks_DF[,"chr"],
     IRanges(macsPeaks_DF[,"start"],
             macsPeaks_DF[,"end"]
     )
  )
}
macsPeaks_GRL <- GRangesList(macsPeaks_GR)
names(macsPeaks_GRL) <- c("Ch12_1","Ch12_2","Mel_1","Mel_2")

CTCF ChIPseq

Now we can produce our non-redundant set of peaks and score each of the peaks by their occurrence in our Mel and Ch12 samples.

allPeaksSet_nR <- reduce(unlist(macsPeaks_GRL))
overlap <- list()
for(i in 1:length(macsPeaks_GRL)){
  overlap[[i]] <- allPeaksSet_nR %over% macsPeaks_GRL[[i]]
}
overlapMatrix <- do.call(cbind,overlap)
colnames(overlapMatrix) <- names(macsPeaks_GRL)
mcols(allPeaksSet_nR) <- overlapMatrix
allPeaksSet_nR[1,]

## GRanges object with 1 range and 4 metadata columns:
##       seqnames          ranges strand |    Ch12_1    Ch12_2     Mel_1     Mel_2
##          <Rle>       <IRanges>  <Rle> | <logical> <logical> <logical> <logical>
##   [1]     chr1 3012656-3012773      * |      TRUE      TRUE     FALSE     FALSE
##   -------
##   seqinfo: 21 sequences from an unspecified genome; no seqlengths

CTCF ChIPseq

With the overlap matrix we can define our set of high-confidence peaks

HC_Peaks <- allPeaksSet_nR[
  rowSums(as.data.frame(
    mcols(allPeaksSet_nR)[,c("Ch12_1","Ch12_2")])) >= 2 |
  rowSums(as.data.frame(
    mcols(allPeaksSet_nR)[,c("Mel_1","Mel_2")])) >= 2  
  ]
HC_Peaks[1:2,]

## GRanges object with 2 ranges and 4 metadata columns:
##       seqnames          ranges strand |    Ch12_1    Ch12_2     Mel_1     Mel_2
##          <Rle>       <IRanges>  <Rle> | <logical> <logical> <logical> <logical>
##   [1]     chr1 3012656-3012773      * |      TRUE      TRUE     FALSE     FALSE
##   [2]     chr1 3448274-3448415      * |     FALSE     FALSE      TRUE      TRUE
##   -------
##   seqinfo: 21 sequences from an unspecified genome; no seqlengths

Defining a consensus peak set

We can count the number of reads overlapping our peaks using the summarizeOverlaps function and a BamFileList of our BAMs to count.

library(Rsamtools)
library(GenomicAlignments)

bams <- c("~/Projects/Results/chipseq/testRun/BAMs/Sorted_CTCF_Ch12_1.bam",
          "~/Projects/Results/chipseq/testRun/BAMs/Sorted_CTCF_Ch12_2.bam",
          "~/Projects/Results/chipseq/testRun/BAMs/Sorted_CTCF_Mel_1.bam",
          "~/Projects/Results/chipseq/testRun/BAMs/Sorted_CTCF_Mel_2.bam")
bamFL <- BamFileList(bams,yieldSize = 5000000)
myCTCFCounts <- summarizeOverlaps(HC_Peaks,
                              reads = bamFL,
                              ignore.strand = TRUE)
colData(myCTCFCounts)$Group <- c("Ch12","Ch12","Mel","Mel")

Differential ChIP

Now we can use the DESeq2 package to compare our 2 cell-lines.

library(DESeq2)
deseqCTCF <- DESeqDataSet(myCTCFCounts,design = ~ Group)
deseqCTCF <- DESeq(deseqCTCF)
CTCF_Ch12MinusMel <- results(deseqCTCF,
                        contrast = c("CellLine","Mel","Ch12"),
                        format="GRanges")

Generating rlog counts

We can also use DESeq2 to give us some normalized, transformed data for visualization using rlog.

CTCFrlog <- rlog(deseqCTCF)
CTCFrlog

Assessing the dataset

With our newly created rlog transformed CTCF signal data we can create similar visualization as we have for RNAseq.

First we can create our measures of sample distances.

sampleDists <- as.dist(1-cor(assay(CTCFrlog)))
sampleDistMatrix <- as.matrix(sampleDists)
pheatmap(sampleDistMatrix,
         clustering_distance_rows = sampleDists,
         clustering_distance_cols = sampleDists)

Assessing the dataset

We can also review our dimension reduction by PCA to assess differences within and between groups with the plotPCA. Here we use by default the top 500 most variable sites.

plotPCA(CTCFrlog,intgroup="Group")

Assessing the dataset

We can rerun using all sites to see some additional variance in PC2.

plotPCA(CTCFrlog,intgroup="Group",
                  ntop=nrow(CTCFrlog))

Assessing the dataset

The plotPCA can return a data.frame of the PC scores for samples used in plot.

myData <- plotPCA(CTCFrlog,intgroup="Group",
                  ntop=nrow(CTCFrlog),
                  returnData=TRUE)
myData

##                   PC1       PC2     group     Group        name
## CTCF_Ch12_1 -95.59844  17.12250 CTCF_Ch12 CTCF_Ch12 CTCF_Ch12_1
## CTCF_Ch12_2 -91.08241 -17.63094 CTCF_Ch12 CTCF_Ch12 CTCF_Ch12_2
## CTCF_MEL_1   94.06208 -11.13739  CTCF_Mel  CTCF_Mel  CTCF_MEL_1
## CTCF_MEL_2   92.61876  11.64583  CTCF_Mel  CTCF_Mel  CTCF_MEL_2

Assessing the dataset

So we can use this data.frame and any additional information to produce a customized version of this plot. Here we add fragment length information.

myData$FragmentLength <- c(129,136,133,125)
library(ggplot2)
ggplot(myData,aes(x=PC1,y=PC2,
                  colour=Group,size=FragmentLength))+
  geom_point()+
  scale_size_continuous(range=c(2,10))

Assessing the dataset

We can extract information on the influence of sites to the separation of samples along PC1 as we have with RNAseq.

pcRes <- prcomp(t(assay(CTCFrlog)))
RankedPC1 <- rownames(pcRes$rotation)[order(pcRes$rotation[,1],
                                            decreasing=T)]
RankedPC1[1:3]

## [1] "ID_chr4_89129278_89129824"   "ID_chr4_89145405_89145765"  
## [3] "ID_chrX_159987900_159988285"

Assessing the dataset

We can see from the IGV that these top sites are Mel cell-line specific CTCF peaks.

Assessing the dataset

We can also plot our significantly differential sites as defined by our DESeq2 test in a gene by sample heatmaps.

library(pheatmap)
rlogMat <- assay(CTCFrlog)
Diff <-  names(CTCF_Ch12MinusMel)[CTCF_Ch12MinusMel$padj < 0.05 & 
                                      !is.na(CTCF_Ch12MinusMel$padj) &
                                    abs(CTCF_Ch12MinusMel$log2FoldChange) > 3]

sigMat <- rlogMat[rownames(rlogMat) %in% Diff,]
pheatmap(sigMat,scale="row",show_rownames = FALSE)

soGGi

The soGGi package contains a set of tools to profile high-throughput signalling data and motif occurrence over a set of genomic locations.

We have used this earlier to produce some plots of our ATACseq signal split by fragment length over TSS regions and our cut-site footprints around CTCF motifs.

]

soGGi

We will use the development version of soGGi package available from Github to take advantage of the latest features.

library(devtools)
install_github("ThomasCarroll/soGGi")

library(soGGi)

soGGi

Before we start plotting over regions we can update our DEseq2 results GRanges with some more information on the differential binding sites. This is some categorical data, based on the direction and significance of change.

UpInMel <-  names(CTCF_Ch12MinusMel)[CTCF_Ch12MinusMel$padj < 0.05 & 
                                      !is.na(CTCF_Ch12MinusMel$padj) &
                                    CTCF_Ch12MinusMel$log2FoldChange < -3]

DownInMel <-  names(CTCF_Ch12MinusMel)[CTCF_Ch12MinusMel$padj < 0.05 & 
                                      !is.na(CTCF_Ch12MinusMel$padj) &
                                    abs(CTCF_Ch12MinusMel$log2FoldChange) > 3]

SigVec  <- ifelse(names(CTCF_Ch12MinusMel) %in% UpInMel,"UpInMel",
             ifelse(names(CTCF_Ch12MinusMel) %in% DownInMel,"DownInMel",
                    "Other"))
CTCF_Ch12MinusMel$Sig <- SigVec
CTCF_Ch12MinusMel$name <- names(CTCF_Ch12MinusMel)

soGGi

We also add some information on CTCF site signal strength. We are dividing our peaks into two groups: those that have high or low signal.

grPs <- quantile(CTCF_Ch12MinusMel$baseMean,c(0.33,0.66))
high <-  names(CTCF_Ch12MinusMel)[CTCF_Ch12MinusMel$baseMean > grPs[2]]
low <-  names(CTCF_Ch12MinusMel)[CTCF_Ch12MinusMel$baseMean < grPs[2]]

Strength  <- ifelse(names(CTCF_Ch12MinusMel) %in% high,"high",
             ifelse(names(CTCF_Ch12MinusMel) %in% low,"low",
                    "")
             )
CTCF_Ch12MinusMel$Strength <- Strength

soGGi

Now we have our updated GRanges of DEseq2 results with our custom annotation.

CTCF_Ch12MinusMel[1:2,]

## GRanges object with 2 ranges and 9 metadata columns:
##                              seqnames            ranges strand |  baseMean
##                                 <Rle>         <IRanges>  <Rle> | <numeric>
##   ID_chr16_11670061_11670805    chr16 11670061-11670805      * |   3608.19
##    ID_chr1_75559493_75559979     chr1 75559493-75559979      * |   1400.68
##                              log2FoldChange     lfcSE      stat       pvalue
##                                   <numeric> <numeric> <numeric>    <numeric>
##   ID_chr16_11670061_11670805        3.73522  0.121393   30.7696 6.69175e-208
##    ID_chr1_75559493_75559979        4.23851  0.139268   30.4341 1.94252e-203
##                                      padj         Sig                   name
##                                 <numeric> <character>            <character>
##   ID_chr16_11670061_11670805 4.12473e-203   DownInMel ID_chr16_11670061_11..
##    ID_chr1_75559493_75559979 5.98675e-199   DownInMel ID_chr1_75559493_755..
##                                 Strength
##                              <character>
##   ID_chr16_11670061_11670805        high
##    ID_chr1_75559493_75559979        high
##   -------
##   seqinfo: 21 sequences from an unspecified genome; no seqlengths

soGGi

We can use the regionPlot() function with our normalised bigWig of CTCF signal by setting format parameter to bigwig. Additionally we set the plot area width with the distanceAround parameter.

rownames(CTCF_Ch12MinusMel) <- NULL

CTCFbigWig <- "data/Sorted_CTCF_Ch12_1Normalised.bw"
CTCF_ch12 <- regionPlot(CTCFbigWig,
                        testRanges = CTCF_Ch12MinusMel,
                        style = "point",
                        format = "bigwig",
                        distanceAround = 750)

soGGi

We can produce a simple profile plot by using the plotRegion() function with our ChIPprofile object.

p <- plotRegion(CTCF_ch12)
p

igv

soGGi

As with our other packages, soGGi returns a ggplot object which can be further customized.

p+theme_bw()+
  geom_point(colour="red")

igv

soGGi

We can use the summariseBy argument to group our data for plotting by column information. Here by specify to group data by values in the Strength column.

plotRegion(CTCF_ch12,
           summariseBy = "Strength")

igv

soGGi

We can supply additional groups to the gts argument as a list of sets of IDS/names to create groups from. By default the names are matched against the rownames of our ChIPprofile object.

rownames(CTCF_ch12) <- mcols(CTCF_ch12)$name
plotRegion(CTCF_ch12,
           gts=list(high=high,low=low))

igv

soGGi

When using the gts argument, we can supply the name of column to check group IDs/names against for grouping our plot to the summariseBy argument.

plotRegion(CTCF_ch12,
           gts=list(high=high,low=low),
           summariseBy = "name")

igv

soGGi

We can also use lists of GRanges to group our data in soGGi.

First we can create some TSS and TTS regions to investigate.

library(TxDb.Mmusculus.UCSC.mm10.knownGene)
dsds <- genes(TxDb.Mmusculus.UCSC.mm10.knownGene)
TSS <- resize(dsds,500,fix = "start")
TTS <- resize(dsds,500,fix = "end")

soGGi

We can then supply this list of GRanges to the gts parameter as we have our lists of IDs/names. Groups are defined by regions overlapping the specified ranges.

plotRegion(CTCF_ch12,
           gts=list(TSS=TSS,
                    TTS=TTS))

igv

soGGi

We can control colouring with the colourBy argument. Here we specify Group to colour by our newly defined groups.

plotRegion(CTCF_ch12,
           summariseBy = "Strength",
           colourBy = "Group")

soGGi

We can control faceting with the groupBy argument.

plotRegion(CTCF_ch12,
           summariseBy = "Strength",
           colourBy = "Group",
           groupBy = "Group")

igv

soGGi

Again we can use ggplot basics to add some of our own colour schemes to our plots.

p <- plotRegion(CTCF_ch12,
                 summariseBy = "Strength",
                 colourBy = "Group",
                 groupBy = "Group")
cols <- c("high" = "darkred", "low" = "darkblue")
p + scale_colour_manual(values = cols)+
  theme_minimal()

igv

soGGi

We can use to soGGi to compare signal across samples.

Here we load in some data now for CTCF signal in our Mel cell-line.

CTCFbigWig <- "data//Sorted_CTCF_MEL_1Normalised.bw"
CTCF_mel <- regionPlot(CTCFbigWig,
                        testRanges = CTCF_Ch12MinusMel,
                        style = "point",
                        format = "bigwig",
                        distanceAround = 750)

soGGi

We can combine ChIPprofile objects just as with vectors and lists using the c() function.

CTCF <- c(CTCF_mel,CTCF_ch12)
CTCF

## class: ChIPprofile 
## dim: 61639 1501 
## metadata(2): names AlignedReadsInBam
## assays(2): '' ''
## rownames(61639): giID2175 giID2176 ... giID59822 giID59823
## rowData names(10): baseMean log2FoldChange ... Strength giID
## colnames(1501): Point_Centre-750 Point_Centre-749 ... Point_Centre749
##   Point_Centre750
## colData names(0):

soGGi

We can now plot our CTCF samples together in the same plot. We can control the plot by setting the colourBy argument to Sample.

plotRegion(CTCF,
           colourBy = "Sample")

igv

soGGi

We could also use the sample to facet our plot. Here we set groupBy to Sample.

plotRegion(CTCF,
           groupBy = "Sample")

igv

soGGi

And we combine Sample and Grouping to produce some more complex plots. Here we group our data by the Sig column.

plotRegion(CTCF,
           summariseBy = "Sig",
           groupBy = "Group",
           colourBy="Sample")

igv

soGGi

We have previously used soGGi to produce our plots of footprints across motifs. Now we will use motif occurrence as the signal to plot instead of regions to plot over.

First we extract the CTCF motif and plot the seqLogo.

library(MotifDb)
library(Biostrings)
library(seqLogo)
library(BSgenome.Mmusculus.UCSC.mm10)

CTCFm <- query(MotifDb, c("CTCF"))
ctcfMotif <- CTCFm[[1]]
seqLogo(ctcfMotif)

soGGi

We can use the Biostrings matchPWM() function to scan the genome for CTCF motifs, convert sites to a GRanges and this to an rleList using the coverage function.

myRes <- matchPWM(ctcfMotif,
                  BSgenome.Mmusculus.UCSC.mm10[["chr19"]])
CTCFmotifs <- GRanges("chr19",ranges(myRes))
CTCFrle <- coverage(CTCFmotifs)
CTCFrle[[1]]

## integer-Rle of length 61322860 with 5646 runs
##   Lengths: 3055542      19   24798      19 ...    8069      19   13447      19
##   Values :       0       1       0       1 ...       0       1       0       1

soGGi

We can now plot our profile of motif occurrence across CTCF sites using the regionPlot() function and setting format parameter to rlelist.

motifPlot <- regionPlot(CTCFrle,
               CTCF_Ch12MinusMel,
               format = "rlelist")
plotRegion(motifPlot)

igv

soGGi

We can combine our ChIP profiles from CTCF signal in Mel and Motif occurrence and plot side by side.

CTCFSig19 <- CTCF_mel[seqnames(CTCF_mel) %in% "chr19",]
rownames(motifPlot) <- rownames(CTCFSig19) <- NULL
CTCFall <- c(motifPlot,CTCFSig19)
p <- plotRegion(CTCFall)
p

igv

soGGi

Since these two objects represent different data types, they will be in different scales.

We can set the scales to resize within our facets by setting freeScale parameter to TRUE.

p <- plotRegion(CTCFall,freeScale=TRUE)
p

igv

soGGi

In some instances we will want to plot over regions of unequal lengths, such as genes.

dsds <- genes(TxDb.Mmusculus.UCSC.mm10.knownGene)

##   66 genes were dropped because they have exons located on both strands
##   of the same reference sequence or on more than one reference sequence,
##   so cannot be represented by a single genomic range.
##   Use 'single.strand.genes.only=FALSE' to get all the genes in a
##   GRangesList object, or use suppressMessages() to suppress this message.

names(dsds) <- NULL
dsds

## GRanges object with 24528 ranges and 1 metadata column:
##           seqnames              ranges strand |     gene_id
##              <Rle>           <IRanges>  <Rle> | <character>
##       [1]     chr9   21062393-21073096      - |   100009600
##       [2]     chr7   84935565-84964115      - |   100009609
##       [3]    chr10   77711457-77712009      + |   100009614
##       [4]    chr11   45808087-45841171      + |   100009664
##       [5]     chr4 144157557-144162663      - |      100012
##       ...      ...                 ...    ... .         ...
##   [24524]     chr3   84496093-85887516      - |       99889
##   [24525]     chr3 110246109-110250998      - |       99890
##   [24526]     chr3 151730922-151749960      - |       99899
##   [24527]     chr3   65528410-65555518      + |       99929
##   [24528]     chr4 136550540-136602723      - |       99982
##   -------
##   seqinfo: 66 sequences (1 circular) from mm10 genome

soGGi

To produce average plots over genes we must rescale genes to be the same size.This is performed by splitting genes into equal number of bins, with bin size variable across genes and plotting as percent of regions.

We can produce these plots in soGGi by setting style to percentOfRegion.

pol2Ser2 <- regionPlot(bamFile = "~/Downloads/ENCFF239XXP.bigWig",
                    testRanges = dsds,style = "percentOfRegion",
                    format = "bigwig")

soGGi

Now we can plot the distribution of Polymerase 2 serine 2 across genes with plotRegion() function.

plotRegion(pol2Ser2)

igv

soGGi

We can now include the CTCF signal to review distibution of CTCF in genes compared to Pol2s2 signal.

CTCF_gene <- regionPlot(CTCFbigWig,
                    testRanges = dsds,
                    style = "percentOfRegion",
                    format = "bigwig")

soGGi

We can now plot the signal of CTCF and Pol2serine2 side by side.

rownames(CTCF_gene) <- rownames(pol2Ser2) <- NULL

plotRegion(c(CTCF_gene,pol2Ser2),
           colourBy = "Sample",
           freeScale = TRUE,
           groupBy="Sample")

igv

soGGi

We may also want to review our CTCF profile data as heatmap of signal across our sites.

We can produce heatmaps in soGGi using the plotHeatmap function. First we can extract a subset of sites to plot, here our sites which are higher in Mel.

Since ChIPprofile objects act like GRanges we can subset by our metadata columns using mcols() function.

CTCF_Mel_Up <- CTCF[mcols(CTCF)$name %in% UpInMel,]

soGGi

We can pass individual ChIPprofile objects to the plotHeatmap function to produce our heatmaps of CTCF signal in Mel sample. Here we set maxValue to make Heatmaps comparable across samples.

k1 <- plotHeatmap(CTCF_Mel_Up[[1]],
                  col = colorRampPalette(blues9)(100),
                  maxValue=15)

igv

soGGi

And we can produce the heatmap for our other sample in the same way.

k2 <- plotHeatmap(CTCF_Mel_Up[[2]],
                  col = colorRampPalette(blues9)(100),
                  maxValue=15)

igv

profileplyr

The heatmaps generated in soGGi however aren’t as customizable as we would like.

The profileplyr written by Doug Barrows (Allis lab/BRC) provides a framework for plotting and manipulating genomic signal heatmaps.

library(profileplyr)

profileplyr

The profileplyr package has methods to import/export signal heatmaps from external sources such as DeepTools. Most usefully for us here though it can translate our soGGi ChIPprofile objects to profileplyr objects.

The profileplyr object is a version of RangedSummarizedExperiment object we should already recognize.

ctcfProPlyr <- as_profileplyr(CTCF)
class(ctcfProPlyr)

## [1] "profileplyr"
## attr(,"package")
## [1] "profileplyr"

ctcfProPlyr

## class: profileplyr 
## dim: 61639 1501 
## metadata(0):
## assays(2): Sorted_CTCF_MEL_1Normalised.bw
##   Sorted_CTCF_Ch12_1Normalised.bw
## rownames(61639): giID2175 giID2176 ... giID59822 giID59823
## rowData names(12): baseMean log2FoldChange ... names sgGroup
## colnames(1501): Point_Centre-750 Point_Centre-749 ... Point_Centre749
##   Point_Centre750
## colData names(0):

profileplyr

This means we can act on the object like any other RangedSummarisedExperiment object including subsetting by GRanges.

ExampleGR <- GRanges("chr1:1-10000000")
ctcfProPlyr[ctcfProPlyr %over% ExampleGR]

## class: profileplyr 
## dim: 81 1501 
## metadata(0):
## assays(2): Sorted_CTCF_MEL_1Normalised.bw
##   Sorted_CTCF_Ch12_1Normalised.bw
## rownames(81): giID2233 giID2238 ... giID6060 giID6128
## rowData names(12): baseMean log2FoldChange ... names sgGroup
## colnames(1501): Point_Centre-750 Point_Centre-749 ... Point_Centre749
##   Point_Centre750
## colData names(0):

profileplyr

Before we do any plotting we will subset our profileplyr object to just CTCF peaks changing between Mel and Ch12 cell-lines.

We can use the mcols of our profileplyr object to subset to genes of interest.

ctcfProPlyr <- ctcfProPlyr[mcols(ctcfProPlyr)$name %in% c(UpInMel,DownInMel),]

profileplyr

Once we have the profileplyr object filtered to the data of interest, we can create our heatmap using the generateEnrichedHeatmap function.

generateEnrichedHeatmap(ctcfProPlyr)

igv

profileplyr

We can perform clustering of our profileplyr object using the clusterRanges() function.

By default this will perform hierarchical clustering and plot the result summarised within peaks.

ctcfProPlyr <- clusterRanges(ctcfProPlyr)

## No 'kmeans_k' or 'cutree_rows' arguments specified. profileplyr object will be returned new column with hierarchical order from hclust

profileplyr

We can now replot our heatmap using the new clustering.

We must first tell the profileplyr object which metadata column to use for clustering/ordering the heatmap using the orderBy function.

ctcfProPlyr <- orderBy(tem,"hierarchical_order")
generateEnrichedHeatmap(ctcfProPlyr)

igv

##Exercises

Time for exercises! Link here

##Solutions

Time for solutions! Link here

Visualizing Genomics Data (part 4)

http://rockefelleruniversity.github.io/RU_VisualizingGenomicsData/

Recap

Recap

Recap

Vizualizing Biological data

Vizualizing Biological data

CTCF ChIPseq

CTCF ChIPseq

CTCF ChIPseq

Defining a consensus peak set

Differential ChIP

Generating rlog counts

Assessing the dataset

Assessing the dataset

Assessing the dataset

Assessing the dataset

Assessing the dataset

Assessing the dataset

Assessing the dataset

Assessing the dataset

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

soGGi

profileplyr

profileplyr

profileplyr

profileplyr

profileplyr

profileplyr

profileplyr