ChIPseq in Bioconductor exercises (part 1)

ChIPseq data processing

In these exercises we will review some of the functionality for summarizing counts and signal across genomes and within regions.

We will be using data downloaded directly from the ENCODE consortium.

Download the FASTQ file for the other Myc MEL replicate from sample ENCSR000EUA.

The resulting FASTQ file is ENCFF001NQQ.fastq.gz.

Read a random sample of 10,000 reads from ENCFF001NQQ.fastq.gz into R.
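Drawing a fixed-size random sample from a large FASTQ file is commonly done with reservoir sampling, which keeps only the sample in memory while streaming through the records once. The sketch below illustrates that idea in plain Python rather than with the R/Bioconductor tooling this exercise targets. The function name `sample_records` and its parameters are hypothetical, and it assumes the records have already been parsed into an iterable.

```python
import random

def sample_records(records, k, seed=None):
    """Return up to k records chosen uniformly at random (reservoir sampling).

    `records` can be any iterable, e.g. a generator yielding one FASTQ
    record at a time, so the full file never has to sit in memory.
    """
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = record
    return sample

# Toy usage: pick 10 items out of 100 stand-in "records".
picked = sample_records(range(100), 10, seed=1)
print(len(picked))
```

If the input holds fewer than `k` records, every record is returned in its original order.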

Produce a boxplot of quality scores across cycles.

Create a barplot of the occurrence of A, C, G, T and N in the reads.
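The numbers behind such a barplot are simply the totals of each letter over all sampled reads. As a language-neutral illustration (plain Python, not the Bioconductor solution), a minimal sketch might look like this. The function name `base_counts` and the toy reads are invented for the example.

```python
from collections import Counter

BASES = "ACGTN"  # the five symbols the exercise asks about

def base_counts(reads):
    """Count occurrences of A, C, G, T and N summed over all reads."""
    counts = Counter()
    for read in reads:
        counts.update(read.upper())
    # Report every base in a fixed order, including bases never observed.
    return {base: counts.get(base, 0) for base in BASES}

# Toy reads, invented for illustration only.
toy_reads = ["ACGTN", "AACCG", "NNNTT"]
print(base_counts(toy_reads))
```

The resulting dictionary maps each base to the bar height that would be plotted.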

Create a histogram of read scores.

Create a new FASTQ file from the original, filtering out reads with a summed quality score of less than 250 or an N content greater than 50%.
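To make the filtering criterion concrete, the sketch below applies it to single reads in plain Python (again an illustration of the logic, not the Bioconductor solution). It assumes that a read is discarded when either condition holds, and that quality strings use the Phred+33 encoding. Both are assumptions, as are the names `phred_scores` and `keep_read`.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    offset=33 is an assumption (Sanger / Illumina 1.8+ encoding);
    older Illumina files may need offset=64.
    """
    return [ord(ch) - offset for ch in quality]

def keep_read(seq, quality, min_score_sum=250, max_n_fraction=0.5):
    """Return True if a read passes both filters from the exercise.

    A read is dropped when its summed quality is below `min_score_sum`
    or when more than `max_n_fraction` of its bases are N.
    """
    score_sum = sum(phred_scores(quality))
    # Treat an empty read as all-N so that it is always rejected.
    n_fraction = seq.upper().count("N") / len(seq) if seq else 1.0
    return score_sum >= min_score_sum and n_fraction <= max_n_fraction

# Toy reads: 'I' decodes to Phred 40 and '#' to Phred 2 under Phred+33.
print(keep_read("ACGTACGTAC", "IIIIIIIIII"))  # high quality, no N
print(keep_read("ACGTACGTAC", "##########"))  # summed quality too low
print(keep_read("NNNNNNACGT", "IIIIIIIIII"))  # 60% N
```

A read with exactly 50% N is kept here, since the exercise only excludes N content greater than 50%.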

Align the FASTQ file to the mm10 genome (main chromosomes only: chr1 to chr19, chrX, chrY and chrM) to produce a sorted, indexed BAM file.

Produce a bigWig of coverage and another of coverage normalised to total reads (as Reads Per Million).
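Reads Per Million normalisation rescales each coverage value by 1,000,000 divided by the total number of reads, so that samples of different sequencing depth become comparable. The arithmetic can be sketched in plain Python as below. The function name `reads_per_million` and the toy numbers are hypothetical, and writing the result to a bigWig file is not covered.

```python
def reads_per_million(coverage, total_reads):
    """Scale raw per-position coverage to Reads Per Million (RPM)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    scale = 1_000_000 / total_reads
    return [depth * scale for depth in coverage]

# Toy example: raw depth at three positions in a library of 2 million reads.
raw = [0, 5, 10]
print(reads_per_million(raw, total_reads=2_000_000))
```

With 2 million reads the scaling factor is 0.5, so a raw depth of 10 becomes an RPM value of 5.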