These first exercises are about alignment and counting in RNAseq.

In today's session we will work with some of the RNAseq data of T-regulatory cells from Christina Leslie’s lab.

Sequencing data as a FASTQ file can be found here.

Aligned data as a BAM file can be found here.

Exercises

1. Run Rfastp

Download the above FASTQ file for T-regulatory cells (replicate 2) - ENCFF070QMF.fastq.gz. Alternatively, you can work with the much smaller sampled dataset in the data folder:

data/ENCFF070QMF_sampled.fastq.gz
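As a starting point, the Rfastp step might look like the sketch below. It assumes the Rfastp Bioconductor package is installed and uses the sampled file; the output prefix `ENCFF070QMF_rfastp` and the helper name `run_rfastp` are illustrative choices, not part of the exercise.

```r
# Sketch only: the output prefix and helper name are assumptions.
fastq_in   <- "data/ENCFF070QMF_sampled.fastq.gz"  # sampled file named in the exercise
out_prefix <- "ENCFF070QMF_rfastp"                 # hypothetical prefix for the filtered reads

run_rfastp <- function(fastq, prefix) {
  # rfastp() filters and trims the reads, writes the filtered FASTQ under
  # `prefix`, and returns the QC report as an R list
  Rfastp::rfastp(read1 = fastq, outputFastq = prefix)
}

# Run only when the package and the input file are both available
if (requireNamespace("Rfastp", quietly = TRUE) && file.exists(fastq_in)) {
  json_report <- run_rfastp(fastq_in, out_prefix)
}
```

For single-end input like this, the filtered reads should then be written to a file named after the prefix (here, `ENCFF070QMF_rfastp_R1.fastq.gz`); check your working directory to confirm the exact name.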

2. Check Rfastp QC plots

Check the QC summary and then look at the GC and Quality curves.
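One possible way to produce the summary and curves is sketched here. It assumes you already have the report object returned by `rfastp()`; the helper name `show_rfastp_qc` is hypothetical, and the curve type is passed as the second argument to `curvePlot()`.

```r
# Sketch only: `json_report` is assumed to be the list returned by Rfastp::rfastp().
show_rfastp_qc <- function(json_report) {
  # Table of read and base statistics before and after filtering
  print(Rfastp::qcSummary(json_report))

  # Default curves: per-cycle quality
  print(Rfastp::curvePlot(json_report))

  # Base content curves, which include GC
  print(Rfastp::curvePlot(json_report, "content_curves"))

  invisible(json_report)
}

# Usage, once a report exists:
# show_rfastp_qc(json_report)
```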

##                      Before_QC     After_QC
## total_reads       1.000000e+05 9.967500e+04
## total_bases       5.000000e+06 4.983750e+06
## q20_bases         4.875344e+06 4.869811e+06
## q30_bases         4.749907e+06 4.748659e+06
## q20_rate          9.750690e-01 9.771380e-01
## q30_rate          9.499810e-01 9.528280e-01
## read1_mean_length 5.000000e+01 5.000000e+01
## gc_content        4.802670e-01 4.802000e-01
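To help read the summary above, the short base R sketch below recomputes a few derived quantities from the printed values. The numbers are copied from the table; the variable names are illustrative.

```r
# Values copied from the QC summary table above
before <- c(total_reads = 1e5,   total_bases = 5e6,
            q20_bases = 4875344, q30_bases = 4749907)
after  <- c(total_reads = 99675, total_bases = 4983750,
            q20_bases = 4869811, q30_bases = 4748659)

# How many reads did filtering remove, and what fraction survived?
reads_removed <- before[["total_reads"]] - after[["total_reads"]]   # 325
frac_kept     <- after[["total_reads"]] / before[["total_reads"]]   # 0.99675

# The rate rows are simply high-quality bases divided by total bases
q20_rate_after <- after[["q20_bases"]] / after[["total_bases"]]
q30_rate_after <- after[["q30_bases"]] / after[["total_bases"]]

# Mean read length is total bases divided by total reads
mean_len_after <- after[["total_bases"]] / after[["total_reads"]]   # 50

round(c(frac_kept = frac_kept, q20 = q20_rate_after, q30 = q30_rate_after), 4)
```

Filtering removed only 325 of 100,000 reads and nudged the Q20 and Q30 rates up slightly, which suggests the sampled data were already of good quality.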

3. Alignment

Align our filtered reads to chromosome 10 of the mm10 genome. Sort and index the resulting BAM file.
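The exercise does not name an aligner, so the sketch below makes one possible choice: Rsubread's splice-aware `subjunc()`, with the chromosome 10 sequence taken from the BSgenome.Mmusculus.UCSC.mm10 package and sorting and indexing done with Rsamtools. All file names and the helper name `align_chr10` are assumptions.

```r
# Sketch only: assumes BSgenome.Mmusculus.UCSC.mm10, Biostrings, Rsubread and
# Rsamtools are installed. File names are hypothetical.
align_chr10 <- function(fastq,
                        fasta = "mm10_chr10.fa",
                        index = "mm10_chr10",
                        bam   = "Treg_2.bam") {
  # 1. Extract chr10 from the mm10 genome package and write it out as FASTA
  genome <- BSgenome.Mmusculus.UCSC.mm10::BSgenome.Mmusculus.UCSC.mm10
  chr10  <- Biostrings::DNAStringSet(list(chr10 = genome[["chr10"]]))
  Biostrings::writeXStringSet(chr10, fasta)

  # 2. Build an Rsubread index from the chr10 FASTA
  Rsubread::buildindex(basename = index, reference = fasta)

  # 3. Align the filtered reads (subjunc reports exon-exon junctions)
  Rsubread::subjunc(index = index, readfile1 = fastq, output_file = bam)

  # 4. Sort by coordinate, then index; sortBam() appends ".bam" to the destination
  sorted_bam <- Rsamtools::sortBam(bam, sub("\\.bam$", "_sorted", bam))
  Rsamtools::indexBam(sorted_bam)

  sorted_bam
}

# Usage, with the filtered FASTQ from the QC step:
# sorted_bam <- align_chr10("path/to/filtered_reads.fastq.gz")
```

Building the index is the slow step; restricting the reference to a single chromosome keeps it manageable on a laptop.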

4. Counting

Count the reads in our newly aligned and indexed BAM file that map within genes. Plot a density plot of the log10 of read counts across genes on chromosome 10.
NOTE: Add 1 read to all counts to avoid taking the log of zero.
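A minimal sketch of the counting step follows, assuming gene models come from TxDb.Mmusculus.UCSC.mm10.knownGene and counting is done with `summarizeOverlaps()` from GenomicAlignments; the exercise does not prescribe either, and the helper name `count_genes_chr10` is hypothetical. The second half uses made-up counts to show why the pseudocount matters.

```r
# Sketch only: package choices and the helper name are assumptions.
count_genes_chr10 <- function(sorted_bam) {
  txdb  <- TxDb.Mmusculus.UCSC.mm10.knownGene::TxDb.Mmusculus.UCSC.mm10.knownGene
  # Exons grouped by gene, restricted to genes on chr10
  exons <- GenomicFeatures::exonsBy(txdb, by = "gene")
  exons <- GenomeInfoDb::keepSeqlevels(exons, "chr10", pruning.mode = "coarse")
  # Count reads overlapping each gene's exons, ignoring strand
  se <- GenomicAlignments::summarizeOverlaps(
    exons, Rsamtools::BamFile(sorted_bam), ignore.strand = TRUE)
  SummarizedExperiment::assay(se)[, 1]
}

# Toy illustration of the pseudocount (made-up counts, not real data)
toy_counts <- c(0, 0, 3, 12, 150, 980, 4200)
log10(toy_counts)[1:2]                 # -Inf for genes with zero reads
log_counts <- log10(toy_counts + 1)    # zero-count genes now map to 0

gene_density <- density(log_counts)
if (interactive()) {
  plot(gene_density, main = "Reads per gene (toy data)",
       xlab = "log10(count + 1)")
}

# Usage with real data:
# counts <- count_genes_chr10("path/to/sorted.bam")
# plot(density(log10(counts + 1)))
```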

5. Salmon Quantification [ADVANCED]

Download and install Salmon. Using Salmon, quantify transcript levels using reads from our filtered FASTQ.
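If Salmon is installed and on your PATH, it can be driven from R with `system2()`, as in the sketch below. The transcript FASTA, index directory, FASTQ name and output directory are all hypothetical placeholders; `-l A` asks Salmon to infer the library type and `-r` marks the reads as single-end.

```r
# Sketch only: every path below is a placeholder to adapt.
transcripts_fa <- "mm10_transcripts.fa"      # hypothetical transcript FASTA
index_dir      <- "mm10_salmon_index"        # hypothetical index directory
fastq          <- "filtered_reads.fastq.gz"  # hypothetical filtered FASTQ
out_dir        <- "TReg_2_Quant"             # quant.sf is written inside this directory

# Argument vectors for `salmon index` and `salmon quant`
salmon_index_args <- function(transcripts_fa, index_dir) {
  c("index", "-t", transcripts_fa, "-i", index_dir)
}
salmon_quant_args <- function(index_dir, fastq, out_dir) {
  c("quant", "-i", index_dir, "-l", "A", "-r", fastq, "-o", out_dir)
}

# Run only when salmon and the input files are actually present
if (nzchar(Sys.which("salmon")) &&
    file.exists(transcripts_fa) && file.exists(fastq)) {
  system2("salmon", salmon_index_args(transcripts_fa, index_dir))
  system2("salmon", salmon_quant_args(index_dir, fastq, out_dir))
}
```

This builds a plain transcriptome index; Salmon also supports decoy-aware indexes, which this sketch does not attempt.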

6. Review Salmon scores

Read in the generated quant.sf file and plot log2 read counts against log10 TPM scores in a scatter plot. NOTE: If you did not run Salmon yourself, the quant file can be found here: “data/Salmon/TReg_2_Quant/quant.sf”

## Warning: Transformation introduced infinite values in continuous x-axis
## Warning: Transformation introduced infinite values in continuous y-axis
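The warnings above most likely come from transcripts with zero reads or zero TPM, whose log is negative infinity. The base R sketch below reads the quant file using Salmon's standard `NumReads` and `TPM` columns and drops those transcripts before plotting; it falls back to a small made-up table when the file is not present, and the axis assignment is an arbitrary choice.

```r
quant_path <- "data/Salmon/TReg_2_Quant/quant.sf"

if (file.exists(quant_path)) {
  # quant.sf is tab-separated: Name, Length, EffectiveLength, TPM, NumReads
  quant <- read.delim(quant_path)
} else {
  # Made-up stand-in with the same columns, for illustration only
  quant <- data.frame(
    Name            = c("tx1", "tx2", "tx3", "tx4"),
    Length          = c(1200, 800, 2500, 1500),
    EffectiveLength = c(1050, 650, 2350, 1350),
    TPM             = c(0, 2.5, 40, 310),
    NumReads        = c(0, 4, 64, 1024)
  )
}

# log of zero is -Inf, which is what triggers the plotting warnings
log2(0)

# Keep only transcripts with non-zero reads and TPM before transforming
expressed <- quant$NumReads > 0 & quant$TPM > 0
log2_reads <- log2(quant$NumReads[expressed])
log10_tpm  <- log10(quant$TPM[expressed])

if (interactive()) {
  plot(log2_reads, log10_tpm,
       xlab = "log2(NumReads)", ylab = "log10(TPM)", pch = 16, cex = 0.5)
}
```

Adding a pseudocount before taking the log is an alternative to filtering, at the cost of slightly distorting the lowest values.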