These exercises cover the sections of Data wrangling with tidy.

All files can be found in the “dataset” directory.

 

Exercise 7

 

Hint:

Counts per million (CPM) are the gene counts normalized to total counts in a sample, multiplied by a million to give you a sensible number.

gene_A_CPM = (gene_A_counts / sum(all_genes_counts)) * 1,000,000
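
To check the arithmetic, here is a minimal sketch of the CPM formula for a single sample. The gene names and counts are made-up toy values, not taken from the exercise files, and Python is used here only to illustrate the calculation.

```python
# Toy raw counts for one sample (hypothetical values, not from the dataset).
counts = {"gene_A": 10, "gene_B": 30, "gene_C": 60}

# Total counts in the sample: the denominator of the CPM formula.
total_counts = sum(counts.values())

# CPM = (gene counts / total counts in the sample) * 1,000,000
cpm = {gene: (n / total_counts) * 1_000_000 for gene, n in counts.items()}

for gene, value in cpm.items():
    print(f"{gene}: {value:.1f}")
```

As a sanity check, the CPM values of all genes in one sample should add up to one million.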

Transcripts per million (TPM) are the gene counts first normalized to gene length, then normalized to the total of these length-normalized counts in a sample, multiplied by a million to give you a sensible number.

gene_A_TPM = ((gene_A_counts / gene_A_length) / sum(all_genes_counts / all_genes_lengths)) * 1,000,000
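
The TPM formula can be sketched the same way for a single sample. The gene names, counts, and gene lengths below are made-up toy values, not taken from the exercise files, and Python is used only to illustrate the calculation.

```python
# Toy raw counts and gene lengths for one sample (hypothetical values).
counts = {"gene_A": 10, "gene_B": 30, "gene_C": 60}
lengths = {"gene_A": 1000, "gene_B": 2000, "gene_C": 3000}  # assumed to be in bases

# Step 1: normalize each gene's counts to its own length.
rate = {gene: counts[gene] / lengths[gene] for gene in counts}

# Step 2: sum the length-normalized counts over all genes in the sample.
total_rate = sum(rate.values())

# Step 3: TPM = (length-normalized counts / their sample total) * 1,000,000
tpm = {gene: (r / total_rate) * 1_000_000 for gene, r in rate.items()}

for gene, value in tpm.items():
    print(f"{gene}: {value:.1f}")
```

With these toy numbers, gene_C has six times the counts of gene_A but only twice the TPM, because it is three times as long. As with CPM, the TPM values in one sample should add up to one million.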

More info on RNA-seq count quantification units here: http://luisvalesilva.com/datasimple/rna-seq_units.html