This is a more extended problem set for the RNAseq course.

Lets try and repeat the RNAseq analysis in this paper from the Allis Lab: Histone butyrylation in the mouse intestine is mediated by the microbiota and associated with regulation of gene expression.

1. Processing

The GEO is here, but you can directly download files from ENA here.

If you do not want to dig into this section you can skip forward to section 2 and use the counts table that we provide.

FQ quality

Use rFastp to review the quality. Is there anything of note?

Counting

Use salmon to get pseudoalignment counts per transcript. Use tximport to import the counts into R.

If you want to save time there is an index and genome files for mm10 here.

2. Project QC

Now we will take our imported counts and assess the sample-to-sample varraition.

You can use the counts object data/PMC11520355_counts.csv in the project.

PCA

Import the counts into a DESeq2 object. Run a PCA on the counts. Do this to assess Vehicle vs Ampicillin Treatment, Mock vs Tributyrin and Replicates. You can do this with DESeq2, prcomp or pcaExplorer. [This is figure Ex 6b]

Dissimilarity Matrix

Now check the sample using a dissimilarity matrix. Ensure the biological and technical metadata is incorporated i.e. replicate. . [This is figure Ex 6a]

Check a few specific genes

Double-check counts for a few genes of interest: Arg2, Gstm2, Coq7, Hk2 [This is figure Ex 6d]

3. Differentials

Run DESeq2

Run differentials between each group: Amp_Mock vs. Veh_Mock, Amp_Tri vs. Amp_Mock and Amp_Tri vs. Veh_Mock.

Create Summary

Create a summary table that describes significantly changing genes for each comparison, including break down of up/down regulated. [This is figure Ex 6c]

4. Cluster Analysis

Cluster Analysis

Subset the dataset to genes that change signigciantly with Tributyrin treatment. Run clustering analysis to parse the subsetted dataset into several patterns of gene expression. [This is figure 3F]

GO term analysis

Use clusterProfiler to check the GO terms associated with each cluster. Visualize this with dotplots. [This is figure 3G]