This is a more extended problem set for the RNAseq course.
Let's try to repeat the RNAseq analysis in this paper from the Allis Lab: Histone butyrylation in the mouse intestine is mediated by the microbiota and associated with regulation of gene expression.
The GEO record is here, but you can download the files directly from ENA here.
If you do not want to dig into this section you can skip forward to section 2 and use the counts table that we provide.
Use rFastp to review the quality. Is there anything of note?
Use salmon to get pseudoalignment counts per transcript. Use tximport to import the counts into R.
If you want to save time, an index and genome files for mm10 are available here.
Now we will take our imported counts and assess the sample-to-sample variation.
You can use the counts object data/PMC11520355_counts.csv in the project.
Import the counts into a DESeq2 object. Run a PCA on the counts. Do this to assess Vehicle vs Ampicillin Treatment, Mock vs Tributyrin and Replicates. You can do this with DESeq2, prcomp or pcaExplorer. [This is figure Ex 6b]
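As a rough illustration of the prcomp route mentioned above, here is a minimal sketch in base R. The counts are simulated stand-ins, and the three group labels, the three replicates per group and the log2 counts-per-million transform are all assumptions made for illustration; on the real data you would load the provided counts table and would more typically use a DESeq2 transformation before the PCA.

```r
set.seed(1)

# Hypothetical stand-in for the real counts table: 3 groups x 3 replicates.
# (Assumption) The real table could be read with something like
# as.matrix(read.csv("data/PMC11520355_counts.csv", row.names = 1)).
group   <- rep(c("Veh_Mock", "Amp_Mock", "Amp_Tri"), each = 3)
samples <- paste0(group, "_rep", rep(1:3, times = 3))
n_genes <- 300
base_mean <- 2^runif(n_genes, 4, 10)
counts <- matrix(rnbinom(n_genes * 9, mu = base_mean, size = 10),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)), samples))

# Simple depth normalisation (counts per million) followed by log2.
cpm     <- t(t(counts) / colSums(counts)) * 1e6
log_cpm <- log2(cpm + 1)

# prcomp expects samples in rows, so transpose the gene x sample matrix.
pca     <- prcomp(t(log_cpm), center = TRUE, scale. = FALSE)
pct_var <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

# Colour points by group; swap in replicate or treatment to check other factors.
grp <- factor(group)
plot(pca$x[, 1], pca$x[, 2], col = as.integer(grp), pch = 19,
     xlab = paste0("PC1 (", pct_var[1], "%)"),
     ylab = paste0("PC2 (", pct_var[2], "%)"))
legend("topright", legend = levels(grp),
       col = seq_along(levels(grp)), pch = 19)
```

Recolouring the same plot by each metadata column in turn (antibiotic treatment, tributyrin treatment, replicate) is one way to see which factor each principal component tracks.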
Now check the samples using a dissimilarity matrix. Ensure the biological and technical metadata are incorporated, e.g. replicate. [This is figure Ex 6a]
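One possible way to build such a dissimilarity matrix is sketched below in base R. The counts are simulated, and the group names, replicate numbering and colour choices are assumptions for illustration only; Euclidean distance on log2 counts is used here as a simple stand-in for distances computed on properly transformed data.

```r
set.seed(2)

# Hypothetical sample sheet and simulated counts (not the real dataset).
group     <- rep(c("Veh_Mock", "Amp_Mock", "Amp_Tri"), each = 3)
replicate <- rep(1:3, times = 3)
samples   <- paste0(group, "_rep", replicate)
n_genes   <- 300
base_mean <- 2^runif(n_genes, 4, 10)
counts <- matrix(rnbinom(n_genes * 9, mu = base_mean, size = 10),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)), samples))

# Euclidean distance between samples on the log scale.
log_counts <- log2(counts + 1)
d     <- dist(t(log_counts))
d_mat <- as.matrix(d)
hc    <- hclust(d)

# Side colours carry the metadata: columns show group, rows show replicate.
grp_cols <- unname(c(Veh_Mock = "grey40", Amp_Mock = "steelblue",
                     Amp_Tri = "firebrick")[group])
rep_cols <- c("gold", "darkorange", "brown")[replicate]

heatmap(d_mat, Rowv = as.dendrogram(hc), symm = TRUE,
        ColSideColors = grp_cols, RowSideColors = rep_cols)
```

If samples cluster by replicate rather than by group, that would point to a batch effect worth accounting for in the design.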
Double-check counts for a few genes of interest: Arg2, Gstm2, Coq7, Hk2 [This is figure Ex 6d]
Run differentials between each group: Amp_Mock vs. Veh_Mock, Amp_Tri vs. Amp_Mock and Amp_Tri vs. Veh_Mock.
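The three comparisons can be sketched with DESeq2 contrasts as follows. This runs on simulated counts, and the single `group` column, its three levels and the three replicates per group are assumptions about how the sample sheet might be organised, not a description of the real metadata.

```r
library(DESeq2)

set.seed(42)

# Hypothetical design: one factor with three levels, three replicates each.
group <- factor(rep(c("Veh_Mock", "Amp_Mock", "Amp_Tri"), each = 3),
                levels = c("Veh_Mock", "Amp_Mock", "Amp_Tri"))
samples <- paste0(group, "_rep", rep(1:3, times = 3))
n_genes <- 500
base_mean <- 2^runif(n_genes, 4, 10)
counts <- matrix(rnbinom(n_genes * 9, mu = base_mean, size = 5),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)), samples))
col_data <- data.frame(group = group, row.names = samples)

dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData   = col_data,
                              design    = ~ group)
dds <- DESeq(dds)

# Each contrast is c(numerator, denominator): log2FC = numerator / denominator.
contrasts <- list(
  Amp_Mock_vs_Veh_Mock = c("Amp_Mock", "Veh_Mock"),
  Amp_Tri_vs_Amp_Mock  = c("Amp_Tri",  "Amp_Mock"),
  Amp_Tri_vs_Veh_Mock  = c("Amp_Tri",  "Veh_Mock")
)
res_list <- lapply(contrasts, function(x) {
  results(dds, contrast = c("group", x[1], x[2]))
})
```

Fitting the model once and pulling out each comparison with `contrast` keeps the dispersion estimates shared across all three tests.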
Create a summary table that describes the significantly changing genes for each comparison, including a breakdown of up- and downregulated genes. [This is figure Ex 6c]
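A summary table of this kind might be assembled along these lines. The helper `summarise_de`, the toy result tables and the 0.05 adjusted p-value cutoff are hypothetical choices for illustration; the column names `log2FoldChange` and `padj` follow DESeq2 results output.

```r
# Hypothetical helper: count significant genes and split them by direction.
# Genes with an NA adjusted p-value are treated as not significant.
summarise_de <- function(res, alpha = 0.05) {
  sig <- !is.na(res$padj) & res$padj < alpha
  data.frame(up    = sum(sig & res$log2FoldChange > 0),
             down  = sum(sig & res$log2FoldChange < 0),
             total = sum(sig))
}

# Toy results standing in for the per-comparison DESeq2 tables.
toy_results <- list(
  Amp_Mock_vs_Veh_Mock = data.frame(
    log2FoldChange = c(1.2, -0.8, 0.3, -2.1),
    padj           = c(0.01, 0.04, 0.60, NA)),
  Amp_Tri_vs_Amp_Mock = data.frame(
    log2FoldChange = c(2.0, 1.1, -0.5, 0.2),
    padj           = c(0.001, 0.03, 0.02, 0.9))
)

summary_table <- data.frame(
  comparison = names(toy_results),
  do.call(rbind, lapply(toy_results, summarise_de)),
  row.names = NULL
)
print(summary_table)
```

A fold-change threshold could be added to the helper in the same way as `alpha` if the comparison should also require a minimum effect size.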
Subset the dataset to genes that change significantly with Tributyrin treatment. Run a clustering analysis to parse the subsetted dataset into several patterns of gene expression. [This is figure 3F]
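One common approach to this kind of pattern clustering is to z-score each gene across samples and cut a hierarchical clustering tree, sketched below in base R. The expression matrix is simulated with three planted patterns, and the choice of Ward linkage and of three clusters is an assumption for illustration, not necessarily the method used in the paper.

```r
set.seed(3)

# Hypothetical log-scale expression for 60 "significant" genes.
# Columns of `pattern` are the group offsets: Veh_Mock, Amp_Mock, Amp_Tri.
pattern <- rbind(up_in_tri   = c(0,  0,  2),
                 down_in_amp = c(0, -2, -2),
                 rescued     = c(0, -2,  0))
group <- rep(c("Veh_Mock", "Amp_Mock", "Amp_Tri"), each = 3)
expr  <- pattern[rep(1:3, each = 20), rep(1:3, each = 3)] +
  rnorm(60 * 9, sd = 0.3) + 8
dimnames(expr) <- list(paste0("gene", 1:60),
                       paste0(group, "_rep", rep(1:3, times = 3)))

# Row z-scores so clustering reflects the pattern, not the expression level.
z <- t(scale(t(expr)))

hc       <- hclust(dist(z), method = "ward.D2")
clusters <- cutree(hc, k = 3)
print(table(clusters))

# Mean z-score per cluster and sample gives one profile per pattern.
cluster_means <- apply(z, 2, function(col) tapply(col, clusters, mean))
matplot(t(cluster_means), type = "b", pch = 19, lty = 1,
        xlab = "sample", ylab = "mean z-score")
```

On real data the number of clusters is a judgement call; inspecting the dendrogram or the heatmap of `z` before fixing `k` is usually worthwhile.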
Use clusterProfiler to check the GO terms associated with each cluster. Visualize this with dotplots. [This is figure 3G]