These exercises cover the integration and annotation material from session 3.
We will use data from this study on human renal cancer.
The full dataset is on ArrayExpress. For now, we will use 4 samples: FCAImmP7277561, FCAImmP7277560, FCAImmP7277553, and FCAImmP7277552. These are CD45+/- cells from liver.
You can also download them from Dropbox.
Exercise 0 - Preprocess data [You can skip this and go straight to integration if you want]
Exercise 1 - Integration
Merge the datasets. How does the merged data look? You can download a list of the 4 preprocessed datasets from Dropbox, or do Exercise 0 yourself.
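A minimal sketch of the merge step, assuming Seurat v5 and that the downloaded file is an R list of 4 Seurat objects. The file name `seurat_list.rds`, the object names, and the use of `orig.ident` as the sample label are assumptions, not part of the course material.

```r
library(Seurat)

# Hypothetical file name: a list of the 4 preprocessed Seurat objects
seurat_list <- readRDS("seurat_list.rds")

# Merge the first object with the remaining three; in Seurat v5 each
# sample is kept as its own layer in the RNA assay
merged <- merge(
  x = seurat_list[[1]],
  y = seurat_list[-1],
  add.cell.ids = names(seurat_list)  # NULL if the list is unnamed
)

# Standard unintegrated workflow, so sample effects can be inspected
merged <- NormalizeData(merged)
merged <- FindVariableFeatures(merged)
merged <- ScaleData(merged)
merged <- RunPCA(merged)
merged <- RunUMAP(merged, dims = 1:30, reduction.name = "umap.unintegrated")

# Colour cells by sample (assumes orig.ident holds the sample ID)
DimPlot(merged, reduction = "umap.unintegrated", group.by = "orig.ident")
```

If cells separate mainly by sample rather than by cell type in this plot, that is the batch effect the integration steps are meant to reduce.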
Integrate the 4 datasets using rPCA
Integrate the 4 datasets using Harmony
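Both integrations can be sketched with `IntegrateLayers()` from Seurat v5. This assumes a merged object called `merged` (a hypothetical name) with one layer per sample that has already been normalized, scaled, and run through PCA, and that the harmony package is installed. The reduction names are illustrative choices.

```r
library(Seurat)

# rPCA integration: writes a corrected embedding to "integrated.rpca"
merged <- IntegrateLayers(
  object = merged,
  method = RPCAIntegration,
  orig.reduction = "pca",
  new.reduction = "integrated.rpca",
  verbose = FALSE
)

# Harmony integration: writes a corrected embedding to "harmony"
merged <- IntegrateLayers(
  object = merged,
  method = HarmonyIntegration,
  orig.reduction = "pca",
  new.reduction = "harmony",
  verbose = FALSE
)

# One UMAP per integrated embedding, so the two methods can be compared
merged <- RunUMAP(merged, reduction = "integrated.rpca", dims = 1:30,
                  reduction.name = "umap.rpca")
merged <- RunUMAP(merged, reduction = "harmony", dims = 1:30,
                  reduction.name = "umap.harmony")
```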
Assess the integration performance. Here are some marker genes: A1BG and SERPINC1
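One way to assess the result visually is sketched below. It assumes the object `merged` carries UMAP reductions named `umap.rpca` and `umap.harmony`, and that `orig.ident` holds the sample ID; all of these names are assumptions. After a successful integration, cells expressing a given marker should group together regardless of which sample they came from.

```r
library(Seurat)

# Sample mixing: samples should overlap within clusters after integration
DimPlot(merged, reduction = "umap.rpca", group.by = "orig.ident")
DimPlot(merged, reduction = "umap.harmony", group.by = "orig.ident")

# Marker expression on the integrated embedding
FeaturePlot(merged, features = c("A1BG", "SERPINC1"),
            reduction = "umap.harmony")

# The same markers shown per sample, to check they land in the same region
FeaturePlot(merged, features = c("A1BG", "SERPINC1"),
            reduction = "umap.harmony", split.by = "orig.ident")
```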
Exercise 2 - Annotation
[Hint: The layers of a Harmony integrated object are kept separate in the Seurat object. You may need to run JoinLayers() before exporting the count matrix]
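The hint can be sketched as follows, assuming Seurat v5, an integrated object named `merged`, and an assay called "RNA". The output file names and the Matrix Market format are illustrative choices; use whatever format your annotation tool expects.

```r
library(Seurat)
library(Matrix)

# Collapse the per-sample layers into a single counts layer
merged <- JoinLayers(merged)

# Extract the raw count matrix (genes x cells, sparse)
counts <- LayerData(merged, assay = "RNA", layer = "counts")

# Write the matrix with its gene and cell names (hypothetical file names)
writeMM(counts, file = "counts.mtx")
writeLines(rownames(counts), "genes.tsv")
writeLines(colnames(counts), "barcodes.tsv")
```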
Exercise 3 - Differential expression
Set up: Using the integrated Seurat object of the AD samples from our lecture, we will look at differentially expressed genes between AD and control subjects in cluster #4 from the ‘seurat_clusters’ metadata group. This object is on Dropbox as integrated.rds
Generate a list of differentially expressed genes for AD vs control in cluster #4 using a Wilcoxon rank sum test. Note: Confirm you are using the ‘seurat_clusters’ group, not the ‘paper_cluster’ group that also exists in the object.
[Hint: Make sure you have the Seurat object properly set up for differential analysis. Are you using the correct assay? Are the integrated layers joined?]
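A sketch of the Wilcoxon comparison under stated assumptions: the metadata column holding diagnosis is called `condition` with values "AD" and "control", and the assay is "RNA". Both names are hypothetical, so check `colnames(integrated@meta.data)` and `Assays(integrated)` for the real ones.

```r
library(Seurat)

integrated <- readRDS("integrated.rds")

# Test on the RNA assay, with per-sample layers joined (Seurat v5 assay)
DefaultAssay(integrated) <- "RNA"
integrated <- JoinLayers(integrated)

# Keep only cluster 4 of 'seurat_clusters' (not 'paper_cluster')
cluster4 <- subset(integrated, subset = seurat_clusters == "4")

# Compare AD against control within that cluster
Idents(cluster4) <- "condition"  # assumed column name
de_wilcox <- FindMarkers(cluster4, ident.1 = "AD", ident.2 = "control",
                         test.use = "wilcox")
head(de_wilcox)
```

With `ident.1 = "AD"`, a positive `avg_log2FC` means higher expression in AD.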
Generate a list of genes with differential expression statistics for AD vs control in cluster #4 using MAST. Note: Again, confirm you are using the ‘seurat_clusters’ group, not the ‘paper_cluster’ group that also exists in the object. Sort the table of genes by the ‘FDR’ column.
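The MAST run might look like the sketch below. It assumes `cluster4` is the cluster #4 subset of the integrated object, that diagnosis is stored in a metadata column called `condition` (a hypothetical name), and that the MAST package is installed. `FindMarkers()` reports a Bonferroni-adjusted `p_val_adj` and no column named ‘FDR’; treating ‘FDR’ as the Benjamini-Hochberg adjusted p-value is an assumption here.

```r
library(Seurat)

de_mast <- FindMarkers(cluster4, ident.1 = "AD", ident.2 = "control",
                       group.by = "condition",  # assumed column name
                       test.use = "MAST")

# Assumption: 'FDR' means BH-adjusted p-values over the genes tested
de_mast$FDR <- p.adjust(de_mast$p_val, method = "BH")

# Sort by FDR, smallest first
de_mast <- de_mast[order(de_mast$FDR), ]
head(de_mast)
```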
Make one combined violin plot or separate violin plots for the top 5 genes that are upregulated in AD in this cluster of cells
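A sketch of the plotting step, assuming `de_mast` is the MAST result table from `FindMarkers()`, `cluster4` is the cluster #4 subset, and diagnosis is in a hypothetical `condition` column. "Top 5" is taken here to mean the 5 smallest p-values among genes with positive log fold change, which is one possible reading.

```r
library(Seurat)

# Genes higher in AD, ranked by p-value
up_in_ad <- de_mast[de_mast$avg_log2FC > 0, ]
up_in_ad <- up_in_ad[order(up_in_ad$p_val), ]
top5 <- head(rownames(up_in_ad), 5)

# Option 1: one stacked figure with all 5 genes
VlnPlot(cluster4, features = top5, group.by = "condition",
        stack = TRUE, flip = TRUE)

# Option 2: a list of separate plots, one per gene
plots <- VlnPlot(cluster4, features = top5, group.by = "condition",
                 pt.size = 0, combine = FALSE)
plots[[1]]
```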
Generate a list of genes with differential expression statistics for AD vs control in cluster #4 using a pseudobulk strategy. Use DESeq2 and sort the table by the ‘pvalue’ column from the DESeq2 results.
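One possible pseudobulk workflow is sketched below: sum raw counts per subject within cluster #4, then run DESeq2 on the resulting matrix. The metadata column names `sample_id` and `condition` are hypothetical, and `cluster4` is assumed to be the cluster #4 subset with joined layers. The aggregation is done with a sparse matrix product here; Seurat's `AggregateExpression()` is an alternative.

```r
library(Seurat)
library(Matrix)
library(DESeq2)

# Raw counts for cluster 4 (genes x cells)
counts <- LayerData(cluster4, assay = "RNA", layer = "counts")

# Cell-to-sample indicator matrix, then sum counts per sample
sample_ids <- factor(cluster4$sample_id)  # assumed column name
indicator <- sparse.model.matrix(~ 0 + sample_ids)
pb <- as.matrix(counts %*% indicator)
colnames(pb) <- levels(sample_ids)

# One metadata row per sample, in the same order as the columns of pb
meta <- unique(cluster4@meta.data[, c("sample_id", "condition")])
rownames(meta) <- as.character(meta$sample_id)
meta <- meta[colnames(pb), , drop = FALSE]
meta$condition <- factor(meta$condition, levels = c("control", "AD"))

# DESeq2 on the pseudobulk counts, AD vs control
dds <- DESeqDataSetFromMatrix(countData = pb, colData = meta,
                              design = ~ condition)
dds <- DESeq(dds)
res <- as.data.frame(results(dds, contrast = c("condition", "AD", "control")))

# Sort by the DESeq2 'pvalue' column
res <- res[order(res$pvalue), ]
head(res)
```

Because each subject contributes one column, the test treats subjects rather than cells as replicates.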
Compare the differential expression results between MAST and pseudobulk
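A small helper for the comparison, as a sketch. `compare_de()` is a hypothetical function, not part of Seurat or DESeq2. It assumes the MAST table has the `FindMarkers()` columns `avg_log2FC` and `p_val_adj`, the pseudobulk table has the DESeq2 columns `log2FoldChange` and `padj`, and both use gene names as row names. The 0.05 cutoff is an arbitrary default.

```r
# Hypothetical helper: line up two DE tables on their shared genes
compare_de <- function(mast, pseudobulk, alpha = 0.05) {
  shared <- intersect(rownames(mast), rownames(pseudobulk))
  tab <- data.frame(
    gene = shared,
    log2fc_mast = mast[shared, "avg_log2FC"],
    log2fc_pb = pseudobulk[shared, "log2FoldChange"],
    sig_mast = mast[shared, "p_val_adj"] < alpha,
    sig_pb = pseudobulk[shared, "padj"] < alpha
  )
  # DESeq2 can report NA for padj; count those genes as not significant
  tab$sig_mast[is.na(tab$sig_mast)] <- FALSE
  tab$sig_pb[is.na(tab$sig_pb)] <- FALSE
  tab
}

# Example usage, given result tables de_mast and res:
# tab <- compare_de(de_mast, res)
# table(MAST = tab$sig_mast, pseudobulk = tab$sig_pb)
# cor(tab$log2fc_mast, tab$log2fc_pb, use = "complete.obs")
```

The cross-table shows how many genes each method calls significant and how many they share; the correlation shows whether the two methods agree on effect direction and size.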