Exercise 3 - Single-cell RNA sequencing

Description

We will use data from this study on human renal cancer.

The full dataset is on ArrayExpress. For now, we will use 4 samples: FCAImmP7277561, FCAImmP7277560, FCAImmP7277553, FCAImmP7277552. These are CD45+/- cells from liver.

You can also download them from Dropbox.

Exercise 0 - Preprocess data [You can skip this and go straight to integration if you want]

Read in each dataset. Quickly run through creating a Seurat object, log-normalizing, scaling, assessing mitochondrial (MT) content, and checking the cell cycle. Writing a loop or function will help with this. Check some QC attributes.

Exercise 1 - Integration

Merge the datasets. How does the merged data look? You can download a list of the 4 datasets after preprocessing from Dropbox, or do Exercise 0 yourself.
Integrate the 4 datasets using rPCA.
Integrate the 4 datasets using Harmony.
Assess the integration performance. Here are some marker genes: A1BG and SERPINC1.

Exercise 2 - Annotation

Use the Harmony results. Let's annotate using the HumanPrimaryCellAtlasData and ImmGen references from celldex. Use SingleR.

[Hint: The layers of a Harmony-integrated object are kept separate in the Seurat object. You may need to run JoinLayers() before exporting the count matrix.]

Let's try annotation using the Allen Brain Map. Use SingleR and transfer anchors.

Exercise 3 - Differential expression

Set up: Using the integrated Seurat object built from the AD samples in our lecture, we will look at differentially expressed genes between AD and control subjects in cluster #4 from the ‘seurat_clusters’ metadata group. This object is on Dropbox as integrated.rds.
Generate a list of differentially expressed genes for AD vs control in cluster #4 using a Wilcoxon rank sum test. Note: Confirm you are using the ‘seurat_clusters’ group, not the ‘paper_cluster’ group that also exists in the object.

[Hint: Make sure you have the Seurat object properly set up for differential analysis. Are you using the correct assay? Are the integrated layers joined?]

Generate a list of genes with differential expression statistics for AD vs control in cluster #4 using MAST. Note: Again, confirm you are using the ‘seurat_clusters’ group, not the ‘paper_cluster’ group that also exists in the object. Sort the table of genes by the ‘FDR’ column.
- Add the cellular detection rate (CDR) to the model
- Does CDR correlate with any PCs? [Hint: you may need to go beyond PC1 and PC2 when looking at correlations]
- Do you think that sex is a confounding variable for these cells? [Hint: Are there sex-specific genes that drive your differential expression?]
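To make the CDR idea concrete, here is a minimal sketch in plain Python (not the Seurat/MAST workflow itself). It assumes CDR means the fraction of genes with a non-zero count in each cell. The toy count matrix, the `pc_scores` values, and both function names are hypothetical and only serve the illustration.

```python
import math


def cellular_detection_rate(counts):
    """Return, for each cell, the fraction of genes with a count above zero.

    counts is a genes x cells matrix given as a list of rows.
    """
    n_genes = len(counts)
    n_cells = len(counts[0])
    return [
        sum(1 for g in range(n_genes) if counts[g][c] > 0) / n_genes
        for c in range(n_cells)
    ]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: 4 genes (rows) x 3 cells (columns).
counts = [
    [0, 3, 1],
    [2, 0, 4],
    [0, 0, 5],
    [1, 0, 2],
]
cdr = cellular_detection_rate(counts)

# Hypothetical scores of the same 3 cells on one principal component.
pc_scores = [0.1, -0.4, 0.9]
r = pearson(cdr, pc_scores)
```

Repeating the correlation for each PC in turn shows whether detection rate tracks a component beyond PC1 and PC2.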
Make one combined or several separate violin plots for the top 5 genes that are upregulated in AD in this cluster of cells.
Generate a list of genes with differential expression statistics for AD vs control in cluster #4 using a pseudobulk strategy. Use DESeq2 and sort the table by the ‘pvalue’ column from the DESeq2 results.
- Extra: Make a PCA plot with the aggregated cell counts to support or adjust the decision made about including sex as a confounding variable.
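The aggregation step behind a pseudobulk strategy can be sketched in plain Python as follows. It assumes raw counts are summed across all cells that belong to the same sample, so that each sample, rather than each cell, becomes one replicate for the downstream test. The gene names, sample labels, and the `pseudobulk` function name are hypothetical.

```python
from collections import defaultdict


def pseudobulk(counts, cell_sample):
    """Sum per-cell counts into per-sample totals.

    counts: dict mapping gene -> list of raw counts, one per cell.
    cell_sample: list giving the sample label of each cell, in the same order.
    Returns a dict mapping sample -> {gene: summed count}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for gene, row in counts.items():
        for value, sample in zip(row, cell_sample):
            totals[sample][gene] += value
    return {sample: dict(genes) for sample, genes in totals.items()}


# Hypothetical toy data: 2 genes measured in 4 cells from 2 samples.
counts = {
    "GENE1": [1, 0, 2, 3],
    "GENE2": [0, 4, 1, 1],
}
cell_sample = ["AD_1", "AD_1", "CTRL_1", "CTRL_1"]

bulk = pseudobulk(counts, cell_sample)
```

In the exercise itself the cells would first be restricted to cluster #4, and the resulting gene-by-sample table is what a count-based method such as DESeq2 expects as input.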
Compare the differential expression results between MAST and pseudobulk.
- How many differential genes overlap between the methods?
- Do the p-values and fold changes correlate?
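One way to frame this comparison, sketched in plain Python: take the genes called significant by each method, intersect them, and compute a rank (Spearman) correlation of the statistics over the genes both methods tested. The two result tables, the 0.05 cutoff, and all function names here are hypothetical; each table maps a gene to a (log2 fold change, adjusted p-value) pair.

```python
import math


def significant(results, cutoff=0.05):
    """Genes whose adjusted p-value is below the cutoff."""
    return {gene for gene, (_, p) in results.items() if p < cutoff}


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


# Hypothetical results: gene -> (log2 fold change, adjusted p-value).
mast = {"G1": (1.5, 0.001), "G2": (-0.8, 0.03), "G3": (0.2, 0.60), "G4": (2.1, 0.0004)}
pseudo = {"G1": (1.2, 0.01), "G2": (-0.5, 0.20), "G3": (0.1, 0.80), "G4": (1.9, 0.002)}

overlap = significant(mast) & significant(pseudo)

shared = sorted(set(mast) & set(pseudo))  # genes tested by both methods
fc_rho = spearman([mast[g][0] for g in shared], [pseudo[g][0] for g in shared])
p_rho = spearman([mast[g][1] for g in shared], [pseudo[g][1] for g in shared])
```

A rank correlation is used in this sketch because p-values from the two methods can sit on very different scales even when they order the genes similarly.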

Rockefeller University, Bioinformatics Resource Centre

https://rockefelleruniversity.github.io/scRNA-seq
