These exercises cover manipulating single-cell data with Seurat. Please download the count matrix from DropBox and load it into a Seurat object. Alternatively, you may use the rds file data/scSeq_CKO_1kCell_ori.rds.

Exercise 1 - Data manipulation with Seurat

Loading data

  1. Create a Seurat object by reading the downloaded count data or by loading the rds file. Then calculate the mitochondrial content of each cell.
##                        orig.ident nCount_RNA nFeature_RNA dset percent.mt
## CKO_AGACGTTCAGCTGGCT-1        CKO       2679         1200  CKO  0.1119821
## CKO_TTAACTCGTAGTACCT-1        CKO        813          476  CKO 14.3911439
## CKO_GAGGTGAGTCTAGTGT-1        CKO       4852         1759  CKO 10.9233306
## CKO_CTCGTACAGCTAAGAT-1        CKO       1750          672  CKO 46.0000000
## CKO_AGGGAGTTCAAACCAC-1        CKO       4632         1585  CKO  5.3108808
## CKO_ATTATCCTCAACGGCC-1        CKO       2498         1177  CKO  3.8030424
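
One possible way to approach this step is sketched below. It assumes the rds file holds either a ready-made Seurat object or a raw count matrix, and that mitochondrial genes follow the mouse naming convention `mt-` (the marker genes later in this exercise look like mouse symbols). The helper name `load_cko` is hypothetical, and the sketch does not try to reproduce the `dset` column shown in the output above.

```r
library(Seurat)

# Hypothetical helper: load the exercise data and add percent.mt.
# Assumption: the rds file contains a Seurat object or a count matrix.
load_cko <- function(path = "data/scSeq_CKO_1kCell_ori.rds") {
  dat <- readRDS(path)
  obj <- if (inherits(dat, "Seurat")) {
    dat
  } else {
    CreateSeuratObject(counts = dat, project = "CKO")
  }
  # Assumption: mouse mitochondrial genes start with "mt-" (human would be "MT-").
  obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^mt-")
  obj
}
```

Calling `obj <- load_cko()` followed by `head(obj[[]])` should then list the per-cell metadata, including `percent.mt`.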

Basic QC

  1. Access the read counts (nCount_RNA), gene counts (nFeature_RNA), and mitochondrial content (percent.mt) for each cell and draw a violin plot of each.

  2. Make a scatter plot of nCount_RNA vs nFeature_RNA. NOTE: At this step, keep an eye out for potential doublets.

  3. Make a scatter plot of nCount_RNA vs percent.mt. NOTE: At this step, keep an eye out for potential cell debris.

  4. Remove cells with percent.mt >= 10 before the following analyses.

## An object of class Seurat 
## 14353 features across 619 samples within 1 assay 
## Active assay: RNA (14353 features, 0 variable features)
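
The QC steps above could be sketched as follows, assuming a Seurat object `obj` whose metadata already contains `percent.mt`. The helper names `qc_plots` and `keep_low_mt_cells` are hypothetical. Cells are selected by name rather than with a `subset =` expression so that the threshold can be passed safely as a function argument.

```r
library(Seurat)

# Violin plots and scatter plots for the three QC metrics.
qc_plots <- function(obj) {
  list(
    violin = VlnPlot(obj,
                     features = c("nCount_RNA", "nFeature_RNA", "percent.mt"),
                     ncol = 3),
    # Cells with unusually high counts and genes are doublet candidates.
    counts_vs_genes = FeatureScatter(obj, feature1 = "nCount_RNA",
                                     feature2 = "nFeature_RNA"),
    # Low counts combined with high percent.mt suggests debris.
    counts_vs_mt = FeatureScatter(obj, feature1 = "nCount_RNA",
                                  feature2 = "percent.mt")
  )
}

# Keep only cells below the mitochondrial threshold (removes percent.mt >= max_mt).
keep_low_mt_cells <- function(obj, max_mt = 10) {
  keep <- colnames(obj)[obj$percent.mt < max_mt]
  subset(obj, cells = keep)
}
```

For example, `plots <- qc_plots(obj)` followed by `obj <- keep_low_mt_cells(obj)`; printing `obj` afterwards shows how many cells remain.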

Cell cycle

  1. Please estimate the cell cycle phase of each cell and make a table showing how many cells are in each phase.
## 
##  G1 G2M   S 
## 268 110 241
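
A minimal sketch of cell cycle scoring, assuming a Seurat object `obj` holding the filtered cells. The gene lists in `cc.genes.updated.2019` ship with Seurat as human symbols; converting them to mouse symbols by simple title-casing (the hypothetical helper `to_mouse_symbol`) is a rough assumption that will miss orthologs whose names differ between species. Normalizing first is also an assumption about the workflow.

```r
library(Seurat)

# Hypothetical helper: crude human -> mouse symbol conversion by title-casing.
to_mouse_symbol <- function(genes) {
  paste0(toupper(substr(genes, 1, 1)), tolower(substring(genes, 2)))
}

score_cell_cycle <- function(obj) {
  s_genes   <- to_mouse_symbol(cc.genes.updated.2019$s.genes)
  g2m_genes <- to_mouse_symbol(cc.genes.updated.2019$g2m.genes)
  # Scoring works on normalized expression, so normalize first.
  obj <- NormalizeData(obj)
  # Adds S.Score, G2M.Score and Phase to the metadata.
  CellCycleScoring(obj, s.features = s_genes, g2m.features = g2m_genes,
                   set.ident = FALSE)
}
```

After `obj <- score_cell_cycle(obj)`, the per-phase table can be produced with `table(obj$Phase)`.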

Normalization and clustering

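  1. Please scale the data while regressing out mitochondrial content (percent.mt) and cell cycle (S.Score, G2M.Score, Phase). NOTE: CellCycleScoring stores the scores as S.Score and G2M.Score; the lower-case spellings S.score and G2M.score are not found in the object, which is what the warning below reports.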
## Warning: Requested variables to regress not in object: S.score, G2M.score
## Regressing out percent.mt, Phase
## Centering and scaling data matrix
  2. Please run principal component analysis (PCA), estimate how many PCs best represent the data, then cluster the cells and make a UMAP plot.
  • Check up to 30 PCs.
  • An elbow plot can be used to choose the number of PCs.
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 619
## Number of edges: 17772
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8164
## Number of communities: 6
## Elapsed time: 0 seconds
## 16:35:22 UMAP embedding parameters a = 0.9922 b = 1.112
## 16:35:22 Read 619 rows and found 10 numeric columns
## 16:35:22 Using Annoy for neighbor search, n_neighbors = 30
## 16:35:22 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 16:35:22 Writing NN index file to temp file /tmp/RtmpMsibtn/file176911ce7eb8
## 16:35:22 Searching Annoy index using 1 thread, search_k = 3000
## 16:35:22 Annoy recall = 100%
## 16:35:23 Commencing smooth kNN distance calibration using 1 thread
## 16:35:25 Initializing from normalized Laplacian + noise
## 16:35:25 Commencing optimization for 500 epochs, with 24230 positive edges
## 16:35:26 Optimization finished
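
The two steps of this section could be sketched as below, assuming a Seurat object `obj` with `percent.mt`, `S.Score`, `G2M.Score` and `Phase` in its metadata. The helper names are hypothetical. The default of 10 PCs mirrors the "found 10 numeric columns" line in the UMAP log above; the clustering resolution of 0.8 is simply the Seurat default and an assumption here, so the number of clusters may differ.

```r
library(Seurat)

# Fail early when a regression variable is misspelled or missing,
# rather than relying on the ScaleData warning.
check_vars <- function(meta, vars) {
  missing_vars <- setdiff(vars, colnames(meta))
  if (length(missing_vars) > 0) {
    stop("Not in metadata: ", paste(missing_vars, collapse = ", "))
  }
  invisible(TRUE)
}

normalize_and_cluster <- function(obj,
                                  vars = c("percent.mt", "S.Score",
                                           "G2M.Score", "Phase"),
                                  n_pcs = 10, resolution = 0.8) {
  check_vars(obj[[]], vars)
  obj <- NormalizeData(obj)
  obj <- FindVariableFeatures(obj)
  # Scales the variable features and regresses out the listed covariates.
  obj <- ScaleData(obj, vars.to.regress = vars)
  obj <- RunPCA(obj, npcs = 30)
  # Look for the point where the curve flattens to choose n_pcs.
  print(ElbowPlot(obj, ndims = 30))
  obj <- FindNeighbors(obj, dims = seq_len(n_pcs))
  obj <- FindClusters(obj, resolution = resolution)
  RunUMAP(obj, dims = seq_len(n_pcs))
}
```

After `obj <- normalize_and_cluster(obj)`, the embedding can be drawn with `DimPlot(obj, reduction = "umap", label = TRUE)`. In practice you would inspect the elbow plot first and rerun with a different `n_pcs` if needed.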

Marker genes

  1. Try to select marker genes for each cluster and generate a heatmap of the top five marker genes for each cluster.
## Calculating cluster 0
## Calculating cluster 1
## Calculating cluster 2
## Calculating cluster 3
## Calculating cluster 4
## Calculating cluster 5
##                 p_val avg_log2FC pct.1 pct.2    p_val_adj cluster     gene
## Lef1     3.491653e-14  2.1244940 0.531 0.228 5.011569e-10       0     Lef1
## Ccnd2    2.068318e-13  1.8703046 0.929 0.672 2.968656e-09       0    Ccnd2
## Cxcl14   2.913725e-13 15.3782765 0.972 0.740 4.182070e-09       0   Cxcl14
## Ifitm3   2.970921e-13  3.8616282 0.886 0.642 4.264163e-09       0   Ifitm3
## Tnfrsf19 9.619285e-13  0.6537363 0.611 0.294 1.380656e-08       0 Tnfrsf19
## Zfos1    2.002504e-12  7.3476849 0.962 0.713 2.874194e-08       0    Zfos1
## Warning in DoHeatmap(obj, features = topG$gene): The following features were
## omitted as they were not found in the scale.data slot for the RNA assay: Gpx1,
## Rps7, Rps28, Zfos1
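
A sketch of marker selection, assuming a clustered Seurat object `obj`. The names `obj` and `topG` follow the DoHeatmap warning above; the helper names and the choice to rank by `avg_log2FC` among positive markers are assumptions, since the exercise does not say how the top five should be chosen.

```r
library(Seurat)
library(dplyr)

# Pick the n markers with the largest avg_log2FC in each cluster.
top_markers <- function(markers, n = 5) {
  markers %>%
    group_by(cluster) %>%
    slice_max(order_by = avg_log2FC, n = n, with_ties = FALSE) %>%
    ungroup()
}

find_and_plot_markers <- function(obj, n = 5) {
  # Test each cluster against all other cells; keep up-regulated genes only.
  markers <- FindAllMarkers(obj, only.pos = TRUE)
  topG <- top_markers(markers, n)
  list(markers = markers, top = topG,
       heatmap = DoHeatmap(obj, features = topG$gene))
}
```

Genes missing from the scaled data are dropped from the heatmap, as in the warning above, because ScaleData only scales the variable features by default. Passing the genes of interest through the `features` argument of ScaleData is one way to include them.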

Advanced plots

  1. Please display the top marker gene of each cluster using FeaturePlot and RidgePlot.
## Picking joint bandwidth of 0.786
## Picking joint bandwidth of 1.05
## Picking joint bandwidth of 0.86
## Picking joint bandwidth of 1.73
## Picking joint bandwidth of 0.123
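
One way to draw these plots is sketched below, assuming a clustered Seurat object `obj` and a marker table `markers` with `cluster`, `gene` and `avg_log2FC` columns as returned by FindAllMarkers. The helper names are hypothetical, and ranking by `avg_log2FC` is an assumption.

```r
library(Seurat)

# Return the gene with the largest avg_log2FC in each cluster.
top_gene_per_cluster <- function(markers) {
  ranked <- markers[order(markers$cluster, -markers$avg_log2FC), ]
  # The same gene can top more than one cluster, so drop repeats.
  unique(ranked$gene[!duplicated(ranked$cluster)])
}

plot_top_markers <- function(obj, markers) {
  genes <- top_gene_per_cluster(markers)
  list(
    feature = FeaturePlot(obj, features = genes),  # expression on the UMAP
    ridge   = RidgePlot(obj, features = genes)     # distribution per cluster
  )
}
```

For example, `plots <- plot_top_markers(obj, markers)` and then print `plots$feature` and `plots$ridge`.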