Loupe Browser, Session 1

class: center, middle, inverse, title-slide

.title[
# Loupe Browser, Session 1<br />
<html><br />
<br />
<hr color='#EB811B' size=1px width=796px><br />
</html>
]
.author[
### Rockefeller University, Bioinformatics Resource Centre
]
.date[
### <a href="http://rockefelleruniversity.github.io/RU_course_template/" class="uri">http://rockefelleruniversity.github.io/RU_course_template/</a>
]

---

## Overview

- [Course Home Page](http://rockefelleruniversity.github.io/LoupeBrowser/)
- [Clustering](https://rockefelleruniversity.github.io/LoupeBrowser/presentations/singlepage/Session1.html#Clustering)
- [Differential Expression](https://rockefelleruniversity.github.io/LoupeBrowser/presentations/singlepage/Session1.html#Diffential_Expression)
- [Features](https://rockefelleruniversity.github.io/LoupeBrowser/presentations/singlepage/Session1.html#Features)
- [Custom Groups](https://rockefelleruniversity.github.io/LoupeBrowser/presentations/singlepage/Session1.html#Custom_Groups6)
- [Reanalyze](https://rockefelleruniversity.github.io/LoupeBrowser/presentations/singlepage/Session1.html#Reanalyze)
- [Import/Export](https://rockefelleruniversity.github.io/LoupeBrowser/presentations/singlepage/Session1.html#ImportExport)

---
## Course materials

Links to material and slides for this course can be found on github.

* [Loupe Browser](https://rockefelleruniversity.github.io/LoupeBrowser/)

Or can be downloaded as a zip archive from here.

* [Download zip](https://github.com/rockefelleruniversity/LoupeBrowser/zipball/master)

---
## Course materials

Once the zip file in unarchived. All presentations as HTML slides and pages will be available in the directories underneath.

* **r_course/presentations/slides/**
Presentations as an HTML slide show.
* **r_course/presentations/singlepage/** 
Presentations as an HTML single page.

---
class: inverse, center, middle

# 10X scRNAseq datasets

---

## Sample Prep

There are several single cell sequencing technologies. The most common is from 10X Genomics.

![overview](./imgs/drops.png)

---
## A scRNAseq workflow

We often need many tools across several computational languages to [analyze these complex experiments](https://rockefelleruniversity.github.io/SingleCell_Bootcamp/). Deciding what is appropriate often depends on the data set and its QC metrics. Our first step is nearly always to run Cell Ranger. 
![overview](./imgs/advancedworkflow.png)

---
## .cloupe file

The product of this QC report, count matrices for downstream analysis and a Loupe file. The *.cloupe* extension denotes the Loupe file; a specialized file which can be opened by Loupe Browser.

At this point the sequenced reads have been mapped to genes and empty/bad droplets have been filtered out.

---
## Loupe Browser

Loupe Browser is a visualization and analysis software for scRNAseq. It plays a key role in exploratory analysis and QC allowing us to:

* Check QC markers
* Review clustering
* Check marker genes
* Run differentials
* Subset the data

---
## Install Loupe Browser

Loupe Browser can be installed from the [Loupe Browser website](https://www.10xgenomics.com/support/software/loupe-browser/downloads). The current version is 8, though most older versions are still [available](https://www.10xgenomics.com/support/software/loupe-browser/downloads/previous-versions).

---
## Opening Loupe Browser

When we open Loupe Browser we are greeted by a Home page.

![overview](./imgs/home.png)

---
## Opening Loupe Browser

When we open Loupe Browser we are greeted by a Home page.

![overview](./imgs/home2.png)

---
## Opening Loupe Browser

When we open Loupe Browser we are greeted by a Home page.

- *Center*
* Tutorial Files - Contains several example data sets.
* Recent Files - Contains prior used data sets.

- *Top Right*
* Open Loupe Files - Open a *.cloupe* file you have locally

- *Left*
* Documentation and Tutorials - Guides for using the browser
* Public Datasets - 10X's repository of datasets

---
## Lets start

We will start out using the data set for AML in the **Tutorial Files** section. This data set is from 3 donors. 2 healthy bone marrow samples and 1 from a patient with AML. There are 8,414 cells.

Zheng, G., Terry, J., Belgrader, P. et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun 8, 14049 (2017). [https://doi.org/10.1038/ncomms14049](https://www.nature.com/articles/ncomms14049)

---
## Panels

The data will take a few seconds to load in and then we can see our data and the main control panels.

.pull-left[
- View (Our Data) [Middle]
- Projections [Top]
- Toolbar [Left]
- Differential Expression [Bottom Right]
  ]
  
.pull-right[

![overview](./imgs/screen.png)

]

---
## Data and Projections

By default our data is shown as a t-SNE. This is a popular type of Dimension Reduction. This simplifies our very complex highly dimensional data into a 2D space. In this space each dot is a cell and the local relationship between cells is preserved.

* We can Zoom in using the Zoom icon [Bottom left]. 
* We can drag and drop to navigate around
* We can alter the dot size using projection settings [Top]
* We can hide most of the UI with *corner square* button [Top]

---
## Toolbar

Bulk of the activities we will want to do is contained within the left toolbar.

.pull-left[
* Clusters
* Features
* Reanalyze
* Advanced Selection
* V(D)J Clonotypes
* Search for Features
  ]
  
.pull-right[

![overview](./imgs/screen2.png)

]

---
class: inverse, center, middle

# Clustering

---

## Clustering

Lets start with clustering. Cell Ranger does some basic clustering for you. It performs clustering in two ways:

- Graph-Based
- K-means

The cells are colored by the Graph-Based clustering method be default. We can use the toolbar to switch easily between the two.

---
## Clustering

![overview](./imgs/screen3.png)

---
## Graph-Based vs K-means

- Graph-based: This is built on relationship between cells, grouping cells together based on a nearest-neighbor model.

- K-means: This assigns cells to a cluster based on their distance to a pre-determined number of cluster centers. This typically leads to round clusters.

Generally graph-based is better for identifying more complex spatial patterns. K-means is typically simpler.

---
## Clustering

All clusters are shown. We can focus on just a few quite easily by clicking on the check box. We can also edit color and name by clicking on the *three dots*.

![overview](./imgs/subclust.png)

---
## Clustering

Alternatively, we can use the split view to help clarify the distribution of the different plots.

![overview](./imgs/splitclust.png)

---
## Clustering and QC

This is our first chance to really assess the quality of our data.

We are looking for clearly defined clusters. For this we need easily distinguishable groups of cells that have classified correctly into clusters.

We don't want an inkblot with indistinguishable/overlapping cluster assignments.

---
## Clustering

You can download the cluster information easily by clicking the download button. This gives you a table containing every Cell Barcode and the cluster it is assigned to. This is a *.csv* file which can be opened in Excel/Numbers/Sheets or by Python/R.

![overview](./imgs/clust_down.png)

---
## Custom Groups

This data has several custom markers associated with it: AMLStatus and LibraryID.

We can also check the distribution of these identities across our tSNE in much the same way we can review the clusters.

---
## Adding Custom Markers

We can add our own groups. We can do this in two ways.

1) Upload a file *.csv* containing Cell Barcodes and group information.
  - If you download the course material there is an example files in: *LoupeBrowser/r_course/data/smoke_status_metadata.csv*

2) Manually create rules to create groups in Loupe Browser (we will show you this later)

---
class: inverse, center, middle

# Diffential Expression

---

## Diffential Expression

We can also get differential expression results from our analysis. This can be another important step in the QC of our data and start to provide insight into the biological findings.

The are various ways we can use the differential analysis:

- Markers
- Pairwise Comparisons
- Pseudobulk

---
## Markers

By default differential gene expression is calculated for each cluster. It does this by comparing each cluster to everything else using an exact negative binomial test.

The result are a set of marker genes for that cluster.

We can view the results as a Feature Table or a Heatmap.

In this case we can see markers like CD3D, IL32, CD3E up-regulated in Clusters 1, 6 and 12, suggesting they are some kind of Immune Cells.

---
## Markers

![overview](./imgs/heat.png)

---
## Pairwise Comparisons

Once we have some understanding of our data we may want to start do some specific comparison between clusters i.e. Cluster 1 vs Cluster 6, or use some of the group information i.e. AMLstatus.

Let's look at Cluster 1 vs Cluster 6. To do this we select just those clusters.

---
## Pairwise Comparisons

Once we have some understanding of our data we may want to start do some specific comparison between clusters i.e. Cluster 1 vs Cluster 6, or use some of the group information i.e. AMLstatus.

Let's look at Cluster 1 vs Cluster 6. To do this we select just those clusters and click *Run Differential* then *Between Selected Clusters*.

---
## Pairwise Comparisons

Again we can review results as a Feature Table or a Heatmap. This has some interesting results. There are several RPS and RPL genes. These are associated with the ribosome and could be a sign of QC issues.

We want to keep an eye out for problem gene families like mtRNA and rRNA as they could be symptomatic of technical issues with the data. We will come back to this.

---
## Pairwise Comparisons

![overview](./imgs/ribo.png)

---
## Pseudobulk

The last method is only applicable if you have replicates. Pseudobulk leverages your replicates and aggregates the information across cells to allow a more traditional bulk RNAseq statistical determination.

A pseudobulk approach can help deal with noise and variability scRNAseq and negate some of the overpowering you can have with the high n of cells.

Though not the best experimental design, we can run pseudobulk on this dataset as we have 2 control samples (even though we only have 1 AML).

Let's start by clicking *Run Differential* then *Across Multiple Samples*.

---
## Pseudobulk

A separate window will pop up. We will then provide our differential design.

1) Select our group. This will be the subset of cells that you want to focus your differential on i.e. Cluster 1. We can run it on all our clusters at once.

2) Experimental comparison. What do you want to compare? In this case we will look at AML status. We can then select which will be A or B in our comparison. We will pick patient as A; traditionally perturbations are used as the numerator for fold change comparisons.

3) Select samples. We will just include all our samples.

---
## Pseudobulk

![overview](./imgs/pseudo.png)

---
## Pseudobulk

When we review the heatmap we can see Cluster 14 has a lot of changes (typically the loss of expression). We can also see a few genes very up-regulated. We can see in Cluster 2 HBG1 and HBG2 which are known to be involved in AML.

![overview](./imgs/heat2.png)

---
class: inverse, center, middle

# Features

---

## Features

Once we have reviewed our clusters we may next want to check the distribution of expression of specific genes or features. This can useful for several reasons:

* Check QC markers - i.e. mtRNA or rRNAs
* Check cluster markers
* Check differential results
* Check genes of interest

All we have to do is click on the *Feature* panel and type in our gene of interest.

---
## QC Features

Here we check the ribosomal small subunit RPS4Y1. There is clear asymmetric distribution of expression. This would be a red flag, but AML is known to have abnormal expression of ribosomal subunits.

![overview](./imgs/ribo2.png)

---
## QC Features

We can also check which clusters are gene is expressed in by going down to the differential expression panel and clicking on *Expression Distribution*. This shows us violin plots, broken down into whatever groups we have selected in the Cluster panel.

![overview](./imgs/violin.png)

---
## Cluster Features
We can check our cluster specific features. Earlier we saw CD3D as a top cluster marker.

![overview](./imgs/cd3d.png)

---
## Differential Features
Likewise we can follow up with our differential results. Lets look at HBG2.

![overview](./imgs/hbg2.png)

---
## Custom Features
NPM1 is a well known gene which is often mutant in AML.

![overview](./imgs/npm1.png)

---
## Customise Feature Plot

We can customize the feature plot a lot using the tool panel in the top right.

We can alter the scale/transformation of the counts, color scheme and add filters. We can also export cells that express our gene into custom groups (more on that soon).

![overview](./imgs/tool.png)

---
## Feature Lists

The genes we have looked at have been automatically added to a gene list. We can generate several of these list, rename them or modify them easily in Loupe Browser. Alternatively we can export/import them.

![overview](./imgs/lists.png)

---
## Multiple Feature Lists

If you have multiple gene lists you can start to parse out complex interactions by making a co-expression plot. This shows the distribution of multiple gene lists across your tSNE.

---
## Feature Plot

We can also cross-reference individual genes in a similar way by looking at the Feature Plot. This is hidden in the Projection Type selection drop down.

We can use this to find sub populations of cells dependent on multiple markers for example: CD8+ T-cells. These will have CD3D and CD8 expression. We can also subset the cells just to clusters 1, 6 and 12 to help focus on these cells.

---
class: inverse, center, middle

# Custom Groups

---

## Custom Groups

So far we have shown you how to make custom groups by importing gene lists into Loupe Browser. We can also create custom groups while in Loupe Browser.

We have seen allusions to this while looking at filtering of Features based on expression.

Let's filter CD3D based on a minimum expression value of 1.5.

---
## Save Barcodes

After we have filtered our cells we can then save these barcodes. To save them we need to:
 - *Name the selected cluster*: What is the name of these cells? i.e. Immune
 - *Add to new Group*: What is the type of labeling we are doing? i.e. Annotation

---
## Save Barcodes

We can now just go to the Clusters section and we will see our new Group. We can now select this. Just like we would any other cluster.

Lets try to add more cell types of interest to our *Annotation* group. Lets start by looking for fibroblasts using the VIM marker gene.

---
##  Save Barcodes

Several clusters seem to correspond to fibroblasts. We could use our threshold filtering to select these cells or we could instead use point and click selection using the *Freehand Selection* tool. There is also a *Draw Selection* depending on your preference. Lets start by grabbing one of these clusters.

![overview](./imgs/freehand.png)

---
##  Save Barcodes

We can add a selection to an existing group, so lets use *Freehand Selection* again to highlight the other major fibroblast group.

---
## Custom Groups

When we go back to the Cluster tab and check our new groups we can see the two approaches have slightly different results.

As scRNAseq has dropouts the labeling of the Immune cells is a lot patchier, that said it maybe is more of an accurate representation of the original data.

![overview](./imgs/markers.png)

---
## Advanced Selection

As you may have noticed there is also a dedicated pane to filtering and selecting cells based on various criteria: expression level, cluster identity etc.

To create new selection we create *Rules*.

Our Immune Cell Cd3d+ group is specifically T Cells. We could subset this using rules to find CD8+ T-cells. First we will check CD8B expression levels in the feature tab. It seems a 0.5 filter on the log2 scale is sufficient.

---
## Advanced Selection

We can then set up a rule to filter our cells based on the fact that they are in our *Immune* group *AND* also they are *CD8B+* > 0.5 log2(counts). We can even export this new selection to a group by saving barcodes.

![overview](./imgs/tcells.png)

---
class: inverse, center, middle

# Reanalyze

---

## Reanalyze

A lot of the time there are issues with the QC that need to be remedied. Most of the time this will require taking the data into R/Python to use a collection of scRNAseq tools. But Loupe Browser does allow for some optimization with out leaving the GUI.

The *Reanalyze* tab allows you access to custom filtering of low quality cells, clustering and dimension reduction.

---
## Review Barcodes

The first step is to review the barcodes. This allows you to filter cells that you think are bad or unnecessary, based on whatever criteria.

You can supply the barcodes you want to filter as a *.csv* or you could filter based on the groups you have made. For example if you knew you were not interested in the T-cell portion of the cells than we can filter them out. Domain-specific knowledge is important for this decision making. Whatever groups are selected on the cluster pane are included, so we can deselect the Clusters 1, 6 and 12.

---
## Review Barcodes

![overview](./imgs/tcell_filter.png)

---
## UMI thresholds

We can then customize filtering of cells based on UMI values. Often we would filter top and bottom cells as they represent doublets or empty droplets respectively.

![overview](./imgs/umi_filter.png)
---
## Gene thresholds

Similarly we can then customize filtering of cells based on number of Genes expressed. Often we would filter top and bottom cells as they represent doublets or empty droplets respectively.

![overview](./imgs/gene_filter.png)

---
## Mitochondrial UMI thresholds

We will then set thresholds for the amount of mitochondrial signal. mtRNA is one of the first signs of ambient RNA contamination, and cells with high mtRNA often will confound results.

This is a human data set, so we can select the Human genome. There is also mouse as an option. If you work with a different organism you can supply the mitochondrial genes in a *.csv* format.

---
## Mitochondrial UMI thresholds

We often start with a 10% cut-off (1% for snRNAseq), but if you are seeing a lot of ambient RNA you may have to be flexible around this.

![overview](./imgs/mt_filter.png)

---
## Reanalyze

Lastly, we can decide on some of the outputs. This includes what dimension reduction to perform. So far we have just looked at tSNE, but actually UMAPs are the most popular.

You can customize some of the options used in the dimension reduction by adjusting the parameters. This allows you to control PCA, tSNE and UMAP. Generally though it is best to stick to the default unless you know what you're doing.

Once the analysis is given a name, we can then click *Recluster*. This may take a few minutes.

---
## Reanalyze

All our prior work is preserved but now we have some additional projections, including a UMAP.

![overview](./imgs/rean.png)

---
class: inverse, center, middle

# Import/Export

---

## Working with your own files

Thus far we have been working with some demo data. As mentioned we can get Loupe files from our own experiments (or public data) when we run Cell Ranger. This will take FastQ files as an input.

Alternatively, we can use publicly accessible loupe files from the [10X website](https://www.10xgenomics.com/datasets/) or other sources. These will have to be downloaded.

Once we have the files we simply open a new Loupe Browser file session and then click *Open Loupe File* and find the downloaded file.

---
## Working with your own files

Unfortunately, most repositories do not require loupe files to be submitted with data sets i.e. GEO. That said, often an R object like *.rds* files, are submitted.

There is a package called [loupeR](https://www.10xgenomics.com/support/software/loupe-browser/latest/tutorials/introduction/lb-louper) from 10X which can convert a Seurat object from R into a loupe file. You can include in the export a variety of Group IDs as well, so any annotation that has been added can also be exported.

This makes it really easy to go from more advanced bioinformatic tools to a much more user friendly Loupe Browser.

---
## loupeR

A quick piece of example code:

``` r
library(loupeR)

create_loupe_from_seurat(seu_obj, 
                         output_dir = "loupe", 
                         output_name = "scRNA_annotated_Igfilter")
```

---
## loupeR

When you import it will look like this, with all the inherited categories.

![overview](./imgs/louper_ex.png)

---
## loupeR

Unfortunately there is not an equivalent utility for Python, but it is relatively straightforward to export from Scanpy, read into R and then export again into Loupe Browser.

Alternatively, you can import all aspects of a data set into Loupe Browser individually. We have already seen how to import Group and Feature lists. You can also import projections. By using the "..." next to *Projection Type*.

---
## Export and Saving

The analysis you perform in Loupe Browser can easily be saved. When you click *File* > *Save* the loupe file you are working with is updated. Making it easy to come back to the work or share it with others.

You can also save any of the gene lists, group information or differential results by clicking on export buttons next to each of those panels.

Lastly, you can export high res SVG files (similar to PDF) of your t-SNE/UMAP. These files can be read into Illustrator/Inkscape.

---
## Summary

Hopefully we have shown you that you can complete a number of task with relative ease in Loupe Browser:

* Check QC markers
* Review clustering
* Check marker genes
* Run differentials
* Subset the data

---
## Limitations

* There is less controls of parameters i.e. clustering
* There are not the same tools available i.e. doublets, ambient correction, annotation
* Reproducibility is harder to maintain as it is a GUI

![overview](./imgs/advancedworkflow.png)

---
## Time for an exercise!

Exercises found [here](https://rockefelleruniversity.github.io/LoupeBrowser/exercises/exercises/Exercise_exercise.html)

---
## Answers to exercise

Answers can be found [here](https://rockefelleruniversity.github.io/LoupeBrowser/exercises/answers/Exercise_answers.html)