class: center, middle, inverse, title-slide # Analysis of RNAseq data in R and Bioconductor (part 2)
###
http://rockefelleruniversity.github.io/RU_RNAseq/
--- ## What we will cover * Session 1: Alignment and counting * _Session 2: Differential gene expression analysis_ * Session 3: Visualizing results through PCA and clustering * Session 4: Gene Set Analysis * Session 5: Differential Transcript Utilization analysis --- ## The data Over these sessions we will review data from Christina Leslie's lab at MSKCC on T-Reg cells. This can be found on the Encode portal. T-Reg data [here](https://www.encodeproject.org/experiments/ENCSR486LMB/) and activated T-Reg data [here](https://www.encodeproject.org/experiments/ENCSR726DNP/). --- ## The data Several intermediate files have already been made for you to use. You can download the course content from GitHub [here](https://github.com/RockefellerUniversity/RU_RNAseq/archive/master.zip). Once downloaded you will need to unzip the folder and then navigate into the *r_course* directory of the download. This will mean the paths in the code are correct. ```r setwd("Path/to/Download/RU_RNAseq-master") ``` --- ## The data All Gene and Exon counts can be found as .RData obects in **data** directory - Counts in genes can be found at - **data/GeneCounts.Rdata** - Counts in disjoint exons can be found at - **data/ExonCounts.Rdata** - Salmon transcript quantification output directories can be found under - **data/Salmon/** --- class: inverse, center, middle # Counting with multiple RNAseq datasets <html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html> --- ## Counts from SummarizeOverlaps In our last session we counted our reads in genes using the summarize overlaps function to generate our **RangedSummarizedExperiment** object. We did this with a single BAM file. To be able to do differntial gene expression analysis we need replicates. We can use similar approaches for handle multiple files. First I am using a **BamFileList** of all the T-Reg samples to easily organize my samples and save memory with the **yieldSize** parameter. ```r library(Rsamtools) bamFilesToCount <- c("Sorted_Treg_1.bam", "Sorted_Treg_2.bam", "Sorted_Treg_act_1.bam", "Sorted_Treg_act_2.bam", "Sorted_Treg_act_3.bam") names(bamFilesToCount) <- c("Sorted_Treg_1", "Sorted_Treg_2", "Sorted_Treg_act_1", "Sorted_Treg_act_2", "Sorted_Treg_act_3") myBams <- BamFileList(bamFilesToCount, yieldSize = 10000) ``` --- ## Counts from SummarizeOverlaps We can count over this BamFileList in the same way we did for a single sample to produce our **RangedSummarizedExperiment** object of counts in genes across all samples. ```r library(TxDb.Mmusculus.UCSC.mm10.knownGene) library(GenomicAlignments) geneExons <- exonsBy(TxDb.Mmusculus.UCSC.mm10.knownGene, by = "gene") geneCounts <- summarizeOverlaps(geneExons, myBams, ignore.strand = TRUE) geneCounts ``` ``` ## class: RangedSummarizedExperiment ## dim: 24100 5 ## metadata(0): ## assays(1): '' ## rownames(24100): 497097 19888 ... 100040911 170942 ## rowData names(0): ## colnames(5): Sorted_T_reg_1 Sorted_T_reg_2 Sorted_T_reg_act_1 ## Sorted_T_reg_act_2 Sorted_T_reg_act_3 ## colData names(0): ``` --- ## Speeding up SummarizeOverlaps Bioconductor has a simple, unified system for parallelization through the [**BiocParallel** package.]("https://bioconductor.org/packages/release/bioc/html/BiocParallel.html") The BiocParallel package allows for differing parallel approaches including multiple cores on the same machine and distributed computing across large cluster environments (i.e a HPC). Simply by installing and loading library, we are ready for parallelization. ```r library(BiocParallel) ``` --- ## Speeding up SummarizeOverlaps We can control the parallelization in BiocParallel often choosing either a serial mode (no parallelization) with **SerialParam()** function or the number of cores to use with **MulticoreParam(workers=NUMBEROFCORES)**. We use the **register()** function to set the desired parallelization. Once registered you can then start counting using a parallelization. ```r paramMulti <- MulticoreParam(workers = 2) paramSerial <- SerialParam() register(paramSerial) ``` --- ## Counts from SummarizeOverlaps The counts object has already been made for you to save time. It is in the *data* directory The counrs object is a **RangedSummarizedExperiment**, which contains our counts as a matrix accessible by the **assay()** function. Each row is named by its Entrez gene ID. ```r load("data/GeneCounts.RData") ``` ```r assay(geneCounts)[1:2, ] ``` ``` ## Sorted_T_reg_1 Sorted_T_reg_2 Sorted_T_reg_act_1 Sorted_T_reg_act_2 ## 497097 0 1 0 0 ## 19888 0 0 0 0 ## Sorted_T_reg_act_3 ## 497097 0 ## 19888 11 ``` --- ## Counts from SummarizeOverlaps We can retrieve the GRangesList we counted by the **rowRanges()** accessor function. Each GRangesList element contains exons used in counting, has Entrez gene ID and is in matching order as rows in our count table. ```r rowRanges(geneCounts)[1:2, ] ``` ``` ## GRangesList object of length 2: ## $`497097` ## GRanges object with 3 ranges and 2 metadata columns: ## seqnames ranges strand | exon_id exon_name ## <Rle> <IRanges> <Rle> | <integer> <character> ## [1] chr1 3214482-3216968 - | 8008 <NA> ## [2] chr1 3421702-3421901 - | 8009 <NA> ## [3] chr1 3670552-3671498 - | 8012 <NA> ## ------- ## seqinfo: 66 sequences (1 circular) from mm10 genome ## ## $`19888` ## GRanges object with 6 ranges and 2 metadata columns: ## seqnames ranges strand | exon_id exon_name ## <Rle> <IRanges> <Rle> | <integer> <character> ## [1] chr1 4290846-4293012 - | 8013 <NA> ## [2] chr1 4343507-4350091 - | 8014 <NA> ## [3] chr1 4351910-4352081 - | 8015 <NA> ## [4] chr1 4352202-4352837 - | 8016 <NA> ## [5] chr1 4360200-4360314 - | 8017 <NA> ## [6] chr1 4409170-4409241 - | 8018 <NA> ## ------- ## seqinfo: 66 sequences (1 circular) from mm10 genome ``` --- class: inverse, center, middle # Differential gene expression analysis <html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html> --- ## Differential expression with RNAseq To perform differential expression between our sample groups we will need to perform some essential steps. - Transform and/or normalize our counts. - Calculate significance of differences between groups from normalized/transformed counts. --- ## Log transform or as counts Two broadly different approaches to RNAseq analysis could be. - Log2 transform, normalize, use standard test for normal distribution to find differential expression. - Normalize and use test suitable for count data to find differential expression. --- ## Normalization As with microarrays, for the majority of normalization approaches we assume that the composition of RNA species across samples is similar and total RNA levels as the same. To account for technical differences in library composition, many of these approached identify scaling factors for each sample which may be used to normalized observed/measured counts. Simple measures such as reads per million (RPM) are scaled to total mapped reads. More sophisticated methods, such as used in the package **DESeq2** find scaling factors from the distribution of genes relative log expression versus their respective average across all samples. .pull-left[ <div align="center"> <img src="imgs/beforeNorm.png" alt="offset" height="200" width="400"> </div> ] .pull-right[ <div align="center"> <img src="imgs/afterNorm.png" alt="offset" height="200" width="400"> </div> ] --- ## Mean and variance relationship Most approaches seek to model the relationship between mean expression and variance of expression in order to model and shrink variance of individual genes. This will assist us to detect differential expression with a small number of replicates.  --- ## Differential Expression Packages Options for differential expression analysis in RNAseq data are available in R including DESeq2, EdgeR and Voom/Limma. Both EdgeR and DESeq2 works with non-transformed count data to detect differential expression. Voom transforms count data into log2 values with associated mean dependent weighting for use in Limma differential expression functions. We will use the DESeq2 software to identify differential expression. --- ## DESeq2 To use DESeq2 we must first produce a dataframe of our Sample groups. The DESeq2 package contains a workflow for assessing local changes in fragment/read abundance between replicated conditions. This workflow includes normalization, variance estimation, outlier removal/replacement as well as significance testing suited to high throughput sequencing data (i.e. integer counts). To use DESeq2 we must first produce a dataframe of our Sample groups. ```r metaData <- data.frame(Group = c("Naive", "Naive", "Act", "Act", "Act"), row.names = colnames(geneCounts)) metaData ``` ``` ## Group ## Sorted_T_reg_1 Naive ## Sorted_T_reg_2 Naive ## Sorted_T_reg_act_1 Act ## Sorted_T_reg_act_2 Act ## Sorted_T_reg_act_3 Act ``` --- ## Creating a DESeq2 object We can use the **DESeqDataSetFromMatrix()** function to create a DESeq2 object. We must provide our matrix of counts to countData parameter, our metadata data.frame to colData parameter and we include to an optional parameter of rowRanges the non-redundant peak set we can counted on. Finally we provide the name of the column in our metadata data.frame within which we wish to test to the design parameter. ```r countMatrix <- assay(geneCounts) countGRanges <- rowRanges(geneCounts) dds <- DESeqDataSetFromMatrix(countMatrix, colData = metaData, design = ~Group, rowRanges = countGRanges) dds ``` ``` ## class: DESeqDataSet ## dim: 24100 5 ## metadata(1): version ## assays(1): counts ## rownames(24100): 497097 19888 ... 100040911 170942 ## rowData names(0): ## colnames(5): Sorted_T_reg_1 Sorted_T_reg_2 Sorted_T_reg_act_1 ## Sorted_T_reg_act_2 Sorted_T_reg_act_3 ## colData names(1): Group ``` --- ## Creating a DESeq2 object As an alternative input, we can first update our **RangedSummarizedExperiment** object to include our metadata using the **colData** accessor. ```r colData(geneCounts)$Group <- metaData$Group geneCounts ``` ``` ## class: RangedSummarizedExperiment ## dim: 24100 5 ## metadata(0): ## assays(1): '' ## rownames(24100): 497097 19888 ... 100040911 170942 ## rowData names(0): ## colnames(5): Sorted_T_reg_1 Sorted_T_reg_2 Sorted_T_reg_act_1 ## Sorted_T_reg_act_2 Sorted_T_reg_act_3 ## colData names(1): Group ``` --- ## Creating a DESeq2 object Now we can make use of the **DESeqDataSet()** function to build directly from our **RangedSummarizedExperiment** object. We simply have to specify the **design** parameter to specify metadata column to test on. ```r dds <- DESeqDataSet(geneCounts, design = ~Group) ``` ``` ## renaming the first element in assays to 'counts' ``` ```r dds ``` ``` ## class: DESeqDataSet ## dim: 24100 5 ## metadata(1): version ## assays(1): counts ## rownames(24100): 497097 19888 ... 100040911 170942 ## rowData names(0): ## colnames(5): Sorted_T_reg_1 Sorted_T_reg_2 Sorted_T_reg_act_1 ## Sorted_T_reg_act_2 Sorted_T_reg_act_3 ## colData names(1): Group ``` --- ## DESeq function We can now run the DESeq2 workflow on our DESeq2 object using the **DESeq()** function. This function will normalize library sizes, estimate and shrink variance and test our data in a single step. Our DESeq2 object is updated to include useful statistics such our normalized values and variance of signal within each gene. ```r dds <- DESeq(dds) ``` ``` ## estimating size factors ``` ``` ## estimating dispersions ``` ``` ## gene-wise dispersion estimates ``` ``` ## mean-dispersion relationship ``` ``` ## final dispersion estimates ``` ``` ## fitting model and testing ``` --- ## Normalization In its first step the **DEseq()** function normalizes our data by evaluating the median expression of genes across all samples to produce a per sample normalization factor (library size). We can retrieve normalized and unnormalized values from our DESeq2 object using the **counts()** function and specifying the **normalized** parameter as TRUE. Normalized counts are counts divided by the library scaling factor. ```r normCounts <- counts(dds, normalized = TRUE) normCounts[1:2, ] ``` ``` ## Sorted_T_reg_1 Sorted_T_reg_2 Sorted_T_reg_act_1 Sorted_T_reg_act_2 ## 497097 0 0.9202987 0 0 ## 19888 0 0.0000000 0 0 ## Sorted_T_reg_act_3 ## 497097 0.00000 ## 19888 10.12247 ``` --- ## Variance Estimation In the second step the **DEseq()** function estimates variances and importantly shrinks variance depending on the mean. This shrinking of variance allows us to detect significant changes with low replicate number. We can review the Variance/mean relationship and shrinkage using **plotDispEsts()** function and our **DESeq2** object. We can see the <span style="color:blue">**adjusted dispersions in blue**</span> and <span style="color:black">**original dispersion for genes in black**</span>. ```r plotDispEsts(dds) ``` <!-- --> --- ## DESeq results Finally we can extract our contrast of interest using the **results()** function. We must specify the **contrast** parameter, listing the metadata column of interest and the two groups to compare. The resulting **DESeqResults** can be sorted by pvalue to list most significant changes at top of table. ```r myRes <- results(dds, contrast = c("Group", "Act", "Naive")) myRes <- myRes[order(myRes$pvalue), ] myRes[1:3, ] ``` ``` ## log2 fold change (MLE): Group Act vs Naive ## Wald test p-value: Group Act vs Naive ## DataFrame with 3 rows and 6 columns ## baseMean log2FoldChange lfcSE stat pvalue padj ## <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> ## 54167 20595.52 2.22416 0.0513373 43.3244 0 0 ## 20198 4896.29 3.41735 0.0782338 43.6812 0 0 ## 20200 6444.62 3.19637 0.0748192 42.7212 0 0 ``` --- ## DESeqResults objects The **DESeqResults** object contains important information from our differential expression testing. - **baseMean** - Mean normalised values across all samples. - **log2FoldChange** - Log2 of fold change between groups being compared. - **pvalue** - Significance of change between groups. - **padj** - Significance of change between groups corrected for multiple testing. --- ## DESeqResults objects With our **DESeqResults** object we can get a very quick overview of differential expression changes using the **summary** function. ```r summary(myRes) ``` ``` ## ## out of 19776 with nonzero total read count ## adjusted p-value < 0.1 ## LFC > 0 (up) : 2819, 14% ## LFC < 0 (down) : 2373, 12% ## outliers [1] : 0, 0% ## low counts [2] : 6443, 33% ## (mean count < 6) ## [1] see 'cooksCutoff' argument of ?results ## [2] see 'independentFiltering' argument of ?results ``` --- ## DESeqResults objects We can review the relationship between fold-changes and expression levels with a MA-Plot by using the **plotMA()** function and our DESeqResults object. ```r plotMA(myRes) ``` <!-- --> --- ## DESeqResults objects We downweight genes with high fold change but low significance (due to low counts/high dispersion) by using the **lfcShrink()** function as we have the **results()** function. This allows us to now use the log2FC as a measure of significance of change in our ranking for analysis and in programs such as **GSEA**. ```r myRes_lfc <- lfcShrink(dds, coef = "Group_Naive_vs_Act") DESeq2::plotMA(myRes_lfc) ``` <!-- --> --- ## DESeqResults objects We can also convert our DESeqResults objects into a standard data frame using the **as.data.frame** function. ```r myResAsDF <- as.data.frame(myRes) myResAsDF[1:2, ] ``` ``` ## baseMean log2FoldChange lfcSE stat pvalue padj ## 497097 0.1840597 -1.353988 4.558285 -0.2970389 0.7664368 NA ## 19888 2.0244933 4.254714 3.557199 1.1960856 0.2316632 NA ``` --- class: inverse, center, middle # Significance and Multiple Testing <html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html> --- ## Multiple testing .pull-left[ In our analysis we are testing 1000s of genes and identifing significance of each gene separately at a set confidence interval. If we are 95% confident a gene is differentially expressed when we look at all genes we can expect 5% to be false. When looking across all genes we apply a multiple testing correction to account for this. ] .pull-right[ <div align="center"> <img src="imgs/significant.png" alt="offset" height="600" width="400"> </div> ] --- ## Multiple testing We could apply a multiple testing to our p-values ourselves using either the Bonferroni or Benjamini-Hockberg correction. - Bonferonni = (p-value of gene)*(Total genes tested) - Benjamini-Hockberg = ((p-value of gene)*(Total genes tested))/(rank of p-value of gene) --- ## Multiple testing The total number of genes tested has a large effect on these multiple correction methods. *Independent* filtering may be applied to reduce number of genes tested such as removing genes with low expression and/or variance. **We can not simply filter to differentially expressed genes and reapply correction. This is not independent!** --- ## NA in padj By default DEseq2 filters out low expressed genes to assist in multiple testing correction. This results in **NA** values in **padj** column for low expressed genes. ```r table(is.na(myResAsDF$padj)) ``` ``` ## ## FALSE TRUE ## 13333 10767 ``` --- ## NA in padj We can add a column of our own adjusted p-values for all genes with a pvalue using the **p.adjust()** function. ```r myResAsDF$newPadj <- p.adjust(myResAsDF$pvalue) myResAsDF[1:3, ] ``` ``` ## baseMean log2FoldChange lfcSE stat pvalue padj newPadj ## 497097 0.1840597 -1.353988 4.558285 -0.2970389 0.7664368 NA 1 ## 19888 2.0244933 4.254714 3.557199 1.1960856 0.2316632 NA 1 ## 20671 2.0122042 -3.633158 2.891907 -1.2563189 0.2090004 NA 1 ``` --- ## NA in padj Typcially genes with **NA** padj values should be filtered from the table for later evaluation and functional testing. ```r myResAsDF <- myResAsDF[!is.na(myResAsDF$padj), ] myResAsDF <- myResAsDF[order(myResAsDF$pvalue), ] myResAsDF[1:3, ] ``` ``` ## baseMean log2FoldChange lfcSE stat pvalue padj newPadj ## 54167 20595.525 2.224161 0.05133731 43.32445 0 0 0 ## 20198 4896.288 3.417348 0.07823384 43.68120 0 0 0 ## 20200 6444.617 3.196367 0.07481916 42.72124 0 0 0 ``` --- class: inverse, center, middle # DEseq2 and Salmon <html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html> --- ## DESeq2 from Salmon We saw at the end of last session how we can gain transcript quantification from FASTQ using Salmon. Salmon offers a fast alternative to alignment and counting for transcript expression estimation so we would be keen to make use of this in our differential expression analysis. To make use of Salmon transcript quantifications for differential gene expression changes we must however summarize our transcript expression estimates to gene expression estimates. --- ## tximport The tximport package, written by same author as DESeq2, offers functions to import data from a wide range of transcript counting and quantification software and summarizes transcript expression to genes. We can first load the Bioconductor [**tximport** package.](https://bioconductor.org/packages/release/bioc/html/tximport.html) ```r library(tximport) ``` --- ## tximport In order to summarize our transcripts to genes we will need to provide a data.frame of transcripts to their respective gene names. First we read in a single Salmon quantification report to get a table containing all transcript names. ```r temp <- read.delim("data/Salmon/TReg_2_Quant/quant.sf") temp[1:3, ] ``` ``` ## Name Length EffectiveLength TPM NumReads ## 1 ENSMUST00000193812.1 1070 820.000 0.00000 0 ## 2 ENSMUST00000082908.1 110 3.749 0.00000 0 ## 3 ENSMUST00000192857.1 480 230.000 0.24391 2 ``` --- ## tximport We can now use the **select()** function from AnnotationDbi to retrieve a transcript names (TXNAME) to gene ids (GENEID) map from our TxDb.Mmusculus.UCSC.mm10.knownGene object. Remember that available columns and keys can be found using the **columns()** and **keytype** functions respectively. ```r Tx2Gene <- select(TxDb.Mmusculus.UCSC.mm10.knownGene, keys = as.vector(temp[, 1]), keytype = "TXNAME", columns = c("GENEID", "TXNAME")) Tx2Gene <- Tx2Gene[!is.na(Tx2Gene$GENEID), ] Tx2Gene[1:10, ] ``` ``` ## TXNAME GENEID ## 14 ENSMUST00000134384.7 18777 ## 15 ENSMUST00000027036.10 18777 ## 16 ENSMUST00000150971.7 18777 ## 17 ENSMUST00000155020.1 18777 ## 18 ENSMUST00000119612.8 18777 ## 19 ENSMUST00000137887.7 18777 ## 20 ENSMUST00000115529.7 18777 ## 21 ENSMUST00000131119.1 18777 ## 22 ENSMUST00000141278.1 18777 ## 23 ENSMUST00000081551.13 21399 ``` --- ## tximport Now we can use the **tximport** function to import and summarize our Salmon quantification files to gene level expression estimates. We must provide the paths to Salmon *quant.sf* files, the type of file to import (here "salmon") to the **type** argument and our data.frame of transcript to gene mapping using the **tx2gene** argument. ```r salmonQ <- dir("data/Salmon/", recursive = T, pattern = "quant.sf", full.names = T) salmonCounts <- tximport(salmonQ, type = "salmon", tx2gene = Tx2Gene) ``` ``` ## reading in files with read.delim (install 'readr' package for speed up) ``` ``` ## 1 2 3 4 5 ## transcripts missing from tx2gene: 38851 ## summarizing abundance ## summarizing counts ## summarizing length ``` --- ## tximport The result from tximport is a list containing our summarized gene expression estimates **Abundance** - Number of transcripts per million transcripts (TPM). **Counts** - Estimated counts. ```r salmonCounts$abundance[1:2, ] ``` ``` ## [,1] [,2] [,3] [,4] [,5] ## 100009600 1.193915 1.664778 0.623252 1.117103 1.255239 ## 100009609 0.079162 0.090154 0.062317 0.067593 0.041142 ``` ```r salmonCounts$counts[1:2, ] ``` ``` ## [,1] [,2] [,3] [,4] [,5] ## 100009600 73.602 100.600 34.117 74.385 75.221 ## 100009609 27.280 30.453 19.069 25.160 13.782 ``` --- ## DESeq2 from Tximport We can now use the **DESeqDataSetFromTximport()** to build our DESeq2 object from salmon counts. We must specify the metadata to **colData** parameter and specify the column of interest in **design** parameter as we have done for **DESeqDataSetFromMatrix()**. ```r ddsSalmon <- DESeqDataSetFromTximport(salmonCounts, colData = metaData, design = ~Group) ``` ``` ## using counts and average transcript lengths from tximport ``` --- ## DESeq2 from Tximport We can then proceed as we did for previous analysis using summarizeOverlaps counts. ```r ddsSalmon <- DESeq(ddsSalmon) myResS <- results(ddsSalmon, contrast = c("Group", "Act", "Naive")) myResS <- myResS[order(myResS$pvalue), ] myResS[1:3, ] ``` ``` ## log2 fold change (MLE): Group Act vs Naive ## Wald test p-value: Group Act vs Naive ## DataFrame with 3 rows and 6 columns ## baseMean log2FoldChange lfcSE stat pvalue padj ## <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> ## 110454 20032.40 2.81696 0.0569895 49.4295 0 0 ## 12772 4452.24 3.87109 0.1007898 38.4075 0 0 ## 14939 6154.51 5.75391 0.1076499 53.4502 0 0 ``` --- ## Salmon and Rsubread We can now compare the differential following Salmon quantification and Rsubread/summarizeOverlaps. Here we are showing the overlaps between genes that have a padj <0.05 and also a fold change greater then 2. <!-- --> --- class: inverse, center, middle # Adding Annotation <html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html> --- ## Annotation of results table By whatever method we have chosen to analyze differential expression we will want to add some sensible gene names or symbols to our data frame of results. We can use the **org.db** packages to retrieve Gene Symbols for our Entrez IDs using the **select** function. Here we use the **org.db** package for mouse, **org.db** ```r library(org.Mm.eg.db) eToSym <- select(org.Mm.eg.db, keys = rownames(myResAsDF), keytype = "ENTREZID", columns="SYMBOL") eToSym[1:10,] ``` ``` ## ENTREZID SYMBOL ## 1 54167 Icos ## 2 20198 S100a4 ## 3 20200 S100a6 ## 4 12772 Ccr2 ## 5 16407 Itgae ## 6 14939 Gzmb ## 7 110454 Ly6a ## 8 20715 Serpina3g ## 9 93692 Glrx ## 10 12766 Cxcr3 ``` --- ## Annotation of results table Now we can merge the Entrez ID to Symbol table into our table of differential expression results. ```r annotatedRes <- merge(eToSym,myResAsDF, by.x=1, by.y=0, all.x=FALSE, all.y=TRUE) annotatedRes <- annotatedRes[order(annotatedRes$pvalue),] annotatedRes[1:3,] ``` ``` ## ENTREZID SYMBOL baseMean log2FoldChange lfcSE stat pvalue padj ## 1027 110454 Ly6a 17468.571 2.809022 0.04994700 56.24006 0 0 ## 1690 12772 Ccr2 8529.015 4.011062 0.08476422 47.32022 0 0 ## 2339 14939 Gzmb 6260.651 5.733767 0.09678177 59.24428 0 0 ## newPadj ## 1027 0 ## 1690 0 ## 2339 0 ``` --- ## Time for an exercise [Link_to_exercises](../../exercises/exercises/RNAseq_part2_exercise.html) [Link_to_answers](../../exercises/answers/RNAseq_part2_answers.html)