Session 2 - Conditional branching and looping in R

These exercises are about the conditions and loops sections of Introduction to R.

** Exercise 1 **

Read in the file categoriesAndExpression.txt from the data directory of the course material. Filter the data frame on the ofInterest column to keep only the rows marked Selected.

my_df <- read.table("data/categoriesAndExpression.txt", sep="\t", header=TRUE)
head(my_df)

##   geneName ofInterest    pathway Expression
## 1    Gene1   Selected Glycolysis   20.09519
## 2    Gene2   Selected Glycolysis   23.00306
## 3    Gene3   Selected Glycolysis   20.99712
## 4    Gene4   Selected Glycolysis   43.01145
## 5    Gene5   Selected Glycolysis   22.00567
## 6    Gene6   Selected Glycolysis   20.99162

my_df_subset <- my_df[my_df$ofInterest == "Selected",]
my_df_subset

##    geneName ofInterest    pathway Expression
## 1     Gene1   Selected Glycolysis   20.09519
## 2     Gene2   Selected Glycolysis   23.00306
## 3     Gene3   Selected Glycolysis   20.99712
## 4     Gene4   Selected Glycolysis   43.01145
## 5     Gene5   Selected Glycolysis   22.00567
## 6     Gene6   Selected Glycolysis   20.99162
## 7     Gene7   Selected Glycolysis   26.07826
## 8     Gene8   Selected Glycolysis   22.92961
## 9     Gene9   Selected Glycolysis   21.02250
## 10   Gene10   Selected Glycolysis   34.91377
## 11   Gene11   Selected Glycolysis   26.01709
## 12   Gene12   Selected Glycolysis   27.01314
## 13   Gene13   Selected Glycolysis   74.08310
## 14   Gene14   Selected Glycolysis   22.92992
## 15   Gene15   Selected Glycolysis   20.06247
## 16   Gene16   Selected       TGFb   56.03506
## 17   Gene17   Selected       TGFb   54.00140
## 18   Gene18   Selected       TGFb   59.04783
## 19   Gene19   Selected       TGFb   42.91023
## 20   Gene20   Selected       TGFb   66.09706

Reorder the subset from smallest to largest in expression levels.

my_df_subset_ordered <- my_df_subset[order(my_df_subset$Expression),]
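
The filter and reorder steps can be tried without the course file. The following is a minimal sketch on a made-up data frame: toy_df and its values are hypothetical, and only the column names follow the exercise. It also shows that order() accepts decreasing = TRUE to sort from largest to smallest.

```r
# Hypothetical toy data using the same column names as the course file
toy_df <- data.frame(
  geneName   = c("GeneA", "GeneB", "GeneC", "GeneD"),
  ofInterest = c("Selected", "NotSelected", "Selected", "Selected"),
  Expression = c(30.5, 12.1, 18.2, 45.0),
  stringsAsFactors = FALSE
)

# Keep only the rows where ofInterest is "Selected"
toy_subset <- toy_df[toy_df$ofInterest == "Selected", ]

# order() returns the row positions that sort Expression from smallest to largest
toy_ordered <- toy_subset[order(toy_subset$Expression), ]

# decreasing = TRUE reverses the direction of the sort
toy_reversed <- toy_subset[order(toy_subset$Expression, decreasing = TRUE), ]

toy_ordered$geneName
toy_reversed$geneName
```

Indexing with the result of order() rearranges whole rows, so every column stays aligned with its gene.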

** Exercise 2 **

Calculate the factorial (factorial of 3 = 3 * 2 * 1) of 10 using a loop.

for(x in 1:10){
  if(x == 1){
    factorialAnswer <- 1
  }else{
    factorialAnswer <- factorialAnswer * x 
  }
}
factorialAnswer

## [1] 3628800
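
As a possible variation, the running product can be set to 1 before the loop starts, which removes the need for the if/else inside the loop body. This is a sketch rather than the course's answer; the names n and runningProduct are illustrative. The result is then compared with base R's factorial() and prod().

```r
# Start the running product at 1 so the loop body needs no special case
n <- 10
runningProduct <- 1
for (x in seq_len(n)) {
  runningProduct <- runningProduct * x
}
runningProduct

# Compare with the built-in factorial() and the vectorised prod()
isTRUE(all.equal(runningProduct, factorial(n)))
isTRUE(all.equal(runningProduct, prod(1:n)))
```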

Using an ifelse() expression, create a factor from a vector of 1 to 40 where all numbers less than 10 are “small”, 10 to 30 are “mid”, and 31 to 40 are “big”.

condExercise <- 1:40
condExercise

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

vectorResult <- ifelse(condExercise<10,"small",ifelse(condExercise < 31,"mid","big"))
temp <- factor(vectorResult, levels=c("small","mid","big"), ordered=TRUE)
temp

##  [1] small small small small small small small small small mid   mid   mid  
## [13] mid   mid   mid   mid   mid   mid   mid   mid   mid   mid   mid   mid  
## [25] mid   mid   mid   mid   mid   mid   big   big   big   big   big   big  
## [37] big   big   big   big  
## Levels: small < mid < big
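
The exercise asks for ifelse(), but the same binning can also be expressed with base R's cut(), which may be easier to read once there are more than two or three categories. The sketch below assumes integer input, so the break points 9 and 30 separate the groups exactly as in the exercise; the names nested and binned are illustrative.

```r
condExercise <- 1:40

# Nested ifelse() version, as in the exercise
nested <- factor(
  ifelse(condExercise < 10, "small", ifelse(condExercise < 31, "mid", "big")),
  levels = c("small", "mid", "big"), ordered = TRUE
)

# cut() version: intervals are (-Inf, 9], (9, 30] and (30, Inf]
binned <- cut(condExercise,
              breaks = c(-Inf, 9, 30, Inf),
              labels = c("small", "mid", "big"),
              ordered_result = TRUE)

# Count the values in each bin and check both approaches agree
table(binned)
identical(as.character(nested), as.character(binned))
```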

** Exercise 3 **

Read in all files with a .txt extension from the ExpressionResults directory using sapply, and create a table of gene expression results.

NOTE: the dir() function can return only specific file types with the pattern argument.

filesToRead <- dir("ExpressionResults/", pattern = "\\.txt$", full.names=TRUE)

fileRead <- sapply(filesToRead, read.delim, header=F, sep="\t")

mergedTable <- NULL
for(i in fileRead){
  if(is.null(mergedTable)){
    mergedTable <- i
  }else{
    mergedTable <- merge(mergedTable,i,by=1,all=T)
  }
  
  print(nrow(mergedTable))
}

## [1] 5001
## [1] 5001
## [1] 5001
## [1] 5001
## [1] 5001
## [1] 5001
## [1] 5001
## [1] 5001
## [1] 5001
## [1] 5001
## [1] 5001

mergedTable[1:3,]

##         V1       V2.x          V3     V2.y     V2.x     V2.y     V2.x     V2.y
## 1   Gene_1   Ens_1001 DNA_Binding 3.448466 7.665488 5.250063 5.968927 6.868251
## 2  Gene_10  Ens_10010 DNA_Binding 5.314180 7.813501 5.361170 5.305980 6.742855
## 3 Gene_100 Ens_100100 DNA_Binding 5.591612 5.186500 6.840497 5.197710 5.922931
##       V2.x     V2.y     V2.x     V2.y       V2
## 1 5.367100 5.189686 3.882930 5.329258 6.167451
## 2 5.957786 6.293098 7.361497 6.649428 6.213910
## 3 6.813154 6.228178 5.831575 6.653152 3.992555
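
The merged columns above end up with repeated names such as V2.x and V2.y, which makes it hard to tell the samples apart. One possible alternative, sketched here on small hypothetical files written to a temporary directory, reads the files with lapply(), names each value column after its source file, and combines the list with Reduce(). lapply() is used because it always returns a list, whereas sapply() may try to simplify its result when every file has the same number of columns. The file names, gene names, and values below are made up for illustration.

```r
# Write three small hypothetical result files to a temporary directory
exampleDir <- file.path(tempdir(), "exampleResults")
dir.create(exampleDir, showWarnings = FALSE)
genes <- c("Gene_1", "Gene_2", "Gene_3")
for (i in 1:3) {
  write.table(data.frame(genes, i * c(1, 2, 3)),
              file.path(exampleDir, paste0("sample", i, ".txt")),
              sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
}

exampleFiles <- dir(exampleDir, pattern = "\\.txt$", full.names = TRUE)

# lapply() returns a list with one data frame per file
exampleTables <- lapply(exampleFiles, read.delim, header = FALSE, sep = "\t")

# Name each value column after its source file so merged columns stay distinct
for (i in seq_along(exampleTables)) {
  names(exampleTables[[i]]) <- c("GeneName",
                                 sub("\\.txt$", "", basename(exampleFiles[i])))
}

# Reduce() applies merge() pairwise across the list of data frames
exampleMerged <- Reduce(function(x, y) merge(x, y, by = "GeneName", all = TRUE),
                        exampleTables)
exampleMerged
```

With all = TRUE, a gene that is missing from one file is kept in the result and gets NA in that file's column.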

Add the annotation from Annotation.txt by merging it with the expression table. How many genes from each Pathway are in the table? Can you also show this information with a plot?

Annotation <- read.table("ExpressionResults/Annotation.txt", sep="\t", header=TRUE)
annotatedExpression <- merge(Annotation,mergedTable,by=1,all.x=F,all.y=T)
annotatedExpression[1:2,]

##   GeneName   Ensembl     Pathway      V2.x          V3     V2.y   V2.x.1
## 1   Gene_1  Ens_1001 DNA_Binding  Ens_1001 DNA_Binding 3.448466 7.665488
## 2  Gene_10 Ens_10010 DNA_Binding Ens_10010 DNA_Binding 5.314180 7.813501
##     V2.y.1   V2.x.2   V2.y.2   V2.x.3   V2.y.3   V2.x.4   V2.y.4       V2
## 1 5.250063 5.968927 6.868251 5.367100 5.189686 3.882930 5.329258 6.167451
## 2 5.361170 5.305980 6.742855 5.957786 6.293098 7.361497 6.649428 6.213910

annotatedExpression$Pathway <- factor(annotatedExpression$Pathway)


summary(annotatedExpression$Pathway)

##  DNA_Binding   Glycolysis         TGFb WntSignaling         NA's 
##         1000          500          300          200         3001

plot(annotatedExpression$Pathway)
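
Calling plot() on a factor draws a bar chart of the level counts. If the rows without a pathway should also appear in the counts and in the plot, one option is to relabel the missing values before tabulating. The sketch below uses a short hypothetical vector of pathway labels, not the course data, and the label "Unannotated" is an illustrative choice.

```r
# Hypothetical pathway labels, including missing values
pathways <- c("DNA_Binding", "TGFb", NA, "Glycolysis",
              "DNA_Binding", NA, "TGFb", "DNA_Binding")

# Give missing values an explicit label so they are counted and plotted
pathways[is.na(pathways)] <- "Unannotated"

# table() counts how often each label occurs
pathwayCounts <- table(pathways)
pathwayCounts

# barplot() draws one bar per label; las = 2 rotates the axis labels
barplot(pathwayCounts, ylab = "Number of genes", las = 2)
```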

Rockefeller University, Bioinformatics Resource Centre

https://rockefelleruniversity.github.io/Intro_To_R_1Day/