class: center, middle, inverse, title-slide .title[ # Introduction to R - Session 2 ] .subtitle[ ## Bioinformatics Resource Center - Rockefeller University ] .author[ ###
http://rockefelleruniversity.github.io/Intro_To_R_1Day/
] .author[ ###
brc@rockefeller.edu
] --- ## Overview - [Recap](https://rockefelleruniversity.github.io/Intro_To_R_1Day/presentations/singlepage/introToR_Session2.html#recap) - [Conditions and Loops](https://rockefelleruniversity.github.io/Intro_To_R_1Day/presentations/singlepage/introToR_Session2.html#conditions_and_loops) - [Libraries](https://rockefelleruniversity.github.io/Intro_To_R_1Day/presentations/singlepage/introToR_Session2.html#getting_additional_libraries) - [Writing scripts](https://rockefelleruniversity.github.io/Intro_To_R_1Day/presentations/singlepage/introToR_Session2.html#scripts) --- # Recap Session 1 covered introduction to R data types and import/export of data. - [Background to R](https://rockefelleruniversity.github.io/Intro_To_R_1Day/presentations/singlepage/introToR_Session1.html#background-to-r) - [Data types in R](https://rockefelleruniversity.github.io/Intro_To_R_1Day/presentations/singlepage/introToR_Session1.html#data_types_in_r) - [Reading and writing in R](https://rockefelleruniversity.github.io/Intro_To_R_1Day/presentations/singlepage/introToR_Session1.html#reading-and-writing-data-in-r) --- ## Recap R stores data in five main data types. - **Vector** - Ordered collection of single data type (numeric/character/logical). - **Matrix** - Table (ordered 2D array) of single data type (numeric/character/logical). - **Factors** - Ordered collection of ordinal or nominal categories. - **Data frame** - Table (ordered 2D array) of multiple data types of same length. - **List** - Ordered collection of multiple data types of differing length. ``` r aVector <- c(1,2,3,4,5,6,7,8,9,10) aMatrix <- matrix(aVector,ncol=2,nrow=5,byrow = TRUE) aFactor <- factor(c("R","Python","R","R","Python"),levels = c("R","Python")) aDataFrame <- data.frame(Number=c(1,2,3,4,5),Factor=aFactor) ``` --- ## Recap We can access and change information in our data.types using **indexing** with **[]** (or **[[]]** for lists). ``` r aVector <- c(1,2,3,4,5,6,7,8,9,10) aVector[10] ``` ``` ## [1] 10 ``` ``` r aVector[10] <- 0 aVector ``` ``` ## [1] 1 2 3 4 5 6 7 8 9 0 ``` --- ## Recap For matrices and data frames we use two indexes, specifying columns and rows. **matrix[_rowIndex_,_columnIndex_]** .pull-left[ ``` r aMatrix <- matrix(aVector,ncol=2, nrow=5,byrow = TRUE) aMatrix ``` ``` ## [,1] [,2] ## [1,] 1 2 ## [2,] 3 4 ## [3,] 5 6 ## [4,] 7 8 ## [5,] 9 0 ``` ``` r aMatrix[1,1] ``` ``` ## [1] 1 ``` ``` r aMatrix[,2] ``` ``` ## [1] 2 4 6 8 0 ``` ] .pull-right[ ``` r aMatrix[1,1] <- 0 aMatrix[,2] <- 100 aMatrix ``` ``` ## [,1] [,2] ## [1,] 0 100 ## [2,] 3 100 ## [3,] 5 100 ## [4,] 7 100 ## [5,] 9 100 ``` ] --- ## Recap Remember a matrix can only contain one data type (numeric, character etc). For mixed data types we use data frames. .pull-left[ ``` r aMatrix <- matrix(aVector,ncol=2, nrow=5,byrow = TRUE) aMatrix ``` ``` ## [,1] [,2] ## [1,] 1 2 ## [2,] 3 4 ## [3,] 5 6 ## [4,] 7 8 ## [5,] 9 0 ``` ] .pull-right[ ``` r aMatrix[1,2] <- "Word" aMatrix ``` ``` ## [,1] [,2] ## [1,] "1" "Word" ## [2,] "3" "4" ## [3,] "5" "6" ## [4,] "7" "8" ## [5,] "9" "0" ``` ] --- ## Recap We can use logical operations to evaluate information our data types. ``` r aVector <- c(1,2,3,4,5,6,7,8,9,10) aVector > 9 ``` ``` ## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE ``` ``` r aFactor <- factor(c("R","Python","R","R","Python"),levels = c("R","Python")) aDataFrame <- data.frame(Number=c(1,2,3,4,5),Factor=aFactor) aDataFrame ``` ``` ## Number Factor ## 1 1 R ## 2 2 Python ## 3 3 R ## 4 4 R ## 5 5 Python ``` ``` r aDataFrame$Factor == "R" ``` ``` ## [1] TRUE FALSE TRUE TRUE FALSE ``` --- ## Recap We can use logical operations with indexing to subset or alter information held in our data types. ``` r aVector <- c(1,2,3,4,5,6,7,8,9,10) aVector > 5 ``` ``` ## [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE ``` ``` r aVector[aVector > 5] <- 10 aVector ``` ``` ## [1] 1 2 3 4 5 10 10 10 10 10 ``` --- ## Recap When replacing factor columns in data frames we have to remember the **levels**. ``` r aFactor <- factor(c("R","Python","R","R","Python"), levels = c("R","Python")) aDataFrame <- data.frame(Number=c(1,2,3,4,5), Factor=aFactor) aDataFrame[aDataFrame$Factor == "R",2] <- "NotPython" ``` ``` ## Warning in `[<-.factor`(`*tmp*`, iseq, value = c("NotPython", "NotPython", : ## invalid factor level, NA generated ``` ``` r aDataFrame ``` ``` ## Number Factor ## 1 1 <NA> ## 2 2 Python ## 3 3 <NA> ## 4 4 <NA> ## 5 5 Python ``` --- ## Recap We can update levels in a data frame column as we would a factor. Now we can replace our elements in data frame. ``` r aDataFrame <- data.frame(Number=c(1,2,3,4,5), Factor=aFactor) aDataFrame$Factor <- factor(aDataFrame$Factor, levels = c("R","Python","NotPython")) aDataFrame[aDataFrame$Factor == "R",2] <- "NotPython" aDataFrame ``` ``` ## Number Factor ## 1 1 NotPython ## 2 2 Python ## 3 3 NotPython ## 4 4 NotPython ## 5 5 Python ``` --- ## Recap Data can be read into R as a table with the **read.table()** function and written to file with the **write.table()** function. ``` r Table <- read.table("data/readThisTable.csv",sep=",",header=T,row.names=1) Table[1:3,] ``` ``` ## Sample_1.hi Sample_2.hi Sample_3.hi Sample_4.low Sample_5.low ## Gene_a 4.570237 3.230467 3.351827 3.930877 4.098247 ## Gene_b 3.561733 3.632285 3.587523 4.185287 1.380976 ## Gene_c 3.797274 2.874462 4.016916 4.175772 1.988263 ## Sample_1.low ## Gene_a 4.418726 ## Gene_b 5.936990 ## Gene_c 3.780917 ``` ``` r write.table(Table,file="data/writeThisTable.csv", sep=",", row.names =F,col.names=T) ``` --- ## Recap If we want to know how a function works or get an example on its use we can use **?** infront function name. **?_FunctionName_** ``` r ?merge ``` --- class: inverse, center, middle # Conditions and Loops <html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html> --- ## Conditions and Loops * <span style="color:red">Conditional branching (if,else)</span> * <span style="color:green">Loops (for, while)</span> <span style="color:green">While</span> I'm analysing data, <span style="color:red">if</span> I need to execute complex statistical procedures on the data I will use R <span style="color:red">else</span> I will use a calculator. --- ## Conditions and Loops We have looked at using logical vectors as a way to index other data types. ``` r x <- 1:10 x[x < 4] ``` ``` ## [1] 1 2 3 ``` Logicals are also used in controlling how scripted procedures execute. --- ## Conditional branching Conditional branching is the evaluation of a logical to determine whether a chunk of code is executed. In R, we use the **if** statement with the logical to be evaluated in **()** and dependent code to be executed in **{}**. ``` r x <- TRUE if(x){ message("x is true") } ``` ``` ## x is true ``` ``` r x <- FALSE if(x){ message("x is true") } ``` --- ## Evaluating with if() More often, we construct the logical value within **()** itself. This can be termed the **condition**. ``` r x <- 10 y <- 4 if(x > y){ message("The value of x is ",x," which is greater than ", y) } ``` ``` ## The value of x is 10 which is greater than 4 ``` The message is printed above because x is greater than y. ``` r y <- 20 if(x > y){ message("The value of x is ",x," which is greater than ", y) } ``` x is now no longer greater than y, so no message is printed. We really still want a message telling us what was the result of the condition. --- ## else following an if .pull-left[ If we want to perform an operation when the condition is false we can follow the if() statement with an else statement. ``` r x <- 3 if(x < 5){ message(x, " is less than to 5") }else{ message(x," is greater than or equal to 5") } ``` ``` ## 3 is less than to 5 ``` ] .pull-right[ With the addition of the else statement, when x is not less than 5 the code following the else statement is executed. ``` r x <- 10 if(x < 5){ message(x, " is less than 5") }else{ message(x," is greater than or equal to 5") } ``` ``` ## 10 is greater than or equal to 5 ``` ] --- ## else if We may wish to execute different procedures under multiple conditions. This can be controlled in R using the else if() following an initial if() statement. ``` r x <- 5 if(x > 5){ message(x," is greater than 5") }else if(x == 5){ message(x," is 5") }else{ message(x, " is less than 5") } ``` ``` ## 5 is 5 ``` --- ## ifelse() A useful function to evaluate conditional statements over vectors is the **ifelse()** function. ``` r x <- 1:10 x ``` ``` ## [1] 1 2 3 4 5 6 7 8 9 10 ``` The ifelse() functions take the arguments of the condition to evaluate, the action if the condition is true and the action when condition is false. ``` r ifelse(x <= 3,"lessOrEqual","more") ``` ``` ## [1] "lessOrEqual" "lessOrEqual" "lessOrEqual" "more" "more" ## [6] "more" "more" "more" "more" "more" ``` --- ## ifelse() We can use multiple nested **ifelse** functions to be apply more complex logical to vectors. ``` r ifelse(x == 3,"same", ifelse(x < 3,"less","more") ) ``` ``` ## [1] "less" "less" "same" "more" "more" "more" "more" "more" "more" "more" ``` --- ## Loops The two main generic methods of looping in R are **while** and **for** - **while** - *while* loops repeat the execution of code while a condition evaluates as true. - **for** - *for* loops repeat the execution of code for a range of specified values. --- ## While loops While loops are most useful if you know the condition will be satisfied but are not sure when (i.e. Looking for a point when a number first occurs in a list). ``` r x <- 1 while(x < 3){ message("x is ",x," ") x <- x+1 } ``` ``` ## x is 1 ``` ``` ## x is 2 ``` --- ## For loops For loops allow the user to cycle through a range of values applying an operation for every value. Here we cycle through a numeric vector and print out its value. ``` r x <- 1:5 for(i in x){ message(i) } ``` ``` ## 1 ``` ``` ## 2 ``` ``` ## 3 ``` ``` ## 4 ``` ``` ## 5 ``` --- ## For loops Similarly we can cycle through other vector types (or lists). ``` r x <- letters[1:5] for(i in x){ message(i) } ``` ``` ## a ``` ``` ## b ``` ``` ## c ``` ``` ## d ``` ``` ## e ``` --- ## Looping through indices We may wish to keep track of the position in x we are evaluating to retrieve the same index in other variables. A common practice is to loop though all possible index positions of x using the expression **1:length(x)**. ``` r geneName <- c("Ikzf1","Myc","Igll1") expression <- c(10.4,4.3,6.5) 1:length(geneName) ``` ``` ## [1] 1 2 3 ``` ``` r for(i in 1:length(geneName)){ message(geneName[i]," has an RPKM of ",expression[i]) } ``` ``` ## Ikzf1 has an RPKM of 10.4 ``` ``` ## Myc has an RPKM of 4.3 ``` ``` ## Igll1 has an RPKM of 6.5 ``` --- ## Loops and conditionals Loops can be combined with conditional statements to allow for complex control of their execution over R objects. .pull-left[ ``` r for(i in 1:8){ if(i > 5){ message("Number ",i," is greater than 5") }else if(i == 5){ message("Number ",i," is 5") }else{ message("Number ",i," is less than 5") } } ``` ] .pull-right[ ``` ## Number 1 is less than 5 ``` ``` ## Number 2 is less than 5 ``` ``` ## Number 3 is less than 5 ``` ``` ## Number 4 is less than 5 ``` ``` ## Number 5 is 5 ``` ``` ## Number 6 is greater than 5 ``` ``` ## Number 7 is greater than 5 ``` ``` ## Number 8 is greater than 5 ``` ] --- ## Breaking loops We can use conditionals to exit a loop if a condition is satisfied, just like a while loop. .pull-left[ ``` r for(i in 1:8){ if(i < 5){ message("Number ",i," is less than 5") }else if(i == 5){ message("Number ",i," is 5") break }else{ message("Number ",i," is greater than 5") } } ``` ] .pull-right[ ``` ## Number 1 is less than 5 ``` ``` ## Number 2 is less than 5 ``` ``` ## Number 3 is less than 5 ``` ``` ## Number 4 is less than 5 ``` ``` ## Number 5 is 5 ``` ] --- ## Functions to loop over data types There are functions which allow you to loop over a data type and apply a function to the subsection of that data. - **apply** - Apply function to rows or columns of a matrix/data frame and return results as a vector,matrix or list. - **lapply** - Apply function to every element of a vector or list and return results as a list. - **sapply** - Apply function to every element of a vector or list and return results as a vector,matrix or list. --- ## Looping - apply() The **apply()** function applys a function to the rows or columns of a matrix. The arguments **FUN** specifies the function to apply and **MARGIN** whether to apply the functions by rows/columns or both. ``` apply(DATA,MARGIN,FUN,...) ``` - **DATA** - A matrix (or something to be coerced into a matrix) - **MARGIN** - 1 for rows, 2 for columns, c(1,2) for cells --- ## apply() example ``` r matExample <- matrix(c(1:4),nrow=2,ncol=2,byrow=T) matExample ``` ``` ## [,1] [,2] ## [1,] 1 2 ## [2,] 3 4 ``` Get the mean of rows ``` r apply(matExample,1,mean) ``` ``` ## [1] 1.5 3.5 ``` Get the mean of columns ``` r apply(matExample,2,mean) ``` ``` ## [1] 2 3 ``` --- ## Additional arguments to apply Additional arguments to be used by the function in the apply loop can be specified after the function argument. Arguments may be ordered as if passed to function directly. For **paste()** function however this isn't possible. ``` r apply(matExample,1,paste,collapse=";") ``` ``` ## [1] "1;2" "3;4" ``` --- ## lapply() .pull-left[ Similar to apply, **lapply** applies a function to every element of a vector or list. **lapply** returns a list object containing the results of evaluating the function. ``` r lapply(c(4,16),sqrt) ``` ``` ## [[1]] ## [1] 2 ## ## [[2]] ## [1] 4 ``` ] .pull-right[ As with apply() additional arguments can be supplied after the function name argument. ``` r lapply(list(c(2,4),c(NA,9)),mean, na.rm=T) ``` ``` ## [[1]] ## [1] 3 ## ## [[2]] ## [1] 9 ``` ] --- ## sapply() **sapply** (*smart apply*) acts as lapply but attempts to return the results as the most appropriate data type. Here sapply returns a vector where lapply would return lists. ``` r exampleVector <- c(4,9,16) exampleList <- list(4,9,16) sapply(exampleVector, sqrt) ``` ``` ## [1] 2 3 4 ``` ``` r sapply(exampleList, sqrt) ``` ``` ## [1] 2 3 4 ``` --- ## sapply() examples .pull-left[ In this example lapply returns a list of vectors from the quantile function. ``` r exampleList <- list(row1=1:10, row2=6:15, row3=10:20) exampleList ``` ``` ## $row1 ## [1] 1 2 3 4 5 6 7 8 9 10 ## ## $row2 ## [1] 6 7 8 9 10 11 12 13 14 15 ## ## $row3 ## [1] 10 11 12 13 14 15 16 17 18 19 20 ``` ] .pull-right[ ``` r lapply(exampleList, quantile) ``` ``` ## $row1 ## 0% 25% 50% 75% 100% ## 1.00 3.25 5.50 7.75 10.00 ## ## $row2 ## 0% 25% 50% 75% 100% ## 6.00 8.25 10.50 12.75 15.00 ## ## $row3 ## 0% 25% 50% 75% 100% ## 10.0 12.5 15.0 17.5 20.0 ``` ] --- ## sapply() examples Here is an example of sapply parsing a result from the quantile function in a *smart* way. When a function always returns a vector of the same length, sapply will create a matrix with elements by column. ``` r sapply(exampleList, quantile) ``` ``` ## row1 row2 row3 ## 0% 1.00 6.00 10.0 ## 25% 3.25 8.25 12.5 ## 50% 5.50 10.50 15.0 ## 75% 7.75 12.75 17.5 ## 100% 10.00 15.00 20.0 ``` --- ## sapply() examples When sapply cannot parse the result to a vector or matrix, a list will be returned. ``` r exampleList <- list(df=data.frame(sample=paste0("patient",1:2), data=c(1,12)), vec=c(1,3,4,5)) sapply(exampleList, summary) ``` ``` ## $df ## sample data ## Length:2 Min. : 1.00 ## Class :character 1st Qu.: 3.75 ## Mode :character Median : 6.50 ## Mean : 6.50 ## 3rd Qu.: 9.25 ## Max. :12.00 ## ## $vec ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 1.00 2.50 3.50 3.25 4.25 5.00 ``` --- ## Time for an exercise! Exercise on loops and conditional branching can be found [here](../..//exercises/exercises/ConditionsAndLoops_exercise.html) --- ## Answers to exercise Answers can be found here [here](../..//exercises/answers/ConditionsAndLoops_answers.html) --- class: inverse, center, middle # Custom functions <html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html> --- ## Custom functions You can create your own functions easily in R using the **<-** assignment and **function** statement. We put all our input arguments in **()** and all code to be executed in **{}**. Anything to be received back from the function is indicated by the **return** statement. In this case we can recapitulate the square function. ``` r myFirstFunction <- function(MYARGUMENT){ ................... ..CODE_TO_EXECUTE.. ................... return(MYRESULT) } myFirstFunction(MYARGUMENT=MY_USER_SUPPLIED_ARGUMENT) ``` --- ## Custom functions First lets create a function that can sum two numbers. We can call the function **myFirstFunction** and set our arguments as **num1** and **num2**. ``` r myFirstFunction <- function(num1,num2){ sumNum <- num1+num2 return(sumNum) } myResult <- myFirstFunction(num1=2,num2=3) myResult ``` ``` ## [1] 5 ``` --- ## Returning multiple values We can only **return** 1 object at a time from function. Here we create the multiple of our numbers and try and return alongside our sum. ``` r myFirstFunction <- function(num1,num2){ sumNum <- num1+num2 multipleNum <- num1*num2 return(sumNum,multipleNum) } myResult <- myFirstFunction(num1=2,num2=3) ``` ``` ## Error in return(sumNum, multipleNum): multi-argument returns are not permitted ``` --- ## Returning multiple values A simple solution is to pass back an object that contains both results. Here we create a named vector to return our multiple and sum. ``` r myFirstFunction <- function(num1,num2){ sumNum <- num1+num2 multipleNum <- num1*num2 VectorOfResults <- c(sumNum,multipleNum) names(VectorOfResults) <- c("multiple","sum") return(VectorOfResults) } myResult <- myFirstFunction(num1=2,num2=3) myResult ``` ``` ## multiple sum ## 5 6 ``` --- ## Returning multiple values Lists are very useful here as they can contain different object classes in their elements and so we can return mixed data types from our functions using this. Below we return a list with elements of the input arguments as a vector and the result as a data.frame. ``` r myFirstFunction <- function(num1,num2){ sumNum <- num1+num2 multipleNum <- num1*num2 InputNumbers <- c(FirstNum=num1,SecondNum=num2) DF <- data.frame(Sum=sumNum,Multiple=multipleNum) listToReturn <- list(Input=InputNumbers,Result=DF) return(listToReturn) } myResult <- myFirstFunction(num1=2,num2=3) myResult ``` ``` ## $Input ## FirstNum SecondNum ## 2 3 ## ## $Result ## Sum Multiple ## 1 5 6 ``` --- ## Evaluate until return In a function containing a **return** statement, the code up until the return statement is evaluated and anything after the return statement is not evaluated. ``` r myFirstFunction <- function(num1,num2){ sumNum <- num1+num2 multipleNum <- num1*num2 InputNumbers <- c(FirstNum=num1,SecondNum=num2) DF <- data.frame(Sum=sumNum,Multiple=multipleNum) listToReturn <- list(Input=InputNumbers,Result=DF) message("Before return") return(listToReturn) message("After return") } myResult <- myFirstFunction(num1=2,num2=3) ``` ``` ## Before return ``` ``` r myResult ``` ``` ## $Input ## FirstNum SecondNum ## 2 3 ## ## $Result ## Sum Multiple ## 1 5 6 ``` --- ## No return statement If a function does not contain a return statement, the result of the last line in function is returned. ``` r myFirstFunction <- function(num1,num2){ sumNum <- num1+num2 multipleNum <- num1*num2 InputNumbers <- c(FirstNum=num1,SecondNum=num2) DF <- data.frame(Sum=sumNum,Multiple=multipleNum) listToReturn <- list(Input=InputNumbers,Result=DF) listToReturn } myResult <- myFirstFunction(num1=2,num2=3) myResult ``` ``` ## $Input ## FirstNum SecondNum ## 2 3 ## ## $Result ## Sum Multiple ## 1 5 6 ``` --- ## Variable scope in functions Variables defined in the arguments or within the function exist only within the environment of the function. ``` r myFirstFunction <- function(num1,num2){ sumNum <- num1+num2 return(sumNum) } myResult <- myFirstFunction(num1=2,num2=3) myResult ``` ``` ## [1] 5 ``` ``` r sumNum ``` ``` ## Error: object 'sumNum' not found ``` --- ## Variable scope in functions Variables defined in the global environment outside the function are available within the function. ``` r my3rdNumber <- 4 myFirstFunction <- function(num1,num2){ sumNum <- num1+num2+my3rdNumber return(sumNum) } myResult <- myFirstFunction(num1=2,num2=3) myResult ``` ``` ## [1] 9 ``` --- ## Variable scope in functions Changes to variables defined in the global environment from within the function will not be reflected in the global environment. ``` r my3rdNumber <- 4 myFirstFunction <- function(num1,num2){ my3rdNumber <- num1+num2+my3rdNumber return(my3rdNumber) } myResult <- myFirstFunction(num1=2,num2=3) myResult ``` ``` ## [1] 9 ``` ``` r my3rdNumber ``` ``` ## [1] 4 ``` --- ## Variable scope in functions We can use a **<<-** to assign to the environment outside the function although we will rarely do this ``` r my3rdNumber <- 4 myFirstFunction <- function(num1,num2){ my3rdNumber <<- num1+num2+my3rdNumber return(my3rdNumber) } myResult <- myFirstFunction(num1=2,num2=3) myResult ``` ``` ## [1] 9 ``` ``` r my3rdNumber ``` ``` ## [1] 9 ``` --- ## Argument defaults Functions can have defaults for their arguments which will be used when arguments are not specified. ``` r myFirstFunction <- function(num1=1,num2=3){ my3rdNumber <- num1+num2+my3rdNumber return(my3rdNumber) } myFirstFunction() ``` ``` ## [1] 13 ``` ``` r myFirstFunction(3,4) ``` ``` ## [1] 16 ``` --- ## Custom function example User defined functions can be as long and complicated as you want, taking in many variables, printing out messages and utilizing other functions within them. Lets make z-scores: ``` r my_zscore <- function(my_number, my_vector){ my_mean <- mean(my_vector) message("Mean is ", my_mean) diff_from_mean <- my_number-my_mean stdev <- sd(my_vector) my_z <- diff_from_mean/stdev return(my_z) } A <- rnorm(20) my_zscore(my_number=A[1], my_vector=A) ``` ``` ## Mean is -0.0203387124590891 ``` ``` ## [1] 0.6985356 ``` --- ## Debugging functions In Rstudio, if we want to see what may be happening within our function we can use **debug** function with our function of interest's name. We can stop debugging with the **undebug** function. ``` r debug(myFirstFunction) undebug(myFirstFunction) ``` --- ## Custom functions and apply These custom functions can also be utilized with apply. ``` r sapply(A, my_zscore, my_vector=A) ``` ``` ## [1] 0.69853558 -0.27348868 0.95493411 -0.93758466 -1.25074932 0.58066086 ## [7] -1.70156000 0.54276755 -1.05686510 -0.62777790 -0.79663393 -0.13015720 ## [13] -0.29857326 0.68579362 0.29999454 -0.02228858 1.86329420 -1.13874680 ## [19] 0.76959461 1.83885036 ``` --- class: inverse, center, middle # Getting Additional Libraries <html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html> --- ## Loading libraries Libraries can be loaded using the library() function with an argument of the name of the library. ``` r library(ggplot2) ``` You can see what libraries are available in the Packages panel or by the library() function with no arguments supplied. ``` r library() ``` --- ## Installing libraries Libraries can be installed through the R studio menu. **-> Tools -> Install packages ..** Or by using the install.packages() command. ``` r install.packages("Hmisc") ``` --- ## Installing other libraries By default when we run *install.packages()* we are installing from [CRAN](https://cran.r-project.org/). This is the central network of servers that are the repository for R and it's packages. We can also install from other sources. For Bioinformatics we have **[Bioconductor](http://bioconductor.org/)**. There is a huge collection of packages that are interoperable and specific to our interest i.e. Bioconductor packages undergo peer review, undergo rigorous testing and are required to have vignettes unlike CRAN packages so we tyically hold these packages in a higher regard. --- ## BiocManager::install() Bioconductor uses a different install function which is contained in the BiocManager package. We first have to install this from CRAN. ``` r install.packages("BiocManager") ``` Once installed, BiocManager will install Bioconductor packages or CRAN packages. ``` r # Bioconductor package BiocManager::install("Rsamtools") # CRAN package BiocManager::install("ggplot2") ``` --- ## Github packages Another resource for packages is GitHub. There are a wide variety of packages. Some very polished. Some less so. There are not the same systems to validate the package works as in Bioconductor and CRAN, so always make sure that it is a trusted and reputable author. Here we will demonstrate the installation of the [SeuratData package](https://github.com/satijalab/seurat-data) containing single cell data from the Satija lab. Again we must install a new installation package called *devtools* from CRAN. We can then use the *install_github()* function along with the name of the GitHub repository to install the package. ``` r install.packages("devtools") devtools::install_github('satijalab/seurat-data') ``` --- ## Which install? As we have alluded to there is a hierarchy of trust when it comes to installing software. As a default we typically use BiocManager to install our software, but sometimes we may have to deviate. Every R package (even on Bioconductor) will have a GitHub page. If there is a new update it may not appear in the final Bioconductor version for months, so sometimes you may want to use the GitHub version. A quick note, is that all these functions also have ways of managing which versions of packages are installed. --- class: inverse, center, middle # Scripts <html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html> --- ## Saving scripts Once we have got our functions together and know how we want to analyse our data, we can save our analysis as a **script**. By convention R scripts typically end in **.r** or **.R** To save a file in RStudio. **-> File -> Save as** To open a previous R script **->File -> Open File..** To save all the objects (workspace) with extension **.RData** **->Session -> Save workspace as** --- ## Sourcing scripts R scripts allow us to save and reuse custom functions we have written. To run the code from an R script we can use the **source()** function with the name of the R script as the argument. The file **dayOfWeek.r** in the "scripts" directory contains a simple R script to tell you what day it is. ``` #Contents of dayOfWeek.r dayOfWeek <- function(){ return(gsub(" .*","",date())) } ``` ``` r source("scripts/dayOfWeek.r") dayOfWeek() ``` ``` ## [1] "Sat" ``` --- ## Rscript R scripts can be run non-interactively from the command line with the **Rscript** command, usually with the option **--vanilla** to avoid saving or restoring workspaces. All messages/warnings/errors will be output to the console. ``` Rscript --vanilla myscript.r ``` An alternative to Rscript is **R CMD BATCH**. Here all messages/warnings/errors are directed to a file and the processing time appended. ``` R CMD BATCH myscript.r ``` --- ## Sending arguments to Rscript To provide arguments to an R script at the command line we must add **commandArgs()** function to parse command line arguments. ``` r args <- commandArgs(TRUE) myFirstArgument <- args[1] myFirstArgument ``` ``` '10' ``` ``` r as.numeric(myFirstArgument) ``` ``` 10 ``` Since vectors can only be one type, all command line arguments are strings and must be converted to numeric if needed with **as.numeric()**. --- ## Time for an exercise! Exercise on functions can be found [here](../..//exercises/exercises/Functions_exercise.html) --- ## Answers to exercise Answers can be found here [here](../..//exercises/answers/Functions_answers.html) --- ## Getting help - From us: Raise an issue on our [GitHub](https://github.com/RockefellerUniversity/Intro_To_R_1Day/issues). This can be suggestions, comments, edits or questions (about content or the slides themselves). - Google - Local friendly bioinformaticians and computational biologists. - [Stackoverflow](http://stackoverflow.com/) - [R-help](https://stat.ethz.ch/mailman/listinfo/r-help)