class: middle, inverse, title-slide .title[ # Intro to Python - Session 1 ] .subtitle[ ##
Bioinformatics Resource Center - Rockefeller University ] .author[ ###
http://rockefelleruniversity.github.io/Intro_To_Python/
] .author[ ###
brc@rockefeller.edu
]

---
## Session Overview

- [Background to Python](https://rockefelleruniversity.github.io/Intro_To_Python/presentations/singlepage/Session1.html#Background_of_Python)
- [Set up](https://rockefelleruniversity.github.io/Intro_To_Python/presentations/singlepage/Session1.html#Set_Up)
- [Variables and Functions](https://rockefelleruniversity.github.io/Intro_To_Python/presentations/singlepage/Session1.html#Variables_and_Functions)
- [Data Objects](https://rockefelleruniversity.github.io/Intro_To_Python/presentations/singlepage/Session1.html#Data_Objects)
- [Custom Functions](https://rockefelleruniversity.github.io/Intro_To_Python/presentations/singlepage/Session1.html#Custom_Functions)
- [Control Statements](https://rockefelleruniversity.github.io/Intro_To_Python/presentations/singlepage/Session1.html#Control_Statements)

---
class: inverse, center, middle

# Background of Python

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---
## What is Python?

Python is a high-level, general-purpose programming language.

Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language and first released it in 1991.

One of its most noticeable differences from other languages comes from the emphasis on code readability:

<div style="text-indent: 2em;">
it uses significant indentation.
</div>

Python 3.0, released in 2008, was a major revision not completely backward-compatible with earlier versions. Python 2.7 and older is officially unsupported but some tools require it.

<img src="imgs/Python.png" alt="python" height="250" width="250">

---
## What is Python to you?
Python has a huge user base across many disciplines. These users have developed a wide range of libraries which you can use.

As python is used in many fields, it is a useful language for adapting novel approaches to bioinformatics, such as deep learning methods for scRNAseq or image analysis.

![](imgs/Python_packages.png)

---
## Python vs R

Though R has long been the cornerstone of bioinformatics, python is growing in use. There are core utility packages that have been around a long time, such as Biopython, but it is the new technique-specific packages, e.g. Scanpy, that are driving the surging popularity of python.

<img src="imgs/scanpy_vs_seurat.png" alt="comparison" height="350" width="450">

---
## Python vs R

The strengths usually cited for each language are clear. That said, both Python and R can handle their supposed weaknesses.

|R            |Python                |
|:------------|:---------------------|
|Plotting     |Large Data            |
|Statistics   |Machine Learning      |
|Bioconductor |Most Popular Language |

Realms of Python specifically relevant to those who are interested in bioinformatics include [Biopython](https://biopython.org/), [PyMOL](https://www.pymol.org/), [sciKit](https://scikit-learn.org/stable/), [scanPy](https://scanpy.readthedocs.io/en/stable/), and image analysis.

---
class: inverse, center, middle

# Set Up

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---
## Materials

All prerequisites, links to material and slides for this course can be found on GitHub.

* [Intro_To_Python](https://rockefelleruniversity.github.io/Intro_To_Python/)

Or can be downloaded as a zip archive from here.

* [Download zip](https://github.com/rockefelleruniversity/Intro_To_Python/zipball/master)

---
## Course materials

Once the zip file is unarchived, all presentations as HTML slides and pages, their R code and HTML practical sheets will be available in the directories underneath.

* **r_course/presentations/slides/** Presentations as an HTML slide show.
* **r_course/presentations/singlepage/** Presentations as an HTML single page.
* **r_course/presentations/r_code/** R code in presentations.
* **r_course/exercises/** Practicals as HTML pages.
* **r_course/answers/** Practicals with answers as HTML pages and R code solutions.

---
## Starting with Python

Many laptops come with python installed, so getting started is easy. You will simply be able to write **python** into Terminal or Command Prompt and a **Python console** will open. This is an interactive python session. This will work for most of what we will do in the next few sessions, as we are working on the basics.

<img src="imgs/pythonconsole.png" alt="comparison" height="250" width="700">

---
## Setting Up Conda and Python

Each version of python, and of each python package, will respond slightly differently to the commands given. We therefore want to make sure that when you use python for analysis you control which version you use. This is important for reproducibility.

We will run a custom install of python using conda. Conda manages software and packages. Using Conda should make installations much easier and allow us to keep track of software versions.

We can install Conda from here. We will specifically want to use Miniconda:

https://conda.io/projects/conda/en/latest/user-guide/install/index.html

If you have installed this correctly, running this in terminal/command prompt should give you a list of conda commands.

``` sh
conda
```

---
## Setting Up Conda and Python

Conda is built on the idea of environments. An environment is a directory that contains a specific collection of Conda packages that you have installed. For today we will make sure we have python and the python packages we need for the training.

The first step is to create a new environment. We will then activate it to expose the environment. Lastly we can install python.
``` sh
conda create -n intro_to_python
conda activate intro_to_python
conda install python
```

---
## Setting Up Conda and Python

We also want some specific Python packages. These mostly contain functions that are not present in the base Python distribution.

``` sh
conda install numpy
conda install scipy
conda install matplotlib
conda install seaborn
conda install jupyter
```

---
## Python and IDEs

You can run python from within your terminal/command prompt. This will give you access to the python console.

Many people prefer to use an **I**ntegrated **D**evelopment **E**nvironment to augment their experience while coding. IDEs allow easy writing of scripts, visualization of plots, file navigation, and access to many customization features in additional panes.

Examples include RStudio, PyCharm, Xcode etc. We will be using Visual Studio Code from Windows.

![](imgs/vscode.png)

---
## Navigating VS Code

When you open VS Code it will ask you what you want to do. VS Code can be used with many programming languages, but you often install extensions to enable it to support the formatting for that language.

Click on the `Extensions` icon on the side tab (View > Extensions also works). Search for Python, and install the appropriate extension.

.pull-left[
![](imgs/vscode1.png)
]

.pull-right[
![](imgs/vscode2.png)
]

---
## Navigating VS Code

Now that VS Code is all set up for python, we want to make sure it is using the right python, i.e. the python we have just installed.

Click on the search bar and type:

```
>Python: Select Interpreter
```

.pull-left[
![](imgs/vscode3.png)
]

.pull-right[
![](imgs/vscode4.png)
]

---
## Open a python script

We are now all set up. We will now create a python script. Just click `New File...` on the Welcome Page (or File > New File...), then choose python script.

In the scripting panel we have opened we will type code, write notes and build scripts.

---
## Open a python console

As we mentioned before, we will be working with Python interactively. This means we need a Python console to work with. This is where the code is actually evaluated.
We can manually open one up with Terminal > New Terminal. Once the Terminal is open, you can then open Python by typing `python` into the new terminal window.

There is an easier shortcut to do this. Once you start developing code, the easiest way to open a console is to use the `Shift + Enter` shortcut.

---
## Why VS Code and IDEs

As VS Code is an IDE it allows us to access and run python. It also lets us do several other things from the same portal, i.e.

- Explore files
- Open/Write scripts
- Shortcuts for common python utilities
- Have a live plotting window
- GitHub integration
- Advanced coding features for debugging and LLM support

We won't fully dive into everything, but getting familiar early is best.

---
## Interactive or scripts?

As with many languages you can work with python in two main ways: interactive or scripts.

When people think about coding they are often thinking about the interactive **console**. This is what we saw earlier when we first opened python. When you work in this way, lines of code are submitted as you enter them. You often do this when you are developing an analysis and trying out parameters. This is mostly how we will work in the training.

When you want to automate something, e.g. an analysis workflow, you will write a script. You can then run this script with python and it will run every line of code for you sequentially.

---
## Quick Recap

![](imgs/vscode5.png)

---
class: inverse, center, middle

# Variables and Functions

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---
## Simple Calculation

At its simplest you can just use python as a fancy calculator:

``` python
1+1
```

```
## 2
```

``` python
2*5
```

```
## 10
```

---
## Functions

To take things further there are many functions built in to python. These are saved chunks of code that will do a task based on the arguments you provide.
You can tell there is a function when there is a name immediately followed by a set of parentheses: `myfunction()`

Here we use the round function:

``` python
round(3.14159)
```

```
## 3
```

---
## Help with functions

To get help with a function you can use the `help()` function. This will open up the help page for this function. Hopefully this will contain information about what arguments are accepted and what is returned by the function.

In this case we can see there is an additional optional argument `ndigits`. This has a default value of `None`.

``` python
help(round)
```

```
## Help on built-in function round in module builtins:
##
## round(number, ndigits=None)
##     Round a number to a given precision in decimal digits.
##
##     The return value is an integer if ndigits is omitted or None.  Otherwise
##     the return value has the same type as the number.  ndigits may be negative.
```

---
## Function and arguments

If we want to update our rounding result to allow for more decimal places we can add the second argument. For a simple function like this, as long as the order is correct we do not need to name the arguments.

``` python
round(3.14159, 3)
```

```
## 3.142
```

We can still run a function with the arguments in a different order by naming them.

``` python
round(ndigits=3, number=3.14159)
```

```
## 3.142
```

---
## Variables

Often you will want to save something in your environment for use later on. We do this by creating a variable by assignment with the `=` sign.

``` python
greeting = 'Hello!'
greeting
```

```
## 'Hello!'
```

``` python
number = 3.14159
number
```

```
## 3.14159
```

---
## Variables

When naming a variable there are certain things you can and cannot do.

Good:

* Only letters, numbers, and the underscore

Bad:

* Don't use non-alphanumerics. This includes: . % + - * #
* You can't start with a number

The best thing to do is name it something short and simple that makes sense.
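
The naming rules above can be checked from the console. This is a minimal sketch rather than part of the original material: the names are made up for illustration, and it relies on the built-in string method `.isidentifier()` plus the standard-library `keyword` module. The extra rule it shows, that reserved words such as `class` cannot be used as names, is an addition to the list above.

```python
import keyword

# Hypothetical names, used only to illustrate the naming rules
ok_name = 'gene_count_2'.isidentifier()        # letters, numbers and underscore
starts_with_number = '2genes'.isidentifier()   # starts with a number
has_dot = 'gene.count'.isidentifier()          # contains a non-alphanumeric
reserved = keyword.iskeyword('class')          # reserved words cannot be names either

print(ok_name, starts_with_number, has_dot, reserved)  # True False False True
```

Only the first name would work as a variable; the others would give a `SyntaxError` if you tried to assign to them.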
---
## Variables and functions

Once we have a variable we can then use it inside functions. The variable name is acting as an alias for what it contains.

``` python
number
```

```
## 3.14159
```

``` python
round(number)
```

```
## 3
```

---
## Variables

There are many kinds of variables. The most basic types are: `str`, `float`, `int` and `bool`. You can always check what kind you have with the `type()` function.

``` python
greeting = 'Hello!'
type(greeting)
```

```
## <class 'str'>
```

``` python
number = 3.14159
type(number)
```

```
## <class 'float'>
```

``` python
newnumber = round(number)
newnumber
```

```
## 3
```

``` python
type(newnumber)
```

```
## <class 'int'>
```

``` python
boolean = True
type(boolean)
```

```
## <class 'bool'>
```

---
## Variables Coercion

We can manually set the type using: `str()`, `float()`, `int()` and `bool()`.

``` python
string_number = str(number)
string_number
```

```
## '3.14159'
```

``` python
float_string_number = float(string_number)
float_string_number
```

```
## 3.14159
```

``` python
int_float_string_number = int(float_string_number)
int_float_string_number
```

```
## 3
```

``` python
int_float_string_number_boolean = bool(float_string_number)
int_float_string_number_boolean
```

```
## True
```

---
## Variables Coercion

These functions do not always work. They fail when there is no clear way to convert the value.

``` python
greeting = 'Hello!'
int(greeting)
```

```
## ValueError: invalid literal for int() with base 10: 'Hello!'
```

---
## A quick aside

You will run into errors coding. DON'T PANIC.

Most of the time the error messages are very clear. If they are not, a quick google will often clear it up.

In this case we can break it down:

* ValueError - The content is invalid for the operation. The first statement is the overarching name for the error.
* invalid literal for int() - Specifically the int() function is expecting something different
* with base 10 - It is expecting base 10 numbers/integers
* 'Hello!'
- This is what you gave it. We can see it doesn't match the criteria above. --- ## Concatenation Strings can be concatenated easily. ``` python newgreeting = 'Hi' ' there' newgreeting ``` ``` ## 'Hi there' ``` ``` python newgreeting2 = 'Hi' + ' there' newgreeting2 ``` ``` ## 'Hi there' ``` ``` python newgreeting3 = 'Hi' newgreeting3 += ' there' newgreeting3 ``` ``` ## 'Hi there' ``` ``` python newgreeting4 = 'Hi' * 5 newgreeting4 ``` ``` ## 'HiHiHiHiHi' ``` --- ## Time for an exercise! Exercise on the data types we have covered so far can be found [here](../../exercises/exercises/Exercise1_exercise.html) --- ## Answers to the exercise Answers can be found here [here](../../exercises/answers/Exercise1_answers.html) --- class: inverse, center, middle # Data Objects <html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html> --- ## Lists Python has many options for storing data. The simplest is a list. A list has a few key characteristics: * the order of the elements matters and can be used for indexing * they are mutable and dynamic (elements can be modified and length can be changed) * they can hold mixed types of data --- ## Lists Lists are denoted with square brackets. ``` python my_strs = ['a','b','c','d','e'] my_strs ``` ``` ## ['a', 'b', 'c', 'd', 'e'] ``` ``` python my_ints= [1,2,3,4,5] my_ints ``` ``` ## [1, 2, 3, 4, 5] ``` ``` python my_floats = [1.1,2.2,3.3,4.4,5.5] my_floats ``` ``` ## [1.1, 2.2, 3.3, 4.4, 5.5] ``` --- ## Indexing Lists We can also use the square brackets to extract specific values from our list. ``` python my_strs[2] ``` ``` ## 'c' ``` Here we get the third value from our list using the number 2. That is because python uses zero indexing; counting in python starts at 0, not 1. ``` python my_strs[0] ``` ``` ## 'a' ``` --- ## Indexing Lists Sometimes we have a long list, but we know we want the final value. We can use a `-` to indicate how far from the end we want to index. 
``` python my_strs[-1] ``` ``` ## 'e' ``` --- ## Indexing Lists We can also create a sublist by slicing our list with the `:`. ``` python my_strs[2:4] ``` ``` ## ['c', 'd'] ``` ``` python my_strs[2:-1] ``` ``` ## ['c', 'd'] ``` ``` python my_strs[2:] ``` ``` ## ['c', 'd', 'e'] ``` Key point: You'll notice that slicing in Python is inclusive of the first element, but exclusive of the last element. So in our example, `my_strs[2:4]` starts with `my_strs[2]` but does not include `my_strs[4]`. --- ## Complex Lists List are general containers for a variety of data types. This means you can make a list of lists! ``` python my_lists = ["a",['b1','b2'], ['c1',['c2']]] my_lists ``` ``` ## ['a', ['b1', 'b2'], ['c1', ['c2']]] ``` We can still use indexing to deal with this mess of lists ``` python my_lists[2][1][0] ``` ``` ## 'c2' ``` --- ## Indexing Lists We can use the assignment we have been using this whole time to break open this nested list structure. ``` python my_lists ``` ``` ## ['a', ['b1', 'b2'], ['c1', ['c2']]] ``` ``` python list1, list2, list3 = my_lists list1 ``` ``` ## 'a' ``` ``` python list2 ``` ``` ## ['b1', 'b2'] ``` ``` python list3 ``` ``` ## ['c1', ['c2']] ``` --- ## Concatenation We can concatenate lists, just as we did with strings. ``` python biglist = my_ints + my_strs biglist ``` ``` ## [1, 2, 3, 4, 5, 'a', 'b', 'c', 'd', 'e'] ``` ``` python biglist2 = [1,2,3,4,5] biglist2 += my_strs biglist2 ``` ``` ## [1, 2, 3, 4, 5, 'a', 'b', 'c', 'd', 'e'] ``` ``` python biglist3 = my_strs * 5 biglist3 ``` ``` ## ['a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e'] ``` --- ## Functions and lists There are many useful functions for working with lists. Many of these functions work directly on the list. This means you don't need to assign the result back to the object. The structure is VARIABLE.function(). These are called attributes. Here we use the append function to extend our list. 
``` python my_strs.append('f') my_strs ``` ``` ## ['a', 'b', 'c', 'd', 'e', 'f'] ``` ``` python my_strs.append(1) my_strs ``` ``` ## ['a', 'b', 'c', 'd', 'e', 'f', 1] ``` --- ## Functions and lists There are many useful functions built into the base version of python. * `.insert()` inserts an argument into a specific position in the list ``` python my_strs.insert(3,'c') my_strs ``` ``` ## ['a', 'b', 'c', 'c', 'd', 'e', 'f', 1] ``` * `.remove()` removes something from the list, but will only remove the first instance ``` python my_strs.remove('c') my_strs ``` ``` ## ['a', 'b', 'c', 'd', 'e', 'f', 1] ``` --- ## Functions and lists There are many useful functions built into the base version of python. * `.index()` reveals which position in the list is the supplied argument ``` python my_strs.index('c') ``` ``` ## 2 ``` * the generic `del` statement removes an argument from a specific position in the list ``` python del my_strs[3] my_strs ``` ``` ## ['a', 'b', 'c', 'e', 'f', 1] ``` * we can also test membership of an element with the `in` operator ``` python "a" in my_strs ``` ``` ## True ``` --- ## Functions and lists * `.sort()` will sort your list. This works both with numerical and string data. ``` python my_list = [1,4,9,4,11,12,6] my_list.sort() my_list ``` ``` ## [1, 4, 4, 6, 9, 11, 12] ``` ``` python my_list.sort(reverse=True) my_list ``` ``` ## [12, 11, 9, 6, 4, 4, 1] ``` ``` python my_list = ["b","c","a"] my_list.sort() my_list ``` ``` ## ['a', 'b', 'c'] ``` --- ## Mutating vs Non-mutating functions You may have noticed that the `sort` function does not actually return the sorted list. It returns 'None' and modifies the list in place. This might be different from other languages you have used. ``` python sort_result = my_list.sort() sort_result # this returns None, modifies object in place ``` Some python functions do return the modified object and leave the original object unchanged. We can try this with the `sorted` function. 
``` python new_list = [1,4,9,4,11,12,6] sorted_result = sorted(new_list) sorted_result # function modified object ``` ``` ## [1, 4, 4, 6, 9, 11, 12] ``` ``` python new_list # old object is unchanged ``` ``` ## [1, 4, 9, 4, 11, 12, 6] ``` --- ## Tuples Tuples are another type of object in python. They look and behave a lot like lists. But where lists are dynamic and mutable, tuples cannot be changed. As a result tuples are more memory efficient than lists. When making a tuple you use parentheses instead of square brackets. ``` python my_list = ['a','b','c','d','e'] my_list[0] = 'z' my_list ``` ``` ## ['z', 'b', 'c', 'd', 'e'] ``` ``` python my_tuple = ('a','b','c','d','e') my_tuple[0] = 'z' ``` ``` ## 'tuple' object does not support item assignment ``` --- ## Tuple/List coercion As with str/int/float/bool you can easily convert a list to a tuple and vice versa. Simply use the `list()` and `tuple()` functions. ``` python my_list = ['a','b','c','d','e'] my_tuple_list = tuple(my_list) my_tuple_list ``` ``` ## ('a', 'b', 'c', 'd', 'e') ``` ``` python type(my_tuple_list) ``` ``` ## <class 'tuple'> ``` ``` python my_list_tuple_list = list(my_tuple_list) my_list_tuple_list ``` ``` ## ['a', 'b', 'c', 'd', 'e'] ``` ``` python type(my_list_tuple_list) ``` ``` ## <class 'list'> ``` --- ## Functions, tuples and lists **Remember!** * Square brackets [] for indexing and lists. * Parentheses () for functions and tuples. --- ## Dictionaries Another data type are dictionaries. Dictionaries are made of key:value pairs. * A key is some kind of unique identifier, typically a short string. * The value is a corresponding piece of data that is often more complex (e.g. different data types). * While values can be modified, keys are immutable. This structure allows the organization of your data, and gives you the ability to grab out values using the key. --- ## Dictionaries Dictionaries are made with the curly brackets. Each entry consists of a pair of objects. 
These are the `key` identifier and the `value`. Here you can see we have multiple types and shapes of data contained in our `values`.

``` python
my_dict = {
    'my_list': [1,2,3],
    'my_tuple': (4,5,6),
    'language': 'python',
    'technique': 'scRNAseq'
}
my_dict
```

```
## {'my_list': [1, 2, 3], 'my_tuple': (4, 5, 6), 'language': 'python', 'technique': 'scRNAseq'}
```

---
## Dictionaries

There are methods (attribute functions) that we can use to access the keys and values from our dictionary.

``` python
my_dict.keys()
```

```
## dict_keys(['my_list', 'my_tuple', 'language', 'technique'])
```

``` python
my_dict.values()
```

```
## dict_values([[1, 2, 3], (4, 5, 6), 'python', 'scRNAseq'])
```

---
## Dictionary indexes

We can index our dictionary using the key and the square brackets, similar to other objects.

``` python
my_dict['my_list']
```

```
## [1, 2, 3]
```

We can also use the `.get()` method, which gives the same result as the square brackets when the key exists.

``` python
my_dict['language']
```

```
## 'python'
```

``` python
my_dict.get('language')
```

```
## 'python'
```

---
## Dictionary indexes

Unlike lists, dictionaries cannot be subset by position. They must be indexed with a key, so a number only works if that number is itself a key. Here it is not, so we get a `KeyError`.

``` python
my_dict[0]
```

```
## KeyError: 0
```

---
## Dictionaries

It is easy to add additional entries with the `.setdefault()` method. We just provide a new `key/value` pair. If the key already exists, `.setdefault()` leaves the entry unchanged and returns the existing value.

``` python
my_dict.setdefault('metadata', True)
```

```
## True
```

``` python
my_dict
```

```
## {'my_list': [1, 2, 3], 'my_tuple': (4, 5, 6), 'language': 'python', 'technique': 'scRNAseq', 'metadata': True}
```

We check our addition using the `in` operator. This performs a logical test. We can test specifically on the keys, or on the dictionary as a whole, which also tests the keys.

``` python
'metadata' in my_dict.keys()
```

```
## True
```

``` python
'metadata' in my_dict
```

```
## True
```

---
## Concatenating Dictionaries

Often we want to stick multiple dictionaries together.
``` python dict_1 = {'my_list': [1,2,3], 'my_tuple': (4,5,6),} dict_2 = {'a': 1, 'b': 2} ``` There are 3 options: 1) Unpacking operator ``` python dict_3 = {**dict_1, **dict_2} dict_3 ``` ``` ## {'my_list': [1, 2, 3], 'my_tuple': (4, 5, 6), 'a': 1, 'b': 2} ``` --- ## Concatenating Dictionaries 2) Merge with pipe ``` python dict_3 = dict_1 | dict_2 dict_3 ``` ``` ## {'my_list': [1, 2, 3], 'my_tuple': (4, 5, 6), 'a': 1, 'b': 2} ``` --- ## Concatenating Dictionaries 3) Update function ``` python dict_1.update(dict_2) dict_1 ``` ``` ## {'my_list': [1, 2, 3], 'my_tuple': (4, 5, 6), 'a': 1, 'b': 2} ``` --- ## Sets The last object type within base Python are sets. These are unordered and each entry is unique. Sets can be created using the curly brackets, or by coercing another object using the `set()` function. ``` python myset = {"a", "b", "c"} myset ``` ``` ## {'c', 'b', 'a'} ``` ``` python myset = set(["a", "b", "c"]) myset ``` ``` ## {'c', 'b', 'a'} ``` --- ## Sets can be modified The `.add()/remove()` attributes allow the easy modification of sets. ``` python myset.add("d") myset ``` ``` ## {'c', 'b', 'a', 'd'} ``` ``` python myset.remove("d") myset ``` ``` ## {'c', 'b', 'a'} ``` --- ## Sets have no order As sets have no order they can't be subset in the same way that other objects can be. ``` python myset[0] ``` ``` ## 'set' object is not subscriptable ``` --- ## Sets are unique Even if you provide duplicate entries to a set, the set will only contain unique values. ``` python myset = set(["a", "b", "c","c","c"]) myset ``` ``` ## {'c', 'b', 'a'} ``` --- ## Sets have specific functions Sets are really useful for checking intersections between two objects. 
``` python
myset1 = {1, 2, 3, 4}
myset2 = {3, 4, 5, 6}
myset1.intersection(myset2)
```

```
## {3, 4}
```

``` python
myset1.union(myset2)
```

```
## {1, 2, 3, 4, 5, 6}
```

``` python
myset1.difference(myset2)
```

```
## {1, 2}
```

---
## Set vs List

* A list allows duplicate elements and maintains their order
* A set ensures element uniqueness without any guaranteed order, good for membership testing

---
## Time for an exercise!

Exercise on the data types we have covered so far can be found [here](../../exercises/exercises/Exercise2_exercise.html)

---
## Answers to the exercise

Answers can be found [here](../../exercises/answers/Exercise2_answers.html)

---
class: inverse, center, middle

# NumPy: A python library for arrays

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---
## NumPy

Many of the data types we have looked at thus far are either one-dimensional, or get quite complex when built up into multidimensional data frames. These can become relatively slow and cumbersome to work with if you have large datasets.

NumPy is a Python library used for working with arrays, which are common in biological data. It is not included in the base distribution of Python so it has to be installed and loaded separately.

We installed NumPy earlier. Here we load it into our python session with `import`.

``` python
import numpy
```

Often an alias is used when you import a library. Here we are importing NumPy `as` np.

``` python
import numpy as np
```

---
## NumPy array

Within our imported NumPy library we have many different functions. Here we will use the `array()` function to create an array. In this case we are essentially creating a list (square brackets), then coercing it into an array.
``` python
arr = np.array([1, 2, 3, 4, 5])
type(arr)
```

```
## <class 'numpy.ndarray'>
```

``` python
arr
```

```
## array([1, 2, 3, 4, 5])
```

---
## Data Types and arrays

In most data objects we have looked at so far there are limited types of data: `str`, `float`, `int` and `bool`. Arrays accept all of these. We can always check the type with the `dtype` attribute.

``` python
arr = np.array([34, 29, 40])
arr.dtype
```

```
## dtype('int64')
```

``` python
arr = np.array([True,False,True])
arr.dtype
```

```
## dtype('bool')
```

---
## Data Types and arrays

When you create the array you can specify what type you want the data to be. This can coerce the input data ... within reason.

``` python
arr = np.array([34, 29, 40], dtype='S')
arr
```

```
## array([b'34', b'29', b'40'], dtype='|S2')
```

``` python
arr = np.array(['a', '2', '3'], dtype='i')
```

```
## invalid literal for int() with base 10: 'a'
```

---
## Data Types and arrays

Arrays contain only one data type. They will accept lists of different data types, but the elements will be coerced into a common type.

``` python
arr = np.array(['a', 2, 3])
arr
```

```
## array(['a', '2', '3'], dtype='<U21')
```

---
## Adding Dimensions

Typically we think about 2D arrays, as this is often the rectangular data we deal with. It is possible to create many kinds of arrays, with differing dimensionality. They can be 1D, 2D, 3D etc.

Here we again create a list to coerce into an array. This time we have a list of lists. Each inner list will become a row in our array.

``` python
arr_2d = np.array([["Patient1",34,True],["Patient2", 29, True], ["Patient3",41,False]])
arr_2d
```

```
## array([['Patient1', '34', 'True'],
##        ['Patient2', '29', 'True'],
##        ['Patient3', '41', 'False']], dtype='<U21')
```

We can find the number of dimensions with the `ndim` attribute. In this case we have rectangular data so it has two dimensions.
``` python
arr_2d.ndim
```

```
## 2
```

---
## Array shape

We can confirm the shape of the array using the `shape` attribute.

``` python
arr_2d.shape
```

```
## (3, 3)
```

The shape of an array can easily be changed with the `reshape` method. Note that you will get an error if the number of values in the array doesn't fit into the dimensions specified.

``` python
arr_2d.reshape(9,1)
```

```
## array([['Patient1'],
##        ['34'],
##        ['True'],
##        ['Patient2'],
##        ['29'],
##        ['True'],
##        ['Patient3'],
##        ['41'],
##        ['False']], dtype='<U21')
```

---
## Indexing Arrays

We can use the same square brackets we used for other data objects to index our arrays.

The big difference is we now have 2 dimensions. We therefore need to provide 2 indexes, separated by a comma. The first number will correspond to row. The second number will correspond to column.

``` python
arr_2d[0,2]
```

```
## np.str_('True')
```

---
## Slicing Arrays

We can also do more complex indexing operations like slicing, to get ranges of values from our array.

``` python
arr_2d[:,2]
```

```
## array(['True', 'True', 'False'], dtype='<U21')
```

``` python
arr_2d[0,1:3]
```

```
## array(['34', 'True'], dtype='<U21')
```

---
## Logical Indexing

Booleans can be used to directly subset arrays. `True` entries are kept.

``` python
arr_2d
```

```
## array([['Patient1', '34', 'True'],
##        ['Patient2', '29', 'True'],
##        ['Patient3', '41', 'False']], dtype='<U21')
```

``` python
arr_2d[[True,False,True],:]
```

```
## array([['Patient1', '34', 'True'],
##        ['Patient3', '41', 'False']], dtype='<U21')
```

---
## Logical Indexing

We can use this along with logical testing to subset our arrays. Let's look back at our 2D array. We want to subset this based on the patient age (the second column), i.e. all patients over 30.

``` python
arr_2d
```

```
## array([['Patient1', '34', 'True'],
##        ['Patient2', '29', 'True'],
##        ['Patient3', '41', 'False']], dtype='<U21')
```

It was read in as `<U21`. This is a type of string.
We need to coerce the second column to an integer to be able to run a logical expression.

---
## Logical Indexing

``` python
temp_arr = arr_2d[:,1].astype('i')
temp_arr
```

```
## array([34, 29, 41], dtype=int32)
```

``` python
sub_idx = temp_arr >30
sub_idx
```

```
## array([ True, False,  True])
```

``` python
arr_2d[sub_idx,:]
```

```
## array([['Patient1', '34', 'True'],
##        ['Patient3', '41', 'False']], dtype='<U21')
```

---
## Logical operations

Doing these kinds of logical operations and subsetting on other data objects can be tricky. Many objects do not like working like this.

Instead they use a process called [list comprehension](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions). We will not go into this here, but it is a useful tool for performing a repeated action for each data point across an entire list, e.g. checking if it is equal to a given value.

---
## Joining Arrays

NumPy arrays can easily be joined together with the `concatenate` function. Here we join two simple 1D arrays.

``` python
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
```

---
## Joining Arrays

2D arrays can be merged just as easily. The orientation of the merge can be controlled using the `axis` argument.

``` python
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr = np.concatenate((arr1, arr2), axis=1)
arr
```

```
## array([[1, 2, 5, 6],
##        [3, 4, 7, 8]])
```

``` python
arr = np.concatenate((arr1, arr2), axis=0)
arr
```

```
## array([[1, 2],
##        [3, 4],
##        [5, 6],
##        [7, 8]])
```

---
## Mathematical Functions

NumPy is not just for arrays. Some mathematical functions are already included in base Python (we met `round()` earlier). When you import NumPy you gain access to a lot more.
Mathematical constants:

* pi - np.pi

Mathematical functions:

* Exponents and logs - np.exp(my_array), np.log(my_array)
* Powers and roots - np.sqrt(my_array)
* Trigonometry - np.sin(my_array)
* Element-wise operators - np.add(my_array1, my_array2)

---
## Which Data Objects?

So which data objects do you use? As with most of programming, there are often multiple ways to do things, and the optimal data object will depend on the context. Most of the time you will be working with different Python libraries and functions, each with different preferences for the input/output data object. This will help define which object is appropriate for you.

Here is a rough guide:

* Lists - Great general hold-all that is very flexible.
* Tuples - If you want to create an immutable version of a list as some kind of reference.
* Sets - Unordered collections for when you do not want any repeating values.
* Dictionaries - If you have complex data, e.g. different shapes and sizes, and you will have some kind of key/lookup values.
* Arrays - When you have rectangular data, e.g. a table, need to perform math on elements, or need to interface with more advanced computational libraries (stats, ML, etc.)

---
class: inverse, center, middle

# Custom Functions

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---
## Making functions

Often we will want to define our own functions. This way we can easily repeat the same process.

Key parts:

* function name
* arguments
* code
* return values (sometimes)

---
## Making functions

* `def` - indicates you will define a function.
* `function name` - in this case *myFirstFunction*.
* `arguments` - any arguments that the function expects, in parentheses.
* `code` - the start of the code is indicated by a colon, new line and indentation.
* `return` - this is the result that you wish to be delivered from the function.

NOTE: In Python functions, the code after the line defining the function must be indented. Otherwise you will get an error.
``` python
def myFirstFunction(num1, num2):
    sumNum = num1 + num2
    return sumNum

myResult = myFirstFunction(num1=2, num2=3)
myResult
```

```
## 5
```

---
## Returning multiple values

We can only return 1 object at a time from a function. Here we calculate the product of our numbers and try to return it alongside our sum. It won't even let us define the function.

``` python
def myFirstFunction(num1, num2):
    sumNum = num1 + num2
    multipleNum = num1*num2
    return sumNum multipleNum
```

```
##   File "<string>", line 1
##     return sumNum multipleNum
## IndentationError: unexpected indent
```

---
## Returning multiple values

A simple solution is to pass back an object that contains both results. Here we create a quick list.

``` python
def myFirstFunction(num1, num2):
    sumNum = num1 + num2
    multipleNum = num1*num2
    return [sumNum, multipleNum]

myResult = myFirstFunction(num1=2, num2=3)
myResult
```

```
## [5, 6]
```

---
## Evaluate until return

In a function containing a return statement, the code up until the return statement is evaluated and anything after the return statement is not evaluated.

``` python
def myFirstFunction(num1, num2):
    sumNum = num1 + num2
    multipleNum = num1*num2
    print("Before return")
    return [sumNum, multipleNum]
    print("After return")

myFirstFunction(num1=2, num2=3)
```

```
## Before return
## [5, 6]
```

---
## No return statement

If a function does not contain a return statement, it returns `None` and nothing is displayed. In some other languages this is not the case.

``` python
def myFirstFunction(num1, num2):
    sumNum = num1 + num2
    multipleNum = num1*num2

myFirstFunction(num1=2, num2=3)
```

---
## Variable scope in functions

Variables that are defined in the arguments or within the function exist only within the environment of the function. If we try to use one of these variables outside of the function it will not work.
``` python
def myFirstFunction(num1, num2):
    sumNum = num1 + num2
    multipleNum = num1*num2
    return [sumNum, multipleNum]

myFirstFunction(num1=2, num2=3)
sumNum
```

```
## [5, 6]
```

```
## name 'sumNum' is not defined
```

---
## Variable scope in functions

If a function reassigns a variable that shares its name with a variable in the global environment, the global variable is not updated.

``` python
num3 = 4
def myFirstFunction(num1, num2, num3):
    num3 = num1+num2+num3
    return num3

myFirstFunction(num1=2, num2=3, num3=num3)
```

```
## 9
```

``` python
num3
```

```
## 4
```

---
## Local vs Global scope

* Functions have local scope. This means they have access to global variables (which can be used anywhere) and local variables which were made within the function.
* Once you exit the function you are back to a global scope. Local variables from the function cannot be accessed at this point.
* Code in a function’s local scope cannot use variables in any other local scope, i.e. between functions.
* Though it is possible to have local and global variables with the same name, try to give everything unique names so you can keep track of everything.

---
## Argument defaults

Functions can have defaults for their arguments which will be used when arguments are not specified.

``` python
def myFirstFunction(num1=1, num2=3):
    sumNum = num1+num2
    return sumNum

myFirstFunction()
```

```
## 4
```

``` python
myFirstFunction(5,10)
```

```
## 15
```

---
## Reusing functions

Once you have made a function that you want to keep reusing, you can easily save it, e.g. a function that you use regularly to process some data in the same way. To do this you save it as a script.

First open a new script: File > New File > Python Script.

We can then add our original function and save the Python script as *myFirstFunction_script.py*.

``` python
def myFirstFunction(num1, num2):
    sumNum = num1 + num2
    return sumNum
```

---
## Function import

First open a new script: File > New File > Python Script.
We can then add our original function and save the Python script as *myFirstFunction_script.py*. Once saved, the script can be imported by its file name (without the `.py` extension) and the function called from it.

``` python
import myFirstFunction_script
myFirstFunction_script.myFirstFunction(5,20)
```

---
## Time for an exercise!

An exercise on the data types we have covered so far can be found [here](../../exercises/exercises/Exercise3_exercise.html)

---
## Answers to the exercise

Answers can be found [here](../../exercises/answers/Exercise3_answers.html)

---
class: inverse, center, middle

# Control Statements

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---
## Control Statements

There are several ways to control how your code is evaluated. There are two main classes:

* Conditional branching (`if`, `else`)
* Loops (`for`, `while`)

<span style="color:green">While</span> I'm analyzing data, <span style="color:red">if</span> I need to execute complex statistical procedures on the data I will use Python <span style="color:red">else</span> I will use a calculator.

---
## Conditional Branching

Conditional branching is the evaluation of a logical to determine whether a chunk of code is executed.

In Python, we use the `if` statement with the logical to be evaluated immediately after. The dependent code is indicated by a **colon**, **new line** and **indentation**.

``` python
x = True
if x:
    print("x is true")
```

```
## x is true
```

``` python
x = False
if x:
    print("x is true")
```

---
## Conditional Branching

More often, we construct the logical value within the `if` statement itself. This can be termed the condition.

``` python
x = 10
y = 4
if x > y:
    print("The value of x is", x, "which is greater than", y)
```

```
## The value of x is 10 which is greater than 4
```

The message is printed above because x is greater than y.

``` python
y = 20
if x > y:
    print("The value of x is", x, "which is greater than", y)
```

x is now no longer greater than y, so no message is printed. We would still like a message telling us the result of the condition.
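One quick way to see the outcome is to look at the condition itself, since it is just an expression that evaluates to `True` or `False`. As a small sketch (the variable name `is_greater` is an illustrative choice, not part of the session material), we can store the result and print it before using it:

```python
x = 10
y = 20

# The comparison is evaluated first and produces a Boolean value.
is_greater = x > y
print(is_greater)

# The if statement only runs its indented code when that value is True.
if is_greater:
    print("The value of x is", x, "which is greater than", y)
```

This tells us what the condition evaluated to, but it does not yet let us run different code when the condition is false.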
---
## else following an if

If we want to perform an operation when the condition is false we can follow the `if` statement with an `else` statement.

``` python
x = 3
if x < 5:
    print(x, "is less than 5")
else:
    print(x, "is greater than or equal to 5")
```

```
## 3 is less than 5
```

``` python
x = 10
if x < 5:
    print(x, "is less than 5")
else:
    print(x, "is greater than or equal to 5")
```

```
## 10 is greater than or equal to 5
```

---
## else if = elif

We may wish to execute different procedures under multiple conditions. This can be controlled using `elif` following an initial `if` statement.

``` python
x = 5
if x < 5:
    print(x, "is less than 5")
elif x > 5:
    print(x, "is greater than 5")
else:
    print(x, "is 5")
```

```
## 5 is 5
```

---
## Loops

While and for loops iterate over a block of code, and keep rerunning it.

`while` loops do this while a specific condition is met (or until that condition is not met).

`for` loops will do this for a given number of iterations.

---
## While loop

While loops have a similar structure to if statements. We start by designating the while loop, then follow with the logical to be evaluated immediately after. The dependent code is indicated by a colon, new line and indentation.

``` python
x = 1
while x < 3:
    print("x is", x)
    x = x+1
```

```
## x is 1
## x is 2
```

---
## For loop

For loops do not have a conditional. Instead you supply an object that you want to iterate over. This can be a list, tuple, dictionary, set or string. Here we use a list.

``` python
x = ['Alpha','Bravo','Charlie']
for i in x:
    print(i)
```

```
## Alpha
## Bravo
## Charlie
```

---
## for and range()

The `range()` function provides us with a nice input for our for loops. It returns a sequence of numbers, starting from 0 by default and stopping before the specified number.
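As a quick sketch of what `range()` produces, we can wrap it in `list()` to see the numbers. The two- and three-argument forms shown here (start, stop and step) are part of Python's built-in `range()`, though they are not otherwise used in this session; the variable names are illustrative.

```python
# range(3) starts at 0 and stops before 3.
first_three = list(range(3))
print(first_three)

# A start value can be supplied: start at 2, stop before 5.
from_two = list(range(2, 5))
print(from_two)

# A third value sets the step size: every second number from 0, stopping before 8.
evens = list(range(0, 8, 2))
print(evens)
```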
``` python
for i in range(3):
    print("i is", i)
```

```
## i is 0
## i is 1
## i is 2
```

---
## Looping through indices

When we have a numeric range, we can use it to index existing objects. This often allows for more complex code evaluation.

``` python
geneName = ["Ikzf1","Myc","Igll1"]
expression = [10.4, 4.3, 6.5]
iterations = len(geneName)

for i in range(iterations):
    print(geneName[i], "has a TPM of", expression[i])
```

```
## Ikzf1 has a TPM of 10.4
## Myc has a TPM of 4.3
## Igll1 has a TPM of 6.5
```

---
## Loops and conditionals

Loops can be combined with conditional statements to allow for complex control of their execution over Python objects.

To help us write complex code we often use pseudocode as a starting point.

---
## Pseudocode

When we write pseudocode we are trying to write out each computational step in a human-readable way. It is important to be specific, simple and concise, and to include the control structures that would be in your final code.

```
for 0 to 7
  if value is greater than 5
    print the value and a statement saying it is greater than 5
  else if value is 5
    print the value and a statement saying it is equal to 5
  else if value is less than 5
    print the value and a statement saying it is less than 5
```

---
## Loops and conditionals

Though these can be tough to read and create, by starting with pseudocode and keeping an eye on the hierarchy of indentation we can follow the logic.
.pull-left[

```
for 0 to 7
  if value is greater than 5
    print the value and a statement saying it is greater than 5
  else if value is 5
    print the value and a statement saying it is equal to 5
  else if value is less than 5
    print the value and a statement saying it is less than 5
```

]

.pull-right[

``` python
for i in range(8):
    if i > 5:
        print("Number", i, "is greater than 5")
    elif i == 5:
        print("Number", i, "is 5")
    else:
        print("Number", i, "is less than 5")
```

```
## Number 0 is less than 5
## Number 1 is less than 5
## Number 2 is less than 5
## Number 3 is less than 5
## Number 4 is less than 5
## Number 5 is 5
## Number 6 is greater than 5
## Number 7 is greater than 5
```

]

---
## Breaking loops

We can use the `break` statement inside a conditional to exit a loop early once a condition is satisfied, much like a while loop stops when its condition is no longer met.

``` python
for i in range(8):
    if i > 5:
        print("Number", i, "is greater than 5")
    elif i == 5:
        print("Number", i, "is 5")
        break
    else:
        print("Number", i, "is less than 5")
```

```
## Number 0 is less than 5
## Number 1 is less than 5
## Number 2 is less than 5
## Number 3 is less than 5
## Number 4 is less than 5
## Number 5 is 5
```

---
## Time for an exercise!

Exercises around control structures can be found [here](https://rockefelleruniversity.github.io/Intro_To_Python/exercises/exercises/Exercise4_exercise.html)

---
## Answers to exercise

Answers can be found [here](https://rockefelleruniversity.github.io/Intro_To_Python/exercises/answers/Exercise4_answers.html)

---
## Further Support

When you hit bugs:

* Google/ChatGPT/Claude, etc.
* Stack Overflow
* Biostars
* Reach out on [GitHub](https://github.com/RockefellerUniversity/Intro_To_Python/issues)

Other Reference Material:

* [Harvard's Python Course](https://cs50.harvard.edu/python/2022/)
* [Geeks For Geeks](https://www.geeksforgeeks.org/getting-started-with-python-programming/)