
Background of Python


What is Python?

Python is a high-level, general-purpose programming language.

Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language and first released it in 1991.

One of the most noticeable differences comes from the emphasis on code readability:

it uses significant indentation.

Python 3.0, released in 2008, was a major revision that is not completely backward-compatible with earlier versions. Python 2.7 and older are officially unsupported, but some tools still require them.


What is Python to you?

Python has a huge user base across many disciplines. These users have developed a wide range of libraries which you can use.

Because python is used in so many fields, it is a useful language for bringing novel approaches into bioinformatics, such as deep learning methods for scRNAseq or image analysis.

TIOBE

Python vs R

Though R has long been the cornerstone of bioinformatics, python is growing in use.

Core utility packages such as Biopython have been around for a long time, but it is the new technique-specific packages, e.g. Scanpy, that are driving the surging popularity of python.


Python vs R

The strengths discussed when considering these languages are clear. That said, in both cases Python and R can handle their supposed weaknesses.

R             Python
Plotting      Large Data
Statistics    Machine Learning
Bioconductor  Most Popular Language

Areas of Python specifically relevant to those interested in bioinformatics include Biopython, PyMOL, sciKit, Scanpy, and image analysis.

Set Up


Materials

All prerequisites, links to material and slides for this course can be found on GitHub.

They can also be downloaded as a zip archive from here.

Course materials

Once the zip file is unarchived, all presentations as HTML slides and pages, their R code and HTML practical sheets will be available in the directories underneath.

  • r_course/presentations/slides/ Presentations as an HTML slide show.
  • r_course/presentations/singlepage/ Presentations as an HTML single page.
  • r_course/presentations/r_code/ R code in presentations.
  • r_course/exercises/ Practicals as HTML pages.
  • r_course/answers/ Practicals with answers as HTML pages and R code solutions.

Starting with Python

Many laptops come with python installed, so getting started is easy. You can simply type python into Terminal or Command Prompt and a Python console will open. This is an interactive python session. This will work for most of what we will do in the next few sessions, as we are working on the basics.


Setting Up Conda and Python

Each version of python and python packages will respond slightly differently to commands given. We therefore want to make sure when you use python for analysis that you control which version you use. This is important for reproducibility.

We will run a custom install of python using conda. Conda manages software and packages. Using Conda should make installations much easier and allow us to keep track of software versions.

We can install Conda from here. We will specifically want to use Miniconda:

https://conda.io/projects/conda/en/latest/user-guide/install/index.html

If you have installed this correctly, running the following in the terminal/command prompt should give you a list of conda commands:

conda

Setting Up Conda and Python

Conda is built on the idea of environments. An environment is a directory that contains a specific collection of Conda packages that you have installed. For today we will make sure we have python and the python packages we need for the training.

The first step is we create a new environment. We will then activate it to expose the environment.

Lastly we can install python.


 conda create -n intro_to_python
 conda activate intro_to_python
 conda install python

Setting Up Conda and Python

We also want some specific Python packages. These mostly contain functions that are not present in the base Python distribution.

conda install numpy
conda install scipy
conda install matplotlib
conda install seaborn
conda install jupyter

Python and IDEs

You can run python from within your terminal/command prompt. This will give you access to the python console.

Many people prefer to use an Integrated Development Environment to augment their experience while coding. They allow easy writing of scripts, visualization of plots, file navigation, and access to many customization features in additional panes.

Examples include RStudio, PyCharm, Xcode etc.

We will be using Visual Studio Code (VS Code) from Microsoft.


Open a python script

We are now all set up. We will now create a python script. Just click New File... on the Welcome Page (or File > New File…), then choose python script.

The scripting panel we have just opened is where we will type code, write notes and build scripts.

Open a python console

As we mentioned before, we will be working with Python interactively. This means we need a Python console to work with. This is where the code is actually evaluated.

We can manually open one up by Terminal > New Terminal.

Once the Terminal is open, you can then just open Python, by typing python into the new terminal window.

There is an easier shortcut to do this. Once you start developing code, the easiest way to open a console is to use the Shift + Enter shortcut.

Why VS code and IDEs

As VS Code is an IDE, it allows us to access and run python, and also to do several other things from the same place, e.g.

  • Explore files
  • Open/Write scripts
  • Shortcuts for common python utilities
  • Have a live plotting window
  • GitHub integration
  • Advanced coding features for debugging and LLM support

We won’t dive fully into everything, but getting familiar early is best.

Interactive or scripts?

As with many languages you can work with python in two main ways: interactive or scripts.

When people think about coding they are often thinking about the interactive console. This is what we saw earlier when we first opened python. When you work in this way lines of code are submitted as you enter them. You often do this when you are developing an analysis and trying out parameters. This is mostly how we will work in the training.

When you want to automate something, i.e. an analysis workflow, you will write a script. You can then run this script with python and it will run every line of code for you sequentially.

Quick Recap


Variables and Functions


Simple Calculation

At its simplest you can just use python as a fancy calculator:

1+1
## 2
2*5
## 10

Functions

To take things further there are many functions built in to python. These are saved chunks of code that will do a task based on the arguments you provide. You can recognise a function call by a name immediately followed by a set of parentheses:

myfunction()

Here we use the round function:

round(3.14159)
## 3

Help with functions

To get help with a function you can use the help() function. This will open up the help page for this function. Hopefully this will contain information about what arguments are accepted and what is returned by the function. In this case we can see there is an additional optional argument, ndigits. This has a default value of None.

help(round)
## Help on built-in function round in module builtins:
## 
## round(number, ndigits=None)
##     Round a number to a given precision in decimal digits.
## 
##     The return value is an integer if ndigits is omitted or None.  Otherwise
##     the return value has the same type as the number.  ndigits may be negative.

Function and arguments

If we want to update our rounding result to allow for more decimal places we can add the second argument. For a simple function like this, as long as the order is correct we do not need to name the arguments.

round(3.14159, 3)
## 3.142

We can still run a function with disordered arguments by naming them.

round(ndigits=3, number=3.14159)
## 3.142
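
One behaviour of round() that can surprise newcomers is how it treats values exactly halfway between two integers: Python 3 rounds them to the nearest even integer rather than always rounding up. The help text also notes that ndigits may be negative. A minimal sketch of both points (the variable names are only illustrative):

```python
low = round(2.5)             # halfway between 2 and 3, rounds to the even value
high = round(3.5)            # halfway between 3 and 4, rounds to the even value
hundreds = round(1234, -2)   # a negative ndigits rounds to the left of the decimal point
print(low, high, hundreds)   # 2 4 1200
```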

Variables

Often you will want to save something in your environment for use later on. We do this by creating a variable by assignment with the = sign.

greeting = 'Hello!'
greeting
## 'Hello!'
number = 3.14159
number
## 3.14159

Variables

When assigning a variable there are certain things you can and cannot do.

Good:

  • Only letters, numbers, and the underscore

Bad:

  • Don’t use non-alphanumerics. This includes: . % + - * #
  • You can’t start with a number

The best thing to do is name it something short and simple, that makes sense.
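
If you are unsure whether a name follows these rules, one way to check is the string method isidentifier(). This is a small sketch; the example names are made up for illustration.

```python
valid_name = 'gene_count'.isidentifier()        # letters and underscore only
starts_with_digit = '2sample'.isidentifier()    # starts with a number
contains_dot = 'gene.count'.isidentifier()      # contains a non-alphanumeric
print(valid_name, starts_with_digit, contains_dot)  # True False False
```

Reserved words such as for or if pass this check but still cannot be used as variable names.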

Variables and functions

Once we have a variable we can then use it inside functions. The variable name is acting as an alias for what it contains.

number
## 3.14159
round(number)
## 3

Variables

There are many kinds of variables. The most basic types are: str, float, int and bool. You can always check what kind you have with the type() function.

greeting = 'Hello!'
type(greeting )
## <class 'str'>
number = 3.14159
type(number )
## <class 'float'>
newnumber = round(number)
newnumber
## 3
type(newnumber)
## <class 'int'>
boolean = True
type(boolean)
## <class 'bool'>

Variables Coercion

We can manually set the type using: str(), float(), int() and bool().

string_number = str(number)
string_number
## '3.14159'
float_string_number = float(string_number)
float_string_number 
## 3.14159
int_float_string_number = int(float_string_number)
int_float_string_number 
## 3
int_float_string_number_boolean = bool(float_string_number)
int_float_string_number_boolean
## True
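
A few coercion results are worth checking before relying on them. The sketch below (illustrative variable names) shows that int() drops the fractional part rather than rounding, and that bool() only treats empty or zero values as False.

```python
truncated = int(3.99)       # int() drops the fraction, it does not round
text_flag = bool('False')   # any non-empty string is True, whatever it says
empty_flag = bool('')       # the empty string is False
zero_flag = bool(0.0)       # zero is False
print(truncated, text_flag, empty_flag, zero_flag)  # 3 True False False
```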

Variables Coercion

These functions do not always work. If there is no clear way to convert the value, you will get an error.

greeting = 'Hello!'
int(greeting)
## ValueError: invalid literal for int() with base 10: 'Hello!'

A quick aside

You will run into errors coding.

DON’T PANIC.

Most of the time the error messages are very clear. If they are not, a quick google will often clear it up.

In this case we can break it down:

  • ValueError - The content is invalid for the operation. The first statement is the overarching name for the error.
  • invalid literal for int() - Specifically the int() function is expecting something different
  • with base 10 - It is expecting base 10 numbers/integers
  • ‘Hello!’ - This is what you gave it. We can see it doesn’t match the criteria above.
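
When an error like this is expected, it can be caught so that the rest of the code keeps running. This is a minimal sketch using try and except; the fallback value of None is just one possible choice.

```python
greeting = 'Hello!'
try:
    value = int(greeting)
except ValueError as error:
    # The message holds the same text that is broken down above
    value = None
    message = str(error)
    print('Could not convert:', message)
```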

Concatenation

Strings can be concatenated easily.

newgreeting = 'Hi' ' there'
newgreeting
## 'Hi there'
newgreeting2 = 'Hi' + ' there'
newgreeting2
## 'Hi there'
newgreeting3 = 'Hi'
newgreeting3 += ' there'
newgreeting3 
## 'Hi there'
newgreeting4 = 'Hi' * 5
newgreeting4
## 'HiHiHiHiHi'

Time for an exercise!

Exercise on the data types we have covered so far can be found here

Answers to the exercise

Answers can be found here

Data Objects


Lists

Python has many options for storing data. The simplest is a list.

A list has a few key characteristics:

  • the order of the elements matters and can be used for indexing
  • they are mutable and dynamic (elements can be modified and length can be changed)
  • they can hold mixed types of data

Lists

Lists are denoted with square brackets.

my_strs = ['a','b','c','d','e']
my_strs
## ['a', 'b', 'c', 'd', 'e']
my_ints= [1,2,3,4,5]
my_ints
## [1, 2, 3, 4, 5]
my_floats = [1.1,2.2,3.3,4.4,5.5]
my_floats
## [1.1, 2.2, 3.3, 4.4, 5.5]

Indexing Lists

We can also use the square brackets to extract specific values from our list.

my_strs[2]
## 'c'

Here we get the third value from our list using the number 2. That is because python uses zero indexing; counting in python starts at 0, not 1.

my_strs[0]
## 'a'

Indexing Lists

Sometimes we have a long list, but we know we want the final value. We can use a - to indicate how far from the end we want to index.

my_strs[-1]
## 'e'

Indexing Lists

We can also create a sublist by slicing our list with the :.

my_strs[2:4]
## ['c', 'd']
my_strs[2:-1]
## ['c', 'd']
my_strs[2:]
## ['c', 'd', 'e']

Key point: You’ll notice that slicing in Python is inclusive of the first element, but exclusive of the last element. So in our example, my_strs[2:4] starts with my_strs[2] but does not include my_strs[4].
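
A useful consequence of this rule is that the length of a slice is simply the end position minus the start position. The sketch below also shows two related forms: leaving out the start, and adding a third number as a step size. The list used here is a fresh example.

```python
letters = ['a', 'b', 'c', 'd', 'e']
middle = letters[1:4]        # positions 1, 2 and 3, so 4 - 1 = 3 elements
first_two = letters[:2]      # an omitted start means "from the beginning"
every_other = letters[::2]   # a third number sets the step size
print(middle)       # ['b', 'c', 'd']
print(first_two)    # ['a', 'b']
print(every_other)  # ['a', 'c', 'e']
```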

Complex Lists

Lists are general containers for a variety of data types. This means you can make a list of lists!

my_lists = ["a",['b1','b2'], ['c1',['c2']]]
my_lists
## ['a', ['b1', 'b2'], ['c1', ['c2']]]

We can still use indexing to deal with this mess of lists

my_lists[2][1][0]
## 'c2'

Indexing Lists

We can use the assignment we have been using this whole time to break open this nested list structure.

my_lists
## ['a', ['b1', 'b2'], ['c1', ['c2']]]
list1, list2, list3 = my_lists
list1
## 'a'
list2
## ['b1', 'b2']
list3
## ['c1', ['c2']]
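
Unpacking like this needs one name per element. As a sketch of what happens otherwise: a starred name can collect the leftovers, while a plain mismatch raises a ValueError. The variable names here are illustrative.

```python
nested = ['a', ['b1', 'b2'], ['c1', ['c2']]]

first, *rest = nested        # the starred name collects whatever is left over
print(first)  # a
print(rest)   # [['b1', 'b2'], ['c1', ['c2']]]

try:
    one, two = nested        # three elements cannot fill two names
except ValueError as error:
    unpack_message = str(error)
    print(unpack_message)
```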

Concatenation

We can concatenate lists, just as we did with strings.

biglist = my_ints + my_strs
biglist
## [1, 2, 3, 4, 5, 'a', 'b', 'c', 'd', 'e']
biglist2 = [1,2,3,4,5] 
biglist2 += my_strs
biglist2
## [1, 2, 3, 4, 5, 'a', 'b', 'c', 'd', 'e']
biglist3 = my_strs * 5
biglist3
## ['a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e']

Functions and lists

There are many useful functions for working with lists. Many of these functions work directly on the list. This means you don’t need to assign the result back to the object. The structure is VARIABLE.function(). Functions attached to an object in this way are called methods. Here we use the append method to extend our list.

my_strs.append('f')
my_strs
## ['a', 'b', 'c', 'd', 'e', 'f']
my_strs.append(1)
my_strs
## ['a', 'b', 'c', 'd', 'e', 'f', 1]

Functions and lists

There are many useful functions built into the base version of python.

  • .insert() inserts an argument into a specific position in the list
my_strs.insert(3,'c')
my_strs
## ['a', 'b', 'c', 'c', 'd', 'e', 'f', 1]
  • .remove() removes something from the list, but will only remove the first instance
my_strs.remove('c')
my_strs
## ['a', 'b', 'c', 'd', 'e', 'f', 1]

Functions and lists

There are many useful functions built into the base version of python.

  • .index() returns the position in the list of the first occurrence of the supplied argument
my_strs.index('c')
## 2
  • the generic del statement removes the element at a specific position in the list
del my_strs[3]
my_strs
## ['a', 'b', 'c', 'e', 'f', 1]
  • we can also test membership of an element with the in operator
"a" in my_strs
## True

Functions and lists

  • .sort() will sort your list. This works both with numerical and string data.
my_list = [1,4,9,4,11,12,6]
my_list.sort()
my_list
## [1, 4, 4, 6, 9, 11, 12]
my_list.sort(reverse=True)
my_list
## [12, 11, 9, 6, 4, 4, 1]
my_list = ["b","c","a"]
my_list.sort()
my_list
## ['a', 'b', 'c']

Mutating vs Non-mutating functions

You may have noticed that the sort function does not actually return the sorted list. It returns ‘None’ and modifies the list in place. This might be different from other languages you have used.

sort_result = my_list.sort()
sort_result # this returns None, modifies object in place

Some python functions do return the modified object and leave the original object unchanged. We can try this with the sorted function.

new_list = [1,4,9,4,11,12,6]
sorted_result = sorted(new_list)
sorted_result # function returned a new, sorted list
## [1, 4, 4, 6, 9, 11, 12]
new_list # old object is unchanged
## [1, 4, 9, 4, 11, 12, 6]
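
In-place modification matters most when two names point at the same list. The sketch below, with illustrative names, contrasts a second name for the same list with an independent copy made by the .copy() method.

```python
original = [3, 1, 2]
alias = original             # a second name for the SAME list
snapshot = original.copy()   # an independent copy

original.sort()              # modifies the list in place
print(alias)     # [1, 2, 3]
print(snapshot)  # [3, 1, 2]
```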

Tuples

Tuples are another type of object in python. They look and behave a lot like lists. But where lists are dynamic and mutable, tuples cannot be changed. As a result tuples are more memory efficient than lists.

When making a tuple you use parentheses instead of square brackets.

my_list = ['a','b','c','d','e']
my_list[0] = 'z'
my_list
## ['z', 'b', 'c', 'd', 'e']
my_tuple = ('a','b','c','d','e')
my_tuple[0] = 'z'
## 'tuple' object does not support item assignment
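
The memory claim can be checked with sys.getsizeof(), which reports the size of the container itself in bytes. This sketch assumes the standard CPython interpreter; exact sizes vary between Python versions, so none are shown.

```python
import sys

as_list = ['a', 'b', 'c', 'd', 'e']
as_tuple = tuple(as_list)

list_size = sys.getsizeof(as_list)     # bytes used by the list container
tuple_size = sys.getsizeof(as_tuple)   # bytes used by the tuple container
print(list_size > tuple_size)
```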

Tuple/List coercion

As with str/int/float/bool you can easily convert a list to a tuple and vice versa. Simply use the list() and tuple() functions.

my_list = ['a','b','c','d','e']
my_tuple_list = tuple(my_list)
my_tuple_list
## ('a', 'b', 'c', 'd', 'e')
type(my_tuple_list)
## <class 'tuple'>
my_list_tuple_list = list(my_tuple_list)
my_list_tuple_list
## ['a', 'b', 'c', 'd', 'e']
type(my_list_tuple_list)
## <class 'list'>

Functions, tuples and lists

Remember!

  • Square brackets [] for indexing and lists.

  • Parentheses () for functions and tuples.

Dictionaries

Another data type is the dictionary. Dictionaries are made of key:value pairs.

  • A key is some kind of unique identifier, typically a short string.
  • The value is a corresponding piece of data that is often more complex (e.g. different data types).
  • While values can be modified, keys are immutable.

This structure allows the organization of your data, and gives you the ability to grab out values using the key.

Dictionaries

Dictionaries are made with curly brackets. Each entry consists of a pair of objects: the key identifier and the value. Here you can see we have multiple types and shapes of data contained in our values.

my_dict = {
    'my_list': [1,2,3],
    'my_tuple': (4,5,6),
    'language': 'python',
    'technique': 'scRNAseq'
}
my_dict
## {'my_list': [1, 2, 3], 'my_tuple': (4, 5, 6), 'language': 'python', 'technique': 'scRNAseq'}

Dictionaries

There are attribute functions that we can use to access the keys and values from our dictionary.

my_dict.keys()
## dict_keys(['my_list', 'my_tuple', 'language', 'technique'])
my_dict.values()
## dict_values([[1, 2, 3], (4, 5, 6), 'python', 'scRNAseq'])

Dictionary indexes

We can index our dictionary using the keys and the square brackets, similar to other objects.

my_dict['my_list']
## [1, 2, 3]

We can also use the .get() method.

my_dict['language']
## 'python'
my_dict.get('language')
## 'python'
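
The two approaches differ when the key does not exist. In this sketch, 'organism' is a made-up key that is deliberately missing: .get() quietly returns None (or a default you supply), while square brackets raise a KeyError.

```python
my_dict = {'language': 'python', 'technique': 'scRNAseq'}

missing = my_dict.get('organism')               # no such key: returns None
fallback = my_dict.get('organism', 'unknown')   # or a default of your choosing

try:
    my_dict['organism']
except KeyError as error:
    missing_key = error.args[0]   # square brackets raise KeyError instead

print(missing, fallback, missing_key)  # None unknown organism
```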

Dictionary indexes

Unlike lists, dictionaries cannot be subset with a numeric index and must be indexed with a key value.

my_dict[0]
## KeyError: 0

Dictionaries

It is easy to add additional entries with the .setdefault() method. We provide a new key/value pair, and it is added if the key is not already present.

my_dict.setdefault('metadata', True)
## True
my_dict
## {'my_list': [1, 2, 3], 'my_tuple': (4, 5, 6), 'language': 'python', 'technique': 'scRNAseq', 'metadata': True}

We check our addition using the in operator. This performs a logical test. We can test specifically on the keys, or the dictionary as a whole.

'metadata' in my_dict.keys()
## True
'metadata' in my_dict
## True
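
Note that .setdefault() leaves an existing key untouched. Plain assignment with square brackets is another common way to add an entry, and it will overwrite. A short sketch on a fresh dictionary:

```python
my_dict = {'language': 'python'}

my_dict.setdefault('metadata', True)   # key absent: the entry is added
my_dict.setdefault('language', 'R')    # key present: the value is left alone
my_dict['technique'] = 'scRNAseq'      # assignment adds a new key
my_dict['language'] = 'Python 3'       # or overwrites an existing one
print(my_dict)
# {'language': 'Python 3', 'metadata': True, 'technique': 'scRNAseq'}
```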

Concatenating Dictionaries

Often we want to stick multiple dictionaries together.

dict_1 = {'my_list': [1,2,3],
  'my_tuple': (4,5,6),}
  
dict_2 = {'a': 1, 'b': 2}

There are 3 options:

  1. Unpacking operator
dict_3 = {**dict_1, **dict_2}
dict_3
## {'my_list': [1, 2, 3], 'my_tuple': (4, 5, 6), 'a': 1, 'b': 2}

Concatenating Dictionaries

  2. Merge with pipe (Python 3.9 and later)
dict_3 = dict_1 | dict_2
dict_3
## {'my_list': [1, 2, 3], 'my_tuple': (4, 5, 6), 'a': 1, 'b': 2}

Concatenating Dictionaries

  3. Update function (this modifies dict_1 in place)
dict_1.update(dict_2)

dict_1
## {'my_list': [1, 2, 3], 'my_tuple': (4, 5, 6), 'a': 1, 'b': 2}
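
When the dictionaries share a key, the value from the later dictionary wins. A sketch with two made-up settings dictionaries, using the unpacking approach:

```python
defaults = {'threads': 1, 'verbose': False}
overrides = {'threads': 8}

merged = {**defaults, **overrides}   # later dictionaries win on shared keys
print(merged)    # {'threads': 8, 'verbose': False}
print(defaults)  # {'threads': 1, 'verbose': False}
```

Unpacking builds a new dictionary, so the inputs are left unchanged.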

Sets

The last object type within base Python that we will cover is the set. Sets are unordered and each entry is unique. Sets can be created using the curly brackets, or by coercing another object using the set() function.

myset = {"a", "b", "c"}
myset
## {'c', 'b', 'a'}
myset = set(["a", "b", "c"])
myset
## {'c', 'b', 'a'}

Sets can be modified

The .add() and .remove() methods allow the easy modification of sets.

myset.add("d")
myset
## {'c', 'b', 'a', 'd'}
myset.remove("d")
myset
## {'c', 'b', 'a'}

Sets have no order

As sets have no order they can’t be subset in the same way that other objects can be.

myset[0]
## 'set' object is not subscriptable

Sets are unique

Even if you provide duplicate entries to a set, the set will only contain unique values.

myset = set(["a", "b", "c","c","c"])
myset
## {'c', 'b', 'a'}

Sets have specific functions

Sets are really useful for checking intersections between two objects.

myset1 = {1, 2, 3, 4}
myset2 = {3, 4, 5, 6}

myset1.intersection(myset2)
## {3, 4}
myset1.union(myset2)
## {1, 2, 3, 4, 5, 6}
myset1.difference(myset2)
## {1, 2}

Set vs List

  • A list allows duplicate elements and maintains their order
  • A set ensures element uniqueness without any guaranteed order, good for membership testing
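
A common use of this difference is removing duplicates from a list. The barcode strings below are invented for illustration.

```python
barcodes = ['AAC', 'GGT', 'AAC', 'TTA', 'GGT']

unique_barcodes = set(barcodes)            # duplicates collapse
n_unique = len(unique_barcodes)
has_tta = 'TTA' in unique_barcodes         # membership test
sorted_unique = sorted(unique_barcodes)    # back to an ordered list
print(n_unique, has_tta, sorted_unique)    # 3 True ['AAC', 'GGT', 'TTA']
```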

Time for an exercise!

Exercise on the data types we have covered so far can be found here

Answers to the exercise

Answers can be found here

NumPy: A python library for arrays


NumPy

Many of the data types we have looked at thus far are either one-dimensional, or get quite complex when built up into multidimensional data frames. These can become relatively slow and cumbersome to work with if you have large datasets.

NumPy is a Python library used for working with arrays, which are common in biological data. It is not included in the base distribution of Python so it has to be installed and loaded separately. We installed NumPy earlier. Here we load it into our python session with import.

import numpy

Often an alias is used when you import a library. Here we are importing NumPy as np.

import numpy as np

NumPy array

Within our imported NumPy library we have many different functions. Here we will use the array() function to create an array. In this case we are essentially creating a list (square brackets), then coercing it into an array.

arr = np.array([1, 2, 3, 4, 5])

type(arr)
## <class 'numpy.ndarray'>
arr
## array([1, 2, 3, 4, 5])

Data Types and arrays

In most data objects we have looked at so far there are limited types of data: str, float, int and bool. Arrays accept all of these. We can always check the type with the dtype attribute.

arr = np.array([34, 29, 40])
arr.dtype
## dtype('int64')
arr = np.array([True,False,True])
arr.dtype
## dtype('bool')

Data Types and arrays

When you create the array you can specify what type you want the data to be. This can coerce the input data … within reason.

arr = np.array([34, 29, 40], dtype='S')

arr
## array([b'34', b'29', b'40'], dtype='|S2')
arr = np.array(['a', '2', '3'], dtype='i')
## invalid literal for int() with base 10: 'a'

Data Types and arrays

Arrays contain only one data type. They will accept lists of different data types, but the elements will be coerced into a common type.

arr = np.array(['a', 2, 3])
arr
## array(['a', '2', '3'], dtype='<U21')

Adding Dimensions

Typically we think about 2D arrays as this is often the rectangular data we deal with. It is possible to create many kinds of arrays, with differing dimensionality. They can be 1D,2D,3D etc.

Here we again create a list to coerce into an array. This time we have a list of lists. Each list will become equivalent to a row in our array.

arr_2d = np.array([["Patient1",34,True],["Patient2", 29, True], ["Patient3",41,False]])
arr_2d
## array([['Patient1', '34', 'True'],
##        ['Patient2', '29', 'True'],
##        ['Patient3', '41', 'False']], dtype='<U21')

We can find the number of dimensions with the ndim attribute. In this case we have rectangular data so it is two dimensions.

arr_2d.ndim
## 2

Array shape

We can confirm the shape of the array using the shape attribute.

arr_2d.shape
## (3, 3)

The shape of an array can easily be changed with the reshape method. Note that you will get an error if the number of values in the array doesn’t fit into the dimensions specified.

arr_2d.reshape(9,1)
## array([['Patient1'],
##        ['34'],
##        ['True'],
##        ['Patient2'],
##        ['29'],
##        ['True'],
##        ['Patient3'],
##        ['41'],
##        ['False']], dtype='<U21')
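
The sketch below illustrates the size rule on a small made-up array. It also uses -1, which asks NumPy to work out that dimension from the number of values. It assumes NumPy is installed, as set up earlier.

```python
import numpy as np

values = np.arange(6)          # array([0, 1, 2, 3, 4, 5])
grid = values.reshape(2, 3)    # 2 rows x 3 columns uses all 6 values
auto = values.reshape(-1, 2)   # -1 lets NumPy infer the number of rows

try:
    values.reshape(4, 2)       # 8 slots cannot be filled by 6 values
except ValueError as error:
    reshape_message = str(error)
    print(reshape_message)

print(grid.shape, auto.shape)  # (2, 3) (3, 2)
```

reshape returns a new array object, so the result needs to be assigned if you want to keep it.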

Indexing Arrays

We can use the same square brackets we used for other data objects to index our arrays. The big difference is we now have 2 dimensions. We therefore need to provide 2 indexes, separated by a comma.

The first number will correspond to row. The second number will correspond to column.

arr_2d[0,2]
## np.str_('True')

Slicing Arrays

We can also do more complex indexing operations like slicing, to get ranges of values from our array.

arr_2d[:,2]
## array(['True', 'True', 'False'], dtype='<U21')
arr_2d[0,1:3]
## array(['34', 'True'], dtype='<U21')

Logical Indexing

Booleans can be used to directly subset arrays. True entries are kept.

arr_2d
## array([['Patient1', '34', 'True'],
##        ['Patient2', '29', 'True'],
##        ['Patient3', '41', 'False']], dtype='<U21')
arr_2d[[True,False,True],:]
## array([['Patient1', '34', 'True'],
##        ['Patient3', '41', 'False']], dtype='<U21')

Logical Indexing

We can use this along with logical testing to subset our arrays. Let’s look back at our 2D array. We want to subset this based on the patient age (the second column) i.e. all patients over 30.

arr_2d
## array([['Patient1', '34', 'True'],
##        ['Patient2', '29', 'True'],
##        ['Patient3', '41', 'False']], dtype='<U21')

It was read in as ‘<U21’. This is a type of string. We need to coerce the second column to an integer to be able to run a logical expression.

Logical Indexing

temp_arr = arr_2d[:,1].astype('i')
temp_arr
## array([34, 29, 41], dtype=int32)
sub_idx = temp_arr >30
sub_idx
## array([ True, False,  True])
arr_2d[sub_idx,:]
## array([['Patient1', '34', 'True'],
##        ['Patient3', '41', 'False']], dtype='<U21')

Logical operations

Doing these kinds of logical operations and subsetting on other data objects can be tricky. Many objects do not support working like this. Instead, a construct called a list comprehension is used.

We will not go into this here, but it is a useful tool for performing a repeated action for each data point across an entire list i.e. checking if it is equal to a given value.
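
For reference only, here is a minimal sketch of what a list comprehension looks like, using plain lists that mirror the patient example (the values are retyped here, not taken from the array).

```python
ages = [34, 29, 41]
patients = ['Patient1', 'Patient2', 'Patient3']

# One True/False per age, similar to the logical index built for the array
over_30_flags = [age > 30 for age in ages]

# Keep only the names whose matching age passes the test
over_30_patients = [name for name, age in zip(patients, ages) if age > 30]

print(over_30_flags)     # [True, False, True]
print(over_30_patients)  # ['Patient1', 'Patient3']
```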

Joining Arrays

NumPy arrays can easily be joined together with the concatenate function. These are simple 1D arrays.

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.concatenate((arr1, arr2))

Joining Arrays

2D arrays can be merged just as easily. The orientation of the merge can be controlled using the axis argument.

arr1 = np.array([[1, 2], [3, 4]])

arr2 = np.array([[5, 6], [7, 8]])

arr = np.concatenate((arr1, arr2), axis=1)

arr
## array([[1, 2, 5, 6],
##        [3, 4, 7, 8]])
arr = np.concatenate((arr1, arr2), axis=0)

arr
## array([[1, 2],
##        [3, 4],
##        [5, 6],
##        [7, 8]])

Mathematical Functions

NumPy is not just for arrays. Many mathematical functions are already included in base Python (we met round() earlier). When you import NumPy you gain access to a lot more.

Mathematical constants:

  • pi - np.pi

Mathematical functions:

  • Exponents and logs - np.exp(my_array), np.log(my_array)
  • Powers and roots - np.sqrt(my_array)
  • Trigonometry - np.sin(my_array)
  • Element-wise operators - np.add(my_array1, my_array2)
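
These functions are applied to every element of the array at once. A short sketch with a made-up array, assuming NumPy is installed as set up earlier:

```python
import numpy as np

my_array = np.array([1.0, 4.0, 9.0])

roots = np.sqrt(my_array)              # square root of each element
logs_back = np.exp(np.log(my_array))   # exp undoes log, within rounding error
summed = np.add(my_array, my_array)    # same result as my_array + my_array
circle_area = np.pi * 2.0 ** 2         # area of a circle with radius 2

print(roots)   # [1. 2. 3.]
print(summed)  # [ 2.  8. 18.]
```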

Which Data Objects?

So which data objects do you use?

As with most of programming there are often multiple ways to do things and often the optimal data object will be very context dependent. Most of the time you will be working with different Python libraries and functions, each with different preferences for the input/output data object. This will help define which object is appropriate for you.

Here is a rough guide:

  • Lists - Great general hold-all that is very flexible.
  • Tuples - If you want to create an immutable version of a list as some kind of reference.
  • Sets - If you only care about unique values and membership, and do not need any order or repeated values.
  • Dictionaries - If you have complex data i.e. different shape and size, and you will have some kind of key/lookup values.
  • Arrays - When you have rectangular data i.e. a table, need to perform math on elements or interface with more advanced computational libraries (stats, ML, etc)

Custom Functions


Making functions

Often we will want to define our own functions. This way we can easily repeat the same process.

Key parts:

  • function name
  • arguments
  • code
  • return values (sometimes)

Making functions

  • def - indicates you will define a function.
  • function name - in this case myFirstFunction.
  • arguments - any arguments that the function expects in parentheses.
  • code - the start of the code is indicated by a colon, new line and indentation.
  • return - this is the result that you wish to be delivered from the function.

NOTE: In python functions, indentation after defining the function is required. Otherwise you will get an error.

def myFirstFunction(num1, num2):
  sumNum = num1 + num2
  return sumNum

myResult = myFirstFunction(num1=2, num2=3)
myResult
## 5

Returning multiple values

A return statement hands back a single object from a function. Here we also compute the product of our numbers and try to return it alongside our sum by simply writing both names. Python won’t even let us define the function.

def myFirstFunction(num1, num2):
  sumNum = num1 + num2
  multipleNum = num1*num2
  return sumNum multipleNum
## File "<string>", line 4
##     return sumNum multipleNum
## SyntaxError: invalid syntax

Returning multiple values

A simple solution is to pass back an object that contains both results. Here we create a quick list.

def myFirstFunction(num1, num2):
  sumNum = num1 + num2
  multipleNum = num1*num2
  return [sumNum, multipleNum]

myResult = myFirstFunction(num1=2, num2=3)
myResult
## [5, 6]
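
Another common option is to separate the values with a comma in the return statement. Python packs them into a single tuple, which can be unpacked straight into separate names. The function name below is illustrative.

```python
def sum_and_product(num1, num2):
    # A comma-separated return packs the values into one tuple
    return num1 + num2, num1 * num2

both = sum_and_product(2, 3)
total, product = sum_and_product(2, 3)   # unpack into two names
print(both)            # (5, 6)
print(total, product)  # 5 6
```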

Evaluate until return

In a function containing a return statement, the code up until the return statement is evaluated and anything after the return statement is not evaluated.

def myFirstFunction(num1, num2):
  sumNum = num1 + num2
  multipleNum = num1*num2
  print("Before return")
  return [sumNum, multipleNum]
  print("After return")

myFirstFunction(num1=2, num2=3)
## Before return
## [5, 6]

No return statement

If a function does not contain a return statement, nothing useful comes back: the function returns None. In some other languages, such as R, the value of the last expression is returned instead.


def myFirstFunction(num1, num2):
  sumNum = num1 + num2
  multipleNum = num1*num2

myFirstFunction(num1=2, num2=3)
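
Assigning the result to a variable makes this visible. A small sketch with an illustrative function name:

```python
def add_without_return(num1, num2):
    sumNum = num1 + num2   # computed, but never handed back

result = add_without_return(2, 3)
print(result)          # None
print(result is None)  # True
```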

Variable scope in functions

Variables that are defined in the arguments or within the function exist only within the environment of the function. If we try and use the argument outside of the function it will not work.

def myFirstFunction(num1, num2):
  sumNum = num1 + num2
  multipleNum = num1*num2
  return [sumNum, multipleNum]

myFirstFunction(num1=2, num2=3)
## [5, 6]
sumNum
## name 'sumNum' is not defined

Variable scope in functions

If a function assigns to a name that also exists in the global environment, it creates a local variable with that name. The global variable is not updated.

num3 = 4

def myFirstFunction(num1, num2, num3):
  num3 = num1+num2+num3
  return num3

myFirstFunction(num1=2, num2=3, num3=num3)
## 9
num3
## 4

Local vs Global scope

  • Functions have local scope. This means they have access to global variables (which can be used anywhere) and local variables which were made within the function.

  • Once you exit the function you are back to a global scope. Local variables from the function can not be accessed at this point.

  • Code in a function’s local scope cannot use variables in any other local scope i.e. between functions.

  • Though it is possible to have local and global variables with the same name, try and give everything unique names so you can keep track of everything.
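
The points above can be sketched as follows. The names threshold, is_over_threshold and shadow_threshold are made up for this example.

```python
threshold = 30   # global variable

def is_over_threshold(age):
    return age > threshold   # reading a global inside a function works

def shadow_threshold():
    threshold = 100          # creates a separate local variable
    return threshold

print(is_over_threshold(41))  # True
print(shadow_threshold())     # 100
print(threshold)              # 30, the global is unchanged
```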

Argument defaults

Functions can have defaults for their arguments which will be used when arguments are not specified.

def myFirstFunction(num1=1,num2=3):
  sumNum = num1+num2
  return sumNum

myFirstFunction()
## 4
myFirstFunction(5,10)
## 15

Reusing functions

Once we have made a function that we want to keep reusing, e.g. to regularly process some data in the same way, we can easily save it. To do this we save it as a script.

First open a new script: File > New File > Python Script. We can then add our original function and save the python script as myFirstFunction_script.py.

def myFirstFunction(num1, num2):
  sumNum = num1 + num2
  return sumNum

Function import

Once the script is saved as myFirstFunction_script.py in our working directory, we can import it by its file name (without the .py extension). The function is then called through the module name.

import myFirstFunction_script

myFirstFunction_script.myFirstFunction(5,20)

Time for an exercise!

Exercise on the data types we have covered so far can be found here

Answers to the exercise

Answers can be found here

Control Statements


Control Statements

There are several ways to control how your code is evaluated. There are two main classes:

  • Conditional branching (if, elif, else)
  • Loops (for, while)

While I’m analyzing data, if I need to execute complex statistical procedures on the data I will use Python else I will use a calculator.

Conditional Branching

Conditional branching is the evaluation of a logical value (True or False) to determine whether a chunk of code is executed.

In Python, we use the if statement, followed immediately by the logical value to be evaluated. The dependent code is indicated by a colon, new line and indentation.

x = True
if x:
  print("x is true")
## x is true

x = False
if x:
  print("x is true")

Conditional Branching

More often, we construct the logical value within the if statement itself. This can be termed the condition.

x = 10
y = 4
if x > y:
  print("The value of x is",x,"which is greater than", y)
## The value of x is 10 which is greater than 4

The message is printed above because x is greater than y.

y = 20
if x > y:
  print("The value of x is",x,"which is greater than", y)

x is now no longer greater than y, so no message is printed.

Ideally, we would still get a message telling us the result of the condition.

else following an if

If we want to perform an operation when the condition is false we can follow the if statement with an else statement.

x = 3
if x < 5:
  print(x, "is less than 5")
else:
  print(x, "is greater than or equal to 5")
## 3 is less than 5
x = 10
if x < 5:
  print(x, "is less than 5")
else:
  print(x, "is greater than or equal to 5")
## 10 is greater than or equal to 5

else if = elif

We may wish to execute different procedures under multiple conditions. This can be controlled using the elif following an initial if statement.

x = 5
if x < 5:
  print(x, "is less than 5")
elif x > 5:
  print(x, "is greater than 5")
else:
  print(x, "is 5")
## 5 is 5
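Conditions in an if/elif/else chain are checked from top to bottom, and only the first branch whose condition is True runs. The sketch below wraps such a chain in a function; the function name classifyExpression and the cut-offs of 1 and 10 are hypothetical choices made for illustration.

```python
def classifyExpression(tpm):
  # Conditions are tested in order; the first True branch is used
  if tpm < 1:
    return "low"
  elif tpm < 10:
    return "medium"
  else:
    return "high"

lowLabel = classifyExpression(0.5)
mediumLabel = classifyExpression(4.3)
highLabel = classifyExpression(10.4)
print(lowLabel, mediumLabel, highLabel)
```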

Loops

While and for loops iterate over a block of code, and keep rerunning it.

While loops do this while a specific condition is met (or until that condition is not met).

For loops will do this for a given number of iterations.

While loop

While loops have a similar structure to if statements. We start with the while keyword, then follow with the logical value to be evaluated. The dependent code is indicated by a colon, new line and indentation.

x = 1
while x < 3:
  print("x is",x)
  x = x+1
## x is 1
## x is 2
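A while loop is useful when we do not know in advance how many iterations are needed. As an illustrative sketch (the starting value of 20 is arbitrary), the loop below halves a value until it drops below 1 and counts the steps taken. The variable updated inside the loop must eventually make the condition False, otherwise the loop never ends.

```python
value = 20
steps = 0

# Keep halving while the value is still at least 1
while value >= 1:
  value = value / 2
  steps = steps + 1

print("Steps taken:", steps)
print("Final value:", value)
```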

For loop

For loops do not have a conditional. Instead you supply an object that you want to iterate over. This can be a list, tuple, dictionary, set or string. Here we use a list.

x = ['Alpha','Bravo','Charlie']
for i in x:
  print(i)
## Alpha
## Bravo
## Charlie
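The same pattern works for the other iterable types. As a short sketch with made-up data, looping over a string visits each character, and looping over a dictionary visits each key.

```python
sequence = "ATG"
bases = []
for base in sequence:
  # Each iteration receives one character of the string
  bases.append(base)

expression = {"Ikzf1": 10.4, "Myc": 4.3}
genes = []
for gene in expression:
  # Iterating over a dictionary yields its keys
  genes.append(gene)
  print(gene, expression[gene])

print(bases)
print(genes)
```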

for and range()

The range() function provides us with a convenient input for our for loops. It returns a sequence of numbers, starting from 0 by default and stopping before the specified number.

for i in range(3):
  print("i is", i)
## i is 0
## i is 1
## i is 2
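range() also accepts an optional start and step, in the form range(start, stop, step). A brief sketch with arbitrary example values follows; list() is used only to display the numbers that each range produces.

```python
# Start at 2 and stop before 6
fromTwo = list(range(2, 6))

# Start at 2, stop before 10, and move in steps of 2
evens = list(range(2, 10, 2))

print(fromTwo)
print(evens)

for i in range(2, 10, 2):
  print("i is", i)
```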

Looping through indices

When we have a numeric range, we can use it to index into existing objects. This lets us work with several objects of the same length in a single loop.

geneName = ["Ikzf1","Myc","Igll1"]
expression = [10.4, 4.3, 6.5]
iterations = len(geneName)

for i in range(iterations):
  print(geneName[i]," has a TPM of ",expression[i])
## Ikzf1  has a TPM of  10.4
## Myc  has a TPM of  4.3
## Igll1  has a TPM of  6.5
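An alternative to indexing, not used elsewhere in these slides, is the built-in zip() function, which pairs up the elements of two lists so that both can be read in one loop. The sketch below assumes the two lists have the same length; the list name messages is illustrative.

```python
geneName = ["Ikzf1", "Myc", "Igll1"]
expression = [10.4, 4.3, 6.5]

messages = []
# zip() yields one (gene, tpm) pair per iteration
for gene, tpm in zip(geneName, expression):
  message = gene + " has a TPM of " + str(tpm)
  messages.append(message)
  print(message)
```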

Loops and conditionals

Loops can be combined with conditional statements to allow for complex control of their execution over Python objects.

To help us write complex code we often use pseudocode as a starting point.

Pseudocode

When we write pseudocode we are trying to write out each computational step in a human readable way.

It is important to be specific, simple, concise and include the control structures that would be in your final code.


for 0 to 7
  if value is greater than 5
    print the value and a statement saying it is greater than 5
  else if value is 5
    print the value and a statement saying it is equal to 5
  else if value is less than 5
    print the value and a statement saying it is less than 5

Loops and conditionals

Though these can be tough to read and create, by starting with pseudocode and keeping an eye on the hierarchy of indentation we can follow the logic.

for 0 to 7
  if value is greater than 5
    print the value and a statement saying it is greater than 5
  else if value is 5
    print the value and a statement saying it is equal to 5
  else if value is less than 5
    print the value and a statement saying it is less than 5

for i in range(8):
  if i > 5:
    print("Number",i,"is greater than 5")
  elif i == 5:
    print("Number",i,"is 5")
  else:
    print("Number",i,"is less than 5")
## Number 0 is less than 5
## Number 1 is less than 5
## Number 2 is less than 5
## Number 3 is less than 5
## Number 4 is less than 5
## Number 5 is 5
## Number 6 is greater than 5
## Number 7 is greater than 5

Breaking loops

We can use conditionals together with the break statement to exit a loop early once a condition is satisfied, much as a while loop stops when its condition is no longer met.

x = range(8)

for i in x:
  if i > 5:
    print("Number",i,"is greater than 5")
  elif i == 5:
    print("Number",i,"is 5")
    break
  else:
    print("Number",i,"is less than 5")
## Number 0 is less than 5
## Number 1 is less than 5
## Number 2 is less than 5
## Number 3 is less than 5
## Number 4 is less than 5
## Number 5 is 5
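A common use of break is to stop searching as soon as the first match is found. The sketch below reuses the gene lists from earlier; the threshold of 5 and the variable name firstLow are hypothetical.

```python
geneName = ["Ikzf1", "Myc", "Igll1"]
expression = [10.4, 4.3, 6.5]
threshold = 5  # hypothetical cut-off for illustration

firstLow = None  # stays None if no gene falls below the threshold
for i in range(len(geneName)):
  if expression[i] < threshold:
    firstLow = geneName[i]
    break  # stop at the first match; later genes are not checked

print("First gene below threshold:", firstLow)
```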

Time for an exercise!

Exercises around control structures can be found here

Answers to exercise

Answers can be found here

Further Support

When you hit bugs:

  • Google/ChatGPT/Claude, etc.
  • Stack Overflow
  • Biostars
  • Reach out on GitHub

Other Reference Material:

  • Harvard’s Python Course
  • Geeks For Geeks