Today’s goals:
In general, the recipe for creating a figure is as follows:
- Plot your data using plot(), scatter(), bar(), etc.
- Call show() and save or export the figure if you wish.
Matplotlib is Python’s library for visualization. It has extensive
documentation available online, including many tutorials.
Within Matplotlib, you will mostly be working with pyplot
to generate simple plots. You can view the documentation for pyplot here.
Each function within pyplot has detailed descriptions of the arguments
it takes - these will be very useful when you would like to customize
your plots.
To import matplotlib.pyplot, simply type
import matplotlib.pyplot as plt at the top of your code.
You can then refer to the library as plt in your code as
needed. Note that the alias plt isn’t strictly necessary, but it is
an almost-universal naming convention (other libraries follow
similar conventions too).
In Python, the best way to make a figure is by using the subplots()
function to define a figure and set(s) of axes. The reason we use the
subplots() function is that it makes it easy to add
multiple plots/axes to a figure, which is commonly done in the
visualization of scientific data. To define a figure, you can write:
- fig is your figure (think: shape and size of your plot)
- ax is your set of axes where you will plot your data and customize how the plot looks
- subplots() is a function which has multiple arguments that you can use to specify
the size and shape of your figure, as well as other parameters for your axes.
Let’s visit the documentation and take a look at the options.
- Set the figure size with the figsize=([width], [height]) option (dimensions are in
inches, based on the default 100 dpi; the on-screen size may change depending on your monitor).
- Call plt.show() to render your plot.
Looking at the subplots()
documentation again, we can see that we can specify the number and
arrangement of subplots we want:
- ncols for the number of columns
- nrows for the number of rows
You can then define a corresponding axis for each subplot. Below is
the code to generate two horizontally (fig1) and vertically
(fig2) stacked subplots.
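A sketch of what the two layouts could look like. The axes names (ax_left, ax_right, ax_top, ax_bottom) and figure sizes are illustrative choices, not names taken from the original material.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to see the plots interactively
import matplotlib.pyplot as plt

# fig1: two subplots side by side (1 row, 2 columns)
fig1, (ax_left, ax_right) = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))
ax_left.set_title("Left")
ax_right.set_title("Right")

# fig2: two subplots stacked vertically (2 rows, 1 column)
fig2, (ax_top, ax_bottom) = plt.subplots(nrows=2, ncols=1, figsize=(4, 6))
ax_top.set_title("Top")
ax_bottom.set_title("Bottom")

plt.show()
```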
Let’s make a 2x2 grid of subplots. Note that we use nested brackets to specify the positions of each subplot within the figure.
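For a 2x2 grid, subplots() returns the axes as a 2D array, so the unpacking mirrors the grid: one inner tuple per row. The sketch below assumes illustrative names ax1 to ax4 and an arbitrary figure size.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to see the plot interactively
import matplotlib.pyplot as plt

# The outer brackets hold the rows, the inner brackets hold the columns within each row
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, figsize=(6, 6))

ax1.set_title("Top left")
ax2.set_title("Top right")
ax3.set_title("Bottom left")
ax4.set_title("Bottom right")

plt.show()
```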
subplots() options:
- width_ratios and height_ratios to adjust the relative sizes of columns and rows
- sharex and sharey to force subplots to share an x or y axis
- gridspec for arbitrary/custom subplots (ex: different number of plots in each row)
Now that we can make figures and axes, let’s grab some data to plot. We are going to use some patient data that contains sex (‘Male’ or ‘Female’), weight (kg) and height (cm). We will then have 3 arrays of data to work with.
import numpy as np
sex, height, weight = np.genfromtxt('data/height-weight.csv', unpack = True, delimiter = ",", skip_header=True, dtype=None, encoding='UTF-8')
print(sex)
## ['Male' 'Male' 'Male' 'Male' 'Female' 'Female' 'Female' 'Female' 'Male'
## 'Male' 'Female' 'Female' 'Male' 'Female' 'Female' 'Male' 'Male' 'Male'
## 'Male' 'Male' 'Male' 'Female' 'Male' 'Female' 'Female' 'Male' 'Male'
## 'Male' 'Male' 'Male' 'Female' 'Male' 'Male' 'Female' 'Male' 'Female'
## 'Male' 'Female' 'Male' 'Female' 'Male' 'Female' 'Female' 'Female'
## 'Female' 'Female' 'Female' 'Female' 'Male' 'Female' 'Female' 'Female'
## 'Female' 'Female' 'Male' 'Female' 'Female' 'Female' 'Male' 'Male'
## 'Female' 'Male' 'Female' 'Male' 'Male' 'Female' 'Male' 'Female' 'Male'
## 'Female' 'Female' 'Male' 'Male' 'Female' 'Male' 'Female' 'Male' 'Male'
## 'Female' 'Female' 'Female' 'Female' 'Male' 'Female' 'Female' 'Female'
## 'Female' 'Male' 'Female' 'Male' 'Male' 'Male' 'Female' 'Male' 'Female'
## 'Female' 'Female' 'Female' 'Female' 'Female']
Let’s separate our weight and height data by sex. Can you see what the code below does?
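One way to do this separation, sketched with NumPy boolean masks. The four-element arrays are made-up stand-ins for the columns of the patient file, and the mask names is_male and is_female are illustrative; the output names match the ones used in the plotting code.

```python
import numpy as np

# Made-up stand-in values (NOT the real patient data)
sex = np.array(["Male", "Female", "Female", "Male"])
height = np.array([180.0, 165.0, 170.0, 175.0])  # cm
weight = np.array([80.0, 60.0, 65.0, 75.0])      # kg

# Comparing an array to a string gives a boolean array, one True/False per patient
is_male = sex == "Male"
is_female = sex == "Female"

# Indexing with a boolean array keeps only the entries where the mask is True
height_m, weight_m = height[is_male], weight[is_male]
height_f, weight_f = height[is_female], weight[is_female]

print(height_m, weight_m)
print(height_f, weight_f)
```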
Scatter plots are used for displaying discrete data points, where
each point has a set of coordinates \((x, y)\). If you want to plot data points
\((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\) from lists
\(x = (x_1, x_2, \ldots, x_n)\) and \(y = (y_1, y_2, \ldots, y_n)\),
you can use the scatter() function, applied to the axis you want to plot on.
Let’s create a plot of height vs weight using our patient data. We are going to generate a figure with two side-by-side subplots (1 row, 2 columns) and create a scatter plot on the left subplot.
# Generate figure and axes
fig, (ax1, ax2) = plt.subplots(ncols = 2, nrows = 1, figsize=(10, 4))
# Plot data on ax1
ax1.scatter(weight_m, height_m)
ax1.scatter(weight_f, height_f)
plt.show()
Take a look at the options in the scatter()
documentation. Some common parameters you could use to customize your
scatter plot are:
- s: marker size in points^2 (don’t ask why…).
- color (c): marker color. Enter a named color or hex color code as a string, or an RGB tuple. Find a full guide to specifying colors here.
- marker: marker style. Choose between a variety of preset options, the default being ‘o’ for circles. View the full list of options here.
- linewidths: width of the marker outline. Enter a number in points.
- edgecolors: color of the marker outline. Enter in the same way as the value of c.
- alpha: transparency (0 = transparent, 1 = opaque)
You can use these color names as strings to define colors within the parameters of scatter(), or you can also specify hex color codes as strings or RGB values as tuples.
Using these colors and the list of parameters above, take a second to customize your plot of weight vs height.
# Generate figure and axes
fig, (ax1, ax2) = plt.subplots(ncols = 2, nrows = 1, figsize=(10, 4))
# Plot data on ax1
ax1.scatter(weight_m, height_m, c = 'royalblue', alpha = 0.5, marker = 's')
ax1.scatter(weight_f, height_f, c = 'magenta', alpha = 0.5, marker = 'o')
plt.show()
Let’s add a title and some axis labels to our plot. To do this, we can use the following functions:
- set_title() for the plot title
- set_xlabel() for the x-axis label
- set_ylabel() for the y-axis label
Be sure to add all of this code before the plt.show()
line, which renders the plot. Anything after show() will
not be applied to the figure you see.
# Generate figure and axes
fig, (ax1, ax2) = plt.subplots(ncols = 2, nrows = 1, figsize=(10, 4))
# Plot data on ax1
ax1.scatter(weight_m, height_m, c = 'royalblue', alpha = 0.5, marker = 's')
ax1.scatter(weight_f, height_f, c = 'magenta', alpha = 0.5, marker = 'o')
ax1.set_title("Height vs Weight")
ax1.set_xlabel("Weight (kg)")
ax1.set_ylabel("Height (cm)")
plt.show()
Another common plot type is a histogram. We are going to put a
histogram of height distributions by sex in the blank subplot. We will
do this using the hist() function, called on the axes we want to draw on
(plt.hist() is the pyplot equivalent, which draws on the current axes). You can find the
documentation here.
At a minimum, hist() requires the data points you wish
to plot as an argument. You may also specify the bins
argument as an integer (the default is 10).
Let’s take our previous plot and add histograms for the male and
female height distributions to the subplot axes on the right, each with
5 bins. Also, add a title and axis labels for the histogram. Note: You
may have to change the figsize parameter so that the labels
all fit.
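One possible solution is sketched below. The data are synthetic stand-ins drawn from normal distributions (not the patient file), and the titles, labels, and figure size are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to see the plot interactively
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in values (NOT the patient data from the CSV file)
rng = np.random.default_rng(0)
weight_m, height_m = rng.normal(80, 10, 50), rng.normal(176, 7, 50)
weight_f, height_f = rng.normal(65, 9, 50), rng.normal(163, 6, 50)

fig, (ax1, ax2) = plt.subplots(ncols=2, nrows=1, figsize=(11, 4))

# Left: the scatter plot from before
ax1.scatter(weight_m, height_m, c="royalblue", alpha=0.5, marker="s")
ax1.scatter(weight_f, height_f, c="magenta", alpha=0.5, marker="o")
ax1.set_title("Height vs Weight")
ax1.set_xlabel("Weight (kg)")
ax1.set_ylabel("Height (cm)")

# Right: one histogram per group, 5 bins each; hist() returns counts, bin edges, and patches
counts_m, edges_m, _ = ax2.hist(height_m, bins=5, color="royalblue", alpha=0.5)
counts_f, edges_f, _ = ax2.hist(height_f, bins=5, color="magenta", alpha=0.5)
ax2.set_title("Height Distribution")
ax2.set_xlabel("Height (cm)")
ax2.set_ylabel("Number of patients")

fig.tight_layout()  # keeps the labels of the two subplots from overlapping
plt.show()
```

Because each hist() call picks its own bin edges from its own data, the two sets of bars will generally not line up; passing the same explicit list of edges to bins in both calls is one way to align them.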
We can add a legend to a set of axes by using the legend() function. You can find the documentation here.
Matplotlib builds the legend automatically: pass a label string to each plotting call you would like to include in the legend, and then call the legend() function on the axes.
You can pass arguments to this function to specify the formatting and location of the legend, but we’ll skip that part today. Check out the documentation for full details!
In the box below, add a legend for each set of axes.
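A sketch of one way to do this, again using synthetic stand-in data rather than the patient file. The variable names legend1 and legend2 are illustrative and only kept so the result can be inspected.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to see the plot interactively
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in values (NOT the patient data from the CSV file)
rng = np.random.default_rng(0)
weight_m, height_m = rng.normal(80, 10, 50), rng.normal(176, 7, 50)
weight_f, height_f = rng.normal(65, 9, 50), rng.normal(163, 6, 50)

fig, (ax1, ax2) = plt.subplots(ncols=2, nrows=1, figsize=(11, 4))

# Give every plotted dataset a label...
ax1.scatter(weight_m, height_m, c="royalblue", alpha=0.5, marker="s", label="Male")
ax1.scatter(weight_f, height_f, c="magenta", alpha=0.5, marker="o", label="Female")
ax2.hist(height_m, bins=5, color="royalblue", alpha=0.5, label="Male")
ax2.hist(height_f, bins=5, color="magenta", alpha=0.5, label="Female")

# ...then ask each set of axes to draw its own legend from those labels
legend1 = ax1.legend()
legend2 = ax2.legend()

plt.show()
```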
Another common plot is a line plot. This uses the plot()
function from matplotlib.pyplot (check out the
documentation here).
The minimum arguments for plot() are the x- and
y-coordinates to be plotted, which will be output with a line connecting
them. Let’s plot the function \(y =
x^2\).
The first thing we need to do is to define the list of coordinates to be plotted. Remember that even though the function we are plotting is continuous mathematically, we will still be plotting a discrete set of points. In the box below:
- Use numpy’s linspace() function to create a list of 100 points between 0 and 2.
Next, plot your function on a fresh set of axes. Add any labels and other customizations you would like.
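# x-coordinates: 100 evenly spaced points between 0 and 2 (np is numpy, imported earlier)
x_values = np.linspace(0, 2, 100)
# y-coordinates for y = x^2
y_values = x_values ** 2
fig, ax = plt.subplots()
ax.plot(x_values, y_values, c = 'teal', linestyle = '--')
ax.set_title("Quadratic Function $y = x^2$")
ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()
matplotlib.pyplot is the bread and butter of data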
visualization in Python, and allows you near-arbitrary degrees of
customization for your plots.
However, the seaborn library was developed using
matplotlib to make nice-looking plots with less code.
We are going to use it to make a violin plot, because that is
something that matplotlib.pyplot does not do a nice job
of.
Going back to our patient data, we are going to make a violin plot of the patient weight distributions by sex.
We will use the violinplot() function from the
seaborn library, which we will import as
sns.
import seaborn as sns
fig, ax = plt.subplots(figsize = (4, 4))
sns.violinplot(weight_m, ax = ax)
plt.show()
fig, ax = plt.subplots(figsize = (4, 4))
sns.violinplot(weight_m, ax = ax, color = 'royalblue', alpha = 0.5, linewidth=0, label = "Male")
sns.violinplot(weight_f, ax = ax, color = 'magenta', alpha = 0.5, linewidth=0, label = "Female")
ax.set_ylabel('Weight (kg)')
ax.set_title("Weight Distribution by Sex")
ax.legend()
plt.show()
Now that we have created several figures, we may want to save and
export them. To do this, we will apply the savefig()
function to our figure. This function takes your desired filepath as an
input, as well as other optional parameters such as dpi
(resolution), sizing, and transparency. Let’s save our most recent
figure. We will also use fig.tight_layout() to remove any
added white space and ensure that nothing is cut off.
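A minimal sketch of saving a figure. The plotted line, the filename example_figure.png, and the dpi value are all hypothetical choices for illustration; substitute your own figure and path.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; saving to a file does not need a display
import matplotlib.pyplot as plt
from pathlib import Path

# A small placeholder figure to save
fig, ax = plt.subplots(figsize=(4, 4))
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Example figure")

# Adjust spacing so titles and labels are not cut off in the saved file
fig.tight_layout()

# Hypothetical output path; the file extension determines the format (png, pdf, svg, ...)
out_path = Path("example_figure.png")
fig.savefig(out_path, dpi=300, transparent=False)
```

Calling savefig() on the figure object, rather than relying on whichever figure is current, makes it explicit which figure ends up in the file.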
Rather than using a single color to plot your data, you may want to use a color map. This is particularly true for things like heatmaps, or when you are displaying an image.
To do this, you can use existing colormaps
within matplotlib, or create
your own.
It’s important to choose a colormap that is:
It turns out that people have thought about this problem a lot and have come up with some color maps that do a great job at maximizing these properties.
My personal favourite is called viridis (watch the launch video here - surprisingly interesting), but there is actually a selection of these schemes available.
Some color schemes that may seem natural to use (especially rainbow/jet) actually tend to skew our perceptions of the data values, as seen in the photo below (sourced from here), and therefore are not recommended.
When you are creating any plot with multiple datasets/colors, keep colorblindness and black-white conversion in mind. Using different dashes in lines and shapes in markers is also a good way to do this!
Let’s use the heatmap()
function from Seaborn to generate a plot of the time progression of
three genes.
data = np.genfromtxt('data/gene_data.csv', unpack = True, delimiter = ",", skip_header=True)
print(data)
## [[0.2 0.3 0.5 0.6 0.7]
## [0.3 0.4 0.4 0.5 0.4]
## [0. 0.1 0.2 0.2 0.1]
## [0.9 0.7 0.6 0.5 0.4]
## [0.6 0.3 0.5 0.7 0.4]]
fig, ax = plt.subplots(figsize = (5,4))
sns.heatmap(data, linewidth = 0.5, cmap = 'viridis', annot = True)
ax.set_xlabel("Time")
ax.set_ylabel("Gene")
ax.set_title("Gene Progression")
plt.show()
Let’s make a scatter plot with weight vs height again, but make the color of the points defined by the ratio of weight to height.
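One way this could be done is to pass the ratio as the c argument of scatter() together with a cmap, then attach a colorbar. The sketch below uses synthetic stand-in data (not the patient file), and the label text is an illustrative choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to see the plot interactively
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in values (NOT the patient data from the CSV file)
rng = np.random.default_rng(0)
weight = rng.normal(70, 12, 100)   # kg
height = rng.normal(170, 9, 100)   # cm
ratio = weight / height            # one value per point, used to pick its color

fig, ax = plt.subplots(figsize=(5, 4))

# When c is an array of numbers, each value is mapped to a color through cmap
points = ax.scatter(weight, height, c=ratio, cmap="viridis")

# The colorbar shows which ratio each color corresponds to
cbar = fig.colorbar(points, ax=ax)
cbar.set_label("Weight / Height (kg/cm)")

ax.set_xlabel("Weight (kg)")
ax.set_ylabel("Height (cm)")
ax.set_title("Height vs Weight")

fig.tight_layout()
plt.show()
```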
Exercises around plotting can be found here
Answers can be found here