class: center, middle, inverse, title-slide .title[ # Introduction to ggplot2 ] .author[ ### Rockefeller University, Bioinformatics Resource Centre ] .date[ ###
http://rockefelleruniversity.github.io/Plotting_In_R/
]

---

## Graphics in R

The R language has extensive graphical capabilities.

Graphics in R may be created by many different methods including base graphics and more advanced plotting packages such as lattice.

![](imgs/plotsinR.jpg)

---

## ggplot2

The ggplot2 package was created by Hadley Wickham to provide an intuitive plotting system to rapidly generate publication quality graphics.

ggplot2 builds on the concept of the "Grammar of Graphics" (Wilkinson 2005, Bertin 1983) which describes a consistent syntax for the construction of a wide range of complex graphics by a concise description of their components.

ggplot2 is a core part of the Tidyverse, a group of packages designed to make data science easy and functional in R. To get an introduction to the core concepts of Tidyverse check out our training materials [here](https://rockefelleruniversity.github.io/RU_tidyverse_core/).

---

## Why use ggplot2

The structured syntax and high level of abstraction used by ggplot2 should allow the user to concentrate on the visualizations instead of creating the underlying code.

On top of this central philosophy ggplot2 has:

- Increased flexibility over many plotting systems.
- An advanced theme system for professional/publication level graphics.
- Large developer base -- Many libraries extending its flexibility.
- Large user base -- Great documentation and active mailing list.

---
class: inverse, center, middle

# Grammar of Graphics

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---

## How ggplot2 builds a plot

<!-- ![](imgs/Slide2.jpg) -->
<div align="center">
<img src="imgs/Slide2.jpg" alt="igv" height="500" width="550">
</div>

---

## Example scatter plot

Overview of example code for the ggplot2 scatter plot.

``` r
ggplot(data = <default data set>,
       aes(x = <default x axis variable>,
           y = <default y axis variable>,
           ... <other default aesthetic mappings>),
       ... <other plot defaults>) +

  geom_point(aes(size = <size variable for this geom>,
                 ... <other aesthetic mappings>),
             data = <data for this point geom>,
             stat = <statistic string or function>,
             position = <position string or function>,
             color = <"fixed color specification">,
             <other arguments, possibly passed to the _stat_ function>) +

  scale_<aesthetic>_<type>(name = <"scale label">,
                           breaks = <where to put tick marks>,
                           labels = <labels for tick marks>,
                           ... <other options for the scale>) +

  ggtitle("Graphics/Plot")+
  xlab("Weight")+
  ylab("Height")+

  theme(plot.title = element_text(color = "gray"),
        ... <other theme elements>)
```

---

## What users are required to specify

<!-- ![](imgs/Slide3.jpg) -->
<div align="center">
<img src="imgs/Slide3.jpg" alt="igv" height="500" width="550">
</div>

---

## Actual example scatter plot

``` r
ggplot(data=patients_clean,
       aes(y=Weight,x=Height,colour=Sex,
           size=BMI,shape=Pet)) +
  geom_point()
```

![](ggplot2_files/figure-html/simple_ggplot2-1.png)<!-- -->

---
class: inverse, center, middle

# Getting Started With ggplot2

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---

## Getting started with ggplot2

First we need a dataset. Here we read some data from the *data* directory using the **read.delim()** function.

We can use the **class()** function to get the data type of our table and the **dim()** function to get the number of rows and columns.
``` r
library(ggplot2)
patients_clean <- read.delim("data/patient-data-cleaned.txt",sep="\t")
class(patients_clean)
```

```
## [1] "data.frame"
```

``` r
dim(patients_clean)
```

```
## [1] 100  17
```

---

# Review the data

We can review the first two rows to get an idea of the content of the data.

``` r
patients_clean[1:2,]
```

```
##          ID    Name  Race  Sex     Smokes Height Weight      Birth    State Pet
## 1 AC/AH/001 Michael White Male Non-Smoker 182.87  76.57 1972-02-06  Georgia Dog
## 2 AC/AH/017   Derek White Male Non-Smoker 179.12  80.43 1972-06-15 Missouri Dog
##   Grade  Died Count Date.Entered.Study Age   BMI Overweight
## 1     2 FALSE  0.01         2015-12-01  44 22.90      FALSE
## 2     2 FALSE -1.31         2015-12-01  43 25.07       TRUE
```

---

# Review the data.frame

By default, R's read.delim function has read in the data as a data.frame.

Data.frames are essential for ggplot2 as they can hold a mix of numerical, character and categorical data in one table.

``` r
patients_clean$Smokes[1:5]
```

```
## [1] "Non-Smoker" "Non-Smoker" "Non-Smoker" "Non-Smoker" "Non-Smoker"
```

``` r
patients_clean$Height[1:5]
```

```
## [1] 182.87 179.12 169.15 175.66 164.47
```

---

# Review the data.frame with summary

We can get an overview of the data in all columns of the data.frame using the **summary()** function.

``` r
summary(patients_clean)
```

```
##       ID                Name               Race               Sex
##  Length:100         Length:100         Length:100         Length:100
##  Class :character   Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character
##
##
##
##
##     Smokes              Height          Weight         Birth
##  Length:100         Min.   :157.0   Min.   :63.54   Length:100
##  Class :character   1st Qu.:161.5   1st Qu.:68.17   Class :character
##  Mode  :character   Median :165.7   Median :72.27   Mode  :character
##                     Mean   :167.9   Mean   :74.89
##                     3rd Qu.:174.5   3rd Qu.:80.56
##                     Max.   :185.4   Max.   :97.67
##
##     State               Pet                Grade          Died
##  Length:100         Length:100         Min.   :1.000   Mode :logical
##  Class :character   Class :character   1st Qu.:1.000   FALSE:46
##  Mode  :character   Mode  :character   Median :2.000   TRUE :54
##                                        Mean   :2.054
##                                        3rd Qu.:3.000
##                                        Max.   :3.000
##                                        NA's   :7
##      Count         Date.Entered.Study      Age             BMI
##  Min.   :-3.1400   Length:100         Min.   :42.00   Min.   :21.41
##  1st Qu.:-0.8100   Class :character   1st Qu.:42.75   1st Qu.:25.07
##  Median :-0.0550   Mode  :character   Median :43.00   Median :26.51
##  Mean   :-0.1066                      Mean   :43.09   Mean   :26.54
##  3rd Qu.: 0.6150                      3rd Qu.:44.00   3rd Qu.:27.90
##  Max.   : 1.7900                      Max.   :44.00   Max.   :31.70
##
##  Overweight
##  Mode :logical
##  FALSE:23
##  TRUE :77
##
##
##
##
```

---

## Our first ggplot2 graph

As seen above, in order to produce a ggplot2 graph we need a minimum of:

- Data to be used in graph
- Mappings of data to the graph (*aesthetic* mapping)
- What type of graph we want to use (The *geom* to use).

---

## Our first ggplot2 graph

In the code below we define the data as our cleaned patients data frame.

``` r
pcPlot <- ggplot(data=patients_clean)
class(pcPlot)
```

```
## [1] "gg"     "ggplot"
```

Now we can see that we have a gg/ggplot object (pcPlot).

---

## Our first ggplot2 graph

Within this gg/ggplot object the data has been defined.
``` r
pcPlot$data[1:4,]
```

```
##          ID    Name  Race  Sex     Smokes Height Weight      Birth        State
## 1 AC/AH/001 Michael White Male Non-Smoker 182.87  76.57 1972-02-06      Georgia
## 2 AC/AH/017   Derek White Male Non-Smoker 179.12  80.43 1972-06-15     Missouri
## 3 AC/AH/020    Todd Black Male Non-Smoker 169.15  75.48 1972-07-09 Pennsylvania
## 4 AC/AH/022  Ronald White Male Non-Smoker 175.66  94.54 1972-08-17      Florida
##    Pet Grade  Died Count Date.Entered.Study Age   BMI Overweight
## 1  Dog     2 FALSE  0.01         2015-12-01  44 22.90      FALSE
## 2  Dog     2 FALSE -1.31         2015-12-01  43 25.07       TRUE
## 3 None     2 FALSE -0.17         2015-12-01  43 26.38       TRUE
## 4  Cat     1 FALSE -1.10         2015-12-01  43 30.64       TRUE
```

---

## Our first ggplot2 graph

Important information on how to map the data to the visual properties (aesthetics) of the plot, as well as what type of plot to use (geom), has yet to be specified.

``` r
pcPlot$mapping
```

```
## Aesthetic mapping:
## <empty>
```

``` r
pcPlot$theme
```

```
## list()
```

``` r
pcPlot$layers
```

```
## list()
```

---

## Our first ggplot2 graph

The information to map the data to the plot can be added now using the aes() function.

``` r
pcPlot <- ggplot(data=patients_clean)

pcPlot <- pcPlot+aes(x=Height,y=Weight)

pcPlot$mapping
```

```
## Aesthetic mapping:
## * `x` -> `Height`
## * `y` -> `Weight`
```

``` r
pcPlot$theme
```

```
## list()
```

``` r
pcPlot$layers
```

```
## list()
```

But we are still missing the final component of our plot, the type of plot to use (geom).

---

## Our first ggplot2 graph

Below the geom_point function is used to specify a point plot, a scatter plot of Height values on the x-axis versus Weight values on the y-axis.
``` r
pcPlot <- ggplot(data=patients_clean)

pcPlot <- pcPlot+aes(x=Height,y=Weight)
pcPlot <- pcPlot+geom_point()
```

---

## Our first ggplot2 graph

``` r
pcPlot$mapping
```

```
## Aesthetic mapping:
## * `x` -> `Height`
## * `y` -> `Weight`
```

``` r
pcPlot$theme
```

```
## list()
```

``` r
pcPlot$layers
```

```
## [[1]]
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity
```

---

## Our first ggplot2 graph

Now that we have all the components of our plot, we can display the results.

``` r
pcPlot
```

![](ggplot2_files/figure-html/ggplot_aes_geom_display_ggplot2-1.png)<!-- -->

---
class: inverse, center, middle

# Geoms

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---

## Our first ggplot2 graph

More typically, the data and aesthetics are defined within the ggplot function and geoms are applied afterwards.
This makes it easier to switch between plot types to find the best way to visualize your data.

``` r
pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Height,y=Weight))
pcPlot+geom_point()
```

![](ggplot2_files/figure-html/ggplot_simple_geom_point_ggplot2-1.png)<!-- -->

---

## Plot types

There are many geoms available in ggplot2:

* geom_point() - Scatter plots
* geom_line() - Line plots
* geom_smooth() - Fitted line plots
* geom_bar() - Bar plots
* geom_boxplot() - Boxplots
* geom_jitter() - Add jitter to plots
* geom_histogram() - Histogram plots
* geom_density() - Density plots
* geom_text() - Add text to plots
* geom_errorbar() - Add error bars to plots
* geom_violin() - Violin plots

---

## Line plots

``` r
pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Height,y=Weight))

pcPlot_line <- pcPlot+geom_line()

pcPlot_line
```

![](ggplot2_files/figure-html/line_simple_ggplot2-1.png)<!-- -->

---

## Smoothed line plots

``` r
pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Height,y=Weight))

pcPlot_smooth <- pcPlot+geom_smooth()

pcPlot_smooth
```

```
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
```

![](ggplot2_files/figure-html/smooth_simple_ggplot2-1.png)<!-- -->

---

## Bar plots

``` r
pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Sex))

pcPlot_bar <- pcPlot+geom_bar()

pcPlot_bar
```

![](ggplot2_files/figure-html/bar_simple_ggplot2-1.png)<!-- -->

---

## Histograms

``` r
pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Height))

pcPlot_hist <- pcPlot+geom_histogram()

pcPlot_hist
```

```
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

![](ggplot2_files/figure-html/histogram_simple_ggplot2-1.png)<!-- -->

---

## Density plots

``` r
pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Height))

pcPlot_density <- pcPlot+geom_density()

pcPlot_density
```

![](ggplot2_files/figure-html/density_simple_ggplot2-1.png)<!-- -->

---

## Box plots

``` r
pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Sex,y=Height))

pcPlot_boxplot <- pcPlot+geom_boxplot()

pcPlot_boxplot
```

![](ggplot2_files/figure-html/boxplot_simple_ggplot2-1.png)<!-- -->

---

## Violin plots

``` r
pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Sex,y=Height))

pcPlot_violin <- pcPlot+geom_violin()

pcPlot_violin
```

![](ggplot2_files/figure-html/violin_simple_ggplot2-1.png)<!-- -->

---

## There is a world of geoms

An overview of geoms and their arguments can be found in the ggplot2 documentation or within the ggplot2 quick reference guides.

- [ggplot2 documentation](https://ggplot2.tidyverse.org/)
- [ggplot2 guide](http://sape.inf.usi.ch/quick-reference/ggplot2/geom)

---
class: inverse, center, middle

# Aesthetics

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---

## Aesthetics

To set an aesthetic of a plot to a *constant* value (e.g. set the color of all points to red) we can supply the color argument to the geom_point() function.
``` r
pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Height,y=Weight))
pcPlot+geom_point(colour="red")
```

![](ggplot2_files/figure-html/scatter_coloured_ggplot2-1.png)<!-- -->

---

## Plot properties

.pull-left[
As we discussed earlier however, ggplot2 makes use of aesthetic mappings to assign variables in the data to the properties/aesthetics of the plot. This allows the properties of the plot to reflect variables in the data *dynamically*.

In these examples we supply additional information to the aes() function to define what information to display and how it is represented in the plot.
]

.pull-right[
First we can recreate the plot we saw earlier.

``` r
pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Height,
                             y=Weight))
pcPlot+geom_point()
```

![](ggplot2_files/figure-html/scatter_simple_ggplot2-1.png)<!-- -->
]

---

## Color

Now we can adjust the aes mapping by supplying an argument to the color parameter in the aes function. (Note that ggplot2 accepts "color" or "colour" as the parameter name.)

This simple adjustment makes it possible to see the separation between male and female measurements for height and weight.

``` r
pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Height,y=Weight,color=Sex))
pcPlot + geom_point()
```

![](ggplot2_files/figure-html/scatter_aes_sexcolor_ggplot2-1.png)<!-- -->

---

## Point shape

Similarly the shape of points may be adjusted.

``` r
pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Height,y=Weight,shape=Sex))
pcPlot+geom_point()
```

![](ggplot2_files/figure-html/scatter_aes_sexShapeB_ggplot2-1.png)<!-- -->

---

## Aesthetics in geom

Aesthetic mappings may also be set directly in the geom_point() function, in the same place where we previously set the constant colour red.
This allows the same ggplot object to be used with different aesthetic mappings and varying geoms.

``` r
pcPlot <- ggplot(data=patients_clean)
```

---

``` r
pcPlot+geom_point(aes(x=Height,y=Weight,colour=Sex))
```

![](ggplot2_files/figure-html/aes_in_geomFS2_ggplot2-1.png)<!-- -->

---

``` r
pcPlot+geom_point(aes(x=Height,y=Weight,colour=Smokes))
```

![](ggplot2_files/figure-html/aes_in_geomFS3_ggplot2-1.png)<!-- -->

---

``` r
pcPlot+geom_point(aes(x=Height,y=Weight,colour=Smokes,shape=Sex))
```

![](ggplot2_files/figure-html/aes_in_geomFS4_ggplot2-1.png)<!-- -->

---

``` r
pcPlot+geom_violin(aes(x=Sex,y=Height,fill=Smokes))
```

![](ggplot2_files/figure-html/aes_in_geomFS5_ggplot2-1.png)<!-- -->

---

## Aesthetics in geom

Again, for a comprehensive list of parameters and aesthetic mappings used in geom_*type* functions see the ggplot2 documentation for individual geoms by using ?geom_*type*

``` r
?geom_point
```

or visit the ggplot2 documentation pages and quick reference:

- [ggplot2 documentation](http://docs.ggplot2.org/current/)
- [Quick Reference](http://sape.inf.usi.ch/quick-reference/ggplot2/geom)

---
class: inverse, center, middle

# Facets

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---

## Facets

One very useful feature of ggplot is faceting.
This allows you to produce several plots, each showing a subset of your data defined by one or more variables.

To facet our data into multiple plots we can use the *facet_wrap* or *facet_grid* function, specifying the variable we split by.

The *facet_grid* function is well suited to splitting the data by two factors.

---

## Split by 2 factors

Here we can plot the data with the Smokes variable as rows and Sex variable as columns.
<div align="center">
facet_grid(Rows~Columns)
</div>

``` r
pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,
                                         colour=Sex))+geom_point()
pcPlot + facet_grid(Smokes~Sex)
```

![](ggplot2_files/figure-html/facet_grid_SmokesBySex_ggplot2-1.png)<!-- -->

---

## Split by 1 factor

To split by one factor we use the facet_grid() function again, but omit the variable before the "~".

This will facet along columns in the plot.

<div align="center">
facet_grid(~Columns)
</div>

``` r
pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,
                                         colour=Sex))+geom_point()
pcPlot + facet_grid(~Sex)
```

![](ggplot2_files/figure-html/facet_grid_BySex_ggplot2-1.png)<!-- -->

---

## Split by 1 factor

Similarly, to split along rows in the plot, the variable is placed before the "~.".

<div align="center">
facet_grid(Rows~.)
</div>

``` r
pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,
                                         colour=Sex))+geom_point()
pcPlot + facet_grid(Sex~.)
```

![](ggplot2_files/figure-html/facet_grid_SexBy_ggplot2-1.png)<!-- -->

---

## facet_wrap()

The *facet_wrap()* function offers a less grid-based structure but is well suited to faceting data by one variable.

For *facet_wrap()* we follow a similar syntax to *facet_grid()*.

``` r
pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,
                                         colour=Sex))+geom_point()
pcPlot + facet_wrap(~Smokes)
```

![](ggplot2_files/figure-html/facet_Wrap_BySmokes_ggplot2_2-1.png)<!-- -->

---

## Multiple variables

For more complex faceting both *facet_grid* and *facet_wrap* can accept combinations of variables. Here we use *facet_wrap*.

``` r
pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,
                                         colour=Sex))+geom_point()
pcPlot + facet_wrap(~Pet+Smokes+Sex)
```

![](ggplot2_files/figure-html/facet_wrap_smokesBySexandPet_ggplot2-1.png)<!-- -->

---

## Multiple variables

Or in a nice grid format using facet_grid() and the Smokes variable against a combination of Sex and Pet.
``` r
pcPlot + facet_grid(Smokes~Sex+Pet)
```

![](ggplot2_files/figure-html/facet_grid_smokesBySexandPet_ggplot2_2-1.png)<!-- -->

---
class: inverse, center, middle

# Plotting Order

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---

## Plotting order in ggplot

We will shortly discuss how to change various aspects of the plot layout and appearance. However, a commonly asked question is how to change the order in which R plots a categorical variable. Consider the boxplot to compare weights of males and females:

``` r
ggplot(patients_clean, aes(x=Sex, y=Weight)) + geom_boxplot()
```

![](ggplot2_files/figure-html/plotOrderBoxplot_ggplot2-1.png)<!-- -->

---

## Plotting order and factors

Here, R arranges the boxes according to the `levels` of the categorical variable. If the variable has no levels (as here, where Sex was read in as a character vector), the order defaults to alphabetical, i.e. Female before Male.

``` r
levels(patients_clean$Sex)
```

```
## NULL
```

---

## Plotting order and factors

Depending on the message we want the plot to convey, we might want control over the order of boxes. The `factor` function allows us to explicitly set the order of the levels.

``` r
patients_clean$Sex <- factor(patients_clean$Sex,
                             levels=c("Male","Female"))
ggplot(patients_clean,aes(x=Sex, y=Weight)) + geom_boxplot()
```

![](ggplot2_files/figure-html/plotOrderControlBoxplot_ggplot2-1.png)<!-- -->

---

Exercises on the principles of ggplot can be found [here](../../exercises/exercises/ggplot2_1_exercise.html)

---

Answers for the principles of ggplot can be found [here](../../exercises/answers/ggplot2_1_answers.html)

---
class: inverse, center, middle

# Scales

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---

## Scales

Scales and their legends have so far been handled using ggplot2 defaults.
ggplot2 offers finer control over scales and legends using the *scale* methods.
Scale methods are divided into functions by combinations of

* the aesthetics they control.
* the type of data mapped to scale.

scale_aesthetic_type

Try typing in scale_ then *tab* to autocomplete. This will provide some examples of the scale functions available in ggplot2.

---

## Arguments

Although different *scale* functions accept some variety in their arguments, common arguments to scale functions include:

- name - The axis or legend title
- limits - Minimum and maximum of the scale
- breaks - Label/tick positions along an axis
- labels - Label names at each break

---

## Controlling the X and Y scale.

Both continuous and discrete X/Y scales can be controlled in ggplot2 using:

scale\_**(x/y)**\_**(continuous/discrete)**

---

## Continuous axes scales

In this example we control the continuous scale on the x-axis by providing a name, x-axis limits, the positions of breaks (ticks/labels) and the labels to place at breaks.

``` r
pcPlot +
  geom_point() +
  facet_grid(Smokes~Sex)+
  scale_x_continuous(name="height ('cm')",
                     limits = c(100,200),
                     breaks=c(125,150,175),
                     labels=c("small","justright","tall"))
```

![](ggplot2_files/figure-html/scaleCont_ggplot2, facet_grid_smokesBySex_scalex-1.png)<!-- -->

---

## Discrete axes scales

Similarly, control over discrete scales is shown below.

``` r
pcPlot <- ggplot(data=patients_clean,aes(x=Sex,y=Height))
# Named labels are matched to the levels by name, so the relabelling
# stays correct whatever order the Sex levels are in.
pcPlot +
  geom_violin(aes(x=Sex,y=Height)) +
  scale_x_discrete(labels=c("Female"="Women", "Male"="Men"))
```

![](ggplot2_files/figure-html/scaleDiscrete_ggplot2, facet_grid_smokesBySex_scaleDisceteX-1.png)<!-- -->

---

## Combining axes scales

Multiple X/Y scales can be combined to give full control of axis marks.
``` r
pcPlot <- ggplot(data=patients_clean,aes(x=Sex,y=Height,fill=Smokes))
# Named labels are matched to the levels by name rather than by position.
pcPlot +
  geom_violin(aes(x=Sex,y=Height)) +
  scale_x_discrete(labels=c("Female"="Women", "Male"="Men"))+
  scale_y_continuous(breaks=c(160,180),labels=c("Short", "Tall"))
```

![](ggplot2_files/figure-html/scaleFullControl_ggplot2, facet_grid_smokesBySex_scaleDisceteXContinuousY-1.png)<!-- -->

---

## Controlling other scales

When using fill, color, linetype, shape, size or alpha aesthetic mappings the scales are automatically selected for you and the appropriate legends created.

``` r
pcPlot <- ggplot(data=patients_clean,
                 aes(x=Height,y=Weight,colour=Sex))
pcPlot + geom_point(size=4)
```

![](ggplot2_files/figure-html/scaleOthers_ggplot2, facet_grid_height_weight-1.png)<!-- -->

In the above example the discrete colours for the Sex variable were selected by default.

---

## Manual discrete color scale

Manual control of discrete variables can be performed using scale\_*aes_Of_Interest*\_**manual** with the *values* parameter.
Additionally in this example an updated name for the legend is provided.

``` r
pcPlot <- ggplot(data=patients_clean,
                 aes(x=Height,y=Weight,colour=Sex))
pcPlot + geom_point(size=4) +
  scale_colour_manual(values = c("Green","Purple"),
                      name="Gender")
```

![](ggplot2_files/figure-html/facet_grid_height_weight_manualScale_ggplot2-1.png)<!-- -->

---

## Colorbrewer for color scales

Here we have specified the colours to be used (hence the manual), but when a variable has many levels this may be impractical and often we would like ggplot2 to choose colours from a scale of our choice.

The brewer set of scale functions allows the user to make use of a range of palettes available from colorbrewer.
- **Diverging** *BrBG, PiYG, PRGn, PuOr, RdBu, RdGy, RdYlBu, RdYlGn, Spectral*
- **Qualitative** *Accent, Dark2, Paired, Pastel1, Pastel2, Set1, Set2, Set3*
- **Sequential** *Blues, BuGn, BuPu, GnBu, Greens, Greys, Oranges, OrRd, PuBu, PuBuGn, PuRd, Purples, RdPu, Reds, YlGn, YlGnBu, YlOrBr, YlOrRd*

---

## scale_color_brewer

``` r
pcPlot <- ggplot(data=patients_clean,
                 aes(x=Height,y=Weight,colour=Pet))
pcPlot + geom_point(size=4) +
  scale_colour_brewer(palette = "Set2")
```

```
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_point()`).
```

![](ggplot2_files/figure-html/facet_grid_height_weight_brewerScale_ggplot2-1.png)<!-- -->

---

## Colorbrewer palettes

For more details on palette sizes and styles visit the colorbrewer website and ggplot2 reference page.

- [Colorbrewer](http://colorbrewer2.org/)
- [ggplot2 color scales](https://ggplot2.tidyverse.org/reference/scale_brewer.html)

---
class: inverse, center, middle

# Continuous Scales

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---

## Continuous scales

So far we have looked at qualitative scales, but ggplot2 also offers much functionality for continuous scales such as for size, alpha (transparency), color and fill.

- scale_alpha_continuous() - For transparency
- scale_size_continuous() - For control of size.

---

## Alpha

Both these functions accept the range of alpha/size to be used in plotting.

Below the range of alpha to be used in the plot is limited to between 0.5 and 1.

``` r
pcPlot <- ggplot(data=patients_clean,
                 aes(x=Height,y=Weight,alpha=BMI))
pcPlot + geom_point(size=4) +
  scale_alpha_continuous(range = c(0.5,1))
```

![](ggplot2_files/figure-html/facet_grid_height_weight_BMIalpha_ggplot2-1.png)<!-- -->

---

## Size

Below the range of sizes to be used in the plot is limited to between 3 and 6.
``` r
pcPlot <- ggplot(data=patients_clean,
                 aes(x=Height,y=Weight,size=BMI))
pcPlot + geom_point(alpha=0.8) +
  scale_size_continuous(range = c(3,6))
```

![](ggplot2_files/figure-html/facet_grid_height_weight_BMIsize_ggplot2-1.png)<!-- -->

---

## Limits

The limits of the scale can also be controlled, but it is important to note that data outside the scale limits is removed from the plot.

``` r
pcPlot <- ggplot(data=patients_clean,
                 aes(x=Height,y=Weight,size=BMI))
pcPlot + geom_point() +
  scale_size_continuous(range = c(3,6),
                        limits = c(25,40))
```

![](ggplot2_files/figure-html/facet_grid_height_weight_BMIsizeLimits_ggplot2-1.png)<!-- -->

---

## Labels

Which points of the scale are labelled, and the text of those labels, can also be controlled.

``` r
pcPlot <- ggplot(data=patients_clean,
                 aes(x=Height,y=Weight,size=BMI))
pcPlot + geom_point() +
  scale_size_continuous(range = c(3,6),
                        breaks=c(25,30),
                        labels=c("Good","Good but not 25"))
```

![](ggplot2_files/figure-html/facet_grid_height_weight_BMIsizewithBreaks_ggplot2-1.png)<!-- -->

---

## Color

Control of color/fill scales can be best achieved through the **gradient** subfunctions of scale.

- scale\_(colour/fill)\_*gradient* - 2 colour gradient (e.g. low to high BMI)
- scale\_(colour/fill)\_*gradient2* - Diverging colour scale with a midpoint colour (e.g. Down, No Change, Up)

Both functions take a common set of arguments:

- low - colour for low end of gradient scale
- high - colour for high end of gradient scale.
- na.value - colour for any NA values.
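---

## Color

The three arguments listed on the previous slide can be tried out together. The block below is a minimal sketch that uses a small made-up data frame (`toy`, which is hypothetical and not the patients data) containing one missing BMI value, so that the effect of `na.value` is visible. The colour choices are arbitrary.

```r
library(ggplot2)

# Hypothetical toy data (an assumption for illustration only);
# the third BMI value is deliberately missing.
toy <- data.frame(Height = c(160, 165, 170, 175, 180),
                  Weight = c(60, 66, 72, 80, 90),
                  BMI    = c(23.4, 24.2, NA, 26.1, 27.8))

# Two-colour gradient: low BMI is white, high BMI is red,
# and points with a missing BMI are drawn in grey.
grad_scale <- scale_colour_gradient(low = "white",
                                    high = "red",
                                    na.value = "grey50")

p <- ggplot(toy, aes(x = Height, y = Weight, colour = BMI)) +
  geom_point(size = 4) +
  grad_scale
```

Printing `p` draws the plot. The point with the missing BMI is kept and shown in the `na.value` colour rather than being dropped.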
---

## Color

An example using scale\_color\_gradient below sets the low and high end colors to White and Red respectively.

``` r
pcPlot <- ggplot(data=patients_clean,
                 aes(x=Height,y=Weight,colour=BMI))
pcPlot + geom_point(size=4,alpha=0.8) +
  scale_colour_gradient(low = "White",high="Red")
```

![](ggplot2_files/figure-html/facet_grid_height_weight_BMIgradient_ggplot2-1.png)<!-- -->

---

## Color

Similarly we can use the scale_color_gradient2 function which allows for the specification of a midpoint value and its associated color.

``` r
pcPlot <- ggplot(data=patients_clean,
                 aes(x=Height,y=Weight,colour=BMI))
pcPlot + geom_point(size=4,alpha=0.8) +
  scale_colour_gradient2(low = "Blue",mid="Black", high="Red",
                         midpoint = median(patients_clean$BMI))
```

![](ggplot2_files/figure-html/facet_grid_height_weight_BMIgradient2_ggplot2-1.png)<!-- -->

---

## Labels

As with previous continuous scales, limits and custom labels in the scale legend can be added.

``` r
pcPlot <- ggplot(data=patients_clean,
                 aes(x=Height,y=Weight,colour=BMI))
pcPlot + geom_point(size=4,alpha=0.8) +
  scale_colour_gradient2(low = "Blue",
                         mid="Black",
                         high="Red",
                         midpoint = median(patients_clean$BMI),
                         breaks=c(25,30),labels=c("Low","High"),
                         name="Body Mass Index")
```

![](ggplot2_files/figure-html/facet_grid_height_weight_BMIgradient2plus_ggplot2-1.png)<!-- -->

---

## Scales are very customizable

Multiple scales may be combined to create highly customizable plots and scales.

``` r
pcPlot <- ggplot(data=patients_clean,
                 aes(x=Height,y=Weight,colour=BMI,shape=Sex))
pcPlot + geom_point(size=4,alpha=0.8)+
  scale_shape_discrete(name="Gender") +
  scale_colour_gradient2(low = "Blue",mid="Black",high="Red",
                         midpoint = median(patients_clean$BMI),
                         breaks=c(25,30),labels=c("Low","High"),
                         name="Body Mass Index")
```

![](ggplot2_files/figure-html/facet_grid_smokesBySex_scaleDisceteXContinuouswY_ggplot2-1.png)<!-- -->

---
class: inverse, center, middle

# Transformations

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---

## Statistical transformations

In ggplot2 many of the statistical transformations are performed without any direct specification, e.g. geom_histogram() will use the stat_bin() function to generate the bin counts used in the plot.

Examples of very useful statistical methods in ggplot2 include the stat_smooth() and stat_summary() functions.

---

## Fitting lines

The stat_smooth() function can be used to fit a line to the data being displayed.

``` r
pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Weight,y=Height))
pcPlot+geom_point()+stat_smooth()
```

```
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
```

![](ggplot2_files/figure-html/stat_smooth_ggplot2-1.png)<!-- -->

---

## Loess and more

By default a "loess" smooth line is plotted by stat_smooth. Other methods available include lm, glm, gam, rlm.

``` r
pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Weight,y=Height))
pcPlot+geom_point()+stat_smooth(method="lm")
```

```
## `geom_smooth()` using formula = 'y ~ x'
```

![](ggplot2_files/figure-html/stat_smoothlm_ggplot2-1.png)<!-- -->

---

## Fitting lines in groups

A useful feature of ggplot2 is that it uses previously defined grouping when performing smoothing.

If color by Sex is an aesthetic mapping then two smooth lines are drawn, one for each sex.

``` r
pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Weight,y=Height,colour=Sex))
pcPlot+geom_point()+stat_smooth(method="lm")
```

```
## `geom_smooth()` using formula = 'y ~ x'
```

![](ggplot2_files/figure-html/stat_smoothlmgroups_ggplot2-1.png)<!-- -->

---

## Fitting lines in groups

This behavior can be overridden by specifying an aes within the stat_smooth() function and setting inherit.aes to FALSE.
``` r
pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Weight,y=Height,colour=Sex))
pcPlot+geom_point()+stat_smooth(aes(x=Weight,y=Height),method="lm",
                                inherit.aes = FALSE)
```

```
## `geom_smooth()` using formula = 'y ~ x'
```

![](ggplot2_files/figure-html/stat_smoothlmgroupsOverridden_ggplot2-1.png)<!-- -->

---

## Summary statistics

Another useful method is stat_summary() which allows for a custom statistical function to be performed and then visualized.

The fun parameter specifies a function to apply to the y variables for every value of x.

In this example we use it to plot the quantiles of the Female and Male Height data.

``` r
pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Sex,y=Height)) + geom_jitter()
pcPlot + stat_summary(fun=quantile, geom="point",
                      colour="purple", size=8)
```

![](ggplot2_files/figure-html/stat_summary_ggplot2-1.png)<!-- -->

---
class: inverse, center, middle

# Themes

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---

## Themes

Themes specify the details of data-independent elements of the plot. This includes titles, background colour, text fonts etc.

The graphs created so far have all used the default theme, `theme_grey()`, but ggplot2 allows a different theme to be specified.

---

## Predefined themes

Predefined themes can be applied to a ggplot2 object using a family of functions theme_*style*()

.pull-left[
Here is a scatter plot with the default theme...

``` r
pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Weight,y=Height))+geom_point()
pcPlot
```

![](ggplot2_files/figure-html/theme_default_ggplot2-1.png)<!-- -->
]

.pull-right[
...and the same scatter plot with the minimal theme.
``` r
pcPlot+theme_minimal()
```

![](ggplot2_files/figure-html/theme_minimal_ggplot2-1.png)<!-- -->
]

---

## Predefined themes

Several predefined themes are available within ggplot2 including:

- theme_bw
- theme_classic
- theme_dark
- theme_gray
- theme_light
- theme_linedraw
- theme_minimal

Packages such as [ggthemes](https://github.com/jrnold/ggthemes) also contain many useful collections of predefined theme_*style* functions.

---

## Custom themes

As well as making use of predefined theme styles, ggplot2 allows for control over the attributes and elements within a plot through a collection of related functions and attributes.

**theme()** is the global function used to set attributes for the collections of elements/components making up the current plot.

.pull-left[
Within the theme function there are 4 general graphic elements which may be controlled...

- rect
- line
- text
- title
]

.pull-right[
...and 5 groups of related elements:

- axis
- legend
- strip
- panel (plot panel)
- plot (Global plot parameters)
]

---

## Custom themes

These elements may be specified by the use of their appropriate element functions including:

- element_line()
- element_text()
- element_rect()

and additionally element_blank() to set an element to "blank".

---

## Custom themes

A detailed description of controlling elements within a theme can be seen at the ggplot2 vignette and by typing *?theme* into the console.

- [ggplot2 themes](https://ggplot2.tidyverse.org/reference/index.html#section-themes)

---

## Customizing your theme

.pull-left[
To demonstrate customizing a theme, in the example below we alter one element of the theme. Here we will change the text colour for the plot.

- Note that because we are changing a *text* element we use the *element_text()* function.

A detailed description of which elements are available and their associated element functions can be found by typing *?theme*.
]

.pull-right[

``` r
pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Weight,y=Height))+
  geom_point()
pcPlot+
  theme(
    text = element_text(colour="red")
  )
```

![](ggplot2_files/figure-html/theme_custom_ggplot2-1.png)<!-- -->
]

---
## Customizing your theme

If we wish to set the angle of the y-axis label, we can adjust that as well.

``` r
pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Weight,y=Height))+geom_point()
pcPlot +
  theme(text = element_text(colour="red"),
        axis.title.y = element_text(angle=0))
```

![](ggplot2_files/figure-html/theme_custom1_ggplot2-1.png)<!-- -->

---
## Customizing your theme

Finally we may wish to remove the axis lines, set the background of the plot panels to white and give the strips (the titles above each facet) a cyan background colour.

``` r
pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Weight,y=Height))+
  geom_point()+
  facet_grid(Sex~Smokes)
pcPlot+
  theme(
    text = element_text(colour="red"),
    axis.title.y = element_text(angle=0),
    axis.line = element_line(linetype = 0),   # linetype 0 draws no line
    panel.background=element_rect(fill="white"),
    strip.background=element_rect(fill="cyan")
  )
```

---
## Customizing your theme

Finally we may wish to remove the axis lines, set the background of the plot panels to white and give the strips (the titles above each facet) a cyan background colour.

![](ggplot2_files/figure-html/theme_custom22_ggplot2-1.png)<!-- -->

---
## Useful example for legend

A useful example of using the theme can be seen in controlling the legend.

By default the legend is placed to the right of the plot.

``` r
pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,
                 colour=Sex))+geom_point()
pcPlot
```

![](ggplot2_files/figure-html/legendD_ggplot2-1.png)<!-- -->

---
## Useful example for legend

By modifying the theme we can control the legend positioning.
``` r
pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,
                 colour=Sex))+geom_point()
pcPlot+theme(legend.position="left")
```

![](ggplot2_files/figure-html/legendleft_ggplot2-1.png)<!-- -->

---
## Useful example for legend

We can control all aspects of a legend as we can for other theme elements.

``` r
pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,
                 colour=Sex))+geom_point()
pcPlot+theme(legend.text = element_text(colour="darkred"),
             legend.title = element_text(size=20),
             legend.position = "bottom"
             )
```

![](ggplot2_files/figure-html/legendText_ggplot2-1.png)<!-- -->

---
## + and %+replace%

When altering themes we have been using the **+** operator to add themes, as we would when adding geoms, scales and stats.

When using the **+** operator:

- Theme element attributes specified in the new theme replace those in the old theme.
- Theme element attributes in the old theme which have not been specified in the new theme are maintained.

This makes the **+** operator useful for building up from old themes.

---
## The **+** operator

In the example below, we maintain all elements set by theme_bw() but overwrite the colour attribute of the text theme element.

``` r
pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Weight,y=Height))+geom_point()+theme_bw()
pcPlot+
  theme(text = element_text(colour="red"))
```

![](ggplot2_files/figure-html/theme_custom8_ggplot2-1.png)<!-- -->

---
## **%+replace%**

In contrast, **%+replace%** replaces a theme element in its entirety, regardless of which attributes have previously been specified in the old theme.

When using the **%+replace%** operator:

- Theme elements specified in the new theme replace the corresponding elements in the old theme.
- Attributes of those elements which have not been specified in the new theme are not carried over from the old theme; they are left unset (NULL).
``` r
oldTheme <- theme_bw()

newTheme_Plus <- theme_bw() +
  theme(text = element_text(colour="red"))

newTheme_Replace <- theme_bw() %+replace%
  theme(text = element_text(colour="red"))
```

---
## + and %+replace%

### Original theme

``` r
oldTheme$text
```

```
## List of 11
##  $ family       : chr ""
##  $ face         : chr "plain"
##  $ colour       : chr "black"
##  $ size         : num 11
##  $ hjust        : num 0.5
##  $ vjust        : num 0.5
##  $ angle        : num 0
##  $ lineheight   : num 0.9
##  $ margin       : 'margin' num [1:4] 0points 0points 0points 0points
##   ..- attr(*, "unit")= int 8
##  $ debug        : logi FALSE
##  $ inherit.blank: logi TRUE
##  - attr(*, "class")= chr [1:2] "element_text" "element"
```

---
## + and %+replace%

### Theme modified with **+**

``` r
newTheme_Plus$text
```

```
## List of 11
##  $ family       : chr ""
##  $ face         : chr "plain"
##  $ colour       : chr "red"
##  $ size         : num 11
##  $ hjust        : num 0.5
##  $ vjust        : num 0.5
##  $ angle        : num 0
##  $ lineheight   : num 0.9
##  $ margin       : 'margin' num [1:4] 0points 0points 0points 0points
##   ..- attr(*, "unit")= int 8
##  $ debug        : logi FALSE
##  $ inherit.blank: logi FALSE
##  - attr(*, "class")= chr [1:2] "element_text" "element"
```

---
## + and %+replace%

### Theme modified with %+replace%

``` r
newTheme_Replace$text
```

```
## List of 11
##  $ family       : NULL
##  $ face         : NULL
##  $ colour       : chr "red"
##  $ size         : NULL
##  $ hjust        : NULL
##  $ vjust        : NULL
##  $ angle        : NULL
##  $ lineheight   : NULL
##  $ margin       : NULL
##  $ debug        : NULL
##  $ inherit.blank: logi FALSE
##  - attr(*, "class")= chr [1:2] "element_text" "element"
```

This means that %+replace% is most useful when creating new themes.

---
## theme_get and theme_set

In the examples we have shown you we have been modifying the theme for a specific plot. But once you have a theme that you really like you may want it to apply to every plot you draw.

The active theme is automatically applied to every plot you draw. Use theme_get to get the current theme, and theme_set to completely override it.
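
One practical detail, shown here as a small sketch rather than as part of the original material: theme_set() invisibly returns the theme that was active before the call, so it can be stored and restored later. The variable names `previousTheme`, `activeTheme` and `restoredTheme` are our own, chosen for illustration.

```r
library(ggplot2)

# theme_set() invisibly returns the theme that was active before the call,
# so it can be stored and restored later.
previousTheme <- theme_set(theme_bw())

# Plots drawn from here on pick up theme_bw() without adding it explicitly.
activeTheme <- theme_get()

# Restore the theme that was active before.
theme_set(previousTheme)
restoredTheme <- theme_get()
```

Restoring the previous theme in this way keeps a change of active theme from leaking into plots drawn later in the same session.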
``` r
newTheme <- theme_bw()
theme_set(newTheme)    # make newTheme the active theme
myTheme <- theme_get() # retrieve the currently active theme
```

---
class: inverse, center, middle

# Titles and Labels

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---
## Adding titles for plot and labels

So far no plot titles have been specified. Plot titles and axis labels can be specified using the labs() function.

``` r
pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Weight,y=Height))+geom_point()
pcPlot+labs(title="Weight vs Height",y="Height (cm)")
```

![](ggplot2_files/figure-html/theme_labs_ggplot2-1.png)<!-- -->

---
## Adding titles for plot and labels

You can also specify titles using the ggtitle() and xlab()/ylab() functions.

``` r
# Height is mapped to y so that it matches the "Height (cm)" label
pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Weight,y=Height))+geom_point()
pcPlot+ggtitle("Weight vs Height")+ylab("Height (cm)")
```

![](ggplot2_files/figure-html/theme_ggtitle_ggplot2-1.png)<!-- -->

---
class: inverse, center, middle

# Saving Plots

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---
## Saving plots

Plots produced by ggplot can be saved in the same way as [base plots](../singlepage/basePlotting.html#saving-your-plots).

The ggsave() function allows for additional arguments to be specified including the type, resolution and size of plot.

By default ggsave() will use the size of your current graphics window when saving plots, so it is often worth specifying the width and height arguments explicitly.
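
As a minimal, self-contained sketch of that last point: the toy data frame below is made up and only stands in for patients_clean, and the names `toyPatients`, `toyPlot` and `outFile` are hypothetical. It writes to a temporary PDF so that it can be run anywhere. ggsave() picks the output type from the file extension.

```r
library(ggplot2)

# Hypothetical toy data standing in for patients_clean (values are made up).
toyPatients <- data.frame(Weight = c(60, 72, 85, 90),
                          Height = c(160, 170, 178, 185))
toyPlot <- ggplot(toyPatients, aes(x = Weight, y = Height)) + geom_point()

# The output type is inferred from the file extension (here PDF).
# Explicit width/height/units make the result independent of the
# size of the current graphics window.
outFile <- tempfile(fileext = ".pdf")
ggsave(filename = outFile, plot = toyPlot,
       width = 15, height = 15, units = "cm")
```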
``` r
pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Weight,y=Height))+geom_point()
# Name the plot argument explicitly; width and height are given in cm
ggsave(filename = "anExampleplot.png", plot = pcPlot,
       width = 15, height = 15, units = "cm")
```

---
Exercise on scales and themes in ggplot can be found [here](../../exercises/exercises/ggplot2_2_exercise.html)

---
Answers to the exercise on scales and themes in ggplot can be found [here](../../exercises/answers/ggplot2_2_answers.html)

---
## References

- [Layered grammar of graphics](http://vita.had.co.nz/papers/layered-grammar.pdf)
- [ggplot2 documentation](http://docs.ggplot2.org/current/)
- [ggplot2 wiki](https://github.com/hadley/ggplot2/wiki)
- [ggplot2 mailing list](http://groups.google.com/group/ggplot2)
- [Cheatsheet](http://sape.inf.usi.ch/quick-reference/ggplot2/geom)
- [Even more material](http://tutorials.iq.harvard.edu/R/Rgraphics/Rgraphics.html)

---
## Contact

For any suggestions, comments, edits or questions (about content or the slides themselves), please reach out on our [GitHub](https://github.com/RockefellerUniversity/Plotting_In_R/issues) and raise an issue.