Advanced Visualisation and Data Wrangling in R
TJ McKinley
Introduction
This workshop requires that you’re comfortable with R, and specifically with the concept of data.frame objects. The ability to work with and visualise data frames is one of the key reasons why R is so popular amongst statisticians and data scientists. Although a great deal can be achieved with base R, another of R’s key strengths is the wide range of packages it supports, which add a rich variety of functionality.
A suite of packages that is fast becoming de rigueur for performing myriad data science tasks is known as the tidyverse. These packages provide powerful functions for visualising and manipulating complex data sets. In this workshop we will introduce key tidyverse packages, such as readr, tidyr, dplyr and ggplot2, and show how they can be used to efficiently process and visualise complex data.
tidyverse packages
The tidyverse is a suite of packages, including tidyr, dplyr, ggplot2, purrr, tibble and readr. Although these packages can each be installed and loaded separately, they are designed to work together, and as such we will simply install and load the tidyverse directly, rather than worry too much about which functions belong to which packages.
To install tidyverse, use:
install.packages("tidyverse")
and once installed, it can be loaded using:
library(tidyverse)
in the usual way.
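If a script may be run on a machine where the tidyverse is already present, the installation step can be made conditional. The following is a minimal sketch of one common pattern rather than part of the original workshop material; it assumes an internet connection and a configured CRAN mirror for the case where installation is needed.

```r
# Install the tidyverse only if it is not already available
# (assumption: a CRAN mirror is reachable if installation is required)
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# Attach the core tidyverse packages
library(tidyverse)

# List the names of the packages that make up the tidyverse
tidyverse_packages()
```

The call to tidyverse_packages() returns a character vector of package names, which can be a handy reference when you do want to know which packages are involved.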
Note: if you are loading tidyverse as part of an R Markdown document, and you want to knit to a PDF document using LaTeX, then it sometimes throws an error when loading because LaTeX can’t process the fonts needed for the loading message. Hence in R Markdown documents I always suppress the load messages through the chunk option message = FALSE, e.g.
```{r, message = FALSE}
library(tidyverse)
```
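An alternative that also works outside R Markdown is to wrap the call in base R's suppressPackageStartupMessages(). This is a sketch of that alternative, not the approach used in this workshop, and it has the same aim of keeping the loading message out of the output:

```r
# Attach the tidyverse without printing its startup messages
suppressPackageStartupMessages(library(tidyverse))
```

The chunk option hides all messages produced by the chunk, whereas this wrapper only silences messages emitted while the package is being attached.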
Data files and slides
All data files can be downloaded as a ZIP file from here. PDF copies of the workshop slides can be found through the following links: