Make reproducible tables of summary stats and contrasts tables with flextable in R
flextable
R
Author
Affiliation
Daniel Padfield
University of Exeter
Published
February 14, 2023
Modified
July 19, 2024
Outline
As scientists we make A LOT of tables, be it of summary statistics, model parameter values, contrast tables, or anything else. It used to be common practice to copy these individually into Excel or Word and then create a table to insert into your work, but this is prone to errors. No one is perfect after all.
This problem has now been solved by a bunch of different packages that allow for tables to be created in R and exported to a variety of formats. This means that fewer mistakes are made, that tables can easily be remade if the data or stats change, and that the process of making your tables is completely reproducible. The R package we will introduce to do this today is flextable, but there are other options available we won’t cover here such as gt and huxtable. I love flextable because it seems infinitely customisable and produces tables that are formatted beautifully to go into research articles.
flextable provides an easy and flexible way to create tables for reporting and presentations. It allows for customization of the table appearance, including cell borders, font styles, background color, and more. The package also supports different formats such as HTML, Microsoft Word, and PowerPoint. It is well suited for creating reports and presentations with a high level of formatting, while still being simple and straightforward to use. Additionally, flextable has a number of features that make it particularly useful for data visualisation, including the ability to merge cells, add custom functions for data formatting, and control the appearance of the table based on data values (i.e. making p values that are <0.05 bold).
Note: There are likely to be more elegant ways to do some of the code we present here. If you know of a simpler/easier way then we would love for you to help contribute to this project.
Prerequisites
Have R installed
Be using R for your research (or for funsies)
Hopefully have a table you want to make in R
Learning objectives
Learning objectives
This introduction should allow you to:
make publication-ready tables in flextable
format tables in flextable
export tables created in flextable
know where to learn more about flextable
Using flextable
This walkthrough will lean heavy on tools from the tidyverse for data wrangling, and the palmerpenguins package for the example data. The first thing we will do is load in the necessary packages and load in the dataset we will use to create our tables.
# load in required packageslibrary(flextable)library(tidyverse)library(emmeans)library(palmerpenguins)# load in datadata("penguins")d <- penguins# have a look at the dataglimpse(d)
This dataset contain data for 344 penguins, consisting of three species of penguins, collected from 3 islands in the Palmer Archipelago, Antarctica.
Make table of summary stats
It is common to make tables summarising aspects of any given dataset. We will do that here with flextable. Lets say we want to know the averages and standard deviations of body mass, bill length and bill depth of the different penguins species split by sex.
We can easily do this using group_by() and summarise() in dplyr. More information on these functions can be found in our tidyverse intRo.
Also because I dislike the behaviour of factors in R sometimes, we will first convert all columns that are factors into characters.
# convert all columns that are factors into charactersd <-mutate(d, across(where(is.factor), as.character))# create summary datad_summary <-group_by(d, species, sex) %>%summarise(# calculate number in each groupnum_penguins =n(),# can use across to apply the same functions to multiple columnsacross(c(bill_length_mm, bill_depth_mm, body_mass_g), list(mean =~mean(.x, na.rm =TRUE), sd =~sd(.x, na.rm =TRUE))),# specify we want to drop the groups.groups ='drop')# look at our tabled_summary
We can see there are some NAs for the sex of the penguins. We can make these unknown using replace_na().
# replace NAs in our sex columnd_summary <-mutate(d_summary, sex =replace_na(sex, 'unknown'))
We can immediately make this into a flextable using the flextable() function.
# make our table into a flex tabletable1 <-flextable(d_summary)table1
species
sex
num_penguins
bill_length_mm_mean
bill_length_mm_sd
bill_depth_mm_mean
bill_depth_mm_sd
body_mass_g_mean
body_mass_g_sd
Adelie
female
73
37.25753
2.028883
17.62192
0.9429927
3,368.836
269.3801
Adelie
male
73
40.39041
2.277131
19.07260
1.0188856
4,043.493
346.8116
Adelie
unknown
6
37.84000
2.802320
18.32000
1.2696456
3,540.000
477.1661
Chinstrap
female
34
46.57353
3.108669
17.58824
0.7811277
3,527.206
285.3339
Chinstrap
male
34
51.09412
1.564558
19.25294
0.7612730
3,938.971
362.1376
Gentoo
female
58
45.56379
2.051247
14.23793
0.5402493
4,679.741
281.5783
Gentoo
male
61
49.47377
2.720594
15.71803
0.7410596
5,484.836
313.1586
Gentoo
unknown
5
45.62500
1.374470
14.55000
0.8103497
4,587.500
338.1937
This already looks pretty good. But we can now add layers of formatting onto this table. The layers of formatting work much like the layers of ggplot2. You start with the base level and each new command creates new formatting of the whole table. Because of this, the order of the commands can make a big difference.
For this table, we will do a few different things:
rename the column names
merge the species column vertically to only have one instance of species
round the numbers to 2 decimal places
change the font and font size
play around with the column names to create a tiered column name for each trait (bill length, bill depth, body mass)
add in some horizontal lines to split species apart
We can do this by adding new layers using the %>%. Many of the functions involve you specifying what part you want the change to apply to (‘all’, ‘body’, or ‘header’). Specific rows can be specified using the i argument, and columns using j.
To add a new header row, we need to create a new object which contains the text we want to add.
For each new thing we do, I have commented what the command is doing so that the code makes as much sense as possible.
# text to be new header label# add same name for columns we want to merge later onpars <-as_paragraph(as_chunk(c('Species', 'Sex', 'n', 'Bill Length (mm)', 'Beak Length (mm)', 'Body Mass (g)')))# edit table1table1 <-flextable(d_summary) %>%# rename all column namesset_header_labels(species ='Species',sex ='Sex',num_penguins ='n',bill_length_mm_mean ='mean',bill_length_mm_sd ='sd',bill_depth_mm_mean ='mean',bill_depth_mm_sd ='sd',body_mass_g_mean ='mean',body_mass_g_sd ='sd') %>%add_header_row(values = pars, colwidths =c(1, 1, 1, 2, 2, 2), top =TRUE) %>%# add header rowmerge_v(j =c(1,2,3), part ='header') %>%merge_v(~species) %>%# merge multiple instances of the same species namefix_border_issues() %>%# fix random border issues after mergingvalign(valign ='top', j =1, part ='body') %>%# align Species columncolformat_double(j =c(4,5,6,7,8,9), digits =2) %>%# round numbers of specific columns to 2 decimal placesalign(align ='center', part ='header') %>%# align column names centrallyalign(align ='left', part ='body') %>%# align table output to the lefthline(i =c(3, 5), border =fp_border_default()) %>%# add in custom horizontal lines to split penguins upfont(fontname ='Times', part ='all') %>%# set font name for the tablefontsize(size =12, part ='all') %>%# set font size for the tableautofit()# view tabletable1
Species
Sex
n
Bill Length (mm)
Beak Length (mm)
Body Mass (g)
mean
sd
mean
sd
mean
sd
Adelie
female
73
37.26
2.03
17.62
0.94
3,368.84
269.38
male
73
40.39
2.28
19.07
1.02
4,043.49
346.81
unknown
6
37.84
2.80
18.32
1.27
3,540.00
477.17
Chinstrap
female
34
46.57
3.11
17.59
0.78
3,527.21
285.33
male
34
51.09
1.56
19.25
0.76
3,938.97
362.14
Gentoo
female
58
45.56
2.05
14.24
0.54
4,679.74
281.58
male
61
49.47
2.72
15.72
0.74
5,484.84
313.16
unknown
5
45.62
1.37
14.55
0.81
4,587.50
338.19
This is looking really good now! We can now export this table in a variety of formats (suitable for word documents, as an image or for powerpoint). I will write an exhaustive list of them here, but I generally like using an image for as long as possible so that I do not have as many formatting issues when putting tables into Microsoft Word or Google Docs.
# save as imagesave_as_image(table1, 'where/to/save/table1.png', webshot ='webshot2')# save for word documentsave_as_docx(table1, 'where/to/save/table1.docx', align ='center')# save for powerpointsave_as_pptx(table1, path ='where/to/save/table1.ppt')
Make table of model parameters
We can use very similar methods to summarise parameters from statistical models. To demonstrate this, I will fit a linear model looking at how body mass is different between species of penguins. We will then use emmeans() to extract model parameters and put them into a table.
# do modelsmod1 <-lm(body_mass_g ~ species, d)mod2 <-lm(body_mass_g ~1, d)# check for significance using an anovaanova(mod1, mod2)
Analysis of Variance Table
Model 1: body_mass_g ~ species
Model 2: body_mass_g ~ 1
Res.Df RSS Df Sum of Sq F Pr(>F)
1 339 72443483
2 341 219307697 -2 -146864214 343.63 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# super different# grab estimates for each species using emmeansd_params <-emmeans(mod1, ~species)d_params
We can now turn this into a nice table using similar methods to those we used above.
# text to be new header label# add same name for columns we want to merge later onpars <-as_paragraph(as_chunk(c('Species', 'Body Mass (g)', 'd.f.')))data.frame(d_params) %>%# rejig column name orders to have degrees of freedom at the endselect(species, emmean, SE, lower.CL, upper.CL, df) %>%flextable() %>%set_header_labels(species ='Species',emmean ='Estimate',SE ='s.e.',df ='d.f.',lower.CL ="2.5%CI",upper.CL ="97.5%CI") %>%add_header_row(values = pars, colwidths =c(1, 4, 1), top =TRUE) %>%# add header rowmerge_v(j =c(1,6), part ='header') %>%# merge column names togetheritalic(j =c(3, 6), part ='header') %>%# make some column names italiccolformat_double(j =c(2:5), digits =2) %>%# round numbers of specific columns to 2 decimal placesalign(align ='center', part ='header') %>%# align column names centrallyalign(align ='left', part ='body') %>%# align table output to the leftfont(fontname ='Times', part ='all') %>%# set font name for the tablefontsize(size =12, part ='all') %>%# set font size for the tableautofit() # fix any random size issues
Species
Body Mass (g)
d.f.
Estimate
s.e.
2.5%CI
97.5%CI
Adelie
3,700.66
37.62
3,626.67
3,774.66
339
Chinstrap
3,733.09
56.06
3,622.82
3,843.36
339
Gentoo
5,076.02
41.68
4,994.03
5,158.00
339
Nice. This table can easily be extended to have other parameters in, and this approach can be reused for countless types of models.
Make table of contrasts
Another common thing we might want to do is look at which (if any) species have significantly different body mass to each other. We can do this by doing post-hoc contrasts between the different species, given we know there is a significant effect of species on body mass.
contrast estimate SE df t.ratio p.value
Adelie - Chinstrap -32.4 67.5 339 -0.480 0.8807
Adelie - Gentoo -1375.4 56.1 339 -24.495 <.0001
Chinstrap - Gentoo -1342.9 69.9 339 -19.224 <.0001
P value adjustment: tukey method for comparing a family of 3 estimates
We can easily turn this into a publication-ready table using flextable(). As the p values are so low, we will conver them to being <0.0001 where appropriate, and convert that column to a character. This gives us more control over how flextable prints the output.
This is appropriate here but might not be the best way in other cases.
contrasts$contrasts %>%data.frame() %>%mutate(p.value =ifelse(p.value <0.0001, "<0.0001", as.character(round(p.value, 2)))) %>%flextable() %>%set_header_labels(contrast ='Contrast',emmean ='Estimate',SE ='s.e.',df ='d.f.',t.ratio ="t-ratio",p.value ="p value") %>%italic(j =c(3:6), part ='header') %>%# make some column names italiccolformat_double(j =c(2:6), digits =2) %>%# round numbers of specific columns to 2 decimal placesalign(align ='center', part ='header') %>%# align column names centrallyalign(align ='left', part ='body') %>%# align table output to the leftfont(fontname ='Times', part ='all') %>%# set font name for the tablebold(~p.value =="<0.0001", j ="p.value") %>%# make significant p values boldfontsize(size =12, part ='all') %>%# set font size for the tableautofit() # fix any random size issues
Contrast
estimate
s.e.
d.f.
t-ratio
p value
Adelie - Chinstrap
-32.43
67.51
339.00
-0.48
0.88
Adelie - Gentoo
-1,375.35
56.15
339.00
-24.50
<0.0001
Chinstrap - Gentoo
-1,342.93
69.86
339.00
-19.22
<0.0001
Summary
We have shown how to produce a range of publication-ready tables using flextable for summarising data, presenting model parameters, and contrast tables. Importantly all of these processes are reproducible and limit the chances of us making errors in our workflow.
All of the code here can be used to create your own tables. Have fun!