intRos: Reproducible tables with flextable

Make reproducible tables of summary stats and contrasts tables with flextable in R
flextable
R
Author
Affiliation

Daniel Padfield

University of Exeter

Published

February 14, 2023

Modified

December 6, 2023

Outline

As scientists we make A LOT of tables, be it of summary statistics, model parameter values, contrast tables, or anything else. It used to be common practice to copy these individually into Excel or Word and then create a table to insert into your work, but this is prone to errors. No one is perfect after all.

This problem has now been solved by a bunch of different packages that allow for tables to be created in R and exported to a variety of formats. This means that fewer mistakes are made, that tables can easily be remade if the data or stats change, and that the process of making your tables is completely reproducible. The R package we will introduce to do this today is flextable, but there are other options available we won’t cover here such as gt and huxtable. I love flextable because it seems infinitely customisable and produces tables that are formatted beautifully to go into research articles.

flextable provides an easy and flexible way to create tables for reporting and presentations. It allows for customization of the table appearance, including cell borders, font styles, background color, and more. The package also supports different formats such as HTML, Microsoft Word, and PowerPoint. It is well suited for creating reports and presentations with a high level of formatting, while still being simple and straightforward to use. Additionally, flextable has a number of features that make it particularly useful for data visualisation, including the ability to merge cells, add custom functions for data formatting, and control the appearance of the table based on data values (i.e. making p values that are <0.05 bold).

Note: There are likely to be more elegant ways to do some of the code we present here. If you know of a simpler/easier way then we would love for you to help contribute to this project.

Prerequisites

  • Have R installed
  • Be using R for your research (or for funsies)
  • Hopefully have a table you want to make in R

Learning objectives

Learning objectives

This introduction should allow you to:

  • make publication-ready tables in flextable
  • format tables in flextable
  • export tables created in flextable
  • know where to learn more about flextable

Using flextable

This walkthrough will lean heavy on tools from the tidyverse for data wrangling, and the palmerpenguins package for the example data. The first thing we will do is load in the necessary packages and load in the dataset we will use to create our tables.

# load in required packages
library(flextable)
library(tidyverse)
library(emmeans)
library(palmerpenguins)

# load in data
data("penguins")
d <- penguins

# have a look at the data
glimpse(d)
Rows: 344
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex               <fct> male, female, female, NA, female, male, female, male…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…

This dataset contain data for 344 penguins, consisting of three species of penguins, collected from 3 islands in the Palmer Archipelago, Antarctica.

Make table of summary stats

It is common to make tables summarising aspects of any given dataset. We will do that here with flextable. Lets say we want to know the averages and standard deviations of body mass, bill length and bill depth of the different penguins species split by sex.

We can easily do this using group_by() and summarise() in dplyr. More information on these functions can be found in our tidyverse intRo.

Also because I dislike the behaviour of factors in R sometimes, we will first convert all columns that are factors into characters.

# convert all columns that are factors into characters
d <- mutate(d, across(where(is.factor), as.character))

# create summary data
d_summary <- group_by(d, species, sex) %>%
  summarise(
    # calculate number in each group
    num_penguins = n(),
    # can use across to apply the same functions to multiple columns
    across(c(bill_length_mm, bill_depth_mm, body_mass_g), list(mean = ~mean(.x, na.rm = TRUE), sd = ~sd(.x, na.rm = TRUE))),
    # specify we want to drop the groups
    .groups = 'drop')

# look at our table
d_summary
# A tibble: 8 × 9
  species   sex    num_penguins bill_length_mm_mean bill_length_mm_sd
  <chr>     <chr>         <int>               <dbl>             <dbl>
1 Adelie    female           73                37.3              2.03
2 Adelie    male             73                40.4              2.28
3 Adelie    <NA>              6                37.8              2.80
4 Chinstrap female           34                46.6              3.11
5 Chinstrap male             34                51.1              1.56
6 Gentoo    female           58                45.6              2.05
7 Gentoo    male             61                49.5              2.72
8 Gentoo    <NA>              5                45.6              1.37
# ℹ 4 more variables: bill_depth_mm_mean <dbl>, bill_depth_mm_sd <dbl>,
#   body_mass_g_mean <dbl>, body_mass_g_sd <dbl>

We can see there are some NAs for the sex of the penguins. We can make these unknown using replace_na().

# replace NAs in our sex column
d_summary <- mutate(d_summary, sex = replace_na(sex, 'unknown'))

We can immediately make this into a flextable using the flextable() function.

# make our table into a flex table
table1 <- flextable(d_summary)

table1

species

sex

num_penguins

bill_length_mm_mean

bill_length_mm_sd

bill_depth_mm_mean

bill_depth_mm_sd

body_mass_g_mean

body_mass_g_sd

Adelie

female

73

37.25753

2.028883

17.62192

0.9429927

3,368.836

269.3801

Adelie

male

73

40.39041

2.277131

19.07260

1.0188856

4,043.493

346.8116

Adelie

unknown

6

37.84000

2.802320

18.32000

1.2696456

3,540.000

477.1661

Chinstrap

female

34

46.57353

3.108669

17.58824

0.7811277

3,527.206

285.3339

Chinstrap

male

34

51.09412

1.564558

19.25294

0.7612730

3,938.971

362.1376

Gentoo

female

58

45.56379

2.051247

14.23793

0.5402493

4,679.741

281.5783

Gentoo

male

61

49.47377

2.720594

15.71803

0.7410596

5,484.836

313.1586

Gentoo

unknown

5

45.62500

1.374470

14.55000

0.8103497

4,587.500

338.1937

This already looks pretty good. But we can now add layers of formatting onto this table. The layers of formatting work much like the layers of ggplot2. You start with the base level and each new command creates new formatting of the whole table. Because of this, the order of the commands can make a big difference.

For this table, we will do a few different things:

  • rename the column names
  • merge the species column vertically to only have one instance of species
  • round the numbers to 2 decimal places
  • change the font and font size
  • play around with the column names to create a tiered column name for each trait (bill length, bill depth, body mass)
  • add in some horizontal lines to split species apart

We can do this by adding new layers using the %>%. Many of the functions involve you specifying what part you want the change to apply to (‘all’, ‘body’, or ‘header’). Specific rows can be specified using the i argument, and columns using j.

To add a new header row, we need to create a new object which contains the text we want to add.

For each new thing we do, I have commented what the command is doing so that the code makes as much sense as possible.

# text to be new header label
# add same name for columns we want to merge later on
pars <- as_paragraph(
  as_chunk(c('Species', 'Sex', 'n', 'Bill Length (mm)', 'Beak Length (mm)', 'Body Mass (g)'))
)


# edit table1
table1 <- flextable(d_summary) %>%
  # rename all column names
  set_header_labels(species = 'Species',
                    sex = 'Sex',
                    num_penguins = 'n',
                    bill_length_mm_mean = 'mean',
                    bill_length_mm_sd = 'sd',
                    bill_depth_mm_mean = 'mean',
                    bill_depth_mm_sd = 'sd',
                    body_mass_g_mean = 'mean',
                    body_mass_g_sd = 'sd') %>%
  add_header_row(values = pars, colwidths = c(1, 1, 1, 2, 2, 2), top = TRUE) %>% # add header row
  merge_v(j = c(1,2,3), part = 'header') %>%
  merge_v(~species) %>% # merge multiple instances of the same species name
  fix_border_issues() %>% # fix random border issues after merging
  valign(valign = 'top', j = 1, part = 'body') %>% # align Species column
  colformat_double(j = c(4,5,6,7,8,9), digits = 2) %>% # round numbers of specific columns to 2 decimal places
  align(align = 'center', part = 'header') %>% # align column names centrally
  align(align = 'left', part = 'body') %>% # align table output to the left
  hline(i = c(3, 5), border = fp_border_default()) %>% # add in custom horizontal lines to split penguins up
  font(fontname = 'Times', part = 'all') %>% # set font name for the table
  fontsize(size = 12, part = 'all') %>% # set font size for the table
  autofit()
  
# view table
table1

Species

Sex

n

Bill Length (mm)

Beak Length (mm)

Body Mass (g)

mean

sd

mean

sd

mean

sd

Adelie

female

73

37.26

2.03

17.62

0.94

3,368.84

269.38

male

73

40.39

2.28

19.07

1.02

4,043.49

346.81

unknown

6

37.84

2.80

18.32

1.27

3,540.00

477.17

Chinstrap

female

34

46.57

3.11

17.59

0.78

3,527.21

285.33

male

34

51.09

1.56

19.25

0.76

3,938.97

362.14

Gentoo

female

58

45.56

2.05

14.24

0.54

4,679.74

281.58

male

61

49.47

2.72

15.72

0.74

5,484.84

313.16

unknown

5

45.62

1.37

14.55

0.81

4,587.50

338.19

This is looking really good now! We can now export this table in a variety of formats (suitable for word documents, as an image or for powerpoint). I will write an exhaustive list of them here, but I generally like using an image for as long as possible so that I do not have as many formatting issues when putting tables into Microsoft Word or Google Docs.

# save as image
save_as_image(table1, 'where/to/save/table1.png', webshot = 'webshot2')

# save for word document
save_as_docx(table1, 'where/to/save/table1.docx', align = 'center')

# save for powerpoint
save_as_pptx(table1, path = 'where/to/save/table1.ppt')

Make table of model parameters

We can use very similar methods to summarise parameters from statistical models. To demonstrate this, I will fit a linear model looking at how body mass is different between species of penguins. We will then use emmeans() to extract model parameters and put them into a table.

# do models
mod1 <- lm(body_mass_g ~ species, d)
mod2 <- lm(body_mass_g ~ 1, d)

# check for significance using an anova
anova(mod1, mod2)
Analysis of Variance Table

Model 1: body_mass_g ~ species
Model 2: body_mass_g ~ 1
  Res.Df       RSS Df  Sum of Sq      F    Pr(>F)    
1    339  72443483                                   
2    341 219307697 -2 -146864214 343.63 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# super different

# grab estimates for each species using emmeans
d_params <- emmeans(mod1, ~species)

d_params
 species   emmean   SE  df lower.CL upper.CL
 Adelie      3701 37.6 339     3627     3775
 Chinstrap   3733 56.1 339     3623     3843
 Gentoo      5076 41.7 339     4994     5158

Confidence level used: 0.95 

We can now turn this into a nice table using similar methods to those we used above.

# text to be new header label
# add same name for columns we want to merge later on
pars <- as_paragraph(
  as_chunk(c('Species', 'Body Mass (g)', 'd.f.'))
)

data.frame(d_params) %>%
  # rejig column name orders to have degrees of freedom at the end
  select(species, emmean, SE, lower.CL, upper.CL, df) %>%
  flextable() %>%
  set_header_labels(species = 'Species',
                    emmean = 'Estimate',
                    SE = 's.e.',
                    df = 'd.f.',
                    lower.CL = "2.5%CI",
                    upper.CL = "97.5%CI") %>%
  add_header_row(values = pars, colwidths = c(1, 4, 1), top = TRUE) %>% # add header row
  merge_v(j = c(1,6), part = 'header') %>% # merge column names together
  italic(j = c(3, 6), part = 'header') %>% # make some column names italic
  colformat_double(j = c(2:5), digits = 2) %>% # round numbers of specific columns to 2 decimal places
  align(align = 'center', part = 'header') %>% # align column names centrally
  align(align = 'left', part = 'body') %>% # align table output to the left
  font(fontname = 'Times', part = 'all') %>% # set font name for the table
  fontsize(size = 12, part = 'all') %>% # set font size for the table
  autofit() # fix any random size issues

Species

Body Mass (g)

d.f.

Estimate

s.e.

2.5%CI

97.5%CI

Adelie

3,700.66

37.62

3,626.67

3,774.66

339

Chinstrap

3,733.09

56.06

3,622.82

3,843.36

339

Gentoo

5,076.02

41.68

4,994.03

5,158.00

339

Nice. This table can easily be extended to have other parameters in, and this approach can be reused for countless types of models.

Make table of contrasts

Another common thing we might want to do is look at which (if any) species have significantly different body mass to each other. We can do this by doing post-hoc contrasts between the different species, given we know there is a significant effect of species on body mass.

Again we can look at this using emmeans().

contrasts <- emmeans(mod1, pairwise ~ species)
contrasts$contrasts
 contrast           estimate   SE  df t.ratio p.value
 Adelie - Chinstrap    -32.4 67.5 339  -0.480  0.8807
 Adelie - Gentoo     -1375.4 56.1 339 -24.495  <.0001
 Chinstrap - Gentoo  -1342.9 69.9 339 -19.224  <.0001

P value adjustment: tukey method for comparing a family of 3 estimates 

We can easily turn this into a publication-ready table using flextable(). As the p values are so low, we will conver them to being <0.0001 where appropriate, and convert that column to a character. This gives us more control over how flextable prints the output.

This is appropriate here but might not be the best way in other cases.

contrasts$contrasts %>%
  data.frame() %>%
  mutate(p.value = ifelse(p.value < 0.0001, "<0.0001", as.character(round(p.value, 2)))) %>%
  flextable() %>%
  set_header_labels(contrast = 'Contrast',
                    emmean = 'Estimate',
                    SE = 's.e.',
                    df = 'd.f.',
                    t.ratio = "t-ratio",
                    p.value = "p value") %>%
  italic(j = c(3:6), part = 'header') %>% # make some column names italic
  colformat_double(j = c(2:6), digits = 2) %>% # round numbers of specific columns to 2 decimal places
  align(align = 'center', part = 'header') %>% # align column names centrally
  align(align = 'left', part = 'body') %>% # align table output to the left
  font(fontname = 'Times', part = 'all') %>% # set font name for the table
  bold(~p.value == "<0.0001", j = "p.value") %>% # make significant p values bold
  fontsize(size = 12, part = 'all') %>% # set font size for the table
  autofit() # fix any random size issues

Contrast

estimate

s.e.

d.f.

t-ratio

p value

Adelie - Chinstrap

-32.43

67.51

339.00

-0.48

0.88

Adelie - Gentoo

-1,375.35

56.15

339.00

-24.50

<0.0001

Chinstrap - Gentoo

-1,342.93

69.86

339.00

-19.22

<0.0001

Summary

We have shown how to produce a range of publication-ready tables using flextable for summarising data, presenting model parameters, and contrast tables. Importantly all of these processes are reproducible and limit the chances of us making errors in our workflow.

All of the code here can be used to create your own tables. Have fun!

Additional Resources

Acknowledgements

We did not create this content alone! Inspiration, tips, and resources have been borrowed from multiple sources.

  • Most of the inspiration for this was taken from David Gohel’s great work in developing flextable
  • Tips for creating the post, website, and format for the walkthroughs were taken from Stephanie Hicks’ Course website and source code.