-
Notifications
You must be signed in to change notification settings - Fork 1
/
README.Rmd
110 lines (79 loc) · 6.62 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
---
output:
github_document:
fig_width: 5
fig_height: 5
html_document:
fig_width: 5
fig_height: 5
self_contained: no
keep_md: yes
---
```{r setup, include=FALSE}
options(width = 90)
knitr::opts_chunk$set(fig.retina = 2,
tidy.opts = list(width.cutoff = 90), # For code
width = 90) # For output
knitr::read_chunk("broomify-amelia.R")
```
# Show multiply imputed results in a side-by-side regression table with broom and huxtable
[Andrew Heiss](https://www.andrewheiss.com) • 2018-03-08
---
*tl;dr*: Use the functions in [`broomify-amelia.R`](broomify-amelia.R) to use `broom::tidy()`, `broom::glance()`, and `huxtable::huxreg()` on lists of multiply imputed models.
---
The whole reason I went into the rabbit hole of the mechanics of merging imputed regression results [in the previous post](https://www.andrewheiss.com/blog/2018/03/07/amelia-tidy-melding/) was so I could easily report these results in papers and writeups. In political science and economics (and probably other social science disciplines), it's fairly standard to report many regression models in a side-by-side table, with a column for each model and rows for each coefficient. R packages like [`stargazer`](https://cran.r-project.org/package=stargazer) and [`huxtable`](https://cran.r-project.org/package=huxtable) make this fairly straightforward.
```{r load-libraries-impute, warning=FALSE, message=FALSE}
library(tidyverse)
library(Amelia)
library(stargazer)
library(huxtable)
library(broom)
# Use the africa dataset from Ameila
data(africa)
# Build some example models
model_original1 <- lm(gdp_pc ~ trade + civlib, data = africa)
model_original2 <- lm(gdp_pc ~ trade + civlib + infl, data = africa)
```
Stargazer takes a list of models:
```{r stargazer, warning=FALSE, results="markup"}
stargazer(model_original1, model_original2, type = "text")
```
As does huxtable's `huxreg()`:
```{r huxtable, results="markup"}
huxreg(model_original1, model_original2) %>%
print_screen()
```
[Stargazer has support for a ton of different model types](https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf) (see `?stargazer` for details), but they're all hardcoded into stargazer's internal code and adding more is tricky. [Huxtable](https://hughjonesd.github.io/huxtable/), on the other hand, doesn't rely on hardcoded model processing, but instead will display any model that works with `broom::tidy()` and `broom::glance()`. The [`broom` package](https://cran.r-project.org/package=broom) supports way more models than stargazer (including models created with [`rstan`](https://cran.r-project.org/package=rstan) and [`rstanarm`](https://cran.r-project.org/package=rstanarm)!), and because of this, huxtable is far more extensible—if you can create a `tidy()` and a `glance()` function for a type of model, huxtable can use it.
Also, stargazer was written before [R Markdown](https://rmarkdown.rstudio.com/) was really a thing, so it has excellent support for HTML and LaTeX output, but that's it. Including stargazer tables in an R Markdown document is a hassle, especially if you want to be able to convert it to Word ([I've written a Python script for doing this](https://github.com/andrewheiss/edb-social-pressure/blob/master/bin/stargazer2docx.py)—that's how much extra work it takes). Huxtable, though, was written after the R Markdown and tidyverse revolutions, so it supports piping *and* can output to HTML, LaTeX, *and* Markdown (with `huxtable::print_md()`).
This history is important because it means that models based on multiple imputation **will not work with stargazer.** Melding all the coefficients across imputations creates nice data frames of model results, but it doesn't create a model that stargazer can work with. This is unfortunate, especially given [how much I use stargazer](https://github.com/search?l=&q=stargazer+user%3Aandrewheiss&ref=advsearch&type=Code&utf8=%E2%9C%93). However, if we could make a `tidy()` and a `glance()` function that could work with a list of multiply imputed models, huxtable would solve all our problems.
So here's how to solve all your problems :)
First, we'll impute the missing data in the Africa data set, nest the imputed data in a larger data frame, and run a model on each imputed dataset:
```{r imputation-models}
set.seed(1234)
imp_amelia <- amelia(x = africa, m = 5, cs = "country", ts = "year", logs = "gdp_pc", p2s = 0)
models_imputed_df <- bind_rows(unclass(imp_amelia$imputations), .id = "m") %>%
group_by(m) %>%
nest() %>%
mutate(model = data %>% map(~ lm(gdp_pc ~ trade + civlib, data = .)))
models_imputed_df
```
Before we do anything with the models in `models_imputed_df$model`, first we can define a few functions to extend broom. R's S3 object system means that a function named `whatever.blah()` will automatically work when called on objects with the class `blah`. This is how broom generally works—there are functions named `tidy.anova()`, `tidy.glm()`, `tidy.lm()`, etc. that will do the correct tidying when run on `anova`, `glm`, and `lm` objects. Huxtable also takes advantage of this S3 object system—it will call the appropriate tidy and glance functions based on the class of the models passed to it.
To make a list of models work with broom, we need to invent a new class of model. In this example I've named it `melded`, but it could be anything. Here are three functions designed to work on `melded` objects (the code for these is largely based on [the previous post about melding coefficients](https://www.andrewheiss.com/blog/2018/03/07/amelia-tidy-melding/)). These functions are also found in [`broomify-amelia.R`](broomify-amelia.R), which you can add to your project (maybe someday this could be an actual package, but I don't see a reason for it yet).
```{r broomify, message=FALSE}
```
With these three functions, we can now use `glance()` and `tidy()` on a list of models with the class `melded`, like so:
```{r}
# Extract the models into a vector and make it a "melded" class
models_imputed <- models_imputed_df$model
# Without this, R won't use our custom tidy.melded() or glance.melded() functions
class(models_imputed) <- "melded"
glance(models_imputed)
tidy(models_imputed)
```
Even better, though, is that we can use these imputed models in a huxtable regression table. And, because I included a column named `m` in `glance.melded()`, we can also include it in the regression output!
```{r huxtable-table, warning=FALSE, message=FALSE, results="markup"}
huxreg(model_original1, model_original2, models_imputed,
statistics = c(N = "nobs", R2 = "r.squared", `Adj R2` = "adj.r.squared",
"logLik", "AIC", "m")) %>%
print_screen()
```