I’m at the R/Medicine conference (no it’s not a Reddit thing) and got to help Alison Hill with her R Markdown for Medicine workshop. One of the questions I got to tinker with was making tables used to report model results.
One technique I learned while doing my MPH was to add variables to your model in blocks. It reduces the number of tests you need to perform, and is more meaningful than saying “I ran stepwise”. So, as a researcher, you will add variables in meaningful blocks and then see if that block of variables is “significant” by looking at the F-statistic.
Here is one example of at least putting your model results into a table.
Creating the models
library(reshape2) # using this just for the tips datasetlibrary(broom)library(purrr)library(knitr)library(kableExtra)head(tips)
total_bill tip sex smoker day time size
1 16.99 1.01 Female No Sun Dinner 2
2 10.34 1.66 Male No Sun Dinner 3
3 21.01 3.50 Male No Sun Dinner 3
4 23.68 3.31 Male No Sun Dinner 2
5 24.59 3.61 Female No Sun Dinner 4
6 25.29 4.71 Male No Sun Dinner 4
# create a bumch of modelsm1 <-lm(tip ~ total_bill, data = tips)m2 <-lm(tip ~ total_bill + sex, data = tips)m3 <-lm(tip ~ total_bill + sex + smoker, data = tips)m4 <-lm(tip ~ total_bill + smoker + size, data = tips)
Now that we have the models, we can use the broom package to get a nice consistent dataframe of the coefficients
# get a dataframe of the model estimatesmodels <-list(m1, m2, m3, m4)broom_outputs <- purrr::map(models, broom::tidy)broom_outputs
What we have now is a list of dataframes, one for each model we fit.
Just the beta estimates
If we just want the coefficients, we only need the terms and estimates columns and do a join. We can use the purrr::reduce function to do this
# join them all togethermodel_outputs <- purrr::reduce(broom_outputs, dplyr::full_join, by ='term')model_outputs
# A tibble: 5 × 17
term estimate.x std.error.x statistic.x p.value.x estimate.y std.error.y
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercep… 0.920 0.160 5.76 2.53e- 8 0.933 0.174
2 total_bill 0.105 0.00736 14.3 6.69e-34 0.105 0.00746
3 sexMale NA NA NA NA -0.0266 0.138
4 smokerYes NA NA NA NA NA NA
5 size NA NA NA NA NA NA
# ℹ 10 more variables: statistic.y <dbl>, p.value.y <dbl>, estimate.x.x <dbl>,
# std.error.x.x <dbl>, statistic.x.x <dbl>, p.value.x.x <dbl>,
# estimate.y.y <dbl>, std.error.y.y <dbl>, statistic.y.y <dbl>,
# p.value.y.y <dbl>
This will join all the dataframes together using the term column as the key. Variables that do not exist in the model will be filled with an NA. Problem now, is that we have a bunch of columns we don’t need. So the next step is to pull out the columns we want, and rename them accordingly.
# A tibble: 5 × 5
Variable `Model 1` `Model 2` `Model 3` `Model 4`
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.920 0.933 0.977 0.709
2 total_bill 0.105 0.105 0.106 0.0939
3 sexMale NA -0.0266 -0.0281 NA
4 smokerYes NA NA -0.149 -0.0834
5 size NA NA NA 0.180
What if I need more variables?
Just reporting the beta coefficients is not how you want to report your findings, typically you’d want to also display some measure of uncertainty along with the results. Here I’m just going to add the std.error column, but you can calculate the 95% confidence intervals using the std.error.
Here’s the reference from the kableExtra library I’m using to build the coefficient table:
# this is the exmaple form the kable docs I used for reference# https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.htmlcollapse_rows_dt <-data.frame(C1 =c(rep("a", 3), rep("b", 2)),C2 =c(rep("c", 2), rep("d", 1), rep("c", 1), rep("d", 1)),C3 =1:5,C4 =sample(c(0,1), 5, replace =TRUE))collapse_rows_dt
C1 C2 C3 C4
1 a c 1 0
2 a c 2 1
3 a d 3 0
4 b c 4 1
5 b d 5 0
This example was important to my solution, becuase I would need to format my coefficient table the same way as collapse_rows_dt. This means that:
C1 would be the variables in my model
C2 would need to alternate between the estimate and the std.error
The other columns would be the actual values for each model.
Since the model outputs the estimate and std.error as columns, I know some kind of reshaping/tidying step will need to be involved (melt/gather / pivot_longer).
We still start from the same data steps as the previous example
Next, you need to create some kind of ID value for each model you’re creating, that way when you’re combining everything into a single dataframe, the estimates are traveling together with the standard error. Here I’m labeling each model as a number starting with 1.
tidy_tables <- purrr::map2_df(broom_stats_long_filter, 1:length(broom_stats_long_filter),function(x, y) dplyr::mutate(x, mod = y))tidy_tables