Using imputation models already fitted to impute missing values in new individuals? #336
Replies: 3 comments
-
There was an old request for this feature at #32. See the discussion - and our solution using the
-
I'm going to dive in, thanks!
-
Here is a small reprex demonstrating two ways to fit the imputation model on the training data and apply it to the test data.

library("mice", warn.conflicts = FALSE)
set.seed(123)
ignore <- sample(c(TRUE, FALSE), size = 25, replace = TRUE, prob = c(0.3, 0.7))
# scenario 1: train and test in the same dataset
imp <- mice(nhanes2, m = 2, ignore = ignore, print = FALSE)
imp.test1 <- filter(imp, ignore)
imp.test1$data
#> age bmi hyp chl
#> 2 40-59 22.7 no 187
#> 4 60-99 NA <NA> NA
#> 5 20-39 20.4 no 113
#> 8 20-39 30.1 no 187
#> 11 20-39 NA <NA> NA
#> 16 20-39 NA <NA> NA
#> 20 60-99 25.5 yes NA
#> 21 20-39 NA <NA> NA
#> 24 60-99 24.9 no NA
complete(imp.test1, 1)
#> age bmi hyp chl
#> 2 40-59 22.7 no 187
#> 4 60-99 21.7 yes 184
#> 5 20-39 20.4 no 113
#> 8 20-39 30.1 no 187
#> 11 20-39 28.7 no 229
#> 16 20-39 26.3 no 184
#> 20 60-99 25.5 yes 186
#> 21 20-39 28.7 no 199
#> 24 60-99 24.9 no 218
complete(imp.test1, 2)
#> age bmi hyp chl
#> 2 40-59 22.7 no 187
#> 4 60-99 27.4 yes 218
#> 5 20-39 20.4 no 113
#> 8 20-39 30.1 no 187
#> 11 20-39 28.7 no 204
#> 16 20-39 29.6 no 206
#> 20 60-99 25.5 yes 284
#> 21 20-39 22.0 yes 131
#> 24 60-99 24.9 no 229
# scenario 2: train and test in separate datasets
traindata <- nhanes2[!ignore, ]
testdata <- nhanes2[ignore, ]
imp.train <- mice(traindata, m = 2, print = FALSE)
imp.test2 <- mice.mids(imp.train, newdata = testdata)
#>
#> iter imp variable
#> 6 1 bmi hyp chl
#> 6 2 bmi hyp chl
complete(imp.test2, 1)
#> age bmi hyp chl
#> 2 40-59 22.7 no 187
#> 4 60-99 27.4 no 206
#> 5 20-39 20.4 no 113
#> 8 20-39 30.1 no 187
#> 11 20-39 26.3 no 118
#> 16 20-39 29.6 no 238
#> 20 60-99 25.5 yes 199
#> 21 20-39 27.4 no 187
#> 24 60-99 24.9 no 206
complete(imp.test2, 2)
#> age bmi hyp chl
#> 2 40-59 22.7 no 187
#> 4 60-99 27.5 yes 184
#> 5 20-39 20.4 no 113
#> 8 20-39 30.1 no 187
#> 11 20-39 29.6 no 187
#> 16 20-39 33.2 no 218
#> 20 60-99 25.5 yes 184
#> 21 20-39 28.7 no 238
#> 24 60-99 24.9 no 184

Created on 2021-12-06 by the reprex package (v2.0.1)
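If the new individuals only arrive after the model has been fitted, scenario 2 can presumably be combined with base R's `saveRDS()` and `readRDS()`. The sketch below is an illustration under that assumption and is not part of the reprex above; the temporary file path and the object names `imp.stored` and `imp.new` are made up for the example. As the `iter 6` line in the output above suggests, `mice.mids()` runs one further iteration when given `newdata`, so this may not be identical to reusing a fixed vector of parameter estimates.

```r
library("mice", warn.conflicts = FALSE)
set.seed(123)

# Same train/test split as in the reprex above
ignore <- sample(c(TRUE, FALSE), size = 25, replace = TRUE, prob = c(0.3, 0.7))
traindata <- nhanes2[!ignore, ]
testdata <- nhanes2[ignore, ]

# Fit the imputation model on the training data and store it on disk
imp.train <- mice(traindata, m = 2, print = FALSE)
path <- tempfile(fileext = ".rds")  # hypothetical storage location
saveRDS(imp.train, path)

# Later, possibly in another session: reload the model and impute new rows
imp.stored <- readRDS(path)
imp.new <- mice.mids(imp.stored, newdata = testdata, printFlag = FALSE)

# First completed version of the new data
completed <- complete(imp.new, 1)
```

The new data need the same columns and factor levels as the training data. No output is shown here because the random number stream differs from the one in the reprex, so the imputed values will not match those printed above.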
-
Hi,
I was wondering whether it is possible (and if so, how) to implement one of the methods described in https://onlinelibrary.wiley.com/doi/10.1002/sim.8682 (open access) for imputing missing predictor values in new individuals. More precisely, I would like to implement method 6, which uses the vector of parameter estimates for each of the fully conditional models, as derived in a development dataset (where missing values were imputed by MICE), to impute missing data in new individuals. The method is described in the last paragraph of section 2.3.
I guess this is not straightforward, but is it possible using the mice package?
Best,
David
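To make the idea concrete, here is a deliberately simplified sketch in base R of what reusing stored parameter estimates could look like for a single conditional model. It is an illustration only, not the paper's method 6 and not a mice feature. The assumptions: only the complete cases of `nhanes2` serve as development data, only `bmi` is missing in the new individual, and the stored estimates are treated as fixed (no parameter uncertainty). The names `stored`, `new` and `imputed_bmi` are hypothetical.

```r
nhanes2 <- mice::nhanes2  # example data shipped with mice
set.seed(1)

# Development data: complete cases only (a simplifying assumption)
dev <- na.omit(nhanes2)

# Fit one conditional model and keep only what is needed to impute later
fit <- lm(bmi ~ age + chl, data = dev)
stored <- list(
  beta    = coef(fit),                   # vector of parameter estimates
  sigma   = summary(fit)$sigma,          # residual standard deviation
  terms   = delete.response(terms(fit)), # predictor side of the model
  xlevels = fit$xlevels                  # factor levels seen in development
)

# A new individual with missing bmi (values invented for illustration)
new <- data.frame(
  age = factor("40-59", levels = levels(nhanes2$age)),
  bmi = NA_real_,
  chl = 187
)

# Linear predictor from the stored estimates, plus residual noise
X <- model.matrix(stored$terms, data = new, xlev = stored$xlevels)
mu <- drop(X %*% stored$beta)
m <- 5  # number of imputations
imputed_bmi <- mu + rnorm(m, mean = 0, sd = stored$sigma)
```

A full version would need one such stored model per incomplete variable and would cycle through them, which is what makes the question non-trivial.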