mice runtime on large datasets #404

amirdol · 2021-06-08T20:24:18Z

Hi mice contributors!
I have a research project that requires imputation, and I'm not sure how best to approach a problem I've run into. Any help will be greatly appreciated.

I have a large data set of 500K rows, with ~20 variables that have ~30% missing values on average.

Running mice with any imputation method (cart, norm, etc.) and m = 5 imputations (with 5 iterations) is a very slow process: it can take ~10 hours.
I'm using the ignore argument to fit the imputation models on a smaller subset of the data (~150K rows), on the assumption that this speeds up the imputation process. Is this right?
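A minimal sketch of this setup, run on simulated data. The data frame dat, its columns, and the 30% fitting fraction are illustrative assumptions, not the real dataset. According to the mice documentation, rows with ignore = TRUE do not influence the parameters of the imputation models but are still imputed, so model fitting gets cheaper while every missing value is still filled in:

```r
library(mice)

set.seed(2021)
n <- 5000

# Hypothetical toy data standing in for the real 500K-row dataset
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- dat$x1 + dat$x2 + dat$x3 + rnorm(n)

# Set roughly 30% of each variable to missing, completely at random
for (v in names(dat)) dat[[v]][runif(n) < 0.3] <- NA

# Assumption: use roughly 30% of the rows (cf. 150K of 500K) to fit the models.
# Rows with ignore = TRUE are left out of model fitting but are still imputed.
fit_rows <- runif(n) < 0.3
imp <- mice(dat, m = 5, maxit = 5, method = "norm",
            ignore = !fit_rows, printFlag = FALSE)

# Inspect the first completed dataset
head(complete(imp, 1))
```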

Are there any other ways to shorten the time the mice() function takes to complete its run?

stefvanbuuren · 2021-06-09T07:30:46Z

Maintainer

This should not take 10 hours.

Do you have categorical variables with many categories?
Are you able to see whether some variables take much longer than others?
Did you try quickpred() to trim down the predictor matrix?
Did you try method = "norm"?
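The last two suggestions can be combined roughly as follows. This is a sketch on simulated data: the data frame dat, its columns, and the mincor = 0.1 threshold are illustrative assumptions, not values from the thread. quickpred() trims the predictor matrix, and method = "norm" (Bayesian linear regression) replaces the default predictive mean matching ("pmm") used for numeric variables:

```r
library(mice)

set.seed(2021)
n <- 5000

# Hypothetical toy data; the real dataset has ~500K rows and ~20 variables
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- dat$x1 + dat$x2 + dat$x3 + rnorm(n)
for (v in names(dat)) dat[[v]][runif(n) < 0.3] <- NA

# Keep a predictor only if its absolute correlation with the target variable,
# or with the target's missingness indicator, exceeds mincor
pred <- quickpred(dat, mincor = 0.1)
rowSums(pred)  # number of predictors retained for each variable

# Impute with "norm" and the trimmed predictor matrix, and time the run
elapsed <- system.time(
  imp <- mice(dat, m = 5, maxit = 5, method = "norm",
              predictorMatrix = pred, printFlag = FALSE)
)
elapsed
```

On real data, timing the same call with the default method and full predictor matrix gives a direct comparison of what the two changes save.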

amirdol Jun 10, 2021
Author

Thanks for the response @stefvanbuuren

I have only one categorical variable; it has 2 categories and no missing values.
I just tried quickpred(). Watching the printFlag output, some variables take longer than others, but it's not just one or two of them. In general it's 1/2 to 2 minutes per variable, and I have 5 imputations to perform with 20 variables per imputation, for 5 iterations, so quite a long time...
norm + quickpred() is much faster (less than an hour).
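The figures above multiply out as follows, assuming each of the 20 incomplete variables is visited once per iteration in every one of the m = 5 imputations:

$$
\begin{aligned}
\underbrace{5}_{m} \times \underbrace{5}_{\text{iterations}} \times \underbrace{20}_{\text{variables}} &= 500 \text{ univariate model fits} \\
500 \times (0.5 \text{ to } 2)\ \text{min} &= 250 \text{ to } 1000\ \text{min} \approx 4.2 \text{ to } 16.7\ \text{h}
\end{aligned}
$$

That range brackets the ~10 hours reported in the original post, so reducing the per-variable fitting time (via the method and the predictor matrix) is where most of the savings come from.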
