From 23c16b60b92631962a1033d1e07fec799cc50231 Mon Sep 17 00:00:00 2001
From: "Documenter.jl"
Date: Wed, 31 Jan 2024 12:06:08 +0000
Subject: [PATCH] build based on 06b2490

---
 dev/.documenter-siteinfo.json   |  2 +-
 dev/acknowledgements/index.html |  2 +-
 dev/analysis/index.html         |  8 ++++----
 dev/benchmarks/index.html       |  2 +-
 dev/gettingstarted/index.html   |  2 +-
 dev/imputation/index.html       | 20 ++++++++++----------
 dev/index.html                  |  2 +-
 dev/issues/index.html           |  2 +-
 dev/multithreading/index.html   |  2 +-
 dev/pooling/index.html          |  4 ++--
 dev/rcall/index.html            |  2 +-
 dev/references/index.html       |  2 +-
 dev/whatsnext/index.html        |  2 +-
 dev/wrangling/index.html        |  2 +-
 14 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json
index 93ad2cf..fca80e3 100644
--- a/dev/.documenter-siteinfo.json
+++ b/dev/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.9.2","generation_timestamp":"2024-01-31T12:05:00","documenter_version":"1.2.1"}}
\ No newline at end of file
+{"documenter":{"julia_version":"1.9.2","generation_timestamp":"2024-01-31T12:06:02","documenter_version":"1.2.1"}}
\ No newline at end of file
diff --git a/dev/acknowledgements/index.html b/dev/acknowledgements/index.html
index c048905..ae050e1 100644
--- a/dev/acknowledgements/index.html
+++ b/dev/acknowledgements/index.html
@@ -3,4 +3,4 @@
 function gtag(){dataLayer.push(arguments);}
 gtag('js', new Date());
 gtag('config', 'G-0SJ9WPE2ZH', {'page_path': location.pathname + location.search + location.hash});
-

Acknowledgements

This package is based heavily on the existing R package {mice} by Stef van Buuren, Karin Groothuis-Oudshoorn and collaborators [1].

The development of this package was supported by the Wellcome Trust [218497/Z/19/Z].


Funded by Wellcome     Wellcome logo
+

Acknowledgements

This package is based heavily on the existing R package {mice} by Stef van Buuren, Karin Groothuis-Oudshoorn and collaborators [1].

The development of this package was supported by the Wellcome Trust [218497/Z/19/Z].


diff --git a/dev/analysis/index.html b/dev/analysis/index.html
index 57b7cdc..7ad24fb 100644
--- a/dev/analysis/index.html
+++ b/dev/analysis/index.html
@@ -6,12 +6,12 @@

Analysis (with)

Once you have a Mids object containing imputed data, you can use it to perform repeated analyses.

Inspecting imputed data

If you just want to inspect the outcome of the imputation process, you can use the complete or listComplete functions to fill in the missing values in the original data frame.

Mice.completeFunction
complete(
     mids::Mids,
     imputation::Int
-    )

Produces a data table with missings replaced with imputed values from a multiply imputed dataset (Mids) object.

The Mids object must be supplied first.

The imputation argument is an integer identifying which specific imputation is to be used to fill in the missing values.

source
Mice.listCompleteFunction
listComplete(
+    )

Produces a data table with missings replaced with imputed values from a multiply imputed dataset (Mids) object.

The Mids object must be supplied first.

The imputation argument is an integer identifying which specific imputation is to be used to fill in the missing values.

source
Mice.listCompleteFunction
listComplete(
     mids::Mids
-    )

Summarises the outputs of all imputations in a multiply imputed dataset (Mids) as a list of completed datasets.

source

Data analysis

To perform a data analysis procedure on each imputed dataset in turn, use the with function. The with function returns the results of the analyses wrapped in a Mira object.

Mice.MiraType
Mira

A multiply imputed repeated analyses object.

The analyses are stored as a vector of analyses of individual imputations.

source
Mice.withFunction
with(
+    )

Summarises the outputs of all imputations in a multiply imputed dataset (Mids) as a list of completed datasets.

source

Data analysis

To perform a data analysis procedure on each imputed dataset in turn, use the with function. The with function returns the results of the analyses wrapped in a Mira object.

Mice.MiraType
Mira

A multiply imputed repeated analyses object.

The analyses are stored as a vector of analyses of individual imputations.

source
Mice.withFunction
with(
     mids::Mids,
     func::Function
-    )

Conducts repeated analyses of a multiply imputed dataset (Mids).

The function takes two arguments: firstly the Mids object itself, then a function (func). The function should take the form data -> analysisFunction(arguments, data, moreArguments...), where data represents the position of the data argument in the function.

For example: with(mids, data -> lm(@formula(y ~ x1 + x2), data))

source

The with function requires the use of a closure, which then permits the function to run the specified analysis procedure on each imputed dataset in turn. For example:

using CSV, DataFrames, GLM, Mice, Random, Statistics
+    )

Conducts repeated analyses of a multiply imputed dataset (Mids).

The function takes two arguments: firstly the Mids object itself, then a function (func). The function should take the form data -> analysisFunction(arguments, data, moreArguments...), where data represents the position of the data argument in the function.

For example: with(mids, data -> lm(@formula(y ~ x1 + x2), data))

source

The with function requires the use of a closure, which then permits the function to run the specified analysis procedure on each imputed dataset in turn. For example:

using CSV, DataFrames, GLM, Mice, Random, Statistics
 
 myData = CSV.read("test/data/cirrhosis.csv", DataFrame, missingstring = "NA");
 
@@ -29,4 +29,4 @@
 # returns Mira of the mean of Bilirubin in each imputed dataset
 
 analysesLMs = with(imputedData, data -> lm(@formula(N_Days ~ Drug + Age + Stage + Bilirubin), data));
-# returns Mira of linear model outputs from each imputed dataset

+# returns Mira of linear model outputs from each imputed dataset
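
Conceptually, with applies the supplied function to each completed dataset in turn and collects the results. The sketch below mimics that shape in plain Julia without Mice.jl. The `completed` vector and the `applyeach` helper are hypothetical stand-ins for a Mids object and for with, and the values are made up for illustration; this is not the package's implementation.

```julia
using Statistics

# Hypothetical stand-ins for two completed datasets (illustrative values only)
completed = [
    (Bilirubin = [1.0, 2.0, 3.0],),
    (Bilirubin = [1.0, 2.5, 3.0],),
]

# Hypothetical helper with the same argument order as with(mids, func):
# run the analysis function once per dataset and keep one result per imputation
applyeach(datasets, func::Function) = [func(data) for data in datasets]

# The anonymous function marks where the data argument goes in the analysis call
means = applyeach(completed, data -> mean(data.Bilirubin))
println(means)
```

The anonymous function plays the same role as in the Mice.jl examples: it fixes every argument of the analysis except the dataset.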
diff --git a/dev/benchmarks/index.html b/dev/benchmarks/index.html
index c79eda2..c3e2145 100644
--- a/dev/benchmarks/index.html
+++ b/dev/benchmarks/index.html
@@ -3,4 +3,4 @@
 function gtag(){dataLayer.push(arguments);}
 gtag('js', new Date());
 gtag('config', 'G-0SJ9WPE2ZH', {'page_path': location.pathname + location.search + location.hash});
-

Benchmarks

I have (very much not rigorously) benchmarked Mice.jl using the test dataset [5], and also performed an equivalent benchmark of the R package mice.

15 iterations were completed to impute 12 variables (of which 4 binary categorical, 1 other categorical and 7 numeric) using a set of 18 predictors (those 12 variables plus 6 complete variables: 1 binary categorical, 2 other categorical and 3 numeric). Both used predictive mean matching for all variables that were to be imputed. In Mice.jl, gcSchedule was set to 0.3.

Benchmark results

Windows

System info: Single-threaded execution, Intel® Core™ i7-12700H 2.30GHz CPU, 32GB 4800MHz DDR5 RAM, running Windows 11 version 10.0.22621.

R: version 4.3.2 running mice version 3.16.0. Julia: version 1.10.0 running Mice.jl version 0.3.2.

Number of imputations    R (mice) (s)    Mice.jl (s)
1                        1.79            4.86
5                        8.45            5.54
10                       16.59           6.55
20                       33.19           8.09
50                       85.79           12.17
100                      171.93          19.62

Linux

System info: Single-threaded execution, Intel® Core™ i7-12700H 2.30GHz CPU, 32GB 4800MHz DDR5 RAM, running Ubuntu (WSL) version 22.04.3.

R: version 4.3.2 running mice version 3.16.0. Julia: version 1.10.0 running Mice.jl version 0.3.2.

Number of imputations    R (mice) (s)    Mice.jl (s)
1                        1.24            4.93
5                        5.92            5.63
10                       11.74           6.56
20                       23.87           8.44
50                       63.83           12.54
100                      125.65          21.30

Why is Mice.jl so slow for small jobs?

Julia is a just-in-time (JIT) compiled language: the first time a function is called with a given set of argument types, it is compiled into machine code, which takes time. The first iteration of mice() in a fresh session will therefore be (much) slower in Julia than in R. Subsequent iterations will be much faster, as all of the required functions are already compiled.

Why is the first iteration so much slower than the rest?

See above.


+

Benchmarks

I have (very much not rigorously) benchmarked Mice.jl using the test dataset [5], and also performed an equivalent benchmark of the R package mice.

15 iterations were completed to impute 12 variables (of which 4 binary categorical, 1 other categorical and 7 numeric) using a set of 18 predictors (those 12 variables plus 6 complete variables: 1 binary categorical, 2 other categorical and 3 numeric). Both used predictive mean matching for all variables that were to be imputed. In Mice.jl, gcSchedule was set to 0.3.

Benchmark results

Windows

System info: Single-threaded execution, Intel® Core™ i7-12700H 2.30GHz CPU, 32GB 4800MHz DDR5 RAM, running Windows 11 version 10.0.22621.

R: version 4.3.2 running mice version 3.16.0. Julia: version 1.10.0 running Mice.jl version 0.3.2.

Number of imputations    R (mice) (s)    Mice.jl (s)
1                        1.79            4.86
5                        8.45            5.54
10                       16.59           6.55
20                       33.19           8.09
50                       85.79           12.17
100                      171.93          19.62

Linux

System info: Single-threaded execution, Intel® Core™ i7-12700H 2.30GHz CPU, 32GB 4800MHz DDR5 RAM, running Ubuntu (WSL) version 22.04.3.

R: version 4.3.2 running mice version 3.16.0. Julia: version 1.10.0 running Mice.jl version 0.3.2.

Number of imputations    R (mice) (s)    Mice.jl (s)
1                        1.24            4.93
5                        5.92            5.63
10                       11.74           6.56
20                       23.87           8.44
50                       63.83           12.54
100                      125.65          21.30
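
As a rough reading of the Linux timings above, the cost of each additional imputation can be estimated from the first and last rows. This is only a crude two-point estimate and the `marginal` helper is a hypothetical name, not part of Mice.jl; the numbers are copied from the table.

```julia
# Linux timings copied from the table above (seconds)
ms      = [1, 5, 10, 20, 50, 100]
rTimes  = [1.24, 5.92, 11.74, 23.87, 63.83, 125.65]
jlTimes = [4.93, 5.63, 6.56, 8.44, 12.54, 21.30]

# Hypothetical helper: average extra seconds per additional imputation,
# taken as the slope between the first and last rows
marginal(m, times) = (times[end] - times[1]) / (m[end] - m[1])

rPerImputation  = marginal(ms, rTimes)
jlPerImputation = marginal(ms, jlTimes)

# Ratio of total run times at 100 imputations
speedupAt100 = rTimes[end] / jlTimes[end]

println("R: ", round(rPerImputation, digits = 3), " s per extra imputation")
println("Mice.jl: ", round(jlPerImputation, digits = 3), " s per extra imputation")
println("Speed-up at m = 100: ", round(speedupAt100, digits = 2))
```

On these figures, Mice.jl carries a larger cost at a single imputation but a much smaller cost per additional imputation.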

Why is Mice.jl so slow for small jobs?

Julia is a just-in-time (JIT) compiled language: the first time a function is called with a given set of argument types, it is compiled into machine code, which takes time. The first iteration of mice() in a fresh session will therefore be (much) slower in Julia than in R. Subsequent iterations will be much faster, as all of the required functions are already compiled.
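
The compilation overhead described above can be observed with any function, not just mice(). The following is a minimal sketch using a hypothetical helper, `sumsquares`, that has nothing to do with Mice.jl; the exact timings will vary by machine and Julia version.

```julia
# Hypothetical helper used only to demonstrate first-call compilation
sumsquares(xs) = sum(x -> x^2, xs)

xs = rand(1_000)

# The first call with this argument type includes compilation time
firstCall = @elapsed sumsquares(xs)

# The second call reuses the machine code compiled during the first call
secondCall = @elapsed sumsquares(xs)

println("First call:  ", firstCall, " s")
println("Second call: ", secondCall, " s")
```

In a fresh session the first timing is typically far larger than the second, which is the same effect that dominates small mice() jobs.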

Why is the first iteration so much slower than the rest?

See above.


diff --git a/dev/gettingstarted/index.html b/dev/gettingstarted/index.html
index 98b5cab..b8d6544 100644
--- a/dev/gettingstarted/index.html
+++ b/dev/gettingstarted/index.html
@@ -4,4 +4,4 @@
 gtag('js', new Date());
 gtag('config', 'G-0SJ9WPE2ZH', {'page_path': location.pathname + location.search + location.hash});

Getting started

Installation

To install the latest stable version:

] add Mice

or

using Pkg
-Pkg.add("Mice")

Usage

To load the package, use the command:

using Mice

+Pkg.add("Mice")

Usage

To load the package, use the command:

using Mice

diff --git a/dev/imputation/index.html b/dev/imputation/index.html
index e050fbc..37e252f 100644
--- a/dev/imputation/index.html
+++ b/dev/imputation/index.html
@@ -3,7 +3,7 @@
 function gtag(){dataLayer.push(arguments);}
 gtag('js', new Date());
 gtag('config', 'G-0SJ9WPE2ZH', {'page_path': location.pathname + location.search + location.hash});
-

Imputation (mice)

The main function of the package is mice, which takes a Tables.jl-compatible table as its input. It returns a multiply imputed dataset (Mids) object with the imputed values.

Mice.MidsType
Mids

A multiply imputed dataset object.

The data originally supplied are stored as data.

The imputed data are stored as imputations (one column per imputation).

The locations at which data have been imputed are stored as imputeWhere.

The number of imputations is stored as m.

The imputation method for each variable is stored as methods.

The predictor matrix is stored as predictorMatrix.

The order in which the variables are imputed is stored as visitSequence.

The number of iterations is stored as iter.

The mean of each variable across the imputations is stored as meanTraces.

The variance of each variable across the imputations is stored as varTraces.

source
Mice.miceFunction
mice(
+

Imputation (mice)

The main function of the package is mice, which takes a Tables.jl-compatible table as its input. It returns a multiply imputed dataset (Mids) object with the imputed values.

Mice.MidsType
Mids

A multiply imputed dataset object.

The data originally supplied are stored as data.

The imputed data are stored as imputations (one column per imputation).

The locations at which data have been imputed are stored as imputeWhere.

The number of imputations is stored as m.

The imputation method for each variable is stored as methods.

The predictor matrix is stored as predictorMatrix.

The order in which the variables are imputed is stored as visitSequence.

The number of iterations is stored as iter.

The mean of each variable across the imputations is stored as meanTraces.

The variance of each variable across the imputations is stored as varTraces.

source
Mice.miceFunction
mice(
     data;
     m::Int = 5,
     imputeWhere::AxisVector{Vector{Bool}} = findMissings(data),
@@ -14,13 +14,13 @@
     progressReports::Bool = true,
     gcSchedule::Float64 = 0.3,
     kwargs...
-    )

Imputes missing values in a dataset using the MICE algorithm. The output is a Mids object.

The data containing missing values (data) must be supplied as a Tables.jl table.

The number of imputations created is specified by m.

imputeWhere is an AxisVector of boolean vectors specifying where data are to be imputed. The default is to impute all missing data.

The variables will be imputed in the order specified by visitSequence. The default is sorted by proportion of missing data in ascending order; the order can be customised using a vector of variable names in the desired order. Any column not to be imputed at all can be left out of the visit sequence.

The imputation method for each variable is specified by the AxisVector methods. The default is to use predictive mean matching (pmm) for all variables. Any variable not to be imputed can be marked as such using an empty string ("").

The predictor matrix is specified by the AxisMatrix predictorMatrix. The default is to use all other variables as predictors for each variable. Any variable not predicting another variable can be marked as such in the matrix using a 0.

The number of iterations is specified by iter.

If progressReports is true, a progress indicator will be displayed in the console.

gcSchedule dictates when the garbage collector will be (additionally) invoked. The number provided is the fraction of your RAM remaining at which the GC will be called. For small datasets, you may get away with a value of 0.0 (never called), but for larger datasets, it may be worthwhile to call it more frequently. The default is 0.3, but for really large jobs you may want to increase this value.

source
mice(
+    )

Imputes missing values in a dataset using the MICE algorithm. The output is a Mids object.

The data containing missing values (data) must be supplied as a Tables.jl table.

The number of imputations created is specified by m.

imputeWhere is an AxisVector of boolean vectors specifying where data are to be imputed. The default is to impute all missing data.

The variables will be imputed in the order specified by visitSequence. The default is sorted by proportion of missing data in ascending order; the order can be customised using a vector of variable names in the desired order. Any column not to be imputed at all can be left out of the visit sequence.

The imputation method for each variable is specified by the AxisVector methods. The default is to use predictive mean matching (pmm) for all variables. Any variable not to be imputed can be marked as such using an empty string ("").

The predictor matrix is specified by the AxisMatrix predictorMatrix. The default is to use all other variables as predictors for each variable. Any variable not predicting another variable can be marked as such in the matrix using a 0.

The number of iterations is specified by iter.

If progressReports is true, a progress indicator will be displayed in the console.

gcSchedule dictates when the garbage collector will be (additionally) invoked. The number provided is the fraction of your RAM remaining at which the GC will be called. For small datasets, you may get away with a value of 0.0 (never called), but for larger datasets, it may be worthwhile to call it more frequently. The default is 0.3, but for really large jobs you may want to increase this value.

source
mice(
     mids::Mids;
     iter::Int = 10,
     progressReports::Bool = true,
     gcSchedule::Float64 = 0.3,
     kwargs...
-    )

Adds additional iterations to an existing Mids object.

The number of additional iterations is specified by iter.

progressReports and gcSchedule can also be specified: all other arguments will be ignored.

source

Customising the imputation setup

You can customise various aspects of the imputation setup by passing keyword arguments to mice. These are described above. You can also use some of the functions below to define objects that you can customise to alter how mice handles the imputation.

Locations to impute

You can customise which data points are imputed by manipulating the imputeWhere argument. By default, this will specify that all missing data are to be imputed (using the function findMissings()).

Mice.findMissingsFunction
findMissings(data)

Returns an AxisVector of boolean vectors describing the locations of missing data in each column of the provided data table.

source

You can over-impute existing data by setting the locations of non-missing data to true in the relevant vector in imputeWhere. For example, to over-impute the value of col1 for the first row, you could do the following:

using DataFrames, Mice, Random
+    )

Adds additional iterations to an existing Mids object.

The number of additional iterations is specified by iter.

progressReports and gcSchedule can also be specified: all other arguments will be ignored.

source

Customising the imputation setup

You can customise various aspects of the imputation setup by passing keyword arguments to mice. These are described above. You can also use some of the functions below to define objects that you can customise to alter how mice handles the imputation.

Locations to impute

You can customise which data points are imputed by manipulating the imputeWhere argument. By default, this will specify that all missing data are to be imputed (using the function findMissings()).

Mice.findMissingsFunction
findMissings(data)

Returns an AxisVector of boolean vectors describing the locations of missing data in each column of the provided data table.

source

You can over-impute existing data by setting the locations of non-missing data to true in the relevant vector in imputeWhere. For example, to over-impute the value of col1 for the first row, you could do the following:

using DataFrames, Mice, Random
 
 myData = DataFrame(
     :col1 => Vector{Union{Missing, Float64}}([1.0, missing, 3.0, missing, 5.0]),
@@ -78,7 +78,7 @@
 # "col2"
 
 # Not run
-mice(myData, visitSequence = myVisitSequence2)

Assuming that the imputations converge normally, changing the visit sequence should not dramatically affect the output. However, it can be useful to change the visit sequence if you want to impute variables in a particular order for a specific reason. The sequence used by default in Mice.jl can make convergence faster in cases where the data follow a (near-)"monotone" missing data pattern [2].

You can leave variables out of the visitSequence to cause mice() to not impute them.

Predictor matrix

The predictor matrix defines which variables in the imputation model are used to predict which others. By default, every variable predicts every other variable, but there is a wide range of cases in which this is not desirable. For example, if your dataset includes an ID column, this is clearly useless for imputation and should be ignored.

To create a default predictor matrix that you can edit, you can use the function makePredictorMatrix.

Mice.makePredictorMatrixFunction
makePredictorMatrix(data)

Returns an AxisMatrix of integers defining the predictors for each variable in data. The variables to be predicted are on the rows, and the predictors are on the columns. The default is to use all variables as predictors for all other variables (i.e. all 1s except for the diagonal, which is 0).

source

You can then edit the predictor matrix to remove any predictive relationships that you do not want to include in the imputation model. For example:

using DataFrames, Mice, Random
+mice(myData, visitSequence = myVisitSequence2)

Assuming that the imputations converge normally, changing the visit sequence should not dramatically affect the output. However, it can be useful to change the visit sequence if you want to impute variables in a particular order for a specific reason. The sequence used by default in Mice.jl can make convergence faster in cases where the data follow a (near-)"monotone" missing data pattern [2].

You can leave variables out of the visitSequence to cause mice() to not impute them.

Predictor matrix

The predictor matrix defines which variables in the imputation model are used to predict which others. By default, every variable predicts every other variable, but there is a wide range of cases in which this is not desirable. For example, if your dataset includes an ID column, this is clearly useless for imputation and should be ignored.

To create a default predictor matrix that you can edit, you can use the function makePredictorMatrix.

Mice.makePredictorMatrixFunction
makePredictorMatrix(data)

Returns an AxisMatrix of integers defining the predictors for each variable in data. The variables to be predicted are on the rows, and the predictors are on the columns. The default is to use all variables as predictors for all other variables (i.e. all 1s except for the diagonal, which is 0).

source

You can then edit the predictor matrix to remove any predictive relationships that you do not want to include in the imputation model. For example:

using DataFrames, Mice, Random
 
 myData = DataFrame(
     :id => Vector{Int64}(1:5),
@@ -124,7 +124,7 @@
 Random.seed!(1234); # Set random seed for reproducibility
 
 # Not run
-mice(myData, predictorMatrix = myPredictorMatrix)

Methods

The imputation methods are the functions that are used to impute each variable. By default, mice uses predictive mean matching ("pmm") for all variables. Currently Mice.jl supports the following methods:

Method    Description                          Variable type
pmm       Predictive mean matching             Any
sample    Random sample from observed values   Any
mean      Mean of observed values              Numeric (float)
norm      Bayesian linear regression           Numeric (float)

The mean and sample methods should not generally be used.

To create a default methods vector, use the function makeMethods.

Mice.makeMethodsFunction
makeMethods(data)

Returns an AxisVector of strings defining the method by which each variable in data should be imputed in the mice() function. The default method is predictive mean matching (pmm).

source

You can then customise the vector as needed. For example:

using DataFrames, Mice, Random
+mice(myData, predictorMatrix = myPredictorMatrix)
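
The default layout described above (all 1s except for a 0 diagonal, predicted variables on the rows and predictors on the columns) can be illustrated with an ordinary integer matrix. This sketch deliberately avoids Mice.jl: makePredictorMatrix returns a named AxisMatrix, whereas the code below uses a plain `Matrix{Int}` and a hypothetical list of variable names, so it only shows the layout and is not a drop-in replacement.

```julia
# Hypothetical variable names, for illustration only
varnames = ["id", "col1", "col2"]
n = length(varnames)

# Default layout: every variable predicts every other variable (1),
# and no variable predicts itself (0 on the diagonal)
pm = ones(Int, n, n)
for i in 1:n
    pm[i, i] = 0
end

# Predictors are on the columns, so zeroing the "id" column
# stops the ID from being used to predict any other variable
idcol = findfirst(==("id"), varnames)
pm[:, idcol] .= 0

display(pm)
```

Zeroing a column removes a predictor everywhere; zeroing a single cell removes one predictive relationship only.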

Methods

The imputation methods are the functions that are used to impute each variable. By default, mice uses predictive mean matching ("pmm") for all variables. Currently Mice.jl supports the following methods:

Method    Description                          Variable type
pmm       Predictive mean matching             Any
sample    Random sample from observed values   Any
mean      Mean of observed values              Numeric (float)
norm      Bayesian linear regression           Numeric (float)

The mean and sample methods should not generally be used.

To create a default methods vector, use the function makeMethods.

Mice.makeMethodsFunction
makeMethods(data)

Returns an AxisVector of strings defining the method by which each variable in data should be imputed in the mice() function. The default method is predictive mean matching (pmm).

source

You can then customise the vector as needed. For example:

using DataFrames, Mice, Random
 
 myData = DataFrame(
     :id => Vector{Int64}(1:5),
@@ -171,17 +171,17 @@
 mice(myData, methods = myMethods)

Diagnostics

After performing multiple imputation, you should inspect the trace plots of the imputed variables to verify convergence. Mice.jl includes a plotting function to do this.

RecipesBase.plotFunction
plot(
     mids::Mids,
     var::String
-    )

Plots the mean and standard deviation of the imputed values for a given variable. Here var is given as a string (the name of the variable).

source
plot(
+    )

Plots the mean and standard deviation of the imputed values for a given variable. Here var is given as a string (the name of the variable).

source
plot(
     mids::Mids,
     var_no::Int
-    )

Plots the mean and standard deviation of the imputed values for a given variable. Here var_no is given as an integer (the index of the variable in the visitSequence).

source

You need to load the Plots.jl package to see the plots:

using Plots
+    )

Plots the mean and standard deviation of the imputed values for a given variable. Here var_no is given as an integer (the index of the variable in the visitSequence).

source

You need to load the Plots.jl package to see the plots:

using Plots
 
 # Not run
 plot(myMids, 7)

Binding imputations together

If you have a number of Mids objects that were produced in the same way (e.g. through multithreading), you can bind them together into a single Mids object using the function bindImputations. Note that the log of events might not make sense in the resulting object: it is better to inspect the logs of the individual objects before binding them together.

Mice.bindImputationsFunction
bindImputations(
     mids1::Mids,
     mids2::Mids
-    )

Combines two Mids objects into one. The two objects must have been created from the same dataset, with the same imputation methods, predictor matrix, visit sequence and number of iterations. The numbers of imputations can be different.

source
bindImputations(
+    )

Combines two Mids objects into one. The two objects must have been created from the same dataset, with the same imputation methods, predictor matrix, visit sequence and number of iterations. The numbers of imputations can be different.

source
bindImputations(
     midsVector::Vector{Mids}
-    )

Combines a vector of Mids objects into one Mids object. They must all have been created from the same dataset with the same imputation methods, predictor matrix, visit sequence and number of iterations. The number of imputations can be different.

source
bindImputations(
+    )

Combines a vector of Mids objects into one Mids object. They must all have been created from the same dataset with the same imputation methods, predictor matrix, visit sequence and number of iterations. The number of imputations can be different.

source
bindImputations(
     mids...
-    )

Combines any number of Mids objects into one Mids object. They must all have been created from the same dataset with the same imputation methods, predictor matrix, visit sequence and number of iterations. The number of imputations can be different.

source

+ )

Combines any number of Mids objects into one Mids object. They must all have been created from the same dataset with the same imputation methods, predictor matrix, visit sequence and number of iterations. The number of imputations can be different.

source
diff --git a/dev/index.html b/dev/index.html
index d7a4d6e..07a0039 100644
--- a/dev/index.html
+++ b/dev/index.html
@@ -3,4 +3,4 @@
 function gtag(){dataLayer.push(arguments);}
 gtag('js', new Date());
 gtag('config', 'G-0SJ9WPE2ZH', {'page_path': location.pathname + location.search + location.hash});
-

What is Mice.jl?

Mice.jl is the Julia equivalent of the R package mice by Stef van Buuren, Karin Groothuis-Oudshoorn and collaborators [1]. It allows you to impute missing values in a dataset using multiple imputation by chained equations (MICE).

Currently, only predictive mean matching (PMM) and Bayesian linear regression are supported as methods. Mice.jl also currently does not support hybrid imputation models.

If you want to learn more about multiple imputation, this is not the guide for you. Instead, I recommend consulting "Flexible Imputation of Missing Data" by Stef van Buuren (ed.) [2].


+

What is Mice.jl?

Mice.jl is the Julia equivalent of the R package mice by Stef van Buuren, Karin Groothuis-Oudshoorn and collaborators [1]. It allows you to impute missing values in a dataset using multiple imputation by chained equations (MICE).

Currently, only predictive mean matching (PMM) and Bayesian linear regression are supported as methods. Mice.jl also currently does not support hybrid imputation models.

If you want to learn more about multiple imputation, this is not the guide for you. Instead, I recommend consulting "Flexible Imputation of Missing Data" by Stef van Buuren (ed.) [2].


diff --git a/dev/issues/index.html b/dev/issues/index.html
index 3b7d33d..371dddf 100644
--- a/dev/issues/index.html
+++ b/dev/issues/index.html
@@ -3,4 +3,4 @@
 function gtag(){dataLayer.push(arguments);}
 gtag('js', new Date());
 gtag('config', 'G-0SJ9WPE2ZH', {'page_path': location.pathname + location.search + location.hash});
-

Issues

This package is an early work in progress, and issues should be expected. When you find issues, please report them on the issues page.


+

Issues

This package is an early work in progress, and issues should be expected. When you find issues, please report them on the issues page.


diff --git a/dev/multithreading/index.html b/dev/multithreading/index.html
index 7f6291f..86c73a3 100644
--- a/dev/multithreading/index.html
+++ b/dev/multithreading/index.html
@@ -21,4 +21,4 @@
     imputedData[i] = mice(myData, m = 5, predictorMatrix = myPredictorMatrix, progressReports = false)
 end
-imputedData = bindImputations(imputedData); # Binds the separate Mids objects into a single output
+imputedData = bindImputations(imputedData); # Binds the separate Mids objects into a single output
diff --git a/dev/pooling/index.html b/dev/pooling/index.html
index f260a40..da5ccee 100644
--- a/dev/pooling/index.html
+++ b/dev/pooling/index.html
@@ -3,7 +3,7 @@
 function gtag(){dataLayer.push(arguments);}
 gtag('js', new Date());
 gtag('config', 'G-0SJ9WPE2ZH', {'page_path': location.pathname + location.search + location.hash});
-

Pooling coefficients (pool)

Once you have a Mira object containing the results of repeated analyses, you can use the pool function to pool the results. The pool function returns the pooled results wrapped in a Mipo object.

Mice.MipoType
Mipo

A type for storing the pooled results of multiply imputed repeated analyses (Mira).

source
Mice.poolFunction
pool(mira::Mira)

Pools the results of multiply imputed repeated analyses (Mira). The function will work on any Mira object containing model outputs which are receptive to the coef, stderror and nobs functions from StatsAPI.jl.

source

The pool function should work on any Mira of model outputs that accept the StatsAPI functions coef, stderror and nobs. Otherwise, you will get an error and you will need to pool the results manually in accordance with Rubin's rules [3].

For example:

using CSV, DataFrames, GLM, Mice, Random
+

Pooling coefficients (pool)

Once you have a Mira object containing the results of repeated analyses, you can use the pool function to pool the results. The pool function returns the pooled results wrapped in a Mipo object.

Mice.MipoType
Mipo

A type for storing the pooled results of multiply imputed repeated analyses (Mira).

source
Mice.poolFunction
pool(mira::Mira)

Pools the results of multiply imputed repeated analyses (Mira). The function will work on any Mira object containing model outputs which are receptive to the coef, stderror and nobs functions from StatsAPI.jl.

source

The pool function should work on any Mira whose model outputs support the StatsAPI functions coef, stderror and nobs. Otherwise, pool will throw an error and you will need to pool the results manually in accordance with Rubin's rules [3].

For example:

using CSV, DataFrames, GLM, Mice, Random
 
 myData = CSV.read("test/data/cirrhosis.csv", DataFrame, missingstring = "NA");
 
@@ -20,4 +20,4 @@
 # returns Mira of linear model outputs from each imputed dataset
 
 resultsLMs = pool(analysesLMs);
-# returns Mipo of pooled linear model results

+# returns Mipo of pooled linear model results
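
If pool throws an error for your model type, the manual route mentioned above can be sketched as follows. This is a minimal sketch of Rubin's rules [3] for a single coefficient, not the implementation used inside Mice.jl. The function name `rubin_pool` and its arguments are hypothetical, and the sketch assumes you have already extracted one point estimate and one standard error per imputed dataset.

```julia
using Statistics

# Hypothetical helper (not part of Mice.jl): pools one coefficient across
# m imputed datasets using Rubin's rules.
function rubin_pool(estimates::AbstractVector{<:Real}, stderrors::AbstractVector{<:Real})
    m = length(estimates)
    m == length(stderrors) || throw(ArgumentError("need one standard error per estimate"))
    m > 1 || throw(ArgumentError("need at least two imputations"))

    pooled  = mean(estimates)          # pooled point estimate
    within  = mean(abs2, stderrors)    # average within-imputation variance
    between = var(estimates)           # between-imputation variance (divides by m - 1)
    total   = within + (1 + 1 / m) * between

    return (estimate = pooled, within = within, between = between,
            total = total, stderror = sqrt(total))
end

# Made-up numbers, for illustration only
result = rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The pooled standard error is the square root of the total variance. Confidence intervals and p-values additionally need the degrees of freedom defined in [3], which this sketch omits.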

diff --git a/dev/rcall/index.html b/dev/rcall/index.html index 6e578f5..b0616dd 100644 --- a/dev/rcall/index.html +++ b/dev/rcall/index.html @@ -60,4 +60,4 @@ R> analyses <- with(imputedData, lm(N_Days ~ Drug + Age + Stage + Bilirubin)) -R> results <- summary(pool(analyses))
+R> results <- summary(pool(analyses))
diff --git a/dev/references/index.html b/dev/references/index.html index 4be8790..a920dff 100644 --- a/dev/references/index.html +++ b/dev/references/index.html @@ -3,4 +3,4 @@ function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-0SJ9WPE2ZH', {'page_path': location.pathname + location.search + location.hash}); -

References

[1]
S. van Buuren and K. Groothuis-Oudshoorn. mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software 45, 1–67 (2011), https://doi.org/10.18637/jss.v045.i03.
[2]
S. van Buuren. Flexible Imputation of Missing Data. 2nd Edition (Chapman and Hall/CRC, New York, 2018).
[3]
D. B. Rubin. Multiple imputation for nonresponse in surveys. 1st Edition (John Wiley & Sons, Ltd, New York, 1987). Accessed on Nov 8, 2023, https://onlinelibrary.wiley.com/doi/pdf/10.1002/9780470316696.
[4]
C. Li. JuliaCall: an R package for seamless integration between R and Julia. The Journal of Open Source Software 4, 1284 (2019). Publisher: The Open Journal.
[5]
E. R. Dickson, P. M. Grambsch, T. R. Fleming, L. D. Fisher and A. Langworthy. Prognosis in primary biliary cirrhosis: Model for decision making. Hepatology 10, 1–7 (1989), https://onlinelibrary.wiley.com/doi/pdf/10.1002/hep.1840100102.

+

References

[1]
S. van Buuren and K. Groothuis-Oudshoorn. mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software 45, 1–67 (2011), https://doi.org/10.18637/jss.v045.i03.
[2]
S. van Buuren. Flexible Imputation of Missing Data. 2nd Edition (Chapman and Hall/CRC, New York, 2018).
[3]
D. B. Rubin. Multiple imputation for nonresponse in surveys. 1st Edition (John Wiley & Sons, Ltd, New York, 1987). Accessed on Nov 8, 2023, https://onlinelibrary.wiley.com/doi/pdf/10.1002/9780470316696.
[4]
C. Li. JuliaCall: an R package for seamless integration between R and Julia. The Journal of Open Source Software 4, 1284 (2019). Publisher: The Open Journal.
[5]
E. R. Dickson, P. M. Grambsch, T. R. Fleming, L. D. Fisher and A. Langworthy. Prognosis in primary biliary cirrhosis: Model for decision making. Hepatology 10, 1–7 (1989), https://onlinelibrary.wiley.com/doi/pdf/10.1002/hep.1840100102.

diff --git a/dev/whatsnext/index.html b/dev/whatsnext/index.html index 521b592..53c91dd 100644 --- a/dev/whatsnext/index.html +++ b/dev/whatsnext/index.html @@ -3,4 +3,4 @@ function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-0SJ9WPE2ZH', {'page_path': location.pathname + location.search + location.hash}); -

What's next?

Aspirational features for future releases include:

  • Other imputation methods;
  • 2-level imputation; and
  • CUDA support.

If there are any features you particularly want to see, please raise an issue on the issues page.


+

What's next?

Aspirational features for future releases include:

  • Other imputation methods;
  • 2-level imputation; and
  • CUDA support.

If there are any features you particularly want to see, please raise an issue on the issues page.


diff --git a/dev/wrangling/index.html b/dev/wrangling/index.html index a3863fd..315af39 100644 --- a/dev/wrangling/index.html +++ b/dev/wrangling/index.html @@ -86,4 +86,4 @@ # 1 # 2 # 2 -# 3

As you can see, the col2 column is now a CategoricalArray. This is a special type of array that allows you to specify that a column is categorical. This is important, because mice will treat categorical variables differently to continuous variables.

If you have a column that contains strings (with or without missings, but with no other types), Mice.jl will handle it as a categorical variable automatically.


+# 3

As you can see, the col2 column is now a CategoricalArray. This is a special type of array that allows you to specify that a column is categorical. This is important, because mice will treat categorical variables differently to continuous variables.

If you have a column that contains strings (with or without missings, but with no other types), Mice.jl will handle it as a categorical variable automatically.

