diff --git a/news/index.html b/news/index.html
index 05d8d76..90a59e9 100644
--- a/news/index.html
+++ b/news/index.html
@@ -39,7 +39,7 @@
Changelog
midoc 1.0.0
CRAN release: 2024-10-02
-Changes coming in version 1.0
+Changes in version 1.0
diff --git a/pkgdown.yml b/pkgdown.yml
index f7d9e7b..48d5b32 100644
--- a/pkgdown.yml
+++ b/pkgdown.yml
@@ -3,7 +3,7 @@ pkgdown: 2.1.1
pkgdown_sha: ~
articles:
midoc: midoc.html
-last_built: 2024-10-04T09:11Z
+last_built: 2024-10-04T09:13Z
urls:
reference: https://elliecurnow.github.io/midoc/reference
article: https://elliecurnow.github.io/midoc/articles
diff --git a/search.json b/search.json
index c964c95..2e6569f 100644
--- a/search.json
+++ b/search.json
@@ -1 +1 @@
-[{"path":"https://elliecurnow.github.io/midoc/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2022 midoc authors Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://elliecurnow.github.io/midoc/articles/midoc.html","id":"about-midoc","dir":"Articles","previous_headings":"","what":"About midoc","title":"Multiple Imputation DOCtor (midoc)","text":"Missing data common issue health social research, often addressed multiple imputation (MI). MI flexible general approach, suite software packages. However, using MI practice can complex. Application MI involves multiple decisions rarely justified even documented, little guidance available. Multiple Imputation DOCtor (midoc) R package decision-making system incorporates expert, --date guidance help choose appropriate analysis method missing data. midoc guide analysis, examining hypothesised causal relationships observed data advise whether MI needed, perform . midoc follows framework treatment reporting missing data observational studies (TARMOS) 1. assume interested obtaining unbiased estimates regression coefficients - note bias necessarily concern interest prediction (.e. diagnostic/prognostic modelling). 
, demonstrate key features midoc using worked example. example, wish estimate association maternal age first pregnancy (exposure) child’s body mass index (BMI) age 7 years (outcome). simplicity, consider one confounder relationship maternal age BMI age 7 years, maternal education level. Note simulated data study included midoc package bmi dataset. dataset contains 1000 observations, realistic values variable, exaggerated relationships variables (highlight consequences choice analysis approach). Note interactive version vignette: Multiple Imputation DOCtor (midoc) Shiny version also available run locally (can run using midoc command midocVignette()). interactive version, can apply features midoc described using DAG data.","code":""},{"path":"https://elliecurnow.github.io/midoc/articles/midoc.html","id":"step-1-specify-the-analysis-and-missingness-models-using-a-directed-acyclic-graph","dir":"Articles","previous_headings":"","what":"Step 1 Specify the analysis and missingness models using a directed acyclic graph","title":"Multiple Imputation DOCtor (midoc)","text":"First, construct causal diagram, directed acyclic graph (DAG) example, using syntax per dagitty package. start specifying relationships variables, assuming missing data. assume maternal age (matage) causes BMI age 7 years (bmi7), maternal education level (mated) causes maternal age BMI age 7 years. can express relationships using “dagitty” syntax, follows: Next, partially observed variable, specify variables related probability missing (“missingness”) adding relationships DAG. type DAG often referred “missingness” DAG (mDAG) 2, 3. first use midoc function descMissData identify variables dataset partially observed, specifying outcome (y), covariates, .e. independent variables, (covs), dataset (data), follows. see two missing data patterns: either variables observed, BMI age 7 years missing covariates observed. 
use indicator variable “R” denote missingness BMI age 7 years (example, R=1 BMI age 7 years observed, 0 otherwise). specific example, R also indicates complete record (R=1 variables fully observed, 0 otherwise) variables fully observed. suppose R related maternal education level via socio-economic position (SEP), .e. SEP cause maternal education level R, neither BMI age 7 years maternal age causes R. suppose SEP missing (unmeasured) individuals dataset; remind us fact, name variable sep_unmeas. mDAG now follows (note follow convention using lower case names variables code, R becomes “r”, ): Note instead believe maternal education direct cause R, mDAG follows: now draw mDAG visually check relationships specified intended: Note used additional commands specify layout mDAG shown - although necessary using midoc, go dagitty website like find using “dagitty” draw mDAGs. final check mDAG, use midoc function exploreDAG explore whether relationships dataset consistent proposed mDAG, specifying mDAG (mdag) dataset (data), follows. Based relationships fully observed variables maternal age, maternal education, missingness BMI age 7 years, can see little evidence inconsistency dataset proposed mDAG. particular, mDAG assumes maternal age (matage) unrelated missingness BMI age 7 years (r), given maternal education (mated); results suggest plausible. Note use observed data determine whether BMI age 7 years unrelated missingness - need missing values BMI age 7 years order . However, BMI age 7 years cause missingness, expect maternal age also related missingness (via BMI age 7 years). Since maternal age seems unrelated, reassured BMI age 7 years also likely unrelated, given maternal education. Tips specifying “missingness” DAG First specify DAG analysis model, missing data. may find introduction DAGs useful 4. Next add missingness indicator(s) DAG. multiple variables missing data, may want start including just complete records indicator DAG. 
Identify variables related missingness using: Subject-matter knowledge, example, prior research causes drop-study knowledge data collection process Data exploration, example, performing logistic regression missingness indicator analysis model variables - noting may exclude variables large proportion missing data avoid perfect prediction","code":"matage -> bmi7 mated -> matage mated -> bmi7 descMissData(y=\"bmi7\", covs=\"matage mated\", data=bmi) pattern bmi7 matage mated n pct 1 1 1 1 1 592 59 2 2 0 1 1 408 41 matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r matage -> bmi7 mated -> matage mated -> bmi7 mated -> r exploreDAG(mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r\", data=bmi) The proposed directed acyclic graph (DAG) implies the following conditional independencies (where, for example, 'X _||_ Y | Z' should be read as 'X is independent of Y conditional on Z'). Note that variable names are abbreviated: bmi7 _||_ r | sp_n bmi7 _||_ r | matd bmi7 _||_ sp_n | matd matg _||_ r | sp_n matg _||_ r | matd matg _||_ sp_n | matd matd _||_ r | sp_n These (conditional) independence statements are explored below using the canonical correlations approach for mixed data. See ??dagitty::localTests for further details. Results are shown for variables that are fully observed in the specified dataset. The null hypothesis is that the stated variables are (conditionally) independent. estimate p.value 2.5% 97.5% matage _||_ r | mated 0.02998323 0.343547 -0.03206946 0.09180567 Interpretation: A small p-value means the stated variables may not be (conditionally) independent in the specified dataset: your data may not be consistent with the proposed DAG. A large p-value means there is little evidence of inconsistency between your data and the proposed DAG. Note that these results assume that relationships between variables are linear. Consider exploring the specification of each relationship in your model. 
Also consider whether it is valid and possible to explore relationships between partially observed variables using the observed data, e.g. avoiding perfect prediction."},{"path":"https://elliecurnow.github.io/midoc/articles/midoc.html","id":"step-2-check-whether-complete-records-analysis-is-likely-to-be-a-valid-strategy","dir":"Articles","previous_headings":"","what":"Step 2 Check whether complete records analysis is likely to be a valid strategy","title":"Multiple Imputation DOCtor (midoc)","text":"next step determine whether complete records analysis (CRA) valid strategy, using mDAG. Remember , general, CRA valid analysis model outcome unrelated complete records indicator, conditional analysis model covariates 5 (special cases, depending type analysis model estimand interest, rule can relaxed 6 - , consider general setting without making assumptions fitted model). Suppose decide estimate unadjusted association BMI age 7 years maternal age, without including confounder maternal education model. use midoc function checkCRA applied mDAG check whether CRA valid model, specifying outcome (y), covariates, .e. independent variables, (covs), complete records indicator (r_cra), mDAG (mdag), follows: can see CRA valid (can also tell inspecting DAG: open path bmi7 r via mated sep_unmeas condition matage). checkCRA suggests CRA valid included mated, mated sep_unmeas, analysis model. particular setting, sensible include mated analysis model since confounder relationship matage bmi7. settings, might want include variables required valid CRA model might change interpretation - case, need use different analysis strategy. Note sep_unmeas related bmi7 condition mated (though still related missingness bmi7), need included analysis model. add mated model re-run checkCRA, , see CRA now valid. Note outcome, BMI age 7 years, cause missingness, CRA always invalid, .e. variables add analysis model make CRA valid. 
See see results checkCRA case (note, code, added path bmi7 r specified mDAG).","code":"checkCRA(y=\"bmi7\", covs=\"matage\", r_cra=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r\") Based on the proposed directed acyclic graph (DAG), the analysis model outcome and complete record indicator are not independent given analysis model covariates. Hence, in general, complete records analysis is not valid. In special cases, depending on the type of analysis model and estimand of interest, complete records analysis may still be valid. See, for example, Bartlett et al. (2015) (https://doi.org/10.1093/aje/kwv114) for further details. Consider using a different analysis model and/or strategy, e.g. multiple imputation. For example, the analysis model outcome and complete record indicator are independent if, in addition to the specified covariates, the following sets of variables are included as covariates in the analysis model (note that this list is not necessarily exhaustive, particularly if your DAG is complex): mated c(\"mated\", \"sep_unmeas\") checkCRA(y=\"bmi7\", covs=\"matage mated\", r_cra=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r\") Based on the proposed directed acyclic graph (DAG), the analysis model outcome and complete record indicator are independent given analysis model covariates. Hence, complete records analysis is valid. checkCRA(y=\"bmi7\", covs=\"matage mated\", r_cra=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r bmi7 -> r\") Based on the proposed directed acyclic graph (DAG), the analysis model outcome and complete record indicator are not independent given analysis model covariates. Hence, in general, complete records analysis is not valid. In special cases, depending on the type of analysis model and estimand of interest, complete records analysis may still be valid. See, for example, Bartlett et al. 
(2015) (https://doi.org/10.1093/aje/kwv114) for further details. There are no other variables which could be added to the model to make the analysis model outcome and complete record indicator conditionally independent. Consider using a different strategy e.g. multiple imputation."},{"path":"https://elliecurnow.github.io/midoc/articles/midoc.html","id":"step-3-check-whether-multiple-imputation-is-likely-to-be-a-valid-strategy","dir":"Articles","previous_headings":"","what":"Step 3 Check whether multiple imputation is likely to be a valid strategy","title":"Multiple Imputation DOCtor (midoc)","text":"Although CRA valid example, may also wish perform MI. Remember MI valid principle partially observed variable unrelated missingness, given imputation model predictors. Furthermore, include analysis model variables imputation model partially observed variable, form implied analysis model, analysis imputation models “compatible”. theory, given multiple partially observed variables, validity MI may imply different causes missingness missing data pattern. example, BMI age 7 years maternal education partially observed, MI valid missingness BMI age 7 years unrelated maternal education among individuals missing BMI age 7 years maternal education (given observed data). Missingness BMI age 7 years related maternal education among individuals observed maternal education. practice, recommend focusing common missing data patterns /variables missing data. Less common missing data patterns can often assumed missing completely random - unlikely change final conclusions assumption incorrect. example, single partially observed variable (BMI age 7 years), relatively simple check validity MI based mDAG. already verified (using checkCRA) BMI age 7 years unrelated missingness, given maternal age maternal education. Therefore, know MI valid use variables imputation model BMI age 7 years (analysis model imputation model exactly case). 
However, MI using just maternal age maternal education imputation model BMI age 7 years recover additional information compared CRA. Therefore, may wish include “auxiliary variables” imputation model BMI age 7 years. additional variables included predictors imputation model required analysis model. choose auxiliary variables predictive BMI age 7 years, can improve precision MI estimate - reduce standard error - compared CRA estimate. example, two variables used auxiliary variables: pregnancy size - singleton multiple birth - (pregsize) birth weight (bwt). inspect missing data patterns dataset using descMissData, including auxiliary variables. can see auxiliary variables fully observed. assume pregnancy size cause BMI age 7 years, missingness. assume birth weight related BMI 7 years (via pregnancy size) missingness (via SEP). now add variables mDAG. , shown updated mDAG. also explore whether relationships dataset consistent updated mDAG using exploreDAG, follows. results suggest updated mDAG plausible. Note CRA still valid updated mDAG. can check using checkCRA : now use midoc function checkMI applied DAG check whether MI valid imputation model predictors BMI age 7 years include pregnancy size birth weight, well maternal age maternal education. specify partially observed variable (dep), predictors (preds), missingness indicator partially observed variable (r_dep), mDAG (mdag). first consider imputation model including pregnancy size. results shown . suggest MI valid principle included pregnancy size well analysis model variables imputation model BMI age 7 years. next consider imputation model including birth weight. results shown . suggest MI valid included birth weight well analysis model variables imputation model BMI age 7 years. can also tell inspecting mDAG: since bwt shares common cause bmi7 r, “collider”, hence conditioning bwt opens path bmi7 r via bwt. 
Note theory, suggested checkMI results shown , MI valid added birth weight pregnancy size auxiliary variables imputation model (note SEP needed, conditional imputation model predictors). However, practice, strategy may still result biased estimates, due unmeasured confounding relationship BMI age 7 years birth weight. recommend including colliders partially observed variable missingness auxiliary variables 7.","code":"descMissData(y=\"bmi7\", covs=\"matage mated pregsize bwt\", data=bmi) pattern bmi7 matage mated pregsize bwt n pct 1 1 1 1 1 1 1 592 59 2 2 0 1 1 1 1 408 41 exploreDAG(mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r pregsize -> bmi7 pregsize -> bwt sep_unmeas -> bwt\", data=bmi) The proposed directed acyclic graph (DAG) implies the following conditional independencies (where, for example, 'X _||_ Y | Z' should be read as 'X is independent of Y conditional on Z'). Note that variable names are abbreviated: bmi7 _||_ bwt | prgs, sp_n bmi7 _||_ bwt | matd, prgs bmi7 _||_ r | sp_n bmi7 _||_ r | matd bmi7 _||_ sp_n | matd bwt _||_ matg | matd bwt _||_ matg | sp_n bwt _||_ matd | sp_n bwt _||_ r | sp_n matg _||_ prgs matg _||_ r | sp_n matg _||_ r | matd matg _||_ sp_n | matd matd _||_ prgs matd _||_ r | sp_n prgs _||_ r prgs _||_ sp_n These (conditional) independence statements are explored below using the canonical correlations approach for mixed data. See ??dagitty::localTests for further details. Results are shown for variables that are fully observed in the specified dataset. The null hypothesis is that the stated variables are (conditionally) independent. 
estimate p.value 2.5% 97.5% bwt _||_ matage | mated 0.05018898 0.1127099 -0.01184095 0.11183410 matage _||_ pregsize 0.03029139 0.3386080 -0.03176134 0.09211150 matage _||_ r | mated 0.02998323 0.3435470 -0.03206946 0.09180567 mated _||_ pregsize 0.01594976 0.6144181 -0.04608889 0.07786585 pregsize _||_ r 0.01482015 0.6397174 -0.04721631 0.07674273 Interpretation: A small p-value means the stated variables may not be (conditionally) independent in the specified dataset: your data may not be consistent with the proposed DAG. A large p-value means there is little evidence of inconsistency between your data and the proposed DAG. Note that these results assume that relationships between variables are linear. Consider exploring the specification of each relationship in your model. Also consider whether it is valid and possible to explore relationships between partially observed variables using the observed data, e.g. avoiding perfect prediction. checkCRA(y=\"bmi7\", covs=\"matage mated\", r_cra=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r pregsize -> bmi7 pregsize -> bwt sep_unmeas -> bwt\") Based on the proposed directed acyclic graph (DAG), the analysis model outcome and complete record indicator are independent given analysis model covariates. Hence, complete records analysis is valid. checkMI(dep=\"bmi7\", preds=\"matage mated pregsize\", r_dep=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r pregsize -> bmi7 pregsize -> bwt sep_unmeas -> bwt\") Based on the proposed directed acyclic graph (DAG), the incomplete variable and its missingness indicator are independent given imputation model predictors. Hence, multiple imputation methods which assume data are missing at random are valid in principle. 
checkMI(dep=\"bmi7\", preds=\"matage mated bwt\", r_dep=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r pregsize -> bmi7 pregsize -> bwt sep_unmeas -> bwt\") Based on the proposed directed acyclic graph (DAG), the incomplete variable and its missingness indicator are not independent given imputation model predictors. Hence, multiple imputation methods which assume data are missing at random are not valid. Consider using a different imputation model and/or strategy (e.g. not-at-random fully conditional specification). For example, the incomplete variable and its missingness indicator are independent if, in addition to the specified predictors, the following sets of variables are included as predictors in the imputation model (note that this list is not necessarily exhaustive, particularly if your DAG is complex): pregsize c(\"pregsize\", \"sep_unmeas\")"},{"path":"https://elliecurnow.github.io/midoc/articles/midoc.html","id":"step-4-check-that-all-relationships-are-correctly-specified","dir":"Articles","previous_headings":"","what":"Step 4 Check that all relationships are correctly specified","title":"Multiple Imputation DOCtor (midoc)","text":"far, explored whether CRA MI valid principle using mDAG, without making assumptions form variables, relationships . However, MI give unbiased estimates, imputation models must compatible analysis model correctly specified: must contain variables required analysis model, must include relationships implied analysis model e.g. interactions, must specify form relationships correctly 8. Since CRA MI valid principle worked example, use complete records bmi dataset explore specification relationships BMI age 7 years predictors (analysis model variables, maternal age maternal education, plus auxiliary variable, pregnancy size) imputation model. use midoc function checkModSpec applied bmi dataset check whether imputation model correctly specified. 
specify formula imputation model using standard R syntax (formula), type imputation model (family) (note midoc currently supports either linear logistic regression models), name dataset (data). Since maternal education pregnancy size binary variables, need explore form relationship BMI age 7 years continuous exposure, maternal age. first assume linear relationship BMI age 7 years maternal age (note, default software implementations MI). assume interactions. results shown . suggest imputation model mis-specified. plot residuals versus fitted values model (automatically displayed evidence model mis-specification), suggests may quadratic relationship BMI age 7 years maternal age. use midoc function checkModSpec , time specifying quadratic relationship BMI age 7 years maternal age. results suggest longer evidence model mis-specification. Note must make sure account non-linear relationship BMI age 7 years maternal age imputation models. example, imputation model pregnancy size need include BMI age 7 years, maternal education, quadratic form maternal age (induced conditioning BMI age 7 years). Although missing values pregnancy size dataset, can still explore specification need using checkModSpec follows (note suppressed plot case using plot = FALSE option): evidence model mis-specification. include quadratic form maternal age model pregnancy size, little evidence model mis-specification: Tips imputation model variable selection imputation model partially observed variable include: analysis model variables - check relationships partially observed variable predictors correctly specified imputation model e.g. 
using fractional polynomial selection auxiliary variables related missingness partially observed variable missing data , conditional analysis model variables Auxiliary variables related missing data missingness partially observed variable, conditional variables selected Steps 1 2 - large number variables, include predictive imputation model (using suitable variable selection method identify ) imputation model partially observed variable exclude: auxiliary variables related missingness partially observed variable missing data, conditional variables selected Steps 1, 2, 3 auxiliary variables colliders partially observed variable missingness","code":"checkModSpec(formula=\"bmi7~matage+mated+pregsize\", family=\"gaussian(identity)\", data=bmi) Model mis-specification method: regression of model residuals on a fractional polynomial of the fitted values P-value: 0 A small p-value means the model may be mis-specified. Check the specification of each relationship in your model. checkModSpec(formula=\"bmi7~matage+I(matage^2)+mated+pregsize\", family=\"gaussian(identity)\", data=bmi) Model mis-specification method: regression of model residuals on a fractional polynomial of the fitted values P-value: 1 A large p-value means there is little evidence of model mis-specification. checkModSpec(formula=\"pregsize~matage+bmi7+mated\", family=\"binomial(logit)\", data=bmi, plot=FALSE) Model mis-specification method: Pregibon's link test P-value: 0.038313 A small p-value means the model may be mis-specified. Check the specification of each relationship in your model. 
checkModSpec(formula=\"pregsize~matage+I(matage^2)+bmi7+mated\", family=\"binomial(logit)\", data=bmi) Model mis-specification method: Pregibon's link test P-value: 0.555356 A large p-value means there is little evidence of model mis-specification."},{"path":"https://elliecurnow.github.io/midoc/articles/midoc.html","id":"step-5-perform-mi-using-the-proposed-imputation-model","dir":"Articles","previous_headings":"","what":"Step 5 Perform MI using the proposed imputation model","title":"Multiple Imputation DOCtor (midoc)","text":"explored validity MI principle, using mDAG, specification imputation model, based observed data. now use midoc function proposeMI choose best options performing MI using mice package. first save chosen imputation model (.e. specifying quadratic relationship BMI age 7 years maternal age) mimod object. Note suppressed checkModSpec message case using message = FALSE option. use , along dataset, construct call “mice” function. Note also save proposed “mice” call miprop object, used later. results shown . particular, note proposed “mice” call, default values number imputations, method, formulas, number iterations changed. Plots distributions imputed observed data, based sample five imputed datasets, suggest extreme values handled appropriately using proposed imputation method. Trace plots, showing mean standard deviation imputed values across iterations, also displayed. Note plots shown without prompting (plotprompt = FALSE). need adjust number iterations , dataset, one variable partially observed. Note Given multiple partially observed variables, can specify list imputation models - one partially observed variable - proposeMI. example, suppose pregnancy size also partially observed. assume, simplicity, pregnancy size missing completely random. construct proposed “mice” call using proposeMI, follows. , suppress model checking messages. Returning example, assume adjustment required proposed “mice” call. 
use midoc function doMImice perform MI, specifying proposed “mice” call (miprop) seed “mice” call (seed) (results reproducible). also specify substantive model interest (substmod): regression BMI 7 years maternal age (fitting quadratic relationship) maternal education. optional step: specify substantive model, fitted automatically imputed dataset pooled results displayed (equivalent using “mice” functions pool). substantive model specified, imputation step performed.","code":"mimod_bmi7 <- checkModSpec(formula=\"bmi7~matage+I(matage^2)+mated+pregsize\", family=\"gaussian(identity)\", data=bmi, message=FALSE) miprop <- proposeMI(mimodobj=mimod_bmi7, data=bmi, plotprompt=FALSE) Based on your proposed imputation model and dataset, your mice() call should be as follows: mice(data = bmi , # You may need to specify a subset of the columns in your dataset m = 41 , # You should use at least this number of imputations based on the proportion of complete records in your dataset method = c( 'norm' ) # Specify a method for each incomplete variable. If displayed, the box-and-whisker plots can be used to inform your choice of method(s): for example, if the imputation model does not predict extreme values appropriately, consider a different imputation model/method e.g. PMM. Note the distribution of imputed and observed values is displayed for numeric variables only. The distribution may differ if data are missing at random or missing not at random. If you suspect data are missing not at random, the plots can also inform your choice of sensitivity parameter. formulas = formulas_list , # Note that you do not additionally need to specify a 'predmatrix' # The formulas_list specifies the conditional imputation models, which are as follows: 'bmi7 ~ matage + I(matage^2) + mated + pregsize' maxit = 10 , # If you have more than one incomplete variable, you should check this number of iterations is sufficient by inspecting the trace plots, if displayed. 
Consider increasing the number of iterations if there is a trend that does not stabilise by the 10th iteration. Note that iteration is not performed when only one variable is partially observed. printFlag = FALSE , # Change to printFlag=TRUE to display the history as imputation is performed seed = NA) # It is good practice to choose a seed so your results are reproducible mimod_bmi7 <- checkModSpec(formula=\"bmi7~matage+I(matage^2)+mated+pregsize\", family=\"gaussian(identity)\", data=bmi, message=FALSE) mimod_pregsize <- checkModSpec(formula=\"pregsize~bmi7+matage+I(matage^2)+mated\", family=\"binomial(logit)\", data=bmi, message=FALSE) proposeMI(mimodobj=list(mimod_bmi7, mimod_pregsize), data=bmi) doMImice(miprop, seed=123, substmod=\"lm(bmi7 ~ matage + I(matage^2) + mated)\") Given the substantive model: lm(bmi7 ~ matage + I(matage^2) + mated) , multiple imputation estimates are as follows: term estimate std.error statistic df p.value 1 (Intercept) 17.6607324 0.07126548 247.816079 233.1668 2.116834e-284 2 matage 1.1504545 0.05230345 21.995769 184.5081 1.863532e-53 3 I(matage^2) 0.8414975 0.03231752 26.038433 257.1270 4.754845e-74 4 mated1 -1.0026194 0.10787751 -9.294054 159.1101 1.094881e-16 2.5 % 97.5 % 1 17.5203258 17.8011389 2 1.0472648 1.2536442 3 0.7778567 0.9051382 4 -1.2156760 -0.7895629"},{"path":"https://elliecurnow.github.io/midoc/articles/midoc.html","id":"illustration-using-our-worked-example","dir":"Articles","previous_headings":"","what":"Illustration using our worked example","title":"Multiple Imputation DOCtor (midoc)","text":"Finally, illustrate choice analysis approach affects estimated association maternal age BMI age 7 years, adjusted maternal education level. compare CRA MI estimates. performing MI, used either pregnancy size birth weight auxiliary variable fitted either linear quadratic relationship BMI age 7 years maternal age imputation model. analysis approach, fitted substantive analysis model used . 
parameter estimates linear quadratic terms maternal age, 95% confidence intervals, shown table . Note , simulated data missingness, know “true” association .e. association missing data - shown “Full data” row table. note results displayed third row (“MI fitting quadratic relationship, using pregnancy size”) exactly generated . avoid repetition, shown code fitting models. table, can see CRA MI (fitting quadratic relationship BMI age 7 years maternal age imputation model) estimates unbiased linear quadratic terms maternal age. MI estimates biased fitting linear relationship imputation model, particularly quadratic term maternal age. MI estimates using collider, birth weight, auxiliary variable slightly biased slightly less precise estimates using pregnancy size auxiliary variable. collider bias relatively small association BMI age 7 years maternal age strong setting. Note collider bias relatively larger association weak 9. Parameter estimates maternal age","code":""},{"path":"https://elliecurnow.github.io/midoc/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Elinor Curnow. Author, maintainer, copyright holder. Jon Heron. Author. Rosie Cornish. Author. Kate Tilling. Author. James Carpenter. Author.","code":""},{"path":"https://elliecurnow.github.io/midoc/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Curnow E, Heron J, Cornish R, Tilling K, Carpenter J (2024). midoc: Decision-Making System Multiple Imputation. 
R package version 1.0.0, https://elliecurnow.github.io/midoc/.","code":"@Manual{, title = {midoc: A Decision-Making System for Multiple Imputation}, author = {Elinor Curnow and Jon Heron and Rosie Cornish and Kate Tilling and James Carpenter}, year = {2024}, note = {R package version 1.0.0}, url = {https://elliecurnow.github.io/midoc/}, }"},{"path":[]},{"path":"https://elliecurnow.github.io/midoc/index.html","id":"overview","dir":"","previous_headings":"","what":"Overview","title":"A Decision-Making System for Multiple Imputation","text":"Multiple Imputation DOCtor (midoc) R package guidance system analysis missing data. incorporates expert, --date methodology help choose appropriate analysis method missing data. examining available data assumed causal structure, midoc advise whether multiple imputation needed, , best perform . descMissData lists missing data patterns specified dataset exploreDAG compares relationships available data proposed DAG checkCRA checks complete records analysis valid proposed analysis model checkMI checks multiple imputation valid proposed imputation model checkModSpec explores parametric specification imputation model proposeMI suggests multiple imputation options based available data specified imputation model doMImice performs multiple imputation based proposeMI options can learn commands vignette(\"midoc\",\"midoc\").","code":""},{"path":"https://elliecurnow.github.io/midoc/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"A Decision-Making System for Multiple Imputation","text":"can install development version midoc GitHub :","code":"# install.packages(\"devtools\") devtools::install_github(\"elliecurnow/midoc\")"},{"path":"https://elliecurnow.github.io/midoc/index.html","id":"usage","dir":"","previous_headings":"","what":"Usage","title":"A Decision-Making System for Multiple Imputation","text":"","code":"library(midoc) head(bmi) #> bmi7 matage mated pregsize bwt r #> 1 15.16444 
-1.30048035 0 0 3.287754 1 #> 2 18.00250 -0.33689915 0 0 3.770346 1 #> 3 NA -0.22673432 0 1 3.022161 0 #> 4 NA 0.81459107 1 0 3.103251 0 #> 5 17.97791 -0.55260086 0 0 3.830381 1 #> 6 NA -0.03829346 1 0 2.775282 0 descMissData(y=\"bmi7\", covs=\"matage mated\", data=bmi, plot=TRUE) #> pattern bmi7 matage mated n pct #> 1 1 1 1 1 592 59 #> 2 2 0 1 1 408 41 exploreDAG(mdag=\" matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r pregsize -> bmi7 pregsize -> bwt sep_unmeas -> bwt\", data=bmi) #> The proposed directed acyclic graph (DAG) implies the following #> conditional independencies (where, for example, 'X _||_ Y | Z' should #> be read as 'X is independent of Y conditional on Z'). Note that #> variable names are abbreviated: #> #> bmi7 _||_ bwt | prgs, sp_n #> #> bmi7 _||_ bwt | matd, prgs #> #> bmi7 _||_ r | sp_n #> #> bmi7 _||_ r | matd #> #> bmi7 _||_ sp_n | matd #> #> bwt _||_ matg | matd #> #> bwt _||_ matg | sp_n #> #> bwt _||_ matd | sp_n #> #> bwt _||_ r | sp_n #> #> matg _||_ prgs #> #> matg _||_ r | sp_n #> #> matg _||_ r | matd #> #> matg _||_ sp_n | matd #> #> matd _||_ prgs #> #> matd _||_ r | sp_n #> #> prgs _||_ r #> #> prgs _||_ sp_n #> #> These (conditional) independence statements are explored below using #> the canonical correlations approach for mixed data. See #> ??dagitty::localTests for further details. Results are shown for #> variables that are fully observed in the specified dataset. The null #> hypothesis is that the stated variables are (conditionally) #> independent. 
#> #> estimate p.value 2.5% 97.5% #> #> bwt _||_ matage | mated 0.05018898 0.1127099 -0.01184095 0.11183410 #> #> matage _||_ pregsize 0.03029139 0.3386080 -0.03176134 0.09211150 #> #> matage _||_ r | mated 0.02998323 0.3435470 -0.03206946 0.09180567 #> #> mated _||_ pregsize 0.01594976 0.6144181 -0.04608889 0.07786585 #> #> pregsize _||_ r 0.01482015 0.6397174 -0.04721631 0.07674273 #> #> Interpretation: A small p-value means the stated variables may not be #> (conditionally) independent in the specified dataset: your data may not #> be consistent with the proposed DAG. A large p-value means there is #> little evidence of inconsistency between your data and the proposed #> DAG. #> #> Note that these results assume that relationships between variables are #> linear. Consider exploring the specification of each relationship in #> your model. Also consider whether it is valid and possible to explore #> relationships between partially observed variables using the observed #> data, e.g. avoiding perfect prediction. checkCRA(y=\"bmi7\", covs=\"matage mated\", r_cra=\"r\", mdag=\" matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r pregsize -> bmi7 pregsize -> bwt sep_unmeas -> bwt\") #> Based on the proposed directed acyclic graph (DAG), the analysis model #> outcome and complete record indicator are independent given analysis #> model covariates. Hence, complete records analysis is valid. checkMI(dep=\"bmi7\", preds=\"matage mated pregsize\", r_dep=\"r\", mdag=\" matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r pregsize -> bmi7 pregsize -> bwt sep_unmeas -> bwt\") #> Based on the proposed directed acyclic graph (DAG), the incomplete #> variable and its missingness indicator are independent given imputation #> model predictors. Hence, multiple imputation methods which assume data #> are missing at random are valid in principle. 
mimod_bmi7 <- checkModSpec(formula=\"bmi7~matage+I(matage^2)+mated+pregsize\", family=\"gaussian(identity)\", data=bmi) #> Model mis-specification method: regression of model residuals on a #> fractional polynomial of the fitted values #> #> P-value: 1 #> #> A large p-value means there is little evidence of model #> mis-specification. miprop <- proposeMI(mimodobj=mimod_bmi7, data=bmi) #> Based on your proposed imputation model and dataset, your mice() call #> should be as follows: #> #> mice(data = bmi , # You may need to specify a subset of the columns in #> your dataset #> #> m = 41 , # You should use at least this number of imputations based on #> the proportion of complete records in your dataset #> #> method = c( 'norm' ) # Specify a method for each incomplete variable. #> If displayed, the box-and-whisker plots can be used to inform your #> choice of method(s): for example, if the imputation model does not #> predict extreme values appropriately, consider a different imputation #> model/method e.g. PMM. Note the distribution of imputed and observed #> values is displayed for numeric variables only. The distribution may #> differ if data are missing at random or missing not at random. If you #> suspect data are missing not at random, the plots can also inform your #> choice of sensitivity parameter. #> #> formulas = formulas_list , # Note that you do not additionally need to #> specify a 'predmatrix' #> #> # The formulas_list specifies the conditional imputation models, which #> are as follows: #> #> 'bmi7 ~ matage + I(matage^2) + mated + pregsize' #> #> maxit = 10 , # If you have more than one incomplete variable, you #> should check this number of iterations is sufficient by inspecting the #> trace plots, if displayed. Consider increasing the number of iterations #> if there is a trend that does not stabilise by the 10th iteration. Note #> that iteration is not performed when only one variable is partially #> observed. 
#> #> printFlag = FALSE , # Change to printFlag=TRUE to display the history #> as imputation is performed #> #> seed = NA) # It is good practice to choose a seed so your results are #> reproducible doMImice(miprop, 123, substmod=\"lm(bmi7 ~ matage + I(matage^2) + mated)\") #> Given the substantive model: lm(bmi7 ~ matage + I(matage^2) + mated) , #> multiple imputation estimates are as follows: #> #> term estimate std.error statistic df p.value #> #> 1 (Intercept) 17.6607324 0.07126548 247.816079 233.1668 2.116834e-284 #> #> 2 matage 1.1504545 0.05230345 21.995769 184.5081 1.863532e-53 #> #> 3 I(matage^2) 0.8414975 0.03231752 26.038433 257.1270 4.754845e-74 #> #> 4 mated1 -1.0026194 0.10787751 -9.294054 159.1101 1.094881e-16 #> #> 2.5 % 97.5 % #> #> 1 17.5203258 17.8011389 #> #> 2 1.0472648 1.2536442 #> #> 3 0.7778567 0.9051382 #> #> 4 -1.2156760 -0.7895629"},{"path":"https://elliecurnow.github.io/midoc/reference/bmi.html","id":null,"dir":"Reference","previous_headings":"","what":"Child body mass index data — bmi","title":"Child body mass index data — bmi","text":"simulated dataset","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/bmi.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Child body mass index data — bmi","text":"","code":"bmi"},{"path":[]},{"path":"https://elliecurnow.github.io/midoc/reference/bmi.html","id":"bmi","dir":"Reference","previous_headings":"","what":"bmi","title":"Child body mass index data — bmi","text":"data frame 1000 rows 6 columns: bmi7 Child's body mass index age 7 years matage Mother's age pregnancy, standardised relative mean age 30 mated Mother's educational level: post-16 years qualification pregsize Mother's pregnancy size: singleton twins bwt Child's birth weight kilograms r Missingness indicator: whether bmi7 reported ","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkcra.html","id":null,"dir":"Reference","previous_headings":"","what":"Inspect complete 
records analysis model — checkCRA","title":"Inspect complete records analysis model — checkCRA","text":"Check complete records analysis valid proposed analysis model directed acyclic graph (DAG). Validity means proposed approach allow unbiased estimation estimand(s) interest, including regression parameters, associations, causal effects.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkcra.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Inspect complete records analysis model — checkCRA","text":"","code":"checkCRA(y, covs, r_cra, mdag)"},{"path":"https://elliecurnow.github.io/midoc/reference/checkcra.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Inspect complete records analysis model — checkCRA","text":"y analysis model outcome, specified string covs analysis model covariate(s), specified string (space delimited) r_cra complete record indicator, specified string mdag DAG, specified string using dagitty syntax","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkcra.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Inspect complete records analysis model — checkCRA","text":"message indicating whether complete records analysis valid proposed DAG analysis model outcome covariate(s)","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkcra.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Inspect complete records analysis model — checkCRA","text":"DAG include observed unobserved variables related analysis model variables missingness, well required missingness indicators. general, complete records analysis valid analysis model outcome complete record indicator unrelated, conditional specified covariates. 
determined using proposed DAG checking whether analysis model complete record indicator 'd-separated', given covariates.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkcra.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Inspect complete records analysis model — checkCRA","text":"Hughes R, Heron J, Sterne J, Tilling K. 2019. Accounting missing data statistical analyses: multiple imputation always answer. Int J Epidemiol. doi:10.1093/ije/dyz032 Bartlett JW, Harel O, Carpenter JR. 2015. Asymptotically Unbiased Estimation Exposure Odds Ratios Complete Records Logistic Regression. J Epidemiol. doi:10.1093/aje/kwv114","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkcra.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Inspect complete records analysis model — checkCRA","text":"","code":"# Example DAG for which complete records analysis is not valid, but could be ## valid for a different set of covariates checkCRA(y=\"bmi7\", covs=\"matage\", r_cra=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r\") #> Based on the proposed directed acyclic graph (DAG), the analysis model #> outcome and complete record indicator are not independent given #> analysis model covariates. Hence, in general, complete records analysis #> is not valid. #> #> In special cases, depending on the type of analysis model and estimand #> of interest, complete records analysis may still be valid. See, for #> example, Bartlett et al. (2015) (https://doi.org/10.1093/aje/kwv114) #> for further details. #> #> Consider using a different analysis model and/or strategy, e.g. #> multiple imputation. 
#> #> For example, the analysis model outcome and complete record indicator #> are independent if, in addition to the specified covariates, the #> following sets of variables are included as covariates in the analysis #> model (note that this list is not necessarily exhaustive, particularly #> if your DAG is complex): #> #> mated #> #> c(\"mated\", \"sep_unmeas\") # For the DAG in the example above, complete records analysis is valid ## if a different set of covariates is used checkCRA(y=\"bmi7\", covs=\"matage mated\", r_cra=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r\") #> Based on the proposed directed acyclic graph (DAG), the analysis model #> outcome and complete record indicator are independent given analysis #> model covariates. Hence, complete records analysis is valid. # Example DAG for which complete records is not valid, but could be valid ## for a different estimand checkCRA(y=\"bmi7\", covs=\"matage mated\", r_cra=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r matage -> bmi3 mated -> bmi3 bmi3 -> bmi7 bmi3 -> r\") #> Based on the proposed directed acyclic graph (DAG), the analysis model #> outcome and complete record indicator are not independent given #> analysis model covariates. Hence, in general, complete records analysis #> is not valid. #> #> In special cases, depending on the type of analysis model and estimand #> of interest, complete records analysis may still be valid. See, for #> example, Bartlett et al. (2015) (https://doi.org/10.1093/aje/kwv114) #> for further details. #> #> There are no other variables which could be added to the model to make #> the analysis model outcome and complete record indicator conditionally #> independent, without changing the estimand of interest. Consider using #> a different strategy e.g. multiple imputation. #> #> Alternatively, consider whether a different estimand could be of #> interest. 
For example, the analysis model outcome and complete record #> indicator are independent given each of the following sets of #> variables: #> #> c(\"bmi3\", \"mated\") #> #> c(\"bmi3\", \"matage\", \"mated\") #> #> c(\"bmi3\", \"sep_unmeas\") #> #> c(\"bmi3\", \"matage\", \"sep_unmeas\") #> #> c(\"bmi3\", \"mated\", \"sep_unmeas\") #> #> c(\"bmi3\", \"matage\", \"mated\", \"sep_unmeas\") # Example DAG for which complete records analysis is never valid checkCRA(y=\"bmi7\", covs=\"matage mated\", r_cra=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r bmi7 -> r\") #> Based on the proposed directed acyclic graph (DAG), the analysis model #> outcome and complete record indicator are not independent given #> analysis model covariates. Hence, in general, complete records analysis #> is not valid. #> #> In special cases, depending on the type of analysis model and estimand #> of interest, complete records analysis may still be valid. See, for #> example, Bartlett et al. (2015) (https://doi.org/10.1093/aje/kwv114) #> for further details. #> #> There are no other variables which could be added to the model to make #> the analysis model outcome and complete record indicator conditionally #> independent. Consider using a different strategy e.g. multiple #> imputation."},{"path":"https://elliecurnow.github.io/midoc/reference/checkmi.html","id":null,"dir":"Reference","previous_headings":"","what":"Inspect multiple imputation model — checkMI","title":"Inspect multiple imputation model — checkMI","text":"Check multiple imputation valid proposed imputation model directed acyclic graph (DAG). Validity means proposed approach allow unbiased estimation estimand(s) interest, including regression parameters, associations, causal effects. imputation model include analysis model variables predictors, well auxiliary variables. 
DAG include observed unobserved variables related analysis model variables missingness, well required missingness indicators.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkmi.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Inspect multiple imputation model — checkMI","text":"","code":"checkMI(dep, preds, r_dep, mdag)"},{"path":"https://elliecurnow.github.io/midoc/reference/checkmi.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Inspect multiple imputation model — checkMI","text":"dep partially observed variable imputed, specified string preds imputation model predictor(s), specified string (space delimited) r_dep partially observed variable's missingness indicator, specified string mdag DAG, specified string using dagitty syntax","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkmi.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Inspect multiple imputation model — checkMI","text":"message indicating whether multiple imputation valid proposed DAG imputation model","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkmi.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Inspect multiple imputation model — checkMI","text":"principle, multiple imputation valid partially observed variable unrelated missingness, given imputation model predictors.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkmi.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Inspect multiple imputation model — checkMI","text":"Curnow E, Tilling K, Heron JE, Cornish RP, Carpenter JR. 2023. Multiple imputation missing data missing random: including collider auxiliary variable imputation model can induce bias. Frontiers Epidemiology. 
doi:10.3389/fepid.2023.1237447","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkmi.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Inspect multiple imputation model — checkMI","text":"","code":"# Example DAG for which multiple imputation is valid checkMI(dep=\"bmi7\", preds=\"matage mated pregsize\", r_dep=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r pregsize -> bmi7 pregsize -> bwt sep_unmeas -> bwt\") #> Based on the proposed directed acyclic graph (DAG), the incomplete #> variable and its missingness indicator are independent given imputation #> model predictors. Hence, multiple imputation methods which assume data #> are missing at random are valid in principle. # Example DAG for which multiple imputation is not valid, due to a collider checkMI(dep=\"bmi7\", preds=\"matage mated bwt\", r_dep=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r pregsize -> bmi7 pregsize -> bwt sep_unmeas -> bwt\") #> Based on the proposed directed acyclic graph (DAG), the incomplete #> variable and its missingness indicator are not independent given #> imputation model predictors. Hence, multiple imputation methods which #> assume data are missing at random are not valid. #> #> Consider using a different imputation model and/or strategy (e.g. #> not-at-random fully conditional specification). 
For example, the #> incomplete variable and its missingness indicator are independent if, #> in addition to the specified predictors, the following sets of #> variables are included as predictors in the imputation model (note that #> this list is not necessarily exhaustive, particularly if your DAG is #> complex): #> #> pregsize #> #> c(\"pregsize\", \"sep_unmeas\")"},{"path":"https://elliecurnow.github.io/midoc/reference/checkmodspec.html","id":null,"dir":"Reference","previous_headings":"","what":"Inspect parametric model specification — checkModSpec","title":"Inspect parametric model specification — checkModSpec","text":"Explore whether observed relationships specified dataset consistent proposed parametric model (may represent analysis imputation model).","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkmodspec.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Inspect parametric model specification — checkModSpec","text":"","code":"checkModSpec(formula, family, data, plot = TRUE, message = TRUE)"},{"path":"https://elliecurnow.github.io/midoc/reference/checkmodspec.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Inspect parametric model specification — checkModSpec","text":"formula symbolic description model fitted, dependent variable left ~ operator, covariates, separated + operators, right, specified string family description error distribution link function used model, specified string; family functions supported \"gaussian(identity)\" \"binomial(logit)\" data data frame containing variables stated formula plot TRUE (default) evidence model mis-specification, displays plot can used explore functional form covariate specified model; use plot = FALSE disable plot message TRUE (default), displays message indicating whether relationships dependent variable covariates likely correctly specified ; use message = FALSE suppress 
message","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkmodspec.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Inspect parametric model specification — checkModSpec","text":"object type 'mimod' (list containing specified formula, family, dataset name). Optionally, message indicating whether relationships dependent variable covariates likely correctly specified . evidence model mis-specification, optionally returns plot model residuals versus fitted values can used explore appropriate functional form specified model.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkmodspec.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Inspect parametric model specification — checkModSpec","text":"Curnow E, Carpenter JR, Heron JE, et al. 2023. Multiple imputation missing data missing random: compatible imputation models sufficient avoid bias mis-specified. J Clin Epidemiol. doi:10.1016/j.jclinepi.2023.06.011","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkmodspec.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Inspect parametric model specification — checkModSpec","text":"","code":"# Example (incorrectly) assuming a linear relationship checkModSpec(formula=\"bmi7~matage+mated+pregsize\", family=\"gaussian(identity)\", data=bmi) #> Model mis-specification method: regression of model residuals on a #> fractional polynomial of the fitted values #> #> P-value: 0 #> #> A small p-value means the model may be mis-specified. Check the #> specification of each relationship in your model. 
## For the example above, (correctly) assuming a quadratic relationship checkModSpec(formula=\"bmi7~matage+I(matage^2)+mated+pregsize\", family=\"gaussian(identity)\", data=bmi) #> Model mis-specification method: regression of model residuals on a #> fractional polynomial of the fitted values #> #> P-value: 1 #> #> A large p-value means there is little evidence of model #> mis-specification."},{"path":"https://elliecurnow.github.io/midoc/reference/descMissData.html","id":null,"dir":"Reference","previous_headings":"","what":"Lists missing data patterns in the specified dataset — descMissData","title":"Lists missing data patterns in the specified dataset — descMissData","text":"function summarises missing data patterns specified dataset. row output corresponds missing data pattern (1=observed, 0=missing). number percentage observations also displayed missing data pattern. first column indicates number missing data patterns. second column refers analysis model outcome ('y'), variables ('covs') displayed subsequent columns. Alternatively, 'y' can used display primary variable interest, e.g. 
'y' refer exposure, variables listed 'covs'.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/descMissData.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Lists missing data patterns in the specified dataset — descMissData","text":"","code":"descMissData(y, covs, data, plot = FALSE)"},{"path":"https://elliecurnow.github.io/midoc/reference/descMissData.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Lists missing data patterns in the specified dataset — descMissData","text":"y analysis model outcome, specified string covs analysis model covariate(s), specified string (space delimited) data data frame containing specified analysis model outcome covariate(s) plot TRUE, displays plot using md.pattern visualise missing data patterns; use plot = FALSE (default) disable plot","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/descMissData.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Lists missing data patterns in the specified dataset — descMissData","text":"summary missing data patterns","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/descMissData.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Lists missing data patterns in the specified dataset — descMissData","text":"","code":"descMissData(y=\"bmi7\", covs=\"matage mated\", data=bmi) #> pattern bmi7 matage mated n pct #> 1 1 1 1 1 592 59 #> 2 2 0 1 1 408 41 descMissData(y=\"bmi7\", covs=\"matage mated pregsize bwt\", data=bmi, plot=TRUE) #> pattern bmi7 matage mated pregsize bwt n pct #> 1 1 1 1 1 1 1 592 59 #> 2 2 0 1 1 1 1 408 41"},{"path":"https://elliecurnow.github.io/midoc/reference/doMImice.html","id":null,"dir":"Reference","previous_headings":"","what":"Performs multiple imputation — doMImice","title":"Performs multiple imputation — doMImice","text":"Creates multiple imputations using mice, 
based options dataset specified call proposeMI. substantive model specified, also calculates pooled estimates using pool.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/doMImice.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Performs multiple imputation — doMImice","text":"","code":"doMImice(mipropobj, seed, substmod = \" \", message = TRUE)"},{"path":"https://elliecurnow.github.io/midoc/reference/doMImice.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Performs multiple imputation — doMImice","text":"mipropobj object type 'miprop', created call 'proposeMI' seed integer used set seed 'mice' call substmod Optionally, symbolic description substantive model fitted, specified string; supplied, model fitted imputed dataset results pooled message TRUE (default), displays message summarising analysis performed; use message = FALSE suppress message","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/doMImice.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Performs multiple imputation — doMImice","text":"'mice' object class 'mids' (multiply imputed datasets). 
Optionally, message summarising analysis performed.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/doMImice.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Performs multiple imputation — doMImice","text":"","code":"# First specify the imputation model as a 'mimod' object ## (suppressing the message) mimod_bmi7 <- checkModSpec(formula=\"bmi7~matage+I(matage^2)+mated+pregsize\", family=\"gaussian(identity)\", data=bmi, message=FALSE) # Save the proposed 'mice' options as a 'miprop' object ## (suppressing the message) miprop <- proposeMI(mimodobj=mimod_bmi7, data=bmi, message=FALSE, plot = FALSE) # Create the set of imputed datasets using the proposed 'mice' options imp <- doMImice(miprop,123) #> Now you have created your multiply imputed datasets, you can perform #> your analysis and pool the results using the 'mice' functions 'with()' #> and 'pool()' # Additionally, fit the substantive model to each imputed dataset and display ## the pooled results doMImice(miprop, 123, substmod=\"lm(bmi7 ~ matage + I(matage^2) + mated)\") #> Given the substantive model: lm(bmi7 ~ matage + I(matage^2) + mated) , #> multiple imputation estimates are as follows: #> #> term estimate std.error statistic df p.value #> #> 1 (Intercept) 17.6607324 0.07126548 247.816079 233.1668 2.116834e-284 #> #> 2 matage 1.1504545 0.05230345 21.995769 184.5081 1.863532e-53 #> #> 3 I(matage^2) 0.8414975 0.03231752 26.038433 257.1270 4.754845e-74 #> #> 4 mated1 -1.0026194 0.10787751 -9.294054 159.1101 1.094881e-16 #> #> 2.5 % 97.5 % #> #> 1 17.5203258 17.8011389 #> #> 2 1.0472648 1.2536442 #> #> 3 0.7778567 0.9051382 #> #> 4 -1.2156760 -0.7895629"},{"path":"https://elliecurnow.github.io/midoc/reference/exploreDAG.html","id":null,"dir":"Reference","previous_headings":"","what":"Compares data with proposed DAG — exploreDAG","title":"Compares data with proposed DAG — exploreDAG","text":"Explore whether relationships fully observed variables 
specified dataset consistent proposed directed acyclic graph (DAG) using localTests functionality.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/exploreDAG.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compares data with proposed DAG — exploreDAG","text":"","code":"exploreDAG(mdag, data)"},{"path":"https://elliecurnow.github.io/midoc/reference/exploreDAG.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compares data with proposed DAG — exploreDAG","text":"mdag DAG, specified string using dagitty syntax data data frame containing variables stated DAG. ordinal variables must integer-coded categorical variables must dummy-coded.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/exploreDAG.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compares data with proposed DAG — exploreDAG","text":"message indicating whether relationships fully observed variables specified dataset consistent proposed DAG","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/exploreDAG.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Compares data with proposed DAG — exploreDAG","text":"","code":"exploreDAG(mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r\", data=bmi) #> The proposed directed acyclic graph (DAG) implies the following #> conditional independencies (where, for example, 'X _||_ Y | Z' should #> be read as 'X is independent of Y conditional on Z'). Note that #> variable names are abbreviated: #> #> bmi7 _||_ r | sp_n #> #> bmi7 _||_ r | matd #> #> bmi7 _||_ sp_n | matd #> #> matg _||_ r | sp_n #> #> matg _||_ r | matd #> #> matg _||_ sp_n | matd #> #> matd _||_ r | sp_n #> #> These (conditional) independence statements are explored below using #> the canonical correlations approach for mixed data. 
See #> ??dagitty::localTests for further details. Results are shown for #> variables that are fully observed in the specified dataset. The null #> hypothesis is that the stated variables are (conditionally) #> independent. #> #> estimate p.value 2.5% 97.5% #> #> matage _||_ r | mated 0.02998323 0.343547 -0.03206946 0.09180567 #> #> Interpretation: A small p-value means the stated variables may not be #> (conditionally) independent in the specified dataset: your data may not #> be consistent with the proposed DAG. A large p-value means there is #> little evidence of inconsistency between your data and the proposed #> DAG. #> #> Note that these results assume that relationships between variables are #> linear. Consider exploring the specification of each relationship in #> your model. Also consider whether it is valid and possible to explore #> relationships between partially observed variables using the observed #> data, e.g. avoiding perfect prediction."},{"path":"https://elliecurnow.github.io/midoc/reference/midoc-package.html","id":null,"dir":"Reference","previous_headings":"","what":"midoc: A Decision-Making System for Multiple Imputation — midoc-package","title":"midoc: A Decision-Making System for Multiple Imputation — midoc-package","text":"guidance system analysis missing data. incorporates expert, --date methodology help researchers choose appropriate analysis approach data missing. provide available data assumed causal structure, including likely causes missing data. 'midoc' advise analysis approaches can used, best perform . 'midoc' follows framework treatment reporting missing data observational studies (TARMOS). Lee et al (2021). 
doi:10.1016/j.jclinepi.2021.01.008 .","code":""},{"path":[]},{"path":"https://elliecurnow.github.io/midoc/reference/midoc-package.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"midoc: A Decision-Making System for Multiple Imputation — midoc-package","text":"Maintainer: Elinor Curnow elinor.curnow@bristol.ac.uk (ORCID) [copyright holder] Authors: Jon Heron Rosie Cornish Kate Tilling James Carpenter","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/midocVignette.html","id":null,"dir":"Reference","previous_headings":"","what":"Run an interactive vignette for the midoc package — midocVignette","title":"Run an interactive vignette for the midoc package — midocVignette","text":"Runs interactive version midoc vignette: Multiple Imputation DOCtor (midoc). interactive version, can apply midoc functions shiny-package apps using DAG data.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/midocVignette.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Run an interactive vignette for the midoc package — midocVignette","text":"","code":"midocVignette()"},{"path":"https://elliecurnow.github.io/midoc/reference/midocVignette.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Run an interactive vignette for the midoc package — midocVignette","text":"browser-based, interactive version midoc vignette","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/midocVignette.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Run an interactive vignette for the midoc package — midocVignette","text":"","code":"if (FALSE) { # interactive() # Run the interactive vignette midocVignette() }"},{"path":"https://elliecurnow.github.io/midoc/reference/proposeMI.html","id":null,"dir":"Reference","previous_headings":"","what":"Suggests multiple imputation options — proposeMI","title":"Suggests multiple 
imputation options — proposeMI","text":"Suggests mice options perform multiple imputation, based proposed set imputation models (one partially observed variable) specified dataset.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/proposeMI.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Suggests multiple imputation options — proposeMI","text":"","code":"proposeMI(mimodobj, data, plot = TRUE, plotprompt = TRUE, message = TRUE)"},{"path":"https://elliecurnow.github.io/midoc/reference/proposeMI.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Suggests multiple imputation options — proposeMI","text":"mimodobj object, list objects, type 'mimod', stands 'multiple imputation model', created call checkModSpec data data frame containing variables required imputation substantive analysis plot TRUE (default), displays diagnostic plots proposed 'mice' call; use plot=FALSE disable plots plotprompt TRUE (default), user prompted second plot displayed; use plotprompt=FALSE remove prompt message TRUE (default), displays message describing proposed 'mice' options; use message=FALSE suppress message","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/proposeMI.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Suggests multiple imputation options — proposeMI","text":"object type 'miprop', can used run 'mice' using proposed options, plus, optionally, message diagnostic plots describing proposed 'mice' options","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/proposeMI.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Suggests multiple imputation options — proposeMI","text":"","code":"# First specify each imputation model as a 'mimod' object ## (suppressing the message) mimod_bmi7 <- checkModSpec(formula=\"bmi7~matage+I(matage^2)+mated+pregsize\", family=\"gaussian(identity)\", 
data=bmi, message=FALSE) mimod_pregsize <- checkModSpec( formula=\"pregsize~bmi7+matage+I(matage^2)+mated\", family=\"binomial(logit)\", data=bmi, message=FALSE) # Display the proposed 'mice' options (suppressing the plot prompt) ## When specifying a single imputation model proposeMI(mimodobj=mimod_bmi7, data=bmi, plotprompt = FALSE) #> Based on your proposed imputation model and dataset, your mice() call #> should be as follows: #> #> mice(data = bmi , # You may need to specify a subset of the columns in #> your dataset #> #> m = 41 , # You should use at least this number of imputations based on #> the proportion of complete records in your dataset #> #> method = c( ‘norm’ ) # Specify a method for each incomplete variable. #> If displayed, the box-and-whisker plots can be used to inform your #> choice of method(s): for example, if the imputation model does not #> predict extreme values appropriately, consider a different imputation #> model/method e.g. PMM. Note the distribution of imputed and observed #> values is displayed for numeric variables only. The distribution may #> differ if data are missing at random or missing not at random. If you #> suspect data are missing not at random, the plots can also inform your #> choice of sensitivity parameter. #> #> formulas = formulas_list , # Note that you do not additionally need to #> specify a 'predmatrix' #> #> # The formulas_list specifies the conditional imputation models, which #> are as follows: #> #> ‘bmi7 ~ matage + I(matage^2) + mated + pregsize’ #> #> maxit = 10 , # If you have more than one incomplete variable, you #> should check this number of iterations is sufficient by inspecting the #> trace plots, if displayed. Consider increasing the number of iterations #> if there is a trend that does not stabilise by the 10th iteration. Note #> that iteration is not performed when only one variable is partially #> observed. 
#> #> printFlag = FALSE , # Change to printFlag=TRUE to display the history #> as imputation is performed #> #> seed = NA) # It is good practice to choose a seed so your results are #> reproducible ## When specifying more than one imputation model (suppressing the plots) proposeMI(mimodobj=list(mimod_bmi7,mimod_pregsize), data=bmi, plot = FALSE) #> Based on your proposed imputation model and dataset, your mice() call #> should be as follows: #> #> mice(data = bmi , # You may need to specify a subset of the columns in #> your dataset #> #> m = 41 , # You should use at least this number of imputations based on #> the proportion of complete records in your dataset #> #> method = c( ‘norm’, ‘logreg’ ) # Specify a method for each incomplete #> variable. If displayed, the box-and-whisker plots can be used to #> inform your choice of method(s): for example, if the imputation model #> does not predict extreme values appropriately, consider a different #> imputation model/method e.g. PMM. Note the distribution of imputed and #> observed values is displayed for numeric variables only. The #> distribution may differ if data are missing at random or missing not at #> random. If you suspect data are missing not at random, the plots can #> also inform your choice of sensitivity parameter. #> #> formulas = formulas_list , # Note that you do not additionally need to #> specify a 'predmatrix' #> #> # The formulas_list specifies the conditional imputation models, which #> are as follows: #> #> ‘bmi7 ~ matage + I(matage^2) + mated + pregsize’ #> #> ‘pregsize ~ bmi7 + matage + I(matage^2) + mated’ #> #> maxit = 10 , # If you have more than one incomplete variable, you #> should check this number of iterations is sufficient by inspecting the #> trace plots, if displayed. Consider increasing the number of iterations #> if there is a trend that does not stabilise by the 10th iteration. Note #> that iteration is not performed when only one variable is partially #> observed. 
#> #> printFlag = FALSE , # Change to printFlag=TRUE to display the history #> as imputation is performed #> #> seed = NA) # It is good practice to choose a seed so your results are #> reproducible"},{"path":"https://elliecurnow.github.io/midoc/news/index.html","id":"midoc-100","dir":"Changelog","previous_headings":"","what":"midoc 1.0.0","title":"midoc 1.0.0","text":"CRAN release: 2024-10-02","code":""},{"path":"https://elliecurnow.github.io/midoc/news/index.html","id":"changes-coming-in-version-1-0-0","dir":"Changelog","previous_headings":"","what":"Changes coming in version 1.0","title":"midoc 1.0.0","text":"Initial CRAN submission.","code":""}]
+[{"path":"https://elliecurnow.github.io/midoc/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2022 midoc authors Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://elliecurnow.github.io/midoc/articles/midoc.html","id":"about-midoc","dir":"Articles","previous_headings":"","what":"About midoc","title":"Multiple Imputation DOCtor (midoc)","text":"Missing data common issue health social research, often addressed multiple imputation (MI). MI flexible general approach, suite software packages. However, using MI practice can complex. Application MI involves multiple decisions rarely justified even documented, little guidance available. Multiple Imputation DOCtor (midoc) R package decision-making system incorporates expert, --date guidance help choose appropriate analysis method missing data. midoc guide analysis, examining hypothesised causal relationships observed data advise whether MI needed, perform . midoc follows framework treatment reporting missing data observational studies (TARMOS) 1. assume interested obtaining unbiased estimates regression coefficients - note bias necessarily concern interest prediction (.e. diagnostic/prognostic modelling). 
, demonstrate key features midoc using worked example. example, wish estimate association maternal age first pregnancy (exposure) child’s body mass index (BMI) age 7 years (outcome). simplicity, consider one confounder relationship maternal age BMI age 7 years, maternal education level. Note simulated data study included midoc package bmi dataset. dataset contains 1000 observations, realistic values variable, exaggerated relationships variables (highlight consequences choice analysis approach). Note interactive version vignette: Multiple Imputation DOCtor (midoc) Shiny version also available run locally (can run using midoc command midocVignette()). interactive version, can apply features midoc described using DAG data.","code":""},{"path":"https://elliecurnow.github.io/midoc/articles/midoc.html","id":"step-1-specify-the-analysis-and-missingness-models-using-a-directed-acyclic-graph","dir":"Articles","previous_headings":"","what":"Step 1 Specify the analysis and missingness models using a directed acyclic graph","title":"Multiple Imputation DOCtor (midoc)","text":"First, construct causal diagram, directed acyclic graph (DAG) example, using syntax per dagitty package. start specifying relationships variables, assuming missing data. assume maternal age (matage) causes BMI age 7 years (bmi7), maternal education level (mated) causes maternal age BMI age 7 years. can express relationships using “dagitty” syntax, follows: Next, partially observed variable, specify variables related probability missing (“missingness”) adding relationships DAG. type DAG often referred “missingness” DAG (mDAG) 2, 3. first use midoc function descMissData identify variables dataset partially observed, specifying outcome (y), covariates, .e. independent variables, (covs), dataset (data), follows. see two missing data patterns: either variables observed, BMI age 7 years missing covariates observed. 
use indicator variable “R” denote missingness BMI age 7 years (example, R=1 BMI age 7 years observed, 0 otherwise). specific example, R also indicates complete record (R=1 variables fully observed, 0 otherwise) variables fully observed. suppose R related maternal education level via socio-economic position (SEP), .e. SEP cause maternal education level R, neither BMI age 7 years maternal age causes R. suppose SEP missing (unmeasured) individuals dataset; remind us fact, name variable sep_unmeas. mDAG now follows (note follow convention using lower case names variables code, R becomes “r”, ): Note instead believe maternal education direct cause R, mDAG follows: now draw mDAG visually check relationships specified intended: Note used additional commands specify layout mDAG shown - although necessary using midoc, go dagitty website like find using “dagitty” draw mDAGs. final check mDAG, use midoc function exploreDAG explore whether relationships dataset consistent proposed mDAG, specifying mDAG (mdag) dataset (data), follows. Based relationships fully observed variables maternal age, maternal education, missingness BMI age 7 years, can see little evidence inconsistency dataset proposed mDAG. particular, mDAG assumes maternal age (matage) unrelated missingness BMI age 7 years (r), given maternal education (mated); results suggest plausible. Note use observed data determine whether BMI age 7 years unrelated missingness - need missing values BMI age 7 years order . However, BMI age 7 years cause missingness, expect maternal age also related missingness (via BMI age 7 years). Since maternal age seems unrelated, reassured BMI age 7 years also likely unrelated, given maternal education. Tips specifying “missingness” DAG First specify DAG analysis model, missing data. may find introduction DAGs useful 4. Next add missingness indicator(s) DAG. multiple variables missing data, may want start including just complete records indicator DAG. 
Identify variables related missingness using: Subject-matter knowledge, example, prior research causes drop-study knowledge data collection process Data exploration, example, performing logistic regression missingness indicator analysis model variables - noting may exclude variables large proportion missing data avoid perfect prediction","code":"matage -> bmi7 mated -> matage mated -> bmi7 descMissData(y=\"bmi7\", covs=\"matage mated\", data=bmi) pattern bmi7 matage mated n pct 1 1 1 1 1 592 59 2 2 0 1 1 408 41 matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r matage -> bmi7 mated -> matage mated -> bmi7 mated -> r exploreDAG(mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r\", data=bmi) The proposed directed acyclic graph (DAG) implies the following conditional independencies (where, for example, 'X _||_ Y | Z' should be read as 'X is independent of Y conditional on Z'). Note that variable names are abbreviated: bmi7 _||_ r | sp_n bmi7 _||_ r | matd bmi7 _||_ sp_n | matd matg _||_ r | sp_n matg _||_ r | matd matg _||_ sp_n | matd matd _||_ r | sp_n These (conditional) independence statements are explored below using the canonical correlations approach for mixed data. See ??dagitty::localTests for further details. Results are shown for variables that are fully observed in the specified dataset. The null hypothesis is that the stated variables are (conditionally) independent. estimate p.value 2.5% 97.5% matage _||_ r | mated 0.02998323 0.343547 -0.03206946 0.09180567 Interpretation: A small p-value means the stated variables may not be (conditionally) independent in the specified dataset: your data may not be consistent with the proposed DAG. A large p-value means there is little evidence of inconsistency between your data and the proposed DAG. Note that these results assume that relationships between variables are linear. Consider exploring the specification of each relationship in your model. 
Also consider whether it is valid and possible to explore relationships between partially observed variables using the observed data, e.g. avoiding perfect prediction."},{"path":"https://elliecurnow.github.io/midoc/articles/midoc.html","id":"step-2-check-whether-complete-records-analysis-is-likely-to-be-a-valid-strategy","dir":"Articles","previous_headings":"","what":"Step 2 Check whether complete records analysis is likely to be a valid strategy","title":"Multiple Imputation DOCtor (midoc)","text":"next step determine whether complete records analysis (CRA) valid strategy, using mDAG. Remember , general, CRA valid analysis model outcome unrelated complete records indicator, conditional analysis model covariates 5 (special cases, depending type analysis model estimand interest, rule can relaxed 6 - , consider general setting without making assumptions fitted model). Suppose decide estimate unadjusted association BMI age 7 years maternal age, without including confounder maternal education model. use midoc function checkCRA applied mDAG check whether CRA valid model, specifying outcome (y), covariates, .e. independent variables, (covs), complete records indicator (r_cra), mDAG (mdag), follows: can see CRA valid (can also tell inspecting DAG: open path bmi7 r via mated sep_unmeas condition matage). checkCRA suggests CRA valid included mated, mated sep_unmeas, analysis model. particular setting, sensible include mated analysis model since confounder relationship matage bmi7. settings, might want include variables required valid CRA model might change interpretation - case, need use different analysis strategy. Note sep_unmeas related bmi7 condition mated (though still related missingness bmi7), need included analysis model. add mated model re-run checkCRA, , see CRA now valid. Note outcome, BMI age 7 years, cause missingness, CRA always invalid, .e. variables add analysis model make CRA valid. 
See see results checkCRA case (note, code, added path bmi7 r specified mDAG).","code":"checkCRA(y=\"bmi7\", covs=\"matage\", r_cra=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r\") Based on the proposed directed acyclic graph (DAG), the analysis model outcome and complete record indicator are not independent given analysis model covariates. Hence, in general, complete records analysis is not valid. In special cases, depending on the type of analysis model and estimand of interest, complete records analysis may still be valid. See, for example, Bartlett et al. (2015) (https://doi.org/10.1093/aje/kwv114) for further details. Consider using a different analysis model and/or strategy, e.g. multiple imputation. For example, the analysis model outcome and complete record indicator are independent if, in addition to the specified covariates, the following sets of variables are included as covariates in the analysis model (note that this list is not necessarily exhaustive, particularly if your DAG is complex): mated c(\"mated\", \"sep_unmeas\") checkCRA(y=\"bmi7\", covs=\"matage mated\", r_cra=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r\") Based on the proposed directed acyclic graph (DAG), the analysis model outcome and complete record indicator are independent given analysis model covariates. Hence, complete records analysis is valid. checkCRA(y=\"bmi7\", covs=\"matage mated\", r_cra=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r bmi7 -> r\") Based on the proposed directed acyclic graph (DAG), the analysis model outcome and complete record indicator are not independent given analysis model covariates. Hence, in general, complete records analysis is not valid. In special cases, depending on the type of analysis model and estimand of interest, complete records analysis may still be valid. See, for example, Bartlett et al. 
(2015) (https://doi.org/10.1093/aje/kwv114) for further details. There are no other variables which could be added to the model to make the analysis model outcome and complete record indicator conditionally independent. Consider using a different strategy e.g. multiple imputation."},{"path":"https://elliecurnow.github.io/midoc/articles/midoc.html","id":"step-3-check-whether-multiple-imputation-is-likely-to-be-a-valid-strategy","dir":"Articles","previous_headings":"","what":"Step 3 Check whether multiple imputation is likely to be a valid strategy","title":"Multiple Imputation DOCtor (midoc)","text":"Although CRA valid example, may also wish perform MI. Remember MI valid principle partially observed variable unrelated missingness, given imputation model predictors. Furthermore, include analysis model variables imputation model partially observed variable, form implied analysis model, analysis imputation models “compatible”. theory, given multiple partially observed variables, validity MI may imply different causes missingness missing data pattern. example, BMI age 7 years maternal education partially observed, MI valid missingness BMI age 7 years unrelated maternal education among individuals missing BMI age 7 years maternal education (given observed data). Missingness BMI age 7 years related maternal education among individuals observed maternal education. practice, recommend focusing common missing data patterns /variables missing data. Less common missing data patterns can often assumed missing completely random - unlikely change final conclusions assumption incorrect. example, single partially observed variable (BMI age 7 years), relatively simple check validity MI based mDAG. already verified (using checkCRA) BMI age 7 years unrelated missingness, given maternal age maternal education. Therefore, know MI valid use variables imputation model BMI age 7 years (analysis model imputation model exactly case). 
However, MI using just maternal age maternal education imputation model BMI age 7 years recover additional information compared CRA. Therefore, may wish include “auxiliary variables” imputation model BMI age 7 years. additional variables included predictors imputation model required analysis model. choose auxiliary variables predictive BMI age 7 years, can improve precision MI estimate - reduce standard error - compared CRA estimate. example, two variables used auxiliary variables: pregnancy size - singleton multiple birth - (pregsize) birth weight (bwt). inspect missing data patterns dataset using descMissData, including auxiliary variables. can see auxiliary variables fully observed. assume pregnancy size cause BMI age 7 years, missingness. assume birth weight related BMI 7 years (via pregnancy size) missingness (via SEP). now add variables mDAG. , shown updated mDAG. also explore whether relationships dataset consistent updated mDAG using exploreDAG, follows. results suggest updated mDAG plausible. Note CRA still valid updated mDAG. can check using checkCRA : now use midoc function checkMI applied DAG check whether MI valid imputation model predictors BMI age 7 years include pregnancy size birth weight, well maternal age maternal education. specify partially observed variable (dep), predictors (preds), missingness indicator partially observed variable (r_dep), mDAG (mdag). first consider imputation model including pregnancy size. results shown . suggest MI valid principle included pregnancy size well analysis model variables imputation model BMI age 7 years. next consider imputation model including birth weight. results shown . suggest MI valid included birth weight well analysis model variables imputation model BMI age 7 years. can also tell inspecting mDAG: since bwt shares common cause bmi7 r, “collider”, hence conditioning bwt opens path bmi7 r via bwt. 
Note theory, suggested checkMI results shown , MI valid added birth weight pregnancy size auxiliary variables imputation model (note SEP needed, conditional imputation model predictors). However, practice, strategy may still result biased estimates, due unmeasured confounding relationship BMI age 7 years birth weight. recommend including colliders partially observed variable missingness auxiliary variables 7.","code":"descMissData(y=\"bmi7\", covs=\"matage mated pregsize bwt\", data=bmi) pattern bmi7 matage mated pregsize bwt n pct 1 1 1 1 1 1 1 592 59 2 2 0 1 1 1 1 408 41 exploreDAG(mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r pregsize -> bmi7 pregsize -> bwt sep_unmeas -> bwt\", data=bmi) The proposed directed acyclic graph (DAG) implies the following conditional independencies (where, for example, 'X _||_ Y | Z' should be read as 'X is independent of Y conditional on Z'). Note that variable names are abbreviated: bmi7 _||_ bwt | prgs, sp_n bmi7 _||_ bwt | matd, prgs bmi7 _||_ r | sp_n bmi7 _||_ r | matd bmi7 _||_ sp_n | matd bwt _||_ matg | matd bwt _||_ matg | sp_n bwt _||_ matd | sp_n bwt _||_ r | sp_n matg _||_ prgs matg _||_ r | sp_n matg _||_ r | matd matg _||_ sp_n | matd matd _||_ prgs matd _||_ r | sp_n prgs _||_ r prgs _||_ sp_n These (conditional) independence statements are explored below using the canonical correlations approach for mixed data. See ??dagitty::localTests for further details. Results are shown for variables that are fully observed in the specified dataset. The null hypothesis is that the stated variables are (conditionally) independent. 
estimate p.value 2.5% 97.5% bwt _||_ matage | mated 0.05018898 0.1127099 -0.01184095 0.11183410 matage _||_ pregsize 0.03029139 0.3386080 -0.03176134 0.09211150 matage _||_ r | mated 0.02998323 0.3435470 -0.03206946 0.09180567 mated _||_ pregsize 0.01594976 0.6144181 -0.04608889 0.07786585 pregsize _||_ r 0.01482015 0.6397174 -0.04721631 0.07674273 Interpretation: A small p-value means the stated variables may not be (conditionally) independent in the specified dataset: your data may not be consistent with the proposed DAG. A large p-value means there is little evidence of inconsistency between your data and the proposed DAG. Note that these results assume that relationships between variables are linear. Consider exploring the specification of each relationship in your model. Also consider whether it is valid and possible to explore relationships between partially observed variables using the observed data, e.g. avoiding perfect prediction. checkCRA(y=\"bmi7\", covs=\"matage mated\", r_cra=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r pregsize -> bmi7 pregsize -> bwt sep_unmeas -> bwt\") Based on the proposed directed acyclic graph (DAG), the analysis model outcome and complete record indicator are independent given analysis model covariates. Hence, complete records analysis is valid. checkMI(dep=\"bmi7\", preds=\"matage mated pregsize\", r_dep=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r pregsize -> bmi7 pregsize -> bwt sep_unmeas -> bwt\") Based on the proposed directed acyclic graph (DAG), the incomplete variable and its missingness indicator are independent given imputation model predictors. Hence, multiple imputation methods which assume data are missing at random are valid in principle. 
checkMI(dep=\"bmi7\", preds=\"matage mated bwt\", r_dep=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r pregsize -> bmi7 pregsize -> bwt sep_unmeas -> bwt\") Based on the proposed directed acyclic graph (DAG), the incomplete variable and its missingness indicator are not independent given imputation model predictors. Hence, multiple imputation methods which assume data are missing at random are not valid. Consider using a different imputation model and/or strategy (e.g. not-at-random fully conditional specification). For example, the incomplete variable and its missingness indicator are independent if, in addition to the specified predictors, the following sets of variables are included as predictors in the imputation model (note that this list is not necessarily exhaustive, particularly if your DAG is complex): pregsize c(\"pregsize\", \"sep_unmeas\")"},{"path":"https://elliecurnow.github.io/midoc/articles/midoc.html","id":"step-4-check-that-all-relationships-are-correctly-specified","dir":"Articles","previous_headings":"","what":"Step 4 Check that all relationships are correctly specified","title":"Multiple Imputation DOCtor (midoc)","text":"far, explored whether CRA MI valid principle using mDAG, without making assumptions form variables, relationships . However, MI give unbiased estimates, imputation models must compatible analysis model correctly specified: must contain variables required analysis model, must include relationships implied analysis model e.g. interactions, must specify form relationships correctly 8. Since CRA MI valid principle worked example, use complete records bmi dataset explore specification relationships BMI age 7 years predictors (analysis model variables, maternal age maternal education, plus auxiliary variable, pregnancy size) imputation model. use midoc function checkModSpec applied bmi dataset check whether imputation model correctly specified. 
specify formula imputation model using standard R syntax (formula), type imputation model (family) (note midoc currently supports either linear logistic regression models), name dataset (data). Since maternal education pregnancy size binary variables, need explore form relationship BMI age 7 years continuous exposure, maternal age. first assume linear relationship BMI age 7 years maternal age (note, default software implementations MI). assume interactions. results shown . suggest imputation model mis-specified. plot residuals versus fitted values model (automatically displayed evidence model mis-specification), suggests may quadratic relationship BMI age 7 years maternal age. use midoc function checkModSpec , time specifying quadratic relationship BMI age 7 years maternal age. results suggest longer evidence model mis-specification. Note must make sure account non-linear relationship BMI age 7 years maternal age imputation models. example, imputation model pregnancy size need include BMI age 7 years, maternal education, quadratic form maternal age (induced conditioning BMI age 7 years). Although missing values pregnancy size dataset, can still explore specification need using checkModSpec follows (note suppressed plot case using plot = FALSE option): evidence model mis-specification. include quadratic form maternal age model pregnancy size, little evidence model mis-specification: Tips imputation model variable selection imputation model partially observed variable include: analysis model variables - check relationships partially observed variable predictors correctly specified imputation model e.g. 
using fractional polynomial selection auxiliary variables related missingness partially observed variable missing data , conditional analysis model variables Auxiliary variables related missing data missingness partially observed variable, conditional variables selected Steps 1 2 - large number variables, include predictive imputation model (using suitable variable selection method identify ) imputation model partially observed variable exclude: auxiliary variables related missingness partially observed variable missing data, conditional variables selected Steps 1, 2, 3 auxiliary variables colliders partially observed variable missingness","code":"checkModSpec(formula=\"bmi7~matage+mated+pregsize\", family=\"gaussian(identity)\", data=bmi) Model mis-specification method: regression of model residuals on a fractional polynomial of the fitted values P-value: 0 A small p-value means the model may be mis-specified. Check the specification of each relationship in your model. checkModSpec(formula=\"bmi7~matage+I(matage^2)+mated+pregsize\", family=\"gaussian(identity)\", data=bmi) Model mis-specification method: regression of model residuals on a fractional polynomial of the fitted values P-value: 1 A large p-value means there is little evidence of model mis-specification. checkModSpec(formula=\"pregsize~matage+bmi7+mated\", family=\"binomial(logit)\", data=bmi, plot=FALSE) Model mis-specification method: Pregibon's link test P-value: 0.038313 A small p-value means the model may be mis-specified. Check the specification of each relationship in your model. 
checkModSpec(formula=\"pregsize~matage+I(matage^2)+bmi7+mated\", family=\"binomial(logit)\", data=bmi) Model mis-specification method: Pregibon's link test P-value: 0.555356 A large p-value means there is little evidence of model mis-specification."},{"path":"https://elliecurnow.github.io/midoc/articles/midoc.html","id":"step-5-perform-mi-using-the-proposed-imputation-model","dir":"Articles","previous_headings":"","what":"Step 5 Perform MI using the proposed imputation model","title":"Multiple Imputation DOCtor (midoc)","text":"explored validity MI principle, using mDAG, specification imputation model, based observed data. now use midoc function proposeMI choose best options performing MI using mice package. first save chosen imputation model (.e. specifying quadratic relationship BMI age 7 years maternal age) mimod object. Note suppressed checkModSpec message case using message = FALSE option. use , along dataset, construct call “mice” function. Note also save proposed “mice” call miprop object, used later. results shown . particular, note proposed “mice” call, default values number imputations, method, formulas, number iterations changed. Plots distributions imputed observed data, based sample five imputed datasets, suggest extreme values handled appropriately using proposed imputation method. Trace plots, showing mean standard deviation imputed values across iterations, also displayed. Note plots shown without prompting (plotprompt = FALSE). need adjust number iterations , dataset, one variable partially observed. Note Given multiple partially observed variables, can specify list imputation models - one partially observed variable - proposeMI. example, suppose pregnancy size also partially observed. assume, simplicity, pregnancy size missing completely random. construct proposed “mice” call using proposeMI, follows. , suppress model checking messages. Returning example, assume adjustment required proposed “mice” call. 
use midoc function doMImice perform MI, specifying proposed “mice” call (miprop) seed “mice” call (seed) (results reproducible). also specify substantive model interest (substmod): regression BMI 7 years maternal age (fitting quadratic relationship) maternal education. optional step: specify substantive model, fitted automatically imputed dataset pooled results displayed (equivalent using “mice” functions pool). substantive model specified, imputation step performed.","code":"mimod_bmi7 <- checkModSpec(formula=\"bmi7~matage+I(matage^2)+mated+pregsize\", family=\"gaussian(identity)\", data=bmi, message=FALSE) miprop <- proposeMI(mimodobj=mimod_bmi7, data=bmi, plotprompt=FALSE) Based on your proposed imputation model and dataset, your mice() call should be as follows: mice(data = bmi , # You may need to specify a subset of the columns in your dataset m = 41 , # You should use at least this number of imputations based on the proportion of complete records in your dataset method = c( 'norm' ) # Specify a method for each incomplete variable. If displayed, the box-and-whisker plots can be used to inform your choice of method(s): for example, if the imputation model does not predict extreme values appropriately, consider a different imputation model/method e.g. PMM. Note the distribution of imputed and observed values is displayed for numeric variables only. The distribution may differ if data are missing at random or missing not at random. If you suspect data are missing not at random, the plots can also inform your choice of sensitivity parameter. formulas = formulas_list , # Note that you do not additionally need to specify a 'predmatrix' # The formulas_list specifies the conditional imputation models, which are as follows: 'bmi7 ~ matage + I(matage^2) + mated + pregsize' maxit = 10 , # If you have more than one incomplete variable, you should check this number of iterations is sufficient by inspecting the trace plots, if displayed. 
Consider increasing the number of iterations if there is a trend that does not stabilise by the 10th iteration. Note that iteration is not performed when only one variable is partially observed. printFlag = FALSE , # Change to printFlag=TRUE to display the history as imputation is performed seed = NA) # It is good practice to choose a seed so your results are reproducible mimod_bmi7 <- checkModSpec(formula=\"bmi7~matage+I(matage^2)+mated+pregsize\", family=\"gaussian(identity)\", data=bmi, message=FALSE) mimod_pregsize <- checkModSpec(formula=\"pregsize~bmi7+matage+I(matage^2)+mated\", family=\"binomial(logit)\", data=bmi, message=FALSE) proposeMI(mimodobj=list(mimod_bmi7, mimod_pregsize), data=bmi) doMImice(miprop, seed=123, substmod=\"lm(bmi7 ~ matage + I(matage^2) + mated)\") Given the substantive model: lm(bmi7 ~ matage + I(matage^2) + mated) , multiple imputation estimates are as follows: term estimate std.error statistic df p.value 1 (Intercept) 17.6607324 0.07126548 247.816079 233.1668 2.116834e-284 2 matage 1.1504545 0.05230345 21.995769 184.5081 1.863532e-53 3 I(matage^2) 0.8414975 0.03231752 26.038433 257.1270 4.754845e-74 4 mated1 -1.0026194 0.10787751 -9.294054 159.1101 1.094881e-16 2.5 % 97.5 % 1 17.5203258 17.8011389 2 1.0472648 1.2536442 3 0.7778567 0.9051382 4 -1.2156760 -0.7895629"},{"path":"https://elliecurnow.github.io/midoc/articles/midoc.html","id":"illustration-using-our-worked-example","dir":"Articles","previous_headings":"","what":"Illustration using our worked example","title":"Multiple Imputation DOCtor (midoc)","text":"Finally, illustrate choice analysis approach affects estimated association maternal age BMI age 7 years, adjusted maternal education level. compare CRA MI estimates. performing MI, used either pregnancy size birth weight auxiliary variable fitted either linear quadratic relationship BMI age 7 years maternal age imputation model. analysis approach, fitted substantive analysis model used . 
parameter estimates linear quadratic terms maternal age, 95% confidence intervals, shown table . Note , simulated data missingness, know “true” association .e. association missing data - shown “Full data” row table. note results displayed third row (“MI fitting quadratic relationship, using pregnancy size”) exactly generated . avoid repetition, shown code fitting models. table, can see CRA MI (fitting quadratic relationship BMI age 7 years maternal age imputation model) estimates unbiased linear quadratic terms maternal age. MI estimates biased fitting linear relationship imputation model, particularly quadratic term maternal age. MI estimates using collider, birth weight, auxiliary variable slightly biased slightly less precise estimates using pregnancy size auxiliary variable. collider bias relatively small association BMI age 7 years maternal age strong setting. Note collider bias relatively larger association weak 9. Parameter estimates maternal age","code":""},{"path":"https://elliecurnow.github.io/midoc/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Elinor Curnow. Author, maintainer, copyright holder. Jon Heron. Author. Rosie Cornish. Author. Kate Tilling. Author. James Carpenter. Author.","code":""},{"path":"https://elliecurnow.github.io/midoc/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Curnow E, Heron J, Cornish R, Tilling K, Carpenter J (2024). midoc: Decision-Making System Multiple Imputation. 
R package version 1.0.0, https://elliecurnow.github.io/midoc/.","code":"@Manual{, title = {midoc: A Decision-Making System for Multiple Imputation}, author = {Elinor Curnow and Jon Heron and Rosie Cornish and Kate Tilling and James Carpenter}, year = {2024}, note = {R package version 1.0.0}, url = {https://elliecurnow.github.io/midoc/}, }"},{"path":[]},{"path":"https://elliecurnow.github.io/midoc/index.html","id":"overview","dir":"","previous_headings":"","what":"Overview","title":"A Decision-Making System for Multiple Imputation","text":"Multiple Imputation DOCtor (midoc) R package guidance system analysis missing data. incorporates expert, --date methodology help choose appropriate analysis method missing data. examining available data assumed causal structure, midoc advise whether multiple imputation needed, , best perform . descMissData lists missing data patterns specified dataset exploreDAG compares relationships available data proposed DAG checkCRA checks complete records analysis valid proposed analysis model checkMI checks multiple imputation valid proposed imputation model checkModSpec explores parametric specification imputation model proposeMI suggests multiple imputation options based available data specified imputation model doMImice performs multiple imputation based proposeMI options can learn commands vignette(\"midoc\",\"midoc\").","code":""},{"path":"https://elliecurnow.github.io/midoc/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"A Decision-Making System for Multiple Imputation","text":"can install development version midoc GitHub :","code":"# install.packages(\"devtools\") devtools::install_github(\"elliecurnow/midoc\")"},{"path":"https://elliecurnow.github.io/midoc/index.html","id":"usage","dir":"","previous_headings":"","what":"Usage","title":"A Decision-Making System for Multiple Imputation","text":"","code":"library(midoc) head(bmi) #> bmi7 matage mated pregsize bwt r #> 1 15.16444 
-1.30048035 0 0 3.287754 1 #> 2 18.00250 -0.33689915 0 0 3.770346 1 #> 3 NA -0.22673432 0 1 3.022161 0 #> 4 NA 0.81459107 1 0 3.103251 0 #> 5 17.97791 -0.55260086 0 0 3.830381 1 #> 6 NA -0.03829346 1 0 2.775282 0 descMissData(y=\"bmi7\", covs=\"matage mated\", data=bmi, plot=TRUE) #> pattern bmi7 matage mated n pct #> 1 1 1 1 1 592 59 #> 2 2 0 1 1 408 41 exploreDAG(mdag=\" matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r pregsize -> bmi7 pregsize -> bwt sep_unmeas -> bwt\", data=bmi) #> The proposed directed acyclic graph (DAG) implies the following #> conditional independencies (where, for example, 'X _||_ Y | Z' should #> be read as 'X is independent of Y conditional on Z'). Note that #> variable names are abbreviated: #> #> bmi7 _||_ bwt | prgs, sp_n #> #> bmi7 _||_ bwt | matd, prgs #> #> bmi7 _||_ r | sp_n #> #> bmi7 _||_ r | matd #> #> bmi7 _||_ sp_n | matd #> #> bwt _||_ matg | matd #> #> bwt _||_ matg | sp_n #> #> bwt _||_ matd | sp_n #> #> bwt _||_ r | sp_n #> #> matg _||_ prgs #> #> matg _||_ r | sp_n #> #> matg _||_ r | matd #> #> matg _||_ sp_n | matd #> #> matd _||_ prgs #> #> matd _||_ r | sp_n #> #> prgs _||_ r #> #> prgs _||_ sp_n #> #> These (conditional) independence statements are explored below using #> the canonical correlations approach for mixed data. See #> ??dagitty::localTests for further details. Results are shown for #> variables that are fully observed in the specified dataset. The null #> hypothesis is that the stated variables are (conditionally) #> independent. 
#> #> estimate p.value 2.5% 97.5% #> #> bwt _||_ matage | mated 0.05018898 0.1127099 -0.01184095 0.11183410 #> #> matage _||_ pregsize 0.03029139 0.3386080 -0.03176134 0.09211150 #> #> matage _||_ r | mated 0.02998323 0.3435470 -0.03206946 0.09180567 #> #> mated _||_ pregsize 0.01594976 0.6144181 -0.04608889 0.07786585 #> #> pregsize _||_ r 0.01482015 0.6397174 -0.04721631 0.07674273 #> #> Interpretation: A small p-value means the stated variables may not be #> (conditionally) independent in the specified dataset: your data may not #> be consistent with the proposed DAG. A large p-value means there is #> little evidence of inconsistency between your data and the proposed #> DAG. #> #> Note that these results assume that relationships between variables are #> linear. Consider exploring the specification of each relationship in #> your model. Also consider whether it is valid and possible to explore #> relationships between partially observed variables using the observed #> data, e.g. avoiding perfect prediction. checkCRA(y=\"bmi7\", covs=\"matage mated\", r_cra=\"r\", mdag=\" matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r pregsize -> bmi7 pregsize -> bwt sep_unmeas -> bwt\") #> Based on the proposed directed acyclic graph (DAG), the analysis model #> outcome and complete record indicator are independent given analysis #> model covariates. Hence, complete records analysis is valid. checkMI(dep=\"bmi7\", preds=\"matage mated pregsize\", r_dep=\"r\", mdag=\" matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r pregsize -> bmi7 pregsize -> bwt sep_unmeas -> bwt\") #> Based on the proposed directed acyclic graph (DAG), the incomplete #> variable and its missingness indicator are independent given imputation #> model predictors. Hence, multiple imputation methods which assume data #> are missing at random are valid in principle. 
mimod_bmi7 <- checkModSpec(formula=\"bmi7~matage+I(matage^2)+mated+pregsize\", family=\"gaussian(identity)\", data=bmi) #> Model mis-specification method: regression of model residuals on a #> fractional polynomial of the fitted values #> #> P-value: 1 #> #> A large p-value means there is little evidence of model #> mis-specification. miprop <- proposeMI(mimodobj=mimod_bmi7, data=bmi) #> Based on your proposed imputation model and dataset, your mice() call #> should be as follows: #> #> mice(data = bmi , # You may need to specify a subset of the columns in #> your dataset #> #> m = 41 , # You should use at least this number of imputations based on #> the proportion of complete records in your dataset #> #> method = c( 'norm' ) # Specify a method for each incomplete variable. #> If displayed, the box-and-whisker plots can be used to inform your #> choice of method(s): for example, if the imputation model does not #> predict extreme values appropriately, consider a different imputation #> model/method e.g. PMM. Note the distribution of imputed and observed #> values is displayed for numeric variables only. The distribution may #> differ if data are missing at random or missing not at random. If you #> suspect data are missing not at random, the plots can also inform your #> choice of sensitivity parameter. #> #> formulas = formulas_list , # Note that you do not additionally need to #> specify a 'predmatrix' #> #> # The formulas_list specifies the conditional imputation models, which #> are as follows: #> #> 'bmi7 ~ matage + I(matage^2) + mated + pregsize' #> #> maxit = 10 , # If you have more than one incomplete variable, you #> should check this number of iterations is sufficient by inspecting the #> trace plots, if displayed. Consider increasing the number of iterations #> if there is a trend that does not stabilise by the 10th iteration. Note #> that iteration is not performed when only one variable is partially #> observed. 
#> #> printFlag = FALSE , # Change to printFlag=TRUE to display the history #> as imputation is performed #> #> seed = NA) # It is good practice to choose a seed so your results are #> reproducible doMImice(miprop, 123, substmod=\"lm(bmi7 ~ matage + I(matage^2) + mated)\") #> Given the substantive model: lm(bmi7 ~ matage + I(matage^2) + mated) , #> multiple imputation estimates are as follows: #> #> term estimate std.error statistic df p.value #> #> 1 (Intercept) 17.6607324 0.07126548 247.816079 233.1668 2.116834e-284 #> #> 2 matage 1.1504545 0.05230345 21.995769 184.5081 1.863532e-53 #> #> 3 I(matage^2) 0.8414975 0.03231752 26.038433 257.1270 4.754845e-74 #> #> 4 mated1 -1.0026194 0.10787751 -9.294054 159.1101 1.094881e-16 #> #> 2.5 % 97.5 % #> #> 1 17.5203258 17.8011389 #> #> 2 1.0472648 1.2536442 #> #> 3 0.7778567 0.9051382 #> #> 4 -1.2156760 -0.7895629"},{"path":"https://elliecurnow.github.io/midoc/reference/bmi.html","id":null,"dir":"Reference","previous_headings":"","what":"Child body mass index data — bmi","title":"Child body mass index data — bmi","text":"simulated dataset","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/bmi.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Child body mass index data — bmi","text":"","code":"bmi"},{"path":[]},{"path":"https://elliecurnow.github.io/midoc/reference/bmi.html","id":"bmi","dir":"Reference","previous_headings":"","what":"bmi","title":"Child body mass index data — bmi","text":"data frame 1000 rows 6 columns: bmi7 Child's body mass index age 7 years matage Mother's age pregnancy, standardised relative mean age 30 mated Mother's educational level: post-16 years qualification pregsize Mother's pregnancy size: singleton twins bwt Child's birth weight kilograms r Missingness indicator: whether bmi7 reported ","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkcra.html","id":null,"dir":"Reference","previous_headings":"","what":"Inspect complete 
records analysis model — checkCRA","title":"Inspect complete records analysis model — checkCRA","text":"Check complete records analysis valid proposed analysis model directed acyclic graph (DAG). Validity means proposed approach allow unbiased estimation estimand(s) interest, including regression parameters, associations, causal effects.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkcra.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Inspect complete records analysis model — checkCRA","text":"","code":"checkCRA(y, covs, r_cra, mdag)"},{"path":"https://elliecurnow.github.io/midoc/reference/checkcra.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Inspect complete records analysis model — checkCRA","text":"y analysis model outcome, specified string covs analysis model covariate(s), specified string (space delimited) r_cra complete record indicator, specified string mdag DAG, specified string using dagitty syntax","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkcra.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Inspect complete records analysis model — checkCRA","text":"message indicating whether complete records analysis valid proposed DAG analysis model outcome covariate(s)","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkcra.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Inspect complete records analysis model — checkCRA","text":"DAG include observed unobserved variables related analysis model variables missingness, well required missingness indicators. general, complete records analysis valid analysis model outcome complete record indicator unrelated, conditional specified covariates. 
determined using proposed DAG checking whether analysis model complete record indicator 'd-separated', given covariates.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkcra.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Inspect complete records analysis model — checkCRA","text":"Hughes R, Heron J, Sterne J, Tilling K. 2019. Accounting missing data statistical analyses: multiple imputation always answer. Int J Epidemiol. doi:10.1093/ije/dyz032 Bartlett JW, Harel O, Carpenter JR. 2015. Asymptotically Unbiased Estimation Exposure Odds Ratios Complete Records Logistic Regression. J Epidemiol. doi:10.1093/aje/kwv114","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkcra.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Inspect complete records analysis model — checkCRA","text":"","code":"# Example DAG for which complete records analysis is not valid, but could be ## valid for a different set of covariates checkCRA(y=\"bmi7\", covs=\"matage\", r_cra=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r\") #> Based on the proposed directed acyclic graph (DAG), the analysis model #> outcome and complete record indicator are not independent given #> analysis model covariates. Hence, in general, complete records analysis #> is not valid. #> #> In special cases, depending on the type of analysis model and estimand #> of interest, complete records analysis may still be valid. See, for #> example, Bartlett et al. (2015) (https://doi.org/10.1093/aje/kwv114) #> for further details. #> #> Consider using a different analysis model and/or strategy, e.g. #> multiple imputation. 
#> #> For example, the analysis model outcome and complete record indicator #> are independent if, in addition to the specified covariates, the #> following sets of variables are included as covariates in the analysis #> model (note that this list is not necessarily exhaustive, particularly #> if your DAG is complex): #> #> mated #> #> c(\"mated\", \"sep_unmeas\") # For the DAG in the example above, complete records analysis is valid ## if a different set of covariates is used checkCRA(y=\"bmi7\", covs=\"matage mated\", r_cra=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r\") #> Based on the proposed directed acyclic graph (DAG), the analysis model #> outcome and complete record indicator are independent given analysis #> model covariates. Hence, complete records analysis is valid. # Example DAG for which complete records is not valid, but could be valid ## for a different estimand checkCRA(y=\"bmi7\", covs=\"matage mated\", r_cra=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r matage -> bmi3 mated -> bmi3 bmi3 -> bmi7 bmi3 -> r\") #> Based on the proposed directed acyclic graph (DAG), the analysis model #> outcome and complete record indicator are not independent given #> analysis model covariates. Hence, in general, complete records analysis #> is not valid. #> #> In special cases, depending on the type of analysis model and estimand #> of interest, complete records analysis may still be valid. See, for #> example, Bartlett et al. (2015) (https://doi.org/10.1093/aje/kwv114) #> for further details. #> #> There are no other variables which could be added to the model to make #> the analysis model outcome and complete record indicator conditionally #> independent, without changing the estimand of interest. Consider using #> a different strategy e.g. multiple imputation. #> #> Alternatively, consider whether a different estimand could be of #> interest. 
For example, the analysis model outcome and complete record #> indicator are independent given each of the following sets of #> variables: #> #> c(\"bmi3\", \"mated\") #> #> c(\"bmi3\", \"matage\", \"mated\") #> #> c(\"bmi3\", \"sep_unmeas\") #> #> c(\"bmi3\", \"matage\", \"sep_unmeas\") #> #> c(\"bmi3\", \"mated\", \"sep_unmeas\") #> #> c(\"bmi3\", \"matage\", \"mated\", \"sep_unmeas\") # Example DAG for which complete records analysis is never valid checkCRA(y=\"bmi7\", covs=\"matage mated\", r_cra=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r bmi7 -> r\") #> Based on the proposed directed acyclic graph (DAG), the analysis model #> outcome and complete record indicator are not independent given #> analysis model covariates. Hence, in general, complete records analysis #> is not valid. #> #> In special cases, depending on the type of analysis model and estimand #> of interest, complete records analysis may still be valid. See, for #> example, Bartlett et al. (2015) (https://doi.org/10.1093/aje/kwv114) #> for further details. #> #> There are no other variables which could be added to the model to make #> the analysis model outcome and complete record indicator conditionally #> independent. Consider using a different strategy e.g. multiple #> imputation."},{"path":"https://elliecurnow.github.io/midoc/reference/checkmi.html","id":null,"dir":"Reference","previous_headings":"","what":"Inspect multiple imputation model — checkMI","title":"Inspect multiple imputation model — checkMI","text":"Check multiple imputation valid proposed imputation model directed acyclic graph (DAG). Validity means proposed approach allow unbiased estimation estimand(s) interest, including regression parameters, associations, causal effects. imputation model include analysis model variables predictors, well auxiliary variables. 
DAG include observed unobserved variables related analysis model variables missingness, well required missingness indicators.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkmi.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Inspect multiple imputation model — checkMI","text":"","code":"checkMI(dep, preds, r_dep, mdag)"},{"path":"https://elliecurnow.github.io/midoc/reference/checkmi.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Inspect multiple imputation model — checkMI","text":"dep partially observed variable imputed, specified string preds imputation model predictor(s), specified string (space delimited) r_dep partially observed variable's missingness indicator, specified string mdag DAG, specified string using dagitty syntax","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkmi.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Inspect multiple imputation model — checkMI","text":"message indicating whether multiple imputation valid proposed DAG imputation model","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkmi.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Inspect multiple imputation model — checkMI","text":"principle, multiple imputation valid partially observed variable unrelated missingness, given imputation model predictors.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkmi.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Inspect multiple imputation model — checkMI","text":"Curnow E, Tilling K, Heron JE, Cornish RP, Carpenter JR. 2023. Multiple imputation missing data missing random: including collider auxiliary variable imputation model can induce bias. Frontiers Epidemiology. 
doi:10.3389/fepid.2023.1237447","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkmi.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Inspect multiple imputation model — checkMI","text":"","code":"# Example DAG for which multiple imputation is valid checkMI(dep=\"bmi7\", preds=\"matage mated pregsize\", r_dep=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r pregsize -> bmi7 pregsize -> bwt sep_unmeas -> bwt\") #> Based on the proposed directed acyclic graph (DAG), the incomplete #> variable and its missingness indicator are independent given imputation #> model predictors. Hence, multiple imputation methods which assume data #> are missing at random are valid in principle. # Example DAG for which multiple imputation is not valid, due to a collider checkMI(dep=\"bmi7\", preds=\"matage mated bwt\", r_dep=\"r\", mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r pregsize -> bmi7 pregsize -> bwt sep_unmeas -> bwt\") #> Based on the proposed directed acyclic graph (DAG), the incomplete #> variable and its missingness indicator are not independent given #> imputation model predictors. Hence, multiple imputation methods which #> assume data are missing at random are not valid. #> #> Consider using a different imputation model and/or strategy (e.g. #> not-at-random fully conditional specification). 
For example, the #> incomplete variable and its missingness indicator are independent if, #> in addition to the specified predictors, the following sets of #> variables are included as predictors in the imputation model (note that #> this list is not necessarily exhaustive, particularly if your DAG is #> complex): #> #> pregsize #> #> c(\"pregsize\", \"sep_unmeas\")"},{"path":"https://elliecurnow.github.io/midoc/reference/checkmodspec.html","id":null,"dir":"Reference","previous_headings":"","what":"Inspect parametric model specification — checkModSpec","title":"Inspect parametric model specification — checkModSpec","text":"Explore whether observed relationships specified dataset consistent proposed parametric model (may represent analysis imputation model).","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkmodspec.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Inspect parametric model specification — checkModSpec","text":"","code":"checkModSpec(formula, family, data, plot = TRUE, message = TRUE)"},{"path":"https://elliecurnow.github.io/midoc/reference/checkmodspec.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Inspect parametric model specification — checkModSpec","text":"formula symbolic description model fitted, dependent variable left ~ operator, covariates, separated + operators, right, specified string family description error distribution link function used model, specified string; family functions supported \"gaussian(identity)\" \"binomial(logit)\" data data frame containing variables stated formula plot TRUE (default) evidence model mis-specification, displays plot can used explore functional form covariate specified model; use plot = FALSE disable plot message TRUE (default), displays message indicating whether relationships dependent variable covariates likely correctly specified ; use message = FALSE suppress 
message","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkmodspec.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Inspect parametric model specification — checkModSpec","text":"object type 'mimod' (list containing specified formula, family, dataset name). Optionally, message indicating whether relationships dependent variable covariates likely correctly specified . evidence model mis-specification, optionally returns plot model residuals versus fitted values can used explore appropriate functional form specified model.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkmodspec.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Inspect parametric model specification — checkModSpec","text":"Curnow E, Carpenter JR, Heron JE, et al. 2023. Multiple imputation missing data missing random: compatible imputation models sufficient avoid bias mis-specified. J Clin Epidemiol. doi:10.1016/j.jclinepi.2023.06.011","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/checkmodspec.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Inspect parametric model specification — checkModSpec","text":"","code":"# Example (incorrectly) assuming a linear relationship checkModSpec(formula=\"bmi7~matage+mated+pregsize\", family=\"gaussian(identity)\", data=bmi) #> Model mis-specification method: regression of model residuals on a #> fractional polynomial of the fitted values #> #> P-value: 0 #> #> A small p-value means the model may be mis-specified. Check the #> specification of each relationship in your model. 
## For the example above, (correctly) assuming a quadratic relationship checkModSpec(formula=\"bmi7~matage+I(matage^2)+mated+pregsize\", family=\"gaussian(identity)\", data=bmi) #> Model mis-specification method: regression of model residuals on a #> fractional polynomial of the fitted values #> #> P-value: 1 #> #> A large p-value means there is little evidence of model #> mis-specification."},{"path":"https://elliecurnow.github.io/midoc/reference/descMissData.html","id":null,"dir":"Reference","previous_headings":"","what":"Lists missing data patterns in the specified dataset — descMissData","title":"Lists missing data patterns in the specified dataset — descMissData","text":"function summarises missing data patterns specified dataset. row output corresponds missing data pattern (1=observed, 0=missing). number percentage observations also displayed missing data pattern. first column indicates number missing data patterns. second column refers analysis model outcome ('y'), variables ('covs') displayed subsequent columns. Alternatively, 'y' can used display primary variable interest, e.g. 
'y' refer exposure, variables listed 'covs'.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/descMissData.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Lists missing data patterns in the specified dataset — descMissData","text":"","code":"descMissData(y, covs, data, plot = FALSE)"},{"path":"https://elliecurnow.github.io/midoc/reference/descMissData.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Lists missing data patterns in the specified dataset — descMissData","text":"y analysis model outcome, specified string covs analysis model covariate(s), specified string (space delimited) data data frame containing specified analysis model outcome covariate(s) plot TRUE, displays plot using md.pattern visualise missing data patterns; use plot = FALSE (default) disable plot","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/descMissData.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Lists missing data patterns in the specified dataset — descMissData","text":"summary missing data patterns","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/descMissData.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Lists missing data patterns in the specified dataset — descMissData","text":"","code":"descMissData(y=\"bmi7\", covs=\"matage mated\", data=bmi) #> pattern bmi7 matage mated n pct #> 1 1 1 1 1 592 59 #> 2 2 0 1 1 408 41 descMissData(y=\"bmi7\", covs=\"matage mated pregsize bwt\", data=bmi, plot=TRUE) #> pattern bmi7 matage mated pregsize bwt n pct #> 1 1 1 1 1 1 1 592 59 #> 2 2 0 1 1 1 1 408 41"},{"path":"https://elliecurnow.github.io/midoc/reference/doMImice.html","id":null,"dir":"Reference","previous_headings":"","what":"Performs multiple imputation — doMImice","title":"Performs multiple imputation — doMImice","text":"Creates multiple imputations using mice, 
based options dataset specified call proposeMI. substantive model specified, also calculates pooled estimates using pool.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/doMImice.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Performs multiple imputation — doMImice","text":"","code":"doMImice(mipropobj, seed, substmod = \" \", message = TRUE)"},{"path":"https://elliecurnow.github.io/midoc/reference/doMImice.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Performs multiple imputation — doMImice","text":"mipropobj object type 'miprop', created call 'proposeMI' seed integer used set seed 'mice' call substmod Optionally, symbolic description substantive model fitted, specified string; supplied, model fitted imputed dataset results pooled message TRUE (default), displays message summarising analysis performed; use message = FALSE suppress message","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/doMImice.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Performs multiple imputation — doMImice","text":"'mice' object class 'mids' (multiply imputed datasets). 
Optionally, message summarising analysis performed.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/doMImice.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Performs multiple imputation — doMImice","text":"","code":"# First specify the imputation model as a 'mimod' object ## (suppressing the message) mimod_bmi7 <- checkModSpec(formula=\"bmi7~matage+I(matage^2)+mated+pregsize\", family=\"gaussian(identity)\", data=bmi, message=FALSE) # Save the proposed 'mice' options as a 'miprop' object ## (suppressing the message) miprop <- proposeMI(mimodobj=mimod_bmi7, data=bmi, message=FALSE, plot = FALSE) # Create the set of imputed datasets using the proposed 'mice' options imp <- doMImice(miprop,123) #> Now you have created your multiply imputed datasets, you can perform #> your analysis and pool the results using the 'mice' functions 'with()' #> and 'pool()' # Additionally, fit the substantive model to each imputed dataset and display ## the pooled results doMImice(miprop, 123, substmod=\"lm(bmi7 ~ matage + I(matage^2) + mated)\") #> Given the substantive model: lm(bmi7 ~ matage + I(matage^2) + mated) , #> multiple imputation estimates are as follows: #> #> term estimate std.error statistic df p.value #> #> 1 (Intercept) 17.6607324 0.07126548 247.816079 233.1668 2.116834e-284 #> #> 2 matage 1.1504545 0.05230345 21.995769 184.5081 1.863532e-53 #> #> 3 I(matage^2) 0.8414975 0.03231752 26.038433 257.1270 4.754845e-74 #> #> 4 mated1 -1.0026194 0.10787751 -9.294054 159.1101 1.094881e-16 #> #> 2.5 % 97.5 % #> #> 1 17.5203258 17.8011389 #> #> 2 1.0472648 1.2536442 #> #> 3 0.7778567 0.9051382 #> #> 4 -1.2156760 -0.7895629"},{"path":"https://elliecurnow.github.io/midoc/reference/exploreDAG.html","id":null,"dir":"Reference","previous_headings":"","what":"Compares data with proposed DAG — exploreDAG","title":"Compares data with proposed DAG — exploreDAG","text":"Explore whether relationships fully observed variables 
specified dataset consistent proposed directed acyclic graph (DAG) using localTests functionality.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/exploreDAG.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compares data with proposed DAG — exploreDAG","text":"","code":"exploreDAG(mdag, data)"},{"path":"https://elliecurnow.github.io/midoc/reference/exploreDAG.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compares data with proposed DAG — exploreDAG","text":"mdag DAG, specified string using dagitty syntax data data frame containing variables stated DAG. ordinal variables must integer-coded categorical variables must dummy-coded.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/exploreDAG.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compares data with proposed DAG — exploreDAG","text":"message indicating whether relationships fully observed variables specified dataset consistent proposed DAG","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/exploreDAG.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Compares data with proposed DAG — exploreDAG","text":"","code":"exploreDAG(mdag=\"matage -> bmi7 mated -> matage mated -> bmi7 sep_unmeas -> mated sep_unmeas -> r\", data=bmi) #> The proposed directed acyclic graph (DAG) implies the following #> conditional independencies (where, for example, 'X _||_ Y | Z' should #> be read as 'X is independent of Y conditional on Z'). Note that #> variable names are abbreviated: #> #> bmi7 _||_ r | sp_n #> #> bmi7 _||_ r | matd #> #> bmi7 _||_ sp_n | matd #> #> matg _||_ r | sp_n #> #> matg _||_ r | matd #> #> matg _||_ sp_n | matd #> #> matd _||_ r | sp_n #> #> These (conditional) independence statements are explored below using #> the canonical correlations approach for mixed data. 
See #> ??dagitty::localTests for further details. Results are shown for #> variables that are fully observed in the specified dataset. The null #> hypothesis is that the stated variables are (conditionally) #> independent. #> #> estimate p.value 2.5% 97.5% #> #> matage _||_ r | mated 0.02998323 0.343547 -0.03206946 0.09180567 #> #> Interpretation: A small p-value means the stated variables may not be #> (conditionally) independent in the specified dataset: your data may not #> be consistent with the proposed DAG. A large p-value means there is #> little evidence of inconsistency between your data and the proposed #> DAG. #> #> Note that these results assume that relationships between variables are #> linear. Consider exploring the specification of each relationship in #> your model. Also consider whether it is valid and possible to explore #> relationships between partially observed variables using the observed #> data, e.g. avoiding perfect prediction."},{"path":"https://elliecurnow.github.io/midoc/reference/midoc-package.html","id":null,"dir":"Reference","previous_headings":"","what":"midoc: A Decision-Making System for Multiple Imputation — midoc-package","title":"midoc: A Decision-Making System for Multiple Imputation — midoc-package","text":"guidance system analysis missing data. incorporates expert, --date methodology help researchers choose appropriate analysis approach data missing. provide available data assumed causal structure, including likely causes missing data. 'midoc' advise analysis approaches can used, best perform . 'midoc' follows framework treatment reporting missing data observational studies (TARMOS). Lee et al (2021). 
doi:10.1016/j.jclinepi.2021.01.008 .","code":""},{"path":[]},{"path":"https://elliecurnow.github.io/midoc/reference/midoc-package.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"midoc: A Decision-Making System for Multiple Imputation — midoc-package","text":"Maintainer: Elinor Curnow elinor.curnow@bristol.ac.uk (ORCID) [copyright holder] Authors: Jon Heron Rosie Cornish Kate Tilling James Carpenter","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/midocVignette.html","id":null,"dir":"Reference","previous_headings":"","what":"Run an interactive vignette for the midoc package — midocVignette","title":"Run an interactive vignette for the midoc package — midocVignette","text":"Runs interactive version midoc vignette: Multiple Imputation DOCtor (midoc). interactive version, can apply midoc functions shiny-package apps using DAG data.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/midocVignette.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Run an interactive vignette for the midoc package — midocVignette","text":"","code":"midocVignette()"},{"path":"https://elliecurnow.github.io/midoc/reference/midocVignette.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Run an interactive vignette for the midoc package — midocVignette","text":"browser-based, interactive version midoc vignette","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/midocVignette.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Run an interactive vignette for the midoc package — midocVignette","text":"","code":"if (FALSE) { # interactive() # Run the interactive vignette midocVignette() }"},{"path":"https://elliecurnow.github.io/midoc/reference/proposeMI.html","id":null,"dir":"Reference","previous_headings":"","what":"Suggests multiple imputation options — proposeMI","title":"Suggests multiple 
imputation options — proposeMI","text":"Suggests mice options perform multiple imputation, based proposed set imputation models (one partially observed variable) specified dataset.","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/proposeMI.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Suggests multiple imputation options — proposeMI","text":"","code":"proposeMI(mimodobj, data, plot = TRUE, plotprompt = TRUE, message = TRUE)"},{"path":"https://elliecurnow.github.io/midoc/reference/proposeMI.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Suggests multiple imputation options — proposeMI","text":"mimodobj object, list objects, type 'mimod', stands 'multiple imputation model', created call checkModSpec data data frame containing variables required imputation substantive analysis plot TRUE (default), displays diagnostic plots proposed 'mice' call; use plot=FALSE disable plots plotprompt TRUE (default), user prompted second plot displayed; use plotprompt=FALSE remove prompt message TRUE (default), displays message describing proposed 'mice' options; use message=FALSE suppress message","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/proposeMI.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Suggests multiple imputation options — proposeMI","text":"object type 'miprop', can used run 'mice' using proposed options, plus, optionally, message diagnostic plots describing proposed 'mice' options","code":""},{"path":"https://elliecurnow.github.io/midoc/reference/proposeMI.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Suggests multiple imputation options — proposeMI","text":"","code":"# First specify each imputation model as a 'mimod' object ## (suppressing the message) mimod_bmi7 <- checkModSpec(formula=\"bmi7~matage+I(matage^2)+mated+pregsize\", family=\"gaussian(identity)\", 
data=bmi, message=FALSE) mimod_pregsize <- checkModSpec( formula=\"pregsize~bmi7+matage+I(matage^2)+mated\", family=\"binomial(logit)\", data=bmi, message=FALSE) # Display the proposed 'mice' options (suppressing the plot prompt) ## When specifying a single imputation model proposeMI(mimodobj=mimod_bmi7, data=bmi, plotprompt = FALSE) #> Based on your proposed imputation model and dataset, your mice() call #> should be as follows: #> #> mice(data = bmi , # You may need to specify a subset of the columns in #> your dataset #> #> m = 41 , # You should use at least this number of imputations based on #> the proportion of complete records in your dataset #> #> method = c( ‘norm’ ) # Specify a method for each incomplete variable. #> If displayed, the box-and-whisker plots can be used to inform your #> choice of method(s): for example, if the imputation model does not #> predict extreme values appropriately, consider a different imputation #> model/method e.g. PMM. Note the distribution of imputed and observed #> values is displayed for numeric variables only. The distribution may #> differ if data are missing at random or missing not at random. If you #> suspect data are missing not at random, the plots can also inform your #> choice of sensitivity parameter. #> #> formulas = formulas_list , # Note that you do not additionally need to #> specify a 'predmatrix' #> #> # The formulas_list specifies the conditional imputation models, which #> are as follows: #> #> ‘bmi7 ~ matage + I(matage^2) + mated + pregsize’ #> #> maxit = 10 , # If you have more than one incomplete variable, you #> should check this number of iterations is sufficient by inspecting the #> trace plots, if displayed. Consider increasing the number of iterations #> if there is a trend that does not stabilise by the 10th iteration. Note #> that iteration is not performed when only one variable is partially #> observed. 
#> #> printFlag = FALSE , # Change to printFlag=TRUE to display the history #> as imputation is performed #> #> seed = NA) # It is good practice to choose a seed so your results are #> reproducible ## When specifying more than one imputation model (suppressing the plots) proposeMI(mimodobj=list(mimod_bmi7,mimod_pregsize), data=bmi, plot = FALSE) #> Based on your proposed imputation model and dataset, your mice() call #> should be as follows: #> #> mice(data = bmi , # You may need to specify a subset of the columns in #> your dataset #> #> m = 41 , # You should use at least this number of imputations based on #> the proportion of complete records in your dataset #> #> method = c( ‘norm’, ‘logreg’ ) # Specify a method for each incomplete #> variable. If displayed, the box-and-whisker plots can be used to #> inform your choice of method(s): for example, if the imputation model #> does not predict extreme values appropriately, consider a different #> imputation model/method e.g. PMM. Note the distribution of imputed and #> observed values is displayed for numeric variables only. The #> distribution may differ if data are missing at random or missing not at #> random. If you suspect data are missing not at random, the plots can #> also inform your choice of sensitivity parameter. #> #> formulas = formulas_list , # Note that you do not additionally need to #> specify a 'predmatrix' #> #> # The formulas_list specifies the conditional imputation models, which #> are as follows: #> #> ‘bmi7 ~ matage + I(matage^2) + mated + pregsize’ #> #> ‘pregsize ~ bmi7 + matage + I(matage^2) + mated’ #> #> maxit = 10 , # If you have more than one incomplete variable, you #> should check this number of iterations is sufficient by inspecting the #> trace plots, if displayed. Consider increasing the number of iterations #> if there is a trend that does not stabilise by the 10th iteration. Note #> that iteration is not performed when only one variable is partially #> observed. 
#> #> printFlag = FALSE , # Change to printFlag=TRUE to display the history #> as imputation is performed #> #> seed = NA) # It is good practice to choose a seed so your results are #> reproducible"},{"path":"https://elliecurnow.github.io/midoc/news/index.html","id":"midoc-100","dir":"Changelog","previous_headings":"","what":"midoc 1.0.0","title":"midoc 1.0.0","text":"CRAN release: 2024-10-02","code":""},{"path":"https://elliecurnow.github.io/midoc/news/index.html","id":"changes-in-version-1-0-0","dir":"Changelog","previous_headings":"","what":"Changes in version 1.0","title":"midoc 1.0.0","text":"Initial CRAN submission.","code":""}]