P.Mean >> Category >> Missing data (created 2010-02-04).

These pages discuss issues about missing data. This is a relatively new category and there is no comparable category at the StATS website.


  1. P.Mean: Formula for multiple imputation (created 2009-07-24). I'm working on a project that involves multiple imputation, and I may have to program some of the work myself. I can use the R package MICE to generate the imputed data sets, but then I have to use a mixed linear model rather than a linear model. How do I combine the estimates from the multiple imputed data sets? The estimate is just the average of the individual estimates, but what about the standard error?
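
The usual answer to the standard error question is Rubin's rules: the pooled variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance of the m estimates. The following is a minimal sketch, assuming each mixed model fit returns a point estimate and a standard error for the same parameter. The function name `pool_rubin` and the numbers are made up for illustration; they are not part of MICE or of the linked page.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed data sets using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching lists from at least two imputed data sets")

    pooled_estimate = mean(estimates)              # average of the m estimates
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between         # total variance of the pooled estimate
    return pooled_estimate, math.sqrt(total)


# Hypothetical estimates and standard errors from three imputed data sets.
estimates = [1.0, 2.0, 3.0]
std_errors = [1.0, 1.0, 1.0]
pooled_estimate, pooled_se = pool_rubin(estimates, std_errors)
print(pooled_estimate, pooled_se)
```

The pooled standard error is larger than any single standard error whenever the estimates disagree across imputations, which is how the extra uncertainty due to the missing data enters. Confidence intervals also need a degrees-of-freedom adjustment, which this sketch leaves out.
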

Outside resources:

Journal article: Saveli Goldberg, Andrzej Niemierko, Maria Shubina, Alexander Turchin. "Summary Page": a novel tool that reduces omitted data in research databases. BMC Medical Research Methodology. 2010;10(1):91. Abstract: "BACKGROUND: Data entry errors are common in clinical research databases. Omitted data are of particular concern because they are more common than erroneously inserted data and therefore could potentially affect research findings. However, few affordable strategies for their prevention are available. METHODS: We have conducted a prospective observational study of the effect of a novel tool called "Summary Page" on the frequency of correction of omitted data errors in a radiation oncology research database between July 2008 and March 2009. "Summary Page" was implemented as an optionally accessed screen in the database that visually integrates key fields in the record. We assessed the frequency of omitted data on the example of the Date of Relapse field. We considered the data in this field to be omitted for all records that had empty Date of Relapse field and evidence of relapse elsewhere in the record. RESULTS: A total of 1,156 records were updated and 200 new records were entered in the database over the study period. "Summary Page" was accessed for 44% of all updated records and for 69% of newly entered records. Frequency of correction of the omitted date of cancer relapse was six-fold higher in records for which "Summary Page" was accessed (p = 0.0003). CONCLUSIONS: "Summary Page" was strongly associated with an increased frequency of correction of omitted data errors. Further, controlled, studies are needed to confirm this finding and elucidate its mechanism of action." [Accessed on December 28, 2010]. Available at: http://www.biomedcentral.com/1471-2288/10/91.

Journal article: Manisha Desai, Denise A Esserman, Marilie D Gammon, Mary B Terry. The use of complete-case and multiple imputation-based analyses in molecular epidemiology studies that assess interaction effects. Epidemiologic Perspectives & Innovations: EP+I. 2011;8(1):5. Abstract: "BACKGROUND: In molecular epidemiology studies biospecimen data are collected, often with the purpose of evaluating the synergistic role between a biomarker and another feature on an outcome. Typically, biomarker data are collected on only a proportion of subjects eligible for study, leading to a missing data problem. Missing data methods, however, are not customarily incorporated into analyses. Instead, complete-case (CC) analyses are performed, which can result in biased and inefficient estimates. METHODS: Through simulations, we characterized the performance of CC methods when interaction effects are estimated. We also investigated whether standard multiple imputation (MI) could improve estimation over CC methods when the data are not missing at random (NMAR) and auxiliary information may or may not exist. RESULTS: CC analyses were shown to result in considerable bias and efficiency loss. While MI reduced bias and increased efficiency over CC methods under specific conditions, it too resulted in biased estimates depending on the strength of the auxiliary data available and the nature of the missingness. In particular, CC performed better than MI when extreme values of the covariate were more likely to be missing, while MI outperformed CC when missingness of the covariate related to both the covariate and outcome. MI always improved performance when strong auxiliary data were available. In a real study, MI estimates of interaction effects were attenuated relative to those from a CC approach. CONCLUSIONS: Our findings suggest the importance of incorporating missing data methods into the analysis. If the data are MAR, standard MI is a reasonable method. Auxiliary variables may make this assumption more reasonable even if the data are NMAR. Under NMAR we emphasize caution when using standard MI and recommend it over CC only when strong auxiliary data are available. MI, with the missing data mechanism specified, is an alternative when the data are NMAR. In all cases, it is recommended to take advantage of MI's ability to account for the uncertainty of these assumptions." [Accessed on October 11, 2011].

Journal article: Rhian M Daniel, Michael G Kenward, Simon N Cousens, Bianca L De Stavola. Using causal diagrams to guide analysis in missing data problems. Statistical Methods in Medical Research. 2011. Abstract: "Estimating causal effects from incomplete data requires additional and inherently untestable assumptions regarding the mechanism giving rise to the missing data. We show that using causal diagrams to represent these additional assumptions both complements and clarifies some of the central issues in missing data theory, such as Rubin's classification of missingness mechanisms (as missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR)) and the circumstances in which causal effects can be estimated without bias by analysing only the subjects with complete data. In doing so, we formally extend the back-door criterion of Pearl and others for use in incomplete data examples. These ideas are illustrated with an example drawn from an occupational cohort study of the effect of cosmic radiation on skin cancer incidence." [Accessed on September 7, 2011]. Available at: http://www.ncbi.nlm.nih.gov/pubmed/21389091.

This work is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2017-06-15.