Publications by IRD scientists

Cottrell Gilles, Cot Michel, Mary J. Y. (2009). L'imputation multiple des données manquantes aléatoirement : concepts généraux et présentation d'une méthode Monte-Carlo = Multiple imputation of missing at random data : general points and presentation of a Monte-Carlo method. Revue d'Epidémiologie et de Santé Publique, 57 (5), p. 361-372. ISSN 0398-7620.

Document title
L'imputation multiple des données manquantes aléatoirement : concepts généraux et présentation d'une méthode Monte-Carlo = Multiple imputation of missing at random data : general points and presentation of a Monte-Carlo method
Publication year
2009
Document type
Article indexed in the Web of Science WOS:000271525600007
Authors
Cottrell Gilles, Cot Michel, Mary J. Y.
Source
Revue d'Epidémiologie et de Santé Publique, 2009, 57 (5), p. 361-372 ISSN 0398-7620
Background. - Missing data are a frequent problem in the statistical analysis of epidemiological data sets. Compared with more traditional approaches, such as analysing only the sub-sample of complete observations, dedicated methods for handling incomplete observations can avoid biased estimates and improve their precision.
Methods. - One of these approaches is multiple imputation, which consists of successively imputing several values for each missing data item. This generates several completed data sets with the same distributional characteristics (variability and correlations) as the observed data. Standard analyses are run separately on each completed data set and then combined to obtain an overall result. In this paper, we discuss the various assumptions made about the origin of missing data (at random or not) and present the multiple imputation process in a pragmatic way. We describe a recent method, Multiple Imputation by Chained Equations (MICE), based on a Markov chain Monte Carlo algorithm under the missing at random (MAR) hypothesis. An illustrative example of the MICE method is detailed for the analysis, through multivariate logistic regression, of the relation between a dichotomous variable and two covariates with MAR data that have no particular structure.
Results. - Taking the original data set without missing data as the reference, the regression coefficient estimates obtained with the MICE method are substantially better than those obtained from the complete observations only.
Conclusion. - This method does not require any direct assumption about the joint distribution of the variables and is currently implemented in standard statistical software (Splus, Stata). It can be used for multiple imputation of missing data in several variables with no particular structure.
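The combination step mentioned in the abstract (separate analyses on each completed data set, then a single overall result) can be illustrated with a minimal sketch. It assumes the pooling follows Rubin's rules, which the abstract does not name explicitly. The function name `pool_rubin` and the numeric values are hypothetical and serve only as an illustration; they are not results from the paper.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-data-set estimates and their squared standard errors.

    Assumption: pooling by Rubin's rules (not stated in the abstract).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates, with one variance each")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical regression coefficients from m = 5 completed data sets
coefs = [0.50, 0.55, 0.45, 0.52, 0.48]
# Hypothetical squared standard errors of those coefficients
ses_sq = [0.010, 0.012, 0.011, 0.009, 0.010]

pooled, total_var = pool_rubin(coefs, ses_sq)
print(pooled, total_var ** 0.5)  # pooled coefficient and its standard error
```

The total variance adds the average within-imputation variance to an inflated between-imputation variance, so the pooled standard error reflects the extra uncertainty caused by the missing values.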
Classification scheme
Health: general [050]
Location
IRD collection [F B010048385]
IRD identifier
fdi:010048385