The missMDA package
This package allows one to:
- handle missing values in exploratory multivariate analysis such as principal component analysis (PCA), multiple correspondence analysis (MCA) and multiple factor analysis (MFA)
- impute missing values in continuous data sets using the PCA model
- impute missing values in categorical data sets using the MCA model
- generate multiple imputed data sets using the PCA model
- visualize multiple imputation in PCA
TutoRials (in French)
How to perform a PCA with missing values?
- estimate the number of dimensions used in the reconstruction formula with the estim_ncpPCA function
- impute the data set with the imputePCA function, passing the number of dimensions estimated in the previous step through the ncp argument (if ncp is not given, 2 dimensions are used by default)
- perform the PCA on the completed data set using the PCA function of the FactoMineR package
Example
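library(missMDA)
library(FactoMineR)
data(orange)
nb <- estim_ncpPCA(orange, ncp.max = 5)       # estimate the number of dimensions
res.comp <- imputePCA(orange, ncp = nb$ncp)   # impute using the estimated number
res.pca <- PCA(res.comp$completeObs)          # PCA on the completed data set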
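To give an intuition for what the imputation step does, the following base R sketch runs a simplified iterative PCA imputation: missing cells start at their column means, a rank-ncp PCA reconstruction is computed, the missing cells are overwritten with the reconstructed values, and the loop repeats until the imputed values stabilise. This is an illustration under stated assumptions, not the missMDA implementation: `iterative_pca_impute` and the toy matrix are hypothetical, and the sketch leaves out the scaling and regularization that imputePCA applies by default.

```r
# Simplified, unregularized iterative PCA imputation (illustration only).
# `iterative_pca_impute` is a hypothetical helper, not part of missMDA.
iterative_pca_impute <- function(X, ncp = 2, max_iter = 1000, tol = 1e-6) {
  X <- as.matrix(X)
  miss <- is.na(X)
  # Initialisation: fill each missing cell with the mean of its column
  col_means <- colMeans(X, na.rm = TRUE)
  Xhat <- X
  Xhat[miss] <- col_means[col(X)[miss]]
  k <- seq_len(ncp)
  for (iter in seq_len(max_iter)) {
    # Rank-ncp reconstruction of the centred, completed matrix via SVD
    mu <- colMeans(Xhat)
    sv <- svd(sweep(Xhat, 2, mu))
    fitted <- sv$u[, k, drop = FALSE] %*%
      diag(sv$d[k], nrow = ncp) %*%
      t(sv$v[, k, drop = FALSE])
    fitted <- sweep(fitted, 2, mu, "+")
    # Overwrite only the missing cells; observed values are never changed
    Xnew <- Xhat
    Xnew[miss] <- fitted[miss]
    change <- sum((Xnew - Xhat)^2)
    Xhat <- Xnew
    if (change < tol) break
  }
  Xhat
}

# Toy data (made up for this example): 20 rows, 3 columns, 3 missing cells
set.seed(1)
toy <- matrix(rnorm(60), nrow = 20, ncol = 3)
toy[cbind(c(2, 7, 15), c(1, 2, 3))] <- NA
completed <- iterative_pca_impute(toy, ncp = 1)
```

The completed matrix has no missing cells and keeps every observed value untouched, which is the same contract as the completeObs element used in the example above.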
How to perform a multiple correspondence analysis (MCA) with missing values?
- estimate the number of dimensions used in the reconstruction formula with the estim_ncpMCA function
- impute the data set with the imputeMCA function, passing the number of dimensions estimated in the previous step through the ncp argument (if ncp is not given, 2 dimensions are used by default); this step imputes the disjunctive table used in MCA
- perform the MCA on the completed disjunctive matrix using the MCA function of the FactoMineR package, and the tab.disj argument
Example
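library(missMDA)
library(FactoMineR)
data(vnf)
nb <- estim_ncpMCA(vnf, ncp.max = 5)                # estimate the number of dimensions (can be slow)
tab.disj <- imputeMCA(vnf, ncp = nb$ncp)$tab.disj   # imputed disjunctive table
res.mca <- MCA(vnf, tab.disj = tab.disj)            # MCA using the imputed table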
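For readers unfamiliar with the disjunctive table, the base R sketch below builds one from a small categorical data frame: each category becomes a 0/1 indicator column, and a missing answer leaves NA in all the columns of that variable. The helper `disjunctive_table` and the toy data are hypothetical and only meant to show the structure; as far as the method is described, imputeMCA fills such NA cells with predicted values rather than hard 0/1 codes.

```r
# `disjunctive_table` is a hypothetical helper for illustration only.
disjunctive_table <- function(df) {
  blocks <- lapply(names(df), function(v) {
    f <- factor(df[[v]])
    # One 0/1 column per category; an NA answer gives NA in every column
    ind <- sapply(levels(f), function(l) as.numeric(f == l))
    matrix(ind, nrow = nrow(df),
           dimnames = list(NULL, paste(v, levels(f), sep = "_")))
  })
  do.call(cbind, blocks)
}

# Toy categorical data (made up for this example)
toy <- data.frame(
  colour = c("red", "blue", NA, "red"),
  size   = c("S", NA, "L", "L")
)
tab <- disjunctive_table(toy)
print(tab)
```

The third row has NA in both colour columns and the second row has NA in both size columns; these are the cells that the imputation step has to fill before MCA can use the table.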
How to generate multiple imputed data sets (with continuous variables)?
- estimate the number of dimensions used in the reconstruction formula with the estim_ncpPCA function
- generate the imputed data sets with the MIPCA function using the number of dimensions previously calculated (by default, 2 dimensions are chosen)
- visualize the imputed data sets with the plot.MIPCA function
Example
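library(missMDA)
data(orange)
nb <- estim_ncpPCA(orange, ncp.max = 5)   # estimate the number of dimensions
resMI <- MIPCA(orange, ncp = nb$ncp)      # generate the imputed data sets
plot(resMI)                               # visualize the variability across imputations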
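The idea behind multiple imputation can be sketched in base R: several completed versions of the same data set are produced, and the spread of the imputed values across them reflects the uncertainty caused by the missing entries. The stand-in imputation below (random draws from a normal distribution fitted to each column) is an assumption chosen for brevity and is not the algorithm used by MIPCA; `impute_once` and the toy matrix are hypothetical.

```r
# Toy data (made up for this example): 10 rows, 4 columns, 3 missing cells
set.seed(42)
toy <- matrix(rnorm(40, mean = 10), nrow = 10, ncol = 4)
toy[cbind(c(1, 4, 8), c(2, 3, 1))] <- NA
miss <- is.na(toy)

# Stand-in imputation (NOT the MIPCA algorithm): draw each missing cell
# from a normal distribution fitted to the observed values of its column.
impute_once <- function(X) {
  for (j in seq_len(ncol(X))) {
    na_j <- is.na(X[, j])
    X[na_j, j] <- rnorm(sum(na_j),
                        mean = mean(X[, j], na.rm = TRUE),
                        sd   = sd(X[, j], na.rm = TRUE))
  }
  X
}

# Twenty completed data sets
imputed <- replicate(20, impute_once(toy), simplify = FALSE)

# Per-cell standard deviation across the completed data sets
stacked <- simplify2array(imputed)        # 10 x 4 x 20 array
cell_sd <- apply(stacked, c(1, 2), sd)
```

Observed cells have a standard deviation of zero because they are identical in every completed data set, while the cells that were missing vary from one imputation to the next. It is this between-imputation variability that the plot of a MIPCA result is meant to display.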
Documentation
Papers:
Josse, J. & Husson, F. (2013). Handling missing values in exploratory multivariate data analysis methods. Journal de la SFdS. 153 (2), pp. 79-99.
Josse, J., Chavent, M., Liquet, B. & Husson, F. (2012). Handling missing values with Regularized Iterative Multiple Correspondence Analysis. Journal of Classification. 29 (1), pp. 91-116.
Josse, J. & Husson, F. (2011). Selecting the number of components in PCA using cross-validation approximations. Computational Statistics and Data Analysis. 56 (6), pp. 1869-1879.
Josse, J., Husson, F. & Pagès, J. (2011). Multiple imputation in PCA. Advances in Data Analysis and Classification. 5 (3), pp. 231-246.
Conferences:
- Missing values imputation for mixed data based on principal component methods. COMPSTAT, Cyprus, August 27-31, 2012.
- Imputation de données manquantes pour des données mixtes via les méthodes factorielles grâce à missMDA [Missing data imputation for mixed data via principal component methods with missMDA]. Premières rencontres R, Bordeaux, July 2-3, 2012.
- missMDA: a package to handle missing values in and with multivariate exploratory data analysis methods. useR! 2011, Warwick, England, August 15-20.


