The missMDA package
This package allows one to:
- handle missing values in exploratory multivariate analysis such as principal component analysis (PCA), multiple correspondence analysis (MCA) and multiple factor analysis (MFA)
- impute missing values in continuous data sets using the PCA model
- impute missing values in categorical data sets using the MCA model
- generate multiple imputed data sets using the PCA model
- visualize multiple imputation in PCA
TutoRials (in French)
How to perform a PCA with missing values?
- estimate the number of dimensions used in the reconstruction formula with the estim_ncpPCA function
- impute the data set with the imputePCA function using the number of dimensions previously calculated (by default, 2 dimensions are chosen)
- perform the PCA on the completed data set using the PCA function of the FactoMineR package
Example
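library(missMDA)     # estim_ncpPCA, imputePCA and the orange data set
library(FactoMineR)  # PCA
data(orange)
nb <- estim_ncpPCA(orange, ncp.max=5)   # nb$ncp holds the estimated number of dimensions
res.comp <- imputePCA(orange, ncp=2)    # impute with 2 dimensions (nb$ncp can be passed instead)
res.pca <- PCA(res.comp$completeObs)    # PCA on the completed data set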
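To make the idea of imputing with the "reconstruction formula" more concrete, here is a minimal base R sketch of iterative PCA imputation. It is a simplified, non-regularized illustration and not the code used by imputePCA; the function name impute_pca_sketch and the toy data are assumptions made up for this example.

```r
# Simplified (non-regularized) iterative PCA imputation in base R.
# impute_pca_sketch is a hypothetical helper, not part of missMDA.
impute_pca_sketch <- function(X, ncp = 2, max_iter = 1000, tol = 1e-6) {
  X <- as.matrix(X)
  miss <- is.na(X)
  # Start by filling each missing cell with the mean of its column
  col_means <- colMeans(X, na.rm = TRUE)
  Xhat <- X
  Xhat[miss] <- col_means[col(X)[miss]]
  for (iter in seq_len(max_iter)) {
    # Centre the current completed table and keep the first ncp dimensions
    mu <- colMeans(Xhat)
    Xc <- sweep(Xhat, 2, mu)
    s <- svd(Xc, nu = ncp, nv = ncp)
    # Reconstruction formula: rank-ncp fit plus the column means
    fit <- s$u %*% diag(s$d[seq_len(ncp)], ncp, ncp) %*% t(s$v)
    fit <- sweep(fit, 2, mu, "+")
    # Only the missing cells are updated; observed values are kept
    Xnew <- X
    Xnew[miss] <- fit[miss]
    delta <- sum((Xnew[miss] - Xhat[miss])^2)
    Xhat <- Xnew
    if (delta < tol) break
  }
  Xhat
}

# Toy data (assumption): three nearly collinear variables, three missing cells
set.seed(1)
scores <- rnorm(30)
X <- cbind(a = scores, b = 2 * scores, c = -scores) +
  matrix(rnorm(90, sd = 0.01), 30, 3)
X[cbind(c(2, 5, 9), c(1, 2, 3))] <- NA
completed <- impute_pca_sketch(X, ncp = 1)
```

The sketch alternates between fitting a low-rank PCA model on the completed table and refreshing the missing cells with the fitted values, which is why the number of dimensions has to be chosen before imputing.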
How to perform a multiple correspondence analysis (MCA) with missing values?
- estimate the number of dimensions used in the reconstruction formula with the estim_ncpMCA function
- impute the data set with the imputeMCA function using the number of dimensions previously calculated (by default, 2 dimensions are chosen); this step imputes the disjunctive matrix used in MCA
- perform the MCA on the completed disjunctive matrix using the MCA function of the FactoMineR package with the tab.disj argument
Example
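library(missMDA)     # estim_ncpMCA, imputeMCA and the vnf data set
library(FactoMineR)  # MCA
data(vnf)
nb <- estim_ncpMCA(vnf, ncp.max=5)                  # nb$ncp holds the estimated number of dimensions
res.impute <- imputeMCA(vnf, ncp=4)                 # list with $tab.disj and $completeObs
res.mca <- MCA(vnf, tab.disj=res.impute$tab.disj)   # MCA using the imputed disjunctive matrix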
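For readers unfamiliar with the disjunctive matrix, the following base R sketch builds one for a tiny categorical data set with missing values. The helper make_disjunctive and the toy data frame are hypothetical and only serve as an illustration; they are not part of missMDA or FactoMineR.

```r
# Build the disjunctive (indicator) table of a small categorical data set.
# make_disjunctive is a hypothetical helper written for this illustration.
make_disjunctive <- function(df) {
  blocks <- lapply(names(df), function(v) {
    f <- factor(df[[v]])
    # One 0/1 column per category; a missing value gives NA in every
    # column of the corresponding variable
    ind <- outer(as.integer(f), seq_along(levels(f)), "==") * 1
    colnames(ind) <- paste(v, levels(f), sep = "_")
    ind
  })
  do.call(cbind, blocks)
}

toy <- data.frame(colour = c("red", "blue", NA, "red"),
                  size   = c("S", NA, "L", "L"))
tab <- make_disjunctive(toy)
print(tab)
```

The NA cells of such a table are the ones the imputation step fills in. As far as we understand the method, the imputed entries are real numbers rather than 0/1 values, which is why the completed table is passed to MCA through tab.disj rather than as an ordinary data frame.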
How to generate multiple imputed data sets (with continuous variables)?
- estimate the number of dimensions used in the reconstruction formula with the estim_ncpPCA function
- generate the imputed data sets with the MIPCA function using the number of dimensions previously calculated (by default, 2 dimensions are chosen)
- visualize the imputed data sets with the plot.MIPCA function
Example
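library(missMDA)     # estim_ncpPCA, MIPCA and the orange data set
data(orange)
nb <- estim_ncpPCA(orange, ncp.max=5)   # nb$ncp holds the estimated number of dimensions
resMI <- MIPCA(orange, ncp=2)           # generate the multiple imputed data sets
plot(resMI)                             # dispatches to plot.MIPCA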
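Beyond the plots, the imputed data sets themselves can be inspected. The sketch below assumes a recent missMDA version in which MIPCA returns the imputed data sets as a list in the res.MI element; the choice of nboot=20 and the summary computed here are illustrative assumptions, not a recommended workflow.

```r
library(missMDA)
data(orange)

# Generate 20 imputed data sets (nboot is lowered here only to keep the run short)
resMI <- MIPCA(orange, ncp = 2, nboot = 20)

# Assumption: res.MI is a list with one completed data set per imputation
imputed <- resMI$res.MI

# Mean of each variable within each imputed data set (variables in rows)
means <- sapply(imputed, function(d) colMeans(d))

# Spread of those means across imputations: larger values indicate that
# the missing values make the variable's mean more uncertain
between_sd <- apply(means, 1, sd)
print(round(between_sd, 3))
```

Comparing a quantity across the imputed data sets in this way gives a rough numerical counterpart to what plot(resMI) displays graphically.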


