Institut supérieur des sciences agronomiques, agroalimentaires, horticoles et du paysage

The missMDA package

This package allows one to:

  • handle missing values in exploratory multivariate analysis methods such as principal component analysis (PCA), multiple correspondence analysis (MCA), factor analysis of mixed data (FAMD) and multiple factor analysis (MFA)
  • impute missing values in:
    • continuous data sets using the PCA model
    • categorical data sets using MCA
    • mixed data using FAMD
  • generate multiple imputed data sets:
    • for continuous data using the PCA model
    • for categorical data using MCA
  • visualize multiple imputation in PCA and MCA
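FAMD can analyze mixed data because of how it codes the variables before a PCA-type decomposition: continuous variables are standardized, and each indicator column of a categorical variable is divided by sqrt(p_k), where p_k is the proportion of individuals in category k, then centered, so that both kinds of variables weigh comparably. The base-R sketch below illustrates that coding on complete data only. `famd_coding` is a hypothetical helper, not a function of missMDA or FactoMineR, and it uses R's sample standard deviation, which may differ slightly from the packages' internal weighting.

```r
# Hypothetical helper (not part of missMDA or FactoMineR): FAMD-style coding
# of a complete mixed data frame, using base R only.
famd_coding <- function(dat) {
  blocks <- lapply(names(dat), function(v) {
    x <- dat[[v]]
    if (is.numeric(x)) {
      # Continuous variable: centre and scale to unit (sample) variance
      matrix(as.numeric(scale(x)), ncol = 1, dimnames = list(NULL, v))
    } else {
      # Categorical variable: one 0/1 indicator column per observed category
      f <- droplevels(as.factor(x))
      m <- outer(as.character(f), levels(f), "==") * 1
      colnames(m) <- paste(v, levels(f), sep = "_")
      # Divide each indicator by sqrt(p_k), then centre it
      m <- sweep(m, 2, sqrt(colMeans(m)), "/")
      sweep(m, 2, colMeans(m), "-")
    }
  })
  do.call(cbind, blocks)
}

mixed <- data.frame(height = c(150, 160, 170, 180),
                    sex = factor(c("F", "F", "M", "F")))
round(famd_coding(mixed), 3)
```

A rare category (small p_k) gets a larger weight after the division, which is what keeps rare categories from being swamped in the analysis.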

TutoRials (in French)

How to perform a PCA with missing values?

  1. estimate the number of dimensions used in the reconstruction formula with the estim_ncpPCA function
  2. impute the data set with the imputePCA function using the number of dimensions previously calculated (by default, 2 dimensions are chosen)
  3. perform the PCA on the completed data set using the PCA function of the FactoMineR package
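The two first steps can be sketched in base R to show the idea behind them: missing cells are filled with column means, then repeatedly replaced by their rank-`ncp` PCA reconstruction until the imputed values stabilise, and `ncp` is chosen by hiding some observed cells and checking how well each candidate value predicts them. This is a simplified sketch under stated assumptions: `impute_pca_em` and `choose_ncp_cv` are hypothetical helpers, the imputation is the plain (EM-type) iterative PCA whereas imputePCA uses a regularized version by default, and the cross-validation is a brute-force version of what estim_ncpPCA approximates by default.

```r
# Hypothetical helpers (not missMDA code): plain iterative PCA imputation
# and a brute-force cross-validation to pick ncp. Base R only; ncp >= 1.
impute_pca_em <- function(X, ncp = 2, maxiter = 1000, tol = 1e-6) {
  X <- as.matrix(X)
  miss <- is.na(X)
  # Initialisation: each missing cell gets its column mean
  Xhat <- X
  Xhat[miss] <- colMeans(X, na.rm = TRUE)[col(X)[miss]]
  for (iter in seq_len(maxiter)) {
    # Standardise the current completed data (scaled PCA)
    mu <- colMeans(Xhat)
    sdv <- apply(Xhat, 2, sd)
    Z <- sweep(sweep(Xhat, 2, mu), 2, sdv, "/")
    # Rank-ncp reconstruction from the truncated SVD, back on the data scale
    s <- svd(Z, nu = ncp, nv = ncp)
    fit <- s$u %*% diag(s$d[seq_len(ncp)], ncp) %*% t(s$v)
    fit <- sweep(sweep(fit, 2, sdv, "*"), 2, mu, "+")
    # Update only the missing cells; observed values are never changed
    old <- Xhat[miss]
    Xhat[miss] <- fit[miss]
    if (sum((Xhat[miss] - old)^2) < tol) break
  }
  Xhat
}

choose_ncp_cv <- function(X, ncp_max = 5, pNA = 0.05, nsim = 20) {
  X <- as.matrix(X)
  obs <- which(!is.na(X))
  err <- numeric(ncp_max)
  for (b in seq_len(nsim)) {
    # Hide a random fraction pNA of the observed cells...
    hide <- sample(obs, max(1, round(pNA * length(obs))))
    Xb <- X
    Xb[hide] <- NA
    # ...and measure how well each candidate ncp predicts them
    for (k in seq_len(ncp_max)) {
      imp <- impute_pca_em(Xb, ncp = k)
      err[k] <- err[k] + mean((imp[hide] - X[hide])^2)
    }
  }
  err <- err / nsim
  list(ncp = which.min(err), criterion = err)
}

# Toy rank-2 data with a little noise and 20 missing cells
set.seed(1)
truth <- matrix(rnorm(40 * 2), 40, 2) %*% matrix(rnorm(2 * 5), 2, 5) +
  matrix(rnorm(40 * 5, sd = 0.1), 40, 5)
toy <- truth
toy[sample(length(toy), 20)] <- NA
nb <- choose_ncp_cv(toy, ncp_max = 3, nsim = 10)
completed <- impute_pca_em(toy, ncp = nb$ncp)
```

Here `choose_ncp_cv` plays roughly the role of estim_ncpPCA and `impute_pca_em` that of imputePCA; both assume `ncp` is at least 1 and at most the number of variables.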

Example

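library(missMDA)     # imputation functions and the orange data set
library(FactoMineR)  # PCA function
data(orange)
nb <- estim_ncpPCA(orange, ncp.max = 5)      # nb$ncp: estimated number of dimensions
res.comp <- imputePCA(orange, ncp = nb$ncp)  # res.comp$completeObs: completed data set

res.pca <- PCA(res.comp$completeObs)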

How to perform a multiple correspondence analysis (MCA) with missing values?

  1. estimate the number of dimensions used in the reconstruction formula with the estim_ncpMCA function
  2. impute the data set with the imputeMCA function using the number of dimensions previously calculated (by default, 2 dimensions are chosen); this step imputes the disjunctive table used in MCA
  3. perform the MCA with the MCA function of the FactoMineR package, passing the completed disjunctive table through its tab.disj argument
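In MCA, each categorical variable is recoded as a block of indicator (0/1) columns, one per category, which together form the complete disjunctive table. A missing answer leaves the whole block of indicators for that variable unknown, and these are the cells that imputeMCA fills in, generally with non-integer values. A minimal base-R sketch of such a table with missing blocks; `disjunctive` is a hypothetical helper, and FactoMineR's own coding may order or name the columns differently.

```r
# Hypothetical helper (not missMDA/FactoMineR code): complete disjunctive
# table of a data frame of categorical variables, in base R.
disjunctive <- function(dat) {
  blocks <- lapply(names(dat), function(v) {
    f <- as.factor(dat[[v]])
    # One 0/1 column per category; comparing NA gives NA, so a missing
    # answer leaves NA in all the indicator columns of that variable
    m <- outer(as.character(f), levels(f), "==") * 1
    colnames(m) <- paste(v, levels(f), sep = "_")
    m
  })
  do.call(cbind, blocks)
}

survey <- data.frame(colour = c("red", "blue", NA, "red"),
                     size = c("S", "L", "L", NA),
                     stringsAsFactors = TRUE)
disjunctive(survey)
```

On observed rows, each variable's block sums to 1; the imputation step has to supply values for the NA blocks before the MCA can be run.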

Example

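library(missMDA)     # imputation functions and the vnf data set
library(FactoMineR)  # MCA function
data(vnf)
nb <- estim_ncpMCA(vnf, ncp.max = 5)                # nb$ncp: estimated number of dimensions
tab.disj <- imputeMCA(vnf, ncp = nb$ncp)$tab.disj  # completed disjunctive table

res.mca <- MCA(vnf, tab.disj = tab.disj)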

How to generate multiple imputed data sets (with continuous variables)?

  1. estimate the number of dimensions used in the reconstruction formula with the estim_ncpPCA function
  2. generate the imputed data sets with the MIPCA function using the number of dimensions previously calculated (by default, 2 dimensions are chosen)
  3. visualize the imputed data sets with the plot.MIPCA function
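Because each imputed data set provides different plausible values for the missing cells, the spread of those values across data sets reflects the uncertainty due to missing values, which plot.MIPCA represents on the PCA maps. The base-R sketch below summarizes that spread cell by cell. `impute_spread` is a hypothetical helper; it only assumes that the imputed data sets are given as a list of matrices or data frames with the same layout as the incomplete data.

```r
# Hypothetical helper (not missMDA code): between-imputation mean and
# standard deviation of every originally missing cell.
impute_spread <- function(X, imputed) {
  # cell: column-major position of each missing entry in X
  miss <- which(is.na(as.matrix(X)))
  # One row per missing cell, one column per imputed data set
  vals <- sapply(imputed, function(m) as.matrix(m)[miss])
  vals <- matrix(vals, nrow = length(miss))
  data.frame(cell = miss, mean = rowMeans(vals), sd = apply(vals, 1, sd))
}

X <- matrix(c(1, NA, 3, 4, NA, 6), nrow = 3)
imp1 <- X; imp1[is.na(X)] <- c(2, 5)
imp2 <- X; imp2[is.na(X)] <- c(4, 7)
impute_spread(X, list(imp1, imp2))
```

With missMDA, the list would presumably come from the MIPCA output, e.g. `impute_spread(orange, resMI$res.MI)`, assuming `res.MI` holds the imputed data sets (see `?MIPCA`). A cell with a large standard deviation is one whose imputed value should be interpreted with caution.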

Example

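library(missMDA)  # MIPCA and the orange data set
data(orange)
nb <- estim_ncpPCA(orange, ncp.max = 5)
resMI <- MIPCA(orange, ncp = nb$ncp)  # multiple imputation with the estimated ncp

plot(resMI)  # dispatches to plot.MIPCA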

Documentation

Vignette:

Multiple imputation with principal component methods: a user guide written by V. Audigier

Papers:

Audigier, V., Husson, F., and Josse, J. (2017). MIMCA: multiple imputation for categorical variables with multiple correspondence analysis. Statistics and Computing, 27(2):501-518.

Josse, J. and Husson, F. (2016). missMDA: a package for handling missing values in multivariate data analysis. Journal of Statistical Software, 70(1):1-31.

Audigier, V., Husson, F., and Josse, J. (2016). Multiple imputation for continuous variables using a Bayesian principal component analysis. Journal of Statistical Computation and Simulation, 86(11):2140-2156.

Audigier, V., Husson, F., and Josse, J. (2016). A principal component method to impute missing values for mixed data. Advances in Data Analysis and Classification, 10(1):5-26.

Josse, J. and Husson, F. (2013). Handling missing values in exploratory multivariate data analysis methods. Journal de la SFDS, 153(2):79-99.

Josse, J., Chavent, M., Liquet, B., and Husson, F. (2012). Handling missing values with regularized iterative multiple correspondence analysis. Journal of Classification, 29(1):91-116.

Josse, J. and Husson, F. (2011). Selecting the number of components in PCA using cross-validation approximations. Computational Statistics and Data Analysis, 56(6):1869-1879.

Josse, J., Husson, F., and Pagès, J. (2011). Multiple imputation in PCA. Advances in Data Analysis and Classification, 5(3):231-246.

Conferences:

- Missing values imputation for mixed data based on principal component methods. COMPSTAT, Cyprus, 27-31 2012 (slides).

- Imputation of missing values for mixed data with factorial methods using missMDA. Premières Rencontres R, Bordeaux, July 2-3, 2012 (abstract, slides).

- missMDA: a package to handle missing values in and with multivariate exploratory data analysis methods. useR! 2011, Warwick, England, August 15-20 (abstract, slides).