Higher Education and Research in Agronomy

SIB group - Statistics for Integrative Biology

The missMDA package

This package allows one to:

  • handle missing values in exploratory multivariate analysis such as principal component analysis (PCA), multiple correspondence analysis (MCA) and multiple factor analysis (MFA)
  • impute missing values in continuous data sets using the PCA model
  • impute missing values in categorical data sets using the MCA model
  • generate multiple imputed data sets using the PCA model
  • visualize multiple imputation in PCA

TutoRials (in French)

How to perform a PCA with missing values?

  1. estimate the number of dimensions used in the reconstruction formula with the estim_ncpPCA function
  2. impute the data set with the imputePCA function using the number of dimensions previously calculated (by default, 2 dimensions are chosen)
  3. perform the PCA on the completed data set using the PCA function of the FactoMineR package

Example

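library(missMDA)     # estim_ncpPCA, imputePCA and the orange data set
library(FactoMineR)  # PCA
data(orange)
# step 1: estimate the number of dimensions by cross-validation
nb <- estim_ncpPCA(orange, ncp.max = 5)
# step 2: impute the data set with the estimated number of dimensions
res.comp <- imputePCA(orange, ncp = nb$ncp)
# step 3: perform the PCA on the completed data set
res.pca <- PCA(res.comp$completeObs)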
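
To make the "reconstruction formula" used in step 2 more concrete, here is a minimal base-R sketch of iterative PCA imputation. It is not the code of imputePCA: the helper name iterative_pca_impute and its arguments are hypothetical, the sketch does not scale the variables, and it uses the plain (unregularized) iteration, whereas imputePCA offers a regularized variant. Missing cells start at the column means, then are repeatedly replaced by their rank-ncp PCA reconstruction until they stop changing.

```r
# Minimal, unregularized iterative PCA imputation in base R.
# `iterative_pca_impute` is a hypothetical helper, not part of missMDA.
iterative_pca_impute <- function(X, ncp = 2, max_iter = 1000, tol = 1e-6) {
  X <- as.matrix(X)
  missing <- is.na(X)
  # Initialise the missing cells with the observed column means.
  col_means <- colMeans(X, na.rm = TRUE)
  Xhat <- X
  Xhat[missing] <- col_means[col(X)[missing]]
  for (iter in seq_len(max_iter)) {
    # Centre the current completed data and compute its SVD.
    mu <- colMeans(Xhat)
    Xc <- sweep(Xhat, 2, mu)
    s <- svd(Xc, nu = ncp, nv = ncp)
    # Reconstruction formula: rank-ncp fit, with the means added back.
    fitted <- s$u %*% diag(s$d[seq_len(ncp)], ncp, ncp) %*% t(s$v)
    fitted <- sweep(fitted, 2, mu, "+")
    # Observed cells are kept as they are; only missing cells are updated.
    Xnew <- X
    Xnew[missing] <- fitted[missing]
    change <- sum((Xnew[missing] - Xhat[missing])^2)
    Xhat <- Xnew
    if (change < tol) break
  }
  Xhat
}

# Toy data of rank 2 with three cells removed (illustrative values only).
set.seed(1)
full <- matrix(rnorm(40), 20, 2) %*% matrix(rnorm(10), 2, 5)
holed <- full
holed[cbind(c(3, 8, 15), c(1, 4, 2))] <- NA
completed <- iterative_pca_impute(holed, ncp = 2)
```

The observed cells of completed are identical to those of holed; only the three missing cells are filled in. Choosing ncp too large tends to overfit the observed values, which is why the number of dimensions is estimated first.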

How to perform a multiple correspondence analysis (MCA) with missing values?

  1. estimate the number of dimensions used in the reconstruction formula with the estim_ncpMCA function
  2. impute the data set with the imputeMCA function using the number of dimensions previously calculated (by default, 2 dimensions are chosen); this step imputes the disjunctive matrix used in MCA
  3. perform the MCA on the completed disjunctive matrix using the MCA function of the FactoMineR package with its tab.disj argument

Example

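library(missMDA)     # estim_ncpMCA, imputeMCA and the vnf data set
library(FactoMineR)  # MCA
data(vnf)
# step 1: estimate the number of dimensions (this can take a while)
nb <- estim_ncpMCA(vnf, ncp.max = 5)
# step 2: impute the disjunctive matrix
tab.disj <- imputeMCA(vnf, ncp = nb$ncp)$tab.disj
# step 3: perform the MCA, passing the imputed disjunctive matrix
res.mca <- MCA(vnf, tab.disj = tab.disj)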
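
For readers unfamiliar with the disjunctive matrix, the base-R sketch below builds one from a small data frame of categorical variables. The helper disjunctive_table and the toy data are hypothetical and only illustrate the structure: one 0/1 column per category, with NA in every column of a variable whose value is missing. The assumption here is that imputeMCA fills such NA entries with real-valued weights rather than hard 0/1 codes.

```r
# Build the disjunctive (indicator) table of a data frame of categorical
# variables. `disjunctive_table` is a hypothetical helper for illustration.
disjunctive_table <- function(df) {
  blocks <- lapply(names(df), function(v) {
    f <- factor(df[[v]])
    # One column per category; a missing value gives NA in all of them.
    ind <- outer(as.character(f), levels(f), "==") * 1
    colnames(ind) <- paste(v, levels(f), sep = "_")
    ind
  })
  do.call(cbind, blocks)
}

# Toy data with one missing value per variable.
toy <- data.frame(
  colour = c("red", "blue", NA, "red"),
  size   = c("S", NA, "L", "L")
)
tab <- disjunctive_table(toy)
tab
```

Each complete row of tab sums to the number of variables (here 2), since exactly one category is selected per variable.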

How to generate multiple imputed data sets (with continuous variables)?

  1. estimate the number of dimensions used in the reconstruction formula with the estim_ncpPCA function
  2. generate the imputed data sets with the MIPCA function using the number of dimensions previously calculated (by default, 2 dimensions are chosen)
  3. visualize the imputed data sets with the plot.MIPCA function

Example

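library(missMDA)  # estim_ncpPCA, MIPCA and the orange data set
data(orange)
# step 1: estimate the number of dimensions by cross-validation
nb <- estim_ncpPCA(orange, ncp.max = 5)
# step 2: generate the multiple imputed data sets
resMI <- MIPCA(orange, ncp = nb$ncp)
# step 3: visualize them (plot dispatches to plot.MIPCA)
plot(resMI)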
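
Besides plotting, the variability between imputed data sets can be summarized numerically. The base-R sketch below computes, for each cell, the mean and standard deviation across a list of imputed data sets. The helper summarise_imputations and the toy list are hypothetical; with missMDA the list would presumably be the res.MI component of the MIPCA result, which should be checked against ?MIPCA.

```r
# Per-cell mean and spread across a list of imputed data sets.
# `summarise_imputations` is a hypothetical helper; it accepts any list of
# equally sized numeric matrices or data frames.
summarise_imputations <- function(imputed) {
  arr <- simplify2array(lapply(imputed, as.matrix))
  list(
    mean = apply(arr, c(1, 2), mean),
    sd   = apply(arr, c(1, 2), sd)
  )
}

# Toy stand-in for three imputed data sets: cell [2, 1] was missing and
# receives a different value in each imputation.
observed <- matrix(c(1, NA, 3, 4), nrow = 2)
imputed <- lapply(c(1.8, 2.0, 2.2), function(v) {
  m <- observed
  m[2, 1] <- v
  m
})
s <- summarise_imputations(imputed)
s$sd  # zero for observed cells, positive for the imputed cell
```

A large standard deviation for a cell indicates that its imputed value is poorly determined by the rest of the data, which is the uncertainty that the multiple imputation plots are meant to display.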

Documentation

Papers:

Josse, J. & Husson, F. (2013). Handling missing values in exploratory multivariate data analysis methods. Journal de la SFDS. 153 (2), pp. 79-99.

Josse, J., Chavent, M., Liquet, B. & Husson, F. (2012). Handling missing values with Regularized Iterative Multiple Correspondence Analysis. Journal of Classification. 29 (1), pp. 91-116.

Josse, J. & Husson, F. (2011). Selecting the number of components in PCA using cross-validation approximations. Computational Statistics and Data Analysis. 56 (6), pp. 1869-1879.

Josse, J., Husson, F. & Pagès, J. (2011). Multiple imputation in PCA. Advances in Data Analysis and Classification. 5 (3), pp. 231-246.

Conferences:

- Missing values imputation for mixed data based on principal component methods. COMPSTAT, Cyprus, 27-31 2012. slides.

- Imputation de données manquantes pour des données mixtes via les méthodes factorielles grâce à missMDA [Imputation of missing values for mixed data via principal component methods using missMDA]. Premières rencontres R, Bordeaux, July 2-3, 2012. abstract - slides.

- missMDA: a package to handle missing values in and with multivariate exploratory data analysis methods. useR! 2011, Warwick, England, August 15-20th. (abstract, slides).