# The missMDA package

## This package allows one to:

- handle missing values in exploratory multivariate analysis methods such as principal component analysis (PCA), correspondence analysis (CA), multiple correspondence analysis (MCA), factorial analysis of mixed data (FAMD) and multiple factor analysis (MFA)
- impute missing values in:
  - continuous data sets, using the PCA model
  - categorical data sets, using MCA
  - contingency tables, using CA
  - mixed data, using FAMD
- generate multiple imputed data sets:
  - for continuous data, using the PCA model
  - for categorical data, using MCA
- visualize multiple imputation in PCA and MCA

### How to perform a PCA with missing values?

1. estimate the number of dimensions used in the reconstruction formula with the estim_ncpPCA function
2. impute the data set with the imputePCA function using the number of dimensions previously calculated (if ncp is not specified, 2 dimensions are used by default)
3. perform the PCA on the completed data set using the PCA function of the FactoMineR package

#### Example

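```
library(missMDA)     # estim_ncpPCA, imputePCA and the orange data set
library(FactoMineR)  # PCA
data(orange)
nb <- estim_ncpPCA(orange, ncp.max = 5)      # step 1: number of dimensions
res.comp <- imputePCA(orange, ncp = nb$ncp)  # step 2: impute with nb$ncp dimensions
res.pca <- PCA(res.comp$completeObs)         # step 3: PCA on the completed data set
```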
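
The reconstruction formula behind steps 1 and 2 can be illustrated with a small base-R sketch. It is not missMDA's code: `impute_pca` and `cv_ncp` are hypothetical helpers written for illustration, and the data are simulated. `impute_pca` is a plain iterative (EM-type) PCA imputation: missing cells start at the column means, the scaled data are approximated by their first `ncp` singular triplets, the missing cells are replaced by the fitted values, and this is repeated until the imputed values stop changing. By default imputePCA uses a regularized version of this algorithm. `cv_ncp` mimics choosing `ncp` by cross-validation: it hides a small fraction of the observed cells, imputes them for each candidate `ncp`, and keeps the value with the smallest prediction error; estim_ncpPCA offers more refined criteria, including generalized cross-validation.

```r
# Hypothetical helper (not part of missMDA): iterative PCA imputation,
# EM-type, without the regularization that imputePCA applies by default.
impute_pca <- function(X, ncp = 2, max_iter = 500, tol = 1e-8) {
  X <- as.matrix(X)
  miss <- is.na(X)
  X_hat <- X
  X_hat[miss] <- colMeans(X, na.rm = TRUE)[col(X)[miss]]  # start from mean imputation
  for (iter in seq_len(max_iter)) {
    # Centre and scale the current completed matrix
    mu <- colMeans(X_hat)
    sdev <- apply(X_hat, 2, sd)
    Z <- scale(X_hat, center = mu, scale = sdev)
    # Reconstruction formula: keep the first ncp singular triplets
    s <- svd(Z, nu = ncp, nv = ncp)
    Z_fit <- s$u %*% diag(s$d[seq_len(ncp)], ncp, ncp) %*% t(s$v)
    X_fit <- sweep(sweep(Z_fit, 2, sdev, "*"), 2, mu, "+")
    # Replace only the missing cells; observed values never change
    delta <- sum((X_fit[miss] - X_hat[miss])^2)
    X_hat[miss] <- X_fit[miss]
    if (delta < tol) break
  }
  X_hat
}

# Hypothetical helper: pick ncp by hiding a fraction p_na of the observed
# cells, imputing them, and averaging the squared prediction error.
cv_ncp <- function(X, ncp_max = 4, p_na = 0.05, n_sim = 10) {
  X <- as.matrix(X)
  observed <- which(!is.na(X))
  errors <- matrix(NA_real_, n_sim, ncp_max)
  for (b in seq_len(n_sim)) {
    held_out <- sample(observed, max(1, round(p_na * length(observed))))
    X_cv <- X
    X_cv[held_out] <- NA
    for (k in seq_len(ncp_max)) {
      errors[b, k] <- mean((impute_pca(X_cv, ncp = k)[held_out] - X[held_out])^2)
    }
  }
  msep <- colMeans(errors)
  list(ncp = which.min(msep), msep = msep)
}

# Usage on simulated data: rank-2 signal plus noise, 20 cells removed
set.seed(42)
X <- matrix(rnorm(50 * 2), 50, 2) %*% matrix(rnorm(2 * 5), 2, 5) +
  matrix(rnorm(50 * 5, sd = 0.1), 50, 5)
X_na <- X
X_na[sample(length(X), 20)] <- NA
cv <- cv_ncp(X_na, ncp_max = 4)
completed <- impute_pca(X_na, ncp = cv$ncp)
```

Starting from mean imputation and only ever overwriting the missing cells keeps the observed data untouched, which is why the completed matrix can be passed directly to PCA as in step 3.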

### How to perform a multiple correspondence analysis (MCA) with missing values?

1. estimate the number of dimensions used in the reconstruction formula with the estim_ncpMCA function
2. impute the data set with the imputeMCA function using the number of dimensions previously calculated (if ncp is not specified, 2 dimensions are used by default); this step imputes the disjunctive matrix used in MCA
3. perform the MCA on the completed disjunctive matrix using the MCA function of the FactoMineR package, passing the imputed matrix through its tab.disj argument

#### Example

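```
library(missMDA)     # estim_ncpMCA, imputeMCA and the vnf data set
library(FactoMineR)  # MCA
data(vnf)
nb <- estim_ncpMCA(vnf, ncp.max = 5)               # step 1: number of dimensions
tab.disj <- imputeMCA(vnf, ncp = nb$ncp)$tab.disj  # step 2: imputed disjunctive matrix
res.mca <- MCA(vnf, tab.disj = tab.disj)           # step 3: MCA on the completed matrix
```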
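
What imputeMCA imputes is the indicator (complete disjunctive) matrix rather than the categories themselves. The base-R sketch below builds that matrix and fills each missing block with the observed category proportions, so that an imputed block contains fuzzy values that sum to 1 for each variable. This proportion filling is only a natural starting point for an iterative MCA imputation, not what imputeMCA returns; `disjunctive_table` and `impute_proportions` are hypothetical helpers, and the toy data frame is made up.

```r
# Hypothetical helper (not part of missMDA): complete disjunctive table.
# One 0/1 column per category; a missing value gives NA in every column
# of that variable.
disjunctive_table <- function(df) {
  blocks <- lapply(names(df), function(v) {
    f <- factor(df[[v]])
    lev <- levels(f)
    Z <- outer(as.integer(f), seq_along(lev), "==") * 1
    colnames(Z) <- paste(v, lev, sep = "_")
    Z
  })
  do.call(cbind, blocks)
}

# Hypothetical helper: fill each missing cell with the proportion of its
# category among the observed rows (a starting point only).
impute_proportions <- function(Z) {
  miss <- is.na(Z)
  Z[miss] <- colMeans(Z, na.rm = TRUE)[col(Z)[miss]]
  Z
}

# Toy data: two categorical variables with one missing value each
df <- data.frame(
  colour = factor(c("red", "blue", NA, "red")),
  size = factor(c("S", NA, "L", "L"))
)
Z <- disjunctive_table(df)
Z_imp <- impute_proportions(Z)
```

Row 3 ends up with 1/3 in `colour_blue` and 2/3 in `colour_red`, because one of the three observed colours is blue.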

### How to generate multiple imputed data sets for continuous variables?

1. estimate the number of dimensions used in the reconstruction formula with the estim_ncpPCA function
2. generate the imputed data sets with the MIPCA function using the number of dimensions previously calculated (by default, 2 dimensions are chosen)
3. visualize the imputed data sets with the plot.MIPCA function

#### Example

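```
library(missMDA)     # estim_ncpPCA, MIPCA and the orange data set
data(orange)
nb <- estim_ncpPCA(orange, ncp.max = 5)  # step 1: number of dimensions
resMI <- MIPCA(orange, ncp = nb$ncp)     # step 2: multiple imputed data sets
plot(resMI)                              # step 3: calls plot.MIPCA
```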
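
Multiple imputation draws several plausible values for each missing cell instead of a single one, so that the variability between the imputed data sets reflects the uncertainty due to the missing values. The base-R sketch below shows only the simplest part of this idea: after a (centred, unscaled) iterative PCA imputation, each of the `M` data sets replaces the missing cells by the fitted value plus Gaussian noise, using a crude residual standard deviation. MIPCA also accounts for the uncertainty in the PCA parameters themselves, which this sketch ignores; `fit_pca_impute` and `noisy_imputations` are hypothetical helpers and the data are simulated.

```r
# Hypothetical helper (not part of missMDA): iterative PCA imputation on
# centred (unscaled) data; returns the completed matrix and the rank-ncp fit.
fit_pca_impute <- function(X, ncp = 2, max_iter = 500, tol = 1e-8) {
  miss <- is.na(X)
  X_hat <- X
  X_hat[miss] <- colMeans(X, na.rm = TRUE)[col(X)[miss]]  # start from the column means
  for (iter in seq_len(max_iter)) {
    mu <- colMeans(X_hat)
    s <- svd(sweep(X_hat, 2, mu), nu = ncp, nv = ncp)
    low_rank <- s$u %*% diag(s$d[seq_len(ncp)], ncp, ncp) %*% t(s$v)
    X_fit <- sweep(low_rank, 2, mu, "+")
    delta <- sum((X_fit[miss] - X_hat[miss])^2)
    X_hat[miss] <- X_fit[miss]  # update the missing cells only
    if (delta < tol) break
  }
  list(completed = X_hat, fitted = X_fit)
}

# Hypothetical helper: M imputed data sets = fitted value + Gaussian noise
noisy_imputations <- function(X, ncp = 2, M = 20) {
  X <- as.matrix(X)
  miss <- is.na(X)
  fit <- fit_pca_impute(X, ncp)
  # Crude residual standard deviation on the observed cells (no df correction)
  sigma <- sqrt(mean((X[!miss] - fit$fitted[!miss])^2))
  lapply(seq_len(M), function(m) {
    X_m <- fit$completed
    X_m[miss] <- fit$fitted[miss] + rnorm(sum(miss), sd = sigma)
    X_m
  })
}

# Usage on simulated data: rank-2 signal plus noise, 15 cells removed
set.seed(7)
X <- matrix(rnorm(40 * 2), 40, 2) %*% matrix(rnorm(2 * 4), 2, 4) +
  matrix(rnorm(40 * 4, sd = 0.3), 40, 4)
X_na <- X
X_na[sample(length(X), 15)] <- NA
imps <- noisy_imputations(X_na, ncp = 2, M = 20)
# Spread of each missing cell's imputed values across the M data sets
between_sd <- apply(sapply(imps, function(x) x[is.na(X_na)]), 1, sd)
```

`between_sd` gives, for each missing cell, the spread of its imputed values across the `M` data sets; cells with a larger spread are the ones whose imputation is more uncertain.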

### Documentation

Vignette:

Multiple imputation with principal component methods: a user guide written by V. Audigier.

Papers:

Audigier, V., Husson, F., & Josse, J. (2017). MIMCA: multiple imputation for categorical variables with multiple correspondence analysis. Statistics and Computing, 27(2), 501-518.

Josse, J., & Husson, F. (2016). missMDA: a package for handling missing values in multivariate data analysis. Journal of Statistical Software, 70(1), 1-31.

Audigier, V., Husson, F., & Josse, J. (2016). Multiple imputation for continuous variables using a Bayesian principal component analysis. Journal of Statistical Computation and Simulation, 86(11), 2140-2156.

Audigier, V., Husson, F., & Josse, J. (2016). A principal component method to impute missing values for mixed data. Advances in Data Analysis and Classification, 10(1), 5-26.

Josse, J., & Husson, F. (2013). Handling missing values in exploratory multivariate data analysis methods. Journal de la SFDS, 153(2), 79-99.

Josse, J., Chavent, M., Liquet, B., & Husson, F. (2012). Handling missing values with Regularized Iterative Multiple Correspondence Analysis. Journal of Classification, 29(1), 91-116.

Josse, J., & Husson, F. (2011). Selecting the number of components in PCA using cross-validation approximations. Computational Statistics and Data Analysis, 56(6), 1869-1879.

Josse, J., Husson, F., & Pagès, J. (2011). Multiple imputation in PCA. Advances in Data Analysis and Classification, 5(3), 231-246.

Conferences:

- Missing values imputation for mixed data based on principal component methods. COMPSTAT 2012, Cyprus, August 27-31, 2012 (slides).

- Imputation of missing values in mixed data with factorial methods using missMDA. Premières rencontres R, Bordeaux, July 2-3, 2012 (abstract, slides).

- missMDA: a package to handle missing values in and with multivariate exploratory data analysis methods. useR! 2011, Warwick, England, August 15-20, 2011 (abstract, slides).