# An Exploration of Pathologies of Multilevel Principal Components Analysis in Statistical Models of Shape

## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Monte Carlo Data Generation

^{2}= 0.0625 and “within-group” variance equals 1. The total variance over all 300 mutually independent variables is therefore 300 × 1.0625 = 318.75. The percentage of the total variance explained by between-group variation (Equation (3) below) in Experiment 3 is 5.9% (= 100 × 18.75/318.75), whereas this percentage is clearly 0% in Experiments 1 and 2. A total of 100 MC datasets were used in the simulations for Experiments 1 to 3, except when the sample size per group was equal to 300, where only 50 MC datasets were used because of computational demands.
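The variance bookkeeping in this paragraph can be checked with a few lines of Python. The sketch below is illustrative only: it reads 0.0625 as the between-group variance per variable (consistent with 18.75 = 300 × 0.0625), and the function names, the use of three groups (taken from the caption of Figure 7), and the choice to draw group centres from a normal distribution are assumptions rather than details confirmed by the text.

```python
import random

N_VARS = 300          # number of mutually independent variables (from the text)
VAR_BETWEEN = 0.0625  # between-group variance per variable (assumed reading of the text)
VAR_WITHIN = 1.0      # within-group variance per variable (from the text)


def between_group_percentage(n_vars, var_between, var_within):
    """Return (total variance, percentage of it due to between-group variation)."""
    total = n_vars * (var_between + var_within)
    return total, 100.0 * n_vars * var_between / total


def generate_dataset(n_per_group, n_groups=3, seed=0):
    """Draw one hypothetical MC dataset: a random centre per group plus unit-variance noise.

    Assumption: group centres are themselves normal with variance VAR_BETWEEN.
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for group in range(n_groups):
        centre = [rng.gauss(0.0, VAR_BETWEEN ** 0.5) for _ in range(N_VARS)]
        for _ in range(n_per_group):
            rows.append([c + rng.gauss(0.0, VAR_WITHIN ** 0.5) for c in centre])
            labels.append(group)
    return rows, labels


total_variance, percentage = between_group_percentage(N_VARS, VAR_BETWEEN, VAR_WITHIN)
rows, labels = generate_dataset(n_per_group=10)
print(f"total variance = {total_variance}, between-group share = {percentage:.1f}%")
# prints: total variance = 318.75, between-group share = 5.9%
```

Setting `VAR_BETWEEN = 0.0` in the same sketch gives data with no between-group variation, as described for Experiments 1 and 2.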

#### 2.2. Multilevel Principal Components Analysis (mPCA)

#### 2.3. Maximum Likelihood Solution

## 3. Results

## 4. Discussion

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest


**Figure 3.** Experiment 1: eigenvalues (**upper row**), mPCA, level 1 component scores (**middle row**), and single-level PCA, component scores (**bottom row**) for sample sizes per group of ${n}_{l}$ = 10 (**left-hand column**), ${n}_{l}$ = 100 (**middle column**), and ${n}_{l}$ = 300 (**right-hand column**) in all groups $l=1,2,3$. Group centroids for component scores are shown by the diamonds; x-axis = first component; y-axis = second component.

**Figure 4.** Extrapolation of the mean (over all MC simulations) sum of all eigenvalues for mPCA in the limit of sample size per group ${n}_{l}\to \infty $ for Experiment 1 (**top left**), Experiment 2a (**top right**), Experiment 2b (**bottom left**), and Experiment 3 (**bottom right**). These values via mPCA scale approximately linearly with ${n}_{l}^{-1}$. Reference values for the asymptotic estimates of the total variance are shown by the dashed lines in these figures (equal to 300 for Experiments 1 and 2 and to 318.75 for Experiment 3). Results of single-level PCA for Experiment 3 are approximately “flat” with respect to ${n}_{l}^{-1}$. (Standard errors are shown by the error bars.)

**Figure 5.** Experiment 2a: eigenvalues (**upper row**), mPCA, level 1 component scores (**middle row**), and single-level PCA, component scores (**bottom row**) for sample sizes per group of ${n}_{1,2}$ = 30 (**left-hand column**), ${n}_{1,2}$ = 100 (**middle column**), and ${n}_{1,2}$ = 300 (**right-hand column**) in groups 1 and 2. Note that ${n}_{3}$ = 10 in group 3 in all simulations for Experiment 2a. Group centroids are shown by the diamonds; x-axis = first component; y-axis = second component.

**Figure 6.** Experiment 2b: eigenvalues (**upper row**) and mPCA, level 1 component scores (**bottom row**) for sample sizes per group of ${n}_{1,2}$ = 30 (**left-hand column**), ${n}_{1,2}$ = 100 (**middle column**), and ${n}_{1,2}$ = 300 (**right-hand column**) in groups 1 and 2. Note that ${n}_{3}$ = 10 in group 3 in all simulations for Experiment 2b. Group centroids are shown by the diamonds; x-axis = first component; y-axis = second component. (Results for single-level PCA are as shown in Figure 5.)

**Figure 7.** Experiment 3: eigenvalues (**upper row**), mPCA, level 1 component scores (**middle row**), and single-level PCA, component scores (**bottom row**) for sample sizes per group of ${n}_{l}$ = 10 (**left-hand column**), ${n}_{l}$ = 100 (**middle column**), and ${n}_{l}$ = 300 (**right-hand column**) in all groups $l=1,2,3$. Between-group variation has been added in this case and so component scores should be strongly separated for all values of ${n}_{l}$. Group centroids are shown by the diamonds; x-axis = first component; y-axis = second component.

**Figure 8.** Experiment 4: eigenvalues (**upper row**), mPCA, level 1 component scores (**middle row**), and single-level PCA, component scores (**bottom row**) for sample sizes per group of ${n}_{l}$ = 10 (**left-hand column**), ${n}_{l}$ = 100 (**middle column**), and ${n}_{l}$ = 300 (**right-hand column**) in both groups $l=1,2$. Group centroids are shown by the diamonds; x-axis = first component; y-axis = second component.

**Figure 9.** Experiment 5: eigenvalues (**upper row**), mPCA, level 1 component scores (**middle row**), and single-level PCA, component scores (**bottom row**) for sample sizes per group of ${n}_{l}$ = 10 (**left-hand column**), ${n}_{l}$ = 100 (**middle column**), and ${n}_{l}$ = 300 (**right-hand column**) in both groups $l=1,2$. Group centroids are shown by the diamonds; x-axis = first component; y-axis = second component.

**Figure 10.** Extrapolation of the mean (over all MC simulations) sum of all eigenvalues for PCA and mPCA in the limit of sample size per group ${n}_{l}\to \infty $ for Experiments 4 (**left**) and 5 (**right**). The values for mPCA again scale approximately linearly with ${n}_{l}^{-1}$. Results of single-level PCA are approximately “flat” with respect to ${n}_{l}^{-1}$. (Standard errors are shown by the error bars.)

**Table 1.** Overview of all MC simulations carried out here. Experiments 1 to 3 use uncorrelated normally distributed variables, whereas Experiments 4 and 5 use correlated normally distributed data inspired by 21 3D landmark points (thus 63 variables), as shown in Figure 1. (# Variables = number of variables; Correlated? = whether these variables are correlated or uncorrelated; BG Variation? = whether between-group variation is used in data generation; Balanced? = whether sample sizes are equal in all groups; Weighted? = whether the weighted covariance matrices of Equations (7) and (8) are used.)

| | Exp. 1 | Exp. 2a | Exp. 2b | Exp. 3 | Exp. 4 | Exp. 5 |
|---|---|---|---|---|---|---|
| # Variables | 300 | 300 | 300 | 300 | 63 (=3 × 21) | 63 (=3 × 21) |
| Correlated? | No | No | No | No | Yes | Yes |
| BG Variation? | No | No | No | Yes | No | Yes |
| Balanced? | Yes | No | No | Yes | Yes | Yes |
| Weighted? | No | No | Yes | No | No | No |
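The design in Table 1 can also be written down as a small configuration structure, which makes it easy to select subsets of experiments programmatically. This is only a sketch: the class name `ExperimentConfig` and its field names are hypothetical labels chosen here, and only the values are taken from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """One column of Table 1 (field names are illustrative, not from the paper)."""
    name: str
    n_variables: int
    correlated: bool
    between_group_variation: bool
    balanced: bool
    weighted: bool


EXPERIMENTS = [
    ExperimentConfig("1", 300, False, False, True, False),
    ExperimentConfig("2a", 300, False, False, False, False),
    ExperimentConfig("2b", 300, False, False, False, True),
    ExperimentConfig("3", 300, False, True, True, False),
    ExperimentConfig("4", 63, True, False, True, False),
    ExperimentConfig("5", 63, True, True, True, False),
]

# Experiments in which between-group variation is present in the generated data.
with_between_group = [e.name for e in EXPERIMENTS if e.between_group_variation]
# Experiments with unequal sample sizes across groups.
unbalanced = [e.name for e in EXPERIMENTS if not e.balanced]
print(with_between_group, unbalanced)
```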

**Table 2.** Mean (over all MC simulations) of the percentage variance of Equation (3) explained by level 1 via mPCA for Experiments 1 to 5. Experiment 2a uses standard mPCA, whereas Experiment 2b uses the weighted “population” covariance matrices of Equations (7) and (8). Reference values are given via asymptotic estimates for Experiments 1 to 4 and from experimental data for Experiment 5 (${n}_{1}=124;{n}_{2}=126$). (Standard errors are shown in brackets.)

| | Exp. 1 $({n}_{3}={n}_{1,2})$ | Exp. 2a $({n}_{3}=10)$ | Exp. 2b $({n}_{3}=10)$ | Exp. 3 $({n}_{3}={n}_{1,2})$ | Exp. 4 $({n}_{1}={n}_{2})$ | Exp. 5 $({n}_{1}={n}_{2})$ |
|---|---|---|---|---|---|---|
| ${n}_{1,2}$ = 10 | 11.5% (0.63%) | 11.5% (0.63%) | 8.7% (0.51%) | 17.4% (2.20%) | 10.2% (4.85%) | 19.2% (5.01%) |
| ${n}_{1,2}$ = 30 | 3.3% (0.16%) | 5.4% (0.35%) | 3.1% (0.17%) | 9.0% (0.50%) | 3.1% (1.20%) | 12.8% (2.59%) |
| ${n}_{1,2}$ = 50 | 2.0% (0.11%) | 4.5% (0.27%) | 1.9% (0.12%) | 7.7% (0.41%) | 2.0% (0.84%) | 12.2% (2.31%) |
| ${n}_{1,2}$ = 100 | 1.0% (0.06%) | 3.9% (0.25%) | 1.0% (0.06%) | 6.7% (0.39%) | 0.9% (0.33%) | 11.3% (1.55%) |
| ${n}_{1,2}$ = 200 | 0.5% (0.03%) | 3.6% (0.26%) | 0.5% (0.03%) | 6.3% (0.33%) | 0.5% (0.21%) | 10.7% (1.03%) |
| ${n}_{1,2}$ = 300 | 0.3% (0.02%) | 3.5% (0.30%) | 0.3% (0.02%) | 6.1% (0.35%) | 0.3% (0.12%) | 10.6% (0.87%) |
| Reference | 0% | 0% | 0% | 5.9% | 0.0% | 10.4% |
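A rough way to see how the tabulated values approach their reference values is to fit a straight line against the inverse sample size and read off the intercept, in the spirit of the extrapolations in Figures 4 and 10. The sketch below is an assumption-laden illustration: the paper reports approximately linear scaling with ${n}_{l}^{-1}$ for the sum of eigenvalues, so applying the same straight-line fit to the level 1 percentages of Table 2 is only an approximation, and the helper `fit_line` is an ordinary least-squares routine written here, not the author's procedure.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Balanced sample sizes per group and mean level 1 percentages copied from Table 2.
sample_sizes = [10, 30, 50, 100, 200, 300]
level1_percent = {
    "Exp. 1": [11.5, 3.3, 2.0, 1.0, 0.5, 0.3],
    "Exp. 3": [17.4, 9.0, 7.7, 6.7, 6.3, 6.1],
}

inverse_n = [1.0 / n for n in sample_sizes]
# The intercept at 1/n = 0 is a crude estimate of the large-sample limit.
intercepts = {name: fit_line(inverse_n, values)[1]
              for name, values in level1_percent.items()}

for name, value in intercepts.items():
    print(f"{name}: extrapolated level 1 percentage ~ {value:.1f}%")
```

Under this crude fit the intercept for Experiment 1 lands close to 0% and the intercept for Experiment 3 lands within about half a percentage point of the 5.9% reference value.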


© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Farnell, D.J.J.
An Exploration of Pathologies of Multilevel Principal Components Analysis in Statistical Models of Shape. *J. Imaging* **2022**, *8*, 63.
https://doi.org/10.3390/jimaging8030063
