# Investigating Shape Variation Using Generalized Procrustes Analysis and Machine Learning


## Abstract


## 1. Introduction

**Hypothesis 1.**

**Hypothesis 2.**

## 2. Materials and Methods

#### 2.1. Data Sources

#### 2.2. Visible Information Extraction

#### 2.2.1. Landmark Placement and Processing

#### 2.2.2. Bayesian Gaussian Process Latent Variable Model

- Analyze $\mathbf{Y}$ using principal component analysis [49]. Estimate the latent dimension ${D}^{*}$ that explains $75\%$ of the variance in the data.
- Keep ${D}^{*}$ fixed and find the minimum number of auxiliary points reaching $95\%$ of $\ln\left(p\left(\mathbf{Y}\right)\right)$.
- Keep the number of auxiliary points fixed and find the number of features $D$ maximizing $\ln\left(p\left(\mathbf{Y}\right)\right)$.
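
The first step of this procedure can be sketched as follows. This is a minimal illustration in NumPy, assuming $\mathbf{Y}$ is stored as an array with one row per specimen and taking the $75\%$ threshold to mean the cumulative explained-variance ratio. The function name `estimate_latent_dim`, its `explained` argument, and the toy matrix are illustrative choices, not part of the original pipeline. The second and third steps depend on the fitted B-GP-LVM and are not shown.

```python
import numpy as np


def estimate_latent_dim(Y, explained=0.75):
    """Return the smallest number of principal components whose
    cumulative explained-variance ratio reaches `explained`."""
    Y = np.asarray(Y, dtype=float)
    Yc = Y - Y.mean(axis=0)                    # center every feature
    s = np.linalg.svd(Yc, compute_uv=False)    # singular values, descending
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)  # cumulative variance ratio
    # First index where the ratio reaches the threshold, as a component count.
    d_star = int(np.searchsorted(ratio, explained)) + 1
    return min(d_star, len(s))                 # guard against rounding at 100%


# Hypothetical toy data: three orthogonal directions whose variances
# are proportional to 9, 4 and 1.
Y_demo = np.array([
    [3.0, 0.0, 0.0], [-3.0, 0.0, 0.0],
    [0.0, 2.0, 0.0], [0.0, -2.0, 0.0],
    [0.0, 0.0, 1.0], [0.0, 0.0, -1.0],
])
d_star_demo = estimate_latent_dim(Y_demo, explained=0.75)
print("estimated latent dimension:", d_star_demo)
```

In the toy matrix the first component carries 9/14 of the variance and the first two carry 13/14, so two components are needed to pass the threshold.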

#### 2.2.3. Convolutional Autoencoder

#### 2.2.4. Visual Interpretation of the Features

#### 2.3. Multivariate Data Analysis

#### 2.4. Investigation of Morphological Diversity

#### 2.4.1. Visible Features Clustering and Population Structure Investigation

- The models used may not be a good approximation of the unknown probability density functions,
- the data are typically limited and incomplete, and
- different values of k may capture biologically significant information on different scales [67].
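
Because different values of k can each carry information, partitions obtained for several k are often combined rather than reduced to a single choice. The sketch below shows a plain co-association (CA) matrix in the spirit of evidence accumulation [33]. It is an illustration under stated assumptions, not the probabilistic co-association (pCA) used in this study: the function name `co_association` and the three hard labelings are hypothetical.

```python
import numpy as np


def co_association(labelings):
    """Fraction of clusterings in which each pair of samples
    is assigned to the same cluster."""
    labelings = np.asarray(labelings)       # shape: (n_clusterings, n_samples)
    n_runs, n_samples = labelings.shape
    ca = np.zeros((n_samples, n_samples))
    for labels in labelings:
        # Boolean matrix: True where two samples share a cluster label.
        ca += labels[:, None] == labels[None, :]
    return ca / n_runs


# Hypothetical hard assignments of four specimens for k = 2, 3 and 4.
runs = [
    [0, 0, 1, 1],
    [0, 0, 1, 2],
    [0, 1, 2, 3],
]
ca_demo = co_association(runs)
print(np.round(ca_demo, 2))
```

Entries close to 1 mark pairs of specimens that are grouped together regardless of k, while entries close to 0 mark pairs that are consistently separated.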

#### 2.4.2. Consensus Clustering

## 3. Experimental Results

#### 3.1. Visible Diversity Relying on Landmarks

#### 3.2. Visible Diversity Relying on Machine Learning Models

#### 3.2.1. Gaussian Process Latent Variable Model

#### 3.2.2. Convolutional Autoencoder

#### 3.3. Multivariate Data Analysis

#### 3.4. Latent Structure Investigation Relying on pCA

## 4. Discussion

## 5. Summary

## Supplementary Materials

## Author Contributions

## Funding

## Institutional Review Board Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Abbreviations

Abbreviation | Meaning |
---|---|
AE | Autoencoder |
ARD | Automatic relevance determination |
B-GP-LVM | Bayesian Gaussian process latent variable model |
CA | Co-association |
cAE | Convolutional autoencoder |
CNN | Convolutional neural network |
GMM | Gaussian mixture model |
GPA | Generalized Procrustes analysis |
GP-LVM | Gaussian process latent variable model |
ICA | Independent component analysis |
MSE | Mean squared error |
pCA | Probabilistic co-association |
PCA | Principal component analysis |
RBF | Radial basis function |

## References

1. Thompson, D.W. On Growth and Form; Cambridge University Press: London, UK, 1945.
2. Abzhanov, A. The old and new faces of morphology: The legacy of D’Arcy Thompson’s ‘theory of transformations’ and ‘laws of growth’. Development **2017**, 144, 4284–4297.
3. Webster, M.; Sheets, D.H. A Practical Introduction to Landmark-Based Geometric Morphometrics. Paleontol. Soc. Pap. **2010**, 16, 163–188.
4. Strauss, R.; Bond, C. Taxonomic Methods: Morphology. In Methods for Fish Biology; American Fisheries Society: Bethesda, MD, USA, 1990; pp. 109–140.
5. Tibihika, P.D.; Waidbacher, H.; Masembe, C.; Curto, M.; Sabatino, S.; Negash, E.; Meulenbroek, P.; Akoll, P.; Meimberg, H. Anthropogenic impacts on the contextual morphological diversification and adaptation of Nile tilapia (Oreochromis niloticus, L. 1758) in East Africa. Environ. Biol. Fishes **2018**, 101, 363–381.
6. Kerschbaumer, M.; Bauer, C.; Herler, J.; Postl, L.; Makasa, L.; Sturmbauer, C. Assessment of traditional versus geometric morphometrics for discriminating populations of the Tropheus moorii species complex (Teleostei: Cichlidae), a Lake Tanganyika model for allopatric speciation. J. Zool. Syst. Evol. Res. **2008**, 46, 153–161.
7. Kerschbaumer, M.; Sturmbauer, C. The Utility of Geometric Morphometrics to Elucidate Pathways of Cichlid Fish Evolution. Int. J. Evol. Biol. **2011**, 2011, 290245.
8. Rüber, L.; Adams, D. Evolutionary convergence of body shape and trophic morphology in cichlids from Lake Tanganyika. J. Evol. Biol. **2001**, 14, 325–332.
9. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2012; Volume 25.
10. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015.
11. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
12. Salman, A.; Siddiqui, S.A.; Shafait, F.; Mian, A.; Shortis, M.R.; Khurshid, K.; Ulges, A.; Schwanecke, U. Automatic fish detection in underwater videos by a deep neural network-based hybrid motion learning system. ICES J. Mar. Sci. **2020**, 77, 1295–1307.
13. Qin, H.; Li, X.; Liang, J.; Peng, Y.; Zhang, C. DeepFish: Accurate Underwater Live Fish Recognition with a Deep Architecture. Neurocomputing **2016**, 187, 49–58.
14. Villon, S.; Mouillot, D.; Chaumont, M.; Darling, E.S.; Subsol, G.; Claverie, T.; Villeger, S. A Deep Learning Method for Accurate and Fast Identification of Coral Reef Fishes in Underwater Images. Ecol. Inform. **2018**, 48, 238–244.
15. Cui, S.; Zhou, Y.; Wang, Y.; Zhai, L. Fish Detection Using Deep Learning. Appl. Comput. Intell. Soft Comput. **2020**, 2020, 3738108.
16. Allken, V.; Handegard, N.O.; Rosen, S.; Schreyeck, T.; Mahiout, T.; Malde, K. Fish species identification using a convolutional neural network trained on synthetic data. ICES J. Mar. Sci. **2018**, 76, 342–349.
17. Marini, S.; Fanelli, E.; Sbragaglia, V.; Azzurro, E.; del Rio, J.; Aguzzi, J. Tracking Fish Abundance by Underwater Image Recognition. Sci. Rep. **2018**, 8, 13748.
18. Lapuschkin, S.; Wäldchen, S.; Binder, A.; Montavon, G.; Samek, W.; Müller, K.R. Unmasking Clever Hans predictors and assessing what machines really learn. Nat. Commun. **2019**, 10, 1096.
19. Samek, W.; Wiegand, T.; Müller, K.R. Explainable Artificial Intelligence: Understanding, Visualizing, and Interpreting Deep Learning Models. ITU J. ICT Discov. **2018**, 1, 49–58.
20. Samek, W.; Montavon, G.; Lapuschkin, S.; Anders, C.J.; Müller, K.R. Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications. Proc. IEEE **2021**, 109, 247–278.
21. Samek, W.; Müller, K.R. Towards Explainable Artificial Intelligence. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning; Springer International Publishing: Berlin/Heidelberg, Germany, 2019; pp. 5–22.
22. Montavon, G.; Samek, W.; Müller, K.R. Methods for interpreting and understanding deep neural networks. Digit. Signal Process. **2018**, 73, 1–15.
23. Bengio, Y.; Courville, A.; Vincent, P. Representation Learning: A Review and New Perspectives. IEEE Trans. Pattern Anal. Mach. Intell. **2013**, 35, 1798–1828.
24. Wöber, W.; Curto, M.; Tibihika, P.; Meulenbroek, P.; Alemayehu, E.; Mehnen, L.; Meimberg, H.; Sykacek, P. Identifying geographically differentiated features of Ethopian Nile tilapia (Oreochromis niloticus) morphology with machine learning. PLoS ONE **2021**, 16, e0249593.
25. Gower, J.C. Generalized procrustes analysis. Psychometrika **1975**, 40, 33–51.
26. Wöber, W.; Mehnen, L.; Sykacek, P.; Meimberg, H. Investigating Explanatory Factors of Machine Learning Models for Plant Classification. Plants **2021**, 10, 2674.
27. Marcus, G. Deep Learning: A Critical Appraisal. arXiv **2018**, arXiv:1801.00631.
28. Tibihika, P.D.; Curto, M.; Negash, E.; Waidbacher, H.; Masembe, C.; Akoll, P.; Meimberg, H. Molecular genetic diversity and differentiation of Nile tilapia (Oreochromis niloticus, L. 1758) in East African natural and stocked populations. BMC Evol. Biol. **2020**, 20, 16.
29. Tesfaye, G.; Curto, M.; Meulenbroek, P.; Englmaier, G.K.; Tibihika, P.D.; Negash, E.; Getahun, A.; Meimberg, H. Genetic diversity of Nile tilapia (Oreochromis niloticus) populations in Ethiopia: Insights from nuclear DNA microsatellites and implications for conservation. BMC Ecol. **2021**, 21, 113.
30. Kariuki, J.; Tibihika, P.D.; Curto, M.; Alemayehu, E.; Winkler, G.; Meimberg, H. Application of microsatellite genotyping by amplicon sequencing for delimitation of African tilapiine species relevant for aquaculture. Aquaculture **2021**, 537, 736501.
31. Titsias, M.K.; Lawrence, N.D. Bayesian Gaussian Process Latent Variable Model. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 13–15 May 2010; pp. 844–851.
32. Dong, G.; Liao, G.; Liu, H.; Kuang, G. A Review of the Autoencoder and Its Variants: A Comparative Perspective from Target Recognition in Synthetic-Aperture Radar Images. IEEE Geosci. Remote Sens. Mag. **2018**, 6, 44–68.
33. Fred, A.L.; Jain, A. Combining Multiple Clusterings Using Evidence Accumulation. IEEE Trans. Pattern Anal. Mach. Intell. **2005**, 27, 835–850.
34. Rasmussen, C. The Infinite Gaussian Mixture Model. In Advances in Neural Information Processing Systems; Solla, S., Leen, T., Müller, K., Eds.; MIT Press: Cambridge, MA, USA, 2000; Volume 12.
35. Vega-Pons, S.; Ruiz-Shulcloper, J. A Survey of Clustering Ensemble Algorithms. Int. J. Pattern Recognit. Artif. Intell. **2011**, 25, 337–372.
36. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021.
37. Dunnington, D. rosm: Plot Raster Map Tiles from Open Street Map and Other Sources; R Package Version 0.2.5. 2019. Available online: https://rdrr.io/cran/rosm/ (accessed on 17 February 2022).
38. Bradski, G. The OpenCV Library. Dr. Dobb’s J. Softw. Tools **2000**, 120, 122–125.
39. Dryden, I.L. shapes: Statistical Shape Analysis; R Package Version 1.2.6. 2021. Available online: https://cran.r-project.org/web/packages/shapes/shapes.pdf (accessed on 17 February 2022).
40. Baken, E.; Collyer, M.; Kaliontzopoulou, A.; Adams, D. geomorph v4.0 and gmShiny: Enhanced analytics and a new graphical interface for a comprehensive morphometric experience. Methods Ecol. Evol. **2021**, 12, 2355–2363.
41. Adams, D.C.; Otárola-Castillo, E. geomorph: An r package for the collection and analysis of geometric morphometric shape data. Methods Ecol. Evol. **2013**, 4, 393–399.
42. Collyer, M.L. RRPP: Linear Model Evaluation with Randomized Residuals in a Permutation Procedure. 2019. Available online: https://cran.r-project.org/package=RRPP (accessed on 17 February 2022).
43. Collyer, M.L.; Adams, D.C. RRPP: An r package for fitting linear models to high-dimensional data using residual randomization. Methods Ecol. Evol. **2018**, 9, 1772–1779.
44. Lawrence, N.D. Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data. In Proceedings of the 16th International Conference on Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2004; pp. 329–333.
45. Li, P.; Chen, S. A Review on Gaussian Process Latent Variable Models. CAAI Trans. Intell. Technol. **2016**, 1, 366–376.
46. Titsias, M. Variational Learning of Inducing Variables in Sparse Gaussian Processes. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, Clearwater Beach, FL, USA, 16–18 April 2009; Volume 5, pp. 567–574.
47. Wöber, W.; Aburaia, M.; Olaverri-Monreal, C. Classification of Streetsigns Using Gaussian Process Latent Variable Models. In Proceedings of the 2019 IEEE International Conference on Connected Vehicles and Expo, ICCVE 2019, Graz, Austria, 4–8 November 2019; pp. 1–6.
48. Blei, D.M.; Kucukelbir, A.; McAuliffe, J.D. Variational Inference: A Review for Statisticians. J. Am. Stat. Assoc. **2017**, 112, 859–877.
49. Bishop, C.M. Pattern Recognition and Machine Learning; Springer Science+Business Media: Berlin/Heidelberg, Germany, 2006.
50. GPy. GPy: A Gaussian Process Framework in Python. 2012. Available online: http://github.com/SheffieldML/GPy (accessed on 17 February 2022).
51. Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning); The MIT Press: Cambridge, MA, USA, 2005.
52. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. **2011**, 12, 2825–2830.
53. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; The MIT Press: Cambridge, MA, USA, 2016.
54. Chollet, F. Keras. 2015. Available online: https://keras.io (accessed on 17 February 2022).
55. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
56. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015.
57. Chollet, F. Adam. 2015. Available online: https://keras.io/api/optimizers/adam/ (accessed on 17 February 2022).
58. Pett, M.A. Nonparametric Statistics for Health Care Research: Statistics for Small Samples and Unusual Distributions; SAGE Publications: Thousand Oaks, CA, USA, 2015.
59. Marchini, J.; Heaton, C.; Ripley, B.D. fastICA: FastICA Algorithms to Perform ICA and Projection Pursuit; R Package Version 1.2-2. 2019. Available online: https://cran.r-project.org/web/packages/fastICA/fastICA.pdf (accessed on 17 February 2022).
60. Hyvärinen, A.; Oja, E. Independent component analysis: Algorithms and applications. Neural Netw. **2000**, 13, 411–430.
61. Schloerke, B.; Cook, D.; Larmarange, J.; Briatte, F.; Marbach, M.; Thoen, E.; Elberg, A.; Crowley, J. GGally: Extension to ‘ggplot2’; R Package Version 2.1.2. 2021. Available online: https://cran.r-project.org/web/packages/GGally/index.html (accessed on 17 February 2022).
62. Raj, A.; Stephens, M.; Pritchard, J.K. fastSTRUCTURE: Variational Inference of Population Structure in Large SNP Data sets. Genetics **2014**, 197, 573–589.
63. Earl, D.A.; vonHoldt, B.M. STRUCTURE HARVESTER: A website and program for visualizing STRUCTURE output and implementing the Evanno method. Conserv. Genet. Resour. **2011**, 4, 359–361.
64. Evanno, G.; Regnaut, S.; Goudet, J. Detecting the number of clusters of individuals using the software structure: A simulation study. Mol. Ecol. **2005**, 14, 2611–2620.
65. Scrucca, L.; Fop, M.; Murphy, T.B.; Raftery, A.E. mclust 5: Clustering, classification and density estimation using Gaussian finite mixture models. R J. **2016**, 8, 205–233.
66. Schwarz, G. Estimating the Dimension of a Model. Ann. Stat. **1978**, 6, 461–464.
67. Blöschl, G.; Sivapalan, M. Scale issues in hydrological modelling: A review. Hydrol. Process. **1995**, 9, 251–290.
68. Kass, R.E.; Raftery, A.E. Bayes Factors. J. Am. Stat. Assoc. **1995**, 90, 773–795.
69. Kauffmann, J.R.; Ruff, L.; Montavon, G.; Müller, K. The Clever Hans Effect in Anomaly Detection. arXiv **2020**, arXiv:2006.10609.

**Figure 1.** Two-dimensional representation of a specimen dataset and two valid clusters unveiling the latent structure (left side). Two models (right side, red and green model) with global parameter $\mathsf{\Phi}$ and local parameter $\mathsf{\Psi}$ may result in two or four clusters.

**Figure 2.** The pipeline used in this study to investigate the biological interpretation of the learned features as well as the statistical analysis of the discriminability of the known population clusters.

**Figure 4.** The landmarks used for GPA coordinate scaling. The green landmarks were not used for the images obtained in Uganda.

**Figure 5.** The procedure to visualize the latent dimensions. The image databases were used to obtain a latent space. For each dimension in the latent space, a heatmap was generated showing the image variability caused by varying this dimension.

**Figure 6.** Visualization of the GPA coordinates for the population of Ethiopia (**left**) and Uganda (**right**).

**Figure 7.** Visualization of Spearman’s rank correlation test for the relation between the GPA coordinates and the specimens’ population. $1-p$ is shown instead of the p-value. A p-value below $\alpha =0.1$ indicates a significant correlation between a GPA landmark coordinate and the population. GPA coordinates with a p-value above $\alpha =0.1$ are shown as red bars. (**a**) Result of Spearman’s rank correlation test for the relation between the GPA coordinates and population locations in Ethiopia. (**b**) Result of Spearman’s rank correlation test for the relation between the GPA coordinates and population locations in Uganda.

**Figure 8.** Result of Spearman’s rank correlation test for the relation between the manually selected GP-LVM features and population locations in Ethiopia. The features with the highest ARD values are shown. Features with a p-value above $\alpha =0.1$ are shown as red bars.

**Figure 9.** Result of Spearman’s rank correlation test for the relation between the manually selected GP-LVM features and population locations in Uganda. The features with the highest ARD values are shown. Features with a p-value above $\alpha =0.1$ are shown as red bars.

**Figure 10.** Result of Spearman’s rank correlation test for the relation between the manually selected cAE features and population locations in Ethiopia. Randomly selected features are visualized. Features with a p-value above $\alpha =0.1$ are shown as red bars.

**Figure 11.** Result of Spearman’s rank correlation test for the relation between the manually selected AE features and population locations in Uganda. Randomly selected features are visualized. Features with a p-value above $\alpha =0.1$ are shown as red bars.

**Figure 12.** Visualization of the multivariate analysis of Ethiopia (**left column**) and Uganda (**right column**). The panels show the PCA/ICA-reduced landmarks (**top**), the GP-LVM features (**middle**), and the reduced cAE features (**bottom**). The symbols ‘***’, ‘**’, ‘*’, and ‘.’ next to the numeric correlation values indicate significance levels below $0.001$, $0.01$, $0.05$, and $0.1$, respectively. If no symbol is given, the significance level was larger than $0.1$.

**Figure 13.** Visualization of the results obtained by pCA and the Bayes factor hypothesis test relying on GPA. Both pCA matrices show minor visible structure. Three of six locations in Ethiopia were found to be significantly different from the remaining populations. Similarly, one out of nineteen locations in Uganda was identified as significantly different. (**a**) pCA and Bayes factor results for Ethiopia relying on GPA scaling. (**b**) pCA and Bayes factor results for Uganda relying on GPA scaling. The KyB population label in the pCA matrix was removed for readability.

**Figure 14.** Visualization of the results obtained by pCA and the Bayes factor hypothesis test relying on GP-LVM. Both pCA matrices show visible structure. Four of six locations in Ethiopia were found to be significantly different from the remaining locations. Eleven out of nineteen locations in Uganda were identified as significantly different. (**a**) pCA and Bayes factor results for Ethiopia relying on GP-LVM. (**b**) pCA and Bayes factor results for Uganda relying on GP-LVM. The KyB population label in the pCA matrix was removed for readability.

**Figure 15.** Visualization of the results obtained by pCA and the Bayes factor hypothesis test relying on cAE. Both pCA matrices show visible structure. All locations in Ethiopia were found to be significantly different. Fourteen out of nineteen locations in Uganda were identified as significantly different. (**a**) pCA and Bayes factor results for Ethiopia relying on cAE. (**b**) pCA and Bayes factor results for Uganda relying on cAE. The KyB population label in the pCA matrix was removed for readability.

**Table 1.** Summary of the image datasets from Ethiopia (209 specimens) and Uganda [5] (462 specimens). During this study, we used the location name as well as the abbreviation.

Country | Water Body | Abbr. | No. of Specimens | Latitude | Longitude |
---|---|---|---|---|---|
Ethiopia | Chamo | Cham | 36 | 5.83333 | 37.55 |
Ethiopia | Hawassa | Hawa | 38 | 7.05 | 38.43333 |
Ethiopia | Koka | Koka | 31 | 8.39197 | 39.07679 |
Ethiopia | Langano | Lang | 26 | 7.61666 | 38.76666 |
Ethiopia | Tana | Tana | 38 | 12.0166 | 37.29194 |
Ethiopia | Ziway | Ziwa | 40 | 8.00083 | 38.82111 |
Uganda | Victoria Kakyanga | ViKak | 28 | −0.18079 | 32.29332 |
Uganda | Victoria Masese | ViM | 28 | 0.4365 | 33.24081 |
Uganda | Victoria Gaba | ViG | 23 | 0.25819 | 32.63727 |
Uganda | Victoria Sango Bay | ViSB | 20 | −0.86772 | 31.71332 |
Uganda | Victoria Kamuwunga | ViKam | 16 | −0.12747 | 31.93999 |
Uganda | Albert Ntoroko | AlN | 22 | 1.05206 | 30.53464 |
Uganda | Albert Kyehooro | AlK | 16 | 1.5099 | 30.9361 |
Uganda | George Hamukungu | Ge | 34 | −0.01739 | 30.08698 |
Uganda | Kazinga Channel Katungulu | KaC | 30 | −0.12541 | 30.04744 |
Uganda | Edward Kazinga | EdK | 21 | −0.20783 | 29.89252 |
Uganda | Edward Rwenshama | EdR | 19 | −0.40459 | 29.77283 |
Uganda | Kyoga Kibuye | KyK | 32 | 1.40028 | 32.57949 |
Uganda | Kyoga Bukungu | KyB | 3 | 1.43873 | 32.86809 |
Uganda | River Nile Kibuye | Ni | 29 | 1.18734 | 32.96865 |
Uganda | Mulehe Musezero | Mu | 27 | −1.21345 | 29.72668 |
Uganda | Kayumbu Rugarambiro | Ka | 28 | −1.34679 | 29.78446 |
Uganda | Bangena Farm | BF | 34 | −1.25617 | 29.73622 |
Uganda | Sindi Farm | SF | 22 | −1.17578 | 30.06198 |
Uganda | Rwitabingi Farm | RF | 30 | 0.97116 | 33.13924 |

**Table 2.** Description of the landmarks used in this study. The asterisk (*) indicates landmarks not used on Ugandan samples.

Landmark Name | Landmark Abbreviation |
---|---|
Upper tip of snout | UTP |
Center of eye | EYE |
Anterior insertion of dorsal fin | AOD |
Posterior insertion of dorsal fin | POD |
Dorsal insertion of caudal fin | DIC |
Ventral insertion of caudal fin | VOC |
Posterior insertion of anal fin | PIA |
Dorsal base of pectoral fin | BPF |
Most posterior edge of operculum | PEO |
Ventral edge of operculum | VEO |
Anterior insertion of anal fin * | AOA |
Anterior insertion of pelvic fin * | AOP |
Halfway between dorsal and ventral insertion of caudal fin * | HCF |
Posterior end of mouth * | EMO |

**Table 3.** Summary of results obtained with generalized Procrustes analysis (GPA), Gaussian process latent variable models (GP-LVM), and the deep convolutional autoencoder (cAE). Significantly different locations are indicated with a cross (×).

Abbr. | GPA | GP-LVM | cAE | ||
---|---|---|---|---|---|

Ethiopia | Chamo | Cham | × | ||

Hawassa | Hawa | × | × | ||

Koka | Koka | × | × | ||

Langano | Lang | × | × | × | |

Tana | Tana | × | × | ||

Ziway | Ziwa | × | × | × | |

Uganda | Victoria Kakyanga | ViKak | × | ||

Victoria Masese | ViM | × | × | ||

Victoria Gaba | ViG | × | × | ||

Victoria Sango Bay | ViSB | × | |||

Victoria Kamuwunga | ViKam | × | |||

Albert Ntoroko | AlN | × | × | ||

Albert Kyehooro | AlK | × | |||

George Hamukungu | Ge | × | × | ||

Kazinga Channel Katungulu | KaC | × | × | ||

Edward Kazinga | EdK | × | × | ||

Edward Rwenshama | EdR | × | |||

Kyoga Kibuye | KyK | × | × | ||

Kyoga Bukungu | KyB | ||||

River Nile Kibuye | Ni | × | |||

Mulehe Musezero | Mu | ||||

Kayumbu Rugarambiro | Ka | × | × | ||

Bangena Farm | BF | × | × | ||

Sindi Farm | SF | ||||

Rwitabingi Farm | RF | × | × |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Wöber, W.; Mehnen, L.; Curto, M.; Tibihika, P.D.; Tesfaye, G.; Meimberg, H. Investigating Shape Variation Using Generalized Procrustes Analysis and Machine Learning. *Appl. Sci.* **2022**, *12*, 3158.
https://doi.org/10.3390/app12063158

**AMA Style**

Wöber W, Mehnen L, Curto M, Tibihika PD, Tesfaye G, Meimberg H. Investigating Shape Variation Using Generalized Procrustes Analysis and Machine Learning. *Applied Sciences*. 2022; 12(6):3158.
https://doi.org/10.3390/app12063158

**Chicago/Turabian Style**

Wöber, Wilfried, Lars Mehnen, Manuel Curto, Papius Dias Tibihika, Genanaw Tesfaye, and Harald Meimberg. 2022. "Investigating Shape Variation Using Generalized Procrustes Analysis and Machine Learning" *Applied Sciences* 12, no. 6: 3158.
https://doi.org/10.3390/app12063158