Article

The Use of the Evenness of Eigenvalues of Similarity Matrices to Test for Predictivity of Ecosystem Classifications

Department of Life Sciences, University of Trieste, 34127 Trieste, Italy
Mathematics 2019, 7(3), 245; https://doi.org/10.3390/math7030245
Submission received: 6 December 2018 / Revised: 26 February 2019 / Accepted: 7 March 2019 / Published: 9 March 2019
(This article belongs to the Special Issue New Paradigms and Trends in Quantitative Ecology)

Abstract
The evenness (E(λ)) of the eigenvalues of similarity matrices corresponding to different hierarchical levels of ecosystem classifications is suggested as a way to test the correlation (or predictivity) between biological communities and environmental factors, as an alternative to analysis of variance (parametric or non-parametric). The advantage over traditional methods is that similarity matrices can be obtained from any kind of data (including mixed and missing data) by indices such as those of Goodall and Gower. The significance of E(λ) is calculated by permutation techniques. An example of the application of E(λ) is given using a data set describing plant community types (beech forests of the Italian peninsula).

1. Introduction

The separation between the classes of a classification, in terms of the features used for the classification itself or in terms of features not used for the classification, can be evaluated by parametric methods based on simple or multivariate analysis of variance (ANOVA or MANOVA) [1,2] and by nonparametric ones [3]. Among these, Feoli and Bressan [4] proposed in 1972 a simple index, called the index of “individualization”, given by the ratio between the average similarity within one class of sampling units and the average similarity that this class has with the other classes present in the study area: the higher the index, the higher the separation of that class from the other classes. Orlóci [1] presented the index (let us call it INDI) in such a way that its values range between 0 and 1. It is obtained by subtracting from 1 the ratio between the average similarity between the classes and the average similarity within the classes (i.e., INDI = 1 − B/W, where B is the average between-class, or among-class, similarity and W the average within-class similarity). In 1988, Biondini et al. [5] proposed to test the separation between classes by an index based on the Euclidean metric (a method based on “sum of squares”) that uses the average within-class sum of squares (δ), introducing the methods known as the multiple response permutation procedure (MRPP) and its randomized block design analogue (MRBP). In their proposal the best partition (classification) is the one with the lowest value of δ. Later, Clarke and Anderson proposed, respectively, the methods known as Analysis Of SIMilarity (ANOSIM) [6] and PERmutational Multivariate ANalysis Of Variance (PERMANOVA) [7], in which the ratio between the between and within dissimilarity averages, or the “between (or among)/within (residual) sum of squares”, can be based on any kind of similarity function, as suggested in [4]. In all these methods the significance of the class separation is tested by permutation techniques [8,9].
The advantage of ANOSIM and PERMANOVA with respect to the method of Biondini et al. [5] lies in the fact that both can be applied to any kind of data (mixed and missing data) if similarity is measured by suitable functions, e.g., those of Gower or Goodall [1,2].
The idea of using the evenness of the eigenvalues of a similarity matrix, rather than indices based on the “ratio between/within average similarities” (or between/within sum of squares), is not new. It was published by Feoli et al. [10] in 2009, and long before that it was implemented, together with INDI [1], in the software MATEDIT [11,12]. This was done to find the optimal classification among the sets of classifications obtained by MATEDIT, i.e., the one that would offer the best separation between the classes according to Occam’s razor. This rule, also called the principle of ontological economy, principle of parsimony, or principle of simplicity [13,14,15], says that: “Entities are not to be multiplied beyond necessity.” In practical terms, Occam’s razor is satisfied when every class of a classification can be easily distinguished from the other classes on the basis of its peculiar features.
Two theorems of matrix algebra support the evenness of eigenvalues as a suitable index of class separation [16]. According to the first theorem, each disjoint submatrix of a given matrix has its own independent set of eigenvalues (and eigenvectors); according to the second theorem, a matrix (N × N) with all scores equal to 1 has only one non-zero eigenvalue, which is equal to N.
From these two theorems it is easy to deduce that, in the case of a perfectly crisp classification, the entropy [17] of the eigenvalues Hk(λ) of a similarity matrix S (N × N), with k representing the number of classes of elements, would be equal to Hk, i.e., the entropy of the proportions of the classes:
$$H_k = -\sum_{j=1}^{k} \frac{n_j}{N} \ln \frac{n_j}{N} \tag{1}$$
with j = 1, …, k, nj indicating the number of elements in the j-th class and N the total number of elements.
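Therefore, the ratio
$$D = \frac{H_k(\lambda)}{H_k} \tag{2}$$
where
$$H_k(\lambda) = -\sum_{i} \frac{\lambda_i}{\sum_{i} \lambda_i} \ln \frac{\lambda_i}{\sum_{i} \lambda_i} \tag{3}$$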
with λi indicating the i-th positive eigenvalue of the similarity matrix S (k × k), whose entries are the sums of the similarity values within the classes and between the classes, would represent an index of class separation ranging between 0 and 1. It is 0 when the matrix is full of 1s, i.e., when there is no separation between the classes; it is 1 when the matrix consists of fully disjoint submatrices with all the within-class similarity values equal to 1.
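The two limiting cases can be checked numerically. The sketch below is an illustration only, not the authors' implementation: it works on the N × N matrix of a crisp classification (the situation covered by the two theorems), and the class sizes (5, 3, 2), the tolerance used to select positive eigenvalues and the helper name `shannon_entropy` are assumptions made for the example.

```python
import numpy as np


def shannon_entropy(weights):
    """Entropy of positive weights after normalising them to sum to 1."""
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    return float(-np.sum(p * np.log(p)))


class_sizes = [5, 3, 2]  # hypothetical class sizes n_j
N = sum(class_sizes)

# Perfectly crisp classification: disjoint blocks of 1s on the diagonal.
S_crisp = np.zeros((N, N))
start = 0
for n in class_sizes:
    S_crisp[start:start + n, start:start + n] = 1.0
    start += n

# Each block of 1s contributes one non-zero eigenvalue equal to its size.
lam_crisp = np.linalg.eigvalsh(S_crisp)
lam_crisp = lam_crisp[lam_crisp > 1e-9]

H_k = shannon_entropy(class_sizes)        # entropy of the class proportions
D_crisp = shannon_entropy(lam_crisp) / H_k

# No separation at all: a matrix full of 1s has a single non-zero eigenvalue N.
lam_ones = np.linalg.eigvalsh(np.ones((N, N)))
lam_ones = lam_ones[lam_ones > 1e-9]
D_no_separation = shannon_entropy(lam_ones) / H_k
```

In the crisp case the positive eigenvalues coincide with the class sizes, so the entropy of the eigenvalues equals Hk and the ratio is 1; for the matrix full of 1s the single eigenvalue gives an entropy of 0.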
If we do not consider the importance of the proportion between the classes, but just their within and between similarity, we should use the following formula:
$$E(\lambda) = \frac{H_{Sk}(\lambda)}{\ln k} \tag{4}$$
In this case E(λ) is the evenness of the eigenvalues of the similarity matrix S (k × k) whose scores are the average similarity within the classes and the average similarity between the classes (this index also ranges between 0 and 1). Both D and E(λ) are indices measuring class separation that can be tested by permutation techniques.
In the present paper, we consider only E(λ) because we regard the size of the classes as irrelevant when we are interested in measuring their separation. We suggest a new application of E(λ) that consists in testing how predictive a classification based on a set of features A is with respect to another set of features B that has not been used to obtain the classification.
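As a minimal sketch of how E(λ) could be computed from an N × N similarity matrix and a vector of class labels, consider the following. The function names `class_average_matrix` and `evenness`, the tolerance, and the choice to include the diagonal (self-similarity) in the within-class averages are assumptions of this illustration; the text does not specify them.

```python
import numpy as np


def class_average_matrix(S, labels):
    """k x k matrix of average within-class and between-class similarities.

    Assumption: within-class averages include the diagonal of S.
    """
    S = np.asarray(S, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    Sk = np.empty((len(classes), len(classes)))
    for a, class_a in enumerate(classes):
        for b, class_b in enumerate(classes):
            block = S[np.ix_(labels == class_a, labels == class_b)]
            Sk[a, b] = block.mean()
    return Sk


def evenness(Sk, tol=1e-12):
    """Entropy of the positive eigenvalues of Sk divided by ln k."""
    Sk = np.asarray(Sk, dtype=float)
    lam = np.linalg.eigvalsh(Sk)      # Sk is symmetric when S is symmetric
    lam = lam[lam > tol]              # keep the positive eigenvalues only
    p = lam / lam.sum()
    return float(-np.sum(p * np.log(p)) / np.log(Sk.shape[0]))


# Hypothetical example: two classes with high within and low between similarity.
labels = np.array([0, 0, 0, 1, 1, 1])
S = np.where(labels[:, None] == labels[None, :], 0.8, 0.2)
np.fill_diagonal(S, 1.0)
E = evenness(class_average_matrix(S, labels))
```

With k = 2 the class-average matrix is 2 × 2, so E(λ) depends only on how evenly the total is shared between its two eigenvalues.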

2. Applications of E(λ)

2.1. Summary of the Rationale

E(λ) was primarily suggested to test the separation between classes at different hierarchical levels of a classification T on the basis of the set A of features used to obtain T (e.g., [10]). In the present paper we suggest using E(λ) to test whether the k classes of a given classification are significantly separated when they are described by a set B of external features (biotic or abiotic, explanatory or non-explanatory variables), i.e., features that have not been used to obtain T. If the external features show a significant separation between the classes, their effect on the features used for the classification would be significant or, vice versa, the effect of the features used for the classification on these external variables would be significant. In other words, the higher E(λ) is, the higher the correlation between A and B, i.e., the higher the predictivity of the classification based on A with respect to the set B, or vice versa. We can apply E(λ) to a similarity matrix S (N × N) based on all the h variables of B and to each of the h similarity matrices Si (N × N), with i = 1, …, h, obtained by using the single i-th feature of the set B to test the separation of the k classes of a given classification of N objects. In the latter case, we get a correlation between the set A and the single i-th feature of B. The exercise could be repeated for different hierarchical levels of T in order to find which level is the most predictive with respect to the whole set B or to its single features. E(λ) differs from INDI and from MRPP, MRBP, ANOSIM and PERMANOVA because it uses the spectrum of positive eigenvalues and not what Anderson [7] calls the “pseudo F statistic”. E(λ) is not a “pseudo thing”: it is mathematically clear and represents an index sensitive to the overall structure of the data set under study, thanks to the two theorems mentioned above.
Permutation techniques allow us to test the significance of E(λ) by calculating it a great number of times after permuting the scores of the similarity matrix S (N × N) within and among the submatrices corresponding to the k classes of a given classification. The ratio between the number of permuted E(λ) values greater than the observed one and the total number of permutations estimates the probability of wrongly rejecting the null hypothesis of non-separation.
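The permutation scheme can be sketched as follows. This is one possible reading of the procedure, not the authors' code: the off-diagonal scores of S are shuffled at random (keeping the matrix symmetric and the diagonal fixed), E(λ) is recomputed each time, and the p-value is the share of permuted values greater than the observed one. The function names, the number of permutations, the seed and the toy matrix are all assumptions; within-class averages are assumed to include the diagonal.

```python
import numpy as np


def evenness_of_classes(S, labels, tol=1e-12):
    """E(lambda) of the k x k matrix of class-average similarities."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    Sk = np.array([[S[np.ix_(labels == a, labels == b)].mean()
                    for b in classes] for a in classes])
    lam = np.linalg.eigvalsh(Sk)
    lam = lam[lam > tol]                       # positive eigenvalues only
    p = lam / lam.sum()
    return float(-np.sum(p * np.log(p)) / np.log(len(classes)))


def permutation_p_value(S, labels, n_perm=999, seed=0):
    """Observed E(lambda) and the share of permuted values exceeding it."""
    rng = np.random.default_rng(seed)
    S = np.asarray(S, dtype=float)
    observed = evenness_of_classes(S, labels)
    upper = np.triu_indices_from(S, k=1)       # off-diagonal scores
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(S[upper])
        P = S.copy()
        P[upper] = shuffled                    # refill the upper triangle
        P[(upper[1], upper[0])] = shuffled     # mirror to keep symmetry
        if evenness_of_classes(P, labels) > observed:
            exceed += 1
    return observed, exceed / n_perm


# Hypothetical data: 10 objects in two well separated classes.
labels = np.array([0] * 5 + [1] * 5)
S = np.where(labels[:, None] == labels[None, :], 0.9, 0.1)
np.fill_diagonal(S, 1.0)
observed, p_value = permutation_p_value(S, labels, n_perm=499, seed=1)
```

Shuffling matrix scores rather than class labels is chosen here because, with ten objects in two classes of five, relabelling yields only a few hundred distinct arrangements, whereas shuffling scores yields many more; other permutation designs are equally conceivable.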

2.2. Example of Application of E(λ)

2.2.1. Data

The example is based on a data matrix given in Table 1. It describes 10 vegetation types of beech forests of Central and South Italy [18] by eight ecological indicator values of Landolt [19] representing eight environmental factors. The 10 vegetation types are obtained by clustering methods using the species as features (features of set A) (see [18] for references). The 10 vegetation types belong to two phytosociological associations: Aquifolio-Fagetum (AQ) and Trochiscantho-Fagetum (TF).

2.2.2. Methods

E(λ) was applied in order to answer the following two questions:
(a)
Are the two plant associations, as defined by species (set A), significantly separated in the space defined by environmental factors (set B)?
(b)
What are the environmental factors of set B that are more correlated with the two associations?
The similarity matrix S (10 × 10) for the 10 vegetation types was obtained as the complement to 1 of the Euclidean distances dij, after rescaling them according to the following formula:
$$S_{ij} = 1 - \frac{d_{ij} - d_{min}}{d_{max} - d_{min}} \tag{5}$$
where dmin and dmax are, respectively, the minimum and maximum Euclidean distance in the dissimilarity matrix.
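The transformation from distances to similarities can be sketched in a few lines. The sketch assumes that dmin and dmax are taken over the off-diagonal distances and that the diagonal of S is set to 1; the text does not say how the zero self-distances are handled, so this is only one plausible choice. The function name `similarity_matrix` is hypothetical, and the three rows are taken from Table 1 purely as sample input.

```python
import math


def similarity_matrix(rows):
    """Rescaled complement of the Euclidean distance between all pairs of rows.

    Assumptions: d_min and d_max are taken over off-diagonal pairs only,
    and self-similarity is fixed at 1.
    """
    n = len(rows)
    D = [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    off_diagonal = [D[i][j] for i in range(n) for j in range(n) if i != j]
    d_min, d_max = min(off_diagonal), max(off_diagonal)
    if d_max == d_min:
        raise ValueError("all distances are equal; the rescaling is undefined")
    S = [[1.0 - (D[i][j] - d_min) / (d_max - d_min) for j in range(n)]
         for i in range(n)]
    for i in range(n):
        S[i][i] = 1.0
    return S


# Three vegetation types from Table 1 (columns F, R, N, H, D, L, T, C).
rows = [
    (2.8, 3.3, 2.8, 3.5, 3.6, 2.5, 3.8, 2.4),  # Aquifolio-Fagetum cyclametosum
    (2.9, 3.3, 3.1, 3.6, 3.8, 2.1, 3.6, 2.3),  # Aquifolio-Fagetum carpinetosum var. Milium
    (2.9, 3.2, 3.0, 3.6, 3.8, 2.2, 3.0, 2.6),  # Trochiscantho-Fagetum daphnetosum mezerei
]
S = similarity_matrix(rows)
```

Under this choice the closest pair of objects receives similarity 1 and the most distant pair receives similarity 0, with all other pairs falling in between.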
To answer question (a) we calculated E(λ) with the similarity matrix S (10 × 10), grouping the 10 vegetation types according to the two associations. To answer question (b) we measured the separation between the two associations in terms of each single environmental factor. In this way E(λ) is used as an alternative to the Kruskal–Wallis test (i.e., a univariate non-parametric analysis of variance [3]), which we calculated only for comparison with E(λ). We obtained eight similarity matrices Si (N × N) by comparing the 10 vegetation types on the basis of each of the eight environmental factors. In this case, too, we used formula (5). These eight factors could have been combined in several ways to test the capacity of their combinations to separate the classes, but we did not undertake this exercise, which would not add anything to the aim of the paper.
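For readers who want to retrace the Kruskal–Wallis comparison, a small standard-library sketch follows. It computes the H statistic from average ranks without a tie correction and, for two groups, the p-value from the chi-square distribution with one degree of freedom. Whether the authors applied a tie correction is not stated, so the uncorrected form is an assumption; the function names are hypothetical. The input is the Temperature column of Table 1.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (assumption: no correction for ties)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3.0 * (n_total + 1)


def chi2_sf_one_df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Temperature indicator values from Table 1.
aq_temperature = [3.8, 3.6, 3.7, 3.6, 3.6]  # Aquifolio-Fagetum types
tf_temperature = [3.0, 3.0, 3.0, 3.2, 3.1]  # Trochiscantho-Fagetum types
h_temperature = kruskal_wallis_h([aq_temperature, tf_temperature])
p_temperature = chi2_sf_one_df(h_temperature)
```

Under these assumptions the Temperature column gives H of about 6.82 and p of about 0.009, which agrees after rounding with the KW and pKW values reported for Temperature in Table 2.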

2.2.3. Results

The results related to question (a) confirm that the separation between the two associations in the space defined by all environmental factors is significant. E(λ) is highly significant when the two associations are described by all eight environmental factors (E(λ) = 0.824, p < 0.00001). This means that the two associations, defined on the basis of floristic data, occupy two different community niches that are well separated from the environmental point of view.
The answer to question (b) is given in Table 2. This table also shows the average values of the environmental factors for the two associations. The two tests, E(λ) and KW, fully agree in showing the features that significantly separate the two associations. The correlation between E(λ) and KW is very high and significant (Table 3).
Temperature and dispersion (i.e., the dimension of the soil particles) are the only two environmental factors that are highly significant. This means that these two factors very strongly influence the floristic composition of the two associations. We can therefore conclude that Aquifolio-Fagetum is more thermophilic than Trochiscantho-Fagetum and that the latter association is related to soils with higher dispersion values than those of the former.

3. Discussion and Conclusions

The example shows how, with similarity matrices, we can analyze in detail the relationships between different kinds of environmental factors and the states of ecological systems. In this case the states are defined by types of communities; however, they could be represented by any type of sampling unit. It is easy to see that E(λ) could be used as an index to measure the indicator values of species or other biological features, since the higher the E(λ) for a given feature, the higher the separation or distinctiveness of the classes as described by that feature. However, we do not enter into this discussion here and refer the interested reader to [20] (and references therein) for a review of other indices proposed to test indicator values. In the example, we used a very simple data matrix describing types of plant communities by features not used for their definition. We chose this simple data set because the results can be easily discussed and compared with the results of studies already completed with the same data set and with consolidated knowledge of the type of vegetation system (i.e., beech forests; for references see [18]). The results indicate the environmental factors differentiating the two associations. The pattern of significance agrees with current knowledge about the two associations and provides useful details. However, a discussion of the relationships between the biological features (species or other features) of vegetation and the environmental factors, which could be interesting for specialists in vegetation science, is outside the scope of this paper.
We conclude that the evenness E(λ) of the eigenvalues of similarity matrices (which are a necessary step for the majority of ordination methods [21] thanks to the theorem of spectral decomposition [16]) can be considered a simple, useful tool to investigate the predictive value of a classification with respect to external variables. This would help to find the variables that are explanatory and that support the given classification in terms of Occam’s razor.

Author Contributions

Conceptualization, E.F. and P.G.; methodology, E.F.; writing—original draft preparation, E.F.; writing—review and editing, E.F. and P.G.

Funding

This research received no external funding.

Acknowledgments

We thank the anonymous referees for very useful suggestions to improve the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Orlóci, L. Multivariate Analysis in Vegetation Research, 2nd ed.; Junk: The Hague, The Netherlands, 1978.
  2. Legendre, P.; Legendre, L. Numerical Ecology, 3rd ed.; Elsevier: Amsterdam, The Netherlands, 2012.
  3. Siegel, S. Nonparametric Statistics: For the Behavioral Sciences; McGraw-Hill: New York, NY, USA, 1956.
  4. Feoli, E.; Bressan, G. Affinità floristica dei tipi di vegetazione bentonica della Cala di Mitigliano (Massa Lubrense, Napoli). Plant Biosyst. 1972, 106, 245–256.
  5. Biondini, M.E.; Mielke, P.W., Jr.; Redente, E.F. Permutation techniques based on Euclidean analysis spaces: A new and powerful statistic method for ecological research. Coenoses 1988, 3, 155–174.
  6. Clarke, K.R. Non-parametric multivariate analyses of changes in community structure. Aust. J. Ecol. 1993, 18, 117–143.
  7. Anderson, M.J. A new method for non-parametric multivariate analysis of variance. Austral Ecol. 2001, 26, 32–46.
  8. Pillar, V.D.P.; Orlóci, L. On randomization testing in vegetation science: Multifactor comparisons of relevé groups. J. Veg. Sci. 1996, 7, 585–592.
  9. Manly, B.F.J. Randomization, Bootstrap and Monte Carlo Methods in Biology, 3rd ed.; Chapman & Hall: London, UK, 2006.
  10. Feoli, E.; Gallizia Vuerich, L.; Ganis, P.; Woldu, Z. A classificatory approach integrating fuzzy set theory and permutation techniques for land cover analysis: A case study on a degrading area of the Rift Valley (Ethiopia). Community Ecol. 2009, 10, 53–64.
  11. Burba, N.; Feoli, E.; Malaroda, M.; Zuccarello, V. Un Sistema Informativo per la Vegetazione. Software per L’archiviazione Della Vegetazione Italiana e per L’elaborazione di Tabelle. Manuale di Utilizzo dei Programmi; GEAD-EQ n.11; Università degli Studi di Trieste: Trieste, Italy, 1992.
  12. Burba, N.; Feoli, E.; Malaroda, M. MATEDIT: A software tool to integrate information in decision making processes. In Perspectives on Integrated Coastal Management in South America; Neves, R., Baretta, J.W., Mateus, M., Eds.; IST Press: Lisbon, Portugal, 2008.
  13. Dale, M.B. Knowing when to stop: Cluster concept–concept cluster. Coenoses 1988, 1, 11–31.
  14. Pillar, V. How sharp are classifications? Ecology 1999, 80, 2508–2516.
  15. Goodall, D.W. Classification and ordination: Their nature and role in taxonomy and community studies. Coenoses 1986, 1, 3–9.
  16. Wilkinson, J.H. The Algebraic Eigenvalue Problem; Oxford University Press: London, UK, 1965.
  17. Shannon, C.E.; Weaver, W. The Mathematical Theory of Communication; University of Illinois Press: Urbana, IL, USA, 1949.
  18. Feoli, E.; Ganis, P. Comparison of floristic vegetation types by multiway contingency tables. Abstr. Bot. 1985, 9, 1–15.
  19. Landolt, E. Ökologische Zeigerwerte zur Schweizer Flora. Ber. Geobot. Inst. ETH 1977, 64, 64–207.
  20. Wildi, O.; Feldmeyer-Christe, E. Indicator values (IndVal) mimic ranking by F-ratio in real world vegetation data. Community Ecol. 2013, 14, 139–143.
  21. Wildi, O. Evaluating the predictive power of ordination methods in ecological context. Mathematics 2018, 6, 295.
Table 1. Description of 10 vegetation types of beech forests of Central Italy by the indicator values corresponding to environmental factors according to Landolt [19]. F = Humidity, R = Reaction, N = Nutrients, H = Humus, D = Dispersion, L = Light, T = Temperature, C = Continentality.

| Vegetation type | F | R | N | H | D | L | T | C |
|---|---|---|---|---|---|---|---|---|
| Aquifolio-Fagetum cyclametosum | 2.8 | 3.3 | 2.8 | 3.5 | 3.6 | 2.5 | 3.8 | 2.4 |
| Aquifolio-Fagetum carpinetosum var. Milium | 2.9 | 3.3 | 3.1 | 3.6 | 3.8 | 2.1 | 3.6 | 2.3 |
| Aquifolio-Fagetum carpinetosum var. Lamium | 3.1 | 3.2 | 2.9 | 3.6 | 3.8 | 2.2 | 3.7 | 2.3 |
| Aquifolio-Fagetum brachypodietosum var. Digitalis | 2.8 | 3.2 | 2.9 | 3.5 | 3.6 | 2.4 | 3.6 | 2.5 |
| Aquifolio-Fagetum brachypodietosum var. Quercus ilex | 2.8 | 3.1 | 2.9 | 3.5 | 3.6 | 2.4 | 3.6 | 2.6 |
| Trochiscantho-Fagetum daphnetosum mezerei | 2.9 | 3.2 | 3.0 | 3.6 | 3.8 | 2.2 | 3.0 | 2.6 |
| Trochiscantho-Fagetum ranunculetosum lanuginosi | 3.0 | 3.0 | 3.1 | 3.8 | 3.9 | 2.1 | 3.0 | 2.6 |
| Trochiscantho-Fagetum ranunculetosum var. Acer pseudoplatanus | 3.1 | 3.2 | 3.2 | 3.5 | 3.9 | 2.1 | 3.0 | 2.5 |
| Trochiscantho-Fagetum luzuletosum var. Sesleria autumnalis | 2.8 | 3.3 | 2.8 | 3.6 | 3.8 | 2.3 | 3.2 | 2.6 |
| Trochiscantho-Fagetum luzuletosum niveae | 2.9 | 3.0 | 3.0 | 3.7 | 3.9 | 2.0 | 3.1 | 2.5 |
Table 2. Results of application of Kruskal–Wallis test (KW) and E(λ) obtained by the similarity matrices Si (10 × 10) corresponding to the single environmental factors (variables from Humidity to Continentality) by considering the classification of vegetation types in the two associations: AQ = Aquifolio-Fagetum and TF = Trochiscantho-Fagetum. Under the symbols of associations there are the average values of the eight variables, pKW is the probability of the test KW, pE(λ) is the probability of the evenness test. Significant differences are marked in bold.

| Codes | AQ | TF | KW | pKW | E(λ) | pE(λ) |
|---|---|---|---|---|---|---|
| Humidity | 2.88 | 2.94 | 0.88 | 0.35 | 0.01 | 0.81 |
| Reaction | 3.22 | 3.14 | 0.88 | 0.35 | 0.1 | 0.58 |
| Nutrients | 2.92 | 3.02 | 1.32 | 0.25 | 0.16 | 0.64 |
| Humus | 3.54 | 3.64 | 2.14 | 0.14 | 0.19 | 0.51 |
| Dispersion | 3.68 | 3.86 | 4.81 | 0.03 | 0.73 | 0.01 |
| Light | 2.32 | 2.14 | 3.9 | 0.05 | 0.33 | 0.10 |
| Temperature | 3.66 | 3.06 | 6.8 | 0.009 | 0.94 | 0.003 |
| Continentality | 2.42 | 2.56 | 2.8 | 0.09 | 0.46 | 0.07 |
Table 3. Correlation between the Kruskal–Wallis test (KW) and E(λ) in Table 2; p means probability of the tests.

| Matrix X | KW | pKW | E(λ) | pE(λ) |
|---|---|---|---|---|
| KW | 1.00 | −0.89 | 0.96 | −0.87 |
| pKW | −0.89 | 1.00 | −0.85 | 0.93 |
| E(λ) | 0.96 | −0.85 | 1.00 | −0.88 |
| pE(λ) | −0.87 | 0.93 | −0.88 | 1.00 |

Share and Cite

MDPI and ACS Style

Feoli, E.; Ganis, P. The Use of the Evenness of Eigenvalues of Similarity Matrices to Test for Predictivity of Ecosystem Classifications. Mathematics 2019, 7, 245. https://doi.org/10.3390/math7030245