# Representational Rényi Heterogeneity

## Abstract

## 1. Introduction

## 2. Existing Heterogeneity Indices

### 2.1. Rényi Heterogeneity in Categorical Systems

#### 2.1.1. Properties of the Rényi Heterogeneity

- Event spaces are disjoint: ${\mathcal{X}}_{i}\cap {\mathcal{X}}_{j}=\varnothing $ for all $(i,j)\in {\{1,2,\dots ,N\}}^{2}$ where $i\ne j$
- All systems have equal heterogeneity: ${\mathsf{\Pi}}_{q}\left({\mathbf{p}}_{1}\right)={\mathsf{\Pi}}_{q}\left({\mathbf{p}}_{2}\right)=\cdots ={\mathsf{\Pi}}_{q}\left({\mathbf{p}}_{i}\right)=\cdots ={\mathsf{\Pi}}_{q}\left({\mathbf{p}}_{N}\right)$

#### 2.1.2. Decomposition of Categorical Rényi Heterogeneity

${i}^{\mathrm{th}}$ row is the probability of system ${X}_{i}$ being observed in each state $j\in \{1,2,\dots ,n\}$.

#### 2.1.3. Limitations of Categorical Rényi Heterogeneity

### 2.2. Non-Categorical Heterogeneity Indices

#### 2.2.1. Numbers Equivalent Quadratic Entropy

#### 2.2.2. Functional Hill Numbers

#### 2.2.3. Leinster–Cobbold Index

#### 2.2.4. Limitations of Existing Non-Categorical Heterogeneity Indices

**Definition 1** (Metric distance).

1. Non-negativity: $d(x,y)\ge 0$
2. Identity of indiscernibles: $d(x,y)=0\iff x=y$
3. Symmetry: $d(x,y)=d(y,x)$
4. Triangle inequality: $d(x,z)\le d(x,y)+d(y,z)$

**Definition 2** (Ultrametric distance).
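The two definitions can be checked numerically on a finite distance matrix. The sketch below is illustrative only. It assumes the standard strong triangle inequality $d(x,z)\le \max \{d(x,y),d(y,z)\}$ for the ultrametric case, and it uses a hypothetical helper, `isosceles_distances`, that places three states at the corners of an isosceles triangle with base $b$ and height $h$. That construction is an assumption standing in for the parametric distance matrix $\mathbf{D}(h,b)$; it is not taken from Equation (25).

```python
import math
from itertools import product


def is_metric(D, tol=1e-12):
    """Check the four metric axioms on a square distance matrix D."""
    n = len(D)
    for i, j in product(range(n), repeat=2):
        if D[i][j] < 0:                      # non-negativity
            return False
        if (D[i][j] <= tol) != (i == j):     # identity of indiscernibles
            return False
        if abs(D[i][j] - D[j][i]) > tol:     # symmetry
            return False
    # triangle inequality over all ordered triples
    return all(D[i][k] <= D[i][j] + D[j][k] + tol
               for i, j, k in product(range(n), repeat=3))


def is_ultrametric(D, tol=1e-12):
    """Metric axioms plus the strong triangle inequality (assumed definition)."""
    n = len(D)
    return is_metric(D, tol) and all(
        D[i][k] <= max(D[i][j], D[j][k]) + tol
        for i, j, k in product(range(n), repeat=3))


def isosceles_distances(h, b=1.0):
    """Hypothetical 3-state distance matrix: triangle with base b and height h."""
    side = math.hypot(h, b / 2.0)            # length of the two equal sides
    return [[0.0, b, side],
            [b, 0.0, side],
            [side, side, 0.0]]
```

Under this assumed construction the equal sides are at least as long as the base exactly when $h\ge b\sqrt{3}/2$, which is the threshold quoted in the caption of Figure 1 for the switch from metric to ultrametric.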

## 3. Representational Rényi Heterogeneity

- The representation Z captures the semantically relevant variation in X
- Rényi heterogeneity can be directly computed on Z

- A. Application of standard Rényi heterogeneity (Section 2.1) when Z is a categorical representation
- B. Deriving parametric forms for Rényi heterogeneity when Z is a non-categorical representation

### 3.1. Rényi Heterogeneity on Categorical Representations

**Example 1** (Classical measurement of biodiversity and economic equality as categorical RRH).

### 3.2. Rényi Heterogeneity on Non-Categorical Representations

**Example 2** (Parametric pooling of multivariate Gaussian distributions).

## 4. Empirical Applications of Representational Rényi Heterogeneity

### 4.1. Comparison of Heterogeneity Indices Under a Mixture of Beta Distributions

`BetaRegularized[x0, x1, a, b]` command in the Wolfram language and `betainc(a, b, x0, x1, regularized=True)` in Python’s `mpmath` package). Equation (52) implies that ${\overline{f}}_{\theta}(z=1)=1-{\overline{f}}_{\theta}(z=2)$. The pooled heterogeneity is thus expressed as a function of $\theta $ as follows:
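As a rough, standard-library-only illustration of the two ingredients named above, the sketch below approximates the probability that a Beta($a$, $b$) variable falls in $[{x}_{0},{x}_{1}]$ and then evaluates the Rényi heterogeneity of a two-state distribution $(p,1-p)$. The midpoint-rule integrator `beta_mass` is a hypothetical stand-in for the regularized incomplete beta function and is only reasonable when $a,b\ge 1$. The function names and the example parameters are assumptions, not the mixture model of Equations (45)–(52).

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in the open interval (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))


def beta_mass(x0, x1, a, b, steps=20000):
    """Midpoint-rule approximation of P(x0 <= X <= x1) for X ~ Beta(a, b).

    Stand-in for a regularized incomplete beta routine; assumes a, b >= 1
    and 0 <= x0 < x1 <= 1 so the integrand stays bounded.
    """
    width = (x1 - x0) / steps
    return width * sum(beta_pdf(x0 + (k + 0.5) * width, a, b)
                       for k in range(steps))


def two_state_heterogeneity(p1, q):
    """Rényi heterogeneity of order q for the distribution (p1, 1 - p1)."""
    support = [v for v in (p1, 1.0 - p1) if v > 0]
    if q == 1:                                   # perplexity limit
        return math.exp(-sum(v * math.log(v) for v in support))
    if math.isinf(q):                            # Berger-Parker limit
        return 1.0 / max(support)
    return sum(v ** q for v in support) ** (1.0 / (1.0 - q))


# Illustrative use with a hypothetical threshold of 0.5 on a single Beta(2, 5):
p_first_category = beta_mass(0.0, 0.5, 2, 5)
pooled = two_state_heterogeneity(p_first_category, q=1)
```

Because the second category's probability is one minus the first, a single interval probability determines the whole two-state heterogeneity.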

### 4.2. Representational Rényi Heterogeneity Is Scalable to Deep Learning Models

## 5. Discussion

## Supplementary Materials

`RRH_Supplement_3State_BMM_CVAE.ipynb`), and Appendix C (`RRH_Supplement_Siamese.ipynb`).

## Author Contributions

## Funding

## Conflicts of Interest

## Appendix A. Mathematical Appendix

**Proof.**

**Proposition A2.**

**Proof.**

**Proposition A3** (Rényi Heterogeneity of a Continuous System).

**Proof.**

**Proposition A4** (Rényi heterogeneity of a multivariate Gaussian).

**Proof.**
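The following sketch is a numerical sanity check of the kind of closed form Proposition A4 concerns. It assumes the continuous Rényi heterogeneity is ${\left(\int f{\left(\mathbf{x}\right)}^{q}d\mathbf{x}\right)}^{1/(1-q)}$ and uses the standard Gaussian integral to obtain ${q}^{\frac{n}{2(q-1)}}{\left(2\pi \right)}^{n/2}{\left|\mathbf{\Sigma}\right|}^{1/2}$, with limit ${\left(2\pi e\right)}^{n/2}{\left|\mathbf{\Sigma}\right|}^{1/2}$ at $q=1$. The closed form is written from that derivation rather than copied from the proposition, and the function names are hypothetical.

```python
import math


def gaussian_renyi_heterogeneity(det_cov, n, q):
    """Assumed closed form for an n-dimensional Gaussian with |Sigma| = det_cov."""
    base = (2.0 * math.pi) ** (n / 2.0) * math.sqrt(det_cov)
    if q == 1:
        return base * math.exp(n / 2.0)          # limit as q -> 1
    return base * q ** (n / (2.0 * (q - 1.0)))


def numeric_heterogeneity_1d(sigma, q, steps=20000, span=12.0):
    """Midpoint-rule estimate of (integral of f^q)^(1/(1-q)) for N(0, sigma^2).

    Valid for q > 0 and q != 1; integrates over [-span*sigma, span*sigma].
    """
    lo = -span * sigma
    width = 2.0 * span * sigma / steps
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * width
        total += (norm * math.exp(-0.5 * (x / sigma) ** 2)) ** q
    return (total * width) ** (1.0 / (1.0 - q))


# Compare the two for a univariate Gaussian with sigma = 1.5 and q = 2.
closed = gaussian_renyi_heterogeneity(det_cov=1.5 ** 2, n=1, q=2)
numeric = numeric_heterogeneity_1d(sigma=1.5, q=2)
```

Read this way, the value is an effective size of the domain (a length in one dimension, an area in two), so it can be smaller than 1 for a narrow distribution.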

## Appendix B. Expected Distance Between two Beta-Distributed Random Variables

**Figure A1.** Numerical verification of the analytical expression for the expected absolute distance between two Beta-distributed random variables. Solid lines are the theoretical predictions. Ribbons show the bounds between 25th–75th percentiles (the interquartile range, IQR) of the simulated values.

## Appendix C. Evidence Supporting Relative Homogeneity of MNIST “Ones”

**Figure A2.** Depiction of the siamese network architecture and the empirical cumulative distribution function for pairwise distances between digit classes. (**a**) Depiction of a siamese network architecture. At iteration $k$, each of two samples, ${X}_{A}^{\left(k\right)}$ and ${X}_{B}^{\left(k\right)}$, is passed through a convolutional neural network to yield embeddings ${z}_{A}$ and ${z}_{B}$, respectively. The class labels for samples A and B are denoted ${y}_{A}$ and ${y}_{B}$, respectively. The L2-norm of the difference between these embeddings is computed as ${D}_{AB}$. The network is optimized on the contrastive loss [48] $\mathcal{L}$. Here 𝕀[·] is an indicator function. (**b**) Empirical cumulative distribution functions (CDF) for pairwise distances between images of the listed classes under the siamese network model. The x-axis plots the L2 distance between embedding vectors produced by the siamese network. The y-axis shows the proportion of pairs in the respective group (by line color) whose embedding distances were less than the threshold specified on the x-axis. Class groups are denoted by different line colors. For instance, “0-0” refers to pairs where each image is a “zero.” We combine all disjoint class pairs, for example “0–8” or “3–4,” into a single empirical CDF denoted as “A≠B”.

## References

1. Jost, L. Entropy and diversity. Oikos **2006**, 113, 363–375.
2. Prehn-Kristensen, A.; Zimmermann, A.; Tittmann, L.; Lieb, W.; Schreiber, S.; Baving, L.; Fischer, A. Reduced microbiome alpha diversity in young patients with ADHD. PLoS ONE **2018**, 13, e0200728.
3. Cowell, F. Measuring Inequality, 2nd ed.; Oxford University Press: Oxford, UK, 2011.
4. Higgins, J.P.T.; Thompson, S.G.; Deeks, J.J.; Altman, D.G. Measuring inconsistency in meta-analyses. BMJ Br. Med. J. **2003**, 327, 557–560.
5. Hooper, D.U.; Chapin, F.S.; Ewel, J.J.; Hector, A.; Inchausti, P.; Lavorel, S.; Lawton, J.H.; Lodge, D.M.; Loreau, M.; Naeem, S.; et al. Effects of biodiversity on ecosystem functioning: A consensus of current knowledge. Ecol. Monogr. **2005**, 75, 3–35.
6. Botta-Dukát, Z. The generalized replication principle and the partitioning of functional diversity into independent alpha and beta components. Ecography **2018**, 41, 40–50.
7. Mouchet, M.A.; Villéger, S.; Mason, N.W.; Mouillot, D. Functional diversity measures: An overview of their redundancy and their ability to discriminate community assembly rules. Funct. Ecol. **2010**, 24, 867–876.
8. Chiu, C.H.; Chao, A. Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS ONE **2014**, 9, e113561.
9. Petchey, O.L.; Gaston, K.J. Functional diversity (FD), species richness and community composition. Ecol. Lett. **2002**.
10. Leinster, T.; Cobbold, C.A. Measuring diversity: The importance of species similarity. Ecology **2012**, 93, 477–489.
11. Chao, A.; Chiu, C.H.; Jost, L. Unifying Species Diversity, Phylogenetic Diversity, Functional Diversity, and Related Similarity and Differentiation Measures Through Hill Numbers. Annu. Rev. Ecol. Evol. Syst. **2014**, 45, 297–324.
12. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders, 5th ed.; American Psychiatric Publishing: Washington, DC, USA, 2013.
13. Regier, D.A.; Narrow, W.E.; Clarke, D.E.; Kraemer, H.C.; Kuramoto, S.J.; Kuhl, E.A.; Kupfer, D.J. DSM-5 field trials in the United States and Canada, part II: Test-retest reliability of selected categorical diagnoses. Am. J. Psychiatr. **2013**, 170, 59–70.
14. Bengio, Y.; Courville, A.; Vincent, P. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. **2013**, 35, 1798–1828.
15. Arvanitidis, G.; Hansen, L.K.; Hauberg, S. Latent Space Oddity: On the Curvature of Deep Generative Models. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018; pp. 1–15.
16. Shao, H.; Kumar, A.; Thomas Fletcher, P. The Riemannian geometry of deep generative models. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018.
17. Nickel, M.; Kiela, D. Poincaré embeddings for learning hierarchical representations. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6339–6348.
18. Rényi, A. On measures of information and entropy. Proc. Fourth Berkeley Symp. Math. Stat. Probab. **1961**, 114, 547–561.
19. Hill, M.O. Diversity and Evenness: A Unifying Notation and Its Consequences. Ecology **1973**, 54, 427–432.
20. Hannah, L.; Kay, J.A. Concentration in Modern Industry: Theory, Measurement and The U.K. Experience; The MacMillan Press, Ltd.: London, UK, 1977.
21. Ricotta, C.; Szeidl, L. Diversity partitioning of Rao’s quadratic entropy. Theor. Popul. Biol. **2009**, 76, 299–302.
22. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE **1998**, 86, 2278–2324.
23. Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. ICLR 2014 **2014**, arXiv:1312.6114v10.
24. Kingma, D.P.; Welling, M. An Introduction to Variational Autoencoders. Found. Trend. Mach. Learn. **2019**, 12, 307–392.
25. Eliazar, I.I.; Sokolov, I.M. Measuring statistical evenness: A panoramic overview. Phys. A Stat. Mech. Its Appl. **2012**, 391, 1323–1353.
26. Patil, A.G.P.; Taillie, C. Diversity as a Concept and its Measurement. J. Am. Stat. Assoc. **1982**, 77, 548–561.
27. Adelman, M.A. Comment on the “H” Concentration Measure as a Numbers-Equivalent. Rev. Econ. Stat. **1969**, 51, 99–101.
28. Jost, L. Partitioning Diversity into Independent Alpha and Beta Components. Ecology **2007**, 88, 2427–2439.
29. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. **1948**, 27, 379–423.
30. Eliazar, I. How random is a random vector? Ann. Phys. **2015**, 363, 164–184.
31. Gotelli, N.J.; Chao, A. Measuring and Estimating Species Richness, Species Diversity, and Biotic Similarity from Sampling Data. In Encyclopedia of Biodiversity, 2nd ed.; Levin, S.A., Ed.; Academic Press: Waltham, MA, USA, 2013; pp. 195–211.
32. Berger, W.H.; Parker, F.L. Diversity of planktonic foraminifera in deep-sea sediments. Science **1970**, 168, 1345–1347.
33. Daly, A.; Baetens, J.; De Baets, B. Ecological Diversity: Measuring the Unmeasurable. Mathematics **2018**, 6, 119.
34. Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. **1988**, 52, 479–487.
35. Simpson, E.H. Measurement of Diversity. Nature **1949**, 163, 688.
36. Gini, C. Variabilità e mutabilità. Contributo allo Studio delle Distribuzioni e delle Relazioni Statistiche; C. Cuppini: Bologna, Italy, 1912.
37. Shorrocks, A.F. The Class of Additively Decomposable Inequality Measures. Econometrica **1980**, 48, 613–625.
38. Jost, L. Mismeasuring biological diversity: Response to Hoffmann and Hoffmann (2008). Ecol. Econ. **2009**, 68, 925–928.
39. Pigou, A.C. Wealth and Welfare; MacMillan and Co., Ltd: London, England, 1912.
40. Dalton, H. The Measurement of the Inequality of Incomes. Econ. J. **1920**, 30, 348.
41. Macarthur, R.H. Patterns of species diversity. Biol. Rev. **1965**, 40, 510–533.
42. Lande, R. Statistics and partitioning of species diversity and similarity among multiple communities. Oikos **1996**, 76, 5–13.
43. Rao, C.R. Diversity and dissimilarity coefficients: A unified approach. Theor. Popul. Biol. **1982**, 21, 24–43.
44. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Distributed representations of words and phrases and their compositionality. In Proceedings of the NIPS 2013, Lake Tahoe, NV, USA, 5–10 December 2013; pp. 1–9.
45. Pennington, J.; Socher, R.; Manning, C. Glove: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
46. Nunes, A.; Alda, M.; Trappenberg, T. On the Multiplicative Decomposition of Heterogeneity in Continuous Assemblages. arXiv **2020**, arXiv:2002.09734.
47. Bromley, J.; Guyon, I.; LeCun, Y.; Säckinger, E.; Shah, R. Signature verification using a “siamese” time delay neural network. In Proceedings of the Advances in Neural Information Processing Systems 6, Denver, CO, USA, 29 November–2 December 1993; pp. 737–744.
48. Hadsell, R.; Chopra, S.; LeCun, Y. Dimensionality Reduction by Learning an Invariant Mapping. In Proceedings of the CVPR 2006, New York, NY, USA, 17–22 June 2006; pp. 1735–1742.

**Figure 1.** Illustration of a simple three-state system under which we compare existing non-categorical heterogeneity indices.

**Panel A** depicts a three-state system X as an undirected graph, with node sizes corresponding to state probabilities governed by Equation (24). As $\kappa \ge 0$ diverges further from $\kappa =1$, the probability distribution over states becomes more unequal.

**Panel B** visually represents the parametric pairwise distance matrix $\mathbf{D}(h,b)$ of Equation (25) (h is height, b is base length, ${D}_{ij}$ is the distance between states i and j). In the examples shown in Panels B and C, we set $b=1$. Specifically, we illustrate settings for which the distance function on X is a metric (Definition 1; when $h<b\sqrt{3}/2$) or an ultrametric (Definition 2; when $h\ge b\sqrt{3}/2$).

**Panel C** compares the numbers equivalent quadratic entropy (solid lines marked ${\widehat{Q}}_{e}$; Section 2.2.1), functional Hill numbers (at $q=1$, dashed lines marked ${F}_{1}$; Section 2.2.2), and the Leinster–Cobbold index (at $q=1$, dotted lines marked ${L}_{1}$; Section 2.2.3) for reporting the heterogeneity of X. The y-axis reports the value of the respective indices. The x-axis plots the height parameter for the distance matrix $\mathbf{D}(h,1)$ (Equation (25) and Panel B). The range of h over which $\mathbf{D}(h,1)$ is only a metric is depicted by the gray shaded background. The range of h shown with a white background is that for which $\mathbf{D}(h,1)$ is ultrametric. For each index, we plot values for a probability distribution over states that is perfectly even ($\kappa =1$; dotted markers) or skewed ($\kappa =10$; vertical line markers).

**Panel D** shows the sensitivity of the Leinster–Cobbold index (${L}_{1}$; y-axis) to the scaling parameter $u\ge 0$ (x-axis) used to transform a distance matrix into a similarity matrix (${S}_{ij}={e}^{-u{D}_{ij}}$). This is shown for three levels of skewness for the probability distribution over states (no skewness at $\kappa =1$, dotted markers; significant skewness at $\kappa =10$, vertical line markers; extreme skewness at $\kappa =100$, square markers).

**Figure 2.** Graphical illustration of the two main approaches for computing representational Rényi heterogeneity. In both cases, we map sampled points on an observable space $\mathcal{X}$ onto a latent space $\mathcal{Z}$, upon which we apply the Rényi heterogeneity measure. The mapping is illustrated by the curved arrows, and should yield a posterior distribution over the latent space.

**Panel A** shows the case in which the latent space is categorical (for example, discrete components of a mixture distribution on a continuous space).

**Panel B** illustrates the case in which the latent space has non-categorical topology. A special case of the latter mapping may include probabilistic principal components analysis. When the latent space is continuous, we must derive a parametric form for the Rényi heterogeneity.

**Figure 3.** Illustration of approaches to computing the pooled distribution on a simple representational space $\mathcal{Z}=\mathbb{R}$. In this example, two points on the observable space, $({\mathbf{x}}_{1},{\mathbf{x}}_{2})\in \mathcal{X}$, are mapped onto the latent space via model $f(\cdot |{\mathbf{x}}_{i})$ for $i\in \{1,2\}$, which indexes univariate Gaussians over $\mathcal{Z}$ (depicted as hatched patterns for ${\mathbf{x}}_{1}$ and ${\mathbf{x}}_{2}$, respectively). A pooled distribution computed non-parametrically by model-averaging (Equation (35)) is depicted as the solid black line. The parametrically pooled distribution (see Example 2) is depicted as the dashed black line. The parametric approach implies the assumption that further samples from $\mathcal{X}$ would yield latent space projections in some regions assigned low probability by $f(z|{\mathbf{x}}_{1})$ and $f(z|{\mathbf{x}}_{2})$.

**Figure 4.** Demonstration of the data-generating distribution (top row; Equations (45)–(47)), and the relationship between the representational model’s decision threshold (Equations (48) and (50)) and categorical representational Rényi heterogeneity (bottom row). The optimal decision boundary (Equation (50)) is shown as a gray vertical dashed line in all plots. Each column depicts a specific parameterization of the data-generating system (parameters are stated above the top row).

**Top Row:** Probability density functions for data-generating distributions. Shaded regions correspond to the two mixture components. Solid black lines denote the marginal distribution (Equation (47)). The x-axis represents the observable domain, which is the (0,1) interval.

**Bottom Row:** Categorical representational Rényi heterogeneity (RRH) for $q\in \{1,2,\infty \}$ as the category assignment threshold varies, for the beta-mixture models shown in the top row. The decision boundary is plotted on the x-axis. The y-axis shows the resulting between-observation RRH. Black dots highlight the RRH computed at the optimal decision boundary.

**Figure 5.** Comparison of categorical representational Rényi heterogeneity (${\mathsf{\Pi}}_{q}$), the functional Hill numbers (${F}_{q}$), the numbers equivalent quadratic entropy (${\widehat{Q}}_{e}$), and the Leinster–Cobbold index (${L}_{q}$) within the beta mixture model. Each row of plots corresponds to a given separation between the beta mixture components.

**Column 1** illustrates the beta mixture distributions upon which indices were compared. The x-axis plots the domain of the distribution (open interval between 0 and 1). The y-axis shows the corresponding probability density. Different line styles in Column 1 provide visual examples of the effect of changing the ${\theta}_{1}$ parameter over the range [0.5,1].

**Column 2** compares ${\mathsf{\Pi}}_{q}$ (solid line), ${F}_{q}$ (dashed line), and ${L}_{q}$ (dotted line), each at elasticity $q=1$. The x-axis shows the value of the $0.5\le {\theta}_{1}<1$ parameter at which the indices were compared. Index values are plotted along the y-axis.

**Column 3** compares the indices shown in Column 2, as well as ${\widehat{Q}}_{e}$ (dot-dashed line).

**Figure 6.** Sample images from the MNIST dataset [22].

**Figure 7.** **Panel A:** Illustration of the convolutional variational autoencoder (cVAE) [23]. The computational graph is depicted from top to bottom. An ${n}_{x}$-dimensional input ${\mathbf{X}}_{i}$ (white rectangle) is passed through an encoder (in our experiment this is a convolutional neural network, CNN), which parameterizes an ${n}_{z}$-dimensional multivariate Gaussian over the coordinates ${\mathbf{z}}_{i}$ for the image’s embedding on the latent space $\mathcal{Z}={\mathbb{R}}^{2}$. The latent embedding can then be passed through a decoder (blue rectangle), which is a neural network employing transposed convolutions (here denoted CNN$^{\top}$), to yield a reconstruction ${\widehat{X}}_{i}$ of the original input data. The objective function for this network is a variational lower bound on the model evidence of the input data (see Kingma and Welling [23] for details).

**Panel B:** Depiction of the latent space learned by the cVAE. This model was a pre-trained model from the Smart Geometry Processing Group at University College London (https://colab.research.google.com/github/smartgeometry-ucl/dl4g/blob/master/variational_autoencoder.ipynb).

**Figure 8.** Heterogeneity for the subset of MNIST training data belonging to each digit class respectively projected onto the latent space of the convolutional variational autoencoder (cVAE). The leftmost plot shows the pooled heterogeneity for each digit class (the effective total area of latent space occupied by encoding each digit class). The middle plot shows the within-observation heterogeneity (the effective total area of latent space per encoded observation of each digit class, respectively). The rightmost plot shows the between-observation heterogeneity (the effective number of observations per digit class). Recall that Rényi heterogeneity on a continuous distribution gives the effective size of the domain of an equally heterogeneous uniform distribution on the same space, which explains why the within-observation heterogeneity values here are less than 1.

**Figure 9.** Visual illustration of MNIST image samples corresponding to different levels of representational Rényi heterogeneity under the convolutional variational autoencoder (cVAE).

**Panel (a)** illustrates the approach to this analysis. Here, the surface $\mathcal{Z}$ shows hypothetical contours of a probability distribution over the 2-dimensional latent feature space. The surface $\mathcal{X}$ represents the observable space, upon which we have projected an “image” of the latent space $\mathcal{Z}$ for illustrative purposes. We first compute the expected latent locations $\mathbf{m}({\mathbf{x}}_{i})$ for each image ${\mathbf{x}}_{i}\in \mathcal{X}$. **(A$_{1}$)** We then define the latent neighbourhood of image ${\mathbf{x}}_{i}$ as the 49 images whose latent locations are closest to $\mathbf{m}({\mathbf{x}}_{i})$ in Euclidean distance. **(A$_{2}$)** Each coordinate in the neighbourhood of $\mathbf{m}({\mathbf{x}}_{i})$ is then projected onto a corresponding patch on the observable space of images. **(A$_{3}$)** These images are then projected as a group back onto the latent space, where Equation (57) can be applied, given equal weights over images, to compute the effective number of observations in the neighbourhood of ${\mathbf{x}}_{i}$.

**Panel (b)** plots the most and least heterogeneous neighbourhoods so that we may compare the estimated effective number of observations with the visually appreciable sample diversity.

**Table 1.** Relationships between Rényi heterogeneity and various diversity or inequality indices for a system X with event space $\mathcal{X}=\{1,2,\dots ,n\}$ and probability distribution $\mathbf{p}={\left({p}_{i}\right)}_{i=1,2,\dots ,n}$. The function $\mathbb{1}[\cdot ]$ is an indicator function that evaluates to 1 if its argument is true or to 0 otherwise.

Index | Expression |
---|---|
Observed richness [31] | ${\mathsf{\Pi}}_{0}\left(\mathbf{p}\right)={\sum}_{i=1}^{n}\mathbb{1}[{p}_{i}>0]$ |
Perplexity [30] | ${\mathsf{\Pi}}_{1}\left(\mathbf{p}\right)=\exp \left\{-{\sum}_{i=1}^{n}{p}_{i}\log {p}_{i}\right\}$ |
Inverse Simpson concentration [1] | ${\mathsf{\Pi}}_{2}\left(\mathbf{p}\right)={\left({\sum}_{i=1}^{n}{p}_{i}^{2}\right)}^{-1}$ |
Berger-Parker Diversity Index [32,33] | ${\mathsf{\Pi}}_{\infty}\left(\mathbf{p}\right)={\left({\max}_{i}{p}_{i}\right)}^{-1}$ |
Rényi entropy [18] | ${R}_{q}\left(\mathbf{p}\right)=\log {\mathsf{\Pi}}_{q}\left(\mathbf{p}\right)$ |
Shannon entropy [29] | $H\left(\mathbf{p}\right)=\log {\mathsf{\Pi}}_{1}\left(\mathbf{p}\right)$ |
Tsallis entropy [34] | ${T}_{q}\left(\mathbf{p}\right)=\frac{1}{q-1}\left(1-{\mathsf{\Pi}}_{q}{\left(\mathbf{p}\right)}^{1-q}\right)$ |
Simpson concentration [35] | $\mathrm{Simpson}\left(\mathbf{p}\right)={\left({\mathsf{\Pi}}_{2}\left(\mathbf{p}\right)\right)}^{-1}$ |
Gini-Simpson index [36] | $\mathrm{GSI}\left(\mathbf{p}\right)=1-\mathrm{Simpson}\left(\mathbf{p}\right)$ |
Generalized entropy index [3,37] | $\mathrm{GEI}\left(\mathbf{p}\right)=\frac{1}{q(q-1)}\left[{\left(\frac{1}{n}{\mathsf{\Pi}}_{q}\left(\mathbf{p}\right)\right)}^{1-q}-1\right]$ |
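The relationships in Table 1 can be sketched in a few lines of Python. This is a minimal illustration, assuming the general form ${\mathsf{\Pi}}_{q}\left(\mathbf{p}\right)={\left({\sum}_{i}{p}_{i}^{q}\right)}^{1/(1-q)}$, which reduces to the table's expressions at $q\in \{0,2\}$ and takes the listed limits at $q=1$ and $q=\infty $. The function names are hypothetical.

```python
import math


def renyi_heterogeneity(p, q):
    """Rényi heterogeneity (effective number of states) of order q."""
    support = [v for v in p if v > 0]            # zero-probability states never count
    if q == 1:                                   # perplexity
        return math.exp(-sum(v * math.log(v) for v in support))
    if math.isinf(q):                            # Berger-Parker diversity
        return 1.0 / max(support)
    return sum(v ** q for v in support) ** (1.0 / (1.0 - q))


def renyi_entropy(p, q):
    """Log of the heterogeneity; Shannon entropy when q == 1."""
    return math.log(renyi_heterogeneity(p, q))


def tsallis_entropy(p, q):
    """Tsallis entropy written in terms of the heterogeneity (q != 1)."""
    return (1.0 - renyi_heterogeneity(p, q) ** (1.0 - q)) / (q - 1.0)


def gini_simpson(p):
    """One minus the Simpson concentration, i.e. 1 - 1 / Pi_2."""
    return 1.0 - 1.0 / renyi_heterogeneity(p, 2)


p = [0.5, 0.25, 0.25, 0.0]
richness = renyi_heterogeneity(p, 0)
perplexity = renyi_heterogeneity(p, 1)
inverse_simpson = renyi_heterogeneity(p, 2)
berger_parker = renyi_heterogeneity(p, math.inf)
```

For a perfectly even distribution over $n$ states, every order $q$ returns $n$, which is the numbers-equivalent reading of the index.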

**Table 2.** Definitions in the formulation of classical biodiversity and economic equality analysis as categorical representational Rényi heterogeneity. Superscripted indexing on $\mathbf{x}={\left({x}_{i}\right)}^{i=1,\dots ,{n}_{x}}$ denotes that this is a row vector.

Symbol | Analytical context: Biodiversity | Analytical context: Economic Equality |
---|---|---|
X | Ecosystem, whose observation yields an organism denoted by vector $\mathbf{x}={\left({x}_{i}\right)}^{i=1,\dots ,{n}_{x}}\in \mathcal{X}$ | A system of resources, whose observation yields an asset denoted by vector $\mathbf{x}={\left({x}_{i}\right)}^{i=1,\dots ,{n}_{x}}\in \mathcal{X}$ |
$\mathcal{X}\subseteq {\mathbb{R}}^{{n}_{x}}$ | ${n}_{x}$-dimensional feature space of organisms in the ecosystem | ${n}_{x}$-dimensional feature space of assets in the economy, whose topology is such that the “economic” or monetary value is equal at each coordinate $\mathbf{x}\in \mathcal{X}$ |
$\mathcal{Z}=\left\{\mathbf{z}\in {\left\{0,1\right\}}^{{n}_{z}}:{\sum}_{i=1}^{{n}_{z}}{z}_{i}=1\right\}$ | ${n}_{z}$-dimensional space of one-hot species labels | ${n}_{z}$-dimensional space of one-hot labels over wealth-owning agents |
$\mathbf{f}:\mathcal{X}\to \mathcal{P}\left(\mathcal{Z}\right)$ | A model that performs the mapping $\mathbf{x}\mapsto \mathbf{f}\left(\mathbf{x}\right)$ of organisms to discrete probability distributions over $\mathcal{Z}$ | A model that performs the mapping $\mathbf{x}\mapsto \mathbf{f}\left(\mathbf{x}\right)$ of assets to discrete probability distributions over $\mathcal{Z}$ |
${N}_{i}\in {\mathbb{N}}_{+}$ | The number of organisms observed belonging to species $i\in \left\{1,\dots ,{n}_{z}\right\}$ | The number of equal valued assets belonging to agent $i\in \left\{1,\dots ,{n}_{z}\right\}$ |
$N={\sum}_{i=1}^{{n}_{z}}{N}_{i}$ | The total number of organisms observed | The total quantity of assets observed |
$\mathbf{X}={\left({x}_{ij}\right)}_{i=1,\dots ,N}^{j=1,\dots ,{n}_{x}}$ | A sample of N organisms | A sample of N assets |
$\mathbf{w}={\left({w}_{i}\right)}_{i=1,\dots ,N}$ | Sample weights, such that ${w}_{i}\ge 0$ and ${\sum}_{i=1}^{N}{w}_{i}=1$ | Sample weights, such that ${w}_{i}\ge 0$ and ${\sum}_{i=1}^{N}{w}_{i}=1$ |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Nunes, A.; Alda, M.; Bardouille, T.; Trappenberg, T. Representational Rényi Heterogeneity. *Entropy* **2020**, *22*, 417.
https://doi.org/10.3390/e22040417
