Multiplicative Decomposition of Heterogeneity in Mixtures of Continuous Distributions

A system’s heterogeneity (diversity) is the effective size of its event space, and can be quantified using the Rényi family of indices (also known as Hill numbers in ecology or Hannah–Kay indices in economics), which are indexed by an elasticity parameter q≥0. Under these indices, the heterogeneity of a composite system (the γ-heterogeneity) is decomposable into heterogeneity arising from variation within and between component subsystems (the α- and β-heterogeneity, respectively). Since the average heterogeneity of a component subsystem should not be greater than that of the pooled system, we require that γ≥α. There exists a multiplicative decomposition for Rényi heterogeneity of composite systems with discrete event spaces, but less attention has been paid to decomposition in the continuous setting. We therefore describe multiplicative decomposition of the Rényi heterogeneity for continuous mixture distributions under parametric and non-parametric pooling assumptions. Under non-parametric pooling, the γ-heterogeneity must often be estimated numerically, but the multiplicative decomposition holds such that γ≥α for q>0. Conversely, under parametric pooling, γ-heterogeneity can be computed efficiently in closed-form, but the γ≥α condition holds reliably only at q=1. Our findings will further contribute to heterogeneity measurement in continuous systems.


Introduction
Measurement of heterogeneity is important across many scientific disciplines. Ecologists are interested in the heterogeneity of ecosystems' biological composition (biodiversity) [1], economists are interested in the heterogeneity of resource ownership (wealth equality) [2], and medical researchers and physicians are interested in the heterogeneity of diseases and their presentations [3]. Using Rényi heterogeneity [3][4][5], which for categorical random variables corresponds to ecologists' Hill numbers [6] and economists' Hannah-Kay indices [7], one can measure a system's heterogeneity as its effective number of distinct configurations.
The heterogeneity of a mixture or ensemble of systems is often known as γ-heterogeneity, and is generated by variation occurring within and between constituent subsystems. A good heterogeneity measure will facilitate decomposition of γ-heterogeneity into α (within subsystem) and β (between subsystem) components. Under this decomposition, we require that γ ≥ α, since it is counterintuitive that the heterogeneity of the overall ensemble should be less than that of any of its constituents, let alone the "average" subsystem [8,9]. Such a decomposition was introduced by Jost [9] for systems represented on discrete event spaces (such as representations of organisms by species labels). However, many data are better modeled by continuous embeddings, including word semantics [10][11][12], genetic population structure [13], and natural images [14]. Unfortunately, considerably less is understood about how to decompose Rényi heterogeneity in such cases, where data are represented on non-categorical spaces [4]. Although there are decomposable functional diversity indices expressed in numbers equivalent, they require categorical partitioning of the data (in order to supply species (dis)similarity matrices) [15][16][17][18] and setting sensitivity or threshold parameters for (dis)similarities [16,18]. For many research applications, such as those in psychiatry [3,4,19] or involving unsupervised learning [13,14], we may not have categorical partitions of the observable space that are valid, reliable, and of semantic relevance. If we are to apply Rényi heterogeneity to such continuous-space systems, then we must demonstrate that its multiplicative decomposition of γ-heterogeneity into α and β components is retained.
Therefore, our present work extends the Jost [9] multiplicative decomposition of Rényi heterogeneity to the analysis of continuous systems, and provides conditions under which the γ ≥ α condition is satisfied. In Section 2, we introduce decomposition of the Rényi heterogeneity in categorical and continuous systems. Specifically, we highlight that the most important decision guiding the availability of a decomposition is how one defines the distribution over the mixture of subsystems. We show that, for non-parametrically pooled systems (i.e., finite mixture models, illustrated in Section 3), the γ ≥ α condition can hold for all values of the Rényi elasticity parameter q > 0, but that γ-heterogeneity will generally require numerical estimation. Section 4 introduces decomposition of Rényi heterogeneity under parametric assumptions on the pooled system's distribution. In this case, which amounts to a Gaussian mixed-effects model (as commonly implemented in biomedical meta-analyses), we show that γ ≥ α will hold at q = 1, though not necessarily at q ≠ 1. Finally, in Section 5, we discuss the implications of our findings and scenarios in which parametric or non-parametric pooling assumptions might be particularly useful.

Categorical Rényi Heterogeneity Decomposition
In this section, we consider the definition and decomposition of Rényi heterogeneity for a composite random variable (or "system") that we call a discrete mixture (Definition 1).

Definition 1 (Discrete Mixture). A random variable or system X is called a discrete mixture when it is defined on an n-dimensional discrete state space $\mathcal{X} = \{1, 2, \ldots, n\}$ with probability distribution $\bar{p} = (\bar{p}_i)_{i=1,2,\ldots,n}$, where $\bar{p}_i$ is the probability that X is observed in state $i \in \mathcal{X}$. Furthermore, let X be an aggregation of N component subsystems $X_1, X_2, \ldots, X_N$ with corresponding probability distributions $P = (p_{ij})_{i=1,2,\ldots,N}^{j=1,2,\ldots,n}$. The proportion of X attributable to each component is governed by the weights $w = (w_i)_{i=1,2,\ldots,N}$, where $\sum_{i=1}^{N} w_i = 1$ and $0 \le w_i \le 1$.

Let X be a discrete mixture. The Rényi heterogeneity for the i-th component is
$$\Pi_q(X_i) = \left( \sum_{j=1}^{n} p_{ij}^q \right)^{\frac{1}{1-q}}, \tag{1}$$
which is the effective number of states in $X_i$. Assuming the pooled distribution over discrete mixture X is a weighted average of subsystem distributions, $\bar{p} = P w$, the γ-heterogeneity is thus
$$\Pi_q^\gamma(X) = \left( \sum_{j=1}^{n} \bar{p}_j^q \right)^{\frac{1}{1-q}}, \tag{2}$$
which we interpret as the effective number of states in the pooled system X.

Jost [9] proposed the following decomposition of γ-heterogeneity:
$$\Pi_q^\gamma(X) = \Pi_q^\alpha(X) \, \Pi_q^\beta(X), \tag{3}$$
where $\Pi_q^\alpha(X)$ and $\Pi_q^\beta(X)$ are summary measures of heterogeneity due to variation within and between subsystems, respectively. Since the γ factor has units of effective number of states in the pooled system, and α has units of effective number of states per component, then
$$\Pi_q^\beta(X) = \frac{\Pi_q^\gamma(X)}{\Pi_q^\alpha(X)} \tag{4}$$
yields the effective number of components in X.

For discrete mixtures, Jost [9] specified the functional form for α-heterogeneity as
$$\Pi_q^\alpha(X) = \left( \sum_{i=1}^{N} \frac{w_i^q}{\sum_{k=1}^{N} w_k^q} \sum_{j=1}^{n} p_{ij}^q \right)^{\frac{1}{1-q}}, \tag{5}$$
which allows the decomposition in Equation (3) to satisfy the following desiderata:
1. The α and β components are independent [20].
2. The within-group heterogeneity is a lower bound on total heterogeneity [8]: $\Pi_q^\alpha(X) \le \Pi_q^\gamma(X)$.
3. The α-heterogeneity is a form of average heterogeneity over groups.
4. The α and β components are both expressed in numbers equivalent.

Specifically, Jost [9] proved that $\Pi_q^\gamma(X) \ge \Pi_q^\alpha(X)$ is guaranteed for all q ≥ 0 when $w_i = w_j$ for all $i, j \in \{1, 2, \ldots, N\}$, or for unequal weights w if the elasticity is set to the Shannon limit of q → 1.
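As a numerical illustration of the discrete decomposition described above, the following minimal Python sketch computes γ-, α-, and β-heterogeneity for a small discrete mixture, using the Jost [9] α-heterogeneity with normalized q-th-power weights. The function names (`renyi_heterogeneity`, `decompose`) and the example distributions are hypothetical and chosen only for illustration; the q = 1 case is handled through the usual Shannon limit.

```python
import math

def renyi_heterogeneity(p, q):
    """Effective number of states (Hill number of order q) of probability vector p."""
    p = [x for x in p if x > 0]
    if q == 1:
        # Shannon limit: exponential of the Shannon entropy
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

def decompose(P, w, q):
    """Return (gamma, alpha, beta) for component rows P and weights w (assumed to sum to 1)."""
    n_states = len(P[0])
    pooled = [sum(w_i * row[j] for w_i, row in zip(w, P)) for j in range(n_states)]
    gamma = renyi_heterogeneity(pooled, q)
    if q == 1:
        # Weighted geometric mean of the component heterogeneities
        alpha = math.exp(sum(w_i * math.log(renyi_heterogeneity(row, 1))
                             for w_i, row in zip(w, P)))
    else:
        z = sum(w_i ** q for w_i in w)
        inner = sum((w_i ** q / z) * sum(x ** q for x in row if x > 0)
                    for w_i, row in zip(w, P))
        alpha = inner ** (1.0 / (1.0 - q))
    return gamma, alpha, gamma / alpha

# Hypothetical example: two equally weighted components on disjoint pairs of states
P = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5]]
w = [0.5, 0.5]
for q in (0, 1, 2):
    gamma, alpha, beta = decompose(P, w, q)
    print(q, round(gamma, 6), round(alpha, 6), round(beta, 6))
```

In this example, each component occupies two states and the components do not overlap, so one would expect roughly four effective states in total, two per component, and two effective components at every q.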

Continuous Rényi Heterogeneity Decomposition
Let X be a non-parametric continuous mixture according to Definition 2. Despite individual mixture components in X potentially having parametric probability density functions, we call this a "non-parametric" mixture because the distribution over pooled components does not assume the form of a known parametric family.

Definition 2 (Non-Parametric Continuous Mixture). A non-parametric continuous mixture is a random variable X defined on an n-dimensional continuous space $\mathcal{X} \subseteq \mathbb{R}^n$, and composed of subsystems $X_1, X_2, \ldots, X_N$, with respective probability density functions $f(x) = \{f_i(x)\}_{i=1,2,\ldots,N}$ and weights $w = (w_i)_{i=1,2,\ldots,N}$ such that $\sum_{i=1}^{N} w_i = 1$ and $0 \le w_i \le 1$. The pooled probability density over X is defined as
$$\bar{f}(x) = \sum_{i=1}^{N} w_i f_i(x). \tag{6}$$

The continuous Rényi heterogeneity for the i-th subsystem of X is
$$\Pi_q(X_i) = \left( \int_{\mathcal{X}} f_i^q(x) \, dx \right)^{\frac{1}{1-q}}, \tag{7}$$
whose interpretation is given by Proposition 1 (see Proposition A3 in Nunes et al. [5] for the proof), which we henceforth call the "effective volume" of the event space or domain of $X_i$.

Proposition 1 (Rényi Heterogeneity of a Continuous Random Variable). The Rényi heterogeneity of a continuous random variable X defined on event space $\mathcal{X} \subseteq \mathbb{R}^n$ with probability density function f is equal to the magnitude of the volume of an n-cube over which there is a uniform probability density with the same Rényi heterogeneity as that in X.

Given the pooled distribution as defined in Equation (6), the Rényi heterogeneity over the mixture, which is the γ-heterogeneity, is
$$\Pi_q^\gamma(X) = \left( \int_{\mathcal{X}} \bar{f}^q(x) \, dx \right)^{\frac{1}{1-q}}. \tag{8}$$
The γ-heterogeneity is thus the total effective volume of X's domain. The α-heterogeneity represents the effective volume per mixture component in X, and is computed as follows:
$$\Pi_q^\alpha(X) = \left( \sum_{i=1}^{N} \frac{w_i^q}{\sum_{k=1}^{N} w_k^q} \int_{\mathcal{X}} f_i^q(x) \, dx \right)^{\frac{1}{1-q}}. \tag{9}$$
Given Equations (8) and (9), the following theorem provides conditions under which γ ≥ α is satisfied for a non-parametric continuous mixture. The proof is analogous to that given by Jost [9] for discrete mixtures, and is detailed in Appendix A.

Theorem 1. If X is a non-parametric continuous mixture (Definition 2), with γ-heterogeneity specified by Equation (8) and α-heterogeneity given by Equation (9), then $\Pi_q^\gamma(X) \ge \Pi_q^\alpha(X)$ under the following conditions:
1. q = 1.
2. q > 0 when weights are equal for all mixture components.

. . , N}, then a closed form expression for $\Pi_q^\alpha(X)$ will be available. If $\int_{\mathcal{X}} \bar{f}^q(x) \, dx$ is also analytically tractable, then $\Pi_q^\beta(X)$ will be too. However, this will depend entirely on the functional form of $\bar{f}$, and will rarely be the case using real-world data. In the majority of cases, $\int_{\mathcal{X}} \bar{f}^q(x) \, dx$ will have to be computed numerically.
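To make the numerical route concrete, the sketch below estimates γ-, α-, and β-heterogeneity for a one-dimensional non-parametric mixture with a simple midpoint rule. It is only an illustration under stated assumptions: the helper names (`renyi_continuous`, `decompose_continuous`, `uniform`), the integration limits, and the choice of uniform components are hypothetical, and the midpoint rule stands in for whatever quadrature or Monte Carlo scheme one would use in practice. Uniform components are used because their effective volume is simply the length of their support, which makes the output easy to check by hand.

```python
import math

def renyi_continuous(density, lo, hi, q, steps=20000):
    """Midpoint-rule estimate of the Renyi heterogeneity of a 1-D density on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        fx = density(lo + (k + 0.5) * h)
        if fx > 0.0:
            # q = 1 accumulates differential entropy; otherwise the integral of f^q
            total += (-fx * math.log(fx) if q == 1 else fx ** q) * h
    return math.exp(total) if q == 1 else total ** (1.0 / (1.0 - q))

def decompose_continuous(densities, weights, lo, hi, q, steps=20000):
    """Return (gamma, alpha, beta) for a non-parametrically pooled 1-D mixture."""
    def pooled(x):
        return sum(w * f(x) for w, f in zip(weights, densities))

    gamma = renyi_continuous(pooled, lo, hi, q, steps)
    parts = [renyi_continuous(f, lo, hi, q, steps) for f in densities]
    if q == 1:
        alpha = math.exp(sum(w * math.log(v) for w, v in zip(weights, parts)))
    else:
        z = sum(w ** q for w in weights)
        # v ** (1 - q) recovers the integral of f_i^q from each component's heterogeneity
        inner = sum((w ** q / z) * v ** (1.0 - q) for w, v in zip(weights, parts))
        alpha = inner ** (1.0 / (1.0 - q))
    return gamma, alpha, gamma / alpha

def uniform(a, b):
    """Density of a uniform distribution on [a, b]."""
    return lambda x: 1.0 / (b - a) if a <= x <= b else 0.0

# Hypothetical example: two equally weighted uniforms with disjoint supports of length 2
gamma, alpha, beta = decompose_continuous(
    [uniform(0, 2), uniform(3, 5)], [0.5, 0.5], -1.0, 6.0, q=2)
print(gamma, alpha, beta)
```

Up to discretization error, this example should report a total effective volume near 4, an effective volume per component near 2, and about 2 effective components.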

Rényi Heterogeneity Decomposition under a Non-Parametric Pooling Distribution
Definition 3 defines a general Gaussian mixture X as a weighted combination of component Gaussian random variables, without identifying the functional form of the composition. The non-parametric Gaussian mixture, where the distribution over X is a simple model average over its Gaussian components, is specified in Definition 4.

Definition 3 (Gaussian Mixture). The n-dimensional Gaussian mixture X is a weighted combination of the set of n-dimensional Gaussian random variables $\{X_i\}_{i=1,2,\ldots,N}$ with weights $w = (w_i)_{i=1,2,\ldots,N}$ such that $0 \le w_i \le 1$ and $\sum_{i=1}^{N} w_i = 1$. The probability density function of component $X_i$ is denoted $\mathcal{N}(x|\mu_i, \Sigma_i)$, and is parameterized by an n × 1 mean vector $\mu_i$ and n × n covariance matrix $\Sigma_i$.

Definition 4 (Non-Parametric Gaussian Mixture). We define the random variable X as a non-parametric Gaussian mixture if it is a Gaussian mixture (Definition 3) whose probability density function is defined as
$$\bar{f}(x|\mu_{1:N}, \Sigma_{1:N}) = \sum_{i=1}^{N} w_i \, \mathcal{N}(x|\mu_i, \Sigma_i), \tag{11}$$
where $\mu_{1:N}$ and $\Sigma_{1:N}$ denote the set of component mean vectors $\mu_1, \ldots, \mu_N$ and covariance matrices $\Sigma_1, \ldots, \Sigma_N$, respectively.
We now introduce the Rényi heterogeneity of a single n-dimensional Gaussian random variable (Proposition 2) and subsequently characterize the γ-, α-, and β-heterogeneity values for a non-parametric Gaussian mixture.
Proposition 2 (Rényi Heterogeneity of a Multivariate Gaussian). The Rényi heterogeneity of an n-dimensional Gaussian random variable X with mean µ and covariance matrix Σ is
$$\Pi_q(X) = \begin{cases} (2\pi e)^{n/2} \, |\Sigma|^{1/2} & q = 1 \\ (2\pi)^{n/2} \, q^{\frac{n}{2(q-1)}} \, |\Sigma|^{1/2} & q > 0, \ q \neq 1. \end{cases} \tag{12}$$

The proof of Proposition 2 is included in Appendix A. Unfortunately, a closed form solution such as Equation (12) cannot be obtained for the γ-heterogeneity of a non-parametric Gaussian mixture,
$$\Pi_q^\gamma(X) = \left( \int_{\mathcal{X}} \left( \sum_{i=1}^{N} w_i \, \mathcal{N}(x|\mu_i, \Sigma_i) \right)^q dx \right)^{\frac{1}{1-q}}, \tag{13}$$
which must be computed numerically to yield the effective size of the mixture's domain. This process may be computationally expensive, particularly in high dimensions. Conversely, Equation (9), which yields the effective size of the domain per mixture component, can be evaluated in closed form for a Gaussian mixture:
$$\Pi_q^\alpha(X) = \begin{cases} (2\pi e)^{n/2} \exp\left( \frac{1}{2} \sum_{i=1}^{N} w_i \log |\Sigma_i| \right) & q = 1 \\ (2\pi)^{n/2} \, q^{\frac{n}{2(q-1)}} \left( \sum_{i=1}^{N} \frac{w_i^q}{\sum_{k=1}^{N} w_k^q} \, |\Sigma_i|^{\frac{1-q}{2}} \right)^{\frac{1}{1-q}} & q > 0, \ q \neq 1. \end{cases} \tag{14}$$
The β-heterogeneity, which returns the effective number of components in the mixture, can then be computed using Equation (4). Example 1 demonstrates an important property of considering X as a non-parametric Gaussian mixture: that low-probability regions of the domain between well-separated components will have little to no effect on the γ- or β-heterogeneity estimates.

Example 1 (Decomposition of Rényi heterogeneity in a univariate Gaussian mixture). Consider three non-parametric Gaussian mixtures $X^{(1)}, X^{(2)}, X^{(3)}$ defined on $\mathbb{R}$ whose numbers of components are respectively $N_1 = 2$, $N_2 = 3$, and $N_3 = 4$. Components in each mixture are equally weighted (that is, the components of mixture $X^{(j)}$ have weights $w_i = 1/N_j$ for all $i \in \{1, \ldots, N_j\}$) and have equal standard deviation σ = 0.5. This yields a per-component Rényi heterogeneity of approximately 2.07 at q = 1, which is also consequently the α-heterogeneity for each Gaussian mixture. Figure 1 demonstrates the multiplicative decomposition of Rényi heterogeneity (at q = 1) in these Gaussian mixtures, where γ-heterogeneity was computed numerically, across varying separations of the respective mixtures' component means. Note that the β-heterogeneity in this case represents the effective number of distinct components in the mixture distribution, and is bounded between 1 (when all components overlap) and $N_j$ (when all components are well separated). Further separating the mixture components beyond the point at which β-heterogeneity reaches $N_j$ yielded no further increase in β-heterogeneity.
Assuming sufficiently accurate approximation of the integral in Equation (13), the γ-heterogeneity in Example 1 appears to reach a limit corresponding to the sum of effective domain sizes under all mixture components, and the β-heterogeneity reaches a limit corresponding to the number of individual mixture components.
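The behaviour described in Example 1 can be approximated with a short script. The sketch below is not the code used to produce Figure 1; it assumes evenly spaced component means (a hypothetical layout), uses a plain midpoint rule for the γ-heterogeneity at q = 1, and uses the closed-form heterogeneity of a univariate Gaussian for α. The function names and the chosen separations are illustrative.

```python
import math

def gaussian_heterogeneity(sigma, q):
    """Closed-form Renyi heterogeneity of a univariate Gaussian with standard deviation sigma."""
    if q == 1:
        return math.sqrt(2.0 * math.pi * math.e) * sigma
    return math.sqrt(2.0 * math.pi) * sigma * q ** (1.0 / (2.0 * (q - 1.0)))

def mixture_gamma_q1(means, sigma, steps=20000):
    """Midpoint-rule estimate of gamma-heterogeneity at q = 1 for an equally weighted mixture."""
    lo, hi = min(means) - 10.0 * sigma, max(means) + 10.0 * sigma
    h = (hi - lo) / steps
    norm = 1.0 / (len(means) * sigma * math.sqrt(2.0 * math.pi))
    entropy = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        fx = norm * sum(math.exp(-0.5 * ((x - m) / sigma) ** 2) for m in means)
        if fx > 0.0:
            entropy -= fx * math.log(fx) * h
    return math.exp(entropy)

sigma = 0.5
# Equal weights and equal widths: alpha equals the per-component heterogeneity
alpha = gaussian_heterogeneity(sigma, 1)
print("alpha:", round(alpha, 3))
for n_components in (2, 3, 4):
    for separation in (0.0, 1.0, 10.0):
        means = [separation * i for i in range(n_components)]
        beta = mixture_gamma_q1(means, sigma) / alpha
        print(n_components, separation, round(beta, 3))
```

With zero separation the estimated β should sit near 1, and with a separation of 20 standard deviations it should sit near the number of components, consistent with the limits discussed above.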
Unfortunately, computation of β-heterogeneity in a non-parametric Gaussian mixture will yield results whose accuracy will depend on the error of numerical integration, and which may consume significant computational resources when evaluated for large N (many components) and large n (high dimension). Monte Carlo integration may be preferable for high dimensional mixture distributions, but running samplers can still be costly if the γ-heterogeneity must be estimated many times. Although the non-parametric pooling approach may be the only available method for many distribution classes, a computationally efficient parametric pooling approach exists for Gaussian mixtures, to which we now turn our attention.

Rényi Heterogeneity Decomposition Under a Parametric Pooling Distribution
This section introduces the parametric Gaussian mixture (Definition 5). This is essentially an ensemble of individual Gaussian distributions whose means and covariance matrices are weighted and pooled to obtain the mean and covariance matrix of the mixture as a whole. We subsequently provide conditions under which decomposition of the parametric Gaussian mixture's heterogeneity satisfies the requirement that α-heterogeneity be a lower bound on γ-heterogeneity (Theorem 2). Parametric Gaussian mixtures are an important class of models commonly used in mixed-effects meta-analyses [21], where one models the effect size of each of $K \in \mathbb{N}_+$ studies as Gaussians whose means are themselves Gaussian distributed with "true" effect size $\mu_*$ and variance $\tau^2$. The variance of the true effect, $\tau^2$, is often taken as an index of between-study heterogeneity, but unfortunately variance does not satisfy the replication principle [4]. A parametric Gaussian mixture can also be used to measure the effective number of natural images embedded in the real-valued latent space of a variational autoencoder (a probabilistic deep learning model used to learn compressed representations of high-dimensional data) [5].

Definition 5 (Parametric Gaussian Mixture). We define the random variable X as an n-dimensional parametric Gaussian mixture if it is a Gaussian mixture (Definition 3) whose probability density function is defined as
$$f(x|\mu_*, \Sigma_*) = \mathcal{N}(x|\mu_*, \Sigma_*), \tag{15}$$
with pooled mean vector
$$\mu_* = \sum_{i=1}^{N} w_i \mu_i \tag{16}$$
and pooled covariance matrix
$$\Sigma_* = -\mu_* \mu_*^\top + \sum_{i=1}^{N} w_i \left( \Sigma_i + \mu_i \mu_i^\top \right). \tag{17}$$

The efficiency of assuming a parametric, rather than non-parametric, Gaussian mixture is that γ-heterogeneity for the former may be computed in closed form using Equation (12) (it is simply a function of Equation (17)). However, the critical difference between the parametric and non-parametric Gaussian mixture assumptions is that, under the parametric assumption, γ-heterogeneity (and therefore β-heterogeneity) will depend on the component means $\mu_{1:N}$, according to the following Lemma.

..,N are identical between X and X′. Finally, let $\Sigma_*$ and $\Sigma'_*$ be the pooled covariance matrices for X and X′, respectively. Then, for all c ≥ 1, we have that
$$|\Sigma'_*| \ge |\Sigma_*|,$$
with equality if c = 1.
Lemma 1, whose proof is detailed in Appendix A, implies that the resulting β-heterogeneity of a parametric Gaussian mixture will increase as the mixture component means are spread further apart. This follows from the fact that Equation (14), which is computed component-wise, remains a valid expression of the α-heterogeneity in a parametric Gaussian mixture.
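A small univariate sketch may help illustrate how parametric pooling responds to the spread of component means. It assumes the pooled mean and variance follow the usual mixture-moment formulas (pooled variance equal to the weighted mean of component second moments minus the squared pooled mean), and it evaluates only the q = 1 case. The function names, the base means, and the scaling factors c are hypothetical choices for illustration.

```python
import math

def pooled_moments(means, sigmas, weights):
    """Pooled mean and variance of a univariate parametric Gaussian mixture."""
    mu = sum(w * m for w, m in zip(weights, means))
    var = sum(w * (s ** 2 + m ** 2) for w, m, s in zip(weights, means, sigmas)) - mu ** 2
    return mu, var

def parametric_decomposition_q1(means, sigmas, weights):
    """Return (gamma, alpha, beta) at q = 1 under parametric (Gaussian) pooling."""
    _, var = pooled_moments(means, sigmas, weights)
    gamma = math.sqrt(2.0 * math.pi * math.e * var)
    # alpha at q = 1: weighted geometric mean of the component heterogeneities
    alpha = math.sqrt(2.0 * math.pi * math.e) * math.exp(
        sum(w * math.log(s) for w, s in zip(weights, sigmas)))
    return gamma, alpha, gamma / alpha

# Hypothetical example: scale a fixed set of component means by c >= 1
base_means = [-1.0, 0.0, 1.0]
sigmas = [0.5, 0.5, 0.5]
weights = [1 / 3, 1 / 3, 1 / 3]
for c in (1.0, 2.0, 5.0, 10.0):
    gamma, alpha, beta = parametric_decomposition_q1(
        [c * m for m in base_means], sigmas, weights)
    print(c, round(gamma, 3), round(alpha, 3), round(beta, 3))
```

In this setting α stays fixed while γ and β keep growing with c, so β is not capped at the number of components as it is under non-parametric pooling.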
Before stating the conditions under which α is a lower bound on γ for a parametric Gaussian mixture (Theorem 2), we introduce the following Lemma, whose proof is left to Appendix A.

Lemma 2.
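If $\{\Sigma_i\}_{i=1,2,\ldots,N}$ is a set of $N \in \mathbb{N}_{\ge 2}$ positive semidefinite n × n matrices with corresponding weights $w = (w_i)_{i=1,2,\ldots,N}$ such that $0 \le w_i \le 1$ and $\sum_{i=1}^{N} w_i = 1$, then
$$\log \left| \sum_{i=1}^{N} w_i \Sigma_i \right| \ge \sum_{i=1}^{N} w_i \log |\Sigma_i|.$$

Theorem 2. The Rényi β-heterogeneity of order q = 1 of a parametric Gaussian mixture X (Definition 5) has a lower bound of 1:
$$\Pi_1^\beta(X) \ge 1.$$

Proof. Recall that $\Pi_q^\alpha(X)$ is independent of the mean vectors of components in X (Equation (14)). Furthermore, it follows from Lemma 1 that, if $\mu_{1:N} = \{0\}_{i=1,2,\ldots,N}$, where 0 is an n × 1 zero vector, then for any parametric Gaussian mixture X′ with means $\mu'_{1:N}$, we will have $\Pi_q^\gamma(X') \ge \Pi_q^\gamma(X)$, where equality is obtained if $\mu'_{1:N}$ are also zero vectors, or if the covariance of mean vectors in X′ is otherwise singular. Thus, it suffices to prove our theorem under the assumption that $\mu_{1:N} = \{0\}_{i=1,2,\ldots,N}$, where the pooled covariance of X is redefined as
$$\Sigma_* = \sum_{i=1}^{N} w_i \Sigma_i.$$
The expression for $\Pi_1^\beta(X)$ is thus
$$\Pi_1^\beta(X) = \frac{(2\pi e)^{n/2} \left| \sum_{i=1}^{N} w_i \Sigma_i \right|^{1/2}}{(2\pi e)^{n/2} \exp\left( \frac{1}{2} \sum_{i=1}^{N} w_i \log |\Sigma_i| \right)} \ge 1,$$
which, after simplification, can be appreciated to satisfy Lemma 2.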
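The determinant inequality underlying Theorem 2 can be spot-checked numerically. The sketch below is a sanity check rather than a proof: it draws random 2 × 2 symmetric positive definite matrices and random weights (both hypothetical test inputs) and evaluates the gap between the log-determinant of the weighted average and the weighted average of log-determinants, which equals twice the log of β at q = 1 when all component means coincide. The helper names are illustrative.

```python
import math
import random

def det2(m):
    """Determinant of a 2 x 2 matrix stored as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def random_spd2(rng):
    """Random symmetric positive definite 2 x 2 matrix of the form A A^T + 0.1 I."""
    a, b, c, d = (rng.uniform(-2.0, 2.0) for _ in range(4))
    return [[a * a + b * b + 0.1, a * c + b * d],
            [a * c + b * d, c * c + d * d + 0.1]]

def logdet_gap(sigmas, weights):
    """log|sum_i w_i Sigma_i| minus sum_i w_i log|Sigma_i| for 2 x 2 matrices."""
    pooled = [[sum(w * s[r][c] for w, s in zip(weights, sigmas)) for c in range(2)]
              for r in range(2)]
    return math.log(det2(pooled)) - sum(
        w * math.log(det2(s)) for w, s in zip(weights, sigmas))

rng = random.Random(0)  # fixed seed so the check is repeatable
gaps = []
for _ in range(1000):
    n = rng.randint(2, 5)
    raw = [rng.random() + 1e-3 for _ in range(n)]
    weights = [r / sum(raw) for r in raw]
    gaps.append(logdet_gap([random_spd2(rng) for _ in range(n)], weights))
print("smallest gap over 1000 draws:", min(gaps))
```

A non-negative smallest gap (up to floating-point rounding) is what the bound β ≥ 1 at q = 1 would lead one to expect.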
Although Theorem 2 highlights the reliability and flexibility of using elasticity q = 1, we must emphasize that q = 1 may not be the only condition under which $\Pi_q^\gamma(X) \ge \Pi_q^\alpha(X)$, as suggested by Example 2. Indeed, Example 2 suggests that the integrity of this bound on β-heterogeneity at elasticity values q ≠ 1 may depend in various ways on the unique combination of component-wise parameters in a parametric Gaussian mixture.

Example 2 (Decomposition of Rényi Heterogeneity in a Parametric Gaussian Mixture). Consider a parametric Gaussian mixture X with four components defined on $\mathbb{R}$ (for instance, Figure 2A). The components' respective standard deviations are σ = (0.5, 0.8, 1.1, 1.6). We vary the column vector of mixture component weights $w = (w_i)_{i=1,\ldots,4}$ according to a function that "skews" the distribution of weights over components in X according to the value of a skew parameter a ≥ 0 (shown in Figure 2B). As the parameter a decreases further below 1, components $X_1$ and $X_2$ (which have the narrowest distributions) become preferentially weighted. Conversely, as a increases above 1, components $X_3$ and $X_4$ are preferentially weighted. At a = 1, all components are equally weighted (depicted as the dashed black lines in Figure 2B-F). Figures 2C-E plot the γ-, α-, and β-heterogeneity for the parametric Gaussian mixture at q = 1, respectively, while Figure 2F computes the β-heterogeneity at q ≠ 1 for variously skewed weight distributions. Note that, when the weight distributions are skewed, there is a discontinuity in β-heterogeneity around q = 1. When the skew parameter results in a distribution of weights whose ranking of components agrees with the rank order of component distribution widths (that is, when the largest components of σ also have the highest weights), then β-heterogeneity appears to exceed 1 for q > 1. However, when the component weights and distribution widths are anti-correlated (when the largest components of σ have the smallest weights, and vice versa), then we observe values of β-heterogeneity below 1 at values of q > 1, as well as for some values of q < 1.

The γ-heterogeneity values of parametric Gaussian mixtures were computed by pooling component means and variances according to Definition 5, to which we applied Equation (12).
The γ-heterogeneity values of non-parametric Gaussian mixtures (Equation (13)) were computed using numerical integration, as well as in closed form using second-order asymptotic approximation. In all cases, the α-heterogeneity reduced simply to the Rényi heterogeneity of a single univariate Gaussian with unit variance. Figure 3 further highlights that the β-heterogeneity of uniformly weighted non-parametric Gaussian mixtures tends to approach the number of individual components in the system. Conversely, the β-heterogeneity of parametric Gaussian mixtures continues increasing. In fact, one can show that, as the separation between mixture components becomes large, the β-heterogeneity approaches a linear rate of growth (Appendix B).

Discussion
This paper provided approaches for multiplicative decomposition of heterogeneity in continuous mixture distributions, thereby extending the earlier work on discrete space heterogeneity decomposition presented by Jost [9]. Two approaches were offered, dependent upon whether the distribution over the pooled system is defined either parametrically or non-parametrically. Our results improve the understanding of heterogeneity measurement in non-categorical systems by providing conditions under which decomposition of heterogeneity into α and β components conforms to the intuitive property that γ ≥ α.
If one defines the pooled mixture non-parametrically, as in a finite mixture model, heterogeneity is decomposable such that γ ≥ α for all q > 0 (if component weights are uniform, or at q = 1, otherwise), and β may be interpreted as the discrete number of distinct mixture components (Sections 2.2 and 3). This has the advantage of conforming with the original discrete decomposition by Jost [9], insofar as probability mass in the mixture is recorded only where it is observed in the data, and not elsewhere, as would be assumed under a parametric model of the pooled system. Consequently, one achieves a more precise estimate of the size of the pooled system's base of support. The primary limitation arises from the need to numerically integrate the γ-heterogeneity, which can become prohibitively expensive in higher dimensions. Future work should investigate the error bounds on numerically integrated γ.
A more computationally efficient approach for decomposition of continuous Rényi heterogeneity is to assume that the pooled mixture has an overall parametric distribution. A common application for which this assumption is generally made is in mixed-effects meta-analysis [21]. An important departure from the non-parametric pooling approach of finite mixture models is that non-trivial probability mass may now be assigned to regions not covered by any of the constituent component distributions. From another perspective, one may appreciate that the non-parametric approach to pooling is insensitive to the distance between component distributions, and rather only measures the effective volume of event space to which component distributions assign probability. Conversely, assuming a parametric distribution over the mixture (in the case of Section 4, a Gaussian) incorporates the distance between the component distributions into the calculation of γ-heterogeneity. This would be appropriate in scenarios where one assumes that the observed components undersample the true distribution on the pooled system. For example, in the case of mixed-effects meta-analysis, the available research studies for inclusion may differ significantly in terms of their means, but one might assume that there is a significant probability of a new study yielding an effect somewhere in between. Specifying a parametric distribution over the pooled system would capture this assumption.
One limitation of the present study is the use of a Gaussian model for the pooled system distribution. This was chosen on account of (A) its prevalence in the scientific literature and (B) analytical tractability. Future work should expand these results to other distributions. Notwithstanding this, we have demonstrated the decomposition of γ Rényi heterogeneity into its α and β components for continuous systems. There are (broadly) two approaches, based on whether parametric assumptions are made about the pooled system distribution. Under these assumptions applied to Gaussian mixture distributions, we provided conditions under which the criterion that γ ≥ α is satisfied. Future studies should evaluate this method as an alternative approach for the measurement of meta-analytic heterogeneity, and expand these results to other parametric distributions over the pooled system.

Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Proofs
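Proof of Theorem 1. Following Jost [9] (proof 2), in the limit q → 1, one obtains the following inequality:
$$-\int_{\mathcal{X}} \bar{f}(x) \log \bar{f}(x) \, dx \ge -\sum_{i=1}^{N} w_i \int_{\mathcal{X}} f_i(x) \log f_i(x) \, dx,$$
whereas, when $w_i = w_j$ for all $i, j \in \{1, 2, \ldots, N\}$, for q > 1, we have
$$\int_{\mathcal{X}} \left( \frac{1}{N} \sum_{i=1}^{N} f_i(x) \right)^q dx \le \frac{1}{N} \sum_{i=1}^{N} \int_{\mathcal{X}} f_i^q(x) \, dx,$$
and, for q < 1, we have
$$\int_{\mathcal{X}} \left( \frac{1}{N} \sum_{i=1}^{N} f_i(x) \right)^q dx \ge \frac{1}{N} \sum_{i=1}^{N} \int_{\mathcal{X}} f_i^q(x) \, dx,$$
all of which hold by Jensen's inequality. Exponentiating both sides in the first case, and raising both sides to the power 1/(1 − q) in the other two (which reverses the inequality when q > 1 and preserves it when q < 1), gives $\Pi_q^\gamma(X) \ge \Pi_q^\alpha(X)$ in each case.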
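By the Minkowski determinant inequality, we have that Σ *

Proof of Lemma 2. Since $\Sigma_{1:N}$ are positive semidefinite matrices, then for all $x \in \mathbb{R}^n$, we have that $-\frac{1}{2} x^\top (w_i \Sigma_i) x \le 0$, and thus $-\frac{1}{2} x^\top \left( \sum_{i=1}^{N} w_i \Sigma_i \right) x \le 0$. By exponentiating the quadratic term, we have
$$\int_{\mathbb{R}^n} \exp\left( -\frac{1}{2} x^\top \left( \sum_{i=1}^{N} w_i \Sigma_i \right) x \right) dx = \int_{\mathbb{R}^n} \prod_{i=1}^{N} \left[ \exp\left( -\frac{1}{2} x^\top \Sigma_i x \right) \right]^{w_i} dx. \tag{A16}$$
We obtain the following expressions by applying Gaussian integration to the left-hand side, as well as to a bound on the right-hand side obtained by Hölder's inequality,
$$\int_{\mathbb{R}^n} \exp\left( -\frac{1}{2} x^\top \left( \sum_{i=1}^{N} w_i \Sigma_i \right) x \right) dx = (2\pi)^{n/2} \left| \sum_{i=1}^{N} w_i \Sigma_i \right|^{-1/2}, \tag{A17}$$
$$\int_{\mathbb{R}^n} \prod_{i=1}^{N} \left[ \exp\left( -\frac{1}{2} x^\top \Sigma_i x \right) \right]^{w_i} dx \le \prod_{i=1}^{N} \left( \int_{\mathbb{R}^n} \exp\left( -\frac{1}{2} x^\top \Sigma_i x \right) dx \right)^{w_i} = (2\pi)^{n/2} \prod_{i=1}^{N} |\Sigma_i|^{-w_i/2}. \tag{A19}$$
Substituting Equations (A17) and (A19) into Equation (A16) and simplifying terms yields
$$\log \left| \sum_{i=1}^{N} w_i \Sigma_i \right| \ge \sum_{i=1}^{N} w_i \log |\Sigma_i|.$$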