Power-Law Distributions from Sigma-Pi Structure of Sums of Random Multiplicative Processes

We introduce a simple growth model in which the sizes of entities evolve as multiplicative random processes that start at different times. A novel aspect we examine is the dependence among entities. For this, we consider three classes of dependence between the growth factors governing the evolution of sizes: independence, Kesten dependence and mixed dependence. We take the sum X of the sizes of the entities as the representative quantity of the system. This sum has the structure of a sum of product terms (Sigma-Pi), whose asymptotic distribution function has a power-law tail. We present evidence that the dependence type alters neither the asymptotic power-law tail behavior nor the value of the tail exponent. However, the structure of the large values of the sum X is found to vary with the dependence between the growth factors (and thus the entities). In particular, for the independence case, we find that the large values of X are contributed by a single maximum size entity: the asymptotic power-law tail is the result of such a single contribution to the sum, with this maximum contributing entity changing stochastically with time and with realizations.


Introduction
Power-laws appear in the probability distributions of many physical systems and human activities, being one of the signatures of complex systems [1,2]. The power-law usually occurs in the tail, i.e., for sufficiently large events, so that a probability density function f(x) of a random variable x with this behavior is such that:

$$f(x) \sim \frac{1}{x^{1+\alpha}} \quad \text{for large } x, \text{ with } \alpha > 0. \tag{1}$$

A probability distribution with a power-law tail is characterized by a much slower decay compared with an exponential law, and thus large events may not be negligible. Another distinctive property of power-laws is that they obey the symmetry of scale invariance: the functional form of the distribution remains unchanged under a scale transformation. In addition, for these distributions, the moments of order larger than or equal to α diverge [3,4].
The frequent observation of power-law distributions in natural and social systems as well as their interesting mathematical properties have motivated the search for mechanisms that generate power-laws [3,5,6]. One of the first proposed power-law generation mechanisms, which is recurrently applied to model a diversity of phenomena, is "proportional growth", also known in network theory as "preferential attachment". In this mechanism, a quantity is distributed among existing entities according to probabilities proportional to the quantity they already possess, establishing a "rich get richer" dynamics [7][8][9]. Another iconic conceptual framework is self-organized criticality, captured by the metaphor of avalanches in a sandpile, in which power-laws are spontaneous emergent properties, with no apparent need to adjust control parameters [10,11]. Among many other mechanisms, let us mention the superposition of probability distributions [12,13], formalized in the concept of superstatistics [14][15][16]; the key idea is that the basic non-power-law probability distribution of a system has parameters that are random variables themselves and, when including this second source of stochasticity, a power-law distribution may arise. A particular case of superstatistics that found wide applicability is the Tsallis statistics, where the extremization of its generalized entropy produces power-laws [17].
An additional relevant model for generating power-law distributions is the class of random multiplicative processes, often used to model the growth of entities such as business firms, cities, and biological populations [18][19][20][21][22][23][24][25]. The common starting point to explain the mechanisms of power-law formation in those growth phenomena is Gibrat's law of proportional growth, stating that an entity grows proportionally to its current size but with a growth rate independent of it [26]. This law leads to the basic random multiplicative process expressed in discrete times:

$$s(t) = \xi(t)\, s(t-1), \tag{2}$$

where s(t) is the size of the entity at time t, ξ(t) represents the multiplicative factor and the discrete time growth rate is equal to ξ(t) − 1 = [s(t) − s(t−1)]/s(t−1). However, Gibrat's law alone cannot explain a power-law distribution, since the corresponding process (2) produces a non-stationary log-normal distribution when the ξ(t) are independent and identically distributed random variables with finite mean and variance. To obtain a power-law tail, it is necessary to ensure a contraction tendency and to add some kind of surviving mechanism to the basic random multiplicative process. The simplest manifestation of such mechanisms is the introduction of an additive term to the process, leading to stochastic multiplicative affine processes, also often referred to as the Kesten process [27][28][29][30][31]:

$$s(t) = \xi(t)\, s(t-1) + \eta(t), \tag{3}$$

with η(t) being the additive term. Another related surviving mechanism is the inclusion of the additive term only when the system crosses a lower threshold, acting as a barrier preventing the collapse to zero [32]. Under general conditions, when those ingredients are added to the basic random multiplicative process, a power-law tailed distribution is obtained.
Here, we propose a simple growth model that combines basic random multiplicative processes with the idea of superposition of distributions mentioned above. The superposition of distributions has a neat interpretation: it corresponds to considering entities that are born successively. Combining the features of birth and stochastic proportional growth has been considered in [23,33,34]. Our model can be viewed as the simplest and purest incarnation of these mechanisms. This allows us to examine dependence among entities in three particular classes: complete independence; Kesten dependence (a dependence based on the Kesten process); and mixed dependence, combining both independence and Kesten dependence. Rather than studying the cross-section at a given time of the sizes of entities present in the system (as done, e.g., in [23,33,34]), we study the distribution of the sum of sizes of all entities in the system. This corresponds to the total capitalization of a country, when entities are firms, or to the total biomass of an ecosystem for biological populations. We obtain a random variable whose structure takes the form of a sum of product terms. As a result of this structure, its distribution has a power-law tail with the same exponent regardless of the dependence type (indicating a common general ingredient for power-laws), but with distinct tail formation mechanisms resulting from the degree of dependence among entities.

Model Definition
Consider a set of entities of sizes s_j. At each time step, the sizes s_j of existing entities evolve as basic random multiplicative processes and one new entity is born with unit size. The size s_j(t) of a given entity j at time t is defined as:

$$s_j(t) = \xi_j(t)\, s_j(t-1) \quad \text{for } t \ge j, \qquad s_j(j-1) = 1.$$

By definition, the time of birth of a given entity is its index j minus 1. Thus, the first entity with index j = 1 is born (of unit size) at time t = 0. The factors ξ_j(t) are identically distributed random variables. Here, we take positive half-normally distributed random variables obtained from a normal distribution with zero mean and variance σ², i.e.,

$$\xi_j(t) \sim N_{||}(0, \sigma^2).$$

The quantity that we study is the sum X(t) of the sizes {s_j(t), j = 1, ..., t} of all existing entities present in the system at time t, excluding the last entity born at t, which has index t + 1 and size s_{t+1}(t) = 1:

$$X(t) = \sum_{j=1}^{t} s_j(t).$$

By recursion, the size s_j(t) of entity j at time t is given by:

$$s_j(t) = \prod_{k=j}^{t} \xi_j(k).$$

Then, the sum of entity sizes reads:

$$X(t) = \sum_{j=1}^{t} \prod_{k=j}^{t} \xi_j(k). \tag{7}$$

The sum on the right-hand side of expression (7) is over all the t entities that have been born until time t (excluding the last one, as mentioned above). For a given entity j, the product term is made of t − j + 1 growth factors, corresponding to the number of time periods of stochastic proportional growth allowed by its age.
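As an illustration of the model definition, the following minimal Python sketch simulates X(t) when every entity draws its own growth factor. The function name `simulate_sum_of_sizes` and the `seed` argument are hypothetical choices made for this example, and the half-normal factor is drawn as the absolute value of a zero-mean normal variate with standard deviation σ.

```python
import random


def simulate_sum_of_sizes(t_max, sigma, seed=0):
    """Return [X(1), ..., X(t_max)] for entities that grow independently."""
    rng = random.Random(seed)
    sizes = []    # sizes[j - 1] holds s_j(t) for the entities born so far
    history = []
    for t in range(1, t_max + 1):
        # Entity j = t was born with unit size at time t - 1.
        sizes.append(1.0)
        # Each existing entity draws its own half-normal factor xi_j(t).
        sizes = [s * abs(rng.gauss(0.0, sigma)) for s in sizes]
        # X(t) sums the t grown entities; the newborn j = t + 1 is excluded.
        history.append(sum(sizes))
    return history
```

After t steps, entity j has been multiplied by t − j + 1 factors, matching the product term described above. The list of sizes grows without bound here, so this version is only meant for short runs.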

Dependence Structure of the Growth Factors
The entities are not necessarily independent. External factors can influence more than one entity at the same time, resulting in dependence between the growth rates of different entities. For instance, there could exist situations in which j ≠ j′ but ξ_j(t) = ξ_{j′}(t). In this work, from the numerous possible dependence types, we examine the following three particular ones:

1. Independence: all entities are growing independently:

$$\xi_j(t) \neq \xi_{j'}(t') \quad \text{for } (j, t) \neq (j', t').$$

Observe that the notation ξ_j(t) ≠ ξ_{j′}(t′) does not mean that these variables cannot have by chance the same value. This notation just expresses that they are independent random variables.

2. Kesten dependence: external influences determine the same growth factor for all existing entities at each given time, but the growth factor is a random variable as a function of time. This case reproduces the solution of the Kesten process (3) and constitutes a novel interpretation of the said process, originally representing a single entity evolving in the presence of an additive term:

$$\xi_j(t) = \xi(t) \quad \text{for all } j,$$

so that expression (7) satisfies X(t) = ξ(t) X(t−1) + ξ(t), which has the form (3) with η(t) = ξ(t).

3. Mixed dependence: alternation between independence and Kesten dependence, say independence for odd t and Kesten dependence for even t, representing a time-changing dependence. Note that this is only one of the many possibilities for the combination of independence and Kesten dependence.
Regardless of the dependence type, the solution for the sum of sizes has the structure of a sum of increasing products of random variables, which we refer to as Sigma-Pi:

$$X_n = \sum_{j=1}^{n} \prod_{k=1}^{j} \xi_{jk}. \tag{10}$$

We have changed ∏_{k=j}^{n} into ∏_{k=1}^{j} (compare with (7)), as this gives the same number and structure of terms when summed over j from 1 to n. As mentioned before, the identically distributed random variables ξ_jk are not necessarily all independent, and their interdependence characterizes the dependence type.
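The three dependence types differ only in how the growth factors are drawn at each time step, which the following sketch makes explicit. The names `sigma_pi_step` and `simulate`, and the string labels for the dependence types, are hypothetical and chosen for this example only.

```python
import random

DEPENDENCE_TYPES = ("independent", "kesten", "mixed")


def sigma_pi_step(sizes, t, sigma, dependence, rng):
    """Advance all entity sizes by one time step under a dependence type."""
    if dependence not in DEPENDENCE_TYPES:
        raise ValueError(f"unknown dependence type: {dependence!r}")
    # Kesten dependence at every step, or at even t in the mixed case.
    shared = dependence == "kesten" or (dependence == "mixed" and t % 2 == 0)
    if shared:
        xi = abs(rng.gauss(0.0, sigma))  # one factor common to all entities
        return [s * xi for s in sizes]
    # Independence: one fresh factor per entity.
    return [s * abs(rng.gauss(0.0, sigma)) for s in sizes]


def simulate(t_max, sigma, dependence, seed=0):
    """Return [X(1), ..., X(t_max)] for the chosen dependence type."""
    rng = random.Random(seed)
    sizes, xs = [], []
    for t in range(1, t_max + 1):
        sizes.append(1.0)  # entity born with unit size at time t - 1
        sizes = sigma_pi_step(sizes, t, sigma, dependence, rng)
        xs.append(sum(sizes))
    return xs
```

In the Kesten case, the output can be checked against the one-variable recursion X(t) = ξ(t)[X(t−1) + 1], which uses the same sequence of draws.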

Asymptotic Power-Law Tails
This Sigma-Pi structure has received special attention in the context of the Kesten process. For Kesten dependence, letting n → ∞, the central result [27] states that, provided that the multiplicative factors are contracting on average (⟨log ξ_jk⟩ < 0, where the notation ⟨·⟩ stands for the expectation operator and log for the natural logarithm), which ensures stationarity, the distribution of X presents a power-law tail behavior, with the complementary cumulative distribution function F(X) of X behaving as:

$$F(X) \sim \frac{C}{X^{\alpha}}, \tag{11}$$

where C is called the "scale factor" and the exponent α is determined by the relation (provided that there is a solution α > 0):

$$\langle \xi_{jk}^{\alpha} \rangle = 1. \tag{12}$$

Then, for half-normally distributed random variables, ξ_jk ∼ N_{||}(0, σ²), Equation (12) shows that α is uniquely determined by σ, where the explicit relationship is conveniently expressed by σ as a function of α [35]:

$$\sigma = \left[ \frac{\sqrt{\pi}}{2^{\alpha/2}\, \Gamma\!\left(\frac{\alpha+1}{2}\right)} \right]^{1/\alpha}. \tag{13}$$

It has been shown rigorously that the power-law asymptotic result (11) with (12) also applies for the independence case [36] (this fact was first suggested by the authors, as acknowledged in [36]), up to a logarithmic correction so that F(X) ∼ log X / X^α. In fact, we conjecture that the result (11) with (12), up to slowly varying functions multiplying the scale factor C, holds for any dependence type with the restriction that all variables in a product term are independent (see Appendix A), or more generally with a sufficiently weak dependence. In other words, the sizes of the entities can be interdependent via the action of common simultaneous growth factors, but the growth factors themselves should exhibit no (or just a weak) dependence along the time dimension. We thus expect a power-law distribution with the same exponent for all three particular cases made explicit above: independence, Kesten dependence and mixed dependence. This conjecture is tested numerically in the next section.
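For half-normal factors ξ = |z| with z normal of zero mean and standard deviation σ, the moment condition ⟨ξ^α⟩ = 1 can be solved in closed form using the standard absolute-moment formula ⟨|z|^α⟩ = σ^α 2^{α/2} Γ((α+1)/2)/√π. The sketch below evaluates the resulting σ(α) and also checks the contraction condition through the known mean ⟨log|z|⟩ = log σ − (γ + log 2)/2, where γ is the Euler-Mascheroni constant. The function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def sigma_for_alpha(alpha):
    """Scale sigma of the half-normal factors that yields tail exponent alpha."""
    # <xi^alpha> for sigma = 1; the general moment is sigma**alpha times this.
    unit_moment = 2 ** (alpha / 2) * math.gamma((alpha + 1) / 2) / math.sqrt(math.pi)
    return unit_moment ** (-1.0 / alpha)


def mean_log_factor(sigma):
    """<log xi> for half-normal factors; must be negative for stationarity."""
    return math.log(sigma) - (EULER_GAMMA + math.log(2.0)) / 2.0


for alpha in (1, 2, 3):
    sigma = sigma_for_alpha(alpha)
    print(alpha, round(sigma, 4), mean_log_factor(sigma) < 0)
```

The three exponents α = 1, 2, 3 correspond to the values σ = 1.2533, σ = 1 and σ = 0.8557 used in the numerical section, and the contraction condition holds in all three cases.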

Generalization
We end this section by commenting that the addition of more realistic ingredients to the model, such as random birth rates and the inclusion of stochastic deaths of the entities, can be represented by the generalized Sigma-Pi of the form:

$$X_n = \sum_{j=1}^{n} c_j \prod_{k=1}^{j} \xi_{jk}, \tag{14}$$

where the coefficients c_j depend on the details of the modified model. For instance, the death of some entity j that occurred at some time between its birth and the "present time" n amounts to taking c_j = 0. We indicate in Appendix A that, for a large class of coefficients c_j, result (11) with (12) still holds. Intuitively, this is because the behavior of X_n given by (14) is controlled asymptotically by just a few terms in the sum, or by the single largest term only, depending on the dependence structure. Interpreted for a system, the total system size is thus controlled by a few entities.
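A small sketch of how the generalized sum can be evaluated once the growth factors and coefficients are given. The function name `generalized_sigma_pi` and the nested-list layout of the factors are assumptions made for this example; the coefficient values below are arbitrary illustrations, not values from the model.

```python
import math


def generalized_sigma_pi(factors, coefficients):
    """Sum over j of c_j times the product of the j factors in row j."""
    if len(factors) != len(coefficients):
        raise ValueError("one coefficient is needed per product term")
    total = 0.0
    for j, (c, row) in enumerate(zip(coefficients, factors), start=1):
        if len(row) != j:
            raise ValueError(f"term {j} must contain exactly {j} factors")
        total += c * math.prod(row)
    return total


# Three product terms with 1, 2 and 3 factors respectively.
factors = [[2.0], [0.5, 0.5], [1.0, 3.0, 1.0]]
print(generalized_sigma_pi(factors, [1, 1, 1]))  # plain Sigma-Pi
print(generalized_sigma_pi(factors, [1, 0, 1]))  # second entity has died
```

Setting a coefficient to zero removes the corresponding entity from the sum, which is how a death is represented in the text.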

Numerical Construction of the Complementary Cumulative Distribution Function of Sum of Sizes
In this section, we check numerically the prediction (11) with (12) for the complementary cumulative distribution function F(X) of the random variable X(t) (7) or equivalently X_n (10) in the limit of large t or n. Numerically, we have used two methods to generate the set of X_n to construct F(X): (i) recording the successive values of X_n over time n and (ii) constructing thousands of independent ensembles of time evolutions of X_n and recording all the values of this ensemble at a fixed time. We have checked (see Appendix B) that the two methods yield the same results, supporting the validity of the ergodicity property for the process X(t).
To make the numerical simulations of the model manageable, we cannot keep an endlessly increasing number of entities entering the system, and thus we add the rule of removing entities with age greater than n. Figure 1 shows F(X) for various values of n in the independent case with half-normally distributed random variables ξ_j(k) with σ = 1, for which expression (13) gives α = 2. The value σ = 1 is a natural first choice for this parameter since it refers to the standard normal distribution. In this case, one can observe that F(X) does not change appreciably for n ≥ 50, which suggests that it has converged already for n = 50. This means that entities older than 50 time steps have sizes that do not contribute substantially to the sum X(t). In the sequel, we use n = 2000 to be conservative and to obtain reasonable approximations of the asymptotic limit n → ∞.
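The truncation rule and the time-sampling construction of F(X) can be sketched as follows for the independence case. The names `simulate_truncated` and `empirical_ccdf` are hypothetical, and the estimator shown (the fraction of samples at or above each sorted value) is one common convention, not necessarily the one used to produce the figures.

```python
import random


def simulate_truncated(t_max, n_window, sigma, seed=0):
    """Time series of X(t) keeping only entities of age at most n_window."""
    rng = random.Random(seed)
    sizes, xs = [], []
    for _ in range(t_max):
        sizes.append(1.0)            # newborn entity of unit size
        sizes = sizes[-n_window:]    # drop the entity that would exceed the age limit
        sizes = [s * abs(rng.gauss(0.0, sigma)) for s in sizes]
        xs.append(sum(sizes))
    return xs


def empirical_ccdf(samples):
    """Pairs (x, fraction of samples >= x), assuming no repeated values."""
    ordered = sorted(samples)
    m = len(ordered)
    return [(x, (m - i) / m) for i, x in enumerate(ordered)]


# Time sampling: record successive values of X and build the empirical CCDF.
ccdf = empirical_ccdf(simulate_truncated(5000, 50, 1.0))
print(ccdf[-5:])  # the five largest sampled values and their tail frequencies
```

Plotting the pairs on doubly logarithmic axes would show the tail as an approximately straight line whose slope estimates −α.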
Figure 2 presents F(X) for the independence (black), Kesten dependence (red) and mixed dependence cases (blue), for σ = 0.8557 (panel a), σ = 1 (panel b) and σ = 1.2533 (panel c). These three values of σ correspond, respectively, to α = 3, α = 2 and α = 1, as follows from relation (13). Values σ ≠ 1 are chosen in order to analyze the power-law behavior when σ < 1 and σ > 1, in particular σ = 1.2533 yielding α = 1, which corresponds to Zipf's law [23]. The tails of F(X) are in good visual agreement with these predictions, while the non-asymptotic parts deviate substantially depending on the nature of the dependence. One can observe that the scale factor C, as defined in expression (11), is the smallest for the independence case and the largest for the Kesten dependence. The domain of validity of the power-law tail (11) is also the smallest for the independence case and the largest for the Kesten dependence. The distribution F(X) for the independence case exhibits the longest intermediate regime before the tail converges to its clean power-law asymptotics. This may be related to the logarithmic correction mentioned above [36].

Study of the Entities Contributing to the Sum of Sizes
Notwithstanding the similar asymptotic power-law tail behavior (up to a slowly varying function), a closer inspection of the model reveals that the three dependence cases present important and interesting differences in the way the extreme values of X (X(t) or X_n) are realized. To unveil these differences, we calculate the Herfindahl index, a concentration indicator typically used in finance [37], which is also known as the participation ratio in the physics of spin-glasses [38]. Starting from the size s_j(t) of an individual entity contributing to the sum X(t) of sizes, one defines the normalized contribution s_j(t)/X(t), which is the fraction of the total system size contributed by entity j. Then, the Herfindahl index reads:

$$H(t) = \sum_{j=1}^{t} \left[ \frac{s_j(t)}{X(t)} \right]^{2}.$$

The inverse H^{-1} of the Herfindahl index is a measure of the number of entities that significantly contribute to the sum X of sizes. In particular, it is straightforward to verify that H^{-1} = 1 if one entity has size X and all the others have vanishing sizes and that H^{-1} = t (i.e., equal to the total number of entities in the system) if all entities have identical size equal to X/t. The inverse Herfindahl index H^{-1} thus varies between 1 and t, i.e., from a singular to a full "democratic" contribution to the sum X.
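The two limiting values of the inverse Herfindahl index can be verified directly with a short sketch. The function name `inverse_herfindahl` is a hypothetical choice for this example.

```python
def inverse_herfindahl(sizes):
    """Effective number of entities contributing to the sum of sizes."""
    total = sum(sizes)
    herfindahl = sum((s / total) ** 2 for s in sizes)
    return 1.0 / herfindahl


print(inverse_herfindahl([5.0, 0.0, 0.0]))        # a single dominant entity
print(inverse_herfindahl([2.0, 2.0, 2.0, 2.0]))   # four equal entities
```

The first call returns 1 and the second returns 4, the number of entities, in line with the bounds stated above.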
Figure 3 shows scatter plots of H^{-1} as a function of X obtained in the model evolution for all three dependence cases and the three values of σ used previously. In other words, for each simulation resulting in some X(t), we use the known values of all entity sizes to form the corresponding inverse Herfindahl index. Each row of three panels corresponds to a given dependence model, while each column corresponds to a fixed σ.
Let us focus first on the second row corresponding to the Kesten dependence case (panels b). The most striking observation is a change of regime. The first regime holds from small to intermediate values of X, for which the number of contributing entities can take any value from 1 to a value increasing approximately logarithmically with X. In panel (b-1) for σ = 0.8557, this regime holds up to X ≈ 5.
For larger X, one can observe that the minimum of H^{-1} takes off from the line H^{-1} = 1: a minimum H^{-1} strictly larger than 1 is necessary to obtain the largest X values. For instance, for the largest sampled values X ≈ 400, the number of contributing entities is at least approximately seven. This result has a simple interpretation: due to the dependence between the growth factors across the entities, one very large entity is always surrounded by a "cloud" of similar sized entities. In the single entity dynamical version of the Kesten process, this cloud is made of the cluster of large transient excursions [32].
Let us now turn to the independence case shown in the first row of three panels (panels a). The most striking observation is the progressive shrinkage of the range of H^{-1} values as X gets larger and larger. In addition, there seems to be a rather well-defined threshold such that, for X larger than this threshold, H^{-1} collapses to the single value H^{-1} = 1. In words, the very large values of X that populate the asymptotic power-law behavior of the distribution of X are due to a single entity that overwhelmingly contributes to it. This is reminiscent of a condensation phenomenon, in which a single entity is overwhelmingly larger than all the others, notwithstanding their a priori unlimited number. This might be related to the dragon-king concept [39,40] in an unconventional way: the distribution of the sizes of entities can be seen as the weighted sum of distributions of products of random growth factors. Due to the stationarity condition that the random growth factors tend to be smaller than one (the exact technical condition is that the expectation of the logarithm of the random growth factors is negative), only products of random growth factors that have a finite number of terms have a non-negligible size. The system of entities is thus made of a finite (possibly large) number of non-negligible entities at all times. Conditional on the sum X being asymptotically large, the distribution of the entities is non-power-law, but there is a single entity that is of order X, qualifying it as a dragon-king. Interestingly, when sampling over many realizations (statistically or temporally), such a set of systems with dragon-kings exhibits in ensemble a pure asymptotic power-law behavior.
Another interesting result when comparing the Kesten and the independence cases is that small values of X, less than the unit size, occur much more frequently in the Kesten case than in the independence case, also a consequence of the strong dependence among entities in the former case: if one entity is small, it is likely that others are small as well.
The mixed dependence (panels c) shows an intermediary behavior between independence and Kesten dependence. In addition, changing the value of the standard deviation σ of the growth factors does not change the general behavior.
For the independence case, it is interesting to mention that, in addition to the fact that only one entity is responsible for the whole system size X, the age of this maximum size entity is a stochastic variable. This means that the product term that builds the power-law tail of F(X) varies as a function of time in its number of multiplicative growth factors, i.e., it does not have the same number of factors at all time steps. Figure 4 shows the age of the entity with maximum size against the total size X of the system for the independence case with σ = 0.8557 (panel b-1), σ = 1 (panel b-2) and σ = 1.2533 (panel b-3): there is a broad distribution of ages of the maximum size entities for large values of X and not a unique age. This observation suggests that the power-law behavior in this case results from the superposition of the statistics of single multiplicative processes at different ages with appropriate statistical weights, as in the superstatistics mechanism mentioned in the introduction [14][15][16]. It is notable that studies following similar ideas with exponential weights established the so-called double Pareto distribution, with power-law tail [41,42].

Finally, numerical simulations in Figure 4 (panels b-1 to b-3) suggest that there exists a minimum age for this maximum size entity, which grows approximately logarithmically with X in the asymptotic tail regime. This observation can be rationalized by borrowing the extreme deviation theorem applied to finite products [13]. Parameterizing the probability density function of the independent and identically distributed growth factors ξ_j(k) as e^{−g(ξ)}, with g(ξ) being a function with a number of "natural" properties such as increasing sufficiently fast as ξ → ∞, the distribution of the product Π_n of n independent such variables is given by exp[−n g(Π_n^{1/n})]. In words, the typical tail value of the product of n independent variables is contributed mainly by realizations where all the n random variables are approximately similar in amplitude and equal to Π_n^{1/n}, so that their product (with n terms) recovers Π_n. The term n in front of g(Π_n^{1/n}) in the exponential corresponds to taking the product of n probability density functions of the independent growth factors ξ_j(k). Fixing Π_n to a large value X, the value n*(X) that maximizes exp[−n g(X^{1/n})] is given by:

$$n^{*}(X) = \log X \, \frac{u\, g'(u)}{g(u)}, \qquad \text{where } u = X^{1/n^{*}(X)}$$

and g′(u) denotes the derivative of g(u). Searching for a solution of the form n*(X) = log X / log c, where c is a constant, we find that c solves the equation log c = g(c) / [c g′(c)]. Thus, for the process (7), for a given large X, the product term that contributes the most to a given large Sigma-Pi X contains at least n*(X) ∼ log X factors. This means that a large X is minimally controlled by the term with about log X factors ξ_j(k), i.e., with an age ∼ log X. This result is essentially independent of the specific form of the probability density function e^{−g(ξ)} of the growth factors ξ_j(k), as long as g(ξ) obeys the conditions for the validity of the extreme deviation theorem.
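The stationarity condition behind n*(X) can be spelled out as follows, treating n as a continuous variable. The last line is only an illustration under the simplifying assumption of a purely quadratic g(ξ) = ξ²/(2σ²), i.e., half-normal factors with the normalization constant of the density neglected; it is not a result taken from the analysis above.

```latex
\begin{align*}
% Exponent of the tail weight of a product of n factors fixed to the value X
\phi(n) &= n\, g\!\left(X^{1/n}\right), \qquad u \equiv X^{1/n}, \qquad
\frac{du}{dn} = -\frac{u \log X}{n^{2}}, \\
% Stationarity with respect to n
\frac{d\phi}{dn} &= g(u) - \frac{u\, g'(u)\, \log X}{n} = 0
\quad\Longrightarrow\quad
n^{*}(X) = \log X \, \frac{u\, g'(u)}{g(u)}, \\
% The ansatz n^* = \log X / \log c implies u = X^{1/n^*} = c
n^{*}(X) &= \frac{\log X}{\log c}
\quad\Longrightarrow\quad
\log c = \frac{g(c)}{c\, g'(c)}, \\
% Illustration (assumption): g(\xi) = \xi^{2} / (2\sigma^{2})
g(\xi) &= \frac{\xi^{2}}{2\sigma^{2}}
\quad\Longrightarrow\quad
\log c = \frac{1}{2}, \qquad c = e^{1/2}, \qquad n^{*}(X) = 2 \log X .
\end{align*}
```

Under this simplified g, the constant c does not depend on σ, and the logarithmic growth of the minimum age with X appears explicitly.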

Conclusions
Two main conclusions can be drawn from this work. First, for sums of products of random growth factors, referred to above as Sigma-Pi structures, the dependence among the product terms (the "entities") that may result from some dependence between the growth factors does not impact the asymptotic power-law of the distribution of these Sigma-Pi variables, up to slowly varying functions. Simulations of our simple growth model with distinct dependence types, together with previous theoretical studies on the Kesten process and on the sum of products of independent random variables, support this conclusion. Indeed, the essential ingredient for the asymptotic power-law tail seems to be the Sigma-Pi structure, regardless of the dependence type. This conjecture is still to be formally proved.
Second, although the exponent of the asymptotic power-law tail is not influenced by the dependence type among entities, the nature of the construction of the events populating the tail differs in each dependence case. In the independence case, only one entity in the sum contributes to it asymptotically. In contrast, in the Kesten dependence case, as a result of the strong dependence between the growth factors across entities, several entities contribute to the sum asymptotically. These insights are important in the modeling of real physical, biological or economic systems, complementing the ubiquitous power-law tail behavior with additional information on the internal structure of the system.

Figure 2. Numerical construction of the complementary cumulative distribution function F(X) of the random variable X(t) (7) or equivalently X_n (10) in the limit of large t or n, for independence (black), Kesten dependence (red) and mixed dependence cases (blue) with (a) σ = 0.8557; (b) σ = 1 and (c) σ = 1.2533. Following expression (13), the tails are consistent with the expected values of the exponents α = 3, α = 2 and α = 1, respectively, regardless of the dependence type, as shown by the straight grey lines.

Figure 3. Relationship between the inverse Herfindahl index H^{-1} and the sum X of sizes, where the three rows correspond to the three cases of dependence and the three columns correspond to three different values of σ: (a-1) independence with σ = 0.8557; (a-2) independence with σ = 1; (a-3) independence with σ = 1.2533; (b-1) Kesten dependence with σ = 0.8557; (b-2) Kesten dependence with σ = 1; (b-3) Kesten dependence with σ = 1.2533; (c-1) mixed dependence with σ = 0.8557; (c-2) mixed dependence with σ = 1 and (c-3) mixed dependence with σ = 1.2533. The horizontal grey lines indicate H^{-1} = 1. Focusing on large values of X, corresponding to the power-law tail of the distribution function F(X), only one entity contributes to the total size of the system in the independence case, but, in the Kesten dependence case, the total size is always due to the contribution of several entities (see text).

Figure 4. (a) Relationship between the inverse of the Herfindahl index H^{-1} and the sum X of sizes and (b) relationship between the inverse of the age of the maximum size entity and the sum X of sizes for the independence case with: (a-1,b-1) σ = 0.8557; (a-2,b-2) σ = 1 and (a-3,b-3) σ = 1.2533. The horizontal gray lines indicate H^{-1} = 1. For the independence case, large values of X, corresponding to the power-law tail of the distribution function F(X), are made of a single entity, but its age is a stochastic variable changing with time and with realizations.

Figure A1. Complementary cumulative distribution function F(X) of the sum X of entity sizes for the independence case with σ = 1 using time sampling (black) and ensemble sampling (green).