A Maximum Entropy Method for the Prediction of Size Distributions

We propose a method to derive the stationary size distributions of a system, and the degree distributions of networks, using maximisation of the Gibbs-Shannon entropy. We apply this to a preferential attachment-type algorithm for systems of constant size, which contains exit of balls and urns (or nodes and edges for the network case). Knowing mean size (degree) and turnover rate, the power law exponent and exponential cutoff can be derived. Our results are confirmed by simulations and by computation of exact probabilities. We also apply this entropy method to reproduce existing results like the Maxwell-Boltzmann distribution for the velocity of gas particles, the Barabasi-Albert model and multiplicative noise systems.


Introduction
The famous model by Yule [1,2] and its analogue for networks, the Barabasi and Albert (BA) model for scalefree networks [3], have been widely used to the describe phenomena and processes that involve scalefree distributions. The latter are an ubiquitous phenomenon found, for example, in word frequency in language [4] and web databases [5], city and company sizes [6] and high-energy physics, and they have been modeled with different approaches, for example, References [7,8]. When occurring in the degree distribution of networks, power laws affect in particular the dynamics on a network, for example, of protein interaction networks [9], brain functional networks [10], email networks [11], and various social networks [12] such as respiratory contact networks [13]. An advantage of the the Yule model and the BA-model is that their interpretation of the 'preferential attachment' process (in which nodes preferentially attach to existing nodes with high degree) is simple and plausible, and that they generate a scalefree degree distribution, whose exponent can be calculated analytically given the rate of introduction of nodes. Therefore simple preferential attachment continues to be widely used to simulate networks for spreading processes. In addition, it has been extended [14][15][16][17][18] and the process has been generalized [19]. The exponent of the degree distribution in the BA-model can be derived starting from a master equation [20]. This ansatz is solvable for constantly growing systems, but becomes too complicated when a system can also lose nodes and edges. However, continuous growth is often not fulfilled in real world examples, especially for social systems, because people also exit the system or network.

1.
Growth of urns: every ball has probability q t of attracting another ball from a reservoir. Let X i,t be the number of new balls in urn i in iteration t; X i,t is binomial with mean n i,t q t , such that the urn grows on average to n i,t (1 + q t ). To ensure that after a full iteration of steps 1-4 N t is stationary (but not fixed), after the first time step, q t is adjusted to q t = N t=0 N t (1 + q t=0 ) − 1, such that the expected size after growth is always N t=0 (1 + q t=0 ).

2.
Shrinking of urns: every ball has probability δ shrink,t of disappearing, which is adjusted to δ shrink,t = ∑ i X i,t /(N t + ∑ i X i,t ) as a result of the growth step 1, such that the expectancy after shrinking N equals the initial size N t=0 . Let Y i,t be the number of disappearances of urn i; Y i,t is a random variable with a binomial distribution with mean Y i,t = δ shrink (n i,t + X i,t ).
The system shrinks in the number of balls, and some urns might be be left with 0 balls (which can be interpreted as exiting urns).

3.
Exit of urns (and balls): every urn has probability δ exit of exiting, that is, being set to size 0, so the system loses balls.

4.
Entry of urns (and balls): Urns that have lost all their balls due to steps (2) or (3) are replaced by urns that contain 1 ball, so that M is strictly conserved after one iteration of steps 1-4, but N t fluctuates around N t=0 .
Even if step 3 is omitted, some urns will exit, as urns can vanish by losing all their balls. Steps 3 and 4 do not affect the number of urns M, but may leave the system with a net loss or gain of balls, compared to the beginning of step 1. To conserve the average sum of balls in the system after growth, N t + ∑ i X i,t , the probability q t to attract a new ball from the reservoir is adjusted for the next iteration. A scheme of how the number of urns and balls change in steps 1-4 is given in Figure A2 in Appendix B.

Possible Cases
This general process can be reduced to two limiting scenarios with the same growth but different shrinking mechanisms. These are: (I) No deletion of urns of size n > 0. (II) Urns can only grow and do not shrink, but exit (with their balls) at a rate δ exit and get replaced by urns of size 1. (III) A combination of both.
(I) Urns do not exit (step 3 is omitted), that is, δ exit = 0. For an urn i of size n i , the probability distribution of the size after a growth-and-shrink cycle, p(n i,t+1 |n i,t ) can be written as a discrete Gaussian centered around n i and with standard deviation with standard deviation scaling exponent ω = 0.5 (see Equations (A1)-(A3) in Appendix A). (II) Urns do not shrink (step 2 is omitted). At each step a fraction δ exit of urns is deleted and replaced by urns of size 1, which means that the number of exiting balls varies more strongly. With probability δ exit , the urn size in the next time step is 1 (if one thinks of the replacing urn as being the same urn); with probability 1 − δ exit , the urn grows by X i , and the binomial distribution of X i has standard deviation again with scaling exponent ω = 0.5. The most probable outcome is the maximum of the binomial distribution (unless this is lower than δ exit ), but that is not the average expected size at t + 1 for an urn of size n i : n i,t+1 is again precisely n i,t (see Figure 2 left column with examples). (III) Mixed case. Steps 2 and 3 can be combined such that some balls (a fraction δ shrink ) will disappear from the system due to shrinking of urns, and some because urns exit with probability δ exit with their balls. The exiting urns have on average the median size of all urns in the system, on average a fraction δ exit,balls,t of balls exits with them (with δ exit,balls,t = δ exit,urns · median urn size mean urn size ). The turnover rate can then be defined as the fraction of balls that gets removed through exit of urns, normalized by the total number of balls that get removed in one time step, µ = δ exit,balls,t δ exit,balls,t +δ shrink .

Maximum Entropy Method
To derive P n we use the Maximum Entropy Principle (MEP) [21,34], which can be defined as: Suppose that probabilities are to be assigned to the set of k mutually exclusive possible outcomes of an experiment. The principle says that these probability values are to be chosen such that the (Shannon) entropy of the distribution P = (P 1 , . . . , P k ), that is, the expression S(P) = − ∑ n P n log P n attains its maximum value under the condition that P agrees with the given information [35].
We use as given information the probabilities p(n i,t+1 |n i,t ) of every urn i to change size. For urns that do not exit, the probability p(n i,t+1 |n i,t ) is either Gaussian (case I) or binomially distributed (case II), and their associated entropies are approximated by s = 1 2 ln(2πσ 2 ). This term becomes s i = 1 2 ln(2π 2q n i ) for case (I) using (1), or s i = 1 2 ln(2π 2q(1 − q) n i ) for case (II) using (2), which both have a standard deviation that scales as σ(n i ) ∝ √ n i . The individual Gaussian or binomial probability distributions p(n i,t+1 |n i,t ) are themselves maximum entropy distributions [36], under the constraint that the expectation of change n i,t+1 − n i,t is zero. If the s i are maximal at stationary state, then ∑ M i=1 s i needs to be stationary. p(n i,t+1 |n i,t ) are generated by the same updating rules as P n , and according to the Maximum Entropy Principle, P n needs to agree with the given information about p(n i,t+1 |n i,t ). (Should ∑ s i not be relevant then it would cancel out in the entropy maximisation step, as it is the case for a system shown in Section 4.4). Formulated differently, the size distribution P n maximizes entropy under the constraint 1 M ∑ M i=1 s i = C * . Subtracting the constant 1 2 ln(2π 2q) from C * , we can use as sum of entropies s i A second constraint is the conservation of the expectation of individual urn size n i,t+1 = ∑ n i,t+1 n i,t+1 p(n i,t+1 |n i,t ) = n i,t , or summed over all urns i, ∑ i ∑ n i,t+1 n i,t+1 p(n i,t+1 |n i,t ) = ∑ i n i,t which can be written as ∑ n P n n = E. (The precise number N t fluctuates around N 0 , in analogy to the total energy of a gas described in the canonical ensemble.) The Lagrangian function for maximizing entropy of the urn size distribution is To determine the distribution that maximizes S, we calculate ∂S ∂P n and set it to 0, leading to with α + 1 = λ/2. This equation can be solved using ∑ n P n = 1, ∑ n P n ln n = C and ∑ n P n n = E, which gives C = K , if urn sizes n can take values in [a 0 , ∞). Knowing K, the exponent α + 1 can be determined from the condition ∑ n P n ln n = C. In continuous approximation . This result is independent of q and for a 0 = 1 simplifies to For β = 0, α depends only on C, which is the logarithm of the geometric mean of urn sizes.

Size Distribution
The maximum entropy size distribution of the stable size process (5) is confirmed by numerical results (see Figure 1a). The method holds for all three cases. The sum C is smaller when urns have a probability δ exit to be replaced with an urn of size 1, which has a theoretical explanation. [37] has shown that the geometric mean decreases when subject to mean-preserving spread, that is, when all numbers in a set become proportionately farther from their fixed arithmetic mean (of course increasing their standard deviation). The statement applies to C as well, since the logarithm is a monotonically increasing function. Spread increases with turnover, which increases the fraction of urns of size 1, and consequently the larger the other urns need to be for a given arithmetic mean E. Another effect is the direct change of s i through increasing p(1|n i,t ) with higher turnover, as shown in Figure 2.    Three examples for the expected size distribution, and its entropy, for the mixed case (III) with deathrate 0.1, for urns of size 5, 10 and 20. The high probability for size n = 1 comes from the re-introduction of urns of size 1. Dependent on the deathrate, this probability increases and the sum of the other outcomes decreases. In the right column, the entropy s i = − ∑ i p i log pi is calculated as a function of the deathrate δ exit . The higher the deathrate, the lower s, which contributes to a lower C = ∑ i s i , and a higher the exponent α, which are shown in Figure 1.
To compare the exponent fit of α to the calculated one from the numerical entropy sum C, an adjustment to the computation of C in (3) is necessary, since the approximation of the entropy of a binomial s(n) = 1 2 ln(2πq(1 − q)n) + O( 1 n ) holds for large n, but yields s n=1 = 0. Especially for cases (II) and (III) with turnover, urns of size 1 make up a large fraction of urns, and their contribution to the total entropy cannot be neglected. To correct for this we calculate the exact entropies s e,n=1 and s e,n=2 from the definition s e = ∑ i p i ln p i , and then multiply their fraction by s n=2 from the large-n-approximation: s n=1 = s e,n=1 s e,n=2 s n=2 with s e,n=1 s e,n=2 = q ln q+(1−q) ln(1−q)) q 2 ln q 2 +2q(1−q ln[2q(1−q)]+(1−q 2 ) ln(1−q 2 )) ≈ 0.6 for a wide range of q. We use as corrected C s e,n=1 s e,n=2 s n=2 .
The correction is only significant for high turnover rates where a large fraction of urns has size 1, and with it, the theoretical α is confirmed by simulations (see Figure 3). Furthermore, if the average size E and turnover rate µ are known, the power law exponent α (via the constant C) and the exponential decay β can be determined numerically (see Figure 1c,d). In case (I) where urns shrink, C is so big and consequently α so low, that P n has a strong exponential cutoff in order to keep the system at the same mean urn size, in agreement with (5). Although (6) holds only for β = 0, it only slightly overestimates α for β > 0, since the exponential cutoff affects only a small fraction of urns. In the presence of β > 0, C can be greater than 1, resulting in α < 1, which would diverge without exponential cutoff.  The larger µ and the mean urn size E, the larger the fluctuations in number of removed balls in step 3, and the more the urn size distribution fluctuates. Both α and β are independent of system size (except if the system size is too low for convergence, in which case β increases), see Figure 1d.
Simulation results are independent of the urns' probability to attract balls in one time step, q, in agreement with our theoretical result in (6).
In addition to simulations, we derived the same size distributions for cases (I) and (II) with another method using the exact probabilities of p(n t |n t−1 ) for every individual urn, which we calculated with a recursion equation (see Appendix A and Figure A1a). Also this method reproduces all of the results of the maximum entropy method which we presented above and in Section 4.4 (see for example Figure A1b).

Aggregate Growth Rate Distribution of the Stable Size Process
It follows from the binomially (or normally) distributed p(n i,t |n i−t,1 ) (where σ(n) ∝ n 0.5 ) that an individual urn's growth rate, defined as g i,t = n i,t n i,t−1 , is also normally distributed with scaling σ g (n) ∝ n −0.5 . The aggregate growth rate distribution (aggregated over all urns in one timestep, dropping the index t) is G(g) = ∑ N i=1 p(n i )G(g i |n i ), or in the continuous limit G(g) = ∞ n 0 dn G(g|n)ρ(n). This can be evaluated using (8) and for ρ(n) the expression (5). For α = 0.5 and β = 0, this yields a upper incomplete Gamma function shown in Figure 4 and References [38][39][40]: G(g) ∝ Γ 0, 1 2 n 0 (g − 1) 2 .  Such 'tent-shaped' aggregate growth rate distributions are often observed for quantities that themselves follow a power-law [22,25,27,29,41]. The tent shape is the sample average, but not the expectation for a given urn, as other models for it presume [28,42]. This result adds credibility to the stable size process as a model for some real system, in particular since a tent-shaped aggregate growth rate distribution does not automatically result from other models for scalefree distributions. An example is a multiplicative noise term γ in the linear Langevin equation n t+1 = γn t + δ [7,43] (where δ is additive noise and n t is the size of the process at time t). Such models produce a scalefree distribution for n above some value n , but the growth rate γ can be any i.i.d. random variable [44][45][46] independent of an urn's size n, and no distinction between individual growth rate and aggregate growth rate can be made. Therefore it does not additionally generate a tent shape for the aggregate growth rate distribution (unless a tent shape is assumed as individual growth rate distribution γ).

Extension to Networks of the Stable Size Process
The algorithm can be adapted to derive the degree distribution for networks, where M nodes are connected with N undirected and unweighted links. The substeps become: (1. and 2.) A random link is broken, and one of its neighbors i is chosen to receive an additional link (i.e., every node is picked with probability proportional to its degree n i ). Its new neighbor j is also picked with probability ∝ n j . (3.) Nodes are removed at random at rate δ exit ; their links are broken. (4.) Nodes are re-introduced and linked to an existing node; the probability of selecting a node i as neighbor is ∝ n i . New links are added to keep N conserved; each node has a probability of receiving a link ∝ n i .
Compared to an urn/ball system, the exponential cutoff always exists, for the following reason. The case (II) in Section of Possible Cases, where the only shrink mechanism is exit nodes, cannot be reached. If a node exits the network, all its links are broken, so necessarily also non-exiting nodes will lose the same number of edges. The maximal turnover µ rate is therefore 0.5. Numerical results confirm that a scalefree network without cutoff is not produced by this algorithm.
In previous work [47,48], we have added further features to make the model more plausible for example, for epidemiology, such as clustering (that a link is preferably formed between neighbours of second or third degree), or different exit rules, for example, removal of a node after a given time span instead of exit by rate δ exit . The latter increases in addition the exponential cutoff, because it prevents nodes to remain a sufficiently long duration to attract many links. In that case α and β in (5) can still be inferred numerically from E, µ and additional features (see Figure 5).

Maximum Entropy Argument of Other Systems
The method of using the sum of entropies of the evolution of individual urns as a constraint on the entropy of the system can be applied to many urn-ball systems in discrete time steps. It will affect the maximum entropy size distribution whenever s i depend on n i .

Maxwell-Boltzmann Distribution
A well-known example for a maximum entropy distribution is the velocity distribution of gas particles (Maxwell-Boltzmann distribution, here in one dimension). The only assumption about the process generating the velocity distribution P is that the mean E = ∑ i i ∝ ∑ i v 2 i is conserved in time. Particles can change their velocity through collisions with other particles. In a given timespan, the sum of received shocks of particle i (in one dimension) follows a Gaussian distribution, which has entropy s i = 1 2 ln(2πσ 2 ), but all particles are hit by shocks of the same distribution, that is, s i = s, since σ does not depend on a particle's own velocity v i . The focus is usually not on the distribution of individual change of v i , only on the stationary distribution of v. In one dimension, the Lagrangian function becomes S(P) = ∑ v P v ln P v + λ (∑ v P v s v − C) + β ∑ v P v v 2 − E with λ = 0 at the extremum where ∂S ∂v = 0, and results in P v = K exp(−βv 2 ). The constraint on entropies vanishes, as s i do not depend on v i , and the established Maxwell-Boltzmann distribution is found.

Yule Process (or Barabasi-Albert for Networks)
We simulated the Yule process in discrete time steps of adding a number N add of balls before adding an urn. If we consider larger time steps where several urns and many balls are added, the growth of an urn is approximately binomial with ω = 0.5. Following our argument in Section 3, the binomial p(n i,t+1 |n i,t ) are themselves maximum entropy distributions [36] and therefore the sum of their entropies is stationary. The system has the constraint that the sum of individual entropies C = 1 M ∑ M i=1 s i in one time step is stationary. The Lagrangian function becomes S(P) = ∑ n P n ln P n + λ (∑ n P n ln(n) − C), which is maximal for P n = Kn −λ .

Multiplicative Noise
An example are systems described by a multiplicative noise term in the linear Langevin equation [7,43] n t+1 = γn t + δ. They can be written like n t+1 = n t + h(n, t) where the noise term appears now as an additive term. This (e.g., Gaussian) noise term h(n, t) has then σ h (n) ∝ n 1 , that is, ω = 1. In this case, a large number of urns will attain size zero, since p(0|n) = p 0 = const does not decrease for larger urns, due to σ(n) ∝ n. For this reason many empty urns need to be refilled and have size 1. We assume again that s i are therefore C = 1 M ∑ i ln n 2 i are stationary. Since the system is at constant size, we also assume ∑ n P n n − E. The Lagrangian function is S(P) = ∑ n P n 2 ln P n + λ (∑ n P n ln(n) − C) + β (∑ n P n n − E), which is maximal for P n = Kn −λ e −β . Examples in the literature often have α sufficiently large to not need a cutoff β for the system to be at a given mean urn size [43]. An additional exit rate of urns can be added, in which case the power law exponent grows with exit rate, like in Figure 1.

Conclusions
We have introduced a method to derive stationary distributions, by looking at them as the maximum entropy distribution of the outcomes in one iteration, for a process in discrete time. The method provides an intuitive explanation for a size or degree distribution. It has been applied to a novel preferential attachment process for systems of constant size. The model has been analyzed for three different methods to keep the system at constant size. Each provides a realistic model for real-world applications. Results are confirmed by simulations and by summing over exact probabilities. We have also applied the method to derive the Maxwell-Boltzmann distribution for the velocity of gas particles, to the Yule process, and to multiplicative noise systems, where in each case established results are reproduced. The constraint that allowed these derivations is that the sum of entropies of the individual urns are also maximal when the system's entropy is maximal.  and for the following shrink step when the system has grown to (1 + q)M, the probability to shrink by one s p s = 1 1+q . From this follows that The process can thus be rephrased as a growth process that follows p given in Equations (A1)-(A3), using the shorter notation v and w, irrespective of the interpretation of the microfoundations. This probability mass function has mean m = 1 since 2v + w = 1, and variance Var( For an urn of size n, E(X) = E(X 1 + X 2 + . . . + X n ) = E(X 1 ) + E(X 2 ) + . . . + E(X n ) = n, and Var(X) = Var(X 1 + X 2 + . . . + X n ) = Var(X 1 ) + Var(X 2 ) + . . . + Var(X n ) = n2v and thus the standard deviation of an urn's next size p(n t+1 |n t ) scales as with its size n. This scaling holds whenever growth is the sum of independent growth of balls.
(i) From (A1)-(A3), the probabilities p(j|k), can be calculated, similar to Pascal's triangle for binomial coefficients. The lowest possible j for an urn of size n t−1 = k is always 0 (all balls leave), the largest is always 2k (all balls attract another ball). Every probability is itself a sum of terms p(j|k) = ∑ (x,y)|x+y=k;y max =k−|k−j| c x,y,j,k · v x (1 − 2v) y (A5) We calculated the coefficients c x,y,j,k recursively from coefficients of the corresponding addends in the 3 terms p(j|k − 1), p(j − 1|k − 1) and p(j − 2|k − 1) with the corresponding powers x and y: c x,y,j,k = ∑ j =j−2,j−1,j c x−1,y,j ,k−1 + c x,y−1,j ,k−1 (A6) if j exists, given j ∈ [0, 2(k − 1)]. The c x,y,j,k with y = y max is calculated first and no c x,y,j ,k−1 can be used in two addends for the same (j, k). With (A6) the coefficients and probabilities have been computed (until n max = 1000). Care has been taken at the implementation since (A5) and (A6) sum over terms of very different orders of magnitude. (ii) With the transition probabilities p(j|k) the most probable time evolution of an urn that started at size 1 can be calculated recursively like p t (n) = ∑ j p t−1 (j)p(n|j). p t (n = 0) grows with t and approaches 1, since over time, the probability to have died out is increasing. (iii) Assuming that equilibrium has been obtained by continuously replacing urns of size 0 by urns of size n = 1, the equilibrium distribution is P n = 1 t max ∑ t p n,t . It is shown in Figure A1.
The obtained size distribution can again be fitted by a power law with exponential cutoff (see Figure A1). The method applies to other processes if p(j|k) can be known. We used it also for multiplicative noise systems where Zipf's law is recovered as result ( Figure A1b).  Figure A1. (a) Numerical normalized probability density with Gaussian p i with σ(n) ∝ n 0.5 without and with turnover, both in log-linear and double logarithmic scale (b) Numerical probability density with Gaussian p i with σ(n) ∝ n generates Zipf's law α = 1.