Heavy-Tailed Probability Distributions: Some Examples of Their Appearance

Abstract: We provide two examples of the appearance of heavy-tailed distributions in social science applications. Among these distributions are the laws of Pareto and Lotka, as well as some new ones. The examples are illustrated through the construction of suitable toy models.

1. The distribution of large capital was considered by Pareto in 1896 (see [1]). The density is $p(x) = \frac{\alpha}{x_0}\left(\frac{x_0}{x}\right)^{\alpha+1}$ for $x \ge x_0$, $\alpha > 0$.
2. Scientific productivity, that is, the number of scientists who published one paper, two papers, and so on, has also been studied. Denoting by $x$ the number of papers published by a scientist, Ref. [2] showed that $n(x) = n_1/x^a$, where $n_1 > 0$ and $a \le 2$ (in many cases, $a$ is close to 2). This is Lotka's law.
3. Lotka's law approximately holds for the number of citations received by a scientist's papers.
4. For a specific literary text, all of its words are listed in descending order of their frequency of occurrence. Comparing the frequency $x$ of a word with its place $r$ in this list (its rank) leads to $x = B/r$, where $B$ is a constant (see [3]).

Why do these patterns emerge? Presumably, Laws 1-3 reflect individual human abilities, while Law 4 reflects memory or other functions of the human brain.
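Integrating the density of Law 1 gives the survival function $\mathbb{P}\{X > x\} = \int_x^\infty \alpha x_0^\alpha t^{-\alpha-1}\,dt = (x_0/x)^\alpha$ for $x \ge x_0$. Inverting it yields a simple sampler. The minimal Python sketch below (standard library only; the values $\alpha = 1.5$ and $x_0 = 1$ and the function names are arbitrary illustrative choices) compares an empirical tail probability with this formula.

```python
import random

def pareto_survival(x, alpha, x0):
    """P(X > x) for the density alpha/x0 * (x0/x)**(alpha + 1), x >= x0."""
    if x < x0:
        return 1.0
    return (x0 / x) ** alpha

def sample_pareto(alpha, x0, rng):
    """Inverse-transform sampling: solve (x0/x)**alpha = U for x."""
    u = 1.0 - rng.random()          # U is uniform on (0, 1], so no division by zero
    return x0 * u ** (-1.0 / alpha)

rng = random.Random(0)
alpha, x0 = 1.5, 1.0
xs = [sample_pareto(alpha, x0, rng) for _ in range(50_000)]
empirical = sum(x > 4.0 for x in xs) / len(xs)
print(f"empirical P(X > 4) = {empirical:.4f}, exact = {pareto_survival(4.0, alpha, x0):.4f}")
```

The tail decays only polynomially: multiplying the threshold by 4 divides the tail probability by $4^{1.5} = 8$, not by an exponential factor.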
We do not consider the fourth law in this paper and focus on Laws 1 and 3; more precisely, we focus on their qualitative explanation. The reason is that Zipf explained his law by the principle of least effort. Although there are no rigorous results establishing a mechanism related to this principle in the human brain, economizing on memory seems natural. However, the principle of least effort does not seem to be related to the essence of Laws 1-3.
At first glance, everything looks quite simple. The population of a country is heterogeneous. There are people more capable in business (Law 1) or scientific work (Laws 2 and 3) and people who are not (or are less) capable of such activities.
However, how large are these differences in ability, and are all differences in "success" determined by ability? Does chance play a role? Let us first focus on the first law and try to build a model that explains why it occurs.
The distributions of income and capital are affected by many factors that cannot be fully accounted for. We are not interested in the whole mechanism of the accumulation and distribution of capital, but only in the roles of human talent and chance in this process. How essential are these roles? To answer this, we use a toy model in which all people have identical abilities. If the role of chance is small, the outcomes of different investors in the model will vary little. Conversely, if we see large differences between investors, this will indicate that chance plays a significant role.
As previously noted, we want to provide examples of the possible occurrence of distributions with power tails in connection with classical empirical facts. The presentation of modern results related to the use of such distributions is not within the scope of the problems considered here. The reader interested in studying the modern use of heavy-tailed distributions in financial problems is referred to [4] and the literature cited therein.

A Toy Model for the Distribution of Capital
Let us consider the first toy model of the distribution of capital leading to Pareto's law. Suppose for simplicity that there exists only one business. All possible investors are equal in terms of their talent and initial capital. Consider the case in which each investor invests one unit of capital in the business. After one unit of time, the business outcome is $X_1$, where $X_1$ is a random variable. Suppose that the investor leaves this entire sum in the business and that market conditions remain the same during the following time interval. Then, the outcome after the second time interval is $X_1 \cdot X_2$, where $X_1$ and $X_2$ are independent identically distributed (i.i.d.) random variables. In the same way, the outcome after the $n$-th time interval is $\prod_{j=1}^{n} X_j$, where $X_1, X_2, \ldots, X_n$ are i.i.d. random variables. Let us suppose that market conditions change radically at a random moment $\nu_p$, so that investing in the business becomes unprofitable. Therefore, the final outcome is $\prod_{j=1}^{\nu_p} X_j$. We are interested in the behavior of this outcome for large values of $\nu_p$. More precisely, we suppose the following:
1. $X = \{X_1, X_2, \ldots, X_n, \ldots\}$ is a sequence of i.i.d. positive random variables, and $a = \mathbb{E}\log X_1$.
2. $\nu = \{\nu_p, p \in \Delta \subset (0,1)\}$ is a family of positive-integer-valued random variables independent of the sequence $X$, with $\mathbb{E}\nu_p = 1/p$.
Generally, no information on the $\nu$-family is available. We shall consider a few cases, starting with a simple one:
3. $\mathbb{P}\{\nu_p = k\} = p(1-p)^{k-1}$, $k = 1, 2, \ldots$, i.e., $\nu_p$ has a geometric distribution.
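The toy model is easy to simulate. The Python sketch below rests on assumptions of our own: $\log X_j \sim N(a, \sigma^2)$ (the model only requires positive i.i.d. $X_j$), $\nu_p$ geometric as in case 3, and arbitrary parameter values. It draws the final capital of 10,000 identical investors and reports the share held by the top 1%. Because a sum of $\nu$ i.i.d. normal terms is normal, $\log \prod_{j \le \nu_p} X_j$ is drawn directly as $N(\nu_p a, \nu_p \sigma^2)$. The function names are hypothetical.

```python
import math
import random

def final_log_capital(a, sigma, p, rng):
    """log of prod_{j=1}^{nu_p} X_j for one investor.

    Illustrative assumptions: log X_j ~ Normal(a, sigma**2), and nu_p is
    geometric on {1, 2, ...} with mean 1/p, independent of the X_j.
    """
    u = 1.0 - rng.random()                                # uniform on (0, 1]
    nu = math.floor(math.log(u) / math.log1p(-p)) + 1     # P(nu > k) = (1 - p)**k
    # The sum of nu i.i.d. Normal(a, sigma^2) terms is Normal(nu*a, nu*sigma^2).
    return rng.gauss(nu * a, sigma * math.sqrt(nu))

def top_share(log_values, fraction):
    """Share of total capital held by the top `fraction` of investors."""
    m = max(log_values)
    # Rescale by exp(-m) before exponentiating to avoid overflow.
    values = sorted((math.exp(v - m) for v in log_values), reverse=True)
    k = max(1, int(len(values) * fraction))
    return sum(values[:k]) / sum(values)

rng = random.Random(42)
logs = [final_log_capital(a=0.02, sigma=0.1, p=0.01, rng=rng) for _ in range(10_000)]
print(f"top 1% of identical investors hold {top_share(logs, 0.01):.1%} of the capital")
```

Although all simulated investors are identical, the capital is highly concentrated, which illustrates the role of chance discussed above.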
Theorem 1. Suppose that cases 1-3 hold. Let $a \neq 0$. Then,

In the case of $a > 0$ (a profitable business), we have a Pareto distribution, which Pareto proposed on the basis of an empirical study (see [1]). For the proof of Theorem 1, see [5], where the case $a = 0$ is also treated. In that case, $Z_p$ must be redefined; under the condition that the second logarithmic moment of $X_1$ exists, $Z_p$ then converges in distribution to a mixture of the distributions provided in Theorem 1.

It is well known that the Pareto distribution has heavy tails. This implies that most of the capital belongs to a relatively small number of people. We see that the Pareto distribution appears in a very natural way, namely as a limit distribution for the product of a random number $\nu_p$ of random variables $X_j$. In case 3, $\nu_p$, $p \in (0, 1)$, has a geometric distribution. What happens with other ("natural") distributions? Below, we consider two additional cases leading to analogs of stable distributions (see [6]):
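The normalization used in Theorem 1 is not reproduced here. As a sketch, one can reason via Rényi's theorem on geometric sums. For $a > 0$, $p \log \prod_{j=1}^{\nu_p} X_j = p\sum_{j=1}^{\nu_p} \log X_j$ is approximately exponential with mean $a$ as $p \to 0$, so $\left(\prod_{j=1}^{\nu_p} X_j\right)^p$ is approximately Pareto with $\mathbb{P}\{\cdot > x\} \approx x^{-1/a}$, $x \ge 1$. The power $p$ as normalization is our assumption. The Python check below reuses the illustrative lognormal assumption and arbitrary parameters.

```python
import math
import random

def scaled_log_outcome(a, sigma, p, rng):
    """p * log(prod_{j=1}^{nu_p} X_j), assuming log X_j ~ Normal(a, sigma**2)
    and nu_p geometric on {1, 2, ...} with mean 1/p."""
    u = 1.0 - rng.random()
    nu = math.floor(math.log(u) / math.log1p(-p)) + 1
    return p * rng.gauss(nu * a, sigma * math.sqrt(nu))

rng = random.Random(7)
a, sigma, p = 0.5, 1.0, 0.001
samples = [scaled_log_outcome(a, sigma, p, rng) for _ in range(20_000)]

mean = sum(samples) / len(samples)
# If p*log(product) is approximately Exponential with mean a, then
# P{product**p > x} = P{p*log(product) > log x} is approximately x**(-1/a).
x = math.e
tail = sum(s > math.log(x) for s in samples) / len(samples)
print(f"mean of p*log(product): {mean:.3f} (a = {a})")
print(f"P(product**p > e): {tail:.4f} vs Pareto tail e**(-1/a) = {x ** (-1 / a):.4f}")
```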

4. $\nu_p$ has a probability-generating function

5. $\nu_p$ has a probability-generating function, where $T_n(u)$ is a Chebyshev polynomial of the first kind and $n = 1/\sqrt{p}$ is its degree.
From the result of [7], it follows that the limit distribution of $\log Z_p$ as $p \to 0$ has the density $\frac{u^{1/m-1}\, e^{-u/b}}{b^{1/m}\,\Gamma(1/m)}$, $u > 0$, i.e., a gamma density with shape $1/m$ and scale $b$. It then remains to pass from this limit density of $\log Z_p$ to the limit distribution of $Z_p$ itself.
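The last step is a change of variables. If $L = \log Z$ has density $g(u)$, then $Z = e^L$ has density $f(z) = g(\log z)/z$ for $z > 1$. With the gamma density above, this gives $f(z) = (\log z)^{1/m-1} z^{-1-1/b} / \left(b^{1/m}\Gamma(1/m)\right)$, a power tail with index $1/b$ times a slowly varying logarithmic factor. The Python sketch below checks the normalization numerically and confirms this form of $f$. It assumes, as the text implies, that $\log Z_p$ is the quantity with the quoted limit density. The values of $m$ and $b$ are arbitrary, and the function names are hypothetical.

```python
import math

def log_limit_density(u, m, b):
    """Gamma density with shape 1/m and scale b (the limit density of log Z_p quoted above)."""
    if u <= 0.0:
        return 0.0
    shape = 1.0 / m
    return math.exp(-u / b) * u ** (shape - 1.0) / (b ** shape * math.gamma(shape))

def limit_density(z, m, b):
    """Density of Z = exp(L) if L has log_limit_density: f(z) = g(log z) / z for z > 1."""
    if z <= 1.0:
        return 0.0
    return log_limit_density(math.log(z), m, b) / z

def total_mass(m, b, v_max=10.0, steps=100_000):
    """Check that g integrates to 1 (composite trapezoidal rule).

    The substitution u = v**m turns g(u) du into m * exp(-v**m / b) / C dv with
    C = b**(1/m) * Gamma(1/m), which is smooth at v = 0. v_max must be large
    enough for exp(-v_max**m / b) to be negligible.
    """
    c = b ** (1.0 / m) * math.gamma(1.0 / m)
    def integrand(v):
        return m * math.exp(-(v ** m) / b) / c
    h = v_max / steps
    inner = sum(integrand(i * h) for i in range(1, steps))
    return h * (0.5 * integrand(0.0) + inner + 0.5 * integrand(v_max))

def tail_ratio(z, m, b):
    """z**(1 + 1/b) * f(z) / (log z)**(1/m - 1); constant (= 1/C) for all z > 1."""
    return z ** (1.0 + 1.0 / b) * limit_density(z, m, b) / math.log(z) ** (1.0 / m - 1.0)

print(total_mass(2.0, 1.0), total_mass(3.0, 2.0))
print([round(tail_ratio(z, 3.0, 2.0), 12) for z in (10.0, 1e3, 1e6)])
```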
Theorem 3. Suppose that cases 1, 2, and 5 hold. Let $a = 0$ and suppose that the second logarithmic moment of $X_1$ exists. Then,

Proof. As in the proof of the previous theorem, we pass from $Z_p$ to its logarithm, apply the corresponding result from [6], and return to the limit distribution of the original random variables.
None of the three models constructed above takes into account the abilities of the people investing in the given enterprise, yet all of them lead to heavy-tailed distributions. The investors differ only in terms of the occurrence of an unfavorable event (the moment $\nu_p$). One may object that this moment is the same for the whole business, i.e., the business becomes insolvent for all investors at once. However, the investors invested in the business at different times, so the period of investment differs from investor to investor. Thus, the outcome depends very strongly on timing and chance. We do not deny that the talent of the investor matters significantly, but it would be very difficult to separate this component from random factors.
Here and later, we provide only the simplest examples of schemes leading to heavy-tailed distributions, namely, the case of independent identically distributed random variables. The reason is to show that heavy tails can arise from randomness alone. Introducing dependent variables could create the false impression that certain types of dependence are necessary for such distributions to appear. Readers who wish to see more general results should note that limit theorems for products of random variables can be obtained by applying the corresponding theorems for sums of the logarithms of the original variables. The theory of summation of a random number of random variables is well developed (see [5,8]).
It is well known that limit theorems are examples of ill-posed problems. This may raise doubts about the practical significance of conclusions based on these theorems. So-called pre-limit theorems allow us to justify these conclusions in many situations (see [5]).

Distribution of the Number of Citations
A similar situation occurs when the distribution of the number of citations of scientific publications is studied. Let us make some assumptions.
Assumption 1. All scientists under consideration are equal in terms of their scientific and literary abilities.

Assumption 2. The citations of a paper occur independently.
Assumption 3. The probability that an article will be cited again depends on the number of its previous citations and increases with this number. More precisely, the probability that an article with $k - 1$ ($k \ge 1$) citations will receive no further citations is given by relation (1), where $a > 0$ and $b > a - 1$.
Let $Y$ be a random variable describing the number of citations during the period under consideration. Assumption 1 implicitly implies that $Y$ has the same distribution for different papers, because the scientific abilities of the authors are assumed to be the same.
In view of the independence of the citations, the probability that a paper is cited exactly $n$ times can be written explicitly in terms of the Pochhammer symbol $(a)_n = a(a+1)\cdots(a+n-1)$. It is not difficult to calculate this probability; the result is relation (2). Relation (2) shows that the distribution of the number of citations has a heavy tail, whose severity depends on the parameter $a$, which governs the degree of influence of previous citations. Therefore, a larger value of $a$ corresponds to a heavier tail. In any case, the presence of such a tail implies that the citation counts of almost identical scientists can differ significantly. This leads to a significant stratification of the scientific community due to various random circumstances that have nothing to do with research abilities. Thus, the number of citations seems meaningless as an indicator of scientific value.
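The stratification mechanism can be simulated directly. Relation (1) is not reproduced here, so the Python sketch below uses a hypothetical stopping rule of our own: a paper with $k-1$ citations receives no further citations with probability $\gamma/k$. Under this rule, $\mathbb{P}\{Y \ge n\} = \prod_{k=1}^{n}(1 - \gamma/k) = (1-\gamma)_n/n!$, the Sibuya tail (so $Y + 1$ has a Sibuya distribution), and $\mathbb{P}\{Y \ge n\} \sim n^{-\gamma}/\Gamma(1-\gamma)$. We do not claim that this rule is a special case of (1). The choice $\gamma = 1/2$, the cap, and the function names are arbitrary.

```python
import math
import random

def citations_of_one_paper(gamma, cap, rng):
    """Simulate the citation count of one paper, censored at `cap`.

    Hypothetical stopping rule: a paper with k - 1 citations receives no
    further citations with probability gamma / k (a Sibuya-type hazard).
    """
    citations = 0
    while citations < cap:
        k = citations + 1
        if rng.random() < gamma / k:   # the citation process stops here
            break
        citations += 1
    return citations

def exact_tail(gamma, n):
    """P(Y >= n) = prod_{k=1}^{n} (1 - gamma / k) under the same rule."""
    s = 1.0
    for k in range(1, n + 1):
        s *= 1.0 - gamma / k
    return s

rng = random.Random(1)
gamma, cap = 0.5, 1000
counts = [citations_of_one_paper(gamma, cap, rng) for _ in range(20_000)]
emp = sum(c >= 10 for c in counts) / len(counts)
print(f"P(Y >= 10): simulated {emp:.4f}, exact {exact_tail(gamma, 10):.4f}")
# Power tail: P(Y >= n) * n**gamma * Gamma(1 - gamma) approaches 1 as n grows.
for n in (10, 100, 10_000):
    print(n, exact_tail(gamma, n) * n ** gamma * math.gamma(1 - gamma))
```

All 20,000 simulated papers are statistically identical. Even so, about half of them are never cited, while a noticeable fraction reach the cap of 1000 citations.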
Comments on Assumption 3. At first glance, relation (1) does not seem very natural. However, asymptotically it seems to be almost the only form leading to a heavy-tailed distribution. We consider this in more detail below, but without complete proofs (obtaining general mathematical results is not an aim of this paper).
Let $Y$ be the (random) number of citations of a paper. Suppose that the distribution of $Y$ has a power tail in the sense of relation (3), where $\kappa(n) \to 0$ as $n \to \infty$ and $\kappa(n)$ behaves "regularly" in a certain sense. The symbol $C$ is used for constants, which may differ from place to place. Relation (4) follows from (3). Suppose that $\kappa(n+1)/\kappa(n)$ is bounded from above. Then, the equality (4) implies a relation of the same type, with a remainder $\kappa_1(n)$ that possesses the same properties as $\kappa(n)$.

Now let $Y$ admit a representation of the form (6), where $p_k$ is the probability of the termination of citations. Under some restrictions on the behavior of $p_k$ as $k \to \infty$, the tail of $Y$ can then be expressed asymptotically in terms of the $p_k$; the symbol $\sim$ is used here for asymptotic equivalence as $n \to \infty$. Comparing this expression with (3) and taking the logarithm of both sides of the resulting relation determines the asymptotic behavior of $p_k$. It is clear that Assumption 3 leads to the same asymptotic behavior. However, the presence of the parameter $b$ may make the asymptotics more precise if we fix not only the tail index $\alpha$ but also the corresponding constant $C$ in (3).

There remains the question of how many distributions may be represented in the form (6). Suppose that $Y$ is a random variable taking positive integer values such that $\mathbb{P}\{Y = n\} > 0$ for any $n \in \mathbb{N}$. Then, there are probabilities $p_n$ such that (6) holds. Indeed, it suffices to write $\kappa_n = \mathbb{P}\{Y = n\}$ and $p_n = \kappa_n / \sum_{k \ge n} \kappa_k$; then (6) holds. Note that $p_n$ is the intensity (hazard) rate of the distribution of $Y$. From the considerations above, it follows that, under mild restrictions, the distribution of a positive-integer-valued random variable with a power tail has a representation (6), with $p_k$ asymptotically equivalent to that of (1). This mechanism for the occurrence of heavy-tailed distributions on the set of positive integers turns out to be quite universal and can probably be applied to some classes of applied problems.
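The hazard-rate representation can be checked numerically. The Python sketch below assumes that (6) has the standard form $\mathbb{P}\{Y = n\} = p_n \prod_{k<n}(1 - p_k)$ with $p_n = \mathbb{P}\{Y = n\}/\mathbb{P}\{Y \ge n\}$. It uses a hypothetical discrete power law $\mathbb{P}\{Y \ge n\} = n^{-\alpha}$ with $\alpha = 1.5$. The code computes the hazards, rebuilds the distribution from them, and shows that $n p_n \to \alpha$, i.e., the termination probabilities behave like $\alpha/n$.

```python
def tail(n, alpha):
    """P(Y >= n) = n**(-alpha), n = 1, 2, ... (hypothetical discrete power law)."""
    return float(n) ** (-alpha)

def pmf(n, alpha):
    """P(Y = n) = P(Y >= n) - P(Y >= n + 1)."""
    return tail(n, alpha) - tail(n + 1, alpha)

def hazard(n, alpha):
    """Intensity (hazard) rate p_n = P(Y = n) / P(Y >= n)."""
    return pmf(n, alpha) / tail(n, alpha)

def pmf_from_hazards(hazards):
    """Rebuild P(Y = n) = p_n * prod_{k < n} (1 - p_k) for n = 1, 2, ..."""
    probs, survive = [], 1.0
    for p_n in hazards:
        probs.append(p_n * survive)   # stop exactly at step n
        survive *= 1.0 - p_n          # continue past step n
    return probs

alpha = 1.5
rebuilt = pmf_from_hazards([hazard(n, alpha) for n in range(1, 201)])
print(max(abs(rebuilt[n - 1] - pmf(n, alpha)) for n in range(1, 201)))
# n * p_n approaches alpha: the hazard behaves like alpha / n.
for n in (10, 100, 10_000):
    print(n, n * hazard(n, alpha))
```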
We now make some remarks on the distribution of the impact factor.
Let us now consider whether the impact factor of a journal can serve as an indicator of the scientific significance of a paper published in it. The impact factor of a journal is calculated as the ratio of the number of citations of papers published over a certain period to the number of these papers. Using such an average rests on the expectation that, by the law of large numbers, the influence of chance will be averaged out. However, we shall show that this is not true.
There is a rather large body of literature claiming that a scientific journal's impact factor has substantial value. Based on observed data, the asymmetry of the impact factor distribution and the presence of a heavy tail have been noted. However, these features have not been analyzed theoretically; it has only been suggested that the arithmetic mean be replaced with other statistics in statistical data analysis. A typical work of this kind is [9]. The author does note the similarity between the distribution of some of the data and the Pareto distribution, but does not carry out a mathematical analysis of the reasons for it. Moreover, a mathematically rigorous definition of a distribution is not used; only its "naive" form is examined. Below, we try to clarify the appearance of heavy tails in the impact factor distribution.
We assume that the number of papers submitted to the journal has a Poisson distribution. For simplicity, let us also assume that the number of citations of each submitted paper has a Sibuya distribution. Then, the total number of citations of all papers has a probability-generating function that is the superposition of the generating functions of the Sibuya and Poisson laws, namely, $P(z) = e^{-\lambda(1-z)^p}$ for a fixed $\lambda > 0$ and $p \in (0, 1)$. Clearly, this distribution has a heavy tail with index $p$. Since $p < 1$, its mean is infinite, and the law of large numbers is inapplicable in this situation. Moreover, in this case, the impact factor increases with the number of publications without any increase in their scientific significance. The observed increase (over time) in the impact factors of leading journals confirms this circumstance. We conclude that the impact factor distribution has a heavy tail and cannot be used as an indicator of scientific significance.
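The growth of the average with journal size can be illustrated by simulation. The Python sketch below makes simplifications of our own: the number of papers $m$ is fixed instead of Poisson, the Sibuya index is called $\gamma$ (it plays the role of $p$ in the text), and $\gamma = 1/2$. It uses the fact that a Sibuya($\gamma$) variable is geometric on $\{1, 2, \ldots\}$ with success probability drawn from Beta($\gamma$, $1-\gamma$): averaging $(1-q)^n$ over that Beta law gives exactly the Sibuya tail $(1-\gamma)_n/n!$. Medians over 101 simulated journals are reported because the mean itself is infinite. For index $\gamma < 1$, sums of $m$ such variables grow roughly like $m^{1/\gamma}$, so the average should grow roughly like $m^{1/\gamma - 1}$, here roughly proportionally to $m$.

```python
import math
import random
import statistics

def sibuya(gamma, rng):
    """One draw from the Sibuya(gamma) law on {1, 2, ...}, 0 < gamma < 1.

    Mixture representation: geometric on {1, 2, ...} with a success
    probability q that is Beta(gamma, 1 - gamma) distributed.
    """
    q = rng.betavariate(gamma, 1.0 - gamma)
    while q <= 0.0:                       # guard against a (practically impossible) zero
        q = rng.betavariate(gamma, 1.0 - gamma)
    if q >= 1.0:
        return 1
    u = 1.0 - rng.random()                # uniform on (0, 1]
    return math.floor(math.log(u) / math.log1p(-q)) + 1

def impact_factor(n_papers, gamma, rng):
    """Average number of citations per paper for a journal with n_papers papers."""
    return sum(sibuya(gamma, rng) for _ in range(n_papers)) / n_papers

rng = random.Random(2023)
gamma = 0.5
medians = {}
for n_papers in (10, 100, 1000):
    values = [impact_factor(n_papers, gamma, rng) for _ in range(101)]
    medians[n_papers] = statistics.median(values)
    print(n_papers, round(medians[n_papers], 1))
```

The typical "impact factor" of these simulated journals increases with the number of papers even though every paper follows the same citation law.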

1. It is shown that distributions with heavy tails can arise in some manifestations of social inequality (the distribution of capital, the number of citations, or the impact factor) for purely random reasons. In this case, the spread in the magnitude of inequality is significant.

2. The circumstance specified in the previous item makes it impossible to use indices such as the number of citations and/or the impact factor of a journal as indicators of the scientific significance (scientific quality) of a published work.

3. No separate proof of the existence of heavy tails for the distributions under consideration is needed. Their presence follows from the previously mentioned empirical laws that Lotka, Pareto, and Zipf published many years ago, which have withstood the test of time.