Gintropy: Gini Index Based Generalization of Entropy

Entropy is being used in physics, mathematics, informatics and in related areas to describe equilibration, dissipation, maximal probability states and optimal compression of information. The Gini index, on the other hand, is an established measure for social and economical inequalities in a society. In this paper, we explore the mathematical similarities and connections in these two quantities and introduce a new measure that is capable of connecting these two at an interesting analogy level. This supports the idea that a generalization of the Gibbs–Boltzmann–Shannon entropy, based on a transformation of the Lorenz curve, can properly serve in quantifying different aspects of complexity in socio- and econo-physics.


Motivation
Many researchers use entropy as an appropriate measure for quantifying complexity or the inequality level in a complex system. There is an overwhelming choice in generalized entropy formulas, some of them satisfying more of the basic axioms than the others [1]. The classical Boltzmann-Gibbs-Shannon formula is often used in economic and social studies without elaborating too much on the conditions under which it is an appropriate thermodynamic function. Most prominently, the additivity of entropy upon the factorization of probabilities is, as a rule, not tested and therefore the use of entropy remains at the level of a crude analogy. Using the Tsallis-or Rényi entropy formula [2] is also not a sufficient choice. Although a free parameter in this entropy provides more flexibility in processing and interpreting statistical data and generalizing the additivity, there is no basic reason to not use yet another formula that satisfies the basic physical requirements for the entropy.
On the other hand, the most popular way for quantifying the inequality level in a socio-economic system is to use the Gini index, introduced for the first time by the economist Corrado Gini [3]. This measure provides a simple method of quantifying the deviation from a uniform distribution, and it is not a quantity borrowed by a simple analogy from thermodynamics. It also has the advantage that its value is a number in the [0, 1] interval, like an order parameter. The Gini index is 0 when all members of the investigated society are equal in the relevant quantity, and it is 1 if one member is monopolizing the whole of the available resources. The Gini index can be determined experimentally either graphically by constructing the Lorentz curve [4], or by the simple formula where x i is the relevant quantity for element i, and x is its average value for the whole system with N elements. While the Gini index is traditionally used to measure wealth-, income-or other inequality, the entropy is a concept stemming from physics and mathematics and is applied to understand, describe and construct optimal or equilibrium distributions. At first glance, these two termini show no reason to be connected. However, in recent publications, it has been observed that the Gini index and the total Shannon entropy of socio-economical models and data show a synergic behavior [5].
In this paper, we shall demonstrate that the mathematical construction formulas of the Gini measure of inequality in a society on the one hand and the entropy-probability trace formula on the other hand bring intriguing similarities at a certain step of their derivation. Both quantities are integrated quantities, in the sense of summing over alternative values of a basic variable, x. We propose the usage of the phrase "gintropy" in order to express the combination of the Gini index [3,6,7] and the entropy, both associated with a probability density distribution (PDF).

Basics
Let us consider the relevant quantity of the investigated system as a continuous variable x. This could be, for example, salary, wealth, population, etc. The occurrence frequency of this given value in a huge set of data are described by the normalized PDF: An approximation to such mathematical PDFs is given in the praxis by observing the number of occurrences of values in a short bin [x, x + dx] and dividing these by their sum, the total number: with N tot the total number of observed data. In income distributions, for example, N(x, x + ∆x) is the number of persons having an income in the ∆x interval starting at x. The total income is then obtained as and the average income is given by Both the entropy and the Gini index can be expressed as expectation values of some functions of x over the PDF ρ(x), and we are going to demonstrate the latter in the present paper.
Not only the PDFs, but frequently the cumulative distributions are in our light-spot. The first reason for this is that the experimental shape of the cumulative functions are smoother even in the case of a poorer statistics. The second reason is that, especially for income distribution and inequality, the total body of "rich" is better contrasted to the"poor".
It is straightforward to construct the quantity "the population fraction of richer than x" as the tail-cumulative integral of the PDF: A similar cumulative quantity is the wealth accumulated by this richer class, divided by the average income: Trivially, one obtains C(0) = 1 and F(0) = 1. The famous Pareto-law expresses that a p fraction of the population possesses a (1 − p) fraction of the wealth. In the original statement about the economy at the end of 19th century, it was p = 0.2, formulated as the "80/20" rule: 20 percent of the population having 80 percent of the total wealth [8][9][10]. Later, a "90/20" rule has also been suggested by Dunford [11]; this loses, however, the elegant definition of the Pareto point (see the next paragraph). Analyses of national GDP comparisons and wealth distribution in certain countries often use in the wealthy region a power-law fit, ρ(x) = cx −(1+α) , calling the parameter α the Pareto-index [12][13][14][15]. It is however largely debated where should one consider the cut-off in the distribution curve, over which the tail is of the power-law type. For part of the PDF, exponential fits can also be done [16]. As an overall fit to the whole income distribution curve recently, it has been shown that a Tsallis-Pareto cut power-law or some special beta prime distribution works well [17].
For a simple division of the system in an upper and lower class the x P Pareto-point is used, satisfying: The implicit relation, x P (p), depends on the underlying PDF, ρ(x). Since C(0) + F(0) = 2 and the general sum is monotonically decreasing, due to there is always a point x = x P , where C(x P ) + F(x P ) = 1. However, the value p cannot be arbitrary.
As we shall discuss in the next section, the Gini index, G, can be expressed in several alternative ways: (i) as the average of big differences in the data set, (ii) as a construction using the above cumulative quantities or (iii) as an expectation value of the cumulative of the cumulative. G expressed as an integral over C contains an integrand σ(C). For some PDFs, this function turns out to be formally identical with the terms in entropy-a probability trace formula known from elsewhere. These formulas define the gintropy, as a function of the cumulative measure of being "richer than", σ(C)-and this function coincides with the classical entropy for an exponential PDF, like the Gibbs-Boltzmann distribution of energy in thermodynamics. For some other, frequently considered distributions in complex systems, the gintropy resembles terms of various generalizations of the Gibbs-Boltzmann-Shannon entropy. Among others, we arrive at the Tsallis entropy for the original Pareto distribution, and some further interesting cases. By construction, as we shall demonstrate later, the gintropy curve is the difference between the Lorenz curve and the diagonal in the F vs. C maps.
In the sequel of this paper, we explore these formulas as several facets of the Gini index and its calculation. After the mathematical definitions and equivalent forms, we present certain analytically given PDFs, each reflecting a theoretical possibility about income inequalities: extreme communism giving every person the same income; a divided society defining two classes of the previous case with a fixed share; eco-window, providing equal probability to any income in a fixed, but possibly even an infinite interval; the exponentially distributed income taken as an analogy to the nature of atomic physics; and finally the Pareto-distribution characteristic to capitalism. A different Gini index, G, and also a different gintropy, σ(C) belong to each model. Finally, we collect a few ideas about what laws the Gini index and gintropy may follow: is there a trend akin to the second law of thermodynamics? Are societies closed systems or not? Can or must inflation distort our analysis?

Gross Inequality in General
Let ρ(x) be a normalized PDF. The Gini index in the continuous x case is defined as: It can easily be proven that its value is always between zero and one, and is used to quantify the gross inequality in the distribution ρ(x). The original definition (10) can be expressed by using the cumulatives as This expression can be further comprised by considering the cumulative of the cumulative: Finally, from here, the Gini index is then expressed as a ratio of two expectation values: Alternatively, it can be solely expressed via the cumulative population. From the corresponding definitions, we have the derivatives: ρ(x) = −dC/dx, xρ(x) = − x dF/dx and therefore (13), and, integrating by parts, we get: Using the boundary conditions C(0) = 1 and h(0) = x , we arrive at This form is reminiscent of the quantum impurity measure, Tr(ρ − ρ 2 ), which is zero only for pure states. In the theory of searching trees in informatics, the expression [18]. Let us also note here that for scaling PDFs, i.e., ρ(x) = 1 x f x x , the cumulative functions, C, and the Gini index, G, do not depend directly on x , it depends only on the form of the f (z) function. This is important when studying the history (time evolution) of G and the related constructions: an overall inflation increasing x in time will not influence this inequality measure.
Finally, we arrive now at the construction of the quantity gintropy. A fashionable representation of the Gini index is realized by plotting the cumulative wealth percentage in terms of the cumulative population possessing that wealth like in Figure 1. It can be shown that the half-moon area between the Lorenz curve [4,[19][20][21][22] and the diagonal of the unit square (known as equality line) in such an F(x) vs. C(x) plot, is exactly G/2. The integrand, σ(C), under the integral over C-which runs between zero and one-behaves alike an entropy-density. (The original and nowadays used Lorenz curve actually maps the low-cumulatives, integrated from zero to x. However, σ(C) instead of σ(C) does not remind to entropy formulas). We call this quantity gintropy, and define as the difference between the rich-end-cumulative Lorenz curve and the diagonal: From the above definition, σ(C) remains to be reconstructed with the help of C(x). We note that, using the relations C(x) = 1 − C(x) and F(x) = 1 − F(x), this quantity equivalently can be expressed by the poor-end-cumulative Lorenz curve (the generally used form of the Lorenz curve) too: The Gini index is expressed from the gintropy as a simple integral We note here that, for any integral, one substitutes It is interesting to summarize the proof of this statement here because it is a central motivation of thinking in terms of gintropy. Using the respective definitions of the tail-cumulative quantities, the half-moon area (16) is calculated as the following double integral: Changing the order of integration leads to Here, the term with 1 in the first parenthesis integrates to zero due to the definition of the expectation value, x . Then, we replace −C(y)ρ(y) = 1 2 d dy C 2 , integrate by parts and compare the result to (15) to conclude: Now, we explore some basic properties of gintropy. Some of these provide further evidence to consider gintropy like a generalized entropy density.

1.
The gintropy is never negative: σ = F − C ≥ 0 is proven by inspecting the integral and taking the first form for x ≥ x , the second form for the opposite case. This implies that the rich-end wealth fraction is always bigger or equal to the population fraction possessing it.

3.
According to Equation (8) at the Pareto-point, the gintropy equals σ(x P ) = 1 − 2p, and therefore, for the Pareto point, p ≤ 1/2 holds for the rich fraction. Since σ max ≥ σ(x P ), in order to get a Pareto point: σ( x ) ≥ 1 − 2p, i.e., the maximum of the gintropy has to be bigger than this difference value. As a consequence for the Pareto Point, we have a restriction imposed by the maximal gintropy (1 − σ( x ))/2 ≤ p ≤ 1/2.

4.
The expectation value of gintropy is the half of the gini index:

The integral of gintropy over the base value x is the non-Poissonity index,
with Var(x) = x 2 − x 2 being the variance of x. The proof of this statement uses the same mathematical trick as the one in Equation (12). 6.
For some particular PDFs, σ(C) looks like an entropy density formula, s(p i ). We present important examples in the next section.

Important Examples
In this section, we list some important examples of the gintropy, σ(C). We go through primitive models of income/wealth distributions, labelled as communism, communism++, eco-window, natural, or capitalism. Starting from model PDFs, the gintropy expression and the Gini index are calculated.

Communism
Our first example is communism: all incomes are equal, the PDF is simply a singular delta-distribution, peaked at the single value a: ρ(x) = δ(x − a) leading to x = a, C(x) = Θ(a − x) and h(x) = (a − x)Θ(a − x), with Θ(x) the Heaviside step function defined as: This leads to x F = h + xC = aΘ(a − x) and by that i.e., to an identically vanishing gintropy. As a consequence, also G = 0. Here, the Pareto-point belongs to a 50/50 division.

Communism++
The next example we present is a slight variation of the previous: now, two peaks in a given ratio constitute the PDF. This belongs to a two-class-society where all are equal but some of them are more equal. The two-peak-PDF, The w fraction of the population has an income a and the (1 − w) fraction b. The cumulative rich population graph shows two steps, at a and b, respectively: having the value 1 for x ≤ a, (1 − w) for x ∈ [a, b], and 0, otherwise. Therefore, C(x) = 1 − C(x) is zero for x ≤ a, equals w in the mid interval, and has the value 1 otherwise. The Gini index is obtained from this as: Expressing the weights, w = b− x b−a and 1 − w = x −a b−a , we obtain the alternative form It is worth noting that, for a → 0, i.e., when the lower class has (almost) zero income, the Gini index, cf. (26) tends to G → w, exactly the share of the people earning a → 0 in the population. This result is independent of b, the income in the upper class.
The gintropy, following its definition, first is expressed as a function of x: It is easy to see that, outside the interval [a, b], the gintropy is zero. Inside the interval, only the second term survives giving In conclusion, σ(C) shows a plateau at C = 1 − w with the value G and its jumps are at C(a) = 1 − w/2 and C(b) = (1 − w)/2: It is easy to check that indeed The corresponding Lorenz curve is illustrated in Figure 2a.

Eco-Window
The next example is still mathematically simple with a window-form PDF. We label this as eco-window: here, everyone has the same chance for all of possible incomes between a and b. Eventually, a = 0 and/or b = ∞ may be considered, as special cases. For the PDF ρ( , one obtains the following cumulative rich distribution: Obviously, x = (a + b)/2 and, according to eq. 15, the Gini index becomes: After some tedious but straightforward calculation, the gintropy is obtained as a function of C: For a specific choice of a and b, the corresponding Lorenz curve is illustrated in Figure 2b.

Natural Distribution
Our next example is the natural distribution, mimicking the Boltzmann-Gibbs exponential energy distribution, known from statistical physics. This is not necessarily an equilibrium distribution, it may also be the stationary limit of "growth and resetting" type processes with quantity-independent rates [23]. The PDF is a scaling one: ρ(x) = 1 x e −x/ x . The corresponding tail-cumulative probability, the rich population is given by and the Gini index becomes Our gintropy formula is constructed as follows: first, we obtain the cumulative of the cumulative, From this, it is easy to obtain the wealth share of the rich classes, x F = h + xC = (x + x )e −x/ x , and based on this the gintropy In order to express it as a function of C, we invert (35) to have Finally, it leads to Apart from a constant proportionality factor, this formula formally coincides with the terms in the sum of the Boltzmann-Gibbs-Shannon entropy: To continue the analogy, also C ∈ [0, 1]. Indeed, in this case, gintropy is like the entropy density, with the caveat that the cumulative values C(x) are never disjunct for different x-s; instead, they overlap and show a definite hierarchy. The Lorenz curve for x = 1 is illustrated in Figure 2c.

Capitalism
Our last example is capitalism, conjecturing the base PDF being the cut Pareto (known also as Tsallis-Pareto or Lomax II) distribution [24]: This distribution can also be obtained as the canonical equilibrium optimizer of the Tsallis entropy [25]. The tail-cumulative integral is which upon integration leads to the following cumulative of the cumulative: This result also delivers the expectation value, x = h(0) = 1/AB. The Gini index is calculated in the (15) form, and it becomes The gintropy as a function of the income, x, follows the form In order to express this result akin to the entropy, we write σ as a function of C using the inversion of Equation (43) and we obtain Finally, using the Tsallis parameter, q = B/(B + 1), we arrive at the formula: One immediately makes an analogy with the terms in the Tsallis entropy formula: The Gini index is simply Similar to the previously considered cases, we illustrate the Lorenz curve for this distributions as well. For A = 1 and B = 3, the corresponding Lorenz curve is plotted in Figure 2d.
Finally, we summarize the lesson of the considered theoretical examples in Table 1 and Figure 3.

Conclusions
In this work, we explored a density-like quantity called gintropy which occurs in calculating the Gini index, G, for a given relevant socio-economic distribution, ρ(x). This gintropy can be deduced from two cumulative functions, the rich population fraction and the corresponding richness fraction, C(x) and F(x), respectively. The proposed "gintropy" name is meant to suggest a connection between the inequality measure quantified by the Gini index and the entropy. Its dependence on the rich population fraction cumulative function is reminiscent of terms in entropy formulas, known from physics, statistics, and informatics. More precisely, we found that, for the the natural, exponential PDF, the gintropy is reminiscent of the classical Boltzmann-Gibbs-Shannon formula, σ(C) = −C ln C. The Gini index is then the expectation value of the gintropy function; for the exponential PDF, its value is 1/2. For the Tsallis-Pareto distribution, the Gini index must be always over this value.
Several other PDFs have been suggested to describe income or wealth distributions in terms of time [17,[26][27][28][29]. Many of them are not treatable analytically, so the σ(C) relation can only be explored numerically.
Beyond igniting the theoretical phantasy, the gintropy-reminscent of generalized entropy formulas-is also the one-variable density, which lays under the Gini index, originally defined for measuring inequality. Turning this statement around, should we look for such generalizations of the classical entropy formula that are inequality or impurity measures at the same time? We believe that this criterion selects a subclass of possible statistical theories among all possible approaches to the origin, behavior and future of social and economical inequalities. Even generalizations of the Gini index formula have been suggested a few times, cf. [30,31]. We do not expect that a corresponding gintropy ("Lorenz curve minus the diagonal") would resemble any known entropy formula, but this question needs further study.
Finally, it seems that the "correct" entropy measure for economical and social theories can hardly be a simple copy of the classical formula known from physics, mathematics and informatics. Our procedure, described above, is more promising: a recipe for constructing gintropy from cumulative functions of the underlying PDF whose expectation value is the half Gini index and whose dependency on the cumulative rich population coincides with various generalizations of the entropy-probability formula.