Symmetry, Entropy, Diversity and (Why Not?) Quantum Statistics in Society

We describe society as an out-of-equilibrium probabilistic system: in it, N individuals occupy W resource states and produce entropy S over definite time periods. The resulting thermodynamics are however unusual, because a second entropy, H, measures inequality or diversity, a typically social feature, in the distribution of available resources. A symmetry phase transition takes place at a Gini value of 1/3, above which realistic distributions become asymmetric. Four constraints act on S: N and W, and two new ones, diversity and interactions between individuals; the latter two are determined by the coordinates of a single point in the data, the peak. The occupation number of a job is either zero or one, suggesting Fermi–Dirac statistics for employment. Conversely, an indefinite number of individuals can occupy a state defined as a quantile of income or of age, so Bose–Einstein statistics may be required. Indistinguishability rather than anonymity of individuals and resources is thus needed. Interactions between individuals define classes of equivalence that happen to coincide with acceptable definitions of social classes or periods in human life. The entropy S is non-extensive and obtainable from data. Theoretical laws are compared to empirical data in four different cases of economic or physiological diversity. Acceptable fits are found for all of them.


Introduction
In previous papers [1,2], we fitted Lorenz inequality curves [3], non-thermodynamic quantities at first sight, with a simple model of social entropy. Symmetric distribution laws predict equal probabilities of being in the oldest or the youngest decile, or of belonging to the richest or the poorest one. In fact, important differences between such deciles are found practically everywhere, and this requires, as shown below, Gini [4] coefficients Gi ≥ 1/3. The assumption of a symmetry phase transition, similar to that in binary alloys [5,6] and superconductors [7], provides very good fits to data [1]. The four cases in this paper indeed hint at asymmetric distributions as the real-world rule.
Entropy can measure social diversity or inequality [8,9], and possibly other "qualitative" quantities like difficulty, ability [10] or sensitivity. Societies cannot exist without a minimum of cohesion among their members, that is, individuals are correlated through an attractive relation. Interactions are successfully taken into account in the present paper. In principle, they may affect the definition of inequality indicators. The latter are expected to satisfy certain conditions such as anonymity [11], that is, all permutations of individuals or their resources are equivalent and count for one. We discuss the statistical consequences of this conjecture.
Whether societies are or are not in equilibrium is a relevant question in any theoretical approach [12]. They have been visualized as nonequilibrium dynamic networks of voters [13]. A significant difference between them results here from the assumption of a symmetry phase transition. Interclass correlations and non-additive entropies [20,21] finally furnish a convenient picture of social systems.
(vi) Indistinguishability. Statistical descriptions of employment and incomes may be drastically different. A Fermi-Dirac (F-D) statistic applies to employment states, simply because the number of individuals on a job is either zero or one. Alternatively, if states are specified as quantiles of income, the upper limit to the amount of benefit in any of them is the total resource, which pleads for Bose-Einstein (B-E) statistics. Social and economic laws are thus expected to be invariant against exchange [22], rather than permutation, of two indistinguishable, rather than anonymous, individuals or resources.
Mathematical functions are assumed to fulfil the conditions of continuity, differentiability, and so on, required to perform indicated operations on them. We mark conceptually important conjectures by the letter "C" followed by an ordinal. Indistinguishability means then (C1) that social phenomena admit a quantum-like statistical description. Incidentally, other cases exist where classical entities [23,24] obey quantum statistics.
Section 2 discusses the relation between social states and entropy. Section 3 dwells on fictitious societies of independent individuals, and Section 4 examines an inequality- and interaction-dependent model providing rather good fits to actual data. Conclusions appear in Section 5.

Symmetry, Entropy and Universality
Let $F(\omega)$ be the cumulative population fraction (CPF) and $L(F)$ the cumulative benefit fraction (CBF). The Gini coefficient is, by definition,
$$\mathrm{Gi} = 1 - 2\int_0^1 L(F)\,dF = 1 - 2\bar{L}. \qquad (1)$$
That is, all Lorenz curves having the same mean value $\bar{L}$ have the same Gini coefficient. In particular, symmetric distributions should be Gini-equivalent to uniform distributions, with maximum and minimum benefits $\omega_M$ and $\omega_m$, respectively. Define $R_u = \omega_m/\omega_M \ge 0$: the uniform probability density function and its Lorenz curve are
$$f_u(\omega) = \frac{1}{\omega_M - \omega_m}, \quad \omega_m \le \omega \le \omega_M, \qquad L_u = \frac{F_u\,[2R_u + (1 - R_u)F_u]}{1 + R_u}. \qquad (2)$$
Perfect equality results when $R_u = 1$ and $L_u = F_u$. Symmetric distributions lead instead to a maximum of inequality when $R_u = 0$ and thus $L_u = F_u^2$, for which Equation (1) gives Gi = 1/3. Symmetry is therefore incompatible with Gi > 1/3. As shown in [1], symmetric distributions have $\omega_m + \omega_M = 2$, while asymmetry imposes $\omega_M > 2$, actually the usual case in real life. Now, since asymmetric distributions and Gini values above 1/3 do exist, a symmetry change (a phase transition) must take place, which is expected to be at Gi = 1/3. Experimental evidence supports these results: Figure 3 in reference [25], dealing with size distributions of beer bubbles, shows a great number of Gini coefficients above 0.33, and none below, showing that symmetric distributions are indeed unlikely. In a phase transition, universality is expected, whereby near the transition thermodynamic quantities and their possible social counterparts are generalised homogeneous functions [26] of their arguments. We apply this condition to social welfare $U(w; W, N)$ [11].
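The bound Gi ≤ 1/3 for uniform distributions can be checked numerically. The following minimal sketch assumes the standard definition Gi = 1 − 2∫L dF and the Lorenz curve of a uniform density on [ω_m, ω_M], L_u(F) = F[2R_u + (1 − R_u)F]/(1 + R_u), obtained here by integrating ω f_u(ω); the function names are illustrative and do not come from the paper.

```python
def lorenz_uniform(F, R_u):
    """Lorenz curve of a uniform density on [w_m, w_M], with R_u = w_m / w_M."""
    return F * (2.0 * R_u + (1.0 - R_u) * F) / (1.0 + R_u)


def gini_from_lorenz(lorenz, n=10_000):
    """Gi = 1 - 2 * integral_0^1 L(F) dF, evaluated with the midpoint rule."""
    h = 1.0 / n
    area = sum(lorenz((i + 0.5) * h) for i in range(n)) * h
    return 1.0 - 2.0 * area


def gini_uniform_closed_form(R_u):
    """Closed form from integrating lorenz_uniform analytically."""
    return (1.0 - R_u) / (3.0 * (1.0 + R_u))


# R_u = 1 is perfect equality; R_u = 0 is the most unequal uniform case.
ratios = [0.0, 0.25, 0.5, 1.0]
ginis = [gini_from_lorenz(lambda F, R=R: lorenz_uniform(F, R)) for R in ratios]
```

Under these assumptions the Gini coefficient decreases monotonically from 1/3 at R_u = 0 to 0 at R_u = 1, so no uniform (and hence no Gini-equivalent symmetric) distribution exceeds 1/3.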

Welfare, Inequality and Symmetry
Social welfare must increase with $W$ and decrease as $1/N$ when the population increases but total benefit remains constant. Generalised homogeneity means then that transformations $W \to aW$ and $N \to bN$ reduce to multiplication of both $U(\cdot)$ and $w$ by $a/b$. That is, $a = 1/W$ and $b = 1/N$ require
$$U(w; W, N) = \bar{w}\,U_0(w/\bar{w}), \qquad \bar{w} = W/N. \qquad (3)$$
Equation (3) compels the independent variable in $U_0(\cdot)$ to be $\omega = w/\bar{w}$. Social welfare should decrease as inequality increases, a condition satisfied by Foster and Sen's [27] proposal, where $U_0(\omega) = 1 - I(\omega) = H(\omega)/\max(H(\omega))$ and $I(\omega)$ is the normalised measure of inequality. Properties of inequality indicators [11] easily follow from the fact that $\omega$, and therefore $I(\omega)$, are scale-, replication- and permutation-invariant, that is, they do not change if all benefits are multiplied by the same positive constant, the distribution is replaced by a number of replicas of itself, or the ordering of components of the vector $\omega$ is changed. The coincidence of the economic and statistical approaches reinforces the present one.
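The three invariances can be illustrated on a small sample. The sketch below uses the Gini coefficient (computed from the mean absolute difference) as one example of an inequality indicator built on reduced benefits; the income values and function names are hypothetical.

```python
def normalised(w):
    """Reduced benefits omega_i = w_i / mean(w)."""
    mean = sum(w) / len(w)
    return [x / mean for x in w]


def gini(w):
    """Gini coefficient from the mean absolute difference of a sample."""
    n = len(w)
    mean = sum(w) / n
    total = sum(abs(a - b) for a in w for b in w)
    return total / (2.0 * n * n * mean)


incomes = [1.0, 2.0, 3.0, 10.0]          # hypothetical benefits
scaled = [5.0 * x for x in incomes]      # all benefits times a constant
replicated = incomes * 3                 # three replicas of the distribution
permuted = incomes[::-1]                 # reordered components

indicators = [gini(v) for v in (incomes, scaled, replicated, permuted)]
```

All four entries of `indicators` coincide, because each transformation leaves the set of reduced benefits, up to multiplicity and order, unchanged.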

Interactions
Phase transitions reveal relevant interactions in both thermodynamic and social systems. Assume then that individuals occupy sites $r_i$ in a periodic lattice embedded in a Euclidean space of dimensionality $d$. Interaction links between individuals are randomly established. We measure distances $r_{ij} = |r_i - r_j|$ in this space in units of nearest-neighbour distance and assume correlations to exist and to decrease with distance as $\mathrm{corr}(r_i, r_j) \sim r_{ij}^{-\delta}$, with $\delta$ positive. Such is the case in percolative clusters. In a qualitative approach [21], consider constant-density groups: an individual out of $N$ in a cluster of linear size $R \sim N^{1/d}$ interacts with a number of others of the order of
$$\int_1^{R} r^{-\delta}\, r^{d-1}\,dr = \frac{1}{d}\,\ln_\theta N, \qquad \ln_\theta x = \frac{1 - x^{-\theta}}{\theta}, \qquad \theta = \frac{\delta}{d} - 1,$$
which is of the order of $\ln N$ when $\theta \to 0$. We refer to functions $\ln_\theta(\cdot)$ as quasi-logarithms.
If θ is positive, when N goes to infinity the number of interactions per individual remains finite, of the order of 1/θd = 1/(δ − d) ≥ 0. This defines short-range correlations: society behaves as an assembly of independent finite clusters, with possible inner interactions. Long-range correlations, where each individual is connected to infinitely many others as N grows without limit, occur for θ ≤ 0. The parameter θ thus conveys information on the existence, the range and, as we shall show in Section 4.1, the strength of many-body interactions. We point out that ln q (·), where q = θ + 1 = δ/d, is a more usual notation for quasi-logarithms.
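The two regimes can be made concrete with a short numerical sketch. It assumes the usual quasi-logarithm with q = θ + 1, that is ln_θ(x) = (1 − x^(−θ))/θ, which reduces to ln x as θ → 0; the sample values of N and θ are arbitrary.

```python
import math


def ln_theta(x, theta):
    """Quasi-logarithm (1 - x**(-theta)) / theta; ordinary ln(x) at theta = 0."""
    if theta == 0.0:
        return math.log(x)
    return (1.0 - x ** (-theta)) / theta


N = 1e12                               # a very large population
short_range = ln_theta(N, 0.5)         # theta > 0: saturates near 1/theta
long_range = ln_theta(N, -0.5)         # theta < 0: grows as a power of N
near_log = ln_theta(10.0, 1e-8)        # theta -> 0: approaches ln(10)
```

For θ > 0 the value stays bounded by 1/θ however large N becomes, which is the finite-cluster picture of short-range correlations; for θ ≤ 0 it grows without limit.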

Classical Independent Individuals
Contrary to the preceding Section, assume that society is composed of noninteracting anonymous individuals, whose permutations count for one. They form $K$ groups of $N_k$ individuals each; the entropy is that of Maxwell, Boltzmann and Gibbs (MBG). The symbol $\{N\}$ in its defining sum means that each term satisfies $\sum_k N_k = N$. This is similar to the case of gases in physics, although the latter are not affected by inequality, which does become relevant in social systems. Its measure is given by the MBG entropy (3). A simple textbook exercise [21] shows that this is not the right way to count configurations in social systems: quantum statistics are necessary.

Paradoxical Distinguishability
Count the states in an elementary society consisting of two distinguishable individuals, A and B, two equally distinguishable BUs labelled a and b, and $G = C = 3$ states, numbered $k = 1, 2, 3$. Employment states result from three jobs that individuals can occupy or not, and where available resources can alight. If states are defined by income, the amount of resource in each of them is arbitrary. We use Dirac's notation as discussed in the Introduction. An equal sign relates equivalent configurations (all permutations count for one), while the sign "⇔" indicates their indistinguishability (the statistic of independent individuals is either F-D or B-E). The MBG expressions imply that $N = W = 2$ such individuals or BUs populate three states in $\Gamma_{MBG} = 3^2 = 9$ ways. This is a paradoxical result in more than one sense, as we now show. Assume anonymity, that is, all permutations of individuals A and B are equivalent and count for one [11]. The resulting count is one too many for employment states, because $\Gamma_{MBG}$ involves configurations of the type $A|k\rangle + B|k'\rangle = B|k\rangle + A|k'\rangle$ with $k = k'$, that is, where individuals A and B occupy the same job. The notion of state shows here its relevance: anonymity ignores the fundamental zero-or-one restriction on the occupation of employment states. Only three states instead of nine are possible if $A|k\rangle + B|k'\rangle \Leftrightarrow A|k'\rangle + B|k\rangle$ satisfies the condition $k \neq k'$. This is similar to the Gibbs paradox in classical statistical physics. An F-D statistic furnishes the right value for employment: $\Gamma_{FD} = G!/[N!\,(G - N)!] = 3$.
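The two counts quoted for the elementary society (nine labelled configurations, three once a job holds at most one indistinguishable individual) can be reproduced by direct enumeration. This is a minimal sketch; the variable names are illustrative.

```python
from itertools import combinations, product

states = (1, 2, 3)                 # the three states k
individuals = ("A", "B")           # two labelled individuals

# MBG counting: every labelled assignment individual -> state is distinct,
# including those where A and B sit on the same state.
mbg = list(product(states, repeat=len(individuals)))

# F-D counting: individuals are indistinguishable and each job holds at most
# one of them, so a configuration is just a set of occupied states.
fd = list(combinations(states, len(individuals)))

# Labelled configurations that violate the zero-or-one occupation rule.
double_occupancy = [c for c in mbg if c[0] == c[1]]
```

Removing the three doubly occupied configurations and identifying the two orderings of each remaining pair turns the nine MBG configurations into the three F-D ones.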

Resource Paradox
Five ten-unit banknotes are physically distinguishable from a single bill of fifty units, but they are socially indistinguishable. Individual states have total benefit as an upper limit of income, so this type of resource obeys B-E statistics. Three states $|k\rangle$ therefore involve six possible configurations instead of nine, of the type $a|k\rangle + b|k'\rangle \Leftrightarrow a|k'\rangle + b|k\rangle \Leftrightarrow 2a|k\rangle\,\delta_{kk'}$, where $k = k'$ is now included. Combinatorics gives the number of configurations for $C_k$ benefit states, with $W_k$ BUs in group $k$, as
$$\Gamma_{BE} = \prod_k \frac{(W_k + C_k - 1)!}{W_k!\,(C_k - 1)!} \approx \prod_k \frac{(W_k + C_k)!}{W_k!\,C_k!}. \qquad (4)$$
The second Equation (4) applies when $C_k \gg 1$.

Individuals' Paradox
Social individuals, like resources, are B-E indistinguishable, and an equation similar to (4) should apply. Indeed, one finds $\Gamma_{BE} = 6$ for our elementary society. With $G_k$ states and $N_k$ individuals in group $k$, we have, in general:
$$\Gamma_{BE} = \prod_k \frac{(N_k + G_k - 1)!}{N_k!\,(G_k - 1)!} \approx \prod_k \frac{(N_k + G_k)!}{N_k!\,G_k!}. \qquad (5)$$
The second equation applies when $G_k \gg 1$. Statistics for elementary particles result from their spin and are therefore an intrinsic particle property, but they depend on the nature of individual states in social systems. The same individuals may obey F-D employment statistics and display B-E behaviour when their incomes are at stake.
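The value of six configurations for the elementary society follows from the standard B-E count of indistinguishable items on a set of states, which is assumed here to be the counting rule intended in the text. The sketch compares the closed form with an explicit enumeration; function names are illustrative.

```python
from itertools import combinations_with_replacement
from math import comb


def gamma_be(n, g):
    """Ways to place n indistinguishable items on g states, any occupancy (B-E)."""
    return comb(n + g - 1, n)


def gamma_fd(n, g):
    """Same count with at most one item per state (F-D)."""
    return comb(g, n)


# Elementary society: two individuals (or two benefit units) on three states.
be_configs = list(combinations_with_replacement((1, 2, 3), 2))
n_be = gamma_be(2, 3)
n_fd = gamma_fd(2, 3)
```

The enumeration lists the three doubly occupied states plus the three pairs of distinct states, so the same two individuals give six configurations for income states and three for employment states.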

Unattainable Dilution
Is there a connection between the number of states $C_k$ (Equation (4)) and the number of individuals $N_k$ (Equation (5)) in spontaneous groups? Classical statistics requires a high degree of dilution, that is, many more benefit states than individuals, $N_k/C_k \ll 1$. This would mean, for the examples discussed here, many more jobs than employees, or life expectancy at birth well above one hundred years. In actual fact, these quantities are not strictly equal but of the same order of magnitude, $C_k \approx N_k = G_k\nu_k$. This asserts the impossibility of dilution. We therefore conjecture that the attractive intraclass interaction referred to in the Introduction, a many-body effect, is strong enough to induce high occupancy of available benefit states. This amounts to a new formulation of C1, C1', in the particular case of spontaneous groups: dilution is impossible for such groups, so they never behave classically.
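Why dilution matters can be seen by comparing the B-E count with the classical (corrected Boltzmann) count G^N/N!. The sketch below is an illustration under that standard textbook comparison, with arbitrary sample sizes; it is not taken from the paper.

```python
from math import lgamma, log


def ln_gamma_be(n, g):
    """ln of the B-E count (n + g - 1)! / (n! (g - 1)!)."""
    return lgamma(n + g) - lgamma(n + 1) - lgamma(g)


def ln_gamma_classical(n, g):
    """ln of the corrected Boltzmann count g**n / n!."""
    return n * log(g) - lgamma(n + 1)


def relative_gap(n, g):
    """Relative difference between the two log-counts."""
    exact = ln_gamma_be(n, g)
    return abs(exact - ln_gamma_classical(n, g)) / exact


dilute = relative_gap(100, 1_000_000)   # N/G = 1e-4: many more states than individuals
dense = relative_gap(1000, 1000)        # N/G = 1: same order of magnitude
```

The gap is negligible in the dilute case and substantial when states and individuals are comparable in number, which is the situation the text argues is typical of spontaneous groups.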

Entropic Duality
Apply C1' and Stirling's large number formula to Equation (4). One obtains a dimensionless measure of social equality given by the out-of-equilibrium B-E entropy [28] with $G_k\nu_k$ states in group $k$:
$$H_{BE}(\omega) = \sum_{k=1}^{K} G_k\nu_k\,h_{BE}(\omega_k), \qquad (6)$$
to which the contribution of group $k$ is $G_k\nu_k\,h_{BE}(\omega_k)$, with
$$h_{BE}(\omega_k) = (1 + \omega_k)\ln(1 + \omega_k) - \omega_k\ln\omega_k. \qquad (7)$$
Social individuals produce in turn $\nu$-dependent entropy resulting from Equation (5):
$$S_{BE}(\nu) = \sum_{k=1}^{K} G_k\,s_{BE}(\nu_k), \qquad (8)$$
where the B-E entropy production by $\nu_k$ individuals per state on $G_k$ states is
$$s_{BE}(\nu_k) = (1 + \nu_k)\ln(1 + \nu_k) - \nu_k\ln\nu_k. \qquad (9)$$
Equation (9) applies equally well to equilibrium and nonequilibrium entropies, but of course the values of $\nu_k$ in each case are different. The functional relation $\nu(\omega)$ requires still another conjecture, C2: the most probable path for entropy production maximises the number of ways of reaching the final distribution, and thereby the socially constrained entropy production $S_{BE}(\nu)$ during the period of interest [29].
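The Stirling step can be checked numerically. The sketch assumes the large-G form of the B-E count, (N + G)!/(N! G!), and the entropy per state s(ν) = (1 + ν) ln(1 + ν) − ν ln ν with ν = N/G; the occupancy ν = 2 and the group sizes are arbitrary test values.

```python
from math import lgamma, log


def s_be(nu):
    """B-E entropy per state, (1 + nu) ln(1 + nu) - nu ln(nu)."""
    return (1.0 + nu) * log(1.0 + nu) - nu * log(nu)


def ln_count_per_state(nu, g):
    """Exact ln[(N + G)! / (N! G!)] / G for N = nu * G individuals."""
    n = nu * g
    return (lgamma(n + g + 1) - lgamma(n + 1) - lgamma(g + 1)) / g


# Absolute error of the Stirling form for increasingly large groups.
errors = {g: abs(ln_count_per_state(2.0, g) - s_be(2.0)) for g in (10, 1000, 100_000)}
```

The error shrinks as the number of states grows, so the entropic form is a large-group statement and should not be expected to hold for groups of a few states.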

Constraints
Two constraints are obvious, $N = \sum_{k=1}^{K} G_k\nu_k$ and $W/\bar{w} = \sum_{k=1}^{K} G_k\nu_k\omega_k$. Lagrange multipliers should thereby result in two adjustable parameters, namely $\alpha$ and $\beta$. But the out-of-equilibrium value of the entropic form $H_{BE}(\omega)$, due to society's self-inflicted inequality, is a third constraint. An additional Lagrange multiplier is necessary, and results in a new parameter, $\lambda$, measuring diversity. According to C2 and Equations (6) to (9), entropy production $s_{BE}(\nu_k)$ obeys
$$\frac{\partial s_{BE}(\nu_k)}{\partial\nu_k} = \ln\!\left(1 + \frac{1}{\nu_k}\right), \qquad \frac{\partial s_{BE}(\nu_k)}{\partial\nu_k} = \alpha + \beta\,[\omega_k + \lambda\,h_{BE}(\omega_k)], \qquad (10)$$
that is, the first term in the second Equation (10) is a linear function of the social free energy per individual:
$$\varphi_{BE}(\omega, \lambda) = \omega + \lambda\,h_{BE}(\omega), \qquad (11)$$
formally similar to the Helmholtz free energy per molecule of a B-E ideal gas at "temperature" $-\lambda$; as shown in the next Section, this quantity is positive. The distribution law for noninteracting individuals is:
$$\nu_{BE}(\omega) = \frac{1}{e^{\alpha + \beta\varphi_{BE}(\omega,\lambda)} - 1}. \qquad (12)$$
In case of an F-D statistic for individuals, but no change in the B-E nature of resources, it becomes
$$\nu_{FD}(\omega) = \frac{1}{e^{\alpha + \beta\varphi_{BE}(\omega,\lambda)} + 1}. \qquad (13)$$
A similar procedure would apply to any number of independent resources, with as many entropic forms $H_{BE,i}$ and parameters $\lambda_i$, and several peaks in the distribution. Our data show however a single peak in $\nu_{BE}(\omega)$ at $\omega = \omega_p$: it is a poverty peak for income, a youth peak for life expectancy and an old-age peak for cancer incidence. It coincides with a minimum of the social free energy.
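The distribution laws can be written down as a short sketch. It assumes the standard B-E entropic form h(ω) = (1 + ω) ln(1 + ω) − ω ln ω, the free energy φ = ω + λh quoted in the text, and occupation numbers of the usual B-E and F-D type with argument α + βφ; the parameter values are hypothetical and chosen only so that α + βφ stays positive.

```python
from math import exp, log


def h_be(omega):
    """Assumed B-E entropic form per state; h(0) = 0 by continuity."""
    if omega == 0.0:
        return 0.0
    return (1.0 + omega) * log(1.0 + omega) - omega * log(omega)


def phi_be(omega, lam):
    """Social free energy per individual, omega + lambda * h(omega)."""
    return omega + lam * h_be(omega)


def nu_be(omega, alpha, beta, lam):
    """B-E occupation, 1 / (exp(alpha + beta * phi) - 1)."""
    return 1.0 / (exp(alpha + beta * phi_be(omega, lam)) - 1.0)


def nu_fd(omega, alpha, beta, lam):
    """F-D occupation, 1 / (exp(alpha + beta * phi) + 1)."""
    return 1.0 / (exp(alpha + beta * phi_be(omega, lam)) + 1.0)


alpha, beta, lam = 1.0, 1.0, -1.0           # hypothetical parameters
occupation_be = nu_be(1.5, alpha, beta, lam)
occupation_fd = nu_fd(1.5, alpha, beta, lam)
```

With these forms, ln(1 + 1/ν) evaluated on the B-E occupation returns α + βφ exactly, which is the straight-line relation used later for the fits; at ω = 0 the B-E occupation reduces to 1/(e^α − 1).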

Parameters
Since Equation (11) implies that $\varphi_{BE}(0, \lambda) = 0$ for any finite $\lambda$, $\alpha > 0$ determines the fraction of population $1/(e^{\alpha} - 1)$ suffering from extreme poverty or very short life expectancy, that is, $\omega \approx 0$; $\alpha = -\beta\mu$, where $\mu < 0$ is the counterpart of the chemical potential in physics. The peak abscissa $\omega_p$ defines $\lambda$ through the condition that the social free energy be stationary there:
$$\lambda(\omega_p) = -\frac{1}{\ln(1 + 1/\omega_p)}. \qquad (14)$$
Now, $\omega$ must be positive because nobody can survive without resources. This is in particular the case of $\omega_p$, so Equation (14) shows that only negative values of $\lambda(\omega_p)$ are realistic. Parameters $\beta$ and $\alpha$ result from linear regression once $\lambda$ has been determined; $1/\beta = \omega + \lambda\,h_{BE}(\omega) = \varphi_{BE}(\omega, \lambda)$ plays the role of absolute temperature.
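The two-step parameter estimation (λ from the peak, then α and β by linear regression) can be sketched on synthetic data. The sketch assumes the B-E form of h, so that a stationary free energy at ω_p gives λ = −1/ln(1 + 1/ω_p), and fits the line ln(1 + 1/ν) = α + βφ by least squares; the "data" are generated from the model itself with hypothetical parameters, so the example only shows that the procedure recovers them.

```python
from math import exp, log


def lam_from_peak(omega_p):
    """lambda from d(phi)/d(omega) = 0 at the peak: -1 / ln(1 + 1/omega_p)."""
    return -1.0 / log(1.0 + 1.0 / omega_p)


def phi_be(omega, lam):
    """omega + lambda * h(omega), with the assumed B-E form of h."""
    h = (1.0 + omega) * log(1.0 + omega) - omega * log(omega)
    return omega + lam * h


def fit_alpha_beta(omegas, nus, lam):
    """Least-squares line ln(1 + 1/nu) = alpha + beta * phi(omega, lambda)."""
    xs = [phi_be(w, lam) for w in omegas]
    ys = [log(1.0 + 1.0 / n) for n in nus]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    beta = sxy / sxx
    return my - beta * mx, beta


# Synthetic distribution with its peak placed at omega_p = 0.4.
lam = lam_from_peak(0.4)
omegas = [i / 10 for i in range(1, 40)]
nus = [1.0 / (exp(1.2 + 0.8 * phi_be(w, lam)) - 1.0) for w in omegas]
alpha_fit, beta_fit = fit_alpha_beta(omegas, nus, lam)
peak_omega = max(zip(nus, omegas))[1]
```

Fixing λ first keeps the remaining fit linear in α and β, which is the practical advantage of reading λ off a single point of the data.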

Results
The distribution of incomes in the USA shows apparently spurious oscillations, with local maxima and minima that happen to coincide with tax return brackets. Numerical smoothing was necessary, and it was applied to all distributions to ensure equal treatment. For the same reason fitting was also sought for smoothed curves, which in any case had little effect on resources other than incomes. Equation (10) assumes independent individuals and refers to the whole distribution. It therefore predicts a single straight line for plots like those in Figure 1, where Figure 1a refers to annual incomes per household in the USA, the insert being an enlargement of the low-income region; and Figure 1b illustrates cancer incidence in the male population of New York City. Now, Figure 1a displays three segments instead of one, and Figure 1b shows four segments. Relevant ages and middle-class income boundaries are indicated and look reasonable. Not shown, life expectancy also displays four segments, with age boundaries close to those in Figure 1b, but in reverse order: 13, 45 and 65 years. Electricity consumption requires five segments, to be discussed in Section 4.2.
Segments in Figure 1 provide a piecewise fit of Equation (10). This is compatible with the picture of finite clusters as described for short-range interactions in Section 2.2. The point here is that each segment clearly corresponds to a social class in Figure 1a and well-known periods in human life in Figure 1b. Populations economically or physiologically wealthier, when ordered by increasing benefit, show decreasing slopes in Figure 1. This results from the definition of absolute "temperature" that follows Equation (14), for segments fitting "hotter" (richer or younger) fractions of society. Their slopes thus describe intraclass behaviour. Possible interclass interactions are discussed in the next Section. Greater equality implies a decrease in the number of states per individual in a group, 1/ν k = G k /N k .
The fraction of population F p = Pr(ω ≤ ω p ) objectively defines the poorest or the oldest in the distribution. Dissimilar data like ages and social classes provide similar results and so plead for common treatment of different types of diversity.

Interacting Classes
Can the theory feature slope changes and thereby interactions? Consider, for instance, a three-class system like that in Figure 1a, and independent quantities $x$, $y$, $z$, with probabilities $p_x$, $p_y$, $p_z$, respectively, so total probability is $p_{xyz} = p_x p_y p_z$, with $\theta = 0$. The replacement of logarithms by quasi-logarithms results in products that should be responsive [20,21] to interclass correlations:
$$\ln_\theta(p_{xyz}) = \ln_\theta p_x + \ln_\theta p_y + \ln_\theta p_z + (-\theta)\,[\ln_\theta p_x \ln_\theta p_y + \ln_\theta p_x \ln_\theta p_z + \ln_\theta p_y \ln_\theta p_z] + (-\theta)^2\,[\ln_\theta p_x \ln_\theta p_y \ln_\theta p_z], \qquad (15)$$
where square brackets enclose linear combinations of such products. Factors $(-\theta)^{k-1}$ in Equation (15) measure the strength of a many-body interaction among $k = 1, 2, \ldots, K$ classes.
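Equation (15) can be verified numerically. The sketch assumes the quasi-logarithm ln_θ(x) = (1 − x^(−θ))/θ, for which x^(−θ) = 1 − θ ln_θ(x) and the expansion follows by multiplying three such factors; the probabilities and the value of θ are arbitrary test inputs.

```python
import math


def ln_theta(x, theta):
    """Quasi-logarithm (1 - x**(-theta)) / theta (assumed form); ln(x) at theta = 0."""
    return math.log(x) if theta == 0.0 else (1.0 - x ** (-theta)) / theta


def three_class_expansion(px, py, pz, theta):
    """Right-hand side of Equation (15) for three independent classes."""
    a, b, c = (ln_theta(p, theta) for p in (px, py, pz))
    pairs = a * b + a * c + b * c          # two-body products
    triple = a * b * c                     # three-body product
    return a + b + c + (-theta) * pairs + (-theta) ** 2 * triple


px, py, pz, theta = 0.2, 0.5, 0.3, -0.18   # hypothetical inputs
lhs = ln_theta(px * py * pz, theta)
rhs = three_class_expansion(px, py, pz, theta)
additive = three_class_expansion(px, py, pz, 0.0)
```

At θ = 0 the pair and triple terms vanish and the ordinary additivity of logarithms is recovered, so θ alone switches the interclass terms on and off.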

The Model
The replacement of $s_{BE}(\nu_k)$ in Equation (10) by $s_\theta(\nu_k) = (1 + \nu_k)\ln_\theta(1 + \nu_k) - \nu_k\ln_\theta\nu_k$ leads to a simple interaction-sensitive model. We have:
$$\frac{\partial s_\theta(\nu_k)}{\partial\nu_k} = \alpha + \beta\,\varphi_{BE}(\omega_k, \lambda). \qquad (16)$$
Since resources are not expected to interact, Equations (6) and (14) give again the measure of equality and the value of $\lambda$, respectively. Plots of $\partial s_\theta(\nu_k)/\partial\nu_k$ as functions of $\varphi_{BE}(\omega, \lambda)$ are close to a single straight line. We therefore obtain a first approximation to the numerical value of $\theta$ from the condition that it maximises the ordinate at the peak. An even better guess stems from a maximisation of Pearson's correlation coefficient for the whole line; $\beta$ and $\alpha$ result from subsequent linear regression on Equation (16) once $\lambda$ and $\theta$ have been determined. Uncertainties due to multidimensional fits are thus avoided. Four such fits appear in Figure 2. They validate the asymmetric-entropic-quantum model, in particular conjectures C1, C1' and C2.
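The selection of θ by maximising Pearson's correlation coefficient can be sketched as a one-dimensional grid search. The sketch assumes the quasi-logarithm ln_θ(x) = (1 − x^(−θ))/θ, for which the derivative of s_θ works out to (1 − θ)[ν^(−θ) − (1 + ν)^(−θ)]/θ; the data are synthetic, generated with hypothetical values θ = −0.2, α = 1.0 and β = 0.5, and the grid and helper names are illustrative.

```python
from math import log, sqrt


def ds_theta(nu, theta):
    """d/d(nu) of (1 + nu) ln_theta(1 + nu) - nu ln_theta(nu), assumed ln_theta."""
    if theta == 0.0:
        return log(1.0 + 1.0 / nu)
    return (1.0 - theta) * (nu ** (-theta) - (1.0 + nu) ** (-theta)) / theta


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def best_theta(nus, phis, grid):
    """Grid value of theta maximising the correlation of ds_theta(nu) with phi."""
    return max(grid, key=lambda t: pearson(phis, [ds_theta(n, t) for n in nus]))


# Synthetic check: phi is built so that ds_theta(nu) = 1.0 + 0.5 * phi at theta = -0.2.
grid = [i / 20 for i in range(-10, 11)]
nus = [0.05 * i for i in range(1, 101)]
phis = [(ds_theta(n, -0.2) - 1.0) / 0.5 for n in nus]
theta_fit = best_theta(nus, phis, grid)
```

Once θ is fixed in this way, α and β follow from an ordinary linear regression, so each step of the fit involves at most one free parameter at a time.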
Class boundaries and periods in human life are obtained from plots like those in Figure 1. Poverty, for instance, is objectively defined by the region [0, ω p ] under the data curve in Figure 2a.

Resource-Dependent Interactions
Annual per capita electricity consumption in Figure 2d is a special case: not only is it an example of rather unusual long-range interactions, it also requires two values of the θ-parameter, θ = −0.18 for an overall fit and θ = 0, apparently noninteracting behaviour, for two groups of 22 and 23 countries. A possible explanation is that interactions between countries result mainly from their exchange of electricity, often carried out to optimise each country's production systems. The poorest nations rely heavily on foreign production, and therefore interact strongly with it, which introduces correlations between countries. These correlations would disappear for self-sufficient countries, which would thus form the first θ = 0 group. Increasing production would make trade, and therefore correlations, reappear, but they vanish again when import and export compensate each other, which defines the second group. The interplay of production, consumption and exchange imposes resource-dependent interactions and thereby several values of the θ-parameter. Figure 2d illustrates another possible drawback: no data were available for the poorest countries (about 20 in number). As a result, the poverty peak is missing, unfortunately a rather common situation when datapoints are too few. The peak is assumed to coincide with the lowest electricity consumption in Figure 2d, and this is enough to furnish a rather acceptable fit for the data.

Conclusions
A simple probabilistic model fits data from two types of statistical phenomena, demographic and economic, through four different examples. Dynamical effects and interactions [13] are taken into account through their averages over definite periods of time, which allows good fits of real data. The model's extension to other cases where inequality, symmetry and/or quantum behaviour are at work may be expected. It depends on the statistics of state occupation (of two types, F-D or B-E, the latter being the case of all examples in this paper), the resulting symmetry (two possibilities, though symmetric distributions are rather doubtful), the type of interactions (two levels in this paper, intraclass and interclass), their intensity (possibly variable, as for electric energy consumption) and their long or short range.
Results on welfare and universality support the proposal of a symmetry phase transition at Gi = 1/3 between conceivable but unlikely symmetric distributions and realistic asymmetric laws. Phase transitions are thus possible not only in material systems, but also in societies, physiology and perhaps other correlated structures. Equation (16) provides expressions for their distribution laws, whereby specific regions under the distribution curve objectively define youth and oldness or poverty and wealth. Different forms of inequality are thus found to admit similar descriptions.
We obtain good quality fits to data by applying the paraphernalia of well-known, century-old statistical mechanics (states, entropy, Lagrange multipliers . . . ) to social matters. The manner in which this is done is however atypical, particularly because symmetry is a relevant parameter, two entropies at least are at work, and nonquantum free-will individuals obey quantum statistics. Societies are considered as nonequilibrium, interacting, entropy-producing statistical systems. As a result: (1) individuals interact in at least two ways, intraclass and interclass; (2) inequality is an example of a "qualitative" quantity, for which generalisations of the present approach may be expected; (3) households and resources are clearly not quantal, but socially indistinguishable, wherefrom quantum statistics finally follows: quantum behaviour is not intrinsic, but results from the nature of occupied states; and (4) one of the entropies is the outcome of activities in an evolving society, while the other simply measures inequality in the distribution of available resources, and furnishes a constraint on the former. In the general case, multiple entropies would be required to account for different types of inequality, and as many constraints on social entropy production S(ν) would result. Conjectures similar to C1' and C2 would be required. The concepts of extropy, class interaction, multiple entropies and social free energy appear as efficient approaches to nonequilibrium evolving systems. Furthermore, diversity occurs in so many domains that similar methods may be expected to apply to energy production, environmental and other complex systems.
Only two supplementary parameters, θ due to the strength and range of interactions and λ related to inequality, suffice to transform the ideal-gas description of independent individuals into a predictive model of society. They result from the coordinates of a single point on the empirical distribution law, the peak. The additional information thus obtained may look rather scanty at first sight, were it not for a remark by E.T. Jaynes [30]: "Entropy as a concept may be regarded as a measure of our degree of ignorance as to the state of a system". Our successful maximisation of entropy production implies then the safest possible assumption, that is, minimum social knowledge of economic and demographic statistical facts.