On the Decomposition of the Esteban and Ray Index by Income Sources

This paper proposes a simple algorithm based on a matrix formulation to compute the Esteban and Ray (ER) polarization index. It then shows how the algorithm introduced leads to quite a simple decomposition of polarization by income sources. Such a breakdown was not available hitherto. The decomposition we propose will thus allow one to determine the sign, as well as the magnitude, of the impact of the various income sources on the ER polarization index. A simple empirical illustration based on EU data is provided.


Introduction
During the past 25 years, many studies attempted to measure the extent of the middle class and stressed the link between the concept of bipolarization and the importance of the middle class. Another strand of the economic literature emphasized the concept of polarization (or multi-polarization). The basic contribution here is that of Esteban and Ray (1994) who linked the concept of polarization to the notions of identification, alienation, and potential social conflict. Identification refers to the idea that an individual feels some degree of identification with those who are 'close' to him/her. Identification is thus an increasing function of the number of individuals who are in the same income class as that individual. The alienation function on the contrary characterizes the antagonism caused by income differences so that an individual will feel alienated from those who are 'far away' from him/her. While Esteban and Ray (1994), as well as Esteban et al. (2007), assumed that the number of groups was determined ex ante, Duclos et al. (2004) extended the analysis of polarization to the continuous case, letting the data determine the number of relevant groups and poles.
The focus of most empirical studies of bipolarization and polarization has been on the distribution of total income. There have, however, been a few attempts to decompose bipolarization and polarization indices by income sources (e.g., Araar 2008; Deutsch and Silber 2010), but the procedures are not very simple. More recently, Bárcena-Martín et al. (2017) proposed a simple matrix formulation to decompose the Foster and Wolfson bipolarization index by income sources.
The main contribution of the present paper is to introduce a simple algorithm to compute the Esteban and Ray (1994) polarization index. We derive this algorithm from the simple matrix formulation suggested by Silber (1989) to compute the Gini index. We then show that, with such an approach, it is easy to derive the contribution of various income sources (or explanatory variables in the case of an earnings function) to the degree of polarization of the distribution of total income. Section 2 describes the algorithm allowing the simple computation of the ER polarization index, while Section 3 shows how such a formulation simplifies the decomposition of this index by income sources. Section 4 presents a simple empirical illustration and Section 5 concludes the paper.

Matrix Representation of the Esteban and Ray (1994) ER Polarization Index
The Esteban and Ray (1994) polarization index ER is expressed as

$$ER = \sum_{k} \sum_{h} v_k^{\beta}\, v_h \left| \mu_k - \mu_h \right| \tag{1}$$

where $v_k$ is the relative population frequency of population subgroup k, $\mu_k$ the mean income 1 of group k and β a parameter which varies between 2 and 2.6 (see Esteban and Ray 1994). Expression (1) can also be written as expression (2), where the mean incomes $\mu_i$ are ranked by increasing values. More generally, assuming n population subgroups, expression (2) becomes

$$ER = ER_A + ER_B = t'Gs + v'Gr \tag{3}$$

In (3), $ER_A = t'Gs$ and $ER_B = v'Gr$ are the two components of the ER index; $t'$ is a (1 by n) row vector, written as $t' = [v_1^{\beta}\ v_2^{\beta}\ \ldots\ v_n^{\beta}]$; s is a (n by 1) column vector which, as a row vector, would be written as $s' = [\mu_1 v_1\ \mu_2 v_2\ \ldots\ \mu_n v_n]$; $v'$ is a (1 by n) row vector written as $v' = [v_1\ v_2\ \ldots\ v_n]$; and r is a (n by 1) column vector which, as a row vector, would be expressed as $r' = [\mu_1 v_1^{\beta}\ \mu_2 v_2^{\beta}\ \ldots\ \mu_n v_n^{\beta}]$. G is a square n by n matrix, called G-matrix, whose typical element $g_{ij}$ is equal to 0 if i = j, to −1 if j > i and to +1 if i > j (see Silber 1989, for more details on this G-matrix 2 ). It is important to stress that the elements $\mu_i v_i$ in vector s' and the elements $\mu_i v_i^{\beta}$ in vector r' both have to be ranked by decreasing values of the mean incomes $\mu_i$.

Let τ' be a (1 by n) row vector, written as

$$\tau' = \left[ \frac{v_1^{\beta}}{\sum_{i} v_i^{\beta}}\ \ \frac{v_2^{\beta}}{\sum_{i} v_i^{\beta}}\ \ \ldots\ \ \frac{v_n^{\beta}}{\sum_{i} v_i^{\beta}} \right]$$

Let also θ be a (n by 1) column vector of the income shares $\theta_i = v_i \mu_i / \sum_{k} v_k \mu_k$. If we call $\tau_i = v_i^{\beta} / \sum_{k} v_k^{\beta}$ the 'identification modified population share' of population subgroup i, the expression τ'Gθ is a kind of Gini index comparing a priori shares, which are the 'identification modified population shares', with a posteriori shares, which are the actual income shares of the various population subgroups, the comparison being made via the linear operator G, the G-matrix.
Similarly, let η be a (n by 1) column vector whose typical element $\eta_i$ is written as $\eta_i = \mu_i v_i^{\beta} / \sum_{k} \mu_k v_k^{\beta}$. $\eta_i$ will be labeled the 'identification modified income share' of population subgroup i. The expression v'Gη is then a kind of Gini index, comparing a priori shares, the actual population shares, with a posteriori shares, the 'identification modified income shares' of the various population subgroups. This comparison is again made via the linear operator G, the G-matrix.
1 Esteban and Ray (1994) refer to the natural logarithm of income rather than to income. We will make a somewhat similar assumption by stating that the mean income of a given group refers in fact to its mean income relative to the mean income in the whole population. To simplify the notation, we do not introduce the population mean income in the formulations.

2 As stressed already in Silber (1989), the first matrix formulation of the Gini index was proposed by Pyatt (1976).

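Expression (3) is then rewritten as

$$ER = \left( \sum_{i} v_i^{\beta} \right) \left( \sum_{i} v_i \mu_i \right) \tau' G \theta + \left( \sum_{i} \mu_i v_i^{\beta} \right) v' G \eta \tag{4}$$

In other words, the polarization index is equal to the corrected sum of two Gini-related indices. The first one compares the 'identification modified population shares' with the actual income shares of the different population subgroups. The second one compares the actual population shares with the 'identification modified income shares' of the different population subgroups. The first correction factor is the product of the sum of the terms $v_i^{\beta}$ and the overall mean income $\sum_{i} v_i \mu_i$, while the second is the sum of the terms $\mu_i v_i^{\beta}$.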
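The matrix formulation can be checked numerically. The following Python sketch computes the ER index as a double sum, as $t'Gs + v'Gr$, and in the normalized form based on τ, θ and η, and the three values should coincide. The function names, the three-group data and the reading of expression (1) as $\sum_k \sum_h v_k^{\beta} v_h |\mu_k - \mu_h|$ are assumptions made for illustration only.

```python
def g_matrix(n):
    """G-matrix: 0 on the diagonal, -1 when j > i, +1 when i > j."""
    return [[0 if i == j else (1 if i > j else -1) for j in range(n)]
            for i in range(n)]


def bilinear(row, G, col):
    """Return row' G col for vectors stored as plain lists."""
    n = len(row)
    return sum(row[i] * G[i][j] * col[j] for i in range(n) for j in range(n))


def ranked_vectors(v, mu, beta):
    """Return v, t, s, r with groups ranked by decreasing mean income."""
    order = sorted(range(len(mu)), key=lambda i: mu[i], reverse=True)
    v = [v[i] for i in order]
    mu = [mu[i] for i in order]
    t = [vi ** beta for vi in v]
    s = [mi * vi for mi, vi in zip(mu, v)]
    r = [mi * vi ** beta for mi, vi in zip(mu, v)]
    return v, t, s, r


def er_direct(v, mu, beta):
    """Assumed double-sum form: sum_k sum_h v_k^beta v_h |mu_k - mu_h|."""
    return sum(vk ** beta * vh * abs(mk - mh)
               for vk, mk in zip(v, mu) for vh, mh in zip(v, mu))


def er_matrix(v, mu, beta):
    """Return the two components (ER_A, ER_B) = (t'Gs, v'Gr)."""
    v, t, s, r = ranked_vectors(v, mu, beta)
    G = g_matrix(len(v))
    return bilinear(t, G, s), bilinear(v, G, r)


def er_normalized(v, mu, beta):
    """Corrected sum of the two Gini-type terms tau'G theta and v'G eta."""
    v, t, s, r = ranked_vectors(v, mu, beta)
    G = g_matrix(len(v))
    tau = [x / sum(t) for x in t]      # identification modified population shares
    theta = [x / sum(s) for x in s]    # income shares
    eta = [x / sum(r) for x in r]      # identification modified income shares
    return (sum(t) * sum(s) * bilinear(tau, G, theta)
            + sum(r) * bilinear(v, G, eta))


# Hypothetical data: population shares and relative mean incomes of 3 groups.
v = [0.5, 0.3, 0.2]
mu = [0.6, 1.2, 1.7]
er_a, er_b = er_matrix(v, mu, beta=2.5)
print(er_a + er_b, er_direct(v, mu, 2.5), er_normalized(v, mu, 2.5))
```

Sorting inside `ranked_vectors` means the caller does not need to pre-rank the groups, which is the step the text flags as essential for the G-matrix to produce the absolute differences.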

Decomposing the ER Index by Income Sources
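Assume there are J income sources. The average income $\mu_i$ in population subgroup i may then be expressed as

$$\mu_i = \sum_{j=1}^{J} \mu_{ij} \tag{5}$$

where $\mu_{ij}$ is the mean income from source j in subgroup i, so that expression (3) may also be written as

$$ER = t'G \left( \sum_{j=1}^{J} s_{.j} \right) + v'G \left( \sum_{j=1}^{J} r_{.j} \right) \tag{6}$$

where $s_{.j}$ is a (n by 1) column vector whose typical element $s_{ij}$ is equal to $v_i \mu_{ij}$, while $r_{.j}$ is a (n by 1) column vector whose typical element $r_{ij}$ is equal to $v_i^{\beta} \mu_{ij}$. Note that the elements $s_{ij}$ in vector $s_{.j}$ and the elements $r_{ij}$ in vector $r_{.j}$ have to be ranked by decreasing mean incomes $\mu_i$.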
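We may then rewrite (6) as

$$ER = \sum_{j=1}^{J} D_j \tag{7}$$

where $D_j$, the contribution of income source j to the ER index, is expressed as

$$D_j = t'G s_{.j} + v'G r_{.j} \tag{8}$$

We could also express (8) as

$$D_j = \left( \frac{t'G s_{.j}}{\tilde{t}'G \tilde{s}_{.j}} \right) \tilde{t}'G \tilde{s}_{.j} + \left( \frac{v'G r_{.j}}{\tilde{v}'G \tilde{r}_{.j}} \right) \tilde{v}'G \tilde{r}_{.j} \tag{9}$$

where $\tilde{s}_{.j}$ is a (n by 1) column vector whose typical elements, which are equal to $v_i \mu_{ij}$, are ranked in descending order of $\mu_{ij}$, while $\tilde{r}_{.j}$ is a (n by 1) column vector whose typical elements, which are equal to $v_i^{\beta} \mu_{ij}$, are ranked in the same way, the elements of $\tilde{t}'$ and $\tilde{v}'$ following the same ordering. We then have

$$ER_j = ER_{Aj} + ER_{Bj} = \tilde{t}'G \tilde{s}_{.j} + \tilde{v}'G \tilde{r}_{.j} \tag{10}$$

where $ER_j$ is the Esteban and Ray polarization index for income source j, $ER_{Aj}$ and $ER_{Bj}$ being its two components.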
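Let us also define two correlation measures, $COR_{Aj}$ and $COR_{Bj}$, with

$$COR_{Aj} = \frac{t'G s_{.j}}{ER_{Aj}} \tag{11}$$

$$COR_{Bj} = \frac{v'G r_{.j}}{ER_{Bj}} \tag{12}$$

These correlation measures may evidently be positive or negative.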
Combining expressions (7)-(12) we derive that

$$ER = \sum_{j=1}^{J} \left( COR_{Aj}\, ER_{Aj} + COR_{Bj}\, ER_{Bj} \right) \tag{13}$$

We therefore conclude that, ceteris paribus,
- The higher $ER_{Aj}$, the higher the degree of polarization of the distribution of total income.
- The higher $ER_{Bj}$, the higher the degree of polarization of the distribution of total income.
- If $COR_{Aj}$ is positive, the higher this correlation measure, the higher the degree of polarization of the distribution of total income. However, if it is negative, it will have a negative impact on the overall Esteban and Ray index ER.
- Similarly, if $COR_{Bj}$ is positive, the higher this correlation measure, the higher the degree of polarization of the distribution of total income. However, if it is negative, it will have a negative impact on the overall Esteban and Ray index ER 3 .
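The decomposition can be sketched in a few lines of Python. The sketch computes each contribution $D_j$ with the source ranked by total mean income, and the components of the source's own index with the source ranked by its own mean incomes. The data, the function names, the definition of each correlation measure as the ratio of the two, and the reordering of the population weights together with the source are all assumptions made for illustration.

```python
def g_form(row, col):
    """Return row' G col, with G = 0 on the diagonal, -1 if j > i, +1 if i > j."""
    n = len(row)
    return sum((1 if i > j else -1) * row[i] * col[j]
               for i in range(n) for j in range(n) if i != j)


def components(v, x, rank_by, beta):
    """Return (t'Gs, v'Gr) for variable x, groups ranked by decreasing rank_by."""
    order = sorted(range(len(x)), key=lambda i: rank_by[i], reverse=True)
    vs = [v[i] for i in order]
    xs = [x[i] for i in order]
    t = [w ** beta for w in vs]
    s = [w * m for w, m in zip(vs, xs)]
    r = [w ** beta * m for w, m in zip(vs, xs)]
    return g_form(t, s), g_form(vs, r)


def decompose(v, sources, beta):
    """Return total group means and, per source, D_j, ER_Aj, ER_Bj, COR_Aj, COR_Bj."""
    n = len(v)
    total = [sum(src[i] for src in sources.values()) for i in range(n)]
    result = {}
    for name, src in sources.items():
        a, b = components(v, src, total, beta)        # ranked by total mean income
        er_a, er_b = components(v, src, src, beta)    # ranked by the source itself
        result[name] = {
            "D": a + b,
            "ER_A": er_a,
            "ER_B": er_b,
            # Ratios are undefined when a component is zero (assumed convention: NaN).
            "COR_A": a / er_a if er_a else float("nan"),
            "COR_B": b / er_b if er_b else float("nan"),
        }
    return total, result


# Hypothetical group means by source for three population subgroups.
v = [0.5, 0.3, 0.2]
sources = {
    "income_before": [0.30, 0.90, 1.40],
    "benefits": [0.25, 0.20, 0.15],
    "property": [0.05, 0.10, 0.15],
}
total, parts = decompose(v, sources, beta=2.5)
er_total = sum(components(v, total, total, 2.5))
for name, d in parts.items():
    print(name, d["D"] / er_total)   # relative contribution of each source
```

Because the G-matrix is a linear operator, the contributions `D` add up to the index of total income whatever the ranking of each source, whereas the correlation measures depart from one as soon as a source (here the hypothetical `benefits`) is ranked differently from total income.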

A Short Empirical Illustration
In this section, we present a simple empirical illustration, based on EU data from the European Union Statistics on Income and Living Conditions (EU-SILC) data set for the 2016 wave (EUROSTAT 2016). EU-SILC is an international database that consists of comparable, country-specific data. The measure of income is the total disposable household income. Since a given level of household income corresponds to a different standard of living, depending on the size and composition of the household, we adjust incomes for differences in household size and composition using the "modified OECD" equivalence scale 4 . The latter assigns a value of 1 to the first adult in the household, 0.5 to each remaining adult, and 0.3 to each person younger than 14.
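The equivalence-scale adjustment described above amounts to a one-line computation. The sketch below is only an illustration: the function name, its arguments and the sample household are hypothetical, and household members are simply split into adults and persons younger than 14, as in the text.

```python
def equivalised_income(household_income, n_adults, n_under_14):
    """Divide household income by the modified OECD scale:
    1 for the first adult, 0.5 per remaining adult, 0.3 per person under 14."""
    if n_adults < 1:
        raise ValueError("a household needs at least one adult")
    scale = 1.0 + 0.5 * (n_adults - 1) + 0.3 * n_under_14
    return household_income / scale


# Hypothetical household: two adults and two children, scale = 1 + 0.5 + 0.6.
example = equivalised_income(42000.0, n_adults=2, n_under_14=2)
print(round(example, 2))
```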
Disposable income includes net income from work, other private income not related to work, pensions and other social transfers. Net money income includes all income sources received by the household and by each of its current members in the year preceding the survey. Social insurance contributions, pay-as-you-earn taxes, and non-money income are not included in this definition of income.
The decomposition of the ER polarization index by income sources is based on three income sources:

1. Benefits (benefits), which include old-age and survivors' benefits, unemployment benefits, sickness benefits, disability benefits, education-related allowances, family/children-related allowances, social exclusion not classified elsewhere, and housing allowances.
2. Income from rental of a property or land, interest, dividends, and profit from capital investments in unincorporated business (property and interest).
3. Income available before including sources 1 and 2 (income before).

3 Expression (13) reminds us of the decomposition of the Gini index by income sources (see Lerman and Yitzhaki 1985), where the contribution of an income source to the overall Gini index is a function of the share of this source in total income, of the Gini index of this source and of the Gini-correlation between this source and total income. In (13) the contribution of an income source to the overall ER index is a function of the two components of the ER index for this source, and of two correlation measures. However, the share of the source does not appear. In Appendix A, we provide a more detailed decomposition where the parallel with the traditional decomposition of the Gini index by income sources becomes evident.

Table A1 in Appendix A gives, for each of these countries, the average value of these income sources, the average total income and the population size. Table 1 refers to data in Euros. We give there the value of the ER index when the parameter β is equal to 2.5 and when it is equal to 1 (Gini related measure 5 ). We also computed, as suggested by Esteban and Ray (1994), the ER index with these two values of the parameter β for the case where the logarithm of income rather than income was the variable under study. Table 1 also gives, when income and not the logarithm of income is used, the relative contributions of the different income sources to the ER index. It appears that the most important (relative) contribution to the value of the ER index is that of income before transfer (62.4%), while this source has a share in total income of 70.7%. On the contrary, benefits and 'property income and interest' have a higher relative contribution to the ER index (respectively 25.4% and 12.2%) than their share in total income (23.2% and 6.1%).
We may also observe that the contributions of these sources to the Gini-related index (parameter β equal to 1) are quite similar to their contributions to the ER index (65.6%, 24.9%, and 9.5%). They actually lie between their shares in the average total income and their contributions to the ER index.
When introducing the logarithm of income into the formulation of the ER index with β = 2.5, we observe that this index is quite small (0.045) when compared to its value (0.577) when β = 1.

Table 2 is similar to Table 1 but here all the computations are derived from PPP income data. While the relative contributions of the three income sources to the average EU PPP income (on the basis of the countries for which data were available) are quite similar to those presented in Table 1, the computation of the ER index and of the contributions of the income sources to this index shows a somewhat different picture. When the parameter β is equal to 2.5, it appears that the ER index is lower than in Table 1, whether this index is derived from income data or from the logarithm of incomes. What is more interesting is that there is an important decrease in the relative contribution of income before transfer (from 62.4% to 54.4% when β = 2.5 and from 65.6% to 59.6% when β = 1). On the contrary, there is an increase in the relative contribution of benefits: from 25.4% to 28.6% when β = 2.5 and from 24.9% to 27.7% when β = 1. A similar increase is observed for property income and interest, since the relative contribution rises from 12.2% to 17.0% when β = 2.5 and from 9.5% to 12.8% when β = 1. In short, when using PPP rather than Euro data, polarization and inequality turn out to be smaller, but the relative contribution of benefits and property income and interest to polarization and inequality rises.

5 When, in expression (1), we divide the income data by the average income and assume that β = 1, ER will be equal to twice the traditional Gini index. What is called the absolute Gini index is actually the product of the Gini index by the mean, so that when β = 1 and we use absolute incomes and not relative incomes in (1), ER will be equal to twice the absolute Gini index.
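The link between the ER index with β = 1 and the Gini index can be verified with a short Python sketch. It computes the ER index as a double sum and the between-group Gini index independently from the Lorenz curve, then compares the two. The grouped data and the function names are hypothetical, and the double-sum form of expression (1) is assumed to be $\sum_k \sum_h v_k^{\beta} v_h |\mu_k - \mu_h|$.

```python
def er_index(v, x, beta):
    """Assumed double-sum form of the ER index."""
    return sum(vk ** beta * vh * abs(xk - xh)
               for vk, xk in zip(v, x) for vh, xh in zip(v, x))


def gini_lorenz(v, x):
    """Between-group Gini index from the piecewise-linear Lorenz curve."""
    mean = sum(w * m for w, m in zip(v, x))
    cum_prev = 0.0
    area_sum = 0.0
    for m, w in sorted(zip(x, v)):          # groups ranked by increasing income
        cum = cum_prev + w * m / mean       # cumulative income share
        area_sum += w * (cum_prev + cum)    # twice the trapezoid under the curve
        cum_prev = cum
    return 1.0 - area_sum


# Hypothetical population shares and mean incomes (in Euros) of three groups.
v = [0.5, 0.3, 0.2]
x = [12000.0, 24000.0, 34000.0]
mean = sum(w * m for w, m in zip(v, x))
relative = [m / mean for m in x]

print(er_index(v, x, 1), 2 * mean * gini_lorenz(v, x))      # absolute incomes
print(er_index(v, relative, 1), 2 * gini_lorenz(v, x))      # relative incomes
```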

Concluding Comments
This paper has shown how it is possible to express the Esteban and Ray (1994) ER index in matrix form. Such a formulation greatly simplifies the decomposition of this index by income sources. We gave a simple empirical illustration showing that this breakdown gives useful information as to the impact of the different income sources on the polarization of incomes. This illustration was based first on income data in Euros and then on PPP income data. We could also apply the proposed breakdown to an analysis of the polarization of the distribution of wages or earnings. If we estimate a traditional earnings function, we could then easily derive the contribution to the polarization of wages of the explanatory variables of such a function. Indeed, we intend to explore these issues in future empirical work.

Appendix A. The Similarity between the Decomposition by Income Sources of the Gini Index and of the ER Index
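Remember that expression (4) is written as

$$ER = \left( \sum_{i} v_i^{\beta} \right) \left( \sum_{i} v_i \mu_i \right) \tau' G \theta + \left( \sum_{i} \mu_i v_i^{\beta} \right) v' G \eta \tag{A1}$$

where τ' is a (1 by n) row vector whose typical element is $\tau_i = v_i^{\beta} / \sum_{k} v_k^{\beta}$, θ a (n by 1) column vector of the income shares $\theta_i = v_i \mu_i / \sum_{k} v_k \mu_k$, η a (n by 1) column vector whose typical element $\eta_i$ is written as $\eta_i = \mu_i v_i^{\beta} / \sum_{k} \mu_k v_k^{\beta}$, and v' a row vector of the population shares.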
We can rewrite (A1) as Given that the G-matrix is a linear operator we then derive that If instead of ranking the incomes µ ij by decreasing values of the incomes µ i , we rank them by decreasing values of the incomes µ ij , and call µ ij this re-ordered vector, we end up with