Decomposition of the Inequality of Income Distribution by Income Types - Application for Romania

This paper identifies the salient factors that characterize the inequality of the income distribution in Romania. The data are analysed using the Theil index, a classical entropy-based measure of inequality, and the inequality measured by the Theil index is then decomposed by income type. The study relies on an exhaustive data set (11.1 million records for 2014) of the total personal gross income of Romanian citizens.

Romanian income distribution has been little studied. Using wage data from a social security database for a county in Romania, Derzsy et al. [31] showed that in the upper tail, the distribution follows a Pareto law with a coefficient of 2.5, while in the range of low and middle incomes, they found that an exponential distribution fits the data. Similar results were obtained by Oancea et al. [29] using tax records data for the entire population that received an income in Romania in 2013. Other studies of the Romanian income distribution [32,33] used survey data and showed that income inequality in Romania has grown over time.
In this paper, we use tax records data for 2014 in Romania to study income inequality using the Theil index [34][35][36]. While there are other widely used measures of inequality, such as the Gini index or the Lorenz curve, we are mainly interested in the extent to which income inequality can be explained by different population subgroups. For this purpose, the advantages of decomposable measures of inequality make the Theil index the ideal candidate for our analysis [37]. We decompose the total income of a person by source of income into wage and non-wage income (the latter includes social transfers, unemployment benefits, etc.) and group the population by source of income. We then study the decomposition of the Theil index by population subgroups, starting from the guidelines presented in [37]. However, while Shorrocks [37] used a decomposition for disjoint groups, our population groups overlap, since individuals can earn income from multiple sources during a year. Decomposition of inequality by income sources has been studied in several papers using either the classical method, where the Theil index is seen as a weighted average of the inequality within subgroups plus the inequality between those subgroups, or regression-based methods: [38][39][40][41] or [42][43][44][45]. In this paper, we introduce a new decomposition of the Theil index for the case where the population groups overlap.

Problem Presentation
The income of an individual in a population has three possible sources: salary, capital, or other sources such as pensions, unemployment benefits, and social assistance. In Romania, every person who was registered as having earned an income during 2014 earned money from one, two, or all three of these sources. Accordingly, the total population having an income during 2014 was divided into the following seven categories of persons (Figure 1): persons whose income came from a single source (G1-salaries, G2-capital, and G3-other sources of income), persons who earned income from two sources (G4-salaries and other sources, G5-salaries and capital income, and G6-capital income and other sources of income), and persons who earned income from all three sources (G7). Figure 1 shows the nature of the incomes associated with individuals in each category, G1 to G7. Thus, we divide the total population $P$ into seven disjoint groups:
$$P = \bigcup_{i=1}^{7} G_i, \qquad G_i \cap G_j = \emptyset \ \text{for } i \neq j. \quad (1)$$
Under these conditions, we study the following:
(i) To what extent the inequality of the distribution of the total income of the population is influenced by the distribution of income within and across the seven classes of persons. Here we use a decomposition of the Theil index calculated for the entire population, based on the inequality of the income distribution within each of the seven groups and on the differences that exist between the groups. This decomposition corresponds to the case where the groups are disjoint [37]. The total inequality is explained by factors that act at the level of the groups and by factors that differentiate the groups.
(ii) For each group with at least two income sources, how the inequality of the income distribution is related to the inequality of the income distribution for each source of income. Because the decomposition used at the previous point can no longer be applied, we propose another relationship, in which the total inequality of the income distribution is decomposed into three components: the first component captures the differences within each data series, the second component captures the differences between the averages of the data series, and the last term captures the interaction between these factors.
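The grouping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_groups`, the lookup table `GROUP_OF`, and the toy incomes are hypothetical and do not come from the tax records or from the authors' R code. A person is assigned to a group according to which of the three income components is strictly positive.

```python
import numpy as np

# Hypothetical lookup following the G1..G7 labelling in the text.
# Key: (has_wage, has_capital, has_other)
GROUP_OF = {
    (True, False, False): "G1",   # salaries only
    (False, True, False): "G2",   # capital only
    (False, False, True): "G3",   # other sources only
    (True, False, True): "G4",    # salaries and other sources
    (True, True, False): "G5",    # salaries and capital
    (False, True, True): "G6",    # capital and other sources
    (True, True, True): "G7",     # all three sources
}

def assign_groups(wage, capital, other):
    """Return one group label per person, based on which incomes are positive."""
    wage, capital, other = (np.asarray(a, dtype=float) for a in (wage, capital, other))
    labels = []
    for w, c, o in zip(wage > 0, capital > 0, other > 0):
        key = (bool(w), bool(c), bool(o))
        if key not in GROUP_OF:
            raise ValueError("person with no positive income")
        labels.append(GROUP_OF[key])
    return labels

# Toy data, invented for illustration (one person per group).
wage    = [1200.0,   0.0,   0.0, 900.0, 1500.0,   0.0, 800.0]
capital = [   0.0, 300.0,   0.0,   0.0,  200.0, 100.0,  50.0]
other   = [   0.0,   0.0, 700.0, 400.0,    0.0, 600.0,  30.0]
groups = assign_groups(wage, capital, other)
print(groups)
```

Because each person receives exactly one label, the resulting groups are disjoint by construction, which is the property the first decomposition relies on.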

Data Series
We use gross annual income computed from tax records data at the individual level for 2014. We distinguish between three parts of the total income: wages, capital income, and other sources of income. The currency used for all incomes in the current paper is the Romanian "leu"-RON. The first part of the income can be attributed to labor (domestic and abroad). The second part, capital income, comes from dividends, interest on deposits, rents, real estate transfers, etc. The third part of the income comes from pensions, social assistance, unemployment benefits, income from agricultural labor, freelance activities, and intellectual property rights. Our data sets have 11 million records and were processed using R software [46,47].
The structure of incomes across the population shows that: (i) 44% of people earned wages, which account for 56% of the total income of the population; (ii) 23% of people earned capital income, which represents 19.5% of the total income of the population; (iii) 33% of people earned other types of income, accounting for 24.4% of the total income of the population; (iv) the shares of income and the shares of the number of people in the seven groups (Figure 2) differ significantly between the two data series; and (v) these differences translate into different yearly average earnings across the seven groups (Figure 3).
Table 1 presents the characteristics of the incomes of the population, both in total and for each of the three sources of income. The characteristics are evaluated for the whole population and for the seven groups. The results in Table 1 allow the following comments to be made: (i) at the level of the total population, there are significant differences in the distribution of income by source of income (Figure 4); (ii) the concentration of income differs across the three sources. Figure 5 shows the top 1%/bottom 99% ratio for the total population, the seven groups, and the three sources of income (we define this ratio as the ratio of the sum of incomes in the 99-100% centile to the sum of incomes in the 0-1% centile); (iii) the total income of the population is divided among the three sources as follows: 56.0% wages, 19.5% capital, and 24.4% other income; (iv) the people who earned income from at least one source are distributed across the three sources as follows: 44% obtained at least income from wages, 23% at least capital income, and 33% at least other income.

Breakdown by Disjoint Groups
In order to measure the inequality of the income distribution within each group, we calculate the Theil index. Let $v_{ij}$ denote the income earned by person $j$ in group $G_i$, and let $n_i$ be the number of persons in this group. The income series for the group is then the vector $v_i = (v_{i1}, \dots, v_{in_i})$, $i = 1, \dots, 7$. The total income of each group is $V_i = \sum_{j=1}^{n_i} v_{ij}$, the mean income of the group is $\mu_i = V_i / n_i$, and the Theil index can be formulated as [37]
$$T(v_i) = \frac{1}{n_i} \sum_{j=1}^{n_i} \frac{v_{ij}}{\mu_i} \ln \frac{v_{ij}}{\mu_i}. \quad (2)$$
The results obtained for the seven groups and the entire population are presented in Table 2.
In the following, we analyze how the degree of inequality of the income distribution of the entire population is explained by two factors: the inequality within each group of persons and the differences between the seven groups. Since the seven groups are disjoint, the inequality of incomes measured by the Theil index can be decomposed as follows [37]:
$$T(v) = \sum_{i=1}^{7} \frac{V_i}{V}\, T(v_i) + \sum_{i=1}^{7} \frac{V_i}{V} \ln \frac{\mu_i}{\mu}, \quad (3)$$
where $V = \sum_{i=1}^{7} V_i$ is the total income of the population, $n = \sum_{i=1}^{7} n_i$ is the number of persons, and $\mu = V / n$ is the mean income of the population. In (3), the Theil index computed for the entire population is decomposed into two components:
(i) The first term measures the inequality of the income distribution that results from the distribution of income within the seven groups. For each group, the inequality of the income distribution is measured by its Theil index, and this part of $T(v)$ is the weighted arithmetic mean of the group-level Theil indices, with weights equal to each group's share of total income.
(ii) The second term in relation (3), denoted here by $T_B$, quantifies the part of the inequality of the income distribution that is due to the differences between the seven groups. Since $V_i / V = (n_i / n)(\mu_i / \mu)$, this term can be written as $T_B = \sum_{i=1}^{7} \frac{n_i}{n} \frac{\mu_i}{\mu} \ln \frac{\mu_i}{\mu}$, that is, a Theil index calculated for the average incomes of the groups, using as relative frequencies the shares of the seven groups in the population.
Table 3 shows the results of the decomposition of the inequality of income distribution based on the inequality of income distribution on each of the seven groups of people using (3).
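The disjoint-group decomposition can be checked numerically with a minimal Python sketch. The helper names `theil` and `decompose_by_groups` and the three toy groups are assumptions made for illustration; they are not the authors' R code or data. The sketch computes the within-group term as the income-share-weighted mean of the group Theil indices and the between-group term from the group means, and compares their sum with the Theil index of the pooled incomes.

```python
import numpy as np

def theil(v):
    """Theil T index: mean of (v/mu) * ln(v/mu); requires positive incomes."""
    v = np.asarray(v, dtype=float)
    r = v / v.mean()
    return float(np.mean(r * np.log(r)))

def decompose_by_groups(groups):
    """Split T(v) into within- and between-group parts for disjoint groups."""
    all_v = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    V, mu = all_v.sum(), all_v.mean()
    # Within: income-share-weighted mean of the group-level Theil indices.
    within = sum(np.sum(g) / V * theil(g) for g in groups)
    # Between: income-share-weighted log ratio of group mean to overall mean.
    between = sum(np.sum(g) / V * np.log(np.mean(g) / mu) for g in groups)
    return float(within), float(between)

# Toy incomes for three disjoint groups (invented for illustration).
toy_groups = [[10.0, 20.0, 30.0], [40.0, 80.0], [5.0, 5.0, 5.0, 5.0]]
within, between = decompose_by_groups(toy_groups)
total = theil(np.concatenate(toy_groups))
print(within, between, within + between, total)
```

For disjoint groups the two terms add up to the total index exactly (up to floating-point error), with no remainder.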

Decomposing the Inequality of Income Distribution by Income Sources
For the G4, G5 and G6 groups each person can have two sources of income and for G7 three sources of income. For instance, a person belonging to G5 group has as income sources wages and capital income. Under these conditions, the inequality of income distribution within this group is determined by the distribution of the income of the persons in the group on each of the two sources of income, as well as by the distribution of the total income of the group on the two categories of income.
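For the groups with two income sources, the total income ($VT$) of each person is the sum of two income categories: $VT_i = X_i + Y_i$, $i = 1, \dots, n$. We denote by $T(X)$ and $T(Y)$ the Theil indices calculated for the distributions of the two variables $X$ and $Y$, respectively, by $\mu_X$ and $\mu_Y$ their means, and by $\mu = \mu_X + \mu_Y$ the mean total income. For the decomposition of the inequality of the distribution of total incomes, relation (3) cannot be applied. In this case, the Theil index calculated for total income is the sum of three terms:
$$T(VT) = \left[\frac{\mu_X}{\mu} T(X) + \frac{\mu_Y}{\mu} T(Y)\right] + \left[\ln 2 + \frac{\mu_X}{\mu} \ln \frac{\mu_X}{\mu} + \frac{\mu_Y}{\mu} \ln \frac{\mu_Y}{\mu}\right] + \left[\frac{1}{n\mu} \sum_{i=1}^{n} \left(X_i \ln \frac{VT_i}{X_i} + Y_i \ln \frac{VT_i}{Y_i}\right) - \ln 2\right]. \quad (4)$$
The first term in this relationship is a weighted arithmetic mean of the Theil indices calculated for the two variables. The weights are the ratios between the income from each source and the total income. This term measures the income inequality due to the inequality of the income distribution within each income source; it requires the Theil index of each of the two data series.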
The second term measures income inequality due to distribution of income by the income sources. This term is a Theil index calculated on the basis of the distribution of income on the two sources of income.
The third term represents the correlation between the two categories of income that influence the inequality of income distribution among a population.
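One way to see why exactly three terms arise is to split the logarithm in the definition of the Theil index. The following is a sketch of that argument, assuming that every person in the group has strictly positive income from both sources; the symbols $X_i$, $Y_i$, $VT_i = X_i + Y_i$, $\mu_X$, $\mu_Y$ and $\mu = \mu_X + \mu_Y$ are notation chosen for this sketch.

```latex
\begin{align*}
T(VT) &= \frac{1}{n\mu}\sum_{i=1}^{n} (X_i + Y_i)\,\ln\frac{VT_i}{\mu},
\qquad
\ln\frac{VT_i}{\mu} = \ln\frac{X_i}{\mu_X} + \ln\frac{\mu_X}{\mu} + \ln\frac{VT_i}{X_i}
\quad\text{(and likewise for } Y\text{)} \\[4pt]
T(VT) &= \underbrace{\frac{1}{n\mu}\sum_{i=1}^{n}\Big[X_i\ln\frac{X_i}{\mu_X} + Y_i\ln\frac{Y_i}{\mu_Y}\Big]}_{=\ \frac{\mu_X}{\mu}T(X) + \frac{\mu_Y}{\mu}T(Y)}
 + \underbrace{\frac{\mu_X}{\mu}\ln\frac{\mu_X}{\mu} + \frac{\mu_Y}{\mu}\ln\frac{\mu_Y}{\mu}}_{=\ -H \text{ (minus the entropy of the income shares)}}
 + \frac{1}{n\mu}\sum_{i=1}^{n}\Big[X_i\ln\frac{VT_i}{X_i} + Y_i\ln\frac{VT_i}{Y_i}\Big]
\end{align*}
```

Adding $\ln 2$ to the second group of terms and subtracting it from the third leaves the sum unchanged. The second term then becomes $\ln 2 - H$, the Theil index of the split of total income between the two sources, and the third becomes the remainder that reflects the interaction between the two income categories.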
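We present below the decomposition of the Theil index calculated for the total income of the individuals in a population when this income comes from multiple sources. We denote by $v_i$, $i = 1, \dots, n$, the income of person $i$, formed from $m$ income sources. In this case, $v_i = x_{i1} + \dots + x_{im}$, $i = 1, \dots, n$. The Theil index of the entire population breaks down as follows:
$$T(v) = \sum_{j=1}^{m} \frac{\mu_j}{\mu}\, T(x_j) + \left[\ln m + \sum_{j=1}^{m} \frac{\mu_j}{\mu} \ln \frac{\mu_j}{\mu}\right] + \left[\frac{1}{n\mu} \sum_{i=1}^{n} \sum_{j=1}^{m} x_{ij} \ln \frac{v_i}{x_{ij}} - \ln m\right], \quad (5)$$
where $\mu_j$ is the average income per person for the income series $x_j$, $j = 1, \dots, m$; $\mu$ is the average income per person irrespective of the source of income; $T(x_j)$, $j = 1, \dots, m$, is the Theil index for the income series $x_j$; $v_i$ is the total income of person $i$; and $x_{ij}$ is the income earned by person $i$ from source $j$.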


(i) The first term measures the inequality of the distribution of the total incomes of the population due to the differences that exist in the distribution of income within each income source. In this case, the Theil indices computed on the data series for each income source are multiplied by the share of each income category in the total income of the population.
(ii) The second term quantifies the differences that exist between the income categories. This term is computed as the difference between the maximum entropy and the entropy of the distribution of the total income of the population by income source.
(iii) The last term is a remainder that quantifies the effect of the interaction between the income distribution for each income category and the total income distribution across the population.
In the decomposition of the Theil index for a population composed of disjoint groups, the interaction term that appears in relations (4) and (5) is absent. Table 4 presents the results obtained by applying decomposition (5) to the groups in which individuals earn their total income from two or three sources of income.
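The three-term decomposition by income source can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' R implementation: the function name `decompose_by_sources`, the array layout (one row per person, one column per source), and the toy incomes are hypothetical, and every income component is assumed strictly positive, as is the case within groups G4 to G7.

```python
import numpy as np

def theil(v):
    """Theil T index: mean of (v/mu) * ln(v/mu); requires positive incomes."""
    v = np.asarray(v, dtype=float)
    r = v / v.mean()
    return float(np.mean(r * np.log(r)))

def decompose_by_sources(x):
    """x: (n persons, m sources) array of strictly positive incomes.
    Returns (source_term, structure_term, interaction_term)."""
    x = np.asarray(x, dtype=float)
    n, m = x.shape
    v = x.sum(axis=1)        # total income per person
    mu = v.mean()            # mean total income
    share = x.mean(axis=0) / mu   # share of each source in total income
    # Term 1: share-weighted mean of the per-source Theil indices.
    source_term = float(sum(share[j] * theil(x[:, j]) for j in range(m)))
    # Term 2: maximum entropy ln(m) minus entropy of the income shares.
    structure_term = float(np.log(m) + np.sum(share * np.log(share)))
    # Term 3: remainder capturing the interaction between sources.
    interaction_term = float(np.sum(x * np.log(v[:, None] / x)) / (n * mu) - np.log(m))
    return source_term, structure_term, interaction_term

# Toy two-source group (invented for illustration): columns are wages, capital.
x = np.array([[1000.0, 200.0], [1500.0, 50.0], [700.0, 900.0], [2500.0, 400.0]])
terms = decompose_by_sources(x)
total = theil(x.sum(axis=1))
print(terms, sum(terms), total)
```

The three terms sum to the Theil index of total income for any number of sources. The second term is never negative, since the entropy of the shares cannot exceed $\ln m$, while the sign of the remainder depends on the data.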

Conclusions and Discussion
We have computed the income inequality in Romania taking account of the different modes of income (wages, capital, etc.) for more than 11 million individuals for the year 2014.
The data have been analysed using entropy methods and characterised using the Theil index. This leads to the identification of three components. The first is a weighted mean of the Theil indices computed for each income-source series, with weights equal to the share of each income category in the total income of the population. The second quantifies the differences that exist between the population's income categories and is computed as the difference between the maximum entropy and the entropy of the distribution of the total income of the population by income source. The third is a remainder that quantifies the effect of the interaction between the income distribution for each income category and the total income distribution across the population.
We believe entropy methods of the kind proposed here offer a new and interesting way forward for the analysis of income distributions. In a future paper, we will re-examine the data using the Tsallis entropy, which is now widely used to characterise economic phenomena (see [48,49]).

Acknowledgments:
The authors would like to thank Roxana Herteliu-Iftode, who checked the English for the later version of our manuscript.
Author Contributions: Bogdan Oancea prepared the data series, wrote the R code and performed the statistical data analysis. Tudorel Andrei, Bogdan Oancea, Peter Richmond, Gurjeet Dhesi and Claudiu Herteliu wrote the paper.