Theil’s Index of Inequality: Computation of Value-Validity Correction

Kvålseth, Tarald O.

doi:10.3390/computation12120240

by Tarald O. Kvålseth ¹,²

¹ Department of Mechanical Engineering, University of Minnesota, Minneapolis, MN 55455, USA

² Department of Industrial & Systems Engineering, University of Minnesota, Minneapolis, MN 55455, USA

Computation 2024, 12(12), 240; https://doi.org/10.3390/computation12120240

Submission received: 28 October 2024 / Revised: 27 November 2024 / Accepted: 3 December 2024 / Published: 5 December 2024

(This article belongs to the Section Computational Social Science)

Abstract

The Theil index is one of the most popular indices of economic inequality, no doubt in part because of its convenient additive decomposition property. One of its weaknesses, however, is its lack of any intuitively meaningful interpretation. Another, and more serious, limitation of Theil’s index, as argued in this paper, is its lack of the value-validity property. That is, the index does not meet a particular condition, based on metric distances between income-share distributions, that is required for the range of potential index values to provide true, realistic, and valid representations of the economic inequality characteristic. After outlining the value-validity condition, this paper derives a simple transformation of Theil’s index that meets this condition to a high degree of approximation. Randomly generated income-share distributions are used to demonstrate and verify the validity of the corrected index. The new index formulation, which is simply a power function of Theil’s index, can then be used to make appropriate and reliable absolute and relative difference comparisons of economic inequalities.

Keywords: economic inequality; Theil’s index; Theil’s corrected index; value validity

1. Introduction

Among the various measures of economic inequality that have been previously proposed, one of the most popular is the Theil index, named after Theil [1,2], and generally expressed as

T (Y_{n}) = \frac{1}{n} \sum_{i = 1}^{n} \left( \frac{y_{i}}{\bar{y}} \right) \log \left( \frac{y_{i}}{\bar{y}} \right)

(1)

where

Y_{n} = (y_{1}, \dots, y_{n})

is the set of incomes of n individuals or income intervals and

\bar{y} = \sum_{i = 1}^{n} y_{i} / n

(e.g., [3,4,5,6,7]). In terms of proportions (income shares)

p_{i} = y_{i} / (n \bar{y}) \quad (i = 1, \dots, n)

, Theil’s index can be expressed as

T (P_{n}) = \log n - H (P_{n}), H (P_{n}) = - \sum_{i = 1}^{n} p_{i} \log p_{i}

(2)

where

H (P_{n})

is recognized as the entropy of Shannon [8] for the distribution

P_{n} = (p_{1}, \dots, p_{n})

with

p_{i} \geq 0

for

i = 1, \dots, n

and

\sum_{i = 1}^{n} p_{i} = 1 .

The natural (base-e) logarithm is used in (1) and (2).
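The equivalence of (1) and (2) can be checked numerically. The following Python sketch is only an illustration: the function names and the four incomes are hypothetical and do not come from the paper, and zero incomes or shares are handled with the usual convention 0 log 0 = 0.

```python
import math


def theil_from_incomes(incomes):
    """Theil's T computed from raw incomes as in (1)."""
    n = len(incomes)
    mean = sum(incomes) / n
    # A zero income contributes 0, using the convention 0 * log 0 = 0.
    return sum((y / mean) * math.log(y / mean) for y in incomes if y > 0) / n


def theil_from_shares(shares):
    """Theil's T computed from income shares as in (2): log n - H(P_n)."""
    entropy = -sum(p * math.log(p) for p in shares if p > 0)
    return math.log(len(shares)) - entropy


incomes = [20_000, 35_000, 50_000, 95_000]  # hypothetical incomes
shares = [y / sum(incomes) for y in incomes]
t_from_incomes = theil_from_incomes(incomes)
t_from_shares = theil_from_shares(shares)
print(round(t_from_incomes, 4), round(t_from_shares, 4))  # the two values agree
```

The two forms agree because (y_i / ȳ) = n p_i, so that (1) reduces to the sum of p_i log(n p_i), which is log n − H(P_n).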

While this index can certainly be criticized for its lack of intuitive sense [9], its popularity is probably due to its useful decomposition property, as follows: T can be decomposed additively into the inequality “between” and “within” different subgroups (e.g., [1,2,10,11]). This decomposability property is useful in empirical studies and can be used by policymakers when trying to identify sources of economic inequality (e.g., [6,10,12,13,14]). The U.S. Census Bureau produces estimates of the Theil index.

A recognized practical disadvantage of Theil’s T is that its values are not always comparable across different units (such as countries) since, although its lower bound is 0, the upper bound

\log n

of T is not fixed, but depends on n. Another limitation of T, one that has not so far been reported, relates specifically to the values taken on by T. While various essentially mathematical properties of T have been widely discussed (symmetry, scale invariance, population replication, Pigou–Dalton transfer principle; e.g., [5]), the concern that T lacks the value-validity property has not previously been raised and is the subject of this paper. This property, first introduced by Kvålseth [15], ensures that an inequality index takes on values throughout its range that provide representations of the inequality characteristic that are true, realistic, and valid with respect to a generally acceptable criterion.

It becomes immediately evident that T does not meet the condition required by the value-validity property and can therefore lead to unreliable, inappropriate, and misleading results and conclusions. Consequently, the objective of this paper is to explore some alternative formulation as a correction of T that satisfies the value-validity condition, at least as a good approximation. The exploratory analysis is based on randomly generated income-share distributions

P_{n} = (p_{1}, \dots, p_{n})

as well as the so-called lambda distribution [16].

2. Value-Validity

Since the value-validity property and its conditions have been discussed extensively by Kvålseth [15,16,17], only a brief outline will be provided here. Thus, consider a generic economic inequality measure EI whose value becomes

E I (P_{n})

for the income-share distribution

P_{n} = (p_{1}, \dots, p_{n})

and with the extreme values

E I (P_{n}^{0})

and

E I (P_{n}^{1})

for the two distributions

P_{n}^{0} = (\frac{1}{n}, \dots, \frac{1}{n}), P_{n}^{1} = (1, 0, \dots, 0) .

(3)

While the strictly correct notation would be for EI to denote a measure or function and

E I (P_{n})

to denote its value for some

P_{n}

, EI may be used in this paper to denote both a measure (index) and its value to simplify the notation when there is no chance of ambiguity.

As a convenient starting point to introduce the value-validity concept, consider the following lambda distribution introduced by Kvålseth [16]:

P_{n}^{λ} = (λ + \frac{1 - λ}{n}, \frac{1 - λ}{n}, \dots, \frac{1 - λ}{n}), 0 \leq λ \leq 1

(4)

where

λ

can be considered as an inequality parameter. The

P_{n}^{0}

and

P_{n}^{1}

in (3) are seen to be extreme members of (4). Thus, for any given n,

λ = 0

in (4) represents the income-share distribution with perfect equality while

λ = 1

corresponds to the distribution with maximum income-share inequality. When considering some condition for the value-validity of an economic inequality index EI, the special distribution in (4) can conveniently be used because of the following relationship:

E I (P_{n}) = E I (P_{n}^{λ}) for one unique λ

(5)

for any given income-share distribution

P_{n} = (p_{1}, \dots, p_{n})

and single-valued EI with

E I (P_{n}^{0}) \leq E I (P_{n}) \leq E I (P_{n}^{1})

. Of course, there can be any number of different

P_{n}

for which (5) would hold for the same

λ

-value.

The distribution in (4) can be viewed as a so-called mixture distribution, since it is the following weighted mean of

P_{n}^{0}

and

P_{n}^{1}

in (3):

P_{n}^{λ} = (1 - λ) P_{n}^{0} + λ P_{n}^{1}, 0 \leq λ \leq 1 .

(6)

As a basis for the value-validity condition for an economic inequality index EI, the following linearity (mean-value) requirement for (5) is proposed:

E I (P_{n}^{λ}) = (1 - λ) E I (P_{n}^{0}) + λ E I (P_{n}^{1})

(7)

for all n and

0 \leq λ \leq 1

. This linear relationship can equivalently be expressed in terms of the normalized form

E I^{*} (P_{n}^{λ}) = \frac{E I (P_{n}^{λ}) - E I (P_{n}^{0})}{E I (P_{n}^{1}) - E I (P_{n}^{0})} = λ .

(8)

For any given

λ

,

E I (P_{n}^{λ})

in (7) becomes a linear function of the two variables

E I (P_{n}^{0})

and

E I (P_{n}^{1})

and, for any given (fixed)

E I (P_{n}^{0})

and

E I (P_{n}^{1})

,

E I (P_{n}^{λ})

is a linear function of

λ

.

Besides the linearity proposition in (7), this relationship can also be justified or explained in terms of metric distances between income-share distributions. Thus, by considering the distributions

P_{n}^{λ}, P_{n}^{0}, and P_{n}^{1}

as points (vectors) in n-dimensional Euclidean space,

λ

can be expressed in terms of Euclidean distances d as

\frac{d (P_{n}^{λ}, P_{n}^{0})}{d (P_{n}^{1}, P_{n}^{0})} = d^{*} (P_{n}^{λ}) = λ .

(9)

Then, from (8) and (9), the value-validity condition can be expressed in terms of the normalized index EI* and distance d* as

E I^{*} (P_{n}^{λ}) = d^{*} (P_{n}^{λ}) = λ .

(10)

While the condition in (10) is based on the specific distribution in (4), there are more general implications from the equality in (5). Consequently, it can be expected that the first equality in (10) becomes an approximate equality for any income-share distribution

P_{n} = (p_{1}, \dots, p_{n})

, i.e.,

E I^{*} (P_{n}) \approx \frac{d (P_{n}, P_{n}^{0})}{d (P_{n}^{1}, P_{n}^{0})} = d^{*} (P_{n}) = \sqrt{\frac{n \sum_{i = 1}^{n} p_{i}^{2} - 1}{n - 1}} .

(11)
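As a numerical illustration of (4), (9), and (11), the following Python sketch builds a lambda distribution and computes the normalized distance d* in two ways: from the closed form in (11) and directly from Euclidean distances. The function names are illustrative and not from the paper; `math.dist` requires Python 3.8 or later.

```python
import math


def lambda_distribution(n, lam):
    """The distribution P_n^lambda in (4)."""
    base = (1 - lam) / n
    return [lam + base] + [base] * (n - 1)


def d_star(shares):
    """Normalized distance d*(P_n) from the closed form in (11)."""
    n = len(shares)
    value = (n * sum(p * p for p in shares) - 1) / (n - 1)
    return math.sqrt(max(value, 0.0))  # guard against tiny negative rounding


def d_star_direct(shares):
    """The same quantity computed as d(P_n, P_n^0) / d(P_n^1, P_n^0)."""
    n = len(shares)
    uniform = [1 / n] * n
    extreme = [1.0] + [0.0] * (n - 1)
    return math.dist(shares, uniform) / math.dist(extreme, uniform)


p = lambda_distribution(10, 0.3)
print(round(d_star(p), 6), round(d_star_direct(p), 6))  # both recover lambda
```

For any member of (4), both computations return λ, which is the content of (9).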

3. Critical Assessment of T

It is readily seen from its definition in (2) that the Theil index T does not meet the value-validity condition in (10). It is clear from numerical examples that T substantially understates the true extent of the economic inequality. For a simple example

P_{2} = (0.75, 0.25)

, it follows from (2) that

T (P_{2}) = 0.13

so that since

T (P_{2}^{0}) = 0

and since

T (P_{2}^{1}) = \log 2

, then the normalized index value becomes

T^{*} (P_{2}) = 0.19

. This

P_{2}

distribution is equivalent to

P_{2}^{0.5}

in (4) so that, according to the requirement in (10), the normalized index value should be 0.50 rather than the

T^{*}

-value of 0.19.
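The arithmetic of this example can be reproduced with a few lines of Python. The sketch below is illustrative only; the function name `theil` is not from the paper, and the normalization uses T(P_n^0) = 0 and T(P_n^1) = log n, so that T* = T / log n.

```python
import math


def theil(shares):
    """Theil's T from income shares as in (2)."""
    entropy = -sum(p * math.log(p) for p in shares if p > 0)
    return math.log(len(shares)) - entropy


p2 = [0.75, 0.25]          # equals the lambda distribution with n = 2, lambda = 0.5
t = theil(p2)
t_star = t / math.log(2)   # normalized index T*
print(round(t, 2), round(t_star, 2))  # 0.13 0.19
```

The normalized value of 0.19 falls well short of the λ = 0.5 required by (10).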

For an inequality index ranging in potential values between 0 and

\log n

, as is the case for Theil’s index with

T (P_{n}^{0}) = 0

and

T (P_{n}^{1}) = \log n

, an equivalent index

T_{V}

complying with the value-validity condition in (10) can be expressed as

T_{V} (P_{n}^{λ}) = (\log n) λ, T_{V} (P_{n}) = (\log n) d^{*} (P_{n}) = (\log n) \sqrt{\frac{n \sum_{i = 1}^{n} p_{i}^{2} - 1}{n - 1}}

(12)

for the

P_{n}^{λ}

in (4) and

P_{n} = (p_{1}, \dots, p_{n})

. The extent to which T lacks the value-validity property can conveniently be analyzed by comparing

T (P_{n}^{λ})

with the

T_{V} (P_{n}^{λ})

in (12).

Although the extent of the inequalities

T (P_{n}^{λ}) < T_{V} (P_{n}^{λ})

and

T (P_{n}) < T_{V} (P_{n})

becomes readily apparent from the numerical data considered below, it can also be examined analytically in terms of

P_{n}^{λ}

and by defining the value bias of

T (P_{n}^{λ})

as

V B T (P_{n}^{λ}) = T (P_{n}^{λ}) - T_{V} (P_{n}^{λ}) = (1 - λ) \log n - H (P_{n}^{λ})

(13)

for the

T_{V} (P_{n}^{λ})

in (12) and with

H (P_{n}^{λ})

being the entropy in (2) for the distribution

P_{n}^{λ}

in (4). In terms of partial derivatives,

\frac{\partial V B T}{\partial λ} = (1 - \frac{1}{n}) \log [1 + n λ {(1 - λ)}^{- 1}] - \log n

(14)

and

\partial^{2} V B T / \partial λ^{2} \geq 0

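. Since VBT is thus convex in λ and equals 0 at both λ = 0 and λ = 1, it is negative in between. For any given n, it is found from (14) that VBT reaches its minimum (i.e., the bias is greatest in magnitude) for

λ

values ranging from

λ = 3 / 5

for

n = 2

to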

λ = 1 / 2

for

n \to \infty

. Furthermore, by treating n as a continuous variable for mathematical purposes, it is found that

\partial V B T (P_{n}^{λ}) / \partial n \leq 0

. This analysis shows that the value bias

V B T (P_{n})

from (13) tends to become increasingly negative with increasing n and as

P_{n}

approaches the mean

(P_{n}^{0} + P_{n}^{1}) / 2

of the two extreme distributions in (3).
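The location of the largest bias can be checked numerically. Setting (14) to zero gives 1 + nλ/(1 − λ) = n^{n/(n−1)}, which can be solved for λ in closed form; this rearrangement is a worked step added here and is not stated in the paper. The Python sketch below, with illustrative function names, evaluates that root and the corresponding value of VBT from (13).

```python
import math


def value_bias(n, lam):
    """VBT(P_n^lambda) = (1 - lam) log n - H(P_n^lambda), as in (13)."""
    base = (1 - lam) / n
    shares = [lam + base] + [base] * (n - 1)
    entropy = -sum(p * math.log(p) for p in shares if p > 0)
    return (1 - lam) * math.log(n) - entropy


def stationary_lambda(n):
    """Root of (14): (1 - 1/n) log[1 + n lam / (1 - lam)] = log n."""
    odds = (n ** (n / (n - 1)) - 1) / n  # equals lam / (1 - lam) at the root
    return odds / (1 + odds)


for n in (2, 10, 100, 10_000):
    lam = stationary_lambda(n)
    print(n, round(lam, 3), round(value_bias(n, lam), 3))
```

For n = 2 the root is exactly 3/5, and it approaches 1/2 as n grows, with VBT negative at the root in each case.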

While the sensitivity of

T_{V} (P_{n}^{λ})

in (12) to changes in the inequality or concentration parameter

λ

in (4) remains constant for any given n, that of

T (P_{n}^{λ})

varies substantially with

λ

. Specifically, it is found that

\frac{\partial T (P_{n}^{λ})}{\partial λ} = (1 - \frac{1}{n}) \log (1 + \frac{n λ}{1 - λ}) > 0, \frac{\partial^{2} T (P_{n}^{λ})}{\partial λ^{2}} = \frac{n - 1}{(1 - λ) (n λ + 1 - λ)} > 0

i.e., for any given n, the sensitivity of

T (P_{n}^{λ})

to small changes in

λ

increases with

λ

and at an increasing rate. For any income-share distribution

P_{n} = (p_{1}, \dots, p_{n})

and from (5), the implication is clear: the sensitivity of

T (P_{n})

to small changes in the inequality (concentration, unevenness) of the components of

P_{n}

is not constant for any fixed n, but increases with increasing inequality.

4. Correction of T

4.1. Specific Objective

In order to determine if T can be corrected so as to comply with the value-validity condition in (10), an obvious approach would be to explore whether some systematic relationship exists between T and

T_{V}

in (12). If the dimension n of the income-share distribution

P_{n} = (p_{1}, \dots, p_{n})

or the number n of income earners is known, the results from Kvålseth [15] could be used to explore such a potential relationship. However, when various studies, organizations, or agencies provide reports on economic inequality, the values of indices such as Theil’s T are typically given without specifying values of n (e.g., [10,11,12,13,18,19]).

Therefore, for practical purposes, it would be most useful if a value-validity correction

T_{C}

could be formulated at least approximately as a simple function of T, i.e.,

T_{V} (P_{n}) \approx T_{C} (P_{n}) = f [T (P_{n})] .

(15)

Exploratory statistical analyses are used below to identify a suitable function f in (15).

4.2. Data

To obtain the necessary data for analyzing the potential relationship in (15), two sources of data were used. First, randomly generated lambda distributions

P_{n}^{λ}

in (4) were obtained by generating n as a random integer between 2 and 100, inclusive, and

λ

as a random number (to 2 decimal places) such that

0 < λ < 1

.

Second, randomly generated distributions

P_{n} = (p_{1}, \dots, p_{n})

were produced and based on the following computer algorithm. First, n was generated as a random integer between 2 and 100, inclusive. Then, for each such generated n, each

p_{i}

was generated in descending order

(p_{1} \geq p_{2} \geq \dots \geq p_{n})

as random numbers within the following respective intervals:

\frac{1}{n} \leq p_{1} \leq 1

\frac{1 - \sum_{j = 1}^{i - 1} p_{j}}{n - (i - 1)} \leq p_{i} \leq \min \left\{ p_{i - 1}, 1 - \sum_{j = 1}^{i - 1} p_{j} \right\} \text{ for } i = 2, \dots, n - 1

p_{n} = 1 - \sum_{j = 1}^{n - 1} p_{j} .
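The generation scheme above can be sketched in Python as follows. This is not the author's program: the function name is hypothetical, and drawing each p_i uniformly within its interval is an assumption, since the paper only states that the p_i are random numbers within those intervals.

```python
import random


def random_descending_shares(n, rng=random):
    """Draw p_1 >= ... >= p_n >= 0 summing to 1, using the intervals above.

    Assumption: each p_i is drawn uniformly within its interval.
    """
    shares = [rng.uniform(1 / n, 1)]              # p_1 in [1/n, 1]
    for i in range(2, n):                         # i = 2, ..., n - 1
        remaining = max(0.0, 1 - sum(shares))
        lower = remaining / (n - (i - 1))
        upper = min(shares[-1], remaining)
        # max(...) only protects against rounding making upper < lower.
        shares.append(rng.uniform(lower, max(lower, upper)))
    shares.append(max(0.0, 1 - sum(shares)))      # p_n closes the distribution
    return shares


rng = random.Random(2024)        # seeded generator, for reproducibility
n = rng.randint(2, 100)          # random integer between 2 and 100, inclusive
shares = random_descending_shares(n, rng)
print(n, round(sum(shares), 6))
```

The lower bound for each p_i is the mean of the remaining shares, which is what keeps the sequence non-increasing all the way down to p_n.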

Some distributions

P_{n}^{λ}

and

P_{n}

were excluded when they produced near-identical (repeat) results or when they resulted in values of T > 1, since such T-values would be unrepresentative of real reported economic data. After these exclusions, a total of 35 distributions of each of the two types were used in the analysis.

4.3. Results

The results from using the randomly generated

P_{n}^{λ}

in (4) and

P_{n} = (p_{1}, \dots, p_{n})

are summarized in Table 1 and Table 2, respectively. An immediate observation from these results is how far T deviates from the corresponding values of

T_{V}

in (12). The values of

T (P_{n}^{λ})

and

T (P_{n})

differ greatly from the respective values of

T_{V} (P_{n}^{λ})

in Table 1 and

T_{V} (P_{n})

in Table 2. These results support the above analysis that T consistently and substantially understates the true inequality.

Perhaps the most interesting and promising result from Table 1 and Table 2 is the apparent indication that, although the values of T and

T_{V}

can differ greatly, they appear to be systematically related. In fact, when the values

T_{V} (P_{n}^{λ})

versus

T (P_{n}^{λ})

and

T_{V} (P_{n})

versus

T (P_{n})

are represented by the scatter diagram in Figure 1, it becomes clear that a functional relationship, as in (15), could be formulated.

It is evident from this scatter diagram that a simple power function may indeed be an appropriate correction for T, i.e.,

T_{C} (P_{n}) = α {[T (P_{n})]}^{β}

(16)

where

α

and

β

are the parameters. The adequacy of this formulation would depend on how closely the values of

T_{C}

from (16) approximate those of

T_{V}

from (12).
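One simple way to estimate the parameters of a power function such as (16) is ordinary least squares on the logarithms, since log T_C = log α + β log T. The Python sketch below takes that route as an assumption; the paper does not state which regression procedure was used, so the estimates obtained this way, here from only ten (T, T_V) pairs taken from Table 1, need not coincide with those reported in the paper. The function name is illustrative.

```python
import math


def fit_power_function(t_values, tv_values):
    """OLS of log(T_V) on log(T); returns (alpha, beta) for T_V ~ alpha * T**beta."""
    xs = [math.log(t) for t in t_values]
    ys = [math.log(v) for v in tv_values]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = math.exp(y_bar - beta * x_bar)
    return alpha, beta


# Ten (T, T_V) pairs taken from Table 1.
t_sample = [0.36, 0.16, 0.12, 0.55, 0.47, 0.72, 0.05, 0.98, 0.02, 0.13]
tv_sample = [0.69, 0.44, 0.39, 1.07, 0.93, 1.30, 0.25, 1.60, 0.14, 0.45]
alpha_hat, beta_hat = fit_power_function(t_sample, tv_sample)
print(round(alpha_hat, 2), round(beta_hat, 2))
```

A log-scale fit weights relative rather than absolute errors, so a nonlinear least-squares fit on the original scale would generally give somewhat different estimates.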

From regression analysis of

T_{V}

on T, the following parameter estimates from (16) are obtained:

\hat{α} = 1.57

and

\hat{β} = 0.64

for the data in Table 1 and

\hat{α} = 1.53

and

\hat{β} = 0.66

for Table 2. When combining the data from both tables into 70 data points,

\hat{α} = 1.55

and

\hat{β} = 0.65

, which turns out to be the means of the other two sets of parameter estimates. Consequently, the following value-validity correction of Theil’s T is proposed:

{\hat{T}}_{V} (P_{n}) = T_{C} (P_{n}) = (1.55) {[T (P_{n})]}^{0.65}

(17)

which is the curve shown in Figure 1.
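Applying (17) to a published T-value requires only a single line of arithmetic. The following Python sketch uses an illustrative function name; the default parameter values are those of (17), and the three T-values are taken from Table 1.

```python
def theil_corrected(t, alpha=1.55, beta=0.65):
    """Value-validity correction T_C = alpha * T**beta, as in (17)."""
    if t < 0:
        raise ValueError("Theil's T is non-negative")
    return alpha * t ** beta


for t in (0.36, 0.16, 0.98):
    # These agree with the tabulated T_C values 0.80, 0.47, and 1.53 in Table 1.
    print(t, round(theil_corrected(t), 2))
```

Because the correction depends on T alone, it can be applied to reported T-values even when n is not given.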

When comparing the values of

T_{V}

and

T_{C}

in Table 1 and Table 2 and from the scatter diagram in Figure 1, it becomes apparent that the

T_{C}

in (17) has the value-validity property since the values of

T_{V}

and

T_{C}

are approximately equal, to a reasonable degree. Specifically, if

T_{C}

is used to predict

T_{V}

, it is found that the coefficient of determination

R^{2}

, when properly computed [20], becomes

R^{2} = 1 - \sum {(T_{V} - T_{C})}^{2} / \sum {(T_{V} - {\bar{T}}_{V})}^{2} = 0.987

for the fitted model

{\hat{T}}_{V} = T_{C}

and the 70 data points combined from Table 1 and Table 2. That is, about 99% of the total variation of

T_{V}

(about its mean) is explained (accounted for) by the model

{\hat{T}}_{V} = T_{C}

. Also,

RMSE (T_{V}, T_{C}) = \sqrt{\sum_{1}^{70} {(T_{V} - T_{C})}^{2} / 70} = 0.045

. The residuals apparent from Figure 1 likewise give no clear indication that an alternative to the formulation in (17) needs to be considered.
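The two goodness-of-fit measures above can be computed as in the following Python sketch. The function names are illustrative, and the data are only the first five rows of Table 1 and of Table 2, so the resulting values will differ somewhat from those reported for all 70 data points.

```python
import math


def r_squared(observed, predicted):
    """R^2 = 1 - sum (y - yhat)^2 / sum (y - ybar)^2."""
    mean = sum(observed) / len(observed)
    sse = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    sst = sum((y - mean) ** 2 for y in observed)
    return 1 - sse / sst


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    sse = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    return math.sqrt(sse / len(observed))


# (T_V, T_C) from the first five rows of Table 1 and of Table 2.
tv = [0.69, 0.44, 0.39, 0.35, 0.53, 0.14, 0.49, 0.65, 1.42, 0.90]
tc = [0.80, 0.47, 0.39, 0.37, 0.49, 0.16, 0.47, 0.63, 1.47, 0.95]
print(round(r_squared(tv, tc), 3), round(rmse(tv, tc), 3))
```

This form of R² compares the residuals of the model directly with the variation of T_V about its mean, rather than relying on a squared correlation.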

It is also of interest to note the close comparative results when based on the distribution

P_{n}^{λ}

in (4) versus the general distribution

P_{n} = (p_{1}, \dots, p_{n})

. From the data in Table 1 and Table 2 and from the scatter diagram in Figure 1, it is evident that the results from the two different types of distribution are highly comparable. In fact, such correspondence is not surprising, in view of the relationship in (5) involving T and the equivalent one in terms of the corrected

T_{C}

in (17).

4.4. Real Data Results

In addition to the results from randomly generated data, as discussed in Section 4.3, it may also be of interest to perform the same analysis using some real income data. Also, while the focus of this paper is on the Theil index, the results from the real data will be used to make a comparison with another index, the highly popular Gini index [21], which does, in fact, have the value-validity property.

By definition, if the income shares are rank ordered such that

p_{[1]} \geq p_{[2]} \geq \dots \geq p_{[n]}

, Gini’s index G can be expressed as

G (P_{n}) = \frac{n + 1}{n} - \frac{2}{n} \sum_{i = 1}^{n} i p_{[i]}

(18)

with tied (equal)

p_{[i]}

’s being placed in any order. For the lambda distribution in (4), it is determined from (18) that

G (P_{n}^{λ}) = (1 - \frac{1}{n}) λ, G^{*} (P_{n}^{λ}) = λ

and hence G meets the value-validity condition in (10).
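The result for the lambda distribution can be confirmed numerically from (18). In the Python sketch below, the function names are illustrative and the values n = 9 and λ = 0.4 are arbitrary choices made for the example.

```python
def gini(shares):
    """Gini's G from (18), with shares ranked in descending order."""
    n = len(shares)
    ranked = sorted(shares, reverse=True)
    weighted = sum(i * p for i, p in enumerate(ranked, start=1))
    return (n + 1) / n - 2 * weighted / n


def lambda_distribution(n, lam):
    """The distribution P_n^lambda in (4)."""
    base = (1 - lam) / n
    return [lam + base] + [base] * (n - 1)


n, lam = 9, 0.4                      # arbitrary illustrative values
g = gini(lambda_distribution(n, lam))
g_star = g / (1 - 1 / n)             # normalized by G(P_n^1) = 1 - 1/n
print(round(g, 6), round(g_star, 6))
```

The normalized value G* equals λ, in agreement with (10).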

In order to compare values of the indices G, T,

T_{V}, and T_{C}

in (18), (2), (12), and (17), respectively, for some real economic income data, U.S. Census Bureau data were used, as reported by Semega and Kollar [22] (Table A2), for total household income and all ethnic groups for various years. Nine income intervals were reported, ranging from “under USD 15,000” to “USD 200,000 and over”. The results are summarized in Table 3 (to 3 decimal places in order to discriminate between some of the small index values).

It is clear from the data in Table 3 that

T_{C}

in (17) is closely related to both

T_{V}

in (12) and G in (18). However, the values of

T_{C}

and

T_{V}

do not correspond as closely as they do for the random-based data in Table 1 and Table 2. Of course, the range of values of

T_{C}

and

T_{V}

is much greater in Table 1 and Table 2 than in Table 3. Also, the results in Table 3 are based on a fixed number of a few income categories,

n = 9

, whereas those in Table 1 and Table 2 are based on n ranging from 2 to 100.

There are, however, close linear relationships between the indices based on the data in Table 3. In fact, the following fitted regression models are obtained from the data in Table 3:

{\hat{T}}_{C} = - 0.127 + 1.436 T_{V}, R^{2} = 0.991

{\hat{T}}_{C} = - 0.135 + 2.068 G, R^{2} = 0.990

G = 0.005 + 0.692 T_{V}, R^{2} = 0.993

showing that the variation of one index (about its mean) is nearly perfectly explained (accounted for) by its linear relationship to another index. Consequently, when making difference (interval) comparisons, the indices

T_{V}, T_{C}, and G

can generally be expected to provide similar results, since each complies with the value-validity condition in (10) and (11).

5. Concluding Comments

The single most significant result in this paper is the simple formulation in (17) that provides a correction of Theil’s economic inequality index T to incorporate the value-validity property as a good approximation. The corrected index

T_{C}

is only a function of T and does not explicitly depend on the number of income units n. This is important when using

T_{C}

to correct published data for T, since such data typically do not specify n. In fact, this was the motivation behind searching for a potential relationship, as in (17), rather than considering some

T_{C}

as a function of both T and n.

While Theil’s T has a number of desirable properties, none of those relate specifically to the potential numerical values of T and whether those values can be justified as truly representing the economic inequality characteristic. This limitation of T is addressed by

T_{C}

and its value-validity property:

T_{C}

transforms understated T-values into realistic, reliable, and valid inequality representations.

Various economic inequality indices, such as Gini’s G in (18) and Theil’s T, are commonly used to make absolute and relative comparisons between individual values and differences (intervals). The

T_{C}

, because of its additional value-validity property, has the advantage of providing more representative economic inequality comparisons.

An interesting inconsistency occurs between

T_{C}

and T when making absolute and relative comparisons. That is, for any two values

t_{1}

and

t_{2} > t_{1}

of T and the corresponding values

t_{C 1}

and

t_{C 2} > t_{C 1}

from (17), a general difference between the two indices becomes:

t_{C 2} - t_{C 1} > t_{2} - t_{1}, \text{ but } (t_{C 2} - t_{C 1}) / t_{C 1} < (t_{2} - t_{1}) / t_{1} .

This inconsistency can be verified from the form of (17).
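A small numerical check makes the two inequalities concrete. In the Python sketch below, the function name and the two T-values are hypothetical. Note that, from (17), dT_C/dT = 1.0075 T^{−0.35}, which exceeds 1 only for T below about 1.02, so the absolute-difference inequality should be expected for the range T ≤ 1 considered in this paper rather than for arbitrarily large T.

```python
def theil_corrected(t, alpha=1.55, beta=0.65):
    """Value-validity correction T_C = alpha * T**beta, as in (17)."""
    return alpha * t ** beta


t1, t2 = 0.2, 0.4                      # hypothetical T-values with t2 > t1
tc1, tc2 = theil_corrected(t1), theil_corrected(t2)

abs_diff_t, abs_diff_tc = t2 - t1, tc2 - tc1
rel_diff_t, rel_diff_tc = abs_diff_t / t1, abs_diff_tc / tc1

print(round(abs_diff_tc, 3), ">", round(abs_diff_t, 3))   # larger absolute difference
print(round(rel_diff_tc, 3), "<", round(rel_diff_t, 3))   # smaller relative difference
```

The relative-difference inequality holds for any t_2 > t_1 > 0, since t_{C2}/t_{C1} = (t_2/t_1)^{0.65} is always smaller than t_2/t_1 when that ratio exceeds 1.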

What sets Theil’s T apart from other inequality indices is its desirable decomposition property. That is, T can be decomposed into within

(T_{W})

and between

(T_{B})

inequalities, such that

T = T_{W} + T_{B}

when, for example, considering global economic inequality versus the inequalities within and between countries or regions (see, e.g., [12,13,23]). While the additive decomposition does not hold for the correction in (17), i.e.,

T_{C} \neq T_{W C} + T_{B C}

, ratio comparisons could still be corrected, such as

T_{W C} / T_{C} = {(T_{W} / T)}^{0.65}

,

T_{B C} / T_{C} = {(T_{B} / T)}^{0.65}

, or

T_{B C} / T_{W C} = {(T_{B} / T_{W})}^{0.65}

.
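The behaviour of the decomposition under (17) can be illustrated numerically. In the Python sketch below, the within and between components are hypothetical values and the function name is illustrative. Because the multiplier 1.55 cancels in any ratio of corrected values, each such ratio is simply the corresponding ratio of uncorrected values raised to the power 0.65.

```python
def theil_corrected(t, alpha=1.55, beta=0.65):
    """Value-validity correction T_C = alpha * T**beta, as in (17)."""
    return alpha * t ** beta


t_within, t_between = 0.30, 0.10       # hypothetical components of T
t_total = t_within + t_between         # additive decomposition of T

tc_total = theil_corrected(t_total)
tc_within = theil_corrected(t_within)
tc_between = theil_corrected(t_between)

# Additivity is lost after correction: the corrected parts exceed the corrected total.
print(round(tc_within + tc_between, 3), "vs", round(tc_total, 3))

# Ratios are still simple power transforms of the original ratios.
print(round(tc_within / tc_total, 3), round((t_within / t_total) ** 0.65, 3))
print(round(tc_between / tc_within, 3), round((t_between / t_within) ** 0.65, 3))
```

With an exponent below 1, the corrected within and between shares of the total no longer sum to 1, so they are better read as separate ratio comparisons than as a partition.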

The

T_{C}

does have a clear limitation, as does T. Neither index has any intuitively appealing or meaningful interpretation. Nevertheless, as a simple quantitative measure of economic inequality,

T_{C}

has the important advantage over T of having the value-validity property. Consequently, when compared to T, the corrected form

T_{C}

provides more realistic and true inequality representations and comparative results from real economic evaluations.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Theil, H. Economics and Information Theory; American Elsevier Company: New York, NY, USA, 1967.
2. Theil, H. Statistical Decomposition Analysis; North-Holland Publishing Company: Amsterdam, The Netherlands, 1972.
3. Cowell, F.A. Measuring Inequality, 3rd ed.; Oxford University Press: Oxford, UK, 2011.
4. Coulter, P.B. Measuring Inequality: A Methodological Handbook; Routledge: London, UK, 1989.
5. McGregor, T.; Smith, B.; Wills, S. Measuring inequality. Oxf. Rev. Econ. Policy 2019, 35, 368–395.
6. Trapeznikova, I. Measuring inequality. IZA World Labor 2019, 462, 1–11.
7. Bellù, L.G.; Liberati, P. Policy Impacts on Inequality: Simple Inequality Measures; Food and Agriculture Organizations of the United Nations, FAO: Rome, Italy, 2006.
8. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423, 623–656.
9. Sen, A.K.; Foster, J.E. On Economic Inequality, 2nd ed.; Clarendon Press: Oxford, UK, 1997.
10. Muszyńska, J.; Oczki, J.; Wędrowska, E. Income inequality in Poland and the United Kingdom. Decomposition of the Theil index. Folia Oeconomica Stetin. 2018, 18, 108–122.
11. Liao, T.F. Evaluating distributional differences in income inequality. Socius 2016, 2, 1–14.
12. Carvajal, C.P.; Rodríguez, M.A.; Cuartas, B.M. Determinants of income inequality reduction in the Latin American countries. Cepal. Rev. 2018, 126, 80–98.
13. Morrisson, C.; Murtin, F. Average Income Inequality Between Countries (1700–2030); FERDI La Fondation pour les études et Recherches sur le Développment International: Paris, France, 2011; pp. 1–16.
14. Kranzinger, S. The decomposition of income inequality in the EU-28. Empirica 2020, 47, 643–668.
15. Kvålseth, T.O. Entropy evaluation based on value validity. Entropy 2014, 16, 4855–4873.
16. Kvålseth, T.O. The lambda distribution and its application to categorical summary measures. Adv. Appl. Stat. 2011, 24, 83–106.
17. Kvålseth, T.O. Measurement of market (industry) concentration based on value validity. PLoS ONE 2022, 17, e0264613.
18. Buettgens, M.; Blavin, F.; Pan, C. The Affordable Care Act reduced income inequality in the US. Health Aff. 2021, 40, 121–129.
19. Amarante, V.; Galván, M.; Mancero, X. Inequality in Latin America: A global measurement. Cepal Rev. 2018, 118, 25–44.
20. Kvålseth, T.O. Cautionary note about R². Am. Stat. 1985, 39, 279–285.
21. Gini, C. Variabilità e mutabilità. In Studi Economico-Giuridici. Pubblicati per cura della Facoltà di Giurisprudenza della R. Università di Cagliari; Tipogr. di P. Cuppini: Bologna, Italy, 1912; Volume 3, pp. 1–158.
22. Semega, J.; Kollar, M. Income in the United States: 2021; Current Population Reports, P60-276; U.S. Census Bureau: Suitland-Silver Hill, MD, USA; U.S. Government Publishing Office: Washington, DC, USA, 2022.
23. Ma, N.; Cheong, T.S.; Li, J. Evaluating global inequality using decomposition approach. Front. Psych. 2022, 12, 809670.

Figure 1. Scatter diagram of

T_{V}

in (12) versus Theil’s T from the data in Table 1 (red dots) and Table 2 (blue dots). The curve represents the fitted model for the corrected index

T_{C}

in (17).

Table 1. Values of T in (2), T_V in (12), and

T_{C}

in (17) for the distribution

P_{n}^{λ}

in (4) with randomly generated

λ \in (0, 1)

and

n \in [2, 100]

.

$λ$	$n$	$T (P_{n}^{λ})$	$T_{V} (P_{n}^{λ})$	$T_{C} (P_{n}^{λ})$
0.16	74	0.36	0.69	0.80
0.10	78	0.16	0.44	0.47
0.10	47	0.12	0.39	0.39
0.08	80	0.11	0.35	0.37
0.16	27	0.17	0.53	0.49
0.25	72	0.55	1.07	1.05
0.52	6	0.47	0.93	0.95
0.38	31	0.72	1.30	1.25
0.35	14	0.42	0.92	0.88
0.29	64	0.65	1.21	1.17
0.32	92	0.85	1.45	1.39
0.41	16	0.59	1.14	1.10
0.06	60	0.05	0.25	0.22
0.35	48	0.77	1.35	1.31
0.24	85	0.57	1.07	1.08
0.19	35	0.27	0.68	0.66
0.26	27	0.37	0.86	0.81
0.44	38	0.98	1.60	1.53
0.16	60	0.26	0.66	0.65
0.26	59	0.55	1.06	1.05
0.41	8	0.38	0.85	0.83
0.51	20	0.93	1.53	1.48
0.20	72	0.40	0.86	0.85
0.26	35	0.43	0.92	0.90
0.25	66	0.53	1.05	1.03
0.27	79	0.65	1.18	1.17
0.06	11	0.02	0.14	0.12
0.28	5	0.13	0.45	0.41
0.23	12	0.19	0.57	0.53
0.30	18	0.38	0.87	0.83
0.22	47	0.38	0.85	0.83
0.34	56	0.78	1.37	1.32
0.36	10	0.36	0.83	0.80
0.07	86	0.09	0.31	0.32
0.15	55	0.22	0.60	0.58

Table 2. Values of T in (2),

T_{V}

in (12),

d^{*}

in (11), and

T_{C}

in (17) for randomly generated income-share distributions

P_{n} = (p_{1}, \dots, p_{n})

with

n \in [2, 100]

.

$n$	$T (P_{n})$	$T_{V} (P_{n})$	$d^{*} (P_{n})$	$T_{C} (P_{n})$
36	0.03	0.14	0.04	0.16
13	0.16	0.49	0.19	0.47
19	0.25	0.65	0.22	0.63
47	0.92	1.42	0.37	1.47
92	0.47	0.90	0.20	0.95
70	0.04	0.17	0.04	0.19
67	0.08	0.25	0.06	0.30
14	0.02	0.13	0.05	0.12
28	0.20	0.53	0.16	0.54
25	0.68	1.26	0.39	1.21
4	0.23	0.55	0.40	0.60
55	0.60	0.96	0.24	1.11
24	0.29	0.67	0.21	0.69
76	0.26	0.61	0.14	0.65
3	0.18	0.42	0.38	0.51
72	0.59	1.03	0.24	1.10
51	0.94	1.53	0.39	1.49
87	0.18	0.45	0.10	0.51
34	0.76	1.34	0.38	1.30
15	0.38	0.87	0.32	0.83
63	0.13	0.37	0.09	0.41
77	0.24	0.61	0.14	0.61
25	0.10	0.39	0.12	0.35
74	0.47	0.82	0.19	0.95
14	0.15	0.48	0.18	0.45
69	0.87	1.44	0.34	1.42
89	0.38	0.81	0.18	0.83
41	0.54	1.08	0.29	1.04
86	0.85	1.34	0.30	1.39
36	0.09	0.32	0.09	0.32
9	0.33	0.73	0.33	0.75
46	0.51	1.03	0.27	1.00
57	0.34	0.73	0.18	0.77
8	0.14	0.48	0.23	0.43
98	0.77	1.33	0.29	1.31

Table 3. Value of G in (18), T in (2),

T_{V}

in (12), and

T_{C}

in (17) for total U.S. household income over

n = 9

income categories for different years. Source: U.S. Census Bureau.

Year	G	T	$T_{V}$	$T_{C}$
2021	0.148	0.035	0.210	0.176
2020	0.144	0.033	0.211	0.170
2019	0.154	0.039	0.219	0.187
2018	0.147	0.034	0.213	0.172
2017	0.141	0.033	0.188	0.169
2016	0.146	0.035	0.209	0.174
2015	0.142	0.032	0.206	0.165
2014	0.146	0.034	0.204	0.171
2013	0.150	0.034	0.197	0.172
2012	0.158	0.039	0.221	0.187
2011	0.164	0.041	0.229	0.195
2010	0.161	0.040	0.224	0.190
2005	0.169	0.044	0.240	0.204
2000	0.178	0.052	0.250	0.227
1995	0.206	0.069	0.284	0.273
1990	0.227	0.085	0.311	0.312
1985	0.237	0.102	0.333	0.351
1980	0.254	0.124	0.363	0.399
1975	0.279	0.155	0.397	0.461
1970	0.302	0.171	0.434	0.492

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
