Open Access Article

Asymptotic Versus Bootstrap Inference for Inequality Indices of the Cumulative Distribution Function

1 Business School, University of Aberdeen, Old Aberdeen AB24 3UE, UK
2 Department of Economics, University of Edinburgh, Edinburgh EH8 9AB, UK
3 Business School, University of Leeds, Leeds LS2 9JT, UK
* Author to whom correspondence should be addressed.
Econometrics 2020, 8(1), 8; https://doi.org/10.3390/econometrics8010008
Received: 12 December 2017 / Accepted: 18 February 2020 / Published: 26 February 2020
(This article belongs to the Special Issue Econometrics and Income Inequality)

Abstract

We examine the performance of asymptotic inference as well as bootstrap tests for the Alphabeta and Kobus–Miłoś family of inequality indices for ordered response data. We use Monte Carlo experiments to compare the empirical size and statistical power of asymptotic inference and the Studentized bootstrap test. In a broad variety of settings, both tests are found to have similar rejection probabilities of true null hypotheses, and similar power. Nonetheless, the asymptotic test remains correctly sized in the presence of certain types of severe class imbalances exhibiting very low or very high levels of inequality, whereas the bootstrap test becomes somewhat oversized in these extreme settings.
Keywords: measurement of inequality; ordered response data; multinomial sampling; large sample distributions; Studentized bootstrap tests; Monte Carlo experiments

1. Introduction

Inequality indices expressed as functions of the cumulative distribution function (CDF) are routinely used in studies that quantify inequality in self-assessed health, happiness and other life satisfaction variables collected in the form of ordered response data. The large sample distribution of inequality indices of the CDF was obtained by Abul Naga and Stapenhurst (2015). It remains an open question, however, how reliable this large sample distribution is for testing hypotheses in applied work with finite sample sizes. Such an investigation (to the best of our knowledge, the first of its kind) is the main purpose of this paper. The focus of our investigation is the Alphabeta family of inequality indices (Abul Naga and Yalcin 2008), further extended by Kobus and Miłoś (2012). This family of indices has been used in an important empirical literature (Dutta and Foster 2013; Jones et al. 2011; Madden 2010; Arrighi et al. 2015). This research is motivated by a substantial body of work, reviewed in Cowell and Flachaire (2015) and Davidson and Duclos (2013), that documents the poor finite sample performance of the t-statistic for income inequality measures.
The inequality indices we investigate in this paper are smooth statistics of multinomial distributions. It is well known that for smooth statistics of multinomial distributions, the normal approximation provided by the central limit theorem is generally very accurate as the sample size increases with the number of probability categories fixed, as long as the underlying probability distribution lies in the interior of its parameter space.[1]
Nonetheless, there are several a priori reasons why we may want to consider bootstrap inference for inequality indices defined on ordered response data. Firstly, such indices are typically non-linear functions, and the analytical calculation of their standard errors involves linearization using the delta method. Davison and Hinkley (1997, p. 16) argue that in practice simulation methods can provide more accurate estimates of the distribution of test statistics than analytical methods that rely on the delta method. Secondly, under certain regularity conditions, statistical theory favours bootstrap inference over asymptotic tests in finite samples when the test statistic is asymptotically pivotal, because the bootstrap then provides an asymptotic refinement (Horowitz 2001). Furthermore, in the income inequality literature, bootstrap methods often prove useful in cases where the exact distribution of the test statistic is very complicated to obtain analytically (e.g., Barret et al. 2014).
Bootstrap methods, however, are not to be preferred over asymptotic inference in every sampling context. For instance, Athreya (1987), among others, shows that bootstrap methods may fail when resampling from long-tailed distributions. Likewise, in the context of income data, Russell and Flachaire (2007) show that the presence of outliers severely disrupts the performance of bootstrap inference for income inequality measures.
A priori, therefore, it is not clear whether bootstrap methods or asymptotic inference should be preferred in economic investigations of inequality based on ordered response data. We therefore use Monte Carlo experiments to compare the empirical size and statistical power of asymptotic inference and the Studentized bootstrap test, in the context of a generic hypothesis test that a population has a given level of inequality $t_o$. We find that, in a broad variety of settings, both tests have similar rejection probabilities of true null hypotheses, and similar power. Nonetheless, the asymptotic test remains correctly sized in the presence of certain types of severe class imbalances[2] exhibiting very low or very high levels of inequality, whereas the bootstrap test becomes somewhat oversized in these extreme settings. Given that the simulation results suggest that in practice the two tests perform very similarly, we are inclined to recommend the use of asymptotic inference in applied research involving inequality indices of the cumulative distribution function.
The structure of the paper is as follows. Section 2 presents the two families of inequality indices for ordered response data that will be the subject of our investigation. Section 3 discusses the implementation of the Studentized bootstrap test in relation to the two families of inequality indices. In the discussion, we place particular emphasis on ensuring that sampling follows the two golden rules set out by Davidson (2007). Section 4 discusses the methods of investigation used in exploring the comparative properties of the two tests. Section 5 presents the results of the Monte Carlo simulations. Section 6 presents a brief application, while Section 7 concludes with a discussion of the limitations of the paper and directions for further research.

2. The Family of Inequality Indices

Let $S$ denote a sample of observations on $k$ ordered categories of well-being (for example life satisfaction, self-reported health status or obesity status). Assume $n_1$ individuals are reported to be in category 1, $n_2$ individuals are reported to be in category 2, etc., and define $n = \sum_{i=1}^{k} n_i$. The resulting sample then follows a multinomial distribution. If $S$ is drawn from an underlying population probability mass function (PMF) $f = (f_1, \ldots, f_k)$, the likelihood of $S$ takes the form
$$P(S \mid f) := \frac{n!}{n_1! \cdots n_k!}\, f_1^{n_1} \cdots f_k^{n_k}. \tag{1}$$
We note that, in the context of multinomial sampling, the data counts $(n_1, \ldots, n_k)$ are jointly sufficient statistics for $f$ (see Yalonetzky 2013 for further detail).
Inequality indices for ordered response data are often expressed in terms of the cumulative distribution associated with the sample $S$. We denote the sample's frequency distribution $x = (x_1, \ldots, x_k)$, where $x_i = n_i/n$ is the proportion of individuals in category $i$. We define $X = (X_1, \ldots, X_k)$ as the resulting empirical cumulative distribution, where $X_j := \sum_{i=1}^{j} n_i/n$, and we call $X$ the empirical distribution function (EDF).
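To make these definitions concrete, the Python sketch below computes the EDF $X$ from the category counts $(n_1, \ldots, n_k)$. It also returns the median category, which we take to be the smallest $j$ with $X_j \ge 1/2$; this convention is our assumption (it is not spelled out in this section), though it reproduces the median states quoted for the PMFs in Section 5. The function names are illustrative only.

```python
from itertools import accumulate


def empirical_cdf(counts):
    """Return the EDF X = (X_1, ..., X_k) implied by counts (n_1, ..., n_k)."""
    n = sum(counts)
    if n <= 0:
        raise ValueError("counts must sum to a positive number")
    # Cumulate the integer counts before dividing, so that X_k equals 1 exactly.
    return [c / n for c in accumulate(counts)]


def median_category(cdf):
    """Median category (1-based): smallest j with X_j >= 1/2 (assumed convention)."""
    for j, value in enumerate(cdf, start=1):
        if value >= 0.5:
            return j
    raise ValueError("input is not a cumulative distribution ending at 1")


X = empirical_cdf([1, 2, 3, 2, 1])
print([round(v, 4) for v in X])  # [0.1111, 0.3333, 0.6667, 0.8889, 1.0]
print(median_category(X))        # 3
```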
Let $D$ denote the set of cumulative distributions defined over $k$ ordered categories of well-being. An inequality index for ordered response data is then some function defined on $D$, with parameters reflecting some appropriately defined inequality aversion axiom and other ethical properties. First, to give an example of an inequality index that is linear in the cumulative distribution function, consider the family of subgroup decomposable indices of Kobus and Miłoś (2012):
$$\Xi(X; m, a_1, a_2) := \frac{a_1 \sum_{i=1}^{m-1} X_i - a_2 \sum_{i=m}^{k} X_i + c_1(k, m, a_1, a_2)}{c_2(k, m, a_1, a_2)}, \qquad a_1, a_2 > 0 \tag{2}$$
Here $a_1$ and $a_2$ are parameter values chosen by the data analyst to reflect different social value judgements regarding inequality below and above the median category $m$,[3] and $c_1$ and $c_2$ are normalization constants that ensure that the index takes values in the unit interval $[0, 1]$.
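The constants $c_1$ and $c_2$ are not written out above. Assuming that, like the Alphabeta index below, $\Xi$ is scaled so that its infimum over distributions with median category $m$ (everyone in category $m$) is 0 and its supremum (the limit in which $X_i \to 1/2$ for all $i < k$) is 1, they can be derived as follows. This is our own reconstruction, offered as a check, not a formula quoted from Kobus and Miłoś (2012):

```latex
\begin{aligned}
&\text{Infimum } (X_i = 0 \text{ for } i < m,\; X_i = 1 \text{ for } i \ge m):\quad
  -a_2 (k - m + 1) + c_1 = 0 \;\Rightarrow\; c_1 = a_2 (k - m + 1),\\
&\text{Supremum } (X_i \to \tfrac{1}{2} \text{ for } i < k,\; X_k = 1):\quad
  c_2 = \tfrac{a_1 (m-1)}{2} - a_2\Bigl(\tfrac{k-m}{2} + 1\Bigr) + c_1
      = \tfrac{a_1 (m-1) + a_2 (k-m)}{2}.
\end{aligned}
```

With $a_1 = a_2 = 1$ these give $c_1 = k + 1 - m$ and $c_2 = (k-1)/2$, which coincide with $c_3$ and $c_4$ of the Alphabeta index below evaluated at $\omega_1 = \omega_2 = 1$, consistent with the identity $\Delta(X; m, 1, 1) = \Xi(X; m, 1, 1)$ stated there.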
Next consider the Alphabeta family of inequality indices (Abul Naga and Yalcin 2008):
$$\Delta(X; m, \omega_1, \omega_2) := \frac{\sum_{i=1}^{m-1} X_i^{\omega_1} - \sum_{i=m}^{k} X_i^{\omega_2} + c_3(k, m, \omega_1, \omega_2)}{c_4(k, m, \omega_1, \omega_2)}, \qquad \omega_1, \omega_2 \ge 1 \tag{3}$$
Likewise, $\omega_1$ and $\omega_2$ are parameter values chosen to reflect social aversion to inequality below and above the median, and $c_3 := k + 1 - m$ and $c_4 := (m-1)\left(\tfrac{1}{2}\right)^{\omega_1} - (k-m)\left(\tfrac{1}{2}\right)^{\omega_2} + (k-m)$ are normalization constants. Note that the index $\Delta(X; m, \omega_1, \omega_2)$ is linear in $X$ only in the specific case where $\omega_1 = \omega_2 = 1$, and furthermore that $\Delta(X; m, 1, 1) = \Xi(X; m, 1, 1)$ for any distribution $X$.
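Equation (3) translates directly into code. The sketch below is a minimal Python implementation (the function name is ours); it accepts either counts or a PMF, and if no median is supplied it uses the smallest $j$ with $X_j \ge 1/2$, an assumed convention. For the baseline uniform PMF over $k = 5$ categories with $(\omega_1, \omega_2) = (2, 2)$ it returns the baseline inequality level 0.60 used in Section 4.

```python
from itertools import accumulate


def alphabeta_index(counts, omega1, omega2, m=None):
    """Alphabeta index Delta(X; m, omega1, omega2) of equation (3).

    `counts` may be integer counts or a PMF; it is normalized internally.
    If m is None, the median category is the smallest j with X_j >= 1/2
    (an assumed convention).
    """
    total = sum(counts)
    X = [c / total for c in accumulate(counts)]
    k = len(X)
    if m is None:
        m = next(j for j, v in enumerate(X, start=1) if v >= 0.5)
    below = sum(v ** omega1 for v in X[: m - 1])  # i = 1, ..., m-1
    above = sum(v ** omega2 for v in X[m - 1 :])  # i = m, ..., k
    c3 = k + 1 - m
    c4 = (m - 1) * 0.5 ** omega1 - (k - m) * 0.5 ** omega2 + (k - m)
    return (below - above + c3) / c4


print(round(alphabeta_index([1, 1, 1, 1, 1], 2, 2), 4))  # 0.6 (baseline DGP)
```

With $\omega_1 = \omega_2 = 1$ the same function evaluates the linear member of the family, which by the identity above equals $\Xi(X; m, 1, 1)$.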
The key property associated with the two families of inequality indices presented above is that they are increasing in median preserving spreads (Allison and Foster 2004).[4] The above indices feature in studies aimed at quantifying health inequality in multiple country contexts (e.g., Jones et al. 2011; Madden 2010) and also in simulating the envisaged effect of policy interventions on health inequality in the context of specific pathologies (e.g., Arrighi et al. 2015). They are also used in the quantification of happiness inequality in the United States (Dutta and Foster 2013).

3. The Bootstrap and Asymptotic Test

The purpose of this section is to detail the procedures used to implement the asymptotic and bootstrap tests. We can think of a generic statistical test as a function $\tau(H_o, S) \in [0, 1]$, where $H_o$ is the hypothesis being investigated and $S$ is a sample. The test function returns a p-value, that is, the smallest type-1 error rate (significance level) at which the hypothesis of concern can be rejected.
Let $\Lambda_j$ denote the subset of $D$ containing distributions with median status $j$. Then $\Lambda_j := \{G \in D : \mathrm{med}(G) = j\}$, and clearly $D = \Lambda_1 \cup \cdots \cup \Lambda_k$. Accordingly, for some sample $S$ of $n$ observations with EDF $X$, drawn from a cumulative distribution function $F \in \Lambda_m$, and some inequality index $\Delta(\cdot\,; m, \omega_1, \omega_2)$, consider testing the null hypothesis
$$H_o : \Delta(F; m, \omega_1, \omega_2) = t_o. \tag{4}$$
We define the null space associated with this null hypothesis as the set of cumulative distribution functions $G \in \Lambda_m$ that exhibit the level of inequality $t_o$ and whose median equals the median $m$ of the cumulative distribution $F$:
$$E(t_o, \Lambda_m; \omega_1, \omega_2) := \{G \in \Lambda_m : \Delta(G; m, \omega_1, \omega_2) = t_o\} \tag{5}$$
Let $V[\Delta(X; m, \omega_1, \omega_2)]$ denote any consistent estimator of the asymptotic variance of the inequality index. Conditional on the median category $m$, an asymptotic test of the null hypothesis that $\Delta(F; m, \omega_1, \omega_2) = t_o$ involves computing the test statistic
$$z := \frac{\Delta(X; m, \omega_1, \omega_2) - t_o}{V[\Delta(X; m, \omega_1, \omega_2)]^{1/2}},$$
and approximating the exact distribution of $z$ under the null hypothesis by a standard normal distribution.[5] Equivalently, for the asymptotic test, the critical values are obtained from the quantiles of the standard normal distribution.
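One way to obtain a consistent variance estimator $V$ is the delta method under multinomial sampling, using the EDF covariance $\mathrm{Cov}(X_i, X_j) = (X_{\min(i,j)} - X_i X_j)/n$. The Python sketch below follows that route; it is our illustrative choice of $V$ and is not claimed to be the exact expression derived by Abul Naga and Stapenhurst (2015). The function name is hypothetical.

```python
import math
from itertools import accumulate


def alphabeta_z_test(counts, t_o, omega1, omega2):
    """Two-sided asymptotic z test of H_o: Delta(F; m, omega1, omega2) = t_o.

    The variance is a delta-method estimate under multinomial sampling, using
    Cov(X_i, X_j) = (X_min(i,j) - X_i X_j) / n for the EDF (an assumed choice
    of consistent estimator). The median category m is estimated from the data.
    """
    n = sum(counts)
    X = [c / n for c in accumulate(counts)]
    k = len(X)
    m = next(j for j, v in enumerate(X, start=1) if v >= 0.5)  # smallest j with X_j >= 1/2
    c3 = k + 1 - m
    c4 = (m - 1) * 0.5 ** omega1 - (k - m) * 0.5 ** omega2 + (k - m)
    delta = (sum(v ** omega1 for v in X[: m - 1])
             - sum(v ** omega2 for v in X[m - 1 :]) + c3) / c4
    # Partial derivatives of Delta with respect to X_1, ..., X_{k-1} (X_k = 1 is fixed).
    grad = [omega1 * X[i] ** (omega1 - 1) / c4 if i < m - 1
            else -omega2 * X[i] ** (omega2 - 1) / c4
            for i in range(k - 1)]
    var = sum(grad[i] * grad[j] * (X[min(i, j)] - X[i] * X[j])
              for i in range(k - 1) for j in range(k - 1)) / n
    if var <= 0:
        raise ValueError("degenerate sample: estimated variance is zero")
    z = (delta - t_o) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return delta, var, z, p_value


delta, var, z, p = alphabeta_z_test([100, 100, 100, 100, 100], t_o=0.55, omega1=2, omega2=2)
print(round(delta, 3), round(z, 2), round(p, 4))
```

The guard on a zero variance matters in practice: when all observations fall in one category, every $X_j$ is 0 or 1 and the estimated variance collapses.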
As an alternative to asymptotic inference, bootstrap procedures simulate the distribution of the test statistic by computing it in a large number of samples drawn from a distribution in the null space. The null space (5) typically contains many distributions, and one way of ensuring a successful implementation of the bootstrap is to ensure that sampling follows the two golden rules of Davidson (2007). The first rule requires that the data generating process underlying the bootstrap samples belong to the model underlying the null hypothesis $H_o$. The second rule requires that this data generating process be obtained under the null hypothesis by an efficient estimation procedure.
Because the Maximum Likelihood (ML) estimator is generally efficient, it is of particular relevance, where possible, to use an ML procedure to select the underlying model of the null hypothesis and to generate the bootstrap samples from this data generating process. This method, known as the parametric bootstrap, ensures that both golden rules of bootstrap inference are satisfied (Davidson 2007).
Associate with the estimation sample $S$ the vector of responses $(n_1, \ldots, n_k)$, and associate a probability mass function $f$ with the cumulative distribution $F$. The ML estimator $\tilde f$ of the data generating process underlying the null hypothesis of interest is obtained by maximizing the sample likelihood (1) over the null space $E(t_o, \Lambda_m; \omega_1, \omega_2)$; that is,
$$\tilde f := \arg\max_{f \,:\, F \in E(t_o, \Lambda_m; \omega_1, \omega_2)} P(S \mid f).$$
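The constrained maximization can be sketched numerically. The code below is an illustration under assumptions of our own, not the authors' routine: it uses SciPy's general-purpose SLSQP solver, imposes the median restriction through the weak inequalities $F_{m-1} \le 1/2 \le F_m$ (the closure of $\Lambda_m$), and maximizes the per-observation log-likelihood, which has the same maximizer as (1). All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def _constants(k, m, omega1, omega2):
    c3 = k + 1 - m
    c4 = (m - 1) * 0.5 ** omega1 - (k - m) * 0.5 ** omega2 + (k - m)
    return c3, c4


def alphabeta_from_pmf(f, m, omega1, omega2):
    """Delta(F; m, omega1, omega2), where F is the cumulative sum of the PMF f."""
    F = np.cumsum(f)
    c3, c4 = _constants(len(f), m, omega1, omega2)
    return (np.sum(F[: m - 1] ** omega1) - np.sum(F[m - 1 :] ** omega2) + c3) / c4


def alphabeta_grad_pmf(f, m, omega1, omega2):
    """Gradient of alphabeta_from_pmf with respect to f."""
    F = np.cumsum(f)
    _, c4 = _constants(len(f), m, omega1, omega2)
    below = np.arange(len(f)) < m - 1
    dF = np.where(below, omega1 * F ** (omega1 - 1), -omega2 * F ** (omega2 - 1)) / c4
    # f_j enters F_j, ..., F_k, so dDelta/df_j is a reverse cumulative sum of dDelta/dF_i.
    return np.cumsum(dF[::-1])[::-1]


def null_ml_estimate(counts, t_o, m, omega1, omega2, eps=1e-9):
    """ML estimate of the PMF over the null space E(t_o, Lambda_m; omega1, omega2)."""
    x = np.asarray(counts, dtype=float)
    x = x / x.sum()
    k = len(x)

    def neg_loglik(f):
        # Minus the per-observation log-likelihood (multinomial constant dropped).
        return -np.sum(x * np.log(np.maximum(f, eps)))

    def neg_loglik_grad(f):
        return -x / np.maximum(f, eps)

    constraints = [
        {"type": "eq", "fun": lambda f: np.sum(f) - 1.0, "jac": lambda f: np.ones(k)},
        {"type": "eq",
         "fun": lambda f: alphabeta_from_pmf(f, m, omega1, omega2) - t_o,
         "jac": lambda f: alphabeta_grad_pmf(f, m, omega1, omega2)},
        # Median restriction, imposed weakly: F_m >= 1/2 (and F_{m-1} <= 1/2 below).
        {"type": "ineq", "fun": lambda f: np.sum(f[:m]) - 0.5},
    ]
    if m > 1:
        constraints.append({"type": "ineq", "fun": lambda f: 0.5 - np.sum(f[: m - 1])})
    start = np.clip(x, 0.01, None)
    start = start / start.sum()
    result = minimize(neg_loglik, start, jac=neg_loglik_grad, method="SLSQP",
                      bounds=[(eps, 1.0)] * k, constraints=constraints,
                      options={"ftol": 1e-9, "maxiter": 1000})
    if not result.success:
        raise RuntimeError(f"constrained ML failed: {result.message}")
    return result.x


f_tilde = null_ml_estimate([10, 20, 30, 20, 10], t_o=0.5, m=3, omega1=2, omega2=2)
print(np.round(f_tilde, 4), round(alphabeta_from_pmf(f_tilde, 3, 2, 2), 4))
```

When $t_o$ already equals the sample level of inequality, the constraint is slack at the sample proportions and the estimate coincides with the unrestricted ML estimate $x$.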
From here on, the ML estimate $\tilde f$ of the null data generating process is used to generate $B$ bootstrap samples $\tilde S_b$, $b = 1, \ldots, B$, resulting in empirical distributions $\tilde X_1, \ldots, \tilde X_B$ and a sequence of test statistics
$$\tilde z_b := \frac{\Delta(\tilde X_b; m, \omega_1, \omega_2) - \Delta(X; m, \omega_1, \omega_2)}{V_{BOOT}^{1/2}}$$
Here, $V_{BOOT}$ is the empirical variance of the $B$ bootstrap values $\Delta(\tilde X_b; m, \omega_1, \omega_2)$ of the inequality index. The sample quantiles of the $B$ bootstrap statistics $\tilde z_b$ are used instead of the quantiles of the standard normal distribution to provide critical values for hypothesis tests on the level of inequality in the underlying population.
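A minimal NumPy sketch of the bootstrap step, assuming the null estimate $\tilde f$ has already been computed and is passed in as `f_null`. Two simplifications are our own assumptions for illustration: each replicate is centred on $t_o = \Delta(\tilde F; m, \omega_1, \omega_2)$, the inequality level of the null DGP, and the observed statistic is Studentized with the same $V_{BOOT}$ rather than with an analytical variance. The median category $m$ is held fixed, in line with the conditional test above, and the p-value is the share of bootstrap statistics at least as large in absolute value as the observed one. Function names are hypothetical.

```python
import numpy as np


def alphabeta_rows(counts, m, omega1, omega2):
    """Delta(X; m, omega1, omega2) for each row of a 2-D array of counts (or PMFs)."""
    counts = np.atleast_2d(np.asarray(counts, dtype=float))
    X = np.cumsum(counts, axis=1) / counts.sum(axis=1, keepdims=True)
    k = X.shape[1]
    c3 = k + 1 - m
    c4 = (m - 1) * 0.5 ** omega1 - (k - m) * 0.5 ** omega2 + (k - m)
    below = (X[:, : m - 1] ** omega1).sum(axis=1)
    above = (X[:, m - 1 :] ** omega2).sum(axis=1)
    return (below - above + c3) / c4


def parametric_bootstrap_test(counts, f_null, m, omega1, omega2, B=999, seed=0):
    """Two-sided parametric bootstrap test of H_o: Delta = t_o, with t_o = Delta(f_null)."""
    rng = np.random.default_rng(seed)
    n = int(np.sum(counts))
    f_null = np.asarray(f_null, dtype=float)
    f_null = f_null / f_null.sum()
    t_o = alphabeta_rows(f_null, m, omega1, omega2)[0]  # inequality of the null DGP
    # B bootstrap samples of size n from the null DGP, one count vector per row.
    boot_delta = alphabeta_rows(rng.multinomial(n, f_null, size=B), m, omega1, omega2)
    v_boot = boot_delta.var(ddof=1)                # V_BOOT
    z_boot = (boot_delta - t_o) / np.sqrt(v_boot)  # bootstrap statistics
    z_obs = (alphabeta_rows(counts, m, omega1, omega2)[0] - t_o) / np.sqrt(v_boot)
    p_value = float(np.mean(np.abs(z_boot) >= abs(z_obs)))
    return z_obs, p_value


z_obs, p_value = parametric_bootstrap_test([10, 20, 30, 20, 10], [1, 1, 1, 1, 1], m=3,
                                           omega1=2, omega2=2)
print(round(z_obs, 2), p_value)
```

The default $B = 999$ matches the number of bootstrap samples used in the Monte Carlo experiments.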

4. Methods

Consider testing a null hypothesis of the generic form (4) using each of the asymptotic and bootstrap tests discussed in the previous section. Call these two testing procedures $\tau_A$ and $\tau_B$, respectively. In this section we discuss how to use Monte Carlo simulation methods to evaluate the size and power properties of the two tests.
In order to obtain a good understanding of the comparative size and power properties of the two statistical tests, our interest in the Monte Carlo experiments is to investigate, for each of the two procedures, the effect of varying the parameters $F$, $\omega_1$, $\omega_2$ and $t_o$ of the generic test (4) for different sample sizes. Our interest in examining the effect of sample size is to establish which of the two testing procedures should be recommended in applied work involving small samples.
We explore changing the data generating process $F$ by varying its parameters, namely the number of response categories $k$, the median response category $m$,[6] and the underlying level of inequality $t$. In varying these parameters of the DGP, we are particularly interested in the effect of severe class imbalances. DGPs exhibiting severe class imbalances are of interest because they allow the researcher to explore the effect of sampling near the boundary of the parameter space of the underlying multinomial population, where we expect the normal approximation to the distribution of the asymptotic test statistic to be less accurate.
Varying $t_o$ in the generic test (4) allows us (i) to investigate test size (by setting $t_o = t$, where $t$ is the level of inequality associated with the DGP $F$), and (ii) to investigate statistical power (by setting $t_o$ to be different from $t$). We also explore variations in the null hypothesis (4) by varying the inequality aversion parameters $\omega_1$ and $\omega_2$ of the underlying inequality index.
To provide a unified view of the scope of the envisaged simulation exercises, it is useful to consider a general Monte Carlo procedure as a method for studying the size and power properties of a given test $\tau$ (in this paper $\tau \in \{\tau_A, \tau_B\}$), given a prescribed null hypothesis $H_o$, a data generating process $F$ and a sample size $n$. A Monte Carlo function $M(\tau, H_o, F, n)$ is then an algorithmic procedure that estimates, via simulation, the distribution of the resulting test statistic (and hence of its p-values) in a variety of contexts, such as those discussed above.
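As an illustration of $M(\tau, H_o, F, n)$ for the asymptotic test $\tau_A$, the NumPy sketch below draws $R$ multinomial samples of size $n$ from a DGP given by its PMF, computes a delta-method z statistic for each sample (holding $m$ fixed), and returns the two-sided p-values. The rejection frequency at a nominal level then estimates the actual size when $t_o = t$, or power when $t_o \ne t$. The variance formula and function names are our own illustrative assumptions; the default $R = 5000$ follows the number of Monte Carlo samples used in the paper.

```python
import math

import numpy as np


def asymptotic_p_values(counts, m, omega1, omega2, t_o):
    """Two-sided asymptotic p-values of H_o: Delta = t_o for each row of counts."""
    counts = np.atleast_2d(np.asarray(counts, dtype=float))
    n = counts.sum(axis=1)
    X = np.cumsum(counts, axis=1) / n[:, None]
    k = X.shape[1]
    c3 = k + 1 - m
    c4 = (m - 1) * 0.5 ** omega1 - (k - m) * 0.5 ** omega2 + (k - m)
    delta = ((X[:, : m - 1] ** omega1).sum(axis=1)
             - (X[:, m - 1 :] ** omega2).sum(axis=1) + c3) / c4
    # Delta-method gradient with respect to X_1, ..., X_{k-1} (X_k = 1 is fixed).
    Xf = X[:, : k - 1]
    below_median = np.arange(k - 1) < m - 1
    grad = np.where(below_median, omega1 * Xf ** (omega1 - 1),
                    -omega2 * Xf ** (omega2 - 1)) / c4
    # EDF covariance; X is non-decreasing, so X_min(i,j) = min(X_i, X_j).
    cov = np.minimum(Xf[:, :, None], Xf[:, None, :]) - Xf[:, :, None] * Xf[:, None, :]
    var = np.einsum("ri,rij,rj->r", grad, cov, grad) / n
    z = (delta - t_o) / np.sqrt(var)
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])


def monte_carlo_p_values(f, n, t_o, m, omega1, omega2, R=5000, seed=0):
    """Simulated p-values of the asymptotic test for R samples of size n drawn from PMF f."""
    rng = np.random.default_rng(seed)
    samples = rng.multinomial(n, f, size=R)
    return asymptotic_p_values(samples, m, omega1, omega2, t_o)


# Baseline design: uniform PMF, k = 5, m = 3, (omega1, omega2) = (2, 2), n = 499, t_o = t.
p = monte_carlo_p_values([0.2] * 5, n=499, t_o=0.6, m=3, omega1=2, omega2=2)
print("empirical size at the 5% nominal level:", np.mean(p <= 0.05))
```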
The strategy we pursue in the investigations is to define a baseline specification and to explore variations from this benchmark case. The baseline case involves a sample size $n$ of 499 observations and a DGP with $k = 5$ socioeconomic classes, a median socioeconomic status $m = 3$ and a uniform probability mass function $f_{base} = (1, 1, 1, 1, 1)/5$ (see Table 1).
The inequality aversion parameters are set at $(\omega_1, \omega_2) = (2, 2)$, and the resulting level of inequality of the baseline DGP is 0.60. The number $B$ of bootstrap samples is set to 999 throughout, while the number of Monte Carlo samples is set to 5000. Appendix A provides further detail about the DGPs used in the Monte Carlo investigations.
Our chosen method of summarizing the simulation results relies on graphical methods developed by Davidson and MacKinnon (1998). Firstly, we report p-value curves for both tests, which plot the empirical distribution function of the simulated p-values (the actual rejection frequency) against the nominal size. The advantage of these graphical devices is that they allow the researcher to investigate the size of tests globally, not just at key nominal values (such as 1% or 5%). We identify a correctly sized test as one whose p-value curve lies below the 45 degree line. To investigate power, we report size–power curves. The advantage of the size–power curve is that it quantifies statistical power at the actual (i.e., consistently estimated) size rather than at the nominal size. A test procedure $\tau_i$ is more powerful than $\tau_j$ for a particular hypothesis of interest $H_o$ when the size–power curve of $\tau_i$ lies above that of $\tau_j$.
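Both graphical devices can be computed directly from simulated p-values. In the NumPy sketch below (function names are ours), the p-value curve is the empirical distribution function of p-values simulated under a DGP satisfying $H_o$, evaluated on a grid of nominal sizes; the size–power curve pairs, at each nominal size, the actual size from that curve with the rejection frequency under a DGP that violates $H_o$. The exactly uniform and squared p-values in the usage lines are purely illustrative inputs, not simulation output.

```python
import numpy as np


def p_value_curve(p_values, grid):
    """Share of simulated p-values <= each nominal size in `grid` (EDF of p-values)."""
    p_sorted = np.sort(np.asarray(p_values, dtype=float))
    # searchsorted with side="right" counts the p-values <= each grid point.
    return np.searchsorted(p_sorted, grid, side="right") / p_sorted.size


def size_power_curve(p_null, p_alt, grid):
    """(actual size, power) pairs: rejection rates under H_o and under an alternative."""
    return p_value_curve(p_null, grid), p_value_curve(p_alt, grid)


grid = np.linspace(0.0, 0.25, 26)          # nominal sizes from 0% to 25%
p_null = (np.arange(1000) + 0.5) / 1000    # illustrative, exactly uniform p-values
p_alt = p_null ** 2                        # illustrative, stochastically smaller p-values
size, power = size_power_curve(p_null, p_alt, grid)
print(np.all(size <= grid + 1e-12))        # is the p-value curve on or below the 45-degree line?
```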

5. Simulation Results

The focus in this section is on exploring the size and power properties of the asymptotic and bootstrap tests. With the exception of extreme cases of very low and very high inequality, we find that both tests are correctly sized in our investigations, and moreover, both tests have similar power even in the presence of small samples.

5.1. Sample Size

In the baseline specification, we set the sample size at $n = 499$; the other parameters of the DGP are chosen as indicated in Table 1 and Appendix A. In the top panel of Figure 1 we plot p-value curves for both the asymptotic and bootstrap tests, while the bottom panel plots size–power curves. The simulations pertain to sample sizes $n = 49$, $n = 99$, $n = 499$ (the baseline) and $n = 999$. The horizontal axis of the p-value curves reports nominal sizes from 0% to 25%; for the size–power curves, we report power over the same range of actual sizes. For sample sizes of 49, 99 and 499 observations the p-value curves lie below the diagonal (45 degree) line, indicating that the tests are correctly sized. At nominal sizes exceeding 15%, however, the p-value curves cross the diagonal for the largest sample ($n = 999$). As our central concern is the relative performance of the two tests, we note that they perform equally well in terms of their p-value curves.
Turning to the bottom panel where we report size–power curves, we observe that both testing methods also perform broadly similarly in terms of power, and the overall pattern is that power rises with sample size, as is of course expected to be the case.

5.2. The Number of Response Categories

We are interested in exploring how the choice of DGP influences the relative performance of the asymptotic and Studentized bootstrap test. Our first investigation in this respect is to explore the effect of changing the number of response categories k . The baseline specification chooses a uniform PMF defined on k = 5 socioeconomic categories, and we also examine DGPs pertaining to k = 3 , and k = 9 . For all three values of k , the evidence plotted in the top panel of Figure 2 supports the conclusion that both tests are correctly sized.
For a given sample size, the effect of changing the number of response categories on the power of the tests is not immediately obvious. For both tests, the bottom panel of Figure 2 suggests, however, that other things equal, increasing $k$ from 3 to the baseline value of 5 increases statistical power. The gain in power from increasing the number of response categories further to $k = 9$ is, however, less apparent for both tests. In further investigations not reported in the paper, we explored the effect of further increasing $k$ (over the range $k = 2$ to $k = 49$) for different sample sizes. However, no general monotonic relation emerged between the number of response categories and statistical power for either the asymptotic or the Studentized bootstrap test.

5.3. The Median Response Category

By varying the median response category $m$ of the distribution, we can explore some parametric changes in the shape of the underlying DGP. We consider first a PMF $f = (5710, 1073, 1073, 1073, 1073)/10{,}000$ with an associated median state $m = 1$. We then somewhat reduce the mass at the bottom tail of the distribution (i.e., at the bottom socioeconomic status) and consider a PMF $f = (2879, 2879, 1414, 1414, 1414)/10{,}000$ with median state $m = 2$. Finally, we use the baseline DGP to explore the effect of setting the median state at $m = 3$. Note also that the three PMFs exhibit a level of inequality equal to 0.60 (i.e., the baseline level), despite the different shapes of these probability mass functions and their varying median states.
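The statement that the three PMFs share the baseline level of inequality can be checked numerically. The self-contained Python sketch below (function name ours) recomputes $\Delta$ with $(\omega_1, \omega_2) = (2, 2)$ at each PMF's own median category, taking the median to be the smallest $j$ with $X_j \ge 1/2$ (our assumed convention); the two non-uniform PMFs agree with 0.60 to three decimal places.

```python
from itertools import accumulate


def alphabeta_with_median(counts, omega1=2, omega2=2):
    """Return (m, Delta(X; m, omega1, omega2)) with m the median category of counts."""
    total = sum(counts)
    X = [c / total for c in accumulate(counts)]
    k = len(X)
    m = next(j for j, v in enumerate(X, start=1) if v >= 0.5)
    c3 = k + 1 - m
    c4 = (m - 1) * 0.5 ** omega1 - (k - m) * 0.5 ** omega2 + (k - m)
    value = (sum(v ** omega1 for v in X[: m - 1])
             - sum(v ** omega2 for v in X[m - 1 :]) + c3) / c4
    return m, value


pmfs = [
    [5710, 1073, 1073, 1073, 1073],  # median state 1
    [2879, 2879, 1414, 1414, 1414],  # median state 2
    [1, 1, 1, 1, 1],                 # baseline, median state 3
]
for counts in pmfs:
    m, value = alphabeta_with_median(counts)
    print(m, round(value, 3))
```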
For all three values of the median state m , the evidence plotted in the top panel of Figure 3 supports the conclusion that both tests are correctly sized. It is also the case that their p-value curves broadly overlap. Turning to the size–power curves, the findings of the bottom panel of Figure 3 suggest that upon varying the median response category, both tests perform equally well in terms of statistical power.

5.4. Inequality Aversion Parameters

We expect the asymptotic test to perform well for a linear inequality index (i.e., when both inequality aversion parameters are set equal to one). When either inequality aversion parameter exceeds unity, the index is non-linear, and simulation methods can in practice provide more accurate estimates of the distribution of test statistics than analytical methods that rely on the delta method (Davison and Hinkley 1997, p. 16). Our interest here is therefore to investigate the practical advantage of adopting the Studentized bootstrap test when the inequality aversion parameters are allowed to vary.
Specifically, in Figure 4 we begin by setting the inequality aversion parameters to $(\omega_1, \omega_2) = (1, 1)$. Since the resulting inequality index is linear, it is a member of the Kobus–Miłoś family (2) of subgroup decomposable indices. We then examine the baseline parameter values $(\omega_1, \omega_2) = (2, 2)$, followed by $(\omega_1, \omega_2) = (4, 4)$. For all three investigations, the evidence plotted in the top panel of Figure 4 supports the conclusion that both tests are correctly sized, and the p-value curves specific to each investigation broadly overlap. Turning to the size–power curves, the bottom panel of Figure 4 suggests that the two tests exhibit broadly similar power for each pair of parameter values, and furthermore that the power of each test is sensitive to the specification of the inequality index.
In the investigations of Figure 5, we explore the effect of attaching different weights to the bottom and top tails of the distribution by setting $\omega_1$ and $\omega_2$ at different values: a larger value of $\omega_1$ renders the inequality index more sensitive to the bottom tail of the distribution, while increasing $\omega_2$ makes the index more sensitive to the top tail. We depict the p-value and size–power curves of the two tests for the following sequence of inequality aversion parameters: $(\omega_1, \omega_2) = (1, 4)$, $(4, 1)$ and $(8, 1)$. For all three investigations, the evidence in the top panel of Figure 5 supports the conclusion that both tests are moderately oversized, while the p-value curves of the two tests in each specific investigation broadly overlap. In the bottom panel, there is little to differentiate the asymptotic and Studentized bootstrap tests in terms of size–power curves, and, as in the findings of Figure 4, we are led to conclude that the power of each test is sensitive to the choice of inequality aversion parameters.

5.5. Severe Class Imbalances

To investigate the effect of severe class imbalances, our strategy is to generate a sequence of distributions that are ordered by the Allison–Foster (AF) relation. That is, we consider a sequence of PMFs $f_1, f_2, f_3, f_4, f_5$, where for $i = 1, 2, 3, 4$, $f_{i+1}$ is obtained from $f_i$ through a number of median preserving spreads. In increasing order, the levels of inequality $t$ associated with the distributions in the sequence are 0.002, 0.3827, 0.444, 0.60 and 0.999. Note, however, that the two polar cases of extreme equality and inequality are associated with two PMFs that exhibit severe class imbalances: $f_1 := (1, 1, 1000, 1, 1)/1004$ and $f_5 := (1000, 1, 1, 1, 1000)/2003$. On the other hand, $f_4$ coincides with the PMF of the baseline model, $f_4 := (1, 1, 1, 1, 1)/5$, whose classes are perfectly balanced. The two other PMFs in the sequence are $f_2 = (0, 0, 6, 2, 1)/9$ and $f_3 = (1, 2, 3, 2, 1)/9$. Both $f_2$ and $f_3$ share a common median ($m = 3$); $f_2$, however, exhibits a less balanced class structure than $f_3$ and furthermore lies on the boundary of the parameter space of multinomial distributions.
We also consider a second sequence of PMFs ordered by the median preserving spread relation, $f_1, f_2, g_3, f_4, f_5$, where $g_3$ replaces $f_3$ and the level of inequality at $g_3$ equals 0.5278. This new PMF, $g_3 = (0, 0, 3, 2, 1)/6$, is chosen to lie at the boundary between the subsets $\Lambda_3$ and $\Lambda_4$ of distributions with median states equal to 3 and 4, respectively.[7]
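The levels of inequality along the two sequences can be recomputed from the PMFs with $(\omega_1, \omega_2) = (2, 2)$. The self-contained Python sketch below (function name ours, median taken as the smallest $j$ with $X_j \ge 1/2$ by assumption) prints each level and checks that inequality rises along both sequences, as median preserving spreads should imply; it is a check under our reconstruction of equation (3), not the authors' code.

```python
from itertools import accumulate


def alphabeta(counts, omega1=2, omega2=2):
    """Delta(X; m, omega1, omega2), with m the smallest j such that X_j >= 1/2."""
    total = sum(counts)
    X = [c / total for c in accumulate(counts)]
    k = len(X)
    m = next(j for j, v in enumerate(X, start=1) if v >= 0.5)
    c3 = k + 1 - m
    c4 = (m - 1) * 0.5 ** omega1 - (k - m) * 0.5 ** omega2 + (k - m)
    return (sum(v ** omega1 for v in X[: m - 1])
            - sum(v ** omega2 for v in X[m - 1 :]) + c3) / c4


# PMFs written as integer weights (each is divided by its own total).
pmfs = {
    "f1": [1, 1, 1000, 1, 1],
    "f2": [0, 0, 6, 2, 1],
    "f3": [1, 2, 3, 2, 1],
    "g3": [0, 0, 3, 2, 1],
    "f4": [1, 1, 1, 1, 1],
    "f5": [1000, 1, 1, 1, 1000],
}
levels = {name: alphabeta(w) for name, w in pmfs.items()}
for name, level in levels.items():
    print(name, round(level, 4))

# Median preserving spreads should raise inequality along each sequence.
for sequence in (["f1", "f2", "f3", "f4", "f5"], ["f1", "f2", "g3", "f4", "f5"]):
    print(all(levels[a] < levels[b] for a, b in zip(sequence, sequence[1:])))
```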
The top panel of Figure 6 provides p-value plots for the tests associated with (from left to right) the DGPs of $f_1$, $f_2$ and $f_3$. The PMF $f_1$, chosen to exhibit severe class imbalances, is of particular interest, as it generates marked differences between the p-value curves of the asymptotic test and the Studentized bootstrap test. At low significance levels (0% to 5%), the asymptotic test is moderately oversized. At higher significance levels (6% to 25%), however, the asymptotic test is correctly sized. The bootstrap test is oversized at all significance levels, and its size substantially exceeds that of the asymptotic test above the 4% significance level. It is equally important to observe that differences between the p-value curves of the two tests become barely visible for the DGPs associated with the PMFs $f_2$ and $f_3$. The bottom panel of Figure 6 reports the size–power curves of the two tests for the DGPs associated with $f_1$, $f_2$ and $f_3$. In the case of severe class imbalances (the DGP associated with $f_1$) the two tests exhibit very similar power, and the conclusion is the same as inequality rises in the cases of $f_2$ and $f_3$.
Figure 7 similarly plots p-value and size–power curves for the DGPs associated with the remaining PMFs $g_3$, $f_4$ and $f_5$. In terms of test size and power, the plots in the top and bottom panels reveal that the asymptotic and bootstrap tests perform very similarly for $g_3$ and $f_4$. Recalling that $f_4$ is the uniform PMF of the baseline model, we do not find in this investigation that severe class balances lead to oversized tests, comparatively low power, or overall differences in the p-value or size–power curves of the two tests.
As discussed above, the probability mass function $f_5$ exhibits the polar case of high inequality ($t = 0.999$) and is associated with severe class imbalances. The findings for the DGP associated with $f_5$ are qualitatively similar to those for the DGP associated with $f_1$, which exhibits the opposite polar case of low inequality. At low significance levels (0% to 4%), the asymptotic test is moderately oversized, whereas at higher significance levels (5% to 25%) it is correctly sized. The bootstrap test remains correctly sized at significance levels from 0% to 6% but becomes increasingly oversized at all levels above 7%. Nonetheless, we do not find any substantial differences between the size–power curves of the two tests in the bottom panel of Figure 7. What we can document, however, is that statistical power is considerably higher in the extreme cases of severe class imbalances than in the intermediate cases of the two sequences of PMFs, for either test.

6. An Illustrative Example

As an illustrative example, we return to the data application in Abul Naga and Stapenhurst (2015) concerning asymptotic inference for the Alphabeta family of inequality indices (3). These are data on five ordered nutritional health states from the Egyptian Integrated Household Survey of 1997–1999.[8] The data refer to two statistical areas of Northern Egypt (also known as Lower Egypt), namely Metropolitan Lower Egypt (METLO) and Non-Metropolitan Lower Egypt (NMETLO). The resulting cumulative distributions are $X_1 = (0.075, 0.187, 0.430, 0.812, 1.00)$ for the METLO data ($n_1 = 107$) and $X_2 = (0.040, 0.144, 0.363, 0.667, 1.00)$ for the NMETLO data ($n_2 = 452$). Note also that the median response category is $m = 4$ in both distributions. The data from the larger sample exhibit somewhat more class imbalance than the first sample, though neither sample exhibits the levels of class imbalance found to be problematic in the simulations of Section 5.
We begin by carrying out the same exercise as in Section 5 for the empirical distribution functions of the data. Using the baseline values $(\omega_1, \omega_2) = (2, 2)$ of the inequality aversion parameters, inequality is calculated at 0.376 and 0.474 in the Metropolitan (METLO) and Non-Metropolitan (NMETLO) samples, respectively. We next investigate the hypothesis that inequality in a given sample, computed at parameter values $(\omega_1, \omega_2) = (2, 2)$, is equal to 0.456 (the level of inequality of the pooled sample, computed at the same parameter values). On the basis of the simulations reported in Section 5, we expect the asymptotic and bootstrap tests to exhibit very similar p-value curves as well as size–power curves. Inspecting the related curves in Figure 8, we find that this pattern does indeed emerge. It is worth noting, however, that the power of the test is considerably higher in the smaller METLO sample. This pattern is to be expected, as the hypothesized value $T = 0.456$ is considerably closer to the level of inequality in the larger Non-Metropolitan sample ($T_2 = 0.474$) than to that in Metropolitan Lower Egypt ($T_1 = 0.376$).
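The point estimates can be reproduced approximately from the rounded cumulative distributions reported above. The Python sketch below (function name ours) takes $X_1$ and $X_2$ as given, forms the pooled distribution as the sample-size-weighted average of the two EDFs (our construction of the combined sample), and evaluates $\Delta$ at the median category $m = 4$. Because the inputs are rounded to three decimals, results can differ from the reported figures in the third decimal.

```python
def alphabeta_from_cdf(X, omega1, omega2):
    """Delta(X; m, omega1, omega2) from a cumulative distribution X (X[-1] == 1)."""
    k = len(X)
    m = next(j for j, v in enumerate(X, start=1) if v >= 0.5)  # median category
    c3 = k + 1 - m
    c4 = (m - 1) * 0.5 ** omega1 - (k - m) * 0.5 ** omega2 + (k - m)
    below = sum(v ** omega1 for v in X[: m - 1])
    above = sum(v ** omega2 for v in X[m - 1 :])
    return (below - above + c3) / c4


metlo = [0.075, 0.187, 0.430, 0.812, 1.00]    # n_1 = 107
nmetlo = [0.040, 0.144, 0.363, 0.667, 1.00]   # n_2 = 452
pooled = [(107 * a + 452 * b) / 559 for a, b in zip(metlo, nmetlo)]

for omegas in [(1, 1), (2, 2)]:
    print(omegas, [round(alphabeta_from_cdf(X, *omegas), 3) for X in (metlo, nmetlo, pooled)])
```

Since the index is linear in $X$ when $\omega_1 = \omega_2 = 1$, equal inequality in the two samples at those parameter values implies the same value for the pooled sample.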
We report these inequality computations for parameter values $(\omega_1, \omega_2) = (2, 2)$ in Table 2, together with computations of inequality in the two samples for other pairs of parameter values. Rows 4 and 5 of the table furthermore report p-values from the bootstrap test of the hypothesis that each sample has the same level of inequality $T$ as the combined sample. For parameter values $(\omega_1, \omega_2) = (1, 1)$, both samples exhibit identical levels of inequality (0.440), as does the combined sample since the index is then linear in $X$, and hence the p-values of both tests are equal to 1. For the other pairs of parameter values, the bootstrap test likewise fails to reject at the 5% level the hypothesis that either of the two populations has a level of inequality equal to that of the combined sample. Nonetheless, for parameter values $(\omega_1, \omega_2) = (2, 2)$, the test does reject at the 10% level the hypothesis that METLO has the same level of inequality $T = 0.456$ as the combined sample (last column of the table).
To highlight one practical difference between the two tests, rows 6 to 9 of Table 2 report 95% confidence intervals for inequality in the two samples, using both bootstrap inference (the rows labelled $T_i^{BCI}$) and asymptotic inference (the rows labelled $T_i^{ACI}$).⁹ These intervals consist of all the levels of inequality that would not be rejected at the 5% level. While the asymptotic confidence intervals are by construction symmetric about the sample value $T_i$ of inequality, this symmetry need not arise for the bootstrap confidence intervals. We find here that the bootstrap confidence intervals are generally wider than the asymptotic ones. These findings, obtained with sample sizes of roughly 100 to 500 observations, suggest that the finite-sample distribution of the Studentized alphabeta inequality index may exhibit thicker tails than the standard normal distribution.
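Because the asymptotic intervals are symmetric, the standard errors they imply can be backed out of Table 2 directly. The sketch below assumes each asymptotic interval has the form $T_i \pm z_{0.975}\,\widehat{se}_i$ and uses the rounded table entries for $(\omega_1, \omega_2) = (2, 2)$. The implied asymptotic p-values for $H_0: T_i = 0.456$ come out at about 0.056 (METLO) and 0.29 (NMETLO); given the rounding of the inputs they are approximations only, to be set beside the bootstrap p-values of 0.06 and 0.34 reported in the table.

```python
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96
T_POOLED = 0.456                 # pooled-sample inequality at (omega1, omega2) = (2, 2)

# (sample, T_i, asymptotic 95% interval) from Table 2, column (2, 2)
rows = [("METLO", 0.376, (0.294, 0.458)), ("NMETLO", 0.474, (0.440, 0.507))]

results = {}
for name, t_i, (lo, hi) in rows:
    se = (hi - lo) / (2 * Z)                    # half-width divided by the critical value
    z = (t_i - T_POOLED) / se                   # Studentized distance from H0
    p = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided asymptotic p-value
    results[name] = p
    print(f"{name}: midpoint={(lo + hi) / 2:.4f} se={se:.4f} z={z:.2f} p={p:.3f}")
```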

7. Conclusions

We have used Monte Carlo experiments to compare the empirical size and statistical power of asymptotic inference and the Studentized bootstrap test. In all our investigations, the asymptotic and bootstrap tests exhibit very similar size–power curves. With the exception of the extreme cases of very low and very high inequality, we also find that both tests are correctly sized. The experiments pertaining to extremely low and extremely high levels of inequality (0.002 and 0.999, respectively) present cases of severe class imbalance in the underlying data generating process (DGP). In these two investigations, the asymptotic test remains correctly sized at all nominal test sizes between 0% and 25%, whereas the bootstrap test becomes increasingly oversized at all nominal levels from 6% upwards. Nonetheless, both tests remain correctly sized under other cases of severe class imbalance, and in other cases where the DGP lies on the boundary of the parameter space of multinomial distributions. Pending further investigations of this nature, and given the numerical cost of implementing the bootstrap test, our broad recommendation to applied researchers is to adopt asymptotic inference for inequality indices defined on ordered response data. That is, ordered response data appear to differ in this respect from income data, for which asymptotic inference has often been documented to produce incorrectly sized tests.
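The empirical size of a test is the frequency with which it rejects a true null hypothesis across samples drawn from the DGP. A minimal sketch for the asymptotic test under the baseline specification of Table 1 ($n = 499$, $k = 5$, uniform PMF, $m = 3$, $(\omega_1, \omega_2) = (2, 2)$, $t_0 = 0.60$) is given below. The index is computed in an assumed normalised form, $\sum_{i<m} P_i^{\omega_1} - \sum_{i=m}^{k-1} P_i^{\omega_2} + (k-m)$ divided by the same expression evaluated at $P_i = 1/2$; the delta-method standard error, the fixed (known) median $m$, the number of replications and the function names are likewise illustrative assumptions, and the sketch does not reproduce the bootstrap arm or the size–power curves of the paper.

```python
import math

import numpy as np


def index_and_se(cdf, n, m, w1=2.0, w2=2.0):
    """Assumed normalised index and delta-method standard error (median m known)."""
    k = len(cdf)
    P = np.asarray(cdf, dtype=float)[:-1]            # P_k = 1 carries no information
    below = np.arange(1, k) < m                      # categories i < m
    denom = (m - 1) * 0.5**w1 - (k - m) * 0.5**w2 + (k - m)
    num = np.sum(P[below] ** w1) - np.sum(P[~below] ** w2) + (k - m)
    grad = np.where(below, w1 * P ** (w1 - 1), -w2 * P ** (w2 - 1)) / denom
    cov = (np.minimum.outer(P, P) - np.outer(P, P)) / n
    return num / denom, math.sqrt(max(float(grad @ cov @ grad), 0.0))


def empirical_size(f, t0, m, n=499, alpha=0.05, reps=2000, seed=0):
    """Share of simulated samples in which the asymptotic test rejects H0: T = t0."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        counts = rng.multinomial(n, f)               # one sample from the DGP
        t_hat, se_hat = index_and_se(np.cumsum(counts) / n, n, m)
        z = (t_hat - t0) / se_hat
        p = math.erfc(abs(z) / math.sqrt(2.0))       # two-sided normal p-value
        rejections += p <= alpha
    return rejections / reps


f_base = np.full(5, 0.2)                             # baseline PMF of Table 1
print(empirical_size(f_base, t0=0.60, m=3))          # close to 0.05 if correctly sized
```

Repeating the calculation over a grid of nominal levels traces out the kind of p-value curve discussed above, and drawing samples from a DGP whose inequality differs from $t_0$ turns the same loop into a power calculation.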
These conclusions have been reached in a sampling context where the population median, which determines the functional form of the inequality index, was assumed to be known. It is therefore important in further investigations to develop a framework for exploring the performance of the two tests in sampling contexts where the median of the distribution is treated as a random quantity.¹⁰

Author Contributions

Conceptualization, R.A.N., C.S. and G.Y.; methodology, R.A.N., C.S. and G.Y.; software, R.A.N., C.S. and G.Y.; formal analysis, R.A.N., C.S. and G.Y.; writing–original draft preparation, R.A.N., C.S. and G.Y.; writing–review and editing, R.A.N., C.S. and G.Y.; visualization, R.A.N., C.S. and G.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

We are grateful to three anonymous referees, Karim Abadir, Martin Biewen and Emmanuel Flachaire for helpful comments and suggestions. The paper was written in part while Abul Naga was a visitor at Aix-Marseille University; he wishes to thank the Iméra Institute of Advanced Studies and the Aix-Marseille School of Economics for their hospitality.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

In several Monte Carlo investigations, the baseline DGP is used, with associated PMF $f_{base} = (1, 1, 1, 1, 1)/5$ and median response category $m = 3$. Other investigations require alternative DGPs, for which we provide further information here.
(i) In the investigations of the effect of changes in sample size, the baseline DGP is used throughout. The sample sizes considered are $n = 49$, $n = 99$, $n = 499$ and $n = 999$.
(ii) In the investigations of changes in the number of response categories, the DGP for $k = 3$ has PMF $f = (3, 4, 3)/10$ (with median response category $m = 2$). For $k = 5$, the PMF of the baseline DGP is adopted. For the final case we examine, $k = 9$, the PMF is $f = (1601, 1103, 811, 665, 1640, 665, 811, 1103, 1601)/10{,}000$ (with median response category $m = 5$).
(iii) In the investigations of changes in the median category, the DGP for $m = 1$ has PMF $f = (5710, 1073, 1073, 1073, 1073)/10{,}000$. For $m = 2$, the PMF is $f = (2879, 2879, 1414, 1414, 1414)/10{,}000$. For the final case we examine, $m = 3$, the baseline DGP is chosen. We recall that all three DGPs share an identical level of inequality of 0.60.
(iv) In the investigations of changes in the inequality aversion parameters, the baseline DGP is used throughout. The inequality aversion parameters examined are the following ordered pairs $(\omega_1, \omega_2)$: $(1, 1)$, $(2, 2)$, $(4, 4)$, $(1, 4)$, $(4, 1)$, $(8, 1)$.
(v) In the investigations of changes in the level of inequality, the DGP for the level of inequality $t_0 = 0.002$ has PMF $f_1 = (1, 1, 1000, 1, 1)/1004$. The DGP for $t_0 = 0.3827$ has PMF $f_2 = (0, 0, 6, 2, 1)/9$. The DGP for $t_0 = 0.4444$ has PMF $f_3 = (1, 2, 3, 2, 1)/9$. The DGP for $t_0 = 0.5278$ has PMF $g_3 = (0, 0, 3, 2, 1)/6$. For $t_0 = 0.60$, we adopt the baseline DGP; that is, we set $f_4 := f_{base}$. The DGP for $t_0 = 0.999$ has PMF $f_5 = (1000, 1, 1, 1, 1000)/2003$.
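The median categories and levels of inequality quoted for these DGPs can be checked numerically. The sketch below takes the median category to be the smallest $m$ with $P_m \ge 0.5$ (see footnote 3) and evaluates, at $(\omega_1, \omega_2) = (2, 2)$, an assumed normalised form of the index: $\sum_{i<m} P_i^{\omega_1} - \sum_{i=m}^{k-1} P_i^{\omega_2} + (k-m)$ divided by the same expression evaluated at $P_i = 1/2$. This form is our reconstruction, not a formula displayed in this appendix. Under it, the DGPs reproduce the stated levels 0.3827, 0.4444, 0.5278, 0.60 and 0.999, and the DGPs of items (ii) and (iii) give 0.60 to two decimals, whereas $f_1$ gives about 0.003 rather than 0.002. The hypothetical helper `af_more_egalitarian` implements the median preserving spread ordering of footnote 4 and confirms the remark of footnote 7 that $g_3$ and $f_3$ are not comparable.

```python
import numpy as np


def cdf(weights):
    """CDF (P_1, ..., P_k) from unnormalised category weights."""
    w = np.asarray(weights, dtype=float)
    return np.cumsum(w) / w.sum()


def median_category(P):
    """Smallest 1-based category m with P_m >= 0.5."""
    return int(np.searchsorted(P, 0.5, side="left")) + 1


def alphabeta_index(P, w1=2.0, w2=2.0):
    """Assumed normalised index; the denominator is the numerator at P_i = 1/2."""
    k, m = len(P), median_category(P)

    def numerator(Q):
        # categories 1..m-1 enter with w1, categories m..k-1 with w2; P_k = 1 is dropped
        return np.sum(Q[: m - 1] ** w1) - np.sum(Q[m - 1 : k - 1] ** w2) + (k - m)

    return float(numerator(P) / numerator(np.full(k, 0.5)))


def af_more_egalitarian(X, Y):
    """True if X is at least as egalitarian as Y in the median preserving spread order."""
    m = median_category(X)
    if m != median_category(Y):
        return False
    below = np.arange(1, len(X) + 1) < m
    return bool(np.all(X[below] <= Y[below]) and np.all(X[~below] >= Y[~below]))


dgps = {
    "f1": [1, 1, 1000, 1, 1],
    "f2": [0, 0, 6, 2, 1],
    "f3": [1, 2, 3, 2, 1],
    "g3": [0, 0, 3, 2, 1],
    "f_base": [1, 1, 1, 1, 1],
    "f5": [1000, 1, 1, 1, 1000],
    "m=1": [5710, 1073, 1073, 1073, 1073],
    "m=2": [2879, 2879, 1414, 1414, 1414],
    "k=3": [3, 4, 3],
    "k=9": [1601, 1103, 811, 665, 1640, 665, 811, 1103, 1601],
}
levels = {name: alphabeta_index(cdf(w)) for name, w in dgps.items()}
for name, w in dgps.items():
    print(name, median_category(cdf(w)), round(levels[name], 4))
print(af_more_egalitarian(cdf(dgps["f3"]), cdf(dgps["g3"])),
      af_more_egalitarian(cdf(dgps["g3"]), cdf(dgps["f3"])))
```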

References

  1. Abul Naga, Ramses, and Christopher Stapenhurst. 2015. Estimation of inequality indices of the cumulative distribution function. Economics Letters 130: 109–12.
  2. Abul Naga, Ramses, and Tarik Yalcin. 2008. Inequality measurement for ordered response health data. Journal of Health Economics 27: 1614–25.
  3. Allison, Andrew, and James Foster. 2004. Measuring health inequality using qualitative data. Journal of Health Economics 23: 505–24.
  4. Arrighi, Yves, Mohammad Abu-Zaineh, and Bruno Ventelou. 2015. To count or not to count deaths: Reranking effects in health distribution evaluation. Health Economics 24: 193–205.
  5. Athreya, K. 1987. Bootstrap of the mean in the infinite variance case. The Annals of Statistics 15: 724–31.
  6. Barrett, Garry F., Stephen G. Donald, and Debopam Bhattacharya. 2014. Consistent nonparametric tests for Lorenz dominance. Journal of Business and Economic Statistics 32: 1–13.
  7. Cowell, Frank, and Emmanuel Flachaire. 2015. Statistical methods for distributional analysis. In Handbook of Income Distribution. Edited by A. Atkinson and F. Bourguignon. Amsterdam: Elsevier, pp. 359–465.
  8. Davidson, Russell. 2007. Bootstrapping econometric models. Quantile 3: 13–36.
  9. Davidson, Russell, and Jean-Yves Duclos. 2013. Testing for restricted stochastic dominance. Econometric Reviews 32: 84–125.
  10. Davidson, Russell, and James G. MacKinnon. 1998. Graphical methods for investigating the size and power of hypothesis tests. Manchester School 66: 1–26.
  11. Davison, A. C., and David Hinkley. 1997. Bootstrap Methods and Their Application. Cambridge: Cambridge University Press.
  12. Dutta, Indranil, and James Foster. 2013. Inequality of happiness in the United States 1972–2010. Review of Income and Wealth 59: 393–415.
  13. Horowitz, J. 2001. The bootstrap. In Handbook of Econometrics. Edited by J. Heckman and E. Leamer. Amsterdam: Elsevier.
  14. Jones, Andrew M., Nigel Rice, Silvana Robone, and Pedro Rosa Dias. 2011. Inequality and polarization in health systems responsiveness: A cross-country analysis. Journal of Health Economics 30: 616–25.
  15. Kobus, Martyna, and Piotr Milos. 2012. Inequality decomposition by population subgroups for ordinal data. Journal of Health Economics 31: 15–21.
  16. Madden, David. 2010. Ordinal and cardinal measures of health inequality: An empirical comparison. Health Economics 19: 243–50.
  17. Davidson, Russell, and Emmanuel Flachaire. 2007. Asymptotic and bootstrap inference for inequality and poverty measures. Journal of Econometrics 141: 141–66.
  18. Stapenhurst, Christopher. 2017. Testing for Median Preserving Spreads of the Multinomial Distribution. Master’s thesis, University of Edinburgh, Edinburgh, UK.
  19. Yalonetzky, Gaston. 2013. Stochastic dominance with ordinal variables: Conditions and a test. Econometric Reviews 32: 126–63.
1
Note that here the parameter space is a space of probability distributions.
2
Severe class imbalances occur when the share of probability mass allocated to the various probability states is unevenly distributed, and far from the uniform case presenting perfectly balanced classes.
3
Set $X_0 = 0$, $X_k = 1$ and $X = (X_1, \ldots, X_k)$. The median socio-economic status category associated with $X$, $\mathrm{med}(X) = m \in \{1, \ldots, k\}$, is an index such that $X_{m-1} \le 0.5$ and $X_m \ge 0.5$. We note that the median status is uniquely defined in samples where $n$ is an odd number, and accordingly we work with odd sample sizes throughout our simulation exercises.
4
Consider two empirical distributions $X, Y \in \mathcal{D}$. The median preserving spreads partial ordering $(\mathcal{D}, \preceq_{AF})$ ranks $X$ as more egalitarian than $Y$, written $X \preceq_{AF} Y$, if three conditions are satisfied: (1) $\mathrm{med}(X) = \mathrm{med}(Y) = m$, (2) $X_i \le Y_i$ for all $i < m$, and (3) $X_i \ge Y_i$ for all $i \ge m$.
5
See Abul Naga and Stapenhurst (2015) for a derivation of a consistent estimator of the asymptotic variance of the inequality index.
6
Observe from (2) and (3) that the functional form of the inequality index changes with the median socio-economic status $m$. In this sense, introducing variations in the parameter $m$ can equally be interpreted as an exercise exploring changes in the functional form of the inequality index in the null hypothesis being investigated.
7
We note, however, that the new PMF $g_3$ is not comparable with $f_3$ in terms of median preserving spreads. The central feature defining the two sequences is therefore that, within a sequence, inequality rises according to any inequality index that is increasing in median preserving spreads.
8
The health states in ascending order (from state 1 to state 5 ) are the following: type-III obese, type-II obese, type-I obese, overweight and not overweight.
9
In the context of the bootstrap test, the confidence intervals are obtained by finding the hypothesized values $t_0$ that produce p-values equal to 0.025 and 0.975.
10
See Stapenhurst (2017) for further discussion.
Figure 1. The effect of sample size.
Figure 2. The number of response categories.
Figure 3. The median response category.
Figure 4. The effect of inequality aversion parameters.
Figure 5. The effect of inequality aversion parameters (further results).
Figure 6. The effect of severe class imbalances.
Figure 7. The effect of severe class imbalances (further results).
Figure 8. Inequality in nutritional health in Lower Egypt.
Table 1. Baseline specification.
Parameter | Notation | Baseline value
Sample size | $n$ | 499
Number of response categories | $k$ | 5
Probability mass function | $f = (f_1, \ldots, f_k)$ | $f_{base} = (1/5, \ldots, 1/5)$
Median response category | $m$ | 3
Inequality aversion parameters | $(\omega_1, \omega_2)$ | $(2, 2)$
Level of inequality (size of test) | $t_0$ | 0.60
Level of inequality (power of test) | $t_0$ | $\pm 0.02$
Table 2. Bootstrap inference for Inequality in Nutritional Health in Lower Egypt.
$(\omega_1, \omega_2)$ | $(1, 1)$ | $(1, 2)$ | $(2, 1)$ | $(2, 2)$
$T_1$ | 0.440 | 0.458 | 0.330 | 0.376
$T_2$ | 0.440 | 0.490 | 0.390 | 0.474
$T$ | 0.440 | 0.485 | 0.378 | 0.456
$p(T_1 = T)$ | 1 | 0.52 | 0.28 | 0.06
$p(T_2 = T)$ | 1 | 0.78 | 0.46 | 0.34
$T_1^{BCI}$ | (0.353; 0.545) | (0.373; 0.553) | (0.243; 0.437) | (0.287; 0.470)
$T_1^{ACI}$ | (0.357; 0.521) | (0.380; 0.536) | (0.250; 0.410) | (0.294; 0.458)
$T_2^{BCI}$ | (0.402; 0.481) | (0.453; 0.527) | (0.352; 0.426) | (0.432; 0.507)
$T_2^{ACI}$ | (0.405; 0.475) | (0.459; 0.521) | (0.357; 0.423) | (0.440; 0.507)
Notes: (1) $T_1$ denotes inequality in Metropolitan Lower Egypt ($n_1 = 107$); $T_2$ denotes inequality in Non-Metropolitan Lower Egypt ($n_2 = 452$); and $T$ denotes inequality in the pooled sample. (2) $p(T_i = T)$ denotes the p-value associated with the null hypothesis $H_0: T_i = T$. (3) ACI and BCI denote, respectively, asymptotic and bootstrap 95% confidence intervals.