Parametric Inference for Index Functionals

Abstract: In this paper, we study the finite sample accuracy of confidence intervals for index functionals built via parametric bootstrap, in the case of inequality indices. To estimate the parameters of the assumed parametric data generating distribution, we propose a Generalized Method of Moments estimator that targets the quantity of interest, namely the considered inequality index. Its primary advantage is that the scale parameter does not need to be estimated to perform parametric bootstrap, since inequality measures are scale invariant. The very good finite sample coverages found in a simulation study suggest that this feature provides an advantage over the parametric bootstrap using the maximum likelihood estimator. We also find that, overall, a parametric bootstrap provides more accurate inference than its non- or semi-parametric counterparts, especially for heavy-tailed income distributions.


Introduction
In this paper, we consider the problem of inference for an index functional $T$, i.e., a quantity of interest that can be written as a function of the data generating model. Given a sample $x_i$, $i = 1, \ldots, n$, and an associated distribution $F$ such that one can assume that $X_i \sim F$, $i = 1, \ldots, n$, we are interested in computing confidence intervals or performing hypothesis tests for $T(F)$. There exist many different approaches for this purpose, based on either $T(F^{(n)})$ or $T(F_\theta)$, where $F^{(n)}$ is the empirical distribution (leading to a nonparametric approach) and $F_\theta$, $\theta \in \Theta \subset \mathbb{R}^p$, is a parametric model for which $\theta$ needs to be estimated from the sample (leading to a parametric approach).
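As a leading example, we consider $T$ to be an inequality index and $F$ an income distribution. Inequality indices are welfare indices which can be very generally written in the following quasi-additively decomposable form (see Victoria-Feser (2002, 2003) for the original formal setting)
$$T(F) = \int \phi\left(x, \mu(F)\right) dF(x), \tag{1}$$
where $\mu(F) = \int x \, dF(x)$ is the mean of $F$ and $\phi$ is piecewise differentiable in $\mathbb{R}$. The generalized entropy family of inequality indices given by
$$I^{\xi}_{GE}(F) = \frac{1}{\xi^2 - \xi}\left[\int \left(\frac{x}{\mu(F)}\right)^{\xi} dF(x) - 1\right] \tag{2}$$
is obtained by setting $\phi(x, \mu(F)) = \frac{1}{\xi^2 - \xi}\left[\left(x/\mu(F)\right)^{\xi} - 1\right]$. For example, the (limiting) cases $\xi = 0$ and $\xi = 1$ are given by
$$I^{0}_{GE}(F) = -\int \log\left(\frac{x}{\mu(F)}\right) dF(x), \tag{3}$$
$$I^{1}_{GE}(F) = \int \frac{x}{\mu(F)} \log\left(\frac{x}{\mu(F)}\right) dF(x), \tag{4}$$
with $I^{0}_{GE}(F)$ being the Mean Logarithmic Deviation (see Cowell and Flachaire 2015) and $I^{1}_{GE}(F)$ being the Theil index. A notable exception to the class in (1) is the Gini coefficient, which can be expressed in several forms, such as
$$I_{Gini}(F) = 1 - 2\int_0^1 \frac{C(F; q)}{\mu(F)} \, dq, \tag{5}$$
with $C(F; q) = \int^{F^{-1}(q)} x \, dF(x)$, the cumulative income functional. Inference on $T(F)$ can be done in several manners:
1. The (nonparametric) bootstrap is a distribution-free approach that allows one to derive the sampling distribution of $T(F^{(n)})$, from which quantiles (for confidence intervals) and the variance (for testing) can be estimated; for applications to inequality indices, see e.g., Mills and Zandvakili (1997) and Biewen (2002).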

2. Another distribution-free approach consists in deriving the asymptotic variance of the index using the Influence Function (IF) of Hampel (1974) (see also Hampel et al. 1986), as is done in Cowell and Victoria-Feser (2003) (for different types of data features such as censoring and truncation), and estimating it directly from the sample (see also Victoria-Feser 1999; Cowell and Flachaire 2015).

3. A parametric (and asymptotic) approach, given a chosen parametric model $F_\theta$ for the data generating model, consists in first consistently estimating $\theta$, say by $\hat{\theta}$, then considering its asymptotic properties such as its variance $\mathrm{var}(\hat{\theta})$, and deriving the corresponding asymptotic variance of $T(F_{\hat{\theta}})$ using e.g., the delta method (based on a first order Taylor series expansion).

4. A parametric (finite sample) approach, given a chosen parametric model $F_\theta$ for the data generating model, consists in first consistently estimating $\theta$, say by $\hat{\theta}$, then using parametric bootstrap to derive the sampling distribution of $T(F_{\hat{\theta}})$, from which quantiles (for confidence intervals) and the variance (for testing) can be estimated.

5. Refinements and combinations of these approaches.
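To make Approach 1 concrete, the following minimal Python sketch computes sample versions of the Mean Logarithmic Deviation, the Theil index and the Gini coefficient, and wraps them in a nonparametric percentile bootstrap. The function names, the lognormal toy sample, the seed and the number of replicates are illustrative assumptions and are not taken from the studies cited above.

```python
import numpy as np


def mld(x):
    """Mean Logarithmic Deviation (generalized entropy index with xi = 0)."""
    x = np.asarray(x, dtype=float)
    return -np.mean(np.log(x / x.mean()))


def theil(x):
    """Theil index (generalized entropy index with xi = 1)."""
    r = np.asarray(x, dtype=float) / np.mean(x)
    return np.mean(r * np.log(r))


def gini(x):
    """Gini coefficient computed from the order statistics of the sample."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1.0) / n


def percentile_bootstrap_ci(x, index, n_boot=999, alpha=0.05, seed=0):
    """Nonparametric percentile interval: resample the data with replacement."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    stats = np.array([index(rng.choice(x, size=x.size, replace=True))
                      for _ in range(n_boot)])
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])


# Illustrative use on a toy lognormal sample (hypothetical data).
incomes = np.random.default_rng(1).lognormal(size=500)
print(theil(incomes), percentile_bootstrap_ci(incomes, theil))
```

The interval is built from resampled data only, which is why its accuracy depends on how well the sample represents the upper tail.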
While most would agree that the fully parametric and asymptotic approach based on the delta method cannot provide as accurate inference as the other methods, it is not clear that avoiding the specification of a parametric model is the way to go. Indeed, for example, Cowell and Flachaire (2015) notice that nonparametric bootstrap inference on inequality indices is sensitive to the exact nature of the upper tail of the income distribution, in that bootstrap inference is expected to perform reasonably well in moderate and large samples, unless the tails are quite heavy. Similar conclusions are also drawn in Davidson and Flachaire (2007); Cowell and Flachaire (2007); Davidson (2009); Davidson (2010) and Davidson (2012). This has for example motivated Schluter and van Garderen (2009) and Schluter (2012), using the results of Hall (1992), to propose normalizing transformations of inequality measures using Edgeworth expansions, to adjust asymptotic Gaussian approximations.
Alternatively, Davidson and Flachaire (2007) and Cowell and Flachaire (2007) consider a semi-parametric bootstrap, where bootstrap samples are generated from a distribution which combines a parametric estimate of the upper tail, namely the Pareto distribution, with a nonparametric estimate of the other part of the distribution. We note that modelling the upper tail with a parametric model is common in instances where not only does the interest lie in the upper tail itself, but the data there are also sparse. For example, in finance, determination of the value at risk or expected shortfall is central to portfolio management, and in insurance, it is important to estimate probabilities associated with given levels of losses. A critical challenge is then to select the threshold from which the upper tail is modelled parametrically (see for example Danielsson et al. 2001; Guillou and Hall 2001; Beirlant et al. 2002; Dupuis and Victoria-Feser 2006 and the references therein). Cowell and Flachaire (2015) propose another type of semi-parametric approach, in which a mixture of lognormal distributions is first fitted and data are then generated from the estimated mixture. A mixture of lognormal distributions to model the data can be thought of as a compromise between fully parametric and nonparametric estimation. The use of mixtures for income distribution estimation can be found for example in Flachaire and Nuñez (2007) and the references in Cowell and Flachaire (2015).
Through a simulation study, Cowell and Flachaire (2015), Table 7, compare the actual coverage probabilities of 95% confidence intervals for the Theil index, using, as data generating models, the lognormal distribution and the Singh-Maddala (SM) distribution (Singh and Maddala 1976), with varying parameters to increase the heaviness of the tail. The different methods cited above are compared. Cowell and Flachaire (2015) conclude that, in the presence of very heavy-tailed distributions, even if significant improvements can be obtained on the fully asymptotic and the standard bootstrap methods, none of the alternative methods provides very good results overall.
Moreover, Cowell and Flachaire (2015) do not consider a parametric bootstrap, and this has motivated the present paper. Namely, we study the behaviour of coverage probabilities associated with the index functional $T(F)$ using a parametric bootstrap based on samples generated from $F_{\hat{\theta}}$ (i.e., Approach 4). A parametric model introduces a form of smoothness into the inferential procedure which can lead to more accurate inference. This is for example a fundamental argument for modelling the upper tail with a Pareto distribution. Specifying a parametric distribution for the data generating process can be considered as an additional risk of introducing "error" in the inferential procedure. With income distributions, common wisdom however suggests that some parametric models are sufficiently flexible to encompass most of the data generating processes observed with real data. For example, the four-parameter generalized beta distribution of the second kind (GB2) proposed by McDonald (1984), which encompasses the generalized gamma, the Singh-Maddala and the Dagum distributions (Dagum 1977) (see also McDonald and Xu 1995), can be considered as sufficiently general to model income data. If this is not the case, then one would wonder whether the apparent lack of flexibility of a general four-parameter model is not due to a few spurious observations, and hence consider a robust estimation approach as proposed and motivated in Cowell and Victoria-Feser (1996) (see also Cowell and Victoria-Feser 2000).
In this paper, as an alternative to the classical Maximum Likelihood Estimator (MLE), we propose a Target Matching Estimator (TME), a member of the class of Generalized Method of Moments (GMM) estimators (Hansen 1982), where one of the "moments" is the targeted inequality index T. It has the advantage that for inference on T, the scale parameter does not need to be estimated (and hence can be set to an arbitrary value), so that the estimation exercise is simpler in that the optimization is performed in a smaller dimension. We derive its asymptotic properties and compare them to the MLE when targeting T(F θ ). As illustrated in a simulation study, it turns out that the finite sample coverage probabilities obtained from a parametric bootstrap based on this alternative estimator are far more accurate than the ones computed with other methods, especially with heavy tailed income distributions.

A Target Matching Estimator
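Recall that we are interested in making inference on an inequality index $T$ and we assume that the sample data are generated from a (sufficiently general) parametric model $F_\theta$, $\theta \in \Theta \subset \mathbb{R}^p$. We let $\nu = (T, S_1, \ldots, S_{q-1})$ be a vector of statistics of length $q$, where the first element is the statistic of interest and the remaining $q - 1$ elements are additional statistics. We denote by $\hat{\nu}$ the sample vector of statistics and by $\nu_n(\theta)$ its expectation at the model $F_\theta$, for a fixed sample size $n$. Assuming that the mapping $\theta \mapsto \nu_n(\theta)$ is bijective, a GMM estimator can be defined as
$$\hat{\theta} = \operatorname*{argmin}_{\theta \in \Theta} \left\| \hat{\nu} - \nu_n(\theta) \right\|^2_{\Omega}, \tag{6}$$
where $\|u\|^2_{\Omega} = u^{\top} \Omega u$ and $\Omega$ is a positive definite $q \times q$ matrix of weights, possibly estimated from the sample (in that case one assumes that it converges to a non-stochastic quantity), used to adjust the statistical efficiency of $\hat{\theta}$. If $\nu_n(\theta)$ cannot be obtained in an analytically tractable form, one can use instead $\nu(\theta) = \lim_{n \to \infty} \nu_n(\theta)$, or alternatively, use Monte Carlo simulations to approximate $\nu_n(\theta)$, leading to a Simulated Method of Moments (SMM) estimator (McFadden 1989) given by
$$\hat{\theta} = \operatorname*{argmin}_{\theta \in \Theta} \left\| \hat{\nu} - \bar{\nu}_n(\theta) \right\|^2_{\Omega}, \qquad \bar{\nu}_n(\theta) = \frac{1}{B} \sum_{b=1}^{B} \hat{\nu}_b(\theta), \tag{7}$$
where $\hat{\nu}_b(\theta)$ is the vector of statistics computed on the $b$-th sample of size $n$ simulated from $F_\theta$. If the number of simulations $B$ is infinite, then the estimators in (6) and (7) are equivalent; otherwise the latter is (asymptotically) less efficient.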
It is computationally advantageous to have an analytic expression for $\nu(\theta)$ and thus to prefer this approximation over the simulation-based approximation $\bar{\nu}_n(\theta)$. However, in finite samples, the bias of $\hat{\theta}$ obtained with $\nu(\theta)$ may be larger than the one resulting from using $\bar{\nu}_n(\theta)$ (see Guerrier et al. 2018). Another approach, considered for example by Arvanitis and Demos (2015), is to directly approximate $\nu_n(\theta)$ with expansions on analytical functions.
Given that the interest here is to make inference about a functional $T$, one also needs to consider a suitable choice for the (additional) statistics in $\nu$. Obviously, one needs to choose a number of statistics at least as large as the number of parameters in the assumed model, i.e., $q \geq p$. If these statistics are sufficient, then $q = p$. Moreover, $T$ may depend on only $q_s < p$ of the elements of $\theta$, in which case estimating the whole of $\theta$ may be an unnecessary burden. Let $\theta = (\theta_s, \theta_c)$, where $\theta_s$, of dimension $q_s \geq 1$, is the vector of parameters that (uniquely) determines $T$, whereas $\theta_c$, of dimension $q_c$, is the vector of "nuisance parameters" that do not influence $T$. Then, instead of solving (6) or (7), we propose to consider a Target Matching Estimator (TME) defined as
$$\hat{\theta}_s = \operatorname*{argmin}_{\theta_s \in \Theta_s} \left\| \hat{\nu}_s - \nu(\theta_s) \right\|^2_{\Sigma}, \tag{8}$$
where $\|u\|^2_{\Sigma} = u^{\top} \Sigma u$. It is known that in a homogeneous system the asymptotic covariance of $\hat{\theta}_s$ is not influenced by the weighting matrix $\Sigma$ (supposedly independent of $\theta$) as long as $\Sigma$ is a positive-definite matrix. Since we consider the case where the statistics and the parameters of interest have the same dimension, i.e., $\dim(\nu) = \dim(\theta_s) = q_s$, taking the identity matrix for $\Sigma$, and assuming that the minimum of the quadratic function is attained in the interior of the parameter space $\Theta_s$, (8) can be equivalently written as
$$\hat{\theta}_s = \operatorname*{argzero}_{\theta_s \in \Theta_s} \; \hat{\nu}_s - \nu(\theta_s).$$
The generalized entropy family of measures and the Gini index are scale invariant, whereas the models $F_\theta$ usually suggested in the literature (Kleiber and Kotz 2003) are parametrised with a scale component. Indeed, let $\delta$, an element of $\theta$, denote the scale parameter; then, by linearity of the expectation, $I^{\xi}_{GE}(F)$ in (2) is invariant to any transformation $\delta X$. The same statement is true for the Gini coefficient. This is not surprising, as scale invariance is one of the required properties of inequality indices. We hence have $(\partial/\partial\delta) T(F_\theta) = 0$, so that $\theta_s$ is $\theta$ without the scale parameter $\delta$.
Note that (∂/∂δ)T(F θ ) = 0 may be useful in situations where the analytical form of T(F θ ) is not available.
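The scale invariance can be verified in one line for the generalized entropy family. The derivation below is a short sketch that assumes the usual form of the index, $I^{\xi}_{GE}(F) = (\xi^2 - \xi)^{-1}\left[E\{(X/\mu(F))^{\xi}\} - 1\right]$, and introduces the symbol $Y = \delta X$, $\delta > 0$, purely for illustration.

```latex
\begin{align*}
\mu(F_Y) &= E[\delta X] = \delta\,\mu(F_X), \\
I^{\xi}_{GE}(F_Y)
  &= \frac{1}{\xi^2-\xi}\left[E\left\{\left(\frac{\delta X}{\delta\,\mu(F_X)}\right)^{\xi}\right\} - 1\right]
   = \frac{1}{\xi^2-\xi}\left[E\left\{\left(\frac{X}{\mu(F_X)}\right)^{\xi}\right\} - 1\right]
   = I^{\xi}_{GE}(F_X).
\end{align*}
```

Since the right-hand side is free of $\delta$, its derivative with respect to $\delta$ vanishes, which is the property used to drop the scale parameter from $\theta_s$.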
For scale-invariant inequality measures $T$, any statistic of the form given in (9) is also scale-invariant. This is also true with a logarithmic transformation, as in (10).
Finally, for the choice of $F_\theta$, one can consider the GB2 (see Section 4), which is sufficiently general to encompass real data situations with income data (Bandourian et al. 2002). Alternatively, as suggested for example in Cowell and Flachaire (2015), one can also consider the SM distribution.
In the simulation Section 4 we propose suitable statistics ν that are used in (8). Given these statistics ν and an assumed data generating model F θ , inference about T, using the parametric bootstrap, is obtained using Algorithm 1.

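Algorithm 1: TME-percentile confidence interval
Input: A given function $\nu_s$; its sample version $\hat{\nu}_s$; number of iterations $B$; a confidence level $1 - \alpha$.
Output: An interval $\left[H_{(n)}^{-1}(\alpha/2),\, H_{(n)}^{-1}(1 - \alpha/2)\right]$, where $H_{(n)}$ is the empirical distribution function of $T$, with realizations $T_1, \ldots, T_B$.
1. Compute $\hat{\theta}_s = \operatorname*{argmin}_{\theta_s} \left\| \hat{\nu}_s - \nu(\theta_s) \right\|^2$.
2. Fix $\theta_c$ to an arbitrary value in $\Theta_c$.
3. For $b = 1, \ldots, B$, simulate a sample of size $n$ from $F_{(\hat{\theta}_s, \theta_c)}$ and compute $T_b$ on it.
4. Compute the percentiles $H_{(n)}^{-1}(\alpha/2)$ and $H_{(n)}^{-1}(1 - \alpha/2)$.
Note that if the simulation-based approximation $\bar{\nu}_n(\theta_s)$ is used instead of $\nu(\theta_s)$ in (8), the last step of the optimization leading to $\hat{\theta}_s$ readily delivers $(T_1, \ldots, T_B)$.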
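The steps of Algorithm 1 can be sketched in Python for the Singh-Maddala model with $\theta_s = (a, q)$ and $\theta_c = b$. This is only an illustration under several assumptions that are not taken from the text: the second statistic is taken to be the variance of log-incomes (a scale-invariant choice standing in for $U_2$), the population Theil index is obtained from the GB2 moments with $p = 1$, SciPy's `least_squares` is used as the solver, and the helper names (`nu_hat`, `nu_sm`, `rsm`, `fit_tme`, `tme_percentile_ci`), the value $q = 1.7$, the seeds and the number of replicates are all hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.special import digamma, gammaln, polygamma


def theil(x):
    """Sample Theil index."""
    r = x / x.mean()
    return np.mean(r * np.log(r))


def nu_hat(x):
    """Sample statistics: Theil index and variance of log-incomes (assumed U_2)."""
    return np.array([theil(x), np.var(np.log(x))])


def nu_sm(a, q):
    """Population counterparts of nu_hat under the Singh-Maddala; needs q > 1/a."""
    c = 1.0 / a
    t = (c * (digamma(1 + c) - digamma(q - c))
         - gammaln(1 + c) - gammaln(q - c) + gammaln(q))
    u2 = c**2 * (polygamma(1, 1.0) + polygamma(1, q))
    return np.array([t, u2])


def rsm(n, a, q, rng, b=1.0):
    """Inverse-CDF draws from the Singh-Maddala with scale b."""
    u = rng.uniform(size=n)
    return b * ((1.0 - u) ** (-1.0 / q) - 1.0) ** (1.0 / a)


def fit_tme(target, start=(0.5, 0.0)):
    """Step 1: solve nu_sm(a, q) = target, with a = exp(z0), q = 1/a + exp(z1)."""
    def resid(z):
        a = np.exp(z[0])
        return nu_sm(a, 1.0 / a + np.exp(z[1])) - target
    z = least_squares(resid, start, xtol=1e-12, ftol=1e-12, gtol=1e-12).x
    a = np.exp(z[0])
    return a, 1.0 / a + np.exp(z[1])


def tme_percentile_ci(x, n_boot=500, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    a, q = fit_tme(nu_hat(x))                      # step 1: the scale is never estimated
    # step 2: the scale is fixed at the arbitrary value b = 1 (default of rsm)
    t_star = np.array([theil(rsm(x.size, a, q, rng))
                       for _ in range(n_boot)])    # step 3: parametric bootstrap
    return np.quantile(t_star, [alpha / 2, 1 - alpha / 2])  # step 4: percentiles


# Illustrative data with b = 0.193, although b plays no role in the procedure.
x = rsm(500, 2.8, 1.7, np.random.default_rng(42), b=0.193)
print(tme_percentile_ci(x))
```

The reparametrisation inside `fit_tme` is a design choice of this sketch: it keeps $q > 1/a$, so that the mean, and hence the Theil index, exists at every iterate.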

Asymptotic Properties
We now look at the asymptotic distribution of the TME in (8). Since $\theta_c$ is fixed but $\theta_s$ is estimated by matching some statistics $\nu$, a crucial question is whether $\hat{\theta}_s$ is more efficient than, say, $\hat{\theta}_s^{MLE}$, the estimator that we would have obtained by the MLE on the whole vector $\theta$. In order to answer this question, consider a setting in which the regularity conditions for the MLE $\hat{\theta}^{MLE}$ to be root-$n$ consistent are met. In this case, letting $I$ denote the Fisher information matrix evaluated at the point $\theta_0 \in \Theta$, we have
$$n^{1/2}\left(\hat{\theta}^{MLE} - \theta_0\right) \xrightarrow{d} N\left(0, I^{-1}\right).$$
This setting is clearly not the weakest possible in theory for our analysis and may be further relaxed. We do not attempt to pursue the weakest possible conditions to avoid overly technical treatments in establishing the theoretical result given in this section.
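Theorem 1. Let $\Theta_s \subset \mathbb{R}^{q_s}$ be compact. Suppose that the point $\theta_{s0}$ is in the interior of $\Theta_s$. Suppose that $\nu(\theta_{s0})$ is the expectation of $\hat{\nu}_s$ when $n$ is large. If $n^{1/2}\left(\hat{\nu}_s - \nu(\theta_{s0})\right)$ satisfies a central limit theorem with covariance matrix $\Xi$, the mapping $\theta_s \mapsto \nu(\theta_s)$ is bijective and continuously differentiable in an open neighborhood of the point $\theta_{s0} \in \Theta_s$, and the derivative $\dot{\nu}$ is nonsingular at the point $\theta_{s0}$, then
$$n^{1/2}\left(\hat{\theta}_s - \theta_{s0}\right) \xrightarrow{d} N\left(0, \left[\dot{\nu}(\theta_{s0})^{\top}\right]^{-1} \Xi \, \dot{\nu}(\theta_{s0})^{-1}\right).$$
The proof is provided in Appendix A.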
Compared to the MLE, the additional condition that the statistics $\hat{\nu}_s$ satisfy a central limit theorem is mild and generally met in practice for sample moments and the inequality indices considered here. The results on the delta method and the continuous mapping theorem of Phillips (2012) may be employed to refine Theorem 1 to the case where the known function $\nu$ is replaced by its simulation-based approximation $\bar{\nu}_n$.
The asymptotic covariance matrix of $\hat{\theta}_s$, given in Theorem 1 by $\left[\dot{\nu}(\theta_{s0})^{\top}\right]^{-1} \Xi \, \dot{\nu}(\theta_{s0})^{-1}$, depends on the inverse of the derivative of the expectation of the statistics with respect to $\theta_s$ and on the asymptotic covariance matrix of the statistics. The choice of statistics should then be guided by their sensitivity to $\theta_s$ and their variability at the model. The same argument is found in Heggland and Frigessi (2004).
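If the statistics $\nu$ are sufficient, then the asymptotic covariance matrix of $\hat{\theta}_s$ is equivalent to the asymptotic covariance matrix of the MLE conditionally on $\hat{\theta}_c^{MLE}$ being fixed. From the properties of the normal distribution, we have asymptotically that $\mathrm{var}\left(n^{1/2}\hat{\theta}_s^{MLE} \mid \hat{\theta}_c^{MLE}\right) = V_{ss}$, where $V_{ss} = [I^{-1}]_{ss} - [I^{-1}]_{sc} \left\{[I^{-1}]_{cc}\right\}^{-1} [I^{-1}]_{cs}$, $[I^{-1}]_{ss}$ denotes the block of $I^{-1}$ corresponding to $\theta_s$, $[I^{-1}]_{cc}$ the block corresponding to $\theta_c$, and $[I^{-1}]_{sc}$ the covariances between $\hat{\theta}_s^{MLE}$ and $\hat{\theta}_c^{MLE}$. Thus, the estimator $\hat{\theta}_s$ obtained from (8) has a smaller variance than the unconditional MLE, the difference being the positive semi-definite matrix $[I^{-1}]_{sc} \left\{[I^{-1}]_{cc}\right\}^{-1} [I^{-1}]_{cs}$. In particular, this gain could be substantial if $\hat{\theta}_s^{MLE}$ and $\hat{\theta}_c^{MLE}$ are strongly correlated. On the other hand, the gain would be null if $\hat{\theta}_s^{MLE}$ and $\hat{\theta}_c^{MLE}$ are independent, as their covariances are then $[I^{-1}]_{sc} = [I^{-1}]_{cs} = 0$.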
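The size of this gain can be illustrated numerically with the standard formula for the conditional covariance of a Gaussian vector. The sketch below uses a hypothetical $2 \times 2$ Fisher information matrix (one parameter of interest, one nuisance parameter); the numbers are made up for illustration and do not correspond to any model in the paper.

```python
import numpy as np

# Hypothetical Fisher information for theta = (theta_s, theta_c), both scalar.
fisher = np.array([[4.0, 1.5],
                   [1.5, 2.0]])

# Asymptotic covariance of the full (unconditional) MLE.
cov = np.linalg.inv(fisher)

# Marginal variance of the MLE of theta_s.
var_s_marginal = cov[0, 0]

# Conditional variance of the MLE of theta_s given the MLE of theta_c
# (Gaussian conditioning: Sigma_ss - Sigma_sc * Sigma_cc^{-1} * Sigma_cs).
var_s_conditional = cov[0, 0] - cov[0, 1] ** 2 / cov[1, 1]

# The gain is non-negative and vanishes when the off-diagonal term is zero.
gain = var_s_marginal - var_s_conditional
print(var_s_marginal, var_s_conditional, gain)
```

By the block-inverse identity, the conditional variance equals the inverse of the $\theta_s$ block of the information matrix, i.e., the variance one would obtain if $\theta_c$ were known.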
Choosing "good" statistics $\hat{\nu}_s$ remains a difficult task: sufficient statistics with appropriate data reduction and with the property of being independent (asymptotically) from $\theta_c$ may be hard to find. Heggland and Frigessi (2004) suggest a graphical procedure based on simulation to find statistics "sensitive enough" to the parameter of interest. In a similar context, Gallant and Tauchen (1996) propose to use the likelihood score function of a model "close" to the one of interest as statistics. In the present context, it could be a probability model parametrised by $\theta_s$ only. There is however no guarantee that such a model exists, and if it does, it might not be unique.

Simulation Study
We consider here two parametric distributions, namely the four-parameter GB2 and the three-parameter SM distributions. For the GB2, we compare the coverage probabilities provided by the parametric bootstrap, using on the one hand the MLE and on the other hand the TME approach presented in Section 2 (using Algorithm 1), to those of the nonparametric bootstrap. Assuming a SM data generating process, we also compare these coverage probabilities to those obtained with a variance stabilizing transform of the index proposed by Schluter (2012) (Varstab), with the semi-parametric approach of Davidson and Flachaire (2007) and Cowell and Flachaire (2007) (Semip), and with mixtures of lognormal distributions used to fit the density as proposed in Cowell and Flachaire (2015) (Mixture).
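The GB2 has density function
$$f(x; a, b, p, q) = \frac{a \, x^{ap-1}}{b^{ap} \, B(p, q) \left[1 + (x/b)^{a}\right]^{p+q}}, \quad x > 0,$$
where $B$ is the beta function, $b$ is the scale parameter, and $a$, $p$ and $q$ are shape parameters. Note that here we consider $a$ to be positive; the distribution of the inverse may be obtained by allowing $a$ to be negative (McDonald and Xu 1995). Suppose we are interested in the Theil index defined in (4). The population index, with $\theta = (a, b, p, q)$, is given by
$$I^{1}_{GE}(F_\theta) = \frac{1}{a}\left[\psi\left(p + \frac{1}{a}\right) - \psi\left(q - \frac{1}{a}\right)\right] - \log \frac{\Gamma\left(p + \frac{1}{a}\right) \Gamma\left(q - \frac{1}{a}\right)}{\Gamma(p)\,\Gamma(q)},$$
where $\Gamma$ is the gamma function and $\psi$ is the digamma function. Clearly the Theil index is scale invariant, so that we set $\theta_s = (a, p, q)$ and $\theta_c = b$.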
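As a sanity check on the scale invariance of the Theil index under the GB2, the sketch below evaluates the population index from the GB2 moments and simulates GB2 data through the ratio-of-gammas representation $X = b\,(G_1/G_2)^{1/a}$, with $G_1 \sim \mathrm{Gamma}(p)$ and $G_2 \sim \mathrm{Gamma}(q)$. The function names `theil_gb2` and `rgb2`, the seeds and the sample size are illustrative assumptions, and the closed form requires $q > 1/a$ so that the mean exists.

```python
import numpy as np
from scipy.special import digamma, gammaln


def theil_gb2(a, p, q):
    """Population Theil index of the GB2; free of the scale b, needs q > 1/a."""
    c = 1.0 / a
    return (c * (digamma(p + c) - digamma(q - c))
            - gammaln(p + c) - gammaln(q - c) + gammaln(p) + gammaln(q))


def rgb2(n, a, b, p, q, rng):
    """GB2 draws via b * (G1 / G2)**(1/a), G1 ~ Gamma(p), G2 ~ Gamma(q)."""
    g1 = rng.gamma(shape=p, size=n)
    g2 = rng.gamma(shape=q, size=n)
    return b * (g1 / g2) ** (1.0 / a)


def theil(x):
    """Sample Theil index."""
    r = x / x.mean()
    return np.mean(r * np.log(r))


# Same random draws, two different scales: the sample index does not change.
x_b1 = rgb2(1000, 3.0, 1.0, 3.5, 0.8, np.random.default_rng(0))
x_b10 = rgb2(1000, 3.0, 10.0, 3.5, 0.8, np.random.default_rng(0))
print(theil_gb2(3.0, 3.5, 0.8), theil(x_b1), theil(x_b10))
```

Because the two samples differ only by a multiplicative constant, the two sample indices coincide up to rounding, which is what allows the scale to be fixed arbitrarily in the bootstrap.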
The population values of the statistics $S_k$ in (9) and $U_l$ in (10), for $l = 2, 3$, are available in closed form; those of $U_l$ involve the polygamma function $\psi^{(m)}$, i.e., the $m$-th derivative of the digamma function $\psi$.
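As is done in Cowell and Flachaire (2015), we consider the SM distribution with density
$$f(x; a, b, q) = \frac{a \, q \, x^{a-1}}{b^{a} \left[1 + (x/b)^{a}\right]^{1+q}}, \quad x > 0.$$
The corresponding population statistics $T$, $S_k$ and $U_l$, $l = 2, 3$, follow from the GB2 expressions with $p = 1$ and involve Apéry's constant $\zeta(3)$.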
Under the GB2, for generating the data, we set $\theta_s = (a = 3, p = 3.5, q = 0.8)$, $\theta_c = (b = 10)$ and $n = 250, 500, 1000$. For the TME, we choose the vector of statistics to be $\nu = [T(x), U_2(x), U_3(x)]$, with $T(x)$ the Theil index and $U_j(x)$, $j = 2, 3$, given in (10). We fix the value of the scale parameter to the arbitrary value of one ($b = 1$) in Algorithm 1. We repeat the experiment $10^4$ times and set the number of bootstrap replicates to $B = 10^3$.
To solve for $\hat{\theta}_s$ in (8) or for the MLE, we use the classical quasi-Newton optimization algorithm with starting values obtained from the differential evolution heuristic (Storn and Price 1997), in order to mimic a real situation in which the true parameter values are unknown.
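A two-stage optimization of this kind can be sketched as follows for the Singh-Maddala MLE. The sketch uses SciPy's `differential_evolution` and `minimize(method="BFGS")` as stand-ins; the implementation actually used in the study is not specified here, and the log-parametrisation, the search bounds, the seeds and the simulated data are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize


def sm_negloglik(z, x):
    """Negative log-likelihood of the Singh-Maddala, with z = log(a, b, q)."""
    a, b, q = np.exp(z)
    y = np.log(x / b)
    # log f(x) = log a + log q + (a - 1) log x - a log b - (1 + q) log(1 + (x/b)^a)
    loglik = (np.log(a) + np.log(q) + (a - 1.0) * np.log(x) - a * np.log(b)
              - (1.0 + q) * np.logaddexp(0.0, a * y))
    return -np.sum(loglik)


def fit_mle(x, seed=0):
    """Global heuristic search for starting values, then quasi-Newton refinement."""
    bounds = [(-3.0, 3.0)] * 3  # assumed search box on the log-parameters
    de = differential_evolution(sm_negloglik, bounds, args=(x,),
                                seed=seed, polish=False)
    res = minimize(sm_negloglik, de.x, args=(x,), method="BFGS")
    return res, de


# Illustrative Singh-Maddala data generated by inverting the distribution function.
u = np.random.default_rng(7).uniform(size=1000)
x = 0.193 * ((1.0 - u) ** (-1.0 / 1.7) - 1.0) ** (1.0 / 2.8)
res, de = fit_mle(x)
print(np.exp(res.x), res.fun, de.fun)
```

Setting `polish=False` keeps the two stages separate, so that the quasi-Newton step is the only local refinement applied to the heuristic's output.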
In Table 1, we report the performances of the three approaches with respect to a nominal confidence level of 95% for the three sample sizes. As already shown in the literature (see e.g., Cowell and Flachaire 2015), we find poor performance for the nonparametric bootstrap (Boot), far from the nominal confidence level. The parametric bootstrap using the MLE provides reasonable finite sample coverages that are nevertheless conservative. On the other hand, the performance of the parametric bootstrap using the TME is overall satisfactory, with enhanced performance when the sample size increases.
Table 1. Finite sample coverage probability with respect to a nominal confidence level (two-sided) of 95% for the Theil index. Data are simulated under the GB2 with $\theta_s = (a = 3, p = 3.5, q = 0.8)$ and $\theta_c = (b = 10)$.
In Table 2, we replicate the simulation study in (Cowell and Flachaire 2015, Table 6.6), and report the values for Varstab, Semip and Mixture. We have $\theta_s = (a = 2.8, q)$, $\theta_c = (b = 0.193)$ and set $\nu = [T(x), U_2(x)]$, with $T(x)$ the Theil index and $U_2(x)$ given in (10). We fix the value of the scale parameter to the arbitrary value of one ($b = 1$) in Algorithm 1. We repeat the experiment $10^4$ times and set the number of bootstrap replicates to $B = 10^3$. The results reported in Table 2 are also presented graphically in Figure 1. Both parametric approaches present finite sample coverage probabilities that are far more accurate than the other approaches, especially in the heavy tail case. As with the GB2, the parametric bootstrap based on the MLE tends to provide conservative coverage probabilities.
Table 2. Data are simulated under the Singh-Maddala with $n = 500$, $\theta_s = (a = 2.8, q)$, $\theta_c = (b = 0.193)$. The parameter $q$ accounts for the shape of the upper tail of the distribution: the smaller $q$, the heavier the tail. $\nu = [T(x), U_2(x)]$ with $T(x)$ the Theil index. In Algorithm 1, $b = 1$. The experiment is repeated $10^4$ times and $B = 10^3$.
Figure 1. Finite sample coverage probabilities under the GB2 (a) (see Table 1) and the Singh-Maddala (b) (see Table 2). Each color represents a different method. The shaded area around each line is the 99.9% asymptotic confidence interval for a proportion. The black line is the nominal confidence level of 95%.
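The Monte Carlo uncertainty attached to an estimated coverage probability can be sketched as below. The sketch assumes that the asymptotic interval for a proportion is the usual normal-approximation (Wald) interval; the function name `coverage_with_band` and the counts in the example are hypothetical and are not results from the tables.

```python
import math
from statistics import NormalDist


def coverage_with_band(n_covered, n_rep, level=0.999):
    """Empirical coverage and its normal-approximation (Wald) interval."""
    p_hat = n_covered / n_rep
    # Two-sided normal quantile for the requested level.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n_rep)
    return p_hat, max(0.0, p_hat - half), min(1.0, p_hat + half)


# Hypothetical outcome: 9500 of 10^4 simulated intervals contain the true index.
print(coverage_with_band(9500, 10_000))
```

With $10^4$ replications the band is narrow, so an estimated coverage that sits visibly away from 0.95 reflects the method rather than simulation noise.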

Conclusions
In this paper, we study the finite sample accuracy of confidence intervals built via parametric bootstrap. We also propose a GMM estimator, the TME, that targets the quantity of interest, namely the considered inequality index. Its primary advantage is that the scale parameter of the assumed parametric model does not need to be estimated to perform parametric bootstrap, since inequality measures are scale invariant. The theoretical result and the simulation study suggest that this feature provides an advantage over the parametric bootstrap using the MLE and also over other established simulation-based inferential methods.
As noted by an anonymous referee, an important point that has not been directly assessed is the specification robustness, i.e., the properties of the proposed method when the assumed general model is not the exact one. This point deserves more (formal) investigation that we leave for further research.
On the more practical side, although this study is limited to two income distributions and one inequality index, the methodology presented here can be extended to other settings in a relatively straightforward manner. For example, it is possible to extend the TME to include trimmed inequality indices, since it suffices to use the trimmed version of $T$ in $\nu$. If trimming is done for robustness purposes as proposed in Cowell and Victoria-Feser (2003), then the other statistics in $\hat{\nu}$ should also be robust (see also Victoria-Feser 2000). This is the case, for example, with trimmed moments.

Appendix A
Proof of Theorem 1. The estimator can be written as
$$\hat{\theta}_s = \operatorname*{argzero}_{\theta_s \in \Theta_s} g(\theta_s), \qquad g(\theta_s) = \hat{\nu}_s - \nu(\theta_s).$$
The proof results from the central limit theorem on $n^{1/2} g(\theta_{s0})$, the invertibility of the derivative $\dot{\nu}(\theta_{s0})$ and Slutsky's lemma.