Article

Simple Approximations and Interpretation of Pareto Index and Gini Coefficient Using Mean Absolute Deviations and Quantile Functions

Department of Computer Science, Metropolitan College, Boston University, Boston, MA 02215, USA
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Econometrics 2025, 13(3), 30; https://doi.org/10.3390/econometrics13030030
Submission received: 22 June 2025 / Revised: 3 August 2025 / Accepted: 4 August 2025 / Published: 8 August 2025

Abstract

The Pareto distribution has been widely used to model income distribution and inequality. The tail index and the Gini index are typically computed by iteration using Maximum Likelihood and are usually interpreted in terms of the Lorenz curve. We derive an alternative method by considering a truncated Pareto distribution and deriving a simple closed-form approximation for the tail index and the Gini coefficient in terms of the mean absolute deviation and weighted quartile differences. The obtained expressions can be used for any Pareto distribution, even without a finite mean or variance. These expressions are resistant to outliers and have a simple geometric and “economic” interpretation in terms of the quantile function and quartiles. Extensive simulations demonstrate that the proposed approximate values for the tail index and the Gini coefficient are within a few percent relative error of the exact values, even for a moderate number of data points. Our paper offers practical and computationally simple methods to analyze a class of models with Pareto distributions. The proposed methodology can be extended to many other distributions used in econometrics and related fields.

1. Introduction

The Pareto distribution Pareto (1897) is commonly used for modeling income inequality Coelho et al. (2008) due to its heavy-tailed nature, capturing wealth concentration Bennett et al. (2019); Brzezinski (2014); Catalano et al. (2009); Chotikapanich et al. (2007); Pinsky et al. (2024). The tail index α and Gini coefficient G are key parameters Catalano et al. (2009); Gini (1912, 1936). Traditional methods like maximum likelihood Beirlant et al. (1996); Grimshaw (1993); Rytgaard (1990) and the method of moments are sensitive to outliers and can fail when moments do not exist Beirlant et al. (1996); Hill (1975). Recent literature has turned to quantile-based methods for their robustness and applicability to heavy-tailed distributions Brazauskas and Serfling (2007); Pickands (1975). However, existing quantile and moment estimators often involve complex calculations or require multiple quantiles Chotikapanich et al. (2007); Dekkers et al. (1989); Hussain et al. (2018). A comparison of recent methods for parameter estimation for the Pareto distribution is presented in Warsono et al. (2019).
This paper presents a simple method for estimating $\alpha$ and $G$ using only the first quartile $Q_1$, the median $M$, and the third quartile $Q_3$. Our approach uses the mean absolute deviation (MAD) Bloomfield and Steiger (1983) and truncation to estimate and interpret the parameters in terms of the corresponding sub-areas computed as integrals of the underlying quantile function. Our approximations directly yield simple closed-form formulas relating the tail index and the Gini coefficient to quartiles. The proposed MAD-based method is robust (50% breakdown point), making it well suited to heavy-tailed data Hampel (1974) with the outliers often present in income and wealth distribution data Bennett et al. (2019); Brzezinski (2014).
One of our main results is the approximation of the tail index α and the Gini coefficient G using the median and weighted average of the other two quartiles. By considering different quadrature approximation methods, we obtain different weights and a number of approximations to the parameters. For example, in one such approximation, the tail index α is shown to be half the inverse Bowley–Galton skewness coefficient Bowley (1920), and the Gini index G is one-half of the ratio of the quartile difference to the lower quartile difference. These approximations allow us to connect classical quantile statistics Huber (2009) and Pareto parameters. Our approach offers several advantages:
  • Computational simplicity—order statistics readily available from standard software;
  • Applicability to truncated distributions (e.g., income surveys with top-coding);
  • Elegant geometric and economic interpretation via areas under the quantile function and income groups.
This paper is organized as follows. In Section 2, we introduce the Pareto distribution and discuss its quantile function. In Section 3, we derive and interpret the mean absolute deviation for a continuous distribution in terms of the subareas of the quantile function. In Section 4, we apply these ideas to the Pareto distribution. In Section 5, we interpret the indices in terms of the ratio of the appropriate subarea integrals. In Section 6, we provide an economic interpretation of these indices in terms of a truncated Pareto distribution. In Section 7, we consider the truncated Pareto distribution and derive its mean absolute deviation. This will allow us to express the tail index α and the Gini coefficient in terms of quartiles and, at the same time, address the practical issue of dealing with outliers in real data sets. In Section 8, we derive an approximation to the indices using the mean absolute deviation of a truncated distribution. By contrast, in Section 9, we derive several approximations to these indices using the mean of the truncated distribution and several integral quadrature methods. The proposed approximations have simple geometric and economic interpretations. The obtained formulas involve only the median and a weighted average of the other two quartiles. In Section 10, we provide a summary of the proposed approximation formulas. In Section 11, we summarize our proposed estimation formula and compare it to Maximum Likelihood Estimation. In Section 12, we present simulation results to illustrate our approach. These simulations show that the proposed estimators achieve relative errors within a few percent of the true values even for small sample sizes. In Section 13, we illustrate our approximations on several real datasets. Finally, in Section 14, we present some concluding remarks and discuss future directions.
In this paper, we focus on income inequality. Many economists focus on other components of income, such as wages and earnings, or on some utility functions like consumption and leisure Attanasio and Pistaferri (2016). We do not advocate a particular viewpoint on established theories in finance and economics (permanent income hypothesis, lifecycle models, wealth accumulation, and others). Our objective is to show that the proposed formulas for the Pareto distribution allow for the derivation of simple economic interpretations in terms of quantile-based metrics. We emphasize that these approximations do not require full knowledge of the underlying distributions. Since the obtained metrics are available in closed form in terms of simple quantile measures, one can study the effect of changes by examining the differences in quantiles over time. This allows for the application of the proposed methodology to time-varying or dynamic quantile models. Therefore, our methodology and formulas are general and can be used to interpret other models described by the Pareto distribution.

2. Pareto Distribution

Suppose $X$ is distributed according to the Pareto distribution Feller (1956); Johnson and Kotz (1970); Papoulis (1984) over the domain $x \ge \beta$ with scale $\beta > 0$ and shape $\alpha > 0$. Its density $f(x)$ and its cumulative distribution function $F(x)$ are given by
$$f(x) = \frac{\alpha \beta^{\alpha}}{x^{\alpha+1}} \quad \text{and} \quad F(x) = 1 - \left(\frac{\beta}{x}\right)^{\alpha} \tag{1}$$
For this distribution, no moments exist for $\alpha \le 1$ Arnold (2015). The first moment is finite for $\alpha > 1$, and the second moment exists only for $\alpha > 2$. The mean $\mu$ and variance $\sigma^2$ for this distribution are
$$\mu = \frac{\alpha \beta}{\alpha - 1} \ \text{ for } \alpha > 1 \quad \text{and} \quad \sigma^2 = \frac{\alpha \beta^2}{(\alpha - 1)^2 (\alpha - 2)} \ \text{ for } \alpha > 2 \tag{2}$$
Of practical interest is the case α > 1 . For example, many datasets have α in the range 1 < α < 2  Dallas (1976). Pareto density f ( x ) and cumulative distribution function F ( x ) are illustrated in Figure 1.
The Pareto distribution is right-skewed, with more data points on the lower end Dallas (1976); MacGillivray (1986). In economic terms, this can be interpreted as stating that lower-income people have a disproportionately lower share of the total wealth.
Let $Q(p)$ be the quantile function defined by $Q(p) = \inf\{x : F(x) \ge p\}$ with $p \in [0, 1]$. For the Pareto distribution, the quantile function is Gilchrist (2000)
$$Q(p) = \beta\,(1 - p)^{-1/\alpha} \tag{3}$$
and therefore, for the quartiles $Q_1$, $Q_2 = M$, and $Q_3$ we have
$$Q_1 = \beta\,(4/3)^{1/\alpha}, \quad M = \beta\, 2^{1/\alpha} \quad \text{and} \quad Q_3 = \beta\, 4^{1/\alpha} \tag{4}$$
Since $\alpha > 1$ and $0 \le p < 1$, for the first and second derivatives of $Q(p)$, we have
$$Q'(p) = \frac{Q(p)}{\alpha (1 - p)} > 0 \quad \text{and} \quad Q''(p) = \frac{(\alpha + 1)\, Q(p)}{\alpha^2 (1 - p)^2} > 0 \tag{5}$$
and therefore, the quantile function $Q(p)$ is an increasing convex function with a minimum $Q(0) = \beta$. It follows then that for any $0 \le p_1 < p_2 < 1$ and any $0 < w < 1$ we have $w\,Q(p_1) + (1 - w)\,Q(p_2) > Q\bigl(w p_1 + (1 - w) p_2\bigr)$. This is illustrated in Figure 2.
In Figure 2, we take $p_1 = 1/4$, $p_2 = 3/4$, and $w = 1/2$. The convexity of $Q(p)$ implies that $M < (Q_1 + Q_3)/2$. We may consider other choices for $p_1$, $p_2$, and $w$ and obtain the following inequalities that will be useful later in this paper (recall that $\beta = Q(0)$):
  • $w = 1/3$ with $p_1 = 0$, $p_2 = 3/4$:   $\beta/3 + 2 Q_3/3 > M$
  • $w = 1/2$:   $\beta/2 + M/2 > Q_1$ and $Q_1/2 + Q_3/2 > M$
  • $w = 2/3$ with $p_1 = 0$, $p_2 = 3/4$:   $2\beta/3 + Q_3/3 > Q_1$
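The quantile function and the convexity property above can be checked numerically. The following is a minimal sketch, not code from the paper: the function names `pareto_quantile` and `chord_gap` and the parameter values are illustrative assumptions.

```python
import math

def pareto_quantile(p, alpha, beta):
    """Pareto quantile function Q(p) = beta * (1 - p)**(-1/alpha), for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return beta * (1.0 - p) ** (-1.0 / alpha)

def chord_gap(p1, p2, w, alpha, beta):
    """Chord value minus curve value at w*p1 + (1-w)*p2 (positive when Q is convex)."""
    chord = w * pareto_quantile(p1, alpha, beta) + (1.0 - w) * pareto_quantile(p2, alpha, beta)
    curve = pareto_quantile(w * p1 + (1.0 - w) * p2, alpha, beta)
    return chord - curve

alpha, beta = 1.5, 1.0  # illustrative values, not taken from the paper
q1, m, q3 = (pareto_quantile(p, alpha, beta) for p in (0.25, 0.5, 0.75))
print(f"Q1 = {q1:.4f}, M = {m:.4f}, Q3 = {q3:.4f}")

# w = 1/2 with (p1, p2) = (1/4, 3/4): the median lies below the mid-quartile.
print(chord_gap(0.25, 0.75, 0.5, alpha, beta) > 0)
# w = 1/2 with (p1, p2) = (0, 1/2): Q1 lies below (beta + M) / 2.
print(chord_gap(0.0, 0.5, 0.5, alpha, beta) > 0)
```

Any other admissible choice of `p1`, `p2`, and `w` can be passed to `chord_gap` in the same way.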

3. Mean Absolute Deviation

Let us start by outlining our methodology. We use mean absolute deviations to derive simple expressions for the tail index and the Gini coefficient, as an alternative to the traditional approach based mainly on maximum likelihood estimation. Estimating parameters by minimizing mean absolute deviations is a standard statistical technique. Specifically, we derive mean absolute deviations in terms of integrals of the quantile function. With simple quadrature approximations to these integrals, we express the metrics of interest in terms of simple expressions involving the quartiles. This allows a simple explanation of the results in terms of quantile metrics.
The proposed approximations do not require the full knowledge of the underlying distribution. The main attractiveness of the proposed method is its simplicity—we express our results in terms of the weighted averages of the quartiles.
Let us start with mean absolute deviations. Suppose $X$ is a real-valued random variable with density $f(x)$ and cumulative distribution function $F(x)$. Let $Q(p)$ be the quantile function defined by $Q(p) = \inf\{x : F(x) \ge p\}$ with $p \in [0, 1]$. Let $Q_1$, $Q_2 = M$, and $Q_3$ denote the first quartile, median, and third quartile, respectively.
If $X$ has a finite first moment $\mu$, then for any value $a$ we can define the mean absolute deviation $H(a)$ of $X$ around $a$ by Bloomfield and Steiger (1983)
$$H(a) = \int_{-\infty}^{\infty} |x - a|\, f(x)\, dx = \int_{x \le a} (a - x)\, f(x)\, dx + \int_{x > a} (x - a)\, f(x)\, dx \tag{6}$$
The integral is understood in the Lebesgue–Stieltjes sense Convertito and Cruz-Uribe (2023); Rudin (1976), so the definition applies to both continuous and discrete distributions. If $a = \mu$, then $H(\mu)$ denotes the mean absolute deviation around the mean. If $a = M$, then $H(M)$ denotes the mean absolute deviation around the median $M$.
It is well-known Bloomfield and Steiger (1983); E. Elsayed (2022); Schwertman et al. (1990); Shad (1969) that for any distribution with finite variance, $H(X, M) \le H(X, \mu) \le \sigma$, where $H(X, a)$ denotes the mean absolute deviation of $X$ around $a$. On the other hand, mean absolute deviations require only the existence of the first-order moment. Throughout this paper, we focus on the mean absolute deviation around the median, and we will use the notation $H = H(M)$.
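The ordering of the three deviation measures is easy to see on a small sample. The sketch below is illustrative only: the data values are made up, and `mean_abs_dev` is a hypothetical helper name for the sample analogue of $H(a)$.

```python
import statistics

def mean_abs_dev(data, a):
    """Sample mean absolute deviation of `data` around the point `a`."""
    return sum(abs(x - a) for x in data) / len(data)

# Made-up right-skewed sample (not data from the paper).
data = [1, 2, 2, 3, 4, 7, 15]

median = statistics.median(data)
mean = statistics.mean(data)

h_median = mean_abs_dev(data, median)  # MAD around the median
h_mean = mean_abs_dev(data, mean)      # MAD around the mean
sigma = statistics.pstdev(data)        # population standard deviation

print(f"H(M) = {h_median:.3f}, H(mu) = {h_mean:.3f}, sigma = {sigma:.3f}")
print(h_median <= h_mean <= sigma)
```

The median minimizes the mean absolute deviation, which is why the deviation around the median can never exceed the deviation around the mean.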
Mean absolute deviations have been used for a long time, as early as the 18th century by Boscovitch and Laplace Farebrother (1987); Portnoy and Koenker (1997). For a historical survey, see Gorard (2005); Pham-Gia and Hung (2001). There is currently a renewed interest in using the L 1 norm for robust statistical modeling and inference (e.g., Dodge (1987); K. Elsayed (2015); Gorard (2015a, 2015b); Habib (2012); Leys et al. (2013); Yager and Alajilan (2014), to name just a few). A MAD-based alternative to kurtosis based on H ( M ) was recently introduced in Pinsky and Klawansky (2023).
Since the well-known work of Fisher (1920), standard deviation has been used as the primary metric for deviation. The convenience of differentiation and optimization, and the summation of variances for independent variables, has contributed to the widespread use of the standard deviation in estimation and hypothesis testing. On the other hand, using MAD offers a direct measure of deviation and is more resilient to outliers. In computing standard deviation, we square the differences between values and the central point, such as the mean or median. As a result, standard deviation emphasizes larger deviations. The computation of mean absolute deviation offers a direct and easily interpretable measure of deviation.
Usually, one computes mean absolute deviations from the density functions Kenney and Keeping (1962) as shown in Equation (6). It is also possible to derive mean absolute deviations from cumulative distribution functions (CDF) instead of density functions (PDF). For many well-known distributions, the CDFs are often expressed in terms of special functions, and indefinite integrals and sums for these functions are well-known, allowing one to derive expressions for MAD in closed form Pinsky (2025).
Alternatively, it is possible to derive mean absolute deviations in terms of the quantile functions. For many distributions, such as Pareto, quantile functions are available in closed form. Computing mean absolute deviations involves computing some integrals of these quantile functions. Using truncation, we derive the parametric solutions through integration of the quantile function between the quartiles. This ensures that the integrals are finite even for distributions exhibiting heavy tails or infinite moments Bock et al. (2013); Rousseeuw and Leroy (2005).
Our approach eliminates the computational burden of iterative optimization procedures while maintaining accuracy comparable to maximum likelihood estimation. As a result, our approach gives closed-form solutions for parameters that are resistant to outliers and achieves substantial reductions in complexity without sacrificing precision. We have applied this approach to a number of distributions with heavy tails and obtained very good approximations.
Our main idea is the following: for the Pareto distribution considered in this paper, the quantile function has a simple form. We consider a truncation at the first and third quartiles. This ensures that all integrals are finite. The MAD of the truncated distribution can be computed in terms of the corresponding integrals of the quantile functions and can be expressed in terms of the tail index α and quartiles. On the other hand, using simple quadrature methods for evaluating these integrals, we express MAD in terms of the corresponding quartiles only. From these two expressions for MAD, we derive simple approximations for the tail index α and for the Gini index G. The accuracy of the proposed approach depends on the form of the quantile function and on the numeric quadrature approximation. We show by extensive simulations that for the Pareto distribution, the proposed approximations to α and G give good accuracy. They are easy to compute and interpret.
The proposed approach of using the quantile functions and truncation offers many advantages. For many distributions, the quantile function is available in closed form even if the CDF or PDF is not (for example, the Levy or the Cauchy distribution). It is therefore possible to design parameter estimation procedures for these distributions (after truncation) and derive simple closed-form approximations. This allows us to connect the derived approximations for α and G and interpret them in terms of widely used quantile metrics such as interquartile range and Galton skewness.
One of the interesting applications of our method is to study volatility in financial asset prices and investment returns. The standard approach is to use the standard deviation of returns as a measure of risk and to compare investments in terms of Sharpe ratios (returns per unit of risk). Mean absolute deviation is a robust measure of risk and provides an alternative to variance-based risk measures. It has been suggested as an alternative for risk management for non-normal return distributions Lam et al. (2021).
Let us elaborate on this connection with risk and volatility. First, consider the Pareto distributions with $\alpha > 1$. These distributions have finite mean $\mu$, and from Equation (2), we have $\alpha = \mu / (\mu - \beta)$. We can interpret $(\mu - \beta)$ as a measure of the deviation of the mean from the minimum value $\beta$ and treat it as volatility. Then, if we interpret $\mu$ as some measure of average returns, the above expression is analogous to the Sharpe ratio widely used in finance to measure risk-adjusted returns. In the more general case for any $\alpha$, we can consider truncation and show that the mean absolute deviations can be expressed via the integrals of the quantile functions, and in turn, these integrals can be approximately evaluated via quartiles using a number of quadrature approximation methods. For example, in one such method (“midpoint”) applied to the Pareto distribution in Section 9.2, we obtain $H \propto (Q_3 - Q_1)$ and $\alpha \approx (Q_1 + Q_3) / \bigl(2 (Q_3 - Q_1)\bigr)$. If we think of $X$ as measuring returns, then the term $(Q_3 - Q_1)$ for $H$ can be interpreted as risk, whereas the term $(Q_1 + Q_3)/2$ can be interpreted as some measure of average returns. Therefore, with this analogy, the tail index $\alpha$ can again be interpreted as an analogue of the Sharpe ratio. Higher volatility relative to average returns translates into lower values of the tail index, that is, a heavier tail.
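To make the Sharpe-ratio analogy concrete, the sketch below compares the exact relation $\alpha = \mu/(\mu - \beta)$ with the quartile-based ratio of the mid-quartile to the interquartile range. The reading $\alpha \approx (Q_1 + Q_3)/(2(Q_3 - Q_1))$ is an assumption about the intended formula, and the function names and parameter values are illustrative.

```python
def pareto_quantile(p, alpha, beta=1.0):
    """Pareto quantile function Q(p) = beta * (1 - p)**(-1/alpha)."""
    return beta * (1.0 - p) ** (-1.0 / alpha)

def alpha_from_mean(mu, beta):
    """Exact relation alpha = mu / (mu - beta), valid when alpha > 1."""
    return mu / (mu - beta)

def alpha_from_quartiles(q1, q3):
    """Assumed quartile-based ratio: mid-quartile over interquartile range."""
    return (q1 + q3) / (2.0 * (q3 - q1))

beta = 1.0
for alpha in (1.2, 1.5, 2.0, 3.0):  # illustrative tail indices
    q1 = pareto_quantile(0.25, alpha, beta)
    q3 = pareto_quantile(0.75, alpha, beta)
    mu = alpha * beta / (alpha - 1.0)
    exact = alpha_from_mean(mu, beta)
    approx = alpha_from_quartiles(q1, q3)
    print(f"alpha = {alpha:.1f}: from mean = {exact:.3f}, from quartiles = {approx:.3f}")
```

A wider interquartile range at the same mid-quartile lowers the quartile-based value, mirroring how higher risk lowers a Sharpe ratio.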
Finally, for distributions with finite mean, we have the following result Pham-Gia and Hung (2001) for confidence intervals:
$$P\bigl(|X - \mu| \ge k\, H(\mu)\bigr) \le \frac{1}{k} \quad \text{and} \quad P\bigl(|X - M| \ge k\, H(M)\bigr) \le \frac{1}{k} \tag{7}$$
where $k > 1$. With truncation at the quartiles considered in this paper, the median of the truncated variable $\tilde{X}$ is unchanged and equal to $M$, the mean $\tilde{\mu}$ of $\tilde{X}$ is always finite, and the above equation can be rewritten as
$$P\bigl(|\tilde{X} - \tilde{\mu}| \ge k\, H(\tilde{X}, \tilde{\mu})\bigr) \le \frac{1}{k} \quad \text{and} \quad P\bigl(|\tilde{X} - M| \ge k\, H(\tilde{X}, M)\bigr) \le \frac{1}{k} \tag{8}$$
where $H(\tilde{X}, \tilde{\mu})$ and $H(\tilde{X}, M)$ denote the mean absolute deviations of $\tilde{X}$ around the mean $\tilde{\mu}$ and the median $M$, respectively. This is somewhat analogous to the well-known Chebyshev’s inequality for distributions with finite variance Feller (1956):
$$P\bigl(|X - \mu| \ge k \sigma\bigr) \le \frac{1}{k^2} \tag{9}$$
In many cases of interest, the Pareto distribution has $1 < \alpha < 2$, resulting in infinite variance. As a result, Chebyshev’s inequality in Equation (9) cannot be used. However, the bounds with mean absolute deviations in Equation (7) are applicable, giving us additional inequalities relating mean absolute deviations (“volatility”) and confidence intervals.
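The MAD-based bound can be checked exactly for a Pareto distribution with infinite variance. The sketch below is an illustration under stated assumptions: it uses the closed-form MAD around the median for the Pareto distribution, $H = \alpha M/(\alpha - 1) - \mu$ (valid for $\alpha > 1$), and the function names and parameter values are not from the paper.

```python
def pareto_survival(x, alpha, beta):
    """P(X >= x) for a Pareto variable with scale beta and shape alpha."""
    return 1.0 if x <= beta else (beta / x) ** alpha

def mad_tail_probability(k, alpha, beta):
    """Exact P(|X - M| >= k * H) for the Pareto distribution with alpha > 1."""
    m = beta * 2.0 ** (1.0 / alpha)          # median
    mu = alpha * beta / (alpha - 1.0)        # mean
    h = alpha / (alpha - 1.0) * m - mu       # MAD around the median
    upper = pareto_survival(m + k * h, alpha, beta)
    lower = 1.0 - pareto_survival(m - k * h, alpha, beta)  # P(X <= M - k*H)
    return upper + lower

alpha, beta = 1.5, 1.0  # 1 < alpha < 2: finite mean, infinite variance
for k in (1.5, 2.0, 3.0, 5.0):
    prob = mad_tail_probability(k, alpha, beta)
    print(f"k = {k}: P(|X - M| >= k H) = {prob:.4f}, bound 1/k = {1.0 / k:.4f}")
```

In this example the exact probabilities sit well below the bound $1/k$, as expected from a Markov-type inequality.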
On the other hand, our method has limitations. Although the quantile function is available in closed form for many well-known distributions (including Pareto, Weibull, log-normal, and exponential), for many distributions, such as the beta distribution, quantile functions are not available in closed form. For such distributions, we can compute MAD from PDF or CDF (if they are available in closed form) or from some estimated quantile function. One approach that we think may be promising is to consider a convex sum of well-known quantile functions to model a distribution, as suggested in Gilchrist (2000). In this case, the resulting Mean Absolute Deviation can be computed in terms of Mean Absolute Deviations of these quantile functions, and the resulting parameters can be expressed in terms of the quantiles of these functions. We hope to address this in future work.
Another limitation of the proposed method is dealing with small sample sizes. With truncation, we eliminate 50% of the points and derive approximations to parameters from the “central” part of the distribution. Our preliminary results of applying this method for a number of distributions seem to be promising, even for relatively small sizes.

4. Mean Absolute Deviation for Pareto Distribution

For the Pareto distribution, the mean is finite for α > 1 . Most of the data sets on income inequality have the tail index α satisfying 1.5 < α < 1.7 (Brzezinski (2014); Catalano et al. (2009)). Throughout this paper, we assume α > 1 and use H to denote the mean absolute deviation around the median M. From the above Equation (6) for a = M , we immediately obtain
$$H = \int_{x > M} x\, f(x)\, dx - \int_{x \le M} x\, f(x)\, dx \tag{10}$$
We can express $H$ in terms of the integrals of the quantile function $Q(p)$ as follows. For continuous distributions, we have $x = Q(p)$, $p = F(x)$, and $dp = F'(x)\, dx = f(x)\, dx$. For any $x_1 < x_2$, let $p_1 = F(x_1)$ and $p_2 = F(x_2)$. Then, we have
$$\int_{p_1}^{p_2} Q(p)\, dp = \int_{x_1}^{x_2} x\, f(x)\, dx \tag{11}$$
We will use the following result for the Pareto distribution: for any $0 < p_1 < p_2 < 1$, we have Olver (1974)
$$\int_{p_1}^{p_2} Q(p)\, dp = \frac{\alpha}{\alpha - 1} \Bigl[ (1 - p_1)\, Q(p_1) - (1 - p_2)\, Q(p_2) \Bigr] \tag{12}$$
Let us define two subarea integrals $I_1$ and $I_2$ as
$$I_1 = \int_{0}^{1/2} Q(p)\, dp \quad \text{and} \quad I_2 = \int_{1/2}^{1} Q(p)\, dp \tag{13}$$
Therefore, we can rewrite Equation (10) for MAD as
$$H = I_2 - I_1 = \frac{\alpha}{\alpha - 1}\, M - \mu \tag{14}$$
Here we used Equation (12) in the limits $p_1 \to 0$ and $p_2 \to 1$; for $\alpha > 1$, the term $(1 - p)\, Q(p)$ tends to zero as $p \to 1$. If we rewrite this as $H = (I_2 - M/2) + (M/2 - I_1)$, then $H$ has a simple geometrical interpretation as the shaded area shown in Figure 3b.
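The closed form for $H$ can be compared with a direct numerical integration of the quantile function. This is a minimal sketch with illustrative names and parameter values; the midpoint rule is applied only on $[0, 1/2]$, and the upper subarea is obtained from $I_1 + I_2 = \mu$ to avoid the integrable singularity of $Q(p)$ at $p = 1$.

```python
import math

def pareto_quantile(p, alpha, beta):
    """Pareto quantile function Q(p) = beta * (1 - p)**(-1/alpha)."""
    return beta * (1.0 - p) ** (-1.0 / alpha)

def midpoint_integral(func, a, b, n=20000):
    """Composite midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

alpha, beta = 1.5, 1.0  # illustrative values with alpha > 1
m = pareto_quantile(0.5, alpha, beta)
mu = alpha * beta / (alpha - 1.0)

i1 = midpoint_integral(lambda p: pareto_quantile(p, alpha, beta), 0.0, 0.5)
i2 = mu - i1                      # the two subareas add up to the mean
h_numeric = i2 - i1
h_closed = alpha / (alpha - 1.0) * m - mu

print(f"I1 = {i1:.6f}, I2 = {i2:.6f}")
print(f"H (numeric) = {h_numeric:.6f}, H (closed form) = {h_closed:.6f}")
```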
Unlike the standard deviation, which is based on the L 2 -norm, the mean absolute deviation is an L 1 -metric and offers a direct and more explainable metric of deviations Dodge (1987); K. Elsayed (2015); Pham-Gia and Hung (2001); Rousseeuw and Croux (1993). When applied to the Pareto distribution, the mean absolute deviation H also has a simple “economic” interpretation. We can interpret I 1 as one-half of the average income of the (0–50%) lower-half income households, whereas I 2 is one-half of the average income of the (50–100%) income households. The mean absolute deviation from the median H in Equation (14) can therefore be interpreted as one-half of the difference in mean incomes between the upper- and lower-half income groups.
Finally, let us now relate the mean absolute deviation to skewness. Groeneveld and Meeden’s skewness coefficient $S$ Groeneveld and Meeden (1984) is defined as
$$S = \frac{\mu - M}{H} \tag{15}$$
In economic terms, the numerator $(\mu - M)$ is the difference between the mean income $\mu$ and the median income $M$. The denominator $H = (I_2 - I_1)$ is one-half of the difference between the average income of the upper half (50–100%) income bracket and the lower half (0–50%) income bracket.

5. Interpreting the Tail and the Gini Index

One of the metrics for this distribution is the Gini index $G$, which measures the deviation from an equitable distribution. Formally, it is defined in terms of the so-called Lorenz curve
$$L(x) = \frac{\int_{\beta}^{x} t\, f(t)\, dt}{\int_{\beta}^{\infty} t\, f(t)\, dt} \tag{16}$$
The Lorenz curve shows the share of total income earned by people with income below the value $x$. Written as a function of the population share $p = F(x)$, the Lorenz curve $L(p)$ and the Gini index $G$ for the Pareto distribution are
$$L(p) = 1 - (1 - p)^{1 - 1/\alpha} \quad \text{and} \quad G = 1 - 2 \int_{0}^{1} L(p)\, dp = \frac{1}{2\alpha - 1} \tag{17}$$
This index can be computed as the ratio of area A / ( A + B ) under the Lorenz curve. This is illustrated in Figure 4a.
The coefficient G tells us how far the Lorenz curve L ( x ) is from the line of equality E ( p ) . The Gini coefficient G is equal to the area ( A + B ) below the line of perfect equality E ( p ) minus the area B below the Lorenz curve L ( p ) , divided by the area ( A + B ) below the line of perfect equality E ( p ) . In other words, G = A / ( A + B ) .
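The area interpretation can be verified numerically. The sketch below uses the standard area definition $G = 1 - 2\int_0^1 L(p)\,dp$ (equivalent to $A/(A+B)$ with $A + B = 1/2$) and the Pareto Lorenz curve; the function names and the grid size are illustrative assumptions.

```python
def lorenz_pareto(p, alpha):
    """Lorenz curve of the Pareto distribution, L(p) = 1 - (1 - p)**(1 - 1/alpha)."""
    return 1.0 - (1.0 - p) ** (1.0 - 1.0 / alpha)

def gini_from_lorenz(alpha, n=100000):
    """Gini index as 1 - 2 * (area under the Lorenz curve), via the midpoint rule."""
    h = 1.0 / n
    area_b = h * sum(lorenz_pareto((i + 0.5) * h, alpha) for i in range(n))
    return 1.0 - 2.0 * area_b

for alpha in (1.5, 2.0):  # illustrative tail indices
    numeric = gini_from_lorenz(alpha)
    closed = 1.0 / (2.0 * alpha - 1.0)
    print(f"alpha = {alpha}: G (numeric) = {numeric:.5f}, G (closed form) = {closed:.5f}")
```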
We can derive a similar interpretation in terms of the quantile functions. Consider two random variables, X and Y, both with the same β but with tail indices α and 2 α respectively. As before, we assume α > 1 . Note that a larger α represents a more equitable distribution.
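Let $Q_x(p)$ and $Q_y(p)$ denote their corresponding quantile functions. The means $\mu_x$ and $\mu_y$ can be computed as areas under their quantile functions:
$$\mu_x = \int_{0}^{1} Q_x(p)\, dp \quad \text{and} \quad \mu_y = \int_{0}^{1} Q_y(p)\, dp \tag{18}$$
On the other hand, for Pareto distributions with $\alpha > 1$, these means are $\mu_x = \alpha \beta / (\alpha - 1)$ and $\mu_y = 2 \alpha \beta / (2\alpha - 1)$, respectively. Then
$$\mu_x - \mu_y = \frac{\alpha \beta}{\alpha - 1} \cdot \frac{1}{2\alpha - 1} = \mu_x\, G_x \quad \Longrightarrow \quad G_x = \frac{\mu_x - \mu_y}{\mu_x} \tag{19}$$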
If the means μ x and μ y represent the areas ( A + B ) and B under the corresponding quantile functions, we can interpret the Gini coefficient as follows: The Gini coefficient G is equal to the area ( A + B ) below the quantile function of X minus the area B below the quantile function of Y divided by the area ( A + B ) below the quantile function of X. In other words, G = A / ( A + B ) . This is illustrated in Figure 4b. This explanation is analogous to the explanation of G in terms of the Lorenz curve shown in Figure 4a.
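Finally, throughout this paper, we will use the following: for any two constants $A$ and $B$ such that
$$\frac{\alpha}{\alpha - 1}\, A = B \tag{20}$$
we have
$$\alpha = 1 + \frac{A}{B - A} \quad \text{and} \quad G = 1 - \frac{2A}{B + A} \tag{21}$$
For example, if we apply this to the formula for the mean of the Pareto distribution with $A = \beta$ and $B = \mu$, then we obtain
$$\alpha = 1 + \frac{\beta}{\mu - \beta} \quad \text{and} \quad G = 1 - \frac{2\beta}{\mu + \beta} \tag{22}$$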
This has the following geometric explanation in terms of the quantile function shown in Figure 5.
In the case of perfect inequality (one person has all the wealth), the tail index $\alpha = 1$ and the Gini coefficient is 1. The term $\beta / (\mu - \beta)$ in the expression for $\alpha$ and the term $2\beta / (\mu + \beta)$ in Equation (22) measure deviation from such a perfect inequality. On the other extreme, in the case of perfect equality (everyone has the same income $\beta$), the tail index $\alpha$ is infinite and the Gini coefficient $G = 0$. In such a case, all income would be equal, and the mean income $\mu$ would be the same as the minimum income $\beta$. Therefore, the function $y(p) = \beta$ represents the quantile function of “total equality” with the area under $y(p)$ for $0 \le p \le 1$ equal to $\beta$. The mean $\mu$ for the Pareto distribution is the area under the quantile curve shown in Figure 5a. The term $(\mu - \beta)$ is the average minus the minimum income $\beta$ and represents deviation from the equality case. Therefore, the term $\beta / (\mu - \beta)$ in Equation (22) represents deviation from perfect inequality for $\alpha$, whereas the term $2\beta / (\mu + \beta)$ represents the deviation of the Gini coefficient from the perfect inequality. This is shown in Figure 5b.
This interpretation of $\alpha$ and $G$ in Equation (22) in terms of the ratio of areas under the quantile function in Figure 5b is analogous to the interpretation of $G$ in Equation (17) in terms of the ratios of areas under the Lorenz curve in Figure 4a.
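The identity relating two constants $A$ and $B$ to the tail index and the Gini coefficient can be written as two small helper functions. This is a sketch: the names `tail_index_from_ratio` and `gini_from_ratio` and the numerical values are illustrative assumptions.

```python
def tail_index_from_ratio(a, b):
    """Tail index from constants with alpha / (alpha - 1) * a = b, requiring b > a > 0."""
    if not b > a > 0:
        raise ValueError("requires b > a > 0")
    return 1.0 + a / (b - a)

def gini_from_ratio(a, b):
    """Gini coefficient from the same pair of constants."""
    if not b > a > 0:
        raise ValueError("requires b > a > 0")
    return 1.0 - 2.0 * a / (b + a)

# Example with A = beta (minimum income) and B = mu (mean income); values are illustrative.
beta, mu = 1.0, 3.0
alpha = tail_index_from_ratio(beta, mu)
gini = gini_from_ratio(beta, mu)
print(f"alpha = {alpha}, G = {gini}")

# Consistency with the Pareto relation G = 1 / (2 * alpha - 1).
print(abs(gini - 1.0 / (2.0 * alpha - 1.0)) < 1e-12)
```

As the mean approaches the minimum income, the tail index grows without bound and the Gini coefficient tends to zero, matching the perfect-equality limit described above.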

Interpretation of the Tail Index and Gini Coefficient in Terms of Mean Absolute Deviation

Alternatively, we can interpret α and G in terms of the mean absolute deviations. First, we provide a geometric interpretation for α .
For the Pareto distribution, from Equation (12) we have
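$$I_1 = \int_{0}^{1/2} Q(p)\, dp = \frac{\alpha}{\alpha - 1} \left( \beta - \frac{M}{2} \right), \qquad I_2 = \int_{1/2}^{1} Q(p)\, dp = \frac{\alpha}{\alpha - 1} \cdot \frac{M}{2} \tag{23}$$
Since for the Pareto distribution with $\alpha > 1$, the mean $\mu$ is finite and $\mu = \alpha \beta / (\alpha - 1)$, we can easily evaluate $H$ from Equation (14) as follows:
$$H = I_2 - I_1 = \frac{\alpha}{\alpha - 1}\, M - \mu \tag{24}$$
After some simple algebra, we obtain
$$\alpha = \frac{H + \mu}{H + \mu - M} = \frac{I_2}{I_2 - M/2} \quad \text{and} \quad G = \frac{H + \mu - M}{H + \mu + M} = \frac{I_2 - M/2}{I_2 + M/2} \tag{25}$$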
This is illustrated in Figure 6.
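The expressions for $\alpha$ and $G$ in terms of $H$, $\mu$, and $M$ can be applied to sample statistics. The sketch below is illustrative: it builds a deterministic pseudo-sample by evaluating the Pareto quantile function on an evenly spaced grid (an assumption made here for reproducibility, not the paper's simulation design), with $\alpha = 3$ so that the sample mean is well behaved.

```python
import statistics

def tail_and_gini_from_mad(h, mu, m):
    """Tail index and Gini coefficient from the MAD around the median, mean, and median."""
    alpha = (h + mu) / (h + mu - m)
    gini = (h + mu - m) / (h + mu + m)
    return alpha, gini

# Deterministic pseudo-sample: quantile function evaluated on a grid of probabilities.
true_alpha, beta, n = 3.0, 1.0, 20000
sample = [beta * (1.0 - (i + 0.5) / n) ** (-1.0 / true_alpha) for i in range(n)]

m = statistics.median(sample)
mu = sum(sample) / n
h = sum(abs(x - m) for x in sample) / n   # sample MAD around the median

alpha_hat, gini_hat = tail_and_gini_from_mad(h, mu, m)
print(f"alpha estimate = {alpha_hat:.3f} (true value {true_alpha})")
print(f"Gini estimate = {gini_hat:.3f} (true value {1.0 / (2.0 * true_alpha - 1.0):.3f})")
```

These untruncated formulas rely on the sample mean, so they inherit its sensitivity to extreme values when $\alpha$ is close to 1.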
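Finally, let us relate the mean absolute deviation $H$, the mean $\mu$, and the Lorenz curve. Since $L(1/2) = 1 - 2^{1/\alpha - 1}$, we have
$$I_1 = \frac{\alpha \beta}{\alpha - 1}\, L\!\left(\tfrac{1}{2}\right) \quad \text{and} \quad I_2 = \frac{\alpha \beta}{\alpha - 1} \left[ 1 - L\!\left(\tfrac{1}{2}\right) \right] \tag{26}$$
This gives us
$$H = \frac{\alpha \beta}{\alpha - 1} \left[ 1 - 2 L\!\left(\tfrac{1}{2}\right) \right] = \mu \left[ 1 - 2 L\!\left(\tfrac{1}{2}\right) \right] \tag{27}$$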

6. “Economic” Interpretation of Tail Index, Gini Index, and MAD

Let us interpret our results in economic terms. If the distribution of income $X$ follows a Pareto distribution, then we can define any subgroup with income in the range $x_1 \le X \le x_2$ and compute the average income in this group by considering the corresponding truncated distribution of $X$ as follows. For any two values $x_1 < x_2$, let $p_1 = F(x_1)$ and $p_2 = F(x_2)$. Let $E(x_1 \le X \le x_2)$ denote the corresponding average income for that group. Then, using the truncation, we have
$$E(x_1 \le X \le x_2) = \frac{1}{F(x_2) - F(x_1)} \int_{x_1}^{x_2} x\, f(x)\, dx = \frac{1}{p_2 - p_1} \int_{p_1}^{p_2} Q(p)\, dp = \frac{1}{p_2 - p_1} \cdot \frac{\alpha}{\alpha - 1} \Bigl[ (1 - p_1)\, Q(p_1) - (1 - p_2)\, Q(p_2) \Bigr] \tag{28}$$
For example, if we define a low-income group as households with income $\beta \le X \le Q_1$, then we have $p_1 = 0$ and $p_2 = 1/4$. From Equation (28) we obtain
$$E(\beta \le X \le Q_1) = \frac{4\alpha}{\alpha - 1} \left( \beta - \frac{3}{4}\, Q_1 \right) = \frac{\alpha}{\alpha - 1}\, (4\beta - 3 Q_1) \tag{29}$$
In the same manner, we can define other income groups and explicitly compute their average income in terms of α and quartiles. This is summarized in Table 1.
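The group averages can be computed for any pair of ranks with one function. This is a minimal sketch assuming $\alpha > 1$; the name `group_mean_income`, the choice of quartile groups, and the parameter values are illustrative and do not reproduce the paper's Table 1.

```python
import math

def pareto_quantile(p, alpha, beta):
    """Pareto quantile function Q(p) = beta * (1 - p)**(-1/alpha)."""
    return beta * (1.0 - p) ** (-1.0 / alpha)

def group_mean_income(p1, p2, alpha, beta):
    """Average income of households ranked between p1 and p2 (requires alpha > 1)."""
    def weighted_quantile(p):
        # (1 - p) * Q(p) tends to 0 as p -> 1 when alpha > 1.
        return 0.0 if p >= 1.0 else (1.0 - p) * pareto_quantile(p, alpha, beta)
    factor = alpha / (alpha - 1.0)
    return factor * (weighted_quantile(p1) - weighted_quantile(p2)) / (p2 - p1)

alpha, beta = 1.5, 1.0  # illustrative values
groups = [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]
means = [group_mean_income(p1, p2, alpha, beta) for p1, p2 in groups]
for (p1, p2), mean in zip(groups, means):
    print(f"{100 * p1:.0f}-{100 * p2:.0f}% group: average income = {mean:.4f}")

# The population-weighted group averages recover the overall mean.
print(math.isclose(sum(0.25 * mean for mean in means), alpha * beta / (alpha - 1.0)))
```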
From this table, we can interpret $I_1$ as one-half of the average income of the lower-half (0–50%) income households, whereas $I_2$ is one-half of the average income of the upper-half (50–100%) income households. The mean absolute deviation from the median $H$ in Equation (14) can be interpreted as one-half of the difference in mean incomes between the upper- and lower-half income groups.
The tail index $\alpha$ in Equation (25) can be written as $\alpha = 2 I_2 / (2 I_2 - M)$. This gives us the following simple interpretation: the tail index is the ratio of the average income $2 I_2$ of the (50–100%) income group to the difference between this average and the median income, namely $(2 I_2 - M)$.
The Gini index $G$ in Equation (25) can be written as $G = (2 I_2 - M) / (2 I_2 + M)$. This gives us the following simple interpretation: the Gini index is the ratio of the difference $(2 I_2 - M)$ between the average income of the (50–100%) income group and the median income to the sum of this average and the median income, namely $(2 I_2 + M)$.
Finally, we note that for the Pareto distribution, the average income above any threshold $x = Q(p)$ is
$$E(X \ge x) = \frac{\alpha}{\alpha - 1}\, Q(p) \tag{30}$$
Therefore, the ratio of the average income above any threshold $x$ to the threshold $x = Q(p)$ itself is independent of $x$ and equals $b = \alpha / (\alpha - 1)$. The term $b$ is referred to as the inverted Pareto coefficient. It measures income at the top of the distribution and describes its tail. In this paper, we will continue to express our formulas in terms of the tail index $\alpha$.

7. Truncated Pareto Distribution

If $X$ does not have a finite first moment (case $\alpha \le 1$), then at least one of the integrals on the right-hand side of Equation (6) does not converge. Moreover, in practice, we often have extreme values (“outliers”) that could have a significant effect on the results. The mean absolute deviation is not as sensitive to outliers as the standard deviation, especially if one uses the mean absolute deviation around the median (as we do) and not around the mean Leys et al. (2013).
Many models in statistical analysis assume finite mean or variance. Many models with a Pareto distribution assume a finite mean ($\alpha > 1$). However, there are many models for datasets with heavy tails for which the first moment is not finite. Such models are used to describe catastrophic losses in risk management.
To address this, we can consider a truncated random variable X restricted to Q 1 x Q 3 . When applied to real datasets, truncation allows the removal of extreme values in the distribution. For this truncated distribution, its density f ( x ) and its cumulative distribution F ( x ) are Papoulis (1984)
f ( x ) = 2 f ( x ) , F ( x ) = 2 F ( x ) 1 / 2 , x ( Q 1 , Q 3 )
The median of the truncated random variable X is M = M F ( Q 1 ) + F ( Q 3 ) = M and coincides with the median M of the original distribution X.
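The relation between the original and the truncated CDF can be checked numerically. The following is a minimal Python sketch, assuming the standard Pareto forms $F(x) = 1 - (\beta/x)^{\alpha}$ and $Q(p) = \beta(1-p)^{-1/\alpha}$ with $\beta = 1$; the function names and the example value $\alpha = 0.8$ are illustrative assumptions, not taken from the paper's code.

```python
def pareto_cdf(x: float, alpha: float, beta: float = 1.0) -> float:
    """Pareto CDF F(x) = 1 - (beta / x)^alpha for x >= beta, else 0."""
    return 0.0 if x < beta else 1.0 - (beta / x) ** alpha


def pareto_quantile(p: float, alpha: float, beta: float = 1.0) -> float:
    """Pareto quantile function Q(p) = beta * (1 - p)^(-1 / alpha)."""
    return beta * (1.0 - p) ** (-1.0 / alpha)


def truncated_cdf(x: float, alpha: float, beta: float = 1.0) -> float:
    """CDF of X restricted to [Q1, Q3]: 2 F(x) - 1/2 inside the interval."""
    q1 = pareto_quantile(0.25, alpha, beta)
    q3 = pareto_quantile(0.75, alpha, beta)
    if x <= q1:
        return 0.0
    if x >= q3:
        return 1.0
    return 2.0 * pareto_cdf(x, alpha, beta) - 0.5


alpha = 0.8  # assumed example: no finite mean, truncation is still well defined
median = pareto_quantile(0.5, alpha)
f_star_at_median = truncated_cdf(median, alpha)  # close to 0.5
```

The truncated CDF evaluated at the original median is 1/2 up to rounding, which is the statement that truncation at the quartiles leaves the median unchanged.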
We now address the issue of computing the mean absolute deviation $H^*(M)$ of the truncated random variable $X^*$ around its median $M$ using the original quantile function $Q(p)$ of $X$. To that end, we will find it convenient to introduce the following two “subarea” integrals:
$$I_1 = \int_{1/4}^{1/2} Q(p)\,dp \quad \text{and} \quad I_2 = \int_{1/2}^{3/4} Q(p)\,dp$$
Then, we can write the mean absolute deviation $H^*$ of the truncated variable $X^*$ around its median $M = Q_2$ as follows:
$$\begin{aligned} H^* &= \int_{Q_1}^{Q_3} |x - M|\, f^*(x)\,dx = 2\int_{Q_1}^{M} (M - x) f(x)\,dx + 2\int_{M}^{Q_3} (x - M) f(x)\,dx \\ &= 2\left[ M\int_{Q_1}^{M} f(x)\,dx - \int_{Q_1}^{M} x f(x)\,dx \right] + 2\left[ \int_{M}^{Q_3} x f(x)\,dx - M\int_{M}^{Q_3} f(x)\,dx \right] \\ &= 2\left[ \frac{M}{4} - \int_{1/4}^{1/2} Q(p)\,dp \right] + 2\left[ \int_{1/2}^{3/4} Q(p)\,dp - \frac{M}{4} \right] = 2(I_2 - I_1) \end{aligned}$$
The mean absolute deviation $H^*$ of the truncated variable $X^*$ around the median $M$ is twice the difference between the right and left sub-means. Note that the mean $\mu^*$ of the truncated distribution is
$$\mu^* = \int_{Q_1}^{Q_3} x f^*(x)\,dx = 2\int_{1/4}^{3/4} Q(p)\,dp = 2(I_1 + I_2)$$
The subareas $I_1$ and $I_2$ correspond to the left and right sub-means of $X$ between $p = 1/4$ and $p = 1/2$ and between $p = 1/2$ and $p = 3/4$, respectively. This is illustrated in Figure 7.
The subarea $I_1$ represents the average income in the (25–50%) income group, whereas the subarea $I_2$ represents the average income in the (50–75%) income group. It is interesting to compare the above expression $H^* = 2(I_2 - I_1)$ for the mean absolute deviation of the truncated random variable $X^*$ in Equation (34) with the expression $H = (I_2 - I_1)$ for the mean absolute deviation of $X$ in Equation (24). As we discussed in Section 6, the mean absolute deviation $H$ can be interpreted as the difference in average incomes of the top (50–100%) and bottom (0–50%) income groups. By contrast, the mean absolute deviation $H^*$ can be interpreted as twice the difference in average incomes of the top (50–75%) and bottom (25–50%) income groups in the truncated distribution.
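We now compute the subareas $I_1$ and $I_2$ explicitly. For the left subarea $I_1$, from Equation (12), we have
$$I_1 = \int_{1/4}^{1/2} Q(p)\,dp = \frac{\alpha}{\alpha - 1}\left( \frac{3}{4} Q_1 - \frac{1}{2} M \right)$$
Similarly, for the right subarea $I_2$, from Equation (12), we have
$$I_2 = \int_{1/2}^{3/4} Q(p)\,dp = \frac{\alpha}{\alpha - 1}\left( \frac{1}{2} M - \frac{1}{4} Q_3 \right)$$
For the mean absolute deviation $H^* = 2(I_2 - I_1)$, we obtain
$$H^* = \frac{\alpha}{\alpha - 1}\left[ 2M - \left( \frac{3}{2} Q_1 + \frac{1}{2} Q_3 \right) \right]$$
whereas for the mean $\mu^* = 2(I_1 + I_2)$, we obtain
$$\mu^* = \frac{\alpha}{\alpha - 1} \cdot \left( \frac{3}{2} Q_1 - \frac{1}{2} Q_3 \right)$$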
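The closed-form expressions for the truncated mean absolute deviation and the truncated mean can be compared with a direct numerical integration of the quantile function. The sketch below assumes the Pareto quantile function $Q(p) = \beta(1-p)^{-1/\alpha}$ with $\beta = 1$ and uses a composite Simpson rule written for this illustration; all function names are hypothetical.

```python
def pareto_quantile(p, alpha, beta=1.0):
    """Pareto quantile function Q(p) = beta * (1 - p)^(-1 / alpha)."""
    return beta * (1.0 - p) ** (-1.0 / alpha)


def simpson(g, a, b, n=1000):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0


def truncated_moments_numeric(alpha, beta=1.0):
    """Return (MAD, mean) of the truncated variable as 2(I2 - I1) and 2(I1 + I2)."""
    q = lambda p: pareto_quantile(p, alpha, beta)
    i1 = simpson(q, 0.25, 0.5)
    i2 = simpson(q, 0.5, 0.75)
    return 2.0 * (i2 - i1), 2.0 * (i1 + i2)


def truncated_moments_closed_form(alpha, beta=1.0):
    """Return (MAD, mean) from the quartile expressions (requires alpha != 1)."""
    q1, m, q3 = (pareto_quantile(p, alpha, beta) for p in (0.25, 0.5, 0.75))
    c = alpha / (alpha - 1.0)
    mad = c * (2.0 * m - (1.5 * q1 + 0.5 * q3))
    mean = c * (1.5 * q1 - 0.5 * q3)
    return mad, mean


for alpha in (0.8, 1.5):  # one case without and one with a finite mean
    print(alpha, truncated_moments_numeric(alpha), truncated_moments_closed_form(alpha))
```

Both routes agree to many digits, including for $\alpha < 1$, where the untruncated mean does not exist.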
We will use the above result in Equation (38) and consider different approximations to the subarea integrals. The problem of estimating such integrals is called numerical integration or numerical quadrature and is a classical and well-researched problem in numerical analysis (Conte & De Boor, 1972). This problem arises when the integration cannot be carried out in closed form or when the function is known only at a finite number of points.
When numerical quadrature is applied to the subarea integrals, the quantile function is evaluated at the points $p = 1/4$, $p = 1/2$, and $p = 3/4$. This allows us to derive explicit closed-form approximations for $\alpha$ and $G$ in terms of quartiles. This is addressed in Section 9.
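We can rewrite this in terms of the Lorenz curve $L(p)$. From Equations (3) and (17), we have
$$L\left(\tfrac{1}{4}\right) = 1 - \left(\tfrac{4}{3}\right)^{1/\alpha - 1}, \qquad L\left(\tfrac{1}{2}\right) = 1 - 2^{1/\alpha - 1}, \qquad L\left(\tfrac{3}{4}\right) = 1 - 4^{1/\alpha - 1}$$
and therefore, we can rewrite Equations (35) and (36) for the subarea integrals $I_1$ and $I_2$ as
$$I_1 = \frac{\alpha\beta}{\alpha - 1}\left[ L\left(\tfrac{1}{2}\right) - L\left(\tfrac{1}{4}\right) \right] \quad \text{and} \quad I_2 = \frac{\alpha\beta}{\alpha - 1}\left[ L\left(\tfrac{3}{4}\right) - L\left(\tfrac{1}{2}\right) \right]$$
This immediately gives us
$$H^* = \frac{2\alpha\beta}{\alpha - 1}\left[ L\left(\tfrac{1}{4}\right) + L\left(\tfrac{3}{4}\right) - 2L\left(\tfrac{1}{2}\right) \right], \qquad \mu^* = \frac{2\alpha\beta}{\alpha - 1}\left[ L\left(\tfrac{3}{4}\right) - L\left(\tfrac{1}{4}\right) \right]$$
Note that the expression for the mean $\mu^*$ in the above Equation (41) can be written in terms of the mean $\mu = \alpha\beta/(\alpha - 1)$ of the original distribution as
$$\mu^* = 2\mu\left[ L\left(\tfrac{3}{4}\right) - L\left(\tfrac{1}{4}\right) \right]$$
For income distributions that follow the Pareto distribution, $\mu$ represents the average income of all individuals, whereas $\mu^*$ represents the average income of individuals with income in the range $(Q_1, Q_3)$. The term in brackets represents the fraction of total wealth held by such individuals. For $\alpha \gg 1$ (equal distribution), we have
$$\mu^* = 2\mu\left[ \left( 1 - \tfrac{1}{4}\, 4^{1/\alpha} \right) - \left( 1 - \tfrac{3}{4} \left( \tfrac{4}{3} \right)^{1/\alpha} \right) \right] \approx 2\mu\left( \tfrac{3}{4} - \tfrac{1}{4} \right) = \mu$$
as expected.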
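The Lorenz-curve form of the truncated mean can be cross-checked against the quartile form. The sketch below assumes the Pareto Lorenz curve $L(p) = 1 - (1-p)^{1 - 1/\alpha}$ for $\alpha > 1$ and the mean $\mu = \alpha\beta/(\alpha - 1)$; the function names are illustrative.

```python
def lorenz(p, alpha):
    """Pareto Lorenz curve L(p) = 1 - (1 - p)^(1 - 1/alpha), for alpha > 1."""
    return 1.0 - (1.0 - p) ** (1.0 - 1.0 / alpha)


def truncated_mean_from_lorenz(alpha, beta=1.0):
    """Truncated mean as 2 * mu * [L(3/4) - L(1/4)]."""
    mu = alpha * beta / (alpha - 1.0)
    return 2.0 * mu * (lorenz(0.75, alpha) - lorenz(0.25, alpha))


def truncated_mean_from_quartiles(alpha, beta=1.0):
    """Truncated mean as alpha / (alpha - 1) * (3/2 Q1 - 1/2 Q3)."""
    q1 = beta * 0.75 ** (-1.0 / alpha)
    q3 = beta * 0.25 ** (-1.0 / alpha)
    return alpha / (alpha - 1.0) * (1.5 * q1 - 0.5 * q3)


# For a very large alpha (nearly equal incomes) the truncated mean approaches mu.
big_alpha = 1e6
ratio_large_alpha = truncated_mean_from_lorenz(big_alpha) / (big_alpha / (big_alpha - 1.0))
```

The ratio computed in the last line is close to 1, in line with the limiting case of an equal distribution.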
We can provide a quantile interpretation of income inequality. The average income in the (25–75%) income bracket is $\mu^*$, shown in Figure 8a. If everyone had the same (minimum) income $\beta$, then we would have $\mu^* = \beta$. The difference $(\mu^* - \beta)$ between the average (truncated) income $\mu^*$ and the minimum income $\beta$ represents the deviation from equality. This is illustrated in Figure 8b.
This is analogous to the interpretation of inequality in terms of the overall mean μ and β shown in Figure 6.
The approach outlined above of truncating the distribution at the quartiles and estimating the parameters by computing the subarea integrals of the quantile function between $p = 1/4$ and $p = 3/4$ is quite general. It can be applied to distributions with or without finite moments or variances if the quantile function is available. Such distributions arise in many areas of risk management and finance (Chen & Wang, 2025).
We can consider such an approach and obtain closed-form approximations for parameters for a number of important distributions, including the Lévy distribution, which does not have any moments, and the Gumbel extreme-value distribution. These distributions are widely used in risk management and finance (Rachev, 2003).
In particular, for the Lévy distribution, the quantile function can be expressed in terms of the inverse $\Phi^{-1}(\cdot)$ of the Gaussian CDF and is
$$Q(p) = \mu + \frac{c}{\left[ \Phi^{-1}(1 - p/2) \right]^2}$$
and using the approach outlined above, we were able to derive simple approximations for the parameters by truncating the distribution at the quartiles and expressing the subarea integrals in terms of the corresponding quartiles and octiles of the normal distribution.
For the Gumbel distribution with location $\mu$ and scale $\beta$, the quantile function is
$$Q(p) = \mu - \beta \log(-\log(p))$$
Using our approach, we should be able to compute the parameters in terms of quartiles and logarithmic integrals (constants).
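As an illustration of how the truncation approach carries over, the two quantile functions above can be coded directly and the truncated mean obtained by numerical quadrature. This is a sketch under stated assumptions: it uses `statistics.NormalDist` from the Python standard library for $\Phi^{-1}$, default parameters $\mu = 0$, $c = 1$, $\beta = 1$, and a composite midpoint rule; it does not reproduce the closed-form approximations mentioned in the text.

```python
import math
from statistics import NormalDist


def levy_quantile(p, mu=0.0, c=1.0):
    """Levy quantile Q(p) = mu + c / [Phi^{-1}(1 - p/2)]^2."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return mu + c / (z * z)


def gumbel_quantile(p, mu=0.0, beta=1.0):
    """Gumbel quantile Q(p) = mu - beta * log(-log(p))."""
    return mu - beta * math.log(-math.log(p))


def truncated_mean(quantile, n=2000):
    """Mean of the quartile-truncated variable: 2 * integral of Q(p) over [1/4, 3/4]."""
    h = 0.5 / n
    return 2.0 * h * sum(quantile(0.25 + (i + 0.5) * h) for i in range(n))


levy_mu_star = truncated_mean(levy_quantile)      # finite, although the Levy mean is not
gumbel_mu_star = truncated_mean(gumbel_quantile)
```

The truncated mean is finite for the Lévy distribution even though the distribution itself has no moments, which is what makes the quartile-based approach applicable there.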
When applied to Pareto, one immediate extension is the Generalized Pareto IV distribution with the quantile function
$$Q(p) = \mu + \frac{\sigma}{\xi}\left[ (1 - p)^{-\xi} - 1 \right]$$
This includes the Pareto distribution ($\xi > 0$ and $\mu = \sigma/\xi$), the exponential distribution ($\xi = \mu = 0$), the Lomax distribution ($\mu = 0$), and many others. When applied to financial markets, this distribution can be used to model extreme events and estimate the Value at Risk (VaR) and the expected shortfall (ES). Our method would allow a simple estimation of these quantities via the quartiles. This would allow us to see the effect of the upper-tail behavior on volatility and returns.
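The special cases listed above can be verified with a few lines of Python. The sketch below assumes the generalized Pareto quantile function $Q(p) = \mu + (\sigma/\xi)\left[(1-p)^{-\xi} - 1\right]$, treats $\xi = 0$ as the limiting exponential case, and uses the mapping $\xi = 1/\alpha$, $\sigma = \beta/\alpha$, $\mu = \beta$ to recover the Pareto distribution; this mapping is derived here by matching quantile functions and is not quoted from the paper.

```python
import math


def gpd_quantile(p, mu, sigma, xi):
    """Generalized Pareto quantile; xi = 0 is handled as the exponential limit."""
    if xi == 0.0:
        return mu - sigma * math.log(1.0 - p)
    return mu + sigma / xi * ((1.0 - p) ** (-xi) - 1.0)


def pareto_quantile(p, alpha, beta):
    """Pareto quantile Q(p) = beta * (1 - p)^(-1 / alpha)."""
    return beta * (1.0 - p) ** (-1.0 / alpha)


alpha, beta = 1.5, 2.0          # assumed example parameters
xi = 1.0 / alpha
sigma = beta * xi               # so that mu = sigma / xi = beta
gap = max(
    abs(gpd_quantile(p, beta, sigma, xi) - pareto_quantile(p, alpha, beta))
    for p in (0.1, 0.25, 0.5, 0.75, 0.9)
)
```

With these parameters the two quantile functions coincide up to rounding, so the quartile-based formulas for the Pareto case are a special case of any extension to the generalized family.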
Unlike the Pareto distribution considered in this paper, the generalized Pareto distribution has multiple parameters. To estimate these, we can consider a MAD-based approach: as before, we consider a truncated distribution, but we estimate three mean absolute deviations, $H^*(Q_1)$, $H^*(M)$, and $H^*(Q_3)$, around the quartiles as well as the truncated mean $\mu^*$. This gives us four equations in four unknowns that can be solved numerically. For many other distributions, such as log-normal, Cauchy, and Weibull, we can derive closed-form solutions for the parameters in terms of simple formulas involving the quartiles. We hope to address the analysis of the generalized Pareto distribution in our future work.

8. Approximating Tail and Gini Index Using the Mean Absolute Deviation of the Truncated Distribution

In this section, we derive approximations to the tail index and Gini index using the mean absolute deviation $H^*$ of the truncated distribution. We will call this method $H^*$-based. We proceed as follows. From Equation (37), we obtain
$$\alpha = \frac{H^*}{H^* - 2M + (3Q_1 + Q_3)/2}, \qquad G = \frac{H^* - 2M + (3Q_1 + Q_3)/2}{H^* + 2M - (3Q_1 + Q_3)/2}$$
In terms of subarea integrals, we can rewrite these expressions as
$$\alpha = \frac{2(I_2 - I_1)}{2(I_2 - I_1) - 2M + (3Q_1 + Q_3)/2}, \qquad G = \frac{2(I_2 - I_1) - 2M + (3Q_1 + Q_3)/2}{2(I_2 - I_1) + 2M - (3Q_1 + Q_3)/2}$$
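The inversion of Equation (37) can be sketched in a few lines of Python. The code below assumes the closed form of Equation (37), written as MAD $= \frac{\alpha}{\alpha-1}\, d$ with $d = 2M - (3Q_1 + Q_3)/2$, and solves it for $\alpha$ and $G$; the function name and the test values ($\alpha = 1.5$, $\beta = 1$) are illustrative assumptions.

```python
def mad_based_estimates(mad, q1, m, q3):
    """Solve MAD = alpha / (alpha - 1) * d, d = 2M - (3 Q1 + Q3) / 2, for alpha and G."""
    d = 2.0 * m - (3.0 * q1 + q3) / 2.0
    alpha = mad / (mad - d)
    gini = (mad - d) / (mad + d)
    return alpha, gini


# Assumed test case: exact Pareto quartiles for alpha = 1.5 and beta = 1.
alpha_true = 1.5
q1, m, q3 = ((1.0 - p) ** (-1.0 / alpha_true) for p in (0.25, 0.5, 0.75))

# With the exact truncated MAD, the inversion recovers alpha and G = 1 / (2 alpha - 1).
mad_exact = alpha_true / (alpha_true - 1.0) * (2.0 * m - (3.0 * q1 + q3) / 2.0)
alpha_exact, gini_exact = mad_based_estimates(mad_exact, q1, m, q3)

# With a quarter of the interquartile range as the MAD, the result is approximate.
alpha_approx, gini_approx = mad_based_estimates((q3 - q1) / 4.0, q1, m, q3)
```

In practice the truncated mean absolute deviation would be computed from the data or approximated from the quartiles, and the same function applies in both cases.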
To compute $\alpha$ and $G$, we propose to approximate the subarea integrals $I_1$ and $I_2$ by numerical quadrature. In a quadrature approximation, we approximate the integral of a function by a weighted sum of function values at selected points, the so-called “abscissas” or “quadrature points” (Rudin, 1976). When the function in question is a quantile function, its integral is approximated by a weighted sum of quantiles. To illustrate this with a very simple example, consider the integral of the quantile function between points $p_1$ and $p_2$. This integral can be approximated by the area of a rectangle with base length $(p_2 - p_1)$ and height $Q(p_1)$. In this paper, we will confine ourselves to very simple rules, such as the “midpoint”, trapezoid, and Simpson rules (Olver et al., 2010). We will show that even these simple rules provide valuable insights into interpreting the shape metrics.
Let us consider the so-called trapezoid rule (Rudin, 1976): $I_1 \approx (Q_1 + M)/8$ and $I_2 \approx (M + Q_3)/8$. From Table 1, this can be interpreted as setting the average income $E(Q_1 \le X \le M)$ of the (25–50%) group to $(Q_1 + M)/2$ and setting the average income $E(M \le X \le Q_3)$ of the (50–75%) group to $(Q_3 + M)/2$. The difference of subarea integrals $(I_2 - I_1)$ is then $(Q_3 - Q_1)/8$. This is illustrated in Figure 9.
With this approximation for $(I_2 - I_1)$, we obtain
$$H^* \approx \frac{1}{4}(Q_3 - Q_1), \qquad \alpha \approx \frac{1}{2} \cdot \frac{Q_3 - Q_1}{Q_1 + Q_3 - 2M}, \qquad G \approx \frac{1}{2} \cdot \frac{Q_1 + Q_3 - 2M}{M - Q_1}$$
These equations have a simple interpretation in terms of well-known metrics used in quantile statistics (Gilchrist, 2000), summarized in Table 2.
In terms of these metrics, we obtain the following simple interpretation of Equation (48):
  • The mean absolute deviation $H^*$ of the truncated distribution around its median is approximately a quarter of the interquartile range $(Q_3 - Q_1)$;
  • The tail index $\alpha$ is approximately one-half of the inverse of the Galton skewness;
  • The Gini index $G$ is approximately one-half of the ratio of the quartile difference to the lower quartile difference $(M - Q_1)$.
We will find it convenient to consider the weighted quartile difference $R(w)$ with $0 \le w \le 1$, defined as
$$R(w) = w \cdot (M - Q_1) + (1 - w) \cdot (Q_3 - M)$$
This quantile metric $R(w)$ measures the spread of data with different weights $w$ and $(1 - w)$ assigned to the lower and upper quartile differences $(M - Q_1)$ and $(Q_3 - M)$, respectively. With this metric, we can express our results in Equation (48) as
$$H^* \approx \frac{R(0) + R(1)}{4}, \qquad \alpha \approx \frac{1}{2} \cdot \frac{R(0) + R(1)}{R(0) - R(1)}, \qquad G \approx \frac{1}{2}\left( \frac{R(0)}{R(1)} - 1 \right)$$
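The weighted quartile difference is straightforward to compute. The sketch below implements $R(w)$ as defined in Equation (49) together with the Galton skewness $(Q_1 + Q_3 - 2M)/(Q_3 - Q_1)$, using the quartiles $Q_1 = 14$, $M = 21$, $Q_3 = 38$ quoted later for the Global Billionaire dataset as an example; the function names are illustrative.

```python
def weighted_quartile_difference(w, q1, m, q3):
    """R(w) = w * (M - Q1) + (1 - w) * (Q3 - M), for 0 <= w <= 1."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * (m - q1) + (1.0 - w) * (q3 - m)


def galton_skewness(q1, m, q3):
    """Galton skewness (Q1 + Q3 - 2M) / (Q3 - Q1) = (R(0) - R(1)) / (R(0) + R(1))."""
    return (q1 + q3 - 2.0 * m) / (q3 - q1)


q1, m, q3 = 14.0, 21.0, 38.0
r0 = weighted_quartile_difference(0.0, q1, m, q3)     # upper quartile difference Q3 - M
r1 = weighted_quartile_difference(1.0, q1, m, q3)     # lower quartile difference M - Q1
r_half = weighted_quartile_difference(0.5, q1, m, q3)  # half of the interquartile range
skew = galton_skewness(q1, m, q3)
```

Here $R(0) + R(1)$ equals the interquartile range and $R(1/2)$ equals half of it, which is why the interquartile range and the Galton skewness both appear as special combinations of $R(w)$.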

9. Approximating Tail and Gini Index Using the Mean of the Truncated Distribution

In this section, we derive approximations for $\alpha$ and $G$ using the mean $\mu^*$ of the truncated distribution. In economic terms, we estimate these parameters by different approximations to the mean income. To proceed, let us define $I = I_1 + I_2$ as the total subarea under $Q(p)$ with $1/4 \le p \le 3/4$. Then, $\mu^* = 2I$, and from Equation (38), we have
$$I = \frac{\alpha}{\alpha - 1}\left( \frac{3}{4} Q_1 - \frac{1}{4} Q_3 \right)$$
From this, we obtain the following expressions for α and G:
$$\alpha = \frac{\mu^*}{\mu^* - (3Q_1 - Q_3)/2} \quad \text{and} \quad G = \frac{\mu^* - (3Q_1 - Q_3)/2}{\mu^* + (3Q_1 - Q_3)/2}$$
We can now consider a number of quadrature methods for approximating $\mu^*$ to obtain simple approximations to the tail index $\alpha$ and the Gini coefficient $G$ (for a comprehensive survey of integral quadrature methods, see Masud et al. (2024)). We will compare the relative error of the proposed approximations at the end of this section. We will consider the following methods:
  • “Midpoint” method: approximate the mean $\mu^*$ by the median $M$;
  • “One-Trapezoid” method: approximate the mean $\mu^*$ by the average of the first and third quartiles $Q_1$ and $Q_3$;
  • “Two-Trapezoids” method: approximate the mean $\mu^*$ by the average of the median $M$ with the average of the first and third quartiles $Q_1$ and $Q_3$;
  • “Quartile Average” method: approximate the mean income $\mu^*$ by the average of the three quartiles $Q_1$, $M$, and $Q_3$;
  • “Simpson 1/3” method: approximate the quantile function $Q(p)$ by a parabola passing through the quartiles. This will be shown to be equivalent to approximating the mean $\mu^*$ by a weighted average of the “Two-Trapezoids” and “Quartile Average” methods.
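The five approximations listed above differ only in how the truncated mean is estimated from the quartiles. The following sketch collects them in one table and plugs each estimate into the expression for $\alpha$ in terms of the truncated mean; the dictionary keys, function names, and the test case ($\alpha = 1.5$, $\beta = 1$ with exact Pareto quartiles) are assumptions made for illustration.

```python
# Each entry maps the quartiles (Q1, M, Q3) to an estimate of the truncated mean.
MEAN_APPROXIMATIONS = {
    "midpoint": lambda q1, m, q3: m,
    "one_trapezoid": lambda q1, m, q3: (q1 + q3) / 2.0,
    "two_trapezoids": lambda q1, m, q3: (q1 + 2.0 * m + q3) / 4.0,
    "quartile_average": lambda q1, m, q3: (q1 + m + q3) / 3.0,
    "simpson": lambda q1, m, q3: (q1 + 4.0 * m + q3) / 6.0,
}


def estimate_alpha(method, q1, m, q3):
    """alpha = mean / (mean - c) with c = (3 Q1 - Q3) / 2 and an approximate mean."""
    mu_hat = MEAN_APPROXIMATIONS[method](q1, m, q3)
    c = (3.0 * q1 - q3) / 2.0
    return mu_hat / (mu_hat - c)


alpha_true = 1.5
q1, m, q3 = ((1.0 - p) ** (-1.0 / alpha_true) for p in (0.25, 0.5, 0.75))
relative_errors = {
    name: abs(estimate_alpha(name, q1, m, q3) - alpha_true) / alpha_true
    for name in MEAN_APPROXIMATIONS
}
best_method = min(relative_errors, key=relative_errors.get)
```

For this test case the Simpson estimate has the smallest relative error. The Simpson mean also equals twice the Two-Trapezoids mean minus the Quartile Average mean, so the "weighted average" in the last bullet uses the weights 2 and -1.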

9.1. “Midpoint” Method for Approximating Tail and Gini Indices

Recall the midpoint rule for approximation of the integral of $g(x)$ between $a$ and $b$:
$$\int_a^b g(x)\,dx \approx (b - a) \cdot g\left( \frac{a + b}{2} \right)$$
This is illustrated in Figure 10.
Using this rule, we have $(I_1 + I_2) \approx M/2$, and from Equation (38), we have
$$\frac{\alpha}{\alpha - 1} \cdot \frac{3Q_1 - Q_3}{2} \approx M$$
From Table 1, we can interpret this approximation as follows: The average income $E(Q_1 \le X \le Q_3)$ of the middle (25–75%) income group is approximately the median income $M$.
From Equations (38) and (54), we obtain the following approximation for α and G:
$$\alpha = 1 + \frac{3Q_1 - Q_3}{2M - 3Q_1 + Q_3} \quad \text{and} \quad G = 1 - 2\,\frac{3Q_1 - Q_3}{2M + 3Q_1 - Q_3}$$
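In terms of the weighted quartile difference defined in Equation (49), we can rewrite the above expression for $\alpha$ and $G$ as
$$\alpha = 1 + \frac{1}{2} \cdot \frac{M - R(\tfrac{1}{2})}{R(\tfrac{3}{4})} \quad \text{and} \quad G = 1 - \frac{M - R(\tfrac{1}{2})}{M - R(\tfrac{3}{4})}$$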

9.2. “One Trapezoid” Method for Approximating Tail and Gini Indices

Recall the trapezoid rule for approximation of the integral of $g(x)$ between $a$ and $b$:
$$\int_a^b g(x)\,dx \approx (b - a) \cdot \frac{g(a) + g(b)}{2}$$
We approximate the subarea $I = I_1 + I_2$ by the corresponding trapezoid and obtain $I \approx (Q_1 + Q_3)/4$. This is illustrated in Figure 11.
This can be rewritten using Equation (38) as
$$\frac{\alpha}{\alpha - 1} \cdot \frac{3Q_1 - Q_3}{2} \approx \frac{Q_1 + Q_3}{2}$$
From Table 1, the approximation in Equation (58) has a simple interpretation: the average income $E(Q_1 \le X \le Q_3)$ in the middle (25–75%) income group is approximately the average of the first and third quartiles.
From Equations (38) and (58), we get the following estimate of the tail index $\alpha$ and the Gini index $G$:
$$\alpha = 1 + \frac{1}{2} \cdot \frac{3Q_1 - Q_3}{Q_3 - Q_1} \quad \text{and} \quad G = 1 - \frac{1}{2} \cdot \frac{3Q_1 - Q_3}{Q_1}$$
With this approximation, the tail index simplifies to $\alpha = (Q_1 + Q_3)/\big(2(Q_3 - Q_1)\big)$: it is the average of the first and third quartiles, $Q_1$ and $Q_3$, divided by the interquartile range $\mathrm{IQR} = Q_3 - Q_1$. The Gini coefficient simplifies to $G = (Q_3 - Q_1)/(2Q_1)$: it is one-half of the ratio of the interquartile range $(Q_3 - Q_1)$ to the first quartile $Q_1$.
In terms of the weighted quartile difference defined in Equation (49), we can rewrite the above expression for $\alpha$ and $G$ as
$$\alpha = 1 + \frac{M - R(\tfrac{1}{2})}{R(\tfrac{1}{2})} \quad \text{and} \quad G = 1 - \frac{M - R(\tfrac{1}{2})}{M - R(1)}$$

9.3. “Two Trapezoids” Method for Approximating Tail and Gini Indices

If we divide the interval $[a, b]$ into $n$ equal sub-intervals $a = x_0 < x_1 < \cdots < x_n = b$, then
$$\int_a^b g(x)\,dx \approx \frac{b - a}{n} \sum_{i=1}^{n} \frac{g(x_{i-1}) + g(x_i)}{2}$$
Therefore, we can consider the two sub-intervals $[1/4, 1/2]$ and $[1/2, 3/4]$. We approximate the subareas $I_1$ and $I_2$ by separate (left and right) trapezoids: $I_1 \approx (Q_1 + M)/8$ and $I_2 \approx (M + Q_3)/8$. This is illustrated in Figure 12.
From Table 1, this has a simple interpretation: the average income $E(Q_1 \le X \le M)$ in the (25–50%) income group is approximately the average of the first and second quartiles, whereas the average income $E(M \le X \le Q_3)$ in the (50–75%) income group is approximately the average of the second and third quartiles.
We can rewrite this using Equation (38) as
$$\frac{\alpha}{\alpha - 1} \cdot \frac{3Q_1 - Q_3}{2} \approx \frac{Q_1 + 2M + Q_3}{4}$$
From this, we get the following estimates for α and G:
$$\alpha = 1 + 2\,\frac{3Q_1 - Q_3}{2M - 5Q_1 + 3Q_3} \quad \text{and} \quad G = 1 - 4\,\frac{3Q_1 - Q_3}{2M + 7Q_1 - Q_3}$$
In terms of the weighted quartile difference defined in Equation (49), we can rewrite the above expression for $\alpha$ and $G$ as
$$\alpha = 1 + \frac{1}{2} \cdot \frac{M - R(\tfrac{1}{2})}{R(\tfrac{5}{8})} \quad \text{and} \quad G = 1 - \frac{M - R(\tfrac{1}{2})}{M - R(\tfrac{7}{8})}$$
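The composite trapezoid rule used in this subsection can be written generically and applied to the quantile function with two sub-intervals. The sketch below assumes exact Pareto quartiles for $\alpha = 1.5$ and $\beta = 1$ and checks that the quadrature route and the closed-form quartile expression for $\alpha$ give the same number; the function name is hypothetical.

```python
def composite_trapezoid(g, a, b, n):
    """Composite trapezoid rule with n equal sub-intervals of [a, b]."""
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    return h * sum((g(xs[i - 1]) + g(xs[i])) / 2.0 for i in range(1, n + 1))


alpha_true = 1.5
quantile = lambda p: (1.0 - p) ** (-1.0 / alpha_true)   # Pareto quantile, beta = 1
q1, m, q3 = quantile(0.25), quantile(0.5), quantile(0.75)

area = composite_trapezoid(quantile, 0.25, 0.75, 2)    # two trapezoids: (Q1 + 2M + Q3) / 8
mu_hat = 2.0 * area                                     # approximate truncated mean
c = (3.0 * q1 - q3) / 2.0
alpha_from_area = mu_hat / (mu_hat - c)
alpha_closed_form = 1.0 + 2.0 * (3.0 * q1 - q3) / (2.0 * m - 5.0 * q1 + 3.0 * q3)
```

Increasing `n` would require quantiles beyond the quartiles (octiles for `n = 4`, and so on), which is the trade-off between accuracy and the simplicity of a purely quartile-based formula.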

9.4. “Quartile Average” Method for Approximating Tail and Gini Indices

We approximate the mean value $\mu^* = 2(I_1 + I_2)$ by the average $(Q_1 + M + Q_3)/3$ of the three quartiles $Q_1$, $M$, and $Q_3$. This is illustrated in Figure 13.
From Table 1, this has a simple interpretation: the average income $E(Q_1 \le X \le Q_3)$ in the (25–75%) income group is approximately the average of the three quartiles $Q_1$, $M$, and $Q_3$.
From Equation (38), this gives us
$$\frac{\alpha}{\alpha - 1}\left( \frac{3}{2} Q_1 - \frac{1}{2} Q_3 \right) \approx \frac{Q_1 + M + Q_3}{3}$$
From this, we obtain
$$\alpha = 1 + 3\,\frac{3Q_1 - Q_3}{2M - 7Q_1 + 5Q_3} \quad \text{and} \quad G = 1 - 6\,\frac{3Q_1 - Q_3}{2M + 11Q_1 - Q_3}$$
In terms of the weighted quartile difference defined in Equation (49), we can rewrite the above expression for $\alpha$ and $G$ as
$$\alpha = 1 + \frac{M - R(\tfrac{1}{2})}{R(\tfrac{7}{12})} \quad \text{and} \quad G = 1 - \frac{M - R(\tfrac{1}{2})}{M - R(\tfrac{11}{12})}$$

9.5. “Simpson 1/3” Method for Approximating Tail and Gini Indices

Recall Simpson’s rule (Rudin, 1976) for approximation of the integral of $g(x)$ between $a$ and $b$:
$$\int_a^b g(x)\,dx \approx \frac{b - a}{6}\left[ g(a) + 4\,g\left( \frac{a + b}{2} \right) + g(b) \right]$$
This is accomplished by approximating the quantile function by a parabola $h(p)$ passing through the three points $(\tfrac{1}{4}, Q_1)$, $(\tfrac{1}{2}, M)$, and $(\tfrac{3}{4}, Q_3)$. By the Lagrange interpolation formula (Gradshteyn & Ryzhik, 1980), a parabola $y(x)$ passing through three points $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$ has the equation
$$y(x) = y_1 \frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)} + y_2 \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)} + y_3 \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)}$$
With this approximation, we replace the quantile function $Q(p)$ by a parabola $h(p)$:
$$h(p) = Q_1 \frac{(p - \tfrac{1}{2})(p - \tfrac{3}{4})}{(-\tfrac{1}{4}) \cdot (-\tfrac{1}{2})} + Q_2 \frac{(p - \tfrac{1}{4})(p - \tfrac{3}{4})}{\tfrac{1}{4} \cdot (-\tfrac{1}{4})} + Q_3 \frac{(p - \tfrac{1}{4})(p - \tfrac{1}{2})}{\tfrac{1}{2} \cdot \tfrac{1}{4}}$$
This is illustrated in Figure 14.
Applying this to the subarea integrals of the truncated distribution, we have from Equation (38)
$$\frac{\alpha}{\alpha - 1}\left( \frac{3}{2} Q_1 - \frac{1}{2} Q_3 \right) \approx \frac{Q_1 + 4M + Q_3}{6}$$
With this method, we approximate the average $\mu^*$ by $\mu^* \approx (Q_1 + 4M + Q_3)/6$. From Table 1 and Equation (69), this has the following interpretation: The average income $E(Q_1 \le X \le Q_3)$ in the (25–75%) income group is approximately the weighted average of the quartiles, where the weight of the median $M$ is $2/3$ and the weight of each of the other two quartiles, $Q_1$ and $Q_3$, is $1/6$.
This gives us
$$\alpha = 1 + \frac{3}{4} \cdot \frac{3Q_1 - Q_3}{M - 2Q_1 + Q_3} \quad \text{and} \quad G = 1 - 3\,\frac{3Q_1 - Q_3}{2M + 5Q_1 - Q_3}$$
In terms of the weighted quartile difference defined in Equation (49), we can rewrite the above expression for $\alpha$ and $G$ as
$$\alpha = 1 + \frac{1}{2} \cdot \frac{M - R(\tfrac{1}{2})}{R(\tfrac{2}{3})} \quad \text{and} \quad G = 1 - \frac{M - R(\tfrac{1}{2})}{M - R(\tfrac{5}{6})}$$
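The link between the interpolating parabola and Simpson's rule can be checked numerically. The sketch below builds the Lagrange parabola through the three quartile points, integrates it over $[1/4, 3/4]$ with a fine midpoint rule, and compares the result with $(Q_1 + 4M + Q_3)/12$; the quartiles are the exact Pareto values for an assumed $\alpha = 1.5$ and $\beta = 1$, and the function names are illustrative.

```python
def lagrange_parabola(p, q1, m, q3):
    """Parabola through (1/4, Q1), (1/2, M), and (3/4, Q3)."""
    return (
        q1 * (p - 0.5) * (p - 0.75) / ((-0.25) * (-0.5))
        + m * (p - 0.25) * (p - 0.75) / (0.25 * (-0.25))
        + q3 * (p - 0.25) * (p - 0.5) / (0.5 * 0.25)
    )


def midpoint_integral(g, a, b, n=20000):
    """Composite midpoint rule, used here only as an independent check."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


alpha_true = 1.5
q1, m, q3 = ((1.0 - p) ** (-1.0 / alpha_true) for p in (0.25, 0.5, 0.75))

area_parabola = midpoint_integral(lambda p: lagrange_parabola(p, q1, m, q3), 0.25, 0.75)
area_simpson = (q1 + 4.0 * m + q3) / 12.0
alpha_simpson = 1.0 + 0.75 * (3.0 * q1 - q3) / (m - 2.0 * q1 + q3)
```

The two areas agree to within the accuracy of the midpoint check, and the resulting tail index estimate is within a fraction of a percent of the assumed value 1.5 for this test case.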

10. Summary of Methods, Estimation Procedure, and Error Analysis

All of the proposed methods can be summarized by their approach to estimating the truncated mean $\mu^*$. We summarize these methods in Table 3.
We summarize the obtained formulas in Table 4.
We can compare the values for $\alpha$ by rewriting the results in terms of the median $M$ and the weighted quartile difference $R(w)$ from Table 2. These are summarized in Table 5.
We can compute the exact values of the relative errors for different $\alpha$ in the range $1 < \alpha < 2$. These values are summarized in Table 6.
As can be seen from Table 6, the $H^*$-based method gives the worst results, similar to those of the “One-Trapezoid” method, with a typical relative error around 6–7%. The best results are obtained by the Simpson 1/3 method, with relative errors of about 0.25%. We note that the error increases for larger values of $\alpha$. This can be explained by noting from Equation (5) that for larger $\alpha$, the quantile function increases faster. Asymptotically, as $p \to 1$, the quantile function gets steeper (as in Figure 4b), resulting in less accuracy in the $p > 1/2$ region.
Similarly, for the Gini index, we can compute the exact values of the relative errors for different $\alpha$ in the range $1 < \alpha < 2$. These values are summarized in Table 7.
We get similar results: the best approximation is obtained with the Simpson 1/3 rule, whose average relative errors are less than 0.5%. The results in Table 6 and Table 7 suggest that the Simpson 1/3 method gives the most accurate result. The One-Trapezoid method (estimating the truncated mean $\mu^*$ by the average of the first and third quartiles) gives the worst result but provides a simpler interpretation. The $H^*$-based method has similar accuracy to the One-Trapezoid method and has a very simple and intuitive explanation.
The closed-form approximations for the tail index $\alpha$ and the Gini index $G$ make it easier to incorporate income mobility to analyze changes in these metrics. For example, consider the case where both $Q_1$ and $Q_3$ are increased by the same amount $\Delta$. Let $\alpha'$ denote the resulting tail index under this change. Then, with the One-Trapezoid approximation, we obtain
$$\alpha' = 1 + \frac{1}{2} \cdot \frac{3(Q_1 + \Delta) - (Q_3 + \Delta)}{(Q_3 + \Delta) - (Q_1 + \Delta)} = 1 + \frac{1}{2} \cdot \frac{3Q_1 - Q_3}{Q_3 - Q_1} + \frac{\Delta}{Q_3 - Q_1} = \alpha + \frac{\Delta}{Q_3 - Q_1}$$
In particular, if $\Delta = C\alpha(Q_3 - Q_1)$ for some $C$, then $\alpha' = (1 + C)\alpha$.
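The effect of a uniform shift of the quartiles can be checked with a short numerical example. The sketch below uses the One-Trapezoid formula $\alpha = 1 + \tfrac{1}{2}(3Q_1 - Q_3)/(Q_3 - Q_1)$ with assumed illustrative values $Q_1 = 14$, $Q_3 = 38$, and $\Delta = 2$.

```python
def alpha_one_trapezoid(q1, q3):
    """One-Trapezoid tail index: 1 + (3 Q1 - Q3) / (2 (Q3 - Q1))."""
    return 1.0 + 0.5 * (3.0 * q1 - q3) / (q3 - q1)


q1, q3, delta = 14.0, 38.0, 2.0           # assumed illustrative values
alpha_before = alpha_one_trapezoid(q1, q3)
alpha_after = alpha_one_trapezoid(q1 + delta, q3 + delta)
alpha_predicted = alpha_before + delta / (q3 - q1)   # shift formula from the text
```

The directly recomputed value and the value predicted by the shift formula coincide, since a uniform shift leaves the interquartile range unchanged and only moves the numerator.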

11. Estimation Procedures

Our estimation procedure is the following. We are given $N$ points $x_1 \le x_2 \le \cdots \le x_N$ drawn from a Pareto distribution with shape $\alpha$ and scale $\beta$. We can consider two approaches: using $H^*$ or using the truncated mean $\mu^*$. For both approaches, we compute the first quartile $Q_1$, the median $Q_2 = M$, and the third quartile $Q_3$ and consider the $n$ points $\{x_1, \ldots, x_n\}$ in the truncated distribution with $Q_1 \le x_i \le Q_3$.
  • Using the mean absolute deviation $H^*$:
    • Compute $\beta = \min\{x_1, \ldots, x_N\}$;
    • Compute the first quartile $Q_1$, the median $Q_2 = M$, and the third quartile $Q_3$;
    • Consider the $n$ points $\{x_1, \ldots, x_n\}$ in the truncated distribution with $Q_1 \le x_i \le Q_3$. Note that because we truncate at the quartiles, we have $n = N/2$;
    • Estimate the mean absolute deviation $H^*$ around the median $M$. This can be done either using the quartiles, $H^* = (Q_3 - Q_1)/4$, or by computing $H^*$ directly:
      $$H^* = \frac{|x_1 - M| + |x_2 - M| + \cdots + |x_n - M|}{n}$$
    • Compute $\alpha$ and $G$ from Equation (47):
      $$\alpha = \frac{H^*}{H^* - 2M + (3Q_1 + Q_3)/2}, \qquad G = \frac{H^* - 2M + (3Q_1 + Q_3)/2}{H^* + 2M - (3Q_1 + Q_3)/2}$$
  • Using the mean $\mu^*$ of the truncated distribution:
    • Compute $\beta = \min\{x_1, \ldots, x_N\}$;
    • Compute the first quartile $Q_1$, the median $Q_2 = M$, and the third quartile $Q_3$;
    • Consider the $n$ points $\{x_1, \ldots, x_n\}$ in the truncated distribution with $Q_1 \le x_i \le Q_3$. Note that because we truncate at the quartiles, we have $n = N/2$;
    • Estimate the mean $\mu^*$. This can be done either using the quartiles with the formulas presented in Table 4 or by computing the mean $\mu^*$ directly:
      $$\mu^* = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
    • Compute $\alpha$ and the Gini coefficient $G$ from Equation (22):
      $$\alpha = 1 + \frac{(3Q_1 - Q_3)/2}{\mu^* - (3Q_1 - Q_3)/2} \quad \text{and} \quad G = 1 - \frac{3Q_1 - Q_3}{\mu^* + (3Q_1 - Q_3)/2}$$
      The formulas in terms of the quartiles are presented in Table 4. The preferred way based on theoretical analysis is the Simpson 1/3 approximation.
  • Maximum Likelihood Estimation: The standard procedure for estimating distribution parameters is Maximum Likelihood Estimation (MLE). In datasets conforming to the Pareto distribution, which may contain outliers, MLE is prone to potential bias and reduced efficiency. The Maximum Likelihood Estimation for Pareto is
    $$\beta = \min\{x_1, \ldots, x_N\} \quad \text{and} \quad \alpha = \frac{N}{\log(x_1/\beta) + \cdots + \log(x_N/\beta)}$$
    Note that MLE uses all $N$ points, whereas the proposed procedures use all points to compute $\beta$ and the 50% of the points from $Q_1$ to $Q_3$. This means that MLE would be influenced by outlier(s) in the upper tail (beyond $Q_3$), whereas the estimates for the truncated distribution will not change. To see this, consider the following simple example. If we increase the largest value $x_N$ by a factor of $c > 1$, then the MLE estimate $\alpha'$ based on $x_1, \ldots, x_{N-1}, c\,x_N$ would change (decrease) since
    $$\alpha' = \frac{N}{\log(x_1/\beta) + \cdots + \log(x_N/\beta) + \log c} < \frac{N}{\log(x_1/\beta) + \cdots + \log(x_N/\beta)} = \alpha$$
    On the other hand, such a change does not affect the truncated values $Q_1 \le x_i \le Q_3$ and therefore does not change the estimates based on the quartiles or the MAD of the truncated distribution, for fixed $x_1, \ldots, x_{N-1}$.
  • Quantile Matching and other methods: We can consider other estimation methods, including the method of moments, fractional moments, quantile and log-quantile least squares, Bayesian methods, generalized order statistics, percentile points, least squares and weighted least squares, and Monte Carlo simulation, to name just a few (Caeiro, 2024; Habibullah & Ahsanullah, 2000; Sharpe & Juarez, 2023; Warsono et al., 2019). However, these methods do not yield a closed-form and easily explainable solution and require numerical iteration. By contrast, our proposed approximations are simple and easily explainable.
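The estimation steps listed above can be sketched end to end with the Python standard library. This is a minimal illustration rather than the authors' implementation: it assumes inverse-transform sampling from a Pareto distribution with $\beta = 1$, the `inclusive` quartile convention of `statistics.quantiles`, a fixed random seed, and a sample size of 20,000 chosen only to keep the sampling noise small; the function names and dictionary keys are hypothetical.

```python
import math
import random
import statistics


def estimate_tail_index(sample):
    """Tail-index estimates from the middle half of the data and from MLE."""
    xs = sorted(sample)
    q1, m, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    middle = [x for x in xs if q1 <= x <= q3]                # truncated sample
    mu_star = sum(middle) / len(middle)                       # truncated mean
    h_star = sum(abs(x - m) for x in middle) / len(middle)    # MAD around the median
    c = (3.0 * q1 - q3) / 2.0              # truncated mean = alpha / (alpha - 1) * c
    d = 2.0 * m - (3.0 * q1 + q3) / 2.0    # truncated MAD  = alpha / (alpha - 1) * d
    beta = xs[0]                           # scale estimate: sample minimum
    return {
        "mad": h_star / (h_star - d),
        "mean": mu_star / (mu_star - c),
        "simpson": 1.0 + 0.75 * (3.0 * q1 - q3) / (m - 2.0 * q1 + q3),
        "mle": len(xs) / sum(math.log(x / beta) for x in xs),
    }


def gini_from_alpha(alpha):
    """Pareto Gini index G = 1 / (2 alpha - 1), valid for alpha > 1."""
    return 1.0 / (2.0 * alpha - 1.0)


rng = random.Random(0)                     # fixed seed, assumed for reproducibility
alpha_true = 1.5
sample = [(1.0 - rng.random()) ** (-1.0 / alpha_true) for _ in range(20000)]
estimates = estimate_tail_index(sample)
gini_estimates = {name: gini_from_alpha(a) for name, a in estimates.items()}
```

The `"simpson"` entry uses only the three quartiles, while `"mean"` and `"mad"` use the sample mean and mean absolute deviation of the middle half of the data; any of the other quartile formulas from Section 9 could be substituted in the same place.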

12. Numerical Results

We have conducted numerical evaluations to compare the performance of several of our proposed estimation methods against the traditional Maximum Likelihood Estimation (MLE), which utilizes the entire dataset, and to assess resilience to outliers. For this purpose, we generated 100 sample datasets with N = 1000 points. Each dataset was generated from a Pareto distribution, with the tail index parameter α varying from α = 1.1 to α = 2.0 (the most interesting range for practical applications). The results of these simulations are summarized to compare the accuracy of each method.
In our analysis of the tail index $\alpha$, as detailed in Table 8, a clear performance hierarchy emerges. The Maximum Likelihood Estimation (MLE) method has the highest accuracy. The Simpson 1/3 method is a close second, exhibiting remarkable precision. The Two-Trapezoids method also performs reliably, showing consistent results across the range of $\alpha$ values. In contrast, the $H^*$-based method yields the highest error rate. Notably, the performance of the One-Trapezoid method shows considerable variability with changes in $\alpha$, unlike the more stable Two-Trapezoids method.
A similar pattern is evident in the estimation of the Gini index $G$, as shown in Table 9. Once again, the Simpson 1/3 and Two-Trapezoids methods prove to be the most accurate, with their error rates being significantly lower than those of the other estimators.
Our simulation reveals that the Simpson 1/3 and MLE methods provide the most accurate estimations for both the tail index ($\alpha$) and the Gini index ($G$), consistently outperforming the $H^*$-based and One-Trapezoid approaches, especially at smaller sample sizes. However, as illustrated in Figure 15 for the tail index and Figure 16 for the Gini index, the performance of all methods improves as the sample size increases. Notably, the plots show that when the sample size is sufficiently large, such as $n = 800$, the accuracy of the various estimators converges, with most methods yielding results that are within a narrow 1–2% error rate of each other. Note that we use 50% fewer points in computing $\alpha$ and $G$ compared to the Maximum Likelihood estimation.
To assess the robustness of each method, the dataset was “contaminated” with outliers by artificially inflating the largest 1% of data points by a factor of $10^3$. As illustrated in Figure 17 and Figure 18, MLE proved highly sensitive to outliers, exhibiting increased relative errors for both the $\alpha$ parameter and the Gini coefficient by as much as 15%. On the other hand, the proposed estimators grounded in robust statistics and truncation, such as the Simpson 1/3 and the Two-Trapezoids methods, demonstrated remarkable resilience by maintaining low error rates across all sample sizes.
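The contamination experiment can be reproduced in miniature. The sketch below is an illustration under stated assumptions, not the simulation code behind the figures: it draws one sample of $N = 1000$ points with an assumed $\alpha = 1.5$ and a fixed seed, multiplies the largest 1% of the points by $10^3$, and compares the MLE with the quartile-only Simpson estimate before and after contamination.

```python
import math
import random
import statistics


def mle_alpha(xs):
    """Pareto MLE of the tail index with the sample minimum as scale estimate."""
    beta = min(xs)
    return len(xs) / sum(math.log(x / beta) for x in xs)


def simpson_alpha(xs):
    """Quartile-only Simpson estimate: 1 + (3/4)(3 Q1 - Q3) / (M - 2 Q1 + Q3)."""
    q1, m, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return 1.0 + 0.75 * (3.0 * q1 - q3) / (m - 2.0 * q1 + q3)


rng = random.Random(1)                      # fixed seed, assumed for reproducibility
clean = sorted((1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(1000))
k = len(clean) // 100                       # number of points in the top 1%
contaminated = clean[:-k] + [x * 1e3 for x in clean[-k:]]

mle_clean, mle_contaminated = mle_alpha(clean), mle_alpha(contaminated)
simpson_clean, simpson_contaminated = simpson_alpha(clean), simpson_alpha(contaminated)
```

Because the inflated points all lie above the third quartile, the quartiles and hence the Simpson estimate are unchanged, whereas every inflated point adds $\log 10^3$ to the denominator of the MLE and pulls it down.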

13. Case Study

To evaluate the practical performance of our proposed closed-form approximation methods, we conduct an analysis using five real-world wealth distribution datasets. Our study includes the following:
  • A synthetic Pareto dataset ($n$ = 10,000, $\alpha$ = 1.5) generated for validation purposes. For this dataset, the quartiles are $Q_1 = 1.21$, $M = 1.57$, and $Q_3 = 2.45$.
  • The Asia Fortune dataset ($n$ = 11,008), available at https://corgis-edu.github.io/corgis/csv/billionaires/ (last accessed on 8 February 2025). This dataset contains wealth information of affluent individuals across Asian markets. For this dataset, the quartiles are $Q_1 = 13$, $M = 19$, and $Q_3 = 32$.
  • The Global Billionaire dataset ($n$ = 30,497), available at https://www.kaggle.com/datasets/vincentcampanaro/forbes-worlds-billionaires-list-2024 (last accessed on 8 February 2025). This dataset provides longitudinal wealth rankings spanning multiple years with demographic information. For this dataset, the quartiles are $Q_1 = 14$, $M = 21$, and $Q_3 = 38$.
  • The Gender Money dataset ($n$ = 26,609), available at https://www.kaggle.com/datasets/fedesoriano/gender-pay-gap-dataset (last accessed on 8 February 2025). This dataset captures wealth distributions segmented by gender across a 14-year period (2010–2023), enabling analysis of inequality patterns across demographic groups. For this dataset, the quartiles are $Q_1 = 14$, $M = 22$, and $Q_3 = 39$.
  • The North America dataset ($n$ = 11,823), available at https://github.com/open-numbers/ddf–gapminder–billionaires (last accessed on 8 February 2025). This dataset focuses on wealthy individuals from the North American region. For this dataset, the quartiles are $Q_1 = 15$, $M = 23$, and $Q_3 = 40$.
  • The Forbes dataset ($n$ = 2509), available at https://www.kaggle.com/datasets/guillemservera/forbes-billionaires-1997-2023 (last accessed on 8 February 2025). This dataset represents recent global billionaire wealth data with values reaching hundreds of billions of dollars. For this dataset, the quartiles are $Q_1 = 1.5$, $M = 2.3$, and $Q_3 = 4.3$.
These datasets collectively span different scales of wealth (from hundreds of thousands to hundreds of billions), geographical regions, and time periods, providing a robust testbed for evaluating how well our quartile-based approximation methods ($H^*$, Midpoint, One Trapezoid, Two Trapezoids, Quartile Average, and Simpson 1/3) estimate the tail index $\alpha$ and Gini coefficient $G$ compared with maximum likelihood estimation and exact calculations.
For example, if we wish to calculate the tail index $\alpha$ for the Global Billionaire dataset with the midpoint method, we can use the formula from Table 4. Plugging in the quartile values $Q_1 = 14$, $M = 21$, and $Q_3 = 38$, we get:
$$\alpha = 1 + \frac{3Q_1 - Q_3}{2M - 3Q_1 + Q_3} = 1 + \frac{3 \cdot 14 - 38}{2 \cdot 21 - 3 \cdot 14 + 38} = 1 + \frac{4}{38} = 1.105$$
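The same midpoint calculation can be applied to all of the quartile triples quoted above. The sketch below hard-codes those quartiles and evaluates the midpoint formulas for $\alpha$ and $G$; the dictionary and function names are illustrative, and the results are the midpoint approximations only, not the values reported in the tables.

```python
# Quartiles (Q1, M, Q3) as quoted in the dataset descriptions.
QUARTILES = {
    "Synthetic Pareto": (1.21, 1.57, 2.45),
    "Asia Fortune": (13.0, 19.0, 32.0),
    "Global Billionaire": (14.0, 21.0, 38.0),
    "Gender Money": (14.0, 22.0, 39.0),
    "North America": (15.0, 23.0, 40.0),
    "Forbes": (1.5, 2.3, 4.3),
}


def midpoint_alpha(q1, m, q3):
    """Midpoint tail index: 1 + (3 Q1 - Q3) / (2M - 3 Q1 + Q3)."""
    return 1.0 + (3.0 * q1 - q3) / (2.0 * m - 3.0 * q1 + q3)


def midpoint_gini(q1, m, q3):
    """Midpoint Gini index: 1 - 2 (3 Q1 - Q3) / (2M + 3 Q1 - Q3)."""
    return 1.0 - 2.0 * (3.0 * q1 - q3) / (2.0 * m + 3.0 * q1 - q3)


midpoint_estimates = {
    name: (midpoint_alpha(*q), midpoint_gini(*q)) for name, q in QUARTILES.items()
}
alpha_global = midpoint_estimates["Global Billionaire"][0]   # 1 + 4/38, about 1.105
```

Swapping in one of the other quartile formulas from Table 4 only requires replacing the two small functions.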
Table 10 presents the actual Gini coefficient values computed using the exact formula and six quartile-based approximation methods across all datasets, showing that true Gini values range from 0.46 to 0.56, while approximation methods exhibit varying degrees of accuracy.
Table 11 displays the estimated tail index α parameters using maximum likelihood estimation (MLE) as the benchmark and six approximation methods, revealing that MLE estimates range from 1.00 to 1.53 across datasets, and our proposed approximation methods produce reasonable estimates.
Table 12 demonstrates the relative percentage errors for Gini coefficient approximations, where the midpoint method achieves low error (0.09%) for the Synthetic Pareto dataset, while the H* method generally outperforms other methods with errors typically below 25%.
Finally, Table 13 shows the relative percentage errors for tail index α estimation, where the Simpson and Two Trapezoids methods showed errors generally below 5% across most datasets, while the H* method exhibits the highest errors, ranging from 11% to 39%.
As can be seen from these results, the proposed methods give a simple and fairly accurate approximation to the tail index α and the Gini index G.

14. Concluding Remarks and Future Work

In this paper, we presented simple and easily explainable closed-form approximations of the Pareto index and the Gini coefficient. These metrics are derived in terms of the integrals of the underlying quantile function. Using the truncated distribution and a number of quadrature approximation methods for these integrals, we showed that the tail index α and the Gini index G can be expressed in terms of the simple weighted averages of the quartiles. The resulting formulas are resistant to outliers and have explanations both in economic terms, as quadrature approximation methods, and in terms of mean absolute deviations and means.
In the future, we propose to extend this method to the generalized Pareto distributions and other distributions for which the quantile function is known. For distributions for which the closed-form of the quantile function is not available, such as beta distributions, we propose approximating such quantile functions as the convex sum of simpler quantile functions. We can then use the quadrature formulas with approximate estimation for these quantile functions to estimate the parameters of the underlying distributions.
The proposed approach connects quantile statistics, mean absolute deviations, and parameter estimation by deriving a general class of MAD-based approximations in terms of the quantile function. It can be generalized to many other distributions for which the quantile function can be computed or approximated in closed form. In this work, we illustrated our methodology by focusing on the Pareto distribution. Future work will extend our approach to other distributions widely used in econometrics and other disciplines.

Author Contributions

Conceptualization, E.P.; methodology, E.P.; software, Q.W.; validation, Q.W.; formal analysis, E.P.; investigation, E.P.; resources, Q.W.; data curation, Q.W.; writing—original draft preparation, E.P.; writing—review and editing, E.P. and Q.W.; visualization, E.P. and Q.W.; supervision, E.P.; project administration, E.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All the relevant data, Python 3.13 code for analysis, detailed annual tables and graphs are available via: https://github.com/wenqifu/MAD_Gini (accessed on 1 June 2025).

Acknowledgments

The authors would like to thank the Metropolitan College of Boston University for their support.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Illustration of Pareto density f(x) and CDF F(x) for α = 1.5 and β = 1.
Figure 2. Illustration of convexity of Q(p).
Figure 3. Area interpretation of mean (a) and mean absolute deviation (b).
Figure 4. Two area interpretations of the Gini coefficient: via the Lorenz curve in (a) and via the quantile function in (b).
Figure 5. Interpretation of Gini via mean μ (a) and quantile function (b).
Figure 6. Area interpretation of alpha α (a) and Gini G (b).
Figure 7. Area interpretation of the truncated mean μ (a) and mean deviation H (b).
Figure 8. Interpreting income inequality in terms of subareas (a) and β (b).
Figure 9. Approximating subarea integrals with Two Trapezoids.
Figure 10. Approximating truncated mean (I₁ + I₂) (a) using the midpoint approximation (b).
Figure 11. Approximating truncated subarea integrals (a) with One Trapezoid (b).
Figure 12. Approximating truncated subarea integrals (a) with Two Trapezoids (b).
Figure 13. Approximating truncated subarea integrals (a) with three rectangles (b).
Figure 14. Approximating truncated subarea integrals (a) with Simpson 1/3 Rule (b).
Figure 15. Pareto Alpha values for different sample sizes. (Left): n = 100, 200; (Right): n = 400, 800.
Figure 16. Gini index for different sample sizes. (Left): n = 100, 200; (Right): n = 400, 800.
Figure 17. Pareto Tail Index with outliers for different sample sizes. (Left): n = 100, 200; (Right): n = 400, 800.
Figure 18. Gini index with outliers for different sample sizes. (Left): n = 100, 200; (Right): n = 400, 800.
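Table 1. Interpretation of incomes via quartiles.
| Income Group | X | p | Average |
|---|---|---|---|
| 0–100% | $X \ge \beta$ | $0 \le p \le 1$ | $\frac{\alpha}{\alpha-1}\cdot\beta$ |
| 0–25% | $\beta \le X \le Q_1$ | $0 \le p < 1/4$ | $\frac{\alpha}{\alpha-1}\cdot(4\beta - 3Q_1)$ |
| 25–50% | $Q_1 \le X \le M$ | $1/4 \le p \le 1/2$ | $\frac{\alpha}{\alpha-1}\cdot(3Q_1 - 2M)$ |
| 50–75% | $M \le X \le Q_3$ | $1/2 \le p \le 3/4$ | $\frac{\alpha}{\alpha-1}\cdot(2M - Q_3)$ |
| 75–100% | $X \ge Q_3$ | $3/4 \le p \le 1$ | $\frac{\alpha}{\alpha-1}\cdot Q_3$ |
| 0–50% | $\beta \le X \le M$ | $0 \le p \le 1/2$ | $\frac{\alpha}{\alpha-1}\cdot(2\beta - M)$ |
| 50–100% | $X \ge M$ | $1/2 \le p \le 1$ | $\frac{\alpha}{\alpha-1}\cdot M$ |
| 25–75% | $Q_1 \le X \le Q_3$ | $1/4 \le p \le 3/4$ | $\frac{\alpha}{\alpha-1}\cdot\frac{3Q_1 - Q_3}{2}$ |
| 0–75% | $\beta \le X \le Q_3$ | $0 \le p \le 3/4$ | $\frac{\alpha}{\alpha-1}\cdot\frac{4\beta - Q_3}{3}$ |
| 25–100% | $X \ge Q_1$ | $1/4 \le p \le 1$ | $\frac{\alpha}{\alpha-1}\cdot Q_1$ |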
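The group averages in Table 1 can be obtained by integrating the quantile function over the corresponding probability range. The short derivation below is a sketch that assumes the standard Pareto quantile function $Q(p) = \beta(1-p)^{-1/\alpha}$ with $\alpha > 1$. The interval endpoints $a$ and $b$ are notation introduced here for illustration.

```latex
\begin{align*}
\int_a^b Q(p)\,dp
  &= \beta \int_a^b (1-p)^{-1/\alpha}\,dp
   = \frac{\alpha}{\alpha-1}\Bigl[(1-a)\,Q(a) - (1-b)\,Q(b)\Bigr], \\
\text{average over } [a,b]
  &= \frac{1}{b-a}\int_a^b Q(p)\,dp .
\end{align*}
% Example 1: bottom 25%, a = 0, b = 1/4, with Q(0) = \beta and Q(1/4) = Q_1
\[
4\cdot\frac{\alpha}{\alpha-1}\Bigl[\beta - \tfrac{3}{4}\,Q_1\Bigr]
  = \frac{\alpha}{\alpha-1}\,(4\beta - 3Q_1).
\]
% Example 2: top 50%, a = 1/2, b = 1; the term (1-b)Q(b) vanishes as b -> 1 when \alpha > 1
\[
2\cdot\frac{\alpha}{\alpha-1}\cdot\tfrac{1}{2}\,M
  = \frac{\alpha}{\alpha-1}\,M .
\]
```

The same two steps, with the appropriate endpoints, give every row of the table.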
Table 2. Common quantile metrics.
| Name | Formula |
|---|---|
| lower quartile difference | $M - Q_1$ |
| upper quartile difference | $Q_3 - M$ |
| quartile difference | $Q_1 + Q_3 - 2M$ |
| interquartile range (IQR) | $Q_3 - Q_1$ |
| Galton skewness | $(Q_1 + Q_3 - 2M)/(Q_3 - Q_1)$ |
| skewness ratio | $(Q_3 - M)/(M - Q_1)$ |
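Table 3. Methods for estimating μ.
| Method | Estimate of μ | Notes |
|---|---|---|
| H-based | $H = \frac{Q_3 - Q_1}{4}$ | |
| Midpoint | $\mu = M$ | |
| One Trapezoid | $\mu = \frac{Q_1 + Q_3}{2}$ | $> M$ by convexity |
| Two Trapezoids | $\mu = \frac{Q_1 + 2M + Q_3}{4}$ | $\frac{1}{2}\cdot\frac{Q_1 + Q_3}{2} + \frac{1}{2}\cdot M$ |
| Quartiles Average | $\mu = \frac{Q_1 + M + Q_3}{3}$ | $\frac{2}{3}\cdot\frac{Q_1 + 2M + Q_3}{4} + \frac{1}{3}\cdot\frac{Q_1 + Q_3}{2}$ |
| Simpson 1/3 | $\mu = \frac{Q_1 + 4M + Q_3}{6}$ | $\frac{2}{3}\cdot\frac{Q_1 + 2M + Q_3}{4} + \frac{1}{3}\cdot M$ |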
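Table 4. Approximation of α and G.
| Method | Tail Index α | Gini Index G |
|---|---|---|
| H-based | $\frac{1}{2}\cdot\frac{Q_3 - Q_1}{Q_1 + Q_3 - 2M}$ | $\frac{1}{2}\cdot\frac{Q_1 + Q_3 - 2M}{M - Q_1}$ |
| Midpoint | $1 + \frac{3Q_1 - Q_3}{2M - 3Q_1 + Q_3}$ | $1 - 2\cdot\frac{3Q_1 - Q_3}{2M + 3Q_1 - Q_3}$ |
| One Trapezoid | $1 + \frac{1}{2}\cdot\frac{3Q_1 - Q_3}{Q_3 - Q_1}$ | $1 - \frac{1}{2}\cdot\frac{3Q_1 - Q_3}{Q_1}$ |
| Two Trapezoids | $1 + 2\cdot\frac{3Q_1 - Q_3}{2M - 5Q_1 + 3Q_3}$ | $1 - 4\cdot\frac{3Q_1 - Q_3}{2M + 7Q_1 - Q_3}$ |
| Quartiles Average | $1 + 3\cdot\frac{3Q_1 - Q_3}{2M - 7Q_1 + 5Q_3}$ | $1 - 6\cdot\frac{3Q_1 - Q_3}{2M + 11Q_1 - Q_3}$ |
| Simpson 1/3 | $1 + \frac{3}{4}\cdot\frac{3Q_1 - Q_3}{M - 2Q_1 + Q_3}$ | $1 - 3\cdot\frac{3Q_1 - Q_3}{2M + 5Q_1 - Q_3}$ |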
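The quadrature rows of Table 4 share one template. The sketch below shows where it appears to come from, assuming that μ denotes the mean of the distribution truncated to the interquartile range (the 25–75% row of Table 1) and using the standard Pareto identity $G = 1/(2\alpha - 1)$. Each row then follows by substituting the corresponding estimate of μ from Table 3.

```latex
\begin{align*}
\mu &= \frac{\alpha}{\alpha-1}\cdot\frac{3Q_1 - Q_3}{2}
\;\Longrightarrow\;
\alpha = \frac{2\mu}{2\mu - (3Q_1 - Q_3)}
       = 1 + \frac{3Q_1 - Q_3}{2\mu - 3Q_1 + Q_3}, \\[4pt]
G &= \frac{1}{2\alpha - 1}
   = \frac{2\mu - (3Q_1 - Q_3)}{2\mu + (3Q_1 - Q_3)}
   = 1 - \frac{2\,(3Q_1 - Q_3)}{2\mu + 3Q_1 - Q_3}.
\end{align*}
% Midpoint rule: substitute \mu \approx M
\[
\alpha \approx 1 + \frac{3Q_1 - Q_3}{2M - 3Q_1 + Q_3},
\qquad
G \approx 1 - 2\cdot\frac{3Q_1 - Q_3}{2M + 3Q_1 - Q_3}.
\]
```

The H-based row does not follow this template, since it is built from the mean absolute deviation rather than from an estimate of μ.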
Table 5. Summary of approximation of α and G in terms of M and weighted quartile difference R ( w ) .
MethodTail Index α Gini Index G
H -based: 1 2 R ( 0 ) + R ( 1 ) R ( 0 ) R ( 1 ) 1 2 R ( 0 ) R ( 1 ) 1
Midpoint: 1 + 1 2 M R ( 1 2 ) R 3 4 R ( 1 2 ) R ( 3 4 ) M R ( 3 4 )
One-Trapezoid: 1 + M R ( 1 2 ) R ( 1 2 ) R ( 1 2 ) R ( 1 ) M R ( 1 )
Two-Trapezoids: 1 + 1 2 M R ( 1 2 ) R ( 5 8 ) R ( 1 2 ) R ( 7 8 ) M R ( 7 8 )
Quartile Average: 1 + M R ( 1 2 ) R ( 7 12 ) R ( 1 2 ) R ( 11 12 ) M R ( 11 12 )
Simpson 1/3: 1 + 1 2 M R ( 1 2 ) R ( 2 3 ) R ( 1 2 ) R ( 5 6 ) M R ( 5 6 )
Table 6. Percent relative error of approximation for the tail index.
| Method | α = 1.1 | 1.2 | 1.3 | 1.4 | 1.5 | 1.6 | 1.7 | 1.8 | 1.9 |
|---|---|---|---|---|---|---|---|---|---|
| H-based | 1.26 | 2.34 | 3.29 | 4.14 | 4.92 | 5.63 | 6.30 | 6.92 | 7.51 |
| Midpoint | 0.86 | 1.50 | 2.01 | 2.40 | 2.72 | 4.61 | 3.20 | 3.38 | 3.53 |
| One Trapezoid | 1.53 | 2.71 | 3.62 | 4.35 | 4.94 | 5.43 | 5.83 | 6.17 | 6.45 |
| Two Trapezoids | 0.50 | 0.88 | 1.16 | 1.39 | 1.57 | 2.98 | 1.83 | 1.92 | 2.01 |
| Quartiles Average | 0.88 | 1.54 | 2.05 | 2.45 | 2.78 | 3.04 | 3.26 | 3.44 | 3.59 |
| Simpson 1/3 | 0.09 | 0.16 | 0.20 | 0.23 | 0.25 | 0.27 | 0.28 | 0.29 | 0.30 |
Table 7. Percent relative error of approximation for the Gini index.
| Method | α = 1.1 | 1.2 | 1.3 | 1.4 | 1.5 | 1.6 | 1.7 | 1.8 | 1.9 |
|---|---|---|---|---|---|---|---|---|---|
| H-based | 2.36 | 4.18 | 5.65 | 6.89 | 7.96 | 8.92 | 9.79 | 10.60 | 11.36 |
| Midpoint | 1.54 | 2.51 | 3.16 | 3.61 | 3.93 | 4.16 | 4.34 | 4.47 | 4.57 |
| One Trapezoid | 2.89 | 4.86 | 6.25 | 7.26 | 8.01 | 8.57 | 9.00 | 9.34 | 9.60 |
| Two Trapezoids | 0.93 | 1.53 | 1.93 | 2.21 | 2.40 | 2.55 | 2.66 | 2.74 | 2.80 |
| Quartiles Average | 1.63 | 2.71 | 3.44 | 3.97 | 4.35 | 4.63 | 4.84 | 5.00 | 5.12 |
| Simpson 1/3 | 0.17 | 0.27 | 0.33 | 0.36 | 0.38 | 0.39 | 0.40 | 0.41 | 0.41 |
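The noise-free errors in Tables 6 and 7 depend only on the exact Pareto quartiles, so they can be recomputed in a few lines. The sketch below is an illustration under stated assumptions, not the authors' script. It takes the quantile function $Q(p) = \beta(1-p)^{-1/\alpha}$, the mean estimates of Table 3, and the identity $G = 1/(2\alpha - 1)$. All function and dictionary names are hypothetical, and the H-based method is omitted because it uses a different formula.

```python
def pareto_quartiles(alpha, beta=1.0):
    """Exact quartiles from the Pareto quantile function Q(p) = beta * (1 - p)**(-1/alpha)."""
    def q(p):
        return beta * (1.0 - p) ** (-1.0 / alpha)
    return q(0.25), q(0.50), q(0.75)


# Quadrature estimates of the interquartile (truncated) mean, as listed in Table 3.
MEAN_RULES = {
    "Midpoint": lambda q1, m, q3: m,
    "One Trapezoid": lambda q1, m, q3: (q1 + q3) / 2.0,
    "Two Trapezoids": lambda q1, m, q3: (q1 + 2.0 * m + q3) / 4.0,
    "Quartiles Average": lambda q1, m, q3: (q1 + m + q3) / 3.0,
    "Simpson 1/3": lambda q1, m, q3: (q1 + 4.0 * m + q3) / 6.0,
}


def approximate(rule, alpha):
    """Approximate (alpha, G) from exact quartiles using the chosen mean estimate."""
    q1, m, q3 = pareto_quartiles(alpha)
    mu = MEAN_RULES[rule](q1, m, q3)
    d = 3.0 * q1 - q3                          # weighted quartile difference
    alpha_hat = 1.0 + d / (2.0 * mu - d)
    gini_hat = 1.0 - 2.0 * d / (2.0 * mu + d)
    return alpha_hat, gini_hat


def percent_errors(rule, alpha):
    """Percent relative errors of the approximations against the exact alpha and G."""
    alpha_hat, gini_hat = approximate(rule, alpha)
    gini = 1.0 / (2.0 * alpha - 1.0)           # exact Gini index of a Pareto distribution
    return (100.0 * abs(alpha_hat - alpha) / alpha,
            100.0 * abs(gini_hat - gini) / gini)


for rule in MEAN_RULES:
    err_alpha, err_gini = percent_errors(rule, 1.5)
    print(f"{rule:18s} alpha error = {err_alpha:5.2f}%   Gini error = {err_gini:5.2f}%")
```

For α = 1.5 the loop prints one line per method. The scale β cancels in every ratio, so the default β = 1 loses no generality.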
Table 8. Average percentage errors for tail index α estimation (n = 1000 and 100 simulations).
| Method | α = 1.1 | 1.2 | 1.3 | 1.4 | 1.5 | 1.6 | 1.7 | 1.8 | 1.9 | 2.0 |
|---|---|---|---|---|---|---|---|---|---|---|
| H direct | 7.2 | 10.2 | 15.5 | 17.6 | 21.0 | 23.8 | 27.3 | 29.0 | 32.8 | 34.4 |
| H estimate | 7.2 | 10.2 | 15.5 | 17.6 | 21.0 | 23.8 | 27.3 | 29.0 | 32.8 | 34.4 |
| Midpoint | 3.9 | 4.3 | 4.1 | 4.7 | 4.7 | 4.5 | 5.5 | 4.5 | 5.8 | 4.5 |
| One Trapezoid | 3.2 | 3.8 | 4.7 | 4.6 | 5.0 | 5.6 | 5.1 | 6.2 | 5.8 | 7.1 |
| Two Trapezoids | 3.4 | 3.6 | 3.5 | 3.8 | 3.5 | 3.7 | 3.2 | 3.6 | 4.0 | 4.0 |
| Quartiles Average | 3.3 | 3.6 | 3.8 | 3.9 | 3.7 | 4.2 | 3.5 | 4.2 | 4.3 | 4.7 |
| Simpson 1/3 | 3.5 | 3.7 | 3.5 | 3.8 | 3.5 | 3.5 | 3.5 | 3.4 | 4.2 | 3.6 |
| MLE | 2.2 | 2.5 | 2.5 | 2.4 | 2.4 | 2.8 | 2.8 | 2.3 | 2.6 | 2.3 |
Table 9. Average percentage errors for Gini coefficient G estimation (n = 1000 and 100 simulations).
| Method | α = 1.1 | 1.2 | 1.3 | 1.4 | 1.5 | 1.6 | 1.7 | 1.8 | 1.9 | 2.0 |
|---|---|---|---|---|---|---|---|---|---|---|
| H direct | 15.4 | 22.8 | 36.4 | 40.5 | 50.9 | 57.9 | 67.7 | 71.2 | 86.8 | 89.4 |
| H estimate | 15.4 | 22.8 | 36.4 | 40.5 | 50.9 | 57.9 | 67.7 | 71.2 | 86.8 | 89.4 |
| Midpoint | 6.9 | 6.9 | 6.4 | 6.7 | 6.5 | 6.1 | 7.1 | 5.7 | 7.1 | 5.6 |
| One Trapezoid | 6.2 | 7.1 | 8.4 | 7.9 | 8.3 | 9.2 | 8.1 | 9.7 | 8.8 | 10.9 |
| Two Trapezoids | 6.3 | 6.3 | 6.0 | 6.0 | 5.4 | 5.7 | 4.7 | 5.2 | 5.5 | 5.7 |
| Quartiles Average | 6.2 | 6.5 | 6.6 | 6.4 | 6.0 | 6.6 | 5.3 | 6.3 | 6.2 | 7.0 |
| Simpson 1/3 | 6.4 | 6.3 | 5.8 | 5.8 | 5.2 | 5.2 | 4.9 | 4.8 | 5.5 | 5.0 |
| MLE | 4.0 | 4.2 | 4.0 | 3.6 | 3.6 | 4.1 | 3.9 | 3.2 | 3.5 | 3.1 |
Table 10. Gini coefficient values across datasets and methods.
| Dataset | True | H* | Midpoint | One Trap | Two Trap | Quartiles | Simpson |
|---|---|---|---|---|---|---|---|
| Synthetic Pareto | 0.46 | 0.71 | 0.46 | 0.52 | 0.49 | 0.50 | 0.48 |
| Asia Fortune | 0.47 | 0.58 | 0.69 | 0.73 | 0.71 | 0.72 | 0.70 |
| Global Billionaire | 0.51 | 0.71 | 0.83 | 0.86 | 0.84 | 0.85 | 0.84 |
| Gender Money | 0.52 | 0.56 | 0.87 | 0.89 | 0.88 | 0.89 | 0.88 |
| North America | 0.55 | 0.56 | 0.80 | 0.83 | 0.82 | 0.82 | 0.81 |
| Forbes | 0.56 | 0.75 | 0.92 | 0.93 | 0.93 | 0.93 | 0.92 |
Table 11. Alpha parameter values across datasets and methods.
| Dataset | True | H* | Midpoint | One Trap | Two Trap | Quartiles | Simpson |
|---|---|---|---|---|---|---|---|
| Synthetic Pareto | 1.53 | 1.20 | 1.59 | 1.47 | 1.52 | 1.50 | 1.54 |
| Asia Fortune | 1.21 | 1.36 | 1.23 | 1.18 | 1.20 | 1.20 | 1.21 |
| Global Billionaire | 1.07 | 1.20 | 1.11 | 1.08 | 1.09 | 1.09 | 1.10 |
| Gender Money | 1.04 | 1.39 | 1.07 | 1.06 | 1.07 | 1.06 | 1.07 |
| North America | 1.00 | 1.39 | 1.12 | 1.10 | 1.11 | 1.11 | 1.11 |
| Forbes | 1.00 | 1.17 | 1.05 | 1.04 | 1.04 | 1.04 | 1.04 |
Table 12. Percent relative error of approximation for the Gini coefficient.
| Dataset | True | H* | Midpoint | One Trap | Two Trap | Quartiles | Simpson |
|---|---|---|---|---|---|---|---|
| Synthetic Pareto | 0.00 | 55.20 | 0.09 | 12.59 | 6.61 | 8.68 | 4.46 |
| Asia Fortune | 0.00 | 24.17 | 46.64 | 55.56 | 51.42 | 52.87 | 49.91 |
| Global Billionaire | 0.00 | 40.51 | 62.51 | 68.62 | 65.86 | 66.84 | 64.82 |
| Gender Money | 0.00 | 8.19 | 67.79 | 71.73 | 69.93 | 70.57 | 69.26 |
| North America | 0.00 | 1.99 | 45.76 | 51.10 | 48.65 | 49.51 | 47.74 |
| Forbes | 0.00 | 34.14 | 63.95 | 66.93 | 65.60 | 66.08 | 65.09 |
Table 13. Percent relative error of approximation for the Alpha parameter.
| Dataset | True | H* | Midpoint | One Trap | Two Trap | Quartiles | Simpson |
|---|---|---|---|---|---|---|---|
| Synthetic Pareto | 0.00 | 21.65 | 3.67 | 4.34 | 0.80 | 2.06 | 0.57 |
| Asia Fortune | 0.00 | 11.79 | 0.97 | 2.45 | 0.91 | 1.46 | 0.33 |
| Global Billionaire | 0.00 | 12.11 | 3.26 | 1.21 | 2.12 | 1.79 | 2.47 |
| Gender Money | 0.00 | 33.22 | 2.94 | 1.67 | 2.24 | 2.04 | 2.46 |
| North America | 0.00 | 38.74 | 12.07 | 9.88 | 10.87 | 10.52 | 11.24 |
| Forbes | 0.00 | 16.55 | 4.44 | 3.47 | 3.90 | 3.74 | 4.06 |
Share and Cite

MDPI and ACS Style

Pinsky, E.; Wen, Q. Simple Approximations and Interpretation of Pareto Index and Gini Coefficient Using Mean Absolute Deviations and Quantile Functions. Econometrics 2025, 13, 30. https://doi.org/10.3390/econometrics13030030
