Article

Robust Estimation of the Tail Index of a Single Parameter Pareto Distribution from Grouped Data

by
Chudamani Poudyal
Department of Mathematical Sciences, University of Wisconsin-Milwaukee, Milwaukee, WI 53211, USA
Risks 2024, 12(3), 45; https://doi.org/10.3390/risks12030045
Submission received: 24 January 2024 / Revised: 23 February 2024 / Accepted: 26 February 2024 / Published: 1 March 2024
(This article belongs to the Special Issue Advancements in Actuarial Mathematics and Risk Theory)

Abstract

Numerous robust estimators exist as alternatives to the maximum likelihood estimator (MLE) when a completely observed ground-up loss severity sample dataset is available. However, the options for robust alternatives to an MLE become significantly limited when dealing with grouped loss severity data, with only a handful of methods, like least squares, minimum Hellinger distance, and optimal bounded influence function, available. This paper introduces a novel robust estimation technique, the Method of Truncated Moments (MTuM), specifically designed to estimate the tail index of a Pareto distribution from grouped data. Inferential justification of the MTuM is established by employing the central limit theorem and validating it through a comprehensive simulation study.

1. Introduction

The protection of policyholder privacy, covering a wide array of stakeholders from individuals to small businesses, privately owned companies, and local government funds, is a critical issue in the contemporary digital era. With the advent of digital data collection and storage, insurance companies, which traditionally rely on detailed individual claim information, are increasingly recognizing the importance of also understanding market-level trends and severities through grouped data analysis. Data vendors and public databases attempt to address this concern by providing data in a summarized or grouped format. Such data treatment necessitates viewing them as independent and identically distributed (i.i.d.) realizations of a random variable, subjected to interval censoring over multiple contiguous intervals. The analysis of grouped sample data has largely depended on maximum likelihood estimation (MLE). However, MLE can result in models that are overly sensitive to anomalies in the data distribution, such as contamination, Tukey (1960), or the presence of disproportionately heavy point masses at specific values, a scenario frequently encountered in the actuarial field, particularly within payment-per-payment and payment-per-loss data contexts, Poudyal et al. (2023).
The drive for robustness in statistical estimation against the sensitivity of MLE has led to the exploration and establishment of various robust estimation methods across different data scenarios, with the notable exception of grouped data. This gap is the fundamental motivation of this work: to introduce a robust estimation approach specifically designed to tackle the distinct challenges associated with the analysis of grouped data.
Within the domain of robust statistical estimation literature, the broad category of L-statistics, Chernoff et al. (1967), stands out as a comprehensive toolkit, spanning a wide array of robust estimators along with their inferential justification, such as methods of trimmed moments (MTM) and winsorized moments (MWM). MTM and MWM approaches have been effectively applied in actuarial loss data scenarios, contingent on the existence of the quantile function for the assumed underlying distribution. Studies by Brazauskas et al. (2009); Zhao et al. (2018) have, respectively, applied MTM and MWM to datasets with completely observed ground-up actuarial loss severity. In contexts of incomplete actuarial loss data, particularly for payment-per-payment and payment-per-loss, Poudyal (2021a) and Poudyal et al. (2023) have, respectively, implemented MTM and MWM. These investigations have further established that trimming and winsorizing serve as effective strategies for enhancing the robustness of moment estimation in the presence of extreme claims, Gatti and Wüthrich (2023). However, the adaptability and applicability of MTM/MWM to situations involving grouped data, especially where the quantile function may be undefined within certain intervals of interest, remain open for investigation. This scholarly work aims to explore this potential, particularly in light of the Method of Truncated Moments (MTuM), a novel approach introduced by Poudyal (2021b) for completely observed ground-up loss datasets, which implements predetermined lower and upper truncation points to effectively manage tail sample observations.
The importance of robust statistical methods is significant in the insurance sector, especially in its effects on pricing strategies and rate regulation. Traditional estimation methods, like MLE, that are sensitive to data anomalies can lead to inaccuracies in risk assessment, thus affecting the fairness and reliability of insurance premiums. This has direct implications for regulatory compliance and the development of insurance products. The need for methodologies that ensure accuracy and fairness in premium settings is crucial, as these influence both insurer profitability and policyholder satisfaction. Although MTM/MWM offer robust alternatives to MLE, they are not directly applicable in the context of grouped data. Therefore, the proposed MTuM approach aims to address these critical industry challenges by offering a more stable and fair estimation framework, which could significantly contribute to the improvement in insurance pricing models and regulatory practices.
Regarding the analysis of grouped data, Aigner and Goldberger (1970) explored the estimation of the scale parameter for the single-parameter Pareto distribution using MLE and four variants of least squares. As a robust alternative to MLE for grouped data, Lin and He (2006) examined the approximate minimum Hellinger distance estimator (Beran 1977a, 1977b), which can be asymptotically as efficient as the MLE. Additionally, Victoria-Feser and Ronchetti (1997) demonstrated that, in the presence of minor model contaminations, optimal bounded influence function estimators offer greater robustness than MLE for grouped data. The strategy of optimal grouping, in the sense of minimizing the loss of information, was introduced by Schader and Schmid (1986); however, this method remains within the likelihood estimation framework, Kleiber and Kotz (2003).
Therefore, the fundamental objective of this manuscript is to investigate the robustness of the MTuM estimator, specifically for the tail index of grouped single-parameter Pareto distributions, and to evaluate its performance against the corresponding MLE. Asymptotic distributional properties, such as normality, consistency, and the asymptotic relative efficiency in relation to the MLE, are established for the purpose of inferential justification. In addition, the paper strengthens its theoretical concepts with extensive simulation studies. It is noteworthy that the moments, when subject to threshold truncation and/or censorship, are consistently finite, irrespective of the underlying true distribution.
The structure of the remainder of this manuscript is outlined as follows. Section 2 offers a succinct summary of the scenarios involving grouped data, encompassing a variety of probability functions. Section 3 concentrates on the elaboration of the Method of Truncated Moments (MTuM) procedures specifically designed for grouped data, along with a discussion on the justification for their inferential application. An extensive simulation study is undertaken in Section 4 to augment the theoretical results across diverse scenarios. The manuscript concludes in Section 5, presenting our closing remarks and outlining possible paths for further research.

2. Pareto Grouped Data

Due to the complexity of the involved theory, we investigate only the single-parameter Pareto distribution in this work. As considered by Poudyal (2021b, sct. 3), let $Y \sim \text{Pareto I}(\alpha, x_0)$ with the distribution function $F_Y(y) = 1 - (x_0/y)^{\alpha}$, $y > x_0$, and zero elsewhere. Here, $\alpha > 0$ represents the shape parameter, often referred to as the tail index, and $x_0 > 0$ is the known lower bound threshold. Consequently, if we define $X := \log(Y/x_0)$, then $X$ follows an exponential distribution, $X \sim \text{Exp}(\theta = 1/\alpha)$, with its distribution function given by $F_X(x) = 1 - e^{-x/\theta}$. Hence, estimating $\alpha$ is equivalent to estimating the exponential parameter $\theta$. Thus, for the purpose of analytic simplicity, we investigate $\theta$, rather than $\alpha$. The development and asymptotic behavior of MTuM estimators will be explored for a grouped sample drawn from an exponential distribution.
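The equivalence between the Pareto and exponential formulations can be checked numerically. The following Python sketch is illustrative only: the function names and the parameter values ($\alpha = 2.5$, $x_0 = 100$) are assumptions chosen for the example, not taken from the paper.

```python
import math

def pareto_cdf(y, alpha, x0):
    """F_Y(y) = 1 - (x0 / y)**alpha for y > x0, and 0 elsewhere."""
    return 1.0 - (x0 / y) ** alpha if y > x0 else 0.0

def exp_cdf(x, theta):
    """F_X(x) = 1 - exp(-x / theta) for x > 0, and 0 elsewhere."""
    return 1.0 - math.exp(-x / theta) if x > 0 else 0.0

def to_exponential_scale(y, x0):
    """Map a Pareto loss y > x0 to X = log(y / x0)."""
    return math.log(y / x0)

alpha, x0 = 2.5, 100.0            # arbitrary illustrative values
theta = 1.0 / alpha               # exponential mean implied by the tail index
losses = [120.0, 250.0, 1000.0, 5000.0]

# Each pair holds F_Y(y) and F_X(log(y / x0)); the two entries should agree.
pairs = [(pareto_cdf(y, alpha, x0), exp_cdf(to_exponential_scale(y, x0), theta))
         for y in losses]
```

Because the two distribution functions agree at every transformed point, an estimate of $\theta$ obtained on the log scale converts to a tail-index estimate through $\alpha = 1/\theta$.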
Let $0 < c_1 < \cdots < c_{m-1} < c_m < \infty$ be the group boundaries for the grouped data, where we define $c_0 := 0$ and $c_{m+1} := \infty$. Let $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} X$, where $X$ has pdf $f(x \mid \theta) = f(x) = \frac{1}{\theta} e^{-x/\theta}$ and cdf $F(x \mid \theta) = F(x)$. The computation of the empirical distribution function at the group boundaries is clear, but inside the intervals, the linearly interpolated empirical cdf, as defined in Klugman et al. (2019, sct. 14.2), is the most common one. The linearly interpolated empirical cdf, called the “ogive” and denoted by $F_n$, is defined as
$$
F_n(x) = \begin{cases} \dfrac{c_j - x}{c_j - c_{j-1}} F_n(c_{j-1}) + \dfrac{x - c_{j-1}}{c_j - c_{j-1}} F_n(c_j); & \text{if } c_{j-1} < x \le c_j, \ j \le m, \\[1ex] \text{Undefined}; & \text{if } x > c_m. \end{cases} \tag{1}
$$
In the complete data case, we observe the following empirical frequencies of $X$:
$$
\hat{P}\left(c_{j-1} < X \le c_j\right) = F_n(c_j) - F_n(c_{j-1}) = \frac{n_j}{n}, \quad j = 1, \ldots, m+1,
$$
where $n_j = \sum_{i=1}^{n} \mathbb{1}\{c_{j-1} < X_i \le c_j\}$, so that $n = \sum_{j=1}^{m+1} n_j$ is the sample size.
Clearly, the empirical distribution $F_n$ is not defined in the interval $(c_m, c_{m+1} = \infty)$, as it is impossible to draw a straight line joining the two points $(c_m, F_n(c_m))$ and $(\infty, 1)$ unless $F_n(c_m) = 1$.
The corresponding linearized population cdf $F_G$ is defined by
$$
F_G(x) = \begin{cases} \dfrac{c_j - x}{c_j - c_{j-1}} F(c_{j-1} \mid \theta) + \dfrac{x - c_{j-1}}{c_j - c_{j-1}} F(c_j \mid \theta); & \text{if } c_{j-1} < x \le c_j, \ j \le m, \\[1ex] F(x \mid \theta); & \text{if } x > c_m. \end{cases} \tag{2}
$$
The corresponding density function $f_n$, called the histogram, is defined as
$$
f_n(x) = \begin{cases} \dfrac{F_n(c_j) - F_n(c_{j-1})}{c_j - c_{j-1}} = \dfrac{n_j}{n (c_j - c_{j-1})}; & \text{if } c_{j-1} < x \le c_j, \ j \le m, \\[1ex] \text{Undefined}; & \text{if } x > c_m. \end{cases} \tag{3}
$$
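To make the ogive and the histogram concrete, here is a minimal Python sketch, assuming the grouped sample is given as finite boundaries `c` and group counts `counts`. The function names and the toy data are hypothetical; `np.interp` is used only as a convenient way to express the linear interpolation in Equation (1).

```python
import numpy as np

def ogive(x, c, counts):
    """Linearly interpolated empirical cdf F_n at x.

    c      : finite boundaries c_0 = 0 < c_1 < ... < c_m
    counts : n_1, ..., n_{m+1}; the last entry counts observations above c_m
    Returns NaN for x > c_m, where F_n is undefined.
    """
    c = np.asarray(c, dtype=float)
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    # F_n(c_0) = 0 and F_n(c_j) = (n_1 + ... + n_j) / n for j = 1, ..., m
    F_at_c = np.concatenate(([0.0], np.cumsum(counts[:-1]) / n))
    return np.interp(x, c, F_at_c, left=0.0, right=np.nan)

def histogram_density(x, c, counts):
    """Histogram density f_n(x) = n_j / (n (c_j - c_{j-1})) on (c_{j-1}, c_j]."""
    c = np.asarray(c, dtype=float)
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    x = np.atleast_1d(np.asarray(x, dtype=float))
    j = np.searchsorted(c, x, side="left")      # index j with c[j-1] < x <= c[j]
    out = np.full(x.shape, np.nan)              # NaN outside (c_0, c_m]
    ok = (j >= 1) & (j <= len(c) - 1)
    out[ok] = counts[j[ok] - 1] / (n * (c[j[ok]] - c[j[ok] - 1]))
    return out

# Toy grouped sample: 10 observations, one of them above c_m = 15.
c_demo = [0.0, 5.0, 10.0, 15.0]
counts_demo = [4, 3, 2, 1]
F_mid = ogive(7.5, c_demo, counts_demo)         # halfway through the second group
F_tail = ogive(20.0, c_demo, counts_demo)       # beyond c_m, hence NaN
f_vals = histogram_density([2.5, 7.5, 12.0, 20.0], c_demo, counts_demo)
```

Returning NaN above $c_m$ mirrors the "Undefined" branch of Equations (1) and (3) rather than silently extrapolating.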
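The empirical quantile function (the inverse of $F_n$) is then computed as
$$
F_n^{-1}(s) = \begin{cases} c_{j-1} + \dfrac{(c_j - c_{j-1}) \left(s - F_n(c_{j-1})\right)}{F_n(c_j) - F_n(c_{j-1})}; & \text{if } F_n(c_{j-1}) < s \le F_n(c_j), \ j \le m, \\[1ex] \text{Undefined}; & \text{if } s > F_n(c_m). \end{cases}
$$
Similarly,
$$
F_G^{-1}(s \mid \theta) = \begin{cases} c_{j-1} + \dfrac{(c_j - c_{j-1}) \left(s - F(c_{j-1} \mid \theta)\right)}{F(c_j \mid \theta) - F(c_{j-1} \mid \theta)}, & F(c_{j-1} \mid \theta) < s \le F(c_j \mid \theta), \ j \le m; \\[1ex] F^{-1}(s \mid \theta), & s > F(c_m \mid \theta). \end{cases}
$$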
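The piecewise-linear quantile function can be sketched in the same spirit. The helper below is a hypothetical illustration that assumes the cdf values at the boundaries are strictly increasing (no empty groups) and covers only the linear branch; it returns NaN above $F(c_m)$, which is where the empirical quantile is undefined.

```python
import numpy as np

def ogive_quantile(s, c, F_at_c):
    """Inverse of the linearized cdf through the points (c_j, F(c_j)), j = 0, ..., m.

    Assumes F_at_c is strictly increasing; returns NaN for s > F(c_m).
    """
    c = np.asarray(c, dtype=float)
    F_at_c = np.asarray(F_at_c, dtype=float)
    # Swapping the roles of c and F_at_c inverts the piecewise-linear map.
    return np.interp(s, F_at_c, c, left=c[0], right=np.nan)

c_demo = [0.0, 5.0, 10.0, 15.0]
F_demo = [0.0, 0.4, 0.7, 0.9]                    # F_n(c_m) = 0.9 < 1
q_inside = ogive_quantile(0.55, c_demo, F_demo)  # lies in the second group
q_beyond = ogive_quantile(0.95, c_demo, F_demo)  # 0.95 > F_n(c_m), hence NaN
```

The NaN in the second call is the grouped-data obstacle for trimming or winsorizing at a fixed proportion: a quantile level above $F_n(c_m)$ has no empirical counterpart.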
If individual claim losses $X$, when grouped, are subjected to further changes, like truncation, interval censoring, or coverage adjustments, then the underlying distribution function requires suitable modifications. For example, if $m$ groups ($n$ observations in total) are provided and it is known that only data above deductible $d$ appeared, then the distributional assumption is that we observe
$$
\hat{P}\left(c_{j-1} < X \le c_j \mid X > d\right) = \frac{n_j}{n}, \quad j = 1, \ldots, m+1,
$$
with the group boundaries satisfying $d = c_0 < c_1 < \cdots < c_m < c_{m+1} = \infty$.

3. MTuM for Grouped Data

For both MTM and MWM, if the right trimming/winsorizing proportion $b$ is such that $1 - b > F_n(c_m)$, then we have $c_m < F_n^{-1}(1 - b) < c_{m+1} = \infty$. That is, $F_n^{-1}(1 - b)$ does not exist, as the linearized empirical distribution $F_n$ is not defined in the interval $(c_m, c_{m+1} = \infty)$; see Equation (1). As a consequence, $F_n^{-1}$ is not defined on the interval $(F_n(c_m), 1]$. Thus, in order to apply the MTM/MWM approach to a grouped sample, we always need to make sure that $F_n^{-1}(1 - b) \le c_m$, that is, $1 - b \le F_n(c_m)$, but this is problematic because $F_n(c_m)$ varies from sample to sample while the right trimming/winsorizing proportion $b$ is fixed. Consequently, the asymptotic distributional properties of MTM and MWM estimators from grouped data are very complicated and difficult, if not intractable, to derive analytically. With the MTuM, however, we can always choose the right truncation threshold $T$ such that $T \le c_m$. Therefore, we proceed with the MTuM approach for grouped data in the rest of this section. Let $0 \le t$ and $T \le c_m$, with $t < T$, be the left and right truncation points, respectively.
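Let us introduce the following notations:
$$
\begin{aligned}
p_j \equiv p_j(\theta) &:= F(c_j \mid \theta), \\
P_j \equiv P_j(\theta) &:= F(c_j \mid \theta) - F(c_{j-1} \mid \theta), \\
p_{j,n} &:= F_n(c_j), \\
\sigma^2_{j,j'} &:= \operatorname{Cov}\left(F_n(c_j), F_n(c_{j'})\right) = \operatorname{Cov}\left(p_{j,n}, p_{j',n}\right), \\
I_{i,j} &:= \mathbb{1}\{X_i \le c_j\}, \\
J_{i,j} &:= \mathbb{1}\{X_i > c_j\},
\end{aligned}
$$
for $0 \le j, j' \le m+1$; $0 \le i \le n$.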
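Proposition 1.
Suppose $1 \le j \le j' \le m$. Then, $\operatorname{Cov}\left(p_{j,n}, 1 - p_{j',n}\right) = -\dfrac{p_j (1 - p_{j'})}{n}$.
Proof. 
Clearly, $p_{j,n} = (1/n) \sum_{i=1}^{n} I_{i,j}$ and $1 - p_{j',n} = (1/n) \sum_{i=1}^{n} J_{i,j'}$. Therefore,
$$
\begin{aligned}
\operatorname{Cov}\left(p_{j,n}, 1 - p_{j',n}\right)
&= \operatorname{Cov}\left(\frac{1}{n} \sum_{i=1}^{n} I_{i,j}, \ \frac{1}{n} \sum_{i=1}^{n} J_{i,j'}\right)
 = \frac{1}{n^2} \operatorname{Cov}\left(\sum_{i=1}^{n} I_{i,j}, \ \sum_{i=1}^{n} J_{i,j'}\right) \\
&= \frac{1}{n^2} \sum_{k=1}^{n} \sum_{i=1}^{n} \operatorname{Cov}\left(I_{k,j}, J_{i,j'}\right)
 = \frac{1}{n^2} \sum_{i=1}^{n} \operatorname{Cov}\left(I_{i,j}, J_{i,j'}\right)
 = \frac{1}{n^2} \, n \operatorname{Cov}\left(I_{1,j}, J_{1,j'}\right) \\
&= \frac{1}{n} \left[ E\left(I_{1,j} J_{1,j'}\right) - E\left(I_{1,j}\right) E\left(J_{1,j'}\right) \right]
 = \frac{1}{n} \left[ 0 - p_j (1 - p_{j'}) \right]
 = -\frac{p_j (1 - p_{j'})}{n}.
\end{aligned}
$$
 □
The following corollary is an immediate consequence of Proposition 1.
Corollary 1.
Let $(F_n(c_1), \ldots, F_n(c_m))$ be a vector of the empirical distribution function evaluated at the group boundaries vector $(c_1, \ldots, c_m)$. Then, $(F_n(c_1), \ldots, F_n(c_m))$ is $\mathcal{AN}\left(\mathbf{F}, n^{-1} \Sigma\right)$, where $\mathbf{F} = (F(c_1 \mid \theta), \ldots, F(c_m \mid \theta))$, $\Sigma = \left[\sigma^2_{jj'}\right]_{j,j'=1}^{m}$, with $\sigma^2_{jj'} = \sigma^2_{j'j} = F(c_j \mid \theta) \left(1 - F(c_{j'} \mid \theta)\right)$ for all $j \le j'$.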
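The matrix $\Sigma$ of Corollary 1 is straightforward to assemble. The sketch below is a hypothetical helper for the exponential case; it builds the entries $F(c_j \mid \theta)(1 - F(c_{j'} \mid \theta))$, $j \le j'$, by pairing the smaller and the larger cdf value of each index pair.

```python
import numpy as np

def boundary_cov(c, theta):
    """Sigma with entries F(c_j)(1 - F(c_j')) for j <= j', mirrored for j > j'.

    c : finite positive boundaries c_1 < ... < c_m; F is the Exp(theta) cdf.
    """
    p = -np.expm1(-np.asarray(c, dtype=float) / theta)   # p_j = 1 - exp(-c_j / theta)
    lo = np.minimum.outer(p, p)    # cdf value at the smaller boundary of each pair
    hi = np.maximum.outer(p, p)    # cdf value at the larger boundary of each pair
    return lo * (1.0 - hi)

Sigma_demo = boundary_cov([5.0, 10.0, 15.0], theta=10.0)
```

The diagonal entries reduce to the familiar binomial form $p_j(1 - p_j)$, and the off-diagonal entries are positive, consistent with the sign in Proposition 1 once $1 - p_{j',n}$ is replaced by $p_{j',n}$.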
Assume that $c_0 \le c_{l-1} < t \le c_l \le c_r < T \le c_{r+1} \le c_m$. Then,
$$
F_n(t) = A_1 F_n(c_{l-1}) + B_1 F_n(c_l) \quad \text{and} \quad F_n(T) = A_2 F_n(c_r) + B_2 F_n(c_{r+1}),
$$
where
$$
A_1 := \frac{c_l - t}{c_l - c_{l-1}}, \quad A_2 := \frac{c_{r+1} - T}{c_{r+1} - c_r}, \quad B_1 := \frac{t - c_{l-1}}{c_l - c_{l-1}}, \quad \text{and} \quad B_2 := \frac{T - c_r}{c_{r+1} - c_r}.
$$
Also, consider
$$
u_l := \frac{c_l^2 - t^2}{2 (c_l - c_{l-1})}, \quad v_i := \frac{c_i + c_{i-1}}{2}, \quad \text{and} \quad z_r := \frac{T^2 - c_r^2}{2 (c_{r+1} - c_r)}.
$$
We now define the Method of Truncated Moments (MTuM) estimator from grouped data. By using the empirical cdf, Equation (1), and pdf, Equation (3), the sample truncated moment for grouped data as defined by Poudyal (2021b) is given by
$$
\hat{\mu} = \frac{1}{F_n(T) - F_n(t)} \int_t^T h(x) f_n(x) \, dx. \tag{6}
$$
By using Equation (2), the corresponding linearized/grouped population mean is
$$
g_{tT}(\theta) := \mu = \frac{u_l P_l(\theta) + \sum_{i=l+1}^{r} v_i P_i(\theta) + z_r P_{r+1}(\theta)}{A_2 p_r(\theta) + B_2 p_{r+1}(\theta) - A_1 p_{l-1}(\theta) - B_1 p_l(\theta)} = \frac{N^{*}}{H^{*}}. \tag{7}
$$
The truncated estimator of $\theta$ is determined by equating the sample truncated moment, as specified in Equation (6), with the population truncated moment, as presented in Equation (7). This equation is then solved for $\theta$. The solutions obtained, denoted by $\hat{\theta}$ or $\hat{\theta}_{\text{MTuM}}$, are defined as the MTuM estimator of $\theta$, provided that such a solution exists.
Assuming $h(x) \equiv x$ and after some computation, we obtain
$$
g_\mu(p_{1,n}, \ldots, p_{m,n}) := \hat{\mu} = \frac{u_l (p_{l,n} - p_{l-1,n}) + \sum_{i=l+1}^{r} v_i (p_{i,n} - p_{i-1,n}) + z_r (p_{r+1,n} - p_{r,n})}{A_2 p_{r,n} + B_2 p_{r+1,n} - A_1 p_{l-1,n} - B_1 p_{l,n}} =: \frac{N}{H}.
$$
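Since both the population mean in Equation (7) and its sample counterpart are ratios of linear functions of the cdf values at the boundaries, one routine can serve both. The Python sketch below is illustrative, with hypothetical names: `truncation_coeffs` returns vectors `a` and `b` such that the numerator is `a @ p` and the denominator is `b @ p`, where `p[j]` is either $p_j(\theta)$ or $p_{j,n}$. Treating $t = 0$ as belonging to the first group is an assumption of this sketch.

```python
import numpy as np

def truncation_coeffs(c, t, T):
    """Return (a, b) with N = a @ p and H = b @ p, where p[j] = F(c_j), j = 0, ..., m.

    c : finite boundaries c_0 = 0 < c_1 < ... < c_m, with 0 <= t < T <= c_m.
    """
    c = np.asarray(c, dtype=float)
    l = max(1, int(np.searchsorted(c, t, side="left")))   # c[l-1] < t <= c[l]
    r = int(np.searchsorted(c, T, side="left")) - 1       # c[r] < T <= c[r+1]
    if r < l:
        raise ValueError("t and T must lie in different groups")
    w = np.zeros(len(c) + 1)               # w[k]: weight of group (c[k-1], c[k]]
    w[l] = (c[l] ** 2 - t ** 2) / (2 * (c[l] - c[l - 1]))        # u_l
    w[l + 1:r + 1] = (c[l + 1:r + 1] + c[l:r]) / 2               # v_i, i = l+1, ..., r
    w[r + 1] = (T ** 2 - c[r] ** 2) / (2 * (c[r + 1] - c[r]))    # z_r
    a = w[:-1] - w[1:]                     # coefficient of p[j] in the numerator
    b = np.zeros(len(c))
    b[l - 1] -= (c[l] - t) / (c[l] - c[l - 1])            # -A_1
    b[l] -= (t - c[l - 1]) / (c[l] - c[l - 1])            # -B_1
    b[r] += (c[r + 1] - T) / (c[r + 1] - c[r])            # +A_2
    b[r + 1] += (T - c[r]) / (c[r + 1] - c[r])            # +B_2
    return a, b

def truncated_mean(p, c, t, T):
    """N / H evaluated at the cdf values p[j] = F(c_j), j = 0, ..., m."""
    a, b = truncation_coeffs(c, t, T)
    return (a @ np.asarray(p, dtype=float)) / (b @ np.asarray(p, dtype=float))

c_demo = np.array([0.0, 5.0, 10.0, 15.0, 20.0, 25.0])
# Population version for Exp(theta = 10) with (t, T) = (2, 12).
mu_pop = truncated_mean(-np.expm1(-c_demo / 10.0), c_demo, 2.0, 12.0)
# Sanity check: a uniform cdf on [0, 25] is already piecewise linear,
# so the truncated mean on [2, 12] must equal the midpoint 7.
mu_unif = truncated_mean(c_demo / 25.0, c_demo, 2.0, 12.0)
```

Writing the numerator and denominator as inner products also makes the partial derivatives with respect to each $p_{j,n}$ immediate: they are $(a_j H - b_j N)/H^2$.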
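Note that $p_{0,n} = 0$. Thus, by the delta method (see, e.g., Serfling 1980, Theorem A, p. 122), we have
$$
\hat{\mu} \sim \mathcal{AN}\left(\mu = g_\mu(\mathbf{F}), \ n^{-1} D_\mu \Sigma D_\mu'\right),
$$
where $D_\mu := \left.\left(\dfrac{\partial g_\mu}{\partial p_{1,n}}, \ldots, \dfrac{\partial g_\mu}{\partial p_{m,n}}\right)\right|_{\mathbf{p} = \mathbf{F}}$ and $\mathbf{p} := (p_{1,n}, \ldots, p_{m,n})$. Consider $\Sigma_\mu := D_\mu \Sigma D_\mu'$. Clearly, if $2 \le l < r$, then
$$
\frac{\partial g_\mu}{\partial p_{j,n}} = \begin{cases}
0, & \text{for } 1 \le j \le l-2 \text{ or } j \ge r+2; \\[0.5ex]
\dfrac{-u_l H + A_1 N}{H^2}, & \text{for } j = l-1; \\[1ex]
\dfrac{(u_l - v_{l+1}) H + B_1 N}{H^2}, & \text{for } j = l; \\[1ex]
\dfrac{c_{j-1} - c_{j+1}}{2H}, & \text{for } l+1 \le j \le r-1; \\[1ex]
\dfrac{(v_r - z_r) H - A_2 N}{H^2}, & \text{for } j = r; \\[1ex]
\dfrac{z_r H - B_2 N}{H^2}, & \text{for } j = r+1,
\end{cases}
$$
and if $l = r$,
$$
\frac{\partial g_\mu}{\partial p_{j,n}} = \begin{cases}
0, & \text{for } 1 \le j \le l-2 \text{ or } j \ge l+2; \\[0.5ex]
\dfrac{-u_l H + A_1 N}{H^2}, & \text{for } j = l-1; \\[1ex]
\dfrac{(u_l - z_r) H - (A_2 - B_1) N}{H^2}, & \text{for } j = l; \\[1ex]
\dfrac{z_r H - B_2 N}{H^2}, & \text{for } j = l+1.
\end{cases}
$$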
Because of the complicated form of the function $g_{tT}(\theta)$, it is difficult to establish analytically whether it is increasing or decreasing. However, at least for $X \sim \text{Exp}(\theta)$, $g_{tT}(\theta)$ appears to be an increasing function of $\theta > 0$, as shown in Figure 1. We summarize this observation in the following conjecture.
Conjecture 1.
The function $g_{tT}(\theta)$ is strictly increasing.
Proposition 2.
The function $g_{tT}(\theta)$ has the following limiting values:
$$
\lim_{\theta \to 0^{+}} g_{tT}(\theta) = \frac{u_l}{A_1},
$$
$$
\lim_{\theta \to \infty} g_{tT}(\theta) = \frac{u_l (c_{l-1} - c_l) + \sum_{i=l+1}^{r} v_i (c_{i-1} - c_i) + z_r (c_r - c_{r+1})}{A_1 c_{l-1} + B_1 c_l - A_2 c_r - B_2 c_{r+1}}.
$$
Proof. 
These limits can be established by using L’Hôpital’s rule.  □
Assuming that Conjecture 1 is true, Proposition 2 yields the following result.
Theorem 1.
The equation $\hat{\mu} = g_{tT}(\theta)$ has a unique solution $\hat{\theta}_{\text{MTuM}}$ provided that
$$
\frac{u_l}{A_1} < \hat{\mu} < \frac{u_l (c_{l-1} - c_l) + \sum_{i=l+1}^{r} v_i (c_{i-1} - c_i) + z_r (c_r - c_{r+1})}{A_1 c_{l-1} + B_1 c_l - A_2 c_r - B_2 c_{r+1}}.
$$
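Both bounds in Theorem 1 reduce to simple midpoints once the definitions of $A_1, B_1, A_2, B_2, u_l, v_i$, and $z_r$ are substituted. The short derivation below is an added illustration that follows directly from those definitions (assuming $t < c_l$, so that $A_1 \neq 0$); it is not part of the original proof.

```latex
\begin{align*}
\frac{u_l}{A_1}
  &= \frac{c_l^2 - t^2}{2 (c_l - c_{l-1})} \cdot \frac{c_l - c_{l-1}}{c_l - t}
   = \frac{t + c_l}{2}, \\
u_l (c_l - c_{l-1}) + \sum_{i=l+1}^{r} v_i (c_i - c_{i-1}) + z_r (c_{r+1} - c_r)
  &= \frac{(c_l^2 - t^2) + \sum_{i=l+1}^{r} (c_i^2 - c_{i-1}^2) + (T^2 - c_r^2)}{2}
   = \frac{T^2 - t^2}{2}, \\
A_2 c_r + B_2 c_{r+1} - A_1 c_{l-1} - B_1 c_l
  &= T - t .
\end{align*}
```

Dividing the second line by the third (the sign changes in the numerator and denominator of the upper bound cancel) gives $(t + T)/2$, so the existence condition can be read as $(t + c_l)/2 < \hat{\mu} < (t + T)/2$.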
Solve the equation $\hat{\mu} = \mu$ for $\hat{\theta}_{\text{MTuM}}$, say $\hat{\theta} =: g_\theta(\hat{\mu})$. Then, again by the delta method, we conclude that $\hat{\theta} \sim \mathcal{AN}\left(g_\theta(\mu), \ n^{-1} \left(g_\theta'(\mu)\right)^2 \Sigma_\mu\right)$. Note that if both the left- and right-truncation points lie in the same interval, then $\hat{\mu} = \frac{t + T}{2} = \mu$. So, the parameter to be estimated disappears from the equation, and hence we do not consider this case for further investigation. Define
$$
\begin{aligned}
P &:= u_l \left(e^{-c_{l-1}/\theta} - e^{-c_l/\theta}\right) + \sum_{i=l+1}^{r} v_i \left(e^{-c_{i-1}/\theta} - e^{-c_i/\theta}\right) + z_r \left(e^{-c_r/\theta} - e^{-c_{r+1}/\theta}\right), \\
Q &:= B_2 \left(1 - e^{-c_{r+1}/\theta}\right) - A_1 \left(1 - e^{-c_{l-1}/\theta}\right) - B_1 \left(1 - e^{-c_l/\theta}\right).
\end{aligned}
$$
Then, we obtain a fixed point function as $\theta = G(\theta)$, where
$$
G(\theta) = \frac{-c_r}{\log\left(\dfrac{\hat{\mu} A_2 - P + \hat{\mu} Q}{\hat{\mu} A_2}\right)}.
$$
However, we need to consider the condition $\hat{\mu} (A_2 + Q) > P$. Therefore, we need to be careful about the initialization of $\theta$. Moreover, the right truncation point $T$ cannot be a boundary point because, if it were, then $A_2 = 0$ and we would not be able to divide by $A_2$ in the fixed point function $\theta = G(\theta)$.
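As an alternative to the fixed-point iteration, the estimating equation can also be solved by bisection, which needs no division by $A_2$ but does rely on Conjecture 1 (monotonicity of $g_{tT}$). The Python sketch below is a substitute technique shown for illustration, not the paper's procedure; the function names, the bracket `[lo, hi]`, and the tolerance are assumptions. The helper `g_tT` evaluates Equation (7) in an equivalent midpoint form: on each piece of $[t, T]$ between consecutive boundaries the linearized density is constant, so the piece contributes its probability times its midpoint.

```python
import numpy as np

def g_tT(theta, c, t, T):
    """Linearized truncated mean of Exp(theta) on [t, T]; same value as Equation (7)."""
    c = np.asarray(c, dtype=float)
    p = -np.expm1(-c / theta)                               # F(c_j | theta)
    pts = np.concatenate(([t], c[(c > t) & (c < T)], [T]))  # t, inner boundaries, T
    F = np.interp(pts, c, p)                                # F_G at those points
    return np.sum(np.diff(F) * (pts[:-1] + pts[1:]) / 2) / (F[-1] - F[0])

def solve_mtum(mu_hat, c, t, T, lo=0.1, hi=1e4, tol=1e-10):
    """Bisection for g_tT(theta) = mu_hat on a user-chosen bracket [lo, hi]."""
    if not g_tT(lo, c, t, T) < mu_hat < g_tT(hi, c, t, T):
        raise ValueError("mu_hat is outside the range of g_tT on [lo, hi]")
    while hi - lo > tol * max(1.0, lo):
        mid = 0.5 * (lo + hi)
        if g_tT(mid, c, t, T) < mu_hat:
            lo = mid          # root lies to the right of mid
        else:
            hi = mid          # root lies to the left of mid (or at mid)
    return 0.5 * (lo + hi)

c_demo = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0]
mu_true = g_tT(10.0, c_demo, 2.0, 12.0)            # population truncated mean at theta = 10
theta_back = solve_mtum(mu_true, c_demo, 2.0, 12.0)  # should recover theta = 10
```

The bracket should be chosen on the scale of the data: for very small `lo` relative to $t$, the exponentials underflow and the ratio becomes undefined, in which case the range check raises an error instead of returning a spurious root.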
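Now, let us compute the derivative of $g_\theta$ with respect to $\mu$ using implicit differentiation.
  • Case 1: Assume that the two truncation points are in two consecutive intervals, i.e., assume that $l = r$. Then, $\theta' = g_\theta'(\hat{\mu}) = \dfrac{A - B}{\Lambda + \Delta}$, where
    $$
    \begin{aligned}
    A &:= A_2 + B_2 - A_1 - B_1, \\
    B &:= A_2 e^{-c_r/\theta} + B_2 e^{-c_{r+1}/\theta} - A_1 e^{-c_{l-1}/\theta} - B_1 e^{-c_l/\theta}, \\
    \Lambda &:= \frac{u_l}{\theta^2} \left(c_{l-1} e^{-c_{l-1}/\theta} - c_l e^{-c_l/\theta}\right) + \frac{z_r}{\theta^2} \left(c_r e^{-c_r/\theta} - c_{r+1} e^{-c_{r+1}/\theta}\right), \\
    \Delta &:= \frac{\hat{\mu}}{\theta^2} \left(A_2 c_r e^{-c_r/\theta} + B_2 c_{r+1} e^{-c_{r+1}/\theta} - A_1 c_{l-1} e^{-c_{l-1}/\theta} - B_1 c_l e^{-c_l/\theta}\right).
    \end{aligned}
    $$
  • Case 2: The other case is that the two truncation points are not in two consecutive intervals, i.e., assume that $l < r$. Then, $\theta' = g_\theta'(\hat{\mu}) = \dfrac{A - B}{\Gamma + \Delta}$, where $\Gamma := \Lambda + \sum_{i=l+1}^{r} \dfrac{v_i}{\theta^2} \left(c_{i-1} e^{-c_{i-1}/\theta} - c_i e^{-c_i/\theta}\right)$ and $A$, $B$, $\Lambda$, and $\Delta$ are defined above.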
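To obtain the exponential grouped MLE, consider $P_j(\theta) := e^{-c_{j-1}/\theta} - e^{-c_j/\theta}$. Then, following Xue and Song (2002), we have $\hat{\theta}_{\text{MLE}} \sim \mathcal{AN}\left(\theta, \frac{1}{n} I^{-1}(\theta)\right)$, where $I(\theta) = \sum_{j=1}^{m} P_j(\theta) \left(\dfrac{d \ln P_j(\theta)}{d\theta}\right)^2$.
Note that after finding the derivative, $I(\theta)$ can be expressed as
$$
I(\theta) = \sum_{j=1}^{m} P_j(\theta) \left[\frac{c_{j-1} e^{-c_{j-1}/\theta} - c_j e^{-c_j/\theta}}{\theta^2 \left(e^{-c_{j-1}/\theta} - e^{-c_j/\theta}\right)}\right]^2 = \sum_{j=1}^{m} \left[\frac{c_{j-1} e^{-c_{j-1}/\theta} - c_j e^{-c_j/\theta}}{\theta^2}\right]^2 \frac{1}{P_j(\theta)}.
$$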
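The information expression is easy to evaluate numerically. The following Python sketch is a hypothetical helper; note that it sums over all groups, including the open group $(c_m, \infty)$, which is an assumption of the sketch (for boundaries reaching far into the tail, that last term is negligible).

```python
import numpy as np

def grouped_fisher_info(theta, c):
    """Per-observation Fisher information for Exp(theta) data grouped at c.

    c : finite boundaries c_0 = 0 < c_1 < ... < c_m. The open group (c_m, inf)
        is included in the sum, which is an assumption of this sketch.
    """
    c = np.asarray(c, dtype=float)
    e = np.append(np.exp(-c / theta), 0.0)         # exp(-c_j / theta); 0 at infinity
    ce = np.append(c * np.exp(-c / theta), 0.0)    # c_j exp(-c_j / theta); 0 at infinity
    P = e[:-1] - e[1:]                             # group probabilities P_j(theta)
    dP = (ce[:-1] - ce[1:]) / theta ** 2           # derivative of P_j with respect to theta
    return float(np.sum(dP ** 2 / P))

# Coarse grouping (0:5:50, 200, inf) versus a very fine grid, both at theta = 10.
info_coarse = grouped_fisher_info(10.0, list(range(0, 55, 5)) + [200])
info_fine = grouped_fisher_info(10.0, np.linspace(0.0, 200.0, 2001))
```

As the grid is refined, the value approaches $1/\theta^2$, the information of the ungrouped exponential model, while coarser grouping yields a strictly smaller value.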
The asymptotic performance of the MTuM estimator is measured through the asymptotic relative efficiency (ARE) in comparison to the grouped MLE. The ARE (see, e.g., Serfling 1980; van der Vaart 1998) is defined as
$$
ARE(\text{MTuM}, \text{MLE}) = \frac{\text{asymptotic variance of MLE estimator}}{\text{asymptotic variance of MTuM estimator}}. \tag{11}
$$
The primary justification for employing the MLE as a standard/benchmark for comparison lies in its optimal asymptotic behavior in terms of variance, though this comes with the typical proviso of holding “under certain regularity conditions”. Therefore, the desired ARE as given by Equation (11) is computed as
$$
ARE\left(\hat{\theta}_{\text{MTuM}}, \hat{\theta}_{\text{MLE}}\right) = \frac{I^{-1}(\theta)}{\left(g_\theta'(\mu)\right)^2 \Sigma_\mu} = \frac{I^{-1}(\theta)}{\left(g_\theta'(\mu)\right)^2 D_\mu \Sigma D_\mu'}. \tag{12}
$$
The numerical values of $ARE\left(\hat{\theta}_{\text{MTuM}}, \hat{\theta}_{\text{MLE}}\right)$, Equation (12), from $\text{Exp}(\theta = 10)$ with group boundaries vector $G := (0 : 5 : 50, 200, \infty)$, are summarized in Table 1. As shown in Table 1, the ARE increases as the distance between $t$ and $T$ increases; narrower truncation windows discard more of the sample and thus trade efficiency for robustness.
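Equation (12) can be evaluated numerically by combining the pieces above. The Python sketch below is an illustration under stated assumptions rather than the paper's implementation: the gradient $D_\mu$ and the derivative of $g_{tT}$ are obtained by central finite differences (with $g_\theta'(\mu)$ taken as the reciprocal of $dg_{tT}/d\theta$), and the information sum includes the open last group. All function names and step sizes are hypothetical.

```python
import numpy as np

def trunc_mean(p, c, t, T):
    """Truncated mean on [t, T] under the linearized cdf through (c_j, p_j)."""
    pts = np.concatenate(([t], c[(c > t) & (c < T)], [T]))
    F = np.interp(pts, c, p)
    return np.sum(np.diff(F) * (pts[:-1] + pts[1:]) / 2) / (F[-1] - F[0])

def are_mtum_vs_mle(theta, c, t, T, h=1e-6):
    """ARE of the MTuM relative to the grouped MLE for Exp(theta), Equation (12)."""
    c = np.asarray(c, dtype=float)
    cdf = lambda th: -np.expm1(-c / th)            # p_j(theta), j = 0, ..., m
    p, m = cdf(theta), len(c) - 1
    # D_mu: numerical gradient of mu-hat with respect to p_1, ..., p_m (p_0 is fixed at 0)
    D = np.zeros(m)
    for j in range(1, m + 1):
        e = np.zeros(m + 1)
        e[j] = h
        D[j - 1] = (trunc_mean(p + e, c, t, T) - trunc_mean(p - e, c, t, T)) / (2 * h)
    q = p[1:]
    Sigma = np.minimum.outer(q, q) * (1.0 - np.maximum.outer(q, q))   # Corollary 1
    sigma_mu = D @ Sigma @ D
    # d g_tT / d theta by central difference; g_theta' is its reciprocal
    dth = 1e-5 * theta
    dmu = (trunc_mean(cdf(theta + dth), c, t, T)
           - trunc_mean(cdf(theta - dth), c, t, T)) / (2 * dth)
    avar_mtum = sigma_mu / dmu ** 2
    # grouped-MLE information, summed over all groups including (c_m, inf)
    ex = np.append(np.exp(-c / theta), 0.0)
    cex = np.append(c * np.exp(-c / theta), 0.0)
    info = np.sum(((cex[:-1] - cex[1:]) / theta ** 2) ** 2 / (ex[:-1] - ex[1:]))
    return (1.0 / info) / avar_mtum

G = np.concatenate((np.arange(0.0, 55.0, 5.0), [200.0]))   # finite part of (0:5:50, 200, inf)
are_narrow = are_mtum_vs_mle(10.0, G, 2.0, 12.0)
are_wide = are_mtum_vs_mle(10.0, G, 0.0, 70.0)
```

For $(t, T) = (2, 12)$ this sketch should give a value close to the 0.105 reported in Table 1, and the wider window $(0, 70)$ should give a noticeably larger value.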

4. Simulation Study

This section augments the theoretical findings established in Section 3 with simulations. The primary objective is to determine the sample size required for the estimators to be unbiased (acknowledging that they are asymptotically unbiased), to validate the asymptotic normality, and to ensure that their finite sample relative efficiencies (REs) are converging towards the respective AREs. For calculating the RE, the MLE is utilized as the reference point. Consequently, the concept of asymptotic relative efficiency outlined in Equation (11) is adapted for finite sample analysis as follows:
$$
RE(\text{MTuM}, \text{MLE}) = \frac{\text{asymptotic variance of MLE estimator}}{\text{variance of a competing estimator MTuM}}.
$$
The design of the simulation is detailed below, covering both the generation of data and the computation of various statistics as described:
(i) The underlying ground-up distribution is assumed to be exponential with a mean parameter $\theta = 10$.
(ii) Different sample sizes are explored: $n = 50, 100, 250, 500, 1000$.
(iii) A total of 1000 samples are generated for each scenario.
(iv) The sample data are grouped according to the specified group boundaries: $0 = c_0 < c_1 < \cdots < c_m$.
(v) We consider the following vectors of group boundaries:
$$
G_1 := (0 : 1 : 100, 200, \infty), \quad G_2 := (0 : 1 : 200, \infty), \quad G_3 := (0 : 5 : 50, 200, \infty), \quad G_4 := (0 : 10 : 100, 200, \infty), \quad \text{and} \quad G_5 := (0 : 50 : 200, \infty).
$$
(vi) For each grouping, 1000 estimates of $\theta$ are computed under the Method of Truncated Moments (MTuM) with varying truncation points for grouped data, denoted as $\hat{\theta}_1, \ldots, \hat{\theta}_{1000}$.
(vii) The average estimated $\theta$, denoted $\bar{\hat{\theta}}$, is calculated as $\bar{\hat{\theta}} = \sum_{i=1}^{1000} \hat{\theta}_i / 1000$.
(viii) This process is repeated 10 times, yielding averages $\bar{\hat{\theta}}_1, \ldots, \bar{\hat{\theta}}_{10}$.
(ix) The overall mean, $\hat{\hat{\theta}}$, and the standard deviation, $se(\bar{\hat{\theta}})$, of these averages are computed as follows:
$$
\hat{\hat{\theta}} = \frac{\bar{\hat{\theta}}_1 + \cdots + \bar{\hat{\theta}}_{10}}{10} \quad \text{and} \quad se(\bar{\hat{\theta}}) = \sqrt{\frac{\left(\bar{\hat{\theta}}_1 - \hat{\hat{\theta}}\right)^2 + \cdots + \left(\bar{\hat{\theta}}_{10} - \hat{\hat{\theta}}\right)^2}{10}}.
$$
(x) The ratios $\hat{\hat{\theta}} / \theta$ and $se(\bar{\hat{\theta}}) / \theta$ are reported in Tables 2–6.
(xi) Similarly, the finite-sample relative efficiency (RE) of the MTuM with respect to the grouped MLE is calculated as $RE_1, \ldots, RE_{10}$. The mean and standard deviations of these RE values are reported for different vectors of group boundaries.
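The data-generation and estimation steps above can be sketched in Python as follows. This is a scaled-down, hypothetical illustration (200 replications of one scenario instead of the full design), not the code behind the reported tables: the function names, the seed, the bisection bracket, and the choice of $G_3$ with $(t, T) = (2, 12)$ are assumptions, and bisection is used in place of the fixed-point iteration.

```python
import numpy as np

def trunc_mean(p, c, t, T):
    """Truncated mean on [t, T] under the linearized cdf through (c_j, p_j)."""
    pts = np.concatenate(([t], c[(c > t) & (c < T)], [T]))
    F = np.interp(pts, c, p)
    return np.sum(np.diff(F) * (pts[:-1] + pts[1:]) / 2) / (F[-1] - F[0])

def mtum_estimate(counts, c, t, T, lo=0.1, hi=1e4, iters=80):
    """MTuM estimate of theta from group counts n_1, ..., n_{m+1} via bisection."""
    n = counts.sum()
    p_n = np.concatenate(([0.0], np.cumsum(counts[:-1]) / n))   # F_n(c_j), j = 0, ..., m
    mu_hat = trunc_mean(p_n, c, t, T)
    g = lambda th: trunc_mean(-np.expm1(-c / th), c, t, T)      # population g_tT(theta)
    if not g(lo) < mu_hat < g(hi):
        return np.nan                                           # no solution in the bracket
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < mu_hat else (lo, mid)
    return 0.5 * (lo + hi)

def simulate(theta=10.0, n=500, reps=200, t=2.0, T=12.0, seed=0):
    """Return (mean estimate / theta, std of estimates / theta) over reps samples."""
    rng = np.random.default_rng(seed)
    c = np.concatenate((np.arange(0.0, 55.0, 5.0), [200.0]))    # finite part of G_3
    est = np.empty(reps)
    for k in range(reps):
        x = rng.exponential(scale=theta, size=n)
        idx = np.searchsorted(c, x, side="left")                # group j: c[j-1] < x <= c[j]
        counts = np.bincount(idx, minlength=len(c) + 1)[1:].astype(float)
        est[k] = mtum_estimate(counts, c, t, T)
    return np.nanmean(est) / theta, np.nanstd(est) / theta

ratio, rel_sd = simulate()
```

Only the group counts enter the estimator, so the individual draws `x` are discarded immediately after grouping, mirroring the grouped-data setting of the study.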
The outcomes of the simulations are documented in Tables 2–6. The entries represent the ratios of the mean estimated values to the true parameter $\theta = 10$, based on 1000 samples and repeated 10 times, as described in item (x) above. The corresponding standard errors are presented in parentheses. In all tables, the last three columns (with ∞) represent analytic results, not results from simulations. The third last column is for the asymptotic relative efficiency of the MTuM with regard to the grouped MLE, coming from Equation (12) and Table 1. For example, the group boundary vectors considered in Table 1 and Table 4 are exactly the same and given by
$$
G \equiv G_3 = (0 : 5 : 50, 200, \infty).
$$
Then, for $(t, T) = (2, 12)$, the corresponding entries in those two tables should match, and those matching entries, i.e., $0.11$, are boxed in both tables. Similarly, the second last column is for the asymptotic relative efficiency of the MTuM with regard to the un-grouped MLE, and the very last column represents the asymptotic relative efficiency of the grouped MLE with regard to the un-grouped MLE.
If both the truncation points are in the same interval, say $t_l, t_r \in [c_{j-1}, c_j]$, then we have $\hat{\mu} = \mu = \frac{t_l + t_r}{2}$. Therefore, the parameter $\theta = 10$ to be estimated disappears, and hence the four rows in Table 6 are reported as $n/a$. As we move in sequence through Tables 2–6, it becomes noticeable that the ratio of the estimated $\theta$ to the true $\theta$, i.e., $\hat{\hat{\theta}} / \theta$, approaches the true asymptotic value of 1 at a more gradual pace. This is because the length of the intervals increases from Table 2 to Table 6. More specifically, both our intuition and the data presented in the tables suggest that when there is a wider gap between the thresholds (namely, $t$ and $T$), the estimators tend to approach the true values at a slower rate.
In Tables 2–5, it is interesting to observe that, even for the sample size $n = 50$, the estimator $\hat{\hat{\theta}}$ successfully estimates the parameter $\theta$ with a relative bias of less than $\pm 1\%$, with one exception for $(t_l, t_r) = (2, 12)$. As seen in Table 6, the relative bias is within $\pm 1\%$ only for the sample of size $n = 1000$ and for $(t_l, t_r) = (2, 12)$. Similarly, as observed in those tables, it is clear that all the REs are asymptotically unbiased.

5. Concluding Remarks

In this scholarly work, we have introduced a novel Method of Truncated Moments (MTuM) estimator designed to estimate the tail index from grouped Pareto loss severity data, offering a robust alternative to maximum likelihood estimation (MLE). We have established theoretical justifications for the existence and asymptotic normality of the designed estimators. Additionally, we have conducted a detailed investigation into the finite sample performance across various sample sizes and group boundary vectors through a comprehensive simulation study.
Looking ahead, this paper predominantly addressed the estimation of the mean parameter of an exponential distribution (or equivalently, the tail index of a single-parameter Pareto distribution) using grouped sample data. Therefore, an avenue for future research is the extension of the proposed methodology to more complex scenarios and models. However, particularly for distributions with multiple parameters, examining the nature of the function $g_{tT}(\text{Parameters})$, as presented in Equation (7) and Conjecture 1, can be highly challenging, if not infeasible. The task of providing asymptotic inferential justification for the MTuM methodology when applied to multi-parameter distributions presents similar difficulties. In this context, a potential direction for future research involves adopting an algorithmic approach (i.e., designing simulation-based estimators for complex models) rather than focusing solely on inferential justification, Efron and Hastie (2016, p. xvi).
Moreover, evaluating the performance of this novel MTuM estimator in diverse practical risk analysis scenarios remains an important area for further assessment. Attempting to apply the designed MTuM methodology to real grouped insurance data revealed the challenge of finding publicly available data that fit the single-parameter Pareto (or equivalently, an exponential) model well. This issue, along with the often poor fit of the data to the Pareto model, highlighted the importance of adapting the MTuM to many other distributions as well. Such adaptation will provide the necessary flexibility to select the most suitable underlying models based on the initial diagnostic tests of the datasets. Therefore, there is a need to broaden the theoretical development of the MTuM approach to at least include the location-scale family of distributions.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Aigner, Dennis J., and Arthur S. Goldberger. 1970. Estimation of Pareto’s law from grouped observations. Journal of the American Statistical Association 65: 712–23.
  2. Beran, Rudolf. 1977a. Minimum Hellinger distance estimates for parametric models. The Annals of Statistics 5: 445–63.
  3. Beran, Rudolf. 1977b. Robust location estimates. The Annals of Statistics 5: 431–44.
  4. Brazauskas, Vytaras, Bruce L. Jones, and Ričardas Zitikis. 2009. Robust fitting of claim severity distributions and the method of trimmed moments. Journal of Statistical Planning and Inference 139: 2028–43.
  5. Chernoff, Herman, Joseph L. Gastwirth, and M. Vernon Johns. 1967. Asymptotic distribution of linear combinations of functions of order statistics with applications to estimation. Annals of Mathematical Statistics 38: 52–72.
  6. Efron, Bradley, and Trevor Hastie. 2016. Computer Age Statistical Inference: Algorithms, Evidence, and Data Science. New York: Cambridge University Press.
  7. Gatti, Selim, and Mario V. Wüthrich. 2023. Modeling lower-truncated and right-censored insurance claims with an extension of the MBBEFD class. arXiv:2310.11471.
  8. Kleiber, Christian, and Samuel Kotz. 2003. Statistical Size Distributions in Economics and Actuarial Sciences. Hoboken: John Wiley & Sons.
  9. Klugman, Stuart A., Harry H. Panjer, and Gordon E. Willmot. 2019. Loss Models: From Data to Decisions, 5th ed. Hoboken: John Wiley & Sons.
  10. Lin, Nan, and Xuming He. 2006. Robust and efficient estimation under data grouping. Biometrika 93: 99–112.
  11. Poudyal, Chudamani. 2021a. Robust estimation of loss models for lognormal insurance payment severity data. ASTIN Bulletin. The Journal of the International Actuarial Association 51: 475–507.
  12. Poudyal, Chudamani. 2021b. Truncated, censored, and actuarial payment-type moments for robust fitting of a single-parameter Pareto distribution. Journal of Computational and Applied Mathematics 388: 113310.
  13. Poudyal, Chudamani, Qian Zhao, and Vytaras Brazauskas. 2023. Method of winsorized moments for robust fitting of truncated and censored lognormal distributions. North American Actuarial Journal, 1–25.
  14. Schader, Martin, and Friedrich Schmid. 1986. Optimal grouping of data from some skew distributions. Computational Statistics Quarterly 3: 151–59.
  15. Serfling, Robert J. 1980. Approximation Theorems of Mathematical Statistics. New York: John Wiley & Sons.
  16. Tukey, John W. 1960. A survey of sampling from contaminated distributions. In Contributions to Probability and Statistics. Stanford: Stanford University Press, pp. 448–85.
  17. van der Vaart, Aad W. 1998. Asymptotic Statistics. Cambridge: Cambridge University Press.
  18. Victoria-Feser, Maria-Pia, and Elvezio Ronchetti. 1997. Robust estimation for grouped data. Journal of the American Statistical Association 92: 333–40.
  19. Xue, Hongqi, and Lixin Song. 2002. Asymptotic properties of MLE for Weibull distribution with grouped data. Journal of Systems Science and Complexity 15: 176–86.
  20. Zhao, Qian, Vytaras Brazauskas, and Jugal Ghorai. 2018. Robust and efficient fitting of severity models and the method of Winsorized moments. ASTIN Bulletin 48: 275–309.
Figure 1. Graphs of $g_{tT}$ for different values of $\theta$. The left panel represents the graph of $g_{tT}(\theta)$ for $\theta = 10$, $(t, T) = (2, 12)$, and group boundaries vector $v_1 = (0, 5, 10, 15, 20, 25)$. Similarly, the right panel represents the graph of $g_{tT}(\theta)$ for $\theta = 0.2$, $(t, T) = (0.05, 0.45)$, and group boundaries vector $v_2 = (0, 0.1, 0.2, 0.3, 0.4, 0.5)$.
Table 1. Numerical values of $\mathrm{ARE}(\hat{\theta}_{\mathrm{MTuM}}, \hat{\theta}_{\mathrm{MLE}})$ given by Equation (12) for selected $t$ and $T$ with $G = (0:5:50, 200, \infty)$ from $\mathrm{Exp}(\theta = 10)$. The truncation thresholds $t$ and $T$ are rounded to two decimal places, for example, $70 \approx F^{-1}(0.999)$, etc.

| $t$ ($F(t)$) \ $T$ ($S(T) = 1 - F(T)$) | 70 (0.001) | 45 (0.01) | 35 (0.03) | 25 (0.08) | 19 (0.15) | 12 (0.30) | 7 (0.50) |
|---|---|---|---|---|---|---|---|
| 0.0 (0.00) | 0.870 | 0.762 | 0.584 | 0.350 | 0.224 | 0.093 | 0.038 |
| 0.5 (0.05) | 0.869 | 0.760 | 0.583 | 0.349 | 0.225 | 0.095 | 0.038 |
| 1.0 (0.10) | 0.865 | 0.756 | 0.579 | 0.347 | 0.225 | 0.098 | 0.038 |
| 2.0 (0.14) | 0.841 | 0.731 | 0.559 | 0.334 | 0.220 | 0.105 | 0.038 |
| 3.0 (0.26) | 0.782 | 0.675 | 0.512 | 0.302 | 0.201 | 0.114 | 0.038 |
| 7.0 (0.50) | 0.483 | 0.396 | 0.274 | 0.131 | 0.071 | 0.023 | |
| 14.0 (0.75) | 0.214 | 0.156 | 0.087 | 0.027 | 0.014 | | |
| 19.0 (0.85) | 0.118 | 0.074 | 0.033 | 0.009 | | | |
| 23.0 (0.90) | 0.076 | 0.041 | 0.014 | | | | |
Note: The boxed entry will be compared with an entry from Table 4.
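The probabilities shown in parentheses next to the thresholds in Table 1 can be checked numerically. The sketch below assumes that $\mathrm{Exp}(\theta = 10)$ is parametrized by its mean, so that $F(x) = 1 - e^{-x/\theta}$ and $F^{-1}(p) = -\theta \log(1 - p)$; this reading is an assumption, chosen because it matches tabulated pairs such as 7 (0.50) and $70 \approx F^{-1}(0.999)$. The function names `exp_cdf` and `exp_quantile` are illustrative only.

```python
import math

THETA = 10.0  # assumed mean (scale) of the exponential model Exp(theta = 10)


def exp_cdf(x, theta=THETA):
    """F(x) = 1 - exp(-x / theta) for x > 0, and 0 otherwise."""
    return 1.0 - math.exp(-x / theta) if x > 0 else 0.0


def exp_quantile(p, theta=THETA):
    """F^{-1}(p) = -theta * log(1 - p) for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return -theta * math.log1p(-p)


# Upper thresholds T with survival probabilities S(T) = 1 - F(T).
upper = {T: round(1.0 - exp_cdf(T), 3) for T in (70, 45, 35, 25, 19, 12, 7)}

# Several lower thresholds t with F(t), rounded as in the table.
lower = {t: round(exp_cdf(t), 2) for t in (0.5, 1.0, 3.0, 7.0, 14.0, 19.0, 23.0)}

if __name__ == "__main__":
    print("F^{-1}(0.999) =", round(exp_quantile(0.999), 2))
    for T, s in upper.items():
        print(f"T = {T:>2}: S(T) = {s}")
    for t, f in lower.items():
        print(f"t = {t:>4}: F(t) = {f}")
```

Under this parametrization the exact quantile $F^{-1}(0.999)$ is slightly below 70, which is consistent with the thresholds in the table being rounded values.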
Table 2. Finite-sample performance evaluation of MTuM relative to MLE for grouped data from an exponential ($\theta = 10$) distribution and group boundaries vector $G_1 = (0:1:100, 200, \infty)$.

MTuM Performance for Exponential Grouped Data

| | $t_l$ | $t_r$ | $n = 50$ | $n = 100$ | $n = 250$ | $n = 500$ | $n = 1000$ | $n \to \infty$ | | |
|---|---|---|---|---|---|---|---|---|---|---|
| $\hat{\theta}/\theta$ | 0 | 200 | 1.00 (0.003) | 1.00 (0.003) | 1.00 (0.002) | 1.00 (0.001) | 1.00 (0.001) | 1 | - | - |
| | 0 | 50 | 1.01 (0.004) | 1.00 (0.002) | 1.00 (0.002) | 1.00 (0.001) | 1.00 (0.001) | 1 | - | - |
| | 0 | 100 | 1.00 (0.003) | 1.00 (0.003) | 1.00 (0.001) | 1.00 (0.001) | 1.00 (0.001) | 1 | - | - |
| | 0 | 140 | 1.00 (0.005) | 1.00 (0.004) | 1.00 (0.002) | 1.00 (0.001) | 1.00 (0.001) | 1 | - | - |
| | 2 | 12 | 3.22 (0.174) | 1.68 (0.130) | 1.14 (0.021) | 1.06 (0.012) | 1.03 (0.005) | 1 | - | - |
| RE | 0 | 200 | 1.01 (0.032) | 1.02 (0.037) | 1.02 (0.039) | 0.99 (0.045) | 0.99 (0.048) | 1.00 | 1.00 | 1.00 |
| | 0 | 50 | 0.77 (0.048) | 0.79 (0.058) | 0.81 (0.032) | 0.84 (0.035) | 0.83 (0.028) | 0.82 | 0.82 | 1.00 |
| | 0 | 100 | 1.00 (0.045) | 0.98 (0.040) | 1.02 (0.050) | 0.99 (0.034) | 0.99 (0.046) | 1.00 | 0.99 | 1.00 |
| | 0 | 140 | 0.96 (0.042) | 1.01 (0.063) | 0.97 (0.047) | 1.00 (0.047) | 1.01 (0.033) | 1.00 | 1.00 | 1.00 |
| | 2 | 12 | 0.00 (0.000) | 0.00 (0.000) | 0.01 (0.004) | 0.02 (0.004) | 0.03 (0.002) | 0.04 | 0.04 | 1.00 |
Table 3. Finite-sample performance evaluation of MTuM relative to MLE for grouped data from an exponential ($\theta = 10$) distribution and group boundaries vector $G_2 = (0:1:200, \infty)$.

MTuM Performance for Exponential Grouped Data

| | $t_l$ | $t_r$ | $n = 50$ | $n = 100$ | $n = 250$ | $n = 500$ | $n = 1000$ | $n \to \infty$ | | |
|---|---|---|---|---|---|---|---|---|---|---|
| $\hat{\theta}/\theta$ | 0 | 200 | 1.00 (0.006) | 1.00 (0.003) | 1.00 (0.002) | 1.00 (0.001) | 1.00 (0.001) | 1 | - | - |
| | 0 | 50 | 1.01 (0.005) | 1.00 (0.003) | 1.00 (0.002) | 1.00 (0.002) | 1.00 (0.001) | 1 | - | - |
| | 0 | 100 | 1.00 (0.006) | 1.00 (0.003) | 1.00 (0.002) | 1.00 (0.002) | 1.00 (0.001) | 1 | - | - |
| | 0 | 140 | 1.00 (0.005) | 1.00 (0.004) | 1.00 (0.002) | 1.00 (0.001) | 1.00 (0.001) | 1 | - | - |
| | 2 | 12 | 3.17 (0.169) | 1.67 (0.083) | 1.16 (0.024) | 1.05 (0.007) | 1.03 (0.006) | 1 | - | - |
| RE | 0 | 200 | 1.00 (0.061) | 0.99 (0.071) | 0.99 (0.047) | 0.98 (0.068) | 1.04 (0.049) | 1.00 | 1.00 | 1.00 |
| | 0 | 50 | 0.75 (0.054) | 0.81 (0.037) | 0.81 (0.023) | 0.82 (0.038) | 0.84 (0.038) | 0.82 | 0.82 | 1.00 |
| | 0 | 100 | 0.99 (0.047) | 0.96 (0.039) | 0.99 (0.065) | 1.03 (0.043) | 0.99 (0.057) | 1.00 | 0.99 | 1.00 |
| | 0 | 140 | 0.99 (0.028) | 1.02 (0.046) | 1.02 (0.050) | 1.00 (0.041) | 1.01 (0.043) | 1.00 | 1.00 | 1.00 |
| | 2 | 12 | 0.00 (0.000) | 0.00 (0.000) | 0.01 (0.003) | 0.02 (0.004) | 0.03 (0.001) | 0.04 | 0.04 | 1.00 |
Table 4. Finite-sample performance evaluation of MTuM relative to MLE for grouped data from an exponential ($\theta = 10$) distribution and group boundaries vector $G_3 = (0:5:50, 200, \infty)$.

MTuM Performance for Exponential Grouped Data

| | $t_l$ | $t_r$ | $n = 50$ | $n = 100$ | $n = 250$ | $n = 500$ | $n = 1000$ | $n \to \infty$ | | |
|---|---|---|---|---|---|---|---|---|---|---|
| $\hat{\theta}/\theta$ | 0 | 200 | 1.00 (0.004) | 1.00 (0.003) | 1.00 (0.002) | 1.00 (0.002) | 1.00 (0.001) | 1 | - | - |
| | 0 | 50 | 1.01 (0.005) | 1.00 (0.003) | 1.00 (0.002) | 1.00 (0.001) | 1.00 (0.001) | 1 | - | - |
| | 0 | 100 | 1.00 (0.004) | 1.00 (0.003) | 1.00 (0.002) | 1.00 (0.001) | 1.00 (0.001) | 1 | - | - |
| | 0 | 140 | 1.00 (0.004) | 1.00 (0.004) | 1.00 (0.003) | 1.00 (0.002) | 1.00 (0.001) | 1 | - | - |
| | 2 | 12 | 1.41 (0.062) | 1.13 (0.019) | 1.04 (0.007) | 1.02 (0.003) | 1.01 (0.004) | 1 | - | - |
| RE | 0 | 200 | 0.88 (0.026) | 0.91 (0.049) | 0.88 (0.027) | 0.87 (0.033) | 0.86 (0.037) | 0.86 | 0.84 | 0.97 |
| | 0 | 50 | 0.79 (0.043) | 0.79 (0.044) | 0.81 (0.045) | 0.82 (0.019) | 0.82 (0.028) | 0.83 | 0.80 | 0.97 |
| | 0 | 100 | 0.92 (0.036) | 0.94 (0.030) | 0.94 (0.033) | 0.94 (0.038) | 0.94 (0.031) | 0.95 | 0.92 | 0.97 |
| | 0 | 140 | 1.01 (0.078) | 0.99 (0.047) | 1.01 (0.040) | 0.94 (0.038) | 1.02 (0.038) | 1.00 | 0.97 | 0.97 |
| | 2 | 12 | 0.01 (0.002) | 0.03 (0.007) | 0.07 (0.005) | 0.09 (0.004) | 0.10 (0.006) | 0.11 | 0.10 | 0.97 |
Note: The boxed entry is identical to the boxed entry from Table 1.
Table 5. Finite-sample performance evaluation of MTuM relative to MLE for grouped data from an exponential ($\theta = 10$) distribution and group boundaries vector $G_4 = (0:10:100, 200, \infty)$.

MTuM Performance for Exponential Grouped Data

| | $t_l$ | $t_r$ | $n = 50$ | $n = 100$ | $n = 250$ | $n = 500$ | $n = 1000$ | $n \to \infty$ | | |
|---|---|---|---|---|---|---|---|---|---|---|
| $\hat{\theta}/\theta$ | 0 | 200 | 1.00 (0.002) | 1.00 (0.004) | 1.00 (0.002) | 1.00 (0.001) | 1.00 (0.001) | 1 | - | - |
| | 0 | 50 | 1.01 (0.007) | 1.00 (0.004) | 1.00 (0.002) | 1.00 (0.001) | 1.00 (0.001) | 1 | - | - |
| | 0 | 100 | 1.00 (0.006) | 1.00 (0.003) | 1.00 (0.002) | 1.00 (0.002) | 1.00 (0.001) | 1 | - | - |
| | 0 | 140 | 1.00 (0.006) | 1.00 (0.002) | 1.00 (0.002) | 1.00 (0.001) | 1.00 (0.001) | 1 | - | - |
| | 2 | 12 | 1.16 (0.022) | 1.06 (0.007) | 1.02 (0.005) | 1.01 (0.004) | 1.01 (0.002) | 1 | - | - |
| RE | 0 | 200 | 1.00 (0.032) | 1.03 (0.038) | 1.00 (0.036) | 0.99 (0.049) | 1.01 (0.048) | 1.00 | 0.92 | 0.92 |
| | 0 | 50 | 0.76 (0.054) | 0.78 (0.031) | 0.78 (0.028) | 0.80 (0.031) | 0.81 (0.027) | 0.81 | 0.74 | 0.92 |
| | 0 | 100 | 0.97 (0.064) | 0.97 (0.038) | 1.00 (0.034) | 0.97 (0.053) | 0.99 (0.026) | 1.00 | 0.92 | 0.92 |
| | 0 | 140 | 0.99 (0.042) | 1.00 (0.061) | 1.01 (0.058) | 1.00 (0.024) | 1.01 (0.044) | 1.00 | 0.92 | 0.92 |
| | 2 | 12 | 0.04 (0.010) | 0.11 (0.020) | 0.16 (0.010) | 0.18 (0.007) | 0.18 (0.008) | 0.18 | 0.17 | 0.92 |
Table 6. Finite-sample performance evaluation of MTuM relative to MLE for grouped data from an exponential ($\theta = 10$) distribution and group boundaries vector $G_1 = (0:50:200, \infty)$.

MTuM Performance for Exponential Grouped Data

| | $t_l$ | $t_r$ | $n = 50$ | $n = 100$ | $n = 250$ | $n = 500$ | $n = 1000$ | $n \to \infty$ | | |
|---|---|---|---|---|---|---|---|---|---|---|
| MEAN | 0 | 200 | 0.68 (0.011) | 0.78 (0.007) | 0.91 (0.006) | 0.97 (0.003) | 0.99 (0.002) | 1 | - | - |
| | 0 | 50 | n/a | n/a | n/a | n/a | n/a | n/a | - | - |
| | 0 | 100 | 0.68 (0.011) | 0.78 (0.008) | 0.91 (0.011) | 0.97 (0.004) | 0.99 (0.003) | 1 | - | - |
| | 0 | 140 | 0.68 (0.011) | 0.78 (0.014) | 0.92 (0.006) | 0.97 (0.004) | 0.99 (0.002) | 1 | - | - |
| | 2 | 12 | n/a | n/a | n/a | n/a | n/a | n/a | - | - |
| RE | 0 | 200 | 0.43 (0.003) | 0.31 (0.006) | 0.32 (0.014) | 0.52 (0.033) | 0.84 (0.079) | 1.00 | 0.17 | 0.17 |
| | 0 | 50 | n/a | n/a | n/a | n/a | n/a | n/a | - | - |
| | 0 | 100 | 0.43 (0.007) | 0.30 (0.007) | 0.32 (0.018) | 0.53 (0.060) | 0.84 (0.046) | 0.97 | 0.17 | 0.17 |
| | 0 | 140 | 0.43 (0.006) | 0.31 (0.009) | 0.34 (0.018) | 0.55 (0.030) | 0.87 (0.058) | 1.00 | 0.17 | 0.17 |
| | 2 | 12 | n/a | n/a | n/a | n/a | n/a | n/a | - | - |
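The grouped-data setting behind Tables 2 to 6 can be illustrated with a short simulation. The sketch below expands a boundary specification such as $(0:5:50, 200, \infty)$ into an explicit vector, draws an i.i.d. exponential sample with mean $\theta = 10$, and reduces it to group counts together with the model group probabilities $F(c_j) - F(c_{j-1})$. It is not the paper's simulation code: the helper names `boundaries`, `group_counts`, and `group_probs`, the seed, the left-open and right-closed interval convention $(c_{j-1}, c_j]$, and the reading of the final boundary as $\infty$ are all assumptions made for illustration.

```python
import bisect
import math
import random


def boundaries(start, step, stop, *extra):
    """Expand 'start:step:stop', then append extra finite boundaries and infinity."""
    n_steps = int(round((stop - start) / step))
    grid = [start + k * step for k in range(n_steps + 1)]
    return grid + list(extra) + [math.inf]


def group_counts(sample, bounds):
    """Count observations falling in each interval (bounds[j-1], bounds[j]]."""
    counts = [0] * (len(bounds) - 1)
    for x in sample:
        j = bisect.bisect_left(bounds, x) - 1
        counts[max(j, 0)] += 1  # a value equal to bounds[0] goes to the first group
    return counts


def group_probs(bounds, theta):
    """Model probabilities F(c_j) - F(c_{j-1}) with F(x) = 1 - exp(-x / theta)."""
    cdf = [1.0 - math.exp(-c / theta) for c in bounds]
    return [hi - lo for lo, hi in zip(cdf[:-1], cdf[1:])]


THETA = 10.0                      # assumed mean of the exponential model
G3 = boundaries(0, 5, 50, 200)    # (0:5:50, 200, inf), as in Table 4

rng = random.Random(2024)         # arbitrary seed, for reproducibility only
sample = [rng.expovariate(1.0 / THETA) for _ in range(1000)]

counts = group_counts(sample, G3)
probs = group_probs(G3, THETA)

if __name__ == "__main__":
    for lo, hi, c, p in zip(G3[:-1], G3[1:], counts, probs):
        print(f"({lo}, {hi}]: observed {c / len(sample):.3f}, model {p:.3f}")
```

Only `counts` and the boundary vector would be available to an estimator working on grouped data; the raw `sample` is kept here solely to produce the counts.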