Article

A New Weighted Lindley Model with Applications to Extreme Historical Insurance Claims

by Morad Alizadeh¹, Mahmoud Afshari¹, Gauss M. Cordeiro², Ziaurrahman Ramaki¹, Javier E. Contreras-Reyes³,*, Fatemeh Dirnik¹ and Haitham M. Yousof⁴
¹ Department of Statistics, Faculty of Intelligent Systems Engineering and Data Science, Persian Gulf University, Bushehr 75169, Iran
² Department of Statistics, Federal University of Pernambuco, Recife 50670-901, Brazil
³ Instituto de Matemática, Física y Estadística, Facultad de Ingeniería y Negocios, Universidad de Las Américas, Sede Viña del Mar, 7 Norte 1348, Viña del Mar 2531098, Chile
⁴ Department of Statistics, Mathematics and Insurance, Benha University, Benha 13518, Egypt
* Author to whom correspondence should be addressed.
Submission received: 20 November 2024 / Revised: 4 January 2025 / Accepted: 8 January 2025 / Published: 15 January 2025
(This article belongs to the Section Reliability Engineering)

Abstract

In this paper, we propose a new weighted Lindley (NWLi) model for the analysis of extreme historical insurance claims. It extends the classical Lindley distribution by incorporating a weight parameter, which gives greater flexibility in modeling insurance claim severity. We provide a comprehensive theoretical overview of the new model and explore two practical applications. First, we investigate the mean-of-order P (MOOP(P)) approach for quantifying the expected claim severity based on the NWLi model. Second, we implement a peaks over a random threshold (PORT) analysis using the value-at-risk metric to assess extreme claim occurrences under the new model. Further, we provide a simulation study to evaluate the accuracy of the estimators under various methods. The proposed model and its applications provide a versatile tool for actuaries and risk analysts to analyze and predict extreme insurance claim severity, offering insights into risk management and decision-making within the insurance industry.

1. Introduction

The Lindley (Li) distribution is a very important tool in insurance and actuarial sciences, due to its flexibility and applicability in modeling claim count data and event occurrences [1]. In insurance, it is used to model the number of insurance claims or the frequency of certain events occurring within a given time frame. It is particularly useful for situations where the data exhibit overdispersion (higher variability than expected under a Poisson distribution) or where there is a clustering of events. Actuaries adopt the Li distribution to develop risk models that account for the nature of claims and events. This distribution can effectively capture scenarios where the occurrence of events is influenced by previous events, which is a common characteristic in insurance claims (e.g., accident-prone areas or policyholders).
Actuaries often use the standard Li distribution to analyze historical claims data and forecast future claim frequencies. By fitting this distribution to observed data, actuaries can estimate the likelihood of different claim scenarios and assess the associated risks more accurately [2,3]. Understanding the distribution of claim counts is crucial for risk management in insurance: the Li distribution allows insurers to calculate probabilities for different levels of claim frequency, which helps in setting appropriate premiums and reserves. Although it is primarily used in insurance and actuarial contexts, it also finds applications in other fields, such as reliability engineering, epidemiology, and queuing theory, where modeling event occurrences over time is essential. However, as a one-parameter model, it has limited flexibility. For this reason, we present an updated and more flexible version of the Li distribution, called the new weighted Lindley (NWLi) model, for the analysis and treatment of actuarial risks, and we show its flexibility for studying and comparing a set of risk indicators.
Actuarial data often exhibit skewed or asymmetric characteristics, which are difficult to capture with standard symmetric distributions. Weighted distributions can be tailored to address such asymmetry by assigning different weights to various parts of the distribution, thus providing a more accurate representation of the data. In insurance and risk management, outliers or extreme values can have significant implications but are often poorly handled by standard distributions. Weighted distributions can assign lower weights to extreme values, thereby reducing their influence on the overall model and improving robustness against outliers [4,5].
The MOOP(P) and peaks over a random threshold–value-at-risk (PORT-VaR) estimators play a crucial role in dealing with historical claims data by providing robust tools for analyzing tail risks, quantifying extreme loss events, supporting risk management decisions, and ensuring regulatory compliance within the insurance industry [6]. Their practical importance lies in enhancing insurers’ ability to understand and mitigate the impact of extreme events on their business operations and financial stability.
The rest of the paper is organized as follows. Section 2 defines the NWLi distribution and presents some motivations for it. Some structural properties of the proposed distribution are investigated in Section 3. Five alternative methods for estimating the model parameters are introduced in Section 4. Simulation results are presented in Section 5. Two real-data applications illustrating the utility of the new model are discussed in Section 6. The theory of the risk indicators is addressed in Section 7. The historical claims analysis, including the MOOP(P) assessments, the optimal order P, and the PORT-VaR estimator for extreme claims, is presented in Section 8. Finally, Section 9 concludes the paper.

2. The Model

Following the principle of distributional weighting, the cumulative distribution function (CDF) of the two-parameter weighted Lindley distribution is
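$$F(x) \equiv F(x;\alpha,\beta) = \frac{1-\left(1+\frac{\alpha}{\alpha+1}x\right)e^{-\alpha x}}{1+\left(1+\frac{\beta}{\beta+1}x\right)e^{-\beta x}} = \frac{1-\omega(x;\alpha)}{1+\omega(x;\beta)}, \qquad x>0, \tag{1}$$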
where α > 0 and β > 0 are two shape parameters and
$$\omega(x;\beta) = \left(1+\frac{\beta}{\beta+1}x\right)e^{-\beta x}.$$
Its probability density function (PDF) is
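$$f(x) \equiv f(x;\alpha,\beta) = \frac{\frac{\alpha^2}{1+\alpha}(1+x)e^{-\alpha x}\left[1+\omega(x;\beta)\right] + \frac{\beta^2}{1+\beta}(1+x)e^{-\beta x}\left[1-\omega(x;\alpha)\right]}{\left[1+\omega(x;\beta)\right]^2}. \tag{2}$$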
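To make (1) and (2) concrete, here is a minimal Python sketch (standard library only) that evaluates the CDF and PDF directly from ω(x; ·) and checks numerically that the PDF is the derivative of the CDF and integrates to one. The function names and the test values α = 0.6, β = 0.5 are illustrative choices, not part of the paper.

```python
import math


def omega(x, a):
    """Lindley survival term omega(x; a) = (1 + a x / (a + 1)) exp(-a x)."""
    return (1.0 + a * x / (a + 1.0)) * math.exp(-a * x)


def lindley_pdf(x, a):
    """Lindley density a^2 (1 + x) exp(-a x) / (1 + a)."""
    return a * a * (1.0 + x) * math.exp(-a * x) / (1.0 + a)


def nwli_cdf(x, alpha, beta):
    """CDF (1): [1 - omega(x; alpha)] / [1 + omega(x; beta)]."""
    return (1.0 - omega(x, alpha)) / (1.0 + omega(x, beta))


def nwli_pdf(x, alpha, beta):
    """PDF (2), written with Lindley densities in place of the explicit terms."""
    q = 1.0 + omega(x, beta)
    num = lindley_pdf(x, alpha) * q + lindley_pdf(x, beta) * (1.0 - omega(x, alpha))
    return num / q ** 2


def simpson(g, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3.0


if __name__ == "__main__":
    alpha, beta = 0.6, 0.5
    for x in (0.1, 1.0, 5.0):
        h = 1e-5
        # Central finite difference of the CDF should match the PDF.
        fd = (nwli_cdf(x + h, alpha, beta) - nwli_cdf(x - h, alpha, beta)) / (2 * h)
        print(x, nwli_pdf(x, alpha, beta), fd)
    # The upper limit 200 stands in for infinity; the tail beyond it is negligible here.
    print("integral of pdf:", simpson(lambda t: nwli_pdf(t, alpha, beta), 0.0, 200.0))
```

Writing the PDF through the Lindley densities keeps the code term-by-term aligned with (2), which makes it easy to audit against the formula.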
For α = β , Equation (2) reduces to
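$$f(x;\alpha) = \frac{2\alpha^2(1+x)e^{-\alpha x}}{(1+\alpha)\left[1+\left(1+\frac{\alpha x}{1+\alpha}\right)e^{-\alpha x}\right]^2} = w(x)\, f_{Li}(x;\alpha),$$
where
$$w(x) = \frac{2}{\left[1+\left(1+\frac{\alpha x}{1+\alpha}\right)e^{-\alpha x}\right]^2}$$
and
$$f_{Li}(x;\alpha) = \frac{\alpha^2}{1+\alpha}(1+x)e^{-\alpha x}$$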
denotes the PDF of the Lindley distribution with parameter α . This means that the new model is a weighted Lindley distribution.
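The weighted-Lindley representation can be checked numerically. The Python sketch below (illustrative names, standard library only) confirms that, for α = β, the general density (2) coincides with w(x) f_Li(x; α), and that the weight has expectation one under the Lindley law, as it must because w f_Li is itself a density. Since ω(x; α) decreases from 1 to 0, the weight rises from w(0) = 1/2 towards 2 as x grows.

```python
import math


def omega(x, a):
    """Lindley survival term (1 + a x / (a + 1)) exp(-a x)."""
    return (1.0 + a * x / (a + 1.0)) * math.exp(-a * x)


def lindley_pdf(x, a):
    """Lindley density a^2 (1 + x) exp(-a x) / (1 + a)."""
    return a * a * (1.0 + x) * math.exp(-a * x) / (1.0 + a)


def nwli_pdf(x, alpha, beta):
    """General NWLi density (2)."""
    q = 1.0 + omega(x, beta)
    return (lindley_pdf(x, alpha) * q + lindley_pdf(x, beta) * (1.0 - omega(x, alpha))) / q ** 2


def weight(x, a):
    """Weight w(x) = 2 / [1 + omega(x; a)]^2 from the alpha = beta case."""
    return 2.0 / (1.0 + omega(x, a)) ** 2


def simpson(g, a, b, n=20000):
    """Composite Simpson rule with an even number n of panels."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3.0


if __name__ == "__main__":
    a = 0.8
    for x in (0.01, 0.5, 3.0, 20.0):
        print(x, nwli_pdf(x, a, a), weight(x, a) * lindley_pdf(x, a))
    # E_Li[w(X)] should equal 1 because w * f_Li integrates to one.
    print(simpson(lambda t: weight(t, a) * lindley_pdf(t, a), 0.0, 150.0))
```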
Figure 1 displays plots of the NWLi density for selected parameter values. The density can be right-skewed without a peak (decreasing) or right-skewed with one peak, and the peak can be either wide or sharp.
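The hazard rate function (HRF) of the NWLi model is
$$h(x) \equiv h(x;\alpha,\beta) = \frac{f(x)}{1-F(x)} = \frac{\frac{\beta^2}{1+\beta}(1+x)e^{-\beta x}\left[1-\omega(x;\alpha)\right] + \frac{\alpha^2}{1+\alpha}(1+x)e^{-\alpha x}\left[1+\omega(x;\beta)\right]}{\left[1+\omega(x;\beta)\right]\left[\omega(x;\alpha)+\omega(x;\beta)\right]}.$$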
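The HRF can be evaluated without forming 1 − F(x) explicitly, because the survival function simplifies to [ω(x;α) + ω(x;β)]/[1 + ω(x;β)]. The Python sketch below (illustrative names, standard library only) uses that form, which avoids subtracting two numbers close to one in the right tail, and can be used to tabulate h(x) when exploring hazard shapes.

```python
import math


def omega(x, a):
    """Lindley survival term (1 + a x / (a + 1)) exp(-a x)."""
    return (1.0 + a * x / (a + 1.0)) * math.exp(-a * x)


def lindley_pdf(x, a):
    """Lindley density a^2 (1 + x) exp(-a x) / (1 + a)."""
    return a * a * (1.0 + x) * math.exp(-a * x) / (1.0 + a)


def nwli_cdf(x, alpha, beta):
    return (1.0 - omega(x, alpha)) / (1.0 + omega(x, beta))


def nwli_pdf(x, alpha, beta):
    q = 1.0 + omega(x, beta)
    return (lindley_pdf(x, alpha) * q + lindley_pdf(x, beta) * (1.0 - omega(x, alpha))) / q ** 2


def nwli_hazard(x, alpha, beta):
    """Closed-form HRF; the denominator is (1 + w_b)(w_a + w_b), so no 1 - F(x) is formed."""
    wa, wb = omega(x, alpha), omega(x, beta)
    num = lindley_pdf(x, beta) * (1.0 - wa) + lindley_pdf(x, alpha) * (1.0 + wb)
    return num / ((1.0 + wb) * (wa + wb))


if __name__ == "__main__":
    # Tabulate h(x) on a grid to inspect its shape for a given (alpha, beta).
    for x in (0.01, 0.5, 1.0, 2.0, 5.0, 10.0, 30.0):
        print(f"x={x:6.2f}  h={nwli_hazard(x, 0.6, 0.5):.5f}")
```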
Figure 2 displays some hazard rate plots, which reveal that the HRF can be decreasing-constant, increasing-constant, upside-down bathtub, or bathtub-shaped.
The NWLi model and its applications have several important scientific motivations, particularly in addressing critical needs within the insurance industry. Some of them are as follows:
  • The development and application of the NWLi model represents a novel advancement in statistical modeling techniques tailored specifically for extreme insurance claims. Therefore, this article contributes to the ongoing evolution of statistical methods in risk assessment and management.
  • This paper addresses a significant knowledge gap in the analysis of sudden large losses within insurance companies. The results and methodologies presented offer valuable insights into modeling extreme claim severity, which is essential for developing effective risk management strategies.
  • This work can directly benefit insurance companies by providing practical tools and methodologies to assess and mitigate sudden large losses. By showcasing the relevance and impact of the NWLi model in addressing real challenges, the paper serves as a valuable resource for industry practitioners.
  • Insurance companies rely on scientific evidence and rigorous methodologies to make informed decisions about risk exposure and financial resilience. The NWLi model can support evidence-based decision-making in insurance practices.
  • It encourages innovative approaches that can enhance the resilience and competitiveness of insurance companies in addressing sudden large and extreme losses.

3. Properties

For the following properties, let X ∼ NWLi(α, β).

3.1. Asymptotic Properties

The following asymptotic results hold as x → 0⁺:
$$F(x) \sim \frac{\alpha^2 x}{2(1+\alpha)}, \qquad f(x) \to \frac{\alpha^2}{2(1+\alpha)}, \qquad h(x) \to \frac{\alpha^2}{2(1+\alpha)}.$$
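Let δ = min(α, β). As x → ∞, the following asymptotic behaviors hold:
$$1-F(x) \sim \frac{\delta x}{1+\delta}e^{-\delta x}, \qquad f(x) \sim \frac{\delta^2 x}{1+\delta}e^{-\delta x}, \qquad h(x) \to \delta,$$
where, for α = β, the first two expressions are multiplied by 2.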
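The limiting behavior can be checked numerically. Near zero, ω(x;α) ≈ 1 − α²x/(1+α), so f(x) and h(x) should approach α²/[2(1+α)]; for large x, h(x) should approach δ = min(α, β) and the survival function should behave like δx e^{−δx}/(1+δ). The Python sketch below (illustrative names and parameter values) evaluates the model at very small and very large x; x is kept at 500 so that the exponentials do not underflow to zero.

```python
import math


def omega(x, a):
    """Lindley survival term (1 + a x / (a + 1)) exp(-a x)."""
    return (1.0 + a * x / (a + 1.0)) * math.exp(-a * x)


def lindley_pdf(x, a):
    return a * a * (1.0 + x) * math.exp(-a * x) / (1.0 + a)


def nwli_pdf(x, alpha, beta):
    q = 1.0 + omega(x, beta)
    return (lindley_pdf(x, alpha) * q + lindley_pdf(x, beta) * (1.0 - omega(x, alpha))) / q ** 2


def nwli_survival(x, alpha, beta):
    """1 - F(x) = [omega(x; alpha) + omega(x; beta)] / [1 + omega(x; beta)]."""
    return (omega(x, alpha) + omega(x, beta)) / (1.0 + omega(x, beta))


def nwli_hazard(x, alpha, beta):
    return nwli_pdf(x, alpha, beta) / nwli_survival(x, alpha, beta)


if __name__ == "__main__":
    a, b = 0.6, 0.5
    d = min(a, b)
    print("f(0+), h(0+):", nwli_pdf(1e-9, a, b), nwli_hazard(1e-9, a, b), a * a / (2 * (1 + a)))
    for x in (50.0, 200.0, 500.0):
        tail = d * x * math.exp(-d * x) / (1 + d)
        print(x, nwli_hazard(x, a, b), nwli_survival(x, a, b) / tail)
```

The hazard approaches δ only slowly (the gap behaves like 1/x), so the large-x checks are loose by design.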

3.2. Moments

The nth ordinary moment of X follows as
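$$\begin{aligned}
E(X^n) &= n\int_0^\infty x^{n-1}\,\frac{\omega(x;\alpha)+\omega(x;\beta)}{1+\omega(x;\beta)}\,dx \\
&= n\sum_{i=0}^\infty (-1)^i \int_0^\infty \omega(x;\beta)^i\, x^{n-1}\left[\omega(x;\alpha)+\omega(x;\beta)\right]dx \\
&= n\sum_{i=0}^\infty\sum_{j=0}^i (-1)^i\binom{i}{j}\left(\frac{\beta}{\beta+1}\right)^j \int_0^\infty x^{n+j-1}e^{-i\beta x}\left[\omega(x;\alpha)+\omega(x;\beta)\right]dx,
\end{aligned}$$
where the second step expands 1/[1 + ω(x;β)] as a geometric series, which is valid because 0 < ω(x;β) < 1 for x > 0. Evaluating the remaining integrals with $\int_0^\infty x^{s-1}e^{-cx}\,dx = \Gamma(s)/c^s$ gives
$$E(X^n) = n\sum_{i=0}^\infty\sum_{j=0}^i (-1)^i\binom{i}{j}\left(\frac{\beta}{\beta+1}\right)^j\left[\frac{\Gamma(n+j)}{(i\beta+\beta)^{n+j}} + \frac{\beta\,\Gamma(n+j+1)}{(1+\beta)(i\beta+\beta)^{n+j+1}} + \frac{\Gamma(n+j)}{(i\beta+\alpha)^{n+j}} + \frac{\alpha\,\Gamma(n+j+1)}{(\alpha+1)(i\beta+\alpha)^{n+j+1}}\right], \tag{4}$$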
where Γ ( · ) is the usual gamma function.
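As a sanity check on the series for E(Xⁿ), the Python sketch below evaluates it with the inner terms computed in log space (to avoid overflow in the gamma functions), truncates the outer sum at a fixed number of terms (an arbitrary choice), and compares the result with direct numerical integration of n∫₀^∞ x^{n−1}[1 − F(x)]dx. Because the outer terms alternate in sign and decrease in magnitude with i, the truncation error is bounded by the first omitted term; convergence is slow for n = 1, so the check uses n = 2 and 3. Function names are illustrative.

```python
import math


def omega(x, a):
    """Lindley survival term (1 + a x / (a + 1)) exp(-a x)."""
    return (1.0 + a * x / (a + 1.0)) * math.exp(-a * x)


def simpson(g, a, b, n=40000):
    """Composite Simpson rule with an even number n of panels."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3.0


def moment_numeric(n, alpha, beta, upper=400.0):
    """E(X^n) = n * int_0^inf x^(n-1) [1 - F(x)] dx, truncated at `upper`."""
    def g(x):
        wa, wb = omega(x, alpha), omega(x, beta)
        return x ** (n - 1) * (wa + wb) / (1.0 + wb)
    return n * simpson(g, 0.0, upper)


def moment_series(n, alpha, beta, terms=300):
    """Series (4) truncated after `terms` values of i."""
    b = beta / (beta + 1.0)
    total = 0.0
    for i in range(terms):
        c = (i + 1) * beta     # rate from exp(-i beta x) * omega(x; beta)
        d = i * beta + alpha   # rate from exp(-i beta x) * omega(x; alpha)
        inner = []
        for j in range(i + 1):
            m = n + j
            # log of binom(i, j) * (beta / (beta + 1))^j
            lw = math.lgamma(i + 1) - math.lgamma(j + 1) - math.lgamma(i - j + 1) + j * math.log(b)
            inner.append(math.exp(lw + math.lgamma(m) - m * math.log(c)))
            inner.append(math.exp(lw + math.log(b) + math.lgamma(m + 1) - (m + 1) * math.log(c)))
            inner.append(math.exp(lw + math.lgamma(m) - m * math.log(d)))
            inner.append(math.exp(lw + math.log(alpha / (alpha + 1.0))
                                  + math.lgamma(m + 1) - (m + 1) * math.log(d)))
        total += (-1) ** i * math.fsum(inner)
    return n * total


if __name__ == "__main__":
    for n in (2, 3):
        print(n, moment_series(n, 0.6, 0.5), moment_numeric(n, 0.6, 0.5))
```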
We calculate the mean, skewness, and kurtosis of X from (4); plots of these measures are shown in Figure 3 and illustrate the greater flexibility of the proposed model compared with the Li distribution.
The nth lower incomplete moment of X
$$m_n(x) \equiv E(X^n \mid X \le x) = \frac{1}{F(x)}\int_0^x t^n f(t)\,dt,$$
follows as
$$m_n(x) = \frac{1}{F(x)}\int_0^x t^n\, \frac{\frac{\beta^2}{1+\beta}(1+t)e^{-\beta t}\left[1-\omega(t;\alpha)\right] + \frac{\alpha^2}{1+\alpha}(1+t)e^{-\alpha t}\left[1+\omega(t;\beta)\right]}{\left[1+\omega(t;\beta)\right]^2}\,dt.$$
By using the expansion
$$\frac{1}{(1+z)^2} = \sum_{i=0}^\infty (-1)^i (i+1)\, z^i, \qquad |z|<1,$$
with z = ω(t;β), we have
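$$\begin{aligned}
m_n(x) &= \frac{1}{F(x)}\int_0^x t^n \sum_{i=0}^\infty (-1)^i(i+1)\,\omega(t;\beta)^i\left\{\frac{\beta^2}{1+\beta}(1+t)e^{-\beta t}\left[1-\omega(t;\alpha)\right]+\frac{\alpha^2}{1+\alpha}(1+t)e^{-\alpha t}\left[1+\omega(t;\beta)\right]\right\}dt \\
&= \frac{1}{F(x)}\int_0^x \sum_{i=0}^\infty\sum_{j=0}^i (-1)^i\binom{i}{j}(i+1)\left(\frac{\beta}{\beta+1}\right)^j t^{n+j}e^{-i\beta t}\left\{\frac{\beta^2}{1+\beta}(1+t)e^{-\beta t}\left[1-\omega(t;\alpha)\right]+\frac{\alpha^2}{1+\alpha}(1+t)e^{-\alpha t}\left[1+\omega(t;\beta)\right]\right\}dt.
\end{aligned}$$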
Finally,
$$\begin{aligned}
m_n(x) = \frac{1}{F(x)}\sum_{i=0}^\infty\sum_{j=0}^i q_{i,j}\Bigg[ & \frac{\alpha^2\beta\,\gamma(n+j+3,b_i x)}{(\alpha+1)(\beta+1)b_i^{n+j+3}} + \frac{\alpha^2\gamma(n+j+1,b_i x)}{(\alpha+1)b_i^{n+j+1}} + \frac{\alpha^2\gamma(n+j+2,b_i x)}{(\alpha+1)b_i^{n+j+2}} + \frac{\alpha^2\beta\,\gamma(n+j+2,b_i x)}{(\alpha+1)(\beta+1)b_i^{n+j+2}} \\
& + \frac{\alpha^2\gamma(n+j+1,a_i x)}{(\alpha+1)a_i^{n+j+1}} + \frac{\alpha^2\gamma(n+j+2,a_i x)}{(\alpha+1)a_i^{n+j+2}} - \frac{\alpha\beta^2\gamma(n+j+3,b_i x)}{(\alpha+1)(\beta+1)b_i^{n+j+3}} - \frac{\alpha\beta^2\gamma(n+j+2,b_i x)}{(\alpha+1)(\beta+1)b_i^{n+j+2}} \\
& - \frac{\beta^2\gamma(n+j+1,b_i x)}{(\beta+1)b_i^{n+j+1}} - \frac{\beta^2\gamma(n+j+2,b_i x)}{(\beta+1)b_i^{n+j+2}} + \frac{\beta^2\gamma(n+j+1,c_i x)}{(\beta+1)c_i^{n+j+1}} + \frac{\beta^2\gamma(n+j+2,c_i x)}{(\beta+1)c_i^{n+j+2}}\Bigg],
\end{aligned}$$
where $a_i = \alpha + i\beta$, $b_i = \alpha + (i+1)\beta$, $c_i = (i+1)\beta$,
$$q_{i,j} = (-1)^i (i+1)\binom{i}{j}\left(\frac{\beta}{\beta+1}\right)^j,$$
and
$$\gamma(s,x) = \int_0^x t^{s-1}e^{-t}\,dt$$
is the lower incomplete gamma function.
The mean deviations of X about any point (for example, the mean and the median) and the Bonferroni and Lorenz curves can be easily calculated from the first incomplete moment m₁(x) and the quantile function Q(u) = F⁻¹(u), 0 < u < 1, which is determined numerically from (1).
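The Python sketch below shows one way to carry out these calculations numerically (standard library only; the helper names are illustrative). It inverts (1) by bracketing and bisection, computes the partial mean ∫₀ˣ t f(t) dt = F(x) m₁(x) by Simpson's rule, and then uses the standard definitions L(u) = ∫₀^{Q(u)} t f(t) dt / E(X) for the Lorenz curve, B(u) = L(u)/u for the Bonferroni curve, and E|X − μ| = 2μF(μ) − 2∫₀^μ t f(t) dt for the mean deviation about the mean. It is a sketch rather than the authors' implementation; a series-based m₁(x) could replace the quadrature.

```python
import math


def omega(x, a):
    """Lindley survival term (1 + a x / (a + 1)) exp(-a x)."""
    return (1.0 + a * x / (a + 1.0)) * math.exp(-a * x)


def lindley_pdf(x, a):
    return a * a * (1.0 + x) * math.exp(-a * x) / (1.0 + a)


def nwli_cdf(x, alpha, beta):
    return (1.0 - omega(x, alpha)) / (1.0 + omega(x, beta))


def nwli_pdf(x, alpha, beta):
    q = 1.0 + omega(x, beta)
    return (lindley_pdf(x, alpha) * q + lindley_pdf(x, beta) * (1.0 - omega(x, alpha))) / q ** 2


def simpson(g, a, b, n=20000):
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3.0


def nwli_quantile(u, alpha, beta, tol=1e-12):
    """Q(u) = F^{-1}(u) for 0 <= u < 1: bracket by doubling, then bisect on the CDF (1)."""
    lo, hi = 0.0, 1.0
    while nwli_cdf(hi, alpha, beta) < u:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if nwli_cdf(mid, alpha, beta) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def partial_mean(x, alpha, beta):
    """int_0^x t f(t) dt, which equals F(x) * m_1(x)."""
    return simpson(lambda t: t * nwli_pdf(t, alpha, beta), 0.0, x)


def lorenz(u, alpha, beta, mean):
    return partial_mean(nwli_quantile(u, alpha, beta), alpha, beta) / mean


def bonferroni(u, alpha, beta, mean):
    return lorenz(u, alpha, beta, mean) / u


def mean_deviation_about_mean(alpha, beta, mean):
    return 2.0 * mean * nwli_cdf(mean, alpha, beta) - 2.0 * partial_mean(mean, alpha, beta)


if __name__ == "__main__":
    a, b = 0.6, 0.5
    mu = partial_mean(400.0, a, b)  # the upper limit 400 stands in for infinity here
    print("mean:", mu, "median:", nwli_quantile(0.5, a, b))
    print("L(0.5) =", lorenz(0.5, a, b, mu), "B(0.5) =", bonferroni(0.5, a, b, mu))
    print("mean deviation about the mean:", mean_deviation_about_mean(a, b, mu))
```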

4. Estimation Methods

4.1. Maximum Likelihood

We estimate the parameters of the NWLi model using the maximum likelihood (ML) method. If $x_1, \ldots, x_n$ is a random sample of size n from the NWLi(α, β) distribution and θ = (α, β) is the parameter vector, we have
$$F(x) = \frac{p(x)}{q(x)}, \qquad f(x) = \frac{r(x)q(x) - s(x)p(x)}{q(x)^2},$$
where
$$p(x) = 1-\omega(x;\alpha), \quad q(x) = 1+\omega(x;\beta), \quad r(x) = \frac{dp(x)}{dx} = \frac{\alpha^2}{1+\alpha}(1+x)e^{-\alpha x}, \quad s(x) = \frac{dq(x)}{dx} = -\frac{\beta^2}{1+\beta}(1+x)e^{-\beta x}.$$
The log-likelihood function for θ has the form
$$\ell(\theta) = \sum_{i=1}^n \log\left[r(x_i)q(x_i) - s(x_i)p(x_i)\right] - 2\sum_{i=1}^n \log q(x_i).$$
The parameter estimates can be found by maximizing ℓ(θ) numerically, e.g., with the AdequacyModel package [7] for R.
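Furthermore, the likelihood equations follow by differentiating ℓ(θ) with respect to the parameters:
$$\frac{\partial\ell(\theta)}{\partial\alpha} = \sum_{i=1}^n \frac{r_i^{(\alpha)}q_i - s_i p_i^{(\alpha)}}{r_i q_i - s_i p_i} = 0, \qquad \frac{\partial\ell(\theta)}{\partial\beta} = \sum_{i=1}^n \frac{r_i q_i^{(\beta)} - s_i^{(\beta)}p_i}{r_i q_i - s_i p_i} - 2\sum_{i=1}^n \frac{q_i^{(\beta)}}{q_i} = 0,$$
where $p_i = p(x_i)$, $q_i = q(x_i)$, $r_i = r(x_i)$, $s_i = s(x_i)$, and
$$\begin{aligned}
r_i^{(\alpha)} &= \frac{\partial r_i}{\partial\alpha} = (1+x_i)\left[1-\frac{1}{(1+\alpha)^2}-\frac{\alpha^2 x_i}{1+\alpha}\right]e^{-\alpha x_i}, \\
s_i^{(\beta)} &= \frac{\partial s_i}{\partial\beta} = -(1+x_i)\left[1-\frac{1}{(1+\beta)^2}-\frac{\beta^2 x_i}{1+\beta}\right]e^{-\beta x_i}, \\
p_i^{(\alpha)} &= \frac{\partial p_i}{\partial\alpha} = \left[1-\frac{1}{(1+\alpha)^2}+\frac{\alpha x_i}{\alpha+1}\right]x_i e^{-\alpha x_i}, \\
q_i^{(\beta)} &= \frac{\partial q_i}{\partial\beta} = -\left[1-\frac{1}{(1+\beta)^2}+\frac{\beta x_i}{\beta+1}\right]x_i e^{-\beta x_i}.
\end{aligned}$$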
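The likelihood equations are easy to get wrong by a sign, so a finite-difference check is worthwhile. The Python sketch below (standard library only; names are illustrative) codes ℓ(θ) through p, q, r, and s, with s(x) = dq/dx = −β²(1+x)e^{−βx}/(1+β), implements the analytic score from the derivatives of p, q, r, and s, and compares it with central differences of ℓ(θ) on the failure-time data listed in Section 6.1. The evaluation point θ = (0.6, 0.5) is arbitrary; it is not an estimate.

```python
import math

# Failure times (per 1000 h) of the 50 components in Section 6.1.
FAILURE_TIMES = [0.036, 0.058, 0.061, 0.074, 0.078, 0.086, 0.102, 0.103, 0.114, 0.116,
                 0.148, 0.183, 0.192, 0.254, 0.262, 0.379, 0.381, 0.538, 0.570, 0.574,
                 0.590, 0.618, 0.645, 0.961, 1.228, 1.600, 2.006, 2.054, 2.804, 3.058,
                 3.076, 3.147, 3.625, 3.704, 3.931, 4.073, 4.393, 4.534, 4.893, 6.274,
                 6.816, 7.896, 7.904, 8.022, 9.337, 10.940, 11.020, 13.880, 14.730, 15.080]


def pqrs(x, alpha, beta):
    """p(x), q(x), r(x) = dp/dx and s(x) = dq/dx from Section 4.1."""
    ea, eb = math.exp(-alpha * x), math.exp(-beta * x)
    p = 1.0 - (1.0 + alpha * x / (alpha + 1.0)) * ea
    q = 1.0 + (1.0 + beta * x / (beta + 1.0)) * eb
    r = alpha ** 2 / (1.0 + alpha) * (1.0 + x) * ea
    s = -beta ** 2 / (1.0 + beta) * (1.0 + x) * eb
    return p, q, r, s


def loglik(theta, data):
    """l(theta) = sum log[r q - s p] - 2 sum log q."""
    alpha, beta = theta
    total = 0.0
    for x in data:
        p, q, r, s = pqrs(x, alpha, beta)
        total += math.log(r * q - s * p) - 2.0 * math.log(q)
    return total


def score(theta, data):
    """Analytic gradient (dl/dalpha, dl/dbeta)."""
    alpha, beta = theta
    g_alpha = g_beta = 0.0
    for x in data:
        p, q, r, s = pqrs(x, alpha, beta)
        ea, eb = math.exp(-alpha * x), math.exp(-beta * x)
        r_a = (1.0 + x) * (1.0 - 1.0 / (1.0 + alpha) ** 2 - alpha ** 2 * x / (1.0 + alpha)) * ea
        s_b = -(1.0 + x) * (1.0 - 1.0 / (1.0 + beta) ** 2 - beta ** 2 * x / (1.0 + beta)) * eb
        p_a = (1.0 - 1.0 / (1.0 + alpha) ** 2 + alpha * x / (alpha + 1.0)) * x * ea
        q_b = -(1.0 - 1.0 / (1.0 + beta) ** 2 + beta * x / (beta + 1.0)) * x * eb
        denom = r * q - s * p
        g_alpha += (r_a * q - s * p_a) / denom
        g_beta += (r * q_b - s_b * p) / denom - 2.0 * q_b / q
    return g_alpha, g_beta


if __name__ == "__main__":
    theta = (0.6, 0.5)
    print("l(theta) =", loglik(theta, FAILURE_TIMES))
    print("score    =", score(theta, FAILURE_TIMES))
```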
The Hessian matrix H ( θ ) is derived by taking the second derivatives of ( θ ) with respect to α and β :
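$$H(\theta) = \begin{pmatrix} \dfrac{\partial^2\ell}{\partial\alpha^2} & \dfrac{\partial^2\ell}{\partial\alpha\,\partial\beta} \\[2mm] \dfrac{\partial^2\ell}{\partial\beta\,\partial\alpha} & \dfrac{\partial^2\ell}{\partial\beta^2} \end{pmatrix}.$$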
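The inverse of the observed information matrix −H(θ), evaluated at $\hat\theta$, provides an estimate of the asymptotic variance–covariance matrix of the ML estimators. Under standard regularity conditions, these estimators are asymptotically normal:
$$\hat\theta \stackrel{a}{\sim} N_2\left(\theta, \left[-H(\theta)\right]^{-1}\right),$$
where $N_2$ denotes the bivariate normal distribution.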

4.2. Least Squares

The estimates in the least squares estimation (LSE) method are found by minimizing the function
$$S_{LSE}(\alpha,\beta) = \sum_{i=1}^n \left[F(x_{i:n};\alpha,\beta) - \frac{i}{n+1}\right]^2,$$
where, from now on, $x_{1:n}, \ldots, x_{n:n}$ denote the ordered observations.

4.3. Weighted Least Squares

The objective function of the weighted least squares estimation (WLSE) has the form
$$S_{WLSE}(\alpha,\beta) = \sum_{i=1}^n \frac{(n+1)^2(n+2)}{i(n-i+1)}\left[F(x_{i:n};\alpha,\beta) - \frac{i}{n+1}\right]^2.$$
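The LSE and WLSE objectives translate directly into code. The Python sketch below (hypothetical function names, standard library only) evaluates both for a given θ; a useful check is that both vanish when the data sit exactly at the model quantiles F⁻¹(i/(n+1)), which the bisection-based quantile helper produces. Minimizing the objectives over (α, β) still requires a numerical optimizer, which is not shown here.

```python
import math


def omega(x, a):
    return (1.0 + a * x / (a + 1.0)) * math.exp(-a * x)


def nwli_cdf(x, alpha, beta):
    return (1.0 - omega(x, alpha)) / (1.0 + omega(x, beta))


def nwli_quantile(u, alpha, beta, tol=1e-13):
    """F^{-1}(u) by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while nwli_cdf(hi, alpha, beta) < u:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if nwli_cdf(mid, alpha, beta) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def s_lse(theta, data):
    """Least-squares objective: sum of [F(x_{i:n}) - i/(n+1)]^2 over the ordered data."""
    alpha, beta = theta
    xs = sorted(data)
    n = len(xs)
    return sum((nwli_cdf(x, alpha, beta) - i / (n + 1)) ** 2 for i, x in enumerate(xs, start=1))


def s_wlse(theta, data):
    """Weighted least-squares objective with weights (n+1)^2 (n+2) / [i (n-i+1)]."""
    alpha, beta = theta
    xs = sorted(data)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        w = (n + 1) ** 2 * (n + 2) / (i * (n - i + 1))
        total += w * (nwli_cdf(x, alpha, beta) - i / (n + 1)) ** 2
    return total


if __name__ == "__main__":
    n = 20
    xs = [nwli_quantile(i / (n + 1), 0.6, 0.5) for i in range(1, n + 1)]
    print(s_lse((0.6, 0.5), xs), s_wlse((0.6, 0.5), xs), s_lse((1.0, 1.0), xs))
```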

4.4. Cramér–von Mises

The objective function of the Cramér-von Mises estimator (CME) is
$$S_{CME}(\alpha,\beta) = \frac{1}{12n} + \sum_{i=1}^n \left[F(x_{i:n};\alpha,\beta) - \frac{2i-1}{2n}\right]^2.$$
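A minimal Python sketch of the Cramér–von Mises objective follows (illustrative names, standard library only). When the ordered data coincide with the quantiles F⁻¹((2i − 1)/(2n)), every squared term vanishes and the objective reduces to its constant 1/(12n), which gives a simple correctness check.

```python
import math


def omega(x, a):
    return (1.0 + a * x / (a + 1.0)) * math.exp(-a * x)


def nwli_cdf(x, alpha, beta):
    return (1.0 - omega(x, alpha)) / (1.0 + omega(x, beta))


def nwli_quantile(u, alpha, beta, tol=1e-13):
    """F^{-1}(u) by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while nwli_cdf(hi, alpha, beta) < u:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if nwli_cdf(mid, alpha, beta) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def s_cme(theta, data):
    """Cramer-von Mises objective: 1/(12n) + sum of [F(x_{i:n}) - (2i-1)/(2n)]^2."""
    alpha, beta = theta
    xs = sorted(data)
    n = len(xs)
    return 1.0 / (12 * n) + sum(
        (nwli_cdf(x, alpha, beta) - (2 * i - 1) / (2 * n)) ** 2 for i, x in enumerate(xs, start=1)
    )


if __name__ == "__main__":
    n = 20
    xs = [nwli_quantile((2 * i - 1) / (2 * n), 0.6, 0.5) for i in range(1, n + 1)]
    print(s_cme((0.6, 0.5), xs), 1 / (12 * n))
```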

4.5. Anderson–Darling

The Anderson–Darling estimator (ADE) [8] and the right-tailed Anderson–Darling estimator (RTADE) are obtained by minimizing
$$S_{ADE}(\alpha,\beta) = -n - \frac{1}{n}\sum_{i=1}^n (2i-1)\left[\log F(x_{i:n};\alpha,\beta) + \log\bar F(x_{n+1-i:n};\alpha,\beta)\right]$$
and
$$S_{RTADE}(\alpha,\beta) = \frac{n}{2} - 2\sum_{i=1}^n F(x_{i:n};\alpha,\beta) - \frac{1}{n}\sum_{i=1}^n (2i-1)\log\bar F(x_{n+1-i:n};\alpha,\beta),$$
respectively, where $\bar F(\cdot) = 1 - F(\cdot)$.
The estimates are obtained by setting the first partial derivatives of these functions equal to zero; the resulting nonlinear equations are solved numerically using R software version 4.4.1.
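The two Anderson–Darling objectives can be coded as written, with the index $x_{n+1-i:n}$ mapped to the 0-based position n − i of the sorted sample. The Python sketch below (illustrative names, standard library only) does this; as a check, both sums can be re-indexed so that each observation appears once, with coefficient (2n + 1 − 2k) on log F̄(x_{k:n}), and the two forms must agree. Objective values only; the minimization step is left to any numerical optimizer.

```python
import math


def omega(x, a):
    return (1.0 + a * x / (a + 1.0)) * math.exp(-a * x)


def nwli_cdf(x, alpha, beta):
    return (1.0 - omega(x, alpha)) / (1.0 + omega(x, beta))


def s_ade(theta, data):
    """Anderson-Darling objective; u[n - i] is F(x_{n+1-i:n}) in 0-based indexing."""
    alpha, beta = theta
    xs = sorted(data)
    n = len(xs)
    u = [nwli_cdf(x, alpha, beta) for x in xs]
    total = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
                for i in range(1, n + 1))
    return -n - total / n


def s_rtade(theta, data):
    """Right-tailed Anderson-Darling objective."""
    alpha, beta = theta
    xs = sorted(data)
    n = len(xs)
    u = [nwli_cdf(x, alpha, beta) for x in xs]
    tail = sum((2 * i - 1) * math.log(1.0 - u[n - i]) for i in range(1, n + 1))
    return n / 2.0 - 2.0 * sum(u) - tail / n


if __name__ == "__main__":
    sample = [0.2, 0.5, 1.0, 2.0, 3.5, 6.0, 9.0]  # illustrative values only
    print(s_ade((0.6, 0.5), sample), s_rtade((0.6, 0.5), sample))
```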

5. Simulations

Simulations were conducted to evaluate the different estimation methods for the parameters α and β of the NWLi distribution. Two parameter settings were considered, namely (α, β) = (0.6, 0.5) and (α, β) = (0.5, 0.4). For each setting, 1000 samples of sizes n = 50, 100, and 500 were generated from the NWLi distribution, and the biases and mean squared errors (MSEs) of the estimators were calculated from
$$\mathrm{Bias}(\hat\alpha) = \frac{1}{1000}\sum_{i=1}^{1000}(\hat\alpha_i-\alpha), \quad \mathrm{Bias}(\hat\beta) = \frac{1}{1000}\sum_{i=1}^{1000}(\hat\beta_i-\beta), \quad \mathrm{MSE}(\hat\alpha) = \frac{1}{1000}\sum_{i=1}^{1000}(\hat\alpha_i-\alpha)^2, \quad \mathrm{MSE}(\hat\beta) = \frac{1}{1000}\sum_{i=1}^{1000}(\hat\beta_i-\beta)^2,$$
where $\hat\alpha_i$ and $\hat\beta_i$ denote the estimates of α and β obtained from the ith replication by a given method.
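A simulation of this kind needs two ingredients: a way to draw NWLi samples and the bias/MSE summaries. The Python sketch below (standard library only) assumes inverse-transform sampling, X = F⁻¹(U) with U uniform, using a bisection inverse of (1); the paper does not state which sampler was used. The `estimator` argument of `monte_carlo` is a hypothetical placeholder for any of the fitting methods of Section 4, supplied by the user.

```python
import math
import random


def omega(x, a):
    return (1.0 + a * x / (a + 1.0)) * math.exp(-a * x)


def nwli_cdf(x, alpha, beta):
    return (1.0 - omega(x, alpha)) / (1.0 + omega(x, beta))


def nwli_quantile(u, alpha, beta, tol=1e-10):
    """F^{-1}(u) for 0 <= u < 1 by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while nwli_cdf(hi, alpha, beta) < u:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if nwli_cdf(mid, alpha, beta) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rnwli(n, alpha, beta, rng):
    """Inverse-transform sampling: X = F^{-1}(U), U ~ Uniform(0, 1)."""
    return [nwli_quantile(rng.random(), alpha, beta) for _ in range(n)]


def bias_mse(estimates, true_value):
    """Monte Carlo bias and MSE over R replications, as in the formulas above."""
    r = len(estimates)
    bias = sum(e - true_value for e in estimates) / r
    mse = sum((e - true_value) ** 2 for e in estimates) / r
    return bias, mse


def monte_carlo(estimator, alpha, beta, n, reps=1000, seed=1):
    """`estimator` (hypothetical) maps a sample to (alpha_hat, beta_hat)."""
    rng = random.Random(seed)
    fits = [estimator(rnwli(n, alpha, beta, rng)) for _ in range(reps)]
    return bias_mse([f[0] for f in fits], alpha), bias_mse([f[1] for f in fits], beta)


if __name__ == "__main__":
    rng = random.Random(2025)
    sample = rnwli(5000, 0.6, 0.5, rng)
    print("empirical F(2):", sum(x <= 2.0 for x in sample) / len(sample),
          "model F(2):", nwli_cdf(2.0, 0.6, 0.5))
```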
Figures 4 and 5 show that all the estimation methods performed well for this model. The MLE and WLSE attained the lowest MSEs, indicating that these two methods provide the most accurate estimates for the proposed model.

6. Applications

We present two applications of the new model and compare it with other distributions, namely the log-logistic half-logistic (NOLL-HL) [9], power Lindley (PLi) [10], Weibull (W), Li [11], generalized exponential (GE) [12], and an extension of the exponential (NH) [13] distributions. The models were compared using the Cramér–von Mises (W) and Anderson–Darling (A) statistics and the Akaike (AIC) and Bayesian (BIC) information criteria, with lower values indicating a better fit. The maximum likelihood estimates (MLEs), their standard errors (SEs), and these statistics were computed using the AdequacyModel package [7] for R.

6.1. Failure Times

The failure times of 50 components (per 1000 h) [14] are 0.036, 0.058, 0.061, 0.074, 0.078, 0.086, 0.102, 0.103, 0.114, 0.116, 0.148, 0.183, 0.192, 0.254, 0.262, 0.379, 0.381, 0.538, 0.570, 0.574, 0.590, 0.618, 0.645, 0.961, 1.228, 1.600, 2.006, 2.054, 2.804, 3.058, 3.076, 3.147, 3.625, 3.704, 3.931, 4.073, 4.393, 4.534, 4.893, 6.274, 6.816, 7.896, 7.904, 8.022, 9.337, 10.940, 11.020, 13.880, 14.730, 15.080.
The estimated parameters, their SEs, and the goodness-of-fit statistics for each model are reported in Table 1, and the fitted NWLi density for the failure-time data is shown in Figure 6. The NWLi model has the lowest values of these statistics and therefore provides a better fit to the data than the competing models.

6.2. Survival Data

The survival times of 48 patients who underwent allogeneic hematopoietic stem cell transplantation for multiple myeloma [15] are: 1, 1, 2, 2, 2, 3, 5, 5, 6, 6, 6, 6, 7, 7, 7, 9, 11, 11, 11, 11, 11, 13, 14, 15, 16, 16, 17, 17, 18, 19, 19, 24, 25, 26, 32, 35, 37, 41, 42, 51, 52, 54, 58, 66, 67, 88, 89, 92.
Figure 7 shows the NWLi density fitted to the survival data, and Table 2 reports the MLEs and SEs of the various models along with the adequacy measures. The NWLi model provides the best fit to these data.
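For readers without R, the NWLi part of these fits can be sketched in Python. The block below is a standard-library substitute for AdequacyModel, not the authors' code: it maximizes the log-likelihood of Section 4.1 with a minimal Nelder–Mead search on (log α, log β), so both parameters stay positive, approximates the observed information −H(θ̂) by finite differences to obtain SEs, and reports AIC = 2k − 2ℓ̂ and BIC = k log n − 2ℓ̂ with k = 2. The optimizer settings (start at α = β = 1, step 0.5, tolerance) are arbitrary, and the output is not claimed to reproduce Tables 1 and 2.

```python
import math

FAILURE_TIMES = [0.036, 0.058, 0.061, 0.074, 0.078, 0.086, 0.102, 0.103, 0.114, 0.116,
                 0.148, 0.183, 0.192, 0.254, 0.262, 0.379, 0.381, 0.538, 0.570, 0.574,
                 0.590, 0.618, 0.645, 0.961, 1.228, 1.600, 2.006, 2.054, 2.804, 3.058,
                 3.076, 3.147, 3.625, 3.704, 3.931, 4.073, 4.393, 4.534, 4.893, 6.274,
                 6.816, 7.896, 7.904, 8.022, 9.337, 10.940, 11.020, 13.880, 14.730, 15.080]
SURVIVAL_TIMES = [1, 1, 2, 2, 2, 3, 5, 5, 6, 6, 6, 6, 7, 7, 7, 9, 11, 11, 11, 11, 11, 13, 14, 15,
                  16, 16, 17, 17, 18, 19, 19, 24, 25, 26, 32, 35, 37, 41, 42, 51, 52, 54, 58,
                  66, 67, 88, 89, 92]


def omega(x, a):
    return (1.0 + a * x / (a + 1.0)) * math.exp(-a * x)


def loglik(alpha, beta, data):
    """NWLi log-likelihood: sum of log[f_a q + f_b p] - 2 log q."""
    total = 0.0
    for x in data:
        q = 1.0 + omega(x, beta)
        fa = alpha ** 2 * (1.0 + x) * math.exp(-alpha * x) / (1.0 + alpha)
        fb = beta ** 2 * (1.0 + x) * math.exp(-beta * x) / (1.0 + beta)
        total += math.log(fa * q + fb * (1.0 - omega(x, alpha))) - 2.0 * math.log(q)
    return total


def nelder_mead(fun, x0, step=0.5, tol=1e-12, max_iter=10000):
    """Minimal Nelder-Mead minimizer (reflect, expand, contract, shrink)."""
    n = len(x0)
    pts = [list(x0)] + [[x0[k] + (step if k == i else 0.0) for k in range(n)] for i in range(n)]
    vals = [fun(p) for p in pts]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda i: vals[i])
        pts, vals = [pts[i] for i in order], [vals[i] for i in order]
        if vals[-1] - vals[0] < tol:
            break
        cen = [sum(p[k] for p in pts[:-1]) / n for k in range(n)]
        worst = pts[-1]
        ref = [2.0 * c - w for c, w in zip(cen, worst)]
        f_ref = fun(ref)
        if f_ref < vals[0]:
            exp_pt = [3.0 * c - 2.0 * w for c, w in zip(cen, worst)]
            f_exp = fun(exp_pt)
            pts[-1], vals[-1] = (exp_pt, f_exp) if f_exp < f_ref else (ref, f_ref)
        elif f_ref < vals[-2]:
            pts[-1], vals[-1] = ref, f_ref
        else:
            con = [0.5 * (c + w) for c, w in zip(cen, worst)]
            f_con = fun(con)
            if f_con < vals[-1]:
                pts[-1], vals[-1] = con, f_con
            else:  # shrink every vertex halfway toward the best one
                pts = [pts[0]] + [[0.5 * (b + v) for b, v in zip(pts[0], p)] for p in pts[1:]]
                vals = [vals[0]] + [fun(p) for p in pts[1:]]
    best = min(range(n + 1), key=lambda i: vals[i])
    return pts[best], vals[best]


def objective(z, data):
    """Negative log-likelihood in (log alpha, log beta); inf where it cannot be evaluated."""
    try:
        return -loglik(math.exp(z[0]), math.exp(z[1]), data)
    except (ValueError, OverflowError):
        return math.inf


def observed_information(alpha, beta, data, h=1e-4):
    """Finite-difference approximation of the entries of -H(theta)."""
    f = lambda a, b: loglik(a, b, data)
    f0 = f(alpha, beta)
    haa = (f(alpha + h, beta) - 2 * f0 + f(alpha - h, beta)) / h ** 2
    hbb = (f(alpha, beta + h) - 2 * f0 + f(alpha, beta - h)) / h ** 2
    hab = (f(alpha + h, beta + h) - f(alpha + h, beta - h)
           - f(alpha - h, beta + h) + f(alpha - h, beta - h)) / (4 * h ** 2)
    return -haa, -hab, -hbb


def fit_nwli(data, start=(1.0, 1.0)):
    z, nll = nelder_mead(lambda z: objective(z, data), [math.log(start[0]), math.log(start[1])])
    alpha, beta = math.exp(z[0]), math.exp(z[1])
    ll, k, n = -nll, 2, len(data)
    iaa, iab, ibb = observed_information(alpha, beta, data)
    det = iaa * ibb - iab ** 2
    # SEs from the diagonal of the inverse information; NaN if it is not positive definite.
    se = (math.sqrt(ibb / det), math.sqrt(iaa / det)) if det > 0 and iaa > 0 else (math.nan, math.nan)
    return {"alpha": alpha, "beta": beta, "loglik": ll, "se": se,
            "AIC": 2 * k - 2 * ll, "BIC": k * math.log(n) - 2 * ll}


if __name__ == "__main__":
    for name, data in (("failure times", FAILURE_TIMES), ("survival times", SURVIVAL_TIMES)):
        print(name, fit_nwli(data))
```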

7. Risk Indicators

7.1. The Mean of Order P and Optimal Order of P

The MOOP(P) estimators provide a flexible parametric framework for estimating tail risks associated with historical claims data. By selecting an appropriate order P, the MOOP(P) estimator can capture different characteristics of the distribution’s tail, which is crucial for modeling extreme events in insurance. These estimators focus on tail behavior, particularly the heavy-tailedness of claim distributions, an important issue in insurance, where rare but severe events (e.g., catastrophic losses) can significantly affect risk assessment and pricing. They are also used in reinsurance to assess the tail risk of aggregated claims: reinsurers use them to understand potential losses due to rare and severe events, which helps them manage their risk exposure more effectively [16].
The MOOP(P) analysis is a statistical technique used to describe the typical behavior or central tendency of a dataset by considering moments of different orders. MOOP(P) is computed by raising each data point to a positive integer power P and then averaging these values; the method was formally defined in [16]. Choosing the optimal P means selecting the value of P that provides the most meaningful insight into the characteristics or distribution of the dataset, since P controls how sensitive the analysis is to different aspects of the data. Typically, MOOP(P) is calculated for a range of orders (e.g., P = 1, 2, …), and comparing the results across orders allows researchers to identify patterns, trends, or critical points in the data. Lower values of P (e.g., P = 1) reflect the overall distribution, while higher values of P are more sensitive to complex or extreme variations within the dataset. The optimal P is therefore determined by the specific objectives of the analysis, balancing sensitivity to variation in the data (higher P) against robustness and generality (lower P). MOOP(P) analysis and the selection of the optimal P have applications in many fields: in finance, for instance, the choice of P supports risk assessment and modeling, while in signal processing or image analysis it can reveal important features or patterns within the data.

7.2. The PORT-VaR Estimator

The PORT-VaR estimator focuses on estimating the VaR associated with extreme loss events based on ordered claims data. This helps insurers understand the potential magnitude of losses beyond certain thresholds. It provides insights into tail risk, which is essential for understanding the potential impact of extreme events on an insurer’s financial position. Insurers can use PORT-VaR to accurately quantify tail risk exposure [17]. The algorithm below provides a systematic approach to quantifying extreme risks beyond a specified threshold, which is essential for risk management and decision-making in various domains, including insurance, finance, and actuarial sciences:
1.
Gather the historical claims data, say $X_1, \ldots, X_n$, where each $X_i$ represents a claim amount.
2.
Choose a threshold U based on the desired risk level or severity of extreme events. This threshold should typically be higher than the majority of the claim amounts in the dataset.
3.
Determine all claim amounts $X_i$ that exceed the threshold U, and let $X_{(1)} < X_{(2)} < \cdots < X_{(k)}$ be the ordered exceedances, where k is the number of exceedances.
4.
Compute the VaR at a specified confidence level ε using the estimated parameters of α and β . The VaR at confidence level ε represents the threshold beyond which the risk of extreme events (exceedances) is considered.
5.
Finally, the PORT-VaR is estimated as the expected value of the exceedances above the threshold U based on the fitted NWLi parameters.
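The steps above can be sketched in Python under a few stated assumptions: the threshold at level ε is taken to be the model VaR, that is, the NWLi quantile F⁻¹(ε) at fitted parameters, and the summary mirrors the columns described for Table 4 (number of exceedances, minimum, quartiles, mean ExV, maximum). The claims used in the example are synthetic draws from NWLi(0.6, 0.5), standing in for real claim data that the text does not list, and the parameter values are illustrative rather than fitted.

```python
import math
import random
import statistics


def omega(x, a):
    return (1.0 + a * x / (a + 1.0)) * math.exp(-a * x)


def nwli_cdf(x, alpha, beta):
    return (1.0 - omega(x, alpha)) / (1.0 + omega(x, beta))


def nwli_quantile(u, alpha, beta, tol=1e-12):
    """F^{-1}(u) for 0 <= u < 1 by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while nwli_cdf(hi, alpha, beta) < u:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if nwli_cdf(mid, alpha, beta) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def port_var(claims, alpha_hat, beta_hat, level):
    """Model VaR at `level` plus summary statistics of the claims exceeding it."""
    var = nwli_quantile(level, alpha_hat, beta_hat)
    exceed = sorted(x for x in claims if x > var)
    out = {"level": level, "VaR": var, "PORT": len(exceed)}
    if len(exceed) >= 2:  # statistics.quantiles needs at least two data points
        q1, med, q3 = statistics.quantiles(exceed, n=4, method="inclusive")
        out.update(Min=exceed[0], Q1=q1, Median=med, ExV=statistics.fmean(exceed),
                   Q3=q3, Max=exceed[-1])
    return out


if __name__ == "__main__":
    rng = random.Random(7)
    claims = [nwli_quantile(rng.random(), 0.6, 0.5) for _ in range(2000)]  # synthetic claims
    for level in (0.50, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 0.99):
        print(port_var(claims, 0.6, 0.5, level))
```

In this sketch, raising ε raises the VaR threshold, so fewer claims exceed it; in practice α̂ and β̂ would be the estimates fitted to the claims themselves.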

8. Historical Claims Analysis

8.1. The MOOP(P) Assessments and Optimal Order of P

MOOP(P) assessments involve calculating the average behavior or central tendency of a dataset using different orders of P. Selecting the optimal P is crucial in determining the sensitivity and granularity of risk analysis. Here, we present an expanded explanation of MOOP(P) assessments and the importance of choosing the optimal P.
Table 3 reports the MOOP(P) assessment, including the true mean value (TMV), mean squared error (MSE), and bias for P = 1, …, 5, with n = 5000 and five parameter scenarios.
The findings in Table 3 (the first scenario) for $\alpha_0 = 1$, $\beta_0 = 1$ imply that
1.
The true mean value provides a baseline reference point for comparison with the estimated MOOP.
2.
The MOOP values decrease with increasing P, reflecting the increasing complexity and variability captured by higher-order moments. The corresponding MSE values show a slight improvement with higher P, thus indicating reduced error in estimation as more moments are considered.
3.
The bias decreases as P increases, suggesting a more accurate estimation of the true mean with higher-order moments: the estimates converge towards the true mean as P increases.
4.
The results suggest that considering higher-order moments (increasing P) leads to a better estimation accuracy (lower MSE and bias) but with diminishing improvements beyond a certain point. This analysis highlights the trade-off between model complexity (capturing more variability) and estimation accuracy, emphasizing the importance of selecting an optimal order P based on the specific objectives and requirements of the analysis.
The findings in Table 3 (the second scenario) for $\alpha_0 = 2.5$, $\beta_0 = 100$ indicate that
1.
The true mean value provides a reference point for evaluating the performance of MOOP.
2.
Higher-order moments (larger P) captured more variability and detail in the dataset, as evidenced by the increasing values of MOOP.
3.
Both the MSE and bias decrease with increasing P, thus indicating improved accuracy and closer approximation to the true mean as more moments are considered.
4.
This analysis highlights the importance of selecting an optimal order P based on the trade-off between complexity (capturing more details) and accuracy (minimizing bias and MSE). In this case, P = 5 appears to provide a good balance between model complexity and estimation accuracy.
5.
The results demonstrate the impact of different moment orders on estimating the true mean value, with higher-order moments leading to improved accuracy but potentially increasing model complexity. The choice of P should be carefully considered based on the specific objectives and requirements of the analysis.
Based on the results in Table 3 (the third scenario) for $\alpha_0 = 100$, $\beta_0 = 0.5$, it is noted that
1.
The decreasing trend in MSE and bias when P increases indicates that higher-order moments captured more of the dataset’s variability, leading to more accurate estimations of the true mean value.
2.
The results suggest that using moments of higher orders (P) improves estimation accuracy, with diminishing improvements when P increases. In this case, P = 5 provides a good balance between capturing variability and minimizing bias and MSE.
3.
These findings emphasize the importance of considering higher-order moments in statistical analyses, especially for capturing complex patterns and variability in datasets. Selecting the appropriate order of moments depends on the specific characteristics and goals of the analysis, balancing model complexity with the need for accurate estimation.
The results in Table 3 (the fourth scenario) for $\alpha_0 = 0.1$, $\beta_0 = 0.1$ imply that
1.
The extremely small values of MOOP, MSE, and bias suggest that the higher-order moments (P) captured minute variations and details in the dataset, leading to very accurate estimations of the true mean value.
2.
In this case, the diminishing improvements in MSE and bias when P increased suggest that, even with very high-order moments, the estimates were already very close to the true mean. This indicates that capturing higher-order variability may not significantly enhance the estimation in this specific context.
3.
These results demonstrate the high level of precision and accuracy achieved in estimating the true mean value using moments of varying orders. The findings underscore the importance of understanding the sensitivity of moment-based estimators to different levels of data complexity and variability.
The results in Table 3 (the fifth scenario) for $\alpha_0 = 1000$, $\beta_0 = 100$ demonstrate that
1.
The decreasing trend in MSE and bias as P increases suggests that higher-order moments (P) captured more of the variability and complexity in the dataset, leading to improved estimations of the true mean value.
2.
In this case, the diminishing MSE and bias with increasing P suggest that using higher-order moments ( P = 5 ) provides a good balance between capturing sufficient variability and minimizing estimation error.
3.
These results demonstrate the importance of considering higher-order moments in statistical analyses to capture complex patterns and variability in datasets. The findings highlight the trade-off between model complexity (higher P) and accuracy in estimating statistical parameters like the mean.
Across all scenarios, the MOOP values decreased with increasing P, thus indicating that higher-order moments capture more dataset variability and complexity. Correspondingly, the MSE and bias decreased with higher P, thus suggesting an improved estimation accuracy and closer approximation to the true mean value.

8.2. PORT-VaR Estimator for Extreme Claims

The PORT-VaR estimator allows insurers to focus on extreme events or “peaks” in claim sizes that exceed a specified threshold. This approach helps in understanding and quantifying tail risks, which are critical for insurers dealing with rare but potentially high-impact claims. Practically, insurance companies need to be prepared for extreme events that could lead to significant losses. The PORT-VaR estimator helps identify and quantify these tail risks, providing insights into the potential magnitude of losses beyond conventional risk measures. For this purpose, we present a comprehensive PORT-VaR analysis for extreme claims.
The results are given in Table 4, where certain confidence levels are considered, such as 50%, 70%, 75%, 80%, 85%, 90%, 95% and 99%. The relationship between the increasing confidence level and decreasing number of identified PORTs reflects a shift towards more conservative risk management, where stricter thresholds are applied to identify only the most significant or extreme claims in the dataset. This understanding helps in assessing risk tolerance and making informed decisions about risk management strategies based on the level of acceptable risk.
In addition, the MOOP(P) values for P = 1, …, 5 were calculated for the claims data: MOOP(1) = 75,672, MOOP(2) = 16,463.05, MOOP(3) = 10,393.24, MOOP(4) = 8441.958, and MOOP(5) = 7548.389. These values indicate that
1.
The MOOP values showed a decreasing trend as the order of P increased from 1 to 5. This trend suggests that higher-order moments (larger values of P) captured less variability and complexity in the dataset compared to lower-order moments.
2.
The significant drop in MOOP from P = 1 to 2 indicates that the addition of a second moment ( P = 2 ) reduced the mean substantially, thus suggesting that the second moment captured important aspects of the data distribution.
3.
For P > 2, the reduction in MOOP becomes less pronounced, indicating diminishing returns in capturing additional variability or complexity with higher-order moments.
4.
The choice of the optimal P depends on balancing model complexity with the need to capture essential characteristics of the data distribution. In this case, the significant drop in MOOP from P = 1 to 2 suggests that a model with P = 2 may provide a good compromise between simplicity and capturing variability.
Table 4 presents an analysis of extreme claims data using the PORT-VaR methodology for various confidence levels. Each row corresponds to a specific confidence level, and the columns provide summary statistics of the extreme claims that exceeded the threshold at each confidence level, where
1.
For a 50% confidence level, 14 extreme claims exceeded the threshold, indicating relatively common extreme claim occurrences at this confidence level.
2.
As the confidence level increased, the number of extreme claims (PORT) also increased, ranging from 19 at a 70% confidence level to 27 at a 99% confidence level. This suggests a higher frequency of extreme claim events at higher confidence levels, highlighting the tail risk in the claims data.
3.
The minimum (Min) value of the extreme claims increased as the confidence level decreased. This implies that lower confidence levels captured less severe but more frequent extreme claims.
4.
The 1st quartile (25th percentile) of the extreme claims also increased with decreasing confidence levels, reflecting the distribution of less severe events.
5.
The median of the extreme claims (50th percentile) generally increased with decreasing confidence levels, representing the central tendency of extreme claim values at each level of risk.
6.
The mean of the extreme claims (ExV) showed a similar trend, thus increasing with decreasing confidence levels.
7.
The 3rd quartile (75th percentile) and maximum values of the extreme claims also followed a pattern of increasing values with decreasing confidence levels.
8.
The distribution of extreme claims varies across confidence levels. Lower confidence levels retain fewer but more severe claims: there are 14 exceedances with a mean of 4044 at the 50% level. Higher confidence levels admit more claims, including less severe ones: there are 27 exceedances with a mean of 2790 at the 99% level. The analysis emphasizes the importance of understanding tail risk in insurance, because the most severe claims, although infrequent, may have significant financial implications. These results can support decision-making within insurance companies by helping to assess the frequency and severity of extreme claim events and to inform risk management strategies.
Overall, the PORT-VaR analysis in Table 4 provides valuable insights into the distribution and characteristics of extreme claims data at different confidence levels. The detailed summary statistics can help quantify tail risk and support informed decision-making in insurance practice, aiding risk assessment and mitigation strategies.
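The summary columns of Table 4 (number of exceedances, Min., quartiles, ExV and Max.) can be computed from any claims sample once a threshold is fixed. The following Python sketch assumes that the caller supplies the threshold. It does not reproduce how the PORT-VaR procedure derives the threshold from the confidence level. The helper name `port_summary` and the sample amounts are hypothetical. Quartiles use linear interpolation between order statistics (`method="inclusive"`, equivalent to R's default quantile type 7); whether Table 4 used this convention is an assumption.

```python
import statistics


def port_summary(claims, threshold):
    """Summarize the claims strictly above `threshold` (hypothetical helper).

    Returns the Table 4 columns: number of exceedances, Min., 1st Qu.,
    Median, ExV (mean), 3rd Qu. and Max.
    """
    peaks = sorted(x for x in claims if x > threshold)
    if len(peaks) < 2:
        raise ValueError("at least two exceedances are needed for quartiles")
    # 'inclusive' = linear interpolation between order statistics (R type 7).
    q1, median, q3 = statistics.quantiles(peaks, n=4, method="inclusive")
    return {
        "n_port": len(peaks),
        "min": peaks[0],
        "q1": q1,
        "median": median,
        "exv": statistics.fmean(peaks),
        "q3": q3,
        "max": peaks[-1],
    }


if __name__ == "__main__":
    # Hypothetical claim amounts, for illustration only.
    claims = [300, 600, 900, 1400, 2000, 2300, 3100, 3500, 4000, 4300, 6300]
    for threshold in (2000, 500):
        print(threshold, port_summary(claims, threshold))
```

Lowering the threshold adds smaller claims to the exceedance set. This raises the count and lowers Min., the quartiles and ExV, while Max. stays unchanged. The same pattern appears when moving from the 50% row to the 99% row of Table 4.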

9. Conclusions

This paper introduced a new weighted Lindley (NWLi) model for analyzing extreme historical insurance claims. It generalizes the classical Lindley distribution by introducing a weight parameter, which enhances flexibility in modeling insurance claim severity. A comprehensive theoretical overview of the properties of the NWLi model was provided, and two practical applications of the proposed model were explored. First, the mean-of-order P (MOOP(P)) approach was used to quantify the expected claim severity based on the NWLi model. Second, a peaks over a random threshold (PORT) analysis was implemented using the value-at-risk metric to assess extreme claim occurrences under the new model. Furthermore, a simulation study compared the performance of the estimators under various estimation methods. The proposed model and its applications can serve as a versatile tool for actuaries and risk analysts in analyzing and predicting extreme insurance claim severity, providing valuable insights for risk management and decision-making within the insurance industry. The MOOP(P) assessments across multiple scenarios showed how the moment order P affects estimation accuracy and model complexity.
The following trends and observations emerged. Items 1 to 7 concern the MOOP(P) scenarios, with parameter values $(\alpha_0, \beta_0)$ and sample size $n = 5000$; items 8 to 13 summarize the PORT-VaR analysis:
1.
MOOP values increase with P in every scenario, for example from 0.0326 at P = 1 to 0.2949 at P = 5 when $\alpha_0 = \beta_0 = 1000$. This indicates that higher-order moments capture more of the variability and complexity within a dataset, and suggests that higher P values lead to more detailed representations of the underlying distribution.
2.
Both the mean squared error (MSE) and the bias decrease, or remain essentially unchanged, as P increases. This trend signifies improved accuracy in estimating the true mean value (TMV) with higher-order moments: the bias shrinks as the MOOP moves closer to the TMV.
3.
The results highlight the trade-off between model complexity and estimation accuracy. Higher P values generally improve accuracy (lower MSE and bias), but the gains diminish beyond a certain point. Selecting an optimal P therefore involves balancing the need to capture dataset variability against increasing model complexity.
4.
The optimal P varies across scenarios depending on parameter values and dataset characteristics. For instance, P = 5 often strikes a balance between capturing sufficient dataset detail and minimizing bias and MSE.
5.
Extremely small values of MOOP, MSE, and bias in certain scenarios suggest that very high-order moments may not significantly enhance estimation accuracy beyond a certain threshold of complexity.
6.
The findings underscore the importance of considering higher-order moments in statistical analyses, particularly for capturing complex patterns and variability in datasets.
7.
The choice of P should be tailored to specific analysis goals and dataset characteristics, ensuring an optimal balance between model complexity and estimation accuracy.
8.
The number of extreme claims exceeding the threshold increases with higher confidence levels, ranging from 14 at a 50% confidence level to 27 at a 99% confidence level. This highlights the higher occurrence of extreme events as the risk level increases.
9.
As confidence levels decrease, the minimum value of extreme claims increases, indicating that lower confidence levels capture fewer but more severe events.
10.
The first and third quartiles of extreme claim values increase as confidence levels decrease, reflecting a shift toward more severe claims at lower confidence levels.
11.
The median and mean values of extreme claims also increase with decreasing confidence levels, indicating a shift towards higher values and greater severity at lower confidence levels.
12.
Lower confidence levels retain fewer but more severe claims, while higher confidence levels admit more claims, including less severe ones. Understanding this tail risk is crucial in insurance for assessing potential financial impacts.
13.
The analysis provides valuable insights for insurance decision-making, facilitating risk assessment and mitigation strategies by quantifying the frequency and severity of extreme claims at different confidence levels.
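Up to rounding, the entries of Table 3 appear to satisfy Bias = TMV − MOOP and MSE = Bias². This explains why bias and MSE fall as MOOP rises toward the TMV. These relations are inferred from the tabulated values; this section does not state them. The following Python check for the scenario $\alpha_0 = \beta_0 = 1000$ is therefore a sketch under that assumption, and the helper `bias_and_mse` is hypothetical.

```python
# Scenario alpha0 = beta0 = 1000, transcribed from Table 3.
TMV = 0.9988
MOOP = [0.0326, 0.1304, 0.1877, 0.2388, 0.2949]        # P = 1, ..., 5
BIAS_TABLE = [0.9661, 0.8683, 0.8110, 0.7600, 0.7038]
MSE_TABLE = [0.9335, 0.7539, 0.6577, 0.5776, 0.4954]


def bias_and_mse(tmv, moop):
    """Assumed relations: bias = TMV - MOOP and MSE = bias**2."""
    bias = tmv - moop
    return bias, bias ** 2


results = [bias_and_mse(TMV, m) for m in MOOP]

for p, ((bias, mse), b_tab, mse_tab) in enumerate(
    zip(results, BIAS_TABLE, MSE_TABLE), start=1
):
    # Loose tolerances absorb the four-decimal rounding of the published entries.
    match = abs(bias - b_tab) < 2e-4 and abs(mse - mse_tab) < 2e-3
    print(f"P = {p}: bias {bias:.4f} (table {b_tab:.4f}), "
          f"MSE {mse:.4f} (table {mse_tab:.4f}), consistent: {match}")
```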

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/stats8010008/s1.

Author Contributions

Conceptualization, M.A. (Mahmoud Afshari), M.A. (Morad Alizadeh) and G.M.C.; methodology, M.A. (Morad Alizadeh), M.A. (Mahmoud Afshari), G.M.C. and H.M.Y.; software, Z.R., F.D. and H.M.Y.; validation, J.E.C.-R. and H.M.Y.; formal analysis, M.A. (Morad Alizadeh), M.A. (Mahmoud Afshari), G.M.C. and H.M.Y.; investigation, M.A. (Morad Alizadeh), M.A. (Mahmoud Afshari), G.M.C. and H.M.Y.; resources, J.E.C.-R., M.A. (Mahmoud Afshari) and H.M.Y.; data curation, Z.R., F.D. and H.M.Y.; writing—original draft preparation, M.A. (Morad Alizadeh), M.A. (Mahmoud Afshari), G.M.C. and H.M.Y.; writing—review and editing, J.E.C.-R. and H.M.Y.; visualization, Z.R., F.D. and H.M.Y.; supervision, J.E.C.-R. and H.M.Y.; project administration, M.A. (Mahmoud Afshari); funding acquisition, J.E.C.-R., M.A. (Mahmoud Afshari) and H.M.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the Supplementary Material. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare that they have no known conflicting/competing interests that could have appeared to influence the work reported in this paper.

References

  1. Atikankul, Y.; Thongteeraparp, A.; Bodhisuwan, W. The new Poisson mixed weighted Lindley distribution with applications to insurance claims data. Songklanakarin J. Sci. Technol. 2020, 42, 152–162.
  2. Gel, Y.R.; Gastwirth, J.L. The Lindley distribution and its applications to mixture modeling. J. Data Sci. 2008, 6, 575–590.
  3. Manesh, S.N.; Hamzah, N.A.; Zamani, H. Poisson-weighted Lindley distribution and its application on insurance claim data. AIP Conf. Proc. 2014, 1605, 834–839.
  4. Hogg, R.V.; Klugman, S.A. Loss Distributions; John Wiley & Sons: Hoboken, NJ, USA, 2009.
  5. Klugman, S.A.; Panjer, H.H.; Willmot, G.E. Loss Models: From Data to Decisions; John Wiley & Sons: Hoboken, NJ, USA, 2012; Volume 715.
  6. Alizadeh, M.; Afshari, M.; Contreras-Reyes, J.E.; Mazarei, D.; Yousof, H.M. The Extended Gompertz Model: Applications, Mean of Order P assessment and Statistical Threshold Risk Analysis Based on Extreme Stresses Data. IEEE Trans. Reliab. 2024, in press.
  7. Marinho, P.R.D.; Silva, R.B.; Bourguignon, M.; Cordeiro, G.M.; Nadarajah, S. AdequacyModel: An R package for probability distributions and general purpose optimization. PLoS ONE 2019, 14, e0221487.
  8. Anderson, T.W.; Darling, D.A. Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. Ann. Math. Stat. 1952, 23, 193–212.
  9. Alizadeh, M.; Emadi, M.; Doostparast, M. A new two-parameter lifetime distribution: Properties, applications and different methods of estimations. Stat. Optim. Inf. Comput. 2019, 7, 291–310.
  10. Ghitany, M.E.; Al-Mutairi, D.K.; Balakrishnan, N.; Al-Enezi, L.J. Power Lindley distribution and associated inference. Comput. Stat. Data Anal. 2013, 64, 20–33.
  11. Ghitany, M.E.; Atieh, B.; Nadarajah, S. Lindley distribution and its application. Math. Comput. Simul. 2008, 78, 493–506.
  12. Gupta, R.D.; Kundu, D. Theory and Methods: Generalized exponential distributions. Aust. N. Z. J. Stat. 1999, 41, 173–188.
  13. Nadarajah, S.; Haghighi, F. An extension of the exponential distribution. Statistics 2011, 45, 543–558.
  14. Murthy, D.P.; Xie, M.; Jiang, R. Weibull Models; John Wiley & Sons: Hoboken, NJ, USA, 2004.
  15. Liu, J.; Hamrouni, A.; Wolowiec, D.; Coiteux, V.; Kuliczkowski, K.; Hetuin, D.; Saudemont, A.; Quesnel, B. Plasma cells from multiple myeloma patients express B7-H1 (PD-L1) and increase expression after stimulation with IFN-γ and TLR ligands via a MyD88-, TRAF6-, and MEK-dependent pathway. Blood J. Am. Soc. Hematol. 2007, 110, 296–304.
  16. Rice, J.A. Mathematical Statistics and Data Analysis; Thomson/Brooks/Cole: Belmont, CA, USA, 2007; Volume 371.
  17. Szubzda, F.; Chlebus, M. Comparison of Block Maxima and Peaks Over Threshold Value-at-Risk models for market risk in various economic conditions. Cent. Eur. Econ. J. 2019, 6, 70–85.
Figure 1. Plots of the NWLi density.
Figure 2. Plots of the NWLi HRF.
Figure 3. 3D plots of (a) mean, (b) variance, (c) skewness, and (d) kurtosis of NWLi$(\alpha, \beta)$. These plots show the effect of the parameters on these measures.
Figure 4. Findings from the methods under $(\alpha, \beta) = (0.6, 0.5)$.
Figure 5. Findings from the methods under $(\alpha, \beta) = (0.5, 0.4)$.
Figure 6. Estimated NWLi density for failure times.
Figure 7. Estimated NWLi density for survival data.
Table 1. Findings from the fitted models and adequacy measures. Lowest values are marked in bold.
| Model | MLE (SE) of $\alpha$ | MLE (SE) of $\beta$ | W | A | AIC | BIC |
|---|---|---|---|---|---|---|
| NWLi $(\alpha, \beta)$ | 3.96 (0.9) | 0.382 (0.05) | **0.0426** | **0.351** | **197.629** | **201.453** |
| LN $(\alpha, \beta)$ | 0.58 (0.25) | 1.8 (0.18) | 0.18 | 1.13 | 210.068 | 213.89 |
| NH $(\alpha, \beta)$ | 3.25 (1.91) | 0.35 (0.07) | 0.176 | 1.09 | 210.24 | 214.067 |
| Ga $(\alpha, \beta)$ | 0.545 (0.090) | 0.163 (0.040) | 0.148 | 0.948 | 208.87 | 212.69 |
| PL $(\alpha, \beta)$ | 0.93 (0.122) | 0.58 (0.06) | 0.156 | 0.983 | 209.49 | 213.31 |
| W $(\alpha, \beta)$ | 0.54 (0.099) | 0.66 (0.074) | 0.152 | 0.954 | 208.73 | 212.55 |
| GE $(\alpha, \beta)$ | 0.194 (0.043) | 1.536 (0.09) | 0.147 | 0.941 | 208.745 | 212.569 |
| Li $(\alpha)$ | 0.45 (0.513) | – | 0.172 | 1.111 | 242.36 | 244.27 |
| NOLL-HL $(\alpha, \beta)$ | 0.69 (0.112) | 0.246 (0.044) | 0.103 | 0.96 | 204.62 | 208.44 |
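The AIC and BIC columns follow the standard definitions $\mathrm{AIC} = 2k - 2\log L$ and $\mathrm{BIC} = k\ln n - 2\log L$. Here $k$ is the number of parameters, $n$ the sample size and $\log L$ the maximized log-likelihood. The following Python sketch applies these definitions to the NWLi row of Table 1. It back-solves $\log L$ from the reported AIC. The sample size $n = 50$ is an assumption: it is not given with the table, although it reproduces the reported BIC.

```python
import math


def aic_bic(loglik, k, n):
    """Return (AIC, BIC) for a maximized log-likelihood with k parameters and n observations."""
    aic = 2 * k - 2 * loglik
    bic = k * math.log(n) - 2 * loglik
    return aic, bic


# NWLi row of Table 1: k = 2 parameters, reported AIC = 197.629.
k = 2
loglik = (2 * k - 197.629) / 2   # back-solved log-likelihood, about -96.81
n = 50                           # assumed sample size (not stated with the table)
aic, bic = aic_bic(loglik, k, n)
print(f"AIC = {aic:.3f}, BIC = {bic:.3f}")  # AIC = 197.629, BIC = 201.453
```

The difference $\mathrm{BIC} - \mathrm{AIC} = k(\ln n - 2)$ does not depend on $\log L$. It is therefore positive for every model once $n > e^2 \approx 7.4$, which gives a quick consistency check on tabulated criteria.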
Table 2. Findings from the fitted models and adequacy measures. Lowest values are marked in bold.
| Model | MLE (SE) of $\alpha$ | MLE (SE) of $\beta$ | W | A | AIC | BIC |
|---|---|---|---|---|---|---|
| NWLi $(\alpha, \beta)$ | 0.206 (0.036) | 0.060 (0.009) | **0.034** | **0.242** | **404.042** | **407.784** |
| LN $(\alpha, \beta)$ | 2.647 (0.166) | 1.149 (0.117) | 0.043 | 0.366 | 407.53 | 411.272 |
| NH $(\alpha, \beta)$ | 0.047 (0.027) | 0.923 (0.308) | 0.068 | 0.418 | 407.042 | 410.518 |
| Ga $(\alpha, \beta)$ | 1.045 (0.188) | 0.043 (0.010) | 0.074 | 0.450 | 406.768 | 410.510 |
| PL $(\alpha, \beta)$ | 0.184 (0.046) | 0.7478 (0.071) | 0.069 | 0.431 | 407.139 | 410.882 |
| W $(\alpha, \beta)$ | 0.040 (0.017) | 1.007 (0.112) | 0.0747 | 0.453 | 604.832 | 410.565 |
| GE $(\alpha, \beta)$ | 0.042 (0.008) | 1.053 (0.202) | 0.074 | 0.448 | 406.755 | 410.497 |
| Li $(\alpha)$ | 0.079 (0.008) | – | 0.094 | 0.567 | 416.208 | 418.170 |
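This section does not define the W and A columns of Tables 1 and 2. They are presumably the Cramér–von Mises and Anderson–Darling goodness-of-fit statistics (cf. [8]), possibly in the modified forms W* and A* that are common in this literature. Under that assumption, the following Python sketch computes the unmodified statistics $W^2$ and $A^2$ for a fitted CDF. It uses the one-parameter Lindley (Li) model because its CDF, $F(x) = 1 - \left(1 + \frac{\theta x}{1 + \theta}\right)e^{-\theta x}$, is standard; the NWLi CDF is not restated here. The sample and the value of $\theta$ are hypothetical. Smaller values indicate a closer fit, which matches the bold (lowest) entries.

```python
import math


def lindley_cdf(x, theta):
    """CDF of the one-parameter Lindley distribution (x >= 0, theta > 0)."""
    return 1.0 - (1.0 + theta * x / (1.0 + theta)) * math.exp(-theta * x)


def cvm_ad(data, cdf):
    """Unmodified Cramer-von Mises W^2 and Anderson-Darling A^2 statistics.

    `cdf` is the fitted distribution function; every cdf(x) must lie in (0, 1).
    """
    u = sorted(cdf(x) for x in data)  # fitted CDF at the ordered sample
    n = len(u)
    w2 = 1.0 / (12 * n) + sum(
        (ui - (2 * i - 1) / (2 * n)) ** 2 for i, ui in enumerate(u, start=1)
    )
    a2 = -n - sum(
        (2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
        for i in range(1, n + 1)
    ) / n
    return w2, a2


# Hypothetical sample and parameter value, for illustration only.
sample = [0.4, 0.9, 1.1, 2.7, 3.5]
w2, a2 = cvm_ad(sample, lambda x: lindley_cdf(x, 0.6))
print(f"W^2 = {w2:.4f}, A^2 = {a2:.4f}")
```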
Table 3. MOOP(P) assessment.
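| $(\alpha_0, \beta_0)$ | TMV | Measure | $P = 1$ | $P = 2$ | $P = 3$ | $P = 4$ | $P = 5$ |
|---|---|---|---|---|---|---|---|
| $(1, 1)$ | 0.1390 | MOOP | $5.852 \times 10^{-5}$ | 0.0002 | 0.0003 | 0.0004 | 0.0005 |
| | | MSE | 0.0193 | 0.0193 | 0.0193 | 0.0193 | 0.0193 |
| | | Bias | 0.1390 | 0.1390 | 0.1389 | 0.1389 | 0.1389 |
| $(2.5, 100)$ | 0.5491 | MOOP | $5.852 \times 10^{-5}$ | 0.0002 | 0.0003 | 0.0004 | 0.0005 |
| | | MSE | 0.3015 | 0.3013 | 0.3011 | 0.3010 | 0.3009 |
| | | Bias | 0.5491 | 0.5489 | 0.5488 | 0.5487 | 0.5485 |
| $(100, 0.5)$ | 0.5194 | MOOP | 0.0032 | 0.0128 | 0.0185 | 0.0237 | 0.0298 |
| | | MSE | 0.2664 | 0.2566 | 0.2508 | 0.2456 | 0.2397 |
| | | Bias | 0.5162 | 0.5065 | 0.5008 | 0.4956 | 0.4896 |
| $(0.1, 0.1)$ | 0.002 | MOOP | $2.9699 \times 10^{-7}$ | $1.2063 \times 10^{-6}$ | $1.7511 \times 10^{-6}$ | $2.2556 \times 10^{-6}$ | $2.8527 \times 10^{-6}$ |
| | | MSE | $8.4937 \times 10^{-6}$ | $8.4884 \times 10^{-6}$ | $8.4853 \times 10^{-6}$ | $8.4823 \times 10^{-6}$ | $8.4789 \times 10^{-6}$ |
| | | Bias | 0.0029 | 0.0029 | 0.0029 | 0.0029 | 0.0029 |
| $(1000, 1000)$ | 0.9988 | MOOP | 0.0326 | 0.1304 | 0.1877 | 0.2388 | 0.2949 |
| | | MSE | 0.9335 | 0.7539 | 0.6577 | 0.5776 | 0.4954 |
| | | Bias | 0.9661 | 0.8683 | 0.8110 | 0.7600 | 0.7038 |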
Table 4. PORT-VaR analysis for extreme claims data.
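| CL | Number of PORT | Min. | 1st Qu. | Median | ExV | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|---|
| 50% | 14 | 2320 | 3559 | 3966 | 4044 | 4340 | 6283 |
| 70% | 19 | 1712 | 2299 | 3702 | 3518 | 4222 | 6283 |
| 75% | 21 | 1320 | 2266 | 3511 | 3318 | 4150 | 6283 |
| 80% | 22 | 1238 | 2084 | 3483 | 3224 | 4113 | 6283 |
| 85% | 23 | 1180 | 1984 | 3455 | 3135 | 4076 | 6283 |
| 90% | 25 | 956 | 1712 | 3215 | 2965 | 4001 | 6283 |
| 95% | 26 | 629 | 1570 | 2768 | 2875 | 3984 | 6283 |
| 99% | 27 | 587 | 1421 | 2320 | 2790 | 3966 | 6283 |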
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Alizadeh, M.; Afshari, M.; Cordeiro, G.M.; Ramaki, Z.; Contreras-Reyes, J.E.; Dirnik, F.; Yousof, H.M. A New Weighted Lindley Model with Applications to Extreme Historical Insurance Claims. Stats 2025, 8, 8. https://doi.org/10.3390/stats8010008

