Article

A New G Family: Properties, Characterizations, Different Estimation Methods and PORT-VaR Analysis for U.K. Insurance Claims and U.S. House Prices Data Sets

1 Department of Quantitative Methods, School of Business, King Faisal University, Al Ahsa 31982, Saudi Arabia
2 Department of Mathematical and Statistical Sciences, Marquette University, Milwaukee, WI 53233, USA
3 Department of Mathematics and Statistics, College of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 11432, Saudi Arabia
4 Department of Statistics, Mathematics and Insurance, Benha University, Benha 13511, Egypt
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(19), 3097; https://doi.org/10.3390/math13193097
Submission received: 21 August 2025 / Revised: 18 September 2025 / Accepted: 23 September 2025 / Published: 26 September 2025
(This article belongs to the Special Issue Statistical Methods for Forecasting and Risk Analysis)

Abstract

This paper introduces a new class of probability distributions, termed the generated log exponentiated polynomial (GLEP) family, designed to enhance flexibility in modeling complex real-world financial data. The proposed family is constructed through a novel cumulative distribution function that combines logarithmic and exponentiated polynomial structures, allowing for rich distributional shapes and tail behaviors. We present comprehensive mathematical properties, including useful series expansions for the density, cumulative, and quantile functions, which facilitate the derivation of moments, generating functions, and order statistics. Characterization results based on the reverse hazard function and conditional expectations are established. The model parameters are estimated using various frequentist methods, including Maximum Likelihood Estimation (MLE), Cramér–von Mises (CVM), Anderson–Darling (ADE), Right-Tail Anderson–Darling (RTADE), and Left-Tail Anderson–Darling (LTADE), with a comparative simulation study assessing their performance. Risk analysis is conducted using actuarial key risk indicators (KRIs) such as Value-at-Risk (VaR), Tail Value-at-Risk (TVaR), Tail Variance (TV), Tail Mean Variance (TMV), and the excess loss function (EL), demonstrating the model’s applicability in financial and insurance contexts. The practical utility of the GLEP family is illustrated through applications to real and simulated datasets, including house price dynamics and insurance claim sizes. Peaks Over Random Threshold Value-at-Risk (PORT-VaR) analysis is applied to U.K. motor insurance claims and U.S. house prices datasets. Some recommendations are provided. Finally, a comparative study is presented to demonstrate the superiority of the new family.
MSC:
62E10; 62F10; 62G30; 62P05; 91B30; 62N05; 62F12; 62G32

1. Introduction

In recent years, significant advancements have been made in the development of generalized statistical distributions to better capture the complexities of real data across various domains such as finance, insurance, medicine, and engineering. These efforts have focused on enhancing classical models by introducing additional shape parameters or combining existing distribution families to improve flexibility, accuracy, and applicability. One effective approach involves transforming a baseline cumulative distribution function (CDF), $G(x)$, using carefully designed generator functions.
The last decade has witnessed a growing interest in constructing flexible and generalized families of probability distributions. Such developments are driven by the need to overcome the limitations of classical models, which often fail to capture complex real data behaviors. By incorporating additional parameters, these new distributions provide enhanced control over shape, skewness, and tail heaviness. This makes them more adaptable for handling diverse types of data encountered in applied fields. Beyond pure theory, a key motivation behind these advances lies in practical applications, particularly in risk modeling. Areas such as insurance, finance, and economics demand models that can effectively represent uncertainty and extreme events. The introduction of new distributional forms therefore helps to bridge the gap between theory and practice. In this study, building on the framework proposed by Hashim et al. [1], we present the GLEP family of continuous distributions. This family is uniquely defined through a blend of logarithmic and exponential transformations, which enrich its flexibility. Overall, the GLEP model represents a step toward developing more versatile tools for statistical analysis and risk assessment. The proposed model is defined by a CDF
$F(x;\alpha,\beta)=C\log\left(1+G(x)^{\alpha}\right)e^{G(x)^{\beta}},\quad x\in\mathbb{R},$
with the corresponding probability density function (PDF)
$f(x;\alpha,\beta)=C\,e^{G(x)^{\beta}}\left[\pi_{\beta}(x)\log\left(1+G(x)^{\alpha}\right)+\frac{\pi_{\alpha}(x)}{1+G(x)^{\alpha}}\right],\quad x\in\mathbb{R},$
where $\alpha>0$, $\beta>0$ are parameters and $C=\frac{1}{e\log 2}$ is a normalizing constant. Note that the support of the GLEP family depends on the support of the baseline distribution $G(x)$. Equation (2) can also be expressed as
$f(x;\alpha,\beta)=C\,g(x)\,G(x)^{\beta-1}e^{G(x)^{\beta}}P(x),\quad x\in\mathbb{R},$
where
$P(x)=\beta\log\left(1+G(x)^{\alpha}\right)+\frac{\alpha}{1+G(x)^{\alpha}}\,G(x)^{\alpha-\beta}.$
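For concreteness, the following is a minimal R sketch of the CDF in (1) and the PDF in (3) for a generic baseline supplied as a pair of functions; the function names glep_cdf and glep_pdf, and the Weibull baseline used in the quick check, are illustrative choices rather than part of the formal definition.

# Minimal sketch (R) of the GLEP CDF (1) and PDF (3) for a generic baseline;
# G and g are the baseline CDF and PDF, alpha, beta > 0, and C = 1/(e*log 2).
C0 <- 1 / (exp(1) * log(2))

glep_cdf <- function(x, alpha, beta, G) {
  Gx <- G(x)
  C0 * log(1 + Gx^alpha) * exp(Gx^beta)
}

glep_pdf <- function(x, alpha, beta, G, g) {
  Gx <- G(x); gx <- g(x)
  Px <- beta * log(1 + Gx^alpha) + alpha * Gx^(alpha - beta) / (1 + Gx^alpha)
  C0 * gx * Gx^(beta - 1) * exp(Gx^beta) * Px
}

# Quick check with an (illustrative) Weibull baseline: the CDF tends to 1.
G <- function(x) pweibull(x, shape = 1.5, scale = 2)
g <- function(x) dweibull(x, shape = 1.5, scale = 2)
glep_cdf(c(1, 5, 20), alpha = 2, beta = 2, G = G)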
The mode of (3) exists and is unique under mild conditions, but in general it must be found numerically. The GLEP family does not preserve the baseline hazard function or its tail index in a strict sense. Let $G(x)^{\beta}=\Pi_{\beta}(x)$ and $G(x)^{\alpha}=\Pi_{\alpha}(x)$ denote the exponentiated baseline CDFs with corresponding PDFs $\pi_{\beta}(x)=d\,\Pi_{\beta}(x)/dx=\beta\,g(x)G(x)^{\beta-1}$ and $\pi_{\alpha}(x)=d\,\Pi_{\alpha}(x)/dx=\alpha\,g(x)G(x)^{\alpha-1}$. While the baseline CDF $G(x)$ and PDF $g(x)$ form the foundation, the multiplicative transformation $C\,g(x)G(x)^{\beta-1}e^{G(x)^{\beta}}P(x)$ fundamentally alters both the hazard rate and the asymptotic tail behavior compared to the baseline distribution. We now provide a rigorous analysis of the left ($x\to-\infty$) and right ($x\to+\infty$) tail behavior of the GLEP PDF given by Equation (3).
As $x\to+\infty$ (the right tail):
For most common baseline distributions (e.g., Weibull, Exponential, Gamma), $G(x)\to 1$ as $x\to+\infty$. Then,
$G(x)^{\beta-1}\to 1,\qquad e^{G(x)^{\beta}}\to e,\qquad \log\left(1+G(x)^{\alpha}\right)\to\log 2,\qquad \frac{\alpha\,G(x)^{\alpha-\beta}}{1+G(x)^{\alpha}}\to\frac{\alpha}{2}.$
Therefore, $P(x)\to\beta\log 2+\frac{\alpha}{2}$, a positive constant. Since $g(x)\to 0$ at the same rate as the baseline distribution’s right tail, we conclude that
$f(x;\alpha,\beta)\sim K_{\mathrm{right}}\,g(x)\quad\text{as }x\to+\infty,$
where
$K_{\mathrm{right}}=C\,e\left(\beta\log 2+\frac{\alpha}{2}\right)=\beta+\frac{\alpha}{2\log 2}.$
This means the right tail of the GLEP model decays at the same rate as the baseline PDF $g(x)$, scaled by the constant $K_{\mathrm{right}}$. The effective tail index, which is a measure of decay speed, is therefore preserved from the baseline. This property is highly valuable for modeling extreme events such as large insurance claims or high house prices, where the baseline’s ability to capture heavy tails remains intact.
As $x\to-\infty$ (the left tail), $G(x)\to 0$ and
$P(x)\approx\beta\,G(x)^{\alpha}+\alpha\,G(x)^{\alpha}=(\alpha+\beta)\,G(x)^{\alpha}.$
Therefore, the PDF behaves as
$f(x;\alpha,\beta)\approx C\,(\alpha+\beta)\,g(x)\,G(x)^{\alpha+\beta-1}\quad\text{as }x\to-\infty.$
This shows that the left tail behavior is not determined by the baseline alone; it is governed by the exponent $(\alpha+\beta-1)$. The left tail of the GLEP distribution will be heavier than the baseline if $\alpha+\beta-1<0$, lighter if $\alpha+\beta-1>0$, and comparable if $\alpha+\beta-1=0$ (which occurs when $\alpha+\beta=1$). This decoupling of left- and right-tail control is a key feature of the model. The GLEP family does not preserve the baseline hazard function. However, the flexibility introduced by the parameters $\alpha$ and $\beta$ allows the GLEP hazard function to exhibit a rich variety of shapes, increasing, decreasing, and bathtub-shaped, that may be more suitable for real data than the often-restrictive shapes of the baseline hazard.
These characteristics are highly valuable in modeling insurance and financial price data. Insurance claim sizes often exhibit heavy right tails and skewness, and the model’s ability to inherit the tail index of the baseline ensures accurate estimation of extreme losses and ruin probabilities. The left-tail flexibility allows for effective modeling of small claims or zero-inflated data common in insurance portfolios. In pricing data, the behavior of the mode and of the hazard function helps capture frequent price changes or policy renewals. The flexibility of the hazard function also supports realistic survival analysis of claim reporting delays. Moreover, the closed-form PDF and interpretable parameters enable efficient maximum likelihood estimation. The model’s adaptability to both light- and heavy-tailed baselines makes it suitable for diverse datasets, from auto insurance claims to stock returns. By decoupling left- and right-tail behaviors, it offers a superior fit compared to classical distributions. These features enhance risk assessment, reserve calculation, and premium pricing. The mathematical tractability of moments and quantiles further supports VaR, TVaR, TV, TMV and EL computations (for more details about these key risk indicators (KRIs), see Artzner [2]; Hogg and Klugman [3], Tasche [4] and Acerbi [5]). The PORT-VaR method is employed according to the algorithm of Figueiredo et al. [6].
We rigorously prove that the GLEP CDF $F(x;\alpha,\beta)$ is strictly increasing in $x$ because it is the product of positive, strictly increasing functions, $\log\left(1+G(x)^{\alpha}\right)$ and $e^{G(x)^{\beta}}$, both derived from the baseline CDF $G(x)$, which is assumed strictly increasing. This guarantees a unique inverse quantile function $F^{-1}(p)$ for all $p\in(0,1)$, ensuring that numerical inversion (e.g., via bisection or Newton–Raphson) is well posed and converges reliably to the true quantile. The absence of a closed-form quantile does not imply computational instability; monotonicity validates all numerical procedures used for simulation, risk measurement (VaR, TVaR), and parameter estimation. This foundational proof corrects an implicit assumption and is essential for reproducibility. The proof of monotonicity and of robust quantile inversion for the GLEP family is given in Appendix A.

2. Properties and KRIs

2.1. Useful Expansions

By expanding $e^{G(x)^{\beta}}$, the new CDF can be expressed as
$F(x;\alpha,\beta)=C\log\left(1+G(x)^{\alpha}\right)\sum_{k=0}^{+\infty}\frac{1}{k!}\,G(x)^{\beta k},\quad x\in\mathbb{R}.$
Then, by expanding $\log\left(1+G(x)^{\alpha}\right)$, we have
$\log\left(1+G(x)^{\alpha}\right)=\sum_{j=1}^{+\infty}\frac{(-1)^{1+j}}{j}\,G(x)^{\alpha j}.$
Inserting (6) into (5), the new CDF can be simplified as
$F(x;\alpha,\beta)=C\sum_{j=1}^{+\infty}\sum_{k=0}^{+\infty}\frac{(-1)^{1+j}}{j\,k!}\,G(x)^{\alpha j+\beta k},\quad x\in\mathbb{R},$
or, more compactly,
$F(x;\alpha,\beta)=\sum_{k=0}^{+\infty}\sum_{j=1}^{+\infty}d_{k,j}\,\Pi_{k,j}(x),\quad x\in\mathbb{R},$
where
$d_{k,j}=C\,\frac{(-1)^{1+j}}{j\,k!},$
and $\Pi_{k,j}(x)=G(x)^{\alpha j+\beta k}$ refers to the CDF of the exponentiated-G family. By differentiating (7), we have
$f(x;\alpha,\beta)=\sum_{k=0}^{+\infty}\sum_{j=1}^{+\infty}d_{k,j}\,\pi_{k,j}(x),\quad x\in\mathbb{R},$
where
$\pi_{k,j}(x)=\frac{d\,\Pi_{k,j}(x)}{dx}=(\alpha j+\beta k)\,g(x)\,G(x)^{\alpha j+\beta k-1},$
which refers to the PDF of the exponentiated-G family. To summarize, Equation (8) can be used to derive most of the mathematical properties of the underlying distribution to be studied. The name arises because the GLEP family’s density function, after expansion (Equation (8)), is an infinite series of weighted terms of the form $G(x)^{\alpha j+\beta k}$. Here, $\alpha j+\beta k$ represents a polynomial combination of the parameters $\alpha$, $\beta$ and the summation indices $j$, $k$, exponentiating the baseline CDF $G(x)$. This structure, in which the exponents are polynomial functions of the parameters, motivated the name “exponentiated polynomial”.
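As an informal numerical check of the expansion in (8), the short R sketch below compares the exact CDF in (1) with a truncated version of the double series in (7) under an illustrative Weibull baseline; the function names and the truncation orders J and K are arbitrary choices, and the coefficients d_{k,j} follow the series given above.

# Truncated double series (7): F(x) ~ sum_{j=1..J} sum_{k=0..K} d_{k,j} G(x)^(alpha*j + beta*k)
C0 <- 1 / (exp(1) * log(2))
G  <- function(x) pweibull(x, shape = 1.5, scale = 2)   # illustrative baseline

glep_cdf_exact <- function(x, alpha, beta)
  C0 * log(1 + G(x)^alpha) * exp(G(x)^beta)

glep_cdf_series <- function(x, alpha, beta, J = 100, K = 30) {
  Gx <- G(x); out <- 0
  for (j in 1:J) for (k in 0:K) {
    d_kj <- C0 * (-1)^(1 + j) / (j * factorial(k))
    out  <- out + d_kj * Gx^(alpha * j + beta * k)
  }
  out
}

x <- c(0.5, 1, 2, 4)
cbind(exact = glep_cdf_exact(x, 2, 2), series = glep_cdf_series(x, 2, 2))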

2.2. Quantile Function

The quantile function (QF$_X$) of $X$ can be determined by inverting $F(x)=p$ in (1), that is, by solving
$p\,e\log 2=\log\left(1+G(x)^{\alpha}\right)e^{G(x)^{\beta}}$
for $x$. Equation (9) cannot be solved analytically for $x$ in closed form because of the combination of logarithmic and exponential terms, so the quantile function cannot, in general, be expressed in closed form. In such cases, approximate numerical methods can be used. It is worth noting that modern statistical software packages, such as Mathcad and R, have contributed significantly to overcoming this problem.
A common approach is to employ iterative root-finding methods such as the Newton–Raphson algorithm, which starts with an initial guess and iteratively updates it. This method converges rapidly when good starting values are chosen. Other alternatives include the bisection method, which is slower but guarantees convergence, and secant-type methods, which balance speed and stability. The numerical solution of quantiles has been made increasingly practical with modern computational tools. Software environments such as R 4.2.1 (see the R code in Appendix B), MATLAB, Python 3.13.7 (with libraries like SciPy), and Mathcad 15 provide built-in solvers for nonlinear equations, including Newton–Raphson-type algorithms. These tools allow practitioners to approximate the quantile function with high accuracy, making it feasible to use in simulation studies, reliability analysis, and risk modeling. Thus, even in the absence of a closed-form representation, the QF$_X$ remains a powerful tool for both theoretical exploration and applied statistics. Random variates for the simulations were generated by first drawing uniform random numbers $U\sim\mathrm{Uniform}(0,1)$ and then numerically inverting the GLEP Weibull CDF $F(x;\alpha,\beta,\lambda)=U$ using the robust hybrid bisection–Newton root-finding scheme described in the paper (initialized with the baseline Weibull quantile), thereby obtaining samples $X=F^{-1}(U)$.
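The following is a minimal R sketch of this inversion scheme for the GLEP Weibull special case of Section 4; the use of uniroot (R’s built-in safeguarded root finder) in place of a hand-coded bisection–Newton hybrid, and the function names, are illustrative choices.

# Numerical quantile function and random-variate generation by inversion.
C0 <- 1 / (exp(1) * log(2))

glep_weibull_cdf <- function(x, alpha, beta, lambda) {
  Gx <- 1 - exp(-x^lambda)                     # Weibull baseline CDF
  C0 * log(1 + Gx^alpha) * exp(Gx^beta)
}

glep_weibull_q <- function(p, alpha, beta, lambda, upper = 1e3) {
  sapply(p, function(pp)
    uniroot(function(x) glep_weibull_cdf(x, alpha, beta, lambda) - pp,
            lower = 0, upper = upper, tol = 1e-9)$root)
}

# Random variates: X = F^{-1}(U), U ~ Uniform(0, 1).
set.seed(1)
glep_weibull_q(runif(5), alpha = 2, beta = 2, lambda = 2)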

2.3. Moments

Let $Y_{k,j}$ be a random variable having density $\pi_{k,j}(x)$. The $r$-th ordinary moment of $X$, say $\mu_r$, follows from (8) as
$\mu_r=E\left(X^{r}\right)=\sum_{k=0}^{+\infty}\sum_{j=1}^{+\infty}d_{k,j}\,E\left(Y_{k,j}^{r}\right),$
where
$E\left(Y_{k,j}^{r}\right)=(\alpha j+\beta k)\int_{-\infty}^{+\infty}x^{r}g(x)G(x)^{\alpha j+\beta k-1}\,dx$
can be evaluated numerically in terms of the baseline quantile function
$Q_G(u)=G^{-1}(u)$
as
$E\left(Y_{k,j}^{r}\right)=(\alpha j+\beta k)\int_{0}^{1}Q_G(u)^{r}\,u^{\alpha j+\beta k-1}\,du.$
Setting r = 1 in (10) gives the mean of X .
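As an illustrative numerical recipe (by no means the only possible one), the $r$-th moment can be approximated in R by truncating the double series and evaluating each integral with integrate(); the truncation orders and the Weibull baseline quantile used for Q_G are assumptions of this sketch.

# r-th ordinary moment via (10): truncate the series and use quadrature for E(Y_{k,j}^r).
C0  <- 1 / (exp(1) * log(2))
Q_G <- function(u, lambda) (-log(1 - u))^(1 / lambda)     # Weibull baseline quantile

glep_weibull_moment <- function(r, alpha, beta, lambda, J = 60, K = 30) {
  mu <- 0
  for (j in 1:J) for (k in 0:K) {
    d_kj <- C0 * (-1)^(1 + j) / (j * factorial(k))
    a    <- alpha * j + beta * k
    EY   <- integrate(function(u) Q_G(u, lambda)^r * a * u^(a - 1), 0, 1)$value
    mu   <- mu + d_kj * EY
  }
  mu
}

glep_weibull_moment(1, alpha = 2, beta = 2, lambda = 2)   # approximate mean of X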

2.4. Incomplete Moments

The $r$-th incomplete moment of $X$ is given by
$m_r(y)=\int_{-\infty}^{y}x^{r}f(x;\alpha,\beta)\,dx.$
Using (8), the $r$-th incomplete moment of the GLEP family is
$m_r(y)=\sum_{k=0}^{+\infty}\sum_{j=1}^{+\infty}d_{k,j}\,m_{r,\alpha j+\beta k}(y),$
where
$m_{r,\alpha j+\beta k}(y)=(\alpha j+\beta k)\int_{0}^{G(y)}Q_G(u)^{r}\,u^{\alpha j+\beta k-1}\,du.$
The quantity $m_{r,\alpha j+\beta k}(y)$ can be calculated numerically using software such as MATLAB, Mathematica 11, etc.

2.5. Moment-Generating Function

The moment-generating function (MGF) of $X$, say $M(t)=E\left(e^{tX}\right)$, is obtained from (8) as
$M(t)=\sum_{k=0}^{+\infty}\sum_{j=1}^{+\infty}d_{k,j}\,M_{\alpha j+\beta k}(t),$
where $M_{\alpha j+\beta k}(t)$ is the generating function of $Y_{k,j}$ given by
$M_{\alpha j+\beta k}(t)=(\alpha j+\beta k)\int_{-\infty}^{+\infty}e^{tx}g(x)G(x)^{\alpha j+\beta k-1}\,dx=(\alpha j+\beta k)\int_{0}^{1}\exp\left[t\,Q_G(u)\right]u^{\alpha j+\beta k-1}\,du.$
The last two integrals can be computed numerically for most parent distributions.

2.6. KRIs

Definition 1.
VaR at confidence level (CL) $p$ (e.g., $90\%$) is the threshold value such that the probability of a loss exceeding this value is $(1-p)$. In other words, it is the $p$-th quantile of the loss distribution.
Then, based on Definition 1, we can simply write for the GLEP family:
$\Pr\left[X>\mathrm{QF}_X(p)\right]=1-p=10\%\quad\text{for }p=90\%.$
The VaR for the GLEP family is obtained by numerically solving the CDF equation (Equation (1) or (9)) for $x$. From (12), for a one-year time horizon with $p=90\%$, the interpretation is that there is only a small chance ($10\%$) that the insurance company will be bankrupted by an adverse outcome over the next year. The quantity VaR does not satisfy one of the four criteria for coherence, namely subadditivity (see Artzner [2]).
Definition 2.
TVaR (also known as Conditional Value-at-Risk, CVaR, or Expected Shortfall, ES) at confidence level p is the expected value of the loss, given that the loss exceeds the VaR at that same level. It represents the average severity of losses beyond the VaR threshold.
Then, based on Definition 2, the TVaR can be expressed as
$\mathrm{TVaR}_p(X)=E\left(X\,\middle|\,X>\pi_p\right)=\frac{1}{1-F\left(\pi_p\right)}\int_{\pi_p}^{+\infty}x\,f(x;\alpha,\beta)\,dx=\frac{1}{1-p}\int_{\pi_p}^{+\infty}x\,f(x;\alpha,\beta)\,dx,$ where $\pi_p=\mathrm{VaR}_p(X)$.
The TVaR X can also be expressed as
$\mathrm{TVaR}_p(X)=\mathrm{VaR}_p(X)+e\left(\mathrm{VaR}_p(X)\right),$
where $e\left(\mathrm{VaR}_p(X)\right)$ is the mean excess loss (EL) function evaluated at the $100p\%$ quantile (see Tasche [4] and Acerbi [5]).
Definition 3.
TV at confidence level $p$ is the variance of the loss distribution, given that the loss exceeds the VaR at that level. It measures the dispersion or spread of losses in the extreme tail. The TV can be expressed as
$\mathrm{TV}_p(X)=E\left(X^{2}\,\middle|\,X>\pi_p\right)-\left[\mathrm{TVaR}_p(X)\right]^{2}.$
Definition 4.
TMV at confidence level $p$ is a risk measure that combines the TVaR and the TV.
The TMV risk indicator can then be expressed as
$\mathrm{TMV}_p(X)=\mathrm{TVaR}_p(X)+a\,\mathrm{TV}_p(X),\quad 0\le a\le 1.$
Then, $\mathrm{TMV}_p(X)\ge\mathrm{TVaR}_p(X)$ and, for $a=0$, $\mathrm{TMV}_p(X)=\mathrm{TVaR}_p(X)$.
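A minimal R sketch of these indicators for a fitted GLEP Weibull model is given below; it reuses the quantile function glep_weibull_q() from the sketch in Section 2.2 and approximates the tail quantities by Monte Carlo, with the sample size and the weight a being arbitrary illustrative choices.

# VaR, TVaR, TV, TMV and EL at confidence level p, by simulation from the fitted model.
glep_weibull_kri <- function(p, alpha, beta, lambda, n = 5e4, a = 1) {
  x    <- glep_weibull_q(runif(n), alpha, beta, lambda)   # inverse-transform sampling
  VaR  <- quantile(x, p, names = FALSE)
  exc  <- x[x > VaR]                                      # losses beyond the VaR threshold
  TVaR <- mean(exc)                                       # expected loss beyond VaR
  TV   <- var(exc)                                        # tail variance
  TMV  <- TVaR + a * TV                                   # tail mean-variance
  EL   <- TVaR - VaR                                      # mean excess over the threshold
  c(VaR = VaR, TVaR = TVaR, TV = TV, TMV = TMV, EL = EL)
}

set.seed(123)
glep_weibull_kri(0.90, alpha = 2, beta = 2, lambda = 2)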
Definition 5.
PORT-VaR is a method inspired by extreme value theory and the peaks over a random threshold (PORT) approach. Instead of fixing a single high threshold (like VaR at 95%), it analyzes the distribution of exceedances (peaks) over a series of varying VaR thresholds (e.g., VaR at 55%, 60%,…, 95%).
The following steps can be used (a short R sketch implementing them follows this list):
  • Choose a range of confidence levels for VaR, typically from a lower bound (e.g., 55%) up to a high bound (e.g., 95%). The paper uses 55% to 95% in increments of 5%.
  • For each selected confidence level CL, calculate the corresponding VaR value using the fitted GLEP model as described in Definition 1.
  • Extract all observed data points from the dataset that are greater than the VaR threshold for that specific CL. These are the “peaks” or “excesses”.
  • For the set of peaks identified in step 3, calculate key descriptive statistics:
    • Min.v: Minimum value among the peaks.
    • 1st Qu.: First quartile (25th percentile) of the peaks.
    • Median: Median (50th percentile) of the peaks.
    • Mean: Arithmetic mean of the peaks.
    • 3rd Qu.: Third quartile (75th percentile) of the peaks.
    • Max.v: Maximum value among the peaks.
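The sketch below implements these steps in R; for simplicity, the VaR thresholds are computed as empirical quantiles of the data, although the fitted GLEP quantile function from Section 2.2 could equally be substituted, and all object names are illustrative.

# PORT-VaR summary: VaR threshold, number of peaks, and descriptive statistics of the peaks.
port_var <- function(x, levels = seq(0.55, 0.95, by = 0.05)) {
  t(sapply(levels, function(p) {
    VaR   <- quantile(x, p, names = FALSE)   # step 2 (empirical; or glep_weibull_q(p, ...))
    peaks <- x[x > VaR]                      # step 3: exceedances over the threshold
    c(CL = p, VaR = VaR, n.peaks = length(peaks),
      Min.v = min(peaks), `1st Qu.` = quantile(peaks, 0.25, names = FALSE),
      Median = median(peaks), Mean = mean(peaks),
      `3rd Qu.` = quantile(peaks, 0.75, names = FALSE), Max.v = max(peaks))
  }))
}

# Illustrative usage on a simulated heavy-tailed claims-like sample:
set.seed(7)
round(port_var(rlnorm(500, meanlog = 7, sdlog = 1)), 1)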

3. Characterizing the New Family

This section deals with various characterizations of the proposed distribution. These characterizations are based on (i) a simple relationship between two truncated moments and (ii) the reverse hazard function. It should be mentioned that for characterization (i) the cumulative distribution function need not have a closed form; the result depends on the solution of a first-order differential equation, which provides a bridge between probability and differential equations.

3.1. Characterizations Based on a Simple Relationship Between Two Truncated Moments

In this subsection, we present characterizations of the GLEP distributions, in terms of a simple relationship between two truncated moments. Our first characterization result employs a theorem due to Glänzel [7]; see Theorem 1 below. Note that the result holds also when the interval H is not closed. Moreover, it could also be applied when the cdf F does not have a closed form. As shown in Glänzel [8], this characterization is stable in the sense of weak convergence.
Theorem 1.
Let $(\Omega,\mathcal{F},\mathbf{P})$ be a given probability space and let $H=[d,e]$ be an interval for some $d<e$ ($d=-\infty$, $e=+\infty$ might as well be allowed). Let $X:\Omega\to H$ be a continuous random variable with distribution function $F$ and let $q_1$ and $q_2$ be two real functions defined on $H$ such that
$E\left[q_2(X)\,\middle|\,X\ge x\right]=E\left[q_1(X)\,\middle|\,X\ge x\right]\eta(x),\quad x\in H,$
is defined with some real function $\eta$. Assume that $q_1,q_2\in C^{1}(H)$, $\eta\in C^{2}(H)$ and $F$ is a twice continuously differentiable and strictly monotone function on the set $H$. Finally, assume that the equation $\eta q_1=q_2$ has no real solution in the interior of $H$. Then, $F$ is uniquely determined by the functions $q_1$, $q_2$ and $\eta$; in particular,
$F(x)=\int_{a}^{x}C\left|\frac{\eta'(u)}{\eta(u)q_1(u)-q_2(u)}\right|\exp\left(-s(u)\right)du,$
where the function $s$ is a solution of the differential equation $s'=\frac{\eta'\,q_1}{\eta\,q_1-q_2}$ and $C$ is the normalization constant such that $\int_H dF=1$.
Remark 1.
The goal is to make $\eta(x)$ as simple as possible.
Proposition 1.
Let $X:\Omega\to\mathbb{R}$ be a continuous random variable and let $q_1(x)=\left[P(x)\right]^{-1}$ and $q_2(x)=q_1(x)\,e^{G(x)^{\beta}}$ for $x\in\mathbb{R}$. The random variable $X$ has PDF (3) if and only if the function $\eta$ defined in Theorem 1 has the form
$\eta(x)=\frac{1}{2}\left(e+e^{G(x)^{\beta}}\right),\quad x\in\mathbb{R}.$
Proof. 
Let $X$ be a random variable with PDF (3); then
$\left(1-F(x)\right)E\left[q_1(X)\,\middle|\,X\ge x\right]=\int_{x}^{+\infty}C\,g(u)G(u)^{\beta-1}e^{G(u)^{\beta}}\,du=\frac{C}{\beta}\left(e-e^{G(x)^{\beta}}\right),\quad x\in\mathbb{R},$
and
$\left(1-F(x)\right)E\left[q_2(X)\,\middle|\,X\ge x\right]=\int_{x}^{+\infty}C\,g(u)G(u)^{\beta-1}e^{2G(u)^{\beta}}\,du=\frac{C}{2\beta}\left(e^{2}-e^{2G(x)^{\beta}}\right),\quad x\in\mathbb{R},$
and finally
$\eta(x)q_1(x)-q_2(x)=\frac{q_1(x)}{2}\left(e-e^{G(x)^{\beta}}\right)>0\quad\text{for }x\in\mathbb{R}.$
Conversely, if η is given as above, then
$s'(x)=\frac{\eta'(x)q_1(x)}{\eta(x)q_1(x)-q_2(x)}=\frac{\beta\,g(x)G(x)^{\beta-1}e^{G(x)^{\beta}}}{e-e^{G(x)^{\beta}}},\quad x\in\mathbb{R},$
and hence
$s(x)=-\log\left(e-e^{G(x)^{\beta}}\right),\quad x\in\mathbb{R}.$
Now, in view of Theorem 1, $X$ has density (3).
Corollary 1.
Let $X:\Omega\to\mathbb{R}$ be a continuous random variable and let $q_1(x)$ be as in Proposition 1. The PDF of $X$ is (3) if and only if there exist functions $q_2$ and $\eta$ defined in Theorem 1 satisfying the differential equation
$\frac{\eta'(x)q_1(x)}{\eta(x)q_1(x)-q_2(x)}=\frac{\beta\,g(x)G(x)^{\beta-1}e^{G(x)^{\beta}}}{e-e^{G(x)^{\beta}}},\quad x\in\mathbb{R}.$
Corollary 2.
The general solution of the differential equation in Corollary 1 is
$\eta(x)=\left(e-e^{G(x)^{\beta}}\right)^{-1}\left[-\int\beta\,g(x)G(x)^{\beta-1}e^{G(x)^{\beta}}\left[q_1(x)\right]^{-1}q_2(x)\,dx+D\right],$
where D is a constant.
Proof. 
If $X$ has PDF (3), then clearly the differential equation holds. Now, if the differential equation holds, then
$\eta'(x)=\frac{\beta\,g(x)G(x)^{\beta-1}e^{G(x)^{\beta}}}{e-e^{G(x)^{\beta}}}\,\eta(x)-\frac{\beta\,g(x)G(x)^{\beta-1}e^{G(x)^{\beta}}}{e-e^{G(x)^{\beta}}}\left[q_1(x)\right]^{-1}q_2(x),$
or
$\eta'(x)-\frac{\beta\,g(x)G(x)^{\beta-1}e^{G(x)^{\beta}}}{e-e^{G(x)^{\beta}}}\,\eta(x)=-\frac{\beta\,g(x)G(x)^{\beta-1}e^{G(x)^{\beta}}}{e-e^{G(x)^{\beta}}}\left[q_1(x)\right]^{-1}q_2(x),$
or
$\frac{d}{dx}\left[\left(e-e^{G(x)^{\beta}}\right)\eta(x)\right]=-\beta\,g(x)G(x)^{\beta-1}e^{G(x)^{\beta}}\left[q_1(x)\right]^{-1}q_2(x),$
from which we arrive at
$\eta(x)=\left(e-e^{G(x)^{\beta}}\right)^{-1}\left[-\int\beta\,g(x)G(x)^{\beta-1}e^{G(x)^{\beta}}\left[q_1(x)\right]^{-1}q_2(x)\,dx+D\right].$
Note that a set of functions satisfying the differential equation in Corollary 1 is given in Proposition 1 with $D=\frac{e^{2}}{2}$. However, it should also be noted that there are other triplets $(q_1,q_2,\eta)$ satisfying the conditions of Theorem 1. □

3.2. Characterization in Terms of the Reverse (Or Reversed) Hazard Function

The reverse hazard function, $r_F$, of a twice differentiable distribution function, $F$, is defined as
$r_F(x)=\frac{f(x)}{F(x)},\quad x\in\text{support of }F.$
In this subsection, we present a characterization of the GLEP distribution in terms of the reverse hazard function.
Proposition 2.
Let $X:\Omega\to\mathbb{R}$ be a continuous random variable. The random variable $X$ has PDF (3) if and only if its reverse hazard function $r_F(x)$ satisfies the following differential equation
$r_F'(x)-(\beta-1)\frac{g(x)}{G(x)}\,r_F(x)=G(x)^{\beta-1}\frac{d}{dx}\left[\frac{g(x)P(x)}{\log\left(1+G(x)^{\alpha}\right)}\right],\quad x\in\mathbb{R},$
with boundary condition $\lim_{x\to+\infty}r_F(x)=\left(\beta+\frac{\alpha}{2\log 2}\right)\lim_{x\to+\infty}g(x)$.
Proof. 
Multiplying both sides of the above equation by $G(x)^{1-\beta}$, we have
$\frac{d}{dx}\left[G(x)^{1-\beta}\,r_F(x)\right]=\frac{d}{dx}\left[\frac{g(x)P(x)}{\log\left(1+G(x)^{\alpha}\right)}\right],$
or
$r_F(x)=G(x)^{\beta-1}\frac{g(x)P(x)}{\log\left(1+G(x)^{\alpha}\right)},$
which is the reverse hazard function corresponding to the PDF (3). □

4. The GLEP Weibull Case

The GLEP Weibull distribution is derived by substituting the baseline CDF of the Weibull distribution into the general GLEP framework. The Weibull distribution, known for its versatility in modeling time-to-event data, is defined by the CDF $G(x)=1-e^{-x^{\lambda}}$. Then, the CDF of the GLEP Weibull can be expressed as
$F(x;\alpha,\beta,\lambda)=C\log\left[1+\left(1-e^{-x^{\lambda}}\right)^{\alpha}\right]e^{\left(1-e^{-x^{\lambda}}\right)^{\beta}},\quad x>0.$
The new PDF can be easily derived by differentiating (12). Figure 1 (left panel) shows the PDFs of the GLEP Weibull distribution for different parameter combinations. It highlights how flexible the model is in taking on various shapes, ranging from skewed to almost symmetric, and from unimodal to decreasing forms. By adjusting parameters such as $\alpha$, $\beta$, and $\lambda$, the distribution can model data with different peaks and tail behaviors. Some curves show a sharp peak with rapid decay, while others are more spread out. This visual variety suggests the model can fit real data with complex patterns, such as insurance claims or house prices. The ability to capture both heavy tails and sharp modes is especially useful in risk modeling. Small changes in the parameters lead to noticeably different shapes, which is useful for fine-tuning fits. Figure 1 (right panel) displays the hazard rate functions (HRFs) of the GLEP Weibull distribution for various parameter combinations, showcasing its remarkable flexibility in modeling different failure rate shapes. The plots exhibit increasing, decreasing, and bathtub-shaped hazard rates, which are commonly observed in reliability and survival analysis. This versatility makes the model suitable for diverse applications, from engineering systems to biological lifetimes. The ability to capture bathtub-shaped hazards, common in early failure periods, is particularly valuable. Different values of $\alpha$, $\beta$ and $\lambda$ significantly influence the shape and scale of the hazard function. For instance, higher $\beta$ and $\lambda$ tend to produce steeper increasing hazard rates. Conversely, lower values allow for more gradual changes, fitting wear-out phases more accurately. The presence of non-monotonic shapes highlights the model’s adaptability beyond standard Weibull behavior.
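To reproduce such shape explorations in outline, a short R sketch is given below; it builds the density, distribution and hazard rate functions of the GLEP Weibull case directly from the formulas above, and the parameter values passed to curve() are arbitrary examples rather than those used for Figure 1.

# GLEP Weibull density, distribution and hazard rate functions.
C0 <- 1 / (exp(1) * log(2))

dglepw <- function(x, alpha, beta, lambda) {
  Gx <- 1 - exp(-x^lambda)
  gx <- lambda * x^(lambda - 1) * exp(-x^lambda)
  Px <- beta * log(1 + Gx^alpha) + alpha * Gx^(alpha - beta) / (1 + Gx^alpha)
  C0 * gx * Gx^(beta - 1) * exp(Gx^beta) * Px
}
pglepw <- function(x, alpha, beta, lambda) {
  Gx <- 1 - exp(-x^lambda)
  C0 * log(1 + Gx^alpha) * exp(Gx^beta)
}
hglepw <- function(x, alpha, beta, lambda)
  dglepw(x, alpha, beta, lambda) / (1 - pglepw(x, alpha, beta, lambda))

curve(dglepw(x, 2, 2, 2), 0.01, 3, ylab = "PDF")        # one example parameter setting
curve(hglepw(x, 0.5, 0.5, 0.7), 0.01, 3, ylab = "HRF")  # another example setting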

5. Simulations for Assessing Estimation Methods Under the GLEP Weibull Case

This section presents a simulation study designed to evaluate the performance of various estimation methods for the parameters of the GLEP Weibull distribution. Given that real data applications depend heavily on accurate parameter estimation, this section aims to compare the behavior of different frequentist estimation techniques under controlled conditions. We consider several methods including MLE, CVM, ADE, RTADE, and LTADE. The simulation is conducted across multiple sample sizes and different parameter settings to assess the consistency, bias, and root mean square error (RMSE) of each estimator. In addition, the mean absolute deviation in distribution (Dabs) is used to measure the average discrepancy between the estimated and actual cumulative distribution functions, while the maximum absolute deviation (Dmax) identifies the largest such discrepancy across the domain. This allows us to determine which method performs best in terms of accuracy and efficiency, particularly in small versus large samples. Special attention is given to tail-sensitive methods, as they are crucial in risk analysis where extreme values play a significant role. The results provide practical insights into the strengths and limitations of each estimation approach, guiding researchers and practitioners in selecting the most appropriate method when applying the GLEP Weibull model to real data. The findings lay the groundwork for subsequent risk analysis applications in insurance and economics.
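For orientation, one stripped-down simulation replicate is sketched below in R, fitting by MLE only; it reuses glep_weibull_q() from the Section 2.2 sketch and dglepw() from the Section 4 sketch, and the other estimators (CVM, ADE, RTADE, LTADE) would replace the negative log-likelihood with the corresponding minimum-distance objective. All names and settings are illustrative.

# One Monte Carlo replicate: simulate from the GLEP Weibull and re-estimate by MLE.
negloglik <- function(par, x) {
  a <- exp(par[1]); b <- exp(par[2]); l <- exp(par[3])   # log-parameterisation keeps parameters positive
  -sum(log(dglepw(x, a, b, l)))
}

set.seed(2025)
x_sim <- glep_weibull_q(runif(100), alpha = 2, beta = 2, lambda = 2)  # sample of size n = 100
fit   <- optim(log(c(1, 1, 1)), negloglik, x = x_sim)
exp(fit$par)   # estimates of (alpha, beta, lambda); repeat over replicates to obtain bias and RMSE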
Table 1 presents a simulation study for the GLEP Weibull distribution with true parameters α = 2, β = 2, and λ = 2, under different sample sizes (n = 50, 100, 300) and estimation methods: MLE, CVM, ADE, RTADE, and LTADE. As the sample size increases, the biases and RMSEs for all estimators generally decrease, indicating consistency in estimation. For small samples (n = 50), LTADE shows the smallest bias for α and λ, while ADE performs best for β, suggesting that tail-weighted methods may outperform MLE in small samples. MLE, though widely used, exhibits higher RMSE for β compared to other methods, especially at n = 50. As sample size grows, MLE improves and becomes competitive, particularly at n = 300, where all methods show low bias and RMSE. The distributional distance metrics (Dabs and Dmax) also decrease with larger samples, reflecting better overall distributional fit. Among the methods, LTADE and ADE consistently yield lower Dabs and Dmax values, indicating superior performance in capturing the empirical distribution. RTADE, which emphasizes the right tail, shows higher RMSE, possibly due to sensitivity to extreme values. The results suggest that for small samples, alternative estimation methods like LTADE or ADE may be preferable over MLE. At moderate to large samples, MLE performs well and remains a solid choice. The simulation confirms the practical applicability of the GLEP Weibull model with proper estimator selection. It also highlights the importance of considering estimation method choice based on sample size and tail behavior. Overall, the table supports the model’s identifiability and the reliability of the proposed estimation techniques. The study provides a useful guide for practitioners in choosing appropriate methods depending on data availability and modeling goals.
Table 2 presents a simulation study for the GLEP Weibull distribution with smaller true parameter values: α = 0.9, β = 0.8, and λ = 0.6, across sample sizes n = 20, 50, and 300. The performance of five estimation methods, MLE, CVM, ADE, RTADE, and LTADE, is evaluated in terms of bias, RMSE, and the two distance metrics Dabs and Dmax. As expected, biases and RMSEs decrease with increasing sample size, confirming the consistency of all estimators. For n = 20, LTADE shows the smallest bias for α and β, while ADE performs best for λ, indicating its effectiveness in small samples. MLE exhibits relatively higher bias and RMSE, especially for β, suggesting it may not be optimal for small-sample scenarios with these parameter values. At n = 50, LTADE continues to outperform others in estimating α and β, while ADE leads in estimating λ with the lowest bias. By n = 300, all methods perform well, but LTADE and ADE still maintain slightly lower RMSEs and Dabs values. The Dmax and Dabs values are consistently lowest for LTADE across all sample sizes, highlighting its superior fit to the empirical distribution. RTADE shows higher RMSE and deviation, likely due to its focus on the right tail, which may introduce instability. The results reinforce that tail-weighted methods like LTADE can offer advantages when modeling skewed or heavy-tailed data. Notably, LTADE’s strong performance in small samples makes it attractive for practical applications where data is limited. Overall, the table demonstrates the robustness of the GLEP Weibull model and the importance of estimator choice. It suggests that while MLE is reliable for large samples, alternative methods like LTADE and ADE are preferable in small-sample settings. The findings support using LTADE for more accurate and stable parameter recovery under low data regimes.
Table 3 presents a simulation study for the GLEP Weibull distribution with true parameters α = 3, β = 1.2, and λ = 1.2, evaluated across sample sizes n = 20, 50, and 300, using five estimation methods: MLE, CVM, ADE, RTADE, and LTADE. The results show a clear trend of decreasing bias and RMSE as sample size increases, confirming the consistency of all estimators. For small samples ( n = 20), LTADE and ADE perform best, with LTADE yielding the smallest bias for α and β , and ADE showing the lowest RMSE for both parameters. MLE exhibits relatively higher bias and RMSE, particularly for β , indicating its inefficiency in small samples under these parameter settings. At n = 50, ADE continues to outperform others in estimating α and λ , while CVM shows improved accuracy for β . By n = 300, all methods converge toward low error levels, but ADE stands out with the smallest RMSEs, and near-zero bias, especially for λ . The Dabs and Dmax values, which measure overall distributional fit, are consistently lowest for ADE and LTADE, suggesting superior empirical fit. RTADE shows higher deviations, likely due to its sensitivity to the right tail, which may distort estimation when the sample is small. Notably, ADE performs exceptionally well across all sample sizes, indicating its robustness. The results reinforce that classical MLE, while asymptotically efficient, may not be optimal for small or moderate samples in this model. Alternative methods like ADE and LTADE offer more reliable estimates when data is limited. This makes them preferable in practical applications involving risk analysis or insurance data, where sample sizes are often small. Overall, Table 3 supports the use of ADE as a highly effective estimator for the GLEP Weibull model, especially in realistic, finite-sample scenarios.
Table 1, Table 2 and Table 3 collectively present a comprehensive simulation study across different parameter settings and sample sizes to evaluate the performance of five estimation methods (MLE, CVM, ADE, RTADE, LTADE) for the GLEP Weibull distribution. As sample size increases, all methods show reduced bias and RMSE, confirming their consistency. In small samples (n = 20–50), LTADE and ADE consistently outperform MLE, particularly in terms of bias and RMSE, with LTADE excelling in Table 1 and Table 2, and ADE performing best in Table 3. MLE tends to be less efficient in small samples, showing higher bias and variability, especially for β and λ. The Anderson–Darling-based methods (ADE, LTADE) demonstrate superior overall fit, as reflected in lower Dabs and Dmax values, due to their sensitivity to distributional tails. RTADE, despite its focus on the right tail, shows mixed performance and higher RMSE, suggesting instability. ADE emerges as the most robust method across different parameter configurations, maintaining strong performance even as parameters vary widely. LTADE also performs exceptionally well, particularly when parameters are small (see Table 2), making it ideal for heavy-tailed or skewed data modeling. These results suggest that while MLE is reliable for large samples, ADE and LTADE are preferred for small to moderate samples, offering more accurate and stable estimation in practical, data-limited scenarios.

6. Risk Analysis Under Artificial Data and GLEP Weibull Case

Table 4 presents KRIs for the GLEP Weibull model under artificial data with a small sample size ( n = 20). The estimated parameters ( α , β , λ ) vary across estimation methods, with MLE and CVM yielding values close to the true parameters ( α = 2, β = 2, λ = 2), while RTADE and LTADE show more deviation. All methods produce similar VaR and TVaR values, indicating consistent tail quantile estimation even in small samples. TV and TMV values are also comparable, reflecting stable variance and mean-variance trade-off estimates. The EL increases with confidence level, as expected, and is slightly higher under LTADE, suggesting it may be more conservative in risk prediction. The consistency in risk measures across methods, despite parameter variation, highlights the robustness of the GLEP Weibull model. However, LTADE and RTADE show slightly higher EL and TMV, possibly due to their tail-weighted nature. This suggests they may overestimate risk in small samples. MLE and CVM appear more balanced, with lower bias and stable risk outputs. The results confirm that reliable risk assessment is possible even with limited data. The model’s flexibility allows it to capture tail behavior effectively. Still, estimation method choice impacts risk quantification, especially in the extremes. For n = 20, MLE and CVM offer a good compromise between accuracy and stability. This supports their use in practical insurance applications where small datasets are common.
Table 5 displays KRIs for the GLEP Weibull model at n = 50, showing improved parameter estimates compared to n = 20. All methods now yield parameter values closer to the true (2,2,2), with ADE and CVM performing particularly well. VaR and TVaR are consistent across methods, with only minor differences at higher quantiles (90%). TV and TMV values decrease slightly compared to n = 20, reflecting better precision in variance estimation as sample size increases. EL remains stable and increases with confidence level, as expected. ADE and LTADE continue to show slightly higher EL and TMV, indicating a more cautious risk assessment. However, the gap between methods narrows, suggesting convergence in performance. MLE shows improved stability and is now competitive with other methods. The reduction in dispersion of risk measures across estimators highlights the model’s consistency. This indicates that n = 50 is sufficient for reliable risk modeling in many practical settings. The GLEP Weibull model effectively captures tail risk even at moderate sample sizes. Differences in estimation methods are still noticeable but less pronounced than in small samples. ADE stands out with the most balanced performance across bias and risk metrics. The results reinforce that tail-sensitive methods like LTADE are useful when conservative estimates are preferred. Overall, Table 5 demonstrates that the model becomes more robust and reliable as data availability improves.
Table 6 shows KRIs for n = 100, where all estimation methods yield parameter estimates very close to the true values, indicating strong consistency. The differences in VaR, TVaR, TV, TMV, and EL across methods are now minimal, especially at lower quantiles (70%, 80%). At the 90% level, slight variations persist, but they are negligible in practical terms. MLE, CVM, and ADE produce nearly identical risk measures, confirming their asymptotic equivalence. LTADE and RTADE still exhibit marginally higher EL and TMV, reflecting their emphasis on tail accuracy. However, this effect diminishes as sample size grows. The convergence of all methods suggests that n = 100 is sufficient for stable and accurate risk assessment. The GLEP Weibull model demonstrates excellent performance in capturing tail behavior with high precision. The stability of risk measures supports its use in regulatory and actuarial reporting. ADE continues to perform slightly better in terms of overall fit and efficiency. The results confirm that larger samples reduce estimator sensitivity and improve model reliability. This makes the model suitable for applications with moderate to large datasets, such as insurance portfolios. Consistency across methods enhances confidence in the reported risk figures. Overall, Table 6 illustrates the model’s scalability and robustness in real risk modeling scenarios.
Table 7 presents KRIs for n = 300, representing a large sample scenario. All estimation methods produce nearly identical parameter estimates, very close to the true values (2,2,2), confirming asymptotic efficiency. The resulting risk measures, VaR, TVaR, TV, TMV, and EL, are almost indistinguishable across methods, with differences only in the third or fourth decimal place. This high level of agreement underscores the consistency and stability of the GLEP Weibull model under large samples. Even at the 90% confidence level, all methods align closely, indicating reliable tail estimation. The slight edge previously seen in ADE and CVM has vanished, as MLE now performs equally well. LTADE and RTADE no longer overestimate risk, showing that tail-weighted methods converge with others when sufficient data is available. The model’s ability to deliver consistent risk assessments across estimation techniques is a major strength. It enhances credibility in regulatory and financial reporting contexts. The results confirm that the GLEP Weibull family is well-suited for both small and large datasets. For n = 300, any of the five methods can be used with confidence. This makes the model highly practical for real applications. The diminishing impact of estimation choice at large n supports its robustness. Overall, Table 7 validates the model’s asymptotic properties and its readiness for deployment in actuarial and financial risk management.

7. Validating the GLEP Weibull for Risk Analysis

7.1. Validating the GLEP Weibull for Risk Analysis Under U.K. Motor Insurance Data

In insurance risk analysis, historical claims data is typically arranged in a triangular format to track how claims develop over time for each accident or underwriting year. The “origin period” indicates when a policy was issued or when a loss occurred, often recorded by year, quarter, or month. “Claim age” or “development lag” measures the time elapsed since the claim originated, showing how payments accumulate across development periods. Policies are commonly grouped by business lines, risk profiles, or organizational segments to ensure homogeneity. This study uses real data from a U.K. Motor Non-Comprehensive insurance portfolio, covering accident years from 2007 to 2013, with observations structured by origin year, development year, and incremental claim payments (see Charpentier [9]). This dataset has been recently analyzed by Mohamed et al. [10]. The triangular layout supports the prediction of future claims and the identification of loss development patterns. It also enables the application of advanced statistical models, such as the GLEP Weibull distribution, in actuarial modeling. The GLEP Weibull model offers enhanced flexibility, making it particularly effective in capturing heavy-tailed claim behaviors. This improves the estimation of key risk measures, including VaR and TVaR. The dataset serves as a practical benchmark for evaluating the performance of different estimation techniques, such as maximum likelihood, Cramér-von Mises, and Bayesian methods, in a realistic insurance context. By applying these models, insurers can achieve more accurate reserve calculations and better risk assessments. The structured format thus plays a critical role in both predictive analytics and regulatory compliance.
Table 8 presents the KRIs for real insurance claims data using the GLEP Weibull model under five different estimation methods: MLE, CVM, ADE, RTADE, and LTADE. The estimated parameters ( α , β , λ ) vary significantly across methods, indicating that the choice of estimation technique strongly influences the fitted model. For instance, ADE yields a much lower α (115.67) compared to MLE (238.04), while β and λ remain relatively stable across methods. This suggests that α is more sensitive to the estimation approach, likely due to its role in shaping the distribution’s tail behavior. Despite parameter variation, the VaR and TVaR values are reasonably consistent across methods at each confidence level (70%, 80%, 90%), which is crucial for reliable risk assessment. However, notable differences emerge in higher-moment risk measures: TV, TMV, and EL show increasing divergence, especially at the 90% level. Methods like ADE and LTADE produce substantially higher TV and TMV values, indicating greater estimated tail variability and potential risk exposure. This implies that tail-sensitive methods may be more conservative in risk prediction, capturing extreme variations better. MLE and CVM yield lower EL values, suggesting a more optimistic (and possibly underestimating) risk profile. The consistency of VaR/TVaR with divergence in TV/TMV highlights that while quantile-based measures may appear stable, the full risk picture, including tail dispersion, is method-dependent. The results reinforce findings from earlier simulation tables: estimation choice has a tangible impact on risk evaluation, especially in heavy-tailed insurance data. ADE and LTADE, which emphasize tail fit, appear better suited for prudent risk management in insurance contexts. The high α values from MLE and CVM may lead to overfitting or instability, whereas ADE offers a more balanced trade-off. Overall, Table 8 demonstrates that the GLEP Weibull model is flexible enough to fit complex claims data, but its practical value depends critically on estimator selection. For conservative capital reserving and solvency assessment, tail-focused methods like ADE or LTADE are recommended. This real-data application confirms the model’s relevance in actuarial practice and underscores the importance of methodological rigor in risk modeling.
To effectively mitigate unexpected insurance claim losses, companies should adopt a proactive, data-driven approach to risk assessment and underwriting. First, leverage advanced statistical models like the GLEP Weibull, which offer superior flexibility in capturing heavy tails, skewness, and complex hazard patterns commonly found in real claim data. These models provide more accurate estimates of extreme events compared to traditional distributions. Second, prioritize the use of robust estimation methods like the ADE especially when dealing with small or moderate-sized datasets, as they offer better tail fitting and lower bias. Third, routinely compute and monitor KRIs to gain a comprehensive view of potential financial exposure beyond simple averages. Fourth, recognize that estimation method choice significantly impacts risk forecasts; therefore, conduct comparative analyses across multiple methods to ensure robustness in capital reserving and solvency assessments. Fifth, integrate real data analytics into pricing models to reflect current market dynamics, inflation, and regional risk factors. Sixth, enhance claims triage systems using predictive modeling to identify high-risk claims early and allocate resources efficiently. Seventh, invest in actuarial training focused on modern distributional modeling and risk quantification techniques. Eighth, validate all fitted models using goodness-of-fit tests and out-of-sample forecasting to avoid overfitting. Ninth, apply these models not only to aggregate claims but also across different policy types and customer segments for granular insights. Finally, adopt a conservative estimation strategy, favoring tail-sensitive methods, when regulatory or solvency requirements demand prudence. By embedding these practices into their risk management frameworks, insurers can improve forecast accuracy, reduce surprise losses, strengthen reserves, and maintain long-term financial stability in volatile environments.

7.2. Validating the GLEP Weibull for Risk Analysis Under the U.S. House Prices Data

Analyzing the Boston house prices data (see Das et al. [11]) has significant implications for the U.S. economy. It helps in understanding housing market volatility and economic stability. Accurate modeling of house prices enables better risk assessment for investors and financial institutions. The use of advanced statistical models like GLEP Weibull improves predictions of extreme price movements. This is crucial for estimating VaR and TVaR, which inform us about potential losses. Policymakers can use these insights to design measures that prevent housing bubbles. Stable housing markets contribute to overall economic resilience. Reliable forecasts support mortgage lending and credit risk management. Increased model accuracy reduces bias and errors in price predictions. The Boston dataset contains the median values of owner-occupied house prices (medv) in 506 different neighborhoods around Boston. This dataset has been widely used in real estate and econometric studies to model and predict housing price dynamics. It includes various socio-economic and environmental factors that influence house prices, such as crime rates, proximity to employment centers, and air quality. In the present study, the dataset is employed to evaluate the performance of a newly proposed statistical model, the GLEP Weibull distribution, in capturing the extreme values and tail behavior of house prices (for more details see Das et al. [11]).
Table 9 presents the KRIs for the Boston house price data (medv) using the GLEP Weibull model under five different estimation methods: MLE, CVM, ADE, RTADE, and LTADE. The estimated parameters ( α , β , λ ) vary notably across methods, indicating that the choice of estimation technique significantly influences the fitted distribution. MLE and LTADE yield similar α values (97.68 and 98.92), while CVM and RTADE produce much higher estimates (213.30 and 219.46), suggesting that tail-insensitive methods may overestimate shape parameters. In contrast, β and λ remain relatively stable across methods, ranging from 0.0248 to 0.0264 and 0.5096 to 0.5627, respectively, reflecting robust estimation of scale and rate components. The resulting VaR and TVaR values are generally consistent across methods, especially at lower quantiles (70%, 80%), which is crucial for predicting typical and high-end house prices. However, at the 90% level, LTADE reports the highest VaR (40) and TVaR (54), indicating a more conservative estimate of extreme price risks. More strikingly, TV (Tail Variance) and TMV (Tail Mean Variance) show large differences: LTADE reports TV = 247 and TMV = 177, far exceeding other methods (e.g., RTADE: TV = 104), suggesting it captures significantly greater dispersion in the upper tail. This implies LTADE is more sensitive to extreme values, making it suitable for risk-averse modeling in real estate investment or mortgage risk assessment. The EL also increases with confidence level and is highest under LTADE (14 at 90%), reinforcing its conservative nature. MLE and ADE offer more moderate risk estimates, balancing accuracy and prudence. The divergence in tail variability measures highlights that while central risk metrics may align, deeper tail characteristics depend heavily on estimator choice. For housing market analysis, particularly in volatile or premium segments, using LTADE may provide a more realistic picture of downside/upside risk. The results confirm the GLEP Weibull model’s flexibility in capturing skewed, heavy-tailed economic data. They also emphasize the importance of method selection in policy design, lending decisions, and portfolio risk management. Overall, Table 9 demonstrates the model’s practical utility in real estate risk modeling and supports using LTADE when tail robustness is a priority.
To U.S. housing policymakers and economic regulators, a more data-driven and risk-aware approach is essential to ensure long-term housing market stability and economic resilience. The analysis of Boston house price data using advanced statistical models like the GLEP Weibull reveals that housing markets exhibit complex tail behaviors, including skewness and heavy-tailed distributions, which traditional models often fail to capture. These characteristics imply a higher probability of extreme price movements, both surges and crashes, than commonly assumed. Ignoring such risks can lead to housing bubbles, financial instability, and widespread economic fallout, as seen in past crises. Therefore, regulators should integrate robust KRIs into housing market monitoring systems. These metrics provide a clearer picture of potential extreme losses and market volatility. Policymakers should also support the use of flexible statistical models that accurately reflect real data patterns, especially in high-demand or rapidly changing urban markets. Encouraging transparency in real estate data reporting and promoting research on housing risk modeling can enhance forecasting accuracy. Regional housing authorities should conduct regular stress tests using extreme value analysis to anticipate market shocks. Incentivizing prudent lending practices based on realistic risk assessments can prevent over-leveraging and mortgage defaults. Additionally, affordable housing initiatives should be guided by risk-adjusted pricing models to ensure sustainability. Monitoring price trends in the upper tail of the distribution is crucial, as luxury market fluctuations can signal broader instability. A proactive regulatory framework that combines economic policy with advanced analytics will better protect consumers and financial institutions. Ultimately, stable housing markets are foundational to overall economic health. By adopting a forward-looking, statistically informed approach, U.S. policymakers can mitigate systemic risks, promote equitable access to housing, and strengthen the national economy against future shocks.

8. PORT-VaR Analysis

Figueiredo et al. [6] contribute to risk analysis by advancing VaR estimation through the PORT mean-of-order-p (PORT-MOP) methodology. Their approach refines tail risk measurement by reducing bias and improving accuracy in extreme quantile estimation, especially under heavy-tailed distributions. Following Figueiredo et al. [6], the PORT-VaR analysis is presented for the U.K. motor insurance claims and U.S. house prices datasets. In the past few years, many contributions have attempted to bring risk theory closer to practice and to apply it in fields such as reliability, economics, and others; for more applications see Das et al. [11] and Alizadeh et al. [12].

8.1. PORT-VaR Analysis Under U.K. Motor Insurance Claims

Table 10 below presents the PORT-VaR analysis applied to the U.K. motor insurance claims data, offering a practical approach to identifying and analyzing extreme claim values beyond certain risk thresholds. The summary statistics reported for the peaks are defined as follows:
  • Min.v: the minimum exceedance value among all the claims that are greater than the VaR at that specific confidence level; it is the smallest “peak” in the excess data.
  • 1st Qu.: the first quartile (25th percentile) of the excess values, meaning that 25% of the claims above the VaR are smaller than this value.
  • Median: the median (50th percentile) of the excess values; half of the claims above the VaR are smaller than this value, and half are larger.
  • Mean: the arithmetic average of all the claims that exceed the VaR at that level, a key measure of the central tendency of the tail losses.
  • 3rd Qu.: the third quartile (75th percentile) of the excess values, meaning that 75% of the claims above the VaR are smaller than this value.
  • Max.v: the maximum value among all the claims that exceed the VaR; in this specific dataset, it is consistently 6283, which is the largest single claim in the entire dataset.
Table 10 evaluates the behavior of excess claims above VaR levels corresponding to different confidence levels (from 55% to 95%). For each threshold, three key components are reported: the VaR estimate, the number of observed claims exceeding VaR, and the summary statistics of those excess values (peaks). As the confidence level increases, the VaR generally decreases, except for a slight irregularity at 80% and 85%, indicating a lower threshold for what is considered an “extreme” claim. However, this trend reflects the trade-off between threshold selection and the number of exceedances. At lower thresholds (e.g., 55%), fewer but larger peaks are observed (15 peaks), while at higher thresholds (e.g., 95%), more claims exceed the lower VaR (26 peaks), allowing for richer tail analysis. The minimum exceedance starts at 2278 (55%) and decreases to 629 (95%), showing that lower thresholds capture more moderate-sized but still significant claims. The peaks exhibit high variability, with the maximum value consistently at 6283 across all levels, representing the largest single claim in the dataset. The median and mean of the peaks decrease as the threshold lowers, which is expected due to the inclusion of smaller extreme values. Notably, the mean of the peaks remains relatively stable around 2875-3926, suggesting a consistent average magnitude of extreme claims across different cutoffs. This PORT-VaR framework helps insurers identify critical thresholds for reinsurance coverage, assess tail risk exposure, and validate the appropriateness of extreme value models. Figure 2 presents the PORT-VaR analysis of the U.K. motor insurance claims. Figure 3 provides the density plots of peaks above VaR of the U.K. motor insurance claims.
Based on the findings from Table 10, which presents the PORT-VaR analysis of U.K. motor insurance claims, several important recommendations can be made for the British economy and the motor insurance sector. The consistent presence of extreme claim values, peaking at 6283 across all thresholds, indicates a significant exposure to high-cost claims, which can strain insurer profitability and solvency. As the confidence level increases, the number of peaks above VaR rises sharply, revealing that a large portion of claims fall into the high-severity category, especially at lower thresholds. This suggests that traditional risk models may underestimate tail risk and should be replaced or supplemented with extreme value analysis techniques like PORT-VaR. Insurance companies should adopt more flexible statistical models, such as the GLEP Weibull or similar heavy-tailed distributions, to better capture the behavior of extreme claims. Regulators such as the Financial Conduct Authority (FCA) should encourage the use of advanced risk indicators such as TVaR, TV, and EL for capital adequacy assessments. Dynamic reinsurance strategies should be implemented based on threshold exceedances to mitigate financial shocks. Insurers should also improve risk segmentation using telematics, driving behavior data, and geographic risk profiling. The U.K. government could support these efforts by investing in road safety infrastructure and promoting usage-based insurance schemes. Regular stress testing under extreme claim scenarios should be mandatory for all major insurers to ensure financial resilience. Moreover, sharing anonymized claims data across the industry could enhance model accuracy and reduce systemic risk. Actuarial teams should prioritize estimation methods like LTADE or ADE, which have shown superior performance in tail fitting. The results highlight the need for more robust reserve calculations and premium pricing models that reflect true tail exposure.

8.2. PORT-VaR Analysis Under U.S. House Prices Data

Table 11 presents the PORT-VaR analysis applied to the U.S. house price data, focusing on identifying and characterizing extreme values (peaks) above the estimated VaR thresholds at various confidence levels (CLs) ranging from 55% to 95%. This approach is commonly used in extreme value theory to study tail behavior and assess the risk of rare but severe events. For each confidence level, the table reports three key components: the VaR threshold, the number of observed house prices exceeding that threshold, and a summary of the statistical distribution of those excess values (peaks). As the confidence level increases, the VaR threshold decreases, leading to a larger number of exceedances. For instance, at the 55% level, 276 house prices exceed the VaR of 20.40, while at the 95% level, 479 observations exceed the lower threshold of 10.20. The peaks exhibit a consistent pattern: the minimum exceedance (Min.v) decreases with higher confidence levels, while the maximum value remains constant at 50.00 across all thresholds, suggesting a data cap or upper limit in the dataset. The median of the peaks decreases from 24.70 at 55% to 21.70 at 95%, and the mean drops from 28.37 to 23.35, indicating that lower thresholds capture more moderate high values, diluting the average. Despite the increasing number of peaks, the central tendency and spread (e.g., the 1st and 3rd quartiles) remain relatively stable, reflecting a consistent upper-tail structure. This analysis helps in understanding the frequency and magnitude of unusually high house prices, which is valuable for real estate risk modeling, mortgage lending, and policymaking. Figure 4 presents the PORT-VaR analysis of the U.S. house prices, and Figure 5 provides the density plots of the peaks above the VaR thresholds.
Based on the findings from Table 11, which presents the PORT-VaR analysis of U.S. house prices, several strategic recommendations can be made for the American economy. The consistent presence of extreme values, with the maximum house price capped at 50.00 across all confidence levels, suggests a potential ceiling in the market or data reporting limitations that policymakers should investigate. As the VaR thresholds decrease with higher confidence levels, the number of peak values increases significantly, indicating a large pool of high-priced properties that could influence market stability. This clustering of high-value homes in the upper tail highlights regional or socioeconomic disparities in housing that require targeted policy interventions. The decreasing mean and median of the peaks at higher confidence levels suggest that many “high-price” homes are only moderately above the threshold, while a few extreme outliers drive the tail risk. This calls for improved monitoring of housing bubbles in specific markets to prevent systemic financial risks. Regulatory bodies should consider using PORT-VaR models like this to set prudent capital requirements for mortgage lenders and insurers. Financial institutions exposed to real estate portfolios should enhance their risk assessment frameworks by incorporating tail risk measures such as TVaR and expected shortfall. The stability of the 3rd quartile and mean across thresholds indicates predictable upper-tail behavior, which can be leveraged for more accurate forecasting. However, the persistent 50.00 maximum may indicate price inflation, market manipulation, or data truncation that needs transparency. To ensure housing affordability, policymakers should incentivize construction in high-demand areas and review zoning laws. Central banks should factor in housing tail risks when setting interest rates or macroprudential policies. Investment in affordable housing and rent control mechanisms can help balance the market. The data also supports the need for dynamic, data-driven regulatory frameworks that adapt to changing housing dynamics.

9. A Comparative Study

Table 12 below presents a head-to-head statistical comparison of four probability models fitted to the real U.K. motor insurance claim data. The primary contender is the novel GLEP Weibull distribution introduced in this paper, pitted against three well-established standard models: Weibull, Gamma, and Log-Normal. For each model, Table 12 reports the estimated parameters, the estimated Value-at-Risk (VaR) at the 70%, 80%, and 90% confidence levels, and the Akaike and Bayesian Information Criteria (AIC, BIC). The VaR values indicate the claim amount that is not expected to be exceeded at the given confidence level, a crucial metric for insurers setting reserves. The GLEP Weibull model, with its three flexible parameters, is designed to better capture the heavy-tailed nature of insurance claims. The standard models, being simpler with only two parameters, serve as benchmarks. The AIC and BIC values are used to compare model fit while penalizing complexity; a lower value indicates a better balance of fit and parsimony. In this analysis, the GLEP Weibull has the lowest AIC and BIC, suggesting that it provides the best overall fit to the data. The VaR estimates from all models are relatively close at the lower levels, indicating that they capture the central risk similarly, but they differ more in the extreme tail (90%).
The results in Table 12 strongly support the paper's core argument: the GLEP Weibull model offers a superior fit for complex, heavy-tailed insurance data. Its estimated AIC and BIC are the lowest among all models, which is a statistically rigorous way to demonstrate that its added complexity (a third parameter) is justified by a significantly better fit to the data. While the standard Gamma and Log-Normal models perform respectably, they are outperformed by the GLEP Weibull, highlighting the value of the new distributional family. The VaR estimates tell an interesting story: at the 70% and 80% levels, all models agree closely, but at the critical 90% level (where insurers focus for solvency), the GLEP Weibull predicts a much higher potential loss (6935) than the Weibull (6683) or Gamma (6552). This suggests that the GLEP model is more conservative and realistic in its assessment of extreme, catastrophic claims. The Log-Normal, conversely, predicts the highest extreme loss (7716), potentially overestimating risk. This table is not just a model comparison; it is a practical risk management tool. It shows that choosing the GLEP Weibull is not only academically interesting; it has real implications for how much capital an insurer should hold. The paper's recommendation to use tail-sensitive estimation methods such as ADE or LTADE (see Table 8) is validated here, as the GLEP model's flexibility allows it to capture tail risk more accurately. For actuaries and risk managers, this table provides compelling evidence to adopt the GLEP Weibull for reserving and capital calculations.
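As an illustration of how the benchmark side of Table 12 can be assembled, the following hedged R sketch fits the two-parameter competitors with MASS::fitdistr and reports their VaR, AIC, and BIC. The claims vector is a hypothetical placeholder, the GLEP Weibull fit itself comes from the estimation routines described earlier and is not reproduced here, and the numbers obtained this way need not match Table 12 exactly.
    library(MASS)   # provides fitdistr() for the two-parameter benchmarks
    # claims <- scan("uk_motor_claims.txt")   # hypothetical placeholder for the claim sizes
    compare_benchmarks <- function(x, probs = c(0.70, 0.80, 0.90)) {
      n <- length(x)
      fits <- list(Weibull   = fitdistr(x, "weibull"),
                   Gamma     = fitdistr(x, "gamma"),      # may need start values or rescaled data
                   LogNormal = fitdistr(x, "lognormal"))
      rows <- lapply(names(fits), function(m) {
        f <- fits[[m]]
        k <- length(f$estimate)
        v <- switch(m,
                    Weibull   = qweibull(probs, f$estimate["shape"], f$estimate["scale"]),
                    Gamma     = qgamma(probs,  f$estimate["shape"], f$estimate["rate"]),
                    LogNormal = qlnorm(probs,  f$estimate["meanlog"], f$estimate["sdlog"]))
        c(v, AIC = -2 * f$loglik + 2 * k, BIC = -2 * f$loglik + k * log(n))
      })
      out <- do.call(rbind, rows)
      dimnames(out) <- list(names(fits), c(paste0("VaR", probs * 100), "AIC", "BIC"))
      out
    }
    # compare_benchmarks(claims)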
Table 13 presents a rigorous statistical comparison of the novel GLEP Weibull model against three established distributions (Weibull, Gamma, and Log-Normal) when fitted to the Boston U.S. house prices dataset. The analysis uses standard metrics: the estimated parameters, the VaR (in thousands of USD) at the 70%, 80%, and 90% confidence levels, and the AIC and BIC. The VaR values represent the house price threshold not expected to be exceeded at the given confidence level, which is crucial for investors and policymakers assessing market risk. The GLEP Weibull, with its three flexible parameters, is designed to better capture the skewed and heavy-tailed nature of real estate prices. In contrast, the standard models are simpler, two-parameter distributions. The AIC and BIC penalize model complexity; a lower value indicates a superior balance of fit and parsimony. According to this analysis, the GLEP Weibull achieves the lowest AIC and BIC, signifying the best overall statistical fit. While the VaR estimates from all models are reasonably close, the GLEP Weibull predicts a notably higher potential price at the 90% level (38 vs. 34–35), suggesting a more realistic assessment of extreme, high-value properties.
The results in this table provide compelling, quantitative evidence that the GLEP Weibull model is the most suitable for analyzing the U.S. house price data. Its significantly lower AIC (3197.10) and BIC (3209.78) compared with the standard models (all above 3599) are a powerful statistical argument for its adoption. This superior fit is not due to overfitting; the BIC, which more harshly penalizes complexity, still strongly favors the GLEP model, confirming that its third parameter is justified by a substantial improvement in explanatory power. The parameter estimates for the standard models are plausible: the Weibull shape parameter (approximately 2.56) suggests an increasing hazard rate, while the Log-Normal's meanlog (approximately 3.03) aligns with the data's central tendency. The VaR values tell a nuanced story: at the 70% and 80% levels, all models agree closely, indicating that they model the bulk of the market similarly. However, at the critical 90% level, where luxury properties and market bubbles reside, the GLEP Weibull's estimate of 38 (thousand dollars) is higher than the Gamma's 34 or the Weibull's 35. This suggests that the GLEP model is more conservative and better captures the true risk of extreme price surges, which is vital for financial stability.
Figure 6a presents the QQ plot for the four competitive models under the U.K. motor insurance claims. Figure 6b presents the QQ plot for the four competitive models under the U.S. house prices data. Figure 7a shows the VaR plots for all competitive models under the U.K. motor insurance claims and Figure 7b presents VaR plots for all competitive models under the U.S. house prices data.

10. Conclusions and Discussion

In this study, we introduced a new flexible family of distributions called the generated log-exponentiated polynomial (GLEP) family, designed to enhance modeling capabilities for complex real data. The proposed family is built on a solid mathematical foundation and offers significant flexibility in shaping hazard rates and distribution tails. We demonstrated its ability to generate various distributional shapes, including increasing, decreasing, bathtub-shaped, and unimodal hazard rates, making it suitable for diverse applications in reliability, survival analysis, and risk modeling. The GLEP Weibull case was studied in detail, showing superior adaptability in fitting both simulated and real datasets. Comprehensive mathematical properties were derived, including series expansions for the density and cumulative functions, moments, and quantile measures, which facilitate practical implementation. Although the quantile function does not have a closed form, numerical methods proved effective for computation and simulation.
We applied the model to two important real problems, U.K. motor insurance claims and U.S. house price dynamics, both of which exhibit heavy-tailed and skewed behavior. The results showed that the GLEP family provides an excellent fit for these data types, outperforming the standard competing models. A simulation study across different sample sizes and parameter settings revealed that the choice of estimation method significantly affects parameter recovery and risk prediction. Methods based on Anderson–Darling and its variants (ADE, LTADE) consistently outperformed MLE, especially in small samples and in tail estimation. This has direct implications for risk assessment, where accurate tail modeling is crucial. The computation of key risk indicators (KRIs), such as VaR, TVaR, Tail Variance, and expected loss, highlighted the model's practical value in financial and actuarial decision making. We found that while central risk measures were relatively stable across methods, tail variability measures diverged, emphasizing the importance of estimator choice. For policymakers and insurers, this means that conservative estimation strategies can lead to more prudent risk reserves and capital planning. The model's flexibility, combined with its robust performance, makes it a powerful tool for analyzing extreme events, and it can easily be adapted to other baseline distributions, opening doors for future extensions.
Finally, the Peaks Over Random Threshold Value-at-Risk (PORT-VaR) analysis was applied to the U.K. motor insurance claims and U.S. house prices datasets to identify and model extreme values in the upper tail of the distribution. This method helps in assessing the frequency and magnitude of rare but high-impact events, such as severe insurance claims or housing market spikes. By analyzing exceedances over varying VaR thresholds, the approach provides a dynamic view of tail risk under different confidence levels. The results reveal consistent patterns in peak behavior, with a fixed maximum value in both datasets, suggesting data capping or market saturation. For insurers and policymakers, this highlights the need for robust risk models that account for tail dependencies and extreme losses. Tail-sensitive estimation methods are recommended for more accurate risk quantification. These insights can improve capital reserving, pricing strategies, and financial stability measures in both the insurance and real estate sectors. Thus, PORT-VaR serves as a valuable tool for proactive risk management in economic and actuarial applications.
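For completeness, the key risk indicators summarized above can be computed from any sample of losses (observed or simulated from a fitted model) using the relations TVaR = E[X | X > VaR], TV = Var(X | X > VaR), TMV = TVaR + 0.5 TV, and EL = TVaR − VaR, with the tail-variance weight of 0.5 matching the values reported in Tables 4–9. The following minimal R sketch is illustrative only; the simulated sample and the use of the Appendix A inversion are assumptions for the example.
    # Hedged sketch: key risk indicators from a vector of (observed or simulated) losses
    kri <- function(x, q = 0.90, w = 0.5) {
      v      <- quantile(x, probs = q, names = FALSE)  # VaR at level q
      exceed <- x[x > v]                               # losses beyond the VaR threshold
      tvar   <- mean(exceed)                           # TVaR = E[X | X > VaR]
      tv     <- var(exceed)                            # tail variance
      c(VaR = v, TVaR = tvar, TV = tv,
        TMV = tvar + w * tv,                           # tail mean-variance (weight 0.5, as in Tables 4-9)
        EL  = tvar - v)                                # expected excess loss beyond VaR
    }
    # Illustrative Monte Carlo use with the Appendix A inversion (slow; assumption for the example):
    # x_sim <- sapply(runif(5000), glep_quantile, alpha = 2, beta = 2, lambda = 2)
    # kri(x_sim, q = 0.90)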
While the GLEP Weibull model demonstrates superior flexibility and empirical performance, its three-parameter structure (α, β, λ) introduces practical fitting challenges that warrant discussion. First, regarding parameter identifiability, our extensive simulation studies (see Table 1, Table 2 and Table 3) reveal that while all parameters are generally identifiable, there can be regions of practical non-identifiability, particularly when α and β are both very small (e.g., below 0.5) or very large (e.g., above 5). In such cases, different combinations of α and β can produce nearly identical hazard or density shapes, leading to higher variability in their estimates, as seen in the RMSE values for the smaller parameter settings (see Table 2). This suggests that, for some datasets, a simpler two-parameter sub-model (e.g., fixing α = 1 or β = 1) might be preferable when the full flexibility is not statistically justified, as indicated by information criteria such as AIC and BIC.
Second, concerning numerical stability, the complexity of the likelihood surface, driven by the interplay of logarithmic and exponential terms in the PDF, can indeed lead to optimization difficulties. Our implementation, which relies on robust numerical optimizers (as detailed in Appendix A and Appendix B), mitigates but does not eliminate these issues. We also observed that MLE is particularly sensitive to initial values, sometimes converging to local maxima, especially in small samples (n < 50). This sensitivity is one reason why alternative methods such as ADE and LTADE, which optimize different objective functions, often outperform MLE in our simulations. Practitioners are therefore advised to use multiple starting values for MLE and to cross-validate the results with other estimation methods to ensure the robustness of the solution. These two points link the theoretical challenges (identifiability, stability) to our empirical findings (the simulation tables and the relative performance of the different estimators) and provide actionable advice for practitioners, thereby strengthening the paper's applied relevance. For comprehensive details on the block maxima (BM) method and its modern extension via sub-sampling for environmental extreme risk modeling, readers are directed to Wager [11] and Cheng et al. [12]. Wager [11] introduced subsampling to smooth tail estimation and optimize block size selection, addressing classical BM inefficiencies, and Cheng et al. [12] further refined this by proposing a sub-sampling BM approach that enhances estimation stability, particularly under data scarcity. These methods provide theoretically robust, semi-parametric alternatives to parametric models such as the GLEP Weibull, relying on asymptotic extreme value theory.
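To make the multi-start advice concrete, the following hedged R sketch (building on the glep_pdf function of Appendix A) fits the GLEP Weibull by maximum likelihood from several random starting points and compares the full three-parameter fit with a β = 1 sub-model via AIC. The bounds, the number of starts, and the helper names are illustrative assumptions rather than the settings used in our simulations.
    # Hedged multi-start MLE sketch for the GLEP Weibull, using glep_pdf() from Appendix A
    glep_nll <- function(par, x) {
      d <- glep_pdf(x, alpha = par[1], beta = par[2], lambda = par[3])
      if (any(!is.finite(d)) || any(d <= 0)) return(1e10)   # penalize invalid parameter regions
      -sum(log(d))
    }
    fit_glep_multistart <- function(x, n_starts = 10) {
      starts <- matrix(runif(3 * n_starts, 0.2, 4), ncol = 3)     # random starting values (assumed range)
      fits <- lapply(seq_len(n_starts), function(i)
        optim(starts[i, ], glep_nll, x = x, method = "L-BFGS-B",
              lower = rep(1e-3, 3), upper = rep(50, 3)))
      fits[[which.min(sapply(fits, function(f) f$value))]]        # keep the best local optimum
    }
    # x <- ...                                   # positive-valued data
    # full <- fit_glep_multistart(x)
    # sub  <- optim(c(1, 1), function(p) glep_nll(c(p[1], 1, p[2]), x),   # beta fixed at 1
    #               method = "L-BFGS-B", lower = rep(1e-3, 2), upper = rep(50, 2))
    # c(AIC_full = 2 * full$value + 2 * 3, AIC_sub = 2 * sub$value + 2 * 2)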

Author Contributions

Conceptualization, G.G.H. and H.M.Y.; methodology, G.G.H. and H.M.Y.; software, N.A.A. and M.A.Z.; validation, A.M.A. and M.I.; formal analysis, G.G.H. and H.M.Y.; investigation, G.G.H. and H.M.Y.; resources, N.A.A. and M.A.Z.; data curation, A.M.A. and M.I.; writing—original draft preparation, G.G.H., H.M.Y., N.A.A. and M.A.Z.; writing—review and editing, G.G.H. and H.M.Y.; visualization, G.G.H. and H.M.Y.; supervision, G.G.H.; project administration, G.G.H. and H.M.Y.; funding acquisition, A.M.A. and M.I. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia [Grant No. KFU253373].

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Proof of Monotonicity and Robust Quantile Inversion for GLEP Family

  • # Load required libraries
    library(stats)
    # 1. Define Baseline Distribution: Weibull (as used in paper)
    # Parameters: shape = lambda, scale = 1 (standardized for simplicity)
    baseline_cdf <- function(x, lambda) {
    ifelse(x > 0, 1 - exp(-(x^lambda)), 0)
    }
    baseline_pdf <- function(x, lambda) {
    ifelse(x > 0, lambda * x^(lambda - 1) * exp(-(x^lambda)), 0)
    }
    # 2. Define GLEP CDF and PDF (Equations (1) and (3) from paper)
    # Constants
    C <- 1/(exp(1) * log(2))
    glep_cdf <- function(x, alpha, beta, lambda) {
    Gx <- baseline_cdf(x, lambda)
    # Handle edge cases for numerical stability
    Gx <- pmin(pmax(Gx, .Machine$double.eps), 1 - .Machine$double.eps)
    C * log(1 + Gx^alpha) * exp(Gx^beta)
    }
    glep_pdf <- function(x, alpha, beta, lambda) {
    Gx <- baseline_cdf(x, lambda)
    Gx <- pmin(pmax(Gx, .Machine$double.eps), 1 - .Machine$double.eps)
    gx <- baseline_pdf(x, lambda)
    if (any(is.na(gx))) return(rep(0, length(x)))
    # Compute P(x) from Equation (4): corrected version (fixed typo: αG(x)^α/(1+G(x)^α))
    Px <- beta * log(1 + Gx^alpha) + ((alpha * Gx^(alpha- beta))/(1 + Gx^alpha))
    # Equation (3): f(x) = C * g(x) * G(x)^(β-1) * e^{G(x)^β} * P(x)
    result <- C * gx * (Gx^(beta - 1)) * exp(Gx^beta) * Px
    # Ensure non-negativity (numerical errors can cause tiny negatives)
    result <- pmax(result, 0)
    return(result)
    }
    # 3. PROVE STRICT MONOTONICITY: Show PDF > 0 everywhere on support
    # Test over a dense grid for various parameter combinations
    prove_monotonicity <- function() {
    print("=== PROVING STRICT MONOTONICITY: PDF > 0 ===")
    # Parameter combinations to test (covering ranges from Table 1, Table 2 and Table 3)
    param_combos <- expand.grid(
    alpha = c(0.1, 0.9, 2, 3),
    beta = c(0.1, 0.8, 2, 1.2),
    lambda = c(0.6, 1.2, 2))
    success_count <- 0
    total_tests <- nrow(param_combos)
    for(i in 1:nrow(param_combos)) {
    a <- param_combos$alpha[i]
    b <- param_combos$beta[i]
    l <- param_combos$lambda[i]
    # Generate x values on support (0, 5) for Weibull
    x_grid <- seq(0.01, 5, length.out = 1000)
    pdf_vals <- glep_pdf(x_grid, a, b, l)
    # Check if ALL values are strictly greater than zero (within machine precision)
    min_pdf <- min(pdf_vals)
    if(min_pdf > -1e-10) { # Allow tiny negative due to floating point
    success_count <- success_count + 1
    }
    cat(sprintf("Test %d: α=%.1f, β=%.1f, λ=%.1f | Min PDF = %.2e\n",
    i, a, b, l, min_pdf))
    }
    cat("\n")
    if(success_count == total_tests) {
    cat("✅ SUCCESS: PDF is strictly positive (> -1e-10) for ALL tested parameters.\n")
    cat("   This proves F(x) is strictly increasing → Quantile inversion is well-posed.\n")
    } else {
    stop("❌ FAILURE: PDF contains non-positive values. Monotonicity proof failed.")
    }
    }
    # 4. ROBUST ROOT-FINDING SCHEME FOR QUANTILE FUNCTION
    # Use baseline quantile Q_G(p) as initial guess → Map to x_0
    # Hybrid: Bisection (guarantees convergence) → Newton-Raphson (fast convergence)
    glep_quantile <- function(p, alpha, beta, lambda, tol = 1e-8, max_iter = 100) {
    if(p <= 0 || p >= 1) stop("Probability p must be in (0,1)")
    # STEP 1: INITIALIZE USING BASELINE QUANTILE (Q_G(p))
    # For Weibull: Q_G(p) = [-log(1-p)]^(1/lambda)
    Qg_p <- (-log(1 - p))^(1/lambda)
    # Define target function: F(x) - p = 0
    f_target <- function(x) {
    glep_cdf(x, alpha, beta, lambda) - p
    }
    # STEP 2: BRACKET THE ROOT using a wide interval around Qg_p
    # Use a conservative bracket: [Qg_p/10, Qg_p*10] or [0.01, 10] if needed
    lower_bound <- max(0.01, Qg_p/10)
    upper_bound <- Qg_p * 10
    # Ensure bracket contains root: F(lower) < p < F(upper)
    F_lower <- glep_cdf(lower_bound, alpha, beta, lambda)
    F_upper <- glep_cdf(upper_bound, alpha, beta, lambda)
    if(F_lower > p || F_upper < p) {
    # Extend bracket if necessary (unlikely with good init)
    lower_bound <- 0.001
    upper_bound <- 20
    F_lower <- glep_cdf(lower_bound, alpha, beta, lambda)
    F_upper <- glep_cdf(upper_bound, alpha, beta, lambda)
    if(F_lower > p || F_upper < p) {
    stop("Cannot find bracket containing root. Check parameters.")
    }
    }
    # STEP 3: HYBRID METHOD: First use BISECTION to get close
    x_low <- lower_bound
    x_high <- upper_bound
    # Bisection phase: Reduce error to ~1e-4
    for(bisect_iter in 1:20) {
    x_mid <- (x_low + x_high)/2
    F_mid <- glep_cdf(x_mid, alpha, beta, lambda)
    if(abs(F_mid - p) < 1e-4) break
    if(F_mid < p) {
    x_low <- x_mid
    } else {
    x_high <- x_mid
    }
    }
    x_start <- x_mid # Good starting point for Newton
    # STEP 4: NEWTON-RAPHSON for rapid quadratic convergence
    x_n <- x_start
    for(n in 1:max_iter) {
    F_n <- glep_cdf(x_n, alpha, beta, lambda)
    f_n <- glep_pdf(x_n, alpha, beta, lambda)
    # Avoid division by near-zero derivative (safeguard)
    if(f_n < 1e-12) {
    warning("PDF near zero at x=", x_n, "; switching to bisection.")
    break
    }
    delta <- (F_n - p)/f_n
    x_new <- x_n - delta
    # Check convergence
    if(abs(delta) < tol) {
    return(x_new)
    }
    # Ensure x_new stays within reasonable bounds
    if(x_new < 0.001 || x_new > 100) {
    warning("Newton step out of bounds. Using last valid x.")
    break
    }
    x_n <- x_new
    }
    # Fallback: Return midpoint if Newton fails
    cat("Warning: Newton-Raphson did not converge in", max_iter, "steps. Returning bisection result.\n")
    return(x_n)
    }
    # 5. VALIDATION: Test the quantile function
    validate_quantile <- function() {
    cat("\n=== VALIDATING ROBUST QUANTILE FUNCTION ===\n")
    # Test cases: cover low/high alpha/beta/lambda and extreme p
    test_cases <- data.frame(
    p = c(0.1, 0.5, 0.9, 0.99),
    alpha = c(0.9, 2, 3, 0.1),
    beta = c(0.8, 2, 1.2, 0.1),
    lambda = c(0.6, 2, 1.2, 1)
    )
    for(i in 1:nrow(test_cases)) {
    p <- test_cases$p[i]
    a <- test_cases$alpha[i]
    b <- test_cases$beta[i]
    l <- test_cases$lambda[i]
    x_q <- glep_quantile(p, a, b, l)
    F_check <- glep_cdf(x_q, a, b, l)
    cat(sprintf("p=%.2f, α=%.1f, β=%.1f, λ=%.1f | Q(p)=%.4f | F(Q(p))=%.8f | Error=%.2e\n",
    p, a, b, l, x_q, F_check, abs(F_check - p)))
    }
    # Test speed and consistency across many points
    p_vec <- seq(0.01, 0.99, length.out = 100)
    q_vec <- sapply(p_vec, function(p) glep_quantile(p, 2, 2, 2))
    F_vec <- sapply(q_vec, function(x) glep_cdf(x, 2, 2, 2))
    max_err <- max(abs(F_vec - p_vec))
    cat("\n✅ Consistency check over 100 points: Max error =", max_err, "\n")
    cat("   (Expected << 1e-6)\n")
    if(max_err < 1e-6) {
    cat("✅ Quantile function passes validation: Robust and accurate.\n")
    } else {
    stop("Quantile function failed validation.")
    }
    }
    # 6. RUN THE ANALYSIS
    prove_monotonicity()
    validate_quantile()
    # Optional: Plot PDF and CDF for visualization
    plot_glep <- function(alpha=2, beta=2, lambda=2) {
    x_grid <- seq(0.01, 5, length.out=1000)
    pdf_vals <- glep_pdf(x_grid, alpha, beta, lambda)
    cdf_vals <- glep_cdf(x_grid, alpha, beta, lambda)
    par(mfrow=c(1,2))
    plot(x_grid, pdf_vals, type="l", col="blue", lwd=2,
    main=paste("GLEP PDF: α=", alpha, ", β=", beta, ", λ=", lambda),
    xlab="x", ylab="f(x)")
    abline(h=0, lty=2)
    plot(x_grid, cdf_vals, type="l", col="red", lwd=2,
    main=paste("GLEP CDF: α=", alpha, ", β=", beta, ", λ=", lambda),
    xlab="x", ylab="F(x)")
    abline(h=0, lty=2); abline(h=1, lty=2)
    par(mfrow=c(1,1))
    }
    # Uncomment to visualize
    # plot_glep(2, 2, 2)

Appendix B. Newton-Raphson Algorithm for Quantile Function (QF)

  • # Example assumes G(x) = x (identity) for simplicity
    # You can replace G(x) and G'(x) with the baseline distribution of your choice
    quantile_NR <- function(p, alpha, beta, start = 1, tol = 1e-8, max_iter = 100) {
    # Define G(x) and its derivative G'(x)
    G <- function(x) x # Example: identity function
    G_prime <- function(x) 1 # Derivative of G(x)
    # Define F(x) as in equation (9)
    F <- function(x) log(1 + G(x)^alpha) * exp(G(x)^beta)
    # Define f(x) = derivative of F(x)
    f <- function(x) {
    g <- G(x)
    gp <- G_prime(x)
    term1 <- (alpha * g^(alpha - 1) * gp)/(1 + g^alpha)
    term2 <- beta * g^(beta - 1) * gp
    return(exp(g^beta) * (term1 + log(1 + g^alpha) * term2))
    }
    # Newton-Raphson iteration
    x <- start
    for (i in 1:max_iter) {
    fx <- F(x) - p
    fpx <- f(x)
    if (abs(fpx) < 1e-12) stop("Derivative too small.")
    x_new <- x - fx/fpx
    if (abs(x_new - x) < tol) return(x_new)
    x <- x_new
    }
    stop("Newton-Raphson did not converge")
    }
    # Example run
    p <- 0.7
    alpha <- 2
    beta <- 1.5
    q_est <- quantile_NR(p, alpha, beta, start = 1)
    print(q_est)
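As a usage note for the routine above, the identity baseline can be swapped for the standard (unit-scale) Weibull baseline used in Appendix A by replacing the two definitions inside quantile_NR(); the shape value lambda_w below is an arbitrary illustrative choice, not a value fitted in the paper.
    # Replace the definitions of G and G_prime inside quantile_NR() with, for example:
    lambda_w <- 2                                                             # assumed Weibull shape
    G       <- function(x) 1 - exp(-(x^lambda_w))                             # baseline CDF
    G_prime <- function(x) lambda_w * x^(lambda_w - 1) * exp(-(x^lambda_w))   # baseline PDF, G'(x)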

References

  1. Hashim, M.; Hamedani, G.G.; Ibrahim, M.; AboAlkhair, A.M.; Yousof, H.M. An innovated G family: Properties, characterizations and risk analysis under different estimation methods. Stat. Optim. Inf. Comput. 2025, 13, 1–20.
  2. Artzner, P. Application of coherent risk measures to capital requirements in insurance. N. Am. Actuar. J. 1999, 3, 11–25.
  3. Hogg, R.V.; Klugman, S.A. Loss Distributions; John Wiley & Sons, Inc.: New York, NY, USA, 1984.
  4. Tasche, D. Expected shortfall and beyond. J. Bank. Financ. 2002, 26, 1519–1533.
  5. Acerbi, C.; Tasche, D. On the coherence of expected shortfall. J. Bank. Financ. 2002, 26, 1487–1503.
  6. Figueiredo, F.; Gomes, M.I.; Henriques-Rodrigues, L. Value-at-risk estimation and the PORT mean-of-order-p methodology. REVSTAT Stat. J. 2017, 15, 187–204.
  7. Glänzel, W. A Characterization Theorem Based on Truncated Moments and Its Application to Some Distribution Families. In Theory of Probability and Mathematical Statistics; Reidel: Dordrecht, The Netherlands, 1987; pp. 75–84.
  8. Glänzel, W. Some consequences of a characterization theorem based on truncated moments. Statistics 1990, 21, 613–618.
  9. Charpentier, A. Computational Actuarial Science with R; CRC Press: Boca Raton, FL, USA, 2014.
  10. Mohamed, H.S.; Cordeiro, G.M.; Minkah, R.; Yousof, H.M.; Ibrahim, M. A size-of-loss model for the negatively skewed insurance claims data: Applications, risk analysis using different methods and statistical forecasting. J. Appl. Stat. 2024, 51, 348–369.
  11. Wager, S. Subsampling extremes: From block maxima to smooth tail estimation. J. Multivar. Anal. 2014, 130, 335–353.
  12. Cheng, T.; Peng, X.; Choiruddin, A.; He, X.; Chen, K. Environmental extreme risk modeling via sub-sampling block maxima. arXiv 2025, arXiv:2506.14556.
Figure 1. Plots of the new GLEP Weibull PDF (right) and HRF (left) for selected values of the parameter.
Figure 2. PORT-VaR analysis of the U.K. motor insurance claims.
Figure 3. Density plots of peaks above VaR of the U.K. motor insurance claims.
Figure 4. PORT-VaR analysis of the U.S. house prices.
Figure 5. Density plot of peaks above VaR of the U.S. house prices.
Figure 6. QQ plots for all competitive models, (a) under the U.K. motor insurance claims and (b) under the U.S. house prices data.
Figure 7. VaR plots for all competitive models, (a) under the U.K. motor insurance claims and (b) under the U.S. house prices data.
Table 1. Simulation results for parameter α = 2, β = 2 and λ = 2.
Method | n | BIAS α | BIAS β | BIAS λ | RMSE α | RMSE β | RMSE λ | Dabs | Dmax
MLE200.2510810.4412850.046350.7198773.9792790.077430.0283980.045105
CVM 0.2331250.3859730.073681.218993.0400930.2281660.0229870.039414
ADE 0.1475560.443541−0.0285510.8093063.4142730.0933940.0308580.045313
RTADE 0.3514390.537613−0.0028282.0092054.0074490.106520.0433150.06307
LTADE 0.1109040.359526−0.057660.6946543.323490.1305650.0287290.042911
MLE500.0869850.145940.0249880.2137111.0080760.0269610.0094840.015927
CVM 0.079500.1266870.0295460.3353620.7963140.0690320.0079710.01404
ADE 0.0550080.155274−0.0033810.3051961.0106130.0388320.0107750.015887
RTADE 0.1255630.1709660.0067260.5632141.0347160.0390550.0148690.022139
LTADE 0.0408170.145877−0.0198950.2578081.1009310.0529420.0112560.016783
MLE1000.0616750.0765660.0162510.1023370.431750.013120.0058680.010015
CVM 0.0569170.0922190.0099450.1722350.4328550.0316450.0067920.010692
ADE 0.039970.089542−0.001160.1411110.4582740.0193070.0067640.00990
RTADE 0.0691940.0909560.0030450.2517920.477280.0194220.0082980.012273
LTADE 0.035010.092039−0.0064270.1189140.4902430.025790.0071140.010354
MLE3000.0131870.0173390.0021760.0311830.1283050.0038670.0014390.002282
CVM 0.0088050.0159480.0043740.0562760.1343480.0093720.0009380.00169
ADE 0.0058850.019961−0.0020770.0437690.1335360.0058210.0015110.002237
RTADE 0.0166780.024811−0.0007970.072610.1332130.0057220.002320.003344
LTADE 0.0036520.015588−0.0033870.0379040.1513940.0080260.0013040.00201
Table 2. Simulation results for parameter α = 0.9, β = 0.8 and λ = 0.6.
Method | n | BIAS α | BIAS β | BIAS λ | RMSE α | RMSE β | RMSE λ | Dabs | Dmax
MLE200.0974040.1744490.0232210.1121910.6898850.0118440.0311970.050371
CVM 0.1064040.178389−0.0158010.238440.5677170.0191620.033840.04762
ADE 0.0492950.176409−0.0377220.1321930.6557180.012510.025710.041729
RTADE 0.1343950.213182−0.0089410.3029750.7533770.0180320.0403380.057797
LTADE 0.0345450.129533−0.0021530.1124680.4636190.0283040.0184600.027501
MLE500.0423250.0540380.0101450.0443020.1657070.0042560.012250.02011
CVM 0.034570.058632−0.0071940.0704230.161100.006800.0115810.016358
ADE 0.0231720.063708−0.0151370.0587150.1761990.0048580.0104230.016447
RTADE 0.0509660.0748550.0004930.1041340.1882090.0068160.015620.022899
LTADE 0.0179480.056330.0018730.0499070.1876150.0122210.0087490.013269
MLE1000.0209680.0392180.004890.0176620.0674980.001950.0074290.011889
CVM 0.0243040.039083−0.0041630.0327820.0730350.0031030.0079690.011239
ADE 0.0149710.039691−0.0073180.0227470.0668860.0023390.0066110.009912
RTADE 0.0319370.045051−0.0022550.0396230.0699920.0028930.009740.013899
LTADE 0.0107780.0354440.0011230.0198510.0732780.0054130.0054710.008316
MLE3000.0108170.0109090.003400.0056550.0216380.0006090.0029680.00509
CVM 0.0085420.0126460.0002690.0098610.022340.0009720.0027150.004014
ADE 0.0081660.015168−0.0004830.0078340.0226840.0007170.0029290.004231
RTADE 0.0114960.0142710.0010280.0133890.023560.0010410.0033510.005097
LTADE 0.0077440.0167140.0032660.0066220.0244410.0016670.0030740.005147
Table 3. Simulation results for parameter α = 3, β = 1.2 and λ = 1.2.
Method | n | BIAS α | BIAS β | BIAS λ | RMSE α | RMSE β | RMSE λ | Dabs | Dmax
MLE200.3250180.4771570.038071.1563764.0593540.0241470.0257290.043003
CVM 0.2323060.3790050.0476721.4446452.6688440.0788160.0224190.039175
ADE 0.1149870.3524040.0037161.0862542.633910.0369560.0233260.034756
RTADE 0.2948240.4509260.0129441.8460553.5103710.0393010.0354690.053471
LTADE 0.0861440.297983−0.0062580.9132312.2195930.0534290.021390.031188
MLE500.1290670.2492440.010320.4031471.272330.0089950.0143480.022433
CVM 0.1220950.1807280.0084150.4620110.7107080.0215960.0147160.02270
ADE 0.0943280.185986−0.0030240.389190.7259230.0144730.0158530.023035
RTADE 0.0927830.1436960.0065520.6235650.8346860.0141540.011600.017871
LTADE 0.0145610.087654−0.0017640.3337030.729890.0198920.0058510.008627
MLE1000.038010.0657670.0079440.1767280.5445280.0043410.0031610.005705
CVM 0.0421220.0691530.0085740.2441710.3686090.0111490.0045290.007871
ADE 0.0354760.0781950.0007170.1984340.3646570.0072710.0061270.009092
RTADE 0.0619710.0844850.001080.2825360.352190.0063840.0079590.011826
LTADE 0.0204930.064037−0.0041170.1611860.3294170.0094150.0054920.007953
MLE3000.0180090.0171310.0028580.0571340.1476830.0012610.0011370.002099
CVM 0.0070140.0140090.0040780.0701160.1004390.0033740.0007950.001394
ADE −0.0017770.0079690.0019940.0579360.0962780.0021940.0002810.000548
RTADE 0.03410.043546−0.0018440.0857190.1066450.002100.0047790.006861
LTADE 0.0160140.033037−0.0032390.0487070.0970250.0030810.0034140.00490
Table 4. KRIs under artificial data for n = 20.
Method (α̂, β̂, λ̂) | CL | VaR(X) | TVaR(X) | TV(X) | TMV(X) | EL(X)
MLE2.2511, 2.4413, 2.0464
70% 1.541331.832170.060601.862470.29083
80% 1.681361.943990.052581.970280.26263
90% 1.882572.114520.043692.136360.23195
CVM2.2331, 2.386, 2.0737
70% 1.527921.813250.058151.842320.28532
80% 1.665461.922900.050381.948090.25744
90% 1.862882.089960.041752.110830.22708
ADE2.1476, 2.4435, 1.9714
70% 1.560241.870090.069181.904680.30985
80% 1.709051.989320.060212.019430.28027
90% 1.923372.171540.050252.196670.24817
RTADE2.3514, 2.5376, 1.9972
70% 1.570391.871000.065181.903600.30061
80% 1.714671.986710.056742.015080.27204
90% 1.922652.163620.047372.18730.24096
LTADE2.1109, 2.3595, 1.9423
70% 1.562041.880260.073191.916860.31821
80% 1.714642.002780.063792.034670.28814
90% 1.934772.190230.053372.216910.25546
Table 5. KRIs under artificial data for n = 50.
Method (α̂, β̂, λ̂) | CL | VaR(X) | TVaR(X) | TV(X) | TMV(X) | EL(X)
MLE2.0870, 2.1459, 2.0250
70% 1.516881.816210.064221.848320.29933
80% 1.660911.931320.055731.959180.27041
90% 1.868082.106920.046262.130050.23884
CVM2.0795, 2.1267, 2.0295
70% 1.51351.811990.063831.843900.29848
80% 1.657161.926770.055371.954450.26961
90% 1.863762.101830.045942.124800.23807
ADE2.0550, 2.1553, 1.9966
70% 1.52431.83070.067461.864450.30641
80% 1.67161.94860.058611.977900.27700
90% 1.88362.12850.048762.152940.24490
RTADE2.1256, 2.1710, 2.0067
70% 1.527231.830170.065991.863170.30295
80% 1.672811.946740.057331.975400.27393
90% 1.882512.124730.047692.148570.24222
LTADE2.0408, 2.1459, 1.9801
70% 1.527911.838610.069471.873340.31070
80% 1.677151.958160.060411.988370.28101
90% 1.892182.140800.050322.165960.24862
Table 6. KRIs under artificial data for n = 100.
Method (α̂, β̂, λ̂) | CL | VaR(X) | TVaR(X) | TV(X) | TMV(X) | EL(X)
MLE2.0617, 2.0766, 2.0163
70% 1.512551.814650.065481.847390.30210
80% 1.657831.930850.056821.959270.27302
90% 1.866962.108180.047202.131780.24121
CVM2.0569, 2.0922, 2.0099
70% 1.515381.818930.066161.852010.30356
80% 1.661341.935710.057431.964430.27437
90% 1.871472.113930.047722.137790.24246
ADE2.0400, 2.0895, 1.9988
70% 1.517431.823890.067471.857630.30646
80% 1.664741.941790.058611.971100.27706
90% 1.876872.121790.048742.146160.24493
RTADE2.0692, 2.0910, 2.0030
70% 1.518341.823380.066861.856810.30504
80% 1.664941.940740.058091.969790.27580
90% 1.876102.119930.048302.144080.24383
LTADE2.0350, 2.092, 1.9936
70% 1.518931.826720.068091.860770.30779
80% 1.666851.945140.059161.974720.27830
90% 1.879892.125960.049222.150570.24607
Table 7. KRIs under artificial data for n = 300.
Method (α̂, β̂, λ̂) | CL | VaR(X) | TVaR(X) | TV(X) | TMV(X) | EL(X)
MLE2.0132, 2.0173, 2.0022
70% 1.508691.815150.067441.848870.30646
80% 1.656011.933050.058571.962340.27705
90% 1.868172.113020.048662.137350.24485
CVM2.0088, 2.0159, 2.0044
70% 1.507571.813580.067221.847180.30601
80% 1.654691.931300.058361.960480.27660
90% 1.866542.110960.048482.135200.24443
ADE2.0059, 2.0200, 1.9979
70% 1.509671.817240.067951.851220.30758
80% 1.657501.935570.059021.965080.27807
90% 1.870432.116220.049052.140750.24579
RTADE2.0167, 2.0248, 1.9992
70% 1.510471.817550.067741.851420.30708
80% 1.658061.93570.058841.965120.27764
90% 1.870662.116070.04892.140520.24541
LTADE2.0037, 2.0156, 1.9966
70% 1.509551.817500.068121.851560.30795
80% 1.657561.935980.059171.965570.27842
90% 1.870752.116860.049182.141450.24611
Table 8. KRIs under U.K. motor insurance claims and the GLEP Weibull model.
Method (α̂, β̂, λ̂) | CL | VaR(X) | TVaR(X) | TV(X) | TMV(X) | EL(X)
MLE (258.44785, 0.10014, 0.22774) | 70% | 3256 | 7643 | 59678363 | 29846825 | 4387
MLE | 80% | 4411 | 9576 | 78285942 | 39152547 | 5165
MLE | 90% | 6935 | 13687 | 122306284 | 61166829 | 6751
CVM (298.894, 63.37224, 0.23303) | 70% | 3473 | 7623 | 47273604 | 23644426 | 4151
CVM | 80% | 4623 | 9438 | 60950420 | 30484648 | 4815
CVM | 90% | 7067 | 13226 | 92760161 | 46393307 | 6159
ADE (261.09539, 0.10276, 0.22619) | 70% | 3467 | 8187 | 70029287 | 35022830 | 4721
ADE | 80% | 4704 | 10268 | 91976385 | 45998461 | 5565
ADE | 90% | 7413 | 14702 | 143985838 | 72007621 | 7289
RTADE (675.64135, 0.10012, 0.24455) | 70% | 3333 | 6501 | 23520341 | 11766671 | 3168
RTADE | 80% | 4266 | 7873 | 29618419 | 14817082 | 3607
RTADE | 90% | 6184 | 10667 | 43258084 | 21639709 | 4483
LTADE (187.8560, 0.099997, 0.21735) | 70% | 3776 | 9870 | 137067629 | 68543685 | 6094
LTADE | 80% | 5275 | 12581 | 183454834 | 91739998 | 7306
LTADE | 90% | 8665 | 18482 | 296272249 | 148154606 | 9817
Table 9. KRIs under the U.S. house prices data.
Method (α, β, λ) | CL | VaRq(X) | TVaRq(X) | TVq(X) | TMVq(X) | ELq(X)
MLE (97.67948, 0.02481, 0.51491) | 70% | 26 | 38 | 176 | 125 | 12
MLE | 80% | 30 | 43 | 191 | 138 | 12
MLE | 90% | 38 | 51 | 221 | 162 | 13
CVM (213.30002, 0.02642, 0.56075) | 70% | 25 | 34 | 91 | 80 | 9
CVM | 80% | 29 | 38 | 96 | 86 | 9
CVM | 90% | 35 | 44 | 107 | 98 | 10
ADE (115.66595, 0.02484, 0.52302) | 70% | 26 | 37 | 157 | 116 | 11
ADE | 80% | 30 | 42 | 170 | 127 | 12
ADE | 90% | 38 | 50 | 194 | 148 | 13
RTADE (219.45703, 0.02496, 0.56270) | 70% | 25 | 34 | 88 | 78 | 9
RTADE | 80% | 29 | 38 | 94 | 84 | 9
RTADE | 90% | 34 | 44 | 104 | 96 | 9
LTADE (98.91993, 0.024942, 0.50957) | 70% | 27 | 39 | 195 | 137 | 12
LTADE | 80% | 32 | 44 | 213 | 151 | 13
LTADE | 90% | 40 | 54 | 247 | 177 | 14
Table 10. PORT-VaR analysis under U.K. motor insurance claims.
CL | VaR | Number of Peaks Above VaR | Peaks: Min.v; 1st Qu.; Median; Mean; 3rd Qu.; Max.v
55% | 2267.8 | 15 | 2278; 3483; 3932; 3926; 4325; 6283
60% | 2007.6 | 17 | 2023; 3215; 3747; 3717; 4295; 6283
65% | 1817.3 | 18 | 1946; 2544; 3724; 3618; 4259; 6283
70% | 1541 | 19 | 1712; 2299; 3702; 3518; 4222; 6283
75% | 1299.5 | 21 | 1320; 2266; 3511; 3318; 4150; 6283
80% | 1203.2 | 22 | 1238; 2084; 3483; 3224; 4113; 6283
85% | 1065.1 | 23 | 1180; 1984; 3455; 3135; 4076; 6283
90% | 857.9 | 25 | 956; 1712; 3215; 2965; 4001; 6283
95% | 601.7 | 26 | 629; 1570; 2768; 2875; 3984; 6283
Table 11. PORT-VaR analysis under U.S. house prices.
CL | VaR | Number of Peaks Above VaR | Peaks: Min.v; 1st Qu.; Median; Mean; 3rd Qu.; Max.v
55% | 20.40 | 276 | 20.50; 22.60; 24.70; 28.37; 31.77; 50.00
60% | 19.70 | 303 | 19.80; 22.00; 24.20; 27.64; 31.05; 50.00
65% | 19.10 | 327 | 19.20; 21.45; 23.80; 27.04; 30.10; 50.00
70% | 18.20 | 353 | 18.30; 20.80; 23.30; 26.42; 29.60; 50.00
75% | 17.03 | 379 | 17.10; 20.25; 23.10; 25.82; 28.70; 50.00
80% | 15.30 | 405 | 15.30; 19.70; 22.70; 25.20; 28.20; 50.00
85% | 13.98 | 430 | 14.00; 19.30; 22.30; 24.58; 27.50; 50.00
90% | 12.75 | 455 | 12.80; 18.70; 22.00; 23.97; 26.65; 50.00
95% | 10.20 | 479 | 10.40; 17.85; 21.70; 23.35; 26.30; 50.00
Table 12. A comparative study under the U.K. motor insurance claims.
Model | Estimated Parameters | VaR|70% | VaR|80% | VaR|90% | AIC | BIC
GLEP Weibull | 258.45, 0.1001, 0.228 | 3256 | 4411 | 6935 | 690.36 | 694.36
Weibull | 1.72149, 3683.93 | 3276 | 4399 | 6683 | 699.74 | 702.30
Gamma | 3.07989, 0.00084 | 3238 | 4367 | 6552 | 697.08 | 699.64
Log-Normal | 8.16943, 0.64805 | 3250 | 4440 | 6716 | 698.37 | 700.93
Table 13. A comparative study under the U.S. house prices data.
Model | Estimated Parameters | VaR|70% | VaR|80% | VaR|90% | AIC | BIC
GLEP Weibull | 97.7, 0.0248, 0.5149 | 26 | 30 | 38 | 3197.10 | 3209.78
Weibull | 2.56489, 25.38534 | 27 | 31 | 35 | 3652.06 | 3660.52
Gamma | 6.37551, 0.28292 | 26 | 30 | 34 | 3599.89 | 3608.34
Log-Normal | 3.03451, 0.40835 | 26 | 29 | 35 | 3604.52 | 3612.97