Article

Bayesian Inference for the Loss Models via Mixture Priors

Department of Mathematics, Towson University, Towson, MD 21252, USA
* Author to whom correspondence should be addressed.
Risks 2023, 11(9), 156; https://doi.org/10.3390/risks11090156
Submission received: 5 June 2023 / Revised: 16 August 2023 / Accepted: 18 August 2023 / Published: 31 August 2023

Abstract

Constructing an accurate model for insurance losses is a challenging task. Researchers have developed various methods to model insurance losses, such as composite models. Composite models combine two distributions: one for small losses with high frequencies and the other for large losses with low frequencies. The purpose of this article is to consider a mixture of prior distributions for the exponential–Pareto and inverse-gamma–Pareto composite models. The general formulas for the posterior distribution and the Bayes estimator of the support parameter $\theta$ are derived. It is shown that the posterior distribution is a mixture of the individual posterior distributions. Analytic results and Bayesian inference based on the proposed mixture prior distribution approach are provided. Simulation studies reveal that the Bayes estimator with a mixture distribution outperforms both the Bayes estimator without a mixture distribution and the ML estimator with regard to accuracy. Based on the proposed method, insurance losses from natural events, such as floods, from 2000 to 2019 in the USA are considered. As a measure of goodness of fit, the Bayes factor is used to choose the best-fitted model.

1. Introduction

Constructing an accurate loss model for insurance losses is one of the essential topics in actuarial science. Insurance industry data have unique properties: a high frequency of small losses and very few significant losses. A traditional distribution, such as the normal, cannot describe the skewness and fat-tailed properties of insurance data. Therefore, many researchers have explored other distributions to fit insurance loss data better. The class of composite distributions is one of them. A composite distribution combines a typical distribution, such as the exponential, inverse-gamma, Weibull, or log-normal, for the small losses with high frequencies, with the Pareto distribution for the extreme losses with low frequencies.
Klugman et al. (2012) provided a detailed discussion on modeling datasets in actuarial science. Teodorescu and Vernic (2006) considered the exponential–Pareto composite model and derived the maximum likelihood estimator for the support parameter $\theta$. Preda and Ciumara (2006) employed the composite Weibull–Pareto and log-normal–Pareto models to model insurance losses. These models have two parameters: the support parameter $\theta$ and the shape parameter $\alpha$. In that article, they developed algorithms to find and compare the maximum likelihood estimates of the two unknown parameters. Cooray and Cheng (2013) estimated the parameters of the log-normal–Pareto composite distribution using Bayesian methods with both Jeffreys and conjugate priors; they used MCMC methods rather than developing closed mathematical formulas. Scollnik and Sun (2012) developed several composite Weibull–Pareto models and suggested using them in different situations. Aminzadeh and Deng (2017) reconsidered the composite exponential–Pareto distribution and provided the Bayesian estimate of $\theta$ via an inverse-gamma prior distribution. Aminzadeh and Deng (2019) developed an inverse-gamma–Pareto composite distribution to model insurance losses and provided Bayesian inference based on a gamma prior distribution. Deng and Aminzadeh (2019) revisited the Weibull–Pareto composite model and derived Bayesian inference for it; both inverse-gamma (IG) and gamma priors were employed to find Bayes estimates of the support parameter $\theta$ and the shape parameter $\alpha$, and simulation studies confirmed that the Bayes estimates consistently outperform the MLEs in all cases. Bakar et al. (2015) developed several new composite models based on the Weibull distribution for heavy-tailed insurance loss data; these models were fitted to two real insurance loss datasets and their goodness-of-fit was tested.
Mixture distributions have applications in many fields, including insurance, actuarial science, and risk management. Klugman et al. (2012) discussed why mixture distributions have broad applications in actuarial science. Miljkovic and Grün (2016) used mixture distributions to model insurance losses; comparing mixture models with composite models for the Danish fire data, they pointed out that the mixture models fit better. Bhati et al. (2019) used a mixture of the Pareto and log-gamma distributions to model heavy-tailed losses.
Abdul Majid and Ibrahim (2021a) analyzed composite Pareto models for Malaysian household income data, with parameter estimation via numerical methods based on maximum pseudo-likelihood; they concluded that the log-normal–Pareto (II) model provides the best fit compared to other models. Abdul Majid and Ibrahim (2021b) proposed a Bayesian approach to composite Pareto models that places a prior distribution on the proportion of data coming from the Pareto distribution instead of assuming a prior distribution on the threshold $\theta$; they concluded that point estimates based on a uniform prior on the proportion are less biased than those obtained using a uniform prior on the threshold. Deng et al. (2021) provided an analytical Bayesian approach to derive estimators of the log-normal–Pareto composite distribution parameters based on the selected priors; the article compared the exponential–Pareto, inverse-gamma–Pareto, and log-normal–Pareto as candidate models for data on natural hazards from 1900 to 2016 in the USA and concluded that the log-normal–Pareto distribution provides the best fit.
To model large losses, the Pareto distribution is favored by practitioners and researchers for heavy-tailed financial data. However, when losses consist of smaller values with high frequencies and larger losses with low frequencies, the log-normal or Weibull distributions are preferred. Nevertheless, no ordinary distribution provides an acceptable fit for both small and large losses: as mentioned by Dominicy and Sinner (2017), the Pareto fits the tail well, while the log-normal, Weibull, and inverse-gamma produce a good overall fit but fit the tail badly. The purpose of using composite distributions is to overcome this dilemma. Saleem (2010) considered type-I mixtures of the members of a subclass of the one-parameter exponential family of distributions, such as the exponential, Rayleigh, Pareto, Burr type XII, and power function distributions, for censored data; that work provides ML estimators as well as Bayes estimators of the parameters based on uniform and Jeffreys priors. To our knowledge, the mixture-of-priors approach has not been considered in the literature for composite distributions. The proposed method in the current article considers two composite distributions. The mixture prior method is based on gamma and inverse-gamma priors, which are good candidates for the positive threshold parameter $\theta$. Furthermore, we propose a data-driven approach to compute optimal values for the hyperparameters: for a real dataset, where a selected "true" value of $\theta$ is not available (unlike in simulations), we propose using the MLE of $\theta$ along with the characteristics of the prior distribution to assign optimal hyperparameter values.
In this article, we apply the Bayesian method to the composite models using a mixture of prior distributions instead of a single prior distribution for $\theta$. The motivation comes from natural loss data accumulated over many years from many sources, such as floods, fires, storms, and earthquakes; each source should have a distribution with its own parameters, so a mixture distribution describes the overall distribution. The organization of the article is as follows. Section 2 discusses the general mixture prior, the general mixture posterior, and the general predictive distributions with the risk measures. Section 3 provides the formulas for the Bayes estimators of $\theta$ via the mixture prior distribution approach for both the exponential–Pareto and inverse-gamma–Pareto composite models. Section 4 summarizes simulation studies based on equally weighted mixture distributions and compares the accuracy of the different methods. Section 5 analyzes the natural disaster loss data to illustrate the computations involved and identifies the best model using the Bayes factor as a goodness-of-fit measure.

2. Mixture Distribution

Definition 1. 
A random variable $Z$ is a $K$-point mixture of the random variables $X_1, X_2, \ldots, X_K$ if its cdf is given by
$$F_Z(z) = k_1 F_{X_1}(z) + k_2 F_{X_2}(z) + \cdots + k_K F_{X_K}(z),$$
where $k_j > 0$ and $\sum_{j=1}^{K} k_j = 1$.
Therefore, a mixture distribution density is given by
$$f_Z(z) = \sum_{j=1}^{K} k_j f_{X_j}(z).$$
The steps to derive the posterior distribution of the random variables with a mixture prior distribution are as follows:
Let $x_1, x_2, \ldots, x_n$ be a random sample from the distribution with a parameter $\theta$. The likelihood function $L(\theta)$ can be written as follows:
$$L(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta).$$
Let the prior distribution of the parameter $\theta$ be a $K$-point mixture distribution with the density function given by
$$\pi(\theta) = \sum_{j=1}^{K} k_j \pi_j(\theta),$$
where all $k_j > 0$, $\sum_{j=1}^{K} k_j = 1$, and $\int \pi_j(\theta)\, d\theta = 1$.
Therefore, the joint distribution of $\theta$ and $X$ is
$$f(\theta, x) = L(\theta) \sum_{j=1}^{K} k_j \pi_j(\theta),$$
where $\pi_j(\theta)$, $j = 1, 2, \ldots, K$, belong to the same class of distributions. For example, all could be Pareto, gamma, or normal. The marginal distribution of $X$ is given by
$$f_X(x) = \int f(\theta, x)\, d\theta.$$
The posterior distribution of $\theta$ is
$$\pi(\theta \mid x) = \frac{f(\theta, x)}{f_X(x)} = \frac{L(\theta) \sum_{j=1}^{K} k_j \pi_j(\theta)}{\int f(\theta, x)\, d\theta}.$$
For now, consider only the $j$th prior distribution $\pi_j(\theta)$ and denote the corresponding joint distribution as $f_j(\theta, x)$; then
$$f_j(\theta, x) = \pi_j(\theta) L(\theta), \quad j = 1, 2, \ldots, K.$$
Let us denote the corresponding marginal distribution of $X$ as $f_{X_j}(x)$; then
$$f_{X_j}(x) = \int f_j(\theta, x)\, d\theta = \int \pi_j(\theta) L(\theta)\, d\theta. \tag{1}$$
Therefore, the corresponding posterior distribution is
$$\pi_j(\theta \mid x) = \frac{\pi_j(\theta) L(\theta)}{f_{X_j}(x)},$$
which implies
$$\pi_j(\theta) L(\theta) = \pi_j(\theta \mid x) f_{X_j}(x). \tag{2}$$
Using (2) and (1), the posterior distribution, based on the $K$-point mixture prior of the $\pi_j(\theta)$, $j = 1, 2, \ldots, K$, is given by
$$\pi(\theta \mid x) = \frac{f(\theta, x)}{f_X(x)} = \frac{\sum_{j=1}^{K} k_j \pi_j(\theta) L(\theta)}{f_X(x)} = \frac{\sum_{j=1}^{K} k_j \pi_j(\theta \mid x) f_{X_j}(x)}{\int \sum_{j=1}^{K} k_j \pi_j(\theta \mid x) f_{X_j}(x)\, d\theta} = \sum_{j=1}^{K} \beta_j \pi_j(\theta \mid x), \tag{3}$$
where
$$\beta_j = \frac{k_j f_{X_j}(x)}{\int \sum_{j=1}^{K} k_j \pi_j(\theta \mid x) f_{X_j}(x)\, d\theta} = \frac{k_j f_{X_j}(x)}{\sum_{j=1}^{K} k_j f_{X_j}(x)},$$
because $\int \pi_j(\theta \mid x)\, d\theta = 1$. Therefore, $\sum_{j=1}^{K} \beta_j = 1$. Hence, the form of the posterior pdf in (3) confirms that the posterior distribution based on a mixture prior distribution is also a mixture of the individual posterior distributions.
Now, we consider the predictive distribution of Y, given X. Let y denote a future realization of the random variable Y. We assume that θ > 0 , which is the case for the composite models.
The predictive density of y, given x, is formulated as follows:
$$f(y \mid x) = \frac{f(y, x)}{f_X(x)} = \frac{\int_0^\infty f(y, x \mid \theta)\pi(\theta)\, d\theta}{f_X(x)} = \frac{\int_0^\infty f(y \mid x, \theta) f(x \mid \theta)\pi(\theta)\, d\theta}{f_X(x)} = \frac{\int_0^\infty f(y \mid \theta) f(x, \theta)\, d\theta}{f_X(x)} = \int_0^\infty f(y \mid \theta)\, \pi(\theta \mid x)\, d\theta. \tag{4}$$
Using (3) and (4), and noting that $f_j(y \mid x) = \int_0^\infty f(y \mid \theta)\, \pi_j(\theta \mid x)\, d\theta$, we obtain
$$f(y \mid x) = \int_0^\infty f(y \mid \theta) \sum_{j=1}^{K} \beta_j \pi_j(\theta \mid x)\, d\theta = \sum_{j=1}^{K} \beta_j f_j(y \mid x).$$
Recall that we have already shown j = 1 K β j = 1 . Therefore, the predictive distribution of the mixture prior distribution is also the mixture distribution of the individual predictive distributions.

2.1. Example: Exponential with a Mixture of Gamma Distributions

Let X 1 , X 2 , , X n be independent identically distributed (iid) random variables from the exponential distribution with parameter θ . The density function is given by
$$f_{X_i}(x \mid \theta) = \theta e^{-\theta x}, \quad x > 0,\ \theta > 0,\ i = 1, 2, \ldots, n,$$
and the likelihood function is
$$L(\theta) = \prod_{i=1}^{n} f_{X_i}(x_i \mid \theta) = \theta^n e^{-\theta \sum_{i=1}^{n} x_i}.$$
Let the prior distribution of $\theta$ be in the class of gamma distributions with parameters $\alpha_j > 0$ and $\beta_j > 0$, $j = 1, 2, \ldots, K$. Then the mixture prior distribution is
$$\pi(\theta) = \sum_{j=1}^{K} k_j \frac{\theta^{\alpha_j - 1} e^{-\theta/\beta_j}}{\Gamma(\alpha_j)\, \beta_j^{\alpha_j}}.$$
Therefore, the joint distribution is given by
$$f(x, \theta) = L(\theta)\pi(\theta) = \sum_{j=1}^{K} k_j \frac{\theta^{\alpha_j - 1} e^{-\theta/\beta_j}}{\Gamma(\alpha_j)\,\beta_j^{\alpha_j}}\, \theta^n e^{-\theta \sum_{i=1}^{n} x_i} = \sum_{j=1}^{K} k_j \frac{\theta^{n + \alpha_j - 1}\, e^{-\theta\left(\frac{1}{\beta_j} + \sum_{i=1}^{n} x_i\right)}}{\Gamma(\alpha_j)\,\beta_j^{\alpha_j}}. \tag{5}$$
Hence, the marginal distribution of X is given by
$$f_X(x) = \int_0^\infty \sum_{j=1}^{K} k_j \frac{\theta^{n+\alpha_j-1}\, e^{-\theta\left(\frac{1}{\beta_j} + \sum_{i=1}^{n} x_i\right)}}{\Gamma(\alpha_j)\,\beta_j^{\alpha_j}}\, d\theta \tag{6}$$
$$= \sum_{j=1}^{K} \frac{k_j}{\Gamma(\alpha_j)\,\beta_j^{\alpha_j}} \int_0^\infty \theta^{n+\alpha_j-1}\, e^{-\theta\left(\frac{1}{\beta_j} + \sum_{i=1}^{n} x_i\right)}\, d\theta$$
$$= \sum_{j=1}^{K} k_j \frac{\Gamma(n+\alpha_j)}{\Gamma(\alpha_j)\,\beta_j^{\alpha_j}} \left(\frac{1}{\frac{1}{\beta_j} + \sum_{i=1}^{n} x_i}\right)^{n+\alpha_j}. \tag{7}$$
The integrand on the RHS of (6) is the kernel of the gamma distribution with shape parameter $n+\alpha_j$ and scale parameter $1\big/\left(\frac{1}{\beta_j} + \sum_{i=1}^{n} x_i\right)$.
Using (5) and (7), the posterior distribution π ( θ | x ) is given by
$$\pi(\theta \mid x) = \frac{f(x,\theta)}{f_X(x)} = \frac{\sum_{j=1}^{K} k_j \dfrac{\theta^{n+\alpha_j-1}\, e^{-\theta\left(\frac{1}{\beta_j} + \sum_{i=1}^{n} x_i\right)}}{\Gamma(\alpha_j)\,\beta_j^{\alpha_j}}}{\sum_{j=1}^{K} k_j \dfrac{\Gamma(n+\alpha_j)}{\Gamma(\alpha_j)\,\beta_j^{\alpha_j}} \left(\dfrac{1}{\frac{1}{\beta_j} + \sum_{i=1}^{n} x_i}\right)^{n+\alpha_j}}. \tag{8}$$
After some algebraic manipulations, (8) reduces to
$$\pi(\theta \mid x) = \sum_{j=1}^{K} \frac{k_j \dfrac{\Gamma(n+\alpha_j)}{\Gamma(\alpha_j)\,\beta_j^{\alpha_j}}\left(\dfrac{1}{\frac{1}{\beta_j}+\sum_{i=1}^{n} x_i}\right)^{n+\alpha_j}}{\sum_{j=1}^{K} k_j \dfrac{\Gamma(n+\alpha_j)}{\Gamma(\alpha_j)\,\beta_j^{\alpha_j}}\left(\dfrac{1}{\frac{1}{\beta_j}+\sum_{i=1}^{n} x_i}\right)^{n+\alpha_j}}\; \frac{\theta^{n+\alpha_j-1}\, e^{-\theta\left(\frac{1}{\beta_j}+\sum_{i=1}^{n} x_i\right)}}{\Gamma(n+\alpha_j)\left(\dfrac{1}{\frac{1}{\beta_j}+\sum_{i=1}^{n} x_i}\right)^{n+\alpha_j}} = \sum_{j=1}^{K} \frac{k_j f_{X_j}(x)}{\sum_{j=1}^{K} k_j f_{X_j}(x)}\, \pi_j(\theta \mid x) = \sum_{j=1}^{K} \beta_j \pi_j(\theta \mid x). \tag{9}$$
As expected, the RHS of (9) confirms that the posterior distribution is the mixture distribution of the individual posterior distributions.
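As a computational check of (9), the posterior weights $\beta_j$ can be evaluated directly from the closed-form marginals in (7). The following Mathematica sketch is ours rather than the authors' code; the function name and arguments are assumed, with al, be, and w denoting length-$K$ lists of prior shapes, scales, and weights:

(* posterior mixture weights beta_j for an exponential likelihood with a mixture of gamma priors *)
postWeights[x_List, al_List, be_List, w_List] :=
  Module[{n = Length[x], s = Total[x], f},
   f = w Gamma[n + al]/((1/be + s)^(n + al) Gamma[al] be^al); (* proportional to k_j f_Xj(x) *)
   f/Total[f]]

(* example: equally weighted gamma(2, 1) and gamma(3, 0.5) priors *)
postWeights[{0.4, 1.2, 0.7}, {2, 3}, {1, 0.5}, {0.5, 0.5}]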
Figure 1a provides graphs of five individual gamma distributions with shape parameter $\alpha$ and scale parameter $\beta$. Figure 1b provides graphs of equally weighted mixtures of gamma distributions. Figure 1c provides graphs of unequally weighted mixtures of two gamma distributions with different shape parameters and equal scale parameters. Figure 1d provides graphs of unequally weighted mixtures of gamma distributions with the same shape parameter but different scale parameters. Figure 1b–d confirm that the general shape of the pdf of a mixture distribution can differ significantly from that of the individual gamma pdfs in Figure 1a.

3. Bayesian Approach to Composite Models Based on the Mixture Prior Distribution

3.1. Bayesian Inference for Composite Exponential–Pareto Based on the Mixture Prior Distribution

Teodorescu and Vernic (2006) considered the exponential–Pareto composite model.
Suppose a random variable $X$ has the pdf defined as a piecewise function,
$$f_X(x) = \begin{cases} c f_1(x), & 0 < x \le \theta \\ c f_2(x), & \theta \le x < \infty \end{cases}$$
where
$$f_1(x) = \lambda e^{-\lambda x}, \quad x > 0,\ \lambda > 0,$$
and
$$f_2(x) = \frac{\alpha\, \theta^\alpha}{x^{\alpha+1}}, \quad x \ge \theta.$$
The pdf of the exponential distribution with parameter λ is denoted by f 1 ( x ) , and the pdf of the Pareto distribution with parameters θ and α is denoted by f 2 ( x ) .
Since the pdf of a composite distribution should be a smooth function, the continuity and differentiability conditions on $f_X(x)$ at $\theta$ are necessary. Hence,
$$f_1(\theta) = f_2(\theta), \quad f_1'(\theta) = f_2'(\theta).$$
As explained in Teodorescu and Vernic (2006), the above equations reduce to
$$\lambda e^{-\lambda\theta} = \frac{\alpha}{\theta}, \quad \lambda^2 e^{-\lambda\theta} = \frac{\alpha(1+\alpha)}{\theta^2},$$
which lead to
$$\alpha = \lambda\theta - 1, \quad \lambda\theta\left(e^{-\lambda\theta} - 1\right) + 1 = 0.$$
Solving the second equation numerically via Mathematica leads to
$$\lambda\theta = 1.35, \quad \alpha = 0.35.$$
Since $\int_0^\infty f(x)\, dx = 1$, the normalizing constant $c$ is computed as $c = \frac{1}{2 - e^{-\lambda\theta}} = 0.574$. Therefore, the initial three parameters reduce to only one parameter $\theta$, and the pdf of the exponential–Pareto distribution is
$$f_X(x \mid \theta) = \begin{cases} \dfrac{0.775}{\theta}\, e^{-1.35 x/\theta}, & 0 < x \le \theta \\[6pt] \dfrac{0.2\, \theta^{0.35}}{x^{1.35}}, & \theta \le x < \infty \end{cases} \tag{10}$$
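The constants above are easy to reproduce; the following Mathematica sketch (our code, with an assumed starting point for the root search) solves the smoothness equation and recovers $\alpha$ and $c$:

(* solve t(e^-t - 1) + 1 == 0 for t = lambda*theta, then alpha and c *)
lt = t /. FindRoot[t (Exp[-t] - 1) + 1 == 0, {t, 1}]  (* 1.35 *)
alphaEP = lt - 1                                      (* 0.35 *)
cEP = 1/(2 - Exp[-lt])                                (* 0.574 *)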
For $0 < x \le \theta$, the cdf is
$$F_X(x \mid \theta) = P(X \le x) = \int_0^x \frac{0.775}{\theta}\, e^{-1.35 t/\theta}\, dt = \frac{0.775}{1.35}\left(1 - e^{-1.35 x/\theta}\right).$$
When $x > \theta$,
$$F_X(x \mid \theta) = P(X \le x) = \int_0^\theta \frac{0.775}{\theta}\, e^{-1.35 t/\theta}\, dt + \int_\theta^x \frac{0.2\, \theta^{0.35}}{t^{1.35}}\, dt = \frac{0.775}{1.35}\left(1 - e^{-1.35}\right) + \frac{0.2}{0.35}\left(1 - \left(\frac{\theta}{x}\right)^{0.35}\right).$$
Therefore, we have the cdf as a piecewise function
$$F_X(x \mid \theta) = \begin{cases} \dfrac{0.775}{1.35}\left(1 - e^{-1.35 x/\theta}\right), & 0 < x \le \theta \\[6pt] \dfrac{0.775}{1.35}\left(1 - e^{-1.35}\right) + \dfrac{0.2}{0.35}\left(1 - \left(\dfrac{\theta}{x}\right)^{0.35}\right), & \theta \le x < \infty \end{cases}$$
To find the quantile $x_P$ through $F_X(x_P \mid \theta) = P$, for $0 < P < 1$, we consider two cases. Since $F_X(\theta \mid \theta) = \frac{0.775}{1.35}\left(1 - e^{-1.35}\right) = 0.425251$, first consider the case $0 < P < 0.425251$. Solving $\frac{0.775}{1.35}\left(1 - e^{-1.35 x_P/\theta}\right) = P$ for $x_P$ gives $x_P = -\frac{\theta}{1.35}\ln\left(1 - \frac{1.35 P}{0.775}\right)$. Note that for $P = 0.25$, the first quartile is $x_{0.25} = 0.423545\,\theta$.
For the case $0.425251 < P < 1$, solving
$$0.425251 + \frac{0.2}{0.35}\left(1 - \left(\frac{\theta}{x_P}\right)^{0.35}\right) = P$$
gives
$$x_P = \theta\left(1 - \frac{0.35}{0.2}(P - 0.425251)\right)^{-\frac{1}{0.35}}.$$
For the special case $P = 0.99$, we have $x_P = 331{,}596\,\theta$. In light of the above findings, the quantile function for the exponential–Pareto is
$$x_P = \begin{cases} -\dfrac{\theta}{1.35}\ln\left(1 - \dfrac{1.35 P}{0.775}\right), & 0 < P < 0.425251 \\[6pt] \theta\left(1 - \dfrac{0.35}{0.2}(P - 0.425251)\right)^{-\frac{1}{0.35}}, & 0.425251 \le P < 1 \end{cases}$$
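The quantile function translates directly into code; the sketch below (assumed function name, rounded constants from the text) reproduces the first quartile and the $P = 0.99$ value:

(* quantile of the exponential-Pareto composite model with support parameter th *)
expParetoQuantile[p_, th_] :=
  If[p < 0.425251,
   -th/1.35 Log[1 - 1.35 p/0.775],
   th (1 - (0.35/0.2) (p - 0.425251))^(-1/0.35)]

expParetoQuantile[0.25, 1.]  (* 0.423545 *)
expParetoQuantile[0.99, 1.]  (* about 331596 *)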
For a random sample $x_1, \ldots, x_n$ from the composite pdf in (10), without loss of generality, assume $x_1 < x_2 < \cdots < x_n$. The likelihood function can be formulated as
$$L(\underline{x} \mid \theta) = c\, \theta^{0.35 n - 1.35 m}\, e^{-1.35 \sum_{i=1}^{m} x_i/\theta}, \tag{11}$$
where $c = 0.2^{n-m}\, (0.775)^m \prod_{i=m+1}^{n} x_i^{-1.35}$. To formulate the likelihood function, we assume, without loss of generality, that there is an $m$ ($m = 1, 2, \ldots, n-1$) such that in the ordered sample $x_m \le \theta \le x_{m+1}$.
The solution to $\frac{\partial \ln L(\underline{x} \mid \theta)}{\partial \theta} = \frac{0.35 n - 1.35 m}{\theta} + \frac{1.35 \sum_{i=1}^{m} x_i}{\theta^2} = 0$ is the MLE of $\theta$,
$$\hat{\theta}_{MLE} = \frac{1.35 \sum_{i=1}^{m} x_i}{1.35 m - 0.35 n}.$$
Note that the Fisher information is
$$I(\theta) = -E\left[\frac{\partial^2 \ln L(\underline{x} \mid \theta)}{\partial \theta^2}\right] = \frac{1.35 m - 0.35 n}{\theta^2} + \frac{2.7 \sum_{i=1}^{m} E[X_i]}{\theta^3},$$
where
$$E[X] = \int_0^\theta x\, f_X(x \mid \theta)\, dx = 0.0064273\,\theta,$$
and $1/\sqrt{I(\theta)}$ provides the standard deviation of the MLE.
We can see that the MLE requires the correct value of $m$ for its computation. Based on the assumption $x_m \le \theta \le x_{m+1}$, the algorithm below goes through the following steps to compute $\hat{\theta}_{MLE}$:
  • Sort the sample observations so that $x_1 < x_2 < \cdots < x_n$.
  • Start with $m = 1$ and compute $\hat{\theta}_{MLE}$; if $x_1 \le \hat{\theta}_{MLE} \le x_2$, then $m = 1$; otherwise, go to the next step.
  • Let $m = 2$ and compute $\hat{\theta}_{MLE}$; if $x_2 \le \hat{\theta}_{MLE} \le x_3$, then $m = 2$; otherwise, go to the next step.
The above process continues until we identify the correct value for m. Using the correct value of m, θ ^ M L E can be computed.
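This search is sketched below in Mathematica (our code; the function name is assumed, and Missing["NoValidM"] is returned if no m satisfies the bracketing condition):

(* m-search for the exponential-Pareto MLE; returns {m, MLE} *)
expParetoMLE[data_List] :=
  Module[{x = Sort[data], n = Length[data], est},
   Catch[
    Do[
     est = 1.35 Total[x[[1 ;; m]]]/(1.35 m - 0.35 n);
     If[x[[m]] <= est <= x[[m + 1]], Throw[{m, est}]],
     {m, 1, n - 1}];
    Missing["NoValidM"]]]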
Aminzadeh and Deng (2017) developed Bayesian inference for the exponential–Pareto composite model by considering inverse-gamma as the prior distribution for θ ,
$$\pi(\theta) = \frac{b^a\, \theta^{-a-1}\, e^{-b/\theta}}{\Gamma(a)}, \quad b > 0,\ a > 0. \tag{12}$$
Using (11) and (12), the posterior pdf $\pi(\theta \mid \underline{x})$ is
$$\pi(\theta \mid \underline{x}) \propto L(\underline{x} \mid \theta)\,\pi(\theta) \propto e^{-\frac{b + 1.35\sum_{i=1}^{m} x_i}{\theta}}\; \theta^{-(a - 0.35n + 1.35m) - 1}.$$
Using the squared-error loss function, the Bayes estimator for $\theta$ is
$$\hat{\theta}_{Bayes} = E[\theta \mid \underline{x}] = \frac{B}{A - 1} = \frac{b + 1.35\sum_{i=1}^{m} x_i}{a - 0.35n + 1.35m - 1}, \tag{14}$$
where $A = a - 0.35n + 1.35m$ and $B = b + 1.35\sum_{i=1}^{m} x_i$.
It is shown in the article that the Bayes estimator (14) is consistently better than the MLE in regards to accuracy.
Now, consider the mixture prior distribution of inverse-gamma distributions. Let
$$\pi(\theta) = \sum_{j=1}^{K} k_j \frac{b_j^{a_j}\, \theta^{-a_j-1}\, e^{-b_j/\theta}}{\Gamma(a_j)}, \quad b_j > 0,\ a_j > 0,\ j = 1, 2, \ldots, K,\ \sum_{j=1}^{K} k_j = 1,$$
where the $j$th prior distribution is given by
$$\pi_j(\theta) = \frac{b_j^{a_j}\, \theta^{-a_j-1}\, e^{-b_j/\theta}}{\Gamma(a_j)}.$$
The marginal distribution of $X$ based on the $j$th prior is given by
$$f_{X_j}(x) = \int_0^\infty L(\underline{x} \mid \theta)\,\pi_j(\theta)\, d\theta = \int_0^\infty c\, \theta^{0.35n - 1.35m}\, e^{-1.35\sum_{i=1}^{m} x_i/\theta}\; \frac{b_j^{a_j}\, \theta^{-a_j-1}\, e^{-b_j/\theta}}{\Gamma(a_j)}\, d\theta = \frac{c\, b_j^{a_j}}{\Gamma(a_j)} \int_0^\infty \theta^{-(a_j - 0.35n + 1.35m) - 1}\, e^{-\frac{b_j + 1.35\sum_{i=1}^{m} x_i}{\theta}}\, d\theta. \tag{16}$$
The integrand in the last line of (16) is the kernel of an inverse-gamma with parameters $A_j$ and $B_j$, where $A_j = a_j - 0.35n + 1.35m$ and $B_j = b_j + 1.35\sum_{i=1}^{m} x_i$. Therefore,
$$f_{X_j}(x) = \frac{c\, b_j^{a_j}}{\Gamma(a_j)}\, \frac{\Gamma(A_j)}{B_j^{A_j}}.$$
Using the above result, the j t h posterior distribution is
$$\pi_j(\theta \mid x) = \frac{L(\underline{x} \mid \theta)\,\pi_j(\theta)}{f_{X_j}(x)} = \frac{c\, \theta^{0.35n-1.35m}\, e^{-1.35\sum_{i=1}^{m} x_i/\theta}\; \dfrac{b_j^{a_j}\,\theta^{-a_j-1}\, e^{-b_j/\theta}}{\Gamma(a_j)}}{\dfrac{c\, b_j^{a_j}}{\Gamma(a_j)}\, \dfrac{\Gamma(A_j)}{B_j^{A_j}}} = \frac{\dfrac{c\, b_j^{a_j}}{\Gamma(a_j)}\, \theta^{-(a_j-0.35n+1.35m)-1}\, e^{-\frac{b_j+1.35\sum_{i=1}^{m} x_i}{\theta}}}{\dfrac{c\, b_j^{a_j}}{\Gamma(a_j)}\, \dfrac{\Gamma(A_j)}{B_j^{A_j}}},$$
which reduces to
$$\pi_j(\theta \mid x) = \frac{B_j^{A_j}\, \theta^{-A_j-1}\, e^{-B_j/\theta}}{\Gamma(A_j)}.$$
Furthermore, we have
$$f_X(x) = \sum_{j=1}^{K} k_j f_{X_j}(x) = \sum_{j=1}^{K} k_j\, \frac{c\, b_j^{a_j}}{\Gamma(a_j)}\, \frac{\Gamma(A_j)}{B_j^{A_j}};$$
hence,
$$\frac{f_{X_j}(x)}{f_X(x)} = \frac{\dfrac{c\, b_j^{a_j}}{\Gamma(a_j)}\, \dfrac{\Gamma(A_j)}{B_j^{A_j}}}{\sum_{j=1}^{K} k_j\, \dfrac{c\, b_j^{a_j}}{\Gamma(a_j)}\, \dfrac{\Gamma(A_j)}{B_j^{A_j}}} = \frac{\dfrac{b_j^{a_j}}{\Gamma(a_j)}\, \dfrac{\Gamma(A_j)}{B_j^{A_j}}}{\sum_{j=1}^{K} k_j\, \dfrac{b_j^{a_j}}{\Gamma(a_j)}\, \dfrac{\Gamma(A_j)}{B_j^{A_j}}}.$$
Using the above results, the posterior distribution, based on the mixture prior distribution, is
$$\pi(\theta \mid \underline{x}) = \sum_{j=1}^{K} k_j\, \frac{f_{X_j}(x)}{f_X(x)}\, \pi_j(\theta \mid \underline{x}) = \sum_{j=1}^{K} \frac{k_j\, \dfrac{b_j^{a_j}}{\Gamma(a_j)}\, \dfrac{\Gamma(A_j)}{B_j^{A_j}}}{\sum_{j=1}^{K} k_j\, \dfrac{b_j^{a_j}}{\Gamma(a_j)}\, \dfrac{\Gamma(A_j)}{B_j^{A_j}}}\; \frac{B_j^{A_j}\, \theta^{-A_j-1}\, e^{-B_j/\theta}}{\Gamma(A_j)}.$$
Hence, under the squared-error loss function, the Bayes estimator for θ is
$$\hat{\theta}_{Bayes} = E[\theta \mid \underline{x}] = \sum_{j=1}^{K} \frac{k_j\, \dfrac{b_j^{a_j}}{\Gamma(a_j)}\, \dfrac{\Gamma(A_j)}{B_j^{A_j}}}{\sum_{j=1}^{K} k_j\, \dfrac{b_j^{a_j}}{\Gamma(a_j)}\, \dfrac{\Gamma(A_j)}{B_j^{A_j}}}\, E_j[\theta \mid \underline{x}] = \sum_{j=1}^{K} \frac{k_j\, \dfrac{b_j^{a_j}}{\Gamma(a_j)}\, \dfrac{\Gamma(A_j)}{B_j^{A_j}}}{\sum_{j=1}^{K} k_j\, \dfrac{b_j^{a_j}}{\Gamma(a_j)}\, \dfrac{\Gamma(A_j)}{B_j^{A_j}}}\, \frac{B_j}{A_j - 1} = \frac{\sum_{j=1}^{K} k_j\, \dfrac{b_j^{a_j}}{\Gamma(a_j)}\, \dfrac{\Gamma(A_j - 1)}{B_j^{A_j - 1}}}{\sum_{j=1}^{K} k_j\, \dfrac{b_j^{a_j}}{\Gamma(a_j)}\, \dfrac{\Gamma(A_j)}{B_j^{A_j}}}. \tag{18}$$
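In code, (18) amounts to two weighted sums. The sketch below is ours (names are assumed; a, b, w are length-$K$ lists, and m comes from the search algorithm described in Section 4.1):

(* Bayes estimate (18) under a mixture of inverse-gamma(a_j, b_j) priors;
   x is the sorted sample, m the number of observations below theta; needs A_j > 1 *)
thetaBayesExpPareto[x_List, m_, a_List, b_List, w_List] :=
  Module[{n = Length[x], A, B},
   A = a - 0.35 n + 1.35 m;          (* A_j *)
   B = b + 1.35 Total[x[[1 ;; m]]];  (* B_j *)
   Total[w b^a Gamma[A - 1]/(Gamma[a] B^(A - 1))]/
    Total[w b^a Gamma[A]/(Gamma[a] B^A)]]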

3.2. Bayesian Inference for the Composite IG–Pareto Based on the Mixture Prior Distribution

Aminzadeh and Deng (2019) developed the composite inverse-gamma–Pareto model as follows:
Suppose X is a random variable with the pdf f ( x ) , where f 1 ( x ) and f 2 ( x ) , respectively, are the pdfs of inverse-gamma and Pareto distributions.
$$f_X(x) = \begin{cases} c f_1(x), & 0 < x < \theta \\ c f_2(x), & \theta \le x < \infty \end{cases}$$
where
$$f_1(x) = \frac{\beta^\alpha\, x^{-\alpha-1}\, e^{-\beta/x}}{\Gamma(\alpha)}, \quad x > 0,\ \alpha > 0,\ \beta > 0,$$
and
$$f_2(x) = \frac{a\, \theta^a}{x^{a+1}}, \quad x \ge \theta,\ a > 0,\ \theta > 0.$$
Recall that the composite pdf $f(x)$ should be smooth at $\theta$. Therefore,
$$f_1(\theta) = f_2(\theta), \quad f_1'(\theta) = f_2'(\theta).$$
The simultaneous solutions of the above equations, after algebraic manipulations, lead to
$$\frac{k^\alpha e^{-k}}{\Gamma(\alpha)} = \alpha - k,$$
where $k = \beta/\theta$ and $a = \alpha - k > 0$, which implies $\alpha > k > 0$. The functions on both sides of the above equation are positive and integrable; therefore, their integrals over a closed interval must be equal. Hence,
$$\int_0^\alpha \frac{k^\alpha e^{-k}}{\Gamma(\alpha)}\, dk = \frac{\alpha^2}{2}.$$
Using the gamma function, we obtain
$$\Gamma(\alpha + 1) = \int_0^\alpha t^{(\alpha+1)-1} e^{-t}\, dt + \Gamma(\alpha+1, \alpha),$$
where $\Gamma(\alpha+1, \alpha) = \int_\alpha^\infty t^{(\alpha+1)-1} e^{-t}\, dt$ denotes the upper incomplete gamma function. In light of this result, the above equation reduces to
$$\frac{\Gamma(\alpha+1, \alpha)}{\Gamma(\alpha)} + 0.5\,\alpha^2 - \alpha = 0.$$
Mathematica can solve the above equation numerically. We obtain $\alpha = 0.308289$. As a result, we have $k = 0.144351$ and $a = \alpha - k = 0.163947$. To find the value of $c$, we need (see the definition of the composite pdf above)
$$c\left[\int_0^\theta \frac{(k\theta)^\alpha\, x^{-\alpha-1}\, e^{-k\theta/x}}{\Gamma(\alpha)}\, dx + \int_\theta^\infty \frac{a\, \theta^a}{x^{a+1}}\, dx\right] = 1,$$
which leads to
$$c = \frac{1}{1 + GR(\alpha, k)} = 0.711384.$$
Note that $GR$ stands for GammaRegularized, and $GR(\alpha, \beta/x)$ is the cdf of the inverse-gamma distribution with parameters $\alpha, \beta$ evaluated at $x$. Therefore, $GR(\alpha, k\theta/\theta)$, which is the value of the first integral above, reduces to $GR(\alpha, k)$. Mathematica can compute the GR function. The above findings reveal that the four initial parameters reduce to only one parameter $\theta$. As a result, the pdf of the IG–Pareto distribution is
$$f_X(x \mid \theta) = \begin{cases} \dfrac{c\,(k\theta)^\alpha\, x^{-\alpha-1}\, e^{-k\theta/x}}{\Gamma(\alpha)}, & 0 < x \le \theta \\[6pt] \dfrac{c\,(\alpha - k)\,\theta^{\alpha-k}}{x^{\alpha-k+1}}, & \theta \le x < \infty \end{cases} \tag{19}$$
and its cdf is given by
$$F_X(x \mid \theta) = \begin{cases} c\, GR\!\left(\alpha, \dfrac{k\theta}{x}\right), & 0 < x \le \theta \\[6pt] 1 - c\left(\dfrac{\theta}{x}\right)^{\alpha-k}, & \theta \le x < \infty \end{cases}$$
The quantile function can be derived similarly to that of the exponential–Pareto composite distribution. Using the cdf above, we have $F_X(\theta \mid \theta) = c\, GR(\alpha, k) = 1 - c$.
Case 1: $0 < P \le 1 - c$. Solving $c\, GR\!\left(\alpha, \frac{k\theta}{x_P}\right) = P$ gives
$$x_P = \frac{k\,\theta}{\mathrm{InverseGammaRegularized}(\alpha, P/c)},$$
where $\mathrm{InverseGammaRegularized}(\alpha, P/c)$ can be computed via Mathematica.
Case 2: $1 - c < P < 1$. Solving $1 - c\left(\frac{\theta}{x_P}\right)^{\alpha-k} = P$ gives
$$x_P = \theta\left(\frac{1 - P}{c}\right)^{-\frac{1}{\alpha - k}}.$$
For the special cases $P = 0.25$ and $P = 0.99$, using the constant values $k = 0.144351$, $\alpha = 0.308289$, $c = 0.711384$, and Mathematica, we obtain
$$x_{0.25} = 0.723798\,\theta \quad \text{and} \quad x_{0.99} = 1.98138 \times 10^{11}\,\theta.$$
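The following sketch (our code, with the constants fixed above) evaluates the IG–Pareto quantile function using Mathematica's built-in InverseGammaRegularized:

kIG = 0.144351; alphaIG = 0.308289; aIG = alphaIG - kIG; cIG = 0.711384;

(* quantile of the IG-Pareto composite model with support parameter th *)
igParetoQuantile[p_, th_] :=
  If[p <= 1 - cIG,
   kIG th/InverseGammaRegularized[alphaIG, p/cIG],
   th ((1 - p)/cIG)^(-1/aIG)]

igParetoQuantile[0.25, 1.]  (* 0.723798 *)
igParetoQuantile[0.99, 1.]  (* about 1.98*10^11 *)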
Suppose $x_1, \ldots, x_n$ is a random sample from the IG–Pareto distribution; without loss of generality, we assume $x_1 < x_2 < \cdots < x_n$. The likelihood function is
$$L(\underline{x} \mid \theta) = Q\, e^{-k\theta \sum_{i=1}^{m} \frac{1}{x_i}}\; \theta^{a(n-m) + \alpha m}, \tag{20}$$
where $Q = c^n\, k^{\alpha m}\, a^{n-m} \prod_{i=1}^{m} x_i^{-(\alpha+1)} \prod_{i=m+1}^{n} x_i^{-(a+1)}$. For the formulation of the likelihood function, we assume an $m$ ($m = 1, 2, \ldots, n-1$) exists such that in the sorted sample $x_m \le \theta \le x_{m+1}$. The solution to $\frac{\partial \ln L(\underline{x} \mid \theta)}{\partial \theta} = 0$ is the MLE for $\theta$, which is
$$\hat{\theta}_{MLE} = \frac{m\alpha + (\alpha - k)(n - m)}{k\, S}, \quad S = \sum_{i=1}^{m} x_i^{-1}.$$
Using (20), the Fisher information is
$$I(\theta) = -E\left[\frac{\partial^2 \ln L(\underline{x} \mid \theta)}{\partial \theta^2}\right] = \frac{n\alpha}{\theta^2}.$$
As a result, the standard error of the MLE is $\frac{1}{\sqrt{I(\theta)}} = \frac{\theta}{\sqrt{n\alpha}}$.
Using the same algorithm in Section 3.1, we identify the correct value, m, and compute θ ^ M L E .
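A sketch of that computation (our code, reusing the constants kIG, alphaIG, aIG defined in the quantile sketch above):

(* m-search for the IG-Pareto MLE; returns {m, MLE} *)
igParetoMLE[data_List] :=
  Module[{x = Sort[data], n = Length[data], s, est},
   Catch[
    Do[
     s = Total[1/x[[1 ;; m]]];
     est = (m alphaIG + aIG (n - m))/(kIG s);
     If[x[[m]] <= est <= x[[m + 1]], Throw[{m, est}]],
     {m, 1, n - 1}];
    Missing["NoValidM"]]]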
Aminzadeh and Deng (2019), as a prior distribution for θ , used gamma( γ , δ ) with the pdf
$$\pi(\theta) = \frac{\theta^{\gamma-1}\, e^{-\theta/\delta}}{\Gamma(\gamma)\,\delta^\gamma}, \quad \gamma > 0,\ \delta > 0;$$
then, the posterior pdf is
$$f(\theta \mid \underline{x}) = \frac{L(\underline{x} \mid \theta) \times \pi(\theta)}{\int L(\underline{x} \mid \theta) \times \pi(\theta)\, d\theta} \propto e^{-\theta\left(k\sum_{i=1}^{m} \frac{1}{x_i} + \frac{1}{\delta}\right)}\; \theta^{na + m(\alpha - a) + \gamma - 1}. \tag{21}$$
The RHS in (21) is the kernel of gamma$(A, B)$, with $A = na + m(\alpha - a) + \gamma$ and $B = \frac{\delta}{\delta k \sum_{i=1}^{m} \frac{1}{x_i} + 1}$. As a result, the pdf of the posterior is given by
$$\pi(\theta \mid \underline{x}) = \frac{\theta^{A-1}\, e^{-\theta/B}}{\Gamma(A)\, B^A};$$
as a result, under the squared-error loss function, the Bayes estimator for $\theta$ is
$$\hat{\theta}_{Bayes} = E[\theta \mid \underline{x}] = AB = \frac{\delta\,(na + mk + \gamma)}{\delta k \sum_{i=1}^{m} \frac{1}{x_i} + 1}.$$
Now, we derive the Bayes estimator using the mixture prior distribution based on individual gamma priors,
$$\pi_j(\theta) = \frac{\theta^{\gamma_j - 1}\, e^{-\theta/\delta_j}}{\Gamma(\gamma_j)\,\delta_j^{\gamma_j}};$$
as a result,
$$\pi(\theta) = \sum_{j=1}^{K} k_j \frac{\theta^{\gamma_j - 1}\, e^{-\theta/\delta_j}}{\Gamma(\gamma_j)\,\delta_j^{\gamma_j}}, \quad \gamma_j > 0,\ \delta_j > 0,\ j = 1, 2, \ldots, K,\ \sum_{j=1}^{K} k_j = 1.$$
The marginal distribution of $X$ based on the $j$th prior is given by
$$f_{X_j}(x) = \int_0^\infty L(\underline{x} \mid \theta)\,\pi_j(\theta)\, d\theta = \int_0^\infty Q\, e^{-k\theta\sum_{i=1}^{m}\frac{1}{x_i}}\; \theta^{a(n-m)+\alpha m}\; \frac{\theta^{\gamma_j-1}\, e^{-\theta/\delta_j}}{\Gamma(\gamma_j)\,\delta_j^{\gamma_j}}\, d\theta = \frac{Q}{\Gamma(\gamma_j)\,\delta_j^{\gamma_j}} \int_0^\infty \theta^{a(n-m)+\alpha m+\gamma_j-1}\, e^{-\theta\left(k\sum_{i=1}^{m}\frac{1}{x_i} + \frac{1}{\delta_j}\right)}\, d\theta. \tag{23}$$
The RHS of the last line in (23) is the kernel of gamma$(A_j, B_j)$, where
$$A_j = a(n-m) + \alpha m + \gamma_j, \quad B_j = \frac{1}{k\sum_{i=1}^{m}\frac{1}{x_i} + \frac{1}{\delta_j}}.$$
Therefore, $f_{X_j}(x) = \frac{Q\,\Gamma(A_j)\, B_j^{A_j}}{\Gamma(\gamma_j)\,\delta_j^{\gamma_j}}$ and
$$f_X(x) = \sum_{j=1}^{K} k_j f_{X_j}(x),$$
$$\frac{f_{X_j}(x)}{f_X(x)} = \frac{\dfrac{Q\,\Gamma(A_j)\, B_j^{A_j}}{\Gamma(\gamma_j)\,\delta_j^{\gamma_j}}}{\sum_{j=1}^{K} k_j\, \dfrac{Q\,\Gamma(A_j)\, B_j^{A_j}}{\Gamma(\gamma_j)\,\delta_j^{\gamma_j}}} = \frac{\dfrac{\Gamma(A_j)\, B_j^{A_j}}{\Gamma(\gamma_j)\,\delta_j^{\gamma_j}}}{\sum_{j=1}^{K} k_j\, \dfrac{\Gamma(A_j)\, B_j^{A_j}}{\Gamma(\gamma_j)\,\delta_j^{\gamma_j}}}. \tag{24}$$
Therefore, the pdf of the $j$th posterior distribution is
$$\pi_j(\theta \mid x) = \frac{L(\underline{x} \mid \theta)\,\pi_j(\theta)}{f_{X_j}(x)} = \frac{Q\, e^{-k\theta\sum_{i=1}^{m}\frac{1}{x_i}}\; \theta^{a(n-m)+\alpha m}\; \dfrac{\theta^{\gamma_j-1}\, e^{-\theta/\delta_j}}{\Gamma(\gamma_j)\,\delta_j^{\gamma_j}}}{\dfrac{Q\,\Gamma(A_j)\, B_j^{A_j}}{\Gamma(\gamma_j)\,\delta_j^{\gamma_j}}} = \frac{\theta^{a(n-m)+\alpha m+\gamma_j-1}\, e^{-\theta\left(k\sum_{i=1}^{m}\frac{1}{x_i}+\frac{1}{\delta_j}\right)}}{\Gamma(A_j)\, B_j^{A_j}} = \frac{\theta^{A_j-1}\, e^{-\theta/B_j}}{\Gamma(A_j)\, B_j^{A_j}}. \tag{25}$$
From (24) and (25), we conclude that the posterior distribution for IG–Pareto based on the mixture of gamma priors is
$$\pi(\theta \mid x) = \sum_{j=1}^{K} k_j\, \frac{f_{X_j}(x)}{f_X(x)}\, \pi_j(\theta \mid x) = \sum_{j=1}^{K} \frac{k_j\, \dfrac{\Gamma(A_j)\, B_j^{A_j}}{\Gamma(\gamma_j)\,\delta_j^{\gamma_j}}}{\sum_{j=1}^{K} k_j\, \dfrac{\Gamma(A_j)\, B_j^{A_j}}{\Gamma(\gamma_j)\,\delta_j^{\gamma_j}}}\; \frac{\theta^{A_j-1}\, e^{-\theta/B_j}}{\Gamma(A_j)\, B_j^{A_j}}.$$
Hence, under the squared-error loss function, the Bayes estimator for θ is
$$\hat{\theta}_{Bayes} = E[\theta \mid \underline{x}] = \sum_{j=1}^{K} k_j\, \frac{f_{X_j}(x)}{f_X(x)}\, E_j[\theta \mid \underline{x}] = \sum_{j=1}^{K} \frac{k_j\, \dfrac{\Gamma(A_j)\, B_j^{A_j}}{\Gamma(\gamma_j)\,\delta_j^{\gamma_j}}}{\sum_{j=1}^{K} k_j\, \dfrac{\Gamma(A_j)\, B_j^{A_j}}{\Gamma(\gamma_j)\,\delta_j^{\gamma_j}}}\, A_j B_j = \frac{\sum_{j=1}^{K} k_j\, \dfrac{\Gamma(A_j+1)\, B_j^{A_j+1}}{\Gamma(\gamma_j)\,\delta_j^{\gamma_j}}}{\sum_{j=1}^{K} k_j\, \dfrac{\Gamma(A_j)\, B_j^{A_j}}{\Gamma(\gamma_j)\,\delta_j^{\gamma_j}}}. \tag{26}$$
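As with the exp–Pareto case, (26) is two weighted sums. The sketch below is ours (g, d, w are length-$K$ lists of gamma shapes, scales, and weights; the constants are those defined in Section 3.2):

(* Bayes estimate (26) under a mixture of gamma(gamma_j, delta_j) priors *)
thetaBayesIGPareto[x_List, m_, g_List, d_List, w_List] :=
  Module[{n = Length[x], s, A, B},
   s = Total[1/x[[1 ;; m]]];
   A = aIG (n - m) + alphaIG m + g;  (* A_j *)
   B = 1/(kIG s + 1/d);              (* B_j *)
   Total[w Gamma[A + 1] B^(A + 1)/(Gamma[g] d^g)]/
    Total[w Gamma[A] B^A/(Gamma[g] d^g)]]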

4. Simulation

4.1. Simulation for Composite Exponential–Pareto

To compare the accuracies of $\hat{\theta}_{MLE}$ and $\hat{\theta}_{Bayes}$ (with and without a mixture of prior distributions), simulations are conducted using Mathematica. For the same generated sample, the code computes the estimators using hyperparameters $(a_j, b_j)$, $j = 1, 2, \ldots, K$, and weights $k_1, k_2, \ldots, k_K$ ($\sum_{j=1}^{K} k_j = 1$). For each set of input parameters in the simulation, $N = 1000$ samples from the composite density (10) are generated.
For a random sample x 1 , , x n from the composite pdf (10) and without loss of generality, consider the ordered sample x 1 < x 2 < < x n . Recall (18),
$$\hat{\theta}_{Bayes} = \frac{\sum_{j=1}^{K} k_j\, \dfrac{b_j^{a_j}}{\Gamma(a_j)}\, \dfrac{\Gamma(A_j - 1)}{B_j^{A_j-1}}}{\sum_{j=1}^{K} k_j\, \dfrac{b_j^{a_j}}{\Gamma(a_j)}\, \dfrac{\Gamma(A_j)}{B_j^{A_j}}}.$$
The following algorithm is used to determine m:
  • Start with $m = 1$ and check whether $x_1 \le \hat{\theta}_{Bayes} \le x_2$; if yes, then $m = 1$. Otherwise, go to step 2.
  • For $m = 2$, if $x_2 \le \hat{\theta}_{Bayes} \le x_3$, then $m = 2$; otherwise, we consider $m = 3$ and continue until we find the correct value for $m$. The idea is to find the value of $m$ such that $x_m \le \hat{\theta}_{Bayes} \le x_{m+1}$. The Mathematica code uses this algorithm to find $m$ and compute $\hat{\theta}_{Bayes}$.
Selecting hyperparameter values could be challenging. Suppose two experts can provide partial prior information about the hyperparameter values. See, Rufo et al. (2010). The idea with the mixture prior distribution is to incorporate both experts’ opinions to find the Bayes estimate of θ . In this article, we use the same weights ( k 1 = k 2 = 0.5 ) for each expert’s opinion and consider two cases when K = 2 :
1. $b_1 \ne b_2$, $a_1 \ne a_2$: the values of $b_1$ and $b_2$ are provided by the experts.
2. $a_1 \ne a_2$, $b_1 \ne b_2$: the values of $a_1$ and $a_2$ are provided by the experts.
It is noted that $B_1 = b_1 + 1.35\sum_{i=1}^{m} x_i$ and $B_2 = b_2 + 1.35\sum_{i=1}^{m} x_i$, which implies that $B_1 \ne B_2$. It is also noted that $A_1 = a_1 - 0.35n + 1.35m$ and $A_2 = a_2 - 0.35n + 1.35m$, so $A_1 \ne A_2$. From (18), we have
$$\hat{\theta}_{Bayes} = \frac{\dfrac{b_1^{a_1}\,\Gamma(A_1-1)}{\Gamma(a_1)\, B_1^{A_1-1}} + \dfrac{b_2^{a_2}\,\Gamma(A_2-1)}{\Gamma(a_2)\, B_2^{A_2-1}}}{\dfrac{b_1^{a_1}\,\Gamma(A_1)}{\Gamma(a_1)\, B_1^{A_1}} + \dfrac{b_2^{a_2}\,\Gamma(A_2)}{\Gamma(a_2)\, B_2^{A_2}}}.$$
Case 1: In this case, the experts are quite sure about the values for $b_1$ and $b_2$; therefore, only two hyperparameter values remain to be selected, and we would like the optimal values of $a_1, a_2$. Given values of $\theta, b_1, b_2$, Mathematica provides optimal values of $a_1, a_2$ via a numerical optimization with the constraints $E[\theta] = 0.5\left(\frac{b_1}{a_1 - 1} + \frac{b_2}{a_2 - 1}\right) = \theta$ and $a_1 > 2$, $a_2 > 2$:
NMinimize[{Var(θ), E[θ] == θ, a1 > 2, a2 > 2}, {a1, a2}].
For example, for $\theta = 5$, $b_1 = 25$, $b_2 = 22$, we obtain $a_1 = 5.90681$, $a_2 = 5.48518$. Note that, unlike in simulation studies, for a real dataset a selected value of $\theta$ is not available. Hence, we propose a data-driven approach to compute $a_1, a_2$: the equation $0.5\left(\frac{b_1}{a_1 - 1} + \frac{b_2}{a_2 - 1}\right) = \hat{\theta}_{MLE}$, along with the above optimization command, provides values for $a_1$ and $a_2$.
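For concreteness, the sketch below reproduces this optimization for the example values. The code is ours: the paper does not display the mixture moments, so the mean and variance below come from the standard moment identities for a two-component mixture of inverse-gamma distributions (valid for $a_j > 2$):

(* case 1: given theta, b1, b2, find a1, a2 minimizing the prior variance *)
m1 = b1/(a1 - 1); m2 = b2/(a2 - 1);    (* component means *)
v1 = b1^2/((a1 - 1)^2 (a1 - 2));       (* component variances *)
v2 = b2^2/((a2 - 1)^2 (a2 - 2));
igMean = 0.5 (m1 + m2);
igVar = 0.5 (v1 + m1^2) + 0.5 (v2 + m2^2) - igMean^2;
NMinimize[{igVar, igMean == 5, a1 > 2, a2 > 2} /. {b1 -> 25, b2 -> 22},
  {a1, a2}]  (* expected: a1 = 5.90681, a2 = 5.48518 *)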
Table 1a reveals that by selecting the hyperparameters as described above, the mixture prior approach gives a more accurate Bayes estimate, as the average squared error ASE(Bayes) $= \xi(\hat{\theta}_{Bayes})$ is smaller than its counterpart that does not use a mixture prior. For example, for $b_1 = 260$, $b_2 = 235$, we obtain $a_1 = 52.921$, $a_2 = 48.0728$. We can see that the smallest ASE values, 0.45027 and 0.41351, correspond to the optimal set of hyperparameter values for $n = 30$ and $n = 100$, respectively. Also, comparing Table 1a with Table 1b, it is clear that both Bayes estimators (with and without the mixture prior) outperform the MLE with regard to accuracy, as $\xi(\hat{\theta}_{Bayes})$ is much smaller than $\xi(\hat{\theta}_{MLE})$. Boldface numbers in the tables indicate the optimal values.
Case 2: In this case, the experts are quite sure about the values for $a_1$ and $a_2$, and we would like the optimal values of $b_1, b_2$. Given values of $\theta, a_1, a_2$, Mathematica provides optimal values of $b_1, b_2$ via a numerical optimization with the constraint $E[\theta] = 0.5\left(\frac{b_1}{a_1 - 1} + \frac{b_2}{a_2 - 1}\right) = \theta$:
NMinimize[{Var(θ), E[θ] == θ, b1 > 0, b2 > 0}, {b1, b2}].
For example, for $\theta = 5$, $a_1 = 4$, $a_2 = 6$, we obtain $b_1 = 13.63$, $b_2 = 27.27$. Similar to Case 1, to compute $b_1, b_2$ for real data, we use $0.5\left(\frac{b_1}{a_1 - 1} + \frac{b_2}{a_2 - 1}\right) = \hat{\theta}_{MLE}$ in the above optimization command.
As in the previous case, Table 2a confirms that by selecting the hyperparameters as described above, the mixture prior approach provides a more accurate Bayes estimate, as ASE(Bayes) $= \xi(\hat{\theta}_{Bayes})$ is smaller than its counterpart that does not use a mixture prior. For example, for $a_1 = 110$, $a_2 = 98$, we obtain $b_1 = 545.312$, $b_2 = 484.727$. Again, the smallest ASE values, 0.26801 and 0.24191, correspond to the optimal hyperparameter values for $n = 30$ and $n = 100$, respectively. Also, comparing Table 2a with Table 2b, both Bayes estimators (with and without the mixture prior) outperform the MLE with regard to accuracy.

4.2. Simulation for Composite Inverse-Gamma–Pareto

Simulations similar to those for the composite exponential–Pareto model are conducted to compare the accuracy of $\hat{\theta}_{Bayes}$ with and without mixture prior distributions. For selected values of $n$ and $\theta$, the hyperparameters $(\gamma_j, \delta_j)$, $j = 1, 2, \ldots, K$, of the gamma prior distributions and the weights $k_1, k_2, \ldots, k_K$ ($\sum_{j=1}^{K} k_j = 1$) are specified. The simulation study generates $N = 1000$ samples from the composite density (19).
Given a random sample x 1 , , x n from the composite pdf in (19), without loss of generality, consider the ordered sample x 1 < < x n . Recall the Bayes estimator (26), which uses the mixture gamma prior distributions,
$$\hat{\theta}_{Bayes} = \frac{\sum_{j=1}^{K} k_j\, \dfrac{\Gamma(A_j+1)\, B_j^{A_j+1}}{\Gamma(\gamma_j)\,\delta_j^{\gamma_j}}}{\sum_{j=1}^{K} k_j\, \dfrac{\Gamma(A_j)\, B_j^{A_j}}{\Gamma(\gamma_j)\,\delta_j^{\gamma_j}}}.$$
The algorithm described in Section 4.1 determines the value of m. Like the exp–Pareto composite distribution case, we must select the prior distributions’ hyperparameters. We consider the mixture prior distribution with equal weights and K = 2 . For the prior distribution π 1 ( θ ) = gamma ( γ 1 , δ 1 ) , and π 2 ( θ ) = gamma ( γ 2 , δ 2 ) , consider two cases:
1. $\gamma_1 \ne \gamma_2$, $\delta_1 \ne \delta_2$: the values of $\gamma_1$ and $\gamma_2$ are provided by the experts.
2. $\delta_1 \ne \delta_2$, $\gamma_1 \ne \gamma_2$: the values of $\delta_1$ and $\delta_2$ are provided by the experts.
Here, with $\gamma_1 \ne \gamma_2$ and $\delta_1 \ne \delta_2$, under the assumption $k_1 = k_2 = 0.5$ it can be shown that
$$E(\theta) = 0.5\,(\gamma_1\delta_1 + \gamma_2\delta_2),$$
$$Var(\theta) = 0.25\,(\gamma_1\delta_1 - \gamma_2\delta_2)^2 + 0.5\,\gamma_1\delta_1^2 + 0.5\,\gamma_2\delta_2^2.$$
Table 3a and Table 4a provide simulation results for Cases 1 and 2. Since $\delta_1 \ne \delta_2$ and $\gamma_1 \ne \gamma_2$, we have
$$A_1 = a(n-m) + \alpha m + \gamma_1 \ne A_2 = a(n-m) + \alpha m + \gamma_2,$$
$$B_1 = \frac{1}{k\sum_{i=1}^{m}\frac{1}{x_i} + \frac{1}{\delta_1}} \ne B_2 = \frac{1}{k\sum_{i=1}^{m}\frac{1}{x_i} + \frac{1}{\delta_2}}.$$
From (26), we have
$$\hat{\theta}_{Bayes} = \frac{\dfrac{\Gamma(A_1+1)\, B_1^{A_1+1}}{\Gamma(\gamma_1)\,\delta_1^{\gamma_1}} + \dfrac{\Gamma(A_2+1)\, B_2^{A_2+1}}{\Gamma(\gamma_2)\,\delta_2^{\gamma_2}}}{\dfrac{\Gamma(A_1)\, B_1^{A_1}}{\Gamma(\gamma_1)\,\delta_1^{\gamma_1}} + \dfrac{\Gamma(A_2)\, B_2^{A_2}}{\Gamma(\gamma_2)\,\delta_2^{\gamma_2}}}.$$
Case 1: $\gamma_1 \ne \gamma_2$, $\delta_1 \ne \delta_2$, where the values of $\gamma_1$ and $\gamma_2$ are provided by the experts.
For given values of $\theta$, $\gamma_1$, and $\gamma_2$, Mathematica provides optimal values of $\delta_1, \delta_2$ via a numerical minimization of $Var(\theta)$ under the constraint $E(\theta) = 0.5(\gamma_1\delta_1 + \gamma_2\delta_2) = \theta$. Again, for a real dataset, $0.5(\gamma_1\delta_1 + \gamma_2\delta_2) = \hat{\theta}_{MLE}$ is used in the optimization command below to compute the hyperparameters $\delta_1, \delta_2$:
NMinimize[{Var(θ), E[θ] == θ, δ1 > 0, δ2 > 0}, {δ1, δ2}].
For example, when $\theta = 5$, $\gamma_1 = 2$, and $\gamma_2 = 2.5$, the optimal solutions are $\delta_1 = 2.41379$, $\delta_2 = 2.06897$. When $\theta = 5$, $\gamma_1 = 5$, and $\gamma_2 = 5.5$, the optimal solutions are $\delta_1 = 0.99236$, $\delta_2 = 0.91603$.
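A sketch of this optimization using the mixture moments displayed above (our code; the example reproduces the first solution):

(* case 1: given theta, gamma1, gamma2, find delta1, delta2 *)
gMean = 0.5 (g1 d1 + g2 d2);
gVar = 0.25 (g1 d1 - g2 d2)^2 + 0.5 g1 d1^2 + 0.5 g2 d2^2;
NMinimize[{gVar, gMean == 5, d1 > 0, d2 > 0} /. {g1 -> 2, g2 -> 2.5},
  {d1, d2}]  (* expected: d1 = 2.41379, d2 = 2.06897 *)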
Table 3a reveals that by selecting the hyperparameters as described above, the mixture prior approach gives a more accurate Bayes estimate, as ASE(Bayes) $= \xi(\hat{\theta}_{Bayes})$ is smaller than its counterpart that does not use a mixture prior. For example, for a sample size of $n = 30$, the smallest ASE values, 1.72061 and 1.16326, correspond to the two optimal sets of hyperparameter values. Also, for a sample size of $n = 100$, the smallest ASE values, 1.06654 and 0.91892, correspond to the two optimal sets of hyperparameter values. In this case, Table 3a,b suggest that both Bayes estimators (with and without mixture prior distributions) outperform the MLE.
Case 2: $\delta_1 \ne \delta_2$, $\gamma_1 \ne \gamma_2$, where the values of $\delta_1$ and $\delta_2$ are provided by the experts.
Given the values of $\theta$, $\delta_1$, and $\delta_2$, Mathematica provides optimal values of $\gamma_1, \gamma_2$ via a numerical minimization of $Var(\theta)$ under the constraint $E(\theta) = 0.5(\gamma_1\delta_1 + \gamma_2\delta_2) = \theta$:
NMinimize[{Var(θ), E[θ] == θ, γ1 > 0, γ2 > 0}, {γ1, γ2}].
For example, for $\theta = 5$, $\delta_1 = 2.4$, and $\delta_2 = 2.6$, the optimal solutions are $\gamma_1 = 2.10417$, $\gamma_2 = 1.90384$. For $\theta = 5$, $\delta_1 = 1$, and $\delta_2 = 1.1$, the optimal solutions are $\gamma_1 = 5.025$, $\gamma_2 = 4.52273$. Similar to the other cases, for real data, $0.5(\gamma_1\delta_1 + \gamma_2\delta_2) = \hat{\theta}_{MLE}$ along with the above command is used to find the hyperparameters $\gamma_1, \gamma_2$.
Table 4a reveals that the mixture prior approach gives a more accurate Bayes estimate, as ASE (Bayes) = ξ ( θ ^ B a y e s ) is smaller than its counterpart that does not use a mixture prior. For example, for a sample size of n = 30 , the smallest ASE values = 1.85072 and 1.20807, corresponding to the optimal two sets of hyperparameter values. Also, for a sample size of n = 100 , the smallest ASE values = 1.08946 and 0.96434, corresponding to the optimal two sets of hyperparameter values. Table 4a,b reveal that both Bayes estimators (with and without mixture prior distributions) outperform MLE.

5. Numerical Example

5.1. Data and Basic Descriptive Statistics

This section considers possible models, via the methods presented in this article, for a real dataset. The objectives are to determine whether using the mixture prior approach in the Bayesian framework provides better results with respect to the Bayes estimate of the parameter $\theta$ and to select the model that best fits the data. The insurance losses from natural events, such as floods, are obtained from EM-DAT, the International Disaster Database. EM-DAT contains raw data on the occurrences and effects of all natural events worldwide from 1900 to the present day. "The database is compiled from various sources, including the United Nations agencies, non-governmental organizations, insurance companies, research institutes, and press agencies". This paper considers flood insurance damage in the USA from 2000 to 2019. EM-DAT also provides the annual average CPI using the base year 2019. To eliminate the effect of inflation, all insurance damage amounts are converted to 2019 dollars.
Figure 2 shows the CPI-adjusted (in 2019 dollars) histogram of insurance damage amounts from 2000 to 2019 in the USA. There were 23 recorded insurance losses due to natural event floods in the USA.
Figure 2 also provides the frequentist statistics. The average insurance loss due to a natural event flood is $\bar{x} =$ USD 62.6694 million, the minimum loss is USD 5.3996 million, and the maximum loss is USD 266.302 million. The standard deviation is $s =$ USD 71.0041 million, which indicates that the data are widely spread. The skewness is 1.96116, which indicates that the data are right-skewed. The kurtosis is 5.79031, which tells us the data have a heavy tail. The histogram also shows the high frequency of small damage amounts and the low frequency of large insurance losses. The data represent typical insurance data for which composite models are applicable; see Aminzadeh and Deng (2019). Ordinary distributions, such as the normal and exponential, cannot effectively model these losses.

5.2. Model Selection

Miljkovic and Grün (2016) provide goodness-of-fit measures to determine the appropriateness of the fitted models.
NLL: the negative log-likelihood
NLL is used to compare models with the same number of parameters. NLL is defined as $-\ln L(x_1, x_2, \ldots, x_n \mid \underline{\theta})$, where $L(x_1, x_2, \ldots, x_n \mid \underline{\theta})$ is the likelihood function of the data and the parameter $\underline{\theta}$ can be multi-dimensional. The model with the smallest NLL value fits the data better than the other models under consideration.
AIC: Akaike’s information criterion
To compare models with different parameter numbers, we consider AIC (Akaike’s information criterion) and BIC (Bayesian information criterion). Both measures penalize the increase in the number of parameters.
Akaike developed AIC,
$$AIC = -2\ln L(x_1, x_2, \ldots, x_n \mid \underline{\theta}) + 2q,$$
where $q$ is the number of parameters. As $q$ increases, $-2\ln L(x_1, x_2, \ldots, x_n \mid \underline{\theta})$ decreases while $2q$ becomes larger, which provides a trade-off between models with different numbers of parameters. The model with a smaller AIC value fits the data better than the other models under consideration.
BIC: Bayesian information criterion
BIC was proposed by Schwarz and is given by
$$BIC = -2\ln L(x_1, x_2, \ldots, x_n \mid \underline{\theta}) + q \ln(n),$$
which also depends on the sample size $n$. BIC penalizes not only the increase in the number of parameters but also the increase in the sample size. The smallest BIC indicates the best-fitting model among the models under consideration.

5.2.1. Goodness-of-Fit Measures for Maximum Likelihood Method

Table 5 provides the MLE, NLL, AIC, and BIC values for the different models. The standard errors of the MLEs (see Section 4.1 and Section 4.2) are also listed in the table. For example, when we use the exponential model to fit the insurance flood loss data, there is one unknown parameter $\lambda$. The MLE of $\lambda$ based on the exponential model is $\hat{\lambda} = 0.0159$. Using $\hat{\lambda}$, the goodness-of-fit measures NLL, AIC, and BIC are computed as 118.171, 238.342, and 239.478, respectively. Table 5 reveals that, based on NLL, AIC, and BIC, among the non-composite models (exponential, inverse-gamma) and composite models (exp–Pareto, IG–Pareto), IG–Pareto has the smallest NLL, which is 105.701. Therefore, IG–Pareto is the best-fitting model among all four models for insurance losses due to natural event floods from 2000 to 2019.
Note that for the IG( α , β ) distribution in Table 5, using the second derivatives of the log-likelihood function, we obtain
$$\frac{\partial^2 \ln(L(\underline{x} \mid \alpha, \beta))}{\partial \alpha^2} = -n\, \mathrm{polygamma}(1, \alpha),$$
$$\frac{\partial^2 \ln(L(\underline{x} \mid \alpha, \beta))}{\partial \beta^2} = -\frac{n\alpha}{\beta^2},$$
where $\mathrm{polygamma}(1, \alpha)$ is the first derivative of the digamma function, which Mathematica can compute. These derivatives are used in Table 5 to compute the standard errors of the MLEs.

5.2.2. Bayesian Inference of IG–Pareto

NLL, AIC, and BIC are criteria for evaluating models estimated by maximum likelihood methods; they may not be suitable for Bayesian model selection. Ando (2010) introduced the Bayes factor, originally proposed by Kass and Raftery (1995), among other authors. The logic behind using Bayesian inference for the real data is that it provides a more accurate estimator than the ML method, as verified by the simulation studies in Table 1a, Table 2a, Table 3a and Table 4a. The Bayesian estimator based on the mixture prior approach is more accurate than a non-Bayesian method, such as the MLE, provided that a data-driven approach is adopted for the hyperparameters.
Bayes factor
The odds of the marginal likelihoods of the data $\underline{x}$ are given by
$$B_{1,2}(\underline{x}) = \text{Bayes factor}(M_1, M_2) = \frac{P(\underline{x} \mid M_1)}{P(\underline{x} \mid M_2)},$$
where P ( x ̲ | M 1 ) , P ( x ̲ | M 2 ) are marginal likelihoods of the dataset corresponding to two models: M 1 and M 2 . If B 1 , 2 ( x ̲ ) > 1 , it is concluded that M 1 is a better-fitted model than M 2 . Ando (2010) states, “The Bayes Factor chooses the model with the largest value of marginal likelihood among a set of candidate models”. The following is Jeffreys’ scale of evidence for interpreting the Bayes factor:
  • If B 1 , 2 ( x ̲ ) < 1 , negative support for M 1
  • If 1 < B 1 , 2 ( x ̲ ) < 3 , barely worth mentioning M 1
  • If 3 < B 1 , 2 ( x ̲ ) < 10 , substantial evidence for M 1
  • If 10 < B 1 , 2 ( x ̲ ) < 30 , strong evidence for M 1
  • If 30 < B 1 , 2 ( x ̲ ) < 100 , very strong evidence for M 1
  • If B 1 , 2 ( x ̲ ) > 100 , decisive evidence for M 1 .
The Marginal Likelihood
Let x 1 , x 2 , , x n be a random sample with the distribution f ( x | θ ) . Then, the likelihood function is given by
$$\prod_{i=1}^{n} f(x_i \mid \theta).$$
Let $\pi(\theta)$ be the prior distribution for the parameter $\theta$; then the marginal likelihood function (PML) is defined by
$$\mathrm{PML}(\underline{x} \mid \text{model}) = \int_0^\infty \prod_{i=1}^{n} f(x_i \mid \theta)\, \pi(\theta)\, d\theta,$$
where θ > 0 , which is the case for composite models, such as exp–Pareto and IG–Pareto considered in this article.
According to Table 5, IG–Pareto is the best-fitting composite model; going forward, we therefore consider the mixture prior approach discussed in the previous sections and apply it to the IG–Pareto composite model. From (20),
$$L(\underline{x} \mid \theta) = Q\, e^{-k\theta\sum_{i=1}^{m}\frac{1}{x_i}}\; \theta^{a(n-m)+\alpha m},$$
where $Q = c^n\, k^{\alpha m}\, a^{n-m} \prod_{i=1}^{m} x_i^{-(\alpha+1)} \prod_{i=m+1}^{n} x_i^{-(a+1)}$ and $m$ is a positive integer such that $x_m \le \theta \le x_{m+1}$. Let gamma$(\gamma, \delta)$ be the prior distribution with the pdf
$$\pi(\theta) = \frac{\theta^{\gamma-1}\, e^{-\theta/\delta}}{\Gamma(\gamma)\,\delta^\gamma}, \quad \gamma > 0,\ \delta > 0,\ \theta > 0,$$
and let $A = na + m(\alpha - a) + \gamma$, $B = \frac{\delta}{\delta k \sum_{i=1}^{m} \frac{1}{x_i} + 1}$. The marginal likelihood function (PML) is
$$\mathrm{PML}(\underline{x} \mid \delta, \gamma) = \int_0^\infty Q\, e^{-k\theta\sum_{i=1}^{m}\frac{1}{x_i}}\; \theta^{a(n-m)+\alpha m}\; \frac{\theta^{\gamma-1}\, e^{-\theta/\delta}}{\Gamma(\gamma)\,\delta^\gamma}\, d\theta = \frac{Q}{\Gamma(\gamma)\,\delta^\gamma} \int_0^\infty e^{-\theta\left(k\sum_{i=1}^{m}\frac{1}{x_i}+\frac{1}{\delta}\right)}\, \theta^{a(n-m)+\alpha m+\gamma-1}\, d\theta = \frac{Q\,\Gamma(A)\, B^A}{\Gamma(\gamma)\,\delta^\gamma},$$
where Γ ( A ) denotes the gamma function evaluated at A. Now, consider the mixture distribution π ( θ ) of gamma priors π ( γ 1 , δ 1 ) and π ( γ 2 , δ 2 ) , with equal weights k 1 = k 2 = 0.5 .
$$\pi(\theta) = \frac{1}{2}\left[\frac{\theta^{\gamma_1-1}\, e^{-\theta/\delta_1}}{\Gamma(\gamma_1)\,\delta_1^{\gamma_1}} + \frac{\theta^{\gamma_2-1}\, e^{-\theta/\delta_2}}{\Gamma(\gamma_2)\,\delta_2^{\gamma_2}}\right], \quad \gamma_1, \gamma_2 > 0,\ \delta_1, \delta_2 > 0,\ \theta > 0,$$
and let $A_h = na + m(\alpha - a) + \gamma_h$, $B_h = \frac{\delta_h}{\delta_h k \sum_{i=1}^{m} \frac{1}{x_i} + 1}$, $h = 1, 2$. We can see that, given the mixture prior, the PML is represented as
$$\mathrm{PML}(\underline{x} \mid \delta_1, \gamma_1, \delta_2, \gamma_2) = \frac{Q}{2}\left(\frac{\Gamma(A_1)\, B_1^{A_1}}{\Gamma(\gamma_1)\,\delta_1^{\gamma_1}} + \frac{\Gamma(A_2)\, B_2^{A_2}}{\Gamma(\gamma_2)\,\delta_2^{\gamma_2}}\right).$$
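Both PML expressions are direct to evaluate once $m$ is known. The sketch below (our code; names assumed) transcribes them, with the mixture PML obtained as the average of two single-prior PMLs; for real data, $Q$ can underflow, so a log-scale implementation may be preferable in practice:

(* PML under a single gamma(g, d) prior; x sorted, m as in the text,
   constants cIG, kIG, alphaIG, aIG as in Section 3.2 *)
pmlSingle[x_List, m_, g_, d_] :=
  Module[{n = Length[x], s = Total[1/x[[1 ;; m]]], Q, A, B},
   Q = cIG^n kIG^(alphaIG m) aIG^(n - m) Times @@ (x[[1 ;; m]]^(-alphaIG - 1))*
     Times @@ (x[[m + 1 ;; n]]^(-aIG - 1));
   A = n aIG + m (alphaIG - aIG) + g;
   B = 1/(kIG s + 1/d);
   Q Gamma[A] B^A/(Gamma[g] d^g)]

(* PML under the equal-weight two-component gamma mixture prior *)
pmlMix[x_List, m_, g1_, d1_, g2_, d2_] :=
  0.5 (pmlSingle[x, m, g1, d1] + pmlSingle[x, m, g2, d2])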
As mentioned, selecting the hyperparameters $\gamma, \delta$ is challenging. The expected value and variance for the gamma prior $\pi(\gamma, \delta)$ are given as follows:
$$E(\theta) = \gamma\delta, \quad Var(\theta) = \gamma\delta^2.$$
To find the optimal values of the hyperparameters $\gamma$ and $\delta$, we propose minimizing the variance $\gamma\delta^2$ under the constraint $E(\theta) = \theta$. Substituting $E(\theta) = \gamma\delta = \theta$ into the variance formula, we have $Var(\theta) = \theta\delta$. Therefore, the smaller the $\delta$, the smaller the variance; the variance is an increasing function of the parameter $\delta$. Note that the coefficient of variation is $1/\sqrt{\gamma}$. Since we do not know the "real" value of $\theta$, as in the simulation section we replace $\theta$ with its MLE, $\hat{\theta} = 49.3097$. Table 6 provides PML values for the selected models ($M_1$–$M_4$) with $\gamma = 10, 20, 30$, and 50 and the corresponding $\delta = 4.93097, 2.46549, 1.64366$, and 0.98619. It is clear that the smaller the variance, the better the model. For example, the Bayes factor of $M_4$ vs. $M_1$ is $\frac{3.90122 \times 10^{-56}}{3.28084 \times 10^{-56}} = 1.1891$; $M_4$ is about 1.19 times as likely as $M_1$.
Table 6 also provides PML values based on the mixture prior distribution when $K = 2$ with weights $k_1 = k_2 = 0.5$. The same Mathematica code as in the simulation section computes the hyperparameter values, assuming the "true" parameter value of $\theta$ is $\hat{\theta} = 49.3097$.
Recall that the expected value and variance of $\theta$ under the assumptions $\gamma_1 \ne \gamma_2$, $\delta_1 \ne \delta_2$, and $k_1 = k_2 = 0.5$ are
$$E(\theta) = 0.5\,(\gamma_1\delta_1 + \gamma_2\delta_2), \quad Var(\theta) = 0.25\,(\gamma_1\delta_1 - \gamma_2\delta_2)^2 + 0.5\,\gamma_1\delta_1^2 + 0.5\,\gamma_2\delta_2^2.$$
Using the Mathematica code below, we find optimal δ 1 , δ 2 for given γ 1 , γ 2 (see Case 1 in Section 4.2).
NMinimize[{Var(θ), E[θ] == 49.3097, δ1 > 0, δ2 > 0}, {δ1, δ2}].
Using the Mathematica code below, we find optimal γ 1 , γ 2 for given δ 1 , δ 2 (see Case 2 in Section 4.2).
NMinimize[{Var(θ), E[θ] == 49.3097, γ1 > 0, γ2 > 0}, {γ1, γ2}].
Table 6 provides the PML values and the Bayesian estimates of the support parameter $\theta$. We note that the models based on the mixture prior outperform the models without the mixture prior distribution. Model $M_6$, based on an equal-weight mixture of $\pi(27.5299, 2)$ and $\pi(1.74239, 25)$, provides the maximum PML value, which is $1.61188 \times 10^{-53}$. Model $M_5$, with an equal-weight mixture of $\pi(25, 2.11327)$ and $\pi(5, 9.15752)$, provides the second largest PML value, which is $7.19919 \times 10^{-55}$. For example, the Bayes factor of $M_6$ vs. $M_4$ is $\frac{7.19874 \times 10^{-55}}{3.90122 \times 10^{-56}} = 18.4525$; therefore, $M_6$ is about 18.45 times as likely as $M_4$. Based on Jeffreys' scale of evidence, we have strong evidence that $M_6$ fits the data.
Table 7 summarizes the Bayes factors for models $M_1$ to $M_6$, which were discussed in Table 6. It can clearly be seen that $M_6$ outperforms all other models, since Bayes factor$(M_6, M_h) > 1$ for $h = 1, \ldots, 5$, and $M_5$ is the second best, since Bayes factor$(M_5, M_h) > 1$ for $h = 1, \ldots, 4$. Hence, the models with the mixture prior outperform the models without the mixture prior. The Bayes factor$(M_k, M_j)$ is denoted as $B_{kj}$ in Table 7.
Figure 3 presents a visual representation of the gamma priors and the gamma mixture priors for the IG–Pareto composite model in Table 6. Figure 3a shows the gamma priors corresponding to $M_1$ to $M_4$; as the shape parameter $\gamma$ increases, the curves become more symmetric and bell-shaped. Figure 3b corresponds to $M_5$; the equal-weight mixture of two gamma priors has a bimodal shape, although one mode, around 28, is barely noticeable. Figure 3c corresponds to $M_6$; the equal-weight mixture of two gamma priors clearly shows a bimodal shape.
Table 8 compares the models with the optimal mixture prior ($M_5$, $M_6$) and without the optimal mixture prior ($M_5^*$, $M_6^*$). Note that in the selection of hyperparameters for $M_5^*$, $M_6^*$, we do not minimize the variance; we only ensure that the mean equation
$$E(\theta) = \frac{1}{2}\,(\gamma_1\delta_1 + \gamma_2\delta_2)$$
is satisfied.
Since the "true" value of $\theta$ is unknown, as mentioned before, we use its MLE, $\hat{\theta} = 49.3097$. Therefore, for given $\gamma_1$, $\gamma_2$, and $\delta_1$, we have
$$\delta_2 = \frac{2E(\theta) - \gamma_1\delta_1}{\gamma_2} = \frac{2 \times 49.3097 - \gamma_1\delta_1}{\gamma_2},$$
which leads to $M_5^*$. Also, for given $\delta_1$, $\delta_2$, and $\gamma_1$, we have
$$\gamma_2 = \frac{2E(\theta) - \gamma_1\delta_1}{\delta_2} = \frac{2 \times 49.3097 - \gamma_1\delta_1}{\delta_2},$$
which leads to $M_6^*$.
Table 8 reveals that the models with the optimal mixture prior are better than those without it. For example, the Bayes factor$(M_5, M_5^*) = 5.0458$; the model with the optimal mixture prior, $M_5$, is about 5.05 times as likely as the model without the optimal mixture prior, $M_5^*$.
The value at risk, Klugman et al. (2012), is an important and standard risk measure in the insurance industry. $\mathrm{VaR}_p$ is the capital required, at a high probability level $p$, to ensure that the company will not go bankrupt:
$$P(X \le \mathrm{VaR}_p) = p.$$
The $\mathrm{VaR}_p$, or an upper prediction limit for a future value $y$, can be obtained via the predictive density $f(y \mid \underline{x})$. Aminzadeh and Deng (2019) provide the predictive density for the IG–Pareto composite model based on a single gamma prior distribution as
$$f(y \mid \underline{x}) = \int_0^\infty f(\theta \mid \underline{x})\, f_Y(y \mid \theta)\, d\theta = K_1(y)\left(1 - H_1\!\left(y \,\Big|\, \alpha + A,\ \frac{yB}{kB + y}\right)\right) + K_2(y)\, H_2\!\left(y \,\Big|\, \alpha - k + A,\ B\right), \tag{27}$$
where
$$K_1(y) = \frac{k^\alpha\, \Gamma(\alpha + A)\, y^{A-1}\, B^\alpha}{(1 + GR(\alpha, k))\, \Gamma(\alpha)\, \Gamma(A)\, (kB + y)^{\alpha + A}}$$
and
$$K_2(y) = \frac{(\alpha - k)\, B^{\alpha-k}\, \Gamma(\alpha - k + A)}{(1 + GR(\alpha, k))\, \Gamma(A)\, y^{\alpha - k + 1}},$$
$H_1$ denotes the cdf of gamma$\left(\alpha + A, \frac{yB}{kB + y}\right)$ and $H_2$ denotes the cdf of gamma$(\alpha - k + A, B)$, where
$$A = na + m(\alpha - a) + \gamma, \quad B = \frac{\delta}{\delta k \sum_{i=1}^{m} \frac{1}{x_i} + 1}.$$
The above results can be extended to the mixture of two gamma prior distributions. Recall from Section 2 that
$$f(y \mid \underline{x}) = \sum_{j=1}^{K} \beta_j\, f_j(y \mid x), \quad \beta_j = \frac{k_j f_{X_j}(x)}{\sum_{j=1}^{K} k_j f_{X_j}(x)}.$$
Therefore, for the case $K = 2$, based on (27), we have
$$f_j(y \mid \underline{x}) = K_{1j}(y)\left(1 - H_1\!\left(y \,\Big|\, \alpha + A_j,\ \frac{yB_j}{kB_j + y}\right)\right) + K_{2j}(y)\, H_2\!\left(y \,\Big|\, \alpha - k + A_j,\ B_j\right), \quad j = 1, 2, \tag{28}$$
where $K_{1j}(y)$ and $K_{2j}(y)$ are defined via (27) using the corresponding $A_j, B_j$.
The last column of Table 6 provides, for models $M_1$ to $M_6$, the $\mathrm{VaR}_{0.95}$ values, which are found by solving
$$P(Y \le \mathrm{VaR}_{0.95} \mid \underline{x}) = 0.95$$
with the predictive density (28) and Mathematica. The values tell us how much the company should reserve under each model to avoid bankruptcy at the 95% confidence level. For example, if we use model $M_6$, we should reserve USD $5.4426 \times 10^8$ million to avoid bankruptcy at the 95% confidence level.
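To make the computation concrete, the sketch below implements the single-prior predictive density (27) and inverts its cdf numerically; the code and the root-search starting point are ours, not the authors' procedure, and the gamma cdfs are expressed via GammaRegularized:

(* cdf of gamma(shape sh, scale sc) at z *)
gCDF[sh_, sc_, z_] := 1 - GammaRegularized[sh, z/sc]

(* predictive density (27) for a single gamma prior, with A, B as in the text
   and constants kIG, alphaIG, aIG from Section 3.2 *)
fPred[y_?NumericQ, A_, B_] :=
  Module[{cc = 1/(1 + GammaRegularized[alphaIG, kIG])},
   cc kIG^alphaIG Gamma[alphaIG + A] y^(A - 1) B^alphaIG/
      (Gamma[alphaIG] Gamma[A] (kIG B + y)^(alphaIG + A))*
     (1 - gCDF[alphaIG + A, y B/(kIG B + y), y]) +
    cc aIG Gamma[aIG + A] B^aIG/(Gamma[A] y^(aIG + 1)) gCDF[aIG + A, B, y]]

(* VaR_p: invert the integrated predictive cdf near a supplied starting value *)
varPred[p_, A_, B_, start_] :=
  v /. FindRoot[NIntegrate[fPred[y, A, B], {y, 0, v}] == p, {v, start}]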

6. Conclusions

This article considers the class of composite models. We are interested in exploring Bayesian estimates of the threshold parameter $\theta$, which separates the small losses with high frequencies from the large losses with low frequencies. Two composite models, exp–Pareto and IG–Pareto, are considered as examples. The prior distribution for the parameter $\theta$ is a mixture of prior distributions. We verify that, in general, the posterior distribution is a mixture of the individual posterior distributions. For each composite model considered in the article, the general formula of the Bayes estimator of $\theta$ is derived under the squared-error loss function.
Simulation results compare the accuracies of $\hat{\theta}_{Bayes}$ with and without mixture prior distributions. Also, the accuracy of $\hat{\theta}_{MLE}$ is compared to that of the Bayes estimates. For the exp–Pareto and IG–Pareto models, respectively, methods for choosing the optimal hyperparameter values $(a_j, b_j)$ and $(\gamma_j, \delta_j)$, $j = 1, 2, \ldots, K$, are proposed. The proposed method is data-driven, as it uses the MLE of $\theta$ based on the real data to compute optimal values for the hyperparameters. Simulations reveal that the Bayesian estimator with the mixture prior distribution is more accurate than the Bayesian estimator without the mixture prior distribution. Also, both Bayes estimators are more accurate than the MLE.
For an illustration of computations involved in the proposed methods, the insurance losses in the USA from 2000 to 2019 due to natural event floods are considered and downloaded from EM-DAT, the International Disaster Database. In order to eliminate the effect of inflation, all insurance damage amounts are converted to 2019 dollars. Based on NLL, AIC, and BIC measures, the conclusion is that IG–Pareto provides the best fit, which leads us to apply the Bayesian method to the IG–Pareto composite model. We have shown that the IG–Pareto model with the mixture gamma prior distribution most optimally fits the data based on the optimal hyperparameter value. For the comparison of Bayesian models, the Bayes factor is used.
Potential future research would involve extending the mixture prior approach to other composite distributions, such as log-normal–Pareto and Rayleigh-Pareto. Furthermore, the mixture prior approach can be investigated for composite models that involve more than two distributions. For example, consider Pareto for the right tail of data with very large losses, a non-heavy tail distribution for small losses in the data, and another distribution that models moderate losses in the center of data.

Author Contributions

Conceptualization, M.D. and M.S.A.; Methodology, M.D. and M.S.A.; Software, M.D. and M.S.A.; Validation, M.S.A.; Formal analysis, M.D. and M.S.A.; Investigations, M.D. and M.S.A.; Resources, M.D. and M.S.A.; Data curation, M.D.; writing—original draft preparation, M.D. and M.S.A.; writing—review and editing, M.D. and M.S.A.; visualization, M.D. and M.S.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Acknowledgments

The authors are grateful for the invaluable time and suggestions of the editors and reviewers to enhance the presentation of the article.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Comparison of non-mixture gamma with mixture gamma distributions. (a) Gamma distributions with different parameters. (b) Equally weighted mixture gamma distributions. (c) Unequally weighted mixture of two gamma distributions with different shapes and the same scale. (d) Unequally weighted mixture of two gamma distributions with the same shape and different scales.
Figure 2. Histogram of insurance damage from 2000 to 2019, in 2019 US dollars.
Figure 3. The gamma prior distributions and the mixture gamma prior distributions for the IG–Pareto composite models corresponding to Table 6. (a) Gamma prior distributions with different shapes and scales. (b,c) Mixture gamma prior distributions with different shapes and scales.
Table 1. Improved Bayes estimation with hyperparameter selection for mixture priors.

(a): Comparison of the Bayes estimator without the mixture prior (K = 1) and with the mixture prior (K = 2) when b1, b2 are given (θ = 5).

K  n    b1   b2   a1              a2        mean θ̂_Bayes  ξ(θ̂_Bayes)
1  30   245  NA   a1 = b1/θ + 1   NA        5.11732       0.42551
2  30   260  235  52.921          48.027    5.11662       0.42337
2  30   260  235  55              46.32143  5.12472       0.45379
1  100  245  NA   a1 = b1/θ + 1   NA        5.06908       0.41555
2  100  260  235  52.921          48.027    5.06875       0.41351
2  100  260  235  55              46.32143  5.09362       0.54073

(b): Mean and ASE of the MLE of θ (θ = 5).

n    mean θ̂_MLE  ξ(θ̂_MLE)
30   7.39707     4.88656
100  6.09238     1.74385
Table 2. Improved Bayes estimation with hyperparameter selection for mixture priors.

(a): Comparison of the Bayes estimator without the mixture prior (K = 1) and with the mixture prior (K = 2) for given a1, a2 (θ = 5).

K  n    a1   a2  b1              b2       mean θ̂_Bayes  ξ(θ̂_Bayes)
1  30   100  NA  b1 = θ(a1 − 1)  NA       5.07243       0.27438
2  30   110  98  545.312         484.727  5.07036       0.26801
2  30   110  98  560             471.651  5.07327       0.27393
1  100  100  NA  b1 = θ(a1 − 1)  NA       5.03404       0.24642
2  100  110  98  545.312         484.727  5.03312       0.23956
2  100  110  98  560             471.651  5.03479       0.25197

(b): Mean and ASE of the MLE of θ (θ = 5).

n    mean θ̂_MLE  ξ(θ̂_MLE)
30   7.44702     5.34616
100  6.05374     1.70871
Table 3. Bayes estimators (with and without mixture prior distributions) outperform the MLE.

(a): Comparison of the Bayes estimator without the mixture prior (K = 1) and with the mixture prior (K = 2) when γ1, γ2 are given (θ = 5).

K  n    γ1  γ2   δ1          δ2       mean θ̂_Bayes  ξ(θ̂_Bayes)
1  30   2   NA   δ1 = θ/γ1   NA       5.44943       1.70629
2  30   2   2.5  2.41379     2.06897  5.42456       1.63481
2  30   2   2.5  3           1.6      5.39921       1.70996
1  30   5   NA   δ1 = θ/γ1   NA       5.30638       1.19520
2  30   5   5.5  0.99237     0.91603  5.2944        1.16326
2  30   5   5.5  1.1         0.81818  5.30043       1.21116
1  100  2   NA   δ1 = θ/γ1   NA       5.16417       1.08021
2  100  2   2.5  2.41379     2.06897  5.16189       1.06654
2  100  2   2.5  3           1.6      5.13296       1.06718
1  100  5   NA   δ1 = θ/γ1   NA       5.15904       0.92941
2  100  5   5.5  0.99237     0.91603  5.15621       0.91892
2  100  5   5.5  1.1         0.81818  5.14846       0.92938

(b): Mean and ASE of the MLE of θ (θ = 5).

n    mean θ̂_MLE  ξ(θ̂_MLE)
30   5.90731     2.8898
100  5.16417     1.08021
Table 4. Improved Bayes estimation with hyperparameter selection for mixture priors.

(a): Comparison of the Bayes estimator without the mixture prior (K = 1) and with the mixture prior (K = 2) when δ1, δ2 are given (θ = 5).

K  n    δ1   δ2   γ1          γ2       mean θ̂_Bayes  ξ(θ̂_Bayes)
1  30   2.5  NA   γ1 = θ/δ1   NA       5.58552       1.85126
2  30   2.4  2.6  2.10417     1.90384  5.58569       1.85072
2  30   2.4  2.6  3           1.07692  5.79744       2.10257
1  30   1    NA   γ1 = θ/δ1   NA       5.75102       2.27942
2  30   1    1.1  5.025       4.52273  5.30832       1.20807
2  30   1    1.1  5.5         4.09091  5.32796       1.23492
1  100  2.5  NA   γ1 = θ/δ1   NA       5.21317       1.08963
2  100  2.4  2.6  2.10417     1.90384  5.21335       1.08946
2  100  2.4  2.6  3           1.07692  5.28317       1.14863
1  100  1    NA   γ1 = θ/δ1   NA       5.22784       1.18276
2  100  1    1.1  5.025       4.52273  5.16032       0.96434
2  100  1    1.1  5.5         4.09091  5.17063       0.97477

(b): Mean and ASE of the MLE of θ (θ = 5).

n    mean θ̂_MLE  ξ(θ̂_MLE)
30   6.183       3.69041
100  5.21317     1.08963
Table 5. Goodness-of-fit measures and MLEs for non-composite models and composite models.

Model                        MLE and SE(MLE)                                           NLL      AIC      BIC
Exponential, X ~ Exp(λ)      λ̂ = 0.0159, SE(λ̂) = λ̂/√n = 0.00332                     118.171  238.342  239.478
Exp–Pareto (m = 18)          θ̂ = 61.521, SE(θ̂) = 12.4734                             126.291  254.582  255.717
Inverse-gamma, X ~ IG(α, β)  α̂ = 1.2633, SE(α̂) = 1/√(n·Polygamma(1, α̂)) = 0.19196   117.129  238.258  240.529
                             β̂ = 31.5436, SE(β̂) = β̂/√(n·α̂) = 5.85195
IG–Pareto (m = 13)           θ̂ = 49.3097, SE(θ̂) = θ̂/√(n·α) = 18.5178                105.701  213.402  214.538
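The information criteria in Table 5 follow the usual definitions AIC = 2·NLL + 2k and BIC = 2·NLL + k·ln(n), with k the number of estimated parameters; the tabled values are consistent with a sample of n = 23 flood losses (our inference from the entries, not a figure stated in this section). A minimal check in Python:

```python
import math

def aic(nll: float, k: int) -> float:
    return 2.0 * nll + 2.0 * k

def bic(nll: float, k: int, n: int) -> float:
    return 2.0 * nll + k * math.log(n)

# IG-Pareto row of Table 5: one free parameter (theta), assumed n = 23.
print(aic(105.701, 1))       # 213.402
print(bic(105.701, 1, 23))   # ~214.538
```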
Table 6. Bayesian estimates and marginal likelihood (PML) of IG–Pareto models with (K = 2) or without (K = 1) mixture priors, fitted to the insurance losses due to floods in the USA.

K  Model  Prior distribution(s)         Bayesian estimates     PML               VaR_0.95
1  M1     θ ~ gamma(10, 4.93097)        θ̂ = 49.3097, m = 13   3.28084 × 10^−56  5.18593 × 10^8
1  M2     θ ~ gamma(20, 2.46549)        θ̂ = 49.3097, m = 13   3.63167 × 10^−56  5.24085 × 10^8
1  M3     θ ~ gamma(30, 1.64366)        θ̂ = 49.3097, m = 13   3.77457 × 10^−56  5.26504 × 10^8
1  M4     θ ~ gamma(50, 0.98619)        θ̂ = 49.3097, m = 13   3.90122 × 10^−56  5.28750 × 10^8
2  M5     π1(θ) ~ gamma(25, 2.11327),   θ̂ = 50.1982, m = 14   1.64218 × 10^−55  5.28485 × 10^8
          π2(θ) ~ gamma(5, 9.15752)
2  M6     π1(θ) ~ gamma(27.5299, 2),    θ̂ = 51.8988, m = 15   7.19874 × 10^−55  5.4426 × 10^8
          π2(θ) ~ gamma(1.74239, 25)
Table 7. Bayes factors for paired models.

Paired models  B_kj     Paired models  B_kj     Paired models  B_kj
(M2, M1)       1.1069   (M3, M2)       1.0393   (M5, M3)       4.3506
(M3, M1)       1.1505   (M4, M2)       1.0742   (M6, M3)       19.0717
(M4, M1)       1.1891   (M5, M2)       4.5218   (M5, M4)       4.2094
(M5, M1)       5.0054   (M6, M2)       19.8221  (M6, M4)       18.4525
(M6, M1)       21.9418  (M4, M3)       1.0336   (M6, M5)       4.3837
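Each entry of Table 7 is the ratio of the corresponding marginal likelihoods in Table 6, B_kj = PML(M_k)/PML(M_j); for instance:

```python
# PML values taken from Table 6.
pml = {"M1": 3.28084e-56, "M5": 1.64218e-55, "M6": 7.19874e-55}
print(pml["M6"] / pml["M1"])   # ~21.9418, the (M6, M1) entry
print(pml["M6"] / pml["M5"])   # ~4.3837, the (M6, M5) entry
```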
Table 8. Comparison between the optimal mixture prior and a non-optimal mixture prior for the insurance losses due to floods in the USA.

Model  Prior distributions            Bayesian estimates     PML               B (optimal vs. non-optimal)
M5     π1(θ) ~ gamma(25, 2),          θ̂ = 49.4899, m = 13   3.25458 × 10^−56  5.0458
       π2(θ) ~ gamma(5, 9.27388)
M6     π1(θ) ~ gamma(26, 2),          θ̂ = 50.5036, m = 14   1.44635 × 10^−55  4.9772
       π2(θ) ~ gamma(1.86478, 25)
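The last column is consistent with dividing each model's PML under the optimal mixture prior (Table 6) by its PML under the non-optimal mixture prior above: for M5, 1.64218 × 10^−55 / 3.25458 × 10^−56 ≈ 5.0458, and for M6, 7.19874 × 10^−55 / 1.44635 × 10^−55 ≈ 4.9772. Both Bayes factors favor the optimally chosen hyperparameters.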