An Expectation-Maximization Algorithm for the Exponential-Generalized Inverse Gaussian Regression Model with Varying Dispersion and Shape for Modelling the Aggregate Claim Amount

: This article presents the Exponential–Generalized Inverse Gaussian regression model with varying dispersion and shape. The EGIG is a general distribution family which, under the adopted modelling framework, can provide the appropriate level of ﬂexibility to ﬁt moderate costs with high frequencies and heavy-tailed claim sizes, as they both represent signiﬁcant proportions of the total loss in non-life insurance. The model’s implementation is illustrated by a real data application which involves ﬁtting claim size data from a European motor insurer. The maximum likelihood estimation of the model parameters is achieved through a novel Expectation Maximization (EM)-type algorithm that is computationally tractable and is demonstrated to perform satisfactorily.


Introduction
In the recent literature, in various fields of research such as seismology, biology, genetics, econometrics and insurance, an interest has been developed in modelling rightskewed data which are dominated by large values. Similarly, from the literature in non-life insurance, such as Beirlant et al. (1992), Kleiber and Kotz (2003) and Rosenberg et al. (2007), it is well known that claim size distributions are right-skewed and heavy-tailed, meaning that it is of interest for insurance companies to quantify the risk from extreme amounts of losses. In order to keep the variation of the aggregate claim amount reasonable, an insurer usually takes out reinsurance cover for their insurance portfolio; in other words, they protect themselves against losses arising from large, excessively numerous or catastrophic claims by reinsuring large claim amounts with one or more other insurance or reinsurance companies. In other cases, unless larger claim size amounts are eliminated by reinsurance, the tail of the distribution function is of critical importance, and alternative approaches are required whenever extreme losses are under consideration. In this regard, it has become a standard practice in non-life insurance to employ heavy-tailed distributions to accommodate these extreme claim sizes.
The most well-known families of heavy-tailed distributions that have been utilized in actuarial practice for this purpose are the Generalized Inverse Gaussian (GIG) and Generalized Beta of the second kind (GB2) families. The GIG distribution, which was comprehensively explored in Jorgensen (1982) and Johnson et al. (1994), includes the Inverse Gaussian as a special case and the Gamma and Inverse Gamma distributions as limiting cases. Note that the Inverse Gamma distribution also includes some well-known distributions such as the Inverse Exponential, Inverse Chi Squared and Scaled Inverse Chi Squared distributions. The GB2 distribution includes the Burr, generalized Pareto and Pareto distributions as special cases and the Generalized Gamma (GG) distribution as a limiting case. The distributions which belong to the GB2 family have been widely used in the actuarial literature to model heavy-tailed insurance loss data for cases with and without covariate information; see, for instance, Frees et al. (2014Frees et al. ( , 2016; Frees and Valdez (2008); Hürlimann (2014); Jeong (2020); Jeong and Valdez (2020); Laudagé et al. (2019); Ramirez-Cobo et al. (2010); Shi et al. (2015); Wang et al. (2020); Yang et al. (2011), among many others.
Furthermore, it should be noted that alternative approaches have been considered in the actuarial literature to model heavy-tailed losses. Log phase-type distributions were considered in (Ahn et al. 2012;Bladt and Rojas-Nandayapa 2018;Hassan Zadeh and Stanford 2016) in the context of regression analysis. The Double-Pareto-Lognormal (DPLN) distribution was proposed in Calderín-Ojeda et al. (2017) to efficiently model large claim amounts with heavy-tail behaviors. Additionally, the use of continuous mixture distributions has been proposed as a way to capture the heavy-tailed behavior of insurance losses. In Li et al. (2020), the authors considered a new parametric family of loss distributions, termed the Generalized Log-Moyal Gamma distribution (GLMGA), which can be derived as a Gamma mixture of the Generalized Log-Moyal distribution; see Bhati and Ravi (2018). While the GLMGA distribution that they presented is a special case of the GB2 distribution, they demonstrated that it is effective in the regression modelling of large and modal loss data. In Tzougas and Karlis (2020), the authors calibrated heavy-tailed insurance losses using a class of mixed Exponential Regression models with varying dispersion. Their proposed class of models extends the setup of many well-known two parameter mixed Exponential distributions, such as the classic Exponential-Inverse Gamma-namely Pareto-, the Exponential-Inverse Gaussian (EIG) distributions and the Exponential-Lognormal (ELN) distribution, which was recently considered in .
In this study, we introduce the Exponential-Generalized Inverse Gaussian (EGIG) regression model with varying dispersion and shape to approximate heavy-tailed claim sizes in non-life insurance. The probability density function (pdf) of the model is parameterized in terms of its mean, dispersion and shape parameters. This results in a more orthogonal parameterization which facilitates maximum likelihood (ML) estimation when regression specifications are allowed for the mean, dispersion and shape parameters of the EIG distribution. Furthermore, the EGIG is a very wide family which includes many well-known mixed Exponential distributions as special and limiting cases, such as the EIG, Pareto (which can be derived either as an Exponential-Gamma or as an Exponential-Inverse Gamma), Exponential-Inverse Exponential, Exponential-Inverse Chi Squared and Exponential-Scaled Inverse Chi Squared distributions, depending on the estimated values of the dispersion and shape parameters which are modelled as functions of risk factors. This can be regarded as a very useful property since, as is well known, real non-life insurance datasets are a mix of moderate and large claim amounts. In particular, the EGIG family combined with the proposed modelling framework can provide insurance companies with a useful-from a practical business point of view-tool both to model moderate losses, which correspond to the body area of the distribution, and match the varying tail behaviors of insurance losses from different risk profiles, thus leading to a better risk-adjusted classification of policyholders with similar risk characteristics. For example, suppose that an actuary empirically knows that loss amounts from the moderate-risk profile tend to follow the EIG distribution whereas loss amounts from the high-risk profile tend to follow Pareto distribution; in such cases, the use of either the EIG or the Pareto model might not be able to efficiently approximate the claim severity for the entire dataset. However, a potential distribution misspecification can affect the degree of reliability of predictions and subsequently lead to inaccurate pricing and ratemaking. The EGIG family has the substantial flexibility to overcome these deficiencies. Additionally, unlike the majority of models for insurance losses, our general approach can allow an insurer to determine the distribution of each risk class based not only on the mean parameter, which is traditionally modelled in terms of covariates, but also by using regressors on the dispersion and shaper parameters which describe the shape of the EGIG distribution. Moreover, an insurance company might not only be interested in the predictive mean of each individual claim, but also in the predictive distributions of the individual claims for more effective enterprise risk management. Finally, it is worth noting that our main contribution is to develop an Expectation-Maximization (EM)-type algorithm which exploits the stochastic mixture representation of the EGIG model to maximize its cumbersome likelihood function expressed in terms of the modified Bessel function of the third kind. Our proposed method can also remedy the computational issues which may occur by traditional direct maximization procedures since it may not be possible to obtain numerically reliable direct first and second derivatives of the Bessel function when regression structures are incorporated in its order.
The remainder of this article is organized as follows. In Section 2, we present the construction of EGIG distributions with varying dispersion and shape. Section 3 deals with the maximum likelihood (ML) estimation procedure for our proposed model via the EM algorithm. In Section 4, we describe the Motor Third Party Liability (MTPL) dataset that we use for our empirical analysis, and we provide estimation and model comparison results for the proposed model and various benchmark models. Section 5 discusses the computational issues for the implementation of the EM algorithm wich is used to fit the EGIG regression model with varying dispersion and shape. Finally, concluding remarks are given in Section 6.

The Exponential-Generalized Inverse Gaussian Regression Model with Varying Dispersion and Shape
Consider a non-life insurance portfolio which consists of i = 1, . . . , n policyholders. We describe the average severity as Y i of policyholder i, which is well-defined when there is at least one claim. In practice, there is often no access to the individual claim severities but only to the aggregate claim amounts. Therefore, it is customary to decompose the aggregate claim amounts into two parts: frequency and severity, where the heavy-tail behavior arises from the severity part.
Suppose that, given a continuous random variable Z i > 0, Y i |Z i follows an Exponential distribution with the probability density function (pdf) given by where Note that Y i |Z i denotes the conditional distribution of claim amounts given the latent variable Z i , which accounts for the unobserved heterogeneity in risks.
Let us now assume that Z i are random variables from a Generalized-Inverse Gaussian with a pdf of is the modified Bessel function of the third kind of order ν i with argument ω.
Note that Equation (2) is obtained from a reparameterization of the GIG distribution which was considered by Jorgensen (1982) and Johnson et al. (1994). This parameterization ensures that the model is identifiable since E(Z i ) = 1. Note also that When considering the assumptions in Equations (1) and (2), it is easy to see that the unconditional distribution of Y i will be an Exponential-Generalized Inverse Gaussian (EGIG) distribution with a pdf given by Note that if we let ν = −0.5 in Equation (4), the EGIG distribution reduces to an Exponential-Inverse Gaussian (EIG) distribution. Further, the Pareto distribution can be derived as an Exponential-Gamma or as an Exponential-Inverse Gamma since both distributions are limiting cases of Equation (4), obtained by allowing φ i → ∞ for ν i > 0 and ν i < −1, respectively.
To allow for the mean, dispersion and shape parameters to be modelled as functions of explanatory variables with parametric linear functional forms, we assume that where x 1,i , x 2,i and x 3,i are covariate vectors with dimensions p 1 × 1, p 2 × 1 and p 3 × 1 respectively, with β 1 = β 1,1 , . . . , β 1,p 1 T , β 2 = β 2,1 , . . . , β 2,p 2 T and β 3 = β 3,1 , . . . , β 3,p 3 T the corresponding parameter vectors, and where it is considered that the matrices X 1 , X 2 and X 3 with rows given by x 1,i , x 2,i and x 3,i respectively, are of full rank, for i = 1, . . . , n. Finally, using the moments of the Exponential and GIG distributions, one can easily find that the mean, variance, skewness and kurtosis of Y i are as follows: One can see that the proposed regression model satisfies Var

The EM Algorithm
In this section, an Expectation-Maximization (EM) algorithm (Dempster et al. 1977;McLachlan and Krishnan 2007) is used to facilitate the maximum likelihood (ML) estimation of the EGIG regression model with a varying dispersion and shape, which was described in Section 2. Due to the inherent heavy-tail behaviors of non-life insurance claims, continuous mixture models have been widely used in the modelling of general insurance claims. However, such models tend to have complicated and highly non-convex forms of likelihood, meaning that the naïve maximization of likelihood with well-known optimization routines often suffers from computational instability. In this regard, the use of EM algorithm can be beneficial to analyze non-life data. Let (y i , x 1,i , x 2,i , x 3,i ), i = 1, . . . , n, be a sample of independent observations, where y i is the response variable and x 1,i , x 2,i and x 3,i are the vectors of covariate information with dimensions p 1 × 1, p 2 × 1 and p 3 × 1 respectively. Additionally, consider that the data are produced according to the EGIG model. Then, the log-likelihood of the model can be written as where is the vector of the parameters. The direct maximization of the above function with respect to the vector of parameters θ is complicated to calculate when regression structures are allowed for the mean, dispersion and shape parameters since it would be necessary to differentiate the last two terms in Equation (12) with respect to β 1 , β 2 and β 3 , respectively.
On the other hand, the ML estimation of the model can be achieved via an EM-type algorithm which, as demonstrated in Frangos and Karlis (2004),  and Tzougas and Karlis (2020), is specifically tailored to ML estimation for mixed Exponential models since their stochastic mixture representation involving a non-observable random variable, denoted by z i herein, can be regarded to produce missing data. In the case of the EGIG model, if one augments the unobserved data z i to the observed data (y i , x 1,i , x 2,i , x 3,i ), for i = 1, . . . , n, then the complete data log-likelihood factorizes into two parts as follows: In what follows, at the E-Step of the algorithm, it is necessary to compute the Qfunction, which is the conditional expectation of the complete log-likelihood data, while the M-Step consists of maximizing the Q-function with respect to θ. The Q-function is proportional to the sum of the terms which involve the regression coefficients β where θ (r) is the estimate of θ at the rth iteration in the E-step of our EM type algorithm, . At this point, we would like to call attention to the fact that if then, applying Bayes theorem, one can find that the posterior distribution of z i |y i ; θ is a GIG distribution with a pdf where a i = c i φ i > 0, b i = 1 c i φ i + 2y i µ i > 0 and p i = ν i − 1 ∈ R and where µ i , φ i and ν i are given by Equations (5)-(7), respectively. The above result will enable us to calculate the conditional expectations E z i |y i ; θ (r) , E 1 z i |y i ; θ (r) and E log(z i )|y i ; θ (r) which are involved in Equation (14) and hence are required at the E-Step of the EM algorithm. Furthermore, the following well-known relationships between the modified Bessel functions of the third kind of different orders-see, for example, Abramowitz and Stegun (1965)-will be useful for implementing the M-step of the EM algorithm.
The EM-type algorithm for the EGIG regression model with a varying dispersion and shape can be formally described as follows. •

Empirical Analysis
We conducted an empirical analysis using a sample of claim severity data, which was randomly selected from a pool of 4,381,022 motor third-party liability (MTPL) insurance policies observed during the year 2017 from a major European insurance company. The sample comprised insured parties with complete records; i.e., with the availability of all a priori rating variables under consideration, and with at least one reported accident. There were 9525 observations that met our criteria. The response variable was the cost of claims at fault registered for each insured vehicle in the dataset, and the a priori rating variables we employed were the years that the policyholder had been with the company (YC), the age of their car (AC) and the horsepower (HP) of their car. Furthermore, an exploratory analysis was carried out in order to accurately select the subset of explanatory variables with the highest predictive power for the costs of claims. Additionally, in light of the heterogeneity that existed within the portfolio, we grouped the levels of each a priori rating variable with respect to risk profiles with a similar claim severity. This enabled us to achieve ratemaking accuracy and balance the homogeneity and sufficiency of the volume of data in each cell in order to provide credible patterns. This was necessary, since, under the proposed modelling framework, the mean, dispersion and shape parameters of the Exponential-Generalized Inverse Gaussian (EGIG) distribution were modelled in terms of covariates.

•
The variable YC consisted of three categories of policyholders: those who had been with the company for "less than 4 years" (C1), "between 4 to 8 years" (C2) and "more than 8 years" (C3). • The variable AC consisted of three categories of cars: those with an age "between 0 to 7 years" (C1), "between 7 to 14 years" (C2) and "greater than 14 years" (C3). • The variable HP consisted of three categories of cars: those with a HP of "0-1400 cc" (C1), "1400-1800 cc" (C2) and "greater than 1800 cc" (C3). Table 1 shows brief descriptive statistics for claim severities along with the number of observations in each category of the three explanatory variables. To assess the novelty of the proposed method, the following models were considered as benchmarks besides the proposed model. Note that for all models below µ i , φ i and ν i are defined by Equations (5)-(7).
• Gamma (GA): • Exponential-Inverse Gaussian (EIG): • GIG: EGIG: Defined by Equation (4). Table 2 presents the estimated regression coefficients and the corresponding standard errors in parentheses for the GA, IG, EIG, Pareto, GIG and EGIG models, which are given by Equations (4) and (30)-(34), respectively. Furthermore, with respect to the model selection, Table 2 depicts the deviance (DEV), Akaike information criterion (AIC) and the Bayesian information criterion (BIC) values for all of the fitted models. At this point, it should be mentioned that we used a model selection technique similar to the one considered in Tzougas and Karlis (2020). In particular, we started by selecting the best predictor for the parameter µ i of each claim severity model. This was done by adding all three explanatory variables-YC, AC and HP-and testing whether the exclusion of each one would result in lower DEV, AIC and BIC values. Subsequently, we continued by testing which explanatory variable between those used in parameter µ i would lead to a further decrease of the DEV, AIC and BIC values when inserted into parameters φ i and ν i for each claim severity model. Additionally, if different parameter specifications for the same claim severity model resulted in small discrepancies in the DEV, AIC and BIC values, we opted for the simpler models with fewer predictors for φ i and ν i in order to avoid overfitting. In the above respect, as we can observe from Table 2, the variables YC, AC and HP were in the model equation for µ i , the variables AC and HP were in the model equation for φ i in the case of all claim severity models, and the variable HP was in the model equation for ν i in the case of the GIG and EGIG models. Furthermore, we see that the magnitudes and signs of the estimated regression coefficients of the variables YC, AC and HP for µ i were almost identical across all claim severity models, whereas the values and the effects (positive and/or negative) of the estimated regression coefficients of the variables AC and HP for φ i and the variable HP for ν i varied among the claim severity models. In the following, we see that, due to this discrepancy, the claim severity models were better to compare in terms of their standard deviation values rather than their mean values, which are usually considered in risk classification literature. Additionally, regarding the comparison of the alternative claim severity models based on the model selection criteria, as is well known, a model noticeably outperforms its competitor if the difference in their log-likelihoods exceeds five, corresponding to a difference in their respective AIC values of more than 10 and to a difference in their BIC values of more than five; see Burnham and Anderson (2002) and Raftery (1995) respectively. Thus, as we can see from Table 2, the EGIG model gives the best fit. Finally, the normalized randomized quantile residuals-see Dunn and Smyth (1996)-were used as a graphical tool to help us assess the adequacy of the fit of the competing models. The normalized randomized quantile residuals for these claim severity regression models are defined asr i = Φ −1 (u i ), where Φ −1 is the inverse cumulative distribution function of a standard normal distribution and where u i = F i (y i |θ (r+1) ) where F i is the cumulative distribution function estimated for the ith insured and where θ (r+1) is the vector of the estimated model parameters after the EM algorithm has reached the global maximum and y i is the corresponding observation. The claim severity model fit could be investigated via the usual quantile-quantile plots. In particular, if the data indeed followed the assumed claim severity distribution, then the residual on the quantile-quantile plot would fall approximately on a straight line. From Figure 1, we observe that the mixed Exponential models provided better assumptions than the GA, IG and GIG models since their residuals were close to the diagonal line and also yielded a better performance than that close to the right tail of the claim size distribution. Therefore, as an overall conclusion, it is reasonable to suggest the use of the EGIG model for modelling claim severity in our data set. Table 3 provides the summary of the calculated premiums (Panel A) and standard deviations (Panel B) under each model. From Table 3, we observe that, as previously mentioned, the premiums are almost identical under different distributional assumptions, whereas noticeable discrepancies can be found in the standard deviation values of the claim severity models. Furthermore, we see that the GA model, a usual choice for claim severity, results in the smallest values of the estimated standard deviation, which can be partially explained by the failure of the model to capture the heavy-tail behavior of observed data, as shown in Figure 1. We also notice that the standard deviation cannot be computed for the Pareto model because the maximum value of φ i under the Pareto model is at most 1.765. Therefore, the heavy-tail behavior of the data can be captured by the Pareto model at the expense of losing the feasibility of computing variance. On the other hand, the use of the EGIG model allows us to not only capture the heavy-tail behavior but also to quantify the dispersion at an individual level. Thus, the simultaneous modelling of µ i , φ i and ν i of the EGIG model in terms of the a priori rating of variables is justified because it enables us to use all the available information in the estimation of the variance of the claim severity, which is an important risk measure as it can provide a measure of the uncertainty regarding different risk classes of policyholders, leading to a better risk classification.

Computational Aspects
This subsection discusses the computational issues related to the ML estimation of the EGIG regression model with varying dispersion and shape via the EM algorithm which was presented in Section 3. A rather strict stopping criterion was used, and the EGIG model required quite a large number of EM iterations to converge. In particular, the algorithm iterated between the E and the M-steps until the relative change in the log-likelihood between two successive iterations was smaller than 10 −12 . Also, it should be noted that the choice of sensible initial values for the vectors of regression coefficients β 1 , β 2 and β 3 can influence the speed of convergence of the EM and its ability to locate the global maximum. We obtained good starting values for β 1 by fitting the Exponential regression. Alternatively, the initial values could be obtained based on the data as follows: (i) calculating E(Y i ) = µ i , with i = 1, . . . , n-see Equation (8)-for the different risk classes, which could be formed by dividing the portfolio into clusters defined by the combinations of the available explanatorily variables and (ii) assuming a log-link function for µ i -see Equation (5)-and solving Equation (5) with respect to β 1 , since, under the parameterization method we adopted, the mean is an explicit parameter of the EGIG model. Furthermore, meaningful initial values for the regression parameters β 2 and β 3 were obtained by (i) calculating Var(Y i ), skewness(Y i ) and kurtosis(Y i )-see Equations (9)-(11)-for the different risk classes based on all observations i = 1, . . . , n, (ii) by calculating E(Y i ) = µ i with i = 1, . . . , n for the different risk classes (or alternatively computing µ i , based on the initial values for β 1 and on the log-link function given by Equation (5)) and using the log-link function for φ i -see Equation (6)-and the identity function for ν i -see Equation (7)-and so we found the values which satisfied Equations (9)-(11). Additionally, the standard errors of the regression coefficients were obtained using the standard method in Louis (1982). All computing was performed using the programming language R. Finally, the EIG and Pareto regression models with varying dispersion were fitted using the EM algorithm, which was considered in Tzougas and Karlis (2020), while the GA and IG regression models with varying dispersion and the GIG regression model with varying dispersion and shape were estimated using the generalized additive models for the location, scale and shape (GAMLSS) package in R; see Stasinopoulos et al. (2008)

Concluding Remarks
In this paper, we proposed an EM-type algorithm to estimate the parameters of the EGIG regression model with varying dispersion and shape. The Exponential-Generalized Inverse Gaussian is a wide and flexible model class which may fit both moderate and large claims very well based on the simultaneous modelling of its mean, dispersion and shape parameters in terms of risk factors. In this respect, the model can enable an actuary to accurately determine the distribution of each risk class and thus efficiently carry out different tasks such as computing premiums and reserves and measuring tail risk. Thus, our general approach can provide an insurance company with an advantage relative to previous approaches that have been considered in the literature concerning heavytailed losses. Furthermore, it is worth mentioning that the novel EM-type algorithm we developed can reduce the computational burden for the ML estimation of the model, which has a cumbersome density; meanwhile, it is not chronologically demanding and can avoid overflow problems which may occur via other numerical maximization schemes. Additionally, it is worth noting that Gómez-Déniz et al. (2013) introduced the Gamma-Generalized Inverse Gaussian (GAGIG) family of models, gave an excellent account of its statistical properties and considered estimation methods for cases without covariates and the case in which a regression component was introduced in the model. Therefore, since the GAGIG can be regarded as a natural extension of the proposed EGIG model, an interesting line of further research would be to extend the setup of the GAGIG model to allow for regression specifications on every parameter. Finally, it is worth mentioning that a potential future research direction would be to model different types of claims jointly, using a multivariate extension of the EGIG regression model with varying dispersion and shape through copula constructions. and h 3 (β 3 ) = ∂ ∂β 3,j