An EM Algorithm for Double-Pareto-Lognormal Generalized Linear Model Applied to Heavy-Tailed Insurance Claims

Generalized linear models might not be appropriate when the probability of extreme events is higher than that implied by the normal distribution. Extending the method for estimating the parameters of a double Pareto lognormal distribution (DPLN) in Reed and Jorgensen (2004), we develop an EM algorithm for the heavy-tailed Double-Pareto-lognormal generalized linear model. The DPLN distribution is obtained as a mixture of a lognormal distribution with a double Pareto distribution. In this paper the associated generalized linear model has the location parameter equal to a linear predictor which is used to model insurance claim amounts for various data sets. The performance is compared with those of the generalized beta (of the second kind) and lognorma distributions.


Introduction
Heavy-tailed distributions are an important tool for actuaries working in insurance where many insurable events have low likelihoods and high severities and the associated insurance policies require adequate pricing and reserving.In such cases the four-parameter generalized beta distribution of the second kind (GB2) and the three-parameter generalized gamma distribution fulfil this purpose, as demonstrated in McDonald (1990), Wills et al. (2006), Frees and Valdez (2008), Wills et al. (2006), Frees et al. (2014a) and Chapter 9 of Frees et al. (2014b).In fact, the set of possible distributions that could be used for long-tail analyses is much broader than suggested here and good references for these are Chapter 10 of Frees et al. (2014a), Chapter 9 of Frees et al. (2014b) and Section 4.11 of Kleiber and Kotz (2003).We propose in this article the use of the double Pareto lognormal (DPLN) distribution as an alternative model for heavy-tailed events.
The DPLN distribution was introduced by Reed (2003) to model the distribution of incomes.It occurs as the distribution of the stopped wealth where the wealth process is geometric Brownian motion, the initial wealth is lognormally distributed and the random stopping time is exponentially distributed.This parametric model exhibits Paretian behaviour in both tails, among other suitable theoretical properties, and there is favourable evidence of its fit to data in various applications, as demonstrated in Colombi (1990), Reed (2003), Reed and Jorgensen (2004) and Hajargasht and Griffiths (2013) for income data and Giesen et al. (2010) for settlement size data.Particular applications of the DPLN distribution to insurance and actuarial science have previously been given in Ramírez-Cobo et al. (2010) and Hürlimann (2014).
In this paper, the DPLN generalized linear model (GLM) is introduced by setting the location parameter equal to a linear predictor, i.e., ν = β x, where β ∈ R d is a vector of regression coefficients and x is a vector of explanatory variables.Then the mean of the DPLN GLM is proportional to some exponential transformation of the linear predictor.Then an EM algorithm is developed which solves for the regression parameters.
Particular applications of the DPLN distribution to insurance and actuarial science have previously been given in Ramírez-Cobo et al. (2010) and Hürlimann (2014).However, another practical application of the DPLN GLM, beyond what is demonstrated in this article, is assessing variability of survival rates of employment, as done in Yamaguchi (1992).
Applying this generalized linear model, we model the claim severity for private passenger automobile insurance and claim amounts due to bodily injuries sustained in car accidents, the data sets of which are supplied in Frees et al. (2014a).We compare this predictive model with the generalized linear model derived from the generalized beta distribution of the second kind (GB2), which has been employed in modelling incomes, for example by McDonald (1990), and in modelling insurance claims, for example by Wills et al. (2006) and Frees and Valdez (2008).The EM algorithm has previously been applied to insurance data, for example in Kocović et al. (2015) , where it is used to explain losses of a fire insurance portfolio in Serbia.
The rest of the paper is organized as follows.Section 2 explains how the DPLN model is applied to regression by setting the location parameter equal to a linear predictor.In Section 3, details of parameter estimation by the method of maximum likelihood using the related normal skew-Laplace (NSL) distribution are provided, where we develop an EM algorithm for such a purpose.Section 4 gives numerical applications of the models to two insurance-related data sets and makes comparisons of fits with the LN and GB2 distributions using the likelihood ratio test and another test for non-nested models due to Vuong (1989).Out-of-sample performances of the models in regards to capital requirements of an insurer are also provided.Section 5 concludes.

DPLN Generalized Linear Model
As mentioned previously, the DPLN distribution can be obtained as a randomly stopped geometric Brownian motion whose initial value is lognormally distributed.Therefore, without any mathematical analysis, stopping the geometric Brownian motion at the initial time with probability one will give the lognormal distribution.If the diffusion coefficient of the geometric Brownian motion is set to zero, we have a deterministic geometric motion which is stopped at an exponentially distributed time, giving the PLN distribution.If also the drift coefficient of the geometric Brownian motion is set to zero then the lognormal distribution results.Another degenerate case emerges when the initial value is constant, that is, when its variance is zero, giving the Pareto distribution.This gives us a sensible intuition on the mathematical derivations in this section.
Formally, given a filtered probability space (Ω, F , (F t ) t≥0 , P), where Ω is the sample space, F is the σ-algebra of events, (F t ) t≥0 is the filtration of sub-σ-algebras of F and P is a probability measure, we consider the adapted stochastic process Y specified by the following stochastic differential equation (SDE), for t ≥ 0, where µ and σ ≥ 0 are constants and W is a Wiener process adapted to the filtration (F ) t≥0 .
Then for a fixed time t ≥ 0, the random variable Y t can be written as Now if Y 0 is a random variable, dependent upon the vector of predictor variables x = (x 1 , x 2 , . . ., x d ) ∈ R d , such that Y 0 ∼ N(ν, τ 2 ), where ν = β x, and if we stop the process Y randomly at time t = T, where T ∼ Exp(λ), then Y T is a normal skew Laplace (NSL) distributed random variable regressed on the vector of predictors x, that is the exponential of which is a DPLN distributed random variable V T (see Reed and Jorgensen (2004) for more details) dependent on the same predictors, namely where ν = β x.As indicated previously, the particular case of the PLN distribution arises when σ = 0 in (1) and the case of the lognormal distribution (LN) arises when µ = 0 and σ = 0 in (1).
The moment generating function (MGF) of Y T is where T 1 ∼ Exp(λ 1 ) and T 2 ∼ Exp(λ 2 ) are exponentially distributed random variables with as given in Reed and Jorgensen (2004).This product of MGFs demonstrates that the NSL distributed random variable Y T can be expressed as the sum where log X 0 ∼ N(ν, τ 2 ) and T 1 and T 2 are as above.
The probability density function (PDF) of V T = exp(Y T ), as in (4), is given by also given in Reed (2003).Because we will work with logarithms of DPLN variates we will make use of the PDF of Y T given by where Φ(•) and Φ c (•) represent the cumulative distribution function and survival function of the standard normal distribution respectively.Additional properties concerning the moments of the DPLN and asymptotic tail behaviour can be found in Reed (2003).When we incorporate a vector of explanatory variables (or covariates) x ∈ R d in our model, the location parameter ν is set equal to β x, where β ∈ R d is a vector of regression coefficients, and the mean of the response variable is given by where τ, λ 2 > 0 and λ 1 > 1.Each regression coefficient can be interpreted as a proportional change in the mean of the response variable per unit change in the corresponding covariate.For a random sample coming from and corresponding vectors of covariates x (1) , x (2) , . . ., x (n) , we will use the maximum likelihood estimation described in the following section to compute the parameters of the DPLN distribution.

Methods of Estimation
Given the random sample in (11) and corresponding vectors of covariates, there are several ways of estimating parameters, such as moment matching, where such moments exist, and maximum likelihood estimation (MLE).As we are dealing with heavy-tailed distributions, moment matching may not be possible and we therefore resort to maximum likelihood estimation.Maximum likelihood estimators are also preferable since for large samples the estimators are unbiased, efficient and normally distributed.The EM algorithm of Dempster et al. (1977) is one approach to performing MLE of parameters, which we describe in the next section.Another approach is based on the gradient ascent method which we discuss in a subsequent section.

The EM Algorithm for the DPLN GLM
Our task is to obtain maximum likelihood estimates of the parameters of the model DPLN(ν, τ 2 , λ 1 , λ 2 ) using the EM algorithm, which was developed in Dempster et al. (1977).Because an NSL random variable is the logarithm of a DPLN random variable, fitting a DPLN distribution to the observations in (11) is the same as fitting the NSL distribution to the logarithms y 1 , y 2 , . . ., y n of these observations.The EM algorithm starts from an initial estimate of parameter values and sequentially computes refined estimates which increase the value of the log-likelihood function.In the following paragraphs we explain how it is applied to the DPLN distribution.
Suppose that θ = (β, τ 2 , λ 1 , λ 2 ) is an initial estimate of the parameters of the distribution of the random variable Y whose density function is f Y , given in (9).Let θ denote a refined estimate of the parameters of the distribution , that is, an estimate for which the log-likelihood function exceeds that of the initial estimate θ.In what follows, we demonstrate how to generate a refined estimate for the DPLN GLM.For the refined estimate θ , we can write the log-likelihood function as where f Y (y) is the PDF of the random variable Y, f Y,Z (y, z) is the joint density function of the random variables (Y, Z) and where the random variable Z is latent and therefore unobserved.In our case, the random variable Z is a normally distributed random variable having parameters ν and τ 2 , as indicated by the random variable log X 0 in (7).
We now give the probability density function g i , for each i = 1, 2, . . ., n, for the conditional random variable Z|Y = y i which only depends on the initial estimate θ of parameters, and not on θ , namely We then rewrite (12) as and applying Jensen's inequality log (15) So our maximization of likelihood amounts to maximizing with respect to θ , which is the M-step or maximization-step of the EM algorithm.
In practice, there is an E-step or expectations-step of the algorithm which is performed prior to the M-step, however we continue with the M-step in the next section because this identifies the variables whose expectations are to be computed in the E-step.

M-Step
So we need to maximize (16) with respect to the parameter θ .We show how this is done for the double Pareto lognormal distribution by expanding out the terms in (16), giving which becomes where We arrive at the following Theorem giving the optimum parameter vector θ and whose proof follows the reasoning for the simpler case without explanatory variables given in Reed and Jorgensen (2004).
Theorem 1.The components of the parameter vector θ which maximise (16) are where and X is the matrix of predictor variables Proof.See Appendix A.1.

E-Step
Here we compute the conditional distributions which are used in the E-step.For the set of n logarithms y 1 , . . ., y n of observations, the maximum likelihood estimates of the parameters can be obtained using the EM algorithm with as follows, where the density functions f Z , f W and f Y are defined as Our E-step of the EM algorithm is given in the following Theorem, most of the equations of which are mentioned in Reed and Jorgensen (2004) but for which we supply explicit proofs.

Standard Errors
The standard errors of the estimates θ = ( β , τ2 , λ1 , λ2 ) can be estimated in the last iteration of the EM algorithm, as shown in Louis (1982).The observed Fisher information matrix evaluated at θ based on the observations {v i } n i=1 can be approximated by In particular, and therefore these expressions are available in the last iteration of the EM algorithm.

Gradient Ascent Method
The gradient ascent method is applied to the likelihood function of the normal skew Laplace distribution in this subsection.Let be y = (y 1 , ..., y n ) be a random sample of size n from a NSL distribution.Its log-likelihood function is: The solutions of the d + 3 score equations that are shown in the Appendix, provide the maximum likelihood estimates of λ 1 , λ 2 , τ and {β j } j=1,...,d , which can be obtained by numerical methods such as Newton-Raphson algorithm.Alternatively, parameter estimates can be obtained directly via a grid search for the global maximum of the log-likelihood surface given by ( 29), or equivalently by maximizing the log-likelihood function derived from the expression (8).We have used FindMaximum function of Mathematica software package v.11.0.Since the global maximum of the log-likelihood surface is not guaranteed, different initial values of the parametric space can be considered as seed point using different methods of maximization, such as Newton-Raphson method, Principal Axis method and the Broyden-Fletcher-Goldfarb-Shanno algorithm (BGGS), among others.The standard errors of the estimates have been approximated by inverting the Hessian matrix and the relevant partial derivatives can be approximated well by finite differences.

Numerical Applications
In this section, two well-known data sets in the actuarial literature that can be downloaded from Professor E. Frees' personal website 1 will be considered to test the practical performance of the DPLN generalized linear model.For the two data sets considered, the EM algorithm for the DPLN GLM was stopped when the relative change of the log-likelihood function was smaller than 1 × 10 −10 .The initial values were calculated by using the estimates of the lognormal GLM and the estimates of the parameters λ 1 and λ 2 for the model without covariates.
Because we are comparing the DPLN generalized linear model with the GB2 GLM, we give here some rudimentary facts concerning the GB2 distribution.Let Z be a random variable having the Beta(p, q) distribution, for p, q ∈ (0, ∞), as defined in Chapter 6 of Kleiber and Kotz (2003).Then, for ν ∈ (−∞, ∞) and τ ∈ (0, ∞), the random variable has the GB2 distribution and its probability density function can be written as where v ∈ (0, ∞), ν is a location parameter, τ > 0 is a scale parameter, p > 0 and q > 0 are shape parameters and is the Beta function.As for the aformentioned distributions, to include explanatory variables in the model, we let the location parameter be a linear function of covariates, i.e., ν = β x.The k-th moment is easily seen to be where k ∈ (−p/τ, q/τ), and looking at the case k = 1 we can interpret each of the regression coefficients β i , i = 1, . . ., d, as being the proportional sensitivity of the mean to the corresponding covariate.Further details of this model can be found in Frees et al. (2014a).Parameter estimation for the GB2 GLM has been performed via a grid search for the global maximum of the log-likelihood surface associated to this model.We have used FindMaximum function of Mathematica software package v.11.0.

Example 1: Automobile Insurance
The first data set pertains to claims experience from a large midwestern (US) property and casualty insurer for private passenger automobile insurance.
The dependent variable is the amount paid on a closed claim, in US$.The sample includes 6773 claims.The following explanatory variables have been considered to explain the claims amount: • GENDER, gender of operator, takes the value 1 if female and 0 otherwise; • AGE, age of operator; • CLASS rating class of operator as coded in Table 1.
In the top part of Figure 1 the histogram of the Automobile insurance claims is exhibited in logarithmic scale.This dataset is quite symmetrical but it presents a slightly longer lower tail.For that reason the DPLN distribution seems suitable to explain this dataset.

Model Without Covariates
Here, for comparison purposes only, the lognormal, DPLN and GB2 distributions will be used to describe the total losses (e.g., when explanatory variables are not considered).Firstly, the automobile insurance claims dataset is examined.Table 2 summarizes parameters estimates obtained by maximum likelihood with corresponding standard errors (in brackets) for the aforementioned distributions.
In respect of model selection, we also provide the negative of the maximum of the log-likelihood (NLL), Akaike's information criterion (AIC) and Bayesian information criterion (BIC) results in the table.Note that for all three measures of model validation, smaller values indicate a better fit of the model to the empirical data.As expected, the lognormal distribution exhibits the worst performance in terms of all three measures of model validation.In the top part of Figure 2, we have superimposed the log transformation of these three distributions to the empirical distribution of the log of the claims sizes to test the fit in both tails.It is evident that the log transformation of the lognormal distribution (black curve), i.e., Normal distribution (N), provides the worst fit due to asymmetry of the data.The logGB2 (blue curve) and NSL distributions (red curve) give better fit to data as measured by the NLL, AIC and BIC, although the latter model adheres closely to the data.Although it is not shown in Table 2, the PLN distribution, replicates the fit of the LN distribution and the value of the shape parameter that controls the right tail tends to infinity.The computing times (CT) in seconds to estimate the maximum likelihood estimates by directly maximizing the log-likelihood surface for these distributions are shown in the last row of the table.The computing time of the EM algorithm for the DPLN GLM was 1145.86 s using the stopping criterion that the relative change of the log-likelihood function be less than 1 × 10 −4 .

Comparison of Estimation from Simulations
In this section we compare the methods of estimating parameters by conducting the following simulation experiment.For the lognormal, DPLN and GB2 distributions, we simulate values based on the corresponding parameter estimates given in Table 2, and then, using as appropriate either standard formulae or the EM algorithm, compute parameter estimates from the 1000 simulated data sets of size N = 100, 200, 300, 400, 500, 1000.The results are shown in Table 3, where it is evident that increasing the sample size increases the accuracy of the parameter estimate.Of course, the true parameter values are given in Table 2, and these are the limits of the estimates as the sample size N increases.Importantly, the standard errors of the parameter estimates in Table 3 are noticeably smaller for the DPLN distribution, highlighting the consistency of the parameter estimation for the DPLN model.It is noteworthy to observe that the parameter estimates of the GB2 distribution are unstable for small sample sizes, which is not the case for the DPLN model.This highlights an advantage of the DPLN model over the GB2 in this case.

Including Explanatory Variables
Making use of the above additional information, we aim to better explain the total losses in terms of the set of covariates by using the DPLN generalized linear model.For the purpose of comparison, we have also fitted the lognormal and GB2 generalized linear models.Here, we choose the identity link function for the location parameter.
From left to right in Table 1, the parameter estimates, standard errors (S.E.) and the corresponding p-values calculated based on the t-Wald statistics for the LN, GB2 and DPLN generalized linear models are displayed for the automobile insurance claims dataset.The AIC and BIC values for each model are provided in the last two rows of the table.For the i-th claimant, the number of total amount y i follows (10) whose mean depends on the above set of covariates through the identity link function.The exponential of INTERCEPT coefficient 7.260 is proportional to the predicted loss amount when the values of the other explanatory variables are equal to 0. This estimate is statistically significant at the usual significance levels, i.e., 5% and 1%.In total, the estimates of 10 out of 23 parameters for the DPLN generalized linear models are statistically significant at the usual levels (i.e., 5% and 1%) including the scale and shape parameters.The results for the LN and GB2 generalized linear model are also exhibited in Table 1 to compare their behaviour with DPLN generalized linear model.As it can be seen the fit provided by the DPLN generalized linear model improves the one provided by GB2.For the DPLN generalized linear model, parameters were estimated by the method of maximum likelihood by maximizing the log-likelihood surface.The same estimates were achieved by the EM algorithm described in Section 3.2.The standard errors of the parameter estimates for the DPLN GLM were computed from the last iteration of the EM algorithm and also approximated by inverting the Hessian matrix.Similar values were obtained.The computing times in seconds to estimate the maximum likelihood estimates by directly maximizing the log-likelihood surface for these generalized linear models are shown in the last row of the table.The DPLN GLM shows a better performance than the GB2 counterpart.The computing time of the EM algorithm for the DPLN GLM was 2239.24s using the stopping criterion that the relative change of the log-likelihood function be less than 1 × 10 −4 .

Model Validation
Now, we analyze model validation from a practical perspective.In this regard, LN generalized linear model can be seen as a limiting case of DPLN generalized linear model when both λ 1 and λ 2 tend to infinity.We are interested, by means of the likelihood ratio test, in determining whether the LN generalized linear model (null hypothesis) is preferable to DPLN generalized linear model (alternative hypothesis) in describing these datasets.The test statistic is T = 2 ( LN − DPLN ) where LN and DPLN represent the maximum of the log-likelihood function for the LN and DPLN generalized linear models respectively.Asymptotically, under certain regularity conditions (see for example Lehmann and Casella (1998)) T follows a chi-square distribution with two degrees of freedom.We have that T = 2 (−57179.69+ 57155.87)= 50.02,therefore the larger model (DPLN) is preferable to the smaller (LN) generalized linear model at the usual significance levels, i.e., 5% and 1% (p-value less than 0.0001).
Next, the likelihood ratio test proposed by Vuong (1989) for non-nested models will be considered as a tool for model diagnostic.The test statistic is where is the sample variance of the pointwise log-likelihood ratios and f and g represent the probability density function (pdf) of two different non-nested models, θ 1 and θ 2 are the maximum likelihood estimates of θ 1 and θ 2 and n f and n g are the number of estimated coefficients in the model with pdf f and g respectively.Note that the Vuong's statistic is sensitive to the number of estimated parameters in each model and therefore the test must be corrected for dimensionality.Under the null hypotheses, H 0 : E[ f ( θ 1 ) − g ( θ 2 )] = 0 and T is asymptotically normally distributed.At the 5% significance level, the rejection region for this test in favor of the alternative hypothesis occurs when T > 1.96.Now we compare the GB2 and DPLN generalized linear models in terms of Vuong's test.Under the null hypothesis the two models are equally close to the true but unknown specification.For our data set, the value of the test statistic is T = 1.00 and we fail to reject H 0 , and therefore differences between these two models do not exist.

Example 2: Automobile Bodily Injury Claims
The second data set deals with automobile bodily injury claims sourced from the Insurance Research Council (IRC), a division of the American Institute for Chartered Property Casualty Underwriters and the Insurance Institute of America.The data, collected in 2002, contains demographic information about the claimants, attorney involvement and the economic losses (in thousands of US$), among other variables.As some of these explanatory variables contain missing observations, we only consider those data items having no missing values, resulting in a sample of 1,091 losses from a single state.We use as the response variable the claimant's total economic loss.Also, additional information is available to explain the claimants' total economic losses.We employ the following factors as covariates in our model fitting: • ATTORNEY, takes the value 1 if the claimant is represented by an attorney and 0 otherwise; • CLMSEX, takes the value 1 if the claimant is male and 0 otherwise; • MARRIED, takes the value 1 if the claimant is married and 0 otherwise; • SINGLE, takes the value 1 if the claimant is single and 0 otherwise; • WIDOWED, takes the value 1 if the claimant is widowed and 0 otherwise; • CLMINSUR, whether or not the claimant's vehicle was uninsured (= 1 if yes and 0 otherwise); • SEATBELT, whether or not the claimant was wearing the seatbelt/child restraint ing belt's vehicle was uninsured (= 1 if yes and 0 otherwise); • CLMAGE, claimant's age.
The empirical distribution of this variable combines losses of small, moderate and large sizes which makes it suitable for fitting heavy-tailed distributions.It has other features such as unimodality, skewness and a long upper tail, indicating a high likelihood of extremely expensive events.In the bottom part of Figure 1 the histogram of the response variable of this data set is shown again in logarithmic scale.A heavy lower tail is evident when this scale is used.

Model Without Covariates
The results for the bodily injury claims data are shown in Table 4.The GB2 and DPLN distributions give the best fit to data as measured by these three measures of model selection.As expected, the LN distribution has the worst performance due to the asymmetry of the data.Again, although it is not shown in Table 4, the three-parameter PLN model replicates the LN distribution.This is due to the fact that the former model is a limiting case of the latter when shape parameter λ 1 tends to infinity.These results are also supported by the bottom part of Figure 1, where it can be seen that the log transformation of the GB2 distribution LogGB2 (blue curve) and the NSL distribution (red curve) provide almost an identical fit to data.The MLEs for the DPLN distribution were obtained by using the EM algorithm whose starting parameter values λ 1 , λ 2 , ν and τ are those obtained by moment-matching the first four cumulants.These MLEs were confirmed by those obtained directly from maximizing the log-likelihood surface.The computing times in seconds to estimate the maximum likelihood estimates by directly maximizing the log-likelihood surface for these distributions are shown in the last row of the table.The computing time of the EM algorithm for the DPLN GLM was 1,322.81seconds using the stopping criterion that the relative change of the log-likelihood function be less than 1 × 10 −4 .In this section we compare the methods of estimating parameters by conducting the following simulation experiment.For the lognormal and DPLN distributions, we simulate values based on the corresponding parameter estimates given in Table 4, and then, using as appropriate either standard formulae or the EM algorithm, compute parameter estimates from the 1000 simulated data sets of size N = 100, 200, 300, 400, 500, 1000.The results are shown in Table 5, where it is evident that increasing the sample size increases the accuracy of the parameter estimate.Of course, the true parameter values are given in Table 4, and these are the limits of the estimates as the sample size N increases.Importantly, the standard errors of the parameter estimates in Table 5 are noticeably smaller for the DPLN distribution, highlighting the consistency of the parameter estimation for the DPLN model.However, in attempting to simulate values from the GB2 distribution, calculation of the inverse CDF via the expression in ( 30) is highly unstable for simulated values of the Beta(p, q) random variable Z which are close to unity.

Including Explanatory Variables
Table 6, displays the same results for the automobile injury claims dataset.For the i-th policyholder, the number of total amount y i follows (10) whose mean depends on the above set of covariates through the identity link function.The exponential of INTERCEPT coefficient 1.023 is proportional to the predicted loss amount when the values of the other explanatory variables are equal to 0. In view of its low p-value, this estimate is statistically significant at the usual significance levels, 5% and 1%.On the other hand, the indicator ATTORNEY is statistically significant at the usual nominal levels, whereas the gender and marital status of the claimant, except that the explanatory variable SINGLE, are not significant at the 5% significance level.Similarly, the fact that the vehicle was uninsured is not relevant in the investigation.Both claimant's age and usage of seatbelt/child restraint are highly significant.Three more parameters affect the calculation of the predicted mean: the parameter τ, which is also highly significant, and shape parameters λ 1 and λ 2 .All these three parameters are highly statistically significant at the usual nominal levels, 5% and 1%.For the sake of comparison the results for the LN and GB2 generalized linear model are displayed in Table 6.As it can be observed the fit provided by the GB2 generalized linear model is only marginally better than the DPLN generalized linear model.For the DPLN generalized linear model, parameters were estimated by the method of maximum likelihood by using log-transformed data and the NSL distribution.The maximum of the log-likelihood function was −1753.07 and it was achieved after considering different initial values of likelihood surface by using the FindMaximum function of Mathematica software package v.11.0.Similar estimates were obtained by means of the EM algorithm described in Section 3.2.In this case the same value was obtained for the maximum of the log-likelihood function of the NSL GLM.The standard errors of the parameter estimates for the DPLN GLM have been approximated by inverting the Hessian matrix and also from the last iteration of the EM algorithm.Similar values were obtained.The computing times in seconds to estimate the maximum likelihood estimates by directly maximizing the log-likelihood surface for these generalized linear models are shown in the last row of the table.The DPLN GLM shows a better performance than the GB2 counterpart.The computing time of the EM algorithm for the DPLN GLM was 142.57seconds using the stopping criterion that the relative change of the log-likelihood function be less than 1 × 10 −4 .As done for our first example, we analyze model validation from a practical perspective.The test statistic is T = 2 ( LN − DPLN ) where LN and DPLN represent the maximum of the log-likelihood function for the LN and DPLN generalized linear models respectively and T asymptotically follows a chi-square distribution with two degree of freedom.For the automobile bodily injury claims data set, it is verified that T = 2 (−2450.54+ 2430.02)= 20.52.Then, at the usual significance levels (i.e., p-value is less than 0.0001), the null hypothesis is clearly rejected and consequently, the smaller regression (LN) is rejected in favour of the model based on the DPLN distribution.
Also, as done for our first example, Vuong's test statistic is T = 1.00 for our second data set and we fail to reject H 0 , and therefore differences between these two models do not exist.

Log-Residuals for Assessing Goodness-of-Fit
In the following we consider the log-residuals for assessing the goodness-of-fit of the proposed models for the two datasets considered.As the population moments of order higher than two cannot be derived neither for the DPLN (i.e., λ 1 < 2) nor the GB2 (i.e., the condition p < 2τ < q is not satisfied in none of the datasets) distribution for the automobile bodily injury claims dataset, we have not examined the Pearson's type residual.In Figure 2, one can see the QQ-plot of the log-residuals for LN, GB2 and DPLN generalized linear models for the automobile insurance claims (left hand side) automobile bodily injure claims data set (right hand side).The alignment along the 45-degree line is better in both the DPLN and GB2 generalized linear models in the central part and both tails of the distribution of the residuals as compared to the LN generalized linear model for the two datasets analyzed.

Out-of-Sample Validation of Models
We demonstrate the abilities of the models to predict portfolio losses out-of-sample with probability-probability plots shown in Figure 3.The data set {v 1 , v 2 , . . ., v n } is partitioned into two halves by sorting the claim sizes ascendingly, that is, we write the data set as where i 1 , . . ., i n ∈ {1, 2, . . ., n}, and then two data sets are formed, A = {v i 1 , v i 3 , v i 5 , . ..} and B = {v i 2 , v i 4 , v i 6 , . ..}, alternating the data set to which each claim data item in the ordered data set is allocated.In this way the second data set is a good representation of the first data set in respect of claim distribution, but not necessarily in respect of the corresponding covariates, i.e., {x (i 2 ) , x (i 4 ) , . ..} may not be representative of {x (i 1 ) , x (i 3 ) , . ..}.The data set A is used for fitting the models, whereas the data set B is used for graphing the probability-probability plots.In Figures 4 and 5 we focus on the lower and upper tails of the distributions respectively, where it is evident that the DPLN and GB2 models provide the best fit and are almost indistiguishable.
In Figure 6 the net losses under the various models are shown, where, as before, we have used half of the data for fitting the models and the other half for computing the net losses and maximum probable losses based on the 99.5-th percentile.It is evident that the DPLN and GB2 models give a higher computed maximum probable loss than for the LN distribution, and thus illustrates the ability of these models to provide adequate solvency levels when extreme claims are experienced by the insurer's portfolio.

Conclusions
In this paper, the DPLN generalized linear model was developed and fitted to two data sets, these being private passenger automobile insurance claims data and automobile bodily injury claims data.Several covariates pertaining to various attributes of insurance claimants were combined in the linear predictor of the location parameter ν, and were chosen because of their anticipated effect on claim size.This model exhibits Paretian behaviour in both tails and it is shown to provide fits to the two data sets which are comparable to those of the GB2 distribution.
The parameters of the DPLN generalized linear model were estimated via the EM algorithm and independently confirmed by maximizing the log-likelihood surface of the closely related Normal Laplace generalized linear model.The performance of the DPLN model has been compared with the lognormal distribution, a limiting case of the DPLN distribution, and the GB2 generalized linear model according to different model selection criteria.In view of the results obtained, we have found that the proposed DPLN generalized linear model is a valid alternative to other parametric heavy-tailed generalized linear models such as the GB2 GLM.
Potential practical applications of the DPLN GLM, beyond what is demonstrated in this article, include predicting mortality rates for lives where the covariates of the GLM are age, sex, occupation, etc. and predicting hazard rates in reduced-form credit risk models.These will be considered in further work.

Figure 1 .
Figure 1.Empirical distribution of the logarithm of automobile insurance claims (above) and logarithm of automobile bodily injury claims (below).The log transformation of the LN (N, black), DPLN (normal skew-Laplace (NSL), red) and GB2 (LogGB2, blue) distributions have been superimposed.

Figure 2 .
Figure 2. QQ-plots of the log-residuals for LN (above), GB2 (middle) and DPLN (below) generalized linear models for automobile insurance claims data (left panel) and automobile bodily injury claims (right panel).

Table 1 .
Parameter estimates, standard errors (S.E.) and p-values of the t-test for automobile insurance claims dataset under lognormal distribution (LN), generalized beta distribution of the second kind (GB2) and double Pareto lognormal distribution (DPLN) generalized linear models.

Table 2 .
Model fitting results of the LN, GB2 and DPLN distributions regarding automobile insurance claims.

Table 3 .
Results of the simulation experiment involving 1000 simulations of data sets of size N, with standard errors shown in brackets.

Table 4 .
Results of fitting the LN, GB2 and DPLN distributions to automobile bodily injury claims data.

Table 5 .
Results of the simulation experiment involving 1000 simulations of data sets of size N, with standard errors shown in brackets.

Table 6 .
Parameter estimates, standard errors (S.E.) and p-values of the t-test for automobile bodily injury claims dataset under LN, GB2 and DPLN generalized linear models.