Bivariate Mixed Poisson and Normal Generalised Linear Models with Sarmanov Dependence—An Application to Model Claim Frequency and Optimal Transformed Average Severity

: The aim of this paper is to introduce dependence between the claim frequency and the average severity of a policyholder or of an insurance portfolio using a bivariate Sarmanov distribution, that allows to join variables of different types and with different distributions, thus being a good candidate for modeling the dependence between the two previously mentioned random variables. To model the claim frequency, a generalized linear model based on a mixed Poisson distribution -like for example, the Negative Binomial (NB), usually works. However, ﬁnding a distribution for the claim severity is not that easy. In practice, the Lognormal distribution ﬁts well in many cases. Since the natural logarithm of a Lognormal variable is Normal distributed, this relation is generalised using the Box-Cox transformation to model the average claim severity. Therefore, we propose a bivariate Sarmanov model having as marginals a Negative Binomial and a Normal Generalized Linear Models (GLMs), also depending on the parameters of the Box-Cox transformation. We apply this model to the analysis of the frequency-severity bivariate distribution associated to a pay-as-you-drive motor insurance portfolio with explanatory telematic variables.


Introduction
Calculating premiums is a fundamental task for an insurance company. To this purpose, a simple procedure consists of considering the aggregate claims as the product of the random variable (r.v.) number of claims and of the r.v. average cost of these claims, then of fitting appropriate distributions to these two random variables; if, moreover, the premium is evaluated for a given policyholder, some of its characteristics are often included in the calculation as covariates in Generalized Linear Models (GLMs) used for both variables. Then the pure premium is obtained as the product of the means of the number of claims and of the average claim cost, procedure that implies assuming that the two r.v.s are uncorrelated; however, there are evidences in the literature showing how, in practice, the correlation between both variables is not zero (see References [1][2][3][4] for some illustrations using GLM). Therefore, in this paper, we shall assume that the two just mentioned r.v.s are dependent and jointly distributed using a bivariate Sarmanov distribution with GLM marginals.
Because it provides a flexible structure in joining different types of marginals, the bivariate Sarmanov distribution has recently found its place in actuarial studies likemodeling continuous claim costs [5], modeling discrete claim frequencies [1,6], modeling dependence between discrete frequency and continuous claims [7], ruin probability calculation [8,9] or modeling telematic variables [10].
The traditional approach to modeling count data by means of Poisson regression failed due to over or under dispersion of the data. Therefore, in practice, count data models based on mixed Poisson distributions are used for the number of claims (or frequency) (see Reference [11]. As an alternative, generalized Poisson regression models have also been considered in the literature (see Reference [12]). Specifically, for this variable, in this paper a GLM based on the Negative Binomial distribution will be used.
To model the claim cost r.v. (severity), Gamma and Lognormal distributions are the most common choices (see Reference [7] for comparing both cost distributions in a real motor data set). However, it may be the case that neither of these two distributions fits the data well enough. It may even be difficult to find a known distribution for the cost of claims. To overcome this problem, in this paper it is assumed that by applying a Box-Cox transformation to the claim cost r.v., the transformed data follow a Normal distribution. In the insurance context, Harrington [13] proposed the Box-Cox transformation to generalize the log-linear model in order to calculate pure premiums (see also Reference [14]).
The Box-Cox transformation [15] provides a family of power transformations that is useful in linear regression when the normality assumption is violated, its aim being to obtain a transformed r.v. that is Normal distributed. Furthermore, the Gamma and Lognormal distributions, which have been traditionally used to adjust the distribution of claims costs, are particular cases of this family; therefore, if the original r.v. follows one of these two distributions, there is an optimal transformation so that the transformed variable becomes Normal. This idea is generalized for some unknown distribution for which we can find an optimal transformed r.v. that follows a Normal distribution. Using a Box-Cox transformation of the distribution of the variable in the original scale can be considered as a generalization of the Lognormal distribution, allowing a longer and heavier right tail as λ 1 decreases, and shifting the mode as λ 2 increases.
To conclude, in this paper it is considered a bivariate Sarmanov distribution that allows us to join a Normal GLM distribution for the transformed average claim cost r.v. (see Reference [16] for Sarmanov with Normal marginal distributions) with a Negative Binomial GLM distribution for the number of claims (see Reference [6] for Sarmanov with Negative Binomial GLM marginals). This model can be used to obtain the distribution of the total cost of claims based on the collective model, for a policyholder with specific characteristics. The bivariate Sarmanov distribution allows to fit a non-linear dependence between frequency and severity, and to analyze if the riskier profiles implies larger dependency. Furthermore, the marginal severity distribution based on a Box-Cox transformation allows a longer and heavier right tail than classical models like Gamma and Lognormal.
As numerical illustration, a database containing information on a portfolio of policyholders that have contracted an auto insurance which involved the installation of a GPS/inertial device in their vehicle is used. This device provides a source of information (telematic variables) to motor insurers that complements the variables that have traditionally been used in a-priori pricing in auto insurance. The telematic variables provide an innovative way to calculate car insurance pricing (see References [17][18][19][20][21][22] where a similar database is considered and the telematic variables are used to predict accident rate). Furthermore, for each policyholder in the database, the number of claims and their mean cost are known; these variables are right skewed and have excess of zeros, characteristics that are common in insurance auto databases.
Traditionally, tariffication in auto insurance has been based on the total number of claims of the policyholders. Given the large number of zeros, the models commonly used for fitting the claim frequency variable are the NB and the Zero Inflated Poisson [22,23]. Alternatively, some works analyze the number of claims using multivariate models, that is, considering that there are different types of claims-with civil liability, with personal injury, and so forth [1,6,24]. More recently, tariffication based on the collective model has been proposed, that is, the premium is deduced from the distribution of the total cost variable, which in turn is deduced from the bivariate distribution of frequency and severity claims. Two alternative approaches have been proposed-on the one hand, the bivariate model is deduced from the frequency distribution and from the severity distribution conditioned to frequency defined for the average claim size distribution using the number of claims as covariate [2,4]. On the other hand, Czado et al. [25] proposed bivariate models based on copulae (see also References [26] for a more general approach). Furthermore, the bivariate Sarmanov model has also been analyzed in the collective model framework [7].
The rest of the paper is organized as follows-in Section 2.1, we describe the proposed Sarmanov model relating the r.v. number of claims and the r.v. average claim cost; its properties and particular cases for the proposed marginal distributions are presented in Section 2.2; moreover, some characteristics of the inference from margin (IFM) estimation procedure are pointed out in Section 2.3. The application on a real database using telematic variables to predict the total cost of claims of a given insured is discussed in Section 3. Finally, Section 4 concludes. This paper ends with Appendixes A and B containing the proofs and additional results.

The Dependence Model and Its Properties
Let N i , i = 1, ..., m, be the counting r.v. representing the number of claims of individual i and let Y i be the r.v. representing the average claim cost per insured. Then, the resulting aggregate claims of individual i can be represented as where the usual assumption under which this model is considered is that Y i is independent of N i . We assume that Y i > 0 if and only if N i > 0; otherwise, both variables take the value zero. Therefore, we define the r.v.Ỹ i representing Y i > 0 and, based on it, the expectation and variance of S i (which are useful to obtain a-priori ratemaking in insurance) are given by: However, in practice, the independence assumption is not always true, in which case the moments of S i must be deduced from a bivariate distribution associated to the random vector (N i , Y i ), distribution that takes into account the dependence between the two variables.
Moreover, we assume that for each i, the distributions of both random variables in vector (N i , Y i ) depend on a set of k quantitative or binary covariates, which are represented by the vector X i = (X i1 , ..., X ik ) . For simplicity, the covariates are assumed to be common, but they can also be different between the two variables. In practice, we specify the relation as a GLM and define the linear predictor X i β j , where β j = β j 1 , ..., β j k , j ∈ {N, Y}, are vectors of parameters to be estimated; note that throughout the paper, such a j ∈ {N, Y} should be interpreted as an upper index and not as a power.

The Model Relating the Counting and Average Claim Cost r.v.s
This dependence model is defined in two parts: the first part is the probability mass function (pmf) of N i = 0 and the second part is the conditional bivariate density function that joins the pmf p N of the discrete r.v. N i with the probability density function (pdf) fỸ of the continuous r.v.Ỹ i corresponding to Y i > 0. The model is: where ω is the Sarmanov dependence parameter and ψ(·) and φ(·) are bounded kernel functions.
In order for (2) to define a proper pdf for all i, we impose the conditions: Then, the kernel functions limits for each i are defined as: Taking into account the conditions defined in (3) and (4), limits are also defined for the dependence parameter ω. However, since this parameter does not depend on the linear predictor, new maximum and minimum values are defined as: so that the limits of the dependence parameter are: In the following proposition, the new formulae for the expectation and variance of S i are presented. To simplify the notation, the functions used in the bivariate density expressed in (2) are rewritten as follows: From Proposition 3 in Bolancé and Vernic [7], the following correlation for each i can be easily deduced: To calculate the moments of S i and the correlation expressed in (6), we need to define the marginal GLMs and the resulting particular bivariate Sarmanov model with given kernel functions.
We propose to use exponential kernels. For the cost per insured Y i , the kernel function is expressed as φ i (y) = e −γy − LỸ i (γ), where LỸ i denotes the Laplace transform ofỸ i .
For the number of claims with n ≥ 1 in (2), we let ψ i (n) = e −δn − k i , and, to find k i , we impose condition (3) as follows

Counting Distribution
We assume that the distribution of the counting process N i is the Negative Binomial (NB), where we take µ N i = EN i = e X i β N , so that in the GLM specification ln µ N i = X i β N , and the pmf is: where α > 0. The previous model is heteroscedastic and its variance is To obtain the correlation between the frequency and the severity per policyholder, and the moments of S i from Proposition 1, we need to calculate while the other two expectations can be directly deduced from the Proposition 5 of Bolancé and Vernic [7]. The results are:

Severity Distribution
In what concerns the mean cost per policyholder represented by the r.v. Y i , we assume that its distribution is unknown, but the variableỸ i can be normalised using a Box-Cox transformation. The one parameter Box-Cox transformation is given by: while the two parameters Box-Cox transformation is: We assume that the two parameters Box-Cox transformation T λ 1 ,λ 2 (·) is applied to the r.v.Ỹ i and obtain the truncated normal r.v. Z i = T λ 1 ,λ 2 (Ỹ i ). Since the domain ofỸ i is left bounded (also necessary in order to have a bounded kernel function), when λ 1 ≥ 0, the r.v.
Then, with ϕ(·) and Φ(·) denoting the pdf and, respectively, the cumulative distribution function of the standard normal distribution, the density ofỸ i is given by hence the left truncation point a must satisfy the condition a ≥ − 1 λ 1 , for which we obtain that the left truncation condition gives Therefore, the pdf ofỸ i is • If λ 1 = 0, then y > −λ 2 implies so that the left truncation point a can be any real value. Then the left truncation condition becomes The distribution ofỸ i becomes the left truncated lognormal LTLN µ Z i , σ 2 ; a having pdf • If λ 1 < 0, the condition y > −λ 2 implies and the left truncation point a must be such that a < − 1 λ 1 . Again, the left truncation condition gives However, note that in this case, when y → ∞ then z → − 1 . We therefore write the pdf ofỸ i as In order to write the exponential kernel function corresponding to Taking b = ∞, we obtain the formula for the LTN distribution. Therefore, the kernel corresponding toỸ i becomes As discussed above for the r.v. N i , the following quantities are needed: iφ i Ỹ i , which will be separately calculated for λ 1 = 0 and for λ 1 = 0.
We change variable In general, there is no closed type formula for this integral and for similar integrals associated with E Ỹ 2 i , E Ỹ iφi Ỹ i , E Ỹ 2 iφ i Ỹ i and they must be evaluated numerically. However, in the following particular case some recursive formulas can be used. Particular case: λ 1 = 1 m , where m is a positive integer. The following notation for k ∈ N, k > 0 is introduced: Here the upper index i emphasises the connection with individual i. The proof of the following lemma is very easy and we skip it since it can also be obtained as a particular case of a result in Burkardt [27]. Lemma 1. Let L j (α) be the j-th moment of the standard left truncated normal distribution with left truncation point α, that is, Then L j (α) can be recursively evaluated as Lemma 2. With the above notation, for The following proposition provides the needed formulas.

Proposition 2.
LetỸ i be the r.v. with pdf (13) and λ 1 = 1 m . Then Case λ 1 = 0. Let now λ 1 = 0. As noted before, in this case,Ỹ i follows a left truncated Lognormal distribution LTLN µ Z i , σ 2 ; a and the following result holds without further assumption on the parameter λ 1 .
where, as before, z a,i = a−µ Z i σ .

Parameter Estimation
Since the distribution associated with the r.v. Y i is unknown, the parameter estimation is based on the two parameters Box-Cox transformed variable T λ 1 ,λ 2 Ỹ i = Z i , whose distribution is LTN or DTN. Therefore, we can estimate the bivariate Sarmanov with NB and TN marginal distributions and then the results on the original scale can be deduced as we have shown in Section 2.2. The bivariate Sarmanov distribution with NB and LTN or DTN marginals has pdf: where we recall that µ N i = e X i β N and µ Z i = X i β Z . In conclusion, we have to estimate the transformation parameters λ 1 and λ 2 , the vectors of parameters β N and β Z associated with the covariate vector, the parameter α of NB marginal distribution, the parameter σ of the LTN or DTN marginal distribution and the dependence parameter ω.
Let (n i , y i ), i = 1, ..., m, be a sample of observed values of frequency and severity per policyholder and let z i = T λ 1 ,λ 2 (y i ), i = 1, ..., m, be the transformed severity. The logarithm of the likelihood function l(Θ) to be maximised with respect to the vector of parameters which is defined as Θ = β N , β Z , α, σ, λ 1 , λ 2 , ω to be estimated, depends on the value of λ 1 as follows: letting m 0 be the number of insured with n i = 0 and m 1 the number of insured with n i ≥ 1, then
• If λ 1 = 0 and λ 2 > − min(y 1 , ..., y m ) To maximise the log-likelihood functions defined in (15) and (16), a procedure similar to the one described and tested in Bolancé and Vernic [7] can be used, which is divided in two phases: the first phase consists in using the inference from margins (IFM) technique to estimate the marginal parameters, and the second phase in using these results as starting values to obtain estimations for all parameters based on the maximization of the full log-likelihood.
The IFM is a two step estimation method which starts from an initial estimation of the marginal parameters, estimation obtained by the maximum likelihood (ML) method for the parameters of the two GLMs-GLM of the frequency variable with NB distribution, and GLM of the severity variable whose distribution is defined by the transformation parameters and by the parameters of the truncated normal distribution with given truncation values a and b. We will take the left truncation value such that Φ a−X i β Z σ is almost zero; note that since we consider heterogeneity between policyholders, in practice, a good choice will be a = min(a 1 , ..., a m ), where a i = −3σ + X i β Z (another simple choice for a would be the minimum of the transformed data) (see References [28] for ML estimation of the univariate model based on Box-Cox transformation). However, note that for (15) we must check that a ≥ − 1 λ 1 when λ 1 ≥ 0 and that a < − 1 λ 1 if λ 1 < 0. In practice, for λ 1 < 0, we used the maximum value for the right truncation point b = −λ −1 1 , which does not affect the likelihood value because Φ((b − mu)/sigma) ≈ 1.
Therefore, for the Sarmanov distribution defined in (14), each iteration of the IFM consists of the following two steps: Step 1 (iteration r) Given the parameters of the marginal distributions obtained at iteration r − 1, find the estimationω r of the dependence parameter within the interval defined in (5), estimation that maximises the log-likelihood in (15) or (16).
Step 2 Givenω r obtained in Step 1, find new values for the parameters of the marginals that maximise the log-likelihood function in (15) or (16).
In practice, in the numerical analysis that will be presented in Section 3, two assumptions on the Box-Cox transformation parameters are used. On the one hand, as we have described before, we can include the transformation parameters in the optimization procedure. On the other hand, we a-priori select the values of the transformation parameters, which can be calculated by maximizing the log-likelihood of the normal distribution associated with the transformed variable. The results obtained in both cases are practically the same.

Numerical Analysis
In this section, a database corresponding to a car insurance portfolio is analyzed, in which some of the variables have been measured via a telematic system. It is interesting to check if, in the collective risk model context, the telematic variables can replace or complement classical a-priori ratemaking variables, as the age, the driving license age, the power of the car, and so forth. The dependent variables are the frequency and mean severity of claims per policyholder. Information on 25,014 policyholders with a car in-surance is available, who have contracted a policy that incorporates a GPS device in the vehicle. The data were provided by a Spanish insurer. The information corresponds to all the drivers in the insurer's database who had a telematics insurance product in 2010. In general, this type of insurance is mainly chosen by young drivers who value the fact that their car can be located in case of an accident. So, the group of drivers is generally composed of new drivers. Our dataset contains all the available drivers; most of them did not report any claim during 2010, but a few reported at least one accident. Only accidents at fault are considered, and the cost of the accident was also collected. The cost of the accident is generally known once damages, medical expenses and bodily injury compensations are paid. The sum of all these costs was included in the dataset in a variable measuring the cost of claim.
Traditionally, the variables used to model the a-priori premium in auto insurance have been classified in policyholder and auto characteristics; furthermore, if there is some experience with the insureds, that is, they are not new in the company, some tariff variables also could be considered (see Reference [11], for some examples, [chapters 12 and 13]). In practice, these variables are those available for the insurance company. In our case, we have a portfolio with new policyholders in the company and for whom we know their age, their driving license age, and if night parking is used; we also know the gender, but this variable is not included in the analysis because official regulations prevent differentiating premiums based on the gender of the insured. About the auto of the policyholder, it is know that all are cars of private use and the available variables are the age of the car and its power. More recently, thanks to GPS devices, telematic variables are available for the insurance company; in general, the variables that are used in the premium calculation are the total of kilometers (km) driven and three percentages: in urban area, at night and over the speed limit (see Reference [22] for an example with similar portfolio). Table 1 shows the main descriptive statistics of the dependent variables and of the covariates used in the bivariate Sarmanov model defined in (14). It can be noticed that the dependent variables have right skewness. Considering this shape and comparing with alternative count data models like the Poisson or Zero Inflated Poisson, the NB GLM defined in (7) proved to be the best option for our data. Focusing on the mean severity, the distributions that have been classically used are the Gamma or the Lognormal (see Reference [7] for an example). However, in our case, the shape of the severity distribution is unknown and has a heavier tail than the Gamma and the Lognormal; in this case, the Box-Cox transformation allows us to work with the normal distribution for the transformed variable. Regarding the explanatory variables, it can be checked that the portfolio is made up of young policyholders with a symmetric age distribution. The variables with the largest skewness, which in our case is positive, are the % of km driven over the speed limit (X 8 ) and the % of km driven at night (X 9 ); these variables are those directly related with the risk exposure and, therefore, with the frequency and the severity of claims.
At the bottom of Table 1, the linear Pearson correlation coefficient and its confidence interval are shown. The value of this coefficient is low, but significant. Given the right skewness shape of both dependent variables, the linear correlation is not the best dependence measure. Alternatively, Reference [25] proposed fitting copula models with GLM marginals, and, using Gaussian copula, they obtain a dependence parameter equal to 0.13, that is, the dependence between frequency and severity is not very strong, but it affects the insurance premium considerably. The Sarmanov model with GLM marginals proposed in this work is an alternative way to copula for modelling dependence; furthermore, it allows heterogeneity dependence between policyholders with different covariate values. Table 1. Definition of variables and descriptive statistics: mean, median, standard deviation (STD), minimum (Min), maximum (Max), skewness (Skew) and kurtosis (Kur). The last row shows the linear correlation between the dependent variables and a confidence interval (CI) at a 95% level.

Description
Mean Focusing on the marginal distributions for the frequency and severity variables, the NB and Gamma GLM models (see Reference [2] for example), respectively, are the most used to model the variables number and cost of claims in auto insurance. Alternatively, in some cases, the Zero Inflated Poisson (ZIP) and Lognormal distributions could improve the fit of NB and Gamma, respectively (see References [7,29], for example). In Table 2 the AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) are compared for alternative univariate GLM models considered for marginals, including the Poisson, the ZIP and the NB for frequency, and the Gamma, Lognormal and Box-Cox transformation based model for severity. These results show that ZIP is better than NB if AIC is used, but, if BIC is used, the best model is the NB. For the severity variable, the best model is the one based on the Box-Cox transformation.  Table 3 presents the results of the estimated bivariate Sarmanov model used to fit the joint distribution of the frequency and of the Box-Cox transformation of the mean severity of claims for a given insured with the characteristics represented in a given vector of covariates. The transformation parameters are calculated a-priori by maximizing the normal log-likelihood function of the transformed variable; alternatively, the transformation parameters are included in step 2 of the estimation procedure described in Section 2.3, and it is observed that the results do not improve. A negative value for λ 1 is obtained, indicating that the distribution of the mean cost per policyholder has a longer and heavier tail than the Lognormal distribution. The estimated model with all the explanatory variables that are shown in Table 3 is compared with alternative proposals that can be found in the literature (see References [2,7,25]), to estimate collective risk model distributions (results are shown in Appendix B); for the data used in this paper, these models do not improve our proposed model.
Three Sarmanov GLM models are estimated-Model I incorporates all the explanatory variables, Model II only includes classical a-priori characteristics of the policyholder and car, and, finally, Model III only includes telematic variables. Furthermore, Models I and II are estimated without the variable X 1 (Age of the driver) given its high correlation with X 2 (Age of driver license), and noticed that the signs and p-values of the parameters of the remaining variables change only slightly. The dependence parameter ω is positive and indicates a significant dependence between frequency and severity that need to be considered in this collective model in order to calculate insurance premiums.
Comparing the three models, we can see that the best one is Model I, this model incorporating non telematic and telematic variables. The effect of the driving experience on the frequency and severity is negative; on the contrary, the effects of the car characteristics are positive in the sense that older car and higher power increase the accident rates. The coefficients associated with the telematic variables are positive, their p-values indicating significant effects on the frequency. For the severity, the significant effects are associated to X 8 (Percentage of kilometers driven over the speed limit) and X 9 (Percentage of kilometers driven at night).
Focusing on Models II and III, we note that the best fit is obtained with the telematic variables; although, as we have seen in Model I, these variables are complemented by non telematic variables improving the goodness of fit in the full model. In short, the effects of the covariates hardly change if we compare models I, II and III, which implies that multicollinearity hardly affects the results, and that telematic and non-telematic variables complement each other and take into account different characteristics of the insured. The telematic variables are taking into account the exposure to risk that the non-telematic variables do not detect by themselves. In Table 4, we show the mean of S i , that is, the pure premium, for four different insured profiles, using the full sample (with extremes) and with the positive dependence parameter ω estimated in Model I; this mean E(S i ), is compared with the one obtained without dependence (i.e., dependence parameter equal to zero); we also calculate the differences between these two pure premiums, with and without frequency and severity dependence. The first and second profiles correspond to policyholders with high risk-they are 25 years old, with 6 years old driver license, with a car with 150 horsepower, 8100 yearly kilometers driven, 50% on urban area, 40% over the speed limit and 20% at night; the difference between these two profiles is that the first one does not use night parking, while the second does. The profiles 3 and 4 are policyholders with lower risk, having the same personal and car characteristics, same total km driven as the profiles 1 and 2, but with only 5% over the speed limit and 10% at night. We observe how the Sarmanov dependence parameter affects the pure premium. The difference between premiums calculated with and without dependence increases with risk, from 5.69 Euros for the lower risk profile (Profile 4) to 12.51 Euros for the higher risk profile (Profile 1).

Conclusions
In this paper, dependence between the claim frequency and the average severity of a policyholder was introduced, using a bivariate Sarmanov distribution with Negative Binomial GLM and Normal GLM marginals. The Normal GLM distribution was obtained by applying a Box-Cox transformation with two parameters on the original distribution, motivated by the fact that, in the collective risk model context, the claim cost distribution may not coincide with the traditionally used distributions (Exponential, Gamma, Lognormal, etc.). For some specific values of the transformation parameters, some useful closed type expressions for calculating the moments of the aggregate claims were obtained.
The proposed model was fitted on a sample of policyholders from an auto insurance portfolio. The peculiarity of these data is that they are associated with an insurance which involved the installation of a GPS/inertial device in the vehicle, device that allows the collection of telematic information related to the total kilometers and to the way which they are traveled. We have shown how these variables complement and do not substitute the variables that have traditionally been used in car insurance pricing. Furthermore, we have analyzed the importance of incorporating the dependence between frequency and severity in the collective model for calculating the premium, and how our Sarmanov model allows this dependence to be incorporated in a simple way.
In general, the bivariate Sarmanov distribution shows how the effect of dependency between frequency and severity is different depending on the risk profiles. For our dataset, the results show that riskier profiles have a greater effect of dependency. Furthermore, the Box-Cox transformation improves the distribution fit of the cost of claims variable, given that it allows a longer and heavier right tail than classical models like Gamma and Lognormal.
Author Contributions: All authors contributed equally. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.

Sample Availability:
The used database is available from the authors.

Appendix A
Proof of Proposition 1. The expected value results easily from For the variance, we start with Therefore, the variance follows from This completes the proof.
Proof of Lemma 2. Since Z i ∼ LTN µ Z i , σ 2 ; a , we can write Z i = µ Z i + σZ i , where Z i ∼ LTN 0, 1; a−µ Z i σ and the first formula is immediate with α = z a,i = a−µ Z i σ . The formula of A i m,k is also immediate. To prove the formula of B i m,k we write Changing variable x = z−µ Z i +σ 2 γ σ results in and, by considering the definition of L j , we obtain the stated result.