A Survey of the Individual Claim Size and Other Risk Factors Using Credibility Bonus-Malus Premiums

Abstract: In this paper, a flexible count regression model based on a bivariate compound Poisson distribution is introduced in order to distinguish between different types of claims according to the claim size. Furthermore, it allows us to analyse the factors that affect the number of claims above and below a given claim size threshold in an automobile insurance portfolio. Relevant properties of this model are given. Next, a mixed regression model is derived to compute credibility bonus-malus premiums based on the individual claim size and other risk factors such as gender, type of vehicle, driving area, or age of the vehicle. Results are illustrated by using a well-known automobile insurance portfolio dataset.


Introduction
In a recent work (Gómez-Déniz 2016), a modification of the bonus-malus systems commonly applied in automobile insurance was proposed that differentiated between two types of claims by including a bivariate model based on the assumption of dependence. That work studied the impact on the bonus-malus premium in a general setting without involving the individual's risk factors, such as gender, type of vehicle, area of circulation, etc.
It is well known that under the traditional bonus-malus system, the premium charged to each insured is based solely on the number of claims made. Therefore, an insured who has had an accident causing a relatively small loss is penalised to the same extent as one who has experienced a more expensive accident. This would seem unfair to the insureds. In the aforementioned work, a bivariate prior model, conjugate with respect to the likelihood, was also proposed, and as a result, simple credibility bonus-malus premiums satisfying appropriate transition rules were obtained. These expressions were used to compute credibility bonus-malus premiums by considering two different types of claims: those above and those below a threshold claim size, say ψ > 0.
Similar related works have been proposed in the actuarial literature. In this sense, the work in Pinquet (1998) computed bonus-malus rates in a multi-equation Poisson model with random effects. The work in Ragulina (2011) introduced a bonus-malus system with different claim types and varying deductibles. The work in Walhin and Paris (2001) showed how to set up a practical bonus-malus system with a finite number of classes using both the actual claim amount and the claim frequency distribution. The work in Bonsdorff (2005) also incorporated the claim number and the severity into the bonus-malus system literature by using Markov chains. The work in Bermúdez (2009) examined bivariate count models in automobile insurance.

Let X1 denote the total claim number and, for i = 1, . . . , x1, let Zi take the value one when the size of the i-th claim exceeds ψ and zero otherwise, so that X2 = Z1 + Z2 + · · · + ZX1 is the total claim number with a claim amount larger than ψ. Thus, if the Zi (i = 1, . . . , x1) are assumed to be mutually independent, then the conditional probability function of X2, given that X1 = x1, is binomial with parameters x1 and µ2/µ1. Therefore, the joint distribution of the total claim number (X1) and the corresponding claim number with claim amount exceeding ψ, X2, has this probability function:

f(x1, x2) = exp(−µ1) (µ1 − µ2)^(x1−x2) µ2^(x2) / [(x1 − x2)! x2!],  (1)

for x1 = 0, 1, . . ., x2 = 0, 1, . . . , x1, µ1 > 0, and 0 < µ2 < µ1.
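As a quick numerical sketch of this construction, the joint probability function can be written as the product of a Poisson marginal for the total claim count (the distributional assumption made later in the paper) and a binomial conditional with success probability µ2/µ1; the probabilities should then sum to one over the support:

```python
from math import comb, exp, factorial

def joint_pmf(x1, x2, mu1, mu2):
    """Joint pmf of (X1, X2): a Poisson(mu1) total claim count combined
    with a binomial thinning with success probability mu2/mu1."""
    if x2 < 0 or x2 > x1:
        return 0.0
    p = mu2 / mu1
    poisson = exp(-mu1) * mu1**x1 / factorial(x1)
    binom = comb(x1, x2) * p**x2 * (1.0 - p)**(x1 - x2)
    return poisson * binom

# sanity check: probabilities over a large grid sum to (almost) 1
mu1, mu2 = 2.0, 0.8
total = sum(joint_pmf(x1, x2, mu1, mu2)
            for x1 in range(60) for x2 in range(x1 + 1))
```

The truncation at x1 = 60 is an arbitrary numerical choice; the Poisson tail beyond it is negligible for these parameter values.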

Properties of the Distribution
The marginal means are given by E(Xi) = µi, i = 1, 2. The cross moment, the covariance, and the correlation are given by:

E(X1 X2 | µ1, µ2) = µ2(1 + µ1),  cov(X1, X2 | µ1, µ2) = µ2,  ρ(X1, X2 | µ1, µ2) = √(µ2/µ1),

respectively. Thus, the model admits only positive correlation. The probabilities for different values of (x1, x2) were calculated, and graphs were plotted for different values of these two parameters; they are shown in Figure 1. It is observable that for larger values of µ1 and µ2, the modal value increases in x1 and x2, illustrating that the new model is very versatile.

The expression provided in (1) can also be obtained differently as follows. Let us consider an automobile insurance portfolio in which X1 is a random variable that represents the number of claims in a given period and X2 yields the number of claims with a size above a threshold ψ > 0 over the same period of time. If each policyholder has a probability µ2/µ1 of having a claim with a claim size above ψ, then Pr(X2 = x2) and Pr(X1 = x1) are related as follows:

Pr(X2 = x2) = ∑_{x1 ≥ x2} C(x1, x2) (µ2/µ1)^(x2) (1 − µ2/µ1)^(x1−x2) Pr(X1 = x1),  (3)

where C(x1, x2) denotes the binomial coefficient.
Obviously, (3) represents a map from the probability function of X1 to the probability function of X2. That is, ∑_{x2 ≥ 0} Pr(X2 = x2) = 1 with Pr(X2 = x2) ≥ 0, x2 = 0, 1, . . . Although other distributions, e.g., the negative binomial, could be chosen to model count data, for the sake of simplicity, let us suppose that X1 follows a Poisson distribution with parameter µ1 > 0. Then, we have:

Pr(X2 = x2) = exp(−µ2) µ2^(x2) / x2!,  x2 = 0, 1, . . . ,

that is, X2 is again Poisson distributed, now with parameter µ2. Expression (3) can be viewed as a weighted sum of binomial probabilities where the weights are given by the probability that the policyholder declares a certain number of claims. More specifically, it averages the number of claims above the threshold, conditional on X1 = x1 claims having been declared, under a heterogeneity factor that produces claims of different amounts. Hence, Expression (3) can be viewed as a mixture distribution. From this standpoint, the model provides a framework in which random effects are incorporated into the Poisson assumption. In this case, the bivariate distribution provided in (1) can be obtained by multiplying the conditional and the marginal distributions in the usual way.
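The thinning property just described can be checked numerically: summing the binomial mixture in (3) over the Poisson(µ1) weights should reproduce the Poisson(µ2) probabilities term by term. A small sketch:

```python
from math import comb, exp, factorial

def pr_x2(x2, mu1, mu2, x1_max=100):
    """Marginal Pr(X2 = x2) computed as the binomial mixture in (3),
    truncating the sum over x1 at x1_max (a numerical choice)."""
    p = mu2 / mu1
    return sum(comb(x1, x2) * p**x2 * (1.0 - p)**(x1 - x2)
               * exp(-mu1) * mu1**x1 / factorial(x1)
               for x1 in range(x2, x1_max))

# thinning a Poisson(mu1) count this way leaves a Poisson(mu2) count
mu1, mu2 = 2.0, 0.8
mixture = [pr_x2(k, mu1, mu2) for k in range(6)]
poisson = [exp(-mu2) * mu2**k / factorial(k) for k in range(6)]
```

The two lists agree up to the (negligible) truncation error of the sum over x1.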
Numerical simulation of the bivariate distribution can simply be obtained by following the approach explained in Kocherlakota and Kocherlakota (1992, chp. 1). In this regard, both the marginal distribution f(x1) and the conditional distribution f(x2|x1) will be used. The former is a Poisson distribution with parameter µ1 and the latter a binomial distribution with parameters x1 and µ2/µ1. Thus, for specific values of x1, a realization of x2 from f(x2|x1) can be generated, and therefore, the pairs (x1, x2) are observations from the joint distribution given in (1). This procedure can be repeated n times in order to obtain a random sample of size n.
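This two-stage sampling scheme can be sketched as follows; the Poisson generator below uses Knuth's product-of-uniforms method, an implementation choice not prescribed by the text:

```python
import math
import random

def rpois(rng, mu):
    """Poisson(mu) draw via Knuth's product-of-uniforms method;
    adequate for the moderate means used here."""
    limit, k, prod = math.exp(-mu), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1

def simulate_pairs(n, mu1, mu2, seed=1):
    """n draws of (x1, x2): first x1 ~ Poisson(mu1),
    then x2 | x1 ~ Binomial(x1, mu2/mu1)."""
    rng = random.Random(seed)
    p = mu2 / mu1
    sample = []
    for _ in range(n):
        x1 = rpois(rng, mu1)
        x2 = sum(rng.random() < p for _ in range(x1))  # binomial thinning
        sample.append((x1, x2))
    return sample

sample = simulate_pairs(20000, 2.0, 0.8)
m1 = sum(x1 for x1, _ in sample) / len(sample)  # close to mu1 = 2.0
m2 = sum(x2 for _, x2 in sample) / len(sample)  # close to mu2 = 0.8
```

With 20,000 replications, the sample means recover the marginal means E(X1) = µ1 and E(X2) = µ2 to within Monte Carlo error.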
The joint probability generating function is given by:

G(s1, s2) = exp{(µ1 − µ2)(s1 − 1) + µ2(s1 s2 − 1)}.  (4)

Note that (4) is the limiting case of the bivariate Poisson distribution with parameters θ1 = µ1 − µ2, θ2 → 0, and θ12 = µ2 (see for instance this expression in Johnson et al. 1997, chp. 37, and also Hesselager (1996) for more details on recursions for bivariate discrete distributions). Thus, the following recursions are valid:

x1 f(x1, x2) = (µ1 − µ2) f(x1 − 1, x2) + µ2 f(x1 − 1, x2 − 1),
x2 f(x1, x2) = µ2 f(x1 − 1, x2 − 1),

with:

f(0, 0) = exp(−µ1),

and f(x1, x2) = 0 otherwise, i.e., whenever x1 < 0, x2 < 0, or x2 > x1.
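Since the limiting bivariate Poisson form of the pmf is f(x1, x2) = exp(−µ1)(µ1 − µ2)^(x1−x2) µ2^(x2) / [(x1 − x2)! x2!], the recursions above can be verified numerically; the grid size and parameter values below are arbitrary choices for the check:

```python
from math import exp, factorial

def f(x1, x2, mu1, mu2):
    """Joint pmf written in the limiting bivariate Poisson form."""
    if x2 < 0 or x1 < x2:
        return 0.0
    return (exp(-mu1) * (mu1 - mu2)**(x1 - x2) * mu2**x2
            / (factorial(x1 - x2) * factorial(x2)))

# check both recursions numerically on a small grid
mu1, mu2 = 2.0, 0.8
ok = all(
    abs(x1 * f(x1, x2, mu1, mu2)
        - (mu1 - mu2) * f(x1 - 1, x2, mu1, mu2)
        - mu2 * f(x1 - 1, x2 - 1, mu1, mu2)) < 1e-12
    and abs(x2 * f(x1, x2, mu1, mu2)
            - mu2 * f(x1 - 1, x2 - 1, mu1, mu2)) < 1e-12
    for x1 in range(1, 10) for x2 in range(0, x1 + 1)
)
```

Both identities hold exactly (up to floating-point rounding) on the whole grid.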

The Role of the Covariates
Clearly, the number of claims below and above ψ may be influenced by different characteristics and factors; likewise, explanatory variables may help explain the individual premium to be charged. As the marginal means of (1) are E(X1) = µ1 and E(X2) = µ2, covariates can easily be incorporated into the model.
We now investigate the effect of including covariates to account for the total number of claims and the claims above the threshold ψ. Obviously, some factors are crucial when explaining the endogenous variables (X1i, X2i). Two appropriate links are needed to connect the explanatory variables with the marginal means. A natural way to proceed is to assume that (X1i, X2i) for i = 1, . . . , n follows the probability function (1) and:

µ1i = exp(ωi′ β1),  µ2i = µ1i exp(ηi′ β2) / [1 + exp(ηi′ β2)],  (6)

where ωi and ηi denote vectors of m explanatory variables for the i-th observation, with components ωji and ηji (j = 1, . . . , m), used to model µ1i and µ2i, respectively, and where βk = (βk1, . . . , βkm)′ (k = 1, 2) designates the corresponding vector of regression coefficients. The log-linear specification for µ1i is widely used, while the link function for µ2i was chosen in this way to ensure that the latter is never larger than µ1i, and thus, it is compatible with X2 ≤ X1.
These mean values may be influenced by several characteristics and variables, and the explanatory variables that are used to model each parameter µ 1i and µ 2i may not be the same in practice. In this respect, the work in Cameron and Trivedi (1998) provided good insight into standard count regression models.
The marginal effect reflects the variation of the conditional mean of X1 and X2 due to a one-unit change in the j-th covariate, and it is calculated as:

∂E(X1i)/∂ωji = β1j µ1i,  ∂E(X2i)/∂ηji = β2j µ2i (1 − µ2i/µ1i),  (5)

for i = 1, . . . , n and j = 1, . . . , m. Thus, the marginal effect indicates that a one-unit change in the j-th regressor increases or decreases the expectation of the total number of claims and the number of claims above the given threshold depending on the sign, positive or negative, of the regressor for each mean.
For indicator variables such as ωji, which take only the value zero or one, the marginal effect in terms of the odds ratio is exp(β1j) for µ1i and exp(β2j) for µ2i. Therefore, for µ1i, the conditional mean is exp(β1j) times larger if the indicator is one rather than zero. A similar conclusion is drawn for µ2i. Certainly, if µ1i and µ2i share the same covariates, then (5) does not correspond to the marginal effect of the j-th covariate since µ1i may also change in response to changes in this covariate.
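The text specifies a log link for µ1i and a link that keeps µ2i below µ1i; one concrete choice meeting that requirement (an assumption here, a logistic scaling, not necessarily the paper's exact form) can be sketched together with the indicator-variable interpretation:

```python
import math

def marginal_means(w, eta, beta1, beta2):
    """mu1 under a log link; mu2 = mu1 * logistic(eta . beta2),
    one choice that guarantees 0 < mu2 < mu1 (illustrative)."""
    mu1 = math.exp(sum(a * b for a, b in zip(w, beta1)))
    s = sum(a * b for a, b in zip(eta, beta2))
    mu2 = mu1 / (1.0 + math.exp(-s))
    return mu1, mu2

# flipping an indicator covariate from 0 to 1 multiplies mu1 by exp(beta)
beta1, beta2 = [0.5, -0.3], [0.2, 0.1]          # hypothetical coefficients
mu1_off, mu2_off = marginal_means([1.0, 0.0], [1.0, 0.0], beta1, beta2)
mu1_on, mu2_on = marginal_means([1.0, 1.0], [1.0, 1.0], beta1, beta2)
ratio = mu1_on / mu1_off                        # equals exp(-0.3)
```

The ratio of conditional means across the two indicator levels is exactly exp(β1j), matching the odds-ratio interpretation above.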

Estimation
In this section, we derive estimators based on the maximum likelihood for the model with and without covariates, and we also provide closed-form expressions for Fisher's information matrix.
The log-likelihood is proportional to:

ℓ(µ1, µ2) ∝ −n µ1 + n(x̄1 − x̄2) log(µ1 − µ2) + n x̄2 log µ2,

where x̄1 and x̄2 are the sample means of X1 and X2, respectively. The normal equations to be solved are:

∂ℓ/∂µ1 = −n + n(x̄1 − x̄2)/(µ1 − µ2) = 0,
∂ℓ/∂µ2 = −n(x̄1 − x̄2)/(µ1 − µ2) + n x̄2/µ2 = 0,

from which it is easy to obtain the maximum likelihood estimators µ̂1 = x̄1 and µ̂2 = x̄2, which coincide with the moment estimators. The second partial derivatives are:

∂²ℓ/∂µ1² = −n(x̄1 − x̄2)/(µ1 − µ2)²,
∂²ℓ/∂µ1∂µ2 = n(x̄1 − x̄2)/(µ1 − µ2)²,
∂²ℓ/∂µ2² = −n(x̄1 − x̄2)/(µ1 − µ2)² − n x̄2/µ2².

The expectation of the negative of the second partial derivatives yields Fisher's information matrix:

I(µ1, µ2) = n [ 1/(µ1 − µ2)   −1/(µ1 − µ2) ; −1/(µ1 − µ2)   1/(µ1 − µ2) + 1/µ2 ].

The asymptotic variance-covariance matrix of (µ̂1, µ̂2) is obtained by inverting this information matrix.
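Since the pmf is the Poisson-binomial product of (1), the log-likelihood reduces, up to additive constants, to −nµ1 + Σ(x1i − x2i) log(µ1 − µ2) + Σ x2i log µ2; a quick numerical check (on a small made-up sample) that it peaks at the sample means:

```python
import math

def loglik(mu1, mu2, data):
    """Log-likelihood of the bivariate model, up to terms free of (mu1, mu2)."""
    n = len(data)
    s_diff = sum(x1 - x2 for x1, x2 in data)
    s2 = sum(x2 for _, x2 in data)
    return -n * mu1 + s_diff * math.log(mu1 - mu2) + s2 * math.log(mu2)

# toy sample of (x1, x2) pairs
data = [(2, 1), (0, 0), (3, 1), (1, 1), (4, 2)]
xbar1 = sum(x1 for x1, _ in data) / len(data)   # 2.0
xbar2 = sum(x2 for _, x2 in data) / len(data)   # 1.0
best = loglik(xbar1, xbar2, data)
# the likelihood at the sample means beats nearby parameter values
nearby = [loglik(xbar1 + d1, xbar2 + d2, data)
          for d1 in (-0.1, 0.0, 0.1) for d2 in (-0.1, 0.0, 0.1)
          if (d1, d2) != (0.0, 0.0)]
```

The grid check is only a local confirmation; the concavity of the log-likelihood makes the maximum at (x̄1, x̄2) global.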
For the model with covariates, the normal equations are obtained, after some algebra, by differentiating the log-likelihood with respect to each regression coefficient. These equations provide the maximum likelihood estimates for the vectors of parameters β̂1 = (β̂11, . . . , β̂1m)′ and β̂2 = (β̂21, . . . , β̂2m)′. Similarly to the previous case, Fisher's information matrix can be obtained in closed form. See the details in Appendix A.
The normal equations illustrated above can be used to estimate the model parameters with and without covariates. The Newton-Raphson method provides solutions in a reasonable time, depending, obviously, on the number of regressors used.

Credibility Regression Premiums
Briefly speaking, a bonus-malus system is an experience rating system that is based on the insured's claim frequency experience rather than the claim size. Let us now assume some kind of heterogeneity between policyholders by allowing the parameters µi, i = 1, 2, to follow some probability distributions. For µ1, a gamma prior distribution π1(µ1) will be assumed, with shape hyperparameter α1 > 0 and scale hyperparameter γ1 > 0, whereas a beta-type prior distribution will be considered for µ2, with the probability density function given by:

π2(µ2) = [1/B(α2, γ2)] (µ2/µ1)^(α2−1) (1 − µ2/µ1)^(γ2−1) (1/µ1),  0 < µ2 < µ1.

Here, α2 > 0, γ2 > 0, and B(a, b) is the beta function given by B(a, b) = ∫₀¹ t^(a−1)(1 − t)^(b−1) dt = Γ(a)Γ(b)/Γ(a + b). The main benefit of selecting these prior distributions is that they are conjugate with respect to the likelihoods, and for that reason, they are common choices in Bayesian and actuarial statistics; see for instance Heilmann (1989), Denuit et al. (2009), and Klugman et al. (2008), among others.
Since µ 1 and µ 2 are dependent, we can choose the prior distribution given by: which corresponds to the copula proposed by Lee (1996).
Here, ω is a real number satisfying a suitable normalizing constraint. Given the sample information, where t is the sample size, the posterior distribution of (µ1, µ2) is computed according to Bayes' theorem, and it is proportional to the product of the likelihood and the prior distribution. Thus, the posterior distribution is almost conjugate with respect to the likelihood and similar to the product of a gamma and a beta distribution, with updated parameters given by:

α1* = α1 + t x̄1,  γ1* = γ1 + t,  α2* = α2 + t x̄2,  γ2* = γ2 + t(x̄1 − x̄2).  (8)-(11)

In practice, µ2 turns out to be near zero; in this case, ω → 0, and the prior distribution reduces to π(µ1, µ2) = π1(µ1)π2(µ2), which is the case considered here. The unconditional means and cross moment then follow directly, and integrating (1) against the prior yields the unconditional bivariate distribution given in (12). For computational reasons, it is sometimes more convenient to work with a reparametrized version of the model.

The maximum likelihood estimates for this mixture regression model can simply be obtained by means of the EM algorithm. This method is a powerful technique that provides an iterative procedure to compute maximum likelihood estimates when the data contain missing information. Details on the derivation of the EM algorithm can be found in Appendix B. The standard errors of the estimates Ω̂ = (β̂1, β̂2, γ̂1, γ̂2) can be computed by using the method given by Louis (1982). Here, we use Fisher's information matrix found in Appendix A and replace the missing values by the corresponding pseudo-values calculated in the last iteration of the EM algorithm. Direct maximization of the likelihood surface is also possible to compute the maximum likelihood estimates of the mixture regression model.
By following the same arguments as those provided in Gómez-Déniz (2016), and also based on the ideas in Heilmann (1989) (see also Gerber 1979; Rolski et al. 1999; Bühlmann and Gisler 2005; and Gómez-Déniz 2008; among others), a premium calculation principle assigns to each risk vector of parameters Θ a premium within the action space P ⊆ R. Let L(Θ, P) = (Θ − P)² be the squared-error loss function sustained by a decision-maker who takes the action P and is faced with the outcome Θ of a random experiment. The premium must be determined in such a way that the expected loss is minimised. The unknown premium P(Θ), called the risk premium, can be obtained by minimising the expected value of (g(x1, x2) − P)², where g(x1, x2) is an appropriate function of the number of claims with a claim size below and above ψ, respectively. It seems reasonable to take g(x1, x2) as:

g(x1, x2) = ps (x1 − x2) + pl x2,  (14)

where ps and pl are appropriate weights assigned to the number of claims with claim sizes below and above the critical value, respectively, with ps < pl. Now, simple algebra provides the risk premium:

P(Θ) = ps (µ1 − µ2) + pl µ2,

where the expectation is taken with respect to (1). By taking pl = ps = 1 in (14), this reduces to P(Θ) = µ1; that is, the risk premium depends only on the number of claims, irrespective of their size.
In the absence of experience, the actuary computes the collective premium:

P = Eπ[P(Θ)].  (15)

Again, by inserting pl = ps = 1 into (15), we obtain the collective premium computed under the traditional model, P = α1/γ1. On the other hand, if experience is available, the actuary takes a sample (x1, x2) from the random variables (X1, X2) and uses this information to estimate the unknown risk premium P(Θ) through the Bayes premium P*(x1, x2) = E_{π(Θ|(x1, x2))}[P(Θ)]. Because the posterior distribution is conjugate with respect to the prior, the Bayes premium can be derived from (15) by simply replacing the parameters αi and γi (i = 1, 2) with the updated parameters in (8)-(11). Furthermore, the Bayesian premium can be rewritten as a credibility expression, i.e., a linear function of the data and the collective premium.
Obviously, the Bayesian premium based on (15) does not depend on the individual's risk factors; it is based only on the accumulated past claims. The individual's risk factors can be incorporated into the premium by computing P*i(x1, x2, β1i, β2i), for i = 1, . . . , n. This general pricing formula is a function of the accumulated number of claims and the individual's significant characteristics in the regression component.
Finally, the Bayesian bonus-malus premium is computed as the ratio between the Bayesian premium and the collective premium. This bonus-malus premium is usually normalised by multiplying this ratio by 100.
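A minimal sketch of the whole pricing pipeline, under the simplifying assumptions that the priors are independent (the ω → 0 case), that the proportion p = µ2/µ1 carries the beta prior, that conjugate updating applies, and that claims below and above the threshold are weighted by p_s and p_l; the paper's exact formulas may differ, and the hyperparameter values are purely illustrative:

```python
def bonus_malus_premium(x1_total, x2_total, t,
                        a1, g1, a2, g2, p_s=1.0, p_l=1.5):
    """Hedged sketch: gamma(a1, g1) prior on mu1, independent beta(a2, g2)
    prior on p = mu2/mu1, and risk premium
    p_s*(mu1 - mu2) + p_l*mu2 = mu1*(p_s + (p_l - p_s)*p).
    x1_total / x2_total are accumulated claims over t periods."""
    # collective premium from the prior means
    prior_mu1 = a1 / g1
    prior_p = a2 / (a2 + g2)
    collective = prior_mu1 * (p_s + (p_l - p_s) * prior_p)
    # conjugate posterior updates (Poisson-gamma and binomial-beta)
    post_mu1 = (a1 + x1_total) / (g1 + t)
    post_p = (a2 + x2_total) / (a2 + g2 + x1_total)
    bayes = post_mu1 * (p_s + (p_l - p_s) * post_p)
    return 100.0 * bayes / collective   # normalised bonus-malus premium

# a claim-free history over three periods earns a discount (premium < 100)
bmp_clean = bonus_malus_premium(0, 0, t=3, a1=1.5, g1=20.0, a2=1.0, g2=4.0)
```

With p_l = p_s = 1, the collective premium collapses to α1/γ1, matching the reduction noted above, and with no experience (t = 0) the normalised premium is exactly 100.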

Empirical Results
We will now analyse a dataset that includes information based on one-year vehicle insurance policies taken out in 2004 or 2005. This dataset is available on the website of the Faculty of Business and Economics, Macquarie University (Sydney, Australia) (see also de Jong and Heller 2008). The total portfolio contains 67,856 policies, of which 4624 have at least one claim. With respect to the number of claims, the minimum and maximum were zero and four, respectively; the mean was 0.072, and the standard deviation was 0.278. Regarding the claim size, the minimum and maximum were zero and 55,922.10, respectively; the mean was 137.27, and the standard deviation was 1056.30. The standard deviation of the claim severity was very large, which means that a premium based only on the mean claim size is not adequate for computing the bonus-malus premiums. Since this portfolio only includes the aggregate value of the claim amount for all of the claims in the portfolio, we followed the approach provided in Gómez-Déniz (2016), in which a simulation randomly determines the exact amount corresponding to each claim. This simulation was carried out by using the Mathematica commands Permute, RandomChoice, IntegerPartitions, IntegerPart, and RandomPermutation, as shown in the Appendix provided in Gómez-Déniz (2016). It is convenient to note that the partition obtained only provided the integer part of each amount, which did not seem very relevant in the analysis. Furthermore, due to the RandomChoice command, the partition is different every time the program is run. The results obtained for the claim amounts via simulation are not shown in this work, but they are available from the authors upon request.
Table 1 below shows the observed (in bold) and expected frequencies with the threshold value for the claims assumed to be ψ = $1000. For each entry, the observed frequency (top row, in bold), the expected frequency under the basic model (middle row, obtained by using (1)), and that under the mixture model (bottom row, obtained by using (12)) are illustrated. Furthermore, the marginal observed and expected frequencies are in the far-right column and in the bottom row for X1 and X2, respectively. The cells in this table are grouped to comply with the rule of five when applying the χ² test.
Similarly, Table 2 exhibits the observed and expected frequencies when the threshold amount was ψ = $3000. Again, the cells are combined to comply with the rule of five. As can be seen, the mixture model was much more flexible since it incorporated heterogeneity among policyholders via the prior distributions, and it also provided a better fit to the data than the basic model for both thresholds. Maximum likelihood estimation was used in both cases. It is convenient to point out that, in the case of the mixture model, directly maximizing the log-likelihood function provided, as expected, the same results as using the EM algorithm shown in Appendix B of this work. Mathematica and WinRATS were the two packages used in this case.
Parameter estimates, standard errors (in brackets), the maximum of the log-likelihood function, the chi-squared test statistics, degrees of freedom (d.f.), and the p-values are exhibited in Table 3 for the basic and mixture models. Results under the first threshold value, ψ = $1000, are shown in the second and third columns and under ψ = $3000 in the last two columns. Virtually the same estimates were obtained for parameters µ1 and µ2 under the basic and mixture models. Similarly, no discernible changes appeared between the estimates for the two thresholds, with the exception of the estimate of parameter γ2; in this case, the estimate decreased when the threshold increased. By incrementing the threshold value, the fit to the data improved. The mixture model provided the best fit to the data in terms of the χ² test statistic and the negative of the maximum of the log-likelihood function, −ℓmax. It is important to note that, although the gain in terms of the maximum of the log-likelihood function did not seem significant, the mixture model was preferable in terms of the χ² test statistics since, unlike the basic one, it was not rejected at the 5% significance level (see the corresponding p-values) for either of the two thresholds mentioned above.

We now implement explanatory variables in our analysis. The following covariates were considered: gender of the driver, vehicle body, driver's area of residence, age of the vehicle, and driver's age category. In addition, an intercept was also included in the study. Details about the codification of these variables can be found on the same website. Moreover, an offset variable (exposure, the log of the time exposed to risk) was included in the linear predictor associated with the first variable.
Table 4 illustrates the estimates of the regressors for the mixture model associated with the random variables X1 and X2, again for thresholds of ψ = $1000 and ψ = $3000. In the first case, the explanatory variables hardtop (HDTOP), motorized caravan (MCARA), driver's area of residence C (AREAC), age of Vehicles 1 and 2 (VAGE1 and VAGE2), and driver's Age Category 1 (AGE1) were statistically significant at the 5% significance level for the random variable giving the total number of claims with a claim size exceeding ψ = $1000. Among these variables, only HDTOP, MCARA, VAGE1, VAGE2, and AGE1 were significant for both response variables. However, it is important to note that, for all these variables except the regressors associated with AGE1 and AGE2, the sign of the estimates changed from positive to negative for claims above the threshold. Furthermore, the estimate of parameter γ1 was statistically significant at the same nominal level. When the threshold value was increased to ψ = $3000, the number of significant variables above the threshold grew considerably since now the intercept (CONSTANT), gender of the driver (GENDER), HDTOP, SEDAN, station wagon (STNWG), TRUCK, AREAA, AREAB, AREAC, AREAD, VAGE1, VAGE2, and AGE1 were relevant. However, only CONSTANT, HDTOP, STNWG, AREAD, VAGE1, VAGE2, and AGE1 were significant for both dependent variables at the same nominal level. The regressors associated with the explanatory variables CONSTANT, AREAD, and AGE1 had the same sign for claims below and above ψ = $3000; the first two regressors were negatively correlated and the latter positively correlated with the response variables. For the other regressors, once again, the sign of the estimates changed from positive to negative for claims above the threshold. Among the common statistically significant estimates for both threshold values, i.e., HDTOP, VAGE1, VAGE2, and AGE1, the same sign of the estimates for the variables X1 and X2 was observable.
For the non-significant estimates, different signs were observed in the regressors. Furthermore, the estimates of parameters γ1 and γ2 were statistically significant at the same nominal level. Similarly to the case previously considered, the fit to the data improved when covariates were incorporated into the model and when the threshold value was enlarged. Table 5 exhibits the negative of the maximum of the log-likelihood function (−ℓmax), Akaike's information criterion (AIC), the Bayesian information criterion (BIC), and the consistent Akaike's information criterion (CAIC) for the basic and mixture regression models. Lower values of these model selection measures are desirable, and it was observable that the latter model was preferable to the former one.

We plot the QQ-plots of the randomized quantile residuals to check for normality in Figure 2. The residuals for the basic regression models are shown in the top row and for the mixture regression model in the bottom row. Furthermore, models that use ψ = $1000 as the threshold value are exhibited in the left column and ψ = $3000 in the right-hand column. A perfect alignment with the 45° line implies that the residuals are normally distributed. It was observable that the residuals for the larger threshold value adhered a little more closely to the line, but these differences were not significant.

Figure 3 shows the resulting bonus-malus premiums (BMP). Here, x1 is the total number of claims when x2 claims out of x1 have a size larger than ψ. In each chart, the thick line represents ψ = $1000, and the thin line denotes ψ = $3000. It was noticeable that the BMP decreased with the time period when the observed pair x1 and x2 was fixed for the two thresholds considered. The BMP was consistently lower when the threshold ψ decreased. Although for both values of ψ the premium charged increased when x1 and x2 grew, the premium paid also increased with x2 when x1 was fixed.
Figure 4 illustrates the bonus-malus premiums (BMP) to be charged to the subgroup of policyholders with SEDAN and AREAA. In this case, we used the mixture regression model, including the rest of the explanatory variables and the exposure. Similar conclusions could be drawn from this set of graphs. Again, the BMP was persistently lower when the threshold ψ decreased. The premium charged increased when x1 and x2 grew for either value of ψ; moreover, the premium paid rose with x2 when x1 was held fixed. The premiums obtained under this regression model were considerably higher than those derived before; this can surely be explained by the small sample size used to estimate the regressors and also by the incorporation of the offset variable, which without any doubt affected the individual average number of claims and the probability of making a claim larger than the threshold. Other subgroups of policyholders could also be used for tarification purposes; however, for some of these classes, unreliable estimates were obtained due to the very low sample size.

Computations in the Compound Model
Although it is customary to calculate the bonus-malus premium based on the number of claims (it is usually considered that, once a loss has occurred, the company does not have the ability to model the amount corresponding to the loss), some attempts have been made to incorporate the severity into the calculation of the premium. Some works related to this topic are Frangos and Vrontos (2001), Pinquet (1998), and Gómez-Déniz et al. (2014), among others. As the practitioner wishes to calculate the premium using both variables, it is useful to rely on the compound collective model. Similarly to the univariate case, the bivariate compound distribution for the aggregate claim size random variables can simply be derived as follows:

g(y1, y2) = ∑_{x1 ≥ 0} ∑_{x2 = 0}^{x1} f(x1, x2) f1*^{x1}(y1) f2*^{x2}(y2),  (16)

and this is the joint probability density function of (S1, S2), where S1 = Y11 + · · · + Y1X1 and S2 = Y21 + · · · + Y2X2 are the aggregate severities, with the Y1 and Y2 being mutually independent and also independent of (X1, X2), with probability functions (discrete or continuous) f1(y1) and f2(y2), respectively, and with x1- and x2-fold convolutions f1*^{x1}(y1) and f2*^{x2}(y2), respectively. General expressions for E(S), var(S), and cov(S1, S2), where S = S1 + S2, were provided in Partrat (1994).
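A Monte Carlo sketch of the bivariate compound model, assuming (purely for illustration) unit-mean exponential severities, which the text does not prescribe; by Wald's identity, the mean aggregate amounts should be close to µ1 E(Y1) and µ2 E(Y2):

```python
import math
import random

def rpois(rng, mu):
    """Poisson(mu) draw via Knuth's product-of-uniforms method."""
    limit, k, prod = math.exp(-mu), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1

def simulate_aggregate(n, mu1, mu2, mean_sev, seed=7):
    """Monte Carlo for (S1, S2): counts from the bivariate model,
    exponential severities independent of the counts."""
    rng = random.Random(seed)
    p = mu2 / mu1
    totals = []
    for _ in range(n):
        x1 = rpois(rng, mu1)
        x2 = sum(rng.random() < p for _ in range(x1))
        s1 = sum(rng.expovariate(1.0 / mean_sev) for _ in range(x1))
        s2 = sum(rng.expovariate(1.0 / mean_sev) for _ in range(x2))
        totals.append((s1, s2))
    return totals

sims = simulate_aggregate(20000, 2.0, 0.8, mean_sev=1.0)
es1 = sum(s1 for s1, _ in sims) / len(sims)  # close to mu1 * mean_sev = 2.0
es2 = sum(s2 for _, s2 in sims) / len(sims)  # close to mu2 * mean_sev = 0.8
```

Simulation is a pragmatic alternative to evaluating the convolutions in (16) directly; the recursions cited below are the exact-computation route.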
Recursion for bivariate count distributions and their compound distributions given in the form (16) have been previously considered in the actuarial literature; see Theorem 2.1. in Hesselager (1996). Other similar recursions can be found in Vernic (1997), Walhin and Paris (2000), Walhin and Paris (2001), Sundt (2002), and Sundt and Vernic (2009), among others. Moreover, bivariate recursions are useful in prediction problems involving the conditional g(y|x) of Y, given X = x; see Hesselager (1996) for more details.
Let us now assume that the random variables X 1 and X 2 represent two kinds of claims, for instance bodily injury and material damage, or as in our study, claims below and above a threshold ψ.
The fact that the probability generating function of (1) is analytically available helps us to derive the probability generating function of the joint random variable (X1(d1), X2(d2)), where di is a deductible applied to the type i (i = 1, 2) claim amounts. Here, Xi(di) is the random variable corresponding to the yearly frequency of type i claims exceeding di. The work in Partrat (1994) then showed that the probability generating function of the random variable (X1(d1), X2(d2)) is given by:

G_{X1(d1), X2(d2)}(s1, s2) = G_{X1, X2}(F1(d1) + (1 − F1(d1)) s1, F2(d2) + (1 − F2(d2)) s2),

where F1 and F2 are the cumulative distribution functions of the random variables Y1 and Y2, respectively; while the probability generating function of the random variable X(d1, d2), with X = X1 + X2, is given by:

G_{X(d1, d2)}(s) = G_{X1, X2}(F1(d1) + (1 − F1(d1)) s, F2(d2) + (1 − F2(d2)) s).
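The thinning behind Xi(di) can be checked by simulation; assuming unit-mean exponential severities (an illustrative choice, not one made in the text), the count of claims exceeding a deductible d should have mean µ1(1 − F(d)) = µ1 e^(−d):

```python
import math
import random

def count_exceeding(n, mu1, d, seed=3):
    """For each of n periods, draw x1 ~ Poisson(mu1) claims with unit-mean
    exponential severities and count those exceeding the deductible d."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        limit, x1, prod = math.exp(-mu1), 0, 1.0   # Knuth Poisson draw
        while True:
            prod *= rng.random()
            if prod <= limit:
                break
            x1 += 1
        counts.append(sum(rng.expovariate(1.0) > d for _ in range(x1)))
    return counts

mu1, d = 2.0, 0.5
counts = count_exceeding(20000, mu1, d)
mean_count = sum(counts) / len(counts)
thinned_mean = mu1 * math.exp(-d)   # mu1 * (1 - F(d)) for Exp(1) severities
```

This is exactly the substitution s → F(d) + (1 − F(d))s at the level of means: differentiating the thinned generating function at s = 1 multiplies the mean by the exceedance probability 1 − F(d).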

Final Comments
In this paper, a flexible bivariate count data regression model that lets us distinguish between different types of claims according to the claim size was introduced. Besides, it allowed us to examine the factors that affect the number of claims above and below a given claim size threshold. By means of a mixture regression model, the individual claim size and other risk factors, such as gender, type of vehicle, driving area, or age of the vehicle, could be used to compute credibility bonus-malus premiums. Extensions of this work include a simple modification of this model to differentiate between more than two types of claims, along the lines of the work provided in Gómez-Déniz and Calderín (2018). Besides, a similar model can simply be implemented when the number of claims is distributed according to a negative binomial distribution. A study of this nature would be a possible extension of this work.
Author Contributions: E.G.D. and E.C.-O. contributed equally to this work.

Appendix B
Given the vector of complete data x and the vector of missing observations (δ1, δ2) = {(δ11, δ21), . . . , (δ1n, δ2n)}, the complete data log-likelihood takes the form (A1). Expression (A1) can be divided into two parts: the regressors are included in the first part, and the mixing distributions appear only in the second part (i.e., parameters γ1 and γ2). Furthermore, to make the model identifiable, we assume, without loss of generality, that Eπ1(δ1) = 1 and Eπ2(δ2) = 1/2. The EM algorithm is based on two steps. The E-step, i.e., expectation, fills in the missing data. Once the missing data are built in, the parameters are estimated in the M-step, i.e., maximization. The regressors are estimated by using the pseudo-values E(δ1i|x1, x2) and E(δ2i|x1, x2) as offset variables and then fitting the regression model given in (6). Then, to estimate the parameters γ1 and γ2, we maximize the log-likelihood of the mixing distributions, replacing the missing observations with their expectations. Next, if some terminating condition is achieved, we stop iterating; otherwise, we move back to the E-step for more iterations.
E-step: For all i = 1, 2, . . . , n, we calculate the pseudo-values ci = E(δ1i|x1, x2) and mi = E(δ2i|x1, x2).
M-step: This step works as follows:
• Update the regressors β(k+1)j, j = 1, 2, using the pseudo-values ci and mi as offset variables by fitting the regression model given in (6), and then,
• Update γ(k+1)1 and γ(k+1)2 by maximizing the log-likelihood of the mixing distributions with the missing observations replaced by ci and mi.
Stop iterating if some terminating condition is satisfied.
The following result concerns the concept of multivariate log-concavity, which was introduced by Bapat (1988). See also Johnson et al. (1997).