Multivariate Credibility in Bonus-Malus Systems Distinguishing between Different Types of Claims

In the classical bonus-malus system, the premium assigned to each policyholder is based only on the number of claims made, without taking the claim size into account. Thus, a policyholder who has declared a claim that results in a relatively small loss is penalised to the same extent as one who has declared a more expensive claim. Many policyholders see this as unfair. In this paper, we study the factors that affect the number of claims in car insurance by using a trivariate discrete distribution. This approach allows us to distinguish between three types of claims, depending on whether the claim size is below, between or above certain thresholds. The model therefore incorporates the two fundamental random variables in this scenario: the number of claims and the amount associated with them. In addition, we introduce a trivariate prior distribution, conjugate to this discrete distribution, that produces credibility bonus-malus premiums satisfying appropriate traditional transition rules. A practical example based on real data is shown to examine the differences with respect to the premiums obtained under the traditional tariff system.


Introduction
In an attempt to reduce economic and casualty losses, bonus-malus systems (BMS) have been introduced in the actuarial community. A BMS is a pricing system mainly used in Europe in vehicle insurance. In such a system, the insured may have his/her premium discounted or penalized based on his/her own claims experience. The actuarial literature on this topic is extensive; see, for example, Lemaire (1985, 1995); Boucher et al. (2007); Mert and Saykan (2005); Sarabia et al. (2004); Denuit et al. (2009), among other papers. Different methodologies have been used to determine the fair premium that policyholders must pay in the different classes in which the system is configured. Among these methods, the most popular are Bayesian methods and discrete Markov chains.
In bonus-malus systems, the premium is usually computed by using only the random variable number of claims. However, as different claims produce different claim sizes, it would be sensible to develop BMS based on both the number of claims and the corresponding severity (see Gómez-Déniz (2016)). In addition, if the severity is not included in the bonus-malus premium (BMP), independence between the number of claims and the severity is implicitly assumed (see Lemaire (2004)). In this regard, several papers have discussed the question of implementing both variables in the BMS. See, for example, Frangos and Vrontos (2001); Gómez-Déniz et al. (2014) and the recent paper Gómez-Déniz (2016).
In this article we develop a trivariate model that discriminates between three types of claims. Since this distinction is made with respect to claims that are below, between or above certain thresholds, the resulting model takes into account the two fundamental random variables in this scenario: the number of claims and the corresponding claim size. Furthermore, we incorporate a trivariate prior distribution which is conjugate with the discrete trivariate model. As a result, we obtain credibility BMP's which satisfy desirable transition rules. We present an example based on real data from an Australian portfolio of automobile insurance claims. Our findings reveal that the BMP's computed by using the methodology proposed in this work differ from those derived under the traditional Poisson-Gamma model when claims are declared, but do not modify the discounts made in the absence of claims.
The methodology used in this paper differs from recent developments in multivariate credibility models found in the actuarial literature (see, for example, Frees (2003); Bühlmann and Gisler (2005)). Multidimensional credibility models were considered by Englund et al. (1999). In that paper, the authors assumed that each dimension of the risk parameter represented one cover from the business. However, they used only frequency information in the credibility approach. Similarly, Thuring (2011) investigated the effect of assuming that one out of two insurance products is inactive when estimating the latent risk profile. Moreover, Thuring et al. (2012) used a multivariate credibility model that allows the practitioner to consider the positive correlation in customer behaviour between different financial products and to estimate the customer-specific risk profile for a specific product not owned by the customer. Again, this approach uses only two quantities: the a priori expected number of events and the observed number of events.
The Bayesian methodology has been used in actuarial science since the mid-twentieth century and has proved to be a useful tool for the calculation of insurance premiums. It generally consists of accepting that each policy or insured is represented by a risk parameter that is unknown but random, with a certain probability distribution (in the insurance portfolio) called the a priori distribution or structure function. This way of proceeding is even more useful in the BMS scenario, since the premium obeys certain transition rules that classify the policyholders as bonus or malus. In other words, it lowers the premiums to be paid by the bonus classes and increases the premiums for the malus classes. The fundamental Bayesian tool here is simply Bayes's theorem: by dividing the a posteriori mean of the parameter by the a priori mean, when the net premium principle or the quadratic loss function is used (see Denuit et al. (2009); Lemaire (1985, 1995, 2004); Sarabia et al. (2004), among others), we obtain an estimator of the risk parameter(s) that indirectly divides the insured into good risks and bad risks. In addition, the empirical results illustrated in Frangos and Vrontos (2001); Mert and Saykan (2005); Gómez-Déniz et al. (2014) show that many auto insurance portfolios present a positive correlation between the number of claims and the claim size, and therefore some kind of dependence between them should be considered in the calculation of BMP's. The pioneering research in this field was that of Picard (1976) (see also Lemaire (1995), chp. 13), who divided the claims into two types: small, those below a limit value, say ψ, and large, those above ψ. Then, as this assumption did not produce a good fit, the author proposed to distinguish between accidents that caused property damage and those that caused personal injury.
The rest of this paper is organised as follows. Section 2 describes the basic distributional assumptions formulated for the numbers of each type of claim. Section 3 discusses a trivariate prior distribution that is conjugate with respect to the discrete model. Next, Bayesian BMP's are derived and written as a credibility formula. Numerical applications are presented in Section 4. Finally, Section 5 concludes the article.

Basic Model
The assumption that the number of claims in an auto insurance portfolio follows a Poisson distribution has often been criticised, because for this distribution the dispersion index (the ratio between the variance and the mean) equals one, whereas in auto insurance portfolios it has been empirically found to be slightly higher than one. However, as a starting point, and to facilitate the methodology that we will develop, we will assume that the number of claims $X$ has a Poisson distribution with parameter $\theta > 0$ and probability function given by

$$\Pr(X = x) = \frac{\theta^{x} e^{-\theta}}{x!}, \qquad x = 0, 1, \ldots$$

Each claim $i$, $i = 1, \ldots, x$, has an associated size, say $y_i \ge 0$. It is our interest now to distinguish between different types of claims (three in our case). For that reason, we include two new random variables that give rise to three separate sub-events. Let $Z_i^0$, $Z_i^1$ and $Z_i^2$ be the indicator variables that take the value 1 when the size $y_i$ of the $i$th claim is below $\phi_1$, between $\phi_1$ and $\phi_2$, or above $\phi_2$, respectively, and 0 otherwise. For each $j = 0, 1, 2$, the $Z_i^j$ are modelled as independent and identically distributed Bernoulli random variables,

$$\Pr(Z_i^j = 1) = p_j, \qquad \Pr(Z_i^j = 0) = 1 - p_j,$$

where $0 < p_j < 1$. Observe that these assumptions imply that $E(Z_i^j) = p_j$, $j = 0, 1, 2$. Since in practice the majority of policyholders in the portfolio do not make a claim, and those who declare claims of large size are sparse, we will also assume that $p_0 \ge p_1 \ge p_2$ with $\sum_{j=0}^{2} p_j = 1$. (As a reviewer has pointed out, if this inequality does not hold, then the likelihood function and posterior distribution that will be defined later are not correct.)

We now assume that $Z_1 = \sum_{i=1}^{x} Z_i^1$ is the total number of claims with a claim size between $\phi_1 > 0$ and $\phi_2 > 0$, and that $Z_2 = \sum_{i=1}^{x} Z_i^2$ is the total number of claims with a claim size larger than $\phi_2$. Thus, if the $Z_i^j$, $i = 1, \ldots, x$, $j = 1, 2$, are assumed to be mutually independent, then the conditional probability function of $Z_1$, given that $X = x$, is binomial with parameters $x$ and $p_1$, and the conditional probability function of $Z_2$, given that $X = x$ and $Z_1 = z_1$, is also binomial with parameters $x - z_1$ and $p_2$. That is,

$$\Pr(Z_1 = z_1 \mid X = x) = \binom{x}{z_1} p_1^{z_1} (1 - p_1)^{x - z_1}, \qquad z_1 = 0, 1, \ldots, x,$$

$$\Pr(Z_2 = z_2 \mid X = x, Z_1 = z_1) = \binom{x - z_1}{z_2} p_2^{z_2} (1 - p_2)^{x - z_1 - z_2}, \qquad z_2 = 0, 1, \ldots, x - z_1.$$

Thus, the conditional means and variances are linear. They are given by $E(Z_1 \mid x) = x p_1$, $\mathrm{var}(Z_1 \mid x) = x p_1 (1 - p_1)$, $E(Z_2 \mid x, z_1) = (x - z_1) p_2$ and $\mathrm{var}(Z_2 \mid x, z_1) = (x - z_1) p_2 (1 - p_2)$. Now, by conditioning, it is easy to get the joint probability function of the random variable $(X, Z_1, Z_2)$, which results in

$$\Pr(X = x, Z_1 = z_1, Z_2 = z_2) = \frac{e^{-\theta}\, \theta^{x}}{z_1!\, z_2!\, (x - z_1 - z_2)!}\; p_1^{z_1} (1 - p_1)^{x - z_1}\, p_2^{z_2} (1 - p_2)^{x - z_1 - z_2},$$

for $x = 0, 1, \ldots$; $z_1 = 0, 1, \ldots, x$; $z_2 = 0, 1, \ldots, x - z_1$. Observe that the distribution depends on three parameters, each related to one of the three types of claims.
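Straightforward algebra provides the moments and the cross moment. In particular, the means are

$$E(X) = \theta, \tag{5}$$
$$E(Z_1) = \theta\, p_1, \tag{6}$$
$$E(Z_2) = \theta\, (1 - p_1)\, p_2. \tag{7}$$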
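The joint probability function and its first moments can be checked numerically. The following is a minimal sketch, assuming the Poisson and nested binomial structure described in this section; the function names `joint_pmf` and `summarise`, the truncation point `x_max` and the parameter values are illustrative choices and do not come from the paper.

```python
import math


def joint_pmf(x, z1, z2, theta, p1, p2):
    """Pr(X = x, Z1 = z1, Z2 = z2): a Poisson count split by two nested binomials."""
    if min(x, z1, z2) < 0 or z1 + z2 > x:
        return 0.0
    z0 = x - z1 - z2  # claims below the first threshold
    log_pr = (
        -theta + x * math.log(theta)
        - math.lgamma(z1 + 1) - math.lgamma(z2 + 1) - math.lgamma(z0 + 1)
        + z1 * math.log(p1) + (x - z1) * math.log(1.0 - p1)
        + z2 * math.log(p2) + z0 * math.log(1.0 - p2)
    )
    return math.exp(log_pr)


def summarise(theta, p1, p2, x_max=40):
    """Total probability and the three means, truncating the Poisson at x_max."""
    total = mean_x = mean_z1 = mean_z2 = 0.0
    for x in range(x_max + 1):
        for z1 in range(x + 1):
            for z2 in range(x - z1 + 1):
                pr = joint_pmf(x, z1, z2, theta, p1, p2)
                total += pr
                mean_x += x * pr
                mean_z1 += z1 * pr
                mean_z2 += z2 * pr
    return total, mean_x, mean_z1, mean_z2


if __name__ == "__main__":
    theta, p1, p2 = 0.3, 0.4, 0.25  # illustrative values only
    print("numerical:", summarise(theta, p1, p2))
    print("closed form:", (1.0, theta, theta * p1, theta * (1 - p1) * p2))
```

The two printed lines should agree up to truncation and rounding error, which gives a quick consistency check on the means $\theta$, $\theta p_1$ and $\theta (1 - p_1) p_2$.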

Estimation
Given a sample $(x_i, z_{1i}, z_{2i})$, $i = 1, \ldots, t$, where $t$ is the sample size, the maximum likelihood estimates of the parameters $\theta$, $p_1$ and $p_2$ are easily obtained. They are $\hat{\theta} = \bar{x}$, $\hat{p}_1 = \bar{z}_1 / \bar{x}$ and $\hat{p}_2 = \bar{z}_2 / (\bar{x} - \bar{z}_1)$, where $\bar{x}$, $\bar{z}_1$ and $\bar{z}_2$ are the sample means of the three random variables, respectively. These estimators coincide with the moment estimators obtained by using (5)-(7). The asymptotic variance-covariance matrix of $(\hat{\theta}, \hat{p}_1, \hat{p}_2)$ is obtained by inverting the Fisher information matrix. The score equations used to estimate the parameters and the Fisher information matrix are provided in Appendix A.
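The closed-form estimators above are simple to compute. The sketch below is illustrative only: the function name `fit_basic_model` and the toy sample are hypothetical, and the standard errors rely on our own derivation from the log-likelihood of the basic model, which gives a diagonal information matrix with entries $t/\theta$, $t\theta/\{p_1(1-p_1)\}$ and $t\theta(1-p_1)/\{p_2(1-p_2)\}$. That matrix is an assumption here and is not copied from Appendix A.

```python
import math


def fit_basic_model(sample):
    """Closed-form ML estimates for the basic model.

    sample: list of (x, z1, z2) tuples, one per policy.
    Returns ((theta, p1, p2), (se_theta, se_p1, se_p2)).
    """
    t = len(sample)
    x_bar = sum(x for x, _, _ in sample) / t
    z1_bar = sum(z1 for _, z1, _ in sample) / t
    z2_bar = sum(z2 for _, _, z2 in sample) / t
    if x_bar <= z1_bar:
        raise ValueError("p2 is not identified: no claims outside the middle band")

    theta = x_bar
    p1 = z1_bar / x_bar
    p2 = z2_bar / (x_bar - z1_bar)

    # Asymptotic standard errors from the inverse of the (assumed) diagonal
    # Fisher information matrix described in the lead-in.
    se_theta = math.sqrt(theta / t)
    se_p1 = math.sqrt(p1 * (1.0 - p1) / (t * theta))
    se_p2 = math.sqrt(p2 * (1.0 - p2) / (t * theta * (1.0 - p1)))
    return (theta, p1, p2), (se_theta, se_p1, se_p2)


if __name__ == "__main__":
    # Hypothetical toy sample: (claims, claims in middle band, claims in top band)
    toy = [(0, 0, 0), (1, 0, 0), (2, 1, 0), (1, 0, 1), (0, 0, 0), (1, 1, 0)]
    print(fit_basic_model(toy))
```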

Contemplating Heterogeneity
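Let us now assume that the model includes a certain level of heterogeneity, so that the parameters $\theta$, $p_1$ and $p_2$ vary among insureds in the portfolio. In this regard, we suppose that the parameter $\theta$ follows a gamma prior distribution (structure function) with shape parameter $\alpha > 0$, rate parameter $\beta > 0$ and probability density function given by

$$\pi(\theta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \theta^{\alpha - 1} e^{-\beta \theta}, \qquad \theta > 0,$$

whose mean and variance are $E(\theta) = \alpha/\beta$ and $\mathrm{var}(\theta) = \alpha/\beta^{2}$, respectively. The $p_i$ parameters are assumed to follow beta prior distributions with parameters $\alpha_i > 0$ and $\beta_i > 0$, $i = 1, 2$. That is, the probability density functions of the $p_i$ are given by

$$\pi(p_i) = \frac{p_i^{\alpha_i - 1} (1 - p_i)^{\beta_i - 1}}{B(\alpha_i, \beta_i)}, \qquad 0 < p_i < 1, \tag{9}$$

where $B(a, b)$ represents the beta function given by $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a + b)$ and $\Gamma(\cdot)$ is the Euler gamma function. The mean and variance of these prior distributions, given by (9) for $i = 1, 2$, are

$$E(p_i) = \frac{\alpha_i}{\alpha_i + \beta_i}, \qquad \mathrm{var}(p_i) = \frac{\alpha_i \beta_i}{(\alpha_i + \beta_i)^{2} (\alpha_i + \beta_i + 1)}.$$

The flexibility of the beta distribution allows it to take different shapes depending on the values of its two parameters. These prior distributions are chosen because they are conjugate with the likelihoods (see Heilmann (1989); Denuit et al. (2009); Klugman et al. (2008); among others). For that reason, we will assume independence between the random variables $\theta$, $p_1$ and $p_2$ by taking

$$\pi(\vartheta) = \pi(\theta)\, \pi(p_1)\, \pi(p_2).$$

Given a sample $(x_i, z_{1i}, z_{2i})$, $i = 1, \ldots, t$, where $t$ is the sample size, the posterior distribution of $\vartheta = (\theta, p_1, p_2)$ given the sample information is computed according to Bayes's theorem and is proportional to the product of the likelihood and the prior distribution. Thus we find that the likelihood function is proportional to

$$\theta^{t\bar{x}} e^{-t\theta}\; p_1^{t\bar{z}_1} (1 - p_1)^{t(\bar{x} - \bar{z}_1)}\; p_2^{t\bar{z}_2} (1 - p_2)^{t(\bar{x} - \bar{z}_1 - \bar{z}_2)} \tag{10}$$

and the prior distribution is proportional to

$$\theta^{\alpha - 1} e^{-\beta\theta}\; p_1^{\alpha_1 - 1} (1 - p_1)^{\beta_1 - 1}\; p_2^{\alpha_2 - 1} (1 - p_2)^{\beta_2 - 1}.$$

Thus, the posterior distribution is conjugate with respect to the likelihood (10) and it is described by

$$\pi(\vartheta \mid \text{sample}) \propto \theta^{\alpha + t\bar{x} - 1} e^{-(\beta + t)\theta}\; p_1^{\alpha_1 + t\bar{z}_1 - 1} (1 - p_1)^{\beta_1 + t(\bar{x} - \bar{z}_1) - 1}\; p_2^{\alpha_2 + t\bar{z}_2 - 1} (1 - p_2)^{\beta_2 + t(\bar{x} - \bar{z}_1 - \bar{z}_2) - 1},$$

where the constant of proportionality does not depend on $\theta$, $p_1$ and $p_2$. Here, $\bar{x} = (1/t) \sum_{i=1}^{t} x_i$ and $\bar{z}_j = (1/t) \sum_{i=1}^{t} z_{ji}$, $j = 1, 2$, are the sample means of $X$, $Z_1$ and $Z_2$, respectively. Therefore, the posterior distribution is the product of a gamma and two beta distributions, with the updated parameters given in Table 1.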
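The conjugate update amounts to adding observed counts to the prior parameters. The sketch below illustrates this under the gamma and beta priors described above; the class name `Hyper`, the function `update` and the numerical values are hypothetical, and the arguments `x`, `z1` and `z2` are taken to be totals over the $t$ observed periods (that is, $t$ times the sample means).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyper:
    """Hyperparameters of the gamma prior (alpha, beta) and the two beta priors."""
    alpha: float
    beta: float
    alpha1: float
    beta1: float
    alpha2: float
    beta2: float


def update(prior, t, x, z1, z2):
    """Posterior hyperparameters after t periods with x claims in total,
    z1 of them in the middle band and z2 in the top band."""
    if min(t, x, z1, z2) < 0 or z1 + z2 > x:
        raise ValueError("counts must satisfy 0 <= z1 + z2 <= x")
    return Hyper(
        alpha=prior.alpha + x,            # gamma shape gains the claim count
        beta=prior.beta + t,              # gamma rate gains the exposure
        alpha1=prior.alpha1 + z1,         # middle-band claims
        beta1=prior.beta1 + (x - z1),     # claims outside the middle band
        alpha2=prior.alpha2 + z2,         # top-band claims
        beta2=prior.beta2 + (x - z1 - z2),  # claims below the first threshold
    )


if __name__ == "__main__":
    prior = Hyper(1.0, 2.0, 1.0, 1.0, 1.0, 1.0)  # illustrative values only
    print(update(prior, t=2, x=3, z1=1, z2=1))
```

With no observations (`t = 0` and all counts zero) the function returns the prior unchanged, which mirrors the fact that the Bayesian premium reduces to the collective premium when no information is available.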

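In the numerical applications section below, we adopt an empirical Bayes approach in which the parameters of the prior distributions are estimated from the data (see Robbins (1964); Casella (1985)). In order to do this, we need the marginal (unconditional) distribution of $(X, Z_1, Z_2)$, which can be easily obtained by compounding. Because the variables separate, the integration is simple. Thus, the unconditional distribution results in

$$\Pr(X = x, Z_1 = z_1, Z_2 = z_2) = \underbrace{\frac{\Gamma(\alpha + x)}{\Gamma(\alpha)\, x!} \left(\frac{\beta}{1 + \beta}\right)^{\alpha} \left(\frac{1}{1 + \beta}\right)^{x}}_{NB}\; \underbrace{\binom{x}{z_1} \frac{B(\alpha_1 + z_1, \beta_1 + x - z_1)}{B(\alpha_1, \beta_1)}}_{BB}\; \underbrace{\binom{x - z_1}{z_2} \frac{B(\alpha_2 + z_2, \beta_2 + x - z_1 - z_2)}{B(\alpha_2, \beta_2)}}_{BB}, \tag{12}$$

where $NB$ represents the negative binomial distribution and $BB$ the compound binomial-beta distribution. Some algebra shows that the probability function (12) of this trivariate unconditional model can be rewritten in a compact form as

$$\Pr(X = x, Z_1 = z_1, Z_2 = z_2) = \frac{\Gamma(\alpha + x)}{\Gamma(\alpha)\, z_1!\, z_2!\, (x - z_1 - z_2)!}\, \frac{\beta^{\alpha}}{(1 + \beta)^{\alpha + x}}\, \frac{B(\alpha_1 + z_1, \beta_1 + x - z_1)\, B(\alpha_2 + z_2, \beta_2 + x - z_1 - z_2)}{B(\alpha_1, \beta_1)\, B(\alpha_2, \beta_2)}.$$

The unconditional means are obtained using (5)-(8) by compounding and are given by

$$E(X) = \frac{\alpha}{\beta}, \qquad E(Z_1) = \frac{\alpha}{\beta}\, \frac{\alpha_1}{\alpha_1 + \beta_1}, \qquad E(Z_2) = \frac{\alpha}{\beta}\, \frac{\beta_1}{\alpha_1 + \beta_1}\, \frac{\alpha_2}{\alpha_2 + \beta_2}.$$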
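The unconditional distribution can be evaluated with log-gamma functions only. The following is a sketch under the gamma and beta mixing assumptions of this section, written as the product of a negative binomial term and two beta-binomial terms; the function names and the hyperparameter values are illustrative and are not the estimates reported in the paper.

```python
import math


def log_beta(a, b):
    """log B(a, b) via log-gamma functions."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_comb(n, k):
    """log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def marginal_pmf(x, z1, z2, alpha, beta, a1, b1, a2, b2):
    """Unconditional Pr(X = x, Z1 = z1, Z2 = z2) after mixing over the priors."""
    if min(x, z1, z2) < 0 or z1 + z2 > x:
        return 0.0
    # Negative binomial part (Poisson mixed with a gamma of rate beta)
    log_nb = (math.lgamma(alpha + x) - math.lgamma(alpha) - math.lgamma(x + 1)
              + alpha * math.log(beta) - (alpha + x) * math.log(1.0 + beta))
    # Beta-binomial part for Z1 given X = x
    log_bb1 = log_comb(x, z1) + log_beta(a1 + z1, b1 + x - z1) - log_beta(a1, b1)
    # Beta-binomial part for Z2 given X = x and Z1 = z1
    log_bb2 = (log_comb(x - z1, z2)
               + log_beta(a2 + z2, b2 + x - z1 - z2) - log_beta(a2, b2))
    return math.exp(log_nb + log_bb1 + log_bb2)


def marginal_summary(alpha, beta, a1, b1, a2, b2, x_max=80):
    """Total probability and E(X), E(Z1), E(Z2), truncating X at x_max."""
    total = ex = ez1 = ez2 = 0.0
    for x in range(x_max + 1):
        for z1 in range(x + 1):
            for z2 in range(x - z1 + 1):
                pr = marginal_pmf(x, z1, z2, alpha, beta, a1, b1, a2, b2)
                total += pr
                ex += x * pr
                ez1 += z1 * pr
                ez2 += z2 * pr
    return total, ex, ez1, ez2


if __name__ == "__main__":
    print(marginal_summary(1.5, 3.0, 2.0, 3.0, 1.0, 4.0))  # illustrative values
```

In an empirical Bayes setting, the logarithm of `marginal_pmf` summed over the portfolio would be the objective to maximise with respect to the six hyperparameters.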

The Premiums
Premiums can be derived by following the ideas displayed in Gómez-Déniz (2016). Consider an appropriate function of the numbers of claims with claim size below $\phi_1 > 0$, between $\phi_1$ and $\phi_2 > 0$, and above $\phi_2$, in which $p_x$, $p_{z_1}$ and $p_{z_2}$ are appropriate weights assigned to the numbers of claims of each size type. It is also reasonable to assume that $p_{z_2} > p_{z_1} > p_x$. Now, by using the net premium principle, i.e., the squared-error loss function, and simple algebra, we obtain the risk premium (14). Observe that if $p_{z_1} = p_{z_2} = p_x = 1$, then the risk premium in (14) is simply $P(\theta) = \theta$, that is, the risk premium obtained under the traditional model (net premium principle), in which the premium depends only on the number of claims, irrespective of their size. The risk premium is the fair premium to be charged to a policyholder if $\theta$, $p_1$ and $p_2$ were known. However, these quantities are unobservable in practice, so the risk premium is a theoretical quantity that cannot be determined exactly and must be estimated from the data. On the other hand, the a priori premium is obtained for a policyholder about whom nothing is known, i.e., it is the average of all possible risk premiums.
The a priori (collective) premium (15) is obtained by taking the expectation of the risk premium with respect to the prior distribution. Again, by inserting $p_{z_1} = p_{z_2} = p_x = 1$ in (15) we obtain the collective premium computed under the traditional model, that is, $P = \alpha/\beta$.
The Bayesian premium $P^{*}(t, x, z_1, z_2)$, which is not reproduced here, is derived from (15) by replacing the parameters $\alpha$, $\beta$, $\alpha_i$ and $\beta_i$ $(i = 1, 2)$ with the updated parameters displayed in Table 1.
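The chain from collective premium to Bayesian premium to relativity can be sketched in a few lines. Since the weighting function itself is not reproduced above, the code assumes one plausible form, $p_x (X - Z_1 - Z_2) + p_{z_1} Z_1 + p_{z_2} Z_2$, chosen because it reduces to the claim count when all weights equal one; this form, the function names and the hyperparameter values are assumptions for illustration and should not be read as the paper's exact expressions (14)-(17).

```python
def weighted_mean(alpha, beta, a1, b1, a2, b2, w):
    """Expected weighted claim count under a gamma(alpha, beta) and two beta laws.

    w = (p_x, p_z1, p_z2) are the weights for claims below, between and above
    the thresholds. ASSUMED form: p_x*(X - Z1 - Z2) + p_z1*Z1 + p_z2*Z2.
    """
    m1 = a1 / (a1 + b1)  # mean of p1
    m2 = a2 / (a2 + b2)  # mean of p2
    p_x, p_z1, p_z2 = w
    return (alpha / beta) * (p_x * (1 - m1) * (1 - m2) + p_z1 * m1 + p_z2 * (1 - m1) * m2)


def collective_premium(hyper, w):
    """A priori premium: prior expectation of the (assumed) risk premium."""
    return weighted_mean(*hyper, w)


def bayes_premium(hyper, t, x, z1, z2, w):
    """Same expression evaluated at the conjugate-updated hyperparameters."""
    alpha, beta, a1, b1, a2, b2 = hyper
    updated = (alpha + x, beta + t, a1 + z1, b1 + x - z1, a2 + z2, b2 + x - z1 - z2)
    return weighted_mean(*updated, w)


def relativity(hyper, t, x, z1, z2, w):
    """Bonus-malus relativity: Bayesian premium divided by collective premium."""
    return bayes_premium(hyper, t, x, z1, z2, w) / collective_premium(hyper, w)


if __name__ == "__main__":
    hyper = (1.0, 10.0, 2.0, 4.0, 1.0, 3.0)  # hypothetical hyperparameters
    weights = (0.25, 0.50, 0.75)             # weights used in the Table 4 caption
    for history in [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 0, 1)]:
        print(history, round(relativity(hyper, *history, weights), 3))
```

Under this assumed form, a claim-free year lowers the relativity below one, and a single claim raises it by more when the claim falls in a higher band, which is the qualitative behaviour the transition rules require.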
Note that $P^{*}(0, 0, 0, 0) = P$. That is, the Bayesian premium coincides with the a priori premium when no information is available. Furthermore, the Bayesian premium can be written as a credibility expression (16) involving a function $\gamma(x, z_1, z_2, t)$, which can in turn be written in terms of a factor that, with $\kappa = E[\mathrm{var}(X \mid \theta)]/\mathrm{var}[E(X \mid \theta)]$, coincides with the classical credibility factor usually appearing in this setting in actuarial science (see Bühlmann (1967); Jewell (1974); Gómez-Déniz (2008), among others, for details). Two limiting cases are then simple to see. In the first, the premium is based only on the prior information about the risk; this is the case in which experience is ignored and external information is used as the sole basis for ratemaking. In the second, the premium is based only on the sample information.
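For the frequency component alone, the classical credibility factor can be worked out explicitly. The derivation below is a sketch under the Poisson likelihood and the gamma prior with shape $\alpha$ and rate $\beta$; the symbol $Z(t)$ for the credibility factor is our own notation and may differ from the one used in expression (16).

```latex
% Credibility factor for the claim frequency, assuming X | theta ~ Poisson(theta)
% and theta ~ Gamma(shape alpha, rate beta). Z(t) is notation introduced here.
\begin{align*}
  E[\operatorname{var}(X \mid \theta)] &= E[\theta] = \frac{\alpha}{\beta}, \\
  \operatorname{var}[E(X \mid \theta)] &= \operatorname{var}(\theta) = \frac{\alpha}{\beta^{2}}, \\
  \kappa &= \frac{\alpha/\beta}{\alpha/\beta^{2}} = \beta, \\
  Z(t) &= \frac{t}{t + \kappa} = \frac{t}{t + \beta}, \\
  E(\theta \mid \text{sample}) &= \frac{\alpha + t\bar{x}}{\beta + t}
      = Z(t)\, \bar{x} + \bigl(1 - Z(t)\bigr)\, \frac{\alpha}{\beta}.
\end{align*}
```

With $t = 0$ the factor is zero and the posterior mean equals the prior mean $\alpha/\beta$; as $t$ grows the factor approaches one and the posterior mean approaches the sample mean $\bar{x}$.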

Numerical Applications
In order to compute the premiums based on the models introduced in this paper, we examine a dataset that includes information on one-year vehicle insurance policies taken out in 2004 or 2005. This dataset is available on the website of the Faculty of Business and Economics, Macquarie University (Sydney, Australia) (see also De Jong and Heller (2008)). The total portfolio contains 67,856 policies, of which 4624 have at least one claim. With respect to the number of claims, the minimum and maximum are 0 and 4, respectively; the mean is 0.072 and the standard deviation is 0.278. Regarding the claim size, the minimum and maximum are 0 and 55,922.10, respectively; the mean is 137.27 and the standard deviation is 1056.30. This standard deviation is very large relative to the mean claim size, which means that a premium based only on the mean claim size is not adequate for calculating the bonus-malus premiums. Because this portfolio includes only the aggregate value of the claim severity, a simulation analysis was carried out to randomly determine the amount that corresponds to each individual claim. Then, we allocate the claims to the corresponding intervals, i.e., $0-$500, $500-$1000 and >$1000. Thus, we are assuming that φ₁ = 500 and φ₂ = 1000. This simulation analysis was carried out by using the Mathematica software package. We have taken the integer part of the individual claim amounts; this does not seem very relevant to our analysis. It is important to mention that, because of the RandomChoice function, the partition of the aggregate claim amount is different every time the program is run.
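The splitting step can be illustrated with a short simulation. The exact Mathematica scheme is not described above, so the sketch below substitutes a simple alternative: uniform random cut points on the integer part of the aggregate amount. The function names, the cut-point scheme and the convention that a claim of exactly 500 or 1000 falls in the lower interval are all assumptions made here for illustration.

```python
import random

PHI_1 = 500   # first threshold, as in the text
PHI_2 = 1000  # second threshold, as in the text


def split_aggregate(total, n_claims, rng):
    """Randomly split the integer part of `total` into n_claims non-negative
    integer amounts (ASSUMED scheme: uniform random cut points)."""
    if n_claims == 0:
        return []
    total = int(total)  # integer part of the aggregate amount
    cuts = sorted(rng.randint(0, total) for _ in range(n_claims - 1))
    bounds = [0] + cuts + [total]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]


def classify(amounts):
    """Return (x, z1, z2): number of claims, claims in (PHI_1, PHI_2],
    and claims above PHI_2. Boundary conventions are an assumption."""
    z1 = sum(1 for y in amounts if PHI_1 < y <= PHI_2)
    z2 = sum(1 for y in amounts if y > PHI_2)
    return len(amounts), z1, z2


if __name__ == "__main__":
    rng = random.Random(2004)  # fixed seed so that a run can be repeated
    amounts = split_aggregate(2500.70, 3, rng)
    print(amounts, classify(amounts))
```

Passing a seeded `random.Random` instance makes the partition reproducible, in contrast with the run-to-run variation noted above.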
Empirical values and fitted values obtained by using the discrete trivariate distribution, Fitted (1), and the mixing model (12), Fitted (2), are shown in Table 2. The sample values were taken from the results obtained after dividing the claims by the simulation scheme mentioned above. The estimated parameter values (standard errors in brackets) are shown in Table 3. We also show the values of two measures of model selection: Akaike's information criterion (AIC) and the consistent Akaike information criterion (CAIC); see Akaike (1974); Bozdogan (1987) for details. The goodness of fit was assessed by the standard Pearson chi-squared test statistic with the following grouping procedure: the outermost classes were consolidated to produce theoretical class sizes of 5 or larger. The fit to the data is reasonably good for the mixture model and not very promising for the basic model. For the mixture model, the maximum likelihood estimates were obtained by directly maximizing the log-likelihood surface.
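Both selection criteria are simple functions of the maximised log-likelihood. A minimal sketch, using the usual definitions AIC $= 2k - 2\ell$ and CAIC $= k(\ln n + 1) - 2\ell$, where $k$ is the number of parameters and $n$ the sample size, is given below; the log-likelihood values in the example are hypothetical placeholders and not the values behind Table 3.

```python
import math


def aic(log_lik, k):
    """Akaike's information criterion: 2k - 2*log-likelihood."""
    return 2.0 * k - 2.0 * log_lik


def caic(log_lik, k, n):
    """Consistent AIC (Bozdogan): k*(ln n + 1) - 2*log-likelihood."""
    return k * (math.log(n) + 1.0) - 2.0 * log_lik


if __name__ == "__main__":
    n = 67856  # number of policies in the portfolio
    # Hypothetical maximised log-likelihoods, for illustration only
    candidates = {"basic (3 parameters)": (-20000.0, 3),
                  "mixture (6 parameters)": (-19800.0, 6)}
    for name, (log_lik, k) in candidates.items():
        print(name, aic(log_lik, k), caic(log_lik, k, n))
```

For both criteria a smaller value indicates the preferred model; CAIC penalises the extra parameters of the mixture model more heavily than AIC when $n$ is large.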

The Proposed Premiums
Table 4 illustrates the relativities (Bayesian BMP's) obtained by applying (17) and the parameter estimates displayed in Table 3. The structure of this table is built in a similar way to the one derived in a traditional BMS. Namely, at the beginning of the system the relativity is set equal to 1.000; then this relativity decreases within the year in the absence of claims, and it increases when claims are declared. Nevertheless, for x ≥ 1 the system now discerns whether the claims are below the size φ₁, between φ₁ and φ₂, or above φ₂. For the sake of comparison, the reader is referred to Gómez-Déniz (2016), which reports the Bayesian bonus-malus premiums calculated for the Poisson-Gamma model under the net premium principle (see also Dionne and Vanasse (1989)) and those computed by using expression (20) in Gómez-Déniz (2016). The bonus-malus premiums are the same for the bonus class (x = 0) and different for the malus classes (x ≥ 1) with respect to the first of the models mentioned above. In this regard, it is now possible to distinguish between claims with a severity below φ₁, between φ₁ and φ₂, and above φ₂. For example, in Table 3 of Gómez-Déniz (2016), for the case (t, x, z) = (1, 0, 1) (i.e., at the end of the first year the policyholder has declared one claim with size higher than $500), the Bayesian bonus-malus premium is 1.800. However, under the scheme introduced in Table 4, the Bayesian BMP to be paid is 1.754 if the claim size belongs to the interval $500-$1000, i.e., (t, x, z₁, z₂) = (1, 1, 1, 0). On the other hand, when the claim amount is >$1000, i.e., (t, x, z₁, z₂) = (1, 1, 0, 1), the Bayesian BMP is 2.040. Therefore, the premiums shown in Table 4 may be larger or smaller than those shown in Table 3 in Gómez-Déniz (2016). This methodology would ensure the financial viability of the company.

Final Comments and Future Research
In this paper, a simple model that distinguishes between three types of claims in bonus-malus settings has been introduced. This distinction is based on discriminating between claims whose associated amount is below a threshold, between two thresholds, or above a certain threshold. The methodology presented is based on the use of a trivariate distribution (not common in any statistical scenario) that depends on parameters which are in turn considered as random variables following certain a priori probability distributions. As a consequence, it is possible to express the bonus-malus premium based on the net premium principle (quadratic error loss function) as a credibility formula that writes the premium as a convex combination of sample information and a priori information. The resulting premiums are fairer than those computed by using the classical methodology, which does not discern between different types of claims. We shall conclude with an interesting comment made by a referee with respect to the range of values that the parameter $p_2$ can take. In this work we have assumed that $p_2 \in (0, 1)$; however, in the third section, when the probabilities are randomized, it could be sensible to suppose that $p_2$ depends on the value of $p_1$. In this sense, we believe it would be more realistic to consider $\pi(\vartheta) = \pi(\theta)\, \pi(p_1, p_2)$, where the latter factor is a bivariate distribution that assumes dependence between $p_1$ and $p_2$. This could be the subject of future research. In addition, it would be interesting to examine how the premiums behave when both the distribution based on the classical model and the marginal model derived after including heterogeneity are normalized to implement a generalized linear model. This is likely to refine the premiums according to the individual factors of each insured.

Appendix A
In this Appendix we provide the score equations, which yield the maximum likelihood estimators of the parameters, and the elements of the Fisher information matrix.
The score equations are given by ∂ (ϑ;

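Table 1. Updated parameters of the posterior distribution.

Parameter      Updated parameter
$\alpha$       $\alpha + t\bar{x}$
$\beta$        $\beta + t$
$\alpha_1$     $\alpha_1 + t\bar{z}_1$
$\beta_1$      $\beta_1 + t(\bar{x} - \bar{z}_1)$
$\alpha_2$     $\alpha_2 + t\bar{z}_2$
$\beta_2$      $\beta_2 + t(\bar{x} - \bar{z}_1 - \bar{z}_2)$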

Table 2. Empirical and fitted data for the basic model (1) and the mixture model (2).

Table 3. Parameter estimates and standard errors (SE) for the basic and mixture models, without covariates.

Table 4. BMP's when there are $x$ claims, of which $z_1$ have a claim size between $\phi_1$ and $\phi_2$, $z_2$ have a claim size larger than $\phi_2$ and $x - z_1 - z_2$ have a claim size smaller than $\phi_1$, with $p_x = 0.25$, $p_{z_1} = 0.50$ and $p_{z_2} = 0.75$.