The Measures of Accuracy of Claim Frequency Credibility Predictor

Abstract: Sustainability risks and opportunities increasingly affect insurance companies because they add variability to the future values of the variables used in decision processes. This is especially important in the era of sustainable non-life insurance, which promotes, among others, the use of ecological car engines or ecological building heating systems. The fundamental issue in non-life insurance is to predict future claims (e.g., the aggregate value of claims or the number of claims for a single policy) in a heterogeneous portfolio of policies, taking claim experience into account. For this purpose, credibility theory is used, which originated with the fundamental Bühlmann model, later modified to the Bühlmann-Straub model. Several modifications of the model have been proposed in the literature. One of them is the development of the relationship between credibility models and statistical mixed models (e.g., linear mixed models) for longitudinal data. The article proposes the use of the parametric bootstrap algorithm to estimate measures of accuracy of the credibility predictor of the number of claims for a single policy, taking into account new risk factors resulting from the emergence of green technologies on the considered market. The predictor is obtained for a model which belongs to the class of Generalised Linear Mixed Models (GLMMs) and which is a generalization of the Bühlmann-Straub model. Additionally, the possibility of predicting the number of claims and the problem of assessing the prediction accuracy are presented for a policy characterized by a new green risk factor (hybrid motorcycle engine) not previously present in the portfolio. The paper presents the proposed methodology in a case study using real insurance data from the Polish market.


Introduction
Nowadays, the financial sector, including the insurance industry, is strengthening its activities to promote and build sustainable economies and societies. The management of sustainability risks, which takes into account both the asset and the liability side as well as a broader corporate perspective, is becoming a very important issue (see [1]). The number of insurers who actively take different green factors into account in their risk assessment process is growing. Including these factors in the prediction of characteristics such as the total number of claims (e.g., [2]), crash severity (e.g., [3]) and accident risk (e.g., [4]) can play a crucial role from this perspective. A very basic feature of a non-life insurance portfolio is its heterogeneity, which means that policies generate different values of claims. Consequently, charging each policy the same pure premium (flat rate) is both unjust and uncompetitive. The pure premium is understood as the part of the gross premium intended to cover the current insurance risk of the policy. Therefore, insurers classify the portfolio into risk classes, and within each class the same pure premium is assigned to the grouped risks. Such a classification relies on various criteria, simply referred to as risk factors. The nature of risk factors is important as they can be observable or unobservable. Observable factors are declared when the insurance contract is signed and describe, e.g., the subject of the insurance, the policyholder and the geographical area. To be more precise, observable risk factors are categorical variables with a number of levels which divide the portfolio into risk classes (e.g., risks grouped by the age of the driver in automobile insurance). Unobservable factors, in turn, describe items such as driving skills, drinking habits or the safety of the district in which the insured property is located.
Unobservable factors can be approximated by observable factors. The area where the car is driven (unobservable) is usually assumed as the area where the car is registered (observable). A more complicated issue is to take into account the behavioral unobservable factors that reflect the inner traits or abilities of an individual policyholder.
This type of situation can be observed in (but of course is not limited to) real-life problems resulting from taking into account new and 'green' technologies in motor and property insurance. From our point of view, it is important for an insurance company to have a tool to deal with this kind of problem in the age of climate change. In the era of sustainable development, new challenges arise related to green risk factors taken into account in the premium calculation (pricing) process. This can be broadly related to the guidelines included in the Principles for Sustainable Insurance document [5], presented for the first time in 2012. This global guide explains how to manage environmental, social and governance risks in the process of evaluating, defining and pricing non-life insurance risks. One of the problems is, in particular, the analysis of green risk factors which were not present in the premium calculation process before. One example is the engine type, which is a basic factor in automobile insurance: currently, insurers reward hybrid or electric cars, which are considered more environmentally friendly. Another example is the type of home heating in property insurance. Therefore, the problem arises of including the indicated risk factors in statistical models without historical data. Hence, we propose a specific way to take this situation into account in the process of pure premium prediction.
In order to calculate the pure premium for a single policy in a heterogeneous portfolio, credibility theory, first proposed in [6], is traditionally applied. In that theory, the pure premium for a single ith policy, corresponding to the value of claims or the number of claims, is predicted for the next period using the data from previous periods (the history of claims). The classic Bühlmann model from [6] defines the so-called credibility predictor as the weighted average of the prior portfolio mean ($\mu$) and the mean of claim experience for each ith policy ($\bar{Z}_i$): $z_i \bar{Z}_i + (1 - z_i)\mu$, where $z_i \in [0, 1]$ denotes the credibility factor. The strong assumption is that the number of policies is constant in each period of the analysis. The Bühlmann-Straub model proposed in [7] allows for the variability of the number of policies over time. In both models the pure premium is viewed as a random variable being a function of a random parameter $\Theta$ representing the unobservable risk factor of the individual policy, and the distribution of $\Theta$ describes the variation of this factor across the whole portfolio. The models presented in both papers were then generalized in many different ways, e.g., [8]. This article considers the development of the Bühlmann-Straub model towards a regression approach, where risk factors are included in the credibility predictor. In [9] the author proposes the linear trend credibility model. As the parameter $\Theta$ is regarded as a random variable, the mixed models approach to credibility prediction is adopted. In [10] the authors provide examples of the credibility predictor using Hierarchical Generalised Linear Models (HGLMs), further developed in, e.g., [11-13]. In turn, the paper [14] provides a detailed derivation of the exact formulae of predictors corresponding to different credibility models, including the Bühlmann-Straub model, under particular linear mixed models (LMMs). Other topics connected with mixed models which belong to the GLMM class are studied in [15].
The authors present the so-called limited fluctuation credibility, where the predictor strongly depends on the sample size, the distribution of covariates and the link function. The GLMM is also considered in [16], where it is applied to the analysis of longitudinal claims data by generalizing the normality assumption of the Bühlmann-Straub model to the Poisson and negative binomial distributions. The Poisson claim frequency, but in the form of Poisson mixtures (not as the GLMM), is also discussed in [17], assuming time-varying random effects and frequency risks dependent on a latent individual factor corresponding to, e.g., behavioral risk factors. These behavioral factors can also be analyzed in credibility prediction when telematic data are available. Then, the premium is calculated in a usage-based system, for instance in automobile insurance (by using GPS tracking in the car). Poisson claim frequency models are discussed in this context by several authors, see [18-20]. Typically, in credibility models of frequency, the accuracy of prediction is measured by the mean squared error (MSE), see [21]. An alternative is to use the expectation of a loss function measuring the discrepancy between real and predicted values (as considered in [22]), but in this case the loss function corresponds to the specific assumed distribution of the variable of interest. In this paper we propose different accuracy measures, applicable to any prediction problem, together with their bootstrap estimators.
The aims of the paper cover the following issues. We will propose:
• The procedure in premium prediction taking into account some completely new risk factors (for which realizations of the response variable are not observed);
• Use of two accuracy measures applicable for any prediction problem based on the quantiles of absolute prediction errors;
• The parametric bootstrap estimators of the accuracy measures of the considered credibility predictor.
This paper is organized as follows. Section 2 presents the theoretical foundations of credibility theory, in particular the credibility predictor, with emphasis on risk factors not observed so far, which can result from sustainable insurance principles. In Section 3, the claim frequency credibility predictor in the GLMM approach is discussed. Section 4 proposes estimators of the measures of accuracy in claim frequency prediction together with a full parametric bootstrap algorithm. The application of the method based on real data is presented in Section 5. The paper ends with a discussion.

The Background of Bühlmann-Straub Model
Generally, longitudinal data make it possible to take into account not only the spatial correlation between observations and the correlation between the considered variables (observed in the case of cross-sectional data) but also changes of the variable of interest over time, and hence to predict for future periods. In credibility theory, which is under consideration here, the idea is to predict the future premium (understood here as the future number of claims) for a single policy taking the historical data into account. That is why we model longitudinal data: they make it possible to take both dimensions, time and space, into account and to increase the prediction accuracy thanks to the richer information on the phenomenon.
Let the population size of all policies (covering all combinations of risk factors, including green ones, of the insurer's interest) be denoted by K. Let the size of the set of policies present in the current insurer's portfolio be denoted by n, where n ≤ K. Let us also assume that longitudinal data on the policies from periods t = 1, . . . , T are available but not necessarily balanced-it is not assumed that the data for the particular policy are observed in all T periods. Hence, the number of all observations is smaller than or equal to nT. What is important is that the insurer can be interested in the prediction for the future period and the policy covering the combinations of risk factors, including green ones, not observed in the available dataset.
In such a case credibility theory could be applied, see [6,21]. The pure premium for the ith policy is defined there as the conditional mean:

$$\Pi_i(\Theta_i) = E[Z_{i,T+1} \mid \Theta_i], \qquad (1)$$

where $\Theta_i$ is the random variable representing the individual unobservable risk factor (the risk profile). In turn, $Z_{i,T+1}$, corresponding to the ith individual policy ($i = 1, \ldots, K$) in period $T + 1$, is the aggregate value of claims. It is assumed that these values are observed in periods $t = 1, \ldots, T$ but not necessarily in all of them (unbalanced longitudinal data are considered), which makes it possible to measure the individual factor $\Theta_i$, but only for $i = 1, \ldots, n$. However, in insurance practice both $\Pi_i(\Theta_i)$ and $\Theta_i$ are unknown, so there is a need to predict the value of $\Pi_i(\Theta_i)$. Let the predictor of $\Pi_i(\Theta_i)$ be denoted by $\hat{\Pi}_i(\Theta_i)$.

Nowadays, in lines of business such as automobile or homeowner insurance, insurers collect mass portfolios of risks. In their databases, detailed information is stored about the policies. This can be cross-sectional data, but often longitudinal data are available. This means that the portfolio of policies is observed during several periods, mostly on a yearly basis. Because each policy can be renewed (usually on an annual basis), it is possible to analyze the claim history for each risk and apply a statistical model to compute the value of $\hat{\Pi}_i(\Theta_i)$.

Let us start from the fundamental Bühlmann-Straub model presented in [7], considered for the current portfolio (of size n) and for T periods. Let us assume that conditionally, for a given $\Theta_i$, the random variables $Z_{i,t}$ are independent and

$$E[Z_{i,t} \mid \Theta_i] = \mu(\Theta_i), \qquad (2)$$

$$\mathrm{Var}[Z_{i,t} \mid \Theta_i] = \sigma^2(\Theta_i), \qquad (3)$$

where $\mu(\Theta_i)$ is the individual risk premium, $\sigma^2(\Theta_i)$ is the variance within an individual risk, $i = 1, \ldots, n$ and $t = 1, \ldots, T + 1$. Let us consider the following predictor of (1), which is a linear combination of past observations of the aggregate value of claims (cf. [21]):

$$\hat{\Pi}_i(\Theta_i) = a_{i0} + \sum_{t=1}^{T} a_{it} Z_{i,t}. \qquad (4)$$

Let the parameters $a_{i0}, \ldots, a_{iT}$, as discussed in [16], be determined by minimizing the quadratic loss function:

$$E\left[\left(\Pi_i(\Theta_i) - a_{i0} - \sum_{t=1}^{T} a_{it} Z_{i,t}\right)^2\right].$$

Then, the credibility predictor (4) takes the following form:

$$\hat{\Pi}_i(\Theta_i) = z_i \bar{Z}_i + (1 - z_i)\mu, \qquad (5)$$

where $z_i$ is the unknown credibility factor ranging from 0 to 1, $\bar{Z}_i = T^{-1} \sum_{t=1}^{T} Z_{i,t}$ is the average of historical claims for a single ith risk and $\mu$ is the overall mean in the current portfolio.
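As a numerical illustration of predictor (5), the following sketch estimates the overall mean, the expected within-policy variance and the between-policy variance with the classical moment estimators, and then shrinks each policy mean towards the portfolio mean. It assumes balanced histories with unit exposures; the function name and the toy data are hypothetical and serve only as an example.

```python
from statistics import mean, variance


def buhlmann_predictors(claims):
    """Credibility predictions z * mean_i + (1 - z) * mu for balanced claim histories.

    claims: list of per-policy histories, each a list of T observations
    (at least two policies and T >= 2 are required by the variance estimators).
    """
    T = len(claims[0])
    policy_means = [mean(history) for history in claims]
    mu_hat = mean(policy_means)                               # overall portfolio mean
    s2_hat = mean(variance(history) for history in claims)    # expected within-policy variance
    # Between-policy variance, corrected for sampling noise and truncated at zero.
    a_hat = max(variance(policy_means) - s2_hat / T, 0.0)
    z = T / (T + s2_hat / a_hat) if a_hat > 0 else 0.0
    return z, mu_hat, [z * m + (1 - z) * mu_hat for m in policy_means]


# Toy portfolio: three policies observed over four periods (illustrative data).
z, mu_hat, predictions = buhlmann_predictors([[0, 1, 0, 1], [2, 3, 2, 3], [5, 4, 6, 5]])
print(z, mu_hat, predictions)
```

In this toy portfolio the policies differ strongly while each history is stable, so the credibility factor is close to 1 and each prediction stays close to the policy's own mean.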
Predictor (5) is a function of the model unknown parameters. To obtain its empirical (estimated) version, the parameters must be replaced by their estimates. In the actuarial practice, the parameters of the Bühlmann-Straub model are usually estimated using the method of moments. However, in [14] the authors develop the link between the general formula of the Bühlmann-Straub model and the LMM for longitudinal data with fixed effects corresponding to the observable risk factors and random effects representing the individual risk profile. This makes it possible to estimate the model parameters using the maximum likelihood and the restricted maximum likelihood methods, which are usually used in this class of models. A drawback of using the LMM is the normality assumption of the response variable. Considering insurance data, which are usually right-skewed or discrete (claim frequency), a specific LMM has to be extended to the GLMM, see [23]. The GLMM no longer requires the normality assumption and can be adapted to distributions from the exponential family, such as the Poisson and the Gamma distributions. What is more, the approach presented in this section does not cover the case of prediction for new (including green) risk factors which the insurer would like to take into account, but which are not observed in the current portfolio.

Credibility Predictor of Claim Frequency
Claim frequency plays an important role in the calculation of the pure premium or, e.g., in motor insurance bonus-malus systems, see [24-26]. In frequency modeling it is usually assumed that the number of claims $N_{it}$, $i = 1, \ldots, n$, $t = 1, \ldots, T + 1$, follows a conditional Poisson distribution, where n is the size of the observed risk portfolio. This is why the credibility premium understood as $E[N_{i,T+1} \mid \Theta_i]$ is often under consideration. In [16] the authors extend the LMM approach from [14] to the GLMM for the case of the Poisson distribution and derive a new formula of the credibility predictor (5), but only for the observed ith risk factor. The results presented below additionally take into account new risk factors, including green ones, which were not present in the premium calculation process before (i.e., policies indexed by $i = n + 1, \ldots, K$). This is of special importance given the challenges resulting from the sustainable approach to management in insurance companies.
Considering the claim frequency credibility predictor, let the number of claims be expressed as:

$$N_{it} \mid \Theta_i \sim \mathrm{Poisson}(\lambda_0 \Theta_i), \qquad (6)$$

where $i = 1, \ldots, K$ (for all risk factors, including those not present in the current portfolio), $t = 1, \ldots, T + 1$ (for all periods including the future one), $\lambda_0$ is an unknown parameter and $\Theta_i$ is an unobservable risk factor. The GLMM corresponding to the Bühlmann-Straub model is described as follows:

$$N_{it} \mid u_i \sim \mathrm{Poisson}(\mu_{it}), \qquad \log \mu_{it} = \beta_0 + u_i, \qquad (7)$$

where $\beta_0$ is the unknown fixed effect in the portfolio (the same for all policies) and $u_i$ represents normally distributed random effects with $E[u_i] = 0$ and $\mathrm{Var}[u_i] = \sigma_u^2$. Hence, considering (7), $\lambda_0$ and $\Theta_i$ in (6) can be written as $\lambda_0 = \exp(\beta_0)$ and $\Theta_i = \exp(u_i)$, respectively. Under the GLMM, the credibility predictor will be given by:

$$\hat{N}_{i,T+1} = z_i \bar{N}_i + (1 - z_i) \exp(\beta_0 + \sigma_u^2 / 2), \qquad (8)$$

where $\bar{N}_i = T^{-1} \sum_{t=1}^{T} N_{it}$. The credibility factor here is

$$z_i = \frac{T}{T + \exp(-\beta_0) \left[\exp(3\sigma_u^2 / 2) - \exp(\sigma_u^2 / 2)\right]^{-1}}.$$

Of course, it is possible to consider a distribution other than the conditional Poisson distribution assumed in (6), e.g., the zero-inflated Poisson (ZIP) or the negative binomial distribution, and a model other than (7), but then the result will not correspond to the classic Bühlmann-Straub model but rather to its modification. On the other hand, if the model is misspecified, the predictor can be biased, which can influence its accuracy as well.
Like predictor (5), predictor (8) depends on the unknown model parameters. To obtain the empirical (estimated) version of (8), they should be replaced by their estimates. The analysis presented herein makes use of the restricted maximum likelihood method, which is widely used in the case of the GLMM. Importantly, when the empirical version of (8) is used for policies not present in the current risk portfolio (i.e., for $i = n + 1, \ldots, K$), the random effects, which cannot be predicted for such policies, are not taken into account.
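To make the role of $\beta_0$, $\sigma_u^2$ and the length of the claim history concrete, the sketch below evaluates a Bühlmann-type credibility factor for the Poisson model with $\Theta_i = \exp(u_i)$ and normally distributed $u_i$. The closed form used here, $T / (T + E[\mathrm{Var}(N \mid u)] / \mathrm{Var}(E[N \mid u]))$ with lognormal moments, is stated as an assumption of this illustration and is not necessarily identical to the expression used in (8); the function names and parameter values are hypothetical. A policy with an empty history stands for a new risk factor, for which only the portfolio-level mean is available.

```python
import math


def credibility_factor(beta0, sigma2_u, T):
    """Buhlmann-type factor for N | u ~ Poisson(exp(beta0 + u)), u ~ N(0, sigma2_u)."""
    if T == 0 or sigma2_u == 0:
        return 0.0  # no history or no heterogeneity: rely on the portfolio mean only
    within = math.exp(beta0 + sigma2_u / 2)                          # E[Var(N | u)] = E[N]
    between = math.exp(2 * beta0 + sigma2_u) * math.expm1(sigma2_u)  # Var(E[N | u])
    return T / (T + within / between)


def credibility_prediction(beta0, sigma2_u, history):
    """Shrink the policy's mean claim count towards exp(beta0 + sigma2_u / 2)."""
    portfolio_mean = math.exp(beta0 + sigma2_u / 2)
    if not history:  # policy with a new risk factor: no claim experience
        return portfolio_mean
    z = credibility_factor(beta0, sigma2_u, len(history))
    return z * sum(history) / len(history) + (1 - z) * portfolio_mean


# Illustrative parameter values, not estimates from the paper's dataset.
print(credibility_prediction(-0.5, 0.4, [0, 1, 0, 2]))  # observed policy
print(credibility_prediction(-0.5, 0.4, []))            # unobserved (new) policy
```

For the unobserved policy the credibility factor is zero by construction, which mirrors the remark that random effects cannot be taken into account at the prediction stage for such policies.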

Bootstrap Estimators of Prediction Accuracy Measures for Claim Frequency
This section deals with the problem of prediction accuracy estimation, which is not considered in [16]. Although random effects for the risk factors not present in the insurer's current portfolio, including those resulting from the emergence of green technologies, cannot be taken into consideration at the prediction stage (see (8)), their variability can be taken into account in the estimation of prediction accuracy. Let the prediction error for the ith policy, where $i = 1, 2, \ldots, K$, be denoted by $U_{i,T+1} = \hat{N}_{i,T+1} - N_{i,T+1}$. What is more, new prediction accuracy measures (given by (11) and (12)) are proposed as an alternative to the MSE, and their estimators (see (16) and (17)) are presented. It seems justified to prefer them to the MSE, because the MSE is the mean of squared prediction errors, whose distribution is strongly positively skewed (see Figure 1). Let us consider the following prediction accuracy measures.

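Firstly, the Mean Squared Error (MSE) is defined as

$$MSE(\hat{N}_{i,T+1}) = E\left[(\hat{N}_{i,T+1} - N_{i,T+1})^2\right] = E\left[U_{i,T+1}^2\right], \qquad (9)$$

and the Root Mean Squared Error (RMSE) is given by:

$$RMSE(\hat{N}_{i,T+1}) = \sqrt{MSE(\hat{N}_{i,T+1})}. \qquad (10)$$

Secondly, for the problem under consideration, the Quantile of Absolute Prediction Errors (QAPE), originally proposed in [27] for small area prediction problems under the LMM and then used in [28] for the problem of prediction of the total loss reserve under the Hierarchical Generalised Linear Model (HGLM), can be modified to the following form:

$$QAPE_p(\hat{N}_{i,T+1}) = \inf\left\{x : P(|U_{i,T+1}| \le x) \ge p\right\}. \qquad (11)$$

This means that (11) is the quantile of order p of absolute prediction errors. It also means that at least $100p\%$ of realizations of absolute prediction errors for the ith risk factor are smaller than or equal to $QAPE_p(\hat{N}_{i,T+1})$.

Thirdly, for the problem under consideration, the Quantile of a Mixture of Absolute Prediction Errors (QMAPE), originally proposed in [27] for small area prediction problems under the LMM, can be written as follows:

$$QMAPE_p = \inf\left\{x : \frac{1}{K} \sum_{i=1}^{K} P(|U_{i,T+1}| \le x) \ge p\right\}. \qquad (12)$$

Hence, it is the pth quantile of the distribution of a mixture of absolute prediction errors $|U_{i,T+1}|$ with equal weights, where $i = 1, 2, \ldots, K$. This means that at least $100p\%$ of realizations of absolute prediction errors for all risk factors are smaller than or equal to $QMAPE_p$.

Let us introduce Algorithm 1, see [28-30], which will be used to estimate the prediction accuracy for the considered model, where the variability of random effects will be taken into account also for the risk factors not observed in the considered dataset. Within the loop over bootstrap iterations $b = 1, \ldots, B$, step 6 of the algorithm computes the bootstrap realizations of the prediction error $U_{i,T+1}$, denoted by $u^{*}_{i,T+1}$ and for the bth iteration given by:

$$u^{*(b)}_{i,T+1} = \hat{N}^{*(b)}_{i,T+1} - N^{*(b)}_{i,T+1}, \qquad (13)$$

where $i = 1, 2, \ldots, K$. The loop ends in step 7, and in step 8 the parametric bootstrap estimators of the prediction accuracy measures are computed:

$$\widehat{MSE}(\hat{N}_{i,T+1}) = B^{-1} \sum_{b=1}^{B} \left(u^{*(b)}_{i,T+1}\right)^2, \qquad (14)$$

$$\widehat{RMSE}(\hat{N}_{i,T+1}) = \sqrt{\widehat{MSE}(\hat{N}_{i,T+1})}, \qquad (15)$$

$$\widehat{QAPE}_p(\hat{N}_{i,T+1}) = Q_p\left(|u^{*(b)}_{i,T+1}| : b = 1, \ldots, B\right), \qquad (16)$$

$$\widehat{QMAPE}_p = Q_p\left(|u^{*(b)}_{i,T+1}| : i = 1, \ldots, K;\ b = 1, \ldots, B\right), \qquad (17)$$

where $Q_p(.)$ is the quantile of order p.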
If the considered predictor is based on a different distribution and different model than that assumed in (6) and (7), then the bootstrap algorithm presented in this section should be modified taking into account the appropriate new assumptions.
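The bootstrap procedure of this section can be sketched in a few lines. The code below is a simplified illustration under stated assumptions, not the authors' implementation: random effects and claim counts are generated from the Poisson model with normal random effects, the refit in each iteration uses moment estimators as a stand-in for restricted maximum likelihood, histories are balanced, and all function names and parameter values are hypothetical. Policies with an empty history play the role of new risk factors; their random effects are still simulated, so their variability enters the accuracy estimates.

```python
import math
import random
from statistics import mean, variance


def rpois(lam, rng):
    """One Poisson(lam) draw (Knuth's multiplication method, adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def quantile(values, p):
    """Quantile of order p defined as inf{x : F_n(x) >= p} for the empirical cdf F_n."""
    ordered = sorted(values)
    return ordered[max(math.ceil(p * len(ordered)) - 1, 0)]


def fit_and_predict(histories):
    """Moment-based stand-in for the REML refit; an empty history gets the portfolio mean."""
    observed = [h for h in histories if h]
    T = len(observed[0])                               # balanced histories assumed
    means = [mean(h) for h in observed]
    mu = mean(means)
    s2 = mean(variance(h) for h in observed)
    a = max(variance(means) - s2 / T, 0.0)
    z = T / (T + s2 / a) if a > 0 else 0.0
    return [z * mean(h) + (1 - z) * mu if h else mu for h in histories]


def parametric_bootstrap(beta0, sigma2_u, n_obs, n_new, T, B, seed=1):
    """Bootstrap prediction errors: errors[i][b] = prediction minus simulated future count."""
    rng = random.Random(seed)
    K = n_obs + n_new
    errors = [[] for _ in range(K)]
    for _ in range(B):
        # Simulate random effects for all K policies, including the unobserved ones.
        lam = [math.exp(beta0 + rng.gauss(0.0, math.sqrt(sigma2_u))) for _ in range(K)]
        hist = [[rpois(lam[i], rng) for _ in range(T)] if i < n_obs else [] for i in range(K)]
        future = [rpois(l, rng) for l in lam]          # counts in period T + 1
        preds = fit_and_predict(hist)
        for i in range(K):
            errors[i].append(preds[i] - future[i])
    return errors


def accuracy(errors, p):
    """Bootstrap estimates of the RMSE and QAPE per policy and of the pooled QMAPE."""
    rmse = [math.sqrt(mean(e * e for e in err)) for err in errors]
    qape = [quantile([abs(e) for e in err], p) for err in errors]
    qmape = quantile([abs(e) for err in errors for e in err], p)
    return rmse, qape, qmape


errors = parametric_bootstrap(beta0=-0.5, sigma2_u=0.5, n_obs=20, n_new=1, T=4, B=200)
rmse, qape, qmape = accuracy(errors, p=0.9)
print(rmse[-1], qape[-1], qmape)                       # last policy is the unobserved one
```

Pooling the absolute errors of all policies with equal weights gives the mixture quantile, so the QMAPE estimate always lies between the smallest and the largest QAPE estimate of the same order.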

The Case Study Based on Longitudinal Portfolio
The aim of this section is to apply the presented methodology to a real-data problem resulting from the sustainable insurance principles: the prediction of the number of claims and the estimation of accuracy for a green factor not present in the current portfolio. The considerations are based on a motorcycle dataset from a Polish insurance company. The data have a longitudinal structure and contain information from the years 2007-2010 on 24 different risk factors. Every risk factor describes the insured vehicle according to:
• The type of the engine: benzine (BEN), diesel (DIE), hybrid (HYBRID);
• The power range: 0-60, 61-80, 81-100, 101-120, 121-140, 141-160, 160+;
• The type of payment: cash, transfer.
One unobserved green factor is assumed, HYBRIDtransfer, which means that we are interested in a policy paid by bank transfer for a motorcycle with a hybrid engine. Of course, additional unobserved factors could be considered as well, for example a hybrid engine for a policy paid by cash, but the methodology would remain the same.
The response variable is the number of claims of a single policy observed in every year. Relevant information is given in Figure 2. The empirical distribution presented in Table 1 clearly demonstrates a strong positive asymmetry.

The aim is to compute the values of predictors for the year 2011, based on (8), and to assess the prediction accuracy, based on (15) and our proposals (16) and (17), mainly for HYBRIDtransfer, for which no past data are available, but also for other risk factors. Importantly, the presentation of the results will not be limited to the values of the predictors and estimates of accuracy measures. The distributions of the parametric bootstrap absolute errors (see (13)) will also be presented, which is especially important due to the strong positive asymmetry of prediction errors and to the fact that the values of the variable of interest are natural (not real) numbers. The results will be presented using violin plots, which include the density function estimator with a mirror effect. The violin plot is more informative than a boxplot, e.g., because it makes it possible to show the multimodality of data and, additionally, to present outliers more clearly than a histogram does. Showing the multimodality of the distribution may be important especially in the considered case of prediction errors computed as differences between the forecasts (which are real numbers) and the realizations of the variable of interest (which are natural numbers). What is more, the multimodality of the distribution is an additional argument in favor of using quantiles (i.e., the QAPE and QMAPE) instead of the mean (i.e., the MSE) to describe the distribution of prediction errors or at least its central tendency.
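The multimodality argument can be illustrated with simple arithmetic. The sketch below assumes a fixed real-valued forecast (0.65 is used purely as an illustrative value) and lists the absolute errors that arise for integer claim counts; the helper name is hypothetical, and the sketch ignores that in the bootstrap the forecast itself also changes between iterations.

```python
def absolute_errors(forecast, max_count):
    """Absolute prediction errors |forecast - n| for claim counts n = 0, ..., max_count."""
    return [abs(forecast - n) for n in range(max_count + 1)]


errors = absolute_errors(0.65, 4)
print([round(e, 2) for e in errors])  # [0.65, 0.35, 1.35, 2.35, 3.35]
```

With a fixed forecast the absolute errors can only take these isolated values, so their distribution consists of separate clusters rather than a single smooth mode.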
In the first step of the analysis, the values of the predictors of the number of claims are computed separately for all risk factors. They vary from 0.26 to 3.73 (see Table 2), and the predicted value for the HYBRIDtransfer risk factor is 0.65 (see Table 3). Then, the bootstrap algorithm is run for all risk factors for the year 2011. The distributions of the bootstrap absolute prediction errors for six arbitrarily chosen risk factors (out of 24) are presented in Figure 1. Estimates of the RMSE (see (15)) for all risk factors are between 1.06 and 1.50 (see Table 2), including the value of 1.50 for the HYBRIDtransfer risk factor (see Table 3). However, all of the distributions have a very strong positive asymmetry with many outliers; the highest value of the absolute bootstrap error, ca. 19, is observed for the HYBRIDtransfer risk factor. The strong positive asymmetry implies that the mean of squared errors (i.e., the MSE) and the RMSE should be replaced with different accuracy measures. The proposal is to use the QAPE estimators, denoted by green circles in Figure 1. For example, the QAPE estimate of order p = 0.99 for the HYBRIDtransfer risk factor (the bottom right corner of Figure 1, see also Table 3) equals 5.69. Hence, it is estimated that at least 99% of absolute prediction errors for the HYBRIDtransfer risk factor are smaller than or equal to 5.69 and at least 1% of absolute prediction errors for the HYBRIDtransfer risk factor are greater than or equal to 5.69. It is the highest value of the QAPE estimate of order p = 0.99 among the risk factors presented in Figure 1.
In the second step of the analysis, all the absolute bootstrap errors computed for all of the 24 risk factors (not only the six risk factors presented in Figure 1) are taken into account. These values are presented in Figure 3. Hence, the distribution can be interpreted as the distribution of a mixture of the bootstrap absolute prediction errors for all risk factors with equal weights. Then, quantiles of these values are computed, as defined in (17); they are denoted by green lines in Figure 3. For example, the estimate of the QMAPE of order p = 0.99 equals 3.92 (see also Table 2). Hence, it is estimated that at least 99% of absolute prediction errors for all risk factors are smaller than or equal to 3.92 and at least 1% of absolute prediction errors for all risk factors are greater than or equal to 3.92. In the last step of the analysis, the estimates of QAPEs (quantiles of bootstrap absolute prediction errors for the HYBRIDtransfer risk factor) are compared to the estimates of QMAPEs (quantiles of bootstrap absolute prediction errors for all risk factors). In Figure 4 the estimates of QAPEs are presented as green circles, while the estimated QMAPEs are represented by green lines; these results are also presented in Tables 2 and 3. Importantly, for the same order of the quantile, QAPEs are greater than the respective QMAPEs. For example, it is estimated that the median of absolute prediction errors for the HYBRIDtransfer risk factor equals 0.89, while for all risk factors it is 0.60. This is not surprising because QAPEs are estimated for the unobserved green risk factor HYBRIDtransfer, while the QMAPE is computed for all risk factors, only one of which has not been observed.

Table 3. Values of the predictor and estimators of prediction accuracy measures for the HYBRIDtransfer risk factor (columns: Statistic, Value).

Summing up, the presented methodology enables the prediction of claim counts and the estimation of the prediction accuracy even for risk factors not present in the current portfolio. This is crucial because the decision-making process in the insurance company has to be extended to information on new green technologies not taken into account so far (in our case: a hybrid motorcycle engine). In addition, it makes it possible to compare the prediction accuracy with other individual risk factors and also with the whole risk portfolio.

Discussion
The development of sustainable insurance leads to new challenges which influence management processes in insurance companies. The paper deals with the problem of taking account of new risk factors resulting from the emergence of green technologies in the premium pricing process, cf. [4]. The GLMM extension of the commonly used Bühlmann-Straub model is studied. Compared with [16], the problem of the prediction of the number of claims is extended by the estimation of the prediction accuracy. New measures are presented too: they make it possible not only to assess the prediction accuracy but also to compare the prediction accuracy for new green risk factors with the accuracy for the whole risk portfolio. They are also not limited to the assessment of the average prediction accuracy but enable a description of the whole distribution of absolute prediction errors. The authors believe that the considered issue of assessing the prediction accuracy for risk factors currently not present in the portfolio is crucial for the effective conduct of the insurance business in accordance with the principles of sustainable development. The theoretical considerations presented herein are supported by an application based on a real longitudinal dataset of policies sold by a Polish insurance company.
Of course, the proposed bootstrap estimation procedure is not limited to new risk factors classified as "green". The presented methodology can be used for any new risk factors, including new distribution channels, regions or customer groups. In practice, a large percentage of changes in (non-life) insurance data concerns information related to climate change and the adaptation of the premium level to new technologies. Incorporating "green" risk factors in modeling is not the same as reducing the predicted pure premium; it only allows the premium to be better adjusted in the future as the number of observed periods in the sample increases. What is more, based on our results insurers can take into account not only the value of the predictor but also the distribution of the prediction errors, which can lead to more effective business decisions. Generally, the values of the predictors and the higher estimated prediction errors in the case of unobserved risk factors do not have to justify a reduction of the premium, at least until new observations of claims are available. However, insurers can take into account not only their own historical data but also wider information from the entire market.