Approximation of Zero-Inﬂated Poisson Credibility Premium via Variational Bayes Approach

: While both zero-inﬂation and the unobserved heterogeneity in risks are prevalent issues in modeling insurance claim counts, determination of Bayesian credibility premium of the claim counts with these features are often demanding due to high computational costs associated with a use of MCMC. This article explores a way to approximate credibility premium for claims frequency that follows a zero-inﬂated Poisson distribution via variational Bayes approach. Unlike many existing industry benchmarks, the proposed method enables insurance companies to capture both zero-inﬂation and unobserved heterogeneity of policyholders simultaneously with modest computation costs. A simulation study and an empirical analysis using the LGPIF dataset were conducted and it turned out that the proposed method outperforms many industry benchmarks in terms of prediction performances and computation time. Such results support the applicability of the proposed method in the posterior ratemaking practices.


Introduction
Credibility premium has been widely used in actuarial practice to capture unobserved heterogeneity of policyholders via historical claim experiences. Traditionally, the Poissongamma random effects model has been used as a benchmark to model claim frequency with unobserved heterogeneity (Dionne and Vanasse 1989).
It is also well-known that the traditional Poisson-gamma random effects model can enjoy natural conjugacy between the underlying distribution and the prior distribution of the random effects so that both the posterior distribution of the random effects and predictive premiums are readily available in closed forms, which is quite effective to compute individual premium for millions of policyholders.
In spite of the aforementioned benefits, the traditional Poisson-gamma random effects model can be too restrictive due to natural indication of zero-inflation in claim frequency, which has been shown in many empirical studies including, but not limited to, Shi and Zhao (2020), Zhang et al. (2020), and Lee (2021).
Inspired by presence of zero-inflation in a longitudinal setting, Boucher and Denuit (2008) and Boucher et al. (2009) discussed possible use of zero-inflated Poisson models for panel data and subsequently derived credibility premiums under the proposed model. Zhao and Zhou (2012) used copula models to consider zero-inflation and time-dependence simultaneously. Lee and Shi (2019) is another example of using zero-inflated marginal distributions and copulas for longitudinal claims. Chen et al. (2019) considered a non-parametric approach to estimate the individual unobserved heterogeneity using zero-inflated Poisson likelihood with fused LASSO penalty. Note that credibility premiums with these models are less computationally tractable compared to the traditional models such as Poisson-gamma random effects models.
While using a complicated model enables us to consider more realistic features of the observed data, calibration of such model may suffer from computational burden on the optimization and the opacity of the model, which makes the use of such model less attractive to the practitioners.
In this regard, there have been some attempts to approximate the Bayes credibility premium under a complicated model with relatively simpler form. Bühlmann credibility premium was proposed as a linear approximation of Bayes credibility premium (Bühlmann and Gisler 2006). Najafabadi (2010) and Najafabadi et al. (2012) considered a new approach to approximate the Bayes credibility premium with a simpler credibility premium via maximum entropy principle. However, their models do not include regression coefficients so that it may not capture observed heterogeneity in tariffication unlike the aforementioned literature. Oh et al. (2021) used similar approach to analyze impacts of historical frequency and severity on posterior ratemaking.
In this article, we propose a new approach to consider zero-inflation and timedependence of claim frequency that provide a relatively simpler form of credibility premium via variational Bayes (VB) method. VB approach has received a lot of attention as a powerful alternative to Markov Chain Monte Carlo (MCMC) method. VB method is based on optimization, which provides the closest approximation to the true posterior (Jordan et al. 1999). Among a predetermined family of distributions, VB finds an optimal distribution using Kullback-Leibler (KL) divergence as a measure to characterize the dissimilarity. Ranganath et al. (2014) suggested a new optimization technique for VB using Monte Carlo (MC) estimates. Recently, Saha et al. (2020) proposed a geometric variational Bayes approach that uses L 2 distance instead of KL divergence. This paper is organized as follows. In Section 2, we briefly review the concept of VB and specify our proposed model with details of optimization and premium calculation. Section 3 provides a simulation study to assess applicability of the proposed method. Section 4 presents estimation and validation results on an actual insurance dataset. We conclude the paper in Section 5 with a few remarks.

Claim Frequency Model with Longitudinality and Zero-Inflation
Let N it be the number of accidents for each policyholder i at time t = 1, . . . , T i . Insurance exposure e it ∈ [0, 1] and explanatory variables x it are defined accordingly. Traditionally, claims frequency has been modeled by Poisson distribution with covariates as follows: Based on usual longitudinality of Property and Casualty (P&C) insurance claim datasets, one can also consider the following extension by incorporating random effects as follows: Note that we enforce a restriction on π N (θ) so that E[θ i ] = 1, due to the identifiability issue.
By assuming π N (θ) ∝ θ γ−1 exp(−θγ), one can easily show that the predictive distribution of N i,T i +1 is given as which has been shown in actuarial literature including, but not limited to, Frangos and Vrontos (2001), Jeong (2020), and Jeong and Valdez (2020). While use of the aforementioned Poisson-gamma model allows us to evaluate the individual predictive premium for a large portfolio at ease and naturally captures possible overdispersion, such a model cannot reflect the possibility of zero-inflation in claim frequency. Therefore, one can incorporate both zero-inflation and longitudinality of the claim frequency as follows: where N ∼ Z IP (p, ν) means In spite of flexibility of the model in (4), we have some issues on the posterior analysis of θ with the proposed model. If we assume π(θ) ∝ θ γ−1 e −θγ , we cannot obtain the closed form expression of the posterior density π(θ i |F i,T i ) := π(θ i |N i1 , . . . , N i,T i ), which is defined as follows: By noting π(θ i |F i,T i ) ∝ π(θ i ) ∏ T i t=1 p(N it |θ i ), one can try to find posterior samples of θ i by MCMC, which is usually quite time consuming and sometimes infeasible to be implemented for calculation of individual predictive premium in an insurance portfolio, which usually contains millions of policyholders. According to a numerical experiment in the Section 4 of Ahn et al. (2021a), it turns out that a use of MCMC for calculation of individual predictive premium requires excessive amounts of time compared to other methods.
In this regard, we have developed a new approach that are less computationally expensive and more accurate to approximate the true posterior π(θ i |F i,T i ) via a surrogate function using variational Bayes method. See the Section 2.2 for the details of our variational algorithm.

Variational Bayes
Variational Bayes (VB) is an optimization method for obtaining the best approximation to the true posterior among the predetermined family of distributions.
In the following, we propose a use of variational Bayes approach to approximate π(θ i |F i,T i ) defined in (5). Considering an insurance portfolio with M policyholders, we define prior distributions for the random effects θ = (θ 1 , . . . , θ M ) as follows: Observe that we use the same values for both shape parameter and rate parameter in (6) such that E[θ i ] = 1 for i = 1, . . . , M. Given θ, the likelihood of model (4) is written as: As the first step for using VB method, a family of distributions needs to be selected. We call the family of distribution variational family (VF) in which each distribution can be easily controlled by its own parameters. While there is no universal rule for the choice of VF family, one simple choice of VF is mean-field family (Blei et al. 2017), where the latent variables are mutually independent. For the sake of containing observed or estimated information (N it , p it , ν it ) and maintaining connection with the prior distribution, our choice of variational family is independent gamma family as follows: Note that . As a special case of the proposed model, if p it , the zero-inflation probability, equals 0 for all i and t, then the variational distribution q(θ; γ q ) coincides with the true posterior distribution.
We point out that a variational distribution q(θ; γ q ) ∈ Q is characterized by a parameter γ q referred to as the variational parameter, which will be updated in our VB algorithm. Although the variational family Q does not always contain the true posterior, we can find the optimal distribution q(θ; γ * q ), which is closest to the true posterior in terms of Kullback-Leibler (KL) divergence: Here, we define a function of the variational parameter: The function (8) is called the Evidence Lower Bound (ELBO). One can easily show that minimizing (7) is equivalent to maximizing (8) (Wainwright and Jordan 2008) such that In order to find γ * q , we employ a version of gradient descent algorithm, the stochastic optimization method in Ranganath et al. (2014). For this, it is needed to compute the gradient of the ELBO: where ∇ γ q means ∂ ∂γ q . The third equality is due to the fact ∇ γ q q(θ; γ q ) = q(θ; γ q )∇ γ q log q(θ; γ q ) and the last equality holds because the expected value of the score function is zero: Below is the detailed derivation of (9): With step sizes ρ l , l = 0, 1, 2, . . . , which satisfy Robbins-Monro conditions (Robbins and Monro 1951), and using Monte Carlo (MC) estimate for (9), we iteratively update the given initial γ 0 q until the ELBO converges: is the MC estimate for (9) based on the samples from the current variational distribution, θ 1 , . . . , θ S ∼ q(θ, γ l q ).

Data and Results
To assess applicability of the proposed method, here we perform numerical studies using both simulated datasets and an actual insurance claim portfolio, which correspond to the aforementioned actuarial application where we have indication of zero-inflation in the claim counts observed over time. In this section, we first introduce the industry benchmarks for the ratemaking purpose. After that, both the proposed model and the benchmarks are calibrated with given data. Finally, the posterior premiums under each model are computed and compared to the actual claim counts in the out-of-sample validation set to assess the prediction performances.
Note that all the calculations in this section and thereafter were performed using R, and a computer with Intel Core i7-8565U at 1.80 Ghz 4 cores, 16 GB memory.

Simulation Study
We generate {N it } i=1,...,5000, t=1,...,6 with the following hierarchical distributions: We consider some frequency models and corresponding premium calculation for comparison. First, we use models without zero-inflation whose premium calculation are given as follows: Note that NP and PG models are specified with the same mean structure so that we estimatedα 0 andα 1 via glm function in R, which are still consistent regardless of possible misspecification in the working correlation structure. (Zeger et al. 1988) The estimation took around 0.07 s and the a priori premium E N i,T i+1 = exp(α 0 +α 1 X i,T i+1 ) is the same in both NP and PG models while the posterior premiums vary.
Further, we also used models with zero-inflation whose premium calculations are given as follows: i } r=1,...,R are posterior samples of θ i via MCMC. Note that the value of R should be large enough for the convergence of the posterior distribution while it also has a substantial impact on the computational time. To achieve a balance between the computational cost and prediction accuarcy, we set R = 30,000.
• True (TR): Again, due to the same mean structure,η 0 ,η 1 ,α 0 andα 1 are commonly estimated via zeroinfl function in R for ZIP, VB, BA, and PG models. γ * is estimated via variational Bayes approach and used both in PG and VB models. The estimation took around 1.27 s and the a priori premium is the same while the posterior premiums vary. Note that our main interest is to establish a way to compute the predictive premium of N i,T+1 given N i,1 , . . . , N i,T with less computational cost. While one could estimate α and γ (note that θ i are treated as random in our framework) via applying an EM algorithm as in Tzougas and Karlis (2020) or Tzougas and Jeong (2021), we chose a rather simpler way as in Pechon et al. (2019Pechon et al. ( , 2020, to focus on different characterizations of E[θ i |N i,1 , . . . , N i,T ] under different models. Note that the comparison of all models were done at the same ground (the fixed effects were estimated in the same way as long as they have the same marginal mean structure) to assure that we make a fair comparison of the models.
By doing so, we focus on the efficiency of each model to incorporate the unobserved heterogeneity rather than estimation accuracy of the fixed effects. Note that the true model assumes perfect knowledge on the unobserved heterogeneity θ i for each policyholder i (while it still allows for possible estimation errors in the fixed effects), which is not available in practice and only used as an (unattainable) benchmark.
After the benchmarks and the proposed model are specified, we assess the prediction performances of the models via root-mean squared error (RMSE), mean absolute error (MAE) that are defined as follows: Note that RMSE and MAE measure the discrepancy between the actual values and predicted values in L 2 and L 1 norms, respectively, so that we prefer a model with lower values of RMSE and MAE. We also prefer a model with less computation time since it is required to evaluate individual posterior premium for a portfolio that consists of millions of policyholders in general. Out-of-sample validation results with the simulated data are provided in Table 1.

Case Study-Posterior Ratemaking with the LGPIF Data
For the empirical analysis, a public dataset on insurance claim provided by the Wisconsin Local Government Property Insurance Fund (LGPIF) is used. The dataset consists of claims information on multiple coverages and corresponding policy characteristics that have been observed from years 2006 to 2011. Among the information on multiple coverage, we only use inland marine (IM) claims information. Note that observations from years 2006-2010 are used to train the frequency models while observations from year 2011 are used to validate the trained models and compare their performance. Table 2 summarizes the distributions of covariates, which are used to determine rating factors for each policyholder via a regression model. For more detailed explanation and preliminary analysis of given dataset, see Frees et al. (2016). Note that the LGPIF dataset used here is a so-called traditional dataset while there has been emerging interest in new types of insurance claim datasets, which are high-dimensional and contain more information on the heterogeneity of the policyholders, such as telematics data (Gao et al. 2021). However, uses of such high-dimensional data and related models are still in a developing stage. Further, analyses of traditional datasets with limited range of features could be still meaningful, especially in the sense that the proposed method explores an efficient way to capture the unobserved heterogeneity of each policyholder that are not explainable by the available covariates. Before we consider frequency models with zero-inflation, it is natural to test presence of zero-inflation from the data. To detect the zero-inflation in Poisson distribution, equivalently to test whether the zero-inflation parameter p 0i = 0 or not, Van den Broek (1995) showed that With this data, the value of S(α) for IM claims is 1.967 and corresponding p-value is 0.08038. Therefore, one can conclude that there is a not negligible level of overdispersion in IM claim frequencies.
Based on such indications, now we train the models that were used in Section 3 except for the true model, which assumes perfect knowledge on unobserved heterogeneity of the policyholders. As in the simulation study, we first estimate the fixed effects, only α for the models without zero-inflation via glm and η and α for the models with zero-inflation via zeroinfl, respectively. Table 3 summarizes the estimated coefficients for the fixed effects. Note thatα from the models without zero-inflation andα from the ones with zero-inflation are not comparable due to the presence of covariate impacts on zero-inflation. After the fixed effects are estimated, we can compute the individual posterior premiums based on the covariates information in the out-of-sample validation set and claims history of each policyholder under the specified models as in Table 4.

Discussion of the Results
According to the out-of-sample validation results of the simulation study in Table 1, it is observed that the proposed method, VB method outperforms NP, PG, NZIP, and BA models in terms of predictive performances. (Note that the true model is the best in terms of predictive performance as expected while it is not available in practice.) Even though the computation time under BA model was excessive, its performance improvement was not significant over other benchmarks. To assess the stability of the simulation results under the parameter changes, we also performed simulation studies with different sets of parameters and it showed the consistent pattern as in Table 1, while detailed results are omitted here. Note that the simulation studies conducted here are still restrictive due to pre-specified model structure and limited number of covariates so that there is no assurance that the proposed model outperforms the full Bayes model in general. As in the simulation study, the proposed model also shows the best performance on the prediction results with the LGPIF data in terms of both RMSE and MAE, and much less computation time compared to the BA model that uses MCMC to estimate the individual unobserved heterogeneity.
In that regard, the numerical illustrations given in Section 3 shows us the applicability of the proposed method as an acceptable approximation of the true unobserved heterogeneity that requires much less computation time compared to a naive use of MCMC in the presence of zero-inflation in claim counts.

Conclusions
As insurance companies are interested in better risk classification and tarrification by incorporating prevalent features of claims data such as indication of zero-inflation and the unobserved heterogeneity, computational costs in model calibration and individual premium calculation have been obstacles of using complicated models. To tackle this issue, we proposed a way to approximate the posterior density of the unobserved heterogeneity in risks with consideration of zero-inflation, which leads to an analytic form of the posterior premium. It was also shown that the proposed method can be used an alternative of full Bayes method due to its predictive performance and less computation cost.
The proposed approach is limited in the sense that the random effect θ, which captures the unobserved heterogeneity, is assumed to be static. It means there is no room for evolution of the unobserved risk characteristics of a policyholder over time under this model, which is somewhat unrealistic. While there are some research work focused on the use of dynamic random effects for determination of credibility premium (Ahn et al. 2021b;Pinquet 2020), calibration and prediction of dynamic random effects models are often computationally intensive and intractable. Therefore, as a direction for future research, one can expand the class of variational family so that impacts of dynamic random effects can be incorporated in the posterior premium calculation.

Data Availability Statement:
The LGPIF dataset that has been used in this article is publicly available at https://sites.google.com/a/wisc.edu/jed-frees/ (accessed on 10 January 2022). The codes used in the analyses of simulated dataset and the LGPIF dataset are available at https://github.com/ ssauljin/VB_ZIP (accessed on 10 January 2022).

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: LGPIF