# Approximation of Zero-Inflated Poisson Credibility Premium via Variational Bayes Approach


## Abstract


## 1. Introduction

## 2. Proposed Methodology

#### 2.1. Claim Frequency Model with Longitudinality and Zero-Inflation

#### 2.2. Variational Bayes

**variational family (VF)** in which each distribution is controlled by its own parameters. There is no universal rule for choosing the VF. One simple choice is the mean-field family (Blei et al. 2017), under which the latent variables are mutually independent.
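In symbols, write the latent variables as $\theta=(\theta_1,\dots,\theta_J)$ and let $\gamma_{q,j}$ be the parameters of the $j$-th factor (generic notation assumed here). A mean-field member then factorizes into independent components, each with its own variational parameter:

$$
q(\theta;\gamma_q)=\prod_{j=1}^{J}q_j(\theta_j;\gamma_{q,j}),\qquad \gamma_q=(\gamma_{q,1},\dots,\gamma_{q,J}).
$$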

**variational parameter**, which is updated in our VB algorithm. The variational family $\mathcal{Q}$ does not always contain the true posterior. Even so, we can find the optimal member $q(\theta;\gamma_q^*)\in\mathcal{Q}$, which is the distribution closest to the true posterior in Kullback–Leibler (KL) divergence:
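In practice, the KL minimization is carried out by maximizing the evidence lower bound, $\mathrm{ELBO}(\gamma_q)=\mathbb{E}_q[\log p(\mathbf{N},\theta)]-\mathbb{E}_q[\log q(\theta;\gamma_q)]$. This works because $\log p(\mathbf{N})=\mathrm{ELBO}(\gamma_q)+\mathrm{KL}\big(q\,\|\,p(\theta\mid\mathbf{N})\big)$ (Blei et al. 2017; notation ours). As an illustration only, the Python sketch below runs coordinate-ascent (CAVI) updates of the variational parameters for one policyholder under an *assumed* ZIP model with these components:

- a gamma random effect $\theta\sim\mathrm{Gamma}(\gamma,\gamma)$;
- hypothetical structural-zero indicators $Z_t\sim\mathrm{Bernoulli}(p_t)$;
- $N_t\mid\theta,Z_t=0\sim\mathrm{Poisson}(\nu_t\theta)$;
- the mean-field family $q(\theta)\prod_t q(Z_t)$.

This is the generic CAVI recipe, not necessarily the authors' algorithm, and `cavi_zip_gamma` is a hypothetical name.

```python
import math


def cavi_zip_gamma(counts, nu, p, gamma, tol=1e-10, max_iter=1000):
    """Mean-field CAVI for one policyholder under an ASSUMED ZIP-gamma model.

    theta ~ Gamma(shape=gamma, rate=gamma); Z_t ~ Bernoulli(p_t) marks a structural zero;
    N_t | theta, Z_t = 0 ~ Poisson(nu_t * theta), and N_t = 0 whenever Z_t = 1.
    Variational family: q(theta) = Gamma(a, b) and q(Z_t = 1) = r_t.
    Returns (a, b, r).
    """
    # A positive count cannot be a structural zero, so r_t = 0 there for good;
    # hence the shape update gamma + sum_t E[1 - Z_t] N_t never changes.
    a = gamma + sum(counts)
    r = [p_t if n == 0 else 0.0 for n, p_t in zip(counts, p)]  # start from the prior
    b = gamma + sum((1.0 - r_t) * nu_t for r_t, nu_t in zip(r, nu))
    for _ in range(max_iter):
        mean_theta = a / b  # E_q[theta]
        # Update q(Z_t): for N_t = 0, weigh p_t against (1 - p_t) * exp(-nu_t * E_q[theta])
        r = [
            p_t / (p_t + (1.0 - p_t) * math.exp(-nu_t * mean_theta)) if n == 0 else 0.0
            for n, nu_t, p_t in zip(counts, nu, p)
        ]
        # Update q(theta): the rate uses exposure from periods not flagged as structural zeros
        b_new = gamma + sum((1.0 - r_t) * nu_t for r_t, nu_t in zip(r, nu))
        converged = abs(b_new - b) < tol
        b = b_new
        if converged:
            break
    return a, b, r


# Hypothetical four-year history with a single claim
a, b, r = cavi_zip_gamma(counts=[0, 0, 1, 0], nu=[0.3] * 4, p=[0.4] * 4, gamma=2.0)
print(a / b, r)
```

Under this model, the variational mean $a/b$ multiplied by $(1-\widehat{p}_{i,T_i+1})\widehat{\nu}_{i,T_i+1}$ gives a premium. This sketch weights past exposure by $1-r_t$ rather than by $1-\widehat{p}_{it}$, so its output need not coincide with the VB premium formula of Section 3.1.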

## 3. Data and Results

`R` and a computer with an Intel Core i7-8565U CPU at 1.80 GHz (4 cores) and 16 GB of memory.

#### 3.1. Simulation Study

- Naive Poisson (NP): $\widehat{\mathbb{E}}\left[N_{i,T_i+1}\mid\mathcal{F}_{i,T_i}\right]=\widehat{\nu}_{i,T_i+1}$.
- Poisson-Gamma (PG): $\widehat{\mathbb{E}}\left[N_{i,T_i+1}\mid\mathcal{F}_{i,T_i}\right]=\dfrac{\gamma^*+\sum_{t=1}^{T_i}N_{it}}{\gamma^*+\sum_{t=1}^{T_i}\widehat{\nu}_{it}}\,\widehat{\nu}_{i,T_i+1}$.

`glm` function in `R`, which remain consistent even if the working correlation structure is misspecified (Zeger et al. 1988). The estimation took about 0.07 s. The a priori premium $\widehat{\mathbb{E}}\left[N_{i,T_i+1}\right]=\exp(\widehat{\alpha}_0+\widehat{\alpha}_1X_{i,T_i+1})$ is the same in the NP and PG models, whereas their posterior premiums differ.
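The NP and PG premiums above reduce to a few lines of arithmetic. The following is a minimal Python sketch. It assumes the fitted coefficients $\widehat{\alpha}_0,\widehat{\alpha}_1$ and $\gamma^*$ are already available. The function names are hypothetical, and the single-covariate predictor mirrors the a priori premium above.

```python
import math


def a_priori_mean(alpha0, alpha1, x):
    """A priori premium nu = exp(alpha0 + alpha1 * x) for a single covariate x."""
    return math.exp(alpha0 + alpha1 * x)


def np_premium(nu_next):
    """Naive Poisson: the claim history is ignored."""
    return nu_next


def pg_premium(counts, nu_hist, nu_next, gamma_star):
    """Poisson-gamma credibility premium for one policyholder.

    counts  -- observed claim counts N_i1, ..., N_iT
    nu_hist -- fitted a priori means nu_i1, ..., nu_iT
    nu_next -- fitted a priori mean for period T + 1
    """
    credibility_factor = (gamma_star + sum(counts)) / (gamma_star + sum(nu_hist))
    return credibility_factor * nu_next
```

For a policyholder with no past claims, the PG credibility factor $\gamma^*/(\gamma^*+\sum_t\widehat{\nu}_{it})$ is below one, so the PG premium falls below the NP premium. Each past claim pushes it back up.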

- Naive ZIP (NZIP): $\widehat{\mathbb{E}}\left[N_{i,T_i+1}\mid\mathcal{F}_{i,T_i}\right]=(1-\widehat{p}_{i,T_i+1})\,\widehat{\nu}_{i,T_i+1}$.
- Proposed (VB): $\widehat{\mathbb{E}}\left[N_{i,T_i+1}\mid\mathcal{F}_{i,T_i}\right]=\dfrac{\gamma^*+\sum_{t=1}^{T_i}N_{it}}{\gamma^*+\sum_{t=1}^{T_i}(1-\widehat{p}_{it})\,\widehat{\nu}_{it}}\,(1-\widehat{p}_{i,T_i+1})\,\widehat{\nu}_{i,T_i+1}$.
- Bayes (BA): $\widehat{\mathbb{E}}\left[N_{i,T_i+1}\mid\mathcal{F}_{i,T_i}\right]=(1-\widehat{p}_{i,T_i+1})\,\widehat{\nu}_{i,T_i+1}\cdot\frac{1}{R}\sum_{r=1}^{R}\theta_i^{(r)}$, where $\{\theta_i^{(r)}\}_{r=1,\dots,R}$ are posterior draws of $\theta_i$ obtained via MCMC. $R$ should be large enough for the MCMC average to approximate the posterior mean of $\theta_i$ accurately, but it also largely determines the computational time. To balance computational cost and prediction accuracy, we set $R=30{,}000$.
- True (TR): $\widehat{\mathbb{E}}\left[N_{i,T_i+1}\mid\mathcal{F}_{i,T_i}\right]=(1-\widehat{p}_{i,T_i+1})\,\widehat{\nu}_{i,T_i+1}\,\theta_i$.
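The zero-inflated premiums can be sketched the same way. This sketch assumes $\widehat{p}$, $\widehat{\nu}$, and $\gamma^*$ are already available. For BA, it also assumes a list of MCMC draws of $\theta_i$. The function names are hypothetical.

```python
def nzip_premium(p_next, nu_next):
    """Naive ZIP: a priori ZIP mean (1 - p) * nu, ignoring the claim history."""
    return (1.0 - p_next) * nu_next


def vb_premium(counts, p_hist, nu_hist, p_next, nu_next, gamma_star):
    """VB premium: past exposure is the sum of the ZIP means (1 - p_t) * nu_t."""
    exposure = sum((1.0 - p_t) * nu_t for p_t, nu_t in zip(p_hist, nu_hist))
    factor = (gamma_star + sum(counts)) / (gamma_star + exposure)
    return factor * (1.0 - p_next) * nu_next


def ba_premium(theta_draws, p_next, nu_next):
    """Bayes premium: Monte Carlo average of posterior draws of theta_i."""
    return (1.0 - p_next) * nu_next * sum(theta_draws) / len(theta_draws)


def tr_premium(theta_true, p_next, nu_next):
    """Oracle premium using the true random effect theta_i (simulation only)."""
    return (1.0 - p_next) * nu_next * theta_true
```

Setting every $\widehat{p}$ to zero reduces NZIP to NP and VB to PG, which gives a quick sanity check on an implementation.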

`zeroinfl` function in `R` for the ZIP, VB, BA, and PG models. $\gamma^*$ is estimated via the variational Bayes approach and used in both the PG and VB models. The estimation took around 1.27 s and the a priori premium

#### 3.2. Case Study—Posterior Ratemaking with the LGPIF Data

`glm`, and $\eta$ and $\alpha$ for the models with zero-inflation via `zeroinfl`, respectively. Table 3 summarizes the estimated coefficients for the fixed effects. The $\widehat{\alpha}$ of the models without zero-inflation and the $\widehat{\alpha}$ of the models with zero-inflation are not directly comparable. In the latter models, the covariates also affect the zero-inflation probability through $\eta$.
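To see why the two sets of $\widehat{\alpha}$ differ, note that with zero-inflation the a priori mean is $(1-p)\nu$ rather than $\nu$. The sketch below plugs the "With ZI" estimates of Table 3 into that expression. It assumes a logit link for $p$ (the default link of `zeroinfl` in the `pscl` package) and a log link for $\nu$. TypeVillage does not appear in Table 3 and is presumably the reference level. The helper names and the example entity are hypothetical.

```python
import math

# "With ZI" columns of Table 3: eta drives the zero-inflation probability p,
# alpha drives the Poisson mean nu. TypeVillage is treated as the baseline.
ETA = {"(Intercept)": 3.6900, "TypeCity": 1.8268, "TypeCounty": -0.2296,
       "TypeMisc": 0.9887, "TypeSchool": 2.9200, "TypeTown": 1.4311,
       "CoverageIM": -0.2242, "lnDeductIM": -0.4826, "NoClaimCreditIM": -0.3249}
ALPHA = {"(Intercept)": -0.7553, "TypeCity": 1.7887, "TypeCounty": 1.3579,
         "TypeMisc": -1.9120, "TypeSchool": 1.5831, "TypeTown": 0.7086,
         "CoverageIM": 0.0553, "lnDeductIM": -0.2183, "NoClaimCreditIM": -0.5103}


def linear_predictor(coefs, x):
    """Intercept plus sum of coefficient * covariate; omitted covariates count as 0."""
    return coefs["(Intercept)"] + sum(coefs[name] * value for name, value in x.items())


def zip_a_priori_mean(x):
    """Return (p, nu, (1 - p) * nu) for a covariate dict x (hypothetical helper)."""
    p = 1.0 / (1.0 + math.exp(-linear_predictor(ETA, x)))  # assumed logit link
    nu = math.exp(linear_predictor(ALPHA, x))               # log link
    return p, nu, (1.0 - p) * nu


# Hypothetical village entity at the sample means of CoverageIM and lnDeductIM (Table 2)
p, nu, mean = zip_a_priori_mean({"CoverageIM": 0.8483, "lnDeductIM": 5.340})
print(f"p = {p:.4f}, nu = {nu:.4f}, a priori mean = {mean:.4f}")
```

In the zero-inflated model, a covariate moves the premium through both $p$ and $\nu$. This is why a single $\widehat{\alpha}$ cannot be read in isolation.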

## 4. Discussion of the Results

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

| Abbreviation | Definition |
|---|---|
| LGPIF | Local Government Property Insurance Fund |
| KL | Kullback–Leibler |
| MCMC | Markov Chain Monte Carlo |
| VB | Variational Bayes |
| VF | Variational family |

## References

- Ahn, Jae Youn, Himchan Jeong, and Yang Lu. 2021a. On the ordering of credibility factors. Insurance: Mathematics and Economics 101: 626–38.
- Ahn, Jae Youn, Himchan Jeong, and Yang Lu. 2021b. A simple Bayesian state-space model for the collective risk model. arXiv:2110.09657.
- Blei, David M., Alp Kucukelbir, and Jon D. McAuliffe. 2017. Variational inference: A review for statisticians. Journal of the American Statistical Association 112: 859–77.
- Boucher, Jean-Philippe, and Michel Denuit. 2008. Credibility premiums for the zero-inflated Poisson model and new hunger for bonus interpretation. Insurance: Mathematics and Economics 42: 727–35.
- Boucher, Jean-Philippe, Michel Denuit, and Montserrat Guillen. 2009. Number of accidents or number of claims? An approach with zero-inflated Poisson models for panel data. Journal of Risk and Insurance 76: 821–46.
- Bühlmann, Hans, and Alois Gisler. 2006. A Course in Credibility Theory and Its Applications. New York: Springer Science & Business Media.
- Chen, Kun, Rui Huang, Ngai Hang Chan, and Chun Yip Yau. 2019. Subgroup analysis of zero-inflated Poisson regression model with applications to insurance data. Insurance: Mathematics and Economics 86: 8–18.
- Dionne, Georges, and Charles Vanasse. 1989. A generalization of automobile insurance rating models: The negative binomial distribution with a regression component. ASTIN Bulletin: The Journal of the IAA 19: 199–212.
- Frangos, Nicholas E., and Spyridon D. Vrontos. 2001. Design of optimal bonus-malus systems with a frequency and a severity component on an individual basis in automobile insurance. ASTIN Bulletin: The Journal of the IAA 31: 1–22.
- Frees, Edward W., Gee Lee, and Lu Yang. 2016. Multivariate frequency-severity regression models in insurance. Risks 4: 4.
- Gao, Guangyuan, Yanlin Shi, and He Wang. 2021. Telematics Car Driving Data Analytics. SOA General Insurance Research Reports. Available online: https://www.soa.org/globalassets/assets/files/resources/research-report/2021/telematics-driving-data.pdf (accessed on 10 January 2022).
- Jeong, Himchan. 2020. Testing for random effects in compound risk models via Bregman divergence. ASTIN Bulletin: The Journal of the IAA 50: 777–98.
- Jeong, Himchan, and Emiliano A. Valdez. 2020. Predictive compound risk models with dependence. Insurance: Mathematics and Economics 94: 182–95.
- Jordan, Michael I., Zoubin Ghahramani, Tommi S. Jaakkola, and Lawrence K. Saul. 1999. An introduction to variational methods for graphical models. Machine Learning 37: 183–233.
- Lee, Gee Y., and Peng Shi. 2019. A dependent frequency–severity approach to modeling longitudinal insurance claims. Insurance: Mathematics and Economics 87: 115–29.
- Lee, Simon C. K. 2021. Addressing imbalanced insurance data through zero-inflated Poisson regression with boosting. ASTIN Bulletin: The Journal of the IAA 51: 27–55.
- Najafabadi, Amir T. Payandeh. 2010. A new approach to the credibility formula. Insurance: Mathematics and Economics 46: 334–38.
- Najafabadi, Amir T. Payandeh, Hamid Hatami, and Maryam Omidi Najafabadi. 2012. A maximum-entropy approach to the linear credibility formula. Insurance: Mathematics and Economics 51: 216–21.
- Oh, Rosy, Youngju Lee, Dan Zhu, and Jae Youn Ahn. 2021. Predictive risk analysis using a collective risk model: Choosing between past frequency and aggregate severity information. Insurance: Mathematics and Economics 96: 127–39.
- Pechon, Florian, Michel Denuit, and Julien Trufin. 2019. Multivariate modelling of multiple guarantees in motor insurance of a household. European Actuarial Journal 9: 575–602.
- Pechon, Florian, Michel Denuit, and Julien Trufin. 2020. Home and motor insurance joined at a household level using multivariate credibility. Annals of Actuarial Science 15: 82–114.
- Pinquet, Jean. 2020. Positivity properties of the ARFIMA(0, d, 0) specifications and credibility analysis of frequency risks. Insurance: Mathematics and Economics 95: 159–65.
- Ranganath, Rajesh, Sean Gerrish, and David Blei. 2014. Black box variational inference. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics. Reykjavik: PMLR, pp. 814–22.
- Robbins, Herbert, and Sutton Monro. 1951. A stochastic approximation method. The Annals of Mathematical Statistics 22: 400–7.
- Saha, Abhijoy, Karthik Bharath, and Sebastian Kurtek. 2020. A geometric variational approach to Bayesian inference. Journal of the American Statistical Association 115: 822–35.
- Shi, Peng, and Zifeng Zhao. 2020. Regression for copula-linked compound distributions with applications in modeling aggregate insurance claims. The Annals of Applied Statistics 14: 357–80.
- Tzougas, George, and Dimitris Karlis. 2020. An EM algorithm for fitting a new class of mixed exponential regression models with varying dispersion. ASTIN Bulletin: The Journal of the IAA 50: 555–83.
- Tzougas, George, and Himchan Jeong. 2021. An expectation-maximization algorithm for the exponential-generalized inverse Gaussian regression model with varying dispersion and shape for modelling the aggregate claim amount. Risks 9: 19.
- Van den Broek, Jan. 1995. A score test for zero inflation in a Poisson distribution. Biometrics 51: 738–43.
- Wainwright, Martin J., and Michael I. Jordan. 2008. Introduction to variational methods for graphical models. Foundations and Trends in Machine Learning 1: 1–103.
- Zeger, Scott L., Kung-Yee Liang, and Paul S. Albert. 1988. Models for longitudinal data: A generalized estimating equation approach. Biometrics 44: 1049–60.
- Zhang, Pengcheng, Enrique Calderin, Shuanming Li, and Xueyuan Wu. 2020. On the type I multivariate zero-truncated hurdle model with applications in health insurance. Insurance: Mathematics and Economics 90: 35–45.
- Zhao, Xiaobing, and Xian Zhou. 2012. Copula models for insurance claim numbers with excess zeros and time-dependence. Insurance: Mathematics and Economics 50: 191–99.

| | NP | PG | NZIP | VB | BA | TR |
|---|---|---|---|---|---|---|
| RMSE | 5.1342 | 3.4719 | 4.1796 | 2.9404 | 3.3008 | 0.7636 |
| MAE | 0.4254 | 0.3940 | 0.4068 | 0.3768 | 0.3835 | 0.2822 |
| Computation Time (s) | 0.07 | 1.57 | 1.27 | 380.33 | 6492.93 | 1.27 |
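The RMSE and MAE rows presumably use the standard definitions. Under that assumption, each predicted premium is compared with the realized claim count $N_{i,T_i+1}$ over the hold-out policyholders. A minimal sketch under that assumption (function names hypothetical):

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between predicted premiums and observed counts."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def mae(predicted, observed):
    """Mean absolute error between predicted premiums and observed counts."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)
```

RMSE penalizes large errors more heavily than MAE does. The two metrics can therefore rank models differently when a few policyholders file many claims.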

| Categorical Variables | Description | Value | Proportion |
|---|---|---|---|
| TypeCity | Indicator for city entity | Y = 1 | 14% |
| TypeCounty | Indicator for county entity | Y = 1 | 5.78% |
| TypeMisc | Indicator for miscellaneous entity | Y = 1 | 11.04% |
| TypeSchool | Indicator for school entity | Y = 1 | 28.17% |
| TypeTown | Indicator for town entity | Y = 1 | 17.28% |
| TypeVillage | Indicator for village entity | Y = 1 | 23.73% |
| NoClaimCreditIM | No IM claim in three consecutive prior years | Y = 1 | 42.1% |

| Continuous Variables | Description | Minimum | Mean | Maximum |
|---|---|---|---|---|
| CoverageIM | Log coverage amount of IM claim in mm | 0 | 0.8483 | 46.7493 |
| lnDeductIM | Log deductible amount for IM claim | 0 | 5.340 | 9.210 |

| Variable | No ZI: $\alpha$ Estimate | No ZI: $\alpha$ p-Value | With ZI: $\eta$ Estimate | With ZI: $\eta$ p-Value | With ZI: $\alpha$ Estimate | With ZI: $\alpha$ p-Value |
|---|---|---|---|---|---|---|
| (Intercept) | −4.0315 | 0.0000 | 3.6900 | 0.0011 | −0.7553 | 0.4308 |
| TypeCity | 0.9437 | 0.0000 | 1.8268 | 0.0903 | 1.7887 | 0.0080 |
| TypeCounty | 1.7300 | 0.0000 | −0.2296 | 0.8584 | 1.3579 | 0.0485 |
| TypeMisc | −2.7326 | 0.0071 | 0.9887 | 0.8437 | −1.9120 | 0.6441 |
| TypeSchool | −0.9172 | 0.0010 | 2.9200 | 0.0076 | 1.5831 | 0.0396 |
| TypeTown | −0.3960 | 0.1531 | 1.4311 | 0.2196 | 0.7086 | 0.4187 |
| CoverageIM | 0.0664 | 0.0000 | −0.2242 | 0.0002 | 0.0553 | 0.0000 |
| lnDeductIM | 0.1353 | 0.0031 | −0.4826 | 0.0018 | −0.2183 | 0.0509 |
| NoClaimCreditIM | −0.3690 | 0.0049 | −0.3249 | 0.4854 | −0.5103 | 0.0746 |

| | NP | PG | NZIP | VB | BA |
|---|---|---|---|---|---|
| RMSE | 0.3797 | 0.2334 | 0.2291 | 0.1794 | 0.1873 |
| MAE | 0.1267 | 0.1167 | 0.1174 | 0.1098 | 0.1133 |
| Computation Time | 0.02 | 1.32 | 0.64 | 71.54 | 1310.58 |


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Kim, M.; Jeong, H.; Dey, D.
Approximation of Zero-Inflated Poisson Credibility Premium via Variational Bayes Approach. *Risks* **2022**, *10*, 54.
https://doi.org/10.3390/risks10030054

**AMA Style**

Kim M, Jeong H, Dey D.
Approximation of Zero-Inflated Poisson Credibility Premium via Variational Bayes Approach. *Risks*. 2022; 10(3):54.
https://doi.org/10.3390/risks10030054

**Chicago/Turabian Style**

Kim, Minwoo, Himchan Jeong, and Dipak Dey.
2022. "Approximation of Zero-Inflated Poisson Credibility Premium via Variational Bayes Approach" *Risks* 10, no. 3: 54.
https://doi.org/10.3390/risks10030054