Non-Differentiable Loss Function Optimization and Interaction Effect Discovery in Insurance Pricing Using the Genetic Algorithm
Abstract
1. Introduction
2. Interaction Term and Variable Selection
2.1. Statistical Interaction
2.2. Related Work on Variable Selection Methods
- Direct Modeling: This class includes methods that model both interaction and main effects directly. Techniques such as tree-based models can split using different predictors at various nodes, thus accommodating non-linear, non-additive terms. Examples include
- Regression trees and Random Forests capable of modeling complex interaction structures (Friedman and Popescu 2008; Ke et al. 2017).
- Generalized Additive Models (GAMs) and their extensions that incorporate interaction terms (Hastie et al. 2009; Wood 2006).
- Performance Comparison Methods: This class involves comparing the performance of models with and without certain interactions (referred to as restricted and unrestricted models). Techniques include
- The all-possible-regression method, where all combinations of main effects and interactions are fitted separately. This becomes computationally intensive for large p (Wood 2006).
- H-statistics using partial dependence plots to assess pairwise interactions (Hastie et al. 2009).
- Feature importance metrics from tree-based models, though these may suffer from issues like collinearity (Lundberg et al. 2020).
- Tree-based extended GAMs that detect interactions but often have a limited choice of loss functions (Lou et al. 2013).
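To see why the all-possible-regression method becomes computationally intensive for large p, a small counting sketch (the function name and the choice of pairwise-only interactions are illustrative assumptions, not from the source):

```python
def n_candidate_models(p: int) -> int:
    """Size of the all-possible-regression search space when every subset of
    p main effects and all pairwise interactions may enter the model."""
    n_terms = p + p * (p - 1) // 2  # main effects plus pairwise interactions
    return 2 ** n_terms             # every term is either in or out

# With p = 12 predictors there are 12 + 66 = 78 candidate terms,
# i.e. 2**78 possible models - far too many to fit exhaustively.
```

Even before higher-order interactions are considered, the search space grows exponentially in the number of candidate terms, which motivates a stochastic search such as the GA.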
2.3. Genetic Algorithm
3. Methodology
3.1. GA for Variable Selection
- 0. Initialization step: I different models are generated at random, with M_ij denoting the jth model of the ith generation. The set {M_0j : j = 1, …, I} corresponds to the population of models that will serve as input to generate the population of models of the first generation (see the selection step below).
- 1. Evaluation step: Each model of the current generation is evaluated against a fitting criterion. In this article, we use the concordance probability C as this criterion (Ponnet et al. 2021).
- 2. Selection step: The estimates of the fitting criterion, the concordance probability C in our case, of each model of the current generation are transformed into sampling probabilities. Given that the C of the null model equals 0.5, an intuitive way of constructing these probabilities is to take p_ij proportional to C_ij − 0.5 (normalized so that the probabilities sum to one), where p_ij and C_ij are the sampling probability and C, respectively, of model j of generation i. These sampling probabilities are used to sample a new population of size I of models for generation i + 1, randomly (with replacement) from the population of models of generation i.
- 3. Crossover step: Two models are selected at random (without replacement) from the current population. Next, both bit strings are aligned and a random position is chosen, beyond which the strings are interchanged between the two models. The reasoning is that this operation introduces some randomness to prevent premature convergence of the GA. However, the resulting models are still quite similar to the pre-crossover models, since many features are shared between both sets of models. An example of such a crossover step is shown in Figure 2.
- 4. Mutation step: At random, one digit in the bit string of each model is switched to its opposite value, i.e., from 0 to 1 or from 1 to 0. See Figure 2 for a visual display of the mutation step.
- 5. Addition step: For every model, it is verified that every variable involved in at least one interaction term is also included in the model as a main effect. This step is typically not present in most GAs.
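The steps above can be sketched as a minimal loop. Here `conc_prob` is a toy stand-in for the concordance probability of a fitted model (the paper fits a GLM per bit string), and the addition step is omitted since it requires the explicit mapping from bits to variables; the constants and seed are illustrative assumptions:

```python
import random

random.seed(1)

N_TERMS = 8  # length of each bit string (candidate main/interaction terms)
N_MODS = 10  # population size (nMods)

def init_population():
    """Step 0: generate nMods random bit strings."""
    return [[random.randint(0, 1) for _ in range(N_TERMS)] for _ in range(N_MODS)]

def conc_prob(model):
    """Step 1 (stand-in): the paper fits a GLM and computes its concordance
    probability C; this toy score in (0.5, 0.91) simply rewards terms 0-3."""
    return 0.5 + 0.4 * sum(model[:4]) / 4 + 0.01 * random.random()

def select(pop, scores):
    """Step 2: sample with replacement, p_ij proportional to C_ij - 0.5."""
    weights = [max(c - 0.5, 0.0) for c in scores]
    return [list(m) for m in random.choices(pop, weights=weights, k=len(pop))]

def crossover(pop):
    """Step 3: pick two models, cut at a random position, swap the tails."""
    i, j = random.sample(range(len(pop)), 2)
    cut = random.randrange(1, N_TERMS)
    pop[i][cut:], pop[j][cut:] = pop[j][cut:], pop[i][cut:]

def mutate(pop):
    """Step 4: flip one random bit in every model."""
    for m in pop:
        k = random.randrange(N_TERMS)
        m[k] = 1 - m[k]

pop = init_population()
for gen in range(20):  # nGens
    scores = [conc_prob(m) for m in pop]
    pop = select(pop, scores)
    crossover(pop)
    mutate(pop)

best = max(pop, key=conc_prob)
```

The sketch illustrates why selection pressure (weights proportional to C − 0.5) concentrates the population on well-fitting models, while crossover and mutation keep injecting diversity.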
3.2. Concordance Probability C
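As a generic sketch of the criterion (Ponnet et al. (2021) adapt the concordance probability to insurance pricing models; this is only the basic empirical estimator, and the function name is an assumption):

```python
from itertools import combinations

def concordance(y, pred):
    """Empirical concordance probability: among pairs with different observed
    outcomes, the fraction where the larger outcome also received the larger
    prediction (ties in predictions count as 1/2). Assumes at least one
    comparable pair exists."""
    conc = comp = 0.0
    for (yi, pi), (yj, pj) in combinations(zip(y, pred), 2):
        if yi == yj:
            continue  # only pairs with different outcomes are comparable
        comp += 1
        hi_pred = pi if yi > yj else pj  # prediction of the larger outcome
        lo_pred = pj if yi > yj else pi
        conc += 1.0 if hi_pred > lo_pred else 0.5 if hi_pred == lo_pred else 0.0
    return conc / comp

# A model predicting at random gives C near 0.5 (the null-model value used in
# the selection step); a perfect ranking gives C = 1.
```

For binary outcomes this estimator coincides with the area under the ROC curve (Bamber 1975).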
4. Application
4.1. Data Set Description
- uwYear: called CalYear in the original dataset, it refers to the underwriting/renewal year we are looking at (2009 or 2010);
- gender: called Gender in the original dataset, it refers to the gender (male or female) of the policyholder—assumed to be the insured person;
- carType: called Type in the original dataset, it refers to the type of car the insured person is driving (6 different types are defined);
- carCat: called Category in the original dataset, it refers to the category of the car that the policyholder is driving (Small, Medium or Large);
- job: called Occupation in the original dataset, it refers to the type of occupation of the policyholder (5 categories are defined: Employed, Housewife, Retired, Self-employed and Unemployed);
- age: called Age in the original dataset, it refers to the age of the policyholder (not binned a priori—see below);
- group1: called Group1 in the original dataset, it splits the dataset in 20 different groups;
- bm: called Bonus in the original dataset, it refers to the bonus-malus level of the policyholder; this is a scale from −50 to 150 assumed to reflect the driver quality (in terms of past claims);
- nYears: called PolDur in the original dataset, it refers to the duration of the policyholder contract (it goes from 0 for new business to 15);
- carVal: called Value in the original dataset, it provides the value of the car in the range between EUR 1,000 and EUR 49,995 (a priori, this variable is not binned—see below);
- cover: called Adind in the original dataset, it is a binary indicator of whether or not the policyholder subscribes to ancillaries;
- density: called Density in the original dataset, it provides information on the average density where the policyholder is driving (this is a number between 14.37 and 297.38); this variable is a priori not binned (see below).
4.2. Parameter Tuning in Genetic Algorithms
- nTimesInMods: the minimal number of times a variable should appear across the interaction terms of all models at the end of a generation;
- nKeptBestMods: the number of top performers of the current generation that are retained at the end of a generation;
- nMods: the population size of each generation.
- nTimesInMods should not be set excessively high, as this could prematurely limit the exploration of the search space by reducing randomness.
- nKeptBestMods needs to be set to a reasonably high value. The standard GA can sometimes overlook promising models; therefore, it is advantageous to fine-tune these models over successive generations by ensuring that some are consistently retained in the model population.
- nMods should also be reasonably high, allowing the GA to explore a broader range of possibilities within the search space. However, a very high nMods value increases the GA’s runtime and can introduce excessive randomness. Therefore, while it should not be set too high, a substantial number is recommended. This parameter is particularly crucial for tuning.
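The role of nKeptBestMods can be made concrete with a small elitism sketch (the function name is an assumption; it combines the retention of top performers described above with the weighted sampling of the selection step):

```python
import random

def next_generation(pop, scores, n_kept_best, n_mods):
    """Keep the nKeptBestMods top performers unchanged, then fill the rest of
    the population by sampling with replacement, p proportional to C - 0.5."""
    ranked = sorted(zip(scores, pop), key=lambda t: t[0], reverse=True)
    elites = [list(m) for _, m in ranked[:n_kept_best]]
    weights = [max(c - 0.5, 0.0) for c in scores]
    rest = [list(m) for m in random.choices(pop, weights=weights,
                                            k=n_mods - n_kept_best)]
    return elites + rest
```

Because the elites bypass crossover and mutation, promising models are consistently retained and can be fine-tuned over successive generations rather than lost to randomness.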
- All final models included at least twelve main and interaction effects. Notably, the worst-performing model contained the maximum of twenty main and interaction effects. This suggests that a well-tuned GA tends to produce models with fewer main and interaction effects than the maximum possible number.
- The best final model shared a significant number of main and interaction effects with the other final models. In every case, at least 50% of the main and interaction effects in a final model overlapped with those in the best final model. Interestingly, the degree of overlap in main and interaction effects tended to increase with better model tuning.
- Throughout our tuning process, the best-performing model in the GA was never selected as the final model upon validation. This highlights the importance of using both training and validation sets to mitigate the risk of overfitting.
- Main effects: age, gender, carCat, bm, carVal, job, density, cover, uwYear, group1.
- Interaction effects: age*gender, carCat*bm, carVal*job, density*gender, gender*job, job*cover, uwYear*gender, uwYear*group1.
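The selected effects above translate directly into a model formula. A sketch in R/patsy-style notation, where `a:b` denotes an interaction term; the response name `nClaims` is a hypothetical placeholder, not taken from the source:

```python
# Main and interaction effects of the best final model (Section 4.2).
main_effects = ["age", "gender", "carCat", "bm", "carVal", "job",
                "density", "cover", "uwYear", "group1"]
interactions = [("age", "gender"), ("carCat", "bm"), ("carVal", "job"),
                ("density", "gender"), ("gender", "job"), ("job", "cover"),
                ("uwYear", "gender"), ("uwYear", "group1")]

# Addition-step guarantee: every variable appearing in an interaction
# term is also present as a main effect.
for a, b in interactions:
    assert a in main_effects and b in main_effects

formula = "nClaims ~ " + " + ".join(
    main_effects + [f"{a}:{b}" for a, b in interactions])
```

The resulting string could be passed to a formula-based GLM fitter; the point here is only that the GA's bit string maps one-to-one onto such a term list.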
4.3. Identifying the Strongest Correlating GLM
- Main effects: age, gender, bm, job, density, cover, uwYear, nYears, group1, carVal and carType.
- Interaction effects: uwYear*bm, age*cover, nYears*cover, uwYear*gender, density*uwYear, density*job, age*uwYear and carVal*carType.
4.4. Claim Size Analysis
5. Discussion and Conclusions
- Higher-Order Interaction Effects: While this study focused on first-order interaction effects, further exploration could investigate strategies to adapt the GA for exploring higher-order interaction effects. Given the exponential search space associated with higher-order interactions, strategies may involve constraints on their inclusion in a generation based on the selection of a (m − 1)th interaction effect in the preceding generation.
- Connection with Shapley Values: The connection between Genetic Algorithms for variable selection, including interaction effects, and popular variable importance methods like Shapley values could be investigated. Although GAs and Shapley values are distinct optimization concepts, they can be related in the context of measuring the contribution of individual variables in different generations (GA) or coalitions (Shapley).
- Consideration of Protected Attributes: In our application, we observed a notable contribution of the gender feature in multiple interaction terms. As gender is often considered a protected attribute, particularly in the insurance pricing literature (e.g., Lindholm et al. (2022)), the approach presented could be adapted to focus specifically on one or multiple protected variables. This adaptation could reveal interaction effects with non-protected variables and contribute to discussions on fairness and discrimination in modeling practices.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Complete List Tuning Parameters
- nVarInit: the number of interaction terms selected at random during the initialization phase. In our application, this was set to 10.
- nGens: the number of generations. In our application, this was set to 20, as convergence according to the rule explained above was almost always reached within 20 generations.
- nCrossOver: the number of cross-overs. In our application, this was set to one for all generations. Note that nCrossOver could be allowed to vary across generations.
- nMuts: the number of mutations. In our application, this was set to one for all generations. Note that nMuts could be allowed to vary across generations.
- nVarMax: the maximum number of interaction terms of a model by the end of a generation. In our application, this was set to 20. Note that nVarMax could be allowed to vary across generations.
- nRedMods: for models with more than nVarMax variables, the pruning step is repeated nRedMods times. In our application, this was set to five. Note that nRedMods could be allowed to vary across generations.
- nTimesInMods: the minimal number of times a variable should appear across the interaction terms of all models at the end of a generation. Note that nTimesInMods could be allowed to vary across generations.
- nKeptBestMods: the number of top performers of the current generation that are retained at the end of a generation. Note that nKeptBestMods could be allowed to vary across generations.
- nMods: the population size of each generation. Note that nMods could be allowed to vary across generations.
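The full parameter set can be collected in one configuration object. The fixed values are those stated above; the defaults for the three tuned parameters (nTimesInMods, nKeptBestMods, nMods) are assumptions taken from the best-performing row of the tuning table in Section 4.2:

```python
from dataclasses import dataclass

@dataclass
class GATuning:
    """GA tuning parameters (Appendix A) with the values used in the
    application; the last three were varied during tuning."""
    nVarInit: int = 10      # interaction terms drawn at initialization
    nGens: int = 20         # generations (convergence typically within 20)
    nCrossOver: int = 1     # cross-overs per generation
    nMuts: int = 1          # mutations per generation
    nVarMax: int = 20       # max interaction terms per model
    nRedMods: int = 5       # pruning repetitions when nVarMax is exceeded
    nTimesInMods: int = 1   # min appearances of a variable (tuned: 1 to 3)
    nKeptBestMods: int = 5  # retained top performers (tuned: 2 to 5)
    nMods: int = 30         # population size (tuned: 10 to 30)
```

Grouping the parameters this way also makes it easy to let any of them vary across generations, as noted for several of them above.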
Appendix B. Additional Changes to the Vanilla GA
C | Sampling Weight (Vanilla) | Sampling Weight (Adjusted) |
---|---|---|
0.58 | 0.16 | 0.04 |
0.59 | 0.18 | 0.12 |
0.60 | 0.20 | 0.20 |
0.62 | 0.24 | 0.36 |
References
- Bamber, Donald. 1975. The area above the ordinal dominance graph and the area under the receiver operating characteristic graph. Journal of Mathematical Psychology 12: 387–415. [Google Scholar] [CrossRef]
- Blier-Wong, Christopher, Hélène Cossette, Luc Lamontagne, and Etienne Marceau. 2020. Machine learning in P&C insurance: A review for pricing and reserving. Risks 9: 4. [Google Scholar] [CrossRef]
- Broadhurst, David, Royston Goodacre, Alun Jones, Jem J. Rowland, and Douglas B. Kell. 1997. Genetic algorithms as a method for variable selection in multiple linear regression and partial least squares regression, with applications to pyrolysis mass spectrometry. Analytica Chimica Acta 348: 71–86. [Google Scholar] [CrossRef]
- European Insurance and Occupational Pensions Authority (EIOPA). 2019. Big Data Analytics in Motor and Health Insurance: A Thematic Review. Luxembourg: Publications Office of the European Union. [Google Scholar]
- Frees, Edward W., Richard A. Derrig, and Glenn Meyers. 2014. Predictive Modeling Applications in Actuarial Science. Cambridge: Cambridge University Press, vol. 1. [Google Scholar]
- Friedman, Jerome H., and Bogdan E. Popescu. 2008. Predictive learning via rule ensembles. The Annals of Applied Statistics 2: 916–54. [Google Scholar] [CrossRef]
- Gayou, Olivier, Shiva K. Das, Su-Min Zhou, Lawrence B. Marks, David S. Parda, and Moyed Miften. 2008. A genetic algorithm for variable selection in logistic regression analysis of radiotherapy treatment outcomes. Medical Physics 35: 5426–33. [Google Scholar] [CrossRef]
- Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Berlin/Heidelberg: Springer, vol. 2. [Google Scholar]
- Ke, Guolin, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. Lightgbm: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems 30: 3146–54. [Google Scholar]
- Lindholm, Mathias, Ronald Richman, Andreas Tsanakas, and Mario V. Wüthrich. 2022. Discrimination-free insurance pricing. ASTIN Bulletin: The Journal of the IAA 52: 55–89. [Google Scholar] [CrossRef]
- Lou, Yin, Rich Caruana, Johannes Gehrke, and Giles Hooker. 2013. Accurate intelligible models with pairwise interactions. Paper presented at 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 24–27; pp. 623–31. [Google Scholar]
- Lundberg, Scott M., Gabriel Erion, Hugh Chen, Alex DeGrave, Jordan M. Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal, and Su-In Lee. 2020. From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence 2: 56–67. [Google Scholar] [CrossRef] [PubMed]
- Menvouta, Emmanuel Jordy, Jolien Ponnet, Robin Van Oirbeek, and Tim Verdonck. 2022. mCube: A Multinomial Micro-level reserving Model. arXiv arXiv:2212.00101. [Google Scholar]
- Mingote, Victoria, Antonio Miguel, Alfonso Ortega, and Eduardo Lleida. 2020. Optimization of the area under the ROC curve using neural network supervectors for text-dependent speaker verification. Computer Speech & Language 63: 101078. [Google Scholar] [CrossRef]
- Mitchell, Tom M. 1997. Machine Learning. Burr Ridge: McGraw Hill, vol. 45, pp. 870–77. [Google Scholar]
- Ohlsson, Esbjörn, and Björn Johansson. 2010. Non-life Insurance Pricing with Generalized Linear Models. Berlin/Heidelberg: Springer, vol. 174. [Google Scholar]
- Pencina, Michael J., and Ralph B. D’Agostino. 2004. Overall C as a measure of discrimination in survival analysis: Model specific population value and confidence interval estimation. Statistics in Medicine 23: 2109–23. [Google Scholar] [CrossRef] [PubMed]
- Ponnet, Jolien, Robin Van Oirbeek, and Tim Verdonck. 2021. Concordance Probability for Insurance Pricing Models. Risks 9: 178. [Google Scholar] [CrossRef]
- Qasim, Omar Saber, and Zakariya Yahya Algamal. 2018. Feature selection using particle swarm optimization-based logistic regression model. Chemometrics and Intelligent Laboratory Systems 182: 41–6. [Google Scholar] [CrossRef]
- Stefansson, Petter, Kristian H. Liland, Thomas Thiis, and Ingunn Burud. 2020. Fast method for GA-PLS with simultaneous feature selection and identification of optimal preprocessing technique for datasets with many observations. Journal of Chemometrics 34: e3195. [Google Scholar] [CrossRef]
- Van Oirbeek, Robin, Jolien Ponnet, Bart Baesens, and Tim Verdonck. 2023. Computational Efficient Approximations of the Concordance Probability in a Big Data Setting. Big Data. [Google Scholar] [CrossRef]
- Wood, Simon N. 2006. Generalized Additive Models: An Introduction with R. Boca Raton: Chapman and Hall/CRC. [Google Scholar]
- Wuthrich, Mario V., and Christoph Buser. 2021. Data Analytics for Non-Life Insurance Pricing. Swiss Finance Institute Research Paper. Zürich: Swiss Finance Institute. [Google Scholar]
concProb | nTimesInMods | nKeptBestMods | nMods |
---|---|---|---|
0.6443 | 1 | 5 | 10 |
0.6298 | 1–3 | 2 | 10 |
0.6485 | 1–3 | 3 | 10 |
0.6609 | 1–3 | 3–5 | 15 |
0.6669 | 1–3 | 3–5 | 30 |
0.6700 | 1 | 3–5 | 30 |
concProb | Selection | nVars | Shared |
---|---|---|---|
0.6443 | Best 3/4 | 13 | 8 (61%) |
0.6298 | Second Best | 20 | 11 (55%) |
0.6485 | Second Best | 17 | 9 (53%) |
0.6609 | Best 3/4 | 12 | 8 (67%) |
0.6669 | Second Best | 13 | 11 (85%) |
0.6700 | Best 3/4 | 18 | 18 (100%) |
0.6903 | 0.8165 | 0.8089 |
GLM | 0.6919 | 0.8264 | 0.8196 |
GBM | 0.7820 | 0.9107 | 0.9024 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Van Oirbeek, R.; Vandervorst, F.; Bury, T.; Willame, G.; Grumiau, C.; Verdonck, T. Non-Differentiable Loss Function Optimization and Interaction Effect Discovery in Insurance Pricing Using the Genetic Algorithm. Risks 2024, 12, 79. https://doi.org/10.3390/risks12050079