Applied Sciences
  • Article
  • Open Access

21 March 2022

Using Feature Selection with Machine Learning for Generation of Insurance Insights

1 School of Computer Science, Technological University Dublin, D02 HW71 Dublin, Ireland
2 Faculty of Computers and Artificial Intelligence, Cairo University, Giza 12613, Egypt
3 DOCOsoft, D03 E5R6 Dublin, Ireland
* Author to whom correspondence should be addressed.
This article belongs to the Topic Machine and Deep Learning

Abstract

Insurance is a data-rich sector, hosting large volumes of customer data that are analysed to evaluate risk. Machine learning techniques are increasingly used in the effective management of insurance risk. Insurance datasets, by their nature, however, are often of poor quality, with noisy subsets of data (or features). Choosing the right features of data is a significant pre-processing step in the creation of machine learning models. The inclusion of irrelevant and redundant features has been demonstrated to degrade the performance of learning models. In this article, we propose a framework for improving predictive machine learning techniques in the insurance sector via the selection of relevant features. The experimental results, based on five publicly available real insurance datasets, show the importance of applying feature selection to remove noisy features before performing machine learning techniques, allowing the algorithm to focus on influential features. An additional business benefit is the revelation of the most and least important features in the datasets. These insights can prove useful for decision making and strategy development in areas/business problems that are not limited to the direct target of the downstream algorithms. In our experiments, machine learning techniques based on a set of selected features suggested by feature selection algorithms outperformed the full feature set for a set of real insurance datasets. Specifically, subsets containing 20% and 50% of the features in our five datasets improved downstream clustering and classification performance compared to the whole datasets. This indicates the potential of feature selection in the insurance sector both to improve model performance and to highlight influential features for business insights.

1. Introduction

The insurance sector by nature has been an intensively data-driven industry for many years, with insurance companies managing large quantities of customer data. The business of insurance is based on the analysis of data to understand and effectively evaluate risk. The insurer makes use of actuaries and actuarial science techniques to analyse insurance data to perform core roles. Therefore, insurance data can be claimed to be a dominant force in the sector [1]. Enhancing the quality of insurance data through appropriate pre-processing should therefore improve the estimation process. Examples of pre-processing tasks include handling of missing values [2], managing outliers [3], binning numerical data to create categories [4], and better handling categorical data correlation/association [5].
The use of machine learning in the non-life insurance industry can be divided into three categories: actuarial, fraud detection, and customer behaviour. Actuaries carry out two tasks relevant to machine learning: calculating how much an insurance policy should cost, which is called pricing or ratemaking, and calculating how much money an insurer should set aside for the payment of future claims, which is called reserving. The nature of insurance data (especially aggregated data) and the detailed domain requirements mean that bespoke domain-specific models are required within the actuarial category. Fraud detection and customer behaviour (churn and propensity to buy) are, however, amenable to the application of more general machine learning techniques, such that advances in other domains can be applied to problems found in these categories. For example, techniques for detecting credit card fraud can be used to detect insurance fraud [6], and techniques developed for churn prediction in the supermarket sector can also be applied to the insurance sector [7].
In training datasets for supervised learning, redundant and irrelevant features have been demonstrated to degrade the performance of learning models. Taking the commonly used Support Vector Machine (SVM) algorithm, the identification of important features significantly improves the robustness of SVM learning models [8], with such models being sensitive to noisy features. SVM models optimise over a kernel of the data, under the assumption that the kernel matrix of the training data is positive definite. However, this assumption cannot be guaranteed in the presence of noisy features. Therefore, it is recommended to decrease the influence of noisy features to build robust SVM-based models [9].
Insurance databases tend to contain multiple redundant and irrelevant attributes. These attributes negatively affect the accuracy of insurance reserve prediction techniques. Therefore, it is intuitively important to apply/embed feature selection prior to the creation of Machine Learning (ML) models in order to strip out low-influence features. Furthermore, improving model prediction power via feature selection and dimensionality reduction holds promise for improving the processing and accuracy of many insurance problems, such as insurance reserve prediction, customer retention, policy pricing, and insurance fraud detection. In our related work section, we outline the application of ML in the insurance sector and discuss various techniques used to implement feature selection. Above and beyond improved algorithmic performance, feature selection reveals the most and least influential features in a dataset for the model task in question. Domain experts can use these insights for practical purposes immediately or as a starting point for further investigations [10].

Contributions

This paper highlights the presence and influence of poor data quality on machine learning algorithms using insurance datasets. It brings feature selection and machine learning together in the insurance sector aiming towards the creation of more robust ML models and predictive domain insights. It emphasizes the role of feature selection techniques in improving insurance insights. Our specific contributions are as follows:
  • We highlight the influence of noisy, redundant, and/or irrelevant features in degrading the accuracy of machine learning algorithms. We show that feature selection can lead to better predictive/classification models in insurance through the removal of noisy features.
  • We demonstrate how feature selection can lead to domain insights through examination of the features which most contribute to model decisions.
  • We have used the proposed framework to improve the performance of machine learning algorithms on real insurance datasets.
The rest of this paper is organized as follows: Section 2 highlights different applications of machine learning in the insurance sector and reviews feature selection algorithms for mixed data which are suitable for insurance datasets. Our methodology for improving the application of machine learning in the insurance sector is presented in Section 3. Section 4 introduces a comparative study based on benchmark datasets. The discussion is presented in Section 5. Finally, Section 6 concludes the paper.

3. Proposed Framework

Our aim is to demonstrate and investigate feature selection on insurance datasets. We present our approach to applying feature selection as a framework, explained in two parts: defining the problem statement and presenting the proposed framework.

3.1. Problem Statement

Finding the most influential insurance features can be formally expressed as follows:
Given:
  • An insurance dataset, D, consisting of v = p + q mixed features, where
    - p: the number of quantitative features and
    - q: the number of qualitative features.
Find:
  • The best representative set of features.
Objective:
  • Identify irrelevant and redundant features.
  • Improve machine learning accuracy.

3.2. Framework Description

Our proposed framework aims to highlight the importance of employing feature selection algorithms prior to applying machine learning algorithms for insurance data. The steps of the proposed framework are shown in Algorithm 1.
Input:
  • D: A mixed insurance dataset containing v mixed features.
  • C: A list of candidate feature selection algorithms.
  • π: A predefined set of feature percentages.
Output:
  • S: The best representative set of features for D.
Algorithm 1 Proposed framework for identifying most influential attributes in insurance datasets
1: for each feature selection algorithm c_i in C do
2:        for each percentage π_j of selected features in π do
3:              D_ij ← the subset of features selected by algorithm c_i for percentage π_j
4:              Apply the intended machine learning algorithm on D_ij
5:              Compute the performance measures
6:        end for
7: end for
8: Identify the D_ij that achieves the best performance results
9: return D_ij
The proposed framework takes as input the dataset, the list of desired feature percentages, and the choice of candidate feature selection algorithms. For each pair of feature selection algorithm and feature percentage, the set of classification- and clustering-based performance measures is computed. The framework then returns the feature subset of the required size that achieves the best performance measures.
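The control flow of Algorithm 1 can be sketched as follows. This is an illustrative Python sketch only (our experiments were implemented in R); the ranking functions and the evaluation metric are placeholder assumptions, not the paper's GFSA/LS/Spec/USFSM implementations or our actual clustering/classification measures.

```python
# Illustrative sketch of Algorithm 1: grid over candidate feature selection
# (FS) algorithms and feature percentages, keeping the subset with the best
# downstream score. Ranking functions and metric are placeholders.
def select_best_subset(D, fs_algorithms, percentages, evaluate):
    """D: list of feature names; fs_algorithms: dict name -> ranking function
    (returns features ordered most to least relevant); evaluate: function
    mapping a feature subset to a score (higher is better)."""
    best_subset, best_score = None, float("-inf")
    for name, rank in fs_algorithms.items():     # line 1: each c_i in C
        ranked = rank(D)                         # rank all v features once
        for pi in percentages:                   # line 2: each pi_j in pi
            k = max(1, round(pi * len(D)))
            subset = ranked[:k]                  # line 3: D_ij
            score = evaluate(subset)             # lines 4-5: ML + measures
            if score > best_score:               # line 8: track the best
                best_subset, best_score = subset, score
    return best_subset, best_score               # line 9
```

With a toy alphabetical "ranking" and a scoring function, the call `select_best_subset(["a", "b", "c", "d", "e"], {"toy": sorted}, [0.2, 0.5, 0.8], score_fn)` returns the best-scoring truncation of the ranking.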

4. Methods and Materials

In this section, we aim to discover insurance insights and improve machine learning performance in the insurance sector by examining the effect of our proposed framework. We compare the accuracy measures of machine learning algorithms on insurance datasets before and after applying each candidate feature selection algorithm, highlighting the importance of feature engineering in ranking and selecting influential features before machine learning techniques are applied in the insurance sector.
In these experiments, parameter settings for comparative published methods were fixed according to the recommendation of their respective authors. We implement GFSA in the R language. Fernandez et al. [42] kindly provided us with their Java program for computing USFSM. Both the LS and Spec algorithms are provided in an R-package named "Rdimtools" for Dimension Reduction and Estimation Methods. All experiments were run in R, using a computer with an Intel Core i7-6600U 2.60 GHz processor with 16 GB DDR4 RAM, running 64-bit Windows 10.

4.1. Experimental Design

We propose an experimental design for highlighting the effect of employing feature selection algorithms before applying machine learning tasks in insurance. In these experiments, we make use of the following candidate feature selection algorithms:
  • GFSA: The Greedy Feature Selection Algorithm [39];
  • LS: The Laplacian Score [40];
  • Spec: The Spectral Algorithm [41];
  • USFSM: The Unsupervised Spectral Feature Selection Method [42].
For each candidate algorithm, we applied different proportions of selected features, π = {0.2, 0.5, 0.8}. All algorithms apart from GFSA are ranking-based feature selection algorithms, meaning that they rank all features according to their relevance. We then use this ranking to identify the top π percentage of these features. GFSA, however, requires the number of desired features, M, in advance as an input parameter; therefore, we run GFSA with each value of π that we wish to investigate. This gives 12 configurations, consisting of 4 algorithms across 3 proportions of selected features. We compute the downstream evaluation measures for each configuration. These evaluation measures are also computed for the whole datasets (all features) to provide a baseline without feature selection. Finally, we select the set of features that has the best performance measures.
Note that all candidate algorithms except GFSA are sensitive to the number of objects (observations) in the dataset. These techniques are challenging to run when the number of observations is high because their time complexity increases non-linearly with the number of observations. Consequently, these algorithms (all except GFSA) could not be applied to long datasets such as D4 in Table 1.
Table 1. Insurance datasets’ description.

4.2. Real Insurance Datasets

For our work, we looked for insurance benchmark datasets in the public machine learning repositories, e.g., UCI [47] and Kaggle [48]. We excluded many insurance datasets because they are either not annotated or lacked meta information (such as feature titles) for interpretation, finally selecting five insurance datasets for this comparison. These datasets are publicly available from the Kaggle machine learning repository [48]. Furthermore, we manually prune the features to remove individual identifiers or constant features. Table 1 shows the characteristics of selected datasets, describing each of the datasets as set out in the repository [48].
The first dataset, Car Insurance Cold Calls, is a dataset from a bank, which constructs campaigns to engage new clients for car insurance services. The bank needs to predict whether those potential customers will buy car insurance or not based on their data from previous campaigns. It provides general information about clients and specific information about the insurance campaigns. This dataset consists of 18 attributes (after removing the id attribute) in relation to 4000 customers contacted during the last campaign.
The Medical Insurance Claim Fraud dataset relates to the detection of fraudulent health insurance claims based on patients’ data. It provides general information about insurance claims and their owners, such as gender, location, employer, cause, and fee charged. The original dataset contains 15 attributes and 7000 claims; however, we removed 3 noisy attributes: patient name, DOB, and email.
The Caravan Insurance Challenge dataset was used in the CoIL 2000 challenge [49]. It contains socio-demographic information about clients of an insurance company to predict which potential customer will buy a caravan insurance policy. It consists of 86 attributes and 9822 observations.
The Health Insurance Lead Prediction dataset was collected by a financial services company. The company built this dataset to predict whether a client is interested in its recommended medical insurance policies based on their profile on the company website. The client information includes demographic information and previously held policies. A customer is classified as a lead if she/he fills in the policy form. Its training dataset consists of 50,882 rows and 13 attributes, after removing the individual identifier attribute, “Id”.
The final dataset is named Insurance Company. It aims to identify customers willing to buy a new insurance product. It contains customer information and their buying behaviour for two other services. It consists of 15 attributes and 14,017 customers. We removed two features, customer-id and contract: the former is an individual identifier, and the latter contains a constant value with zero variance, so it cannot be used for extracting any interesting information.

4.3. Evaluating Measures

We have chosen two types of machine learning tasks, a classification-based approach for supervised learning and a clustering-based approach for unsupervised learning, to demonstrate the effect of feature selection algorithms on machine learning for insurance datasets. We followed the standard methodology for evaluating feature selection methods [42]. To evaluate the clustering performance of insurance datasets before and after feature selection, we utilise k-prototypes [50], a well-known algorithm for clustering mixed data. In the comparison, two clustering measures are used: clustering accuracy (C-ACC) and normalized mutual information (NMI). The C-ACC is calculated as follows:
C-ACC(θ) = (1/n) ∑_{i=1}^{n} δ(p_i, map(q_i)),
where n is the total number of observations, δ(a, b) = 1 if a = b and δ(a, b) = 0 otherwise, and map(q_i) is a mapping function that permutes clustering labels to get the best match with the true labels based on the Kuhn–Munkres algorithm [51]. The C-ACC values range from 0 to 1; the clustering is better when the C-ACC is higher.
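As a concrete illustration, C-ACC can be computed as below. This Python sketch uses a brute-force search over label permutations, which for small numbers of clusters is equivalent to the Kuhn–Munkres assignment used in the paper; it is an illustration, not our experimental implementation.

```python
# C-ACC sketch: search over mappings of cluster labels to true labels and
# keep the one that maximises agreement. Brute force over permutations is
# exponential in the number of clusters, so this is for small label sets.
from itertools import permutations

def clustering_accuracy(true_labels, cluster_labels):
    clusters = sorted(set(cluster_labels))
    classes = sorted(set(true_labels))
    n = len(true_labels)
    best = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))   # candidate map(q_i)
        hits = sum(1 for t, c in zip(true_labels, cluster_labels)
                   if mapping[c] == t)        # sum of delta(p_i, map(q_i))
        best = max(best, hits)
    return best / n                           # in [0, 1]; higher is better
```

For example, a clustering that swaps the two label names relative to the ground truth still scores 1.0, since the mapping step absorbs the relabelling.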
For clustering results (P) and true labels (Q), NMI [40] is defined as:
NMI(θ) = I(P, Q) / max(H(P), H(Q)),
where H(P) and H(Q) are the entropies of P and Q, respectively, and I(P, Q) is the mutual information [52] between P and Q, which is defined as:
I(P, Q) = ∑_{p_i ∈ P} ∑_{q_j ∈ Q} p(p_i, q_j) log2( p(p_i, q_j) / ( p(p_i) p(q_j) ) ).
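These quantities can be computed directly from label counts. The following Python sketch (an illustration, not our R experimental code) implements the NMI definition above with empirical probabilities; it assumes at least one of P and Q has non-zero entropy.

```python
# NMI sketch: empirical entropies and mutual information from label counts,
# normalised by max(H(P), H(Q)) as in the definition above.
from math import log2
from collections import Counter

def nmi(P, Q):
    n = len(P)
    pc, qc, joint = Counter(P), Counter(Q), Counter(zip(P, Q))
    def entropy(counts):
        return -sum((v / n) * log2(v / n) for v in counts.values())
    mi = sum((v / n) * log2((v / n) / ((pc[p] / n) * (qc[q] / n)))
             for (p, q), v in joint.items())      # I(P, Q)
    return mi / max(entropy(pc), entropy(qc))
```

A perfect (possibly relabelled) clustering gives NMI = 1, while a clustering independent of the true labels gives NMI = 0.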
In the classification-based evaluation approach, two classification techniques, Support Vector Machine (SVM) and K-Nearest Neighbours (KNN), are used to evaluate classification accuracy: the ratio between the number of correctly classified objects and the total number of objects.
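For illustration, a minimal nearest-neighbour classifier and the plain accuracy measure look as follows. This Python sketch is a toy stand-in for the KNN evaluation, not the implementation used in our experiments (which relied on standard R packages).

```python
# Toy k-NN: classify x by majority vote among the k nearest training points
# (squared Euclidean distance); accuracy = correct / total, as in the text.
def knn_predict(train_X, train_y, x, k=1):
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2
                                     for a, b in zip(train_X[i], x)))
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```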

4.4. Experimental Results Analysis

The clustering accuracy for the selected datasets is shown in Table 2. For each dataset D_i, the table displays three groups (rows) of results. Each row shows the results corresponding to a percentage of selected features, π = M/v, set to 0.2, 0.5, and 0.8. As shown in Table 2, for every dataset there is at least one subset of features with better clustering accuracy than the whole dataset. This indicates the existence of noisy features, which reduce the clustering accuracy of the whole dataset; removing these noisy features improves performance. For example, with D1, 9 out of 12 configurations using FS algorithms lead to higher clustering accuracy than using the full dataset. The best clustering accuracy for D1, 0.6613, is found when we select 0.5 of the features with the Spec algorithm. The best clustering accuracy for D2 and D3 occurs when we select 0.2 of the features with the Spec algorithm. We found that selecting 0.2 of the features in D1, D2, D3, and D5 with the Spec algorithm yields better clustering accuracy than applying the clustering algorithm to the whole dataset.
Table 2. Clustering accuracy where four feature selection (FS) methods were applied, showing results for three feature percentages versus all features. Highest FS-based result is in bold.
Table 3 shows the NMI results for the five selected datasets. As with clustering accuracy, for all datasets there is at least one subset of features which outperforms the whole dataset in terms of NMI. This emphasizes the existence of noisy features that negatively affect the NMI of the whole dataset. For example, all subsets of features recommended by all algorithms for D1 have better NMI than the whole dataset. The best NMI values are found when we select 0.5 of the features with the Spec algorithm. Corresponding with the clustering accuracy results, we find that selecting 0.2 of the features in D1, D2, D3, and D5 with the Spec algorithm yields better NMI than applying the clustering algorithm to the whole datasets.
Table 3. NMI accuracy for insurance datasets.
The Spec algorithm has the best clustering-based performing measures for our five insurance datasets.
Table 4 shows the KNN classification accuracy. As with clustering accuracy, for all datasets there is at least one subset of features which outperforms the whole dataset in terms of KNN accuracy. This emphasizes the existence of noisy features that negatively affect the KNN accuracy of the whole datasets.
Table 4. KNN Accuracy for insurance datasets.
Table 5 depicts the SVM classification accuracy. As with the other measures, for all datasets except D3 there is at least one subset of features which outperforms the whole dataset. This emphasizes the existence of noisy features that negatively affect the SVM accuracy of the whole dataset.
Table 5. SVM accuracy for insurance datasets.
The only exception in our experiments is the SVM classification accuracy for D3, where the best accuracy occurred for the whole dataset: no subset of features outperforms SVM classification using all features in D3.

5. Discussion

The objective of this article is to highlight the benefit of feature selection for insurance insights and learning-based tasks. We have proposed a framework for selecting the most influential features and discarding irrelevant and noisy features.
We applied our framework to identify the most and least influential features when applying predictive or descriptive analytical algorithms. To demonstrate our claim, we collected and analysed the results on real insurance datasets publicly available on Kaggle [48]. In all cases except one, there is at least one subset of features which outperforms the full feature set. This indicates that our benchmark insurance datasets contain irrelevant features that degrade the performance of classification and clustering algorithms.

5.1. Feature Selection Methodologies

Feature selection reduces the number of given features before developing machine learning models. It aims to remove irrelevant, redundant, and/or noisy features, both to lessen the processing time of modelling and to improve the accuracy of the models. Feature selection techniques can be classified into three types: filter, wrapper, and embedded. We opted to use filter-based feature selection methods because they are typically fast, scalable, and applicable to high-dimensional data. In addition, they are widely used in the literature and independent of learning algorithms, so their results do not change according to the learning algorithm. All candidate algorithms except GFSA are sensitive to the number of objects (observations): they suffer from computational time issues when the number of observations is high because their time complexity increases non-linearly as the number of observations increases. As a result, GFSA is the only algorithm which we could apply to the Health Insurance Lead Prediction (D4) dataset. Generally, comparing the four algorithms against all features, we find that the set of features suggested by the Spec algorithm was the best in the majority of cases, especially for clustering-based evaluation measures such as clustering accuracy and NMI.
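To make the filter workflow concrete, the sketch below ranks features by a simple per-feature score and truncates the ranking to the desired percentage, independently of any learning algorithm. Variance is a deliberately naive scoring criterion used here only for illustration; the paper's methods (GFSA, LS, Spec, and USFSM) use more sophisticated criteria, but the filter workflow is the same: score, rank, truncate.

```python
# Filter-style selection sketch: score each feature independently of any
# learner (here: variance), rank, and keep the top fraction pi.
def variance_filter(X, pi):
    """X: list of equal-length numeric rows; pi: fraction of features to keep.
    Returns indices of the top-pi features by (population) variance."""
    n, v = len(X), len(X[0])
    def variance(j):
        col = [row[j] for row in X]
        mu = sum(col) / n
        return sum((x - mu) ** 2 for x in col) / n
    ranked = sorted(range(v), key=variance, reverse=True)
    k = max(1, round(pi * v))
    return ranked[:k]
```

Under this criterion, a constant (zero-variance) feature, like the contract attribute removed from the Insurance Company dataset, always ranks last.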

5.2. Generating Insurance Insights

We mentioned previously that a potential benefit of feature selection for insurance datasets is the revelation of the most important and least important features, as a pre-step to generating business insights. Our experimental results from use of our framework revealed the extent to which redundant and irrelevant features exist in the datasets. By identifying which features are influencing (or not) the downstream accuracy gains, we can use these feature level insights to assist business decision making.
From our experimental results, two examples stand out in particular. Insurance companies tend to have more than one line of business and to cross-sell across these lines. In the Car Insurance dataset, D1, the least important feature is HHInsurance, which indicates whether the target of the call has house insurance with this insurance provider. This immediately indicates that targeting existing customers with house insurance will be of little reward. Balance (yearly average bank balance) and PrevAttempts (number of previous contacts prior to this campaign) are the second and third least important features, respectively. These findings indicate that targeting customers with a high bank balance will be of little reward and that customers do not necessarily tire of being contacted multiple times. The most important features in the D1 dataset are CallStart (start time of the last call), CallEnd (end time of the last call), and Communication (contact communication type). This knowledge enables the company to target particular times of the day to call, choose the communication type that works best, and tailor call scripts to favour long or short calls as appropriate.
In the Medical Insurance Claim Fraud dataset, D2, the most important features are gender, location, and number of claims. Knowledge of these factors will allow investigators to target groups of claims for further examination. Fee charge and cause are the least important features and this indicates that the value of the claim and the cause giving rise to medical treatment are not relevant and that focusing on a particular type of claim or value of claim is not likely to uncover fraud.

6. Conclusion and Future Work

The insurance sector has become an intensively data-driven industry, enabling it to offer services that are more responsive to customer needs. Accurate information and insights are required to support decision making. Insurance datasets usually contain irrelevant, noisy, and/or redundant features. These features can negatively affect machine learning techniques, suggesting their removal prior to applying any data analytics algorithms. We propose a framework for selecting the most influential features before applying predictive or descriptive analytical algorithms. We also demonstrate how the feature selection process itself, apart from its role in improving downstream algorithmic performance, can provide insights for insurers that can lead to practical action. The experiments based on real insurance datasets indicate that the application of machine learning techniques based on a set of selected features suggested by feature selection algorithms outperforms the application without employing feature selection. In some cases, we found that half of the features or more in insurance datasets are redundant and irrelevant.

Author Contributions

Conceptualization, A.T. and S.M.; methodology, A.T. and B.C.; formal analysis, A.T. and B.C.; investigation, A.T. and B.C.; writing—original draft preparation, A.T., B.C. and S.M.; writing—review and editing, B.C. and S.M.; supervision, S.M.; project administration, S.M.; funding acquisition, S.M. All authors have read and agreed to the published version of the manuscript.

Funding

This publication has emanated from research supported in part by a grant from Science Foundation Ireland under grant number 18/CRT/6222.

Acknowledgments

Ayman Taha is funded by the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie Co-funding of regional, national and international programmes (grant agreement No. 713654).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hussain, K.; Prieto, E. Big data in the finance and insurance sectors. In New Horizons for a Data-Driven Economy; Springer: Cham, Switzerland, 2016; pp. 209–223.
  2. Johnson, T.F.; Isaac, N.J.; Paviolo, A.; González-Suárez, M. Handling missing values in trait data. Glob. Ecol. Biogeogr. 2021, 30, 51–62.
  3. Taha, A.; Hadi, A.S. A general approach for automating outliers identification in categorical data. In Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA), Ifrane, Morocco, 27–30 May 2013; pp. 1–8.
  4. Tang, C.; Liu, X.; Li, M.; Wang, P.; Chen, J.; Wang, L.; Li, W. Robust unsupervised feature selection via dual self-representation and manifold regularization. Knowl. Based Syst. 2018, 145, 109–120.
  5. Taha, A.; Hadi, A.S. Pair-wise association measures for categorical and mixed data. Inf. Sci. 2016, 346, 73–89.
  6. Gomes, C.; Jin, Z.; Yang, H. Insurance fraud detection with unsupervised deep learning. J. Risk Insur. 2021, 88, 591–624.
  7. Scriney, M.; Nie, D.; Roantree, M. Predicting customer churn for insurance data. In International Conference on Big Data Analytics and Knowledge Discovery; Springer: Cham, Switzerland, 2020; pp. 256–265.
  8. Hu, R.; Zhu, X.; Zhu, Y.; Gan, J. Robust SVM with adaptive graph learning. World Wide Web 2020, 23, 1945–1968.
  9. Hu, R.; Zhang, L.; Wei, J. Adaptive Laplacian Support Vector Machine for Semi-supervised Learning. Comput. J. 2021, 64, 1005–1015.
  10. Taha, A.; Cosgrave, B.; Rashwan, W.; Mckeever, S. Insurance Reserve Prediction: Opportunities and Challenges. In Proceedings of the International Conference on Computational Science & Computational Intelligence, Krakow, Poland, 16–18 June 2021; pp. 1–6.
  11. Blier-Wong, C.; Cossette, H.; Lamontagne, L.; Marceau, E. Machine Learning in P&C Insurance: A Review for Pricing and Reserving. Risks 2020, 9, 4.
  12. Avanzi, B.; Taylor, G.; Vu, P.A.; Wong, B. Stochastic loss reserving with dependence: A flexible multivariate tweedie approach. Insur. Math. Econ. 2016, 71, 63–78.
  13. Dugas, C.; Bengio, Y.; Chapados, N.; Vincent, P.; Denoncourt, G.; Fournier, C. Statistical Learning Algorithms Applied to Automobile Insurance Ratemaking. Casualty Actuar. Soc. Forum 2003, 1, 179–213.
  14. Haberman, S.; Renshaw, A.E. Generalized linear models and actuarial science. Statistician 1996, 45, 407–436.
  15. Generalized Linear Models for Insurance Data; Cambridge University Press: Cambridge, UK, 2008.
  16. Staudt, Y.; Wagner, J. Comparison of Machine Learning and Traditional Severity-Frequency Regression Models for Car Insurance Pricing; Technical Report, Working Paper; University of Lausanne: Lausanne, Switzerland, 2019.
  17. Denuit, M.; Lang, S. Non-life rate-making with Bayesian GAMs. Insur. Math. Econ. 2004, 35, 627–647.
  18. Klein, N.; Denuit, M.; Lang, S.; Kneib, T. Nonlife ratemaking and risk management with Bayesian generalized additive models for location, scale, and shape. Insur. Math. Econ. 2014, 55, 225–249.
  19. Wuthrich, M.V. From Generalized Linear Models to Neural Networks, and Back. Available at SSRN 3491790. 2019. Available online: https://owars.info/mario/2020_Wuthrich.pdf (accessed on 15 January 2022).
  20. Wüthrich, M.V.; Merz, M. Yes, we CANN! ASTIN Bull. J. IAA 2019, 49, 1–3.
  21. Mack, T. Distribution-free calculation of the standard error of chain ladder reserve estimates. ASTIN Bull. J. IAA 1993, 23, 213–225.
  22. Lopez, O.; Milhaud, X.; Thérond, P.E. Tree-based censored regression with applications in insurance. Electron. J. Stat. 2016, 10, 2685–2716.
  23. Kuo, K. DeepTriangle: A deep learning approach to loss reserving. Risks 2019, 7, 97.
  24. Wüthrich, M.V. Neural networks applied to chain–ladder reserving. Eur. Actuar. J. 2018, 8, 407–436.
  25. Lopes, H.; Barcellos, J.; Kubrusly, J.; Fernandes, C. A non-parametric method for incurred but not reported claim reserve estimation. Int. J. Uncertain. Quantif. 2012, 2, 39–51.
  26. Wüthrich, M.V. Machine learning in individual claims reserving. Scand. Actuar. J. 2018, 2018, 465–480.
  27. Kuo, K. Individual claims forecasting with Bayesian mixture density networks. arXiv 2020, arXiv:2003.02453.
  28. Itri, B.; Mohamed, Y.; Mohammed, Q.; Omar, B. Performance comparative study of machine learning algorithms for automobile insurance fraud detection. In Proceedings of the 2019 Third International Conference on Intelligent Computing in Data Sciences (ICDS), Marrakech, Morocco, 28–30 October 2019; pp. 1–4.
  29. Hassan, A.K.I.; Abraham, A. Modeling insurance fraud detection using imbalanced data classification. In Advances in Nature and Biologically Inspired Computing; Springer: Cham, Switzerland, 2016; pp. 117–127.
  30. Wang, Y.; Xu, W. Leveraging deep learning with LDA-based text analytics to detect automobile insurance fraud. Decis. Support Syst. 2018, 105, 87–95.
  31. Günther, C.C.; Tvete, I.F.; Aas, K.; Sandnes, G.I.; Borgan, Ø. Modelling and predicting customer churn from an insurance company. Scand. Actuar. J. 2014, 2014, 58–71.
  32. Solorio-Fernández, S.; Carrasco-Ochoa, J.A.; Martínez-Trinidad, J.F. A review of unsupervised feature selection methods. Artif. Intell. Rev. 2020, 53, 907–948.
  33. Arai, H.; Maung, C.; Xu, K.; Schweitzer, H. Unsupervised feature selection by heuristic search with provable bounds on suboptimality. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-16), Phoenix, AZ, USA, 12–17 February 2016; pp. 666–672.
  34. Guo, J.; Zhu, W. Dependence guided unsupervised feature selection. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, LA, USA, 2–7 February 2018; pp. 2232–2239.
  35. Li, J.; Cheng, K.; Wang, S.; Morstatter, F.; Trevino, R.P.; Tang, J.; Liu, H. Feature Selection: A Data Perspective. ACM Comput. Surv. 2017, 50, 94:1–94:45.
  36. Farahat, A.K.; Ghodsi, A.; Kamel, M.S. An efficient greedy method for unsupervised feature selection. In Proceedings of the IEEE International Conference on Data Mining (ICDM), Vancouver, BC, Canada, 11–14 December 2011; pp. 161–170.
  37. Wang, S.; Tang, J.; Liu, H. Embedded Unsupervised Feature Selection. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; pp. 1–7.
  38. Ang, J.C.; Mirzal, A.; Haron, H.; Hamed, H.N.A. Supervised, unsupervised, and semi-supervised feature selection: A review on gene selection. IEEE/ACM Trans. Comput. Biol. Bioinform. 2015, 13, 971–989.
  39. Taha, A.; Hadi, A.S.; Cosgrave, B.; Mckeever, S. A Multiple Association-Based Unsupervised Feature Selection Algorithm for Mixed Data Sets. Expert Syst. Appl. 2022, 1–31. [Google Scholar]
  40. He, X.; Cai, D.; Niyogi, P. Laplacian score for Feature Selection. Adv. Neural Inf. Process. Syst. 2005, 18, 507–514. [Google Scholar]
  41. Zhao, Z.; Liu, H. Spectral feature selection for supervised and unsupervised learning. In Proceedings of the 24th International Conference on Machine Learning, New York, NY, USA, 20–24 June 2007; pp. 1151–1157. [Google Scholar]
  42. Solorio-Fernández, S.; Martínez-Trinidad, J.F.; Carrasco-Ochoa, J.A. A new unsupervised spectral feature selection method for mixed data: A filter approach. Pattern Recognit. 2017, 72, 314–326. [Google Scholar] [CrossRef]
  43. Paniri, M.; Dowlatshahi, M.B.; Nezamabadi-Pour, H. MLACO: A multi-label feature selection algorithm based on ant colony optimization. Knowl.-Based Syst. 2020, 192, 105285. [Google Scholar] [CrossRef]
  44. Hashemi, A.; Dowlatshahi, M.B.; Nezamabadi-pour, H. Ensemble of feature selection algorithms: A multi-criteria decision-making approach. Int. J. Mach. Learn. Cybern. 2022, 13, 49–69. [Google Scholar] [CrossRef]
  45. Hashemi, A.; Dowlatshahi, M.B.; Nezamabadi-pour, H. A pareto-based ensemble of feature selection algorithms. Expert Syst. Appl. 2021, 180, 115130. [Google Scholar] [CrossRef]
  46. Raquel, C.R.; Naval Jr, P.C. An effective use of crowding distance in multiobjective particle swarm optimization. In Proceedings of the Annual Conference on Genetic and Evolutionary Computation, Washington, DC, USA, 26 June 2005; pp. 257–264. [Google Scholar]
  47. Frank, A.; Asuncion, A. UCI Machine Learning Repository. Available online: https://archive.ics.uci.edu/ml/index.php (accessed on 15 January 2022).
  48. Kaggle: Your Machine Learning and Data Science Community. Available online: https://www.kaggle.com/ (accessed on 15 January 2022).
  49. Caravan Insurance Challenge-Coil Challenge 2000. Available online: https://www.kaggle.com/uciml/caravan-insurance-challenge (accessed on 15 January 2022).
  50. Huang, Z. Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min. Knowl. Discov. 1998, 2, 283–304.
  51. Lovász, L.; Plummer, M.D. Matching Theory; American Mathematical Society: Providence, RI, USA, 2009; Volume 367.
  52. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley Series in Telecommunications and Signal Processing; Wiley: Hoboken, NJ, USA, 2006.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.