Peer-Review Record

Using Feature Selection with Machine Learning for Generation of Insurance Insights

Appl. Sci. 2022, 12(6), 3209; https://doi.org/10.3390/app12063209

by Ayman Taha^1,2, Bernard Cosgrave^3 and Susan Mckeever^1,*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 31 January 2022 / Revised: 11 March 2022 / Accepted: 11 March 2022 / Published: 21 March 2022

(This article belongs to the Topic Machine and Deep Learning)

Round 1

Reviewer 1 Report

The manuscript "Using Feature Selection with Machine Learning for Generation of Insurance Insights" by Ayman Taha, Bernard Cosgrave and Susan Mckeever was submitted to the MDPI journal Applied Sciences. It contains 14 pages, 5 tables and 41 references.

It is a comprehensive and timely study of state-of-the-art methods for insurance data analysis, and it presents novel results on applying feature selection to remove noisy features before running machine learning techniques. The manuscript is well organized and well written. I have three comments to improve the manuscript.

1. I recommend reorganizing the structure of the manuscript. It looks as though a paper originally conceived as a Review has been turned into an Article. For example, it is usual to discuss all previous work in the Introduction section, the details of methods and experimental data in the Methods and Materials section, and all novel results in the Results section. This general structure can help readers.

2. The authors should check that all abbreviations are properly introduced: SVM, ML, etc.

3. Could the conclusion that the SPEC algorithm achieves the best clustering-based performance measures on insurance datasets be verified against previous work?

This manuscript can be published in Applied Sciences after the minor changes listed above.

Author Response

Comment 1: I recommend reorganizing the structure of the manuscript. It looks as though a paper originally conceived as a Review has been turned into an Article. For example, it is usual to discuss all previous work in the Introduction section, the details of methods and experimental data in the Methods and Materials section, and all novel results in the Results section. This general structure can help readers.

Response: Thank you. We have reorganized the structure of the manuscript. We updated the Methods and Materials section to include the details of the experimental data and results. Furthermore, previous work is now covered in the Related Work section.

Comment 2: The authors should check that all abbreviations are properly introduced: SVM, ML, etc.

Response: Thank you. We have reviewed and fixed all the abbreviations.

Comment 3: Could the conclusion that the SPEC algorithm achieves the best clustering-based performance measures on insurance datasets be verified against previous work?

Response: We agree that the SPEC algorithm had the best results on these datasets. However, this finding is not generalizable across the insurance domain, as the performance of feature selection algorithms relative to each other will fluctuate with variations within the datasets. Such variations include differences in data types, data distribution, number of instances and number of features, amongst others. We would also like to draw the reviewer’s attention to the fact that the conclusion of this article highlights that insurance datasets usually contain irrelevant, noisy and/or redundant features. These features can negatively affect machine learning techniques. Therefore, we suggest removing noisy features prior to applying any data analytics algorithms. Furthermore, we propose a framework for selecting the most influential features before applying predictive or descriptive analytical algorithms. We also demonstrate how the feature selection process itself, apart from its role in improving downstream algorithmic performance, can provide insights for insurers that can lead to practical action.
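
As an illustration of the kind of filter-style ranking discussed in this response, the following minimal Python sketch scores each feature by its absolute Pearson correlation with the target and sorts the features by that score. It is not the SPEC algorithm or any of the four methods used in the paper. The scoring function, the toy data and all names (`pearson`, `rank_features`, `driver_age`, `policy_id_digit`, `constant_flag`) are assumptions made for this example only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no usable signal
    return cov / math.sqrt(var_x * var_y)


def rank_features(rows, target, names):
    """Return (names sorted by |correlation with target|, score per name)."""
    scores = {}
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores[name] = abs(pearson(column, target))
    ranking = sorted(names, key=lambda name: scores[name], reverse=True)
    return ranking, scores


# Hypothetical toy data, not taken from the paper's datasets.
# Columns: driver_age, policy_id_digit (identifier-like), constant_flag.
rows = [
    (20, 7, 1), (25, 3, 1), (30, 9, 1), (35, 1, 1),
    (40, 8, 1), (45, 2, 1), (50, 6, 1), (55, 4, 1),
]
target = [1, 1, 1, 1, 0, 0, 0, 0]  # hypothetical claim indicator
names = ["driver_age", "policy_id_digit", "constant_flag"]

ranking, scores = rank_features(rows, target, names)
for name in ranking:
    print(f"{name}: {scores[name]:.3f}")
```

In this toy data the hypothetical `driver_age` column ranks first, while the identifier-like and constant columns score zero and would be candidates for removal before any downstream analysis.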

Reviewer 2 Report

Below are my comments and suggestions.

1. It seems that the authors missed some recent relevant references. References need to be updated. For example:

-MLACO: A multi-label feature selection algorithm based on ant colony optimization

-Ensemble of feature selection algorithms: A multi-criteria decision-making approach

-A Pareto-based ensemble of feature selection algorithms

2. This article has some writing errors. For example:
-line 258 page 7: "Identify selected feature by Algorith ci". "Algorithm" is correct.
-etc.

3. What is the idea behind the feature selection algorithms used? Why do the authors not use other filter feature selection algorithms such as GA, Binary PSO, or ACO? Please clarify this. It is necessary to implement several popular filter algorithms and compare their results with each other.

4. Why aren't wrapper algorithms used? As you know, filter algorithms are less accurate than wrapper algorithms. It is necessary to implement several well-known wrapper algorithms and compare the results of the filter algorithms with those of the wrapper algorithms.

5. In addition to the learning algorithms used, it is necessary to use artificial neural networks as evaluation metrics.

6. Please add a section that discusses the advantages and disadvantages of the existing approaches.

7. It is essential to make sure that the manuscript reads smoothly; this definitely helps the reader fully appreciate your research findings.

Author Response

Point 1: It seems that the authors missed some recent relevant references. References need to be updated. For example: -MLACO: A multi-label feature selection algorithm based on ant colony optimization -Ensemble of feature selection algorithms: A multi-criteria decision-making approach -A Pareto-based ensemble of feature selection algorithms

Response: Thank you for drawing our attention to these recent articles. We rewrote the Related Work section accordingly and included the suggested articles.

Point 2: This article has some writing errors. For example: -line 258 page 7: "Identify selected feature by Algorithci". "Algorithm" is correct. -etc.

Response: We are sorry for those typos. We have reread the paper carefully and corrected them. The paper also went through a full and thorough grammar check by a native English speaker.

Point 3: What is the idea behind the feature selection algorithms used? Why do the authors not use other filter feature selection algorithms such as GA, Binary PSO, or ACO? Please clarify this. It is necessary to implement several popular filter algorithms and compare their results with each other.

Response: This work demonstrates the role and usage of feature selection methods within the insurance sector. We do not limit ourselves to a single feature selection method, since results from one method might be a one-off. Therefore, we implemented four feature selection (FS) methods for comparison, using a set of published real insurance datasets, to demonstrate how feature selection can improve downstream machine learning tasks and, as a by-product, highlight important insights. The key point in our response is that we are not attempting an exhaustive comparison of feature selection methods or categories of methods for the domain, but rather to address the aims just mentioned.

Point 4: Why aren’t wrapper algorithms used? As you know, filter algorithms are less accurate than wrapper algorithms. It is necessary to implement several well-known wrapper algorithms and compare the results of filter algorithms with wrapper algorithms.

Response: Thank you for this point. Our answer to the previous point is also relevant here. Our overall aim concerns the benefit of feature selection for insurance insights and learning-based tasks. We do not provide, nor aim to provide, comparisons of categories of feature selection methods. There is a body of literature comparing different types of feature selection methods, both qualitatively and quantitatively. We opted to use filter-based feature selection methods because they are typically fast, scalable and applicable to high-dimensional data. In addition, they are widely used in the literature and independent of learning algorithms, such that their results do not change according to the learning algorithm. We do not suggest that wrapper methods or other FS methods are unsuitable for the insurance sector. Comparing filter, wrapper and other feature selection methods is simply not the focus or aim of this paper.

Point 5: In addition to the learning algorithms used, it is necessary to use artificial neural networks as evaluation metrics.

Response: We assume that the reviewer is suggesting the use of Artificial Neural Networks (ANNs) as a classification approach in our evaluation. We chose KNN and SVM as two established classifier approaches, which have also been used in previous works for feature selection comparison (e.g. Fernandez et al. 2017/2020). As regards ANNs, it is an interesting point, as ANNs are generally associated with black-box models, with feature selection (and the discarding of noisy or irrelevant features) embedded in the layers of the network. We are starting to note the use of feature selection as complementary to ANNs in the literature, in that it can provide scope for helping to explain the decisions of ANNs. This is a slightly different take on the use of feature selection. As a general response, we selected two widely used classifiers as metrics, since these classifiers are typically used with transparent feature selection, as is the case here.
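
To make the idea of using a classifier as an evaluation metric for feature selection concrete, here is a minimal Python sketch that computes leave-one-out k-NN accuracy on a chosen subset of columns. It is an illustration only, not the authors' evaluation protocol. The function name `loo_knn_accuracy`, the choice of leave-one-out validation, and the toy data are all assumptions; the data is deliberately constructed so that a large-scale noise column misleads the classifier.

```python
import math
from collections import Counter


def loo_knn_accuracy(rows, labels, feature_idx, k=1):
    """Leave-one-out accuracy of k-NN using only the columns in feature_idx."""
    points = [[row[j] for j in feature_idx] for row in rows]
    correct = 0
    for i, query in enumerate(points):
        # Distance from the held-out point to every other point.
        neighbours = sorted(
            (math.dist(query, other), labels[j])
            for j, other in enumerate(points)
            if j != i
        )
        # Majority vote among the k nearest neighbours.
        votes = Counter(label for _, label in neighbours[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(points)


# Hypothetical toy data: column 0 separates the classes, column 1 is a
# large-scale noise column that pairs each point with an opposite-class point.
rows = [(0.1 * i, 10.0 * i) for i in range(4)]
rows += [(1.0 + 0.1 * i, 10.0 * i + 1.0) for i in range(4)]
labels = [0] * 4 + [1] * 4

acc_all = loo_knn_accuracy(rows, labels, feature_idx=[0, 1])
acc_selected = loo_knn_accuracy(rows, labels, feature_idx=[0])
print(f"all features: {acc_all:.2f}, selected feature only: {acc_selected:.2f}")
```

Because the filter step runs before the classifier and independently of it, the same selected subset could be passed to a different learner (for example an SVM) without re-running the selection.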

Point 6: Please add a section that discusses the advantages and disadvantages of the existing approaches.

Response: We have added a subsection to the Discussion section (5.1 Feature Selection Methodologies), which discusses the advantages and disadvantages of the candidate feature selection methods.

Point 7: It is essential to make sure that the manuscript reads smoothly; this definitely helps the reader fully appreciate your research findings.

Response: We apologize. We have reread the paper carefully, and it went through a full read and grammar check by a native English speaker to make sure that the manuscript reads smoothly.

Round 2

Reviewer 2 Report

The authors have made good improvements to the article. The article is acceptable in its current form.
