Article
Peer-Review Record

A Novel Framework for Fast Feature Selection Based on Multi-Stage Correlation Measures

Mach. Learn. Knowl. Extr. 2022, 4(1), 131-149; https://doi.org/10.3390/make4010007
by Ivan-Alejandro Garcia-Ramirez 1,*, Arturo Calderon-Mora 1, Andres Mendez-Vazquez 1, Susana Ortega-Cisneros 2 and Ivan Reyes-Amezcua 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 4 November 2021 / Revised: 7 December 2021 / Accepted: 13 December 2021 / Published: 8 February 2022
(This article belongs to the Special Issue Recent Advances in Feature Selection)

Round 1

Reviewer 1 Report

GENERIC COMMENTS

The paper provides an interesting approach to optimisation in feature selection; however, it requires significant changes before it can be accepted. The recommendations below warrant extensive work and hence fall under major revisions, as in effect there will have to be significant rewriting and provision of further evidence. Detailed comments are provided below.

SPECIFIC COMMENTS

The abstract should provide a better outline of the paper (motivation - purpose - method - model - results); in its present form it appears to serve the role of an introductory note. 

Overall as an editorial comment, there is use of occasional italicised text across the board, yet it is unclear what the significance of doing so is. There either needs to be consistent use or none at all. 

Intro 

There is a brief intro where feature extraction and selection are referred to, but only two sources are cited. While it is understood that these are review papers, typically one would expect a greater population of articles there.

It is also important that the authors outline the core contribution of their method. That needs to be clear and may even be introduced in bullet points or a list. (For perspective, note that the first hint to the novelty of this paper is made in line 90, well into the paper; this needs to be surfaced much earlier). 

Background

Lines 49-51 read: “In this work, we focus on the first approach by developing a fast framework for feature selection that identifies relevant features, and remove irrelevant and redundant ones.” A brief outline of why the first approach is considered preferable to the second is warranted here, to explain how and why similarity measure approaches are not suitable or less suitable. Also, there appears to be no linkage between the preamble in section 2 and then 2.1, 2.2, etc., where specific methods are discussed. The authors need to discuss why they chose to address Pearson / MI / MIC over other techniques (and which ones) - in the present form, the section simply presents the metrics without adequately qualifying these.

Lines 69, 107, 112 feature a missing reference 

The method in section 2.4 should be better explained: it is unclear which and how many algorithms are applied, where the sequence that starts from the fastest ends, and ultimately where the computational gain lies. Where / how does the approach reach closure? If this is what the ApproxMaxMI technique refers to, that has to be mentioned.

Overall, as before, section 2.4 goes on to provide in 2.4.1, 2.4.2, etc. a series of optimisation techniques which, while adequately presented, could be better introduced in the opening of 2.4. Again the question is: why are these three techniques discussed (and not others)?
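To make the requested clarification concrete, the multi-stage idea under discussion (a cheap correlation screen first, with costlier measures only for ambiguous cases) can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the thresholds and the histogram-based mutual information estimator are assumptions.

```python
import math
import random
from collections import Counter

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

def mutual_info(xs, ys, bins=10):
    """Histogram-based mutual information estimate, in nats."""
    def binned(vs):
        lo, hi = min(vs), max(vs)
        w = (hi - lo) / bins or 1.0
        return [min(int((v - lo) / w), bins - 1) for v in vs]
    bx, by = binned(xs), binned(ys)
    n = len(xs)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in pxy.items())

def screen_feature(feature, target, r_keep=0.9, r_drop=0.05, mi_keep=0.2):
    """Multi-stage screen: Pearson is cheap, so it runs first; the
    costlier MI estimate runs only when Pearson is inconclusive."""
    r = abs(pearson(feature, target))
    if r >= r_keep:
        return True            # conclusively (linearly) correlated
    if r <= r_drop:
        return False           # conclusively uncorrelated (linearly)
    return mutual_info(feature, target) >= mi_keep  # ambiguous: escalate

random.seed(1)
x = [random.gauss(0, 1) for _ in range(2000)]
target = [2 * v + random.gauss(0, 0.1) for v in x]
print(screen_feature(x, target))  # strong linear signal: kept at the Pearson stage
```

The computational gain the reviewer asks about would come from the early exits: most features never reach the expensive stage.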

Section 3 provides a coherent and consistent approach - a recommendation here is to close it with a summary that outlines the outcomes of the section; this is particularly important as Section 3 presents the novelty, and hence has to be suitably highlighted.

Section 4 discusses the architecture; line 251 refers to “several layers”: how many? This is a precise number and needs to be stated. The section presents the functionality and architecture clearly. An editorial suggestion here is to remove the third-level numbering (e.g. 4.3.1, 4.3.2, etc.) as the discussion of the sub-layers is brief. Consider also transforming this into a table.

Section 5 - Results

Perhaps rename to Results & Evaluation. This is very brief and definitely needs much further elaboration. For instance, you refer to SVM, KNN, and LR as testing methods, which is fine; however: (i) you need to address the specific configurations and parameterisation of the methods you chose; (ii) address how you ensured parity between them; (iii) discuss how and why you selected AUC as a reliable universal metric; (iv) provide a graphical comparison of performance; (v) provide the computational configuration you used for all your experiments.

Conclusions

As in section 5 above, this is very brief and reads more as an epilogue. It needs to be much more developed: refer to specific and actionable conclusions, present the limitations of your approach, expand on its novelty, and discuss future work.

 

  

Author Response

We really appreciate the comments and suggestions provided. 

 

1. “The abstract should provide a better outline of the paper (motivation -
purpose - method - model - results); in its present form it appears to serve
the role of an introductory note.”
Thanks for your comment. This has been addressed in the abstract by adding the ideas you proposed.
2. “Overall as an editorial comment, there is use of occasional italicized text
across the board, yet it is unclear what the significance of doing so is.
There either needs to be consistent use or none at all. ”
Thanks for your comment. The italics have been removed from the paper.
3. “There is a brief intro where reference to feature extraction and selection is referred to, but only two sources are used. While it is understood that these are review papers, typically, one would expect a greater population of articles there.”
Thanks for your comment. Citations have been added in lines 46-69 to address the issue.
4. It is also important that the authors outline the core contribution of their
method. That needs to be clear and may even be introduced in bullet
points or a list. (For perspective, note that the first hint to the novelty of
this paper is made in line 90, well into the paper; this needs to be surfaced
much earlier).
Thanks for your comment; we added the core contributions as a bullet list (lines 48-65).
5. “Lines 49-51 read: ‘In this work, we focus on the first approach by developing a fast framework for feature selection that identifies relevant features, and remove irrelevant and redundant ones.’ A brief outline of why the first approach is considered preferable to the second is warranted here to explain how and why similarity measure approaches are not suitable or less suitable. Also, there appears to be no linkage between the preamble in section 2 and then 2.1, 2.2 etc where specific methods are discussed. The authors need to discuss why they chose to address Pearson / MI / MIC over other techniques (and which ones) - in the present form, the section simply presents the metrics without adequately qualifying these.”

Thanks for your comment; we added extra information conveying all these ideas.
6. “... A brief outline of why the first approach is considered preferable to the second is warranted here to explain how and why similarity measure approaches are not suitable or less suitable. Also, there appears to be no linkage between the preamble in section 2 and then 2.1, 2.2 etc where specific methods are discussed. The authors need to discuss why they chose to address Pearson / MI / MIC over other techniques (and which ones) - in the present form, the section simply presents the metrics without adequately qualifying these.”
Thanks for your comment. For this research we explored only dependency and correlation measures. MIC was suitable due to its ability to detect non-linear correlation and due to its symmetry property, which is useful for detecting groups. Pearson correlation is used in our proposal as a filter preamble, and mutual information is used within MIC.
7. “Lines 69, 107, 112 feature a missing reference ”
Thanks for your comment. We added the references.
8. “Section 3 provides a coherent and consistent approach - a recommendation here is to tail it with a summary that outlines the outcomes from the section; this is particularly important as Section 3 proposes the novelty, and hence has to be suitably highlighted.”
Thanks for your comment; we revised the section to provide a better explanation of our proposal.
9. “Section 4 discusses the architecture; line 251 refers to “several layers”: how many? This is a precise number and needs to be mentioned. The section presents clearly the functionality and architecture. An editorial suggestion here is to remove the 3d level numbering (e.g. 4.3.1; 4.3.2 etc) as the discussion of the sub-layers is brief. Consider also transforming this into a table.”
Thanks for your comment; we specified that the architecture consists of four layers. We also removed the third-level numbering.
10. “Perhaps rename to Results & Evaluation.”
Thanks for your comment; we applied the editorial suggestion.
11. Section 5
(a) “you need to address more specific configurations and parameterisation
of the methods you chose; (ii) address how you ensured parity
between these”
Thanks for your comment; we added additional information in Table 4.
12. Conclusions
(a) As in section 5 above, this is very brief and reads more as an epilogue.
That needs to be much more developed and refer to the actual specific
and actionable conclusions, present the limitations of your approach,
expand on the novelty of your approach, discuss future work.
Thanks for the comment. We have extended the conclusions and outlined future work on a tool called MAFLE, which uses meta-learning and deep learning together with Bayesian causality.

Reviewer 2 Report

I have also attached a pdf of my revision (AE: please see the attached zip), since some equations cannot be written correctly in the review window.

Revision

Manuscript ID: make-1472568

Type of manuscript: Article

Title: A Novel Tool For Fast Feature Selection

The authors propose a new tool to perform feature selection, aiming at both speed and accuracy of the algorithm, by combining different approaches, mainly the Pearson coefficient and indicators from information theory (mutual information, etc.).

The problem that the authors try to solve is of great importance for the scientific community. Feature extraction is a key factor in developing models and understanding the physics behind the data.

However, I have several questions/comments that I would like to see solved/answered before the publication of this paper:

  1. In section 3.1, the authors say that a first screening is the Pearson coefficient. If the Pearson between x and y is large (small) enough, it can be concluded that x is correlated (is not correlated) with y. My first point is that this approach may produce false negatives. In fact, suppose that y = x^2, with x normally distributed. The Pearson coefficient will be zero, but the dependence between the variables is high. How do you take this problem into account?
  2. In the case of positive Pearson, the authors say (section 3.1) that they compare the mutual information with the Pearson. Do you expect these signals to be similar? Even if you use the normalisation of equation (3), I expect that R does not actually span the whole range from 0 to 1. In fact, since MI = Hx + Hy - Hxy, saying that R = MI/(Hx + Hy) = 1 implies that the joint entropy is 0, which is an unlikely situation.
  3. Still in section 3.1, how do you set the ranges for the Pearson thresholds?
  4. The analysis does not take into account correlations with products. Let me give an example. Suppose your target z goes like z = x1 + x2 + f(x3,...,xn), and suppose that x1 and x2 are normally distributed. If you calculate the Pearson and the mutual information, I suppose that you will obtain very small values. You should use conditional mutual information. How do you take this problem into account?
  5. It is not completely clear to me how the grouping part works, described on page 7, lines 166-170. If I understood correctly, the algorithm first ranks the features against the target. Then, it groups the possible features that are partially correlated and, from each group, it chooses only one feature. If this is correct, I have another example that I would like to see analysed. Suppose we have the following signals: x = sigma_x, y = x + sigma_y, z = x + y + f(h_i), where the h_i are other features not correlated with x and y, and each sigma is a normally distributed variable. What I suppose happens is that your algorithm selects x and y as features, puts them in one group (because they are partially redundant and correlated), but one of them is excluded from the final features. Is this correct? If yes, how do you take this problem into account?

In general, I would like to ask the authors to reply to these doubts. In particular, I would like to see in the result section how this new tool and the other algorithms perform in the examples that I have shown before.
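The false-negative concern in point 1 can be reproduced with a small numerical check. This is a sketch for illustration, not the paper's code: the histogram-based MI estimator and the sample size are assumptions.

```python
import math
import random
from collections import Counter

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

def mutual_info(xs, ys, bins=10):
    """Histogram-based mutual information estimate, in nats."""
    def binned(vs):
        lo, hi = min(vs), max(vs)
        w = (hi - lo) / bins or 1.0
        return [min(int((v - lo) / w), bins - 1) for v in vs]
    bx, by = binned(xs), binned(ys)
    n = len(xs)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in pxy.items())

random.seed(0)
x = [random.gauss(0, 1) for _ in range(5000)]
y = [v * v for v in x]   # y is a deterministic function of x

r = pearson(x, y)        # near zero, since E[x^3] = 0 for symmetric x
mi = mutual_info(x, y)   # clearly positive: the dependence is visible
print(f"|r| = {abs(r):.3f}, MI = {mi:.3f} nats")
```

A screening rule that drops features on small |r| alone would discard x here, even though it fully determines y; this is exactly the case the reviewer asks the authors to handle.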

I have also minor comments:

  1. Revise the English; in some parts something is missing. Just a few examples:

In the abstract: “Therefore, the increasing interest to develop frameworks for automatic discovery and removal of useless features.” The verb is missing.

Page 1, line 13: I think there are too many “more”.

  2. The title is too general, in my opinion.

Comments for author File: Comments.zip

Author Response

We really appreciate the correction and suggestions. 

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

I am satisfied that the recommendations were followed and the manuscript is now at the right standard. 

Reviewer 2 Report

I think that the authors significantly improved the paper with respect to the previous version, and I think it may be accepted for publication in "Machine Learning and Knowledge Extraction".

 
