Article
Peer-Review Record

Learning Functions and Classes Using Rules

AI 2022, 3(3), 751-763; https://doi.org/10.3390/ai3030044
by Ioannis G. Tsoulos
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3:
Submission received: 24 July 2022 / Revised: 22 August 2022 / Accepted: 1 September 2022 / Published: 5 September 2022
(This article belongs to the Special Issue Feature Papers for AI)

Round 1

Reviewer 1 Report

The manuscript presents an interesting new approach to tackling classification and regression problems. The method is surprisingly general and can therefore be applied to many different problems, which makes this original paper highly relevant and significant. The technique of Grammatical Evolution generates the rule set iteratively. The introduction gives an adequate entry point to the field, and the method is described in enough detail to be easily reproduced. It is great to see that the authors have applied their approach to a very large number of well-known standard test datasets, which provides sound validation of the method. I consider this paper to be of very high value and recommend its publication.

Author Response

Dear reviewer, thank you for your comments.

Reviewer 2 Report


Comments for author File: Comments.pdf

Author Response

1. COMMENT

In the fitness function, the optimal solution is found through selection and mutation, but this evolutionary process is random, and individuals with poor fitness may also become parents or be preserved. How is it ensured that the optimal solution is obtained?

RESPONSE

The following text has been added at the end of subsection 2.2:

“In the proposed genetic algorithm, elitism is used for the best chromosome in the population, which means that the best solution found so far will not be lost between iterations of the genetic algorithm.”
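For readers unfamiliar with elitism, the following is a minimal, self-contained sketch of a generational genetic algorithm with single-chromosome elitism. It is an illustration only, not the paper's implementation; the fitness, crossover, and mutation operators are assumed to be supplied by the caller (lower fitness is better, as with an MSE-style objective).

```python
import random

def evolve(population, fitness, crossover, mutate, generations=100):
    """Generational GA with single-chromosome elitism (illustrative sketch)."""
    for _ in range(generations):
        # Elitism: carry the current best chromosome over unchanged,
        # so random selection and mutation can never lose it.
        elite = min(population, key=fitness)
        offspring = [elite]
        while len(offspring) < len(population):
            p1 = min(random.sample(population, 2), key=fitness)  # tournament
            p2 = min(random.sample(population, 2), key=fitness)
            offspring.append(mutate(crossover(p1, p2)))
        population = offspring
    return min(population, key=fitness)
```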

2. COMMENT

The author should carefully check the results of MLP for the BL and PY datasets in Table 3. Why is the MSE of MLP so much higher than that of RBF and the proposed method?

RESPONSE

Yes, you are correct; this was due to a mistype. The correct value for PY is 0.10 and has been corrected. Thank you for this comment.

3. COMMENT

In the experimental results, the classification error and MSE were measured for GE and compared with those of RBF and MLP in Tables 2 and 3. The classification error and average MSE appear to be lower for GE; however, the RBF has 10 parameters and the number of MLP weights is set to 10. My point is that the proposed method must be compared with the results of RBF and MLP after fine-tuning of their hyperparameters, so as to truly reflect the superiority of the proposed method; otherwise the comparison results are meaningless. Therefore, the author must redo the comparative experiments with fine-tuned parameters for the RBF and MLP networks.

RESPONSE

1) The following explanation was added for the MLP column: “The column MLP stands for the results of a neural network with 10 sigmoidal nodes trained by a genetic algorithm. The genetic algorithm has 500 chromosomes, and the maximum number of allowed iterations was set to 500. At the end of the genetic algorithm, a BFGS variant due to Powell [powell] was used to enhance the obtained result.”

2) The Stochastic Gradient Descent (SGD) method was added to the experimental results. The added text in subsection 3.2 reads: “The column SGD stands for the incorporation of the Stochastic Gradient Descent method [sgd], used to train a neural network with 10 hidden nodes.”

3) The well-known Levenberg-Marquardt local search procedure was also used, and the added text reads: “The column LEVE represents the results from the usage of the well-known Levenberg-Marquardt local search procedure [leve] to train a neural network with 10 hidden nodes.”

4) The following text was added, together with a figure showing the average classification error of the RBF network and the artificial neural network for different numbers of weights (a small illustrative sketch of such a comparison is given after this list). The text reads:

“To justify the use of 10 weights in the RBF network and in the artificial neural network, both were trained with 5, 10, 15 and 20 weights on all classification data, and the result is shown graphically in Figure fig:weight_comparison. The neural network was trained using the BFGS local search procedure. The graph shows the average classification error over all datasets. As can be seen, for both networks 10-15 processing nodes are enough to achieve good results.”

5) Additional experiments on the maximum number of generations (parameter N_G) were conducted, and two new tables have been added to the revised manuscript. The added text reads: “An additional experiment was conducted to determine the effect of the maximum number of generations on the accuracy of the proposed method. The number of generations (parameter N_G) was varied from 500 to 2000, and the results are shown in Table tab:ng_class for the classification datasets and in Table tab:ng_regression for the regression datasets. These experiments demonstrate the dynamics of the proposed method and its accuracy, since it achieves remarkable results even for a small number of generations. They also show the need for intelligent termination rules that can stop the proposed technique in time, without having to exhaust all generations of the genetic algorithm.”
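As a purely illustrative sketch of the weight-count comparison described in item 4 (not the code used in the paper; the dataset, optimizer, and hyperparameters below are assumptions), such an experiment could be scripted with scikit-learn as follows:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the paper's benchmark suite.
X, y = load_wine(return_X_y=True)

for hidden in (5, 10, 15, 20):
    # One sigmoidal hidden layer; the paper trains with a GA followed
    # by BFGS, so sklearn's SGD solver is only a rough analogue here.
    clf = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(hidden,), activation="logistic",
                      solver="sgd", learning_rate_init=0.05,
                      max_iter=2000, random_state=0),
    )
    err = 1.0 - cross_val_score(clf, X, y, cv=10).mean()
    print(f"{hidden:2d} hidden nodes: mean classification error {err:.3f}")
```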



4. COMMENT

The paper is formatted well and free of typographical errors; however, there are numerous grammatical errors, and some sentences are simply wrong. For example, in subsection 3.1, Dataset description, the first sentence, “The was tested on a series of well known datasets from the relevant literature and the results are compared against other machine learning techniques”, is hard to understand.

RESPONSE

1) The sentence was corrected.

2) The freely available software at https://www.grammarcheck.net/ was used to correct the manuscript.



Reviewer 3 Report

This paper proposes an interesting pattern recognition method based on Grammatical Evolution. The author claims the proposed method outperforms other machine learning techniques. Although I am not familiar with Grammatical Evolution, I would like to point out two major shortcomings in this paper.

 

1. The main issue, in my opinion, is that the author only compared the proposed method with RBF and MLP, without providing many details about the training process. First of all, a shallow or not-well-trained network could perform very badly. How many layers are in the MLP? Are regularization methods used in the networks, such as batch normalization, residual connections, weight decay, etc.? The mainstream optimizer is stochastic gradient descent, so why is “a genetic algorithm” used to train the MLP? Is the learning rate well tuned? Secondly, even if the author doesn't want to spend too much time tuning the MLP, why not use a tree-based method, for instance xgboost, which is one of the most widely used machine learning methods? It shouldn't take long on a medium-sized dataset with the open-source xgboost package. In addition, the author used many datasets but didn't really dive deep into any of them. I think it would be more convincing to select a few representative datasets regarding data size, class imbalance, missing-value percentage, numeric/categorical feature ratio, or image/text/tabular type, etc., and then compare the model performance with in-depth analysis. The proposed method may outperform on some datasets with certain characteristics.

2. The author mentioned three primary challenges of existing machine learning methods: overfitting, latency, and interpretability. However, the author doesn't revisit these items in the results section, so it doesn't seem to me that the proposed method mitigates these challenges. In general, a rule-based method seems to have less overfitting, lower latency, and better interpretability than a complex xgboost model or neural network, but the paper doesn't provide any evidence for this. For example, regarding interpretability, from the example in Figure 1 it is certainly not straightforward to me how to interpret the result. An example interpreting the final rules generated by the proposed method on one or two datasets would be very helpful to support this claim. I think interpretability would be the most promising part of the proposed method, because there are other techniques to mitigate the overfitting and latency problems of existing machine learning methods.

 

Author Response

1. COMMENT

First of all, a shallow or not-well-trained network could perform very badly. How many layers are in the MLP? Are regularization methods used in the networks, such as batch normalization, residual connections, weight decay, etc.? The mainstream optimizer is stochastic gradient descent, so why is “a genetic algorithm” used to train the MLP? Is the learning rate well tuned? Secondly, even if the author doesn't want to spend too much time tuning the MLP, why not use a tree-based method, for instance xgboost, which is one of the most widely used machine learning methods? It shouldn't take long on a medium-sized dataset with the open-source xgboost package. In addition, the author used many datasets but didn't really dive deep into any of them. I think it would be more convincing to select a few representative datasets regarding data size, class imbalance, missing-value percentage, numeric/categorical feature ratio, or image/text/tabular type, etc., and then compare the model performance with in-depth analysis. The proposed method may outperform on some datasets with certain characteristics.

RESPONSE

1) The following explanation was added for the MLP column: “The column MLP stands for the results of a neural network with 10 sigmoidal nodes trained by a genetic algorithm. The genetic algorithm has 500 chromosomes, and the maximum number of allowed iterations was set to 500. At the end of the genetic algorithm, a BFGS variant due to Powell [powell] was used to enhance the obtained result.”

2) The Stochastic Gradient Descent (SGD) method was added to the experimental results. The added text in subsection 3.2 reads: “The column SGD stands for the incorporation of the Stochastic Gradient Descent method [sgd], used to train a neural network with 10 hidden nodes.”

3) The well-known Levenberg-Marquardt local search procedure was also used (a minimal SciPy sketch of this procedure is given after this list), and the added text reads: “The column LEVE represents the results from the usage of the well-known Levenberg-Marquardt local search procedure [leve] to train a neural network with 10 hidden nodes.”

4) The following text was added, together with a figure showing the average classification error of the RBF network and the artificial neural network for different numbers of weights. The text reads:

“To justify the use of 10 weights in the RBF network and in the artificial neural network, both were trained with 5, 10, 15 and 20 weights on all classification data, and the result is shown graphically in Figure fig:weight_comparison. The neural network was trained using the BFGS local search procedure. The graph shows the average classification error over all datasets. As can be seen, for both networks 10-15 processing nodes are enough to achieve good results.”

5) Additional experiments on the maximum number of generations (parameter N_G) were conducted, and two new tables have been added to the revised manuscript. The added text reads: “An additional experiment was conducted to determine the effect of the maximum number of generations on the accuracy of the proposed method. The number of generations (parameter N_G) was varied from 500 to 2000, and the results are shown in Table tab:ng_class for the classification datasets and in Table tab:ng_regression for the regression datasets. These experiments demonstrate the dynamics of the proposed method and its accuracy, since it achieves remarkable results even for a small number of generations. They also show the need for intelligent termination rules that can stop the proposed technique in time, without having to exhaust all generations of the genetic algorithm.”
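For the Levenberg-Marquardt option in item 3, the following is a minimal self-contained sketch using SciPy; the toy dataset, the 10-node network, and the flat weight layout are all assumptions made for illustration, not the paper's actual code.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))          # toy regression data
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2

H = 10  # hidden nodes, matching the response text

def unpack(w):
    """Split the flat weight vector into layer parameters."""
    W1 = w[: 2 * H].reshape(H, 2)
    b1 = w[2 * H : 3 * H]
    W2 = w[3 * H : 4 * H]
    return W1, b1, W2

def residuals(w):
    W1, b1, W2 = unpack(w)
    hidden = 1.0 / (1.0 + np.exp(-(X @ W1.T + b1)))  # sigmoidal layer
    return hidden @ W2 - y                            # per-sample residuals

# method="lm" selects MINPACK's Levenberg-Marquardt implementation.
fit = least_squares(residuals, x0=rng.normal(scale=0.5, size=4 * H),
                    method="lm")
print("final MSE:", np.mean(fit.fun ** 2))
```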



2. COMMENT

The author mentioned three primary challenges of existing machine learning methods: overfitting, latency, and interpretability. However, the author doesn't revisit these items in the results section, so it doesn't seem to me that the proposed method mitigates these challenges. In general, a rule-based method seems to have less overfitting, lower latency, and better interpretability than a complex xgboost model or neural network, but the paper doesn't provide any evidence for this. For example, regarding interpretability, from the example in Figure 1 it is certainly not straightforward to me how to interpret the result. An example interpreting the final rules generated by the proposed method on one or two datasets would be very helpful to support this claim. I think interpretability would be the most promising part of the proposed method, because there are other techniques to mitigate the overfitting and latency problems of existing machine learning methods.

RESPONSE

Two figures were added showing examples of programs produced for classification and regression, and the added text in subsection 3.2 reads:

“An example program for the Wine dataset is illustrated in Figure fig:program_example. A similarly constructed program for the Housing regression dataset is shown in Figure fig:example2. The proposed method constructs simple rules for classification or function learning while simultaneously performing feature selection, i.e., keeping those of the problem's initial features that carry greater weight in learning.”
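To give a flavor of what such a rule program can look like, here is a purely hypothetical illustration in the spirit of the description above; the feature indices and thresholds are invented, not taken from the paper's figures.

```python
# Hypothetical GE-style classification rule; indices/thresholds are invented.
def classify(x):
    # Only features x[0], x[6] and x[9] appear, so the rule also performs
    # implicit feature selection on the original attribute set.
    if x[6] > 1.57 * x[0] - 0.23:
        return 1
    if x[9] < 0.88:
        return 3
    return 2
```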

Round 2

Reviewer 2 Report

The author has clearly addressed the four questions.

Reviewer 3 Report

The author has addressed all the questions and comments in my previous review. I recommend publishing the current version of the paper. Thanks.
