Article
Peer-Review Record

Clustering versus Incremental Learning Multi-Codebook Fuzzy Neural Network for Multi-Modal Data Classification

by Muhammad Anwar Ma’sum *,†, Hadaiq Rolis Sanabila, Petrus Mursanto and Wisnu Jatmiko
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 28 October 2019 / Revised: 3 January 2020 / Accepted: 4 January 2020 / Published: 13 January 2020

Round 1

Reviewer 1 Report

The paper is an extension of the authors' previous work, which focuses on a similar but simpler problem. Overall, the paper seems to have a solid theoretical foundation with abundant experimental results. I think it is publishable after my comments are addressed.

Please proofread the paper carefully; even the very first sentence has an English error. An accuracy of 84.67% does not sound like an impressive result. Please either give more references on why 84.67% is a good result or try to fine-tune your model to improve the accuracy. Some of the figures overflow into the right margin; please fix them in the revised version.

Author Response

Point 1 : Please proofread the paper carefully; even the very first sentence has an English error.

 

Response: I have revised my paper regarding the writing and presentation.

 

Point 2 : An accuracy of 84.67% does not sound like an impressive result. Please either give more references on why 84.67% is a good result or try to fine-tune your model to improve the accuracy.

 

Response: I have explained this issue in the Discussion sub-section. The content is as follows: The performance of the proposed methods is indeed still below 90%. For some applications, such as medical analysis, this may not be acceptable. However, this study aims to show that the proposed method improves on the original method significantly, with an improvement of up to 24%. Furthermore, the popular classifiers also perform below the proposed method on this dataset, which shows that the dataset is difficult to classify. The good news is that the proposed method still has room for improvement. Please read the Discussion sub-section in my revised paper for further explanation.

 

Point 3 : Some of the figures overflow into the right margin; please fix them in the revised version.

 

Response: I have revised the writing of my paper and fixed the overflowing figures and tables. Please check the revised version of my paper for the updated presentation.

Reviewer 2 Report

 

The introduction section discusses multi-modal data as an open problem for classification, but it lacks a formal definition of multi-modal data and of the datasets targeted by the proposed method. Suggest adding a problem statement to clarify the targeted problem: data from multiple sources, or a single source with different distributions.

Please further describe the motivation for using the multi-codebook approach.

Can you add content to explain why alpha=0.05 and the other parameters were selected for the experiments? How do these parameters impact the results? How are the parameters for MLP, SAE, DBN, and ELM selected, and are those optimal parameters?

For approaches that require training and testing data, what is the partition and how many shuffles are performed?

Author Response

Point 1 : The introduction section discusses multi-modal data as an open problem for classification, but it lacks a formal definition of multi-modal data and of the datasets targeted by the proposed method. Suggest adding a problem statement to clarify the targeted problem: data from multiple sources, or a single source with different distributions.

 

Response: I have revised the explanation in the introduction as follows: Somehow, we will face multi-modal data, where the features of the data are distributed in multiple areas. For example, several users in e-commerce have different preferences for the same choice (item). Another example is the prediction of election results. One candidate is voted for by residents aged below 30 (0-30) and in the range (40-60), while the other candidate is voted for by residents aged (25-45) only. We can say that the first candidate (class) has multi-modal data. Gathering data from such conditions results in multi-modal data, that is, data whose distribution has multiple peaks when drawn.

 

Point 2 : Please further describe the motivation for using the multi-codebook approach.

 

Response: I have elaborated on the motivation for using the multi-codebook approach in the introduction as follows: To deal with multi-modal data, the original (single-codebook) neural networks are not able to fit the data properly. The data features spread across multiple areas, while these methods approximate them using only a single reference (codebook). When the features of the classes do not overlap, there is no problem, but if they overlap, misclassification occurs. Take the election case mentioned above: say class A has features in the ranges (0-30) and (40-60), while class B has features in the range (25-45). A single-codebook classifier will effectively treat class A as covering (0-60) and class B as covering (25-45). Since (25-45) lies inside (0-60), classes A and B are highly overlapping, and the classifier will find it difficult to separate them. In the multi-codebook approach, the classifier has two references for class A, one for (0-30) and one for (40-60), so classes A and B overlap much less. A fuller explanation is written in sub-section 4.1.

 

 

Point 3 : Can you add content to explain why alpha=0.05 and the other parameters were selected for the experiments? How do these parameters impact the results? How are the parameters for MLP, SAE, DBN, and ELM selected, and are those optimal parameters?

 

Response: This is explained in the experiment setup sub-section as follows: Our preliminary trial-and-error experiments showed that varying alpha, beta, delta, and gamma does not contribute significantly on a multi-modal dataset. Instead, the number of clusters plays a large role in achieving good accuracy. Therefore, in the experiments, we focus more on the cluster parameters and on the threshold parameter for incremental learning. DBN and SAE are tested using epoch=100 and hidden layer = {50, 75, 100}, and the result is taken from the best setting. We stop at 100 because increasing this value from 25 to 100 does not improve performance significantly; rather, in some cases performance decreases as it increases. ELM is tested using epoch=100. MLP is also used as a comparison; it is implemented in the Weka software using the default settings: alpha=0.3, momentum=0.2, and epoch=500. The explanation is written in the Experiment Setup section.

 

 

Point 4 : For approaches that require training and testing data, what is the partition and how many shuffles are performed?

 

Response: The experiments are conducted using 5-fold cross-validation.
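For reference, a minimal sketch of such a 5-fold cross-validation protocol, assuming scikit-learn and a stand-in classifier (k-nearest neighbours); the names and the classifier are illustrative, not the authors' evaluation code.

    import numpy as np
    from sklearn.model_selection import StratifiedKFold
    from sklearn.neighbors import KNeighborsClassifier   # stand-in, not the proposed method
    from sklearn.metrics import accuracy_score

    def cross_validate(X, y, n_splits=5, seed=0):
        """Average accuracy over stratified 5-fold cross-validation."""
        skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
        scores = []
        for train_idx, test_idx in skf.split(X, y):
            clf = KNeighborsClassifier().fit(X[train_idx], y[train_idx])
            scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
        return float(np.mean(scores))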

Reviewer 3 Report

This article addresses the classification problem of multi-modal data. The authors propose a Learning Vector Quantization (LVQ) method based on a multi-codebook framework. Two multi-codebook generation processes, clustering and incremental learning, are discussed in the article. For evaluation, the proposed approach is tested on both synthetic and benchmark datasets, where it achieves the best results among the compared methods.

 

The proposed approach is intended to tackle the classification problem. However, by reading through the introduction of existing work based on LVQ, I didn't get a clear idea of how an input sample is eventually classified. The authors spend a lot of space introducing the update rules of existing LVQ frameworks and their variants. However, my feeling is that these contents don't add much value to the article; instead, they somewhat overshadow its novel contribution. In addition, it is hard to figure out how those update rules are related to the final classification decision. For example, in Eq. 2, it is hard to understand how W_w is related to the classification process. What parameters are updated during the training process? Some descriptions confuse me. For example, in the paragraph before Eq. 1, the authors write: "each instance was trained iteratively by measuring the distance using formula 1"; does the iterative process update the vector of the input sample or the reference vector? Please clarify the update rule and provide a clearer description there.

 

Some notations are missing context, for example W_w in Eq. 1. How is W_w initialized? In line 7 of Algorithm 1, what does f_c = f_1 − f_m, with i as the class label, mean? Where is i in this equation? Is that a hyphen in the middle or a subtraction sign?

 

Many other places still need polish. I suggest the authors address the issues I have raised here and submit a revised version.

Author Response

Point 1 : The proposed approach is intended to tackle the classification problem. However, by reading through the introduction of existing work based on LVQ, I didn't get a clear idea of how an input sample is eventually classified.

 

Response: I have revised the explanation of LVQ as follows: Given an input vector (x), a class label (Y), and reference vectors (w_ij) of class j for feature i, the input vector is an instance of the training data. At the beginning, the reference vector (w_ij) of each class is initialized by random sampling from the training data. The reference vector (w_ij) is then updated using equations 1 and 2. First, the method computes the distance between the input and the reference vectors of all classes (all values of j) using equation 1. Then the method finds the winner class. The winner class is denoted as W, and the reference vector (codebook) of the winner class is denoted as W_w. The value of W is one of the possible values of j; for example, if we have three classes j = 1, 2, 3, then W can be 1, 2, or 3. The winner class is the class whose reference vector is closest (minimal distance) to the input vector. The reference vector of the winner class (W_w) is then updated using equation 2. This process is carried out for all input vectors (training instances) and repeated for a number of iterations (epochs).
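For readers who prefer code, the following is a minimal sketch of the training loop described above, written in Python with NumPy. It assumes a Euclidean distance for equation 1 and the standard LVQ1 attract/repel rule for equation 2; the single codebook per class, the learning rate, and the function names are illustrative simplifications, not the paper's FNGLVQ implementation.

    import numpy as np

    def train_lvq(X, y, alpha=0.05, epochs=100, seed=0):
        """Sketch of basic LVQ1 with one reference vector (codebook) per class."""
        rng = np.random.default_rng(seed)
        classes = np.unique(y)
        # Initialize each class's reference vector by sampling one training instance of that class.
        codebooks = {c: X[rng.choice(np.flatnonzero(y == c))].astype(float) for c in classes}

        for _ in range(epochs):
            for x, label in zip(X, y):
                # Equation 1 step (assumed Euclidean): distance from x to every class codebook.
                dists = {c: np.linalg.norm(x - w) for c, w in codebooks.items()}
                winner = min(dists, key=dists.get)   # winner class W
                w_w = codebooks[winner]              # its reference vector W_w
                # Equation 2 step (standard LVQ1 rule): pull W_w toward x when the
                # winner class matches the true label, push it away otherwise.
                if winner == label:
                    codebooks[winner] = w_w + alpha * (x - w_w)
                else:
                    codebooks[winner] = w_w - alpha * (x - w_w)
        return codebooks

    def classify(codebooks, x):
        # An input is assigned to the class whose reference vector is nearest.
        return min(codebooks, key=lambda c: np.linalg.norm(x - codebooks[c]))

The classification step at the end mirrors the response above: the winner is simply the class with the closest codebook.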

 

Point 2 : The authors spend a lot of space introducing the update rules of existing LVQ frameworks and their variants. However, my feeling is that these contents don't add much value to the article; instead, they somewhat overshadow its novel contribution. In addition, it is hard to figure out how those update rules are related to the final classification decision. For example, in Eq. 2, it is hard to understand how W_w is related to the classification process. What parameters are updated during the training process?

 

Response: The equation 2 case is explained in the previous point. As for the contribution of the proposed method, I have elaborated the paper by adding a sub-section on the problem, motivation, and idea of the proposed method (sub-section 4.1), as well as a Discussion sub-section on the results of the study to emphasize the contribution. I have also added an explanation of these matters in the introduction section.

 

Point 3 : Some descriptions confuse me. For example, in the paragraph before Eq. 1, the authors write: "each instance was trained iteratively by measuring the distance using formula 1"; does the iterative process update the vector of the input sample or the reference vector? Please clarify the update rule and provide a clearer description there.

 

Response: I have revised the explanation of LVQ as mentioned in Point 1.

 

Point 4 : Some notations are missing context, for example W_w in Eq. 1. How is W_w initialized? In line 7 of Algorithm 1, what does f_c = f_1 − f_m, with i as the class label, mean? Where is i in this equation? Is that a hyphen in the middle or a subtraction sign?

 

Response: It means the feature space of class i, where i is the index of the for loop. However, I have revised the algorithm so that line 7 becomes: f_c = f_1, f_2, ..., f_m of class i.

Round 2

Reviewer 2 Report

Examples are helpful, but the definition of multimodal data can be further improved in the introduction part.

Language style needs to be improved, such as the use of ‘somehow’.

Suggest providing analysis and experimental results to quantify why alpha and beta are not important.

 

 

Author Response

Point 1 : Examples are helpful, but the definition of multimodal data can be further improved in the introduction part.

 

Response: I have added the definition (representation) of multi-modal data in the introduction as follows: According to Baltrusaitis et al., multi-modal data can be described by two representations, as shown in Figure 1 [1]. One is the joint representation, notated as x_m = f(x_1, x_2, ..., x_n), where the joint function f is computed by machine learning methods such as neural networks or a restricted Boltzmann machine. The other is the coordinated representation, notated as f(x_1) ∼ g(x_2), where f and g are mapping functions that map the unimodal distributions into the multi-modal distribution, and the coordination between f and g is notated by the symbol (∼).
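As a toy numerical illustration of the two representations (not the models used in [1] or in the paper), the sketch below builds a joint representation from concatenated inputs and a coordinated representation from two per-modality mappings whose agreement is measured with cosine similarity; all weights, dimensions, and names are arbitrary assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    x1, x2 = rng.normal(size=8), rng.normal(size=6)    # two unimodal inputs

    # Joint representation: x_m = f(x1, x2, ..., xn).
    # Here f is a toy single-layer network applied to the concatenated inputs.
    W_joint = rng.normal(size=(4, x1.size + x2.size))
    x_m = np.tanh(W_joint @ np.concatenate([x1, x2]))

    # Coordinated representation: f(x1) ~ g(x2).
    # Each modality gets its own mapping into a shared space; the coordination (~)
    # is expressed here as the cosine similarity between the mapped vectors.
    W_f, W_g = rng.normal(size=(4, x1.size)), rng.normal(size=(4, x2.size))
    f_x1, g_x2 = np.tanh(W_f @ x1), np.tanh(W_g @ x2)
    coordination = f_x1 @ g_x2 / (np.linalg.norm(f_x1) * np.linalg.norm(g_x2))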

 

Point 2 : Language style needs to be improved, such as the use of ‘somehow’.

Response: I have revised the writing of my paper regarding this issue.

 

 

Point 3 : Suggest providing analysis and experimental results to quantify why alpha and beta are not important.

 

Response: I have revised my paper regarding the explanation of the parameters as follows: A preliminary trial-and-error experiment is conducted to find the best combination of alpha, beta, delta, and gamma for FNGLVQ. The experiment uses the 2Peak-2Class synthetic dataset. Table 2 shows that for epoch=100, the chosen values of alpha, beta, and gamma achieve good accuracy among the tested parameter combinations. Therefore these values are used in the following experiments.

Author Response File: Author Response.pdf

Reviewer 3 Report

All the points from my first review have been addressed. The paper is now much easier to understand, and the authors' replies have clarified my confusion about the first version of the paper. Thus I suggest publishing the paper as it is.

Author Response

Thank you for the comments and advice to improve the quality of my paper.

Author Response File: Author Response.pdf

Round 3

Reviewer 2 Report

Lines 494-495: a figure number is missing. For Figures 14, 15, 16, 17, 19, and 20, I recommend using a darker color for the x and y axes.

Author Response

Point 1: Lines 494-495: a figure number is missing.

Response: I have revised my paper so that the figure number is no longer missing.

 

Point 2: For Figures 14, 15, 16, 17, 19, and 20, recommend using a darker color for the x and y axes.

Response: I have modified the figures to use a darker color for the x and y axes.
