Peer-Review Record

Feature Fusion Text Classification Model Combining CNN and BiGRU with Multi-Attention Mechanism

Future Internet 2019, 11(11), 237; https://doi.org/10.3390/fi11110237

by Jingren Zhang¹, Fang’ai Liu¹,*, Weizhi Xu¹ and Hui Yu²

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 3 September 2019 / Revised: 4 November 2019 / Accepted: 7 November 2019 / Published: 12 November 2019

(This article belongs to the Special Issue New Perspectives on Semantic Web Technologies and Applications)

Round 1

Reviewer 1 Report

1) In the first paragraph, please justify your statements. See, for example, the following articles:

Ntalianis, Klimis, et al. "Social relevance feedback based on multimedia content power." IEEE Transactions on Computational Social Systems 5.1 (2017): 109-117.

Doulamis, Nikolaos D., et al. "Event detection in twitter microblogging." IEEE transactions on cybernetics 46.12 (2015): 2810-2824.

2) The contribution paragraph of this paper should be enriched.

3) In the previous-work section, I would like to see a subsection on data fusion and deep learning. See, for example, the following papers:

Jing, Luyang, et al. "An adaptive multi-sensor data fusion method based on deep convolutional neural networks for fault diagnosis of planetary gearbox." Sensors 17.2 (2017): 414.

Liu, Li, and Ling Shao. "Learning discriminative representations from RGB-D video data." Twenty-Third International Joint Conference on Artificial Intelligence. 2013.

Audebert, Nicolas, Bertrand Le Saux, and Sébastien Lefèvre. "Semantic segmentation of earth observation data using multimodal and multi-scale deep networks." Asian conference on computer vision. Springer, Cham, 2016.

Bakalos, Nikolaos, et al. "Protecting Water Infrastructure From Cyber and Physical Threats: Using Multimodal Data Fusion and Adaptive Deep Learning to Monitor Critical Systems." IEEE Signal Processing Magazine 36.2 (2019): 36-48.

4) Instead of PCA, the authors could apply d-PCA, the new method presented by Prof. Giannakis at IEEE ICASSP 2018. Please comment.

5) Please identify why fusion does not succeed for all datasets.

6) How do different parameters affect the performance? Please give some figures.

7) How does the computational complexity affect the performance?

Minor comment

Please put a space in "..sets.Among.."

Author Response

Response 1: Following your suggestion, I read Ntalianis, Klimis, et al. "Social relevance feedback based on multimedia content power." IEEE Transactions on Computational Social Systems 5.1 (2017): 109-117. We revised the logical structure of our introduction along the lines of that paper, added an introduction to aspect-level sentiment analysis (ABSA), and, following the approach of the recommended paper, added a brief overview of the following four parts after the contributions.

Response 2: Following your suggestion, we have enriched the statement of our contributions, clarified the advantage of simplifying the multi-conflict convolutional neural network structure (contribution 4), and improved the wording of the contributions paragraph.

Response 3: Following your suggestion, we have added an introduction to research on data fusion and deep learning, and cited the articles you listed in the references.

Response 4: Thank you very much for the d-PCA reference, which is important for our future work. I looked through the d-PCA material, but because of the time limit for revising the paper (10 days), we could not fully evaluate the performance of d-PCA in NLP, and the experimental results would still need to be verified. The experimental results obtained with PCA are fully in line with expectations, and in the experimental section we have added the influence of the eigenvalue contribution rate on classification accuracy, which supports the validity of PCA.

Response 5: Thank you for your suggestion. In the experimental section we have analyzed why the model performs worse on the Restaurant dataset. There are two main reasons:
1) The data volume of the Restaurant dataset is significantly higher than that of the other two datasets in our experiment, so the features learned from the training set are limited. The internal parameters of the model need to be improved, which is the focus of our continued research.
2) Although the three-gate structure of BiLSTM is cumbersome, parts of its internal structure cannot be fully replaced by BiGRU, especially the forget gate and the parameter optimization method.

Response 6: Following your comments, we have added two experiments to the experimental section to analyze the parameters:
1: The effect of the PCA feature contribution rate parameter on the classification accuracy of the model (Fig. 12).
2: To verify the validity of the part-of-speech attention mechanism, we compare the classification accuracy for different part-of-speech vector dimensions on the Restaurant dataset (Fig. 13).
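
The "contribution rate" of PCA eigenvalues referred to in Responses 4 and 6 is commonly read as the cumulative share of total variance explained by the leading components. The following is a minimal sketch under that assumption; the function name, the sample eigenvalues, and the 0.85 threshold are illustrative and are not taken from the paper.

```python
def components_for_contribution(eigenvalues, threshold):
    """Return the smallest k such that the k largest eigenvalues
    account for at least `threshold` of the total variance."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)


# Hypothetical eigenvalues of a feature covariance matrix.
example_eigenvalues = [4.0, 3.0, 2.0, 1.0]
kept = components_for_contribution(example_eigenvalues, 0.85)
```

Sweeping the threshold and retraining the classifier on the retained components is one way to produce an accuracy-versus-contribution-rate curve of the kind the response describes.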

Response 7: We analyzed the training time of the different models in the histogram in the experimental section and verified the effectiveness of PCA dimensionality reduction. However, regarding the computational complexity analysis you proposed, please forgive us: given the limited revision time, we are not yet able to provide it. The experimental results serve as support.

Reviewer 2 Report

This paper proposes a sentiment classification framework fusing CNNs combined with LSTM. The main technical limitations of this work are the following.

First, I cannot understand the use of a CNN model in sentiment analysis. In general, CNNs provide better performance when 2D data, such as image/video cues, are considered, mainly because convolutions operate better on 2D data. LSTM networks, by contrast, provide better performance when 1D data are considered. Therefore, why do the authors use a CNN, and why do they combine it with an LSTM structure? This is a critical question. Personally, I think that the role of the CNN is not so important in this type of research. The authors use PCA to reduce the dimensionality of the data, but this is actually the role of deep learning: to provide a non-linear dimensionality reduction framework. So, again, the purpose of PCA is unclear. Several papers deal with social media analysis. One important aspect of this type of processing is the text features used to represent the text content. Usually, information-oriented metrics such as TF-IDF are considered [1]. The authors of [2] extend this metric to the context of Twitter microblogging. The authors of this paper instead use simple text features, which are not appropriate for modelling the dynamic characteristics of Twitter. Finally, the experimental framework fails to model the dynamic nature of Twitter, for example by handling the uncertainty in the time at which a text message appears.

Refs

[1] A. Aizawa, “The feature quantity: An information theoretic perspective of Tfidf-like measures,” in Proc. ACM SIGIR, Athens, Greece, 2000, pp. 104–111.

[2] Doulamis, N. D., Doulamis, A. D., Kokkinos, P., & Varvarigos, E. M. (2015). Event detection in twitter microblogging. IEEE transactions on cybernetics, 46(12), 2810-2824.

Author Response

We have extensively revised and polished the language of the paper, made its logic clearer, and revised the abstract and inconsistent wording. In addition, we have made the following improvements in the experimental section:

1: The effect of the PCA feature contribution rate parameter on the classification accuracy of the model.
2: To verify the validity of the part-of-speech attention mechanism, we compare the classification accuracy for different part-of-speech vector dimensions on the Restaurant dataset.
3: A comparison of the current mainstream IAN network with our proposed model.

Response 1 (on the application of CNN in the field of textual sentiment): As you said, CNN was first used in the image field. In NLP, although its accuracy in sentence classification is not as good as that of LSTM, in aspect-level sentiment analysis (ABSA, text analysis of target keywords) the experimental results of CNN are significantly better than those of LSTM and other models. For details, you can refer to:

[1] He, R.; Lee, W.S.; Ng, H.T. An Unsupervised Neural Attention Model for Aspect Extraction. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; pp. 388–397.

[2] Yin W, Schütze H, Xiang B, et al. ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs[J]. Computer Science, 2015.

Response 2 (why combine this network with the LSTM structure): The difference between LSTM and convolutional neural networks is that convolutional neural networks pay more attention to local fuzzy perception, while LSTM focuses on the reconstruction of neighboring information, so LSTM can be used to capture the associations between contexts. The proposed MATT-CNN+BiGRU model is based on this principle: it exploits the respective advantages of the two models, using MATT-CNN to obtain the first dimension (the sentiment of the keyword) and BiGRU to obtain the second dimension (the sentence). The information from the two dimensions is fused to improve the classification accuracy of the sentence, which is our original motivation for fusing the two models.
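
The fusion idea in Response 2 can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the authors' MATT-CNN+BiGRU implementation: the attention mechanisms and PCA step are omitted, the GRU has a single hidden unit with no bias terms, fusion is assumed to be plain concatenation, and every name and weight below is hypothetical. Each kernel is assumed to be no wider than the sentence.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conv_max_pool(embeddings, kernel):
    """Slide a kernel over windows of consecutive word vectors and keep
    the largest response (max-over-time pooling). `kernel` holds one
    weight vector per position in the window."""
    width = len(kernel)
    scores = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        scores.append(sum(dot(w, x) for w, x in zip(kernel, window)))
    return max(scores)


def gru_last_state(embeddings, p):
    """Run a single-unit GRU over the sequence and return its final state."""
    h = 0.0
    for x in embeddings:
        z = sigmoid(dot(p["wz"], x) + p["uz"] * h)           # update gate
        r = sigmoid(dot(p["wr"], x) + p["ur"] * h)           # reset gate
        cand = math.tanh(dot(p["wh"], x) + p["uh"] * r * h)  # candidate state
        h = (1.0 - z) * h + z * cand                         # one common convention
    return h


def fused_features(embeddings, kernels, fwd, bwd):
    """Concatenate local (convolutional) and contextual (bidirectional GRU) features."""
    local = [conv_max_pool(embeddings, k) for k in kernels]
    context = [gru_last_state(embeddings, fwd),
               gru_last_state(embeddings[::-1], bwd)]
    return local + context


# Hypothetical 2-dimensional word vectors for a 4-word sentence.
sentence = [[0.2, 0.1], [0.9, -0.3], [-0.4, 0.8], [0.5, 0.5]]
kernels = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [-0.5, 0.5]]]
fwd = {"wz": [0.1, 0.2], "uz": 0.3, "wr": [0.2, -0.1], "ur": 0.1,
       "wh": [0.5, 0.4], "uh": 0.6}
bwd = {"wz": [-0.2, 0.3], "uz": 0.1, "wr": [0.4, 0.1], "ur": -0.2,
       "wh": [0.3, -0.5], "uh": 0.2}
features = fused_features(sentence, kernels, fwd, bwd)
```

The convolutional branch responds to short local windows, while the two GRU passes summarize the whole sentence from each direction; the concatenated vector would then be passed to a classifier.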

Response 3 (about PCA and datasets): At present, many researchers in the field of NLP do use PCA for dimensionality reduction. For example, Prof. Giannakis proposed d-PCA at IEEE ICASSP 2018 and achieved certain results in the field of text classification.

Since we use part-of-speech and positional attention to mark the text, the role played by the term frequency (TF) and inverse document frequency (IDF) components of TF-IDF has been replaced, so the TF-IDF metric is not considered.
The datasets used in our experiments were not taken from Twitter. Much of the data in SemEval2016 and MRD (Cornell's Critics Dataset) has already been labeled, with noise and NaN values removed, which is convenient for our experiments.
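
For readers unfamiliar with the TF-IDF weighting raised by Reviewer 2 and set aside in Response 3, the following minimal sketch shows one common variant (relative term frequency multiplied by the logarithm of the inverse document frequency). Other smoothing and normalization choices exist; the function name and the toy documents are illustrative only.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights for a list of tokenized documents.

    Returns one {term: weight} dictionary per document, using
    tf = count / document length and idf = log(N / document frequency).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy tokenized reviews (hypothetical).
reviews = [["good", "food"], ["bad", "food"]]
review_weights = tf_idf(reviews)
```

A term that appears in every document, such as "food" here, receives weight zero under this variant, which is the sense in which TF-IDF emphasizes terms that distinguish one document from the rest.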

Reviewer 3 Report

This paper proposes a feature fusion model based on multi-attention Convolutional Neural Networks (CNN) and Bidirectional Gated Recurrent Unit (BiGRU) networks for text classification research. The major concerns relate to the novelty and the authors’ contribution over existing methods, which are not significant. There are also several critical issues that the authors should address precisely.

1- The paper needs major proofreading. In addition, punctuation should be checked again. All abbreviations must be explained once and in order. The keywords should be more specific and in alphabetical order. The font in the figures and texts should be the same.

2- The paper does not comply with the writing standards of the journal. The authors should respect the journal’s standard and prepare their papers by writing the paper based on the journal’s template.

3- The literature review is inadequate and inconsistent. The organization of this section is not good enough. Moreover, the authors are advised to use a narrative style based on recent research works to complete the literature review section, not like jumping from one reference in 2015 to the other reference in 2014 and then another reference in 2018. It means that there should be flow in conducting the literature review so that anyone can understand what the motivation is and to which point the work is reaching.

4- Although conclusions and future works should be separated, there is no need to talk about future works at the moment.

5- The problem formulation is not complete. Also, there are some missing parameters in the problem formulation. Parameters are used but not defined. Therefore, the authors are advised to check the formulas line-by-line and define the used parameters.

6- The abstract is badly written. Worse than that is the conclusions section which does not support the abstract. These two sections should be rewritten carefully.

7- An important consideration that is not even mentioned in this paper is the noise in the dataset. Reviewer’s concern regarding how to deal with noisy and missing data (NaN values) should be well-addressed. Kindly refer to the following manuscripts to consider several technical aspects of machine learning in this study which have been explained, completely.

[1] Mohammadi, F.; Zheng, C.; Su, R. “Fault Diagnosis in Smart Grid Based on Data-Driven Computational Methods,” 5th International Conference on Applied Research in Electrical, Mechanical, and Mechatronics Engineering, Jan. 2019.

[2] Mohammadi, F.; Zheng, C. “A Precise SVM Classification Model for Predictions with Missing Data,” 4th National Conference on Applied Research in Electrical, Mechanical Computer and IT Engineering, pp. 3594-3606, Oct. 2018.

[3] Mohammadi, F.; Nazri, G.-A.; Saif, M. “A Fast Fault Detection and Identification Approach in Power Distribution Systems,” 5th International Conference on Power Generation Systems and Renewable Energy Technologies (PGSRET 2019), Aug. 2019.

8- More elaboration on the labeling process is required.

9- There are no major flaws in this article. The authors have jumped from one item to the other item without a clear aim. The paper is not well-organized.

10- The authors have not provided a fair comparison with other recent research works. Reviewer does not see any superiority of this work over the existing methods in terms of the overall accuracy, etc.

Author Response

Response 1: As you requested, we have substantially revised the content of the paper. The keywords were sorted and explained, the abbreviations were standardized, and the fonts in the figures and text were made consistent.

Response 2: As you requested, we have made many changes to the paper to meet the template requirements.

Response 3: Following your suggestion, we have overhauled the review section, and the introduction now follows the order in which deep learning models evolved. The first part introduces CNN, RNN, LSTM, and capsule models; the second part introduces the development of attention mechanisms in neural networks (in chronological order); the third part combines the attention mechanism with the development of specific-target sentiment analysis. The related work also follows the general timeline, which avoids the back-and-forth switching between years that you pointed out. The revised paper is better organized, and we hope you can see the work we have put in.

Response 4: As you requested, we have rewritten the paper's conclusions and deleted the description of future work.

Response 5: We re-examined the parameters and the problem formulation in the text and defined the previously undefined parameters. Thank you for reading the article so carefully.

Response 6: We rewrote the abstract and conclusions sections to make them more compact and consistent with each other.

Response 7: Following your suggestion, we read the articles listed above and cited them in the relevant places. With large amounts of data, one generally uses regression-based methods, Bayesian formal methods based on inference tools, or decision-tree induction to determine NaN values. Because the experimental data used in this paper are relatively mature and the amount of data is not very large, we used the relatively laborious approach of filling in missing values manually.
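
Manual filling of missing values, as described in Response 7, presupposes a way to locate the affected records. The sketch below is an assumption about how such a check might look and is not the authors' procedure: it flags rows containing None, NaN, or blank text so they can be reviewed by hand. The row layout (text, label) and the sample rows are hypothetical.

```python
import math


def rows_with_missing(rows):
    """Return the indices of rows that contain None, NaN, or blank text."""
    def is_missing(value):
        if value is None:
            return True
        if isinstance(value, float) and math.isnan(value):
            return True
        return isinstance(value, str) and value.strip() == ""

    return [i for i, row in enumerate(rows) if any(is_missing(v) for v in row)]


# Hypothetical (text, label) records.
records = [
    ("The food was great", "positive"),
    ("", "negative"),
    ("Slow service", float("nan")),
    ("Fine", None),
]
to_review = rows_with_missing(records)
```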

Response 8: The data we used in the experimental part came from SemEval2016, and the labels in that dataset have already been pre-processed. In the description of the experiments, we therefore gave only a brief introduction to the label processing.

Response 9: Following your valuable comments, we have made detailed and comprehensive changes to the article and systematically improved the chapter organization issues that you emphasized.

Response 10: Following your comment, we have added the IAN network to the experimental comparison with our proposed model. On the two datasets, and in terms of training time, our proposed model shows a certain degree of improvement over it. But as you said, the improvement is not very pronounced, and this is where we need to work hard in the future.
In addition, we have made the following improvements in the experimental section:
1: The effect of the PCA feature contribution rate parameter on the classification accuracy of the model.
2: To verify the validity of the part-of-speech attention mechanism, we compare the classification accuracy for different part-of-speech vector dimensions on the Restaurant dataset.

Round 2

Reviewer 1 Report

My comments have been addressed.

However, the response

"Response 1: Following your suggestion, I read Ntalianis, Klimis, et al. "Social relevance feedback based on multimedia content power." IEEE Transactions on Computational Social Systems 5.1 (2017): 109-117. We revised the logical structure of our introduction along the lines of that paper, added an introduction to aspect-level sentiment analysis (ABSA), and, following the approach of the recommended paper, added a brief overview of the following four parts after the contributions."

is OK, but why have the authors not cited the article of Ntalianis, Klimis, et al.?

Author Response

Sorry, this was our mistake. We have now included the paper "Ntalianis, Klimis, et al. "Social relevance feedback based on multimedia content power." IEEE Transactions on Computational Social Systems 5.1 (2017): 109-117." as reference 7; please review.
Thank you again for your recognition of our work.

Reviewer 3 Report

The majority of my concerns are addressed. However, some improvements in grammar and punctuation are required. The authors are asked to read the paper line-by-line and check the grammar and punctuation, precisely.

Author Response

Thank you for your valuable comments; we have proofread the paper again. Because of the time constraint (submission within one day), there may be individual errors that have not been corrected. If you have any questions, you can always contact me. Thank you again for your approval.
