Article
Peer-Review Record

Global and Local Information Adjustment for Semantic Similarity Evaluation

by Tak-Sung Heo, Jong-Dae Kim, Chan-Young Park and Yu-Seop Kim *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Appl. Sci. 2021, 11(5), 2161; https://doi.org/10.3390/app11052161
Submission received: 8 December 2020 / Revised: 25 February 2021 / Accepted: 25 February 2021 / Published: 1 March 2021
(This article belongs to the Special Issue Selected Papers from IMETI 2020)

Round 1

Reviewer 1 Report

I suggest including a definition of equations 5 to 8. Although the authors explain the meaning and contributions of the variables used in each equation, it is not clear what each equation represents.

Author Response

Comment 1: I suggest including a definition of equations 5 to 8. Although the authors explain the meaning and contributions of the variables used in each equation, it is not clear what each equation represents.

Answer: The descriptions of equations (5)–(8) have been changed. The revised sentences are shown in lines 154–162, as follows.

“In equation (5), h_i is the hidden state of the current timestep, and h_j is the hidden state of any timestep, including the current one. W_i and W_j refer to the learning weights of the corresponding timesteps i and j, and b is the bias vector. In equation (6), v is a learning weight that calculates the importance of each word with respect to the current word, and e_ij is a scalar value representing the importance of h_j with respect to h_i. The importance of the words is then normalized to a probability value by equation (7). Finally, c_i, which contains context information, is extracted by equation (8): by multiplying the hidden state of each word by its importance probability and summing all the results, the final importance vector of a given word is obtained. C is an n × 2d matrix, where n is the length of the sentence and d is the number of units of the Bi-LSTM.”
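The inline equation symbols in the passage above were lost in extraction; the names h_i, h_j, W_i, W_j, b, v, e_ij, c_i, n, and d are reconstructed from the surrounding description. Under that reading, a standard additive self-attention formulation consistent with the description (our reconstruction, not copied from the paper; we write the normalized weight as a_ij to avoid confusion with the global/local trade-off parameter α discussed by Reviewer 2) is, in LaTeX:

    u_{ij} = \tanh(W_i h_i + W_j h_j + b)                 % (5)
    e_{ij} = v^{\top} u_{ij}                              % (6)
    a_{ij} = \exp(e_{ij}) / \sum_{k=1}^{n} \exp(e_{ik})   % (7)
    c_i    = \sum_{j=1}^{n} a_{ij} h_j                    % (8)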

Reviewer 2 Report

The Authors investigate the semantic similarity of sentence pairs using a combination of known machine-learning methods of modern linguistics. The paper has a good literature review and a moderately high scientific level. It presents a parameterized learning model in which both local and global features of a sentence are retrieved through state-of-the-art AI tools. The idea of including local features and making explicit use of a word's context within a sentence seems plausible, as it promises to enrich previously exploited global information for better nuancing of sentence meanings. As a result, an improvement in accuracy over a number of previously developed models has been demonstrated. The experiment section and the ensuing discussion are solid; particularly valuable is the comparison of various existing models with the proposed one, as well as between the English and Korean sentence datasets, potentially giving rise to interesting comparative studies of various ethnic languages.

Several remarks and minor objections come to mind. Firstly, known tools are integrated in an innovative way, but no new tools are devised. Secondly, the achieved improvement shown in Table 4, albeit distinct, is rather incremental against the best-known models (within a single percentage point), hence no real breakthrough can be claimed; 10% or 20% of instances of undiscerned meaning sounds like a lot, and what impact can that have on real-world applications? Thirdly, the adjustment of the relative significance of the local and global features through the parameter α is still arbitrary, and it is not clear to what extent any recommendations as to its setting carry over to other datasets; the Authors are close to making this point themselves. Is there any argument in favor of the universality of the proposed modeling? Finally, the presentation suffers from a lack of clarity at times, often providing bare equations and referring to other works without enlightening comments. Apparently, the target audience is limited to a circle of experts working on the same problem. Motivational examples including elements of the proposed modeling (e.g., elaborating on the sentence pairs in Table 1) would help. The phrasing in the body text is often vague or bizarre and should be corrected; samples are: "which shows better than past studies", "one or more semantic information exists", "sentence pairs with relatively free of word order", "created through the learned the siamese network" (Siamese should be capitalized), "maximizing/minimizing the vector", "sentences with very short sentences" and many others. I recommend a revision before the paper is presented to larger audiences.

Author Response

Comment 1: Known tools are integrated in an innovative way, but no new tools are devised.

Answer: As you point out, this study unfortunately does not devise new tools. This is partly because we rushed to solve a given problem with deep NLP technology, which is developing at a very fast pace. Based on your comment, we will pursue a more in-depth study on developing new tools. Thank you.

 

Comment 2: The achieved improvement shown in Table 4, albeit distinct, is rather incremental against the best-known models (within a single percentage point), hence no real breakthrough can be claimed; 10% or 20% of instances of undiscerned meaning sounds like a lot, and what impact can that have on real-world applications?

Answer: We fully agree that a performance improvement within a single percentage point may not have a noticeable effect in practical applications. However, it can clearly be seen that the model using both global and local information outperforms the models of previous studies that use only global information. Therefore, compared to conventional models, our model can produce more accurate results in real applications (e.g., information retrieval and plagiarism detection), even if the improvement is modest.

 

Comment 3: The adjustment of the relative significance of the local and global features through the parameter α is still arbitrary, and it is not clear to what extent any recommendations as to its setting carry over to other datasets; the Authors are close to making this point themselves. Is there any argument in favor of the universality of the proposed modeling?

Answer: As you mention, it is correct that the value of α is currently determined empirically for each dataset. We agree that this requires further thought, so we have designated a study of this problem as future work; the details are given in Section 5, Conclusions.
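For concreteness, the adjustment under discussion can be read as a convex combination of the two feature sources. The form below is a schematic illustration only, not necessarily the paper's exact equation:

    f = \alpha \, f_{\mathrm{global}} + (1 - \alpha) \, f_{\mathrm{local}}, \qquad 0 \le \alpha \le 1

where f_global and f_local denote the global (Bi-LSTM with self-attention) and local (capsule network) feature contributions, and α is currently chosen per dataset.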

 

Comment 4: Finally, the presentation suffers from a lack of clarity at times, often providing bare equations and referring to other works without enlightening comments. Apparently, the target audience is limited to a circle of experts working on the same problem. Motivational examples including elements of the proposed modeling (e.g., elaborating on the sentence pairs in Table 1) would help.

Answer: We modified and supplemented some of the descriptions of the equations and corrected the citation positions to avoid bare citations. In addition, we added a motivational example to the Introduction (lines 55–65). This example is reworked from Table 1.

 

Comment 5: The phrasing in the body text is often vague or bizarre and should be corrected; samples are: "which shows better than past studies", "one or more semantic information exists", "sentence pairs with relatively free of word order", "created through the learned the siamese network" (Siamese should be capitalized), "maximizing/minimizing the vector", "sentences with very short sentences" and many others. I recommend a revision before the paper is presented to larger audiences.

Answer: Following your comments, we reviewed the whole article to make it clearer. Please understand that it is difficult to list all of the revised passages here.

Reviewer 3 Report

The paper deals with a semantic similarity evaluation technique. It is a very interesting topic and, given the ubiquitous use of electronic documents, also a very timely one.

The introduction and the literature review are joined together in a single section called Introduction. Because of the selected topic, it would be more beneficial to include a separate section devoted to the literature review, whilst the Introduction section would serve only as a general entrance to the discussed topic itself.

The Materials and Methods section is quite comprehensive. Subsections 2.2 to 2.5 describe the whole process in detail, which helps the reader understand the analytical steps in the background.

The Experiments section does not seem sufficiently explained. For instance, I presume that Word2Vec is your main analytical tool. Is that true? If so, this should be emphasised more here; at the moment it is a little lost in the text. Table 4 demonstrates the comparison of accuracy for both cases. What criterion serves as the baseline for ranking the mentioned models? Why are they ordered as they are?

The Discussion section needs to be revised. It contains no references to other studies, yet this section should compare the outcomes of the study with those of other studies exploring the same or a similar field. The summary of your outcomes is well built, but it is not enough for this section.

The text should be proofread. It also contains some formal shortcomings, for instance a break in the continuity of the text between the headings of subsections 3.1 and 3.1.1. Text is missing there; at the very least it should give information about the contents of subsection 3.1.

Author Response

First of all, thank you for the careful review and the professional comments. Our answers are given below, under each of your comments.

 

  • The paper deals with a semantic similarity evaluation technique. It is a very interesting topic and, given the ubiquitous use of electronic documents, also a very timely one.
    • Thank you for your understanding of our research.
  • The introduction and the literature review are joined together in a single section called Introduction. Because of the selected topic, it would be more beneficial to include a separate section devoted to the literature review, whilst the Introduction section would serve only as a general entrance to the discussed topic itself.
    • We have separated the Introduction section into ‘1. Introduction’ and ‘2. Related Works.’ The ‘Introduction’ section explains the background, motivation, problem, and structure of this research. The ‘Related Works’ section describes previous works that inspired this research. The reorganized sections can be found between lines 23 and 107.
  • The Materials and Methods section is quite comprehensive. Subsections 2.2 to 2.5 describe the whole process in detail, which helps the reader understand the analytical steps in the background.
    • We are glad that this section helped you understand our proposed process.
  • The Experiments section does not seem sufficiently explained. For instance, I presume that Word2Vec is your main analytical tool. Is that true? If so, this should be emphasised more here; at the moment it is a little lost in the text. Table 4 demonstrates the comparison of accuracy for both cases. What criterion serves as the baseline for ranking the mentioned models? Why are they ordered as they are?
    • Unfortunately, Word2Vec is not the primary analytical tool in this research; it is just one of several tools used to represent the words that make up a sentence as dense vectors. We revised the introductory part of the Experiments section (lines 209–212) to correct this misunderstanding. (A brief illustrative sketch of this embedding step follows this list.)
    • We have reorganized Table 4 more reasonably. Nos. 1 to 3 in Table 4 are previous works and their accuracies. Nos. 4 and 5 are models that use only global features (an RNN-series model) and only local features (a CNN-series model), respectively. Nos. 6 and 7 are models that verify the usefulness of dynamic routing (a sketch of the routing procedure also follows this list). Nos. 8 to 11 show the accuracies of various models that combine RNN, self-attention, and capsule networks, which form the basis of our model.
  • The Discussion section needs to be revised. It contains no references to other studies, yet this section should compare the outcomes of the study with those of other studies exploring the same or a similar field. The summary of your outcomes is well built, but it is not enough for this section.
    • We added a performance comparison with previous studies at the beginning of the Discussion section. The additional sentences can be found in lines 278–284.
    • We also modified the Conclusions section. In particular, new future work is proposed at the end of the section; it is described between lines 307 and 312.
  • The text should be proofread. It also contains some formal shortcomings, for instance a break in the continuity of the text between the headings of subsections 3.1 and 3.1.1. Text is missing there; at the very least it should give information about the contents of subsection 3.1.
    • We have proofread and modified this paper.
    • We have added a description of the corpus for word embedding between subsections 4.1 and 4.1.1. The corpora used in this research are the corpus used for word embedding and the corpus used for training similarity estimation. The former is described in lines 209–212, and the latter in the subsections that follow.
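As a brief illustration of the embedding step referred to in the Word2Vec answer above, the sketch below shows how words can be represented as dense vectors with gensim's Word2Vec. The toy corpus and hyperparameters are hypothetical, not those used in the paper.

    # Minimal sketch: training dense word vectors with gensim's Word2Vec.
    # The toy corpus and hyperparameters are assumptions, not the paper's setup.
    from gensim.models import Word2Vec

    corpus = [["the", "cat", "sat", "on", "the", "mat"],
              ["a", "dog", "barked", "at", "the", "cat"]]
    model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1)
    vec = model.wv["cat"]  # 100-dimensional dense vector for "cat"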
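Similarly, for readers unfamiliar with the dynamic routing verified by Nos. 6 and 7 of Table 4, the following is a minimal NumPy sketch of the generic routing-by-agreement procedure used in capsule networks (the standard algorithm, not the paper's exact implementation):

    import numpy as np

    def squash(s, axis=-1, eps=1e-8):
        # Squashing nonlinearity: keeps vector direction, maps norm into [0, 1).
        norm2 = np.sum(s ** 2, axis=axis, keepdims=True)
        return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

    def dynamic_routing(u_hat, iterations=3):
        # u_hat: prediction vectors, shape (num_in, num_out, dim_out).
        b = np.zeros(u_hat.shape[:2])  # routing logits
        for _ in range(iterations):
            c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coefficients
            s = (c[..., None] * u_hat).sum(axis=0)  # weighted sum per output capsule
            v = squash(s)  # output capsules, shape (num_out, dim_out)
            b = b + (u_hat * v[None]).sum(axis=-1)  # agreement update
        return v

    v = dynamic_routing(np.random.randn(6, 3, 8))  # -> shape (3, 8)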

Round 2

Reviewer 1 Report

Requirements were applied

Author Response

Since the reviewer stated 'Requirements were applied', there is nothing to write here.    

Reviewer 2 Report

I have noticed a substantial improvement in the presentation and motivation of the presented research. Although the Authors' responses to some of my previous comments seemed to agree with my objections without effectively addressing them, other responses and the changes subsequently made in the body text have, in my opinion, resulted in an acceptable, if a little too specialized, submission. Therefore I withdraw my former objections and recommend publication.

Author Response

Since the reviewer stated 'Therefore I withdraw my former objections and recommend publication.', there is nothing to write here.

Reviewer 3 Report

Thank you for the detailed enumeration of the changes. Almost all the points can be considered sufficiently addressed. However, the Discussion section should be even richer.

Author Response

Comments and Suggestions for Authors:

Thank you for the detailed enumeration of the changes. Almost all the points can be considered sufficiently addressed. However, the Discussion section should be even richer.

 

  • First of all, thank you very much for acknowledging that we have made sufficient revisions. We have revised Section 5, Discussion, to further reflect your comments. We made the following three modifications:

 

  • The last sentence of the first paragraph has been changed to “Comparing No. 1 and No. 2, and No. 3 and No. 4, it can be seen that simply and mechanically combining the global features and the local features does not help improve performance. Rather, it only decreases the accuracy.” (lines 283–285)
  • We added the sentence “Unlike images, in which the relative positions of pixel chunks are fixed to some extent, the positions of words or phrases in a sentence are relatively more free,” at the end of the second paragraph. (lines 291–293)
  • We enriched the last paragraph. The result of the revision is shown in lines 294–304.