Article
Peer-Review Record

Evaluation of the Coherence of Polish Texts Using Neural Network Models

Appl. Sci. 2021, 11(7), 3210; https://doi.org/10.3390/app11073210
by Sergii Telenyk 1,*,†, Sergiy Pogorilyy 2,† and Artem Kramov 2,†
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 16 March 2021 / Revised: 26 March 2021 / Accepted: 31 March 2021 / Published: 2 April 2021
(This article belongs to the Special Issue Rich Linguistic Processing for Multilingual Text Mining)

Round 1

Reviewer 1 Report

My suggestions regarding the article:

line 67. I suggest the use of can instead of cannot, as there is a double negation.

line 74. I suggest: ”…the Polish corpus is still at the initial stage. Despite the…”

line 111 and 212: I suggest using long-distance instead of long-distant

line 132: I suggest using are not instead of aren’t

line 192: I suggest using “…by sentence, he or she can understand…” instead of just he

line 196: Please, check the use of than in the collocation: “If the value of the corresponding weight is non-zero than the edge…” My suggestion is to use: ”…non-zero, then the edge…”

line 238: I suggest using does not instead of doesn’t

line 249: In the collocation: “…will help to recognize…” - I suggest using the bare infinitive: “…will help recognize…”

line 253: In the collocation: “…in order to set the common fixed length of them. …” I suggest using: “…in order to set their common fixed length. …”

line 280: There seems to be extra space before:  “ Tokens [CLS]…”

line 285: I suggest using commas: “…tokens, regardless of word order, can be…”

line 286: I suggest using comma: “…language, where the order…”

line 286: I suggest using is not instead of isn’t

line 318: In the collocation: “…can help to verify…” - I suggest using the bare infinitive: “…can help verify…”

line 334: Please, check the use of that in the collocation: “…the original version is higher that the corresponding value…” My suggestion is to use: “…the original version is higher than the corresponding value…”

line 351: There seems to be extra space before: “ The pre-trained…”

line 353: There seems to be extra space before: “ The dimension…”… “ The lemmatisation…”

line 355: There seems to be extra space before: “ The pre-trained…”

line 358: There seems to be extra space before: “ The binary…”

line 383: I suggest using a comma: “…approaches, too.”

line 405: In the collocation: “…helps to reveal…” - I suggest using the bare infinitive: “…helps reveal…”

 

Also, I would suggest revisiting the References section, in order to keep consistency in editing: for ex. the years are sometimes written in bold, some other times not.

Author Response

line 67. Done

line 74. Done

line 111 and 212: Done

line 132: Done

line 192: Done

line 196: Done

line 238: Done

line 249: Done

line 253: Done

line 280: There is no extra space there

line 285: Done

line 286: Done

line 286: Done

line 318: Done

line 334: Changed

line 351: There is no extra space there

line 353: There is no extra space there

line 355: There is no extra space there

line 358: There is no extra space there

line 383: Done

line 405: Done

 

--- Also, I would suggest revisiting the References section, in order to keep consistency in editing: for ex. the years are sometimes written in bold, some other times not.

--- We have prepared the references using BibTeX with the MDPI LaTeX template. The bold font for the years is generated automatically by the "Bibliography style for MDPI journals" for journal articles. We have checked other papers in "Applied Sciences" and reviewed their "References" sections; the references there are formatted in the same manner.

Reviewer 2 Report

Overall, the manuscript is presented well. I only have 3 minor comments.

 

1) Assess whether reference [1] is necessary in the article. It seems too general for the subject matter.

 

2) (lines 225 to 230). It is indicated that in this work an LSTM model is developed without additional convolution levels. It is recommended to improve the justification of the choice, contrary to what has been done by other authors.

 

3) (lines 237 to 239) Regarding the choice of a window size of 3, it is indicated that increasing this value does not provide improvement. How is this statement justified? On the basis of what is indicated in the paper [21], or have empirical tests been performed? Since it is a different language, I think tests should be done to confirm this.

 

Author Response

1) Assess whether reference [1] is necessary in the article. It seems too general for the subject matter.

After reconsidering the corresponding paragraph, we decided to remove this reference. We agree that it is too general a reference for this topic.

2) (lines 225 to 230). It is indicated that in this work an LSTM model is developed without additional convolution levels. It is recommended to improve the justification of the choice, contrary to what has been done by other authors.

The corresponding explanation of this choice has been added to the paper. It should be mentioned that all the considered models based on LSTM layers and convolution operations were tested on English corpora. As the Polish language belongs to a different language group, it is advisable to verify whether LSTM layers are suitable at all for processing Polish texts when evaluating coherence. The key question is whether the order in which words are processed should be taken into account for the coherence evaluation of Polish corpora. Thus, an LSTM-based coherence estimation model without additional CNN layers is proposed in order to assess the usefulness of analysing sentences word by word.

3) (lines 237 to 239) Regarding the choice of a window size of 3, it is indicated that increasing this value does not provide improvement. How is this statement justified? On the basis of what is indicated in the paper [21], or have empirical tests been performed? Since it is a different language, I think tests should be done to confirm this.

A more detailed description of how the size of the "window" was selected has been added to the corresponding paragraph. The size was chosen according to the experimental verification of the impact of different values on the training and inference of a neural network, as described in paper [20] ([21] before the removal of the first reference). It should be mentioned that this experimental selection was performed on an English corpus. Despite the different sentence structure of English and Polish, the way sentences are grouped around the key idea of a text should remain the same for all readers. The size of the "window" may depend on the style of a text (e.g., the long-distance sentences of a step-by-step manual can be more closely connected than the sentences of fiction that describes nature, the environment, etc.). As both the English and Polish corpora consisted of news reports, the same "window" size (L=3) was chosen.
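To make the notion of a "window" concrete, the grouping step might be sketched as follows. This is only an illustration, assuming that a "window" of size L means L consecutive sentences taken with a stride of one; the function name `sentence_windows` and the placeholder sentences are hypothetical and are not taken from the paper, and the neural network that scores each group is not shown.

```python
from typing import List, Sequence


def sentence_windows(sentences: Sequence[str], size: int = 3) -> List[List[str]]:
    """Return every run of `size` consecutive sentences (a sliding "window").

    Assumption: windows overlap and advance one sentence at a time.
    """
    if size < 1:
        raise ValueError("size must be a positive integer")
    # A text shorter than the window yields no windows at all.
    return [list(sentences[i:i + size]) for i in range(len(sentences) - size + 1)]


# Placeholder sentences standing in for a short news report.
text = ["S1", "S2", "S3", "S4", "S5"]
windows = sentence_windows(text, size=3)
print(windows)  # [['S1', 'S2', 'S3'], ['S2', 'S3', 'S4'], ['S3', 'S4', 'S5']]
```

With L=3, a five-sentence text produces three overlapping groups; a larger L would yield fewer, longer groups, which is the trade-off the choice of "window" size controls.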

Round 2

Reviewer 2 Report

No comments
