
Open Access Article

Peer-Review Record

Chinese Short-Text Sentiment Prediction: A Study of Progressive Prediction Techniques and Attentional Fine-Tuning

Future Internet 2023, 15(5), 158; https://doi.org/10.3390/fi15050158

by Jinlong Wang, Dong Cui and Qiang Zhang *

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Marco Palomino

Submission received: 7 March 2023 / Revised: 19 April 2023 / Accepted: 21 April 2023 / Published: 23 April 2023

Round 1

Reviewer 1 Report

1) Basically, the article seems complete to me, but in my opinion, the authors focused entirely on the algorithms, omitting the user (human). This is my subjective feeling. That's why I miss in the article a discussion of what the research described here means for users. What is the significance of the scientific contributions made by the authors in this article for Internet users?

2) What are the possibilities of practical use of the research results described here? In the introduction, the authors wrote, and I quote: "The vast volume of social text information available on the Internet has a high commercial value." How can the method described here be used to increase goal conversion (sales effectiveness, reach increase, etc.)?

3) In my opinion, this study is of a scientific (theoretical) nature. Its practical application is limited. The average user is ignorant of algorithms, mathematical formulas, etc. "Don't Make Me Think" is a common catchphrase that relates to how we use the Internet. In my opinion, the algorithm described here has a chance to be used in practice only in the form of a GUI application. Is it possible to launch a "Small-Sample Sentiment Prediction Using Progressive Prediction Techniques and Attentional Fine-Tuning" in the form of a GUI application, e.g. in the "thin client" model? Please comment.

Author Response

First of all, thank you very much for your comments.
We think it is very useful to address the issues you have raised, and we will take every suggestion seriously.

The following is a response to your comment:

We will add to the paper what the implications are for Internet users. We believe the study can help Internet users get their feedback to the decision-making level faster. For businesses, dealing with the huge volume of messages and comments is a headache. With this research, content is quickly categorised with the help of AI. For example, negative comments buried beneath a large number of positive ones often hold valuable information for product improvement. With the help of AI, these can be found quickly and fed back in a timely manner, so that the user's problem can be solved.
As in the previous point, for companies it can help to filter out the more useful information. For example, when researching which features of a new version of a game stand out, the relevant comments can be filtered by AI using the "surprised" tag.
“Is it possible to launch a "Small-Sample Sentiment Prediction Using Progressive Prediction Techniques and Attentional Fine-Tuning" in the form of a GUI application, e.g. in the "thin client" model?” Yes. In practice, not everyone has programming and algorithm design skills. We have therefore designed the program to be called via an API, so that the BBMEJ model can be trained and used on demand through web pages. In this way, the technology can genuinely be applied in production and everyday use.
The resulting web page is shown in the attachment.
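
To make the API-based, thin-client idea concrete, the following is a minimal sketch of what a web-facing prediction endpoint could look like. It is not the authors' implementation: the `/predict` route, the JSON field names, the label strings, and the keyword-based `predict_sentiment` stub are all assumptions made for illustration, and a trained model would take the place of the stub.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict_sentiment(text):
    """Placeholder classifier (a keyword rule), NOT the BBMEJ model."""
    if "好" in text:
        return "positive"
    if "差" in text:
        return "negative"
    return "neutral"


def handle_predict(body):
    """Map a raw JSON request body to (HTTP status, JSON-serialisable reply)."""
    try:
        payload = json.loads(body.decode("utf-8"))
        text = payload["text"]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected a JSON object with a 'text' field"}
    if not isinstance(text, str):
        return 400, {"error": "'text' must be a string"}
    return 200, {"text": text, "label": predict_sentiment(text)}


class PredictHandler(BaseHTTPRequestHandler):
    """Hypothetical thin-client endpoint: the browser posts text, the server replies with a label."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, reply = handle_predict(self.rfile.read(length))
        data = json.dumps(reply, ensure_ascii=False).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Start the sketch server on localhost (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Keeping the request logic in `handle_predict`, separate from the HTTP handler, means the same function can be exercised without starting a server, and the web page only needs to send text and display the returned label.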

Author Response File: Author Response.pdf

Reviewer 2 Report

see file attached

Comments for author File: Comments.pdf

Author Response

Firstly, thank you very much for your comments.

Your suggestion was as follows:

“However there is one piece of information that is lacking in this paper. Where is the line that separates a small amount of data from large and from too small? In other words, could you provide some experimental results for different sizes of data sets - possibly subsets of the same dataset?”

We initially thought that the boundary of a small dataset depended on the company's resources: the data being easy to obtain and the dataset being small enough. Your advice has made us rethink the issue, and we think it is very helpful to us.

We redesigned Experiment 1 and completed it. The original dataset was split into four datasets of different sizes (10^1, 10^2, 10^3, 10^4). We analysed where to draw the line between small and large datasets by observing the change in Acc and F1 values together with the difficulty of labelling the data, and we verified the performance of our model when trained on large datasets. The experiments are now complete, and some of the modifications have been uploaded as an attachment. Please see the attachment for details of the experiments.
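
As an illustration of the kind of size-sweep experiment described above, here is a minimal sketch of drawing subsets of 10^1 to 10^4 samples from one dataset and scoring predictions with accuracy and F1. The details are assumptions, not taken from the paper: the subsets are nested and drawn by a seeded shuffle, F1 is macro-averaged over labels, and the function names are hypothetical.

```python
import random


def nested_subsets(samples, sizes=(10, 100, 1000, 10000), seed=0):
    """Shuffle once, then take prefixes so each smaller subset sits inside the larger ones."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return {n: shuffled[:n] for n in sizes if n <= len(shuffled)}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

In such a setup, a model would be trained on each subset in turn and evaluated on the same held-out test set, so that the change in Acc and F1 across sizes can be compared directly.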

Any other minor suggestions you may have for us will be taken seriously and addressed.

[CLS] used in many places?

Yes. [CLS] is used in BERT classification tasks: the BERT model inserts a [CLS] symbol in front of the text and uses the output vector corresponding to that symbol as a semantic representation of the whole text for text classification. That is why it is used so often in our model.
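
The role of [CLS] can be sketched in a few lines. This is an illustration only: `toy_encoder` is a hypothetical stand-in that returns random vectors rather than a real BERT encoder, splitting the text into single characters is a simplification of the tokenizer, and the trailing [SEP] token follows the usual BERT input layout rather than anything stated above.

```python
import random

CLS, SEP = "[CLS]", "[SEP]"


def add_special_tokens(tokens):
    """Lay out the input BERT-style: [CLS] first, [SEP] last."""
    return [CLS] + list(tokens) + [SEP]


def toy_encoder(tokens, hidden_size=4, seed=0):
    """Stand-in encoder: one vector per input token (random values, NOT BERT)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(hidden_size)] for _ in tokens]


def sentence_vector(tokens):
    """Return the wrapped tokens and the output vector at position 0, the [CLS] position."""
    wrapped = add_special_tokens(tokens)
    outputs = toy_encoder(wrapped)
    return wrapped, outputs[0]


# Example: a short Chinese text split into characters.
wrapped, cls_vector = sentence_vector(list("今天很开心"))
```

A classification head would then take `cls_vector` as its input, which is why the same symbol recurs wherever a whole-text representation is needed.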

Author Response File: Author Response.pdf

Reviewer 3 Report

It was a pleasure reading your manuscript. However, it is important to improve it before publication. Please, consider the following comments. I have divided them into three sections: Title, Content, and English Language and Style.

Title

I recommend that the authors should consider amending the title of the manuscript. I suggest using short-text instead of small-sample, because small-sample may imply a reference to the datasets, rather than a reference to the type of text that is being analysed. So, I suggest a title such as: Chinese Short-Text Sentiment Prediction: A Study of Progressive Prediction Techniques and Attentional Fine-Tuning.

Of course, authors do not have to accept my suggestion, but the words Short-Text better describe the work described in the manuscript.

Content

Sina Weibo

A brief description of Sina Weibo as a Chinese social media platform would improve the quality of the paper significantly. We do not need to have a whole separate section (or several pages dedicated to it). However, a brief description would be appreciated. This can be included as part of the Introduction or Related Work.

I appreciate that the authors are fully familiar with Sina Weibo, but European and North American readers may not know this platform well enough to understand the value of this article. Additionally, a description of Sina Weibo would allow the authors to clarify the terminology. For example, do you refer to the text retrieved from Sina Weibo as posts or publications? Or do you call them Weibo? Lines 342 and 362 seem to indicate that you can call these pieces of text Weibo. However, Lines 344, 359, and 362 seem to indicate that you can call these pieces of text tweets, which I would assume is wrong. Sina Weibo may be similar to Twitter, but I do not think you can call the posts published by Sina Weibo users tweets. All this needs to be clearly stated early in the article.

Polarity

Three sentiment categories are established. The first two are clear: positive and negative. However, the third category is unclear. First, it is called “empty” (Line 95). Then, it is called “null” (Line 215). Both options appear incorrect. You cannot call it empty or null, especially if the first stage of the prediction method has already shown the presence of sentiment (as opposed to the absence of sentiment).

I think the right word is neutral, which is also used by the authors in Line 476.

Anyway, authors need to choose one word and be consistent. If the right word is “neutral”, then they must replace all the occurrences of “empty” (Line 95) and “null” (Line 215) with “neutral”.

Related work

Lines 58-85 may be better placed in the Related Work section. I appreciate that these paragraphs are only meant to introduce CNN and RNN. However, they go a little deep into the details of some of the sources cited. Thus, I recommend that these should be moved to the start of the Related Work section.

Lexicon-based classification

I do not think that Reference 13, cited in Line 115, is a suitable source to support an argument on dictionary-based classification methods. I may be wrong, but I recommend that authors should verify this citation. If necessary, elaborate further on the contents of Reference 13, so that readers can understand unequivocally why the authors are using this source.

In any case, I suggest adding a different citation at the end of the first sentence of Subsection 2.1 (Lines 112-113). An additional citation on lexicon-based classification would be useful.

English language and style

Format

After reading this manuscript, one gets the impression that different authors wrote different sections of the manuscript. While this is an acceptable practice, it is important to make sure that the writing style and format of the text are uniform. You cannot have some of the text presented in a different colour. The text contributed by the last author appears in orange, as if using the Track Changes options in Microsoft Word. This must be corrected.

Figure 7

Figure 7 on Page 12 shows the volume of data for different moods. However, the two datasets involved in the histograms are titled Usual and Virus datasets. This must be corrected. Instead of Usual dataset, the title should be General dataset. General means the opposite of specific.

Also, the title Virus dataset is incorrect. I suggest the title COVID dataset, because the pandemic to which the article refers is precisely COVID-19.

Proofreading

I recommend that authors should proofread the text carefully and thoroughly before publication. There are several lines that could benefit from a different wording. For instance, Lines 27-28 should be rephrased as “Users expect that their complaints about products are answered in a timely manner”.

There are many other lines that could benefit from a rephrasing. For example, Lines 19-20 could be rephrased as “... Sina Weibo super topics in Chinese microblogging may be related to specific events on the Hot Search List (HSL)”.

There are many other lines that could be improved.

Lines 352-354 must be removed. The sentence in these lines is better phrased immediately afterwards (Lines 354-356 are a better description of the same idea).

Every time the authors refer to a numbered figure in their manuscript, the word Figure must be spelled with capital F (even if it is not mentioned at the start of the sentence). For instance, Figure 1 (Line 230), Figure 2 (Line 248), Figure 10 (Line 475), Figure 11 (Line 475), etc.

Conclusions

I recommend that the advantages of the proposed approach mentioned in the Conclusions (starting in Line 490) should be enumerated (or listed) with bullet points to show them clearly.

Lines 500-502 should be replaced with the following text: “Our model has plenty of room for improvement, and we will continue to enhance it by adding convolutional networks and feature fusion to increase the accuracy.”

Author Response

Thank you for your comments; I have amended the manuscript as you requested.

Title

Thanks for the suggestion, it's a really good title.

Content

Sina Weibo

Thanks to your suggestion, a brief description of Sina Weibo has been included in the introduction, and "tweets" in the text has been changed to "Weibo".

Polarity

Thank you very much. I have made the changes.

Related work

Thank you very much. I have made the changes.

Lexicon-based classification

You are absolutely right. Although Part 2 of Reference 13 describes the process of constructing affective words in detail, it may not be a suitable source to support the argument for a dictionary-based classification approach. As suggested, a new citation has been added.

English language and style

Format

By the time you see this paper, it will be a revised version. On the advice of the journal editor, the revised sections need to be marked up. The editor sent me the following request:

“Any revisions made to the manuscript should be marked up using the “Track Changes” function if you are using MS Word/LaTeX, such that changes can be easily viewed by the editors and reviewers.”

In response to your suggestions, I have marked the changes in red. Since these marks may affect your review, I will send the assistant editor an unmarked version after submission, which the assistant editor will forward to you if possible.

Figure 7

Thank you very much. I have made the changes.

Proofreading

Thank you very much. I have made the changes.

Conclusions

Thank you very much. I have made the changes.

Round 2

Reviewer 1 Report

Corrections have been made and are appropriate.

2) What are the possibilities of practical use of the research results described here? In the introduction, the authors wrote, and I quote: "The vast volume of social text information available on the Internet has a high commercial value." How can the method described here be used to increase goal conversion (sales effectiveness, reach increase, etc.)?

Corrections have been made and are appropriate.

3) In my opinion, this study is of a scientific (theoretical) nature. Its practical application is limited. The average user is ignorant of algorithms, mathematical formulas, etc. "Don't Make Me Think" is a common catchphrase that relates to how we use the Internet. In my opinion, the algorithm described here has a chance to be used in practice only in the form of a GUI application. Is it possible to launch a "Small-Sample Sentiment Prediction Using Progressive Prediction Techniques and Attentional Fine-Tuning" in the form of a GUI application, e.g. in the "thin client" model? Please comment.

Corrections have been made and are appropriate.

I have no more comments. Thank you for your cooperation.

Author Response

Thank you very much!

Reviewer 3 Report

It was a pleasure reading your manuscript again. Thank you for considering my previous suggestions and comments. Please, consider the following notes.

· Line 19: I recommend that the authors should avoid mentioning Sina Weibo in Line 19, because Sina Weibo is not introduced until the following paragraph (starting in Line 28). Thus, I suggest changing the sentence starting at the end of Line 18 as follows:

There is a great deal of textual material kept on the Internet, for instance, in online shopping platforms, such as Taobao and Jingdong, where people can express comments and opinions.

· Line 372: I recommend that the authors should change the sentence in Line 372. Please, replace the existing sentence with the following:

The first part is the general microblogging dataset. Such a dataset was obtained randomly from the microblogging content, without identifying specific topics, covering a wide range of subjects.

· Line 376: I suggest changing the way the authors refer to the second part of the experimental dataset. Do not call it “epidemic Weibo dataset”. I recommend that the authors should refer to the second part of the dataset as the “COVID-19 Weibo dataset”. You may also call it “COVID-19 Outbreak dataset”.

The name “epidemic Weibo dataset” is not very good, because the word epidemic is too generic (it could refer to any epidemic in history). I think it is important to specify that it refers to COVID-19.

Of course, changing the name of the second part of the dataset means that you should use the new name in the whole article, not only in Line 376.

Proofreading

I recommend that authors should proofread the text carefully and thoroughly one more time before publication.

Author Response

Thank you for your suggestion; I have revised the manuscript according to your request. Furthermore, the structure of the article has been checked.
