Fake News Classification Based on Content Level Features
Round 1
Reviewer 1 Report
The article presents a problem of fake news classification with use of several ML methods in combination with natural language processing. The study is undoubtedly relevant. The article is written in good language and well-structured. The literature review is complete enough.
There are a few comments:
Some terms are given in different ways, for example, “tokenize” and “Tokenize”
Line 46 - “…are shown in Section 4. the discussion is also…”
There is a period in the middle of the sentence.
Line 48 - Section 2 relates more to the methods used than to recent work, so its title is somewhat confusing
Line 49 – Subsection 2.1 is named “Fakes News Definition”, but this section does not contain any definitions.
Line 82 - “reinforcing learning”
Obviously, “reinforcement learning” is meant.
Line 91 - P(d1, d2, d3|h)
There is no description of the variables in this formula.
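For reference, the formula in question appears to be a Naïve Bayes-style likelihood; a conventional reading, assuming the d_i denote observed features (e.g. words or content attributes) and h a class hypothesis (real vs. fake), would be:

```latex
% Hypothetical reconstruction: d_1, d_2, d_3 are observed features,
% h is the class hypothesis (real or fake news).
P(d_1, d_2, d_3 \mid h) = \prod_{i=1}^{3} P(d_i \mid h)
```

where the product form rests on the conditional-independence assumption that defines Naïve Bayes; the variables should be defined in the manuscript in whichever sense the authors actually intend.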
Line 112 “…deep neural netwo with globrks…” – typo
Line 314 “From the graph, neural network models can e seen to outperform ML models…” – typo
The numbering of subsections in section 4.2 is out of order.
References must be numbered in order of appearance in the text.
Please, format the references list in accordance with the journal requirements
In general, the article can be accepted for publication with minor corrections
Author Response
Dear editors and reviewers,
Thank you for giving the constructive comments to enhance the manuscript entitled “Fake News Classification Based on Content Level Features” (Paper# applsci-1543601). It is our pleasure to have the manuscript evaluated a second time for publication. The authors have considered all reviewers’ suggestions, and the revised manuscript provides higher readability and completeness. The authors would like to provide the revised manuscript and the comment replies; all modifications in the manuscript are marked in yellow. We hope that the current version qualifies for publication in MDPI Applied Sciences.
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
9th January, 2022
Review of Manuscript ID: applsci-1543601
Article: Fake News Classification Based on Content Level Features
The article addresses the problem of distinguishing between real and fake news. The objective is to build an AI system using NLP and ML that performs the given task as well as possible.
Previously this task has been approached with linguistic methods, analyzing the vocabulary and structure of the language to compare real and fake news materials. Another, earlier-established approach is to use modern computational AI methods. In this article both approaches are used.
AI, and particularly NLP, has the ability to handle passive and active forms of natural language generation. The computer can also develop phrases with content, and likewise the content of phrases can be recognized in a text by the computer. The authors state: “ML definitively came from early AI community, and the algorithmic methods over the years included the study of the decision and inductive logic: clustering, reinforcing learning, and Bayesian methods.” This raises a question on which the authors should shed more light. It is clear that binomial logistic regression represents a method that belongs to traditional ML training, but it is not clear that Naïve Bayes and SVM also belong to the same category. The authors should clarify their categorization.
In the category of neural networks there are clearly Neural Networks with Global Maxpool, CNN, and LSTM. But the difference between this category and the traditional ML category is not clear, e.g. with respect to SVM. Please write down the arguments.
The review of the related studies the authors present is good. The abbreviations DT, SGD, RF, XGB, and LR should be spelled out where they first appear. The authors refer to Hakak et al. [36]: “in this article a combination of decision tree, random forest and extra tree classifier mentioned accuracy in this field 99.8%.” This result might deserve a comment. Why was this result so good? Or what is the authors’ comment / conclusion?
The paper also refers to some advanced data-treatment methods in the data preprocessing stage, but these do not give remarkable results.
The good and interesting description of previous studies should lead to conclusions about the methods of the past. What is the conclusion about the previous authors’ methods? Why do the present authors believe they can do better than the previous ones? What is the difference between their research and that of their predecessors?
As regards the structure of the article, the section on exploratory analysis currently follows the experimental analysis made with the developed AI methods. The exploratory section now contains mainly quantitative linguistic analysis, which as such belongs to this study. The authors should, however, reconsider the location of the section within the text. Very often the exploratory part comes before the actual experimental analysis.
We see that libraries such as TensorFlow, NumPy, Pandas, and Matplotlib are in use. In the section on training methods (3.4) the authors are obviously using the GoogleNews-vectors-negative300.bin resource. Clarification is needed on where exactly this Google resource is used. The reader also wants to know some facts about this resource and its availability to other researchers. The experiences with it should be reported.
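For context, such pretrained word2vec embeddings are typically loaded with gensim’s `KeyedVectors.load_word2vec_format` and then combined (e.g. averaged) into document vectors. The following is only a toy sketch of that averaging step with a random stand-in vocabulary; the names and data are illustrative and not taken from the authors’ code:

```python
import numpy as np

# Toy stand-in for the pretrained 300-dimensional GoogleNews embeddings
# (in practice these would be loaded from GoogleNews-vectors-negative300.bin,
# e.g. via gensim's KeyedVectors.load_word2vec_format with binary=True).
rng = np.random.default_rng(0)
embeddings = {w: rng.normal(size=300) for w in ["fake", "news", "real"]}

def doc_vector(tokens, emb):
    """Average the embeddings of known tokens; zeros if none are known."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(300)

v = doc_vector(["fake", "news", "unknownword"], embeddings)
print(v.shape)  # each document maps to a single 300-dim vector
```

Out-of-vocabulary tokens are simply skipped here, which is one common convention; the manuscript should state which convention the authors actually use.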
Figures 8 and 9: The article uses the term “standard deviation”. I believe the authors mean “standard error” or RMSE. Please check.
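On the distinction this comment raises: the sample standard deviation measures spread in the data itself, while the standard error of the mean divides it by the square root of the sample size. A minimal illustration with made-up numbers:

```python
import math
import statistics

data = [12.1, 11.8, 12.4, 12.0, 11.9]   # illustrative sample only
sd = statistics.stdev(data)              # sample standard deviation
se = sd / math.sqrt(len(data))           # standard error of the mean

print(f"sd = {sd:.3f}, se = {se:.3f}")   # the SE is always the smaller one
```

Error bars built from the SE shrink as more samples are collected, whereas SD-based bars do not, which is why the choice should be stated explicitly in the figure captions.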
Figure 10: Is the average number of words in an article really as small as 10.2 (in real news articles) or 14.62 (in fake news articles)? Probably these numbers refer to words in headlines? Please clarify.
The numerical analysis with traditional statistics and ML gives the results. They are fairly well balanced between training and testing accuracy. The biggest difference is in Random Forest, 99.6% vs. 92.8%, which might indicate slight overfitting. A similar dispersion appears in the loss curves of the CNN with Global Maxpool model and the CNN Deep Network model: the test loss is slightly higher than the training loss.
Conclusion: The conclusion chapter is quite short and allows only a limited contribution. The reader easily concludes that this is just another study applying AI to the problem. The authors should compare their results with the earlier ones and try to explain the reasons for any differences.
Comments for author File: Comments.pdf
Author Response
Dear editors and reviewers,
Thank you for giving the constructive comments to enhance the manuscript entitled “Fake News Classification Based on Content Level Features” (Paper# applsci-1543601). It is our pleasure to have the manuscript evaluated a second time for publication. The authors have considered all reviewers’ suggestions, and the revised manuscript provides higher readability and completeness. The authors would like to provide the revised manuscript and the comment replies; all modifications in the manuscript are marked in yellow. We hope that the current version qualifies for publication in MDPI Applied Sciences.
Please see the attachment.
Author Response File: Author Response.pdf