Article
Peer-Review Record

Comparison of Different Modeling Techniques for Flemish Twitter Sentiment Analysis

Analytics 2022, 1(2), 117-134; https://doi.org/10.3390/analytics1020009
by Manon Reusens 1,*, Michael Reusens 2, Marc Callens 2, Seppe vanden Broucke 1,3 and Bart Baesens 1,4
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 7 June 2022 / Revised: 3 October 2022 / Accepted: 9 October 2022 / Published: 18 October 2022

Round 1

Reviewer 1 Report

The authors present an interesting piece of research and address a gap in the literature. They contribute to the domain by comparing multiple modelling techniques for sentiment analysis on a newly introduced dataset. Overall, the paper uses a proper methodology and the experiment is nicely set up. Moreover, the authors are transparent in sharing their code on GitHub, which I strongly applaud. Hence, I believe this paper adds value to this journal. Nevertheless, there are several things that could be further improved.

Main remarks:

Positioning of the paper

1/ The authors should rethink the positioning of the paper and use proper terminology. Typically, benchmarking studies are performed on several datasets and explore a wide range of techniques (e.g., Verbeke et al. (2012) in churn, or Lessmann et al. (2015) in credit scoring). Hence, when reading the title I was expecting a more elaborate analysis than what is currently presented. I understand that data collection is difficult and that adding several additional (Flemish-language) datasets is probably outside the scope of the current project. Therefore, it would be fair to readers not to claim a "benchmarking" study, but rather to position the work as a "comparison" of several modelling techniques on one particular case.

Literature review:

2/ It is unclear on what basis literature is included or excluded in Table 1. General conclusions drawn from this table, such as "English occurring most frequently" or "studies that benchmark models across all four distinct categories are rare", are therefore not supported.

Experimental choices

3/ The authors distinguish Flemish from Dutch and explain that there are significant differences between these languages, especially on Twitter. This statement seems questionable and might be further supported by, for example, adding references or by comparing Dutch and Flemish tweets to empirically show that they differ.

4/ The authors remain vague on the tuning of the parameters. More details are needed.

Method

5/ One of the goals of the paper is to investigate non-English corpora. Some of the methods require translating the text to English, which makes translation an important step in the process that deserves more attention.

Results:

6/ The results are presented confusingly. Table 12 shows a nice summary and should be built up better beforehand. The authors used an extensive set of classification algorithms, but it adds relatively little to the core of the paper. They could leverage this extensive set of results by comparing the different vector representations and preprocessing techniques within each machine learning classifier.

 

7/ The authors claim that "VADER undoubtedly performed significantly worse than the other models", yet no significance testing is presented in the manuscript.
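
For example, a paired test of the following kind would substantiate such a claim (a minimal sketch in Python, assuming statsmodels; all data below are placeholders, not the authors' results):

    # Sketch: McNemar's test on the paired predictions of two classifiers
    # (e.g. VADER versus another model) over the same test set.
    import numpy as np
    from statsmodels.stats.contingency_tables import mcnemar

    def compare_classifiers(y_true, preds_a, preds_b):
        a_ok = preds_a == y_true
        b_ok = preds_b == y_true
        # 2x2 table of where the two classifiers are correct/incorrect together
        table = [[np.sum(a_ok & b_ok), np.sum(a_ok & ~b_ok)],
                 [np.sum(~a_ok & b_ok), np.sum(~a_ok & ~b_ok)]]
        return mcnemar(table, exact=True)

    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, 200)                    # placeholder gold labels
    a = np.where(rng.random(200) < 0.8, y, 1 - y)  # placeholder model A
    b = np.where(rng.random(200) < 0.6, y, 1 - y)  # placeholder model B
    print(compare_classifiers(y, a, b).pvalue)     # small p-value = significant gap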

Smaller remarks:

8/ There are some language issues and parts that are unclear. Below is a non-exhaustive list of such issues:

· p. 1, lines 19-20: "More specifically, it is estimated that the total number of social media users increased to 4.2 billion from a mere 0.93 billion in 2010, and the total is expected to increase further in subsequent years."

· Generally, scientific papers use the active rather than the passive voice.

· The write-up should be more concise and scientific.

 

9/ There are some issues in how the paper is structured. For example, the build-up of the contributions is confusing and the arguments are not always well developed.

10/ The authors explicitly mention 2001 as the start of sentiment analysis (p. 1, line 27). What is the reference paper here?

11/ Language-specific preprocessing techniques are not included in the paper. It would be interesting to list Flemish-specific preprocessing techniques, if any exist.

12/ Something went wrong in the reference to the “itranslate” package.

13/ The appearance of Tables 6-12 looks odd.

 

14/ Table 7 has two values in bold for Macro Precision. 

References:

Lessmann, S., Baesens, B., Seow, H., & Thomas, L. C. (2015). Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research, 247, 124–136.

Verbeke, W., Dejaeger, K., Martens, D., Hur, J., & Baesens, B. (2012). New insights into churn prediction in the telecommunication sector: A profit driven data mining approach. European Journal of Operational Research, 218(1), 211–229.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Kindly see the attached file for comments.

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

The manuscript is centered on a very interesting and timely topic, which is also quite relevant to the themes of Analytics. The organization of the paper is good and the proposed method is quite novel. The length of the manuscript is about right. The paper, however, does not link well with the recent literature on sentiment analysis that has appeared in relevant top-tier journals, e.g., the IEEE Intelligent Systems department on "Affective Computing and Sentiment Analysis". Also, the latest trends in multilingual sentiment analysis are missing; see, e.g., Lo et al.'s recent survey on multilingual sentiment analysis (from formal to informal and scarce-resource languages). Finally, check recent resources for multilingual sentiment analysis, e.g., BabelSenticNet.

 

The authors seem to handle sentiment analysis mostly as a binary classification problem (positive versus negative). What about the issue of neutrality or ambivalence? Check the relevant literature on detecting and filtering neutrality in sentiment analysis and recent works on sentiment sensing with ambivalence handling. Finally, the manuscript cites no papers from 2022: check the latest works on emotional recurrent units and on aspect-based sentiment analysis via graph convolutional networks.

The manuscript presents some awkward English constructions, grammatical mistakes, and misuse of articles: a professional language editing service (e.g., those offered by IEEE, Elsevier, or Springer) is strongly recommended in order to bring the paper's presentation quality up to the high standards of Analytics. Finally, double-check both the definition and the usage of acronyms: every acronym, e.g., LSTM and RNN, should be defined only once (at its first occurrence) and used consistently afterwards (except in the abstract and section titles).

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

The article focuses on the study of the data contained in Flemish tweets. Mainly well-known methods and their combinations are used.

The peer-reviewed article is well structured and contains all the items necessary for an article presenting the results obtained by the authors.

As noted above, the paper uses known methods and approaches, but this in no way diminishes the importance of the article.

All results are correct.

And one should remember the good scientist's rule that "a negative result is also a result".

Below we would like to share some thoughts, mostly related to the process of the study: the subject, the information technology, the algorithms, and the experimental methodology.
First of all, we want to dive into the topic of sentiment analysis itself.

Many datasets are provided as-is and are carefully arranged by topic, and authors select the entries containing the topics of greatest interest. A dataset may thus in fact consist of sentences representing edge situations, such as a clear "good review" or "bad review", and in some situations "neutral". The problem lying behind the methodology of studying such datasets, however, is that there are also sentences that can be classified with great confidence as "bad", "good", or "neutral". These situations should always draw attention, since they affect the behaviour of the algorithms and thus may affect the performance on unknown data if such an approach is used.

Speaking about the preparation of the data, we fully agree that separating out the tweets from one country, while keeping the same language base, is a good idea, but there may be some hidden pitfalls. Let us explain. Flemish, a language covering a wide region of Belgium, the Netherlands, and partly Luxembourg, is a language of multinational communication, and its speakers may share the same thoughts through the same language media. We fully agree that speakers from Belgium would have different thoughts about events in the political world, in modern culture (music, art), or in sports. Still, we see a small flaw in the study. Since the authors represent the Belgian culture and nation, there is also a language phenomenon whereby a country has multiple spoken languages; French is in fact a minority language there (in terms of its usage, not of its speakers' nationality). This means that some tweets representing Belgian culture are written in French but are Belgian at heart. We understand that language is not the topic of your study, and we appreciate the unique Belgian culture, especially its fine arts. We think the French tweets should have been translated into Flemish in order to capture those thoughts; the same can be said about the tweets in English.

Speaking about omitting the tweets from the nearest Flemish-speaking country, we think this idea has both advantages and drawbacks. The advantage is that the language field is narrowed to Belgium, so if the study touches on some phenomenon (say, a news item), it is more correct in terms of omitting other phenomena; this helps avoid anomalies in the data where, looking at a feature-space representation, one finds some clusters that represent Belgium-only phenomena and others that represent Netherlands-only phenomena. On the one hand this is a good idea, but here is a flaw: these phenomena and anomalies may in fact make up only a minor part of the dataset and could instead be removed in a stage of visual analysis and anomaly detection. Let us explain: many modern algorithms, such as neural networks, data clustering methods, and dimensionality reduction methods, work better with bigger data. Every neural network contains, in its shallow layers, a hidden representation of the data in a latent feature space. If one constructs a convolutional autoencoder (or, even better, a recurrent autoencoder, if one can be constructed), the hidden representation gives the points (coordinates) in that latent feature space and thus allows one to see anomalies and hidden relationships in the data. Then, in the data engineering stage that follows data preparation, the anomalies could be removed based on evidence from the studied data. We also want to underscore that, in our experience, clusters and anomalies are more "pronounced" in bigger datasets. In fact, if the dataset is split, say, in half, the density of the clusters becomes smaller and separating such data becomes harder than it would be otherwise.
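
A minimal sketch of what we mean (our own illustration, assuming Keras and a plain dense autoencoder rather than the convolutional or recurrent ones mentioned above; the tweets are placeholders):

    # Sketch: a small autoencoder whose 2-D bottleneck gives coordinates in a
    # latent feature space; plotting them exposes clusters and anomalies.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from tensorflow import keras

    tweets = ["voorbeeld tweet een", "voorbeeld tweet twee", "iets heel anders"]
    X = TfidfVectorizer().fit_transform(tweets).toarray()

    inp = keras.Input(shape=(X.shape[1],))
    h = keras.layers.Dense(64, activation="relu")(inp)
    latent = keras.layers.Dense(2, name="latent")(h)   # bottleneck
    h2 = keras.layers.Dense(64, activation="relu")(latent)
    out = keras.layers.Dense(X.shape[1])(h2)           # reconstruction

    autoencoder = keras.Model(inp, out)
    autoencoder.compile(optimizer="adam", loss="mse")
    autoencoder.fit(X, X, epochs=50, batch_size=16, verbose=0)

    # Points far away from the dense clusters are candidate anomalies.
    encoder = keras.Model(inp, autoencoder.get_layer("latent").output)
    print(encoder.predict(X, verbose=0))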

Now, speaking about the language itself and its preparation for use in data classification: stemming and lemmatization are widely used in natural language processing. They are methods that remove unnecessary language noise and focus on a specific phenomenon, the usage of specific terms within a relatively big text, so the benefit for data processing with regular neural networks (without recurrent connections) is clear: fewer words mean fewer features, and fewer features mean less RAM to store the data, and so on.
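
A minimal sketch of this preprocessing (our own illustration, assuming NLTK; its Dutch Snowball stemmer is the closest off-the-shelf choice for Flemish):

    # Sketch: stemming maps inflected surface forms toward shared stems,
    # shrinking the vocabulary and hence the feature count of a bag-of-words model.
    from nltk.stem.snowball import SnowballStemmer

    stemmer = SnowballStemmer("dutch")           # closest available option for Flemish
    tokens = ["fietsen", "fietste", "gefietst"]  # placeholder Flemish tokens
    print({t: stemmer.stem(t) for t in tokens})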

In fact, from our experience we found that stemming and removing stop-words affected our algorithms badly, so we decided to keep the stop-words, since they affected the dimensionality reduction algorithms. For these purposes we used SVD on TF-IDF features (even on big datasets containing several tens of thousands of entries, each probably representing a few hundred sentences). And here we come to the most important part: the language embedding models, the neural networks, and the LSTMs. These constructs find hidden connections between features: say, feature number 113, representing the word "horse", is found very often near feature 291, representing the word "barn". The hidden relationship lies behind the appearance of these words within a sentence, not within a text. In LSTM models, by contrast, the syntactic sugar that appears in a sentence defines the distance between the terms that interest us, and removing it affects not only the features representing the "term frequency" property but also the "word distance" property. Removing it may be acceptable in some synthetic datasets, where each data entry may contain hundreds of sentences.
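
A minimal sketch of the setup we refer to (our own illustration, assuming scikit-learn; the toy corpus echoes the horse/barn example):

    # Sketch: SVD on TF-IDF features, i.e. reducing the sparse term matrix
    # to a dense low-dimensional representation.
    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = ["het paard staat in de stal", "het paard loopt", "de stal is leeg"]
    X = TfidfVectorizer().fit_transform(docs)  # stop-words kept, as argued above

    svd = TruncatedSVD(n_components=2, random_state=0)
    X_reduced = svd.fit_transform(X)
    print(X_reduced.shape, svd.explained_variance_ratio_)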

 

By contrast, stemming or removing syntactic sugar does not, in your case, really affect the accuracy and recall of the model. This happens because the dataset is not big enough (we explained above why this is so important), and because the technique does not affect data entries that in most cases contain only one long sentence or a few simple ones. Under the old Twitter rules, a user could post a tweet of 255 Latin symbols; if the tweet was too long, it could be posted as an image containing the text. For tweets, this internal 255-symbol restriction limits the number of sentences and also the amount of information hidden within. It may thus be useful to construct neural network models that follow this rule (no more than 255 features), and even to use a model pre-trained on another dataset. On the one hand the restriction is a good feature (ease of constructing the neural network architecture, and interoperability). On the other hand it is a flaw, because there is a rule of thumb when studying linear systems (the TF-IDF features form a matrix that defines a specific linear system of equations to be solved). If we apply methods of dimensionality reduction and data grouping (for instance SVD and t-SNE), we can see how this approach affects the features: a low number of features together with a low number of data entries forms a very scattered plot and makes it really very hard to classify the data in the given dataset.
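
A minimal sketch of the inspection we suggest (our own illustration, assuming scikit-learn and matplotlib; the guards merely keep it runnable on tiny placeholder data):

    # Sketch: SVD followed by t-SNE to see how scattered the TF-IDF feature
    # space becomes when both the feature count and the entry count are low.
    import matplotlib.pyplot as plt
    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.manifold import TSNE

    docs = ["eerste voorbeeld tweet", "tweede voorbeeld tweet",
            "iets heel anders hier", "nog een korte tweet"]  # placeholder corpus
    X = TfidfVectorizer().fit_transform(docs)

    n_comp = min(50, X.shape[0] - 1, X.shape[1] - 1)
    X_svd = TruncatedSVD(n_components=n_comp, random_state=0).fit_transform(X)
    X_2d = TSNE(n_components=2, perplexity=min(30, len(docs) - 1),
                random_state=0).fit_transform(X_svd)

    plt.scatter(X_2d[:, 0], X_2d[:, 1], s=10)
    plt.title("t-SNE of SVD-reduced TF-IDF features")
    plt.show()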

Remarks:

Issue 1: The sentence on lines 33-34 does not need references to sources 24 and 26, because they were already cited two sentences before.

Issue 2: Table 1 contains unnecessary repetition of the term "Learning".

Issue 3: Line 219 contains an invalid reference: "[?]".

Issue 4: Figure 1 and Table 3 have to be moved to Section 3.3.2, where they are referenced.

Issue 5: Lines 234-235 repeat the word "approach"; the sentence would be better rewritten as: "Therefore, only this and Textified approaches were used".

Issue 6: Formulae 1 and 2 have to be followed by commas. Also, the sentence on lines 247-248 has to begin with the word "where", which introduces the definitions of the terms in Formula 2.

Issue 7: The sentences on lines 250 and 251 begin with a lowercase letter. Sentences also begin with a reference, which is not good writing style. In our opinion, the limit for such usage is a sentence that acknowledges the thought of the previous sentence, e.g., "Researchers in the text mining area are widely adopting Word2Vec models in their research; [11, 23, 7] are the best examples of articles showing the successful adoption of this idea, in particular in sentiment analysis".

Issue 8: Tables 3 and 4 are hard to read, because the data reads as a list of parameters that applies to every method in the left column, which is not true in your case. We suggest another form of presenting this information.

Issue 9: Figure 2 has text covering an important part of the diagram, the dataflow between the elements of the bidirectional LSTM. The vectors h have to be moved in order to properly display the connections between the elements of the scheme.

Issue 10: The text on line 335 contains an unnecessary line break (start of paragraph).

Issue 11: There are different styles of tables in the article: for instance, Table 4 has outer left and right borders while Table 5 has not.

Issue 12: Tables 6-9 are too wide. This can be fixed as follows: 1) the words describing the type of model must stay within the allowed width; 2) the terms "Textified", "Stemmed", and "Lemmatized" have to be moved and the numbers in the other columns split with the fraction sign "/"; 3) the text explaining the fractions has to be placed, after Tables 6-12 are introduced, at the bottom of pages 11 and 12 as a footnote marked with an asterisk "*" or a number, with the following text: "each number in the tables displays the appropriate parameter when a specific modification of the method is used; the upper number of the fraction displays the value for the "Textified" modification, the bottom for "Lemmatized", and the middle (only for TF-IDF) for "Stemmed", respectively."

Issue 13: Tables 10-12 overrun the borders of the page and have to be placed between the left and right borders of the text, which can be done with proper alignment.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors addressed all my comments with sufficient detail. They extended existing tables, improved the write-up and structure of their manuscript and have clarified several aspects. I have no further remarks. I wish the authors all the best.

Author Response

We thank the reviewer for taking the time to go through our paper and for all the valuable feedback we received.

Reviewer 2 Report

1. Be consistent in formatting tables: Tables 1 and 2 have their captions at the top, while Table 3 has its caption at the bottom.

2. The research works given in Comment 4 of revision 1 are not cited.

3. Minor English errors and typos need to be fixed.

Author Response

We thank the reviewer for taking the time to go through our paper and for all the valuable feedback we received.

Regarding the reviewer's comments, we addressed them accordingly:

  1. We changed the formatting of Table 3 and put the caption above the table. That way, the formatting is now consistent throughout the entire paper.
  2. We cited the papers referred to by the reviewer.
  3. We went over the entire paper and fixed the remaining spelling errors and typos.

Reviewer 3 Report

The authors have addressed all of my concerns and their revisions have substantially improved the manuscript.

Author Response

We thank the reviewer for taking the time to go through our paper and for all the valuable feedback we received.

Reviewer 4 Report

I thank the authors for the high evaluation of my work as a reviewer and for taking all my remarks into account. The work has become better because of it.

Author Response

We thank the reviewer for taking the time to go through our paper and for all the valuable feedback we received.
