Article
Peer-Review Record

Sentiment Analysis of Lithuanian Texts Using Traditional and Deep Learning Approaches

by Jurgita Kapočiūtė-Dzikienė, Robertas Damaševičius and Marcin Woźniak
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 27 November 2018 / Revised: 21 December 2018 / Accepted: 24 December 2018 / Published: 1 January 2019

Round 1

Reviewer 1 Report

The paper is quite interesting, exploring the sentiment analysis domain for the Lithuanian language. The manuscript is generally well written and easy to follow. Nevertheless, I have one main concern:

The obtained results are compared in terms of the accuracy measure, but they are compared against other types of approaches, such as dictionary-based methods. For this case, I strongly recommend a second case study with polarity comparisons against well-known lexicons (SentiWordNet or Sentiment140) to illustrate the similarity of the results. Moreover, I also want to indicate the need for a background section in which sentiment-based approaches are addressed. The authors can use as a reference the background of this paper:

de Diego, I. M., Fernández-Isabel, A., Ortega, F., & Moguerza, J. M. (2018). A visual framework for dynamic emotional web analysis. Knowledge-Based Systems, 145, 264-273.

It also includes a novel methodology to generate dynamic sentiment analysis, which is not considered in the current approach presented by the authors. One of the leading figures of the sentiment analysis domain, Erik Cambria, should be referenced in this background.

Finally, some mistakes:

Line 46: "Naivel Bayes" -> Naïve Bayes

Line 65: "recurrent RNN" -> RNNs are presented in the next paragraph, so please improve the organization.

The histogram figures are too big relative to the font size of the paper. Please modify them accordingly.

Note that it would be interesting to consider the opinions of more experts in future work, as only two were used.

I wish the authors the best in implementing the suggested modifications.




Author Response

Our response to your comments:

1. Unfortunately, external sentiment resources (such as SentiWordNet) do not exist for the Lithuanian language; therefore, we could not integrate them into our methods. Besides, Twitter is not a popular social network among Lithuanian users, so applying Sentiment140 would also be complicated. Even if such external lexical resources existed for Lithuanian, we would require either normative texts or accurate diacritics restoration tools (to restore the diacritics in words) and lemmatizers (to transform words into the lemma form under which they appear in SentiWordNet). Moreover, one dictionary-based method (using a rather small lexicon of words annotated with their sentiments) was previously tested on the Lithuanian language (ref. [35] in the revised version of the paper), but the results were very poor. Thus, machine learning approaches seem to be the only effective solution for our task with the currently available resources (we added an explanation addressing your comment on pages 2, 3, and 12; the diacritics issue is illustrated in the sketch after this response).

2. The reference to (de Diego et al., 2018) was included in the paper as ref. [14].

3. The sizes of Figures 1-6 were decreased as requested.

4. We cited the paper of E. Cambria as ref. [63].

5. The English mistakes and other mistakes (which you noted in lines 46 and 65) were corrected.

Thank you for contributing to the improvement of our paper.
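To illustrate the diacritics problem mentioned in point 1: in non-normative Lithuanian text, users often drop diacritics, so the surface form of a word no longer matches any lexicon entry. A minimal sketch (the character mapping and example word are illustrative, not taken from the dataset):

```python
# Illustrative sketch of the diacritics problem in non-normative Lithuanian:
# users often type 'a' for 'ą', 's' for 'š', etc., so a raw lexicon lookup
# fails without a diacritics restoration step.
STRIP = str.maketrans("ąčęėįšųūž", "aceeisuuz")

word = "ačiū"                  # normative form ("thank you")
typed = word.translate(STRIP)  # what a user might actually type
print(typed)                   # 'aciu' -- no longer matches a lexicon entry
```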


Reviewer 2 Report

The authors have described their research on sentiment analysis using both regular and more recent deep learning methods in NLP. The key distinguishing feature of this paper is the dataset. The authors have applied sentiment analysis to Lithuanian texts.

#114 The dataset has only 2 human annotators and a single rating, based on "mutual agreement". This does not appear to be a statistically sound approach. It is important to allow independent assessment and to report the bias and variance statistics. Since two annotators are too few for this, reporting an agreement score between the human annotators would lend the dataset further credibility.
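For illustration, such an agreement score is commonly computed as Cohen's kappa; a minimal sketch (the parallel label lists are illustrative, not taken from the dataset, and scikit-learn is assumed):

```python
# Minimal sketch: Cohen's kappa as an inter-annotator agreement score.
from sklearn.metrics import cohen_kappa_score

# One polarity label per text, in the same order for both annotators.
annotator_a = ["positive", "negative", "neutral", "positive", "negative"]
annotator_b = ["positive", "negative", "positive", "positive", "negative"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, ~0 = chance
```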

The authors have accounted for diacritics in the Lithuanian language, which is a good point.

I would recommend devoting relatively more space to the description of the dataset, which will help the reader gain an intuitive appreciation of this paper and an intuitive comparison between sentiment analysis as applied to Lithuanian vs. (for example) English.


The paper would benefit from providing some form of comparison with results from existing papers in a different, typically utilized language. This would provide a check on the accuracy of the implementations and an approximate idea of the challenges of applying sentiment analysis to Lithuanian. It could be argued that, for a casual reader (not working with the Lithuanian language), a core motivation to read the paper is to discover what the idiosyncrasies of a different language are and how they might affect sentiment analysis.


#252 The experimental section, though good, reads like a laundry list. It would be more interesting if the authors provided a couple of lines of motivation for changing certain parameters in each of the experiments.


Author Response

Our response to your comments:

1. By “mutual agreement” we mean that the annotation was done independently and only texts that obtained the same polarity values were included in the dataset (we clarified this in Section 2.2).

2. We completely agree that the specifics of the Lithuanian language (and of the non-normative dataset) compared to the English language must be highlighted. We added the explanation in Section 2.2.

3. The comparison of sentiment analysis results in similar languages (morphologically complex and facing the same diacritics problem) is presented in Section 4. Unfortunately, deep learning approaches are not popular for the sentiment analysis task in such languages, probably because they cannot yet outperform the traditional machine learning approaches.

4. A more detailed explanation of the experiments is presented in Section 3.

Thank you for contributing to the improvement of our paper.


Reviewer 3 Report

ABSTRACT: It should contain preliminary information on the advantage of using deep learning approaches. When the reader is informed that traditional machine learning methods reach a higher accuracy, the reader should be given an anticipation here of why the authors are interested in using deep learning methods.
INTRODUCTION: to the best of my knowledge, the authors could add references to a couple of important contributions in the field. Namely,
- Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A., & Potts, C. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 conference on empirical methods in natural language processing (pp. 1631-1642).
- You, Q., Luo, J., Jin, H., & Yang, J. (2015, January). Robust image sentiment analysis using progressively trained and domain transferred deep networks. In AAAI (pp. 381-388).
The METHODOLOGY section is complete, very clear, and well written. Only one thing is not so clear to me: did you adopt the LSTM only with 256 blocks? Why not try larger numbers, e.g., 512 and 1024 blocks?
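To make the question concrete, the LSTM width would be a single hyperparameter in a model such as the following minimal sketch (this is not the authors' actual architecture; all sizes and names are illustrative, and Keras is assumed):

```python
# Illustrative sketch only -- not the authors' exact architecture.
# The LSTM width (lstm_units) is the hyperparameter in question.
from tensorflow.keras import layers, models

def build_lstm_classifier(vocab_size=20000, lstm_units=256):
    model = models.Sequential([
        layers.Embedding(vocab_size, 128),      # word index -> dense vector
        layers.LSTM(lstm_units),                # try 256, 512, or 1024 here
        layers.Dense(1, activation="sigmoid"),  # binary polarity head
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```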
The EXPERIMENT & RESULTS section is clearly presented.
DISCUSSION: what kind of significance test did you use? In such a paper it should be clearly mentioned.
The final part of the Discussion section and the CONCLUSION should be improved. The reader gets high expectations from the very beginning (Abstract) of your contribution, but all in all the only original claim is that the authors "assume [the] gap is small enough to be eliminated with more training data" [as quoted at the end of the abstract] and "...we plan to eliminate this gap..." [as quoted at the very end of the paper].
How? My suggestion to improve your paper is to anticipate here a clearer plan for eliminating the gap. You should give the idea that you are sure it is only a matter of the size of your training data... but the reader could wonder whether it could be an issue related either to the input encoding or to the network implementation.

Author Response

Our response to your comments:
1. In this paper we use deep learning approaches not because we expect some advantage over the traditional ones, but because they are innovative techniques that have proven effective for languages such as English. It is very important to reveal how far we can go with the current resources for the Lithuanian language and how much effort is still needed to outperform the traditional approaches for Lithuanian, or even to achieve accuracy levels similar to those for English. We clarified this in the Abstract and Discussion sections.
2. Thank you for the paper recommendations. The first paper is about plugging in external lexical sentiment resources. Unfortunately, these do not exist for the Lithuanian language, but we agree that it is important to mention this fact (an explanation is given on pages 2, 3, and 12). We added the first paper as ref. [15]. The second recommended paper is about image sentiment classification and does not fit into our scope.
3. The size of the LSTM was limited to 256 blocks because most of the texts in the dataset are short (fewer than 256 words) (we mention this in Section 2.4).
4. To evaluate whether the results are statistically significant, we used the McNemar test (described in Section 3; a sketch of such a test is shown after this response).
5. We clarified the Discussion and Conclusions parts. In future research, we plan both to improve the parameters of the deep learning approaches and to increase the training data (clarified in Section 5).
Thank you for contributing to the improvement of our paper.
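For reference, a minimal sketch of how a McNemar test over two classifiers' predictions on the same test set can be run (the 2x2 counts are illustrative, not results from the paper, and statsmodels is assumed):

```python
# Minimal sketch: McNemar test comparing two classifiers evaluated on
# the same test set. The counts below are illustrative, not from the paper.
from statsmodels.stats.contingency_tables import mcnemar

# Rows: classifier A correct / wrong; columns: classifier B correct / wrong.
table = [[520, 32],   # both correct | only A correct
         [18, 130]]   # only B correct | both wrong
result = mcnemar(table, exact=False, correction=True)
print(f"statistic={result.statistic:.3f}, p-value={result.pvalue:.4f}")
```

Only the discordant counts (32 and 18 here) drive the test; a small p-value indicates that the two classifiers' error patterns differ significantly.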

Round 2

Reviewer 1 Report

The quality of the paper has been improved. Nevertheless, SentiWordNet 3.0 has a proper reference, which is better than using footnotes. Only as a suggestion for the final version:

Baccianella, S., Esuli, A., & Sebastiani, F. (2010, May). SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In LREC (Vol. 10, pp. 2200-2204).


Reviewer 2 Report

By “mutual agreement” we mean that the annotation was done independently and only texts that obtained the same polarity values were included in the dataset (we clarified this in Section 2.2).

The problem here is that the data sample is intentionally biased. The sample is not representative of the population or of the inherent complexity of the dataset. By selecting samples where there is agreement, you are removing samples that are ambiguous, which weakens the entire (sampled) dataset. The sampled dataset is biased towards these two human annotators. What if you had 10 human annotators? Would the sample set be mutual agreement between all 10 annotators? Such a severe mutual-agreement constraint might whittle the sample down to a very small size. So you would select another scheme, which is less restrictive. You should then use that same scheme for two annotators, i.e., the statistically meaningful sampling strategy should in principle be independent of the number of human annotators. Note that disagreement between annotators does not make that data sample an outlier, so it cannot be used as justification for pruning.
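To make the alternative concrete, a less restrictive scheme could, for example, keep a sample whenever a majority (rather than all) of the annotators agree, recording the majority label; a minimal sketch (the function name and threshold are hypothetical):

```python
# Hypothetical majority-vote aggregation: keep a sample only if at least
# min_agreement of the annotators assign it the same polarity label.
from collections import Counter

def aggregate(labels, min_agreement=0.6):
    """labels: one polarity label per annotator, e.g. ['pos', 'pos', 'neg']."""
    winner, count = Counter(labels).most_common(1)[0]
    return winner if count / len(labels) >= min_agreement else None

print(aggregate(["pos", "pos", "neg"]))  # 'pos'  (2/3 of annotators agree)
print(aggregate(["pos", "neg", "neu"]))  # None   (no majority; sample dropped)
```

With two annotators this scheme reduces to full agreement, but with ten it would retain ambiguous-yet-resolvable samples rather than discarding them.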

The other big statistical issue is that performance on a biased sample is not a reliable indicator of performance on the population.

Overall, I would say that dataset collection is an expensive process and it is entirely expected that seminal datasets for any topic will be flawed. What makes the experiments using this dataset meaningful is the quantification of the statistical pros and cons of the dataset.

Datasets and algorithms improve symbiotically over time. It is unfortunate that so much stress is placed on quantification of the quality of algorithms while little to no analysis is done on the dataset. 

2. We completely agree that the specifics of the Lithuanian language (and of the non-normative dataset) compared to the English language must be highlighted. We added the explanation in Section 2.2.

Good points added.

3. The comparison of sentiment analysis results in similar languages (morphologically complex and facing the same diacritics problem) is presented in Section 4. Unfortunately, deep learning approaches are not popular for the sentiment analysis task in such languages, probably because they cannot yet outperform the traditional machine learning approaches.

I liked this discussion. While it is up to the authors, I feel that some of the Discussion in Section 4 could be placed earlier in the paper. Unfortunately, the popularity of deep learning makes many readers simply ignore what traditional machine learning methods achieve and wonder why deep learning methods are not center stage in this paper. With the benefit of the insight from this discussion, a potential reader might have greater motivation to read the rest of the paper.

4. A more detailed explanation of the experiments is presented in Section 3.

Good

Thank you for contributing to the improvement of our paper.

You are welcome.


While I recommend that the paper be accepted in its current form, there are multiple typos that the authors should correct, and the paper should be proofread. It is in their own best interest to do so.

For example:

#118: arrow symbol

#287, 288: LaTeX subscript missing


Best wishes to the authors for future research on this topic. I am quite interested in how Transfer Learning and Fine Tuning could be employed in Sentiment Analysis to perhaps:

(1) Apply sentiment analysis across dialects of a language

(2) Utilize pre-trained models from a popular language for less popular languages (which have less training data available)
