Evaluating Richer Features and Varied Machine Learning Models for Subjectivity Classification of Book Review Sentences in Portuguese
Abstract
1. Introduction
2. Related Work
2.1. Previous Attempts on Subjectivity Classification for Portuguese
2.2. Rhetorical Structure Theory
3. The Corpus
4. The Methods
4.1. Lexicon-Based Method
4.2. Graph-Based Method
4.3. Machine Learning-Based Methods
4.4. Enriched Machine Learning-Based Methods
5. Results and Discussion
5.1. The Methods of Belisário et al. (2020)
5.2. Evaluating the Feature Sets
- From the lexical features: the proportion of adverbs of negation and intensity, and the presence of an exclamation point in the sentence;
- From the centrality-based features: the Eigenvector centrality of the subjective graph;
- From the discourse features: the presence of some specific discourse relations (antithesis, cause, circumstance, background, comparison, contrast, condition, restatement and disjunction).
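As a rough illustration of how the lexical and discourse features listed above could be computed for one sentence, consider the minimal sketch below. It is not the authors' implementation: the adverb lists are small hypothetical samples, the function name `extract_features` is invented for this example, and the discourse relations are assumed to come from an external RST parser as plain label strings. Treating "cause" as matching both "volitional cause" and "non-volitional cause" is also an assumption. The centrality-based feature is omitted because the construction of the subjective graph is not described in this section.

```python
import re

# Hypothetical sample word lists (assumption): the adverbs actually used are not listed here.
NEGATION_ADVERBS = {"não", "nunca", "jamais", "nem"}
INTENSITY_ADVERBS = {"muito", "bastante", "demais", "tão", "pouco"}

# Discourse relations named in the feature list above.
SELECTED_RELATIONS = (
    "antithesis", "cause", "circumstance", "background", "comparison",
    "contrast", "condition", "restatement", "disjunction",
)


def extract_features(sentence, relations):
    """Build a feature dictionary for one sentence.

    `relations` is a list of RST relation labels for the sentence,
    assumed to be produced by an external discourse parser.
    """
    tokens = re.findall(r"\w+", sentence.lower())
    n_tokens = len(tokens) or 1  # avoid division by zero on empty input

    features = {
        "prop_negation_adverbs": sum(t in NEGATION_ADVERBS for t in tokens) / n_tokens,
        "prop_intensity_adverbs": sum(t in INTENSITY_ADVERBS for t in tokens) / n_tokens,
        "has_exclamation": int("!" in sentence),
    }

    labels = {r.lower() for r in relations}
    for rel in SELECTED_RELATIONS:
        # Suffix match so that "cause" also covers "non-volitional cause" (assumption).
        features[f"rel_{rel}"] = int(any(label.endswith(rel) for label in labels))
    return features


example = extract_features(
    "Não gostei muito do livro!", ["contrast", "non-volitional cause"]
)
print(example)
```

The resulting dictionary can be turned into a numeric vector and passed to any of the classifiers evaluated in Section 5.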
5.3. Analysis of Acquired Knowledge
6. Final Remarks
Author Contributions
Funding
Conflicts of Interest
References
- Liu, B. Sentiment Analysis and Opinion Mining. Synth. Lect. Hum. Lang. Technol. 2012, 5, 1–167.
- Moraes, S.M.W.; Santos, A.L.L.; Redecker, M.; Machado, R.M.; Meneguzzi, F.R. Comparing Approaches to Subjectivity Classification: A Study on Portuguese Tweets. In Proceedings of the International Conference on the Computational Processing of the Portuguese Language (PROPOR), Tomar, Portugal, 13–15 July 2016; pp. 86–94.
- Bertaglia, T.F.P.; Nunes, M.G.V. Exploring Word Embeddings for Unsupervised Textual User-Generated Content Normalization. In Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT), Osaka, Japan, 11 December 2016; pp. 112–120.
- Avanço, L.V.; Nunes, M.D.G.V. Lexicon-Based Sentiment Analysis for Reviews of Products in Brazilian Portuguese. In Proceedings of the 2014 Brazilian Conference on Intelligent Systems (BRACIS), São Carlos, Brazil, 18–23 October 2014; pp. 277–281.
- Condori, R.E.L.; Pardo, T.A.S. Opinion summarization methods: Comparing and extending extractive and abstractive approaches. Expert Syst. Appl. 2017, 78, 124–134.
- Belisário, L.B.; Ferreira, L.G.; Pardo, T.A.S. Evaluating Methods of Different Paradigms for Subjectivity Classification in Portuguese. In Proceedings of the 14th International Conference on the Computational Processing of Portuguese (PROPOR), Évora, Portugal, 2–4 March 2020; pp. 261–269.
- Mann, W.; Thompson, S. Rhetorical Structure Theory: A Theory of Text Organization; Technical Report ISI/RS-87-190; Information Science Institute, University of Southern California: Los Angeles, CA, USA, 1987.
- Vargas, F.A.; Pardo, T.A.S. Hierarchical clustering of aspects for opinion mining: A corpus study. In Linguística de Corpus: Perspectivas; Finatto, M.J.B., Rebechi, R.R., Sarmento, S., Bocorny, A.E.P., Eds.; Instituto de Letras da UFRGS: Porto Alegre, Brazil, 2018; pp. 69–91.
- Vilarinho, G.N.; Ruiz, E.E.S. Global centrality measures in word graphs for Twitter sentiment analysis. In Proceedings of the 7th Brazilian Conference on Intelligent Systems (BRACIS), São Paulo, Brazil, 22–25 October 2018; pp. 55–60.
- Zhao, H.; Lu, Z.; Poupart, P. Self-Adaptive Hierarchical Sentence Model. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI), Buenos Aires, Argentina, 25–31 July 2015; pp. 4069–4076.
- Yu, H.; Hatzivassiloglou, V. Towards Answering Opinion Questions: Separating Facts from Opinions and Identifying the Polarity of Opinion Sentences. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Sapporo, Japan, 11–12 July 2003; pp. 129–136.
- Wiebe, J.; Bruce, R.; O’Hara, T. Development and use of a gold standard data set for subjectivity classifications. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL), College Park, MD, USA, 22–26 June 1999; pp. 246–253.
- O’Donnell, M. RSTTool 2.4—A Markup Tool for Rhetorical Structure Theory. In Proceedings of the International Natural Language Generation Conference (INLG), Mitzpe Ramon, Israel, 12–16 June 2000; pp. 253–256.
- Chenlo, J.M.; Hogenboom, A.; Losada, D.E. Rhetorical Structure Theory for polarity estimation: An experimental study. Data Knowl. Eng. 2014, 94, 135–147.
- Freitas, C.; Motta, E.; Milidiú, R.; Cesar, J. Vampiro que brilha... rá! Desafios na anotação de opinião em um corpus de resenhas de livros. In Proceedings of the XI Encontro de Linguística de Corpus (ELC), São Carlos, Brazil, 13–15 September 2012; pp. 1–12.
- Carvalho, P.; Silva, M.J. Sentilex-PT: Principais Características e Potencialidades. Oslo Stud. Lang. 2015, 7, 425–438.
- Pasqualotti, P.R.; Vieira, R. WordnetAffectBR: Uma base lexical de palavras de emoções para a língua portuguesa. Rev. Novas Tecnol. Educ. 2008, 6, 1–10.
- Yang, J.; Liu, Y.; Zhu, X.; Liu, Z.; Zhang, X. A new feature selection based on comprehensive measurement both in inter-category and intra-category for text categorization. Inf. Process. Manag. 2012, 48, 741–754.
- Mikolov, T.; Corrado, G.; Chen, K.; Dean, J. Efficient Estimation of Word Representations in Vector Space. arXiv 2013, arXiv:1301.3781.
- Hartmann, N.S.; Avanço, L.; Balage, P.P.; Duran, M.S.; Nunes, M.G.V.; Pardo, T.; Aluísio, S. A Large Opinion Corpus in Portuguese—Tackling Out-Of-Vocabulary Words. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC), Reykjavik, Iceland, 26–31 May 2014; pp. 3865–3871.
- Witten, I.H.; Frank, E.; Hall, M.A.; Pal, C.J. Data Mining: Practical Machine Learning Tools and Techniques; Morgan Kaufmann: Cambridge, MA, USA, 2016.
- Maziero, E.G.; Hirst, G.; Pardo, T.A.S. Adaptation of Discourse Parsing Models for Portuguese Language. In Proceedings of the 4th Brazilian Conference on Intelligent Systems (BRACIS), Natal, Brazil, 4–7 November 2015; pp. 140–145.
- Maziero, E.G.; Hirst, G.; Pardo, T.A.S. Semi-Supervised Never-Ending Learning in Rhetorical Relation Identification. In Proceedings of the Recent Advances in Natural Language Processing (RANLP), Hissar, Bulgaria, 7–9 September 2015; pp. 436–442.
- Barzilay, R.; Lapata, M. Modelling local coherence: An entity-based approach. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), Ann Arbor, MI, USA, 25–30 June 2005; pp. 141–148.
- Martin, J.R.; White, P.R.R. The Language of Evaluation: Appraisal in English; AIAA: London, UK, 2005.
Sentence | Polarity |
---|---|
Alguem me indica marcas boas de notebook? (Can anyone suggest good notebook brands?) | Objective |
Esse Not é baum? Alguem sabe? (Is this notebook good? Does anybody know?) | Objective |
Sem notebook de novo… Parece brincadeira de mau gosto (No notebook again... It seems like a bad joke) | Subjective/Negative |
Logo um precioso notebook Dell lindíssimo chega aqui em casa tô mt feliz (Soon a precious Dell notebook arrives here at home I’m very happy) | Subjective/Positive |
Nucleus-Satellite Relations | | | | Multinuclear Relations |
---|---|---|---|---|
antithesis | elaboration | motivation | summary | conjunction |
attribution | enablement | non-volitional cause | unconditional | contrast |
background | evaluation | non-volitional result | unless | disjunction |
circumstance | evidence | otherwise | volitional cause | joint |
comparison | explanation | parenthetical | volitional result | list |
concession | interpretation | preparation | | restatement |
conclusion | justify | purpose | | same-unit |
condition | means | solutionhood | | sequence |
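Characteristic | Class | Value |
---|---|---|
Number of sentences | Objective | 175 |
Number of sentences | Subjective | 175 |
Average number of segments per sentence | Objective | 2.91 |
Average number of segments per sentence | Subjective | 2.83 |
Average number of segments per sentence | Both | 2.87 |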
Class | Relations |
---|---|
Objective | elaboration (105), list (74), same-unit (58), circumstance (37), sequence (34), non-volitional cause (17), purpose (13), justify (13), contrast (10), parenthetical (10), joint (8), non-volitional result (7), explanation (7), concession (7), attribution (4), means (4), conclusion (2), preparation (2), motivation (1), interpretation (1), antithesis (1), evidence (1), volitional cause (1), condition (1), summary (1) |
Subjective | list (65), elaboration (59), justify (51), concession (35), same-unit (27), contrast (26), attribution (16), circumstance (14), evaluation (12), non-volitional result (9), joint (8), sequence (7), means (7), non-volitional cause (6), explanation (5), purpose (4), comparison (4), otherwise (4), conclusion (3), condition (3), preparation (3), parenthetical (3), antithesis (2), restatement (2), motivation (1), interpretation (1), evidence (1), background (1), summary (1) |
Measures | Lexicon-Based: Sentilex-PT | Lexicon-Based: WordnetAffectBR | Graph-Based Method | Machine Learning: NB | Machine Learning: SVM | Machine Learning: Neural Network |
---|---|---|---|---|---|---|
Precision (objective) | 0.490 | 0.518 | 0.545 | 0.759 | 0.782 | 0.806 |
Recall (objective) | 0.600 | 0.931 | 0.723 | 0.736 | 0.83 | 0.863 |
F-measure (objective) | 0.539 | 0.665 | 0.652 | 0.747 | 0.805 | 0.831 |
Precision (subjective) | 0.524 | 0.730 | 0.674 | 0.763 | 0.831 | 0.865 |
Recall (subjective) | 0.413 | 0.181 | 0.536 | 0.782 | 0.783 | 0.804 |
F-measure (subjective) | 0.461 | 0.288 | 0.596 | 0.772 | 0.806 | 0.832 |
Overall accuracy | 0.504 | 0.545 | 0.627 | 0.761 | 0.806 | 0.832 |
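The F-measure rows in the table above can be related to the precision and recall rows with a short calculation. The sketch below assumes the balanced F-measure (the harmonic mean of precision and recall), which is the usual definition but is not stated explicitly in this section; the function name `f_measure` is chosen for illustration only. Because the tabulated precision and recall are already rounded, recomputed values may differ slightly from the reported ones.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention when nothing was predicted or retrieved
    return 2 * precision * recall / (precision + recall)


# Sentilex-PT, objective class: precision 0.490 and recall 0.600 (from the table).
print(round(f_measure(0.490, 0.600), 3))  # 0.539
```

For the Sentilex-PT objective class, the recomputed value agrees with the reported F-measure of 0.539.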
Category | Method | Class | Precision | Recall | F-Measure | Accuracy (Class) | Accuracy (Overall) |
---|---|---|---|---|---|---|---|
Rules | OneR | Subjective | 0.656 | 0.457 | 0.539 | 0.457 | 0.609 |
Rules | OneR | Objective | 0.583 | 0.760 | 0.660 | 0.760 | 0.609 |
Rules | PART | Subjective | 0.658 | 0.726 | 0.690 | 0.726 | 0.674 |
Rules | PART | Objective | 0.694 | 0.623 | 0.657 | 0.623 | 0.674 |
Trees | J48 | Subjective | 0.658 | 0.726 | 0.690 | 0.726 | 0.674 |
Trees | J48 | Objective | 0.694 | 0.623 | 0.657 | 0.623 | 0.674 |
Bayes | NaiveBayes | Subjective | 0.669 | 0.680 | 0.674 | 0.680 | 0.671 |
Bayes | NaiveBayes | Objective | 0.674 | 0.663 | 0.669 | 0.663 | 0.671 |
Functions | SMO | Subjective | 0.660 | 0.727 | 0.691 | 0.728 | 0.676 |
Functions | SMO | Objective | 0.694 | 0.623 | 0.656 | 0.623 | 0.676 |
Functions | MultilayerPerceptron | Subjective | 0.651 | 0.640 | 0.646 | 0.640 | 0.649 |
Functions | MultilayerPerceptron | Objective | 0.646 | 0.657 | 0.652 | 0.657 | 0.649 |
Category | Method | Class | Precision | Recall | F-Measure | Accuracy (Class) | Accuracy (Overall) |
---|---|---|---|---|---|---|---|
Rules | OneR | Subjective | 0.531 | 0.537 | 0.534 | 0.537 | 0.531 |
Rules | OneR | Objective | 0.532 | 0.526 | 0.529 | 0.526 | 0.531 |
Rules | PART | Subjective | 0.631 | 0.566 | 0.596 | 0.566 | 0.617 |
Rules | PART | Objective | 0.606 | 0.669 | 0.636 | 0.669 | 0.617 |
Trees | RandomForest | Subjective | 0.747 | 0.691 | 0.718 | 0.691 | 0.729 |
Trees | RandomForest | Objective | 0.713 | 0.766 | 0.738 | 0.766 | 0.729 |
Bayes | NaiveBayes | Subjective | 0.626 | 0.440 | 0.517 | 0.440 | 0.589 |
Bayes | NaiveBayes | Objective | 0.568 | 0.737 | 0.642 | 0.737 | 0.589 |
Functions | LibLINEAR | Subjective | 0.735 | 0.697 | 0.716 | 0.697 | 0.723 |
Functions | LibLINEAR | Objective | 0.712 | 0.749 | 0.730 | 0.749 | 0.723 |
Functions | MultilayerPerceptron | Subjective | 0.744 | 0.766 | 0.755 | 0.766 | 0.751 |
Functions | MultilayerPerceptron | Objective | 0.759 | 0.737 | 0.748 | 0.737 | 0.751 |
Category | Method | Class | Precision | Recall | F-Measure | Accuracy (Class) | Accuracy (Overall) |
---|---|---|---|---|---|---|---|
Rules | OneR | Subjective | 0.794 | 0.286 | 0.420 | 0.286 | 0.806 |
Rules | OneR | Objective | 0.564 | 0.926 | 0.701 | 0.926 | 0.806 |
Rules | PART | Subjective | 0.658 | 0.760 | 0.706 | 0.760 | 0.683 |
Rules | PART | Objective | 0.716 | 0.606 | 0.656 | 0.606 | 0.683 |
Trees | J48 | Subjective | 0.701 | 0.709 | 0.705 | 0.709 | 0.703 |
Trees | J48 | Objective | 0.705 | 0.697 | 0.701 | 0.697 | 0.703 |
Bayes | NaiveBayes | Subjective | 0.707 | 0.634 | 0.669 | 0.634 | 0.686 |
Bayes | NaiveBayes | Objective | 0.668 | 0.737 | 0.701 | 0.737 | 0.686 |
Functions | SMO | Subjective | 0.723 | 0.566 | 0.635 | 0.566 | 0.674 |
Functions | SMO | Objective | 0.643 | 0.783 | 0.706 | 0.783 | 0.674 |
Functions | MultilayerPerceptron | Subjective | 0.692 | 0.669 | 0.680 | 0.669 | 0.686 |
Functions | MultilayerPerceptron | Objective | 0.680 | 0.703 | 0.691 | 0.703 | 0.686 |
Category | Method | Class | Precision | Recall | F-Measure | Accuracy (Class) | Accuracy (Overall) |
---|---|---|---|---|---|---|---|
Rules | OneR | Subjective | 0.531 | 0.537 | 0.534 | 0.537 | 0.531 |
Rules | OneR | Objective | 0.532 | 0.526 | 0.529 | 0.526 | 0.531 |
Rules | PART | Subjective | 0.712 | 0.663 | 0.686 | 0.663 | 0.697 |
Rules | PART | Objective | 0.684 | 0.731 | 0.707 | 0.731 | 0.697 |
Trees | RandomForest | Subjective | 0.797 | 0.720 | 0.757 | 0.720 | 0.769 |
Trees | RandomForest | Objective | 0.745 | 0.817 | 0.779 | 0.817 | 0.769 |
Bayes | NaiveBayes | Subjective | 0.755 | 0.669 | 0.709 | 0.669 | 0.726 |
Bayes | NaiveBayes | Objective | 0.703 | 0.783 | 0.741 | 0.783 | 0.726 |
Functions | LibLINEAR | Subjective | 0.781 | 0.754 | 0.767 | 0.754 | 0.771 |
Functions | LibLINEAR | Objective | 0.762 | 0.789 | 0.775 | 0.789 | 0.771 |
Functions | MultilayerPerceptron | Subjective | 0.744 | 0.714 | 0.729 | 0.714 | 0.734 |
Functions | MultilayerPerceptron | Objective | 0.725 | 0.754 | 0.739 | 0.754 | 0.734 |
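Since the corpus contains 175 objective and 175 subjective sentences, the overall accuracy of a classifier can be recovered from its per-class recall as a count-weighted mean, which for balanced classes reduces to the plain average of the two recall values. The sketch below illustrates this relation under that assumption; the function name `overall_accuracy` and the dictionary layout are hypothetical choices for this example, and the recall values are the RandomForest figures from the table above.

```python
def overall_accuracy(recall_by_class, count_by_class):
    """Overall accuracy as the count-weighted mean of per-class recall."""
    # recall * count gives the (approximate) number of correctly classified sentences.
    correct = sum(recall_by_class[c] * count_by_class[c] for c in count_by_class)
    return correct / sum(count_by_class.values())


# Class sizes taken from the corpus description (175 sentences per class).
counts = {"subjective": 175, "objective": 175}
# RandomForest recall values from the table above.
recalls = {"subjective": 0.720, "objective": 0.817}

print(overall_accuracy(recalls, counts))  # approximately 0.7685
```

The result, about 0.7685, is in line with the overall accuracy of 0.769 reported for RandomForest.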
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Belisário, L.B.; Ferreira, L.G.; Pardo, T.A.S. Evaluating Richer Features and Varied Machine Learning Models for Subjectivity Classification of Book Review Sentences in Portuguese. Information 2020, 11, 437. https://doi.org/10.3390/info11090437