Analyzing the Effect of Negation in Sentiment Polarity of Facebook Dialectal Arabic Text
Abstract
1. Introduction
2. Background
2.1. Naïve Bayes-Based Approach
2.2. Valence Shifters
- Inverters: These are words that change the polarity of a word from positive to negative and vice versa, for example, “not”, “never”, “none”, and “neither”. Saying “not bad at all” changes the polarity from negative to positive, whereas saying “not good” changes it from positive to negative.
- Intensifiers: These can amplify the sentiment, for example, “badly” in “badly injured”; others weaken the meaning of a word, such as “slightly” in “slightly interested”. Their impact is not consistent across language contexts, and their position can significantly influence their effect.
- Modals: These are words that may be used to assume future consequences; hence, they convey the probability of an event happening. Opinion words may not behave normally in their presence, as in “If Marwan were lazy, he would fail his exams”. The words “lazy” and “fail” are negative; however, “would” and “were” affect the sentiment of these negative words. Thus, the sentiment of the sentence is that Marwan is not lazy and is not likely to fail his exams.
- Pre-suppositional Items: These words can affect the valence when an event does not meet a confident expectation. For example, the word “almost” in “he almost passed” implies that the person did not pass. The word “passed” conveys a positive sentiment, but “almost” changes the sentiment to a negative one. This also applies to “barely” in “the water was barely enough”. In both examples, the pre-suppositional items negatively shifted the neutral and positive sentiment. Some parts of speech may affect the sentiment in a similar way, as in “failed to pass” or “impossible to enjoy”.
- Irony: This is the last category of sentence valence shifters we consider here. Context plays a significant role in its recognition and impact, to the extent that readers may not consistently identify an inversion of sentiment. Irony arises when words that would normally intensify a sentiment instead flip its polarity. The word “genius” is a positive word that can give a negative sentiment, as in “the genius student did not solve the exercise”.
- Particular conjunctions that combine clauses, such as “but”, “although”, and “however”, can affect opinion words within their scope. For example, “mean” is a negative word, but in “Although he is mean, he treats birds in a nice way,” the sentence as a whole conveys a positive meaning.
- Discourse structure: Sentences may consist of two parts: dominant and illustrative. The illustrative part supports the dominant part such that when the dominant part is opinionated, the illustrative part intensifies its sentiment. Consider “He is a great hunter. He caught five animals today”. The first part contains the word “great,” so it is positive and opinionated. The second part is neutral on its own, yet it intensifies the first.
- Reported speech: A sentence such as “he said that the episode is nice” does not mean that the author agrees with the positive sentiment. However, in “he said that the episode is nice and I totally agree”, the reported opinion is supported by a phrase that gives the sentence a positive sentiment. Thus, “I agree” and “I do not agree” are key phrases that can be used with reported speech and may change the sentiment of a sentence.
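To make the first two categories concrete, the following is a minimal sketch of how inverters and intensifiers could modify the polarity of the next opinion word. The lexicon, the weights, and the function name `score` are illustrative assumptions and are not taken from the method described in this paper.

```python
# Toy lexicon and shifter lists; every entry and weight is an assumption.
LEXICON = {"good": 1.0, "bad": -1.0, "injured": -1.0, "interested": 1.0}
INVERTERS = {"not", "never", "none", "neither"}
INTENSIFIERS = {"badly": 1.5, "slightly": 0.5}  # >1 amplifies, <1 weakens


def score(sentence: str) -> float:
    """Sum word polarities, applying pending shifters to the next opinion word."""
    total = 0.0
    flip = False   # set by an inverter, consumed by the next opinion word
    scale = 1.0    # set by intensifiers, consumed by the next opinion word
    for token in sentence.lower().split():
        if token in INVERTERS:
            flip = not flip
        elif token in INTENSIFIERS:
            scale *= INTENSIFIERS[token]
        elif token in LEXICON:
            value = LEXICON[token] * scale
            total += -value if flip else value
            flip, scale = False, 1.0
    return total


print(score("not bad at all"))       # inverter flips a negative word
print(score("slightly interested"))  # intensifier weakens a positive word
```

The sketch lets a shifter act on the next opinion word regardless of distance, which is a simplification; modals, irony, and discourse structure are outside its reach.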
3. Materials and Methods
1. Tokenization: Treating inverters with a scope of one is adequate for some sentiment analysis tasks. Our corpora contradicted this norm because spelling rules are not followed in DA on social media. Consider the negated pattern ماتهزأ; ideally, a space should separate the inverter from its subject. As there is no space, the phrase is not properly tokenized and is therefore not handled correctly. Unfortunately, such improper spacing is frequent in DA. One solution would be to search for inverters prior to tokenization and split the negation from its subject. This approach, however, raises other challenges at the tokenization stage: a negation may be falsely identified in a legitimate word purely on the basis of lexical matching. Consider the word مشروعة (legal): negation-sensitive tokenization might result in a classifier detecting روعة as a positive pattern preceded by the inverter مش, which is an incorrect inversion. The process is especially challenging when an inverter is a prefix. For example, the letter م is an inverter, yet plenty of words start with this letter without being negated. The verb عجبه (liked him) can be negated by adding the letter م (M) as a prefix, giving معجبه, but the same string also means admirer or fan, which is positive. Another example is محب, which may mean either lover or did not love, depending on context. In MSA or with diacritized text, the problem can be resolved. However, since DA cannot be diacritized, other solutions are needed. Smarter tokenization may help eliminate such cases by splitting prefixes and checking whether the remaining string is a legitimate standalone Arabic word. For example, مشروعة would be split into مش روعة, but after checking the preceding words, the split may be disregarded if it turns out that مشروعة is an adjective modifying what comes before it. However, such “solutions” bring additional complexity, and legitimate cases may still be wrongly interpreted.
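The false-split problem described above can be illustrated with a short sketch. It contrasts a naive prefix split with a guarded variant that first consults a list of legitimate whole words. The pattern set, the whole-word list, and both function names are hypothetical; no such resources are claimed to exist for DA.

```python
# Inverter prefixes taken from the examples above, longest first.
INVERTER_PREFIXES = ("مش", "ما", "م")
# Hypothetical sentiment patterns that may follow an inverter.
PATTERNS = {"روعة", "عجبه", "تهزأ"}
# Hypothetical list of legitimate words that merely start like an inverter.
KNOWN_WORDS = {"مشروعة", "معجبه"}


def naive_split(token: str) -> list[str]:
    """Split off an inverter prefix whenever the remainder is a known pattern."""
    for prefix in INVERTER_PREFIXES:
        rest = token[len(prefix):]
        if token.startswith(prefix) and rest in PATTERNS:
            return [prefix, rest]
    return [token]


def guarded_split(token: str) -> list[str]:
    """Keep legitimate whole words intact before attempting a split."""
    if token in KNOWN_WORDS:
        return [token]
    return naive_split(token)


print(naive_split("مشروعة"))    # wrongly detects an inverter followed by a positive pattern
print(guarded_split("مشروعة"))  # keeps the word whole
print(guarded_split("ماتهزأ"))  # a genuinely negated pattern is still split
```

The guard does not remove the ambiguity: معجبه is kept whole here, so its negated reading is lost, which mirrors the observation that legitimate cases can still be misinterpreted without context.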
2. Fake Inverters: The strings representing inverters have other usages not related to negation. Consider the phrase ما أحلاها (how beautiful she is): it consists of a string usually used for negation followed by a positive word. In MSA, searching for patterns that belong to the morphological group usually used for comparison (أفعل التفضيل) would resolve the ambiguity. However, such tools do not exist for DA, and even if they were available, spelling irregularities would hinder their use. For example, ما ئحلى (how beautiful) is a positive pattern usually used to praise the beauty of an object, yet it is not written with أ (A) as it should be. What complicates the problem is that the same “fake” negation forms also appear in legitimate negation scenarios. Consider the phrase بلا احلى صووووووووت بلا نيله (not a beautiful voice at all): a positive pattern is preceded by an inverter that makes it negative. Another example is مش أحلى صوت (not the most beautiful voice), where the positive phrase is again preceded by an inverter that makes it negative. An important observation is that many targets of fake inverters start with the letter أ (A). However, this alone cannot be taken as a rule because, in other cases, a real inverter changes the polarity of a phrase starting with the same letter. One way to reduce the number of misleading cases is to filter targets that consist of four letters (when pronouns are not attached as suffixes, as in ما أجملها) and that start with any spelling variant of the letter أ, such as ئ، ء، أ، ا، إ، آ, since these are considered candidate fake targets. Patterns that fail to satisfy these rules may not be targets of fake inverters. However, those that satisfy the rules can be targets of either fake or legitimate inverters.
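The filtering heuristic above can be sketched as follows. This is one possible reading of the rule: a target is flagged when, after optionally removing a pronoun suffix, it has four letters and starts with a spelling variant of أ. The suffix list and the function name `may_be_fake_target` are assumptions made for illustration.

```python
import unicodedata

# Spelling variants of the first letter, as listed in the text.
ALEF_VARIANTS = set(unicodedata.normalize("NFC", "ئءأاإآ"))
# Hypothetical and incomplete list of attached pronoun suffixes.
PRONOUN_SUFFIXES = ("ها", "هم", "ه")


def may_be_fake_target(target: str) -> bool:
    """Flag a target that matches the four-letter, alef-initial shape.

    A True result only marks a candidate: as noted above, such targets can
    follow either a fake or a legitimate inverter.
    """
    word = unicodedata.normalize("NFC", target)
    candidates = [word]
    for suffix in PRONOUN_SUFFIXES:
        if word.endswith(suffix):
            candidates.append(word[: -len(suffix)])
    return any(len(c) == 4 and c[0] in ALEF_VARIANTS for c in candidates)


for example in ("أحلى", "ئحلى", "أجملها", "روعة"):
    print(example, may_be_fake_target(example))
```

A False result is the more useful signal here, since it suggests that the inverter in front of the target can be treated as a real one.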
3. Odd Negation: Although real inverters usually flip the polarity of sentimental targets, there are cases where this is not true. Consider the phrase ما تسب (do not curse). Although the negative word “curse” comes after a real inverter, the phrase remains negative after negation. The target in such cases has exactly the same properties as patterns for which negation is valid, i.e., flips polarity. For example, the phrase ما تزعل (do not be sad) has the same POS features as the previous example because both verbs are in the present tense. It has the same semantic features because both targets are negative patterns, and the same syntactic features because both come after the same inverter. Finally, both phrases express orders; however, the overall outcome is still different. These cases make the aforementioned modified algorithm prone to error. Work on this issue is still ongoing.
4. Implicit Negation: The sentiment of a negated pattern can be reversed by a dependent clause. Consider the phrase ما به عيب سوى عبادة الاصنام (he would have been flawless if he did not worship statues). Here, the pattern “flawless” is implicitly negated because the overall sentence implies that “he has flaws”. The first segment of the sentence is positive, but adding neutral phrases converts it to negative. In MSA, these cases can be detected easily because only a few words imply exclusion, i.e., words used to show how something would change under a certain condition. In English, we can say, “It would have been flawless if it was red”, which means that an object has flaws and would become flawless if a particular condition were met. In MSA, the words commonly used for this purpose include لو، لولا، سوى، إلا، إنما, as illustrated in the examples below:
- لو أنه سمع النصيحة، لكان من السعداء (if he had listened to the advice, he would have been happy now)
- لولا التعب، لكان العمل ممتعا (work would have been fun if we did not get tired)
- لن ينجح سوى المجتهد (only the hard worker will succeed)
- لن ينجح إلا المجتهد (only the hard worker will succeed)
- إنما طلبت الأخضر (I only asked for the green one)
However, not all of these cases carry over to DA, especially when spelling variants of these words are used, when the words are written without proper tokenization, or when the same word expresses a different meaning. For example, in the phrase شو بدك بهاشغلة ولو, the word لو is not used for exclusion. Such cases are challenging.
5. Neutral Targets: In addition to their ability to flip the polarity of sentimental targets, inverters may act on neutral targets to produce a sentimental phrase. Consider the example لا صوت ولا صورة (no voice, no picture): the two patterns “voice” and “picture” are neutral, but when the inverter لا (La) comes before them, the negated phrase acquires a negative sentiment. Such cases are hard to detect because negating a neutral target usually yields a neutral phrase. The neutral patterns mentioned above cannot be used by themselves to express a positive sentiment, i.e., صوت وصورة is not used as a positive phrase. Another example is مش ناقص: the pattern ناقص (missing) is considered neutral in DA since it does not express a sentiment as a stand-alone pattern; however, when preceded by the inverter, the phrase expresses a negative sentiment. Moreover, the same pattern ناقص can be used as a negative pattern in MSA, as in ناقص العقل, and if an inverter comes before it in MSA, the overall sentiment becomes positive.
4. Results
Algorithm 1: The pseudocode of the negation algorithm.
Result: Pattern polarity inverted for each inverter.
Input: The Unclassified Corpus P
Output: The Classified Corpus P
Initialization: SpamCount = 0; NegCount = 0; PosCount = 0
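Since the body of Algorithm 1 is not reproduced here, the following is only a rough sketch of the general idea it names: the polarity of a pattern is inverted when an inverter immediately precedes it. The pattern sets, the scope of one token, the omission of spam handling, and the reading of "dual" as a post containing both polarities are all assumptions, not the authors' algorithm.

```python
# A few inverters that appear in the text; the sets below are hypothetical.
INVERTERS = {"مش", "ما", "مو", "لا"}
POSITIVE_PATTERNS = {"حلو"}
NEGATIVE_PATTERNS = {"بشع"}


def classify(post: str) -> str:
    """Label a post after flipping patterns preceded by an inverter."""
    tokens = post.split()
    pos_count = neg_count = 0
    for i, token in enumerate(tokens):
        if token in POSITIVE_PATTERNS:
            polarity = 1
        elif token in NEGATIVE_PATTERNS:
            polarity = -1
        else:
            continue
        # Scope of one: only the token directly before the pattern is checked.
        if i > 0 and tokens[i - 1] in INVERTERS:
            polarity = -polarity
        if polarity > 0:
            pos_count += 1
        else:
            neg_count += 1
    if pos_count and neg_count:
        return "dual"
    if pos_count:
        return "positive"
    if neg_count:
        return "negative"
    return "neutral"


print(classify("مش حلو"))  # inverter flips the positive pattern
```

The sketch inherits every difficulty discussed in Section 3: it would misclassify fake inverters, odd negation, and negated neutral targets.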
- tp: correct result and classified as being correct.
- fn: correct result missed by the classifier.
- fp: negative result that is incorrectly classified by the classifier as positive.
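The three counts above are the usual inputs to precision, recall, and the F-measure. The sketch below uses the standard textbook definitions; it is not stated here which of these measures the reported scores correspond to, and the counts in the example call are hypothetical.

```python
def precision(tp: int, fp: int) -> float:
    """Share of results labeled correct that really are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of correct results that the classifier found."""
    return tp / (tp + fn) if tp + fn else 0.0


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical counts, not taken from the experiments.
print(precision(8, 2), recall(8, 2), f_measure(8, 2, 2))
```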
5. Discussion
1. Frequency of negation in posts: In CA-SN, the number of positive and negative targets affected by inverters is 229% higher than in CN-SA. This means that as inverters become more frequent in posts, the number of incorrectly classified posts increases, and a solution like ours is needed to resolve such cases. In other words, the performance improvement is most noticeable when a significant share of posts contains inverters.
2. Increase in the number of patterns: As mentioned earlier, splitting inverters from their targets increased the set of patterns without losing the original one. The number of patterns in SN increased by 4%, while that of SA decreased by 3%. The increase in the number of patterns in SN led to higher performance when classifying SN. In contrast, the decrease in the number of patterns in SA did not reduce performance: as mentioned earlier, when a negated phrase is split and its target is added to the corresponding set, the algorithm checks for the presence of an inverter before each pattern, so the effect of the original negated phrase is preserved. The decrease itself occurred because the patterns found after splitting were already part of SA, so the set lost the negated phrases as patterns and did not gain new ones. This did not affect performance.
6. Conclusions
1. Odd negation: More investigation is needed to find rules that can determine which patterns will flip their polarity when preceded by inverters.
2. Fake inverters: There are cases where applying the inverter without considering the nature of the target leads to incorrect classification. Such cases should be excluded when inverting the polarity of targets preceded by inverters.
3. Complex negation: The Arabic language is complex by nature, so there are many cases where negation cannot be treated trivially. Such cases include the use of the exclusion words mentioned earlier.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
MSA Inverters | DA Inverters | Translation |
---|---|---|
لا | مفي | does/do not |
لم | مافي | does/do not |
لما | منو | without |
لات | ماكو | there is no |
لن | مو | There will not be |
بلا | بلاش | without |
ليس | مش | No |
من دون | مانو | without |
بدون | م | without |
Class | Arts Corpus | News Corpus |
---|---|---|
Negative | 224 | 230 |
Positive | 233 | 236 |
Dual | 151 | 161 |
Spam | 197 | 193 |
Neutral | 195 | 180 |
Experiment | Before Treating Negation | After Treating Negation |
---|---|---|
CA-SN | 0.73 | 0.93 |
CN-SA | 0.73 | 0.73 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kaddoura, S.; Itani, M.; Roast, C. Analyzing the Effect of Negation in Sentiment Polarity of Facebook Dialectal Arabic Text. Appl. Sci. 2021, 11, 4768. https://doi.org/10.3390/app11114768