Article

AraSenCorpus: A Semi-Supervised Approach for Sentiment Annotation of a Large Arabic Text Corpus

1 Computer Science Department, University of Engineering and Technology, Lahore 54890, Pakistan
2 Artificial Intelligence and Data Analytics Laboratory, Prince Sultan University, Riyadh 11586, Saudi Arabia
* Author to whom correspondence should be addressed.
Academic Editor: Carlos A. Iglesias
Appl. Sci. 2021, 11(5), 2434; https://doi.org/10.3390/app11052434
Received: 1 February 2021 / Revised: 26 February 2021 / Accepted: 26 February 2021 / Published: 9 March 2021
(This article belongs to the Special Issue Sentiment Analysis for Social Media Ⅱ)
While sentiment analysis research on languages such as English has moved on to advanced topics, languages such as Arabic still face basic challenges, most notably the lack of large annotated corpora. Manual annotation is time-consuming and becomes impractical when the corpus is very large. This paper presents a semi-supervised self-learning technique that extends a sentiment-annotated Arabic corpus with unlabeled data. We use a neural network to train a set of models on a manually labeled dataset of 15,000 tweets, and then use these models to extend that dataset into a large Arabic sentiment corpus called "AraSenCorpus". AraSenCorpus contains 4.5 million tweets and covers both Modern Standard Arabic and some Arabic dialects. A long short-term memory (LSTM) deep learning classifier is used to train and test on the final corpus. We evaluate the proposed framework on two external benchmark datasets to verify that it improves Arabic sentiment classification. The experimental results show that models trained on our corpus outperform existing state-of-the-art systems.
Keywords: corpus annotation; Arabic sentiment analysis; semi-supervised learning; self-learning; neural networks; deep learning
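
The self-learning idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a tiny bag-of-words naive Bayes classifier stands in for the neural network models, English toy strings stand in for Arabic tweets, and the names `train`, `predict_proba`, `self_train`, `threshold`, and `max_rounds` are assumptions introduced here for illustration only. The loop trains on the labeled seed set, labels the unlabeled pool, keeps only predictions above a confidence threshold, and repeats.

```python
import math
from collections import Counter


def train(labeled):
    """Fit a tiny multinomial naive Bayes model on (text, label) pairs."""
    word_counts = {}        # label -> Counter of token frequencies
    doc_counts = Counter()  # label -> number of training texts
    vocab = set()
    for text, label in labeled:
        tokens = text.split()
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict_proba(model, text):
    """Return {label: posterior probability} using add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label, counts in word_counts.items():
        total = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for tok in text.split():
            # The extra +1 in the denominator reserves mass for unseen tokens.
            score += math.log((counts[tok] + 1) / (total + len(vocab) + 1))
        log_scores[label] = score
    # Normalize in a numerically stable way.
    peak = max(log_scores.values())
    exp_scores = {lab: math.exp(s - peak) for lab, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {lab: v / norm for lab, v in exp_scores.items()}


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Grow the labeled set with confident predictions (hypothetical helper)."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        confident, remaining = [], []
        for text in pool:
            probs = predict_proba(model, text)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:
            break  # nothing new was added, so further rounds change nothing
        labeled.extend(confident)
        pool = remaining
    return labeled, pool


# Toy usage: two seed examples and three unlabeled texts.
seed = [("good great", "pos"), ("bad awful", "neg")]
unlabeled = ["good good great", "awful bad bad", "table chair"]
extended, leftover = self_train(seed, unlabeled, threshold=0.8)
```

In this sketch, texts the model cannot label confidently stay in the unlabeled pool rather than being forced into a class, which is the usual way self-training limits the spread of labeling errors. The threshold value is arbitrary here and would need tuning on real data.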
MDPI and ACS Style

Al-Laith, A.; Shahbaz, M.; Alaskar, H.F.; Rehmat, A. AraSenCorpus: A Semi-Supervised Approach for Sentiment Annotation of a Large Arabic Text Corpus. Appl. Sci. 2021, 11, 2434. https://doi.org/10.3390/app11052434
