Arabic Sentiment Analysis Based on Word Embeddings and Deep Learning
Abstract
1. Introduction
2. Related Works
2.1. Arabic Language
2.2. Other Languages
3. Materials and Methods
3.1. Dataset Details
3.1.1. Large-Scale Arabic Book Reviews (LABR) Dataset
3.1.2. Hotel Arabic-Reviews Dataset (HARD)
3.2. Data Pre-Processing and Cleaning
3.2.1. LABR Preparation
3.2.2. HARD Preparation
- Remove the NaN values.
- Remove non-required columns by excluding any column that is not “sentiment” or “polarity”.
- Remove any duplicate entries.
- Map any “positive”, “negative”, or “neutral” values into numbers.
- Tokenize the text column into separate lists using sent_tokenize from NLTK (https://www.nltk.org/, accessed on 1 January 2022).
- Clean the text column using the following steps:
  - Remove the English and Arabic punctuation, e.g., <>, _, (), *, &, etc.
  - Perform Arabic character normalization, as in [56]. This step is widely used in Arabic NLP because it unifies letters that can appear in different forms, e.g., convert any of [إأآا] to “ا”, “ى” to “ي”, “ؤ” to “ء”, “ئ” to “ء”, “ة” to “ه”, and “گ” to “ك”.
  - Remove the diacritics (“tashkeel”), e.g., Tashdid (الشدة), Fatha (الفتحة), Tanwin Fath (تنوين بالفتحتين), etc.
  - Elongation removal: collapse a repeated character so that only one remains, e.g., جمييييييييييييل becomes جميل [57].
  - Remove the stop words, e.g., أما, بعد, هل, etc.
  - Remove non-Arabic characters by using regular expressions to filter out text written in other alphabets.
  - Remove any digits, such as 1234.
- Convert the lists into tokens using word_tokenize from NLTK.
- Stem the tokens using ISRIStemmer from NLTK, e.g., نظيف becomes نظف.
- Split the data into training and testing sets.
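The cleaning steps listed above can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' code: the function name `clean_review`, the three-word stop list (taken from the examples in the list), and the Unicode ranges are assumptions made here, and the sketch folds punctuation and digit removal into the single non-Arabic filter. Sentence tokenization and stemming with NLTK are left out.

```python
import re

# Letter normalization as described in the list above.
NORMALIZE = str.maketrans({
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> ya
    "\u0624": "\u0621",  # waw with hamza -> hamza
    "\u0626": "\u0621",  # ya with hamza -> hamza
    "\u0629": "\u0647",  # ta marbuta -> ha
    "\u06AF": "\u0643",  # gaf -> kaf
})

# Tashkeel marks from fathatan (U+064B) through sukun (U+0652).
DIACRITICS = re.compile(r"[\u064B-\u0652]")

# Anything that is neither a basic Arabic letter nor whitespace.
# This also drops punctuation, digits, and the tatweel character (assumed range).
NON_ARABIC = re.compile(r"[^\u0621-\u063A\u0641-\u064A\s]")

# A run of the same character, collapsed to a single occurrence.
ELONGATION = re.compile(r"(.)\1+")

# Illustrative stop list only; normalized so it matches normalized text.
STOP_WORDS = {
    w.translate(NORMALIZE)
    for w in ("\u0623\u0645\u0627", "\u0628\u0639\u062F", "\u0647\u0644")
}


def clean_review(text):
    """Return the cleaned whitespace-separated tokens of one review."""
    text = text.translate(NORMALIZE)
    text = DIACRITICS.sub("", text)
    text = NON_ARABIC.sub(" ", text)
    text = ELONGATION.sub(r"\1", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


print(clean_review("جمييييييييييييل"))  # ['جميل']
```

Collapsing every repeated character to one, as the list describes, also shortens words that legitimately contain a doubled letter, so a production pipeline may prefer to collapse only runs of three or more.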
3.3. Preparing Data for Word Embedding/Text Representation
3.3.1. Word to Vector Embedding Model (Word2Vec)
3.3.2. FastText Embedding Model
3.4. Deep Learning Models
3.4.1. The Proposed Arabic Sentiment Analysis Using CNN
3.4.2. The Proposed Arabic Sentiment Analysis Using LSTM
3.4.3. The Proposed Arabic Sentiment Analysis Using CNN-LSTM
4. Experimental Results
4.1. Experimental Setup
4.2. Evaluation Criteria
5. Results and Discussions
5.1. The Proposed CNN Model Experiments Results
5.2. Experiment Results for the Proposed LSTM Model
5.3. The Proposed CNN-LSTM Hybrid Model Experimentation Results
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Liu, B. Sentiment analysis and opinion mining. In Synthesis Lectures on Human Language Technologies; Springer: Cham, Switzerland, 2012; Volume 5, pp. 1–167. [Google Scholar]
- Alsayat, A.; Elmitwally, N. A comprehensive study for Arabic Sentiment Analysis (challenges and applications). Egypt. Inform. J. 2020, 21, 7–12. [Google Scholar] [CrossRef]
- Al-Bayati, A.Q.; Al-Araji, A.S.; Ameen, S.H. Arabic Sentiment Analysis (ASA) using deep learning approach. J. Eng. 2020, 26, 85–93. [Google Scholar] [CrossRef]
- Ombabi, A.H.; Ouarda, W.; Alimi, A.M. Deep learning CNN–LSTM framework for Arabic Sentiment Analysis using textual information shared in social networks. Soc. Netw. Anal. Min. 2020, 10, 53. [Google Scholar] [CrossRef]
- Omara, E.; Mosa, M.; Ismail, N. Deep convolutional network for Arabic Sentiment Analysis. In Proceedings of the 2018 International Japan-Africa Conference on Electronics, Communications and Computations (JAC-ECC), Alexandria, Egypt, 17–19 December 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 155–159. [Google Scholar]
- Kolkur, S.; Dantal, G.; Mahe, R. Study of different levels for sentiment analysis. Int. J. Curr. Eng. Technol. 2015, 5, 768–770. [Google Scholar]
- Balaji, P.; Nagaraju, O.; Haritha, D. Levels of sentiment analysis and its challenges: A literature review. In Proceedings of the 2017 International Conference on Big Data Analytics and Computational Intelligence (ICBDAC), Chirala, India, 23–25 March 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 436–439. [Google Scholar]
- Alowaidi, S.; Saleh, M.; Abulnaja, O. Semantic sentiment analysis of Arabic texts. Int. J. Adv. Comput. Sci. Appl. 2017, 8, 256–262. [Google Scholar] [CrossRef] [Green Version]
- Alayba, A.M.; Palade, V.; England, M.; Iqbal, R. A combined CNN and LSTM model for Arabic Sentiment Analysis. In Proceedings of the International Cross-Domain Conference for Machine Learning and Knowledge Extraction, Hamburg, Germany, 27–30 August 2018; Springer: Hamburg, Germany, 2018; pp. 179–191. [Google Scholar]
- Dashtipour, K.; Gogate, M.; Adeel, A.; Larijani, H.; Hussain, A. Sentiment analysis of persian movie reviews using deep learning. Entropy 2021, 23, 596. [Google Scholar] [CrossRef] [PubMed]
- Ain, Q.T.; Ali, M.; Riaz, A.; Noureen, A.; Kamran, M.; Hayat, B.; Rehman, A. Sentiment analysis using deep learning techniques: A review. Int. J. Adv. Comput. Sci. Appl. 2017, 8, 424–433. [Google Scholar]
- Jhaveri, R.H.; Revathi, A.; Ramana, K.; Raut, R.; Dhanaraj, R.K. A review on machine learning strategies for real-world engineering applications. Mob. Inf. Syst. 2022, 2022, 1833507. [Google Scholar] [CrossRef]
- Varone, G.; Gasparini, S.; Ferlazzo, E.; Ascoli, M.; Tripodi, G.G.; Zucco, C.; Calabrese, B.; Cannataro, M.; Aguglia, U. A comprehensive machine-learning-based software pipeline to classify EEG signals: A case study on PNES vs. control subjects. Sensors 2020, 20, 1235. [Google Scholar] [CrossRef] [Green Version]
- Varone, G.; Ieracitano, C.; Çiftçioğlu, A.Ö.; Hussain, T.; Gogate, M.; Dashtipour, K.; Al-Tamimi, B.N.; Almoamari, H.; Akkurt, I.; Hussain, A. A Novel Hierarchical Extreme Machine-Learning-Based Approach for Linear Attenuation Coefficient Forecasting. Entropy 2023, 25, 253. [Google Scholar] [CrossRef]
- Al-Azani, S.; El-Alfy, E.S.M. Hybrid deep learning for sentiment polarity determination of Arabic microblogs. In Proceedings of the International Conference on Neural Information Processing, Guangzhou, China, 14–18 November 2017; Springer: Cham, Switzerland, 2017; pp. 491–500. [Google Scholar]
- Kalchbrenner, N.; Grefenstette, E.; Blunsom, P. A convolutional neural network for modelling sentences. arXiv 2014, arXiv:1404.2188. [Google Scholar]
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
- Baly, R.; El-Khoury, G.; Moukalled, R.; Aoun, R.; Hajj, H.; Shaban, K.B.; El-Hajj, W. Comparative evaluation of sentiment analysis methods across Arabic dialects. Procedia Comput. Sci. 2017, 117, 266–273. [Google Scholar] [CrossRef]
- Zahidi, Y.; El Younoussi, Y.; Al-Amrani, Y. A powerful comparison of deep learning frameworks for Arabic Sentiment Analysis. Int. J. Electr. Comput. Eng. 2021, 11, 745–752. [Google Scholar] [CrossRef]
- Nassif, A.B.; Elnagar, A.; Shahin, I.; Henno, S. Deep learning for Arabic subjective sentiment analysis: Challenges and research opportunities. Appl. Soft Comput. 2021, 98, 106836. [Google Scholar] [CrossRef]
- Rudkowsky, E.; Haselmayer, M.; Wastian, M.; Jenny, M.; Emrich, Š.; Sedlmair, M. More than bags of words: Sentiment analysis with word embeddings. Commun. Methods Meas. 2018, 12, 140–157. [Google Scholar] [CrossRef] [Green Version]
- Elnagar, A.; Khalifa, Y.S.; Einea, A. Hotel Arabic-reviews dataset construction for sentiment analysis applications. In Intelligent Natural Language Processing: Trends and Applications; Springer: Cham, Switzerland, 2018; pp. 35–52. [Google Scholar]
- Aly, M.; Atiya, A. Labr: A large scale arabic book reviews dataset. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, 5–7 August 2013; Volume 2, pp. 494–498. [Google Scholar]
- Heikal, M.; Torki, M.; El-Makky, N. Sentiment analysis of Arabic tweets using deep learning. Procedia Comput. Sci. 2018, 142, 114–122. [Google Scholar] [CrossRef]
- Alahmary, R.M.; Al-Dossari, H.Z.; Emam, A.Z. Sentiment analysis of Saudi dialect using deep learning techniques. In Proceedings of the 2019 International Conference on Electronics, Information, and Communication (ICEIC), Auckland, New Zealand, 22–25 January 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6. [Google Scholar]
- Baly, R.; Hajj, H.; Habash, N.; Shaban, K.B.; El-Hajj, W. A sentiment treebank and morphologically enriched recursive deep models for effective sentiment analysis in arabic. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2017, 16, 1–21. [Google Scholar] [CrossRef]
- Al Sallab, A.; Hajj, H.; Badaro, G.; Baly, R.; El-Hajj, W.; Shaban, K. Deep learning models for sentiment analysis in Arabic. In Proceedings of the Second Workshop on Arabic Natural Language Processing, Beijing, China, 30 July 2015; pp. 9–17. [Google Scholar]
- Al-Sallab, A.; Baly, R.; Hajj, H.; Shaban, K.B.; El-Hajj, W.; Badaro, G. Aroma: A recursive deep learning model for opinion mining in arabic as a low resource language. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2017, 16, 1–20. [Google Scholar] [CrossRef]
- AlSurayyi, W.I.; Alghamdi, N.S.; Abraham, A. Deep Learning with Word Embedding Modeling for a Sentiment Analysis of Online Reviews. Int. J. Comput. Inf. Syst. Ind. Manag. Appl. 2019, 11, 227–241. [Google Scholar]
- Alayba, A.M.; Palade, V.; England, M.; Iqbal, R. Arabic language sentiment analysis on health services. In Proceedings of the 2017 1st International Workshop on Arabic Script Analysis and Recognition (ASAR), Nancy, France, 3–5 April 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 114–118. [Google Scholar]
- Al-Azani, S.; El-Alfy, E.S.M. Using word embedding and ensemble learning for highly imbalanced data sentiment analysis in short arabic text. Procedia Comput. Sci. 2017, 109, 359–366. [Google Scholar] [CrossRef]
- Al-Azani, S.; El-Alfy, E.S. Emojis-based sentiment classification of Arabic microblogs using deep recurrent neural networks. In Proceedings of the 2018 International Conference on Computing Sciences and Engineering (ICCSE), Kuwait City, Kuwait, 11–13 March 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–6. [Google Scholar]
- Al-Laith, A.; Shahbaz, M.; Alaskar, H.F.; Rehmat, A. Arasencorpus: A semi-supervised approach for sentiment annotation of a large arabic text corpus. Appl. Sci. 2021, 11, 2434. [Google Scholar] [CrossRef]
- Oussous, A.; Benjelloun, F.Z.; Lahcen, A.A.; Belfkih, S. ASA: A framework for Arabic Sentiment Analysis. J. Inf. Sci. 2020, 46, 544–559. [Google Scholar] [CrossRef]
- Dahou, A.; Elaziz, M.A.; Zhou, J.; Xiong, S. Arabic sentiment classification using convolutional neural network and differential evolution algorithm. Comput. Intell. Neurosci. 2019, 2019, 2537689. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Altaher, A. Hybrid approach for sentiment analysis of Arabic tweets based on deep learning model and features weighting. Int. J. Adv. Appl. Sci 2017, 4, 43–49. [Google Scholar] [CrossRef]
- Saeed, R.M.; Rady, S.; Gharib, T.F. Optimizing sentiment classification for Arabic opinion texts. Cogn. Comput. 2021, 13, 164–178. [Google Scholar] [CrossRef]
- Addi, H.A.; Ezzahir, R.; Mahmoudi, A. Three-level binary tree structure for sentiment classification in Arabic text. In Proceedings of the 3rd International Conference on Networking, Information Systems & Security, Marrakech, Morocco, 31 March–2 April 2020; pp. 1–8. [Google Scholar]
- Muaad, A.Y.; Jayappa, H.; Al-antari, M.A.; Lee, S. ArCAR: A novel deep learning computer-aided recognition for character-level Arabic text representation and recognition. Algorithms 2021, 14, 216. [Google Scholar] [CrossRef]
- Mhamed, M.; Sutcliffe, R.; Sun, X.; Feng, J.; Almekhlafi, E.; Retta, E.A. A Deep CNN Architecture with Novel Pooling Layer Applied to Two Sudanese Arabic Sentiment Datasets. arXiv 2022, arXiv:2201.12664. [Google Scholar]
- Nassif, A.B.; Darya, A.M.; Elnagar, A. Empirical evaluation of shallow and deep learning classifiers for Arabic Sentiment Analysis. Trans. Asian Low-Resour. Lang. Inf. Process. 2021, 21, 1–25. [Google Scholar] [CrossRef]
- Al Shboul, B.; Al-Ayyoub, M.; Jararweh, Y. Multi-way sentiment classification of arabic reviews. In Proceedings of the 2015 6th International Conference on Information and Communication Systems (ICICS), Amman, Jordan, 7–9 April 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 206–211. [Google Scholar]
- Elnagar, A. Investigation on sentiment analysis for Arabic reviews. In Proceedings of the 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA), Agadir, Morocco, 29 November–2 December 2016; IEEE: Piscataway, NJ, USA, 2016. [Google Scholar]
- Aliane, A.; Aliane, H.; Ziane, M.; Bensaou, N. A genetic algorithm feature selection based approach for Arabic sentiment classification. In Proceedings of the 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA), Agadir, Morocco, 29 November–2 December 2016; IEEE: Piscataway, NJ, USA, 2016. [Google Scholar]
- Barhoumi, A.; Estève, Y.; Aloulou, C.; Belguith, L. Document embeddings for Arabic Sentiment Analysis. In Proceedings of the Conference on Language Processing and Knowledge Management, LPKM, Kerkennah, Tunisia, 8–10 September 2017. [Google Scholar]
- Al-Saqqa, S.; Obeid, N.; Awajan, A. Sentiment analysis for Arabic text using ensemble learning. In Proceedings of the 2018 IEEE/ACS 15th International Conference on Computer Systems and Applications (AICCSA), Aqaba, Jordan, 28 October–1 November 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–7. [Google Scholar]
- Al-Ayyoub, M.; Nuseir, A.; Kanaan, G.; Al-Shalabi, R. Hierarchical classifiers for multi-way sentiment analysis of arabic reviews. Int. J. Adv. Comput. Sci. Appl. 2016, 7, 531–539. [Google Scholar] [CrossRef] [Green Version]
- Elzayady, H.; Badran, K.M.; Salama, G.I. Arabic Opinion Mining Using Combined CNN-LSTM Models. Int. J. Intell. Syst. Appl. 2020, 12, 25–36. [Google Scholar] [CrossRef]
- Abu Kwaik, K.; Saad, M.; Chatzikyriakidis, S.; Dobnik, S. LSTM-CNN deep learning model for sentiment analysis of dialectal Arabic. In Proceedings of the International Conference on Arabic Language Processing, Nancy, France, 16–17 October 2019; Springer: Cham, Switzerland, 2019; pp. 108–121. [Google Scholar]
- Nouhaila, B.; Habib, A.; Abdellah, A.; El Farouk Abdelhamid, I. Arabic sentiment analysis based on 1-D convolutional neural network. In Proceedings of the Third International Conference on Smart City Applications, Karabuk, Turkey, 7–9 October 2020; Springer: Cham, Switzerland, 2021; pp. 44–55. [Google Scholar]
- Al-Dabet, S.; Tedmori, S. Sentiment Analysis for Arabic Language using Attention-Based Simple Recurrent Unit. In Proceedings of the 2019 2nd International Conference on new Trends in Computing Sciences (ICTCS), Amman, Jordan, 9–11 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6. [Google Scholar]
- Naqvi, U.; Majid, A.; Abbas, S.A. UTSA: Urdu text sentiment analysis using deep learning methods. IEEE Access 2021, 9, 114085–114094. [Google Scholar] [CrossRef]
- Kapočiūtė-Dzikienė, J.; Damaševičius, R.; Woźniak, M. Sentiment analysis of lithuanian texts using traditional and deep learning approaches. Computers 2019, 8, 4. [Google Scholar] [CrossRef] [Green Version]
- Divyapushpalakshmi, M.; Ramalakshmi, R. An efficient sentimental analysis using hybrid deep learning and optimization technique for Twitter using parts of speech (POS) tagging. Int. J. Speech Technol. 2021, 24, 329–339. [Google Scholar] [CrossRef]
- Vasili, R.; Xhina, E.; Ninka, I.; Terpo, D. Sentiment Analysis on Social Media for Albanian Language. Open Access Libr. J. 2021, 8, 1–31. [Google Scholar] [CrossRef]
- Darwish, K.; Magdy, W. Arabic information retrieval. Found. Trends® Inf. Retr. 2014, 7, 239–342. [Google Scholar] [CrossRef] [Green Version]
- Darwish, K.; Magdy, W.; Mourad, A. Language processing for arabic microblog retrieval. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, Maui, HI, USA, 29 October–2 November 2012; pp. 2427–2430. [Google Scholar]
- Terechshenko, Z.; Linder, F.; Padmakumar, V.; Liu, M.; Nagler, J.; Tucker, J.A.; Bonneau, R. A comparison of methods in political science text classification: Transfer learning language models for politics. SSRN Electron. J. 2020, 1–25. [Google Scholar] [CrossRef]
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781. [Google Scholar]
- Pennington, J.; Socher, R.; Manning, C.D. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543. [Google Scholar]
- Bojanowski, P.; Grave, E.; Joulin, A.; Mikolov, T. Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 2017, 5, 135–146. [Google Scholar] [CrossRef] [Green Version]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
- Omara, E.; Mosa, M.; Ismail, N. Applying Recurrent Networks For Arabic Sentiment Analysis. Menoufia J. Electron. Eng. Res. 2022, 31, 21–28. [Google Scholar] [CrossRef]
- Sivakumar, S.; Videla, L.S.; Kumar, T.R.; Nagaraj, J.; Itnal, S.; Haritha, D. Review on Word2Vec Word Embedding Neural Net. In Proceedings of the 2020 International Conference on Smart Electronics and Communication (ICOSEC), Trichy, India, 10–12 September 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 282–290. [Google Scholar]
- Khalid, U.; Hussain, A.; Arshad, M.U.; Shahzad, W.; Beg, M.O. Co-occurrences using Fasttext embeddings for word similarity tasks in Urdu. arXiv 2021, arXiv:2102.10957. [Google Scholar]
- Cliche, M. BB_twtr at SemEval-2017 task 4: Twitter sentiment analysis with CNNs and LSTMs. arXiv 2017, arXiv:1704.06125. [Google Scholar]
- Joulin, A.; Grave, E.; Bojanowski, P.; Douze, M.; Jégou, H.; Mikolov, T. Fasttext. zip: Compressing text classification models. arXiv 2016, arXiv:1612.03651. [Google Scholar]
- Li, L.; Jamieson, K.; DeSalvo, G.; Rostamizadeh, A.; Talwalkar, A. Hyperband: A novel bandit-based approach to hyperparameter optimization. J. Mach. Learn. Res. 2017, 18, 6765–6816. [Google Scholar]
- Ahmed, R.; Gogate, M.; Tahir, A.; Dashtipour, K.; Al-Tamimi, B.; Hawalah, A.; El-Affendi, M.A.; Hussain, A. Novel deep convolutional neural network-based contextual recognition of Arabic handwritten scripts. Entropy 2021, 23, 340. [Google Scholar] [CrossRef]
- Rani, S.; Kumar, P. Deep learning based sentiment analysis using convolution neural network. Arab. J. Sci. Eng. 2019, 44, 3305–3314. [Google Scholar] [CrossRef]
- Cheng, Y.; Sun, H.; Chen, H.; Li, M.; Cai, Y.; Cai, Z.; Huang, J. Sentiment analysis using multi-head attention capsules with multi-channel CNN and bidirectional GRU. IEEE Access 2021, 9, 60383–60395. [Google Scholar] [CrossRef]
- Shickel, B.; Tighe, P.J.; Bihorac, A.; Rashidi, P. Deep EHR: A survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE J. Biomed. Health Inform. 2017, 22, 1589–1604. [Google Scholar] [CrossRef]
- Minaee, S.; Azimi, E.; Abdolrashidi, A. Deep-sentiment: Sentiment analysis using ensemble of cnn and bi-lstm models. arXiv 2019, arXiv:1904.04206. [Google Scholar]
- Yue, W.; Li, L. Sentiment analysis using Word2vec-CNN-BiLSTM classification. In Proceedings of the 2020 Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS), Paris, France, 14–16 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–5. [Google Scholar]
- Rehman, A.U.; Malik, A.K.; Raza, B.; Ali, W. A hybrid CNN-LSTM model for improving accuracy of movie reviews sentiment analysis. Multimed. Tools Appl. 2019, 78, 26597–26613. [Google Scholar] [CrossRef]
- Jain, P.K.; Saravanan, V.; Pamula, R. A hybrid CNN-LSTM: A deep learning approach for consumer sentiment analysis using qualitative user-generated contents. Trans. Asian Low-Resour. Lang. Inf. Process. 2021, 20, 1–15. [Google Scholar] [CrossRef]
- Obeid, O.; Zalmout, N.; Khalifa, S.; Taji, D.; Oudah, M.; Alhafni, B.; Inoue, G.; Eryani, F.; Erdmann, A.; Habash, N. CAMeL tools: An open source python toolkit for Arabic natural language processing. In Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, 11–16 May 2020; pp. 7022–7032. [Google Scholar]
- Poria, S.; Cambria, E.; Howard, N.; Huang, G.B.; Hussain, A. Fusing audio, visual and textual clues for sentiment analysis from multimodal content. Neurocomputing 2016, 174, 50–59. [Google Scholar] [CrossRef]
- Poria, S.; Chaturvedi, I.; Cambria, E.; Hussain, A. Convolutional MKL based multimodal emotion recognition and sentiment analysis. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), Barcelona, Spain, 12–15 December 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 439–448. [Google Scholar]
- Poria, S.; Majumder, N.; Hazarika, D.; Cambria, E.; Gelbukh, A.; Hussain, A. Multimodal sentiment analysis: Addressing key issues and setting up the baselines. IEEE Intell. Syst. 2018, 33, 17–25. [Google Scholar] [CrossRef] [Green Version]
- Poria, S.; Peng, H.; Hussain, A.; Howard, N.; Cambria, E. Ensemble application of convolutional neural networks and multiple kernel learning for multimodal sentiment analysis. Neurocomputing 2017, 261, 217–230. [Google Scholar] [CrossRef]
Ref. | Extractor | Deep Learning | Dataset | Accuracy (%) |
---|---|---|---|---|
[4] | FastText | CNN/LSTM | Multi-domain sentiment corpus | 90.75% |
[24] | AraVec | CNN/LSTM | ASTD | 65.05% |
[25] | Word2Vec | LSTM/Bi-LSTM | Arabic tweets | 94% |
Baly et al. (2017) [26] | ARSENTB | RNTN | Arabic Corpus | 81% |
[27] | BOW | DNN/DBN/Deep Auto Encoders | LDC ATB | 74.3% |
[28] | ArSenL | The recursive deep learning model | ATB/Tweets /QALB | 86.5%/79.2% /76.9% |
[29] | GloVe | LSTM/Bi-LSTM | Restaurant reviews | 95.76% binary /64.03% multi |
[9] | 5-g | CNN LSTM | Main-AHS /Sub-AHS /Ar-Twitter /ASTD | 94.2% /95.7% /88.1% /77.6% |
[30] | Word2Vec | DNN CNN | Health Tweets | 91.87% |
[31] | Word2Vec/CBOW; Word2Vec/SG | CNN/LSTM | ASTD/ ArTwitter | 81.63%/ 87.27% |
[32] | Several types | LSTM/Bi-LSTM GRU/Bi-GRU | Various datasets | 78.71% |
[33] | FastText | LSTM | SemEval 2017; ASTD | 87.4%/85.2% |
[34] | Unigrams | CNN LSTM | Arabic tweets | 92.1%/90% |
[5] | TF TF-IDF | CNN | Constructed dataset | 94.33% |
[35] | Word2Vec | CNN | Various datasets | 93.28% |
[36] | TF-IDF | H2O Deep Learning model | Arabic tweets | 90% |
Language: Urdu | ||||
Naqvi et al. (2021) [52] | SAMAR/ FastText/ coNLL | LSTM/BiLSTM-ATT/CNN/C- LSTM | Constructed dataset | 77.9% |
Language: Persian | ||||
Dashtipour et al. (2021) [10] | fastText | CNN/LSTM | Movies review | 95.61% |
Language: Lithuanian | ||||
Kapočiūtė-Dzikienė et al. (2019) [53] | Word2Vec/ fastText | CNN/LSTM | Lithuanian Internet comments | 70.6% |
Language: English | ||||
Divyapushpalakshmi et al. (2021) [54] | semantic and syntactic functions | ANN | SemEVAL-2017 | 92.0% |
Language: Albanian | ||||
Vasili et al. (2021) [55] | BOW/TF-IDF/ Word2Vec/GloVe | LSTM/CNN | Tweets in Albanian | 79.2% |
Data Set | Positive | Negative | Neutral | Total |
---|---|---|---|---|
LABR/2classes/imbalance | 42,832 | 8224 | None | 63,257 |
LABR/3classes/imbalance | 42,832 | 8224 | 12,201 | 63,257 |
HARD/2classes/Balance | 52,849 | 52,849 | None | 105,698 |
HARD/2classes/imbalance | 276,387 | 52,849 | None | 409,562 |
HARD/3classes/imbalance | 276,387 | 52,849 | 80,326 | 409,562 |
Word Embedding | Size | Min Count | Window | Workers | SG |
---|---|---|---|---|---|
Word2Vec | 300 | 1 | 5 | 4 | 0 |
FastText | 300 | 1 | 40 | 4 | 1 |
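The column names in the table above match the keyword arguments of the gensim library, so the settings could be passed to it roughly as follows. That the embeddings were trained with gensim is an assumption here, as are the function name `train_embeddings` and its input format (a list of token lists). The keyword `vector_size` is the gensim 4.x name for what the table calls Size (older releases used `size`); `sg=0` selects CBOW and `sg=1` selects skip-gram.

```python
# Settings copied from the table above; keyword names follow gensim >= 4.0.
WORD2VEC_PARAMS = {"vector_size": 300, "min_count": 1, "window": 5, "workers": 4, "sg": 0}
FASTTEXT_PARAMS = {"vector_size": 300, "min_count": 1, "window": 40, "workers": 4, "sg": 1}


def train_embeddings(tokenized_reviews):
    """Train both embedding models on a list of token lists (hypothetical helper)."""
    # Imported lazily so the settings above can be inspected without gensim installed.
    from gensim.models import FastText, Word2Vec

    word2vec = Word2Vec(sentences=tokenized_reviews, **WORD2VEC_PARAMS)
    fasttext = FastText(sentences=tokenized_reviews, **FASTTEXT_PARAMS)
    return word2vec, fasttext
```

With `min_count` set to 1, every token seen in the corpus receives a vector, including rare misspellings, which keeps coverage high at the cost of noisier vectors for infrequent words.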
Hyperparameter | Value |
---|---|
Batch size | 128 |
Embedding size | 300 |
Embedding model | Word2Vec/CBOW-FastText/SG |
Embedding Trainable | True |
Filters | 16,32 |
Epochs | 50 |
Kernel Size | [3] |
Pool size | [2,3] |
Verbose | 1 |
Optimizer | RMSprop |
Activation function | Softmax, Sigmoid |
Dropout | 0.25, 0.10, 0.15, 0.5 |
Validation split | 0.2 |
Shuffle | True |
Data Sets | Word Embedding | Precision | Recall | F-Score | Accuracy |
---|---|---|---|---|---|
HARD/2Balance | Word2Vec | 90.41% | 90.34% | 90.37% | 90.36% |
HARD/2Balance | FastText | 90.35% | 90.34% | 90.34% | 90.34% |
HARD/2Imbalance | Word2Vec | 92.43% | 87.43% | 89.86% | 94.68% |
HARD/2Imbalance | FastText | 92.76% | 87.08% | 89.83% | 94.69% |
HARD/3Imbalance | Word2Vec | 84.00% | 75.78% | 79.78% | 86.37% |
HARD/3Imbalance | FastText | 84.24% | 76.20% | 80.02% | 86.46% |
LABR/2Imbalance | Word2Vec | 80.08% | 61.57% | 69.61% | 86.33% |
LABR/2Imbalance | FastText | 76.44% | 63.33% | 69.27% | 86.17% |
LABR/3Imbalance | Word2Vec | 56.23% | 43.71% | 49.19% | 69.72% |
LABR/3Imbalance | FastText | 58.29% | 41.34% | 48.38% | 69.54% |
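As a quick consistency check on the table above, the F-Score column can be recomputed from the Precision and Recall columns. The sketch below assumes the standard definition of the F-score as the harmonic mean of precision and recall; the function name `f_score` is chosen here for illustration. The small residual differences are consistent with the reported values having been computed from unrounded precision and recall.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (any consistent unit, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) taken from two Word2Vec rows of the table.
rows = {
    "HARD/2Balance": (90.41, 90.34, 90.37),
    "LABR/2Imbalance": (80.08, 61.57, 69.61),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: recomputed {f_score(p, r):.3f}, reported {reported:.2f}")
```

On the imbalanced LABR rows, accuracy is much higher than the F-score because recall is low, which is why accuracy alone can overstate performance on imbalanced data.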
Data Sets | Word Embedding | Precision | Recall | F-Score | Accuracy |
---|---|---|---|---|---|
HARD/2Balance | Word2Vec | 90.28% | 90.20% | 90.24% | 90.20% |
HARD/2Balance | FastText | 90.24% | 90.19% | 90.22% | 90.17% |
HARD/2Imbalance | Word2Vec | 92.86% | 86.56% | 89.60% | 94.58% |
HARD/2Imbalance | FastText | 93.07% | 86.58% | 89.71% | 94.63% |
HARD/3Imbalance | Word2Vec | 83.68% | 76.24% | 79.79% | 86.33% |
HARD/3Imbalance | FastText | 84.44% | 76.19% | 80.10% | 86.56% |
LABR/2Imbalance | Word2Vec | 77.75% | 62.77% | 69.46% | 85.85% |
LABR/2Imbalance | FastText | 79.91% | 61.55% | 69.54% | 86.36% |
LABR/3Imbalance | Word2Vec | 50.69% | 41.12% | 45.41% | 69.35% |
LABR/3Imbalance | FastText | 58.70% | 43.71% | 50.11% | 69.28% |
Data Sets | Word Embedding | Precision | Recall | F-Score | Accuracy |
---|---|---|---|---|---|
HARD/2Balance | Word2Vec | 90.28% | 90.24% | 90.26% | 90.23% |
HARD/2Balance | FastText | 90.47% | 90.36% | 90.42% | 90.38% |
HARD/2Imbalance | Word2Vec | 92.53% | 87.07% | 89.72% | 94.63% |
HARD/2Imbalance | FastText | 92.16% | 86.93% | 89.47% | 94.54% |
HARD/3Imbalance | Word2Vec | 84.52% | 76.09% | 80.08% | 86.61% |
HARD/3Imbalance | FastText | 83.82% | 76.49% | 79.99% | 86.50% |
LABR/2Imbalance | Word2Vec | 78.21% | 62.50% | 69.48% | 86.00% |
LABR/2Imbalance | FastText | 77.12% | 62.35% | 68.95% | 86.05% |
LABR/3Imbalance | Word2Vec | 45.85% | 39.84% | 42.63% | 68.64% |
LABR/3Imbalance | FastText | 54.09% | 40.92% | 46.59% | 69.09% |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Elhassan, N.; Varone, G.; Ahmed, R.; Gogate, M.; Dashtipour, K.; Almoamari, H.; El-Affendi, M.A.; Al-Tamimi, B.N.; Albalwy, F.; Hussain, A. Arabic Sentiment Analysis Based on Word Embeddings and Deep Learning. Computers 2023, 12, 126. https://doi.org/10.3390/computers12060126