Spatial Impressions Monitoring during COVID-19 Pandemic Using Machine Learning Techniques
Abstract
1. Introduction
- We analyze Arabic Twitter posts to identify people’s impressions of the COVID-19 pandemic, helping the government understand public perceptions and make the required decisions based on them.
- The sentiment polarity patterns across different spatial zones are observed using geotag data.
- We examine various machine learning models and deep learning techniques to understand public behavior and attitudes. SVM is found to be more accurate than the other methods for analyzing and monitoring COVID-19 opinions.
2. Related Work
3. Methodology
3.1. Pre-Processing
3.1.1. Clean Text
3.1.2. Filtering Tweets Location
3.1.3. TF-IDF Vectorizer
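As a rough illustration of the weighting named in this subsection, the sketch below computes TF-IDF scores over word n-grams for a toy corpus. It assumes raw term counts as the term frequency and one common smoothed inverse document frequency, idf(t) = ln((1 + N) / (1 + df(t))) + 1; the exact variant, tokenizer, and n-gram settings used in the study are not stated here. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n_max):
    """Return all word n-grams of length 1..n_max as space-joined strings."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs, n_max=1):
    """Compute smoothed TF-IDF weights per document (assumed formulation)."""
    term_counts = [Counter(ngrams(doc.split(), n_max)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())
    weights = []
    for counts in term_counts:
        weights.append({
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in counts.items()
        })
    return weights


# Toy corpus (hypothetical, not taken from the study's data set).
docs = [
    "wear masks outside",
    "wear masks and keep distance",
    "global epidemic news",
]
weights = tfidf(docs, n_max=2)
```

Terms that occur in fewer documents receive a larger weight, so in this toy corpus "outside" outweighs "wear" within the first document.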
3.2. Learning
3.3. Classification
4. Experimental Evaluation
4.1. Data Collection
4.2. Performance Evaluation
4.3. Machine Learning Models
4.4. LSTM Model
- False Negative (FN): the model predicted a negative outcome, but the prediction was incorrect. For a given class, the FN value is the sum of the values in that class's row of the confusion matrix, excluding the TP value.
- False Positive (FP): the model predicted a positive outcome, but the prediction was incorrect. For a given class, the FP value is the sum of the values in that class's column, excluding the TP value.
- True Negative (TN): the model correctly predicted a negative outcome. For a given class, the TN value is the sum of all values outside that class's row and column.
- True Positive (TP): the model correctly predicted a positive outcome. For a given class, the TP value is the cell in which the actual and predicted classes are the same.
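The four definitions above can be sketched for a multi-class confusion matrix as follows. The sketch assumes rows hold the actual classes and columns the predicted classes, which matches the row and column wording of the definitions. The function names and the example matrix are hypothetical and do not reproduce the study's results.

```python
def per_class_counts(matrix, k):
    """Return (TP, FP, FN, TN) for class index k.

    matrix[i][j] is the number of samples whose actual class is i
    and whose predicted class is j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]                               # diagonal cell
    fn = sum(matrix[k]) - tp                        # rest of row k
    fp = sum(matrix[i][k] for i in range(n)) - tp   # rest of column k
    tn = total - tp - fn - fp                       # outside row k and column k
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Derive precision, recall, and F1 from the per-class counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical matrix; class order: neutral, negative, positive.
matrix = [
    [50, 8, 2],
    [6, 40, 4],
    [5, 5, 10],
]
tp, fp, fn, tn = per_class_counts(matrix, 2)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```

For the positive class in this toy matrix, the counts are TP = 10, FP = 6, FN = 10, and TN = 104, giving a precision of 0.625 and a recall of 0.5.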
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Word in Arabic | Word in English |
---|---|
كورونا | Corona |
التباعد الاجتماعي | Social distancing |
كوفيد 19 | COVID-19 |
منظمة الصحة العالمية | World Health Organization |
لبس الكمامات | Wear masks |
تعقيم اليدين | Hand sanitization |
وباء عالمي | A global epidemic |

Country (Arabic Twitter) | Positive | Neutral | Negative | Total | Share of total (≈%) |
---|---|---|---|---|---|
Saudi Arabia | 325 | 1822 | 1903 | 4050 | 37% |
Egypt | 248 | 1398 | 1459 | 3105 | 28% |
Jordan | 119 | 668 | 698 | 1485 | 13% |
UAE | 76 | 425 | 444 | 945 | 9% |
Palestine | 65 | 365 | 380 | 810 | 7% |
Algeria | 54 | 304 | 317 | 675 | 6% |
Total | 887 | 4982 | 5201 | 11,070 | 100% |
Share of total (≈%) | 8% | 45% | 47% | — | 100% |
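The percentage row and column of the table above can be recomputed from the raw counts. The snippet below is a small check, assuming the percentages are simple shares of the 11,070 collected tweets rounded to the nearest integer; the variable and function names are hypothetical.

```python
# Tweet counts per country as (positive, neutral, negative), from the table above.
counts = {
    "Saudi Arabia": (325, 1822, 1903),
    "Egypt": (248, 1398, 1459),
    "Jordan": (119, 668, 698),
    "UAE": (76, 425, 444),
    "Palestine": (65, 365, 380),
    "Algeria": (54, 304, 317),
}


def rounded_share(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to an integer."""
    return round(100 * part / whole)


country_totals = {name: sum(row) for name, row in counts.items()}
grand_total = sum(country_totals.values())

# Share of all tweets contributed by each country.
country_share = {
    name: rounded_share(total, grand_total)
    for name, total in country_totals.items()
}

# Share of all tweets in each sentiment class (positive, neutral, negative).
sentiment_totals = [sum(row[i] for row in counts.values()) for i in range(3)]
sentiment_share = [rounded_share(t, grand_total) for t in sentiment_totals]
```

Under that assumption, the recomputed shares agree with the table: about 8% of the tweets are positive, 45% neutral, and 47% negative.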

Classifier | N-Gram | Accuracy | Precision | Recall |
---|---|---|---|---|
SVC | 1 | 0.848 | 0.845 | 0.848 |
SVC | 2 | 0.854 | 0.851 | 0.854 |
SVC | 3 | 0.851 | 0.850 | 0.851 |
MultinomialNB | 1 | 0.808 | 0.810 | 0.808 |
MultinomialNB | 2 | 0.820 | 0.824 | 0.820 |
MultinomialNB | 3 | 0.824 | 0.830 | 0.824 |
BernoulliNB | 1 | 0.804 | 0.814 | 0.804 |
BernoulliNB | 2 | 0.822 | 0.830 | 0.822 |
BernoulliNB | 3 | 0.826 | 0.831 | 0.826 |
SGD | 1 | 0.788 | 0.781 | 0.788 |
SGD | 2 | 0.810 | 0.810 | 0.810 |
SGD | 3 | 0.807 | 0.803 | 0.807 |
Decision Tree | 1 | 0.672 | 0.730 | 0.672 |
Decision Tree | 2 | 0.678 | 0.732 | 0.678 |
Decision Tree | 3 | 0.656 | 0.771 | 0.656 |
Random Forest | 1 | 0.738 | 0.699 | 0.738 |
Random Forest | 2 | 0.695 | 0.766 | 0.695 |
Random Forest | 3 | 0.691 | 0.769 | 0.691 |
KNN | 1 | 0.797 | 0.807 | 0.797 |
KNN | 2 | 0.801 | 0.814 | 0.801 |
KNN | 3 | 0.786 | 0.795 | 0.786 |

Class | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
Neutral | 0.85 | 0.91 | 0.88 | 1851 |
Negative | 0.86 | 0.87 | 0.87 | 1903 |
Positive | 0.76 | 0.49 | 0.59 | 348 |
Accuracy | | | 0.85 | 4102 |
Macro avg | 0.83 | 0.75 | 0.78 | 4102 |
Weighted avg | 0.85 | 0.85 | 0.85 | 4102 |
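The macro and weighted averages in the report above differ only in how the per-class scores are combined: the macro average treats every class equally, while the weighted average weights each class by its support. The sketch below recomputes both from the two-decimal values in the table, so results may differ from the reported averages by about 0.01 because of rounding. The dictionary layout and function names are hypothetical.

```python
# Per-class scores copied from the report above (rounded to two decimals).
report = {
    "Neutral": {"precision": 0.85, "recall": 0.91, "f1": 0.88, "support": 1851},
    "Negative": {"precision": 0.86, "recall": 0.87, "f1": 0.87, "support": 1903},
    "Positive": {"precision": 0.76, "recall": 0.49, "f1": 0.59, "support": 348},
}


def macro_avg(report, metric):
    """Unweighted mean of a metric over all classes."""
    return sum(row[metric] for row in report.values()) / len(report)


def weighted_avg(report, metric):
    """Mean of a metric over all classes, weighted by class support."""
    total = sum(row["support"] for row in report.values())
    return sum(row[metric] * row["support"] for row in report.values()) / total


total_support = sum(row["support"] for row in report.values())
macro_f1 = macro_avg(report, "f1")
weighted_f1 = weighted_avg(report, "f1")
```

Because the positive class has both the lowest scores and the smallest support (348 of 4102 samples), it pulls the macro average down more than the weighted average.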

Class | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
Neutral | 0.55 | 0.93 | 0.69 | 760 |
Negative | 0.82 | 0.42 | 0.56 | 853 |
Positive | 0.64 | 0.15 | 0.25 | 163 |
Accuracy | | | 0.62 | 1776 |
Macro avg | 0.67 | 0.50 | 0.50 | 1776 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Noor, T.H.; Almars, A.; Gad, I.; Atlam, E.-S.; Elmezain, M. Spatial Impressions Monitoring during COVID-19 Pandemic Using Machine Learning Techniques. Computers 2022, 11, 52. https://doi.org/10.3390/computers11040052