Article

Location Analysis for Arabic COVID-19 Twitter Data Using Enhanced Dialect Identification Models

1 The Engineering Company for the Development of Digital Systems, Giza 12311, Egypt
2 Faculty of Computers and Artificial Intelligence, Cairo University, Giza 12613, Egypt
3 Faculty of Engineering, Cairo University, Giza 12613, Egypt
4 College of Computer Sciences and Information Technology, King Faisal University, AlAhsa 31982, Saudi Arabia
* Author to whom correspondence should be addressed.
Academic Editors: Anton Civit and Manuel Dominguez-Morales
Appl. Sci. 2021, 11(23), 11328; https://doi.org/10.3390/app112311328
Received: 8 October 2021 / Revised: 21 November 2021 / Accepted: 23 November 2021 / Published: 30 November 2021
The recent surge of social media networks has provided a channel for gathering and publishing vital medical and health information. The role of these networks becomes more prominent in periods of crisis, such as the COVID-19 pandemic, during which they have been the leading platform for broadcasting health news updates, precautionary instructions, and governmental procedures. They also provide an effective means of gathering public opinion and tracking breaking events and stories. Location-based analysis of social media input requires the users' location information, which is usually either missing or hidden. For some languages, such as Arabic, a user's location can be predicted from their dialect, because most Arab countries have their own local dialects. Natural Language Processing (NLP) techniques offer several approaches to dialect identification. Recent language models that use contextual word representations in a continuous space, such as BERT models, have significantly improved many NLP applications. In this work, we present our efforts to use BERT-based models to improve dialect identification for Arabic text. We report results for models developed to recognize the source Arab country, or Arab region, of Twitter data. Our results show a 3.4% absolute improvement in dialect identification accuracy at the regional level over the state-of-the-art result. When we excluded the Modern Standard Arabic (MSA) set, which is formal Arabic, we achieved a 3% absolute gain in accuracy across the three major Arabic dialects over the state-of-the-art level. Finally, we applied the developed models to a recently collected resource of COVID-19 Arabic tweets to recognize the source country from the users' tweets. We achieved a weighted average accuracy of 97.36%, which suggests that the models could serve as a tool for policymakers to support country-level disaster-related activities.
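The abstract reports a weighted average accuracy but does not define the weighting. As a minimal sketch, assuming the weights are the number of tweets per country, the metric could be computed as follows. The function name `weighted_average_accuracy`, the country codes, and the counts are hypothetical toy values for illustration only; they are not the paper's data or results.

```python
from typing import Dict, Tuple


def weighted_average_accuracy(per_class: Dict[str, Tuple[int, int]]) -> float:
    """Average per-class accuracy, weighted by each class's share of samples.

    per_class maps a label (e.g. a country code) to (correct, total).
    Assumption: weights are sample counts; the source does not specify this.
    """
    total = sum(n for _, n in per_class.values())
    if total == 0:
        raise ValueError("no samples to score")
    # Each class contributes (correct / n) scaled by its share (n / total).
    return sum((c / n) * (n / total) for c, n in per_class.values() if n > 0)


# Hypothetical toy counts: label -> (correctly classified tweets, total tweets).
toy_counts = {"EG": (90, 100), "SA": (45, 50), "MA": (40, 50)}
score = weighted_average_accuracy(toy_counts)
print(f"weighted average accuracy: {score:.3f}")
```

With sample-count weights, this value equals overall accuracy (total correct divided by total tweets); a different weighting scheme would give a different number.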
Keywords: BERT models; dialect identification; location analysis; language identification; social networks
MDPI and ACS Style

Essam, N.; Moussa, A.M.; Elsayed, K.M.; Abdou, S.; Rashwan, M.; Khatoon, S.; Hasan, M.M.; Asif, A.; Alshamari, M.A. Location Analysis for Arabic COVID-19 Twitter Data Using Enhanced Dialect Identification Models. Appl. Sci. 2021, 11, 11328. https://doi.org/10.3390/app112311328

AMA Style

Essam N, Moussa AM, Elsayed KM, Abdou S, Rashwan M, Khatoon S, Hasan MM, Asif A, Alshamari MA. Location Analysis for Arabic COVID-19 Twitter Data Using Enhanced Dialect Identification Models. Applied Sciences. 2021; 11(23):11328. https://doi.org/10.3390/app112311328

Chicago/Turabian Style

Essam, Nader, Abdullah M. Moussa, Khaled M. Elsayed, Sherif Abdou, Mohsen Rashwan, Shaheen Khatoon, Md. Maruf Hasan, Amna Asif, and Majed A. Alshamari. 2021. "Location Analysis for Arabic COVID-19 Twitter Data Using Enhanced Dialect Identification Models" Applied Sciences 11, no. 23: 11328. https://doi.org/10.3390/app112311328
