MDPI - Publisher of Open Access Journals

24 pages, 4913 KiB

Open Access Article

Region-Wise Recognition and Classification of Arabic Dialects and Vocabulary: A Deep Learning Approach

by Fawaz S. Al–Anzi and Bibin Shalini Sundaram Thankaleela

Appl. Sci. 2025, 15(12), 6516; https://doi.org/10.3390/app15126516 - 10 Jun 2025

Viewed by 603

This article presents an approach to Arabic dialect identification using a pre-trained speech classification model. The system uses 1D and 2D convolutional neural networks, trained on speech from diverse dialects across the Arab region, to classify Arabic audio clips into their respective dialects. Its objective is to enhance traditional linguistic handling and speech technology through accurate dialect classification. The methodology covers data collection, preprocessing, feature extraction, model architecture, and evaluation metrics. The algorithm distinguishes various Arabic dialects, such as A (Arab nation authorized dialectal), EGY (Egyptian Arabic), GLF (Gulf Arabic), LAV and LF (Levantine Arabic, spoken in Syria, Lebanon, and Jordan), MSA (Modern Standard Arabic), NOR (North African Arabic), and SA (Saudi Arabic). Experimental results show that the proposed approach identifies diverse Arabic dialects accurately, achieving a testing accuracy of 94.28% and a validation accuracy of 95.55%, and surpassing traditional machine learning models such as Random Forest and SVM as well as deep learning models such as CNN and CNN2D.

(This article belongs to the Special Issue Speech Recognition and Natural Language Processing)

16 pages, 2242 KiB

Open Access Article

Effective Data Augmentation Techniques for Arabic Speech Emotion Recognition Using Convolutional Neural Networks

by Wided Bouchelligua, Reham Al-Dayil and Areej Algaith

Appl. Sci. 2025, 15(4), 2114; https://doi.org/10.3390/app15042114 - 17 Feb 2025

Cited by 1 | Viewed by 1119

Abstract

This paper investigates the effectiveness of various data augmentation techniques for enhancing Arabic speech emotion recognition (SER) using convolutional neural networks (CNNs). Utilizing the Saudi Dialect and BAVED datasets, we address the challenges of limited and imbalanced data commonly found in Arabic SER. To improve model performance, we apply augmentation techniques such as noise addition, time shifting, increasing volume, and reducing volume. Additionally, we examine the optimal number of augmentations required to achieve the best results. Our experiments reveal that these augmentations significantly enhance the CNN’s ability to recognize emotions, with certain techniques proving more effective than others. Furthermore, the number of augmentations plays a critical role in balancing model accuracy. The Saudi Dialect dataset achieved its best results with two augmentations (increasing volume and decreasing volume), reaching an accuracy of 96.81%. Similarly, the BAVED dataset demonstrated optimal performance with a combination of three augmentations (noise addition, increasing volume, and reducing volume), achieving an accuracy of 92.60%. These findings indicate that carefully selected augmentation strategies can greatly improve the performance of CNN-based SER systems, particularly in the context of Arabic speech. This research underscores the importance of tailored augmentation techniques to enhance SER performance and sets a foundation for future advancements in this field. Full article

(This article belongs to the Special Issue Natural Language Processing: Novel Methods and Applications)
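
As a rough illustration of the four augmentations named in the abstract above (noise addition, time shifting, increasing volume, and reducing volume), the sketch below applies each one to a waveform held as a plain list of floats in [-1, 1]. It is not the authors' implementation: the function names, the noise level, the gain values, and the use of a circular shift are all assumptions chosen for illustration.

```python
import random


def add_noise(signal, noise_level=0.005, seed=0):
    """Add zero-mean Gaussian noise to every sample (noise_level is an assumed value)."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    return [s + rng.gauss(0.0, noise_level) for s in signal]


def time_shift(signal, shift):
    """Circularly shift the waveform by `shift` samples (positive = delay)."""
    if not signal:
        return []
    k = shift % len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


def change_volume(signal, gain):
    """Scale the amplitude; gain > 1 raises volume, gain < 1 lowers it.

    Samples are clipped to [-1, 1] so that a louder copy stays in range.
    """
    return [max(-1.0, min(1.0, s * gain)) for s in signal]


def augment(signal):
    """Return one augmented copy per technique (hypothetical parameter choices)."""
    return {
        "noise": add_noise(signal),
        "shift": time_shift(signal, shift=len(signal) // 10),
        "louder": change_volume(signal, gain=1.5),
        "quieter": change_volume(signal, gain=0.5),
    }
```

Each augmented copy would be added to the training set alongside the original clip; how many copies to add per clip is the tuning question the abstract describes.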

22 pages, 1750 KiB

Open Access Article

Attitudes Toward Dialectal Variations in Saudi Arabic: A Case Study of King Abdulaziz University Students

by Saeed Ali Al Alaslaa

Languages 2025, 10(1), 2; https://doi.org/10.3390/languages10010002 - 27 Dec 2024

Viewed by 2659

Abstract

The current study investigated the attitudes of 340 Saudi college students towards two Arabic dialectal variations, kaskasah and kaʃkaʃah, utilizing the matched-guise technique. Participants listened to recordings of a speaker using each variation and evaluated the speaker on various personality traits, regional origin, and hireability. The findings revealed generally positive attitudes towards both variations, with the majority associating the speaker with desirable traits such as humility, kindness, friendliness, and respectfulness. However, the kaskasah variation was perceived slightly more favorably overall compared to kaʃkaʃah. The study also found distinct regional associations, with kaskasah slightly more strongly linked to the Najdi dialect and kaʃkaʃah overwhelmingly associated with the Southern dialect. Notably, a considerable minority indicated that they would not hire speakers of these variations, particularly kaʃkaʃah, suggesting some degree of dialect-based bias. The study contributes to research on language attitudes in Saudi Arabia by highlighting the complex interplay between dialectal variation, regional identity, and social evaluation. The findings underscore the importance of promoting linguistic awareness and inclusivity to mitigate the negative effects of dialect-based stereotyping and bias. Full article

(This article belongs to the Special Issue Sociolinguistic Studies: Insights from Arabic)

18 pages, 871 KiB

Open Access Article

Advancing AI-Driven Linguistic Analysis: Developing and Annotating Comprehensive Arabic Dialect Corpora for Gulf Countries and Saudi Arabia

by Nouf Al-Shenaifi, Aqil M. Azmi and Manar Hosny

Mathematics 2024, 12(19), 3120; https://doi.org/10.3390/math12193120 - 5 Oct 2024

Cited by 1 | Viewed by 2854

Abstract

This study harnesses the linguistic diversity of Arabic dialects to create two expansive corpora from X (formerly Twitter). The Gulf Arabic Corpus (GAC-6) includes around 1.7 million tweets from six Gulf countries—Saudi Arabia, UAE, Qatar, Oman, Kuwait, and Bahrain—capturing a wide range of linguistic variations. The Saudi Dialect Corpus (SDC-5) comprises 790,000 tweets, offering in-depth insights into five major regional dialects of Saudi Arabia: Hijazi, Najdi, Southern, Northern, and Eastern, reflecting the complex linguistic landscape of the region. Both corpora are thoroughly annotated with dialect-specific seed words and geolocation data, achieving high levels of accuracy, as indicated by Cohen’s Kappa scores of 0.78 for GAC-6 and 0.90 for SDC-5. The annotation process leverages AI-driven techniques, including machine learning algorithms for automated dialect recognition and feature extraction, to enhance the granularity and precision of the data. These resources significantly contribute to the field of Arabic dialectology and facilitate the development of AI algorithms for linguistic data analysis, enhancing AI system design and efficiency. The data provided by this research are crucial for advancing AI methodologies, supporting diverse applications in the realm of next-generation AI technologies. Full article

(This article belongs to the Topic AI and Data-Driven Advancements in Industry 4.0)
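
The Cohen's Kappa scores quoted in the abstract above measure how far two annotators agree beyond what chance alone would produce: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement rate and p_e is the agreement expected from each annotator's label frequencies. The sketch below computes it for two label sequences. The function name and the toy labels are illustrative assumptions and are not taken from the corpora.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of the two annotators' marginal frequencies,
    # summed over labels.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_e == 1.0:  # both annotators used a single identical label throughout
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Toy example with hypothetical dialect labels for four tweets.
annotator_1 = ["Najdi", "Najdi", "Hijazi", "Hijazi"]
annotator_2 = ["Najdi", "Hijazi", "Hijazi", "Hijazi"]
kappa = cohens_kappa(annotator_1, annotator_2)  # p_o = 0.75, p_e = 0.5
```

In this toy case the annotators agree on three of four items, but half of that agreement is expected by chance, so kappa comes out at 0.5 rather than 0.75.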

24 pages, 22050 KiB

Open Access Article

SOD: A Corpus for Saudi Offensive Language Detection Classification

by Afefa Asiri and Mostafa Saleh

Computers 2024, 13(8), 211; https://doi.org/10.3390/computers13080211 - 20 Aug 2024

Viewed by 1753

Abstract

Social media platforms like X (formerly known as Twitter) are integral to modern communication, enabling the sharing of news, emotions, and ideas. However, they also facilitate the spread of harmful content, and manual moderation of these platforms is impractical. Automated moderation tools, predominantly developed for English, are insufficient for addressing online offensive language in Arabic, a language rich in dialects and informally used on social media. This gap underscores the need for dedicated, dialect-specific resources. This study introduces the Saudi Offensive Dialectal dataset (SOD), consisting of over 24,000 tweets annotated across three levels: offensive or non-offensive, with offensive tweets further categorized as general insults, hate speech, or sarcasm. A deeper analysis of hate speech identifies subtypes related to sports, religion, politics, race, and violence. A comprehensive descriptive analysis of the SOD is also provided to offer deeper insights into its composition. Using machine learning, traditional deep learning, and transformer-based deep learning models, particularly AraBERT, our research achieves a significant F1-Score of 87% in identifying offensive language. This score improves to 91% with data augmentation techniques addressing dataset imbalances. These results, which surpass many existing studies, demonstrate that a specialized dialectal dataset enhances detection efficacy compared to mixed-language datasets. Full article

(This article belongs to the Special Issue Natural Language Processing (NLP) and Large Language Modelling)

21 pages, 2866 KiB

Open Access Article

Sentiment Analysis of Students’ Feedback on E-Learning Using a Hybrid Fuzzy Model

by Maryam Alzaid and Fethi Fkih

Appl. Sci. 2023, 13(23), 12956; https://doi.org/10.3390/app132312956 - 4 Dec 2023

Cited by 12 | Viewed by 3148

Abstract

Because e-learning is now so widely used, it is crucial to analyze opinions about the resulting shift in education systems around the world in order to gain insight into the state of education today. A particular focus should be placed on the feedback from students regarding the profound changes they experience when using e-learning. In this paper, we propose a model that combines fuzzy logic with bidirectional long short-term memory (BiLSTM) for the sentiment analysis of students’ textual feedback on e-learning. We obtained this feedback from students’ tweets expressing their opinions about e-learning. The collected feedback had some ambiguous characteristics in terms of writing style and language: it was written informally in Saudi dialects and did not adhere to standardized Arabic writing rules. The proposed model benefits from the learning capabilities of the deep neural network BiLSTM and from the ability of fuzzy logic to handle uncertainties. The proposed model was evaluated using four metrics: accuracy, F1-score, precision, and recall. The results showed the effectiveness of our proposed model and that it worked well for analyzing opinions obtained from Arabic texts written in Saudi dialects. The proposed model outperformed the compared models by obtaining an accuracy of 86% and an F1-score of 85%.

(This article belongs to the Special Issue Artificial Intelligence in Complex Networks (2nd Edition))

19 pages, 886 KiB

Open Access Article

Sentiment Analysis of Arabic Course Reviews of a Saudi University Using Support Vector Machine

by Ali Louati, Hassen Louati, Elham Kariri, Fahd Alaskar and Abdulaziz Alotaibi

Appl. Sci. 2023, 13(23), 12539; https://doi.org/10.3390/app132312539 - 21 Nov 2023

Cited by 8 | Viewed by 2348

Abstract

This study presents the development of a sentiment analysis system for higher-education students using Arabic text. There is a gap in the literature concerning the perceptions and opinions of students in Saudi Arabian universities regarding their education beyond COVID-19. The proposed SVM Sentimental Analysis for Arabic Students’ Course Reviews (SVM-SAA-SCR) algorithm is a general framework that involves collecting student reviews, preprocessing them, and using a machine learning model to classify them as positive, negative, or neutral. The suggested technique for preprocessing and classifying reviews includes steps such as collecting data, removing irrelevant information, tokenizing, removing stop words, stemming or lemmatization, and using pre-trained sentiment analysis models. The classifier is trained using the SVM algorithm, and performance is evaluated using metrics such as accuracy, precision, and recall. Fine-tuning is done by adjusting parameters such as kernel type and regularization strength to optimize performance. A real dataset provided by the deanship of quality at Prince Sattam bin Abdulaziz University (PSAU) is used; it contains students’ opinions on various aspects of their education. We also compared our algorithm with CAMeLBERT, a state-of-the-art Dialectal Arabic model. Our findings show that while the CAMeLBERT model classified 70.48% of the reviews as positive, our algorithm classified 69.62% as positive, a close agreement that supports the efficiency of the suggested SVM-SAA-SCR. The results of the proposed model provide valuable insights into the challenges and obstacles faced by Arab universities post-COVID-19 and can help to improve their students’ educational experience.

(This article belongs to the Section Computing and Artificial Intelligence)

27 pages, 5232 KiB

Open Access Article

Text Classification of Patient Experience Comments in Saudi Dialect Using Deep Learning Techniques

by Najla Z. Alhazzani, Isra M. Al-Turaiki and Sarah A. Alkhodair

Appl. Sci. 2023, 13(18), 10305; https://doi.org/10.3390/app131810305 - 14 Sep 2023

Cited by 6 | Viewed by 2022

Abstract

Improving the quality of healthcare services is of the utmost importance in healthcare systems. Patient experience is a key aspect that should be gauged and monitored continuously. However, the measurement of such a vital indicator typically cannot be carried out directly, instead being derived from the opinions of patients who usually express their experience in free text. When it comes to patient comments written in the Arabic language, the currently used strategy to classify Arabic comments is totally reliant on human annotation, which is time-consuming and prone to subjectivity and error. Thus, fully using the value of patient feedback in a timely manner is difficult. This paper addresses the problem of classifying patient experience (PX) comments written in Arabic into 25 classes by using deep learning- and BERT-based models. A real-world data set of patient comments is obtained from the Saudi Ministry of Health for this purpose. Features are extracted from the data set, then used to train deep learning-based classifiers—including BiLSTM and BiGRU—for which pre-trained static word embedding and pre-training vector word embeddings are utilized. Furthermore, we utilize several Arabic pre-trained BERT models, in addition to building PX_BERT, a customized BERT model using the PX unlabeled database. From the experimental results for the 28 classifiers built in this study, the best-performing models (based on the F1 score) are found to be PX_BERT and AraBERTv02. To the best of our knowledge, this is the first study to tackle PX comment classification for the Arabic language. Full article

(This article belongs to the Special Issue Machine Learning and Artificial Intelligence for Human Information Analysis)

14 pages, 1697 KiB

Open Access Article

Applying a Character-Level Model to a Short Arabic Dialect Sentence: A Saudi Dialect as a Case Study

by Tahani Alqurashi

Appl. Sci. 2022, 12(23), 12435; https://doi.org/10.3390/app122312435 - 5 Dec 2022

Cited by 5 | Viewed by 2583

Abstract

Arabic dialect identification (ADI) has recently drawn considerable interest among researchers in the language recognition and natural language processing fields. This study investigated the use of a character-level model, which is effectively unrestricted in its vocabulary, to identify fine-grained Arabic dialects in short written text. The study considered the Saudi dialects, in particular the four main dialects spoken across the country. The proposed ADI approach consists of five main phases: dialect data collection, data preprocessing and labelling, character-based feature extraction, character-based modelling with deep learning and classical machine learning, and model performance evaluation. Several classical machine learning methods, including logistic regression, stochastic gradient descent, variations of the naive Bayes models, and support vector classification, were applied to the dataset. For deep learning, the character convolutional neural network (CNN) model was adapted with a bidirectional long short-term memory approach. The collected data were tested under various classification tasks, including two-, three- and four-way ADI tasks. The results revealed that classical machine learning algorithms outperformed the CNN approach. Moreover, term frequency–inverse document frequency weighting combined with character n-grams ranging from unigrams to four-grams achieved the best performance among the tested parameters.

(This article belongs to the Special Issue Recent Trends in Natural Language Processing and Its Applications)
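
The best-performing feature set reported in the abstract above, TF-IDF over character n-grams from unigrams to four-grams, can be sketched in a few lines. This is only an illustration of the general technique, not the study's code. The function names and toy strings are hypothetical, and the smoothed IDF, ln((1 + N) / (1 + df)) + 1, together with L2 normalization, is an assumed choice that matches the default documented for scikit-learn's TfidfVectorizer.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=1, n_max=4):
    """All character n-grams of length n_min..n_max, shortest lengths first."""
    return [
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    ]


def tfidf_vectors(docs, n_min=1, n_max=4):
    """Return (one sparse L2-normalized TF-IDF dict per document, IDF table)."""
    counts = [Counter(char_ngrams(doc, n_min, n_max)) for doc in docs]

    # Document frequency: in how many documents each n-gram occurs.
    df = Counter(gram for doc_counts in counts for gram in doc_counts)
    n_docs = len(docs)

    # Smoothed IDF (assumed form): n-grams found in every document get 1.0.
    idf = {g: math.log((1 + n_docs) / (1 + df[g])) + 1.0 for g in df}

    vectors = []
    for doc_counts in counts:
        vec = {g: tf * idf[g] for g, tf in doc_counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({g: w / norm for g, w in vec.items()})
    return vectors, idf


# Toy strings standing in for short dialect sentences (hypothetical data).
vectors, idf = tfidf_vectors(["abab", "abcd"])
```

The resulting sparse vectors could then be passed to any of the classical classifiers listed in the abstract, such as logistic regression or a support vector classifier.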

18 pages, 3643 KiB

Open Access Article

MULDASA: Multifactor Lexical Sentiment Analysis of Social-Media Content in Nonstandard Arabic Social Media

by Ghadah Alwakid, Taha Osman, Mahmoud El Haj, Saad Alanazi, Mamoona Humayun and Najm Us Sama

Appl. Sci. 2022, 12(8), 3806; https://doi.org/10.3390/app12083806 - 9 Apr 2022

Cited by 16 | Viewed by 3257

Abstract

The semantic complexity of Arabic vocabulary and the shortage of available techniques and skills to capture Arabic emotions from text hinder Arabic sentiment analysis (ASA). Evaluating Arabic idioms that do not follow a conventional linguistic framework, such as Modern Standard Arabic (MSA), further complicates an already difficult task. Here, we define a novel lexical sentiment analysis approach for studying Arabic language tweets (TTs) from specialized digital media platforms. Many elements, including emoji, intensifiers, negations, and other nonstandard expressions such as supplications, proverbs, and interjections, are incorporated into the MULDASA algorithm to enhance the precision of opinion classifications. Root words in a multidialectal sentiment lexicon (LX) are associated with emotions found in the content under study via a simple stemming procedure. Furthermore, a feature–sentiment correlation procedure is incorporated into the proposed technique to exclude expressed viewpoints that seem irrelevant to the area of concern. As part of our research into Saudi Arabian employability, we compiled a large sample of TTs in six different Arabic dialects. This research shows that this sentiment categorization method is useful, and that using all of the characteristics listed earlier improves the ability to accurately classify people’s feelings. The classification accuracy of the proposed algorithm improved from 83.84% to 89.80%. Our approach also outperformed two existing research projects that employed a lexical approach for the sentiment analysis of Saudi dialects.

(This article belongs to the Topic Machine and Deep Learning)
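
To make the general idea of lexicon-based scoring with negations, intensifiers, and emoji concrete, the sketch below scores a tokenized message against a small polarity lexicon. It is not the MULDASA algorithm: the function name, the scoring rule, and the English placeholder tokens are all hypothetical, and the stemming and feature–sentiment correlation steps described in the abstract above are left out.

```python
def score_tokens(tokens, lexicon, intensifiers, negators):
    """Sum lexicon polarities, applying pending intensifiers and negations.

    Assumed rule: a negator flips, and an intensifier scales, the next
    sentiment-bearing token; both reset once that token has been scored.
    """
    total = 0.0
    weight = 1.0
    negate = False
    for tok in tokens:
        if tok in negators:
            negate = not negate
        elif tok in intensifiers:
            weight *= intensifiers[tok]
        elif tok in lexicon:
            polarity = lexicon[tok] * weight
            total += -polarity if negate else polarity
            weight, negate = 1.0, False  # modifiers apply to one token only
    return total


# Placeholder resources (hypothetical): a real system would use Arabic
# dialect entries and a much larger lexicon.
LEXICON = {"good": 1.0, "bad": -1.0, "🙂": 0.5}
INTENSIFIERS = {"very": 2.0}
NEGATORS = {"not"}

score = score_tokens(["not", "very", "bad", "🙂"], LEXICON, INTENSIFIERS, NEGATORS)
```

Treating emoji as ordinary lexicon entries keeps the scoring loop uniform; a positive total would be read as positive sentiment and a negative total as negative.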

29 pages, 6865 KiB

Open Access Editor’s Choice Article

Sehaa: A Big Data Analytics Tool for Healthcare Symptoms and Diseases Detection Using Twitter, Apache Spark, and Machine Learning

by Shoayee Alotaibi, Rashid Mehmood, Iyad Katib, Omer Rana and Aiiad Albeshri

Appl. Sci. 2020, 10(4), 1398; https://doi.org/10.3390/app10041398 - 19 Feb 2020

Cited by 101 | Viewed by 13912

Abstract

Smartness, which underpins smart cities and societies, is defined by our ability to engage with our environments, analyze them, and make decisions, all in a timely manner. Healthcare is the prime candidate needing the transformative capability of this smartness. Social media could enable a ubiquitous and continuous engagement between healthcare stakeholders, leading to better public health. Current works are limited in their scope, functionality, and scalability. This paper proposes Sehaa, a big data analytics tool for healthcare in the Kingdom of Saudi Arabia (KSA) using Twitter data in Arabic. Sehaa uses Naive Bayes, Logistic Regression, and multiple feature extraction methods to detect various diseases in the KSA. Sehaa found that the top five diseases in Saudi Arabia in terms of the actual afflicted cases are dermal diseases, heart diseases, hypertension, cancer, and diabetes. Riyadh and Jeddah need to do more in creating awareness about the top diseases. Taif is the healthiest city in the KSA in terms of the detected diseases and awareness activities. Sehaa is developed over Apache Spark allowing true scalability. The dataset used comprises 18.9 million tweets collected from November 2018 to September 2019. The results are evaluated using well-known numerical criteria (Accuracy and F1-Score) and are validated against externally available statistics. Full article

(This article belongs to the Special Issue Artificial Intelligence Applications to Smart City and Smart Enterprise)

Search Results (11)