Article

Protecting Intellectual Security Through Hate Speech Detection Using an Artificial Intelligence Approach

by Sadeem Alrasheed 1, Suliman Aladhadh 1,* and Abdulatif Alabdulatif 2
1 Department of Information Technology, College of Computer, Qassim University, Buraydah 52571, Saudi Arabia
2 Department of Computer Science, College of Computer, Qassim University, Buraydah 52571, Saudi Arabia
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(4), 179; https://doi.org/10.3390/a18040179
Submission received: 16 December 2024 / Revised: 25 January 2025 / Accepted: 4 February 2025 / Published: 21 March 2025
(This article belongs to the Special Issue Machine Learning for Pattern Recognition (2nd Edition))

Abstract:
Online social networks (OSNs) have become an integral part of daily life, with platforms such as X (formerly Twitter) being among the most popular in the Middle East. However, X faces the problem of widespread hate speech aimed at spreading hostility between communities, especially among Arabic-speaking users. This problem is exacerbated by the lack of effective tools for processing Arabic content and the complexity of the Arabic language, including its diverse grammar and dialects. This study developed a two-layer framework to detect and classify Arabic hate speech using machine learning and deep learning with various features and word embedding techniques. A large dataset of Arabic tweets was collected using the X API. The first layer of the framework focused on detecting hate speech, while the second layer classified it into religious, social, or political hate speech. Convolutional neural networks (CNN) outperformed other models, achieving an accuracy of 92% in hate speech detection and 93% in classification. These results highlight the framework’s effectiveness in addressing Arabic language complexities and improving content monitoring tools, thereby contributing to intellectual security and fostering a safer digital space.

1. Introduction

Currently, online social networks (OSNs) play an important role in people’s daily lives and are changing the way they socialize and network [1]. Through OSNs, any type of content can be shared with a large audience within seconds [2]. In this sense, X (formerly Twitter) is one of the most popular OSNs used by Arabic speakers in the Arab world [3], and it has experienced rapid growth since its inception in 2006. For consistency, the term “tweets” is used in this study to refer to users’ posts on X, as this term is widely recognized and continues to be used colloquially. X allows people to interact and maintain connections with their contacts and talk about their daily activities, making it a useful resource and a rich source of information [4].
In addition, the real-time nature of X enables the rapid dissemination of news, trends and opinions that influence public debate and social trends [5]. This makes the platform not only a tool for personal communication but also an important medium for political, social and cultural exchange [6]. Nevertheless, the platform’s rapid spread and the nature of social interaction on it have led to the proliferation of extremist activities that directly or indirectly promote violence, intolerance, racism or other undesirable content. Extremist activities affect individuals and societies by influencing ideas and beliefs and by inciting hatred and violence. Intellectual cyberthreats emerge from various sources that spread deviant and decadent ideas through digital platforms, incite hatred and suppress dissenting opinions under the guise of communication, reconciliation and cultural and intellectual integration. Toxic online content can not only cause psychological harm but also radicalize people and lead to actual hate crimes [7]. Hate speech is a form of extremism that is rooted in prejudice and intolerance and often leads to violence, discrimination and persecution. Alhazmi [8] highlights key challenges in identifying hate speech on Arabic Twitter, including the linguistic richness and ambiguity of Arabic and the dearth of comprehensive annotated corpora. The main goal of this study is to present an integrated approach using machine learning and deep learning to detect and classify hate speech in Arabic social media, contributing to intellectual security and safer digital spaces and providing an improved experience in cyberspace. This paper presents a framework based on Arabic-specific pre-trained embeddings, such as AraBERT, combined with a CNN designed to work in two layers.
Most previous studies have focused on English, whereas this paper applies these techniques to Arabic, a complex language with rich morphology and dialectal variation. Hate speech reflects tensions between different groups within and outside society and can spread rapidly via OSNs [9].
In recent years, extremist actions inciting hate have led to severe incidents, such as those in the United States. Some groups, through OSNs, have undergone what is described as a “self-learning process”, which influenced them to adopt beliefs that white supremacy necessitates violent behavior. This has culminated in tragic events including shootings at a black church [10]. The Center for Countering Digital Hate notes that the dissemination of hate speech intensifies significantly during major global events. For instance, during the World Cup, the social media platform X failed to eliminate 99% of hateful messages targeting Qatar and its players [11]. In order to protect intellectual security, the detection of hate speech, a form of intellectual extremism, is crucial, and must be performed effectively and promptly.
Arabic is the fourth most popular language on the Internet and is the sixth most popular language on X [12]. The use of X by Arabic speakers has increased significantly over time, as shown in Figure 1 below.
This rise highlights the need to address issues such as hate speech within the Arabic-speaking X community. Extensive studies have been conducted on detection of hate speech in different languages, especially English. Nevertheless, few relevant studies have been conducted in the Arabic language due to many limitations, including the complex linguistic features of Arabic, which differ from those of English, and the lack of large datasets. Based on the social responsibility of the individual towards society and the widespread accessibility of social media, this study makes the following important contributions:
  • Develops a system to detect hate speech in Arabic and classify it into specific types, with the goal of protecting intellectual security in society.
  • Creates a comprehensive Arabic hate speech dataset to support accurate detection and classification of hate speech across different dialects and contexts. This dataset will be a valuable resource for future research and technological advancements aimed at combating hate speech in the Arabic-speaking online community.
The remainder of this paper is structured as follows: Section 2 provides an overview of the fundamental concepts related to hate speech. Section 3 presents related studies and prior research in the field. Section 4 outlines the materials and methods used in this study. Section 5 details the experimental results and discusses them. Section 6 concludes the paper.

2. Fundamental Concepts of Hate Speech

X offers cheap, instant and decentralized distribution. Due to these characteristics, X has become an important medium for hate groups, as it is used to spread propaganda, inform about their targets and exchange information between like-minded people. It also allows them to justify the use of violence and legitimize their actions as they work to obstruct and delegitimize others. With the growing popularity of X, the opportunities to spread hate through this platform are multiplying [13]. Hate speech, as defined in Section 1, aims to harm target groups by insulting, harassing, intimidating, humiliating and victimizing them. It also aims to promote discrimination and hostility towards these groups based on innate or perceived characteristics such as gender, ethnicity, religion or sexual orientation. On the other hand, a hate environment is defined as an environment that publishes a hate message in any form, be it textual, visual or audio [14]. Its main aim is to spread hostile ideas and discriminatory attitudes online, which increases the severity and seriousness of this phenomenon and contributes to fueling hatred and extremism in digital societies.

Types of Hate Speech

As hate speech continues to rise on digital platforms, it is essential to comprehend its nature in order to identify the types of attacks that different platforms encounter. This understanding aids in developing effective strategies to combat digital hate and in implementing appropriate measures to protect users, ensuring a safe communication environment. We will examine three main types of hate: political hate, social hate, and religious hate. These types have been examined in systematic reviews, including, but not limited to, Jahan and Oussalah [15], where the authors discuss how NLP-based approaches remain critical for such recognition.
Political hate speech (PHS) occurs when people or groups are attacked because of their political orientation or opinion. This usually occurs in times of political crisis or during elections, where hostile or insulting language is used against political opponents. Political hatred can take the form of insults, threats or questioning the loyalty of others to their home country. This type of hatred can sometimes lead to heightened political tensions and cause outbreaks of violence within the community [16].
Social hate speech (SHS) is directed against specific groups within a society, based on social characteristics such as ethnicity, social class, gender identity or even disability. This type of discourse aims to consolidate discrimination and social dominance by deepening the differences between different groups and excluding some individuals or groups from social and economic life. Most of the time, this type of language is used to spread negative thoughts or stereotypes that lead to the division of society and the exacerbation of differences between classes or social groups [17].
Religious hate speech (RHS) is directed against individuals or groups because of their religious beliefs, often in connection with differences in doctrine or cultural differences arising from religion. Religious hatred manifests itself in many forms, including discrimination in treatment, incitement to violence against religious communities, or the dissemination of hate speech against adherents of a particular religion. This type of hatred is considered one of the most dangerous as it affects religious identity, which is an integral part of the identity of individuals and groups [18].
Each of these types has particular characteristics and affects specific groups, but all can lead to the destruction of social cohesion and increase conflict. Understanding the different types of hatred helps to develop effective strategies to combat it and prevent its spread.

3. Related Studies

The detection of hate speech has become an important area of research, due to its impact on social cohesion and online safety. Efforts in this area have focused on developing systems and datasets to detect and classify hate speech in different languages and contexts. Approaches differ significantly between non-Arabic and Arabic languages due to linguistic, cultural and contextual differences. This section provides an overview of previous studies on hate speech detection.

3.1. Hate Speech Detection in Non-Arabic Languages

Several studies have explored hate speech detection in different languages using advanced machine learning and deep learning techniques. Gaddisa [19] focused on Afaan Oromo, leveraging tweets and Facebook posts annotated by linguistic experts and testing models such as CNNs, LSTMs, BiLSTMs, GRUs, and CNN-LSTM hybrids, with BiLSTM achieving the highest F1 score of 91%. This study highlighted the benefits of pre-trained embeddings and extended sampling, suggesting broader applicability across languages. Similarly, reference [20] examined Turkish hate speech detection using print media articles, employing models including the Hierarchical Attention Network (HAN) and BERT. BERT achieved the best accuracy of 90.6% when augmented with Turkish-specific linguistic features, underlining its robustness.
For the English language, Djamila et al. [21] utilized neural networks on the HASOC-2021 dataset, comparing BERT-based models, CNNs and ensembles, with the ensemble classifier achieving an F1 score of 85.1%, despite minor misdetections. Ananya et al. [22] addressed hate speech in Hinglish, using context-based embeddings such as ELMo, FLAIR, and BERT, achieving an F1 score of 87%, demonstrating their effectiveness for code-mixed languages. Flor et al. [23] analyzed Spanish hate speech using datasets from HaterNet and SemEval 2019, testing models such as BERT, XLM, and BETO, with F1 scores of 65.8% and 75.5%, respectively.
These findings emphasized the challenges posed by rare words and mislabeling, underscoring the need for improved datasets and model robustness. These studies collectively showcase the potential of advanced models and tailored approaches in addressing hate speech across diverse languages and contexts.
As with other models, research [24,25] notes that AraBERT was created specifically for the Arabic language. It is pre-trained on massive amounts of Arabic text, giving it the ability to understand the finer aspects of Arabic, including its morphology and syntax. As has been demonstrated, AraBERT outperforms other models on Arabic NLP tasks [26], such as sentiment analysis, named entity recognition and question answering. As a result, this model is a substantial breakthrough in the field of Arabic NLP, as it mitigates the limited availability of high-quality Arabic NLP resources and models.
mBERT [27] is designed to handle several languages at once. The model is a variant of BERT pre-trained on a corpus covering 104 languages, thus supporting transfer learning. This enables the model to transfer knowledge from one language to another, improving cross-lingual performance, and makes mBERT useful in multilingual tasks such as machine translation, cross-lingual opinion mining and multilingual question answering. This model bridges the gap between NLP research in different languages, allowing for scholarly unity and cooperative effort.
DistilBERT [28] is a variant of BERT intended to minimize memory consumption and computation time. It is more compact than the “classic” BERT, sacrificing little accuracy despite its reduced parameter count, and it offers substantially faster inference. DistilBERT minimizes redundancy by being trained to match the outputs of a larger teacher model (such as BERT) using knowledge distillation. This model therefore provides a feasible way to apply BERT-like models in resource-constrained settings or in situations where responses must be generated as quickly as possible.

3.2. Hate Speech Detection in Arabic Languages

This section focuses on studies conducted specifically on Arabic hate speech detection, providing a detailed overview of the methods, tools and models used.
Nuha et al. [29] compiled a dataset of 6000 Arabic tweets containing religious hate speech, categorizing each tweet into hate or non-hate categories and indicating specific religions for hate tweets [30]. During preprocessing, diacritical marks, punctuation, emojis, non-Arabic characters and single-letter words were removed and normalized. The dataset was analyzed using machine learning models, including logistic regression (LR), support vector machine (SVM) and gated recurrent units (GRU), with the GRU-based model achieving an F1 score of 77%. Hala et al. [31] presented the L-HSAB dataset [32], comprising 5846 tweets labeled as normal, offensive, or hateful, with naive Bayes (NB) achieving the highest F1 score of 74.4%. Hatem et al. [33] developed the T-HSAB dataset for Tunisian Arabic [34], containing 6075 comments divided into normal (3834), offensive (1127), and hate (1078) categories, with NB outperforming SVM, achieving an F1 score of 83.6%.
Bushr et al. [35] used the OSACT4 dataset [36] to identify offensive speech in Arabic, integrating convolutional neural networks (CNN) and bidirectional gated recurrent units (Bi-GRU) enhanced with attention layers, achieving an F1 score of 85%. Omar et al. [37] collected data from multiple OSNs and evaluated 12 machine learning techniques and two deep learning models, with recurrent neural networks (RNN) achieving an F1 score of 98.7%, despite challenges in reproducibility due to limited transparency. Fatemah [38] focused on preprocessing informal social media texts for offensive language detection using the OSACT4 dataset and an SVM model, achieving 90% accuracy, although this was limited by the absence of advanced features during model training. Abdullah et al. [39] utilized the OSACT4 dataset of 10,000 tweets, with long short-term memory (LSTM) performing best, with an F1 score of 85%.
Wissam et al. [40] used AraBERT with multitasking and multilabel classification techniques on the OSACT4 dataset, achieving a macro-F1 score of 90.15%, although it faced misclassification issues in non-offensive contexts. Raghad et al. [41] developed a dataset of 9316 annotated tweets, with a CNN model achieving an F1 score of 79%, although the study’s results were context-specific. Faris et al. [42] achieved a 91% hit rate using a dataset of 3696 tweets processed with term frequency-inverse document frequency (TF-IDF) and bag of words (BoW) methods, but faced challenges with mislabeled data. Amana et al. [43] combined multiple datasets (ArHS, L-HSAB, and OSACT4) for a total of 23,678 tweets, using BiLSTM-CNN to achieve F1 scores of 79%, 74%, and 73% for binary, ternary, and multi-label classifications, respectively.
Faisal [44] developed a dataset of 4203 comments categorized into seven classes, achieving impressive results with deep recurrent neural networks: 99.73% accuracy for binary classification, 95.38% for ternary classification and 84.14% for multi-class classification, although limited by its small dataset size. Hanane et al. [45] used a dataset of 15,050 YouTube comments [46], with the CNN-LSTM model achieving an F1 score of 83.65%, but noted the dataset’s outdated nature. Bassma et al. [47] developed a two-stage system using pre-trained word embeddings and a hybrid classification approach evaluated on the ArCybC dataset [48], with the SVM classifier achieving an F1 score of 87.8%, although it was affected by the outdated dataset [49].
Collectively, these studies underscore the importance of tailored approaches and highlight the challenges in Arabic hate speech detection, particularly concerning dataset diversity, dialect variation and evolving linguistic trends. The techniques discussed are summarized in Table 1, while Table 2 shows the only publicly available Arabic datasets used in hate speech detection studies.
Pretrained transformer models such as the BERT family, including SBERT, continue to deliver remarkable performance across natural language processing tasks because of their contextual understanding over long sequences. Arabic models such as AraBERT are fine-tuned for tasks like sentiment analysis and question answering, but they require substantial computational power. Most prior work focuses on structured Arabic datasets or on English and has achieved high accuracy; however, the informal variation across Arabic dialects and hate speech language remains largely unaddressed.

4. Materials and Methods

The system employs a two-layer approach based on various machine learning and deep learning models to detect and classify hate speech. The first layer is dedicated to detecting hate speech, while the second layer focuses on classifying the detected hate speech into specific categories. The process begins with the collection of a comprehensive Arabic text dataset that serves as the basis for both layers of the system. In the first layer, feature extraction techniques are applied to prepare the dataset for analysis. To ensure a balanced and robust evaluation, stratified five-fold cross-validation is performed, where the model is trained on four folds and tested on the fifth over several iterations. Predictions are generated for each iteration, and the overall performance of the hate speech detection model is calculated by averaging the results of these iterations.
In the second layer, the predictions of the best performing detection model are used to filter out non-hate content, leaving only hate speech data for further classification. This filtered dataset is subjected to classification into specific hate speech categories, with stratified five-fold cross-validation ensuring consistent and accurate scoring. The performance metrics are measured iteratively, and the final average score represents the effectiveness of the system in distinguishing between different types of hate speech. By integrating these two layers, the system provides a comprehensive and structured solution for the detection and classification of hate speech. The proposed system is shown in Figure 2.
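The two-layer flow described above can be sketched as follows. This is a minimal illustration only: `detect_hate` and `classify_hate` are hypothetical keyword stubs standing in for the trained CNN models, not the study's classifiers.

```python
# Sketch of the two-layer pipeline: layer 1 filters out non-hate content,
# layer 2 assigns a subtype to what remains. The two classifiers are
# hypothetical stand-ins for the trained models.

def detect_hate(text: str) -> bool:
    """Layer 1: binary hate / non-hate decision (stub)."""
    return "hate" in text.lower()

def classify_hate(text: str) -> str:
    """Layer 2: religious / social / political subtype (stub)."""
    for label, cue in [("religious", "religion"), ("political", "party")]:
        if cue in text.lower():
            return label
    return "social"

def two_layer_pipeline(tweets):
    # Layer 1: keep only tweets flagged as hate speech.
    flagged = [t for t in tweets if detect_hate(t)]
    # Layer 2: classify the filtered tweets into subtypes.
    return [(t, classify_hate(t)) for t in flagged]

tweets = ["nice day", "hate this party", "hate that religion"]
print(two_layer_pipeline(tweets))
```

In the real system, each stub is replaced by a model selected via the stratified five-fold evaluation, and only the best detection model's positive predictions feed the second layer.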
In order to test the performance of different methods and algorithms, we applied and trained machine learning and deep learning algorithms for the detection and classification of Arabic hate speech. Among the deep learning models, we noticed that CNN performed best when used in conjunction with pre-trained embeddings. The reasons for choosing CNN are as follows. First, CNN’s ability to extract local features from textual data is highly beneficial for detecting hate speech in a morphologically rich language such as Arabic. Second, CNN is computationally efficient, since its non-sequential architecture avoids the overhead of models such as RNNs or LSTMs, making it suitable for training on large datasets such as Arabic tweet texts. Moreover, in text classification and hate speech detection, CNNs have been shown to outperform sequence models on short texts, where long-range context matters less. Lastly, CNN is combined with pre-trained embeddings, so that localized feature extraction complements the global semantic knowledge encoded in these representations.
N-grams and pre-trained word embeddings were used in the study to extract features from the raw data. N-grams capture syntactic structures and term sequences and were used to detect unambiguous hate speech content. AraBERT and FastText embeddings covered the semantic aspects the model required to identify subtler forms of hate speech, such as idioms.
To deal with the nature of Arabic, preprocessing comprised diacritic removal, which lowers variability, and normalization, which gives variant spellings a common form. Stemming was used to address morphological differences, reducing noise and ensuring that words sharing the same root were treated uniformly. These steps improved the quality of the dataset and of the model used in this study.
Three annotators, native Arabic speakers with degrees in linguistics, tagged the dataset. To keep the labeling criteria consistent, the annotators worked from a set of instructions defining each hate speech category; in cases of disagreement between annotators, the majority rule prevailed. Inter-annotator reliability was assessed using Fleiss’ kappa, which gave a substantial reliability coefficient of 0.78. As the annotator team was small, the guidelines were specified carefully and agreement among annotators was measured objectively.
Because of the lexical and grammatical richness of Arabic, preprocessing entailed the removal of diacritics, stemming and lemmatization to merge words that share a stem but differ in inflection. Particular attention was paid to colloquial expressions and epithets that are common on social media platforms, in order to normalize the prominent Gulf and Levantine dialects in the collected data. However, some regional varieties were underrepresented in our study and warrant the collection of additional data for better dialectal coverage.
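The diacritic removal and orthographic normalization steps described above can be sketched with the standard Unicode ranges for Arabic tashkeel. This is a minimal stdlib illustration under our own assumptions, not the exact pipeline used in the study:

```python
import re

# Arabic diacritics (tashkeel) occupy U+064B..U+0652; U+0640 is the
# tatweel (elongation) character.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")

def strip_diacritics(text: str) -> str:
    """Remove short vowels and elongation marks to reduce variability."""
    return DIACRITICS.sub("", text)

def normalize(text: str) -> str:
    """Map common orthographic variants to a single canonical form."""
    text = strip_diacritics(text)
    text = re.sub("[\u0623\u0625\u0622]", "\u0627", text)  # alef variants -> bare alef
    text = text.replace("\u0629", "\u0647")                # taa marbuta -> haa
    text = text.replace("\u0649", "\u064A")                # alef maqsura -> yaa
    return text

print(normalize("أَهْلاً"))
```

Stemming and lemmatization would follow these steps, typically via a dedicated Arabic NLP library rather than regular expressions.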

4.1. Experimental Dataset and Setup

To create the dataset for hate speech analysis, a structured approach was followed. Initially, relevant keywords representing different types of hate speech, such as religious, political and social hate speech, were identified using various sources, including specialized databases such as HATEBASE [50] and an Arabic lexicon [30]. Ahmad et al. [51] underscore the critical role of corpus design and evaluation in Arabic hate speech detection, which aligns with our focus on constructing a dialectally balanced dataset. These keywords were then used to extract tweets from the social media platform X, using the X API integrated with the Tweepy library. After retrieving approximately 9700 tweets, a filtering process removed duplicates and irrelevant entries, resulting in 8341 tweets for analysis. Preprocessing involved cleansing, tokenization and normalization using tools such as PyArabic for Arabic text processing and NLTK for further text analysis. These steps ensured a high-quality dataset suitable for annotation and subsequent classification. The preprocessing techniques were based on a literature review of text preprocessing and feature extraction for modeling hate speech on social media. Studies such as [18,38,40,44,45] show that diacritic stripping, stemming and tokenization, among other steps, are highly effective in handling the challenges of Arabic in computational processing; they substantially reduce noise and improve the representation of features in the dataset fed to the machine learning algorithms.
Following dataset creation and preprocessing, the annotation phase was critical for preparing the data for supervised machine learning. This process involved two key tasks. First, tweets were annotated as either “hate speech” or “not hate speech”, providing a binary classification. Tweets identified as hate speech were then categorized into one of three subtypes. To ensure quality and reliability, three native Arabic-speaking annotators were selected based on their expertise and willingness to participate. Clear annotation guidelines were provided, and disagreements in the binary classification were resolved by majority voting. The average tweet length in the dataset was approximately 22 words, and inter-annotator agreement was validated with a Fleiss’ kappa score of 0.78, indicating substantial agreement. For the subcategories, tweets with complete disagreement among annotators were excluded, resulting in a final dataset of 7923 tweets, as shown in Figure 3.
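Fleiss’ kappa for a fixed panel of annotators can be computed directly from per-item category counts. The sketch below implements the standard formula (N items, n raters, k categories) and runs it on toy counts, not the study’s annotations:

```python
def fleiss_kappa(counts):
    """counts[i][j] = number of raters assigning item i to category j.
    Every row must sum to the same number of raters n."""
    N = len(counts)            # number of items
    n = sum(counts[0])         # raters per item
    k = len(counts[0])         # number of categories
    # p_j: overall proportion of all assignments going to category j.
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # P_i: observed agreement for item i.
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P) / N                     # mean observed agreement
    P_e = sum(pj * pj for pj in p)         # chance agreement
    return (P_bar - P_e) / (1 - P_e)

# Three raters, two categories; perfect agreement on every item
# (with mixed categories) yields kappa = 1.
print(fleiss_kappa([[3, 0], [0, 3], [3, 0]]))  # -> 1.0
```

A kappa of 0.78, as reported above, falls in the range conventionally described as substantial agreement.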
The experiments were conducted on macOS Monterey for initial tests and on Google Colab Pro with an A100 GPU for efficient model training. Python 3 was used in both setups to ensure smooth implementation and faster processing.

4.2. Data Representation (Feature Extraction)

Data representation plays a crucial role in preparing textual data for machine learning and deep learning models, as it transforms raw textual information into a format that these models can understand and process effectively. In this study, various techniques were applied to extract meaningful features from Arabic texts, tailored to the specific requirements of machine learning and deep learning approaches. These methods are:
  • N-gram Feature Extraction: The text data were vectorized using the TF-IDF method, with configurations for word and character N-grams.
  • Word N-grams: In a text, word N-grams store groups of words and show how these words relate to each other in context. To improve the feature extraction process, this study used word N-grams between 1 and 3. Using a range of 1 to 3 allows the model to analyze single words (unigrams), pairs of consecutive words (bigrams), and three-word sequences (trigrams). This approach allows the model to understand both the meanings of individual words and the relationship of words in short lines to each other.
  • Character N-grams: In character N-grams, the study used characters that span up to 6 characters. This approach is very efficient for languages that have a rich morphological structure, such as Arabic, as it captures the morphological part of words. It also detects different spellings and common spelling errors that occur in digital communication.
  • Word/Character N-grams: The study investigated a hybrid approach of word and character N-grams, where word N-grams from 1 to 3 were integrated with character N-grams from 1 to 6 to take advantage of both representations.
  • Word Embeddings: For the deep learning models, the study used different methods to embed words. For CNN, RNN, LSTM and BiLSTM, we experimented with text representations by using trainable embeddings and pre-trained embeddings separately. More specifically, the pre-trained embeddings were generated from AraVec. AraVec is a pre-trained word embedding model that converts Arabic words into continuous vector representations. This model has been trained on large collections of Arabic texts, so that it can capture the complex linguistic patterns that characterize the Arabic language. The trainable embeddings, on the other hand, allowed the models to directly adapt the word representations to the dataset during the training process.
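To make the word and character N-gram ranges above concrete, the stdlib-only sketch below enumerates the raw features a vectorizer such as TF-IDF would then weight. The helper functions are hypothetical, not the study's code:

```python
def word_ngrams(text, lo=1, hi=3):
    """Word N-grams for N in [lo, hi]: unigrams through trigrams
    with the study's default range of 1 to 3."""
    words = text.split()
    return [" ".join(words[i:i + n])
            for n in range(lo, hi + 1)
            for i in range(len(words) - n + 1)]

def char_ngrams(word, lo=1, hi=6):
    """Character N-grams up to 6 characters, useful for capturing
    Arabic morphology and common spelling variants."""
    return [word[i:i + n]
            for n in range(lo, hi + 1)
            for i in range(len(word) - n + 1)]

print(word_ngrams("a b c"))       # -> ['a', 'b', 'c', 'a b', 'b c', 'a b c']
print(char_ngrams("abcd", 2, 3))  # -> ['ab', 'bc', 'cd', 'abc', 'bcd']
```

The hybrid word/character configuration simply concatenates both feature sets before TF-IDF weighting.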
AraBERT and BiAraBERT were used in their pre-trained models for fine-tuning in the text classification task of Arabic tweet hate speech detection. The pre-training, which integrates both, provides a deep understanding of the structure of the language, while the fine-tuning adapts it to the subtleties of the dataset.

4.3. Evaluation and Validation

4.3.1. Stratified K-Fold Cross-Validation

To address the imbalance in the dataset, stratified K-fold cross-validation was utilized to evaluate the performance of the hate speech detection and classification models. This approach enhances traditional K-fold cross-validation by ensuring that each fold maintains the same class proportions as the original dataset, thereby preventing distortions caused by uneven distributions. The dataset is divided into K-folds, with K−1 folds used for training and one for testing in each iteration. This process was repeated K times, ensuring every fold serves as the test set once. For this study, K = 5 was selected, providing a balanced and robust assessment of the models. Figure 4 illustrates the class distribution across folds in the first (a) and second (b) layers of the hate speech classification system.
The method ensures that the class distributions are preserved using the following relations:
  • Class proportion in the dataset:
p_i = n_i / N
where n_i is the number of samples of class i in the dataset and N is the total number of samples in the dataset.
  • Class proportion in each fold:
n_{i,j} / n_j ≈ n_i / N
where n_{i,j} is the number of samples of class i in fold j and n_j is the total number of samples in fold j.
By applying stratified K-fold cross-validation with K = 5, the models are evaluated on folds that reflect the original dataset’s class distribution, resulting in a more accurate and comprehensive evaluation, particularly for imbalanced datasets.
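The stratification described above can be illustrated with scikit-learn's `StratifiedKFold`. The 70/30 label split below is a hypothetical stand-in for the imbalanced tweet dataset:

```python
from collections import Counter
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced labels (70% class 0, 30% class 1) standing in for the tweets.
labels = [0] * 70 + [1] * 30

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(skf.split(labels, labels)):
    test_counts = Counter(labels[i] for i in test_idx)
    # Each test fold preserves the 70/30 ratio: n_{i,j}/n_j ≈ n_i/N.
    print(f"fold {fold}: {dict(test_counts)}")
```

With 100 samples and K = 5, every test fold contains exactly 14 samples of class 0 and 6 of class 1, mirroring the overall distribution.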

4.3.2. Computational Complexity Analysis

The theoretical time complexity of the CNN architecture is O(n) in the sequence length when processing text. Transformer-based models such as AraBERT, by contrast, have O(n²) complexity owing to their self-attention mechanisms. This trade-off makes CNNs well suited for real-time hate speech detection, without discounting the accuracy gains offered by transformers.

4.3.3. Evaluation Metrics

Several metrics were employed to assess the effectiveness of the proposed model. These commonly used metrics enable comparisons to be made with other techniques in the literature. They include the precision, recall, F1 score and accuracy. Detailed definitions and descriptions of these metrics are provided in Table 3.
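As a minimal sketch of how these standard metrics are computed in practice (the label vectors below are illustrative only, not from the study):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical ground truth and predictions for illustration.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("accuracy :", accuracy_score(y_true, y_pred))   # (TP + TN) / total
print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("f1       :", f1_score(y_true, y_pred))         # harmonic mean of P and R
```

Here TP = 3, FP = 1, FN = 1, TN = 3, so all four metrics evaluate to 0.75.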
Class imbalance affected the model's precision and recall on the minority categories, most notably political hate speech. The precision–recall curves show the resulting trade-off: political hate speech reached 79% precision and 74% recall, while religious hate speech reached 91% and 89%, respectively. Mitigating this imbalance, for example through oversampling or cost-sensitive learning, could yield further improvement.

5. Experimental Results and Discussion

In this study, we used a comprehensive training framework to evaluate the proposed two-layer system. As described earlier, the first layer focuses on hate speech detection, while the second layer classifies detected hate speech into predefined categories. At each layer, we experimented with different machine learning and deep learning models in combination with different word representation methods to achieve optimal performance for each task. The second layer relies directly on the results of the first layer, using the predictions of the first layer's best performing model as its input. According to Aladeemy et al. [52], diacritic normalization and lemmatization remain indispensable preprocessing operations for Arabic sentiment and hate speech analysis; the same holds in specific domains, as in Mednini et al. [53], which focused on hate speech directed at brands.

5.1. First Layer: Systematic Detection of Hate Speech

The first experiment focused on detecting Arabic hate speech tweets. Table 4 presents the results obtained using various machine learning models, including the support vector machine (SVM), random forest (RF), naïve Bayes (NB), and logistic regression (LR). It can be seen from Table 4 that the highest performance for Arabic hate speech detection was achieved by LR with a combination of word- and char-Ngrams as features. This configuration achieved an accuracy of 77.81%, a precision of 77.58%, a recall of 77.14% and an F1 score of 77.29%. These results show that the integration of features at the word and character level significantly improves the performance of the model compared to using the individual features.
A repetition of the experiment with deep learning models showed a significant improvement in hate speech detection. As shown in Table 5, Figure 5, Figure 6 and Figure 7, the highest performance was achieved by the convolutional neural network (CNN) with trainable embeddings, which achieved an accuracy of 91.95%, a precision of 92.59%, a recall of 93.36% and an F1 score of 92.96%. This result outperformed all other models, including recurrent architectures such as RNN, LSTM and BiLSTM, as well as models using pre-trained embeddings, such as AraVec, AraBERT and DistilBERT. These results highlight the effectiveness of CNNs, especially with trainable embeddings, in capturing complex patterns for detecting Arabic hate speech.
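A minimal Keras sketch of a CNN with a trainable embedding layer of the kind described above; the vocabulary size, sequence length, filter count and embedding dimension are assumed for illustration and are not the study's exact hyperparameters:

```python
import numpy as np
from tensorflow.keras import layers, models

VOCAB_SIZE = 5000   # assumed vocabulary size
MAX_LEN = 50        # assumed maximum tweet length in tokens
EMBED_DIM = 100     # assumed embedding dimensionality

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),                # trainable by default
    layers.Conv1D(128, kernel_size=3, activation="relu"),   # local n-gram-like filters
    layers.GlobalMaxPooling1D(),                            # keep strongest feature per filter
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),                  # hate vs. not-hate
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Shape check on random token ids standing in for tokenized tweets.
dummy = np.random.randint(0, VOCAB_SIZE, size=(2, MAX_LEN))
print(model.predict(dummy, verbose=0).shape)  # (2, 1)
```

Loading AraVec vectors instead would amount to passing pre-trained weights to the `Embedding` layer; with `trainable=True` those vectors are further adapted during training.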

5.2. Second Layer: Systematic Classification of Hate Speech Types

In the second layer of our hate speech classification system, the predictions of the best performing model in the first layer, CNN, are used to classify hate speech into specific types. The predictions of each fold in the first layer are used as inputs for that layer, with non-hate speech texts filtered out to refine the results. This section evaluates the performance of different machine learning and deep learning models in classifying these types of hate speech, using different feature representations to improve the accuracy and effectiveness of the classification process. As shown in Table 6, Figure 8, Figure 9 and Figure 10, logistic regression (LR) achieved the best performance, with an accuracy of 75.58% and an F1 score of 72.31% using a combination of word and character N-grams, demonstrating the effectiveness of this feature set in capturing nuanced patterns in hate speech. The support vector machine (SVM) also performed well, achieving 73.99% accuracy and a 72.13% F1 score with the same features, showcasing its robustness. In contrast, naive Bayes (NB) struggled with word N-grams alone, achieving an F1 score of just 53.19%, but improved with character N-grams, reaching 74.13% accuracy and a 68.33% F1 score. Overall, the results highlight the importance of combining word and character N-grams to enhance model performance, with LR and SVM emerging as the most reliable models for hate speech classification.
The evaluation results in Table 7, Figure 11, Figure 12 and Figure 13 illustrate our experiments with deep learning models for hate speech classification and show that CNN is the most effective model. CNN achieved the highest accuracy of 93.23% with trainable embeddings and 93.33% with pre-trained embeddings and F1-scores of 91.93% and 92.36%, respectively. These results highlight CNN’s ability to effectively capture local text features and patterns using convolutional filters and demonstrate its adaptability and consistency across different embedding types. AraBERT also performed well, particularly with pre-trained embeddings, achieving an accuracy of 84.29% and an F1 score of 84.15%, highlighting its strength in processing nuanced speech and contextual information. Although AraBERT’s performance was slightly lower than CNN’s, it highlights the importance of contextual embeddings for hate speech detection. Overall, CNN proves to be the most reliable model in this study.
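The cascade between the two layers can be sketched as follows; the tweets, labels and the stand-in second-layer classifier are all hypothetical:

```python
# Hypothetical layer-1 output: (tweet, predicted_label) pairs, where 1 = hate.
layer1_output = [
    ("tweet A", 0),
    ("tweet B", 1),
    ("tweet C", 1),
    ("tweet D", 0),
]

# Filter out non-hate texts before the second layer, as described above.
hate_only = [text for text, label in layer1_output if label == 1]

def classify_type(text):
    """Placeholder for the second-layer model (religious/social/political)."""
    return "Social-HS"

layer2_predictions = {text: classify_type(text) for text in hate_only}
print(layer2_predictions)
```

Only texts flagged as hate by the first layer reach the second classifier, which is why the second layer's quality depends directly on the first layer's best model.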

5.3. Confidence Intervals and Statistical Testing

Confidence intervals for accuracy and F1 scores were calculated as shown in Table 8. Paired t-tests demonstrated statistically significant performance differences between CNN and AraBERT (p < 0.01).
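A paired t-test and confidence interval of the kind reported in Table 8 can be computed with SciPy; the per-fold F1 scores below are illustrative placeholders, not the study's actual values:

```python
import numpy as np
from scipy import stats

# Hypothetical per-fold F1 scores for CNN and AraBERT across 5-fold CV.
cnn_f1     = np.array([0.928, 0.931, 0.925, 0.934, 0.930])
arabert_f1 = np.array([0.840, 0.845, 0.838, 0.843, 0.841])

# Paired t-test: the same folds are scored by both models.
t_stat, p_value = stats.ttest_rel(cnn_f1, arabert_f1)
print(f"t = {t_stat:.2f}, p = {p_value:.6f}")

# 95% confidence interval for CNN's mean F1 using the t-distribution.
mean = cnn_f1.mean()
sem = stats.sem(cnn_f1)
lo, hi = stats.t.interval(0.95, df=len(cnn_f1) - 1, loc=mean, scale=sem)
print(f"CNN F1 95% CI: [{lo:.3f}, {hi:.3f}]")
```

A paired (rather than independent) test is appropriate here because both models are evaluated on identical cross-validation folds.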

5.4. Benchmarking Against State-of-the-Art Models

To rigorously evaluate the performance and generalizability of our proposed framework, we conducted comprehensive benchmarking experiments on two external datasets: L-HSAB and OSACT4. The purpose of selecting these datasets was to cover a wide range of different linguistic features and areas and, therefore, provide a rather challenging test for the model proposed herein. The findings of these experiments are shown in Table 9.

5.5. Discussion

The evaluation results show that the proposed CNN-based approach outperformed the other models in identifying and classifying Arabic hate speech, reaching 92% accuracy in detecting hate speech and 93% accuracy in classifying it into the three hate speech categories. This further highlights that CNNs are well suited to extracting local features from text and that they benefit from the context provided by pre-trained embeddings. However, an examination of the findings reveals errors in specific areas, as well as several limitations that need to be discussed.
Although the CNN showed promising results, it is worth comparing it to transformers such as AraBERT. Transformers are stronger at modelling long-range dependencies, yet AraBERT reached 89%, slightly below the CNN's 92%. Notably, a fine-tuned Mamba model was roughly on par in multilingual hate speech detection, although it demanded significant fine-tuning for the specifics of the Arabic language. These comparisons show that the proposed CNN achieves a good trade-off between performance and required computation, especially for short texts such as tweets.
In terms of efficiency, the framework performed reliably, with computational complexity suitable for practical use. Processing the dataset of 10,000 tweets takes only a few seconds, which suggests the method is useful for real-time monitoring. It also incurs lower computational costs than transformers, making the model suitable for applications with limited computational resources.
False positives nevertheless raise ethical concerns: misclassifications can silence genuine speech, harming interactions on the platform and eroding users' trust in it. Addressing this requires AI systems that provide a way to review content the model has flagged as problematic, together with bias-detection features that lower the likelihood of such missteps. The conclusions of the present study align with the challenges and solutions discussed by Abdelsamie et al. [54] regarding the difficulty of handling abusive language in Arabic social media.

5.5.1. Error Analysis

One general problem faced by the model was its inability to correctly classify sarcastic or implicit hate speech: 15% of the hateful tweets were misclassified because the language relied on sarcasm or indirectness. Unlike obvious, easily recognizable insults, implicit hate speech may involve sarcasm, puns or culture-specific references without containing any overtly offensive Arabic words. For example, a seemingly positive comment, such as "We understand your concern", may be a taunt designed to hurt a target group. The second major source of misclassification arose from dialectal differences within Arabic. Although Standard Arabic underlies most standardized text representations, the Gulf, Levantine, and Maghrebi varieties common on social media differ substantially from it in vocabulary, grammar and usage. The model failed on roughly 12% of cases, mainly through an inability to interpret colloquial or regional expressions. This limitation points to an opportunity to develop dialect-specific datasets, or at least embeddings, to better capture the multicultural aspects of Arabic hate speech.
Classification performance was also affected by class imbalance. The "religious hate speech" class achieved high precision and recall, 0.91 and 0.89 respectively, while the "political hate speech" class produced lower precision (0.79) and recall (0.74). This can be attributed primarily to the limited representation of political hate speech in the dataset, which constrained the model's ability to generalize to that class.
The majority of misclassifications occurred in samples combining hate speech and sarcasm, where no straightforward marker of offensive content was present; these cases contributed nearly 15% of false negatives. Tweets containing landmarks also posed problems, while dialect-specific expressions proved perhaps the hardest, as embeddings derived mainly from Modern Standard Arabic did not encode regional differences. Including dialect-specific data or embeddings could help alleviate such errors.

5.5.2. Limitations of the Chosen Model

While the CNN architecture showed many advantages, certain disadvantages were also detected. First, there is a practical problem with the embeddings: they depend on data from other sources and may not reflect Arabic hate speech in informal or emerging contexts. For instance, models mentioned in this paper, such as AraBERT, can be fine-tuned for this purpose but were not originally optimized for the specific task of hate speech detection.
Moreover, while the CNN excels at capturing local features within text, this same characteristic limits its ability to model broader, more complex contextual dependencies. For longer tweets containing more than one idea, or when hate speech is intricately woven into the message, the CNN sometimes failed because it missed the relevant contextual information. In such circumstances, transformer-based models such as AraBERT can outperform CNNs, since their attention mechanisms capture global context.
Another limitation concerns the scalability of the detection system. Although the CNN delivered efficient, high-quality output during the experiments, deploying this model in a large-scale real-time architecture can be difficult: updating embeddings and handling streams of millions of tweets per day would require further engineering effort and resources.
Finally, the research focus was limited to text; other formats, such as images, videos and mixed multimedia content found on platforms such as X (formerly Twitter), were not considered. This may restrict the model's generalizability, as much of the hate speech on social media platforms uses other forms of media that amplify its effect.

5.5.3. Recommendations and Future Directions

To overcome the problems mentioned above, further research should focus on preprocessing strategies tailored to dialectal variation and implicit hate speech. Using domain-specific hate speech lexicons, tuning the embeddings on hate speech-relevant corpora, or employing separate embeddings for different Arabic dialects could reduce the misclassification errors arising from dialectal differences. Another approach is to adopt transformer models such as AraBERT, or hybrid models combining CNN and transformer layers, to address the CNN's difficulty in capturing the global context of long or complex texts. Such models excel in tasks requiring a deeper understanding of textual relations and may synergize well with CNNs, which are optimized for local feature extraction.
Concerning class imbalance, oversampling or cost-sensitive learning can improve the model's performance on underrepresented types of hate speech, such as political speech, and should yield more uniform performance across all classes. Finally, extending the analysis to several modalities, combining textual data with image or video processing, would greatly improve the model's reliability and versatility. As mentioned earlier, many hate speech messages on social media depend on images, so it is necessary to incorporate such data.
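Cost-sensitive learning of the kind suggested above can be sketched with scikit-learn's balanced class weights; the three-class label distribution and the random features below are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: 0 = social, 1 = religious, 2 = political (minority).
y = np.array([0] * 50 + [1] * 35 + [2] * 15)
X = np.random.default_rng(0).normal(size=(len(y), 8))  # stand-in features

# Inverse-frequency weights penalize errors on minority classes more heavily:
# w_i = N / (n_classes * n_i)
weights = compute_class_weight("balanced", classes=np.unique(y), y=y)
print(dict(zip(np.unique(y).tolist(), weights.round(2).tolist())))

# Cost-sensitive training: pass the weighting scheme directly to the classifier.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
```

Oversampling (e.g., with imbalanced-learn's `RandomOverSampler`) is an alternative that rebalances the training data itself rather than the loss.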

6. Conclusions

This study presents a novel two-tier framework for Arabic hate speech identification and categorization, developed in the face of challenges such as the richness of the Arabic language and the scarcity of datasets. Combined with pre-trained embeddings, the CNNs provide high detection and classification rates. Future work will focus on transformer models to improve hate speech detection and on introducing features from other modalities to increase the protection of users on social media platforms.
This research addresses a classification task on short texts, namely tweets. Convolutional neural networks (CNNs) were employed for their computational efficiency when handling large volumes of short texts and for their ability to learn local patterns and features within short sequences, which is especially important when analyzing tweets. In addition, CNNs remain effective under hardware and computational constraints.
A limitation of the current CNN approach is acknowledged in the manuscript: limited capability for identifying long-range dependencies within text and probable difficulty in processing its syntactic features. To overcome these limitations, the researchers intend to move beyond plain CNNs and examine combinations with other architectures, integrating CNNs with recurrent neural networks (RNNs) or long short-term memory networks (LSTMs). CNNs learn local features well, while RNNs and LSTMs capture long-term dependencies and sequential information better than CNNs. Moreover, integrating attention mechanisms would allow the model to focus on the most relevant parts of the text, improving both performance and interpretability.
In summary, this study applies CNNs to short text data in the form of tweets. The researchers recognize the disadvantages of CNNs and intend to adopt a hybrid model to address these shortcomings.
This study underscores the importance of detecting and classifying hate speech on OSNs such as X to ensure user safety and combat the spread of harmful content. This study has made significant contributions by compiling a comprehensive Arabic dataset from X, providing a valuable resource for the analysis of hate speech. Furthermore, the development of a novel two-layer framework that integrates both machine learning and deep learning techniques, along with various feature and embedding methods, has proven effective in detecting and classifying hate speech into specific categories, such as religious, political, and social.
Future work aims to enhance the framework by utilizing unexploited dataset features such as emojis and images, improving natural language processing for Arabic dialects and extending the approach to a unified model for simultaneous hate speech detection and classification. These advancements promise to address the challenges in automatic hate speech detection and further bolster user protection.

Author Contributions

Data curation, S.A. (Sadeem Alrasheed); formal analysis, S.A. (Sadeem Alrasheed); methodology, S.A. (Sadeem Alrasheed); resources, S.A. (Sadeem Alrasheed); software, S.A. (Sadeem Alrasheed); writing—original draft preparation, S.A. (Sadeem Alrasheed); investigation, S.A. (Suliman Aladhadh) and A.A.; project administration, S.A. (Suliman Aladhadh) and A.A.; supervision, S.A. (Suliman Aladhadh) and A.A.; validation, S.A. (Suliman Aladhadh) and A.A.; writing—review and editing, S.A. (Suliman Aladhadh) and A.A. All authors have read and agreed to the published version of the manuscript.

Funding

The authors gratefully acknowledge Qassim University, represented by the Deanship of Graduate Studies and Scientific Research, for the financial support for this research under the number (QU-J-PG-2-2025 - 54284) during the academic year 1446 AH/2024 AD.

Data Availability Statement

The authors have created a GitHub repository for this study, which contains all the code, datasets and documentation for the project titled "Protecting Intellectual Security Through Hate Speech Detection Using Artificial Intelligence Approaches" (GitHub repository: https://github.com/Sadeem1Alrasheed/Arabic-Hate-Speech-Detection-and-Classification, accessed on 28 February 2025).

Conflicts of Interest

The authors declare that they have no conflict of interest to report regarding the present study.

References

  1. Robinson, L.; Smith, M. Social Media and Mental Health: Social Media Addiction. HelpGuide.org. Available online: https://www.helpguide.org/mental-health/wellbeing/social-media-and-mental-health (accessed on 15 April 2024).
  2. Maarouf, A.; Pröllochs, N.; Feuerriegel, S. The Virality of Hate Speech on Social Media. Proc. ACM Hum. Comput. Interact. 2024, 8, 1–22. [Google Scholar] [CrossRef]
  3. Al-Hassan, A.; Al-Dossari, H. Detection of Hate Speech in Arabic Tweets Using Deep Learning. Multimed. Syst. 2021, 28, 1963–1974. [Google Scholar] [CrossRef]
  4. Alnazzawi, N. Using Twitter to Detect Hate Crimes and Their Motivations. HateMotiv Corpus. Data 2022, 7, 69. [Google Scholar] [CrossRef]
  5. Mateusz, B. The Impact of Social Media on Modern Communication: How Platforms like Facebook and Twitter Influence the Way We Connect with Each Other. aithor.com. Available online: https://aithor.com/essay-examples/the-impact-of-social-media-on-modern-communication-how-platforms-like-facebook-and-twitter-influence-the-way-we-connect-with-each-other (accessed on 9 April 2024).
  6. Meikle, G. Social Media; Routledge: London, UK, 2016. [Google Scholar] [CrossRef]
  7. Rehman, A.; Khan, M.; Abbas, A.; Javed, M.; Abbas, M.; Hussain, M.; Ul-Allah, S. Evaluation of genetic variability and heritability of wheat genotypes under late sowing effects. Biol. Clin. Sci. Res. J. 2023, 2023, 268. [Google Scholar] [CrossRef]
  8. Alhazmi, A.; Mahmud, R.; Idris, N.; Abo, M.E.M.; Eke, C. A systematic literature review of hate speech identification on Arabic Twitter data: Research challenges and future directions. PeerJ Comput. Sci. 2024, 10, e1966. [Google Scholar] [CrossRef]
  9. Poletto, F.; Basile, V.; Sanguinetti, M.; Bosco, C.; Patti, V. Resources and Benchmark Corpora for Hate Speech Detection: A Systematic Review. Lang. Resour. Eval. 2020, 55, 477–523. [Google Scholar] [CrossRef]
  10. Laub, Z. Hate Speech on Social Media: Global Comparisons. Council on Foreign Relations. Available online: https://www.cfr.org/backgrounder/hate-speech-social-media-global-comparisons (accessed on 12 January 2024).
  11. Das, S. Twitter Fails to Delete 99% of Racist Tweets Aimed at Footballers in Run-Up to World Cup. The Guardian. Available online: https://www.theguardian.com/technology/2022/nov/20/twitter-fails-to-delete-99-of-racist-tweets-aimed-at-footballers-in-run-up-to-world-cup (accessed on 1 November 2023).
  12. Faris, H.; Ibrahim, A.; Habib, M.; Castillo, P. Hate Speech Detection Using Word Embedding and Deep Learning in the Arabic Language Context. In Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods ICPRAM, Valletta, Malta, 22–24 February 2020; SciTePress: Setúbal, Portugal, 2020; Volume 1, pp. 453–460. [Google Scholar] [CrossRef]
  13. Radcliffe, D.; Abuhmaid, H.; Mahliaire, N. Social Media in the Middle East 2022: A Year in Review; SSRN Electronic Journal: Rochester, NY, USA, 2023. [Google Scholar] [CrossRef]
  14. Cohen-Almagor, R. Fighting Hate and Bigotry on the Internet. Ssrn.com. Available online: https://ssrn.com/abstract=1916552 (accessed on 9 May 2024).
  15. Jahan, M.S.; Oussalah, M. A systematic review of hate speech automatic detection using natural language processing. Neurocomputing 2023, 546, 126232. [Google Scholar] [CrossRef]
  16. Chaudhary, M.; Saxena, C.; Meng, H. Countering online hate speech: An nlp perspective. arXiv 2021, arXiv:2109.02941. [Google Scholar]
  17. Brown, A.; Sinclair, A. The Politics of Hate Speech Laws; Routledge: London, UK, 2019. [Google Scholar]
  18. Díaz-Faes, D.A.; Pereda, N. Is there such a thing as a hate crime paradigm? An integrative review of bias-motivated violent victimization and offending, its effects and underlying mechanisms. Trauma Violence Abus. 2022, 23, 938–952. [Google Scholar] [CrossRef]
  19. Moon, R. Putting Faith in Hate; Cambridge University Press: Cambridge, UK, 2018. [Google Scholar]
  20. Ganfure, G.O. Comparative Analysis of Deep Learning Based Afaan Oromo Hate Speech Detection. J. Big Data 2022, 9, 76. [Google Scholar] [CrossRef]
  21. Hüsünbeyi, Z.M.; Akar, D.; Özgür, A. Identifying Hate Speech Using Neural Networks and Discourse Analysis Techniques. In Proceedings of the First Workshop on Language Technology and Resources for a Fair, Inclusive, and Safe Society within the 13th Language Resources and Evaluation Conference, Marseille, France, 20–25 June 2022; pp. 32–41. [Google Scholar]
  22. Jahan, M.S.; Beddiar, D.; Oussalah, M.; Arhab, N.; Bounab, Y. Hate and Offensive Language Detection Using BERT for English Subtask A. In Proceedings of the Working Notes of FIRE 2021-Forum for Information Retrieval Evaluation, Gandhinagar, India, 13–17 December 2021; RWTH Aachen University: Aachen, Germany, 2021. [Google Scholar]
  23. Srivastava, A.; Hasan, M.; Yagnik, B.; Walambe, R.; Kotecha, K. Role of Artificial Intelligence in Detection of Hateful Speech for Hinglish Data on Social Media. In Applications of Artificial Intelligence and Machine Learning: Select Proceedings of ICAAAIML 2020; Springer Singapore: Singapore, 2021; pp. 83–95. [Google Scholar] [CrossRef]
  24. Aftan, S.; Shah, H. Using the AraBERT Model for Customer Satisfaction Classification of Telecom Sectors in Saudi Arabia. Brain Sci. 2023, 13, 147. [Google Scholar] [CrossRef]
  25. Koshiry, A.M.E.; Eliwa, E.H.I.; Abd El-Hafeez, T.; Omar, A. Arabic Toxic Tweet Classification: Leveraging the AraBERT Model. Big Data Cogn. Comput. 2023, 7, 170. [Google Scholar] [CrossRef]
  26. Antoun, W.; Baly, F.; HAJJ, H. Arabert: Transformer-based model for arabic language understanding. arXiv 2020, arXiv:2003.00104. [Google Scholar]
  27. Pires, T. How multilingual is multilingual BERT. arXiv 2019, arXiv:1906.01502. [Google Scholar]
  28. Wu, S.; Dredze, M. Are all languages created equal in multilingual BERT? arXiv 2020, arXiv:2005.09093. [Google Scholar]
  29. Plaza-del-Arco, F.M.; Molina-González, M.D.; Ureña-López, L.A.; Martín-Valdivia, M.T. Comparing Pre-Trained Language Models for Spanish Hate Speech Detection. Expert Syst. Appl. 2021, 166, 114120. [Google Scholar] [CrossRef]
  30. Albadi, N.; Kurdi, M.; Mishra, S. Are they Our Brothers? Analysis and Detection of Religious Hate Speech in the Arabic Twittersphere. In Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Barcelona, Spain, 28–31 August 2018. [Google Scholar] [CrossRef]
  31. Albadi, N. nuhaalbadi/Arabic_hatespeech. GitHub. Available online: https://github.com/nuhaalbadi/Arabic_hatespeech (accessed on 10 May 2024).
  32. Mulki, H.; Haddad, H.; Bechikh Ali, C.; Alshabani, H. L-HSAB: A Levantine Twitter Dataset for Hate Speech and Abusive Language. In Proceedings of the Third Workshop on Abusive Language Online, Florence, Italy, 1 August 2019; pp. 111–118. [Google Scholar] [CrossRef]
  33. Hala-Mulki. Hala-Mulki/L-HSAB-First-Arabic-Levantine-HateSpeech-Dataset. GitHub. Available online: https://github.com/Hala-Mulki/L-HSAB-First-Arabic-Levantine-HateSpeech-Dataset (accessed on 10 May 2024).
  34. Haddad, H.; Mulki, H.; Oueslati, A. T-HSAB: A Tunisian Hate Speech and Abusive Dataset. In Communications in Computer and Information Science; Springer International Publishing: Cham, Switzerland, 2019; pp. 251–263. [Google Scholar] [CrossRef]
  35. Hala-Mulki. Hala-Mulki/T-HSAB-A-Tunisian-Hate-Speech-and-Abusive-Dataset. GitHub. Available online: https://github.com/Hala-Mulki/T-HSAB-A-Tunisian-Hate-Speech-and-Abusive-Dataset (accessed on 24 May 2024).
  36. Haddad, B.; Zoher, O.; Anas, A.-A.; Ghneim, N. Arabic Offensive Language Detection with Attention-Based Deep Neural Networks. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, Marseille, France, 11–16 May 2020; pp. 76–81. [Google Scholar]
  37. The 4th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT4). edinburghnlp.inf.ed.ac.uk. Available online: https://edinburghnlp.inf.ed.ac.uk/workshops/OSACT4/ (accessed on 3 June 2024).
  38. Omar, A.; Mahmoud, T.M.; Abd-El-Hafeez, T. Comparative Performance of Machine Learning and Deep Learning Algorithms for Arabic Hate Speech Detection in OSNs. In Proceedings of the International Conference on Artificial Intelligence and Computer Vision (AICV2020), Cham, Switzerland, 24 March 2020; pp. 247–257. [Google Scholar] [CrossRef]
  39. Husain, F. OSACT4 Shared Task on Offensive Language Detection: Intensive Preprocessing-Based Approach. arXiv 2020, arXiv:2005.07297. [Google Scholar]
  40. Alharbi, A.I.; Lee, M. Combining Character and Word Embeddings for the Detection of Offensive Language in Arabic. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, Marseille, France, 11–16 May 2020; pp. 91–96. [Google Scholar]
  41. Djandji, M.; Baly, F.; Antoun, W.; Hajj, H. Multi-Task Learning Using AraBert for Offensive Language Detection. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, Marseille, France, 11–16 May 2020; pp. 97–101. [Google Scholar]
  42. Alshaalan, R.; Al-Khalifa, H. Hate Speech Detection in Saudi Twittersphere: A Deep Learning Approach. In Proceedings of the Fifth Arabic Natural Language Processing Workshop, Barcelona, Spain, 12 December 2020; pp. 12–23.
  43. Aljarah, I.; Habib, M.; Hijazi, N.; Faris, H.; Qaddoura, R.; Hammo, B.; Abushariah, M.; Alfawareh, M. Intelligent Detection of Hate Speech in Arabic Social Network: A Machine Learning Approach. J. Inf. Sci. 2020, 47, 483–501.
  44. Duwairi, R.; Hayajneh, A.; Quwaider, M. A Deep Learning Framework for Automatic Detection of Hate Speech Embedded in Arabic Tweets. Arab. J. Sci. Eng. 2021, 46, 4001–4014.
  45. Anezi, F.Y.A. Arabic Hate Speech Detection Using Deep Recurrent Neural Networks. Appl. Sci. 2022, 12, 6010.
  46. Mohaouchane, H.; Mourhir, A.; Nikolov, N.S. Detecting Offensive Language on Arabic Social Media Using Deep Learning. In Proceedings of the 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS), Granada, Spain, 22–25 October 2019.
  47. Alakrot, A.; Murray, L.; Nikolov, N.S. Dataset Construction for the Detection of Anti-Social Behaviour in Online Communication in Arabic. Procedia Comput. Sci. 2018, 142, 174–181.
  48. Shannaq, F.; Hammo, B.; Faris, H.; Castillo-Valdivieso, P.A. Offensive Language Detection in Arabic Social Networks Using Evolutionary-Based Classifiers Learned from Fine-Tuned Embeddings. IEEE Access 2022, 10, 75018–75039.
  49. Shannag, F.; Hammo, B.H.; Faris, H. The Design, Construction and Evaluation of Annotated Arabic Cyberbullying Corpus. Educ. Inf. Technol. 2022, 27, 10977–11023.
  50. Hatebase. Available online: https://hatebase.org (accessed on 9 December 2024).
  51. Ahmad, A.; Azzeh, M.; Alnagi, E.; Abu Al-Haija, Q.; Halabi, D.; Aref, A.; AbuHour, Y. Hate Speech Detection in the Arabic Language: Corpus Design, Construction, and Evaluation. Front. Artif. Intell. 2024, 7, 1345445.
  52. Aladeemy, A.A.; Alzahrani, A.; Algarni, M.H.; Alsubari, S.N.; Aldhyani, T.H.; Deshmukh, S.N.; Khalaf, O.I.; Wong, W.-K.; Aqburi, S. Advancements and Challenges in Arabic Sentiment Analysis: A Decade of Methodologies, Applications, and Resource Development. Heliyon 2024, 10, e39786.
  53. Mednini, L.; Noubigh, Z.; Turki, M.D. Natural Language Processing for Detecting Brand Hate Speech. J. Telecommun. Digit. Econ. 2024, 12, 486–509.
  54. Abdelsamie, M.M.; Azab, S.S.; Hefny, H.A. A Comprehensive Review on Arabic Offensive Language and Hate Speech Detection on Social Media: Methods, Challenges and Solutions. Soc. Netw. Anal. Min. 2024, 14, 1–49.
Figure 1. Growth of X usage among Arabic speakers (2015–2023) [12].
Figure 2. Proposed framework for hate speech detection and classification.
Figure 3. Distribution of tweets across dataset classes.
Figure 4. In panel (a), the distribution of “Hate” and “Not Hate” classes is shown for each of the 5 folds used during stratified K-fold cross-validation. Panel (b) depicts the distribution of hate speech categories—“Social-HS”, “Political-HS”, and “Religious-HS”—in the second layer across the same folds.
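The stratified 5-fold split described in Figure 4 can be sketched with scikit-learn’s `StratifiedKFold`. The labels below are invented placeholders standing in for the annotated tweets (a hypothetical 40/60 “Hate”/“Not Hate” ratio), not the paper’s data; the point is only that each fold preserves the overall class proportions.

```python
from collections import Counter
from sklearn.model_selection import StratifiedKFold

X = list(range(100))                      # placeholder tweet indices
y = ["Hate"] * 40 + ["Not Hate"] * 60     # hypothetical 40/60 class ratio

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for i, (train_idx, test_idx) in enumerate(skf.split(X, y), start=1):
    # every test fold keeps the 40/60 ratio: 8 "Hate" and 12 "Not Hate"
    dist = Counter(y[j] for j in test_idx)
    print(f"Fold {i}: {dict(dist)}")
```

The same call with the second-layer labels (“Social-HS”, “Political-HS”, “Religious-HS”) yields the per-fold distributions shown in panel (b).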
Figure 5. Precision–recall (PR) curve for a repetition of the experiment with different deep learning models for the first layer (trainable embeddings).
Figure 6. Precision–recall (PR) curve for a repetition of the experiment with different deep learning models for the first layer (pretrained embeddings).
Figure 7. Precision–recall (PR) curve for a repetition of the experiment with different deep learning models for the first layer (AraBERT pretrained embeddings).
Figure 8. Evaluation of the performance of different machine learning and deep learning models (Word-N-grams feature).
Figure 9. Evaluation of the performance of different machine learning and deep learning models (Char-N-grams feature).
Figure 10. Evaluation of the performance of different machine learning and deep learning models (Word + Char-N-grams feature).
Figure 11. Precision–recall (PR) curve for a repetition of the experiment with different deep learning models for the second layer (trainable embeddings).
Figure 12. Precision–recall (PR) curve for a repetition of the experiment with different deep learning models for the second layer (AraVec pretrained embeddings).
Figure 13. Precision–recall (PR) curve for a repetition of the experiment with different deep learning models for the second layer (AraBERT pretrained embeddings).
Table 1. Summary of research related to hate speech detection.

| Rf. | Year | Techniques | Classifier | Language | Datasets | Preprocessing | Max-Score |
|---|---|---|---|---|---|---|---|
| [29] | 2018 | Machine learning/deep learning | LR, SVM, GRU | Arabic | Collection of 6000 tweets | Removing diacritics, punctuation, non-Arabic characters, emoticons and stop words; normalizing letters | GRU = 79% Acc. |
| [31] | 2019 | Machine learning | SVM, NB | Arabic | Collection of 5846 tweets | Removing non-Arabic characters, emojis, URLs | NB = 88.4% Acc. |
| [33] | 2019 | Machine learning | SVM, NB | Arabic | Collection of 6075 comments | Removing non-Arabic characters, emojis, URLs | NB = 87.9% Acc. |
| [22] | 2019 | Deep learning/machine learning | BERT, ELMo, FLAIR, Bi-LSTM | English/Hindi | Collection of 8000 tweets, YouTube and Instagram comments | Removing URLs, hashtags, mentions and punctuation | FLAIR BERTMU + Bi-LSTM = 73% Acc. |
| [35] | 2020 | Deep learning | BiGRU, CNN | Arabic | Collection of 10,000 tweets and YouTube comments | Removing diacritics, punctuation, non-Arabic characters, emoticons and stop words; normalizing letters | BiGRU = 91% Acc. |
| [37] | 2020 | Machine learning/deep learning | 12 machine learning models and 2 deep learning models (RNN, CNN) | Arabic | Collection of 20,000 tweets and YouTube, Facebook, Instagram comments | Removing non-Arabic characters, emojis, URLs | RNN = 98.7% Acc. |
| [38] | 2020 | Machine learning | SVM | Arabic | Collection of 10,000 tweets | Data cleaning, symbol removal, labeling, tokenization | SVM = 90.2% Acc. |
| [39] | 2020 | Machine learning/deep learning | LSTM, XGBoost | Arabic | Collection of 10,000 tweets | Removing punctuation marks and non-Arabic characters | LSTM = 96% Acc. |
| [40] | 2020 | Machine learning | AraBERT | Arabic | Collection of 10,000 comments | Tokenization (removing user mentions, retweets, URLs, diacritics, emojis and newlines); replacing underscores in hashtags with white spaces | AraBERT = 90.15% F1 |
| [41] | 2020 | Deep learning | RNN, CNN | Arabic | Collection of 9316 tweets | Removing non-Arabic characters, emojis, URLs | CNN = 83% Acc. |
| [42] | 2021 | Machine learning | SVM, NB, DT, RF | Arabic | Collection of 3696 tweets | Removing diacritics, punctuation, non-Arabic characters, emoticons and stop words; normalizing letters | RF = 91.3% Acc. |
| [43] | 2021 | Deep learning | CNN, CNN-LSTM, BiLSTM-CNN | Arabic | Collection of 23,678 comments | Removing diacritics, punctuation, non-Arabic characters, emoticons and stop words; normalizing letters | BiLSTM = 80% (binary), 74% (ternary), 73% (multi-label) Acc. |
| [21] | 2021 | Deep learning/machine learning | CNN, BERT, ALBERT, RoBERTa, LR | English | Collection of 3843 tweets | Removing special characters, numbers, newlines, mention tags | (CNN + BERT-large-uncased + BERT-base-uncased + ALBERT-xxlarge-v2) = 85.5% Acc. |
| [23] | 2021 | Deep learning/machine learning | LR, SVM, CNN, LSTM, BiLSTM, XLM, mBERT, BETO | Spanish | HaterNet dataset: collection of 6000 tweets; HatEval dataset: collection of 6600 tweets | Normalizing URLs, emails, user mentions, percent, money, time, date expressions and phone numbers | BETO = 65.8% F1 (HaterNet), 75.5% F1 (HatEval) |
| [44] | 2022 | Deep learning | RNN | Arabic | Collection of 4203 comments | Data cleaning, symbol removal, labeling, tokenization | RNN = 99.73% (binary), 95.38% (ternary), 84.14% (multi-label) Acc. |
| [45] | 2022 | Deep learning | CNN, CNN-LSTM, BiLSTM | Arabic | Collection of 15,050 comments | Data cleaning, symbol removal, labeling, tokenization | CNN-LSTM = 87.27% Acc., 83.65% F1, 83.89% P, 83.46% R |
| [47] | 2022 | Machine learning | GA-SVM, GA-XGBoost | Arabic | Collection of 4505 tweets | Removing non-Arabic characters, emojis, URLs | SVM = 88% Acc. |
| [19] | 2022 | Deep learning | CNN, LSTM, BiLSTM, GRU, CNN-LSTM | Ethiopian | Collection of 42,100 tweets and Facebook comments | Tokenization, normalization, stop word removal | Bi-LSTM = 91% F1 |
| [20] | 2022 | Deep learning/machine learning | CNN + GRU, BERT, HAN | Turkish | Collection of 18,316 news articles | Removing non-Turkish characters, numbers, URLs | BERT = 90.4% Acc. |
Table 2. Publicly available Arabic language datasets.

| Rf. of Study Used | Dataset | Source | Region | Size | Labels | Year | Rf. of Dataset |
|---|---|---|---|---|---|---|---|
| [29,41] | Religious hate speech dataset | X | Multinational | 6136 tweets | Hate (2762), not hate (3374) | 2019 | [29] |
| [29,43] | L-HSAB (hate speech and offensive language) | X | Levantine | 5846 tweets | Normal (3650), offensive (1728), hate (468) | 2019 | [31] |
| [33] | T-HSAB (hate speech and offensive language) | X | Tunisian | 6075 tweets | Normal (3834), offensive (1127), hate (1078) | 2019 | [33] |
| [35,38,39,40,43] | OSACT4 (hate speech and offensive language) | X | Multinational | 20,000 tweets | Offensive (1900), not offensive (8100), hate (50), not hate (9950) | 2020 | [35] |
| [45] | Offensive language | YouTube | Multinational | 14,202 comments | Offensive (9349), not offensive (4853) | 2018 | [45] |
| [47] | ArCybC (offensive language, cyberbullying) | X | Multinational | 4505 tweets | Offensive (1887), cyberbullying (1728), normal (890) | 2022 | [47] |
Table 3. Evaluation metrics used for experiments.

| Evaluation Metric | Equation | Description |
|---|---|---|
| Accuracy | (TP + TN) / (TP + TN + FP + FN) | The proportion of correctly classified instances out of all instances: the sum of true positives (TP) and true negatives (TN) divided by the total of TP, TN, false positives (FP) and false negatives (FN). |
| Precision | TP / (TP + FP) | The proportion of true positives (TP) out of all positive predictions (TP + FP). |
| Recall | TP / (TP + FN) | The proportion of true positives (TP) out of all actual positives (TP + FN). |
| F1 score | 2 × Precision × Recall / (Precision + Recall) | The harmonic mean of precision and recall. |
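The four metrics in Table 3 can be written out directly from the confusion-matrix counts; the counts below are invented purely for illustration.

```python
def metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the evaluation metrics of Table 3 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

m = metrics(tp=80, tn=90, fp=10, fn=20)
# accuracy = 0.85, precision ≈ 0.889, recall = 0.80, f1 ≈ 0.842
print(m)
```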
Table 4. Results obtained using various machine learning models, including support vector machine (SVM), random forest (RF), naïve Bayes (NB), and logistic regression (LR).

| Model | Features | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| LR | Word-N-grams | 74.90 | 74.88 | 73.83 | 74.07 |
| LR | Char-N-grams | 76.86 | 76.74 | 75.95 | 76.18 |
| LR | Word + Char-N-grams | 77.81 | 77.58 | 77.14 | 77.29 |
| SVM | Word-N-grams | 74.49 | 74.16 | 74.26 | 75.02 |
| SVM | Char-N-grams | 77.16 | 76.70 | 76.86 | 77.39 |
| SVM | Word + Char-N-grams | 77.06 | 76.82 | 76.90 | 77.28 |
| NB | Word-N-grams | 76.24 | 76.89 | 74.69 | 75.06 |
| NB | Char-N-grams | 76.01 | 75.73 | 75.30 | 75.45 |
| NB | Word + Char-N-grams | 76.93 | 76.60 | 76.46 | 76.51 |
| RF | Word-N-grams | 72.86 | 72.71 | 71.70 | 71.91 |
| RF | Char-N-grams | 74.76 | 74.91 | 73.42 | 73.71 |
| RF | Word + Char-N-grams | 74.45 | 74.62 | 73.06 | 73.35 |
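The “Word + Char-N-grams” feature combination in Table 4 can be built by concatenating word-level and character-level TF-IDF n-gram features and feeding them to a classifier. The n-gram ranges and the toy training texts below are assumptions for illustration, not the paper’s exact settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

# Concatenate word-level and character-level TF-IDF n-grams
features = FeatureUnion([
    ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
    ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
])
model = Pipeline([("features", features), ("clf", LogisticRegression(max_iter=1000))])

# Toy stand-ins for preprocessed tweets and their binary hate labels
texts = ["example tweet one", "another example tweet",
         "something different here", "one more sample"]
labels = [0, 1, 0, 1]
model.fit(texts, labels)
print(model.predict(["example tweet"]))
```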
Table 5. A repetition of the experiment with different deep learning models for the first layer.

(a) Trainable embeddings

| Model | Features | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| RNN | Trainable embeddings | 85.81 | 89.48 | 84.55 | 86.93 |
| LSTM | Trainable embeddings | 90.09 | 91.22 | 91.16 | 91.19 |
| BiLSTM | Trainable embeddings | 88.82 | 91.52 | 88.07 | 89.73 |
| CNN | Trainable embeddings | 91.95 | 92.59 | 93.36 | 92.96 |

(b) Pretrained embeddings

| Model | Features | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| RNN | AraVec pretrained embeddings | 79.77 | 84.97 | 77.83 | 81.24 |
| LSTM | AraVec pretrained embeddings | 79.97 | 82.68 | 81.47 | 82.07 |
| BiLSTM | AraVec pretrained embeddings | 79.68 | 82.02 | 81.87 | 81.93 |
| CNN | AraVec pretrained embeddings | 92.75 | 94.11 | 93.18 | 93.64 |
| AraBERT | AraBERT pretrained embeddings | 91.67 | 91.78 | 91.28 | 91.49 |
| DistilBERT | AraBERT pretrained embeddings | 80.37 | 80.25 | 79.66 | 79.88 |
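A minimal sketch of a text-CNN of the kind compared in Table 5, with a trainable embedding layer feeding a 1D convolution and global max pooling. The layer sizes, kernel width and vocabulary size are illustrative assumptions, not the configuration reported in the paper.

```python
import tensorflow as tf

def build_text_cnn(vocab_size: int = 20000) -> tf.keras.Model:
    """Text-CNN sketch: trainable embeddings -> Conv1D -> max pool -> sigmoid."""
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, 128),       # trainable embeddings
        tf.keras.layers.Conv1D(128, kernel_size=4, activation="relu"),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),   # hate vs. not hate
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

cnn = build_text_cnn()
out = cnn(tf.zeros((2, 64), dtype=tf.int32))  # batch of 2 padded token sequences
print(out.shape)                              # (2, 1): one probability per tweet
```

Swapping the `Embedding` layer’s weights for AraVec or AraBERT vectors (and freezing them) gives the pretrained-embedding variants of Table 5b.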
Table 6. Evaluation of the performance of different machine learning and deep learning models in classifying types of hate speech using different feature representations.

| Model | Features | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| LR | Word-N-grams | 71.10 | 74.39 | 62.81 | 64.84 |
| LR | Char-N-grams | 74.91 | 77.11 | 69.04 | 71.35 |
| LR | Word + Char-N-grams | 75.58 | 77.61 | 70.04 | 72.31 |
| SVM | Word-N-grams | 72.22 | 74.34 | 66.59 | 68.83 |
| SVM | Char-N-grams | 75.09 | 76.21 | 70.90 | 72.76 |
| SVM | Word + Char-N-grams | 73.99 | 74.15 | 70.80 | 72.13 |
| NB | Word-N-grams | 68.02 | 75.76 | 55.09 | 53.19 |
| NB | Char-N-grams | 72.58 | 78.04 | 63.43 | 65.38 |
| NB | Word + Char-N-grams | 74.13 | 77.89 | 66.07 | 68.33 |
| RF | Word-N-grams | 69.77 | 73.32 | 62.55 | 64.76 |
| RF | Char-N-grams | 71.05 | 73.28 | 64.38 | 66.55 |
| RF | Word + Char-N-grams | 71.66 | 74.14 | 65.27 | 67.52 |
Table 7. A repetition of the experiment with different deep learning models for the second layer.

(a) Trainable embeddings

| Model | Features | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| RNN | Trainable embeddings | 85.13 | 84.14 | 81.33 | 82.26 |
| LSTM | Trainable embeddings | 79.92 | 71.85 | 70.61 | 68.79 |
| BiLSTM | Trainable embeddings | 83.27 | 72.33 | 72.81 | 71.84 |
| CNN | Trainable embeddings | 93.23 | 93.04 | 91.39 | 91.93 |

(b) Pretrained embeddings

| Model | Features | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| RNN | AraVec pretrained embeddings | 80.39 | 79.52 | 79.35 | 79.38 |
| LSTM | AraVec pretrained embeddings | 80.68 | 80.14 | 80.00 | 80.01 |
| BiLSTM | AraVec pretrained embeddings | 78.06 | 77.00 | 77.05 | 76.79 |
| CNN | AraVec pretrained embeddings | 93.33 | 93.33 | 91.95 | 92.36 |
| AraBERT | AraBERT pretrained embeddings | 84.29 | 84.85 | 83.59 | 84.15 |
| DistilBERT | AraBERT pretrained embeddings | 78.68 | 77.57 | 77.18 | 77.24 |
Table 8. Confidence intervals for accuracy and F1 scores for CNN.

| Model | Metric | Mean (%) | 95% CI (%) |
|---|---|---|---|
| CNN | Accuracy | 92.0 | [91.5, 92.5] |
| CNN | F1 score | 93.0 | [92.4, 93.6] |
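A 95% confidence interval like those in Table 8 can be derived from per-fold scores with a normal approximation. The five scores below are invented to illustrate the computation, not the paper’s actual fold results.

```python
import statistics

scores = [91.6, 92.3, 91.9, 92.2, 92.0]   # hypothetical per-fold accuracies (%)
mean = statistics.mean(scores)
sem = statistics.stdev(scores) / len(scores) ** 0.5  # standard error of the mean
lo, hi = mean - 1.96 * sem, mean + 1.96 * sem        # 95% normal-approx CI
print(f"mean={mean:.1f}%, 95% CI=[{lo:.1f}%, {hi:.1f}%]")
# -> mean=92.0%, 95% CI=[91.8%, 92.2%]
```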
Table 9. Accuracy and F1 scores for CNN and AraBERT on the publicly available L-HSAB and OSACT4 datasets.

| Dataset | Model | Accuracy (%) | F1 Score (%) |
|---|---|---|---|
| L-HSAB | CNN | 89.5 | 88.3 |
| L-HSAB | AraBERT | 91.8 | 90.7 |
| OSACT4 | CNN | 87.4 | 85.0 |
| OSACT4 | AraBERT | 90.2 | 88.5 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Alrasheed, S.; Aladhadh, S.; Alabdulatif, A. Protecting Intellectual Security Through Hate Speech Detection Using an Artificial Intelligence Approach. Algorithms 2025, 18, 179. https://doi.org/10.3390/a18040179