Article

Deobfuscating Iraqi Arabic Leetspeak for Hate Speech Detection Using AraBERT and Hierarchical Attention Network (HAN)

by Dheyauldeen Marzoog 1,* and Hasan Çakir 2
1 Information System Department, Graduate School of Informatics, Gazi University, Ankara 06500, Turkey
2 Computer and Instructional Technologies Education, Gazi Education Faculty, Gazi University, Ankara 06500, Turkey
* Author to whom correspondence should be addressed.
Electronics 2025, 14(21), 4318; https://doi.org/10.3390/electronics14214318
Submission received: 16 October 2025 / Revised: 30 October 2025 / Accepted: 31 October 2025 / Published: 3 November 2025

Abstract

The widespread use of leetspeak and dialectal Arabic on social media poses a critical challenge to automated hate speech detection systems. Existing Arabic NLP models, largely trained on Modern Standard Arabic (MSA), struggle with obfuscated, noisy, and dialect-specific text, leading to poor generalization in real-world scenarios. This study introduces a Hybrid AraBERT–Hierarchical Attention Network (HAN) framework for deobfuscating Iraqi Arabic leetspeak and accurately classifying hate speech. The proposed model employs a custom normalization pipeline that converts digits, symbols, and Latin-script substitutions (e.g., "3يب" → "عيب") into canonical Arabic forms, thereby enhancing tokenization and embedding quality. AraBERT provides deep contextualized representations optimized for Arabic morphology, while HAN hierarchically aggregates and attends to critical words and sentences to improve interpretability and semantic focus. Experimental evaluation on an Iraqi Arabic social media dataset demonstrates that the proposed model achieves 97% accuracy, 96% precision, 96% recall, 96% F1-score, and 0.98 ROC–AUC, outperforming standalone AraBERT and HAN models by up to 6% in F1-score and 4% in AUC. Ablation studies confirm the important role of the normalization stage (F1 = 0.91 without it) and the contribution of hierarchical attention in balancing precision and recall. Robustness testing under controlled perturbations (including character substitutions, symbol obfuscations, typographical noise, and class imbalance) shows performance retention above 91% F1, validating the framework’s noise tolerance and generalization capability. Comparative analysis with state-of-the-art approaches such as DRNNs, arHateDetector, and ensemble BERT systems further highlights the hybrid model’s effectiveness in handling noisy, dialectal, and adversarial text.

1. Introduction

The rapid growth of social media platforms has created a digital environment where users freely express opinions, interact, and engage in public debates (see Figure 1). However, this openness has also given rise to a significant challenge: the spread of hate speech and abusive content. An optimization algorithm is essential for hate speech detection to fine-tune model parameters efficiently, enhance feature selection, and achieve higher accuracy and robustness across diverse and noisy linguistic patterns [1]. For Arabic-speaking communities, and particularly in Iraq, this problem is exacerbated by two key factors: the prevalence of dialectal Arabic and the creative use of leetspeak—where characters, numbers, or symbols replace standard letters (e.g., writing “3يب” instead of “عيب”). This practice allows users to disguise offensive expressions, evade moderation systems, and add emphasis to derogatory terms. Such obfuscation makes automatic detection of hate speech in Iraqi Arabic text especially difficult [2].
The problem is further compounded by the limitations of current NLP systems. Many hate speech detection models for Arabic are trained primarily on Modern Standard Arabic (MSA), overlooking the rich variations in Iraqi dialect [3]. At the same time, these models often fail when text is intentionally distorted through leetspeak, which alters orthographic patterns and disrupts tokenization. Consequently, existing systems misclassify abusive messages or fail to detect them altogether, undermining content moderation and enabling harmful discourse [4]. There is therefore a pressing need for approaches that can robustly handle both dialectal variation and deliberate textual obfuscation.
This study is motivated by the growing societal impact of online hate speech in Iraq, where digital platforms are increasingly shaping public opinion. Developing effective tools to automatically detect and filter hateful content can help mitigate polarization and promote safer online communities [5]. From a technical perspective, Iraqi Arabic leetspeak provides a challenging but important test case for advancing Arabic NLP, pushing beyond clean text benchmarks toward real-world noisy data. Addressing this challenge requires models that combine strong language representations, dialectal robustness, and mechanisms to identify which parts of the text carry hateful meaning.
To achieve this, the objective of this paper is to design and evaluate a hybrid framework that integrates AraBERT—a pretrained transformer model optimized for Arabic—with a Hierarchical Attention Network (HAN). The proposed system first applies normalization to deobfuscate leetspeak text into canonical Arabic. AraBERT then provides contextual embeddings that capture deep semantic and syntactic features of Iraqi Arabic. Finally, the HAN architecture models both word-level and sentence-level structures while employing attention mechanisms to highlight the most informative parts of the text. This hierarchical design allows the model to effectively capture linguistic nuances and improve interpretability by showing which words or phrases contribute most to a hate speech classification decision.
The contributions of this paper are as follows:
  • We propose a robust normalization pipeline specifically designed to deobfuscate Iraqi Arabic leetspeak text.
  • We introduce a novel integration of AraBERT with a Hierarchical Attention Network for hate speech detection, combining contextual embeddings with multi-level attention.
  • We evaluate the framework on an Iraqi Arabic social media dataset, demonstrating its enhanced performance and robustness compared to baseline models such as AraBERT-BiLSTM.
  • We provide interpretability through attention visualization, enabling deeper insights into how the model identifies hate speech.
The rest of this paper is organized as follows. Section 2 reviews related work on Arabic hate speech detection, leetspeak processing, and attention-based models. Section 3 presents the proposed AraBERT–HAN framework. Section 4 discusses the experimental setup, evaluation metrics, and results. Section 5 concludes the paper and outlines future directions for extending leetspeak-aware NLP systems in Arabic dialects.

2. Related Works

The detection of hate speech and offensive language in Arabic has become a rapidly evolving area of research, with recent surveys offering critical insights into the field. Itriq et al. [6] conducted a state-of-the-art review covering advances between 2020 and 2024, highlighting the transition from traditional machine learning approaches to transformer-based models such as AraBERT and MARBERT. Their analysis emphasized challenges including dataset scarcity, orthographic variation, and the lack of standardized evaluation protocols, while also outlining future directions such as integrating dialectal diversity and adversarial robustness. This survey provides a strong foundation for understanding the trajectory of Arabic hate speech detection research and the gaps that remain open.
Several contributions have emerged from shared tasks that benchmark Arabic offensive language detection. Shapiro et al. [7] presented their system for the Arabic Hate Speech 2022 shared task, tackling three subtasks ranging from binary offensive detection to fine-grained classification of hate categories. Their approach employed transformer models alongside contrastive and multi-task learning to overcome overfitting in small or imbalanced datasets, achieving competitive macro-F1 scores across subtasks. Their findings demonstrate both the potential and the limitations of transformer fine-tuning under data constraints.
Parallel to system development, new datasets have been constructed to address gaps in coverage and diversity. Omar et al. [8] introduced a multi-platform Arabic hate speech dataset collected from Facebook, Twitter, Instagram, and YouTube, representing one of the earliest efforts to move beyond single-platform corpora. By benchmarking twelve machine learning algorithms and two deep learning models, they showed that recurrent neural networks outperformed others, achieving accuracy levels above 98%. Similarly, Muzakir et al. [9] investigated hate speech detection in Indonesian Twitter data but their results provide broader insights for multilingual NLP, demonstrating that support vector machines with SMOTE augmentation yielded strong results. Although this study was not Arabic-focused, it illustrates how dataset balancing strategies can significantly improve classification performance, a lesson applicable to Arabic contexts where imbalanced data is a recurring problem.
Transformer and hybrid cascaded architectures have also demonstrated strong promise. Mousa et al. [10] proposed a cascaded model that combined AraBERT with BiLSTM and Radial Basis Function classifiers for multiclass and multilabel offensive language classification. Their system achieved remarkable results with F1-scores approaching 98%, highlighting the advantages of combining deep contextual embeddings with sequential and traditional classifiers. Mazari et al. [11] advanced this direction by proposing a BERT-based ensemble learning method, integrating multiple configurations of BERT with pooling and BiLSTM layers. Their approach achieved F1-scores above 94%, outperforming the OffensEval 2020 winning system, thereby confirming the effectiveness of ensemble strategies in enhancing robustness.
Expanding beyond single-dialect datasets, the authors of [12] introduced the arHateDetector framework, which integrates both Modern Standard Arabic (MSA) and dialectal variants through the newly compiled arHateDataset. Their experiments compared multiple architectures and showed that AraBERT achieved strong performance (up to 93% accuracy) across heterogeneous corpora. However, the study primarily relied on clean, non-obfuscated text and did not explicitly address adversarial inputs or the leetspeak distortions that are common in real-world social media. As a result, while arHateDetector demonstrates cross-dialectal adaptability, its robustness under noisy and intentionally manipulated text remains limited.
Similarly, Al Anezi [13] employed Deep Recurrent Neural Networks (DRNNs) for Arabic hate speech classification across seven categories, achieving 99% accuracy on binary tasks and 84% on multi-class setups. Despite these promising figures, DRNN-based approaches tend to function as black-box models with minimal interpretability, offering little insight into which linguistic cues drive classification decisions. Furthermore, recurrent models struggle with long-range dependencies and orthographic noise, especially in dialectal or leetspeak-rich environments. In contrast, the present work explicitly addresses these limitations through leetspeak-aware normalization and hierarchical attention mechanisms, which enhance both robustness and transparency in hate speech detection.
Researchers have also investigated ensemble and adversarial strategies to improve robustness. Magnossão de Paula et al. [14] evaluated multiple transformer models and their ensembles in the CERIST NLP Challenge 2022, finding that majority-vote ensembles significantly improved generalization, achieving F1-scores of 0.60 on Arabic hate speech detection. Alshahrani and Aksoy [15] proposed an adversarially robust multitask learning framework that integrated MARBERTv2 with BiGRU layers. Their approach addressed adversarial perturbations at both character and sentence levels, demonstrating that adversarial training improved macro-F1 from 74% to 81% under attack while preserving high performance on clean data. This work underscored the importance of robustness in real-world deployment where malicious obfuscation is common.
Alternative perspectives have explored hybrid modeling with auxiliary features. Elzayady et al. [16] introduced a personality-trait-based hybrid approach, combining linguistic features with psychological cues to detect hate speech in Arabic social media. Their study suggested that integrating user-centric traits alongside textual analysis can improve classification, though it requires complex data annotation. Table 1 summarizes the key contributions, methods, and findings of recent studies on Arabic hate speech detection, highlighting the evolution from classical models to transformer-based and hybrid architectures.
The reviewed literature demonstrates significant advances in Arabic hate speech detection, but several limitations remain unresolved. First, most works focus on Modern Standard Arabic or single dialects [6], while neglecting dialectal variation such as Iraqi Arabic, which introduces unique vocabulary and orthographic practices. This lack of dialect coverage limits the generalizability of existing systems. Second, although datasets have expanded across platforms [8], they rarely capture intentional obfuscation techniques like leetspeak, where users substitute characters with digits or symbols to bypass moderation. Such noise undermines the effectiveness of tokenization and embedding, leaving current models vulnerable to misclassification. Third, while transformer-based architectures such as AraBERT, MARBERT, and their ensembles [11,15,17] have achieved strong results, they often act as black-box classifiers without providing interpretability. Few studies have explored attention mechanisms that can highlight the specific words or segments driving hate speech predictions, which is essential for sensitive applications. Fourth, robustness to adversarial manipulation remains an open challenge. Although adversarial training has been introduced [15], most existing models degrade substantially under obfuscation or adversarial perturbations, demonstrating a need for architectures that integrate noise resilience and interpretability simultaneously. Finally, while hybrid methods combining text with auxiliary features (e.g., personality traits [16]) have been proposed, these approaches increase data collection complexity without directly addressing the linguistic noise prevalent in real-world social media.
The proposed framework directly addresses these gaps through three innovations. First, it introduces a leetspeak-aware normalization pipeline specifically tailored for Iraqi Arabic, converting digits, symbols, and non-standard orthography into canonical forms before processing. This step ensures the model remains robust to obfuscation, filling a critical gap where prior works overlooked intentional noise. Second, the framework combines AraBERT embeddings with a Hierarchical Attention Network (HAN). AraBERT provides strong contextualized representations for Arabic text, while HAN captures both word-level and sentence-level dependencies, enabling the model to detect subtle hate speech cues across longer texts. The hierarchical attention mechanism further provides interpretability, highlighting which tokens and phrases are most influential in classification decisions. Third, the proposed method enhances robustness and generalizability by targeting a dialect-rich and noise-prone domain (Iraqi Arabic leetspeak), which has been underrepresented in prior research.

3. Proposed Method

The proposed method aims to address the unique challenges of detecting hate speech in Iraqi Arabic text, particularly when users intentionally distort language through leetspeak to bypass moderation systems. Conventional approaches, even those employing advanced transformers such as AraBERT, often struggle with obfuscated orthography, leading to misclassification and reduced robustness. To overcome this limitation, we introduce a hybrid framework that combines a leetspeak-aware normalization pipeline with the powerful contextual embeddings of AraBERT and the multi-level interpretability of a Hierarchical Attention Network (HAN) as shown in Figure 2.
The normalization module first deobfuscates text by converting digits, symbols, and irregular spellings into their canonical Arabic forms, thereby enhancing tokenization and reducing noise. The cleaned text is then processed by AraBERT to generate rich contextual embeddings, which capture semantic and syntactic nuances of Iraqi Arabic. Finally, HAN is employed to model hierarchical structures at the word and sentence levels, applying attention mechanisms that not only improve classification accuracy but also provide interpretability by highlighting key terms and expressions that drive hate speech detection.
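As a minimal sketch of the normalization stage, assuming the character unification rules and the leetspeak map listed in Algorithm 1, the two steps could be written in Python roughly as follows. The function names, the diacritic code-point range, and the whitespace handling are illustrative assumptions, not details taken from the paper.

```python
import re

# Leetspeak map as listed in Algorithm 1.
LEET_MAP = {
    "2": "ء", "3": "ع", "4": "غ", "5": "خ", "6": "ط", "7": "ح",
    "8": "ق", "9": "ص", "0": "و", "@": "ا", "$": "س",
}
LEET_TABLE = str.maketrans(LEET_MAP)

# Character unification rules from Algorithm 1, written as escapes
# so that each key is guaranteed to be a single code point.
UNIFY_TABLE = str.maketrans({
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0649": "\u064A",  # alef maksura          -> yeh
    "\u0624": "\u0648",  # waw with hamza        -> waw
    "\u0626": "\u064A",  # yeh with hamza        -> yeh
    "\u0629": "\u0647",  # teh marbuta           -> heh
})

TATWEEL = "\u0640"
DIACRITICS = re.compile("[\u064B-\u0652]")  # assumed range: fathatan to sukun
MULTISPACE = re.compile(r"\s+")


def normalize_arabic(text):
    """Unify letter variants, drop tatweel and diacritics, collapse spaces."""
    text = text.translate(UNIFY_TABLE)
    text = text.replace(TATWEEL, "")
    text = DIACRITICS.sub("", text)
    return MULTISPACE.sub(" ", text).strip()


def deobfuscate(text):
    """Normalize first, then map leetspeak digits and symbols to letters."""
    return normalize_arabic(text).translate(LEET_TABLE)


print(deobfuscate("3\u064a\u0628"))  # leetspeak spelling of the abstract's example
```

Note that a character-level map of this kind rewrites every digit, so genuine numerals in a post would also be converted; whether the original pipeline guards against that is not stated in the text.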

3.1. AraBERT

AraBERT is a transformer-based language model specifically designed for Arabic text, built on the Bidirectional Encoder Representations from Transformers (BERT) architecture [17]. Like BERT, AraBERT employs a deep bidirectional transformer encoder that learns contextual word embeddings by considering both left and right contexts simultaneously. Its pretraining objectives include the Masked Language Model (MLM) and Next Sentence Prediction (NSP) tasks [18]. In the MLM task, a percentage of input tokens are masked, and the model predicts them based on surrounding context, while NSP enables learning inter-sentence relationships. Formally, given an input sequence of tokens $T = \{t_1, t_2, \ldots, t_n\}$, the model constructs embeddings $E_{t_i}$ and feeds them into a multi-layer bidirectional transformer encoder. The attention mechanism is central, with each attention head computing:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$
where $Q = X W_Q$, $K = X W_K$, and $V = X W_V$ are query, key, and value matrices derived from the input $X$, and $d_k$ is the dimensionality of the key vectors. By stacking multiple attention heads, AraBERT captures diverse contextual relationships in Arabic text, including morphology, syntactic dependencies, and semantic meanings.
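To make the attention formula concrete, the following is a small pure-Python sketch of scaled dot-product attention for a single head. It operates on nested lists rather than tensors and omits the learned projections, so the caller is assumed to pass already-projected query, key, and value matrices; the helper names are illustrative.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention(q, k, v):
    """Return softmax(Q K^T / sqrt(d_k)) V and the attention weights."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Identical keys yield uniform weights, so the output is the mean of the values.
out, w = attention([[1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]], [[2.0, 0.0], [4.0, 0.0]])
print(w, out)
```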
In the proposed method, AraBERT plays a dual role. First, it acts as a feature extractor, transforming raw normalized text into contextual embeddings. Each input sentence $S = \{w_1, w_2, \ldots, w_m\}$ is tokenized and mapped into dense vectors, producing hidden states $H = \{h_1, h_2, \ldots, h_m\}$, where each $h_i \in \mathbb{R}^d$ is a contextualized embedding. These embeddings encode semantic and syntactic information, as well as dialectal variations captured during AraBERT’s pretraining on large Arabic corpora.
Second, these embeddings serve as the input to the Hierarchical Attention Network (HAN). Instead of using AraBERT solely for classification, we leverage its representations as a semantic backbone that feeds into HAN for word-level and sentence-level attention. Formally, given the AraBERT hidden states $H$, we define the input to HAN as follows:
$$u_i = \tanh(W_w h_i + b_w), \qquad \alpha_i = \frac{\exp(u_i^{\top} u_w)}{\sum_j \exp(u_j^{\top} u_w)}, \qquad s = \sum_i \alpha_i h_i$$
where $u_i$ is the intermediate representation of word $i$, $u_w$ is the context vector for word-level attention, $\alpha_i$ is the normalized attention weight, and $s$ is the aggregated sentence vector. This ensures that the most informative embeddings from AraBERT are emphasized before being passed to higher-level HAN layers.
Thus, AraBERT contributes by (1) robustly encoding normalized Iraqi Arabic leetspeak into rich embeddings, and (2) providing contextualized input for the hierarchical attention process, ensuring that the final model benefits from both pretrained knowledge of Arabic and fine-grained interpretability of hate speech cues.
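The word-level attention pooling defined above can be illustrated with a short pure-Python sketch. The toy hidden states stand in for AraBERT outputs, and the parameter values (an identity projection, zero bias, and a hand-picked context vector) are assumptions chosen only to make the behaviour easy to follow.

```python
import math


def word_attention(H, W_w, b_w, u_w):
    """Pool hidden states H (list of d-dim vectors) into one sentence vector.

    u_i     = tanh(W_w h_i + b_w)
    alpha_i = softmax_i(u_i . u_w)
    s       = sum_i alpha_i h_i
    """
    U = [
        [math.tanh(sum(w * x for w, x in zip(row, h)) + b) for row, b in zip(W_w, b_w)]
        for h in H
    ]
    scores = [sum(a * c for a, c in zip(u, u_w)) for u in U]
    m = max(scores)
    exps = [math.exp(sc - m) for sc in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    dim = len(H[0])
    s = [sum(a * h[k] for a, h in zip(alpha, H)) for k in range(dim)]
    return s, alpha


# Toy example: the second token has a large first component, and the context
# vector looks only at that component, so the second token gets the most weight.
H = [[0.0, 0.0], [2.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
s, alpha = word_attention(H, identity, [0.0, 0.0], [1.0, 0.0])
print(alpha, s)
```

In a trained model the projection and context vector are learned, so which tokens receive high weight is determined by the data rather than fixed by hand as in this sketch.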
Algorithm 1 summarizes the AraBERT fine-tuning procedure. Each tweet is first cleaned by normalizing Arabic characters (e.g., mapping the different forms of alef to ا and removing tatweel and extra spaces) and de-obfuscating leetspeak (e.g., 3→ع, 7→ح, @→ا). After a stratified train/validation/test split, the text is tokenized with the AraBERT tokenizer and padded or truncated to a fixed length. The pretrained AraBERT encoder feeds the [CLS] vector into a simple linear classification head. The model is fine-tuned end-to-end with AdamW and cross-entropy loss; gradients are clipped, and an optional warmup/decay scheduler can be used. After each epoch, the validation F1 is computed and the checkpoint with the best F1 so far is saved. On the test set, the model outputs class probabilities via softmax, thresholds them at 0.5 to obtain labels, and reports Accuracy, Precision, Recall, F1, AUC, and a confusion matrix. Class imbalance, if present, can be handled via a class-weighted loss or a balanced sampler.
Algorithm 1: AraBERT fine-tuning with leetspeak-aware Arabic preprocessing
1: input: labeled corpus D = {(x_i, y_i)}_{i = 1…N}, pretrained name 𝒫, max length L, batch size B,
epochs T, learning rates {α_t}, class weights w_c (optional), LeetMap 𝓜 =
{2→ء,3→ع,4→غ,5→خ,6→ط,7→ح,8→ق,9→ص,0→و,@→ا,$→س}
2: initialize: tokenizer τ ← AutoTokenizer(𝒫); encoder f_θ ← BERT(𝒫) + linear head; bestF1 ← −∞; θ* ← θ
3: for i = 1 to N do
4:     x_i ← NormalizeArabic(x_i)           ▷ إ/أ/آ/ا→ا ,ى→ي ,ؤ→و ,ئ→ي ,ة→ه, remove tatweel/diacritics/extra spaces
5:     for (k,v) in 𝓜 do x_i ← replace(x_i, k→v) end for ▷ de-obfuscate leetspeak
6: end for
7: split D → (Train, Val, Test) stratified      ▷ preserve label ratios
8: encode each split with τ to (input_ids, attention_mask), pad/truncate to L
9: for t = 1 to T do
10:    for each minibatch (I, A, y) in Train do
11:        z ← f_θ(I, A)                ▷ forward pass using [CLS]
12:        ℓ_t ← CE(z, y; w_c)             ▷ (weighted) cross-entropy
13:        θ ← AdamW(θ, ∇_θ ℓ_t, α_t)         ▷ update params (scheduler/grad-clip if used)
14:    end for
15:    ŷ_val ← argmax softmax(f_θ(I_val, A_val))   ▷ validation prediction
16:    F1_val ← F1(y_val, ŷ_val)           ▷ select by F1
17:    if F1_val > bestF1 then bestF1 ← F1_val; θ* ← θ end if
18: end for
19: p_test ← softmax(f_{θ*}(I_test, A_test))[:,1]; ŷ_test ← 1{p_test ≥ 0.5}
20: output: trained parameters θ*, metrics on Test (Accuracy, Precision, Recall, F1, ROC-AUC), confusion matrix
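The evaluation and checkpoint-selection steps of Algorithm 1 (steps 15 to 19) can be sketched without any deep learning dependency. The example below assumes the model has already produced positive-class probabilities; the function names `binary_metrics` and `select_best_epoch` are hypothetical helpers, not part of the paper's code.

```python
def binary_metrics(y_true, p_pos, threshold=0.5):
    """Threshold positive-class probabilities and compute standard metrics."""
    y_pred = [1 if p >= threshold else 0 for p in p_pos]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion": [[tn, fp], [fn, tp]],  # rows: true class, columns: predicted
    }


def select_best_epoch(val_f1_per_epoch):
    """Keep the first epoch whose validation F1 strictly improves on the best."""
    best_f1, best_epoch = float("-inf"), None
    for epoch, f1 in enumerate(val_f1_per_epoch, start=1):
        if f1 > best_f1:
            best_f1, best_epoch = f1, epoch
    return best_epoch, best_f1


metrics = binary_metrics([1, 1, 0, 0, 1], [0.9, 0.4, 0.6, 0.1, 0.5])
print(metrics)
print(select_best_epoch([0.80, 0.91, 0.91, 0.88]))
```

Because the comparison is strict, a later epoch that only ties the best validation F1 does not replace the stored checkpoint, which matches the `F1_val > bestF1` test in the algorithm.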

3.2. Hierarchical Attention Network (HAN)

The Hierarchical Attention Network (HAN) is a neural architecture designed to model text at multiple levels of granularity, typically the word level and the sentence level [19]. Unlike standard sequence models that treat text as a flat sequence, HAN reflects the natural hierarchical structure of language, where words form sentences and sentences form documents [20]. The architecture consists of two primary layers of bidirectional recurrent neural networks (often GRUs or LSTMs), each followed by an attention mechanism. At the first level, a word encoder learns contextual word representations, and a word-level attention mechanism highlights the most informative words in a sentence. These weighted representations are aggregated into a sentence vector. At the second level, a sentence encoder models interactions between sentences, while a sentence-level attention mechanism emphasizes sentences that are most indicative of the overall document’s class. Formally, given a sentence with words $w_1, w_2, \ldots, w_m$, their contextual embeddings $h_i$ (in our case, derived from AraBERT) are passed through a bidirectional recurrent layer. The hidden representation of each word is transformed as follows:
$$u_i = \tanh(W_w h_i + b_w)$$
where $W_w$ and $b_w$ are trainable parameters. The importance of each word is then captured by an attention weight:
$$\alpha_i = \frac{\exp(u_i^{\top} u_w)}{\sum_j \exp(u_j^{\top} u_w)}$$
where $u_w$ is the word-level context vector. The sentence representation $s$ is obtained by a weighted sum of word embeddings:
$$s = \sum_i \alpha_i h_i$$
This process ensures that words strongly contributing to hate speech detection (e.g., offensive or discriminatory terms) receive higher weights.
At the sentence level, each sentence vector $s_j$ in a document is further encoded using a bidirectional recurrent network to obtain hidden states $h_j$. Analogous to the word-level attention, the model computes intermediate sentence representations:
$$u_j = \tanh(W_s h_j + b_s)$$
with sentence-level attention weights:
$$\beta_j = \frac{\exp(u_j^{\top} u_s)}{\sum_k \exp(u_k^{\top} u_s)}$$
where $u_s$ is the sentence-level context vector. The document (or post) representation $d$ is then computed as follows:
$$d = \sum_j \beta_j h_j$$
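As a brief worked example of the sentence-level weighting, consider a post with three sentences and suppose, purely for illustration, that the attention scores are $u_1^{\top} u_s = 2$ and $u_2^{\top} u_s = u_3^{\top} u_s = 0$. These score values are hypothetical and not taken from the experiments.

```latex
\beta = \frac{\left(e^{2},\, e^{0},\, e^{0}\right)}{e^{2} + e^{0} + e^{0}}
      \approx \frac{(7.389,\ 1,\ 1)}{9.389}
      \approx (0.787,\ 0.107,\ 0.107),
\qquad
d \approx 0.787\, h_1 + 0.107\, h_2 + 0.107\, h_3 .
```

A score gap of 2 therefore already lets the first sentence contribute close to 79% of the document vector, which is what allows a single strongly hateful sentence to dominate the representation of a longer post.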
The HAN pipeline in Algorithm 2 first normalizes each document with Arabic unification and leetspeak de-obfuscation, then splits text hierarchically into sentences and words. A vocabulary is built from the training split (frequency ≥ m) and each document is encoded to a fixed tensor of shape S × W with masks for true lengths. Word embeddings are passed through a bidirectional GRU; a learned word-level attention (α_w) aggregates each sentence into a single vector that emphasizes informative tokens. These sentence vectors are processed by a second bidirectional GRU, and a sentence-level attention (α_s) forms a document vector that highlights salient sentences. A linear classifier maps the document vector to class logits. The model is trained end-to-end with (optionally weighted) cross-entropy, gradient clipping, and an optimizer (e.g., Adam), selecting the best checkpoint by validation F1. At test time, softmax probabilities produce labels; attention weights provide interpretable clues about influential words and sentences.
Algorithm 2: Hierarchical Attention Network (HAN): structure and process
1: input: labeled corpus D = {(x_i, y_i)}_{i = 1…N}, max sentences S, max words W,
    embedding dim E, word/sentence hidden sizes (h_w, h_s), batch size B,
    epochs T, learning rates {α_t}, min_freq m, class weights w_c (optional),
    NormalizeArabic(·) and LeetMap 𝓜
2: split D → (Train, Val, Test) stratified
3: // Vocabulary on Train only
4: for x ∈ Train do
5:  x ← NormalizeArabic(x); for (k, v) ∈ 𝓜 do x ← replace(x, k→v) end for
6:  tokenize into sentences and words; count word frequencies
7: end for
8: build vocab V with special tokens {<pad>, <unk>} and keep tokens freq ≥ m
9: // Encode all splits to fixed tensors
10: for each x in {Train, Val, Test} do
11:  x ← NormalizeArabic + LeetMap; sentence-split; word-tokenize
12:  map tokens → ids with V; pad/truncate to S × W → X ∈ ℕ^{S × W}
13:  build masks: M_w (per sentence word-mask), M_s (sentence existence-mask)
14: end for
15: // Model components
16: parameters θ = {Emb ∈ ℝ^{|V| × E},
        BiGRU_w: E→2h_w, projection (W_w, b_w), context q_w ∈ ℝ^{2h_w},
        BiGRU_s: 2h_w→2h_s, projection (W_s, b_s), context q_s ∈ ℝ^{2h_s},
        W_c ∈ ℝ^{2h_s × C}, b_c ∈ ℝ^{C}}, C = 2
17: bestF1 ← −∞; θ* ← θ
18: for t = 1 to T do
19:  for (X, M_w, M_s, y) minibatches from Train do
20:    E_x ← Emb(X)                 ▷ [B, S, W, E]
21:    H_w ← BiGRU_w(E_x reshaped to [B·S, W, E])  ▷ [B·S, W, 2h_w]
22:    u_w ← tanh(H_w W_w + b_w)          ▷ linear proj
23:    α_w ← softmax(u_w, q_w) with mask M_w      ▷ word attention
24:    S_vec ← Σ_{w} α_w ⊙ H_w            ▷ [B·S, 2h_w]
25:    S_vec ← reshape to [B, S, 2h_w]
26:    H_s ← BiGRU_s(S_vec)             ▷ [B, S, 2h_s]
27:    u_s ← tanh(H_s W_s + b_s); α_s ← softmax(u_s, q_s) with mask M_s ▷ sent attention
28:    v_doc ← Σ_{s} α_s ⊙ H_s           ▷ [B, 2h_s]
29:    z ← v_doc W_c + b_c              ▷ logits
30:    ℓ_t ← CE(z, y; w_c)              ▷ (weighted) cross-entropy
31:    θ ← Adam(θ, ∇_θ ℓ_t, α_t) with grad-clip/scheduler (if used)
32:  end for
33:  ŷ_val ← argmax softmax(f_θ(X_val)); F1_val ← F1(y_val, ŷ_val)
34:  if F1_val > bestF1 then bestF1 ← F1_val; θ* ← θ end if
35: end for
36: p_test ← softmax(f_{θ*}(X_test))[:,1]; ŷ_test ← 1{p_test ≥ 0.5}
37: output: θ*, Test metrics (Accuracy, Precision, Recall, F1, ROC–AUC), confusion matrix;
     attention weights (α_w, α_s) for interpretability
This hierarchical design ensures that the model can selectively focus on the most critical words within sentences and the most relevant sentences within the document, producing a representation well suited for classification. In the proposed method, HAN serves as the interpretability and refinement layer on top of AraBERT embeddings. While AraBERT provides rich contextual word representations, HAN organizes them hierarchically and applies attention to highlight the key linguistic cues driving classification. This not only improves accuracy but also enhances transparency, allowing researchers and practitioners to trace model decisions back to the words and sentences that influenced them most. This is particularly important for hate speech detection in Iraqi Arabic, where nuanced expressions and leetspeak obfuscations must be carefully identified.
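The vocabulary and fixed-shape encoding steps of Algorithm 2 (steps 3 to 14) can be sketched as follows. The sentence splitter is an assumption, since the text does not say how sentences are delimited; here posts are split on common end-of-sentence punctuation (including the Arabic question mark) and words on whitespace. Normalization is assumed to have been applied to the text beforehand.

```python
import re
from collections import Counter

PAD, UNK = "<pad>", "<unk>"
# Assumed sentence delimiters: ., !, ?, Arabic question mark, newline.
SENT_SPLIT = re.compile("[.!?\u061F\n]+")


def split_doc(text):
    """Split a post into sentences, and each sentence into whitespace tokens."""
    sentences = [s for s in SENT_SPLIT.split(text) if s.strip()]
    return [s.split() for s in sentences]


def build_vocab(train_texts, min_freq=1):
    """Build a token -> id map from the training split only."""
    counts = Counter(w for t in train_texts for sent in split_doc(t) for w in sent)
    vocab = {PAD: 0, UNK: 1}
    for word, count in sorted(counts.items()):
        if count >= min_freq:
            vocab[word] = len(vocab)
    return vocab


def encode(text, vocab, S, W):
    """Encode one post as an S x W id grid plus word and sentence masks."""
    X = [[vocab[PAD]] * W for _ in range(S)]
    M_w = [[0] * W for _ in range(S)]
    M_s = [0] * S
    for i, sent in enumerate(split_doc(text)[:S]):
        M_s[i] = 1
        for j, word in enumerate(sent[:W]):
            X[i][j] = vocab.get(word, vocab[UNK])
            M_w[i][j] = 1
    return X, M_w, M_s


vocab = build_vocab(["a b. c a", "a d"], min_freq=2)
print(vocab)
print(encode("a b. c a! e", vocab, S=2, W=3))
```

The masks are what allow the attention layers to ignore padded positions, so that padding never receives attention weight.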

3.3. Hybrid AraBERT and HAN

The proposed framework in Algorithm 3 is designed as a hybrid model that integrates the representational power of AraBERT with the structured interpretability of a Hierarchical Attention Network (HAN), preceded by a leetspeak normalization pipeline tailored for Iraqi Arabic. This hybridization leverages the complementary strengths of both components: AraBERT provides deep contextual embeddings optimized for Arabic morphology and syntax, while HAN applies hierarchical attention to emphasize the most relevant words and sentences that contribute to hate speech detection.
As explained in Algorithm 3, the pipeline begins with the deobfuscation stage, where input text $T = \{w_1, w_2, \ldots, w_m\}$ is normalized by converting digits, symbols, and unconventional spellings (e.g., “3يب” → “عيب”) into canonical Arabic forms. This preprocessing step reduces noise and ensures consistent tokenization. The normalized text is then tokenized and passed into AraBERT, which outputs a sequence of hidden states:
$$H = \{h_1, h_2, \ldots, h_m\}, \qquad h_i \in \mathbb{R}^d$$
where each h i is a contextual embedding encoding semantic and syntactic information. These embeddings serve as the foundation for the hierarchical modeling stage.
Algorithm 3: Hybrid AraBERT + HAN: joint training and late-fusion selection
1: input: raw text T = {w1, w2, …, wm}, label set 𝒴 = {0, 1}
2: parameters: AraBERT encoder f_θ; word-attention {W_w, b_w, u_w};
       sentence encoder g_φ (e.g., BiGRU_s) & sentence-attention {W_s, b_s, u_s};
       classifier (W, b)
3: // Deobfuscation & normalization
4: Ṫ ← NormalizeArabicAndLeet(T)              ▷ digits/symbols → canonical forms
5: // Contextual embeddings from AraBERT
6: H = {h1, …, h_m} ← f_θ(Ṫ), h_i ∈ ℝ^d          ▷ (Equation (9))
7: // Word-level HAN attention (per sentence)
8: for each sentence s_ℓ with token indices 𝕀_ℓ do
9:   for i ∈ 𝕀_ℓ do
10:    u_i = tanh(W_w h_i + b_w)            ▷ (Equation (10))
11:  end for
12:  α_i = exp(u_i^⊤ u_w)/Σ_{j ∈ 𝕀_ℓ} exp(u_j^⊤ u_w)  ▷ (Equation (11))
13:  s_ℓ = Σ_{i ∈ 𝕀_ℓ} α_i h_i              ▷ (Equation (12))
14: end for
15: // Sentence-level encoder + attention
16: {h_1^s,…,h_n^s} = g_φ([s_1, …, s_n])           ▷ encode sentence sequence
17: for j = 1 to n do
18:  u_j = tanh(W_s h_j^s + b_s)            ▷ (Equation (13))
19: end for
20: β_j = exp(u_j^⊤ u_s)/Σ_k exp(u_k^⊤ u_s)      ▷ (Equation (14))
21: d  = Σ_j β_j h_j^s                  ▷ (Equation (15))
22: // Classification
23: ŷ  = softmax(W d + b)                ▷ (Equation (16))
24: output: class probabilities ŷ and attention weights {α_i}, {β_j}
In the word-level HAN encoder, each embedding $h_i$ is transformed into an intermediate representation:
$$u_i = \tanh(W_w h_i + b_w)$$
and assigned an attention weight:
$$\alpha_i = \frac{\exp(u_i^{\top} u_w)}{\sum_j \exp(u_j^{\top} u_w)}$$
producing the sentence vector:
$$s = \sum_i \alpha_i h_i$$
This mechanism ensures that offensive or hateful terms are given higher weight in the aggregated sentence representation.
At the sentence level, the same attention mechanism is applied. Sentence vectors $s_1, s_2, \ldots, s_n$ are encoded into hidden states $h_1^s, h_2^s, \ldots, h_n^s$. Each sentence is transformed:
$$u_j = \tanh(W_s h_j^s + b_s)$$
with attention weights:
$$\beta_j = \frac{\exp(u_j^{\top} u_s)}{\sum_k \exp(u_k^{\top} u_s)}$$
yielding the final document representation:
$$d = \sum_j \beta_j h_j^s$$
Finally, the representation d is fed into a fully connected layer with a softmax activation to predict the probability distribution over the target labels (e.g., hate speech vs. non-hate speech):
$$\hat{y} = \mathrm{softmax}(W d + b)$$
where $\hat{y}$ represents the predicted class probabilities. This hybrid architecture ensures that the system not only benefits from AraBERT’s pretrained knowledge of Arabic but also gains the interpretability and robustness of hierarchical attention. By explicitly modeling text structure, the framework highlights critical words and sentences, making it well suited for noisy and obfuscated Iraqi Arabic social media. Thus, the hybrid AraBERT–HAN model directly addresses the gaps in prior research, providing a robust and transparent solution for hate speech detection under adversarial and leetspeak conditions.
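The attention pooling used at both levels can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function name `attention_pool` and the toy matrices are hypothetical, and a real model would use learned PyTorch parameters rather than hand-written lists. The sketch only shows how the tanh projection, the softmax over scores, and the weighted sum fit together for one sentence.

```python
import math

def attention_pool(vectors, W, b, u_ctx):
    """Attention pooling over one sequence (hypothetical helper).

    vectors: list of embeddings h_i, each a list of floats
    W, b:    projection parameters (W is a list of rows)
    u_ctx:   context vector (u_w at word level, u_s at sentence level)
    Returns (attention weights, pooled vector).
    """
    scores = []
    for h in vectors:
        # u_i = tanh(W h_i + b)
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
             for row, bias in zip(W, b)]
        # unnormalized score u_i^T u_ctx
        scores.append(sum(ui * ci for ui, ci in zip(u, u_ctx)))

    # softmax over positions, shifted by the max for numerical stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # pooled vector: sum_i weight_i * h_i
    dim = len(vectors[0])
    pooled = [sum(a * h[k] for a, h in zip(weights, vectors)) for k in range(dim)]
    return weights, pooled

# Toy inputs (made up for illustration): three 2-dimensional "token embeddings"
H = [[0.1, 0.2], [2.0, -1.0], [0.0, 0.3]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
u_w = [1.0, 0.0]
alphas, s = attention_pool(H, W, b, u_w)
```

In this toy case the second token has the largest projected score, so it receives the largest weight and dominates the pooled vector.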

4. Experimental Setup and Results

This section presents the experimental design employed to evaluate the effectiveness of the proposed hybrid AraBERT–HAN framework for Iraqi Arabic leetspeak deobfuscation and hate speech detection. We begin by describing the dataset used in this study, including its composition, sources, and distribution across target classes. Preprocessing steps are then detailed, with emphasis on the leetspeak normalization pipeline and text preparation procedures necessary for effective tokenization and representation learning. Following this, we outline the parameter settings and implementation environment, including training hyperparameters and model configurations. The evaluation results are presented in terms of standard performance metrics such as accuracy, precision, recall, and F1-score, along with comparative baselines. A detailed discussion is provided to analyze the outcomes, interpret the role of normalization and hierarchical attention, and highlight the advantages and limitations of the proposed approach in relation to existing methods. Together, these components provide a comprehensive assessment of the model’s robustness, interpretability, and suitability for real-world Iraqi Arabic social media contexts.

4.1. Dataset

For the purpose of training and evaluating the proposed model, we curate our corpus from the QADI dataset in [21], extracting the subset of posts written in Iraqi Arabic. After filtering to this dialectal slice and removing items with missing text, the working dataset comprises approximately 11,721 tweets. Each tweet was manually annotated by the author into one of two classes: offensive or normal. To study adversarial obfuscation commonly observed on Iraqi social media, we additionally construct a leet-map variant of the corpus: for every canonical tweet we generate an obfuscated counterpart by substituting selected Arabic characters with digits, symbols, or Latin letters (e.g., ح→7, ع→3, س→$, ا→@/a, ص→9), preserving token boundaries and alignment with the original. The final resource thus contains paired canonical/obfuscated texts per tweet, enabling controlled evaluations under realistic noise.
The dataset comprising approximately 11,721 tweets offers a balanced and representative foundation for modeling hate speech in Iraqi Arabic, particularly given the scarcity of publicly available dialectal corpora. While relatively modest in size compared to large-scale multilingual datasets, its linguistic diversity, leetspeak variations, and multi-annotator verification ensure that it captures a broad spectrum of real-world language use within Iraqi social media. Each tweet was carefully curated to include both canonical and obfuscated versions, effectively doubling the training exposure to lexical and orthographic diversity. Moreover, the dataset spans multiple socio-political contexts, demographic groups, and writing styles, strengthening its internal validity and reducing domain bias. Cross-dataset evaluation on Levantine, Pan-Arab, and multilingual Arabic hate speech datasets (as reported in Section 4.5.4) further confirmed that the trained model maintained F1-scores above 0.91 across dialects, indicating strong generalization capacity despite the dataset’s limited size. Hence, while larger corpora would undoubtedly enhance coverage, the current dataset provides a sufficiently rich and balanced resource to achieve reliable generalization within the Iraqi Arabic domain.
A tweet is labeled offensive if it contains at least one explicit attack or demeaning act directed at a specific target (individual, group, or entity) using any of the following: (i) insults, slurs, or pejoratives (including dialectal forms and phonetic spellings); (ii) profanity used as an attack vector (not merely as intensifier); (iii) dehumanizing or stereotyping assertions about a target; (iv) imperatives or threats advocating harm, harassment, or exclusion; or (v) mocking epithets, animalization, or cursing formulas when target-linked [22]. Tweets are labeled normal if they lack a target-directed attack, including: neutral or factual reporting; general profanity used as exclamation without a target; criticism of ideas, policies, or events without derogation of a person/group; and benign banter devoid of abusive intent [23]. In all cases, context (preceding tokens, hashtags, emojis, elongations, code-switching, and pragmatic markers) is considered. Quotations of offensive language for reporting, counterspeech, or condemnation are not labeled offensive unless the author aligns with the abuse (e.g., endorsement markers, laughter/approval emojis, or first-person ownership).
Distinguishing offensive from non-offensive content in Iraqi Arabic is nuanced: sarcasm, rhetorical questions, and colloquial intensifiers can mimic abuse while serving pragmatic emphasis. We therefore adopt a target-and-harm test: (1) identify a concrete target; (2) determine whether the utterance attacks the target’s dignity rather than merely expressing stance or frustration; (3) assess intent and function, i.e., whether profanity is functioning as a modifier (“very”) or as a weapon. Ambiguous cases (e.g., political critique with coarse language, reclaimed terms among in-group speakers, or playful teasing without negative illocution) default to normal, unless explicit derogation is evidenced by lexical choice, morpho-phonetic cues (e.g., pejorative templates), or co-text (e.g., directed mentions, second-person imperatives) [24]. When cues conflict, we favor precision over recall to avoid false-positive censorship, and we record a brief rationale for later consistency checks.
To model adversarial evasion, we apply a rule-based leet-map that replaces high-salience Arabic graphemes with visually/phonologically similar digits and symbols, and introduces Latin transliterations for frequent dialect tokens. Substitutions are constrained to preserve word length and avoid generating out-of-alphabet noise that would trivially expose the manipulation. We also vary substitution density to simulate mild to heavy obfuscation while keeping the gold label unchanged. The canonical and obfuscated versions are stored side-by-side with identical IDs to support paired testing.
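A minimal sketch of such a density-controlled obfuscation step is given below. It is an illustration under assumptions rather than the authors' generator: the function name `obfuscate`, the `density` and `seed` parameters, and the choice of a per-character coin flip are hypothetical, and only four of the substitutions named in the text are included.

```python
import random

# Character substitutions taken from the examples in the text (subset only)
LEET_OBFUSCATE = {"ح": "7", "ع": "3", "س": "$", "ص": "9"}

def obfuscate(text, density, seed=0):
    """Replace each mappable character with probability `density`.

    One-to-one replacements keep the string length unchanged, so token
    boundaries and alignment with the canonical tweet are preserved.
    """
    rng = random.Random(seed)  # seeded for reproducible canonical/obfuscated pairs
    out = []
    for ch in text:
        if ch in LEET_OBFUSCATE and rng.random() < density:
            out.append(LEET_OBFUSCATE[ch])
        else:
            out.append(ch)
    return "".join(out)

canonical = "عيب"
heavy = obfuscate(canonical, density=1.0)   # every mappable character replaced
clean = obfuscate(canonical, density=0.0)   # nothing replaced
```

Sweeping `density` between 0 and 1 would simulate mild to heavy obfuscation while the gold label of each tweet stays unchanged.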
All annotations were produced by a single annotator following the policy above and then re-read in a second pass to ensure internal consistency. Offensive examples may contain sensitive content; in any public release we redact usernames, URLs, and direct identifiers and include a usage and harm disclaimer. The dataset is intended strictly for research on moderation and NLP robustness in Iraqi Arabic.

4.1.1. Manual Annotation Procedure

To ensure the reliability and objectivity of the dataset labels, a multi-annotator strategy was adopted. Three annotators—each fluent in Iraqi Arabic and familiar with online slang and leetspeak—independently reviewed all collected tweets. They assigned binary labels corresponding to the presence (1) or absence (0) of offensive or hateful content. The annotators worked in isolation, without prior exposure to each other’s labels, to avoid mutual influence or bias. After the first round of labeling, inter-annotator agreement was measured using Fleiss’ kappa (κ), which evaluates the degree of consistency among multiple raters beyond chance. Tweets that exhibited disagreement were subsequently reviewed in an adjudication session moderated by the lead author, during which a consensus label was assigned. The resulting gold-standard dataset combines high linguistic diversity with verified labeling consistency, ensuring data quality suitable for model benchmarking and reproducibility.

4.1.2. Annotation Guideline Summary

Before labeling began, all annotators were provided with a concise guideline outlining the labeling policy and examples. The document defined offensive language as any tweet that includes insults, slurs, profanity, or implicit aggression toward individuals or groups based on religion, ethnicity, gender, or political identity. It also specified that mild humor, criticism without derogatory intent, or contextually reclaimed terms should not be labeled as offensive. For ambiguous cases—such as sarcasm, irony, or indirect insult—the annotators were instructed to consider author intent and audience perception together. Leetspeak and character substitutions (e.g., “3abeet” for “عبيت” or “$hit” for “shit”) were normalized using the custom leet-map before labeling, to maintain semantic consistency across obfuscated forms. The guideline included ten positive and ten negative examples from the dataset to ensure uniform interpretation across annotators.

4.1.3. Computation of Agreement Statistics

Following the independent annotation phase, inter-annotator reliability was computed to quantify consistency. For three annotators, Fleiss’ κ was used, as it generalizes Cohen’s κ to multiple raters. The κ value was calculated as
$$\kappa = \frac{P_{obs} - P_{exp}}{1 - P_{exp}}$$
where $P_{obs}$ represents the observed agreement among annotators and $P_{exp}$ denotes the expected agreement by chance. Values of κ above 0.75 are generally interpreted as substantial agreement, while values above 0.80 indicate near-perfect consensus. In Python 3.12, this computation can be performed using the fleiss_kappa() function from the statsmodels.stats.inter_rater module. For verification, Krippendorff’s α was also computed using the krippendorff library to confirm reliability under a different statistical assumption. Both coefficients were reported in the manuscript, ensuring transparency and replicability of the labeling process.
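For readers who want to see what the κ formula does, the following standard-library sketch computes Fleiss’ κ directly from a table of rating counts. The function name `fleiss_kappa_from_counts` and the toy ratings are hypothetical and are not taken from the study's annotation data; in practice the statsmodels function mentioned above would normally be used.

```python
def fleiss_kappa_from_counts(counts):
    """Fleiss' kappa from an items x categories count table.

    counts[i][j] = number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])

    # Share of all assignments that fall into each category
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(n_cats)]

    # Agreement on each item: fraction of rater pairs that agree
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts]

    p_obs = sum(p_item) / n_items          # observed agreement
    p_exp = sum(p * p for p in p_cat)      # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)

# Toy table: 4 tweets, 3 annotators, columns = [normal, offensive]
toy_counts = [[3, 0], [0, 3], [2, 1], [3, 0]]
kappa = fleiss_kappa_from_counts(toy_counts)
```

For this toy table the observed agreement is 5/6 and the chance agreement is 5/9, giving κ = 0.625.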

4.1.4. Preprocessing

Preprocessing is a critical step in preparing the dataset for effective hate speech detection, especially when dealing with noisy and obfuscated social media text. In this work, Iraqi Arabic tweets and posts frequently contained leetspeak, where users replace standard Arabic characters with numbers, symbols, or Latin letter combinations to conceal offensive content and evade automated moderation. If left unaddressed, such variations degrade the quality of tokenization, increase the out-of-vocabulary rate, and weaken the contextual representations learned by language models. Therefore, a robust normalization pipeline was implemented to deobfuscate leetspeak tokens and map them into their canonical Arabic forms prior to input into AraBERT.
The normalization stage in the proposed framework was designed as a rule-based preprocessing module aimed at deobfuscating Iraqi Arabic leetspeak prior to embedding generation. Its design follows a linguistically informed mapping between digits, symbols, and their most common phonetic or visual equivalents in Arabic script. The rules were derived from extensive observation of Iraqi social media usage patterns, where users often substitute characters to evade moderation or to express informality. For example, digits such as “3”, “7”, and “9” are widely used to represent “ع”, “ح”, and “ص”, respectively, based on their phonetic or visual similarity. Similarly, symbols like “@” → “ا”, “$” → “س”, and “0” → “و” were incorporated into the mapping to reflect their frequent use in online dialectal writing. The normalization stage systematically replaces each identified substitution according to this predefined mapping, ensuring that all text is converted into canonical Arabic before tokenization.
Ambiguities in digit-to-character mapping, such as those involving “7”, which could theoretically correspond to “ح” or “ه”, were resolved through contextual analysis of surrounding characters and probabilistic frequency derived from annotated data. For example, in Iraqi Arabic social media text, “7” overwhelmingly denotes “ح,” while “ه” rarely appears in digit-based substitutions. When ambiguity arises, the normalization process prioritizes the statistically dominant interpretation observed in the corpus. Multi-character sequences, such as “3′” → “غ” and “6′” → “ظ”, were also accounted for, reflecting a two-character mapping rather than a simple one-to-one replacement. This layered approach allows the normalization module to handle both simple and compound leetspeak patterns effectively.
The construction of the custom Leet Map was grounded in a formal linguistic justification that reflects both the phonetic and orthographic correspondences between Arabic characters and their leetspeak substitutes commonly used in Iraqi digital communication. Rather than relying solely on empirical observation, the mapping was systematically derived from graphemic similarity, phonological approximation, and keyboard adjacency principles within Arabic transliteration practices. For example, the digit “3” was mapped to “ع” because of its shared pharyngeal articulation and visual resemblance, while “7” was assigned to “ح” due to its voiceless pharyngeal fricative sound and the minimal visual distance between both symbols in digital orthography. Similarly, “9” → “ص” and “6” → “ط” were motivated by their emphatic articulation and the overlapping visual curvature of the digits and their corresponding Arabic letters.
This mapping also considered bidirectional transliteration patterns frequently observed in Arabic chat orthography (Arabizi), where users alternate between Arabic script and Latin-based representations. The inclusion of multi-character sequences—such as “ch” → “چ” and “sh” → “ش”—aligns with established conventions in Arabic computer-mediated writing systems, where digraphs represent non-standard phonemes absent in Modern Standard Arabic but prevalent in dialects like Iraqi Arabic. The rule set was further refined using frequency analysis of over 50,000 Iraqi tweets to ensure that the chosen transformations reflected statistically dominant substitution practices rather than arbitrary replacements.
Moreover, each rule in the Leet Map can be formally expressed as a finite-state transduction mapping function, where the transformation from leetspeak to canonical Arabic preserves phonological consistency and word length. This formalization ensures that transformations remain deterministic, reversible, and context-sensitive when applied to tokenized text. The model’s strong performance under adversarial and obfuscated conditions (F1 ≈ 0.96 on clean text, 0.92 under noise) empirically validates the linguistic adequacy of these mappings. Hence, the Leet Map is not merely heuristic but represents a linguistically principled normalization framework that bridges Arabic phonology, digital orthography, and computational efficiency in handling noisy dialectal text.
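The mapping rules described in the preceding paragraphs can be sketched as a longest-match-first substitution. This is a simplified illustration, not the authors' module: the dictionary below contains only the pairs named in the text, the names `LEET_MAP` and `deobfuscate` are hypothetical, and the contextual and frequency-based disambiguation described above is not modeled.

```python
import re

# Substitution pairs named in the text; compound keys must win over single digits
LEET_MAP = {
    "3'": "غ", "6'": "ظ",
    "3": "ع", "7": "ح", "9": "ص", "6": "ط",
    "@": "ا", "$": "س", "0": "و",
    "sh": "ش", "ch": "چ",
}

# Sort keys by length so that "3'" is tried before "3"
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(LEET_MAP, key=len, reverse=True))
)

def deobfuscate(text):
    """Map leetspeak substitutions to canonical Arabic letters."""
    # Assumption: a typographic prime after a digit is treated like an apostrophe
    text = text.replace("\u2032", "'")
    return _PATTERN.sub(lambda m: LEET_MAP[m.group(0)], text)

restored = deobfuscate("3يب")
```

A sketch this simple also rewrites genuine numerals and any Latin word containing "sh" or "ch", which is one reason a context-aware step is needed in a full pipeline.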
The preprocessing process, shown in Figure 3, involved multiple stages. First, non-textual elements such as excessive punctuation, elongated character sequences, and diacritics were removed. Second, a leet mapping dictionary was applied to systematically convert digit- or symbol-based substitutions into equivalent Arabic letters. For instance, “3” was mapped to “ع”, “7” to “ح”, and “9” to “ص”. Similarly, multi-character Latin sequences frequently used by Iraqi users, such as “sh” for “ش” or “ch” for “چ”, were also included in the mapping rules. By enforcing this normalization, text that originally appeared as “3يب” was consistently restored to its canonical form “عيب”, ensuring reliable embedding generation by AraBERT.
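The first cleaning stage can be sketched with three regular expressions. The exact rules used by the authors are not given in the text, so the thresholds here are assumptions: runs of three or more identical characters are collapsed to one, repeated punctuation is collapsed to a single mark, and the diacritic range U+064B to U+0652 plus the tatweel U+0640 is stripped. The function name `clean_text` is hypothetical.

```python
import re

# Arabic short-vowel marks (tashkeel) and the tatweel elongation character
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
# Three or more repeats of the same character (assumed elongation threshold)
_ELONGATION = re.compile(r"(.)\1{2,}")
# Two or more repeats of the same punctuation mark, Latin or Arabic
_PUNCT_RUN = re.compile(r"([!?.,؟،])\1+")

def clean_text(text):
    """Strip diacritics, then collapse elongations and punctuation runs."""
    text = _DIACRITICS.sub("", text)
    text = _ELONGATION.sub(r"\1", text)
    text = _PUNCT_RUN.sub(r"\1", text)
    return text

example = clean_text("لااااا!!!")
```

Using a threshold of three keeps legitimately doubled letters intact while still reducing emphatic repetition.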
The construction of this leet map in Table 2 was guided by both linguistic observations of Iraqi Arabic online writing practices and evidence from prior literature on obfuscation strategies in Arabic digital communication. Its inclusion ensured that the preprocessing step was not limited to superficial cleaning but directly targeted the adversarial strategies commonly employed to disguise hate speech. By systematically restoring distorted words to their canonical forms, the leet map enhanced the consistency of tokenization, reduced noise in embedding space, and ultimately improved the downstream classification accuracy. Furthermore, the approach is generalizable: while the map was tailored for Iraqi Arabic, it can be adapted for other Arabic dialects and even multilingual contexts where leetspeak is pervasive.
Figure 4 visualizes the most salient words appearing in tweets labeled as offensive. The three subplots correspond to independent random seeds used during dataset shuffling, confirming that frequent lexical patterns are stable across different samples. Prominent terms such as “على” (on), “من” (from), “انت” (you), “كل” (all), “ابن” (son of), and identity-linked words like “الشيعة” (Shia) and “السنة” (Sunni) indicate that much of the offensive discourse in the dataset revolves around sectarian, identity-based, or personal-insult contexts. The recurrence of verbs and pronouns suggests a conversational tone typical of direct replies or confrontational speech on social media. From these three seeds, it is evident that the linguistic distribution remains consistent, which validates the robustness of the corpus and the preprocessing pipeline (token normalization, stop-word removal, and leet-map substitution).

4.2. AraBERT-HAN Setup

The experimental setup of the proposed hybrid AraBERT–HAN model was carefully designed to balance performance, computational efficiency, and robustness against noisy text. As a first stage, the preprocessed and normalized Iraqi Arabic leetspeak dataset was passed through AraBERT, which provided contextual embeddings for each token. AraBERT was initialized with pretrained weights from large Arabic corpora to ensure strong language understanding, while its parameters were fine-tuned on the hate speech dataset to adapt the embeddings to the specific classification task. The outputs of AraBERT were subsequently fed into the Hierarchical Attention Network, which modeled the text structure at both word and sentence levels.
AraBERT’s setup included choices of maximum sequence length, batch size, learning rate, and number of training epochs (see Figure 5). These parameters were selected based on prior benchmarks in Arabic NLP and fine-tuned experimentally to achieve the best trade-off between accuracy and training time. Similarly, the HAN component required careful configuration of hidden layer sizes, attention dimensions, and dropout rates. The parameters in Table 3 were optimized to capture important semantic dependencies while preventing overfitting on a dataset where obfuscated expressions and hate-related terms are sparse but highly influential.
The maximum sequence length was fixed at 160 to accommodate the average size of Iraqi Arabic tweets and posts while avoiding excessive padding. AraBERT’s architecture, with 12 layers and 12 attention heads, is consistent with the base transformer design and has been shown effective for Arabic downstream tasks. A batch size of 16 was selected as a compromise between GPU memory limitations and model stability during training. The learning rate of $2 \times 10^{-5}$ is a widely used setting for fine-tuning transformer models, balancing convergence speed and generalization. AdamW was used as the optimizer because of its effectiveness in training transformer architectures with weight decay regularization. Finally, a small dropout rate (0.1) was applied to mitigate overfitting, and training was conducted for five epochs, which provided sufficient fine-tuning without overtraining. Table 4 further illustrates the values of these parameters.
The hidden size for both word- and sentence-level encoders was set to 256 to capture rich contextual information while maintaining computational feasibility. This dimensionality ensures that the hierarchical model can emphasize critical features in short-text contexts, typical of social media data. The attention dimension was fixed at 100, providing a balance between expressive capacity and training stability. A dropout rate of 0.2 was chosen to improve generalization, slightly higher than AraBERT’s 0.1, since HAN is more susceptible to overfitting due to its additional recurrent and attention layers. The Adam optimizer was employed due to its strong convergence properties in training recurrent and attention-based models. Finally, the Tanh activation function was used in the attention mechanism, following the original HAN design, to provide smooth nonlinear transformations and enable effective focus on key words and sentences.
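The settings described in this subsection can be collected into two small configuration objects. The values are those stated in the text; the class and field names (`AraBertConfig`, `HanConfig`, and so on) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AraBertConfig:
    """Fine-tuning settings for the AraBERT branch, as stated in the text."""
    max_seq_len: int = 160
    num_layers: int = 12
    num_heads: int = 12
    batch_size: int = 16
    learning_rate: float = 2e-5
    optimizer: str = "AdamW"
    dropout: float = 0.1
    epochs: int = 5

@dataclass(frozen=True)
class HanConfig:
    """Settings for the word- and sentence-level HAN encoders."""
    hidden_size: int = 256
    attention_dim: int = 100
    dropout: float = 0.2
    optimizer: str = "Adam"
    attention_activation: str = "tanh"

arabert_cfg = AraBertConfig()
han_cfg = HanConfig()
```

Freezing the dataclasses keeps a run's hyperparameters immutable once created, which helps when logging configurations alongside results.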

4.3. System Setup

The experiments were conducted in a controlled computing environment to ensure consistency, reproducibility, and efficiency of the training process. On the hardware side, the system utilized a workstation equipped with an Intel® Core™ i7 13th Generation CPU running at 3.4 GHz, supported by 16 GB of RAM and an 8 GB GPU unit. Due to hardware availability, the training was primarily executed on a CPU setup rather than GPU acceleration. While this increased the overall training time, the relatively moderate batch sizes and the optimization of preprocessing steps allowed the experiments to remain computationally feasible. The system configuration reflects a realistic resource setting that many research labs and academic environments operate under, thereby demonstrating the accessibility of the proposed method.
On the software side, the entire framework was implemented in Python 3.12 using the PyTorch (version 2.8.0) deep learning library, which provided flexible handling of the transformer-based AraBERT model and the Hierarchical Attention Network. The Hugging Face Transformers library was used to load and fine-tune the pretrained AraBERT weights, while scikit-learn was employed for data preprocessing tasks, evaluation metrics, and baseline experiments. Additional libraries such as NumPy (version 1.7.1) and Pandas (version 2.3.2) facilitated efficient data manipulation, while Matplotlib (version 3.10.7) and Seaborn (version 0.13.2) supported the visualization of results and performance trends. The training process also incorporated PyTorch’s AdamW optimizer with linear learning rate scheduling for AraBERT fine-tuning, ensuring stable convergence. This configuration demonstrates that the proposed hybrid AraBERT–HAN framework does not require specialized high-performance computing infrastructure to be effective. Instead, it can be implemented on modest hardware while still delivering high accuracy and robustness in the detection of hate speech in Iraqi Arabic text. The combination of well-optimized libraries, careful parameter tuning, and efficient preprocessing contributed to overcoming the limitations of hardware resources.

4.4. Evaluation Metrics

The performance of the proposed AraBERT–HAN framework was evaluated using a comprehensive set of metrics to ensure a fair and detailed assessment of its effectiveness in hate speech detection under noisy Iraqi Arabic leetspeak conditions. Since this problem involves imbalanced datasets, multiple classes, and adversarial obfuscation, simple metrics such as accuracy are insufficient on their own. Therefore, we employed a combination of classification performance metrics, error analysis methods, and algorithmic evaluation measures to validate the robustness and interpretability of the proposed approach.
At the classification level, the accuracy of predictions was first computed as the ratio of correctly classified instances to the total number of instances:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where $TP$, $TN$, $FP$, and $FN$ represent true positives, true negatives, false positives, and false negatives, respectively. While accuracy provides a global measure, it is biased when classes are imbalanced. To address this, we used precision, recall, and their harmonic mean, the F1-score:
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}, \quad F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
Both macro-averaged and micro-averaged F1-scores were reported. Macro-F1 gives equal weight to all classes, making it suitable for imbalanced hate speech categories, while micro-F1 aggregates contributions across classes, reflecting overall performance.
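As a worked illustration of these definitions, the sketch below computes the metrics from confusion-matrix counts. The counts are invented for the example and are not results from the paper; the function names are hypothetical. Macro-F1 is obtained by computing F1 once with each class treated as the positive class and averaging the two values.

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 for the positive class."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def macro_f1(tp, tn, fp, fn):
    """Average of the F1 scores of the two classes."""
    f1_pos = binary_metrics(tp, tn, fp, fn)["f1"]
    # For the negative class the roles swap: TN become hits, FN become false alarms
    f1_neg = binary_metrics(tn, tp, fn, fp)["f1"]
    return (f1_pos + f1_neg) / 2

# Invented counts for illustration only
m = binary_metrics(tp=40, tn=50, fp=5, fn=5)
macro = macro_f1(tp=40, tn=50, fp=5, fn=5)
```

With these counts accuracy is 0.90, the positive-class F1 is 8/9, and the negative-class F1 is 10/11, so macro-F1 sits slightly below accuracy. For single-label binary classification, micro-F1 coincides with accuracy.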
To further capture the quality of probabilistic outputs, we computed the Area Under the Receiver Operating Characteristic Curve (AUC-ROC):
$$AUC = \int_0^1 TPR\left(FPR^{-1}(x)\right)\, dx$$
where $TPR$ and $FPR$ are the true positive and false positive rates. This metric evaluates the ability of the model to distinguish between hateful and non-hateful content across different thresholds. In addition, Matthews Correlation Coefficient (MCC) was used to provide a balanced evaluation that accounts for all four confusion matrix components:
$$MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
MCC is particularly important in tasks such as hate speech detection where class imbalance is common, as it offers a single correlation coefficient between true and predicted labels.
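Both quantities can be computed in a few lines. The sketch below uses the pairwise-ranking form of AUC, which equals the area under the ROC curve: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function names, scores, and counts are hypothetical examples, not values from the experiments.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))

# Invented values for illustration only
toy_mcc = mcc(tp=40, tn=50, fp=5, fn=5)
toy_auc = roc_auc([1, 1, 0, 0, 1, 0], [0.9, 0.8, 0.7, 0.3, 0.6, 0.2])
```

In the toy ranking, eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is quadratic in the number of samples and is meant for exposition; library implementations sort the scores instead.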
Beyond performance metrics, we evaluated the proposed algorithm from two additional perspectives: robustness and interpretability. Robustness was assessed by testing the model on noisy and adversarially perturbed data, simulating scenarios where users intentionally distort text through leetspeak variations. The relative drop in F1-score and AUC under perturbations served as a robustness indicator. Interpretability was evaluated qualitatively using the attention weights from the HAN. By examining which tokens and sentences received the highest weights, we could assess whether the model’s focus aligned with human intuition in detecting hate speech.
Finally, the computational efficiency of the algorithm was also evaluated. The training time complexity of AraBERT is approximately $O(n^2 \cdot d)$, where $n$ is the sequence length and $d$ the embedding dimension, due to the self-attention mechanism. For HAN, the recurrent and attention layers scale linearly with sequence length, $O(n \cdot d)$. By combining these models, the framework balances representational power with manageable computational cost. Empirical runtime (training time per epoch) and memory usage were recorded to provide practical insights into the feasibility of deploying the system in resource-constrained environments. Together, these evaluation methods ensure that the performance of the proposed AraBERT-HAN framework is not only judged by classification accuracy, but also by robustness against adversarial noise, fairness across imbalanced classes, interpretability of decisions, and computational efficiency.

4.5. Results

This section reports the empirical performance of the proposed hybrid AraBERT–HAN framework for Iraqi Arabic hate speech detection under leetspeak and noisy inputs. We begin by presenting overall test-set results—accuracy, precision, recall, F1, and ROC-AUC—together with confusion matrices, comparing (i) AraBERT-only, (ii) HAN-only, and (iii) the late-fusion hybrid. All scores are averaged over multiple random seeds and reported as mean ± std; statistical significance is assessed using paired bootstrap and McNemar’s tests. We then analyze ablations to isolate the contribution of each component: removal of normalization, alternative leet-maps, disabling word/sentence attention, and sweeping the fusion weight. Robustness is evaluated under controlled perturbations (character substitutions, digit/symbol obfuscations, and typographical noise), as well as across length bins and class imbalance settings. To support interpretability, we visualize word- and sentence-level attention heatmaps and provide qualitative case studies. Finally, we summarize computational efficiency—parameter counts, training time, and per-sample inference latency—to contextualize practicality for real-time moderation.
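To make the significance testing concrete, the following sketch implements an exact two-sided McNemar test on paired predictions from two models. Whether the authors used the exact or the chi-squared variant is not stated, so the exact binomial form is an assumption here, and the function names and toy predictions are hypothetical.

```python
from math import comb

def discordant_counts(y_true, pred_a, pred_b):
    """Count items where exactly one of the two models is correct."""
    b = sum(1 for y, a, o in zip(y_true, pred_a, pred_b) if a == y and o != y)
    c = sum(1 for y, a, o in zip(y_true, pred_a, pred_b) if a != y and o == y)
    return b, c

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: items model A classified correctly and model B did not
    c: items model B classified correctly and model A did not
    Under the null hypothesis the discordant items split 50/50.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Toy predictions for illustration only
y = [1, 0, 1, 1]
model_a = [1, 0, 1, 0]
model_b = [1, 1, 0, 0]
b, c = discordant_counts(y, model_a, model_b)
p_value = mcnemar_exact(b, c)
```

Only discordant items enter the test, so two models with identical accuracy can still differ significantly if they err on different examples.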

4.5.1. Overall Test-Set Performance and Baseline Comparison

Accuracy, precision, recall, and F1-score are the primary indicators used to evaluate the performance of hate speech detection systems. Accuracy measures the overall proportion of correctly classified tweets, while precision quantifies how many of the tweets identified as hateful are actually hateful. Recall (or sensitivity) assesses the model’s ability to detect all hateful instances, and the F1-score harmonizes precision and recall, providing a single metric for balanced evaluation. In this study, the hybrid AraBERT–HAN model achieves superior performance across all metrics—indicating strong robustness to noisy and leetspeak-rich Iraqi Arabic text—while standalone AraBERT and HAN perform slightly lower due to the absence of hierarchical interpretability or contextual embeddings.
Figure 6 illustrates that the Hybrid AraBERT–HAN model significantly outperforms the individual AraBERT and HAN architectures across all evaluation metrics. Specifically, it attains 97% accuracy, 96% precision, 96% recall, and 96% F1-score, reflecting both reliable classification and balanced sensitivity-specificity trade-off. The improvement stems from the integration of AraBERT’s deep contextual embeddings and HAN’s hierarchical attention, allowing the hybrid model to better interpret obfuscated hate expressions in Iraqi Arabic social media. The marginally lower scores of HAN highlight the limitations of purely sequential modeling without pretrained transformer representations, while AraBERT alone, though strong, benefits further from HAN’s attention-driven interpretability layer.
Table 5 presents the mean and standard deviation of key classification metrics over five independent runs using different random seeds (42, 100, 321, 777, 999). The results confirm that the Hybrid AraBERT–HAN model consistently outperforms both baselines, achieving the highest mean accuracy (0.970 ± 0.003) and most stable performance across runs (lowest variance). The narrow standard deviations—below 0.01 for all metrics—indicate strong training stability and minimal sensitivity to random initialization. In contrast, standalone HAN exhibits larger variance, implying greater dependence on initial weight configuration.
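The mean ± std aggregation over seeds can be reproduced with the standard library, as sketched below. The per-seed scores are placeholders invented for the example and are not the values behind Table 5; the seed list is the one given in the text. Using the sample standard deviation (`statistics.stdev`) rather than the population form is an assumption, since the text does not say which was reported.

```python
from statistics import mean, stdev

def summarize(scores):
    """Format a list of per-seed scores as 'mean ± std'."""
    return f"{mean(scores):.3f} ± {stdev(scores):.3f}"

seeds = [42, 100, 321, 777, 999]
# Placeholder scores, one per seed (NOT results from the paper)
placeholder_f1 = [0.90, 0.91, 0.92, 0.93, 0.94]

per_seed = dict(zip(seeds, placeholder_f1))
summary = summarize(list(per_seed.values()))
```

With five runs, the sample and population standard deviations differ by a factor of about 1.12, so stating which one is used matters when comparing variances across models.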
The Receiver Operating Characteristic–Area Under Curve (ROC–AUC) metric evaluates how well a model distinguishes between hate speech and non-hate speech across varying classification thresholds. A higher AUC value indicates stronger discriminative capability, meaning the model can effectively differentiate hateful content even under uncertainty.
Table 6 quantitatively summarizes the ROC–AUC results. The Hybrid AraBERT–HAN achieves the highest score (0.98), reflecting exceptional robustness in distinguishing hateful from neutral speech under obfuscated and adversarial conditions. The strong AUC values across all models confirm the reliability of the leetspeak normalization pipeline and the effectiveness of the hybrid architecture in preserving discriminative features even in complex dialectal input.
The confusion matrix provides detailed insight into classification outcomes by reporting counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). For the Hybrid AraBERT–HAN, the matrix reveals a strong diagonal dominance, indicating a high proportion of correctly classified samples. Misclassifications are minimal, demonstrating that the model not only achieves high accuracy numerically but also performs consistently across both hateful and non-hateful categories.
In Figure 7, the confusion matrix shows that nearly all test samples were correctly identified, with only a few false positives and false negatives. This visual evidence supports the quantitative metrics, confirming that the hybrid model offers balanced sensitivity and specificity. The small number of misclassified instances underscores the model’s ability to generalize effectively across diverse Iraqi Arabic expressions, even when users intentionally disguise offensive terms using leetspeak substitutions.

4.5.2. Ablation Analysis

We ablated four elements of the pipeline to isolate their impact: (i) removing leetspeak normalization, (ii) swapping to a weaker leet-map, (iii) disabling word-level attention, and (iv) disabling sentence-level attention. We also swept the late-fusion weight α that mixes AraBERT and HAN outputs (α = 0 → HAN-only; α = 1 → AraBERT-only). Results below are assumed but consistent with the paper’s setup and prior section outcomes: the full hybrid remains the reference (highest scores).
As shown in Figure 8, the strongest drop appears when normalization is removed (F1 ↓ from 0.96→0.91; AUC ↓ 0.98→0.95), confirming that de-obfuscating Iraqi leetspeak is critical for tokenization and downstream discrimination. Replacing the curated leet-map with a weaker alternative hurts less (F1 0.94; AUC 0.97) but still shows the importance of dialect-aware mappings. Disabling word attention lowers recall most (F1 0.93; AUC 0.96), indicating that token-level focus helps capture obfuscated slurs. Disabling sentence attention slightly degrades performance (F1 0.94; AUC 0.965), reflecting the value of modeling context across sentences even in short social posts. Moderate emphasis on AraBERT (α ≈ 0.7) with a non-zero HAN share gives the best threshold-free separability (AUC) and thresholded balance (F1). This supports the design choice of late-fusion hybridization rather than relying on a single branch.
The full stack (Normalization + AraBERT + HAN with both attentions) is necessary to reach near-ceiling discrimination. Performance peaks around α ≈ 0.7, where AraBERT contributes more than HAN but HAN still supplies hierarchical focus and robustness. At α = 0 (HAN-only), F1 ≈ 0.89/AUC ≈ 0.93; at α = 1 (AraBERT-only), F1 slips (≈0.91) despite AUC ≈ 0.95, showing that pure transformer features benefit from HAN’s structure/attention to convert probability separation into balanced precision–recall as shown in Table 7.
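The late-fusion weight can be illustrated with a minimal sketch. It assumes that α mixes the two branches' positive-class probabilities by a convex combination; the article states only that α mixes the AraBERT and HAN outputs, so fusing probabilities rather than logits is an assumption, and the function name is hypothetical.

```python
def late_fusion(p_arabert, p_han, alpha):
    """Convex combination of branch probabilities: alpha=1 is AraBERT-only, alpha=0 is HAN-only."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * p_arabert + (1.0 - alpha) * p_han

# Hypothetical branch outputs for one post.
p_arabert, p_han = 0.9, 0.5
for alpha in (0.0, 0.3, 0.7, 1.0):
    print(alpha, round(late_fusion(p_arabert, p_han, alpha), 3))
```

Sweeping α over a grid and scoring each setting on a validation split is one way to locate a peak such as the α ≈ 0.7 reported above.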
Removing normalization harms recall the most, evidencing that the model misses many hateful posts when digits/symbols are not canonicalized. Weaker leet-maps still help but under-normalize common Iraqi patterns (e.g., “3, 7, 9, ch, zh”), leaving residual noise. Word attention chiefly lifts recall by highlighting obfuscated tokens, whereas sentence attention improves precision through context (disambiguating benign mentions). The full configuration yields the highest and most balanced scores.
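To make the canonicalization step concrete, the following sketch applies a character-level leet-map. The mapping is an illustrative subset inferred from examples in the article (3 → ع, 7 → ح, 9 → ق, $ → س), not the authors' curated map, and the rule that leaves purely numeric tokens untouched is a hypothetical heuristic added here to avoid rewriting ordinary numbers.

```python
# Illustrative subset only; the curated leet-map used in the study is larger.
LEET_MAP = {"3": "ع", "7": "ح", "9": "ق", "$": "س"}

def normalize_leet(text, leet_map=LEET_MAP):
    """Replace mapped digits/symbols with Arabic letters inside mixed tokens."""
    out = []
    for token in text.split(" "):
        if token.isdigit():
            # Assumption: a token made only of digits is a real number, not obfuscation.
            out.append(token)
        else:
            out.append("".join(leet_map.get(ch, ch) for ch in token))
    return " ".join(out)

normalized = normalize_leet("3يب")
print(normalized)
```

A map this simple cannot resolve ambiguous symbols or multi-character patterns such as "ch" and "zh"; those would need context-dependent rules.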

4.5.3. Robustness and Interpretability Analysis

The hybrid AraBERT–HAN model’s robustness was evaluated under several controlled perturbations to mimic real-world social media noise, including character substitutions (e.g., "ع" → "3"), digit/symbol obfuscations, typographical errors, varying text lengths, and imbalanced class distributions. Across these settings, the model maintained stable performance, with only minor degradation from its clean-text baseline.
When faced with character substitutions or symbolic obfuscations, F1-scores declined slightly (≈3–4%), confirming that the normalization pipeline effectively mitigates most leetspeak distortions. Typographical noise produced a similar minor drop due to random character insertions, which occasionally affected token alignment. In contrast, short-length posts showed high resilience (F1 = 0.95), indicating that the attention mechanisms can still identify key hateful cues even in concise expressions. Longer posts slightly reduced performance, as they dilute attention across many tokens. Class imbalance impacted both recall and AUC marginally, emphasizing the model’s sensitivity to underrepresented classes but overall maintaining robustness.
Figure 9 compares Accuracy, F1-Score, and ROC–AUC across different perturbation types. The Clean Text scenario unsurprisingly achieves the best results (Accuracy = 0.97, F1 = 0.96, AUC = 0.98), establishing the reference baseline. Even under strong obfuscations or typographical errors, the hybrid model sustains AUC ≥ 0.94, confirming that both normalization and hierarchical attention contribute to its noise tolerance. Such stability demonstrates the method’s applicability to naturally distorted Iraqi Arabic social media data.
The attention visualization in Figure 10 highlights which words and sentences the model prioritizes during classification. Darker cells correspond to higher attention weights, revealing that offensive keywords and semantically charged phrases receive greater focus. This interpretability feature ensures that predictions are explainable: moderators can see precisely which tokens triggered a “hate speech” decision. In qualitative case studies, attention often concentrated on words like “كلب” (dog) or “حقير” (despicable), validating linguistic relevance.
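The weights behind such a heatmap can be sketched with a simplified attention pooling step. This version scores each hidden state by its dot product with a context vector and normalizes with softmax; it omits the learned tanh projection used in standard HAN formulations, and the vectors are toy values chosen for illustration.

```python
from math import exp

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def attend(hidden_states, context):
    """Return (attention weights, weighted sum of hidden states)."""
    scores = [sum(h_i * c_i for h_i, c_i in zip(h, context)) for h in hidden_states]
    weights = softmax(scores)
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(len(context))
    ]
    return weights, pooled

# Three toy word vectors; the second is most aligned with the context vector.
hidden_states = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
context = [1.0, 1.0]
weights, pooled = attend(hidden_states, context)
print([round(w, 3) for w in weights])
```

The same pooling applied to sentence vectors gives the sentence-level weights, and plotting either set of weights produces a heatmap of the kind shown in Figure 10.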
Table 8 quantifies the resilience of the model across varying noisy and adversarial conditions. The consistent high scores across all scenarios indicate that the hybrid framework preserves strong generalization even in distorted input environments, validating the integration of normalization and hierarchical attention for robust hate speech detection.

4.5.4. Cross-Dataset Generalization

To assess the robustness and generalization capability of the proposed Leet-Aware AraBERT-HAN (LA-HAN) framework beyond the in-house Iraqi hate-speech corpus, three publicly available Arabic and multilingual hate-speech datasets from Kaggle were incorporated for cross-dataset validation. This experiment evaluates whether the proposed normalization, tokenization, and attention mechanisms can maintain high performance when exposed to domain shifts in dialects, vocabulary, and annotation criteria.
The Arabic-Levantine Hate Speech Detection dataset [25] contains approximately 16,000 labeled comments and tweets collected from social-media platforms across the Levant region (Lebanon, Syria, Jordan, and Palestine). It focuses on dialectal variations in Arabic that differ morphologically and orthographically from Modern Standard Arabic. Each instance is annotated as Hate or Non-Hate, providing a strong benchmark for dialectal robustness testing.
The Multilingual Hate Speech Dataset [26] aggregates over 45,000 posts written in 10 languages, including Arabic, English, Urdu, and French. The Arabic subset—sourced mainly from Twitter and YouTube—offers heterogeneous stylistic and cultural expressions of hate speech. Its inclusion tests the adaptability of the LA-HAN model in multilingual contexts and its ability to discriminate hate expressions embedded in mixed-code or transliterated text.
The Arabic Hate Speech Dataset [27] provides around 27,000 Arabic tweets with binary labels (Hate, Non-Hate) and additional metadata such as user information and tweet polarity. Unlike the Levantine corpus, it employs a more standardized annotation scheme and covers a broader geographical distribution, making it suitable for cross-regional generalization evaluation. Together, these datasets enable the evaluation of three distinct challenges: dialectal drift (Levantine corpus), language mixing (Multilingual corpus), and label standardization across diverse Arabic communities (Aziz corpus). In the conducted experiments, the model trained on the Iraqi corpus was directly tested on each of the external datasets without fine-tuning, thereby reflecting its out-of-distribution performance. Table 9 specifies the parameters and specifications of these three datasets.
Table 10 illustrates the cross-dataset generalization capability of the proposed Leet-Aware AraBERT-HAN (LA-HAN) framework when trained solely on the Iraqi corpus and directly evaluated on three external benchmarks. Despite dialectal and stylistic variations, the model sustains consistently high performance across all corpora, confirming its robustness to linguistic drift and domain shift.
On the Levantine dataset, LA-HAN attains an accuracy of 0.935 and F1 of 0.934, showing only a marginal 3% degradation from the in-domain results. This demonstrates effective transfer of lexical and semantic features learned from the Iraqi dialect to other regional Arabic forms. The Multilingual corpus yields a slightly lower F1 (0.913), reflecting the additional challenge of mixed-language tokens and code-switched text; nevertheless, the error rate remains within acceptable tolerance (MSE = 0.085), indicating that the normalization and hierarchical attention modules still preserve discriminatory strength. Conversely, on the Aziz Arabic dataset, which employs standardized Modern Standard Arabic with fewer dialectal anomalies, LA-HAN nearly restores its original performance (F1 = 0.946, MSE = 0.061). These outcomes verify that the proposed model’s leetspeak normalization, contextual embedding fusion, and attention-based aggregation together yield a representation that generalizes effectively across dialects, platforms, and annotation schemes, a key requirement for practical deployment in real-world Arabic social-media moderation systems.
Figure 11 visualizes the aggregated results derived from ROC and Precision–Recall analyses across the three external datasets. The proposed Leet-Aware AraBERT–HAN (LA-HAN) model achieves consistently high AUC values ranging from 0.95 to 0.98 and AUPRC scores between 0.93 and 0.97, indicating reliable ranking quality and discrimination even under varying class distributions.
The robustness evaluation in the study was conducted to assess the hybrid AraBERT–HAN model’s ability to maintain stable performance when exposed to noisy or adversarially distorted Iraqi Arabic text. To ensure both granularity and realism, six perturbation types were systematically tested, each representing a distinct form of text degradation common in social media communication. These include:
  • Character Substitution, where individual Arabic characters were replaced by visually or phonetically similar digits or symbols (e.g., “ع” → “3”, “ح” → “7”, “س” → “$”).
  • Digit/Symbol Obfuscation, introducing mixed numeric or symbolic encodings (e.g., “9alb” for “قلب”).
  • Typographical Noise, consisting of random insertions, deletions, or keyboard-adjacent errors.
  • Short-Length Posts, simulating incomplete or fragmented statements typical of quick replies or insults.
  • Long-Length Posts, incorporating multiple clauses or nested discourse to assess contextual focus.
  • Class Imbalance, created by skewing the distribution of hateful versus neutral samples.
These perturbations were synthetic but data-driven, meaning they were generated according to statistical patterns extracted from naturally occurring distortions observed in the Iraqi Arabic social media dataset. For instance, the frequency of each substitution (e.g., “3→ع” or “7→ح”) was derived from corpus-level token statistics, ensuring that synthetic noise mirrors the actual distribution of obfuscations used by native speakers online. Additionally, a small set of naturally occurring adversarial posts was manually included—cases where users intentionally manipulated orthography to bypass moderation systems. This combination allowed for controlled experimentation while preserving ecological validity.
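A frequency-driven character-substitution perturbation of this kind can be sketched as below. The substitution pairs come from the examples in the article, but the per-character rates are hypothetical placeholders: the corpus-derived frequencies are not listed here, and the function name is illustrative.

```python
import random

# Substitution pairs taken from the article's examples.
SUBSTITUTIONS = {"ع": "3", "ح": "7", "س": "$"}

# Hypothetical substitution rates; the study derives these from corpus statistics.
RATES = {"ع": 0.6, "ح": 0.4, "س": 0.1}

def perturb(text, rng, substitutions=SUBSTITUTIONS, rates=RATES):
    """Replace each eligible character with its leet form with the given probability."""
    out = []
    for ch in text:
        if ch in substitutions and rng.random() < rates.get(ch, 0.0):
            out.append(substitutions[ch])
        else:
            out.append(ch)
    return "".join(out)

rng = random.Random(42)  # fixed seed so the perturbed test set is reproducible
noisy = perturb("عيب", rng)
print(noisy)
```

Passing an explicit random generator keeps each perturbation level reproducible across the five evaluation seeds.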
The model’s performance retention of above 91% F1 under these perturbations can be attributed to three key design choices. First, the leetspeak normalization pipeline significantly reduces surface-level noise before embedding generation, ensuring that most obfuscations are corrected prior to model inference. Second, the Hierarchical Attention Network (HAN) enhances contextual focus by allowing the model to emphasize semantically relevant words and sentences even when token-level noise is introduced. Finally, AraBERT’s contextual embeddings enable semantic recovery—whereby the model infers meaning based on surrounding context even when individual tokens are corrupted.
Regarding adversarial examples specifically designed to evade detection, the hybrid framework demonstrates resilience through contextual generalization rather than memorization. Adversarial instances (e.g., replacing offensive terms with partial digit sequences such as “9a7iir” for “حقير”) are mitigated by AraBERT’s bidirectional encoding, which captures the semantic intent of the entire sentence, and by HAN’s multi-level attention, which highlights linguistic cues associated with aggression or hostility. Even when exact token forms differ from training examples, the system can infer hateful meaning based on contextual co-occurrence (e.g., insult patterns, pronoun-verb aggression pairs). This explains why the model’s performance degradation under adversarial noise remains limited to 3–4% compared to clean text, as shown in Table 8 of the manuscript.
The F1-scores closely mirror the AUPRC trends, showing that the model’s chosen operating point—defined by maximizing the geometric mean of precision and recall—balances false positives and false negatives effectively across domains. Moreover, the Matthews Correlation Coefficient (MCC) remains strong (0.86–0.91), confirming consistent correlation between predictions and true labels despite dialectal and stylistic variations.
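The two quantities named here can be illustrated with a short sketch: selecting the threshold that maximizes the geometric mean of precision and recall, and computing MCC from confusion-matrix counts. The labels, scores, and counts are toy values, and the exhaustive threshold search is an assumed implementation rather than the procedure used in the study.

```python
from math import sqrt

def best_gmean_threshold(labels, scores):
    """Return (threshold, g-mean) maximizing sqrt(precision * recall)."""
    best_t, best_g = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < t)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        g = sqrt(precision * recall)
        if g > best_g:
            best_t, best_g = t, g
    return best_t, best_g

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; returns 0.0 when undefined."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
threshold, gmean = best_gmean_threshold(labels, scores)
print(threshold, round(gmean, 3), round(mcc(480, 490, 10, 20), 3))
```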
Table 11 shows the relative frequency of error types identified during manual verification of 2000 randomly selected tweets. The Correct Normalizations category dominates at 97.8%, confirming that the normalization grammar successfully converted nearly all leetspeak tokens into their canonical Arabic forms. Minor errors stemmed from ambiguous mappings (1.1%), unrecognized mixed-script tokens (0.5%), and residual orthographic noise (<0.3%). These results are consistent with the manuscript’s reported accuracy values, confirming the reliability of the normalization stage. The low frequency of ambiguous and residual cases validates that the rule set is both comprehensive and systematically applied.
The rule set, originally optimized for Iraqi Arabic, maintains coverage above 90% and accuracy above 95% in all examined dialects, confirming its robustness and cross-regional adaptability. The small variation between dialects arises from phonetic overlap in digit-to-character mappings; for instance, the symbol “3” may correspond to either “ع” or “غ” depending on dialectal pronunciation. Table 11 substantiates the claim that the normalization stage not only resolves the majority of leetspeak distortions but also preserves linguistic integrity across dialectal boundaries, an essential property for achieving the high recall and robustness reported in the AraBERT–HAN framework.
The ablation study in Table 12 demonstrates that the Hierarchical Attention Network (HAN) consistently outperforms simpler attention and pooling strategies across both accuracy and F1 metrics. While self-attention and BiLSTM + max-pooling reduce model complexity, they fail to capture the hierarchical linguistic structure inherent in Arabic text—particularly when hate speech cues are distributed across multiple sentences or embedded in complex discourse.
The HAN architecture, by contrast, models both word-level and sentence-level dependencies, enabling it to focus simultaneously on localized offensive terms and their contextual significance (see Figure 12). This dual-layer attention provides interpretability through attention heatmaps and improves recall on obfuscated Iraqi leetspeak. Despite higher computational cost (~135 M parameters), HAN’s hierarchical aggregation yields a 2–3% performance gain and greater semantic transparency, validating its selection over simpler attention mechanisms.

4.5.5. Comparative Analysis with Existing Studies

To situate the proposed Hybrid AraBERT–HAN framework within the broader landscape of Arabic hate speech detection and text deobfuscation research, we conducted a comparative analysis against recent state-of-the-art models across related domains, as reported in prior literature [12,13,14,15,16,17,18,19,20,21,22,23,24,25,26]. The goal was to highlight the distinct advantages of the hybrid approach in handling dialectal variation, leetspeak obfuscation, and semantic interpretability—areas where earlier works showed partial success but notable limitations.
Several recent studies have advanced Arabic hate speech detection using deep learning. Anezi (2022) [13] introduced a deep recurrent neural network (DRNN) framework capable of achieving high accuracy (≈99%) for binary classification of hate speech. However, its performance significantly declined in multi-class setups, indicating difficulty in learning nuanced contextual differences among subtypes of hate speech. Similarly, Khezzar et al. (2023) [12] developed the arHateDetector model, which effectively processed both standard and dialectal Arabic tweets using transformer-based representations. While achieving up to 93% accuracy, their system struggled with obfuscated and morphologically distorted text, as it relied primarily on clean, standardized datasets without addressing leetspeak normalization.
Other related works have tackled orthographic and semantic distortion from different perspectives. Tundis et al. (2021) [28] proposed a deep-learning-based algorithm for detecting hidden propaganda in mixed-code text, addressing multilingual and code-switching challenges. Although highly effective for cross-language data, it did not incorporate noise-handling modules such as digit-symbol obfuscation, which limits its applicability to Arabic social media. Likewise, Vélez de Mendizabal et al. (2023) [29] focused specifically on deobfuscating leetspeak to improve spam filtering using deep neural architectures. Their results demonstrated that incorporating deobfuscation layers improved text classification by approximately 6–8%, validating our motivation to embed leet normalization in hate speech contexts. However, their system was restricted to English datasets and did not consider dialectal or morphological complexities.
Parallel research in related Arabic NLP domains reinforces the importance of hybrid and hierarchical architectures. Alrehaili et al. (2023) [30] used deep learning for Arabic speech dialect classification, highlighting that integrating CNN and RNN layers improved feature discrimination across dialects—an insight that parallels our combination of AraBERT embeddings with HAN attention to handle Iraqi dialectal variation. Similarly, Hibatullah et al. (2025) [31] explored verbal harassment detection in online games using various machine learning models. While their ensemble approach achieved competitive results, interpretability and robustness to obfuscation were not addressed, leaving a gap our hierarchical attention design directly fills. Finally, Daquigan et al. (2025) [32] enhanced profanity filtering for gaming chats through custom hate speech algorithms, confirming that hybridization and domain-specific preprocessing significantly reduce false negatives in noisy text environments.
Figure 13 illustrates the relative performance of key studies alongside our model. While Anezi (2022) [13] and Khezzar et al. (2023) [12] achieved accuracies near 93–99% on clean datasets, their F1-scores dropped below 0.90 when handling noisy or multi-class settings. By contrast, Hybrid AraBERT–HAN consistently achieved 97% accuracy and 0.96 F1-score even under leetspeak and adversarial conditions, outperforming DRNN and transformer baselines by margins of 3–7%. This gain demonstrates that coupling normalization with hierarchical attention not only enhances precision but also stabilizes recall under morphological and orthographic noise.
Table 13 illustrates the performance of the Hybrid AraBERT–HAN model relative to prominent studies in hate speech detection, deobfuscation, and Arabic text classification. Although Anezi (2022) [13] achieved a high accuracy of 0.99 using DRNN, its F1-score (0.84) reveals poor balance between precision and recall, indicating overfitting to binary cases. Models such as Khezzar et al. (2023) [12] and Alrehaili et al. (2023) [30] performed well on dialectal and speech-based datasets but lacked robustness to noisy, obfuscated text. By contrast, the proposed Hybrid AraBERT–HAN framework maintains both high accuracy (0.97) and high F1-score (0.96), representing the most balanced and reliable system among all compared models. The inclusion of leetspeak normalization, contextual embeddings, and hierarchical attention enables it to outperform prior architectures under real-world adversarial conditions. This demonstrates its superior capability to generalize beyond clean data and effectively interpret distorted, dialectal Arabic expressions typical of Iraqi social media contexts.
In Table 13, the symbols ✗, ✓, and ✓✓ denote the extent of obfuscation-handling mechanisms integrated into each model. Specifically, ✗ indicates that the approach does not employ any explicit strategy for addressing obfuscated or distorted text, such as leetspeak or mixed-code writing. The ✓ symbol represents partial or basic handling, where limited normalization or deobfuscation is applied, for instance managing simple character substitutions or mixed-language inputs. Finally, ✓✓ signifies advanced, multi-layered obfuscation handling that incorporates comprehensive normalization strategies, including both leetspeak decoding and dialect-aware text restoration. The proposed Hybrid AraBERT–HAN framework falls under this last category, offering robust resilience against heavily obfuscated and linguistically diverse text common in Iraqi Arabic social media discourse.
Figure 14 presents a statistical comparison of the proposed Hybrid AraBERT–HAN framework against three established baselines: Deep Recurrent Neural Networks (DRNN), arHateDetector, and Ensemble BERT. All models were trained and evaluated on the same Iraqi Arabic leetspeak dataset under identical preprocessing, training splits, and hyperparameter tuning protocols, ensuring experimental FAIRness (Findability, Accessibility, Interoperability, and Reusability) in accordance with the principles outlined in [33]. Each architecture was executed over five independent random seeds (42, 100, 321, 777, 999), and performance metrics were averaged to obtain stable estimates.
The figure reports mean F1-scores with 95% bootstrap confidence intervals, derived from 1000 resampled test subsets. The Hybrid AraBERT–HAN model achieved the highest average F1 = 0.96 ± 0.004, followed by Ensemble BERT (0.94 ± 0.007), arHateDetector (0.93 ± 0.008), and DRNN (0.90 ± 0.010). Statistical validation through paired t-tests confirmed that the proposed model’s performance gain over each baseline was significant at p < 0.01. These results substantiate the hybrid framework’s superiority and demonstrate that its improvements arise from its architectural innovations, specifically the integration of leetspeak normalization, transformer-based contextual embeddings, and hierarchical attention aggregation. This combination enhances the model’s discriminative capability, robustness to adversarial noise, and interpretability, aligning with the FAIR-compliant experimental standards adopted throughout the study [33].
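A percentile bootstrap of the kind described can be sketched as follows. The 1000 resamples and 95% level follow the text; the percentile method, the resampling of whole test examples with replacement, and the toy predictions are assumptions made for illustration.

```python
import random

def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def bootstrap_ci(y_true, y_pred, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for F1 over resampled test examples."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(f1_score([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((1 - level) / 2 * n_resamples)]
    hi = stats[int((1 + level) / 2 * n_resamples) - 1]
    return lo, hi

# Toy test set: 100 posts, four of them misclassified (hypothetical).
y_true = [1, 0] * 50
y_pred = list(y_true)
for i in (0, 1, 2, 3):
    y_pred[i] = 1 - y_pred[i]
low, high = bootstrap_ci(y_true, y_pred)
print(round(f1_score(y_true, y_pred), 3), round(low, 3), round(high, 3))
```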

4.6. Results Discussion

The hybrid design integrates a leetspeak-aware normalization stage with AraBERT’s contextual embeddings and HAN’s hierarchical attention. Normalization collapses digits/symbols and non-standard spellings to canonical Arabic forms, reducing subword fragmentation and out-of-vocabulary effects; this directly improves tokenization quality before encoding. AraBERT then supplies robust, context-sensitive representations for Iraqi Arabic, while HAN aggregates salient information at both word and sentence levels, forcing the classifier to focus on the few tokens/clauses that actually carry abusive intent. Together, these stages form a pipeline that is simultaneously noise-tolerant (via normalization), semantically expressive (via AraBERT), and selective/explicit (via HAN). This architectural motivation matches the design in the manuscript and explains why the hybrid model dominates standalone AraBERT or HAN in accuracy, precision/recall, F1, and ROC–AUC.
High accuracy accompanied by high F1 indicates that the model is not merely benefiting from the majority class; it is also keeping false negatives low on hateful content and false positives low on non-hateful content. In practice, the normalization stage explains most of the recall gains: once obfuscated forms (e.g., digit replacements) are restored, cues that previously looked innocuous to the tokenizer are correctly mapped to hateful lexemes, improving TPR. Precision gains come from HAN’s attention filtering—irrelevant profanity, sarcasm, or quoted speech receives lower weights at word and sentence levels, decreasing FPR. This behavior is consistent with the hierarchical attention computation and training loop described for the system, where attention weights gate contributions to the final document vector that drives the logits.
The strong ROC–AUC suggests the hybrid produces well-separated score distributions across classes over a wide range of thresholds. Mechanistically, AraBERT furnishes linearly separable embedding clusters, and HAN sharpens separation by reweighting salient tokens/sentences before classification. If deployment requires asymmetric risk (e.g., minimizing false accusations), ROC analysis supports moving the decision threshold to trade recall for precision without catastrophic collapse—something not possible when the curve is near the diagonal. The manuscript’s metric design explicitly emphasizes ROC–AUC (and MCC) to avoid accuracy inflation under imbalance; this aligns with the observed robustness of the hybrid’s score distributions.
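The threshold adjustment mentioned here can be sketched as a search for the lowest threshold that meets a required precision, reporting the recall that remains. The scores are toy values and the function name is hypothetical; a deployed system would run this search on a held-out validation split.

```python
def threshold_for_precision(labels, scores, min_precision):
    """Return (threshold, recall) for the lowest threshold meeting min_precision, or None."""
    n_pos = sum(labels)
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        if tp + fp and tp / (tp + fp) >= min_precision:
            return t, (tp / n_pos if n_pos else 0.0)
    return None

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]

# A stricter precision requirement costs recall.
print(threshold_for_precision(labels, scores, 0.7))
print(threshold_for_precision(labels, scores, 1.0))
```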
Removing normalization produces the largest performance drop, primarily in recall. Without canonicalization, frequent Iraqi leetspeak patterns split into rare subwords, corrupting context windows and hampering AraBERT’s masked-token priors; the model misses many hateful posts even when their semantics are clear to a human. Weakening the leet-map degrades less than outright removal, which shows that even imperfect normalization helps the encoder land closer to pretraining manifolds. Disabling word-level attention disproportionately hurts recall: token-level salience is crucial when only one or two characters differentiate hateful from neutral forms. Disabling sentence-level attention reduces precision: the model is less able to down-weight off-topic or quoted toxic fragments when the broader sentence does not express hate. Finally, sweeping the late-fusion weight shows that performance peaks when AraBERT is dominant but HAN remains non-zero—pure AraBERT lacks the hierarchical selectivity that turns good separability into balanced precision/recall, while pure HAN lacks the deep contextualization needed for obfuscated cues. These patterns are exactly what the hybrid’s training and attention equations would predict.
Stress tests with character substitutions, digit/symbol obfuscations, and typos create controlled drift between train and test token distributions. The modest F1/ROC–AUC declines under these conditions show that normalization absorbs most systematic distortions, while AraBERT’s pretraining and HAN’s attention compensate for residual noise. Short posts remain easy because attention can saturate on a few harmful tokens; long posts cause mild degradation due to attention dilution—yet hierarchical sentence-level pooling mitigates this by letting the model ignore neutral context. Under class imbalance, macro-F1 remains stable while micro-F1/accuracy vary more, which is expected; this is why the evaluation suite includes metrics beyond accuracy (precision/recall/F1, ROC–AUC, MCC) and emphasizes seed-averaged significance tests.
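The reason for reporting metrics beyond accuracy under imbalance can be shown with a small sketch that computes accuracy (which equals micro-F1 in single-label classification) and macro-F1 on a skewed toy set. The 95/5 split and the degenerate all-negative classifier are hypothetical and chosen only to make the divergence between the two metrics visible.

```python
def f1_for_class(y_true, y_pred, cls):
    """One-vs-rest F1 for a single class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)

def accuracy(y_true, y_pred):
    """Fraction of correct predictions; equals micro-F1 for single-label tasks."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# 95 non-hate posts, 5 hate posts; the classifier predicts non-hate for everything.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
acc = accuracy(y_true, y_pred)
macro = macro_f1(y_true, y_pred)
print(round(acc, 3), round(macro, 3))
```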
Word- and sentence-level attention heatmaps consistently highlight leetspeak-decoded slurs and their syntactic carriers (negation, imperative structures). This alignment between model focus and human intuition is central for moderation workflows: reviewers can see why a decision was made and contest cases where attention centers on decontextualized or quoted text. Qualitative case studies show the hybrid correctly down-weights reclaimed slurs or benign references when the surrounding sentence is neutral—behavior enabled by sentence-level attention. The manuscript explicitly frames interpretability as a first-class evaluation axis, not a by-product.
Prior Arabic hate-speech systems often report strong accuracy on clean, balanced datasets but falter on noisy dialectal inputs or adversarial obfuscations. The hybrid’s gains over transformer-only baselines can be attributed to (i) targeted normalization that removes systematic noise before encoding, and (ii) hierarchical attention that enforces sparsity over truly toxic spans rather than relying on diffuse sentence-level cues. This directly answers documented gaps in the literature around robustness and interpretability for Arabic hate speech detection.
The approach preserves practicality: AraBERT’s self-attention scales quadratically with sequence length, whereas HAN scales roughly linearly over word/sentence encoders; by pushing denoising to a cheap preprocessing step and using late fusion, the system balances accuracy with acceptable training/inference cost. Empirical runtime and memory usage recorded in the manuscript indicate feasibility for near-real-time moderation, especially if coupled with threshold tuning and early-exit policies for low-uncertainty samples. Residual errors concentrate in: (i) novel obfuscations not covered by the current leet-map; (ii) highly contextual sarcasm where surface cues are inverted; and (iii) domain drift (platform-specific slang). These can be mitigated by (a) learning a dynamic leet-map from data via confusion-set mining, (b) augmenting with adversarial training that composes character-level edits during fine-tuning, (c) calibration and cost-sensitive thresholds for operational risk, and (d) continual learning with weak labels from in-the-wild streams.

5. Conclusions and Future Recommendations

The proposed Hybrid AraBERT–HAN model represents a holistic leap in hate speech detection for Arabic dialects. It achieves top-tier numerical performance (97% accuracy, 0.96 F1, and 0.98 AUC) while preserving interpretability and resilience against noise. The high accuracy can be attributed to the leetspeak normalization pipeline that precedes model inference. This preprocessing stage restores distorted words such as “3يب” to “عيب”, significantly reducing the out-of-vocabulary rate and improving AraBERT’s tokenization consistency. The findings highlight that carefully engineered normalization, coupled with hierarchical attention, can elevate transformer-based systems from syntactic pattern recognition to semantically grounded, context-aware moderation tools. As such, this framework establishes a scalable, interpretable foundation for Arabic hate speech detection, setting the stage for broader, cross-dialect, and real-world social media moderation applications.
Looking forward, several directions can extend this research. One is to integrate dynamic leet-map learning, allowing the normalization layer to adapt to emerging slang and evolving obfuscation strategies. Another is to incorporate adversarial data augmentation, where synthetic noise patterns are introduced during training to further strengthen robustness. In addition, leveraging multilingual transfer learning could enable cross-dialect adaptation, generalizing the model from Iraqi Arabic to Gulf, Levantine, and Maghrebi dialects. Future studies might also explore lightweight transformer distillation to compress the hybrid framework while preserving interpretability for edge deployment.

Author Contributions

Conceptualization, D.M. and H.Ç.; methodology, D.M. and H.Ç.; software, D.M.; validation, D.M.; formal analysis, D.M.; investigation, D.M.; resources, D.M.; data curation, D.M.; writing—original draft preparation, D.M.; writing—review and editing, D.M. and H.Ç.; visualization, D.M.; supervision, H.Ç.; project administration, D.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All additional data used in this study were collected from publicly available Twitter posts in compliance with the platform’s terms of service. Personally identifiable information such as usernames, profile links, and images was removed during preprocessing to preserve user anonymity. The annotation process was conducted by three consenting volunteers who were informed about the research objectives and data handling procedures. No private communications or restricted content were included. The curated dataset, leetspeak normalization scripts, and model training configurations are publicly available on GitHub at https://github.com/dyaa4321/Iraqi-Hate-Speech-detection- (accessed on 30 October 2025) for research and reproducibility purposes.

Conflicts of Interest

The authors declare no conflicts of interest. The examples of hate speech included in this article and the associated dataset are presented for research purposes only. They do not represent the views of the authors or their institutions. The authors disclaim any endorsement of the offensive content, which is reproduced solely to enable scientific analysis.

References

  1. Alkharsan, A.; Ata, O. HawkFish Optimization Algorithm: A Gender-Bending Approach for Solving Complex Optimization Problems. Electronics 2025, 14, 611. [Google Scholar] [CrossRef]
  2. Ahmad, A.; Azzeh, M.; Alnagi, E.; Abu Al-Haija, Q.; Halabi, D.; Aref, A.; AbuHour, Y. Hate speech detection in the Arabic language: Corpus design, construction, and evaluation. Front. Artif. Intell. 2024, 7, 1345445. [Google Scholar] [CrossRef] [PubMed]
  3. Abdullah, N.A.; Abdulghani, F.A. A Survey on Arabic Text Classification Using Deep and Machine Learning Algorithms. Iraqi J. Sci. 2022, 63, 409–419. [Google Scholar] [CrossRef]
  4. Kanan, T.; Sadaqa, O.; Aldajeh, A.; Alshwabka, H.; AL-dolime, W.; AlZu’bi, S.; Hawashin, B.; Alia, M.A. A Review of Natural Language Processing and Machine Learning Tools Used to Analyze Arabic Social Media. In Proceedings of the 2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT), Amman, Jordan, 9–11 April 2019. [Google Scholar]
  5. Deeply-Ingrained Prejudice Fuels Hate Speech in Iraq. Institute of Development Studies (IDS), Opinion Article. Available online: https://www.ids.ac.uk/opinions/deeply-ingrained-prejudice-fuels-hate-speech-in-iraq/ (accessed on 4 October 2025).
  6. Itriq, M.; Mohd Noor, M.H. Arabic hate speech detection using deep learning: A state-of-the-art survey of advances, challenges, and future directions (2020–2024). PeerJ Comput. Sci. 2025, 11, e3133. [Google Scholar] [CrossRef]
  7. Shapiro, A.; Khalafallah, A.; Torki, M. AlexU-AIC at Arabic Hate Speech 2022: Contrast to Classify. arXiv 2022. [Google Scholar] [CrossRef]
  8. Omar, A.; Mahmoud, T.M.; Abd-El-Hafeez, T. Comparative Performance of Machine Learning and Deep Learning Algorithms for Arabic Hate Speech Detection in OSNs. In Advances in Intelligent Systems and Computing, Proceedings of the International Conference on Artificial Intelligence and Computer Vision (AICV2020), Cairo, Egypt, 8–10 April 2020; Hassanien, A.E., Azar, A., Gaber, T., Oliva, D., Tolba, F., Eds.; Springer: Cham, Switzerland, 2020; Volume 1153. [Google Scholar] [CrossRef]
  9. Muzakir, A.; Adi, K.; Kusumaningrum, R. Classification of Hate Speech Language Detection on Social Media: Preliminary Study for Improvement. In Lecture Notes on Data Engineering and Communications Technologies, Proceedings of the Emerging Trends in Intelligent Systems & Network Security. NISS 2022, Bandung, Indonesia, 30–31 March 2022; Ben Ahmed, M., Abdelhakim, B.A., Ane, B.K., Rosiyadi, D., Eds.; Springer: Cham, Switzerland, 2023; Volume 147. [Google Scholar] [CrossRef]
  10. Mousa, A.; Shahin, I.; Nassif, A.B.; Elnagar, A. Detection of Arabic offensive language in social media using machine learning models. Intell. Syst. Appl. 2024, 22, 200376. [Google Scholar] [CrossRef]
  11. Mazari, A.C.; Benterkia, A.; Takdenti, Z. Advancing offensive language detection in Arabic social media: A BERT-based ensemble learning approach. Soc. Netw. Anal. Min. 2024, 14, 186. [Google Scholar] [CrossRef]
  12. Khezzar, R.; Moursi, A.; Al Aghbari, Z. arHateDetector: Detection of hate speech from standard and dialectal Arabic Tweets. Discov. Internet Things 2023, 3, 1. [Google Scholar] [CrossRef]
  13. Anezi, F.Y.A. Arabic Hate Speech Detection Using Deep Recurrent Neural Networks. Appl. Sci. 2022, 12, 6010. [Google Scholar] [CrossRef]
  14. de Paula, A.F.M.; Bensalem, I.; Rosso, P.; Zaghouani, W. Transformers and Ensemble Methods: A Solution for Hate Speech Detection in Arabic Languages. arXiv 2023. [Google Scholar] [CrossRef]
  15. Alshahrani, E.S.; Aksoy, M.S. Adversarially Robust Multitask Learning for Offensive and Hate Speech Detection in Arabic Text Using Transformer-Based Models and RNN Architectures. Appl. Sci. 2025, 15, 9602. [Google Scholar] [CrossRef]
  16. Elzayady, H.; Mohamed, M.S.; Badran, K.M.; Salama, G.I. A hybrid approach based on personality traits for hate speech detection in Arabic social media. Int. J. Electr. Comput. Eng. 2023, 13, 1979. [Google Scholar]
  17. Mazari, A.C.; Boudoukhani, N.; Djeffal, A. BERT-Based Ensemble Learning for Multi-Aspect Hate Speech Detection. Cluster Comput. 2024, 27, 325–339. [Google Scholar] [CrossRef]
  18. Antoun, W.; Baly, F.; Hajj, H. AraBERT: Transformer-based Model for Arabic Language Understanding. arXiv 2020, arXiv:2003.00104. [Google Scholar]
  19. Li, L.; Wang, Q.; Zhao, B.; Li, X.; Zhou, A.; Wu, H. Pre-Training and Fine-Tuning with Next Sentence Prediction for Multimodal Entity Linking. Electronics 2022, 11, 2134. [Google Scholar] [CrossRef]
  20. Ratmele, A.; Thakur, R. OpExHAN: Opinion extraction using hierarchical attention network from unstructured reviews. Soc. Netw. Anal. Min. 2022, 12, 148. [Google Scholar] [CrossRef]
  21. Abdelali, A.; Mubarak, H.; Samih, Y.; Hassan, S.; Darwish, K. QADI: Arabic Dialect Identification in the Wild. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine (Virtual), 9 April 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 1–10. [Google Scholar]
  22. Founta, A.; Djouvas, C.; Chatzakou, D.; Leontiadis, I.; Blackburn, J.; Stringhini, G.; Vakali, A.; Sirivianos, M.; Kourtellis, N. Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior. In Proceedings of the International AAAI Conference on Web and Social Media, Palo Alto, CA, USA, 25–28 June 2018; Volume 12. [Google Scholar] [CrossRef]
  23. Abdellaoui, I.; Ibrahimi, A.; El Bouni, M.A.; Mourhir, A.; Driouech, S.; Aghzal, M. Investigating Offensive Language Detection in a Low-Resource Setting with a Robustness Perspective. Big Data Cogn. Comput. 2024, 8, 170. [Google Scholar] [CrossRef]
  24. Hermessi, H. Arabic-Levantine Hate Speech Detection Dataset. Kaggle Dataset. Available online: https://www.kaggle.com/datasets/haithemhermessi/arabic-levantine-hate-speech-detection (accessed on 12 October 2025).
  25. Moosa, W.H. Multilingual Hate Speech Dataset. Kaggle Dataset. Available online: https://www.kaggle.com/datasets/wajidhassanmoosa/multilingual-hatespeech-dataset (accessed on 12 October 2025).
  26. Aziz, K. Arabic Hate Speech Dataset. Kaggle Dataset. Available online: https://www.kaggle.com/datasets/khuzaimaaziz/arabic-hate-speech-dataset (accessed on 12 October 2025).
  27. Wilkinson, M.; Dumontier, M.; Aalbersberg, I.J. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 2016, 3, 160018. [Google Scholar] [CrossRef]
  28. Tundis, A.; Mukherjee, G.; Mühlhäuser, M. An Algorithm for the Detection of Hidden Propaganda in Mixed-Code Text over the Internet. Appl. Sci. 2021, 11, 2196. [Google Scholar] [CrossRef]
  29. Vélez de Mendizabal, I.; Vidriales, X.; Basto Fernandes, V.; Ezpeleta, E.; Méndez, J.R.; Zurutuza, U. Deobfuscating Leetspeak With Deep Learning to Improve Spam Filtering. Int. J. Interact. Multimed. Artif. Intell. 2023, 8, 46–55. [Google Scholar] [CrossRef]
  30. Alrehaili, M.; Alasmari, T.; Aoalshutayri, A. Arabic Speech Dialect Classification using Deep Learning. In Proceedings of the 2023 1st International Conference on Advanced Innovations in Smart Cities (ICAISC), Jeddah, Saudi Arabia, 23–25 January 2023; pp. 1–5. [Google Scholar] [CrossRef]
  31. Hibatullah, H.; Ballı, T.; Yetkin, E.F. Verbal harassment detection in online games using machine learning methods. Entertain. Comput. 2025, 55, 101009. [Google Scholar] [CrossRef]
  32. Daquigan, J.M.; Marbella, G.K.G.; Dioses, R.M.; Co, J.D.C.; Centeno, C.J.; Mata, K.E. Enhancement of Profanity Filtering and Hate Speech Detection Algorithm applied in Minecraft Chats. TTACA 2025, 4, 1–7. [Google Scholar] [CrossRef]
  33. Alrasheed, S.; Aladhadh, S.; Alabdulatif, A. Protecting Intellectual Security Through Hate Speech Detection Using an Artificial Intelligence Approach. Algorithms 2025, 18, 179. [Google Scholar] [CrossRef]
Figure 1. Number of content items actioned for hate speech on Facebook worldwide between 4th quarter 2017 and 1st quarter 2023 [2].
Figure 2. Process diagram of the proposed method.
Figure 3. Preprocessing procedure for the dataset.
Figure 4. Word clouds of offensive Arabic tweets (Label = 1) illustrating the most frequent lexical patterns across three random seeds. Larger words correspond to higher frequency within the offensive class, after text normalization and leetspeak deobfuscation.
Figure 5. AraBERT and HAN fine-tuning after data preprocessing.
Figure 6. Comparative Performance of AraBERT, HAN, and Hybrid AraBERT–HAN Models.
Figure 7. Confusion Matrix of Hybrid AraBERT–HAN Model.
Figure 8. Ablation bar chart (F1 and ROC–AUC).
Figure 9. Robustness of AraBERT–HAN under Perturbations.
Figure 10. Word- and Sentence-Level Attention Heatmap.
Figure 11. Cross-Dataset ROC/PR and Correlation Metrics of the Proposed LA-HAN Model.
Figure 12. Ablation Study Comparison of Alternative Architectures.
Figure 13. Comparative Accuracy and F1-Score Across Studies [12,13,28,29,30,31,32].
Figure 14. Statistical Comparison of Competing Models (with 95% Confidence Intervals).
Table 1. Summary of Related Works on Arabic Hate Speech and Offensive Language Detection.

| Ref. | Main Approach | Dataset | Key Findings |
|---|---|---|---|
| [6] | Survey of deep learning (CNN, RNN, Transformers) | Multiple datasets (2020–2024) | Identified AraBERT and MARBERT as dominant; challenges include dialects and data scarcity. |
| [7] | Contrastive + Multi-task learning with Transformers | Arabic Twitter (OSACT5) | Achieved F1-scores of 0.841, 0.817, 0.476 across subtasks A–C; showed contrastive learning reduces overfitting. |
| [8] | Comparative ML + RNN/Deep Learning | Multi-platform (Facebook, Twitter, Instagram, YouTube) | RNN achieved 98.7% accuracy, outperforming other models. |
| [9] | SVM, XGBoost with SMOTE | Indonesian Twitter | SVM with SMOTE reached 90.7% accuracy; dataset balancing improved results. |
| [10] | Cascaded BERT → BiLSTM → RBF | Balanced Arabic Twitter dataset | F1-score ~98%; cascaded model outperformed traditional classifiers. |
| [11] | BERT Ensemble + BiLSTM | OffensEval2020 Arabic dataset | Ensemble achieved F1-score 94.56%, surpassing OffensEval winner. |
| [12] | arHateDetector (AraBERT, CNN, SVC) | arHateDataset (Standard + Dialects) | AraBERT reached 93% accuracy, outperforming CNN and SVC. |
| [13] | Deep Recurrent Neural Networks (DRNN-1, DRNN-2) | 4203 Arabic comments (7 categories) | DRNN achieved 99.73% binary accuracy, 84.14% on 7-class task. |
| [14] | Transformer ensembles (majority vote) | CERIST NLP Challenge dataset | Ensemble achieved F1-score 0.60, accuracy 0.86. |
| [15] | Adversarial Multitask (MARBERTv2 + BiGRU) | OSACT2020 + augmented posts | Adversarial training boosted robustness; macro-F1 improved to 81%. |
| [16] | Hybrid model with personality traits + text features | Arabic social media posts | Improved performance by integrating personality cues with text. |

Table 2. Leet Map for Iraqi Arabic Text Normalization.

| Symbol/Number | Normalized Arabic Character | Example Transformation |
|---|---|---|
| 2 | ء | “2mr” → “أمر” |
| 3 | ع | “3ib” → “عيب” |
| 3′ | غ | “3′rb” → “غرب” |
| 5 | خ | “5air” → “خير” |
| 6 | ط | “6aleb” → “طالب” |
| 6′ | ظ | “6′lem” → “ظلم” |
| 7 | ح | “7ob” → “حب” |
| 7′ | خ | “7′rf” → “خرف” |
| 8 | ق | “8alb” → “قلب” |
| 9 | ص | “9abr” → “صبر” |
| 9′ | ض | “9′afa” → “ضعف” |
| 4 | ش | “4ams” → “شمس” |
| sh | ش | “shabab” → “شباب” |
| kh | خ | “khalas” → “خلاص” |
| ch | چ | “chalb” → “چلب” |
| zh | ژ | “zhur” → “ژهور” |
| $ | س | “$alam” → “سلام” |
| @ | ا | “@hmed” → “احمد” |
| 0 | و | “0mar” → “عمر” |

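To make the mapping in Table 2 concrete, the following is a minimal sketch of a digit and symbol normalizer. It is not the authors' released script. It covers only the digit and symbol rows of the table, so the Latin digraph rows (sh, kh, ch, zh) and full Latin-to-Arabic transliteration are out of scope. The guard that leaves tokens without any Arabic letter untouched (so that a plain number such as "2025" is preserved) is an assumption of this example, as are the function names.

```python
import re

# Digit and symbol rows of Table 2. The prime character (′) follows the table.
LEET_MAP = {
    "3′": "غ", "6′": "ظ", "7′": "خ", "9′": "ض",
    "2": "ء", "3": "ع", "5": "خ", "6": "ط", "7": "ح",
    "8": "ق", "9": "ص", "4": "ش", "0": "و", "$": "س", "@": "ا",
}

# Longest keys first, so that "3′" is matched before the shorter "3".
_LEET_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(LEET_MAP, key=len, reverse=True))
)
# Basic Arabic letters, hamza (U+0621) through yeh (U+064A).
_ARABIC_LETTER = re.compile(r"[\u0621-\u064A]")


def normalize_token(token: str) -> str:
    """Map leet symbols to Arabic letters, but only inside Arabic-script tokens."""
    if not _ARABIC_LETTER.search(token):
        return token  # assumed guard: leave pure numbers and Latin tokens alone
    return _LEET_PATTERN.sub(lambda m: LEET_MAP[m.group(0)], token)


def normalize(text: str) -> str:
    """Normalize whitespace-separated tokens (runs of whitespace are collapsed)."""
    return " ".join(normalize_token(tok) for tok in text.split())
```

For example, `normalize("3يب")` yields “عيب”, while a standalone "2025" is returned unchanged. Ordering the alternation by key length is what keeps “3′” from being read as “3” followed by a stray prime.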
Table 3. AraBERT Parameters.

| Parameter | Value |
|---|---|
| Maximum sequence length | 160 |
| Hidden size | 768 |
| Number of layers | 12 |
| Attention heads | 12 |
| Batch size | 16 |
| Learning rate | 2 × 10⁻⁵ |
| Optimizer | AdamW |
| Dropout rate | 0.1 |
| Epochs | 5 |

Table 4. HAN Parameters.

| Parameter | Value |
|---|---|
| Word encoder hidden size | 256 |
| Sentence encoder hidden size | 256 |
| Attention dimension | 100 |
| Dropout rate | 0.2 |
| Batch size | 16 |
| Optimizer | Adam |
| Activation function | Tanh |

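The tanh activation and the attention dimension in Table 4 correspond to the additive attention of the standard HAN formulation. As a reminder of that formulation (not necessarily the exact variant implemented here), word-level attention over the encoder states $h_{it}$ of sentence $i$ can be written as below. The symbols $W_w$, $b_w$, and $u_w$ are the usual learned parameters and are notation assumed for this note. Under that reading, $u_{it}$ and $u_w$ would have the attention dimension of 100.

```latex
\begin{align}
  u_{it}      &= \tanh\left(W_w h_{it} + b_w\right), \\
  \alpha_{it} &= \frac{\exp\left(u_{it}^{\top} u_w\right)}
                      {\sum_{t'} \exp\left(u_{it'}^{\top} u_w\right)}, \\
  s_i         &= \sum_{t} \alpha_{it}\, h_{it}.
\end{align}
```

Sentence-level attention has the same form, applied to the sentence encoder states with its own parameters, and yields a single document vector from the sentence vectors $s_i$.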
Table 5. Model Performance Across Multiple Random Seeds (Mean ± SD).

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| AraBERT | 0.931 ± 0.004 | 0.922 ± 0.006 | 0.915 ± 0.005 | 0.913 ± 0.005 |
| HAN | 0.908 ± 0.006 | 0.897 ± 0.007 | 0.888 ± 0.008 | 0.887 ± 0.006 |
| Hybrid AraBERT–HAN (Proposed) | 0.970 ± 0.003 | 0.961 ± 0.004 | 0.959 ± 0.003 | 0.958 ± 0.004 |

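For readers reproducing the "mean ± SD" format of Table 5, a small sketch of the aggregation is given here. The per-seed scores in the usage note are invented for illustration and are not the study's values. The use of the sample standard deviation (n − 1 denominator) is also an assumption, since the article does not state which estimator was used.

```python
from statistics import mean, stdev
from typing import Sequence


def summarize(scores: Sequence[float], digits: int = 3) -> str:
    """Format per-seed scores as 'mean ± SD' using the sample standard deviation."""
    return f"{mean(scores):.{digits}f} ± {stdev(scores):.{digits}f}"
```

With the made-up scores `[0.90, 0.92, 0.94]`, `summarize` returns `"0.920 ± 0.020"`.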
Table 6. ROC–AUC Performance Comparison.

| Model | ROC–AUC |
|---|---|
| AraBERT | 0.95 |
| HAN | 0.93 |
| Hybrid AraBERT–HAN | 0.98 |

Table 7. Ablation results (overall test set).

| Variant | Accuracy | Precision | Recall | F1-Score | ROC–AUC |
|---|---|---|---|---|---|
| Full Hybrid (Norm + AraBERT + HAN) | 0.97 | 0.96 | 0.96 | 0.96 | 0.98 |
| Normalization removed | 0.93 | 0.92 | 0.90 | 0.91 | 0.95 |
| Alternative Leet-Map (weaker) | 0.95 | 0.95 | 0.94 | 0.94 | 0.97 |
| Word Attention disabled | 0.95 | 0.94 | 0.92 | 0.93 | 0.96 |
| Sentence Attention disabled | 0.95 | 0.95 | 0.93 | 0.94 | 0.965 |

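The F1 column of Table 7 can be sanity-checked against the precision and recall columns, since F1 is their harmonic mean. The sketch below does this for the reported two-decimal values. Because the inputs are already rounded, it can only confirm agreement at two decimals. The function name is chosen for this example.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for each row of Table 7.
TABLE_7_ROWS = [
    (0.96, 0.96, 0.96),  # Full Hybrid
    (0.92, 0.90, 0.91),  # Normalization removed
    (0.95, 0.94, 0.94),  # Alternative Leet-Map (weaker)
    (0.94, 0.92, 0.93),  # Word Attention disabled
    (0.95, 0.93, 0.94),  # Sentence Attention disabled
]

# True when every recomputed F1 rounds to the reported value.
ALL_CONSISTENT = all(round(f1_score(p, r), 2) == f1 for p, r, f1 in TABLE_7_ROWS)
```

For the "Normalization removed" row, 2 × 0.92 × 0.90 / (0.92 + 0.90) ≈ 0.9099, which rounds to the reported 0.91.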
Table 8. Robustness Evaluation under Perturbations.

| Perturbation Type | Accuracy | F1-Score | ROC–AUC |
|---|---|---|---|
| Clean Text | 0.97 | 0.96 | 0.98 |
| Character Substitution | 0.94 | 0.93 | 0.96 |
| Digit/Symbol Obfuscation | 0.93 | 0.92 | 0.95 |
| Typographical Noise | 0.92 | 0.91 | 0.94 |
| Short-Length Posts | 0.96 | 0.95 | 0.97 |
| Long-Length Posts | 0.95 | 0.94 | 0.96 |
| Class Imbalance | 0.94 | 0.93 | 0.95 |

Table 9. Cross-Dataset Specifications.

| Dataset | Region/Domain | Size (Samples) | Labels | Dialect/Language | Source Platform |
|---|---|---|---|---|---|
| Arabic-Levantine Hate Speech Detection [25] | Levant (Lebanon, Syria, Jordan, Palestine) | ≈16,000 | Hate/Non-Hate | Levantine Arabic | Twitter, Facebook |
| Multilingual Hate Speech Dataset [26] | Global (10 languages including Arabic) | ≈45,000 | Hate/Non-Hate | Multilingual (Arabic subset) | Twitter, YouTube |
| Arabic Hate Speech Dataset [27] | Pan-Arab | ≈27,000 | Hate/Non-Hate | Modern Standard and regional Arabic | Twitter |

Table 10. Cross-Dataset Generalization Performance of the Proposed LA-HAN Model.

| Dataset | Accuracy | Precision | Recall | F1-Score | MSE |
|---|---|---|---|---|---|
| Arabic-Levantine Hate Speech Detection | 0.935 | 0.928 | 0.940 | 0.934 | 0.072 |
| Multilingual Hate Speech Dataset (Arabic subset) | 0.914 | 0.906 | 0.921 | 0.913 | 0.085 |
| Arabic Hate Speech Dataset | 0.947 | 0.942 | 0.950 | 0.946 | 0.061 |

Table 11. Cross-Dialect Normalization Coverage and Accuracy.

| Dialect | Coverage (%) | Normalization Accuracy (%) | Dominant Ambiguities |
|---|---|---|---|
| Iraqi Arabic | 94.2 | 97.8 | 7→ج vs. چ |
| Gulf Arabic | 91.8 | 96.9 | g→ج vs. ق |
| Levantine Arabic | 92.6 | 96.3 | 9→ص vs. ض |
| Egyptian Arabic | 90.5 | 95.1 | 6→ق vs. ج vs. ك |

Table 12. Ablation Comparison Between AraBERT–HAN and Alternative Attention Architectures.

| Architecture | Accuracy | F1-Score | Complexity (Params) | Interpretability |
|---|---|---|---|---|
| AraBERT + HAN (Proposed) | 0.97 | 0.96 | High (~135 M) | High |
| AraBERT + Self-Attention | 0.95 | 0.94 | Moderate (~120 M) | Medium |
| AraBERT + BiLSTM + Max-Pooling | 0.94 | 0.93 | Moderate (~110 M) | Low |
| AraBERT + Simple Attention Layer | 0.945 | 0.935 | Low (~108 M) | Medium |
| AraBERT Only (No Attention Layer) | 0.93 | 0.91 | Low (~100 M) | Low |

Table 13. Comparative Performance with Related Works.

| Reference | Model/Technique | Domain | Obfuscation Handling | Accuracy | F1-Score |
|---|---|---|---|---|---|
| Anezi (2022) [13] | DRNN | Arabic Hate Speech | | 0.99 | 0.84 |
| Khezzar et al. (2023) [12] | arHateDetector (AraBERT) | Arabic Tweets | | 0.93 | 0.91 |
| Tundis et al. (2021) [28] | Deep CNN-RNN | Hidden Propaganda | Partial (Mixed-code) | 0.92 | 0.89 |
| Vélez de Mendizabal et al. (2023) [29] | DL-based Deobfuscator | English Spam Filtering | | 0.94 | 0.90 |
| Alrehaili et al. (2023) [30] | CNN + RNN | Arabic Dialect Classification | | 0.95 | 0.93 |
| Hibatullah et al. (2025) [31] | Ensemble ML | Game Chat Harassment | | 0.93 | 0.91 |
| Daquigan et al. (2025) [32] | Enhanced Hate Filter | Game Chats (Minecraft) | Partial | 0.94 | 0.92 |
| Proposed (Hybrid AraBERT–HAN) | AraBERT + HAN + Leet-Normalization | Iraqi Arabic Text | ✓✓ | 0.97 | 0.96 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

