Deobfuscating Iraqi Arabic Leetspeak for Hate Speech Detection Using AraBERT and Hierarchical Attention Network (HAN)
Abstract
1. Introduction
- We propose a robust normalization pipeline specifically designed to deobfuscate Iraqi Arabic leetspeak text.
- We introduce a novel integration of AraBERT with a Hierarchical Attention Network for hate speech detection, combining contextual embeddings with multi-level attention.
- We evaluate the framework on an Iraqi Arabic social media dataset, demonstrating its enhanced performance and robustness compared to baseline models such as AraBERT-BiLSTM.
- We provide interpretability through attention visualization, enabling deeper insights into how the model identifies hate speech.
2. Related Works
3. Proposed Method
3.1. AraBERT
Algorithm 1: AraBERT fine-tuning with leetspeak-aware Arabic preprocessing

1: input: labeled corpus D = {(x_i, y_i)}_{i=1…N}, pretrained model name, max length L, batch size B, epochs T, learning rates {α_t}, class weights w_c (optional), LeetMap 𝓜 = {2→ء, 3→ع, 4→غ, 5→خ, 6→ط, 7→ح, 8→ق, 9→ص, 0→و, @→ا, $→س}
2: initialize: tokenizer τ ← AutoTokenizer(); encoder f_θ ← BERT() + linear head; bestF1 ← −∞; θ* ← θ
3: for i = 1 to N do
4:   x_i ← NormalizeArabic(x_i) ▷ إ/أ/آ→ا, ى→ي, ؤ→و, ئ→ي, ة→ه; remove tatweel/diacritics/extra spaces
5:   for (k, v) in 𝓜 do x_i ← replace(x_i, k→v) end for ▷ de-obfuscate leetspeak
6: end for
7: split D → (Train, Val, Test), stratified ▷ preserve label ratios
8: encode each split with τ into (input_ids, attention_mask); pad/truncate to L
9: for t = 1 to T do
10:   for each minibatch (I, A, y) in Train do
11:     z ← f_θ(I, A) ▷ forward pass using [CLS]
12:     ℓ_t ← CE(z, y; w_c) ▷ (weighted) cross-entropy
13:     θ ← AdamW(θ, ∇_θ ℓ_t, α_t) ▷ update parameters (scheduler/grad-clip if used)
14:   end for
15:   ŷ_val ← argmax softmax(f_θ(I_val, A_val)) ▷ validation prediction
16:   F1_val ← F1(y_val, ŷ_val) ▷ model selection by F1
17:   if F1_val > bestF1 then bestF1 ← F1_val; θ* ← θ end if
18: end for
19: p_test ← softmax(f_{θ*}(I_test, A_test))[:, 1]; ŷ_test ← 1{p_test ≥ 0.5}
20: output: trained parameters θ*, Test metrics (Accuracy, Precision, Recall, F1, ROC–AUC), confusion matrix
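The preprocessing in steps 3–6 of Algorithm 1 (NormalizeArabic followed by LeetMap substitution) can be sketched as below. This is an illustrative implementation, not the authors' code: the function names are ours, the substitution keys follow the paper's symbol-mapping table (with the prime-marked variants such as 3′ written using an ASCII apostrophe), and multi-character keys are deliberately applied before single-character ones so that, e.g., "3'" (غ) is not consumed by the plain "3" (ع) rule.

```python
import re

# Leetspeak → Arabic map, assumed from the paper's symbol table.
# Multi-character keys must be applied before single-character ones.
LEET_MAP = {
    "3'": "غ", "6'": "ظ", "7'": "خ", "9'": "ض",
    "sh": "ش", "kh": "خ", "ch": "چ", "zh": "ژ",
    "2": "ء", "3": "ع", "4": "ش", "5": "خ", "6": "ط",
    "7": "ح", "8": "ق", "9": "ص", "0": "و", "@": "ا", "$": "س",
}

# Arabic diacritics (harakat, U+064B–U+0652) and tatweel (U+0640).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")

def normalize_arabic(text: str) -> str:
    """Canonicalize common Arabic orthographic variants (NormalizeArabic)."""
    text = _DIACRITICS.sub("", text)
    text = re.sub("[إأآ]", "ا", text)          # alif variants → bare alif
    text = (text.replace("ى", "ي").replace("ؤ", "و")
                .replace("ئ", "ي").replace("ة", "ه"))
    return text

def deobfuscate(text: str) -> str:
    """Replace leetspeak digits/symbols with their Arabic counterparts."""
    for key in sorted(LEET_MAP, key=len, reverse=True):  # longest keys first
        text = text.replace(key, LEET_MAP[key])
    return text
```

Note that this pass only resolves the obfuscated symbols; any surrounding Latin transliteration (e.g., the "ib" in "3ib") is left for the tokenizer or a separate transliteration step.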
3.2. Hierarchical Attention Network (HAN)
Algorithm 2: Hierarchical Attention Network (HAN): structure and process

1: input: labeled corpus D = {(x_i, y_i)}_{i=1…N}, max sentences S, max words W, embedding dim E, word/sentence hidden sizes (h_w, h_s), batch size B, epochs T, learning rates {α_t}, min frequency m, class weights w_c (optional), NormalizeArabic(·) and LeetMap 𝓜
2: split D → (Train, Val, Test), stratified
3: // Vocabulary built on Train only
4: for x ∈ Train do
5:   x ← NormalizeArabic(x); for (k, v) ∈ 𝓜 do x ← replace(x, k→v) end for
6:   tokenize into sentences and words; count word frequencies
7: end for
8: build vocab V with special tokens {<pad>, <unk>}; keep tokens with freq ≥ m
9: // Encode all splits into fixed tensors
10: for each x in Train ∪ Val ∪ Test do
11:   x ← NormalizeArabic + LeetMap; sentence-split; word-tokenize
12:   map tokens → ids with V; pad/truncate to S × W → X ∈ ℕ^{S×W}
13:   build masks: M_w (per-sentence word mask), M_s (sentence existence mask)
14: end for
15: // Model components
16: parameters θ = {Emb ∈ ℝ^{|V|×E}, BiGRU_w: E→2h_w, word projection (W_w, b_w), context q_w ∈ ℝ^{2h_w}, BiGRU_s: 2h_w→2h_s, sentence projection (W_s, b_s), context q_s ∈ ℝ^{2h_s}, W_c ∈ ℝ^{2h_s×C}, b_c ∈ ℝ^C}, C = 2
17: bestF1 ← −∞; θ* ← θ
18: for t = 1 to T do
19:   for minibatches (X, M_w, M_s, y) from Train do
20:     E_x ← Emb(X) ▷ [B, S, W, E]
21:     H_w ← BiGRU_w(E_x reshaped to [B·S, W, E]) ▷ [B·S, W, 2h_w]
22:     u_w ← tanh(W_w H_w + b_w) ▷ word-level projection
23:     α_w ← softmax(u_w^⊤ q_w) with mask M_w ▷ word attention
24:     S_vec ← Σ_w α_w ⊙ H_w ▷ [B·S, 2h_w]
25:     S_vec ← reshape to [B, S, 2h_w]
26:     H_s ← BiGRU_s(S_vec) ▷ [B, S, 2h_s]
27:     u_s ← tanh(W_s H_s + b_s); α_s ← softmax(u_s^⊤ q_s) with mask M_s ▷ sentence attention
28:     v_doc ← Σ_s α_s ⊙ H_s ▷ [B, 2h_s]
29:     z ← v_doc W_c + b_c ▷ logits
30:     ℓ_t ← CE(z, y; w_c) ▷ (weighted) cross-entropy
31:     θ ← Adam(θ, ∇_θ ℓ_t, α_t) with grad-clip/scheduler (if used)
32:   end for
33:   ŷ_val ← argmax softmax(f_θ(X_val)); F1_val ← F1(y_val, ŷ_val)
34:   if F1_val > bestF1 then bestF1 ← F1_val; θ* ← θ end if
35: end for
36: p_test ← softmax(f_{θ*}(X_test))[:, 1]; ŷ_test ← 1{p_test ≥ 0.5}
37: output: θ*, Test metrics (Accuracy, Precision, Recall, F1, ROC–AUC), confusion matrix; attention weights (α_w, α_s) for interpretability
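The masking in the word-attention step of Algorithm 2 matters in practice: padded positions must receive zero attention weight, otherwise <pad> tokens contribute to the sentence vector. A minimal dependency-free sketch (function names ours; scores stand in for the u^⊤q dot products):

```python
import math

def masked_attention(scores, mask):
    """Softmax over attention scores; positions with mask == 0 get weight 0."""
    exps = [math.exp(s) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps) or 1.0  # guard against an all-masked sentence
    return [e / total for e in exps]

def attend(vectors, scores, mask):
    """Attention-weighted sum of hidden vectors (sentence vector in Algorithm 2)."""
    alphas = masked_attention(scores, mask)
    dim = len(vectors[0])
    return [sum(a * v[d] for a, v in zip(alphas, vectors)) for d in range(dim)]
```

A masked position keeps weight exactly zero regardless of its score, so padding never leaks into the pooled representation.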
3.3. Hybrid AraBERT and HAN
Algorithm 3: Hybrid AraBERT + HAN: joint training and late-fusion selection

1: input: raw text T = {w_1, w_2, …, w_m}, label set {0, 1}
2: parameters: AraBERT encoder f_θ; word attention {W_w, b_w, u_w}; sentence encoder g_φ (e.g., BiGRU_s) and sentence attention {W_s, b_s, u_s}; classifier (W, b)
3: // Deobfuscation and normalization
4: Ṫ ← NormalizeArabicAndLeet(T) ▷ digits/symbols → canonical forms
5: // Contextual embeddings from AraBERT
6: H = {h_1, …, h_m} ← f_θ(Ṫ), h_i ∈ ℝ^d ▷ (Equation (9))
7: // Word-level HAN attention (per sentence)
8: for each sentence s_ℓ with token indices 𝕀_ℓ do
9:   for i ∈ 𝕀_ℓ do
10:     u_i = tanh(W_w h_i + b_w) ▷ (Equation (10))
11:   end for
12:   α_i = exp(u_i^⊤ u_w) / Σ_{j ∈ 𝕀_ℓ} exp(u_j^⊤ u_w) ▷ (Equation (12))
13:   s_ℓ = Σ_{i ∈ 𝕀_ℓ} α_i h_i ▷ (Equation (13))
14: end for
15: // Sentence-level encoder and attention
16: {h_1^s, …, h_n^s} = g_φ([s_1, …, s_n]) ▷ encode the sentence sequence
17: for j = 1 to n do
18:   u_j = tanh(W_s h_j^s + b_s) ▷ (Equation (14))
19: end for
20: β_j = exp(u_j^⊤ u_s) / Σ_k exp(u_k^⊤ u_s) ▷ (Equation (15))
21: d = Σ_j β_j h_j^s ▷ (Equation (16))
22: // Classification
23: ŷ = softmax(W d + b) ▷ (Equation (17))
24: output: class probabilities ŷ and attention weights {α_i}, {β_j}
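The projection, attention pooling, and classification steps of Algorithm 3 (Equations (14)–(17)) can be written out concretely. The sketch below is illustrative only and uses plain Python lists in place of tensors; in the actual model the inputs would be AraBERT sentence vectors and the weights learned parameters:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_pool(H, W, b, u_ctx):
    """u_j = tanh(W h_j + b); β = softmax(u_j · u_ctx); return (Σ β_j h_j, β)."""
    def proj(h):
        return [math.tanh(sum(W[r][c] * h[c] for c in range(len(h))) + b[r])
                for r in range(len(W))]
    scores = [sum(ui * ci for ui, ci in zip(proj(h), u_ctx)) for h in H]
    betas = softmax(scores)
    dim = len(H[0])
    pooled = [sum(bj * h[d] for bj, h in zip(betas, H)) for d in range(dim)]
    return pooled, betas

def classify(d, W, b):
    """ŷ = softmax(W d + b): class probabilities (Equation (17))."""
    logits = [sum(W[c][k] * d[k] for k in range(len(d))) + b[c]
              for c in range(len(W))]
    return softmax(logits)
```

The returned attention weights are exactly the quantities visualized in the interpretability analysis: they show which sentences dominated the document vector d.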
4. Experimental Setup and Results
4.1. Dataset
4.1.1. Manual Annotation Procedure
4.1.2. Annotation Guideline Summary
4.1.3. Computation of Agreement Statistics
4.1.4. Preprocessing
4.2. AraBERT-HAN Setup
4.3. System Setup
4.4. Evaluation Metrics
4.5. Results
4.5.1. Overall Test-Set Performance and Baseline Comparison
4.5.2. Ablation Analysis
4.5.3. Robustness and Interpretability Analysis
4.5.4. Cross-Dataset Generalization
- Character Substitution, where individual Arabic characters were replaced by visually or phonetically similar digits or symbols (e.g., “ع” → “3”, “ح” → “7”, “س” → “$”).
- Digit/Symbol Obfuscation, introducing mixed numeric or symbolic encodings (e.g., “8alb” for “قلب”).
- Typographical Noise, consisting of random insertions, deletions, or keyboard-adjacent errors.
- Short-Length Posts, simulating incomplete or fragmented statements typical of quick replies or insults.
- Long-Length Posts, incorporating multiple clauses or nested discourse to assess contextual focus.
- Class Imbalance, created by skewing the distribution of hateful versus neutral samples.
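The character-substitution perturbation described in the first bullet can be generated deterministically for a robustness test set. This is a hypothetical generator, not the authors' script; the substitution pairs follow the examples given above, and the seed makes the perturbed corpus reproducible:

```python
import random

# Arabic letter → leetspeak substitute, from the examples in the text.
SUBSTITUTIONS = {"ع": "3", "ح": "7", "س": "$", "ط": "6", "ق": "8", "ص": "9"}

def perturb(text: str, rate: float = 0.5, seed: int = 0) -> str:
    """Randomly replace mapped Arabic letters with digits/symbols at the given rate."""
    rng = random.Random(seed)  # fixed seed → reproducible perturbations
    out = []
    for ch in text:
        if ch in SUBSTITUTIONS and rng.random() < rate:
            out.append(SUBSTITUTIONS[ch])
        else:
            out.append(ch)
    return "".join(out)
```

Varying `rate` gives a controllable obfuscation level, which is useful for plotting performance degradation as a function of perturbation strength.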
4.5.5. Comparative Analysis with Existing Studies
4.6. Results Discussion
5. Conclusions and Future Recommendations
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Alkharsan, A.; Ata, O. HawkFish Optimization Algorithm: A Gender-Bending Approach for Solving Complex Optimization Problems. Electronics 2025, 14, 611. [Google Scholar] [CrossRef]
- Ahmad, A.; Azzeh, M.; Alnagi, E.; Abu Al-Haija, Q.; Halabi, D.; Aref, A.; AbuHour, Y. Hate speech detection in the Arabic language: Corpus design, construction, and evaluation. Front. Artif. Intell. 2024, 7, 1345445. [Google Scholar] [CrossRef] [PubMed]
- Abdullah, N.A.; Abdulghani, F.A. A Survey on Arabic Text Classification Using Deep and Machine Learning Algorithms. Iraqi J. Sci. 2022, 63, 409–419. [Google Scholar] [CrossRef]
- Kanan, T.; Sadaqa, O.; Aldajeh, A.; Alshwabka, H.; AL-dolime, W.; AlZu’bi, S.; Hawashin, B.; Alia, M.A. A Review of Natural Language Processing and Machine Learning Tools Used to Analyze Arabic Social Media. In Proceedings of the 2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT), Amman, Jordan, 9–11 April 2019. [Google Scholar]
- Deeply-Ingrained Prejudice Fuels Hate Speech in Iraq. Institute of Development Studies (IDS), Opinion Article. Available online: https://www.ids.ac.uk/opinions/deeply-ingrained-prejudice-fuels-hate-speech-in-iraq/ (accessed on 4 October 2025).
- Itriq, M.; Mohd Noor, M.H. Arabic hate speech detection using deep learning: A state-of-the-art survey of advances, challenges, and future directions (2020–2024). PeerJ Comput. Sci. 2025, 11, e3133. [Google Scholar] [CrossRef]
- Shapiro, A.; Khalafallah, A.; Torki, M. AlexU-AIC at Arabic Hate Speech 2022: Contrast to Classify. arXiv 2022. [Google Scholar] [CrossRef]
- Omar, A.; Mahmoud, T.M.; Abd-El-Hafeez, T. Comparative Performance of Machine Learning and Deep Learning Algorithms for Arabic Hate Speech Detection in OSNs. In Advances in Intelligent Systems and Computing, Proceedings of the International Conference on Artificial Intelligence and Computer Vision (AICV2020), Cairo, Egypt, 8–10 April 2020; Hassanien, A.E., Azar, A., Gaber, T., Oliva, D., Tolba, F., Eds.; Springer: Cham, Switzerland, 2020; Volume 1153. [Google Scholar] [CrossRef]
- Muzakir, A.; Adi, K.; Kusumaningrum, R. Classification of Hate Speech Language Detection on Social Media: Preliminary Study for Improvement. In Lecture Notes on Data Engineering and Communications Technologies, Proceedings of the Emerging Trends in Intelligent Systems & Network Security. NISS 2022, Bandung, Indonesia, 30–31 March 2022; Ben Ahmed, M., Abdelhakim, B.A., Ane, B.K., Rosiyadi, D., Eds.; Springer: Cham, Switzerland, 2023; Volume 147. [Google Scholar] [CrossRef]
- Mousa, A.; Shahin, I.; Nassif, A.B.; Elnagar, A. Detection of Arabic offensive language in social media using machine learning models. Intell. Syst. Appl. 2024, 22, 200376. [Google Scholar] [CrossRef]
- Mazari, A.C.; Benterkia, A.; Takdenti, Z. Advancing offensive language detection in Arabic social media: A BERT-based ensemble learning approach. Soc. Netw. Anal. Min. 2024, 14, 186. [Google Scholar] [CrossRef]
- Khezzar, R.; Moursi, A.; Al Aghbari, Z. arHateDetector: Detection of hate speech from standard and dialectal Arabic Tweets. Discov. Internet Things 2023, 3, 1. [Google Scholar] [CrossRef]
- Anezi, F.Y.A. Arabic Hate Speech Detection Using Deep Recurrent Neural Networks. Appl. Sci. 2022, 12, 6010. [Google Scholar] [CrossRef]
- de Paula, A.F.M.; Bensalem, I.; Rosso, P.; Zaghouani, W. Transformers and Ensemble Methods: A Solution for Hate Speech Detection in Arabic Languages. arXiv 2023. [Google Scholar] [CrossRef]
- Alshahrani, E.S.; Aksoy, M.S. Adversarially Robust Multitask Learning for Offensive and Hate Speech Detection in Arabic Text Using Transformer-Based Models and RNN Architectures. Appl. Sci. 2025, 15, 9602. [Google Scholar] [CrossRef]
- Elzayady, H.; Mohamed, M.S.; Badran, K.M.; Salama, G.I. A hybrid approach based on personality traits for hate speech detection in Arabic social media. Int. J. Electr. Comput. Eng. 2023, 13, 1979. [Google Scholar]
- Mazari, A.C.; Boudoukhani, N.; Djeffal, A. BERT-Based Ensemble Learning for Multi-Aspect Hate Speech Detection. Cluster Comput. 2024, 27, 325–339. [Google Scholar] [CrossRef]
- Antoun, W.; Baly, F.; Hajj, H. AraBERT: Transformer-based Model for Arabic Language Understanding. arXiv 2020, arXiv:2003.00104. [Google Scholar]
- Li, L.; Wang, Q.; Zhao, B.; Li, X.; Zhou, A.; Wu, H. Pre-Training and Fine-Tuning with Next Sentence Prediction for Multimodal Entity Linking. Electronics 2022, 11, 2134. [Google Scholar] [CrossRef]
- Ratmele, A.; Thakur, R. OpExHAN: Opinion extraction using hierarchical attention network from unstructured reviews. Soc. Netw. Anal. Min. 2022, 12, 148. [Google Scholar] [CrossRef]
- Abdelali, A.; Mubarak, H.; Samih, Y.; Hassan, S.; Darwish, K. QADI: Arabic Dialect Identification in the Wild. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine (Virtual), 9 April 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 1–10. [Google Scholar]
- Founta, A.; Djouvas, C.; Chatzakou, D.; Leontiadis, I.; Blackburn, J.; Stringhini, G.; Vakali, A.; Sirivianos, M.; Kourtellis, N. Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior. In Proceedings of the International AAAI Conference on Web and Social Media, Palo Alto, CA, USA, 25–28 June 2018; Volume 12. [Google Scholar] [CrossRef]
- Abdellaoui, I.; Ibrahimi, A.; El Bouni, M.A.; Mourhir, A.; Driouech, S.; Aghzal, M. Investigating Offensive Language Detection in a Low-Resource Setting with a Robustness Perspective. Big Data Cogn. Comput. 2024, 8, 170. [Google Scholar] [CrossRef]
- Hermessi, H. Arabic-Levantine Hate Speech Detection Dataset. Kaggle Dataset. Available online: https://www.kaggle.com/datasets/haithemhermessi/arabic-levantine-hate-speech-detection (accessed on 12 October 2025).
- Moosa, W.H. Multilingual Hate Speech Dataset. Kaggle Dataset. Available online: https://www.kaggle.com/datasets/wajidhassanmoosa/multilingual-hatespeech-dataset (accessed on 12 October 2025).
- Aziz, K. Arabic Hate Speech Dataset. Kaggle Dataset. Available online: https://www.kaggle.com/datasets/khuzaimaaziz/arabic-hate-speech-dataset (accessed on 12 October 2025).
- Wilkinson, M.; Dumontier, M.; Aalbersberg, I.J. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 2016, 3, 160018. [Google Scholar] [CrossRef]
- Tundis, A.; Mukherjee, G.; Mühlhäuser, M. An Algorithm for the Detection of Hidden Propaganda in Mixed-Code Text over the Internet. Appl. Sci. 2021, 11, 2196. [Google Scholar] [CrossRef]
- Vélez de Mendizabal, I.; Vidriales, X.; Basto Fernandes, V.; Ezpeleta, E.; Méndez, J.R.; Zurutuza, U. Deobfuscating Leetspeak With Deep Learning to Improve Spam Filtering. Int. J. Interact. Multimed. Artif. Intell. 2023, 8, 46–55. [Google Scholar] [CrossRef]
- Alrehaili, M.; Alasmari, T.; Aoalshutayri, A. Arabic Speech Dialect Classification using Deep Learning. In Proceedings of the 2023 1st International Conference on Advanced Innovations in Smart Cities (ICAISC), Jeddah, Saudi Arabia, 23–25 January 2023; pp. 1–5. [Google Scholar] [CrossRef]
- Hibatullah, H.; Ballı, T.; Yetkin, E.F. Verbal harassment detection in online games using machine learning methods. Entertain. Comput. 2025, 55, 101009. [Google Scholar] [CrossRef]
- Daquigan, J.M.; Marbella, G.K.G.; Dioses, R.M.; Co, J.D.C.; Centeno, C.J.; Mata, K.E. Enhancement of Profanity Filtering and Hate Speech Detection Algorithm applied in Minecraft Chats. TTACA 2025, 4, 1–7. [Google Scholar] [CrossRef]
- Alrasheed, S.; Aladhadh, S.; Alabdulatif, A. Protecting Intellectual Security Through Hate Speech Detection Using an Artificial Intelligence Approach. Algorithms 2025, 18, 179. [Google Scholar] [CrossRef]

| Ref. | Main Approach | Dataset | Key Findings |
|---|---|---|---|
| [6] | Survey of deep learning (CNN, RNN, Transformers) | Multiple datasets (2020–2024) | Identified AraBERT and MARBERT as dominant; challenges include dialects and data scarcity. |
| [7] | Contrastive + Multi-task learning with Transformers | Arabic Twitter (OSACT5) | Achieved F1-scores of 0.841, 0.817, 0.476 across subtasks A–C; showed contrastive learning reduces overfitting. |
| [8] | Comparative ML + RNN/Deep Learning | Multi-platform (Facebook, Twitter, Instagram, YouTube) | RNN achieved 98.7% accuracy, outperforming other models. |
| [9] | SVM, XGBoost with SMOTE | Indonesian Twitter | SVM with SMOTE reached 90.7% accuracy; dataset balancing improved results. |
| [10] | Cascaded BERT → BiLSTM → RBF | Balanced Arabic Twitter dataset | F1-score ~98%; cascaded model outperformed traditional classifiers. |
| [11] | BERT Ensemble + BiLSTM | OffensEval2020 Arabic dataset | Ensemble achieved F1-score 94.56%, surpassing OffensEval winner. |
| [12] | arHateDetector (AraBERT, CNN, SVC) | arHateDataset (Standard + Dialects) | AraBERT reached 93% accuracy, outperforming CNN and SVC. |
| [13] | Deep Recurrent Neural Networks (DRNN-1, DRNN-2) | 4203 Arabic comments (7 categories) | DRNN achieved 99.73% binary accuracy, 84.14% on 7-class task. |
| [14] | Transformer ensembles (majority vote) | CERIST NLP Challenge dataset | Ensemble achieved F1-score 0.60, accuracy 0.86. |
| [15] | Adversarial Multitask (MARBERTv2 + BiGRU) | OSACT2020 + augmented posts | Adversarial training boosted robustness; macro-F1 improved to 81%. |
| [16] | Hybrid model with personality traits + text features | Arabic social media posts | Improved performance by integrating personality cues with text. |
| Symbol/Number | Normalized Arabic Character | Example Transformation |
|---|---|---|
| 2 | ء | “2mr” → “أمر” |
| 3 | ع | “3ib” → “عيب” |
| 3′ | غ | “3′rb” → “غرب” |
| 5 | خ | “5air” → “خير” |
| 6 | ط | “6aleb” → “طالب” |
| 6′ | ظ | “6′lem” → “ظلم” |
| 7 | ح | “7ob” → “حب” |
| 7′ | خ | “7′rf” → “خرف” |
| 8 | ق | “8alb” → “قلب” |
| 9 | ص | “9abr” → “صبر” |
| 9′ | ض | “9′afa” → “ضعف” |
| 4 | ش | “4ams” → “شمس” |
| sh | ش | “shabab” → “شباب” |
| kh | خ | “khalas” → “خلاص” |
| ch | چ | “chalb” → “چلب” |
| zh | ژ | “zhur” → “ژهور” |
| $ | س | “$alam” → “سلام” |
| @ | ا | “@hmed” → “احمد” |
| 0 | و | “m0t” → “موت” |
| Parameter | Value |
|---|---|
| Maximum sequence length | 160 |
| Hidden size | 768 |
| Number of layers | 12 |
| Attention heads | 12 |
| Batch size | 16 |
| Learning rate | 2 × 10⁻⁵ |
| Optimizer | AdamW |
| Dropout rate | 0.1 |
| Epochs | 5 |
| Parameter | Value |
|---|---|
| Word encoder hidden size | 256 |
| Sentence encoder hidden size | 256 |
| Attention dimension | 100 |
| Dropout rate | 0.2 |
| Batch size | 16 |
| Optimizer | Adam |
| Activation function | Tanh |
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| AraBERT | 0.931 ± 0.004 | 0.922 ± 0.006 | 0.915 ± 0.005 | 0.913 ± 0.005 |
| HAN | 0.908 ± 0.006 | 0.897 ± 0.007 | 0.888 ± 0.008 | 0.887 ± 0.006 |
| Hybrid AraBERT–HAN (Proposed) | 0.970 ± 0.003 | 0.961 ± 0.004 | 0.959 ± 0.003 | 0.958 ± 0.004 |
| Model | ROC–AUC |
|---|---|
| AraBERT | 0.95 |
| HAN | 0.93 |
| Hybrid AraBERT–HAN | 0.98 |
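The accuracy, precision, recall, and F1 scores reported above follow the standard confusion-matrix definitions. As a minimal self-contained reference (function name ours, not part of the paper's pipeline):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```

ROC–AUC additionally requires the predicted probabilities (the `p_test` scores in Algorithms 1 and 2) rather than hard labels, since it integrates over all decision thresholds.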
| Variant | Accuracy | Precision | Recall | F1-Score | ROC–AUC |
|---|---|---|---|---|---|
| Full Hybrid (Norm + AraBERT + HAN) | 0.97 | 0.96 | 0.96 | 0.96 | 0.98 |
| Normalization removed | 0.93 | 0.92 | 0.9 | 0.91 | 0.95 |
| Alternative Leet-Map (weaker) | 0.95 | 0.95 | 0.94 | 0.94 | 0.97 |
| Word Attention disabled | 0.95 | 0.94 | 0.92 | 0.93 | 0.96 |
| Sentence Attention disabled | 0.95 | 0.95 | 0.93 | 0.94 | 0.965 |
| Perturbation Type | Accuracy | F1-Score | ROC–AUC |
|---|---|---|---|
| Clean Text | 0.97 | 0.96 | 0.98 |
| Character Substitution | 0.94 | 0.93 | 0.96 |
| Digit/Symbol Obfuscation | 0.93 | 0.92 | 0.95 |
| Typographical Noise | 0.92 | 0.91 | 0.94 |
| Short-Length Posts | 0.96 | 0.95 | 0.97 |
| Long-Length Posts | 0.95 | 0.94 | 0.96 |
| Class Imbalance | 0.94 | 0.93 | 0.95 |
| Dataset | Region/Domain | Size (Samples) | Labels | Dialect/Language | Source Platform |
|---|---|---|---|---|---|
| Arabic-Levantine Hate Speech Detection [25] | Levant (Lebanon, Syria, Jordan, Palestine) | ≈16,000 | Hate/Non-Hate | Levantine Arabic | Twitter, Facebook |
| Multilingual Hate Speech Dataset [26] | Global (10 languages including Arabic) | ≈45,000 | Hate/Non-Hate | Multilingual (Arabic subset) | Twitter, YouTube |
| Arabic Hate Speech Dataset [27] | Pan-Arab | ≈27,000 | Hate/Non-Hate | Modern Standard and regional Arabic | — |
| Dataset | Accuracy | Precision | Recall | F1-Score | MSE |
|---|---|---|---|---|---|
| Arabic-Levantine Hate Speech Detection | 0.935 | 0.928 | 0.94 | 0.934 | 0.072 |
| Multilingual Hate Speech Dataset (Arabic subset) | 0.914 | 0.906 | 0.921 | 0.913 | 0.085 |
| Arabic Hate Speech Dataset | 0.947 | 0.942 | 0.95 | 0.946 | 0.061 |
| Dialect | Coverage (%) | Normalization Accuracy (%) | Dominant Ambiguities |
|---|---|---|---|
| Iraqi Arabic | 94.2 | 97.8 | 7→ج vs. چ |
| Gulf Arabic | 91.8 | 96.9 | g→ج vs. ق |
| Levantine Arabic | 92.6 | 96.3 | 9→ص vs. ض |
| Egyptian Arabic | 90.5 | 95.1 | 6→ق vs. ج vs. ك |
| Architecture | Accuracy | F1-Score | Complexity (Params) | Interpretability |
|---|---|---|---|---|
| AraBERT + HAN (Proposed) | 0.97 | 0.96 | High (~135 M) | High |
| AraBERT + Self-Attention | 0.95 | 0.94 | Moderate (~120 M) | Medium |
| AraBERT + BiLSTM + Max-Pooling | 0.94 | 0.93 | Moderate (~110 M) | Low |
| AraBERT + Simple Attention Layer | 0.945 | 0.935 | Low (~108 M) | Medium |
| AraBERT Only (No Attention Layer) | 0.93 | 0.91 | Low (~100 M) | Low |
| Reference | Model/Technique | Domain | Obfuscation Handling | Accuracy | F1-Score |
|---|---|---|---|---|---|
| Anezi (2022) [13] | DRNN | Arabic Hate Speech | ✗ | 0.99 | 0.84 |
| Khezzar et al. (2023) [12] | arHateDetector (AraBERT) | Arabic Tweets | ✗ | 0.93 | 0.91 |
| Tundis et al. (2021) [28] | Deep CNN-RNN | Hidden Propaganda | Partial (Mixed-code) | 0.92 | 0.89 |
| Vélez de Mendizabal et al. (2023) [29] | DL-based Deobfuscator | English Spam Filtering | ✓ | 0.94 | 0.9 |
| Alrehaili et al. (2023) [30] | CNN + RNN | Arabic Dialect Classification | ✗ | 0.95 | 0.93 |
| Hibatullah et al. (2025) [31] | Ensemble ML | Game Chat Harassment | ✗ | 0.93 | 0.91 |
| Daquigan et al. (2025) [32] | Enhanced Hate Filter | Game Chats (Minecraft) | Partial | 0.94 | 0.92 |
| Proposed (Hybrid AraBERT–HAN) | AraBERT + HAN + Leet-Normalization | Iraqi Arabic Text | ✓✓ | 0.97 | 0.96 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Citation: Marzoog, D.; Çakir, H. Deobfuscating Iraqi Arabic Leetspeak for Hate Speech Detection Using AraBERT and Hierarchical Attention Network (HAN). Electronics 2025, 14, 4318. https://doi.org/10.3390/electronics14214318