GTHL-Emo: Adaptive Imbalance-Aware and Correlation-Aligned Training for Arabic Multi-Label Emotion Detection
Abstract
1. Introduction
- We present a unified architecture that jointly leverages transformer encoders, graph neural networks, and correlation-aware label embeddings for robust Arabic multi-label emotion detection.
- We propose an adaptive hybrid loss weighting mechanism driven by batch-level statistics that dynamically mitigates class imbalance while preserving the label correlation structure.
- We introduce a transformer-based label embedding module with KL-divergence alignment that captures complex label dependencies and smooths predictions for rare emotions through correlated frequent labels.
- We conduct comprehensive evaluation across three Arabic benchmarks (SemEval-2018-Ec-Ar, ExaAEC, and SemEval-2025-Arq) with detailed ablation studies demonstrating the necessity of each architectural component.
2. Related Work
3. Proposed Methodology
3.1. Problem Formulation
3.2. Transformer-Based Text Representation
3.3. Dynamic Graph Construction and GraphSAGE Aggregation
3.3.1. Mixed Similarity Metric via Adaptive Lambda Network
3.3.2. Adjacency Matrix Construction
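As a rough illustration of how a mixed-similarity adjacency matrix could be built, the sketch below blends cosine similarity between text embeddings with Jaccard similarity between label sets and keeps the top-k neighbors per node. The blending form, the fixed scalar `lam` (standing in for the output of the adaptive lambda network), and the function names are assumptions made for illustration; only the neighbor count of 5 comes from the hyperparameter table.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def label_jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard similarity between two binary label vectors."""
    inter = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    union = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return inter / union if union else 0.0


def build_adjacency(
    features: List[Sequence[float]],
    labels: List[Sequence[int]],
    lam: float = 0.5,  # assumption: fixed scalar instead of a learned lambda
    k: int = 5,        # neighbor count from the hyperparameter table
) -> List[List[float]]:
    """Directed top-k adjacency with mixed similarity as edge weight."""
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = [
            (lam * cosine(features[i], features[j])
             + (1.0 - lam) * label_jaccard(labels[i], labels[j]), j)
            for j in range(n) if j != i
        ]
        sims.sort(reverse=True)  # highest similarity first
        for weight, j in sims[:k]:
            adj[i][j] = weight
    return adj
```

The result is not symmetric, since node j can be among the top-k neighbors of node i without the reverse holding. Label-based similarity is only available for training examples, so a sketch like this applies to the training graph.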
3.3.3. GraphSAGE Layer with Rich Aggregation
3.4. Label Dependency Modeling with Transformer Encoders
3.5. Feature Fusion and Classification
3.6. Adaptive Hybrid Loss Function
3.6.1. Adaptive Weight Computation
3.6.2. Loss Components
Binary Cross-Entropy Loss
Focal Loss
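For orientation, a minimal sketch of the standard binary focal loss of Lin et al. [22], applied independently to each label, is given below. The fixed `gamma` and `alpha` defaults are the common values from that paper and are assumptions here; the hyperparameter table lists the focal parameters of GTHL-Emo as adaptive, which this sketch does not reproduce.

```python
import math
from typing import Sequence


def binary_focal_loss(
    probs: Sequence[float],    # predicted probabilities, one per label
    targets: Sequence[int],    # gold labels in {0, 1}
    gamma: float = 2.0,        # assumption: fixed focusing parameter
    alpha: float = 0.25,       # assumption: fixed positive-class weight
    eps: float = 1e-7,
) -> float:
    """Mean focal loss over labels: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)            # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p             # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)
```

With `gamma=0` the modulating factor disappears and the loss reduces to a class-weighted binary cross-entropy; larger `gamma` shrinks the contribution of labels that are already predicted well.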
Ranking Loss
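One common way to realize a multi-label ranking loss is a pairwise hinge over every (positive, negative) label pair, sketched below. The pairwise form and the function name are assumptions for illustration; the margin of 0.5 is the value listed in the hyperparameter table.

```python
from typing import Sequence


def pairwise_ranking_loss(
    scores: Sequence[float],   # one score per label for a single example
    targets: Sequence[int],    # gold labels in {0, 1}
    margin: float = 0.5,       # margin from the hyperparameter table
) -> float:
    """Mean hinge penalty over all (positive, negative) label pairs."""
    pos = [s for s, y in zip(scores, targets) if y == 1]
    neg = [s for s, y in zip(scores, targets) if y == 0]
    if not pos or not neg:
        return 0.0  # no pair to rank
    total = sum(max(0.0, margin - (sp - sn)) for sp in pos for sn in neg)
    return total / (len(pos) * len(neg))
```

A pair contributes zero once the positive label outscores the negative one by at least the margin, so the loss only pushes on pairs that are misordered or too close.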
Graph KL Loss
3.6.3. Exponential Moving Average Stabilization
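The smoothing step can be sketched as a plain exponential moving average over the loss weights. The convention that alpha multiplies the previous value, as well as the dictionary layout and names, are assumptions; alpha = 0.95 is the value listed in the hyperparameter table.

```python
from typing import Dict


def ema_update(
    previous: Dict[str, float],   # smoothed weights from the last step
    current: Dict[str, float],    # raw weights computed on the current batch
    alpha: float = 0.95,          # EMA alpha from the hyperparameter table
) -> Dict[str, float]:
    """Return alpha * previous + (1 - alpha) * current for every loss weight."""
    return {
        name: alpha * previous[name] + (1.0 - alpha) * current[name]
        for name in previous
    }
```

With alpha close to 1, a single noisy batch moves each weight by only a small fraction of the gap between the old and new values, which is the stabilizing effect the section title refers to.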
3.7. Inference and Thresholding
Algorithm 1: Training algorithm for GTHL-Emo with adaptive hybrid loss.
Input: Training set, validation set, maximum epochs E, patience P, batch size B, learning rate.
Output: Trained parameters.
Initialize model parameters and optimizer; initialize best validation Jaccard and patience counter.
4. Experiment and Result Analysis
4.1. Implementation Details
4.1.1. Data Preparation
4.1.2. Model Configuration and Training
4.2. Hyperparameter Settings
- Learning Rate: The learning rate was tuned per dataset. SemEval-2018-Ec-Ar and SemEval-2025-Arq share one value, while a slightly higher value worked better for the larger ExaAEC dataset.
- Batch Size: A batch size of 2 was chosen for SemEval-2018-Ec-Ar and SemEval-2025-Arq because these datasets are relatively small, which allows more frequent gradient updates. For the larger ExaAEC dataset, a batch size of 8 gave a better balance between memory efficiency and training stability.
- Dropout Rate: This regularization parameter was set to 0.1 for SemEval-2018-Ec-Ar and 0.3 for ExaAEC and SemEval-2025-Arq, reflecting different levels of model complexity and data volume.
- Epochs and Early Stopping: Training ran for a maximum of 20 epochs on all datasets. To conserve computational resources and prevent overfitting, early stopping with a patience of 7 was applied based on the validation Jaccard score [47,48]. Training typically stopped at around 13 epochs on SemEval-2018-Ec-Ar, around 18 epochs on ExaAEC, and around 15 epochs on SemEval-2025-Arq.
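
The early-stopping rule described in the list above can be sketched as follows. The `run_epoch` callback is hypothetical: it is assumed to train for one epoch and return the validation Jaccard score. The maximum of 20 epochs and the patience of 7 are the values stated in the text.

```python
from typing import Callable, Tuple


def train_with_early_stopping(
    run_epoch: Callable[[int], float],  # hypothetical: trains one epoch, returns validation Jaccard
    max_epochs: int = 20,
    patience: int = 7,
) -> Tuple[int, float]:
    """Stop once validation Jaccard has not improved for `patience` epochs.

    Returns (best_epoch, best_score), with epochs counted from 1.
    """
    best_score = float("-inf")
    best_epoch = 0
    stale_epochs = 0
    for epoch in range(1, max_epochs + 1):
        score = run_epoch(epoch)
        if score > best_score:
            # New best: a real training loop would checkpoint the model here.
            best_score, best_epoch, stale_epochs = score, epoch, 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break
    return best_epoch, best_score
```

Under this rule a run that ends at epoch N before the 20-epoch limit had its best validation score at epoch N - 7.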
4.2.1. Evaluation Metrics
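The four metrics reported in the result tables (Jaccard accuracy, micro-F1, macro-F1, and Hamming loss) follow standard multi-label definitions, which the sketch below computes from binary label matrices. Treating two empty label sets as a perfect Jaccard match is an assumption, and the function name is illustrative; scores are returned as fractions rather than percentages.

```python
from typing import Dict, List


def multilabel_metrics(y_true: List[List[int]], y_pred: List[List[int]]) -> Dict[str, float]:
    """Sample-averaged Jaccard, micro-F1, macro-F1 and Hamming loss."""
    n_samples, n_labels = len(y_true), len(y_true[0])
    tp, fp, fn = [0] * n_labels, [0] * n_labels, [0] * n_labels
    jaccard_sum, mismatches = 0.0, 0
    for truth, pred in zip(y_true, y_pred):
        inter = union = 0
        for j, (t, p) in enumerate(zip(truth, pred)):
            inter += int(t == 1 and p == 1)
            union += int(t == 1 or p == 1)
            mismatches += int(t != p)
            tp[j] += int(t == 1 and p == 1)
            fp[j] += int(t == 0 and p == 1)
            fn[j] += int(t == 1 and p == 0)
        # Assumption: two empty label sets count as a perfect match.
        jaccard_sum += inter / union if union else 1.0

    def f1(a: int, b: int, c: int) -> float:
        denom = 2 * a + b + c
        return 2 * a / denom if denom else 0.0

    return {
        "jaccard": jaccard_sum / n_samples,
        "micro_f1": f1(sum(tp), sum(fp), sum(fn)),  # pooled counts
        "macro_f1": sum(f1(a, b, c) for a, b, c in zip(tp, fp, fn)) / n_labels,
        "hamming_loss": mismatches / (n_samples * n_labels),
    }
```

Macro-F1 averages per-label F1 with equal weight, so rare emotions influence it as much as frequent ones, whereas micro-F1 is dominated by frequent labels.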
4.2.2. Hardware and Software
4.3. Baselines
4.3.1. SemEval-2018-Ec-Ar
- Random [29]: A naïve baseline that assigns emotion labels to tweets by random sampling, serving as a lower-bound reference for multi-label performance.
- MEDIAN Team [29]: Uses a median-value heuristic that predicts, for each emotion, the median label frequency observed in the training data. It serves as an official shared-task baseline.
- SVM-Unigrams [29]: Trains a one-versus-rest Support Vector Machine (SVM) on Term Frequency-Inverse Document Frequency (TF-IDF) unigram features (no additional embeddings or lexicons), establishing a strong lexical baseline for multi-label emotion classification.
- UNCC Team [32]: Employs a fully connected neural network that fuses pretrained Word2Vec and Doc2Vec embeddings with psycholinguistic features (e.g., affective lexicons), using the same architecture across all subtasks for English and Arabic.
- Tw-StAR Team [31]: Implements a binary relevance scheme with TF-IDF-based features for each emotion label and standard machine learning classifiers (e.g., SVM), placing an emphasis on careful preprocessing for Arabic, English, and Spanish tweets.
- PARTNA Team [29]: Combines lexical resources (emotion lexicons and character-level features) with distributional (embedding-based) features, feeding them into logistic regression-based classifiers in a multi-label set-up.
- EMA Team [30]: Applies extensive Arabic-specific preprocessing (diacritics removal, elongation normalization, emoji-to-word replacement, and light stemming), then extracts word-embedding and handcrafted features, which are fed into various classification and regression models for each subtask.
- Khalil et al. [34]: Uses AraVec (Word2Vec embeddings trained on Arabic corpora) as input to a BiLSTM network with attention to capture contextual and sequential patterns, achieving robust gains on SemEval-2018-Ec-Ar.
- Alswaidan and Menai [35]: Introduces a low-resource Arabic emotion recognition framework that stacks diverse classifiers (e.g., SVM or Random Forest (RF)) with self-training on unlabeled tweets, leveraging hybrid lexical and distributional features to augment scarce annotated data.
- Samy et al. [33]: Proposes GRU and C-GRU models to capture contextual information from Arabic tweets, which is then used as an extra layer to identify the emotional states expressed in the input tweet.
- Elfaik et al. [5]: Combines AraBERT contextual embeddings with an attention-based LSTM-BiLSTM model to classify emotions in Arabic tweets.
- Mansy et al. [51]: Proposes an ensemble of deep learning architectures, combining MARBERT, BiLSTM, and Bi-GRU for emotion detection in Arabic tweets, using a weighted sum equation to aggregate predictions from the three models.
- Elfaik et al. [36]: Employs feature-level fusion of contextual embeddings (e.g., AraBERT or ArabicBERT) and attentional CNNs to jointly capture global and local cues in Arabic tweets, demonstrating improved generalization on SemEval-2018-Ec-Ar.
4.3.2. ExaAEC Dataset
- asafaya-BERT [52]: A general-purpose Arabic transformer trained on OSCAR and other web-scale corpora.
- Qarib-BERT [53]: A BERT variant adapted for Arabic news and Wikipedia content.
- MARBERT [54]: A Twitter-centric transformer model designed for Arabic dialects and informal language.
- AraBERTv0.2 [13]: A robust Arabic language model trained on a mix of formal and social media text.
4.3.3. SemEval-2025 Track A Arabic (Arq) Dataset
- HTU Team [37]: Proposes a Label-Fused Iterative Mask Filling (L-IMF) technique and implements six DziriBERT-based classifiers for multi-label emotion classification in the Arabic dialect (Arq). The approach addresses class imbalance and label dependencies.
- INFOTEC-NLP Team [55]: Combines XLM-RoBERTa embeddings with a Bi-LSTM and multi-head attention for multi-label emotion classification in the same dialect (Arq). The approach captures sequential dependencies and addresses class imbalance.
- LATE-GIL-NLP Team [38]: Fine-tunes mDeBERTa-v3-base with optimized loss combinations on the same dataset (Arq), focusing on handling imbalanced data without augmentation.
- YNU-HPCC Team [56]: Translates the dataset to English and uses DeBERTa with modified prediction heads on the same dataset. The approach addresses class imbalance and bias toward dominant emotions.
4.4. Main Results
4.4.1. Baseline Comparison
SemEval-2018-Ec-Ar
ExaAEC
SemEval-2025 Track A Arabic (Arq)
4.4.2. Component-Wise Ablation Study
4.4.3. Sensitivity and Stability Analysis of Adaptive Weights
Graph Construction Sensitivity
Loss Component Configuration Analysis
Coefficient Sensitivity and Stability
Adaptive Weight Dynamics
4.4.4. Findings
Effect on Different Graph Neural Networks
Effect on GraphSAGE Depth
Label Correlations
4.5. Complexity vs. Performance
4.6. Model Analysis and Interpretability
4.6.1. Interpretability
4.6.2. Nearest-Neighbor Explanations
4.6.3. Error Analysis and Failure Cases
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
[Table: Arabic Query and Nearest Neighbor Texts for examples E1–E4]
[Table: Arabic Text (Original) for examples E5–E9]
References
1. Hossain, M.M.; Hossain, M.S.; Mridha, M.; Safran, M.; Alfarhood, S. Multi task opinion enhanced hybrid BERT model for mental health analysis. Sci. Rep. 2025, 15, 3332.
2. Hossain, M.M.; Hossain, M.S.; Chaki, S.; Rahman, M.S.; Ali, A.S. Revolutionizing Mental Health Sentiment Analysis with BERT-Fuse: A Hybrid Deep Learning Model. IEEE Access 2025, 13, 85428–85446.
3. Mohammad, S.M.; Turney, P.D. Crowdsourcing a word–emotion association lexicon. Comput. Intell. 2013, 29, 436–465.
4. Abdul-Mageed, M.; Diab, M.; Kübler, S. SAMAR: Subjectivity and sentiment analysis for Arabic social media. Comput. Speech Lang. 2014, 28, 20–37.
5. Elfaik, H.; Nfaoui, E.H. Combining context-aware embeddings and an attentional deep learning model for Arabic affect analysis on Twitter. IEEE Access 2021, 9, 111214–111230.
6. Alhuzali, H.; Ananiadou, S. SpanEmo: Casting multi-label emotion classification as span-prediction. arXiv 2021, arXiv:2101.10038.
7. Baziotis, C.; Athanasiou, N.; Chronopoulou, A.; Kolovou, A.; Paraskevopoulos, G.; Ellinas, N.; Narayanan, S.; Potamianos, A. NTUA-SLP at SemEval-2018 Task 1: Predicting affective content in tweets with deep attentive RNNs and transfer learning. arXiv 2018, arXiv:1804.06658.
8. Chochlakis, G.; Mahajan, G.; Baruah, S.; Burghardt, K.; Lerman, K.; Narayanan, S. Leveraging label correlations in a multi-label setting: A case study in emotion. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); IEEE: New York, NY, USA, 2023; pp. 1–5.
9. Trohidis, K.; Tsoumakas, G.; Kalliris, G.; Vlahavas, I. Multi-label classification of music by emotion. EURASIP J. Audio Speech Music. Process. 2011, 2011, 4.
10. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers); Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 4171–4186.
11. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A robustly optimized BERT pretraining approach. arXiv 2019, arXiv:1907.11692.
12. Conneau, A.; Khandelwal, K.; Goyal, N.; Chaudhary, V.; Wenzek, G.; Guzmán, F.; Grave, E.; Ott, M.; Zettlemoyer, L.; Stoyanov, V. Unsupervised cross-lingual representation learning at scale. arXiv 2019, arXiv:1911.02116.
13. Antoun, W.; Baly, F.; Hajj, H. AraBERT: Transformer-based model for Arabic language understanding. arXiv 2020, arXiv:2003.00104.
14. Abdul-Mageed, M.; Elmadany, A.; Nagoudi, E.M.B. ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic. In 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers); Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 7088–7105.
15. Demszky, D.; Movshovitz-Attias, D.; Ko, J.; Cowen, A.; Nemade, G.; Ravi, S. GoEmotions: A dataset of fine-grained emotions. arXiv 2020, arXiv:2005.00547.
16. Thiab, A.; Alawneh, L.; Mohammad, A.S. Contextual emotion detection using ensemble deep learning. Comput. Speech Lang. 2024, 86, 101604.
17. Read, J.; Pfahringer, B.; Holmes, G.; Frank, E. Classifier chains for multi-label classification. Mach. Learn. 2011, 85, 333–359.
18. Shi, M.; Tang, Y.; Zhu, X.; Liu, J. Multi-label graph convolutional network representation learning. IEEE Trans. Big Data 2020, 8, 1169–1181.
19. Zeng, D.; Zha, E.; Kuang, J.; Shen, Y. Multi-label text classification based on semantic-sensitive graph convolutional network. Knowl.-Based Syst. 2024, 284, 111303.
20. He, Y.; Zhang, Y.; Yang, F.; Yan, D.; Sheng, V.S. Label-dependent graph neural network. IEEE Trans. Comput. Soc. Syst. 2023, 11, 2990–3003.
21. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. AI Open 2020, 1, 57–81.
22. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In IEEE International Conference on Computer Vision; IEEE: New York, NY, USA, 2017; pp. 2980–2988.
23. Dery, L. Multi-label ranking: Mining multi-label and label ranking data. In Machine Learning for Data Science Handbook: Data Mining and Knowledge Discovery Handbook; Springer: Cham, Switzerland, 2023; pp. 511–535.
24. Alturayeif, N.; Luqman, H. Fine-grained sentiment analysis of Arabic COVID-19 tweets using BERT-based transformers and dynamically weighted loss function. Appl. Sci. 2021, 11, 10694.
25. Chen, Z.M.; Wei, X.S.; Wang, P.; Guo, Y. Multi-label image recognition with graph convolutional networks. In IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2019; pp. 5177–5186.
26. Vu, H.T.; Nguyen, M.T.; Nguyen, V.C.; Pham, M.H.; Nguyen, V.Q.; Nguyen, V.H. Label-representative graph convolutional network for multi-label text classification. Appl. Intell. 2023, 53, 14759–14774.
27. Huang, X.; Chen, B.; Xiao, L.; Yu, J.; Jing, L. Label-aware document representation via hybrid attention for extreme multi-label text classification. Neural Process. Lett. 2022, 54, 3601–3617.
28. Badaro, G.; Jundi, H.; Hajj, H.; El-Hajj, W.; Habash, N. ArSEL: A large scale Arabic sentiment and emotion lexicon. In Third Arabic Natural Language Processing Workshop (WANLP 2018); Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 26–35.
29. Mohammad, S.; Bravo-Marquez, F.; Salameh, M.; Kiritchenko, S. SemEval-2018 Task 1: Affect in tweets. In 12th International Workshop on Semantic Evaluation (SemEval-2018); Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 1–17.
30. Badaro, G.; El Jundi, O.; Khaddaj, A.; Maarouf, A.; Kain, R.; Hajj, H.; El-Hajj, W. EMA at SemEval-2018 Task 1: Emotion Mining for Arabic. In 12th International Workshop on Semantic Evaluation; Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 236–244.
31. Mulki, H.; Ali, C.B.; Haddad, H.; Babaoğlu, I. Tw-StAR at SemEval-2018 Task 1: Preprocessing Impact on Multi-label Emotion Classification. In 12th International Workshop on Semantic Evaluation; Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 167–171.
32. Abdullah, M.; Shaikh, S. TeamUNCC at SemEval-2018 Task 1: Emotion Detection in English and Arabic Tweets using Deep Learning. In 12th International Workshop on Semantic Evaluation; Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 350–357.
33. Samy, A.E.; El-Beltagy, S.R.; Hassanien, E. A context integrated model for multi-label emotion detection. Procedia Comput. Sci. 2018, 142, 61–71.
34. Khalil, E.A.H.; Houby, E.M.E.; Mohamed, H.K. Deep learning for emotion analysis in Arabic tweets. J. Big Data 2021, 8, 136.
35. Alswaidan, N.; Menai, M.E.B. Hybrid feature model for emotion recognition in Arabic text. IEEE Access 2020, 8, 37843–37854.
36. Elfaik, H.; Nfaoui, E.H. Leveraging feature-level fusion representations and attentional bidirectional RNN-CNN deep models for Arabic affect analysis on Twitter. J. King Saud-Univ.-Comput. Inf. Sci. 2023, 35, 462–482.
37. Saleh, A.; Biltawi, M. HTU at SemEval-2025 Task 11: Divide and Conquer-Multi-Label emotion classification using 6 DziriBERTs submodels with Label-fused Iterative Mask Filling technique for low-resource data augmentation. In 19th International Workshop on Semantic Evaluation (SemEval-2025); Association for Computational Linguistics: Stroudsburg, PA, USA, 2025; pp. 675–683.
38. Vázquez-Osorio, J.; Gómez-Adorno, H.; Sierra, G.; Sierra-Casiano, V.; Canchola-Hernández, D.; Tovar-Cortés, J.; Solís-Vilchis, R.; Salazar, G. LATE-GIL-NLP at SemEval-2025 Task 11: Multi-Language Emotion Detection and Intensity Classification Using Transformer Models with Optimized Loss Functions for Imbalanced Data. In 19th International Workshop on Semantic Evaluation (SemEval-2025); Association for Computational Linguistics: Stroudsburg, PA, USA, 2025; pp. 666–674.
39. Sarbazi-Azad, S.; Akbari, A.; Khazeni, M. ExaAEC: A New Multi-label Emotion Classification Corpus in Arabic Tweets. In 2021 11th International Conference on Computer Engineering and Knowledge (ICCKE); IEEE: New York, NY, USA, 2021; pp. 465–470.
40. Incremona, A.; Pozzi, A.; Guiscardi, A.; Tessera, D. A differentiable and uncertainty-aware mutual information regularizer for bias mitigation. Neurocomputing 2025, 669, 132498.
41. Hang, C.N.; Yu, P.D.; Chen, S.; Tan, C.W.; Chen, G. MEGA: Machine learning-enhanced graph analytics for infodemic risk management. IEEE J. Biomed. Health Inform. 2023, 27, 6100–6111.
42. Sehanobish, A.; Ravindra, N.; van Dijk, D. Gaining insight into SARS-CoV-2 infection and COVID-19 severity using self-supervised edge features and graph neural networks. In AAAI Conference on Artificial Intelligence; AAAI Press: Palo Alto, CA, USA, 2021; Volume 35, pp. 4864–4873.
43. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. In 31st International Conference on Neural Information Processing Systems; ACM: New York, NY, USA, 2017; Volume 30.
44. Ekman, P. Expression and the nature of emotion. In Approaches to Emotion; Scherer, K.R., Ekman, P., Eds.; Psychology Press: New York, NY, USA, 2014; pp. 319–343.
45. Plutchik, R. The nature of emotions: Human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice. Am. Sci. 2001, 89, 344–350.
46. Muhammad, S.H.; Ousidhoum, N.; Abdulmumin, I.; Wahle, J.P.; Ruas, T.; Beloucif, M.; de Kock, C.; Surange, N.; Teodorescu, D.; Ahmad, I.S.; et al. BRIGHTER: Bridging the gap in human-annotated textual emotion recognition datasets for 28 languages. arXiv 2025, arXiv:2502.11926.
47. Prechelt, L. Automatic early stopping using cross validation: Quantifying the criteria. Neural Netw. 1998, 11, 761–767.
48. Goodfellow, I.; Bengio, Y.; Courville, A. Deep feedforward networks. Deep Learn. 2016, 1, 161–217.
49. Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101.
50. Pascanu, R.; Mikolov, T.; Bengio, Y. On the difficulty of training recurrent neural networks. In International Conference on Machine Learning; PMLR: Atlanta, GA, USA, 2013; pp. 1310–1318.
51. Mansy, A.; Rady, S.; Gharib, T. An ensemble deep learning approach for emotion detection in Arabic tweets. Int. J. Adv. Comput. Sci. Appl. 2022, 13.
52. Safaya, A.; Abdullatif, M.; Yuret, D. KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech Identification in Social Media. In Fourteenth Workshop on Semantic Evaluation, Barcelona; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 2054–2059.
53. Abdelali, A.; Hassan, S.; Mubarak, H.; Darwish, K.; Samih, Y. Pre-Training BERT on Arabic Tweets: Practical Considerations. arXiv 2021, arXiv:2102.10684.
54. Abdul-Mageed, M.; Elmadany, A.; Nagoudi, E.M.B. ARBERT & MARBERT: Deep bidirectional transformers for Arabic. arXiv 2020, arXiv:2101.01785.
55. Santos-Rodriguez, E.; Graff, M. INFOTEC-NLP at SemEval-2025 Task 11: A Case Study on Transformer-Based Models and Bag of Words. In 19th International Workshop on Semantic Evaluation (SemEval-2025); Association for Computational Linguistics: Stroudsburg, PA, USA, 2025; pp. 350–356.
56. Yang, H.; Wang, J.; Zhang, X. YNU-HPCC at SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Using Multiple Prediction Headers. In 19th International Workshop on Semantic Evaluation (SemEval-2025); Association for Computational Linguistics: Stroudsburg, PA, USA, 2025; pp. 83–89.








| Split | SemEval-2018-Ec-Ar | ExaAEC | SemEval-2025-Arq |
|---|---|---|---|
| Train | 2278 | 16,031 | 901 |
| Validation | 585 | 2005 | 100 |
| Test | 1518 | 2014 | 902 |
| Total | 4381 | 20,050 | 1903 |
| Hyperparameter | Selected Value |
|---|---|
| Embedding Dimension (d) | 768 |
| Max Sequence Length | 128 |
| Learning Rate (SemEval-2018-Ec-Ar, SemEval-2025-Arq) | |
| Learning Rate (ExaAEC) | |
| Weight Decay | 0.02 |
| Class Weights | Weighted |
| Epochs (Max) | 20 |
| Dropout Rate (SemEval-2018-Ec-Ar) | 0.1 |
| Dropout Rate (ExaAEC, SemEval-2025-Arq) | 0.3 |
| GraphSAGE Layers | 2 |
| GraphSAGE Neighbors | 5 |
| Focal Loss Weight | 0.1 |
| Graph Loss Weight | 0.1 |
| Ranking Loss Weight | 0.1 |
| Margin (Ranking Loss) | 0.5 |
| Focal Loss | Adaptive |
| Focal Loss | Adaptive |
| Early Stopping Patience | 7 |
| EMA Alpha (Loss Weights) | 0.95 |
| Top-k Spans (Attention) | 5 |
| Label Transformer Layers | 2 |
| Label Transformer Heads | 8 |
| Model | Jaccard Acc (%) | Micro-F1 (%) | Macro-F1 (%) |
|---|---|---|---|
| Baseline: Random [29] | 17.70 | – | – |
| MEDIAN Team [29] | 25.40 | – | – |
| Baseline: SVM-Unigrams [29] | 38.00 | – | – |
| UNCC Team [32] | 44.60 | – | – |
| Tw-StAR Team [31] | 46.50 | – | – |
| PARTNA Team [29] | 48.40 | – | – |
| EMA Team [30] | 48.90 | 61.80 | 46.10 |
| Khalil et al. [34] | 49.80 | 61.50 | 44.00 |
| Alswaidan and Menai [35] | 51.20 | 63.10 | 50.20 |
| Samy et al. [33] | 53.20 | 49.50 | 64.80 |
| Elfaik et al. [5] | 53.82 | – | – |
| Mansy et al. [51] | 54.00 | 52.70 | 70.10 |
| Elfaik et al. [36] | 60.00 | 52.00 | 35.00 |
| GTHL-Emo (Proposed) | 58.70 | 71.02 | 60.83 |
| Emotion | GTHL-Emo (Proposed) Precision | GTHL-Emo (Proposed) Recall | GTHL-Emo (Proposed) F1 | Elfaik et al. [36] Precision | Elfaik et al. [36] Recall | Elfaik et al. [36] F1 |
|---|---|---|---|---|---|---|
| anger | 0.78 | 0.86 | 0.82 | 0.67 | 0.62 | 0.70 |
| anticipation | 0.28 | 0.28 | 0.28 | 0.25 | 0.66 | 0.52 |
| disgust | 0.55 | 0.63 | 0.59 | 0.45 | 0.35 | 0.39 |
| fear | 0.79 | 0.72 | 0.75 | 0.38 | 0.11 | 0.17 |
| joy | 0.83 | 0.85 | 0.84 | 0.83 | 0.54 | 0.65 |
| love | 0.80 | 0.82 | 0.81 | 0.75 | 0.46 | 0.57 |
| optimism | 0.76 | 0.80 | 0.78 | 0.77 | 0.52 | 0.62 |
| pessimism | 0.45 | 0.72 | 0.56 | 0.33 | 0.16 | 0.22 |
| sadness | 0.70 | 0.84 | 0.76 | 0.69 | 0.45 | 0.55 |
| surprise | 0.35 | 0.32 | 0.33 | 0.00 | 0.00 | 0.00 |
| trust | 0.22 | 0.14 | 0.18 | 0.00 | 0.00 | 0.00 |
| Model | Jaccard Acc. (%) | Micro-F1 (%) | Macro-F1 (%) |
|---|---|---|---|
| Sarbazi et al. [39] (BiLSTM + ELMo) | – | 65.00 | – |
| asafaya-BERT | 56.66 | 61.06 | 59.02 |
| Qarib-BERT | 59.53 | 64.15 | 60.88 |
| MARBERT | 60.93 | 65.84 | 63.00 |
| AraBERTv0.2 | 59.42 | 63.72 | 58.97 |
| MARBERT Fine-tuned | 62.64 | 66.73 | 63.71 |
| GTHL-Emo (Proposed) | 65.99 | 70.72 | 68.71 |
| Model | Anger | Ant. | Disgust | Fear | Joy | Love | Accept. | Sadness | Surprise | Neutral |
|---|---|---|---|---|---|---|---|---|---|---|
| MARBERT (base) | 42.91 | 54.84 | 65.73 | 57.14 | 65.47 | 64.22 | 74.92 | 69.91 | 61.54 | 73.36 |
| MARBERT Fine-tuned | 49.70 | 56.12 | 64.16 | 66.67 | 60.58 | 69.37 | 75.45 | 66.76 | 53.93 | 74.37 |
| AraBERTv0.2 | 45.89 | 52.73 | 63.48 | 60.24 | 40.00 | 62.40 | 72.78 | 60.47 | 57.21 | 74.48 |
| Qarib-BERT | 42.80 | 51.85 | 62.46 | 56.47 | 49.47 | 66.36 | 69.91 | 56.83 | 56.02 | 71.39 |
| asafaya-BERT | 37.32 | 45.54 | 60.69 | 59.52 | 65.69 | 54.32 | 69.91 | 56.01 | 54.59 | 70.98 |
| GTHL-Emo (Proposed) | 55.66 | 56.09 | 68.24 | 69.66 | 66.33 | 78.26 | 75.60 | 71.29 | 59.10 | 73.77 |
| Model | Macro-F1 (%) |
|---|---|
| HTU Team [37] | 51.2 |
| INFOTEC-NLP Team [55] | 51.7 |
| LATE-GIL-NLP Team [38] | 48.6 |
| YNU-HPCC Team [56] | 44.4 |
| GTHL-Emo (Proposed) | 56.69 (+9.65% ↑) |
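
The +9.65% figure in the last row appears to be a relative gain over the strongest baseline in the table (INFOTEC-NLP Team, 51.7). This reading is inferred from the numbers rather than stated in the table, and can be checked as follows:

```latex
\[
\frac{56.69 - 51.7}{51.7} \times 100\% \approx 9.65\%,
\qquad
56.69 - 51.7 = 4.99 \text{ percentage points (absolute gain)}.
\]
```
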
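| Ablation Setting | SemEval-2018-Ec-Ar J | SemEval-2018-Ec-Ar µF1 | SemEval-2018-Ec-Ar mF1 | SemEval-2018-Ec-Ar HL | ExaAEC J | ExaAEC µF1 | ExaAEC mF1 | ExaAEC HL | SemEval-2025-Arq J | SemEval-2025-Arq µF1 | SemEval-2025-Arq mF1 | SemEval-2025-Arq HL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|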
| w/o GraphSAGE | 57.60 | 69.92 | 60.43 | – | 63.69 | 68.61 | 66.20 | 0.0874 | 40.70 | 58.42 | 55.48 | 0.2905 |
| w/o Label Transformer | 57.25 | 70.43 | 59.11 | 0.1404 | 64.03 | 68.40 | 66.31 | 0.0868 | 40.68 | 57.29 | 54.86 | 0.2664 |
| w/o Correlation Alignment | 58.43 | 70.38 | 59.67 | 0.1400 | 64.08 | 68.47 | 65.91 | 0.0868 | 41.07 | 58.23 | 56.04 | 0.2714 |
| w/o Ranking Loss | 58.01 | 69.67 | 57.51 | 0.1382 | 63.53 | 68.29 | 65.76 | 0.0849 | 40.12 | 57.88 | 55.43 | 0.2762 |
| w/o Imbalance Loss | 57.79 | 70.64 | 58.67 | 0.1383 | 63.53 | 68.11 | 65.43 | 0.0875 | 41.02 | 57.58 | 55.44 | 0.2631 |
| w/o Adaptive Weights | 59.15 | 70.19 | 57.53 | 0.1320 | 63.17 | 67.91 | 65.52 | 0.0891 | 39.11 | 56.83 | 54.60 | 0.2633 |
| GTHL-Emo (Full Model) | 58.70 | 71.02 | 60.83 | 0.1259 | 65.99 | 70.72 | 68.71 | 0.0887 | 41.83 | 58.52 | 56.69 | 0.2605 |
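| Graph Setting | SemEval-2018-Ec-Ar J | SemEval-2018-Ec-Ar µF1 | SemEval-2018-Ec-Ar mF1 | SemEval-2018-Ec-Ar HL | ExaAEC J | ExaAEC µF1 | ExaAEC mF1 | ExaAEC HL | SemEval-2025-Arq J | SemEval-2025-Arq µF1 | SemEval-2025-Arq mF1 | SemEval-2025-Arq HL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|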
| Cosine Only | 57.84 | 70.34 | 60.76 | 0.1341 | 64.11 | 68.75 | 66.30 | 0.0864 | 41.99 | 58.48 | 54.71 | 0.2799 |
| Jaccard (Train Only) | 57.70 | 70.02 | 60.83 | 0.1359 | 62.83 | 67.51 | 65.16 | 0.0887 | 41.83 | 58.52 | 55.69 | 0.2905 |
| kNN Graph | 57.05 | 70.04 | 59.28 | 0.1321 | 62.86 | 68.65 | 66.32 | 0.0871 | 40.77 | 57.75 | 56.08 | 0.3064 |
| Random Graph | 58.38 | 70.55 | 59.15 | 0.1367 | 63.83 | 68.72 | 66.08 | 0.0841 | 40.83 | 56.27 | 54.74 | 0.3378 |
| GTHL-Emo (Proposed) | 58.70 | 71.02 | 60.83 | 0.1259 | 65.99 | 70.72 | 68.71 | 0.0887 | 41.83 | 58.52 | 56.69 | 0.2605 |
| Focal Weight | Graph Weight | Rank Weight | EMA | SemEval-2018-Ec-Ar Jaccard Acc | SemEval-2018-Ec-Ar Micro-F1 | SemEval-2018-Ec-Ar Macro-F1 | ExaAEC Jaccard Acc | ExaAEC Micro-F1 | ExaAEC Macro-F1 | SemEval-2025-Arq Jaccard Acc | SemEval-2025-Arq Micro-F1 | SemEval-2025-Arq Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 0.05 | 0.05 | 0.90 | 57.80 | 67.12 | 60.05 | 62.14 | 68.32 | 64.50 | 39.30 | 55.42 | 52.08 |
| 0.10 | 0.10 | 0.05 | 0.95 | 58.40 | 68.55 | 61.10 | 64.50 | 69.78 | 66.82 | 40.65 | 56.20 | 53.80 |
| 0.10 | 0.10 | 0.10 | 0.95 | 58.70 | 71.02 | 60.48 | 65.99 | 70.72 | 68.71 | 41.47 | 56.78 | 56.69 |
| Run | Parameters (M) | Enabled Modules | Macro-F1 | Sec per Epoch Avg |
|---|---|---|---|---|
| Graph Cosine Only | 170.340 | 6.00 | 54.71 | 14.77 |
| Graph Jaccard (Train Only) | 170.340 | 6.00 | 55.69 | 15.15 |
| kNN Graph | 170.340 | 6.00 | 56.08 | 15.40 |
| Random Graph | 170.340 | 6.00 | 54.74 | 15.26 |
| w/o Adaptive Weights | 170.340 | 5.00 | 54.60 | 15.43 |
| w/o Correlation Alignment | 170.340 | 5.00 | 56.04 | 14.26 |
| w/o GraphSAGE | 169.160 | 5.00 | 55.48 | 14.83 |
| w/o Imbalance Loss | 170.340 | 5.00 | 55.44 | 14.96 |
| w/o Label Transformer | 164.030 | 5.00 | 54.86 | 14.59 |
| w/o Ranking Loss | 170.340 | 5.00 | 55.43 | 14.86 |
| GTHL-Emo | 170.340 | 6.00 | 56.69 | 14.86 |
| ID | Query (English) | Query Labels | Rank | Sim. | Nearest Neighbor (English) | NN Labels |
|---|---|---|---|---|---|---|
| E1 | My heart adores him; don’t read and torture yourself. | love | 1 | 0.9626 | I love Yasser and wish he dominates the field. | love |
| E2 | A very happy day, praise be to God; it’s the leader’s wedding. | joy, love, optimism | 1 | 0.9247 | O God, make its ending joyful and blessed. | joy, love, optimism |
| E3 | Why wouldn’t I miss you? Stay safe and well. | joy, love, sadness | 1 | 0.9371 | Such a rush of feeling, like filling a car to the brim. | joy, love, optimism |
| E4 | Fear and anxiety after this emotional breakdown. | anger, disgust, fear, sadness | 1 | 0.9284 | I’m honestly scared in a way I’ve never been before. | fear |
| ID | Label | Case | Prob | Text (English Translation) | Gold | Pred |
|---|---|---|---|---|---|---|
| E5 | anger | TP | 1.0000 | Give everyone what they deserve; if you keep exaggerating for everyone, you will end up exhausted. | pessimism, sadness | anger, disgust, pessimism, sadness |
| E6 | anger | FN | 0.3542 | Yesterday we were love letters; today we are falling pages that hurt each other. | anger, love, pessimism, sadness | anticipation, pessimism, sadness |
| E7 | anticipation | TP | 0.9983 | Thursday is here; I am going to check the site now. Pray for me in the next two minutes. | anticipation, fear, surprise | anticipation, joy, optimism |
| E8 | anticipation | FN | 0.1454 | I hope my fears and expectations never come true for you and that they stay far away. | anticipation, fear | fear, pessimism |
| E9 | trust | FN | 0.0962 | Asking a question does not imply ignorance; it reflects a need for reassurance and certainty. | trust | disgust |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Alrasheedy, M.N.; Tiun, S.; Fauzi, F. GTHL-Emo: Adaptive Imbalance-Aware and Correlation-Aligned Training for Arabic Multi-Label Emotion Detection. Electronics 2026, 15, 1169. https://doi.org/10.3390/electronics15061169











