A Hybrid SBERT–WGAN Framework with Ensemble Learning for Sentiment Analysis in Imbalanced Datasets
Abstract
1. Introduction
- We employ five imbalanced datasets, including two English datasets and three Arabic datasets, to evaluate the proposed approach in a multilingual sentiment analysis setting.
- We generate sentence-level embeddings with SBERT; these embeddings capture the semantic meaning of complete sentences instead of representing text at the token level.
- We address the class imbalance problem using a GAN-based synthetic data generation approach, and we compare its effectiveness with widely used resampling techniques such as SMOTE and ADASYN.
- We train a soft voting ensemble classifier, which combines several machine learning models, on the resulting representations. The performance of this ensemble is compared with that of the individual classifiers and of stacking-based ensemble models.
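The second contribution relies on the pooling step that turns SBERT's token-level outputs into one fixed-size sentence vector; the parameter settings list mean pooling for this. The sketch below illustrates that operation in plain Python, assuming the token embeddings and the attention mask (1 for a real token, 0 for padding) have already been produced by the transformer encoder. The function name `mean_pool` is hypothetical; in practice the sentence-transformers library applies its pooling module internally when `encode` is called.

```python
from typing import List, Sequence


def mean_pool(token_embeddings: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the vectors of real tokens, ignoring padding positions."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry is required per token")
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    n_real = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:  # padding tokens (mask = 0) do not contribute
            n_real += 1
            for j, value in enumerate(vector):
                summed[j] += value
    if n_real == 0:
        raise ValueError("the attention mask selects no tokens")
    return [s / n_real for s in summed]


# Three token vectors; the last one is padding and is excluded from the mean.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
print(mean_pool(tokens, [1, 1, 0]))  # [2.0, 3.0]
```

Masking matters because sentences in a batch are padded to a common length; averaging over padding vectors would make the sentence embedding depend on the batch rather than on the sentence.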
2. Related Work
3. Materials and Methods
3.1. Data Processing
3.2. Feature Extraction
Sentence-BERT
3.3. Data Balancing
GAN-Based Synthetic Data Generation
4. Experimental Analysis and Results
4.1. Datasets
4.1.1. English Datasets
- TripAdvisor Hotel Reviews [34]: This English-language benchmark dataset, collected from the TripAdvisor platform, contains more than 20,000 customer reviews covering various aspects of hotel quality. Each review originally carries a rating from 1 to 5. To frame the task as binary sentiment classification, reviews rated 4 or 5 are labeled positive and reviews rated 1 or 2 are labeled negative, while reviews with the neutral rating of 3 are excluded. The resulting dataset is imbalanced, which makes it suitable for applying and evaluating the data balancing technique proposed in this work.
- Yelp Restaurant Reviews [35]: This dataset comprises more than 19,800 reviews of over 45 restaurants, collected from the Yelp platform. Each review is annotated with a rating from 1 to 5. Following the same procedure as for the TripAdvisor dataset, ratings of 4 and 5 are mapped to positive sentiment and ratings of 1 and 2 to negative sentiment, while reviews rated 3 are discarded to keep the task binary. The resulting dataset contains 17,827 reviews with an imbalanced class distribution.
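The rating-to-label rule applied to both English datasets amounts to a small filter. The sketch below assumes each review arrives as a `(text, rating)` pair with an integer rating from 1 to 5; the function names are illustrative and not taken from the authors' code.

```python
from typing import Iterable, List, Optional, Tuple


def rating_to_label(rating: int) -> Optional[str]:
    """Map a 1-5 star rating to a binary label, or None if the review is dropped."""
    if rating in (4, 5):
        return "positive"
    if rating in (1, 2):
        return "negative"
    if rating == 3:
        return None  # neutral reviews are excluded from the binary task
    raise ValueError(f"rating out of range: {rating}")


def binarize(reviews: Iterable[Tuple[str, int]]) -> List[Tuple[str, str]]:
    """Keep (text, label) pairs for non-neutral reviews only."""
    labelled = []
    for text, rating in reviews:
        label = rating_to_label(rating)
        if label is not None:
            labelled.append((text, label))
    return labelled


sample = [("great stay", 5), ("average", 3), ("dirty room", 1), ("nice staff", 4)]
print(binarize(sample))
# [('great stay', 'positive'), ('dirty room', 'negative'), ('nice staff', 'positive')]
```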
4.1.2. Arabic Datasets
- HTL dataset [36]: This Arabic hotel dataset was collected from the TripAdvisor platform and contains 15,572 reviews labeled as positive, negative, or neutral, with the neutral category forming the minority. For the purposes of this study, we restrict the analysis to positive and negative reviews to obtain a binary classification task, as neutral reviews are less informative for sentiment analysis in business intelligence applications. The resulting dataset remains imbalanced, which enables the application and evaluation of different balancing techniques.
- RES dataset [36]: This Arabic restaurant dataset consists of 10,970 customer reviews of more than 4500 restaurants, collected from two review platforms: the Qaym startup and the TripAdvisor website. Positive and negative reviews make up the majority of the samples. Following the same preprocessing strategy as for the HTL dataset, only the positive and negative categories are retained.
- SemEval-2016 [37]: The SemEval-ABSA16 dataset comprises consumer reviews of more than 15,000 hotels collected from platforms such as Booking.com and TripAdvisor. It contains 13,113 reviews annotated with positive, negative, and neutral sentiment labels, with the neutral class representing the minority. In line with the preprocessing strategy adopted for the previous datasets, the neutral category is excluded. Although this dataset was originally designed for aspect-level sentiment analysis, it is also used in this study for sentence-level sentiment classification.
4.2. Baseline Classifiers
4.2.1. Ensemble Classifier
4.2.2. Soft Voting Technique
- Logistic Regression (LR): a supervised machine learning model commonly used for binary classification tasks. It estimates the probability that an input instance belongs to a given class by computing a linear combination of the input features and passing it through the non-linear sigmoid function. This transformation maps the output to the range (0, 1), so it can be interpreted as the probability of class membership [42].
- Support Vector Machine (SVM): a supervised learning algorithm for classification tasks that seeks the optimal separating hyperplane between classes by maximizing the margin, that is, the distance between the hyperplane and the closest data points of each class (the support vectors) [43].
- eXtreme Gradient Boosting (XGBoost): an ensemble learning algorithm based on the gradient boosting framework. It builds a strong predictive model by training a sequence of weak learners, decision trees (DTs) in our study, one after another. Each newly added tree is fitted to correct the errors of the preceding trees by minimizing a regularized loss function through gradient-based optimization [44].
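Soft voting, as used by the proposed ensemble, averages the class probabilities produced by the base classifiers and predicts the class with the highest average. The sketch below shows the binary case, together with the sigmoid that turns a logistic regression score into a probability. The function names, the optional weights, and the 0.5 decision threshold are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Optional, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping a real score into [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow of exp(-z) for very negative z
    return e / (1.0 + e)


def soft_vote(positive_probs: Sequence[float],
              weights: Optional[Sequence[float]] = None,
              threshold: float = 0.5) -> Tuple[float, int]:
    """Combine base-model probabilities for the positive class.

    Returns the (weighted) mean probability and the predicted label,
    where 1 means positive and 0 means negative.
    """
    if weights is None:
        weights = [1.0] * len(positive_probs)  # unweighted soft voting
    if len(weights) != len(positive_probs):
        raise ValueError("one weight is required per model")
    mean_prob = sum(w * p for w, p in zip(weights, positive_probs)) / sum(weights)
    return mean_prob, int(mean_prob >= threshold)


# Hypothetical positive-class probabilities for one review from LR, a
# calibrated SVM, and XGBoost.
p_lr = sigmoid(0.0)  # a zero linear score gives exactly 0.5
print(soft_vote([p_lr, 0.875, 0.875]))  # (0.75, 1)
```

A linear SVM outputs margin scores rather than probabilities, so its scores must be calibrated before they can take part in soft voting; scikit-learn's `SVC(probability=True)`, for example, does this with Platt scaling.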
4.3. Parameter Settings
4.4. Evaluation Metrics
4.5. Experimental Findings
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Tan, K.L.; Lee, C.P.; Lim, K.M. A Survey of Sentiment Analysis: Approaches, Datasets, and Future Research. Appl. Sci. 2023, 13, 4550.
2. Zhang, L.; Liu, B. Sentiment Analysis and Opinion Mining. In Synthesis Lectures on Human Language Technologies; Springer: Berlin/Heidelberg, Germany, 2012.
3. Mao, Y.; Liu, Q.; Zhang, Y. Sentiment analysis methods, applications, and challenges: A systematic literature review. J. King Saud Univ. Comput. Inf. Sci. 2024, 36, 102048.
4. Birjali, M.; Kasri, M.; Hssane, A.B. A comprehensive survey on sentiment analysis: Approaches, challenges and trends. Knowl. Based Syst. 2021, 226, 107134.
5. Altalhan, M.; Algarni, A.; Alouane, M.T.H. Imbalanced Data Problem in Machine Learning: A Review. IEEE Access 2025, 13, 13686–13699.
6. Zhang, Y.; Jin, R.; Zhou, Z.H. Understanding bag-of-words model: A statistical framework. Int. J. Mach. Learn. Cybern. 2010, 1, 43–52.
7. Salton, G.; Buckley, C. Term-Weighting Approaches in Automatic Text Retrieval. Inf. Process. Manag. 1988, 24, 513–523.
8. Bahrawi, N. Sentiment Analysis Using Random Forest Algorithm-Online Social Media Based. J. Inf. Technol. Its Util. 2019, 2, 29–33.
9. Sunitha, P.; Joseph, S.; Akhil, P.V. A Study on the Performance of Supervised Algorithms for Classification in Sentiment Analysis. In Proceedings of the TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON); IEEE: New York, NY, USA, 2019; pp. 1351–1356.
10. Devika, M.; Sunitha, C.; Ganesh, A. Sentiment Analysis: A Comparative Study on Different Approaches. Procedia Comput. Sci. 2016, 87, 44–49.
11. Qi, Y.; Shabrina, Z. Sentiment analysis using Twitter data: A comparative application of lexicon- and machine-learning-based approach. Soc. Netw. Anal. Min. 2023, 13, 31.
12. Albahli, S. Twitter Sentiment Analysis: An Arabic Text Mining Approach Based on COVID-19. Front. Public Health 2022, 10, 966779.
13. Abdellah, A.E.; Cherrat, E.M.; Ouahi, H.; Bekkar, A. Sentiment Analysis from Texts Written in Standard Arabic and Moroccan Dialect Based on Deep Learning Approaches. Int. J. Comput. Digit. Syst. 2024, 16, 447–458.
14. Mikolov, T.; Chen, K.; Corrado, G.S.; Dean, J. Efficient Estimation of Word Representations in Vector Space. In Proceedings of the International Conference on Learning Representations, Scottsdale, AZ, USA, 2–4 May 2013.
15. Pennington, J.; Socher, R.; Manning, C.D. GloVe: Global Vectors for Word Representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing; Association for Computational Linguistics: Stroudsburg, PA, USA, 2014.
16. Bojanowski, P.; Grave, E.; Joulin, A.; Mikolov, T. Enriching Word Vectors with Subword Information. Trans. Assoc. Comput. Linguist. 2016, 5, 135–146.
17. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the North American Chapter of the Association for Computational Linguistics, Minneapolis, MN, USA, 2–7 June 2019.
18. Clark, K.; Luong, M.T.; Le, Q.V.; Manning, C.D. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 30 April 2020.
19. Jakha, H.; Houssaini, S.E.; Houssaini, M.A.E.; Ajjaj, S.; Hadir, A. Optimizing Sentiment Analysis in Multilingual Balanced Datasets: A New Comparative Approach to Enhancing Feature Extraction Performance with ML and DL Classifiers. Appl. Syst. Innov. 2025, 8, 104.
20. Bello, A.; Ng, S.C.; Leung, M.F. A BERT Framework to Sentiment Analysis of Tweets. Sensors 2023, 23, 506.
21. Abdelgwad, M.M.; Soliman, T.H.A.; Taloba, A.I.; Farghaly, M.F. Arabic aspect based sentiment analysis using bidirectional GRU based models. J. King Saud Univ. Comput. Inf. Sci. 2021, 34, 6652–6662.
22. Jakha, H.; Houssaini, S.E.; Houssaini, M.A.E.; Ajjaj, S.; Kafi, J.E. S2BA-AraELECTRA: A stacked BiLSTM-BiGRU with attention mechanism and contextual embeddings from AraELECTRA for enhanced Arabic sentiment classification in business intelligence. Int. J. Inf. Technol. 2025, 1–16.
23. Tan, K.L.; Lee, C.P.; Anbananthen, K.S.M.; Lim, K.M. RoBERTa-LSTM: A Hybrid Model for Sentiment Analysis With Transformer and Recurrent Neural Network. IEEE Access 2022, 10, 21517–21525.
24. Aljomah, F.; Aldhafeeri, L.; Alfadel, M.; Alshahrani, S.; Abbas, Q.; Alhumoud, S. Enhancing Arabic Sentiment Analysis with Pre-Trained CAMeLBERT: A Case Study on Noisy Texts. Comput. Mater. Contin. 2025, 84, 5317–5335.
25. Habbat, N.; Hicham, N.; Anoun, H.; Hassouni, L. Sentiment analysis of imbalanced datasets using BERT and ensemble stacking for deep learning. Eng. Appl. Artif. Intell. 2023, 126, 106999.
26. Kubát, M.; Matwin, S. Addressing the Curse of Imbalanced Training Sets: One-Sided Selection. In Proceedings of the International Conference on Machine Learning, Nashville, Tennessee, 8–12 July 1997.
27. Chawla, N.; Bowyer, K.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
28. Muslim, M.A.; Nikmah, T.L.; Pertiwi, D.A.A.; Subhan; Jumanto; Dasril, Y.; Iswanto. New model combination meta-learner to improve accuracy prediction P2P lending with stacking ensemble learning. Intell. Syst. Appl. 2023, 18, 200204.
29. Umer, M.; Sadiq, S.; Missen, M.M.S.; Hameed, Z.; Aslam, Z.; Siddique, M.A.; Nappi, M. Scientific papers citation analysis using textual features and SMOTE resampling techniques. Pattern Recognit. Lett. 2021, 150, 250–257.
30. He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence); IEEE: New York, NY, USA, 2008; pp. 1322–1328.
31. Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks. arXiv 2019, arXiv:1908.10084.
32. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; Volume 27.
33. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein Generative Adversarial Networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017.
34. Alam, M.H.; Ryu, W.J.; Lee, S. Joint multi-grain topic sentiment: Modeling semantic aspects for online reviews. Inf. Sci. 2016, 339, 206–223.
35. Public Domain. Yelp Restaurant Reviews Dataset. 2017. Available online: https://www.kaggle.com/datasets/farukalam/yelp-restaurant-reviews (accessed on 20 February 2026).
36. ElSahar, H.; El-Beltagy, S.R. Building Large Arabic Multi-domain Resources for Sentiment Analysis. In Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics, Cairo, Egypt, 14–20 April 2015.
37. Al-Smadi, M.; Qawasmeh, O.; Talafha, B.; Al-Ayyoub, M.; Jararweh, Y.; Benkhelifa, E. An enhanced framework for aspect-based sentiment analysis of Hotels’ reviews: Arabic reviews case study. In Proceedings of the 2016 11th International Conference for Internet Technology and Secured Transactions (ICITST); IEEE: New York, NY, USA, 2016; pp. 98–103.
38. Breiman, L. Bagging Predictors. Mach. Learn. 1996, 24, 123–140.
39. Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. In Proceedings of the European Conference on Computational Learning Theory, Jerusalem, Israel, 17–19 March 1997.
40. Wolpert, D.H. Stacked generalization. Neural Netw. 1992, 5, 241–259.
41. Kittler, J.; Hatef, M.; Duin, R.P.W.; Matas, J. On Combining Classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 226–239.
42. Peng, C.Y.J.; Lee, K.L.; Ingersoll, G.M. An Introduction to Logistic Regression Analysis and Reporting. J. Educ. Res. 2002, 96, 3–14.
43. Evgeniou, T.; Pontil, M. Support Vector Machines: Theory and Applications. In Proceedings of the Machine Learning and Its Applications, Williamstown, MA, USA, 28 June–1 July 2001.
44. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2016.










| Ref | Language | Features | Balancing | Classifier | Accuracy (%) |
|---|---|---|---|---|---|
| [9] | English | TF-IDF, BoW | - | SVM | 83.67 |
| [11] | English | BoW, TF-IDF, Word2Vec | - | SVC | 71 |
| [12] | Arabic | BoW, TF-IDF | SMOTE | Multinomial NB | 91 |
| [13] | Arabic | TF-IDF, BERT | - | AraBERT | 88 |
| [19] | Multilingual | Mixed embeddings | SMOTE | EC, SVM | 98.6 (EN), 94.1 (AR) |
| [20] | English | BoW, Word2Vec, BERT | - | BERT | 93 |
| [21] | Arabic | Word2Vec, FastText | - | IAN-BiGRU | 83.98 |
| [22] | Arabic | AraELECTRA | SMOTE | Stacked BiLSTM-BiGRU | 96.77 |
| [23] | English | RoBERTa | - | RoBERTa-LSTM | 92.96 |
| [25] | Multilingual | BERT | SMOTE | Stacking Model | 94.0 (EN), 94.2 (AR) |
| [28] | P2P dataset | LightGBM | SMOTE | Stacking XGBoost | 99.98 |
| [29] | English | TF-IDF, Word2Vec | SMOTE, ADASYN | Extra Trees | 98.26 |

| Dataset Name | Language | Positive Samples | Negative Samples |
|---|---|---|---|
| TripAdvisor Hotel Reviews | English | 15,093 | 3214 |
| Yelp Restaurant Reviews | English | 15,330 | 2497 |
| HTL | Arabic | 10,775 | 2647 |
| RES | Arabic | 8030 | 2675 |
| SemEval-2016 | Arabic | 7705 | 4556 |
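The degree of imbalance in each dataset follows directly from the class counts in the table above. The short script below, a convenience sketch with hypothetical names, computes the majority-to-minority ratio and the minority share of each dataset.

```python
from typing import Dict, Tuple

# Class counts (positive, negative) taken from the dataset table.
CLASS_COUNTS: Dict[str, Tuple[int, int]] = {
    "TripAdvisor Hotel Reviews": (15093, 3214),
    "Yelp Restaurant Reviews": (15330, 2497),
    "HTL": (10775, 2647),
    "RES": (8030, 2675),
    "SemEval-2016": (7705, 4556),
}


def imbalance_stats(pos: int, neg: int) -> Tuple[float, float]:
    """Return (majority-to-minority ratio, minority share of all samples)."""
    majority, minority = max(pos, neg), min(pos, neg)
    return majority / minority, minority / (pos + neg)


for name, (pos, neg) in CLASS_COUNTS.items():
    ratio, share = imbalance_stats(pos, neg)
    print(f"{name}: ratio {ratio:.2f}, minority share {share:.1%}")
```

On these counts the ratio ranges from about 1.69 (SemEval-2016) to about 6.14 (Yelp Restaurant Reviews), so the Yelp data are the most skewed and SemEval-2016 the least.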

| Parameter | Value |
|---|---|
| **SBERT** | |
| Sentence embedding model | paraphrase-multilingual-mpnet-base-v2 |
| Embedding dimension | 768 |
| Pooling strategy | Mean pooling |
| **WGAN** | |
| Training data | Minority class only |
| Input scaling | Min–Max scaling to |
| Generator architecture | Dense (512)–Dense (512) |
| Critic architecture | Dense (512)–Dense (256) |
| Optimizer | Adam |
| Learning rate | |
| Adam parameters | , |
| Gradient penalty coefficient (λ) | 10 |
| Batch size | 256 |
| Number of epochs | 10 |
| Critic updates per generator step | 3 |
| **Classification Models** | |
| Logistic Regression | L2 regularization, lbfgs solver, , max_iter = 3000 |
| SVM | Linear kernel, |
| XGBoost | Number of trees = 400, maximum depth = 6, learning rate = 0.05 |
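The WGAN settings above include a gradient penalty coefficient and a number of critic updates per generator step, which matches the gradient-penalty variant of WGAN (WGAN-GP, Gulrajani et al., 2017) rather than the weight-clipping formulation of [33]. Assuming that variant, which this section does not spell out, the critic $D$ and the generator $G$, trained on min–max-scaled minority-class embeddings $x \sim p_{\mathrm{min}}$ and noise $z \sim p(z)$, would minimize the following objectives with $\lambda = 10$:

$$
\begin{aligned}
\mathcal{L}_{D} &= \mathbb{E}_{z \sim p(z)}\left[D(G(z))\right] - \mathbb{E}_{x \sim p_{\mathrm{min}}}\left[D(x)\right] + \lambda\, \mathbb{E}_{\hat{x}}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right], \\
\hat{x} &= \epsilon\, x + (1 - \epsilon)\, G(z), \qquad \epsilon \sim \mathcal{U}[0, 1], \\
\mathcal{L}_{G} &= -\,\mathbb{E}_{z \sim p(z)}\left[D(G(z))\right].
\end{aligned}
$$

Under this reading, each training iteration performs three critic updates on $\mathcal{L}_{D}$ followed by one generator update on $\mathcal{L}_{G}$; the generated vectors would presumably be mapped back through the inverse min–max scaling before being added to the minority class.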

| Classifier | TripAdvisor Acc | TripAdvisor F1 | TripAdvisor MCC | Yelp Acc | Yelp F1 | Yelp MCC |
|---|---|---|---|---|---|---|
| LR | 93.73 | 95.80 | 81.85 | 94.34 | 96.30 | 80.44 |
| SVM | 94.03 | 96.01 | 82.39 | 94.46 | 96.38 | 80.45 |
| XGB | 93.73 | 95.81 | 81.68 | 94.23 | 96.24 | 79.23 |
| MLP | 94.19 | 96.13 | 82.75 | 94.49 | 96.41 | 80.37 |
| Stacking | 94.18 | 96.53 | 78.97 | 93.63 | 96.40 | 71.07 |
| Proposed approach | 95.08 | 97.02 | 82.96 | 95.74 | 97.53 | 82.09 |
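The result tables report accuracy (Acc), F1, and the Matthews correlation coefficient (MCC); the MCC values appear to be scaled by 100, since MCC itself lies in [-1, 1]. The sketch below computes all three from a binary confusion matrix. It assumes F1 is computed for the positive class, which this section does not state; macro- or weighted-averaged F1 would give different numbers. The example shows why MCC is informative on imbalanced data: a classifier that labels every review positive still scores high accuracy and F1, but, by the usual convention, an MCC of zero.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, positive-class F1, and MCC from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any row or column sum is zero; 0.0 is the usual convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# A degenerate model that predicts "positive" for all 100 reviews
# (90 truly positive, 10 truly negative).
print(binary_metrics(tp=90, fp=10, fn=0, tn=0))
# accuracy 0.9 and F1 about 0.947, but MCC 0.0
```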

| Classifier | HTL Acc | HTL F1 | HTL MCC | RES Acc | RES F1 | RES MCC | SemEval-2016 Acc | SemEval-2016 F1 | SemEval-2016 MCC |
|---|---|---|---|---|---|---|---|---|---|
| LR | 92.71 | 95.07 | 80.36 | 85.08 | 89.90 | 61.58 | 90.48 | 92.23 | 81.73 |
| SVM | 93.12 | 95.36 | 81.12 | 84.31 | 89.19 | 54.34 | 90.99 | 92.78 | 80.81 |
| XGB | 93.41 | 95.51 | 81.55 | 85.94 | 90.79 | 61.28 | 90.50 | 92.40 | 79.74 |
| MLP | 93.12 | 95.30 | 81.93 | 86.22 | 91.06 | 61.51 | 90.72 | 92.35 | 81.43 |
| Stacking | 93.08 | 95.35 | 80.85 | 82.81 | 88.95 | 51.16 | 90.75 | 92.77 | 80.03 |
| Proposed approach | 94.41 | 96.52 | 82.39 | 86.92 | 91.74 | 63.76 | 91.89 | 93.53 | 82.66 |

| Classifier | TripAdvisor Hotel Reviews (English) | Yelp Restaurant Reviews (English) | HTL (Arabic) | RES (Arabic) | SemEval-2016 (Arabic) |
|---|---|---|---|---|---|
| SMOTE with Voting | 94.29 | 94.76 | 92.93 | 84.73 | 90.44 |
| ADASYN with Voting | 93.53 | 94.17 | 92.48 | 83.42 | 89.93 |
| Proposed Approach | 95.08 | 95.74 | 94.41 | 86.92 | 91.89 |
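The SMOTE and ADASYN baselines in this comparison create synthetic minority samples by interpolation rather than with a learned generator. SMOTE [27] places each new point on the line segment between a minority sample and one of its k nearest minority neighbours; ADASYN [30] follows the same idea but draws more seeds from minority samples that are harder to learn. The sketch below is a simplified, stdlib-only illustration of SMOTE with brute-force neighbour search; it is not the imbalanced-learn implementation, and its function names are hypothetical.

```python
import random
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like(minority: List[List[float]], n_new: int, k: int = 5,
               seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic minority vectors by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample (brute force).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(smote_like(minority, n_new=3, k=2, seed=42))
```

Because every synthetic point lies between two existing minority points, SMOTE cannot leave the region spanned by the real minority samples, whereas a generator such as the WGAN learns a distribution from which it can sample more freely.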

| Dataset | Train–Test 80:20 | Train–Test 90:10 | Train–Test 70:30 | K-Fold (K = 5) | K-Fold (K = 10) |
|---|---|---|---|---|---|
| TripAdvisor Hotel Reviews | 95.08 | 94.81 | 95.07 | 95.04 | 95.07 |
| Yelp Restaurant Reviews | 95.74 | 97.42 | 97.06 | 95.25 | 95.38 |
| HTL | 94.41 | 95.23 | 94.31 | 94.73 | 94.84 |
| RES | 86.92 | 85.62 | 85.08 | 85.55 | 85.66 |
| SemEval-2016 | 91.89 | 90.95 | 91.87 | 91.26 | 91.25 |
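The K-fold columns evaluate the model on K disjoint test folds and average the scores. Because the datasets are imbalanced, folds are commonly stratified so that each keeps the overall class ratio; this section does not say whether stratification was used, so the sketch below is an assumption. It assigns sample indices to folds round-robin within each class; `stratified_folds` is a hypothetical helper, similar in purpose to scikit-learn's `StratifiedKFold`.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Assign sample indices to k folds while preserving class proportions."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)  # round-robin within each class
    return folds


labels = [1] * 80 + [0] * 20  # toy data with a 4:1 imbalance
for fold_id, test_idx in enumerate(stratified_folds(labels, k=5)):
    positives = sum(labels[i] for i in test_idx)
    print(fold_id, len(test_idx), positives)  # every fold: 20 samples, 16 positive
```

Each fold serves once as the test set while the remaining K - 1 folds form the training set. When synthetic samples are added, generating them from the training folds only keeps every test fold made of real reviews.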
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Published by MDPI on behalf of the International Institute of Knowledge Innovation and Invention. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Jakha, H.; Tbaikhi, S.; El Houssaini, S.; El Houssaini, M.-A.; Ajjaj, S. A Hybrid SBERT–WGAN Framework with Ensemble Learning for Sentiment Analysis in Imbalanced Datasets. Appl. Syst. Innov. 2026, 9, 103. https://doi.org/10.3390/asi9050103

