A Hybrid Deep Learning Model Based on Local and Global Features for Amazon Product Reviews: An Optimal ALBERT-Cascade CNN Approach
Abstract
1. Introduction
- The proposed approach combines ALBERT with dual-layer CNN blocks to represent both local and global information. By learning micro- and macro-level relationships at the word and sentence levels, it is well suited to NLP applications.
- The proposed approach introduces a computationally efficient model for NLP applications by leveraging ALBERT’s parameter sharing (reducing model size) and CNNs’ parallel processing capability (accelerating training).
- The main difficulty in sentiment analysis lies in handling long-term dependencies and varied vocabulary in the text [6]. This makes feature extraction and accurate classification of such texts a significant challenge in natural language processing [6,7]. By combining the complementary strengths of ALBERT and CNN, the model handles both long and short token sequences in the dataset, a key issue in real-world sentiment analysis.
- This study tuned hyperparameters with the Optuna framework to determine the optimal configuration for the proposed model. The empirical results show that the ALBERT-Cascade CNN architecture significantly improves performance compared to contemporary methods. In addition, the bias of the proposed model was reduced, and the effectiveness of the architecture was measured through cross-validation and ablation studies. These assessments support the robustness and generalizability of the proposed approach and set a new benchmark for sentiment analysis, particularly in scenarios with limited data availability.
- The hierarchically designed cascade 1-D CNN architecture and the ALBERT transformer limit the growth in the number of parameters and cope with memory constraints. ALBERT compresses the text representation through embedding factorization, which keeps the pipeline resource-efficient from input to final classification.
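The embedding factorization mentioned in the last contribution can be illustrated with a simple parameter count. The sketch below assumes the ALBERT-base sizes reported in the ALBERT paper by Lan et al. (vocabulary of 30,000 tokens, embedding size 128, hidden size 768). The function names are illustrative and are not taken from the authors' code.

```python
def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Parameters of a single V x H embedding matrix (BERT-style)."""
    return vocab_size * hidden_size


def factorized_embedding_params(vocab_size: int, embed_size: int, hidden_size: int) -> int:
    """Parameters of a V x E lookup followed by an E x H projection (ALBERT-style)."""
    return vocab_size * embed_size + embed_size * hidden_size


# Assumed ALBERT-base sizes; adjust for other checkpoints.
V, E, H = 30_000, 128, 768

full = embedding_params(V, H)                      # 23,040,000
factorized = factorized_embedding_params(V, E, H)  # 3,938,304

print(f"unfactorized embedding parameters: {full:,}")
print(f"factorized embedding parameters:   {factorized:,}")
print(f"reduction factor:                  {full / factorized:.2f}")
```

Under these assumptions the embedding block shrinks from roughly 23.0 M to 3.9 M parameters. Cross-layer parameter sharing reduces the encoder size further and is not counted here.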
2. Related Works
3. Materials and Methods
3.1. Dataset and Preprocessing
3.2. Proposed Model Architecture
3.2.1. ALBERT
3.2.2. One-Dimensional (1-D) CNN Blocks
3.2.3. Fully Connected Layer
3.3. Hyperparameter Optimization
3.4. Performance Assessment
4. Experimental Results and Discussion
4.1. Validation and Test Studies
4.2. Analysis of Error Cases
4.3. Literature Comparison
4.4. Ablation Studies
5. Conclusions and Future Work
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| NLP | Natural Language Processing |
| DL | Deep Learning |
| ALBERT | A Lite BERT |
| CNN | Convolutional Neural Network |
| RNN | Recurrent Neural Network |
| LSTM | Long Short-Term Memory |
| BiLSTM | Bidirectional Long Short-Term Memory |
| AI | Artificial Intelligence |
| MLM | Masked Language Modeling |
| TPE | Tree-structured Parzen Estimator |
| DCNN | Dilated Convolutional Neural Network |
| LeBERT | Lexicon-Enhanced BERT Embedding |
| GRU | Gated Recurrent Unit |
| BiGRU | Bidirectional Gated Recurrent Unit |
| ABCNN | Attention-Based Convolutional Neural Network |
| HAN | Hierarchical Attention Network |
| NSP | Next Sentence Prediction |
| SOP | Sentence Order Prediction |
| FC | Fully Connected |
| ReLU | Rectified Linear Unit |
| TP | True Positive |
| FP | False Positive |
| TN | True Negative |
| FN | False Negative |
| AUC | Area Under Curve |
| GPU | Graphics Processing Unit |
| TPU | Tensor Processing Unit |
| FNR | False Negative Rate |
| FPR | False Positive Rate |
| ML | Machine Learning |
| BoW | Bag of Words |
| TF-IDF | Term Frequency-Inverse Document Frequency |
| SVC | Support Vector Classifier |
References
- Backlinko Team. 15 Online Review Statistics, 2025. Available online: https://backlinko.com/online-review-stats (accessed on 11 July 2025).
- Statista. Statista Research Department and Content Philosophy. Available online: https://www.statista.com/aboutus/our-research-commitment (accessed on 11 July 2025).
- FinancesOnline. 62 Customer Reviews Statistics You Must Learn: 2024 Market Share Analysis & Data. Available online: https://financesonline.com/customer-reviews-statistics (accessed on 13 July 2025).
- Wang, J.; Huang, J.X.; Tu, X.; Wang, J.; Huang, A.J.; Laskar, M.T.R.; Bhuiyan, A. Utilizing BERT for Information Retrieval: Survey, Applications, Resources, and Challenges. ACM Comput. Surv. 2024, 56, 33.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30.
- Bao, T.; Ren, N.; Luo, R.; Wang, B.; Shen, G.; Guo, T. A BERT-Based Hybrid Short Text Classification Model Incorporating CNN and Attention-Based BiGRU. J. Organ. End User Comput. 2021, 33, 1–21.
- Tan, K.L.; Lee, C.P.; Anbananthen, K.S.M.; Lim, K.M. RoBERTa-LSTM: A Hybrid Model for Sentiment Analysis with Transformer and Recurrent Neural Network. IEEE Access 2022, 10, 21517–21525.
- Jain, P.K.; Quamer, W.; Saravanan, V.; Pamula, R. Employing BERT-DCNN with Sentic Knowledge Base for Social Media Sentiment Analysis. J. Ambient Intell. Humaniz. Comput. 2023, 14, 10417–10429.
- Mutinda, J.; Mwangi, W.; Okeyo, G. Sentiment Analysis of Text Reviews Using Lexicon-Enhanced BERT Embedding (LeBERT) Model with Convolutional Neural Network. Appl. Sci. 2023, 13, 1445.
- Zhang, B. A BERT-CNN Based Approach on Movie Review Sentiment Analysis. SHS Web Conf. 2023, 163, 04007.
- Deng, L.; Yin, T.; Li, Z.; Ge, Q. Analysis of the Effectiveness of CNN-LSTM Models Incorporating BERT and Attention Mechanisms in Sentiment Analysis of Data Reviews. In Proceedings of the 2023 4th International Conference on Big Data and Informatization Education (ICBDIE 2023); Atlantis Press: Dordrecht, The Netherlands, 2023; pp. 821–829.
- Kaur, K.; Kaur, P. Improving BERT Model for Requirements Classification by Bidirectional LSTM-CNN Deep Model. Comput. Electr. Eng. 2023, 108, 108699.
- Silitonga, C.A.A.; Dermawan, M.D.; Adeta, F.; Nadia, N. Comparative Study of BERT-CNN, TRANS-BLSTM, and RoBERTa Models for Sentiment Analysis. In Proceedings of the 8th International Conference on Information Technology, Information Systems and Electrical Engineering (ICITISEE 2024), Yogyakarta, Indonesia, 29–30 August 2024; pp. 358–363.
- Gupta, B.; Prakasam, P.; Velmurugan, T. Integrated BERT Embeddings, BiLSTM-BiGRU and 1-D CNN Model for Binary Sentiment Classification Analysis of Movie Reviews. Multimed. Tools Appl. 2022, 81, 33067–33086.
- Xiao, H.; Luo, L. An Automatic Sentiment Analysis Method for Short Texts Based on Transformer-BERT Hybrid Model. IEEE Access 2024, 12, 93305–93317.
- Hou, Y.; Li, J.; He, Z.; Yan, A.; Chen, X.; McAuley, J. Bridging Language and Items for Retrieval and Recommendation. arXiv 2024, arXiv:2403.03952.
- Parningotan Manik, L.; Kurniasih, A. On the Role of Text Preprocessing in BERT Embedding-Based DNNs for Classifying Informal Texts. Int. J. Adv. Comput. Sci. Appl. 2022, 13.
- Shukla, D.; Dwivedi, S.K. The Study of the Effect of Preprocessing Techniques for Emotion Detection on Amazon Product Review Dataset. Soc. Netw. Anal. Min. 2024, 14, 191.
- Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations. arXiv 2020, arXiv:1909.11942.
- Khan, S.H.; Iqbal, R. A Comprehensive Survey on Architectural Advances in Deep CNNs: Challenges, Applications, and Emerging Research Directions. arXiv 2025, arXiv:2503.16546.
- Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-Generation Hyperparameter Optimization Framework. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2623–2631.
- İstanbullu, C. Parametre Optimizasyonuna Pratik Bir Çözüm: Optuna. Miuul. Available online: https://miuul.com/blog/parametre-optimizasyonuna-pratik-bir-cozum-optuna (accessed on 26 July 2025).
- Vaibhav, J. Performance Metrics: Confusion Matrix, Precision, Recall, and F1 Score. Towards Data Science. Available online: https://towardsdatascience.com/performance-metrics-confusion-matrix-precision-recall-and-f1-score-a8fe076a2262 (accessed on 16 August 2025).
- Priya Kamath, B.; Geetha, M.; Acharya, U.D.; Singh, D.; Rao, A.; Rai, S.; Shetty, R. Comprehensive Analysis of Word Embedding Models and Design of Effective Feature Vector for Classification of Amazon Product Reviews. IEEE Access 2025, 13, 25239–25255.
- Zhao, X.; Sun, Y. Amazon Fine Food Reviews with BERT Model. Procedia Comput. Sci. 2022, 208, 401–406.
- Wang, C.; Li, Y.; Wang, Z. A Novel Approach for Text Classification by Combining Pre-Trained BERT Model with CNN Classifier. In Proceedings of the 6th IEEE International Conference on Information Systems and Computer Aided Education (ICISCAE 2023), Dalian, China, 23–25 September 2023; pp. 57–62.
- Ali, H.; Hashmi, E.; Yayilgan Yildirim, S.; Shaikh, S. Analyzing Amazon Products Sentiment: A Comparative Study of Machine and Deep Learning, and Transformer-Based Techniques. Electronics 2024, 13, 1305.
- Ahmed, I. Comparative Study of Sentiment Analysis on Amazon Product Reviews Using Recurrent Neural Network (RNN). Int. J. Adv. Trends Comput. Sci. Eng. 2022, 11, 141–146.


| Parameter | Hyperparameter Value Ranges | Best Parameter Values |
|---|---|---|
| Learning rate | 1 × 10⁻⁵, 1 × 10⁻³ altered to 1 × 10⁻⁷, 1 × 10⁻² | 0.000031 |
| Weight decay | 1 × 10−6, 1 × 10−2 | 0.000005 |
| Dropout rate | 0.1, 0.5 | 0.20 |
| Batch size | 16, 32, 64 | 16 |
| Number of epochs | 5, 20 | 5 |
| Number of filters | 64, 128, 192, 256, 320 | 128 |
| Hidden dimension | 50, 600 | 498 |
| Filter sizes | (2, 3, 4), (3, 4, 5), (1, 2, 3), (1, 3, 5) | (1, 2, 3) |
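The search space in the table above can be written down as a small sampling function. The sketch below is not the authors' tuning script. It assumes that the two-value entries (learning rate, weight decay, dropout, epochs, hidden dimension) are lower and upper bounds, that learning rate and weight decay are sampled on a log scale, and that the widened learning-rate range was the one finally used. `suggest_config` relies only on Optuna-style `suggest_*` methods, and `RandomTrial` is a hypothetical stand-in so the example runs without Optuna installed.

```python
import math
import random

# Filter-size options from the table, keyed by strings so the categorical
# choices stay simple values.
FILTER_SIZE_CHOICES = {
    "2-3-4": (2, 3, 4),
    "3-4-5": (3, 4, 5),
    "1-2-3": (1, 2, 3),
    "1-3-5": (1, 3, 5),
}


def suggest_config(trial):
    """Draw one configuration from the tabulated ranges.

    `trial` only needs Optuna-style suggest_float / suggest_int /
    suggest_categorical methods.
    """
    key = trial.suggest_categorical("filter_sizes", list(FILTER_SIZE_CHOICES))
    return {
        # Assumption: log-scale sampling over the widened range.
        "learning_rate": trial.suggest_float("learning_rate", 1e-7, 1e-2, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 1e-6, 1e-2, log=True),
        "dropout": trial.suggest_float("dropout", 0.1, 0.5),
        "batch_size": trial.suggest_categorical("batch_size", [16, 32, 64]),
        # Assumption: "5, 20" and "50, 600" are inclusive integer ranges.
        "epochs": trial.suggest_int("epochs", 5, 20),
        "num_filters": trial.suggest_categorical("num_filters", [64, 128, 192, 256, 320]),
        "hidden_dim": trial.suggest_int("hidden_dim", 50, 600),
        "filter_sizes": FILTER_SIZE_CHOICES[key],
    }


class RandomTrial:
    """Hypothetical stand-in for an Optuna trial: uniform sampling, no TPE."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def suggest_float(self, name, low, high, log=False):
        if log:
            value = math.exp(self._rng.uniform(math.log(low), math.log(high)))
            return min(max(value, low), high)  # guard against rounding at the bounds
        return self._rng.uniform(low, high)

    def suggest_int(self, name, low, high):
        return self._rng.randint(low, high)

    def suggest_categorical(self, name, choices):
        return self._rng.choice(choices)


config = suggest_config(RandomTrial(seed=42))
print(config)
```

With Optuna available, the same function could be called inside an objective, for example `study = optuna.create_study(direction="maximize")` followed by `study.optimize(lambda t: train_and_score(suggest_config(t)), n_trials=...)`, where `train_and_score` is a hypothetical function that trains the model and returns the validation score.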

| Metric | Fold-1 | Fold-2 | Fold-3 | Fold-4 | Fold-5 | Average of 5-Fold |
|---|---|---|---|---|---|---|
| Accuracy | 0.9302 | 0.9299 | 0.9328 | 0.9309 | 0.9301 | 0.9308 |
| F1-score | 0.9282 | 0.9288 | 0.9319 | 0.9300 | 0.9289 | 0.9296 |
| Precision | 0.9404 | 0.9392 | 0.9425 | 0.9422 | 0.9415 | 0.9412 |
| Recall | 0.9164 | 0.9186 | 0.9215 | 0.9181 | 0.9167 | 0.9182 |
| AUC | 0.9796 | 0.9795 | 0.9796 | 0.9803 | 0.9795 | 0.9797 |
| Confusion Matrix | ||||||
| TP | 9030 | 9135 | 9190 | 9177 | 9136 | 45,668 |
| TN | 9564 | 9454 | 9456 | 9431 | 9456 | 47,361 |
| FP | 575 | 591 | 561 | 563 | 568 | 2855 |
| FN | 824 | 810 | 783 | 819 | 830 | 4066 |
| FNR | 0.0836 | 0.0814 | 0.0785 | 0.0819 | 0.0833 | 0.0818 |
| FPR | 0.0567 | 0.0588 | 0.0560 | 0.0563 | 0.0567 | 0.0569 |

| Metric | Fold-1 | Fold-2 | Fold-3 | Fold-4 | Fold-5 | Average of 5-Fold |
|---|---|---|---|---|---|---|
| Accuracy | 0.9300 | 0.9317 | 0.9311 | 0.9314 | 0.9325 | 0.9313 |
| F1-score | 0.9294 | 0.9312 | 0.9305 | 0.9308 | 0.9308 | 0.9305 |
| Precision | 0.9404 | 0.9430 | 0.9406 | 0.9420 | 0.9412 | 0.9414 |
| Recall | 0.9187 | 0.9197 | 0.9205 | 0.9199 | 0.9205 | 0.9199 |
| AUC | 0.9787 | 0.9797 | 0.9802 | 0.9810 | 0.9805 | 0.9800 |
| Confusion Matrix | ||||||
| TP | 9214 | 9241 | 9211 | 9221 | 9067 | 45,954 |
| TN | 9377 | 9383 | 9402 | 9398 | 9573 | 47,133 |
| FP | 584 | 559 | 582 | 568 | 566 | 2859 |
| FN | 815 | 807 | 795 | 803 | 783 | 4003 |
| FNR | 0.0813 | 0.0803 | 0.0795 | 0.0801 | 0.0795 | 0.0801 |
| FPR | 0.0586 | 0.0562 | 0.0583 | 0.0570 | 0.0558 | 0.0572 |
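The rate-based rows of the table directly above follow from the confusion-matrix counts through the standard definitions. The sketch below recomputes them for the Fold-1 column. The class and method names are illustrative and not part of the original study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # true positives
    tn: int  # true negatives
    fp: int  # false positives
    fn: int  # false negatives

    def metrics(self) -> dict:
        precision = self.tp / (self.tp + self.fp)
        recall = self.tp / (self.tp + self.fn)
        total = self.tp + self.tn + self.fp + self.fn
        return {
            "accuracy": (self.tp + self.tn) / total,
            "precision": precision,
            "recall": recall,
            "f1": 2 * precision * recall / (precision + recall),
            "fnr": self.fn / (self.fn + self.tp),  # equals 1 - recall
            "fpr": self.fp / (self.fp + self.tn),
        }


# Fold-1 counts copied from the table directly above.
fold1 = ConfusionCounts(tp=9214, tn=9377, fp=584, fn=815)
fold1_metrics = fold1.metrics()
for name, value in fold1_metrics.items():
    print(f"{name}: {value:.4f}")
```

Rounded to four decimals, the output is 0.9300 (accuracy), 0.9404 (precision), 0.9187 (recall), 0.9294 (F1-score), 0.0813 (FNR), and 0.0586 (FPR), which matches the Fold-1 column.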

| Sample Text From Dataset | Label | Model Prediction | Why Does the Model Produce Incorrect Results? (Analysis and Defense) |
|---|---|---|---|
| 1. cute but run small sadly this dress be way too small for give to a friend who wear a medium it fit she perfectly | 1 | 0 | Mixed Sentiment: The model reads “sadly” and “too small” as negative (0), whereas the later phrase “fit she perfectly” signals a positive (1) outcome. The model has difficulty resolving this contrast. |
| 2. have to return it i like the idea and want to love the dress but it be too tight around the next to be comfortable in it have to return | 1 | 0 | Label Noise: The text contains clearly negative phrases such as “return,” “too tight,” and “want to love… but.” The model therefore predicts 0, while the dataset labels the review as 1, which points to a labeling error. |
| 3. do not fit true to size the shirt itself be beautiful i order a x because i know they run small but when it come today it be definetly not true to size it fit like a xl i have such high hope | 0 | 1 | Misleading Positive Words: Although the dataset label is Negative (0), the model likely predicts Positive (1) because the text contains strongly positive expressions such as “beautiful” and “high hope.” |

| Ref. No | Models | Dataset Length | Class Type | Dataset | Accuracy (%) |
|---|---|---|---|---|---|
| [24] | BERT-TF-IDF | 100,000 | Binary | Amazon product reviews | 88.00 |
| [25] | BERT base | 75,000 | Multi-class | Amazon fine food reviews | 79.82 |
| [26] | BERT-CNN | - | Binary | Amazon product reviews | 92.10 |
| [27] | LR, BiLSTM, BERT base | 400,000 | Triple | Amazon consumer reviews | 86.10, 87.10, 89.00 |
| [28] | CNN, RNN, LSTM, GRU | Data1: 20,742; Data2: 66,666; Data3: 49,870 | Binary | Amazon product reviews | 85.00, 71.00, 70.00 |
| [9] | LeBERT-CNN | 70,000 | Binary | Amazon product reviews | 82.40 |
| Proposed model | ALBERT-Cascade CNN (average of 5-fold) | 99,949 | Binary | Amazon fashion reviews (2023) | 93.13 |

| Model Type | Params (M) | Model Size in RAM (MB) | Training Time (s/epoch(M)) | Peak VRAM (MB) | Accuracy Score | Inference Time (s) |
|---|---|---|---|---|---|---|
| ALBERT | 21 | 81.89 | 18.939587 | 550.55 | 0.736667 | 2.31 |
| BERT | 119 | 455.56 | 17.069057 | 754.36 | 0.840000 | 2.06 |
| RoBERTa | 140 | 536.00 | 17.489917 | 835.43 | 0.813333 | 1.96 |
| LeBERT | 199 | 455.56 | 15.688810 | 754.36 | 0.853333 | 1.89 |
| DistilBERT | 76 | 291.07 | 9.180782 | 589.87 | 0.866667 | 1.10 |

| Models | Accuracy | F1-Score | Precision | Recall | AUC |
|---|---|---|---|---|---|
| ALBERT-base | 0.9164 | 0.9180 | 0.8949 | 0.9424 | 0.9780 |
| ALBERT-MaxPooling | 0.9090 | 0.9031 | 0.9586 | 0.8536 | 0.9730 |
| ALBERT-Cascade CNN | 0.9318 | 0.9301 | 0.9468 | 0.9140 | 0.9804 |
| ALBERT-LSTM | 0.9299 | 0.9281 | 0.9457 | 0.9112 | 0.9778 |
| ALBERT-BILSTM | 0.9276 | 0.9277 | 0.9205 | 0.9350 | 0.9795 |
| ALBERT-CNN-LSTM | 0.9303 | 0.9283 | 0.9493 | 0.9082 | 0.9786 |
| ALBERT-CNN-BILSTM | 0.9312 | 0.9306 | 0.9328 | 0.9283 | 0.9787 |
| ALBERT-LSTM-CNN | 0.9322 | 0.9304 | 0.9491 | 0.9124 | 0.9790 |
| ALBERT-BLSTM-CNN | 0.9315 | 0.9301 | 0.9433 | 0.9173 | 0.9786 |
| ALBERT-MHAttention | 0.9222 | 0.9218 | 0.9202 | 0.9235 | 0.9768 |
| ALBERT-LSTM-MHAttention-CNN | 0.9269 | 0.9257 | 0.9353 | 0.9162 | 0.9764 |
| ALBERT-BILSTM-MHAttention-CNN | 0.9297 | 0.9289 | 0.9339 | 0.9239 | 0.9783 |
| ALBERT-CNN-LSTM-MHAttention | 0.9287 | 0.9280 | 0.9311 | 0.9249 | 0.9792 |
| ALBERT-CNN-BILSTM-MHAttention | 0.9246 | 0.9240 | 0.9251 | 0.9229 | 0.9770 |

| Models | Accuracy | F1-Score | Precision | Recall | AUC |
|---|---|---|---|---|---|
| ALBERT-base | 0.9148 | 0.9175 | 0.9000 | 0.9356 | 0.9762 |
| ALBERT-MaxPooling | 0.9031 | 0.8985 | 0.9561 | 0.8475 | 0.9709 |
| ALBERT-Cascade CNN | 0.9280 | 0.9272 | 0.9491 | 0.9064 | 0.9780 |
| ALBERT-LSTM | 0.9235 | 0.9228 | 0.9438 | 0.9026 | 0.9759 |
| ALBERT-BILSTM | 0.9230 | 0.9243 | 0.9203 | 0.9283 | 0.9775 |
| ALBERT-CNN-LSTM | 0.9264 | 0.9256 | 0.9482 | 0.9040 | 0.9764 |
| ALBERT-CNN-BILSTM | 0.9225 | 0.9227 | 0.9315 | 0.9141 | 0.9764 |
| ALBERT-LSTM-CNN | 0.9275 | 0.9267 | 0.9494 | 0.9050 | 0.9780 |
| ALBERT-BLSTM-CNN | 0.9261 | 0.9256 | 0.9432 | 0.9087 | 0.9762 |
| ALBERT-MHAttention | 0.9163 | 0.9169 | 0.9216 | 0.9123 | 0.9750 |
| ALBERT-LSTM-MHAttention-CNN | 0.9208 | 0.9205 | 0.9357 | 0.9058 | 0.9741 |
| ALBERT-BILSTM-MHAttention-CNN | 0.9252 | 0.9251 | 0.9375 | 0.9131 | 0.9792 |
| ALBERT-CNN-LSTM-MHAttention | 0.9227 | 0.9232 | 0.9285 | 0.9180 | 0.9757 |
| ALBERT-CNN-BILSTM-MHAttention | 0.9202 | 0.9208 | 0.9255 | 0.9160 | 0.9740 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Abbas, I.M.; Atacak, İ.; Toklu, S.; Barışçı, N.; Doğru, İ.A. A Hybrid Deep Learning Model Based on Local and Global Features for Amazon Product Reviews: An Optimal ALBERT-Cascade CNN Approach. Appl. Sci. 2026, 16, 25. https://doi.org/10.3390/app16010025

