Optimizing Multiclass Classification Using Convolutional Neural Networks with Class Weights and Early Stopping for Imbalanced Datasets
Abstract
1. Introduction
2. Related Work
3. A CNN with Early Stopping and Class Weighting
- Step 1: Data Collection
- This step involves identifying, gathering, and interpreting relevant data from the source(s).
- Step 2: Data Preparation
- Data preparation is required to enhance input data quality and model generalization. Several methods can be used to prepare the data, such as image rescaling, data augmentation, resizing to a target size, and class weighting.
- The general formula of the class weight is the following (a small code sketch of this computation appears after the definitions):
- $w_c = N / (C \times n_c)$  (1)
- $w_c$: weight for class $c$;
- $N$: total number of samples in the dataset;
- $n_c$: number of samples in class $c$;
- $C$: total number of classes.
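To make Equation (1) concrete, the following is a minimal sketch (not the authors' code) of how per-class weights could be computed from class counts; the example counts are the nitrogen training counts reported in the numerical experiment below.

```python
import numpy as np

def compute_class_weights(labels):
    """Return {class: N / (C * n_c)} as in Equation (1)."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)  # n_c for each class
    total, num_classes = labels.size, classes.size            # N and C
    return {int(c): total / (num_classes * n) for c, n in zip(classes, counts)}

# Illustrative labels: 266 deficiency, 327 normal, 21 excess (nitrogen training set)
labels = [0] * 266 + [1] * 327 + [2] * 21
print(compute_class_weights(labels))
# The minority class (excess) receives the largest weight: {0: 0.77, 1: 0.63, 2: 9.75}
```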
- Step 3: CNN Modeling
- Architecture Selection: A CNN architecture (such as ResNet, VGG, or a custom CNN) is selected. The chosen architecture should be able to learn the distinguishing features of each class and extract them effectively from the dataset. Properly selecting the architecture, output layer structure, and loss function is critical for multiclass classification, since this configuration determines how efficiently the model can detect differences between classes.
- Training Procedure: The model is trained on the given dataset while performance metrics (accuracy and loss) are continuously monitored. Minority-class performance receives particular attention because some classes may be overrepresented in the dataset, which leads to class imbalance; this monitoring keeps the model from becoming skewed toward the majority classes.
- Class Weighting: Class weights (Equation (1)) are applied to the loss function to emphasize minority classes during training. This ensures that the model can handle imbalanced datasets by prioritizing the accurate classification of underrepresented classes, and it reduces the likelihood that minority classes are overlooked during training.
- Model Generalization: The CNN uses early stopping to monitor the validation loss during training. Training stops when the validation loss worsens or stops improving, which prevents over-specialization in the majority classes. This stopping condition helps the model learn features of both majority and minority classes more effectively while avoiding overfitting or underfitting, improving its generalization capability.
- Model Optimization: Adam (adaptive moment estimation) is a popular optimization algorithm for CNNs that combines the benefits of the Momentum and RMSprop algorithms. Momentum keeps track of past gradients to accelerate convergence along the direction of steepest descent, while RMSprop adjusts each parameter's learning rate based on the magnitude of recent gradients to ensure stability and fast convergence. Model optimization aims to minimize the loss function by adjusting the model's parameters during training, which may improve the model's performance on the training dataset. The Adam update proceeds as follows (a code sketch of these steps appears after the list):
- Initialize the model parameters ($\theta$), learning rate ($\alpha$), and hyper-parameters ($\beta_1$, $\beta_2$, and $\varepsilon$).
- Compute the gradients ($g_t$) of the loss function ($L$) with respect to the model parameters: $g_t = \nabla_\theta L(\theta_{t-1})$.
- Update the first-moment estimate ($m_t$): $m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$.
- Update the second-moment estimate ($v_t$): $v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$.
- Correct the bias in the first ($\hat{m}_t$) and second ($\hat{v}_t$) moment estimates for the current iteration ($t$): $\hat{m}_t = m_t / (1 - \beta_1^t)$ and $\hat{v}_t = v_t / (1 - \beta_2^t)$.
- Compute the adaptive learning rate ($\alpha_t$): $\alpha_t = \alpha / (\sqrt{\hat{v}_t} + \varepsilon)$.
- Update the model parameters using the adaptive learning rate: $\theta_t = \theta_{t-1} - \alpha_t \hat{m}_t$.
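As an illustration only (not the authors' implementation), the sketch below applies the update steps above with NumPy; the toy loss function and hyper-parameter values are placeholders.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam parameter update following the steps listed above."""
    m = beta1 * m + (1 - beta1) * grad            # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                  # bias-corrected second moment
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)  # parameter update
    return theta, m, v

# Toy example: minimize L(theta) = theta^2, whose gradient is 2 * theta
theta, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
for t in range(1, 501):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, alpha=0.05)
print(theta)  # converges toward 0
```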
- Step 4: Evaluation
- TP = True positive (correctly classified as positive)
- TN = True negative (correctly classified as negative)
- FP = False positive (incorrectly classified as positive)
- FN = False negative (incorrectly classified as negative)
- The categorical cross-entropy loss is defined as $L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \log(p_{i,c})$, where
- $N$ is the total number of samples;
- $C$ is the number of classes (in this case, four: nitrogen, potassium, boron, and magnesium);
- $y_{i,c}$ is a binary indicator (0 or 1) of whether class label $c$ is the correct classification for observation $i$;
- $p_{i,c}$ is the predicted probability of class $c$ for observation $i$. A small evaluation sketch follows these definitions.
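The sketch below (illustrative only, not the authors' evaluation script) computes accuracy and the categorical cross-entropy defined above from one-hot labels and predicted class probabilities.

```python
import numpy as np

def evaluate(y_true, y_pred, eps=1e-12):
    """Return (accuracy, categorical cross-entropy) for one-hot labels and predicted probabilities."""
    accuracy = np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1))
    cross_entropy = -np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1))
    return accuracy, cross_entropy

# Toy example with N = 3 samples and C = 3 classes
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_pred = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.3, 0.3, 0.4]])
print(evaluate(y_true, y_pred))  # (1.0, ~0.50)
```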
4. Numerical Experiment
- (i) Data Collection
- (ii) Data Preparation
- Image Rescaling: The ImageDataGenerator tool rescaled images to the range [0, 1] to normalize the pixel values. This step matters because raw pixel values usually fall within the [0, 255] range; normalizing the data helped speed up the model’s convergence.
- Data Augmentation: The training images were augmented with zoom, shear transformations, and horizontal flipping. This artificially increased the dataset’s size and diversity, which helped avoid overfitting and made the model more resilient to variations in the image data.
- Target Size: Every image was resized to 224 × 224 pixels, ensuring that the CNN received consistently sized inputs throughout the dataset.
- Class Weights: Because the training dataset contained imbalanced classes, class_weight ensured that the model did not disproportionately favor the classes with more samples. Class weights produced more balanced predictions by penalizing the model more heavily when it misclassified a minority class. A minimal preparation sketch follows this item.
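A minimal sketch of how this preparation could be wired together with Keras; the directory paths, class-folder layout, and augmentation ranges are assumptions, since the paper does not list exact values here.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescale to [0, 1]; apply zoom, shear, and horizontal-flip augmentation to training images only
train_datagen = ImageDataGenerator(rescale=1.0 / 255, zoom_range=0.2,
                                   shear_range=0.2, horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1.0 / 255)

# Hypothetical layout: one sub-folder per class (deficiency / normal / excess)
train_gen = train_datagen.flow_from_directory("data/nitrogen/train", target_size=(224, 224),
                                              class_mode="categorical")
test_gen = test_datagen.flow_from_directory("data/nitrogen/test", target_size=(224, 224),
                                            class_mode="categorical")

# Class weights following Equation (1): N / (C * n_c)
counts = np.bincount(train_gen.classes)
class_weights = {c: train_gen.classes.size / (len(counts) * n) for c, n in enumerate(counts)}
```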
- (iii) Modeling
- Model Architecture: A three-layer CNN was trained on the palm oil leaf images. This architecture enabled the model to learn and extract complex features for each nutrient status class (deficiency, normal, and excess) in the leaves.
- Output Layer and Activation: The model used a SoftMax activation function in the output layer to produce a probability distribution across multiple classes, making it suitable for multiclass classification.
- Epochs: The CNN was trained for 200 epochs.
- Early Stopping: ModelCheckpoint saved the best model based on validation performance, and early stopping ensured that the model retained only the best weights while preventing unnecessary computation.
- Adam Optimizer: The model was compiled with the Adam optimizer, whose adaptive learning-rate adjustment allowed the model to converge quickly on large datasets.
- Loss Function: The divergence between the actual labels and the predicted probability distribution was computed using categorical cross-entropy loss. This loss function is well suited to multiclass classification tasks because it penalizes erroneous predictions across multiple classes. A minimal training sketch combining these settings follows this list.
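The sketch below ties these choices together, reusing the generators and class_weights from the earlier preparation sketch. It is illustrative only: the filter counts, dense layer size, checkpoint filename, and patience value are assumptions, as the paper does not specify them here.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# Three convolutional blocks followed by a softmax output over the three classes
model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(64, (3, 3), activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(128, (3, 3), activation="relu"), layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(3, activation="softmax"),  # deficiency, normal, excess
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

callbacks = [
    EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
    ModelCheckpoint("best_model.keras", monitor="val_loss", save_best_only=True),
]
history = model.fit(train_gen, validation_data=test_gen, epochs=200,
                    class_weight=class_weights, callbacks=callbacks)
```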
- (iv) Evaluation
5. Results and Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Jha, A.; Dave, M.; Madan, S. Comparison of binary class and multiclass classifier using different data mining classification techniques. In Proceedings of the International Conference on Advancements in Computing & Management (ICACM), Jaipur, India, 13–14 April 2019. [Google Scholar]
- Sevastyanov, L.A.; Shchetinin, E.Y. On methods for improving the accuracy of Multiclass classification on imbalanced data. ITTMM 2020, 20, 70–82. [Google Scholar]
- Fahmi, A.; Muqtadiroh, F.A.; Purwitasari, D.; Sumpeno, S.; Purnomo, M.H. A Multiclass classification of Dengue Infection Cases with Feature Selection in Imbalanced Clinical Diagnosis Data. Int. J. Intell. Eng. Syst. 2022, 15, 176–192. [Google Scholar]
- Das, S.; Datta, S.; Chaudhuri, B.B. Handling data irregularities in classification: Foundations, trends, and future challenges. Pattern Recognit. 2018, 81, 674–693. [Google Scholar] [CrossRef]
- Chakraborty, S.; Dey, L. Introduction to Classification. In Multi-Objective, Multi-Class and Multi-Label Data Classification with Class Imbalance: Theory and Practices; Springer Nature: Singapore, 2024; pp. 1–21. [Google Scholar]
- Cavus, M.; Biecek, P. An Experimental Study on the Rashomon Effect of Balancing Methods in Imbalanced Classification. arXiv 2024, arXiv:2405.01557. [Google Scholar]
- Fayaz, S.; Ahmad Shah, S.Z.; ud din, N.M.; Gul, N.; Assad, A. Advancements in Data Augmentation and Transfer Learning: A Comprehensive Survey to Address Data Scarcity Challenges. Recent Adv. Comput. Sci. Commun. (Former. Recent Pat. Comput. Sci.) 2024, 17, 14–35. [Google Scholar] [CrossRef]
- Khan, A.A.; Chaudhari, O.; Chandra, R. A review of ensemble learning and data augmentation models for class imbalanced problems: Combination, implementation and evaluation. Expert Syst. Appl. 2023, 244, 122778. [Google Scholar] [CrossRef]
- Hort, M.; Chen, Z.; Zhang, J.M.; Harman, M.; Sarro, F. Bias mitigation for machine learning classifiers: A comprehensive survey. ACM J. Responsible Comput. 2024, 1, 1–52. [Google Scholar] [CrossRef]
- Wang, Y.; Rosli, M.M.; Musa, N.; Li, F. Multiclass Imbalanced Data Classification: A Systematic Mapping Study. Eng. Technol. Appl. Sci. Res. 2024, 14, 14183–14190. [Google Scholar] [CrossRef]
- Tanha, J.; Abdi, Y.; Samadi, N.; Razzaghi, N.; Asadpour, M. Boosting methods for Multiclass imbalanced data classification: An experimental review. J. Big Data 2020, 7, 1–47. [Google Scholar] [CrossRef]
- Yang, Y.; Khorshidi, H.A.; Aickelin, U. A review on over-sampling techniques in classification of Multiclass imbalanced datasets: Insights for medical problems. Front. Digit. Health 2024, 6, 1430245. [Google Scholar] [CrossRef]
- Kumar, G.R.; Thippanna, G.; Kumar, K.P. Unveiling the Effectiveness of Multi-Classification Algorithms: A Comprehensive Investigation. J. Comput. Sci. Syst. Softw. 2024, 1, 13–17. [Google Scholar] [CrossRef]
- Khan, A.; Sohail, A.; Zahoora, U.; Qureshi, A.S. A survey of the recent architectures of deep convolutional neural networks. Artif. Intell. Rev. 2020, 53, 5455–5516. [Google Scholar] [CrossRef]
- Chen, L.; Li, S.; Bai, Q.; Yang, J.; Jiang, S.; Miao, Y. Review of image classification algorithms based on convolutional neural networks. Remote Sens. 2021, 13, 4712. [Google Scholar] [CrossRef]
- Naranjo-Torres, J.; Mora, M.; Hernández-García, R.; Barrientos, R.J.; Fredes, C.; Valenzuela, A. A review of convolutional neural network applied to fruit image processing. Appl. Sci. 2020, 10, 3443. [Google Scholar] [CrossRef]
- Srivastava, S.; Divekar, A.V.; Anilkumar, C.; Naik, I.; Kulkarni, V.; Pattabiraman, V. Comparative analysis of deep learning image detection algorithms. J. Big Data 2021, 8, 66. [Google Scholar] [CrossRef]
- Alkhawaldeh, I.M.; Albalkhi, I.; Naswhan, A.J. Challenges and limitations of synthetic minority oversampling techniques in machine learning. World J. Methodol. 2023, 13, 373. [Google Scholar] [CrossRef] [PubMed]
- Sharma, S.; Sharma, S.; Athaiya, A. Activation functions in neural networks. Towards Data Sci. 2017, 6, 310–316. [Google Scholar] [CrossRef]
- Yin, B.; Corradi, F.; Bohté, S.M. Accurate and efficient time-domain classification with adaptive spiking recurrent neural networks. Nat. Mach. Intell. 2021, 3, 905–913. [Google Scholar] [CrossRef]
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
- Bansal, A.; Verma, A.; Singh, S.; Jain, Y. Combination of oversampling and undersampling techniques on imbalanced datasets. In International Conference on Innovative Computing and Communications: Proceedings of ICICC 2022; Springer Nature: Singapore, 2022; Volume 3, pp. 647–656. [Google Scholar]
- Mienye, I.D.; Sun, Y. Performance analysis of cost-sensitive learning methods with application to imbalanced medical data. Inform. Med. Unlocked 2021, 25, 100690. [Google Scholar] [CrossRef]
- Fernando, K.R.M.; Tsokos, C.P. Dynamically weighted balanced loss: Class imbalanced learning and confidence calibration of deep neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 2940–2951. [Google Scholar] [CrossRef] [PubMed]
- Ghosh, K.; Bellinger, C.; Corizzo, R.; Branco, P.; Krawczyk, B.; Japkowicz, N. The class imbalance problem in deep learning. Mach. Learn. 2024, 113, 4845–4901. [Google Scholar] [CrossRef]
- Deng, M.; Guo, Y.; Wang, C.; Wu, F. An oversampling method for multi-class imbalanced data based on composite weights. PLoS ONE 2021, 16, e0259227. [Google Scholar] [CrossRef] [PubMed]
- Johnson, J.M.; Khoshgoftaar, T.M. Survey on deep learning with class imbalance. J. Big Data 2019, 6, 1–54. [Google Scholar] [CrossRef]
- Ruby, U.; Yendapalli, V. Binary cross entropy with deep learning technique for image classification. Int. J. Adv. Trends Comput. Sci. Eng. 2020, 9, 5393–5397. [Google Scholar]
- Rusiecki, A. Trimmed categorical cross-entropy for deep learning with label noise. Electron. Lett. 2019, 55, 319–320. [Google Scholar] [CrossRef]
- Bakirarar, B.; Elhan, A.H. Class Weighting Technique to Deal with Imbalanced Class Problem in Machine Learning: Methodological Research. Türkiye Klin. Biyoistatistik 2023, 15, 19–29. [Google Scholar] [CrossRef]
- Tian, Y.; Su, D.; Lauria, S.; Liu, X. Recent advances on loss functions in deep learning for computer vision. Neurocomputing 2022, 497, 129–158. [Google Scholar] [CrossRef]
- Szandała, T. Review and comparison of commonly used activation functions for deep neural networks. In Bio-Inspired Neurocomputing; Springer: Berlin/Heidelberg, Germany, 2021; pp. 203–224. [Google Scholar]
- Zhu, Q.; He, Z.; Zhang, T.; Cui, W. Improving classification performance of softmax loss function based on scalable batch-normalization. Appl. Sci. 2020, 10, 2950. [Google Scholar] [CrossRef]
- Yao, Y.; Wang, J.; Zhang, Y. A systematic review of early stopping in deep learning. Appl. Sci. 2019, 9, 2348. [Google Scholar]
- Murat, F.; Yildirim, O.; Talo, M.; Baloglu, U.B.; Demir, Y.; Acharya, U.R. Application of deep learning techniques for heartbeats detection using ECG signals-analysis and review. Comput. Biol. Med. 2020, 120, 103726. [Google Scholar] [CrossRef] [PubMed]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Guo, M.-H.; Xu, T.-X.; Liu, J.-J.; Liu, Z.-N.; Jiang, P.-T.; Mu, T.-J.; Zhang, S.-H.; Martin, R.R.; Cheng, M.-M. Attention mechanisms in computer vision: A survey. Comp. Vis. Media 2022, 8, 331–368. [Google Scholar] [CrossRef]
- Jamhuri, M. Understanding the Adam Optimization Algorithm: A Deep Dive into the Formulas. Medium. Available online: https://jamhuri.medium.com/understanding-the-adam-optimization-algorithm-a-deep-dive-into-the-formulas-3ac5fc5b7cd3 (accessed on 29 December 2024).
Method | Algorithm | Strengths | Limitation
---|---|---|---
Resampling [18,21] | SMOTE, ADASYN, random oversampling, and random undersampling | |
Ensemble methods [26,27] | Bagging and boosting (e.g., AdaBoost, random forest) | |
Cost-sensitive learning [28,29] | Modified loss functions (e.g., cost-sensitive SVM) | |
Class weighting [30,31] | Adjusted loss functions (e.g., weighted cross-entropy) | |
Nutrients | Set | Deficiency | Normal | Excess | Total
---|---|---|---|---|---
Nitrogen, N | Train | 266 | 327 | 21 | 614
 | Testing | 29 | 36 | 3 | 68
Potassium, K | Train | 219 | 283 | 113 | 615
 | Testing | 24 | 31 | 12 | 67
Boron, B | Train | 59 | 175 | 380 | 614
 | Testing | 7 | 19 | 42 | 68
Magnesium, Mg | Train | 49 | 281 | 287 | 617
 | Testing | 5 | 30 | 30 | 65
Nutrient | Accuracy (Imbalanced Dataset) | Loss (Imbalanced Dataset) | Accuracy (Class Weighting) | Loss (Class Weighting) | Accuracy (Class Weighting and Early Stopping) | Loss (Class Weighting and Early Stopping)
---|---|---|---|---|---|---
N | 0.6765 | 2.3386 | 0.6912 | 1.8329 | 0.6765 | 0.6256
K | 0.6119 | 2.8003 | 0.7015 | 1.8479 | 0.7015 | 0.7659
B | 0.7059 | 1.9037 | 0.7500 | 1.5133 | 0.6029 | 0.8028
Mg | 0.6923 | 1.9583 | 0.4615 | 1.0975 | 0.4615 | 1.0949
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).