Overcoming Class Imbalance in Incremental Learning Using an Elastic Weight Consolidation-Assisted Common Encoder Approach
Abstract
1. Introduction
Contributions of This Work
2. Literature Review
3. Methodology and Experimental Setup
3.1. Model Architecture
3.2. Elastic Weight Consolidation (EWC) for Knowledge Retention
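The regularized objective takes the standard EWC form of Kirkpatrick et al. [2], reproduced below as the assumed formulation, where $\mathcal{L}_{\text{new}}(\theta)$ is the loss on the current task and $A$ denotes the previous task:

$$
\mathcal{L}(\theta) = \mathcal{L}_{\text{new}}(\theta) + \frac{\lambda}{2} \sum_{i} F_i \left(\theta_i - \theta^{*}_{A,i}\right)^2
$$

where: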
- $\lambda$ is a hyperparameter that controls the strength of the EWC penalty.
- $\theta^{*}_{A,i}$ is the optimal value of parameter $i$ for the previous task.
- $F_i$ is the Fisher Information for each parameter $i$, calculated based on its importance in the previous task. The Fisher Information Matrix (FIM) evaluates the significance of parameters in relation to the likelihood of the observed data. Due to the high computational cost of calculating the full FIM, a diagonal approximation is commonly used. This approximation is typically obtained by accumulating the squared gradients of the loss function with respect to each parameter across the training dataset. The following pseudocode illustrates this computation (Algorithm 1).
Algorithm 1: Diagonal Fisher Approximation

```
# Initialize the diagonal Fisher Information approximation
F = zeros_like(theta)
for x, y in dataset:
    y_pred = model(x)
    loss = cross_entropy(y_pred, y)
    gradients = compute_gradients(loss, theta)
    F += gradients ** 2
F /= len(dataset)
```
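For illustration, a minimal PyTorch-style sketch of how the stored diagonal Fisher values and previous-task parameters enter the loss during new-task training (Section 3.3.2) is given below. The identifiers `fisher`, `theta_star`, and `ewc_lambda` are placeholder names for this sketch, not taken from the authors' implementation:

```python
import torch
import torch.nn.functional as F_nn  # aliased to avoid clashing with the Fisher tensor F above

def ewc_penalty(model, fisher, theta_star, ewc_lambda):
    """Quadratic EWC penalty: (lambda / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    penalty = 0.0
    for name, param in model.named_parameters():
        if name in fisher:  # only parameters with stored Fisher values are constrained
            penalty = penalty + (fisher[name] * (param - theta_star[name]) ** 2).sum()
    return 0.5 * ewc_lambda * penalty

def train_step(model, optimizer, x, y, fisher, theta_star, ewc_lambda):
    """One new-task training step with the EWC regularizer added to the task loss."""
    optimizer.zero_grad()
    logits = model(x)
    loss = F_nn.cross_entropy(logits, y) + ewc_penalty(model, fisher, theta_star, ewc_lambda)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Here `fisher` and `theta_star` are dictionaries keyed by parameter name, produced once per completed task as in Algorithm 1 and then frozen during subsequent training.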
3.3. Training Procedure
3.3.1. Initial Task Training
3.3.2. New Task Training with EWC Penalty
3.3.3. Updating EWC Parameters
3.3.4. New Task Training Without EWC Penalty
3.4. Handling Class Imbalance
- (1) Balanced Loss Function: During training, class weights are set inversely proportional to class frequency, i.e., $w_c \propto 1/f_c$, where $f_c$ is the frequency of class $c$. This weighting reduces the impact of dominant classes while preserving the contribution of minority ones (a minimal sketch of this weighting is given after this list).
- (2) Shared Feature Learning: The common encoder learns generalized features that reduce the dependence on class-specific biases.
- (3) Decoupled Outputs: Task-specific heads allow the model to adapt to new tasks without propagating biases from earlier tasks.
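The inverse-frequency weighting described in item (1) could be implemented as in the following sketch; it assumes a PyTorch classification setup, and the normalization by the mean weight is an illustrative choice rather than the authors' exact scheme:

```python
import numpy as np
import torch

def inverse_frequency_weights(labels, num_classes):
    """Class weights proportional to 1 / class frequency (illustrative normalization)."""
    counts = np.bincount(labels, minlength=num_classes).astype(np.float64)
    counts = np.maximum(counts, 1.0)      # avoid division by zero for absent classes
    weights = 1.0 / counts
    weights = weights / weights.mean()    # keep the overall loss scale roughly unchanged
    return torch.tensor(weights, dtype=torch.float32)

# Usage sketch: pass the weights to the cross-entropy loss of the active task head.
# class_weights = inverse_frequency_weights(train_labels, num_classes)
# criterion = torch.nn.CrossEntropyLoss(weight=class_weights)
```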
3.5. Advantages of the Proposed Model
- Catastrophic Forgetting Mitigation: EWC ensures knowledge retention by protecting critical parameters.
- Class Imbalance Handling: The shared encoder and balanced loss function provide robust solutions for imbalanced data distributions.
- Scalability: Task-specific heads minimize architectural growth, making the model suitable for long-term incremental learning.
- Flexibility and Adaptability: The model balances stability (knowledge retention) with plasticity (learning new tasks).
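To make the shared-encoder/multi-head design summarized above concrete, a minimal PyTorch sketch is shown below; the layer sizes and module names (`IncrementalClassifier`, `add_head`) are illustrative assumptions, not the authors' exact architecture:

```python
import torch.nn as nn

class IncrementalClassifier(nn.Module):
    """Common encoder shared across tasks, with one task-specific output head per task."""
    def __init__(self, input_dim, hidden_dim=128):
        super().__init__()
        self.hidden_dim = hidden_dim
        # Shared feature extractor; this is the part regularized by EWC.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Task-specific heads; a new head is appended for each incremental task.
        self.heads = nn.ModuleList()

    def add_head(self, num_classes):
        """Attach a new output head when a new set of classes (task) is introduced."""
        self.heads.append(nn.Linear(self.hidden_dim, num_classes))
        return len(self.heads) - 1  # index of the newly added task head

    def forward(self, x, task_id):
        # Shared features feed the head of the requested task.
        return self.heads[task_id](self.encoder(x))
```

In this sketch the encoder is the only component shared between tasks, so it is the part whose parameters need to be protected by the EWC penalty, consistent with the training procedure in Section 3.7.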
3.6. Dataset Preparation
- (1) Initial Training Dataset: For the first experiment, the 10 most frequent classes were removed from the dataset to introduce imbalance and allow the model to focus on the less frequent classes during initial training.
- (2) Incremental Dataset: The second part of the dataset, including the 10 most frequent classes initially excluded, was used for incremental training to test the model’s adaptability and retention of earlier knowledge.
3.7. Experimental Procedure
- (1) Initial Training:
  - (a) The first part of the dataset was used to train the model, initializing the common encoder and the task-specific head for the initial task.
  - (b) After training, the model was saved as the baseline.
- (2) Incremental Training:
  - (a) The second part of the dataset, containing the classes excluded from the first stage, was used for training.
  - (b) During this phase, the common encoder was regularized using the EWC mechanism to preserve critical parameters from the initial training phase.
  - (c) A new task-specific head was added to adapt to the new classes.
  - (d) After incremental training, the updated model was saved.
- (3) Testing:
  - (a) After the initial training, the model was tested on a dataset containing only the classes used during the first task. This evaluation measured the model’s performance on the initially trained classes before new tasks were introduced.
  - (b) After the second training (new task training), the model was tested on a dataset containing all 34 classes, including those from the first and second tasks. This evaluation measured the model’s ability to:
    - (i) retain knowledge of the classes learned during the initial training (retention);
    - (ii) adapt to the newly introduced classes from the second task (adaptability);
    - (iii) perform across all classes in the dataset, balancing old and new knowledge (overall performance).
3.8. Evaluation Metrics
- (1) Accuracy: Measures the overall classification performance of the model across all tasks.
- (2) Class-Balanced Accuracy: Adjusts accuracy for class imbalance by computing the mean accuracy per class.
- (3) Retention Score: Evaluates the model’s ability to retain knowledge of previously learned classes after incremental training.
- (4) Adaptation Score: Assesses the model’s effectiveness in learning new classes during incremental training.
- (5) F1-Score: Provides a balanced measure of precision and recall, particularly useful for imbalanced class distributions.
- (6) Macro and Micro Metrics: Macro metrics compute each class’s metrics (e.g., precision, recall, F1-score) independently and then average them, treating all classes equally regardless of their frequency. Micro metrics aggregate contributions from all classes globally, giving more weight to classes with higher support; this method is suitable for imbalanced datasets (an illustrative computation is given after this list).
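As an illustrative example (an assumption for exposition, not the authors' evaluation code), the macro/micro aggregates and class-balanced accuracy can be computed with scikit-learn as follows:

```python
from sklearn.metrics import precision_recall_fscore_support, balanced_accuracy_score

# y_true, y_pred: integer class labels over the test set (placeholder values here)
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 0]

# Macro averaging: per-class precision/recall/F1, averaged with equal class weight.
macro_p, macro_r, macro_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)

# Micro averaging: aggregate counts globally, so high-support classes dominate.
micro_p, micro_r, micro_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="micro", zero_division=0)

# Class-balanced accuracy: the mean of per-class recall.
cb_acc = balanced_accuracy_score(y_true, y_pred)
```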
4. Results and Analysis
4.1. Overall Performance Trends
4.2. Effect of EWC
4.3. Impact of Class Removal
4.3.1. Removal of Most Frequent Classes:
- (a) Removing the 10 most frequent classes shifted focus toward smaller, underrepresented categories, significantly reducing macro metrics. For example, in the initial training without EWC, the macro recall was only 54.42%, reflecting poor performance on rare classes such as Backdoor_Malware, SqlInjection, and XSS.
- (b) EWC mitigated these effects slightly, improving the macro F1-score from 56.33% (without EWC) to 57.13% (with EWC) in initial training. However, many rare classes still exhibited low recall, indicating persistent challenges in generalization.
4.3.2. Removal of Least Frequent Classes:
4.4. Class-Specific Observations
- (1) High-support classes such as BenignTraffic and DDoS-ICMP_Flood consistently achieved high precision, recall, and F1-scores, dominating the weighted metrics.
- (2) Rare classes such as Backdoor_Malware, SqlInjection, and XSS exhibited near-zero recall and F1-scores in most scenarios, especially without EWC, indicating a need for targeted strategies to improve their representation.
- (3) Moderately frequent classes (e.g., DNS_Spoofing, Recon-OSScan) showed variable performance, benefiting slightly from EWC but requiring additional attention for robust generalization.
5. Incremental Learning Challenges
6. Summary of Micro and Macro Metrics
- (1) Micro Metrics: High micro precision, recall, and F1-scores (above 90%) were consistent across all scenarios, reflecting the dominance of high-support classes.
- (2) Macro Metrics: Macro scores revealed the disproportionate impact of rare classes on overall performance. These metrics varied widely, ranging from 56.09% to 84.57%, indicating that addressing class imbalance is critical.
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. De Lange, M.; Aljundi, R.; Masana, M.; Parisot, S.; Jia, X.; Leonardis, A.; Slabaugh, G.; Tuytelaars, T. A Continual Learning Survey: Defying Forgetting in Classification Tasks. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3366–3385.
2. Kirkpatrick, J.; Pascanu, R.; Rabinowitz, N.; Veness, J.; Desjardins, G.; Rusu, A.A.; Milan, K.; Quan, J.; Ramalho, T.; Grabska-Barwinska, A.; et al. Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. USA 2017, 114, 3521–3526.
3. Shaker, A.; Alesiani, F.; Yu, S.; Yin, W. Bilevel Continual Learning. arXiv 2020, arXiv:2011.01168.
4. Wu, Z.; Tran, H.; Pirsiavash, H.; Kolouri, S. Is Multi-Task Learning an Upper Bound for Continual Learning? In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5.
5. Ebrahimi, S.; Meier, F.; Calandra, R.; Darrell, T.; Rohrbach, M. Adversarial Continual Learning. arXiv 2020, arXiv:2003.09553.
6. Kudithipudi, D.; Aguilar-Simon, M.; Babb, J.; Bazhenov, M.; Blackiston, D.; Bongard, J.; Brna, A.P.; Raja, S.C.; Cheney, N.; Clune, J.; et al. Biological Underpinnings for Lifelong Learning Machines. Nat. Mach. Intell. 2022, 4, 196–210.
7. Nguyen, C.V. Variational Continual Learning. arXiv 2017, arXiv:1710.10628.
8. Chaudhry, A. Efficient Lifelong Learning With a-Gem. arXiv 2018, arXiv:1812.00420.
9. Aljundi, R.; Babiloni, F.; Elhoseiny, M.; Rohrbach, M.; Tuytelaars, T. Memory Aware Synapses: Learning what (not) to forget. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
10. Rebuffi, S.-A.; Kolesnikov, A.; Sperl, G.; Lampert, C.H. iCaRL: Incremental Classifier and Representation Learning. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5533–5542.
11. Wu, Y.; Chen, Y.; Wang, L.; Ye, Y.; Liu, Z.; Guo, Y.; Fu, Y. Large Scale Incremental Learning. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
12. Yang, D.; Zhou, Y.; Hong, X.; Zhang, A.; Wang, W. One-Shot Replay: Boosting Incremental Object Detection via Retrospecting One Object. Proc. AAAI Conf. Artif. Intell. 2023, 37, 3127–3135.
13. Bang, J.; Kim, H.; Yoo, Y.; Ha, J.W.; Choi, J. Rainbow memory: Continual learning with a memory of diverse samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8218–8227.
14. Shin, H.; Lee, J.K.; Kim, J.; Kim, J. Continual Learning with Deep Generative Replay. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 2990–2999.
15. Rusu, A.A.; Rabinowitz, N.C.; Desjardins, G.; Soyer, H.; Kirkpatrick, J.; Kavukcuoglu, K.; Pascanu, R.; Hadsell, R. Progressive Neural Networks. arXiv 2016, arXiv:1606.04671.
16. Hizal, S.; Cavusoglu, U.; Akgun, D. A novel deep learning-based intrusion detection system for IoT DDoS security. Internet Things 2024, 28, 101336.
17. Deng, J.-R.; Hu, J.; Zhang, H.; Wang, Y. Incremental Prototype Tuning for Class Incremental Learning. arXiv 2022, arXiv:2204.03410.
18. Hou, S.; Pan, X.; Loy, C.C.; Wang, Z.; Lin, D. Learning a Unified Classifier Incrementally via Rebalancing. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 831–839.
19. Cha, S.; Cho, S.; Hwang, D.; Hong, S.; Lee, M.; Moon, T. Rebalancing Batch Normalization for Exemplar-Based Class-Incremental Learning. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 20127–20136.
20. Kang, M.; Park, J.; Han, B. Class-Incremental Learning by Knowledge Distillation with Adaptive Feature Consolidation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022.
21. Huang, L.; Cao, X.; Lu, H.; Liu, X. Class-Incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion. In Computer Vision—ECCV 2024, Proceedings of the 18th European Conference, Milan, Italy, 29 September–4 October 2024; Proceedings, Part LIV; Springer: Berlin, Heidelberg, Germany, 2024; pp. 214–231.
22. Wen, H.; Pan, L.; Dai, Y.; Qiu, H.; Wang, L.; Wu, Q.; Li, H. Class Incremental Learning with Multi-Teacher Distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024.
23. Küçükkara, M.Y.; Atban, F.; Bayılmış, C. Quantum-Neural Network Model for Platform Independent DDoS Attack Classification in Cyber Security. Adv. Quantum Technol. 2024, 7, 2400084.
24. Hızal, S.; Akhter, A.F.M.S.; Çavuşoğlu, Ü.; Akgün, D. Blockchain-based IoT security solutions for IDS research centers. Internet Things 2024, 27, 101307.
25. Joseph, K.; Rajasegaran, J.; Khan, S.; Khan, F.; Balasubramanian, V. Incremental object detection via meta-learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 9209–9216.
26. Bayılmış, C.; Ebleme, M.A.; Çavuşoğlu, Ü.; Küçük, K.; Sevin, A. A survey on communication protocols and performance evaluations for Internet of Things. Digit. Commun. Netw. 2022, 8, 1094–1104.
27. Chen, H.; Bajorath, J. Meta-learning for transformer-based prediction of potent compounds. Sci. Rep. 2023, 13, 16145.
28. Zhang, W.; Gu, X. Few-shot class incremental learning via efficient prototype replay and calibration. Entropy 2023, 25, 776.
29. Tabassum, A.; Erbad, A.; Mohamed, A.; Guizani, M. Privacy-preserving distributed IDs using incremental learning for IoT health systems. IEEE Access 2021, 9, 14271–14283.
30. Sun, Z.; Guo, R.; Jin, Z. Intrusion detection method based on active incremental learning in industrial internet of things environment. J. Internet Things 2022, 4, 99–111.
31. Akgün, D.; Hizal, S.; Çavuşoğlu, Ü. A new DDoS attack intrusion detection model based on deep learning for cybersecurity. Comput. Secur. 2022, 117, 102748.
32. Neto, E.C.P.; Dadkhah, S.; Ferreira, R.; Zohourian, A.; Lu, R.; Ghorbani, A.A. CICIoT2023: A Real-Time Dataset and Benchmark for Large-Scale Attacks in IoT Environment. Sensors 2023, 23, 5941.
Approach | Description | Strengths | Limitations | Key References |
---|---|---|---|---|
Regularization-Based | Constrains updates to critical parameters using regularization (e.g., EWC). | -Effectively mitigates catastrophic forgetting. -Preserves knowledge of earlier tasks. | -Struggles with class imbalance. -Prioritizes dominant classes. | [2,3,8,9] |
Replay-Based | Stores or generates data for replay during training (e.g., iCaRL, One-Shot Replay). | -Balances old and new classes. -Enhances incremental learning outcomes. | -Requires significant memory. -Biases in replay selection. -Does not fully address imbalance. | [10,11,12,13,14] |
Architectural-Based | Adds network resources dynamically (e.g., Progressive Neural Networks, PackNet). | -Reduces task interference. -Strong in mitigating catastrophic forgetting. | -Scalability issues. -Increasing network size with more tasks. | [11,15] |
Class Imbalance | Addresses bias towards overrepresented classes using bias correction (e.g., BIC). | -Attempts bias correction. | -Requires full dataset access. -Struggles with severe class imbalance across tasks. | [11,17,18,19] |
Encoder-Based | Uses shared encoders for task-agnostic feature representations, feature distillation, and multi-head architectures. | -Reduces class-specific disparities. -Preserves feature-level knowledge. | -Overlooks parameter importance. -Degradation of critical features over time. | [20,21,22] |
Proposed Method | Combines EWC with shared encoders and task-specific output heads. | -Robust feature learning across tasks. -Mitigates catastrophic forgetting and class imbalance. -Scalable and efficient. | -Sensitive to class distribution changes. -Needs improvement for rare classes. | This work |
Scenario | Accuracy | Macro Precision | Macro Recall | Macro F1-Score |
---|---|---|---|---|
Initial Training (No EWC, Most Frequent Removed) | 0.9063 | 0.7077 | 0.5442 | 0.5633 |
Final Training (No EWC, Most Frequent Removed) | 0.9685 | 0.6301 | 0.5143 | 0.5143 |
Initial Training (No EWC, Least Frequent Removed) | 0.9838 | 0.8714 | 0.8346 | 0.8457 |
Final Training (No EWC, Least Frequent Removed) | 0.9752 | 0.6232 | 0.5546 | 0.5609 |
Initial Training (EWC, Most Frequent Removed) | 0.9099 | 0.6866 | 0.5507 | 0.5713 |
Final Training (EWC, Most Frequent Removed) | 0.8607 | 0.665 | 0.5787 | 0.593 |
Initial Training (EWC, Least Frequent Removed) | 0.9809 | 0.871 | 0.8028 | 0.8201 |
Final Training (EWC, Least Frequent Removed) | 0.9809 | 0.871 | 0.8028 | 0.8201 |
Class | Accuracy (No EWC) | Accuracy (EWC) |
---|---|---|
BenignTraffic | 0.95 | 0.96 |
DDoS-ICMP_Flood | 0.98 | 0.99 |
Backdoor_Malware | 0.02 | 0.05 |
SqlInjection | 0.01 | 0.03 |
XSS | 0 | 0 |
Recon-OSScan | 0.45 | 0.5 |
MITM-ArpSpoofing | 0.7 | 0.75 |
DDoS-UDP_Flood | 0.98 | 0.99 |
Class | Precision (Best) | Recall (Best) | F1-Score (Best) | Precision (Worst) | Recall (Worst) | F1-Score (Worst) |
---|---|---|---|---|---|---|
BenignTraffic | 0.8009 | 0.975 | 0.8777 | 0.5648 | 0.9279 | 0.851 |
DDoS-ICMP_Flood | 1 | 0.999 | 0.9994 | 0.9715 | 0.9736 | 0.9866 |
Backdoor_Malware | 0.4679 | 0.049 | 0.089 | 0 | 0 | 0 |
SqlInjection | 0.4301 | 0.024 | 0.0458 | 0 | 0 | 0 |
XSS | 0 | 0 | 0 | 0 | 0 | 0 |
Scenario | Macro Precision | Macro Recall | Macro F1-Score |
---|---|---|---|
Initial Training (No EWC, Most Frequent Removed) | 0.7077 | 0.5442 | 0.5633 |
Final Training (No EWC, Most Frequent Removed) | 0.6301 | 0.5143 | 0.5143 |
Initial Training (EWC, Most Frequent Removed) | 0.6866 | 0.5507 | 0.5713 |
Final Training (EWC, Most Frequent Removed) | 0.665 | 0.5787 | 0.593 |