Post Hoc Error Correction for Missing Classes in Deep Neural Networks
Abstract
1. Introduction
- It operates after training, requiring no retraining of the base model.
- It can recover completely missing classes excluded from the original training set.
- It is model-agnostic and can work with any pre-trained classifier architecture.
- It maintains performance on the original classes while adding the ability to recognize new categories.
2. Methodology
2.1. The Corrector Model
- Standard classification by the base model f on the set of known classes;
- Identification of new or missing classes by the corrector g.
- Resilience to the appearance of classes that were absent at the training stage;
- The ability to handle imbalanced data by highlighting “rare” or excluded classes;
- The ability to extend the original classifier f into the more general system h (a minimal sketch of this composition follows below).
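To make the composition concrete, here is a minimal sketch of how the combined system h could route between f and g. The function names, the confidence threshold, and the override rule are illustrative assumptions, not the paper's exact decision rule:

```python
def combined_system(f, g, x, missing_class, threshold=0.5):
    """Sketch of the combined system h (names and rule are illustrative).

    f(x) -> label predicted by the base classifier
    g(x) -> corrector's confidence that x belongs to the class
            that was absent from f's training set
    """
    # Let the corrector override the base prediction only when it is
    # sufficiently confident that x belongs to the missing class;
    # otherwise keep the base model's decision untouched.
    if g(x) >= threshold:
        return missing_class
    return f(x)
```

Because g overrides f only above a confidence threshold, already correct predictions are largely preserved, which is precisely what the retention and harm metrics of the next subsections quantify.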
2.2. Metrics for Evaluating the Quality of the Corrector
2.3. Basic One-Class Metrics
2.4. Metrics of “Spillover” and False Positives on Other Classes
2.5. Summary Indicators and Standards
2.6. Interpretation of Metrics
- Retention close to 1 means that the corrector does not spoil already correctly recognized samples of class i;
- Harm measures the share of “collateral damage” among previously correct predictions; the lower the value, the better;
- Recovery shows the corrector’s ability to restore previously misclassified examples of the class;
- Spillover indicates an increase in false-positive assignments to class i (a side effect); one way to formalize these quantities is sketched below.
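Under the assumption that h denotes the corrected system and y(x) the true label, one consistent formalization of these four quantities (the paper's exact notation may differ) is:

```latex
% Assumed notation: y(x) = true label, f = base model, h = corrected system.
\mathrm{Retention}_i =
  \frac{\bigl|\{x : y(x)=i,\ f(x)=i,\ h(x)=i\}\bigr|}
       {\bigl|\{x : y(x)=i,\ f(x)=i\}\bigr|},
\qquad
\mathrm{Harm}_i = 1 - \mathrm{Retention}_i,
\\[4pt]
\mathrm{Recovery}_i =
  \frac{\bigl|\{x : y(x)=i,\ f(x)\neq i,\ h(x)=i\}\bigr|}
       {\bigl|\{x : y(x)=i,\ f(x)\neq i\}\bigr|},
\qquad
\mathrm{Spillover}_i =
  \frac{\bigl|\{x : y(x)\neq i,\ f(x)\neq i,\ h(x)=i\}\bigr|}
       {\bigl|\{x : y(x)\neq i\}\bigr|}.
```

These definitions are consistent with the Retention and Harm tables reported in the Results, where the two quantities sum to one entrywise.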
3. Data
4. Proposed Correction Method
4.1. Base Model
4.2. Corrector Module
- Feature Extraction: Collect hidden representations from intermediate layers of the base model, including convolutional feature maps, LSTM hidden states, attention weights, and pre-classification embeddings.
- Corrector Training: Train a gradient boosting classifier (XGBoost) on these extracted features to learn patterns that distinguish between correctly classified samples and potential errors, particularly for classes excluded during base model training.
- Inference Pipeline: During deployment, the corrector analyzes the base model’s hidden representations and can override predictions when it detects high-confidence patterns of excluded classes (a code sketch of this three-step pipeline follows the list below).
- Computational Efficiency: Training the lightweight corrector is substantially cheaper than retraining the full model.
- Flexibility: The same correction framework is applicable to different base architectures.
- Incremental Learning: The ability to add new class recognition without modifying deployed models.
- Preservation of Existing Performance: Minimal impact on already learned categories.
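As a rough illustration of the three pipeline steps above, the sketch below extracts hidden features with PyTorch forward hooks and fits an XGBoost corrector. The layer names, hyperparameters, and the binary “excluded class” target are assumptions for illustration, not the authors' exact configuration:

```python
import torch
import xgboost as xgb

def extract_features(base_model, layer_names, loader, device="cpu"):
    """Step 1 (sketch): capture hidden representations from chosen
    intermediate layers of the base model via forward hooks."""
    captured, hooks = {}, []
    for name, module in base_model.named_modules():
        if name in layer_names:
            hooks.append(module.register_forward_hook(
                lambda mod, inp, out, key=name: captured.__setitem__(
                    key, out.detach().flatten(1))))
    feats, labels = [], []
    base_model.eval()
    with torch.no_grad():
        for x, y in loader:
            captured.clear()
            base_model(x.to(device))
            feats.append(torch.cat(
                [captured[k] for k in layer_names], dim=1).cpu())
            labels.append(y)
    for h in hooks:
        h.remove()
    return torch.cat(feats).numpy(), torch.cat(labels).numpy()

def train_corrector(features, labels, excluded_class):
    """Step 2 (sketch): a binary XGBoost model that flags samples of
    the class excluded from base-model training (hyperparameters are
    placeholders)."""
    target = (labels == excluded_class).astype("int32")
    clf = xgb.XGBClassifier(n_estimators=200, max_depth=4,
                            learning_rate=0.1, eval_metric="logloss")
    clf.fit(features, target)
    return clf

def corrected_predict(corrector, features, base_logits,
                      excluded_class, threshold=0.5):
    """Step 3 (sketch): override the base prediction when the corrector
    is confident the sample belongs to the excluded class."""
    base_pred = base_logits.argmax(axis=1)
    override = corrector.predict_proba(features)[:, 1] >= threshold
    base_pred[override] = excluded_class
    return base_pred
```

Training only the corrector on pre-extracted features is what keeps the approach lightweight relative to full retraining, as the resource comparison in Section 10 reflects.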
5. Technical Implementation of the Experiment
6. Data Augmentation
- Random horizontal flip with a probability of ;
- Random rotation within ;
- Random resized cropping to a fixed resolution of with a scaling factor sampled from ;
- Color jitter with brightness and contrast variations up to ;
- Conversion to tensor and normalization with mean and standard deviation .
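As a concrete illustration of such a pipeline, here is a torchvision sketch. All numeric values below (flip probability, rotation range, crop size and scale, jitter strength, normalization statistics) are common placeholder defaults, not the paper's reported settings:

```python
from torchvision import transforms

# Placeholder values throughout: chosen as common defaults purely for
# illustration, not the settings reported in the paper.
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=10),
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```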
7. Training Methodology
8. Correction Mechanism
9. Results
9.1. Metrics
9.2. Grad-CAM Analysis of LSTM Model
- Happiness: primarily cheeks and forehead.
- Surprise: eyebrows and lips.
- Fear: eyebrows and cheeks.
- Disgust: eyes and mouth.
- Sadness: eyes and mouth.
- Anger: teeth and eyebrows.
- Neutral: distributed across the face; no single region is dominant.
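For context, below is a minimal sketch of the standard Grad-CAM computation that produces such region-level attributions. The target layer and implementation details are generic assumptions rather than the authors' exact setup:

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx):
    """Minimal Grad-CAM: weight the target layer's activations by the
    spatially pooled gradients of the class score."""
    acts, grads = [], []
    h1 = target_layer.register_forward_hook(
        lambda m, i, o: acts.append(o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.append(go[0]))
    score = model(image.unsqueeze(0))[0, class_idx]
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    a, g = acts[0], grads[0]                    # (1, C, H, W)
    weights = g.mean(dim=(2, 3), keepdim=True)  # pool gradients spatially
    cam = F.relu((weights * a).sum(dim=1))      # (1, H, W)
    cam = cam / (cam.max() + 1e-8)              # normalize to [0, 1]
    return cam.squeeze(0).detach()
```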
10. Computational Efficiency and Resource Requirements
11. Discussion
- Using more compact embeddings;
- Scaling experiments on large samples to increase stability;
- Developing adaptive correctors that take into account the semantic and statistical relationships of classes.
12. Limitations and Future Work
- Performance depends on the discriminative power of the base model’s hidden representations;
- Storage of intermediate features is required during inference;
- Effectiveness varies with the visual distinctiveness of excluded classes.
- Extension to incremental learning with multiple new classes;
- Investigation of feature compression techniques for efficiency;
- Application to other domains such as medical imaging or security systems.
13. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
**Data Split**

| Label | Emotion | Train | Test | Correct | Sum |
|---|---|---|---|---|---|
| 1 | Surprise | 814 | 379 | 426 | 1619 |
| 2 | Fear | 166 | 88 | 101 | 355 |
| 3 | Disgust | 449 | 220 | 208 | 877 |
| 4 | Happiness | 2974 | 1506 | 1477 | 5957 |
| 5 | Sadness | 1216 | 628 | 616 | 2460 |
| 6 | Anger | 435 | 216 | 216 | 867 |
| 7 | Neutral | 1615 | 798 | 791 | 3204 |
| – | All emotions | 7669 | 3835 | 3835 | 15,339 |
**Model Architecture** (channel and feature dimensions filled in from the hyperparameters listed below)

| Block/Section | Layer | Input Dim | Output Dim | Description |
|---|---|---|---|---|
| Conv block | Conv1 | 3 ch | 32 ch | Conv(3×3) + BN + ReLU + MaxPool |
| Conv block | Conv2 | 32 ch | 64 ch | Conv(3×3) + BN + ReLU + MaxPool |
| Conv block | Conv3 | 64 ch | 128 ch | Conv(3×3) + BN + ReLU + MaxPool |
| Recurrent block | Reshape | – | – | Feature maps reshaped into a sequence for the LSTM |
| Recurrent block | LSTM | 128 | 128 | LSTM(hidden = 128), batch_first=True |
| Attention block | MH Attention | 128 | 128 | Multi-head attention with h heads |
| Attention block | Aggregation | 128 | 128 | Mean/weighted pooling or CLS token |
| Classifier | FC1 | 128 | 128 | Linear + ReLU |
| Classifier | Dropout | 128 | 128 | Dropout(p) |
| Classifier | FC2 | 128 | 7 | Output layer |

**Hyperparameters**

| Parameter | Symbol | Value | Description |
|---|---|---|---|
| Input channels | – | 3 | RGB input |
| Conv block channels | – | 32, 64, 128 | Number of filters per conv layer |
| LSTM hidden size | – | 128 | Hidden dimension of the LSTM |
| Attention dimension | – | 128 | Attention embedding size |
| Number of heads | h | 4 | Multi-head attention heads |
| FC hidden size | – | 128 | Fully connected hidden units |
| Dropout rate | p | 0.2 | Dropout probability |
| Number of classes | N | 7 | Output classes |
| Label | ||
|---|---|---|
| 1 | 0.015 | 0.517 |
| 2 | 0.001 | 0.239 |
| 3 | 0.029 | 0.482 |
| 4 | 0.095 | 0.811 |
| 5 | 0.023 | 0.242 |
| 6 | 0.002 | 0.204 |
| 7 | 0.072 | 0.590 |
**Retention** (rows: class label; columns: corrector for the corresponding excluded class)

| Label | Corrector 1 | Corrector 2 | Corrector 3 | Corrector 4 | Corrector 5 | Corrector 6 | Corrector 7 |
|---|---|---|---|---|---|---|---|
| 1 | 1.000 | 1.000 | 1.000 | 0.963 | 0.995 | 0.997 | 0.971 |
| 2 | 0.898 | 1.000 | 0.977 | 0.989 | 1.000 | 0.977 | 1.000 |
| 3 | 1.000 | 1.000 | 1.000 | 0.977 | 1.000 | 1.000 | 0.945 |
| 4 | 0.999 | 0.999 | 0.999 | 1.000 | 0.997 | 0.999 | 0.975 |
| 5 | 0.992 | 1.000 | 0.995 | 0.876 | 1.000 | 1.000 | 0.881 |
| 6 | 0.981 | 0.995 | 1.000 | 0.903 | 0.968 | 1.000 | 0.977 |
| 7 | 0.982 | 0.999 | 0.997 | 0.955 | 0.957 | 1.000 | 1.000 |
| Average | 0.973 | 0.999 | 0.996 | 0.940 | 0.995 | 0.996 | 0.963 |

**Harm** (same layout as the Retention table)

| Label | Corrector 1 | Corrector 2 | Corrector 3 | Corrector 4 | Corrector 5 | Corrector 6 | Corrector 7 |
|---|---|---|---|---|---|---|---|
| 1 | 0.000 | 0.000 | 0.000 | 0.037 | 0.005 | 0.003 | 0.029 |
| 2 | 0.102 | 0.000 | 0.023 | 0.011 | 0.000 | 0.023 | 0.000 |
| 3 | 0.000 | 0.000 | 0.000 | 0.023 | 0.000 | 0.000 | 0.055 |
| 4 | 0.001 | 0.001 | 0.001 | 0.000 | 0.003 | 0.001 | 0.025 |
| 5 | 0.008 | 0.000 | 0.005 | 0.124 | 0.000 | 0.000 | 0.119 |
| 6 | 0.019 | 0.005 | 0.000 | 0.097 | 0.032 | 0.000 | 0.023 |
| 7 | 0.018 | 0.001 | 0.003 | 0.045 | 0.043 | 0.000 | 0.000 |
| Average | 0.021 | 0.001 | 0.005 | 0.048 | 0.012 | 0.004 | 0.036 |
**Per-Class Prediction Quality: 6-Class Model + Corrector vs. Full 7-Class Model** (entries marked “+” lie on the diagonal, where the row class is the one restored by the corresponding corrector)

| Label | Corrector 1 | Corrector 2 | Corrector 3 | Corrector 4 | Corrector 5 | Corrector 6 | Corrector 7 | Model 7 Classes | P |
|---|---|---|---|---|---|---|---|---|---|
| 1 | +0.51 | 0.77 | 0.79 | 0.84 | 0.75 | 0.76 | 0.80 | 0.78 | 0.55 |
| 2 | 0.55 | +0.24 | 0.06 | 0.43 | 0.47 | 0.45 | 0.39 | 0.43 | 0.56 |
| 3 | 0.38 | 0.50 | +0.48 | 0.98 | 0.98 | 0.97 | 0.95 | 0.74 | 0.65 |
| 4 | 0.99 | 0.99 | 0.99 | +0.77 | 0.86 | 0.90 | 0.92 | 0.90 | 0.86 |
| 5 | 0.69 | 0.71 | 0.72 | 0.71 | +0.24 | 0.72 | 0.80 | 0.78 | 0.31 |
| 6 | 0.68 | 0.66 | 0.63 | 0.63 | 0.73 | +0.20 | 0.63 | 0.67 | 0.30 |
| 7 | 0.77 | 0.76 | 0.73 | 0.79 | 0.87 | 0.71 | +0.59 | 0.90 | 0.66 |
| Component | Training Time | Inference Time/Image | Memory Usage |
|---|---|---|---|
| Base Model (CNN-LSTM) | ∼4 h per model | 15–20 ms | ∼45 MB |
| Corrector (XGBoost) | ∼10 min per corrector | <1 ms | 2–3 MB |
| Full Model Retraining | 4–6 h per model | 15–20 ms | ∼45 MB |