ACVM: An Adaptive Combination Validation Mechanism for Long-Tailed Image Recognition
Abstract
1. Introduction
- We replace the independent validation set with K-fold cross-validation, which serves as a more comprehensive and reliable tool for estimating difficulty at both the class and sample levels.
- We harmonize the learning process of the main model using the per-sample probability distributions produced by all sub-models, which reflect how difficult each sample is. This improves not only how well the main model learns tail classes but also its overall performance.
- Extensive experiments on popular long-tailed image recognition benchmarks (CIFAR10-LT and CIFAR100-LT with imbalance factors of 10, 50, 100, and 200, and ImageNet-LT) show that the proposed method effectively alleviates the long-tail problem and achieves state-of-the-art accuracy in most settings.
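
The first contribution rests on scoring every training sample with a sub-model that never saw it during training. The Python sketch below illustrates that out-of-fold idea under stated assumptions and is not the authors' implementation: the helper names (`kfold_indices`, `out_of_fold_difficulty`, `make_centroid_fit`) are hypothetical, a toy nearest-centroid classifier on 1-D features stands in for the CNN sub-models, and the two difficulty definitions (sample difficulty as 1 minus the out-of-fold probability of the true class, class difficulty as 1 minus the out-of-fold class accuracy, in the spirit of CDB loss [20]) are assumptions.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Probs = List[float]
Predictor = Callable[[float], Probs]


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def out_of_fold_difficulty(
    xs: Sequence[float],
    ys: Sequence[int],
    num_classes: int,
    k: int,
    fit: Callable[[List[float], List[int]], Predictor],
) -> Tuple[List[Probs], List[float], List[float]]:
    """Train k sub-models; each one scores only the fold it was not trained on."""
    n = len(xs)
    oof: List[Probs] = [[] for _ in range(n)]
    for fold in kfold_indices(n, k):
        held_out = set(fold)
        train_idx = [i for i in range(n) if i not in held_out]
        predict = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        for i in fold:
            oof[i] = predict(xs[i])

    # Assumed sample-level difficulty: 1 - out-of-fold probability of the true class.
    sample_diff = [1.0 - oof[i][ys[i]] for i in range(n)]

    # Assumed class-level difficulty: 1 - out-of-fold accuracy of that class.
    correct = [0] * num_classes
    total = [0] * num_classes
    for i in range(n):
        probs = oof[i]
        pred = max(range(num_classes), key=probs.__getitem__)
        total[ys[i]] += 1
        correct[ys[i]] += int(pred == ys[i])
    class_diff = [1.0 - correct[c] / total[c] if total[c] else 1.0
                  for c in range(num_classes)]
    return oof, sample_diff, class_diff


def make_centroid_fit(num_classes: int) -> Callable[[List[float], List[int]], Predictor]:
    """Toy stand-in for a CNN sub-model: softmax over negative squared distances
    to per-class means; a class absent from the training folds gets (near-)zero probability."""
    def fit(xs: List[float], ys: List[int]) -> Predictor:
        sums = [0.0] * num_classes
        counts = [0] * num_classes
        for x, y in zip(xs, ys):
            sums[y] += x
            counts[y] += 1
        means: List[Optional[float]] = [
            sums[c] / counts[c] if counts[c] else None for c in range(num_classes)
        ]

        def predict(x: float) -> Probs:
            logits = [-(x - m) ** 2 if m is not None else -1e9 for m in means]
            top = max(logits)
            exps = [math.exp(v - top) for v in logits]
            z = sum(exps)
            return [e / z for e in exps]
        return predict
    return fit


# Usage: a head class (0) with 8 samples and a tail class (1) with 3 samples.
xs = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 5.0, 5.1, 5.2]
ys = [0] * 8 + [1] * 3
oof, sample_diff, class_diff = out_of_fold_difficulty(xs, ys, 2, 3, make_centroid_fit(2))
```

In this sketch each of the K sub-models validates on a different fold, so every training sample receives exactly one out-of-fold prediction and no tail-class images have to be set aside for a separate validation set.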
2. Related Work
2.1. Resampling
2.2. Reweighting
3. Methods
3.1. Description of the Long-Tail Issue
3.2. The Overall Framework of ACVM
3.3. Methods in ACVM
3.3.1. ADVW
3.3.2. DHL
3.4. The Procedure of ACVM Algorithm
| Algorithm 1: ACVM |
|---|
| Input: training set D, K (the number of cross-validation folds), the difficulty weight exponent, α (the distribution harmonization weight), and MaxEpochs. |
| Output: main model F with optimized parameters. |
| Procedure: |
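
Algorithm 1 takes a difficulty weight exponent and a harmonization weight α as inputs. The Python sketch below shows one plausible per-sample objective built from those two inputs; it is an illustration under assumptions, not the paper's ADVW or DHL formulas, which are not reproduced in this text. It assumes (i) class weights equal to the class difficulty raised to the difficulty weight exponent and rescaled to average 1, as in CDB loss [20], and (ii) a harmonization term α · KL(q ‖ p), where q is the mean of the sub-models' predicted distributions for the sample and p is the main model's softmax output. The names `difficulty_weights`, `harmonized_loss`, and `exponent` are hypothetical.

```python
import math
from typing import List, Sequence


def difficulty_weights(class_difficulty: Sequence[float], exponent: float) -> List[float]:
    """Hypothetical ADVW-style weights: w_c = d_c ** exponent, rescaled to mean 1."""
    raw = [max(d, 1e-8) ** exponent for d in class_difficulty]  # clamp avoids 0 ** negative
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


def softmax(logits: Sequence[float]) -> List[float]:
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def harmonized_loss(
    logits: Sequence[float],
    label: int,
    sub_model_probs: Sequence[Sequence[float]],
    weights: Sequence[float],
    alpha: float,
    eps: float = 1e-12,
) -> float:
    """Difficulty-weighted cross-entropy plus alpha * KL(q || p) (assumed form).

    q is the average of the sub-models' probability vectors for this sample;
    p is the main model's softmax output.
    """
    p = softmax(logits)
    q = [sum(probs[c] for probs in sub_model_probs) / len(sub_model_probs)
         for c in range(len(p))]
    ce = -math.log(max(p[label], eps))
    kl = sum(qc * math.log(qc / max(pc, eps)) for qc, pc in zip(q, p) if qc > 0)
    return weights[label] * ce + alpha * kl


# Usage: class 2 is the hardest (tail) class, so its errors cost the most.
w = difficulty_weights([0.1, 0.3, 0.8], exponent=1.0)
loss = harmonized_loss([2.0, 0.5, -1.0], label=2,
                       sub_model_probs=[[0.5, 0.3, 0.2], [0.4, 0.3, 0.3]],
                       weights=w, alpha=0.5)
```

Setting `alpha = 0` drops the harmonization term, and an exponent of 0 gives every class weight 1, so in this sketch the two components can be switched off independently.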
4. Experiments
4.1. Datasets
4.2. Experimental Details
4.3. Compared Methods
4.4. Main Results
4.5. Ablation Study
4.5.1. The Effect of α
4.5.2. The Effect of K
4.5.3. The Effect of Each Module
4.5.4. Isolating the Effect of K-Fold Cross-Validation
4.5.5. Quality of the Estimated Class Difficulty
5. Concluding Remarks
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| DNN | Deep Neural Network |
| ACVM | Adaptive Combination Validation Mechanism |
| ADVW | Adaptive Difficulty Validation Weighting |
| DHL | Distributed Harmonic Loss |
References
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105.
- Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollar, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision (ECCV 2014), Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
- Wang, J.; Zhang, W.; Zang, Y.; Cao, Y.; Pang, J.; Gong, T.; Chen, K.; Liu, Z.; Loy, C.C.; Lin, D. Seesaw loss for long-tailed instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 9695–9704.
- Zhang, Y.; Kang, B.; Hooi, B.; Yan, S.; Feng, J. Deep long-tailed learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 10795–10816.
- Ren, J.; Yu, C.; Ma, X.; Zhao, H.; Yi, S. Balanced meta-softmax for long-tailed visual recognition. Adv. Neural Inf. Process. Syst. 2020, 33, 4175–4186.
- Malisiewicz, T.; Gupta, A.; Efros, A.A. Ensemble of exemplar-svms for object detection and beyond. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011; pp. 89–96.
- Alshammari, S.; Wang, Y.-X.; Ramanan, D.; Kong, S. Long-tailed recognition via weight balancing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 6897–6907.
- Jamal, M.A.; Brown, M.; Yang, M.-H.; Wang, L.; Gong, B. Rethinking class-balanced methods for long-tailed visual recognition from a domain adaptation perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–24 June 2020; pp. 7610–7619.
- Li, M.; Cheung, Y.-M.; Lu, Y. Long-tailed visual recognition via gaussian clouded logit adjustment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 6929–6938.
- Zhou, B.; Cui, Q.; Wei, X.-S.; Chen, Z.-M. BBN: Bilateral-branch network with cumulative learning for long-tailed visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 9719–9728.
- Buda, M.; Maki, A.; Mazurowski, M.A. A Systematic Study of the Class Imbalance Problem in Convolutional Neural Networks. Neural Netw. 2018, 106, 249–259.
- He, H.; Garcia, E.A. Learning from Imbalanced Data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Zhang, Z.; Pfister, T. Learning Fast Sample Re-Weighting Without Reward Data. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 725–734.
- Shu, J.; Xie, Q.; Yi, L.; Zhao, Q.; Zhou, S.; Xu, Z.; Meng, D. Meta-Weight-Net: Learning an Explicit Mapping for Sample Weighting. Adv. Neural Inf. Process. Syst. 2019, 32, 1917–1928.
- Cui, Y.; Jia, M.; Lin, T.-Y.; Song, Y.; Belongie, S. Class-Balanced Loss Based on Effective Number of Samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–21 June 2019; pp. 9268–9277.
- Cao, K.; Wei, C.; Gaidon, A.; Arechiga, N.; Ma, T. Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss. Adv. Neural Inf. Process. Syst. 2019, 32, 1567–1578.
- Sinha, S.; Ohashi, H.; Nakamura, K. Class-Difficulty Based Methods for Long-Tailed Visual Recognition. Int. J. Comput. Vis. 2022, 130, 2517–2531.
- Anguita, D.; Ghelardoni, L.; Ghio, A.; Oneto, L.; Ridella, S. The ’K’ in K-Fold Cross Validation. In Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), Bruges, Belgium, 25–27 April 2012; pp. 441–446. Available online: https://www.esann.org/sites/default/files/proceedings/legacy/es2012-62.pdf (accessed on 9 April 2025).
- Shen, L.; Lin, Z.; Huang, Q. Relay Backpropagation for Effective Learning of Deep Convolutional Neural Networks. In Proceedings of the European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands, 11–14 October 2016; pp. 467–482.
- Japkowicz, N.; Stephen, S. The Class Imbalance Problem: A Systematic Study. Intell. Data Anal. 2002, 6, 429–449.
- Byrd, J.; Lipton, Z. What Is the Effect of Importance Weighting in Deep Learning? In Proceedings of the International Conference on Machine Learning (ICML), Long Beach, CA, USA, 9–15 June 2019; pp. 872–881.
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-Sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
- Wu, T.; Huang, Q.; Liu, Z.; Wang, Y.; Lin, D. Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets. In Proceedings of the European Conference on Computer Vision (ECCV 2020), Glasgow, UK, 23–28 August 2020; pp. 162–178.
- Kang, B.; Xie, S.; Rohrbach, M.; Yan, Z.; Gordo, A.; Feng, J.; Kalantidis, Y. Decoupling Representation and Classifier for Long-Tailed Recognition. arXiv 2019, arXiv:1910.09217.
- Chen, X.; Zhou, Y.; Wu, D.; Zhang, W.; Zhou, Y.; Li, B.; Wang, W. Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 22 February–1 March 2022; pp. 356–364.
- Wang, T.; Zhu, Y.; Zhao, C.; Zeng, W.; Wang, J.; Tang, M. Adaptive Class Suppression Loss for Long-Tail Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 3103–3112.
- Li, B.; Yao, Y.; Tan, J.; Zhang, G.; Yu, F.; Lu, J.; Luo, Y. Equalized Focal Loss for Dense Long-Tailed Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 6990–6999.
- Tan, J.; Wang, C.; Li, B.; Li, Q.; Ouyang, W.; Yin, C.; Yan, J. Equalization Loss for Long-Tailed Object Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–24 June 2020; pp. 11662–11671.
- Huang, C.; Li, Y.; Loy, C.C.; Tang, X. Learning Deep Representation for Imbalanced Classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 5375–5384.
- Mahajan, D.; Girshick, R.; Ramanathan, V.; He, K.; Paluri, M.; Li, Y.; Bharambe, A.; Van Der Maaten, L. Exploring the Limits of Weakly Supervised Pretraining. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 181–196.
- Sinha, S.; Ohashi, H. Difficulty-Net: Learning to Predict Difficulty for Long-Tailed Recognition. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 28 February–4 March 2023; pp. 6444–6453.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-Based Learning Applied to Document Recognition. Proc. IEEE 1998, 86, 2278–2324.
- Liu, Z.; Miao, Z.; Zhan, X.; Wang, J.; Gong, B.; Yu, S.X. Large-Scale Long-Tailed Recognition in an Open World. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–21 June 2019; pp. 2537–2546.
- Park, S.; Lim, J.; Jeon, Y.; Choi, J.Y. Influence-Balanced Loss for Imbalanced Visual Classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 735–744.
- Zhang, X.; Fang, Z.; Wen, Y.; Li, Z.; Qiao, Y. Range Loss for Deep Face Recognition with Long-Tailed Training Data. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 5409–5418.
- Lakshminarayanan, B.; Pritzel, A.; Blundell, C. Simple and scalable predictive uncertainty estimation using deep ensembles. Adv. Neural Inf. Process. Syst. 2017, 30, 3069–3079.
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531.
- Guo, C.; Pleiss, G.; Sun, Y.; Weinberger, K.Q. On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning (ICML), Sydney, NSW, Australia, 6–11 August 2017; pp. 1321–1330.




| Datasets | Number of Classes | IF | Number of Samples in the Maximum Class | Number of Samples in the Minimum Class |
|---|---|---|---|---|
| CIFAR10-LT | 10 | 10, 50, 100, 200 | 5000 | 25~500 |
| CIFAR100-LT | 100 | 10, 50, 100, 200 | 500 | 2~50 |
| ImageNet-LT | 1000 | 256 | 1280 | 5 |
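
The dataset table lists only the largest and smallest class sizes; the imbalance factor (IF) is their ratio, e.g. 5000 / 200 = 25 and 5000 / 10 = 500 give the 25~500 range for CIFAR10-LT, and 1280 / 5 = 256 for ImageNet-LT. The intermediate class sizes are not given here. A common convention for the CIFAR variants (Cui et al. [18]; Cao et al. [19]) is an exponential profile n_i = ⌊n_max · IF^(−i/(C−1))⌋, and the Python sketch below assumes that profile; whether this paper used exactly this construction is not stated in the text, and the function name `long_tail_counts` is hypothetical. ImageNet-LT [36] was instead built by sampling from a Pareto distribution, so the sketch applies only to CIFAR10-LT and CIFAR100-LT.

```python
from typing import List


def long_tail_counts(n_max: int, num_classes: int, imbalance_factor: float) -> List[int]:
    """Assumed exponential long-tail profile: n_i = floor(n_max * IF ** (-i / (C - 1))).

    Class 0 keeps n_max samples and the last class keeps about n_max / IF,
    so max(counts) / min(counts) equals the imbalance factor up to flooring.
    """
    return [int(n_max * imbalance_factor ** (-i / (num_classes - 1)))
            for i in range(num_classes)]


cifar10_lt_100 = long_tail_counts(5000, 10, 100)   # rarest class: 50 images
cifar100_lt_200 = long_tail_counts(500, 100, 200)  # rarest class: 2 images (2.5 floored)
```

Flooring explains why the CIFAR100-LT minimum at IF = 200 is listed as 2 rather than 2.5.
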
| Dataset | IF | K | Difficulty weight exponent | α |
|---|---|---|---|---|
| CIFAR10-LT | 10 | 3 | 1.0 | 0.4 |
| | 50 | 3 | 1.5 | 0.2 |
| | 100 | 3 | 1.5 | 0.4 |
| | 200 | 3 | 1.0 | 0.6 |
| CIFAR100-LT | 10 | 3 | 1.0 | 0.8 |
| | 50 | 3 | 1.5 | 0.8 |
| | 100 | 3 | 1.5 | 0.6 |
| | 200 | 3 | 1.0 | 0.6 |
| ImageNet-LT | 256 | 3 | 2.0 | 0.5 |
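| Method | CIFAR10-LT (IF = 10) | CIFAR10-LT (IF = 50) | CIFAR10-LT (IF = 100) | CIFAR10-LT (IF = 200) | CIFAR100-LT (IF = 10) | CIFAR100-LT (IF = 50) | CIFAR100-LT (IF = 100) | CIFAR100-LT (IF = 200) |
|---|---|---|---|---|---|---|---|---|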
| CE loss | 86.39 | 74.81 | 70.36 | 65.68 | 55.65 | 43.85 | 38.21 | 34.84 |
| Focal loss [15] | 86.66 | 76.71 | 70.38 | 65.29 | 55.78 | 44.32 | 38.41 | 35.62 |
| EQ loss [31] | - | - | - | - | 58.32 | - | 40.54 | - |
| MWN [17] | 87.84 | 80.06 | 75.21 | 68.91 | 58.46 | 46.74 | 42.09 | 37.91 |
| CB loss [18] | 87.49 | 79.27 | 74.57 | 68.89 | 57.89 | 45.32 | 39.60 | 36.23 |
| LDAM [19] | 86.97 | - | 73.35 | - | 56.91 | - | 39.60 | - |
| CDB loss [20] | 88.21 | 81.06 | 77.91 | 73.93 | 59.47 | 47.09 | 42.70 | 37.99 |
| DN [34] | 87.97 | 80.65 | 77.93 | 74.17 | 55.50 | 44.89 | 40.93 | 36.87 |
| LDAM-DRW [19] | 88.16 | - | 77.03 | - | 57.99 | - | 42.04 | - |
| IB [37] | 88.25 | 81.70 | 78.26 | 73.96 | 57.13 | 46.22 | 42.14 | 37.31 |
| ACVM (ours) | 87.76 | 81.71 | 78.79 | 75.15 | 59.08 | 49.03 | 43.44 | 39.58 |
| Method | Accuracy |
|---|---|
| CE loss | 38.88 |
| Focal loss [15] | 30.50 |
| Range loss [38] | 30.70 |
| EQ loss [31] | 36.40 |
| BS [7] | 41.80 |
| CB loss [18] | 40.85 |
| LDAM [19] | 41.86 |
| LDAM-DRW [19] | 45.74 |
| CDB loss [20] | 46.56 |
| DN [34] | 44.69 |
| ACVM (ours) | 47.70 |
| α | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 |
|---|---|---|---|---|---|
| All | 44.70 | 46.01 | 47.70 | 46.97 | 47.11 |
| Head | 62.90 | 64.18 | 65.99 | 65.18 | 64.87 |
| Medium | 38.87 | 40.17 | 41.96 | 41.23 | 41.45 |
| Tail | 13.68 | 15.15 | 16.18 | 16.33 | 16.76 |
| # | Sub-Model Loss | ADVW | DHL | Accuracy |
|---|---|---|---|---|
| 1 | Focal loss | ✔ | ✔ | 42.47 |
| 2 | CB loss | ✔ | ✔ | 41.87 |
| 3 | EQ loss | ✔ | ✔ | 41.95 |
| 4 | CDB loss | ✔ | ✔ | 41.56 |
| 5 | CE loss | ✘ | ✘ | 38.21 |
| 6 | CE loss | ✘ | ✔ | 40.65 |
| 7 | CE loss | ✔ | ✘ | 43.04 |
| 8 | CE loss | ✔ | ✔ | 43.44 |
| Method | Configuration Details | Accuracy |
|---|---|---|
| CE Baseline | Single model, 200 epochs | 38.21 |
| CE Baseline | Single model, 500 epochs | 39.15 |
| CDB loss | Single model, 500 epochs | 43.26 |
| Independent Models | 3 models trained on D for difficulty estimation | 40.68 |
| Standard Ensemble | Averaged predictions of 3 independent CE models | 41.85 |
| Teacher-Student | Single model distilled from a 3-model ensemble | 41.52 |
| ACVM (K = 3) | Proposed K-fold sub-validation + ADVW + DHL | 43.44 |
| Method | r | Variance (Head) | Variance (Medium) | Variance (Tail) |
|---|---|---|---|---|
| CE Loss | 0.52 | 1.4 | 2.1 | 3.5 |
| CDB Loss | 0.78 | 1.1 | 2.9 | 6.7 |
| ACVM | 0.89 | 0.9 | 1.8 | 3.2 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Sun, T.; He, W.; Shao, C.; Zheng, S.; Yu, H. ACVM: An Adaptive Combination Validation Mechanism for Long-Tailed Image Recognition. Information 2026, 17, 455. https://doi.org/10.3390/info17050455
Sun T, He W, Shao C, Zheng S, Yu H. ACVM: An Adaptive Combination Validation Mechanism for Long-Tailed Image Recognition. Information. 2026; 17(5):455. https://doi.org/10.3390/info17050455
Chicago/Turabian Style: Sun, Tianci, Wanqiu He, Changbin Shao, Shang Zheng, and Hualong Yu. 2026. "ACVM: An Adaptive Combination Validation Mechanism for Long-Tailed Image Recognition" Information 17, no. 5: 455. https://doi.org/10.3390/info17050455
APA Style: Sun, T., He, W., Shao, C., Zheng, S., & Yu, H. (2026). ACVM: An Adaptive Combination Validation Mechanism for Long-Tailed Image Recognition. Information, 17(5), 455. https://doi.org/10.3390/info17050455

