Knowledge Distillation in Object Detection: A Survey from CNN to Transformer
Abstract
1. Introduction
2. Applications of Knowledge Distillation in Object Detection
2.1. CNN Based
2.1.1. HEAD
2.1.2. SSD-Det
2.1.3. HierKD
2.1.4. ERD
2.1.5. MLLD
2.1.6. FAM
2.1.7. SKD
2.1.8. ICD
2.1.9. PCD
2.1.10. DKD
2.1.11. KD-Zero
2.1.12. FGD
2.1.13. UniKD
2.1.14. DiffKD
2.1.15. Auto-KD
2.1.16. LD
2.1.17. DIST
2.1.18. CTCP
2.1.19. SED
2.1.20. ScaleKD
2.1.21. PKD
2.2. Transformer Based
2.2.1. KDEP
2.2.2. ZeroShot
2.2.3. Forget
2.2.4. OADP
2.2.5. DETRDistill
2.2.6. DK-DETR
3. Applications of Knowledge Distillation
3.1. Image Classification
3.2. Semantic Segmentation
3.3. Three-Dimensional Reconstruction
3.4. Document Analysis
4. Performance Results
5. Limitations
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Hinton, G.E.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531. [Google Scholar] [CrossRef]
- Sauer, A.; Asaadi, S.; Küch, F. Knowledge distillation meets few-shot learning: An approach for few-shot intent classification within and across domains. In Proceedings of the 4th Workshop on NLP for Conversational AI, Dublin, Ireland, 27 May 2022. [Google Scholar]
- Kim, Y.; Rush, A. Sequence-level knowledge distillation. arXiv 2016, arXiv:1606.07947. [Google Scholar] [CrossRef]
- Alkhulaifi, A.; Alsahli, F.; Ahmad, I. Knowledge distillation in deep learning and its applications. PeerJ Comput. Sci. 2021, 7, e474. [Google Scholar] [CrossRef]
- Kothandaraman, D.; Nambiar, A. Domain adaptive knowledge distillation for driving scene semantic segmentation. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision Workshops (WACVW), Waikoloa, HI, USA, 5–9 January 2021. [Google Scholar]
- Ni, Y.; Zhang, X.; Li, Y. Knowledge Distillation Facilitates the Lightweight and Efficient Plant Diseases Detection Model. Agriculture 2023, 13, 1664. [Google Scholar]
- ul Haq, Q.M.; Ruan, S.J.; Haq, M.A.; Karam, S.; Shieh, J.L. An Incremental Learning of YOLOv3 Without Catastrophic Forgetting for Smart City Applications. IEEE Consum. Electron. Mag. 2022, 10, 56–63. [Google Scholar] [CrossRef]
- Pham, C.; Hoang, T.; Do, T.T. Collaborative Multi-Teacher Knowledge Distillation for Learning Low Bit-width Deep Neural Networks. In Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 7 January 2023; pp. 6424–6432. [Google Scholar]
- Zhang, H.; Chen, D.; Wang, C. Adaptive Multi-Teacher Knowledge Distillation with Meta-Learning. In Proceedings of the 2023 IEEE International Conference on Multimedia and Expo (ICME), Brisbane, Australia, 10–14 July 2023; pp. 1943–1948. [Google Scholar]
- Zhang, H.; Chen, D.; Wang, C. Confidence-Aware Multi-Teacher Knowledge Distillation. In Proceedings of the ICASSP 2022—2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23–27 May 2022; pp. 4498–4502. [Google Scholar]
- Yuan, F.; Shou, L.; Pei, J.; Lin, W.; Gong, M.; Fu, Y.; Jiang, D. Reinforced Multi-Teacher Selection for Knowledge Distillation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 2–9 February 2021. [Google Scholar]
- Du, S.; You, S.; Li, X.; Wu, J.; Wang, F.; Qian, C.; Zhang, C. Agree to Disagree: Adaptive Ensemble Knowledge Distillation in Gradient Space. In Advances in Neural Information Processing Systems; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H., Eds.; Curran Associates, Inc.: Dutchess County, NY, USA, 2020; Volume 33, pp. 12345–12355. [Google Scholar]
- Liu, Y.; Zhang, W.; Wang, J. Adaptive multi-teacher multi-level knowledge distillation. Neurocomputing 2020, 415, 106–113. [Google Scholar] [CrossRef]
- Li, L.; Dong, P.; Wei, Z.; Yang, Y. Automated Knowledge Distillation via Monte Carlo Tree Search. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 17413–17424. [Google Scholar]
- Liu, Y.; Chen, K.; Liu, C.; Qin, Z.; Luo, Z.; Wang, J. Structured Knowledge Distillation for Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Liu, Y.; Shu, C.; Wang, J.; Shen, C. Structured Knowledge Distillation for Dense Prediction. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 7035–7049. [Google Scholar] [CrossRef]
- Yang, C.; Zhou, H.; An, Z.; Jiang, X.; Xu, Y.; Zhang, Q. Cross-Image Relational Knowledge Distillation for Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 12319–12328. [Google Scholar]
- Yang, Z.; Li, Z.; Shao, M.; Shi, D.; Yuan, Z.; Yuan, C. Masked Generative Distillation. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022. [Google Scholar]
- Shu, C.; Liu, Y.; Gao, J.; Yan, Z.; Shen, C. Channel-Wise Knowledge Distillation for Dense Prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 5311–5320. [Google Scholar]
- Chen, J.; Zhu, D.; Qian, G.; Ghanem, B.; Yan, Z.; Zhu, C.; Xiao, F.; Culatana, S.C.; Elhoseiny, M. Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 699–710. [Google Scholar]
- Liu, L.; Wang, Z.; Phan, M.H.; Zhang, B.; Ge, J.; Liu, Y. BPKD: Boundary Privileged Knowledge Distillation for Semantic Segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, Hawaii, 7–11 January 2024; pp. 1062–1072. [Google Scholar]
- Wang, Y.; Zhou, W.; Jiang, T.; Bai, X.; Xu, Y. Intra-class Feature Variation Distillation for Semantic Segmentation. In Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020. [Google Scholar]
- Lin, S.; Xie, H.; Wang, B.; Yu, K.; Chang, X.; Liang, X.; Wang, G. Knowledge Distillation via the Target-Aware Transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 10915–10924. [Google Scholar]
- Berrada, T.; Couprie, C.; Alahari, K.; Verbeek, J. Guided Distillation for Semi-Supervised Instance Segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, Hawaii, 7–11 January 2024; pp. 475–483. [Google Scholar]
- Xu, Y.; Yang, Y.; Zhang, L. Multi-Task Learning with Knowledge Distillation for Dense Prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 21550–21559. [Google Scholar]
- Yang, Z.; Li, R.; Ling, E.; Zhang, C.; Wang, Y.; Huang, D.; Ma, K.T.; Hur, M.; Lin, G. Label-Guided Knowledge Distillation for Continual Semantic Segmentation on 2D Images and 3D Point Clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 18601–18612. [Google Scholar]
- Ji, D.; Wang, H.; Tao, M.; Huang, J.; Hua, X.S.; Lu, H. Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 16876–16885. [Google Scholar]
- Phan, M.H.; Ta, T.A.; Phung, S.L.; Tran-Thanh, L.; Bouzerdoum, A. Class Similarity Weighted Knowledge Distillation for Continual Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 16866–16875. [Google Scholar]
- Fan, J.; Li, C.; Liu, X.; Song, M.; Yao, A. Augmentation-Free Dense Contrastive Knowledge Distillation for Efficient Semantic Segmentation. In Advances in Neural Information Processing Systems; Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2023; Volume 36, pp. 51359–51370. [Google Scholar]
- Baek, D.; Oh, Y.; Lee, S.; Lee, J.; Ham, B. Decomposed Knowledge Distillation for Class-Incremental Semantic Segmentation. In Advances in Neural Information Processing Systems; Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2022; Volume 35, pp. 10380–10392. [Google Scholar]
- Lin, J.; Yuan, Y.; Shao, T.; Zhou, K. Towards High-Fidelity 3D Face Reconstruction From In-the-Wild Images Using Graph Convolutional Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
- Wang, S.; Clark, R.; Wen, H.; Trigoni, N. Deepvo: Towards end-to-end visual odometry with deep recurrent convolutional neural networks. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 2043–2050. [Google Scholar]
- Jackson, A.S.; Bulat, A.; Argyriou, V.; Tzimiropoulos, G. Large pose 3D face reconstruction from a single image via direct volumetric CNN regression. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1031–1039. [Google Scholar]
- Zheng, H.; Cao, D.; Xu, J.; Ai, R.; Gu, W.; Yang, Y.; Liang, Y. Distilling Temporal Knowledge with Masked Feature Reconstruction for 3D Object Detection. arXiv 2024, arXiv:2401.01918. [Google Scholar] [CrossRef]
- Wang, J.; Xu, H.; Hu, X.; Leng, B. IFKD: Implicit field knowledge distillation for single view reconstruction. Math. Biosci. Eng. 2023, 20, 13864–13880. [Google Scholar] [CrossRef]
- Yang, J.; Shi, S.; Ding, R.; Wang, Z.; Qi, X. Towards Efficient 3D Object Detection with Knowledge Distillation. In Advances in Neural Information Processing Systems; Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2022; Volume 35, pp. 21300–21313. [Google Scholar]
- Zhang, L.; Dong, R.; Tai, H.S.; Ma, K. PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D Detection. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 19–24 June 2023; pp. 21791–21801. [Google Scholar]
- Landeghem, J.V.; Maity, S.; Banerjee, A.; Blaschko, M.B.; Moens, M.F.; Lladós, J.; Biswas, S. DistilDoc: Knowledge Distillation for Visually-Rich Document Applications. In Document Analysis and Recognition—ICDAR 2024: 18th International Conference, Athens, Greece, 30 August–4 September 2024; Proceedings, Part IV; Springer: Berlin/Heidelberg, Germany, 2024; pp. 195–217. [Google Scholar] [CrossRef]
- Banerjee, A.; Biswas, S.; Lladós, J.; Pal, U. GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation. In Document Analysis and Recognition—ICDAR 2024: 18th International Conference, Athens, Greece, 30 August–4 September 2024; Proceedings, Part III; Springer: Berlin/Heidelberg, Germany, 2024; pp. 354–373. [Google Scholar] [CrossRef]
- Chen, X.; He, B.; Hui, K.; Sun, L.; Sun, Y. Simplified TinyBERT: Knowledge Distillation for Document Retrieval. arXiv 2020, arXiv:2009.07531. [Google Scholar]
- Ma, C.; Zhang, Y.; Tu, M.; Zhao, Y.; Zhou, Y.; Zong, C. Multi-teacher Knowledge Distillation for End-to-End Text Image Machine Translation. In Proceedings of the Document Analysis and Recognition—ICDAR 2023; Fink, G.A., Jain, R., Kise, K., Zanibbi, R., Eds.; Springer: Cham, Switzerland, 2023; pp. 484–501. [Google Scholar]
- Elhanashi, A.; Dini, P.; Saponara, S.; Zheng, Q. Integration of Deep Learning into the IoT: A Survey of Techniques and Challenges for Real-World Applications. Electronics 2023, 12, 4925. [Google Scholar] [CrossRef]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Advances in Neural Information Processing Systems; Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2015; Volume 28. [Google Scholar]
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 10778–10787. [Google Scholar] [CrossRef]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. arXiv 2017, arXiv:1612.03144. Available online: http://arxiv.org/abs/1612.03144 (accessed on 26 November 2025).
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. arXiv 2016, arXiv:1506.02640. Available online: http://arxiv.org/abs/1506.02640 (accessed on 26 November 2025).
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Lecture Notes in Computer Science; Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar] [CrossRef]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. arXiv 2018, arXiv:1703.06870. Available online: http://arxiv.org/abs/1703.06870 (accessed on 26 November 2025).
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv 2021, arXiv:2010.11929. Available online: http://arxiv.org/abs/2010.11929 (accessed on 26 November 2025).
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Computer Vision—ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M., Eds.; Springer: Cham, Switzerland, 2020; pp. 213–229. [Google Scholar]
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9626–9635. [Google Scholar]
- Shehzadi, T.; Hashmi, K.A.; Sarode, S.; Stricker, D.; Afzal, M.Z. STEP-DETR: Advancing DETR-based Semi-Supervised Object Detection with Super Teacher and Pseudo-Label Guided Text Queries. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Honolulu, HI, USA, 12–19 October 2025; pp. 3069–3079. [Google Scholar]
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696. [Google Scholar]
- Law, H.; Deng, J. CornerNet: Detecting Objects as Paired Keypoints. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
- Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint Triplets for Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
- Zhang, S.; Wen, L.; Bian, X.; Lei, Z.; Li, S.Z. Single-Shot Refinement Neural Network for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
- Li, Y.; Chen, Y.; Wang, N.; Zhang, Z. Scale-Aware Trident Networks for Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
- Chen, K.; Pang, J.; Wang, J.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Shi, J.; Ouyang, W.; et al. Hybrid Task Cascade for Instance Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable Transformers for End-to-End Object Detection. arXiv 2021, arXiv:2010.04159. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
- Vincent, O.R.; Folorunso, O. A Descriptive Algorithm for Sobel Image Edge Detection. In Proceedings of the InSITE 2009: Informing Science + IT Education Conference, Macon, GA, USA, 12–15 June 2009. [Google Scholar]
- Canny, J. A Computational Approach To Edge Detection. IEEE Trans. Pattern Anal. Mach. Intell. 1986, PAMI-8, 679–698. [Google Scholar] [CrossRef]
- Harris, C.G.; Stephens, M.J. A Combined Corner and Edge Detector. In Proceedings of the Alvey Vision Conference, London, UK, 1–3 September 1988. [Google Scholar]
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–26 June 2005; Volume 1, pp. 886–893. [Google Scholar] [CrossRef]
- Cortes, C.; Vapnik, V.N. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
- Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar] [CrossRef]
- Rodriguez-Conde, I.; Campos, C.; Fdez-Riverola, F. Optimized convolutional neural network architectures for efficient on-device vision-based object detection. Neural Comput. Appl. 2021, 34, 10469–10501. [Google Scholar] [CrossRef]
- Du, Z.; Zhang, R.; Chang, M.F.; Zhang, X.; Liu, S.; Chen, T.; Chen, Y. Distilling Object Detectors with Feature Richness. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Virtual, 6–14 December 2021; Volume 34, pp. 5213–5224. [Google Scholar]
- Wang, L.; Yoon, K.J. Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 3048–3068. [Google Scholar] [CrossRef] [PubMed]
- Chen, G.; Choi, W.; Yu, X.; Han, T.X.; Chandraker, M. Learning Efficient Object Detection Models with Knowledge Distillation. In Proceedings of the Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Zhang, F.; Zhu, X.; Ye, M. Fast Human Pose Estimation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3512–3521. [Google Scholar]
- Meng, Z.; Li, J.; Zhao, Y.; Gong, Y. Conditional Teacher-student Learning. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 6445–6449. [Google Scholar]
- Kim, S.; Kim, H.E. Transferring Knowledge to Smaller Network with Class-Distance Loss. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
- Romero, A.; Ballas, N.; Kahou, S.E.; Chassang, A.; Gatta, C.; Bengio, Y. FitNets: Hints for Thin Deep Nets. In Proceedings of the ICLR, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Huang, Z.; Wang, N. Like What You Like: Knowledge Distill via Neuron Selectivity Transfer. arXiv 2017, arXiv:1707.01219. [Google Scholar] [CrossRef]
- Zagoruyko, S.; Komodakis, N. Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer. In Proceedings of the ICLR, Toulon, France, 24–26 April 2017. [Google Scholar]
- Kim, J.; Park, S.; Kwak, N. Paraphrasing Complex Network: Network Compression via Factor Transfer. In Proceedings of the NIPS, Montreal, QC, Canada, 3–8 December 2018. [Google Scholar]
- Heo, B.; Lee, M.; Yun, S.; Choi, J.Y. Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons. In Proceedings of the AAAI, Honolulu, HI, USA, 27 January–1 February 2019. [Google Scholar]
- Zhou, G.; Fan, Y.; Cui, R.; Bian, W.; Zhu, X.; Gai, K. Rocket Launching: A Universal and Efficient Framework for Training Well-performing Light Net. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017. [Google Scholar]
- Liu, J.; Wen, D.; Gao, H.; Tao, W.; Chen, T.W.; Osa, K.; Kato, M. Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
- Shen, C.; Wang, X.; Song, J.; Sun, L.; Song, M. Amalgamating Knowledge towards Comprehensive Classification. Proc. AAAI Conf. Artif. Intell. 2019, 33, 3068–3075. [Google Scholar] [CrossRef]
- Heo, B.; Kim, J.; Yun, S.; Park, H.; Kwak, N.; Choi, J.Y. A Comprehensive Overhaul of Feature Distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
- Xu, K.; Rui, L.; Li, Y.; Gu, L. Feature Normalized Knowledge Distillation for Image Classification. In Proceedings of the Computer Vision—ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Springer: Cham, Switzerland, 2020; pp. 664–680. [Google Scholar]
- Guan, Y.; Zhao, P.; Wang, B.; Zhang, Y.; Yao, C.; Bian, K.; Tang, J. Differentiable Feature Aggregation Search for Knowledge Distillation. arXiv 2020, arXiv:2008.00506. [Google Scholar] [CrossRef]
- Yang, J.; Martínez, B.; Bulat, A.; Tzimiropoulos, G. Knowledge distillation via adaptive instance normalization. arXiv 2020, arXiv:2003.04289. [Google Scholar] [CrossRef]
- Wang, X.; Fu, T.; Liao, S.; Wang, S.; Lei, Z.; Mei, T. Exclusivity-Consistency Regularized Knowledge Distillation for Face Recognition. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020. [Google Scholar]
- Passban, P.; Wu, Y.; Rezagholizadeh, M.; Liu, Q. ALP-KD: Attention-Based Layer Projection for Knowledge Distillation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020. [Google Scholar]
- Chen, D.; Mei, J.P.; Zhang, Y.; Wang, C.; Wang, Z.; Feng, Y.; Chen, C. Cross-Layer Distillation with Semantic Calibration. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020. [Google Scholar]
- Yim, J.; Joo, D.; Bae, J.; Kim, J. A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 7130–7138. [Google Scholar]
- You, S.; Xu, C.; Xu, C.; Tao, D. Learning from Multiple Teacher Networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017. [Google Scholar]
- Zhang, C.; Peng, Y. Better and Faster: Knowledge Transfer from Multiple Self-supervised Learning Tasks via Graph Distillation for Video Classification. In Proceedings of the International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018. [Google Scholar]
- Chen, Y.; Wang, N.; Zhang, Z. DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer. arXiv 2017, arXiv:1707.01220. [Google Scholar] [CrossRef]
- Lee, S.; Song, B.C. Graph-based Knowledge Distillation by Multi-head Attention Network. In Proceedings of the British Machine Vision Conference, Cardiff, UK, 9–12 September 2019. [Google Scholar]
- Park, W.; Kim, D.; Lu, Y.; Cho, M. Relational Knowledge Distillation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3962–3971. [Google Scholar]
- Liu, Y.; Cao, J.; Li, B.; Yuan, C.; Hu, W.; Li, Y.; Duan, Y. Knowledge Distillation via Instance Relationship Graph. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 7089–7097. [Google Scholar]
- Tung, F.; Mori, G. Similarity-Preserving Knowledge Distillation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1365–1374. [Google Scholar]
- Peng, B.; Jin, X.; Liu, J.; Zhou, S.; Wu, Y.; Liu, Y.; Li, D.; Zhang, Z. Correlation Congruence for Knowledge Distillation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 5006–5015. [Google Scholar]
- Yu, L.; Yazici, V.O.; Liu, X.; van de Weijer, J.; Cheng, Y.; Ramisa, A. Learning Metrics From Teachers: Compact Networks for Image Embedding. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 2902–2911. [Google Scholar]
- Passalis, N.; Tzelepi, M.; Tefas, A. Probabilistic Knowledge Transfer for Lightweight Deep Representation Learning. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 2030–2039. [Google Scholar] [CrossRef] [PubMed]
- Passalis, N.; Tzelepi, M.; Tefas, A. Heterogeneous Knowledge Distillation Using Information Flow Modeling. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2336–2345. [Google Scholar]
- Chen, H.; Wang, Y.; Xu, C.; Xu, C.; Tao, D. Learning Student Networks via Feature Embedding. IEEE Trans. Neural Netw. Learn. Syst. 2018, 32, 25–35. [Google Scholar] [CrossRef]
- Jiao, X.; Yin, Y.; Shang, L.; Jiang, X.; Chen, X.; Li, L.; Wang, F.; Liu, Q. TinyBERT: Distilling BERT for Natural Language Understanding. arXiv 2019, arXiv:1909.10351. [Google Scholar]
- Tang, R.; Lu, Y.; Liu, L.; Mou, L.; Vechtomova, O.; Lin, J.J. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks. arXiv 2019, arXiv:1903.12136. [Google Scholar] [CrossRef]
- Allen-Zhu, Z.; Li, Y. Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning. arXiv 2020, arXiv:2012.09816. [Google Scholar]
- Li, Z.; Hoiem, D. Learning without Forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 2935–2947. [Google Scholar] [CrossRef]
- Peng, Z.; Li, Z.; Zhang, J.; Li, Y.; Qi, G.J.; Tang, J. Few-Shot Image Recognition with Knowledge Transfer. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 441–449. [Google Scholar]
- Bagherinezhad, H.; Horton, M.; Rastegari, M.; Farhadi, A. Label Refinery: Improving ImageNet Classification through Label Progression. arXiv 2018, arXiv:1805.02641. [Google Scholar] [CrossRef]
- Chen, W.C.; Chang, C.C.; Lu, C.Y.; Lee, C.R. Knowledge Distillation with Feature Maps for Image Classification. In Proceedings of the Asian Conference on Computer Vision, Perth, Australia, 2–6 December 2018. [Google Scholar]
- Wang, J.; Gou, L.; Zhang, W.; Yang, H.; Shen, H.W. DeepVID: Deep Visual Interpretation and Diagnosis for Image Classifiers via Knowledge Distillation. IEEE Trans. Vis. Comput. Graph. 2019, 25, 2168–2180. [Google Scholar] [CrossRef]
- Mukherjee, P.; Das, A.; Bhunia, A.K.; Roy, P.P. Cogni-Net: Cognitive Feature Learning Through Deep Visual Perception. In Proceedings of the 2019 IEEE International Conference on Image Processing, Athens, Greece, 7–10 October 2018; pp. 4539–4543. [Google Scholar]
- Zhu, M.; Han, K.; Zhang, C.; Lin, J.; Wang, Y. Low-resolution Visual Recognition via Deep Feature Distillation. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 3762–3766. [Google Scholar]
- Rijk, P.D.; Schneider, L.; Cordts, M.; Gavrila, D. Structural Knowledge Distillation for Object Detection. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022. [Google Scholar]
- Cao, W.; Zhang, Y.; Gao, J.; Cheng, A.; Cheng, K.; Cheng, J. PKD: General Distillation Framework for Object Detectors via Pearson Correlation Coefficient. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022. [Google Scholar]
- Zhao, B.; Cui, Q.; Song, R.; Qiu, Y.; Liang, J. Decoupled Knowledge Distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 11953–11962. [Google Scholar]
- Jin, Y.; Wang, J.; Lin, D. Multi-Level Logit Distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 19–24 June 2023; pp. 24276–24285. [Google Scholar]
- Feng, T.; Wang, M.; Yuan, H. Overcoming Catastrophic Forgetting in Incremental Object Detection via Elastic Response Distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 9427–9436. [Google Scholar]
- Kang, Z.; Zhang, P.; Zhang, X.; Sun, J.; Zheng, N. Instance-Conditional Knowledge Distillation for Object Detection. In Advances in Neural Information Processing Systems; Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2021; Volume 34, pp. 16468–16480. [Google Scholar]
- Li, L.; Dong, P.; Li, A.; Wei, Z.; Yang, Y. KD-Zero: Evolving Knowledge Distiller for Any Teacher-Student Pairs. In Advances in Neural Information Processing Systems; Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2023; Volume 36, pp. 69490–69504. [Google Scholar]
- Wang, L.; Li, X.; Liao, Y.; Jiang, Z.; Wu, J.; Wang, F.; Qian, C.; Liu, S. HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022. [Google Scholar]
- Dai, X.; Jiang, Z.; Wu, Z.; Bao, Y.; Wang, Z.; Liu, S.; Zhou, E. General Instance Distillation for Object Detection. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 7838–7847. [Google Scholar]
- Guo, J.; Han, K.; Wang, Y.; Wu, H.; Chen, X.; Xu, C.; Xu, C. Distilling Object Detectors via Decoupled Features. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 2154–2164. [Google Scholar]
- Wang, T.; Yuan, L.; Zhang, X.; Feng, J. Distilling Object Detectors with Fine-Grained Feature Imitation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Yang, Z.; Li, Z.; Jiang, X.; Gong, Y.; Yuan, Z.; Zhao, D.; Yuan, C. Focal and Global Knowledge Distillation for Detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 4643–4652. [Google Scholar]
- Zhang, L.; Ma, K. Improve Object Detection with Feature-based Knowledge Distillation: Towards Accurate and Efficient Detectors. In Proceedings of the International Conference on Learning Representations, Virtual Event, 3–7 May 2021. [Google Scholar]
- Wu, D.; Chen, P.; Yu, X.; Li, G.; Han, Z.; Jiao, J. Spatial Self-Distillation for Object Detection with Inaccurate Bounding Boxes. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 6855–6865. [Google Scholar]
- He, Y.; Zhu, C.; Wang, J.; Savvides, M.; Zhang, X. Bounding Box Regression with Uncertainty for Accurate Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
- Bilen, H.; Vedaldi, A. Weakly Supervised Deep Detection Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- Liu, C.; Wang, K.; Lu, H.; Cao, Z.; Zhang, Z. Robust Object Detection with Inaccurate Bounding Boxes. arXiv 2022, arXiv:2207.09697. Available online: http://arxiv.org/abs/2207.09697 (accessed on 26 November 2025).
- Chen, P.; Yu, X.; Han, X.; Hassan, N.; Wang, K.; Li, J.; Zhao, J.; Shi, H.; Han, Z.; Ye, Q. Point-to-Box Network for Accurate Object Detection via Single Point Supervision. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022. [Google Scholar]
- Chadwick, S.; Newman, P. Training Object Detectors with Noisy Data. In Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV), Paris, France, 9–12 June 2019; pp. 1319–1325. [Google Scholar]
- Li, J.; Xiong, C.; Socher, R.; Hoi, S.C.H. Towards Noise-resistant Object Detection with Noisy Annotations. arXiv 2020, arXiv:2003.01285. [Google Scholar] [CrossRef]
- Mao, J.; Yu, Q.; Yamakata, Y.; Aizawa, K. Noisy Annotation Refinement for Object Detection. arXiv 2021, arXiv:2110.10456. [Google Scholar] [CrossRef]
- Xu, Y.; Zhu, L.; Yang, Y.; Wu, F. Training Robust Object Detectors From Noisy Category Labels and Imprecise Bounding Boxes. IEEE Trans. Image Process. 2021, 30, 5782–5792. [Google Scholar] [CrossRef]
- Ma, Z.; Luo, G.; Gao, J.; Li, L.; Chen, Y.; Wang, S.; Zhang, C.; Hu, W. Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 14074–14083. [Google Scholar]
- Zareian, A.; Rosa, K.D.; Hu, D.H.; Chang, S.F. Open-Vocabulary Object Detection Using Captions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 14393–14402. [Google Scholar]
- Bansal, A.; Sikka, K.; Sharma, G.; Chellappa, R.; Divakaran, A. Zero-Shot Object Detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
- Gu, X.; Lin, T.Y.; Kuo, W.; Cui, Y. Open-vocabulary Object Detection via Vision and Language Knowledge Distillation. In Proceedings of the International Conference on Learning Representations, Virtual Event, 25–29 April 2022. [Google Scholar]
- Li, Z.; Yao, L.; Zhang, X.; Wang, X.; Kanhere, S.S.; Zhang, H. Zero-Shot Object Detection with Textual Descriptions. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019. [Google Scholar]
- Zheng, Y.; Huang, R.; Han, C.; Huang, X.; Cui, L. Background Learnable Cascade for Zero-Shot Object Detection. In Proceedings of the Asian Conference on Computer Vision (ACCV), Kyoto, Japan, 30 November–4 December 2020. [Google Scholar]
- Zheng, Y.; Wu, J.; Qin, Y.; Zhang, F.; Cui, L. Zero-Shot Instance Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 2593–2602. [Google Scholar]
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning Transferable Visual Models From Natural Language Supervision. arXiv 2021, arXiv:2103.00020. Available online: http://arxiv.org/abs/2103.00020 (accessed on 26 November 2025).
- Li, D.; Tasci, S.; Ghosh, S.; Zhu, J.; Zhang, J.; Heck, L. RILOD: Near real-time incremental learning for object detection at the edge. In Proceedings of the 4th ACM/IEEE Symposium on Edge Computing, Bellevue, WA, USA, 7–9 November 2019. [Google Scholar]
- Zheng, Z.; Ye, R.; Wang, P.; Wang, J.; Ren, D.; Zuo, W. Localization Distillation for Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 10070–10083. [Google Scholar] [CrossRef] [PubMed]
- Cho, J.H.; Hariharan, B. On the Efficacy of Knowledge Distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
- Furlanello, T.; Lipton, Z.C.; Tschannen, M.; Itti, L.; Anandkumar, A. Born Again Neural Networks. arXiv 2018, arXiv:1805.04770. Available online: http://arxiv.org/abs/1805.04770 (accessed on 26 November 2025).
- Mirzadeh, S.I.; Farajtabar, M.; Li, A.; Levine, N.; Matsukawa, A.; Ghasemzadeh, H. Improved Knowledge Distillation via Teacher Assistant. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019. [Google Scholar]
- Yang, C.; Xie, L.; Su, C.; Yuille, A.L. Snapshot Distillation: Teacher-Student Optimization in One Generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
- Zhang, Y.; Xiang, T.; Hospedales, T.M.; Lu, H. Deep Mutual Learning. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 4320–4328. [Google Scholar]
- Pham, C.; Nguyen, V.A.; Le, T.; Phung, D.; Carneiro, G.; Do, T.T. Frequency Attention for Knowledge Distillation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 4–8 January 2024; pp. 2277–2286. [Google Scholar]
- Ji, M.; Heo, B.; Park, S. Show, Attend and Distill: Knowledge Distillation via Attention-based Feature Matching. arXiv 2021, arXiv:2102.02973. [Google Scholar] [CrossRef]
- Shin, S.; Lee, J.; Lee, J.; Yu, Y.; Lee, K. Teaching Where to Look: Attention Similarity Knowledge Distillation for Low Resolution Face Recognition. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022. [Google Scholar]
- Heideman, M.; Johnson, D.; Burrus, C. Gauss and the history of the fast Fourier transform. Arch. Hist. Exact Sci. 1985, 34, 265–277. [Google Scholar] [CrossRef]
- Li, Q.; Jin, S.; Yan, J. Mimicking Very Efficient Network for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Huang, J.; Guo, Z. Pixel-Wise Contrastive Distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–10 October 2023; pp. 16359–16369. [Google Scholar]
- Lee, S.H.; Kim, D.H.; Song, B.C. Self-supervised Knowledge Distillation Using Singular Value Decomposition. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
- Hadsell, R.; Chopra, S.; LeCun, Y. Dimensionality Reduction by Learning an Invariant Mapping. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 17–22 June 2006; Volume 2, pp. 1735–1742. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum Contrast for Unsupervised Visual Representation Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
- Huang, T.; You, S.; Wang, F.; Qian, C.; Xu, C. Knowledge Distillation from a Stronger Teacher. In Advances in Neural Information Processing Systems; Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A., Eds.; Curran Associates, Inc.: New York, NY, USA, 2022; Volume 35, pp. 33716–33727. [Google Scholar]
- Zhou, H.; Song, L.; Chen, J.; Zhou, Y.; Wang, G.; Yuan, J.; Zhang, Q. Rethinking Soft Labels for Knowledge Distillation: A Bias-Variance Tradeoff Perspective. arXiv 2021, arXiv:2102.00650. [Google Scholar] [CrossRef]
- Cao, Y.; Xu, J.; Lin, S.; Wei, F.; Hu, H. GCNet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1971–1980. [Google Scholar]
- Lao, S.; Song, G.; Liu, B.; Liu, Y.; Yang, Y. UniKD: Universal Knowledge Distillation for Mimicking Homogeneous or Heterogeneous Object Detectors. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–10 October 2023; pp. 6362–6372. [Google Scholar]
- Lu, X.; Li, Q.; Li, B.; Yan, J. MimicDet: Bridging the Gap Between One-Stage and Two-Stage Object Detection. arXiv 2020, arXiv:2009.11528. [Google Scholar]
- Yao, L.; Pi, R.; Xu, H.; Zhang, W.; Li, Z.; Zhang, T. G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 3571–3580. [Google Scholar]
- Huang, T.; Zhang, Y.; Zheng, M.; You, S.; Wang, F.; Qian, C.; Xu, C. Knowledge Diffusion for Distillation. In Advances in Neural Information Processing Systems; Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2023; Volume 36, pp. 65299–65316. [Google Scholar]
- Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. arXiv 2020, arXiv:2006.11239. [Google Scholar] [CrossRef]
- Song, J.; Meng, C.; Ermon, S. Denoising Diffusion Implicit Models. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 3–7 May 2021. [Google Scholar]
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-Resolution Image Synthesis with Latent Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 10684–10695. [Google Scholar]
- He, X.; Zhao, K.; Chu, X. AutoML: A Survey of the State-of-the-Art. arXiv 2019, arXiv:1908.00709. [Google Scholar] [CrossRef]
- Świechowski, M.; Godlewski, K.; Sawicki, B.; Mańdziuk, J. Monte Carlo Tree Search: A review of recent modifications and applications. Artif. Intell. Rev. 2022, 56, 2497–2562. [Google Scholar] [CrossRef]
- Zheng, Z.; Ye, R.; Wang, P.; Ren, D.; Zuo, W.; Hou, Q.; Cheng, M.M. Localization Distillation for Dense Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 9407–9416. [Google Scholar]
- Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Tang, J.; Yang, J. Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection. In Advances in Neural Information Processing Systems; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H., Eds.; Curran Associates, Inc.: New York, NY, USA, 2020; Volume 33, pp. 21002–21012. [Google Scholar]
- Qiu, H.; Li, H.; Wu, Q.; Shi, H. Offset Bin Classification Network for Accurate Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 13185–13194. [Google Scholar] [CrossRef]
- Son, W.; Na, J.; Choi, J.; Hwang, W. Densely Guided Knowledge Distillation Using Multiple Teacher Assistants. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 9395–9404. [Google Scholar]
- Yang, L.; Zhou, X.; Li, X.; Qiao, L.; Li, Z.; Yang, Z.; Wang, G.; Li, X. Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–10 October 2023; pp. 17175–17184. [Google Scholar]
- Guo, Q.; Mu, Y.; Chen, J.; Wang, T.; Yu, Y.; Luo, P. Scale-Equivalent Distillation for Semi-Supervised Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 14522–14531. [Google Scholar]
- Liu, Y.C.; Ma, C.Y.; He, Z.; Kuo, C.W.; Chen, K.; Zhang, P.; Wu, B.; Kira, Z.; Vajda, P. Unbiased Teacher for Semi-Supervised Object Detection. arXiv 2021, arXiv:2102.09480. Available online: http://arxiv.org/abs/2102.09480 (accessed on 26 November 2025).
- Sohn, K.; Zhang, Z.; Li, C.L.; Zhang, H.; Lee, C.Y.; Pfister, T. A Simple Semi-Supervised Learning Framework for Object Detection. arXiv 2020, arXiv:2005.04757. [Google Scholar] [CrossRef]
- Yang, Q.; Wei, X.; Wang, B.; Hua, X.S.; Zhang, L. Interactive Self-Training with Mean Teachers for Semi-Supervised Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 5941–5950. [Google Scholar]
- Zhou, Q.; Yu, C.; Wang, Z.; Qian, Q.; Li, H. Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 4081–4090. [Google Scholar]
- Zhu, Y.; Zhou, Q.; Liu, N.; Xu, Z.; Ou, Z.; Mou, X.; Tang, J. ScaleKD: Distilling Scale-Aware Knowledge in Small Object Detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 19–25 June 2023; pp. 19723–19733. [Google Scholar]
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving Into High Quality Object Detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6154–6162. [Google Scholar]
- Yang, Z.; Liu, S.; Hu, H.; Wang, L.; Lin, S. RepPoints: Point Set Representation for Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9656–9665. [Google Scholar]
- He, R.; Sun, S.; Yang, J.; Bai, S.; Qi, X. Knowledge Distillation As Efficient Pre-Training: Faster Convergence, Higher Data-Efficiency, and Better Transferability. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 9161–9171. [Google Scholar]
- Joulin, A.; Van Der Maaten, L.; Jabri, A.; Vasilache, N. Learning Visual Features from Large Weakly Supervised Data. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; pp. 67–84. [Google Scholar] [CrossRef]
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
- Kuznetsova, A.; Rom, H.; Alldrin, N.G.; Uijlings, J.R.R.; Krasin, I.; Pont-Tuset, J.; Kamali, S.; Popov, S.; Malloci, M.; Kolesnikov, A.; et al. The Open Images Dataset V4. Int. J. Comput. Vis. 2018, 128, 1956–1981. [Google Scholar] [CrossRef]
- Mahajan, D.; Girshick, R.; Ramanathan, V.; He, K.; Paluri, M.; Li, Y.; Bharambe, A.; van der Maaten, L. Exploring the Limits of Weakly Supervised Pretraining. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
- Sun, C.; Shrivastava, A.; Singh, S.; Gupta, A. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 843–852. [Google Scholar] [CrossRef]
- Klema, V.; Laub, A. The singular value decomposition: Its computation and some applications. IEEE Trans. Autom. Control 1980, 25, 164–176. [Google Scholar] [CrossRef]
- Tomani, C.; Cremers, D.; Buettner, F. Parameterized Temperature Scaling for Boosting the Expressive Power in Post-Hoc Uncertainty Calibration. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022. [Google Scholar]
- Liu, Z.; Hu, X.; Nevatia, R. Efficient Feature Distillation for Zero-Shot Annotation Object Detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 1–5 January 2024; pp. 893–902. [Google Scholar]
- Kang, M.; Zhang, J.; Zhang, J.; Wang, X.; Chen, Y.; Ma, Z.; Huang, X. Alleviating Catastrophic Forgetting of Incremental Object Detection via Within-Class and Between-Class Knowledge Distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–7 October 2023; pp. 18894–18904. [Google Scholar]
- Zeng, G.; Chen, Y.; Cui, B.; Yu, S. Continual learning of context-dependent processing in neural networks. Nat. Mach. Intell. 2019, 1, 364–372. [Google Scholar] [CrossRef]
- Menezes, A.G.; de Moura, G.; Alves, C.; de Carvalho, A.C.P.L.F. Continual Object Detection: A Review of Definitions, Strategies, and Challenges. Neural Netw. 2023, 161, 476–493. [Google Scholar] [CrossRef]
- Wang, L.; Liu, Y.; Du, P.; Ding, Z.; Liao, Y.; Qi, Q.; Chen, B.; Liu, S. Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 19–24 June 2023; pp. 11186–11196. [Google Scholar]
- Chang, J.; Wang, S.; Xu, H.M.; Chen, Z.; Yang, C.; Zhao, F. DETRDistill: A Universal Knowledge Distillation Framework for DETR-families. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–7 October 2023; pp. 6898–6908. [Google Scholar]
- Shehzadi, T.; Hashmi, K.A.; Liwicki, M.; Stricker, D.; Afzal, M.Z. Object Detection with Transformers: A Review. Sensors 2025, 25, 6025. [Google Scholar] [CrossRef]
- Shehzadi, T.; Hashmi, K.A.; Stricker, D.; Afzal, M.Z. Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 5840–5850. [Google Scholar]
- Liu, Z.; Liu, Q.; Liu, T.; Wang, Y.; Wen, W. Feature Distillation: DNN-Oriented JPEG Compression Against Adversarial Examples. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 860–868. [Google Scholar]
- Li, L.; Miao, J.; Shi, D.; Tan, W.; Ren, Y.; Yang, Y.; Pu, S. Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–7 October 2023; pp. 6501–6510. [Google Scholar]
- Jia, C.; Yang, Y.; Xia, Y.; Chen, Y.T.; Parekh, Z.; Pham, H.; Le, Q.V.; Sung, Y.H.; Li, Z.; Duerig, T. Scaling Up Visual and Vision-Language Representation Learning with Noisy Text Supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML), Virtual, 18–24 July 2021; Proceedings of Machine Learning Research. Volume 139, pp. 4904–4916. [Google Scholar]
- Shehzadi, T.; Ifza; Stricker, D.; Afzal, M.Z. Semi-Supervised Object Detection: A Survey on Progress from CNN to Transformer. arXiv 2024, arXiv:2407.08460. [Google Scholar] [CrossRef]
- Yousaf, A.; Shehzadi, T.; Farooq, A.; Ilyas, K. Protein active site prediction for early drug discovery and designing. Int. Rev. Appl. Sci. Eng. 2021, 13, 98–105. [Google Scholar] [CrossRef]
- Shehzadi, T.; Majid, A.; Hameed, M.; Farooq, A.; Yousaf, A. Intelligent predictor using cancer-related biologically information extraction from cancer transcriptomes. In Proceedings of the 2020 International Symposium on Recent Advances in Electrical Engineering & Computer Sciences (RAEE & CS), Islamabad, Pakistan, 12–14 December 2020; Volume 5, pp. 1–5. [Google Scholar] [CrossRef]
- Shehzadi, T.; Hashmi, K.A.; Pagani, A.; Liwicki, M.; Stricker, D.; Afzal, M.Z. Mask-Aware Semi-Supervised Object Detection in Floor Plans. Appl. Sci. 2022, 12, 9398. [Google Scholar] [CrossRef]
- Shehzadi, T.; Ifza, I.; Stricker, D.; Afzal, M.Z. FD-SSD: Semi-supervised Detection of Bone Fenestration and Dehiscence in Intraoral Images. In Proceedings of the Medical Image Understanding and Analysis (MIUA), Leeds, UK, 15–17 July 2025; p. 15917. [Google Scholar]
- Xie, Q.; Hovy, E.H.; Luong, M.T.; Le, Q.V. Self-Training with Noisy Student Improves ImageNet Classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10684–10695. [Google Scholar] [CrossRef]
- Zhu, Y.; Wang, Y. Student Customized Knowledge Distillation: Bridging the Gap Between Student and Teacher. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 5037–5046. [Google Scholar] [CrossRef]
- Beyer, L.; Zhai, X.; Royer, A.; Markeeva, L.; Anil, R.; Kolesnikov, A. Knowledge distillation: A good teacher is patient and consistent. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 10915–10924. [Google Scholar] [CrossRef]
- Li, Z.; Li, X.; Yang, L.; Zhao, B.; Song, R.; Luo, L.; Li, J.; Yang, J. Curriculum Temperature for Knowledge Distillation. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 1504–1512. [Google Scholar] [CrossRef]
- Qiu, Z.; Ma, X.; Yang, K.; Liu, C.; Hou, J.; Yi, S.; Ouyang, W. Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation. In Proceedings of the International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
- Park, J.; No, A. Prune Your Model Before Distill It. In Proceedings of the Computer Vision—ECCV 2022, Tel Aviv, Israel, 23–27 October 2022; pp. 120–136. [Google Scholar]
- Wang, X.; Li, Y.; Chen, Z.; Zhao, H. SemiTabDETR: End-to-End Semi-supervised Table Detection with Transformer-Based Enhanced Query Approach. In Proceedings of the Document Analysis and Recognition—ICDAR, Wuhan, China, 16–21 September 2025; pp. 259–279. [Google Scholar]
- Shehzadi, T.; Stricker, D.; Afzal, M.Z. A Hybrid Approach for Document Layout Analysis in Document Images. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), Athens, Greece, 30 August–4 September 2024; pp. 21–39. [Google Scholar]
- Shehzadi, T.; Hashmi, K.A.; Stricker, D.; Liwicki, M.; Afzal, M.Z. Towards End-to-End Semi-Supervised Table Detection with Deformable Transformer. In Proceedings of the Document Analysis and Recognition—ICDAR 2023, San José, CA, USA, 21–26 August 2023; pp. 51–76. [Google Scholar]
- Shehzadi, T.; Ifza, I.; Stricker, D.; Afzal, M.Z. DocSemi: Efficient Document Layout Analysis with Guided Queries. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Honolulu, HI, USA, 19–23 October 2025; pp. 7536–7546. [Google Scholar]
- Shehzadi, T.; Ifza, I.; Stricker, D.; Afzal, M.Z. Efficient Additive Attention for Transformer-based Semi-supervised Document Layout Analysis. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Honolulu, HI, USA, 19–25 October 2025; pp. 3495–3503. [Google Scholar]
- Lin, T.Y.; Maire, M.; Belongie, S.J.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014. [Google Scholar]
| Methods | Stages | Venue | Teacher | Student (Baseline mAP) | COCO mAP | COCO AP50 | COCO AP75 |
|---|---|---|---|---|---|---|---|
| HEAD [120] | One-Stage | ECCV22 | Faster R50 | RetinaNet R18 (31.7) | 36.2 | - | - |
| DIST [160] | One-Stage | NeurIPS22 | RetinaNet-X101 | RetinaNet-R50 (37.4) | 40.1 | 59.4 | 23.2 |
| SKD [113] | One-Stage | NeurIPS22 | RetinaNet-R101 | RetinaNet-R50 (36.4) | 40.1 | 59.2 | 43.1 |
| HierKD [135] | One-Stage | CVPR22 | CLIP-R50 | ATSS | - | 43.6 | - |
| ERD [117] | One-Stage | CVPR22 | GFLV1 R50 | GFLV1 R50 | 40.2 | 58.3 | 43.6 |
| FGD [124] | One-Stage | CVPR22 | RetinaNet-X101 | RetinaNet-R50 (37.4) | 40.7 | - | - |
| UniKD [163] | One-Stage | ICCV23 | RetinaNet-X101 | RetinaNet-R50 (37.4) | 40.7 | - | - |
| ICD [118] | One-Stage | NeurIPS21 | FCOS R101 | FCOS R50 | 42.6 | 61.6 | 45.8 |
| LD [172] | One-Stage | CVPR22 | RetinaNet-R101 | RetinaNet-R50 (36.9) | 42.7 | 60.2 | 46.7 |
| KD-Zero [119] | One-Stage | NeurIPS23 | RetinaNet-X101 | RetinaNet-R50 (37.4) | 40.9 | 60.4 | 43.5 |
| DiffKD [166] | One-Stage | NeurIPS23 | RetinaNet-X101 | RetinaNet-R50 (37.4) | 41.4 | 60.7 | 44.0 |
| Auto-KD [14] | One-Stage | ICCV23 | RetinaNet-X101 | RetinaNet-R50 (37.4) | 41.1 | - | - |
| ScaleKD [182] | One-Stage | CVPR23 | FCOS R101 | FCOS R50 (38.5) | 44.0 | 61.6 | 41.6 |
| PKD [114] | One-Stage | NeurIPS22 | TOOD-X101 | TOOD-R50 (42.4) | 45.5 | 62.8 | 49.3 |
| HEAD [120] | Two-Stage | ECCV22 | Cascade R50 | Faster R18 (33.9) | 36.7 | - | - |
| FAM [150] | Two-Stage | WACV24 | Faster R101 | Faster R50 (37.9) | 40.8 | 61.4 | 44.5 |
| MLLD [116] | Two-Stage | CVPR23 | Faster R101 | Faster R50 (37.9) | 40.2 | 61.7 | 44.6 |
| DKD [115] | Two-Stage | CVPR22 | Faster R101 | Faster R50 (37.9) | 40.7 | 61.5 | 44.4 |
| SKD [113] | Two-Stage | NeurIPS22 | Faster X101 | Faster R50 (37.4) | 40.9 | 61.0 | 44.9 |
| PKD [114] | Two-Stage | NeurIPS22 | Faster R101 | Faster R50 (37.9) | 40.5 | 60.9 | 44.4 |
| Auto-KD [14] | Two-Stage | ICCV23 | Cascade R101 | Faster R50 (38.4) | 42.4 | - | - |
| ICD [118] | Two-Stage | NeurIPS21 | Faster R101 | Faster R50 (37.9) | 40.9 | - | - |
| DIST [160] | Two-Stage | NeurIPS22 | Cascade R101 | Faster R50 (38.4) | 41.8 | 62.4 | 45.6 |
| KD-Zero [119] | Two-Stage | NeurIPS23 | Cascade R101 | Faster R50 (38.4) | 41.9 | 62.7 | 45.5 |
| FGD [124] | Two-Stage | CVPR22 | Cascade X101 | Faster R50 (38.4) | 42.0 | - | - |
| UniKD [163] | Two-Stage | ICCV23 | Cascade R101 | Faster R50 (38.4) | 42.3 | - | - |
| DiffKD [166] | Two-Stage | NeurIPS23 | Faster R101 | Faster R50 (38.4) | 42.4 | 62.9 | 46.4 |
| CTCP [176] | Two-Stage | ICCV23 | GFocal R101 | GFocal R50 (40.1) | 43.2 | 61.6 | 46.9 |
| SED [177] | Two-Stage | CVPR22 | Faster R50 | Faster R50 (38.4) | 43.4 | - | - |
| ScaleKD [182] | Two-Stage | CVPR23 | Cascade R101 | Cascade R50 (41.0) | 44.9 | 63.2 | 48.8 |
| DK-DETR [202] | Transformer-Based | ICCV23 | VLM | Deformable DETR | 39.4 | 54.3 | 43.0 |
| Forget [194] | Transformer-Based | ICCV23 | Deformable DETR | Deformable DETR | 39.8 | - | - |
| DETRDistill [198] | Transformer-Based | ICCV23 | AdaMixer ResNet-101 | AdaMixer ResNet-50 (42.3) | 44.7 | - | - |
| Methods | Advantages | Limitations |
|---|---|---|
| HEAD [120] | Effectively bridges the semantic gap between heterogeneous teacher–student detectors | Requires customized algorithm designs for each teacher–student pair, limiting its flexibility |
| DK-DETR [202] | Significantly improves open-vocabulary object detection performance for novel categories without degrading the performance of base categories | Depends on auxiliary queries that are used only during training, introducing additional complexity and abstraction |
| FAM [150] | Captures global information, enhancing the student's ability to mimic the teacher's features | Incurs additional computational cost due to the use of FFT and inverse FFT operations |
| MLLD [116] | Introduces multi-level prediction alignment to enhance logit distillation | Adds computational overhead and extra hyperparameters, limiting its practicality for large-scale tasks and industrial applications |
| DKD [115] | Improves the effectiveness of logit distillation by decoupling the distillation of target and non-target class knowledge (see the sketch after this table) | Harder to apply in industrial settings because of the increased number of hyperparameters |
| DIST [160] | Enhances knowledge distillation by leveraging both inter-class and intra-class relational knowledge through a correlation-based loss | Mainly targets homogeneous teacher–student model pairs, leaving cross-architecture distillation largely unexplored |
| SKD [113] | Captures additional relational knowledge in the feature space by taking spatial relationships into account | Its effectiveness may depend on the specific experimental setup |
| HierKD [135] | Enables the detection of both seen and unseen categories, improving performance in open-vocabulary settings | Requires substantial computational resources due to its dependence on large-scale pre-trained vision-language models |
| ERD [117] | Highly effective in preserving knowledge from old classes during incremental object detection | Limited by its heavy reliance on low-level feature selection and the underutilization of high-level semantic information |
| Forget [194] | Excels in retaining knowledge from old classes during incremental object detection | Adapting this method to CNN-based detectors requires careful design to balance performance and computational efficiency |
| FGD [124] | Effectively focuses on critical pixels and channels while incorporating global contextual information | Relies on precise ground-truth bounding boxes to separate foreground and background |
| UniKD [163] | Transfers knowledge between diverse teacher–student pairs without complex adjustments | Learning from advanced teachers with different architectures does not always improve performance |
| ICD [118] | Effectively transfers knowledge useful for both the classification and localization of each instance | High complexity and computational cost, especially during training |
| LD [172] | Improves dense object detection by transferring fine-grained localization knowledge | Relies on a high-performing teacher model, which may not always be available or feasible to train |
| KD-Zero [119] | Automatically discovers optimal distillation strategies for any teacher–student pair | Depends on expert knowledge for the initial setup and struggles with mismatched architectural styles between teacher and student models |
| DiffKD [166] | Aligns student and teacher features through denoising, improving performance across tasks and models | High computational cost due to the diffusion-based denoising process |
| CTCP [176] | Improves performance by addressing protocol inconsistencies without additional costs | The approach reduces classification errors but does not contribute to reducing localization errors |
| SED [177] | Effectively handles large object-size variance and class imbalance in semi-supervised object detection | Increased complexity and computational demands limit scalability and practical implementation |
| DETRDistill [198] | Enhances performance across different DETR models | Additional training complexity is required to adapt the distillation framework to each DETR variant |
| Auto-KD [14] | Efficiently selects distillation strategies using Monte Carlo Tree Search | May struggle to adapt the discovered strategies to a wide range of model architectures and tasks |
| ScaleKD [182] | Enhances small object detection by transferring scale-aware knowledge into the student model | Struggles to handle scale-aware knowledge transfer for objects of sizes not well-represented in the training data |
| PKD [114] | Focuses on relational information between features, reducing the impact of magnitude differences | Features corrupted by downsampling in the teacher model can hinder the student's ability to imitate specific information |
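To make the logit-level decoupling referenced in the table above concrete, the following is a minimal PyTorch sketch of a decoupled logit-distillation loss in the spirit of DKD [115]: the standard KD divergence is split into a target-class (TCKD) term and a non-target-class (NCKD) term that are weighted independently. This is an illustrative sketch for a plain classification head, not the authors' reference implementation; the function name and the hyperparameters alpha, beta, and T are our own assumptions.

```python
import torch
import torch.nn.functional as F


def decoupled_kd_loss(student_logits, teacher_logits, target,
                      alpha=1.0, beta=8.0, T=4.0):
    """Illustrative decoupled logit distillation: TCKD + NCKD (classification setting)."""
    num_classes = student_logits.size(1)
    gt_mask = F.one_hot(target, num_classes).float()   # 1 at the target class
    other_mask = 1.0 - gt_mask                          # 1 at non-target classes

    p_s = F.softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)

    # TCKD: match the binary (target vs. non-target) probability of teacher and student
    p_s_bin = torch.stack([(p_s * gt_mask).sum(1), (p_s * other_mask).sum(1)], dim=1)
    p_t_bin = torch.stack([(p_t * gt_mask).sum(1), (p_t * other_mask).sum(1)], dim=1)
    tckd = F.kl_div(torch.log(p_s_bin + 1e-8), p_t_bin, reduction="batchmean") * (T ** 2)

    # NCKD: KL divergence over non-target classes only (target logit masked out)
    log_p_s_nt = F.log_softmax(student_logits / T - 1000.0 * gt_mask, dim=1)
    p_t_nt = F.softmax(teacher_logits / T - 1000.0 * gt_mask, dim=1)
    nckd = F.kl_div(log_p_s_nt, p_t_nt, reduction="batchmean") * (T ** 2)

    # Independent weights are what distinguish the decoupled loss from vanilla KD
    return alpha * tckd + beta * nckd


if __name__ == "__main__":
    # Toy usage: batch of 4 samples, 80 classes as in COCO
    s = torch.randn(4, 80)
    t = torch.randn(4, 80)
    y = torch.randint(0, 80, (4,))
    print(decoupled_kd_loss(s, t, y).item())
```

In a detector, such a loss would typically be applied per predicted box or per dense location rather than per image; the key design choice illustrated here is only that the target-class and non-target-class contributions receive separate weights.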
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.