Shared Knowledge Distillation Network for Object Detection
Abstract
1. Introduction
- Our analysis of feature gaps and their roles in distillation shows that cross-layer feature gaps within the student network are significantly smaller than those between the student and the teacher. This observation motivates our new Shared Knowledge Distillation (Shared-KD) framework.
- Our Shared-KD technique minimizes the gap between shared teacher–student features and cross-layer features within the student, achieving cross-layer distillation without complex feature transformations.
- Shared-KD outperforms state-of-the-art feature-distillation methods across a range of deep models and datasets, delivering both higher accuracy and faster training.
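The loss structure implied by these contributions can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation: the function names, the use of plain MSE as the gap measure, and the `cross_weight` balancing factor are all illustrative assumptions, and real detector features would need projection to matching shapes first.

```python
import numpy as np

def mse_gap(a, b):
    # Mean-squared distance between two equally shaped feature maps.
    return float(np.mean((a - b) ** 2))

def shared_kd_loss(student_feats, teacher_feats, cross_weight=0.5):
    # Identical-layer term: student layer i vs. teacher layer i.
    identical = sum(mse_gap(s, t) for s, t in zip(student_feats, teacher_feats))
    # Cross-layer term: adjacent layers within the student, whose gaps
    # the paper observes are smaller than student-teacher gaps.
    cross = sum(mse_gap(student_feats[i], student_feats[i + 1])
                for i in range(len(student_feats) - 1))
    return identical + cross_weight * cross
```

Because the cross-layer term compares features inside the student itself, it avoids the complex alignment transformations that cross-layer teacher-to-student matching usually requires.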
2. Related Work
2.1. Object Detection
2.2. Knowledge Distillation
3. Shared Knowledge Distillation
3.1. Knowledge Distillation
3.2. Conventional Cross-Layer Distillation
3.3. Formulation of Shared Knowledge Distillation
Algorithm 1 Shared Knowledge Distillation for Object Detection
Input: Teacher: T, Student: S, Input: x, label: y, hyper-parameter:
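The algorithm listing above is truncated in this rendering, so the following is only a hedged sketch of what one Shared-KD training step plausibly looks like. All names (`training_step`, the callable teacher/student returning features plus predictions, the squared-error stand-in for the detection loss, the `alpha` weight) are hypothetical, not taken from the paper.

```python
import numpy as np

def training_step(teacher, student, x, y, alpha=0.5):
    # Teacher forward pass; its weights would be frozen in practice.
    t_feats, _ = teacher(x)
    # Student forward pass returns intermediate features and predictions.
    s_feats, s_pred = student(x)
    # Detection loss, stubbed here as a squared error against the label.
    task = float(np.mean((s_pred - y) ** 2))
    # Distillation term: mean gap over matched teacher-student feature pairs.
    distill = float(np.mean([np.mean((s - t) ** 2)
                             for s, t in zip(s_feats, t_feats)]))
    return task + alpha * distill
```

In a real detector the task term would be the standard classification plus box-regression loss, and the distillation term would be the Shared-KD objective applied to FPN feature levels.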
4. Experiments
4.1. Experiments on Object Detection
4.2. Instance Segmentation
4.3. Ablation Study
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Gao, J.; Wu, D.; Yin, F.; Kong, Q.; Xu, L.; Cui, S. MetaLoc: Learning to learn wireless localization. IEEE J. Sel. Areas Commun. 2023, 41, 3831–3847.
- Cao, X.; Lyu, Z.; Zhu, G.; Xu, J.; Xu, L.; Cui, S. An overview on over-the-air federated edge learning. arXiv 2024, arXiv:2208.05643v1.
- Sun, W.; Zhao, Y.; Ma, W.; Guo, B.; Xu, L.; Duong, T.Q. Accelerating Convergence of Federated Learning in MEC with Dynamic Community. IEEE Trans. Mob. Comput. 2023, 23, 1769–1784.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2012, 60, 84–90.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Li, L.; Li, A. A2-Aug: Adaptive Automated Data Augmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 2266–2273.
- Romero, A.; Ballas, N.; Kahou, S.E.; Chassang, A.; Gatta, C.; Bengio, Y. FitNets: Hints for thin deep nets. arXiv 2014, arXiv:1412.6550.
- Zagoruyko, S.; Komodakis, N. Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. arXiv 2016, arXiv:1612.03928.
- Dong, P.; Niu, X.; Li, L.; Xie, L.; Zou, W.; Ye, T.; Wei, Z.; Pan, H. Prior-Guided One-shot Neural Architecture Search. arXiv 2022, arXiv:2206.13329.
- Zhu, C.; Li, L.; Wu, Y.; Sun, Z. SasWOT: Real-time semantic segmentation architecture search without training. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 26 February 2024; Volume 38, pp. 7722–7730.
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531.
- Kim, J.; Park, S.; Kwak, N. Paraphrasing complex network: Network compression via factor transfer. arXiv 2018, arXiv:1802.04977.
- Ahn, S.; Hu, S.X.; Damianou, A.; Lawrence, N.D.; Dai, Z. Variational information distillation for knowledge transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9163–9171.
- Tung, F.; Mori, G. Similarity-preserving knowledge distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1365–1374.
- Heo, B.; Lee, M.; Yun, S.; Choi, J.Y. Knowledge transfer via distillation of activation boundaries formed by hidden neurons. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 3779–3787.
- Huang, Z.; Wang, N. Like What You Like: Knowledge Distill via Neuron Selectivity Transfer. arXiv 2017, arXiv:1707.01219.
- Chen, D.; Mei, J.P.; Zhang, Y.; Wang, C.; Wang, Z.; Feng, Y.; Chen, C. Cross-Layer Distillation with Semantic Calibration. arXiv 2020, arXiv:2012.03236.
- Chung, I.; Park, S.; Kim, J.; Kwak, N. Feature-map-level online adversarial knowledge distillation. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual Event, 13–18 July 2020; pp. 2006–2015.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv 2015, arXiv:1506.01497.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696.
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9627–9636.
- Li, L. Self-regulated feature learning via teacher-free feature distillation. In Proceedings of the European Conference on Computer Vision, Springer, Tel Aviv, Israel, 23–27 October 2022; pp. 347–363.
- Dong, P.; Li, L.; Wei, Z. DisWOT: Student architecture search for distillation without training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 11898–11908.
- Liu, X.; Li, L.; Li, C.; Yao, A. NORM: Knowledge distillation via N-to-one representation matching. arXiv 2023, arXiv:2305.13803.
- Li, L.; Liang, S.N.; Yang, Y.; Jin, Z. Teacher-free distillation via regularizing intermediate representation. In Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18–23 July 2022; pp. 1–6.
- Li, L.; Dong, P.; Wei, Z.; Yang, Y. Automated knowledge distillation via Monte Carlo tree search. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 17413–17424.
- Buciluǎ, C.; Caruana, R.; Niculescu-Mizil, A. Model compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, 20–23 August 2006; pp. 535–541.
- Yim, J.; Joo, D.; Bae, J.; Kim, J. A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4133–4141.
- Zhang, L.; Ma, K. Improve object detection with feature-based knowledge distillation: Towards accurate and efficient detectors. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 30 April 2020.
- Yang, Z.; Li, Z.; Jiang, X.; Gong, Y.; Yuan, Z.; Zhao, D.; Yuan, C. Focal and global knowledge distillation for detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 4643–4652.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part V 13. Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
Models | Distillation | AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|---|
Two-stage detectors | |||||||
Faster RCNN-R101 (T) | - | 39.8 | 60.1 | 43.3 | 22.5 | 43.6 | 52.8 |
Faster RCNN-R50 (S) | - | 38.4 | 59.0 | 42.0 | 21.5 | 42.1 | 50.3 |
Faster RCNN-R50 (S) | FitNets | 38.9 (0.5↑) | 59.5 | 42.4 | 21.9 | 42.2 | 51.6 |
Faster RCNN-R50 (S) | GID | 40.2 (1.8↑) | 60.7 | 43.8 | 22.7 | 44.0 | 53.2 |
Faster RCNN-R50 (S) | FRS | 39.5 (1.1↑) | 60.1 | 43.3 | 22.3 | 43.6 | 51.7 |
Faster RCNN-R50 (S) | FGD | 40.4 (2.0↑) | - | - | 22.8 | 44.5 | 53.5 |
Faster RCNN-R50 (S) | Shared-KD | 40.6 (2.2↑) | 61.6 | 45.0 | 24.5 | 45.6 | 53.7 |
One-stage detectors | |||||||
RetinaNet-R101 (T) | - | 38.9 | 58.0 | 41.5 | 21.0 | 42.8 | 52.4 |
RetinaNet-R50 (S) | - | 37.4 | 56.7 | 39.6 | 20.0 | 40.7 | 49.7 |
RetinaNet-R50 (S) | FitNets | 37.4 (0.0↑) | 57.1 | 40.0 | 20.8 | 40.8 | 50.9 |
RetinaNet-R50 (S) | GID | 39.1 (1.7↑) | 59.0 | 42.3 | 22.8 | 43.1 | 52.3 |
RetinaNet-R50 (S) | FRS | 39.3 (1.9↑) | 58.8 | 42.0 | 21.5 | 43.3 | 52.6 |
RetinaNet-R50 (S) | Shared-KD | 39.4 (2.0↑) | 59.0 | 42.5 | 21.5 | 43.9 | 54.0 |
Anchor-free detectors | |||||||
FCOS-R101 (T) | - | 40.8 | 60.0 | 44.0 | 24.2 | 44.3 | 52.4 |
FCOS-R50 (S) | - | 38.5 | 57.7 | 41.0 | 21.9 | 42.8 | 48.6 |
FCOS-R50 (S) | FitNets | 39.9 (1.4↑) | 58.6 | 43.1 | 23.1 | 43.4 | 52.2 |
FCOS-R50 (S) | GID | 42.0 (3.5↑) | 60.4 | 45.5 | 25.6 | 45.8 | 54.2 |
FCOS-R50 (S) | FRS | 40.9 (2.4↑) | 60.3 | 43.6 | 25.7 | 45.2 | 51.2 |
FCOS-R50 (S) | FGD | 42.1 (3.6↑) | - | - | 27.0 | 46.0 | 54.6 |
FCOS-R50 (S) | Shared-KD | 42.2 (3.7↑) | 60.9 | 46.1 | 25.7 | 46.7 | 54.1 |
Models | Distillation | AP | APS | APM | APL |
---|---|---|---|---|---|
Mask RCNN-R50 (S) | - | 35.4 | 19.1 | 38.6 | 48.4 |
Mask RCNN-R50 (S) | FKD | 37.4 | 19.7 | 40.5 | 52.1 |
Mask RCNN-R50 (S) | FGD | 37.8 | 17.1 | 40.7 | 56.0 |
Mask RCNN-R50 (S) | MGD | 38.1 | 17.1 | 41.1 | 56.3 |
Mask RCNN-R50 (S) | Shared-KD | 41.3 | 23.1 | 45.0 | 55.2 |
Method | AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|
Ours | 40.6 | 61.6 | 45.0 | 24.5 | 45.6 | 53.7 |
without Teacher Share Module | 40.5 | 61.2 | 44.6 | 24.2 | 45.2 | 53.3 |
without identical-layer distillation | 40.1 | 57.0 | 43.5 | 21.0 | 44.0 | 52.5 |
without cross-layer distillation | 40.3 | 60.9 | 43.9 | 23.0 | 44.5 | 53.0 |
Models | Distillation | AP | AP50 | AP75 | APS | APM | APL | Time | Memory |
---|---|---|---|---|---|---|---|---|---|
RetinaNet-R101 (T) | - | 38.9 | 58.0 | 41.5 | 21.0 | 42.8 | 52.4 | - | - |
RetinaNet-R50 (S) | - | 37.4 | 56.7 | 39.6 | 20.0 | 40.7 | 49.7 | - | - |
RetinaNet-R50 (S) | SemCKD [17] | 38.8 (1.4↑) | 58.5 | 49.5 | 22.0 | 43.0 | 52.0 | 13.8 h | 4.5 GB |
RetinaNet-R50 (S) | Shared-KD | 39.4 (2.0↑) | 59.0 | 42.5 | 21.5 | 43.9 | 54.0 | 10.5 h | 3.8 GB |
Method | AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|
Baseline | 37.4 | 56.7 | 39.6 | 20.0 | 40.7 | 49.7 |
2 | 39.4 | 58.3 | 42.3 | 22.6 | 43.5 | 51.2 |
5 | 38.6 | 57.9 | 41.0 | 21.6 | 42.0 | 51.8 |
10 | 37.8 | 57.2 | 40.6 | 21.2 | 41.8 | 51.4 |
20 | 37.2 | 56.8 | 40.1 | 20.8 | 41.2 | 51.0 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Guo, Z.; Zhang, P.; Liang, P. Shared Knowledge Distillation Network for Object Detection. Electronics 2024, 13, 1595. https://doi.org/10.3390/electronics13081595