ADYOLOv5-Face: An Enhanced YOLO-Based Face Detector for Small Target Faces
Abstract
1. Introduction
2. Related Work
2.1. Two-Stage Face Detection Methods
2.2. One-Stage Face Detection Methods
3. Proposed Method
3.1. Overview of ADYOLOv5-Face
3.2. Architecture of the Neck Part
3.3. Details of the Neck Part GD Mechanism
- Low-stage Gather-and-Distribute mechanism
- High-stage Gather-and-Distribute mechanism
3.4. Prediction Head for Tiny Faces
3.5. Loss
4. Experiment Setup
4.1. Datasets
4.2. Experimental Evaluation Metrics
- True Positive (TP): the number of instances correctly predicted as positive (face).
- False Negative (FN): the number of instances incorrectly predicted as negative (non-face) when they are actually positive (face).
- False Positive (FP): the number of instances incorrectly predicted as positive (face) when they are actually negative (non-face).
- True Negative (TN): the number of instances correctly predicted as negative (non-face).
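As a minimal sketch of how these counts feed the precision and recall values reported later, the snippet below computes both from true positive, false positive, and false negative counts. The function name `precision_recall` and the example counts are hypothetical and chosen only for illustration; they are not taken from the paper's experiments. True negatives are not used here, since in detection settings the set of non-face regions is typically not enumerated.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) computed from detection counts.

    precision = TP / (TP + FP): share of predicted faces that are real faces.
    recall    = TP / (TP + FN): share of real faces that were detected.
    A zero denominator yields 0.0 instead of raising an error.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Hypothetical counts, for illustration only.
p, r = precision_recall(tp=80, fp=20, fn=40)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.800 recall=0.667
```

Sweeping the detector's confidence threshold and recomputing these two values at each setting traces out the PR curve; AP summarizes that curve as a single number.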
4.3. Ablation Experiment
4.4. Contrast Experiment
4.4.1. Experiments on Wider Face
4.4.2. Experiments on XD-Face
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
YOLO | You Only Look Once |
CNN | Convolutional Neural Network |
RFE | Receptive Field Enhancement module |
FPN | Feature Pyramid Network |
GD mechanism | Gather-and-Distribute mechanism |
GHM | Gradient Harmonizing Mechanism |
SWF | Slide Weight Function |
NWD | Normalized Wasserstein Distance |
IoU | Intersection over Union |
RPN | Region Proposal Network |
FAM | feature alignment module |
IFM | information fusion module |
IIM | information injection module |
Low-GD | low-stage Gather-and-Distribute mechanism |
AvgPool | average pooling operation |
High-GD | high-stage Gather-and-Distribute mechanism |
AP | Average Precision |
PR | Precision Recall |
TP | True Positive |
FN | False Negative |
FP | False Positive |
TN | True Negative |
References
- Ren, S.; He, K.; Girshick, R.B.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Computer Vision—ECCV 2016, Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14; Springer: Cham, Switzerland, 2016; pp. 21–37.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Deng, J.; Guo, J.; Ververas, E.; Kotsia, I.; Zafeiriou, S. Retinaface: Single-shot multi-level face localisation in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 5203–5212.
- Qi, D.; Tan, W.; Yao, Q.; Liu, J. YOLO5Face: Why reinventing a face detector. In Computer Vision—ECCV 2022 Workshops; Springer: Cham, Switzerland, 2022; pp. 228–244.
- Zhang, S.; Chi, C.; Lei, Z.; Li, S.Z. Refineface: Refinement neural network for high performance face detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 4008–4020.
- Zhu, Y.; Cai, H.; Zhang, S.; Wang, C.; Xiong, Y. Tinaface: Strong but simple baseline for face detection. arXiv 2020, arXiv:2011.13183.
- Zhang, S.; Zhu, X.; Lei, Z.; Shi, H.; Wang, X.; Li, S.Z. Faceboxes: A CPU real-time face detector with high accuracy. In Proceedings of the 2017 IEEE International Joint Conference on Biometrics (IJCB), Denver, CO, USA, 1–4 October 2017; pp. 1–9.
- Ju, L.; Kittler, J.; Rana, M.A.; Yang, W.; Feng, Z. Keep an eye on faces: Robust face detection with heatmap-Assisted spatial attention and scale-Aware layer attention. Pattern Recognit. 2023, 140, 109553.
- Wang, C.; He, W.; Nie, Y.; Guo, J.; Liu, C.; Wang, Y.; Han, K. Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism. In Proceedings of the 37th International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023.
- Chi, C.; Zhang, S.; Xing, J.; Lei, Z.; Li, S.Z.; Zou, X. Selective refinement network for high performance face detection. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 8231–8238.
- Chen, M.; Ren, X.; Yan, Z. Real-time indoor object detection based on deep learning and gradient harmonizing mechanism. In Proceedings of the 2020 IEEE 9th Data Driven Control and Learning Systems Conference (DDCLS), Liuzhou, China, 20–22 November 2020; pp. 772–777.
- Cao, Y.; Chen, K.; Loy, C.C.; Lin, D. Prime sample attention in object detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11583–11591.
- Yu, Z.; Huang, H.; Chen, W.; Su, Y.; Liu, Y.; Wang, X. Yolo-facev2: A scale and occlusion aware face detector. Pattern Recognit. 2024, 155, 110714.
- Jiang, H.; Learned-Miller, E. Face detection with the faster R-CNN. In Proceedings of the 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), Washington, DC, USA, 30 May–3 June 2017; pp. 650–657.
- Sun, X.; Wu, P.; Hoi, S.C. Face detection using deep learning: An improved faster RCNN approach. Neurocomputing 2018, 299, 42–50.
- Zhu, C.; Zheng, Y.; Luu, K.; Savvides, M. CMS-RCNN: Contextual multi-scale region-based cnn for unconstrained face detection. In Deep Learning for Biometrics; Springer: Cham, Switzerland, 2017; pp. 57–79.
- Khan, S.S.; Sengupta, D.; Ghosh, A.; Chaudhuri, A. MTCNN++: A CNN-based face detection algorithm inspired by MTCNN. Vis. Comput. 2024, 40, 899–917.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Huang, G.; Liu, Z.; Pleiss, G.; Van Der Maaten, L.; Weinberger, K.Q. Convolutional networks with dense connectivity. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 44, 8704–8716.
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856.
- Liu, Y.; Tang, X. Bfbox: Searching face-appropriate backbone and feature pyramid network for face detector. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 13568–13577.
- Guo, J.; Deng, J.; Lattas, A.; Zafeiriou, S. Sample and computation redistribution for efficient face detection. arXiv 2021, arXiv:2105.04714.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
- Zhang, J.; Wu, X.; Hoi, S.C.; Zhu, J. Feature agglomeration networks for single stage face detection. Neurocomputing 2020, 380, 180–189.
- Najibi, M.; Samangouei, P.; Chellappa, R.; Davis, L.S. SSH: Single Stage Headless Face Detector. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
- Wang, W.; Wang, X.; Yang, W.; Liu, J. Unsupervised face detection in the dark. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 1250–1266.
- Li, J.; Wang, Y.; Wang, C.; Tai, Y.; Qian, J.; Yang, J.; Wang, C.; Li, J.; Huang, F. DSFD: Dual Shot Face Detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
- Zhang, Z.; Shen, W.; Qiao, S.; Wang, Y.; Wang, B.; Yuille, A. Robust face detection via learning small faces on hard images. In Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass, CO, USA, 1–5 March 2020; pp. 1361–1370.
- Fang, Z.; Ren, J.; Marshall, S.; Zhao, H.; Wang, Z.; Huang, K.; Xiao, B. Triple loss for hard face detection. Neurocomputing 2020, 398, 20–30.
- Wu, S.; Li, X.; Wang, X. IoU-aware single-stage object detector for accurate localization. Image Vis. Comput. 2020, 97, 103911.
- Wang, J.; Xu, C.; Yang, W.; Yu, L. A normalized Gaussian Wasserstein distance for tiny object detection. arXiv 2021, arXiv:2110.13389.
- Everingham, M.; Gool, L.V.; Williams, C.K.I.; Winn, J.M.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
- Liu, Y.; Tang, X.; Han, J.; Liu, J.; Rui, D.; Wu, X. Hambox: Delving into mining high-quality anchors on face detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 13043–13051.
- Gao, J.; Yang, T. Face detection algorithm based on improved TinyYOLOv3 and attention mechanism. Comput. Commun. 2022, 181, 329–337.
- Sufian Chan, A.A.; Abdullah, M.; Mustam, S.M.; Poad, F.A.; Joret, A. Face Detection with YOLOv7: A Comparative Study of YOLO-Based Face Detection Models. In Proceedings of the 2024 International Conference on Green Energy, Computing and Sustainable Technology (GECOST), Miri Sarawak, Malaysia, 17–19 January 2024; pp. 105–109.
Name | Images | Faces | Dense/Bright | Dense/Dark | Sparse/Bright | Sparse/Dark |
---|---|---|---|---|---|---|
A-208 | 142 | 6208 | 142 | 0 | 0 | 0 |
A-211 | 150 | 2001 | 100 | 50 | 0 | 0 |
B-301 | 320 | 25,108 | 160 | 160 | 0 | 0 |
B-403 | 300 | 11,775 | 150 | 150 | 0 | 0 |
B-418 | 315 | 17,351 | 165 | 150 | 0 | 0 |
B-507 | 335 | 21,478 | 185 | 150 | 0 | 0 |
B-509 | 300 | 2216 | 0 | 0 | 0 | 300 |
C-107 | 400 | 5373 | 0 | 0 | 400 | 0 |
C-225 | 540 | 10,740 | 300 | 0 | 240 | 0 |
Total | 2802 | 102,250 | 1202 | 660 | 640 | 300 |
Confusion Matrix | Actual: Face | Actual: Non-Face |
---|---|---|
Predicted: Face | TP | FP |
Predicted: Non-Face | FN | TN |
Modification | Easy (%) | Medium (%) | Hard (%) | Params (M) | FLOPs (G) |
---|---|---|---|---|---|
baseline | 93.70 | 92.68 | 83.02 | 7.063 | 16.4 |
+neck | 95.16 | 93.42 | 81.36 | 10.008 | 21.2 |
+neck +head | 94.80 | 93.77 | 84.37 | 10.123 | 22.8 |
Detector | Backbone | Easy (%) | Medium (%) | Hard (%) |
---|---|---|---|---|
DSFD (2019) [34] | ResNet152 | 94.29 | 91.47 | 71.39 |
RetinaFace (2020) [6] | ResNet50 | 94.92 | 91.90 | 64.17 |
HAMBox (2020) [41] | ResNet50 | 95.27 | 93.76 | 76.75 |
TinaFace (2020) [9] | ResNet50 | 95.61 | 94.25 | 81.43 |
SCRFD-2.5GF (2021) [29] | Basic ResNet | 93.78 | 92.16 | 77.87 |
TinyYolov3 (2022) [42] | YOLOv3-tiny | 95.26 | 89.2 | 77.9 |
YOLOv7-tiny-Face (2022) [7] | YOLOv7-tiny | 94.7 | 92.6 | 82.1 |
YFaces-Tiny (2024) [43] | YOLOv7-tiny | 94.07 | 92.36 | 83.06 |
YOLOv8n-face (2023) [7] | YOLOv8n | 94.5 | 92.2 | 79 |
YOLOv5s-Face (2021) [7] | YOLOv5s | 94.33 | 92.61 | 83.15 |
ADYOLOv5-Face (ours) | YOLOv5s | 94.80 | 93.77 | 84.37 |
Detector | Precision (%) | Recall (%) | AP50 (%) | AP@50:5:95 (%) |
---|---|---|---|---|
YOLOv7-tiny-Face | 63.9 | 63.7 | 54.9 | 23.5 |
YOLOv5s-Face | 46.8 | 74.6 | 57.1 | 25.5 |
YOLOFacev2 | 65.5 | 67.8 | 57.6 | 25.9 |
baseline | 65.5 | 66.5 | 58.2 | 24.8 |
baseline+neck | 65.7 | 66.7 | 57.1 | 25.0 |
ADYOLOv5-Face (ours) | 66.5 | 70.2 | 59.7 | 26.8 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liu, L.; Wang, G.; Miao, Q. ADYOLOv5-Face: An Enhanced YOLO-Based Face Detector for Small Target Faces. Electronics 2024, 13, 4184. https://doi.org/10.3390/electronics13214184