YOLO-FFRD: Dynamic Small-Scale Pedestrian Detection Algorithm Based on Feature Fusion and Rediffusion Structure
Abstract
1. Introduction
- Improvement of the YOLOv8 base model through the proposed Feature Fusion and Rediffusion (FFRD) structure, a multi-scale parallel network that combines the position and detail information of shallow feature maps with the semantic information of deep features to extract and integrate small-target information effectively.
- Integration of the MLCA (Mixed Local Channel Attention) mechanism into the improved structure. MLCA combines channel, spatial, local, and global information to balance model performance and complexity, which improves detection accuracy and efficiency for dynamic small targets.
- Adoption of the weighted Intersection over Union (wIoU) loss function to improve the algorithm’s handling of dynamic objects. Detailed experiments (comparative experiments, ablation studies, deployment on an embedded platform, and tests on the VisDrone2019 dataset) demonstrate the effectiveness and superiority of the proposed improvements, which significantly boost small-object detection performance.
2. Related Work
2.1. Introduction to the YOLOv8 Algorithm
- Backbone network structure: The backbone of YOLOv8 is similar to that of YOLOv5. YOLOv5’s backbone follows a regular pattern: a series of 3 × 3 convolutional layers with a stride of 2 downsample the feature maps, each followed by a C3 module that further enhances the features. In YOLOv8, the C3 (CSP Bottleneck with 3 convolutions) modules are replaced with C2f (CSP Bottleneck with 2 convolutions) modules, which introduce additional branches to enrich gradient flow during backpropagation [19].
- FPN-PAN structure: YOLOv8 retains the FPN (Feature Pyramid Network) and PAN (Path Aggregation Network) structure to build its feature pyramid, combining a top-down pathway (FPN) with a bottom-up pathway (PAN) to fuse multi-scale information thoroughly. Apart from replacing the C3 modules inside FPN-PAN with C2f modules, the structure remains largely consistent with YOLOv5’s FPN-PAN. The basic structure is depicted in Figure 2 [20].
- Detection head structure: From YOLOv3 to YOLOv5, the detection head was “coupled,” meaning that a single convolutional layer performed both classification and localization. YOLOX was the first YOLO model to adopt a “decoupled head.” YOLOv8 likewise uses a decoupled head, with two parallel branches that extract category features and location features, respectively; each branch then uses a 1 × 1 convolutional layer to complete its classification or localization task [21].
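The downsampling pattern and C2f branching described above can be illustrated with a framework-free shape trace. This is a minimal sketch under explicit assumptions, not the authors' code: the stage widths (16 to 256 channels) and C2f depths are chosen to resemble a YOLOv8n-style configuration, the C2f layout follows our reading of the public Ultralytics implementation, and the helpers `conv_out`, `c2f_channels`, `trace_backbone` and `head_locations` are hypothetical names. It also assumes, as in the standard YOLOv8 design, that the P3 to P5 levels feed FPN-PAN and the decoupled head.

```python
# Shape-bookkeeping sketch of a YOLOv8-style backbone and the head's input grid.
# No tensors are computed; only channels, resolution and stride are tracked.
# Widths/depths are ASSUMED YOLOv8n-like values, used purely for illustration.


def conv_out(size, kernel=3, stride=2, padding=1):
    """Spatial output size of a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def c2f_channels(c_out, n, e=0.5):
    """Channel bookkeeping for a C2f block with n bottlenecks (assumed layout).

    A 1x1 conv produces 2*c channels that are split into two c-channel chunks;
    each bottleneck takes the newest chunk and appends another c-channel chunk;
    all (2 + n) chunks are concatenated and projected to c_out by a 1x1 conv.
    """
    c = int(c_out * e)
    chunks = [c, c]
    for _ in range(n):
        chunks.append(c)
    return {"hidden": c, "branches": len(chunks), "concat": sum(chunks), "out": c_out}


def trace_backbone(img_size=640, widths=(16, 32, 64, 128, 256), depths=(0, 1, 2, 2, 1)):
    """Follow five stages: a stride-2 3x3 conv, then a C2f block when depth > 0."""
    size, stages = img_size, []
    for level, (width, depth) in enumerate(zip(widths, depths), start=1):
        size = conv_out(size)  # each stride-2 3x3 conv halves the resolution
        stages.append({
            "name": f"P{level}",
            "channels": width,
            "size": size,
            "stride": img_size // size,  # cumulative downsampling factor
            "c2f": c2f_channels(width, depth) if depth > 0 else None,
        })
    return stages


def head_locations(stages, levels=("P3", "P4", "P5")):
    """Number of grid positions the head predicts on: one per cell of each level."""
    return sum(s["size"] ** 2 for s in stages if s["name"] in levels)


if __name__ == "__main__":
    stages = trace_backbone()
    for s in stages:
        print(s["name"], s["channels"], f'{s["size"]}x{s["size"]}', "stride", s["stride"], s["c2f"])
    print("head locations at 640x640:", head_locations(stages))
```

Under these assumptions a 640 × 640 input yields P3, P4 and P5 maps of 80 × 80, 40 × 40 and 20 × 20 (strides 8, 16 and 32), so the classification and localization branches of the decoupled head each predict at 8400 grid positions. Tracking shapes instead of tensors keeps the example dependency-free.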
2.2. Introduction to Small-Object Detection Methods
3. Method
3.1. Feature Fusion and Rediffusion Structure
3.2. Improvement of Attention Mechanism
3.3. Improvement of Loss Function
4. Experiment
4.1. Experimental Environment and Parameter Settings
4.2. Evaluation Indicators of the Network
4.3. Model Training Experiment Results
4.4. Ablation Study Results
4.5. Experimental Results on Mobile Robot Platform
4.6. Experimental Results on the VisDrone2019 Public Dataset
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 779–788.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Advances in Neural Information Processing Systems 28 (NIPS 2015), Montreal, QC, Canada, 7–12 December 2015; pp. 91–99.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; Springer: New York, NY, USA, 2016; pp. 21–37.
- Liu, Y.; Zheng, Y.; Han, L.; Liu, J.; Pan, Z.; Sun, F. The Moving Target Recognition and Tracking Using RGB-D Data with the Mobile Robot. In Proceedings of the 2019 Chinese Control Conference (CCC), Guangzhou, China, 27–30 July 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 4342–4347.
- Yu, M.; Leung, H. Small-Object Detection for UAV-Based Images. In Proceedings of the 2023 IEEE International Systems Conference (SysCon), Vancouver, BC, Canada, 17–20 April 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6.
- Lee, M.C.; Lee, M. Deep Learning-Based Target Following and Obstacle Avoidance Methods in Mobile Robots. In Proceedings of the 2022 IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), Yeosu, Republic of Korea, 26–28 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–4.
- Yong, C.L.; Hoe Kwan, B.; Ng, D.W.-K.; Seng Sim, H. Human Tracking and Following Using Machine Vision on a Mobile Service Robot. In Proceedings of the 2022 IEEE 10th Conference on Systems, Process & Control (ICSPC), Malacca, Malaysia, 17–18 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 274–279.
- Liu, Z.; Deng, Y.; Ma, F.; Du, J.; Xiong, C.; Hu, M.; Zhang, L.; Ji, X. Target Detection and Tracking Algorithm Based on Improved Mask RCNN and LMB. In Proceedings of the 2021 International Conference on Control, Automation and Information Sciences (ICCAIS), Xi’an, China, 14–17 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1037–1041.
- Wan, D.; Lu, R.; Shen, S.; Xu, T.; Lang, X.; Ren, Z. Mixed Local Channel Attention for Object Detection. Eng. Appl. Artif. Intell. 2023, 123, 106442.
- Cho, Y.-J. Weighted Intersection over Union (wIoU): A New Evaluation Metric for Image Segmentation. J. Imaging 2023, 9, 187.
- Du, D.; Zhu, P.; Wen, L.; Bian, X.; Ling, H.; Hu, Q.; Zheng, J.; Peng, T.; Wang, X.; Zhang, Y.; et al. VisDrone-SOT2019: The Vision Meets Drone Single Object Tracking Challenge Results. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27 October–2 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 199–212.
- Cao, X.; Yao, R.; Yao, P. Targets Recognition Based on Feature Extraction with Small Convolutional Networks for Ultrasonic NDT. In Proceedings of the 2023 International Conference on Intelligent Management and Software Engineering (IMSE), Chongqing, China, 19–21 May 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 108–111.
- Gai, R.; Li, M.; Chen, N. Cherry Detection Algorithm Based on Improved YOLOv5s Network. In Proceedings of the 2021 IEEE 23rd International Conference on High Performance Computing & Communications; 7th International Conference on Data Science & Systems; 19th International Conference on Smart City; 7th International Conference on Dependability in Sensor, Cloud & Big Data Systems & Applications (HPCC/DSS/SmartCity/DependSys), Haikou, China, 20–22 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 2097–2103.
- Zhang, H.; Guo, H.; Ding, K.; Liu, J.; Chen, W. Complex Small Target Image Recognition Algorithm Based on Data Enhancement in YOLOv7. In Proceedings of the 2023 IEEE 6th International Conference on Pattern Recognition and Artificial Intelligence (PRAI), Haikou, China, 18–20 August 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 470–473.
- Wang, C.-Y.; Yeh, I.-H.; Liao, H.-Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv 2024, arXiv:2402.13616.
- Peri, S.D.B.; Palaniswamy, S. A Novel Approach to Detect and Track Small Animals Using YOLOv8 and DeepSORT. In Proceedings of the 2023 4th IEEE Global Conference for Advancement in Technology (GCAT), Bangalore, India, 6–8 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6.
- Talaat, F.M.; ZainEldin, H. An Improved Fire Detection Approach Based on YOLO-v8 for Smart Cities. Neural Comput. Appl. 2023, 35, 20939–20954.
- Shetty, A.D.; Ashwath, S. Animal Detection and Classification in Image & Video Frames Using YOLOv5 and YOLOv8. In Proceedings of the 2023 7th International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 22–24 November 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 677–683.
- Luo, Q.; Zhang, Z.; Yang, C.; Lin, J. An Improved Soft-CBAM-YOLOv5 Algorithm for Fruits and Vegetables Detection and Counting. In Proceedings of the 2023 IEEE 6th International Conference on Pattern Recognition and Artificial Intelligence (PRAI), Haikou, China, 18–20 August 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 187–192.
- Samaniego, L.A.; Peruda, S.R.; Brucal, S.G.E.; Yong, E.D.; De Jesus, L.C.M. Image Processing Model for Classification of Stages of Freshness of Bangus Using YOLOv8 Algorithm. In Proceedings of the 2023 IEEE 12th Global Conference on Consumer Electronics (GCCE), Nara, Japan, 10–12 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 401–403.
- Wu, Z.; Yu, H.; Zhang, L.; Sui, Y. AMB: Automatically Matches Boxes Module for One-Stage Object Detection. In Proceedings of the 2023 IEEE International Conference on Image Processing and Computer Applications (ICIPCA), Changchun, China, 11–13 August 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1516–1522.
- Umanandhini, D.; Devi, M.S.; Beulah Jabaseeli, N.; Sridevi, S. Batch Normalization Based Convolutional Block YOLOv3 Real Time Object Detection of Moving Images with Backdrop Adjustment. In Proceedings of the 2023 9th International Conference on Smart Computing and Communications (ICSCC), Kochi, India, 17–19 August 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 25–29.
- Hong, A.; Weisong, C.; Runfu, Z. The Algorithm Research of Infrared Small Target Detection and Recognition. In Proceedings of the 2013 2nd International Conference on Measurement, Information and Control (MIC), Harbin, China, 16–18 August 2013; IEEE: Piscataway, NJ, USA, 2013; Volume 2, pp. 936–940.
- Rooban, S.; Iwin Thanakumar Joseph, S.; Manimegalai, R.; Eshwar, I.V.S.; Mageswari, R.U. Simulation of Pick and Place Robotic Arm Using Coppeliasim. In Proceedings of the 2022 6th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 29–31 March 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 600–606.
- Li, W.; Zhang, Z.; Li, C.; Zou, J. Small Target Detection Algorithm Based on Two-Stage Feature Extraction. In Proceedings of the 2023 6th International Conference on Software Engineering and Computer Science (CSECS), Chengdu, China, 22–24 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–5.
- Gunawan, F.; Hwang, C.-L.; Cheng, Z.-E. ROI-YOLOv8-Based Far-Distance Face-Recognition. In Proceedings of the 2023 International Conference on Advanced Robotics and Intelligent Systems (ARIS), Taipei, Taiwan, China, 30 August–1 September 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6.
- Zhang, T.; Gai, K.; Bai, H. Multiscale Image Deblurring Network Using Dual Attention Mechanism. In Proceedings of the 2022 16th IEEE International Conference on Signal Processing (ICSP), Beijing, China, 21–23 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 85–89.
- Dong, Y. Research on Performance Improvement Method of Dynamic Object Detection Based on Spatio-Temporal Attention Mechanism. In Proceedings of the 2023 IEEE International Conference on Image Processing and Computer Applications (ICIPCA), Changchun, China, 11–13 August 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1558–1563.
- Pandey, S.; Chen, K.-F.; Dam, E.B. Comprehensive Multimodal Segmentation in Medical Imaging: Combining YOLOv8 with SAM and HQ-SAM Models. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Paris, France, 2–6 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 2584–2590.
- Cai, X.; Lai, Q.; Wang, Y.; Wang, W.; Sun, Z.; Yao, Y. Poly Kernel Inception Network for Remote Sensing Detection. Remote Sens. 2024, 16, 1234.
- Hu, Y.-Q. Object Detection Algorithm Based on Fusion of Spatial Information. In Proceedings of the 2022 4th International Conference on Intelligent Information Processing (IIP), Guangzhou, China, 28–30 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 258–261.
- Shi, Y.; Hidaka, A. Attention-YOLOX: Improvement in On-Road Object Detection by Introducing Attention Mechanisms to YOLOX. In Proceedings of the 2022 International Symposium on Computing and Artificial Intelligence (ISCAI), Beijing, China, 16–18 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 5–14.
- Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 7132–7141.
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.-Y.; et al. Segment Anything. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 10796–10806.
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: New York, NY, USA, 2018; pp. 3–19.
- Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Tang, J.; Yang, J. Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS), Virtual, 6–12 December 2020; Curran Associates, Inc.: Red Hook, NY, USA, 2020.
- Zheng, Z.; Wang, P.; Ren, D.; Liu, W.; Ye, R.; Hu, Q.; Zuo, W. Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 5501–5516.
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI), New York, NY, USA, 7–12 February 2020; AAAI Press: Menlo Park, CA, USA, 2020; pp. 12993–13000.
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection over Union: A Metric and a Loss for Bounding Box Regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 658–666.
- Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Xu, J.; et al. MMDetection: Open MMLab Detection Toolbox and Benchmark. arXiv 2019, arXiv:1906.07155.
Configuration Name | Version / Parameters |
---|---|
Graphics Card (GPU) | GeForce RTX 4090 24 GB |
Processor (CPU) | AMD EPYC 9654 96-Core Processor |
Operating system | Windows 10 |
GPU acceleration library | CUDA 11.0 |
Programming language | Python 3.8.10 |
Model | Precision/% | Recall/% | mAP@0.5/% | mAP@0.5–0.95/% | F1/% |
---|---|---|---|---|---|
C2f + C2f (PN) (YOLOv8) | 86.4 | 79.9 | 88.5 | 62.5 | 82.2 |
C2f + C2f (FFRD) | 86.5 | 80.5 | 89.0 | 63.0 | 82.5 |
C2f + MLCA (PN) | 86.3 | 79.9 | 88.9 | 62.7 | 82.3 |
MLCA + C2f (FFRD) | 86.3 | 80.6 | 89.5 | 62.6 | 82.8 |
MLCA + MLCA (PN) | 84.2 | 80.7 | 89.3 | 62.5 | 82.6 |
MLCA + MLCA (FFRD) (ours) | 86.4 | 81.2 | 90.0 | 63.3 | 83.0 |
Model | Precision/% | Recall/% | mAP@0.5/% | mAP@0.5–0.95/% |
---|---|---|---|---|
YOLOv8n + CIoU | 84.8 | 79.9 | 90.0 | 81.6 |
YOLOv8n + DIoU | 85.2 | 80.5 | 91.2 | 82.7 |
YOLOv8n + GIoU | 86.0 | 80.7 | 91.6 | 81.2 |
YOLOv8n + wIoU (ours) | 86.4 | 81.2 | 92.0 | 83.4 |
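The loss-function comparison above sets wIoU against three established IoU-based box-regression terms. The sketch below computes those three baselines from their published definitions (GIoU from Rezatofighi et al.; DIoU and CIoU from Zheng et al., both in the reference list) so the differences between the rows are concrete. It assumes boxes in (x1, y1, x2, y2) format and a small `eps` guard; `iou_variants` and `losses` are hypothetical helper names. wIoU is deliberately not implemented, because this section does not state the exact formulation the authors used.

```python
import math


def _area(box):
    """Area of an (x1, y1, x2, y2) box, clamped at zero for empty boxes."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou_variants(pred, gt, eps=1e-9):
    """Return IoU, GIoU, DIoU and CIoU for two boxes in (x1, y1, x2, y2) format."""
    # Intersection over union
    inter = _area((max(pred[0], gt[0]), max(pred[1], gt[1]),
                   min(pred[2], gt[2]), min(pred[3], gt[3])))
    union = _area(pred) + _area(gt) - inter
    iou = inter / (union + eps)

    # GIoU: penalise the part of the smallest enclosing box C not covered by the union
    cx1, cy1 = min(pred[0], gt[0]), min(pred[1], gt[1])
    cx2, cy2 = max(pred[2], gt[2]), max(pred[3], gt[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)
    giou = iou - (c_area - union) / (c_area + eps)

    # DIoU: squared centre distance rho^2 normalised by C's squared diagonal c^2
    rho2 = (((pred[0] + pred[2]) - (gt[0] + gt[2])) ** 2
            + ((pred[1] + pred[3]) - (gt[1] + gt[3])) ** 2) / 4.0
    c2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2
    diou = iou - rho2 / (c2 + eps)

    # CIoU: add an aspect-ratio consistency term v weighted by alpha
    w_p, h_p = pred[2] - pred[0], pred[3] - pred[1]
    w_g, h_g = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(w_g / (h_g + eps)) - math.atan(w_p / (h_p + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)
    ciou = diou - alpha * v

    return {"IoU": iou, "GIoU": giou, "DIoU": diou, "CIoU": ciou}


def losses(pred, gt):
    """Each regression loss is 1 minus the corresponding overlap measure."""
    return {name: 1.0 - value for name, value in iou_variants(pred, gt).items()}


if __name__ == "__main__":
    print(iou_variants((0, 0, 2, 2), (1, 1, 3, 3)))  # partial overlap, same aspect ratio
    print(losses((0, 0, 1, 1), (2, 2, 3, 3)))        # disjoint boxes: IoU loss saturates at 1
```

For two partially overlapping squares the aspect ratios match, so the CIoU penalty `alpha * v` vanishes and CIoU equals DIoU. For disjoint boxes plain IoU is 0 regardless of distance, whereas GIoU and DIoU still vary with box placement, which is the motivation usually given for these variants.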
Model | Precision/% | Recall/% | mAP@0.5/% | mAP@0.5–0.95/% | F1/% |
---|---|---|---|---|---|
YOLOv8n | 86.5 | 79.6 | 89.4 | 62.5 | 82.7 |
SSD | 85.0 | 77.8 | 87.9 | 62.0 | 82.5 |
Mask R-CNN | 87.0 | 79.0 | 89.5 | 63.2 | 82.9 |
Faster R-CNN | 85.2 | 78.9 | 89.2 | 62.9 | 81.4 |
YOLOv8n-FFRD (ours) | 86.7 | 81.5 | 90.1 | 63.5 | 83.2 |
wIoU | MLCA(B) | MLCA(N) | FFRD | Precision/% | Recall/% | mAP@0.5/% |
---|---|---|---|---|---|---|
× | × | × | × | 85.1 | 79.3 | 87.6 |
× | × | × | √ | 85.0 | 79.8 | 88.6 |
× | × | √ | √ | 85.3 | 80.8 | 88.9 |
√ | × | √ | √ | 85.9 | 80.5 | 89.4 |
√ | √ | × | √ | 86.0 | 80.3 | 89.5 |
√ | √ | √ | √ | 86.4 | 81.2 | 90.0 |
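One way to read the ablation table is as gains over its first (all-×) row. The sketch below does that arithmetic with values copied from the table; the encoding of √/× as `True`/`False` and the names `ABLATION_ROWS`, `MODULES` and `gains_over_baseline` are illustrative assumptions. Because only six of the sixteen possible module combinations are reported, these differences describe the listed configurations and do not isolate each module's individual effect.

```python
# Ablation rows copied from the table: module flags (wIoU, MLCA(B), MLCA(N), FFRD),
# where "√" -> True and "×" -> False, and metrics (Precision, Recall, mAP@0.5) in %.
MODULES = ("wIoU", "MLCA(B)", "MLCA(N)", "FFRD")
ABLATION_ROWS = [
    ((False, False, False, False), (85.1, 79.3, 87.6)),
    ((False, False, False, True), (85.0, 79.8, 88.6)),
    ((False, False, True, True), (85.3, 80.8, 88.9)),
    ((True, False, True, True), (85.9, 80.5, 89.4)),
    ((True, True, False, True), (86.0, 80.3, 89.5)),
    ((True, True, True, True), (86.4, 81.2, 90.0)),
]


def gains_over_baseline(rows):
    """Subtract the first (all-x) row from every row, rounded to one decimal."""
    baseline = rows[0][1]
    result = []
    for flags, metrics in rows:
        enabled = [name for name, on in zip(MODULES, flags) if on] or ["baseline"]
        delta = tuple(round(m - b, 1) for m, b in zip(metrics, baseline))
        result.append(("+".join(enabled), delta))
    return result


if __name__ == "__main__":
    for label, (dp, dr, dmap) in gains_over_baseline(ABLATION_ROWS):
        print(f"{label:<27} dP={dp:+.1f}  dR={dr:+.1f}  dmAP@0.5={dmap:+.1f}")
```

Read this way, FFRD alone adds 1.0 point of mAP@0.5 over the baseline row, and the full configuration adds 1.3, 1.9 and 2.4 points of precision, recall and mAP@0.5, respectively.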
Models | mAP@0.5/% | Params/M | GFLOPs | FPS (RTX 4090) | FPS (Jetson Nano) |
---|---|---|---|---|---|
YOLOv8n | 89.4 | 3.2 | 8.7 | 581 | 28 |
YOLOv8s | 91 | 11.2 | 28.6 | 325 | 18 |
ResNet50-FPN (Faster R-CNN) | 89.5 | 36.8 | 165.2 | 76 | 6 |
ResNet101-FPN (Faster R-CNN) | 91.6 | 50.3 | 210.5 | 41 | 4 |
RT-DETR-S | 90.2 | 23.4 | 55.8 | 209 | - |
RT-DETR-M | 92.3 | 34.1 | 87.5 | 132 | - |
D0 (EfficientDet) | 89.9 | 3.8 | 9.2 | 382 | 28 |
D7 (EfficientDet) | 91.6 | 55.6 | 330.8 | 28 | 19 |
YOLOv8n-FFRD | 90.1 | 2.9 | 7.9 | 631 | 32 |
YOLOv8s-FFRD | 92 | 10.1 | 25.3 | 352 | 21 |
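The efficiency table reports raw parameter counts, GFLOPs and throughput; relative changes and per-frame times make the FFRD variants easier to compare with their baselines. The sketch below derives them from the table values. Converting FPS to milliseconds per frame assumes frames were processed one at a time (no batching), which the table does not state; `TABLE`, `latency_ms`, `relative_change` and `compare` are hypothetical names.

```python
# Figures copied from the efficiency table: Params/M, GFLOPs, FPS (RTX 4090), FPS (Jetson Nano).
TABLE = {
    "YOLOv8n": {"params_m": 3.2, "gflops": 8.7, "fps_4090": 581, "fps_nano": 28},
    "YOLOv8n-FFRD": {"params_m": 2.9, "gflops": 7.9, "fps_4090": 631, "fps_nano": 32},
    "YOLOv8s": {"params_m": 11.2, "gflops": 28.6, "fps_4090": 325, "fps_nano": 18},
    "YOLOv8s-FFRD": {"params_m": 10.1, "gflops": 25.3, "fps_4090": 352, "fps_nano": 21},
}


def latency_ms(fps):
    """Per-frame time implied by a throughput figure, ASSUMING single-frame processing."""
    return 1000.0 / fps


def relative_change(new, old):
    """Percentage change from old to new (negative values are reductions)."""
    return (new - old) / old * 100.0


def compare(base, improved, table=TABLE):
    """Relative change of every column from a baseline model to its FFRD variant."""
    b, i = table[base], table[improved]
    return {key: round(relative_change(i[key], b[key]), 1) for key in b}


if __name__ == "__main__":
    for base, improved in (("YOLOv8n", "YOLOv8n-FFRD"), ("YOLOv8s", "YOLOv8s-FFRD")):
        print(base, "->", improved, compare(base, improved))
        print("  Jetson Nano: %.1f ms -> %.1f ms per frame"
              % (latency_ms(TABLE[base]["fps_nano"]), latency_ms(TABLE[improved]["fps_nano"])))
```

For the nano-scale models this gives about 9.4% fewer parameters, 9.2% fewer GFLOPs and 14.3% higher Jetson Nano throughput (roughly 35.7 ms to 31.3 ms per frame under the single-frame assumption).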
Model | Precision (%) | Recall (%) | mAP@0.5 (%) | mAP@0.5–0.95 (%) | F1 (%) |
---|---|---|---|---|---|
YOLOv8n (Baseline) | 43.2 | 33.2 | 33.4 | 19.2 | 37.5 |
YOLOv8n-FFRD (Ours) | 48.1 | 35.9 | 35.8 | 22.1 | 41.1 |
Improvement | +4.9 | +2.7 | +2.4 | +2.9 | +3.6 |
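The F1 column of the VisDrone2019 table is consistent with F1 as the harmonic mean of precision and recall, and the Improvement row is the row-wise difference. The sketch below reproduces both from the table values and also spells out the COCO-style definition behind the mAP@0.5–0.95 header (mean AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05). The helper names `f1`, `map_50_95` and `improvement_row` are hypothetical; `map_50_95` assumes per-threshold AP values have already been computed.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; inputs and output share one unit (here %)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# COCO-style mAP@0.5-0.95 averages AP over IoU thresholds 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def map_50_95(ap_per_threshold):
    """Mean of per-threshold AP values (one value per entry in IOU_THRESHOLDS)."""
    if len(ap_per_threshold) != len(IOU_THRESHOLDS):
        raise ValueError("expected one AP value per IoU threshold")
    return sum(ap_per_threshold) / len(ap_per_threshold)


# VisDrone2019 figures copied from the table above (%).
BASELINE = {"Precision": 43.2, "Recall": 33.2, "mAP@0.5": 33.4, "mAP@0.5-0.95": 19.2}
OURS = {"Precision": 48.1, "Recall": 35.9, "mAP@0.5": 35.8, "mAP@0.5-0.95": 22.1}


def improvement_row(base, ours):
    """Row-wise differences, plus the difference between the two derived F1 scores."""
    row = {k: round(ours[k] - base[k], 1) for k in base}
    row["F1"] = round(f1(ours["Precision"], ours["Recall"])
                      - f1(base["Precision"], base["Recall"]), 1)
    return row


if __name__ == "__main__":
    print("F1 baseline: %.1f" % f1(BASELINE["Precision"], BASELINE["Recall"]))
    print("F1 ours:     %.1f" % f1(OURS["Precision"], OURS["Recall"]))
    print("Improvement:", improvement_row(BASELINE, OURS))
```

Running it gives F1 values of 37.5 and 41.1 and an Improvement row of +4.9, +2.7, +2.4, +2.9 and +3.6, matching the table.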
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
MDPI and ACS Style: Li, S.; Wang, R.; Wang, S.; Yue, P.; Guo, G. YOLO-FFRD: Dynamic Small-Scale Pedestrian Detection Algorithm Based on Feature Fusion and Rediffusion Structure. Sensors 2025, 25, 5106. https://doi.org/10.3390/s25165106
AMA Style: Li S, Wang R, Wang S, Yue P, Guo G. YOLO-FFRD: Dynamic Small-Scale Pedestrian Detection Algorithm Based on Feature Fusion and Rediffusion Structure. Sensors. 2025; 25(16):5106. https://doi.org/10.3390/s25165106
Chicago/Turabian Style: Li, Shuqin, Rui Wang, Suyu Wang, Pengxu Yue, and Guanlun Guo. 2025. "YOLO-FFRD: Dynamic Small-Scale Pedestrian Detection Algorithm Based on Feature Fusion and Rediffusion Structure" Sensors 25, no. 16: 5106. https://doi.org/10.3390/s25165106
APA Style: Li, S., Wang, R., Wang, S., Yue, P., & Guo, G. (2025). YOLO-FFRD: Dynamic Small-Scale Pedestrian Detection Algorithm Based on Feature Fusion and Rediffusion Structure. Sensors, 25(16), 5106. https://doi.org/10.3390/s25165106