Small Object Detection in Traffic Scenes for Mobile Robots: Challenges, Strategies, and Future Directions
Abstract
1. Introduction
1.1. Academic and Engineering Significance of Mobile Robot Perception
1.2. Characteristics and Technical Bottlenecks of Small Object Detection
2. Analysis of Challenges and Requirements in Traffic Scenes
2.1. Environmental Factors
2.2. Target Characteristics
2.3. System Constraints
3. Core Technological Developments
3.1. Feature Enhancement Methods
3.1.1. Multi-Scale Feature Fusion
3.1.2. Attention Mechanisms
3.1.3. Context Enhancement and Global Modeling
3.2. Network Architecture Innovations
3.2.1. Lightweight Backbone Networks
3.2.2. Detection Head Optimization
3.2.3. Task Collaboration and Branch Design
3.3. Data-Driven Optimization
3.3.1. Small Object Enhancement Strategies
- Copy-Paste: cropping annotated targets from source images and pasting them into other images to increase the density of small targets [27];
- Mosaic/MixUp: Mosaic stitches four images into a single training sample, while MixUp blends image pairs; both increase the diversity of the training distribution [28];
- Random Zoom-In: randomly zooming in on targets to increase their relative size and improve model attention [29].
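At the annotation level, the Copy-Paste strategy above reduces to sampling source boxes and relocating them inside a destination frame. The sketch below illustrates only this bookkeeping step; function and parameter names are illustrative, and a full implementation such as [27] would also blend the pasted pixels and handle occlusion:

```python
import random

def copy_paste_boxes(src_boxes, dst_size, n_copies=3, rng=None):
    """Annotation-level Copy-Paste sketch: sample source boxes (w, h)
    and paste each at a random location inside the destination image,
    returning the new (x, y, w, h) annotations."""
    rng = rng or random.Random()
    W, H = dst_size
    pasted = []
    for _ in range(n_copies):
        w, h = rng.choice(src_boxes)
        x = rng.randint(0, max(0, W - w))  # keep the paste fully inside the frame
        y = rng.randint(0, max(0, H - h))
        pasted.append((x, y, w, h))
    return pasted

# Example: paste three small targets into a 640x480 frame
boxes = copy_paste_boxes([(20, 16), (12, 12)], (640, 480),
                         rng=random.Random(0))
```

Because every pasted box stays inside the image bounds, the augmented annotations remain valid without clipping.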
3.3.2. Synthetic Data and Simulation-Based Generation
3.3.3. Transfer Learning and Few-Shot Training Strategies
3.4. Empirical Comparative Analysis
4. Mobile Robot-Specific Solutions
4.1. Hardware-Aware Algorithm Design
4.2. Multi-Sensor Spatio-Temporal Alignment
4.2.1. Camera–LiDAR Fusion
4.2.2. IMU and GPS Spatio-Temporal Synchronization
4.2.3. Millimeter-Wave Radar and Sonar-Based Blind Spot Compensation
4.3. Embedded Deployment
- Using TensorRT to accelerate model inference (e.g., converting PyTorch (v1.13.1) models to ONNX and then to TensorRT) [45];
- Applying INT8/FP16 quantization to compress models [46];
- Utilizing the DeepStream SDK to build asynchronous pipelines for improved processing throughput [47];
- Deploying lightweight detection networks such as YOLOv5-Nano to fit the power budget of battery-operated mobile platforms [48].
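The INT8 quantization in the second bullet amounts to mapping a float tensor's range onto 8-bit integers via a scale factor. The following minimal sketch shows symmetric post-training quantization on a plain Python list; production toolchains such as TensorRT [46] additionally use per-channel scales and calibration data, so this is an illustration of the arithmetic, not of any specific API:

```python
def quantize_int8(values):
    """Symmetric post-training INT8 quantization sketch: map the float
    range [-amax, amax] linearly onto integers in [-127, 127]."""
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float values from the INT8 codes."""
    return [v * scale for v in q]

# The reconstruction error per value is bounded by scale / 2.
q, scale = quantize_int8([0.5, -1.0, 0.25])
```

The 4x reduction in weight storage (FP32 to INT8) is what makes such models fit the memory and energy envelopes of embedded boards.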
5. Benchmarking Systems and Metric Limitations in Small Object Detection
5.1. Dataset Comparison
- Although COCO is a general-purpose dataset, it exhibits limited generalizability to traffic scenes;
- KITTI has low object density and a limited proportion of small objects, making it difficult to comprehensively assess the small object detection capabilities of detectors;
- VisDrone and DOTA, due to their aerial perspectives and dense object distributions, have become important benchmarks for small object detection;
- TT100K is specifically designed for traffic sign detection, featuring a wide range of categories and a significant proportion of small objects, making it well-suited for research into micro-object detection in traffic environments.
5.2. Evaluation Methods for Small Object Detection in Real-World Traffic Scenes
6. Open Issues and Future Directions
6.1. Generalization Ability in Extreme Scenarios
- Low-light and nighttime environments: for instance, urban night navigation, tunnels, and areas without street lighting, where image noise is high, contrast is low, and small objects are easily obscured by the background;
- Adverse weather conditions: such as rain, snow, and haze, which may cause image blurring and color distortion, thereby affecting the model’s perceptual reliability;
- Dynamic scene variations: including rapid motion leading to motion blur and frequent scene transitions, requiring the detector to adapt quickly;
- Domain shift issues: when models are generalized from training datasets (e.g., urban traffic) to real-world deployment scenarios (e.g., suburban or mountainous areas), performance tends to degrade significantly, necessitating solutions to domain adaptation challenges.
6.2. Human–Machine Collaborative Annotation and Incremental Learning
- Annotating small objects is expensive and demands high levels of precision;
- Systems lack adaptability to new target categories, requiring full retraining whenever the environment changes (e.g., introduction of new object types);
- Model updates are resource-intensive and cannot be conducted at scale in an online manner.
6.3. Ultra-Low-Latency Detection and System Coordination Optimization
- Embedded devices offer limited computational resources and are often unable to support real-time inference of complex models.
- System bottlenecks beyond model inference (e.g., preprocessing, I/O latency) frequently result in excessive overall latency.
- Multi-stage processing chains lack collaborative optimization, leading to unstable end-to-end response times.
6.4. Differentiated Optimization Strategies for Specific Applications
7. Conclusions
- Traffic scenes are highly dynamic and complex, subject to multiple environmental disturbances such as lighting variation, adverse weather, object occlusion, and motion blur, all of which significantly increase the difficulty of small object detection.
- The development of small object detection methods has progressed along multiple dimensions, including feature enhancement (e.g., FPN, attention mechanisms), lightweight network architectures (e.g., MobileNet, YOLOv7-tiny), and data-driven optimization (e.g., Mosaic and copy–paste data augmentation, pseudo-label learning).
- For practical deployment, it is essential to consider the joint design of perception capability and computational constraints, such as deployment optimization on hardware platforms like Jetson Nano and Raspberry Pi. Meanwhile, multi-sensor fusion and ROS integration have become crucial approaches to enhancing system robustness.
- Current mainstream evaluation systems do not yet fully reflect the small object detection needs in real-world deployment. It is therefore necessary to construct more representative datasets and adopt fine-grained metrics (e.g., mAP@0.5:0.95, small object recall rate) for comprehensive assessment.
- In the future, advances in generalization, incremental learning, low-latency inference, and platform-aware optimization will drive the development of small object detection systems that are not only adaptive over time but also capable of autonomous evolution across complex real-world scenarios.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Raj, R.; Kos, A. A Comprehensive Study of Mobile Robot: History, Developments, Applications, and Future Research Perspectives. Appl. Sci. 2022, 12, 6951. [Google Scholar] [CrossRef]
- Antonyshyn, L.; Silveira, J.; Givigi, S.; Marshall, J. Multiple Mobile Robot Task and Motion Planning: A Survey. ACM Comput. Surv. 2023, 55, 1–35. [Google Scholar] [CrossRef]
- Kaur, R.; Singh, S. A Comprehensive Review of Object Detection with Deep Learning. Digit. Signal Process. 2023, 132, 103812. [Google Scholar] [CrossRef]
- Ravi, N.; El-Sharkawy, M. Real-Time Embedded Implementation of Improved Object Detector for Resource-Constrained Devices. J. Low Power Electron. Appl. 2022, 12, 21. [Google Scholar] [CrossRef]
- Mirzaei, B.; Nezamabadi-Pour, H.; Raoof, A.; Derakhshani, R. Small Object Detection and Tracking: A Comprehensive Review. Sensors 2023, 23, 6887. [Google Scholar] [CrossRef]
- Liu, Y.; Sun, P.; Wergeles, N.; Shang, Y. A Survey and Performance Evaluation of Deep Learning Methods for Small Object Detection. Expert Syst. Appl. 2021, 172, 114602. [Google Scholar] [CrossRef]
- Mittal, P. A Comprehensive Survey of Deep Learning-Based Lightweight Object Detection Models for Edge Devices. Artif. Intell. Rev. 2024, 57, 242. [Google Scholar] [CrossRef]
- da Silva, D.Q.; dos Santos, F.N.; Filipe, V.; Sousa, A.J.; Oliveira, P.M. Edge AI-Based Tree Trunk Detection for Forestry Monitoring Robotics. Robotics 2022, 11, 136. [Google Scholar] [CrossRef]
- Tian, Z.; Qu, P.; Li, J.; Sun, Y.; Li, G.; Liang, Z.; Zhang, W. A Survey of Deep Learning-Based Low-Light Image Enhancement. Sensors 2023, 23, 7763. [Google Scholar] [CrossRef]
- Matveev, I.; Karpov, K.; Chmielewski, I.; Siemens, E.; Yurchenko, A. Fast Object Detection Using Dimensional Based Features for Public Street Environments. Smart Cities 2020, 3, 93–111. [Google Scholar] [CrossRef]
- He, B.; Yang, Y.; Zheng, S.; Fan, G. YOLOv8 for Adverse Weather: Traffic Sign Detection in Autonomous Driving. In Proceedings of the Fourth International Conference on Advanced Algorithms and Signal Image Processing (AASIP 2024), Kuala Lumpur, Malaysia, 28–30 June 2024; pp. 311–316. [Google Scholar]
- Liu, J.; Zhang, J.; Ni, Y.; Chi, W.; Qi, Z. Small-Object Detection in Remote Sensing Images with Super Resolution Perception. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 15721–15734. [Google Scholar] [CrossRef]
- Majhi, S.K.; Gupta, R.K.; Ojha, B.; Sapkota, A.; Muduli, D. Deep Learning Fusion Ensemble for Enhanced Traffic Sign Detection Using the ICTS Dataset. In Proceedings of the 2024 IEEE 4th International Conference on Applied Electromagnetics, Signal Processing, & Communication (AESPC), Bhubaneswar, India, 29–30 November 2024; pp. 1–5. [Google Scholar]
- Muzammul, M.; Li, X. Comprehensive Review of Deep Learning-Based Tiny Object Detection: Challenges, Strategies, and Future Directions. Knowl. Inf. Syst. 2025, 67, 3825–3913. [Google Scholar] [CrossRef]
- Bai, L.; Cao, J.; Zhang, M.; Li, B. Collaborative Edge Intelligence for Autonomous Vehicles: Opportunities and Challenges. IEEE Netw. 2025, 39, 12–19. [Google Scholar] [CrossRef]
- Kim, K.; Jang, S.J.; Park, J.; Lee, E.; Lee, S.S. Lightweight and Energy-Efficient Deep Learning Accelerator for Real-Time Object Detection on Edge Devices. Sensors 2023, 23, 1185. [Google Scholar] [CrossRef]
- Kim, S.-W.; Kook, H.-K.; Sun, J.-Y.; Kang, M.-C.; Ko, S.-J. Parallel Feature Pyramid Network for Object Detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Ferrari, V., Sminchisescu, C., Hebert, M., Weiss, Y., Eds.; Springer: Cham, Switzerland, 2018; pp. 239–256. [Google Scholar]
- Chen, Y.; Zhu, X.; Li, Y.; Wei, Y.; Ye, L. Enhanced Semantic Feature Pyramid Network for Small Object Detection. Signal Process. Image Commun. 2023, 113, 116919. [Google Scholar] [CrossRef]
- Ma, P.; He, X.; Chen, Y.; Liu, Y. ISOD: Improved Small Object Detection Based on Extended Scale Feature Pyramid Network. Vis. Comput. 2025, 41, 465–479. [Google Scholar] [CrossRef]
- Lian, J.; Yin, Y.; Li, L.; Wang, Z.; Zhou, Y. Small Object Detection in Traffic Scenes Based on Attention Feature Fusion. Sensors 2021, 21, 3031. [Google Scholar] [CrossRef]
- Chen, J.; Li, X.; Ou, Y.; Hu, X.; Peng, T. Graphormer-Based Contextual Reasoning Network for Small Object Detection. In Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Xiamen, China, 13–15 October 2023; Springer: Singapore, 2023; pp. 294–305. [Google Scholar]
- Ma, Z.; Zhou, L.; Wu, D.; Zhang, X. A Small Object Detection Method with Context Information for High Altitude Images. Pattern Recognit. Lett. 2025, 188, 22–28. [Google Scholar] [CrossRef]
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features from Cheap Operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 1580–1589. [Google Scholar]
- Mu, J.; Su, Q.; Wang, X.; Liang, W.; Xu, S.; Wan, K. A Small Object Detection Architecture with Concatenated Detection Heads and Multi-Head Mixed Self-Attention Mechanism. J. Real-Time Image Process. 2024, 21, 184. [Google Scholar] [CrossRef]
- Gao, Y.; Li, Z.; Wang, Y.; Zhu, S. A Novel YOLOv5_ES Based on Lightweight Small Object Detection Head for PCB Surface Defect Detection. Sci. Rep. 2024, 14, 23650. [Google Scholar] [CrossRef]
- Maji, D.; Nagori, S.; Mathew, M.; Poddar, D. YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, New Orleans, LA, USA, 19–20 June 2022; pp. 2637–2646. [Google Scholar]
- Ghiasi, G.; Cui, Y.; Srinivas, A.; Qian, R.; Lin, T.-Y.; Cubuk, E.D.; Le, Q.V.; Zoph, B. Simple Copy-Paste Is a Strong Data Augmentation Method for Instance Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 2918–2928. [Google Scholar]
- Yang, Z.; Lan, X.; Wang, H. Comparative Analysis of YOLO Series Algorithms for UAV-Based Highway Distress Inspection: Performance and Application Insights. Sensors 2025, 25, 1475. [Google Scholar] [CrossRef] [PubMed]
- Wei, J.; Li, Y.; Zhang, B. EDCNet: A Lightweight Object Detection Method Based on Encoding Feature Sharing for Drug Driving Detection. In Proceedings of the 2022 IEEE 24th International Conference on High Performance Computing & Communications; 8th International Conference on Data Science & Systems; 20th International Conference on Smart City; 8th International Conference on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), Hainan, China, 18–20 December 2022; pp. 1006–1013. [Google Scholar]
- Senapati, R.K.; Satvika, R.; Anmandla, A.; Ashesh Reddy, G.; Anil Kumar, C. Image-to-Image Translation Using Pix2Pix GAN and Cycle GAN. In Proceedings of the International Conference on Data Intelligence and Cognitive Informatics, Tirunelveli, India, 27–28 June 2023; Springer: Singapore, 2023; pp. 573–586. [Google Scholar]
- Wang, Y.; Yao, Q.; Kwok, J.T.; Ni, L.M. Generalizing from a Few Examples: A Survey on Few-Shot Learning. ACM Comput. Surv. 2020, 53, 1–34. [Google Scholar] [CrossRef]
- Chong, T.; Zhang, Y.; Ma, C.; Liu, T. Design and Analysis of Video Desensitization Algorithm Based on Lightweight Model PP PicoDet. In Proceedings of the 2023 International Conference on Artificial Intelligence and Automation Control (AIAC), Xiamen, China, 17–19 November 2023; pp. 121–124. [Google Scholar]
- Xiong, Y.; Liu, H.; Gupta, S.; Akin, B.; Bender, G.; Wang, Y.; Kindermans, P.-J.; Tan, M.; Singh, V.; Chen, B. MobileDets: Searching for Object Detection Architectures for Mobile Accelerators. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 3825–3834. [Google Scholar]
- Chang, Q.; Peng, J.; Xie, L.; Sun, J.; Yin, H.; Tian, Q.; Zhang, Z. DATA: Domain-Aware and Task-Aware Self-Supervised Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 9841–9850. [Google Scholar]
- Liang, T.; Xie, H.; Yu, K.; Xia, Z.; Lin, Z.; Wang, Y.; Tang, T.; Wang, B.; Tang, Z. BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework. Adv. Neural Inf. Process. Syst. 2022, 35, 10421–10434. [Google Scholar]
- Li, Y.; Yu, A.W.; Meng, T.; Caine, B.; Ngiam, J.; Peng, D.; Shen, J.; Lu, Y.; Zhou, D.; Le, Q.V.; et al. DeepFusion: LiDAR-Camera Deep Fusion for Multi-Modal 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 17182–17191. [Google Scholar]
- Wen, Z.; Zheng, L.; Zeng, T. Extended Object Tracking Using an Orientation Vector Based on Constrained Filtering. Remote Sens. 2025, 17, 1419. [Google Scholar] [CrossRef]
- Gao, Y.; Yang, M.; Zang, X.; Deng, L.; Li, M.; Xu, Y.; Sun, M. Adaptive Distributed Student’s T Extended Kalman Filter Employing Allan Variance for UWB Localization. Sensors 2025, 25, 1883. [Google Scholar] [CrossRef]
- Gu, P.; Meng, Z. S-VIO: Exploiting Structural Constraints for RGB-D Visual Inertial Odometry. IEEE Robot. Autom. Lett. 2023, 8, 3542–3549. [Google Scholar] [CrossRef]
- Luo, N.; Hu, Z.; Ding, Y.; Li, J.; Zhao, H.; Liu, G.; Wang, Q. DFF-VIO: A General Dynamic Feature Fused Monocular Visual-Inertial Odometry. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 465–479. [Google Scholar] [CrossRef]
- Senel, N.; Kefferpütz, K.; Doycheva, K.; Elger, G. Multi-Sensor Data Fusion for Real-Time Multi-Object Tracking. Processes 2023, 11, 501. [Google Scholar] [CrossRef]
- Wei, Z.; Zhang, F.; Chang, S.; Liu, Y.; Wu, H.; Feng, Z. MmWave Radar and Vision Fusion for Object Detection in Autonomous Driving: A Review. Sensors 2022, 22, 2542. [Google Scholar] [CrossRef]
- Qian, K.; Zhu, S.; Zhang, X.; Li, L.E. Robust Multimodal Vehicle Detection in Foggy Weather Using Complementary Lidar and Radar Signals. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 444–453. [Google Scholar]
- Palladin, E.; Dietze, R.; Narayanan, P.; Bijelic, M.; Heide, F. SAMFusion: Sensor-Adaptive Multimodal Fusion for 3D Object Detection in Adverse Weather. In Proceedings of the European Conference on Computer Vision (ECCV), Milan, Italy, 29 September–4 October 2024; Springer: Cham, Switzerland, 2024; pp. 484–503. [Google Scholar]
- Jeong, E.; Kim, J.; Ha, S. TensorRT-Based Framework and Optimization Methodology for Deep Learning Inference on Jetson Boards. ACM Trans. Embed. Comput. Syst. 2022, 21, 3508391. [Google Scholar] [CrossRef]
- Yao, Z.; Yazdani Aminabadi, R.; Zhang, M.; Wu, X.; Li, C.; He, Y. ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers. Adv. Neural Inf. Process. Syst. 2022, 35, 27168–27183. [Google Scholar]
- Xie, S.; Deng, G.; Lin, B.; Jing, W.; Li, Y.; Zhao, X. Real-Time Object Detection from UAV Inspection Videos by Combining YOLOv5s and DeepStream. Sensors 2024, 24, 3862. [Google Scholar] [CrossRef] [PubMed]
- Sekar, K.; Dheepa, T.; Sheethal, R.; Devi, K.S.; Baskaran, M.V.; Smita, R.S.; Dixit, T.V.; Dutta Borah, M. Efficient Object Detection on Low-Resource Devices Using Lightweight MobileNet-SSD. In Proceedings of the 2025 International Conference on Intelligent Systems and Computational Networks (ICISCN), Bidar, India, 24–25 January 2025; pp. 1–6. [Google Scholar]
- Macenski, S.; Foote, T.; Gerkey, B.; Lalancette, C.; Woodall, W. Robot Operating System 2: Design, Architecture, and Uses in the Wild. Sci. Robot. 2022, 7, eabm6074. [Google Scholar] [CrossRef] [PubMed]
- Liu, R.; Zheng, J.; Luan, T.H.; Gao, L.; Hui, Y.; Xiang, Y.; Dong, M. ROS-Based Collaborative Driving Framework in Autonomous Vehicular Networks. IEEE Trans. Veh. Technol. 2023, 72, 6987–6999. [Google Scholar] [CrossRef]
- Duan, F.; Li, W.; Tan, Y. ROS Debugging. In Intelligent Robot: Implementation and Applications; Springer: Singapore, 2023; pp. 71–92. [Google Scholar]
- Wu, Z.; Zhen, H.; Zhang, X.; Bai, X.; Liu, X. SEMA-YOLO: Lightweight Small Object Detection in Remote Sensing Image via Shallow-Layer Enhancement and Multi-Scale Adaptation. Remote Sens. 2025, 17, 1917. [Google Scholar] [CrossRef]
Method | Backbone/Key Modules | Dataset | Accuracy (AP_small/mAP) | FPS |
---|---|---|---|---|
SSD + PFPNet | VGG-16 | MS-COCO | +7.8 AP_small ↑ vs. SSD | 34 (RTX 2070) |
RetinaNet + ES-FPN | ResNet-50 | MS-COCO | 23.7/40.5 | 18 (RTX 2080) |
EfficientDet-D0 (with BiFPN) | EfficientNet-B0 | MS-COCO | 12.0/33.8 | 23 (Jetson Xavier) |
YOLOv5-s + GhostNet | Ghost-CSP | VisDrone | —/+3.1 mAP ↑, −28% params | 40 (Jetson Nano) |
ISOD (Ext. Scale FPN) | CSPDarknet-53 | TT100K | —/0.635 | 28 (RTX 3060) |
Dataset Name | Release Year | Typical Scenario | Number of Images | Number of Categories | Small Object Ratio (Area < 32²) | Annotation Type |
---|---|---|---|---|---|---|
KITTI | 2012 | Urban streets | ∼15 K | 8 | 17.3% | 2D bounding boxes |
BDD100K | 2018 | Urban, motorways | 100 K | 10 | 25.7% | 2D bounding boxes + temporal data |
COCO | 2015 | General scenarios | 118 K | 80 | 41.4% | 2D bounding boxes/segmentation |
VisDrone | 2019 | Aerial view | 263 K | 10 | 53.3% | 2D bounding boxes |
TT100K | 2016 | Chinese roads | 100 K | 221 | 32.8% | 2D bounding boxes |
DOTA | 2018 | Satellite aerial photography | 280 K | 15 | 60%+ | Rotated bounding boxes |
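The "Small Object Ratio" column above follows the COCO convention that an object is small when its pixel area is below 32². Computing this statistic for a new dataset is straightforward; the helper below is an illustrative sketch (the function name and box format are assumptions, not part of any dataset toolkit):

```python
def small_object_ratio(boxes, thresh=32 * 32):
    """Fraction of bounding boxes whose pixel area is below the COCO
    'small object' threshold (area < 32^2 px), matching the
    'Small Object Ratio' column in the dataset table."""
    if not boxes:
        return 0.0
    small = sum(1 for (w, h) in boxes if w * h < thresh)
    return small / len(boxes)

# Two of these three boxes (10x10 and 31x31) fall below 32^2 = 1024 px
ratio = small_object_ratio([(10, 10), (100, 100), (31, 31)])
```

Running such a check before benchmarking clarifies whether a dataset actually stresses a detector's small-object capability.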
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wei, Z.; Zou, Y.; Xu, H.; Wang, S. Small Object Detection in Traffic Scenes for Mobile Robots: Challenges, Strategies, and Future Directions. Electronics 2025, 14, 2614. https://doi.org/10.3390/electronics14132614