Deep learning-based methods for real-time small target detection are critical for applications such as traffic monitoring, land management, and marine transportation. However, achieving high-precision detection of small objects against complex backgrounds remains challenging due to insufficient feature representation and background interference. Existing methods
[...] Read more.
Deep learning-based methods for real-time small target detection are critical for applications such as traffic monitoring, land management, and marine transportation. However, achieving high-precision detection of small objects against complex backgrounds remains challenging due to insufficient feature representation and background interference. Existing methods often struggle to balance multi-scale feature enhancement and computational efficiency, particularly in scenarios with low target-to-background contrast. To address this challenge, this study proposes an efficient detection method called hierarchical feature enhancement and feature fusion YOLO (HFEF
2-YOLO), which is based on the hierarchical dynamic attention. Firstly, a Hierarchical Filtering Feature Pyramid Network (HF-FPN) is introduced, which employs a dynamic gating mechanism to achieve differentiated screening and fusion of cross-scale features. This design addresses the feature redundancy caused by fixed fusion strategies in conventional FPN architectures, preserving edge details of tiny targets. Secondly, we propose a Dynamic Spatial–Spectral Attention Module (DSAM), which adaptively fuses channel-wise and spatial–dimensional responses through learnable weight allocation, generating dedicated spatial modulation factors for individual channels and significantly enhancing the saliency representation of dim small targets. Extensive experiments on four benchmark datasets (VEDAI, AI-TOD, DOTA, NWPU VHR-10) demonstrate the superiority of HFEF
2-YOLO; the proposed method can reach an accuracy of 0.761, 0.621, 0.737, and 0.969 (in terms of mAP@0.5), outperforming state-of-the-art methods by 3.5–8.1%. Furthermore, a lightweight version (L-HFEF
2-YOLO) is developed via dynamic convolution, reducing parameters by 42% while maintaining >95% accuracy, demonstrating real-time applicability on edge devices. Robustness tests under simulated degradation (e.g., noise, blur) validate its practicality for satellite-based tasks.
Full article