HSFANet: Hierarchical Scale-Sensitive Feature Aggregation Network for Small Object Detection in UAV Aerial Images
Abstract
1. Introduction
- We introduce a novel Dynamic Position Aggregation (DPA) module. It enables robust and flexible cross-layer feature interaction by fusing spatial and semantic features from different pyramid levels while mitigating the noise introduced during feature compression, so that the detector can pick out small-object position information within the hierarchical feature pyramid and localize objects precisely.
- We devise a Scale-Sensitive Loss (SSL) that supervises predictions across multiple output scales. By adaptively reweighting the losses, it prompts the model to devote more learning capacity to small and difficult objects, which improves detection robustness.
- We propose HSFANet and conduct comprehensive experiments on two public aerial object detection datasets, VisDrone and UAVDT, to verify the effectiveness and generalization capability of our method. The results show that HSFANet considerably improves small object detection performance and attains state-of-the-art results in this challenging domain.
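The adaptive reweighting idea in the second contribution can be illustrated with a minimal sketch. This is not the SSL formulation of Section 3.2: the exponential weighting rule, the function names, and the parameters `tau` and `alpha` are assumptions chosen only to show how a loss can be made to emphasize small boxes.

```python
import math


def scale_weights(areas, tau=32.0 * 32.0, alpha=1.0):
    """Return one weight per box; smaller areas get larger weights.

    Hypothetical rule (not taken from the paper):
        w = 1 + alpha * exp(-area / tau)
    """
    return [1.0 + alpha * math.exp(-area / tau) for area in areas]


def reweighted_loss(losses, areas, tau=32.0 * 32.0, alpha=1.0):
    """Weighted mean of per-box losses using the scale weights above."""
    weights = scale_weights(areas, tau, alpha)
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)


# Two boxes: a 10x10 px object with a high loss, a 100x100 px object with a low loss.
losses = [2.0, 1.0]
areas = [10 * 10, 100 * 100]
plain = sum(losses) / len(losses)
weighted = reweighted_loss(losses, areas)
print(f"plain mean = {plain:.3f}, scale-weighted mean = {weighted:.3f}")
```

Because the small box carries the larger weight, the weighted mean sits above the plain mean, so gradient signal from small objects counts for more during training.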
2. Related Work
2.1. Small Object Detection in Aerial Images
2.2. Hierarchical Feature Representation for Small Objects
2.3. Loss Function Design for Small Object Detection
3. Methods
3.1. Dynamic Position Aggregation
3.2. Scale-Sensitive Loss
4. Experiments
4.1. Datasets
4.2. Evaluation Metrics
4.3. Implementation Details
4.4. Main Results
4.5. Ablation Studies
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Chen, Y.; Yuan, X.; Wang, J.; Wu, R.; Li, X.; Hou, Q.; Cheng, M.M. YOLO-MS: Rethinking multi-scale representation learning for real-time object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 4240–4252.
2. Zhou, G.; Qian, L.; Gamba, P. A novel iterative self-organizing pixel matrix entanglement classifier for remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5407121.
3. Li, M.; Jia, T.; Wang, H.; Ma, B.; Lu, H.; Lin, S.; Cai, D.; Chen, D. AO-DETR: Anti-overlapping DETR for X-ray prohibited items detection. IEEE Trans. Neural Netw. Learn. Syst. 2024, 36, 12076–12090.
4. Guo, Q.; Xie, K.; Ye, W.; Zhou, T.; Xu, S. A Sparse Bayesian Learning Method for Moving Target Detection and Reconstruction. IEEE Trans. Instrum. Meas. 2025, 74, 4505413.
5. Zhuang, J.; Chen, W.; Guo, B.; Yan, Y. Infrared weak target detection in dual images and dual areas. Remote Sens. 2024, 16, 3608.
6. Wang, Z.; Wang, C.; Li, X.; Xia, C.; Xu, J. MLP-Net: Multi-layer perceptron fusion network for infrared small target detection. IEEE Trans. Geosci. Remote Sens. 2024, 63, 5601313.
7. Chen, X.; Cui, J.; Liu, Y.; Zhang, X.; Sun, J.; Ai, R.; Gu, W.; Xu, J.; Lu, H. Joint scene flow estimation and moving object segmentation on rotational LiDAR data. IEEE Trans. Intell. Transp. Syst. 2024, 25, 17733–17743.
8. Chen, G.; Jia, Y.; Yin, Y.; Fu, S.; Liu, D.; Wang, T. Remote sensing image dehazing using a wavelet-based generative adversarial networks. Sci. Rep. 2025, 15, 3634.
9. Wang, B.; Yang, M.; Cao, P.; Liu, Y. A novel embedded cross framework for high-resolution salient object detection. Appl. Intell. 2025, 55, 277.
10. Zhou, G.; Liu, W.; Zhu, Q.; Lu, Y.; Liu, Y. ECA-MobileNetV3 (Large)+ SegNet model for binary sugarcane classification of remotely sensed images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4414915.
11. Zhou, G.; Wang, Q.; Huang, Y.; Tian, J.; Li, H.; Wang, Y. True2 orthoimage map generation. Remote Sens. 2022, 14, 4396.
12. Liao, H.; Xia, J.; Yang, Z.; Pan, F.; Liu, Z.; Liu, Y. Meta-learning based domain prior with application to optical-ISAR image translation. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 7041–7056.
13. Xu, X.; Fu, X.; Zhao, H.; Liu, M.; Xu, A.; Ma, Y. Three-dimensional reconstruction and geometric morphology analysis of lunar small craters within the patrol range of the Yutu-2 Rover. Remote Sens. 2023, 15, 4251.
14. Xiong, X.; He, M.; Li, T.; Zheng, G.; Xu, W.; Fan, X.; Zhang, Y. Adaptive feature fusion and improved attention mechanism-based small object detection for UAV target tracking. IEEE Internet Things J. 2024, 11, 21239–21249.
15. Alshehri, M.; Zahoor, L.; AlQahtani, Y.; Alshahrani, A.; AlHammadi, D.A.; Jalal, A.; Liu, H. Unmanned aerial vehicle based multi-person detection via deep neural network models. Front. Neurorobotics 2025, 19, 1582995.
16. Zhao, X.; Wang, T.; Li, Y.; Zhang, B.; Liu, K.; Liu, D.; Wang, C.; Snoussi, H. Target-driven visual navigation by using causal intervention. IEEE Trans. Intell. Veh. 2023, 9, 1294–1304.
17. Zeng, S.; Yang, W.; Jiao, Y.; Geng, L.; Chen, X. SCA-YOLO: A new small object detection model for UAV images. Vis. Comput. 2024, 40, 1787–1803.
18. Zhang, R.; Wang, Y.; Li, Z.; Ding, F.; Wei, C.; Wu, M. Online Adaptive Keypoint Extraction for Visual Odometry Across Different Scenes. IEEE Robot. Autom. Lett. 2025, 10, 7539–7546.
19. Rekavandi, A.M.; Xu, L.; Boussaid, F.; Seghouane, A.K.; Hoefs, S.; Bennamoun, M. A guide to image-and video-based small object detection using deep learning: Case study of maritime surveillance. IEEE Trans. Intell. Transp. Syst. 2025, 26, 2851–2879.
20. Wang, L.; Fu, Q.; Zhu, R.; Liu, N.; Shi, H.; Liu, Z.; Li, Y.; Jiang, H. Research on high precision localization of space target with multi-sensor association. Opt. Lasers Eng. 2025, 184, 108553.
21. Ma, S.; Zhang, Y.; Peng, L.; Sun, C.; Ding, L.; Zhu, Y. OWRT-DETR: A Novel Real-Time Transformer Network for Small Object Detection in Open Water Search and Rescue From UAV Aerial Imagery. IEEE Trans. Geosci. Remote Sens. 2025, 63, 4205313.
22. Wang, T.; Li, J.; Wu, H.N.; Li, C.; Snoussi, H.; Wu, Y. ResLNet: Deep residual LSTM network with longer input for action recognition. Front. Comput. Sci. 2022, 16, 166334.
23. Li, D.; Tong, S.; Yang, H.; Hu, Q. Time-synchronized control for spacecraft reorientation with time-varying constraints. IEEE/ASME Trans. Mechatron. 2024, 30, 2073–2083.
24. Huang, Y.; Chen, J.; Huang, D. UFPMP-Det: Toward accurate and efficient object detection on drone imagery. In Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, Virtual Conference, 22 February–1 March 2022; Volume 36, pp. 1026–1033.
25. Yang, C.; Huang, Z.; Wang, N. QueryDet: Cascaded sparse query for accelerating high-resolution small object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 13668–13677.
26. Liu, Z.; Gao, G.; Sun, L.; Fang, Z. HRDNet: High-resolution detection network for small objects. In Proceedings of the 2021 IEEE International Conference on Multimedia and Expo (ICME), Shenzhen, China, 5–9 July 2021; pp. 1–6.
27. You, J.; Kim, Y.K. Up-sampling method for low-resolution LiDAR point cloud to enhance 3D object detection in an autonomous driving environment. Sensors 2022, 23, 322.
28. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 658–666.
29. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000.
30. He, J.; Erfani, S.; Ma, X.; Bailey, J.; Chi, Y.; Hua, X.S. α-IoU: A family of power intersection over union losses for bounding box regression. Adv. Neural Inf. Process. Syst. 2021, 34, 20230–20242.
31. Xu, C.; Wang, J.; Yang, W.; Yu, H.; Yu, L.; Xia, G.S. Detecting tiny objects in aerial images: A normalized Wasserstein distance and a new benchmark. ISPRS J. Photogramm. Remote Sens. 2022, 190, 79–93.
32. Xu, C.; Wang, J.; Yang, W.; Yu, H.; Yu, L.; Xia, G.S. RFLA: Gaussian receptive field based label assignment for tiny object detection. In Proceedings of the 17th European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 526–543.
33. Jiang, L.; Yuan, B.; Du, J.; Chen, B.; Xie, H.; Tian, J.; Yuan, Z. MFFSODNet: Multiscale feature fusion small object detection network for UAV aerial images. IEEE Trans. Instrum. Meas. 2024, 73, 5015214.
34. Le Jeune, P.; Bahaduri, B.; Mokraoui, A. A comparative attention framework for better few-shot object detection on aerial images. Pattern Recognit. 2025, 161, 111243.
35. Cheng, G.; Yuan, X.; Yao, X.; Yan, K.; Zeng, Q.; Xie, X.; Han, J. Towards large-scale small object detection: Survey and benchmarks. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 13467–13488.
36. Xia, C.; Gao, H.; Yang, W.; Yu, J. MSDT: Multiscale Diffusion Transformer for Multimodality Image Fusion. IEEE Trans. Emerg. Top. Comput. Intell. 2025, 9, 2269–2283.
37. Yan, R.; Yan, L.; Geng, G.; Cao, Y.; Zhou, P.; Meng, Y. ASNet: Adaptive semantic network based on transformer–CNN for salient object detection in optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5608716.
38. Xie, Y.; Liu, S.; Chen, H.; Cao, S.; Zhang, H.; Feng, D.; Wan, Q.; Zhu, J.; Zhu, Q. Localization, balance and affinity: A stronger multifaceted collaborative salient object detector in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2024, 63, 4700117.
39. Ye, X.; Xu, C.; Zhu, H.; Xu, F.; Zhang, H.; Yang, W. Density-Aware DETR with Dynamic Query for End-to-End Tiny Object Detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 13554–13569.
40. Yang, Z.; Li, Q.; Yuan, Y.; Wang, Q. HCNet: Hierarchical feature aggregation and cross-modal feature alignment for remote sensing image captioning. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5624711.
41. Liao, Y.; Peng, C.; Li, X.; Wang, X.; Deng, Y. HRGA-Net: Hierarchical Rotation Gaussian Attention Network for Accurate Insulator Detection from UAV Images. IEEE Trans. Power Deliv. 2025.
42. Liu, H.I.; Tseng, Y.W.; Chang, K.C.; Wang, P.J.; Shuai, H.H.; Cheng, W.H. A denoising FPN with transformer R-CNN for tiny object detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4704415.
43. Feng, Y.; Huang, J.; Du, S.; Ying, S.; Yong, J.H.; Li, Y.; Ding, G.; Ji, R.; Gao, Y. Hyper-YOLO: When visual object detection meets hypergraph computation. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 47, 2388–2401.
44. Shi, Z.; Hu, J.; Ren, J.; Ye, H.; Yuan, X.; Ouyang, Y.; He, J.; Ji, B.; Guo, J. HS-FPN: High Frequency and Spatial Perception FPN for Tiny Object Detection. arXiv 2024, arXiv:2412.10116.
45. Yang, G.; Lei, J.; Zhu, Z.; Cheng, S.; Feng, Z.; Liang, R. AFPN: Asymptotic feature pyramid network for object detection. In Proceedings of the 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Honolulu, HI, USA, 1–4 October 2023; pp. 2184–2189.
46. Guo, C.; Fan, B.; Zhang, Q.; Xiang, S.; Pan, C. AugFPN: Improving multi-scale feature learning for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12595–12604.
47. Deng, C.; Wang, M.; Liu, L.; Liu, Y.; Jiang, Y. Extended feature pyramid network for small object detection. IEEE Trans. Multimed. 2021, 24, 1968–1979.
48. Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Tang, J.; Yang, J. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. Adv. Neural Inf. Process. Syst. 2020, 33, 21002–21012.
49. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
50. Liu, W.; Lu, H.; Fu, H.; Cao, Z. Learning to Upsample by Learning to Sample. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 6027–6037.
51. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458.
52. Zheng, Z.; Wang, P.; Ren, D.; Liu, W.; Ye, R.; Hu, Q.; Zuo, W. Enhancing geometric factors in model learning and inference for object detection and instance segmentation. IEEE Trans. Cybern. 2021, 52, 8574–8586.
53. Peyré, G.; Cuturi, M. Computational optimal transport: With applications to data science. Found. Trends® Mach. Learn. 2019, 11, 355–607.
54. Du, D.; Zhu, P.; Wen, L.; Bian, X.; Lin, H.; Hu, Q.; Peng, T.; Zheng, J.; Wang, X.; Zhang, Y.; et al. VisDrone-DET2019: The vision meets drone object detection in image challenge results. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019; pp. 213–226.
55. Du, D.; Qi, Y.; Yu, H.; Yang, Y.; Duan, K.; Li, G.; Zhang, W.; Huang, Q.; Tian, Q. The unmanned aerial vehicle benchmark: Object detection and tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 370–386.
56. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
57. Yang, F.; Fan, H.; Chu, P.; Blasch, E.; Ling, H. Clustered object detection in aerial images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8311–8320.
58. Zhang, J.; Huang, J.; Chen, X.; Zhang, D. How to fully exploit the abilities of aerial image detectors. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019.
59. Li, C.; Yang, T.; Zhu, S.; Chen, C.; Guan, S. Density map guided object detection in aerial images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 190–191.
60. Duan, C.; Wei, Z.; Zhang, C.; Qu, S.; Wang, H. Coarse-grained density map guided object detection in aerial images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 2789–2798.
61. Wei, Z.; Duan, C.; Song, X.; Tian, Y.; Wang, H. AMRNet: Chips augmentation in aerial images object detection. arXiv 2020, arXiv:2009.07168.
62. Deng, S.; Li, S.; Xie, K.; Song, W.; Liao, X.; Hao, A.; Qin, H. A global-local self-adaptive network for drone-view object detection. IEEE Trans. Image Process. 2020, 30, 1556–1569.
63. Akyon, F.C.; Altinuc, S.O.; Temizel, A. Slicing aided hyper inference and fine-tuning for small object detection. In Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16–19 October 2022; pp. 966–970.
64. Du, B.; Huang, Y.; Chen, J.; Huang, D. Adaptive sparse convolutional networks with global context enhancement for faster object detection on drone images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 13435–13444.
65. Zhou, X.; Wang, D.; Krähenbühl, P. Objects as points. arXiv 2019, arXiv:1904.07850.
66. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430.
67. Wang, C.Y.; Yeh, I.H.; Mark Liao, H.Y. YOLOv9: Learning what you want to learn using programmable gradient information. In Proceedings of the 18th European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; Springer: Cham, Switzerland, 2024; pp. 1–21.
68. Liu, C.; Gao, G.; Huang, Z.; Hu, Z.; Liu, Q.; Wang, Y. YOLC: You Only Look Clusters for Tiny Object Detection in Aerial Images. IEEE Trans. Intell. Transp. Syst. 2024, 25, 13863–13875.
69. Chen, Y.; Ye, Z.; Sun, H.; Gong, T.; Xiong, S.; Lu, X. Global-Local Fusion with Semantic Information-Guidance For Accurate Small Object Detection in UAV Aerial Images. IEEE Trans. Geosci. Remote Sens. 2025, 63, 4701115.
70. Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object detection via region-based fully convolutional networks. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; Volume 29, pp. 379–387.
71. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
Environment | Configuration Information |
---|---|
Operating system | Ubuntu 22.04.2 |
CPU | AMD EPYC 7773X 64-Core Processor |
Number of CPUs | 2 |
GPU | NVIDIA GeForce RTX 3090 (GA102) |
GPU memory size | 24 GB |
Number of GPUs | 8 |
GPU computing platform | CUDA 11.7.0 |
Python version | Python 3.9.19 |
Deep learning framework | PyTorch 2.0.1 |
Method | Year & Venue | Backbone | AP | AP50 | AP75 | APs | APm | APl |
---|---|---|---|---|---|---|---|---|
FRCNN [56] | 2015 NIPS | ResNeXt101 | 24.4 | 47.8 | 21.8 | 17.8 | 34.8 | 34.3 |
ClusDet [57] | 2019 ICCV | ResNeXt101 | 32.4 | 56.2 | 31.6 | - | - | - |
DREN [58] | 2019 ICCVW | ResNeXt152 | 30.3 | - | - | - | - | - |
DMNet [59] | 2020 CVPRW | ResNeXt101 | 29.4 | 49.3 | 30.6 | 21.6 | 41.0 | 56.7 |
CDMNet [60] | 2021 ICCVW | ResNeXt101 | 30.7 | 51.3 | 32.0 | 22.2 | 42.4 | 44.7 |
AMRNet [61] | 2020 arXiv | ResNeXt101 | 32.1 | - | - | 23.2 | 43.9 | 60.5 |
GLSAN [62] | 2021 TIP | ResNet101 | 32.5 | 55.8 | 30.0 | - | - | - |
HRDNet [26] | 2021 ICME | ResNeXt50+101 | 35.5 | 62.0 | 35.1 | - | - | - |
FCOS+SAHI [63] | 2022 ICIP | - | - | 38.5 | - | - | - | - |
VFNet+SAHI [63] | 2022 ICIP | - | - | 42.2 | - | - | - | - |
TOOD+SAHI [63] | 2022 ICIP | - | - | 43.5 | - | - | - | - |
QueryDet [25] | 2022 CVPR | ResNet50 | 28.3 | 48.1 | 28.8 | - | - | - |
CEASC [64] | 2023 CVPR | ResNet18 | 28.7 | 50.7 | 28.4 | - | - | - |
UFPMP-Det [24] | 2022 AAAI | ResNeXt101 | 40.1 | 66.8 | 41.3 | - | - | - |
CenterNet [65] | 2019 arXiv | Hourglass104 | 27.8 | 47.9 | 27.6 | 21.3 | 42.1 | 49.8 |
YOLOX [66] | 2021 arXiv | CSPv5-M | 27.6 | 47.7 | 27.5 | 17.6 | 41.0 | 46.1 |
YOLOv9 [67] | 2024 ECCV | GELAN | 29.5 | 49.9 | 29.4 | 20.2 | 41.7 | 47.8 |
YOLC [68] | 2024 TITS | ResNeXt101 | 36.3 | 60.1 | 37.4 | 28.9 | 47.5 | 51.8 |
GLSDet [69] | 2025 TGRS | CSPv5-M | 32.3 | 54.2 | 32.8 | 23.3 | 42.9 | 50.2 |
HSFANet | Ours | CSPDarknet53 | 41.4 | 63.2 | 44.3 | 31.1 | 51.5 | 55.4 |
Class | Instances | P | R | AP | AP50 | AP75 |
---|---|---|---|---|---|---|
pedestrian | 8844 | 73.7 | 68.7 | 40.4 | 73.7 | 39.2 |
people | 5125 | 71.0 | 57.7 | 29.8 | 62.0 | 24.7 |
bicycle | 1287 | 49.8 | 47.9 | 25.5 | 47.2 | 23.2 |
car | 14,064 | 83.7 | 88.3 | 70.3 | 91.4 | 80.0 |
van | 1975 | 65.8 | 62.6 | 48.8 | 64.5 | 56.4 |
truck | 750 | 65.4 | 56.4 | 43.1 | 59.3 | 49.2 |
tricycle | 1045 | 60.5 | 54.4 | 34.0 | 53.6 | 37.9 |
awning-tricycle | 532 | 43.4 | 28.8 | 20.3 | 29.9 | 22.7 |
bus | 251 | 81.9 | 72.0 | 61.2 | 77.8 | 70.1 |
motor | 4886 | 67.3 | 72.5 | 40.1 | 72.5 | 39.8 |
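As a consistency check, the last three columns of the per-class table can be averaged and compared with the first three metric values of the HSFANet row in the main comparison table (41.4, 63.2, 44.3). The sketch below does that arithmetic. Reading those columns as AP, AP at IoU 0.5, and AP at IoU 0.75 is an assumption supported by this match, and the variable and function names are illustrative.

```python
# (class, AP, AP@0.5, AP@0.75) copied from the last three columns of the table above.
per_class = [
    ("pedestrian", 40.4, 73.7, 39.2),
    ("people", 29.8, 62.0, 24.7),
    ("bicycle", 25.5, 47.2, 23.2),
    ("car", 70.3, 91.4, 80.0),
    ("van", 48.8, 64.5, 56.4),
    ("truck", 43.1, 59.3, 49.2),
    ("tricycle", 34.0, 53.6, 37.9),
    ("awning-tricycle", 20.3, 29.9, 22.7),
    ("bus", 61.2, 77.8, 70.1),
    ("motor", 40.1, 72.5, 39.8),
]


def column_means(rows):
    """Unweighted mean over classes for each of the three metric columns."""
    n = len(rows)
    return tuple(sum(row[i] for row in rows) / n for i in (1, 2, 3))


means = column_means(per_class)
reported = (41.4, 63.2, 44.3)  # HSFANet row of the main comparison table
for mean, ref in zip(means, reported):
    print(f"class mean = {mean:.2f}, reported = {ref}")
```

The class means come out at 41.35, 63.19, and 44.32, each within rounding distance of the reported overall values.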
Method | AP | AP50 | AP75 | APs | APm | APl |
---|---|---|---|---|---|---|
R-FCN [70] | 7.0 | 17.5 | 3.9 | 4.4 | 14.7 | 12.1 |
SSD [71] | 9.3 | 21.4 | 6.7 | 7.1 | 17.1 | 12.0 |
FRCNN [56] | 5.8 | 17.4 | 2.5 | 3.8 | 12.3 | 9.4 |
FRCNN [56]+FPN | 11.0 | 23.4 | 8.4 | 8.1 | 20.2 | 26.5 |
ClusDet [57] | 13.7 | 26.5 | 12.5 | 9.1 | 25.1 | 31.2 |
DMNet [59] | 14.7 | 24.6 | 16.3 | 9.3 | 26.2 | 35.2 |
CDMNet [60] | 16.8 | 29.1 | 18.5 | 11.9 | 29.0 | 15.7 |
DREN [58] | 17.1 | - | - | - | - | - |
AMRNet [61] | 18.2 | 30.4 | 19.8 | 10.3 | 31.3 | 33.5 |
GLSAN [62] | 17.1 | 28.3 | 18.8 | - | - | - |
CEASC [64] | 17.1 | 30.9 | 17.8 | - | - | - |
CenterNet [65] | 13.2 | 26.7 | 11.8 | 7.8 | 26.6 | 13.9 |
UFPMP-Det [24] | 24.6 | 38.7 | 28.0 | - | - | - |
YOLOX [66] | 15.8 | 27.7 | 16.3 | 10 | 27.3 | 24.8 |
YOLOv9 [67] | 16.1 | 28.1 | 16.5 | 10.6 | 27.9 | 25.3 |
YOLC [68] | 19.3 | 30.9 | 20.1 | 10.9 | 32.2 | 35.5 |
GLSDet [69] | 18.3 | 29.8 | 17.6 | 11.8 | 29.3 | 26.8 |
HSFANet (Ours) | 24.9 | 40.7 | 28.0 | 28.6 | 26.7 | 20.0 |
Baseline | DPA | SSL | AP | AP50 | AP75 | APs | APm | APl |
---|---|---|---|---|---|---|---|---|
✓ | - | - | 36.4 | 56.8 | 34.8 | 25.1 | 47.7 | 58.8 |
✓ | ✓ | - | 41.3 | 63.0 | 43.2 | 29.9 | 50.2 | 58.9 |
✓ | - | ✓ | 38.4 | 59.4 | 40.8 | 26.8 | 50.4 | 54.3 |
✓ | ✓ | ✓ | 41.4 | 63.2 | 44.3 | 31.1 | 51.5 | 55.4 |
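The contribution of each component can be read off the ablation table by subtracting the baseline row. The sketch below does this for the first metric column (AP) and the fourth metric column, which is assumed here to be small-object AP. The dictionary layout and function name are illustrative.

```python
# (uses DPA, uses SSL) -> (first metric column, fourth metric column) from the table above.
ablation = {
    (False, False): (36.4, 25.1),  # baseline only
    (True, False): (41.3, 29.9),   # baseline + DPA
    (False, True): (38.4, 26.8),   # baseline + SSL
    (True, True): (41.4, 31.1),    # baseline + DPA + SSL
}


def gains_over_baseline(table):
    """Difference of every non-baseline row from the baseline row, rounded to 0.1."""
    base = table[(False, False)]
    return {
        config: tuple(round(value - ref, 1) for value, ref in zip(values, base))
        for config, values in table.items()
        if config != (False, False)
    }


for (dpa, ssl), (d_ap, d_small) in gains_over_baseline(ablation).items():
    print(f"DPA={dpa}, SSL={ssl}: AP {d_ap:+.1f}, fourth column {d_small:+.1f}")
```

With the table's numbers this gives +4.9 and +4.8 for DPA alone, +2.0 and +1.7 for SSL alone, and +5.0 and +6.0 when both are enabled.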
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, H.; Ou, Z.; Yao, S.; Zhu, Y.; Zhu, Y.; Li, H.; Wang, S.; Guo, Y.; Song, M. HSFANet: Hierarchical Scale-Sensitive Feature Aggregation Network for Small Object Detection in UAV Aerial Images. Drones 2025, 9, 659. https://doi.org/10.3390/drones9090659