Tiny Object Detection via Normalized Gaussian Label Assignment and Multi-Scale Hybrid Attention
Highlights
- We propose a novel receptive-field-based similarity metric that replaces the traditional IoU for computing similarity scores.
- In the label assignment strategy, we integrate classification scores and Gaussian similarity scores to comprehensively screen candidate samples and obtain high-quality candidates.
- We propose a Multi-scale Hybrid Attention module that integrates multi-scale local attention and global attention to enhance the feature representation of tiny objects.
- BCDet can be easily integrated into both anchor-based and anchor-free detectors, significantly enhancing the robustness and scalability of tiny object detection in remote sensing images under complex environmental conditions.
- Compared to existing state-of-the-art methods, this approach achieves superior performance on the AI-TOD-v2 and VisDrone2019 datasets.
Abstract
1. Introduction
- We propose an efficient tiny object detection method called BCDet. This method replaces the IoU metric with a Normalized Gaussian Label Assignment (NGLA) strategy that measures the similarity between Gaussian distributions, and designs a joint classification–localization quality screening strategy to optimize sample allocation. Furthermore, we construct a Multi-scale Hybrid Attention module (MSHA) to enhance the perception of tiny object features and improve the detector’s performance on tiny object detection tasks.
- We introduce a Normalized Gaussian Label Assignment (NGLA) strategy. This strategy introduces a symmetric and scale-invariant metric to quantify the similarity between two Gaussian distributions, thus overcoming the scale sensitivity of WD [8] and the asymmetry of KLD [9]. Furthermore, this paper constructs a candidate sample quality ranking mechanism to dynamically filter candidate samples with both high classification confidence and high localization accuracy, thereby solving the problem of imbalance between classification and localization scores in traditional assignment strategies.
- We utilize a Multi-scale Hybrid Attention module (MSHA) to integrate global attention and multi-scale local convolution features. By using multi-branch dilated convolution, the discriminative features of tiny objects are captured, enhancing the spatial sensitivity and scale adaptability of the detection head to tiny objects.
- The proposed method performs well on the AI-TOD-v2 and VisDrone2019 datasets, achieving state-of-the-art performance when integrated into common anchor-free detectors such as FCOS [10], anchor-based detectors such as Faster R-CNN [11], and the strong recent baseline DetectoRS [12].
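The Gaussian-similarity idea behind NGLA can be sketched as follows: each box is modeled as a 2D Gaussian and compared with the Bhattacharyya distance (BCD), which is symmetric by construction, and the distance is mapped through exp(−·) to obtain a bounded similarity in (0, 1]. This is an illustrative sketch only; the variance convention and the exact normalization used by NGLA/NBCD are assumptions here, not the paper’s definition.

```python
import numpy as np

def box_to_gaussian(box):
    """Model an (x1, y1, x2, y2) box as a 2D Gaussian:
    mean = box center, diagonal covariance from the half-extents
    (one common convention; the paper's variance choice may differ)."""
    x1, y1, x2, y2 = box
    mu = np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])
    cov = np.diag([((x2 - x1) / 2.0) ** 2, ((y2 - y1) / 2.0) ** 2])
    return mu, cov

def bhattacharyya_distance(mu1, cov1, mu2, cov2):
    """Closed-form Bhattacharyya distance between two Gaussians."""
    cov = (cov1 + cov2) / 2.0
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.inv(cov) @ diff
    term2 = 0.5 * np.log(
        np.linalg.det(cov)
        / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2))
    )
    return term1 + term2

def gaussian_similarity(box_a, box_b):
    """Map the distance into (0, 1] via exp(-BCD): 1 for identical
    boxes, smoothly decaying, and symmetric in its arguments."""
    d = bhattacharyya_distance(*box_to_gaussian(box_a),
                               *box_to_gaussian(box_b))
    return float(np.exp(-d))
```

Unlike IoU, this score stays informative for non-overlapping tiny boxes (where IoU is exactly 0), which is the failure mode the paper targets.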
2. Related Work
2.1. Object Detection
2.2. Tiny Object Detection
3. Proposed Method
3.1. Overview
3.2. 2D Gaussian Modeling
3.3. Normalized Gaussian Label Assignment
3.4. Multi-Scale Hybrid Attention
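As a rough illustration of the information flow described for MSHA (multi-branch dilated local context fused with global attention), the NumPy sketch below replaces learned dilated convolutions with fixed 3×3 dilated neighborhood means and uses a simple softmax spatial attention. The function names, the fixed kernels, and fusion by summation are illustrative assumptions, not the paper’s trained architecture.

```python
import numpy as np

def dilated_local_mean(x, dilation):
    """Mean over a 3x3 dilated neighborhood of an (H, W, C) map with
    zero padding -- a fixed stand-in for one dilated-conv branch."""
    h, w, _ = x.shape
    pad = dilation
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for dy in (-dilation, 0, dilation):
        for dx in (-dilation, 0, dilation):
            out += xp[pad + dy : pad + dy + h, pad + dx : pad + dx + w]
    return out / 9.0

def global_attention(x):
    """Softmax spatial attention: weight each location by its
    dot-product similarity to the global average descriptor."""
    h, w, c = x.shape
    g = x.mean(axis=(0, 1))                     # (C,) global descriptor
    logits = x.reshape(-1, c) @ g / np.sqrt(c)  # (H*W,)
    wts = np.exp(logits - logits.max())
    wts /= wts.sum()
    # re-weight locations; scaling by H*W leaves a uniform map unchanged
    return (x.reshape(-1, c) * wts[:, None]).reshape(h, w, c) * (h * w)

def msha(x, dilations=(1, 2, 3)):
    """Fuse multi-scale local branches with a global-attention branch,
    plus a residual connection (sum fusion is an assumption here)."""
    local = sum(dilated_local_mean(x, d) for d in dilations) / len(dilations)
    return x + local + global_attention(x)
```

The multi-branch dilations widen the receptive field without losing resolution, while the global branch injects image-level context; both properties matter for tiny objects whose local evidence is weak.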
4. Experiments
4.1. Experiment Settings
4.1.1. AI-TOD-v2 Dataset
4.1.2. VisDrone2019 Dataset
4.2. Comparison of State-of-the-Art Methods
4.2.1. Comparative Experiments on the AI-TOD-v2 Dataset
4.2.2. Comparative Experiments on the VisDrone2019 Dataset
4.3. Ablation Study
4.3.1. P-R Curve Comparison
4.3.2. Individual Component Effectiveness
4.3.3. Confusion Matrix Comparison
4.3.4. Comparison of Different Measurement Methods
4.3.5. Ablation Comparison Between Different Backbones
4.3.6. Comparison of Different Quality Score Combination Strategies
4.3.7. Comparison of Different Values
4.3.8. Computational Cost and Parameter Comparison
4.4. Visualization
5. Discussion
5.1. Analysis of Effectiveness
5.2. Limitations
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| CNN | Convolutional Neural Network |
| NGLA | Normalized Gaussian Label Assignment |
| NBCD | Normalized Bhattacharyya Distance |
| BCD | Bhattacharyya Distance |
| DCN | Deformable Convolutional Network |
| RPN | Region Proposal Network |
| FPN | Feature Pyramid Network |
| IoU | Intersection over Union |
References
- Wang, J.; Yang, W.; Guo, H.; Zhang, R.; Xia, G.S. Tiny object detection in aerial images. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; IEEE: New York, NY, USA, 2021; pp. 3791–3798.
- Xu, C.; Wang, J.; Yang, W.; Yu, H.; Yu, L.; Xia, G.-S. Detecting tiny objects in aerial images: A normalized Wasserstein distance and a new benchmark. ISPRS J. Photogramm. Remote Sens. 2022, 190, 79–93.
- Kisantal, M.; Wojna, Z.; Murawski, J.; Naruniec, J.; Cho, K. Augmentation for small object detection. In Proceedings of the 9th International Conference on Advances in Computing and Information Technology, Sydney, Australia, 21–22 December 2019; pp. 119–133.
- Ma, F.; Sun, X.; Zhang, F.; Zhou, Y.; Li, H.C. What Catch Your Attention in SAR Images: Saliency Detection Based on Soft-Superpixel Lacunarity Cue. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5200817.
- Liu, H.; Zhou, X.; Wang, C.; Chen, S.; Kong, H. Fourier-Deformable Convolution Network for Road Segmentation From Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4415117.
- Li, X.; Xu, F.; Zhang, J.; Zhang, H.; Lyu, X.; Liu, F.; Gao, H.; Kaup, A. Frequency-Guided Denoising Network for Semantic Segmentation of Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2026, 64, 5400217.
- Bhattacharyya, A. On a measure of divergence between two statistical populations defined by their probability distribution. Bull. Calcutta Math. Soc. 1943, 35, 99–110.
- Peyré, G.; Cuturi, M. Computational optimal transport: With applications to data science. Found. Trends Mach. Learn. 2019, 11, 355–607.
- Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9626–9635.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Qiao, S.; Chen, L.C.; Yuille, A. DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 10213–10224.
- Chen, S.; Huang, H.; Zhu, S.; Xu, H.; He, Y.; Wang, D.H. SiamCCF: Siamese visual tracking via cross-layer calibration fusion. IET Comput. Vis. 2023, 17, 869–882.
- Chen, S.; Lai, X.; Yan, Y.; Wang, D.H.; Zhu, S. Learning an attention-aware parallel sharing network for facial attribute recognition. J. Vis. Commun. Image Represent. 2023, 90, 103745.
- Du, Y.; Yan, Y.; Chen, S.; Hua, Y. Object-adaptive LSTM network for real-time visual tracking with adversarial data augmentation. Neurocomputing 2020, 384, 67–83.
- Huang, Y.; Yan, Y.; Chen, S.; Wang, H. Expression-targeted feature learning for effective facial expression recognition. J. Vis. Commun. Image Represent. 2018, 55, 677–687.
- Chen, S.; Xu, B.; Zhang, M.; Yan, Y.; Du, X.; Zhuang, W.; Wu, Y. HC-GCN: Hierarchical contrastive graph convolutional network for unsupervised domain adaptation on person re-identification. Multimed. Syst. 2023, 29, 2779–2790.
- Tsai, M.D.; Tseng, K.W.; Lai, C.C.; Wei, C.T.; Cheng, K.F. Exploring Airborne LiDAR and Aerial Photographs Using Machine Learning for Land Cover Classification. Remote Sens. 2023, 15, 2280.
- Wei, C.T.; Tsai, M.D.; Chang, Y.L.; Wang, M.C.J. Enhancing the Accuracy of Land Cover Classification by Airborne LiDAR Data and WorldView-2 Satellite Imagery. ISPRS Int. J. Geo-Inf. 2022, 11, 391.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into High Quality Object Detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016; Springer: Cham, Switzerland, 2016; pp. 21–37.
- Liu, J.; Yu, J.; Zhang, C.; Cui, H.; Zhao, J.; Zheng, W.; Xu, F.; Wei, X. Optimization of a multi-environmental detection model for tomato growth point buds based on multi-strategy improved YOLOv8. Sci. Rep. 2025, 15, 25726.
- Tian, Y.; Ye, Q.; Doermann, D. YOLOv12: Attention-Centric Real-Time Object Detectors. arXiv 2025, arXiv:2502.12524.
- Yi, J.; Han, W.; Lai, F. YOLOv8n-DDSW: An efficient fish target detection network for dense underwater scenes. PeerJ Comput. Sci. 2025, 11, e2798.
- Zheng, Y.; Zheng, W.; Du, X. Paddy-YOLO: An accurate method for rice pest detection. Comput. Electron. Agric. 2025, 238, 110777.
- Law, H.; Deng, J. CornerNet: Detecting Objects as Paired Keypoints. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 734–750.
- Zhou, X.; Wang, D.; Krähenbühl, P. Objects as Points. arXiv 2019, arXiv:1904.07850.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Computer Vision—ECCV 2020; Springer: Cham, Switzerland, 2020; pp. 213–229.
- Sun, P.; Zhang, R.; Jiang, Y.; Kong, T.; Xu, C.; Zhan, W.; Tomizuka, M.; Li, L.; Yuan, Z.; Wang, C.; et al. Sparse R-CNN: End-to-End Object Detection with Learnable Proposals. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 14454–14463.
- Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-Time Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974.
- Shen, F.; Cui, J.; Li, W.; Zhou, S. TinyDef-DETR: A Transformer-Based Framework for Defect Detection in Transmission Lines from UAV Imagery. Remote Sens. 2025, 17, 3789.
- Chen, X.; Yin, H. DCC-DETR: A real-time lightweight gesture recognition network for home human–robot interaction. J. Real-Time Image Process. 2025, 22, 184.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
- Shunzhi, Z.; Lizhao, L.; Si, C. Image feature detection algorithm based on the spread of Hessian source. Multimed. Syst. 2017, 23, 105–117.
- Chen, S.; Qiu, L.; Tian, Z.; Yan, Y.; Wang, D.H.; Zhu, S. MTNet: Mutual tri-training network for unsupervised domain adaptation on person re-identification. J. Vis. Commun. Image Represent. 2023, 90, 103749.
- Zhang, Z.; Li, J.; Li, J.; Xu, J. SAMKD: Spatial-aware Adaptive Masking Knowledge Distillation for Object Detection. arXiv 2025, arXiv:2501.07101.
- Liu, Z.; Zhang, Y.; He, J.; Zhang, T.; Rehman, S.u.; Saraee, M.; Sun, C. Enhancing Infrared Small Target Detection: A Saliency-Guided Multi-Task Learning Approach. IEEE Trans. Intell. Transp. Syst. 2025, 26, 3603–3618.
- Chen, S.; Wang, L.; Wang, Z.; Yan, Y.; Wang, D.H.; Zhu, S. Learning meta-adversarial features via multi-stage adaptation network for robust visual object tracking. Neurocomputing 2022, 491, 365–381.
- Zheng, X.; Qiu, Y.; Zhang, G.; Lei, T.; Jiang, P. ESL-YOLO: Small Object Detection with Effective Feature Enhancement and Spatial-Context-Guided Fusion Network for Remote Sensing. Remote Sens. 2024, 16, 4374.
- Wen, Z.; Li, P.; Liu, Y.; Chen, J.; Xiang, X.; Li, Y.; Wang, H.; Zhao, Y.; Zhou, G. FANet: Frequency-Aware Attention-Based Tiny-Object Detection in Remote Sensing Images. Remote Sens. 2025, 17, 4066.
- Zhang, S.; Chi, C.; Yao, Y.; Lei, Z.; Li, S.Z. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 9759–9768.
- Xu, C.; Wang, J.; Yang, W.; Yu, H.; Yu, L.; Xia, G.S. RFLA: Gaussian receptive field based label assignment for tiny object detection. In Computer Vision—ECCV 2022; Springer: Cham, Switzerland, 2022; pp. 526–543.
- Zhou, Z.; Zhu, Y. KLDet: Detecting Tiny Objects in Remote Sensing Images via Kullback–Leibler Divergence. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4703316.
- Fu, R.; Chen, C.; Yan, S.; Heidari, A.A.; Wang, X.; Escorcia-Gutierrez, J.; Mansour, R.F.; Chen, H. Gaussian similarity-based adaptive dynamic label assignment for tiny object detection. Neurocomputing 2023, 543, 126285.
- Xu, C.; Ding, J.; Wang, J.; Yang, W.; Yu, H.; Yu, L.; Xia, G.S. Dynamic coarse-to-fine learning for oriented tiny object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7318–7328.
- Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Xu, J.; et al. MMDetection: Open MMLab Detection Toolbox and Benchmark. arXiv 2019, arXiv:1906.07155.
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 8–14 December 2019; Volume 32.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Du, D.; Zhu, P.; Wen, L.; Bian, X.; Lin, H.; Hu, Q.; Peng, T.; Zheng, J.; Wang, X.; Zhang, Y.; et al. VisDrone-DET2019: The vision meets drone object detection in image challenge results. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; pp. 213–226.
- Cao, B.; Yao, H.; Zhu, P.; Hu, Q. Visible and Clear: Finding Tiny Objects in Difference Map. In Computer Vision—ECCV 2024; Springer: Cham, Switzerland, 2024.
- Bian, J.; Mingtao, F.; Weisheng, D.; Fangfang, W.; Jianqiao, L.; Yaonan, W.; Guangming, S. Feature Information Driven Position Gaussian Distribution Estimation for Tiny Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2025.
- Du, B.; Huang, Y.; Chen, J.; Huang, D. Adaptive Sparse Convolutional Networks with Global Context Enhancement for Faster Object Detection on Drone Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023.
- Cai, X.; Lai, Q.; Wang, Y.; Wang, W.; Sun, Z.; Yao, Y. Poly Kernel Inception Network for Remote Sensing Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024.
- Hou, X.; Liu, M.; Zhang, S.; Wei, P.; Chen, B. Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 17574–17583.
- Lu, X.; Li, B.; Yue, Y.; Li, Q.; Yan, J. Grid R-CNN. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; pp. 7363–7372.
- Zhu, B.; Wang, J.; Jiang, Z.; Zong, F.; Liu, S.; Li, Z.; Sun, J. AutoAssign: Differentiable Label Assignment for Dense Object Detection. arXiv 2020, arXiv:2007.03496.
- Feng, C.; Zhong, Y.; Gao, Y.; Scott, M.R.; Huang, W. TOOD: Task-Aligned One-Stage Object Detection. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; IEEE Computer Society: New York, NY, USA, 2021; pp. 3490–3499.
- Liu, C.; Gao, G.; Huang, Z.; Hu, Z.; Liu, Q.; Wang, Y. YOLC: You Only Look Clusters for Tiny Object Detection in Aerial Images. IEEE Trans. Intell. Transp. Syst. 2024, 25, 13863–13875.
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000.
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 658–666.
- Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5987–5995.
- Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep High-Resolution Representation Learning for Human Pose Estimation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5686–5696.
| Method | Venue | Schedule | Backbone | AP | AP50 | AP75 | APvt | APt | APs | APm |
|---|---|---|---|---|---|---|---|---|---|---|
| Anchor-based | ||||||||||
| Faster R-CNN † [11] | TPAMI’17 | 1× | ResNet-50 | 13.3 | 30.7 | 9.3 | 0.0 | 9.7 | 25.5 | 37.3 |
| RetinaNet † [21] | TPAMI’20 | 1× | ResNet-50 | 6.9 | 19.7 | 3.2 | 2.1 | 7.0 | 10.9 | 17.8 |
| ATSS [44] | CVPR’20 | 1× | ResNet-50 | 15.5 | 36.5 | 9.6 | 1.9 | 12.7 | 24.6 | 36.2 |
| Cascade R-CNN † [20] | TPAMI’21 | 1× | ResNet-50 | 15.7 | 34.9 | 11.9 | 0.1 | 11.6 | 28.3 | 39.4 |
| DetectoRS † [12] | CVPR’21 | 1× | ResNet-50 | 16.0 | 35.4 | 12.2 | 0.1 | 12.2 | 28.2 | 39.8 |
| NWD-RKA † [2] | ISPRS’22 | 1× | ResNet-50 | 24.1 | 55.4 | 16.8 | 9.0 | 23.7 | 29.7 | 39.4 |
| RFLA † [45] | ECCV’22 | 1× | ResNet-50 | 24.5 | 56.2 | 17.5 | 9.5 | 24.2 | 29.5 | 39.8 |
| CEASC [55] | CVPR’23 | 1× | ResNet-50 | 17.0 | 39.5 | 11.5 | 3.9 | 16.1 | 22.0 | - |
| PKS R-CNN [56] | CVPR’24 | 1× | - | 11.1 | 25.1 | 8.5 | 0.0 | 7.3 | 22.1 | - |
| SR-TOD [53] | ECCV’24 | 1× | ResNet-50 | 22.9 | 54.0 | 15.7 | 7.7 | 22.9 | 28.4 | - |
| Salience DETR [57] | CVPR’24 | 1× | - | 20.9 | 52.6 | 12.6 | 8.2 | 20.4 | 27.0 | - |
| Bian et al. [54] | CVPR’25 | 1× | ResNet-50 | 23.9 | 55.8 | 16.6 | 7.3 | 23.5 | 29.0 | - |
| Anchor-free | ||||||||||
| Grid R-CNN [58] | CVPR’19 | 1× | ResNet-50 | 14.7 | 31.7 | 11.4 | 0.0 | 11.5 | 27.4 | 38.0 |
| FCOS * [10] | ICCV’19 | 1× | ResNet-50 | 16.4 | 37.7 | 11.7 | 5.4 | 16.8 | 19.7 | 26.1 |
| AutoAssign [59] | ArXiv’20 | 1× | ResNet-50 | 14.3 | 37.5 | 7.5 | 3.9 | 14.1 | 19.4 | 23.2 |
| TOOD [60] | ICCV’21 | 1× | ResNet-50 | 18.6 | 43.0 | 12.7 | 3.2 | 16.8 | 26.2 | 38.1 |
| KLDet † [46] | TGRS’24 | 1× | ResNet-50 | 19.6 | 47.4 | 12.7 | 7.9 | 20.5 | 22.9 | 28.4 |
| YOLC † [61] | TITS’24 | 4× | HRNet-w48 | 23.5 | 57.2 | 15.2 | 9.6 | 23.4 | 28.6 | 36.0 |
| Ours | ||||||||||
| FCOS w/ BCDet | - | 1× | ResNet-50 | 19.3 | 47.2 | 12.2 | 8.3 | 19.2 | 21.8 | 28.1 |
| Faster R-CNN w/ BCDet | - | 1× | ResNet-50 | 22.3 | 54.7 | 13.9 | 8.2 | 21.7 | 28.0 | 35.5 |
| DetectoRS w/ BCDet | - | 1× | ResNet-50 | 25.7 | 58.9 | 18.6 | 8.5 | 25.4 | 30.8 | 40.0 |
| Method | Venue | Schedule | AP | AP50 | APvt | APt | APs |
|---|---|---|---|---|---|---|---|
| Faster R-CNN † [11] | TPAMI’17 | 1× | 20.0 | 32.9 | 0.0 | 16.1 | 34.3 |
| Cascade R-CNN [20] | CVPR’18 | 1× | 25.2 | 42.6 | 0.1 | 7.0 | 22.5 |
| FCOS † [10] | ICCV’19 | 1× | 18.0 | 33.9 | 1.5 | 14.3 | 28.1 |
| DetectoRS † [12] | CVPR’21 | 1× | 24.9 | 41.7 | 0.0 | 21.6 | 39.7 |
| NWD-RKA † [2] | ISPRS’22 | 1× | 26.5 | 47.2 | 4.8 | 22.5 | 38.2 |
| CEASC [55] | CVPR’23 | 1× | 25.1 | 42.6 | 2.0 | 8.6 | 20.7 |
| KLDet † [46] | TGRS’24 | 1× | 19.7 | 38.3 | 3.4 | 17.2 | 27.6 |
| PKS R-CNN [56] | CVPR’24 | 1× | 24.3 | 42.4 | 0.1 | 7.3 | 22.5 |
| FCOS * | - | 1× | 21.7 | 41.0 | 4.7 | 19.2 | 30.4 |
| Faster R-CNN * | - | 1× | 23.3 | 44.1 | 3.6 | 20.1 | 33.9 |
| DetectoRS * | - | 1× | 27.8 | 48.9 | 5.4 | 23.8 | 39.8 |
| Baseline | NGLA | MSHA | AP | AP50 | AP75 | APvt | APt | APs |
|---|---|---|---|---|---|---|---|---|
| ✓ | | | 13.3 | 30.7 | 9.3 | 0.0 | 9.7 | 25.5 |
| ✓ | ✓ | | 21.3 | 53.9 | 13.1 | 7.6 | 21.5 | 27.1 |
| ✓ | | ✓ | 18.3 | 46.3 | 11.2 | 7.2 | 19.2 | 21.2 |
| ✓ | ✓ | ✓ | 22.3 | 54.7 | 13.9 | 8.2 | 21.7 | 28.0 |
| Metric | AP | AP50 | AP75 | APvt | APt | APs |
|---|---|---|---|---|---|---|
| IoU | 16.7 | 45.7 | 7.9 | 7.6 | 18.2 | 22.0 |
| CIoU [62] | 12.2 | 28.3 | 8.4 | 0.0 | 8.3 | 22.3 |
| GIoU [63] | 15.3 | 43.6 | 6.5 | 5.8 | 16.5 | 20.1 |
| DIoU [62] | 15.3 | 42.7 | 6.9 | 5.3 | 16.3 | 19.8 |
| WD [8] | 21.3 | 53.0 | 12.8 | 7.0 | 20.5 | 28.1 |
| NBCD (Ours) | 22.3 | 54.7 | 13.9 | 8.2 | 21.7 | 28.0 |
| Backbone | Method | AP | AP50 | APvt | APt |
|---|---|---|---|---|---|
| ResNet-101 | Faster R-CNN † [11] | 13.1 | 29.9 | 0.0 | 9.2 |
| | FR w/NWD-RKA † [2] | 20.8 | 52.4 | 8.5 | 20.4 |
| | FR w/BCDet | 21.7 | 54.2 | 8.7 | 21.2 |
| ResNeXt-101-32x4d | Faster R-CNN † [11] | 14.0 | 32.3 | 0.1 | 10.0 |
| | FR w/NWD-RKA † [2] | 22.5 | 55.4 | 8.9 | 21.6 |
| | FR w/BCDet | 23.4 | 56.8 | 8.7 | 22.9 |
| HRNet-w32 | Faster R-CNN † [11] | 14.5 | 32.8 | 0.1 | 11.1 |
| | FR w/NWD-RKA † [2] | 22.6 | 55.3 | 8.4 | 21.8 |
| | FR w/BCDet | 23.9 | 57.9 | 9.1 | 23.1 |
| Quality score combination | AP | AP50 | AP75 | APvt | APt | APs | APm |
|---|---|---|---|---|---|---|---|
| IoU | 13.3 | 30.7 | 9.3 | 0.0 | 9.7 | 25.5 | 37.3 |
| × IoUα | 14.2 | 33.5 | 9.8 | 0.5 | 10.6 | 26.2 | 37.7 |
| × | 21.7 | 53.9 | 13.3 | 7.8 | 20.9 | 27.8 | 35.1 |
| × | 22.3 | 54.7 | 13.9 | 8.2 | 21.7 | 28.0 | 35.5 |

| Value | AP | AP50 | AP75 | APvt | APt | APs | APm |
|---|---|---|---|---|---|---|---|
| 1 | 21.8 | 53.9 | 13.6 | 7.6 | 20.6 | 27.8 | 35.3 |
| 3 | 21.9 | 54.2 | 13.7 | 7.2 | 20.9 | 27.5 | 35.7 |
| 4 | 22.1 | 54.4 | 14.1 | 7.7 | 21.2 | 27.8 | 35.5 |
| 5 | 22.3 | 54.7 | 13.9 | 8.2 | 21.7 | 28.0 | 35.5 |
| 6 | 22.0 | 54.5 | 13.8 | 8.2 | 21.4 | 27.7 | 35.2 |
| 10 | 21.9 | 54.0 | 13.6 | 7.8 | 20.8 | 27.4 | 35.2 |
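The quality-score combinations compared above multiply a classification confidence with an (optionally exponent-weighted) localization similarity before ranking candidates. A minimal sketch of such a joint score, using a TOOD-style geometric weighting as a stand-in; the paper’s exact combination and exponent value are not reproduced here, and `alpha`, `top_k_candidates` are illustrative names.

```python
import numpy as np

def joint_quality(cls_score, sim_score, alpha=0.5):
    """Combine classification confidence and Gaussian similarity into
    one candidate-quality score. Geometric weighting is one common
    choice (cf. TOOD-style task alignment), not the paper's formula."""
    return cls_score ** alpha * sim_score ** (1.0 - alpha)

def top_k_candidates(cls_scores, sim_scores, k=5, alpha=0.5):
    """Rank candidate anchors/points by joint quality and keep the
    top-k per ground-truth box."""
    q = joint_quality(np.asarray(cls_scores, dtype=float),
                      np.asarray(sim_scores, dtype=float), alpha)
    order = np.argsort(-q)          # descending by joint quality
    return order[:k], q[order[:k]]
```

With `alpha=0.5` a candidate that is strong on only one criterion (e.g. high classification score but poor localization) is ranked below a candidate that is balanced on both, which is the imbalance the screening strategy targets.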
| Method | NGLA | MSHA | Params (M) | FLOPs (G) | FPS | AP (%) |
|---|---|---|---|---|---|---|
| FCOS [10] | × | × | 31.85 | 196.96 | 6.0 | 16.4 |
| | ✓ | × | 31.85 | 196.96 | 6.0 | 18.4 |
| | ✓ | ✓ | 33.45 | 198.65 | 5.8 | 19.3 |
| Faster R-CNN [11] | × | × | 41.16 | 206.70 | 6.9 | 13.3 |
| | ✓ | × | 41.16 | 206.70 | 6.9 | 21.3 |
| | ✓ | ✓ | 42.96 | 208.58 | 6.6 | 22.3 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Lin, S.; Zhong, L.; Chen, S.; Wang, D.-H. Tiny Object Detection via Normalized Gaussian Label Assignment and Multi-Scale Hybrid Attention. Remote Sens. 2026, 18, 396. https://doi.org/10.3390/rs18030396