DHQ-DETR: Distributed and High-Quality Object Query for Enhanced Dense Detection in Remote Sensing
Abstract
1. Introduction
- We propose a distribution-based approach to box modeling and incorporate the distribution focal loss, which remains robust on dense multi-scale targets.
- We introduce a high-quality query selection module designed to resolve the misalignment inherent in the initialization of object queries.
- We develop a refined assignment strategy, coupled with an extra detection head, to enhance the stability and convergence speed of the DETR training process.
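The first contribution can be illustrated with a minimal sketch of distribution-based box regression. It assumes the formulation of Generalized Focal Loss (Li et al., listed in the References): each box side is predicted as a discrete distribution over bins 0..n, the decoded offset is the expectation of that distribution, and the distribution focal loss pushes probability mass onto the two bins that bracket the continuous target. The function names, the bin layout, and the `eps` constant are illustrative assumptions, not the implementation used in DHQ-DETR.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_offset(logits):
    """Decode one box side as the expectation over bins 0..n."""
    return sum(i * p for i, p in enumerate(softmax(logits)))


def distribution_focal_loss(logits, target, eps=1e-12):
    """DFL for one box side; `target` is a continuous offset in [0, n].

    Hypothetical helper: `eps` only guards against log(0).
    """
    n = len(logits) - 1
    if n < 1 or not 0.0 <= target <= n:
        raise ValueError("target must lie within the bin range [0, n]")
    probs = softmax(logits)
    left = min(int(math.floor(target)), n - 1)  # lower neighbouring bin
    right = left + 1                            # upper neighbouring bin
    # Linear interpolation weights: the closer bin gets the larger weight.
    w_left, w_right = right - target, target - left
    return -(w_left * math.log(probs[left] + eps)
             + w_right * math.log(probs[right] + eps))


if __name__ == "__main__":
    flat = [0.0] * 5                      # uninformative prediction
    peaked = [0.0, 0.0, 10.0, 0.0, 0.0]   # mass concentrated on bin 2
    print(expected_offset(flat), expected_offset(peaked))
    print(distribution_focal_loss(flat, 2.0),
          distribution_focal_loss(peaked, 2.0))
```

A prediction that concentrates its mass near the target yields a much smaller loss than a flat one, which is the behaviour the loss is meant to encourage.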
2. Related Work
2.1. CNN-Based Detectors
2.2. End-to-End Object Detector
3. Materials and Methods
3.1. The Overall Structure
3.2. Distribution-Based Modeling
3.2.1. Basic Decoder
3.2.2. Distribution-Based Decoder
3.2.3. Distribution Focal Loss
3.3. High-Quality Query Selection Module
3.3.1. Basic Method
3.3.2. HQQS Module
3.4. Short-Circuit Training Decoder
Algorithm 1. Task-aligned assignment algorithm.
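The inputs and steps of Algorithm 1 are not reproduced in this text, so the following is only a sketch of the general idea behind task-aligned assignment as described in TOOD (Feng et al., listed in the References), not the paper's algorithm. Each prediction is scored against a ground-truth box with an alignment metric t = s^alpha * u^beta, where s is the classification score and u is the IoU, and the top-k predictions are kept as positives. The function names, the value of `k`, and the exponents are illustrative assumptions.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def task_aligned_candidates(scores, boxes, gt_box, k=2, alpha=1.0, beta=6.0):
    """Rank predictions for one ground-truth box by s**alpha * IoU**beta.

    Returns the indices of up to k positives (zero-metric predictions are
    dropped) together with the full list of alignment metrics.
    """
    metrics = [(s ** alpha) * (box_iou(b, gt_box) ** beta)
               for s, b in zip(scores, boxes)]
    ranked = sorted(range(len(metrics)), key=lambda i: metrics[i], reverse=True)
    positives = [i for i in ranked[:k] if metrics[i] > 0.0]
    return positives, metrics


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    boxes = [(0.0, 0.0, 10.0, 10.0),    # perfect overlap, moderate score
             (0.0, 0.0, 10.0, 5.0),     # half overlap, high score
             (20.0, 20.0, 30.0, 30.0)]  # no overlap, very high score
    scores = [0.5, 0.9, 0.99]
    print(task_aligned_candidates(scores, boxes, gt))
```

Because the metric multiplies the two terms, a confident prediction with poor localization ranks below a well-localized one, which is what ties the classification and regression tasks together during assignment.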
Model | Epochs | AP | AP50 | AP75 | APS | APM | APL | Params | GFLOPS |
---|---|---|---|---|---|---|---|---|---|
DETR-R50 [16] | 500 | 42.0 | 62.4 | 44.2 | 20.5 | 45.8 | 61.1 | 41M | 86 |
Anchor DETR-R50 [40] | 50 | 42.1 | 63.1 | 44.9 | 22.3 | 46.2 | 60.0 | 39M | – |
Conditional DETR-R50 [41] | 50 | 40.9 | 61.8 | 43.3 | 20.8 | 44.6 | 59.2 | 44M | 90 |
DAB-DETR-R50 [18] | 50 | 42.2 | 63.1 | 44.7 | 21.5 | 45.7 | 60.3 | 44M | 94 |
DN-DETR-R50 [19] | 50 | 44.1 | 64.4 | 46.7 | 22.9 | 48.0 | 63.4 | 44M | 94 |
Align-DETR-R50 [42] | 50 | 46.0 | 64.9 | 49.5 | 25.2 | 50.5 | 64.7 | 42M | 94 |
RT-DETR-R50 [22] | 72 | 53.1 | 71.3 | 57.7 | 34.8 | 58.0 | 70.0 | 42M | 136 |
YOLOv5 L [10] | 300 | 49.0 | 67.3 | – | – | – | – | 46M | 109 |
YOLOv7 L [11] | 300 | 51.2 | 69.7 | 55.5 | 35.2 | 55.9 | 66.7 | 36M | 104 |
YOLOv8 L [12] | 300 | 52.9 | 69.8 | 57.5 | 35.3 | 58.3 | 69.8 | 43M | 165 |
DETR-R101 [16] | 500 | 43.5 | 63.8 | 46.4 | 21.9 | 48.0 | 61.8 | 60M | 152 |
Anchor DETR-R101 [40] | 50 | 43.5 | 64.3 | 46.6 | 23.2 | 47.7 | 61.4 | 58M | – |
Conditional DETR-R101 [41] | 50 | 42.8 | 63.7 | 46.0 | 21.7 | 46.6 | 60.9 | 63M | 156 |
DAB-DETR-R101 [18] | 50 | 43.5 | 63.9 | 46.6 | 23.6 | 47.3 | 61.5 | 63M | 174 |
DN-DETR-R101 [19] | 50 | 45.2 | 65.5 | 48.3 | 24.1 | 49.1 | 65.1 | 63M | 174 |
Align-DETR-R101 [42] | 50 | 46.9 | 65.5 | 50.9 | 25.6 | 51.9 | 66.0 | 61M | 174 |
DETR-DC5-R50 [16] | 500 | 43.3 | 63.1 | 45.9 | 22.5 | 47.3 | 61.1 | 41M | 187 |
Anchor DETR-DC5-R50 [40] | 50 | 44.2 | 64.7 | 47.5 | 24.7 | 48.2 | 60.6 | 39M | 151 |
Conditional DETR-DC5-R50 [41] | 50 | 43.8 | 64.4 | 46.7 | 24.0 | 47.6 | 60.7 | 44M | 195 |
DAB-DETR-DC5-R50 [18] | 50 | 44.5 | 65.1 | 47.7 | 25.3 | 48.2 | 62.3 | 44M | 202 |
DN-DETR-DC5-R50 [19] | 50 | 46.3 | 66.4 | 49.7 | 26.7 | 50.0 | 64.3 | 44M | 202 |
Align-DETR-DC5-R50 [42] | 50 | 48.3 | 66.7 | 52.5 | 29.7 | 52.8 | 65.9 | 42M | 200 |
DETR-DC5-R101 [16] | 500 | 44.9 | 64.7 | 47.7 | 23.7 | 49.5 | 62.3 | 60M | 253 |
Anchor DETR-DC5-R101 [40] | 50 | 45.1 | 65.7 | 48.8 | 25.8 | 49.4 | 61.6 | 58M | – |
Conditional DETR-DC5-R101 [41] | 50 | 45.0 | 65.5 | 48.4 | 26.1 | 48.9 | 62.8 | 63M | 262 |
DAB-DETR-DC5-R101 [18] | 50 | 45.8 | 65.9 | 49.3 | 27.0 | 49.8 | 63.8 | 63M | 282 |
DN-DETR-DC5-R101 [19] | 50 | 47.3 | 67.5 | 50.8 | 28.6 | 51.5 | 65.0 | 63M | 282 |
Align-DETR-DC5-R101 [42] | 50 | 49.3 | 67.4 | 53.7 | 30.6 | 54.3 | 66.4 | 61M | 280 |
DHQ-DETR | 72 | 53.7 | 71.6 | 57.9 | 34.7 | 58.4 | 70.6 | 43M | 154 |
4. Results
Model | Extra Data | AP | AP50 | AP75 |
---|---|---|---|---|
YOLOv5 (2020) [10] | × | 49.0 | 73.0 | 50.9 |
YOLOv8 (2023) [12] | × | 52.9 | 74.5 | 56.1 |
DETR (2020) [16] | × | 46.7 | 72.3 | 49.5 |
DN-DETR (2022) [19] | × | 53.1 | 78.2 | 57.5 |
RT-DETR (2023) [22] | × | 53.0 | 79.0 | 57.8 |
DecoupleNet D2 (2024) [45] | ✓ | – | 78.0 | – |
PP-YOLOE-R-l (2022) [46] | ✓ | – | 80.0 | – |
MAE + MTP (2024) [47] | ✓ | – | 80.7 | – |
LSKNet (2024) [48] | ✓ | – | 81.6 | – |
Strip R-CNN (2025) [49] | ✓ | – | 82.3 | – |
DHQ-DETR | × | 54.3 | 81.5 | 58.9 |
4.1. Main Results
4.2. Ablation Studies
4.2.1. Distribution-Based Location Offset
4.2.2. HQQS Module
4.3. Assignment Strategies
5. Discussion
5.1. Limitations
5.2. Future Research Directions
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
DETR | Detection transformer |
NMS | Non-Maximum Suppression |
DHQ | Distributed and high-quality object query |
HQQS | High-quality query selection module |
FFN | Feedforward network |
DFL | Distribution focal loss |
References
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6154–6162.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Adv. Neural Inf. Process. Syst. 2015, 28, 1137–1149.
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430.
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9627–9636.
- Long, X.; Deng, K.; Wang, G.; Zhang, Y.; Dang, Q.; Gao, Y.; Shen, H.; Ren, J.; Han, S.; Ding, E.; et al. PP-YOLO: An Effective and Efficient Implementation of Object Detector. arXiv 2020, arXiv:2007.12099.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.E.; Fu, C.; Berg, A.C. SSD: Single Shot MultiBox Detector. arXiv 2015, arXiv:1512.02325.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
- Jocher, G. YOLOv5 Release v7.0. 2022. Available online: https://github.com/ultralytics/yolov5/tree/v7.0 (accessed on 1 March 2023).
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-time Object Detectors. arXiv 2022, arXiv:2207.02696.
- Jocher, G. YOLOv8. 2023. Available online: https://github.com/ultralytics/ultralytics/tree/main (accessed on 1 March 2023).
- Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Improving Object Detection With One Line of Code. arXiv 2017, arXiv:1704.04503.
- Zhou, P.; Zhou, C.; Peng, P.; Du, J.; Sun, X.; Guo, X.; Huang, F. NOH-NMS: Improving Pedestrian Detection by Nearby Objects Hallucination. arXiv 2020, arXiv:2007.13376.
- Liu, S.; Huang, D.; Wang, Y. Adaptive NMS: Refining Pedestrian Detection in a Crowd. arXiv 2019, arXiv:1904.03629.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020.
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable Transformers for End-to-End Object Detection. arXiv 2021, arXiv:2010.04159.
- Liu, S.; Li, F.; Zhang, H.; Yang, X.; Qi, X.; Su, H.; Zhu, J.; Zhang, L. DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR. arXiv 2022, arXiv:2201.12329.
- Li, F.; Zhang, H.; Liu, S.; Guo, J.; Ni, L.M.; Zhang, L. DN-DETR: Accelerate DETR training by introducing query denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022.
- Chen, Q.; Chen, X.; Wang, J.; Zhang, S.; Yao, K.; Feng, H.; Han, J.; Ding, E.; Zeng, G.; Wang, J. Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment. arXiv 2023, arXiv:2207.13085.
- Roh, B.; Shin, J.; Shin, W.; Kim, S. Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity. arXiv 2021, arXiv:2111.14330.
- Lv, W.; Zhao, Y.; Xu, S.; Wei, J.; Wang, G.; Cui, C.; Du, Y.; Dang, Q.; Liu, Y. DETRs Beat YOLOs on Real-time Object Detection. arXiv 2023, arXiv:2304.08069.
- Zong, Z.; Song, G.; Liu, Y. DETRs with Collaborative Hybrid Assignments Training. arXiv 2023, arXiv:2211.12860.
- Feng, C.; Zhong, Y.; Gao, Y.; Scott, M.R.; Huang, W. TOOD: Task-aligned One-stage Object Detection. arXiv 2021, arXiv:2108.07755.
- Kirillov, A.; Girshick, R.; He, K.; Dollár, P. Panoptic feature pyramid networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 6399–6408.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8759–8768.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Hosang, J.H.; Benenson, R.; Schiele, B. A convnet for non-maximum suppression. arXiv 2015, arXiv:1511.06437.
- Hosang, J.H.; Benenson, R.; Schiele, B. Learning non-maximum suppression. arXiv 2017, arXiv:1705.02950.
- Solovyev, R.A.; Wang, W. Weighted Boxes Fusion: Ensembling boxes for object detection models. arXiv 2019, arXiv:1910.13302.
- Choi, J.; Chun, D.; Kim, H.; Lee, H. Gaussian YOLOv3: An Accurate and Fast Object Detector Using Localization Uncertainty for Autonomous Driving. arXiv 2019, arXiv:1904.04620.
- He, Y.; Zhang, X.; Savvides, M.; Kitani, K. Softer-NMS: Rethinking Bounding Box Regression for Accurate Object Detection. arXiv 2018, arXiv:1809.08545.
- Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Tang, J.; Yang, J. Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection. arXiv 2020, arXiv:2006.04388.
- Zheng, D.; Dong, W.; Hu, H.; Chen, X.; Wang, Y. Less is More: Focus Attention for Efficient DETR. arXiv 2023, arXiv:2307.12612.
- Zhu, P.; Wen, L.; Du, D.; Bian, X.; Fan, H.; Hu, Q.; Ling, H. Detection and tracking meet drones challenge. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 7380–7399.
- Girshick, R.B.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv 2013, arXiv:1311.2524.
- Tychsen-Smith, L.; Petersson, L. Improving Object Localization with Fitness NMS and Bounded IoU Loss. arXiv 2017, arXiv:1711.00164.
- Rezatofighi, S.H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.D.; Savarese, S. Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression. arXiv 2019, arXiv:1902.09630.
- Wang, Y.; Zhang, X.; Yang, T.; Sun, J. Anchor DETR: Query design for transformer-based detector. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 28 February–1 March 2022.
- Meng, D.; Chen, X.; Fan, Z.; Zeng, G.; Li, H.; Yuan, Y.; Sun, L.; Wang, J. Conditional DETR for fast training convergence. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021.
- Cai, Z.; Liu, S.; Wang, G.; Ge, Z.; Zhang, X.; Huang, D. Align-DETR: Improving DETR with Simple IoU-aware BCE loss. arXiv 2023, arXiv:2304.07527.
- Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3974–3983.
- Lin, T.; Maire, M.; Belongie, S.J.; Bourdev, L.D.; Girshick, R.B.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. arXiv 2014, arXiv:1405.0312.
- Lu, W.; Chen, S.B.; Shu, Q.L.; Tang, J.; Luo, B. DecoupleNet: A Lightweight Backbone Network With Efficient Feature Decoupling for Remote Sensing Visual Tasks. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4414613.
- Wang, X.; Wang, G.; Dang, Q.; Liu, Y.; Hu, X.; Yu, D. PP-YOLOE-R: An Efficient Anchor-Free Rotated Object Detector. arXiv 2022, arXiv:2211.02386.
- Wang, D.; Zhang, J.; Xu, M.; Liu, L.; Wang, D.; Gao, E.; Han, C.; Guo, H.; Du, B.; Tao, D.; et al. MTP: Advancing Remote Sensing Foundation Model via Multitask Pretraining. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 11632–11654.
- Li, Y.; Li, X.; Dai, Y.; Hou, Q.; Liu, L.; Liu, Y.; Cheng, M.M.; Yang, J. LSKNet: A Foundation Lightweight Backbone for Remote Sensing. arXiv 2024, arXiv:2403.11735.
- Yuan, X.; Zheng, Z.; Li, Y.; Liu, X.; Liu, L.; Li, X.; Hou, Q.; Cheng, M.M. Strip R-CNN: Large Strip Convolution for Remote Sensing Object Detection. arXiv 2025, arXiv:2501.03775.
- Chen, K.; Pang, J.; Wang, J.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Shi, J.; Ouyang, W.; et al. Hybrid Task Cascade for Instance Segmentation. arXiv 2019, arXiv:1901.07518.
- Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra R-CNN: Towards Balanced Learning for Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019.
- Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep High-Resolution Representation Learning for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 3349–3364.
- Chen, L.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. arXiv 2016, arXiv:1606.00915.
- Hong, S.; Kang, S.; Cho, D. Patch-Level Augmentation for Object Detection in Aerial Images. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; pp. 127–134.
- Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Xu, J.; et al. MMDetection: Open MMLab Detection Toolbox and Benchmark. arXiv 2019, arXiv:1906.07155.
- Chen, C.; Zhang, Y.; Lv, Q.; Wei, S.; Wang, X.; Sun, X.; Dong, J. RRNet: A Hybrid Detector for Object Detection in Drone-Captured Images. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; pp. 100–108.
- Law, H.; Deng, J. CornerNet: Detecting Objects as Paired Keypoints. arXiv 2018, arXiv:1808.01244.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Li, Y.; Chen, Y.; Wang, N.; Zhang, Z. Scale-Aware Trident Networks for Object Detection. arXiv 2019, arXiv:1901.01892.
- Yang, B.; Xu, W.; Bi, F.; Zhang, Y.; Kang, L.; Yi, L. Multi-scale neighborhood query graph convolutional network for multi-defect location in CFRP laminates. Comput. Ind. 2023, 153, 104015.
- Zhou, X.; Wang, D.; Krähenbühl, P. Objects as Points. arXiv 2019, arXiv:1904.07850.
- Meethal, A.; Granger, E.; Pedersoli, M. Cascaded Zoom-in Detector for High Resolution Aerial Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023.
- Zhang, H.; Wang, Y.; Dayoub, F.; Sünderhauf, N. VarifocalNet: An IoU-aware Dense Object Detector. arXiv 2020, arXiv:2008.13367.
- Zhang, S.; Chi, C.; Yao, Y.; Lei, Z.; Li, S.Z. Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection. arXiv 2019, arXiv:1912.02424.
Model | AP | AP50 | AP75 |
---|---|---|---|
Cascade R-CNN [1] | 16.0 | 31.9 | 15.0 |
HTC-drone [50] | 22.6 | 45.2 | 20.0 |
Libra-HBR [51] | 25.6 | 48.3 | 24.0 |
HRDet+ [52] | 28.4 | 54.5 | 26.1 |
S + D [1,53] | 28.6 | 51.0 | 28.3 |
ACM-OD [3,54] | 29.1 | 54.1 | 27.4 |
DPNet [1,55] | 29.6 | 54.0 | 28.7 |
RRNet [56] | 29.1 | 55.8 | 27.2 |
RetinaNet [7] | 11.8 | 21.3 | 11.6 |
CornerNet [57] | 17.4 | 34.1 | 15.8 |
YOLOv3 [58] | 17.8 | 37.3 | 15.0 |
TridentNet [59] | 22.5 | 43.3 | 20.5 |
CNAnet [60] | 26.4 | 48.0 | 25.5 |
EHR-RetinaNet [7] | 26.5 | 48.3 | 25.4 |
CN-DhVaSa [61] | 27.8 | 50.7 | 26.8 |
DETR (2020) [16] | 23.1 | 39.8 | 25.7 |
DN-DETR (2022) [19] | 31.4 | 51.6 | 26.8 |
RT-DETR (2023) [22] | 31.0 | 50.2 | 26.9 |
CZ Det (2023) [62] | 32.2 | 54.9 | 31.2 |
DHQ-DETR | 32.4 | 55.4 | 30.0 |
Model | Sampling Level | DFL | Epochs | AP | AP50 |
---|---|---|---|---|---|
RT-DETR [22] | × | × | 36 | 48.7 | 67.1 |
DHQ-DETR | 16 | × | 36 | 48.5 | 66.6 |
DHQ-DETR | 32 | × | 36 | 48.8 | 67.0 |
DHQ-DETR | 64 | × | 36 | 48.7 | 67.1 |
DHQ-DETR | 16 | ✓ | 36 | 49.1 | 67.3 |
DHQ-DETR | 32 | ✓ | 36 | 49.4 | 67.5 |
DHQ-DETR | 64 | ✓ | 36 | 49.3 | 67.3 |
Model | IoU-Aware [63] | HQQS | AP | ||
---|---|---|---|---|---|
RT-DETR [22] | × | × | 47.9 | 0.35 | 0.47 |
RT-DETR [22] | ✓ | × | 48.7 | 0.82 | 0.45 |
RT-DETR [22] | ✓ | ✓ | 49.5 | 0.79 | 0.58 |
Assignment Strategies | Epochs | AP | AP50 |
---|---|---|---|
Vanilla | 36 | 48.7 | 67.1 |
RetinaNet [7] | 36 | 49.6 | 67.9 |
Faster R-CNN [3] | 36 | 49.9 | 68.9 |
FCOS [5] | 36 | 50.1 | 68.6 |
ATSS [64] | 36 | 50.4 | 68.9 |
Ours | 36 | 50.8 | 69.0 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, C.; Zhang, J.; Huo, B.; Xue, Y. DHQ-DETR: Distributed and High-Quality Object Query for Enhanced Dense Detection in Remote Sensing. Remote Sens. 2025, 17, 514. https://doi.org/10.3390/rs17030514