Refined Deformable-DETR for Electric Pylon Detection Based on Optical Satellite Image
Abstract
1. Introduction
The main contributions of this work are summarized as follows:
- A Refined Deformable-DETR framework is proposed for electric pylon detection in optical remote sensing imagery, which enhances detection performance by refining object query representations.
- A Spatial Context-aware Query Modulation (SCQM) module is introduced to enable effective interaction between global contextual information and object queries, improving robustness against scale variation and background interference.
- Extensive experiments on the self-constructed EPRD demonstrate that the proposed method improves the overall AP by 1.4 points over the Deformable-DETR baseline (72.7 to 74.1), with a larger gain of 3.7 points in APS for small objects (47.2 to 50.9). Additional evaluations on the public EPD further verify the generalization capability of the proposed approach.
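
To make the second contribution more concrete, the following is a minimal sketch of one way global context could modulate object queries. It assumes a squeeze-and-excitation-style gate: encoder features are globally average-pooled into a context vector, passed through a two-layer bottleneck, and the resulting sigmoid gate rescales every query channel-wise. The gating form, all function names, and the toy dimensions are assumptions for illustration and are not confirmed by the text above; a real implementation would use a tensor library rather than Python lists.

```python
import math
import random


def linear(x, weight, bias):
    """Apply y = W x + b to a single vector x (weight is a list of rows)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def global_average_pool(tokens):
    """Average a list of feature vectors (one per spatial location) into one vector."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def scqm_gate(tokens, w1, b1, w2, b2):
    """Assumed SE-style gate: GAP -> linear -> ReLU -> linear -> sigmoid."""
    context = global_average_pool(tokens)
    hidden = [max(0.0, h) for h in linear(context, w1, b1)]
    return [1.0 / (1.0 + math.exp(-z)) for z in linear(hidden, w2, b2)]


def modulate_queries(queries, gate):
    """Rescale every object query channel-wise by the shared context gate."""
    return [[g * q for g, q in zip(gate, query)] for query in queries]


def random_matrix(rows, cols, rng):
    """Small random matrix used only to populate the toy example."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy sizes (hypothetical): 6 spatial tokens, 4 channels, 3 queries, reduction ratio r = 1.
rng = random.Random(0)
dim, hidden_dim = 4, 4
tokens = random_matrix(6, dim, rng)
queries = random_matrix(3, dim, rng)
w1, b1 = random_matrix(hidden_dim, dim, rng), [0.0] * hidden_dim
w2, b2 = random_matrix(dim, hidden_dim, rng), [0.0] * dim

gate = scqm_gate(tokens, w1, b1, w2, b2)
modulated = modulate_queries(queries, gate)
```

Under this assumed design the gate is computed once per image and shared by all queries, so the extra cost does not grow with the number of queries.
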
2. Related Work
3. Method
3.1. Deformable-DETR
3.2. Refined Deformable-DETR
3.3. Spatial Context-Aware Query Modulation (SCQM)
3.4. Query Modulation Strength Analysis
4. Results and Analysis
4.1. Experimental Environment
4.2. Datasets and Evaluation Metrics
4.2.1. EPRD
4.2.2. EPD
4.2.3. Evaluation Metrics
4.3. Main Results on EPRD
4.4. Ablation Studies
4.4.1. Effect of the Reduction Ratio r
4.4.2. Effect of Pooling Strategy
4.4.3. Effect of Query Modulation Strength
4.5. Generalization Results on the EPD
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest








Training hyperparameters.

| Hyperparameter | Setting |
|---|---|
| Backbone | ResNet-50 |
| Optimizer | AdamW |
| Number of object queries | 300 |
| Epochs | 50 |
| Training batch size | 8 |
| Initial learning rate | 2 × 10⁻⁴ |
| Weight decay | 1 × 10⁻⁴ |
| SCQM reduction ratio r | 1 |
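
The settings in the table above can be collected into a single configuration object. The sketch below is a hypothetical container (the class and field names are not from the paper). The iterations-per-epoch helper assumes the 741 EPRD training images and assumes that the final, smaller batch is kept rather than dropped.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container mirroring the hyperparameter table."""
    backbone: str = "ResNet-50"
    optimizer: str = "AdamW"
    num_queries: int = 300
    epochs: int = 50
    train_batch_size: int = 8
    initial_lr: float = 2e-4
    weight_decay: float = 1e-4
    scqm_reduction_ratio: int = 1

    def iterations_per_epoch(self, num_train_images: int) -> int:
        # Assumes the last partial batch is not dropped.
        return math.ceil(num_train_images / self.train_batch_size)


cfg = TrainConfig()
steps = cfg.iterations_per_epoch(741)  # 741 training images in EPRD
total_steps = steps * cfg.epochs
```

With these values one epoch is 93 iterations and the 50-epoch schedule is 4650 iterations in total.
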

EPRD statistics by split and object size.

| Split | Images | Total objects | Small | Medium | Large |
|---|---|---|---|---|---|
| Train | 741 | 1210 | 5.9% | 78.0% | 16.1% |
| Validation | 242 | 407 | 11.3% | 81.8% | 6.9% |
| Test | 357 | 561 | 6.3% | 82.5% | 11.2% |
| Total | 1340 | 2178 | 7.0% | 79.9% | 13.1% |
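
The size shares above can be turned into approximate object counts, and the size classes themselves can be sketched in code. The example below assumes the common COCO convention (small below 32² pixels of box area, medium below 96², large otherwise); whether EPRD uses exactly these thresholds is an assumption here. The recovered counts are approximate because the table reports rounded percentages.

```python
def size_class(width: float, height: float) -> str:
    """Classify a box by area using COCO-style thresholds (assumed for EPRD)."""
    area = width * height
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


# (total objects, percentage share per size class), copied from the table.
SPLITS = {
    "Train": (1210, {"small": 5.9, "medium": 78.0, "large": 16.1}),
    "Validation": (407, {"small": 11.3, "medium": 81.8, "large": 6.9}),
    "Test": (561, {"small": 6.3, "medium": 82.5, "large": 11.2}),
}

# Approximate object counts per class, recovered from the rounded shares.
approx_counts = {
    split: {name: round(total * pct / 100) for name, pct in shares.items()}
    for split, (total, shares) in SPLITS.items()
}
```

Under these assumptions the training split holds roughly 71 small objects, which illustrates how few samples drive the APS metric.
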

Comparison with DETR-family detectors on EPRD.

| Method | Epochs | AP | AP50 | AP75 | APS | APM | APL |
|---|---|---|---|---|---|---|---|
| DETR | 150 | 73.3 | 96.1 | 87.5 | 37.2 | 73.0 | 86.2 |
| DAB-DETR | 50 | 70.9 | 95.3 | 85.2 | 40.8 | 70.7 | 80.1 |
| Conditional-DETR | 50 | 71.4 | 96.2 | 87.2 | 35.3 | 70.7 | 84.5 |
| Deformable-DETR | 50 | 72.7 | 96.1 | 87.6 | 47.2 | 72.5 | 82.4 |
| Refined Deformable-DETR | 50 | 74.1 | 96.2 | 89.1 | 50.9 | 73.7 | 85.0 |
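
The per-metric gains of the refined model over the Deformable-DETR baseline follow directly from the last two rows of the table. The short script below is only a convenience for reproducing that arithmetic; the variable names are illustrative.

```python
METRICS = ["AP", "AP50", "AP75", "APS", "APM", "APL"]

# Rows copied from the EPRD comparison table.
baseline = dict(zip(METRICS, [72.7, 96.1, 87.6, 47.2, 72.5, 82.4]))
refined = dict(zip(METRICS, [74.1, 96.2, 89.1, 50.9, 73.7, 85.0]))

# Absolute gain in AP points, rounded to the table's precision.
gains = {m: round(refined[m] - baseline[m], 1) for m in METRICS}
largest = max(gains, key=gains.get)
```

The largest gain is on APS (3.7 points), which matches the small-object improvement stated in the contributions.
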

Model complexity and speed relative to the baseline on EPRD.

| Method | Epochs | Params (M) | FLOPs (G) | FPS | AP | APS |
|---|---|---|---|---|---|---|
| Deformable-DETR | 50 | 40.10 | 123.0 | 27.0 | 72.7 | 47.2 |
| Refined Deformable-DETR | 50 | 40.23 | 123.0 | 24.9 | 74.1 | 50.9 |
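
The overhead of the refinement can be expressed in relative terms from the two rows above. The sketch below computes the relative parameter increase, the relative FPS change, and the implied per-image latency difference. Converting FPS to latency as 1000 / FPS assumes single-image inference, which the table does not state.

```python
# Values copied from the complexity table.
baseline = {"params_m": 40.10, "fps": 27.0}
refined = {"params_m": 40.23, "fps": 24.9}

extra_params_pct = 100 * (refined["params_m"] - baseline["params_m"]) / baseline["params_m"]
fps_change_pct = 100 * (refined["fps"] - baseline["fps"]) / baseline["fps"]

# Per-image latency in milliseconds, assuming one image per forward pass.
latency_ms = {"baseline": 1000 / baseline["fps"], "refined": 1000 / refined["fps"]}
extra_latency_ms = latency_ms["refined"] - latency_ms["baseline"]
```

Under that assumption, about 0.32% more parameters and roughly 3.1 ms of extra latency per image accompany the 1.4-point AP gain.
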

Effect of the reduction ratio r.

| r | Hidden dim. | Params (M) | AP | AP50 | AP75 | APS | APM | APL |
|---|---|---|---|---|---|---|---|---|
| 1 (SCQM) | 256 | 40.23 | 74.1 | 96.2 | 89.1 | 50.9 | 73.7 | 85.0 |
| 4 | 64 | 40.13 | 73.9 | 96.4 | 89.4 | 47.4 | 73.8 | 84.2 |
| 8 | 32 | 40.12 | 73.9 | 96.6 | 87.0 | 49.1 | 73.6 | 85.9 |
| 16 | 16 | 40.11 | 73.2 | 95.5 | 87.9 | 42.5 | 72.6 | 86.5 |
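
The hidden dimensions in the table equal 256 / r, and the parameter increments over the 40.10 M baseline can be checked with a simple count. The sketch below assumes the added module is a two-layer bottleneck (256 to 256 / r and back to 256, both layers with biases). This structure is an assumption that happens to agree with the reported numbers; it is not stated in the text above.

```python
def bottleneck_params(d_model: int, r: int) -> int:
    """Parameter count of an assumed two-layer bottleneck with biases."""
    hidden = d_model // r
    down = d_model * hidden + hidden   # first linear layer
    up = hidden * d_model + d_model    # second linear layer
    return down + up


# Added parameters in millions for each reduction ratio in the table.
added_m = {r: round(bottleneck_params(256, r) / 1e6, 2) for r in (1, 4, 8, 16)}
```

The resulting increments of 0.13, 0.03, 0.02, and 0.01 M match the differences between each row and the 40.10 M baseline.
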

Effect of pooling strategy.

| Pooling strategy | Params (M) | AP | AP50 | AP75 | APS | APM | APL |
|---|---|---|---|---|---|---|---|
| GAP (SCQM) | 40.23 | 74.1 | 96.2 | 89.1 | 50.9 | 73.7 | 85.0 |
| GMP | 40.23 | 74.5 | 96.4 | 89.3 | 46.8 | 74.4 | 83.8 |
| GAP+GMP | 40.23 | 73.5 | 95.5 | 87.6 | 45.1 | 73.5 | 83.9 |
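
The three pooling variants differ only in how spatial features are summarized into a single context vector. The sketch below shows global average pooling (GAP), global max pooling (GMP), and a combined variant. Fusing GAP and GMP by element-wise addition is an assumption, chosen because it adds no parameters, which is consistent with the identical 40.23 M figure in every row; the source does not say how the two are combined.

```python
def gap(tokens):
    """Global average pooling: per-channel mean over all spatial tokens."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def gmp(tokens):
    """Global max pooling: per-channel maximum over all spatial tokens."""
    return [max(col) for col in zip(*tokens)]


def gap_plus_gmp(tokens):
    """Assumed fusion: element-wise sum of the GAP and GMP vectors."""
    return [a + m for a, m in zip(gap(tokens), gmp(tokens))]


# Two spatial tokens with two channels each (toy values).
tokens = [[1.0, 4.0], [3.0, 2.0]]
avg_context = gap(tokens)
max_context = gmp(tokens)
sum_context = gap_plus_gmp(tokens)
```

GAP reflects the whole scene, while GMP keeps only the strongest response per channel. This difference may relate to the gap in APS between the two rows, although the table alone does not establish the cause.
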

Effect of query modulation strength α.

| α | AP | AP50 | AP75 | APS | APM | APL |
|---|---|---|---|---|---|---|
| 0 | 72.7 | 96.1 | 87.6 | 47.2 | 72.5 | 82.4 |
| 0.25 | 73.8 | 97.1 | 88.6 | 47.3 | 73.7 | 83.1 |
| 0.5 | 73.7 | 95.8 | 90.1 | 47.9 | 73.6 | 84.2 |
| 0.75 | 73.4 | 95.9 | 90.1 | 49.0 | 73.2 | 84.6 |
| 1 (SCQM) | 74.1 | 96.2 | 89.1 | 50.9 | 73.7 | 85.0 |
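
The α = 0 row reproduces the Deformable-DETR baseline and the α = 1 row corresponds to full SCQM. One form consistent with both endpoints is a linear blend between the original and the modulated queries, sketched below. The blend formula and the function name are assumptions; the paper's actual definition of α may differ.

```python
def blend_queries(queries, modulated, alpha):
    """Assumed blend: q' = q + alpha * (q_mod - q), applied element-wise."""
    return [
        [q + alpha * (m - q) for q, m in zip(q_row, m_row)]
        for q_row, m_row in zip(queries, modulated)
    ]


# Toy example with one query of two channels.
queries = [[1.0, 2.0]]
modulated = [[0.5, 3.0]]

no_modulation = blend_queries(queries, modulated, 0.0)    # baseline behaviour
half_modulation = blend_queries(queries, modulated, 0.5)
full_modulation = blend_queries(queries, modulated, 1.0)  # full SCQM
```

In this reading, intermediate α values move the queries only part of the way toward the context-modulated version.
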

Generalization results on EPD.

| Method | Epochs | AP | AP50 | AP75 | APS | APM | APL |
|---|---|---|---|---|---|---|---|
| DETR | 150 | 43.6 | 81.3 | 39.5 | 36.0 | 64.0 | - |
| DAB-DETR | 50 | 45.5 | 84.1 | 44.3 | 40.3 | 67.9 | - |
| Conditional-DETR | 50 | 46.6 | 87.3 | 44.4 | 42.1 | 68.8 | - |
| Deformable-DETR | 50 | 49.6 | 88.4 | 50.4 | 45.0 | 68.2 | - |
| Refined Deformable-DETR | 50 | 50.8 | 90.2 | 54.1 | 46.9 | 70.1 | - |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Yang, J.; Sun, Y.; Zhao, Y.; Lu, D.; Hao, Y.; Liu, X. Refined Deformable-DETR for Electric Pylon Detection Based on Optical Satellite Image. Sensors 2026, 26, 3467. https://doi.org/10.3390/s26113467

