Object Detection in Remote Sensing Images Based on Adaptive Multi-Scale Feature Fusion Method
Abstract
1. Introduction
- We propose a novel multi-scale feature extraction method, which includes both inter-level and intra-level feature extraction and fusion.
- Within each level of the hierarchy, the extracted features are scale-invariant, so that objects of different scales are expected to obtain a uniform feature representation in the feature map.
- A scale selection mechanism is introduced to discriminate among the multi-scale features extracted within each level. It assigns greater weight to the scales that are most critical for subsequent tasks, so the network focuses on the most relevant features for enhanced performance (a brief code sketch of this mechanism follows this list).
- In the experiments, our proposed method achieves 74.21% mAP on the DOTA-v1.0 dataset and 84.90% mAP on the HRSC2016 dataset, which demonstrates the effectiveness of the proposed improvements.
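For concreteness, the following minimal PyTorch-style sketch shows one plausible form of the adaptive multi-scale-feature-enhanced module outlined above and in Section 3.3: parallel dilated branches for intra-level multi-scale extraction, global pooling to obtain a compact descriptor, an attention vector that is softmax-normalized across scales, and a weighted fusion of the branch features. It is written from the paper's description, not the authors' implementation; the class name, dilation rates (1, 2, 3), and reduction ratio are illustrative assumptions (the rates match the best-performing setting in the ablation study).

```python
# Minimal sketch of an adaptive multi-scale fusion block, written from the
# paper's description (Sections 3.2-3.3); NOT the authors' code. The branch
# dilations (1, 2, 3) and reduction ratio are assumptions.
import torch
import torch.nn as nn


class AdaptiveMultiScaleFusion(nn.Module):
    def __init__(self, channels: int, dilations=(1, 2, 3), reduction: int = 16):
        super().__init__()
        # Intra-level multi-scale extraction: parallel 3x3 convolutions with
        # increasing dilation rates, all preserving the spatial resolution.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        hidden = max(channels // reduction, 8)
        # Global pooling -> compact channel descriptor (Section 3.3.1).
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(nn.Linear(channels, hidden), nn.ReLU(inplace=True))
        # One attention head per branch; softmax over branches acts as the
        # scale selection / attention vector (Section 3.3.2).
        self.heads = nn.ModuleList([nn.Linear(hidden, channels) for _ in dilations])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [branch(x) for branch in self.branches]          # each (N, C, H, W)
        u = torch.stack(feats, dim=1).sum(dim=1)                 # summed multi-scale features
        z = self.fc(self.pool(u).flatten(1))                     # (N, hidden) global context
        logits = torch.stack([h(z) for h in self.heads], dim=1)  # (N, branches, C)
        attn = torch.softmax(logits, dim=1)                      # weights across scales
        # Weighted fusion of the branch features (Section 3.3.3).
        fused = (torch.stack(feats, dim=1) * attn[..., None, None]).sum(dim=1)
        return fused                                             # (N, C, H, W)


if __name__ == "__main__":
    block = AdaptiveMultiScaleFusion(channels=256)
    out = block(torch.randn(2, 256, 64, 64))
    print(out.shape)  # torch.Size([2, 256, 64, 64])
```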
2. Related Work
2.1. Object Detection in Remote Sensing Image
2.2. Extraction and Fusion of Multi-Scale Features
2.3. Attention Mechanism
3. Methods
3.1. The Overall Network Structure
3.2. Multi-Scale Feature Extraction and Fusion
3.3. Adaptive Multi-Scale-Feature-Enhanced Module
3.3.1. Global Pooling
3.3.2. Attention Vector
3.3.3. Feature Fusion
4. Experiments and Results
4.1. Datasets
4.2. Implementation Details
4.3. Results
4.4. Ablation Study
5. Discussion
6. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Sagar, A.S.; Chen, Y.; Xie, Y.; Kim, H.S. MSA R-CNN: A comprehensive approach to remote sensing object detection and scene understanding. Expert Syst. Appl. 2024, 241, 122788. [Google Scholar] [CrossRef]
- Zhang, X.; Zhang, T.; Wang, G.; Zhu, P.; Tang, X.; Jia, X.; Jiao, L. Remote Sensing Object Detection Meets Deep Learning: A metareview of challenges and advances. IEEE Geosci. Remote Sens. Mag. 2023, 11, 8–44. [Google Scholar] [CrossRef]
- Yu, Y.; Da, F. Phase-shifting coder: Predicting accurate orientation in oriented object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 13354–13363. [Google Scholar]
- Jiang, X.; Wu, Y. Remote Sensing Object Detection Based on Convolution and Swin Transformer. IEEE Access 2023, 11, 38643–38656. [Google Scholar] [CrossRef]
- Gao, T.; Niu, Q.; Zhang, J.; Chen, T.; Mei, S.; Jubair, A. Global to local: A scale-aware network for remote sensing object detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5615614. [Google Scholar] [CrossRef]
- Chen, S.; Zhao, J.; Zhou, Y.; Wang, H.; Yao, R.; Zhang, L.; Xue, Y. Info-FPN: An Informative Feature Pyramid Network for object detection in remote sensing images. Expert Syst. Appl. 2023, 214, 119132. [Google Scholar] [CrossRef]
- Yang, X.; Yang, J.; Yan, J.; Zhang, Y.; Zhang, T.; Guo, Z.; Sun, X.; Fu, K. Scrdet: Towards more robust detection for small, cluttered and rotated objects. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8232–8241. [Google Scholar]
- Xie, X.; Cheng, G.; Wang, J.; Yao, X.; Han, J. Oriented R-CNN for Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 3520–3529. [Google Scholar]
- Song, Q.; Yang, F.; Yang, L.; Liu, C.; Hu, M.; Xia, L. Learning point-guided localization for detection in remote sensing images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 14, 1084–1094. [Google Scholar] [CrossRef]
- Tian, Z.; Shen, C.; Chen, H.; He, T. Fcos: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9627–9636. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Wang, X.; Zhang, S.; Yu, Z.; Feng, L.; Zhang, W. Scale-equalizing pyramid convolution for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 13359–13368. [Google Scholar]
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, inception-resnet and the impact of residual connections on learning. Proc. AAAI Conf. Artif. Intell. 2017, 31, 4278–4284. [Google Scholar] [CrossRef]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2818–2826. [Google Scholar]
- Sun, J.; Shen, Z.; Wang, Y.; Bao, H.; Zhou, X. LoFTR: Detector-free local feature matching with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 8922–8931. [Google Scholar]
- Ma, J.; Shao, W.; Ye, H.; Wang, L.; Wang, H.; Zheng, Y.; Xue, X. Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. Multimed. 2018, 20, 3111–3122. [Google Scholar] [CrossRef]
- Ding, J.; Xue, N.; Long, Y.; Xia, G.S.; Lu, Q. Learning roi transformer for oriented object detection in aerial images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 2849–2858. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99. [Google Scholar] [CrossRef] [PubMed]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Xu, Y.; Fu, M.; Wang, Q.; Wang, Y.; Chen, K.; Xia, G.S.; Bai, X. Gliding vertex on the horizontal bounding box for multi-oriented object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 1452–1459. [Google Scholar] [CrossRef] [PubMed]
- Yang, X.; Yan, J. Arbitrary-oriented object detection with circular smooth label. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 677–694. [Google Scholar]
- Yang, X.; Hou, L.; Zhou, Y.; Wang, W.; Yan, J. Dense label encoding for boundary discontinuity free rotation detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 15819–15829. [Google Scholar]
- Li, W.; Chen, Y.; Hu, K.; Zhu, J. Oriented reppoints for aerial object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 1819–1828. [Google Scholar]
- Deng, Z.; Sun, H.; Zhou, S.; Zhao, J.; Lei, L.; Zou, H. Multi-scale object detection in remote sensing imagery with convolutional neural networks. ISPRS J. Photogramm. Remote Sens. 2018, 145, 3–22. [Google Scholar] [CrossRef]
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8759–8768. [Google Scholar]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Ghiasi, G.; Lin, T.Y.; Le, Q.V. Nas-fpn: Learning scalable feature pyramid architecture for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 7036–7045. [Google Scholar]
- Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 10781–10790. [Google Scholar]
- Jaderberg, M.; Simonyan, K.; Zisserman, A.; Kavukcuoglu, K. Spatial transformer networks. In Proceedings of the 29th Annual Conference on Neural Information Processing Systems, NIPS 2015, Montreal, QC, Canada, 7–12 December 2015; pp. 2017–2025. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Li, X.; Wang, W.; Hu, X.; Yang, J. Selective kernel networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 510–519. [Google Scholar]
- Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3974–3983. [Google Scholar]
- Liu, Z.; Yuan, L.; Weng, L.; Yang, Y. A High Resolution Optical Satellite Image Dataset for Ship Recognition and Some New Baselines. In Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods—Volume 1: ICPRAM, INSTICC, SciTePress, Porto, Portugal, 24–26 February 2017; pp. 324–331. [Google Scholar]
- Zhou, Y.; Yang, X.; Zhang, G.; Wang, J.; Liu, Y.; Hou, L.; Jiang, X.; Liu, X.; Yan, J.; Lyu, C. Mmrotate: A rotated object detection benchmark using pytorch. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 7331–7334. [Google Scholar]
- Azimi, S.M.; Vig, E.; Bahmanyar, R.; Körner, M.; Reinartz, P. Towards multi-class object detection in unconstrained remote sensing imagery. In Proceedings of the Asian Conference on Computer Vision, Perth, Australia, 2–6 December 2018; pp. 150–165. [Google Scholar]
- Zhang, G.; Lu, S.; Zhang, W. CAD-Net: A context-aware detection network for objects in remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 2019, 57, 10015–10024. [Google Scholar] [CrossRef]
- Wang, J.; Yang, W.; Li, H.C.; Zhang, H.; Xia, G.S. Learning center probability map for detecting objects in aerial images. IEEE Trans. Geosci. Remote Sens. 2020, 59, 4307–4323. [Google Scholar] [CrossRef]
- Jiang, Y.; Zhu, X.; Wang, X.; Yang, S.; Li, W.; Wang, H.; Fu, P.; Luo, Z. R2CNN: Rotational region CNN for orientation robust scene text detection. arXiv 2017, arXiv:1706.09579. [Google Scholar]
Comparison with other methods on the DOTA-v1.0 dataset (per-category AP and overall mAP; the wide table is split into two parts). Category abbreviations follow the DOTA convention: PL (plane), BD (baseball diamond), BR (bridge), GTF (ground track field), SV (small vehicle), LV (large vehicle), SH (ship), TC (tennis court), BC (basketball court), ST (storage tank), SBF (soccer-ball field), RA (roundabout), HA (harbor), SP (swimming pool), HC (helicopter).

Method | Backbone | PL | BD | BR | GTF | SV | LV | SH | TC
---|---|---|---|---|---|---|---|---|---
FR-O [33] | R-101 | 79.42% | 77.13% | 17.70% | 64.05% | 35.30% | 38.02% | 37.16% | 89.41%
ICN [36] | R-101-FPN | 81.40% | 74.30% | 47.70% | 70.30% | 64.90% | 67.80% | 70.00% | 90.80%
RoI-Trans. [17] | R-101-FPN | 88.64% | 78.52% | 43.44% | 75.92% | 68.81% | 73.68% | 83.59% | 90.74%
CADNet [37] | R-101-FPN | 87.80% | 82.40% | 49.40% | 73.50% | 71.10% | 63.50% | 76.60% | 90.90%
CenterMap [38] | R-50-FPN | 88.88% | 81.24% | 53.15% | 60.65% | 78.62% | 66.55% | 78.10% | 88.83%
SCRDet [7] | R-101-FPN | 89.98% | 80.65% | 52.09% | 68.36% | 68.36% | 60.32% | 72.41% | 90.85%
R-FR [18] | R-50-FPN | 89.25% | 82.45% | 49.95% | 69.36% | 78.17% | 73.60% | 85.92% | 90.90%
Ours | R-50 | 89.26% | 82.26% | 51.33% | 68.49% | 78.88% | 74.14% | 85.59% | 90.88%

Method | Backbone | BC | ST | SBF | RA | HA | SP | HC | mAP
---|---|---|---|---|---|---|---|---|---
FR-O [33] | R-101 | 69.64% | 59.28% | 50.30% | 52.91% | 47.89% | 47.40% | 46.30% | 54.13%
ICN [36] | R-101-FPN | 79.10% | 78.20% | 53.60% | 62.90% | 67.00% | 64.20% | 50.20% | 68.20%
RoI-Trans. [17] | R-101-FPN | 77.27% | 81.46% | 58.39% | 53.54% | 62.83% | 58.93% | 47.67% | 69.56%
CADNet [37] | R-101-FPN | 79.20% | 73.30% | 48.40% | 60.90% | 62.00% | 67.00% | 62.20% | 69.90%
CenterMap [38] | R-50-FPN | 77.80% | 83.61% | 49.36% | 66.19% | 72.10% | 72.36% | 58.70% | 71.74%
SCRDet [7] | R-101-FPN | 87.94% | 86.86% | 65.02% | 66.68% | 66.25% | 68.24% | 65.21% | 72.61%
R-FR [18] | R-50-FPN | 84.04% | 85.48% | 57.58% | 60.98% | 66.25% | 69.23% | 57.74% | 73.40%
Ours | R-50 | 84.94% | 85.73% | 60.78% | 64.76% | 65.72% | 71.32% | 59.08% | 74.21%
Ablation study on the DOTA-v1.0 dataset (per-category AP and overall mAP; split into two parts).

Method | PL | BD | BR | GTF | SV | LV | SH | TC
---|---|---|---|---|---|---|---|---
R-FR+FPN | 89.25% | 82.45% | 49.95% | 69.36% | 78.17% | 73.60% | 85.92% | 90.90%
ADD | 89.25% | 83.24% | 50.48% | 66.61% | 78.79% | 74.95% | 85.30% | 90.90%
Ours | 89.26% | 82.26% | 51.33% | 68.49% | 78.88% | 74.14% | 85.59% | 90.88%

Method | BC | ST | SBF | RA | HA | SP | HC | mAP
---|---|---|---|---|---|---|---|---
R-FR+FPN | 84.04% | 85.48% | 57.58% | 60.98% | 66.25% | 69.23% | 57.74% | 73.40%
ADD | 85.02% | 85.50% | 55.94% | 66.19% | 65.66% | 71.26% | 60.54% | 73.98%
Ours | 84.94% | 85.73% | 60.78% | 64.76% | 65.72% | 71.32% | 59.08% | 74.21%
Comparison on the HRSC2016 dataset.

Method | Backbone | mAP
---|---|---
R-FR [18] | R-50-FPN | 75.70%
ADD | R-50 | 82.90%
Ours | R-50 | 84.90%
Ablation of the dilation-rate configuration on the DOTA-v1.0 dataset (per-category AP and overall mAP; split into two parts).

Method | Dilation Rate | PL | BD | BR | GTF | SV | LV | SH | TC
---|---|---|---|---|---|---|---|---|---
R-FR+FPN | - | 89.25% | 82.45% | 49.95% | 69.36% | 78.17% | 73.60% | 85.92% | 90.90%
Ours | (1, 2) | 89.13% | 83.97% | 50.23% | 67.77% | 78.84% | 75.45% | 85.30% | 90.89%
Ours | (1, 2, 3) | 89.26% | 82.26% | 51.33% | 68.49% | 78.88% | 74.14% | 85.59% | 90.88%
Ours | (1, 2, 3, 4) | 89.39% | 81.00% | 49.95% | 66.19% | 78.73% | 74.85% | 85.33% | 90.90%

Method | Dilation Rate | BC | ST | SBF | RA | HA | SP | HC | mAP
---|---|---|---|---|---|---|---|---|---
R-FR+FPN | - | 84.04% | 85.48% | 57.58% | 60.98% | 66.25% | 69.23% | 57.74% | 73.40%
Ours | (1, 2) | 85.84% | 85.48% | 55.09% | 66.72% | 66.67% | 70.90% | 60.06% | 74.16%
Ours | (1, 2, 3) | 84.94% | 85.73% | 60.78% | 64.76% | 65.72% | 71.32% | 59.08% | 74.21%
Ours | (1, 2, 3, 4) | 84.63% | 85.64% | 55.02% | 66.59% | 65.29% | 71.96% | 59.47% | 73.66%
Ablation of the dilation-rate configuration on the HRSC2016 dataset (a short note on the corresponding receptive fields follows this table).

Method | Dilation Rate | Backbone | mAP
---|---|---|---
R-FR [18] | - | R-50-FPN | 75.70%
Ours | (1, 2) | R-50 | 82.40%
Ours | (1, 2, 3) | R-50 | 84.90%
Ours | (1, 2, 3, 4) | R-50 | 83.30%
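As a reading aid for the dilation-rate ablations above: a 3×3 convolution with dilation rate d covers an effective (2d+1)×(2d+1) window, so the configuration (1, 2, 3) combines 3×3, 5×5, and 7×7 receptive fields within a single feature level. The snippet below is purely illustrative (it is not from the paper) and simply computes these effective kernel sizes.

```python
# Illustrative only (not from the paper): the effective size of a k x k
# convolution kernel with dilation rate d is k + (k - 1) * (d - 1).
def effective_kernel(k: int, d: int) -> int:
    return k + (k - 1) * (d - 1)

for rates in [(1, 2), (1, 2, 3), (1, 2, 3, 4)]:
    print(rates, "->", [effective_kernel(3, d) for d in rates])
    # (1, 2)       -> [3, 5]
    # (1, 2, 3)    -> [3, 5, 7]
    # (1, 2, 3, 4) -> [3, 5, 7, 9]
```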
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).