High Quality Object Detection for Multiresolution Remote Sensing Imagery Using Cascaded Multi-Stage Detectors
Abstract
1. Introduction
2. Related Works
3. Datasets and Evaluation Metrics
4. Causes of Performance Degradation in a Four-Stage Cascade R-CNN
4.1. Cascaded Bounding Box Regression
4.2. Mismatch between RoI Features and the Classifier
5. Proposed Method: Cascade R-CNN++
5.1. New Ensemble Strategy for Classification
5.2. Modified Loss Function for Bounding Box Regression
6. Experimental Results
6.1. Implementation Details
6.2. Stage-Wise Comparison
6.3. Ablation Experiments of the Proposed Modifications
6.4. Comparison with State-of-the-Art Detectors
6.5. Model Transferability on Multiresolution Remote Sensing Images
7. Discussion
8. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Classifier | RoI | AP (%) | AP50 (%) | AP75 (%) | APS (%) | APM (%) | APL (%) |
---|---|---|---|---|---|---|---|
1# | 5# | 43.2 | 63.8 | 47.9 | 24.6 | 40.1 | 49.0 |
1# | 1# | 44.7 | 64.3 | 49.7 | 26.1 | 41.7 | 53.6 |
2# | 5# | 44.6 | 63.6 | 49.6 | 25.8 | 40.8 | 52.6 |
2# | 2# | 45.0 | 63.5 | 50.5 | 25.6 | 41.4 | 53.7 |
3# | 5# | 44.4 | 63.4 | 50.0 | 26.3 | 40.6 | 50.1 |
3# | 3# | 44.9 | 63.4 | 50.2 | 26.7 | 41.1 | 51.2 |
4# | 5# | 44.0 | 62.6 | 49.3 | 25.3 | 40.9 | 50.8 |
4# | 4# | 43.9 | 62.5 | 49.2 | 25.4 | 40.6 | 51.1 |
Method | FPS | AP | AP50 | AP75 | AP90 | APS | APM | APL |
---|---|---|---|---|---|---|---|---|
Three stage Cascade R-CNN++ | 2.54 | 45.0 | 64.2 | 50.5 | 17.7 | 28.1 | 41.0 | 51.9 |
Four stage Cascade R-CNN++ | 2.46 | 45.4 | 64.4 | 51.2 | 17.8 | 27.5 | 41.5 | 51.1 |
Five stage Cascade R-CNN++ | 2.34 | 45.7 | 64.6 | 51.0 | 19.4 | 27.0 | 41.8 | 53.6 |
Ens | Reg | AP | AP50 | AP75 | AP90 | APS | APM | APL |
---|---|---|---|---|---|---|---|---|
| | | 44.1 | 63.0 | 49.5 | 17.3 | 25.9 | 39.8 | 50.3 |
√ | 44.3 | 63.3 | 50.0 | 17.9 | 26.3 | 40.1 | 50.9 | |
√ | 45.3 | 64.5 | 50.6 | 17.6 | 27.0 | 41.3 | 52.8 | |
√ | √ | 45.7 | 64.6 | 51.0 | 19.4 | 27.0 | 41.8 | 53.6 |
Method | AP | AP50 | AP75 | AP90 | APS | APM | APL |
---|---|---|---|---|---|---|---|
Faster R-CNN | 40.1 | 59.3 | 45.0 | 11.7 | 24.1 | 36.1 | 48.9 |
FPN | 40.3 | 62.0 | 50.4 | 15.4 | 24.4 | 37.5 | 47.2 |
RetinaNet | 38.9 | 60.1 | 42.6 | 10.0 | 22.3 | 36.8 | 46.2 |
Cascade R-CNN | 42.5 | 60.4 | 49.1 | 15.7 | 23.6 | 38.4 | 49.6 |
Cascade R-CNN++ | 45.7 | 64.6 | 51.0 | 19.4 | 27.0 | 41.8 | 53.6 |
Cascade R-CNN++ * | 47.1 | 66.2 | 53.2 | 19.5 | 30.8 | 42.7 | 55.9 |
Method | AP | AP50 | AP75 | AP90 | APS | APM | APL |
---|---|---|---|---|---|---|---|
Faster R-CNN | 53.3 | 88.3 | 59.6 | 6.0 | 21.5 | 48.5 | 59.7 |
FPN | 55.6 | 89.6 | 61.4 | 7.1 | 37.0 | 50.0 | 63.3 |
RetinaNet | 49.1 | 88.2 | 50.8 | 4.5 | 18.7 | 44.1 | 55.2 |
Cascade R-CNN | 59.1 | 91.5 | 69.1 | 8.5 | 38.3 | 53.3 | 66.2 |
Cascade R-CNN++ | 60.0 | 91.2 | 70.8 | 9.2 | 43.1 | 54.1 | 67.8 |
Threshold setting | AP | AP50 | AP75 | AP90 | APS | APM | APL |
---|---|---|---|---|---|---|---|
Empirical thresholds | 45.5 | 64.0 | 51.0 | 19.2 | 26.7 | 42.9 | 52.8 |
Auto determined thresholds | 45.7 | 64.6 | 51.0 | 19.4 | 27.0 | 41.8 | 53.6 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wu, B.; Shen, Y.; Guo, S.; Chen, J.; Sun, L.; Li, H.; Ao, Y. High Quality Object Detection for Multiresolution Remote Sensing Imagery Using Cascaded Multi-Stage Detectors. Remote Sens. 2022, 14, 2091. https://doi.org/10.3390/rs14092091