Oriented Object Detection Based on Foreground Feature Enhancement in Remote Sensing Images
Abstract
1. Introduction
- (1) A novel one-stage anchor-free oriented object detection method is presented, built around two modules: the keypoint attention module (KAM) and the prototype contrastive learning module (PCLM). The KAM enhances the features of the foreground part of the image and suppresses those of the background, giving the one-stage detector a sharper distinction between foreground and background. The PCLM is applied to the classification branches to make the foreground features of different categories as orthogonal as possible in the feature space, which improves discrimination between categories and reduces confusion among samples of different categories.
- (2) Furthermore, the equalized modulation focal loss (EMFL) is proposed to improve the learning of positive and negative samples. Compared with the original focal loss [19], the EMFL yields a more dynamic training process, allowing the network to fully learn from positive samples in the early, middle, and late stages of training while also balancing the learning across categories.
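For reference, the original focal loss [19] that the EMFL is compared against can be sketched as follows. This is the standard binary form, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), with the commonly used defaults α = 0.25 and γ = 2; the function name and the scalar interface are illustrative assumptions, and the EMFL modification itself is not reproduced here.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss of Lin et al. [19] for a single prediction.

    p: predicted foreground probability in (0, 1)
    y: ground-truth label, 1 (positive) or 0 (negative)
    """
    # p_t is the probability assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    # alpha_t weights positives by alpha and negatives by 1 - alpha.
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0).
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t)^gamma down-weights well-classified (easy) samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy negative (background scored 0.05) contributes far less
# than a hard positive (foreground scored 0.05).
easy_negative = focal_loss(0.05, 0)
hard_positive = focal_loss(0.05, 1)
print(f"easy negative: {easy_negative:.6f}, hard positive: {hard_positive:.6f}")
```

With γ = 0 the modulating factor disappears and the loss reduces to α-weighted cross-entropy, which is the baseline behaviour that focal-loss variants adjust.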
2. Related Works
2.1. Anchor-Based Oriented Object Detection
2.2. Anchor-Free Oriented Object Detection
2.3. Contrast Learning in Object Detection
3. Proposed Method
3.1. Initial Network Architecture
3.2. Keypoint Attention Module
3.3. Prototype Contrastive Learning Module
3.4. Equalized Modulation Focal Loss
4. Experimental Results and Analysis
4.1. Datasets
- (1) DOTA: DOTA [38] is a large-scale dataset for remote sensing object detection, with images collected from different sensors and platforms. DOTA1.0 contains 2806 aerial images with objects of various scales, orientations, and shapes. Image resolutions range from to , and the dataset provides 188,282 instances in 15 common categories: plane (PL), baseball diamond (BD), bridge (BR), ground track field (GTF), small vehicle (SV), large vehicle (LV), ship (SH), tennis court (TC), basketball court (BC), storage tank (ST), soccer-ball field (SBF), roundabout (RA), harbor (HA), swimming pool (SP), and helicopter (HC). DOTA1.5 adds a container crane (CC) category and instances smaller than 10 pixels to version 1.0, for a total of 402,089 instances. Compared to DOTA1.0, DOTA1.5 is more challenging but also more stable during training. In this paper, we use the training and validation sets for training and the test set for testing. All images were cropped to patches with a gap of 250. The multiscale parameters for both DOTA1.0 and DOTA1.5 are {0.8, 1.0, 1.2}. During training, we also used random flipping and random rotation. During testing, we use multiscale testing as well as flip testing, and the detection results on the cropped patches were merged into the final results using non-maximum suppression (NMS) with an IoU threshold of 0.1.
- (2) HRSC2016: HRSC2016 [39] is a challenging ship detection dataset annotated with bounding boxes, containing 1061 aerial images with sizes ranging from to . The training, validation, and test sets contain 436, 181, and 444 images, respectively. We used the training and validation sets for training and the test set for testing. All images were resized to , without changing the aspect ratio. Random flipping and random rotation were used during training.
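The cropping and merging steps described for DOTA can be sketched as below. This is a minimal illustration that assumes "gap" means the overlap between neighbouring patches, so the stride is the patch size minus the gap. The patch size of 1024 in the usage lines is an illustrative value, not taken from the text, and the helper names `window_origins`, `crop_windows`, and `to_image_coords` are hypothetical.

```python
def window_origins(length, patch, gap):
    """Start offsets of overlapping windows that cover [0, length)."""
    stride = patch - gap
    if stride <= 0:
        raise ValueError("gap must be smaller than the patch size")
    if length <= patch:
        return [0]
    starts = list(range(0, length - patch, stride))
    # The last window is placed flush with the image border.
    starts.append(length - patch)
    return starts


def crop_windows(width, height, patch, gap):
    """(x0, y0, x1, y1) crop boxes tiling an image of the given size."""
    return [
        (x, y, min(x + patch, width), min(y + patch, height))
        for y in window_origins(height, patch, gap)
        for x in window_origins(width, patch, gap)
    ]


def to_image_coords(points, window):
    """Shift patch-local polygon vertices back to full-image coordinates."""
    x0, y0 = window[0], window[1]
    return [(px + x0, py + y0) for px, py in points]


# Illustrative values only: a 2000 x 2000 image, patch 1024, gap 250.
windows = crop_windows(2000, 2000, patch=1024, gap=250)
print(len(windows), windows[:3])
```

Detections from each patch are shifted back with `to_image_coords` before being pooled. Because neighbouring patches overlap, the same object can be detected more than once, which is why NMS is applied to the pooled results.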
4.2. Experimental Setup
4.3. Comparisons with Other Methods
4.4. Ablation Studies
4.5. Visual Analysis
4.6. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
2. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
3. Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9627–9636.
4. Liu, L.; Pan, Z.; Lei, B. Learning a Rotation Invariant Detector with Rotatable Bounding Box. arXiv 2017, arXiv:1711.09405.
5. Ding, J.; Xue, N.; Long, Y.; Xia, G.S.; Lu, Q. Learning RoI Transformer for Oriented Object Detection in Aerial Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2849–2858.
6. Han, J.; Ding, J.; Xue, N.; Xia, G.S. ReDet: A Rotation-Equivariant Detector for Aerial Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 2786–2795.
7. Xie, X.; Cheng, G.; Wang, J.; Yao, X.; Han, J. Oriented R-CNN for Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 3520–3529.
8. Yang, X.; Yan, J.; Feng, Z.; He, T. R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021; Volume 35, pp. 3163–3171.
9. Yang, X.; Yan, J.; Liao, W.; Yang, X.; Tang, J.; He, T. SCRDet++: Detecting Small, Cluttered and Rotated Objects via Instance-Level Feature Denoising and Rotation Loss Smoothing. IEEE Trans. Pattern Anal. Mach. Intell. 2022. Early Access.
10. Han, J.; Ding, J.; Li, J.; Xia, G.S. Align Deep Features for Oriented Object Detection. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–11.
11. Lin, Y.; Feng, P.; Guan, J.; Wang, W.; Chambers, J. IENet: Interacting Embranchment One Stage Anchor Free Detector for Orientation Aerial Object Detection. arXiv 2019, arXiv:1912.00969.
12. Yang, X.; Yan, J.; Ming, Q.; Wang, W.; Zhang, X.; Tian, Q. Rethinking Rotated Object Detection with Gaussian Wasserstein Distance Loss. In Proceedings of the International Conference on Machine Learning, Chongqing, China, 9–11 July 2021; pp. 11830–11841.
13. Yang, X.; Yang, X.; Yang, J.; Ming, Q.; Wang, W.; Tian, Q.; Yan, J. Learning High-Precision Bounding Box for Rotated Object Detection via Kullback-Leibler Divergence. Adv. Neural Inf. Process. Syst. 2021, 34, 18381–18394.
14. Llerena, J.M.; Zeni, L.F.; Kristen, L.N.; Jung, C. Gaussian Bounding Boxes and Probabilistic Intersection-over-Union for Object Detection. arXiv 2021, arXiv:2106.06072.
15. Chen, Z.; Chen, K.; Lin, W.; See, J.; Yu, H.; Ke, Y.; Yang, C. PIoU Loss: Towards Accurate Oriented Object Detection in Complex Environments. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 195–211.
16. Yi, J.; Wu, P.; Liu, B.; Huang, Q.; Qu, H.; Metaxas, D. Oriented Object Detection in Aerial Images with Box Boundary-Aware Vectors. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Virtual, 5–9 January 2021; pp. 2150–2159.
17. Zhao, P.; Qu, Z.; Bu, Y.; Tan, W.; Guan, Q. PolarDet: A Fast, More Precise Detector for Rotated Target in Aerial Images. Int. J. Remote Sens. 2021, 42, 5831–5861.
18. Wang, J.; Yang, L.; Li, F. Predicting Arbitrary-Oriented Objects as Points in Remote Sensing Images. Remote Sens. 2021, 13, 3731.
19. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
20. Zhang, Z.; Guo, W.; Zhu, S.; Yu, W. Toward Arbitrary-Oriented Ship Detection with Rotated Region Proposal and Discrimination Networks. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1745–1749.
21. Yang, X.; Sun, H.; Fu, K.; Yang, J.; Sun, X.; Yan, M.; Guo, Z. Automatic Ship Detection in Remote Sensing Images from Google Earth of Complex Scenes Based on Multiscale Rotation Dense Feature Pyramid Networks. Remote Sens. 2018, 10, 132.
22. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
23. Yang, X.; Sun, H.; Sun, X.; Yan, M.; Guo, Z.; Fu, K. Position Detection and Direction Prediction for Arbitrary-Oriented Ships via Multitask Rotation Region Convolutional Neural Network. IEEE Access 2018, 6, 50839–50849.
24. Liao, M.; Zhu, Z.; Shi, B.; Xia, G.S.; Bai, X. Rotation-Sensitive Regression for Oriented Scene Text Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5909–5918.
25. Xu, Y.; Fu, M.; Wang, Q.; Wang, Y.; Chen, K.; Xia, G.S.; Bai, X. Gliding Vertex on the Horizontal Bounding Box for Multi-Oriented Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 1452–1459.
26. Xiao, Z.; Qian, L.; Shao, W.; Tan, X.; Wang, K. Axis Learning for Orientated Objects Detection in Aerial Images. Remote Sens. 2020, 12, 908.
27. Zhou, L.; Wei, H.; Li, H.; Zhao, W.; Zhang, Y.; Zhang, Y. Arbitrary-Oriented Object Detection in Remote Sensing Images Based on Polar Coordinates. IEEE Access 2020, 8, 223373–223384.
28. Zhou, X.; Wang, D.; Krähenbühl, P. Objects as Points. arXiv 2019, arXiv:1904.07850.
29. Lang, S.; Ventola, F.; Kersting, K. DAFNe: A One-Stage Anchor-Free Approach for Oriented Object Detection. arXiv 2021, arXiv:2109.06148.
30. He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum Contrast for Unsupervised Visual Representation Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9729–9738.
31. Chen, X.; Fan, H.; Girshick, R.; He, K. Improved Baselines with Momentum Contrastive Learning. arXiv 2020, arXiv:2003.04297.
32. Zhu, R.; Zhao, B.; Liu, J.; Sun, Z.; Chen, C.W. Improving Contrastive Learning by Visualizing Feature Transformation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10306–10315.
33. Xie, E.; Ding, J.; Wang, W.; Zhan, X.; Xu, H.; Sun, P.; Luo, P. DetCo: Unsupervised Contrastive Learning for Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 8392–8401.
34. Xie, Z.; Lin, Y.; Zhang, Z.; Cao, Y.; Lin, S.; Hu, H. Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 16684–16693.
35. Wang, X.; Gao, J.; Long, M.; Wang, J. Self-Tuning for Data-Efficient Deep Learning. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 10738–10748.
36. Law, H.; Deng, J. CornerNet: Detecting Objects as Paired Keypoints. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 734–750.
37. Chen, Q.; Wang, Y.; Yang, T.; Zhang, X.; Cheng, J.; Sun, J. You Only Look One-Level Feature. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 13039–13048.
38. Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A Large-Scale Dataset for Object Detection in Aerial Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3974–3983.
39. Liu, Z.; Yuan, L.; Weng, L.; Yang, Y. A High Resolution Optical Satellite Image Dataset for Ship Recognition and Some New Baselines. In Proceedings of the International Conference on Pattern Recognition Applications and Methods, Porto, Portugal, 24–26 February 2017; SciTePress: Setúbal, Portugal, 2017; Volume 2, pp. 324–331.
40. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
41. Van der Maaten, L.; Hinton, G. Visualizing Data Using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
42. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Polosukhin, I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
Type | Methods | Backbone | PL | BD | BR | GTF | SV | LV | SH | TC | BC | ST | SBF | RA | HA | SP | HC | mAP
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Two-stage | RoI-Trans [5] | R101 | 88.6 | 78.5 | 43.4 | 75.9 | 68.8 | 73.6 | 83.5 | 90.7 | 77.2 | 81.4 | 58.3 | 53.5 | 62.8 | 58.9 | 47.6 | 69.56
Two-stage | ReDet [6] | ReR50 | 88.7 | 82.6 | 53.9 | 74.0 | 78.1 | 84.0 | 88.0 | 90.8 | 87.7 | 85.7 | 61.7 | 60.3 | 75.9 | 68.0 | 63.5 | 76.25
Two-stage | Oriented R-CNN [7] | R101 | 88.8 | 83.4 | 55.2 | 76.9 | 74.2 | 82.1 | 87.5 | 90.9 | 85.5 | 85.3 | 65.5 | 66.8 | 74.3 | 70.1 | 57.2 | 76.28
Two-stage | Oriented R-CNN* [7] | R101 | 90.2 | 84.7 | 62.0 | 80.4 | 79.0 | 85.0 | 88.5 | 90.8 | 87.2 | 87.9 | 72.2 | 70.0 | 82.9 | 78.4 | 68.0 | 80.52
One-stage | PIoU [15] | DLA-34 | 80.9 | 69.7 | 24.1 | 60.2 | 38.3 | 64.4 | 64.8 | 90.9 | 77.2 | 70.4 | 46.5 | 37.1 | 57.1 | 61.9 | 64.0 | 60.50
One-stage | IENet [11] | R101 | 88.1 | 71.3 | 34.2 | 51.7 | 63.7 | 65.6 | 71.6 | 90.1 | 71.0 | 73.6 | 37.6 | 41.5 | 48.0 | 60.5 | 49.5 | 61.24
One-stage | ProbIoU [14] | R50 | 89.0 | 72.1 | 46.9 | 62.2 | 75.7 | 74.7 | 86.6 | 89.5 | 78.3 | 83.1 | 55.8 | 64.0 | 65.5 | 65.4 | 46.2 | 70.04
One-stage | R3Det [8] | R101 | 88.7 | 83.0 | 50.9 | 67.2 | 76.2 | 80.3 | 86.7 | 90.7 | 84.6 | 83.2 | 61.9 | 61.3 | 66.9 | 70.6 | 53.9 | 73.79
One-stage | SCRDet++ [9] | R152 | 89.2 | 83.3 | 50.9 | 68.1 | 71.6 | 80.2 | 78.5 | 90.8 | 86.0 | 84.0 | 65.9 | 60.8 | 68.8 | 71.3 | 66.2 | 74.41
One-stage | S2ANet [10] | R50 | 89.1 | 82.8 | 48.3 | 71.1 | 78.1 | 78.3 | 87.2 | 90.8 | 84.9 | 85.6 | 60.3 | 62.6 | 65.2 | 69.1 | 57.9 | 74.12
One-stage | CenterRot [18] | R152 | 89.6 | 81.4 | 51.1 | 68.8 | 78.7 | 81.4 | 87.2 | 90.8 | 80.3 | 84.2 | 56.1 | 64.2 | 75.8 | 74.6 | 56.5 | 74.75
One-stage | PolarDet [17] | R101 | 89.7 | 87.0 | 45.3 | 63.3 | 78.4 | 76.6 | 87.1 | 90.7 | 80.5 | 85.8 | 60.9 | 67.9 | 68.2 | 74.6 | 68.6 | 75.02
One-stage | KLD [13] | R50 | 88.9 | 83.7 | 50.1 | 68.7 | 78.2 | 76.0 | 84.5 | 89.4 | 86.1 | 85.2 | 63.1 | 60.9 | 75.0 | 71.5 | 67.4 | 75.28
One-stage | BBAV [16] | R101 | 88.6 | 84.0 | 52.1 | 69.5 | 78.2 | 80.4 | 88.0 | 80.9 | 87.2 | 86.3 | 56.1 | 65.6 | 67.1 | 72.0 | 63.9 | 75.36
One-stage | DAFNe [29] | R101 | 89.4 | 86.2 | 53.7 | 60.5 | 82.0 | 81.1 | 88.6 | 90.3 | 83.8 | 87.2 | 53.9 | 69.3 | 75.6 | 81.2 | 70.8 | 76.95
One-stage | O2DFFE (Ours) | R50 | 89.7 | 87.0 | 53.8 | 63.2 | 82.0 | 85.4 | 88.5 | 90.6 | 89.1 | 86.7 | 64.8 | 63.4 | 75.7 | 73.7 | 65.2 | 77.30
One-stage | O2DFFE (Ours) | R101 | 89.8 | 87.0 | 54.0 | 63.3 | 81.5 | 85.5 | 88.5 | 90.8 | 89.1 | 86.6 | 65.0 | 63.4 | 76.4 | 74.0 | 67.0 | 77.44
One-stage | O2DFFE* (Ours) | R101 | 90.2 | 83.8 | 57.3 | 80.9 | 79.3 | 84.3 | 88.6 | 90.8 | 85.7 | 86.8 | 70.6 | 70.6 | 79.2 | 76.0 | 74.4 | 79.93
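
As a quick sanity check on how the mAP column relates to the per-category columns, the sketch below recomputes the mean for the O2DFFE (R50) row of the 15-category comparison. It assumes that mAP is the unweighted mean of the per-category AP values; because the table reports each AP rounded to one decimal place, the recomputed mean can differ from the reported value by up to about 0.05.

```python
from statistics import mean

# Per-category AP (%) for O2DFFE (Ours) with the R50 backbone,
# copied from the 15-category comparison table.
ap_r50 = {
    "PL": 89.7, "BD": 87.0, "BR": 53.8, "GTF": 63.2, "SV": 82.0,
    "LV": 85.4, "SH": 88.5, "TC": 90.6, "BC": 89.1, "ST": 86.7,
    "SBF": 64.8, "RA": 63.4, "HA": 75.7, "SP": 73.7, "HC": 65.2,
}
reported_map = 77.30

# Assumption: mAP is the unweighted mean over categories.
recomputed_map = mean(ap_r50.values())
difference = abs(recomputed_map - reported_map)
print(f"recomputed: {recomputed_map:.2f}, reported: {reported_map:.2f}")
```

The same check can be applied to any other row by replacing the dictionary values.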
Type | Methods | Backbone | PL | BD | BR | GTF | SV | LV | SH | TC | BC | ST | SBF | RA | HA | SP | HC | CC | mAP
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Two-stage | ReDet [6] | ReR50 | 79.2 | 82.8 | 51.9 | 71.4 | 52.3 | 75.7 | 80.9 | 90.8 | 75.8 | 68.6 | 49.2 | 72.3 | 73.3 | 70.5 | 63.3 | 11.5 | 66.86
Two-stage | ReDet* [6] | ReR50 | 88.5 | 86.4 | 61.2 | 81.2 | 67.6 | 83.6 | 90.0 | 90.8 | 84.3 | 75.3 | 71.4 | 72.6 | 78.3 | 74.7 | 76.1 | 46.9 | 76.80
One-stage | DAFNe [29] | R101 | 80.6 | 86.3 | 52.1 | 62.8 | 67.0 | 76.7 | 88.9 | 90.8 | 77.2 | 83.4 | 51.7 | 74.0 | 75.9 | 75.7 | 72.4 | 34.8 | 71.99
One-stage | O2DFFE (Ours) | R101 | 83.8 | 80.6 | 49.3 | 71.6 | 61.7 | 76.7 | 85.3 | 86.8 | 80.1 | 80.2 | 59.5 | 9.0 | 69.2 | 72.1 | 65.6 | 13.7 | 69.12
One-stage | O2DFFE (Ours) | R101 | 80.8 | 83.8 | 53.3 | 76.2 | 66.8 | 82.5 | 89.6 | 90.8 | 80.1 | 84.2 | 61.7 | 72.9 | 76.2 | 75.2 | 70.0 | 35.8 | 73.79
Type | Methods | Backbone | mAP
---|---|---|---
Two-stage | RoI-Trans [5] | R101 | 86.20
Two-stage | ReDet [6] | ReR50 | 90.46
Two-stage | Oriented R-CNN [7] | R101 | 90.50
One-stage | IENet [11] | R101 | 75.01
One-stage | ProbIoU [14] | R50 | 87.09
One-stage | BBAV [16] | R101 | 88.60
One-stage | R3Det [8] | R101 | 89.26
One-stage | DAFNe [29] | R50 | 89.76
One-stage | S2ANet [10] | R50 | 90.17
One-stage | CenterRot [18] | R50 | 90.20
One-stage | PolarDet [17] | R101 | 90.46
One-stage | O2DFFE (Ours) | R101 | 89.23
One-stage | O2DFFE* (Ours) | R101 | 90.54
Method | FL | EMFL | KAMh | KAMs | PCLM | mAP | Params | FPS |
---|---|---|---|---|---|---|---|---|
O2DFFE | √ | 69.22 | 611.02M | 15.68 | ||||
√ | 70.20 | 611.02M | 15.57 | |||||
√ | √ | 70.48 | 644.54M | 14.82 | ||||
√ | √ | 71.26 | 644.54M | 14.84 | ||||
√ | √ | √ | 72.10 | 644.54M | 14.73 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lin, P.; Wu, X.; Wang, B. Oriented Object Detection Based on Foreground Feature Enhancement in Remote Sensing Images. Remote Sens. 2022, 14, 6226. https://doi.org/10.3390/rs14246226