CPISNet: Delving into Consistent Proposals of Instance Segmentation Network for High-Resolution Aerial Images
Abstract
1. Introduction
- CPISNet is proposed for multi-category instance segmentation of aerial images;
- The individual effects of the adaptive feature extraction network (AFEN), the elaborated RoI extractor (ERoIE), and the proposal consistent cascaded (PCC) architecture on CPISNet are verified separately, and each module improves the overall network performance;
- CPISNet achieves the highest instance segmentation AP on high-resolution aerial images among the compared state-of-the-art methods.
2. Related Work
2.1. Object Detection
2.2. Instance Segmentation
3. The Proposed Method
3.1. The Adaptive Feature Extraction Network
3.1.1. Backbone Network
3.1.2. Multi-Level Feature Extraction Network
3.2. The RoI Extractors
3.2.1. Single RoI Extractor
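A single RoI extractor pools each proposal from exactly one FPN level, chosen by the proposal's size. The SRoIE row of the ERoIE ablation table reproduces the Mask R-CNN baseline, so the sketch below assumes SRoIE behaves like MMDetection's `SingleRoIExtractor` with its default `finest_scale = 56`. It is a plain-Python illustration of that level-assignment rule; the function name `map_roi_to_level` is hypothetical and not taken from the CPISNet code.

```python
import math


def map_roi_to_level(x1, y1, x2, y2, num_levels=4, finest_scale=56.0):
    """Index (0 = P2 ... num_levels - 1 = P5) of the FPN level an RoI is pooled from.

    Equivalent to the FPN rule k = floor(k0 + log2(sqrt(w * h) / 224)) with k0 = 4,
    shifted so that P2 maps to index 0, then clamped to the available levels.
    """
    scale = math.sqrt(max(x2 - x1, 0.0) * max(y2 - y1, 0.0))
    # The small epsilon keeps log2 finite for degenerate (zero-area) boxes.
    level = math.floor(math.log2(scale / finest_scale + 1e-6))
    return min(max(level, 0), num_levels - 1)


for size in (32, 112, 224, 448, 1000):
    print(size, map_roi_to_level(0, 0, size, size))
```

Small objects, which dominate aerial scenes, all land on the finest level under this hard assignment, and each RoI sees features from only one level.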
3.2.2. Elaborated RoI Extractor
3.3. Proposal Consistent Cascaded Architecture for Instance Segmentation
4. Experiments
4.1. The Datasets
4.1.1. The iSAID
4.1.2. The NWPU VHR-10 Instance Segmentation Dataset
4.2. Evaluation Metrics
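The result tables report six AP columns. Assuming they follow the COCO mask-evaluation convention (AP averaged over IoU thresholds 0.50:0.05:0.95, AP50 and AP75 at single thresholds, and APS/APM/APL restricted by object area), the core of the metric can be sketched as below. This is a simplified single-category illustration, not the official pycocotools evaluator: it ignores crowd regions, area ranges, and the per-image detection limit, and the function names are hypothetical.

```python
import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boolean masks with the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 0.0


def ap_at_iou(preds, gts, thr):
    """AP of one category at one IoU threshold, with 101-point interpolation.

    preds: list of (image_id, score, mask); gts: list of (image_id, mask).
    """
    if not gts:
        return 0.0
    preds = sorted(preds, key=lambda p: p[1], reverse=True)  # highest score first
    matched = [False] * len(gts)
    tp = np.zeros(len(preds))
    for rank, (img, _, pmask) in enumerate(preds):
        best_iou, best_j = thr, -1
        for j, (gimg, gmask) in enumerate(gts):
            if gimg != img or matched[j]:
                continue
            iou = mask_iou(pmask, gmask)
            if iou >= best_iou:  # keep the best unmatched GT at or above thr
                best_iou, best_j = iou, j
        if best_j >= 0:
            matched[best_j] = True
            tp[rank] = 1.0
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / len(gts)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, np.finfo(float).eps)
    # Precision envelope: make precision non-increasing as recall grows.
    for k in range(len(precision) - 2, -1, -1):
        precision[k] = max(precision[k], precision[k + 1])
    recall_points = np.linspace(0.0, 1.0, 101)
    idx = np.searchsorted(recall, recall_points, side="left")
    sampled = [precision[i] if i < len(precision) else 0.0 for i in idx]
    return float(np.mean(sampled))


def coco_style_ap(preds, gts):
    """Mean AP over the IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = np.linspace(0.5, 0.95, 10)
    return float(np.mean([ap_at_iou(preds, gts, t) for t in thresholds]))
```

Under this convention, AP50 and AP75 are `ap_at_iou` at 0.5 and 0.75, while APS, APM, and APL would additionally restrict ground truths and predictions to areas below 32², between 32² and 96², and above 96² pixels, which the sketch omits.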
4.3. The Loss Functions
4.4. Implementation Details
4.5. Ablation Experiments
4.5.1. Effects of CPISNet
4.5.2. Experiments on AFEN
4.5.3. Experiments on ERoIE
- Stage 1: Effects of the Preliminarily Elaborated Module
- Stage 2: Effects of the Post Elaborated Module
- Stage 3: Effects of the Integral ERoIE
4.5.4. Experiments on PCC
- Group 1: Selecting the Depth of the Mask Branch
- Group 2: Effects of PCC
4.6. Instance Segmentation Results on iSAID
4.7. Instance Segmentation Results on the NWPU VHR-10 Dataset
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Backbone Network | Width (Stage 1) | Width (Stage 2) | Width (Stage 3) | Width (Stage 4) | Blocks (Stage 1) | Blocks (Stage 2) | Blocks (Stage 3) | Blocks (Stage 4) | Group Width |
---|---|---|---|---|---|---|---|---|---|
ResNet-50 | 256 | 512 | 1024 | 2048 | 3 | 4 | 6 | 3 | ✘ |
ResNet-101 | 256 | 512 | 1024 | 2048 | 3 | 4 | 23 | 3 | ✘ |
RegNetX-3.2GF | 96 | 192 | 432 | 1008 | 2 | 6 | 15 | 2 | 48 |
RegNetX-4.0GF | 80 | 240 | 560 | 1360 | 2 | 5 | 14 | 2 | 40 |
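The RegNetX rows follow from RegNet's quantized linear width rule (Radosavovic et al., "Designing Network Design Spaces"). The sketch below assumes the published RegNetX-3.2GF and RegNetX-4.0GF settings from the pycls/MMDetection model zoos (initial width `w0`, slope `wa`, multiplier `wm`, total depth, group width `group_w`); these values do not appear in this document, and `regnetx_stages` is a hypothetical helper. With them it reproduces the per-stage widths, block counts, and the last column of the table, which is RegNet's group width.

```python
import math


def regnetx_stages(w0, wa, wm, depth, group_w, q=8):
    """Stage widths and block counts from RegNet design parameters.

    w0: initial width, wa: width slope, wm: width multiplier,
    depth: total number of blocks, group_w: group width, q: width quantum.
    """
    # 1) Linear per-block widths u_j = w0 + wa * j, snapped to w0 * wm**k.
    block_widths = []
    for j in range(depth):
        u = w0 + wa * j
        k = round(math.log(u / w0) / math.log(wm))
        w = w0 * wm ** k
        block_widths.append(int(round(w / q) * q))  # nearest multiple of q
    # 2) Consecutive blocks with the same width form one stage.
    widths, depths = [], []
    for w in block_widths:
        if widths and widths[-1] == w:
            depths[-1] += 1
        else:
            widths.append(w)
            depths.append(1)
    # 3) Make each stage width a multiple of its group width (bottleneck ratio 1).
    adjusted = []
    for w in widths:
        g = min(group_w, w)
        adjusted.append(int(round(w / g) * g))
    return adjusted, depths


# Assumed public RegNetX settings (pycls / MMDetection model zoos).
CONFIGS = {
    "RegNetX-3.2GF": dict(w0=88, wa=26.31, wm=2.25, depth=25, group_w=48),
    "RegNetX-4.0GF": dict(w0=96, wa=38.65, wm=2.43, depth=23, group_w=40),
}

for name, cfg in CONFIGS.items():
    widths, depths = regnetx_stages(**cfg)
    print(name, widths, depths, cfg["group_w"])
```

Unlike ResNet, whose stage widths simply double, RegNetX widths grow by a searched multiplier, which is why the two RegNetX variants have irregular widths and block counts.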
Model | AFEN | ERoIE | PCC | AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|---|---|---|
Mask R-CNN | | | | 36.0 | 58.4 | 38.8 | 22.7 | 43.3 | 49.7 |
Modules | ✓ | | | 36.6 | 59.3 | 39.6 | 23.8 | 43.1 | 51.7 |
Modules | | ✓ | | 36.9 | 59.2 | 39.9 | 23.1 | 44.0 | 52.1 |
Modules | | | ✓ | 37.9 | 60.2 | 41.0 | 24.0 | 45.3 | 53.8 |
CPISNet | ✓ | ✓ | ✓ | 38.6 | 61.5 | 41.4 | 25.7 | 45.6 | 55.0 |
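Reading the ablation table as differences from the Mask R-CNN baseline makes each module's contribution explicit. The snippet below only redoes that arithmetic on the AP column, reading each single-check row as Mask R-CNN plus one module (its AP matches the AFEN-4.0GF, ERoIE, and PCC R-101 rows of the later tables). It shows that the single-module gains (0.6, 0.9, and 1.9 points) sum to 3.4, more than the 2.6-point gain of the full model, so the effects are not simply additive in this table.

```python
# AP column of the CPISNet ablation table on iSAID.
AP = {
    "Mask R-CNN": 36.0,
    "AFEN only": 36.6,
    "ERoIE only": 36.9,
    "PCC only": 37.9,
    "CPISNet (all three)": 38.6,
}
BASELINE = "Mask R-CNN"

# Gain of each configuration over the Mask R-CNN baseline, in AP points.
gains = {name: round(ap - AP[BASELINE], 1) for name, ap in AP.items() if name != BASELINE}
# Sum of the three single-module gains, for comparison with the combined model.
sum_of_single = round(sum(v for k, v in gains.items() if k.endswith("only")), 1)

print(gains)
print("sum of single-module gains:", sum_of_single, "| combined gain:", gains["CPISNet (all three)"])
```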
Feature Extraction Structures | AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|
ResNet-101 + FPN | 36.0 | 58.4 | 38.8 | 22.7 | 43.3 | 49.7 |
HRNetV2-W32 + HRFPN | 36.3 | 58.7 | 39.0 | 24.4 | 42.5 | 51.1 |
RegNetX-3.2GF + FPN | 36.1 | 59.0 | 38.3 | 23.9 | 43.1 | 49.4 |
AFEN-3.2GF | 36.4 | 59.1 | 38.9 | 24.1 | 42.9 | 51.2 |
AFEN-4.0GF | 36.6 | 59.3 | 39.6 | 23.8 | 43.1 | 51.7 |
Elaborated Layer | AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|
| | 36.4 | 58.6 | 39.4 | 23.1 | 43.4 | 51.3 |
| | 36.6 | 58.7 | 39.7 | 23.0 | 43.8 | 51.2 |
| | 36.7 | 58.9 | 39.9 | 22.9 | 44.0 | 51.5 |
| | 36.5 | 58.7 | 39.5 | 22.7 | 44.0 | 52.0 |
| | 36.5 | 58.6 | 39.3 | 23.0 | 43.8 | 50.9 |
| | 36.4 | 58.5 | 39.2 | 22.7 | 43.5 | 51.5 |
| | 36.4 | 58.5 | 39.3 | 22.9 | 43.8 | 51.6 |
| | 36.7 | 58.6 | 39.7 | 23.0 | 43.6 | 51.6 |
| | 36.4 | 58.5 | 39.4 | 22.3 | 43.8 | 50.7 |
Elaborated Layer | AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|
| | 36.3 | 58.6 | 39.3 | 22.9 | 43.4 | 51.1 |
| | 36.4 | 58.9 | 39.1 | 23.2 | 43.5 | 51.2 |
| | 36.7 | 58.8 | 39.8 | 22.9 | 43.7 | 51.8 |
| | 36.6 | 58.7 | 39.6 | 22.8 | 43.7 | 51.5 |
| | 36.7 | 59.1 | 39.7 | 23.1 | 43.8 | 51.7 |
| | 36.5 | 59.0 | 39.2 | 22.8 | 43.7 | 51.1 |
| | 36.6 | 58.9 | 39.4 | 23.1 | 43.9 | 50.7 |
| | 36.6 | 58.7 | 39.4 | 22.6 | 43.9 | 51.6 |
| | 36.9 | 59.2 | 39.9 | 23.1 | 44.0 | 52.1 |
Effects of Integral ERoIE | AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|
SRoIE | 36.0 | 58.4 | 38.8 | 22.7 | 43.3 | 49.7 |
ERoIE without appendages | 36.0 | 58.4 | 38.7 | 22.1 | 43.2 | 50.3 |
+post GCB | 36.3 | 58.8 | 39.2 | 22.4 | 43.7 | 51.4 |
+post DCN | 36.6 | 59.0 | 39.6 | 23.3 | 43.9 | 51.3 |
ERoIE | 36.9 | 59.2 | 39.9 | 23.1 | 44.0 | 52.1 |
Number of Blocks | AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|
2 | 37.3 | 59.8 | 40.1 | 23.8 | 44.4 | 53.4 |
4 | 37.6 | 60.2 | 40.8 | 23.6 | 44.9 | 53.2 |
6 | 37.6 | 60.2 | 40.7 | 23.3 | 45.2 | 53.3 |
8 | 37.9 | 60.2 | 41.0 | 24.0 | 45.3 | 53.8 |
10 | 37.7 | 60.4 | 40.7 | 23.9 | 45.1 | 53.4 |
Cascaded Architectures | Backbone | AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|---|
Cascaded Mask Branch | R-50 | 36.0 | 58.0 | 38.7 | 23.7 | 42.9 | 48.9 |
Cascaded Mask Branch | R-101 | 36.9 | 59.1 | 40.3 | 23.1 | 44.1 | 51.6 |
Mask Information Flow | R-50 | 36.6 | 59.1 | 39.3 | 23.7 | 43.7 | 51.3 |
Mask Information Flow | R-101 | 37.5 | 60.1 | 40.5 | 23.2 | 44.7 | 53.6 |
PCC | R-50 | 37.0 | 58.8 | 40.1 | 24.1 | 44.1 | 52.4 |
PCC | R-101 | 37.9 | 60.2 | 41.0 | 24.0 | 45.3 | 53.8 |
Method | AP | AP50 | AP75 | APS | APM | APL | FPS | Model Size (MB) |
---|---|---|---|---|---|---|---|---|
Mask R-CNN | 36.0 | 58.4 | 38.8 | 22.7 | 43.3 | 49.7 | 13.6 | 504.2 |
MS R-CNN | 36.9 | 58.3 | 40.3 | 22.7 | 44.0 | 51.9 | 12.9 | 634.4 |
CM R-CNN | 36.9 | 59.1 | 40.3 | 23.1 | 44.1 | 51.6 | 11.5 | 768.4 |
HTC | 37.4 | 60.2 | 40.1 | 23.5 | 44.6 | 53.5 | 7.4 | 791.9 |
SCNet | 37.3 | 59.5 | 40.3 | 23.3 | 44.8 | 52.3 | 6.7 | 908.4 |
CPISNet | 38.6 | 61.5 | 41.4 | 25.7 | 45.6 | 55.0 | 6.1 | 663.3 |
CPISNet* | 39.4 | 62.4 | 42.4 | 26.6 | 46.6 | 54.2 | 5.3 | 663.3 |
Method | SV | LV | PL | ST | SH | SP | HB | TC | GTF | SBF | BD | BR | BC | RA | HC |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Mask R-CNN | 40.2 | 35.0 | 54.4 | 77.6 | 40.1 | 29.5 | 21.9 | 36.9 | 11.7 | 4.0 | 30.9 | 34.5 | 46.6 | 49.2 | 27.3 |
MS R-CNN | 40.5 | 35.0 | 55.9 | 77.4 | 41.7 | 30.7 | 23.4 | 37.7 | 11.8 | 5.1 | 31.5 | 37.6 | 47.7 | 49.9 | 28.1 |
CM R-CNN | 41.1 | 35.9 | 54.4 | 77.7 | 43.5 | 30.6 | 22.9 | 38.6 | 12.0 | 4.6 | 31.6 | 35.1 | 48.0 | 50.2 | 27.8 |
HTC | 41.4 | 35.5 | 54.6 | 78.6 | 42.9 | 32.4 | 23.3 | 39.8 | 12.3 | 4.5 | 32.1 | 36.2 | 47.9 | 50.8 | 28.4 |
SCNet | 41.8 | 35.5 | 56.6 | 78.5 | 41.2 | 32.6 | 21.9 | 39.8 | 12.1 | 3.9 | 31.6 | 36.4 | 47.5 | 51.6 | 28.9 |
CPISNet | 42.9 | 37.8 | 54.6 | 78.8 | 41.1 | 36.6 | 23.9 | 41.2 | 13.0 | 7.6 | 33.7 | 35.9 | 48.5 | 53.4 | 30.1 |
CPISNet* | 43.6 | 37.2 | 55.6 | 80.5 | 42.8 | 36.7 | 25.0 | 41.8 | 12.8 | 5.8 | 35.4 | 39.3 | 49.8 | 54.3 | 30.0 |
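Under the COCO convention, the overall AP is the unweighted mean of the per-category AP values, so the per-category iSAID table above can be cross-checked against the AP column of the iSAID table that lists FPS and model size (Mask R-CNN 36.0, CPISNet 38.6). The sketch below does this for two rows, assuming both tables come from the same evaluation run; `mean_category_ap` is a hypothetical helper.

```python
# Per-category mask AP, copied from the iSAID per-category table (15 categories).
PER_CLASS_AP = {
    "Mask R-CNN": [40.2, 35.0, 54.4, 77.6, 40.1, 29.5, 21.9, 36.9,
                   11.7, 4.0, 30.9, 34.5, 46.6, 49.2, 27.3],
    "CPISNet": [42.9, 37.8, 54.6, 78.8, 41.1, 36.6, 23.9, 41.2,
                13.0, 7.6, 33.7, 35.9, 48.5, 53.4, 30.1],
}
# Overall AP reported for the same methods in the iSAID table with FPS and model size.
REPORTED_AP = {"Mask R-CNN": 36.0, "CPISNet": 38.6}


def mean_category_ap(values):
    """COCO-style overall AP: unweighted mean of per-category AP."""
    return sum(values) / len(values)


for method, values in PER_CLASS_AP.items():
    print(f"{method}: mean = {mean_category_ap(values):.2f}, reported = {REPORTED_AP[method]}")
```

The means come out at about 35.99 and 38.61, consistent with the reported values after rounding to one decimal.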
Method | AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|
Mask R-CNN | 36.2 | 58.6 | 38.8 | 38.9 | 44.2 | 12.0 |
MS R-CNN | 37.0 | 57.8 | 40.5 | 39.7 | 46.0 | 14.3 |
CM R-CNN | 37.1 | 59.0 | 40.1 | 39.8 | 46.4 | 12.9 |
HTC | 37.5 | 59.6 | 40.8 | 40.2 | 47.4 | 14.2 |
SCNet | 38.1 | 60.4 | 41.2 | 40.9 | 46.9 | 12.6 |
CPISNet | 39.1 | 62.2 | 42.5 | 41.8 | 49.6 | 17.6 |
CPISNet* | 40.0 | 62.7 | 43.9 | 42.9 | 50.4 | 16.5 |
Method | SV | LV | PL | ST | SH | SP | HB | TC | GTF | SBF | BD | BR | BC | RA | HC |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Mask R-CNN | 13.2 | 29.6 | 42.9 | 34.1 | 46.1 | 37.4 | 29.2 | 75.4 | 27.1 | 36.3 | 51.3 | 17.6 | 49.0 | 43.3 | 9.6 |
MS R-CNN | 14.0 | 30.0 | 43.8 | 33.9 | 46.6 | 37.9 | 30.1 | 76.1 | 29.7 | 35.7 | 54.2 | 17.7 | 49.9 | 44.4 | 11.8 |
CM R-CNN | 14.2 | 30.4 | 43.5 | 34.5 | 47.1 | 38.3 | 30.5 | 76.6 | 28.1 | 37.4 | 53.0 | 18.2 | 50.5 | 44.2 | 10.2 |
HTC | 14.5 | 31.7 | 43.9 | 34.8 | 47.7 | 38.7 | 31.0 | 77.3 | 29.7 | 37.9 | 53.3 | 18.9 | 50.2 | 43.9 | 9.3 |
SCNet | 14.2 | 31.7 | 45.1 | 35.9 | 48.0 | 39.2 | 31.1 | 77.0 | 30.2 | 36.3 | 56.6 | 18.7 | 51.5 | 46.7 | 9.7 |
CPISNet | 14.9 | 32.9 | 46.2 | 35.8 | 49.5 | 40.6 | 32.7 | 77.6 | 31.9 | 39.3 | 54.3 | 19.9 | 52.9 | 45.2 | 13.2 |
CPISNet* | 14.9 | 34.0 | 47.3 | 36.0 | 50.2 | 41.6 | 33.9 | 78.8 | 31.6 | 40.2 | 56.2 | 20.2 | 55.7 | 47.6 | 12.0 |
Method | AP | AP50 | AP75 | APS | APM | APL | FPS | Model Size (MB) |
---|---|---|---|---|---|---|---|---|
Mask R-CNN | 58.3 | 90.9 | 63.5 | 46.5 | 59.6 | 57.5 | 12.2 | 503.9 |
MS R-CNN | 59.5 | 90.8 | 65.2 | 43.9 | 61.1 | 56.8 | 11.1 | 634.1 |
CM R-CNN | 60.4 | 92.6 | 67.5 | 48.1 | 61.0 | 63.0 | 10.6 | 768.3 |
HTC | 61.4 | 92.2 | 67.0 | 49.3 | 62.1 | 60.8 | 7.5 | 791.8 |
SCNet | 62.3 | 91.3 | 69.4 | 49.8 | 62.8 | 68.2 | 7.1 | 908.2 |
CPISNet | 66.1 | 93.7 | 73.1 | 53.3 | 66.2 | 75.5 | 5.2 | 663.1 |
CPISNet* | 67.5 | 94.3 | 74.9 | 55.4 | 67.7 | 74.0 | 5 | 663.1 |
Method | AI | BD | GTF | VC | SH | TC | HB | ST | BC | BR |
---|---|---|---|---|---|---|---|---|---|---|
Mask R-CNN | 28.4 | 81.4 | 84.3 | 50.6 | 52.8 | 59.6 | 60.7 | 69.6 | 69.6 | 25.8 |
MS R-CNN | 29.6 | 81.8 | 85.4 | 52.5 | 52.5 | 61.7 | 59.6 | 69.1 | 72.4 | 30.3 |
CM R-CNN | 26.3 | 82.9 | 86.2 | 52.5 | 56.2 | 64.6 | 62.9 | 70.5 | 72.7 | 29.4 |
HTC | 28.7 | 83.3 | 87.6 | 54.4 | 57.9 | 64.8 | 63.0 | 72.3 | 73.4 | 28.0 |
SCNet | 32.9 | 85.8 | 89.1 | 55.1 | 58.6 | 69.5 | 64.4 | 70.0 | 72.9 | 24.7 |
CPISNet | 41.5 | 86.2 | 91.6 | 57.4 | 57.6 | 73.3 | 67.6 | 74.2 | 75.7 | 35.9 |
CPISNet* | 43.1 | 86.2 | 92.5 | 59.7 | 58.2 | 74.5 | 66.6 | 74.6 | 83.6 | 35.7 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zeng, X.; Wei, S.; Wei, J.; Zhou, Z.; Shi, J.; Zhang, X.; Fan, F. CPISNet: Delving into Consistent Proposals of Instance Segmentation Network for High-Resolution Aerial Images. Remote Sens. 2021, 13, 2788. https://doi.org/10.3390/rs13142788