SegDetector: A Deep Learning Model for Detecting Small and Overlapping Damaged Buildings in Satellite Images
Abstract
1. Introduction
- (1) This study proposes SegDetector, a full-resolution, semantic-segmentation-based target detection model that achieves better detection performance on small targets. Because SegDetector does not rely on NMS, it also detects targets faster.
- (2) SegDetector computes the binary cross-entropy loss separately for each foreground category against the background, which improves the detection of small and overlapping targets (a minimal loss sketch is given after this list).
- (3) SegDetector can perform rotated detection of targets for more accurate localization, without retraining the model and without increasing its complexity (see the box-extraction sketch after this list).
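The per-category loss in contribution (2) can be illustrated with a minimal PyTorch sketch. This is not the authors' code; the function name and the (N, C, H, W) tensor layout are assumptions, and the sketch only shows the idea of treating each foreground category as an independent binary foreground-vs-background problem so that overlapping targets can both be positive at the same pixel.

```python
import torch
import torch.nn.functional as F

def per_category_bce(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy computed independently for each foreground category.

    logits:  (N, C, H, W) raw scores, one channel per foreground category.
    targets: (N, C, H, W) binary masks; a pixel may be positive in several
             channels at once, so overlapping targets are not mutually exclusive.
    """
    # Sigmoid + BCE per channel; no softmax coupling between categories.
    return F.binary_cross_entropy_with_logits(logits, targets, reduction="mean")
```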
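Contributions (1) and (3) can likewise be sketched: once the full-resolution segmentation mask is predicted, boxes are read directly from connected foreground regions, so NMS is unnecessary, and a rotated box can be fitted to the same mask without retraining. The sketch below uses OpenCV (version 4 or later) contour utilities as one possible post-processing step; it is illustrative, not the paper's exact implementation.

```python
import cv2
import numpy as np

def boxes_from_mask(mask: np.ndarray, rotated: bool = False):
    """Extract detection boxes from a binary segmentation mask of shape (H, W).

    Each connected foreground region becomes exactly one detection, so duplicate
    boxes never arise and NMS is not needed. With rotated=True the same mask
    yields rotated rectangles ((cx, cy), (w, h), angle) instead of axis-aligned
    (x, y, w, h) boxes.
    """
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if rotated:
        return [cv2.minAreaRect(c) for c in contours]
    return [cv2.boundingRect(c) for c in contours]
```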
2. Data and Methods
2.1. Data Collection
2.2. SegFormer Model
2.2.1. Data Processing
2.2.2. Encoder
2.2.3. Decoder
2.2.4. Loss
2.3. SegFormer-Based Detection Model SegDetector
2.3.1. Enhance Detection of Overlapping Targets
2.3.2. Implementation of Target Detection Function
3. Experiments and Results Analysis
3.1. Experimental Parameters
3.2. Experimental Evaluation Indexes
3.3. Evaluation of Experimental Effects
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
References
- Koshimura, S.; Moya, L.; Mas, E.; Bai, Y. Tsunami damage detection with remote sensing: A review. Geosciences 2020, 10, 177.
- Sui, H.; Liu, C.; Huang, L. Application of remote sensing technology in earthquake-induced building damage detection. Geomat. Inf. Sci. Wuhan Univ. 2019, 44, 1008–1019.
- Li, H.; Huang, C.; Liu, Q.; Liu, G.; He, Y.; Yu, H. Review on dynamic monitoring of mangrove forestry using remote sensing. J. Geo-Inf. Sci. 2018, 20, 1631–1643.
- Xie, Y.; Feng, D.; Chen, H.; Liu, Z.; Mao, W.; Zhu, J.; Hu, Y.; Baik, S. Damaged Building Detection from Post-Earthquake Remote Sensing Imagery Considering Heterogeneity Characteristics. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–17.
- Li, J.; Huang, X.; Tu, L.; Zhang, T.; Wang, L. A review of building detection from very high resolution optical remote sensing images. GISci. Remote Sens. 2022, 59, 1199–1225.
- Xu, H.; Zhu, Y.; Zhen, T.; Li, Z. Survey of image semantic segmentation methods based on deep neural network. J. Front. Comput. Sci. Technol. 2021, 15, 47–59.
- Ding, J.; Zhang, J.; Zhan, Z.; Tang, X.; Wang, X. A Precision Efficient Method for Collapsed Building Detection in Post-Earthquake UAV Images Based on the Improved NMS Algorithm and Faster R-CNN. Remote Sens. 2022, 14, 663.
- Bai, T.; Pang, Y.; Wang, J.; Han, K.; Luo, J.; Wang, H.; Lin, J.; Wu, J.; Zhang, H. An optimized Faster R-CNN method based on DRNet and RoI align for building detection in remote sensing images. Remote Sens. 2020, 12, 762.
- Liu, Y.; Zhang, Z.; Zhong, R.; Chen, D.; Ke, Y.; Peethambaran, J.; Chen, C.; Sun, L. Multilevel building detection framework in remote sensing images based on convolutional neural networks. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2018, 11, 3688–3700.
- Bai, Y.; Hu, J.; Su, J.; Liu, X.; Liu, H.; He, X.; Meng, S.; Mas, E.; Koshimura, S. Pyramid pooling module-based semi-siamese network: A benchmark model for assessing building damage from xBD satellite imagery datasets. Remote Sens. 2020, 12, 4055.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 1137–1149.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.; Liao, H.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
- Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
- Wang, C.; Bochkovskiy, A.; Liao, H.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696.
- Chen, Q.; Wang, Y.; Yang, T.; Zhang, X.; Cheng, J.; Sun, J. You only look one-level feature. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13039–13048.
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.; Berg, A.C. SSD: Single shot multibox detector. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
- Li, Z.; Zhou, F. FSSD: Feature fusion single shot multibox detector. arXiv 2017, arXiv:1712.00960.
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9627–9636.
- Lin, T.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755.
- Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
- Salscheider, N.O. FeatureNMS: Non-maximum suppression by learning feature embeddings. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 7848–7854.
- Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Soft-NMS—Improving object detection with one line of code. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5561–5569.
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090.
- Huang, H.; Lin, L.; Tong, R.; Hu, H.; Zhang, Q.; Iwamoto, Y.; Han, X.; Chen, Y.; Wu, J. UNet 3+: A full-scale connected UNet for medical image segmentation. In Proceedings of the ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 1055–1059.
- Gupta, R.; Hosfelt, R.; Sajeev, S.; Patel, N.; Goodman, B.; Doshi, J.; Heim, E.; Choset, H.; Gaston, M. xBD: A dataset for assessing building damage from satellite imagery. arXiv 2019, arXiv:1911.09296v1.
- Gupta, R.; Goodman, B.; Patel, N.; Hosfelt, R.; Sajeev, S.; Heim, E.; Doshi, J.; Lucas, K.; Choset, H.; Gaston, M. Creating xBD: A dataset for assessing building damage from satellite imagery. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–20 June 2019; pp. 10–17.
- Tilon, S.; Nex, F.; Kerle, N.; Vosselman, G. Post-Disaster Building Damage Detection from Earth Observation Imagery Using Unsupervised and Transferable Anomaly Detecting Generative Adversarial Networks. Remote Sens. 2020, 12, 4193.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–15 October 2021; pp. 10012–10022.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
- Wang, W.; Xie, E.; Li, X.; Fan, D.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–15 October 2021; pp. 568–578.
- Chen, Z.; Chang, R.; Guo, H.; Pei, X.; Zhao, W.; Yu, Z.; Zou, L. Prediction of Potential Geothermal Disaster Areas along the Yunnan–Tibet Railway Project. Remote Sens. 2022, 14, 3036.
- Chen, Z.; Chang, R.; Zhao, W.; Li, S.; Guo, H.; Xiao, K.; Wu, L.; Hou, D.; Zou, L. Quantitative Prediction and Evaluation of Geothermal Resource Areas in the Southwest Section of the Mid-Spine Belt of Beautiful China. Int. J. Digit. Earth 2022, 15, 748–769.
- Dong, S.; Chen, Z. A multi-level feature fusion network for remote sensing image segmentation. Sensors 2021, 21, 1267.
- Jian, M.; Wang, J.; Yu, H.; Wang, G.; Meng, X.; Yang, L.; Dong, J.; Yin, Y. Visual saliency detection by integrating spatial position prior of object with background cues. Expert Syst. Appl. 2021, 168, 114219.
- Lin, T.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint triplets for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6569–6578.
| Hyperparameter | Value |
|---|---|
| Input size | 640 × 640 |
| Activation | ReLU |
| Optimizer | AdamW |
| Loss function | Binary cross entropy |
| Dropout | 0.2 |
| IOU threshold | 0.5 |
| Score threshold | 0.5 |
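For concreteness, the settings in the table above might be expressed in a PyTorch training script roughly as follows. This is a sketch under assumptions: the placeholder network and the learning rate are not given in the table and are purely illustrative.

```python
import torch
import torch.nn as nn

INPUT_SIZE = (640, 640)   # input size from the table
IOU_THRESHOLD = 0.5       # IOU threshold for matching predictions to ground truth
SCORE_THRESHOLD = 0.5     # minimum score for a prediction to count as a detection

# Placeholder network standing in for the real segmentation model;
# the activation (ReLU) and dropout (0.2) follow the table above.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Dropout2d(0.2),
    nn.Conv2d(16, 1, 1),
)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # lr not in the table; assumed
criterion = nn.BCEWithLogitsLoss()                          # binary cross-entropy loss
```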
| Model | MIOU | MPA | Precision | Recall | F1 |
|---|---|---|---|---|---|
| DDRNet | 0.497 | 0.563 | 0.687 | 0.574 | 0.625 |
| SeMask | 0.511 | 0.591 | 0.719 | 0.586 | 0.646 |
| SegFormer | 0.502 | 0.582 | 0.700 | 0.582 | 0.636 |
| Improved-SegFormer | 0.514 | 0.586 | 0.724 | 0.586 | 0.648 |
| Model | IOU (class 1) | IOU (class 2) | IOU (class 3) | MIOU | MPA | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|---|
| DDRNet | 0.961 | 0.581 | 0.439 | 0.660 | 0.735 | 0.811 | 0.734 | 0.770 |
| SeMask | 0.962 | 0.587 | 0.446 | 0.665 | 0.756 | 0.819 | 0.743 | 0.779 |
| SegFormer | 0.961 | 0.583 | 0.443 | 0.662 | 0.742 | 0.819 | 0.737 | 0.776 |
| Improved-SegFormer | 0.962 | 0.587 | 0.451 | 0.666 | 0.753 | 0.821 | 0.742 | 0.781 |
| Model | IOU (class 1) | IOU (class 2) | MIOU | MPA | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| PPM-Net [10] | 0.918 | 0.473 | 0.696 | - | - | - | 0.777 |
| DDRNet | 0.944 | 0.573 | 0.759 | 0.835 | 0.866 | 0.841 | 0.853 |
| SeMask | 0.967 | 0.588 | 0.778 | 0.847 | 0.880 | 0.837 | 0.858 |
| SegFormer | 0.942 | 0.581 | 0.762 | 0.832 | 0.871 | 0.839 | 0.854 |
| Improved-SegFormer | 0.960 | 0.591 | 0.773 | 0.844 | 0.878 | 0.844 | 0.861 |
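The summary columns in the two tables above relate to the per-class columns in the usual way: MIOU is the arithmetic mean of the per-class IOUs, and F1 is the harmonic mean of precision and recall. A minimal check (not from the paper) against the SeMask row of the three-class table:

```python
def miou(per_class_ious):
    """Mean intersection-over-union: arithmetic mean of the per-class IOUs."""
    return sum(per_class_ious) / len(per_class_ious)

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(f"{miou([0.962, 0.587, 0.446]):.3f}")  # 0.665, matching the SeMask MIOU
print(f"{f1(0.819, 0.743):.3f}")             # 0.779, matching the SeMask F1
```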
| Model | Recall | Precision | F1 | FPS | Params (M) |
|---|---|---|---|---|---|
| YOLOv3 | 0.15 | 0.60 | 0.24 | 65 | 61.6 |
| YOLOv4 | 0.53 | 0.43 | 0.47 | 62 | 64.2 |
| CenterNet2 | 0.51 | 0.56 | 0.53 | 89 | 19.9 |
| Faster R-CNN | 0.67 | 0.48 | 0.56 | 28 | 137.2 |
| YOLOX | 0.62 | 0.51 | 0.56 | 116 | 9.0 |
| SegDetector | 0.81 | 0.63 | 0.71 | 102 | 3.8 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).