Article

An Improved Swin Transformer-Based Model for Remote Sensing Object Detection and Instance Segmentation

School of Physics and Optoelectronic Engineering, Xidian University, 2 South TaiBai Road, Xi’an 710071, China
* Author to whom correspondence should be addressed.
Academic Editors: Fahimeh Farahnakian, Jukka Heikkonen and Pouya Jafarzadeh
Remote Sens. 2021, 13(23), 4779; https://doi.org/10.3390/rs13234779
Received: 19 October 2021 / Revised: 15 November 2021 / Accepted: 22 November 2021 / Published: 25 November 2021
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing)
Remote sensing image object detection and instance segmentation are widely studied research fields. Convolutional neural networks (CNNs) have shown shortcomings in the object detection of remote sensing images. In recent years, transformer-based models have received increasing attention and have achieved good results. However, transformers still suffer from poor small object detection and unsatisfactory segmentation of edge details. To address these problems, we improved the Swin transformer by combining the advantages of transformers and CNNs, and designed a local perception Swin transformer (LPSW) backbone that enhances the local perception of the network and improves the detection accuracy of small-scale objects. We also designed a spatial attention interleaved execution cascade (SAIEC) network framework, which improves the segmentation accuracy of the network. Because remote sensing mask datasets are scarce, we created the MRS-1800 remote sensing mask dataset. Finally, we combined the proposed backbone with the new network framework and conducted experiments on the MRS-1800 dataset. Compared with the Swin transformer, the proposed model improved the mask AP by 1.7%, mask APS by 3.6%, AP by 1.1% and APS by 4.6%, demonstrating its effectiveness and feasibility.
Keywords: instance segmentation; object detection; Swin transformer; remote sensing image; cascade mask R-CNN
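The abstract reports gains in AP and APS for both boxes and masks. Assuming the paper follows the common COCO evaluation convention (an assumption, not confirmed by this page), APS is the AP computed only over small objects, meaning objects with an area below 32 × 32 pixels, while medium objects fall below 96 × 96 pixels and everything larger counts as large. The minimal sketch below illustrates that size bucketing together with the intersection-over-union (IoU) of two binary masks, the overlap measure that mask AP thresholds on. The helper names `size_bucket` and `mask_iou` are hypothetical and are not taken from the paper or from any evaluation library.

```python
# Hypothetical helpers illustrating COCO-style evaluation conventions.
# Assumption: the paper's APS/APM/APL use the COCO area thresholds below.

SMALL_MAX = 32 ** 2   # 1024 px: objects below this area count as "small" (APS)
MEDIUM_MAX = 96 ** 2  # 9216 px: objects below this area count as "medium" (APM)


def size_bucket(area: float) -> str:
    """Assign an object to a COCO-style size bucket by its pixel area."""
    if area < SMALL_MAX:
        return "small"
    if area < MEDIUM_MAX:
        return "medium"
    return "large"


def mask_iou(mask_a, mask_b) -> float:
    """IoU of two equal-sized binary masks given as nested lists of 0/1."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0  # pixel covered by both masks
            union += 1 if (a or b) else 0   # pixel covered by either mask
    return inter / union if union else 0.0


if __name__ == "__main__":
    print(size_bucket(20 * 20))    # small
    print(size_bucket(50 * 50))    # medium
    print(size_bucket(100 * 100))  # large
    print(mask_iou([[1, 1], [0, 0]], [[1, 0], [0, 0]]))  # 0.5
```

Under this convention, a 20 × 20 pixel vehicle in an aerial image contributes only to APS, which is why a backbone change aimed at local perception can move APS much more than overall AP.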

MDPI and ACS Style

Xu, X.; Feng, Z.; Cao, C.; Li, M.; Wu, J.; Wu, Z.; Shang, Y.; Ye, S. An Improved Swin Transformer-Based Model for Remote Sensing Object Detection and Instance Segmentation. Remote Sens. 2021, 13, 4779. https://doi.org/10.3390/rs13234779

AMA Style

Xu X, Feng Z, Cao C, Li M, Wu J, Wu Z, Shang Y, Ye S. An Improved Swin Transformer-Based Model for Remote Sensing Object Detection and Instance Segmentation. Remote Sensing. 2021; 13(23):4779. https://doi.org/10.3390/rs13234779

Chicago/Turabian Style

Xu, Xiangkai, Zhejun Feng, Changqing Cao, Mengyuan Li, Jin Wu, Zengyan Wu, Yajie Shang, and Shubing Ye. 2021. "An Improved Swin Transformer-Based Model for Remote Sensing Object Detection and Instance Segmentation" Remote Sensing 13, no. 23: 4779. https://doi.org/10.3390/rs13234779
