Multi-Category Segmentation of Sentinel-2 Images Based on the Swin UNet Method
Abstract
1. Introduction
- (1) The Swin UNet model is improved and applied to 10-category segmentation of Sentinel-2 images, and is compared with traditional classification methods and CNN-based segmentation methods.
- (2) The FROM-GLC10 dataset is optimized and used as sample data, and the performance of the approach and of the Transformer is analyzed.
- (3) The segmentation results of different spectral-band combinations from Sentinel-2 MSI (Multi-Spectral Imager) images are systematically compared, and the optimal band-combination scheme is obtained.
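The core operation that distinguishes the Swin Transformer backbone from a CNN is self-attention computed inside local, periodically shifted windows rather than over the whole image. As an illustrative sketch only (NumPy, not the authors' implementation; names and shapes are hypothetical), the window-partition and cyclic-shift steps look like this:

```python
import numpy as np

def window_partition(x: np.ndarray, win: int) -> np.ndarray:
    """Split a (H, W, C) feature map into non-overlapping (win, win, C) windows;
    Swin computes self-attention independently inside each window."""
    h, w, c = x.shape
    assert h % win == 0 and w % win == 0
    x = x.reshape(h // win, win, w // win, win, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win, win, c)

def cyclic_shift(x: np.ndarray, win: int) -> np.ndarray:
    """Cyclic shift applied between successive blocks so that window
    boundaries move, letting information cross windows across layers."""
    return np.roll(x, shift=(-(win // 2), -(win // 2)), axis=(0, 1))

# Toy 4x4 single-channel feature map -> four 2x2 attention windows.
feat = np.arange(16, dtype=np.float32).reshape(4, 4, 1)
wins = window_partition(feat, 2)          # shape (4, 2, 2, 1)
shifted = cyclic_shift(feat, 2)           # windows of the next block
```

In the full model, attention weights are then computed per window over `win * win` tokens, which keeps the cost linear in image size instead of quadratic.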
2. Data and Methods
2.1. Study Areas and Data
2.2. Technical Route and Swin UNet
2.3. Network Enhancement Methods
2.4. Evaluation Metrics
2.5. Experimental Environment
3. Results and Validations
3.1. Results and Comparison with CNN-Based Networks
3.2. Results of Large-Scale Mapping
3.3. Different Image Band Combinations
4. Discussion
4.1. Impact Factors of Accuracy
4.2. Transferability of the Segmentation Model
4.3. Limitations and Outlooks
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
| Band Number | Band Name | Central Wavelength (μm) | Resolution (m) |
|---|---|---|---|
| B2 | Blue (B) | 0.49 | 10 |
| B3 | Green (G) | 0.56 | 10 |
| B4 | Red (R) | 0.66 | 10 |
| B8 | Near Infrared (N) | 0.84 | 10 |
| Item | Backbone | MIoU (%) | Accuracy (%) | F1-Score (%) |
|---|---|---|---|---|
| UNet | VGG | 70.60 | 85.05 | 69.84 |
| UNet | ResNet50 | 67.73 | 82.73 | 66.93 |
| DeeplabV3+ | MobileNet | 70.47 | 86.58 | 72.91 |
| DeeplabV3+ | Xception | 69.20 | 82.44 | 71.63 |
| Swin UNet | Swin Transformer | 72.06 | 89.77 | 76.46 |
| Class | IoU (%) | Recall (%) | Precision (%) |
|---|---|---|---|
| Crop | 82.01 | 93.84 | 86.68 |
| Forest | 82.06 | 93.07 | 87.40 |
| Grassland | 34.22 | 38.41 | 62.19 |
| Shrubland | 64.23 | 70.30 | 77.11 |
| Wetland | 36.36 | 43.15 | 60.12 |
| Water | 86.22 | 91.46 | 93.78 |
| Tundra | 33.89 | 41.66 | 70.48 |
| Impervious land | 59.62 | 72.24 | 77.34 |
| Bare land | 38.50 | 42.10 | 73.17 |
| Snow/Ice | 29.90 | 33.64 | 68.33 |
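For a single class, IoU, precision, and recall are not independent: with true positives TP, false positives FP, and false negatives FN, IoU = TP / (TP + FP + FN), which rearranges to IoU = P·R / (P + R − P·R). As a sanity-check sketch (not the authors' evaluation code), the identity reproduces the per-class table above:

```python
def iou_from_pr(precision: float, recall: float) -> float:
    """Single-class IoU from precision and recall:
    IoU = TP/(TP+FP+FN) = P*R / (P + R - P*R)."""
    pr = precision * recall
    return pr / (precision + recall - pr)

# Crop row: Recall 93.84 %, Precision 86.68 % -> IoU ~ 82.01 %
crop_iou = 100 * iou_from_pr(0.8668, 0.9384)
# Forest row: Recall 93.07 %, Precision 87.40 % -> IoU ~ 82.06 %
forest_iou = 100 * iou_from_pr(0.8740, 0.9307)
```

This is a convenient cross-check when only rounded precision/recall values are reported.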
| Band Combination | MIoU (%) | Accuracy (%) | F1-Score (%) |
|---|---|---|---|
| RGB | 71.30 | 86.31 | 72.09 |
| NGB | 72.06 | 89.77 | 76.46 |
| NRGB | 69.92 | 82.86 | 70.68 |
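The compared inputs differ only in which 10 m bands are stacked into the network's input channels (B2/B3/B4/B8 in the band table above). A minimal sketch of building such composites, assuming hypothetical in-memory band arrays rather than the authors' actual preprocessing pipeline:

```python
import numpy as np

# Hypothetical 10 m Sentinel-2 band arrays (reflectance), keyed by the
# short names from the band table: B=blue, G=green, R=red, N=near infrared.
rng = np.random.default_rng(0)
bands = {k: rng.random((256, 256), dtype=np.float32) for k in "BGRN"}

def compose(order: str, bands: dict) -> np.ndarray:
    """Stack single-band arrays into a multi-channel input, e.g. 'NGB'."""
    return np.stack([bands[c] for c in order], axis=-1)

ngb = compose("NGB", bands)    # 3-channel input, best MIoU in the table above
nrgb = compose("NRGB", bands)  # 4-channel input
```

The same `compose` call covers all three schemes, so swapping combinations only changes the model's input-channel count.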
| Class | Crop | Forest | Grassland | Water | Impervious land | Bare land | Snow/Ice | Wetland | UA (%) |
|---|---|---|---|---|---|---|---|---|---|
| Crop | 656 | 18 | 0 | 0 | 0 | 0 | 0 | 0 | 97.33 |
| Forest | 34 | 486 | 0 | 0 | 0 | 0 | 0 | 0 | 93.46 |
| Grassland | 0 | 132 | 199 | 2 | 0 | 0 | 1 | 0 | 59.58 |
| Water | 32 | 1 | 0 | 322 | 11 | 0 | 0 | 19 | 83.64 |
| Impervious land | 25 | 24 | 0 | 0 | 900 | 0 | 0 | 20 | 92.88 |
| Bare land | 14 | 20 | 47 | 0 | 67 | 715 | 1 | 37 | 79.36 |
| Snow/Ice | 0 | 1 | 2 | 9 | 0 | 0 | 606 | 8 | 96.81 |
| Wetland | 83 | 16 | 16 | 22 | 0 | 22 | 60 | 270 | 55.21 |
| PA (%) | 77.73 | 69.63 | 75.38 | 90.70 | 92.02 | 97.01 | 90.72 | 76.27 | |

OA (%): 84.81; Kappa coefficient: 0.82
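The summary statistics follow directly from the confusion matrix above (rows = classified, columns = reference): overall accuracy is the diagonal sum over the total, and Cohen's kappa discounts chance agreement. A short sketch reproducing both reported values:

```python
import numpy as np

# Confusion matrix from the table above; class order:
# Crop, Forest, Grassland, Water, Impervious land, Bare land, Snow/Ice, Wetland.
cm = np.array([
    [656,  18,   0,   0,   0,   0,   0,   0],
    [ 34, 486,   0,   0,   0,   0,   0,   0],
    [  0, 132, 199,   2,   0,   0,   1,   0],
    [ 32,   1,   0, 322,  11,   0,   0,  19],
    [ 25,  24,   0,   0, 900,   0,   0,  20],
    [ 14,  20,  47,   0,  67, 715,   1,  37],
    [  0,   1,   2,   9,   0,   0, 606,   8],
    [ 83,  16,  16,  22,   0,  22,  60, 270],
])

n = cm.sum()
oa = np.trace(cm) / n                                # overall accuracy
pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2  # expected chance agreement
kappa = (oa - pe) / (1 - pe)                         # Cohen's kappa
ua = 100 * np.diag(cm) / cm.sum(axis=1)              # user's accuracy per row
pa = 100 * np.diag(cm) / cm.sum(axis=0)              # producer's accuracy per column
```

Running this yields OA ≈ 84.81 % and kappa ≈ 0.82, matching the reported values.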
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yao, J.; Jin, S. Multi-Category Segmentation of Sentinel-2 Images Based on the Swin UNet Method. Remote Sens. 2022, 14, 3382. https://doi.org/10.3390/rs14143382
Yao J, Jin S. Multi-Category Segmentation of Sentinel-2 Images Based on the Swin UNet Method. Remote Sensing. 2022; 14(14):3382. https://doi.org/10.3390/rs14143382
Chicago/Turabian Style: Yao, Junyuan, and Shuanggen Jin. 2022. "Multi-Category Segmentation of Sentinel-2 Images Based on the Swin UNet Method." Remote Sensing 14, no. 14: 3382. https://doi.org/10.3390/rs14143382