Combining Satellite Image Standardization and Self-Supervised Learning to Improve Building Segmentation Accuracy
Abstract
Highlights
- Performing atmospheric correction before pan-sharpening improves the accuracy of building segmentation.
- The two pretext tasks are specifically designed to consider building features in satellite imagery.
- The newly developed multi-task SSL network performs better than existing SSL methods in building segmentation.
- The proposed method works effectively when labeled satellite data are limited, and it can run on personal computers.
1. Introduction
- (1) To clarify how the order of the AC and pan-sharpening processes affects the accuracy of building segmentation and to recommend the most effective standardization method.
- (2) To propose two new pretext tasks that take into account the features of objects distributed in WorldView-3 images.
- (3) To design a novel multi-task SSL network that combines generative learning methods (using the two new pretext tasks) and contrastive learning methods to improve the accuracy of building segmentation.
2. Materials
| Dataset No. | Satellite Type | No. of Images | No. of MS Bands | No. of PAN Bands | MS Resolution (m) | PAN Resolution (m) | No. of Patches | No. of Objects | Source |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Dataset I (Sakaiminato, Japan) | WV-3 | 3 | 4 | 1 | 1.2 | 0.3 | 1632 | 21,951 | This study |
| Dataset I (Sanuki, Japan) | | | | | | | 1683 | 14,828 | |
| Dataset I (Tsukuba, Japan) | | | | | | | 1681 | 18,389 | |
| Dataset II (9 regions in Japan) | WV-3 | 9 | 4 | 1 | 1.2 | 0.3 | 9593 | - | This study |
| Dataset III (Washington, D.C., USA) | WV-3 | 1 | 8 | 1 | 1.2 | 0.3 | 7392 | - | [34] |
| Dataset IV (11 regions outside Japan) | WV-2 | 2 | 8 | 1 | 2 | 0.5 | 68,651 | - | [35] |
| | WV-3 | 9 | 8 | 1 | 1.2 | 0.5 | | | |
3. Methods
3.1. Satellite Image Standardization
- (1) only pan-sharpening is performed, with no AC;
- (2) AC is first performed on the MS and PAN bands using 6S, and then pan-sharpening is performed;
- (3) AC is first performed only on the MS bands using FLAASH (FLAASH cannot be applied to the PAN band), and then pan-sharpening is performed;
- (4) pan-sharpening is first performed, and then AC is performed on the pan-sharpened MS bands using 6S;
- (5) pan-sharpening is first performed, and then AC is performed on the pan-sharpened MS bands using FLAASH.
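For readers unfamiliar with 6S-style AC, the model reduces to three per-band coefficients (commonly written xa, xb, xc, e.g. as reported by Py6S) that map at-sensor radiance to surface reflectance. A minimal sketch with placeholder coefficient values (not values from this study):

```python
import numpy as np

def sixs_surface_reflectance(radiance, xa, xb, xc):
    """Standard 6S correction equation: y = xa * L - xb; rho = y / (1 + xc * y)."""
    y = xa * radiance - xb
    return y / (1.0 + xc * y)

# Placeholder per-band coefficients; real values come from a 6S run
# parameterized with the scene geometry, atmosphere profile, and band response.
xa, xb, xc = 0.0025, 0.1, 0.15
radiance = np.array([40.0, 80.0, 120.0])   # at-sensor radiance for one band
reflectance = sixs_surface_reflectance(radiance, xa, xb, xc)
```

Applying this correction before pan-sharpening (workflow 2) is what the paper compares against the reversed order (workflows 4 and 5).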
3.2. Designing a Multi-Task SSL Network to Pre-Train the VGG19 Backbone
3.2.1. New Pretext Task 1: Image Inpainting Using Multiple Small Blocks
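As a rough illustration of this pretext task's input, the routine below zeroes randomly chosen small square blocks of a patch; the block sizes (16/32/64) and damage proportions match those explored in Section 4.2, but the routine itself is a hypothetical sketch, not the authors' implementation:

```python
import numpy as np

def mask_small_blocks(img, block=32, proportion=0.30, rng=None):
    """Zero randomly chosen non-overlapping block x block squares until roughly
    `proportion` of the patch area is damaged (inpainting pretext-task input)."""
    rng = np.random.default_rng(rng)
    h, w = img.shape[:2]
    gy, gx = h // block, w // block          # grid of candidate block positions
    n_blocks = int(round(proportion * gy * gx))
    damaged = img.copy()
    for k in rng.choice(gy * gx, size=n_blocks, replace=False):
        y, x = (k // gx) * block, (k % gx) * block
        damaged[y:y + block, x:x + block] = 0
    return damaged

patch = np.ones((512, 512, 4), dtype=np.float32)   # 4-band 512 x 512 patch
damaged = mask_small_blocks(patch, block=32, proportion=0.30, rng=0)
```

The SSL network is then trained to reconstruct the original patch from the damaged one.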
3.2.2. New Pretext Task 2: Image Spectrum Recovery
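One plausible corruption for a spectrum-recovery task is to zero a randomly chosen band and train the decoder to restore the original spectra. The exact damage strategies compared in the Results (labeled R1Z, R2A, R3A, and 4A) are defined in the paper, so the sketch below is only indicative:

```python
import numpy as np

def drop_band(patch, band=None, rng=None):
    """Zero one spectral band of an H x W x B patch; the intact patch is the
    recovery target for the pretext task (illustrative only)."""
    rng = np.random.default_rng(rng)
    if band is None:
        band = int(rng.integers(patch.shape[-1]))  # pick a band at random
    corrupted = patch.copy()
    corrupted[..., band] = 0.0
    return corrupted, band

patch = np.random.default_rng(1).random((512, 512, 4)).astype(np.float32)
corrupted, band = drop_band(patch, rng=0)
```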
3.2.3. Pretext Task 3: Contrastive Learning
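The contrastive branch learns from positive pairs of augmented views, in the spirit of the SimCLR and MoCo v2 baselines cited later. A numpy sketch of the standard NT-Xent loss (an assumption based on those baselines, not necessarily the exact loss used in this study):

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """SimCLR-style NT-Xent loss for a batch of positive pairs (z1[i], z2[i]);
    tau is the temperature (0.5 here is a common default, not a paper value)."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine-similarity space
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    n = z1.shape[0]
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = logsumexp - sim[np.arange(2 * n), pos_idx]
    return float(loss.mean())

rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 128))
loss_identical = nt_xent(z1, z1)                       # agreeing views: low loss
loss_random = nt_xent(z1, rng.normal(size=(8, 128)))   # unrelated views: high loss
```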
3.3. U-Net with SSL Pre-Trained VGG19 Backbone for Building Segmentation
3.4. Accuracy Assessment
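The Results tables report IoU and overall accuracy (OA); for binary building masks these reduce to simple pixel counts:

```python
import numpy as np

def iou_and_oa(pred, truth):
    """Binary segmentation metrics: IoU = TP / (TP + FP + FN),
    OA = (TP + TN) / total pixels."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    tn = np.logical_and(~pred, ~truth).sum()
    return float(tp / (tp + fp + fn)), float((tp + tn) / pred.size)

pred = np.array([[1, 1], [0, 0]])
truth = np.array([[1, 0], [0, 0]])
iou, oa = iou_and_oa(pred, truth)   # IoU = 0.5, OA = 0.75
```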
4. Results
4.1. The Most Effective Method for Standardizing Satellite Imagery
4.2. Building Segmentation Results Using U-Net with SSL Pre-Trained VGG19 Backbone
4.2.1. Performance of Two New Pretext Tasks in SSL
4.2.2. Performance of the New Multi-Task SSL Network
4.3. Comparison with Other SSL Approaches
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
| Pretext Task | No. of Layers | Layer | Input Channels | Output Channels | Output Size | Kernel Size | Stride | Padding |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Image Inpainting | 1 | Interpolate | - | - | - | - | - | - |
| | 2 | Conv | 512 | 512 | 32 × 32 | 3 × 3 | 1 | 1 |
| | 3 | BN | 512 | - | - | - | - | - |
| | 4 | ReLU | - | - | - | - | - | - |
| | 5 | Addition | - | - | - | - | - | - |
| | 6 | Interpolate | - | - | - | - | - | - |
| | 7 | Conv | 512 | 256 | 64 × 64 | 3 × 3 | 1 | 1 |
| | 8 | BN | 256 | - | - | - | - | - |
| | 9 | ReLU | - | - | - | - | - | - |
| | 10 | Addition | - | - | - | - | - | - |
| | 11 | Interpolate | - | - | - | - | - | - |
| | 12 | Conv | 256 | 128 | 128 × 128 | 3 × 3 | 1 | 1 |
| | 13 | BN | 128 | - | - | - | - | - |
| | 14 | ReLU | - | - | - | - | - | - |
| | 15 | Addition | - | - | - | - | - | - |
| | 16 | Interpolate | - | - | - | - | - | - |
| | 17 | Conv | 128 | 64 | 256 × 256 | 3 × 3 | 1 | 1 |
| | 18 | BN | 64 | - | - | - | - | - |
| | 19 | ReLU | - | - | - | - | - | - |
| | 20 | Addition | - | - | - | - | - | - |
| | 21 | Interpolate | - | - | - | - | - | - |
| | 22 | Conv | 64 | 32 | 512 × 512 | 3 × 3 | 1 | 1 |
| | 23 | BN | 32 | - | - | - | - | - |
| | 24 | ReLU | - | - | - | - | - | - |
| | 25 | Conv | 32 | 4 | 512 × 512 | 3 × 3 | 1 | 1 |
| Spectrum Recovery | 1 | Interpolate | - | - | - | - | - | - |
| | 2 | Conv | 512 | 512 | 32 × 32 | 3 × 3 | 1 | 1 |
| | 3 | BN | 512 | - | - | - | - | - |
| | 4 | ReLU | - | - | - | - | - | - |
| | 5 | Addition | - | - | - | - | - | - |
| | 6 | Interpolate | - | - | - | - | - | - |
| | 7 | Conv | 512 | 256 | 64 × 64 | 3 × 3 | 1 | 1 |
| | 8 | BN | 256 | - | - | - | - | - |
| | 9 | ReLU | - | - | - | - | - | - |
| | 10 | Addition | - | - | - | - | - | - |
| | 11 | Interpolate | - | - | - | - | - | - |
| | 12 | Conv | 256 | 128 | 128 × 128 | 3 × 3 | 1 | 1 |
| | 13 | BN | 128 | - | - | - | - | - |
| | 14 | ReLU | - | - | - | - | - | - |
| | 15 | Addition | - | - | - | - | - | - |
| | 16 | Interpolate | - | - | - | - | - | - |
| | 17 | Conv | 128 | 64 | 256 × 256 | 3 × 3 | 1 | 1 |
| | 18 | BN | 64 | - | - | - | - | - |
| | 19 | ReLU | - | - | - | - | - | - |
| | 20 | Addition | - | - | - | - | - | - |
| | 21 | Interpolate | - | - | - | - | - | - |
| | 22 | Conv | 64 | 32 | 512 × 512 | 3 × 3 | 1 | 1 |
| | 23 | BN | 32 | - | - | - | - | - |
| | 24 | ReLU | - | - | - | - | - | - |
| | 25 | Conv | 32 | 4 | 512 × 512 | 3 × 3 | 1 | 1 |
| Contrastive Learning | 1 | Average Pooling | - | - | 1 × 1 | - | - | - |
| | 2 | Linear | 512 | 1024 | - | - | - | - |
| | 3 | ReLU | - | - | - | - | - | - |
| | 4 | Linear | 1024 | 512 | - | - | - | - |
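A quick arithmetic check of the decoder progression in the table above: a 512 × 512 input reaches the VGG19 bottleneck at 16 × 16 × 512, and each decoder stage doubles the spatial size (Interpolate) before a 3 × 3 conv with stride 1 and padding 1, which preserves spatial size and only changes the channel count:

```python
# Shape bookkeeping for the inpainting / spectrum-recovery decoders.
size, channels = 16, 512                       # VGG19 bottleneck of a 512 x 512 input
for out_channels in [512, 256, 128, 64, 32]:   # conv output channels per stage
    size *= 2                                  # Interpolate (2x upsampling)
    channels = out_channels                    # Conv 3 x 3, stride 1, padding 1
assert (size, channels) == (512, 32)           # matches layer 22 in the table
channels = 4                                   # final conv to the 4-band output
```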
| No. | Method | Size of Small Blocks | Mean IoU | Mean OA | (A) IoU | (A) OA | (B) IoU | (B) OA | (C) IoU | (C) OA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Inpainting with small blocks | 16 × 16 | 0.625 | 0.953 | 0.648 | 0.961 | 0.636 | 0.959 | 0.591 | 0.938 |
| 2 | | 32 × 32 | 0.628 | 0.953 | 0.642 | 0.959 | 0.646 | 0.959 | 0.596 | 0.942 |
| 3 | | 64 × 64 | 0.617 | 0.951 | 0.655 | 0.962 | 0.595 | 0.949 | 0.600 | 0.943 |
| No. | Method | Proportion of Small Blocks in Pretext Task 1 | Mean IoU | Mean OA | (A) IoU | (A) OA | (B) IoU | (B) OA | (C) IoU | (C) OA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Multiple task-based SSL network (this study) | 0.05 | 0.595 | 0.946 | 0.571 | 0.945 | 0.620 | 0.955 | 0.594 | 0.940 |
| 2 | | 0.15 | 0.642 | 0.955 | 0.661 | 0.963 | 0.663 | 0.962 | 0.603 | 0.942 |
| 3 | | 0.30 | 0.663 | 0.958 | 0.692 | 0.968 | 0.678 | 0.964 | 0.620 | 0.943 |
| 4 | | 0.40 | 0.630 | 0.953 | 0.655 | 0.962 | 0.609 | 0.952 | 0.624 | 0.946 |
| 5 | | 0.50 | 0.626 | 0.952 | 0.638 | 0.958 | 0.623 | 0.954 | 0.615 | 0.943 |
| 6 | | 0.60 | 0.621 | 0.950 | 0.655 | 0.962 | 0.582 | 0.946 | 0.625 | 0.944 |
| 7 | | 0.75 | 0.641 | 0.955 | 0.651 | 0.960 | 0.652 | 0.961 | 0.621 | 0.945 |
Position in Figure 7 | SimCLR | MoCo v2 | BYOL | PGSSL | This Study |
---|---|---|---|---|---|
Top | 0.744 | 0.874 | 0.780 | 0.886 | 0.910 |
Middle | 0.754 | 0.774 | 0.752 | 0.790 | 0.834 |
Bottom | 0.906 | 0.941 | 0.905 | 0.878 | 0.992 |
References
- He, C.; Liu, Y.; Wang, D.; Liu, S.; Yu, L.; Ren, Y. Automatic Extraction of Bare Soil Land from High-Resolution Remote Sensing Images Based on Semantic Segmentation with Deep Learning. Remote Sens. 2023, 15, 1646. [Google Scholar] [CrossRef]
- Huang, X.; Zhang, L. An Adaptive Mean-Shift Analysis Approach for Object Extraction and Classification from Urban Hyperspectral Imagery. IEEE Trans. Geosci. Remote Sens. 2008, 46, 4173–4185. [Google Scholar] [CrossRef]
- Bai, H.; Li, Z.; Guo, H.; Chen, H.; Luo, P. Urban Green Space Planning Based on Remote Sensing and Geographic Information Systems. Remote Sens. 2022, 14, 4213. [Google Scholar] [CrossRef]
- Ahmadi, S.; Zoej, M.J.V.; Ebadi, H.; Moghaddam, H.A.; Mohammadzadeh, A. Automatic Urban Building Boundary Extraction from High Resolution Aerial Images Using an Innovative Model of Active Contours. Int. J. Appl. Earth Obs. Geoinf. 2010, 12, 150–157. [Google Scholar] [CrossRef]
- Zhang, X.; Gao, K.; Wang, J.; Hu, Z.; Wang, H.; Wang, P.; Zhao, X.; Li, W. Self-Supervised Learning with Deep Clustering for Target Detection in Hyperspectral Images with Insufficient Spectral Variation Prior. Int. J. Appl. Earth Obs. Geoinf. 2023, 122, 103405. [Google Scholar] [CrossRef]
- Manolakis, D.; Marden, D.; Shaw, G.A. Hyperspectral Image Processing for Automatic Target Detection Applications. Linc. Lab. J. 2003, 14, 79–116. [Google Scholar]
- Cui, Y.; Liu, P.; Ma, Y.; Chen, L.; Xu, M.; Guo, X. Pansharpening via Predictive Filtering with Element-Wise Feature Mixing. ISPRS J. Photogramm. Remote Sens. 2025, 219, 22–37. [Google Scholar] [CrossRef]
- Li, D.; Ke, Y.; Gong, H.; Li, X. Object-Based Urban Tree Species Classification Using Bi-Temporal WorldView-2 and WorldView-3 Images. Remote Sens. 2015, 7, 16917–16937. [Google Scholar] [CrossRef]
- Wang, D.; Qiu, P.; Wan, B.; Cao, Z.; Zhang, Q. Mapping α- and β-Diversity of Mangrove Forests with Multispectral and Hyperspectral Images. Remote Sens. Environ. 2022, 275, 113021. [Google Scholar] [CrossRef]
- Luo, Q.; Li, Z.; Huang, Z.; Abulaiti, Y.; Yang, Q.; Yu, S. Retrieval of Mangrove Leaf Area Index and Its Response to Typhoon Based on WorldView-3 Image. Remote Sens. Appl. Soc. Environ. 2023, 30, 100931. [Google Scholar] [CrossRef]
- Liu, X.; Frey, J.; Denter, M.; Zielewska-Büttner, K.; Still, N.; Koch, B. Mapping Standing Dead Trees in Temperate Montane Forests Using a Pixel- and Object-Based Image Fusion Method and Stereo WorldView-3 Imagery. Ecol. Indic. 2021, 133, 108438. [Google Scholar] [CrossRef]
- Yao, Y.; Wang, S. Effects of Atmospheric Correction and Image Enhancement on Effective Plastic Greenhouse Segments Based on a Semi-Automatic Extraction Method. ISPRS Int. J. Geo-Inf. 2022, 11, 585. [Google Scholar] [CrossRef]
- He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum Contrast for Unsupervised Visual Representation Learning. arXiv 2019, arXiv:1911.05722. [Google Scholar]
- Wang, Y.; Albrecht, C.M.; Braham, N.A.A.; Mou, L.; Zhu, X.X. Self-Supervised Learning in Remote Sensing: A Review. IEEE Geosci. Remote Sens. Mag. 2022, 10, 213–247. [Google Scholar] [CrossRef]
- Balestriero, R.; Ibrahim, M.; Sobal, V.; Morcos, A.; Shekhar, S.; Goldstein, T.; Bordes, F.; Bardes, A.; Mialon, G.; Tian, Y.; et al. A Cookbook of Self-Supervised Learning. arXiv 2023, arXiv:2304.12210. [Google Scholar] [CrossRef]
- Liu, X.; Zhang, F.; Hou, Z.; Mian, L.; Wang, Z.; Zhang, J.; Tang, J. Self-Supervised Learning: Generative or Contrastive. IEEE Trans. Knowl. Data Eng. 2023, 35, 857–876. [Google Scholar] [CrossRef]
- Tang, Y.; Yang, Y.; Sun, G. Generative and Contrastive Graph Representation Learning with Message Passing. Neural Netw. 2025, 185, 107224. [Google Scholar] [CrossRef] [PubMed]
- Zhang, R.; Isola, P.; Efros, A.A. Colorful Image Colorization. arXiv 2016, arXiv:1603.08511. [Google Scholar] [CrossRef]
- Pathak, D.; Krähenbühl, P.; Donahue, J.; Darrell, T.; Efros, A.A. Context Encoders: Feature Learning by Inpainting. arXiv 2016, arXiv:1604.07379. [Google Scholar] [CrossRef]
- Caron, M.; Misra, I.; Mairal, J.; Goyal, P.; Bojanowski, P.; Joulin, A. Unsupervised Learning of Visual Features by Contrasting Cluster Assignments. arXiv 2020, arXiv:2006.09882. [Google Scholar]
- Zbontar, J.; Jing, L.; Misra, I.; LeCun, Y.; Deny, S. Barlow Twins: Self-Supervised Learning via Redundancy Reduction. arXiv 2021, arXiv:2103.03230. [Google Scholar] [CrossRef]
- Li, W.; Chen, H.; Shi, Z. Semantic Segmentation of Remote Sensing Images with Self-Supervised Multitask Representation Learning. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 6438–6450. [Google Scholar] [CrossRef]
- Chen, D.Y.; Peng, L.; Zhang, W.Y.; Wang, Y.D.; Yang, L.N. Research on Self-Supervised Building Information Extraction with High-Resolution Remote Sensing Images for Photovoltaic Potential Evaluation. Remote Sens. 2022, 14, 5350. [Google Scholar] [CrossRef]
- Ministry of Land, Infrastructure, Transport and Tourism of Japan. Housing Economy Related Data. 2024. Available online: https://www.mlit.go.jp/statistics/details/t-jutaku-2_tk_000002.html (accessed on 4 September 2025).
- Suvorov, R.; Logacheva, E.; Mashikhin, A.; Remizova, A.; Ashukha, A.; Silvestrov, A.; Kong, N.; Goka, H.; Park, K.; Lempitsky, V.; et al. Resolution-Robust Large Mask Inpainting with Fourier Convolutions. arXiv 2021, arXiv:2109.07161. [Google Scholar] [CrossRef]
- Bigdeli, S.; Süsstrunk, S. Deep Semantic Segmentation Using Nir as Extra Physical Information. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 2439–2443. [Google Scholar] [CrossRef]
- Cai, Y.; Fan, L.; Zhang, C. Semantic Segmentation of Multispectral Images via Linear Compression of Bands: An Experiment Using RIT-18. Remote Sens. 2022, 14, 2673. [Google Scholar] [CrossRef]
- Singhal, U.; Yu, S.X.; Steck, Z.; Kangas, S.; Reite, A.A. Multi-Spectral Image Classification with Ultra-Lean Complex-Valued Models. arXiv 2022, arXiv:2211.11797. [Google Scholar]
- Yang, J.; Li, P.; He, Y. A Multi-Band Approach to Unsupervised Scale Parameter Selection for Multi-Scale Image Segmentation. ISPRS J. Photogramm. Remote Sens. 2014, 94, 13–24. [Google Scholar] [CrossRef]
- Johnson, B.; Xie, Z. Unsupervised Image Segmentation Evaluation and Refinement Using a Multi-Scale Approach. ISPRS J. Photogramm. Remote Sens. 2011, 66, 473–483. [Google Scholar] [CrossRef]
- Huang, B.; Collins, M.L.; Bradbury, K.; Malof, M.J. Deep Convolutional Segmentation of Remote Sensing Imagery: A Simple and Efficient Alternative to Stitching Output Labels. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 6899–6902. [Google Scholar] [CrossRef]
- Karila, K.; Matikainen, L.; Karjalainen, M.; Puttonen, E.; Chen, Y.; Hyyppä, J. Automatic Labelling for Semantic Segmentation of VHR Satellite Images: Application of Airborne Laser Scanner Data and Object-Based Image Analysis. ISPRS Open J. Photogramm. Remote Sens. 2023, 9, 100046. [Google Scholar] [CrossRef]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar] [CrossRef]
- Bosch, M.; Kurtz, Z.; Hagstrom, S.; Brown, M. A multiple view stereo benchmark for satellite imagery. In Proceedings of the 2016 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), Washington, DC, USA, 18–20 October 2016; pp. 1–9. [Google Scholar] [CrossRef]
- Myron, B.; Hirsh, G.; Kevin, F.; Andrea, L.; Sean, W.; Shea, H.; Marc, B.; Scott, A. Large-Scale Public Lidar and Satellite Image Data Set for Urban Semantic Labeling. In Proceedings of the Laser Radar Technology and Applications XXIII, Orlando, FL, USA, 17–18 April 2018; pp. 154–167. [Google Scholar] [CrossRef]
- Yang, J.; Matsushita, B.; Zhang, H. Improving Building Rooftop Segmentation Accuracy through the Optimization of UNet Basic Elements and Image Foreground-Background Balance. ISPRS J. Photogramm. Remote Sens. 2023, 201, 123–137. [Google Scholar] [CrossRef]
- Vermote, E.F.; Tanré, D.; Luc Deuzé, J.; Herman, M.; Morcrette, J.J. Second Simulation of the Satellite Signal in the Solar Spectrum, 6S: An Overview. IEEE Trans. Geosci. Remote Sens. 1997, 35, 675–686. [Google Scholar] [CrossRef]
- Wilson, R.T. Py6S: A Python Interface to the 6S Radiative Transfer Model. Comput. Geosci. 2013, 51, 166–171. [Google Scholar] [CrossRef]
- Adler-Golden, S.M.; Matthew, M.W.; Bernstein, L.S.; Levine, R.Y.; Berk, A.; Richtsmeier, S.C.; Acharya, P.K.; Anderson, G.P.; Felde, G.; Gardner, J.; et al. Atmospheric Correction for Shortwave Spectral Imagery Based on MODTRAN4. Proc. SPIE 1999, 3753, 61–69. [Google Scholar] [CrossRef]
- Guo, Y.; Zeng, F. Atmospheric Correction Comparison of SPOT-5 Image Based on Model FLAASH and Model QUAC. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2012, 39, 7–11. [Google Scholar] [CrossRef]
- Nguyen, H.C.; Jung, J.; Lee, J.; Choi, S.U.; Hong, S.Y.; Heo, J. Optimal Atmospheric Correction for Above-Ground Forest Biomass Estimation with the ETM+ Remote Sensor. Sensors 2015, 15, 18865–18886. [Google Scholar] [CrossRef]
- Marcello, J.; Eugenio, F.; Perdomo, U.; Medina, A. Assessment of Atmospheric Algorithms to Retrieve Vegetation in Natural Protected Areas Using Multispectral High Resolution Imagery. Sensors 2016, 16, 1624. [Google Scholar] [CrossRef] [PubMed]
- Yang, M.; Hu, Y.; Tian, H.; Khan, F.A.; Liu, Q.; Goes, J.I.; Gomes, H.D.R.; Kim, W. Atmospheric Correction of Airborne Hyperspectral CASI Data Using Polymer, 6S and FLAASH. Remote Sens. 2021, 13, 5062. [Google Scholar] [CrossRef]
- Laben, C.A.; Brower, B.V. Process for Enhancing the Spatial Resolution of Multispectral Imagery Using Pan-Sharpening. U.S. Patent 6,011,875, 4 January 2000. [Google Scholar]
- Sun, W.; Chen, B.; Messinger, D.W. Nearest-Neighbor Diffusion-Based Pan-Sharpening Algorithm for Spectral Images. Opt. Eng. 2014, 53, 013107. [Google Scholar] [CrossRef]
- Yilmaz, C.S.; Yilmaz, V.; Gungor, O. A Theoretical and Practical Survey of Image Fusion Methods for Multispectral Pansharpening. Inform. Fusion. 2022, 79, 1–43. [Google Scholar] [CrossRef]
- NV5 Geospatial. NNDiffuse Pan-Sharpening. NV5 Geospatial Software Documentation. 2025. Available online: https://www.nv5geospatialsoftware.com/docs/nndiffusepansharpening.html (accessed on 4 September 2025).
- Yilmaz, C.S.; Yilmaz, V.; Gungor, O.; Shan, J. Metaheuristic Pansharpening Based on Symbiotic Organisms Search Optimization. ISPRS J. Photogramm. Remote Sens. 2019, 158, 167–187. [Google Scholar] [CrossRef]
- Yilmaz, V.; Yilmaz, C.S.; Güngör, O.; Shan, J. A Genetic Algorithm Solution to the Gram-Schmidt Image Fusion. Int. J. Remote Sens. 2020, 41, 1458–1485. [Google Scholar] [CrossRef]
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A Simple Framework for Contrastive Learning of Visual Representations. arXiv 2020, arXiv:2002.05709. [Google Scholar] [CrossRef]
- Chen, X.; Fan, H.; Girshick, R.; He, K. Improved Baselines with Momentum Contrastive Learning. arXiv 2020, arXiv:2003.04297. [Google Scholar] [CrossRef]
- Li, X.; Sun, X.; Meng, Y.; Liang, J.; Wu, F.; Li, J. Dice Loss for Data-Imbalanced NLP Tasks. arXiv 2020, arXiv:1911.02855. [Google Scholar] [CrossRef]
- Bhattacharyya, A. On a Measure of Divergence between Two Multinomial Populations. Sankhya Indian J. Stat. 1946, 7, 401–406. [Google Scholar]
- Schanda, J. Chapter 4: CIE Color Difference Metrics. In Colorimetry: Understanding the CIE System, 1st ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2007. [Google Scholar] [CrossRef]
- Grill, J.B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.H.; Buchatskaya, E.; Doersch, C.; Pires, B.A.; Guo, Z.D.; Azar, M.G.; et al. Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning. arXiv 2020, arXiv:2006.07733. [Google Scholar] [CrossRef]
- Cheng, T.; Ji, X.; Yang, G.; Zheng, H.; Ma, J.; Yao, X.; Zhu, Y.; Cao, W. DESTIN: A New Method for Delineating the Boundaries of Crop Fields by Fusing Spatial and Temporal Information from WorldView and Planet Satellite Imagery. Comput. Electron. Agric. 2020, 178, 105787. [Google Scholar] [CrossRef]
- Nininahazwe, F.; Varin, M.; Théau, J. Mapping Common and Glossy Buckthorns (Frangula Alnus and Rhamnus Cathartica) Using Multi-Date Satellite Imagery WorldView-3, GeoEye-1 and SPOT-7. Int. J. Digit. Earth. 2023, 16, 31–42. [Google Scholar] [CrossRef]
- Gao, B.C.; Montes, M.J.; Davis, C.O.; Goetz, A.F.H. Atmospheric Correction Algorithms for Hyperspectral Remote Sensing Data of Land and Ocean. Remote Sens. Environ. 2009, 113, S17–S24. [Google Scholar] [CrossRef]
- He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked Autoencoders Are Scalable Vision Learners. arXiv 2021, arXiv:2111.06377. [Google Scholar] [CrossRef]
- Baek, W.K.; Lee, M.J.; Jung, H.S. Land Cover Classification From RGB and NIR Satellite Images Using Modified U-Net Model. IEEE Access 2024, 12, 69445–69455. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar] [CrossRef]
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
- Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep High-Resolution Representation Learning for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 3349–3364. [Google Scholar] [CrossRef] [PubMed]
- Li, R.; Zheng, S.; Zhang, C.; Duan, C.; Wang, L.; Atkinson, P.M. ABCNet: Attentive Bilateral Contextual Network for Efficient Semantic Segmentation of Fine-Resolution Remotely Sensed Imagery. ISPRS J. Photogramm. Remote Sens. 2021, 181, 84–98. [Google Scholar] [CrossRef]
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. arXiv 2021, arXiv:2105.15203. [Google Scholar] [CrossRef]
- Wang, L.; Li, R.; Zhang, C.; Fang, S.; Duan, C.; Meng, X.; Atkinson, P.M. UNetFormer: A UNet-like Transformer for Efficient Semantic Segmentation of Remote Sensing Urban Scene Imagery. ISPRS J. Photogramm. Remote Sens. 2022, 190, 196–214. [Google Scholar] [CrossRef]
- Chen, L.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar] [CrossRef]
- Liu, S.; Huang, D.; Wang, Y. Learning Spatial Fusion for Single-Shot Object Detection. arXiv 2019, arXiv:1911.09516. [Google Scholar] [CrossRef]
- Yang, J.; Zhao, Z.; Yang, J. A shadow removal method for high resolution remote sensing image. Geomat. Inf. Sci. Wuhan Univ. 2008, 33, 17–20. [Google Scholar]
- Guo, J.; Tian, Q.; Wu, Y. Study on multispectral detecting shadow areas and a theoretical model of removing shadows from remote sensing images. J. Remote Sens. 2006, 2, 151–159. [Google Scholar] [CrossRef]
No. | Method | Mean DB | DB Between Image (a) and Image (b) | DB Between Image (b) and Image (c) | DB Between Image (c) and Image (a) |
---|---|---|---|---|---|
1 | Only NNDiffuse | 0.400 | 0.459 | 0.486 | 0.257 |
2 | 6S + NNDiffuse | 0.153 | 0.140 | 0.113 | 0.207 |
3 | FLAASH + NNDiffuse | 0.369 | 0.295 | 0.489 | 0.322 |
4 | NNDiffuse + 6S | 0.252 | 0.225 | 0.274 | 0.258 |
5 | NNDiffuse + FLAASH | 0.378 | 0.389 | 0.464 | 0.279 |
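Assuming DB in the table above denotes the Bhattacharyya distance between band histograms (the paper cites Bhattacharyya's divergence measure), a minimal sketch for two 1-D histograms:

```python
import numpy as np

def bhattacharyya_distance(p, q):
    """DB = -ln(sum_i sqrt(p_i * q_i)) for two histograms, normalized first."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    bc = np.sum(np.sqrt(p * q))        # Bhattacharyya coefficient, at most 1
    return float(-np.log(bc))

# Identical histograms give DB near 0; less overlap gives a larger DB,
# so the small DB values for "6S + NNDiffuse" indicate the most consistent images.
assert abs(bhattacharyya_distance([1, 2, 3], [1, 2, 3])) < 1e-12
```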
| No. | Method | Mean IoU | Mean OA | (A) IoU | (A) OA | (B) IoU | (B) OA | (C) IoU | (C) OA | Variance IoU | Variance OA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Only NNDiffuse | 0.395 | 0.862 | 0.371 | 0.883 | 0.508 | 0.940 | 0.304 | 0.762 | 5.38 × 10⁻³ | 1.66 × 10⁻³ |
| 2 | 6S + NNDiffuse | 0.575 | 0.943 | 0.505 | 0.930 | 0.622 | 0.956 | 0.599 | 0.942 | 3.46 × 10⁻³ | 1.76 × 10⁻⁴ |
| 3 | FLAASH + NNDiffuse | 0.420 | 0.890 | 0.324 | 0.865 | 0.465 | 0.918 | 0.470 | 0.886 | 5.22 × 10⁻³ | 7.03 × 10⁻⁴ |
| 4 | NNDiffuse + 6S | 0.531 | 0.935 | 0.502 | 0.929 | 0.514 | 0.938 | 0.576 | 0.937 | 2.07 × 10⁻⁴ | 2.22 × 10⁻⁵ |
| 5 | NNDiffuse + FLAASH | 0.424 | 0.889 | 0.385 | 0.885 | 0.511 | 0.940 | 0.378 | 0.843 | 4.14 × 10⁻³ | 9.41 × 10⁻⁴ |
| No. | Image Inpainting Method | Image Damage Strategy | Mean IoU | Mean OA | (A) IoU | (A) OA | (B) IoU | (B) OA | (C) IoU | (C) OA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | One Large Block | LB | 0.602 | 0.949 | 0.596 | 0.951 | 0.610 | 0.955 | 0.598 | 0.943 |
| 2 | Small Blocks (this study) | SB1 | 0.591 | 0.946 | 0.543 | 0.940 | 0.613 | 0.954 | 0.617 | 0.944 |
| 3 | Small Blocks (this study) | SB2 | 0.611 | 0.950 | 0.637 | 0.959 | 0.611 | 0.953 | 0.584 | 0.939 |
| 4 | Small Blocks (this study) | SB3 | 0.628 | 0.953 | 0.642 | 0.959 | 0.646 | 0.959 | 0.596 | 0.942 |
| No. | Method | Damage Strategy | Mean IoU | Mean OA | (A) IoU | (A) OA | (B) IoU | (B) OA | (C) IoU | (C) OA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | CIC | RGB to CIELAB | 0.648 | 0.957 | 0.664 | 0.965 | 0.670 | 0.965 | 0.611 | 0.942 |
| 2 | ISR (this study) | R1Z | 0.605 | 0.949 | 0.602 | 0.953 | 0.624 | 0.955 | 0.589 | 0.938 |
| 3 | ISR (this study) | R2A | 0.593 | 0.947 | 0.614 | 0.954 | 0.596 | 0.950 | 0.570 | 0.937 |
| 4 | ISR (this study) | R3A | 0.606 | 0.949 | 0.588 | 0.950 | 0.635 | 0.957 | 0.594 | 0.941 |
| 5 | ISR (this study) | 4A | 0.652 | 0.958 | 0.668 | 0.964 | 0.684 | 0.966 | 0.604 | 0.943 |
| No. | Pretext Task 1 | Pretext Task 2 | Pretext Task 3 | Mean IoU | Mean OA | (A) IoU | (A) OA | (B) IoU | (B) OA | (C) IoU | (C) OA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | × | × | × | 0.575 | 0.943 | 0.505 | 0.930 | 0.622 | 0.956 | 0.597 | 0.942 |
| 2 | √ | × | × | 0.628 | 0.953 | 0.642 | 0.959 | 0.646 | 0.959 | 0.596 | 0.942 |
| 3 | √ | √ | × | 0.636 | 0.954 | 0.661 | 0.963 | 0.645 | 0.958 | 0.604 | 0.940 |
| 4 | √ | √ | √ | 0.663 | 0.958 | 0.692 | 0.968 | 0.678 | 0.964 | 0.620 | 0.943 |
| No. | SSL Type | Method | Mean IoU | Mean OA | (A) IoU | (A) OA | (B) IoU | (B) OA | (C) IoU | (C) OA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Contrastive Learning | SimCLR | 0.615 | 0.951 | 0.669 | 0.965 | 0.582 | 0.948 | 0.594 | 0.940 |
| 2 | | MoCo v2 | 0.583 | 0.946 | 0.602 | 0.952 | 0.572 | 0.946 | 0.576 | 0.939 |
| 3 | | BYOL | 0.589 | 0.947 | 0.569 | 0.945 | 0.593 | 0.952 | 0.606 | 0.943 |
| 4 | Multiple Task-based | PGSSL | 0.640 | 0.955 | 0.608 | 0.953 | 0.659 | 0.962 | 0.653 | 0.951 |
| 5 | | This study | 0.663 | 0.958 | 0.692 | 0.968 | 0.678 | 0.964 | 0.620 | 0.943 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Citation: Zhang, H.; Matsushita, B. Combining Satellite Image Standardization and Self-Supervised Learning to Improve Building Segmentation Accuracy. Remote Sens. 2025, 17, 3182. https://doi.org/10.3390/rs17183182