Article

Precise City-Scale Urban Water Body Semantic Segmentation and Open-Source Sampleset Construction Based on Very High-Resolution Remote Sensing: A Case Study in Chengdu

by Xi Cheng 1,*, Qian Zhu 1, Yujian Song 1, Jieyu Yang 1, Tingting Wang 1, Bin Zhao 1 and Zhanfeng Shen 2,3
1 College of Geophysics, Chengdu University of Technology, Chengdu 610059, China
2 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China
3 University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(20), 3873; https://doi.org/10.3390/rs16203873
Submission received: 28 August 2024 / Revised: 12 October 2024 / Accepted: 15 October 2024 / Published: 18 October 2024
(This article belongs to the Section AI Remote Sensing)

Abstract: Addressing the challenges related to urban water bodies is essential for advancing urban planning and development. Therefore, obtaining precise and timely information regarding urban water bodies is of paramount importance. To address issues such as incomplete extraction boundaries, mistaken feature identification, and omission of small water bodies, this study utilized very high-resolution (VHR) satellite images of the Chengdu urban area and its surroundings to create the Chengdu Urban Water Bodies Semantic Segmentation Dataset (CDUWD). Based on the shape characteristics of water bodies, these images were processed through annotation, cropping, and other operations. We introduced Ad-SegFormer, an enhanced model based on SegFormer, which integrates a densely connected atrous spatial pyramid pooling module (DenseASPP) and progressive feature pyramid network (AFPN) to better handle the multi-scale characteristics of urban water bodies. The experimental results demonstrate the effectiveness of combining the CDUWD dataset with the Ad-SegFormer model for large-scale urban water body extraction, achieving accuracy rates exceeding 96%. This study demonstrates the effectiveness of Ad-SegFormer in improving water body extraction and provides a valuable reference for extracting large-scale urban water body information using VHR images.

Graphical Abstract

1. Introduction

In recent decades, rapid urbanization in China has significantly altered the morphology and geographical distribution of urban water bodies, posing unprecedented challenges. These changes may lead to a reduction in the areas of water bodies, deterioration of water quality, disruption of ecosystems, and a lack of coordination with urban infrastructure. They increase the risks of flooding and decrease the sustainability of water supply and resources. Additionally, they impact urban landscapes and residents’ quality of life. Urban water bodies play a crucial role in monitoring urban environmental conditions, researching the urban heat island phenomenon, and sustaining urban ecological stability [1]. Their morphology and spatial distribution are integral components of city planning and development, emphasizing the importance of promptly and accurately acquiring information about urban water bodies [2,3].
Remote sensing images are frequently used for mapping urban water bodies because they provide rich spectral information on ground features and are cost-effective for large-scale detection [4,5]. The Normalized Difference Water Index (NDWI), based on spectral indices, is commonly used to obtain basic information such as the location, area, and shape of urban water bodies [6,7,8,9,10]. However, these methods are primarily applied to low- or medium-spatial-resolution remote sensing images, such as those from the Landsat and Sentinel series. This limitation in resolution hampers the extraction of small or narrow water bodies within urban areas, such as landscape water bodies or small streams in urban parks. These water bodies are crucial for urban ecology and the planning and construction of urban parks, highlighting the importance of accurately extracting them in urban water body extraction tasks [11,12,13,14,15]. With the advancement of spatial resolution in remote sensing images, Very High-Resolution (VHR) remote sensing images provide more accurate data support for urban water body extraction. Scholars have accordingly conducted work on urban water body extraction based on VHR imagery, and the results have demonstrated that water body extraction on VHR images is superior to that on low-to-medium-resolution images, particularly in capturing small and narrow water bodies [16,17]. However, these methods rely on threshold calculations and parameter selection, requiring individual adjustments for each image in the study area, which makes them neither universally applicable nor convenient for large-scale water body extraction.
With the development of deep learning, semantic segmentation models such as FCN [18], BiSeNet [19], U-Net [20], and DeepLabV3 [21] have emerged, providing new methods for accurately extracting urban water bodies in high-resolution remote sensing images. Some scholars have extracted urban water bodies from high-resolution remote sensing images utilizing the original FCN model [22], the improved U-Net, DeepLabV3 [23,24], and BiSeNet models [25]. The extraction results of these methods demonstrate the exceptional precision and accuracy of CNN-based models in extracting water bodies from high-resolution remote sensing images.
At the end of 2020, the Vision Transformer (ViT) introduced the Transformer architecture, well known for its achievements in natural language processing, into computer vision tasks [26]. This integration marked a significant milestone in the field of computer vision, transforming image analysis approaches by leveraging self-attention mechanisms, and it also opened up new possibilities for urban water body extraction. Notably, the Swin Transformer [27] and SegFormer [28], models based on the attention mechanism, have demonstrated excellent performance in semantic segmentation. Some researchers have applied these models to extract water bodies from Sentinel-series satellite images [29,30]. The results indicate that attention-based models can accurately extract water body information from remote sensing images, providing new ideas for urban water body extraction tasks.
Based on these advanced technologies, this study addresses challenges in urban water body extraction, focusing on Chengdu City to achieve both accurate and efficient extraction and large-scale water mapping.
The main contributions were as follows: (1) We developed an open-source Chengdu Urban Water Bodies Semantic Segmentation Dataset (CDUWD) based on VHR remote sensing images, which is specifically tailored for urban water body extraction. (2) We developed Ad-SegFormer, an optimized framework based on SegFormer, by incorporating a densely connected atrous spatial pyramid pooling module (DenseASPP) and a progressive feature pyramid network (AFPN). We then compared the effectiveness of FCN, BiSeNet, DeepLabV3, Swin Transformer, SegFormer, and Ad-SegFormer in urban water body extraction and validated its performance using the CDUWD dataset. (3) We successfully completed city-scale urban water extraction and mapping in Chengdu. This study presents a practical method for extracting large-scale water body information from VHR images.

2. Materials and Methods

2.1. Study Area

Chengdu, the capital of Sichuan Province, lies in the western part of the Sichuan Basin and serves as a significant metropolis in western China. Chengdu has a dense river network, with a density of 1.22 km per square kilometer, the highest in Sichuan Province. This characteristic not only defines the city’s landscape but also presents unique challenges and opportunities for urban water information extraction, making it an invaluable site for studying urban water bodies.
For the purposes of this study, the focus is placed on the central urbanized region of Chengdu, rather than its broader administrative boundaries. This decision is driven by the goal of closely aligning the study area with the actual distribution of urban water bodies, ensuring that the research is both accurate and relevant. The central urban area represents the main urban zone of Chengdu and is delineated based on a comprehensive analysis of urban development patterns and satellite imagery, ensuring a clear and precise definition of the study area [31]. This defined boundary allows us to focus on analyzing the distribution of water bodies within Chengdu’s most developed regions, thereby enabling a more detailed understanding of water distribution across different urban landscapes (Figure 1).

2.2. Data Source

The foundational dataset of this study is derived from VHR imagery, specifically the 2020 Level 20 images collected from Google Earth, comprising a total of 15 images with a spatial resolution of 0.27 m. These images are crucial for their detailed depiction of Chengdu’s urban landscape. Each image includes red, green, and blue optical bands, providing a rich data source for analyzing urban features such as water bodies. The images are projected using the WGS84 coordinate system, ensuring precise and globally comparable spatial analyses.
Given the extensive scope of this research, encompassing the entirety of Chengdu’s central urbanized area, the methodology involves stitching together multiple VHR images to create a comprehensive map. Although these images were captured in the same year, they were not taken simultaneously, resulting in color inconsistencies between them. The study addressed this issue by deliberately including some mosaicked images with color discrepancies during sample selection, so that the trained model remains robust to such inconsistencies, ensuring accuracy in the subsequent analysis.

2.3. Methods

Figure 2 illustrates the study’s primary workflow. It contains five key components: dataset construction, model training, model evaluation, optimal model selection, and mapping.

2.3.1. Production of CDUWD

During the sample selection process, we used a stratified sampling strategy to ensure that different types of water bodies, such as rivers, lakes, and small water bodies, were well-represented in the dataset. This strategy considered water body morphology, size, and location to capture a wide variety of environmental contexts, including urban centers, suburban areas, and industrial zones. To ensure effective data training, priority was given to areas rich in water bodies. The characteristics of water bodies in the study urban environment are as follows: (1) Multi-scale features: the images exhibit significant multi-scale features, portraying variations in water body spatial distribution and appearance. (2) Scene variability: water bodies display contrasting styles between urban core and peripheral areas, varying from small and regular to large and irregular shapes. (3) Complex and diverse backgrounds: the presence of buildings and shadows complicates water body extraction, with background complexity affecting the spectral signatures and the accuracy of identification. These characteristics guided the selection of representative sample images, ensuring the dataset’s robustness for model training and analysis.
When selecting sample areas, we paid attention to the diversity of water samples and fully considered the significance of non-water samples. The colors and shapes of water bodies vary: rivers and lakes typically appear deep blue or green, while ponds and reservoirs may be blue, green, or brown. Rivers meander, lakes come in various forms, and ponds and reservoirs vary depending on their purpose and construction. The samples therefore span water bodies of different sizes and depths, from clear, transparent waters to those affected by pollution or human interference, to ensure that the model can accurately capture the geometric features and morphological changes of different water bodies. At the same time, the selection of non-water samples is also crucial, because such features often have spectral and spatial characteristics similar to water bodies and are prone to confusion. These include shadows, linear roads, green vegetation, farmland, and bare land, among other non-water types. Such samples help the model learn to differentiate water bodies from regions with similar features, thereby enhancing the accuracy and robustness of image segmentation. Ultimately, 77 sample points were chosen, and sample images of 4000 × 4000 pixels were selected around these points on the VHR image. Following the acquisition of these sample images, two main steps were undertaken.
First, following the uniform water body annotation principles, we conducted precise annotation of the selected samples using manual visual interpretation. The annotation principles for the water body of this study are as follows:
(1)
Water bodies smaller than 50 pixels in the image are not annotated.
(2)
Dry riverbeds, waterless ditches, and ditches where the presence of water is difficult to determine by the naked eye are not annotated.
(3)
Ponds, artificial reservoirs, water-filled ditches, lakes, rivers, clearly water-logged paddy fields, and wetlands are annotated. To ensure the extracted water bodies maintain accurate shapes, shadowed areas cast by buildings onto the water bodies are also labeled as water bodies.
Subsequently, the annotated samples were converted into raster images so that they could be used directly for training semantic segmentation models. Following step 1 in Figure 2, a series of cropping operations was performed on these sample and annotation images, resulting in two datasets: one containing 950 samples of 1024 × 1024 pixels and another containing 3800 samples of 512 × 512 pixels. The process of creating the dataset is illustrated in Figure 2.
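As an illustration of the cropping step, the following minimal sketch tiles a 4000 × 4000 sample into fixed-size patches. This uses a simple non-overlapping scheme in numpy; the paper's exact cropping strategy (e.g., any overlap or filtering of empty tiles) is not specified, and `tile_image` is a hypothetical helper name.

```python
import numpy as np

def tile_image(image: np.ndarray, tile: int) -> list[np.ndarray]:
    """Crop an H x W x C array into non-overlapping tile x tile patches,
    discarding any partial patches at the right/bottom edges."""
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            patches.append(image[y:y + tile, x:x + tile])
    return patches

# A 4000 x 4000 sample yields 3 x 3 = 9 non-overlapping tiles of 1024 px
# (the remainder strip is dropped in this simple scheme).
sample = np.zeros((4000, 4000, 3), dtype=np.uint8)
print(len(tile_image(sample, 1024)))  # 9
```

In practice the same function would be applied to the sample image and its annotation raster with identical parameters so that image tiles and label tiles stay aligned.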
Based on the selected sample characteristics, we constructed a new dataset, CDUWD, divided into six categories: main rivers (CDUWD-1), small rivers (CDUWD-2), lakes (CDUWD-3), small water bodies (CDUWD-4), other water bodies (CDUWD-5), and non-water bodies (CDUWD-6). This categorization enables a finer understanding of the distribution and characteristics of different types of water bodies in the urban environment, providing more reliable data for urban planning, environmental protection, and resource management. These data will be made publicly available at https://github.com/xicheng79/, accessed on 10 October 2024. To further reduce subjective labeling bias and enhance the reliability of the dataset, we plan to continuously update the labeled dataset to ensure its accuracy and relevance for future research applications.
The quantity and proportion statistics of each type of water body sample in Table 1 provide important reference data for the study, helping assess the distribution of various water body types in the dataset and guiding subsequent analysis and model training. Additionally, the training and testing sets were randomly divided to ensure consistency in sample distribution, while maintaining a balanced representation of various water body types in both sets, thereby improving the model’s generalization capability and segmentation accuracy.

2.3.2. Data Augmentation

To comprehensively support urban water body extraction tasks, we conducted extensive data augmentation. This involved randomly cropping images and adjusting their sizes to match the model input requirements. Additionally, we applied horizontal and vertical flips to images and label images with a 50% probability, randomly rotated them between 0 and 45 degrees, and applied random changes to brightness, contrast, saturation, and hue within a range of 0 to 0.03. Through these processes, we generated diverse training samples showcasing a wider range of sizes, angles, and lighting conditions, facilitating better adaptation of the model to the complex scenarios encountered in urban water body extraction tasks.
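The flip and photometric steps described above can be sketched as follows. This is a numpy-only illustration with a hypothetical `augment` helper: geometric transforms are applied identically to the image and its label mask, while brightness jitter touches the image only. The 0–45° rotation and hue/saturation jitter mentioned above are omitted here, since they would typically be handled by an augmentation library such as Albumentations or torchvision.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img: np.ndarray, mask: np.ndarray):
    """Apply the same random flips to image and label mask; jitter
    brightness on the image only (labels must never be altered by
    photometric changes)."""
    if rng.random() < 0.5:                       # horizontal flip, p = 0.5
        img, mask = img[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:                       # vertical flip, p = 0.5
        img, mask = img[::-1, :], mask[::-1, :]
    # brightness jitter within +/- 0.03, matching the range in the text
    factor = 1.0 + rng.uniform(-0.03, 0.03)
    img = np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)
    return img, mask

img = rng.integers(0, 255, (512, 512, 3), dtype=np.uint8)
mask = rng.integers(0, 2, (512, 512), dtype=np.uint8)
aug_img, aug_mask = augment(img, mask)
```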

2.3.3. Ad-SegFormer Structure

In this study, we adopt SegFormer, an advanced Transformer-based semantic segmentation model widely recognized for its powerful feature extraction capabilities and adaptability across various image segmentation tasks. Given the multi-scale nature of urban water bodies, with their complex and varied morphology, SegFormer’s extensive receptive field and efficient multi-scale feature fusion play crucial roles in accurately capturing detailed information, thereby enhancing performance in urban water body extraction. Additionally, SegFormer maintains high accuracy while being computationally efficient, making it well suited to large-scale remote sensing applications such as our case study of Chengdu. To better address the challenges described above, we designed a new model structure, Ad-SegFormer, which incorporates two DenseASPP modules and an AFPN [32] to extract urban water bodies (see Figure 3).
Module Design and Feature Fusion: We retained the original hierarchical structure of SegFormer, comprising a tiered encoder and a lightweight decoder. The encoder employs a progressively expanding field of view through a Transformer structure, consisting of four blocks. These blocks utilize a multi-head self-attention mechanism, which calculates the relationship between each pixel and all other pixels in the image, effectively capturing image features from local to global levels. Since Blocks 3 and 4 are rich in high-level semantic information and global context, the DenseASPP module is used here to supplement the receptive field. Following this, the AFPN employs a progressive feature fusion strategy, effectively integrating feature maps from different blocks. AFPN is added after each block to handle and fuse multi-scale features. In the decoder, each Transformer module is connected to a multilayer perceptron (MLP), typically consisting of two fully connected layers and an activation function, to further process the features. Initially, the lightweight MLP decoder up-samples the low-resolution feature maps to match the high-resolution feature maps. Then, the MLP is used to fuse and process these feature maps. Finally, another MLP layer predicts the mask of all fused features.
DenseASPP Module: To achieve quick response times, real-time semantic segmentation often employs lightweight backbones. However, due to the limited number of convolutional blocks, lightweight backbones can suffer from insufficient receptive fields, which can be critical in certain scenarios. Tasks like water body extraction rely heavily on a substantial receptive field. ASPP [33] ingeniously combines atrous convolution with spatial pyramid pooling to achieve a larger multi-scale receptive field with the same number of parameters as regular convolutions. It then performs global average pooling on feature maps of different sizes and upsamples them to match the original input feature map size, which inspired us to incorporate these elements into SegFormer.
For tasks like water body extraction that require robust feature representation, we introduced the concept of dense connections from DenseNet [34]. This significantly improves segmentation accuracy and edge clarity. In DenseNet, the input of each layer includes the output of all previous layers, which significantly reduces the number of parameters, enhances feature representation efficiency, and improves training efficiency by enabling direct gradient backpropagation. Combining these ideas, we developed DenseASPP to address the limitations in extracting water bodies of different scales, thus enhancing model accuracy and convergence speed.
The main structure of this module includes:
(1)
3 × 3 atrous convolution layers with different dilation rates (3, 6, 12, 18, 24), which expand the receptive field for multi-scale and long-distance spatial information fusion and feature extraction.
(2)
Dense connections between different feature layers to promote feature reuse, improving target edge accuracy and clarity during segmentation.
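A back-of-the-envelope calculation shows why these dilation rates matter for water body extraction. A 3 × 3 convolution with dilation d covers 2d + 1 pixels per axis, and stacking dilated layers serially grows the field by 2d per layer. This is only an illustrative receptive-field estimate for one serial path; the actual DenseASPP wiring is densely connected, so intermediate paths yield a spectrum of receptive-field sizes between these extremes.

```python
def rf_3x3_dilated(d: int) -> int:
    """Receptive field (per axis) of one 3x3 conv with dilation d."""
    return 2 * d + 1

def stacked_rf(dilations) -> int:
    """Receptive field of serially stacked 3x3 dilated convs:
    each layer adds 2*d pixels on top of the previous field."""
    rf = 1
    for d in dilations:
        rf += 2 * d
    return rf

rates = (3, 6, 12, 18, 24)
print([rf_3x3_dilated(d) for d in rates])   # [7, 13, 25, 37, 49]
print(stacked_rf(rates))                    # 127
```

A single dilated layer thus sees at most 49 pixels across, while the full stack can cover a 127-pixel field, which helps explain the module's ability to fuse long-distance spatial information for wide rivers and lakes.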
AFPN Module: Water body extraction, a task demanding high precision, often requires identification across different scales. The AFPN enhances the model’s multi-scale fusion capability and strengthens important features. Initially, we obtain feature maps from different blocks of SegFormer, with low-level features input into the feature pyramid network to form fused features according to the framework design. Subsequently, higher-level features are progressively incorporated into the fusion process, culminating with the highest-level features. This strategy effectively prevents significant semantic gaps during feature fusion, thereby reducing the loss or degradation of high-level semantic information during propagation and interaction.
When fusing features at different scales, this progressive feature fusion strategy effectively leverages the detailed information captured by the convolutional network at various levels (including non-adjacent levels). It avoids information conflicts arising from complex scenes or significant variations in water body shape and size, thereby improving the accuracy of water body extraction.
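The progressive fusion order described above can be sketched at the shape level as follows. This is only a structural illustration under simplifying assumptions: the real AFPN uses learned convolutions and adaptive spatial fusion weights, whereas this sketch uses plain averaging with nearest-neighbour upsampling, and it assumes the channel counts of all blocks have already been unified (e.g., by 1 × 1 convolutions).

```python
import numpy as np

def progressive_fuse(features: list[np.ndarray]) -> np.ndarray:
    """Progressively merge (C, H, W) feature maps ordered from the
    finest (lowest-level) to the coarsest (highest-level) block.
    Lower-level maps are fused first; each coarser map is upsampled
    (nearest neighbour) to the finest resolution and averaged in, so
    the most semantic features join the fusion last."""
    fused = features[0].astype(np.float32)
    for f in features[1:]:
        scale = fused.shape[1] // f.shape[1]
        up = f.repeat(scale, axis=1).repeat(scale, axis=2).astype(np.float32)
        fused = (fused + up) / 2.0
    return fused

# Feature maps such as SegFormer's four blocks might produce:
feats = [np.random.rand(8, 64, 64), np.random.rand(8, 32, 32),
         np.random.rand(8, 16, 16), np.random.rand(8, 8, 8)]
print(progressive_fuse(feats).shape)  # (8, 64, 64)
```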

2.3.4. Evaluation Index

Water body extraction can be considered a binary classification problem, where the predicted result is either water or non-water. Based on the true categories and the actual predicted results, the outcomes can be classified as follows: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). To evaluate the results of urban water body extraction from VHR images, we used metrics such as precision, recall, intersection over union (IoU), and F1-score, calculated as follows:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
IoU = TP / (TP + FP + FN)
F1-score = 2 × (Precision × Recall) / (Precision + Recall)
Precision characterizes the accuracy of predicting correctly positive samples. Recall reflects the model’s ability to recognize positive samples; a higher recall indicates a stronger ability of the model to recognize positive samples. Both the F1-score and IoU provide a comprehensive overview of overall performance, where higher values imply greater accuracy in prediction results.
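The four formulas above can be computed directly from binary prediction and ground-truth masks, as the short sketch below shows (the `seg_metrics` helper name is ours, not the paper's):

```python
import numpy as np

def seg_metrics(pred: np.ndarray, truth: np.ndarray) -> dict:
    """Per-pixel binary metrics for water (1) vs non-water (0) masks."""
    tp = np.sum((pred == 1) & (truth == 1))  # water predicted as water
    fp = np.sum((pred == 1) & (truth == 0))  # non-water predicted as water
    fn = np.sum((pred == 0) & (truth == 1))  # water missed by the model
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    iou = tp / (tp + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "iou": iou, "f1": f1}

pred  = np.array([[1, 1, 0, 0]])
truth = np.array([[1, 0, 1, 0]])
m = seg_metrics(pred, truth)
print(m)  # precision 0.5, recall 0.5, iou ~0.333, f1 0.5
```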
In addition to evaluating model accuracy, we also used FLOPs (floating point operations) and the number of parameters to quantitatively assess the performance of different models. FLOPs serve as a metric of algorithmic or model complexity, reflecting the computational workload; smaller FLOPs values indicate lower computational cost and greater efficiency.

3. Results

3.1. Comparison of Extraction Results of Different Methods

To evaluate the performance of the Ad-SegFormer model in water body extraction, we compared it with the FCN, BiSeNet, DeepLabV3, Swin Transformer, and SegFormer models. We trained and tested these models using the same dataset, training parameters, and experimental environment. To enhance the models’ ability to capture details and boundaries of water bodies, we selected the crafted 1024 × 1024-pixel dataset (comprising 760 training samples and 190 validation samples) as our training data. The experimental setup included an Nvidia GeForce RTX 3060 graphics card and an Intel Core i7-11700F CPU, with a total of 160,000 training iterations. Due to limitations in computational resources and hardware conditions, we opted for ResNet50 as the backbone for the convolutional models, the tiny version for the Swin Transformer model, and the mit-b3 version for the SegFormer model. These models utilized pre-trained weights from the ImageNet dataset, which comprises millions of images spanning various categories and serves as a crucial benchmark in computer vision. By learning from a vast and diverse set of natural images, models acquire a universal feature representation capability, enhancing their performance and generalization across tasks such as image recognition. To quantitatively compare the extraction performance of these models on urban water bodies, the evaluation results of each model are shown below.
The evaluation results in Table 2 clearly demonstrate that, under the same conditions, attention-based models such as Swin Transformer, SegFormer (mit-b3), and our focus model Ad-SegFormer significantly outperform convolution-based models in urban water body extraction tasks. For instance, in terms of precision, Swin Transformer and Ad-SegFormer achieved 97.92% and 96.25%, respectively, while FCN and BiSeNet reached only 87.22% and 86.99%. Compared to SegFormer and Swin Transformer, Ad-SegFormer excels across multiple evaluation metrics; for example, its recall and IoU reached 96.60% and 95.59%, surpassing SegFormer’s 95.44% and 94.77%. Additionally, Ad-SegFormer maintains high accuracy with a lower parameter count (52.48 M) and computational complexity (303.99 GFLOPs) compared to Swin Transformer’s 59.83 M parameters and 798.73 GFLOPs. This indicates that Ad-SegFormer not only offers superior accuracy but also provides an efficient and cost-effective solution for large-scale applications.
Figure 4 provides an intuitive display of each model’s extraction results across various water body scenarios. FCN, BiSeNet, and DeepLabV3, as typical CNN models, demonstrate a certain level of accuracy in extracting the overall contours of larger water bodies (see the first and second rows in Figure 4). However, convolutional models underperform in boundary detail and small water body recognition, especially FCN and BiSeNet, which exhibit incomplete extraction and even omissions of smaller water bodies. These convolutional models also tend to misclassify and blur boundaries in complex backgrounds, such as vegetation or buildings surrounding water bodies, likely due to their limited receptive field that restricts the ability to fully capture global features. In contrast, SegFormer and Swin Transformer demonstrate higher accuracy in extracting small water bodies and delineating clear boundaries compared to convolutional models. These Transformer-based models effectively handle multi-scale features through self-attention mechanisms, resulting in more complete boundary processing. However, Swin Transformer occasionally exhibits blurred boundaries when processing small areas and closely adjacent water body regions, indicating some limitations in fine detail extraction.
Ad-SegFormer further enhances the receptive field and multi-scale feature fusion capabilities, exhibiting notable advantages in identifying small water bodies and maintaining precise boundary integrity (see the third and fourth rows in Figure 4). While maintaining similar floating-point operations and parameter counts as SegFormer, Ad-SegFormer achieves higher extraction accuracy and optimizes computational resource consumption.
In the last two rows of Figure 4, each model’s ability to handle water body regions that closely resemble urban vegetation or are easily confused with shadows is shown to varying degrees. Building on the above analysis, it can be observed that models other than FCN and BiSeNet exhibit a certain level of recognition capability in these complex backgrounds. This improvement is likely due to the careful consideration of such challenging factors in urban water body extraction when constructing the CDUWD samples, with precise annotations that enable higher recognition accuracy in confusing scenarios. This further demonstrates that high-quality dataset annotations provide a more reliable foundation for complex urban water body extraction tasks.
Overall, Transformer models demonstrate superior capabilities in capturing the diversity and complexity of urban water bodies compared to traditional CNN models, especially in multi-scale and detailed boundary extraction. Through architectural enhancements, Ad-SegFormer achieves optimal performance in small water body recognition, boundary clarity, and complex background differentiation, making it the preferred choice for various water body extraction tasks in this study.

3.2. Key Parameter Analysis of the Models

During the urban water body extraction process, various factors can impact the accuracy of prediction results. This study, utilizing the Ad-SegFormer model, primarily examines two critical factors: data size and input augmentation. By altering the inputs of these two factors while keeping other settings at their defaults, we aim to explore the effectiveness of the multi-scale attention mechanism in urban water body extraction. Table 3 shows the key factors and associated descriptions for model training.
We considered these factors affecting water body extraction and combined them for training Ad-SegFormer; the experimental setup is consistent with that described in Section 3.1. Table 4 presents the evaluation results.
(1) Data augmentation significantly impacts model performance. For 1024-sized images, applying data augmentation (g2) significantly improved the model’s precision from 96.246 to 97.446, and IoU from 95.585 to 96.176. This indicates that augmentation techniques help enhance the model’s accuracy and consistency when processing larger images. However, for 512-sized images, the performance after augmentation (g4) was slightly lower than that of the non-augmented data (g3), especially in terms of precision and recall. This may be due to augmentation techniques not fully benefiting smaller images and possibly introducing noise.
(2) Secondly, image size plays a crucial role in model performance. Without data augmentation, 512-sized images (g3) outperformed 1024-sized images (g1) across all performance metrics. This suggests that, under the same conditions, smaller images may be more suitable for the current model, possibly because they are easier to process and analyze. When data augmentation was applied, although the 1024-sized images (g2) showed higher precision than the 512-sized images (g4), improvements in other metrics were not as evident and were slightly lower compared to the non-augmented 512-sized images (g3).
Overall, data augmentation significantly benefits larger-sized (1024) images, while smaller-sized (512) images already exhibit excellent performance without augmentation. These findings suggest that different optimization strategies should be adopted for different image sizes to fully utilize the model’s capabilities. This also provides important insights for further optimizing the application of the Ad-SegFormer model in urban water body extraction tasks.

3.3. Evaluation of Extraction Performance in CDUWD

This section delves into evaluating the Ad-SegFormer model’s performance in extracting various types of water bodies. We use Overall Accuracy (OA) as the primary metric, averaging the model’s classification accuracy across all categories within each subset of the dataset. The detailed extraction results are summarized in Table 5.
The research findings indicate that the Ad-SegFormer model demonstrates outstanding performance in extracting urban water bodies, with an average OA exceeding 98%. It excels in accurately extracting main rivers, lakes, and small water bodies, each with an OA surpassing 98%, while for small rivers and other water bodies its average OA remains above 97%. Additionally, the model exhibits a lower rate of misidentification, reducing the misjudgment of non-water elements. Consequently, the remarkable accuracy of the Ad-SegFormer model in urban water body extraction provides reliable technical support for comprehensive monitoring and protection of urban water bodies.

3.4. Evaluation of Extraction Performance on a Public Dataset

To compare the effectiveness of our algorithm on both a public dataset and our newly created dataset, we selected a water body segmentation dataset available on Kaggle (https://www.kaggle.com/datasets/kaoyuu/kaoyuuu/data, accessed on 10 October 2024) as a benchmark. This dataset has a high spatial resolution and primarily focuses on urban/town scenes, making it comparable to the CDUWD dataset developed in this study. This selection enables a more comprehensive evaluation of the algorithm’s performance across datasets with similar characteristics. The Kaggle dataset consists of 1000 samples, each with a resolution of 492 × 492 pixels. For consistency, we applied a thresholding technique to binarize the original labeled images, with pixels labeled as water set to a value of 1 and background pixels to a value of 0. The data were split into training, validation, and test sets with an 8:1:1 ratio.
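The binarization and 8:1:1 split described above can be sketched as follows; the 127 threshold for 8-bit label images and the seeded shuffle are assumed details, not taken from the Kaggle dataset's labelling scheme.

```python
import numpy as np

def binarize_and_split(labels, seed=0):
    """Binarize 8-bit label images (water -> 1, background -> 0) and split
    sample indices 8:1:1 into train/val/test sets.
    The threshold of 127 is an assumed value for 8-bit masks."""
    masks = [(np.asarray(lbl) > 127).astype(np.uint8) for lbl in labels]
    idx = np.random.default_rng(seed).permutation(len(masks))
    n_train, n_val = int(0.8 * len(masks)), int(0.1 * len(masks))
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return masks, train, val, test
```

Applied to the 1000 Kaggle samples, this yields 800 training, 100 validation, and 100 test images.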
Experiments on this public dataset were conducted following the same methodology used for the CDUWD dataset (Section 3.1). This cross-dataset validation not only verifies the robustness of our algorithm in extracting urban water bodies but also highlights its adaptability across different high-resolution datasets.
Table 6 shows that Swin Transformer and Ad-SegFormer perform best on the Kaggle dataset. Swin Transformer has the highest Precision (95.35%), while Ad-SegFormer excels in Recall (93.42%) and IoU (89.41%), resulting in the top F1-score (94.29%). This suggests that attention-based models, particularly Ad-SegFormer, are more effective for accurate and comprehensive urban water body extraction.
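For reference, the four reported metrics follow directly from the pixel-level confusion counts; a minimal sketch for binary masks:

```python
import numpy as np

def binary_metrics(pred, truth):
    """Precision, Recall, IoU and F1 for binary water masks (1 = water).
    Assumes at least one predicted and one reference water pixel."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    tp = np.sum(pred & truth)    # water predicted and present
    fp = np.sum(pred & ~truth)   # false alarms
    fn = np.sum(~pred & truth)   # missed water
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    iou = tp / (tp + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, iou, f1
```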
The comparison of water body extraction results on the Kaggle dataset (Figure 5) highlights the performance differences among the SegFormer, Swin Transformer, and Ad-SegFormer models. All three detect water bodies with high accuracy, but Ad-SegFormer delivers superior boundary precision and detail capture, particularly for narrow and irregular water bodies. SegFormer shows limited capability along complex boundaries, while Swin Transformer offers some improvement. Both baselines, however, struggle to distinguish closely adjacent water bodies, often merging them into a single entity. For Swin Transformer, this limitation may stem from its reliance on local window attention, which restricts its ability to capture global dependencies and leads to insufficient boundary distinction between closely adjacent water regions. SegFormer, despite its hierarchical structure, lacks explicit multi-scale feature fusion and edge-detail enhancement mechanisms, so it tends to overlook subtle differences between adjacent water bodies during complex boundary segmentation. By incorporating the DenseASPP and AFPN modules, Ad-SegFormer achieves finer multi-scale feature fusion and global attention capture. These results indicate that Ad-SegFormer adapts and generalizes well across datasets in urban water body extraction tasks, showing robust performance especially in applications requiring detailed segmentation.

3.5. Mapping of Urban Water Bodies in Chengdu

Through comparative analysis of the experimental results on the CDUWD dataset, it is clear that the Ad-SegFormer model excels at accurately extracting urban water bodies with precise edge delineation. This accuracy and efficiency make Ad-SegFormer particularly suitable for large-scale urban water extraction tasks. We analyzed the key performance factors affecting SegFormer and selected the optimal parameter combination for Ad-SegFormer: the mit-b3 version, trained on 1024 × 1024-pixel samples with data augmentation, using the best-performing weight file from training for the final water body prediction. During the prediction phase, the VHR imagery of Chengdu was divided into 1024 × 1024-pixel patches with a 20% overlap to minimize edge inconsistencies. The final extraction results for Chengdu's urban water bodies are shown in Figure 6.
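The overlapping tiling scheme can be sketched as follows; shifting the final row and column flush with the image edge (rather than padding the image) is an assumed implementation detail.

```python
def axis_origins(length, tile, stride):
    """Tile origins along one axis; the final tile is shifted back so it
    stays flush with the image edge instead of running past it."""
    last = max(length - tile, 0)
    origins = list(range(0, last + 1, stride))
    if origins[-1] != last:
        origins.append(last)
    return origins

def tile_origins(width, height, tile=1024, overlap=0.2):
    """Top-left corners of prediction tiles with the stated 20% overlap."""
    stride = max(1, int(tile * (1 - overlap)))  # 819 px step for 1024 tiles
    return [(x, y) for y in axis_origins(height, tile, stride)
                   for x in axis_origins(width, tile, stride)]
```

Overlapping predictions at the 20% margins can then be blended (e.g., by averaging class probabilities) to suppress seams between adjacent patches.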
Based on the extraction results in Figure 6, the large map on the left illustrates the distribution of water bodies in Chengdu. The city’s boundaries and the concentration of water bodies are clearly shown, with major rivers and water systems extending along main roads and green belts, forming distinct distribution patterns. Using the Ad-SegFormer model combined with the dataset constructed in this study, expansive water bodies, such as wide urban rivers and large lakes, are accurately delineated, demonstrating excellent extraction performance.
The smaller panels on the right provide a detailed view of different regions and water body types in Chengdu. Figure 6a,b,d show the extraction results for the main rivers, tributaries, and small water bodies in the urban central area. Figure 6c presents the extraction result for the lakes in the green park area. Figure 6e shows the extraction results for the lakes and small water bodies at the peri-urban junction. Figure 6f shows the extraction result for other small water bodies in the rural areas outside the central urban area of Chengdu. As Figure 6f demonstrates, even in rural areas where water bodies are sparsely distributed and interference is pronounced, the Ad-SegFormer model still achieved relatively good extraction results.
In summary, the combination of the dataset constructed in this study and the Ad-SegFormer model exhibits remarkable effectiveness in extracting water bodies in Chengdu. It not only precisely outlines the vast water bodies but also effectively discerns a variety of complex water features, highlighting the dataset’s comprehensive applicability and the Ad-SegFormer model’s robustness and adaptability in urban water body analysis.

4. Discussion

4.1. Advantages of Transformer Models and Dataset Impact

In this study, we developed the open-source CDUWD dataset, featuring diverse water body types and representative urban characteristics, to support reliable model training and performance evaluation. Through comprehensive experiments, we compared traditional CNN-based models with advanced Transformer-based architectures. The results clearly demonstrate the superiority of Transformer models over CNNs in urban water body extraction, particularly due to their self-attention mechanism, which effectively handles multi-scale features and complex urban backgrounds. This enables Transformer models to achieve stronger generalization capabilities, performing well across varied urban environments. Transformer models like Ad-SegFormer excel in capturing small water bodies, delineating clear boundaries, and ensuring segmentation integrity, which are essential for applications involving diverse water body shapes and scales.
The successful application of the CDUWD dataset has not only validated its effectiveness but has also enhanced model accuracy and robustness in recognizing urban water bodies. High-quality datasets, like CDUWD, are essential for advancing urban water body research and applications. To address the morphological complexity of urban water bodies, we prioritized receptive field expansion and multi-scale feature fusion within the Ad-SegFormer model, enabling it to handle issues like large-scale variations, blurred boundaries, and small water body omissions often encountered in VHR remote sensing.

4.2. Limitations and Future Directions

This study successfully created the CDUWD dataset and developed the Ad-SegFormer model, demonstrating the model’s effectiveness in accurately identifying water bodies across diverse urban contexts. Additionally, our findings indicate that data augmentation techniques, such as random cropping, rotation, and brightness adjustment, help maintain high model accuracy without significantly increasing computational costs. However, certain limitations provide avenues for future research. One such limitation is the reliance on a single data source; the current approach uses only optical remote sensing imagery, which may reduce effectiveness in complex or obstructed urban environments. For future research, we aim to integrate LiDAR and SAR data, as LiDAR’s 3D terrain detail can improve boundary precision, and SAR’s all-weather capability aids in recognition under challenging conditions.
We also recognize that in extreme cases, such as confusion between water bodies, vegetation, and shadows, model performance may be impacted. Future research could explore combining additional remote sensing indices, such as NDWI and NDVI, to improve the model’s robustness in these scenarios. Another potential approach is to expand the sample set with images from regions with varied climatic and geographical conditions, which would help further validate the model’s generalization ability. Addressing these limitations would enhance the flexibility, accuracy, and cost-efficiency of urban water body extraction methodologies, extending the applicability of the CDUWD dataset and Ad-SegFormer model to broader urban scenarios and more diverse environmental conditions.
Additionally, to address the trade-offs between accuracy and computational cost for larger datasets, we employed pre-training on public datasets, effectively reducing training time and computational requirements while achieving high accuracy on the target dataset. Moving forward, we plan to utilize existing public water segmentation datasets for pre-training to further lower computational costs, aligning with our primary goal of creating this open-source dataset as a valuable resource for similar tasks.

5. Conclusions

This study has demonstrated that the Ad-SegFormer model significantly outperforms traditional CNN-based models in extracting urban water bodies across diverse environments, achieving an overall accuracy (OA) exceeding 98%. Its strengths lie in precisely delineating boundaries and accurately detecting various water bodies, including main rivers, lakes, and smaller water features, effectively addressing the challenges of complex urban water body extraction tasks. Furthermore, the integration of the DenseASPP and AFPN modules contributes to its robust performance in handling complex backgrounds and small water bodies. This study also thoroughly investigated the impact of data augmentation and sample size variations on urban water body extraction accuracy, verifying the feasibility of applying semantic segmentation for city-scale water body extraction using VHR remote sensing imagery of Chengdu.
The CDUWD dataset developed in this study has also proven to be a valuable resource, encompassing diverse urban water body types and representative environmental features. Its high-quality annotations and inclusion of challenging cases, such as water bodies closely resembling vegetation or shadowed regions, have enabled Ad-SegFormer and other models to achieve enhanced accuracy in complex urban settings. CDUWD’s contributions extend beyond this study, as it provides a solid foundation for future research in urban water body extraction and can support the development of more generalized and robust segmentation models.

Author Contributions

Conceptualization, X.C. and Q.Z.; software, Y.S.; validation, X.C., Q.Z. and J.Y.; formal analysis, J.Y.; investigation, T.W. and B.Z.; resources, Z.S.; data curation, Y.S.; writing—original draft preparation, X.C. and Q.Z.; writing—review and editing, X.C. and Q.Z.; project administration, X.C.; funding acquisition, Z.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Third Comprehensive Scientific Expedition to Xinjiang (2021xjkk1403), National Key Research and Development Program of China (2021YFC1523503), National Natural Science Foundation of China (41971375), Key Research and Development Program of Xinjiang Uygur Autonomous Region, grant number (2022B03001-3), Graduate Quality Engineering Construction Funding Program of Chengdu University of Technology (2024YAL016).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Study area and sample sites in Chengdu City. The colored dots represent 4000 × 4000 sampling sites, with different colors indicating various urban water body types.
Figure 2. Flowchart of the study methodology, including dataset creation, model training, evaluation, and application. In Step 1, the red box delineates the boundaries of the specified sample, while in Step 5, the blue color illustrates the results of urban water bodies extracted within Chengdu.
Figure 3. Ad-SegFormer structure diagram. The area enclosed by the black outline in the lower part of the figure indicates the additional modules we integrated into the SegFormer model.
Figure 4. Extraction results of different models in the task of urban water body extraction, where blue pixels represent the extracted water bodies.
Figure 5. Extraction results of different models on the public Kaggle dataset, where blue pixels represent the extracted water bodies.
Figure 6. Mapping of urban water bodies in Chengdu City. Sub-figures (a–f) show the extraction results across various urban zones, including the urban center (residential, green space, and industrial areas), peri-urban, and rural regions. The findings encompass different water bodies such as major rivers, tributaries, lakes, small water bodies, and others.
Table 1. Water body masking examples and coverage statistics in CDUWD (water bodies indicated by color). The image and mask thumbnails are omitted here.

| Dataset | CDUWD-1 | CDUWD-2 | CDUWD-3 | CDUWD-4 | CDUWD-5 | CDUWD-6 |
|---|---|---|---|---|---|---|
| Count (1024 × 1024) | 192 | 162 | 288 | 78 | 57 | 173 |
| Percentage (%) | 20.2 | 17.1 | 30.3 | 8.2 | 6.0 | 18.2 |
Table 2. Performance evaluation of various models for urban water body segmentation.

| Model | Precision | Recall | IoU | F1-Score | Backbone | Flops (GFLOPs) | Parameters (M) |
|---|---|---|---|---|---|---|---|
| FCN | 87.22 | 90.88 | 80.20 | 89.01 | Resnet50 | 791.90 | 49.48 |
| BiSeNet | 86.99 | 94.04 | 82.44 | 90.37 | Resnet50 | 396.31 | 59.24 |
| DeepLabV3 | 89.72 | 93.96 | 84.83 | 91.79 | Resnet50 | 1079.74 | 68.10 |
| SegFormer | 96.02 | 95.44 | 94.77 | 95.73 | mit-b3 | 286.30 | 47.24 |
| Swin Transformer | 97.92 | 98.40 | 96.39 | 98.16 | tiny | 798.73 | 59.83 |
| Ad-SegFormer | 96.25 | 96.60 | 95.59 | 96.42 | mit-b3 | 303.99 | 52.48 |
Table 3. Key factors in the model training. We only evaluate factors with multiple choices.

| Factor | Choice | Explanation |
|---|---|---|
| Data size | ds1 | 1024 × 1024 pixels (including 760 training samples and 190 validation samples) |
| Data size | ds2 | 512 × 512 pixels (including 3040 training samples and 760 validation samples) |
| Data augmentation | da1 | None |
| Data augmentation | da2 | Random horizontal flip with a 0.2–2.0 ratio, random crop (1024 × 1024 / 512 × 512), fill (1024 × 1024 / 512 × 512) |
Table 4. Evaluation results (“g” represents “Group”).

| Group | Combination | Precision | Recall | IoU | F1-Score |
|---|---|---|---|---|---|
| g1 | ds1-da1 | 96.24 | 96.60 | 95.59 | 96.42 |
| g2 | ds1-da2 | 97.45 | 96.27 | 96.18 | 96.86 |
| g3 | ds2-da1 | 96.93 | 96.88 | 96.16 | 96.90 |
| g4 | ds2-da2 | 97.30 | 97.11 | 94.57 | 97.21 |
Table 5. Evaluation results for each subset of the CDUWD.

| Subset of the Dataset | Type of Water Body | Overall Accuracy |
|---|---|---|
| CDUWD-1 | main rivers | 98.09% |
| CDUWD-2 | small rivers | 97.61% |
| CDUWD-3 | lakes | 98.98% |
| CDUWD-4 | small water bodies | 98.12% |
| CDUWD-5 | other water bodies | 99.16% |
| CDUWD-6 | non-water | 99.86% |
Table 6. Evaluation results of each model using the public Kaggle dataset.

| Model | Precision | Recall | IoU | F1-Score |
|---|---|---|---|---|
| FCN | 94.13 | 87.39 | 82.90 | 90.28 |
| BiSeNet | 94.85 | 92.30 | 88.09 | 93.51 |
| DeepLabV3 | 94.33 | 85.85 | 81.46 | 89.33 |
| SegFormer | 93.64 | 92.63 | 87.36 | 93.12 |
| Swin Transformer | 95.35 | 92.53 | 88.68 | 93.86 |
| Ad-SegFormer | 95.22 | 93.42 | 89.41 | 94.29 |
