Rural Settlement Mapping and Its Spatiotemporal Dynamics Monitoring in the Yellow River Delta Using Multi-Modal Fusion of Landsat Optical and Sentinel-1 SAR Polarimetric Decomposition Data by Leveraging Deep Learning

Jiantao Liu; Yan Zhang; Fei Meng; Jianhua Gong; Dong Zhang; Yu Peng; Can Zhang

doi:10.3390/rs17213512

¹

School of Surveying and Geo-Informatics, Shandong Jianzhu University, Jinan 250101, China

²

National Engineering Research Center for Geoinformatics, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China

³

School of Navigation and Aerospace Internet of Things, Aerospace Information Technology University, Jinan 250200, China

⁴

Qilu Aerospace Information Research Institute, Jinan 250132, China

Remote Sens. 2025, 17(21), 3512; https://doi.org/10.3390/rs17213512

Highlights

What are the main findings?

A TransUNet-based deep learning framework integrating Sentinel-1 SAR and Landsat optical data was proposed for high-precision RSA mapping.
The proposed method achieved superior performance (F1 = 84.77%, mIoU = 85.39%), outperforming other methods in both accuracy and applicability for rural settlement extraction.

What is the implication of the main findings?

Multi-temporal analysis revealed a distinct “west/south dense—east/north sparse” spatial pattern with clustering characteristics and evolving morphological complexity (2002–2023).
The study established a comprehensive RSA dataset for the Yellow River Delta, providing critical baseline data for regional sustainable development planning.

Abstract

The Yellow River Delta (YRD) is a vital agricultural and ecologically fragile zone in China. Understanding the spatial pattern and evolutionary characteristics of Rural Settlements Area (RSA) in this region is crucial for both ecological protection and sustainable development. This study focuses on Dongying, a key YRD city, and compares four advanced deep learning models—U-Net, DeepLabv3+, TransUNet, and TransDeepLab—using fused Sentinel-1 radar and Landsat optical imagery to identify the optimal method for RSA mapping. Results show that TransUNet, integrating polarization and optical features, achieves the highest accuracy, with Precision, Recall, F1 score, and mIoU of 89.27%, 80.70%, 84.77%, and 85.39%, respectively. Accordingly, TransUNet was applied for the spatiotemporal extraction of RSA in 2002, 2008, 2015, 2019, and 2023. The results indicate that medium-sized settlements dominate, showing a “dense in the west/south, sparse in the east/north” pattern with clustered distribution. Settlement patches are generally regular but grow more complex over time while maintaining strong connectivity. In summary, the proposed method offers technical support for RSA identification in the YRD, and the extracted multi-temporal settlement data can serve as a valuable reference for optimizing settlement layout in the region.

Keywords:

rural settlement area; spatiotemporal evolution; Yellow River Delta; TransUNet; multi-modal data fusion

1. Introduction

In recent years, with the rapid economic growth and accelerated urbanization, a series of issues have emerged in rural settlements area (RSA). On the one hand, the swift socio-economic development has attracted a large number of rural laborers to migrate to urban areas, resulting in many RSA becoming “hollow villages”, thereby reducing the efficiency of regional land use [1,2]. On the other hand, the increase in agricultural income has triggered a surge in new construction, expansion, and renovation of rural housing, leading to disordered and low-density expansion of RSA. This has accelerated the loss of arable land, posed potential threats to food security, and adversely affected the regional ecological environment, thus hindering the development of agricultural productivity and the smooth progress of urbanization [3,4,5]. As a typical alluvial plain in the Yellow River Basin, the Yellow River Delta (YRD) is characterized by unique sediment deposition and river course changes, creating a dynamically evolving ecological environment. This makes it a representative area for studying the evolution of RSA under special geological conditions. Timely acquisition of the spatial pattern and spatiotemporal evolution characteristics of RSA in the region is of great significance for ecological protection and sustainable development planning [6,7].

With the continuous advancement of remote sensing and geographic information technologies, extensive achievements have been made in remote sensing data application and recognition methods [8]. Significant progress has also been achieved in the study of the spatiotemporal evolution of RSA in aspects such as settlement morphology, regional types, location, scale, density, and structural systems. Spatial autocorrelation analysis, kernel density estimation, landscape pattern indices, and the nearest neighbor index are widely employed in these studies [9,10]. However, most existing research focuses on short time scales, while long-term spatiotemporal dynamics remain underexplored [11]. In addition, many studies rely on single-source remote sensing data, which presents certain limitations. Optical imagery excels in capturing spectral and spatial details of surface features (such as color, texture, and vegetation coverage) but is susceptible to interference from cloud cover and haze, and is unusable at night [12,13,14]. Nighttime light data are cost-effective and provide broad coverage, but their low resolution may lead to the omission of small settlements, and they are often affected by light overflow and noise. LiDAR data offer high accuracy in capturing 3D structural features of settlements, yet they are costly to acquire and process. Synthetic Aperture Radar (SAR) actively transmits microwaves and receives backscattered signals, offering detailed scattering information of ground objects and functioning regardless of cloud or illumination conditions. Polarimetric decomposition of SAR data can further reveal target scattering mechanisms, providing a more comprehensive description of surface features. However, SAR lacks spectral information, as it only provides echo intensity [15]. Compared to single-source data, multi-source and multi-modal data fusion can enhance the comprehensiveness of land cover extraction. Fusing optical and polarimetric features can effectively mitigate the impact of weather and surface obstructions on RSA extraction [16,17]. Nevertheless, such fusion also increases the complexity of data processing and may raise the cost of data acquisition and analysis.

Methods for extracting RSA mainly include visual interpretation, pixel-based classification, object-oriented classification, machine learning, and deep learning [18,19,20]. Traditional visual interpretation is accurate but inefficient and subjective [21,22]. Pixel-based methods are simple but noisy and ignore spatial context. Object-based analysis improves accuracy but is parameter-sensitive. Machine learning handles high-dimensional data well but requires manual feature engineering. Deep learning achieves top performance automatically but is a black-box and computationally expensive [23,24].

At present, Convolutional Neural Networks (CNNs) and semantic segmentation models (such as U-Net, SegNet, and DeepLab) have shown outstanding performance in settlement extraction, as they can effectively learn high-level features [25,26,27]. Among them, the TransUNet model successfully integrates the strengths of CNNs and Transformers, preserving CNN’s advantage in local feature extraction while leveraging the self-attention mechanism of Transformers to enhance global context modeling, thereby improving performance in segmentation tasks [28]. For example, Aamir et al. used a U-Net-based architecture to identify rural residential areas from high-resolution imagery through image segmentation, predicting the pixels in remote sensing images that represent RSA in the study area, achieving an overall accuracy of 98%, and effectively accomplishing RSA extraction [29]. Similarly, Ye et al. employed Gaofen-2 imagery and a fully convolutional network to recognize and map RSA. By incorporating a Dilated-ResNet and a multi-scale context sub-network (SE module) into the ResNet architecture, their model effectively delineated and differentiated rural residential areas, achieving an overall accuracy of 98% and a Kappa coefficient of 85% [30]. However, most existing studies directly apply general RSA extraction strategies to specific study areas without adequately considering the characteristics of different data sources and regional geographic environments [31,32]. Therefore, there remains a need to explore settlement extraction strategies tailored to the specific properties of remote sensing imagery and the unique conditions of the study area, in order to meet the application requirements of various scenarios.

Based on the above discussion, this study employs Sentinel-1 radar data (processed with H-Alpha polarimetric decomposition to generate three bands: Entropy, Anisotropy, and Alpha) and Landsat optical data as the dataset for RSA recognition. Four advanced deep learning models—U-Net, DeepLabv3+, TransUNet, and TransDeepLab—are comparatively evaluated to identify the most effective model for RSA extraction in the YRD. More specifically, the objectives of this study are to: (i) compare the performance of different deep learning models based on fused polarization and optical features and determine the optimal model for RSA mapping; (ii) generate reliable time-series RSA data for Dongying City in 2002, 2008, 2015, 2019, and 2023; (iii) understand the spatiotemporal evolution characteristics of RSA in Dongying City from 2002 to 2023. The study is application-oriented, aiming to generate reliable multi-temporal settlement data for long-term spatiotemporal analysis. The methodological adaptation ensures data quality and robustness, supporting the subsequent comprehensive analysis.

The remainder of this paper is organized as follows. Section 2 introduces the study area and data sources. Section 3 presents the research methodology. Section 4 provides the RSA extraction results and analysis. Section 5 discusses the main conclusions and offers suggestions for future research.

2. Study Area and Dataset

2.1. Study Area Overview

The modern Yellow River Delta takes Kenli and Ninghai as its apex and extends from the Taoer River mouth in the north to the tributary ditch mouth in the south, covering a total area of approximately 5400 km². About 93% of this area lies within Dongying City, Shandong Province. Dongying City is not only a city at the mouth of the Yellow River but also the central city of the Yellow River Delta, possessing unique geographical significance. Therefore, this research takes Dongying City as the study area for analyzing the evolution of rural settlements in the YRD. Dongying is located in the northern part of Shandong Province, within the YRD region. It administers three districts and two counties: Dongying District, Hekou District, Kenli District, Lijin County, and Guangrao County. Geographically, it lies between 36°55′N and 38°10′N latitude, and 118°07′E and 119°10′E longitude. It is bordered by the Bohai Sea to the east and north, Binzhou City to the west, and Zibo and Weifang cities to the south, covering a total area of 8243 km² (Figure 1). The region features a typical warm temperate continental monsoon climate and is predominantly flat, consisting mainly of plains. Wetlands and saline-alkali land are widely distributed throughout the area, with a low and slightly inclined terrain. As of the end of 2023, the permanent population of Dongying City was approximately 2.206 million, of which 606,900 were rural residents.

Figure 1. Study area overview.

2.2. Dataset

2.2.1. Note on Temporal Analysis and Period Selection

The temporal intervals between our study epochs (2002, 2008, 2015, 2019, and 2023) are irregular. This design was necessitated by the constraints of remote sensing data availability and quality. The years were specifically chosen to coincide with cloud-free, high-quality imagery from comparable Landsat sensors (TM, ETM+ and OLI) to ensure consistency in our land cover classification. The incorporation of Sentinel-1 SAR data was only feasible from 2015 onwards when consistent coverage of the study area became available.

It is important to note that our core analysis is based on calculating the net change in rural settlement area between these discrete epochs. We focus on the absolute change occurring within each specific period (e.g., 2002–2008, 2008–2015) rather than modeling or assuming a linear annualized rate of change. This approach is methodologically robust for assessing long-term trends using multi-temporal satellite imagery, as it prioritizes direct, measurable comparisons between time points over interpolated rates. Consequently, the conclusions drawn from comparing these net changes are valid despite the unequal interval lengths.

2.2.2. Data Source Introduction

The Landsat satellite is equipped with multiple spectral bands, providing data in the visible, near-infrared, and thermal infrared ranges. With a revisit cycle of 16 days, it enables the acquisition of long-term time-series data and is widely used in fields such as land use analysis, agricultural monitoring, and urban development [33,34]. Considering the availability of remote sensing data in the YRD region, we obtained Landsat Level-1 Terrain Precision (L1TP) optical imagery from the United States Geological Survey (USGS) website, selecting scenes that are cloud-free over the study area and the sample preparation region; the relatively high cloud fractions reported for some images were mainly concentrated over the sea. The selected Landsat images are all from path 121, row 34. Specifically, for the years 2002 and 2008, the following Landsat 7 ETM+ and Landsat 5 TM bands were selected: B1 (blue), B2 (green), B3 (red), B4 (Near-Infrared, NIR), B5 (Shortwave Infrared 1, SWIR 1), and B7 (Shortwave Infrared 2, SWIR 2). For the years 2015, 2019, and 2023, the following Landsat 8 OLI bands were selected: B2 (blue), B3 (green), B4 (red), B5 (Near-Infrared, NIR), B6 (Shortwave Infrared 1, SWIR 1), and B7 (Shortwave Infrared 2, SWIR 2). Each downloaded Landsat scene covers parts of Binzhou, Dongying, Zibo, and Weifang cities. Preprocessing steps such as radiometric calibration, atmospheric correction, and clipping to the study area were performed on the imagery. Detailed information about the downloaded images is provided in Table 1.
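Because the band numbering differs between the TM/ETM+ and OLI sensors, the six selected bands have to be put into one common order before scenes from different years can be fed to the same model. The lookup below is a minimal sketch of that bookkeeping; the dictionary and function names are hypothetical, and only the band identifiers themselves come from the text above.

```python
# Band identifiers per sensor family, as listed in the text above.
# Landsat 5 TM and Landsat 7 ETM+ share the same numbering for these bands.
LANDSAT_BANDS = {
    "TM_ETM+": {"blue": "B1", "green": "B2", "red": "B3",
                "nir": "B4", "swir1": "B5", "swir2": "B7"},
    "OLI": {"blue": "B2", "green": "B3", "red": "B4",
            "nir": "B5", "swir1": "B6", "swir2": "B7"},
}

# Sensor family used for each study year (2002/2008: TM or ETM+, later: OLI).
SENSOR_FAMILY_BY_YEAR = {
    2002: "TM_ETM+", 2008: "TM_ETM+",
    2015: "OLI", 2019: "OLI", 2023: "OLI",
}

# Common channel order applied to every year (an illustrative choice).
COMMON_ORDER = ("blue", "green", "red", "nir", "swir1", "swir2")


def bands_for_year(year):
    """Return the sensor-specific band ids for a year, in the common order."""
    family = SENSOR_FAMILY_BY_YEAR[year]
    return [LANDSAT_BANDS[family][name] for name in COMMON_ORDER]
```

For example, `bands_for_year(2008)` and `bands_for_year(2019)` return different band identifiers that refer to the same six spectral regions.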

Table 1. Information of Selected Landsat Images.

Sentinel-1 is a key Earth observation mission of the European Space Agency (ESA) under the Copernicus Programme. It consists of a constellation of two satellites, Sentinel-1A and Sentinel-1B, launched in 2014 and 2016, respectively. The system is capable of acquiring high-resolution ground images under all weather conditions, including at night and through cloud cover, typically with a spatial resolution of 10 m [35,36]. In this study, the Sentinel-1 SAR images were acquired in Interferometric Wide (IW) swath mode, Single Look Complex (SLC), C-band, with VV and VH polarization, at processing level-1. The images were obtained on 18 August 2015; 21 August 2019; and 24 August 2023.

3. Methodology

3.1. Overall Framework

The RSA extraction workflow for the YRD, based on the TransUNet model and fused multi-source, multi-modal Landsat and Sentinel-1 imagery, is illustrated in Figure 2. The main steps are as follows: (i) data preprocessing and multi-source data fusion; (ii) construction of the RSA sample dataset; (iii) determination of the optimal RSA extraction strategy; (iv) temporal RSA extraction and evolution analysis.

Figure 2. Workflow.

3.2. Polarimetric Decomposition of Sentinel-1 Radar Data

Sentinel-1 data primarily provides single or dual-polarized backscattering coefficients, which reflect the intensity of radar wave backscatter from a target. Polarimetric decomposition, on the other hand, analyzes the polarimetric scattering matrix to extract the scattering mechanisms of targets, thereby offering a more comprehensive characterization of their scattering properties [37,38]. Based on theoretical foundations and application characteristics, polarimetric decomposition methods can generally be classified into two main categories: model-based and model-free. Model-based decomposition relies on specific physical scattering models to separate radar signals into known scattering mechanisms, such as Freeman–Durden or Yamaguchi decomposition [39,40]. Model-free decomposition, however, uses mathematical approaches to extract scattering information without relying on explicit physical models—for instance, Cloude–Pottier and Pauli decomposition [41]. Model-based methods are suitable when the scattering mechanisms are well understood, whereas model-free methods are more general and adaptable to complex land cover types and diverse scattering behaviors.

In this study, we apply the H-Alpha decomposition method in a dual-polarized mode to process Sentinel-1 SAR data, generating polarimetric datasets comprising three bands: Entropy (H), Anisotropy (A), and Alpha angle. The H-Alpha decomposition is a core component of the Cloude–Pottier approach. It effectively distinguishes between volume scattering and surface scattering in SAR imagery and can be applied to both dual- and quad-polarized SAR data, which makes it usable with the dual-polarized products of Sentinel-1 [42]. This is particularly useful in analyzing scattering mechanisms in RSA, including vegetation, farmland, and buildings. For fully polarimetric SAR data, the H-Alpha decomposition begins by constructing a polarimetric coherence matrix (T) or covariance matrix (C). An eigenvalue decomposition is then applied to the coherence matrix T, yielding its eigenvalues (λ1, λ2, λ3) and corresponding eigenvectors (v1, v2, v3). The eigenvalues represent the distribution of scattering energy, while the eigenvectors indicate the orientation of the scattering mechanisms. Key parameters such as entropy (H), alpha angle (α), and anisotropy (A) are then derived from these components [43].
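To make the eigenvalue-based derivation concrete, the following sketch computes entropy, anisotropy, and mean alpha angle for a single pixel in the dual-polarized case, where the matrix is 2 × 2 and has two eigenvalues. It assumes one common convention (base-2 entropy, anisotropy as the normalized eigenvalue difference, alpha from the first eigenvector component); the function name is hypothetical and this is not a reproduction of the SNAP implementation used in the study.

```python
import math


def h_a_alpha_dual_pol(c11, c22, c12):
    """Entropy, anisotropy and mean alpha (degrees) for one pixel.

    Input is the 2x2 Hermitian covariance matrix
    [[c11, c12], [conj(c12), c22]], where c11 and c22 are real channel
    powers and c12 is the complex cross-correlation.
    """
    # Closed-form eigenvalues of a 2x2 Hermitian matrix.
    half_trace = 0.5 * (c11 + c22)
    radius = math.sqrt((0.5 * (c11 - c22)) ** 2 + abs(c12) ** 2)
    lam1 = half_trace + radius              # larger eigenvalue
    lam2 = max(half_trace - radius, 0.0)    # clamp round-off below zero
    total = lam1 + lam2
    if total <= 0.0:
        return 0.0, 0.0, 0.0  # no signal; zeros by convention (assumption)

    def alpha_deg(lam):
        # An unnormalised eigenvector for eigenvalue lam is (c12, lam - c11);
        # alpha is the arccos of its normalised first component magnitude.
        x, y = abs(c12), abs(lam - c11)
        return math.degrees(math.acos(min(1.0, x / math.hypot(x, y))))

    if abs(c12) == 0.0:
        # Diagonal matrix: the eigenvectors are the coordinate axes.
        alpha1, alpha2 = (0.0, 90.0) if c11 >= c22 else (90.0, 0.0)
    else:
        alpha1, alpha2 = alpha_deg(lam1), alpha_deg(lam2)

    probs = (lam1 / total, lam2 / total)
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    anisotropy = (lam1 - lam2) / total
    mean_alpha = probs[0] * alpha1 + probs[1] * alpha2
    return entropy, anisotropy, mean_alpha
```

Two equal, uncorrelated channels give maximum entropy (H = 1, A = 0), whereas a single dominant scattering mechanism gives H = 0 and A = 1. In practice the matrix elements would be spatially averaged over a window before the decomposition.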

The Sentinel-1 data preprocessing was mainly performed in SNAP Desktop v9.0.0 software. The workflow included orbit refinement using precise orbit files, thermal-noise removal based on metadata-provided lookup tables, radiometric calibration to convert pixel values to backscattering coefficient σ⁰ (σ⁰ = |DN_i|²/A_i², where DN_i and A_i are pixel gray value and calibration parameter, respectively), speckle filtering using the Refined Lee filter to reduce noise while preserving radiometric and texture information, and terrain/geocoding correction using a 30 m SRTM DEM to correct geometric distortions, with resampling to 10 m spatial resolution. The detailed processing workflow and result of the Sentinel-1 SAR H-Alpha decomposition are illustrated in Figure 3. The specific Cloude-Pottier H/α/A decomposition implemented in this study for dual-polarized Sentinel-1 data requires SLC products that contain phase information to generate the covariance matrix [C2]. For fully polarimetric data, the same decomposition would operate on a 3 × 3 coherence matrix [T3] [44,45].
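The radiometric calibration formula quoted above can be illustrated for a single pixel as follows. This is a sketch only: the function names are hypothetical, and the conversion to decibels is a commonly used optional step that the text does not describe.

```python
import math


def sigma0_linear(dn, cal):
    """Backscattering coefficient sigma0 = |DN|^2 / A^2 for one pixel.

    dn  : pixel value (complex for SLC data, or a real amplitude)
    cal : calibration parameter A for that pixel (from the product metadata)
    """
    if cal <= 0:
        raise ValueError("calibration parameter must be positive")
    return abs(dn) ** 2 / cal ** 2


def to_db(value, floor=1e-10):
    """Convert a linear power value to decibels (assumed extra step).

    Values below `floor` are clipped to avoid taking log10 of zero.
    """
    return 10.0 * math.log10(max(value, floor))
```

For a complex pixel value of 3 + 4j and a calibration parameter of 5, the magnitude is 5, so the linear backscattering coefficient is 1.0, which corresponds to 0 dB.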

Figure 3. Workflow and Results of Sentinel-1 SAR Polarimetric Decomposition.

3.3. Data Fusion

In this study, the dataset used for RSA identification consists of Sentinel-1 radar data processed via H-Alpha polarimetric decomposition (including three bands: Entropy, Anisotropy, and Alpha) and Landsat optical data. Before performing multi-source data fusion, the radar data are resampled to a spatial resolution of 30 m. Additionally, both types of data are clipped to ensure a consistent spatial extent that fully covers the study area of Dongying City and the distribution of the sample set. For data fusion, the three polarimetric features derived from SAR were concatenated with the Landsat spectral bands to form the multi-source input dataset. Given the significant differences in pixel values between the different bands of various image sources, Z-score normalization is applied in this study to reduce band discrepancies and enhance data consistency prior to fusion [46]. Specifically, normalization was performed on a per-scene and per-band basis, using the mean and standard deviation of each individual band within the same scene. This procedure was carried out independently for radar and optical data, and separately for each temporal phase, ensuring that no cross-modal or cross-temporal statistics were involved. In this way, potential data leakage was effectively avoided.
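The per-scene, per-band normalization and channel concatenation described above can be sketched as follows. Bands are represented as flat lists of pixel values to keep the example dependency-free; the function names are hypothetical, and the use of the population standard deviation and the handling of constant bands are assumptions.

```python
from statistics import fmean, pstdev


def zscore_band(values):
    """Standardise one band of one scene using only its own statistics."""
    mean = fmean(values)
    std = pstdev(values, mu=mean)  # population standard deviation (assumption)
    if std == 0:
        return [0.0 for _ in values]  # constant band mapped to zeros (assumption)
    return [(v - mean) / std for v in values]


def fuse_scene(optical_bands, sar_bands):
    """Normalise every band independently, then stack optical + SAR channels.

    Each argument is a list of bands from the SAME acquisition year, so no
    statistics are shared across modalities or across time.
    """
    optical = [zscore_band(band) for band in optical_bands]
    sar = [zscore_band(band) for band in sar_bands]
    return optical + sar
```

With six Landsat bands and the three polarimetric bands (Entropy, Anisotropy, Alpha), `fuse_scene` returns nine channels for each fused year.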

3.4. Samples

The “Third Law of Geography” asserts that “the more similar the geographical environments, the more similar the geographic features” [47]. This fundamental principle suggests that locations with comparable environmental conditions tend to exhibit analogous spatial characteristics and patterns. Given the high similarity in both geographical environment and remote sensing characteristics between the study area (Dongying City) and the surrounding Yellow River Delta region, we prepared the sample set from the fused optical and Sentinel-1 radar data of the 2015, 2019, and 2023 images only. To enhance sample diversity and reduce the workload of manual interpretation, typical areas with concentrated RSA were selected from the downloaded Landsat scene for sample preparation. In this study, RSA is defined as contiguous built-up areas dominated by rural residences, including residential clusters, courtyards, and associated infrastructure (e.g., village roads, small squares); this definition covers all identifiable RSAs, including natural villages and urban villages. Consequently, the sample set extended beyond administrative boundaries and covered parts of Dongying City, Zibo City, Binzhou City, and Weifang City.

For each year, four non-overlapping 1024 × 1024 pixel sample tiles were extracted, and labels were manually created through visual interpretation. Three of the four tiles were used for model training, and one was reserved for accuracy assessment. Each training tile was further divided into multiple 512 × 512 pixel patches, generating approximately 200 patches per tile and a total of 1800 patches across the three years. From these, 1500 patches were dynamically selected for training, while 300 patches were used for validation. The remaining 1024 × 1024 pixel tiles (one per year) were reserved for testing, ensuring that the training/validation and test sets were fully independent. Figure 4 shows the spatial distribution of the selected sample tiles and examples of sample labels, where white areas represent RSA (label value of 1) and black areas represent non-settlement regions (label value of 0).
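The text does not state how the 512 × 512 patches were cut from each 1024 × 1024 tile. One common way to obtain many patches from a single tile is an overlapping sliding window, sketched below; the function names and the stride values are hypothetical and are not the authors' documented settings.

```python
def window_origins(tile_size, patch_size, stride):
    """Top-left (row, col) offsets of all patches fully inside a square tile."""
    if patch_size > tile_size or stride <= 0:
        raise ValueError("need patch_size <= tile_size and stride > 0")
    steps = range(0, tile_size - patch_size + 1, stride)
    return [(row, col) for row in steps for col in steps]


def crop(tile, row, col, patch_size):
    """Cut one patch from a 2-D tile stored as a list of rows."""
    return [line[col:col + patch_size] for line in tile[row:row + patch_size]]
```

Without overlap (stride 512) a 1024 × 1024 tile yields only 4 patches, so a much smaller stride is needed to reach a count near 200. As a purely arithmetic illustration, a stride of 39 pixels gives 14 positions per axis and therefore 196 patches per tile.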

Figure 4. Sample block distribution map and schematic diagram.

3.5. Deep Learning Model Training

3.5.1. TransUNet Model

TransUNet is a deep learning model that combines the Transformer architecture with the U-Net structure, offering both global modeling capabilities and local feature extraction abilities [48,49]. It has achieved excellent performance in segmentation tasks. The model architecture is primarily composed of an encoder and a decoder (Figure 5).

Figure 5. TransUNet structure.

The encoder part adopts a CNN-Transformer architecture, where CNN is first used as a feature extractor to generate a feature map for the input image. The extracted image features from the CNN feature map undergo a 1 × 1 patch embedding, replacing the patch embedding method from the original image. The Transformer encoder consists of multiple identical Transformer layers, each containing two sub-layers: self-attention and a feed-forward neural network. The self-attention mechanism captures the relationships between different regions in the image, enabling the model to better understand the global context of the image. The decoder part retains the classic U-Net structure, composed of multiple upsampling and convolution operations. Each upsampling operation enlarges the feature map while preserving the feature information, gradually recovering the image’s spatial resolution. Similarly to traditional U-Net, the decoder uses skip connections to combine low-level and high-level features from the encoder, preserving and transmitting detailed information to the final segmentation result.
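The self-attention sub-layer mentioned above can be illustrated with a minimal single-head scaled dot-product attention over a handful of tokens. This sketch omits the learned query/key/value projections, multiple heads, and layer normalization of a real Transformer layer, and it is not taken from the TransUNet code; it only shows how every output token becomes a weighted mixture of all input tokens.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d)) V for lists of token vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors: global context for this token.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs
```

When all keys are identical, the attention weights are uniform and each output is simply the mean of the value vectors, which is a convenient sanity check.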

By combining the global information modeling of Transformer with the local feature extraction of U-Net, TransUNet achieves higher accuracy in image segmentation tasks, especially in complex tasks that require an understanding of both image details and global relationships, outperforming traditional U-Net.

3.5.2. U-Net Model

The U-Net model features a symmetric encoder-decoder architecture with skip connections, enabling precise pixel-level localization [50]. However, its high computational and memory demands limit its applicability on high-resolution images, and its performance in capturing long-range dependencies remains constrained [51].

3.5.3. DeepLabv3+ Model

DeepLabv3+ enhances segmentation accuracy through an atrous spatial pyramid pooling (ASPP) module and a decoder that integrates multi-level features [52]. While it effectively captures multi-scale contextual information and improves boundary refinement, its complex structure leads to increased parameters and higher hardware requirements. It may still underperform in scenarios with highly intricate edges.

3.5.4. TransDeepLab Model

TransDeepLab combines a Swin Transformer-based encoder with DeepLabv3+’s decoder design, improving global context modeling and long-range dependency capture [53]. It maintains the ASPP module for multi-scale feature extraction and incorporates attention mechanisms for cross-scale feature interaction. Although more efficient than pure Transformer models, it demands larger datasets for training and shows limited gains in small-object segmentation.

3.6. Accuracy Assessment

To evaluate the segmentation performance of the proposed model, four metrics are used: Precision, Recall, F1-score, and Mean Intersection over Union (mIoU). For binary classification images, each pixel has four possible prediction outcomes: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN)—where TP is the number of pixels correctly classified as RSA category, TN is the number of pixels correctly classified as non-RSA, FP is the number of pixels misclassified as RSA, and FN is the number of pixels misclassified as non-RSA.

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{1}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{2}$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3}$$

$$\mathrm{IoU} = \frac{TP}{TP + FP + FN} \tag{4}$$

$$\mathrm{mIoU} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{IoU}_{i} \tag{5}$$

The IoU ranges from 0 to 1, with values closer to 1 indicating better model fit. mIoU is the average of the IoU across all categories. The advantage of mIoU is that it can provide a balanced evaluation of the model’s performance across different categories, especially in datasets with imbalanced category distribution [54,55,56].
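Equations (1)–(5) can be evaluated directly from the four pixel counts, as in the sketch below. It assumes that mIoU averages over the two classes (RSA and non-RSA, so N = 2), with the non-RSA IoU obtained by swapping the roles of the classes; the function names and the zero-denominator handling are illustrative choices.

```python
def _ratio(numerator, denominator):
    """Safe division that returns 0.0 for an empty denominator (assumption)."""
    return numerator / denominator if denominator else 0.0


def segmentation_metrics(tp, tn, fp, fn):
    """Precision, Recall, F1, per-class IoU and mIoU from pixel counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    f1 = _ratio(2 * precision * recall, precision + recall)
    iou_rsa = _ratio(tp, tp + fp + fn)
    # For the non-RSA class, TN plays the role of TP; FP and FN swap roles.
    iou_non_rsa = _ratio(tn, tn + fp + fn)
    miou = (iou_rsa + iou_non_rsa) / 2
    return {"precision": precision, "recall": recall, "f1": f1,
            "iou_rsa": iou_rsa, "iou_non_rsa": iou_non_rsa, "miou": miou}


def f1_from(precision, recall):
    """Equation (3) applied to already computed precision and recall."""
    return _ratio(2 * precision * recall, precision + recall)
```

As a consistency check on the reported figures, `f1_from(89.27, 80.70)` evaluates to about 84.77, which matches the F1-score stated for TransUNet.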

4. Results

4.1. Experimental Settings

The model training in this study adopts the following strategies: (i) training is conducted for 30 epochs; (ii) input images are cropped into 512 × 512 pixel patches, with a batch size of 4 per iteration, which was determined through multiple tests to balance GPU memory constraints and training stability; (iii) the Adam (Adaptive Moment Estimation) optimizer is used for parameter optimization, with an initial learning rate of 1 × 10⁻⁴ to ensure stable convergence, and a ReduceLROnPlateau scheduler is employed to automatically decrease the learning rate when the validation loss plateaus; (iv) Binary Focal Cross-Entropy Loss is employed to address class imbalance; (v) a best-model saving strategy is implemented to retain, during training, the parameters achieving the highest performance on the validation set.
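To show why a focal loss helps with class imbalance, the sketch below implements the binary focal cross-entropy for a list of pixels in plain Python. The focusing parameter `gamma` and the optional class weight `alpha` are hypothetical values chosen for illustration, since the text does not report the settings used; in a TensorFlow workflow the library's built-in loss would normally be used instead of this function.

```python
import math


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=None, eps=1e-7):
    """Mean binary focal loss for 0/1 labels and predicted RSA probabilities.

    Per pixel: -w * (1 - p_t)**gamma * log(p_t), where p_t is the probability
    assigned to the true class. gamma = 0 and alpha = None reduce this to the
    ordinary binary cross-entropy.
    """
    total = 0.0
    for label, prob in zip(y_true, p_pred):
        prob = min(max(prob, eps), 1.0 - eps)     # avoid log(0)
        p_t = prob if label == 1 else 1.0 - prob
        if alpha is None:
            weight = 1.0
        else:
            weight = alpha if label == 1 else 1.0 - alpha
        total += -weight * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)
```

A confidently correct pixel (p_t = 0.9) contributes about 0.105 under plain cross-entropy but only about 0.001 with gamma = 2, so the abundant, easy non-settlement pixels are down-weighted relative to hard ones.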

Additionally, the deep learning library used was TensorFlow 2.17.0. The optimization of model parameters was conducted on the Ubuntu 22.04 operating system with Intel Xeon(R) Gold 6240R CPU and NVIDIA GeForce RTX 4090 with 128 GB memory.

4.2. Accuracy Evaluation Results

To scientifically determine the optimal strategy for extracting RSA information in the YRD, we designed two sets of comparative experiments.

(i): Model Performance Comparison (Fixed Data Source). Using the multisource fused dataset as a unified data source, four models—U-Net, DeepLabv3+, TransUNet, and TransDeepLab—were employed to extract RSA for the years 2015, 2019, and 2023. By quantitatively comparing the models’ performance in terms of Precision, Recall, F1-score, and mIoU, the optimal extraction model was identified.
(ii): Data Source Comparison (Fixed Model). Using the best-performing model selected in Experiment (i) as the unified extraction method, RSA for 2015, 2019, and 2023 were extracted based on Landsat optical imagery, polarization-decomposed radar imagery, and the fused multisource dataset. This experiment aimed to verify the advantage of multisource and multimodal remote sensing data fusion from a data dimension perspective. The accuracy comparison results of the RSA extraction models are shown in Table 2.

Table 2. Accuracy Comparison of RSA Extraction Models.

According to Table 2, under the same data source conditions, the TransUNet model demonstrates the best performance among all evaluated models. It achieves the highest scores across all accuracy metrics, with Precision (89.27%), Recall (80.70%), F1-score (84.77%), and mIoU (85.39%), outperforming the other comparison models on every metric. The U-Net and TransDeepLab models follow in terms of accuracy, while the DeepLabv3+ model performs the worst. Specifically, the mIoU of TransUNet surpasses that of U-Net (83.50%), DeepLabv3+ (80.33%), and TransDeepLab (83.45%) by 1.89, 5.06, and 1.94 percentage points, respectively. This performance advantage supports the superiority of TransUNet for this segmentation task. Therefore, we select TransUNet as the optimal model for RSA identification.

To further verify the advantage of multisource and multimodal data, the same RSA identification model was tested using both fused optical-radar data and single-modal data. The experimental results (as shown in Table 3) reveal that the mIoU of the fused optical-radar dataset exceeds that of using only optical imagery (81.83%) and only radar imagery (77.39%) by 3.56 and 8.00 percentage points, respectively. These accuracy evaluation results demonstrate the effectiveness of multisource data fusion in improving model performance. This advantage primarily stems from the complementary nature of optical and radar imagery: their synergy enables the model to capture more comprehensive land surface features, maintaining stable extraction accuracy across various complex scenarios. In contrast, single-source datasets are limited in information dimensions, leading to relatively restricted performance. Therefore, optical-radar fusion data is adopted as the optimal data source for RSA extraction in this study.

Table 3. Comparison of Extraction Accuracy Using Different Data Sources for RSA.

Based on the results of the two experiments, it is concluded that the combination of the TransUNet model and the fused optical-polarimetric data yields the highest extraction accuracy for RSA. This approach is therefore adopted as the optimal strategy for RSA information extraction in this study. Furthermore, the extracted RSA data based on this strategy serve as the foundation for subsequent feature analysis. Specifically, the TransUNet model and Landsat optical imagery are used to extract RSA for the years 2002 and 2008, while the TransUNet model combined with multi-source data fusion is applied for the years 2015, 2019, and 2023.

4.3. Extraction Results of RSA

The time-series extraction results of RSA in this study are shown in Figure 6. The proposed method demonstrates strong learning capability and effectively identifies the spatial distribution of RSA across the different time periods. Figure 7 shows the remote sensing images and the corresponding extraction results for different types of RSA in a local area of Dongying City in each of the five periods. Owing to the limited image resolution, the results contain some misclassification errors, but the overall classification accuracy remains satisfactory. The results accurately reflect the boundaries of RSA and remain stable across various terrains and land cover types. Validation against ground-truth data confirms that the proposed method retains high recognition accuracy even in complex scenarios. Therefore, for RSA information extraction in the estuarine delta region, the TransUNet model combined with optical and polarimetric feature fusion proves to be both practical and reliable, providing strong data support for the spatiotemporal evolution analysis of RSA.

Figure 6. Time-Series Extraction Results of RSA in Dongying City.

Figure 7. Local Extraction Results of RSA (the upper image shows the remote sensing imagery, while the lower image displays the extraction results overlaid with RSA). Panels (a–c) show groups 1–3.

4.4. Spatiotemporal Evolution Characteristics of RSA

4.4.1. RSA Scale Evolution

To investigate changes in the scale and hierarchy of RSA in Dongying City, this study employed GIS spatial analysis to calculate the area of each RSA patch. RSA were then classified into five scales: Small settlements (≤0.01 km²), Small-to-medium settlements (0.01–0.05 km²), Medium settlements (0.05–0.2 km²), Large settlements (0.2–0.5 km²), and Extra-large settlements (≥0.5 km²). This classification scheme follows common practice in rural settlement studies, where four to five size categories are typically adopted to reflect structural heterogeneity. Considering the actual distribution of settlement sizes in Dongying City, the thresholds were further adjusted to capture local characteristics shaped by both natural and socio-economic conditions. The number of patches and the total patch area in each category were then computed. Detailed results are presented in Figure 8, Table 4 and Table 5.
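The size classification described above can be sketched as follows. The thresholds are those given in the text. The handling of the interior boundaries (0.05 and 0.2 km² assigned to the larger class) is an assumption, since the text specifies only the ≤0.01 and ≥0.5 ends. The function names and the sample areas in the usage note are hypothetical.

```python
from collections import Counter


def size_class(area_km2):
    """Assign a patch area (km²) to one of the five settlement scales."""
    if area_km2 <= 0.01:
        return "Small"
    if area_km2 < 0.05:       # assumed: 0.05 itself falls in "Medium"
        return "Small-to-medium"
    if area_km2 < 0.2:        # assumed: 0.2 itself falls in "Large"
        return "Medium"
    if area_km2 < 0.5:
        return "Large"
    return "Extra-large"


def summarise(areas_km2):
    """Count patches and sum patch area per size class."""
    counts, totals = Counter(), Counter()
    for area in areas_km2:
        label = size_class(area)
        counts[label] += 1
        totals[label] += area
    return counts, totals
```

Calling `summarise([0.005, 0.03, 0.1, 0.1, 0.3, 0.8])` on these made-up patch areas returns one count table and one area table, which correspond in structure to Table 4 and Table 5.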

Figure 8. Histogram of changes in RSA scale: (a) Statistics of the Number of RSA Patches by Scale; (b) Statistics of the Area of RSA Patches by Scale.

Table 4. Statistics of the Number of RSA Patches by Scale (Unit: patches).

Table 5. Statistics of the Area of RSA Patches by Scale (Unit: km²).

In terms of scale, medium settlements dominate RSA in Dongying, ranking first in both number and total area in every period. Large settlements rank second by area, while extra-large settlements remain the least numerous. Small settlements account for a relatively high proportion of patches and exhibit the greatest fluctuations over time. Between 2002 and 2023, the number of small settlements rose from 361 to 703 after dipping to 185 in 2008, whereas large settlements grew steadily from 228 to 275. Extra-large settlements increased from 35 to 53 overall, after peaking at 66 in 2008. Although small settlements expanded in number, they contribute less than 1% of the total settlement area, which has increasingly concentrated in medium- and large-scale settlements. In particular, the expansion of large settlements (from 66.98 to 83.26 km²) and extra-large settlements (from 26.03 to 43.73 km²) has become the main driver of spatial intensification. This shift suggests that Dongying’s RSA system is gradually transforming from a dispersed to a more centralized and large-scale pattern.

4.4.2. RSA Density Evolution

The kernel density analysis method, based on a non-parametric model, transforms discrete sample data into a continuous distribution function while eliminating the influence of unknown factors [57]. This allows for a more intuitive and clear analysis of the dynamic evolution of variables. In this study, the polygon features of RSA across five time periods in Dongying City were converted into point features, and kernel density analysis was conducted on the point data. The kernel density results were classified with the natural breaks method into five density intervals: [0, 0.18), [0.18, 0.51), [0.51, 1.00), [1.00, 2.38), and [2.38, 3.55] (Unit: settlements/km²). These intervals represent low-, low–medium-, medium-, medium–high-, and high-density RSA zones, respectively. The detailed analysis results are illustrated in Figure 9.
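As an illustration of the procedure above, the following sketch estimates a kernel density at one location from settlement points and assigns it to one of the five intervals. The quartic kernel and the bandwidth are assumptions, because neither is stated in the text. Only the interval breaks are taken from it. The function names are hypothetical.

```python
import math
from bisect import bisect_right

# Interval breaks (settlements/km²) reported in the text.
BREAKS = [0.18, 0.51, 1.00, 2.38]
LABELS = ["low", "low-medium", "medium", "medium-high", "high"]


def kernel_density(points, x, y, bandwidth):
    """Quartic-kernel density at (x, y). With coordinates and bandwidth
    in km, the result is in points per km²."""
    h2 = bandwidth ** 2
    total = 0.0
    for px, py in points:
        d2 = (px - x) ** 2 + (py - y) ** 2
        if d2 < h2:                       # points outside the bandwidth add nothing
            total += (1 - d2 / h2) ** 2
    return 3.0 / (math.pi * h2) * total   # normalised so each point integrates to 1


def density_class(density):
    """Map a density value to its interval label (lower bound inclusive)."""
    return LABELS[bisect_right(BREAKS, density)]
```

Evaluating `kernel_density` on a regular grid of locations would give a continuous surface of the kind that is then classified by `density_class`.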

Figure 9. Spatial Distribution of RSA Density.

Over time, the spatial distribution of RSA in Dongying City has exhibited distinct patterns characterized by “higher density in the west than the east, and in the south than the north,” as well as “contiguous low-density areas and increasingly clustered high-density areas forming patches and nodes.” Overall, the RSA density in each district and county has increased from period to period, with a clear shift from lower to higher density values. By 2023, the areas of medium-, medium–high-, and high-density zones had all increased compared with 2002. Notably, the central region of Lijin County gradually became the high-density core area of RSA in the city. This shift is closely related to local industrial development, transportation improvements, rural planning, and supportive policies. The emergence of high-density zones is typically associated with the acceleration of industrialization and urbanization, as population increasingly concentrates in these areas, resulting in a pronounced clustering effect of RSA. At the same time, the reduction in low-density areas reflects a transformation of Dongying’s RSA pattern from a scattered layout to a more centralized development model.

4.4.3. RSA Agglomeration Evolution

Hotspot analysis identifies areas of local high-value aggregation (hotspots) and low-value aggregation (cold spots), providing a detailed depiction of the spatial distribution of specific phenomena [58]. In this study, a 2 km × 2 km grid was selected as the analysis unit after comparative trials with multiple scales, as it provided the most effective balance between spatial detail and interpretability, and proved most suitable for revealing meaningful clustering patterns of RSA. Vector data of RSA across five time periods in Dongying City were spatially associated with the grid based on settlement area attributes. The Hot Spot Analysis tool was applied to detect spatial clusters based on settlement area within each grid cell. The results were then classified into four intensity levels using the natural breaks method: Low, Medium, High, Very High. These levels represent the varying intensities of settlement scale clustering, with the results illustrated in Figure 10.
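To make the grid-based hotspot statistic concrete, the sketch below computes the Getis-Ord Gi* z-score for each cell of a regular grid of settlement area. It assumes binary weights over each cell and its eight neighbours (the cell itself included). The neighbourhood actually used by the Hot Spot Analysis tool in this study is not stated, so this choice and the function name are illustrative only.

```python
import math


def getis_ord_gi_star(grid):
    """Gi* z-scores for a 2D list of cell values, using binary weights
    on the 3 x 3 window around each cell (assumed neighbourhood)."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            local_sum, w = 0.0, 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        local_sum += grid[rr][cc]
                        w += 1
            # With binary weights, sum(w) and sum(w^2) both equal w.
            denom = s * math.sqrt((n * w - w * w) / (n - 1))
            if denom > 0:                 # leave 0.0 for degenerate cases
                z[r][c] = (local_sum - mean * w) / denom
    return z
```

Large positive scores mark cells surrounded by high settlement area (hotspots), and negative scores mark cold spots. The natural-breaks grouping into four intensity levels is not reproduced here.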

Figure 10. Hotspot Map of RSA Scale in Dongying City.

RSA in the eastern part of Dongying are relatively sparse, with low-value hotspots mainly distributed in Hekou, Dongying, and Kenli Districts. In contrast, very high hotspot areas are concentrated in central Lijin County, western Dongying District, and southern Guangrao County, where continuous clusters are especially prominent. Medium and high hotspot areas occur around these zones but cover smaller areas. Across five periods, the overall clustering pattern has remained stable in spatial location, though the intensity has shifted: clustering in Lijin has gradually strengthened, western Dongying has remained stable, and southern Guangrao weakened during 2002–2008 but intensified again thereafter.

Comparison with density analysis shows that settlement number and scale are generally positively correlated, as larger settlements often appear in high-density areas. However, exceptions exist: while the numerical core of settlements is in central Lijin, large-scale hotspots are more concentrated in southern Guangrao. This divergence indicates varied organizational patterns of rural settlements across Dongying, shaped by multiple interacting factors.

4.4.4. RSA Morphology Evolution

This study used Fragstats 4.2 and selected five landscape pattern indices to analyze the morphological evolution of RSA: Number of Patches (NP), Mean Patch Area (AREA_MN), Landscape Shape Index (LSI), Mean Shape Index (SHAPE_MN), and Patch Cohesion Index (COHESION). These indices provide a comprehensive understanding of the spatial configuration, shape complexity, and connectivity of RSA patches. The formulas and definitions for each index can be found in References [59,60].
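Three of these indices can be approximated directly from a binary settlement raster, as in the minimal sketch below. It is not the Fragstats implementation. It assumes 8-neighbour patch connectivity, a 30 m cell size, mean patch area in hectares, and the raster shape index 0.25 × perimeter / √area. The function name is hypothetical.

```python
import math


def patch_metrics(grid, cell_size_m=30.0):
    """NP, AREA_MN (ha) and SHAPE_MN for cells equal to 1 in a 2D list."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    areas_ha, shapes = [], []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            stack, cells, edges = [(r, c)], 0, 0
            seen[r][c] = True
            while stack:                          # flood fill one patch
                y, x = stack.pop()
                cells += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        if dy == 0 and dx == 0:
                            continue
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        is_rsa = inside and grid[ny][nx] == 1
                        # A cell side on the patch boundary adds to the perimeter.
                        if (dy == 0 or dx == 0) and not is_rsa:
                            edges += 1
                        if is_rsa and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            area_m2 = cells * cell_size_m ** 2
            areas_ha.append(area_m2 / 10000.0)
            shapes.append(0.25 * edges * cell_size_m / math.sqrt(area_m2))
    n = len(areas_ha)
    if n == 0:
        return {"NP": 0, "AREA_MN": 0.0, "SHAPE_MN": 0.0}
    return {"NP": n, "AREA_MN": sum(areas_ha) / n, "SHAPE_MN": sum(shapes) / n}
```

A square patch has a shape index of 1, and the value grows as the outline becomes more irregular, which is the sense in which SHAPE_MN is read as morphological complexity.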

Table 6 shows the changes in landscape pattern indices of RSA in Dongying from 2002 to 2023. The Number of Patches (NP) first decreased from 1728 in 2002 to 1505 in 2008, then rose overall to 1951 in 2023, with a slight dip to 1621 in 2019. The Mean Patch Area (AREA_MN) moved broadly in the opposite direction, rising from 10.991 to 13.801 between 2002 and 2008 and falling to 11.468 by 2023. This suggests a consolidation of settlements during 2002–2008, followed by increased fragmentation as new or dispersed settlements appeared after 2008.

Table 6. Landscape pattern indices of RSA in Dongying City.

The Landscape Shape Index (LSI) and Mean Shape Index (SHAPE_MN), which reflect morphological complexity, also display an overall upward trend, despite a short decline during 2002–2008 when settlement shapes became more regular. By 2023, LSI reached 54.518 and SHAPE_MN 1.317, indicating increasingly irregular settlement geometries and greater heterogeneity. The Patch Cohesion Index (COHESION) remained high, rising slightly from 92.992 to 93.952, showing that connectivity among patches has stayed strong despite growing morphological complexity. Overall, these results confirm the trend toward spatial agglomeration in Dongying’s RSA distribution.

5. Conclusions and Discussion

5.1. Conclusions

In this study, we take Dongying City, a major city in the YRD, as the research area. A multi-source and multi-modal dataset combining H-Alpha decomposition polarimetric features from Sentinel-1 SAR data and Landsat optical imagery was constructed as the input to the model. A deep learning model, TransUNet, was employed to develop a method for accurately identifying RSA suited to the characteristics of the YRD region. RSA from five time periods were extracted, and regional statistical analysis, kernel density analysis, hotspot analysis, and landscape pattern index analysis were applied to explore the spatial and temporal evolution in terms of scale, density, clustering, and morphology. The main conclusions are as follows.

(a): The TransUNet model, integrating optical and polarimetric features, achieved satisfactory accuracy in extracting multi-temporal RSA information in the YRD, with Precision, Recall, F1-score, and mIoU reaching 89.27%, 80.70%, 84.77%, and 85.39%, respectively, making it the optimal strategy for RSA extraction in this study.
(b): Under the same data conditions (fusion of optical and polarimetric features), the mIoU of the TransUNet model was 1.89, 5.06, and 1.94 percentage points higher than that of U-Net, DeepLabv3+, and TransDeepLab, respectively. The ViT module in TransUNet effectively captures global contextual information, overcoming the limited receptive field of U-Net and DeepLabv3+, while skip connections enable multi-scale feature fusion. Compared with TransDeepLab, TransUNet achieves a more balanced integration of local details and global semantics. Under the same model architecture (TransUNet), the fusion of optical and radar features outperformed either single-source dataset, as their complementarity enables the model to capture more comprehensive surface information, ensuring stable extraction accuracy in complex scenarios.
(c): Analysis of the spatiotemporal evolution of RSA in Dongying City revealed that medium-sized settlements dominate, while extra-large settlements remain few. The density pattern is characterized by “higher density in the west and south, and lower density in the east and north”, with low-density zones distributed in large patches and high-density zones increasingly forming clustered patches and points. The central area of Lijin County has gradually become the core high-density area of RSA. Spatial distribution overall shows an agglomeration pattern, where large-scale RSA often coincide with high-density zones. Settlement patches were initially more regular but have become increasingly complex and irregular. Spatial cohesion nevertheless remains high, with no significant decline in patch connectivity, which further supports the trend of spatial agglomeration in RSA evolution.

5.2. Discussion

This study establishes an application-oriented framework that links methodological adaptation with spatiotemporal analysis of rural settlements. Although no fundamentally new algorithm is proposed, the integration of advanced deep learning models with fused optical and SAR data demonstrates the potential of adapting state-of-the-art techniques to specific regional contexts. The results not only provide reliable long-term settlement data for the Yellow River Delta but also highlight the value of model–data synergy in supporting sustainable development planning.

The spatiotemporal evolution patterns identified (such as the dominance of medium-sized settlements and the “dense in the west/south and sparse in the east/north” distribution) are consistent with previous findings on urban–rural differentiation in the Yellow River Delta, while offering new evidence based on multi-temporal settlement mapping. This alignment reinforces the robustness of our approach, while the multi-source data fusion strategy improves mapping accuracy over the single-sensor inputs tested here.

In terms of methodology, sample patches were selected from entire satellite scenes covering Dongying and surrounding areas to maximize data diversity and ensure robust model training. Owing to the high similarity in geographical and settlement characteristics across these areas, this strategy provided a reliable basis for model development, and the independent test patches remain representative for Dongying. A key limitation, however, is the lack of SAR data prior to 2014, which restricted multi-source training to 2015, 2019, and 2023. For 2002 and 2008, the trained model was directly applied to optical imagery only. Visual inspection confirmed that the predicted settlement patterns aligned well with the original Landsat imagery, suggesting good model generalization, though differences in predictor availability may introduce some uncertainty in long-term comparisons. Future research should consider incorporating alternative SAR archives or harmonized optical-only approaches to further improve temporal consistency and robustness, and seek higher-resolution optical imagery to improve the accuracy of RSA detail extraction.

Finally, although some Landsat scenes contained relatively high overall cloud fractions, the clouds were largely concentrated offshore and did not affect the visibility of rural settlements. Similarly, minor temporal mismatches between Sentinel-1 and Landsat acquisitions were considered negligible, as rural settlements are structurally stable within a year.

Author Contributions

Conceptualization, J.L.; Formal analysis, Y.Z. and C.Z.; Funding acquisition, J.L.; Investigation, Y.Z.; Methodology, J.L.; Visualization, C.Z.; Writing—original draft, J.L. and Y.Z.; Writing—review and editing, F.M., J.G., D.Z. and Y.P. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the National Natural Science Foundation of China (Grant No. 42171113), the Jinan City and University Integration Development Project (Grant No. JNSX2023065), and the Shandong Natural Science Foundation (Grant No. ZR2024QD122).

Data Availability Statement

The data presented in this study are available on request from the corresponding author or the first author. The data are not publicly available due to the confidentiality requirements of the project source.

Acknowledgments

The authors thank ESA for providing Sentinel-1 Data.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Liu, M.; Li, Q.; Bai, Y.; Fang, C. A novel framework to evaluate urban-rural coordinated development: A case study in Shanxi Province, China. Habitat Int. 2024, 144, 103013.
2. Yang, L.; Ou, C.; Wang, Z.; Du, Z.; Yao, X.; Du, Z. Unveiling patterns and drivers of long-term rural settlement changes from the urban-rural gradient perspective: A case study of the Beijing-Tianjin-Hebei region in China. Habitat Int. 2025, 156, 103300.
3. Qu, Y.; Jiang, G.; Ma, W.; Li, Z. How does the rural settlement transition contribute to shaping sustainable rural development? Evidence from Shandong, China. J. Rural Stud. 2021, 82, 279–293.
4. Zhang, R.; Zhang, X. Spatial Pattern Evolution and Driving Mechanism of Rural Settlements in Rapidly Urbanized Areas: A Case Study of Jiangning District in Nanjing City, China. Land 2023, 12, 749.
5. Chen, S.; Wang, X.; Lin, Q. Spatial pattern characteristics and influencing factors of mountainous rural settlements in metropolitan fringe area: A case study of Pingnan County, Fujian Province. Heliyon 2024, 10, e26606.
6. Fu, Y.; Chen, S.; Ji, H.; Fan, Y.; Li, P. The modern Yellow River Delta in transition: Causes and implications. Mar. Geol. 2021, 436, 106476.
7. Liu, J.; Zhang, C.; Feng, Q.; Yin, G.; Zhang, Y. Large-scale subpixel mapping of impervious surface in Yellow River Delta High-efficiency Ecological Economic Zone: An artificial intelligence approach. Earth Sci. Inform. 2024, 18, 43.
8. Song, W.; Li, H. Spatial pattern evolution of rural settlements from 1961 to 2030 in Tongzhou District, China. Land Use Policy 2020, 99, 105044.
9. Shi, L.; Wang, Y. Evolution characteristics and driving factors of negative decoupled rural residential land and resident population in the Yellow River Basin. Land Use Policy 2021, 109, 105685.
10. Ji, Z.; Xu, Y.; Sun, M.; Liu, C.; Lu, L.; Huang, A.; Duan, Y.; Liu, L. Spatiotemporal characteristics and dynamic mechanism of rural settlements based on typical transects: A case study of Zhangjiakou City, China. Habitat Int. 2022, 123, 102545.
11. Chen, S.; Wang, X.; Qiang, Y.; Lin, Q. Spatial–temporal evolution and land use transition of rural settlements in mountainous counties. Environ. Sci. Eur. 2024, 36, 38.
12. Wang, Y.; Zhu, X.; Wei, T.; Xu, F.; Williams, T.K.-A.; Zhang, H. Entity-based image analysis: A new strategy to map rural settlements from Landsat images. Remote Sens. Environ. 2025, 318, 114549.
13. Feng, Q.; Niu, B.; Ren, Y.; Su, S.; Wang, J.; Shi, H.; Yang, J.; Han, M. A 10-m national-scale map of ground-mounted photovoltaic power stations in China of 2020. Sci. Data 2024, 11, 198.
14. Xing, H.; Zhu, L.; Chen, B.; Liu, C.; Niu, J.; Li, X.; Feng, Y.; Fang, W. A comparative study of threshold selection methods for change detection from very high-resolution remote sensing images. Earth Sci. Inform. 2022, 15, 369–381.
15. Liu, J.; Li, Y.; Zhang, Y.; Liu, X. Large-Scale Impervious Surface Area Mapping and Pattern Evolution of the Yellow River Delta Using Sentinel-1/2 on the GEE. Remote Sens. 2023, 15, 136.
16. Xu, R. Mapping Rural Settlements from Landsat and Sentinel Time Series by Integrating Pixel- and Object-Based Methods. Land 2021, 10, 244.
17. Shrestha, B.; Ahmad, S.; Stephen, H. Fusion of Sentinel-1 and Sentinel-2 data in mapping the impervious surfaces at city scale. Environ. Monit. Assess. 2021, 193, 556.
18. Liu, J.; Zhang, Y.; Feng, Q.; Yin, G.; Zhang, D.; Li, Y.; Gong, J.; Li, Y.; Li, J. Understanding urban expansion and shrinkage via green plastic cover mapping based on GEE cloud platform: A case study of Shandong, China. Int. J. Appl. Earth Obs. Geoinf. 2024, 128, 103749.
19. Fallatah, A.; Simon, J.; Mitchell, D. Object-based random forest classification for informal settlements identification in the Middle East: Jeddah a case study. Int. J. Remote Sens. 2020, 41, 4421–4445.
20. Xing, H.; Bingyao, C.; Yongyu, F.; Yuanlong, N.; Dongyang, H.; Xue, W.; Kong, Y. Mapping irrigated, rainfed and paddy croplands from time-series Sentinel-2 images by integrating pixel-based classification and image segmentation on Google Earth Engine. Geocarto Int. 2022, 37, 13291–13310.
21. Wang, Z. Spatial Differentiation Characteristics of Rural Areas Based on Machine Learning and GIS Statistical Analysis—A Case Study of Yongtai County, Fuzhou City. Sustainability 2023, 15, 4367.
22. Xu, F.; Ho, H.C.; Chi, G.; Wang, Z. Abandoned rural residential land: Using machine learning techniques to identify rural residential land vulnerable to be abandoned in mountainous areas. Habitat Int. 2019, 84, 43–56.
23. Zheng, X.; Pu, S.; Xue, X. ASCEND-UNet: An Improved UNet Configuration Optimized for Rural Settlements Mapping. Sensors 2024, 24, 5453.
24. Li, T.; Feng, Q.; Niu, B.; Chen, B.; Yan, F.; Gong, J.; Liu, J. Mapping urban villages based on point-of-interest data and a deep learning approach. Cities 2025, 156, 105549.
25. Chen, Z.; Li, D.; Fan, W.; Guan, H.; Wang, C.; Li, J. Self-Attention in Reconstruction Bias U-Net for Semantic Segmentation of Building Rooftops in Optical Remote Sensing Images. Remote Sens. 2021, 13, 2524.
26. Wang, Y.; Zhao, Q.; Wu, Y.; Tian, W.; Zhang, G. SCA-Net: Multiscale Contextual Information Network for Building Extraction Based on High-Resolution Remote Sensing Images. Remote Sens. 2023, 15, 4466.
27. Chen, B.; Feng, Q.; Niu, B.; Yan, F.; Gao, B.; Yang, J.; Gong, J.; Liu, J. Multi-modal fusion of satellite and street-view images for urban village classification based on a dual-branch deep neural network. Int. J. Appl. Earth Obs. Geoinf. 2022, 109, 102794.
28. Chen, J.; Mei, J.; Li, X.; Lu, Y.; Yu, Q.; Wei, Q.; Luo, X.; Xie, Y.; Adeli, E.; Wang, Y.; et al. TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers. Med. Image Anal. 2024, 97, 103280.
29. Aamir, Z.; Seddouki, M.; Himmy, O.; Maanan, M.; Tahiri, M.; Rhinane, H. Rural settlements segmentation based on deep learning U-Net using remote sensing images. ISPRS Int. Soc. Photogramm. Remote Sens. 2022, 48, 1–5.
30. Ye, Z.; Si, B.; Lin, Y.; Zheng, Q.; Zhou, R.; Huang, L.; Wang, K. Mapping and Discriminating Rural Settlements Using Gaofen-2 Images and a Fully Convolutional Network. Sensors 2020, 20, 6062.
31. Conrad, C.; Rudloff, M.; Abdullaev, I.; Thiel, M.; Löw, F.; Lamers, J.P.A. Measuring rural settlement expansion in Uzbekistan using remote sensing to support spatial planning. Appl. Geogr. 2015, 62, 29–43.
32. Shi, Z.; Ma, L.; Zhang, W.; Gong, M. Differentiation and correlation of spatial pattern and multifunction in rural settlements considering topographic gradients: Evidence from Loess Hilly Region, China. J. Environ. Manag. 2022, 315, 115127.
33. Liu, Z.; Huang, S.; Fang, C.; Guan, L.; Liu, M. Global urban and rural settlement dataset from 2000 to 2020. Sci. Data 2024, 11, 1359.
34. Hoffman-Hall, A.; Loboda, T.V.; Hall, J.V.; Carroll, M.L.; Chen, D. Mapping remote rural settlements at 30 m spatial resolution using geospatial data-fusion. Remote Sens. Environ. 2019, 233, 111386.
35. Verma, A.; Bhattacharya, A.; Dey, S.; López-Martínez, C.; Gamba, P. Built-up area mapping using Sentinel-1 SAR data. ISPRS J. Photogramm. Remote Sens. 2023, 203, 55–70.
36. Liu, J.; Li, Y.; Zhang, Y.; Feng, Q.; Shi, T.; Zhang, D.; Liu, P. Impervious surface Mapping and its spatial–temporal evolution analysis in the Yellow River Delta over the last three decades using Google Earth Engine. Earth Sci. Inform. 2023, 16, 1727–1739.
37. Holm, W.A.; Barnes, R.M. On radar polarization mixed target state decomposition techniques. In Proceedings of the 1988 IEEE National Radar Conference, Ann Arbor, MI, USA, 20–21 April 1988; pp. 249–254.
38. Fang, L.; Yang, Z.; Mu, W.; Liu, T. A Novel Polarization Scattering Decomposition Model and Its Application to Ship Detection. Remote Sens. 2024, 16, 178.
39. Freeman, A.; Durden, S.L. A three-component scattering model for polarimetric SAR data. IEEE Trans. Geosci. Remote Sens. 1998, 36, 963–973.
40. Yamaguchi, Y.; Moriyama, T.; Ishido, M.; Yamada, H. Four-component scattering model for polarimetric SAR image decomposition. IEEE Trans. Geosci. Remote Sens. 2005, 43, 1699–1706.
41. Cloude, S.R.; Pottier, E. A review of target decomposition theorems in radar polarimetry. IEEE Trans. Geosci. Remote Sens. 1996, 34, 498–518.
42. Salma, S.; Keerthana, N.; Dodamani, B.M. Target decomposition using dual-polarization sentinel-1 SAR data: Study on crop growth analysis. Remote Sens. Appl. Soc. Environ. 2022, 28, 100854.
43. Salehi, M.; Maghsoudi, Y.; Mohammadzadeh, A. Assessment of the Potential of H/A/Alpha Decomposition for Polarimetric Interferometric SAR Data. IEEE Trans. Geosci. Remote Sens. 2018, 56, 2440–2451.
44. Sinha, S. H/A/α Polarimetric Decomposition Of Dual Polarized Alos Palsar for Efficient Land Feature Detection and Biomass Estimation Over Tropical Deciduous Forest. Geogr. Environ. Sustain. 2022, 15, 37–46.
45. Cloude, S.R. The dual polarization entropy/alpha decomposition: A PALSAR case study. Environ. Sci. 2007, 644, 2.
46. Cabello-Solorzano, K.; Ortigosa de Araujo, I.; Peña, M.; Correia, L.; J Tallón-Ballesteros, A. The Impact of Data Normalization on the Accuracy of Machine Learning Algorithms: A Comparative Analysis. In Proceedings of the 18th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO 2023), Salamanca, Spain, 5–7 September 2023; pp. 344–353.
47. Zhu, A.; Lv, G.; Zhou, C.; Qin, C. Geographic similarity: Third Law of Geography? J. Geo-Inf. Sci. 2020, 22, 673–679.
48. Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Zhou, Y. TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. arXiv 2021, arXiv:2102.04306.
49. Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-Like Pure Transformer for Medical Image Segmentation. In Proceedings of the Computer Vision—ECCV 2022 Workshops, Tel Aviv, Israel, 23–27 October 2022; pp. 205–218.
50. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; pp. 234–241.
51. Huang, H.; Lin, L.; Tong, R.; Hu, H.; Zhang, Q.; Iwamoto, Y.; Han, X.; Chen, Y.W.; Wu, J. UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 1055–1059.
52. Liu, J.; Zhang, Y.; Liu, C.; Liu, X. Monitoring Impervious Surface Area Dynamics in Urban Areas Using Sentinel-2 Data and Improved Deeplabv3+ Model: A Case Study of Jinan City, China. Remote Sens. 2023, 15, 1976.
53. Azad, R.; Heidari, M.; Shariatnia, M.; Aghdam, E.K.; Karimijafarbigloo, S.; Adeli, E.; Merhof, D. TransDeepLab: Convolution-Free Transformer-Based DeepLab v3+ for Medical Image Segmentation. In Proceedings of the Predictive Intelligence in Medicine, Singapore, 22 September 2022; pp. 91–102.
54. Liu, P.; Liu, X.; Liu, M.; Shi, Q.; Yang, J.; Xu, X.; Zhang, Y. Building Footprint Extraction from High-Resolution Images via Spatial Residual Inception Convolutional Neural Network. Remote Sens. 2019, 11, 830.
55. Luo, H.; Khoshelham, K.; Chen, C.; He, H. Individual tree extraction from urban mobile laser scanning point clouds using deep pointwise direction embedding. ISPRS J. Photogramm. Remote Sens. 2021, 175, 326–339.
56. Li, Y.; Shi, T.; Zhang, Y.; Chen, W.; Wang, Z.; Li, H. Learning deep semantic segmentation network under multiple weakly-supervised constraints for cross-domain remote sensing image semantic segmentation. ISPRS J. Photogramm. Remote Sens. 2021, 175, 20–33.
57. Liu, Y.; Ke, X.; Wu, W.; Zhang, M.; Fu, X.; Li, J.; Jiang, J.; He, Y.; Zhou, C.; Li, W.; et al. Geospatial characterization of rural settlements and potential targets for revitalization by geoinformation technology. Sci. Rep. 2022, 12, 8399.
58. Tan, S.; Zhang, M.; Wang, A.; Ni, Q. Spatio-Temporal Evolution and Driving Factors of Rural Settlements in Low Hilly Region—A Case Study of 17 Cities in Hubei Province, China. Int. J. Environ. Res. Public Health 2021, 18, 2387.
59. Shimrah, T.; Sarma, K.; Varga, O.G.; Szilard, S.; Singh, S.K. Quantitative assessment of landscape transformation using earth observation datasets in Shirui Hill of Manipur, India. Remote Sens. Appl. Soc. Environ. 2019, 15, 100237.
60. Shifaw, E.; Sha, J.; Li, X. Detection of spatiotemporal dynamics of land cover and its drivers using remote sensing and landscape metrics (Pingtan Island, China). Environ. Dev. Sustain. 2020, 22, 1269–1298.

Figure 1. Study area overview.

Figure 2. Workflow.

Figure 3. Workflow and Results of Sentinel-1 SAR Polarimetric Decomposition.

Figure 4. Sample block distribution map and schematic diagram.

Figure 5. TransUNet structure.

Table 1. Information of Selected Landsat Images.

Acquisition Date	Satellite and Sensor	Cloud Cover (%)
28 August 2002	Landsat 7 ETM+	2
23 October 2008	Landsat 5 TM	7
05 June 2015	Landsat 8 OLI	16.21
31 June 2019	Landsat 8 OLI	19.77
15 September 2023	Landsat 8 OLI	0.06

Note: The Landsat 7 ETM+ image used for 2002 was acquired prior to the Scan Line Corrector (SLC) failure in May 2003. Therefore, it is an SLC-on product and does not contain the data gaps associated with later Landsat 7 imagery.

Table 2. Accuracy Comparison of RSA Extraction Models.

Method	Precision (%)	Recall (%)	F1 (%)	mIoU (%)
U-Net	89.10	76.72	82.45	83.50
DeepLabv3+	88.96	69.93	78.31	80.33
TransUNet	89.27	80.70	84.77	85.39
TransDeepLab	87.24	78.11	82.42	83.45

Table 3. Comparison of Extraction Accuracy Using Different Data Sources for RSA.

Data Source Types	Precision (%)	Recall (%)	F1 (%)	mIoU (%)
Optical Remote Sensing Data	91.13	71.69	80.25	81.83
Radar Remote Sensing Data	86.29	65.15	74.24	77.39
Fused Optical and Radar Data	89.27	80.70	84.77	85.39

Table 4. Statistics of the Number of RSA Patches by Scale (Unit: patches).

Category	Range (km²)	2002	2008	2015	2019	2023
Small settlements	≤0.01	361	185	413	361	703
Small-to-medium settlements	0.01–0.05	386	362	356	410	372
Medium settlements	0.05–0.2	761	688	700	779	762
Large settlements	0.2–0.5	228	236	242	249	275
Extra-large settlements	≥0.5	35	66	62	62	53
All settlements		1771	1537	1773	1861	2165

Table 5. Statistics of the Area of RSA Patches by Scale (Unit: km²).

Category	Range (km²)	2002	2008	2015	2019	2023
Small settlements	≤0.01	1.26	0.78	1.33	0.97	1.88
Small-to-medium settlements	0.01–0.05	10.61	10.87	10.23	12.38	10.29
Medium settlements	0.05–0.2	85.75	74.05	74.71	84.68	84.58
Large settlements	0.2–0.5	66.98	71.51	73.01	76.84	83.26
Extra-large settlements	≥0.5	26.03	51.19	49.66	51.25	43.73
All settlements		190.63	208.40	208.94	226.12	223.74
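
The area figures in Table 5 can be turned into per-class shares to see which size class contributes most to the total RSA area in each year. The sketch below is an illustrative calculation on the tabulated values only. The names `AREA_KM2`, `REPORTED_TOTAL`, `class_totals`, `area_shares`, and `dominant_class` are assumptions introduced here, and the shortened class labels stand in for the category names in the table.

```python
# Values copied from Table 5 (km²); one list entry per year in YEARS.
# All names below are illustrative, not from the study.
YEARS = [2002, 2008, 2015, 2019, 2023]
AREA_KM2 = {
    "Small": [1.26, 0.78, 1.33, 0.97, 1.88],
    "Small-to-medium": [10.61, 10.87, 10.23, 12.38, 10.29],
    "Medium": [85.75, 74.05, 74.71, 84.68, 84.58],
    "Large": [66.98, 71.51, 73.01, 76.84, 83.26],
    "Extra-large": [26.03, 51.19, 49.66, 51.25, 43.73],
}
REPORTED_TOTAL = [190.63, 208.40, 208.94, 226.12, 223.74]


def class_totals(areas):
    """Sum the class areas column by column, giving one total per year."""
    return [sum(column) for column in zip(*areas.values())]


def area_shares(areas):
    """Percentage of each year's total area contributed by each class."""
    totals = class_totals(areas)
    return {
        category: [100 * value / total for value, total in zip(values, totals)]
        for category, values in areas.items()
    }


def dominant_class(areas):
    """Name of the class with the largest area share in each year."""
    shares = area_shares(areas)
    return [
        max(shares, key=lambda category: shares[category][i])
        for i in range(len(YEARS))
    ]


if __name__ == "__main__":
    shares = area_shares(AREA_KM2)
    for i, year in enumerate(YEARS):
        row = ", ".join(f"{cat}: {shares[cat][i]:.1f}%" for cat in AREA_KM2)
        print(f"{year}: {row}")
    print("Dominant class per year:", dominant_class(AREA_KM2))
```

Computed this way, the class areas sum to the reported "All settlements" totals, and the medium class holds the largest area share in every year, in line with the abstract's statement that medium-sized settlements dominate.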

Table 6. Landscape pattern indices of RSA in Dongying City.

Year	NP	AREA_MN	LSI	SHAPE_MN	COHESION
2002	1728	10.991	49.047	1.263	92.992
2008	1505	13.801	45.898	1.258	93.664
2015	1657	12.613	48.697	1.292	93.724
2019	1621	13.956	50.166	1.316	94.002
2023	1951	11.468	54.518	1.317	93.952

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

