Article

Multi-Source Attention U-Net: A Novel Deep Learning Framework for the Land Use and Soil Salinization Classification of Keriya Oasis in China with RADARSAT-2 and Landsat-8 Data

1 College of Geography and Remote Sensing Sciences, Xinjiang University, Urumqi 830046, China
2 Xinjiang Key Laboratory of Oasis Ecology, Xinjiang University, Urumqi 830046, China
3 Xinjiang Field Scientific Observation and Research Station for the Oasisization Process in the Hinterland of the Taklamakan Desert, Yutian 848400, China
* Author to whom correspondence should be addressed.
Land 2025, 14(3), 649; https://doi.org/10.3390/land14030649
Submission received: 8 February 2025 / Revised: 15 March 2025 / Accepted: 17 March 2025 / Published: 19 March 2025

Abstract:
Soil salinization significantly impacts global agricultural productivity, contributing to desertification and land degradation; thus, rapid regional monitoring of soil salinization is crucial for agricultural production and sustainable management. With advancements in artificial intelligence, the efficiency and precision of deep learning classification models applied to remote sensing imagery have been demonstrated. Given the limited feature learning capability of traditional machine learning, this study introduces an innovative deep fusion U-Net model called MSA-U-Net (Multi-Source Attention U-Net) incorporating a Convolutional Block Attention Module (CBAM) within the skip connections to improve feature extraction and fusion. A salinized soil classification dataset was developed by combining spectral indices obtained from Landsat-8 Operational Land Imager (OLI) data and polarimetric scattering features extracted from RADARSAT-2 data using polarization target decomposition. To select optimal features, the Boruta algorithm was employed to rank features, selecting the top eight features to construct a multispectral (MS) dataset, a synthetic aperture radar (SAR) dataset, and an MS + SAR dataset. Furthermore, Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbor (KNN), and deep learning methods including U-Net and MSA-U-Net were employed to identify the different degrees of salinized soil. The results indicated that the MS + SAR dataset outperformed the MS dataset, with the inclusion of the SAR band resulting in an Overall Accuracy (OA) increase of 1.94–7.77%. Moreover, the MS + SAR MSA-U-Net, in comparison to traditional machine learning methods and the baseline model, improved the OA and Kappa coefficient by 8.24% to 12.55% and 0.08 to 0.15, respectively. The results demonstrate that the MSA-U-Net outperformed traditional models, indicating the potential of integrating multi-source data with deep learning techniques for monitoring soil salinity.

1. Introduction

Soil salinization is considered an environmental and socioeconomic challenge, profoundly impacting agricultural development and ecosystem health [1]. Salinity issues can arise in arid areas and may be the result of natural processes or human activities [2]. In general, the persistent lack of precipitation and improper irrigation drive the migration of salts from saline geological deposits to the surface, where they accumulate excessively [3]. Over the past two decades, soil salinization has been a significant ecological challenge, driven largely by the growing influence of climate change [4]. The global distribution of saline soils covers a significant land area, totaling approximately 932.2 million hectares, and more than 100 countries are impacted by soil salinization [5,6]. In China, saline soils make up about 5% of the country’s arable land [4], with Xinjiang recognized as one of the largest regions worldwide affected by widespread salinization [7]. Therefore, an accurate, real-time assessment of the extent of soil salinization is vital for the effective implementation of management strategies and the sustainable use of land resources.
Traditional methods for assessing regional soil salinization, primarily through field-based sampling and laboratory analysis, are time-consuming, labor-intensive, and limited in their capacity to offer large-scale, real-time dynamic monitoring [8]. In contrast, remote sensing technology provides an affordable and efficient approach to soil salinity assessment, providing rapid, timely, and relatively inexpensive data acquisition [9,10]. It has, thus, been largely adopted for soil salinity mapping and monitoring. Nevertheless, directly and accurately estimating soil salt content from optical imagery remains challenging due to constraints in spatial and spectral resolutions, as well as interference from vegetation cover and atmospheric effects [11]. In this context, SAR has significant advantages due to its strong penetrative capabilities and ability to capture data under all-weather, all-day conditions [12,13]. SAR technology holds considerable promise for soil salinity detection, as radar systems are highly sensitive to parameters such as electrical conductivity (EC), backscattering coefficient, soil moisture, and salinity [14,15].
With the advancement of SAR technologies, polarimetric SAR (PolSAR) has emerged as a high-resolution imaging system that provides comprehensive polarization scattering information about terrain, surpassing the capabilities of single-polarization SAR systems [16]. The full-polarization scattering information derived from PolSAR data is associated with the physical properties of surface objects, which makes it a promising technology for assessing soil salinity [17]. For example, Zhang et al. retrieved and estimated soil salt content from dual-polarized PolSAR images using a one-dimensional convolutional neural network [18]. In addition, polarimetric target decomposition has found extensive application across multiple areas, such as monitoring geological disasters [19], assessing forests [20], estimating soil moisture and salinity [21], and performing land cover classification [17]. Overall, polarization decomposition offers a novel perspective for retrieving ground information, providing significant insights for the management and sustainable use of soils.
With the advent of the age of artificial intelligence, deep learning (DL) techniques have demonstrated considerable promise in extracting features from remote sensing images [22]. Numerous studies have demonstrated that, in contrast to traditional machine learning, the distinctive multi-layered architecture of neural networks enables the extraction of multi-scale and deep-level feature information, thereby allowing it to achieve a superior classification accuracy and effect [23,24]. DL approaches have gained considerable attention from researchers due to their superior classification performance on satellite imagery, particularly with complex data types such as PolSAR [25]. Fu et al. proposed an optimized fully convolutional network (FCN) model that utilizes dilated convolution and a multi-scale network architecture to achieve precise classification of high-resolution images [26]. Garg et al. utilized the DeepLab V3+ model and its transfer learning capabilities to achieve superior performance in land cover classification using PolSAR data, even with small datasets [24]. Despite the advancements made by FCNs in semantic segmentation, they also present several drawbacks, including spatial information loss due to multiple pooling layers; high memory and computational demands; increased complexity in training, particularly with limited datasets; and challenges in accurately segmenting object boundaries [27].
The U-Net model, a symmetric FCN, was originally developed for image segmentation tasks in the field of biomedical imaging [28]. By integrating low-level spatial information from down-sampling with an up-sampling input via skip connections, U-Net significantly outperforms traditional FCNs [29]. In recent years, U-Net-based deep learning approaches have made significant strides in remote sensing data monitoring, largely due to U-Net’s ability to capture subtle spatial features and perform well under small-sample conditions [30]. Although both traditional and deep learning methods have achieved notable success in land cover classification, most studies still predominantly rely on either visible light or SAR data [31,32]. As several studies have pointed out, this reliance on a single data source often limits the consistency and reliability of the classification accuracy [33,34]. Specifically, single-source data may struggle to capture the full complexity of regions with significant variation in land cover characteristics, such as complex terrain and boundary areas [35]. Despite the challenges posed by data preprocessing issues—such as differences in resolution [36], registration [37], noise handling [38] and the difficulties associated with effectively integrating features from different data sources—research focusing on fully using the complementary advantages of multi-source data for more comprehensive feature extraction remains limited [39]. Integrating different data sources has the potential to combine the spectral details with the edge and texture information, thereby improving classification accuracy [40].
Building on its strong performance with small samples, several U-Net-based networks and improved U-Net networks have been optimized by integrating existing modules to enhance efficiency, robustness, and generalization. Attention mechanisms, such as the Convolutional Block Attention Module (CBAM) introduced by Woo et al., have been shown to improve deep learning models by adaptively focusing on relevant features and suppressing noise [41]. The CBAM integrates channel and spatial attention to enhance feature maps, making it particularly effective in improving convolutional neural networks for remote sensing tasks [41]. Specifically, channel attention strengthens the representation of critical features across different channels, allowing the model to focus on meaningful information in multi-source remote sensing data [42]. For instance, radar data channels typically contain pronounced edge and texture information, while spectral data channels capture land cover spectral details [43]. By enhancing key channels, channel attention can reduce interference from less relevant channels, potentially improving soil salinity monitoring under specific conditions. Spatial attention, on the other hand, adjusts feature responses at different spatial positions, potentially increasing the model’s ability to detect spatial patterns in soil salinity distributions. Therefore, this module can aid in capturing potential transitional distribution patterns of soil salinization.
In order to make full use of deep learning to extract multi-scale features from images by combining the rich texture information with the multispectral information, this paper proposes an innovative deep-fusion U-Net model utilizing multi-source data to map different degrees of salinized soil. The model, named Multi-Source Attention U-Net (MSA-U-Net) in this study, is based on a deep fusion architecture that integrates the CBAM into the skip connection, featuring a simple, lightweight structure capable of leveraging limited RS data for improved accuracy.

2. Materials

2.1. Study Site

The Keriya Oasis (35°14′–39°29′ N, 81°9′–82°51′ E) in southwestern Xinjiang, China, is located between Cele County to the west and Minfeng County to the east [33]. Covering an approximate total area of 39,500 km2, the Keriya Oasis exhibits mountainous terrain in its southern region and flat plain desert landscapes to the north [44]. The region has a dry climate, low precipitation, and high evaporation rates, leading to extreme drought conditions [45]. As a typical inland arid desert region with a warm temperate climate, it also experiences abundant solar radiation and heat resources. The mean annual temperature is 11.6 °C, with a notable temperature variation between day and night. For these reasons, water shortages and extremely fragile ecosystems have emerged as serious environmental issues in this region [46]. The location of the study area is shown in Figure 1.
The Keriya River, a seasonal watercourse, flows from south to north, originating in the Kunlun Mountains [47]. Its meltwater provides a major water source for oasis agriculture in the midstream areas and for natural oases downstream. However, intense evaporation has caused dissolved salts to accumulate at the surface, producing elevated salt concentrations. Together with significant water shortages, this salt accumulation has exacerbated ecological degradation in the Keriya Oasis [48]. Therefore, it is essential to develop a comprehensive understanding of the region’s characteristics.

2.2. Data Collection and Preprocessing

The RADARSAT-2 satellite was launched by the Canadian Space Agency (CSA) on 14 December 2007 [49]. The RADARSAT-2 data were collected on 6 May 2022. Additionally, optical data from the Landsat-8 OLI, launched by NASA in 2013 [50], were also used in this study. The main parameters are shown in Table 1.
The RADARSAT-2 data were preprocessed using SNAP 9.0 software [51]. The main preprocessing steps are as follows: (1) radiometric calibration; (2) the generation of T3 polarization coherent matrix; (3) multi-view processing; (4) application of a 5 × 5 window refined Lee filtering for polarimetric speckle reduction; (5) geocoding; (6) resampling to the optimal resolution of 10 m.
The Landsat-8 OLI data were preprocessed by ENVI 5.3 and ArcMap 10.8.1 by performing the following operations: (1) radiometric calibration; (2) atmospheric correction; (3) image alignment; (4) resampling to the same resolution of 10 m; (5) spatial subset.

2.3. Field Data

Field sampling for this study took place in May 2022, with 60 sampling sites (Figure 2) distributed relatively evenly throughout the region. These sites represented a range of land use and soil salinization (LUSS) types. The soil type, soil texture, sampling time, number of sampling sites, surrounding vegetation types, and corresponding GPS coordinates were recorded at each sampling site. The soil samples were securely sealed in labeled bags for laboratory analysis. The topsoil (0–20 cm) samples were air-dried, ground, and passed through 1 mm sieves. Each soil sample was thoroughly mixed with distilled water at a soil-to-solution ratio of 1:5 in a flask and then shaken to achieve homogeneous dispersion. The mixture was allowed to settle for 24 h before the supernatant was carefully filtered to remove suspended particles. The electrical conductivity (EC1:5) of the filtered solution was then measured using a digital multiparameter measuring apparatus (Multi 3420 Set B, WTW GmbH, Munich, Germany) [33]. According to previous studies, a highly significant linear relationship exists between the ECe and EC1:5 values for salinized soils in southern Xinjiang [52], so the EC1:5 values were converted to ECe values (hereafter EC) using an empirical equation [53,54]. Soil electrical conductivity has been found to be positively related to soluble salt content; consequently, regression equations were used to estimate the total salt content of the samples [55,56].
By combining high-definition online satellite images with field photos, the LUSS types were classified into six categories based on our previous laboratory research [57] and the soil salinization classification standards outlined in [58]. These categories, determined by physical and chemical properties, especially electrical conductivity, include vegetation, water body, bare land, slightly salinized soil, moderately salinized soil, and highly salinized soil (Table 2).

2.4. Collection of Training and Validation Data

Training data are a critical component of supervised learning, with most machine learning algorithms relying on a large number of samples. However, acquiring reference data from satellite images can be a difficult task [59]. To construct a deep learning training set, 180 sample plots (Figure 3) were created through visual interpretation, combined with field-collected landscape photos and sampling point data from the study area, false-color composites from Landsat-8 OLI, and spectral indices (e.g., vegetation or soil-related indices). These resources were used for visual interpretation and classification of different LUSS classes.
Since the vegetation, bare land, and water bodies of the Keriya Oasis are clearly distinguishable in remote sensing images, some areas of the study area were labeled through visual interpretation. For salinized soil of different degrees, our research team divided plots along the highway at the oasis–desert margin and sampled the salinized soil, with sampling points distributed evenly within each plot. The distribution of some salinized land plots is shown in Figure 2. After laboratory analysis, the EC values were obtained and used as a reference for the visual interpretation of salinization levels at the sample sites.
The training data include 180 plots of 128 × 128 pixels, and the validation data consist of 70 plots of the same size. To ensure consistency, the deep learning and machine learning models used the same training and validation sets. Due to computational memory limitations, it was not possible to train the machine learning algorithms on the complete training set. Thus, we established a balanced dataset by randomly selecting sampling points from the training and validation plots, ensuring the same number of pixels per class [60]. The statistics of the training and validation sets are shown in Table 3.
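The class-balanced sampling step described above can be sketched as follows; this is a minimal illustration with hypothetical pixel counts, not the authors' exact sampling code:

```python
import numpy as np

def balanced_sample(labels, n_per_class, rng=None):
    """Randomly draw the same number of pixel indices from each class.

    labels: 1-D array of integer class labels for every candidate pixel.
    n_per_class: number of pixels to keep per class.
    Returns an array of selected indices.
    """
    rng = np.random.default_rng(rng)
    picks = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        picks.append(rng.choice(idx, size=n_per_class, replace=False))
    return np.concatenate(picks)

# Example: three classes with unequal pixel counts
labels = np.array([0] * 500 + [1] * 200 + [2] * 120)
sel = balanced_sample(labels, n_per_class=100, rng=0)
counts = np.bincount(labels[sel])  # 100 pixels per class
```

In practice the same routine would be applied to the pixels inside the training and validation plots separately, so that both sets stay balanced.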

3. Methods

This study developed MSA-U-Net, an enhanced architecture integrating the Convolutional Block Attention Module (CBAM) into U-Net’s skip connections to optimize multi-scale feature fusion. We conducted comprehensive comparisons of traditional machine learning methods (SVM, RF, KNN) and deep learning models (U-Net, MSA-U-Net) for soil salinization classification. The entire study’s flowchart is shown in Figure 4.

3.1. Vegetation and Soil-Related Indices

Optical remote sensing acquires surface information by receiving electromagnetic energy reflected from targets [61]. Thus, the pre-processed Landsat-8 images contain rich spectral information, most of which is captured in the spectral reflectance values of the different bands [62]. In salinized soils, soil salinity affects the growth of vegetation and the physicochemical composition of the soil, leading to changes in spectral reflectance at different salinity levels [63,64]. Thus, the spectral reflectance of saline soils or vegetation can indirectly indicate soil salinity [65,66].
In recent years, numerous vegetation and soil salinity indices have been developed for the effective monitoring of vegetation health and soil salinity, respectively. These indices are crucial for understanding the relationship between optical imagery and soil characteristics. The spectral indices, including vegetation and soil indices (listed in Table 4), were selected for feature extraction based on their relevance to the salinization process and their proven effectiveness in previous studies [9,67].
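As a representative example of how such indices are computed from band reflectances, the widely used NDVI can be sketched in a few lines; the band values below are invented for illustration, and the paper's full index set is given in Table 4:

```python
import numpy as np

def ndvi(nir, red, eps=1e-12):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).

    A small eps keeps the division stable over zero-reflectance pixels.
    """
    nir = nir.astype(float)
    red = red.astype(float)
    return (nir - red) / (nir + red + eps)

# toy 2x2 reflectance patches
red = np.array([[0.10, 0.30], [0.05, 0.20]])
nir = np.array([[0.50, 0.30], [0.45, 0.10]])
print(ndvi(nir, red))
```

The soil salinity indices in Table 4 follow the same pattern: simple arithmetic combinations of per-pixel band reflectances.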

3.2. Polarimetric Decomposition

The polarization target decomposition theory, first proposed in 1970 [80], laid the foundation for understanding scattering mechanisms in synthetic aperture radar (SAR) images. Numerous polarization decomposition methods have been developed and widely applied to analyze and process fully polarized SAR data [81,82,83]. These methods decompose each pixel of a SAR image into multiple scattering components based on known scattering mechanisms. To describe and analyze the different types of scattering mechanisms, we derive the polarization covariance and coherence matrix of the target. Polarization target decomposition methods are typically divided into two categories: coherent decomposition, which is based on the target scattering matrix, and incoherent decomposition, which utilizes the polarization coherence matrix, polarization covariance matrix, Mueller matrix, or Stokes vector [84].
In this study, the scattering matrix S, correlation matrix T, and covariance matrix C were derived from the RADARSAT-2 data. To fully utilize the PolSAR data, several polarization target decomposition methods were employed, including Pauli decomposition for coherent polarization target decomposition, Freeman–Durden decomposition [85], Yamaguchi decomposition [86], and Cloude decomposition [87] for incoherent polarization target decomposition, and Sinclair decomposition for common target polarization decomposition. Additionally, Van Zyl decomposition [88], H/A/Alpha decomposition [89], and Touzi decomposition [90] were also considered as part of the eight polarization target decomposition methods used in this study.
The scattering matrix S of the target is expressed as follows:
S = \begin{pmatrix} S_{hh} & S_{hv} \\ S_{vh} & S_{vv} \end{pmatrix}.
The polarization covariance matrix C of the target is expressed as follows:
C = \begin{pmatrix} S_{hh} S_{hh}^{*} & \sqrt{2}\, S_{hh} S_{hv}^{*} & S_{hh} S_{vv}^{*} \\ \sqrt{2}\, S_{hv} S_{hh}^{*} & 2\, S_{hv} S_{hv}^{*} & \sqrt{2}\, S_{hv} S_{vv}^{*} \\ S_{vv} S_{hh}^{*} & \sqrt{2}\, S_{vv} S_{hv}^{*} & S_{vv} S_{vv}^{*} \end{pmatrix}.
The polarization coherence matrix T of the target is expressed as follows:
T = \frac{1}{2} \begin{pmatrix} (S_{hh}+S_{vv})(S_{hh}+S_{vv})^{*} & (S_{hh}+S_{vv})(S_{hh}-S_{vv})^{*} & 2(S_{hh}+S_{vv})S_{hv}^{*} \\ (S_{hh}-S_{vv})(S_{hh}+S_{vv})^{*} & (S_{hh}-S_{vv})(S_{hh}-S_{vv})^{*} & 2(S_{hh}-S_{vv})S_{hv}^{*} \\ 2S_{hv}(S_{hh}+S_{vv})^{*} & 2S_{hv}(S_{hh}-S_{vv})^{*} & 4S_{hv}S_{hv}^{*} \end{pmatrix}
where the symbol * denotes the complex conjugate.
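Under the reciprocity assumption (S_hv = S_vh), the coherency matrix T above is the outer product of the Pauli scattering vector with its conjugate transpose, which can be verified numerically. A minimal sketch (the scattering values are invented):

```python
import numpy as np

def coherency_T3(S):
    """Build the 3x3 polarimetric coherency matrix T from a 2x2 scattering
    matrix S, assuming reciprocity (S_hv == S_vh).

    T = k_P k_P^H, where the Pauli scattering vector is
    k_P = (1/sqrt(2)) * [S_hh + S_vv, S_hh - S_vv, 2*S_hv]^T,
    which reproduces the (1/2)-scaled matrix elements given above.
    """
    s_hh, s_hv = S[0, 0], S[0, 1]
    s_vv = S[1, 1]
    k = np.array([s_hh + s_vv, s_hh - s_vv, 2 * s_hv]) / np.sqrt(2)
    return np.outer(k, k.conj())

# toy single-pixel scattering matrix (reciprocal: S[0,1] == S[1,0])
S = np.array([[1 + 1j, 0.2j], [0.2j, 0.5 - 0.3j]])
T = coherency_T3(S)
# T is Hermitian and its trace equals the total scattered power (span)
```

The trace of T equals |S_hh|^2 + 2|S_hv|^2 + |S_vv|^2, the span, which is a quick sanity check when implementing decompositions.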

3.3. Optimal Features Selection

Feature selection is a critical step in the feature construction process, involving the selection of the most important subset of variables for prediction or classification tasks. The Boruta algorithm is an efficient feature selection optimization technique that introduces shadow features and trains the features along with the original features in a unified feature matrix to evaluate the importance of the original features [91]. Specifically, the algorithm generates shadow features randomly and compares them with the candidate features to identify those in the dataset that are relevant to the dependent variable, ranking them according to their relative importance [92].
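The shadow-feature idea behind Boruta can be illustrated with a deliberately simplified single pass; the real algorithm iterates many times with random-forest importances, whereas this sketch uses absolute correlation with the target as a stand-in importance measure:

```python
import numpy as np

def shadow_feature_select(X, y, rng=None):
    """One simplified Boruta-style pass: permute each feature column to
    create 'shadow' copies, score real and shadow features with a stand-in
    importance (absolute correlation with y), and keep only the real
    features that beat the best shadow score. The actual Boruta algorithm
    repeats this with random-forest importances over many iterations.
    """
    rng = np.random.default_rng(rng)
    shadows = rng.permuted(X, axis=0)  # shuffling destroys feature-target links
    def importance(M):
        return np.abs([np.corrcoef(M[:, j], y)[0, 1] for j in range(M.shape[1])])
    real_imp = importance(X)
    max_shadow = importance(shadows).max()
    return np.flatnonzero(real_imp > max_shadow), real_imp

# synthetic data: one informative feature plus three pure-noise features
rng = np.random.default_rng(0)
n = 500
informative = rng.normal(size=n)
noise = rng.normal(size=(n, 3))
X = np.column_stack([informative, noise])
y = informative + 0.1 * rng.normal(size=n)
kept, _ = shadow_feature_select(X, y, rng=1)
# feature 0 (the informative one) should survive the shadow test
```

In this study the same principle, applied over the full 40-feature matrix, yielded the eight features reported in Section 4.2.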

3.4. Supervised Classification

3.4.1. Machine Learning Algorithms

K-Nearest Neighbor (KNN)
KNN is a classic classification algorithm first proposed in 1967 [93]. The core principle of the KNN algorithm involves evaluating the features of test data and comparing them with those in the training set, subsequently identifying the K nearest samples to perform classification or prediction tasks. The similarity between samples can be measured using various distance metrics [94,95,96]. KNN has been commonly applied to soil salinity monitoring due to its simplicity, ease of understanding, suitability for multiple classification tasks, and broad applicability [97,98].
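The KNN decision rule described above can be sketched in a few lines of NumPy; a toy illustration with Euclidean distance, not the configuration used in this study:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test sample by majority vote among its k nearest
    training samples (Euclidean distance)."""
    # pairwise distance matrix of shape (n_test, n_train)
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]   # indices of the k closest
    votes = y_train[nearest]                 # their class labels
    return np.array([np.bincount(v).argmax() for v in votes])

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.05, 0.1], [1.05, 0.95]])
print(knn_predict(X_train, y_train, X_test, k=3))  # → [0 1]
```

Other distance metrics mentioned in the cited works (e.g., Manhattan) only change how `d` is computed.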
Support Vector Machine (SVM)
The Support Vector Machine (SVM) algorithm, first introduced by Vapnik et al., is among the most commonly used kernel-based learning algorithms [99]. In remote sensing classification, the data sample to be classified is typically a single pixel extracted from multispectral or hyperspectral images. The corresponding set of band measurements for each pixel forms a feature vector. The core principle of SVM involves the identification of a hyperplane with the largest margin in a transformed feature space, derived from the feature vectors, with the objective of effectively partitioning the dataset into specified categories [100]. Furthermore, SVM is especially beneficial in remote sensing applications for its capability to manage limited training datasets, often resulting in a higher classification accuracy in comparison with traditional methods [101,102].
Random Forest (RF)
The Random Forest (RF) algorithm, proposed by Leo Breiman et al., is a widely utilized technique in machine learning [103]. RF has been extensively applied in soil salinity monitoring and information extraction [104,105,106]. This method employs ensemble learning, incorporating decision trees, with multiple trees trained on bootstrapped data samples [107]. The classification of each pixel is determined by the majority vote of all trees. In recent studies, RF has gained popularity for classification and regression tasks due to its high-speed training and high classification accuracy [108].
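The majority-vote aggregation at the heart of RF classification can be illustrated as follows; the per-tree predictions are invented for the example, and a real forest would also handle bootstrapping and tree construction:

```python
import numpy as np

def majority_vote(tree_predictions):
    """Aggregate per-tree class predictions of shape (n_trees, n_samples)
    into one final label per sample, as a random forest classifier does."""
    n_trees, n_samples = tree_predictions.shape
    return np.array([np.bincount(tree_predictions[:, i]).argmax()
                     for i in range(n_samples)])

# hypothetical predictions from 5 trees for 4 pixels (classes 0-2)
preds = np.array([
    [0, 1, 2, 1],
    [0, 1, 2, 2],
    [1, 1, 2, 1],
    [0, 0, 2, 1],
    [0, 1, 1, 1],
])
print(majority_vote(preds))  # → [0 1 2 1]
```

Because each tree is trained on a different bootstrap sample, the vote averages out individual trees' errors, which is what gives RF its robustness.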

3.4.2. Deep Learning Algorithms

U-Net
The U-Net, introduced by Olaf Ronneberger et al. in 2015, is a convolutional neural network architecture [28,109]. The backbone of U-Net comprises two symmetrical parts: the encoder and the decoder. The encoder processes the input image through pooling operations and extracts features by stacking multiple convolutional layers. Pooling reduces the spatial dimensions by down-sampling while capturing increasingly abstract features. The decoder generates an up-sampled feature map (using operations such as transposed convolution) by combining features from different stages of the encoder. The final layer of the decoder is a convolutional layer that acts as a classifier, generating a probability matrix as output. The output is then thresholded, or the maximum probability value is selected, with each pixel being assigned to its corresponding category.
Multi-Source Attention U-Net (MSA-U-Net)
Based on the U-Net framework, this study introduced and modified the architecture to propose the Multi-Source Attention U-Net (MSA-U-Net). This innovative design enhances the original U-Net by incorporating the CBAM into the skip connections, thereby facilitating more effective feature fusion across different layers. A detailed overview of the MSA-U-Net is presented in Figure 5.
Each layer of the encoding section consists of a series of operations designed to extract and refine features from the input. It includes two main stages, each comprising a convolutional layer, followed by a rectified linear unit (ReLU) activation function [110], and a batch normalization layer. The ReLU activation function introduces non-linearity, and the batch normalization layer normalizes the activations to stabilize and accelerate the training process. The initial convolutional layer converts the input feature map from input to output channels, utilizing a 3 × 3 kernel with a padding of 1 to preserve the spatial dimension. The second stage repeats these operations, further refining the feature map while keeping the output channel number consistent. After the initial extraction of features, the network performs an early fusion of the optical and radar features through concatenation, followed by further encoding to integrate the multi-modal information deeply.
In the decoder, each stage begins with a 2 × 2 transposed convolution operation for up-sampling the feature map, followed by halving the number of feature channels. Subsequently, the up-sampled feature mapping is further refined using a sequence of two 3 × 3 convolutional layers. This up-sampling and convolutional refinement process is repeated four times, halving the number of filters at each stage. A 1 × 1 convolution operation at the end of MSA-U-Net is used to reduce the feature maps to make them equal to the number of classes in the dataset.
The attention mechanism is a process that focuses on the critical parts of information, effectively filtering out secondary or irrelevant details. In deep learning, the attention mechanism enhances the model’s ability to concentrate on key information by dynamically assigning weights to the input data, thereby enhancing processing efficiency and accuracy. Therefore, the MSA-U-Net introduces the Convolutional Block Attention Module (CBAM) (Figure 6) within the skip connections at various levels of the network. The CBAM consists of two components: the channel attention mechanism (CAM) and the spatial attention mechanism (SAM).
Radar data capture the surface texture and structure using backscatter and polarization, reflecting roughness, moisture, and material type. Due to their high sensitivity to surface material composition and physical properties, spectral data can distinguish features such as vegetation, soil, and water by reflectance at different wavelengths. The channel attention mechanism assigns weights to important feature channels, prioritizing key information in both radar and spectral data (e.g., vegetation indices or radar polarization channels). This helps the model automatically select the most relevant features, extracting the texture and structure from the radar data, and material composition from the spectral data. When integrating both data types, channel attention balances their differences, enhancing detailed information. The spatial attention mechanism captures local spatial information, focusing on key regions within images, improving the model’s ability to differentiate surface areas and boost performance in multi-source data fusion.
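One common PyTorch formulation of the CBAM described above (channel attention from shared-MLP scores of average- and max-pooled maps, followed by spatial attention over channel-wise statistics, as in Woo et al.) is sketched below; the hyperparameters such as the reduction ratio are illustrative, not necessarily those of MSA-U-Net:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention followed by
    spatial attention. Illustrative sketch, not the exact MSA-U-Net code."""

    def __init__(self, channels, reduction=8, spatial_kernel=7):
        super().__init__()
        # Channel attention: shared MLP applied to global avg- and max-pooled maps
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: conv over stacked channel-wise avg and max maps
        self.spatial = nn.Conv2d(2, 1, spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=2, keepdim=True).amax(dim=3, keepdim=True))
        x = x * torch.sigmoid(avg + mx)                    # channel attention
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))          # spatial attention

x = torch.randn(2, 16, 32, 32)   # (batch, channels, height, width)
y = CBAM(16)(x)
# output keeps the input shape, so CBAM can sit inside a skip connection
```

Because the module is shape-preserving, it can be dropped into a U-Net skip connection without changing the rest of the architecture.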
In summary, the proposed MSA-U-Net model not only leverages the complementary strengths of optical and radar data but also enhances segmentation performance through the strategic incorporation of attention mechanisms. It is designed to be versatile and can be used for various tasks requiring high precision in multi-source image analysis, including remote sensing image segmentation.
In addition to the main classification models, we designed two U-Net variants, U-Net-SAM and U-Net-CAM, to evaluate the impact of different attention mechanisms on skip connections. These variants are discussed in the Discussion section.
The proposed models were developed in Python 3.11.8 using a PyTorch backend. All experiments were conducted on an NVIDIA GeForce RTX 4060 GPU (NVIDIA, Santa Clara, CA, USA). The initial learning rate was set to 0.0001, the models were trained with mini-batches of size 16, and early stopping was applied during training [111].
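The early-stopping criterion mentioned above can be expressed framework-agnostically; a minimal sketch with an illustrative patience value, since the study's exact stopping rule is not specified:

```python
class EarlyStopping:
    """Stop training when the validation loss has not improved for
    `patience` consecutive epochs (framework-agnostic sketch)."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=3)
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73]  # validation loss stalls after epoch 2
stopped_at = next(i for i, l in enumerate(losses) if stopper.step(l))
print(stopped_at)  # → 5
```

In a training loop, `stopper.step(val_loss)` would be called once per epoch, breaking out of the loop when it returns True.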

3.5. Model Evaluation Metrics

This study employed several evaluation metrics to measure the performance of different algorithms on the validation dataset. The specific evaluation metrics include User’s Accuracy (UA), Producer’s Accuracy (PA), Overall Accuracy (OA), Kappa coefficient, and F1 score [112]. The formulas are as follows:
\mathrm{Recall} = \frac{TP}{TP + FN} = \frac{C_{ii}}{\sum_{j=1}^{n} C_{ij}} = \text{Producer's Accuracy},
\mathrm{Precision} = \frac{TP}{TP + FP} = \frac{C_{ii}}{\sum_{j=1}^{n} C_{ji}} = \text{User's Accuracy},
\mathrm{OA} = \frac{TP + TN}{TP + FP + TN + FN},
\mathrm{Kappa} = \frac{P_0 - P_e}{1 - P_e}, \quad P_0 = \frac{TP + TN}{TP + TN + FP + FN}, \quad P_e = \frac{(TP + FP)(TP + FN) + (TN + FP)(TN + FN)}{(TP + TN + FP + FN)^2},
\mathrm{F1\ score} = \frac{2 \times UA \times PA}{UA + PA} = \frac{2 \times TP}{FP + 2 \times TP + FN}
where n is the number of categories; C_{ij} refers to the number of pixels of actual class i that were classified as class j; and TP, FP, TN, and FN represent the total numbers of true positive, false positive, true negative, and false negative pixels, respectively.
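The metrics defined above can be computed directly from a multi-class confusion matrix; a sketch with an invented 3-class matrix, not results from this study:

```python
import numpy as np

def classification_metrics(cm):
    """Compute OA, Kappa, and per-class F1 from a confusion matrix whose
    rows are actual classes and columns are predicted classes."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    oa = np.trace(cm) / total
    # chance agreement from row/column marginals (multi-class P_e)
    pe = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / total**2
    kappa = (oa - pe) / (1 - pe)
    pa = np.diag(cm) / cm.sum(axis=1)   # producer's accuracy (recall)
    ua = np.diag(cm) / cm.sum(axis=0)   # user's accuracy (precision)
    f1 = 2 * ua * pa / (ua + pa)
    return oa, kappa, f1

# hypothetical 3-class confusion matrix (rows: actual, columns: predicted)
cm = [[50, 5, 0],
      [4, 40, 6],
      [1, 5, 39]]
oa, kappa, f1 = classification_metrics(cm)
```

The two-class TP/TN formulas in the equations above are the special case of this computation for n = 2.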

4. Results

4.1. Polarimetric Decomposition of RADARSAT-2 Data

In this study, polarization target decomposition was performed on RADARSAT-2’s fully polarized images using SNAP software to extract feature parameters. First, the scattering matrix S, covariance matrix C, and coherence matrix T were calculated from the RADARSAT-2 data. Next, various polarization decomposition methods were applied to extract polarimetric scattering information. A total of 26 polarization scattering features were extracted and are shown in Table 5, and the standard RGB composition images are presented in Figure 7. By visually inspecting each polarimetric feature image, we identified and excluded those with significant noise and a high number of corrupted pixels, ultimately retaining 23 polarization features.

4.2. Feature Preprocessing and Selection of Optimal Feature Subset

The feature space in this study consists of 40 features: 7 vegetation indices, 10 salinity indices, and 23 polarimetric scattering features. Because the polarimetric decomposition features and the spectral indices differ greatly in value range, each feature F was normalized to the range [−1, 1] before the training data were fed to the machine learning models. The normalization formula is expressed below:

$$F_{\mathrm{scaled}} = \frac{F - F_{\min}}{F_{\max} - F_{\min}}$$

where $F$ represents the pixel value, and $F_{\min}$ and $F_{\max}$ are the minimum and maximum values of that feature.
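This preprocessing step can be sketched as follows (the helper name and the optional `low`/`high` parameters are assumptions; the plain min–max formula maps to [0, 1], and the stated target range of [−1, 1] corresponds to `low=-1, high=1`):

```python
import numpy as np

def minmax_scale(feature, low=0.0, high=1.0):
    """Min-max scale a feature array to [low, high]; hypothetical helper."""
    f_min, f_max = feature.min(), feature.max()
    scaled01 = (feature - f_min) / (f_max - f_min)  # plain min-max: range [0, 1]
    return low + (high - low) * scaled01            # affine rescale to [low, high]
```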
A total of 40 variables were ranked by Z-score using the Boruta algorithm, as shown in Figure 8. Among these, eight variables exceeded the maximum significance of the shadow features (2.402). These variables, namely SI2, ENDVI, Sinclair_Dbl, NDGI, Cloude_Vol, SI, Cloude_Dbl, and Freeman_Durden_Odd, were subsequently selected for the classification model. The MS dataset comprised the four optical features (e.g., SI2, ENDVI, etc.), and the SAR dataset comprised the four radar-derived polarimetric scattering features (e.g., Sinclair_Dbl, Cloude_Dbl, etc.). To balance the amount of information from the different datasets, a combined MS + SAR dataset was created by integrating the top two features from each group. The selected features strike a balance between data diversity and redundancy minimization, providing a robust foundation for accurate soil salinization classification.
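The core idea of Boruta, comparing each real feature against the best "shadow" (shuffled) copy, can be illustrated with a simplified sketch. Note that this substitutes a plain correlation score for the random-forest Z-scores used in the study, and `shadow_select` is a hypothetical helper, so it is a conceptual illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def shadow_select(X, y, score=lambda f, y: abs(np.corrcoef(f, y)[0, 1])):
    """Boruta-style selection sketch: keep features that beat the best shadow.

    Shadows are shuffled copies of the real features (shuffling destroys any
    feature-target link); `score` is a stand-in importance measure.
    """
    shadows = np.apply_along_axis(rng.permutation, 0, X)
    shadow_max = max(score(shadows[:, j], y) for j in range(X.shape[1]))
    real = np.array([score(X[:, j], y) for j in range(X.shape[1])])
    return np.flatnonzero(real > shadow_max), shadow_max
```

Features whose importance does not clearly exceed `shadow_max` are treated as indistinguishable from noise, which is how the 2.402 threshold in Figure 8 arises in the real algorithm.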

4.3. Classification Results of MSA-U-Net

To quantitatively evaluate the performance of MSA-U-Net compared to other methods for monitoring salinized soil, the classification accuracies of the five classifiers were analyzed. This evaluation includes a detailed statistical summary and the corresponding confusion matrix, offering a comprehensive performance assessment.
As demonstrated in Figure 9, the classification results of the MS data (Figure 9(A-1–A-5)) display pixel misclassification, with mixed distributions observed among vegetation, water bodies, and saline soil. On average, the Overall Accuracy (OA) across the five classification methods for MS data is 77.11% (Table 6). Similarly, the classification results of the SAR data (Figure 9(B-1–B-5)) exhibit a lower classification accuracy, with evident effects of noise and outliers on the results, leading to an average OA of 53.23%. In contrast, the multi-source classification results (Figure 9(C-1–C-5)) show better outcomes, with reduced noise in the radar bands and fewer mixed land cover pixels, achieving the highest average OA of 80.37%.
The results in Table 6 show that the proposed MSA-U-Net yielded the highest OA (87.34%) and Kappa (0.84), outperforming traditional classifiers such as KNN (OA: 74.79%, Kappa: 0.69), SVM (OA: 77.66%, Kappa: 0.73), RF (OA: 79.10%, Kappa: 0.74), and the standard U-Net (OA: 82.98%, Kappa: 0.76). Notably, the MSA-U-Net also demonstrated superior PA and UA across most salinization classes.

4.4. Comparison and Analysis of Classification Results Based on Multi-Source Data

To assess the performance of five methods (KNN, SVM, RF, U-Net, and MSA-U-Net) on these datasets and evaluate the impact of data integration, OA, Kappa, and F1 score were used.
Table 7 presents the classification performance of all tested models. On the MS dataset, the traditional machine learning methods (KNN, SVM, RF) achieved moderate performance, with OAs ranging from 73% to 77%, whereas the deep learning methods (U-Net and MSA-U-Net) outperformed them, with OAs of 79.15% and 79.57%, respectively.
For the SAR dataset, extraction accuracy (OA, Kappa, and F1 score) decreased noticeably relative to the MS dataset. The decrease was more pronounced for the deep learning methods than for the machine learning methods: on average, OA fell by about 5 percentage points more (a 28% drop vs. 23%), Kappa by 0.04 more (0.32 vs. 0.28), and F1 score by 0.04 more (0.29 vs. 0.25).
When combining MS and SAR data, all models showed improved performance, with MSA-U-Net achieving the best results. OA increased significantly, rising from 79.57% to 87.34%. Additionally, Kappa improved by 0.09 (0.75 vs. 0.84), and the F1 score increased by 0.07 (0.80 vs. 0.87).

4.5. Characteristics of Spatial Distribution of Soil Salinity in Keriya Oasis

The classification results indicated a distinct spatial distribution of salinized soils in the Keriya Oasis. Highly salinized soils (HSs) are predominantly concentrated in the central and northeastern parts of the region, corresponding to areas with limited vegetation cover. These areas are typically characterized by bare soil or sparse vegetation, where the accumulation of salts is facilitated by higher evaporation rates and shallower groundwater tables. Moderately salinized soils (MSs) are mainly distributed in the ecotone between the Oasis and the Taklimakan Desert. Slightly salinized soils (SSs) are primarily located in the western and southern regions, often overlapping with vegetated zones, as the presence of vegetation is indicative of lower surface salt accumulation.

5. Discussion

5.1. Classification Performance Across Different Remote Sensing Data Sources

As demonstrated in Figure 9 and Table 6 and Table 7, the MS + SAR data significantly enhanced the classification performance compared to the single-source data (MS and SAR). The MS data, which capture spectral indices sensitive to vegetation characteristics, proved highly effective in identifying vegetation-dominated areas and low-salinity soils [113]. However, their limited sensitivity to soil structure posed challenges for accurately classifying saline soils. In contrast, the SAR data excelled in capturing soil texture and structure, enabling more effective differentiation of salinity levels [114]. Nevertheless, the radar data performed poorly in distinguishing vegetation types, highlighting the limitations of radar’s sensitivity to spectral characteristics and the negative impact of high-frequency noise, such as speckle, on classification results [115]. The combination of optical and radar data improved classification accuracy by utilizing the complementary strengths of both sources [92]. Multi-source data fusion significantly enhanced OA, Kappa, and F1 scores across all classification models, particularly for challenging categories like moderately and highly salinized soils. These results highlight the critical role of multi-source data fusion in improving classification performance, especially in complex landscapes characterized by diverse land cover types and varying salinity levels.
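A minimal sketch of the pixel-level fusion step implied here, assuming the optical and radar feature rasters are already co-registered on the same grid (`stack_sources` is a hypothetical helper, not the paper's code):

```python
import numpy as np

def stack_sources(ms_bands, sar_bands):
    """Pixel-level fusion sketch: stack co-registered MS and SAR feature
    rasters into one multi-channel array (channels-first, as a CNN expects)."""
    layers = list(ms_bands) + list(sar_bands)
    h, w = layers[0].shape
    assert all(l.shape == (h, w) for l in layers), "sources must be co-registered"
    return np.stack(layers, axis=0)  # shape: (n_channels, H, W)
```

With the top two features from each group, this yields a four-channel input cube; the complementary spectral (MS) and structural (SAR) information is then fused by the network itself rather than by hand-crafted rules.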

5.2. Comparison of MSA-U-Net with Traditional Machine Learning and Baseline Models

As shown in Table 7, the DL methods, especially MSA-U-Net, were superior to the three ML models in extracting salinization information from the MS and MS + SAR data. This is likely because U-Net and MSA-U-Net are based on convolutional neural networks (CNNs), which effectively capture subtle differences in spectral features and reflectance associated with different salinity levels [116]. However, for the SAR dataset, both the ML and DL methods obtained poor classification results, likely due to their high sensitivity to high-frequency noise (e.g., speckle) [117].
From the comparative analysis in Table 6, the MSA-U-Net achieved the best performance, especially in classifying saline soils. Compared with RF and U-Net, the extraction accuracy for slightly salinized soil improved from 70.37% and 72.22%, respectively, to 86.96%, and for moderately salinized soil it increased from 79.22% and 68.18% to 81.99%. In general, the MSA-U-Net also showed significant improvements in OA and Kappa. These improvements can be attributed to the integration of the CBAM, which highlights regions with significant spectral variations, such as salinity transitions, and emphasizes spectral indices closely linked to soil salinity. By incorporating the CBAM, the MSA-U-Net achieves a well-balanced synergy between feature preservation and enhancement, resulting in superior performance in multi-source data fusion tasks.

5.3. Impact of Attention Mechanisms on Skip Connections

To evaluate the impact of integrating the CBAM into the U-Net model for extracting and analyzing multi-source data, we compared its classification performance against four established approaches. The baseline U-Net was used as the control, alongside three enhanced variants: the U-Net incorporating a Spatial Attention Module (SAM) in the skip connections (referred to as U-Net-SAM), the U-Net incorporating a Channel Attention Module (CAM) in the skip connections (referred to as U-Net-CAM), and the MSA-U-Net. To ensure a consistent comparison, all models shared identical encoding and decoding architectures, maintaining structural equivalence across all approaches.
The aforementioned algorithms utilized multi-source features as input. Table 8 presents a comparison of the classification results across four different models. The MSA-U-Net model achieved the highest performance, with an OA of 87.34%, a Kappa coefficient of 0.84, and an F1 score of 0.87, outperforming the other three models (Table 8). The U-Net-SAM and U-Net-CAM models showed comparable performance, each achieving an OA of approximately 86%. However, the baseline U-Net model demonstrated relatively poor performance, highlighting the limitations of the standard architecture without attention mechanisms.
The performance improvements in U-Net-CAM, U-Net-SAM, and MSA-U-Net over the baseline U-Net can be attributed to the incorporation of attention mechanisms. The CAM selectively amplifies important features across channels, refining feature representation [118]. The SAM enhances the model’s focus on critical spatial features, improving boundary detection, particularly in tasks with complex spatial patterns like salinization zones. The MSA-U-Net, which integrates both channel and spatial attention through the CBAM, demonstrates the best performance. This dual-attention mechanism effectively captures interdependencies across feature maps, allowing the model to distinguish subtle variations, such as differences in vegetation or salinity levels, which are crucial for accurate classification in multi-source data [119]. Moreover, integrating the CBAM into the skip connections helps preserve refined features during fusion, further boosting the model’s performance.
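The dual-attention flow described above (channel attention followed by spatial attention on a skip-connection feature map) can be sketched in NumPy. This is a conceptual illustration only: the paper's CBAM is implemented in PyTorch with learned weights, and the 7×7 convolution of the spatial branch is replaced here by a simple elementwise sum of the two pooled maps:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """Channel attention: shared two-layer MLP (w1, w2) applied to the
    average- and max-pooled channel descriptors, then a sigmoid gate."""
    avg = x.mean(axis=(1, 2))                      # (C,) average-pooled descriptor
    mx = x.max(axis=(1, 2))                        # (C,) max-pooled descriptor
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)   # C -> C/r -> C with ReLU
    scale = sigmoid(mlp(avg) + mlp(mx))            # (C,) per-channel weights
    return x * scale[:, None, None]

def spatial_attention(x):
    """Spatial attention: sigmoid gate over channel-wise avg and max maps.
    (CBAM learns a 7x7 conv over the two maps; a fixed sum is used here.)"""
    avg = x.mean(axis=0)                           # (H, W)
    mx = x.max(axis=0)                             # (H, W)
    return x * sigmoid(avg + mx)[None, :, :]

def cbam_skip(x, w1, w2):
    """Apply CBAM to a skip-connection feature map: channel, then spatial."""
    return spatial_attention(channel_attention(x, w1, w2))
```

Because both gates are sigmoids, each stage only rescales activations (never amplifies them beyond their input magnitude), which is why the refined skip features remain compatible with the decoder while de-emphasizing noisy channels and locations.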

5.4. Potential and Limitations

Soil salinization, a significant global environmental issue, remains a key research focus. In the Keriya Oasis, salinity levels increase from the oasis toward the desert, a trend consistent with previous studies [33,44]. The superiority of deep learning combined with multi-source remote sensing over traditional machine learning approaches aligns with prior findings [57]. However, few studies have explored attention-based feature fusion strategies, which are crucial for effectively integrating multi-source data in soil salinization mapping. MSA-U-Net addresses this gap by incorporating such a strategy, enhancing its ability to capture complex spatial and spectral relationships. Specifically, the Convolutional Block Attention Module in skip connections refines feature representation by selectively emphasizing spatially and spectrally important information, improving the differentiation of subtle salinity variations. These factors likely contribute to its superior accuracy, yet further comparative studies are needed to fully validate its effectiveness and generalizability across different salinization contexts.
However, this study has several limitations. First, the labeling of saline soils is inherently subjective, and distinguishing between different salinity levels remains challenging, potentially leading to discrepancies between labeled data and actual conditions, thereby affecting classification accuracy. Second, while SAR data enhance classification performance, residual speckle noise in radar images may still introduce errors. Additionally, attention-based fusion strategies struggle to filter redundant information in real time and are highly sensitive to differences between optical and SAR images, making the effective fusion of multi-source data a persistent challenge in remote sensing. Research on land use and soil salinity monitoring using deep learning and multi-source remote sensing data remains limited. Future studies should focus on improving the deep feature fusion of optical and SAR images and exploring more advanced deep learning models (e.g., Vision Transformer [120], Vision Mamba [121]) and multi-source feature fusion strategies (e.g., linear feature fusion [122], gated-based feature fusion [123]) to enhance the accuracy and reliability of soil salinity monitoring.

6. Conclusions

A novel deep fusion U-Net model, MSA-U-Net, is proposed based on multi-source data to optimize salinized soil mapping performance across different degrees of salinization. To improve mapping accuracy, the model integrates the Convolutional Block Attention Module (CBAM) into the skip connections, combining the benefits of spatial and channel attention mechanisms. This dual-attention approach allows the model to capture both spatial and channel-wise interdependencies effectively, leading to more accurate feature extraction and classification. With an OA of 87.34%, a Kappa of 0.84, and an F1 score of 0.87 in the study area, the MSA-U-Net demonstrated excellent potential for precise and reliable salinized soil mapping. These findings underscore the advantages of integrating attention mechanisms into U-Net for multi-source remote sensing tasks.
The findings of this research offer a novel method for assessing soil salinization in the Keriya Oasis, aiding the region’s ecological stability. Furthermore, the MSA-U-Net model demonstrates its potential for broader remote sensing applications where multi-source data integration is critical, offering a novel perspective for monitoring soil salinity.

Author Contributions

Y.X. and I.N. conceptualized the manuscript’s topic and were responsible for the overall direction and planning. X.L., X.Y. and A.G. contributed to field data collection. Y.X. conducted the data analysis and drafted the manuscript. X.L., A.A. and S.L. reviewed and revised the first draft. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Xinjiang Uygur Autonomous Region, China (No.2024D01C34); the Third Xinjiang Comprehensive Scientific Expedition (No.2022xjkk0301); the National Natural Science Foundation of China (No.42061065, No.32160319); and the 2024 National Undergraduates Training Program for Innovation of Xinjiang University (No.202410755011).

Data Availability Statement

The datasets are available from the corresponding author upon reasonable request.

Acknowledgments

All authors sincerely appreciate the invaluable feedback provided by all those who contributed to the refinement of this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Singh, A. Soil salinization management for sustainable development: A review. J. Environ. Manag. 2021, 277, 111383. [Google Scholar] [CrossRef] [PubMed]
  2. Singh, A. Soil salinization and waterlogging: A threat to environment and agricultural sustainability. Ecol. Indic. 2015, 57, 128–130. [Google Scholar] [CrossRef]
  3. Hafez, E.M.; Omara, A.E.D.; Alhumaydhi, F.A.; El-Esawi, M.A. Minimizing hazard impacts of soil salinity and water stress on wheat plants by soil application of vermicompost and biochar. Physiol. Plant. 2021, 172, 587–602. [Google Scholar] [CrossRef]
  4. Li, J.; Pu, L.; Han, M.; Zhu, M.; Zhang, R.; Xiang, Y. Soil salinization research in China: Advances and prospects. J. Geogr. Sci. 2014, 24, 943–960. [Google Scholar] [CrossRef]
  5. Barbouchi, M.; Abdelfattah, R.; Chokmani, K.; Aissa, N.B.; Lhissou, R.; El Harti, A. Soil salinity characterization using polarimetric InSAR coherence: Case studies in Tunisia and Morocco. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 8, 3823–3832. [Google Scholar] [CrossRef]
  6. Fathizad, H.; Ardakani, M.A.H.; Heung, B.; Sodaiezadeh, H.; Rahmani, A.; Fathabadi, A.; Scholten, T.; Taghizadeh-Mehrjardi, R. Spatio-temporal dynamic of soil quality in the central Iranian desert modeled with machine learning and digital soil assessment techniques. Ecol. Indic. 2020, 118, 106736. [Google Scholar] [CrossRef]
  7. Peng, J.; Biswas, A.; Jiang, Q.; Zhao, R.; Hu, J.; Hu, B.; Shi, Z. Estimating soil salinity from remote sensing and terrain data in southern Xinjiang Province, China. Geoderma 2019, 337, 1309–1319. [Google Scholar] [CrossRef]
  8. Ma, G.; Ding, J.; Han, L.; Zhang, Z.; Ran, S. Digital mapping of soil salinization based on Sentinel-1 and Sentinel-2 data combined with machine learning algorithms. Reg. Sustain. 2021, 2, 177–188. [Google Scholar] [CrossRef]
  9. Asfaw, E.; Suryabhagavan, K.; Argaw, M. Soil salinity modeling and mapping using remote sensing and GIS: The case of Wonji sugar cane irrigation farm, Ethiopia. J. Saudi Soc. Agric. Sci. 2018, 17, 250–258. [Google Scholar] [CrossRef]
  10. Yu, H.; Liu, M.; Du, B.; Wang, Z.; Hu, L.; Zhang, B. Mapping soil salinity/sodicity by using Landsat OLI imagery and PLSR algorithm over semiarid West Jilin Province, China. Sensors 2018, 18, 1048. [Google Scholar] [CrossRef]
  11. Metternicht, G.I.; Zinck, J. Remote sensing of soil salinity: Potentials and constraints. Remote Sens. Environ. 2003, 85, 1–20. [Google Scholar] [CrossRef]
  12. Ulaby, F.; Allen, C.; Eger Iii, G.; Kanemasu, E. Relating the microwave backscattering coefficient to leaf area index. Remote Sens. Environ. 1984, 14, 113–133. [Google Scholar] [CrossRef]
  13. Ulaby, F.T.; Moore, R.K.; Fung, A.K. Microwave Remote Sensing: Active and Passive. Volume 2-Radar Remote Sensing and Surface Scattering and Emission Theory; NASA: Washington, DC, USA, 1982. [Google Scholar]
  14. Rhoades, J.; Chanduvi, F.; Lesch, S. Soil Salinity Assessment: Methods and Interpretation of Electrical Conductivity Measurements; Food & Agriculture Organization: Rome, Italy, 1999. [Google Scholar]
  15. Grissa, M.; Abdelfattah, R.; Mercier, G.; Zribi, M.; Chahbi, A.; Lili-Chabaane, Z. Empirical model for soil salinity mapping from SAR data. In Proceedings of the 2011 IEEE International Geoscience and Remote Sensing Symposium, Vancouver, BC, Canada, 24–29 July 2011; pp. 1099–1102. [Google Scholar]
  16. Han, P.; Chen, Z.; Wan, Y.; Cheng, Z. PolSAR image classification based on optimal feature and convolution neural network. In Proceedings of the IGARSS 2020-2020 IEEE International Geoscience and Remote Sensing Symposium, Waikoloa, HI, USA, 26 September–2 October 2020; pp. 1735–1738. [Google Scholar]
  17. Qi, Z.; Yeh, A.G.-O.; Li, X.; Lin, Z. A novel algorithm for land use and land cover classification using RADARSAT-2 polarimetric SAR data. Remote Sens. Environ. 2012, 118, 21–39. [Google Scholar] [CrossRef]
  18. Zhang, Q.; Li, L.; Sun, R.; Zhu, D.; Zhang, C.; Chen, Q. Retrieval of the soil salinity from Sentinel-1 Dual-Polarized SAR data based on deep neural network regression. IEEE Geosci. Remote Sens. Lett. 2020, 19, 4006905. [Google Scholar] [CrossRef]
  19. Chen, S.-W.; Wang, X.-S.; Xiao, S.-P. Urban damage level mapping based on co-polarization coherence pattern using multitemporal polarimetric SAR data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 2657–2667. [Google Scholar] [CrossRef]
  20. Musthafa, M.; Khati, U.; Singh, G. Sensitivity of PolSAR decomposition to forest disturbance and regrowth dynamics in a managed forest. Adv. Space Res. 2020, 66, 1863–1875. [Google Scholar] [CrossRef]
  21. Wei, Q.; Nurmemet, I.; Gao, M.; Xie, B. Inversion of soil salinity using multisource remote sensing data and particle swarm machine learning models in Keriya Oasis, northwestern China. Remote Sens. 2022, 14, 512. [Google Scholar] [CrossRef]
  22. Zhu, X.X.; Tuia, D.; Mou, L.; Xia, G.-S.; Zhang, L.; Xu, F.; Fraundorfer, F. Deep learning in remote sensing: A comprehensive review and list of resources. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–36. [Google Scholar] [CrossRef]
  23. Kussul, N.; Lavreniuk, M.; Skakun, S.; Shelestov, A. Deep learning classification of land cover and crop types using remote sensing data. IEEE Geosci. Remote Sens. Lett. 2017, 14, 778–782. [Google Scholar] [CrossRef]
  24. Garg, R.; Kumar, A.; Bansal, N.; Prateek, M.; Kumar, S. Semantic segmentation of PolSAR image data using advanced deep learning model. Sci. Rep. 2021, 11, 15365. [Google Scholar] [CrossRef]
  25. Gao, J.; Deng, B.; Qin, Y.; Wang, H.; Li, X. Enhanced radar imaging using a complex-valued convolutional neural network. IEEE Geosci. Remote Sens. Lett. 2018, 16, 35–39. [Google Scholar] [CrossRef]
  26. Fu, G.; Liu, C.; Zhou, R.; Sun, T.; Zhang, Q. Classification for high resolution remote sensing imagery using a fully convolutional network. Remote Sens. 2017, 9, 498. [Google Scholar] [CrossRef]
  27. Garcia-Garcia, A.; Orts-Escolano, S.; Oprea, S.; Villena-Martinez, V.; Garcia-Rodriguez, J. A review on deep learning techniques applied to semantic segmentation. arXiv 2017, arXiv:1704.06857. [Google Scholar] [CrossRef]
  28. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III; pp. 234–241. [Google Scholar]
  29. Han, Z.; Dian, Y.; Xia, H.; Zhou, J.; Jian, Y.; Yao, C.; Wang, X.; Li, Y. Comparing fully deep convolutional neural networks for land cover classification with high-spatial-resolution Gaofen-2 images. ISPRS Int. J. Geo-Inf. 2020, 9, 478. [Google Scholar] [CrossRef]
  30. Akca, S.; Gungor, O. Semantic segmentation of soil salinity using in-situ EC measurements and deep learning based U-NET architecture. Catena 2022, 218, 106529. [Google Scholar] [CrossRef]
  31. Solórzano, J.V.; Mas, J.F.; Gao, Y.; Gallardo-Cruz, J.A. Land use land cover classification with U-net: Advantages of combining sentinel-1 and sentinel-2 imagery. Remote Sens. 2021, 13, 3600. [Google Scholar] [CrossRef]
  32. Clark, A.; Phinn, S.; Scarth, P. Optimised U-Net for land use–land cover classification using aerial photography. PFG–J. Photogramm. Remote Sens. Geoinf. Sci. 2023, 91, 125–147. [Google Scholar] [CrossRef]
  33. Zhao, J.; Nurmemet, I.; Muhetaer, N.; Xiao, S.; Abulaiti, A. Monitoring Soil Salinity Using Machine Learning and the Polarimetric Scattering Features of PALSAR-2 Data. Sustainability 2023, 15, 7452. [Google Scholar] [CrossRef]
  34. Wang, J.; Li, W.; Gao, Y.; Zhang, M.; Tao, R.; Du, Q. Hyperspectral and SAR Image Classification via Multiscale Interactive Fusion Network. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 10823–10837. [Google Scholar] [CrossRef]
  35. Chen, B.; Huang, B.; Xu, B. Multi-source remotely sensed data fusion for improving land cover classification. ISPRS J. Photogramm. Remote Sens. 2017, 124, 27–39. [Google Scholar] [CrossRef]
  36. Hughes, L.H.; Marcos, D.; Lobry, S.; Tuia, D.; Schmitt, M. A deep learning framework for matching of SAR and optical imagery. ISPRS J. Photogramm. Remote Sens. 2020, 169, 166–179. [Google Scholar] [CrossRef]
  37. Sommervold, O.; Gazzea, M.; Arghandeh, R. A survey on SAR and optical satellite image registration. Remote Sens. 2023, 15, 850. [Google Scholar] [CrossRef]
  38. Kulkarni, S.C.; Rege, P.P. Pixel level fusion techniques for SAR and optical images: A review. Inf. Fusion 2020, 59, 13–29. [Google Scholar] [CrossRef]
  39. Chen, C.; Yuan, X.; Gan, S.; Kang, X.; Luo, W.; Li, R.; Bi, R.; Gao, S. A new strategy based on multi-source remote sensing data for improving the accuracy of land use/cover change classification. Sci. Rep. 2024, 14, 26855. [Google Scholar] [CrossRef] [PubMed]
  40. Zhang, G.; Roslan, S.N.A.B.; Wang, C.; Quan, L. Research on land cover classification of multi-source remote sensing data based on improved U-net network. Sci. Rep. 2023, 13, 16275. [Google Scholar] [CrossRef] [PubMed]
  41. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  42. Haut, J.M.; Fernandez-Beltran, R.; Paoletti, M.E.; Plaza, J.; Plaza, A. Remote sensing image superresolution using deep residual channel attention. IEEE Trans. Geosci. Remote Sens. 2019, 57, 9277–9289. [Google Scholar] [CrossRef]
  43. Yu, X.; Qian, Y.; Geng, Z.; Huang, X.; Wang, Q.; Zhu, D. EMC²A-Net: An Efficient Multibranch Cross-Channel Attention Network for SAR Target Classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5210314. [Google Scholar] [CrossRef]
  44. Nurmemet, I.; Sagan, V.; Ding, J.-L.; Halik, Ü.; Abliz, A.; Yakup, Z. A WFS-SVM model for soil salinity mapping in Keriya Oasis, Northwestern China using polarimetric decomposition and fully PolSAR data. Remote Sens. 2018, 10, 598. [Google Scholar] [CrossRef]
  45. Du, M.; Li, L.; Luo, G.; Dong, K.; Shi, Q. Effects of climate and land use change on agricultural water consumption in Yutian Oasis. Bull. Soil Water Conserv. 2020, 40, 103–109. [Google Scholar]
  46. Muhetaer, N.; Nurmemet, I.; Abulaiti, A.; Xiao, S.; Zhao, J. An efficient approach for inverting the soil salinity in Keriya Oasis, northwestern China, based on the optical-radar feature-space model. Sensors 2022, 22, 7226. [Google Scholar] [CrossRef]
  47. Yang, S.; Yang, T. Exploration of the dynamic water resource carrying capacity of the Keriya River Basin on the southern margin of the Taklimakan Desert, China. Reg. Sustain. 2021, 2, 73–82. [Google Scholar] [CrossRef]
  48. Mamat, Z.; Yimit, H.; Lv, Y. Spatial distributing pattern of salinized soils and their salinity in typical area of Yutian Oasis. Chin. J. Soil Sci. 2013, 44, 1314–1320. [Google Scholar]
  49. Zhang, B.; Perrie, W.; He, Y. Validation of RADARSAT-2 fully polarimetric SAR measurements of ocean surface waves. J. Geophys. Res. Oceans 2010, 115. [Google Scholar] [CrossRef]
  50. Jiang, Y.; Ding, F.; Ma, R.; Li, X.; Xu, X. Atmospheric correction algorithms comparison for Lake Water based on Landsat8 images. In Proceedings of the 2018 Fifth International Workshop on Earth Observation and Remote Sensing Applications (EORSA), Xi’an, China, 18–20 June 2018; pp. 1–5. [Google Scholar]
  51. Faizan, M. Radarsat-2 Data Processing Using SNAP Software; Anna University: Chennai, India, 2020. [Google Scholar]
  52. Liu, X.; Chi, C. Relationships of electrical conductivity between 1:5 soil/water extracts and saturation paste extracts of salt-affected soils in southern Xinjiang. Jiangsu Agric. Sci. 2015, 43, 289–291. [Google Scholar] [CrossRef]
  53. Tian Yong, J.Z.; Chen, L.; Xi, H.; Zhang, B.; Gan, K. Characteristics of soil water and salt spatial differentiation along the Yellow River section of Ulan Buh Desert and its causes. J. Desert Res. 2024, 44, 247–258. [Google Scholar] [CrossRef]
  54. Gao, Z.; Li, X.; Zuo, L.; Zou, B.; Wang, B.; Wang, W.J. Unveiling soil salinity patterns in soda saline-alkali regions using Sentinel-2 and SDGSAT-1 thermal infrared data. Remote Sens. Environ. 2025, 322, 114708. [Google Scholar] [CrossRef]
  55. Han, L.; Ding, J.; Zhang, J.; Chen, P.; Wang, J.; Wang, Y.; Wang, J.; Ge, X.; Zhang, Z. Precipitation events determine the spatiotemporal distribution of playa surface salinity in arid regions: Evidence from satellite data fused via the enhanced spatial and temporal adaptive reflectance fusion model. Catena 2021, 206, 105546. [Google Scholar] [CrossRef]
  56. Abuelgasim, A.; Ammad, R. Mapping soil salinity in arid and semi-arid regions using Landsat 8 OLI satellite data. RSASE 2019, 13, 415–425. [Google Scholar] [CrossRef]
  57. Abulaiti, A.; Nurmemet, I.; Muhetaer, N.; Xiao, S.; Zhao, J. Monitoring of soil salinization in the Keriya Oasis based on deep learning with PALSAR-2 and Landsat-8 datasets. Sustainability 2022, 14, 2666. [Google Scholar] [CrossRef]
  58. Brady, N.C.; Weil, R.R.; Weil, R.R. The Nature and Properties of Soils; Prentice Hall: Upper Saddle River, NJ, USA, 2008; Volume 13. [Google Scholar]
  59. Chi, M.; Feng, R.; Bruzzone, L. Classification of hyperspectral remote-sensing data with primal SVM for small-sized training dataset problem. Adv. Space Res. 2008, 41, 1793–1799. [Google Scholar] [CrossRef]
  60. Sawangarreerak, S.; Thanathamathee, P. Random forest with sampling techniques for handling imbalanced prediction of university student depression. Information 2020, 11, 519. [Google Scholar] [CrossRef]
  61. Goetz, A.F.; Rock, B.N.; Rowan, L.C. Remote sensing for exploration; an overview. Econ. Geol. 1983, 78, 573–590. [Google Scholar] [CrossRef]
  62. Li, X.; Zhang, F.; Wang, Z. Present situation and development trend of remote sensing monitoring model for soil salinization. Remote Sens. Nat. Resour. 2022, 34, 11–21. [Google Scholar] [CrossRef]
  63. Fernandez-Buces, N.; Siebe, C.; Cram, S.; Palacio, J. Mapping soil salinity using a combined spectral response index for bare soil and vegetation: A case study in the former lake Texcoco, Mexico. J. Arid Environ. 2006, 65, 644–667. [Google Scholar] [CrossRef]
  64. Peñuelas, J.; Llusià, J. Effects of carbon dioxide, water supply, and seasonality on terpene content and emission by Rosmarinus officinalis. J. Chem. Ecol. 1997, 23, 979–993. [Google Scholar] [CrossRef]
  65. Allbed, A.; Kumar, L.; Aldakheel, Y.Y. Assessing soil salinity using soil salinity and vegetation indices derived from IKONOS high-spatial resolution imageries: Applications in a date palm dominated region. Geoderma 2014, 230, 1–8. [Google Scholar] [CrossRef]
  66. Zhang, Z.; Ding, J.; Zhu, C.; Wang, J.; Ma, G.; Ge, X.; Li, Z.; Han, L. Strategies for the efficient estimation of soil organic matter in salt-affected soils through Vis-NIR spectroscopy: Optimal band combination algorithm and spectral degradation. Geoderma 2021, 382, 114729. [Google Scholar] [CrossRef]
  67. Alavipanah, S.K.; Damavandi, A.A.; Mirzaei, S.; Rezaei, A.; Hamzeh, S.; Matinfar, H.R.; Teimouri, H.; Javadzarrin, I. Remote sensing application in evaluation of soil characteristics in desert areas. Nat. Environ. Change 2016, 2, 1–24. [Google Scholar]
  68. Chen, J.M. Evaluation of vegetation indices and a modified simple ratio for boreal applications. CaJRS 1996, 22, 229–242. [Google Scholar] [CrossRef]
  69. DeFries, R.S.; Townshend, J. NDVI-derived land cover classifications at a global scale. Int. J. Remote Sens. 1994, 15, 3567–3586. [Google Scholar] [CrossRef]
  70. Huete, A.R. A soil-adjusted vegetation index (SAVI). Remote Sens. Environ. 1988, 25, 295–309. [Google Scholar] [CrossRef]
  71. Gitelson, A.A.; Kaufman, Y.J.; Merzlyak, M.N. Use of a green channel in remote sensing of global vegetation from EOS-MODIS. Remote Sens. Environ. 1996, 58, 289–298. [Google Scholar] [CrossRef]
  72. Jordan, C.F. Derivation of leaf-area index from quality of light on the forest floor. Ecology 1969, 50, 663–666. [Google Scholar] [CrossRef]
  73. Nedkov, R. Normalized Differential Greenness Index for vegetation dynamics assessment. Comptes Rendus L’academie Bulg. Sci. 2017, 70, 1143–1146. [Google Scholar]
  74. Hongyan, C.; Gengxing, Z.; Jingchun, C.; Ruiyan, W.; Mingxiu, G. Remote sensing inversion of saline soil salinity based on modified vegetation index in estuary area of Yellow River. Trans. Chin. Soc. Agric. Eng. 2015, 31, 107–114. [Google Scholar] [CrossRef]
  75. Tripathi, N.; Rai, B.K.; Dwivedi, P. Spatial modeling of soil alkalinity in GIS environment using IRS data. In Proceedings of the 18th Asian Conference on Remote Sensing, Kuala Lumpur, Malaysia, 20–24 October 1997; pp. 81–86. [Google Scholar]
  76. Khan, N.M.; Rastoskuev, V.V.; Sato, Y.; Shiozawa, S. Assessment of hydrosaline land degradation by using a simple approach of remote sensing indicators. Agric. Water Manag. 2005, 77, 96–109. [Google Scholar] [CrossRef]
  77. Nicolas, H.; Walter, C. Detecting salinity hazards within a semiarid context by means of combining soil and remote-sensing data. Geoderma 2006, 134, 217–230. [Google Scholar] [CrossRef]
  78. Abbas, A.; Khan, S. Using remote sensing techniques for appraisal of irrigated soil salinity. In Proceedings of the International Congress on Modelling and Simulation (MODSIM), Christchurch, New Zealand, 10–13 December 2007; pp. 2632–2638. [Google Scholar]
  79. Taghizadeh-Mehrjardi, R.; Minasny, B.; Sarmadian, F.; Malone, B. Digital mapping of soil salinity in Ardakan region, central Iran. Geoderma 2014, 213, 15–28. [Google Scholar] [CrossRef]
  80. Huynen, J.R. Phenomenological Theory of Radar Targets; Academic Press: Cambridge, MA, USA, 1970. [Google Scholar]
  81. Trudel, M.; Magagi, R.; Granberg, H.B. Application of target decomposition theorems over snow-covered forested areas. IEEE Trans. Geosci. Remote Sens. 2009, 47, 508–512. [Google Scholar] [CrossRef]
  82. Mishra, P.; Singh, D.; Yamaguchi, Y. Land cover classification of PALSAR images by knowledge based decision tree classifier and supervised classifiers based on SAR observables. Prog. Electromagn. Res. B 2011, 30, 47–70. [Google Scholar] [CrossRef]
  83. An, W.; Cui, Y.; Yang, J. Three-component model-based decomposition for polarimetric SAR data. IEEE Trans. Geosci. Remote Sens. 2010, 48, 2732–2739. [Google Scholar] [CrossRef]
  84. Cloude, S.R.; Pottier, E. A review of target decomposition theorems in radar polarimetry. IEEE Trans. Geosci. Remote Sens. 1996, 34, 498–518. [Google Scholar] [CrossRef]
  85. Freeman, A.; Durden, S.L. A three-component scattering model for polarimetric SAR data. IEEE Trans. Geosci. Remote Sens. 1998, 36, 963–973. [Google Scholar] [CrossRef]
  86. Yamaguchi, Y.; Moriyama, T.; Ishido, M.; Yamada, H. Four-component scattering model for polarimetric SAR image decomposition. IEEE Trans. Geosci. Remote Sens. 2005, 43, 1699–1706. [Google Scholar] [CrossRef]
  87. Cloude, S.R. Target decomposition theorems in radar scattering. Electron. Lett. 1985, 21, 22–24. [Google Scholar] [CrossRef]
  88. van Zyl, J.J. Application of Cloude’s target decomposition theorem to polarimetric imaging radar data. In Proceedings of Radar Polarimetry; SPIE: Bellingham, WA, USA, 1993; pp. 184–191. [Google Scholar]
  89. Cloude, S.R.; Pottier, E. An entropy based classification scheme for land applications of polarimetric SAR. IEEE Trans. Geosci. Remote Sens. 1997, 35, 68–78. [Google Scholar] [CrossRef]
  90. Touzi, R. Target scattering decomposition in terms of roll-invariant target parameters. IEEE Trans. Geosci. Remote Sens. 2007, 45, 73–84. [Google Scholar] [CrossRef]
  91. Kursa, M.B.; Jankowski, A.; Rudnicki, W.R. Boruta—A system for feature selection. Fundam. Informaticae 2010, 101, 271–285. [Google Scholar] [CrossRef]
  92. Xiao, S.; Nurmemet, I.; Zhao, J. Soil salinity estimation based on machine learning using the GF-3 radar and Landsat-8 data in the Keriya Oasis, Southern Xinjiang, China. Plant Soil. 2024, 498, 451–469. [Google Scholar] [CrossRef]
  93. Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27. [Google Scholar] [CrossRef]
  94. Yu, Z. Research on Remote Sensing Image Terrain Classification Algorithm Based on Improved KNN. In Proceedings of the 2020 IEEE 3rd International Conference on Information Systems and Computer Aided Education (ICISCAE), Dalian, China, 27–29 September 2020; pp. 569–573. [Google Scholar]
  95. Samaniego, L.; Bárdossy, A.; Schulz, K. Supervised classification of remotely sensed imagery using a modified k-NN technique. IEEE Trans. Geosci. Remote Sens. 2008, 46, 2112–2125. [Google Scholar] [CrossRef]
  96. Hechenbichler, K.; Schliep, K. Weighted k-Nearest-Neighbor Techniques and Ordinal Classification; Ludwig Maximilian University of Munich: Munich, Germany, 2004. [Google Scholar] [CrossRef]
  97. Haq, Y.U.; Shahbaz, M.; Asif, S.; Ouahada, K.; Hamam, H. Identification of Soil Types and Salinity Using MODIS Terra Data and Machine Learning Techniques in Multiple Regions of Pakistan. Sensors 2023, 23, 8121. [Google Scholar] [CrossRef] [PubMed]
  98. Vermeulen, D.; Van Niekerk, A. Machine learning performance for predicting soil salinity using different combinations of geomorphometric covariates. Geoderma 2017, 299, 1–12. [Google Scholar] [CrossRef]
  99. Vapnik, V. The Nature of Statistical Learning Theory; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
  100. Mountrakis, G.; Im, J.; Ogole, C. Support vector machines in remote sensing: A review. ISPRS J. Photogramm. Remote Sens. 2011, 66, 247–259. [Google Scholar] [CrossRef]
  101. Mantero, P.; Moser, G.; Serpico, S.B. Partially supervised classification of remote sensing images through SVM-based probability density estimation. IEEE Trans. Geosci. Remote Sens. 2005, 43, 559–570. [Google Scholar] [CrossRef]
  102. Wang, J.; Peng, J.; Li, H.; Yin, C.; Liu, W.; Wang, T.; Zhang, H. Soil salinity mapping using machine learning algorithms with the Sentinel-2 MSI in arid areas, China. Remote Sens. 2021, 13, 305. [Google Scholar] [CrossRef]
  103. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  104. Wang, F.; Yang, S.; Wei, Y.; Shi, Q.; Ding, J. Characterizing soil salinity at multiple depth using electromagnetic induction and remote sensing data with random forests: A case study in Tarim River Basin of southern Xinjiang, China. Sci. Total Environ. 2021, 754, 142030. [Google Scholar] [CrossRef]
  105. Fathizad, H.; Ardakani, M.A.H.; Sodaiezadeh, H.; Kerry, R.; Taghizadeh-Mehrjardi, R. Investigation of the spatial and temporal variation of soil salinity using random forests in the central desert of Iran. Geoderma 2020, 365, 114233. [Google Scholar] [CrossRef]
  106. Wang, F.; Shi, Z.; Biswas, A.; Yang, S.; Ding, J. Multi-algorithm comparison for predicting soil salinity. Geoderma 2020, 365, 114211. [Google Scholar] [CrossRef]
  107. Gislason, P.O.; Benediktsson, J.A.; Sveinsson, J.R. Random forest classification of multisource remote sensing and geographic data. In Proceedings of the IGARSS 2004. 2004 IEEE International Geoscience and Remote Sensing Symposium, Anchorage, AK, USA, 20–24 September 2004; pp. 1049–1052. [Google Scholar]
  108. Sheykhmousa, M.; Mahdianpari, M.; Ghanbari, H.; Mohammadimanesh, F.; Ghamisi, P.; Homayouni, S. Support vector machine versus random forest for remote sensing image classification: A meta-analysis and systematic review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 6308–6325. [Google Scholar] [CrossRef]
  109. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2015; Volume 9351, pp. 234–241. [Google Scholar]
  110. Nair, V.; Hinton, G.E. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; pp. 807–814. [Google Scholar]
  111. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  112. Congalton, R.G. A review of assessing the accuracy of classifications of remotely sensed data. Remote Sens. Environ. 1991, 37, 35–46. [Google Scholar] [CrossRef]
  113. Dehaan, R.; Taylor, G. Field-derived spectra of salinized soils and vegetation as indicators of irrigation-induced soil salinization. Remote Sens. Environ. 2002, 80, 406–417. [Google Scholar] [CrossRef]
  114. Ravi, K.P.; Periasamy, S. Systematic discrimination of irrigation and upheaval associated salinity using multitemporal SAR data. Sci. Total Environ. 2021, 790, 148148. [Google Scholar] [CrossRef]
  115. Periasamy, S.; Ravi, K.P.; Tansey, K. Identification of saline landscapes from an integrated SVM approach from a novel 3-D classification schema using Sentinel-1 dual-polarized SAR data. Remote Sens. Environ. 2022, 279, 113144. [Google Scholar] [CrossRef]
  116. Garajeh, M.K.; Malakyar, F.; Weng, Q.; Feizizadeh, B.; Blaschke, T.; Lakes, T. An automated deep learning convolutional neural network algorithm applied for soil salinity distribution mapping in Lake Urmia, Iran. Sci. Total Environ. 2021, 778, 146253. [Google Scholar] [CrossRef]
  117. Liu, C.; Sun, Y.; Xu, Y.; Sun, Z.; Zhang, X.; Lei, L.; Kuang, G. A review of optical and SAR image deep feature fusion in semantic segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 12910–12930. [Google Scholar] [CrossRef]
  118. Muhammad, W.; Aramvith, S.; Onoye, T. SENext: Squeeze-and-ExcitationNext for single image super-resolution. IEEE Access 2023, 11, 45989–46003. [Google Scholar] [CrossRef]
  119. Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3146–3154. [Google Scholar]
  120. Aleissaee, A.A.; Kumar, A.; Anwer, R.M.; Khan, S.; Cholakkal, H.; Xia, G.-S.; Khan, F.S. Transformers in remote sensing: A survey. Remote Sens. 2023, 15, 1860. [Google Scholar] [CrossRef]
  121. Liu, X.; Zhang, C.; Zhang, L. Vision mamba: A comprehensive survey and taxonomy. arXiv 2024, arXiv:2405.04404. [Google Scholar] [CrossRef]
  122. Zhang, H.; Wan, L.; Wang, T.; Lin, Y.; Lin, H.; Zheng, Z. Impervious surface estimation from optical and polarimetric SAR data using small-patched deep convolutional networks: A comparative study. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 2374–2387. [Google Scholar] [CrossRef]
  123. Kang, W.; Xiang, Y.; Wang, F.; You, H. CFNet: A cross fusion network for joint land cover classification using optical and SAR images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 1562–1574. [Google Scholar] [CrossRef]
Figure 1. Overview of the study area. (A) Location of Xinjiang, China; (B) Yutian County, Xinjiang; (C) location of study area, Yutian County; (D) Landsat-8 OLI false-color optical remote sensing image of the study area; (E) conceptual map of the geographical environment of the study area.
Figure 2. (A) Spatial distribution of the field survey sites and selected training and validation plots; (B) (a–f) photos of in-field observations; (C) soil sampling scheme and collection.
Figure 3. Examples of some training samples in the Keriya Oasis in detail. (a) Field photographs (taken in July 2022); (b) original image plots of Landsat-8 OLI (true-color band combinations: R: Red, G: Green, B: Blue); (c) image plots of Landsat-8 OLI (false-color band combinations: R: NIR, G: Red, B: Green); (d) RGB composite images from Pauli decomposition (R: Pauli_r, G: Pauli_g, B: Pauli_b); (e) label.
Figure 4. Overall research process.
Figure 5. Encoder–decoder-based structure of MSA-U-Net.
Figure 6. The structural diagram of the Convolutional Block Attention Module.
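The CBAM in Figure 6 refines a feature map in two stages: channel attention (a shared MLP over average- and max-pooled channel descriptors) followed by spatial attention (a convolution over channel-wise average and max maps). The numpy sketch below illustrates only the shapes and data flow; the random weights stand in for the learned MLP, and a mean filter stands in for the learned 7 × 7 convolution, so this is a structural illustration, not the trained module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    # x: (C, H, W); shared two-layer MLP (w1: C -> C/r, w2: C/r -> C)
    avg = x.mean(axis=(1, 2))                       # average-pooled descriptor (C,)
    mx = x.max(axis=(1, 2))                         # max-pooled descriptor (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)    # ReLU hidden layer
    return sigmoid(mlp(avg) + mlp(mx))              # per-channel weights (C,)

def spatial_attention(x, kernel=7):
    # Fuse channel-wise average and max maps; a mean filter stands in for
    # the learned 7x7 convolution of the real CBAM.
    fused = (x.mean(axis=0) + x.max(axis=0)) / 2.0
    pad = kernel // 2
    padded = np.pad(fused, pad, mode="edge")
    out = np.zeros_like(fused)
    H, W = fused.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + kernel, j:j + kernel].mean()
    return sigmoid(out)                             # per-pixel weights (H, W)

def cbam(x, w1, w2):
    x = x * channel_attention(x, w1, w2)[:, None, None]   # channel refinement
    return x * spatial_attention(x)[None, :, :]           # spatial refinement

rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 16, 2
x = rng.normal(size=(C, H, W))
w1 = rng.normal(size=(C // r, C)) * 0.1
w2 = rng.normal(size=(C, C // r)) * 0.1
y = cbam(x, w1, w2)
print(y.shape)  # (8, 16, 16): input shape is preserved, as skip connections require
```

Because both attention maps are sigmoid-gated, the output is an elementwise-damped copy of the input with the same shape, which is what allows the module to sit inside a U-Net skip connection without changing tensor dimensions.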
Figure 7. RGB composite images (R: double-bounce scattering (*_Dbl); G: volume scattering (*_Vol); B: surface scattering (*_Odd)) of different polarimetric decompositions: (A) Yamaguchi decomposition, (B) Pauli decomposition, (C) H/A/Alpha decomposition, (D) Freeman–Durden decomposition, (E) Cloude decomposition, (F) Vanzyl decomposition, (G) Sinclair decomposition.
Figure 8. Importance of variables using the Boruta algorithm. (Green variables are important features, yellow variables are tentative features, blue variables are shadow features, and red variables are unimportant.)
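The ranking in Figure 8 comes from the Boruta algorithm [91], which appends randomized "shadow" copies of every feature and confirms a real feature only if it repeatedly outperforms the best shadow. The sketch below reproduces that mechanism in plain numpy; as a dependency-free simplification it uses the absolute correlation with the target in place of random-forest importance, so it illustrates the shadow-feature principle rather than the exact implementation used in the study.

```python
import numpy as np

def boruta_style_selection(X, y, n_trials=40, seed=0):
    """Shadow-feature screening in the spirit of Boruta (Kursa et al., 2010).

    Importance proxy: |Pearson correlation with y| (a simplification of the
    random-forest importance used by the real algorithm). A feature is
    confirmed if it beats the best shuffled 'shadow' copy in more than 75%
    of trials (a crude stand-in for Boruta's binomial test).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    hits = np.zeros(p, dtype=int)
    for _ in range(n_trials):
        shadow = rng.permuted(X, axis=0)  # shuffling destroys any link to y
        imp_real = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
        imp_shadow = np.abs([np.corrcoef(shadow[:, j], y)[0, 1] for j in range(p)])
        hits += (imp_real > imp_shadow.max()).astype(int)
    return hits > 0.75 * n_trials

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.1 * rng.normal(size=300)
mask = boruta_style_selection(X, y)
print(mask)  # mask[0] and mask[2] are True: the two informative features are confirmed
```

In the study the same confirm/reject logic, driven by random-forest importance, yielded the top eight features used to build the MS, SAR, and MS + SAR datasets.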
Figure 9. Classification results. (A–C) represent MS data, SAR data, and MS + SAR data, respectively; (A-1–C-1) KNN classification; (A-2–C-2) SVM classification; (A-3–C-3) RF classification; (A-4–C-4) U-Net classification; (A-5–C-5) MSA-U-Net classification.
Table 1. The main parameters of remote sensing data.

| Parameter | RADARSAT-2 | Landsat-8 |
| Map Projection | WGS84 (DD) | UTM |
| Sensor | C-band synthetic aperture radar | Operational Land Imager |
| Observation Date | 6 May 2022 | 18 May 2022 |
| Product Type | SLC | L2SP |
| Nominal Resolution | 5.5 m × 4.8 m | 30 m |
| Incident Angle/Orbit Inclination Angle | 42.1° | 98.2° |
| Revisit Time | 24 d | 16 d |
| Orbit Type | Sun-synchronous orbit | Sun-synchronous orbit |
| Satellite Altitude | 798 km | 705 km |
| Band Number | — | 11 |
| Polarizations | HH, HV, VV, VH | — |
Table 2. Land use and soil salinization classes exhibiting different degrees of soil salinization, their characteristics, and corresponding field photos.

| Symbol | Class | Characteristics |
| WB | Water Body | Salt lakes, rivers and tributaries, swamps, ponds, and reservoirs. |
| VG | Vegetation | Grassland, farmland, red willow, Populus euphratica, reed, camel thorn, and dried riverbed. |
| BL | Barren Land | Gobi, desert, wasteland, and bare land. |
| SS | Slightly Salinized soil | EC value of 2–4 dS/m; thin salt crust (0–2 cm); vegetation coverage around 30%. |
| MS | Moderately Salinized soil | EC value of 4–8 dS/m; salt crust 1–4 cm thick; vegetation coverage around 5–15%. |
| HS | Highly Salinized soil | EC value of 8–16 dS/m; salt crust 4–10 cm thick; vegetation coverage less than 5%. |
Table 3. The number of plots for each DL training and validation sample and the number of pixels selected for ML.

| Class | Abbreviation | Training (70%) Plots | Training Pixels | Validation (30%) Plots | Validation Pixels |
| Vegetation | VG | 42 | 4290 | 18 | 1810 |
| Barren Land | BL | 20 | 2877 | 9 | 1090 |
| Water Body | WB | 18 | 1576 | 8 | 1004 |
| Slightly Salinized soil | SS | 35 | 3542 | 15 | 1406 |
| Moderately Salinized soil | MS | 32 | 3195 | 14 | 1367 |
| Highly Salinized soil | HS | 33 | 3433 | 16 | 1743 |
| Total | — | 180 | 17,913 | 70 | 8420 |
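Table 3 allocates whole plots, not individual pixels, to either training or validation, so that pixels from one plot never appear in both sets. The paper does not state the exact splitting mechanism; the sketch below shows one plausible per-class 70/30 plot shuffle. The per-class plot totals are derived from Table 3 (training + validation plots), the seed is arbitrary, and rounding means the resulting counts land close to, but not exactly on, the published split.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only

# Total plots per class, derived from Table 3 (training + validation)
plots_per_class = {"VG": 60, "BL": 29, "WB": 26, "SS": 50, "MS": 46, "HS": 49}

split = {}
for cls, n in plots_per_class.items():
    ids = rng.permutation(n)          # shuffle plot indices within the class
    k = round(0.7 * n)                # 70% of plots go to training
    split[cls] = {"train": ids[:k], "val": ids[k:]}

print({c: (len(s["train"]), len(s["val"])) for c, s in split.items()})
```

Splitting at the plot level avoids the optimistic bias that arises when spatially adjacent pixels from the same plot end up in both training and validation sets.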
Table 4. Summary of the selected spectral indices.

| Category | Index | Calculation Formula | Reference |
| Vegetation Indices | Simple Ratio Vegetation Index (SR) | SR = NIR / R | [68] |
| | Normalized Difference Vegetation Index (NDVI) | NDVI = (NIR − R) / (NIR + R) | [69] |
| | Soil Adjusted Vegetation Index (SAVI) | SAVI = 1.5 × (NIR − R) / (NIR + R + 0.5) | [70] |
| | Green Normalized Difference Vegetation Index (GNDVI) | GNDVI = (NIR − G) / (NIR + G) | [71] |
| | Differential Vegetation Index (DVI) | DVI = NIR − R | [72] |
| | Normalized Difference Green Index (NDGI) | NDGI = (G − R) / (G + R) | [73] |
| | Enhanced Normalized Vegetation Index (ENDVI) | ENDVI = (NIR + SWIR2 − R) / (NIR + SWIR2 + R) | [74] |
| Soil-related Indices | Salinity Index (SI-T) | SI-T = (R / NIR) × 100 | [75] |
| | Salinity Index (SI1) | SI1 = √(R × G) | [76] |
| | Salinity Index (SI2) | SI2 = √(G² + R² + NIR²) | [77] |
| | Salinity Index (SI3) | SI3 = √(G² + R²) | [77] |
| | Salinity Index (SI4) | SI4 = (R × B) / G | [78] |
| | Salinity Index (S1) | S1 = B / R | [77] |
| | Salinity Index (S2) | S2 = (B − R) / (B + R) | [78] |
| | Salinity Index (S3) | S3 = (R × G) / B | [78] |
| | Salinity Ratio Index (SAIO) | SAIO = (R − NIR) / (G + NIR) | [79] |
| | Brightness Index (BRI) | BRI = √(G² + R²) | [79] |
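Every index in Table 4 is simple band algebra over the Landsat-8 OLI reflectance bands, so all of them vectorize directly over an image. A minimal numpy sketch for a few of the indices (the reflectance values below are illustrative toy numbers, not measurements from the study area):

```python
import numpy as np

# Toy surface-reflectance values in [0, 1] for two pixels
# (in practice: Landsat-8 OLI bands; G: green, R: red, NIR: near-infrared)
G = np.array([0.10, 0.12])
R = np.array([0.08, 0.20])
NIR = np.array([0.40, 0.22])

SR = NIR / R                                  # Simple Ratio, Table 4 [68]
NDVI = (NIR - R) / (NIR + R)                  # Table 4 [69]
SAVI = 1.5 * (NIR - R) / (NIR + R + 0.5)      # Table 4 [70]
SI1 = np.sqrt(R * G)                          # Salinity Index SI1, Table 4 [76]
SI_T = (R / NIR) * 100.0                      # Salinity Index SI-T, Table 4 [75]

print(np.round(NDVI, 4))  # [0.6667 0.0476]
```

The first pixel (high NIR, low red) behaves like dense vegetation and scores a high NDVI; the second (NIR barely above red) behaves like sparsely vegetated, possibly salinized ground, which is exactly the contrast these indices are chosen to capture.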
Table 5. Polarimetric scattering features derived from RADARSAT-2 data.

| Polarization Decomposition | Number of Features | Polarimetric Scattering Features |
| Pauli | 3 | Pauli_a / Pauli_b / Pauli_c |
| Cloude | 3 | Cloude_Dbl / Cloude_Odd / Cloude_Vol |
| H/A/Alpha | 3 | Alpha / Anisotropy / Entropy |
| VanZyl | 3 | VanZyl_Vol / VanZyl_Odd / VanZyl_Dbl |
| Freeman–Durden | 3 | Freeman_Durden_Dbl / Freeman_Durden_Odd / Freeman_Durden_Vol |
| Sinclair | 3 | Sinclair_Dbl / Sinclair_Vol / Sinclair_Surf |
| Touzi | 4 | Alpha / Psi* / Phi* / Tau* |
| Yamaguchi | 4 | Yamaguchi_Dbl / Yamaguchi_Odd / Yamaguchi_Vol / Yamaguchi_Hlx |
Note: * indicates a feature that was excluded.
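Among the decompositions in Table 5, the Pauli decomposition has the simplest closed form: it projects the complex scattering matrix onto odd-bounce (surface), even-bounce (double), and cross-polarized (volume) components, whose intensities drive the RGB composites of Figure 7. The single-pixel sketch below uses illustrative scattering amplitudes and assumes reciprocity (Shv = Svh), as holds for monostatic RADARSAT-2 data:

```python
import numpy as np

def pauli_decomposition(Shh, Shv, Svv):
    """Pauli decomposition of a reciprocal (Shv == Svv-conjugate-free, Shv == Svh)
    scattering matrix; returns the three component intensities."""
    a = (Shh + Svv) / np.sqrt(2)   # odd-bounce (surface) component
    b = (Shh - Svv) / np.sqrt(2)   # even-bounce (double-bounce) component
    c = np.sqrt(2) * Shv           # cross-pol (volume) component
    return np.abs(a) ** 2, np.abs(b) ** 2, np.abs(c) ** 2

# Single-pixel example with illustrative complex scattering amplitudes
Shh, Shv, Svv = 1.0 + 0.2j, 0.3 - 0.1j, 0.8 - 0.4j
odd, dbl, vol = pauli_decomposition(Shh, Shv, Svv)

# The decomposition conserves total backscattered power (the span)
span = abs(Shh) ** 2 + 2 * abs(Shv) ** 2 + abs(Svv) ** 2
print(np.isclose(odd + dbl + vol, span))  # True
```

Power conservation is the key property: the three intensities partition the span, so mapping them to RGB (as in Figure 7B) shows which scattering mechanism dominates each pixel without losing total backscatter information.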
Table 6. Performance comparison of KNN, SVM, RF, U-Net, and MSA-U-Net models with respect to PA, UA, OA, and Kappa across classes (WB, VG, BL, SS, MS, HS).

| Class | KNN PA (%) | KNN UA (%) | SVM PA (%) | SVM UA (%) | RF PA (%) | RF UA (%) | U-Net PA (%) | U-Net UA (%) | MSA-U-Net PA (%) | MSA-U-Net UA (%) |
| WB | 90.32 | 88.89 | 87.90 | 91.60 | 90.32 | 87.50 | 87.10 | 85.71 | 88.33 | 85.48 |
| VG | 83.45 | 81.12 | 82.01 | 87.69 | 84.17 | 82.98 | 79.86 | 82.84 | 82.14 | 82.14 |
| BL | 72.43 | 79.76 | 97.30 | 63.83 | 72.43 | 81.71 | 97.30 | 75.63 | 92.18 | 89.19 |
| SS | 66.67 | 62.18 | 51.11 | 93.88 | 70.37 | 75.25 | 72.22 | 92.20 | 86.96 | 88.89 |
| MS | 61.04 | 61.04 | 57.14 | 65.67 | 79.22 | 63.21 | 68.18 | 82.68 | 81.99 | 85.71 |
| HS | 80.38 | 81.41 | 83.03 | 78.61 | 85.44 | 91.22 | 92.41 | 83.91 | 91.67 | 90.51 |

OA (%): KNN 74.79; SVM 77.66; RF 79.10; U-Net 82.98; MSA-U-Net 87.34.
Kappa: KNN 0.69; SVM 0.73; RF 0.74; U-Net 0.76; MSA-U-Net 0.84.
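All four accuracy measures in Table 6 derive from a single confusion matrix [112]: producer's accuracy (PA) and user's accuracy (UA) are the per-class row- and column-normalized diagonals, OA is the normalized trace, and Kappa corrects OA for chance agreement. A small numpy sketch (the 3-class matrix below is a toy example, not the study's validation matrix):

```python
import numpy as np

def oa_kappa(cm):
    """Overall Accuracy and Cohen's Kappa from a confusion matrix
    (rows: reference class, columns: predicted class)."""
    n = cm.sum()
    po = np.trace(cm) / n                                   # observed agreement = OA
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2   # chance agreement
    return po, (po - pe) / (1 - pe)

# Toy 3-class confusion matrix
cm = np.array([[50,  5,  5],
               [ 4, 40,  6],
               [ 6,  4, 40]])

PA = np.diag(cm) / cm.sum(axis=1)   # producer's accuracy per class
UA = np.diag(cm) / cm.sum(axis=0)   # user's accuracy per class
oa, kappa = oa_kappa(cm)
print(round(oa, 4), round(kappa, 4))  # 0.8125 0.7176
```

Kappa below OA is the usual pattern (as in every column of Table 6), since some of the raw agreement is attributable to chance.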
Table 7. Evaluation index values of different methods under different data sources.

| Data Source | Classification Method | OA (%) | Kappa | F1 Score |
| Landsat-8 OLI (MS) | KNN | 76.38 | 0.71 | 0.77 |
| Landsat-8 OLI (MS) | SVM | 73.08 | 0.67 | 0.72 |
| Landsat-8 OLI (MS) | RF | 77.36 | 0.75 | 0.80 |
| Landsat-8 OLI (MS) | U-Net | 79.15 | 0.74 | 0.79 |
| Landsat-8 OLI (MS) | MSA-U-Net | 79.57 | 0.75 | 0.80 |
| RADARSAT-2 (SAR) | KNN | 51.38 | 0.41 | 0.51 |
| RADARSAT-2 (SAR) | SVM | 52.98 | 0.43 | 0.51 |
| RADARSAT-2 (SAR) | RF | 55.64 | 0.46 | 0.56 |
| RADARSAT-2 (SAR) | U-Net | 50.85 | 0.40 | 0.50 |
| RADARSAT-2 (SAR) | MSA-U-Net | 51.49 | 0.41 | 0.51 |
| Landsat-8 + RADARSAT-2 (MS + SAR) | KNN | 74.79 | 0.69 | 0.75 |
| Landsat-8 + RADARSAT-2 (MS + SAR) | SVM | 77.66 | 0.73 | 0.77 |
| Landsat-8 + RADARSAT-2 (MS + SAR) | RF | 79.10 | 0.75 | 0.80 |
| Landsat-8 + RADARSAT-2 (MS + SAR) | U-Net | 82.98 | 0.79 | 0.82 |
| Landsat-8 + RADARSAT-2 (MS + SAR) | MSA-U-Net | 87.34 | 0.84 | 0.87 |
Table 8. Classification accuracies of the U-Net, the U-Net-SAM, the U-Net-CAM, and the MSA-U-Net for the study area.

| Class | U-Net PA (%) | U-Net UA (%) | U-Net-SAM PA (%) | U-Net-SAM UA (%) | U-Net-CAM PA (%) | U-Net-CAM UA (%) | MSA-U-Net PA (%) | MSA-U-Net UA (%) |
| VG | 79.86 | 82.84 | 84.89 | 80.82 | 83.45 | 82.27 | 82.73 | 82.14 |
| BL | 97.30 | 75.63 | 92.43 | 89.53 | 88.65 | 90.61 | 89.19 | 92.18 |
| WB | 87.10 | 85.71 | 84.68 | 88.98 | 84.68 | 89.74 | 85.48 | 88.33 |
| SS | 92.41 | 83.91 | 90.51 | 91.08 | 91.14 | 90.57 | 90.51 | 91.67 |
| MS | 68.18 | 82.68 | 83.12 | 79.01 | 84.42 | 80.75 | 85.71 | 81.99 |
| HS | 72.22 | 92.20 | 83.89 | 90.96 | 84.44 | 83.98 | 88.89 | 86.96 |

OA (%): U-Net 82.98; U-Net-SAM 86.81; U-Net-CAM 86.28; MSA-U-Net 87.34.
Kappa coefficient: U-Net 0.79; U-Net-SAM 0.84; U-Net-CAM 0.83; MSA-U-Net 0.84.
F1 score: U-Net 0.82; U-Net-SAM 0.86; U-Net-CAM 0.86; MSA-U-Net 0.87.

Share and Cite

MDPI and ACS Style

Xiang, Y.; Nurmemet, I.; Lv, X.; Yu, X.; Gu, A.; Aihaiti, A.; Li, S. Multi-Source Attention U-Net: A Novel Deep Learning Framework for the Land Use and Soil Salinization Classification of Keriya Oasis in China with RADARSAT-2 and Landsat-8 Data. Land 2025, 14, 649. https://doi.org/10.3390/land14030649
