Article

Polarimetric SAR Salt Crust Classification via Autoencoded and Attention-Enhanced Feature Representation

1 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
2 Key Laboratory of Target Cognition and Application Technology (TCAT), Chinese Academy of Sciences, Beijing 100190, China
3 School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 101408, China
4 College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
5 Qinghai Yanhu Industry Company Limited, Qinghai 816099, China
* Author to whom correspondence should be addressed.
Remote Sens. 2026, 18(1), 164; https://doi.org/10.3390/rs18010164
Submission received: 10 November 2025 / Revised: 25 December 2025 / Accepted: 31 December 2025 / Published: 4 January 2026

Highlights

What are the main findings?
  • Field surveys combined with PolSAR analyses reveal the scattering characteristics among multiple salt crust types around Qarhan Salt Lake, enabling reliable mapping of their spatial patterns and multi-temporal variations.
  • A unified classification framework integrating autoencoded and attention-enhanced feature representation with Transformer global modeling markedly improves the separability of fine-grained salt crust types despite their subtle scattering differences.
What is the implication of the main finding?
  • The proposed framework achieves more accurate classification of salt crust types, thereby enabling more precise salt crust mapping and more reliable temporal analysis, which provides a practical and scalable solution for long-term salt crust monitoring.

Abstract

Qarhan Salt Lake, located in the Qaidam Basin of northwestern China, is a highland lake characterized by diverse surface features, including salt lakes, salt crusts, and saline-alkali lands. Investigating the distribution and dynamic variations of salt crusts is essential for mineral resource development and regional ecological monitoring. To this end, the surface of the study area was categorized into several types according to micro-geomorphological characteristics. Polarimetric synthetic aperture radar (PolSAR), which provides rich scattering information, is well suited for distinguishing these surface categories. To achieve more accurate classification of salt crust types, the scattering differences among the various types were comparatively analyzed, and stable samples were selected using unsupervised Wishart clustering with reference to field survey results. In addition, to address the weak inter-class separability among salt crust types, this paper proposes a PolSAR classification method tailored for salt crust discrimination that integrates unsupervised feature learning, attention-based feature optimization, and global context modeling. In this method, a convolutional autoencoder (CAE) is first employed to learn discriminative local scattering representations from the original polarimetric features, enabling effective characterization of subtle scattering differences among salt crust types. A Vision Transformer (ViT) is then introduced to model global scattering relationships and spatial context at the image-patch level, thereby improving the overall consistency of the classification results. Meanwhile, an attention mechanism bridges the local scattering representations and global contextual information, enabling joint optimization of key scattering features. Experiments on fully polarimetric Gaofen-3 and dual-polarimetric Sentinel-1 data show that the proposed method outperforms the best competing method by 2.34 % and 1.17 % in classification accuracy, respectively. In addition, using multi-temporal Sentinel-1 data, recent temporal changes in salt crust distribution are identified and analyzed.

1. Introduction

Salt lakes are saline water bodies rich in mineral resources and play an important role in both industrial production and regional environmental systems [1]. In salt lake regions, the distribution of saline surfaces is closely linked to mineral resource development and ecological monitoring. Among these surfaces, salt crusts are special high-salinity formations widely distributed around salt lakes. Different salt crust types often exhibit distinct salinity levels, which are closely associated with variations in subsurface brine concentration [2]. In general, salt crusts with harder surfaces and higher roughness tend to correspond to higher brine salinity and are thus preferred zones for brine extraction in industrial salt production. Consequently, investigating the spatial distribution of salt crusts is of direct practical significance for both mineral resource exploitation and regional ecological environment monitoring in salt lake areas [3].
Because the salt lake region is vast and difficult to access, traditional field surveys alone cannot capture the distribution of salt crusts. With the continuous development of remote sensing technology, the application of remote sensing to environmental monitoring has attracted increasing attention [4,5,6,7]. Polarimetric synthetic aperture radar (PolSAR), with its all-weather, day-and-night observation capability, enables effective assessment of surface and near-surface material composition, morphology, and structural characteristics [8,9]. In addition, PolSAR can provide long-term time-series data [10,11], making it possible to analyze temporal changes in salt crust regions. Therefore, in combination with field investigations, this study employs PolSAR-based land-cover classification methods to examine the spatial distribution and temporal variations of different salt crust types in the salt lake region.
PolSAR image classification is a fundamental application of polarimetric SAR data, and its performance strongly depends on the effective representation of land-cover features. A wide range of polarimetric features have been developed based on raw polarimetric observations [12,13] and polarimetric decomposition theories [14,15,16,17], which have supported various unsupervised classification approaches such as Wishart-based methods [18,19]. In addition, machine learning models such as support vector machine (SVM) [20] and random forest (RF) [21] have also been applied to PolSAR image classification. However, most existing approaches rely on manually designed polarimetric features for pixel-wise classification [22], which limits their ability to fully exploit the rich information contained in PolSAR data.
In recent years, deep learning has been increasingly applied to PolSAR image processing, enabling automatic feature extraction. Representative approaches include convolutional neural networks (CNNs) [23], graph convolutional networks [24], and self-attention mechanisms [25]. Among them, CNNs have been widely used in PolSAR land-cover classification due to their strong capability in extracting local spatial features. Early studies explored CNN-based feature learning from polarimetric data [26], and subsequent works further improved performance through complex-valued CNNs [27], high-dimensional polarimetric inputs with adaptive feature selection [28,29], and refined network designs such as multi-path structures, residual connections, and manifold regularization [30,31,32]. Despite these advances, CNNs mainly focus on local neighborhood modeling and are less effective in capturing long-range dependencies. To address this limitation, the Vision Transformer (ViT) was introduced in 2020 [33,34] and later applied to PolSAR classification by explicitly modeling global relationships among pixels [35,36,37]. ViT-based frameworks have demonstrated improved global representation capability, while their sensitivity to local structures remains limited. To balance local and global feature modeling, several hybrid CNN–ViT architectures have been proposed, including frameworks that integrate three-dimensional (3D) and two-dimensional (2D) CNNs with local window attention (LWA) [38], jointly model polarimetric coherency matrices across different polarization orientations using 3D convolution and self-attention [39], or employ multi-scale context fusion strategies [40]. Although these methods have achieved promising results, existing patch-level PolSAR classification approaches often rely on relatively simple fusion strategies, and the complementary strengths of CNNs in local texture perception and ViT in global context modeling have not yet been fully exploited.
At present, many patch-based classification methods place excessive emphasis on spatial correlation, which often leads to insufficient utilization of polarimetric information. In the context of salt crust classification, however, numerous studies have demonstrated that polarimetric decomposition parameters are closely related to the physical properties of salt crust surfaces [41]. In particular, the eigenvalues of the polarimetric coherency matrix exhibit strong correlations with surface roughness, and the associated scattering intensities provide clear physical interpretations [42]. Cloude [43] further analyzed the eigenvalues and eigenvectors derived from polarimetric coherency decomposition and reported a strong linear relationship between the anisotropy parameter A and the root mean square height of the surface. Building on these findings, Allain et al. [44] proposed the Double-Bounce Eigenvalue Relative Difference (DERD) and the Single-Bounce Eigenvalue Relative Difference (SERD), among which SERD is particularly effective in high-entropy media for identifying dominant scattering mechanisms. Liu et al. [45] subsequently explored the feasibility of using PolSAR data to distinguish salt crust types in the Lop Nur region by reinterpreting the physical meaning of the SERD parameter and inverting surface roughness from it, demonstrating its potential for salt crust identification. More recently, Li et al. [46] investigated polarimetric decomposition parameters of different salt crust types in the Qarhan Salt Lake region and proposed a method based on statistical and texture similarity measures to characterize parameter differences among salt crust categories. Collectively, these studies indicate that polarimetric decomposition parameters possess discriminative capability for differentiating salt crust types, providing a physical basis for their identification.
For PolSAR image classification, different polarimetric features are often combined to form semantic representations that reflect land-cover category differences [47]. However, the scattering differences among salt crust types are relatively limited, whereas polarimetric decomposition parameters are numerous and often redundant, especially for fully polarimetric SAR data with rich information content. Moreover, manually designing polarimetric features suitable for classification is typically labor-intensive and problem-dependent, which limits both efficiency and generalization capability. Consequently, a key challenge lies in how to further learn discriminative representations from existing polarimetric features to improve classification accuracy and robustness. As an unsupervised neural network, the convolutional autoencoder (CAE) provides an effective solution to this challenge by transforming original features into latent representations through nonlinear mappings [48]. Owing to its simple structure, CAE not only facilitates efficient interpretation of SAR images but also enables the extraction of higher-level abstract features. In addition, CAEs have been shown to effectively suppress speckle noise inherent in SAR data. Previous studies have demonstrated the effectiveness of autoencoders in PolSAR-related tasks, where they were employed to automatically learn polarimetric features [49] or optimize texture representations for improved classification performance [50].
Thus, the main research objectives of this study are summarized as follows:
1. To learn discriminative representations from polarimetric features to address the weak separability among salt crust types.
2. To design a PolSAR classification method that improves salt crust classification performance by jointly modeling local and global information.
3. To analyze the spatial and temporal variations of salt crust distributions using time-series PolSAR data.
To achieve these objectives, stable regions of different salt crust types are first selected as samples based on field surveys and unsupervised Wishart clustering results. A novel classification method integrating CAE, attention mechanisms, and ViT is then proposed. The framework consists of two stages: CAE pretraining and end-to-end classification. During the pretraining stage, the CAE encoder learns deeper local feature representations, while in the classification stage, attention mechanisms adaptively refine channel- and spatial-level features before they are fed into the ViT for global modeling. Experimental results demonstrate that the proposed method effectively distinguishes salt crust types in both fully polarimetric and dual-polarimetric data and provides accurate descriptions of their spatial distributions.
The main contributions of this paper can be summarized as follows:
1. Based on multiple field surveys, salt crusts surrounding the Qarhan Salt Lake were classified into different types according to surface characteristics. Their scattering differences were analyzed using both fully polarimetric and dual-polarimetric data, enabling consistent characterization of spatial distributions.
2. To address the weak scattering differences among salt crust types, a feature processing strategy based on a CAE was adopted. The CAE compresses redundant features in fully polarimetric data and learns latent representations from dual-polarimetric data to enhance class separability and robustness.
3. Considering the spatial continuity and regularity of salt crust distribution, this paper proposes a PolSAR classification framework. The CAE extracts local features, the attention mechanism balances them across channels and spatial dimensions, and the ViT models global dependencies. By leveraging neighborhood information within image patches, the framework reduces misclassification and improves spatial coherence among salt crust types.
4. Time-series dual-polarimetric data from 2019 to 2025 were used to classify surface types at multiple time points. Based on these results, the temporal variations of salt crust distributions were analyzed, providing insights for salt lake resource development and environmental monitoring.
The remainder of this paper is organized as follows. Section 2 introduces the study area and field investigations of salt crusts. Section 3 presents the proposed method and its theoretical background in detail. Section 4 analyzes the differences in scattering among different salt crust types and discusses the experimental results of salt crust classification. Section 5 examines the results of the proposed method under different conditions and further discusses the temporal variations of salt crust types and future research directions. Finally, Section 6 provides the summary and conclusions.

2. Study Area and Field Investigation

2.1. Study Area

Qarhan Salt Lake, located in the Qaidam Basin of northwest China, is the largest salt lake in the country and the second largest in the world, with a geographical extent of 36°45′N–37°10′N, 94°45′E–95°35′E [51]. The lake area is rich in inorganic resources such as potassium and lithium, serving as China’s most important potash production base. As shown in Figure 1, Qarhan Salt Lake extends in a belt-like distribution, measuring approximately 80 km from east to west and 40 km from north to south. The climate is characterized by scarce precipitation, high evaporation, and intense solar radiation [52]. Shaped by the combined influence of the arid continental plateau climate and the enclosed basin topography, the region has developed a unique water–salt circulation system.
The formation of salt crusts is closely related to specific geological, geomorphological and hydroclimatic conditions [53]. During the long-term development of Qarhan Salt Lake, repeated alternations of salinization and desalination gradually transformed much of the lake area into salt crusts, resulting in extensive salt crust regions surrounding the lake. Owing to its large spatial extent, representative salt crust distribution patterns, and practical importance for resource development and environmental monitoring, the Qarhan Salt Lake region was selected as the primary study area in this work.

2.2. Field Investigation of Salt Crusts

To investigate the distribution characteristics of different salt crust types, multiple field investigations were conducted around the Qarhan Salt Lake region. These investigations were carried out in October 2023, July 2024, and June 2025, and the survey routes are illustrated in Figure 1. Based on geomorphological characteristics, previous studies [41,45], and the results of field investigations, the salt crust region can be categorized into four representative types: cracked salt crust, sharp-edged salt crust, micro-hilly salt crust, and flat salt crust. In the peripheral zones of the salt crust distribution, the surface types are mainly composed of widely distributed gravel and sand dunes. As shown in Figure 2, these six surface types exhibit distinct morphological characteristics.
Cracked salt crust: mainly distributed in the central part of the salt crust area. The surface is relatively flat and composed of salt crystals with fractures but without significant uplift, showing irregular geometric patterns and fissures. Large gaps often occur between cracks, which are partially filled with mud and sand.
Sharp-edged salt crust: found between the cracked and micro-hilly salt crusts. The surface is covered with fine-grained white salt crystals, and the overall structure is uneven without regularity, characterized by abrupt protrusions and depressions. This type of salt crust has a hard texture and a relatively low mud and sand content.
Micro-hilly salt crust: distributed between the sharp-edged and flat salt crust. It consists of small hollow salt crusts with a raised center, producing minor surface undulations. The surface texture is relatively soft, with higher mud and sand content.
Flat salt crust: located at the outermost edge of the salt crust region, adjacent to gravel and sand dune areas. The surface contains only sparse salt crystals and exhibits little structural relief. Beneath the surface lies a large amount of mud and sand, making the texture very loose and soft.
Gravel and sand dune areas: the gravel surface is composed of coarse sand, forming a hard texture that resists wind erosion. Over time, some regions have become covered by fine to medium sand grains, giving rise to undulating dunes.
Shadow areas: shaped by long-term aeolian transport, the surface develops irregular dune landforms. Due to the considerable height and slope of these dunes, radar shadows form on the leeward slopes or steep faces.
These surface types exhibit relatively stable spatial distributions and distinct morphological characteristics, forming concentric or semi-concentric patterns extending outward from the lake center. The clear zonal structure and spatial continuity of different salt crust types provide a meaningful basis for subsequent PolSAR-based classification and analysis.

3. Methods

As shown in Figure 3, the proposed classification framework consists of four components: the sample generation module, the CAE module, the attention-based feature enhancement module (FEM), and the ViT classification module.
In the sample generation process, the Wishart classifier is used to divide the image into regions with identical or similar scattering mechanisms through unsupervised clustering. Combined with field survey results, it enables the approximate delineation of different salt crust types and provides reliable samples for subsequent supervised classification. The CAE module is then utilized to extract and construct high-dimensional deep features, while simultaneously suppressing the influence of speckle noise through the neighborhood information of image patches. Structurally, the CAE comprises an encoder built from stacked convolutional and nonlinear activation layers, and a decoder with a mirrored deconvolutional configuration. The initial features derived from raw polarimetric observations and polarimetric decompositions serve as inputs for CAE pretraining. After pretraining, the encoder is retained to extract deeper and more abstract representations from the image patches. Then, the attention-based FEM [54] is applied to reweight the extracted features across both the channel and spatial dimensions, thereby producing more discriminative feature representations for classification. Finally, these enhanced features are fed into the ViT, which performs feature learning and maps them to class labels. After end-to-end training, accurate classification results of surface types are obtained.

3.1. Label Sample Generation

Obtaining reliable labeled samples in the Qarhan Salt Lake region is difficult due to incomplete sampling coverage and temporal mismatches with the PolSAR data. The sampling campaign could not cover the entire salt-crust area, the PolSAR acquisition dates may not coincide with the sampling dates, and the salt-crust distribution changes gradually over time, all of which may introduce inconsistencies between the sampled observations and the PolSAR images. The ground samples were therefore not directly used for supervised classification. Instead, an unsupervised clustering strategy was adopted during the sample generation stage. Specifically, the H/α-Wishart classifier was employed to segment the PolSAR images. This method combines eigenvalue decomposition with a Wishart classifier and can automatically partition the image into regions with similar polarimetric scattering mechanisms, without manual intervention.
Based on target decomposition theory, eigenvalue decomposition is applied to the coherency matrix to estimate three parameters: entropy (H), mean scattering angle (α), and anisotropy (A). These parameters characterize the scattering process and the physical scattering mechanisms in PolSAR data. The parameters H and α are further used to construct a 2D feature space, in which land-cover types are separated into different canonical scattering regions. From this initial classification space, the cluster center of each region is estimated along with the distance between each pixel and the corresponding cluster centers. Each pixel is then assigned to a specific category according to the minimum Wishart distance criterion (the detailed distance calculation can be found in reference [19]). This process can be further refined through iterative optimization of the cluster centers, resulting in more stable clustering outcomes. Finally, the unsupervised segmentation results are combined with field survey results for manual refinement, from which stable and representative regions are selected as training and testing samples for supervised classification, ensuring the reliability of the subsequent model-learning process.
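The clustering step can be summarized compactly. The following sketch, in Python with NumPy, illustrates the iterative Wishart reassignment under the assumption that the pixel coherency matrices and an initial H/α zoning are already available; the function names and the number of iterations are illustrative, not taken from the authors' implementation.

```python
# Minimal sketch of the H/alpha-Wishart clustering step described above,
# assuming pixel coherency matrices are already estimated (e.g. 3x3 complex
# T3 matrices for the full-pol case). Names are illustrative assumptions.
import numpy as np

def wishart_distance(T, V):
    """d(T, V) = ln|V| + Tr(V^{-1} T) for one class center V."""
    sign, logdet = np.linalg.slogdet(V)
    return np.real(logdet + np.trace(np.linalg.inv(V) @ T))

def wishart_iterate(T_pixels, init_labels, n_iter=5):
    """Refine an initial H/alpha zoning by iterative Wishart reclassification.

    T_pixels    : (N, 3, 3) complex array of coherency matrices.
    init_labels : (N,) integer labels from the H/alpha plane zones.
    """
    labels = init_labels.copy()
    classes = np.unique(labels)
    for _ in range(n_iter):
        # Update class centers as the mean coherency matrix of each cluster.
        centers = {c: T_pixels[labels == c].mean(axis=0) for c in classes}
        # Reassign each pixel to the nearest center in the Wishart sense.
        dists = np.stack(
            [[wishart_distance(T, centers[c]) for c in classes] for T in T_pixels]
        )
        labels = classes[np.argmin(dists, axis=1)]
    return labels
```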

3.2. Initial Feature Extraction

The advantage of using fully polarimetric SAR data lies in its capacity to reveal detailed scattering characteristics through polarimetric decomposition. The coherency matrix provides a comprehensive description of the target’s polarimetric behavior, from which physically meaningful scattering mechanisms can be derived via decomposition techniques. In this study, a total of 22 features are extracted from the coherency matrix T and various polarimetric decomposition methods, as summarized in Table 1.
Although fully polarimetric data are widely regarded as more advantageous for extracting scattering mechanisms, they are often unavailable due to various practical limitations. In contrast, dual-polarization modes provide a reasonable alternative, reducing data volume and simplifying technical requirements. Following the dual-polarimetric eigenvalue decomposition proposed by Cloude [55], 6 polarimetric features are extracted from the covariance matrix C, which are also listed in Table 1.
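As an illustration of the dual-polarimetric case, the sketch below computes representative eigenvalue-based parameters (entropy, mean alpha, and an anisotropy-like ratio) from the 2 × 2 covariance matrix; it does not reproduce the exact six features of Table 1, and the function and variable names are assumptions.

```python
# Minimal sketch of dual-polarimetric eigenvalue features derived from the
# 2x2 covariance matrix C, following a Cloude-type decomposition. The exact
# six features in Table 1 are not reproduced here; entropy H, mean alpha and
# an anisotropy-like ratio A are shown as representative outputs.
import numpy as np

def dualpol_eigen_features(C2):
    """C2 : (..., 2, 2) Hermitian covariance matrices (e.g. VV/VH channels)."""
    eigval, eigvec = np.linalg.eigh(C2)             # ascending real eigenvalues
    lam = eigval[..., ::-1]                         # lam1 >= lam2 >= 0
    vec = eigvec[..., ::-1]                         # reorder eigenvectors accordingly
    p = lam / np.clip(lam.sum(axis=-1, keepdims=True), 1e-12, None)
    H = -(p * np.log(np.clip(p, 1e-12, None)) / np.log(2)).sum(axis=-1)
    alpha_i = np.arccos(np.clip(np.abs(vec[..., 0, :]), 0.0, 1.0))
    alpha = np.degrees((p * alpha_i).sum(axis=-1))  # mean scattering angle
    A = (lam[..., 0] - lam[..., 1]) / np.clip(lam[..., 0] + lam[..., 1], 1e-12, None)
    return H, alpha, A
```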

3.3. Convolutional Autoencoder

To extract high-level abstract features from salt crust image patches, a stacked CAE structure was designed in this study to learn effective feature representations in an unsupervised manner [56]. The CAE encoder consists of three convolutional layers, which are used to compress redundant information, extract representative features, and reduce the impact of speckle noise. The resulting encoded features then serve as stable and discriminative inputs for the subsequent feature enhancement and ViT modules.
As shown in Figure 3, the encoder consists of multiple convolutional layers combined with nonlinear activation functions, which progressively extract deep structural features from the input image patch. To ensure consistency in spatial dimensions between input and output, no pooling layers are included in the encoding process, and all convolutional operations adopt appropriate padding strategies. The input feature map has a size of $C \times M \times M$, and the encoded output is $C' \times M \times M$, where only the channel dimension changes. Each convolutional layer is equipped with trainable kernels and bias terms, together with suitable activation functions, to enhance the nonlinear modeling capability of the network [57]. Assuming the input is denoted as $x_i$ and the encoder consists of $L$ convolutional layers, the $k$-th output of the $l$-th layer can be expressed as:

$$h_i^{(l,k)} = f\left(h_i^{(l-1)} \ast W^{(l,k)} + b^{(l,k)}\right), \quad l = 1, 2, \ldots, L$$

where $h_i^{(0)} = x_i$ denotes the original input, while $W^{(l,k)}$ and $b^{(l,k)}$ represent the $k$-th convolution kernel and bias term in the $l$-th layer, respectively. The symbol $\ast$ denotes the 2D convolution operation, and $f(\cdot)$ refers to the ReLU activation function. The decoder reconstructs the encoded features through deconvolution, progressively restoring the original image structure using a symmetric architecture. It consists of multiple deconvolution layers with the same configuration as the encoder, ensuring that the reconstructed image remains as close as possible to the original input in both spatial dimensions and semantic structure. Specifically, the reconstruction of image $y_i$ can be expressed as:

$$\tilde{h}_i^{(m,k)} = f\left(\tilde{h}_i^{(m-1)} \ast W^{(L+m,k)} + b^{(L+m,k)}\right), \quad m = 1, 2, \ldots, L$$

where $\tilde{h}_i^{(0)} = h_i^{(L)}$ denotes the final output of the encoder, $W^{(L+m,k)}$ represents the corresponding transposed convolution kernel, and $b^{(L+m,k)}$ is the bias term. The reconstructed image is obtained from the output of the final layer as $y_i = g\big(\tilde{h}_i^{(L)}\big)$, where $g(\cdot)$ denotes a linear or identity activation function. To optimize the training process, the CAE minimizes the reconstruction error between the input and output. The loss function is defined as:

$$J(X, Y) = \frac{1}{2N} \sum_{i=1}^{N} \left\| y_i - x_i \right\|_F^2$$
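A minimal PyTorch sketch of this architecture is given below. The channel configuration (64/32/16 feature maps, 3 × 3 kernels, no pooling) follows Section 4.4, while the remaining details (class name, linear decoder output) are assumptions for illustration.

```python
# Minimal PyTorch sketch of the stacked CAE described above (three 3x3
# convolutional layers with 64/32/16 feature maps and a mirrored transposed-
# convolution decoder, spatial size preserved). Details beyond Section 4.4
# are assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class SaltCrustCAE(nn.Module):
    def __init__(self, in_channels=22):          # 22 full-pol features, 6 for dual-pol
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(16, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, in_channels, 3, padding=1),   # linear output g(.)
        )

    def forward(self, x):                         # x: (B, C, M, M) feature patches
        z = self.encoder(x)                       # (B, 16, M, M), spatial size preserved
        return self.decoder(z), z

# Pretraining objective: mean squared reconstruction error, as in the loss above:
# recon, _ = model(patch); loss = nn.functional.mse_loss(recon, patch)
```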

3.4. Attention-Based Feature Enhancement

After feature extraction by the CAE, an attention-based FEM is introduced to perform fine-grained modeling and enhancement of the encoded features. Since the extracted feature patches have fixed sizes, the Convolutional Block Attention Module (CBAM) can be adopted to sequentially apply attention to both the channel and spatial dimensions, guiding the network to focus on more discriminative features. For a feature patch from the convolutional encoder, denoted as $F \in \mathbb{R}^{C \times M \times M}$, channel attention is computed by applying max pooling and average pooling operations to generate two one-dimensional channel descriptors. These descriptors are then processed through a shared multilayer perceptron (MLP) to obtain the channel attention map $M_c \in \mathbb{R}^{C \times 1 \times 1}$:

$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big)$$

Spatial attention is then applied to the channel-refined feature map $F'$ to guide the network toward key regions in the spatial dimension:

$$M_s(F') = \sigma\Big(f^{N \times N}\big(\big[\mathrm{AvgPool}(F');\ \mathrm{MaxPool}(F')\big]\big)\Big),$$

where $\sigma(\cdot)$ denotes the Sigmoid activation function, $f^{N \times N}$ represents a convolution operation with an $N \times N$ kernel, and $[\,\cdot\,;\,\cdot\,]$ indicates concatenation along the channel dimension. Finally, the enhanced features are obtained by applying both channel and spatial attention to the input feature map:

$$F' = M_c(F) \otimes F, \qquad \tilde{F} = M_s(F') \otimes F',$$

where $\otimes$ denotes element-wise multiplication. Based on the standard CBAM, a multi-scale structure was designed to enhance the model's ability to perceive spatial information at different scales, as shown in Figure 4. This structure employs multiple parallel branches with varying receptive fields, where channel and spatial attention are modeled separately at each scale. The enhanced feature maps generated by the different branches are then fused. Let the input feature be denoted as $F$, and the enhanced feature from the $s$-th scale branch as $\tilde{F}_s$. The final fused feature can be expressed as:

$$\tilde{F}_{\mathrm{multi}} = \sum_{s=1}^{S} w_s \cdot \tilde{F}_s, \quad \text{where} \ \sum_{s=1}^{S} w_s = 1, \ w_s \in (0, 1),$$

where $w_s$ denotes the weight of each scale. The FEM serves as an intermediate feature processing unit between the CAE and ViT modules, providing richer and more informative feature representations for the subsequent classification task. This design not only strengthens the modeling of local spatial structures but also enhances the generalization ability of the network under deeper nonlinear transformations.
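The sketch below shows one possible realization of the multi-scale FEM, assuming a CBAM-style channel–spatial attention branch per scale and softmax-normalized learnable fusion weights $w_s$; the branch kernel sizes (1 × 1, 3 × 3, 5 × 5) follow Section 4.4, and the remaining choices are assumptions rather than the authors' implementation.

```python
# Sketch of the attention-based FEM: CBAM-style channel + spatial attention
# per branch, with softmax-normalized learnable fusion weights w_s (assumed).
import torch
import torch.nn as nn

class CBAMBranch(nn.Module):
    def __init__(self, channels, spatial_kernel, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(                       # shared MLP for channel attention
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, F):
        b, c, _, _ = F.shape
        avg = self.mlp(F.mean(dim=(2, 3)))              # AvgPool channel descriptor
        mx = self.mlp(F.amax(dim=(2, 3)))               # MaxPool channel descriptor
        Fc = F * torch.sigmoid(avg + mx).view(b, c, 1, 1)          # channel attention
        s = torch.cat([Fc.mean(1, keepdim=True), Fc.amax(1, keepdim=True)], dim=1)
        return Fc * torch.sigmoid(self.spatial(s))                  # spatial attention

class MultiScaleFEM(nn.Module):
    def __init__(self, channels=16, kernels=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(CBAMBranch(channels, k) for k in kernels)
        self.w = nn.Parameter(torch.zeros(len(kernels)))            # fusion logits

    def forward(self, F):
        w = torch.softmax(self.w, dim=0)                # weights sum to 1, each in (0, 1)
        return sum(w[s] * branch(F) for s, branch in enumerate(self.branches))
```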

3.5. Vision Transformer

To achieve efficient classification of salt crust feature patches, the ViT is introduced as the discriminative model following the FEM. As shown in Figure 5, the overall ViT framework consists of a data preprocessing layer, a multi-head self-attention (MHSA) layer, and an MLP layer. Since the ViT requires a two-dimensional token sequence as input, the enhanced feature patch of size $C \times M \times M$ is first restructured in the data preprocessing layer. Each feature patch is partitioned into $N^2$ sub-patches, and each sub-patch is flattened into a vector of dimension $C \times (M/N)^2$, referred to as a token. The input is thus reshaped into $\mathbb{R}^{N^2 \times D}$, where $D = C \times (M/N)^2$. Each token is projected into a unified embedding space of dimension $d$. A learnable class token is then introduced and concatenated with all patch embeddings to form the final input sequence $Z_0 \in \mathbb{R}^{(1+N^2) \times d}$ for subsequent processing. Unlike conventional ViTs, no positional encoding is included in this study, to better align with the characteristics of PolSAR image patches.

The MHSA layer, as the core component, captures dependencies among different regions within an image patch [39]. The input sequence $Z_0$ is first linearly transformed into queries ($Q$), keys ($K$), and values ($V$), and their scaled dot-product similarity is computed. After normalization with the Softmax function, the attention output is obtained as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V,$$

where $Q = Z_0 W_Q$, $K = Z_0 W_K$, and $V = Z_0 W_V$, with $W_Q, W_K, W_V \in \mathbb{R}^{d \times h d_k}$ being learnable weight matrices and $d_k = d/h$. To enhance representational capacity, multiple attention heads are computed in parallel and concatenated:

$$\mathrm{MHSA}(Z_0) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W_O,$$

where each $\mathrm{head}_i = \mathrm{Attention}(Q_i, K_i, V_i)$ and $W_O \in \mathbb{R}^{d \times d}$ is the output projection. The MHSA output is then passed through residual connections and LayerNorm, followed by an MLP sublayer consisting of two fully connected layers with GeLU activation. Finally, the class token of the output sequence is extracted and fed into a Softmax classifier to produce the probability distribution over salt crust categories:

$$\hat{y} = \mathrm{Softmax}\big(z_{\mathrm{class}} W_c + b_c\big),$$

where $z_{\mathrm{class}} \in \mathbb{R}^{d}$ is the class token, $W_c \in \mathbb{R}^{d \times C}$ and $b_c \in \mathbb{R}^{C}$ are the trainable weight matrix and bias term of the classifier, and $C$ denotes the number of salt crust categories.
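A compact PyTorch sketch of this classification stage is shown below. The token construction, the absence of positional encoding, and the layer sizes follow the description above and Section 4.4 where stated; the use of the built-in Transformer encoder and the remaining defaults are assumptions.

```python
# Minimal sketch of the ViT classification stage: the enhanced C x M x M
# patch is split into N^2 flattened tokens, linearly embedded, prepended
# with a class token, and processed without positional encoding.
import torch
import torch.nn as nn

class SaltCrustViT(nn.Module):
    def __init__(self, channels=16, patch=15, n_split=3, d=128,
                 depth=6, heads=4, n_classes=6):
        super().__init__()
        self.n_split, self.sub = n_split, patch // n_split
        token_dim = channels * self.sub * self.sub           # D = C * (M/N)^2
        self.embed = nn.Linear(token_dim, d)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d))
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=heads,
                                           dim_feedforward=256,
                                           activation="gelu", batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(d, n_classes)                   # Softmax applied in the loss

    def forward(self, x):                                     # x: (B, C, M, M)
        b, c, m, _ = x.shape
        # Rearrange into N^2 flattened sub-patch tokens of dimension D.
        tokens = (x.unfold(2, self.sub, self.sub).unfold(3, self.sub, self.sub)
                    .permute(0, 2, 3, 1, 4, 5).reshape(b, self.n_split ** 2, -1))
        z = torch.cat([self.cls_token.expand(b, -1, -1), self.embed(tokens)], dim=1)
        z = self.encoder(z)                                   # no positional encoding
        return self.head(z[:, 0])                             # logits from the class token
```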

3.6. Training Process

3.6.1. Pretraining of Convolutional Autoencoder

In the proposed method, CAE processes salt crust image patches by reconstructing them through forward propagation, with its parameters optimized via minimization of the mean squared error (MSE) between the original and reconstructed images. The MSE loss function evaluates the reconstruction quality of the encoder–decoder process. During training, network parameters are updated via backpropagation, where gradients are computed with respect to each trainable parameter and optimized using gradient descent until convergence. Once convergence is achieved, the encoder effectively captures the structural and semantic information of salt crust image patches.
The pretrained CAE encoder serves as a feature extractor, loading its parameters into the classification framework to generate high-level abstract features from image patches. To balance the robustness of pretrained representations with task-specific adaptability, a partial freezing strategy is applied: the first two convolutional layers are fixed, while the final layer is fine-tuned. This approach maintains reliable extraction of features while enhancing adaptability to the salt crust classification task.
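A possible implementation of this partial-freezing step is sketched below, reusing the CAE module from the earlier sketch; the checkpoint path and the assumption that the full CAE state dictionary was saved are hypothetical.

```python
# Sketch of the partial-freezing strategy: load the pretrained encoder and
# freeze its first two convolutional blocks while keeping the last trainable.
# Module/attribute names follow the CAE sketch above (assumed).
import torch

def load_and_partially_freeze(cae, pretrained_path="cae_pretrained.pt"):
    state = torch.load(pretrained_path, map_location="cpu")   # full CAE state dict (assumed)
    cae.encoder.load_state_dict(
        {k.replace("encoder.", ""): v for k, v in state.items() if k.startswith("encoder.")})
    # Encoder layout: [conv1, relu, conv2, relu, conv3, relu] -> freeze conv1/conv2.
    for layer in list(cae.encoder)[:4]:
        for p in layer.parameters():
            p.requires_grad = False
    return cae
```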

3.6.2. Overall Training of Classification Network

The ultimate objective of the network is to predict the class label of each input salt crust image patch. Accordingly, the optimization process is guided by the cross-entropy loss function, which measures the discrepancy between the predicted class probabilities and the ground-truth labels:
$$\mathcal{L}_{\mathrm{CE}} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} z_{ij} \log\big(\hat{z}_{ij}\big),$$

where $N$ denotes the number of samples, $C$ is the number of classes, $z_i$ represents the one-hot encoded ground-truth label vector of the $i$-th sample, and $\hat{z}_i$ denotes the predicted probability vector. For a sample belonging to class $j$, $z_{ij} = 1$; otherwise, $z_{ij} = 0$.
During training, image patches are first processed by the CAE to extract high-level abstract features, enabling an initial modeling of the structural characteristics of salt crust patches. The encoded features are then passed into the FEM for local modeling and saliency enhancement. Finally, the features are input into the ViT, where self-attention mechanisms are applied to produce the final class predictions. It is worth noting that the CAE is pretrained with the objective of reconstructing input patches, rather than directly optimizing for classification. As such, the learned representations, though expressive, may not be fully aligned with classification performance. To address this, an end-to-end fine-tuning strategy is adopted during the overall training stage: after loading the pretrained CAE encoder parameters, all components of the network are jointly optimized. This enables the network to adapt its feature representations to the classification objective, thereby improving both recognition accuracy and generalization capability.
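The two-stage procedure can be summarized by the following sketch, which reuses the module sketches above; the data-loader interface and function names are assumptions, while the hyperparameters correspond to those reported in Section 4.4.

```python
# Minimal sketch of the two-stage training described above: CAE pretraining
# with MSE reconstruction, then end-to-end fine-tuning of encoder + FEM + ViT
# with cross-entropy. Module and loader interfaces are assumptions.
import torch
import torch.nn as nn

def pretrain_cae(cae, loader, epochs=30, lr=5e-5, weight_decay=1e-4):
    opt = torch.optim.Adam(cae.parameters(), lr=lr, weight_decay=weight_decay)
    for _ in range(epochs):
        for patches, _ in loader:                  # labels unused during pretraining
            recon, _ = cae(patches)
            loss = nn.functional.mse_loss(recon, patches)
            opt.zero_grad(); loss.backward(); opt.step()

def finetune(encoder, fem, vit, loader, epochs=50, lr=1e-4):
    params = [p for m in (encoder, fem, vit) for p in m.parameters() if p.requires_grad]
    opt = torch.optim.Adam(params, lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for patches, labels in loader:
            logits = vit(fem(encoder(patches)))    # CAE encoder -> FEM -> ViT
            loss = ce(logits, labels)
            opt.zero_grad(); loss.backward(); opt.step()
```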

4. Experimental Results

4.1. Experimental Data

To validate the proposed classification method, experiments were conducted using fully polarimetric Gaofen-3 data and dual-polarimetric Sentinel-1 data. The Gaofen-3 data, acquired on 4 April 2019, were collected in quad-polarization stripmap I (QPSI) mode with a nominal resolution of 8 m. They cover a representative salt crust distribution area of the Qarhan Salt Lake and serve as one of the primary data sources for the classification experiments. In addition, Sentinel-1 data were employed due to their open accessibility, ease of acquisition, and relatively short and regular revisit interval compared to other SAR satellites, making them well suited for continuous monitoring of salt crust dynamics. To align with the field survey schedule, Sentinel-1 data acquired on 1 August 2024, which coincides closely with one of the field campaigns, were selected for experimental validation. These data were collected in interferometric wide-swath (IW) mode with dual-polarization channels (VV, VH). As shown in Figure 6, different salt crust types exhibit distinguishable patterns in the Pauli pseudo-color composite images.
After standard preprocessing steps, including radiometric calibration, multilook processing, geometric correction, and Lee filtering, polarimetric coherency and covariance matrices were constructed from Gaofen-3 and Sentinel-1 data, respectively [1]. The image sizes of the two datasets are 7474 × 8673 and 2930 × 3514 , with approximate pixel spacings of 4.5 m × 4.5 m and 15 m × 15 m , respectively.

4.2. Scattering Characteristics of Salt Crusts

4.2.1. Backscattering Intensity

As shown in Figure 7, boxplots of backscattering intensities were generated for the fully polarimetric data (HV, VV, HH) and the dual-polarimetric data (VH, VV) to compare the scattering behavior of different salt crust types. Overall, since salt crusts are generally homogeneous and dominated by symmetric or isotropic structures, the co-polarized channels exhibit consistently higher backscattering intensities than the cross-polarized channels. For rougher salt crust types, such as cracked, sharp-edged, and micro-hilly, the co-polarized channel intensities exceed −12 dB, with the HH and VV channels showing a high degree of similarity.
The cracked salt crust is characterized by irregular surface structures with dense fissures and boundary features. Scattering from fissure edges enhances microstructural effects, strengthening the polarization rotation in cross-polarized channels and resulting in strong backscattering responses across all polarizations. Although the sharp-edged salt crust exhibits greater surface undulation, its irregular and dispersed orientation weakens the concentration of scattering energy in co-polarized channels, while polarization rotation in cross-polarization remains limited, leading to moderate overall backscattering intensity. The micro-hilly salt crust, with its soft surface and high sand–mud content, lacks regularity and reflectivity, causing electromagnetic waves to penetrate or be absorbed, and thus producing weak echoes in both co- and cross-polarized channels. By contrast, flat-shaped salt crusts, dunes, and dune-shadow regions display notably reduced backscattering in cross-polarized channels, with average values generally below −25 dB. The flat-shaped surface is relatively smooth, favoring specular reflection that diminishes backscattered energy. Dunes, composed of loose material with limited geometric structure, tend to allow penetration and absorption of radar waves, resulting in weak backscattering. In dune-shadow areas, backscattering is further reduced due to shielding effects from leeward slopes or steep dune faces.
When the fully polarimetric and dual-polarimetric data are compared, a consistent variation trend of backscattering intensities is observed across most salt crust types, indicating strong agreement in their relative polarization responses. However, in flat-shaped salt crusts and dune-shadow regions, the backscattering intensities of both the co- and cross-polarized channels in the dual-polarimetric data are significantly lower than those in the fully polarimetric data. A possible explanation lies in the differences in imaging geometry. The Gaofen-3 data were obtained in a descending right-looking mode with a satellite heading angle of 11.139° west of south, whereas the Sentinel-1 data were acquired in an ascending right-looking mode with a heading angle of 13.202° west of north. The difference in satellite heading angles may lead to different sensitivities to surface geometric structures. In particular, changes in the relative angle between the satellite flight direction and surface orientations, such as dune alignments, can result in variations in scattering intensity. In addition, the incidence angle ranges of the Gaofen-3 and Sentinel-1 data are 34.89°–36.75° and 33.51°–38.12°, respectively. Given that the variations in incidence angle are relatively small (1.86° for Gaofen-3 and 4.61° for Sentinel-1), the influence of incidence angle on the classification results is considered limited and is therefore neglected in this study.

4.2.2. H / α Plane

To further investigate the scattering differences among various salt crust types, polarimetric entropy (H) and mean scattering angle ( α ) were estimated, and scatter density maps of six surface types were plotted in the H / α feature space.
From Figure 8, it can be observed that cracked and sharp-edged types are mainly distributed in the regions of medium-to-high entropy ( H > 0.5 ) and scattering angles ( α > 30 ° ), indicating relatively complex scattering mechanisms dominated by volume scattering, accompanied by varying degrees of double-bounce scattering. Specifically, the cracked type, due to its regular geometry and fissures, exhibits a stronger tendency toward double-bounce scattering. The sharp-edged type, with pronounced surface undulations, shows a significant volume scattering component. The micro-hilly type, in contrast, displays a more dispersed and banded distribution, primarily attributed to complex scattering caused by small-scale undulating structures. In comparison, the flat-shape type, characterized by smooth surfaces, demonstrates typical surface scattering with a relatively simple scattering mechanism. Its scatter points are concentrated in the low-entropy ( H < 0.4 ) and low-scattering-angle ( α < 20 ° ) region. The dune type is primarily located in the medium-entropy and low-to-moderate scattering-angle region, dominated by single-bounce or weak volume scattering. The dune-shadow type, situated in shadowed areas, receives little effective incident energy, resulting in weak echoes and few effective scattering components.
In summary, different salt crust types exhibit clear separability in the H / α space. High-entropy, high-angle regions correspond to multipath and complex structural scattering, providing a basis for distinguishing structurally complex types such as cracked and sharp-edged. Conversely, low-entropy, low-angle regions are indicative of smooth surfaces lacking structural variations, such as the flat-shape type. This feature space thus provides a parametric foundation for subsequent classification models and enhances the physical interpretability of salt crust type identification.
In the dual-polarimetric data, as shown in Figure 9, the scatter distributions of the aforementioned types are noticeably contracted, with both entropy and scattering angle values slightly reduced, indicating certain limitations of the dual-polarization mode in capturing complex scattering mechanisms. Moreover, the dual-polarimetric data show insufficient discriminative ability between micro-hilly and flat-shape types, as well as between sharp-edged and dune types. By contrast, fully polarimetric data, with its richer polarization combinations, can capture scattering information more comprehensively, thereby offering clear advantages in distinguishing salt crust types.

4.3. Sample Generation and Initial Feature Extraction

Given that different salt crust types exhibit certain separability in the H / α feature plane, unsupervised H / α -Wishart classification was first conducted to obtain stable training samples for supervised classification. The unsupervised classification results of both fully polarimetric and dual-polarimetric data are shown in Figure 10. Although the results of unsupervised Wishart classification cannot correspond one-to-one with actual salt crust types, the clustering structures exhibit good spatial stability. Combined with field survey information, regions along the survey routes that showed consistent unsupervised classification results and clear surface type characteristics were selected as labeled samples. It should be noted that the clustering results of Sentinel-1 dual-polarization data are not satisfactory. Therefore, the sample selection primarily relies on field survey results, supplemented by reference to the clustering outcomes derived from Gaofen-3 fully polarimetric data. Finally, for the Gaofen-3 data, a total of 56,055 samples were selected, of which 27,702 samples were used for training and validation, with the remaining samples allocated to the test set. For the Sentinel-1 data, 27,907 samples were selected, including 12,507 samples for training and validation, while the rest were used for testing.
For feature extraction, the 22-dimensional features extracted from the fully polarimetric Gaofen-3 data and the 6-dimensional features extracted from the dual-polarimetric Sentinel-1 data were used as the basic input features for experimental validation. In addition, several common combinations of input features are available for fully polarimetric data. The influence of these initial feature combinations on classification performance will be further explored in subsequent sections.

4.4. Experimental Parameter Settings

In the proposed method, the input features were divided into fixed-size feature patches of 15 × 15 . For feature processing, the stacked CAE described above was applied, with its encoder comprising three convolutional layers of 3 × 3 kernels and 1 × 1 stride. The three convolutional layers output 64, 32, and 16 feature maps, respectively, each followed by a ReLU activation function to enhance nonlinear representation capability. Correspondingly, the CAE decoder adopts a symmetric structure with three transposed convolutional layers that progressively reconstruct the feature maps. Following the CAE encoding, feature representations were refined by the multi-scale enhancement module, configured as an extension of the standard CBAM with three parallel branches of receptive fields 1 × 1 , 3 × 3 , and 5 × 5 , each applied to perform attention modeling on the input features. The enhanced features were then fed into ViT for feature learning and classification. The ViT consists of six Transformer layers, each including a multi-head self-attention module with four attention heads, followed by two fully connected sublayers with hidden dimensions of 256 and 128, respectively.
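For reference, the snippet below wires the sketched modules together with the settings just described; the composite class and its defaults are illustrative assumptions, not the authors' code.

```python
# Illustrative end-to-end wiring matching the settings in this subsection
# (15x15 patches, encoder channels 64/32/16, FEM branches 1/3/5, ViT with
# 6 layers and 4 heads). It reuses the module sketches from Section 3.
import torch.nn as nn

class SaltCrustClassifier(nn.Module):
    def __init__(self, in_channels=22, n_classes=6):     # in_channels=6 for Sentinel-1
        super().__init__()
        self.encoder = SaltCrustCAE(in_channels).encoder  # pretrained, partially frozen
        self.fem = MultiScaleFEM(channels=16, kernels=(1, 3, 5))
        self.vit = SaltCrustViT(channels=16, patch=15, n_split=3,
                                d=128, depth=6, heads=4, n_classes=n_classes)

    def forward(self, x):                                 # x: (B, C, 15, 15)
        return self.vit(self.fem(self.encoder(x)))
```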
For training, the batch size was fixed at 64 for both stages. During CAE pretraining, the learning rate was set to 5 × 10⁻⁵, the weight decay to 1 × 10⁻⁴, and the training lasted 30 epochs with Adam as the optimizer and the mean squared error as the objective function. In the subsequent end-to-end fine-tuning stage, the learning rate was adjusted to 1 × 10⁻⁴, the number of epochs was set to 50, and Adam was again employed, with the optimization objective defined as the cross-entropy loss.

4.5. Classification Results

A series of comparative experiments were designed to comprehensively validate the effectiveness of the proposed method. The comparison involved several mainstream supervised classification approaches, including RF, Swin Transformer [58], CFAT [59], Hybrid CVNet [60], PolSAR Former [38] and Lightweight DeepLabV3+ [61]. By systematically comparing classification performance with these methods, the accuracy and robustness of the proposed approach were evaluated from multiple perspectives. To quantitatively assess the classification results, three commonly used metrics were employed: overall accuracy (OA), average accuracy (AA), and the Kappa coefficient. Specifically, OA represents the proportion of correctly classified pixels among all pixels, AA is the mean of the per-class classification accuracies, and the Kappa coefficient measures the agreement between the classification results and the reference data after correcting for chance agreement.
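For clarity, the three metrics can be computed directly from a confusion matrix, as in the short sketch below (standard definitions; the function and variable names are illustrative).

```python
# Sketch of the three evaluation metrics from a confusion matrix:
# OA (overall accuracy), AA (mean per-class accuracy) and Cohen's kappa.
import numpy as np

def classification_metrics(conf):
    """conf[i, j] = number of pixels of true class i predicted as class j."""
    conf = conf.astype(float)
    total = conf.sum()
    oa = np.trace(conf) / total
    aa = np.mean(np.diag(conf) / conf.sum(axis=1))        # mean of per-class accuracies
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total ** 2   # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```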
The classification accuracies of different methods are summarized in Table 2. Traditional machine-learning methods such as RF are pixel-based and sensitive to speckle noise, leading to limited performance, with an overall accuracy of only 88.24 % and a lower Kappa coefficient than the other methods. The Swin Transformer improves classification accuracy due to its global self-attention mechanism, which effectively captures dependencies in remote sensing imagery. CFAT models directional and spatial correlations through direction-aware feature rearrangement, but its benefit is limited for salt crust classification due to the generally homogeneous and weakly anisotropic surface patterns. The PolSAR Former combines 3D and 2D CNNs for feature extraction and employs a LWA mechanism to acquire global information, achieving relatively accurate classification performance. Nevertheless, its input feature representation remains relatively primitive, which constrains the potential of the Transformer encoder. Hybrid CVNet employs complex-valued 3D and 2D convolutions to extract local features, and further models global dependencies using a complex-valued Transformer. The subtle inter-class differences among salt crust types make it difficult to directly obtain highly discriminative features from complex-valued inputs. By employing atrous convolutions and pyramid pooling structure, Lightweight DeepLabV3+ effectively captures multi-scale contextual information, resulting in an overall classification accuracy of 94.24 % . The proposed method reconstructs and refines input features through autoencoding and attention-based enhancement, and seamlessly integrates these components with the Transformer encoder. As a result, the proposed method achieves the best performance across all evaluation metrics, with OA, AA, and Kappa values reaching 96.91 % , 96.41 % , and 96.08 % , respectively, outperforming other classification methods.
In the classification maps, as illustrated in Figure 11, different colors represent the spatial distribution of six surface types. Among them, the reference map is used to illustrate the overall spatial distribution of different salt crust types. However, since salt crust distributions are generally natural and may contain small discontinuous regions, the reference map can not provide strict pixel-level correspondence. It can be observed that the proposed method produces salt crust distribution patterns similar to those obtained by other comparative approaches. Pixel-based methods such as RF exhibit noticeable misclassifications in certain regions, often accompanied by severe “salt-and-pepper” noise. In contrast, Swin Transformer performs patch-based classification and alleviates noise interference by leveraging spatial context. CFAT performs well for the cracked salt crust type due to its directional modeling, but shows limited performance for other types. Hybrid CVNet and PolSAR Former, which combine complex CNNs with self-attention, achieve relatively accurate classification results by capturing both local and global information. Although Lightweight DeepLabV3+ attains high overall accuracy, its performance deteriorates in cracked salt crust regions. The results generated by the proposed method exhibit clearer boundaries, significantly reduced block noise, and more coherent and complete land-cover delineations. The classified maps better reflect the actual geomorphological structures, with smoother transitions between fine-grained categories. In comparison, baseline methods such as Swin Transformer still show noticeable misclassifications in several regions, whereas the proposed approach demonstrates superior effectiveness, particularly in multi-class discrimination, boundary preservation, and noise suppression.
From the classification maps, it can be observed that the salt crusts in the Qarhan Salt Lake region exhibit clear zonal patterns and well-defined boundaries, with an overall concentric or semi-concentric distribution extending outward from the lake body. This spatial structure is highly consistent with the unsupervised classification results obtained using the Wishart method and aligns well with geomorphological observations from field surveys. Using the proposed method, the spatial distribution of different salt crust types in the study area was accurately described.
The experimental results on dual-polarimetric data exhibit trends similar to those observed with fully polarimetric data, as listed in Table 3. Traditional methods such as RF achieve relatively lower performance across the three evaluation metrics. Patch-based deep learning methods, including Swin Transformer and CFAT, achieve improved performance by incorporating spatial contextual information, while Hybrid CVNet and PolSAR Former, which integrate CNNs with self-attention mechanisms, further enhance classification accuracy by jointly modeling local features and global contextual dependencies. Lightweight DeepLabV3+ achieves reasonable performance but lacks explicit global modeling capability. In contrast, the proposed method, which combines CAE-based feature reconstruction with attention-based enhancement, consistently delivered the best performance, achieving OA, AA, and Kappa values of 91.92 % , 92.46 % , and 89.65 % , respectively. These results further demonstrate that the proposed framework maintains discriminative power even in dual-polarimetric scenarios. The autoencoder module provides features more favorable for classification and the attention mechanism enhances feature representation quality. Together, these components synergistically improve the classification capability of the Transformer. Consistently, as shown in Figure 12, the proposed method produces classification results with reduced noise interference and improved spatial continuity compared to other approaches, in line with the previous analysis.
A comparison between the classification results from fully polarimetric (Figure 11) and dual-polarimetric data (Figure 12) shows that both datasets can effectively distinguish between different surface types. However, since dual-polarimetric data contain less polarimetric information and generally have lower spatial resolution than fully polarimetric data, they exhibit weaker performance in fine-scale discrimination, manifested in discontinuities along some boundaries and blurred small-scale structures. Nevertheless, in terms of the overall spatial distribution, both datasets demonstrate a high degree of consistency in describing the zonal patterns of salt crust types in the Qarhan Salt Lake region, with the primary differences attributable to temporal variations in acquisition time. This indicates that under data access constraints, dual-polarimetric data still provide strong classification capability, retaining considerable practical value for large-scale surface type mapping.
To compare the computational complexity of the proposed method with the competing approaches, we evaluate the number of parameters (Params) and floating point operations (FLOPs) required for single image-patch inference using fully polarimetric Gaofen-3 data and dual-polarimetric Sentinel-1 data. The corresponding results are summarized in Table 4. PolSAR Former and Hybrid CVNet both rely on 3D convolution for feature extraction, followed by self-attention mechanisms for classification. The combination of convolutional feature extraction and subsequent feature fusion introduces relatively large parameter sizes and computational costs, which also vary noticeably with the input feature dimensionality. The Swin Transformer effectively controls computational complexity through window-based and shifted-window attention mechanisms, while CFAT incorporates directional feature modeling but still adopts global self-attention. Owing to its convolution-based design and the parameter-sharing mechanism in the atrous spatial pyramid pooling module, Lightweight DeepLabV3+ exhibits the lowest parameter count and computational complexity among the compared methods; however, it lacks explicit global modeling capability. Under fully polarimetric data conditions, the proposed method requires 0.9341 M Params and 0.8109 G FLOPs, which are significantly lower than those of PolSAR Former and Hybrid CVNet, and only slightly higher than those of the Swin Transformer. In comparison, the proposed method achieves a balanced trade-off between classification performance and computational complexity.

5. Discussion

5.1. Effect of Each Module

To analyze the contributions of different modules, ViT is used as the baseline method to evaluate the positive impact of each introduced component on classification performance. Firstly, CAE is introduced as the feature extraction module to enhance feature representation capability. This module can effectively capture high-level abstract features during the encoding process, thereby improving the quality of features fed into the Transformer. As listed in Table 5, for both datasets, the integration of the CAE module leads to improved ViT classification performance, with all three evaluation metrics increasing by more than 1 % . To further validate the effect of the CAE module, ablation experiments are conducted after incorporating the FEM. For the fully polarimetric data, compared with the results without CAE, the overall accuracy OA, AA, and Kappa coefficient increase by 2.29 % , 0.83 % , and 2.87 % , respectively. For the dual-polarimetric data, the improvements are 1.91 % , 2.72 % , and 1.17 % , respectively. These results demonstrate that the CAE effectively learns the most discriminative feature combinations among different salt crust types in an unsupervised manner, adaptively capturing the nonlinear relationships among various features.
Furthermore, the effect of the multi-scale FEM is analyzed; this module employs attention mechanisms to adaptively reweight the feature channels and spatial dimensions. For both datasets, adding this module further improves the ViT classification performance. It should be noted that, without the CAE as the feature extractor, the performance gain for the Gaofen-3 fully polarimetric data is limited. This may be attributed to the large number of input features, among which those most relevant for salt crust classification are not effectively selected. To further validate the effect of the multi-scale FEM, ablation experiments are also performed with the CAE included. After incorporating this module, the OA for the Gaofen-3 fully polarimetric and Sentinel-1 dual-polarimetric datasets increases by 1.12% and 1.78%, respectively. In addition, a single-scale FEM with a 3 × 3 receptive field is included for comparison. The higher accuracy of the multi-scale variant demonstrates that the multi-scale FEM enables more discriminative feature reconstruction and achieves a better balance between channel and spatial weighting. Overall, for both datasets, the proposed method consistently achieves significant improvements in OA, AA, and Kappa compared with the baseline. By effectively combining the complementary advantages of the CAE, the attention-based feature enhancement mechanism, and the Transformer encoder, the proposed framework achieves superior performance in salt crust classification.
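To make the attention-based reweighting concrete, the following sketch implements a CBAM-style module with channel attention followed by spatial attention computed at two receptive fields (3 × 3 and 5 × 5). This is only an assumed structure consistent with the description above, not the paper's exact multi-scale FEM; the enhanced patch features would subsequently be tokenized and passed to the ViT encoder for the final classification.

```python
# Illustrative sketch of a multi-scale attention-based feature enhancement
# module (FEM): channel attention, then spatial attention at two receptive
# fields. An assumed CBAM-like structure, not the paper's exact module.
import torch
import torch.nn as nn

class MultiScaleFEM(nn.Module):
    def __init__(self, channels: int = 16, reduction: int = 4):
        super().__init__()
        # Channel attention: squeeze the spatial dimensions, reweight channels.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention over pooled channel statistics at two scales.
        self.spatial_3 = nn.Conv2d(2, 1, kernel_size=3, padding=1)
        self.spatial_5 = nn.Conv2d(2, 1, kernel_size=5, padding=2)

    def forward(self, x):
        b, c, h, w = x.shape
        # Channel weights from global average pooling.
        ca = torch.sigmoid(self.channel_mlp(x.mean(dim=(2, 3)))).view(b, c, 1, 1)
        x = x * ca
        # Spatial weights from channel-wise mean and max maps.
        stats = torch.cat([x.mean(dim=1, keepdim=True),
                           x.max(dim=1, keepdim=True).values], dim=1)
        sa = torch.sigmoid(self.spatial_3(stats) + self.spatial_5(stats))
        return x * sa

fem = MultiScaleFEM(channels=16)
enhanced = fem(torch.randn(8, 16, 15, 15))
print(enhanced.shape)   # torch.Size([8, 16, 15, 15])
```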
As shown in Table 6, the computational complexity of the proposed method is dominated by the CAE and ViT modules. Taking the fully polarimetric data as an example, the ViT alone contains 0.87 M Params, which mainly come from the linear projections and feed-forward networks, while its FLOPs remain relatively small (0.0173 G) because self-attention is applied to only a small number of patch tokens. After introducing the CAE, the FLOPs increase significantly to 0.8044 G, mainly due to the dense convolutions and feature reconstruction operations of the CAE. In the complete model, the additional Params and FLOPs introduced by the multi-scale FEM are limited: the module is designed for lightweight, multi-scale feature enhancement and therefore does not add a substantial computational burden. Although the proposed method incurs a higher computational cost than the ViT alone, this increase is acceptable for salt crust classification, where rapid or real-time processing is not a primary requirement.

5.2. Effect of the Image Patch Size

The image patches serve not only as the input to the CAE but also as the basis for subsequent feature enhancement and ViT classification; their spatial scale therefore largely determines the model's ability to perceive local textures and structural information. If the patches are too small, they may fail to capture sufficient contextual information, leading to incomplete feature representations. Conversely, overly large patches may introduce excessive background information or noise, thereby weakening the discriminative capability of the extracted features. To quantitatively analyze this effect, five patch sizes (9 × 9, 12 × 12, 15 × 15, 18 × 18, and 21 × 21) are selected for comparative experiments. The corresponding classification results are presented in Table 7. For the Gaofen-3 fully polarimetric data, larger patches capture richer spatial context, and the three evaluation metrics show a general upward trend as the patch size increases. In contrast, for the Sentinel-1 dual-polarimetric data, the highest classification accuracy is achieved at a patch size of 15 × 15; beyond this size, both OA and Kappa decrease. This discrepancy is mainly attributable to the difference in spatial resolution between the two datasets. The Sentinel-1 data have relatively lower resolution and fewer discriminative features, so larger patches tend to introduce more irrelevant information, which degrades classification performance. It is also noteworthy that larger patch sizes significantly increase the computational cost of both training and inference. Therefore, balancing classification accuracy and computational efficiency, a patch size of 15 × 15 is adopted in this study.
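The patch-size experiments rely on cutting a fixed-size window centered on each labeled pixel from the stacked feature image. A minimal sketch of this step is given below; the reflection padding at image borders and the variable names are illustrative assumptions.

```python
# Minimal sketch of center-pixel patch extraction: each labeled pixel becomes
# one patch of size patch_size x patch_size cut from the stacked feature image,
# with border pixels handled by reflection padding.
import numpy as np

def extract_patches(features: np.ndarray, coords, patch_size: int = 15):
    """features: (H, W, C) stacked polarimetric features.
    coords: iterable of (row, col) sample locations.
    Returns an array of shape (N, patch_size, patch_size, C)."""
    half = patch_size // 2
    padded = np.pad(features, ((half, half), (half, half), (0, 0)), mode="reflect")
    patches = []
    for r, c in coords:
        patches.append(padded[r:r + patch_size, c:c + patch_size, :])
    return np.stack(patches)

# Example: a 22-channel feature image and three sample pixels.
feat = np.random.rand(200, 300, 22).astype(np.float32)
samples = [(0, 0), (100, 150), (199, 299)]
print(extract_patches(feat, samples, patch_size=15).shape)  # (3, 15, 15, 22)
```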

5.3. Effect of the Input Features

A variety of classification network architectures have been proposed to better exploit the information contained in PolSAR data, each designed for a specific form of initial input features. Although these methods can improve classification accuracy, they often suffer from limited generalization capability, whereas a model that adapts to multiple types of input features offers greater flexibility and robustness. Therefore, this section investigates the sensitivity of the proposed method to different forms of initial input features derived from the Gaofen-3 fully polarimetric data. Four representative types of input features are selected for comparison: the 9-dimensional coherency matrix features [27], the 6-dimensional polarimetric scattering features used in CNN-based methods [26], the 16-dimensional features employed by ViT [35], and the 7-dimensional features selected by Li et al. [46], which exhibit the greatest inter-class differences among salt crust types. Except for the input features, all other network settings are kept identical to ensure a fair evaluation of the impact of different feature representations on classification performance. The corresponding classification results are shown in Table 8. The 9 independent parameters of the coherency matrix lack a corresponding physical interpretation model, which results in the poorest classification performance, whereas the features used in this study are derived from model-based decompositions and thus possess clear physical meaning. Accordingly, the 22-dimensional features used in this study achieve the highest classification accuracy of 96.91%, whereas the original 9-dimensional coherency matrix features yield the lowest accuracy of 93.64%. In addition, a larger number of input features tends to produce better performance, as the CAE is able to nonlinearly derive a feature combination that most effectively represents the entire input set.
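The comparison above amounts to selecting different subsets of per-pixel feature maps, standardizing them, and stacking them into the network input. The sketch below illustrates this step under the assumption that the individual feature maps (Table 1) have already been computed by the decomposition toolchain; the dictionary keys and array sizes are placeholders.

```python
# Illustrative sketch of assembling input feature sets for the sensitivity
# experiment: selected per-feature 2D arrays are standardized per channel and
# stacked into an (H, W, C) input cube. Keys and sizes are placeholders.
import numpy as np

def stack_features(feature_maps: dict, names: list) -> np.ndarray:
    """Stack and standardize the selected feature maps into an (H, W, C) array."""
    channels = []
    for name in names:
        band = feature_maps[name].astype(np.float32)
        band = (band - band.mean()) / (band.std() + 1e-8)   # per-channel z-score
        channels.append(band)
    return np.stack(channels, axis=-1)

# Placeholder feature maps (H x W); in practice these come from the
# polarimetric decompositions listed in Table 1.
h, w = 100, 100
maps = {k: np.random.rand(h, w) for k in
        ["T11", "T22", "T33", "FDD_Odd", "FDD_Dbl", "FDD_Vol",
         "Entropy", "Anisotropy", "Alpha", "Span"]}

diag_3d = stack_features(maps, ["T11", "T22", "T33"])
halpha_4d = stack_features(maps, ["Entropy", "Anisotropy", "Alpha", "Span"])
print(diag_3d.shape, halpha_4d.shape)   # (100, 100, 3) (100, 100, 4)
```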

5.4. Temporal Variations of Salt Crusts

Although the spatial resolution of dual-polarimetric data is relatively low, it remains feasible to analyze temporal variations in salt crust distribution using such data. In particular, Sentinel-1 dual-polarimetric imagery provides extensive temporal coverage and a stable revisit cycle, making it suitable for long-term monitoring. Therefore, seven years of multi-temporal Sentinel-1 imagery, spanning 2019 to 2025, was selected to analyze the spatial distribution changes of the different salt crust types over time, as shown in Figure 13. To avoid potential effects caused by variations in satellite viewing geometry, all Sentinel-1 images used for the temporal analysis were acquired from the same orbit direction and frame, ensuring consistent observation geometry across the different time points.
Based on the proposed classification method, the Sentinel-1 data for each year were pre-processed, features were extracted, and classification was performed, yielding continuous annual maps of the salt crust distribution, as shown in Figure 14. Meanwhile, Figure 15 presents an enlarged view of the central salt crust area near the lake, which more clearly illustrates the gradual temporal variations of the salt crusts. To further investigate the driving factors behind these changes, monthly precipitation and temperature data from January 2019 to June 2025 were collected and compared with the remote sensing results. In general, precipitation dissolves salts within the crust, whereas high temperatures and strong radiation promote evaporation, leading to salt crystallization and crust reformation. Consequently, the spatial distribution of salt crusts exhibits temporal variability.
Overall, only minor spatial changes were observed among the different salt crust types over time. A comparison between the fully polarimetric Gaofen-3 classification map from 4 April 2019 (Figure 11g) and the dual-polarimetric Sentinel-1 classification map from 6 April 2019 (Figure 14a) reveals a high degree of consistency in salt crust distribution, further demonstrating that the proposed classification framework maintains stable performance across different SAR systems. As shown in Figure 16, annual temperature variations in the Qarhan Salt Lake region follow a relatively stable pattern, while precipitation fluctuates significantly due to multiple influencing factors. In 2020 and 2022, temperatures were comparable to those of other years, but precipitation was markedly lower, at only approximately 30 mm. As the residual moisture within the salt crusts evaporated, the surface roughness and undulation increased. Consequently, the areas covered by high-salinity cracked and sharp-edged salt crusts expanded notably in these two years, particularly within the regions highlighted by the black rectangles in Figure 15. Moreover, a slight expansion of sharp-edged crusts can also be observed along their boundaries adjoining the micro-hilly zones. In contrast, precipitation was comparatively higher during 2021 and 2023, and the salt crust distribution remained relatively stable. Before the acquisitions on 1 August 2024 and 4 May 2025, precipitation was again limited, leading to a slight expansion of the areas covered by cracked and sharp-edged salt crusts in the 2024 and 2025 classification results. It should also be noted that in late 2024 and early 2025 the distribution of salt crusts changed significantly due to human activities, particularly in the cracked and near-lake regions. These activities removed the surface salt crust, leaving behind a rough, gravel-like texture. The changes are clearly visible in both the Pauli pseudo-color images and the corresponding classification maps, as indicated by the red ellipse in Figure 15g.
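The temporal trends described above can be quantified by tallying the mapped area of each class in every annual classification result and comparing the resulting time series against the precipitation and temperature records in Figure 16. The sketch below shows one way to do this; the class coding, the 10 m pixel spacing, and the random example maps are assumptions for illustration only.

```python
# Minimal sketch for quantifying temporal change: count per-class pixels in
# each annual classification map and convert the counts to area. Class labels,
# years, and the assumed 10 m x 10 m pixel spacing are illustrative only.
import numpy as np

CLASS_NAMES = {0: "Cracked", 1: "Sharp-edged", 2: "Micro-hilly",
               3: "Flat", 4: "Dune-shadow", 5: "Dune"}
PIXEL_AREA_KM2 = (10 * 10) / 1e6   # assumed pixel spacing after preprocessing

def class_areas(label_map: np.ndarray) -> dict:
    """Return the mapped area (km^2) of each class in one classification map."""
    ids, counts = np.unique(label_map, return_counts=True)
    return {CLASS_NAMES[i]: c * PIXEL_AREA_KM2
            for i, c in zip(ids, counts) if i in CLASS_NAMES}

# Example with random annual maps; real maps would be loaded from the
# classification results for 2019-2025.
rng = np.random.default_rng(0)
annual_maps = {year: rng.integers(0, 6, size=(500, 500)) for year in range(2019, 2026)}
for year, lab in annual_maps.items():
    areas = class_areas(lab)
    print(year, {k: round(v, 2) for k, v in areas.items()})
```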

5.5. Future Work

This study conducted a comprehensive analysis of the polarimetric scattering characteristics of different salt crust types and proposed a feature extraction and classification framework tailored for salt crust discrimination. The proposed method enabled an initial qualitative characterization of the spatial distribution and temporal variation patterns of salt crusts. Since fully polarimetric data provide higher resolution and richer polarimetric information than dual-polarimetric data, they offer a more accurate representation of salt crust distribution; future work will therefore further investigate the temporal evolution of salt crusts using fully polarimetric observations. Moreover, subsequent studies will focus on quantifying the relationship between salt crust surface roughness and polarimetric SAR parameters and on exploring how multiple surface properties influence the scattering responses. Building on this foundation, additional in situ measurements, such as moisture content and surface roughness, will be incorporated to establish a polarimetric SAR parameter inversion model suited to high-salinity soil environments, thereby improving the accuracy and adaptability of salt crust classification and surface parameter estimation.

6. Conclusions

This paper focused on the micro-geomorphological characteristics of the Qarhan Salt Lake region, combining field investigations and polarimetric scattering analyses to conduct a detailed classification of surface types. First, the polarimetric scattering behaviors of typical salt crusts and other surface types were qualitatively analyzed. Based on these analyses, the H/α-Wishart unsupervised clustering method, together with field survey data, was employed to select stable and representative training samples, providing reliable support for supervised classification. Furthermore, a classification framework integrating the CAE, the attention-based FEM, and the ViT was proposed. In this framework, the CAE preserves the spatial structure of image patches while extracting informative features, the FEM enhances their discriminative representation, and the ViT performs global modeling and the final decision based on the encoded representations. Experimental results on both Gaofen-3 fully polarimetric and Sentinel-1 dual-polarimetric SAR datasets demonstrate that the proposed method achieves high classification accuracy and strong generalization capability in salt crust type discrimination. Furthermore, the spatiotemporal evolution of salt crust distributions was analyzed using Sentinel-1 time-series data from 2019 to 2025. Combined with contemporaneous precipitation and temperature records, this analysis provides insight into the environmental factors potentially influencing salt crust evolution. Future work will focus on quantifying the relationship between salt crust surface roughness and polarimetric SAR scattering parameters, exploring the mechanisms by which surface parameters affect scattering responses, and developing a parameter inversion model suitable for high-salinity environments.

Author Contributions

Conceptualization, Q.Y. (Qiang Yin); methodology, F.D.; software, F.D.; validation, Q.Y. (Qiang Yin) and W.H.; formal analysis, Q.Y. (Qiang Yin); investigation, F.D., Q.Y. (Qiang Yin), Q.Y. (Qunxiong Yan) and J.Z.; data curation, F.D.; writing—original draft preparation, F.D.; writing—review and editing, Q.Y. (Qiang Yin) and W.H.; supervision, W.H.; project administration, W.H.; funding acquisition, W.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant No. 62331026 and the Natural Science Foundation of Shandong Province under Grant No. ZR2024ZD19.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

Authors Juan Zhang and Qunxiong Yan were employed by the company Qinghai Yanhu Industry Company Limited, Qinghai 816099, China. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Zhang, F.; Meng, F.; Ma, F.; Yin, Q.; Zhou, Y. Time correlation entropy: A novel multitemporal PolSAR feature and its application in salt lake classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4506814.
2. Cai, N.; Wang, W.; Xiao, G.; Yang, Z.; Zhu, H.; Wang, X. Geochemical Characteristics and Origin of Heavy Metals and Dispersed Elements in Qarhan Salt Lake Brine. Water 2025, 17, 1927.
3. Hui, R.; Tan, H.; Li, X.; Wang, B. Variation of soil physical-chemical characteristics in salt-affected soil in the Qarhan Salt Lake, Qaidam Basin. J. Arid Land 2022, 14, 341–355.
4. Hong, G.; Wang, S.; Li, J.; Huang, J. Fully polarimetric synthetic aperture radar (SAR) processing for crop type identification. Photogramm. Eng. Remote Sens. 2015, 81, 109–117.
5. Ni, J.; Zhang, F.; Ma, F.; Yin, Q.; Xiang, D. Random region matting for the high-resolution PolSAR image semantic segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 3040–3051.
6. Hu, J.; Zhu, K.; Fu, H.; Liu, J.; Wang, C.; Gui, R. Isolating orbital error from multitemporal InSAR derived tectonic deformation based on wavelet and independent component analysis. IEEE Geosci. Remote Sens. Lett. 2022, 19, 4510705.
7. He, L.; He, X.; Hui, F.; Ye, Y.; Zhang, T.; Cheng, X. Investigation of polarimetric decomposition for Arctic summer sea ice classification using Gaofen-3 fully polarimetric SAR data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 3904–3915.
8. Gao, Z.; Gong, H.; Zhou, X.; Shao, Y.; Yuan, M.; Wang, L. Study on the polarimetric characteristics of the Lop Nur arid area using PolSAR data. J. Appl. Remote Sens. 2014, 8, 083681.
9. Zhang, T.; Shao, Y.; Gong, H.; Li, L.; Wang, L. Salt content distribution and paleoclimatic significance of the Lop Nur “Ear” feature: Results from analysis of EO-1 Hyperion imagery. Remote Sens. 2014, 6, 7783–7799.
10. Qi, Z.; Yeh, A.G.O.; Li, X.; Xian, S.; Zhang, X. Monthly short-term detection of land development using RADARSAT-2 polarimetric SAR imagery. Remote Sens. Environ. 2015, 164, 179–196.
11. Ni, J.; López-Martínez, C.; Hu, Z.; Zhang, F. Multitemporal SAR and polarimetric SAR optimization and classification: Reinterpreting temporal coherence. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5236617.
12. Li, Z.C.; Li, H.C.; Gao, G.; Hong, W.; Emery, W.J. Unsupervised classification for multilook polarimetric SAR images via double Dirichlet process mixture model. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5208016.
13. Conradsen, K.; Nielsen, A.A.; Schou, J.; Skriver, H. A test statistic in the complex Wishart distribution and its application to change detection in polarimetric SAR data. IEEE Trans. Geosci. Remote Sens. 2003, 41, 4–19.
14. Freeman, A.; Durden, S. A three-component scattering model for polarimetric SAR data. IEEE Trans. Geosci. Remote Sens. 1998, 36, 963–973.
15. Lopez-Martinez, C.; Pottier, E.; Cloude, S.R. Statistical assessment of eigenvector-based target decomposition theorems in radar polarimetry. IEEE Trans. Geosci. Remote Sens. 2005, 43, 2058–2074.
16. van Zyl, J.J.; Arii, M.; Kim, Y. Model-Based Decomposition of Polarimetric SAR Covariance Matrices Constrained for Nonnegative Eigenvalues. IEEE Trans. Geosci. Remote Sens. 2011, 49, 3452–3459.
17. Yamaguchi, Y.; Sato, A.; Boerner, W.M.; Sato, R.; Yamada, H. Four-Component Scattering Power Decomposition With Rotation of Coherency Matrix. IEEE Trans. Geosci. Remote Sens. 2011, 49, 2251–2258.
18. Lee, J.S.; Grunes, M.R.; Kwok, R. Classification of multi-look polarimetric SAR imagery based on complex Wishart distribution. Int. J. Remote Sens. 1994, 15, 2299–2311.
19. Qin, X.; Zhang, Y.; Li, Y.; Cheng, Y.; Yu, W.; Wang, P.; Zou, H. Distance measures of polarimetric SAR image data: A survey. Remote Sens. 2022, 14, 5873.
20. Liu, C.; Li, Z.; Huang, L.; Zhang, P.; Wu, Z.; Zhou, J.; Tang, Z.; Li, G. Identifying wet and dry snow with dual-polarized C-band SAR data based on Markov random field model. IEEE Geosci. Remote Sens. Lett. 2023, 20, 2000305.
21. Lu, Y.; Zhang, B.; Perrie, W. Arctic sea ice and open water classification from spaceborne fully polarimetric synthetic aperture radar. IEEE Trans. Geosci. Remote Sens. 2023, 61, 4203713.
22. Yin, Q.; Hong, W.; Zhang, F.; Pottier, E. Optimal combination of polarimetric features for vegetation classification in PolSAR image. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 3919–3931.
23. Chen, S.W.; Tao, C.S. PolSAR image classification using polarimetric-feature-driven deep convolutional neural network. IEEE Geosci. Remote Sens. Lett. 2018, 15, 627–631.
24. Shi, J.; He, T.; Ji, S.; Nie, M.; Jin, H. CNN-improved superpixel-to-pixel fuzzy graph convolution network for PolSAR image classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 4410118.
25. Geng, J.; Zhang, Y.; Jiang, W. Polarimetric SAR image classification based on hierarchical scattering-spatial interaction transformer. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5205014.
26. Zhou, Y.; Wang, H.; Xu, F.; Jin, Y.Q. Polarimetric SAR image classification using deep convolutional neural networks. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1935–1939.
27. Zhang, Z.; Wang, H.; Xu, F.; Jin, Y.Q. Complex-valued convolutional neural network and its application in polarimetric SAR image classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 7177–7188.
28. Yang, C.; Hou, B.; Ren, B.; Hu, Y.; Jiao, L. CNN-based polarimetric decomposition feature selection for PolSAR image classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8796–8812.
29. Dong, H.; Zhang, L.; Lu, D.; Zou, B. Attention-based polarimetric feature selection convolutional network for PolSAR image classification. IEEE Geosci. Remote Sens. Lett. 2020, 19, 4001705.
30. Xiao, D.; Wang, Z.; Wu, Y.; Gao, X.; Sun, X. Terrain segmentation in polarimetric SAR images using dual-attention fusion network. IEEE Geosci. Remote Sens. Lett. 2020, 19, 4006005.
31. Ding, L.; Zheng, K.; Lin, D.; Chen, Y.; Liu, B.; Li, J.; Bruzzone, L. MP-ResNet: Multipath residual network for the semantic segmentation of high-resolution PolSAR images. IEEE Geosci. Remote Sens. Lett. 2021, 19, 4014205.
32. Bi, H.; Sun, J.; Xu, Z. A graph-based semisupervised deep learning model for PolSAR image classification. IEEE Trans. Geosci. Remote Sens. 2018, 57, 2116–2132.
33. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30.
34. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
35. Dong, H.; Zhang, L.; Zou, B. Exploring vision transformers for polarimetric SAR image classification. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5219715.
36. Wang, H.; Xing, C.; Yin, J.; Yang, J. Land cover classification for polarimetric SAR images based on vision transformer. Remote Sens. 2022, 14, 4656.
37. Ni, J.; Tian, K.; López-Martínez, C.; Zhan, Y.; Lin, X.; Tao, D. HAG-Former: A temporal-polarimetric relationship inference network from local to global. IEEE Geosci. Remote Sens. Lett. 2024, 21, 4018205.
38. Jamali, A.; Roy, S.K.; Bhattacharya, A.; Ghamisi, P. Local window attention transformer for polarimetric SAR image classification. IEEE Geosci. Remote Sens. Lett. 2023, 20, 4004205.
39. Wang, L.; Gui, R.; Hong, H.; Hu, J.; Ma, L.; Shi, Y. A 3-D convolutional vision transformer for PolSAR image classification and change detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 11503–11520.
40. Zhang, J.; Zhang, W.; Zhou, X.; Chu, Q.; Yin, X.; Li, G.; Dai, X.; Hu, S.; Jin, F. CNN and Transformer fusion network for sea ice classification using Gaofen-3 polarimetric SAR images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 18898–18914.
41. Gong, H.; Shao, Y.; Zhang, T.; Liu, L.; Gao, Z. Scattering mechanisms for the “Ear” feature of Lop Nur lake basin. Remote Sens. 2014, 6, 4546–4562.
42. Hajnsek, I.; Pottier, E.; Cloude, S.R. Inversion of surface parameters from polarimetric SAR. IEEE Trans. Geosci. Remote Sens. 2003, 41, 727–744.
43. Cloude, S.R. Eigenvalue parameters for surface roughness studies. In Proceedings of the Polarization: Measurement, Analysis, and Remote Sensing II, Denver, CO, USA, 19–21 July 1999; Volume 3754, pp. 2–13.
44. Allain, S.; Ferro-Famil, L.; Pottier, E. A polarimetric classification from PolSAR data using SERD/DERD parameters. In Proceedings of the 6th European Conference on Synthetic Aperture Radar EUSAR 2006, Dresden, Germany, 16–18 May 2006; p. CD.
45. Liu, C.A.; Gong, H.; Shao, Y.; Yang, Z.; Liu, L.; Geng, Y. Recognition of salt crust types by means of PolSAR to reflect the fluctuation processes of an ancient lake in Lop Nur. Remote Sens. Environ. 2016, 175, 148–157.
46. Li, S.; Lin, Z.; Yin, Q.; Ma, F.; Hong, W. Salt Crust Classification of Qarhan Salt Lake Based on Polarimetric Feature Selection of GF-3 SAR Data. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Athens, Greece, 7–12 July 2024; pp. 10834–10837.
47. Zhang, S.; Cui, L.; Dong, Z.; An, W. A deep learning classification scheme for PolSAR image based on polarimetric features. Remote Sens. 2024, 16, 1676.
48. Liu, C.; Li, Z.; Wu, Z.; Huang, L.; Zhang, P.; Li, G. An unsupervised snow segmentation approach based on dual-polarized scattering mechanism and deep neural network. IEEE Trans. Geosci. Remote Sens. 2023, 61, 4300614.
49. Xie, H.; Wang, S.; Liu, K.; Lin, S.; Hou, B. Multilayer feature learning for polarimetric synthetic radar data classification. In Proceedings of the 2014 IEEE Geoscience and Remote Sensing Symposium, Quebec City, QC, Canada, 13–18 July 2014; pp. 2818–2821.
50. Geng, J.; Fan, J.; Wang, H.; Ma, X.; Li, B.; Chen, F. High-resolution SAR image classification via deep convolutional autoencoders. IEEE Geosci. Remote Sens. Lett. 2015, 12, 2351–2355.
51. Bao, X.; Zhang, R.; He, X.; Shama, A.; Yin, G.; Chen, J.; Zhang, H.; Liu, G. An Integrated Time-Series Relative Soil Moisture Monitoring Method Based on a SAR Backscattering Model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 18, 2137–2156.
52. Xiang, W.; Liu, G.; Zhang, R.; Pirasteh, S.; Wang, X.; Mao, W.; Li, S.; Xie, L. Modeling saline mudflat and aquifer deformation synthesizing environmental and hydrogeological factors using time-series InSAR. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 11134–11147.
53. Song, H.; Fan, Q.; Li, Q.; Chen, T.; Yang, H.; Han, C. Recharge processes limit the resource elements of Qarhan Salt Lake in western China and analogues in the evaporite basins. J. Oceanol. Limnol. 2023, 41, 1226–1242.
54. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
55. Cloude, S. The dual polarization entropy/alpha decomposition: A PALSAR case study. Sci. Appl. SAR Polarim. Polarim. Interferom. 2007, 644, 2.
56. Xie, W.; Jiao, L.; Hou, B.; Ma, W.; Zhao, J.; Zhang, S.; Liu, F. PolSAR image classification via Wishart-AE model or Wishart-CAE model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 3604–3615.
57. Li, S.; Pan, Z.; Hu, Y. Multi-aspect convolutional-transformer network for SAR automatic target recognition. Remote Sens. 2022, 14, 3924.
58. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
59. Cui, X. CFAT: Convolutional Fieldy Attention Transformer for Polarimetric SAR Image Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 26432–26445.
60. Alkhatib, M.Q. PolSAR image classification using a hybrid complex-valued network (HybridCVNet). IEEE Geosci. Remote Sens. Lett. 2024, 21, 4017705.
61. Shi, J.; Ji, S.; Jin, H.; Zhang, Y.; Gong, M.; Lin, W. Multi-Feature Lightweight DeeplabV3+ Network for Polarimetric SAR Image Classification with Attention Mechanism. Remote Sens. 2025, 17, 1422.
Figure 1. Geographic location of the Qarhan Salt Lake and routes of multiple field investigations (the base map is derived from Google Earth).
Figure 2. Optical photographs of different surface types. (a) Cracked salt crust. (b) Sharp-edged salt crust. (c) Micro-hilly salt crust. (d) Flat salt crust. (e) Gravel and sand dune areas. (f) Shadow areas.
Figure 3. Flowchart of the proposed method.
Figure 4. Schematic diagram of the attention-based FEM.
Figure 5. Schematic diagram of the ViT module.
Figure 6. Coverage areas of two different experimental datasets (the base map is derived from Google Earth) and their corresponding pseudo-color images. (a) Coverage areas of two datasets. (b) Gaofen-3 Pauli pseudo-color image (Blue: |T11|, Green: |T33|, Red: |T22|). (c) Sentinel-1 pseudo-color image (Blue: |C11|, Green: |C11-2real(C12)+C22|, Red: |C22|).
Figure 7. Comparison of backscattering intensity for different types of salt crusts in two datasets. (a) Gaofen-3. (b) Sentinel-1.
Figure 8. H/α plane density maps of different surface types from Gaofen-3 fully polarimetric data. (a) Cracked salt crust. (b) Sharp-edged salt crust. (c) Micro-hilly salt crust. (d) Flat salt crust. (e) Shadow areas. (f) Gravel and sand dune areas.
Figure 9. H/α plane density maps of different surface types from Sentinel-1 dual-polarimetric data. (a) Cracked salt crust. (b) Sharp-edged salt crust. (c) Micro-hilly salt crust. (d) Flat salt crust. (e) Shadow areas. (f) Gravel and sand dune areas.
Figure 10. Feature images and H/α-Wishart unsupervised classification results for the two datasets. (a) Entropy feature image in Gaofen-3. (b) Alpha feature image in Gaofen-3. (c) H/α-Wishart classification result in Gaofen-3. (d) Entropy feature image in Sentinel-1. (e) Alpha feature image in Sentinel-1. (f) H/α-Wishart classification result in Sentinel-1.
Figure 11. Classification maps obtained by different methods for Gaofen-3 fully polarimetric data. (a) Reference map. (b) RF. (c) Swin Transformer. (d) CFAT. (e) Hybrid CVNet. (f) PolSAR Former. (g) Lightweight DeepLabV3+. (h) The Proposed Method. (i) Legend.
Figure 12. Classification maps obtained by different methods for Sentinel-1 dual-polarimetric data. (a) Reference map. (b) RF. (c) Swin Transformer. (d) CFAT. (e) Hybrid CVNet. (f) PolSAR Former. (g) Lightweight DeepLabV3+. (h) The Proposed Method. (i) Legend.
Figure 13. Pseudo-color images of Sentinel-1 data at different time points (The red ellipses indicate areas affected by human activities). (a) 6 April 2019. (b) 10 August 2020. (c) 10 September 2021. (d) 12 August 2022. (e) 13 October 2023. (f) 4 May 2025.
Figure 14. Surface type distribution maps of Sentinel-1 data at different time points (The black rectangular regions indicate areas with more pronounced changes, while the red ellipses indicate areas affected by human activities). (a) 6 April 2019. (b) 10 August 2020. (c) 10 September 2021. (d) 12 August 2022. (e) 13 October 2023. (f) 1 August 2024. (g) 4 May 2025. (h) Legend.
Figure 15. Locally enlarged surface type distribution maps of Sentinel-1 data at different time points (The black rectangular regions indicate areas with more pronounced changes, while the red ellipses indicate areas affected by human activities). (a) 6 April 2019. (b) 10 August 2020. (c) 10 September 2021. (d) 12 August 2022. (e) 13 October 2023. (f) 1 August 2024. (g) 4 May 2025. (h) Legend.
Figure 16. Variations in precipitation and temperature in the Qarhan Salt Lake region from January 2019 to June 2025.
Table 1. The extracted initial features for the proposed method.
Data Type | Features | Description | Dimension
Fully polarimetric | T11, T22, T33 | Elements of coherency matrix | 3
Fully polarimetric | FDD_Odd, FDD_Dbl, FDD_Vol | Freeman decomposition | 3
Fully polarimetric | Y4R_Odd, Y4R_Dbl, Y4R_Vol, Y4R_Hlx | Yamaguchi decomposition | 4
Fully polarimetric | AY_Odd, AY_Dbl, AY_Vol | An & Yang decomposition | 3
Fully polarimetric | VZ_Odd, VZ_Dbl, VZ_Vol | Van Zyl decomposition | 3
Fully polarimetric | Entropy, Anisotropy, Alpha | H/A/alpha decomposition | 3
Fully polarimetric | SERD, DERD | Eigenvalue relative difference | 2
Fully polarimetric | Span | Span of coherency matrix | 1
Dual polarimetric | C11, C22 | Elements of covariance matrix | 2
Dual polarimetric | Entropy, Anisotropy, Alpha | H/A/alpha decomposition | 3
Dual polarimetric | Span | Span of covariance matrix | 1
Odd, Dbl, Vol, and Hlx denote surface, double-bounce, volume, and helical scattering, respectively.
Table 2. Classification accuracies of different methods on Gaofen-3 fully polarimetric data.
Class | RF | Swin Transformer | CFAT | Hybrid CVNet | PolSAR Former | Lightweight DeepLabV3+ | Proposed Method
Cracked | 93.19 | 89.58 | 98.66 | 97.81 | 96.23 | 93.64 | 98.60
Sharp-edged | 83.14 | 72.30 | 84.62 | 88.81 | 86.13 | 88.88 | 90.82
Micro-hilly | 92.22 | 98.41 | 95.99 | 96.58 | 97.34 | 97.70 | 98.55
Flat | 81.68 | 95.00 | 89.38 | 90.18 | 92.01 | 95.99 | 94.79
Dune-shadow | 82.56 | 87.80 | 88.43 | 95.37 | 90.46 | 90.22 | 98.06
Dune | 87.08 | 93.42 | 93.26 | 91.81 | 95.35 | 97.67 | 97.67
OA (%) | 88.24 | 91.22 | 93.34 | 94.58 | 93.87 | 94.24 | 96.91
AA (%) | 86.65 | 90.01 | 91.72 | 93.43 | 92.92 | 94.01 | 96.41
Kappa × 100 | 86.01 | 88.88 | 91.55 | 93.13 | 92.23 | 92.71 | 96.08
Table 3. Classification accuracies of different methods on Sentinel-1 dual-polarimetric data.
Class | RF | Swin Transformer | CFAT | Hybrid CVNet | PolSAR Former | Lightweight DeepLabV3+ | Proposed Method
Cracked | 90.04 | 94.73 | 97.40 | 93.15 | 95.60 | 92.21 | 91.92
Sharp-edged | 67.60 | 80.15 | 67.68 | 63.33 | 84.74 | 77.52 | 86.30
Micro-hilly | 88.35 | 92.67 | 97.34 | 96.95 | 93.09 | 95.52 | 90.20
Flat | 68.46 | 84.70 | 84.54 | 88.06 | 79.01 | 85.79 | 88.01
Dune-shadow | 84.05 | 85.54 | 93.01 | 94.23 | 88.21 | 93.88 | 94.15
Dune | 84.74 | 98.95 | 91.58 | 97.89 | 96.58 | 97.37 | 100.00
OA (%) | 80.92 | 88.10 | 90.51 | 90.69 | 88.14 | 90.75 | 91.92
AA (%) | 80.54 | 89.46 | 88.59 | 88.94 | 89.53 | 90.38 | 92.46
Kappa × 100 | 77.43 | 84.82 | 87.77 | 88.03 | 84.80 | 88.12 | 89.65
Table 4. Comparison of computational complexity among different methods.
Metric | RF | CFAT | Hybrid CVNet | Swin Transformer | PolSAR Former | Lightweight DeepLabV3+ | Proposed Method
Params-G (M) | N/A | 1.1940 | 1.2934 | 0.8395 | 3.2610 | 0.2550 | 0.9341
FLOPs-G (G) | N/A | 0.6914 | 4.2216 | 0.5126 | 3.1118 | 0.0715 | 0.8109
Params-S (M) | N/A | 1.1921 | 1.2520 | 0.8374 | 1.7741 | 0.2536 | 0.9157
FLOPs-S (G) | N/A | 0.6906 | 2.1239 | 0.5117 | 0.9254 | 0.0709 | 0.6077
“-G” and “-S” indicate the Params and FLOPs corresponding to Gaofen-3 and Sentinel-1 data, respectively.
Table 5. Classification accuracies of different combinations of sub-modules.
CAE | Single-Scale FEM | Multi-Scale FEM | ViT | Gaofen-3 OA (%) | Gaofen-3 AA (%) | Gaofen-3 Kappa × 100 | Sentinel-1 OA (%) | Sentinel-1 AA (%) | Sentinel-1 Kappa × 100
× | × | × | ✓ | 94.30 | 93.37 | 92.76 | 88.42 | 88.32 | 85.21
✓ | × | × | ✓ | 95.79 | 94.90 | 94.65 | 90.14 | 89.60 | 87.39
× | × | ✓ | ✓ | 94.62 | 95.58 | 93.21 | 90.01 | 89.74 | 88.48
✓ | ✓ | × | ✓ | 96.48 | 95.78 | 95.53 | 91.15 | 91.45 | 88.65
✓ | × | ✓ | ✓ | 96.91 | 96.41 | 96.08 | 91.92 | 92.46 | 89.65
Table 6. Computational complexity of different module combinations.
CAE | Single-Scale FEM | Multi-Scale FEM | ViT | Gaofen-3 Params (M) | Gaofen-3 FLOPs (G) | Sentinel-1 Params (M) | Sentinel-1 FLOPs (G)
× | × | × | ✓ | 0.8678 | 0.0173 | 0.8158 | 0.0164
✓ | × | × | ✓ | 0.9177 | 0.8044 | 0.8992 | 0.6012
× | × | ✓ | ✓ | 0.8960 | 0.0295 | 0.8180 | 0.0173
✓ | ✓ | × | ✓ | 0.9225 | 0.8065 | 0.9040 | 0.6033
✓ | × | ✓ | ✓ | 0.9341 | 0.8109 | 0.9157 | 0.6077
Table 7. Classification accuracies of the proposed method under different patch sizes.
Patch Size | Gaofen-3 OA (%) | Gaofen-3 AA (%) | Gaofen-3 Kappa × 100 | Sentinel-1 OA (%) | Sentinel-1 AA (%) | Sentinel-1 Kappa × 100
9 × 9 | 95.33 | 94.93 | 94.08 | 87.81 | 88.24 | 84.38
12 × 12 | 95.73 | 94.68 | 94.57 | 89.32 | 89.50 | 86.33
15 × 15 | 96.91 | 96.41 | 96.08 | 91.92 | 92.46 | 89.65
18 × 18 | 97.03 | 96.40 | 96.22 | 91.68 | 91.93 | 89.35
21 × 21 | 97.05 | 96.78 | 96.26 | 91.40 | 92.24 | 89.02
Table 8. Classification accuracies of the proposed method using different input features.
Input Features | OA (%) | AA (%) | Kappa × 100
6-D Features | 95.39 | 94.37 | 94.15
7-D Features | 96.46 | 95.74 | 95.50
9-D Features | 93.64 | 91.10 | 91.91
16-D Features | 96.69 | 95.98 | 95.80
22-D Features | 96.91 | 96.41 | 96.08
