1. Introduction
Polarimetric synthetic aperture radar (PolSAR) is an active radar imaging system that emits and receives electromagnetic waves in multiple polarimetric directions [1]. In comparison to a single-polarimetric SAR system, a fully polarimetric SAR system captures more scattering information from ground objects through four polarimetric modes, which yields a full scattering matrix instead of a single complex-valued channel. The advantages of PolSAR systems have led to their widespread application in various fields, such as military monitoring [2], object detection [3], crop growth prediction [4], and terrain classification [5]. One particular task related to PolSAR is image classification, which assigns a class label to each pixel. This is a fundamental and essential task for further automatic image interpretation. In the past few decades, various PolSAR image classification methods have been proposed, which mainly include traditional scattering mechanism-based methods and more recent deep learning-based methods.
Traditional scattering mechanism-based methods primarily focus on exploiting the scattering features and designing classifiers, and they can be categorized into three main groups. The first category comprises statistical distribution-based methods that leverage the statistical characteristics of PolSAR complex matrix data, such as the Wishart [6,7,8,9], mixed Wishart [10,11,12,13], [14], and Kummer [15] distributions. These methods exploit various non-Gaussian distribution models for heterogeneous PolSAR images. However, estimating the parameters of non-Gaussian models can be a complex task. The second category is the target decomposition-based methods, which extract scattering features from target decomposition to differentiate various terrain objects. Commonly employed methods for target scattering information extraction include Cloude and Pottier decomposition [16,17], Freeman decomposition [18], four-component decomposition [19], the decomposition in [20], and eigenvalue decomposition [21]. These methods are designed to distinguish different objects based on the extracted information. Nevertheless, these pixel-wise methods easily produce classification maps corrupted by speckle noise. To address this issue, some researchers have explored the combination of statistical distributions and scattering features, including the methods in [22,24], K-Wishart [23], and other similar approaches. In these approaches, an initial classification result is obtained by utilizing the scattering features and is then further optimized using a statistical distribution model. However, these methods based on scattering mechanisms tend to overlook high-level semantic information. Additionally, they face challenges in effectively learning the complicated textural structures associated with heterogeneous terrain types, such as buildings and forests.
Recently, deep learning models have achieved remarkable performance in learning high-level semantic features, so they are extensively utilized in the domain of PolSAR image classification. In light of the valuable information contained within the original PolSAR data, numerous deep learning methods have been developed for PolSAR image classification. Deng et al. [21] proposed a deep belief network for PolSAR image classification. Furthermore, Jiao et al. [25] introduced the Wishart deep stacking network for fast PolSAR image classification. Later, Dong et al. [26] applied neural architecture search to PolSAR images, which performed well. In a separate study, Xie et al. [27] developed a semi-supervised recurrent complex-valued convolutional neural network (CNN) model that could effectively learn complex data, thereby improving the classification accuracy. Liu et al. [28] derived an active ensemble deep learning method that incorporated active learning into a deep network. This method significantly reduced the number of training samples required for PolSAR image classification. Additionally, Liu et al. [29] constructed an adaptive graph model to decrease computational complexity and enhance classification performance. Luo et al. [30] proposed a novel approach for multi-temporal PolSAR image classification by combining a stacked auto-encoder network with a CNN model. Ren et al. [31] improved the complex-valued CNN method and proposed a new structure to learn complex features of PolSAR data. These deep learning methods focus on learning polarimetric features and high-level scattering features to enhance classification performance. However, they only utilize the original data, which may lead to the misclassification of extremely heterogeneous terrain objects, such as buildings, forests, and mountains. This is because there are significant scattering and textural structure variations within heterogeneous objects, which make it difficult to extract high-level semantic features using complex matrix learning alone.
Nowadays, multiple scattering feature-based deep learning methods offer many advantages for PolSAR image classification. It is widely recognized that utilizing various target decomposition-based and textural features can greatly improve the accuracy of PolSAR image classification. However, one crucial aspect of improving classification performance is feature selection. To address this issue, Yang et al. [32] proposed a CNN-based polarimetric feature selection model. This model incorporated the Kullback–Leibler distance to select feature subsets and employed a CNN to identify the optimal features that could enhance classification accuracy. Bi et al. [33] proposed a method that combined low-rank feature extraction, a CNN, and a Markov random field (MRF) for classification. Dong et al. [34] introduced an end-to-end feature learning and classification method for PolSAR images. In these approaches, high-dimensional polarimetric features were directly input into a CNN, allowing the network to learn discriminating representations for classification. Furthermore, Wu et al. [35] proposed a statistical-spatial feature learning network that aimed to jointly learn both statistical and spatial features from PolSAR data while also reducing the speckle noise. Shi et al. [36] presented a multi-feature sparse representation model that enabled learning joint sparse features for classification. In addition, Liang et al. [37] introduced a multi-scale deep feature fusion and covariance pooling manifold network (MFFN-CPMN) for high-resolution SAR image classification. This network combined the benefits of local spatial features and global statistical properties to enhance classification performance. These multi-feature learning methods [35,38] can automatically fuse and select multiple polarimetric and scattering features to improve classification performance. However, they ignore the statistical distribution of the original complex matrix, resulting in the loss of channel correlation.
The aforementioned deep learning methods focus solely on either the original complex matrix data or multiple scattering features. However, these two types of data offer complementary information. Unfortunately, only a few methods are capable of utilizing both types of data simultaneously. This limitation arises from the different structures and distributions of the two types of data, which cannot be directly employed in the same data space. To combine them, Shi et al. [36] proposed a complex matrix and multi-feature joint learning method, which constructed a complex matrix dictionary in the Riemannian space and a multi-feature dictionary in the Euclidean space, and then jointly learned the sparse features for classification. However, this method is unable to effectively learn high-level semantic features, particularly for heterogeneous terrain objects. In this paper, we construct a double-channel convolution network (DCCNN) that aims to effectively learn both the complex matrix and multiple features. Additionally, a unified fusion module is designed to combine them.
Furthermore, deep learning-based methods demonstrate a strong ability to learn semantic features for heterogeneous PolSAR images. However, the utilization of high-level features often leads to the loss of edge details. This phenomenon can be attributed to the fact that two neighboring pixels across an edge have similar high-level semantic features, which are extracted from large-scale contextual information. Therefore, high-level features cannot identify the edge details, which results in edge confusion. To address this issue and mitigate the impact of speckle noise, the MRF [39,40] has emerged as a valuable tool in remote sensing image classification. For example, Song et al. [22] combined the MRF with the WGt mixed model, which could capture both the statistical distribution and contextual information simultaneously. Karachristos et al. [41] proposed a novel method that utilized hidden Markov models and a target decomposition representation to fully exploit the scattering mechanism and enhance classification performance. The traditional MRF with a fixed square neighborhood window is effective in removing speckle noise but tends to blur the edge pixels. This is because, for edge pixels, the neighbors should lie along the edge instead of within a square box. Considering the edge direction, Liu et al. [42] proposed the polarimetric sketch map to describe the edges and structure of PolSAR images. Inspired by the polarimetric sketch map, in this paper, we define an adaptive weighted neighborhood structure for edge pixels. Then, an edge-preserving prior term is designed to optimize the edges with the adaptive weighted neighborhood. Therefore, with an appropriate contextual design, the MRF can refine the edge details: it not only smooths the classification map to reduce speckle, but also preserves edges through a suitable adaptive neighborhood prior term.
To preserve edge details, we combine the proposed DCCNN model with the MRF. By leveraging the strengths of both semantic feature learning and edge preservation, the proposed method aims to achieve accurate and edge-aware classification. Furthermore, we develop an edge-preserving prior term that specifically addresses the issue of blurred edges. The main contributions of our proposed method can be summarized in three aspects, as follows:
- (1) In contrast to traditional deep learning networks that take either the complex matrix or multiple features as the input, our method presents a novel double-channel CNN (DCCNN) that jointly learns both complex matrix and multi-feature information. By designing Wishart and multi-feature subnetworks, the DCCNN model can not only learn pixel-wise complex matrix features, but also extract high-level discriminating features for heterogeneous objects.
- (2) The Wishart-based complex matrix and multi-feature subnetworks are integrated into a unified framework, and a weighted fusion module is presented to adaptively emphasize valuable features and suppress useless ones in order to improve the classification performance.
- (3) A novel DCCNN-MRF method is proposed by combining the proposed DCCNN model with an edge-preserving MRF, which can classify heterogeneous objects effectively, as well as revise the edges. In contrast to a conventional square neighborhood, the DCCNN-MRF model uses a sketch-based adaptive weighted neighborhood to construct the prior term and preserve edge details.
The remaining sections of this paper are structured as follows. Related work is introduced in Section 2. Section 3 explains the proposed method in detail. The experimental results and analysis are given in Section 4, and the conclusions are summarized in Section 5.
3. Proposed Method
In this paper, a novel DCCNN-MRF method is proposed for PolSAR image classification, whose framework is illustrated in Figure 2. Firstly, a refined Lee filter [44] is applied to the original PolSAR image to reduce the speckle noise. Then, a double-channel convolution network is developed to jointly learn the complex matrix and multiple features. On the one hand, a Wishart-based convolutional network is designed, which utilizes the complex matrix as the input and defines the Wishart measurement as the first convolution layer. The Wishart convolution network can effectively measure the similarity of complex matrices. Following this initial step, a traditional CNN is employed to learn deeper features. On the other hand, a multi-feature subnetwork is designed to learn various polarimetric scattering features. These features provide supplementary information for the Wishart convolution network. Subsequently, a unified framework is developed to adaptively merge the outputs of the two subnetworks. To accomplish this fusion, multiple convolution layers are employed to effectively combine the two types of features. Secondly, to suppress speckle noise and revise the edges, an MRF model is incorporated with the DCCNN network. This integration also improves the overall classification performance. The data term in the MRF model is defined as the class probability obtained from the DCCNN model, and the prior term is designed using an edge penalty function. The purpose of this edge penalty function is to reduce the edge confusion that may arise from the high-level features of the deep model.
3.1. Double-Channel Convolution Network
In this paper, a DCCNN is proposed to jointly learn the complex matrix and various scattering features from PolSAR data, as shown in Figure 2. The DCCNN network consists of two subnetworks: the Wishart-based complex matrix subnetwork and the multi-feature subnetwork, which learn complex matrix relationships and various polarimetric features, respectively. Then, a unified feature fusion module is designed to combine the different features dynamically, providing a unified framework for integrating complex matrix and multi-feature learning. The incorporation of complementary information further enhances the classification performance.
(1) Wishart-based complex matrix subnetwork
Traditional deep learning methods commonly convert the polarimetric complex matrix into a column vector. However, this conversion loses both the matrix structure and the data distribution of PolSAR data. To effectively capture the characteristics of the complex matrix, a Wishart-based complex matrix network is designed. This network aims to learn the statistical distribution of the PolSAR complex matrix. The first layer in the network architecture is the Wishart convolution layer, which converts the Wishart metric into a linear transformation corresponding to the convolution operation. To be specific, the coherency matrix $T$, which is widely known to follow the Wishart distribution, is measured by the Wishart distance in this layer. For example, the distance between the $j$th pixel $T_j$ and the $i$th class center $W_i$ can be measured by the Wishart distance, defined as

$$d\left(T_j, W_i\right) = \ln \left| W_i \right| + \mathrm{Tr}\left( W_i^{-1} T_j \right),$$

where $\ln(\cdot)$ is the log operation, $\mathrm{Tr}(\cdot)$ is the trace operation of a matrix, and $\left| \cdot \right|$ is the determinant operation of a matrix. However, the Wishart metric is not directly applicable to the convolution network due to its reliance on complex matrices. In [25], Jiao et al. proposed a method to convert the Wishart distance into a linear operation. Firstly, the $T$ matrix is converted into a vector as follows:

$$v = \left[ T_{11},\, T_{22},\, T_{33},\, \mathrm{Re}(T_{12}),\, \mathrm{Im}(T_{12}),\, \mathrm{Re}(T_{13}),\, \mathrm{Im}(T_{13}),\, \mathrm{Re}(T_{23}),\, \mathrm{Im}(T_{23}) \right]^{\mathrm{T}},$$

where $\mathrm{Re}(\cdot)$ and $\mathrm{Im}(\cdot)$ are used to extract the real and imaginary parts of a complex number, respectively. This converts a complex matrix into a real-valued vector, where each element is a real value. Then, the Wishart convolution can be defined as

$$o_i = w \ast v_i + b,$$

where $w$ is the convolution kernel; $v_i$ is the $i$th pixel vector; $b$ is the bias vector defined as $b = \ln \left| W \right|$; and $o_i$ is the output of the Wishart convolution layer. Although this is a linear operation on the vector $v_i$, it is equal to the Wishart distance between pixel $T_i$ and class center $W$.
In addition, to learn the statistical characteristics of complex matrices, we initialize the convolution kernels as the class centers. The Wishart convolution is therefore interpretable, since it learns the distance between each pixel and the class centers, overcoming the non-interpretability of traditional networks. The number of kernels is set equal to the number of classes, and each initial convolution kernel is calculated by averaging the complex matrices of the labeled samples for that class. After the first Wishart convolution layer, the complex matrix of each pixel is transformed into a real-valued response. Subsequently, several CNN convolution layers are utilized to learn contextual high-level features.
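To make this construction concrete, the following minimal PyTorch sketch (illustrative, not the authors' released code) vectorizes a 3×3 coherency matrix and builds a 1×1 convolution whose kernels and biases are initialized from the class centers, so that each output channel equals the Wishart distance to one class center. The names `vectorize_coherency` and `WishartConv` are ours.

```python
import numpy as np
import torch
import torch.nn as nn

def vectorize_coherency(T):
    """Convert a 3x3 Hermitian coherency matrix into the 9-dim real vector
    [T11, T22, T33, Re(T12), Im(T12), Re(T13), Im(T13), Re(T23), Im(T23)]."""
    return np.array([
        T[0, 0].real, T[1, 1].real, T[2, 2].real,
        T[0, 1].real, T[0, 1].imag,
        T[0, 2].real, T[0, 2].imag,
        T[1, 2].real, T[1, 2].imag,
    ], dtype=np.float32)

class WishartConv(nn.Module):
    """1x1 convolution whose weights and biases reproduce the Wishart
    distance d(T, W_i) = ln|W_i| + Tr(W_i^{-1} T) for each class center."""

    def __init__(self, class_centers):
        # class_centers: list of 3x3 complex class-mean coherency matrices
        super().__init__()
        n = len(class_centers)
        self.conv = nn.Conv2d(9, n, kernel_size=1)
        with torch.no_grad():
            for i, W in enumerate(class_centers):
                Winv = np.linalg.inv(W)
                # Tr(W^{-1} T) is linear in the 9 real entries of T; each
                # off-diagonal pair contributes twice, hence the factor 2.
                w = np.array([
                    Winv[0, 0].real, Winv[1, 1].real, Winv[2, 2].real,
                    2 * Winv[0, 1].real, 2 * Winv[0, 1].imag,
                    2 * Winv[0, 2].real, 2 * Winv[0, 2].imag,
                    2 * Winv[1, 2].real, 2 * Winv[1, 2].imag,
                ], dtype=np.float32)
                self.conv.weight[i] = torch.from_numpy(w).view(9, 1, 1)
                # bias = ln|W_i| (determinant of a Hermitian PD matrix is real)
                self.conv.bias[i] = float(np.log(np.linalg.det(W).real))

    def forward(self, x):       # x: (B, 9, H, W) vectorized coherency maps
        return self.conv(x)     # (B, n_classes, H, W) Wishart distances
```

Applied pixel-wise, this layer reproduces the classical Wishart classifier as the network's starting point, which the subsequent CNN layers then refine.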
(2) Multi-feature subnetwork
The Wishart subnetwork is capable of effectively learning the statistical characteristics of the complex matrix. However, in heterogeneous areas, individual complex matrices cannot capture high-level semantic features. This is because the heterogeneous structure causes neighboring pixels to have significantly different scattering matrices, even though they belong to the same class. To learn high-level semantic information in heterogeneous areas, it is necessary to employ multiple features that offer complementary information to the original data. In this paper, a set of 57-dimensional features is extracted, encompassing both the original data and various polarimetric decomposition-based features, including the Cloude, Freeman, and Yamaguchi decompositions. The detailed feature extraction process can be found in [45], as shown in Table 1. The feature vector is defined as $F$, which describes each pixel from several perspectives. Because the ranges of the different features vary greatly, a normalization process is applied first. Subsequently, several convolution layers are applied to learn high-level features.
In addition, the network architecture employs a three-layer convolutional structure to achieve multi-scale feature learning. The convolution kernel size is , and the stride is set to 1. To reduce both the number of parameters and the computational complexity, we select max pooling for down-sampling. This technique maintains the receptive field while reducing the spatial dimensions of the feature maps.
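As an illustration of this branch, the sketch below normalizes the 57-dimensional feature stack and applies three convolution layers with max pooling. The 3×3 kernels and 64-channel width are assumed hyperparameters, since the exact values are not recoverable from the text above.

```python
import torch
import torch.nn as nn

class MultiFeatureNet(nn.Module):
    """Three-layer convolutional branch over the 57-dim scattering features.
    Kernel size and channel widths are illustrative; the text fixes only a
    stride of 1 and max pooling for down-sampling."""

    def __init__(self, in_ch=57, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                      # halve spatial size
            nn.Conv2d(width, width, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(width, width, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, f):
        # f: (B, 57, H, W) stack of normalized polarimetric features
        return self.net(f)

def normalize_features(f):
    """Per-channel min-max normalization to [0, 1] over the whole image,
    since the raw features have widely differing ranges."""
    lo = f.amin(dim=(0, 2, 3), keepdim=True)
    hi = f.amax(dim=(0, 2, 3), keepdim=True)
    return (f - lo) / (hi - lo + 1e-8)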
(3) The proposed DCCNN fusion network
To exploit the benefits of both the complex matrix and the multiple features, a unified framework is designed to fuse the two subnetworks. To be specific, the complex matrix features $X_T$ are extracted from the Wishart subnetwork, and the multi-feature vector $X_F$ is obtained from the multi-feature subnetwork. Then, they are weighted and concatenated to construct the combined feature $X$. Later, several CNN convolution layers are utilized to fuse them. Through multiple convolution layers, all the features are fused to capture global feature information effectively. Adaptive weights are learned so that effective features automatically obtain larger weights and useless features obtain smaller weights. Thus, discriminating features are extracted, and useless features are suppressed. The classification accuracy of the target object can be improved by focusing on useful features. Therefore, the feature transformation of the proposed DCCNN network can be described as

$$X = \omega \odot \left( X_T \oplus X_F \right),$$

where $X_T$ represents the feature extracted from the Wishart subnetwork based on the $T$ matrix, $X_F$ indicates the feature extracted from the multi-feature subnetwork based on the multi-feature $F$, $\oplus$ is the connection operation of $X_T$ and $X_F$, $\odot$ denotes channel-wise weighting, and $\omega$ is the weight vector for the combined features. The combined features are then fed into the DCCNN, which is designed to generate high-level features denoted as $X_H$. Finally, the softmax layer is utilized for classification.
3.2. Combining Edge-Preserving MRF and DCCNN Model
The proposed DCCNN model can effectively learn both the statistical characteristics and multiple features of PolSAR data. The learned high-level semantic features can improve the classification performance, especially for heterogeneous areas. However, as the number of convolution layers increases, the DCCNN model incorporates larger-scale contextual information. While this is beneficial for capturing global patterns and relationships, it poses challenges for edge pixels: the high-level features struggle to distinguish neighboring pixels of different classes across an edge. Consequently, deep learning methods tend to blur edge details when relying on high-level features. In order to learn the contextual relationships of heterogeneous terrain objects and simultaneously identify edge features accurately, we combine the proposed DCCNN network with the MRF to optimize the pixel-level classification results.
The MRF is a widely used probability model that can learn contextual relationships by designing an energy function. The MRF can learn the pixel features effectively, as well as incorporate contextual information. In this paper, we design an edge penalty function to revise the edge pixels and suppress the speckle noise. Within the MRF framework, an energy function is defined, which consists of a data term and a prior term. The data term represents the probability of each pixel belonging to a certain class, while the prior term is the class prior probability. The energy function is defined as

$$E(Y) = \sum_{s} U\left( x_s \mid y_s \right) + \sum_{s} V\left( y_s \right),$$

where $U(x_s \mid y_s)$ is the data term, which stands for the probability of data $x_s$ belonging to class $y_s$ for pixel $s$. In this paper, we define the data term as the probability learned from the DCCNN model, normalized to $[0, 1]$. $V(y_s)$ is the prior term, which is the prior probability of class $y_s$. In the MRF, the spatial contextual relationship is used to learn the prior probability. $N_s$ is the neighboring set of pixel $s$, and $r$ is a neighboring pixel of $s$. When neighboring pixel $r$ has the same class label as pixel $s$, the probability increases; otherwise, it decreases. When none of the neighboring pixels belong to class $y_s$, it indicates that pixel $s$ is likely a noisy point. In such cases, it is advisable to revise the classification of pixel $s$ to match the majority class of its neighboring pixels. In addition, the neighborhood set is essential for the prior term. If pixel $s$ belongs to a non-edge region, a square neighborhood is suitable for suppressing speckle noise. If pixel $s$ is near an edge, its neighbors should be pixels along the edge instead of pixels in a square box. Furthermore, it is not reasonable for all the neighbors to contribute to the pixel with the same probability, especially for edge pixels. Pixels on the same side of the edge are similar to the central pixel and should have a higher probability than completely different pixels across the edge, even though the latter are also close to the central pixel. Neighboring pixels across the edge with a completely different class are unfavorable for estimating the probability of pixel $s$ and can even lead to erroneous estimation.
In this paper, we first define the edge and non-edge regions of a PolSAR image by utilizing the polarimetric sketch map [43]. The polarimetric sketch map is calculated by polarimetric edge detection and sketch pursuit methods. Each sketch segment is characterized by its direction and length. Then, edge regions are extracted by using a geometric structure block to expand a certain width along the sketch segments, such as a five-pixel width. Figure 3 illustrates examples of edge and non-edge regions. Figure 3a shows the PolSAR PauliRGB image. Figure 3b shows the polarimetric sketch map extracted from (a). Figure 3c shows the geometric structure block. By expanding the sketch segments with (c), the edge and non-edge regions are obtained, as shown in Figure 3d. Pixels in white belong to edge regions, while pixels in black belong to non-edge regions. The directions of the edge pixels are assigned as the directions of the corresponding sketch segments.
In addition, we design adaptive neighborhood sets for the edge and non-edge regions. For non-edge regions, a square box is utilized as the neighborhood set. For edge regions, we adopt an adaptive weighted neighborhood window to obtain the adaptive neighbors; that is, the pixels along the edges receive a higher probability than the other pixels. The weight of pixel $r$ with respect to central pixel $s$ is measured by the revised Wishart distance, defined as

$$d\left( C_s, C_r \right) = \frac{1}{2}\left[ \mathrm{Tr}\left( C_s^{-1} C_r \right) + \mathrm{Tr}\left( C_r^{-1} C_s \right) \right] - q,$$

where $C_r$ and $C_s$ are the covariance matrices of the neighboring and central pixels, respectively, and $q$ is the dimension of the covariance matrix. According to the Wishart measurement, the weight of neighboring pixel $r$ with respect to central pixel $s$ is defined as

$$w_{rs} = \exp\left( -d\left( C_s, C_r \right) \right).$$
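The weight computation can be sketched in NumPy as follows. The symmetric form of the revised Wishart distance and the exponential mapping match the definitions above but are reconstructions; `sigma` is a hypothetical scale parameter not present in the original text.

```python
import numpy as np

def revised_wishart_distance(Cs, Cr, q=3):
    """Symmetric revised Wishart dissimilarity between two covariance
    matrices: d = 0.5 * (Tr(Cs^{-1} Cr) + Tr(Cr^{-1} Cs)) - q.
    np.linalg.solve(A, B) computes A^{-1} B without an explicit inverse."""
    return 0.5 * (np.trace(np.linalg.solve(Cs, Cr)).real
                  + np.trace(np.linalg.solve(Cr, Cs)).real) - q

def neighbor_weight(Cs, Cr, sigma=1.0):
    """Map the distance to a (0, 1] weight: similar pixels along the edge
    get weights near 1, pixels across the edge decay toward 0."""
    return np.exp(-revised_wishart_distance(Cs, Cr) / sigma)
```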
The adaptive weighted neighboring window is shown in Figure 4. Figure 4a shows the Pauli RGB subimage of the Xi'an area, in which pixel A lies in a non-edge region, while pixels B and C belong to edge regions. Figure 4b shows the class label map of (a). We select a square neighborhood window for pixel A in the non-edge region, as shown in Figure 4c. Figure 4d,e depict the adaptive weighted neighbors for points B and C, respectively. For edge pixels, varying weights are assigned to the neighboring pixels. It is evident that the neighborhood pixels are always located along the edges. The black pixels that are distant from the center pixel no longer qualify as neighborhood pixels. Furthermore, neighborhood pixels with lighter colors are assigned higher weights, while pixels with darker colors have lower weights. From Figure 4d,e, we can see that pixels on the same side of the edge have higher weights than those on the other side, which avoids the confusion of neighboring pixels across the edge.
According to the adaptive weighted neighborhood, we develop an edge-preserving prior term that integrates the contextual relationship while minimizing the impact of neighboring pixels that cross the edge. The prior term is built as follows:

$$V\left( y_s \right) = \beta \sum_{r \in N_s} w_{rs} \left( 1 - \delta\left( y_s, y_r \right) \right),$$

where $\beta$ is the balance factor between the data and prior terms; $y_s$ and $y_r$ are the class labels of pixels $s$ and $r$, respectively; $w_{rs}$ is the neighborhood weight of pixel $r$ with respect to central pixel $s$; and $\delta(y_s, y_r)$ is the Kronecker delta function, defined as

$$\delta\left( y_s, y_r \right) = \begin{cases} 1, & y_s = y_r, \\ 0, & \text{otherwise}, \end{cases}$$

where $\delta$ takes a value of 1 when $y_s$ and $y_r$ are equal, and 0 otherwise. It describes the class relationship between the central pixel and its neighboring pixels. After MRF optimization, the proposed method can obtain a final classification map with both better region homogeneity in heterogeneous regions and well-preserved edges.
A flowchart of the proposed DCCNN-MRF method is presented in Figure 5. Firstly, the refined Lee filter is applied to reduce speckle noise. Secondly, a Wishart complex matrix subnetwork is designed to learn complex matrix features, and a multi-feature subnetwork is developed to learn multiple scattering features. Thirdly, the two kinds of features are weight-fused to select discriminating features that enhance classification performance. Fourthly, to address the issue of edge confusion, a sketch map is extracted from the PolSAR image, and an adaptive weighted neighborhood window is constructed to design an edge-preserving MRF prior term. Finally, the proposed DCCNN-MRF method combines the data term from the DCCNN model and the edge-preserving prior term, which can classify heterogeneous objects into homogeneous regions, as well as preserve edge details. The procedure of the proposed DCCNN-MRF algorithm is described in Algorithm 1.
Algorithm 1 Procedure of the proposed DCCNN-MRF method
Input: PolSAR original data S, the training class label map, balance factor $\beta$, and class number C.
Step 1: Apply a refined Lee filter to the PolSAR data to obtain the filtered coherency matrix T.
Step 2: Extract multiple scattering features F from the PolSAR image based on Table 1.
Step 3: Learn the complex matrix features $X_T$ from coherency matrix T using the Wishart subnetwork.
Step 4: Learn the high-level features $X_F$ from the multiple features F by the multi-feature subnetwork.
Step 5: Weight-fuse $X_T$ and $X_F$ in the DCCNN model and learn the fused feature $X_H$.
Step 6: Obtain the class probability P and the estimated class label map Y from the DCCNN model.
Step 7: Obtain the sketch map of the PolSAR image and compute the adaptive weighted neighbors for edge pixels by Equation (9).
Step 8: Optimize the estimated class label map Y using Equation (7) according to the edge-preserving MRF model.
Output: class label estimation map Y.
5. Conclusions
In this paper, a novel DCCNN-MRF method was proposed for PolSAR image classification, combining a double-channel convolution network and an edge-preserving MRF to improve classification performance. Firstly, a novel DCCNN was developed, which consisted of a Wishart-based complex matrix subnetwork and a multi-feature subnetwork. The Wishart-based complex matrix subnetwork was designed to learn the statistical characteristics of the original data, while the multi-feature subnetwork was designed to learn higher-level scattering features, especially for extremely heterogeneous areas. Then, a unified framework was presented to combine the two subnetworks and fuse the advantageous features of both. Finally, the DCCNN model was combined with an edge-preserving MRF to alleviate the issue of edge confusion caused by the deep network. In this model, an adaptive weighted neighborhood prior term was developed to optimize the edges. Experiments were conducted on four real PolSAR datasets, and quantitative evaluation indicators were calculated, including the OA, AA, and Kappa coefficient. The experiments showed that the proposed method could obtain both higher classification accuracy and better visual appearance compared with related methods. Our findings demonstrated that the proposed method could not only obtain homogeneous classification results for heterogeneous terrain objects, but also preserve edge details well.
Further work should focus on how to generate more training samples. Obtaining ground-truth data for PolSAR images is challenging, and the proposed method currently requires a relatively high percentage of training samples (10%). To address the issue of limited labeled samples, various techniques can be employed to augment the sample size. One such approach is the utilization of generative adversarial networks (GANs) to generate additional samples. In addition, a feature selection mechanism could be exploited to better fuse the complex matrix and multi-feature information.