SAR Image Classification Using Fully Connected Conditional Random Fields Combined with Deep Learning and Superpixel Boundary Constraint

Abstract: As one of the most important active remote sensing technologies, synthetic aperture radar (SAR) offers the advantages of all-day, all-weather operation and strong penetration capability. Due to its unique electromagnetic spectrum and imaging mechanism, it has considerably expanded the dimensions of remote sensing data. As a fundamental task in microwave remote sensing research, SAR image classification has proven valuable in many remote sensing applications. Many widely used SAR image classification algorithms rely on the combination of hand-designed features and machine learning classifiers, which still face unresolved issues, including suboptimal feature representation, confusion caused by speckle noise, limited applicability, and so on. To mitigate some of these issues and to improve pattern recognition in high-resolution SAR images, a ConvCRF model combined with a superpixel boundary constraint is developed. The proposed algorithm successfully combines the local and global advantages of fully connected conditional random fields and deep models. An optimization strategy applying a superpixel boundary constraint during the inference iterations preserves structural details more efficiently. The experimental results demonstrate that the proposed method provides competitive advantages over other widely used models. In land cover classification experiments on the MSTAR, E-SAR, and GF-3 datasets, the overall accuracy of our proposed method reaches 90.18 ± 0.37%, 91.63 ± 0.27%, and 90.91 ± 0.31%, respectively. Regarding the issues of SAR image classification, a novel integrated model combining local and global image features has practical implications.


Introduction
As one of the most important active remote sensors, synthetic aperture radar (SAR) is able to provide effective datasets for geoscience and earth observation. Compared with optical satellite sensors, SAR systems are almost unaffected by variation in atmospheric opacity at the microwave band [1,2]. As a stable data source unaffected by weather and time of day, SAR datasets have been widely applied in many fields, including marine environmental monitoring [3][4][5], disaster emergency response [6,7], land cover mapping [8][9][10], and precision agriculture [11,12]. In recent years, with the sharp increase in data volume, more and more big data tasks have emerged in remote sensing applications, and many single-method classifiers encounter a bottleneck in these cases. Ensemble learning theory was proposed to deal with such complex data analyses. A semantic image segmentation approach combining a fully connected CNN with conditional random field (CRF) postprocessing was proposed to improve the accuracy of large-scale natural image classification [30]. The combination of a recurrent neural network (RNN) and a CRF was also applied to scene classification of natural images [31]. Compared with CNNs, RNNs are better at processing datasets with sequential structure. In SAR or hyperspectral remote sensing applications, however, the scarcity of labeled images makes it difficult to fully train such very deep models. Nevertheless, this kind of ensemble learning still provides much inspiration for advancing intelligent remote sensing.
To take better advantage of contextual information, some studies have adopted a patch-based strategy to perform land cover mapping [32,33], especially for deep learning algorithms [16,18,34]. In the patch-based strategy, each center pixel together with a fixed-size neighborhood block forms an image patch, which subsequently serves as the processing unit in the training and classifying steps. The biggest advantage of the patch-based method is that classification can follow the traditional supervised pipeline, which introduces hand-crafted regions of interest (ROIs) as training samples. Compared with supervised semantic segmentation, the patch-based method provides more efficient and convenient access to training datasets with clean labels, especially for limited SAR data. Nevertheless, the patch-based method also introduces some problems that remain to be solved. Firstly, the classification accuracy depends, to a large extent, on the homogeneity of the involved image patches [32]. In reality, however, a SAR image also covers heterogeneous mixtures of multiple land cover types, which poses challenges for the choice of patch size and label assignment. Oversized patches introduce many confusing areas, while small patches are susceptible to speckle noise instead. Secondly, the patch-based method has poor edge-preserving ability. As a consequence, the edges of adjacent regions with different land types suffer from slight distortion, and the zigzag and mosaic effects distributed along boundary regions negatively impact the overall classification accuracy.
The probabilistic graphical model (PGM) is one of the research hotspots in the field of SAR image machine learning and pattern recognition [35,36]. As a typical representative of PGMs, the conditional random field (CRF) model transforms the classification task into maximum a posteriori (MAP) inference.
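The patch extraction step described above can be sketched as follows; the patch size and border-padding mode are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def extract_patch(image, row, col, patch_size=15):
    """Extract a fixed-size patch centered on (row, col), padding at borders."""
    half = patch_size // 2
    padded = np.pad(image, ((half, half), (half, half)), mode="reflect")
    return padded[row:row + patch_size, col:col + patch_size]

img = np.arange(100.0).reshape(10, 10)
patch = extract_patch(img, 0, 0, patch_size=5)  # corner pixel, reflect-padded
```

Each patch then receives the label of its center pixel, so the usual supervised ROI pipeline applies unchanged.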
Notably, the long-range connectivity and multiscale potential functions give the CRF model advantages in contextual feature representation [37], which can efficiently enhance its edge-preserving and recognition abilities. Compared with other methods, the CRF model not only focuses on local spatial information but also extends to global contextual connectivity. Given this superiority, the disadvantages of patch-based classification methods are expected to be dramatically suppressed. Although the CRF model holds great promise for advancing SAR image classification, the excessive smoothing derived from the pairwise penalty factor often causes micro-regions to disappear.
Based on the merits and shortcomings of these different classification methods, we propose an integrated model, ConvCRF, that combines the CRF and CNN models with a superpixel boundary constraint (SBC). It aims to maximize the effectiveness of multiple algorithms and advance intelligent SAR image classification. This paper makes three original contributions:
(1) The ConvCRF model is proposed to combine fully connected CRF MAP modeling with convolutional representation layers. The proposed model adopts dual Gaussian kernels to build high-order potential functions, which consider both local backscattering and long-range connectivity. Compared with classical SAR image classification methods, the proposed method realizes comprehensive feature representation with global and local feature information.
(2) A patch-based CNN algorithm is used for the unary potential to build the preliminary labeling condition. This method provides a convenient approach to acquire initial high-precision PGM with the widely used supervised pipeline. The patch-based mechanism introduces contextual locality using square neighborhood information for the central pixel, which considerably decreases the negative impact of speckle noise in high-resolution SAR images.
(3) A superpixel boundary constraint mechanism is adopted to improve the process of PGM inference. To overcome excessive smoothing and protect micro-regions, superpixel boundaries derived from the graph-cut algorithm are used as neighborhood constraints to average the random field probabilities in each iteration. This strategy also improves the edge-preserving performance.
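The boundary-constraint step of contribution (3) can be illustrated with a minimal sketch, assuming a precomputed superpixel label map (e.g., from a graph-cut segmentation) and an (H, W, L) array of class marginals; all names here are illustrative:

```python
import numpy as np

def superpixel_constrain(Q, superpixels):
    """Average class marginals within each superpixel region.

    Q           : (H, W, L) array of per-pixel class probabilities.
    superpixels : (H, W) integer label map from a segmentation step.
    """
    out = np.empty_like(Q)
    for sp in np.unique(superpixels):
        mask = superpixels == sp
        out[mask] = Q[mask].mean(axis=0)  # one averaged vector per region
    return out

rng = np.random.default_rng(0)
Q = rng.random((4, 4, 3))
Q /= Q.sum(axis=-1, keepdims=True)        # normalized marginals
sp = np.zeros((4, 4), dtype=int)
sp[:, 2:] = 1                             # two toy superpixels
Qc = superpixel_constrain(Q, sp)
```

Because averaging preserves normalization, the constrained marginals remain valid probability vectors and can be fed directly into the next inference iteration.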
The remainder of this paper is organized as follows: In Section 2, the datasets are described in detail. The core idea and overall architecture of the proposed algorithm are presented in Section 3. Section 4 describes the experimental results. The analysis of hyper-parameters is discussed in Section 5. The last section draws the conclusions and proposes future work.

Overview of the Experimental SAR Data
In our experiments, four different SAR images were adopted after a series of standardized corrections and calibrations. The whole dataset included two single-polarization images and two full-polarization images. The first X-band airborne SAR image was obtained from the Moving and Stationary Target Recognition (MSTAR) dataset [38], shown in Figure 1a. It is a public cluster scene image with a size of 1076 × 1058 pixels and 0.3-m resolution. It covers a field region located in Athens, AL, USA. As shown in Figure 1b, the second SAR image is a polarimetric SAR data from the L-band E-SAR airborne system. It is a Pauli-basis decomposition map of the polarized scattering matrix which can be parameterized by [|S HH +S VV | 2 , |S HH -S VV | 2 , 2|S HV | 2 ]. The image size of the selected region is 873 × 1159 pixels with a 3-m resolution. It covers a part of the airfield region located in Oberpfafenhofen, Germany. In Figure 1c, the third dataset is a C-band spaceborne SAR image, which was collected by the GF-3 satellite. These full-polarization data were from Quad-Polarization Strip I (QPSI) imaging mode with 8-m resolution. It is also a Pauli-basis decomposition map parameterized by [|S HH +S VV | 2 , |S HH -S VV | 2 , 2|S HV | 2 ], which was derived from the polarized scattering matrix. It covers a lakeshore area situated in Suzhou, China. The spaceborne SAR image size is 2376 × 2040 pixels.
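The Pauli-basis map used for both full-polarization datasets can be sketched directly from the parameterization above; array names are assumptions:

```python
import numpy as np

def pauli_rgb(S_hh, S_hv, S_vv):
    """Pauli-basis decomposition [|S_HH+S_VV|^2, |S_HH-S_VV|^2, 2|S_HV|^2]."""
    return np.stack([
        np.abs(S_hh + S_vv) ** 2,   # surface (single-bounce) channel
        np.abs(S_hh - S_vv) ** 2,   # dihedral (double-bounce) channel
        2.0 * np.abs(S_hv) ** 2,    # volume (cross-pol) channel
    ], axis=-1)

# Toy single-pixel scattering matrix entries
S_hh = np.array([1 + 1j])
S_vv = np.array([1 - 1j])
S_hv = np.array([0.5j])
p = pauli_rgb(S_hh, S_hv, S_vv)
```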

Table 1 presents different land cover patches showing significant differences in intensity, texture, structure, etc. Especially for the polarimetric SAR images, single scattering, dihedral scattering, and inclined dihedral scattering introduce huge differences to the feature space. In the VHR (very high resolution) airborne SAR image, shrubs and trees have their own distinctive texture features; the regular stripes make the shrub easy to recognize. For full-polarization SAR images, polarimetric decomposition provides significant benefits for classifying different land cover types.
The building area shows a series of linear stripes with a mix of pink and silver colors formed by different heights of inclined dihedral scattering with a certain rotation. The water area is composed of single scattering with a speckled blue color. Other land types also present significant differences. The training samples of the three given datasets are presented in Table 2. For deep models, an imbalance in the quantity of different land types leads to poor generalization performance and overfitting. To avoid these issues, random elimination was used to ensure the overall balance of training samples. In each training period, the datasets were randomly divided into training and validation sets according to a 4:1 ratio. The test set for each experimental region is the whole SAR image apart from the selected training and validation sets.
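The random-elimination balancing and the 4:1 split described above might be implemented along these lines; function and variable names are illustrative:

```python
import numpy as np

def balance_and_split(samples, labels, seed=None):
    """Randomly drop excess samples per class, then split 4:1 (train/val)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    n_keep = counts.min()                      # equalize all class sizes
    keep = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), n_keep, replace=False)
        for c in classes
    ])
    rng.shuffle(keep)
    n_train = int(0.8 * keep.size)             # 4:1 train/validation ratio
    return (samples[keep[:n_train]], labels[keep[:n_train]],
            samples[keep[n_train:]], labels[keep[n_train:]])

X = np.arange(30, dtype=float).reshape(30, 1)
y = np.array([0] * 20 + [1] * 10)              # imbalanced toy labels
Xtr, ytr, Xva, yva = balance_and_split(X, y, seed=0)
```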

Overall Pipeline for SAR Image Classification
The overall algorithm framework for SAR image classification is presented in this section. The proposed classification algorithm is based on an improved, fully connected CRF model, which mainly consists of a unary potential function, a pairwise potential function, superpixel region constraints, model inference, and parameter optimization. In our proposed approach, the unary potential function is defined by the CNN model, which realizes patch-based pre-classification to model the single-valued dependency relationship between the observation field and the label field. A linear combination of Gaussian kernels determines the pairwise potential function, which enables the efficient use of contextual information in sophisticated SAR images. Furthermore, to mitigate the over-smoothing effect, superpixel region constraints are integrated into the conventional CRF model. Finally, the mean field approximation inference (MFAI) algorithm is introduced to enable rapid model solving. The overall framework for SAR image classification is depicted in Figure 2.


Fully Connected Conditional Random Field Model
The conditional random field is a probabilistic graphical model that directly models the posterior probability of the label field given the observation field. Considering the neighborhood structure of the CRF pairwise potential, the connection types include 4-adjacent, 8-adjacent, and full connection. Compared with a partially connected CRF, the fully connected CRF performs better in the synthesis of global contextual information, completely perceiving shape, texture, and intensity. According to the Hammersley-Clifford theorem, the posterior probability of the CRF model is defined as a Gibbs distribution:

P(Y|X) = (1/Z(X)) exp( -Σ_{c∈C} φ_c(y_c|X) )

where X is the observation field; Y is the label field; φ_c is a potential function defined on the maximum clique c; C is the set of maximum cliques defined in the graph of the label field; and Z(X) is the normalization term. The CRF model aims to infer the MAP labeling y* for the observed values, which can be represented as follows:

y* = argmax_Y P(Y|X)

The Gibbs energy for a labeling is computed as follows:

E(Y|X) = Σ_{c∈C} φ_c(y_c|X)

For the second-order fully connected CRF model, the Gibbs energy consists of a unary potential and a pairwise potential. In this case, the posterior probability of the CRF model can be given by the following:

P(Y|X) = (1/Z(X)) exp( -Σ_i φ_µ(y_i|X) - Σ_{i<j} φ_p(y_i, y_j|X) )

where Σ_i φ_µ(y_i|X) is the unary potential, which represents the per-pixel inference between the observation field and the label field; it is obtained by a pattern recognition classifier applied to each pixel. Σ_{i<j} φ_p(y_i, y_j|X) is the pairwise potential, which describes the correlation among different pixels. The pairwise potential is defined as a linear combination of Gaussian kernels:

φ_p(y_i, y_j|X) = µ(y_i, y_j) Σ_{m=1}^{M} w^(m) k^(m)(v_i, v_j)

where µ(y_i, y_j) is the compatibility function, a penalty coefficient for adjacent similar pixels with different labels, and w^(m) is the linear combination weight for the Gaussian kernel k^(m)(v_i, v_j).
For SAR image classification, discriminative texture features usually involve the synthesis of multiple intensities and positions [14]. Especially for polarized SAR data, the operation of polarimetric decomposition introduces multidimensional scattering intensity vectors. Therefore, the pairwise potential is composed of dual Gaussian kernels, which focus on nearby similar regions and speckle regions, respectively. The two-part Gaussian kernel is defined as follows:

k(v_i, v_j) = w^(1) exp( -|p_i - p_j|²/(2θ_α²) - |I_i - I_j|²/(2θ_β²) ) + w^(2) exp( -|p_i - p_j|²/(2θ_γ²) )

where p_i and p_j are the pixel positions, I_i and I_j are the intensity vectors, θ_α and θ_γ control the degree of nearness, and θ_β describes the constraint of similarity. In this two-part kernel, the first part drives the same labeling among similar nearby pixels, and the other helps reduce isolated speckled regions.

The MAP solution of a fully connected CRF model is an inferential optimization problem with high time complexity, so many algorithms adopt indirect approaches to obtain an approximate solution. One of the most efficient is filter-based mean field approximation (MFA) inference [39], a variational method based on a series of independent marginal distributions and Gaussian-filtering message passing. The mean field algorithm proposes an alternative distribution Q(Y) to replace the original distribution P(Y). This strategy transforms the exact distribution into a product of independent marginals:

Q(Y) = Π_i Q_i(y_i)

This procedure is implemented by minimizing the KL-divergence D(Q||P) between the approximate distribution Q and the original distribution P. The iterative update equation is computed as follows:

Q_i(y_i) = (1/Z_i) exp( -Σ_{c∈C: i∈c} Σ_{y_c\i} Q_{c\i}(y_c\i) φ_c(y_c|X) )

where C is the set of cliques on the whole graph and y_c is the set of samples in the clique c.
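A minimal sketch of the two-part kernel for a single pixel pair, with placeholder values for the weights and the bandwidth parameters θ_α, θ_β, θ_γ:

```python
import numpy as np

def dual_kernel(p_i, p_j, I_i, I_j, w1=1.0, w2=1.0,
                theta_a=10.0, theta_b=0.5, theta_g=3.0):
    """Two-part pairwise kernel: appearance term plus smoothness term."""
    d2 = np.sum((p_i - p_j) ** 2)           # squared position distance
    c2 = np.sum((I_i - I_j) ** 2)           # squared intensity distance
    appearance = w1 * np.exp(-d2 / (2 * theta_a**2) - c2 / (2 * theta_b**2))
    smoothness = w2 * np.exp(-d2 / (2 * theta_g**2))
    return appearance + smoothness

k_same = dual_kernel(np.zeros(2), np.zeros(2), np.zeros(3), np.zeros(3))
k_far = dual_kernel(np.zeros(2), np.full(2, 100.0), np.zeros(3), np.zeros(3))
```

The appearance term couples pixels that are both close and similar in intensity, while the smoothness term penalizes isolated speckle regardless of intensity.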
For the unary potential, the update equation is extracted as follows:

Q_i(y_i = l) \propto \exp\left( -\phi_\mu(y_i = l|X) \right)

With regard to the pairwise potential, the update equation is given by the following:

\hat{Q}_i(l) = \sum_{l' \in L} \mu(l, l') \sum_{m=1}^{M} w^{(m)} \sum_{j \neq i} k^{(m)}(v_i, v_j) Q_j(l')

The second-order iterative update equation can then be represented as follows:

Q_i(y_i = l) = \frac{1}{Z_i} \exp\left( -\phi_\mu(y_i = l|X) - \hat{Q}_i(l) \right)

For the fully connected CRF model, this computation is still costly in a direct pipeline. To further accelerate the inference, a Gaussian bilateral filter is introduced to realize efficient computation. The message passing with high-dimensional filtering kernels is computed as a convolution:

\sum_{j \neq i} k^{(m)}(v_i, v_j) Q_j(l) = \left( G_m \otimes Q(l) \right)(v_i) - Q_i(l)

where G_m is the mth Gaussian kernel in the pairwise potential and ⊗ is the convolution operator. For each pixel, the final labeling is determined by the maximum marginal distribution.

Algorithm 1. Mean field approximation inference.
Input: Observation field X and label field Y; the order of the potential functions M; the set of maximal cliques C; and the number of iterations D. i, j ∈ [1, . . . , N], where N is the number of samples in a given clique; l ∈ [l_1, . . . , l_L], where L is the set of labels.
1: Initialize marginal distribution: for all i ∈ [1, N], normalize Q_i(y_i|X).
2: while not converged do
3:   Message passing: Q̃_i^(m)(l) ← Σ_{j≠i} k^(m)(v_i, v_j) Q_j(l) for all m.
4:   Compatibility transform: Q̂_i(l) ← Σ_{l'∈L} µ(l, l') Σ_m w^(m) Q̃_i^(m)(l').
5:   Local update: Q_i(y_i = l|X) ← exp(−ϕ_µ(y_i = l|X) − Q̂_i(l)).
6:   For all i ∈ [1, N], normalize Q_i(y_i|X).
7: end while
Output: The mean field approximation distribution Q(Y|X).
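Under the update equations above, a naive O(N²) version of the mean field loop can be sketched as follows. This is a minimal illustration rather than the filter-accelerated implementation, and the Potts compatibility default is an assumption.

```python
import numpy as np

def mean_field_inference(unary, kernel, n_iter=5, mu=None):
    """Naive mean field inference for a fully connected CRF.

    unary  : (N, L) unary energies phi_u(y_i = l | X)
    kernel : (N, N) precomputed pairwise kernel weights k(v_i, v_j)
    mu     : (L, L) label compatibility; Potts model by default
    """
    N, L = unary.shape
    if mu is None:
        mu = 1.0 - np.eye(L)               # Potts: penalize label disagreement
    K = np.array(kernel, dtype=float)
    np.fill_diagonal(K, 0.0)               # exclude j = i from message passing
    Q = np.exp(-unary)
    Q /= Q.sum(axis=1, keepdims=True)      # step 1: initialize marginals
    for _ in range(n_iter):
        msg = K @ Q                        # message passing: sum_j k(i,j) Q_j(l)
        pairwise = msg @ mu.T              # compatibility transform
        Q = np.exp(-unary - pairwise)      # local update with the unary term
        Q /= Q.sum(axis=1, keepdims=True)  # normalize each marginal
    return Q
```

The quadratic-cost message-passing line is exactly the step that the filter-based inference replaces with high-dimensional Gaussian convolution.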

Convolutional Neural Network Pre-Classification
Compared with conventional pattern recognition algorithms, the convolutional neural network is superior in several aspects, such as big data processing, feature self-learning, and robust generalization, and it is more accurate in many pattern-recognition applications [29,40-42]. For the second-order fully connected CRF model, the unary potential needs an initial labeling probability vector for each single pixel. This is the fundamental pixelwise relation between the observation and label fields. Therefore, it is advisable for the CRF to use a patch-based CNN model as the unary potential. This strategy offers a superior label probability vector for each sample and helps relieve the edge-preserving limitation of a patch-based CNN model. As a powerful nonlinear prediction model, the patch-based CNN algorithm has competitive advantages in local pattern recognition for SAR images. It provides superior labeling probability vectors [Y_1, Y_2, Y_3, . . . , Y_n] for each pixel, which are suitable as the initial unary potential ϕ_µ(y_i|X) of the CRF model.
In the convolutional neural topology, multidimensional imagery patterns with complicated texture can be decomposed into hierarchical feature maps. These feature maps provide considerable advantages in terms of descriptive capability and spatial invariance. This powerful feature representation makes the layer-wise connected network a suitable candidate for the pattern recognition of complex SAR images. There are four main layers in the CNN hierarchy: convolution, pooling, dense connection, and SoftMax. The stacked convolution and pooling layers are the main feature extractors, the dense layers help reduce feature dimensions, and finally the SoftMax classifier performs label prediction.
The convolution layer is the main feature extractor. It sets a series of trainable convolutional kernels to extract feature maps. The competitive advantages of the convolution layers lie in the local perceptive fields and weight sharing; these strategies help to efficiently reduce the time and space complexity. Compared with classical fully connected neural networks, the convolution operation preserves spatial locality. In the convolutional computation between two layers, the data flow of the feature maps follows this formula:

x_j^l = f\left( \sum_{i \in M_j} x_i^{l-1} \otimes k_{ij}^l + b_j^l \right)

where f(·) is the activation function; x_j^l denotes the jth feature map in the lth layer; M_j stands for the sequence of associated input features; k_ij^l is a trainable kernel that connects the ith input feature map and the jth output feature map; and b_j^l is a bias term. Through this layer-wise mechanism, multiscale imagery features are converted into advanced abstractions.
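The per-layer data flow can be illustrated with a small sliding-window sketch (single input channel, "valid" borders; the ReLU activation is an assumption for illustration, not the paper's stated choice):

```python
import numpy as np

def conv2d_valid(x, k, bias=0.0, f=lambda z: np.maximum(z, 0.0)):
    """Single-channel 'valid' convolution followed by an activation f,
    mirroring x_j^l = f(sum_i x_i^{l-1} * k_ij^l + b_j^l)."""
    kh, kw = k.shape
    h, w = x.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # weighted sum over the local perceptive field
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k) + bias
    return f(out)
```

Weight sharing is visible here: the same kernel k is reused at every output location, which keeps the parameter count independent of the patch size.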
Maxpooling is an efficient downsampling operation between different convolution layers. It extracts the local patch maximum among neighboring pixels, achieving feature dimension reduction and improving some degree of distortion invariance. Maxpooling is represented as follows:

x_j^l(x, y) = \max_{(\Delta x, \Delta y) \in G} x_j^{l-1}(x \cdot s + \Delta x, y \cdot s + \Delta y)

where G denotes the pooling neighborhood with offset locations (∆x, ∆y) and s is the pooling stride. In practice, we usually adopt a 2 × 2 neighborhood with a stride of 2. The dense layer, also called the fully connected layer, links each neural node to all nodes of the previous layer. It maps the multidimensional feature space to a one-dimensional vector and plays an important role in the synthesized representation of hierarchical features; the synthesis of feature weighting also greatly benefits the subsequent regression. The SoftMax model is an extension of classical logistic regression; jointly with the CNN topology, it performs well on multi-class pattern recognition. The SoftMax model is defined as follows:

y_i^l = \frac{e^{a_i^l}}{\sum_k e^{a_k^l}}

where y_i^l denotes the ith labeling prediction and a^l is defined as a^l = k^T x^l + b, in which x^l is the input of the lth layer and the denominator Σ_k e^{a_k^l} normalizes over all neurons in the lth layer. In the training stage, cross entropy is introduced as the cost function to evaluate convergence. The cross-entropy function is represented as follows:

C = -\sum_j y_j \log \hat{y}_j

where y_j denotes the label truth of the jth sample, declared in the training dataset, and ŷ_j is the SoftMax prediction. CNN models adopt a backpropagation mechanism to perform the weight update. The gradient of the cost function is given by the following:

\frac{\partial C}{\partial a_i} = \sum_j \frac{\partial C}{\partial \hat{y}_j} \frac{\partial \hat{y}_j}{\partial a_i}

Depending on i and j, the derivative of the SoftMax output divides into two cases.
When i = j, the derivative is defined as follows:

\frac{\partial \hat{y}_i}{\partial a_j} = \hat{y}_i (1 - \hat{y}_i)

Otherwise, the partial derivative is as follows:

\frac{\partial \hat{y}_i}{\partial a_j} = -\hat{y}_i \hat{y}_j

Therefore, the overall gradient of the cost function is computed as:

\frac{\partial C}{\partial a_i} = \hat{y}_i - y_i

Based on the stochastic gradient descent (SGD) algorithm, the model weights are updated iteratively. Simply put, backpropagation introduces an efficient feedback control mechanism into the neural network.

Algorithm 2. CNN pre-classification training.
Input: Training patches divided into B mini-batches.
1: while not converged do
2:   for b = 1 to B do
3:     Forward propagate the batch through the convolution, pooling, and dense layers to obtain DF_b.
4:     Compute the SoftMax prediction ŷ_b ← softmax(DF_b).
5:     Compute backpropagation based on the gradient ∂C_b/∂a and update the weights by SGD.
6:   end for
7: end while
Output: Pre-classified observation field.
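The collapse of the two Jacobian cases into the compact gradient ŷ − y can be checked numerically with a short sketch (finite differences; the example logits here are arbitrary, not values from the paper):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())            # shift for numerical stability
    return e / e.sum()

def cross_entropy(a, y):
    """Cross-entropy cost of the SoftMax prediction for a one-hot label y."""
    return -np.sum(y * np.log(softmax(a)))

def grad_cross_entropy(a, y):
    """Combining the i == j and i != j Jacobian cases yields y_hat - y."""
    return softmax(a) - y
```

A central-difference check of this analytic gradient agrees to high precision, which is why the simple form is used directly in backpropagation.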

Simple Linear Iterative Clustering Superpixel Boundary Constraint
The second-order fully connected CRF model takes the contextual information of image labeling fully into account. At the same time, however, the pairwise penalty factor introduces excessive smoothing at boundaries and in narrow micro-regions of the imagery. To partially alleviate these limitations, a boundary constraint mechanism is integrated into the model inference. A superpixel is a small, irregular image patch that is visually self-similar in intensity and texture. It is an efficient simplified representation of a complex observation field and therefore has the potential to provide a boundary constraint for the CRF inference, alleviating excessive smoothing to some degree.
The boundaries of all extracted superpixels are regarded as auxiliary decision-making information for the CRF inference framework. This mechanism provides a beneficial modification that improves the whole model; in effect, it further strengthens the associated dependency within each pixel neighborhood. In order to introduce the boundary constraint, Equation (12) is followed by a weighted computation. For pixel i, the weighted formula is defined as follows:

Q_i \leftarrow \frac{Q_i + w_s \bar{Q}_{S_i}}{1 + w_s}, \qquad \bar{Q}_{S_i} = \frac{1}{N_i} \sum_{j \in S_i} Q_j

where w_s is the constraint weight, pixel i is located in the superpixel region S_i, and the number of pixels in S_i is N_i. In each iteration of the CRF inference, the posterior probability of each pixel is modified by the average probability within its boundary-constrained region. Many studies suggest that graph-cut algorithms have great potential for improving SAR image classification [16,18]. The simple linear iterative clustering (SLIC) algorithm [43] is a state-of-the-art graph-cut method. It adopts the idea of space transformation and clustering to generate a series of highly self-similar superpixels. This algorithm supports the graph-cut of color and gray images and is therefore suitable for processing both single- and full-polarization SAR images. The SLIC algorithm has competitive advantages in generating ideal boundary constraints for the random field inference procedure. Firstly, the computational cost of the SLIC optimization is dramatically reduced by limiting the search space, which reduces the complexity to linear. Secondly, the combined distance measure synthesizes intensity and spatial proximity, offering an efficient approach to controlling the size and compactness of superpixels. In this graph-cut algorithm, each pixel is transformed into a feature vector, and the combined distance is:

D = \sqrt{ \left( \frac{d_c}{N_C} \right)^2 + \left( \frac{d_s}{N_S} \right)^2 }

where d_c is the distance in color space, d_s is the distance in location space, and N_C and N_S are the color and location normalization constants, respectively.
This distance measure is the evaluation criterion for the subsequent clustering. The initially given cluster centers are adjusted iteratively according to the nearest distance D. Comparing the updated cluster centers with the former ones, the residual error E is computed by the L2 norm. The stepwise update is performed until the residual error converges.
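The combined distance can be sketched as follows; the normalization constants N_C and N_S here are illustrative placeholders (in SLIC they play the roles of the compactness parameter and the sampling grid interval):

```python
import numpy as np

def slic_distance(c_i, c_j, p_i, p_j, N_C=10.0, N_S=20.0):
    """Combined SLIC distance of a pixel to a cluster center:
    color distance d_c and spatial distance d_s, each normalized."""
    d_c = np.linalg.norm(np.asarray(c_i, float) - np.asarray(c_j, float))
    d_s = np.linalg.norm(np.asarray(p_i, float) - np.asarray(p_j, float))
    return np.sqrt((d_c / N_C) ** 2 + (d_s / N_S) ** 2)
```

Raising N_S relative to N_C downweights spatial proximity, emphasizing intensity similarity and producing less compact superpixels, which is how the size/compactness trade-off is controlled.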

Algorithm 3. CRF inference with superpixel boundary constraint.
Input: The CNN unary marginals Q, the superpixel regions S, the constraint weight w_s, and the number of iterations D.
1: for d = 1 to D do
2:   Perform one mean field update of Q (Algorithm 1).
3:   for each pixel i do
4:     Iteratively compute the weighted update Q_i ← (Q_i + w_s Q̄_{S_i})/(1 + w_s).
5:   end for
6:   Normalize Q_i for the next iteration.
7: end for
Output: The CRF probability distribution with superpixel boundary constraint.
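One pass of the weighted update can be sketched as follows (a minimal version operating on flattened marginals; the superpixel labeling array is an assumed input produced by SLIC):

```python
import numpy as np

def sbc_update(Q, superpixel_ids, w_s=0.5):
    """Superpixel boundary constraint: mix each pixel's marginal with the
    mean marginal of its superpixel, Q_i <- (Q_i + w_s * Qbar_Si) / (1 + w_s).

    Q              : (N, L) per-pixel label marginals
    superpixel_ids : (N,) superpixel region index of each pixel
    """
    Q = np.asarray(Q, dtype=float)
    out = np.empty_like(Q)
    for s in np.unique(superpixel_ids):
        mask = superpixel_ids == s
        region_mean = Q[mask].mean(axis=0)   # average probability in S_i
        out[mask] = (Q[mask] + w_s * region_mean) / (1.0 + w_s)
    return out
```

Because each row remains a convex combination of probability vectors, the updated marginals still sum to one, so the CRF normalization step is unaffected.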

Experimental Results
In this part, we designed a series of contrast experiments to verify the validity of our proposed classification algorithm. Several widely used features and machine learning algorithms were introduced for the contrast evaluation. In the contrast experiments, we adopted hand-designed features that are accepted as state-of-the-art for SAR image classification, including GLCM and Gabor wavelet. In addition, multiple widely accepted machine learning algorithms were involved in the controlled experiments, including SVM, RF, GBDT, baseline CNN, and ConvCRF. In order to guarantee statistically significant experiments and to acquire averaged results, a Monte Carlo random state and a data shuffle strategy were introduced into multiple repeated runs. Each independent run used a random data queue and model initial state. The hyperparameters of the different classification algorithms were adjusted reasonably to ensure convergence and efficiency.
In the final accuracy evaluation stage, we used the average producer's accuracy (PA), user's accuracy (UA), overall accuracy (OA), and kappa coefficient as the main assessment indexes. PA denotes the fraction of correctly classified pixels with regard to all pixels of that ground truth class, and UA indicates the fraction of correctly classified pixels with regard to all pixels classified as this class in the classified image. OA is calculated as the total number of correctly classified pixels divided by the total number of test pixels. The kappa coefficient represents the agreement between the classification result and the ground truth and ranges from 0 to 1; the closer this value is to 1, the higher the classification accuracy.
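These indexes follow directly from the confusion matrix; a short sketch (assuming the convention that rows are ground-truth classes and columns are predicted classes):

```python
import numpy as np

def accuracy_metrics(cm):
    """Producer's/user's accuracy, overall accuracy, and kappa
    from a confusion matrix (rows: truth, columns: prediction)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    diag = np.diag(cm)
    pa = diag / cm.sum(axis=1)     # producer's accuracy per class
    ua = diag / cm.sum(axis=0)     # user's accuracy per class
    oa = diag.sum() / n            # overall accuracy
    pe = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n ** 2  # chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    return pa, ua, oa, kappa
```

For example, a two-class matrix [[40, 10], [5, 45]] gives an OA of 0.85 and a kappa of 0.7, since the expected chance agreement is 0.5.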

Classification Experiments on MSTAR X-Band Single-Polarized Dataset
The MSTAR cluster is a public open dataset with 0.3-m resolution and single polarization. In this experiment, multiple algorithms with optimal hyperparameters were adopted to perform controlled contrast experiments. The Monte Carlo random state and data shuffle training strategies were used to repeat the experiment multiple times and obtain statistically averaged results. The classification maps of the involved algorithms and the accuracy statistics are displayed in Figure 3 and Table 3. Figure 3 intuitively shows that the speckle noise widely distributed in VHR single-polarized SAR images has extremely negative effects on the classification tasks.
According to Figure 3, it is noticeable that the classification maps of the SVM, RF, GBDT, and baseline CNN algorithms are seriously polluted by misclassified speckle points. With the inherent backscattering features alone, most of the fuzzy classification confusion is distributed in shrub and ground areas. After the GLCM and Gabor wavelet texture representations were introduced to combine the inherent backscattering and hand-designed features, the classification results were partially improved in the formerly confusing areas. Furthermore, the edges between different land cover types still exhibit many commission errors, especially at the boundary between radar shadow and ground areas. The baseline CNN algorithm improved the pattern recognition of different land cover types and alleviated some of the speckle confusion and boundary errors. Our proposed method presents obvious superiority over the other algorithms: the speckle confusion and boundary errors are effectively resolved. Moreover, the SBC mechanism applied to the global inference achieves more complete structure preservation at the boundaries of different land cover types.
According to Table 3, we can draw more quantitative conclusions. There are significant differences in the classification accuracy of the various algorithms. Our proposed method achieves better performance than the other models for most land cover types. In particular, for the shrub area, the proposed method increases the producer's accuracy by more than 10%. We can infer that the proposed method has more powerful recognition ability for land cover types with complex texture in SAR images, such as shrubs and trees. In addition, compared with the combination of classical machine learning algorithms and widely accepted hand-designed feature extractors, the deep models present dominant advantages, as evidenced by the OA and kappa coefficient values. However, it is worth noting that the hand-designed features combined with machine learning algorithms provide a partial advantage in shadow extraction. According to the previous analysis, the deep models are less sensitive to homogeneous land types with low backscattering intensity, although compared with the classical machine learning algorithms this difference is not pronounced.

Classification Experiments on E-SAR L-Band Full-Polarization Dataset
The L-band E-SAR Oberpfaffenhofen scene is a full-polarization SAR dataset with 3-m resolution. The Pauli-basis decomposition [|S_HH + S_VV|², |S_HH − S_VV|², 2|S_HV|²] based on the polarimetric scattering coherence matrix provides more discriminative information for SAR image classification. In our experiments, several hand-designed texture feature extractors and machine learning algorithms were adopted to perform the comparative tests. All machine learning experiments followed the Monte Carlo runs and data shuffle strategy to acquire statistically averaged results. The classification maps of the considered algorithms are shown in Figure 4, and the quantitative accuracy evaluation is displayed in Table 4.
As shown in Figure 4, it is worth noting that most of the widely accepted SAR image classification algorithms suffer from serious speckle pollution and boundary confusion. The commission error mainly occurs in building areas, and the omission error is mainly distributed in building and woodland areas. When only using inherent backscattering polarized information to conduct classification tasks, the different land cover types are extremely confused, especially in building areas. After introducing hand-designed texture features, the confusion errors were drastically reduced. However, the boundary between woodland and herbage areas still has many misclassified errors. It is intuitively illustrated that our proposed algorithm provides competitive advantages in the pattern recognition of different land cover types and performs better in terms of the generalization and robustness of the widespread speckle noise. In addition, the proposed method efficiently reduces boundary errors and preserves the complete structure.
Moreover, more quantitative accuracy results are given in Table 4. It is noticeable that the proposed algorithm has dominant advantages in the extraction of buildings and woodland. Compared with the widely used algorithms, the proposed method increases the producer's accuracy of building and vegetation areas by roughly 10%. The OA and kappa coefficient also display considerable improvement. It is noted that the combination of polarimetric scattering and GLCM texture features also works well in polarimetric SAR image classification. Similar to what we observed earlier, the homogeneous airport runway area with low scattering intensity is not sensitive to texture descriptors; the scattering features with machine learning algorithms achieve the best performance in extracting the airport runway.

Classification Experiments on GF-3 C-Band Full-Polarization Dataset
The GF-3 dataset is a Pauli-basis decomposition SAR image with 8-m resolution. In the experiments with GF-3 data, we adopted multiple machine learning algorithms to evaluate the accuracy, including SVM, RF, GBDT, baseline CNN, ConvCRF, and our proposed algorithm. Furthermore, widely used hand-designed texture descriptors were also introduced into the contrast experiments. All the machine learning algorithms used the Monte Carlo random state and the data shuffle strategy to ensure the independence of the repeated runs. The classification maps and accuracy results are presented in Figure 5 and Table 5, respectively.
According to Figure 5, most of the involved algorithms achieve good performance on the classification of the GF-3 polarimetric SAR data. For this moderate spatial-resolution SAR image, the negative impact of speckle noise apparently decreases in the large-scale pattern recognition of land cover types. Several machine learning algorithms combined with hand-designed texture descriptors performed better on the extraction of water and mountains. However, on account of their lower generalization, their classification maps suffered serious speckle pollution in building and vegetation areas. It is worth noting that the proposed method successfully solved this problem: it achieves higher accuracy in building and vegetation areas and produces a cleaner classification map. It is also noticeable that the baseline CNN algorithm was not superior in the classification of this moderate spatial-resolution polarimetric SAR image and was inferior to the ConvCRF algorithm in the highly fragmented areas.

The complete accuracy evaluation statistics are provided in detail in Table 5. As shown in Table 5, commission errors mainly exist in mountain and vegetation areas, while omission errors are mainly distributed in mountain areas. Compared with state-of-the-art machine learning algorithms, the proposed method achieves better performance in terms of the OA and kappa coefficient. In addition, it has an obvious advantage in the recognition of buildings and vegetation, which contain more complex texture features with fuzzy speckle noise. Moreover, it is worth noting that the widely accepted machine learning methods achieve higher accuracy in the extraction of water areas.

Analysis and Discussion
This section provides our detailed analysis and discussion of the whole experimental pipeline. Section 5.1 discusses the selection of optimal hyperparameters. The effectiveness analysis of the superpixel boundary constraint is illustrated in Section 5.2.

Selection of Optimal Hyperparameters
The selection of optimal hyperparameters is essential for the final classification performance. In other words, the settings of different hyperparameters have significant impacts on the feature representation, model speed, and accuracy of the results. In our contrast experiments, we evaluated multiple hyperparameter settings along different experimental trajectories and selected the optimal values as the final settings. Considering their influence and priority, we selected a series of key parameters, including the size of the SAR patches, the computational weights of the two-part Gaussian kernels, the intensity factor of the first Gaussian kernel, and the nearness factors of the two-part Gaussian kernels. The parameters of the superpixel boundary constraint are also important for the overall model performance and are analyzed in detail in Section 5.2. All our experiments were conducted in a scientific Python environment on the Windows platform with an Intel Core i7-4790K CPU, an NVIDIA Tesla K20c GPU, and 32 GB of memory. In addition, part of the feature computation was performed with the open-source end-to-end machine learning framework TensorFlow, and the CUDA parallel computing library was introduced to accelerate the computation.
In this part, we conducted contrast experiments and analyses on the three given datasets. These typical multipolarization SAR images contain large coverage and messy mixed regions, which place higher demands on algorithm robustness and performance. To verify the reliability of the experiments and to exclude accidental results, we conducted multiple repeated experiments to acquire the average overall accuracy (OA) and kappa coefficient. Furthermore, statistically significant strategies were also used in these machine learning experiments, including the Monte Carlo random state and data shuffle.
First, we designed a series of contrast experiments to analyze the patch size setup. Multiple patch scales ranging from 7 to 15 for X-band MSTAR SAR, 5 to 13 for L-band E-SAR, and 19 to 27 for the C-band GF-3 SAR were evaluated with a fixed offset. In Figure 6, it is intuitively observed that SAR patch size has a significant influence on the classification performance. For the identical SAR dataset, varying scales of patch size lead to slight fluctuation in classification accuracy within a certain range. Meanwhile, the classification results of airborne and spaceborne SAR images also showed different sensitivities to patch size. According to the experimental results, the optimal SAR patch sizes of the given datasets are 13, 5, and 25, respectively. Moreover, stepwise accuracy analysis of the proposed model demonstrated differences in algorithm performance. An optimal patch size realizes a classified balance between the degrees of discrimination and confusion, which captures abundant discriminative features and suppresses much of the classification confusion. In Figure 7, the C-band GF-3 polSAR dataset is used as an example. Clearly, it is observed that the classification performance changes with the increase of SAR patch sizes.
In the second setup, we conducted contrast sensitivity tests of the hyperparameters of the Gaussian kernel functions in our proposed model. Variable kernel hyperparameters had a slight effect on the final classification results. In order to analyze the sensitivity of the Gaussian kernel functions independently and to exclude the training uncertainty of the CNN, we used the same CNN prediction as the unary potential in all controlled runs. As a result, each set of Gaussian kernel parameters yielded a unique, constant probability graph. In this experiment, OA was affected only by the hyperparameters of the Gaussian kernel functions; therefore, the experimental results have no standard deviation in this part.
The four main hyperparameters that dominate nearness labeling and speckle reduction were evaluated in this part: the nearby similarity weight, the nearness factor, the intensity factor, and the isolated speckle weight. A sample of the results on the MSTAR dataset is depicted in Figure 8, which shows how OA changes under different parameter settings. In general, these four kernel hyperparameters govern the contextual feature information and the speckle noise filtering of the overall model framework. Figure 8a illustrates that OA rises at first and then gradually flattens as the nearby similarity weight increases. Varying the nearness factor in the first kernel also affects OA: larger nearness factors in the first kernel are more efficient at improving OA but are less sensitive to the nearby similarity weight. Figure 8b depicts the interaction of the two factors in the first kernel. Although these two factors make only small differences in the absolute OA value, given the inverse relationship between the computational factors and signal intensity, we infer that the nearness factor has the relatively larger impact on OA. Moreover, Figure 8c shows a more significant relation between the nearness factor and the isolated speckle weight. For smaller nearness factors in the second kernel, OA is not sensitive to the isolated speckle weight; as the nearness factor in the second kernel increases, the proposed model performs better at first and then shows an obvious decrease.
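To make the roles of these hyperparameters concrete, the pairwise potential of a fully connected CRF is commonly built from two Gaussian kernels: an appearance kernel and a smoothness kernel, as in Krähenbühl and Koltun's dense CRF. The sketch below shows this standard form; the mapping of the paper's hyperparameter names onto the kernel parameters is our assumption, not the authors' code.

```python
import numpy as np

def pairwise_kernel(pos_i, pos_j, int_i, int_j,
                    w_sim=3.0, theta_near=60.0, theta_int=10.0,
                    w_speckle=1.0, theta_smooth=3.0):
    """Two-kernel pairwise weight of a fully connected CRF (dense-CRF form).

    The parameter names mirror the paper's hyperparameters; the exact
    correspondence is an assumption:
      w_sim        -> nearby similarity weight (appearance kernel)
      theta_near   -> nearness factor of the first kernel
      theta_int    -> intensity factor
      w_speckle    -> isolated speckle weight (smoothness kernel)
      theta_smooth -> nearness factor of the second kernel
    """
    d_pos = np.sum((np.asarray(pos_i, float) - np.asarray(pos_j, float)) ** 2)
    d_int = np.sum((np.asarray(int_i, float) - np.asarray(int_j, float)) ** 2)
    # Appearance kernel: nearby pixels with similar backscatter intensity
    # are encouraged to share a label.
    appearance = w_sim * np.exp(-d_pos / (2 * theta_near ** 2)
                                - d_int / (2 * theta_int ** 2))
    # Smoothness kernel: removes small isolated (speckle-like) regions.
    smoothness = w_speckle * np.exp(-d_pos / (2 * theta_smooth ** 2))
    return appearance + smoothness
```

In this form, increasing `theta_near` widens the spatial reach of the appearance kernel, while `w_speckle` scales how aggressively isolated labels are smoothed away, which matches the trends discussed for Figure 8.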

Effectiveness Analysis of SBC
In this part, we used the MSTAR data as an example to analyze the effectiveness of the SBC mechanism in SAR image classification. Although the ConvCRF model performed better than the baseline deep learning model, MFAI still introduced excessive smoothing omission errors at imaging boundaries and in narrow micro-regions. To partially mitigate this issue, the SBC mechanism was applied to the MFAI iterative computing process to minimize the labeling loss of classical unconstrained inference. Here, we focus on the effect of different SBC hyperparameters. The main constraint factors that control homogeneity and edges are the number of super cliques and the adjacent compaction. In this experiment, OA was affected only by the SBC parameters; for the same SBC parameter setting, the classification result is invariant, so the results in this part have no standard deviation. In Figure 9, we present a series of stepwise SBC conditions. Moreover, to give a visual display of the SBC regions, the pixels within identical iterative computing boundaries were replaced by their mean values. It is worth noting that the homogeneous edges were strongly influenced by the quantity and compaction factors in the high-resolution clustered SAR image. We can also observe that the micro-regions and edges of the sample classification maps were influenced by the different SBC parameters. Figure 10a illustrates that the SBC hyperparameters have an obvious influence on OA. As the number of superpixels increases, OA rises at first and then gradually decreases in most cases: below the optimal scale, the cliques are too small to obtain a precise edge, whereas an excessively large scale introduces considerable confusion instead. On the other hand, we introduced different constraint levels of the SBC mechanism into the MFAI process, with variable constraint weights involved in the MFAI computing procedure.
The experimental results for different SBC constraint strengths are presented in Figure 10b. As the constraint weight increased, OA showed an approximately single-peaked distribution.
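As an illustration of how such a boundary constraint can enter the mean-field iterations, the sketch below blends each pixel's class probabilities with the mean over its superpixel clique after an MFAI step. This is our reading of the SBC idea under stated assumptions (a precomputed superpixel map, e.g. from SLIC, and a blending weight), not the authors' implementation.

```python
import numpy as np

def superpixel_boundary_constraint(q, sp_labels, weight=0.5):
    """One SBC step, as we interpret it (a sketch, not the authors' code).

    Blends each pixel's class probabilities with the mean over its
    superpixel, pulling labels inside a homogeneous superpixel together
    while leaving superpixel edges intact.

    q         : (H, W, C) class probabilities from a mean-field iteration
    sp_labels : (H, W) integer superpixel ids (e.g. from skimage's slic)
    weight    : constraint weight in [0, 1]; 0 = unconstrained inference
    """
    q = np.asarray(q, dtype=float)
    out = q.copy()
    for sp in np.unique(sp_labels):
        mask = sp_labels == sp
        mean_prob = q[mask].mean(axis=0)  # per-class mean within the clique
        out[mask] = (1 - weight) * q[mask] + weight * mean_prob
    # Renormalize so each pixel remains a probability distribution.
    out /= out.sum(axis=-1, keepdims=True)
    return out
```

Under this reading, the single-peaked OA curve in Figure 10b is intuitive: a small `weight` leaves speckle-driven label noise in place, while a `weight` near 1 forces every pixel in a superpixel to the clique mean, erasing narrow micro-regions that cross a superpixel.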

Sensitivity Analysis of Increasing Data Portions
Generally speaking, conventional single-stream deep models need large datasets to ensure generalization ability, whereas our proposed method combines a CRF with a deep model and the SBC mechanism. In this part, we analyze the sensitivity and robustness of the model with respect to an increasing training dataset portion. In our experiments, we performed dozens of runs of the proposed model, first using only 10% of the whole dataset and then randomly adding 10% from the remainder at a time. Figure 11 shows that OA rises as the dataset portion increases, with the growth of OA gradually slowing. Compared with the proposed model, the baseline CNN is more dependent on the increase in dataset portion. It is also worth noting that different image sizes and polarizations present different sensitivities to the increasing dataset portion. Figures 12 and 13 display classification maps of the E-SAR data using the baseline CNN and our proposed model with 10% dataset portion increases at a time. These results demonstrate that the proposed model is more robust with respect to the increasing training dataset portion.
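The incremental sampling protocol described above can be sketched as follows; the function name and the exact random-sampling details are our assumptions, since the authors' sampling code is not given.

```python
import random

def incremental_portions(samples, step=0.10, seed=42):
    """Sketch of the data-portion protocol: start from a random 10% of the
    labeled samples and grow the training set by a random 10% of the
    remainder at each step, yielding the cumulative training set each time.
    """
    rng = random.Random(seed)
    pool = list(samples)
    rng.shuffle(pool)                         # random order once, then grow
    n_step = max(1, int(round(len(pool) * step)))
    train = []
    while pool:
        take, pool = pool[:n_step], pool[n_step:]
        train.extend(take)                    # cumulative training set
        yield list(train)
```

Each yielded portion is a superset of the previous one, so a model retrained at every step sees strictly more data, matching the 10%-at-a-time curves plotted in Figure 11.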

Figure 13. Classification maps of E-SAR data using our proposed model with increasing dataset portions: (a-j) dataset portions ranging from 10% to 100%, with 10% increases at a time.

Conclusions
High-resolution SAR image classification remains a challenging topic in the fundamental research of microwave remote sensing. In this paper, a novel integrated learning algorithm using ConvCRF with the SBC mechanism was proposed for single- and full-polarization SAR image classification. A series of systematic contrast experiments verified that the proposed algorithm successfully produces optimized classification maps and higher accuracy indexes. The proposed algorithm combines local feature representation with the global MFAI process, which effectively relieves the confusion caused by speckle noise and promotes local pattern recognition in high-resolution SAR images. In the land cover classification experiments using the MSTAR, E-SAR, and GF-3 datasets, the overall accuracy of our proposed method reaches 90.18 ± 0.37, 91.63 ± 0.27, and 90.91 ± 0.31, respectively. In addition, it is worth noting that the proposed method provides competitive advantages in the classification of building and woodland areas, which has potential for remote sensing analysis in urban and agricultural applications. For the shrub area in the MSTAR data, the proposed method increased the producer's accuracy by almost 10%. The proposed method also presents better performance in the representation of dense texture features. However, compared with other machine learning algorithms using widely adopted texture descriptors, the proposed algorithm still has aspects that remain to be improved, especially for homogeneous land cover types with low backscattering intensity. The contrast experiments conducted on different microwave bands of airborne and spaceborne SAR images also verified the applicability of our algorithm.
In order to improve the intelligent analysis of SAR images for more comprehensive remote sensing applications, follow-up research could focus on a wide range of aspects, including ensemble learning with multiple algorithms, more flexible and independent learning with a lower degree of supervision, greedy learning from large amounts of SAR patches, and more efficient patch-scale learning with prior knowledge and mathematical morphology.

Funding: This research was funded by the National Key R&D Program of China "Research on establishing medium spatial resolution spectrum earth and its application", China's 13th Five-year Plan Civil Space Pre-Research Project under grant Y7K00100KJ, and China's 13th Five-year Plan Civil Space Pre-Research Project "National emergency planning, response and information support using satellite remote sensing" under grant Y930060K8M.