PolSAR Image Classiﬁcation Using a Superpixel-Based Composite Kernel and Elastic Net

: The presence of speckles and the absence of discriminative features make it difﬁcult for the pixel-level polarimetric synthetic aperture radar (PolSAR) image classiﬁcation to achieve more accurate and coherent interpretation results, especially in the case of limited available training samples. To this end, this paper presents a composite kernel-based elastic net classiﬁer (CK-ENC) for better PolSAR image classiﬁcation. First, based on superpixel segmentation of different scales, three types of features are extracted to consider more discriminative information, thereby effectively suppressing the interference of speckles and achieving better target contour preservation. Then, a composite kernel (CK) is constructed to map these features and effectively implement feature fusion under the kernel framework. The CK exploits the correlation and diversity between different features to improve the representation and discrimination capabilities of features. Finally, an ENC integrated with CK (CK-ENC) is proposed to achieve better PolSAR image classiﬁcation performance with limited training samples. Experimental results on airborne and spaceborne PolSAR datasets demonstrate that the proposed CK-ENC can achieve better visual coherence and yield higher classiﬁcation accuracies than other state-of-art methods, especially in the case of limited training samples. datasets acquired from different systems evaluated the classiﬁcation performance and effectiveness of the proposed CK-ENC. The classiﬁcation results demonstrate that CK-ENC outperforms the state-of-the-art methods both in quantitative metrics and in visual quality, especially under the circumstance of limited training samples. In future work, we will generalize the CK-ENC method to classify dual-frequency PolSAR datasets.


Introduction
Since the polarimetric synthetic aperture radar (PolSAR) systems can transmit and receive electromagnetic signals in different polarization channels [1], the PolSAR datasets can provide more detailed information about the backscattering phenomena than data collected by single-channel SAR or other remote sensing systems [2]. The availability of PolSAR data stimulates intensive research in polarimetric analysis techniques and applications, including PolSAR target detection [3], change detection [4], polarization classification, and so on. In particular, the PolSAR image classification continues to be an active field of research [2].
In the remote sensing community, algorithms for PolSAR image classification are endless [5][6][7][8][9][10][11]. The feature extraction, as one important aspect of classification algorithms, has seen a lot of success and received sustained development [12]. The features used for PolSAR image classification include the polarization target decomposition (TD) features [5,[13][14][15], the polarization data features [16,17], and so on [18,19]. Here, these feature extractions can be called explicit feature extractions [2], where features are extracted by projecting the PolSAR complex-valued data into the real domain. Broadly, in the explicit feature extraction aspect, the following problems may be encountered [2,20]. Firstly, the feature extraction process increases computation time and computational load. Secondly, features for special classification tasks that are hand-crafted and determined by plenty of experiments require manual trial and involve computational error. Finally, the feature extraction process cannot avoid the loss of valuable information [21].
Since the scattering characteristics of the distributed targets for PolSAR images can be described by their coherency or covariance matrix [22], it is reasonable to make classification algorithms work directly on these complex-valued (CV) matrices. At the same time, this can also ease the aforementioned problems caused by the explicit feature extraction from original PolSAR CV data. One choice is classification approaches based on statistical distribution assumption [7,21,[23][24][25][26]. However, the common disadvantages of these methods are usually complicated parameter estimation and limited model applicability [27,28]. Recently, some classifiers with the training-testing format, which work directly on the PolSAR CV data, have constituted an active area of research [2,20,[29][30][31][32][33][34]. Among them, complex-valued networks provide results comparable to networks designed for real-valued input [32][33][34]. Although these methods have achieved remarkable breakthroughs, the demands for a large number of labeled samples and their sensitivity to training parameters remain to be solved [12,35]. Since the PolSAR matrices form a Riemannian manifold instead of Euclidean space [36], other classification methods based on CV matrices utilize the similarities between PolSAR matrix samples in the manifold [27,28,36]. Among these methods, representation-based classification methods [27,28] are flexible and can be applied to different polarized SAR datasets without certain distribution assumptions and training processes [27].
In addition, as we know, due to the imaging mechanism, PolSAR images are heavily contaminated by the inherent speckles [1]. Note that some of the aforementioned methods based on PolSAR matrices only consider the polarimetric characteristics [20,27,29,30]. The existence of speckles may make classification results include many misclassified pixels and degrade the quality of classification, especially when the training samples are limited [37,38]. To suppress the interference of speckles, the consideration of the spatial correlations contained in PolSAR image is one of the most commonly used and effective methods [39]. Hence, other above-mentioned methods use image patches [2,28,[31][32][33][34] or superpixels [36] to incorporate the spatial information into PolSAR image classification. In this way, improved and smoother classification results can be achieved. However, some methods utilize image patches or superpixels as classification units, which may cause classification errors in certain areas and may not better preserve the contours of certain ground targets [40]. Additionally, the patch-based methods will increase computational complexity and load.
To overcome the limitations mentioned above, this paper proposes a pixel-level Pol-SAR image classification method under the condition of few training samples. This method directly utilizes PolSAR CV data as the benchmark data without any explicit feature extraction. To preserve target details while considering spatial information, a multi-feature extraction strategy based on superpixel segmentation of different scales is first proposed. Then, a composite kernel is designed to realize the multiple information fusion, thereby improving the representation and discrimination capabilities of features. Finally, the composite kernel elastic net representation-based classification method (CK-ENC) is proposed, which is utilized to realize pixel-level PolSAR image classification under the condition of limited training samples.
For the multi-feature extraction, first, the coherency matrix is directly adopted to represent the polarimetric second-order matrix feature (PSMF). This feature can retain the full polarization scattering information of PolSAR targets [22]. In addition, to suppress the interference of speckles and obtain smooth classification performance, the local mean feature (LMF) within coarse-scale superpixels is designed to obtain the spatial stationary information. Superpixels are generated by a modified simple linear iterative clustering (SLIC) algorithm [38]. Based on the assumption that a superpixel represents a homogeneous and local stationary area, the coherency matrix follows the complex Wishart distribution in a superpixel. Therefore, the statistical covariance matrix parameter of Wishart distribution is estimated as the local mean feature of a pixel to consider the local spatial correlation. Moreover, to encapsulate more discriminative information and further enhance the classification performance, inspired by the work of [41], the nonlocal Wishart weighted feature (NWWF) among fine-scale superpixels is designed. In traditional nonlocal methods, rectangular windows are utilized for searching and matching neighborhood pixels. Although promising results can be obtained in this way, the computational load in terms of speed is usually maintained due to pixel by pixel calculation. Therefore, this paper makes full use of superpixels to simplify nonlocal processing. Considering the superpixels with different scales capturing different spatial correlations, the NWWF extraction is based on a fine-scale superpixel segmentation map. In this way, NWWF can be regarded as a further refinement to the feature extracted by the coarse-scale superpixels, which considers the nonlocal spatial information to realize the information complementary with LMF. In addition, to extract more robust NWWFs, new weights of neighborhood superpixels are derived from an adaptive threshold decision strategy (ATDS) [42] and the dissimilarity based on the statistical test. NWWF uses a nonlocal search to explore the spatial correlation of superpixel pairs in a larger neighborhood, which is regarded as a very important supplementary information to obtain more accurate classification results.
After that, based on the kernel theory [43], a composite kernel (CK) is proposed to embed these three features into a high-dimensional linear space to realize the information fusion. The three features are all Hermitian symmetric positive semi-definite (HPD) matrices, which form nonlinear manifolds. The Stein kernel function based on the geometric distances is suitable for mapping these features to the higher-dimensional reproducing kernel Hilbert space (RKHS) [27]. Therefore, we first map the three features to yield three different kernels. Then, according to the properties of Mercer's kernels [44], the three kernels are combined in proportion to form the CK. In this way, the multiple information fusion under the kernel framework is realized to improve the representation and discrimination capabilities of features. In addition, compared with other kernels based on fixed square windows [28], the proposed CK based on superpixels can effectively reduce the computational load and the computational complexity.
Finally, a linear-space-learning classifier, the elastic net representation-based classification method (ENC) integrated with the CK (CK-ENC) is proposed for the final PolSAR image classification. The EN [45], a convex combination of the sparse representation (SR) and the collaborative representation (CR), has been utilized in various fields. The ENC mechanism combines the l 1 -norm in SR and the l 2 -norm regularization in CR for efficient classification. In other words, the ENC makes use of the advantages of both SR and CR to realize a balance between within-class variations and between-class interference [46]. It can offer more robust coefficients to achieve more reliable classification results, especially for the condition of few training samples. In addition, unlike machine learning classifiers, the ENC does not need a training process and does not tune too many parameters. It only represents each test sample as the sparse combination of atoms from an over-complete dictionary [45]. Thus, in this paper, to circumvent parameter selection and debugging problems and achieve better classification performance, the CK-ENC is proposed for the PolSAR image classification. The CK-ENC can yield higher classification accuracy even with a small set of training samples.
The major contributions of this paper can be summarized from the following three aspects.
• Based on superpixel segmentation of different scales, a multi-feature extraction strategy is proposed. It can fully mine the inherent characteristics of PolSAR data and capture more discriminative information, thereby preserving the target contour and suppressing the speckles to improve the visual coherence of the classification maps. • A composite kernel (CK) is constructed to implement the feature fusion and obtain a richer feature representation. The CK can well reflect the properties of PolSAR data hidden in the high dimensional feature space and effectively fuse multiple sources of information, thereby improving the representation and discrimination capabilities of features.
• The CK-ENC is proposed for the final PolSAR image classification. CK-ENC employs ENC to estimate more robust weight coefficients for pixel labeling, thereby achieving more accurate classification, especially for the condition of limited training samples.
The remainder of this paper is organized as follows. Section 2 details the proposed CK-ENC classificaton. The experimental results and discussions are reported in Section 3 and Section 4, respectively. Finally, Section 5 concludes this paper with some remarks.

Proposed Method
The flowchart of the proposed method is illustrated in Figure 1. It contains three modules, multi-feature extraction, CK construction for feature fusion, and CK-ENC for final PolSAR image classification.

Multi-Feature Extraction
To derive better and richer semantic representation, based on superpixel segmentation and statistical analysis, a multi-feature extraction strategy is proposed to extract three features for obtaining accurate classification results.

Polarimetric Second-Order Matrix Feature
To suppress the interference of speckles, as the second-order statistics, the polarimetric coherency matrix T is utilized to analyze the electromagnetic scattering characteristics of the distributed target [1]: where V denotes vertical polarization and H denotes horizontal polarization. S HH , S HV , S V H , and S VV are four complex backscattered coefficients. u L is the polarimetric target vector [1].
The superscript H denotes the conjugate transpose, and . indicates temporal or spatial ensemble averaging. It is clear that T is an HPD matrix. This paper adopts T as the straightforward and effective polarimetric second-order matrix feature (PSMF) to describe each pixel in a PolSAR image. In other words, a 3 × 3 × H × W polarimetric feature matrix based on T is utilized to describe the PolSAR image with a size of H × W. This can avoid the problems caused by the explicit feature extraction and keep the contour information of targets in classification results.

Local Mean Feature within Coarse-Scale Superpixels
The coarse-scale superpixels are first generated by the modified SLIC algorithm [38] to consider the spatial relationship between pixels. Then, the local mean feature (LMF) of each pixel in a PolSAR image is extracted via the similarity within the superpixels.
Each superpixel is a disjoint and homogeneous pixel block, which can be regarded as a stationary and homogeneous area with uniform texture. In a homogeneous area with fully developed speckles and no texture, T obeys the complex Wishart distribution [47], i.e., T ∼ W T (n, q, Σ). Where the parameter q is 3 for monostatic PolSAR on a reciprocal medium, and n is the number of looks. Σ = E{T}, and Tr(.) is the trace of a matrix.
Assume that the ith pixel T i in a PolSAR image belongs to the superpixel Y SP k . Let T j be a group of adjacent pixels with similar properties within superpixel Y SP k , where j = 1, 2, . . . , J k , and J k is the number of pixels. Note that each superpixel represents a local, stationary region, T i in superpixel Y SP k can be modeled by a complex Wishart model, i.e., T i ∼ W T i n, q,Σ SP k . Thus, we estimate the statistical parameterΣ SP k as the LMF to exploit spatial information.Σ SP k is calculated with the maximum-likelihood (ML) estimator [47]: Similar to the Equation (2), the LMF of each pixel can be extracted. Relying on coarsescale superpixels, the local stationary information in a whole PolSAR image can be obtained to effectively suppress the influence of speckles and improve the visual coherence of the classification map.

Nonlocal Wishart Weighted Feature among Fine-Scale Superpixels
To obtain more accurate and robust classification performance, more descriptive and discriminative features should be extracted to provide additional invaluable information. Inspired by the nonlocal idea in [41], the nonlocal Wishart weighted feature (NWWF) is extracted. It can exploit the nonlocal spatial information around each superpixel, which can provide a richer spatial context to enhance the discriminability of each pixel. Figure 2 summarizes the main steps of NWWF extraction process. The key ingredient comes from the Wishart weight, which balances the relative importance of neighboring superpixels around the current superpixel. Considering the robustness and computation efficiency, the distance derived from the Wishart test statistic is adapted for the dissimilarity measure between superpixels [37,47]. Let Σ SP i and Σ SP j respectively denote the center covariance matrix of superpixels Y SP i and Y SP j , the dissimilarity distance between the ith and jth superpixels is defined as: whereΣ SP i andΣ SP j respectively are ML estimators of Σ SP i and Σ SP j in Equation (2). Details of the aforementioned derivation can be found in [37,47].
After the dissimilarity measurement, weights of neighborhood superpixels, which balances the relative importance, can be derived from the dissimilarity. For one superpixel Y SP i , all neighborhood superpixels can be denoted by Y SP m , where m = 1, 2, . . . , M, and M is the total number. Based on the distance D S Y SP i , Y SP m , the Wishart weight w i,m between Y SP i and Y SP m can be estimated by the exponential kernel: where γ is the scale parameter. If the center superpixel Y SP i and a neighborhood superpixel Y SP m are more similar, the value of weight w i,m will be higher.
Notably, there may be some neighborhood superpixels belonging to the different categories with the center superpixel, which still make a contribution by weights for the final NWWF extraction. This will make results disturbed by heterogeneous regions with smaller weights. Therefore, to improve the robustness to heterogeneous, a more effective weight computation is adopted as follows: where τ is the adaptive threshold, which is provided by an AIDS. Inspired by [42], we extend AIDS to deal with PolSAR images. Assume a PolSAR image contains C classes, the set of available training samples can be denoted as X = X 1 , X 2 , . . . , X C with C subsets.
Each subset X c = T c 1 , T c 2 . . . , T c n c ∈ R 3×3×n c is constructed by n c training samples in the class c(c = 1, 2, . . . , C). The total number of all training samples for the training set is N and N = ∑ C c=1 n c . For the cth class, the mean coherency matrix is calculated as: According to Equation (3), all distances between two classes can be computed, which can be composed the set D 1 , D 2 , . . . , D C(C−1)/2 in ascending order. Then, we take the median values in this set as the adaptive threshold τ to decide weights: Based on dissimilarity and threshold decided by AIDS, the new weight computation scheme can reduce or even eliminate the impact of the heterogeneous regions, thereby improving the representation performance.
Finally, with the calculated weights, the NWWF of pixels in superpixel Y SP i can be estimated in a weighted maximum likelihood way: It is worth noting that, to preserve more details and achieve relatively good performance, superpixels with the fine-scale size are generated to extract local mean spatial features for Equation (8). In other words, compared with Equation (2), the local spatial featureΣ SP m in Equation (8) is generated by superpixels with different size. In summary, for each pixel in a PolSAR image, three features are extracted to preserve the original CV attributes, suppress the interference of speckles, and capture more discriminative information to obtain more accurate classification results.

Composite Kernel (CK) Construction
The above three extracted features for each pixel all 3 × 3 CV matrices, which form a nonlinear manifold. This nonlinear geometry often makes PolSAR classification complicated and difficult. Therefore, to better build a classifier for PolSAR images, a composite kernel (CK) is developed based on geometric distance and kernel method. It can map these features to the Hilbert space and realize the multi-feature information fusion to achieve promising classification accuracy.
On the Riemannian manifold, the similarities between points can be measured by the geodesic distance [36]. The widely used geodesic distances include the affine invariant Riemannian metric (AIRM) [36], the log-Euclidean distance (LED) [48], and the Bartlett distance [49]. Due to the eigenvalue decomposition in the equation, AIRM has high computational complexity [27]. In addition, LED applies the Euclidean metric by projecting the SPD matrix into the Euclidean space, which distorts the matrix structure and may lead to suboptimal results [28]. Rather than the eigenvalue decomposition for AIRM, the Bartlett distance only needs to calculate the matrix logarithm operation, which means a low computational load. Therefore, for simple calculation and effective implementation, this paper chooses the Bartlett distance.
Given two SPD matrices X ∈ C p×p and Y ∈ C p×p on a Riemannian manifold, the Bartlett distance, also known as Stein divergence or Jensen-Bregman LogDet divergence, is defined as: where log(·) is the principal matrix logarithm.
In addition, through the kernel method, matrices on the Riemannian manifold can be embedded into the RKHS to handle the nonlinearity. In this way, many pattern recognition methods can be utilized for the PolSAR image classification. Base on the Gaussian RBF kernel and the above Bartlett distance, the Stein kernel function [28] can be defined as: The Stein kernel is a positive definite kernel when the values of β is inside of the following set: In this paper, the choice of β is (p − 1)/2. Therefore, according to the stein kernel, three features are mapped into the RKHS to form three different kernels, and then the CK is composed of the three kernels. More specifically, given the polarimetric second-order matrix feature X PSMF s ∈ C p×p for the pixel s = {i, j}, the mapped polarimetirc second-order matrix kernel, dented by k PSMF X i , X j , is defined as: For the local mean feature denoted by X LMF s ∈ C p×p , the mapped local mean kernel k LMF X i , X j is as follows: In addition, similarlu to Equation (13), the nonlocal Wishart weighted kernel k NWWF X i , X j mapped from the nonlocal Wishart weighted feature X NWWF s ∈ C p×p is calculated as: Finally, according to the properties of Mercer's kernels [43], the CK can be created by combing the above three kernels: (15) where µ PSMF , µ LMF , and µ NWWF are the weight parameters of the three different kernels. Their values are satisfied: For the proposed CK-ENC, the three weights µ PSMF , µ LMF , and µ NWWF are set to 0.1, 0.2, and 0.7, respectively. In the following experimental part, the influences of these weights on the performances of the proposed method will be further analyzed.

Composite Kernel-Based Elastic Net Classifier (CK-ENC)
For higher computational efficiency and better classification accuracy, the CK integrated with the ENC is developed for the PolSAR image classification. Under the condition of few training samples, ENC can estimate robust coefficients to reveal a more powerful discriminant ability for better classification performance.
For a testing sample y ∈ C p×p , the objection of ENC is to find the coefficient vector α ∈ R n×1 for the linear combination of the training samples X = X 1 , X 2 , . . . , X C with the combination of 1 and 2 penalties. Thus, the objective function under the kernel framework can be formulated as: where λ 1 and λ 2 are the regularization parameters. φ(.) is an embedding function, which maps the data from Riemannian manifold into PKHS. α = α 1 , . . . , α c , . . . , α c is the coefficient vector to reconstruct the testing sample y, and α c ∈ R n c ×1 is the vector representing coefficients corresponding to the subset X c . It is known that the inner product of two instances in PKHS can be calculated by a kernel function k(.) : φ(A), φ(B) = k (A, B), where A and B are all on a Riemannian manifold. Thus, Equation (17) can be expanded as: In this paper, the objective function in Equation (18) adopts the CK in Equation (15). In addition, the sparse modeling software [50] is used to solve the convex problem in Equation (18) and find the optimized solutionα = α 1 , . . . ,α c , . . . ,α c . According to the estimated coefficient vectorα, the testing sample y can be classified to the best category by the following rule: In the same way, the classification of a complete PolSAR image can be realized. In summary, we propose the CK-ENC to achieve pixel-level PolSAR classification. CK-ENC makes good use of the inherent statistical characteristics and the spatial information of PolSAR data through the CK. Thus, it can obtain more discriminative representation and overcome the influence of speckles, thereby preserving image boundaries well and making the classification results smoother. In addition, under the condition of limited training samples, the CK-ENC combines the CK with ENC to achieve PolSAR image classification, which can balance the between-class interference and within-class variations to obtain more accurate classification results. Subsequently, we will investigate the effectiveness of the proposed CK-ENC method with real PolSAR images.

Experimental Results
Experiments were carried out to evaluate the classification capability of the proposed CK-ENC method. We first introduce three real PolSAR datasets utilized in the experiments and three objective metrics for quantitative evaluation of classification performance. Then, a comparison to classification algorithms and the experimental setup are given. Finally, to make a sufficient comparison among various algorithms, the visualized classification results and the quantitative performance are displayed for the full demonstration.

Experimental Datasets Description and Objective Metrics
To demonstrate the effectiveness of CK-ENC, we select three real PolSAR datasets from an airborne system (L-band AIRSAR) and two spaceborne systems (C-band GaoFen3 and C-band RADARSAT-2). The three selected PolSAR pseudo-images are from different areas, and the types and quantities of classes in these datasets are also different. Therefore, the effectiveness of the proposed classification method can be verified by three selected datasets in terms of the system, the operative band, and the classification problem. The details of these datasets are listed as following.

Flevoland Benchmark Dataset
It is an L-band four-look PolSAR data with a size of 750 × 1024 pixels, acquired by the NASA/JPL AIRSAR system on August 16,1989. The Pauli RGB image is shown in Figure 3a and contains 15 classes: stem beans, peas, forest, lucerne, wheat, beet, potatoes, bare soil, grass, rapeseed, barley, wheat2, wheat3, water, buildings. The ground-truth of the image is shown in Figure 3b, and the corresponding color code is displayed in Figure 3c.

Yihechang Dataset
The Yihechang dataset near a domestic airport in China was obtained by the spaceborne GaoFen3 system of the China National Space Administration (CNSA) in 27 June 2019. The Pauli RGB map are shown in Figure 4a. As a fully polarized image of the C-band, its image size is 590 × 800 pixels, and the resolution is 5m. This dataset is provided by the Aerospace Information Research Institute, Chinese Academy of Sciences. There are four land cover classes identified in this dataset: road, building, grass, and farmland. Figure 4b,c show the ground truth and the corresponding color code, respectively.

San Francisco Dataset
The third dataset, San Francisco, is obtained by RADARSAT-2 in 9 April 2008, which is a spaceborne system of the Canadian Space Agency. It is the C-band full PolSAR image and is composed of 1800 × 1380 pixels. The Pauli RGB image, the ground-truth map, and the color code are respectively shown in Figure 5a-c. The image consists of five major classes: water, vegetation, high-density urban, low-density urban, and developed. To evaluate the quantitative performance of different algorithms, three objective metrics are adopted, namely, overall accuracy (OA), average accuracy (AA), and the Kappa coefficient (κ). Besides, the individual accuracy (CA) of each class is also listed. Specifically, OA refers to the percentage of correctly classified testing samples in all testing samples; AA is the mean of all class accuracies; Kappa is a robustness measurement with the degree of agreement.

Comparison Algorithms and Experimental Setup
To verify the effectiveness of the proposed method, the proposed CK-ENC is compared with some competing methods including Wishart-based ML (WML) [23], regionbased Markov random field (RMRF) [39], random forest (RF) [2], support vector machine (SVM) [6], multilayer projective dictionary pair learning and sparse autoencoder-based method (MDPL-SAE) [10], adaptive nonlocal stacked sparse autoencoder (ANSSAE) [11], SRC with majority voting (SRC-MV) [51], superpixel-based joint SRC (JSRC-SP) [51], Wishartbased joint CRC (W-JCRC) [52], and double kernels SRC (DK-SRC) [28]. In this paper, it is not our main purpose to examine the impact of spatial information on classification results. Therefore, for the WML and RF methods, the spatial information is introduced into these methods by superpixel-based segmentation. They are denoted by S-WML and S-RF, respectively. In addition, we found that the classifiers with the explicit feature-based kernels gain poor classification performance under the condition of limited training samples. Thus, for a fair comparison, we replace these kernels in SRC-MV, JSRC-SP, and W-JCRC methods with the polarimetric second-order statistical kernel. Since the proposed method combines the CK, we construct the CK-SVM classifier according to [6] for testing. The optimization problem of CK-SVM is resolved by the LIBSVM library [53] and the parameters are obtained by cross-validation. In addition, the CK is embedded into CRC and SRC separately to compare the performance of representation-based classifiers. For deep learning-based methods MDPL-SAE and ANSSAE, their parameters are tuned by cross-validation. In this paper, the MDPL-SAE and ANSSAE are implemented in the Keras framework with TensorFlow as the backend. Other methods are all run on MATLAB R2014a. The machine used for experiments is a Lenovo Y720 cube gaming PC with an Intel Core i7-7700 CPU, an Nvidia GeForce GTX 1080 GPU, and 16GB RAM under Ubuntu 18.04 LTS operating system.
For the proposed CK-ENC, we conduct experiments to set the number of superpixels and the regularization parameters λ 1 and λ 2 . It should be noted that the number of superpixels is decided by the initial expected spatial size R for similar pixels search [51]. Thus, we vary the value of R to observe its impact on the classification results. To explore the effect of these parameters on the performance of CK-ENC and tune them, we utilize the leave-one-out cross validation (LOOCV) strategy to conduct the experiments based on available training samples. In addition, the same training and testing samples are chosen in the same set of experiments to ensure consistency [46]. The labeled samples of each dataset were divided into training and test sets randomly. For all datasets, 20 labeled pixels per class are randomly chosen for training, and the remaining labeled samples are treated as the test set. To avoid any bias, the experimental results are repeated ten times, and the mean OA values are reported.

Impact of the Number of Superpixels
We first report experiments about the influence of the number of superpixels on the classification performance. According to the previous theory, the local mean kernel k LMF and the nonlocal Wishart weighted kernel k NWWF are all affected by the number of superpixels. Thus, for all datasets, the ENC with the k LMF (LMK-ENC) and the k NWWF (NWWK-ENC) are respectively employed to determine the optimal number of superpixels. As mentioned above, the optimal number of superpixels means the optimal R. Here, the value of R ranges from 3 to 23, and the interval is 2. Figure 6 illustrates the OA values of the two classifiers under the varying initial spatial sizes. It can be observed that the classification accuracies of LMK-ENC and NWWK-ENC are all worse when the value of R is small. With the increase of R value, the OA curve shows a trend of first increasing and then decreasing. In addition, for three datasets, the optimal R values of NWWK-ENC are all smaller than the optimal value of LMK-ENC. This can be explained as follows. If the value of R is too much smaller, each superpixel may not provide enough spatial information for accurate classification. On the other hand, a much larger value of R may cause more heterogeneous pixels to be contained in each superpixel, which easily leads to insufficient segmentation. Moreover, for NWWK-ENC, the smaller value of R than LMK-ENC can alleviate the effect of heterogeneous pixels. At the same time, enough spatial information is provided through the weighted neighborhood superpixels. Based on the above analysis and the results in Figure 6, the R parameters of LMK-ENC and NWWK-ENC for different datasets are set according to their best performances. The details of the R parameters settings are shown in Table 1.

Impact of the Regularization Parameters
The regularization parameters λ 1 and λ 2 are critical for the proposed CK-ENC, which are used to balance the data item and the penalty items. On the one hand, too much smaller values of λ 1 and λ 2 have no contribution to higher overall classification accuracy. On the other hand, when the values of λ 1 and λ 2 exceed a threshold, too many important features may be lost, resulting in reduced classification accuracy. To search the optimal regularization parameters λ 1 and λ 2 for the experimental datasets, we conduct experiments in the range of λ 1 = 1e − 6, 1e − 5, . . . , 1e − 1 and λ 2 = 1e − 6, 1e − 5, . . . , 1e − 1. Figure 7 demonstrates the results on the three PolSAR datasets. As shown in Figure 7, when λ 1 = 1e − 2 and λ 2 = 1e − 3, the performance of CK-ENC for the Flevoland is the best. For the Yihechang and San Francisco, the best regularization parameters λ 1 and λ 2 are 1e − 3 and 1e − 2, respectively.

Classification Results Comparison
In this sub-section, we evaluate the effectiveness of the proposed PolSAR image classification method by the visualized classification results and the quantitative performance.

Experiment on Flevoland Dataset
The first experiment is carried on the Flevoland dataset. The classification accuracies of different algorithms are shown in Table 2, and the comparison results are shown in Figure 8.
From Table 2, it is apparent that the proposed CK-ENC performs better than other compared methods, in terms of OA, AA, and the Kappa coefficient. Compared with S-WML and RMRF, CK-ENC obtains higher accuracies with a more than 8% improvement in OA. That clearly demonstrates the advantage of capsuling more discriminative features. By utilizing more effective features, S-RF can achieve a better classification result. However, it cannot maintain a balance between different classes. As presented in this table, although the OA of the Bare soil reaches 100%, the OA of the Wheat is only 83%. This phenomenon also appears in CK-SVM, MDPL-SAE, and ANSSAE. The main reason may be that due to the limitation of available labeled samples, the training-based classification algorithms cannot fully explore and learn the inherent polarimetric information. Thus, they cannot identify all classes effectively to maintain a classification balance between classes. For SRC-MV and JSRC-SP based on superpixels, they can achieve classification accuracies than 95%. Additionally, W-JCRC based on the statistical distance-weighted regularization obtains an OA up to 95.94%. However, none of them consider nonlocal spatial information, which results in accuracy lower than CK-ENC. By considering the nonlocal spatial information, the OA of DK-SRC reaches 97.66%. It indicates that it is necessary to introduce nonlocal spatial information for more accurate classification results. Although DK-SRC achieves a great classification result, its performance is not as good as the proposed CK-ENC. Compared with DK-SRC, CK-ENC improves the accuracy of wheat, grass, and barley by about 4%. It may be the result of the integration of more spatial information and the fusion of various types of features. In addition, the results of CK-ENC are better than CK-SRC and CK-CRC, which illustrates that ENC combining 1 and 2 -norm regularized terms outperforms the original SRC and CRC. Overall, for the Flevoland dataset, the proposed CK-ENC achieves the best classification accuracy, especially when the number of labeled samples is limited.  As shown in Figure 8, the proposed CK-ENC performs a better visual effect than other methods and has better agreement with the ground truth. As shown in Figure 8c-e, S-WML and RMRF misclassify a considerably large part of water into bare soil. S-RF achieves a bad result in recognizing wheat3, but it can distinguish water well. As shown in Figure 8f-h, the classification maps by training-based methods have many notable misclassified pixels, which is consistent with the results listed in Table 2. It indicates that the proposed CK-ENC can provide competitive performance even with limited labeled samples. Comparing Figure 8o with Figure 8i,j, the proposed CK-ENC can reduce the number of misclassified homogeneous regions (as highlighted by black ovals). This illustrates that fusing pixelbased features (i.e., PSMF) and capturing more discriminant information is indispensable to improve classification results. From Figure 8k,l, we can see that the classification maps are over-smoothed, and the pixels located around class boundaries are misclassified. The reason is that W-JCRC and DK-SRC utilize rectangle windows to joint neighboring pixels. By contrast, CK-ENC adopts superpixels to provide adaptive spatial information, thereby avoiding mixing pixels belonging to different classes and preserving image boundaries well. In addition, compared with Figure 8m,n, CK-ENC can obtain a better and accurate classification result (as highlighted by white rectangles). According to Table 2 and Figure 8, it can be concluded that the proposed CK-ENC outperforms other compared approaches on the Flevoland dataset.

Experiment on Yihechang Dataset
The second experiment is conducted on the Yihechang dataset. The quantitative evaluation results are listed in Table 3, and the classification maps are shown in Figure 9.
As shown in Table 3, the proposed CK-ENC has the highest accuracy and kappa coefficient. What is more, compared with other methods, CK-ENC has excellent performance for correctly classifying the Farmland. That shows that our method can provide more discriminant information by the multi-feature fusion, thereby achieving satisfying results for complex scattering classes. As shown in Figure 9, the classification map of CK-ENC has fewer remarkable misclassified pixels and is much clearer and smoother compared with other methods. Moreover, CK-ENC significantly reduces the misclassification of the edges and improves the visual coherence of the classification map. Therefore, for the Yihechang dataset, whether objective metrics or visual performance, the proposed CK-ENC delivers better performance than other compared methods.

Experiment on San Francisco Dataset
The third experiment is conducted on the San Francisco dataset. Table 4 reports the quantitative evaluation results for different classification methods. The corresponding classification results are shown in Figure 10.
As shown in Table 4, CK-ENC has the highest OA value and Kappa coefficient, which demonstrates the effectiveness of the proposed method. Moreover, the AA value of CK-ENC is the highest, which proves that our method can extract features with the representation and discrimination ability, thereby maintaining the classification balance between classes. It is noteworthy that, for all urban classes including High-density urban, Low-density urban, and Developed, the CK-ENC yields accuracies more than 95%. This shows that even for similar classes with small differences, the proposed method also well captures within-class variation to present a better performance, which outperforms other competitive methods. From Figure 10, it is apparent that CK-ENC performs the best visual effect. As shown in Figure 10c-l, serious confusions exist between high-density urban and low-density urban. This phenomenon has been weakened in Figure 10m-o, which indicates that the proposed CK can capsule more discriminant information by the feature fusion and eliminate the between-class interference. In addition, compared with Figure 10m,n, the proposed CK-ENC further alleviate this problem (as highlighted by black ovals), which is coincident with the results in Table 4. Moreover, for Developed and Vegetation, CK-ENC shows better visual effects in regional label consistency than other methods (as highlighted by white ovals). In summary, by the fusion of various types of features, the proposed CK-ENC method can captive more discriminative information, thereby exploring more characteristics contained in PolSAR data to obtain more accurate classification results.

Impact of the Kernel Parameter β
To verify the influence of β on the classification result, we select 15 values within the effective range of β for experiments. Figure 11 illustrates the OA values of the proposed method under different β values on the three PolSAR datasets. As shown in Figure 11, the change of β value has little effect on classification performance. Therefore, without loss of generality, we set β to (p − 1)/2 in the above experiments, that is, β = 1.  Figure 11. Impact of the kernel parameter β for three PolSAR datasets.

Impact of the Proposed Composite Kernel
In CK-ENC, the three kernel weights µ PSMF , µ LMF , and µ NWWF can reflect the contribution of the three feature kernels k PSMF , k LMF , and k NWWF for classification. To verify the performance of the proposed CK k CK , Figure 12 illustrates the OA values of different combinations of kernel weight parameters µ PSMF and µ LMF under the condition of µ PSMF + µ LMF + µ NWWF = 1. For the Flevoland dataset, when the three feature kernel are used separately, that is, when µ PSMF = 1, µ LMF = 0, µ NWWF = 0, the OA value is 71.61%, when µ PSMF = 0, µ LMF = 1, µ NWWF = 0, the OA value is 89.50%, and when µ PSMF = 0, µ LMF = 0, µ NWWF = 1, the OA value is 96.01%. Obviously, the OA value obtained by using only the k PSMF is the lowest. This shows that the local mean feature (LMF) and the nonlocal Wishart weighted feature (NWWF) seem to be more effective for higher accuracy than the polarimetric second-order matrix feature (PSMF). When the k PSMF is separately combined with k LMF and k NWWF , i.e., 0 < µ PSMF < 1, µ NWWF = 0 and 0 < µ PSMF < 1, µ LMF = 0, the OA value increases first and then decreases. Especially when µ PSMF varies from 0.2 to 0.9, the OA value keeps decreasing. This indicates that NWWF should be utilized for the PolSAR image classification, but its weight value is comparatively smaller than other features. Therefore, this paper fixes the value of µ PSMF to 0.1 for three datasets. If the k LMF and k NWWF are used at the same time, i.e., µ PSMF = 0, with the increasing of µ LMF , the change of OA value is to increase first and then decrease. The best OA value is 97.10%, which is higher than using both kernels alone. This shows that a suitable combination of LMF and NWWF has a positive impact on classification performance. From Figure 12a, we can observe that the highest OA value occurs when the three kernels are fused, i.e., µ PSMF = 0, µ LMF = 0, µ NWWF = 0. This means the validity of the proposed composite kernel k CK .
For the Yihechang and San Francisco datasets, it can be observed that the best classification results are obtained using CK. In addition, for the San Francisco dataset, although the increase in OA is not obvious when combining with the k PSMF , the contour details of some ground targets are clearer due to the fusion of PSMF. For better interpretation, Figure 13a,b respectively show the classification results without and with the k PSMF . As shown in Figure 13, the proposed method with the k PSMF can identify the fine structures effectively (as highlighted by black rectangles). To sum up, the proposed CK, which integrates three different kernels, can achieve better results than single kernels and dual kernels.

Efficiency Comparison
To assess the efficiency of the proposed CK-ENC, Figure 15 reports the execution time of different methods, including feature extraction time and classification time. For trainingbased methods, the execution time also includes model training time. Besides, these methods are all speed up by using the graphical processing unit (GPU). From Figure 15, we can see that S-WML has the shortest execution time because its classification framework is simple. As the GPU is used for acceleration, S-RF, MDPL-SAE, and ANSSAE methods have certain advantages in terms of execution time. However, compared with CK-ENC, the classification performance of these methods is undesirable. Restricted by the rectangular window operation, the execution time of DK-SRC is longer than other methods. In addition, benefiting from the role of superpixels, the proposed CK-ENC has a rather short time cost compared with DK-SRC. However, the execution time of the proposed CK-ENC is higher than SRC-MV, JSRC-SP, and W-JCRC. The main reason is that CK-ENC needs to calculate three different kernels, while SRC-MV, JSRC-SP, and W-JCRC only need to calculate one kernel. To sum up, taking both time consumption and accuracy into consideration, the proposed CK-ENC can get very competitive classification results.

Conclusions
This paper presents the CK-ENC method to achieve PolSAR image classification under the circumstance of limited training samples. Without any data projection, CK-ENC directly uses the PolSAR CV data as the benchmark data to avoid the loss of polarimetric information. Based on the superpixel segmentation of different scales, CK-ENC introduces a multi-feature extraction strategy to achieve better target contour preservation and enhance the robustness against speckles. In addition, a CK is constructed to effectively implement feature fusion, thereby improving the representation and discrimination capabilities of features. In this way, the proposed CK-ENC can achieve better classification performance. Moreover, to achieve more reliable results with limited training samples, we integrated the CK with the ENC for final PolSAR image classification. Experiments on three PolSAR datasets acquired from different systems evaluated the classification performance and effectiveness of the proposed CK-ENC. The classification results demonstrate that CK-ENC outperforms the state-of-the-art methods both in quantitative metrics and in visual quality, especially under the circumstance of limited training samples. In future work, we will generalize the CK-ENC method to classify dual-frequency PolSAR datasets.
Author Contributions: Y.C. and Y.W. conceived and designed the experiments; Y.C. performed the experiments and analyzed the results; Y.C. wrote the paper; and Y.W., P.Z., W.L., and M.L. revised the paper. All authors have read and agreed to the published version of the manuscript.