PolSAR Image Feature Extraction via Co-Regularized Graph Embedding

Abstract: Dimensionality reduction (DR) methods based on graph embedding are widely used for feature extraction. For these methods, the weighted graph plays a vital role in the process of DR because it characterizes the structure information of the data, and the similarity measurement is a crucial factor in constructing a weighted graph. The Wishart distance of covariance matrices and the Euclidean distance of polarimetric features are two important similarity measurements for polarimetric synthetic aperture radar (PolSAR) image classification. To obtain a satisfactory PolSAR image classification performance, this paper proposes a co-regularized graph embedding (CRGE) method for PolSAR image feature extraction that combines the two distances. First, two weighted graphs are constructed based on the two distances to represent the local structure information of the data. Specifically, neighbouring samples are sought in a local patch to decrease the computation cost and exploit spatial information. Next, the DR model is constructed based on the two weighted graphs and co-regularization. The co-regularization aims to minimize the dissimilarity of the low-dimensional features corresponding to the two weighted graphs. We employ two types of co-regularization and propose the corresponding algorithms. Ultimately, the obtained low-dimensional features are used for PolSAR image classification. Experiments are conducted on three PolSAR datasets, and the results show that co-regularized graph embedding can enhance the performance of PolSAR image classification.


Introduction
Unaffected by illumination and weather conditions, polarimetric synthetic aperture radar (PolSAR) is able to acquire high-resolution images. Moreover, PolSAR uses different polarization combinations to obtain richer information about land covers than single-polarization SAR [1][2][3]. Therefore, PolSAR is a powerful tool for land cover classification. How appropriate features are extracted greatly affects PolSAR image classification performance. Thus, this paper focuses on feature extraction.
In the early years, Lee et al. [4] put forward the Wishart classifier to classify PolSAR images and gave the definition of the Wishart distance, which was derived from the Bayes maximum likelihood classifier under the complex Wishart distribution. The Wishart classifier has become one of the most classical PolSAR image classification methods and is widely used. Afterwards, Lee et al. [5] combined the H/A/α decomposition with the Wishart classifier for unsupervised PolSAR image classification. Three important polarimetric parameters of the H/A/α decomposition (i.e., entropy H, anisotropy A, and alpha angle α) and the total backscattering power SPAN were used to initialize cluster centers for the Wishart classifier in [6]. Wu et al. [7] employed a Markov random field (MRF) on the basis of the Wishart distribution and regions for PolSAR image classification. The symmetric revised Wishart (SRW) distance was proposed to construct a weighted graph for spectral clustering in [8]. Ersahin et al. [9] used the SRW distance for spectral graph partitioning based on contour information and spatial proximity. The SRW distance has also been introduced into a superpixel segmentation method, i.e., simple linear iterative clustering (SLIC), for segmenting PolSAR images into superpixels [10]. Recently, the Wishart distance was used in a deep learning architecture, i.e., the deep stacking network (DSN), named Wishart DSN, for PolSAR image classification [11].
For the majority of PolSAR image classification methods, polarimetric features are usually exploited as features [12][13][14][15][16][17]. Some original PolSAR data, i.e., the covariance matrix, scattering matrix, and coherency matrix, can yield polarimetric features. Moreover, polarimetric features are also obtained from numerous polarimetric target decomposition methods, namely the Yamaguchi, Pauli, Yang, Krogager, Huynen, Van Zyl, Freeman, and H/A/α decompositions [12][13][14][16]. Zhang et al. [14] used sparse representation based on polarimetric features to classify PolSAR images. The coherency matrix was converted into a 6D feature vector and fed into a deep convolutional neural network (CNN) [15]. The complex-valued CNN [17] was put forward to use the amplitude and phase information of PolSAR data for PolSAR image classification. In some recent works, visual features, namely color features and texture features, were extracted for PolSAR image classification [18][19][20]. Uhlmann et al. [18] summarized three kinds of features of PolSAR images, i.e., polarimetric features, color features, and texture features, and evaluated the classification performance of different feature combinations. Ren et al. [20] proposed a manifold regularized low-rank representation method based on polarimetric features and texture features for PolSAR image classification.
To acquire low-dimensional features for PolSAR image classification, some dimensionality reduction (DR) methods have been applied. Tu et al. [12] used Laplacian eigenmaps (LE) to extract low-dimensional features for classification. Shi et al. [13] proposed supervised graph embedding (SGE) based on discriminative information for supervised PolSAR image classification. Because of the significance of spatial information, some tensor-based DR methods have been introduced for PolSAR image feature extraction, such as tensor local discriminant embedding (TLDE) [21], tensorial independent component analysis (TICA) [22], and tensorial locally linear embedding [23]. Ren et al. [24] proposed a tensor embedding framework for PolSAR image feature extraction.
DR methods can be divided into linear and nonlinear methods. Traditional DR methods, i.e., linear discriminant analysis (LDA) and principal component analysis (PCA), are supervised and unsupervised linear methods, respectively [25]. The kernel trick is used to extend linear DR methods to nonlinear ones [26], e.g., kernel Fisher discriminant analysis and kernel PCA. Some nonlinear DR methods aim to preserve the local or global geometric information of a nonlinear manifold structure, such as Laplacian eigenmaps (LE) [27], Isomap [28], and locally linear embedding (LLE) [29]. Yan et al. proposed the graph embedding framework, which reformulates all the above DR methods in a unified way [30]. In the graph embedding DR framework, a weighted graph is constructed to represent the structure information of the data that we attempt to preserve during DR; different weighted graphs generate different methods. The similarity measurement plays a key role in constructing a weighted graph: the weighted graph of LE is constructed based on local pairwise Euclidean distances [27], while that of Isomap is constructed based on global pairwise geodesic distances [28].
However, the above DR methods are mainly designed for single-view data. Even for multi-view data, they simply concatenate the vectors from multiple views into a single vector and neglect the complementarity and correlation of different views. Kan et al. [31] proposed multi-view discriminant analysis (MDA) to extend LDA to multi-view cases. Multiview spectral embedding (MSE) was proposed to learn the complementarity of multiple views [32]. Kumar et al. [33] put forward a spectral clustering method based on co-regularization for multi-view data. These methods are also based on graph embedding and construct different graphs for the data from different views.
From the above works, we can see that the Wishart distance and polarimetric features are two indispensable factors for PolSAR image classification. For graph embedding DR methods, the similarity measurement is a crucial factor in constructing a weighted graph, and different distances can describe the similarity of samples more comprehensively from different perspectives. Therefore, both the Wishart distance of covariance matrices and the Euclidean distance of polarimetric features are used in this paper. To extract more comprehensive information for a better classification performance, this paper puts forward a feature extraction method combining the two distances; part of this work was introduced in [34]. Based on the two distances, we construct two weighted graphs to represent the local structure information. To use spatial information and decrease the computation cost, we seek neighbouring samples in a local patch instead of globally. Then the co-regularized graph embedding (CRGE) model is formed on the basis of the two weighted graphs and co-regularization, where the co-regularization is defined to measure the dissimilarity of the low-dimensional features corresponding to the two graphs. The obtained low-dimensional features are used for the final image classification.
The remainder of this paper is organized as follows. Some background knowledge is described briefly in Section 2. The proposed co-regularized graph embedding method is presented in detail in Section 3. Section 4 shows the experiments conducted on three PolSAR data sets and the analysis of the results. Section 5 gives some discussion of the proposed method. Finally, Section 6 concludes this paper.

Graph Embedding DR Framework
Given n samples E = [e_1, e_2, ..., e_n]^T ∈ R^{n×D}, where n and D denote the number and the dimensionality of samples, respectively, the graph embedding DR framework given in [30] aims to find low-dimensional representations F = [f_1, f_2, ..., f_n]^T ∈ R^{n×d}, where d is the reduced dimensionality and d < D.
For graph embedding, we first need to construct an intrinsic weighted graph G to describe the similarity relationships among samples. The optimization problem is

F* = arg min_{F^T P F = b} Σ_{i≠j} ||f_i − f_j||^2 G_ij = arg min_{F^T P F = b} tr(F^T L F),

where L = D − G is the Laplacian matrix and D is a diagonal matrix with elements D_ii = Σ_j G_ij. P can be either the Laplacian matrix of a penalty graph G^p, which characterizes similarity properties that we try to suppress, or a diagonal matrix for scale normalization. b is usually a constant.
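As a minimal numerical sketch of this framework (taking P = I, i.e., a plain orthonormality constraint, which is an assumption for illustration), the embedding reduces to the d eigenvectors of the Laplacian with the smallest eigenvalues:

```python
import numpy as np

def graph_embedding(G, d=2):
    """Sketch of graph embedding DR with P = I: minimize tr(F^T L F)
    subject to F^T F = I, where L = D - G is the graph Laplacian.
    The solution is the d eigenvectors of L with smallest eigenvalues."""
    D = np.diag(G.sum(axis=1))      # degree matrix
    L = D - G                       # graph Laplacian
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return vecs[:, :d]              # rows are low-dimensional samples

# toy weighted graph: two tightly connected pairs of samples
G = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
F = graph_embedding(G, d=2)
```

The returned columns are orthonormal, so the constraint F^T F = I is satisfied by construction.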

PolSAR Data
For PolSAR, an observed target can be characterized by the 2 × 2 scattering matrix S [1]:

S = [ S_HH  S_HV ;  S_VH  S_VV ],

where S_HH, S_HV, S_VH and S_VV represent the backscattering coefficients corresponding to the four channels HH, HV, VH and VV. By the reciprocity property, S_HV is equal to S_VH. The scattering matrix S can then be vectorized as the scattering vector k = [S_HH, √2 S_HV, S_VV]^T. Regarding multilook PolSAR data, the covariance matrix is computed as

C = (1/n_L) Σ_{i=1}^{n_L} k_i k_i^{*T},

where n_L is the number of looks and the superscript * represents the complex conjugate.
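The multilook averaging above can be sketched on synthetic data (the scattering vectors here are random complex numbers, not real PolSAR measurements):

```python
import numpy as np

def covariance_matrix(looks):
    """Multilook covariance matrix C = (1/n_L) * sum_i k_i k_i^{*T},
    where each row of `looks` is one scattering vector
    k = [S_HH, sqrt(2)*S_HV, S_VV]^T."""
    n_L = looks.shape[0]
    C = np.zeros((3, 3), dtype=complex)
    for k in looks:
        k = k.reshape(3, 1)
        C += k @ k.conj().T   # outer product of one look
    return C / n_L

# synthetic four-look example
rng = np.random.default_rng(0)
looks = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
C = covariance_matrix(looks)
```

By construction C is Hermitian and positive semidefinite, which is what the Wishart and SRW distances below require.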

Wishart Distance
The Wishart distance is frequently used as the similarity measurement for PolSAR image classification [1,4,5]. It was derived from the Bayes maximum likelihood classifier under the complex Wishart distribution [4]. The original Wishart distance is defined as

d_W(C, Σ) = ln|Σ| + tr(Σ^{−1} C),

where C is a sample covariance matrix and Σ is the cluster mean of the class. In this paper, we use the symmetric revised Wishart (SRW) distance to construct a graph, because the SRW distance satisfies some conditions for a general metric, such as definiteness, generalized nonnegativity and symmetry [8]. The SRW distance of two p × p covariance matrices A and B is defined as

d_SRW(A, B) = (1/2) [tr(A^{−1} B) + tr(B^{−1} A)] − p.
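A small sketch of the SRW distance, under the assumption that both covariance matrices are positive definite (here illustrated with real diagonal matrices for simplicity):

```python
import numpy as np

def srw_distance(A, B):
    """Symmetric revised Wishart distance between two p x p covariance
    matrices: d = 0.5 * (tr(A^{-1} B) + tr(B^{-1} A)) - p, the form
    used in [8]. Assumes A and B are positive definite."""
    p = A.shape[0]
    t1 = np.trace(np.linalg.solve(A, B)).real  # tr(A^{-1} B)
    t2 = np.trace(np.linalg.solve(B, A)).real  # tr(B^{-1} A)
    return 0.5 * (t1 + t2) - p

A = np.diag([1.0, 2.0, 3.0])
B = np.diag([2.0, 2.0, 1.0])
```

It is easy to verify the metric-like properties mentioned above: d_SRW(A, A) = 0, d_SRW(A, B) = d_SRW(B, A), and d_SRW(A, B) ≥ 0 for positive definite matrices (each eigenvalue λ of A^{−1}B contributes λ + 1/λ ≥ 2).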

The Proposed Method
This section describes the proposed co-regularized graph embedding method in detail. Figure 1 presents the whole process of PolSAR image classification based on co-regularized graph embedding. The proposed method operates on superpixels rather than single pixels to decrease the computation cost and use spatial information. Therefore, we first segment PolSAR images into superpixels, as described in detail in Section 4. Based on the Wishart distance of covariance matrices and the Euclidean distance of polarimetric features, we construct two corresponding weighted graphs. Then the co-regularized graph embedding DR model is built based on the two graphs and co-regularization for feature extraction. The obtained low-dimensional features are used for PolSAR image classification.

Polarimetric Features
Polarimetric features describe the polarimetric scattering and physical properties of targets and can usually be extracted from PolSAR data and polarimetric decomposition methods. In the proposed method, we extract thirty polarimetric features from the covariance matrix C and five decomposition methods (i.e., the Pauli, H/A/α, Krogager, Huynen, and Freeman decompositions). In detail,
• six are from the elements of the covariance matrix C, i.e., C_11, C_22, C_33, ReC_12, ImC_12, ReC_13, ImC_13, ReC_23, ImC_23;
• three are from the Pauli decomposition, i.e., |a|^2, |b|^2, |c|^2, which denote the powers of an isotropic odd, an even, and a π/4 even scatterer;
• three are from the Krogager decomposition, i.e., |k_s|^2, |k_d|^2, |k_h|^2, which denote the powers of a sphere, a diplane, and a helix scatterer;
• three are from the Freeman decomposition, i.e., P_s, P_d, P_v, which denote the powers of a surface, a double-bounce, and a volume scatterer;
• nine are from the Huynen decomposition, i.e., A_0, B_0 + B, B_0 − B, C, D, E, F, G, H, which denote information corresponding to symmetry, nonsymmetry, irregularity, linearity, curvature, torsion, helicity, glue, and orientation;
• six are from the H/A/α decomposition, i.e., λ_1, λ_2, λ_3, H, A, α, which denote the three eigenvalues, entropy, anisotropy, and alpha angle.
Therefore, sample i is denoted by a 30D feature vector e_i ∈ R^30. Here, a "sample" means a superpixel rather than a single pixel; the details are introduced in Section 4.
The similarity of two samples is characterized by the Euclidean distance of their polarimetric feature vectors. Concretely, for sample i and sample j, the Euclidean distance d_E is

d_E(e_i, e_j) = ||e_i − e_j||_2.

Constructing Two Graphs
Based on the Wishart distance of covariance matrices, d_SRW, and the Euclidean distance of polarimetric features, d_E, we construct two corresponding graphs G^(1) and G^(2).
To use the spatial information and reduce the computation burden, we search for k neighbouring samples in an h × h local region rather than over the full image. Specifically, for sample i, we search for the k nearest samples in the h × h region centered on sample i based on d_SRW and d_E; the two resulting neighbour sets are denoted as O(C_i, k) and O(e_i, k), respectively.
Then we construct the two graphs G^(1) and G^(2) as

G^(1)_ij = exp(−d_SRW(C_i, C_j)/t_1) if C_j ∈ O(C_i, k) or C_i ∈ O(C_j, k), and G^(1)_ij = 0 otherwise;
G^(2)_ij = exp(−d_E(e_i, e_j)/t_2) if e_j ∈ O(e_i, k) or e_i ∈ O(e_j, k), and G^(2)_ij = 0 otherwise,

where t_1 and t_2 are two parameters. Here, we select t_1 = max(d_SRW(C_i, C_j)) and t_2 = max(d_E(e_i, e_j)).
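The same construction applies to both graphs, so a single helper suffices; the following sketch assumes heat-kernel weights exp(−d/t) over the k nearest neighbours found inside each sample's local patch (the patch-membership lists and the toy distances are illustrative inputs, not real PolSAR data):

```python
import numpy as np

def local_knn_graph(dist, patch_members, k, t):
    """Build one weighted graph: for each sample i, keep heat-kernel
    weights exp(-d(i, j) / t) only for its k nearest neighbours
    searched inside its h x h patch; all other entries are 0.
    dist:          (n, n) pairwise distances (d_SRW or d_E)
    patch_members: patch_members[i] = indices of samples in i's patch"""
    n = dist.shape[0]
    G = np.zeros((n, n))
    for i in range(n):
        cand = [j for j in patch_members[i] if j != i]
        nearest = sorted(cand, key=lambda j: dist[i, j])[:k]
        for j in nearest:
            G[i, j] = np.exp(-dist[i, j] / t)
    return np.maximum(G, G.T)   # symmetrize the graph

# toy example: 4 samples, every patch contains all samples
dist = np.array([[0.0, 1.0, 4.0, 4.0],
                 [1.0, 0.0, 4.0, 4.0],
                 [4.0, 4.0, 0.0, 1.0],
                 [4.0, 4.0, 1.0, 0.0]])
patch = [list(range(4))] * 4
G = local_knn_graph(dist, patch, k=1, t=dist.max())
```

Calling it once with d_SRW and t_1 yields G^(1), and once with d_E and t_2 yields G^(2).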

Co-Regularized Graph Embedding Model
Based on the two graphs G^(1) and G^(2), we pursue the corresponding low-dimensional representations F^(1) ∈ R^{n×d} and F^(2) ∈ R^{n×d}, which can be obtained via the following objective functions:

min_{F^(1)} tr(F^(1)T L^(1) F^(1))  s.t.  F^(1)T F^(1) = I,     (7)

min_{F^(2)} tr(F^(2)T L^(2) F^(2))  s.t.  F^(2)T F^(2) = I,     (8)

where L^(1) and L^(2) are the normalized Laplacian matrices of G^(1) and G^(2). Concretely, L^(v) = (D^(v))^{−1/2} (D^(v) − G^(v)) (D^(v))^{−1/2} for v = 1, 2, where D^(1) and D^(2) are the diagonal degree matrices of G^(1) and G^(2), with diagonal elements D^(v)_ii = Σ_j G^(v)_ij.
To not only exploit polarimetric features but also bring in the Wishart distance, we combine problems (7) and (8) through a co-regularization term. The optimization problem of the co-regularized graph embedding DR method is

min_{F^(1), F^(2)} α tr(F^(1)T L^(1) F^(1)) + (1 − α) tr(F^(2)T L^(2) F^(2)) + λ Cor(F^(1), F^(2))
s.t. F^(1)T F^(1) = I, F^(2)T F^(2) = I,     (9)

where α and 1 − α are parameters balancing the two low-dimensional features, and λ is the parameter for the co-regularization term. Here Cor(F^(1), F^(2)) is the co-regularization term, which measures the dissimilarity of the two low-dimensional features F^(1) and F^(2). A natural choice is

Cor1(F^(1), F^(2)) = ||F^(1) − F^(2)||_F^2.     (10)

As in [33], another commonly used measure compares the similarity matrices K^(v) = F^(v) F^(v)T of the two views:

Cor2(F^(1), F^(2)) = || K^(1)/||K^(1)||_F − K^(2)/||K^(2)||_F ||_F^2.     (11)

Optimization
Specific to the two different co-regularization terms (10) and (11), we describe how to solve the corresponding optimization problems.
(1) When Cor1(F^(1), F^(2)) = ||F^(1) − F^(2)||_F^2 is used, problem (9) becomes

min_{F^(1), F^(2)} α tr(F^(1)T L^(1) F^(1)) + (1 − α) tr(F^(2)T L^(2) F^(2)) + λ ||F^(1) − F^(2)||_F^2
s.t. F^(1)T F^(1) = I, F^(2)T F^(2) = I.     (12)

Assuming f^(1), f^(2) are corresponding column vectors of F^(1), F^(2), the Lagrange function of problem (12) is

L(f^(1), f^(2), μ) = α f^(1)T L^(1) f^(1) + (1 − α) f^(2)T L^(2) f^(2) + λ ||f^(1) − f^(2)||^2 − μ (f^(1)T f^(1) + f^(2)T f^(2) − 2),

where μ is the Lagrange multiplier. Subsequently, we take the partial derivatives of L(f^(1), f^(2), μ) and set them to zero:

α L^(1) f^(1) + λ (f^(1) − f^(2)) = μ f^(1),
(1 − α) L^(2) f^(2) + λ (f^(2) − f^(1)) = μ f^(2).

The above two equations can be reformulated in matrix form:

[ α L^(1) + λI        −λI             ] [ f^(1) ]       [ f^(1) ]
[ −λI                 (1 − α) L^(2) + λI ] [ f^(2) ]  = μ [ f^(2) ].

By performing the eigenvalue decomposition of this 2n × 2n matrix, we select the d eigenvectors corresponding to the smallest d eigenvalues to form [F^(1); F^(2)]. The detailed algorithm is given in Algorithm 1.
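A numerical sketch of this joint eigenproblem; the block structure of the 2n × 2n matrix is reconstructed from the stationarity conditions of problem (12), so it should be read as an illustration rather than the paper's exact implementation:

```python
import numpy as np

def crge_cor1(L1, L2, alpha, lam, d):
    """CRGE with Cor1: stack the two stationarity conditions into one
    symmetric 2n x 2n eigenproblem and take the d eigenvectors with
    the smallest eigenvalues."""
    n = L1.shape[0]
    I = np.eye(n)
    M = np.block([[alpha * L1 + lam * I, -lam * I],
                  [-lam * I, (1 - alpha) * L2 + lam * I]])
    vals, vecs = np.linalg.eigh(M)   # ascending eigenvalues
    W = vecs[:, :d]                  # 2n x d
    return W[:n], W[n:]              # split back into F^(1), F^(2)

# toy Laplacians of two identical 4-node graphs
G = np.array([[0, 1, 0, 0], [1, 0, 0, 0],
              [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
L = np.diag(G.sum(axis=1)) - G
F1, F2 = crge_cor1(L, L, alpha=0.5, lam=0.2, d=2)
```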
Algorithm 1: CRGE with Cor1.
Step 1: Perform the eigenvalue decomposition of the matrix [α L^(1) + λI, −λI; −λI, (1 − α) L^(2) + λI].
Step 2: Select the d eigenvectors corresponding to the smallest d eigenvalues as [F^(1); F^(2)].
Output: Low-dimensional features F^(1) and F^(2).
(2) When Cor2(F^(1), F^(2)) is used, disregarding the constant and scale terms, we approximately have

Cor2(F^(1), F^(2)) ≈ −tr(F^(1) F^(1)T F^(2) F^(2)T).

Therefore, problem (9) becomes

min_{F^(1), F^(2)} α tr(F^(1)T L^(1) F^(1)) + (1 − α) tr(F^(2)T L^(2) F^(2)) − λ tr(F^(1) F^(1)T F^(2) F^(2)T)
s.t. F^(1)T F^(1) = I, F^(2)T F^(2) = I.     (18)

Problem (18) can be solved by an alternating optimization algorithm for F^(1) and F^(2). First, by performing eigenvalue decompositions of L^(1) and L^(2), the d eigenvectors corresponding to the d smallest eigenvalues are used as the initial F^(1) and F^(2). The iterative process is as follows:
• Fixing F^(1), solve for F^(2). Problem (18) becomes

min_{F^(2)} tr(F^(2)T [(1 − α) L^(2) − λ F^(1) F^(1)T] F^(2))  s.t.  F^(2)T F^(2) = I,

so the d eigenvectors corresponding to the d smallest eigenvalues of (1 − α) L^(2) − λ F^(1) F^(1)T form F^(2).
• Fixing F^(2), solve for F^(1). Problem (18) similarly becomes an eigenproblem, and the d eigenvectors corresponding to the d smallest eigenvalues of α L^(1) − λ F^(2) F^(2)T form F^(1).
The stop error ε or the maximum number of iterations T_max can be used as the stopping criterion of the iterative process. The detailed algorithm is given in Algorithm 2.
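The alternating procedure can be sketched as follows (a minimal implementation of the iterative scheme described above; the toy Laplacians are illustrative inputs):

```python
import numpy as np

def crge_cor2(L1, L2, alpha, lam, d, T_max=10, eps=1e-6):
    """Alternating optimization for CRGE with Cor2 (a sketch of
    Algorithm 2): each step fixes one feature matrix and takes the
    d smallest eigenvectors of the other view's modified Laplacian."""
    def smallest(M, d):
        vals, vecs = np.linalg.eigh(M)
        return vecs[:, :d]

    F1, F2 = smallest(L1, d), smallest(L2, d)   # initialization
    for _ in range(T_max):
        F2_new = smallest((1 - alpha) * L2 - lam * (F1 @ F1.T), d)
        F1_new = smallest(alpha * L1 - lam * (F2_new @ F2_new.T), d)
        done = (np.linalg.norm(F1_new - F1) < eps and
                np.linalg.norm(F2_new - F2) < eps)
        F1, F2 = F1_new, F2_new
        if done:
            break
    return np.hstack([F1, F2])   # [F^(1) F^(2)] as final features

G = np.array([[0, 1, 0, 0], [1, 0, 0, 0],
              [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
L = np.diag(G.sum(axis=1)) - G
F = crge_cor2(L, L, alpha=0.5, lam=0.2, d=2)
```

Note that eigenvectors are defined only up to sign, so in practice the convergence check may be replaced by simply running T_max iterations.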
Based on the obtained low-dimensional features F^(1) and F^(2), the concatenation [F^(1), F^(2)] is used as the feature for the final classification.

Computational Complexity Analysis of the Proposed Method
The computational complexity of the proposed method mainly comes from two steps, i.e., constructing the two weighted graphs and solving the CRGE model. First, because the neighbouring samples are searched in a local region containing only a small number of candidates, the computational complexity of graph construction is O(nk), where n is the number of samples and k is the number of neighbouring samples. Then, for CRGE with Cor1, we need to perform the eigenvalue decomposition of a 2n × 2n matrix, so the computational complexity is O((2n)^3). For CRGE with Cor2, the alternating iteration algorithm is needed; each iteration involves two eigenvalue decompositions of n × n matrices, so the computational complexity per iteration is O(2n^3). Therefore, the total computational complexity of the whole iterative process is O(2Tn^3), where T is the number of iterations.

Experiments
To validate the effectiveness of the co-regularized graph embedding method, experiments are implemented on three PolSAR data sets.

Description of Data Sets
The first one is the Flevoland data set, covering a part of the cropland of Flevoland in the Netherlands. The size of the Flevoland data set is 750 × 1024. Fifteen classes of crops are considered: stem beans, potatoes, peas, lucerne, wheat I, forest, beet, bare soil, grass, rapeseed, barley, wheat II, wheat III, water, and buildings. Figure 2a shows the PauliRGB image of this data set, where red is for |S_HH − S_VV|, green is for |S_HV + S_VH|, and blue is for |S_HH + S_VV|. Figure 2b shows the corresponding ground truth for the fifteen classes of crops. To conduct more experiments to discuss parameter setting, we first select a subarea of the Flevoland data set with 200 × 320 pixels. The subset consists of nine classes of crops: stem beans, rapeseed, wheat I, grass, lucerne, potatoes, bare soil, wheat II, and sugar beet. The PauliRGB image of the subset and its ground truth map are shown in Figure 2d,e. The second data set is the Oberpfaffenhofen data set for the Oberpfaffenhofen area in Germany. The data set has 700 × 700 pixels, which include three classes of land covers: wood land, built-up areas, and open areas. Figure 3a shows the corresponding PauliRGB image and Figure 3b is the ground truth map. The third data set is the San Francisco Bay data set for the San Francisco Bay area in the USA. It is a four-look L-band data set and has 900 × 1024 pixels. Four classes of land covers are considered, i.e., mountains, sea, buildings, and grass [35]. Figure 4a shows the PauliRGB image and Figure 4b is the ground truth map. Because of the coherent imaging mechanism, strong speckle noise exists in PolSAR images. Therefore, the refined Lee filter is first applied for speckle reduction. Furthermore, we segment each PolSAR image into many superpixels to enhance the classification performance and decrease the computation cost. In our experiments, the adaptive superpixel generation method [36] is used to obtain superpixels.
Figures 2c, 3c and 4c show the segmentation results for the three data sets, respectively. We then conduct experiments based on superpixels rather than single pixels. The mean of the polarimetric feature vectors of the pixels in a superpixel is used as the polarimetric feature vector of the superpixel, and the mean of the covariance matrices of the pixels in a superpixel is used as the covariance matrix of the superpixel.
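The per-superpixel averaging can be sketched as follows (the feature values and superpixel labels are toy inputs; the same elementwise mean applies to the 3 × 3 covariance matrices):

```python
import numpy as np

def superpixel_features(pixel_features, superpixel_labels):
    """Average per-pixel polarimetric feature vectors within each
    superpixel; the mean vector represents the superpixel."""
    ids = np.unique(superpixel_labels)
    return np.vstack([pixel_features[superpixel_labels == s].mean(axis=0)
                      for s in ids])

# toy example: 5 pixels with 3-D features grouped into 2 superpixels
feats = np.array([[1.0, 0.0, 2.0],
                  [3.0, 0.0, 2.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 3.0, 1.0],
                  [0.0, 2.0, 1.0]])
labels = np.array([0, 0, 1, 1, 1])
sp = superpixel_features(feats, labels)   # one row per superpixel
```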
Regarding methods for comparison, the classical PolSAR image classification method, i.e., the Wishart classifier (WC) [4], is used as a baseline. Moreover, two DR methods, i.e., principal component analysis (PCA) and Laplacian eigenmaps (LE) [12], are employed for comparison with the proposed method. To further validate the effectiveness of the combination by co-regularization, two extreme cases of the proposed method are also used for comparison. The case λ = 0, α = 1 means that we use only the weighted graph based on the Wishart distance, which is equal to Wishart distance-based Laplacian embedding (WDLE). In the same way, the case λ = 0, α = 0 means that we use only the weighted graph based on the Euclidean distance of polarimetric features, which is equal to polarimetric feature-based Laplacian embedding (PFLE). Please note that for WDLE and PFLE, neighbouring samples are searched in a local region, which is different from LE [12] (LE searches for neighbouring samples globally). In addition, all methods are based on superpixels. To emphasize the role of feature extraction methods in PolSAR image classification, we simply employ the nearest neighbour (NN) classifier for classification based on the low-dimensional features obtained by PCA, LE, PFLE, WDLE, and the proposed CRGE. We randomly select 1% of the samples for training the NN classifier and report the average accuracy over 10 runs as the final accuracy.

Parameter Setting
The parameter setting is discussed in this part. The experiments on parameter setting are conducted on the subset of the Flevoland data set. For the two types of co-regularization mentioned in Section 3, i.e., Equations (10) and (11), the corresponding models are problems (12) and (18); the first one is denoted as CRGE-Cor1 and the other as CRGE-Cor2. Experiments on CRGE-Cor1 and CRGE-Cor2 are carried out with different sizes of the local patch h × h and numbers of neighbouring samples k. The classification accuracies are shown in Figure 5. We list three cases: h = 41, h = 61, and h = 81. Because the proposed method is based on superpixels, each containing about 200 pixels, the number of superpixels in an h × h patch is limited; for example, the 41 × 41 region contains 7 superpixels at most, as shown in Figure 5a. We can see that CRGE-Cor2 performs much better than CRGE-Cor1. CRGE-Cor1 performs poorly under different sizes of the local region and numbers of neighbouring samples: its highest classification accuracy is lower than 90% and much lower than that of CRGE-Cor2. Therefore, we adopt the second type of regularization, i.e., CRGE-Cor2, for the following experiments. Please note that CRGE means CRGE-Cor2 in the following description. Moreover, regarding the size of the local patch h × h and the number of neighbouring samples k, taking both the classification performance and the computation burden into account, we select a 61 × 61 local region and 10 neighbouring samples, i.e., h = 61, k = 10, for the experiments on the subset of the Flevoland data set. Regarding the reduced dimensionality d, Figure 6 shows the classification accuracy of the six methods with dimensionality ranging from 1 to 10. We can see that the classification accuracy stays unchanged when the dimensionality is larger than 5. Therefore, we select d = 6 as the reduced dimensionality.
Moreover, the proposed co-regularized graph embedding method has a higher accuracy than the other methods across the range of reduced dimensionalities. When the reduced dimensionality is d = 4, the classification accuracy of the proposed method reaches nearly 100%. For the other three data sets, i.e., the full Flevoland data set, the Oberpfaffenhofen data set, and the San Francisco Bay data set, the classification accuracies of the proposed method with dimensionality ranging from 1 to 10 are shown in Figure 7. We can see that when the dimensionality reaches 6, the classification accuracy almost stops increasing. Therefore, the reduced dimensionality is set to 6 for the three data sets. The stop error for the iterative process is set to ε = 10^−6. The maximum number of iterations is set to T_max = 10. The parameters α and λ are set to α = 0.1, λ = 0.2. Table 1 gives the classification accuracies of the six methods on the subset of the Flevoland data set, including the user accuracy (UA) and producer accuracy (PA) for each class and the overall accuracy (OA). Note that bold indicates the highest UA, PA, or OA. Figure 8 presents the visual classification maps of the six methods on the subset of the Flevoland data set. We can see that LE performs worst among the six methods. WC performs nearly as well as PCA. PFLE performs a little better than PCA and WC, while WDLE performs somewhat worse than WC, PCA, and PFLE. The overall accuracy (OA) of the proposed CRGE is higher than those of the other methods, and the user accuracy (UA) for each class of the proposed method is higher than those of the compared methods. The overall accuracy of the proposed method is larger than 99%, and its classification map is nearly identical to the ground truth map. Moreover, although neither PFLE nor WDLE achieves a satisfactory classification performance on its own, the combination of the two distances by co-regularization improves the classification performance.
Table 2 gives the classification accuracies of the six methods on the Flevoland data set, including the user accuracy (UA) and producer accuracy (PA) for each class and the overall accuracy (OA). Figure 9 presents the visual classification maps of the six methods on the Flevoland data set. It is obvious that PFLE and WDLE perform very badly, while LE performs better than them. The reason may be that the Flevoland data set is large and contains many classes; samples from one class are distributed quite diversely, so a local search based on only one distance is not suitable for this data set. However, the proposed CRGE performs best among the compared methods, which demonstrates that the combination of the two distances by co-regularization can improve the classification performance. PCA performs better than LE and WC. The overall accuracy of the proposed CRGE method is about 5% higher than those of the other methods. Table 3 gives the classification accuracies of the six methods on the Oberpfaffenhofen data set, including the user accuracy (UA) and producer accuracy (PA) for each class and the overall accuracy (OA). Figure 10 presents the visual classification maps of the six methods on the Oberpfaffenhofen data set. The size of the local patch is set to h = 81 and the number of neighbouring samples to k = 15. The Wishart classifier performs worst among the compared methods, mainly because a great number of "built-up areas" pixels are misclassified, as shown in Figure 10b. Table 3 shows that the user accuracy of the class "built-up areas" is 36.39%, while the user accuracies of the other classes are larger than 85%. PCA, LE, PFLE, and WDLE perform similarly. The classification accuracy of the proposed CRGE method reaches 98%, and its classification map also appears closer to the ground truth map. Table 4 gives the classification accuracies of the six methods on the San Francisco Bay data set, including the user accuracy (UA) and producer accuracy (PA) for each class and the overall accuracy (OA).
Figure 11 presents the visual classification maps of the six methods on the San Francisco Bay data set. The size of the local patch and the number of neighbouring samples are set to h = 101 and k = 20. The Wishart classifier still performs worst among the compared methods, mainly because many samples from three classes of land covers (mountains, grass, and buildings) are misclassified. PCA, LE, and WDLE perform similarly, and PFLE performs worse than them. The proposed co-regularized graph embedding method is superior to the other methods: its classification accuracy is about 2% higher than those of the other methods, and its user accuracy and producer accuracy for each class are higher than those of the compared methods. Obviously, the classification map of the proposed method appears more similar to the ground truth map. Table 5 shows the computational time of the six methods on the three data sets. We can see that WC costs little time on the Oberpfaffenhofen data set and the San Francisco Bay data set because the classification of WC is based on the mean of each class, while the other methods employ the nearest neighbour (NN) classifier, which compares against each sample of a class. Therefore, the other methods cost more time than WC. However, for the Flevoland data set, WC takes more time than PCA because the number of classes is larger than in the other data sets. The computational time of PCA is relatively less than those of LE, PFLE, WDLE, and CRGE because PCA has an explicit projection matrix to extract low-dimensional features, which reduces the computational time. PFLE and WDLE take less time than LE because of the local search. The proposed CRGE costs the most time because it uses an iterative process to pursue the low-dimensional features.

Discussion
Because the Wishart distance of covariance matrices and the Euclidean distance of polarimetric features are two important similarity measurements for PolSAR image classification, this paper proposes the co-regularized graph embedding DR method to combine the two distances. Two weighted graphs are constructed corresponding to the two distances, and the DR model is built based on the two graphs and co-regularization.
Observing the experimental results, we can see from Figure 5 that Cor2 is suitable for PolSAR image classification and that the corresponding DR model can enhance the classification performance. Under different sizes of the local patch and numbers of neighbouring samples, CRGE-Cor1 performs much worse than CRGE-Cor2; therefore, we employ CRGE-Cor2 in the experiments. It is obvious that the proposed method performs better than the compared methods on the three data sets, especially on the Flevoland data set, where it increases the overall accuracy by 5% and produces a better classification map. In fact, classification of the Flevoland data set is a more difficult problem because the data set contains 15 classes and is large, which is strong evidence of the superiority of the proposed method. Moreover, the two extreme cases of the proposed method, i.e., PFLE and WDLE, perform worst on the Flevoland data set among the six methods, and on the other data sets neither of them achieves a satisfactory classification performance. Therefore, we can conclude that the combination of the two distances by co-regularization indeed improves the classification performance, in both the numerical and the visual results.
From Table 5, we can see that the proposed method costs much time on data sets with a large size. Taking a comprehensive view of the balance between accuracy and speed, the proposed method is a good choice for PolSAR image classification of data sets with a small size and many classes.

Conclusions
This paper put forward a co-regularized graph embedding method based on two types of distances (i.e., the Wishart distance of covariance matrices and the Euclidean distance of polarimetric features) for PolSAR image classification. Two weighted graphs are constructed for the two distances. Then the DR model is constructed based on the two weighted graphs and co-regularization, where the co-regularization aims to minimize the dissimilarity of the low-dimensional features corresponding to the two graphs. The co-regularized graph embedding DR model achieves a better classification performance than the compared methods.
Moreover, the proposed co-regularized graph embedding method mainly focuses on graphs corresponding to different similarity measurements rather than on multiple data sets from different perspectives. We directly pursue the low-dimensional features specific to different graphs; therefore, the original data sets become unnecessary and only the similarity between two samples is indispensable. This has two advantages: (1) for one data set, the proposed method can use different similarity measurements to extract features from different perspectives, which characterizes the data more comprehensively; (2) great differences among multiple data sets from different perspectives may sometimes hurt the classification performance or bring challenges for data representation, which can be avoided by the proposed method.
The proposed method has some limitations: (1) the proposed method is specific to two weighted graphs and cannot deal with more graphs; (2) although CRGE with Cor2 can enhance the classification performance, it is time consuming.
In future work, we will mainly focus on extending the proposed method to more graphs and on designing co-regularization that meets the requirements of both accuracy and speed.