## 1. Introduction

Polarimetric synthetic aperture radar (PolSAR) provides more information about targets through its four polarization channels than single-polarization SAR. Therefore, PolSAR data have been used for various remote sensing applications such as land cover classification, urban extraction, and analysis [1,2,3]. Land cover classification in particular has attracted increasing attention. However, due to the speckle noise within PolSAR data, image classification remains a challenge. Many supervised and unsupervised PolSAR image classification methods have been proposed to address this issue [4,5,6,7,8,9]. In both kinds of classification strategy, feature selection is the key element: a suitable set of features may yield a correct classification even with a simple classifier, whereas without well-selected features it can be difficult to achieve satisfactory land cover classification even with a complex and advanced classifier [10]. The features extracted from a PolSAR image include the physical scattering features obtained from various decomposition methods and the statistical contextual features. Several polarimetric decomposition theorems have been introduced [11,12,13,14,15], and classification methods based on decomposition results have been explored [7,8,9,16,17]. However, this kind of scattering-mechanism-based PolSAR land cover classification produces some misclassifications because different classes may share the same scattering mechanism and the same class can exhibit different scattering mechanisms, especially for oriented urban areas and vegetation [2,18,19]. To resolve this issue, a wide variety of polarimetric features are used, including the decomposition powers and several polarimetric indexes such as backscattering coefficients of different polarizations (linear: HH, HV, VV; circular: LL, RR, RL; and linear 45°, 45C, 45X) and their ratios. In addition to polarimetric information, some studies approach PolSAR image classification from the perspective of image understanding and indicate the effectiveness of image texture descriptors for classification [20]. Recently, some studies have indicated that fusing physical and textural information derived from various SAR polarizations helps improve classification results. Tu et al. [10] proposed combining various decomposition scattering powers, backscattering coefficients, and phase differences between co-polarizations and cross-polarizations, as well as some other polarimetric signatures, for PolSAR image classification. Qi et al. [21] utilized the decomposition scattering powers, image texture, and interferometric information to classify RADARSAT-2 data. Zhang et al. [22] utilized the scattering powers and gray-level co-occurrence matrix (GLCM) texture features for ESAR image classification. While these integration methods make full use of image information and significantly improve classification accuracy, some deficiencies remain. First, the various features contain redundant information. For instance, the Krogager rotation angle is related to the polarization orientation angle, and the H/alpha parameters describe the chaotic volume scattering, which is also considered in the Freeman–Durden method. These information redundancies may lower classification accuracy. Even though some dimensionality reduction techniques are utilized to diminish these redundancies [10,23,24], the computation time of these methods for so many features is very large, which makes the classification techniques difficult to use. Second, these classification methods are mostly pixel-based, which makes them sensitive to speckle noise and computationally expensive.

Considering these decomposition drawbacks and feature redundancies, in this paper we propose an improved multiple-component model-based decomposition method for PolSAR image classification. It consists of two main improvements. First, the reorientation process is applied to the coherency matrix before it is decomposed into five scattering components. Then, two suitable volume scattering models designed for forests and oriented urban buildings are used in the decomposition. The advantage of the proposed decomposition is that the volume scattering of vegetation is enhanced while its double-bounce scattering is reduced. Moreover, the double-bounce scattering of urban buildings is enhanced and their volume scattering decreases. Therefore, the scattering powers obtained using the improved decomposition method discriminate urban areas well. After that, the five decomposition powers are selected for classification instead of numerous polarimetric features.

Compared with pixel-based image classification methods, region-based classification is a promising scheme. After segmenting an image under constraints such as intensity, location, texture, and edge, we obtain many homogeneous regions, and classification is then based on these regions instead of pixels. A superpixel [25] denotes a local, coherent region that is approximately homogeneous in size and shape, just like a pixel. Xiang et al. [25] proposed a superpixel generation algorithm for SAR images based on pixel intensity and location similarity and extracted Gabor filter and GLCM features from each superpixel for classification. For PolSAR data, Xiang et al. [26] proposed an adaptive superpixel generation method based on the spherically invariant random vector product model, which generates satisfactory superpixels with good boundary adherence and compactness. In this paper, we extract the superpixels using Xiang’s method [26]. Afterwards, GLCM features and the spatial relationships (e.g., mean and variance values of each superpixel) are extracted for superpixel-based classification.

Even though the number of polarimetric features is much smaller than in other classification methods, the texture features and spatial information still contain some redundancy. Many linear and nonlinear dimensionality reduction methods have been proposed to project high-dimensional data into a new space of lower dimensionality, including principal component analysis (PCA) [27], linear discriminant analysis (LDA) [28], locally linear embedding (LLE) [29], isometric feature mapping (ISOMAP) [30], and Laplacian eigenmaps (LE) [31]. Shi et al. [24] used a linear dimensionality reduction technique named SGE to obtain a low-dimensional subspace that preserves the discriminative information from training samples. Tu et al. [10] pointed out that, considering the nonlinearity of the polarimetric manifold, it is more effective to use a nonlinear local dimensionality reduction method. Hence, they used Laplacian eigenmaps to map the high-dimensional polarimetric features into a lower-dimensional feature vector for PolSAR image classification. However, this method is an unsupervised learning algorithm, which means it assumes no prior information on the input data. In addition, the size of the neighborhood needs to be set before mapping, which is inflexible [32]. To improve the classification performance, the discriminative information from the given training samples should be considered. Considering the advantages of LLE, such as being non-iterative and avoiding local minima problems, in this paper we propose to use supervised locally linear embedding (S-LLE) to reduce the feature redundancies. It has two favorable properties: (i) it adaptively estimates the local neighborhood surrounding each sample, and (ii) its objective function simultaneously maximizes the local margin between heterogeneous samples and pushes homogeneous samples closer to each other.

The main contributions of this paper lie in the following aspects: (1) the improved decomposition scattering powers proposed for PolSAR image classification, (2) the superpixel-based classification strategy, and (3) the supervised locally linear embedding approach for feature dimensionality reduction. Even though the superpixel-based classification strategy is already widely used, the scattering power features extracted from the superpixels and the dimensionality reduction are both improvements proposed in this work. Therefore, the contributions and structure of our method differ substantially from existing approaches. The remainder of this paper is organized as follows. Section 2 describes the decomposition scattering powers obtained from the improved multiple-component decomposition technique. In Section 3, the superpixels generated from PolSAR data and the corresponding feature extraction are described. In Section 4, the S-LLE technique is described and the dimensionality reduction performances of different methods are compared. We show the study area and further compare the experimental results with other methods in Section 5. Section 6 concludes the paper.

## 3. Superpixel Generation and Feature Extraction

Object-based segmentation and classification of PolSAR images have attracted increasing attention because they are computationally efficient and reduce the effect of speckle noise by taking image objects, rather than isolated pixels, as the processing unit [1,26]. A superpixel is defined as a local homogeneous region that preserves most of the object information and adheres to the object boundaries. Furthermore, the features extracted from a superpixel usually carry more useful information than those extracted from a single pixel, since the superpixel contains many neighboring pixels. Therefore, superpixels help improve image classification accuracy [25]. Several superpixel generation methods for PolSAR images have been proposed. In this paper, we adopt the adaptive polarimetric SLIC (Pol-ASLIC) approach proposed by Xiang et al. [26] to produce superpixels. This method generates superpixels whose shape and compactness adapt to the image content. Furthermore, its boundary adherence is quite good, which indicates its potential for classifying land covers. Details of Pol-ASLIC can be found in Reference [26]. Figure 3 illustrates superpixel generation using AIRSAR C-band data, where the urban buildings are discriminated very well and the superpixel boundaries are clear and accurate.

Note that the superpixels are obtained from the polarimetric coherency or covariance matrix. Therefore, the polarimetric information is fully considered and the superpixel boundaries can accurately discriminate objects with different scattering mechanisms. Consequently, in this work we apply the superpixel boundaries to the five decomposition scattering power images and extract texture and spatial features from these images for each superpixel. These features comprise: (1) spatial features, i.e., the mean and variance values of each superpixel; and (2) texture features derived from the gray-level co-occurrence matrix (GLCM), namely GLCM homogeneity, contrast, dissimilarity, and entropy in four directions (0°, 45°, 90°, 135°). Therefore, for each superpixel, we obtain five polarimetric scattering powers and 90 texture and spatial features, i.e., 2 spatial and 4 × 4 texture features for each of the five power images. Even though the dimension of this feature set is large, it is much smaller than that of the features used in Reference [21]. It is worth pointing out that, although the dimension of the polarimetric scattering power features is dramatically reduced, the polarimetric information of the whole feature set is sufficient for further classification, since all of the spatial and texture features are calculated from the scattering powers with the assistance of superpixels, which are themselves obtained from the scattering matrix. The effectiveness of the classification features used in our proposed method is discussed in the following sections.
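The per-superpixel feature extraction described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the number of gray levels (16), the min-max quantization, the use of a symmetric GLCM, the base-2 entropy, and the function names `glcm`, `texture_features`, and `superpixel_features` are all choices made here for illustration. With five power images, the sketch returns 5 × (2 + 4 × 4) = 90 values per superpixel.

```python
import numpy as np

# (row, col) offsets for the four GLCM directions 0°, 45°, 90°, 135°.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(levels_img, mask, offset, n_levels):
    """Symmetric, normalized GLCM over pixel pairs that both lie inside mask."""
    dr, dc = offset
    rows, cols = np.nonzero(mask)
    r2, c2 = rows + dr, cols + dc
    # Keep only pairs whose second pixel is inside the image ...
    inside = (r2 >= 0) & (r2 < mask.shape[0]) & (c2 >= 0) & (c2 < mask.shape[1])
    rows, cols, r2, c2 = rows[inside], cols[inside], r2[inside], c2[inside]
    # ... and inside the same superpixel.
    keep = mask[r2, c2]
    a = levels_img[rows[keep], cols[keep]]
    b = levels_img[r2[keep], c2[keep]]
    P = np.zeros((n_levels, n_levels))
    np.add.at(P, (a, b), 1.0)      # accumulate co-occurrence counts
    P = P + P.T                    # make the matrix symmetric
    total = P.sum()
    return P / total if total > 0 else P


def texture_features(P):
    """Homogeneity, contrast, dissimilarity, and entropy of a normalized GLCM."""
    i, j = np.indices(P.shape)
    diff = np.abs(i - j)
    nz = P[P > 0]
    return [
        np.sum(P / (1.0 + diff ** 2)),   # homogeneity
        np.sum(P * diff ** 2),           # contrast
        np.sum(P * diff),                # dissimilarity
        -np.sum(nz * np.log2(nz)),       # entropy
    ]


def superpixel_features(power_images, mask, n_levels=16):
    """Mean, variance, and 4 x 4 GLCM features per power image for one superpixel.

    power_images: list of 2-D arrays (one per scattering power).
    mask: boolean 2-D array marking the pixels of the superpixel.
    """
    feats = []
    for img in power_images:
        vals = img[mask]
        feats += [vals.mean(), vals.var()]           # spatial features
        lo, hi = img.min(), img.max()
        q = np.floor((img - lo) / (hi - lo + 1e-12) * n_levels).astype(int)
        q = np.clip(q, 0, n_levels - 1)              # quantized gray levels
        for angle in (0, 45, 90, 135):
            feats += texture_features(glcm(q, mask, OFFSETS[angle], n_levels))
    return np.array(feats)
```

Restricting the pixel pairs to the superpixel mask keeps the texture statistics from mixing in neighboring regions, which is one plausible reading of "for each superpixel"; computing the GLCM over a bounding window would be an alternative.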

## 4. Dimensional Reduction of the Features for PolSAR Image Classification

As discussed in the previous section, features with a large dimension may contain redundant information, which can reduce the classification accuracy. Therefore, feature dimensionality reduction is necessary for image classification and has been widely studied. There are two kinds of dimensionality reduction techniques, i.e., linear and nonlinear methods. In our research, since the features lie on a complicated nonlinear manifold, nonlinear methods are more reasonable than linear dimensionality reduction for discovering the intrinsic structure in the data. For PolSAR image classification, we aim to aggregate the pixels of the same class and separate the pixels of different classes. This means that the local structure of the data needs to be retained so that data points in the same class are clustered while data points in different classes are kept away from each other. Therefore, local nonlinear dimensionality reduction techniques are well suited to PolSAR image classification. Many such techniques exist, including locally linear embedding (LLE) [29], Laplacian eigenmaps (LE) [31], and local tangent space alignment (LTSA) [47]. These methods are all unsupervised learning algorithms, which use no prior information on the original data. Furthermore, some parameters, such as the neighborhood size, need to be set before mapping. This section presents a supervised LLE (S-LLE) method, which estimates the neighborhood size adaptively and also takes into account the discriminative information of the training samples.

Let the data matrix $Z$ with size $D\times M$ be the input of the S-LLE approach, where each of the $M$ columns is a $D$-dimensional feature vector. The output of the S-LLE approach is a new data matrix $Y$ with size $d\times M$, where $d\le D$ is the dimension of the feature vectors in the embedded space. S-LLE is implemented with the following steps.

#### 4.1. Estimation of the Adjacency Graph

The unsupervised LLE method finds the $K$ nearest neighbors of each data point ${Z}_{i}$ in the data matrix $Z$ using the Euclidean distance measure, which yields the proximity matrix $A$ with size $K\times M$. The $i$th column contains the indices of the $K$ points that are the neighbors of ${Z}_{i}$. The neighborhood size $K$ is therefore essential in the LLE algorithm and must be determined before the feature mapping. In our work, to exploit both the geometrical and the discriminative information of the data, the neighboring graph is split into two components: the within-class neighboring graph ${G}_{w}$ and the between-class neighboring graph ${G}_{b}$. Therefore, for each data point ${Z}_{i}$, we calculate two neighborhood subsets, ${N}_{w}\left({Z}_{i}\right)$ and ${N}_{b}\left({Z}_{i}\right)$, where ${N}_{w}\left({Z}_{i}\right)$ contains the neighbors with the same class label as ${Z}_{i}$ and ${N}_{b}\left({Z}_{i}\right)$ contains the neighbors whose labels differ from that of ${Z}_{i}$. Unlike the classical unsupervised LLE approach, the proposed algorithm adjusts the neighborhood size $K$ according to the similarity measure between the local sample point ${Z}_{i}$ and the rest of the samples. The two neighborhood subsets ${N}_{w}\left({Z}_{i}\right)$ and ${N}_{b}\left({Z}_{i}\right)$ are calculated using the equations below.
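Written out from the definitions that follow, the two sets plausibly take the form below. This is a reconstruction: the numbering (19) and (20) is inferred from the later reference to Equation (19), and the strict inequality is an assumption based on the phrase "lower than the average distance".

$$
{N}_{w}\left({Z}_{i}\right)=\left\{{Z}_{k}\ \middle|\ L\left({Z}_{k}\right)=L\left({Z}_{i}\right),\ ED\left(k,i\right)<\mathrm{D}\left({Z}_{i}\right)\right\} \tag{19}
$$

$$
{N}_{b}\left({Z}_{i}\right)=\left\{{Z}_{k}\ \middle|\ L\left({Z}_{k}\right)\ne L\left({Z}_{i}\right),\ ED\left(k,i\right)<\mathrm{D}\left({Z}_{i}\right)\right\} \tag{20}
$$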

where $L\left({Z}_{i}\right)$ denotes the class label of ${Z}_{i}$, $ED\left(k,i\right)$ represents the Euclidean distance between data points ${Z}_{k}$ and ${Z}_{i}$, and $\mathrm{D}\left({Z}_{i}\right)$ denotes the average distance between ${Z}_{i}$ and all other samples. Equation (19) shows that the set of within-class neighbors of ${Z}_{i}$, i.e., ${N}_{w}\left({Z}_{i}\right)$, consists of all data samples that have the same class label as ${Z}_{i}$ and whose distance to ${Z}_{i}$ is lower than the average distance associated with ${Z}_{i}$. The set of between-class neighbors ${N}_{b}\left({Z}_{i}\right)$ has a similar interpretation. Thus, the neighborhood size is adaptive for every data sample, as shown in Figure 4, which bypasses the setting of the parameter $K$.
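The adaptive neighborhood rule described above can be sketched in Python with NumPy. This is a minimal illustration, assuming the data matrix is stored as a $D\times M$ array with one sample per column and that a sample is never its own neighbor; the function name `adaptive_neighborhoods` is hypothetical.

```python
import numpy as np


def adaptive_neighborhoods(Z, labels):
    """Within-class and between-class neighbor indices for every sample.

    Z: D x M array, one sample per column.
    labels: length-M array of class labels.
    Returns two lists of index arrays, N_w and N_b, one entry per sample.
    """
    labels = np.asarray(labels)
    M = Z.shape[1]
    # Pairwise Euclidean distances ED[i, k] between columns i and k.
    diff = Z[:, :, None] - Z[:, None, :]
    ED = np.sqrt((diff ** 2).sum(axis=0))
    # Average distance from each sample to all other samples.
    avg = ED.sum(axis=1) / (M - 1)
    # A sample k is a neighbor of i when it is closer than i's average distance.
    close = ED < avg[:, None]
    np.fill_diagonal(close, False)   # assumption: exclude the sample itself
    same = labels[:, None] == labels[None, :]
    N_w = [np.flatnonzero(close[i] & same[i]) for i in range(M)]
    N_b = [np.flatnonzero(close[i] & ~same[i]) for i in range(M)]
    return N_w, N_b
```

Because the threshold is the per-sample average distance, samples in dense regions receive more neighbors than isolated ones, so no global $K$ has to be chosen. The dense $D\times M\times M$ difference array is only practical for small $M$; a chunked distance computation would be needed for large sample sets.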

#### 4.2. Computation of the Weights for Neighbors

Unlike the traditional unsupervised LLE approach, in this paper we divide the single weight matrix $W$ into two sub-weight matrices ${W}_{\mathrm{w}}$ and ${W}_{\mathrm{b}}$, which hold the weights of the within-class neighbor graph and the between-class neighbor graph, respectively. Each weight value measures the closeness of two data points and thus the contribution of a neighbor to the reconstruction of a given point. The sub-weight matrices ${W}_{\mathrm{w}}$ and ${W}_{\mathrm{b}}$ are obtained by solving the optimization problems shown below.

where ${M}_{1}$ and ${M}_{2}$ are the numbers of samples in the within-class neighborhood ${N}_{w}\left({Z}_{i}\right)$ and the between-class neighborhood ${N}_{b}\left({Z}_{i}\right)$, respectively.
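As an illustration of how such reconstruction weights are typically computed, the sketch below uses the standard LLE formulation: for each sample, minimize the squared reconstruction error from its neighbors subject to the weights summing to one. Whether the optimization in this paper uses exactly these constraints is an assumption, and the regularization term `reg` and the function names `reconstruction_weights` and `weight_matrix` are hypothetical. Applying `weight_matrix` once with the within-class neighbor lists and once with the between-class lists would give candidate matrices for ${W}_{\mathrm{w}}$ and ${W}_{\mathrm{b}}$.

```python
import numpy as np


def reconstruction_weights(Z, i, neighbors, reg=1e-3):
    """Standard LLE weights reconstructing column i of Z from its neighbors.

    Minimizes ||Z_i - sum_j w_j Z_j||^2 subject to sum_j w_j = 1.
    """
    neighbors = np.asarray(neighbors, dtype=int)
    m = neighbors.size
    if m == 0:
        return np.zeros(0)
    G = Z[:, neighbors] - Z[:, [i]]     # neighbors centered on Z_i (D x m)
    C = G.T @ G                         # local Gram matrix (m x m)
    # Regularize, since C is singular whenever m exceeds the dimension D.
    C = C + reg * max(np.trace(C), 1e-12) * np.eye(m)
    w = np.linalg.solve(C, np.ones(m))
    return w / w.sum()                  # enforce the sum-to-one constraint


def weight_matrix(Z, neighborhoods):
    """Assemble an M x M weight matrix; row i holds the weights of sample i."""
    M = Z.shape[1]
    W = np.zeros((M, M))
    for i, nb in enumerate(neighborhoods):
        nb = np.asarray(nb, dtype=int)
        W[i, nb] = reconstruction_weights(Z, i, nb)
    return W
```

Solving the small regularized linear system per sample is what makes LLE non-iterative: each row of the weight matrix has a closed-form solution.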

#### 4.3. Solution of the Mapping Projections

The feature vector mapping projection can be obtained with the optimization of two objective functions as shown below.

With this optimization, after the feature mapping, data points within the same class become closer to each other and data points from different classes move farther apart than before. Therefore, after this supervised feature dimensionality reduction, different classes can be distinguished quite well. Under the condition ${Y}^{\mathrm{T}}Y=I$, the two objective functions in Equation (23) can be combined into one objective function, which is shown below.

where $\mathrm{Tr}\left(\cdot\right)$ is the matrix trace operator and the parameter $\gamma $ is a balance factor that controls the relative weight of the within-class and between-class objective functions. With $B=\gamma {W}_{\mathrm{b}}+\left(1-\gamma \right){W}_{\mathrm{w}}$, Equation (24) can be further written as $\max \mathrm{Tr}\left({Y}^{\mathrm{T}}BY\right)$. Solving this maximization problem is equivalent to solving the eigenvector problem $BY=\lambda Y$ and taking the eigenvectors associated with the largest nonzero eigenvalues.
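The eigenvalue step can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation: it assumes ${W}_{\mathrm{w}}$ and ${W}_{\mathrm{b}}$ are available as $M\times M$ arrays, symmetrizes $B$ so that a symmetric eigensolver applies (an assumption not stated in the text), and stores the embedding as a $d\times M$ array with orthonormal rows. The function name `slle_embedding` and the default `gamma=0.5` are hypothetical.

```python
import numpy as np


def slle_embedding(W_w, W_b, d, gamma=0.5):
    """Embedding from the d leading eigenvectors of B = gamma*W_b + (1-gamma)*W_w.

    Returns Y (d x M, one embedded sample per column) and the d eigenvalues.
    """
    B = gamma * W_b + (1.0 - gamma) * W_w
    B = 0.5 * (B + B.T)                    # assumption: symmetrize B
    eigvals, eigvecs = np.linalg.eigh(B)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:d]  # indices of the d largest
    Y = eigvecs[:, order].T                # rows are orthonormal eigenvectors
    return Y, eigvals[order]
```

With orthonormal eigenvectors, the trace objective equals the sum of the selected eigenvalues, which is why choosing the largest ones maximizes it. The sketch does not filter out zero eigenvalues; that check would need to be added to match the "largest nonzero" wording exactly.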

After the polarimetric, spatial, and textural features are mapped from a high-dimensional vector to a low-dimensional vector, the nearest neighbor (NN) classifier takes the low-dimensional feature vector as input to perform the image classification. Figure 5 gives the whole flowchart of our proposed methodology.
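The final nearest neighbor step can be sketched in Python with NumPy. This is a minimal 1-NN illustration that assumes both the training and the test superpixels have already been mapped into the same low-dimensional space and are stored as $d\times M$ arrays with one sample per column; the function name `nn_classify` is hypothetical.

```python
import numpy as np


def nn_classify(Y_train, labels_train, Y_test):
    """Assign each test sample the label of its nearest training sample.

    Y_train: d x M_train array, Y_test: d x M_test array (samples in columns).
    labels_train: length-M_train array of class labels.
    """
    labels_train = np.asarray(labels_train)
    # Squared Euclidean distances, shape (M_test, M_train).
    diff = Y_test.T[:, None, :] - Y_train.T[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    return labels_train[np.argmin(d2, axis=1)]
```

For example, with training samples at 0, 1, 10, and 11 labeled 0, 0, 1, and 1, test samples at 0.4 and 9 are assigned labels 0 and 1, respectively.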