Nearest-Regularized Subspace Classification for PolSAR Imagery Using Polarimetric Feature Vector and Spatial Information

Feature extraction using polarimetric synthetic aperture radar (PolSAR) images is of great interest in SAR classification, no matter if it is applied in an unsupervised approach or a supervised approach. In the supervised classification framework, a major group of methods is based on machine learning. Various machine learning methods have been investigated for PolSAR image classification, including neural network (NN), support vector machine (SVM), and so on. Recently, representation-based classifications have gained increasing attention in hyperspectral imagery, such as the newly-proposed sparse-representation classification (SRC) and nearest-regularized subspace (NRS). These classifiers provide excellent performance that is comparable to or even better than the classic SVM for remotely-sensed image processing. However, rare studies have been found to extend this representation-based NRS classification into PolSAR images. By the use of the NRS approach, a polarimetric feature vector-based PolSAR image classification method is proposed in this paper. The polarimetric SAR feature vector is constructed by the components of different target decomposition algorithms for each pixel, including those scattering components of Freeman, Huynen, Krogager, Yamaguchi decomposition, as well as the eigenvalues, eigenvectors and their consequential parameters such as entropy, anisotropy and mean scattering angle. Furthermore, because all these representation-based methods were originally designed to be pixel-wise classifiers, which only consider the separate pixel signature while ignoring the spatial-contextual information, the Markov random field (MRF) model is also introduced in our scheme. MRF can provide a basis for modeling contextual constraints. Two AIRSAR data in the Flevoland area are used to validate the proposed classification scheme. Experimental results demonstrate that the proposed method can reach an accuracy of around 99% for both AIRSAR data by randomly selecting 300 pixels of each class as the training samples. Under the condition that the training data ratio is more than 4%, it has better performance than the SVM, SVM-MRF and NRS methods.


Introduction
Due to the active microwave imaging characteristics of fine resolution, day-and-night and weather-independence, Synthetic Aperture Radar (SAR) has been widely used for Earth remote sensing for more than 30 years, and it has come to play a significant role in geographical survey, climate change research, environment and Earth system monitoring, multi-dimensional mapping and other applications [1].As the most essential properties of electromagnetic waves, polarization can reflect different information provided by the magnitude, frequency and phase.The polarization characteristics are highly related to the properties of the target, such as structure, size, posture, and so on.In the past few decades, many polarimetric SAR (PolSAR) systems (AIRSAR, ESAR, Pi-SAR, SAR580-Convair, ALOS2, RADARSAT-2, TerraSAR-X, etc.) have been developed and applied in land cover classification [2,3], soil moisture inversion [4][5][6][7], tomography retrieval [8], target detection [9,10], and so on.Because the orientation, size, dielectric and moisture characteristics of the target affect PolSAR data, the data are sensitive to the geometry of the building, branching structure and shape of the leaves [11,12].Therefore, the PolSAR data classification has attracted extensive attention and become a hot topic in PolSAR applications [2,3,[13][14][15][16][17].
Generally, the PolSAR classification mainly contains three steps, respectively polarimetric filtering, feature extraction and classification.The speckle reduction problem is more complicated for polarimetric SAR than a single polarization SAR, because of the difficulties of preserving polarimetric properties and of dealing with the cross-product terms [18].In order to guarantee the posterior processing quality, various PolSAR filters are designed to reduce speckle and yield reliable results in practical PolSAR missions [19].Although PolSAR can provide abundant target features within the structure, orientation and physical information, the measured data for each pixel are only comprised by a 2 × 2 scattering matrix or a 3 × 3 coherency matrix.How to map the limited scattering coefficients into diverse feature space, namely feature extraction, is still a difficult and challenged topic.In the early PolSAR classification methods, the backscattering coefficients and their ratios, differences are employed as the polarimetric features to represent object pixels.Though it is a most straightforward method, the physical mechanisms reflected by those features are composite and interdependent.The distinguishing nature of each object pixel is greatly reduced.Since target decomposition (TD) theorems were first formalized by Huynen [20], various TD methods were proposed to extract the independent components representing specific physical information to better characterize each pixel, such as Huynen decomposition [20], Freeman decomposition [21], van Zyl decomposition [22], Neumann decomposition [23], Krogager decomposition [24], Cloude-Pottier decomposition [25], Yamaguchi decomposition [26], Yang decomposition [27], Pauli decomposition [28], Barnes decomposition [29], and so on.According to those extracted features, the classifier is utilized to measure the similarity among pixels and realizes the category discrimination.
Recently, numerous supervised classification techniques for PolSAR imagery have been developed [11,[30][31][32][33], providing excellent results in the discrimination of different land-cover objects.Among them, the most prominent is the representation-based classification [34].In some cases, the representation-based classifications provide better performance than the classic neural network (NN) [35] and support vector method (SVM) [14] in that they require few labeled data instead of the traditional training-test mode.Representation-based classification is essentially based on the concept that a pixel can be represented as a linear combination of labeled samples via the sparse regularization methods, such as the l 0 -norm, l 1 -norm and l 2 -norm regularization.In this manner, an approximation of the pixel is generated from labeled samples of each class independently, and the class label is then derived according to the class of the minimum representation error.Representation-based classification can be divided into two branches according to the methods for weight vector solving of the linear combination.One is by l 2 -norm minimization, such as the nearest regularized subspace (NRS) [36,37] and the collaborative-representation classification (CRC) [38,39]; the other is by l 0 or l 1 -norm minimization, such as the sparse representation-based classifier (SRC) [31][32][33].
Although all these aforementioned representation-based classifications demonstrate superior performance for high-dimensional data classification even with a limited number of training samples, they were originally designed to be pixel-wise classifiers, which only consider the polarimetric features while ignoring the spatial-contextual information (i.e., the correlations between spatially-adjacent pixels).For remote sensing imagery (whether optical or radar images), neighboring pixels tend to belong to the same class with high probability, thus spatial-contextual information should greatly increase the accuracy of the classification methods.Although wavelet transformation is a good way to extract texture feature including the polarimetric and spatial information, it is not easy to combine with other classifiers [40].The Markov random field (MRF) approach is a popular model for incorporating spatial information into image classification [11,41,42].Therefore, we try to investigate the benefits of combining the representation-based classifications with the MRF model in order to further improve the classification performance.
In machine vision, MRF is a probabilistic model that is commonly used to integrate spatial context into image classification tasks [41].One of the MRF models is the classic maximum a posteriori Markov random filed (MAP-MRF) framework, which was first proposed in 1984.It employs Bayesian statistical guidelines to analyze the machine vision problem [43].Another method considered in the MRF model is graph cuts (GC) [44,45].GC can naturally formulate the MRF model in terms of energy minimization, which is better than other methods, such as belief propagation, dynamic programming, etc.In the remote sensing community, researchers have combined the MRF model with other classifiers for remote sensing data analysis, such as the classic SVM with MRF (SVM-MRF [11]) and the Gaussian mixture model (GMM) with MRF (GMM-MRF [46]).As for polarimetric SAR classification, in [47], a Wishart-MRF model is proposed to classify polarimetric SAR images, which not only achieves higher accuracy than the other three frequently-used methods, but also addresses the 'salt-and-pepper' problem well.In [11], a new contextual classifier combining Wishart, SVM and MRF is employed to classify forest species and outperforms the methods with the use of only polarimetric features.From their experimental results, we found that the MRF model can be used to capture the spatial-contextual information and be successfully applied for polarimetric SAR image classification.
In this work, a TD vector-based NRS-MRF classification algorithm is proposed to increase the classification accuracy of land cover species by exploiting tens of TD features and spatial-contextual information.Although NRS and MRF are known methods, for better efficiency and accuracy, introducing them to solve the polarimetric SAR classification with a 79-dimension feature vector should be constructive for this classification issue, especially for the future multi-dimensional classification with multi-polarization, multi-channel, multi-aspect, and so on.Moreover, the output of representation-based classifications is the residual between the test pixel and the corresponding recovery, while MRF is a probabilistic model as indicated before.Thus, a critical issue for the straightforward combination is to make a transformation of residuals that is suitable for the integration with the MRF model.The details of the proposed classification strategy will be discussed in the following section.Experimental results demonstrate that the performance is much better than the existing methods, such as NRS, SVM and SVM-MRF.Compared with the state-of-the-art PolSAR classification, we make the following contributions.

•
We construct a comprehensive polarimetric feature vector including 79 TD features.By using the labeled data, the representation-based classifier can exploit the discriminative information within classes contained in the 79-dimensional feature spaces.

•
We introduce the NRS classifier considering within-class variations to measure the similarity of the polarimetric feature vector between the test pixel and the labeled pixels and further employ the GC algorithm for spatial information mining.

•
We propose a PolSAR imagery-oriented transformation function that connects the residual from NRS and the probability for the MRF model.
The rest of this paper is organized as follows.Section 2 mainly introduces the TD methodology, the NRS classifier and the MRF model.Section 3 specifically describes the proposed NRS-MRF classification framework and gives a detailed introduction for how to effectively combine the NRS classifier with the MRF model for polarimetric SAR images.Then, the experimental results and analysis are presented in Section 4. Finally, conclusions are drawn in Section 5.

Methodology
In this section, we will briefly introduce some background knowledge on SAR polarimetric feature extraction and classification.First, several representative polarimetric decomposition features and their intrinsic mechanisms are described and analyzed.The detailed extraction methods are illustrated to show how the raw scattering coefficients are translated into decomposition components.Second, the principle of the nearest-regularized subspace classifier is deduced to demonstrate its efficiency and accuracy with l 2 -norm regularization.Finally, the MRF model is employed to incorporate the spatial contextual information into the polarimetric classification results.

Polarimetric Feature Vector Construction
Target decomposition theorems are aimed at providing such an interpretation based on sensible physical constraints such as the average target being invariant to changes in the wave polarization basis [48].Polarimetric targeted decomposition theory was firstly proposed by Huynen [20] for interpreting the scattering mechanisms of the ground targets, which mainly include surface scattering, dihedral scattering, volume scattering, and so on.Since this original work, there have been many other proposed decompositions that can be classified into four main types [48]: Those based on the Kennaugh matrix [K](Huynen [20], Barnes [29] and Yang [27]); • Those using an eigenvector or eigenvalues analysis of the covariance matrix [C] or coherency matrix [T] (Cloude [49], van Zyl [22], Cloude and Pottier [25], Holm [50] and the multiple-component scattering model (MCSM) [51]);
Among the four categories of TD algorithms, the former three categories belong to the incoherent decomposition, which is usually used to process the coherent matrix [T], covariance matrix [C] and Kennaugh matrix [K] for representing them as a linear combination.The coherent TD algorithm is used to process the scattering matrix [S] for representing the polarimetric information of the point target and the distributed target.Accordingly, the 12 incoherent TD algorithms and two coherent TD algorithms are employed together to generate 79 features produced by each pixel sample, as shown in Table 1.The representations of PolSAR imagery and the main TD algorithms will be briefly interpreted in the following part.

PolSAR Data Representations
The information of the PolSAR data is contained in four channels, respectively HH, HV, V H and VV, which indicate four linear orthogonal polarization combinations.Assuming the data are processed with the spatial average under monostatic mode, the complex backscattering matrix [S] for each image pixel at a specific incidence angle can be expressed as: where the subscript h indicates the horizontal polarization channel and the subscript v represents the vertical polarization channel.Under the monostatic backscattering case, there exists the reciprocity theorem, i.e., S hv = S vh .We discuss this backscattering case in the following part of the paper.From the scattering matrix [S], polarimetric information is usually interpreted by the second-order polarimetric descriptors' coherent matrix and covariance matrix.The multilook coherent matrix [T] and covariance matrix [C] can be expressed as follows. [ where • denotes the ensemble average, the superscript H indicates the complex conjugation and transpose of the vector and matrix, k is the Pauli-based scattering vector and Ω is the lexicographic scattering vector.These two vectors can be specified by vectorizing the scattering matrix [S] and are defined as the following in the backscattering case. ) where the superscript T indicates the transpose of the vector.Accordingly, the coherency matrix [T] and covariance matrix [C] are 3 × 3 in the monostatic backscattering case.

Decomposition Order Numbers Components
Note: The implications of decomposition components can be found in the cited references, which are not stated here in detail.'#' indicates that this TD feature is selected to form the 49 features in the Experimental section.
The Kennaugh matrix containing nine parameters can be expressed as follows: with where the superscript * denotes the complex conjugation; the nine parameters are called the Huynen parameters.These parameters are roll angle dependent corresponding to the target rotation along the radar line-of-sight and contain real physical target information [48].In the case of a pure target, there exists a one-to-one correspondence between the Kennaugh matrix and the coherency matrix.To sum up, these four matrices [S], [T], [C], [K] are the basis and object of TD algorithms for feature extraction.

Kennaugh Matrix-Based Decompositions
The basis of this kind of TD algorithm is the phenomenological theory proposed by Huynen.The basic idea of the Huynen target decomposition theorem is to separate from the incoming data stream a part that would be identified with a single average target and a residue component.It is argued that this approach follows the perception in the desire to distinguish a required object from its clutter environment.Huynen decomposition can be denoted as follows. [ where

Eigenvector-Based Decompositions
The residue matrix in the Huynen decomposition is not invariant under the wider class of transformations, as well as the eigenvectors.In order to observe other types of fluctuations of the target vector, more general polarization basis invariant forms of decomposition are developed, which is the decomposition approach based on the eigenvector.
This representative decomposition method is proposed by Cloude and Pottier [25], which relies on the eigenvalue analysis of coherency matrix [T] to generate estimations of the average target scattering matrix parameters. with: where [U 3 ] is a 3 × 3 complex matrix and λ 1 > λ 2 > λ 3 ≥ 0 are the corresponding eigenvalues.In this way, the coherency matrix [T] can be expressed as the sum of three independent targets as follows: [ This method rebuilds an average target, and it can be seen as a simple scattering target.Based on the eigenvalue decomposition, many useful parameters can be computed, which include the scattering entropy H, average α angle, anisotropy A and some other intermediate parameters λ, β, γ, δ, as well as parameter combinations (1-H)(1-A), (1-H)A, H(1-A), HA.It should be noted that entropy, scattering angle and anisotropy are important polarimetric features, which can reflect the dominant scattering mechanism, the number of major scattering types and the ratio between them.
Holm provided an alternative physical interpretation of the eigenvalues' spectrum by interpreting the target as a sum of a single scattering [S] matrix (Rank 1 coherency matrix) plus two noise or remainder terms.This is a hybrid approach, combining an eigenvalue analysis (providing invariance under unitary transformations) with the concept of the single target plus noise model of the Huynen approach.According to different eigenvalue matrix decompositions, Holm decomposition has two representations as follows [48], which are used in the feature vector construction.

Model-Based Decompositions
The model-based decomposition is based on the physical scattering mechanism of the target and decomposes the coherent matrix into surface scattering, dihedral scattering, volume scattering, helix scattering or a partial scattering combination for the specific application scenario.
The Freeman-Durden decomposition method decomposes the target into three parts: a canopy scattering part from a cloud of randomly-oriented dipoles, a double bounce scattering part from a pair of orthogonal surfaces with different dielectric constants and a Bragg scattering part from a moderately rough surface [52].In the decomposition method, the [T] matrix can be expressed as: where P S , P D , P V correspond to the power of the single bounce scattering part, the double bounce scattering part and the volume scattering part and [T S ], [T D ], [T V ] are the coherency matrices of the three scattering mechanism obtained from scattering models, respectively.The Yamaguchi decomposition decomposes the target into four scattering mechanisms following the Freeman-Durden decomposition.The decomposition adds a helix scattering power as the fourth component to deal with the non-reflection symmetric scattering cases [26].Like the Freeman-Durden decomposition, the [T] matrix can be expressed as: where f s , f d , f v and f c correspond to the power of the single bounce scattering part, the double bounce scattering part, the volume scattering part and the helix scattering part.

Coherent Decompositions
The objective of the coherent decompositions is to express the measured scattering matrix [S] as a combination of basis matrices corresponding to canonical scattering mechanisms.The coherent target decomposition is useful if only one dominant target component is expected, e.g., dihedral or trihedral edge contributions in urban areas or as calibration targets.Hence, it could be applied to the case of high-resolution, low-entropy scattering problems.
The Krogager decomposition presents a relation to directly measurable quantities and therefore to the actual physical scattering mechanisms represented by the component matrices.In the Krogager decomposition, a symmetric scattering matrix [S] is decomposed into three coherent components that have a physical interpretation in terms of sphere, diplane and helix targets under a change of rotation angle and are defined as follows.
[S] = e jϕ e jϕ S k S where S s , S d and S h denote the Sinclair matrix of the sphere, diplane and helix; k S , k D , k H are all real coefficients of those components.
In this research, some other decomposition approaches of these four types are employed to extract polarimetric features, such as Yang decomposition [27], MCSM method [51], Touzi method [53], van Zyl decomposition [22], and so on.The details of each decomposition method can be found in the relevant literature.

Nearest Regularized Subspace
Assume a given hyperspectral dataset with training samples X = {x i } n i=1 in d and the class label w i ∈ {1, 2, . . . ,C}, where d is the number of spectral variables (bands), C is the number of classes and n is the total number of training samples.
An approximation of the test sample y is represented via a linear combination of available training samples per class.For each class, we can calculate the approximation y l as: where X l is a matrix with the size of d × n l , n l is the number of available training samples for class l, C ∑ l=1 n l = n and α l represents the weight vector coefficients with the size of n l × 1 for the linear combination.Suppose we have obtained the weight vector; the label of the test sample is determined by the residual between y and y l , which is represented as: After that, the class label is then derived according to the class of the most accurate representation (i.e., the minimum value of the residuals for all the classes), In NRS [36], the weight vector for the linear combination is solved as follows, The Γ l,y is a biasing Tikhonov matrix specific to each class l and test sample y, and the λ is a global regularization parameter, which is used to balance the minimization between the residual and regularization terms.For the NRS classifier, we design the diagonal elements of matrix Γ l,y in the form of: where x l,1 , x l,2 , . . . . . ., x l,n l are the columns of X l for the class l.Commonly, Euclidean distance will be used for measuring the similarity between the testing sample and training sample and then to vote for the test sample belonging to the proper label.According to the formulas defined in Equation ( 21) and Equation ( 22), the approximation y l of each test sample y can be expressed as: We notice that the solution of weight vector has a closed form in NRS.Another similar classifier is CRC [38], whose difference is using an identity matrix Iinstead of the regularization term in Equation (4).In [36], experimental results demonstrated that NRS significantly outperforms CRC for hyperspectral image classification, which turns out to be reasonable.Using a biasing Tikhonov matrix Γ l,y to mine the data distribution, the samples in X l , which are dissimilar to y in terms of the Euclidean distance, should be given less contribution towards the linear combination than those that are more similar.In NRS, the more relevant weight vector α l makes the approximation y l from the correct class have better similarity with the test sample, which results in much better classification performance than CRC does.

MRF Model
The pixel-wise NRS classifier has been demonstrated to be effective [36]; however, for the PolSAR classification, the image representation of the same category object is always the block-wise manner, under which the spatial information may have the potential to further improve the classification accuracy.
In the field of machine vision, image analysis is usually defined as the modeling problem.The MRF model [41,42], widely used in image analysis, provides a convenient and consistent modeling approach with context-dependent entities or intrinsically linked with features.
Previous work pointed out the advantages and disadvantages of seeking the joint probability and the conditional probability in the MRF model.From those, GC [44,45,54,55] turns out to be an effective technique, whose energy function can arise from the Bayesian labeling of the first-order MRF model [55].This algorithm is closely related to the min-cut technique, which is to find the cheapest cut among all cuts separating the terminals.
In the MRF model, we have the probability of the test pixel y p (note that y p has the same meaning as the aforementioned y) for each class, and the energy function E(•) of y p for class l can be represented as, E y p w p = l = µE data y p w p = l + E smooth y p w p = l E data y p w p = l = ∑ p∈P Dp y p w p = l (25 where Dp is the spectral energy function for y p and µ is the balance parameter.E smooth y p w p = l means the extent on piecewise smoothness, where w p and w q are the labels, δ is the Kronecker delta function (δ(w p ,w q ) = 1 if w p = w q , and δ(w p ,w q ) = 0 otherwise), B p,q is viewed as the penalty between y p and y q [45], y q represents the adjacent pixel to y p and N p is the set of neighbors for the given pixel y p .Commonly, the more similar these neighboring pixels are, the larger the value will be.If they are totally different, B p,q will be definitely zero.

Proposed Method
In the previous section, the related works have been introduced in detail, including the polarimetric decomposition features, representation-based classifier and MRF model.Next, the overall processing flow from the raw data to classification results will be described to illustrate how the NRS-MRF algorithm measures the pixel similarity in high-dimensional polarimetric decomposition features space.
In general, the proposed method consists of four steps, respectively pre-processing (polarimetric filtering), feature vector construction, NRS classification and classification optimization with spatial information, as shown in Figure 1.As the input of the classification procedure, the fully-polarimetric SAR image is firstly processed with the refined Lee filter to reduce speckle.Next, 18 kinds of polarimetric decomposition operations are performed on each image pixel and turn the polarimetric SAR images into a data cube of a high-dimensional feature space.Then, some labeled samples are randomly selected to construct a structured dictionary, and other samples are taken into account as test samples.After the preparation of classification, the NRS classifier calculates the residual distance and probability belonging to each class consequently.Finally, the optimized labels are calculated according to the probability and local energy function.*/",,-6-("#

Feature Vector Construction and Similarity Measurement
As for the NRS classification algorithm, a few labeled samples are utilized to form a dictionary, and the testing sample is projected onto the dictionary to get the sparse representation coefficients, which are used to calculate reconstruction errors and finally identify the class label.The sample dimension is just the feature dimension, and the sparse coefficient dimension is related to the size of the dictionary.It can be seen that the precise selection of a low-dimensional feature space is not necessary, and the NRS algorithm can exploit the discriminative information behind the high-dimensional feature space through the sparse representation coefficients.Therefore, we can catenate tens of TD features to construct a high-dimensional feature vector to characterize each pixel sample for further sparse representation processing.
Once the feature vector for each pixel is extracted, the NRS-based similarity measurements are conducted among the test pixels and the structured dictionary.For each test pixel, the residuals to all classes are calculated and can be used for initial classification considering polarimetric features only, as shown in Figure 1.

Parameter Tuning of Transformation Function
In the NRS algorithm, the residual (reconstruction error) for each class is calculated.The smaller the residual of this class, the higher the probability of belonging to this class.As previously mentioned, the GC theory is used to solve the MRF model, which acquires the probability as its input.Therefore, we should attempt to transform the residual for each class to the probability for each class.According to the above analysis, it is reasonable to postulate that the transformation should be a decreasing function of the residual, and the following conditions should be imposed on function f , which indicates a decreasing function, and r l represents the residual of the test pixel for class l and is calculated by Equation (19).To map the residuals into probability for all the classes, a common decreasing function is designed to implement the transformation as the following, where x is a tuning parameter for PolSAR classification.Note that the range of probability f (r l ) is from 0-1.
In order to realize the optimal transformation from the residual to probability, we set the range of x to [−3, 0] to investigate the relation between the exponent x and the final classification accuracy.As shown in Figure 2, the optimal value for x is −0.5; thus, the transformation function is defined as follows.

Feature Vector-Based NRS-MRF Classification
Followed by the above transformation step, for the test pixel y p , we can compute the probability for each class, denoted as f (r l ).Then, we calculate the energy D j y j w j = l = − ln { f (r l )} and the energy of this pixel belonging to class l, named as E y p w p = l .The local energy involves some cut-related algorithms, such as max-flow/min-cut.Finally, we find the label class (y) = arg min l=1,2,...,C E y p w p = l , which is the optimal classification result derived from combining polarimetric and spatial contextual features, as shown in Algorithm 1.

Algorithm 1 The feature vector-based NRS-MRF classifier.
Input: Fully polarimetric SAR image Output: Class labels of the entire test image pixels.
1: Preprocessing←polarimetric filtering with refined Lee filter 2: Feature vector construction←stacking 18 categories of polarimetric decomposition features 3: Dictionary construction←feature vector of training data X = {x i } n i=1 with corresponding class label   w p ← compute the final label considering the argminE.15: end for

Experimental Results and Analysis
In this section, we design two experiments to discuss the effectiveness of the proposed classification method.Two airborne PolSAR datasets are employed to conduct the experiments for the validation of the proposed classification scheme, which all have ground truth measurements.

AIRSAR Data in Flevoland I
The first quad-polarimetric SAR dataset is the widely-used L-band data acquired by NASA/JPL AIRSAR system in Flevoland, The Netherlands, August 1989.The incidence angles are around 20 • at the near range and 44 • at the far range.There are 11 different terrain types marked in the ground truth, most of which are agricultural classes, such as stem beans, forest, wheat, bare land, rapeseed, pea, and so on.The size of this datum is 750 × 1024 pixels.The size of the ground truth data is around 68,188 pixels.The Pauli image and the ground truth image are shown in Figure 3.

AIRSAR Data in Flevoland II
The second experimental quad-polarimetric SAR dataset is acquired in the same area by the AIRSAR system in June 1991.There are 14 different terrain types measured with the ground truth in the 30 • -60 • incidence angle range: potato, beet, maize, wheat, grass, fruit, barley, beans, lucerne, flax, oats, onions, peas and rapeseed.The size of this L-band datum is 1024 × 1279 pixels.The size of the ground truth data is around 122,928 pixels.The Pauli image and the ground truth image are shown in Figure 4.

Polarimetric Feature Vector
After the speckle filtering process with the refined Lee method of a 3 × 3 window size, the polarimetric SAR feature space is constructed by the components of different target decomposition algorithms for each pixel, including those scattering components of Freeman, Huynen, Krogager and Yamaguchi decomposition, as well as the eigenvalues, eigenvectors and their consequential parameters, such as entropy, anisotropy and mean scattering angle.In total, 79 polarimetric features are retrieved from the two datasets, which are listed in Table 1.
On the other side, the processing efficiency will be improved with the decrease of the feature dimension.We design another method of feature vector construction to study the relation between the number of features and the classification accuracy.As a preliminary study, the correlation coefficient is chosen to evaluate the similarity among different features.Firstly, the correlation coefficient between each feature is calculated by the following equation, where Cov(X, Y) is the covariance between the single feature vector X and Y and Var is the variance of the feature vector.Secondly, a proper threshold should be set.In the experiment, the empirical threshold of the correlation coefficient is chosen as 97%.Thereafter, the features with a correlation coefficient above the threshold are selected as the feature vector.The new feature vector has 49 features, which have been indicated by the '#' sign in Table 1.

Classifier Parameter Tuning
As a global regularization parameter, the adjustment of λ is also important to the NRS algorithm's performance [37].By testing different radar data, it is found that basically when λ equals a small value, high classification accuracies can be achieved.Therefore, an optimal or suboptimal λ should be firstly determined for the subsequent experiments.Due to the need to traverse different values of λ, the running time will be too long if the entire image takes part in the classification.Therefore, we only select 900 samples as the testing dataset to select the suboptimal value of λ. Figure 5 displays the trend of classification accuracy as λ increases with the Flevoland I data.Thus, λ equal to 0.1 is used for the following experiments.NRS is implemented on the AIRSAR dataset in Flevoland I firstly, at the same time comparing with the one combined with MRF, i.e., NRS-MRF, classic SVM and the combination of SVM-MRF.It should also be noted that we will randomly select 300 pixels of each class as the training samples, and the rest of the pixels of each class as the testing samples.This is because for some datasets, there are big differences in the number of pixels between classes, and this imbalance problem could cause bad classification performance.
The classification results with 49 and 79 features are shown in Figure 6a-d and Figure 6e,f, respectively.The reduced feature vector impacts the NRS algorithm by 5% and has little effect on NRS-MRF method.In this sense, MRF not only utilizes the spatial information to improve the accuracy, but also can improve the efficiency through the insensitivity to the feature number.As for the experiments with 79 features, the NRS algorithm can achieve the overall accuracy of 74.09%.It is seen that there are many errors existing inside each class area.This phenomenon could be largely reduced when introducing the MRF method, taking the spatial information into consideration.Figure 6h shows the result of NRS-MRF, which is obviously improved with the overall accuracy of 99.68%.Most part of the misclassification errors inside the each class disappear.For comparison, the classification results of SVM and SVM-MRF are also shown in Figure 6e,f.The corresponding accuracies are 71.72% and 93.20%, respectively.In order to analyze the classification performance for each class, the confusion matrices of the SVM, NRS, SVM-MRF and NRS-MRF approaches are shown in Tables 2-5.As for the former three methods, except Class 11, each class has some points wrongly classified, while referring to the NRS-MRF method in Table 5, there are only a few pixels of Classes 7, 8, 9 and 10 classified into the wrong classes.These pixels existed in the margin area of classes, which could probably be affected by the filtering process.Then, the NRS and NRS-MRF are implemented on the AIRSAR dataset in Flevoland II, in comparison with classic SVM, and the combination of SVM-MRF.During the NRS implementation, the regularization parameter λ and the x parameter in the conversion equation take the values of 0.6 and −0.5, respectively.
The classification results are displayed in Figure 7a-h.For these experimental data, the difference between 49 and 79 features is greatly decreased.Referring to NRS with 79 features, the overall accuracy is 90.15%.Similarly, there are many errors existing inside each class area.After applying the combined NRS-MRF method, taking the spatial information into consideration, the classification accuracy is increased to 99.23%; while, the classification results of SVM and SVM-MRF are 87.10% and 94.33%, respectively.Hence, for this dataset, NRS and SVM are improved by the same scale with the introduction of the MRF method.

The Influence of Training Size
After the analysis of the parameter impacts and classification accuracy, it is necessary to discuss the proper rate of samples for training in the whole dataset.Taking the Flevoland I data as an example, we train with different ratios for all four algorithms, not fixed number of 300 training samples; the change tendency is described in Figure 8.It is clear that for all algorithms, 5% is enough to get stable classification accuracy.However, NRS has a big improvement and the best accuracy when combining with MRF, as the rate of training samples increases.The difference between Figure 8 and Figure 6 may have been noticed.As we before, Figure 6 is the result when the same amount of samples in each class is used for training, while Figure 8 is from the viewpoint of the same rate of samples in each class for training.

The Analysis of Classification Performance
As previously mentioned, the proposed method is capable of keeping a good classification performance with the limited training data and the reduced polarimetric features.The main reason may come from the collaboration of NRS and MRF.MRF can greatly decrease the misclassifications based on the good homogeneousness assumption.However, for different sensors and areas, this advantage may be invalid.Therefore, the improvement of MRF or other post-processing methods will be still worth studying in the future research work.

Conclusions and Future Work
In this paper, the NRS classifier is employed for polarimetric features extracted from PolSAR images, by the use of TD.Considering that the spatial information is not well utilized, MRF is also introduced to our scheme, since it could provide a basis for modeling contextual constraints.The polarimetric SAR feature space is constructed by the components of different target decomposition algorithms for each pixel, using two airborne PolSAR datasets in Flevoland area.In the implementation of the classification, the impact of the regularization parameter of NRS and the conversion coefficient of MRF on the overall accuracy of classification are analyzed, respectively.Furthermore, the proper number of training samples is also discussed.Experimental results of Flevoland I data demonstrated that the classification accuracy of the proposed NRS-MRF is superior to the original NRS and SVM with 25% and 28%, respectively.Moreover, NRS-MRF is even superior to the state-of-the-art SVM-MRF with 6% under the condition of randomly selecting 300 pixels of each class as the training samples.To reach a high classification accuracy, the selected training data ratio should be at least 4%.The future work of our group may focus on the study of deep learning, which is used to extract features and classify targets.
[T 0 ] indicates the single target component and [T N ] indicates the residual component.Actually, there exist three ways in which the measured coherency [T] matrix can be factorized into a pure target [T 0 ] and a distributed N-target [T N ].The original decomposition proposed by Huynen is one of the three possible decomposition structures.Those other two decomposition structures are presented later by Barnes and Holm and are also employed in the feature vector.

Figure 1 .
Figure 1.Diagram of the proposed method.

Figure 2 .
Figure 2. Impact of parameter x on the classification accuracy. f

6 :
and feature vector of entire test image with each test sample represented Y = y p m p=1 .4: NRS classification: 5: for each p ∈ [1, m] do for each l ∈ [1, C] do 7:r[p][l] ← compute the residual distance r l y p according to Equation(19) ;

Figure 5 .
Figure 5.The influence of the regularization parameter on the classification accuracy for the Flevoland I data.

Figure 6 .
Figure 6.The classification results for Flevoland I data: (a-d) classification with 49 features; (e-h) classification with 79 features.

Figure 7 .
Figure 7.The classification results for Flevoland II data: (a-d) classification with 49 features; (e-h) classification with 79 features.

Figure 8 .
Figure 8.The influence of the training sample ratio on classification accuracy for Flevoland I data.

Table 2 .
Confusion matrix for Flevoland I data using SVM.

Table 3 .
Confusion matrix for Flevoland I data using NRS.

Table 4 .
Confusion matrix for Flevoland I data using SVM-MRF.

Table 5 .
Confusion matrix for Flevoland I data using NRS-MRF.