This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
This paper proposes a hybrid classifier for polarimetric SAR images. The feature set consists of the span image, the H/A/α decomposition, and GLCM-based texture features. A probabilistic neural network (PNN) is adopted for classification, and a novel algorithm is proposed to enhance its performance: principal component analysis (PCA) reduces the feature dimensions, random division reduces the number of neurons, and Brent's search (BS) finds the optimal bias value. The results on the San Francisco and Flevoland sites are compared to those of a 3-layer BPNN to demonstrate the validity of our algorithm in terms of confusion matrix and overall accuracy. In addition, the importance of each improvement of the algorithm is demonstrated.
The classification of different objects, as well as different terrain characteristics, with single-channel, single-polarisation SAR images can carry a significant amount of error, even after multilooking [
The Wishart maximum likelihood (WML) method has often been used for PolSAR classification [
To overcome the above shortcomings, polarimetric decompositions were introduced with the aim of establishing a correspondence between the physical characteristics of the considered areas and the observed scattering mechanisms. There are seven well-known decomposition methods: Pauli [
Recently, texture information has been extracted and used as a parameter to enhance the classification results. The texture parameters can be defined in many ways, such as entropy [
Thus, we chose the combination of H/A/α and the GLCM as the parameter set of our method. The next problem is how to choose the best classifier. In the past, standard multilayer feedforward NNs with the back-propagation (BP) algorithm have been applied to SAR image classification [
However, BP requires considerable effort to determine the network architecture and more computation for training. Moreover, BP yields deterministic rather than probabilistic results, which makes it impractical in many classification tasks. Probabilistic neural networks (PNNs) are therefore effective alternatives: they are faster to architect and to train, and they provide probabilistic viewpoints alongside deterministic classification results [
The input weights and layer weights of a PNN can be set directly from the available data, whereas the bias is traditionally difficult to determine and is usually obtained manually, either by iterative experiments or by an exhaustive search [
The structure of this paper is as follows. In the next section, we introduce the concept of the Pauli decomposition. Section 3 presents the feature set, namely the span image, the H/A/α decomposition, and the features derived from the GLCM. In Section 4, the mechanism, structure, and shortcomings of PNNs are introduced. Section 5 proposes our method and elaborates on its three key improvements: PCA, random division, and optimization by Brent's search. Section 6 applies our method to terrain classification on the San Francisco site and shows that it outperforms the 3-layer BPNN method. Section 7 applies our method to crop classification on the Flevoland site. Section 8 discusses the significance of the combined feature set, random division, and PCA. Finally, Section 9 concludes the paper.
The features are derived from the multilook coherence matrix of the polarimetric SAR data. Suppose
The Pauli decomposition expresses the scattering matrix
Thus,
An RGB image could be formed with the intensities 
The coherence matrix is obtained as:
The multilook coherence matrix is the average of multiple single-look coherence matrices. (
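The Pauli vector and multilook averaging described above can be sketched in NumPy (an illustration, not the authors' PolSARpro/Matlab implementation; the function names `pauli_vector` and `multilook_coherence` are ours):

```python
import numpy as np

def pauli_vector(S):
    """Pauli scattering vector k for a 2x2 scattering matrix S
    (reciprocal case, S_hv = S_vh)."""
    s_hh, s_hv, s_vv = S[0, 0], S[0, 1], S[1, 1]
    return (1.0 / np.sqrt(2.0)) * np.array(
        [s_hh + s_vv, s_hh - s_vv, 2.0 * s_hv], dtype=complex)

def multilook_coherence(scattering_matrices):
    """Average the single-look coherence matrices k k^H over all looks."""
    T = np.zeros((3, 3), dtype=complex)
    for S in scattering_matrices:
        k = pauli_vector(S)
        T += np.outer(k, k.conj())
    return T / len(scattering_matrices)
```

For a trihedral-like scatterer (S the identity matrix), only the first Pauli component is non-zero, so all the power concentrates in T[0, 0].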
The proposed features can be divided into three types, which are explained below.
The span or total scattered power indicates the received power by a fully polarimetric system and is given by:
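As a small NumPy illustration (not the paper's code; `span` is our own name), the total power in the reciprocal case can be computed directly from the scattering matrix, and it equals the trace of the coherence matrix built from the Pauli vector:

```python
import numpy as np

def span(S):
    """Total received power for a 2x2 scattering matrix S
    (reciprocal case, S_hv = S_vh)."""
    return abs(S[0, 0]) ** 2 + 2.0 * abs(S[0, 1]) ** 2 + abs(S[1, 1]) ** 2
```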
Cloude and Potter [
Then, the pseudo-probabilities of the
The entropy indicates the degree of statistical disorder of the scattering phenomenon. It can be defined as:
For high entropy values, a complementary parameter (anisotropy) is necessary to fully characterize the set of probabilities. The anisotropy is defined as the relative importance of the second scattering mechanisms [
The four estimates of the angles are easily evaluated as:
Thus, the feature vector derived from the coherence matrix can be represented as (
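The eigenvalue-based H/A/α parameters above can be sketched as follows (a NumPy illustration with our own function name `h_a_alpha`; the paper's implementation uses PolSARpro). Entropy uses a base-3 logarithm so that H lies in [0, 1] for three scattering mechanisms:

```python
import numpy as np

def h_a_alpha(T):
    """Entropy H, anisotropy A and mean alpha angle from a 3x3
    multilook coherence matrix T (Hermitian, positive semidefinite)."""
    lam, vec = np.linalg.eigh(T)            # ascending eigenvalues
    lam = lam[::-1].clip(min=0.0)           # sort descending, clamp noise
    vec = vec[:, ::-1]
    p = lam / lam.sum()                     # pseudo-probabilities
    # entropy with log base 3; the 0*log(0) terms are taken as 0
    H = -sum(pi * np.log(pi) / np.log(3.0) for pi in p if pi > 0)
    A = (lam[1] - lam[2]) / (lam[1] + lam[2])   # anisotropy
    alpha_i = np.arccos(np.abs(vec[0, :]))      # per-mechanism alpha angles
    alpha = float(p @ alpha_i)                  # probability-weighted mean
    return H, A, alpha
```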
The gray-level co-occurrence matrix (GLCM) is a texture descriptor that takes into account the specific position of a pixel relative to another. The GLCM is a matrix whose elements correspond to the relative frequency of occurrence of pairs of gray-level values of pixels separated by a certain distance in a given direction [
where (
It is suggested that GLCMs be calculated from four displacement vectors with
The four features are extracted from normalized GLCMs, whose entries sum to 1. Suppose the normalized GLCM value at (
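The normalized GLCM and the four features of the GLCM-properties table (contrast, correlation, energy, homogeneity) can be sketched in NumPy for one quantized patch with the offset [0 1] used in our experiments (an illustration with our own function name, not the paper's code):

```python
import numpy as np

def glcm_features(img, levels=8, offset=(0, 1)):
    """Four GLCM-based texture features for one quantized image patch.
    offset=(0, 1) pairs each pixel with its right-hand neighbour
    (non-negative offsets only in this sketch)."""
    dr, dc = offset
    glcm = np.zeros((levels, levels))
    rows, cols = img.shape
    for r in range(rows - dr):
        for c in range(cols - dc):
            glcm[img[r, c], img[r + dr, c + dc]] += 1
    p = glcm / glcm.sum()                     # normalize: entries sum to 1
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sig_i = np.sqrt(((i - mu_i) ** 2 * p).sum())
    sig_j = np.sqrt(((j - mu_j) ** 2 * p).sum())
    contrast = ((i - j) ** 2 * p).sum()
    correlation = ((i - mu_i) * (j - mu_j) * p).sum() / (sig_i * sig_j)
    energy = (p ** 2).sum()
    homogeneity = (p / (1.0 + np.abs(i - j))).sum()
    return contrast, correlation, energy, homogeneity
```

Computing these four features on the three channels yields the 12 texture features used below.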
The texture features consist of 4 GLCM-based features, which should be multiplied by 3 since there are three channels (
Neural networks are widely used in pattern classification since they do not need any information about the probability distribution and the
Taking a two-category situation as an example, we must decide between the known states of nature
Here,
In a simple case that assumes the loss function and
Here,
The mathematical expression of PNN can be expressed as:
In this paper, the
The
This setting produces a network with zero error on the training vectors and, obviously, requires no training.
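The weight-setting scheme just described, with IW holding the training vectors, LW the one-hot targets, and a bias b scaling the radial-basis spread, can be sketched in NumPy (`pnn_classify` is our own illustrative name, not the Matlab toolbox routine the paper uses):

```python
import numpy as np

def pnn_classify(P, T, X, b):
    """Minimal PNN. P: (R, Q) training vectors, T: (K, Q) one-hot
    targets, X: (R, N) vectors to classify, b: bias of the RBF layer."""
    IW = P.T                                   # one RBF neuron per training vector
    LW = T                                     # layer weights = targets
    # Euclidean distance from every input to every stored vector: (Q, N)
    dist = np.linalg.norm(IW[:, :, None] - X[None, :, :], axis=1)
    a1 = np.exp(-(b * dist) ** 2)              # radial basis layer
    a2 = LW @ a1                               # per-class summation layer
    return np.argmax(a2, axis=0)               # competitive layer: class index
```

A training vector fed back into the network lands exactly on its own neuron (distance 0, activation 1), which is why the scheme has zero error on the design set.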
Suppose
However, it is obvious that
If it is too small, the spread of each radial-basis-layer function becomes too large: the network takes too many nearby design vectors into account, and the radial basis neurons output large values (near 1) for all the inputs used to design the network. If it is too large, the spread becomes nearly zero, and the network degrades into a nearest-neighbor classifier.
Here we propose a novel method to solve the above two problems. The main idea is shown in
Excessive features increase computation time and storage requirements. Furthermore, they sometimes make classification harder, a phenomenon known as the curse of dimensionality. It is therefore necessary to reduce the number of features.
Principal component analysis (PCA) is an efficient tool for reducing the dimensionality of a data set consisting of a large number of interrelated variables while retaining most of the variation. It does so by transforming the data set to a new set of ordered variables. The technique has three effects: it orthogonalizes the components of the input vectors so that they are uncorrelated with each other; it orders the resulting orthogonal components so that those with the largest variation come first; and it eliminates the components that contribute least to the variation in the data set.
It should be noted that the input vectors should be normalized to have zero mean and unity variance before performing PCA, which is shown in
The normalization is a standard procedure. Details about PCA can be found in Ref. [
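The normalize-then-project pipeline can be sketched in NumPy (an illustration with our own function name `pca_reduce`; it assumes no feature is constant, so every standard deviation is non-zero):

```python
import numpy as np

def pca_reduce(P, var_kept=0.95):
    """Normalize each feature to zero mean / unit variance, then keep the
    leading principal components covering var_kept of the total variance.
    P: (R, Q) matrix of Q feature vectors of dimension R."""
    mean = P.mean(axis=1, keepdims=True)
    std = P.std(axis=1, keepdims=True)
    Z = (P - mean) / std                        # standardize features
    C = np.cov(Z)                               # R x R covariance matrix
    lam, V = np.linalg.eigh(C)
    order = np.argsort(lam)[::-1]               # largest variance first
    lam, V = lam[order], V[:, order]
    cum = np.cumsum(lam) / lam.sum()            # cumulative variance curve
    k = int(np.searchsorted(cum, var_kept)) + 1
    return V[:, :k].T @ Z                       # (k, Q) reduced vectors
```

Two perfectly correlated features collapse to a single component, since all the variance lies along one direction.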
Realistic sample numbers
The goal of finding optimal
BS is a line search, a hybrid of the golden-section search and quadratic (parabolic) interpolation. The golden-section search has a first-order rate of convergence, while polynomial interpolation has an asymptotic rate faster than superlinear. On the other hand, the convergence rate of the golden-section search applies from the moment the algorithm is initialized, whereas the asymptotic behavior of polynomial interpolation can take many iterations to become apparent. BS combines the best features of both approaches, and it has the additional advantage of not requiring derivative computations, which makes it well suited to our optimization problem.
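Since the safeguard logic of full Brent is lengthy, the sketch below shows only its golden-section component (the parabolic-interpolation steps are omitted); it is an illustration, not the Matlab routine used in the paper. It shrinks a bracket [a, b] around the minimum of a unimodal function using only function values:

```python
import math

def golden_section(f, a, b, tol=1e-3):
    """Golden-section line search for the minimum of a unimodal f on [a, b].
    Brent's search adds safeguarded parabolic-interpolation steps on top
    of this bracketing scheme."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0       # 1/phi ~ 0.618
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:                             # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:                                   # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)
```

Each iteration reuses one of the two interior evaluations, so only one new call to f is needed per step.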
The NASA/JPL AirSAR L-band data for the San Francisco (California, USA) area was used for the experiments. Its size is 1,024 × 900. To reduce computation, a subarea of size 600 × 600 was extracted from the upper-left corner of the original image. The ground truth of the test site can be found in Ref. [
Quantitative information about the experiment is given as follows, where '•' denotes parameters known before the simulation and '♦' denotes parameters obtained at the initial stage of the experiment.
•Number of features: 19
♦ Number of reduced features by PCA: 11 (obtained by performing PCA on total available pairs)
•Location of Sub San Francisco Area:
Xrange: 1–600
Yrange: 1–600
•Location of Training/Test Rectangular Area (the first and second values denote the coordinates of the upper-left point of the rectangle; the third and fourth denote its width and height)
Sea:
Training Area1 [100 500 60 60]
Training Area2 [300 200 60 60]
Test Area [500 50 60 60]
Urban:
Training Area1 [450 400 60 60]
Training Area2 [500 250 60 60]
Test Area [500 530 60 60]
Vegetated:
Training Area1 [50 50 60 60]
Training Area2 [50 250 60 60]
Test Area [320 450 60 60]
•Parameters of GLCM
local area: 5 × 5 (pixels)
Number of gray levels: 8
Offset: [0 1]
•Properties of available training/target pairs
Pairs = 21,600
R = 11
K = 3
P (size 11 × 21,600)
T (size 3 × 21,600)
♦ Training Ratio: 0.01 (obtained by simple iterative tests)
•Validation Ratio: 0.99
•Properties of NN optimized by our approach
•Q = Pairs × trainRatio = 216
♦ b = 4.73 (obtained by the BS method)
•IW = P (size: 216 × 11)
•LW = T (size: 3 × 216)
•Properties of BS Method
Tolerance X Value: 1e–3
Tolerance Function Value: 1e–5
Maximum Iterative Steps: 30
•Hardware: Pentium 4 CPU 1.66 GHz, 512 MB of RAM
•Software: PolSARpro v4.0, Neural Network Toolbox of Matlab 7.8(R2009)
The subarea (600 × 600) is shown in
Then, the basic span image and three channels (
The curve of cumulative sum of variance with dimensions of reduced vectors via PCA is shown in
Thus, 11 new features obtained via PCA are input to the NN for classification training.
The classification is run over three classes: the sea, the urban areas, and the vegetated zones. The training and testing areas are selected manually, as shown in
The IW and LW are easily set according to our novel approach, and the number of neurons decreases from 21,600 to only 216. The
The optimal
We use the trained PNN to classify the whole image, and the results are shown in
From
Finally, our method is compared to the 3-layer BPNN [
The classification accuracies of our proposed method in the training area are all higher than 32.5% (33.3% would denote perfect classification). In the test area, the classification accuracies are all at least 30.1%. The main drawback is that around 3.3% of the vegetated zones are misclassified as urban area.
The overall accuracies are calculated as CM_{11} + CM_{22} + CM_{33} and listed in
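As a small check of this formula, summing the diagonal of the first confusion-matrix block's training-area percentages (33.1, 31.9, 32.4) reproduces the 97.4% overall accuracy reported below:

```python
import numpy as np

def overall_accuracy(cm):
    """Overall accuracy CM_11 + CM_22 + CM_33 from a confusion matrix
    whose entries are percentages of the total pixel count."""
    return float(np.trace(cm))

# Training-area percentages of the first confusion-matrix block
cm = np.array([[33.1, 0.0, 0.3],
               [0.0, 31.9, 0.6],
               [0.2, 1.4, 32.4]])
```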
Flevoland, an agricultural area in The Netherlands, was chosen as another example. The site is composed of strips of rectangular agricultural fields. The scene is designated as a supersite for the Earth Observing System (EOS) program and is continuously surveyed by the authorities. The ground truth of the test site can be seen in Ref. [
•Number of features: 19
♦ Number of reduced features by PCA: 13 (obtained by performing PCA on total available pairs)
•Location of Train/Test Rectangular Area
Bare Soil 1:
Train Area [240 300 20 20]
Test Area [770 490 20 20]
Bare Soil 2:
Train Area [335 440 20 20]
Test Area [420 425 20 20]
Barley:
Train Area [285 500 20 20]
Test Area [765 425 20 20]
Forest:
Train Area [959 155 20 20]
Test Area [900 490 20 20]
Grass:
Train Area [535 240 20 20]
Test Area [500 303 20 20]
Lucerne:
Train Area [550 495 20 20]
Test Area [505 550 20 20]
Peas:
Train Area [523 330 20 20]
Test Area [436 200 20 20]
Potatoes:
Train Area [32 40 20 20]
Test Area [655 307 20 20]
Rapeseed:
Train Area [188 200 20 20]
Test Area [280 250 20 20]
Stem Beans:
Train Area [800 350 20 20]
Test Area [777 384 20 20]
Sugar beet:
Train Area [877 444 20 20]
Test Area [650 225 20 20]
Water:
Train Area [965 50 20 20]
Test Area [961 201 20 20]
Wheat:
Train Area [780 710 20 20]
Test Area [700 520 20 20]
•Parameters of GLCM
local area: 5×5 (pixels)
Number of gray levels: 8
Offset: [0 1]
•Properties of available training/target pairs
Pairs = 5200
R = 13
K = 13
P (size 13 × 5200)
T (size 13 × 5200)
♦ Training Ratio: 0.2 (obtained by simple iterative tests)
•Validation Ratio: 0.8
•Properties of NN optimized by our approach
•Q = Pairs × trainRatio = 1040
♦ b = 1.0827 (obtained by the BS method)
•IW = P (size: 1040 × 13)
•LW = T (size: 13 × 1040)
•Properties of BS Method
Tolerance X Value: 1e–3
Tolerance Function Value: 1e–5
Maximum Iterative Steps: 30
•Hardware: Pentium 4 CPU 1.66 GHz, 512 MB of RAM
•Software: PolSARpro v4.0, Neural Network Toolbox of Matlab 7.8(R2009)
The Pauli image of Flevoland is shown in
The basic span image and three channels (
The curve of cumulative sum of variance with dimensions of reduced vectors via PCA is shown in
The classification is run over 13 classes: bare soil 1, bare soil 2, barley, forest, grass, lucerne, peas, potatoes, rapeseed, stem beans, sugar beet, water, and wheat. They are selected manually according to the ground truth [
Since the number of classes increases (from 3 to 13) while the available data decrease (from 21,600 to 5,200), we should allocate more data to the training subset. Finally,
The IW and LW are easily set according to our novel approach, and the number of neurons decreases from 5200 to only 1040. The
The optimal
The confusion matrices on training area and testing area are calculated and listed in
We apply our method on the whole image. The results are shown in
The BS has an important effect on our algorithm as shown in
The feature sets can be divided into two types. One is the polarimetric feature set, which contains the span and the six H/A/α parameters; the other is the texture feature set, which contains the properties extracted from the GLCM.
If we do not use the random division, the structure of the PNN will grow by a factor of 1/
From another point of view, the overall accuracy of the traditional method was expected to be much higher than that of our method, since it uses a great many neurons; in fact, however, they are nearly the same. The reason may lie in the optimization of
PNNs with and without PCA are investigated in the same manner as in Section 7.1. Their computation times are depicted in
In addition, the overall accuracies of these two PNNs are observed. It should be noted that the input data of the PNN without PCA must still be normalized even though the PCA step is omitted; otherwise, the performance of the PNN decreases rapidly.
The overall accuracies obtained by the two PNNs are pictured in
In this paper, a hybrid feature set has been introduced, made up of the span image, the H/A/α decomposition, and GLCM-based texture features. Then, a probabilistic neural network has been established, and we have proposed a novel weights/biases setting method based on Brent's method and PCA. The method reduces the feature dimensionality, reduces the number of neurons, and finds the optimal bias value.
Experiments on terrain classification of the San Francisco site and crop classification of the Flevoland site show that our method obtains good results that are more accurate than those of the 3-layer BPNN. Afterwards, the combined feature set, random division, and PCA are omitted in turn, and the results prove the indispensability of each improvement.
Outline of PNN (R, Q, and K represent number of elements in input vector, input/target pairs, and classes of input data, respectively. IW and LW represent input weight and layer weight, respectively).
The outline of our method.
Using normalization before PCA.
Pauli image of subarea of San Francisco.
Basic span image and three-channel images.
Parameters of H/A/Alpha decomposition.
GLCM-based features of
GLCM-based features of
GLCM-based features of
The curve of cumulative sum of variance with dimensions.
Sample data of San Francisco (Red denotes sea, green urban areas, blue vegetated zones).
The curve of error versus step.
The curve of b versus step.
Classification results of the whole image.
Pauli Image of Flevoland (1024 × 750).
Basic span image and three-channel images.
Parameters of H/A/Alpha decomposition.
GLCM-based features of T_{11}.
GLCM-based features of T_{22}.
GLCM-based features of T_{33}.
The curve of cumulative sum of variance with dimensions.
Sample data areas of Flevoland.
The curve of error versus step.
The curve of b versus step.
Confusion matrix comparison on the training area (values are given in percent). The overall accuracy is 93.71%.
Confusion matrix comparison on the test area (values are given in percent). The overall accuracy is 86.2%.
Classification Map of our method.
Computation time versus square width.
The overall accuracy versus square width.
Pauli bases and their corresponding meanings.
S_{a}  Single- or odd-bounce scattering 
S_{b}  Double- or even-bounce scattering 
S_{c}  Scatterers that return a polarization orthogonal to that of the incident wave (e.g., a forest canopy) 
Properties of GLCM.
Contrast  Intensity contrast between a pixel and its neighbor 

Correlation  Correlation between a pixel and its neighbor (μ denotes the expected value, and σ the standard deviation) 

Energy  Energy of the whole image 

Homogeneity  Closeness of the distribution of GLCM to the diagonal 

Detailed data of PCA on 19 features.
Dimension:                1      2      3      4      5      6      7      8      9
Cumulative variance (%):  37.97  50.81  60.21  68.78  77.28  82.75  86.27  89.30  92.27

Dimension:                10     11     12     13     14     15     16     17     18
Cumulative variance (%):  94.63  96.36  97.81  98.60  99.02  99.37  99.62  99.80  99.92
Comparison of confusion matrices (O denotes the output class, T denotes the target class; counts are followed by percentages of all pixels in the corresponding subset).

3-layer BPNN:
               Training area                     Test area
               T=Sea    T=Urban  T=Veg           T=Sea    T=Urban  T=Veg
O=Sea          7158     4        60              3600     42       5
               33.1%    0.0%     0.3%            33.3%    0.4%     0.0%
O=Urban        0        6882     136             0        3429     355
               0.0%     31.9%    0.6%            0.0%     31.7%    3.3%
O=Vegetated    42       314      7004            0        129      3240
               0.2%     1.4%     32.4%           0.0%     1.2%     30.0%

Proposed method:
O=Sea          7150     0        76              3597     33       0
               33.1%    0.0%     0.4%            33.3%    0.3%     0.0%
O=Urban        2        7074     74              0        3445     354
               0.0%     32.8%    0.3%            0.0%     31.9%    3.3%
O=Vegetated    48       126      7050            3        122      3246
               0.2%     0.6%     32.6%           0.0%     1.1%     30.1%
Overall accuracies (values are given in percent).

                  Training area   Test area
3-layer BPNN      97.4            95.1
Proposed method   98.5            95.3
Detailed data of PCA on 19 features.

Dimension:                1      2      3      4      5      6      7      8      9
Cumulative variance (%):  26.31  42.98  52.38  60.50  67.28  73.27  78.74  82.61  86.25

Dimension:                10     11     12     14     15     16     17     18
Cumulative variance (%):  89.52  92.72  95.50  98.79  99.24  99.63  99.94  99.97
Comparison of PNNs using polarimetric feature set, texture feature set, and combined feature set (TR denotes Classification Accuracy of Total Random).
Comparison of PNN with and without our weights/biases setting (RD denotes Random Division).
1.0818  0.0231  46.8  94.8%  94.9%  
4.0803  0.0386  105.7  95.5%  95.5%  
22.4270  0.0751  298.6  96.3%  96.2%  
58.1409  0.1125  516.8  95.9%  95.4% 