
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

This paper proposes a hybrid crop classifier for polarimetric synthetic aperture radar (SAR) images. The feature set consisted of the span image, the H/A/α decomposition, and gray-level co-occurrence matrix (GLCM) based texture features. The features were then reduced by principal component analysis (PCA). Finally, a two-hidden-layer forward neural network (NN) was constructed and trained by adaptive chaotic particle swarm optimization (ACPSO). The trained classifier labels each pixel on the order of 10^{−7} s.

The classification of different objects and terrain types from single-channel, single-polarisation SAR images can carry a significant amount of error, even after multilooking [

The Wishart maximum likelihood (WML) method has often been used for PolSAR classification [

To overcome the above shortcomings, polarimetric decompositions were introduced with the aim of establishing a correspondence between the physical characteristics of the considered areas and the observed scattering mechanisms. The most effective method is the Cloude decomposition, also known as the H/A/α method [

In order to reduce the dimensionality of the feature vector obtained by H/A/α and GLCM, and to increase its discriminative power, the principal component analysis (PCA) method was employed. PCA is appealing since it effectively reduces the dimensionality of the features and therefore reduces the computational cost.

The next problem is how to choose the best classifier. In recent years, standard multi-layered feed-forward neural networks (FNN) have been applied to SAR image classification [

However, NNs converge slowly and are easily trapped in local extrema when a back propagation (BP) algorithm is used for training [

In order to improve the performance of PSO, an adaptive chaotic PSO (ACPSO) method was proposed. To prevent overfitting, cross validation was employed: a technique for assessing how the results of a statistical analysis will generalize to an independent data set, mainly used to estimate how accurately a predictive model will perform in practice [

The structure of this paper is as follows: Section 2 introduces the concept of Pauli decomposition. Section 3 presents the span image, the H/A/α decomposition, the features derived from the GLCM, and principal component analysis for feature reduction. Section 4 introduces the forward neural network, proposes the ACPSO for training, and discusses the importance of using cross validation.

The features are derived from the multilook coherence matrix of the PolSAR data [. Here S_{qp} denotes the complex scattering coefficient for p-polarized transmission and q-polarized reception; under the monostatic reciprocity assumption, S_{hv} = S_{vh}.

The Pauli decomposition expresses the scattering matrix S as a weighted combination of the Pauli basis matrices:

Thus,

An RGB image could be formed with the intensities |a|^{2}, |b|^{2}, and |c|^{2}. The meanings of the components a, b, and c are summarized in the table of Pauli bases.
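For reference, the Pauli scattering vector referenced above has the standard monostatic form (assuming reciprocity, S_{hv} = S_{vh}):

```latex
\mathbf{k} \;=\; \frac{1}{\sqrt{2}}
\begin{bmatrix} S_{hh} + S_{vv} \\ S_{hh} - S_{vv} \\ 2\,S_{hv} \end{bmatrix}
\;=\; \begin{bmatrix} a \\ b \\ c \end{bmatrix}
```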

The coherence matrix is obtained as [

The average of multiple single-look coherence matrices is the multi-look coherence matrix. Its diagonal elements (T_{11}, T_{22}, T_{33}) are usually regarded as the channels of the PolSAR images.

The proposed features can be divided into three types, which are explained below.

The span, or total scattered power, is given by: Span = T_{11} + T_{22} + T_{33} = |S_{hh}|^{2} + 2|S_{hv}|^{2} + |S_{vv}|^{2}.

The H/A/α decomposition is designed to identify, in an unsupervised way, the polarimetric scattering mechanisms encoded in the eigenvalues λ_{1}, λ_{2}, λ_{3} and eigenvectors of the coherence matrix.

Then, the pseudo-probabilities of the three eigenvalues are computed.

The entropy [

For high entropy values, a complementary parameter (anisotropy) [

The four estimates of the angles are easily evaluated as:
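For reference, the standard Cloude–Pottier definitions of these quantities, with λ_{1} ≥ λ_{2} ≥ λ_{3} the eigenvalues of the coherence matrix and α_{i} the orientation angles of its eigenvectors, are:

```latex
p_i = \frac{\lambda_i}{\lambda_1 + \lambda_2 + \lambda_3}, \qquad
H = -\sum_{i=1}^{3} p_i \log_3 p_i, \qquad
A = \frac{\lambda_2 - \lambda_3}{\lambda_2 + \lambda_3}, \qquad
\bar{\alpha} = \sum_{i=1}^{3} p_i\, \alpha_i
```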

Thus, the feature vector derived from the coherence matrix can be represented as (

Gray level co-occurrence matrix (GLCM) is a texture descriptor which takes into account the specific position of a pixel relative to another. The GLCM is a matrix whose elements correspond to the relative frequency of occurrence of pairs of gray level values of pixels separated by a certain distance in a given direction [

We suggest calculating GLCMs from four displacement vectors with
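As an illustration of the GLCM construction and the four texture features used here, the following NumPy sketch builds a normalized GLCM for one displacement vector and derives the standard Haralick-style statistics (the toy image and function names are illustrative, not from the original experiments):

```python
import numpy as np

def glcm(img, dx, dy, levels):
    """Gray-level co-occurrence matrix for one displacement vector (dx, dy)."""
    g = np.zeros((levels, levels), dtype=float)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                g[img[y, x], img[y2, x2]] += 1
    return g / g.sum()                      # relative frequencies

def texture_features(p):
    """The four GLCM features used in the paper (standard Haralick forms)."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    s_i = np.sqrt(((i - mu_i) ** 2 * p).sum())
    s_j = np.sqrt(((j - mu_j) ** 2 * p).sum())
    return {
        "contrast":    ((i - j) ** 2 * p).sum(),
        "correlation": ((i - mu_i) * (j - mu_j) * p).sum() / (s_i * s_j),
        "energy":      (p ** 2).sum(),
        "homogeneity": (p / (1.0 + np.abs(i - j))).sum(),
    }

img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
p = glcm(img, dx=1, dy=0, levels=4)         # horizontal neighbor, distance 1
feats = texture_features(p)
```

In the paper's setting, one such matrix would be computed per displacement vector and per channel, and the four statistics extracted from each.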

The texture features consist of 4 GLCM-based features, which should be multiplied by 3 since there are three channels (T_{11}, T_{22}, T_{33}). In addition, there is one span feature and six

PCA is an efficient tool to reduce the dimension of a data set consisting of a large number of interrelated variables while retaining most of the variation. It is achieved by transforming the data set to a new set of variables ordered according to their variances or importance. This technique has three effects: it orthogonalizes the components of the input vectors so that they are uncorrelated with each other, it orders the resulting orthogonal components so that those with the largest variation come first, and it eliminates those components contributing the least to the variation in the data set [

More specifically, for a given n-dimensional matrix

The detailed steps of PCA are as follows: (1) organize the dataset; (2) calculate the mean along each dimension; (3) calculate the deviations from the mean; (4) find the covariance matrix; (5) find the eigenvectors and eigenvalues of the covariance matrix; (6) sort the eigenvectors and eigenvalues; (7) compute the cumulative energy content of each eigenvector; (8) select a subset of the eigenvectors as the new basis vectors; (9) convert the source data to z-scores; (10) project the z-scores of the data onto the new basis. After the data is projected onto the new basis, it concentrates along the first dimension of the new basis.
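The ten steps above can be sketched in NumPy as follows (an illustrative implementation on synthetic correlated data, not the original experimental code):

```python
import numpy as np

def pca_reduce(X, k):
    """PCA following the steps above: center, covariance, eigendecomposition,
    sort, cumulative energy, then project the z-scores onto the top-k basis."""
    mean = X.mean(axis=0)                      # step 2
    B = X - mean                               # step 3: deviations
    C = np.cov(B, rowvar=False)                # step 4: covariance matrix
    vals, vecs = np.linalg.eigh(C)             # step 5 (eigh: symmetric matrix)
    order = np.argsort(vals)[::-1]             # step 6: sort descending
    vals, vecs = vals[order], vecs[:, order]
    energy = np.cumsum(vals) / vals.sum()      # step 7: cumulative energy
    W = vecs[:, :k]                            # step 8: new basis vectors
    Z = B / B.std(axis=0, ddof=1)              # step 9: z-scores
    return Z @ W, energy                       # step 10: projection

rng = np.random.default_rng(0)
# synthetic 2-D data with most variance along one direction
t = rng.normal(size=(200, 1))
X = np.hstack([t, 0.95 * t + 0.05 * rng.normal(size=(200, 1))])
Y, energy = pca_reduce(X, k=1)
```

On such strongly correlated data, the first component alone captures almost all of the cumulative energy, which mirrors how the paper selects the number of retained dimensions from the cumulative variance curve.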

Neural networks are widely used in pattern classification since they do not need any information about the probability distribution or the a priori probabilities of the different classes.

The training vectors are formed from the selected areas, normalized, and presented to the NN, which is trained in batch mode. The network configuration is N_{I} × N_{H1} × N_{H2} × N_{O}, i.e., N_{I} input neurons, N_{H1} neurons in the first hidden layer, N_{H2} neurons in the second hidden layer, and N_{O} output neurons.

The traditional NN training method can easily be trapped in local minima, and the training procedures take a long time [

In PSO, each potential solution is represented as a particle with two properties (position x and velocity v). c_{1} and c_{2} are positive constants, called acceleration coefficients; r_{1} and r_{2} are random numbers uniformly distributed in the interval [0,1]. These random numbers are regenerated every time they occur. Δt denotes the time step between iterations (usually set to 1).

The population of particles is then moved according to Equations (16) and (17). A maximum velocity, V_{max}, should not be exceeded by any particle, to keep the search within a meaningful solution space. The PSO algorithm runs through these processes iteratively until the termination criterion is satisfied.

Let x_{i} denote the position of the ith particle, v_{i} its velocity, and p_{i} its best known position; let g denote the best known position of the entire swarm. The algorithm proceeds as follows:

Step 1 Initialize every particle’s position with a uniformly distributed random vector;

Step 2 Initialize every particle’s best known position to its initial position: p_{i} ← x_{i};

Step 3 If f(p_{i}) < f(g), update the swarm’s best known position: g ← p_{i};

Step 4 Repeat until the termination criteria are met:

Step 4.1 Pick random numbers r_{1} & r_{2};

Step 4.2 Update every particle’s velocity according to formula (16);

Step 4.3 Update every particle’s position according to formula (17);

Step 4.4 If f(x_{i}) < f(p_{i}), update the particle’s best known position: p_{i} ← x_{i}; if f(p_{i}) < f(g), update the swarm’s best known position: g ← p_{i};

Step 5 Output g, the best solution found.
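The loop above can be sketched as follows (an illustrative implementation minimizing a simple sphere objective; the constriction-style coefficients `w`, `c1`, `c2` here are common defaults, not the paper's settings, which use c_{1} = c_{2} = 2):

```python
import random

def pso(f, dim, n_particles=24, iters=200, w=0.729, c1=1.49445, c2=1.49445, vmax=0.5):
    """Canonical PSO (Steps 1-5): velocity update as in formula (16),
    position update as in formula (17), with a V_max clamp per component."""
    X = [[random.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    P = [x[:] for x in X]                       # each particle's best known position
    g = min(P, key=f)[:]                        # swarm's best known position
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()   # fresh random numbers
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (P[i][d] - X[i][d])
                           + c2 * r2 * (g[d] - X[i][d]))
                V[i][d] = max(-vmax, min(vmax, V[i][d]))    # keep |v| <= V_max
                X[i][d] += V[i][d]
            if f(X[i]) < f(P[i]):               # Step 4.4
                P[i] = X[i][:]
                if f(P[i]) < f(g):
                    g = P[i][:]
    return g

random.seed(1)
sphere = lambda x: sum(v * v for v in x)        # simple convex test objective
best = pso(sphere, dim=3)
```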

In order to enhance the performance of canonical PSO, two improvements are proposed as follows. First, the inertia weight ω is decreased adaptively over the training epochs:

Here, ω_{max} denotes the maximum inertia weight, ω_{min} denotes the minimum inertia weight, T_{max} denotes the epoch at which the inertia weight reaches its final minimum, and t denotes the current epoch.
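A linearly decreasing schedule consistent with this description (our reconstruction of formula (18)) is:

```latex
\omega(t) =
\begin{cases}
\omega_{\max} - \dfrac{(\omega_{\max} - \omega_{\min})\, t}{T_{\max}}, & t \le T_{\max} \\[6pt]
\omega_{\min}, & t > T_{\max}
\end{cases}
```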

Second, the parameters (r_{1}, r_{2}) were generated by pseudo-random number generators (RNGs) in classical PSO. Because they are pseudo-random, RNGs cannot ensure the ergodicity of the optimization in solution space; therefore, we employed the Rossler chaotic operator [ to generate (r_{1}, r_{2}). The Rossler equations are as follows:
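The standard Rossler system, whose parameters a, b, and c correspond to the values 0.2, 0.4, and 5.7 listed in the parameter table, is:

```latex
\dot{x} = -y - z, \qquad
\dot{y} = x + a\,y, \qquad
\dot{z} = b + z\,(x - c)
```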

Here a, b, and c are the parameters of the Rossler system.

The dynamic properties of the chaotic sequences used for r_{1} and r_{2} are illustrated in the figures.
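A sketch of such a chaotic number generator follows; the Euler integration step and the logistic squashing into (0, 1) are our illustrative choices, since the paper does not specify how the attractor states are normalized:

```python
import math

def squash(v):
    """Map an unbounded chaotic state into (0, 1) (our illustrative choice)."""
    v = max(-50.0, min(50.0, v / 5.0))      # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-v))

def rossler_sequence(n, a=0.2, b=0.4, c=5.7, dt=0.01, burn=1000):
    """Generate n (r1, r2) pairs from an Euler-integrated Rossler system."""
    x, y, z = 1.0, 1.0, 1.0
    out = []
    for i in range(burn + n):
        dx = -y - z                         # Rossler equations
        dy = x + a * y
        dz = b + z * (x - c)
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        if i >= burn:                       # discard the transient
            out.append((squash(x), squash(y)))
    return out

pairs = rossler_sequence(1000)
```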

There are some other chaotic PSO methods proposed in the past. Wang

The main differences between our ACPSO and popular PSO lie in two points: (1) we introduced the adaptive inertia weight strategy; (2) we used the Rossler attractor because of the following advantages [

Step 1 Initialize every particle’s position with a uniformly distributed random vector;

Step 2 Initialize every particle’s best known position to its initial position: p_{i} ← x_{i};

Step 3 If f(p_{i}) < f(g), update the swarm’s best known position: g ← p_{i};

Step 4 Repeat until the termination criteria are met:

Step 4.1 Update the value of ω according to formula (18);

Step 4.2 Pick chaotic random numbers r_{1} & r_{2} according to formula (19);

Step 4.3 Update every particle’s velocity according to formula (16);

Step 4.4 Update every particle’s position according to formula (17);

Step 4.5 If f(x_{i}) < f(p_{i}), update the particle’s best known position: p_{i} ← x_{i}; if f(p_{i}) < f(g), update the swarm’s best known position: g ← p_{i};

Step 5 Output g, the best solution found.

Let W_{1}, W_{2}, W_{3} represent the connection weight matrices between the input layer and the first hidden layer, between the first and the second hidden layers, and between the second hidden layer and the output layer, respectively. When the ACPSO is employed to train the multi-layer neural network, each particle is denoted by:

The outputs of all neurons in the first hidden layer are calculated by the following steps:

Here the subscripts i and j index the neurons, the superscripts 1 and 2 index the hidden layers, and f_{H} denotes the activation function of the hidden neurons.

The outputs of all neurons in the output layer are given as follows:

Here f_{O} denotes the activation function of the output layer.

The error of one sample is expressed as the MSE of the difference between its output and the corresponding target value. The fitness of a particle is obtained by averaging this error over all N_{S} training samples; it is a function of the weights (W_{1}, W_{2}, W_{3}). Our goal is to minimize this fitness function.
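The particle encoding and the fitness evaluation can be sketched as follows. Assuming N_{I} = 13 retained principal components, N_{H1} = N_{H2} = 10, and N_{O} = 13 classes, the particle dimension works out to 10×14 + 10×11 + 13×11 = 393, which matches the dimension in the parameter table; the sigmoid activation and the folding of biases into the weight matrices are our assumptions:

```python
import numpy as np

def forward(x, W1, W2, W3):
    """Two-hidden-layer forward pass; biases are folded into the weight
    matrices by appending a constant 1 to each layer's input."""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    h1 = sig(W1 @ np.append(x, 1.0))
    h2 = sig(W2 @ np.append(h1, 1.0))
    return sig(W3 @ np.append(h2, 1.0))

def fitness(weights, X, T, n_in, n_h1, n_h2, n_out):
    """Decode a flat particle vector into (W1, W2, W3) and return the MSE
    averaged over all samples -- the quantity ACPSO minimizes."""
    s1 = n_h1 * (n_in + 1)
    s2 = s1 + n_h2 * (n_h1 + 1)
    W1 = weights[:s1].reshape(n_h1, n_in + 1)
    W2 = weights[s1:s2].reshape(n_h2, n_h1 + 1)
    W3 = weights[s2:].reshape(n_out, n_h2 + 1)
    errs = [np.mean((forward(x, W1, W2, W3) - t) ** 2) for x, t in zip(X, T)]
    return float(np.mean(errs))

n_in, n_h1, n_h2, n_out = 13, 10, 10, 13    # assumed layer sizes (see lead-in)
dim = n_h1 * (n_in + 1) + n_h2 * (n_h1 + 1) + n_out * (n_h2 + 1)
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=dim)         # one candidate particle
X = rng.normal(size=(5, n_in))              # toy inputs
T = np.eye(n_out)[rng.integers(0, n_out, size=5)]   # one-hot toy targets
mse = fitness(w, X, T, n_in, n_h1, n_h2, n_out)
```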

Cross validation methods consist of three types: random subsampling, k-fold cross validation, and leave-one-out cross validation.

A challenge is to determine the number of folds k. If k is large, the bias of the true error estimate is small, but its variance is large and the computation is time-consuming; if k is small, the variance and computation time decrease at the cost of a larger bias.

If the model selection and true error estimation are computed simultaneously, the data needs to be divided into three disjoint sets [
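The k-fold split can be sketched as follows (illustrative code; 10 folds of the 5,200 training samples yield the 4,680/520 train/validation split quoted in the sample-number table):

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle n sample indices and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(n, k):
    """Yield (train, validation) index pairs; each fold is held out once."""
    folds = k_fold_indices(n, k)
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

# 10-fold split of the 5,200 training samples, as in the paper:
splits = list(cross_validate(5200, 10))
```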

Flevoland, an agricultural area in The Netherlands, is chosen as the example. The site is composed of strips of rectangular agricultural fields. The scene is designated as a supersite for the Earth Observing System (EOS) program and is continuously surveyed by the authorities.

The Pauli image of Flevoland is shown in

The basic span image and the three channels (T_{11}, T_{22}, T_{33}) are easily obtained and shown in the figures. The GLCM-based features of T_{11}, T_{22}, and T_{33} are shown in

The curve of the cumulative sum of variance versus the number of retained dimensions after PCA is shown in

The classification is run over 13 classes: bare soil 1, bare soil 2, barley, forest, grass, lucerne, peas, potatoes, rapeseed, stem beans, sugar beet, water, and wheat. Our strategy is semiautomatic, viz. the training areas were chosen and labeled manually. For each crop type we chose a square of size 20 × 20, which is easy to do since the total training area of 13 × 20 × 20 = 5,200 pixels is small compared to the whole image of 1,024 × 750 = 768,000 pixels. To reduce the complexity of the experiment, the test areas were chosen randomly from the remaining areas [

The final manually selected training areas are shown in

N_{I} equals the number of retained principal components; N_{H1} and N_{H2} are both set to 10 via the information entropy method [

The network was trained by the proposed ACPSO algorithm, of which the parameters are obtained via trial-and-error method and shown in

The curves of function fitness

The confusion matrices of our method on the training area are calculated and shown in

Typical classification accuracies on both the training area and the test area for BP, ABP, MBP, and PSO are listed in

Yudong also used Resilient back-propagation (RPROP) algorithm to train the neural network to classify the same Flevoland area [

In order to compare the robustness of the algorithms, we ran each algorithm 50 times and calculated the minimum, the average, and the maximum of the classification rates. The results are listed in

Computation time is another important factor used to evaluate the classifier. The network training of our algorithm costs about 120 s, which can be ignored since the weights/biases of the NN remain fixed after training unless the properties of the images change greatly. For example, the main crops in Flevoland are covered by the 13 types shown in the tables. Classifying the whole image takes about 0.048 s, i.e., on the order of 10^{−7} s per pixel, which is fast enough for real-time applications.

In this study, a crop classifier was constructed in the following stages. First, a hybrid feature set was introduced, made up of the span image, the H/A/α decomposition, and the GLCM-based texture features. Afterwards, PCA was carried out to reduce the features. The principal components were sent to the two-hidden-layer neural network, which was trained by the proposed ACPSO method. 10-fold cross validation was employed to prevent overfitting. Experiments on the Flevoland site show that the proposed ACPSO-NN obtains satisfactory results, and that ACPSO trains the neural network more efficiently and effectively than the BP, ABP, MBP, PSO, and RPROP methods. More rigorous testing on more complex problems will be performed in future work.

The research is financed by following projects: (1) National Natural Science Foundation of China (#60872075); (2) National Technical Innovation Project Essential Project Cultivate Project (#706928) and (3) Nature Science Fund in Jiangsu Province (#BK2007103).

Geometric Illustration of PCA.

A three-layer neural network.

Flow chart of the PSO algorithm.

A Rossler chaotic number generator with

Chaotic sequence of (

A 5-fold cross validation.

Pauli Image of Flevoland (1,024 × 750). (

Basic span image and three channels image. (_{11} (dB); (_{22} (dB); (_{33}(dB).

Parameters of H/A/α decomposition. (

GLCM-based features of _{11}. (

GLCM-based features of _{22}. (

GLCM-based features of _{33}. (

Cumulative sum of variance

Sample data areas of Flevoland. (

The curve of fitness

Confusion Matrixes of ACPSO-NN algorithm. (

Pauli bases and their corresponding meanings.

a | Single- or odd-bounce scattering |

b | Double- or even-bounce scattering |

c | Scatterers that return the polarization orthogonal to that of the incident wave (e.g., forest canopy) |

Properties of GLCM.

Contrast | Intensity contrast between a pixel and its neighbor | Σ |i − j|^{2} p(i,j) |

Correlation | Correlation between a pixel and its neighbor (μ denotes the expected value, and σ the standard deviation) | Σ [(i − μ_{i})(j − μ_{j}) p(i,j)] / (σ_{i} σ_{j}) |

Energy | Energy of the whole image | Σ p^{2}(i,j) |

Homogeneity | Closeness of the distribution of the GLCM to the diagonal | Σ [p(i,j) / (1 + |i − j|)] |

Large

Large | ↓ | ↑ | ↑ |

small | ↑ | ↓ | ↓ |

Purposes of different subsets.

Training | Learning to fit the parameters of the classifier |

Validation | Estimate the error rate to tune the parameters of the classifier |

Testing | Estimate the true error rate to assess the classifier |

Detailed cumulative sum of variance.

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |

26.31 | 42.98 | 52.38 | 60.50 | 67.28 | 73.27 | 78.74 | 82.61 | 86.25 | |

| |||||||||

10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | |

89.52 | 92.72 | 95.50 | 98.06 | 98.79 | 99.24 | 99.63 | 99.94 | 99.97 |

Sample numbers of training and test area.

5,200 10 loops (4,680 for train and 520 for validation) | 5,200 | 10,400 |

Parameters of PSO & ACPSO.

Dimensions | 393 | 393 |

V_{max} |
0.04 | 0.04 |

Maximum Iterations | 2,000 | 2,000 |

T_{max} |
1,500 | 1,500 |

24 | 24 | |

c_{1} |
2 | 2 |

c_{2} |
2 | 2 |

Function tolerance | 1e^{−6} |
1e^{−6} |

ω_{max} |
- | 0.9 |

ω_{min} |
- | 0.4 |

a |
- | 0.2

b |
- | 0.4

c |
- | 5.7

A typical classification accuracy of different algorithms (Maximum iterations = 2,000).

Random | 7.69% | 7.69% | 7 |

MBP | 8.8% | 7.5% | 6 |

BP | 8.3% | 8.2% | 5 |

ABP | 90.7% | 86.4% | 4 |

PSO | 98.1% | 88.7% | 3 |

RPROP[ |
98.62% | 92.87% | 2 |

ACPSO | 99.0% | 94.0% | 1

Statistical results of different algorithms (Maximum iterations = 2,000).

Algorithm | Training (min) | Training (avg) | Training (max) | Test (min) | Test (avg) | Test (max)
---|---|---|---|---|---|---

Random | 7.58% | 7.69% | 7.83% | 7.58% | 7.69% | 7.81% |

MBP | 8.52% | 8.83% | 9.08% | 6.98% | 7.44% | 7.92% |

BP | 7.96% | 8.33% | 8.65% | 7.90% | 8.17% | 8.35% |

ABP | 81.04% | 87.18% | 94.12% | 76.60% | 83.55% | 89.83% |

PSO | 95.83% | 97.68% | 98.52% | 83.15% | 89.32% | 91.54% |

RPROP | 97.63% | 98.71% | 98.90% | 90.87% | 92.65% | 93.77% |

ACPSO | 98.15% | 98.84% | 99.13% | 92.56% | 93.80% | 94.52% |

Computation Time of Flevoland image classification.

Span | 0.13 s |

H/A/α decomposition | 0.24 s |

GLCM | 0.23 s |

PCA | 0.18 s |

NN Training * | 120 s

Classification | 0.048 s |

(* denotes training time can be ignored)