This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Averaged learning subspace methods (ALSM) have the advantage of being easily implemented and appear to perform well in the classification of hyperspectral images. However, there remain some open and challenging problems which, if addressed, could further improve their classification accuracy. We carried out experiments mainly by using two kinds of improved subspace methods (namely, dynamic and fixed subspace methods), in conjunction with the [0,1] and [-1,+1] normalization methods. We used three performance indicators to support our experimental studies: classification accuracy, computation time, and the stability of the parameter settings. Results are presented for the AVIRIS Indian Pines data set. Experimental analysis showed that the fixed subspace method combined with the [0,1] normalization method yielded higher classification accuracy than the other subspace methods. Moreover, ALSMs are easily applied: only two parameters need to be set, and they can be applied directly to hyperspectral data. In addition, they can completely identify training samples in a finite number of iterations.

Hyperspectral data provide detailed spectral information about ground scenes based on a huge number of channels with narrow contiguous spectral bands. Hyperspectral data can therefore better discriminate the spectral signatures of land-cover classes that appear similar when viewed by traditional multispectral sensors [

However, this increase of data dimensionality has introduced challenging methodological problems because of the incapacity of common image processing algorithms to deal with such high-volume data sets [

The subspace pattern recognition method is another dimensionality reduction method that can achieve dimension reduction and classification concurrently. The subspace method represents each class by a model of a linear subspace of a feature space. This method was originally proposed by Watanabe

For character or face image recognition, the processing object is a binary image or a single-band gray-scale image, but for hyperspectral data, the object is a high-dimensional gray-scale image (dimensions equal to the number of bands). Thus, the subspace method must be extended accordingly to hyperspectral data classifications.

In the specific context of hyperspectral data classification, averaged learning subspace methods (ALSM) have been described in our previous work [

Moreover, several critical issues are still unclear, for example, (1) how the data normalization method affects the subspace method, (2) how various approaches for selecting subspace dimensions affect the classification accuracy, (3) how learning parameters influence the training speed and classification accuracy, (4) how the size of the training data set influences the classification accuracy, and (5) how to compute eigenvalues from the correlation matrices.

To avoid overflow problems, high-dimensional hyperspectral data need to be normalized to unit-length before one performs the subspace training and classification procedure. The primary objective of image normalization is to remove the effects of outliers by limiting the extent of scatterplot data [

Another major problem with subspace methods regards eigenvalue computation algorithms. The computational cost of subspace methods critically depends on the eigenvalue computation methods; thus, we adopted the QR method [

In this paper, we present the dynamic subspace dimension method, which sets each subspace dimension independently in ALSMs (hereafter referred to as the dynamic subspace method), and the fixed subspace dimension method, which sets the subspace dimension of every class to the same value (hereafter referred to as the fixed subspace method), based on two normalization methods. We also carried out experimental studies on 16 land-cover classes using the “Indian Pines” 92AV3C data set collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) hyperspectral sensor in June 1992 over the Indian Pines area, Indiana, USA (

The rest of this paper is organized as follows. First, we describe the main idea of subspace methods. Next, we present the data sets and associated processing steps, i.e., normalization methods and eigenvalue computation algorithms. Then, we show comparison results and analyses for AVIRIS hyperspectral data experiments between different normalization methods and our subspace methods. Finally, we present concluding remarks.

Subspace methods have been extended in many ways. The most basic is called class-featuring information compression (CLAFIC) [

Assume that the available hyperspectral data from a given site contain $p$ bands in which $K$ classes $\omega^{(1)}, \omega^{(2)}, \ldots, \omega^{(K)}$ appear. A set of labeled pixels for all such classes should also be available, divided into training and test data sets.

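Given a set of training samples $x_{k,i} \in \mathbb{R}^{p}$ ($1 \le i \le N_k$) belonging to class $\omega^{(k)}$, and an orthonormal basis matrix $U_k = (u_{k,1}, \ldots, u_{k,r_k})$ of the class subspace, for which $U_k^{T} U_k = I$, the correlation matrix $S_k$ of class $\omega^{(k)}$ is computed from the training samples $x_{k,i}$ as:

$$S_k = \sum_{i=1}^{N_k} x_{k,i}\, x_{k,i}^{T}$$

Where:
$x_{k,i,j}$ is the value of band $j$ of the sample $x_{k,i} = (x_{k,i,1}, \ldots, x_{k,i,p})^{T}$.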

For the recognition (classification) task, one needs to compute the distance between the pattern vector (pixel)

Hence, finding the shortest distance is equivalent to finding the largest squared length of the orthogonal projection between pattern vector

Combining _{k}^{T}v_{k}

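Let $U_k$ be the orthonormal matrix that maximizes $\mathrm{tr}(U_k^{T} S_k U_k)$, let $\lambda_{k,1} \ge \lambda_{k,2} \ge \ldots \ge \lambda_{k,r_k}$ be the $r_k$ largest eigenvalues of the class correlation matrix $S_k$, and let $u_{k,1}, u_{k,2}, \ldots, u_{k,r_k}$ be the corresponding orthonormal eigenvectors. The maximum is then attained at $U_k = (u_{k,1}, u_{k,2}, \ldots, u_{k,r_k})$.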

Where:

The proof is shown by Tsuda [

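In summary, determining the subspace of class $\omega^{(k)}$ amounts to solving the eigenvalue problem of the class correlation matrix $S_k$: the eigenvectors $u_{k,1}, u_{k,2}, \ldots, u_{k,r_k}$ associated with the $r_k$ largest eigenvalues of $S_k$ form an orthonormal basis of the subspace of class $\omega^{(k)}$.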
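The CLAFIC procedure summarized above can be sketched on toy data. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`correlation_matrix`, `top_eigenvectors`, `classify`) and the three-band toy pixels are hypothetical, the correlation matrix is taken as an unnormalized sum of outer products, and power iteration with deflation stands in for the eigenvalue solver only because it is short.

```python
import math

def correlation_matrix(samples):
    """Sum of outer products x x^T over one class's training pixels."""
    p = len(samples[0])
    S = [[0.0] * p for _ in range(p)]
    for x in samples:
        for a in range(p):
            for b in range(p):
                S[a][b] += x[a] * x[b]
    return S

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def top_eigenvectors(S, r, iterations=500):
    """Leading r eigenvectors of a symmetric positive semidefinite matrix,
    found by power iteration with deflation (illustrative stand-in only)."""
    p = len(S)
    A = [row[:] for row in S]
    basis = []
    for k in range(r):
        v = [1.0 / (i + k + 1) for i in range(p)]  # arbitrary nonzero start
        start_norm = math.sqrt(sum(c * c for c in v))
        v = [c / start_norm for c in v]
        for _ in range(iterations):
            w = matvec(A, v)
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:
                break
            v = [c / norm for c in w]
        eigenvalue = sum(a * b for a, b in zip(v, matvec(A, v)))
        basis.append(v)
        # Deflate: remove the component just found before the next search.
        for a in range(p):
            for b in range(p):
                A[a][b] -= eigenvalue * v[a] * v[b]
    return basis

def squared_projection(x, basis):
    """Squared length of the orthogonal projection of x onto span(basis)."""
    return sum(sum(a * b for a, b in zip(x, u)) ** 2 for u in basis)

def classify(x, subspaces):
    """Assign x to the class whose subspace gives the longest projection."""
    return max(subspaces, key=lambda label: squared_projection(x, subspaces[label]))

# Hypothetical toy data: class "a" lies near band 1, class "b" near band 3.
training = {
    "a": [[1.0, 0.1, 0.0], [0.9, -0.1, 0.1], [1.1, 0.0, -0.1]],
    "b": [[0.0, 0.1, 1.0], [0.1, -0.1, 0.9], [-0.1, 0.0, 1.1]],
}
subspaces = {label: top_eigenvectors(correlation_matrix(xs), r=1)
             for label, xs in training.items()}
print(classify([0.8, 0.05, 0.1], subspaces))
print(classify([0.1, 0.0, 0.7], subspaces))
```

Each class is trained independently here, which is exactly the limitation of CLAFIC that the learning stage is meant to address.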

CLAFIC has the drawback that the subspace obtained for one class does not depend on the subspaces of the other classes. To avoid this problem, an iterative learning algorithm called ALSM has been proposed [

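At each step $k$ of the learning process, either a sample vector of class $\omega^{(i)}$ is misclassified into another class, say $\omega^{(j)}$, or a sample vector of another class, say $\omega^{(k)}$, is misclassified into class $\omega^{(i)}$. We denote the conditional correlation matrix by:

$$S_k^{(i,j)} = \sum_{x} x\, x^{T},$$

where the sum runs over the training samples $x$ of class $\omega^{(i)}$ that are misclassified into class $\omega^{(j)}$ at step $k$. Based on the current subspaces, all training samples are classified according to their projection lengths, which yields the matrices $S_k^{(i,j)}$; the updated subspace of class $\omega^{(i)}$ can then be computed from the corrected correlation matrix $P_k^{(i)}$.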
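One learning step can be sketched as follows. The update rule used here is the ALSM formulation commonly given in the subspace literature, assumed rather than taken from this text: each class matrix is increased by `alpha` times the outer products of its own samples that were assigned elsewhere, and decreased by `beta` times the outer products of foreign samples that were assigned to it. The names `alsm_update`, `alpha`, and `beta` are hypothetical labels for the two learning parameters.

```python
def outer_sum(samples, p):
    """Conditional correlation matrix: sum of x x^T over the given samples."""
    M = [[0.0] * p for _ in range(p)]
    for x in samples:
        for a in range(p):
            for b in range(p):
                M[a][b] += x[a] * x[b]
    return M

def alsm_update(P, misclassified, alpha, beta):
    """Return updated class matrices after one learning step.

    P             : dict label -> p x p matrix from the previous step
    misclassified : dict (i, j) -> samples of class i that were assigned to j
    alpha, beta   : the two learning parameters (assumed names)
    """
    p = len(next(iter(P.values())))
    new_P = {label: [row[:] for row in M] for label, M in P.items()}
    for (i, j), samples in misclassified.items():
        S_ij = outer_sum(samples, p)
        for a in range(p):
            for b in range(p):
                new_P[i][a][b] += alpha * S_ij[a][b]  # pull class i toward its missed samples
                new_P[j][a][b] -= beta * S_ij[a][b]   # push class j away from them
    return new_P

# Hypothetical two-band example: one pixel of class "a" was assigned to "b".
P = {"a": [[1, 0], [0, 0]], "b": [[0, 0], [0, 1]]}
updated = alsm_update(P, {("a", "b"): [[1.0, 1.0]]}, alpha=0.5, beta=0.25)
print(updated["a"])
print(updated["b"])
```

After each update, the basis of every class subspace would be recomputed from the leading eigenvectors of its updated matrix, with the dimension kept at the value chosen in the CLAFIC stage.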

The subspace dimension markedly affects the pattern recognition rate. The dimensionalities of class subspaces are decided in the CLAFIC stage and then are kept constant during the learning process. Methods for selecting the dimension can be divided into two types: (1) fixed subspace methods, which set a uniform dimension for all classes, and (2) dynamic subspace methods, which set subspace dimensions differently for each class.

For dynamic subspace methods, the dimension $r_i$ of the subspace of class $\omega^{(i)}$ can be chosen based on a fidelity value (i.e., threshold)
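A fidelity-based choice of dimension might look like the sketch below. The exact criterion is not recoverable from the text, so this assumes the common rule of taking the smallest dimension whose leading eigenvalues account for at least the fidelity fraction of the eigenvalue sum. The function name and the toy eigenvalues are hypothetical.

```python
def dimension_for_fidelity(eigenvalues, fidelity):
    """Smallest r such that the r largest eigenvalues sum to at least
    `fidelity` times the total (assumed cumulative-ratio criterion)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for r, value in enumerate(ordered, start=1):
        running += value
        if running >= fidelity * total:
            return r
    return len(ordered)

# Hypothetical eigenvalues of one class correlation matrix.
toy_eigenvalues = [90.0, 9.0, 0.9, 0.1]
for fidelity in (0.5, 0.95, 0.995, 0.9999):
    print(fidelity, dimension_for_fidelity(toy_eigenvalues, fidelity))
```

Because unit-length pixel vectors concentrate most of the energy in the first eigenvalue, fidelity values very close to 1 are needed before the rule selects more than a few dimensions, which is consistent with the narrow fidelity range explored in the experiments.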

In this section, we describe the two normalization methods that we combined with the proposed ALSM classifiers. Both methods normalize each pixel to a unit-length vector by dividing each of its elements by the vector length. Because this normalization does not use the values of neighboring pixels, it avoids the influence of noisy pixels. Detailed descriptions of the two normalization methods are as follows.

Since high-dimensional hyperspectral data are usually at least 10 bits in size, the cumulative values of original high-dimensional hyperspectral data may cause overflow problems when we compute eigenvalues and eigenvectors from the correlation matrix in the ALSM training process without normalization. Hence, normalization is an important step of the algorithm.

There are many normalization methods. In this section, we only consider the [-1, +1] and [0, 1] normalization methods. The choice of scaling each attribute to the range [-1, +1] or [0, 1] is motivated by the successful application of the method to SVM classifiers [

In the [0, 1] normalization method, data are normalized to the range [0, 1] as follows: Given a pixel $x = (x_1, x_2, \ldots, x_n)^{T}$, the normalized pixel is $x / \|x\|$, where $\|x\| = \sqrt{x_1^{2} + x_2^{2} + \ldots + x_n^{2}}$ denotes the pixel length. Obviously, each element of the normalized pixel lies within the range [0, 1] and the length of the normalized pixel is 1.
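As a small worked example of the [0, 1] normalization, the sketch below divides a pixel by its Euclidean length. The function name is hypothetical, and the input is assumed to hold non-negative band values, which is what keeps the result inside [0, 1].

```python
import math

def normalize_unit_length(pixel):
    """Scale a pixel vector of non-negative band values to unit length."""
    length = math.sqrt(sum(v * v for v in pixel))
    if length == 0.0:
        raise ValueError("an all-zero pixel has no direction to normalize")
    return [v / length for v in pixel]

normalized = normalize_unit_length([3.0, 4.0])
print(normalized)
print(math.sqrt(sum(v * v for v in normalized)))
```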

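In the [-1, +1] normalization method, data are normalized to the range [-1, +1] and shifted such that the mean of the data is equal to zero. The [-1, +1] normalization procedure is given as follows: Given a pixel $x = (x_1, x_2, \ldots, x_n)^{T}$, the normalized pixel is $y = (y_1, y_2, \ldots, y_n)^{T}$ with $y_i = (x_i - m)/d$, where $m = (x_1 + x_2 + \ldots + x_n)/n$ is the mean of the pixel and $d = \sqrt{(x_1 - m)^{2} + (x_2 - m)^{2} + \ldots + (x_n - m)^{2}}$. The normalization is undefined when $(x_1 - m)^{2} + (x_2 - m)^{2} + \ldots + (x_n - m)^{2} = 0$.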
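The [-1, +1] normalization can be illustrated in the same way. This sketch assumes the procedure is mean removal followed by division by the length of the centered vector, so that the result has zero mean and unit length. The function name is hypothetical.

```python
import math

def normalize_zero_mean(pixel):
    """Subtract the pixel mean, then scale the centered vector to unit length."""
    mean = sum(pixel) / len(pixel)
    centered = [v - mean for v in pixel]
    length = math.sqrt(sum(c * c for c in centered))
    if length == 0.0:
        raise ValueError("a pixel with identical band values cannot be normalized")
    return [c / length for c in centered]

centered_pixel = normalize_zero_mean([1.0, 2.0, 3.0])
print(centered_pixel)
```

For the pixel (1, 2, 3), the mean is 2 and the centered vector (-1, 0, 1) has length √2, so the two outer elements have magnitude 1/√2 and the middle element is 0.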

Computing the eigenvalues and eigenvectors of the correlation matrix of the input data vector (training samples) is a time-consuming process since the correlation matrix can be as large as bands × bands of elements in hyperspectral data. The time can be noticeably shortened by choosing an appropriate eigenvalue computation algorithm.

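Let A be an $n \times n$ real symmetric matrix. The QR method reduces A toward diagonal form through a sequence of orthogonal similarity transformations $Q^{T} A Q$, which leave the eigenvalues unchanged.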
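The idea behind QR-based eigenvalue computation can be sketched with the unshifted textbook iteration: factor the current matrix as QR, then form RQ, which equals Q^T A Q. Which QR variant the experiments used (shifts, prior tridiagonalization) is not stated, so this is only an illustration. It assumes a symmetric, nonsingular matrix with distinct eigenvalue magnitudes, and all function names are hypothetical.

```python
import math

def qr_decompose(A):
    """Modified Gram-Schmidt factorization A = Q R for a nonsingular matrix."""
    n = len(A)
    columns = [[A[i][j] for i in range(n)] for j in range(n)]
    q_columns = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = columns[j][:]
        for k, q in enumerate(q_columns):
            R[k][j] = sum(q[i] * v[i] for i in range(n))
            v = [v[i] - R[k][j] * q[i] for i in range(n)]
        R[j][j] = math.sqrt(sum(c * c for c in v))
        q_columns.append([c / R[j][j] for c in v])
    Q = [[q_columns[j][i] for j in range(n)] for i in range(n)]
    return Q, R

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def qr_eigenvalues(A, iterations=200):
    """Approximate eigenvalues of a symmetric matrix by unshifted QR iteration."""
    current = [row[:] for row in A]
    for _ in range(iterations):
        Q, R = qr_decompose(current)
        current = matmul(R, Q)  # similarity transform: R Q = Q^T (Q R) Q
    return [current[i][i] for i in range(len(current))]

print(qr_eigenvalues([[2.0, 1.0], [1.0, 2.0]]))
```

For the 2 × 2 example the exact eigenvalues are 3 and 1, and the diagonal of the iterated matrix converges to them. Production code would normally call a tested library routine instead of hand-written iteration.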

To verify the performance of the proposed ALSM algorithm, simulations were carried out on the “Indian Pines” AVIRIS 92AV3C data set, which consists of a 145 × 145 pixel portion [see

To avoid the possibility of overflow problems when computing eigenvalues and eigenvectors, all images, training data, and test data were normalized by the [-1,+1] and [0,1] normalization methods. Experiments with various parameter values were necessary to develop a reasonable subspace classifier. The following section describes the design and results of these experiments.

Our objective was to optimize the accuracy of the ALSM classifier by solving the ALSM model selection issue (i.e., estimating the best values for the dimensions and learning parameters). Three types of experiments were carried out to determine how the classification accuracy is affected by the subspace dimension, normalization, and learning parameters. Furthermore, to assess the influence of the number of training data, we further varied the number of training samples drawn from the training set such that 50% of the original training data were used for training while maintaining a constant testing set in the fixed dimension method.

In all experiments, learning was stopped as soon as all training samples were correctly classified. Runs of more than 1,000 iterations were not considered because the learning process failed to converge after 1,000 iterations in our experiments. The proposed ALSM algorithms were implemented in C++ (Microsoft Visual Studio 2005 .NET).
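The stopping rule can be written as a small driver loop. This is a sketch only: `count_errors` and `update` are hypothetical callbacks standing in for "classify all training samples and count the mistakes" and "perform one learning step", and the toy state below merely simulates a run whose error count falls by one per step.

```python
def train_until_converged(count_errors, update, max_iterations=1000):
    """Run learning steps until no training sample is misclassified.

    Returns (learning steps performed, whether training converged).
    """
    for iteration in range(1, max_iterations + 1):
        if count_errors() == 0:
            return iteration - 1, True
        update()
    return max_iterations, count_errors() == 0

# Simulated run: three misclassified samples, one fixed per learning step.
state = {"errors": 3}
result = train_until_converged(
    count_errors=lambda: state["errors"],
    update=lambda: state.update(errors=state["errors"] - 1),
)
print(result)
```

A run that reaches the iteration cap with errors remaining is reported as not converged, which corresponds to the cases excluded from the experiments.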

We varied the subspace dimension from small to large. For the fixed subspace method, we varied the dimension from 5 to 9. In the dynamic subspace method, we increased the fidelity value from 0.99985 to 0.99993 in steps of 0.00001. Experimental results indicate that when the fidelity value was smaller than 0.99985, the training process was unable to identify 100% of the training samples within 1000 steps and the classification accuracy dropped. When the fidelity value was greater than 0.99993, the classification accuracy rapidly dropped and caused a divergence. Therefore, we considered only fidelity values within the range [0.99985, 0.99993]. For similar reasons we did not consider dimensions smaller than 5 or greater than 9 in the fixed subspace method. The two learning parameters

In the dynamic subspace method, the subspace dimension is determined by

The classification accuracy and training time (maximum training iterations) as a function of the number of subspace dimensions are shown in

Similar behavior was also apparent in the fixed subspace method.

The classification accuracy decreased when the number of subspace dimensions increased in both the fixed and dynamic subspace methods because of the subspaces overlapping problem. When the fidelity value or subspace dimension increases, subspaces become “closer” or overlap with each other and some noise may be included in the subspace. However, if the fidelity value or the subspace dimension is smaller, subspaces are sufficiently separated, but the smaller the number of subspace dimensions, the more information is required to determine a class.

From the results shown in

We investigated the sensitivity of the classification accuracy and maximum training iterations to the learning parameters in both the dynamic and fixed subspace methods. The two learning parameters were set to the same constant value; results from using distinct values are discussed later.

In the dynamic subspace method, the fidelity value was fixed at 0.99985 and the corresponding convergence interval of the learning parameters was [0.18, 0.42]; that is, the two learning parameters were set to the same constant value within [0.18, 0.42]. In the fixed subspace method, the number of dimensions was fixed at 5 and the corresponding convergence interval of the learning parameters was [0.15, 0.51]. As shown in

We examined the behavior of the classification accuracy when the two learning parameters were set to different constant values in both the dynamic and fixed subspace methods. In the dynamic subspace method, we set the fidelity value to 0.99987 and the learning parameter

As shown in

The dynamic subspace method diverged when the [0,1] normalization method was used for various fidelity values and learning parameters.

First, we examined the behavior of the classification accuracy when the two learning parameters were set to different constant values. Similar to the behavior shown in

To assess the training effectiveness at each iteration step, we classified the test samples by concurrently generating subspaces according to

The accuracy of the training and testing data sets increased steadily with the learning iteration (

In

Since using two different learning parameters did not improve the classification accuracy, hereafter we do not consider such a case. For subspace dimensions, we consider only those from 5 to 9, since a dimension of less than 5 or more than 9 causes the training process to diverge or the classification accuracy to drop. We increased the learning parameters starting from 0.03 with a step-size of 0.03 in all cases. In

The five curves in

To examine the behavior of the classifiers with respect to the size of the training set, this experiment evaluated the effect of training set size on the performance of the fixed subspace method in conjunction with the [0, 1] normalization.

To assess the influence of the number of training data, we reduced the number of training samples by 50% except for classes 7 and 9. For classes 7 and 9, we used the original training data set since the number of training samples was limited. The numbers of training and test samples are listed in

As expected, the experimental results clearly show the same behavior as in

In this paper, we have proposed strategies for the optimization of subspace algorithms by modification of the algorithms to make them more suitable for application to hyperspectral data sets. We modified the subspace methods based on the combination of a normalization technique and QR method, and applied them to an AVIRIS dataset to classify 16 land cover classes. Specifically, we verified the following:

The fixed subspace method in conjunction with the [0,1] normalization method is substantially more accurate than other approaches such as the dynamic subspace method.

When the two learning parameters are equal or close to each other, the classification accuracy increases. When the value of the learning parameters is large, the classification accuracy tends to increase and the training time shortens.

The classification accuracy is not sensitive to the dimension of the subspace when it is within a small interval, but a larger dimension tends to reduce the training time.

Experimental results clearly showed that the classification accuracy increased with the size of the training data set.

Our experiments performed by using the subspace method indicate that it is an effective method: it possesses high-speed convergence and can completely identify training samples. Our findings can provide some guidance for the selection of subspace methods, e.g., effective dimension selection and parameter selection rules that make use of the benefits of subspace classifiers and avoid the weaknesses.

Additional aspects of this method remain to be investigated before it becomes operational. The method needs to be extended by considering a broader spectrum of land-cover classes that might also be aggregated to different informational levels. Moreover, data from different sensors and platforms need to be analyzed to explore the sensitivity and efficiency of our method in handling data with different spatial and spectral resolutions. The subspace technique might be further improved by considering several subspaces instead of a single subspace in one class, or by combining it with other innovative methods such as kernel-based methods [

The authors would like to thank the Laboratory of Remote Sensing at Purdue University for providing the AVIRIS hyperspectral image data sets used in the experiments. This work was supported by the Global Environment Research Fund (B-081) of the Ministry of the Environment, Japan.

Plots of the classification accuracy and maximum training iterations as a function of the number of subspace dimensions by the dynamic and fixed subspace methods. (a) Classification accuracy by dynamic subspace vs. mean dimension, (b) Maximum learning iterations by dynamic subspace vs. mean dimension, (c) Classification accuracy by fixed subspace vs. subspace dimension, and (d) Maximum learning iterations by fixed subspace vs. subspace dimension.

Classification accuracy and maximum training iterations as a function of learning parameters combined with [-1, +1] normalization. (a) Classification accuracy by the dynamic subspace method and classification accuracy by the fixed subspace method (b) maximum learning iterations by the dynamic subspace method and maximum learning iterations by the fixed subspace method. The fidelity value in the dynamic subspace method was 0.99985 and the dimension in the fixed subspace method was 5.

Classification accuracy and maximum iterations vs. the difference between

The dynamic subspace method diverged when the [0,1] normalization method was used.

Plots of the difference between

Plots of the accuracy rate vs. the number of iterations for the training and test samples. (a) Dimension of 7 with the learning parameter 0.54; the best test data accuracy of 91.79% was reached when the training iterations completed (at 167). (b) Dimension of 6 with the learning parameter 0.51; an accuracy of 91.34% was achieved when the training iteration completed (at 141). However, the best test data accuracy of 92.11% was reached at training iteration 76.

(a) Band 16 (central wavelength: 547.60 nm) of the AVIRIS Indian Pines data set and (b) ground truth. (c) Classification map obtained with the fixed subspace method combined with the [0, 1] normalization method. The subspace dimension was 7 and both learning parameters were 0.54. The overall classification accuracy was 91.79%.

(a) Classification accuracy and (b) maximum training iterations for various subspace dimensions and learning parameters. The highest classification accuracy of 91.79% was reached by D7 with a learning parameter of 0.54; the corresponding maximum training iteration was 167.

Behavior of the algorithm with low sample sizes. (a) Classification accuracy and (b) maximum training iterations for various subspace dimensions and learning parameters after reducing the number of training samples by 50%. The maximum classification accuracy of 89.50% was reached by D4 with a learning parameter of 0.51; the corresponding maximum training iteration was 141.

Land-cover classes and number of training and test samples in the AVIRIS Indian Pines data set.

Class | Training samples | Test samples |
---|---|---|
C1. alfalfa | 26 | 26 |
C2. corn-notill | 671 | 671 |
C3. corn-min | 400 | 400 |
C4. corn | 98 | 99 |
C5. grass-pasture | 228 | 228 |
C6. grass-trees | 357 | 357 |
C7. grass-pasture-mowed | 13 | 13 |
C8. hay-windrowed | 241 | 241 |
C9. oats | 10 | 10 |
C10. soybean-notill | 480 | 480 |
C11. soybean-min | 1,137 | 1,137 |
C12. soybean-cleantill | 282 | 283 |
C13. wheat | 104 | 105 |
C14. woods | 617 | 618 |
C15. bldg-grass | 180 | 181 |
C16. stone-steel | 44 | 45 |

Subspace dimension of each class (see

C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | C15 | C16 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4 | 6 | 7 | 6 | 4 | 7 | 2 | 5 | 2 | 8 | 8 | 8 | 6 | 5 | 8 | 5 |

Mean subspace dimensions for different fidelity values.

Fidelity value | 0.99985 | 0.99986 | 0.99987 | 0.99988 | 0.99989 | 0.99990 | 0.99991 | 0.99992 | 0.99993 |
---|---|---|---|---|---|---|---|---|---|
Mean dimension | 4.94 | 5.69 | 6.56 | 7.75 | 9.19 | 10.81 | 12.81 | 14.88 | 17.63 |

Results of the confusion matrix.

Classified as \ Reference | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | C15 | C16 | User's accuracy (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
C1 | 23 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 92 |
C2 | 0 | 618 | 10 | 6 | 0 | 0 | 0 | 0 | 0 | 19 | 47 | 2 | 0 | 0 | 0 | 1 | 87.91 |
C3 | 0 | 10 | 344 | 7 | 0 | 0 | 0 | 0 | 0 | 5 | 15 | 11 | 0 | 0 | 0 | 0 | 87.76 |
C4 | 0 | 8 | 9 | 82 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 81.19 |
C5 | 0 | 0 | 0 | 0 | 224 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 98.68 |
C6 | 0 | 0 | 0 | 0 | 1 | 349 | 0 | 0 | 0 | 3 | 1 | 1 | 0 | 2 | 2 | 0 | 97.21 |
C7 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 |
C8 | 3 | 0 | 0 | 0 | 0 | 0 | 3 | 236 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 97.52 |
C9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 90.91 |
C10 | 0 | 7 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 416 | 24 | 2 | 0 | 0 | 0 | 1 | 92.04 |
C11 | 0 | 27 | 27 | 2 | 2 | 1 | 0 | 0 | 0 | 32 | 1044 | 8 | 0 | 0 | 0 | 3 | 91.1 |
C12 | 0 | 1 | 10 | 0 | 1 | 0 | 0 | 0 | 0 | 3 | 5 | 257 | 0 | 0 | 1 | 0 | 92.45 |
C13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 105 | 0 | 1 | 0 | 99.06 |
C14 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 599 | 39 | 0 | 93.74 |
C15 | 0 | 0 | 0 | 1 | 0 | 5 | 0 | 3 | 0 | 0 | 0 | 1 | 0 | 17 | 136 | 0 | 83.44 |
C16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 40 | 97.56 |
Total | 26 | 671 | 400 | 99 | 228 | 357 | 13 | 241 | 10 | 480 | 1137 | 283 | 105 | 618 | 181 | 45 | |
Producer's accuracy (%) | 88.46 | 92.1 | 86 | 82.83 | 98.25 | 97.76 | 69.23 | 97.93 | 100 | 86.67 | 91.82 | 90.81 | 100 | 96.93 | 75.14 | 88.89 | |

Overall classification accuracy: 91.79%; Kappa coefficient: 0.9065
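The reported overall accuracy and Kappa coefficient can be cross-checked from the per-class figures of the confusion matrix. The sketch below assumes that the last entry of each matrix row is that class's user's accuracy and that the bottom line holds the producer's accuracies; the correctly classified counts and row totals are then recovered by rounding. Variable names are hypothetical.

```python
# Test samples per reference class (column totals of the confusion matrix).
column_totals = [26, 671, 400, 99, 228, 357, 13, 241, 10, 480,
                 1137, 283, 105, 618, 181, 45]
# Per-class accuracies in percent, as printed with the confusion matrix.
producer_accuracy = [88.46, 92.1, 86, 82.83, 98.25, 97.76, 69.23, 97.93,
                     100, 86.67, 91.82, 90.81, 100, 96.93, 75.14, 88.89]
user_accuracy = [92, 87.91, 87.76, 81.19, 98.68, 97.21, 100, 97.52,
                 90.91, 92.04, 91.1, 92.45, 99.06, 93.74, 83.44, 97.56]

# Correctly classified pixels per class = producer's accuracy x column total.
diagonal = [round(p * c / 100) for p, c in zip(producer_accuracy, column_totals)]
# Pixels assigned to each class = correct pixels / user's accuracy.
row_totals = [round(d * 100 / u) for d, u in zip(diagonal, user_accuracy)]

n = sum(column_totals)
observed = sum(diagonal) / n
expected = sum(r * c for r, c in zip(row_totals, column_totals)) / n ** 2
kappa = (observed - expected) / (1 - expected)

print(f"overall accuracy: {100 * observed:.2f}%")
print(f"kappa: {kappa:.4f}")
```

Kappa discounts the agreement expected by chance from the row and column totals, so it is lower than the overall accuracy whenever the class sizes are unbalanced, as they are here.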

Numbers of training and test samples from the AVIRIS Indian Pines data set. Test samples were the same as in

Class | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | C15 | C16 | Total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Training | 13 | 335 | 200 | 49 | 114 | 178 | 13 | 120 | 10 | 240 | 568 | 141 | 52 | 308 | 90 | 22 | 2453 |
Test | 26 | 671 | 400 | 99 | 228 | 357 | 13 | 241 | 10 | 480 | 1137 | 283 | 105 | 618 | 181 | 45 | 4894 |
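The reduced training counts in the table above can be reproduced from the full training counts with a simple rule. The rounding direction (odd counts rounded down) is inferred from the tabulated numbers, and how the retained samples were chosen is not stated, so this sketch covers the counts only. Variable names are hypothetical.

```python
# Full training-set sizes for classes C1..C16.
full_training = {1: 26, 2: 671, 3: 400, 4: 98, 5: 228, 6: 357, 7: 13, 8: 241,
                 9: 10, 10: 480, 11: 1137, 12: 282, 13: 104, 14: 617,
                 15: 180, 16: 44}
# Classes 7 and 9 keep all samples because they have so few.
KEEP_FULL = {7, 9}

reduced_training = {
    label: (count if label in KEEP_FULL else count // 2)
    for label, count in full_training.items()
}
print(list(reduced_training.values()))
print(sum(reduced_training.values()))
```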

Comparisons of the maximum classification accuracy and the corresponding training time with 100% and 50% of training data.

Training data used | 100% | 100% | 100% | 100% | 100% | 50% | 50% | 50% | 50% | 50% |
---|---|---|---|---|---|---|---|---|---|---|
Dimension | D5 | D6 | D7 | D8 | D9 | D4 | D5 | D6 | D7 | D8 |
Classification accuracy | 91.28% | 91.52% | 91.79% | 90.01% | 90.54% | 89.50% | 88.43% | 88.60% | 88.68% | 87.60% |
Training iterations | 553 | 97 | 167 | 33 | 41 | 141 | 68 | 44 | 27 | 11 |
Learning parameter | 0.45 | 0.42 | 0.54 | 0.75 | 0.63 | 0.51 | 0.54 | 0.72 | 0.72 | 0.84 |