Band Ranking via Extended Coefficient of Variation for Hyperspectral Band Selection

Abstract: Hundreds of narrow bands over a continuous spectral range make hyperspectral imagery rich in information about objects, while at the same time causing neighboring bands to be highly correlated. Band selection is a technique that provides results with clear physical meaning for hyperspectral dimensional reduction, alleviating the difficulty of transferring and processing hyperspectral images caused by a property of hyperspectral images: large data volumes. In this study, a simple and efficient band ranking via extended coefficient of variation (BRECV) is proposed for unsupervised hyperspectral band selection. The naive idea of the BRECV algorithm is to select bands with relatively smaller means and larger standard deviations compared to their adjacent bands. To turn this simple idea into an algorithm, and inspired by the coefficient of variation (CV), we constructed an extended CV matrix for every three adjacent bands to study the changes of means and standard deviations, and accordingly propose a criterion to allocate a value to each band for ranking. A derived unsupervised band selection based on the same idea, using entropy instead, is also presented. Though the underlying idea is quite simple, and neither clustering nor optimization methods are used, the BRECV method acquires qualitatively the same level of classification accuracy, compared with some state-of-the-art band selection methods.


Introduction
Hyperspectral images have a wide range of applications, such as change detection [1], target detection [2][3][4], semantic interpretation [5] and image classification [6][7][8]. The reason why hyperspectral images can identify and distinguish a variety of materials is that they have a large number of narrow spectral bands [9]. On the other hand, while providing detailed spectral measurements, the large number of bands also makes hyperspectral images inconvenient to acquire, store, transmit and process, and also causes the curse of dimensionality [10,11]. However, a subset (i.e., a few bands) of an entire hyperspectral data set can be sufficient to identify and distinguish objects, as the information in nearby bands is highly correlated [12,13]. As a result, dimensional reduction should be applied to reduce the inter-band spectral redundancy without losing significant information in data exploitation [14][15][16] and to fulfill the requirements of subsequent tasks, taking classification after dimensional reduction as an example.

Taita Hills
The Taita Hills data set was captured by the airborne AisaEAGLE (Specim Ltd., Finland) imaging spectrometer in the Taita-Taveta District, Kenya, in 2012 [8]. The image consists of 586 × 701 pixels with 64 bands from 0.4 µm to 1.0 µm at 0.6 m ground resolution. The image is classified with field information into six agricultural classes, namely "Acacia" (Acacia spp.), "Banana" (Musa acuminata), "Grevillea" (Grevillea robusta), "Maize" (Zea mays), "Mango" (Mangifera indica) and "Sugarcane" (Saccharum officinarum) [8]. Of these classes, acacia, grevillea and mango are trees between 3 and 14 m tall, while banana, maize and sugarcane are tall grasses. Figure 1 shows the data in a false-color image composition and the related ground truth, investigated the day after the airborne imagery acquisition.

Salinas
The Salinas data set was also captured by the AVIRIS sensor, over Salinas Valley, California, USA, in 1998. The image consists of 512 × 217 pixels. There are 16 classes of interest, and the wavelengths range from 0.4 µm to 2.5 µm. Twenty bands are removed due to water absorption, and 204 bands are used in the experiments.

Figure 1. A false-color image over an agricultural field in the Taita Hills data set (a) and the related ground truth (b).


Underlying Impetus
We believe that for adjacent bands, a band with a relatively smaller mean and a relatively larger standard deviation means that this band is more informative than its nearby bands. On the contrary, if a band has an increased mean and a decreased standard deviation compared with its adjacent band, this band should not be selected. In our experiment, we directly dropped these bands. From the above analysis, the changes of means and standard deviations of nearby bands need to be investigated.
To this end, we constructed a 3 × 3 matrix extended from CV for every three adjacent bands. CV is dimensionless and measures the standard deviation per unit mean of a band:

    CV = σ / µ,    (1)

where σ is the standard deviation and µ is the mean of a band.
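As a minimal sketch of this quantity (NumPy-based; the function name and the (h, w, c) array layout are our own assumptions, not from the paper), the per-band CV of a hyperspectral cube can be computed as:

```python
import numpy as np

def band_cv(cube):
    """Coefficient of variation CV = sigma / mu for each band of an
    (h, w, c) hyperspectral cube."""
    bands = cube.reshape(-1, cube.shape[-1])   # (h*w pixels, c bands)
    mu = bands.mean(axis=0)                    # per-band mean
    sigma = bands.std(axis=0)                  # per-band standard deviation
    return sigma / mu
```

Calling `band_cv` on an (h, w, c) array returns one dimensionless CV value per band.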

Extending Scalar CVs to a 3 × 3 Matrix
Given three adjacent bands named b1, b2 and b3, with means µ1, µ2, µ3 and standard deviations σ1, σ2, σ3, we could construct a matrix

    M = [ σ1/µ1  σ2/µ1  σ3/µ1
          σ1/µ2  σ2/µ2  σ3/µ2
          σ1/µ3  σ2/µ3  σ3/µ3 ],    (2)

i.e., m_ij = σ_j/µ_i. Clearly, we could evaluate the changes of standard deviations through each row of M and the changes of means through each column of M. For a band with a relatively larger standard deviation than its adjacent bands, m22 will be greater than m21 and m23. For a band with a relatively smaller mean than its adjacent bands, m22 will also be greater than m12 and m32, as the mean is in the denominator. From this observation, it is easy to select bands with relatively smaller means and larger standard deviations. To evaluate the degree of increase in the standard deviation of b2 compared with b1 and b3, and the degree of decrease in the mean of b2 compared with b1 and b3, a criterion is proposed:

    v(b2) = (m22 − m21 − m12 + m11) + (m22 − m23 − m32 + m33).    (3)

The first term of Equation (3) is equal to (σ2 − σ1)(1/µ2 − 1/µ1), where σ2 − σ1 assesses how much bigger σ2 is than σ1, and 1/µ2 − 1/µ1 measures how much smaller µ2 is than µ1; the second term treats b3 in the same way. Therefore, according to Equation (3), b2 will have a large value if its mean is relatively smaller and its standard deviation is relatively bigger. For every three adjacent bands, we could use Equation (3) to assign a value to the middle band, so every band in a hyperspectral image obtains a value for ranking. Sorting these values, we get the order of each band for band selection.
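The criterion can be sketched in a few lines of NumPy (an illustrative implementation of our own, not the authors' code; in particular, we assume the two edge bands, which have only one neighbor, simply receive the lowest possible score):

```python
import numpy as np

def brecv_scores(cube):
    """Score every band of an (h, w, c) cube with the BRECV criterion:
    an interior band gets a large value when its standard deviation is
    larger and its mean smaller than those of both spectral neighbors."""
    bands = cube.reshape(-1, cube.shape[-1])   # (pixels, c)
    mu = bands.mean(axis=0)
    sigma = bands.std(axis=0)
    c = len(mu)
    scores = np.full(c, -np.inf)               # edge bands rank last
    for i in range(1, c - 1):
        left = (sigma[i] - sigma[i - 1]) * (1 / mu[i] - 1 / mu[i - 1])
        right = (sigma[i] - sigma[i + 1]) * (1 / mu[i] - 1 / mu[i + 1])
        scores[i] = left + right
    return scores

def rank_bands(scores):
    """Band indices sorted by decreasing score."""
    return np.argsort(-scores)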

Does Entropy Also Work?
In information theory, entropy is usually used to assess the "information" in a signal. The BRECV method tries to select bands that are more informative than their adjacent bands, so it is reasonable to ask whether entropy could also be used based on the same idea. To verify this, we use conditional entropy to measure the relationship between nearby bands. For every three adjacent bands b1, b2 and b3, b2 obtains a value based on

    v(b2) = H(b2 | b1) + H(b2 | b3),    (4)

where H(b2 | b1) is the conditional entropy of b2 given b1, and H(b2 | b3) is the conditional entropy of b2 given b3. The values are then sorted for band selection. This method is termed band ranking via entropy (BRE).
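A histogram-based sketch of this score (our own illustrative code; the bin count and the plug-in entropy estimator are assumptions, not details taken from the paper):

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def conditional_entropy(x, y, bins=32):
    """Plug-in estimate of H(x | y) = H(x, y) - H(y) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    joint = joint / joint.sum()
    h_y = entropy(joint.sum(axis=0))          # marginal distribution of y
    return entropy(joint.ravel()) - h_y

def bre_scores(cube, bins=32):
    """BRE score for each interior band: H(b_i | b_{i-1}) + H(b_i | b_{i+1})."""
    bands = cube.reshape(-1, cube.shape[-1])
    c = bands.shape[1]
    scores = np.full(c, -np.inf)
    for i in range(1, c - 1):
        scores[i] = (conditional_entropy(bands[:, i], bands[:, i - 1], bins)
                     + conditional_entropy(bands[:, i], bands[:, i + 1], bins))
    return scores
```

A band that is hard to predict from either neighbor gets a high score, matching the intuition that it carries information its neighbors do not.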

Drop Adjacent Bands
In some cases, nearby bands have similar values, with the result that some adjacent bands are selected sequentially. To avoid this situation, a band is dropped if its left or right band has already been chosen. For instance, if b2 has already been selected, b1 and b3 are discarded even if they have higher values than other bands. The BRECV and BRE methods with dropping of adjacent bands are termed BRECVD and BRED.
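The dropping rule can be sketched as a single greedy pass over the scores (a hypothetical helper of our own; it assumes one score per band, listed in spectral order):

```python
def select_with_drop(scores, n_bands):
    """Greedily pick bands in decreasing-score order, skipping any band
    whose immediate spectral neighbor has already been chosen."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set()
    for i in order:
        if i - 1 in chosen or i + 1 in chosen:
            continue                      # drop band adjacent to a chosen one
        chosen.add(i)
        if len(chosen) == n_bands:
            break
    return sorted(chosen)

print(select_with_drop([0, 5, 4, 3, 1], 2))  # band 2 is dropped -> [1, 3]
```

Here band 2 has the second-highest score but is skipped because its neighbor, band 1, was already selected.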
The CV values of each band can also be directly sorted and used for band selection. This method is called band ranking via CV (BRCV). Therefore, in this study, three groups of band selection methods are proposed: BRCV, BRECV/BRECVD and BRE/BRED.

Time Complexity Analysis
Given a hyperspectral image I ∈ R^(h×w×c), where h is the height, w the width, and c the number of bands, the time complexity of calculating the means of all bands is O(hwc), as is that of calculating their standard deviations. The operations for allocating values to bands, sorting the values and dropping adjacent bands involve only the c means and standard deviations; since c is usually far less than hwc, these operations can be omitted. Hence, the time complexities of BRECV and BRECVD are linear.
Though we do not analyze the space complexity in detail, only the means and standard deviations of the bands need to be stored, so the space complexity is also very low for BRECV and BRECVD. For the BRCV, BRE and BRED methods, the performances are not robust, as will be shown in the experimental results; we therefore do not analyze their computational complexity.

Results and Discussion
To verify the effectiveness of the proposed methods, classification experiments were implemented on the six above-mentioned real-world hyperspectral images. Details of the experimental setup and results are presented in this section, together with the discussion.

• Optimal neighborhood reconstruction (ONR) [32]
ONR selects bands by finding the optimal band combination to reconstruct the original data. A noise reducer was used to minimize the influence of noisy bands.

• Optimal clustering framework (OCF) [26]
OCF first finds the optimal clustering under some reasonable constraint and then ranks the clusters to effectively select bands based on the clustering structure. OCF can also automatically determine the number of bands to select.

• Enhanced fast density-peak-based clustering (EFDPC) [19,32]
EFDPC tries to find cluster centers with the properties of large local density and large intercluster distance. Large local density means a cluster should contain as many points as possible; large intercluster distance means different cluster centers should be far from each other. EFDPC ranks bands by weighting these two properties.

Classifiers
Support vector machine (SVM) and k-nearest neighbor (KNN) classifiers were used to verify the classification performance of the different band selection methods. We used the SVM and KNN classifiers provided by MATLAB R2019b. For the SVM classifier, the kernel function is "rbf" and the coding method is "onevsall"; for the KNN classifier, k is 3. All parameters were kept the same for each classification experiment.
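For readers without MATLAB, an analogous setup can be sketched with scikit-learn (our own approximation, not the implementation used in the paper: `SVC` with an RBF kernel and a one-vs-rest decision function stands in for MATLAB's one-vs-all coding):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

def average_oa(X, y, n_runs=5, seed=0):
    """Train on a random 10% of the labeled samples, test on the rest,
    and average the overall accuracy (OA) over several runs."""
    oa_svm, oa_knn = [], []
    for run in range(n_runs):
        Xtr, Xte, ytr, yte = train_test_split(
            X, y, train_size=0.1, stratify=y, random_state=seed + run)
        svm = SVC(kernel="rbf", decision_function_shape="ovr").fit(Xtr, ytr)
        knn = KNeighborsClassifier(n_neighbors=3).fit(Xtr, ytr)
        oa_svm.append(svm.score(Xte, yte))
        oa_knn.append(knn.score(Xte, yte))
    return float(np.mean(oa_svm)), float(np.mean(oa_knn))
```

Here `X` would hold the pixel spectra restricted to the selected bands and `y` the class labels.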
For each data set, 10% of the labeled samples of each class were randomly selected to train the classifiers, and the remaining 90% of the samples were used for testing. Each experiment was repeated 100 times, and the results were averaged to obtain stable results. Overall accuracy (OA) curves were used to compare the different band selection methods. Similar to [26] and [32], we only used at most 30 bands in each experiment. Our code is available at https://github.com/cvvsu/BRECV.
• KSC
From Figure 3, on the KSC data set, the classification results of the BRECV and BRECVD methods were quite similar to those of OCF and ONR and were better than those of EFDPC. The performances of BRE and BRED were also acceptable on the KSC data set. However, the BRCV method still had a lower performance on this data set. Dropping adjacent bands considerably improved the performance of the BRE method.

• Pavia University
From Figure 4, on the Pavia University data set, the performance of the BRECVD and BRED methods exceeded that of the OCF method when the number of selected bands was greater than 25. BRECV outperformed the EFDPC method at first but was eventually overtaken. BRCV still had the lowest performance compared with the other methods.

• Botswana
From Figure 5, on the Botswana data set, all band selection methods except for the BRCV method showed similar classification performance.
• Salinas
From Figure 6, on the Salinas data set, similar to the results on the Botswana data set, all methods except BRCV had qualitatively the same level of performance.
• Taita Hills


• Average OAs over different selected bands
Table 1 shows the indices of the bands selected by BRECVD on the different data sets. Tables 2 and 3 show the average OAs of the SVM classifier and the KNN classifier over the 30 selected bands, respectively. Generally, BRECV and BRECVD had better average OAs on the different data sets than the EFDPC method. The performances of the BRE and BRED methods were not robust, and their performance on the Indian Pines data set was quite poor. The BRCV method achieved relatively better performance on the Taita Hills data set; on the other data sets, it did not reach acceptable classification performance.

Discussion
From the above experiments, we found that the proposed BRECV and BRECVD methods achieved quite stable performances on all the data sets. The performances of entropy-based methods were not robust, and directly ranking the CV value of each band did not provide good results except on the Taita Hills data set. One possible reason is that mean and standard deviation provide two dimensions to investigate the relationships between nearby bands, while entropy and pure CV provide just one dimension.
Compared with the EFDPC method, the BRECV and BRECVD methods achieved better classification results; only on the Indian Pines data set did the three methods have similar performance. Compared with OCF, BRECV and BRECVD showed better performance on the Indian Pines data set and similar performance on the other five data sets. The ONR method outperformed all other methods on most data sets. In most cases, dropping adjacent bands improved the classification performance of the BRE and BRECV methods.
Considering that the BRECV and BRECVD methods use neither clustering nor optimization and select bands based only on the means and standard deviations of the bands, it is reasonable to believe that these two methods are also useful for hyperspectral band selection. Moreover, the physical meaning of the bands selected by BRECV and BRECVD is clear.
From the classification performance of the proposed methods, we could say that the bands selected by our methods were representative and informative, since these selected bands achieved qualitatively the same level of classification performance as some state-of-the-art band selection methods and as the whole hyperspectral data set. Figure 8 shows scatter plots of the means and standard deviations of the bands in each data set, with the selected bands marked in filled red, giving a concrete sense of the relative locations of the selected bands in each hyperspectral data set. For the KSC data set, the selected bands were mainly located in a small band region, which is similar to the results of ONR; in ONR, the bands selected from the KSC data set covered only 2/5 of the whole spectrum.

Conclusions
This study investigated the relationship between nearby bands in a hyperspectral data set and proposed a criterion for band ranking. An extended matrix based on the coefficient of variation was used to study the changes of means and standard deviations. Finally, several band ranking methods were presented for hyperspectral band selection according to the relationships between nearby bands. The proposed methods are quite efficient, as they rely only on per-band means and standard deviations and thus avoid the large-data-volume problem. Compared with other band selection methods, the proposed methods obtained qualitatively the same level of classification performance.