Band Subset Selection for Hyperspectral Image Classification

This paper develops a new approach to band subset selection (BSS) for hyperspectral image classification (HSIC) which selects multiple bands simultaneously as a band subset, referred to as simultaneous multiple band selection (SMMBS), rather than one band at a time sequentially, referred to as sequential multiple band selection (SQMBS), as most traditional band selection methods do. In doing so, a criterion is developed specifically for BSS that can be used for HSIC. It is a linearly constrained minimum variance (LCMV) criterion derived from adaptive beamforming in array signal processing which can be used to model misclassification errors as the minimum variance


Introduction
Hyperspectral image classification has received considerable interest in recent years. Its band selection (BS) issue has also been studied extensively. In general, there are two approaches to BS. One is to select bands one at a time, sequentially; this is referred to as sequential multiple band selection (SQMBS). In this case, a criterion that can be used to select bands, according to priorities ranked by the criterion, is usually required. Such a criterion is referred to as a band prioritization (BP) criterion, and it can be designed from two perspectives. One type of BP criterion is based on data characteristics or statistics such as variance, signal-to-noise ratio (SNR), entropy, and information divergence (ID), and calculates a priority score for each individual band in order to rank the bands [25]. As a result, such BP-based SQMBS is generally unsupervised and is not adaptive to any particular application. In other words, the same selected bands are applied to all applications. The other type of BP criterion is supervised and is adaptive to a particular application, such as classification, target detection [49,50], endmember extraction [51], spectral unmixing [52], etc. Unfortunately, one of the major problems with BP-derived BS methods is how to deal with band correlation. Since hyperspectral imagery has very high interband correlation, a band with a high priority implies that its adjacent bands also have high priorities. To avoid this dilemma, band decorrelation may be required to remove redundant bands from a group of selected bands. However, this also comes with two issues: how to select a band correlation criterion to measure the correlation between two bands, and how to determine the threshold at which two bands are considered sufficiently decorrelated.
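The BP-based SQMBS procedure described above can be illustrated with a minimal sketch. It assumes band variance as the BP criterion and the absolute correlation coefficient as the decorrelation measure; the function name `bp_sequential_select` and the threshold `max_corr=0.9` are hypothetical choices made only for illustration, not values taken from the cited works.

```python
import numpy as np

def bp_sequential_select(X, n_bs, max_corr=0.9):
    """Select n_bs bands one at a time from X (pixels x bands).

    Bands are ranked by variance (an assumed BP criterion); a band is skipped
    when its absolute correlation with an already selected band reaches
    max_corr (an assumed decorrelation threshold).
    """
    scores = X.var(axis=0)                       # priority score per band
    order = np.argsort(scores)[::-1]             # highest priority first
    corr = np.abs(np.corrcoef(X, rowvar=False))  # band-to-band correlation
    selected = []
    for b in order:
        if all(corr[b, s] < max_corr for s in selected):
            selected.append(int(b))
        if len(selected) == n_bs:
            break
    return selected

# Synthetic data: bands 0 and 1 are nearly identical and have the largest variance.
rng = np.random.default_rng(0)
base = rng.normal(size=(300, 3))
X = np.column_stack([
    3.0 * base[:, 0],
    3.0 * base[:, 0] + 0.01 * rng.normal(size=300),
    2.0 * base[:, 1],
    1.0 * base[:, 2],
])
selected = bp_sequential_select(X, n_bs=2)
print(selected)
```

Without the correlation check, the two near-duplicate bands would both be selected, which is the redundancy problem noted above. With it, the outcome depends on the chosen threshold, which is the second issue raised in the text.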
As an alternative to BP-based SQMBS methods, another approach, referred to as simultaneous multiple band selection (SMMBS), is to select multiple bands simultaneously as a band subset. This approach does not have the issues of prioritizing or decorrelating bands that are encountered in SQMBS. However, the price paid for these advantages is the need for an effective search strategy to find an optimal band subset, since this generally requires an exhaustive search, which is practically infeasible. To address this issue, several works have recently been proposed, such as band clustering [58][59][60], particle swarm optimization (PSO) in [35], firefly algorithm (FA) in [36], multitask sparsity pursuit (MTSP) [38], multigraph determinantal point process (MDPP) [43], dominant set extraction BS (DSEBS) in [40], etc. Of particular interest is a new concept of band subset selection (BSS), which differs from the aforementioned SMMBS methods in the search strategy used to find an optimal set of multiple bands. It considers a selected band as a desired endmember. Accordingly, finding an optimal set of endmembers from all data sample vectors can be translated to selecting an optimal band subset simultaneously from all bands. With this interpretation, two sequential algorithms designed to realize an N-finder algorithm (N-FINDR) [61] numerically, called sequential N-FINDR (SQ N-FINDR) and successive N-FINDR (SC N-FINDR) [62][63][64][65], can be redesigned to find desired band subsets; the resulting algorithms are called SQ BSS and SC BSS. These SQ BSS and SC BSS algorithms were recently developed for SMMBS in applications of anomaly detection [66] and spectral unmixing and classification [67,68]. This paper further extends BSS to hyperspectral image classification and has several aspects not found in [66][67][68]. First and foremost is the criterion used for BSS, which is the minimum variance resulting from a linearly constrained finite impulse response filter arising in adaptive beamforming in array signal processing [69][70][71][72]. This linearly constrained minimum variance (LCMV)-based BSS interprets signal sources as class signature vectors and linearly constrains the class signature vectors to find an optimal band subset for classification. It is very different from constrained energy minimization (CEM)-based BS [26], which constrains a single selected band, and also from constrained multiple band selection (CMBS) [68], which extends CEM-BS by constraining multiple bands as band subsets, not class signature vectors as LCMV-BSS does. Secondly, two new SQ BSS and SC BSS algorithms are developed for LCMV-BSS, specifically for classification, referred to as SQ LCMV-BSS and SC LCMV-BSS. Thirdly, the classifier used to evaluate BS performance is also an LCMV classifier, which is designed to best utilize the bands selected by LCMV-BSS. Fourthly, although LCMV-BSS may not exhaust all possible band combinations, to the authors' best knowledge it is probably the only BSS algorithm that searches numerically among all possible band combinations, whereas other SMMBS algorithms such as PSO, FA, MTSP, MDPP, and DSEBS are designed to search only a very small preselected set of band subsets. Finally, and most importantly, the proposed LCMV-BSS is very easy to implement because, unlike many BS methods, it has no parameters that need to be tuned. This is a tremendous advantage, since such parameters would otherwise have to be adapted to each application.

LCMV Criterion for BSS
Suppose that there are $M$ classes of interest and each class is specified by a class signature vector, denoted by $d_1, d_2, \ldots, d_M$. We can now form a class signature matrix, denoted by $D = [d_1\ d_2\ \cdots\ d_M]$. The goal is to design an FIR linear filter with $L$ filter coefficients $\{w_1, w_2, \ldots, w_L\}$, denoted by an $L$-dimensional vector $w = (w_1, w_2, \ldots, w_L)^T$, that minimizes the filter output energy subject to the following constraint:

$$D^T w = c, \tag{1}$$

where $c = (c_1, c_2, \ldots, c_M)^T$ is a constraint vector. Using (1), we derive the following linearly constrained optimization problem:

$$\min_{w} \left\{ w^T R w \right\} \quad \text{subject to} \quad D^T w = c, \tag{2}$$

where $R = (1/N)\sum_{i=1}^{N} r_i r_i^T$ is the sample autocorrelation matrix of the image formed from its $N$ data sample vectors $r_i$. The solution to (2) is called the LCMV-based classifier and can be obtained in [69,71,72] by

$$\delta^{LCMV}(r) = \left(w^{LCMV}\right)^T r \tag{3}$$

with

$$w^{LCMV} = R^{-1} D \left(D^T R^{-1} D\right)^{-1} c. \tag{4}$$

Substituting (4) into the objective function of (2) yields

$$\min_{w} \left\{ w^T R w \right\} = \left(w^{LCMV}\right)^T R\, w^{LCMV} = c^T \left(D^T R^{-1} D\right)^{-1} c. \tag{5}$$

According to [70], (5) is the minimum variance weighted by $R^{-1}$. As a matter of fact, (5) can also be viewed as the minimal $R^{-1}$-weighted least squares error (LSE) caused by misclassification errors from operating $\delta^{LCMV}$ on the entire image cube. More details on LCMV can be found in [69][70][71].
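As a numerical check of the LCMV derivation, the following minimal sketch computes the filter weights and the minimum variance on synthetic data and confirms that the constraint holds and that the filter output energy equals the closed-form minimum variance. The function name `lcmv_weights`, the random data, and the all-ones constraint vector are assumptions made only for illustration.

```python
import numpy as np

def lcmv_weights(X, D, c):
    """LCMV filter for pixels X (N x L), class signatures D (L x M), constraints c (M,)."""
    N = X.shape[0]
    R = X.T @ X / N                         # sample autocorrelation matrix R
    RinvD = np.linalg.solve(R, D)           # R^{-1} D without forming the inverse
    G = D.T @ RinvD                         # D^T R^{-1} D, an M x M matrix
    w = RinvD @ np.linalg.solve(G, c)       # w = R^{-1} D (D^T R^{-1} D)^{-1} c
    mv = float(c @ np.linalg.solve(G, c))   # c^T (D^T R^{-1} D)^{-1} c
    return w, mv, R

# Synthetic example: 500 pixels, L = 6 bands, M = 2 hypothetical class signatures.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
D = rng.normal(size=(6, 2))
c = np.array([1.0, 1.0])

w, mv, R = lcmv_weights(X, D, c)
print(np.allclose(D.T @ w, c))      # the linear constraint D^T w = c holds
print(np.isclose(w @ R @ w, mv))    # w^T R w equals the closed-form minimum variance
```

Using `np.linalg.solve` instead of an explicit matrix inverse is a numerical design choice of this sketch, not something prescribed by the paper.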

Band Subset Selection
A BS problem is generally described as follows. Assume that $J(\cdot)$ is a generic objective function of $\Omega_{BS}$ to be optimized for BS, where $\Omega_{BS}$ is a band subset selected from a full band set $\Omega$. For a given number $n_{BS}$ of selected bands, a BS method finds an optimal band subset $\Omega_{BS}^{*}$ with $|\Omega_{BS}| = n_{BS}$ which satisfies the following optimization problem:

$$\Omega_{BS}^{*} = \arg\left\{ \operatorname*{max/min}_{\Omega_{BS} \subset \Omega,\; |\Omega_{BS}| = n_{BS}} J(\Omega_{BS}) \right\}. \tag{6}$$

Depending upon how the objective function $J(\Omega_{BS})$ is designed, the optimization in (6) can be performed by either maximization or minimization over all possible band subsets $\Omega_{BS}$ contained in $\Omega$ with $|\Omega_{BS}| = n_{BS}$.
Since solving (6) requires exhausting all possible $n_{BS}$-band combinations to find an optimal band subset $\Omega_{BS}^{*}$, it is practically impossible to do so. Accordingly, many approaches have been investigated that design various criteria or features to define $J(\Omega_{BS})$ and solve (6). One traditional approach is to design a BP criterion to rank all bands, and then to select bands according to the priorities calculated by that criterion. Such an approach generally results in an SQMBS method which selects multiple bands one at a time sequentially. As noted in the introduction, one major issue arising from this approach is how to deal with redundant bands caused by band correlation. As an alternative, another BP-derived SQMBS method specifies a particular application, such as minimum estimated abundance covariance (MEAC) for classification [34], which generates feature vectors for BP and then takes advantage of the sequential forward floating search (SFFS) and sequential backward floating search (SBFS) developed in [73] to derive forward and backward BS methods. However, the band correlation issue still remains.
In contrast to SQMBS, many recent efforts have been directed to SMMBS, which selects multiple bands simultaneously. Two main issues associated with SMMBS need to be addressed. One is determining the number $n_{BS}$ of bands to be selected, which is also an issue in SQMBS. Generally, $n_{BS}$ can be determined by either trial and error or the virtual dimensionality (VD) developed in [69,74]. The other, more critical issue is how to find appropriate $n_{BS}$ bands. Suppose that $n_{BS} = p$ is the number of bands to be selected and $\Omega_p = \{B_{l_1}, B_{l_2}, \ldots, B_{l_p}\}$ is a $p$-band subset of the full band set $\Omega = \{B_l\}_{l=1}^{L}$, where $L$ is the total number of bands and $B_{l_j}$ is the selected $j$th band. In order to find an optimal band subset $\Omega_p^{*}$, we must run through all possible $\binom{L}{p} = \frac{L!}{p!\,(L-p)!}$ $p$-band subsets, which is prohibitive when $L$ is large, as in hyperspectral imagery. In this case, developing an effective search strategy for finding an optimal set of multiple bands, an issue that does not arise in SQMBS, is a great challenge for SMMBS.
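To give a sense of scale for this combinatorial search, the short sketch below counts the candidate band subsets for the Salinas scene, using the 224 bands and $n_{BS} = 21$ reported in the experiments of this paper. The variable names are illustrative only.

```python
import math

L = 224   # total number of bands (Salinas scene)
p = 21    # number of bands to select (n_BS estimated by VD for Salinas)

n_subsets = math.comb(L, p)   # number of distinct p-band subsets
print(f"{n_subsets:.3e}")
```

The count exceeds $10^{25}$, which is why none of the SMMBS methods discussed here can rely on exhaustive enumeration.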
A simple SMMBS approach is to group or combine bands into clusters, each of which produces a representative band for BS using certain band measure criteria [58][59][60]. In particular, the concept in [58] is similar to Fisher's ratio, using mutual information as a band prioritization criterion for clustering. Most interestingly, a band group-wise method was developed in [38], which used band combinations by compressive sensing and a multitask sparsity pursuit (MTSP)-based criterion to select band combinations based on linear sparse representation via an evolution-based search strategy. Another SMMBS approach is to narrow the search range by specifying particular parameters that limit the candidate optimal sets to a small number of band subsets, and then to apply an optimization algorithm such as PSO [35] or FA [36] to find an optimal band subset from the selected candidate set of band subsets.
Most recently, two other promising approaches have been reported. One is to use graph-based representations with each path used to specify a particular band subset. For example, Yuan et al. [43] proposed a graph-based SMMBS method, called multigraph determinantal point process (MDPP), which makes use of multiple graphs to discover a structure and diverse band subset from a graph where each node represents a band and the edges are specified by similarity between bands. Accordingly, a path represents a possible band subset. Then, a search algorithm called mixture determinantal point process (Mix-DPP) was further developed to find a diverse subset that can be a potential optimal band combination. The other is DSEBS, which exploits structure information via a set of local spatial-spectral filters and uses a graph-based clustering search strategy derived from dominant set extraction to find a potential optimal band subset [40].
In addition to the above-mentioned approaches there is also a new approach, called BSS, which considers the problem of multiple band selection as an endmember finding problem. If a desired selected band is interpreted as an endmember and the full band set as the entire data set, then a band subset can be interpreted as a set of endmembers. Consequently, finding an optimal set of $n_{BS}$ bands can be carried out in a similar way to finding an optimal set of $n_{BS}$ endmembers. This BSS-based approach has recently proved to be very promising and has great potential in various applications such as anomaly detection in [65], spectral unmixing in [66], and target detection in [67]. This paper presents another new application of BSS to hyperspectral image classification with LCMV used as a criterion particularly designed for classification.

LCMV-BSS Algorithms
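Now, if we replace the full band set $\Omega$ in $R^{-1}$ of (5) with a selected band subset $\Omega_{BS}$, then (5) becomes

$$MV(\Omega_{BS}) = c^T \left( D_{\Omega_{BS}}^T R_{\Omega_{BS}}^{-1} D_{\Omega_{BS}} \right)^{-1} c, \tag{7}$$

where $D_{\Omega_{BS}}$ and $R_{\Omega_{BS}}$ are the class signature matrix and the sample autocorrelation matrix formed from only the bands in $\Omega_{BS}$. This is the minimum variance weighted by $R_{\Omega_{BS}}^{-1}$ resulting from the LCMV filter using a partial band subset specified by $\Omega_{BS}$. Alternatively, (7) can also be interpreted as the least $R_{\Omega_{BS}}^{-1}$-weighted square error. It should be noted that the constraint vector $c$ is specifically designed to take care of the $M$ class signatures $d_1, d_2, \ldots, d_M$, not bands. Accordingly, $c$ has nothing to do with the selected band subset $\Omega_{BS}$ and, thus, it remains constant in (7) for any selected band subset $\Omega_{BS}$.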
Using the $MV(\Omega_{BS})$ in (7), a criterion can be designed to find an optimal band subset $\Omega_{BS}^{*}$ which solves

$$\Omega_{BS}^{*} = \arg\left\{ \min_{\Omega_{BS} \subset \Omega,\; |\Omega_{BS}| = n_{BS}} MV(\Omega_{BS}) \right\}. \tag{8}$$

By virtue of (8), two types of algorithms derived from SQ N-FINDR and SC N-FINDR, called the sequential LCMV-BSS (SQ LCMV-BSS) algorithm and the successive LCMV-BSS (SC LCMV-BSS) algorithm, can be further developed as follows.
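For intuition, the band-subset minimum variance and the resulting minimization over band subsets can be evaluated by brute force on a problem small enough to enumerate. The sketch below is an illustration under stated assumptions: the data are synthetic, `mv_subset` is a hypothetical helper name, and exhaustive enumeration is used only because there are 8 bands here, which is exactly what is infeasible for real hyperspectral images.

```python
import itertools
import numpy as np

def mv_subset(X, D, c, bands):
    """Minimum variance MV for the band indices in `bands`.

    X is pixels x bands, D is bands x classes, c is the constraint vector.
    """
    idx = list(bands)
    Xs, Ds = X[:, idx], D[idx, :]
    R = Xs.T @ Xs / X.shape[0]              # autocorrelation restricted to the subset
    G = Ds.T @ np.linalg.solve(R, Ds)       # D^T R^{-1} D on the subset
    return float(c @ np.linalg.solve(G, c))

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 8))    # 400 synthetic pixels, L = 8 bands
D = rng.normal(size=(8, 2))      # M = 2 hypothetical class signatures
c = np.ones(2)
p = 3                            # subset size; p >= M keeps D^T R^{-1} D invertible

all_subsets = list(itertools.combinations(range(8), p))
best = min(all_subsets, key=lambda s: mv_subset(X, D, c, s))
print(best, mv_subset(X, D, c, best))
print(mv_subset(X, D, c, range(8)))   # full-band value, never larger than any subset value
```

The full-band minimum variance is a lower bound because any filter restricted to a band subset is also a feasible full-band filter with zero weights on the excluded bands.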

SQ LCMV-BSS
The idea of SQ LCMV-BSS is to use two loops, iterating band subsets $\Omega_{BS}$ in an outer loop and computing $MV(\Omega_{BS})$ in (7) in an inner loop. Depending upon how $MV(\Omega_{BS})$ is computed in the inner loop, two versions can be developed. The first one, called SQ LCMV-BSS-1, finds the minimum variance $MV(\Omega_{BS}^{(j)})$ over $1 \le j \le n_{BS}$ in the inner loop and compares it with the minimum variance $MV(\Omega_{BS}^{(l)})$ obtained at the $l$th iteration of the outer loop. A detailed step-by-step implementation is described below.

Algorithm 1 SQ LCMV-BSS-1
Step 1: Initial conditions
(i) $n_{BS} = p$, which is the number of selected multiple bands determined by VD.
Find an index $j^*$ by

A second version of SQ LCMV-BSS, referred to as SQ LCMV-BSS-2, always finds the minimum variance $MV(\Omega_{BS}^{(j)})$ currently being iterated for $1 \le j \le n_{BS}$ at each iteration in the inner loop; its detailed step-by-step implementation is summarized as follows.

Algorithm 2 SQ LCMV-BSS-2
Step 1: Initial conditions
(i) $n_{BS} = p$, which is the number of selected multiple bands determined by VD.
(ii) Let $\Omega_p^{(0)}$ be a set of $p$ bands uniformly selected from the band set $\Omega$.
Step 2: Outer loop
For $l$ =
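Because the algorithm listings are incomplete here, the following is only a sketch of one plausible reading of SQ LCMV-BSS, modeled on the replacement logic of SQ N-FINDR: the outer loop runs over candidate bands $B_l$, the inner loop tries $B_l$ in each of the $p$ positions, and the replacement at the best position $j^*$ is accepted only if it lowers the minimum variance. The function names, the uniform initialization via `np.linspace`, and the acceptance rule are assumptions, not the authors' exact steps.

```python
import numpy as np

def mv_subset(X, D, c, bands):
    """Minimum variance MV for the band indices in `bands`."""
    idx = list(bands)
    Xs, Ds = X[:, idx], D[idx, :]
    R = Xs.T @ Xs / X.shape[0]
    G = Ds.T @ np.linalg.solve(R, Ds)
    return float(c @ np.linalg.solve(G, c))

def sq_lcmv_bss(X, D, c, p):
    """Sequential replacement search (assumed reading of SQ LCMV-BSS)."""
    L = X.shape[1]
    subset = [int(b) for b in np.linspace(0, L - 1, p).round()]  # uniform initial subset
    best = mv_subset(X, D, c, subset)
    for l in range(L):                      # outer loop: candidate band B_l
        if l in subset:
            continue
        trials = []
        for j in range(p):                  # inner loop: try B_l in position j
            trial = list(subset)
            trial[j] = l
            trials.append(mv_subset(X, D, c, trial))
        j_star = int(np.argmin(trials))     # position whose replacement gives the smallest MV
        if trials[j_star] < best:           # accept only if MV decreases
            subset[j_star] = l
            best = trials[j_star]
    return subset, best

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 12))   # synthetic data: 400 pixels, 12 bands
D = rng.normal(size=(12, 2))     # two hypothetical class signatures
c = np.ones(2)
init = [0, 4, 7, 11]             # the uniform initial subset produced for L = 12, p = 4
subset, best = sq_lcmv_bss(X, D, c, p=4)
print(subset, best)
```

Each outer iteration costs $p$ evaluations of the minimum variance, so one pass over all bands needs about $pL$ evaluations rather than $\binom{L}{p}$.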

SC LCMV-BSS
A second type of LCMV-BSS algorithm is SC LCMV-BSS, which reverses the two loops implemented in SQ LCMV-BSS by iterating the computation of $MV(\Omega_{BS})$ in (7) in an outer loop, while iterating band subsets of size $n_{BS}$ in an inner loop. Its detailed step-by-step implementation is provided in the following.

Algorithm 3 SC LCMV-BSS
Step 1: Initial conditions
(i) $n_{BS} = p$, which is the number of selected multiple bands determined by VD.
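Since the remaining steps of Algorithm 3 are not available here, the sketch below shows one possible interpretation of the reversed loop order, following the structure of SC N-FINDR: the outer loop fixes one position $j$ of the band subset at a time, and the inner loop scans all bands for the replacement that minimizes the minimum variance while the other positions are held fixed. All names, the initialization, and the single-pass design are assumptions made for illustration.

```python
import numpy as np

def mv_subset(X, D, c, bands):
    """Minimum variance MV for the band indices in `bands`."""
    idx = list(bands)
    Xs, Ds = X[:, idx], D[idx, :]
    R = Xs.T @ Xs / X.shape[0]
    G = Ds.T @ np.linalg.solve(R, Ds)
    return float(c @ np.linalg.solve(G, c))

def sc_lcmv_bss(X, D, c, p):
    """Successive position-by-position search (assumed reading of SC LCMV-BSS)."""
    L = X.shape[1]
    subset = [int(b) for b in np.linspace(0, L - 1, p).round()]  # uniform initial subset
    for j in range(p):                          # outer loop: position j in the subset
        best_band = subset[j]
        best_mv = mv_subset(X, D, c, subset)
        for l in range(L):                      # inner loop: candidate bands
            if l in subset:
                continue
            trial = list(subset)
            trial[j] = l
            mv = mv_subset(X, D, c, trial)
            if mv < best_mv:
                best_band, best_mv = l, mv
        subset[j] = best_band                   # fix position j before moving on
    return subset, mv_subset(X, D, c, subset)

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 12))   # synthetic data: 400 pixels, 12 bands
D = rng.normal(size=(12, 2))     # two hypothetical class signatures
c = np.ones(2)
init = [0, 4, 7, 11]             # the uniform initial subset produced for L = 12, p = 4
subset, best = sc_lcmv_bss(X, D, c, p=4)
print(subset, best)
```

Because the current band at position $j$ is kept unless a strictly better one is found, the minimum variance never increases from one outer iteration to the next.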

Real Image Experiments
Three popular real hyperspectral images, Purdue University's Indiana Indian Pines, Salinas, and University of Pavia, available at http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes, were used in experiments. The detailed data descriptions and MATLAB data files can also be found on this website.

Purdue Indiana Indian Pines Scene
The first image scene used for experiments is an airborne visible/infrared imaging spectrometer (AVIRIS) hyperspectral data set from the Purdue Indiana Indian Pines test site shown in Figure 1a

Salinas
A second set of AVIRIS data used for experiments was the Salinas scene shown in Figure 2a, which was captured by the AVIRIS sensor over Salinas Valley, California, with a spatial resolution of 3.7 m per pixel and spectral resolution of 10 nm. It has a size of 512 × 217 × 224. Figure 2b,c

ROSIS Data
The last hyperspectral image data used for experiments was the University of Pavia image shown in Figure 3, which is an urban area surrounding the University of Pavia, Italy. It was recorded using the ROSIS-03 airborne sensor. It is of size 610 × 340 × 115 with a spatial resolution of 1.3 m per pixel and spectral coverage ranging from 0.43 to 0.86 µm with spectral resolution of 4 nm (the 12 noisiest channels were removed before experiments). Nine classes of interest, plus a background (BKG) class (class 0), were considered for this image. In the following experiments, four types of BS methods were tested for a comparative study and analysis.

1. Uniform band selection (UBS): According to our extensive experiments, UBS is a reasonably good BS method, which is also reported in the literature. It does not require any prior knowledge or BS criterion. It is the simplest BS method.

2. MEAC: This uses the minimum covariance derived from the estimated abundance matrix, which is similar to the minimum variance in (5). In addition, it can also represent the category of SQMBS methods.

3. MDPP and DSEBS: Both represent the category of SMMBS methods. They make use of graph representations to specify band groups. Most importantly, these two methods were compared with CEM/LCMV-based methods in [26] and both are also based on the LCMV formulation specified by (2).

4. LCMV-BSS developed in this paper: This represents the category of BSS methods using the LCMV formulation in (2).
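As an illustration of the simplest method in the list above, UBS can be sketched as picking $n_{BS}$ equally spaced band indices. The paper does not state the exact spacing rule it used, so the rounding of `np.linspace` below is an assumption, and `uniform_band_selection` is a hypothetical name; the example values (224 bands, 21 selected) are those reported for Salinas.

```python
import numpy as np

def uniform_band_selection(L, n_bs):
    """Return n_bs band indices spread evenly over 0..L-1 (assumed spacing rule)."""
    return np.linspace(0, L - 1, n_bs).round().astype(int)

bands = uniform_band_selection(L=224, n_bs=21)
print(bands)
```

No image data or class information enters this computation, which is why UBS needs neither prior knowledge nor a BS criterion.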
As noted in the introduction and in Section 3, although PSO, FA, and MTSP are also SMMBS methods, they are not compared in this paper for the following reasons. First, their design rationale is completely different from that of LCMV-BSS. Second, the initial candidate sets from which their search algorithms find an optimal band subset are random and too small, so their results are neither representative nor reproducible. Third, the parameters they used were not fully specified in their papers. Therefore, it is very difficult to implement their algorithms for fair comparisons.
Table 1 tabulates the number $n_{BS}$ of selected bands estimated for the three scenes using the Harsanyi-Farrand-Chang (HFC) method and the noise-whitened HFC (NWHFC) method developed for VD in [69,74,75], where $n_{BS}$ was determined to be 18 for Purdue's data, 21 for Salinas, and 14 for University of Pavia with a false alarm probability of $10^{-4}$. In order to perform HSIC, choosing an appropriate classifier is crucial. Recently, Yu et al. [76] developed a new classifier, called the iterative multiclass constrained background suppression classifier (IMCBSC), and further demonstrated that IMCBSC performed well in both overall accuracy rate ($P_{OA}$) and precision rate ($P_R$). Since IMCBSC was derived from LCMV and implemented by LCMV in an iterative manner, the name iterative linearly constrained minimum variance (ILCMV) is used in this paper instead of IMCBSC to reflect its origin in LCMV and the iterative nature of its implementation. ILCMV was adopted for two main reasons. One is the work in [76], which showed that ILCMV could perform at least comparably in $P_{OA}$ but significantly better in $P_R$ than the work in [12]. The other is that ILCMV is derived from the LCMV criterion specified by (2), so it is natural to use ILCMV to perform classification.
Two remarks on the implementation of ILCMV are noteworthy.

1. Unlike most supervised classifiers used for HSIC, which require training samples, ILCMV only needs knowledge of the class signatures $D$, which can be obtained from either prior knowledge or class sample means. Specifically, the class signatures in $D$ are not necessarily real data samples.

2. Also, unlike most supervised classifiers used for HSIC, which require test and training data samples from the same class, the test samples for ILCMV can be selected from any arbitrary class including the BKG class, and are not necessarily limited to the classes covered by the training samples. This is a crucial difference between ILCMV and existing hyperspectral image classification algorithms reported in the literature. For more details, we refer to [23,76].

It is difficult to see any appreciable difference among the classification results in Figures 4-6 by visual inspection. In this case, a quantitative analysis is necessary to better evaluate each BS method. It has been shown in [23,76] that the overall accuracy (OA) rate, $P_{OA}$, may not be sufficient to evaluate the effectiveness of classification performance. To address this issue, two additional measures, called precision rate, $P_R$, and detection rate, $P_D$ (also known as recall rate), developed in [23,76] were introduced for HSIC. $P_R$ and $P_D$ have been widely used in pattern recognition applications such as medical imaging, handwritten character recognition, and biometric recognition. The definitions and details of $P_{OA}$, $P_R$, and $P_D$ can be found in [23,76].
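To make the three measures concrete, the sketch below computes them from a confusion matrix using the standard textbook definitions of overall accuracy, recall, and precision. This is an assumption: the exact definitions used in the paper are those of [23,76] and may differ in detail, for example in how the BKG class enters the averages. The confusion matrix values are invented for illustration, with class 0 playing the role of BKG.

```python
import numpy as np

def classification_rates(conf):
    """Overall accuracy, per-class detection (recall) and precision rates.

    conf[i, j] = number of samples of true class i assigned to class j.
    """
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)                 # correctly classified samples per class
    p_oa = tp.sum() / conf.sum()       # overall accuracy
    p_d = tp / conf.sum(axis=1)        # detection (recall) rate per true class
    p_r = tp / conf.sum(axis=0)        # precision rate per predicted class
    return p_oa, p_d, p_r

# Hypothetical 3-class example; class 0 stands for the background (BKG).
conf = [[80, 5, 5],
        [4, 40, 6],
        [1, 5, 54]]
p_oa, p_d, p_r = classification_rates(conf)
print(p_oa, p_d, p_r)
```

A classifier can reach a high overall accuracy while assigning many background pixels to a class of interest, which lowers that class's precision; this is the situation the precision rate is meant to expose.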
Tables 3-5 show $P_D$, $P_{OA}$, and $P_R$ calculated from the ILCMV classification results in Figures 4-6 using the bands selected in Table 2 for Purdue's data, Salinas, and University of Pavia, respectively, where the best results with the highest rates are shown in boldface. A crucial point in these experiments, as noted in the second remark above, is that $P_D$, $P_{OA}$, and $P_R$ were calculated with the background (BKG) included in classification, because LCMV is specifically designed to take care of the BKG issue in classification, as shown in [76]. This is quite different from many reports which calculate $P_{OA}$ with BKG excluded from classification, such as [12].
Since $P_D$ varies with each class, it is difficult to use it to evaluate the overall classification performance, so our analysis is based on $P_{OA}$ and $P_R$. As we can see from the tables, SQ LCMV-BSS-2 and SC LCMV-BSS outperformed all the other five BS methods in terms of $P_{OA}$ and $P_R$ for the Salinas and University of Pavia scenes, but were slightly worse than MDPP in $P_{OA}$ and DSEBS in $P_R$ for the Purdue data, where MDPP and DSEBS produced the best results in terms of $P_{OA}$ and $P_R$, respectively. As also noted in Tables 3-5, the $P_{OA}$ and $P_R$ using full bands were not only generally worse than those produced by most of the tested BS methods, but also worse than those produced by UBS. These experiments showed that hyperspectral image classification can benefit greatly from the judicious selection of bands with an appropriately determined $n_{BS}$.

Table 6 tabulates the computing times in seconds for each of the six BS methods on a computer with a 1.6 GHz Intel Core i5 processor and 4 GB of 1600 MHz DDR3 memory running OS X El Capitan; the software used to run the experiments was MATLAB R2014b. The shortest computing time was achieved by DSEBS, followed by SC LCMV-BSS and SQ LCMV-BSS. The longest computing time was required by MDPP for the Purdue data and by MEAC for Salinas and University of Pavia.

As noted above, a classifier can also have a significant impact on BS, especially when BKG is included for consideration. A recent work [12] developed four edge preserving filtering (EPF)-based techniques for HSIC (EPF-B-c, EPF-G-c, EPF-B-g, and EPF-G-g) and also conducted a comprehensive comparative analysis to show that these methods performed better than most recently developed spectral-spatial techniques. Therefore, in what follows, we conducted experiments to evaluate the performance of ILCMV in comparison with these four EPF-based techniques with BKG included for classification. To this end, we also implemented the four EPF-based techniques, where "B" and "G" specify the bilateral filter and the guided filter, respectively, and "g" and "c" indicate that the first principal component and the color composite of the three principal components, respectively, are used as reference images [12].
Tables 7-15 tabulate the $P_{OA}$ and $P_R$ rates produced by the four EPF-based methods and ILCMV, all of which included BKG for classification and used the bands selected in Table 2 for the three image scenes. Results for the Purdue image are shown in Tables 7-9, results for Salinas in Tables 10-12, and results for University of Pavia in Tables 13-15; within each group of three tables, the bands were selected by SQ LCMV-BSS-1, SQ LCMV-BSS-2, and SC LCMV-BSS, respectively. In addition, the computing times in seconds are included in the tables for comparison.

Table 7. $P_{OA}$ and $P_R$ calculated by the classification results using the bands selected by SQ LCMV-BSS-1 for the Purdue data.

Several interesting findings can be derived from the results in Tables 7-15.

1. BSS clearly improved the classification results of ILCMV. Such an improvement cannot be found in the four EPF-based methods, whose classification results using band subsets could only get worse compared with their results using full bands. This may be due to the fact that the four EPF-based methods used principal component analysis (PCA) to compress the original data in preprocessing, which retains some crucial information provided by the full bands that is lost when only a band subset is used.

2. The precision rates produced by the four EPF-based methods were very low, as also noted in [23,76]. However, ILCMV using bands selected by LCMV-BSS consistently performed very well in both $P_{OA}$ and $P_R$.

3. According to Tables 7-12, ILCMV performed slightly better than the four EPF-based methods in $P_{OA}$ but significantly better in $P_R$ for Purdue's data and Salinas. The University of Pavia scene is interesting, as shown in Tables 13-15. The four EPF-based methods performed very well in $P_{OA}$ but very poorly in $P_R$, at only about 20%. Furthermore, the $P_{OA}$ produced by ILCMV may not be as good as those produced by the four EPF-based methods (about 10% less), but the $P_R$ produced by ILCMV was around 96%, which is about 4.8 times the 20% produced by the four EPF-based methods. These experiments demonstrated that the BKG issue is critical in data analysis of the University of Pavia scene and cannot be ignored or discarded in data processing. Unfortunately, this BKG issue has never been investigated in the past.

4. Unlike the four EPF-based methods, which performed well in $P_{OA}$ but very poorly in $P_R$, ILCMV consistently performed well in both $P_{OA}$ and $P_R$, and even better when it was implemented in conjunction with BSS, a case in which the EPF-based methods actually failed, as shown in Tables 7-15.

5. Last but not least, BS performance is heavily determined by three factors: the data to be processed, the BS method selected, and the classifier used. Unfortunately, most works on BS for hyperspectral image classification have focused on the design and development of BS methods, and very little has been reported on performance evaluation of different classifiers which use the same set of bands selected by a BS method. For example, as shown in Tables 7-15, when the four EPF methods were implemented with BS, their classification results could not be improved, but those of ILCMV could.

6. It should be noted that $P_D$ results are not included in Tables 7-15 for two reasons. One is that the results of $P_D$ using full bands are already available in [23,76]. The other is that the EPF-based methods using partial bands did not perform better than their counterparts using full bands, so including their results would add little, particularly given the limited space.

Conclusions
This paper developed an SMMBS method, called LCMV-BSS, which selects multiple bands as a band subset, using LCMV to linearly constrain class signature vectors as the criterion for selecting an optimal band subset. It differs from existing BS methods in the following contributions: (i) it is a BSS method developed specifically for HSIC; (ii) it differs from the single-band-constrained methods in [26] and the multiple-band-constrained methods in [68] by constraining multiple class signature vectors instead of multiple bands; (iii) it develops three numerical search algorithms to find optimal band subsets, which differ from the graph-based approaches [40,43] used by other SMMBS methods; (iv) it is very simple to implement via (7), with no parameters to be tuned; (v) most importantly, it shows that HSIC can be improved by BS provided that the number $n_{BS}$ of selected bands and the set of $n_{BS}$ bands are properly selected.
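For orientation, the standard LCMV closed form from adaptive beamforming is sketched below, restricted to a band subset. Equation (7) is not reproduced in this section, so the notation here ($\Omega$, $\mathbf{R}_{\Omega}$, $\mathbf{D}_{\Omega}$, $\mathbf{c}$, $J$) is an assumption made for illustration and may differ from the paper's own criterion.

```latex
% Standard LCMV problem restricted to a band subset \Omega of n_{BS} bands.
% Assumed notation: R_\Omega is the n_{BS} x n_{BS} sample autocorrelation matrix,
% D_\Omega = [d_1 ... d_p] holds the p class signature vectors on those bands,
% and c is the p-dimensional constraint vector.
\begin{align}
  &\min_{\mathbf{w}} \; \mathbf{w}^{T}\mathbf{R}_{\Omega}\mathbf{w}
    \quad \text{subject to} \quad \mathbf{D}_{\Omega}^{T}\mathbf{w} = \mathbf{c}, \\
  &\mathbf{w}^{\mathrm{LCMV}}
    = \mathbf{R}_{\Omega}^{-1}\mathbf{D}_{\Omega}
      \left(\mathbf{D}_{\Omega}^{T}\mathbf{R}_{\Omega}^{-1}\mathbf{D}_{\Omega}\right)^{-1}\mathbf{c}, \\
  &J(\Omega)
    = \left(\mathbf{w}^{\mathrm{LCMV}}\right)^{T}\mathbf{R}_{\Omega}\,\mathbf{w}^{\mathrm{LCMV}}
    = \mathbf{c}^{T}
      \left(\mathbf{D}_{\Omega}^{T}\mathbf{R}_{\Omega}^{-1}\mathbf{D}_{\Omega}\right)^{-1}\mathbf{c}.
\end{align}
```

Under this reading, a band subset with a smaller $J(\Omega)$ leaves less output variance after the class signatures are constrained, which is one way to interpret the minimum variance as a proxy for misclassification error.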

p − 1 c
which specifies the band to be replaced by the $l$th band $B_l$. Such a band is now denoted by $B_{j}^{(l+1)}$. A new set of bands is then produced by letting $B_{j^*}^{(l+1)} = B_l$ and $B_{j}^{(l+1)} = B_{j}^{(l)}$ for $j \neq j^*$.
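The replacement step described above can be sketched as follows. The rule that determines $j^*$ is not legible in this copy, so the sketch assumes $j^*$ is the position whose replacement by the new band gives the lowest cost, and that the swap is accepted only if it lowers the current cost. The names `replace_one_band`, `toy_cost`, and `band_score` are hypothetical, and the toy cost merely stands in for the LCMV criterion.

```python
# Minimal sketch of one band-replacement step (assumptions noted in the lead-in).

def replace_one_band(subset, new_band, cost):
    """Try swapping new_band into each position j of subset.

    Returns a new list in which position j* holds new_band and every other
    position j != j* is unchanged, where j* gives the lowest cost.
    If no swap lowers the cost, a copy of the original subset is returned.
    """
    if new_band in subset:
        return list(subset)
    best_j, best_cost = None, cost(subset)
    for j in range(len(subset)):
        trial = subset[:j] + [new_band] + subset[j + 1:]
        trial_cost = cost(trial)
        if trial_cost < best_cost:
            best_j, best_cost = j, trial_cost
    updated = list(subset)
    if best_j is not None:
        updated[best_j] = new_band
    return updated

# Hypothetical per-band scores; lower total is better in this toy cost.
band_score = {0: 5.0, 1: 3.0, 2: 4.0, 3: 1.0, 4: 6.0}

def toy_cost(subset):
    # Placeholder criterion: NOT the LCMV criterion of the paper.
    return sum(band_score[b] for b in subset)

current = [0, 1, 2]
after_first = replace_one_band(current, 3, toy_cost)       # band 3 replaces band 0
after_second = replace_one_band(after_first, 4, toy_cost)  # band 4 does not help
print(after_first, after_second)
```

Iterating this step over all candidate bands gives a simple sequential search; replacing `toy_cost` with an LCMV-based criterion would be the natural next step, though the exact form used in the paper is not shown here.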

, which is an urban area surrounding the University of Pavia, Italy. It was recorded using the ROSIS-03 satellite sensor. It is of size 610 × 340 × 115 with a spatial resolution of 1.3 m per pixel and spectral coverage ranging from 0.43 to 0.86 μm with a spectral resolution of 4 nm (the 12 noisiest channels were removed before the experiments). Nine classes of interest, plus a background (BKG) class (class 0), were considered for this image.

Figure 3. Ground truth of the University of Pavia scene with nine classes. (a) Band 95, (b) color ground truth image, (c) class labels.
3. MDPP and DSEBS: Both represent the category of SMMBS methods. They make use of graph representations to specify band groups. Most importantly, these two methods were compared with CEM/LCMV-based methods in [26] and both are also based on the LCMV formulation specified by (2).
4. LCMV-BSS developed in this paper: This represents the category of BSS methods using the LCMV formulation in (2).


Table 2 lists the bands selected by seven BS methods: uniform BS (UBS), minimum estimated

Table 3. $P_D$, $P_{OA}$, and $P_R$ calculated from the classification results in Figure 4 for Purdue's data.

Table 4. $P_D$, $P_{OA}$, and $P_R$ calculated from the classification results in Figure 5 for Salinas.

Table 5. $P_D$, $P_{OA}$, and $P_R$ calculated from the classification results in Figure 6 for University of Pavia.

Table 8. $P_{OA}$ and $P_R$ calculated from the classification results using full bands and the bands selected by SQ LCMV-BSS-2 for the Purdue data.

Table 9. $P_{OA}$ and $P_R$ calculated from the classification results using full bands and the bands selected by SC LCMV-BSS for the Purdue data.

Table 10. $P_{OA}$ and $P_R$ calculated from the classification results using full bands and the bands selected by SQ LCMV-BSS-1 in Table 2 for Salinas.

Table 11. $P_{OA}$ and $P_R$ calculated from the classification results using full bands and the bands selected by SQ LCMV-BSS-2 in Table 2 for Salinas.

Table 12. $P_{OA}$ and $P_R$ calculated from the classification results using full bands and the bands selected by SC LCMV-BSS in Table 2 for Salinas.

Table 13. $P_{OA}$ and $P_R$ calculated from the classification results using full bands and the bands selected by SQ LCMV-BSS-1 in Table 2 for University of Pavia.

Table 14. $P_{OA}$ and $P_R$ calculated from the classification results using full bands and the bands selected by SQ LCMV-BSS-2 in Table 2 for University of Pavia.

Table 15. $P_{OA}$ and $P_R$ calculated from the classification results using full bands and the bands selected by SC LCMV-BSS in Table 2 for University of Pavia.