Weighted-Fusion-Based Representation Classifiers for Hyperspectral Imagery

Spatial texture features have been demonstrated to be very useful for recently proposed representation-based classifiers, such as the sparse representation-based classifier (SRC) and nearest regularized subspace (NRS). In this work, a weighted residual-fusion-based strategy with multiple features is proposed for these classifiers. The multiple features include local binary patterns (LBP), Gabor features, and the original spectral signatures. In the proposed classification framework, the representation residuals for a testing pixel from each type of feature are weighted to generate the final representation residual, and the label of the testing pixel is then determined by the class yielding the minimum final residual. The motivation of this work is that different features represent pixels from different perspectives, and their fusion in the residual domain can enhance discriminative ability. Experimental results on several real hyperspectral image datasets demonstrate that the proposed residual-based fusion outperforms the original NRS, SRC, support vector machine (SVM) with LBP, and SVM with Gabor features, even in small-sample-size (SSS) situations.


Introduction
Containing hundreds of narrow spectral bands, hyperspectral imagery has very high spectral resolution that provides the potential for more accurate object classification. Numerous classification algorithms [1,2] using hyperspectral data have been developed for a variety of applications, such as mineral exploration and environmental monitoring. With the advance of sensor technology, hyperspectral images with high spatial resolution are becoming more available.
Recently, representation-based classification, which does not assume any data density distribution, has attracted great interest. In our previous work, nearest regularized subspace (NRS) [3] was proposed for hyperspectral image classification, where each testing pixel is represented by a linear combination of all available labeled samples per class, and the class label is the one whose training samples provide the lowest representation residual. Moreover, a distance-weighted Tikhonov regularization is employed to impose an adaptive penalty on the representation coefficients of different training samples, so that samples more similar to the testing pixel are assigned larger coefficients. The NRS has a closed-form solution and has been demonstrated to be more efficient than state-of-the-art classifiers, such as the support vector machine (SVM) [2].
Sparse representation-based classification (SRC) [4] was originally introduced in computer vision [5,6], and was first applied to hyperspectral image classification in [7]. Tang et al. [8] proposed a classifier based on the SRC, which outperformed several popular classification methods, such as the SVM, subspace projection (SP), and orthogonal matching pursuit (OMP). Similar to the NRS, the SRC determines the class label of a testing pixel by solving a sparse regression problem between the training samples and the testing pixel. However, the solution of the sparse coefficients may not be unique or stable when the atoms in a dictionary exhibit high mutual coherence [9,10]. To alleviate such problems, improvement strategies have been applied to the SRC with structured priors, including joint sparsity (JS) [11], low-rank (LR) Lasso [12], and collaborative hierarchical Lasso (CHiLasso) [13].
With the improvement of spatial resolution, spatial information has been utilized for hyperspectral image classification [14][15][16][17][18][19]. For example, composite kernels (CK) for both spectral and spatial features were employed by an SVM classifier, referred to as SVM-CK [14]. Recently, spatial features, such as morphological attribute profiles (MAP) [15,20,21], 2-D Gabor features [16,22], 3-D Gabor features [17], the gray level co-occurrence matrix (GLCM) [18], local binary patterns (LBP) [19], and shape features [23], were investigated, and a number of spatial features were combined for representation-based classifiers. In [16], the benefits of using spatial features extracted from a Gabor filter for the NRS were demonstrated. In [24], weighted joint collaborative representation (WJCR) classification was investigated. For face recognition, Lee et al. [25] introduced a robust method that applied the Gabor-edge components histogram to the SRC. A joint sparse representation model (JSRM) was proposed in [26][27][28]. Multiscale adaptive sparse representation was developed to reduce the sensitivity to region size in the JSRM [29].
Along with algorithm development for hyperspectral image classification, feature-level and decision-level fusion have been investigated [23,30,31]. Waske et al. [30] analyzed the robustness of multiple classifier systems based on the SVM and random feature selection, and demonstrated that combining the selected features across multiple classifiers can indeed improve classification accuracy. In [31], an adaptive affinity propagation algorithm was proposed for dimensionality reduction to improve classification. In [32,33], novel frameworks combining supervised and unsupervised learning methods (e.g., SVM and the fuzzy c-means clustering algorithm) were developed. Decision fusion can also improve the performance of target and anomaly detection [34]. In [35], a wavelet-based NRS classifier for noise-robust hyperspectral image classification based on the redundant discrete wavelet transform was introduced. In [36], a dynamic classifier selection approach was presented, where both spatial and spectral information are used to determine the label of a testing pixel once the neighborhood meets a given threshold.
In this paper, a weighted-residual-fusion-based strategy with multiple features is proposed for representation-based classifiers. The multiple features include LBP features, Gabor features, and the original spectral signatures. In the proposed classification framework, each type of feature is used by the NRS or SRC, generating multiple representation residuals; these residuals are then summed with different weights [37], and the label of the testing pixel is determined by the class yielding the minimum weighted sum of residuals. The LBP and Gabor features are employed because the LBP has an excellent ability to describe local image texture, while a Gabor filter is able to extract global features. It is expected that different types of spatial features reflect the characteristics of a pixel from different perspectives, and that their fusion in the residual domain can enhance class separability even in small-sample-size (SSS) situations.

Gabor-Filter
A Gabor filter [38] is an orientation-dependent band-pass filter with orientation-sensitive but rotation-invariant characteristics. Here, Gabor features are the magnitudes of each Gabor-filtered image, which are equivalent to the signal power in the corresponding filter pass band. In a 2-D (a, b) coordinate system, a Gabor filter, including a real and an imaginary term, is represented as

G(a, b) = exp(−(a′² + γ²b′²)/(2σ²)) exp(i(2πa′/δ + φ)) (1)

where a′ = a cos θ + b sin θ and b′ = −a sin θ + b cos θ. In Equation (1), δ is the wavelength of the sinusoidal factor, θ is the orientation of the Gabor kernels, φ represents the phase offset, σ is the standard deviation of the Gaussian envelope, and γ is the spatial aspect ratio specifying the ellipticity of the support of the Gabor function. The cosine and sine components,

G_real(a, b) = exp(−(a′² + γ²b′²)/(2σ²)) cos(2πa′/δ + φ) (2)

G_imag(a, b) = exp(−(a′² + γ²b′²)/(2σ²)) sin(2πa′/δ + φ) (3)

return the real and imaginary parts of the Gabor filter, respectively. Parameter σ is determined by δ and the spatial frequency bandwidth bw as

σ = (δ/π) · √(ln 2 / 2) · (2^bw + 1)/(2^bw − 1) (4)
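As an illustrative sketch (not the authors' code), the Gabor kernel of Equation (1) can be generated in NumPy as follows; the relation between σ and bw follows Equation (4), and the 31 × 31 kernel size is an arbitrary choice for illustration:

```python
import numpy as np

def gabor_kernel(delta, theta, phi=0.0, bw=1.0, gamma=0.5, size=31):
    """Sketch of a 2-D Gabor filter; parameter names follow the text."""
    # sigma from wavelength delta and frequency bandwidth bw (Equation (4))
    sigma = (delta / np.pi) * np.sqrt(np.log(2) / 2) * (2**bw + 1) / (2**bw - 1)
    half = size // 2
    b, a = np.meshgrid(np.arange(-half, half + 1), np.arange(-half, half + 1))
    a_r = a * np.cos(theta) + b * np.sin(theta)   # rotated coordinate a'
    b_r = -a * np.sin(theta) + b * np.cos(theta)  # rotated coordinate b'
    envelope = np.exp(-(a_r**2 + (gamma * b_r)**2) / (2 * sigma**2))
    real = envelope * np.cos(2 * np.pi * a_r / delta + phi)  # Equation (2)
    imag = envelope * np.sin(2 * np.pi * a_r / delta + phi)  # Equation (3)
    return real, imag
```

The Gabor feature for a band is then the average magnitude of the image filtered by each such kernel.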

LBP
Recently, the original LBP [39] and its variants have been used to extract rotation-invariant features for classification. An entire image is scanned by a small mask, e.g., a 3 × 3 window; for each central pixel in the window, its gray value is compared with those of its surrounding eight pixels, and a binary label of either "0" or "1" is obtained (0 means a smaller gray value and 1 a larger one). For a general case, let m represent the number of surrounding neighbors and let these pixels be denoted g_0, g_1, …, g_{m−1}. For the center pixel g_c, its LBP code is calculated as

LBP(g_c) = Σ_{i=0}^{m−1} U(g_i − g_c) · 2^i (5)

where the function U(·) is defined as

U(x) = 1 if x ≥ 0, and U(x) = 0 otherwise (6)

As illustrated in Figure 1, for each pixel, a binary number is obtained by concatenating all these binary values in a clockwise direction, starting from the top-middle neighbor. It is easy to observe that such an LBP operator actually represents the texture orientation and smoothness information in a local region. For a 3 × 3 mask, the eight labels compose the center pixel's binary number, which can be further converted into a decimal number as an LBP code. Then, an occurrence histogram, as a nonparametric statistical estimate, is calculated; during this process, a binning procedure is required to ensure that the extracted histogram features have the same dimensions. Furthermore, to deal with textures at different scales, the LBP operator may use windows of different sizes [39]. The original rotation-invariant LBP is achieved by circularly rotating each bit pattern to its minimum value. In our work, a variant of LBP called "uniform LBP" is employed, where the histogram has a separate bin for every uniform pattern, while all non-uniform patterns are assigned to a single bin. Here, a pattern is called uniform if it contains at most two circular 0-1 and 1-0 transitions. Selecting uniform patterns appropriately can reduce the length of the feature vector, thereby improving classification performance [19,40,41].
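A minimal sketch of the 3 × 3 LBP code and the uniform-pattern test described above, assuming the clockwise neighbor ordering of Figure 1 (this is illustrative only, not the implementation used in the paper):

```python
import numpy as np

def lbp_code(window):
    """LBP code of a 3x3 window's center pixel: U(x) = 1 if x >= 0, else 0."""
    c = window[1, 1]
    # eight neighbors in clockwise order, starting from the top-middle position
    offsets = [(0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0), (0, 0)]
    bits = [1 if window[r, cc] >= c else 0 for r, cc in offsets]
    return sum(b << i for i, b in enumerate(bits))

def is_uniform(code, m=8):
    """Uniform pattern: at most two circular 0-1 / 1-0 transitions."""
    bits = [(code >> i) & 1 for i in range(m)]
    transitions = sum(bits[i] != bits[(i + 1) % m] for i in range(m))
    return transitions <= 2
```

In the uniform-LBP histogram, each uniform code receives its own bin while all non-uniform codes share a single bin, which shortens the feature vector.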

Representation-Based Classification
Consider a dataset with n training samples x_i in a d-dimensional space, with class labels denoted as ω_l, l = 1, 2, …, C, where C is the number of classes and n is the total number of training samples. Let the number of training samples in the l-th class be n_l. In the NRS, y_l represents the class-specific approximation of a test pixel y in the l-th class, which is calculated via a linear combination of the available training samples X_l in the l-th class with the weight vector α_l as

y_l = X_l α_l (7)

The optimal α_l can be solved by an ℓ2-norm regularization described as

α_l = argmin ||y − X_l α_l||₂² + λ ||Γ_{l,y} α_l||₂² (8)

where λ is a regularization parameter, and Γ_{l,y} is the biasing Tikhonov matrix defined as

Γ_{l,y} = diag(||y − x_{l,1}||₂, …, ||y − x_{l,n_l}||₂) (9)

which reflects the data locality structure in the calculated weight coefficients; that is, the labeled samples that are most dissimilar to the testing pixel provide a much smaller contribution to the linear representation. A closed-form solution of the weight vector α_l in Equation (8) can be directly calculated as

α_l = (X_lᵀ X_l + λ Γ_{l,y}ᵀ Γ_{l,y})⁻¹ X_lᵀ y (10)

After the weight vector is obtained, the class label of the testing pixel is determined according to the class that minimizes the Euclidean distance between y_l and y, i.e.,

class(y) = argmin_l ||y − y_l||₂ (11)

In our work, SRC with the ℓ1-norm minimization is implemented using the l1_ls package [42]. Then, similarly, the label of the testing pixel is determined by computing the representation residual per class.
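The closed-form NRS rule above can be sketched as follows; this is a minimal illustration (not the authors' implementation) assuming the per-class Tikhonov-regularized objective with `lam` playing the role of λ:

```python
import numpy as np

def nrs_classify(y, X_classes, lam=0.5):
    """Sketch of NRS: per-class regularized representation, minimum residual wins.

    y         : (d,) test pixel
    X_classes : list of (d, n_l) arrays, training samples per class
    """
    residuals = []
    for X_l in X_classes:
        # biasing Tikhonov matrix: diagonal distances between y and each sample
        Gamma = np.diag(np.linalg.norm(X_l - y[:, None], axis=0))
        # closed-form weight vector: (X^T X + lam * Gamma^T Gamma)^{-1} X^T y
        alpha = np.linalg.solve(X_l.T @ X_l + lam * Gamma.T @ Gamma, X_l.T @ y)
        # Euclidean distance between y and its class-specific approximation
        residuals.append(np.linalg.norm(y - X_l @ alpha))
    return int(np.argmin(residuals))
```

The distance-weighted diagonal penalizes training samples far from the test pixel, so their coefficients shrink toward zero.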

Residual-Fusion-Based Collaborative Representation
Figure 2 illustrates the flowchart of the proposed weighted-residual-fusion-based classifier (NRS or SRC), which merges three representation residuals from three types of features (i.e., LBP features, Gabor features, and the original spectral features). During the feature extraction process, band-selection-based dimensionality reduction is conducted with the linear prediction error (LPE) criterion [43,44], an unsupervised band selection method that finds a small set of distinctive and informative bands. Based on the concept that the band yielding the maximum reconstruction error is the most dissimilar band, LPE begins with the best two-band combination, and then augments it to three, four, and so on, until a desired number of bands is selected. For each selected band, the LBP operator and the Gabor filter are used to extract spatial features as described previously. As shown in Equation (11), each NRS or SRC classifier (before assigning class membership) produces a representation residual per class. The residuals from the different types of features are combined to generate a weighted sum as

r_l(y) = w1 · r_l^spec(y) + w2 · r_l^LBP(y) + w3 · r_l^Gabor(y) (13)

where r_l^spec(y), r_l^LBP(y), and r_l^Gabor(y) are the residuals when using the l-th class' labeled samples with the original spectral features (the resulting classifiers are denoted as Spec-NRS for NRS or Spec-SRC for SRC), the LBP features (denoted as LBP-NRS or LBP-SRC), and the Gabor features (denoted as Gabor-NRS or Gabor-SRC), respectively. The class label of the testing pixel y is determined by the class producing the minimum weighted sum of residuals. Here, w1, w2, and w3 are the weights of the residuals when using the spectral signatures, LBP features, and Gabor features, respectively. A larger weight means the corresponding feature is more important in decision making, which is obviously data and class dependent. Note that these weights are imposed with non-negativity and sum-to-one constraints. In the proposed weighted-residual-fusion-based classifier, both LBP and Gabor features are extracted to represent spatial information. Note that Gabor features are produced by the average magnitude response of each Gabor-filtered image, reflecting global signal power, while the LBP-coded image expresses detailed local spatial features, such as edges, corners, and knots. Thus, Gabor filtering supplements the local LBP, which lacks consideration of distant pixel interactions. It is expected that different features represent a testing pixel from different perspectives, and that fusing them can enhance discriminative power.
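The fusion rule of Equation (13) amounts to a weighted sum of per-class residual vectors followed by an arg-min; a minimal sketch, where `r_spec`, `r_lbp`, and `r_gabor` are the C-dimensional residual vectors produced by the three single-feature classifiers:

```python
import numpy as np

def fuse_residuals(r_spec, r_lbp, r_gabor, w=(0.2, 0.3, 0.5)):
    """Weighted residual fusion: weights are non-negative and sum to one."""
    w1, w2, w3 = w
    assert min(w) >= 0 and abs(w1 + w2 + w3 - 1.0) < 1e-9
    fused = (w1 * np.asarray(r_spec)
             + w2 * np.asarray(r_lbp)
             + w3 * np.asarray(r_gabor))
    # label = class with minimum fused residual
    return int(np.argmin(fused))
```

The default weights here are merely the example values reported for one dataset; in general they are data dependent and must be tuned.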

Experimental Section
In this section, the performance of the proposed weighted-residual-fusion-based representation classifiers, denoted as RF-NRS and RF-SRC, on real hyperspectral datasets is investigated. The classification results are compared with those of other algorithms, such as the original NRS or SRC using spectral signatures only, LBP-NRS or LBP-SRC using LBP features only, Gabor-NRS or Gabor-SRC using Gabor features only, and SVM using Gabor-filtering and LBP features. All experimental data are downloaded from a public website [45].
The first experimental dataset was collected by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor [46]. The image scene, with a spatial size of 610 × 340 pixels, covers the city of Pavia, Italy. The dataset has 91 spectral bands after water-band removal, a spectral coverage from 0.43 to 0.86 μm, and a spatial resolution of 1.3 m. There are nine classes in the ground truth map.
The second dataset was collected by the AVIRIS sensor over an area in Salinas Valley, California, with a spatial resolution of 3.7 m. The image comprises 512 × 217 pixels with 204 bands (after 20 water absorption bands are removed). It mainly contains vegetables, bare soils, and vineyard fields. There are 16 classes in the ground truth map. With the same sensor, another scene was collected over northwest Indiana's Indian Pines test site in 1992. That scene consists of 145 × 145 pixels with a spatial resolution of 20 m, and also contains 16 different land-cover classes. The classes and their labeled samples for the Salinas and University of Pavia datasets are given in Tables 1 and 2. All experiments are carried out using MATLAB on an Intel i7 quad-core 3.40-GHz machine with 10 GB of RAM.
The Gabor filter parameters shown in Figure 3 are used in this work according to [47], and the parameter bw in Equation (4) is set to five. Three bands are selected for the LBP operator and 10 bands for the Gabor filter according to our empirical study shown in Figure 4. Note that selecting more bands may not significantly improve classification accuracy, but definitely increases computational cost due to the resulting higher feature dimensionality. Figure 5 further illustrates the effect of patch size on the LBP. It can be seen that classification accuracy reaches its maximum with a 21 × 21 patch size for the University of Pavia data.
To estimate an optimal λ, Figure 6 reports classification accuracy under different values of λ. From Tables 3-6, the best values of (w1, w2, w3) are: (0.2, 0.3, 0.5); (0.6, 0.1, 0.4); (0.2, 0.7, 0.1); and (0.6 (0.7), 0.1, 0.3 (0.2)), respectively. The weight of each residual indicates the importance of the corresponding feature. For example, in Table 3, since Gabor features perform well on the University of Pavia dataset, their weight is 0.5, much higher than the other two; in Table 4 for the Salinas dataset, where the spectral signatures actually provide the highest classification accuracy, the corresponding weight is 0.6, larger than the others.
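The paper tabulates accuracy over weight values but does not spell out the search procedure; one plausible approach, sketched here purely as an assumption, is to enumerate all weight triples on a simplex grid (non-negative, summing to one) and keep the triple maximizing a validation score (`score_fn` below is a placeholder for such a score, e.g., cross-validated accuracy):

```python
def candidate_weights(step=0.1):
    """All (w1, w2, w3) with non-negative entries summing to one on a grid."""
    n = round(1.0 / step)
    return [(i / n, j / n, (n - i - j) / n)
            for i in range(n + 1) for j in range(n + 1 - i)]

def best_weights(score_fn, step=0.1):
    """Pick the weight triple maximizing score_fn (e.g., validation accuracy)."""
    return max(candidate_weights(step), key=score_fn)
```

With a step of 0.1 this is only 66 candidates, so exhaustive search is cheap relative to the classification itself.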

Classification Performance
The effectiveness of the proposed RF-NRS classifier is evaluated by comparison with the original NRS [3], Gabor-NRS [16], and LBP-NRS; similarly, RF-SRC is compared with the original SRC [6], Gabor-SRC, and LBP-SRC. For the University of Pavia dataset, the number of training samples per class is varied from 30 to 100. For the Salinas Valley dataset, the number of training samples per class is varied from 10 to 30. All samples are selected randomly to avoid bias.
To validate the effectiveness of the proposed RF-NRS and RF-SRC, the original NRS, SRC, LBP-NRS, LBP-SRC, Gabor-NRS, and Gabor-SRC are compared. Figure 7 illustrates the classification accuracy (%) of these methods versus the number of training samples per class for the two experimental datasets. It is obvious that the accuracy of all classifiers increases with more training samples, and that the classifiers using both spectral and spatial information are much better than those with solely spectral information. For instance, in Figure 7d, the classification accuracy of the original SRC is at least 23% lower than that of RF-SRC with 60 training samples per class. Furthermore, the proposed RF-based classifiers are consistently better than LBP-NRS, Gabor-NRS, LBP-SRC, and Gabor-SRC. Taking Salinas Valley as an example, the accuracy difference between RF-NRS and LBP-NRS is approximately 2%. When the number of training samples per class is 30 for the University of Pavia dataset, the accuracy of RF-NRS is 6% higher than the others, indicating that the proposed method is more robust in SSS situations. The classification accuracy for each class and the overall accuracy (OA) are listed in Tables 7 and 8 for the two datasets. For the Salinas dataset, 30 samples are randomly selected from each class for training and the rest are used for testing, while for the University of Pavia dataset, the number of training samples is 60 per class. As can be seen in Tables 7 and 8, using spatial features enhances classification accuracy. For example, in Table 7, accuracy is increased by approximately 8% with the integration of the LBP feature, and Table 8 shows that the Gabor feature brings about a 9% improvement. Figures 8 and 9 show the thematic maps of these hyperspectral datasets. Clearly, the classification maps of the proposed residual-fusion methods are less noisy and more accurate, which is consistent with the results listed in Tables 7 and 8.
Classification results from the aforementioned classifiers using the Indian Pines data are shown in Table 9 with the same number of training and testing samples as in [48].It is apparent that the proposed RF-NRS and RF-SRC still provide superior performance, which further affirms that classification accuracy can be greatly improved by fusing two complementary spatial features (i.e., Gabor features and LBP features).
In Figure 10, the classification results of the proposed methods are compared with those of SVM using spatial features, i.e., LBP-SVM [49] and Gabor-SVM [50]. As can be seen, the classification accuracy of RF-NRS is more than 1% higher than that of RF-SRC. This is because the weight coefficients in SRC are sparse, which makes it prone to misclassifying a testing sample when the number of training samples is limited or the quality of the training samples is poor. The proposed RF-NRS and RF-SRC achieve higher classification accuracy (by at least 4%) than Gabor-SVM and LBP-SVM, especially on the University of Pavia dataset. Moreover, as the number of training samples increases, the proposed RF-NRS and RF-SRC continue to outperform Gabor-SVM and LBP-SVM. This also illustrates that the residual-based fusion methods provide even more solid and robust performance in SSS situations (e.g., 30 training samples per class).

Conclusions
In this paper, a weighted-residual-fusion-based classification framework was proposed. The overall classification utilized multiple features, including LBP features, Gabor features, and the original spectral features. A representation-based classifier, such as NRS or SRC, was applied to each type of feature, and fusion was achieved by a weighted combination of their residuals. It was found that the resulting classifiers, i.e., RF-NRS and RF-SRC, were more discriminative than the original spectral classifiers and the classifiers using the Gabor feature or the LBP feature only. Experimental results from several real hyperspectral images demonstrated that the proposed residual-fusion-based classification methods consistently outperformed the traditional classifiers and other state-of-the-art classification algorithms (e.g., Gabor-SVM and LBP-SVM) as the number of training samples per class was varied. Specifically, the proposed RF-NRS achieved about 6% classification accuracy improvement over the traditional Gabor-NRS and Gabor-SVM, and about 2% over LBP-NRS and LBP-SVM, for the Salinas dataset when 30 training samples per class were chosen. RF-NRS outperformed Gabor-NRS by about 5%, LBP-NRS by about 9%, LBP-SVM by about 10%, and Gabor-SVM by about 9% for the University of Pavia data when 60 training samples per class were chosen. In addition, the proposed RF-NRS and RF-SRC tended to be more robust in SSS situations.
In future work, the spectral-local-global feature fusion strategy will be exploited with more state-of-the-art features, and automatic estimation of optimal/suboptimal weighting parameters will be investigated.

Figure 1 .
Figure 1. An example of LBP binary thresholding: (a) a 3 × 3 window; and (b) binary labels of the surrounding eight neighbors (in the clockwise direction).
The basic idea of SRC is to represent the testing pixel y as a sparse linear combination of a training sample dictionary X containing samples from all C classes. The sparse representation coefficients are then obtained by solving an ℓ1-norm minimization problem, and the class label is determined by the minimum class-wise representation residual.
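The l1_ls package used in this work solves the ℓ1-regularized least-squares problem underlying SRC; as an illustrative stand-in (not that package), a basic iterative soft-thresholding (ISTA) solver for min ||y − Xα||² + λ||α||₁ can be sketched as:

```python
import numpy as np

def ista_l1(X, y, lam=0.01, iters=500):
    """Plain ISTA sketch for l1-regularized least squares (SRC coefficients)."""
    t = 1.0 / np.linalg.norm(X, 2) ** 2           # step size from the spectral norm
    alpha = np.zeros(X.shape[1])
    for _ in range(iters):
        z = alpha - t * X.T @ (X @ alpha - y)     # gradient step on the data term
        # soft-thresholding enforces sparsity on the coefficients
        alpha = np.sign(z) * np.maximum(np.abs(z) - t * lam, 0.0)
    return alpha
```

Given the recovered sparse α, the SRC label is assigned exactly as in the NRS: compute the residual ||y − X_l α_l|| per class and take the minimum.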

Figure 2 .
Figure 2. Flowchart of the proposed residual-fusion-based strategy with multiple features and representation-based classifiers.

Figure 3.

Figure 6. In the Salinas data, the optimal parameter λ is 0.5 for NRS and 0.1 for SRC; in the University of Pavia data, it is 1 for NRS and 0.1 for SRC.

Figure 4 .
Figure 4. Classification accuracy versus the number of selected bands for: (a) University of Pavia dataset (60 training samples per class); (b) Salinas dataset (30 training samples per class).

Figure 5 .
Figure 5. Impact of patch size on LBP generation using the University of Pavia dataset.

Figure 6 .
Figure 6. Classification accuracy (%) on the two datasets using different values of the parameter λ: (a) and (b) Salinas dataset with 30 training samples per class; and (c) and (d) University of Pavia dataset with 60 training samples per class.

Figure 10 .
Figure 10. Classification accuracy (%) versus the number of training samples per class for: (a) University of Pavia dataset; (b) Salinas dataset.

Table 1 .
Labeled samples for the University of Pavia dataset.

Table 2 .
Labeled samples for the Salinas dataset.

Table 3 .
Classification accuracy (%) versus the values of w1 and w2 for RF-NRS in University of Pavia dataset with 60 training samples per class.

Table 4 .
Classification accuracy (%) versus the values of w1 and w2 for RF-NRS in Salinas dataset with 30 training samples per class.

Table 5 .
Classification accuracy (%) versus the values of w1 and w2 for RF-SRC in University of Pavia dataset with 60 training samples per class.

Table 8 .
Classification accuracy (%) for the University of Pavia dataset.

Table 9 .
Classification accuracy (%) for the Indian Pine dataset.