3D-Gabor Inspired Multiview Active Learning for Spectral-Spatial Hyperspectral Image Classification

Abstract: Active learning (AL) has been shown to be very effective in hyperspectral image (HSI) classification. It significantly improves performance by selecting a small number of the most informative training samples, thereby reducing the complexity of classification. Multiview AL (MVAL) enables a comprehensive analysis of both object characterization and sample selection in AL by using various features from multiple views. However, the original MVAL cannot effectively exploit spectral-spatial information by respecting the three-dimensional (3D) nature of the HSI, and its query selection strategy is based only on the disagreement among multiple views. In this paper, we propose a 3D-Gabor inspired MVAL method for spectral-spatial HSI classification, which consists of two main steps. First, in the view generation step, we adopt a 3D-Gabor filter to generate multiple cubes with limited bands and utilize feature assessment strategies to select cubes for constructing views. Second, in the sample selection step, a novel method is proposed by using both internal and external uncertainty estimation (IEUE) of views. Specifically, we use the distributions of posterior probability to learn the "internal uncertainty" of each independent view, and adopt the inconsistencies between views to estimate the "external uncertainty". Classification accuracies of the proposed method on the four benchmark HSI datasets reach 99.57%, 99.93%, 99.02% and 98.82%, respectively, demonstrating improved performance compared with other state-of-the-art methods.


Introduction
Hyperspectral images (HSIs) [1][2][3][4] contain hundreds of narrow bands and have been extensively used in different application domains, such as forest monitoring and mapping [5,6], land-use classification [7,8], anomaly detection [9], endmember extraction [10] and environment monitoring [11]. Among these applications, supervised classification is a fundamental task and has been widely studied over the past decades [12,13]. Detailed spectral information is naturally beneficial for supervised land-cover classification in HSI. However, problems such as the Hughes phenomenon can emerge due to the high dimensionality of HSI data [14]. To alleviate this problem, many methods have been proposed in the literature. For instance, band selection is studied to reduce the redundancy between contiguous bands [15], and spectral-spatial feature extraction takes advantage of more distinguishable characteristics [16,17]. However, for these methods, sufficient labeled samples/pixels are crucial to obtain reliable classification results [18]. Since it is difficult to obtain a large number of labeled samples due to the time-consuming and expensive manual labeling process [19], constructing a small but highly informative training set is one solution.
To select such training samples, active learning (AL) has been widely studied [20,21]. As a resampling approach, it interactively constructs a set of the most informative training samples from the unlabeled data pool in a biased way, thus substantially reducing the human labeling cost without sacrificing classification accuracy [22]. It is a human-machine interactive learning procedure, which can significantly improve the performance of classifiers when the selected samples are highly informative.
Selecting informative samples plays a critical role in AL [23] and, therefore, several query strategies have been proposed in the literature. These sample selection methods can be divided into three families [24]. (1) Posterior probability-based heuristics, which use the confidence of the class assignment to estimate the classification uncertainty of each candidate, such as breaking ties (BT) [25], mutual information (MI) [26] and the Kullback-Leibler (KL)-Max strategy [27,28]; (2) Large margin-based heuristics, which use the distance to the hyperplane to estimate the confidence of the classifier, straightforwardly utilizing classifiers such as the SVM [29]. This family includes several representative methods, such as margin sampling (MS) [30], multiclass level uncertainty (MCLU) [31] and significance space construction (SSC) [32]; (3) Committee-based heuristics, which quantify the uncertainty of samples by using the inconsistent hypotheses within a committee. Typical methods include normalized entropy query-by-bagging (nEQB) [33], maximum disagreement (MD)-based criteria [34] and adaptive maximum disagreement (AMD) [35]. Traditional single-view AL (SVAL) is usually based on the first two families, while a newer branch of AL named multiview AL (MVAL), which adheres to the principles of the third family using the information of multiple views, has attracted considerable interest over the past few years [36].
Multiview learning was first introduced into the AL domain in [37], where the Co-Testing algorithm was proposed to learn the hypotheses of the model. Different from traditional SVAL, MVAL describes objects by learning from several views [38,39]. MVAL has proven to be very successful for HSI classification, and its advantages lie in three aspects [34]. First, it provides a direct way to estimate the value of candidates using the disagreement between classifiers. Second, it significantly decreases the number of training samples by exploiting complementary information, and the learning procedure converges quickly with fewer learning steps. Third, the result of MVAL is more credible than that of SVAL, since the classification output is the combination of different predictions. Based on these advantages, we focus on MVAL for HSI classification in this paper.
In MVAL, the method of constructing multiple views and the strategy of sample selection are the two core issues. As to the first issue, [40] introduces a feature splitting method and [41] presents four view generation approaches utilizing the original spectral information, while the spectral-spatial view generation strategies proposed in [34,42] incorporate spatial information to bring the original data into a more discriminative space. Moreover, a feature-driven AL is proposed in [43], in which Gabor filtering and morphological profiles are used for instantiation. However, the original data in [43] is expanded to 52 times the number of bands, leading to an extremely large number of channels. To solve this problem, we propose a three-dimensional Gabor (3D-Gabor) and cube assessment based method to generate multiple views without augmenting the dimensions. As to the second issue, various query selection methods have been proposed in the last decade [24,34,35]. They provide simple and direct approaches that utilize the disagreement between multiple views. However, those methods ignore the abundant information in the probability distributions within each independent view. Worse still, different classifiers are likely to gradually converge to a similar stage as the iteration number increases, which reduces the useful inconsistency information. Therefore, we propose an internal and external uncertainty estimation (IEUE) strategy, which makes full use of both the uncertainty within each independent classifier and that between different classifiers. Based on the above analysis, we present a 3D-Gabor inspired MVAL method using spectral-spatial information for HSI classification in this paper. Compared to the existing literature, the contributions of this paper are twofold: 1. We propose a 3D-Gabor feature extraction and cube assessment based method for view generation.
Representative views are generated with only one certain frequency and direction to ensure low-dimensional filtering outputs. 2. We present an IEUE strategy to rank the unlabeled samples for selection. Compared to most existing query strategies, the proposed method takes advantage of both the posterior probability distributions and committee-based disagreement.
The remainder of this paper is organized as follows. Section 2 introduces the proposed MVAL framework, in which 3D-Gabor feature extraction, cube assessment, IEUE query method and the output strategy are described in detail. Section 3 reports the experimental results on four benchmark HSI datasets. Section 4 gives some subsequent discussions of our proposed method and conclusions are drawn in Section 5.

Proposed Method
In this section, we describe the proposed spectral-spatial based MVAL in detail. The block diagram of our framework is shown in Figure 1, which is composed of two parts: view generation and query selection. First, since sufficiency, independency and accuracy are the vital issues in view generation [41], a novel approach obeying these principles is proposed to construct multiple views. Since each view provides sufficient information and is distinct from the others, classifiers trained on those views can make reliable disagreement assumptions. Subsequently, based on the prediction outputs of the classifiers, the IEUE strategy ranks candidates by their internal and external uncertainties, both independently and comprehensively. The samples with a high degree of controversy are regarded as informative. A certain batch of them is selected from the unlabeled pool in each iteration. The selected samples are then labeled manually and added to the training set to boost the generalization of the classifiers. The whole interaction repeats until N iterations are reached or the proposed stopping criterion is met. Finally, we use the majority voting approach to combine the classification outputs of the views.

Figure 2 plots the proposed 3D-Gabor and cube assessment based view construction method. First, 3D-Gabor filters with various frequencies and directions are adopted to convert the original HSI into multiple data cubes, which provide different spectral-spatial information. Subsequently, a cube evaluation criterion, motivated by Fisher's ratio criterion (FR) [44] and conditional mutual information [45], is proposed to calculate the sufficiency and independency of the cubes obtained in the first step. Cubes that pass the assessment are regarded as views in our proposed method.
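The interactive loop described above can be sketched in Python. The sketch below is purely illustrative: a toy nearest-mean soft classifier stands in for the MLR/LORSAL classifier used in this paper, the query score is a simplified combination of breaking ties on the summed posteriors and view disagreement, and the ground-truth labels play the role of the human oracle; all function and variable names are our own.

```python
import numpy as np

def soft_nearest_mean(X_tr, y_tr, X, n_classes):
    # Toy stand-in for a per-view probabilistic classifier:
    # softmax over negative squared distances to the class means.
    means = np.stack([X_tr[y_tr == c].mean(axis=0) for c in range(n_classes)])
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    e = np.exp(-d2)
    return e / e.sum(axis=1, keepdims=True)

def mval_loop(views, y, labeled, pool, n_classes, n_iter=10, batch=3):
    # views: list of (n_samples, n_features) arrays, one per view
    labeled, pool = list(labeled), list(pool)
    for _ in range(n_iter):
        probs = np.stack([soft_nearest_mean(V[labeled], y[labeled], V[pool], n_classes)
                          for V in views])            # (k, |pool|, n_classes)
        P = probs.sum(axis=0)                         # fused posteriors
        srt = np.sort(P, axis=1)
        iu = srt[:, -1] - srt[:, -2]                  # breaking-ties uncertainty
        eu = np.array([len(np.unique(probs[:, n].argmax(axis=1)))
                       for n in range(P.shape[0])])   # view disagreement
        picked = np.argsort(iu / eu)[:batch]          # most informative first
        for n in sorted(picked, reverse=True):        # "oracle" labeling step
            labeled.append(pool.pop(n))
    return labeled, pool
```

In each iteration the classifiers are retrained on the enlarged labeled set, so the loop converges as the most controversial pool samples are moved into training.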

3D-Gabor
Gabor transforms are closely related to human vision, and have been widely used in face recognition [46], texture classification [47] and information mining [48]. As the 3D extension of the traditional two-dimensional Gabor (2D-Gabor), the 3D-Gabor is a powerful tool for feature extraction from HSI data, since it captures spatial and spectral information simultaneously. By respecting the 3D nature of HSI data, the 3D-Gabor has shown remarkable performance in HSI data analysis [49]. Specifically, a 3D-Gabor kernel can be designed as follows

G_{ω,ϕ,θ}(x, y, λ) = g(x, y, λ) exp(j(xω_x + yω_y + λω_λ))  (1)

where ω denotes the central frequency of the wave vector, and ω_x, ω_y, ω_λ are the projections of the vector along the x axis, y axis and spectral dimension. ϕ denotes the angle between the vector and the spectral dimension, and θ is the angle between the projection of the wave vector on the x-y plane and the x axis. The factor g(x, y, λ) refers to a 3D Gaussian envelope in the (x, y, λ) domain, and the other factor is an exponential harmonic. As shown in Equation (1), the 3D-Gabor filter captures spectral-spatial information in a comprehensive manner. It is noteworthy that in HSI, discriminative information for classification tends to appear at low frequencies in the spatial dimension and high frequencies in the spectral dimension [50,51]. Therefore, spatial smoothing and differential spectral preservation can enhance class separability [49]. We apply a low-pass filter using the Gaussian-enveloped cos harmonic to extract smooth spatial features, preserving the overall structure of the image and reducing noise, and a high-pass filter with the Gaussian-enveloped sin harmonic to obtain the distinct signal in the spectral domain.
Moreover, several frequencies and orientations are adopted to generate several data cubes with different spectral-spatial features. Specifically, the parameters of frequencies and orientations used in this paper are set as follows
• ω ∈ {1/4, 1/8, 1/12, 1/16, 1/20};
• ϕ, θ ∈ {0, π/4, π/2, 3π/4}.
It is noteworthy that when ϕ = 0, the wave vector points in the same direction for any θ, leading to a total of 13 orientations. Furthermore, each cube is generated with a certain frequency ω and a pair of orientations {ϕ, θ}. Therefore, the obtained Gabor cube M = Gabor(data, ω, ϕ, θ) has the same size as the original HSI data. We obtain 65 Gabor cubes varying in frequency and direction, and the cube assessment criterion is adopted to select the most suitable ones from these 65 Gabor cubes.
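A minimal NumPy sketch of building such a kernel is given below. The envelope width σ and the (odd) kernel size are free parameters not fixed in the text above, and the mapping from (ω, ϕ, θ) to the projections ω_x, ω_y, ω_λ follows the spherical-angle definition in Equation (1); the function name is our own.

```python
import numpy as np

def gabor_3d_kernel(size, omega, phi, theta, sigma, harmonic="cos"):
    """Real-valued 3D-Gabor kernel of shape (size, size, size); size should be odd.

    harmonic="cos" gives the low-pass (spatially smoothing) filter,
    harmonic="sin" the high-pass (spectrally differential) one.
    """
    # projections of the wave vector onto x, y and the spectral axis
    w_lambda = 2 * np.pi * omega * np.cos(phi)
    w_x = 2 * np.pi * omega * np.sin(phi) * np.cos(theta)
    w_y = 2 * np.pi * omega * np.sin(phi) * np.sin(theta)
    half = size // 2
    r = np.arange(-half, half + 1)
    x, y, lam = np.meshgrid(r, r, r, indexing="ij")
    # isotropic 3D Gaussian envelope g(x, y, lambda)
    g = np.exp(-(x**2 + y**2 + lam**2) / (2 * sigma**2))
    phase = x * w_x + y * w_y + lam * w_lambda
    carrier = np.cos(phase) if harmonic == "cos" else np.sin(phase)
    return g * carrier
```

Convolving the HSI cube with such a kernel (e.g., via scipy.ndimage.convolve) yields a Gabor cube of the same size as the input, matching M = Gabor(data, ω, ϕ, θ) above.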

Cube Assessment
A. Fisher's ratio

In this paper, the FR criterion is adopted to measure the class separability of each 3D-Gabor cube. Using the between-class and within-class scatter, the FR of each pair of classes is modeled as

FR_{i,j} = (μ_i − μ_j)² / (σ_i² + σ_j²)  (2)

where the numerator represents the between-class scatter based on the means μ_i and μ_j of class i and class j, and the denominator is the within-class scatter based on the variances σ_i² and σ_j² of those two classes. FR_{i,j} implies the class separability between the two classes: the larger FR_{i,j}, the better the discriminating capability. In this paper, the overall FR is calculated as the mean value of the pairwise FR_{i,j}, which can be written as

FR = 2/(r(r−1)) Σ_{i<j} FR_{i,j}  (3)

where r denotes the number of ground-truth classes, and r(r−1)/2 is the number of FR_{i,j} values obtained from all pairs of classes. It is noteworthy that we compute the FR on the small initial labeled training set D_s. To address the resulting dimensionality issue, we calculate the FR at the band level and then average the results over bands. As such, the modified FR can be formed by

FR = (1/λ) Σ_{b=1}^{λ} [ 2/(r(r−1)) Σ_{i<j} FR_{i,j}^{(b)} ]  (4)

where λ is the number of bands and FR_{i,j}^{(b)} is the pairwise ratio computed on band b. As an example, Table 1 displays the FR (i.e., the FR obtained from Equation (4)) of each 3D-Gabor cube for the Indian Pines dataset. A detailed description of this image is given in Section 3.1. It is observed from Table 1 that cubes with lower frequency often achieve higher FR. Intuitively, the cubes with larger FR are more suitable for constructing views since they separate the classes better. However, similar cubes can yield close FR values, and selecting similar cubes would reduce the effectiveness of the MVAL. We therefore discard cubes with repeated FR values and keep the cubes whose FR is larger than a certain threshold τ. Subsequently, conditional mutual information is adopted to select the cubes with both high discriminability and low redundancy.
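The band-averaged Fisher's ratio of Equation (4) can be computed directly from the labeled samples of a cube. The sketch below is a straightforward NumPy implementation of the two-class band-wise definition given above; a small constant is added to the denominator to avoid division by zero, and the names are our own.

```python
import numpy as np

def band_averaged_fr(X, y):
    """Band-averaged Fisher's ratio, Eq. (4).

    X: (n_samples, n_bands) band responses of one Gabor cube
       at the initially labeled pixels; y: (n_samples,) class labels.
    """
    classes = np.unique(y)
    pair_frs = []
    for a in range(len(classes)):
        for b in range(a + 1, len(classes)):
            Xa, Xb = X[y == classes[a]], X[y == classes[b]]
            between = (Xa.mean(axis=0) - Xb.mean(axis=0)) ** 2   # per-band scatter
            within = Xa.var(axis=0) + Xb.var(axis=0) + 1e-12
            pair_frs.append(between / within)                    # FR_ij per band
    # average over class pairs (Eq. (3)) and over bands (Eq. (4))
    return float(np.mean(pair_frs))
```

Running this score over the 65 Gabor cubes and thresholding at τ reproduces the first stage of the cube assessment.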

B. Conditional mutual information
Conditional mutual information is utilized to generate multiple views that are dissimilar from each other. Let c = {c_1, c_2, ..., c_r} be the ground-truth labels, M_i^{x×y×λ} be the ith 3D-Gabor cube with three dimensions x, y and λ, and B_{bi} be the bth band of the ith cube. The mutual information between M_i and c can be defined as

I(M_i, c) = Σ_{b=1}^{λ} [ H(B_{bi}) − H(B_{bi}|c) ]  (5)

where H(B_{bi}) denotes the entropy of B_{bi} and H(B_{bi}|c) is the conditional entropy of B_{bi} given c. Subsequently, we can measure the information shared by M_j and c given M_i according to the following conditional mutual information

I(M_j, c|M_i) = Σ_{b=1}^{λ} [ H(B_{bj}|B_{bi}) − H(B_{bj}|c, B_{bi}) ]  (6)

where H(B_{bj}|B_{bi}) denotes the conditional entropy of B_{bj} given B_{bi}, and H(B_{bj}|c, B_{bi}) refers to the entropy of B_{bj} given B_{bi} and c. It is noteworthy that the larger I(M_j, c|M_i), the more dissimilar M_j and M_i are.
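In practice, the entropies in Equations (5) and (6) are estimated on discretized band responses. A sketch of one band-pair term of Equation (6) is given below, using the identity H(A|B) = H(A,B) − H(B); the bin count and all names are our own choices.

```python
import numpy as np

def _entropy_from_counts(counts):
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def _joint_entropy(*variables):
    # entropy of the joint distribution of discrete 1-D arrays
    arr = np.stack(variables, axis=1)
    _, counts = np.unique(arr, axis=0, return_counts=True)
    return _entropy_from_counts(counts)

def _discretize(v, bins=16):
    edges = np.histogram_bin_edges(v, bins=bins)
    return np.digitize(v, edges[1:-1])   # bin indices 0..bins-1

def band_cond_mi(b_j, b_i, c, bins=16):
    """I(B_bj, c | B_bi) = H(B_bj|B_bi) - H(B_bj|c, B_bi): one band-pair
    term of Eq. (6); summing over bands gives I(M_j, c | M_i)."""
    dj, di = _discretize(b_j, bins), _discretize(b_i, bins)
    return (_joint_entropy(dj, di) - _joint_entropy(di)
            - _joint_entropy(dj, c, di) + _joint_entropy(c, di))
```

As expected from the definition, a band that duplicates B_bi contributes nothing, while a band that carries label information not present in B_bi yields a large value.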

IEUE Query Selection
Before presenting the IEUE technique, we first introduce the multinomial logistic regression (MLR) classifier that we adopt to learn the classification predictions. It is a discriminative classifier which can learn useful features and has proven to be very successful for HSI classification [20,52]. The MLR directly models the posterior densities as

p(y = c | x, ω) = exp(ω^{(c)T} h(x)) / Σ_{k=1}^{r} exp(ω^{(k)T} h(x))  (7)

where h(x) is a feature vector and ω = {ω^{(1)}, ..., ω^{(r)}} are the regressors [20]. The regressors are inferred by logistic regression via the variable splitting and augmented Lagrangian algorithm (LORSAL) [53]. In this paper, we use the MLR to learn the posterior distributions and the prediction labels for query selection, and this classifier is also adopted to produce the final classification outputs. Figure 3 provides a schematic illustration of how the IEUE is calculated. In order to fully exploit the discriminative information both within views and between views, the IEUE of a certain candidate x is defined as

IEUE(x) = IU(x) / EU(x)  (8)

where IU and EU denote the internal and external uncertainty of the views, respectively. On the one hand, we propose a multiview breaking ties criterion inspired by the traditional breaking ties (BT) strategy [43] to estimate the IU. Let c be the ground-truth labels; the IU is given as

IU(x) = P(c⁺|x) − max_{c ≠ c⁺} P(c|x)  (9)

where c⁺ denotes the class label with the maximum probability and P(·) represents the comprehensive posterior probability of each class, which is the summation of the view-wise probability distributions

P(c|x) = Σ_{i=1}^{k} p_i(c|x)  (10)

where p_i(c|x) denotes the independent distribution of the ith classifier. As shown in Equation (9), the IU focuses on the difference between the first and second maximum class probabilities. Candidate samples with smaller IU are more difficult to label and are therefore regarded as more informative, and vice versa. In this regard, the IU prefers samples which lie on the boundary between two classes.
On the other hand, to evaluate the EU, a maximum disagreement criterion based on the disagreement of the prediction labels of the different classifiers is adopted in this paper. Let {1, 2, ..., k} be the classifier indexes; the EU is modeled as

EU(x) = |unique({l_1(x), l_2(x), ..., l_k(x)})|  (11)

where l_i(x) denotes the prediction label of the ith classifier for x and |unique(·)| is the number of unique prediction labels among the classifiers. Samples with higher EU are selected preferentially, so as to guarantee that the chosen samples are the most controversial ones. Based on the above analysis, the IEUE utilizes both the posterior probability distributions and the committee-based disagreement. We rank the IEUE values of the candidate samples in ascending order, and the samples with the minimum values are regarded as the most informative. Subsequently, the first few candidate samples with the smallest IEUE are selected and added to the training set.
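Given the stacked view-wise posteriors p_i(c|x), the IU and EU criteria above translate almost directly into NumPy. In the sketch below the two scores are combined as the ratio IU/EU, which is one natural way to make small values mean "informative" for both criteria; this combination rule and all names are our own reconstruction, not necessarily the paper's exact formula.

```python
import numpy as np

def ieue_rank(view_probs, batch_size=3):
    """view_probs: (k_views, n_candidates, n_classes) posteriors p_i(c|x).
    Returns candidate indices ordered from most to least informative."""
    # comprehensive posterior as the sum over views
    P = view_probs.sum(axis=0)
    top2 = np.sort(P, axis=1)[:, -2:]
    iu = top2[:, 1] - top2[:, 0]                 # multiview breaking ties
    # number of distinct per-view prediction labels
    labels = view_probs.argmax(axis=2)           # (k_views, n_candidates)
    eu = np.array([len(np.unique(labels[:, n])) for n in range(labels.shape[1])])
    score = iu / eu                              # assumed IU/EU combination
    return np.argsort(score)[:batch_size]
```

A candidate on which the views disagree and whose fused top two class probabilities are close receives the smallest score and is queried first.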

Output Strategy
To combine the classification results of the individual classifiers, majority voting is adopted in this paper. Since the views are selected by cube assessment, each view is qualified to train a classifier independently. In majority voting, the classification labels of each view are treated equally. For a certain sample x, the algorithm adopts the label which emerges most frequently among the classifier hypotheses (i.e., the one with the most votes) as the output label [37]. The final output of the prediction labels can be written as

l(x) = arg max_c |count(l_i(x) = c)|  (12)

where |count(·)| stands for the frequency of each output label c among the classifiers l_i. As shown in Equation (12), majority voting is a very simple scheme to combine the results, and it is appropriate when there are at least three views.
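The voting rule amounts to a column-wise mode over the view-wise label matrix. A minimal sketch (function name is our own):

```python
import numpy as np

def majority_vote(view_labels):
    """view_labels: (k_views, n_samples) integer predictions l_i(x).
    Returns the label with the most votes for each sample; ties are
    broken in favor of the smallest label, since np.unique sorts values."""
    fused = np.empty(view_labels.shape[1], dtype=view_labels.dtype)
    for n, column in enumerate(view_labels.T):
        values, votes = np.unique(column, return_counts=True)
        fused[n] = values[votes.argmax()]
    return fused
```

With at least three views, a single deviating classifier cannot overturn the fused label, which is why the combined output is more credible than any single view.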

Experiments
In this section, we evaluate the performance of the proposed method on four benchmark hyperspectral datasets. For simplicity, we abbreviate our method as 3D-Gabor-IEUE. Extensive experiments are conducted to make comprehensive comparisons with other state-of-the-art methods. First, we compare with recently proposed methods, including the multiview disagreement-intersection (MV3D-DisInt) based method [34], the multiview disagreement-singularity (MV3D-DisSin) based method [34], Gabor-breaking ties (Gabor-BT) [43] and the PCA-Gabor scheme [16]. To make more specific comparisons, we also compare 3D-Gabor-IEUE with other view generation and query selection methods. The comparison of view generation approaches is conducted using spectral information (Spec), 2D-Gabor [54] and the 3D-Gabor construction without cube assessment, named 3D-Gabor(no CA). The comparison of query selection is performed against maximum disagreement (MD) [34], entropy query-by-bagging (EQB) [24], adaptive maximum disagreement (AMD) [41], breaking ties (BT(SV)) [43] and random sampling (RS(SV)). Note that MD, EQB and AMD belong to MVAL selection, while BT(SV) and RS(SV) are SVAL query strategies.

Data Description
Four publicly available HSI datasets are employed as benchmark sets, including the Indian Pines data, KSC data, University of Pavia data and University of Houston data. The details of the four HSI datasets are described as follows.
1. Indian Pines data: this dataset was collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over a mixed agricultural area in Northwestern Indiana, USA, early in the growing season on 12 June 1992. Several bands affected by noise and water absorption were removed from the original data, leaving a total of 200 channels of 10 nm width for the experiments. This dataset contains 145 × 145 pixels with center wavelengths from 0.4 to 2.5 µm. Figure 4 depicts the false color composition of the AVIRIS Indian Pines data as well as the map with 16 mutually exclusive ground-truth classes. Since the numbers of samples are unbalanced and the spatial resolution is relatively low, the data poses a very challenging classification problem.
2. KSC data: due to noise and water-vapor absorption, 50 spectral bands were removed from this dataset, leaving 126 bands. For classification purposes, 13 classes representing various land covers were manually defined. Figure 5 shows the false color composite and the corresponding ground-truth map of the image.

Experimental Setup
In order to validate our 3D-Gabor-IEUE framework, we first compare it with other state-of-the-art algorithms, including MV3D-DisInt, MV3D-DisSin, Gabor-BT and PCA-Gabor. The first three schemes belong to AL families and the last one is a non-AL classification method. MV3D-DisInt and MV3D-DisSin are derived from the MD scheme. MV3D-DisInt focuses on reducing the redundancy of the selected training sets, while MV3D-DisSin prefers candidates with higher spatial singularity. We use the 3D-Gabor to construct the multiple views and singularity maps instead of the 3D-RDWT filter in [34]. Gabor-BT is a feature-driven SVAL algorithm which constructs high-dimensional discriminative features to fully exploit the potential of AL. PCA-Gabor is a spectral-spatial response classification method based on the PCA transform and Gabor filtering.
Meanwhile, to specifically evaluate the performance of our proposed 3D-Gabor-IEUE, we compare it with other state-of-the-art methods from two aspects. First, we assess the view generation methods by fixing the query strategy as IEUE. Second, the query selection methods are evaluated by using multiple views generated by the 3D-Gabor. In greater detail, to assess the efficiency of our proposed 3D-Gabor-based method, we compare it with other view generation algorithms, i.e., Spec, 2D-Gabor and 3D-Gabor(no CA). Among these approaches, Spec denotes using the original spectral information without constructing multiple views, which is equal to SV with the BT strategy. 2D-Gabor generates 20 cubes by the 2D-Gabor filter, and then the first several cubes with the largest FR values are selected as multiple views. 3D-Gabor(no CA) generates multiple views by randomly selecting a few cubes from the 65 Gabor cubes obtained by the 3D-Gabor without cube assessment. For the sake of fairness, the frequencies, directions and the number of views are set identically in 2D-Gabor, 3D-Gabor(no CA) and 3D-Gabor. That is, the frequencies ω of those methods are set to {1/4, 1/8, 1/12, 1/16, 1/20}, the orientations θ of the 2D-Gabor are set to {0, π/4, π/2, 3π/4}, the orientations {ϕ, θ} of the 3D-Gabor(no CA) and 3D-Gabor are identically set to {0, π/4, π/2, 3π/4}, and the number of views is set to 8. Moreover, cube assessment is adopted in the 3D-Gabor method, whose threshold τ is set to 1, and the numbers of qualified cubes are 18, 21, 21 and 19 for the four datasets, respectively. Furthermore, in order to evaluate the performance of our IEUE query strategy, we compare it with other widely used query methods, i.e., MD, EQB, AMD and two SV-based strategies (BT and RS). MD focuses mainly on the number of distinct prediction labels among classifiers, while EQB is based on the entropy of the distribution of classification predictions. AMD selects the samples which have the maximum disagreement degree between classifiers. BT estimates the informativeness of samples by calculating the difference between the two highest class-wise probabilities, and RS randomly selects the new training samples. In both BT(SV) and RS(SV), we use principal component analysis (PCA) for dimension reduction, to keep the combined size of the 8 views the same as that of the original dataset.
In all the above-mentioned methods, the MLR classifier is adopted to learn the posterior probability distributions of each view. The MLR is implemented via the variable splitting and augmented Lagrangian (LORSAL) algorithm, and the parameters are set following [53]. For each dataset, we randomly select 5 samples per class from each view to initialize the classifiers, and the remaining labeled samples form the initial testing/candidate set. (For the PCA-Gabor method, we randomly select the same number of training samples as the other methods in the final step.) The batch size of candidate samples acquired in each iteration is 3 and the iteration number is empirically defined. Detailed settings for the four datasets are displayed in Table 2.
Moreover, the aforementioned methods are compared quantitatively by four indexes, including overall accuracy (OA), average accuracy (AA), kappa coefficient (Kappa) and the accuracy of each class. Each experiment is conducted with ten independent Monte Carlo runs using random selection of initial training and testing/candidate samples.

Comparison with Spectral-Spatial Classification Methods
Four recently proposed spectral-spatial classification methods are compared with the proposed 3D-Gabor-IEUE. The experimental results of different methods are displayed in Tables 3-6. Two conclusions can be drawn from the results.

•
First, 3D-Gabor-IEUE outperforms the competing algorithms on all four datasets, with both higher classification accuracies and lower standard deviations. For instance, in Table 3 for the Indian Pines data, the OA of 3D-Gabor-IEUE reaches 99.57%, which is 2.72%, 3.10%, 1.13% and 17.42% higher than the four other state-of-the-art algorithms, respectively. In Tables 4 and 5, the proposed method obtains significantly higher OA, AA and Kappa values than MV3D-DisInt, MV3D-DisSin and PCA-Gabor, and slightly better values than Gabor-BT. This observation is also revealed in Figures 8 and 9, where 3D-Gabor-IEUE performs the best with the fewest misclassified samples. The impressive performance of 3D-Gabor-IEUE indicates its superiority in spectral-spatial HSI classification.
• Second, using the same amount of training data, the AL-based methods (i.e., MV3D-DisInt, MV3D-DisSin and Gabor-BT) achieve better results than the non-AL scheme (i.e., PCA-Gabor). Specifically, the OAs of the AL methods are at least 14.32%, 6.21%, 3.27% and 17.84% higher than PCA-Gabor on the Indian Pines, KSC, University of Pavia and University of Houston data, respectively. It is observed that AL technology can perform well with very limited training samples. For instance, in Table 6, 3D-Gabor-IEUE achieves an OA of 98.82% with only 255 training samples (75 for initialization, i.e., 5 per class, and 180 iteratively selected). These results demonstrate the effectiveness of the AL scheme.

Comparison with View Generation Methods
Three view generation methods are compared with the proposed 3D-Gabor by fixing the query strategy as IEUE in this paper. The classification accuracies of the different methods are shown in Tables 3-6 and the classification maps are displayed in Figures 8 and 9. Two observations can be made from the experimental results.

•
First, the 3D-Gabor-based feature extraction method outperforms the other algorithms on all four datasets. The encouraging improvements in classification accuracy confirm that the 3D-Gabor is powerful for extracting discriminative information by obeying the 3D nature of HSI data. In contrast, Spec and 2D-Gabor yield higher classification errors than the 3D-Gabor-based methods. It is observed from Table 5 that the OA of Spec is 8.47% and 9.62% lower than those of 3D-Gabor(no CA) and 3D-Gabor for the University of Pavia dataset, respectively. Similar trends in OA, AA and Kappa are also shown in the other results. The undesirable performance of the original spectral data is plausible, since the spectral data disregards the critical spatial information. Moreover, 2D-Gabor performs worse than the two 3D-Gabor-based methods on all datasets, suggesting that the spatial information cannot be readily extracted with the 2D-Gabor. It is unexpectedly even worse than the original spectral data on the University of Pavia data (see Table 5) and the University of Houston data (see Table 6). This surprising finding indicates that the spatial information extracted by the 2D-Gabor may contain many unwanted signals.
• Second, the 3D-Gabor with cube assessment (i.e., 3D-Gabor-IEUE) provides better or comparable performance relative to 3D-Gabor(no CA). For instance, as shown in Tables 3 and 5, the OA of the 3D-Gabor is substantially higher than that of the 3D-Gabor(no CA). The better performance of the 3D-Gabor lies in two important aspects: the cube assessment successfully rejects underqualified cubes by utilizing the FR measurement, and it generates distinct views through the conditional mutual information restriction. Both aspects lead to more reliable classification results.

Comparison with Query Selection Methods
Five query selection methods are compared by fixing the view generation as 3D-Gabor, to demonstrate the effectiveness of the proposed IEUE strategy. The quantitative results are displayed in Tables 3-6. Two conclusions can be obtained from the experimental results.

•
First, the proposed IEUE performs the best compared with the other methods (including both MV-based and SV-based families) on the four datasets, in terms of the highest accuracies (see 3D-Gabor-IEUE). It can further be observed from Table 3 that the OA, AA and Kappa of IEUE are much higher than those of MD, AMD, BT(SV) and RS(SV), and slightly higher than those of EQB. The experimental results in Table 3 also demonstrate that our method is more stable, as evidenced by the smaller standard deviations. The classification performance on the KSC (see Table 4), University of Pavia (see Table 5) and University of Houston (see Table 6) data exhibits similar properties. Specifically, separating "graminoid marsh" (class 8) and "spartina marsh" (class 9) from the other classes is difficult in Table 4 for the KSC dataset, and the classification accuracies of those two classes by IEUE significantly outperform the other methods. Unlike most existing MVAL query strategies, which only focus on between-view inconsistencies, IEUE takes both internal and external uncertainty into consideration, which is the main reason for its best performance. Therefore, the proposed IEUE method provides more discriminative information for selecting the most informative samples.
• Second, the MVAL methods (i.e., MD, EQB, AMD and IEUE) almost always achieve better results than the SVAL families (BT(SV), RS(SV)) on the four HSI datasets. This observation can be quantitatively concluded from Tables 3 and 4. The main reason for the superior performance of MVAL is that it combines the results of the individual classifiers, thus reducing the number of misclassified samples and leading to better classification performance.

Assessment of Selected Samples
It is known that in AL classification, the discriminative information carried by uncertain samples is significantly larger than that of other samples. In order to explore the sampling effectiveness of the proposed IEUE method, we display the selected samples and the initial class-wise accuracies for the four HSI datasets in Figure 10. Since the total number of samples differs across classes, we report the ratio of selected samples to the number of samples in the corresponding class for illustration. It can be observed that the IEUE prefers samples from the classes that are difficult to classify. For instance, class "corn-min till" (class 3) is identified as a difficult class in the Indian Pines dataset (see Figure 10a), since its initial class-wise accuracy is low compared with the other classes, and the IEUE is inclined to choose more samples belonging to this class. In contrast, as can be seen in Figure 10b, the IEUE selects the fewest samples in class "cattail marsh" (class 6) and class "water" (class 10), since these classes achieve the two highest accuracies and are easy to classify. Similar trends for the other two datasets can be found in Figure 10c,d. In a nutshell, the preference for informative samples belonging to difficult classes demonstrates the sampling effectiveness of our IEUE query strategy.
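The per-class normalization used for Figure 10 can be sketched as a short helper; the function name is hypothetical and serves only to make the ratio computation explicit.

```python
import numpy as np

def class_selection_ratios(selected_labels, all_labels, n_classes):
    """Ratio of queried samples to available samples per class.

    selected_labels: integer class labels of the queried samples.
    all_labels: integer class labels of the full candidate pool.
    Illustrative helper, not code from the paper.
    """
    selected = np.bincount(selected_labels, minlength=n_classes)
    total = np.bincount(all_labels, minlength=n_classes)
    # Guard against empty classes to avoid division by zero.
    return selected / np.maximum(total, 1)
```

Plotting these ratios rather than raw counts makes the preference for difficult classes visible even when class sizes are highly imbalanced, as in Indian Pines.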

Analysis of Computational Complexity and Learning Rate
Taking the learning speed into consideration, we show the elapsed time required by three query algorithms for the Indian Pines and the KSC data in Table 7. The hardware used for the experiments is an Intel CPU at 3.40 GHz. From the table, MD and IEUE are faster than EQB, while the time of MD is slightly less than that of the proposed IEUE, because IEUE performs an additional probability subtraction on top of the MD algorithm. The time of EQB is relatively higher because EQB must compute the entropy for each candidate. Moreover, we display the learning curves of the above-mentioned three methods in Figure 11 to compare the learning rates. It can be observed that the proposed IEUE converges significantly faster and with higher accuracies. For the KSC data, although both IEUE and EQB eventually achieve 100% accuracy, IEUE requires fewer training steps. These observations demonstrate the learning efficiency of the proposed method.
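The per-candidate entropy computation that makes EQB comparatively slow can be sketched as follows. This is a generic entropy query-by-bagging uncertainty measure under assumed inputs (a committee of bootstrap-trained classifiers), not the paper's implementation.

```python
import numpy as np

def eqb_entropy(committee_labels, n_classes):
    """Vote entropy per candidate for entropy query-by-bagging.

    committee_labels: (n_models, n_samples) predicted labels from
    classifiers trained on bootstrap replicates. A higher vote
    entropy indicates a more uncertain candidate; evaluating it for
    every candidate is the extra cost relative to margin-based rules.
    """
    n_models, n_samples = committee_labels.shape
    entropy = np.zeros(n_samples)
    for i in range(n_samples):
        freq = np.bincount(committee_labels[:, i],
                           minlength=n_classes) / n_models
        nonzero = freq[freq > 0]
        entropy[i] = -(nonzero * np.log(nonzero)).sum()
    return entropy
```

By contrast, MD-style rules only subtract two posterior values per candidate, which explains the timing ordering reported in Table 7.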

Discussion
According to the above-mentioned experiments, the proposed 3D-Gabor-IEUE outperforms almost all the compared methods, especially those using only spectral information. From Spec to 3D-Gabor-IEUE, the improvements in OA for the four datasets are 19.21%, 9.47%, 9.62% and 14.49%, respectively. This illustrates that the 3D-Gabor-IEUE method is a reliable spectral-spatial classification method. Furthermore, the proposed method shows potential for dealing with challenging data such as Indian Pines (with a spatial resolution of 30 m and 200 channels). The main reasons for the significant improvements on Indian Pines are twofold. First, the original HSI with only spectral characteristics is often mixed: samples from different classes tend to overlap, especially when the spatial resolution is limited. In addition, since the contiguous bands in HSI data are highly correlated, the differential information between bands cannot be fully exploited. The proposed IEUE strategy is based on the posterior probability distributions of each classifier; it is therefore necessary to construct a feature space in which the class boundaries are clearer. In this paper, the 3D-Gabor filter, with smoothness in the spatial domain and discrimination in the spectral domain, is used, leading to a more discriminative feature space compared with the original data. The 3D-Gabor features significantly boost the performance of IEUE. Second, the samples queried by 3D-Gabor-IEUE are very informative, and adding them to the training sets improves performance quickly. Therefore, the accuracies are significantly improved with very limited training data.
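The spatial smoothness and spectral discrimination attributed to the 3D-Gabor filter above come from its structure: a 3D Gaussian envelope modulated by a sinusoid along a chosen direction in the (row, column, band) cube. The sketch below builds the real part of such a kernel; the exact parameterization used in the paper (frequencies, orientations, envelope widths) is not reproduced here, so treat the parameter names as assumptions.

```python
import numpy as np

def gabor_3d_kernel(size, sigma, freq, direction):
    """Real part of a 3D Gabor kernel (illustrative sketch).

    size: kernel side length (odd), covering two spatial axes
          plus the band axis of an HSI cube.
    sigma: width of the isotropic Gaussian envelope.
    freq: modulation frequency of the cosine carrier.
    direction: unit vector (fx, fy, fb) along which the carrier
               oscillates, e.g. (0, 0, 1) for a purely spectral filter.
    """
    ax = np.arange(size) - size // 2
    x, y, b = np.meshgrid(ax, ax, ax, indexing="ij")
    envelope = np.exp(-(x**2 + y**2 + b**2) / (2.0 * sigma**2))
    fx, fy, fb = direction
    carrier = np.cos(2.0 * np.pi * freq * (fx * x + fy * y + fb * b))
    return envelope * carrier

# Convolving an HSI cube of shape (rows, cols, bands) with one such
# kernel (e.g. via scipy.ndimage.convolve) yields one feature cube per
# frequency/direction pair, from which views are then assembled.
```

Varying `freq` and `direction` produces the multiple filtered cubes with limited bands from which the views are selected in the view generation step.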

Conclusions
In this paper, we have presented a 3D-Gabor inspired MVAL framework (i.e., 3D-Gabor-IEUE) for spectral-spatial HSI classification. The proposed method consists of two major steps. The first is the view generation approach based on 3D-Gabor feature extraction and a cube assessment criterion. The main advantage of our view generation is that it not only provides sufficient information for independent classification, but also satisfies the requirement of diversity between views. Compared with the 2D-Gabor and original HSI data, the class-separating capability is significantly increased by the 3D-Gabor transformation. The second is the proposed IEUE method for sampling selection. Using the posterior probability distributions and the disagreement between views, this strategy comprehensively estimates the uncertainty from both internal and external aspects. Compared with the traditional SVAL and disagreement-based MVAL schemes, samples are evaluated in a more discriminative manner by the IEUE, and highly informative candidates can be accurately selected. The effectiveness of the proposed method is evaluated on four AVIRIS and ROSIS datasets. Quantitatively, the OA of the 3D-Gabor-IEUE improves by up to 22% compared with other state-of-the-art methods. In the future, we will fully exploit the internal information within each view to make a more convincing estimation for sampling selection. Furthermore, combining other data sources in generating views and using other HSI data for experiments are also promising future research directions.