Remote Sens. 2019, 11(9), 1136; https://doi.org/10.3390/rs11091136

Article
Spatial Prior Fuzziness Pool-Based Interactive Classification of Hyperspectral Images
1 Dipartimento di Matematica e Informatica—MIFT, University of Messina, Messina 98121, Italy
2 Department of Computer Engineering, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan 64200, Pakistan
3 Institute of Data Science and Artificial Intelligence, Innopolis University, Innopolis 420500, Russia
4 School of Computer Science, South China Normal University, Guangzhou 510000, China
5 Institute of Software Development and Engineering, Innopolis University, Innopolis 420500, Russia
6 Faculty of Computing, Engineering and the Built Environment, Ulster University, Newtownabbey, Co Antrim BT37 0QB, UK
* Correspondence: [email protected]; Tel.: +92-321-6617922
These authors contributed equally to this work.
Received: 26 March 2019 / Accepted: 5 May 2019 / Published: 13 May 2019

Abstract

Acquisition of labeled data for supervised Hyperspectral Image (HSI) classification is expensive in terms of both time and cost. Moreover, manual selection and labeling are often subjective and tend to induce redundancy into the classifier. Active learning (AL) can be a suitable approach for HSI classification as it integrates data acquisition into the classifier design by ranking the unlabeled data and advising which query has the highest training utility. However, multiclass AL techniques tend to include redundant samples in the classifier to some extent. This paper addresses this problem by introducing an AL pipeline which preserves the most representative and spatially heterogeneous samples. The adopted strategy for sample selection utilizes fuzziness to assess the mapping between the actual output and the approximated a-posteriori probabilities, computed by a marginal probability distribution based on discriminative random fields. The samples selected in each iteration are then provided to the spectral angle mapper-based objective function to reduce the inter-class redundancy. Experiments on five HSI benchmark datasets confirmed that the proposed Fuzziness and Spectral Angle Mapper (FSAM)-AL pipeline presents competitive results compared to the state-of-the-art sample selection techniques, leading to lower computational requirements.
Keywords: hyperspectral imaging; active learning; fuzziness; spectral angle mapper; soft threshold

1. Introduction

Hyperspectral Imaging (HSI) is a technological tool which takes into account a wide spectrum of light, instead of just primary colors such as red, green, and blue, to characterize a pixel [1,2]. The light striking a pixel is divided into many different spectral bands to provide more information on what is imaged. HSI has been adopted in a wide range of real-world applications including biomedical imaging, geosciences, and surveillance, to mention a few [3]. One of the main challenges in the HSI domain is the management of the data, which typically comprises hundreds of contiguous and narrow spectral bands with very high spatial resolution throughout the electromagnetic spectrum [2,4]. Therefore, HSI classification is more complex than the classification of traditional monochrome or RGB images and can be dominated by a multitude of urban classes and nested regions [5].
Supervised classification methods are widely adopted in the analysis of HSI datasets. These methods include, for example, multinomial logistic regression [6], random forests [7], ensemble learning [8], deep learning [9], support vector machine (SVM) [10], and k-nearest neighbors (KNN) [11]. However, supervised classifiers often under-perform due to the Hughes phenomenon [12], also known as the curse of dimensionality, which occurs whenever the number of available labeled training samples is considerably lower than the number of spectral bands required by the classifier [11]. Figure 1 (loss of accuracy in terms of ground maps) and Table 1 (loss of accuracy in terms of overall accuracy and kappa (κ) for different numbers of labeled training samples, i.e., 1% and 10%, respectively) illustrate the loss in predictive performance of such classification methods for a particular ground image (Pavia University) when using two different sample sizes.
The limited availability of labeled training data in the HSI domain is one of the motivations for the utilization of semi-supervised learning [13]. Examples of such methods include kernel techniques [14] such as SVM; tri-training [15] algorithms, which generate three classifiers from the original labeled samples and then refine them using unlabeled samples during the tri-training process; and graph-based learning [16,17]. A major limitation of such approaches, however, is their low predictive performance when utilizing a small number of training samples in a high-dimensional setting, as commonly observed in HSI classification [18,19] and as shown in Figure 1 and Table 1.
Active learning (AL) is a class of semi-supervised learning methods which tackles the limitations mentioned above [20,21]. The main component of an AL method is the iterative utilization of the training model to acquire new training samples to be added to the training set for the next iteration [22]. AL methods can be pool-based or stream-based depending on how they add new data to the training set, and they employ measures like uncertainty, representativeness, inconsistency, variance, and error to rank and select new samples [23]. Despite this success, there are still particular characteristics which can cause AL to present an inflated false discovery rate and low statistical power [11]. These characteristics include: (i) sample selection bias; (ii) high correlation among the bands; and (iii) non-stationary behavior of unlabeled samples.
This study introduces a customized AL pipeline for HSI to reduce sample selection bias whilst maintaining the data stability in the spatial domain. The presented pipeline differs from standard AL methods in three relevant aspects. First, instead of simply using the uncertainty of samples to select new samples, it utilizes the fuzziness measure associated with the confidence of the training model in classifying those samples correctly. Second, it couples the samples' fuzziness with their diversity to select new training samples which simultaneously minimize the error among the training samples while maximizing the spectral angle between the selected sample and the existing training samples. In our current work, instead of measuring angle-based distances among all new samples and all existing training samples, a reference sample is selected from within the training set, against which the diversity of the new samples is measured. This achieves the same goal while reducing the computational overhead, as the training set is always much smaller than the validation set, which is the source of new samples. Third, the proposed Fuzziness and Spectral Angle Mapper (FSAM) method keeps the pool of new samples balanced, giving equal representation to all classes, which is achieved by softening the thresholds at run time. Experimental results on five benchmark datasets demonstrate that the customized spatial AL pipeline leads to an increased predictive power in terms of the kappa (κ) coefficient, overall accuracy, precision, recall, and F1-score.
The remainder of the paper is structured as follows. Section 2 reviews related work on learning methods applied to HSI. Section 3 presents the theoretical aspects of the proposed FSAM-AL pipeline, followed by a theoretical explanation of the implemented objective function. Section 4 discusses the experimental datasets and experimental settings. Section 5 presents the experimental results and an in-depth discussion of them. Section 6 compares the experimental results with the state-of-the-art sample selection methods used in AL frameworks in recent years. Finally, Section 7 summarizes the contributions and future research directions.

2. Related Work

HSI technology has been employed in several real-world applications including object detection and recognition, change detection (the process of identifying the changes that have occurred over time on the earth's surface), human-made material identification, semantic annotation, unmixing, and classification [2,16]. However, some challenges can arise from typical characteristics of HSI data, notably the limited availability of labeled data, which can lead to an inflated false discovery rate and low statistical power [22]. This results in a relatively poor predictive performance of supervised [24] and semi-supervised [25] learning methods when addressing HSI classification, as shown in Figure 1.
AL offers an alternative for coping with the limited amount of labeled data by iteratively selecting informative samples for the training set [11]. Table 2 provides a unified summary of the sample selection methods utilized in AL, together with references to their respective papers. It classifies the references according to the information utilized by the sample selection methods, being either spectral (considering only the wavelength of the pixel) or spectral–spatial (considering the pixel location in addition to the wavelength). The latter class is particularly relevant in the HSI domain, as the acquisition of training samples depends to a large degree on the spatial distribution of the queried samples. However, only a few studies have integrated spatial constraints into AL methods [26,27].
Tuia et al. [28] presented a detailed survey on AL methods addressing HSI analysis and contrasted non-probabilistic methods, which assume that all query classes are known prior to the initialization, with probabilistic approaches that allow the discovery of new classes. The latter class was also pointed out as more suitable for cases when the prior assumption is no longer fulfilled [44]. In addition to probabilistic and non-probabilistic AL methods, large margin heuristics have been utilized as the base learner to combine the benefits of HSI analysis and AL [28,45]. A particular approach for selecting samples that has achieved remarkable results for several applications is query by committee (QBC) [46]. In contrast to the previous methods, QBC selects samples based on the maximum disagreement of an ensemble of classifiers. Overall, these sampling methods suffer from high computational complexity due to the iterative training of the classifier for each sample [28].
Pool-based AL, also known as batch-mode AL, addresses the high computational complexity observed in the aforementioned methods by simultaneously considering the uncertainty (spectral information) and diversity (spatial information) of the selected samples [36]. A seminal work was presented by Munoz-Mari et al. [47], which highlighted the benefits of integrating spatio-contextual information into AL even when the distribution of queried samples in the spatial space is ignored. This method was later expanded to include the position of selected samples in the feature space [32]. One of the outcomes of such a transformation is a point-wise dispersed distribution in the spatial domain, which incurs the risk of revisiting the same geographical location several times, especially in the HSI domain [32].
A considerable amount of research has been conducted on AL in recent years, often analyzing only spectral properties whilst ignoring spatial information, which plays a vital role in HSI classification, as shown in [32,48]. Spatial–spectral HSI classification can achieve higher performance than its pixel-wise counterpart, as it utilizes information not only from the spectral signature but also from the spatial domain [48]. Thus, the combination of spatial and spectral information for AL represents a novel and promising contribution yet to be explored in the HSI domain. This study proposes a customized pool-based AL pipeline which exploits both spatial and spectral information in the context of HSI classification.

3. Methodology

We address the small sample problem when classifying high-dimensional HSI data by defining an AL scheme that selects a pool of diverse samples by taking into account two main criteria. The first is the fuzziness of samples, which is associated with the confidence of the trained model in properly classifying the unseen samples. The second is the diversity of the samples, which reduces the redundancy among the selected samples. The combination of the two criteria results in the selection of a pool of potentially most informative and diverse samples in each iteration.
Although many different sampling methods have been proposed (a few are mentioned in Table 2), uncertainty remains one of the most popular criteria for selecting informative samples [11,49]. Usually, the most uncertain samples have similar posterior probabilities for the two most probable classes [49]. Thus, a probabilistic model can be used directly to evaluate the uncertainty of an unlabeled sample [49].
However, assessing the uncertainty of a sample is not as straightforward when using non-probabilistic (NP) classifiers, because their output does not take the form of posteriori probabilities [23,49]. The output of such classifiers can be manipulated to obtain an approximation of the posteriori probability functions for the classes being trained [23].
Suppose $X = [x_1, x_2, x_3, \ldots, x_L]^T \in \mathbb{R}^{L \times (M \times N)}$ is an HSI cube composed of $L$ spectral bands and $(M \times N)$ samples per band belonging to $C$ classes, where $x_i = [x_{1,i}, x_{2,i}, x_{3,i}, \ldots, x_{L,i}]^T$ is the $i$th sample in the cube. Let us assume $(x_i, y_i) \in \mathbb{R}^{(M \times N)} \times \mathbb{R}^{C}$, where $y_i$ is the class label of the $i$th sample. Let us further assume that a finite (limited) number $n$ of labeled training samples is selected from $X$ to create the training set $D_T = \{(x_i, y_i)\}_{i=1}^{n}$. The rest of the samples form the validation set $D_V = \{(x_i, y_i)\}_{i=1}^{m}$. Please note that $n \ll m$ and $(D_T \cap D_V) = \emptyset$.
An NP classifier trained on $D_T$ and tested on $D_V$ produces an output matrix $\mu$ of dimensions $m \times C$ containing the NP outputs of the classifier. Let $\mu_{ij}$ be the $j$th output (membership for the $j$th class) for the $i$th sample. Several methods have been proposed in the literature to transform such NP outputs into posteriori probabilities [23,49]. Such methods are computationally complex in two respects. First, they need to compute the Bayesian decision for each sample $x_i$, choosing the category $y_j$ with the largest discriminant function $f_j(x_i) = p(\mu_j | x_i)$. Second, they assume that the training outputs are restricted to $\{0, 1\}$. However, these methods also manipulate each Bayes rule using Jacobin's derivation over the limit theorem on an infinite number of samples to approximate the posterior probabilities in a least squares sense, i.e., $f(x_i, \mu_j) = p(\mu_j, x_i)$ [49].
To overcome the above-mentioned difficulties, in this work we use the marginal probability distribution [32], which is obtained from the $D_V$ information in the HSI data and serves as an engine through which our AL pipeline can exploit both the spatial and spectral information in the data. The posteriori class probabilities are modeled with the discriminative random field [32,50], in which the association potential is linked with discriminative, generative, ensemble, and single hidden layer feed-forward neural network-based classifiers. Thus, the posteriori probabilities are computed similarly to the work in [32]. From these posteriori probabilities we obtain the membership matrix, which should satisfy the following properties [11]:
$$\sum_{j=1}^{C} \mu_{ij} = 1 \quad \text{and} \quad 0 < \sum_{i=1}^{N} \mu_{ij} < 1 \tag{1}$$
In Equation (1), $\mu_{ij} \in [0, 1]$ and $\mu_{ij} = \mu_j(x_i)$ is a function that represents the membership of the $i$th sample $x_i$ to the $j$th class [11]. For the true class, the posteriori probability would be approximated as close to 1, whereas if the output is small (wrong class), the probability would be approximated as close to 0. However, AL methods do not require accurate probabilities; they only need a ranking of the samples according to their posteriori probabilities, which helps to estimate the fuzziness [49] and the output of the sample.
The fuzziness $E(\mu)$ upon $(M \times N)$ samples for $C$ classes from the membership matrix $\mu_{ij}$ can then be defined as expressed in Equation (2), which must satisfy the properties defined in [51,52].
$$E(\mu) = -\frac{1}{N \times C} \sum_{i=1}^{N} \sum_{j=1}^{C} \left[ \mu_{ij} \log(\mu_{ij}) + (1 - \mu_{ij}) \log(1 - \mu_{ij}) \right] \tag{2}$$
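As a rough illustration of Equation (2), the sketch below computes the fuzziness of a small membership matrix in plain Python. The function names, the use of the natural logarithm, the convention that 0·log(0) is taken as 0, and the per-sample variant (Equation (2) applied to a single row, convenient for ranking samples) are all assumptions made for this example and are not taken from the original implementation.

```python
import math


def _entropy_term(mu):
    """Return mu*log(mu) + (1-mu)*log(1-mu), taking 0*log(0) as 0 (assumed convention)."""
    total = 0.0
    for p in (mu, 1.0 - mu):
        if p > 0.0:
            total += p * math.log(p)
    return total


def sample_fuzziness(row):
    """Fuzziness of one sample: Equation (2) restricted to a single row (N = 1)."""
    return -sum(_entropy_term(mu) for mu in row) / len(row)


def fuzziness(memberships):
    """Equation (2) over an N x C membership matrix given as a list of equal-length rows."""
    return sum(sample_fuzziness(row) for row in memberships) / len(memberships)
```

Under these assumptions, a crisp membership row such as `[1.0, 0.0]` has zero fuzziness, while an undecided row such as `[0.5, 0.5]` attains the maximum for two classes.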
Then, we associate $E(\mu)$, the predicted class labels, and the actual class labels with $D_V$ and sort $D_V$ in descending order of fuzziness. We then heuristically select the $\hat{m}$ misclassified samples with the highest fuzziness, where $\hat{m} \ll m$. The proposed strategy keeps the pool of $\hat{m}$ new samples balanced, giving equal representation to all classes, which is achieved by softening the thresholds at run time.
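The sorting and selection step just described can be sketched as follows. This is a minimal illustration that assumes per-sample fuzziness values are already available as a list; the function name and argument names are hypothetical, and the class-balancing (soft threshold) step is deliberately left out.

```python
def select_fuzzy_misclassified(fuzz, predicted, actual, m_hat):
    """Return indices of up to m_hat misclassified samples, highest fuzziness first."""
    # Sort all validation indices by fuzziness, in descending order.
    order = sorted(range(len(fuzz)), key=lambda i: fuzz[i], reverse=True)
    # Keep only the samples whose predicted label differs from the actual label.
    wrong = [i for i in order if predicted[i] != actual[i]]
    return wrong[:m_hat]
```

For instance, with fuzziness values `[0.1, 0.9, 0.5, 0.7]`, predictions `[0, 1, 1, 0]`, and true labels `[0, 0, 1, 1]`, samples 1 and 3 are misclassified, and sample 1 is returned first because it is the fuzzier of the two.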
Next, the spectral angle mapper (SAM) function is used to discriminate the samples within the same class in order to minimize the redundancy among the pool of $\hat{m}$ selected samples (more information about the SAM function can be found in [53,54,55]). SAM is an automated method for directly comparing sample spectra to a known spectrum. It treats both spectra as vectors and calculates the spectral angle between them. It is insensitive to illumination since it uses only the vector direction and not the vector length [56]. The output of SAM is an image showing the best match of each pixel at each spatial location [57]. This approach is typically used as a first cut for determining the mineralogy and works well in areas of homogeneous regions.
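The spectral angle itself is straightforward to compute. The following sketch, in plain Python, treats two spectra as vectors and returns the angle between them in radians; the function name is hypothetical, and the code assumes both spectra are non-zero and have the same number of bands.

```python
import math


def spectral_angle(a, b):
    """Angle in radians between spectra a and b (equal-length, non-zero sequences)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cosine = dot / (norm_a * norm_b)
    # Clamp to [-1, 1] so rounding error cannot push acos outside its domain.
    cosine = max(-1.0, min(1.0, cosine))
    return math.acos(cosine)
```

Because only the direction of the vectors matters, scaling a spectrum by a constant (for example, a brighter illumination of the same material) leaves the angle essentially unchanged.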
In this work, SAM takes the arc-cosine of the normalized dot product between the test spectra with higher fuzziness, $D_{VH} = \{(x_{ij}^{l}, y_j)\}_{i=1}^{\hat{m}}$, and the reference (training sample) spectra $D_T = \{(x_{pj}^{l}, y_j)\}_{p=1}^{n}$, where $j \in \{1, 2, 3, \ldots, C\}$, $l \in \{1, 2, 3, \ldots, L\}$, and $L$ is the total number of bands in the HSI dataset, with the following objective functions:
$$(\alpha_j) = \min_{p \in n} \cos^{-1} \left( \frac{\sum_{l=1}^{L} x_{pj}^{l} \cdot x_{pj}^{l}}{\sqrt{\sum_{l=1}^{L} (x_{pj}^{l})^2} \sqrt{\sum_{l=1}^{L} (x_{pj}^{l})^2}} \right) \tag{3}$$
Equation (3) aims to compute the spectral difference among all the training samples for each of the $C$ classes. We then select one reference spectrum from each class, namely the one which minimizes the angular distance to the others within the same class, i.e., the sample which is most similar to the others in the given class. This process returns as many reference spectra as there are classes in the HSI. We then pick one reference spectrum from $(\alpha_j)$ to compare with all the selected test spectra of the same class and record the angular distance among them in $(\beta_{ij})$, as shown in Equation (4).
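One possible reading of the reference-spectrum step is sketched below: for each class, the training spectrum with the smallest total angle to its classmates is kept as the reference. This interpretation, the function names, and the use of a summed angle as the "distance to the others" are assumptions made for illustration and may differ from the authors' implementation.

```python
import math


def spectral_angle(a, b):
    """Angle in radians between two non-zero spectra of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / norms)))


def reference_spectra(samples, labels):
    """Map each class label to the spectrum with the smallest total angle to its classmates."""
    by_class = {}
    for spectrum, label in zip(samples, labels):
        by_class.setdefault(label, []).append(spectrum)
    references = {}
    for label, members in by_class.items():
        # Assumed criterion: minimize the summed angle to all spectra of the same class.
        references[label] = min(
            members,
            key=lambda cand: sum(spectral_angle(cand, other) for other in members),
        )
    return references
```

With three class-0 spectra `[1, 0]`, `[1, 1]`, and `[0, 1]`, the middle direction `[1, 1]` is chosen, since it is closest in angle to both of the others.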
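$$(\beta_{ij}) = \sum_{j=1}^{C} \sum_{i=1}^{\hat{m}} \cos^{-1} \left( \frac{\sum_{l=1}^{L} (\alpha_j) \cdot x_{ij}^{l}}{\sqrt{\sum_{l=1}^{L} ((\alpha_j))^2} \sqrt{\sum_{l=1}^{L} (x_{ij}^{l})^2}} \right) \tag{4}$$
$$Ind(D_{VH}) = \underset{i \in (D_{VH})|X}{\operatorname{argmax}} \; \phi \, (\beta_{ij}) \tag{5}$$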
where $Ind(D_{VH})$ denotes the indices of the samples which have higher fuzziness, $(D_{VH})|X$ represents the indices of the samples of $D_{VH}$ that are not contained in $X$, $\phi$ provides the diversity trade-off, and $X$ denotes the indices of the unlabeled samples that will be included in the pool. Please note that here we use a soft thresholding scheme to balance the number of classes in both the training and the selected samples. The proposed pipeline systematically selects the $(h / (\text{number of classes}))$ higher-fuzziness samples from $D_{VH}$ for each class if one or more classes are missing from the pool of selected samples. This process is repeated until the cardinality of $X$ is equal to $h$, i.e., $|X| = h$, where $h$ is the size of the pool. This technique guarantees that the selected samples in $X$ are diverse with regard to their angles to all the others in $(\beta_{ij})$. Since the initial size of $X$ is zero, the first sample included in $X$ is always the highest-fuzziness sample from $E(\mu)$.
There are several advantages of using fuzziness information carried through SAM as the query function: (i) it is easy to implement; (ii) it is robust in mapping the spectral similarity of the reference spectrum to the higher-fuzziness test spectra only; and (iii) it is powerful because it represses the influence of shading effects to accentuate the selected test reflectance characteristics [55]. On the other hand, the main drawback of SAM is the spectral mixture problem, i.e., SAM assumes that the reference spectrum chosen to classify the HSI represents a pure spectrum, which is not the case in our problem. Such problems occur when the HSI has a low or medium spatial resolution. Furthermore, the surface of the earth is heterogeneous and complex in many ways and thus contains many mixed samples. The spectral confusion in samples can lead to overestimation or underestimation errors for a spectral signature. The proposed solution avoids this, since we iteratively select the reference spectrum from each class using Equation (3) as a pure spectrum and compare it with the selected test spectra using Equation (4), with the help of a weighting parameter, to minimize the redundancy among the selected samples. The complete workflow of our proposed pipeline is described in Algorithm 1 and Figure 2.
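To make the class-balancing idea concrete, the sketch below builds a pool with at most $h / C$ samples per class, preferring candidates with the largest angle to their class reference. This is only one simplified reading of the selection rule: the function name, the tuple layout of the candidates, the use of integer division for the per-class quota, and the omission of the weighting term $\phi$ are all assumptions made for this example.

```python
def build_balanced_pool(candidates, h, num_classes):
    """Pick up to h candidate indices, at most h // num_classes per class.

    candidates: list of (index, class_label, angle_to_class_reference) tuples.
    Within each class, candidates with the largest angle (most diverse) come first.
    """
    quota = h // num_classes
    per_class = {}
    for index, label, angle in candidates:
        per_class.setdefault(label, []).append((angle, index))
    pool = []
    for label in sorted(per_class):
        ranked = sorted(per_class[label], reverse=True)  # largest angle first
        pool.extend(index for _, index in ranked[:quota])
    return pool
```

With a pool size of 4 and two classes, each class contributes its two most angularly distant candidates, so neither class can dominate the pool.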
Algorithm 1: Pseudo-code of our Proposed FSAM Algorithm.

4. Experimental Datasets and Settings

The performance of our proposed FSAM-AL pipeline is validated on five benchmark HSI datasets acquired by two different sensors, namely the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and the Reflective Optics System Imaging Spectrometer (ROSIS). These datasets are Salinas-A, Salinas full scene, Kennedy Space Center (KSC), Pavia University (PU), and Pavia Center (PC) (further information about these datasets can be found in [58]).
We evaluated the FSAM pipeline with four different classifiers: extreme learning machine (ELM) [59], support vector machine (SVM), k-nearest neighbor (kNN), and ensemble learning (EL). These classifiers were chosen because they have been extensively studied in the literature for HSI classification and are widely used for comparison purposes. Furthermore, our goal is to show that the proposed method can work well with a diverse set of classifiers. The performance of the aforementioned classifiers is measured using two well-known metrics: overall accuracy and the kappa (κ) coefficient [11]. Furthermore, F1-score, precision, and recall rates are also compared. To further validate the real-time applicability of FSAM, we compared it against four benchmark sample selection methods, namely random sampling (RS), mutual information (MI), breaking ties (BT), and modified breaking ties (MBT).
  • Random Sampling (RS) [11,28] method relies on the random selection of the samples without considering any specific conditions.
  • Mutual Information (MI) [32] of two samples is a measure of the mutual dependence between the two samples.
  • Breaking Ties (BT) [33] relies on the smallest difference of the posterior probabilities for each sample. In a multiclass setting, BT can be applied by calculating the difference between the two highest probabilities. As a result, BT finds the samples minimizing the distance between the two most probable classes. The BT method generally focuses on the boundaries comprising many samples, possibly disregarding boundaries with fewer samples.
  • Modified Breaking Ties (MBT) [34,35] includes more diversity in the sampling process than BT. The samples are selected by maximizing the probability of the largest class for each individual class. MBT takes into account all the class boundaries by conducting the sampling in a cyclic fashion, making sure that MBT does not get trapped in any class, whereas BT could be trapped in a single boundary.
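As an illustration of the breaking ties criterion described in the list above, the following sketch ranks samples by the gap between their two largest class probabilities and returns the most ambiguous ones. The function name and the list-of-rows input format are assumptions for this example; each row is assumed to hold at least two class probabilities.

```python
def breaking_ties(probabilities, k):
    """Return indices of the k samples with the smallest gap between their top two probabilities."""

    def gap(row):
        top = sorted(row, reverse=True)
        return top[0] - top[1]

    order = sorted(range(len(probabilities)), key=lambda i: gap(probabilities[i]))
    return order[:k]
```

A sample with probabilities `[0.4, 0.35, 0.25]` is therefore queried before one with `[0.9, 0.05, 0.05]`, because the classifier can barely separate its two most probable classes.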
In all experiments, the initial training size is set to 100 samples drawn from the entire HSI dataset. In each iteration, the size of the training set increases by $h = 1\%$ actively selected samples chosen by the FSAM pipeline. An advantage of FSAM is that it has no hyper-parameters to tune other than those of the classification methods. In ELM, the number of hidden neurons is systematically selected from the range [1–500]. Similarly, in kNN, the number of nearest neighbors is set to k = [2–20], SVM is tested with a polynomial kernel function, and the ensemble learning classifiers are trained using a tree-based model with [1–100] trees. All such parameters are carefully tuned and optimized during the experimental setup. All experiments are carried out using MATLAB (2014b) on an Intel Core i5 3.20 GHz CPU with 12 GB of RAM.
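For reference, the two headline metrics used in the evaluation, overall accuracy and the kappa (κ) coefficient, can be computed from the true and predicted labels as in the sketch below. It uses the standard definition of Cohen's kappa; the function names are hypothetical, and the code assumes non-empty label lists in which chance agreement is below 1 (otherwise kappa is undefined).

```python
from collections import Counter


def overall_accuracy(actual, predicted):
    """Fraction of samples whose predicted label equals the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def cohen_kappa(actual, predicted):
    """Cohen's kappa: agreement beyond chance between actual and predicted labels."""
    n = len(actual)
    observed = overall_accuracy(actual, predicted)
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Expected chance agreement from the marginal label frequencies.
    expected = sum(actual_counts[c] * predicted_counts[c] for c in actual_counts) / (n * n)
    return (observed - expected) / (1.0 - expected)
```

For example, with true labels `[0, 0, 1, 1]` and predictions `[0, 0, 1, 0]`, the overall accuracy is 0.75 while kappa is 0.5, since half of the observed agreement could be expected by chance.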

5. Experimental Results

In this section, we perform a set of experiments to evaluate our proposed FSAM pipeline using both ROSIS and AVIRIS sensor datasets. The ROSIS sensor datasets pose a more challenging classification problem than the AVIRIS datasets, as they are dominated by complex urban classes and nested regions. Here we evaluate the influence of the number of labeled samples on the classification performance achieved by several classifiers. Figure 3 and Figure 4 show the overall accuracy and kappa (κ) as a function of the number of labeled samples obtained by FSAM, i.e., fuzziness and SAM diversity-based active selection of the most informative and diverse samples in each iteration. These labeled samples were selected by machine–machine interaction, which significantly reduces the cost of label collection through a human supervisor, a key aspect of automatic AL methods. The plots shown in Figure 3 and Figure 4 are generated based on only the selected samples, in contrast to the entire population, which reveals clear advantages of using fewer labeled samples for the FSAM pipeline.
From Figure 3 and Figure 4, it can be observed that FSAM greatly improved the accuracy. The results also reveal that SVM and LB outperformed the other classifiers in most cases, whereas, as expected, KNN provides lower classification accuracy than SVM and LB, since the candidates are more relevant when the samples are acquired from the class boundaries. Furthermore, it can also be observed that SVM always performed better than the KNN, ELM, and ensemble learning classifiers. ELM could perform better with a larger number of hidden neurons on more powerful machines. For instance, when 2% of labeled samples were used, the performance increased significantly compared with 1% of actively selected samples. These observations confirm that FSAM can greatly improve the results obtained by different classifiers based on a small portion of the entire population, i.e., classifiers trained using a limited number of selected labeled samples can produce better generalization performance than those trained on a bulk amount of labeled training samples.
It can be seen from Figure 3 and Figure 4 that by including the samples back into the training set, the classification results are significantly improved for all the classifiers. Moreover, it can be seen that the SVM and ELM classifiers are more robust than the ensemble and KNN classifiers. For example, with 1% actively selected samples, the ELM classifier shows only a 2% difference in classification accuracy across different numbers of samples, whereas for the KNN and SVM classifiers the difference is considerably higher. Similar observations can be made for the ensemble models.
In order to present the classification results in a geographical fashion for both ROSIS and AVIRIS sensor datasets, Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9 show the ground truth segmentation of all experimental datasets used in this work. These ground truths are generated using 2% of actively selected samples chosen by the FSAM pipeline. In all the experiments, we provide the quantity of labeled training samples and test samples, which gives an indication of the number of true versus estimated labels used in the experiments. It can be observed from the listed results that our proposed fuzziness and diversity-based active labeled sample selection pipeline is quite robust, as it achieved classification results which are better than, or at least comparable with, several state-of-the-art AL methods.
To better analyze the performance of FSAM on ROSIS and AVIRIS datasets, Table 3 shows the statistical significance in terms of recall, precision, and F1-score tests. The experiments shown in Table 3 are performed with 2% of actively selected labeled samples from each class for all experimental datasets. Table 3 is produced to support the results shown in Figure 3, Figure 4, Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9 for both AVIRIS and ROSIS sensor datasets. The global recall, precision, and F1-score for each classifier are obtained using 5 Monte Carlo runs. Furthermore, Table 3 shows the statistical significance of FSAM in terms of recall, precision, and F1-score with a 99% confidence interval. The obtained values indicate the ability of FSAM to correctly identify the unseen samples when each classifier is trained on a very small amount of labeled training samples. For any good model, precision, recall, and F1-score values should be greater than 80% on average, and in our case these values are mostly above 80% for all experimental datasets and for all classifiers, demonstrating that the proposed FSAM-AL pipeline is not classifier sensitive.

6. Comparison and Discussion

The most advanced developments in AL are single pass context and hybrid AL. These techniques combine the concepts of incremental and adaptive learning from the field of online and traditional machine learning. These advancements have resulted in a substantial number of AL methods. The most classical and well studied AL methods include, for example, the works [60,61] focused on online learning. These works specifically designed for on-line single-pass setting in which the data stream samples arrive continuously, thus, does not allow classifier re-training. Furthermore, these works focused on close concepts of conflict and ignorance. Conflict models how close a query point is to the actual class boundary and ignorance represents the distance between already seen training samples and a new sample.
Similar works proposed in [62,63] focused only on early AL strategies such as early-stage experimental design problems. The TED method was proposed to select the samples using robust AL method incorporated with structured sparsity-inducing norms to relax the NP-hard objective of the convex formulation. Thus, these works only focused on selecting an optimal set of initial samples to kick-start the AL. However, the superiority of our proposed FSAM pipeline is that it shows state-of-the-art performance independent of how the initial labeled training samples are selected. Such methods can easily be integrated into the works which utilize the decision boundary based sample selection methods.
A novel tri-training semi-supervised hyperspectral image classification method based on regularized local discriminant embedding feature extraction (RLDE) was proposed in [64]. In this work, the RLDE process is used for optimal number of feature extraction to overcome the limitation of singular values and over-fitting of local Fisher discriminant analysis and local discriminant embedding. At a later stage, active learning method is used to select the informative samples from the candidate set. This work solves the singularity issues of LDA, however, this may include the redundant samples back to the training set which do not provide any new information to the classifier.
A spatial–spectral multiview 3D Gabor inspired active learning method for hyperspectral image classification was proposed in [65]. Conventional multiview active learning methods can analyze both sample selection and object characterization comprehensively by using several features from multiple views. However, they cannot effectively exploit spatial–spectral information because they do not respect the 3D nature of hyperspectral imaging; as a result, their sample selection is based only on the disagreement between views. To overcome these problems, J. Hu et al. [65] proposed a two-step 3D Gabor inspired multiview method for hyperspectral image classification. The first step is view generation, in which a 3D Gabor filter generates multiple cubes with limited bands and feature assessment strategies select the cubes used to construct the views. In the second step, an active learning method uses both internal and external uncertainty estimates of the views. More specifically, the posterior probability distribution is used to learn the internal uncertainty of each independent view, and the external uncertainty is computed from the inconsistency between the views.
Of course, the frameworks proposed in the above papers can easily be integrated with our proposed FSAM sample selection method, in place of uncertainty-based or tri-training-based sample selection. We initialize our active learning method with 100 randomly selected labeled training samples, and we experimentally demonstrate that randomly increasing the size of the training set slightly increases the accuracy but makes the classifiers computationally more complex. Therefore, as a first step, we separate the set of misclassified samples that have high fuzziness values (fuzziness magnitude between 0.7 and 1.0). We then select a specific percentage of these high-fuzziness misclassified samples and compute their spectral angle with respect to the reference training samples. Finally, we fuse a specific percentage of the selected samples with the original training set and retrain the classifier from scratch, for better generalization and classification performance on the samples that the same classifier initially misclassified.
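The first step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the experiments: it assumes the De Luca and Termini fuzziness measure [51] with base-2 logarithms and a membership matrix whose rows hold the per-class outputs of a classifier. The function names `fuzziness` and `high_fuzziness_misclassified` are hypothetical; only the 0.7 to 1.0 fuzziness range is taken from the text.

```python
import numpy as np


def fuzziness(memberships, eps=1e-12):
    """Per-sample fuzziness in [0, 1] (assumed De Luca-Termini form, log base 2)."""
    mu = np.clip(np.asarray(memberships, dtype=float), eps, 1.0 - eps)
    # Each class contributes the binary entropy of its membership degree.
    terms = mu * np.log2(mu) + (1.0 - mu) * np.log2(1.0 - mu)
    # Average over classes; the clip guards against rounding outside [0, 1].
    return np.clip(-terms.mean(axis=1), 0.0, 1.0)


def high_fuzziness_misclassified(memberships, y_true, low=0.7, high=1.0):
    """Indices of misclassified samples with fuzziness in [low, high],
    ordered from most to least fuzzy."""
    mu = np.asarray(memberships, dtype=float)
    y_pred = mu.argmax(axis=1)          # predicted class = largest membership
    f = fuzziness(mu)
    mask = (y_pred != np.asarray(y_true)) & (f >= low) & (f <= high)
    idx = np.flatnonzero(mask)
    return idx[np.argsort(-f[idx], kind="stable")]


# Toy example with two classes (illustrative values only).
mu = np.array([[0.50, 0.50],   # misclassified, maximally fuzzy
               [0.99, 0.01],   # correctly classified
               [0.45, 0.55],   # misclassified, highly fuzzy
               [0.05, 0.95]])  # misclassified but confident (low fuzziness)
y = np.array([1, 0, 0, 0])
candidates = high_fuzziness_misclassified(mu, y)
```

In this toy case only the first and third samples survive the filter; the confident error in the last row is excluded because its fuzziness falls below 0.7.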
More specifically, the proposed solution has been rigorously compared against significant works recently published in the HSI classification area that adopt different sample selection methods, namely random sampling (RS), mutual information (MI), breaking ties (BT), modified breaking ties (MBT), uncertainty, and fuzziness, as introduced in Section 4. This comparison is based on the Botswana hyperspectral dataset acquired by the Hyperion sensor on the NASA EO-1 satellite [58,66]. The experiments are based on five Monte Carlo runs with 100 initial training samples selected from this dataset. In each iteration, the training set size is increased by 50 samples selected by one of the methods under comparison. The results are presented in Tables 4–8. Based on these results, we can argue that the FSAM pipeline outperforms the other solutions considered in these experiments. This is due to the dual soft thresholding method, which selects the most informative as well as spatially heterogeneous labeled training samples. A further benefit of the proposed FSAM solution is that it systematically and automatically selects the most informative but least redundant labeled training samples through machine–machine interaction, without involving any supervisor, whereas the other AL frameworks require a supervisor to select the samples manually at each iteration.
On the Botswana dataset, we experimentally demonstrated that FSAM outperforms all other sample selection methods, i.e., random selection, mutual information, breaking ties, modified breaking ties, and fuzziness, in terms of accuracy, starting from the same classifiers and the same number of labeled training samples, as shown in Tables 4–8. Furthermore, these other sample selection methods are often subjective and tend to bring redundancy into the classifiers, reducing their generalization performance. More specifically, the number of samples required to learn a model in FSAM can be much lower than the number of selected samples. In such scenarios, however, there is a risk that the learning model may be overwhelmed by uninformative or spatially miscellaneous samples selected by the query function.

7. Conclusions

The classification of multiclass spatial–spectral HSI with a small labeled training sample size is a challenging task. To overcome this problem, this paper introduces a customized AL pipeline for HSI to reduce the sample selection bias while maintaining the data stability in the spatial domain.
The proposed FSAM pipeline differs from traditional AL methods in three relevant aspects. First, instead of simply using the uncertainty of samples to select new samples, it utilizes the fuzziness measure associated with the confidence of the training model in classifying those samples correctly. Second, it couples the samples’ fuzziness with their diversity to select new training samples that simultaneously minimize the error among the training samples and maximize the spectral angle between the selected sample and the existing training samples. In our work, instead of measuring angle-based distances between all new samples and all existing training samples, a reference sample is selected from within the training set, and the diversity of the new samples is measured against it. This achieves the same goal while reducing the computational overhead, as the training set is always much smaller than the validation set, which is the source of new samples. Third, FSAM keeps the pool of new samples balanced, giving equal representation to all classes, which is achieved by softening the thresholds at run time.
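The diversity criterion in the second aspect can be illustrated with a short Python sketch. It assumes the standard spectral angle definition [53], θ = arccos(x·r / (‖x‖ ‖r‖)), and a reference spectrum supplied by the caller; how the reference sample is chosen from the training set is not covered here, and the function names `spectral_angle` and `most_diverse` are hypothetical.

```python
import numpy as np


def spectral_angle(candidates, ref, eps=1e-12):
    """Spectral angle (radians) between each candidate spectrum and one reference."""
    x = np.atleast_2d(np.asarray(candidates, dtype=float))
    r = np.asarray(ref, dtype=float)
    denom = np.linalg.norm(x, axis=1) * np.linalg.norm(r)
    # Guard against all-zero spectra, then keep the cosine inside [-1, 1].
    cos = (x @ r) / np.maximum(denom, eps)
    return np.arccos(np.clip(cos, -1.0, 1.0))


def most_diverse(candidates, ref, k):
    """Indices of the k candidates with the largest angle to the reference."""
    angles = spectral_angle(candidates, ref)
    order = np.argsort(-angles, kind="stable")
    return order[:k], angles


# Toy two-band spectra (illustrative values only).
ref = np.array([1.0, 0.0])
pool = np.array([[2.0, 0.0],   # same direction as the reference: angle 0
                 [0.0, 3.0],   # orthogonal: angle pi/2
                 [1.0, 1.0]])  # in between: angle pi/4
picked, angles = most_diverse(pool, ref, k=2)
```

Because every candidate is compared with a single reference, the cost is one angle per candidate rather than one per candidate and training-sample pair. The angle is also insensitive to the overall magnitude of a spectrum, which is why the first toy spectrum, a scaled copy of the reference, gets an angle of zero.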
Experimental results on five benchmark datasets demonstrate that the proposed FSAM leads to increased predictive power in terms of kappa (κ), overall accuracy, precision, recall, and F1-score. A comparison of FSAM with state-of-the-art sample selection methods confirms that FSAM is effective in terms of overall accuracy and κ, even with few training samples.
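For readers less familiar with the two headline metrics, the sketch below computes overall accuracy and Cohen's kappa from a confusion matrix. It uses the textbook definitions (observed agreement versus chance agreement) and is not taken from the experimental code; the function names and toy labels are illustrative.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def overall_accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    return np.trace(cm) / cm.sum()


def cohen_kappa(cm):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e).
    Undefined when p_e equals 1 (all samples in a single class)."""
    n = cm.sum()
    p_o = np.trace(cm) / n
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2
    return (p_o - p_e) / (1.0 - p_e)


# Toy example: four samples, two classes, one error.
cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], n_classes=2)
oa = overall_accuracy(cm)   # 3 of 4 correct
kappa = cohen_kappa(cm)     # chance agreement is 0.5 for these marginals
```

Kappa is lower than overall accuracy here (0.5 versus 0.75) because half of the agreement would be expected by chance given the class marginals.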
The main drawback of SAM, however, is the spectral mixture problem: SAM assumes that the reference spectra chosen to classify the HSI represent pure spectra. This problem occurs when the HSI has low or medium spatial resolution. Furthermore, the surface of the earth is widely heterogeneous and complex, and thus contains many mixed samples. Spectral confusion in samples can lead to overestimation or underestimation errors for a spectral signature. Our future research aims to address these limitations in order to classify low or medium spatial resolution hyperspectral images in a computationally efficient way. Further work will be directed toward testing the FSAM pipeline in different analysis scenarios dominated by the limited a priori availability of training samples.

Author Contributions

Conceptualization, M.A.; methodology, M.A.; validation, M.A.; formal analysis, M.A., A.M.K.; investigation, M.A.; writing—original draft preparation, M.A.; writing—review and editing, M.A., A.K., A.M.K., M.M., S.D., A.S. and O.N.; visualization, M.A.; supervision, A.M.K.

Funding

This research work was partially funded by Innopolis University.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
HSI: Hyperspectral Imaging
AL: Active Learning
SAM: Spectral Angle Mapper
FSAM: Fuzziness and Spectral Angle Mapper (SAM)-based Active Sample Selection
D_T: Training Samples
D_V: Test Samples
D_VH: High Fuzziness Samples from Test Set
SVM: Support Vector Machine
KNN: K Nearest Neighbours
LB: Logistic Boost
ELM: Extreme Learning Machine
QBC: Query by Committee
RS: Random Sampling
MI: Mutual Information
BT: Breaking Ties
MBT: Modified Breaking Ties
κ: kappa
OA: Overall Accuracy

References

  1. Schneider, A.; Feussner, H. Diagnostic Procedures; Institute of Minimally Invasive Interdisciplinary Therapeutic Interventions (MITI), Technische Universität München (TUM), Biomedical Engineering in Gastrointestinal Surgery: London, UK, 2017; Chapter 5.
  2. Ahmad, M.; Alqarni, M.; Khan, A.M.; Hussain, R.; Mazzara, M.; Distefano, S. Segmented and non-segmented stacked denoising autoencoder for hyperspectral band reduction. Optik Int. J. Light Electron Opt. 2019, 180, 370–378.
  3. Qu, Y.; Qi, H.; Kwan, C. Unsupervised Sparse Dirichlet-Net for Hyperspectral Image Super-Resolution. arXiv 2018, arXiv:1804.05042.
  4. Ahmad, M.; Bashir, A.K.; Khan, A.M. Metric similarity regularizer to enhance pixel similarity performance for hyperspectral unmixing. Optik Int. J. Light Electron Opt. 2017, 140, 86–95.
  5. Ahmad, M.; Khan, A.M.; Mazzara, M.; Distefano, S. Multi-layer Extreme Learning Machine-based Autoencoder for Hyperspectral Image Classification. In Proceedings of the 14th International Conference on Computer Vision Theory and Applications (VISAPP’19), Prague, Czech Republic, 25–27 February 2019.
  6. Li, J.; Bioucas-Dias, J.M.; Plaza, A. Spectral-Spatial Hyperspectral Image Segmentation Using Subspace Multinomial Logistic Regression and Markov Random Fields. IEEE Trans. Geosci. Remote Sens. 2012, 50, 809–823.
  7. Xia, J.; Du, P.; He, X.; Chanussot, J. Hyperspectral Remote Sensing Image Classification Based on Rotation Forest. IEEE Geosci. Remote Sens. Lett. 2014, 11, 239–243.
  8. Pan, B.; Shi, Z.; Xu, X. Hierarchical Guidance Filtering-Based Ensemble Classification for Hyperspectral Images. IEEE Trans. Geosci. Remote Sens. 2017, 55, 4177–4189.
  9. Pan, B.; Shi, Z.; Xia, X. R-VCANet: A New Deep-Learning-Based Hyperspectral Image Classification Method. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 1975–1986.
  10. Tan, K.; Du, P. Hyperspectral Remote Sensing Image Classification Based on Support Vector Machine. J. Infrared Millim. Waves 2013, 27, 123–128.
  11. Ahmad, M.; Protasov, S.; Khan, A.M.; Hussian, R.; Khattak, A.M.; Khan, W.A. Fuzziness-based active learning framework to enhance hyperspectral image classification performance for discriminative and generative classifiers. PLoS ONE 2018, 13, e0188996.
  12. Hughes, G. On the mean accuracy of statistical pattern recognizers. IEEE Trans. Inf. Theory 1968, 14, 55–63.
  13. Persello, C.; Bruzzone, L. Active and Semisupervised Learning for the Classification of Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2014, 52, 6937–6956.
  14. Yang, L.; Yang, S.; Jin, P.; Zhang, R. Semi-Supervised Hyperspectral Image Classification Using Spatio-Spectral Laplacian Support Vector Machine. IEEE Geosci. Remote Sens. Lett. 2014, 11, 651–655.
  15. Zhou, Z.H.; Li, M. Tri-Training: Exploiting Unlabeled Data Using Three Classifiers. IEEE Trans. Knowl. Data Eng. 2005, 17, 1529–1541.
  16. Ahmad, M.; Khan, A.M.; Hussain, R. Graph-based spatial–spectral feature learning for hyperspectral image classification. IET Image Process. 2017, 11, 1310–1316.
  17. Ly, N.H.; Qian, D.; Fowler, J.E. Sparse Graph-Based Discriminant Analysis for Hyperspectral Imagery. IEEE Trans. Geosci. Remote Sens. 2014, 52, 3872–3884.
  18. Zhang, L.; Du, B. Recent advances in hyperspectral image processing. Geo-Spat. Inf. Sci. 2012, 15, 143–156.
  19. Ahmad, M.; Khan, A.M.; Hussain, R.; Protasov, S.; Chow, F.; Khattak, A.M. Unsupervised geometrical feature learning from hyperspectral data. In Proceedings of the 2016 IEEE Symposium Series on Computational Intelligence (SSCI), Athens, Greece, 6–9 December 2016; pp. 1–6.
  20. Tuia, D.; Ratle, F.; Pacifici, F.; Kanevski, M.F.; Emery, W.J. Active Learning Methods for Remote Sensing Image Classification. IEEE Trans. Geosci. Remote Sens. 2009, 47, 2218–2232.
  21. Dopido, I.; Jun, L.; Marpu, P.R.; Plaza, J.M.A.; Dias, B.; Benediktsson, J.A. Semisupervised Self-Learning for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2013, 51, 4032–4044.
  22. Yang, L.; MacEachren, A.M.; Mitra, P.; Onorati, T. Visually-Enabled Active Deep Learning for (Geo) Text and Image Classification: A Review. ISPRS Int. J. Geo-Inf. 2018, 7, 65.
  23. Yu, H.; Yang, X.; Zheng, S.; Sun, C. Active Learning from Imbalanced Data: A Solution of Online Weighted Extreme Learning Machine. IEEE Trans. Neural Netw. Learn. Syst. 2018, 30, 1088–1103.
  24. Liu, C.; He, L.; Li, Z.; Li, J. Feature-Driven Active Learning for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2018, 56, 341–354.
  25. Mountrakis, G.; Im, J.; Ogole, C. Support vector machines in remote sensing: A review. ISPRS J. Photogramm. Remote Sens. 2011, 66, 247–259.
  26. Pasolli, E.; Melgani, F.; Tuia, D.; Pacifici, F.; Emery, W.J. Improving active learning methods using spatial information. In Proceedings of the 2011 IEEE International Geoscience and Remote Sensing Symposium, Sendai, Japan, 1–5 August 2011; pp. 3923–3926.
  27. Liu, A.; Jun, G.; Ghosh, J. Active learning of hyperspectral data with spatially dependent label acquisition costs. In Proceedings of the 2009 IEEE International Geoscience and Remote Sensing Symposium, Cape Town, South Africa, 12–17 July 2009.
  28. Tuia, D.; Volpi, M.; Copa, L.; Kanevski, M.; Munoz-Mari, J. A Survey of Active Learning Algorithms for Supervised Remote Sensing Image Classification. IEEE J. Sel. Top. Signal Process. 2011, 5, 606–617.
  29. Pasolli, E.; Melgani, F.; Tuia, D.; Pacifici, F.; Emery, W.J. SVM Active Learning Approach for Image Classification Using Spatial Information. IEEE Trans. Geosci. Remote Sens. 2014, 52, 2217–2233.
  30. MacKay, D.J.C. Information-Based Objective Functions for Active Data Selection. Neural Comput. 1992, 4, 590–604.
  31. Krishnapuram, B.; Williams, D.; Xue, Y.; Carin, L.; Figueiredo, M.; Hartemink, A.J. On Semi-Supervised Classification. In Advances in Neural Information Processing Systems 17; Saul, L.K., Weiss, Y., Bottou, L., Eds.; MIT Press: Cambridge, MA, USA, 2005; pp. 721–728.
  32. Li, J.; Bioucas-Dias, J.M.; Plaza, A. Spectral–Spatial Classification of Hyperspectral Data Using Loopy Belief Propagation and Active Learning. IEEE Trans. Geosci. Remote Sens. 2013, 51, 844–856.
  33. Luo, T.; Kramer, K.; Samson, S.; Remsen, A.; Goldgof, D.B.; Hall, L.O.; Hopkins, T. Active learning to recognize multiple types of plankton. In Proceedings of the 17th International Conference on Pattern Recognition, Cambridge, UK, 23–26 August 2004; pp. 478–481.
  34. Li, J.; Bioucas-Dias, J.M.; Plaza, A. Hyperspectral Image Segmentation Using a New Bayesian Approach With Active Learning. IEEE Trans. Geosci. Remote Sens. 2011, 49, 3947–3960.
  35. Shi, Q.; Du, B.; Zhang, L. Spatial Coherence-Based Batch-Mode Active Learning for Remote Sensing Image Classification. IEEE Trans. Image Process. 2015, 24, 2037–2050.
  36. Demir, B.; Persello, C.; Bruzzone, L. Batch-Mode Active-Learning Methods for the Interactive Classification of Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2011, 49, 1014–1031.
  37. Lewis, D.D.; Gale, A.W. A Sequential Algorithm for Training Text Classifiers. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland, 3–6 July 1994; pp. 3–12.
  38. Di, W.; Crawford, M.M. Active Learning via Multi-View and Local Proximity Co-Regularization for Hyperspectral Image Classification. IEEE J. Sel. Top. Signal Process. 2011, 5, 618–628.
  39. Patra, S.; Bhardwaj, K.; Bruzzone, L. A Spectral-Spatial Multicriteria Active Learning Technique for Hyperspectral Image Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 5213–5227.
  40. Li, J. Active learning for hyperspectral image classification with a stacked autoencoders based neural network. In Proceedings of the 2015 7th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), Tokyo, Japan, 2–5 June 2015; pp. 1–4.
  41. David, A.C.; Ghahramani, Z. Active Learning with Statistical Models. J. Artif. Intell. Res. 1996, 4, 129–145.
  42. Rajan, S.; Ghosh, J.; Crawford, M.M. An Active Learning Approach to Hyperspectral Data Classification. IEEE Trans. Geosci. Remote Sens. 2008, 46, 1231–1242.
  43. Seung, H.S.; Opper, M.; Sompolinsky, H. Query by Committee. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, PA, USA, 27–29 July 1992; pp. 287–294.
  44. Haines, T.; Xiang, T. Active Learning using Dirichlet Processes for Rare Class Discovery and Classification. In Proceedings of the British Machine Vision Conference, Dundee, UK, 29 August–2 September 2011; pp. 9.1–9.11.
  45. Michel, J.; Malik, J.; Inglada, J. Lazy yet efficient land-cover map generation for HR optical images. In Proceedings of the 2010 IEEE International Geoscience and Remote Sensing Symposium, Honolulu, HI, USA, 25–30 July 2010; pp. 1863–1866.
  46. Borisov, A.; Tuv, E.; Runger, G. Active Batch Learning with Stochastic Query-by-Forest (SQBF). Proc. Mach. Learn. Res. 2011, 16, 59–69.
  47. Munoz-Mari, J.; Tuia, D.; Camps-Valls, G. Semisupervised Classification of Remote Sensing Images with Active Queries. IEEE Trans. Geosci. Remote Sens. 2012, 50, 3751–3763.
  48. He, L.; Li, J.; Liu, C.; Li, S. Recent Advances on Spectral–Spatial Hyperspectral Image Classification: An Overview and New Guidelines. IEEE Trans. Geosci. Remote Sens. 2018, 56, 1579–1597.
  49. Yu, H.; Sun, C.; Yang, W.; Yang, X.; Zuo, X. AL-ELM: One uncertainty-based active learning algorithm using extreme learning machine. Neurocomputing 2015, 166, 140–150.
  50. Kumar, S.; Hebert, M. Discriminative Random Fields. Int. J. Comput. Vis. 2006, 68, 179–201.
  51. Luca, A.D.; Termini, S. A Definition of a Non-Probabilistic Entropy in the Setting of Fuzzy Sets Theory. J. Inf. Control 1972, 20, 301–312.
  52. Yeung, D.S.; Trillas, E. Measures of Fuzziness under Different Uses of Fuzzy Sets. In Advances in Computational Intelligence; Springer: Berlin/Heidelberg, Germany, 2012; pp. 25–34.
  53. Yuhas, R.H.; Goetz, A.F.H.; Boardman, J.W. Discrimination Among Semi-Arid Landscape Endmembers Using the Spectral Angle Mapper (SAM) Algorithm. In Summaries of the 4th JPL Airborne Earth Science Workshop; JPL Publication, Summaries of the Third Annual JPL Airborne Geoscience Workshop; NASA: Washington, DC, USA, 1992; pp. 147–149.
  54. van der Meer, F.D.; Vazquez-Torres, M.; van Dijk, P.M. Spectral characterization of ophiolite lithologies in the Troodos ophiolite complex of Cyprus and its potential in prospecting for massive sulphide deposits. Int. J. Remote Sens. 1997, 18, 1245–1257.
  55. Carvalho, O.A.D.; Meneses, P.R. Spectral Correlation Mapper (SCM); An Improvement on the Spectral Angle Mapper (SAM). In Summaries of the 9th JPL Airborne Earth Science Workshop; JPL Publication 00-18; NASA: Washington, DC, USA, 2000.
  56. Remondino, F.; Rizzi, A.; Barazzetti, L.; Scaioni, M.; Francesco, F.; Raffaella, B.; Anna, P. Review of Geometric and Radiometric Analyses of Paintings. Photogramm. Rec. 2011, 26, 439–461.
  57. Singh, R.S. Evaluation of EO-1 Hyperion Data for Crop Studies in Part of Indo-Gangatic Plains: A Case Study of Meerut District. Adv. Remote Sens. 2015, 4, 263–269.
  58. Hyperspectral Datasets Description. Available online: http://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes (accessed on 30 June 2018).
  59. Yang, J.; Yu, H.; Yang, X.; Zuo, X. Imbalanced Extreme Learning Machine Based on Probability Density Estimation. In Multi-Disciplinary Trends in Artificial Intelligence; Bikakis, A., Zheng, X., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 160–167.
  60. Woodward, M.; Finn, C. Active One-shot Learning. arXiv 2017, arXiv:1702.06559.
  61. Lughofer, E. Single-pass active learning with conflict and ignorance. Evol. Syst. 2012, 3, 251–271.
  62. Liu, W.; Chang, X.; Chen, L.; Yang, Y. Early Active Learning with Pairwise Constraint for Person Re-identification. In Machine Learning and Knowledge Discovery in Databases; Ceci, M., Hollmén, J., Todorovski, L., Vens, C., Džeroski, S., Eds.; Springer International Publishing: Cham, Switzerland, 2017; pp. 103–118.
  63. Nie, F.; Wang, H.; Huang, H.; Ding, C. Early Active Learning via Robust Representation and Structured Sparsity. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, Beijing, China, 3–9 August 2013; pp. 1572–1578.
  64. Ou, D.; Tan, K.; Du, Q.; Zhu, J.; Wang, X.; Chen, Y. A Novel Tri-Training Technique for the Semi-Supervised Classification of Hyperspectral Images Based on Regularized Local Discriminant Embedding Feature Extraction. Remote Sens. 2019, 11, 654.
  65. Hu, J.; He, Z.; Li, J.; He, L.; Wang, Y. 3D-Gabor Inspired Multiview Active Learning for Spectral-Spatial Hyperspectral Image Classification. Remote Sens. 2018, 10, 1070.
  66. Preet, P.; Batra, S.S.; Jayadeva. Feature Selection for classification of hyperspectral data by minimizing a tight bound on the VC dimension. arXiv 2015, arXiv:1509.08112.
Figure 1. (a) Pavia University image; (b) True ground truths differentiate nine different classes; (c) SVM trained with 1% randomly selected training samples; (d) SVM trained with 10% randomly selected training samples; (e) KNN trained with 1% randomly selected training samples; (f) KNN trained with 10% randomly selected training samples; (g) Logistic boost (LB) trained with 1% randomly selected training samples; (h) LB trained with 10% randomly selected training samples.
Figure 2. Proposed FSAM-AL Pipeline in which red labeled boxes represent where we contribute.
Figure 3. Overall accuracy with different numbers of training samples (%) selected in each iteration from the different datasets. The figure shows that adding the selected samples back to the training set significantly improves the classification results for all classifiers. Moreover, the SVM and ELM classifiers appear more robust. For example, with 2% actively selected samples, the ELM classifier shows only a 2% difference in classification accuracy across different numbers of samples, whereas for the KNN and SVM classifiers the difference is quite high.
Figure 4. Kappa (κ) accuracy with different numbers of training samples (%) selected in each iteration from the Salinas-A, Salinas, Kennedy Space Center, Pavia University, and Pavia Center datasets, respectively. The figure shows that adding the selected samples back to the training set significantly improves the classification results in terms of κ for all classifiers. Moreover, the SVM and ELM classifiers appear more robust than the ensemble and KNN classifiers. For example, with 2% actively selected samples, the ELM classifier shows only a 2% difference in classification accuracy across different numbers of samples, whereas for the KNN and SVM classifiers the difference is quite high. Similar observations can be made for the ensemble learning models.
Figure 5. Salinas-A: (a): Ground Band, (b): True Ground Truths, (c): Training Ground Truths, (d): Test Ground Truths, and ground truths predicted by (e): SVM, (f): KNN, (g): GB, (h): LB, and (i): ELM classifier with 2% of selected training samples.
Figure 6. Salinas: (a) Ground Band, (b): True Ground Truths, (c): Training Ground Truths, (d): Test Ground Truths, and ground truths predicted by (e): SVM, (f): KNN, (g): GB, (h): LB, and (i): ELM classifier with 2% of selected training samples.
Figure 7. Kennedy Space Center: (a) Ground Band, (b): True Ground Truths, (c): Training Ground Truths, (d): Test Ground Truths, and ground truths predicted by (e): SVM, (f): KNN, (g): GB, (h): LB, and (i): ELM classifiers with 2% of selected training samples.
Figure 8. Pavia University: (a) Ground Band, (b): True Ground Truths, (c): Training Ground Truths, (d): Test Ground Truths, and ground truths predicted by (e): SVM, (f): KNN, (g): GB, (h): LB, and (i): ELM classifier with 2% of selected training samples.
Figure 9. Pavia Center: (a) Ground Band, (b): True Ground Truths, (c): Training Ground Truths, (d): Test Ground Truths, and ground truths predicted by (e): SVM, (f): KNN, (g): GB, (h): LB, and (i): ELM classifier with 2% of selected training samples.
Table 1. Classification accuracies in terms of overall accuracy and kappa (κ) obtained by three different classifiers with two different numbers of randomly selected labeled training samples, i.e., 1% and 10%, respectively. All classifiers are trained and tested using 5-fold cross validation. From the results one can conclude that SVM produces better results when trained with 10% randomly selected training samples. However, the obtained results are not good enough to identify the ground materials accurately; therefore, further investigations are required.

| Classifier | kappa (κ), 1% training samples | Overall, 1% training samples | kappa (κ), 10% training samples | Overall, 10% training samples |
|---|---|---|---|---|
| SVM | 0.5853 | 0.6818 | 0.7804 | 0.8365 |
| KNN | 0.4897 | 0.6488 | 0.6791 | 0.7691 |
| LB | 0.5216 | 0.6488 | 0.7531 | 0.8365 |
Table 2. Different sample selection methods used in Active Learning frameworks for hyperspectral image classification in recent years.

| Sample Selection Method | References (Spectral) | References (Spectral–Spatial) |
|---|---|---|
| Random selection | [11,20,28] | [29] |
| Mutual information | [30,31] | [32] |
| Breaking ties | [33] | [34,35] |
| Modified breaking ties | [34] | [34,35] |
| Uncertain sampling | [36,37] | [38,39,40] |
| Fisher information ratio | [41] | [42] |
| Fuzziness information | [11] | - |
| Query by committee | [43] | - |
Table 3. Statistical applicability of our proposed FSAM sample selection method. Each classifier is trained with 2% of actively selected training samples.

| Dataset | Test | ELM | KNN | GB | LB | SVM |
|---|---|---|---|---|---|---|
| Salinas-A | Recall | 0.9852 ± 0.0037 | 0.9489 ± 0.0528 | 0.9644 ± 0.0281 | 0.9650 ± 0.0315 | 0.9855 ± 0.0092 |
| Salinas-A | Precision | 0.9903 ± 0.0029 | 0.9567 ± 0.0352 | 0.9649 ± 0.0275 | 0.9672 ± 0.0287 | 0.9885 ± 0.0071 |
| Salinas-A | F1 Score | 0.9875 ± 0.0031 | 0.9459 ± 0.0609 | 0.9639 ± 0.0270 | 0.9654 ± 0.0300 | 0.9867 ± 0.0076 |
| Salinas | Recall | 0.9544 ± 0.0032 | 0.9247 ± 0.0162 | 0.9551 ± 0.0153 | 0.9580 ± 0.0135 | 0.9583 ± 0.0115 |
| Salinas | Precision | 0.9584 ± 0.0030 | 0.9189 ± 0.0117 | 0.9596 ± 0.0127 | 0.9603 ± 0.0117 | 0.9642 ± 0.0091 |
| Salinas | F1 Score | 0.9552 ± 0.0026 | 0.9225 ± 0.0129 | 0.9570 ± 0.0138 | 0.9588 ± 0.0124 | 0.9611 ± 0.0101 |
| Kennedy Space Center | Recall | 0.8220 ± 0.0518 | 0.7513 ± 0.0739 | 0.8158 ± 0.0839 | 0.8159 ± 0.0790 | 0.8546 ± 0.0579 |
| Kennedy Space Center | Precision | 0.8445 ± 0.0423 | 0.8375 ± 0.0638 | 0.8315 ± 0.0751 | 0.8344 ± 0.0683 | 0.8596 ± 0.0509 |
| Kennedy Space Center | F1 Score | 0.8260 ± 0.0469 | 0.7491 ± 0.0760 | 0.8183 ± 0.0815 | 0.8195 ± 0.0775 | 0.8542 ± 0.0533 |
| Pavia University | Recall | 0.7782 ± 0.0283 | 0.7954 ± 0.0346 | 0.8838 ± 0.0391 | 0.8856 ± 0.0376 | 0.8858 ± 0.0287 |
| Pavia University | Precision | 0.8708 ± 0.0229 | 0.9508 ± 0.0090 | 0.9165 ± 0.0247 | 0.9146 ± 0.0252 | 0.8929 ± 0.0203 |
| Pavia University | F1 Score | 0.8107 ± 0.0255 | 0.8122 ± 0.0316 | 0.8973 ± 0.0336 | 0.8976 ± 0.0327 | 0.8886 ± 0.0241 |
| Pavia Center | Recall | 0.8913 ± 0.0073 | 0.9281 ± 0.0151 | 0.9493 ± 0.0124 | 0.9509 ± 0.0120 | 0.9568 ± 0.0115 |
| Pavia Center | Precision | 0.9187 ± 0.0066 | 0.9125 ± 0.0136 | 0.9469 ± 0.0112 | 0.9489 ± 0.0107 | 0.9572 ± 0.0119 |
| Pavia Center | F1 Score | 0.8999 ± 0.0064 | 0.9248 ± 0.0129 | 0.9480 ± 0.0116 | 0.9498 ± 0.0112 | 0.9568 ± 0.0117 |
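As an illustration of how entries of the form "value ± spread" in Table 3 could be produced, the following minimal sketch computes class-averaged recall, precision and F1 score for one run and then summarizes several runs. The macro (unweighted per-class) averaging, the use of the sample standard deviation, and all function names and toy values are assumptions made for this sketch; the table itself does not state which averaging or spread measure was used.

```python
from statistics import mean, stdev


def macro_prf(y_true, y_pred):
    """Return (precision, recall, f1), each averaged over classes with equal weight."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    return mean(precisions), mean(recalls), mean(f1s)


def summarise(scores):
    """Format repeated-run scores as 'mean ± sample standard deviation'."""
    return f"{mean(scores):.4f} ± {stdev(scores):.4f}"


# Hypothetical toy labels for a single run.
y_true = ["soil", "soil", "water", "water", "grass", "grass"]
y_pred = ["soil", "water", "water", "water", "grass", "soil"]
precision, recall, f1 = macro_prf(y_true, y_pred)
print(f"precision = {precision:.4f}, recall = {recall:.4f}, f1 = {f1:.4f}")

# Hypothetical F1 scores from three repeated runs.
print(summarise([0.98, 0.99, 0.97]))
```

With macro averaging, every class contributes equally regardless of its size, so a small spread across runs indicates that the selected samples yield stable per-class behavior rather than stability on the dominant classes only.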
Table 4. Kappa (κ) accuracy obtained by the SVM classifier on the Botswana dataset for different numbers of training samples, using different sample selection methods from the literature.

| Sample Selection Method | 50 | 100 | 150 | 200 | 250 | 300 | 350 | 400 | 450 | 500 |
|---|---|---|---|---|---|---|---|---|---|---|
| Random Sampling [28] | 0.8156 | 0.8483 | 0.8738 | 0.8886 | 0.9005 | 0.9101 | 0.9170 | 0.9151 | 0.9163 | 0.9221 |
| Mutual Information [30,31,32] | 0.8149 | 0.8437 | 0.8602 | 0.8798 | 0.8863 | 0.9002 | 0.9108 | 0.9217 | 0.9195 | 0.9302 |
| Breaking Ties [33] | 0.8163 | 0.8316 | 0.8401 | 0.8561 | 0.8778 | 0.8919 | 0.9008 | 0.9014 | 0.9087 | 0.9128 |
| Modified Breaking Ties [34,35] | 0.8156 | 0.8522 | 0.8563 | 0.8893 | 0.9007 | 0.9040 | 0.9068 | 0.9136 | 0.9138 | 0.9103 |
| Fuzziness [11] | 0.8174 | 0.8129 | 0.8422 | 0.8648 | 0.8755 | 0.8934 | 0.8989 | 0.8986 | 0.9156 | 0.9119 |
| FSAM | 0.8167 | 0.8749 | 0.9027 | 0.9091 | 0.9493 | 0.9556 | 0.9668 | 0.9788 | 0.9928 | 0.9984 |

Column headers give the number of training samples; entries are kappa accuracies.
Table 5. Kappa (κ) accuracy obtained by the ELM classifier on the Botswana dataset for different numbers of training samples, using different sample selection methods from the literature.

| Sample Selection Method | 50 | 100 | 150 | 200 | 250 | 300 | 350 | 400 | 450 | 500 |
|---|---|---|---|---|---|---|---|---|---|---|
| Random Sampling [28] | 0.8094 | 0.8253 | 0.8364 | 0.8564 | 0.8648 | 0.8730 | 0.8919 | 0.8958 | 0.9063 | 0.9140 |
| Mutual Information [30,31,32] | 0.8051 | 0.8246 | 0.8430 | 0.8538 | 0.8699 | 0.8772 | 0.8881 | 0.8962 | 0.8983 | 0.9070 |
| Breaking Ties [33] | 0.8051 | 0.8174 | 0.8392 | 0.8607 | 0.8680 | 0.8744 | 0.8819 | 0.8963 | 0.8927 | 0.9022 |
| Modified Breaking Ties [34,35] | 0.7961 | 0.8216 | 0.8563 | 0.8654 | 0.8718 | 0.8769 | 0.8896 | 0.9012 | 0.9081 | 0.9149 |
| Fuzziness [11] | 0.7958 | 0.8224 | 0.8463 | 0.8513 | 0.8606 | 0.8733 | 0.8780 | 0.8841 | 0.9008 | 0.9083 |
| FSAM | 0.8021 | 0.8385 | 0.8544 | 0.8846 | 0.8923 | 0.8968 | 0.9213 | 0.9355 | 0.9492 | 0.9551 |

Column headers give the number of training samples; entries are kappa accuracies.
Table 6. Kappa (κ) accuracy obtained by the KNN classifier on the Botswana dataset for different numbers of training samples, using different sample selection methods from the literature.

| Sample Selection Method | 50 | 100 | 150 | 200 | 250 | 300 | 350 | 400 | 450 | 500 |
|---|---|---|---|---|---|---|---|---|---|---|
| Random Sampling [28] | 0.7854 | 0.8145 | 0.8158 | 0.8428 | 0.8547 | 0.8556 | 0.8603 | 0.8640 | 0.8695 | 0.8757 |
| Mutual Information [30,31,32] | 0.7854 | 0.8029 | 0.8154 | 0.8342 | 0.8485 | 0.8519 | 0.8592 | 0.8653 | 0.8727 | 0.8814 |
| Breaking Ties [33] | 0.7854 | 0.8205 | 0.8330 | 0.8466 | 0.8469 | 0.8541 | 0.8626 | 0.8731 | 0.8760 | 0.8757 |
| Modified Breaking Ties [34,35] | 0.7854 | 0.8396 | 0.8463 | 0.8474 | 0.8584 | 0.8635 | 0.8685 | 0.8749 | 0.8813 | 0.8841 |
| Fuzziness [11] | 0.7854 | 0.8248 | 0.8298 | 0.8445 | 0.8529 | 0.8584 | 0.8628 | 0.8678 | 0.8702 | 0.8771 |
| FSAM | 0.7854 | 0.8369 | 0.8512 | 0.8842 | 0.8919 | 0.9189 | 0.9236 | 0.9469 | 0.9584 | 0.9625 |

Column headers give the number of training samples; entries are kappa accuracies.
Table 7. Kappa (κ) accuracy obtained by the GB classifier on the Botswana dataset for different numbers of training samples, using different sample selection methods from the literature.

| Sample Selection Method | 50 | 100 | 150 | 200 | 250 | 300 | 350 | 400 | 450 | 500 |
|---|---|---|---|---|---|---|---|---|---|---|
| Random Sampling [28] | 0.8140 | 0.8272 | 0.8342 | 0.8358 | 0.8479 | 0.8597 | 0.8649 | 0.8703 | 0.8675 | 0.8717 |
| Mutual Information [30,31,32] | 0.8139 | 0.7941 | 0.8118 | 0.8406 | 0.8620 | 0.8608 | 0.8687 | 0.8750 | 0.8787 | 0.8712 |
| Breaking Ties [33] | 0.8139 | 0.7875 | 0.8318 | 0.8355 | 0.8625 | 0.8691 | 0.8795 | 0.8895 | 0.8935 | 0.8928 |
| Modified Breaking Ties [34,35] | 0.8139 | 0.8067 | 0.8404 | 0.8520 | 0.8570 | 0.8612 | 0.8711 | 0.8772 | 0.8750 | 0.8795 |
| Fuzziness [11] | 0.8139 | 0.8470 | 0.8488 | 0.8524 | 0.8559 | 0.8658 | 0.8692 | 0.8755 | 0.8782 | 0.8853 |
| FSAM | 0.8140 | 0.8054 | 0.8605 | 0.8852 | 0.9060 | 0.9268 | 0.9247 | 0.9280 | 0.9455 | 0.9592 |

Column headers give the number of training samples; entries are kappa accuracies.
Table 8. Kappa (κ) accuracy obtained by the LB classifier on the Botswana dataset for different numbers of training samples, using different sample selection methods from the literature.

| Sample Selection Method | 50 | 100 | 150 | 200 | 250 | 300 | 350 | 400 | 450 | 500 |
|---|---|---|---|---|---|---|---|---|---|---|
| Random Sampling [28] | 0.8074 | 0.8105 | 0.8275 | 0.8359 | 0.8419 | 0.8429 | 0.8482 | 0.8620 | 0.8699 | 0.8768 |
| Mutual Information [30,31,32] | 0.8073 | 0.8212 | 0.83151 | 0.8366 | 0.8409 | 0.8442 | 0.8564 | 0.8540 | 0.8571 | 0.8565 |
| Breaking Ties [33] | 0.8074 | 0.8155 | 0.8300 | 0.8293 | 0.8337 | 0.8467 | 0.8513 | 0.8490 | 0.8582 | 0.8679 |
| Modified Breaking Ties [34,35] | 0.8074 | 0.8303 | 0.8402 | 0.8422 | 0.8484 | 0.8649 | 0.8704 | 0.8813 | 0.8816 | 0.8794 |
| Fuzziness [11] | 0.8073 | 0.8126 | 0.8236 | 0.8411 | 0.8447 | 0.8586 | 0.8626 | 0.8684 | 0.8669 | 0.8709 |
| FSAM | 0.8073 | 0.8118 | 0.8226 | 0.8779 | 0.8993 | 0.9094 | 0.9216 | 0.9402 | 0.9475 | 0.9588 |

Column headers give the number of training samples; entries are kappa accuracies.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Remote Sens. EISSN 2072-4292. Published by MDPI AG, Basel, Switzerland.