Hyperspectral Image Classification Promotion Using Clustering Inspired Active Learning

Abstract: Deep neural networks (DNNs) have promoted much of the recent progress in hyperspectral image (HSI) classification, which depends on extensive labeled samples and deep network structure and has achieved surprisingly good generalization capacity. However, due to the expensive labeling cost, labeled samples are scarce in most practical cases, which makes these DNN-based methods prone to over-fitting and degrades the classification result. To mitigate this problem, we present a clustering-inspired active learning method for enhancing the HSI classification result, which mainly contributes to two aspects. On one hand, the modified clustering by fast search and find of peaks clustering method is utilized to select highly informative and diverse samples from the unlabeled samples in the candidate set for manual labeling, which empowers us to appropriately augment the limited training set (i.e., labeled samples) and thus improves the generalization capacity of the baseline DNN model. On the other hand, another K-means clustering-based pseudo-labeling scheme is utilized to pre-train the DNN model with all samples in the candidate set. By doing this, the pre-trained model can be effectively generalized to the unlabeled samples in the testing set after being fine-tuned based on the augmented training set. Experimental results on two benchmark HSI datasets show the effectiveness of the proposed method.


Introduction
A hyperspectral image (HSI) contains not only spatial information but also abundant spectral information. Substances that are difficult to distinguish in natural images can be easily recognized in hyperspectral imagery. As a result, HSIs have been widely applied in resource exploration, mineral detection, environmental investigation, lesion detection, etc. [1][2][3][4][5].
HSI classification is an essential HSI application which focuses on assigning each pixel a unique class label. To date, a large number of HSI classification methods have been proposed from different perspectives. Depending on whether deep learning is used to obtain HSI features and classification results, HSI classification methods can be roughly divided into non-deep learning-based methods and deep learning-based methods. Among the latter, generative adversarial network (GAN)-based methods generate fake inputs to solve small sample HSI classification tasks. Although such methods can enhance HSI classification accuracy with limited samples via the generative capacity of GANs, the quality of the generated samples is often ignored, which limits the improvement of the classification result.
This paper presents a cluster-inspired active learning method for HSI classification with limited labeled samples, which mainly contributes to two aspects. Firstly, the modified clustering by fast search and find of peaks (MCFSFDP) clustering method is utilized to select highly informative and diverse samples from the unlabeled samples in the candidate set for manual labeling by an expert, which empowers us to appropriately augment the limited training set (i.e., labeled samples) and thus improve the generalization capacity of the baseline DNN model. Secondly, another K-means clustering-based pseudo-labeling scheme is utilized to pre-train the DNN model with the unlabeled samples in the candidate set. By doing this, the pre-trained model can be effectively generalized to the unlabeled samples in the testing set after being fine-tuned based on the augmented training set.
This paper is organized as follows. In Section 2, the proposed method is described in detail, including data pre-processing, actively selecting core samples from the candidate set via MCFSFDP, pre-training the DNN model with pseudo-labels of the candidate samples generated via K-means, and network training and testing. In Sections 3 and 4, the results and discussion are presented. In Section 5, the conclusions of this paper are summarized.

The Proposed Method
The cluster-inspired active learning method includes four major steps: (1) data pre-processing, which extracts the spectral information of each pixel as the sample and divides all the samples into the training set, the candidate set and the testing set; (2) actively selecting core samples from the candidate set via MCFSFDP, where the effective MCFSFDP clustering method is utilized to actively select core samples from the unlabeled samples in the candidate set for manual labeling; (3) pre-training, where the K-means clustering-based pseudo-labeling scheme is utilized to pre-train the DNN model with the samples in the candidate set; and (4) fine-tuning and testing, using the core samples and the small samples together as the new augmented training set to fine-tune the network and obtain the final classification result on the testing samples. The flowchart of our proposed method is shown in Figure 1.
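To make the flow concrete, the following Python outline strings the four steps together. It is a sketch, not the authors' released code: `split_hsi_samples`, `density_and_delta`, `adaptive_threshold` and `kmeans_pseudo_labels` are the helpers sketched in the subsections below, while `expert_label`, `pretrain` and `finetune` are hypothetical placeholders for manual annotation and the training loops described in the fine-tuning subsection.

```python
import numpy as np

def run_pipeline(R, labels, expert_label, d_c=0.1, k=50):
    # (1) Data pre-processing: per-pixel spectra, three disjoint sets.
    tr_X, tr_y, cand_X, te_X, te_y = split_hsi_samples(R, labels)

    # (2) MCFSFDP: actively pick core samples from the candidate set
    #     and have an expert label them (expert_label is a placeholder).
    rho, delta = density_and_delta(cand_X, d_c)
    core = np.flatnonzero(delta > adaptive_threshold(delta))
    aug_X = np.vstack([tr_X, cand_X[core]])
    aug_y = np.concatenate([tr_y, expert_label(cand_X[core])])

    # (3) Pre-train the DNN on K-means pseudo-labels of the candidate set.
    model = pretrain(cand_X, kmeans_pseudo_labels(cand_X, k))

    # (4) Fine-tune on the augmented training set, then classify the test set.
    model = finetune(model, aug_X, aug_y)
    return model, te_X, te_y
```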

Data Pre-Processing
In this paper, the HSI used in the classification task is denoted as R. Although an HSI consists of 3D data, we only use the spectral information of each pixel as the sample. We randomly select M pixels from R as the limited training samples; in other words, the quantity of the small sample set is denoted by M. These selected training samples cover all the categories, and each category has almost the same number of pixels. Each pixel $P_i$ contributes its corresponding spectral information, a vector of size $h \times 1$, as a training sample, where $h$ denotes the number of spectral bands of R. $\{P_i\}_{i=1}^{M}$ denotes the limited samples, which carry manually assigned labels.
Then, we extract N pixels from R and their corresponding spectral information $\{C_j\}_{j=1}^{N}$ as unlabeled samples in the candidate set, where N denotes the number of samples in the candidate set, i.e., the number of candidate samples. Finally, the remaining samples are the testing samples; K denotes their number, and $\{Q_u\}_{u=1}^{K}$ denotes the testing samples. Mathematically, all samples are column vectors of size $h \times 1$.
The samples in the testing set are all used for testing. The core samples, which are actively selected from the candidate samples via the active learning method, are manually labeled. In addition, the K-means clustering method automatically gives the samples in the candidate set pseudo-labels for network pre-training. Here, M plus N is almost equal to K. The training set, the candidate set and the testing set do not overlap.
The sample extraction from R is shown in Figure 2.
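As a sketch of this split, assuming a NumPy cube `R` of shape H × W × h and a ground-truth map `labels` with 0 marking background; the value of `N` and the seed below are illustrative, not the paper's:

```python
import numpy as np

def split_hsi_samples(R, labels, M=200, N=2000, seed=0):
    """Split per-pixel spectra of an HSI cube R (H x W x h) into disjoint
    training, candidate and testing sets. `labels` (H x W) holds ground-truth
    class ids, with 0 marking background pixels to be discarded."""
    rng = np.random.default_rng(seed)
    X = R.reshape(-1, R.shape[-1]).astype(np.float32)  # each sample: h-dim spectrum
    y = labels.reshape(-1)
    fg = np.flatnonzero(y > 0)                         # keep only feature pixels

    # Class-balanced draw of the M limited training samples.
    classes = np.unique(y[fg])
    per_class = M // len(classes)                      # "almost the same number" per class
    train_idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), per_class, replace=False)
        for c in classes
    ])

    # Candidate set: N of the remaining pixels; the rest form the testing set.
    rest = np.setdiff1d(fg, train_idx)
    rng.shuffle(rest)
    cand_idx, test_idx = rest[:N], rest[N:]
    return X[train_idx], y[train_idx], X[cand_idx], X[test_idx], y[test_idx]
```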

Actively Selecting Core Samples via MCFSFDP
To actively select core samples for manual labeling from the unlabeled candidate samples, a clustering-based method is suitable. In our opinion, clustering by fast search and find of peaks (CFSFDP) [43] is a representative method. The idea of this method is that "the cluster centers are determined as those points that not only have a higher density than their neighbors, but also keep a certain distance from any point with a higher density". In this clustering method, two thresholds, i.e., distance and density, determine the cluster centers: the points that have both a large distance and a high density are selected as cluster centers.
In our opinion, CFSFDP is useful for actively selecting cluster centers and for the clustering process; however, the wild points (i.e., inter-class points) are important yet difficult to distinguish. To solve this problem, an effective clustering method based on modified clustering by fast search and find of peaks (MCFSFDP) is proposed to actively select core samples by choosing an adaptive distance threshold [28]. The MCFSFDP algorithm is similar to the CFSFDP algorithm in [43]: a class center must have two characteristics, namely "a higher density than its neighbors" and "a relatively large distance from points with higher densities". Different from CFSFDP, MCFSFDP chooses the class centers by a larger distance alone, which effectively acquires both the cluster centers and the wild points and enhances the quality of the selected samples. The details of the proposed method are as follows.
The samples $\{C_j\}_{j=1}^{N}$ in the candidate set are used for actively selecting core samples via the clustering-based active learning method; for simplicity, each sample $C_j$ in the candidate set is denoted as point $j$, which is actually a column vector. For each point $j$, we calculate the local density $\rho_j$ and the distance $\delta_j$ from the nearest point with higher density; if point $j$ has the highest density, the largest distance between $j$ and the other points is taken as $\delta_j$.
The local density $\rho_j$ of point $j$ is given in Formula (1):

$$\rho_j = \sum_{k \neq j} \chi(d_{jk} - d_c), \quad \chi(x) = \begin{cases} 1, & x < 0 \\ 0, & x \geq 0 \end{cases} \qquad (1)$$

Formula (1) represents the number of samples around point $j$ within a threshold radius $d_c$. The values of $\delta_j$ and $\rho_j$ depend on the Euclidean distance $d_{jk}$ between any pair of points $j$ and $k$.
Here, $d_c$ is considered a cut-off distance, and $\rho_j$ denotes the number of points within the radius $d_c$ centered at point $j$. $\delta_j$ is the minimum distance between $j$ and any other point with higher density, as shown in Formula (2):

$$\delta_j = \min_{k:\, \rho_k > \rho_j} d_{jk} \qquad (2)$$

where $\rho_k$ denotes the local density of $k$. For the point with the maximum local density, we take $\delta_j = \max_k (d_{jk})$. $\delta_j$ is much larger than the typical nearest-neighbor distance only for points that are local or global maxima of the density. The cluster centers are recognized as points whose $\delta_j$ value is anomalously large while their $\rho_j$ value is simultaneously higher than a certain density value.
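A direct NumPy implementation of Formulas (1) and (2) might look as follows; this is a sketch in which the SciPy pairwise-distance helper is our choice and density ties are resolved naively:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def density_and_delta(C, d_c):
    """For each candidate sample (row of C), compute the local density rho_j
    (Formula (1): number of points within the cut-off distance d_c) and
    delta_j (Formula (2): distance to the nearest higher-density point)."""
    d = squareform(pdist(C))           # pairwise Euclidean distances d_jk
    rho = (d < d_c).sum(axis=1) - 1    # subtract 1 so a point ignores itself

    delta = np.empty(len(C))
    for j in range(len(C)):
        higher = rho > rho[j]          # points denser than j (ties ignored)
        # The maximum-density point takes the largest distance instead.
        delta[j] = d[j, higher].min() if higher.any() else d[j].max()
    return rho, delta
```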
The distance and density of each point are directly shown in the decision graph. We provide the decision graph of the candidate samples, each of size 200 × 1, for the Indian Pines dataset [44] in Figure 3. The Indian Pines dataset is often used in HSI classification tasks; it was gathered by the AVIRIS sensor over the Indian Pines test site in north-western Indiana and consists of 145 × 145 pixels and 224 spectral reflectance bands in the wavelength range 0.4-2.5 µm. In the threshold-determining step, MCFSFDP differs from CFSFDP [43]: MCFSFDP is used to select core samples for manual labeling, and the distance δ is taken as the only threshold in the decision graph for selecting samples. This operation selects not only the cluster centers but also the wild points, which enhances the quality of the samples and thus the classification result. Because the wild points lie on the boundary between pairs of clusters and are usually difficult to distinguish, training on this type of sample is useful for improving the classification result.
To select the core samples adaptively, we need an optimal distance threshold $\delta_A$. In Formula (3), $f$ denotes the mapping from a distance value $\delta_v$ to the number $n_v$ of points whose distances are larger than or equal to $\delta_v$, as shown in Figure 4a:

$$n_v = f(\delta_v) \qquad (3)$$

In Formula (4), where $\delta_{v+1} \geq \delta_v$, $c_v$ denotes the differential of $n_v$:

$$c_v = \frac{n_{v+1} - n_v}{\delta_{v+1} - \delta_v} \qquad (4)$$

Formula (5) gives $q_v$, the variation quantity of the number of points at $\delta_v$, as shown in Figure 4b; Formula (4) is the intermediate result between Formulas (3) and (5).
In the MCFSFDP method, the adaptive distance threshold is denoted as $\delta_A$, and the points whose distances are larger than $\delta_A$ are automatically selected as core samples. The chosen $\delta_v$ must be a point at which the numbers $n_v$ and $n_{v+1}$ of points are stable while, at the same time, the value $q_v$ is larger than the value $q_{v+1}$. Such a $\delta_v$ is selected as the adaptive distance $\delta_A$.
For the Indian Pines dataset, as can be seen from Figure 4a, $n_v$ begins to stabilize in the distance range (0.15-0.17). As can be seen from Figure 4b, $c_v$ has a local maximum at the distance value $\delta_v = 0.15$ within this range. Therefore, 0.15 is taken as the adaptive distance $\delta_A$ for the Indian Pines dataset.
With the adaptive distance $\delta_A$, the points $j$ with distance value $\delta_j > \delta_A$ are adaptively chosen as core samples for manual labeling.
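The following sketch turns this rule into code. The exact closed form of Formula (5) is not reproduced here, and the stability test below is one plausible reading of the criterion above, so treat the thresholding rule as an assumption rather than the authors' exact procedure:

```python
import numpy as np

def adaptive_threshold(delta, grid_step=0.01):
    """Pick delta_A: scan candidate thresholds delta_v, count the points n_v
    with delta >= delta_v (Formula (3)) and track the variation c_v
    (Formula (4)); return the first delta_v where n_v has flattened out but
    the variation is still a local maximum."""
    grid = np.arange(0.0, float(delta.max()), grid_step)
    n_v = np.array([(delta >= dv).sum() for dv in grid])
    c_v = np.abs(np.diff(n_v)) / grid_step        # variation of n_v

    stable = np.flatnonzero(c_v <= c_v.mean())    # region where n_v is stable
    for i in stable[1:-1]:                        # skip first/last stable index
        if c_v[i] >= c_v[i - 1] and c_v[i] >= c_v[i + 1]:
            return grid[i]
    return grid[stable[0]] if stable.size else grid[-1]

def select_core_samples(C, delta):
    """Core samples are the points whose delta exceeds delta_A."""
    return np.flatnonzero(delta > adaptive_threshold(delta))
```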
Then, the labeled core samples are added to the training samples to form the augmented training set. The number of core samples is denoted as T, so the number of training samples after expansion is M + T. $\{B_g\}_{g=1}^{M+T}$ denotes the final training dataset.

K-Means Clustering-Based Pseudo-Labeling Scheme
After selecting the core samples via MCFSFDP, we use K-means clustering to obtain pseudo-labels for the samples $\{C_j\}_{j=1}^{N}$ in the candidate set. The steps are as follows. Step 1: Randomly selecting $k$ samples from $\{C_j\}_{j=1}^{N}$ as the initial cluster centers, i.e., $\mu_1, \ldots, \mu_f, \ldots, \mu_k$.
Step 2: Calculating the Euclidean distance between each vector $C_j$ and each cluster center $\mu_f$. If $C_j$ is closest to $\mu_f$, $C_j$ is assigned to the category of cluster center $\mu_f$.
Step 3: For all $c_f$ samples $C_j$ that share the label of cluster center $\mu_f$ in class $f$, recalculating the new cluster center as the average value:

$$\mu_f = \frac{1}{c_f} \sum_{C_j \in \text{class}\, f} C_j$$

where $c_f$ is the number of samples in class $f$.
Step 4: Repeating step 2 and step 3 $Z$ times, where $Z$ is the number of iterations of the K-means process, which is a parameter. After this process, the cluster centers take their final average values, i.e., $\mu_1^Z, \ldots, \mu_f^Z, \ldots, \mu_k^Z$. The labels of the samples $\{C_j\}_{j=1}^{N}$ in the candidate set belong to $\{1, \ldots, f, \ldots, k\}$ and are all pseudo-labels produced by K-means clustering.
The candidate samples with pseudo-labels are utilized to pre-train the DNN model.
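In practice, the loop above can be delegated to an off-the-shelf implementation. A minimal sketch with scikit-learn follows; the library choice is our assumption, and its labels come back as {0, ..., k−1} rather than {1, ..., k}:

```python
from sklearn.cluster import KMeans

def kmeans_pseudo_labels(C, k, Z=300, seed=0):
    """Pseudo-label the candidate samples: random initial centers (step 1),
    at most Z assignment/update rounds (steps 2-4)."""
    km = KMeans(n_clusters=k, init="random", n_init=1, max_iter=Z,
                random_state=seed)
    return km.fit_predict(C)   # one pseudo-label per row of C
```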

Fine-Tuning and Testing
After obtaining the core samples via MCFSFDP and generating the pseudo-labels of the samples $\{C_j\}_{j=1}^{N}$ in the candidate set, transfer learning is utilized to train the DNN model: the candidate samples with pseudo-labels are utilized to pre-train the DNN model.
Then, the samples $\{B_g\}_{g=1}^{M+T}$ in the augmented training set are used to fine-tune the DNN model, yielding the final network classification model.
Finally, the network is tested with the samples $\{Q_u\}_{u=1}^{K}$ in the testing set. The schematic diagram of the DNN structure and training process is shown in Figure 5. We use the back-propagation neural network [45] as the DNN model. This DNN model contains an input layer, three fully connected layers and a soft-max layer. The first fully connected layer has 512 hidden nodes, the second has 2048 and the third has 1024. The number of nodes in the soft-max layer differs between the pre-training and fine-tuning processes, because the number of pseudo-label categories in the candidate set differs from the number of true-label categories in the augmented training set.
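The paper does not name a deep learning framework, so the following PyTorch sketch of the Figure 5 network is an assumption; the soft-max is folded into the cross-entropy loss, and fine-tuning swaps only the output head:

```python
import torch.nn as nn

class BPNet(nn.Module):
    """Three fully connected hidden layers (512, 2048, 1024 nodes) with
    Leaky ReLU, plus an output head whose size is k pseudo-classes during
    pre-training and the number of true classes during fine-tuning."""
    def __init__(self, h, n_classes):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(h, 512), nn.LeakyReLU(),
            nn.Linear(512, 2048), nn.LeakyReLU(),
            nn.Linear(2048, 1024), nn.LeakyReLU(),
        )
        self.head = nn.Linear(1024, n_classes)  # soft-max applied in the loss

    def forward(self, x):
        return self.head(self.body(x))

# Fine-tuning: keep the pre-trained body, replace only the soft-max head.
# model.head = nn.Linear(1024, n_true_classes)
```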

Experiments and Analysis
To validate the feasibility and effectiveness of the proposed method, two HSI datasets were used in the experiments. In this section, we first introduce the datasets. Secondly, the experimental parameter settings are described. Finally, ablation and comparative experiments are performed to show the HSI classification results of the proposed method.

Datasets
In this paper, two widely used public HSI datasets were adopted in our experiments. Dataset 1: the first dataset was the Indian Pines image, acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) [44], as shown in Figure 6a; the ground truth is shown in Figure 6b. The size of this image is 145 × 145 pixels with 224 spectral bands, and the wavelength ranges from 0.4 to 2.5 µm. Among the pixels, only 10,249 are feature pixels, and the remaining 10,776 are background pixels. After removing the bands covering the region of water absorption, the number of bands was reduced to 200. Since background pixels are eliminated in the actual classification, there are 16 categories in total. The number of samples in each category is given in Table 1. The samples in the training set can be regarded as limited labeled samples. The samples in the candidate set were used for choosing core samples, which were added to the training samples to form the new augmented training set; the candidate samples were also used for pre-training the DNN with their pseudo-labels. The samples in the testing set were used for evaluating the effect of the proposed method.

Experimental Parameter Settings
In the experiment, the samples were randomly selected from the HSI dataset. The training set includes 200 samples. For the cluster-inspired active learning method, the samples in the candidate set were used to obtain the core samples through the MCFSFDP algorithm for manual labeling, and the pseudo-labels of the candidate samples were generated through the K-means algorithm for the DNN's pre-training. The number of cluster centers was varied over 10, 20, . . . , 100.
In the experiment, as shown in Figure 5, the DNN framework uses three fully connected layers and one soft-max layer. The three fully connected layers, namely the hidden layers, all adopt Leaky ReLU as the activation function, with 512, 2048 and 1024 neuron nodes, respectively. The learning rate was 0.0001 and the batch size was 256.
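A minimal training loop consistent with these settings might look as follows; the optimizer is an assumption, since the paper states only the learning rate and batch size:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def train(model, X, y, epochs, lr=1e-4, batch_size=256):
    """Generic loop used for both pre-training (pseudo-labels) and
    fine-tuning (true labels); cross-entropy supplies the soft-max."""
    loader = DataLoader(
        TensorDataset(torch.as_tensor(X, dtype=torch.float32),
                      torch.as_tensor(y, dtype=torch.long)),
        batch_size=batch_size, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # optimizer: our assumption
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
    return model
```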
The code was run on a computer with an Intel i9-11900K CPU, two NVIDIA 3060 GPUs, 128 GB of memory and a 1 TB SSD.

Effectiveness of the Core Samples Actively Selected via MCFSFDP
The effectiveness of the core samples produced by the active selection method is worth verifying. To verify the influence of the actively selected core samples on classification, we compared the accuracy of an active learning method based on randomly selected samples with that of the method based on actively selected core samples, where the number of randomly selected samples drawn from the candidate set equals the number of core samples. The testing accuracy of training samples augmented with randomly selected samples versus training samples augmented with core samples via our proposed MCFSFDP on Dataset 1 is shown in Table 3; the corresponding results for Dataset 2 are shown in Table 4. For the Indian Pines dataset, the adaptive distance threshold is calculated as 0.15, and we obtain 55 core samples via the MCFSFDP algorithm; the curve for determining the adaptive distance is shown in Figure 4. For Dataset 2, the adaptive distance is 0.12 and the number of core samples is 40; the curve for determining the adaptive distance is shown in Figure 8. As can be seen from Tables 3 and 4, the testing result for small samples with core samples is higher than that for small samples with randomly selected samples. Specifically, in Table 4, the overall accuracy (OA) of small samples with core samples is more than 2% higher than that of randomly selected samples. Therefore, using the actively selected core samples via MCFSFDP to train the BP neural network can enhance the testing accuracy of small sample HSI classification.
Additionally, the actively selected core sample-based method enhances not only the quantity but also the quality of the training samples.
The other testing results on the two datasets, i.e., the per-class accuracies, average accuracy (AA) and Kappa coefficient, are also shown in Tables 3 and 4.

Effectiveness of the Proposed Method Based on Actively Selected Core Samples

Through the above experiments, we have demonstrated the effectiveness of the actively selected core samples method in small sample HSI classification. The following classification results further verify the effectiveness of the proposed method based on actively selected samples on the two datasets.
On the two datasets, the original training set, which has 200 labeled samples, is used for training the BP neural network, while the testing set is used for testing the network. For the Indian Pines dataset, the adaptive distance threshold is calculated as 0.15, and we obtain 55 core samples via the MCFSFDP algorithm. These core samples are added to the training set, and we utilize the new augmented training set to train the network. The testing results of the original training set and the augmented training set with core samples on Dataset 1 are shown in Table 5, while the curve for determining the adaptive distance is shown in Figure 4. The testing accuracy for the Salinas dataset is shown in Table 6, and the curve for determining the adaptive distance is shown in Figure 8; the adaptive distance is 0.12, and the number of core samples is 40 for Dataset 2, as can also be seen in Tables 3 and 4. In Table 5, the testing accuracy (OA) with the original training set for Dataset 1 is 58.9% after 13,000 training epochs, whereas the testing accuracy (OA) with the augmented training set with core samples is 67.8% after the same number of epochs. Thus, the testing accuracy with the original training set is lower than that with the augmented training set with core samples.
Additionally, in Table 6, the testing accuracy (OA) with the original training set for Dataset 2 is 81.7% after 11,000 training epochs, whereas the testing accuracy (OA) with the augmented training set with core samples is 85.6% after the same number of epochs. Again, the testing accuracy with the original training set is lower than that with the training set augmented with core samples. Consequently, adding the core samples obtained via MCFSFDP to the training set is demonstrated to enhance the small sample HSI classification accuracy on Dataset 1 and Dataset 2.
The other testing results on the two datasets, i.e., the per-class accuracies, average accuracy (AA) and Kappa coefficient, are also shown in Tables 5 and 6.

Effectiveness of Pre-Training by Candidate Samples with Pseudo-Labels
Through the above experiments, we have demonstrated the effectiveness of active learning in small sample HSI classification. To demonstrate the effectiveness of the proposed pre-training with pseudo-labeled candidate samples combined with adaptive active learning, we generated pseudo-labels for the candidate samples via the K-means algorithm and utilized these data to pre-train the BP neural network. Then, the training set augmented with core samples was used to fine-tune the network.
To determine the appropriate number of clusters for pseudo-labeling, we observe the testing accuracy of the proposed method with different numbers of clusters, after 13,000 training epochs on Dataset 1 (Table 7) and after 11,000 training epochs on Dataset 2 (Table 8). In Table 7, the maximal testing accuracy of the proposed method for Dataset 1 (68.9%) is achieved with 50 cluster centers at 13,000 training epochs. Compared with the values in Table 5, the testing accuracy of the proposed method is higher than that of the original training set (58.9%) and the training set with core samples (67.8%). Hence, compared with adaptive active learning alone, the testing accuracy of the proposed method is significantly improved. Additionally, in Table 8, the maximal testing accuracy of the proposed method for Dataset 2 (86.8%) is achieved with 80 cluster centers at 11,000 training epochs. Compared with the values in Table 6, the testing accuracy of the proposed method is higher than that of the original training set (81.7%) and the training set with core samples (85.6%).
As shown in Tables 7 and 8, due to the different sample distributions of the two datasets, the chosen numbers of clusters differ: 50 for the Indian Pines dataset and 80 for the Salinas dataset. Consequently, the proposed cluster-inspired active learning method is demonstrated to enhance the small sample HSI classification accuracy, outperforming the methods reported in Tables 3-6 on Dataset 1 and Dataset 2.

The Proposed Method Compared with the Other Methods
In these experiments, the proposed cluster-inspired active learning method is compared with other methods, including the random-based active learning method, the K-means-based active learning method, the minimum probability-based active learning method, the CFSFDP-based active learning method [43] and our MCFSFDP-based active learning method [28].
Specifically, the K-means selected sample-based method utilizes the K-means algorithm to extract samples. The minimum probability-based active learning method chooses the n samples whose predictions have the lowest probabilities. The CFSFDP and MCFSFDP selected sample-based methods are used to increase the number of samples. All methods are evaluated with the same back-propagation neural network classifier. The testing accuracies (OA) of these methods and the proposed method are shown in Table 9 for Dataset 1 and in Table 10 for Dataset 2. From the classification results on Datasets 1 and 2, it can be seen that the testing accuracy of the proposed cluster-inspired active learning method is better than that of the other methods. Among them, the K-means-based active learning method achieves the lowest accuracy, and our MCFSFDP-based active learning method is the second best.

Influence of the Network Training Iterations
The experimental results in Tables 11 and 12 show that adding the core samples to the training set yields better testing accuracy than using the original small samples alone, for both Dataset 1 and Dataset 2.
According to the data in Table 11, 13,000 epochs is confirmed as the best number of training iterations with core samples for Dataset 1: the testing accuracy of the training set with core samples reaches 67.8%, its best value, while the original training set obtains 58.9% at the same number of epochs. The best testing accuracy of the original training set is 60.1%, obtained with 11,000 iterations. Subsequent experiments therefore use 13,000 epochs for Dataset 1. The iteration influence curve is shown in Figure 9.

According to the data in Table 12, 6000 epochs attains the best testing accuracy (82.9%) with the original training set for Dataset 2, where the training set with core samples reaches 84.1%, already higher than the original set. However, the testing accuracy of the training set with core samples after 11,000 epochs is 85.6%, the best overall result, so subsequent experiments use 11,000 epochs for Dataset 2. The iteration influence curve is shown in Figure 10.

Influence of the Number of Clusters and Iterations
As can be seen from Tables 13 and 14, the testing accuracy of the proposed method is influenced by both the number of K-means clusters and the number of network training epochs.
In Table 13, the best accuracy is 68.9%, obtained with 13,000 iterations and 50 clusters for Dataset 1. The best testing accuracy in Table 14 is 86.8%, obtained with the best parameters of 11,000 iterations and 80 clusters. These two best accuracies are therefore reported as the final results for Dataset 1 and Dataset 2 in Tables 9 and 10.

Table 13. The testing accuracy of the proposed method with different numbers of clusters and iterations for Dataset 1.


Conclusions
In this paper, we present a cluster-inspired active learning method for HSI classification, which mainly contributes to two aspects. On one hand, the modified clustering by fast search and find of peaks (MCFSFDP) clustering method is utilized to select highly informative and diverse samples from the candidate set for manual labeling, which empowers us to appropriately augment the limited training set (i.e., labeled samples) and thus improve the generalization capacity of the baseline DNN model. On the other hand, another K-means clustering-based pseudo-labeling scheme is utilized to pre-train the DNN model with all samples in the candidate set. By doing this, the pre-trained model can be effectively generalized to the testing samples after being fine-tuned based on the augmented training set. The experimental results demonstrate that the proposed method is useful for selecting high-quality core samples to expand the data and effectively improves the small sample HSI classification accuracy.