Shape Adaptive Neighborhood Information-Based Semi-Supervised Learning for Hyperspectral Image Classification

Abstract: Hyperspectral image (HSI) classification is an important research topic in the detailed analysis of the Earth's surface. However, classification performance is often hampered by the high-dimensional features and limited training samples of HSIs, which has fostered research on semi-supervised learning (SSL). In this paper, we propose a shape adaptive neighborhood information (SANI)-based SSL (SANI-SSL) method that takes full advantage of adaptive spatial information to select valuable unlabeled samples in order to improve the classification ability. The improvement of the classification mainly relies on two aspects: (1) the improvement of the feature discriminability, which is accomplished by exploiting spectral-spatial information, and (2) the improvement of the training samples' representativeness, which is accomplished by exploiting the SANI of both labeled and unlabeled samples. First, the SANI of the labeled samples is extracted, and the breaking ties (BT) method is used to select valuable unlabeled samples from the labeled samples' neighborhoods. Second, the SANI of the unlabeled samples is also used to find more valuable samples, with a classifier combination method used as a strategy to ensure confidence and an adaptive interval method used as a strategy to ensure informativeness. Experimental comparisons on three benchmark HSI datasets demonstrate the significantly superior performance of the proposed method.


Introduction
Hyperspectral remote sensing has been widely used in Earth observation due to its special advantage of obtaining rich spectral information from hundreds of narrow and continuous spectral bands [1,2]. Classification is an indispensable part of hyperspectral remote sensing image processing and applications [3]. However, hyperspectral image (HSI) classification is confronted with great challenges due to its high-dimensional characteristics. Such high data dimensionality typically requires abundant labeled samples for supervised classification; however, sample labels are often quite labor-intensive and time-consuming to obtain. When the number of labeled samples is limited, the so-called Hughes phenomenon often occurs, in which the classification accuracy decreases with increasing data dimensionality [4]. To address this issue, two solutions have emerged in recent years [5]: the first is to develop classifiers that can perform efficiently in the scenario of limited labeled samples and high-dimensional features, such as the support vector machine (SVM) classifier [6,7] and multinomial logistic regression (MLR) [8,9]; the second is semi-supervised learning (SSL), in which unlabeled samples are introduced into the training sets in order to improve the capability of the classifier, because unlabeled samples are helpful in improving the estimation of the class boundaries and can be obtained in a much easier way. As a very effective solution, SSL has attracted great attention for HSI classification [10]. The main challenges of SSL are how to determine the labels of the newly selected samples and how to choose the most informative unlabeled samples. For the first challenge, guaranteeing the confidence of the selected unlabeled samples is the key, and one of the general methods is to introduce spatial information into SSL.
Such spatial information has usually been applied in two aspects: the process of classification and the process of SSL. It has been broadly acknowledged that utilizing spatial information in remote sensing image classification can effectively remove salt-and-pepper noise and improve classification accuracy [7,11,12]. Previous studies have demonstrated that spatial information, such as texture features [13-16], Markov random fields [17-19], and extended morphological attribute profiles [20-22], can significantly improve HSI classification results. Relevant studies have also shown the effectiveness of spectral-spatial information-based SSL methods [23-26].
In the process of SSL, the spatial information can be used to select unlabeled samples with high confidence according to the hypothesis of local similarity, which assumes that adjacent pixels share the same class label. For example, in [27], high-confidence unlabeled samples were selected from the labeled samples' neighborhoods, which were defined by first-order spatial connectivity. In [28], square neighborhood-based spatial constraints were introduced to exploit the spatial consistency to correct and reassign incorrectly classified unlabeled samples. In [29], the spatial information extracted by a two-dimensional Gabor filter was stacked with the spectral features and fed to an SVM classifier; at the same time, the stacked spatial-spectral information was used to construct label propagation graphs and select unlabeled samples for SSL. However, fixed-size window-based spatial information usually conflicts with the spatial characteristics of real scenarios; in other words, the geometric features of ground objects are usually shape adaptive. Therefore, superpixel-based spatial information has been exploited in the process of SSL. For example, Liu et al. [30] proposed a superpixel-based SSL method that introduced the concept of spatial adaptivity into the unlabeled sample selection to improve the classification performance. Balasubramaniam et al. [31] used a softmax classifier to choose the right samples to update an optimized training library of objects (superpixels) for multi-classifier object-oriented image analysis (OOIA). In [32], a superpixel graph and discrete potential-based SSL method was proposed, in which each superpixel was viewed as a node in a graph, leading to a significant reduction in the volume of the HSI to be classified. However, although the superpixel method can extract spatial information adaptively at the object level, it cannot extract spatial neighborhood information in a pointwise adaptive manner.
For the pixels located in class boundary areas, the spatial neighborhood information cannot be accurately represented by superpixels. In addition, in the above methods, the unlabeled samples are mainly selected by using the spatial neighborhood information of the labeled samples; considering that the number of initially labeled samples is very small, the valuable unlabeled samples in their neighborhoods are considerably limited. To utilize the spatial neighborhood information of the unlabeled samples, Tan et al. [33] defined a circular neighborhood centered on each unlabeled sample and assigned the unlabeled sample the label of the closest labeled sample appearing in the neighborhood.
For the second challenge of SSL, active learning (AL) provides a promising solution by using a variety of heuristic methods to select unbiased and informative samples from the unlabeled samples, which can significantly reduce the cost of acquiring large training sets. Among the existing AL methods, breaking ties (BT) [34] is a simple and high-performance sample selection criterion that has been extensively studied. For example, in [35], unlabeled samples selected by the BT method were applied in order to improve the classification performance; furthermore, Li et al. [36] proposed a modified breaking ties (MBT) active learning method and applied it to the spectral-spatial classification of HSIs [37]. Wang et al. [38] discussed the influence of random sampling (RS), BT, and MBT on a proposed spatial-spectral information-based SSL algorithm, and the results showed that the BT method performed significantly better than RS and MBT as the number of unlabeled samples increased. BT and multiclass-level uncertainty methods were adopted to model the primitive co-occurrence matrix based active relearning framework in order to effectively integrate spatial information into the AL [39]. Shu et al. [40] proposed a BT-MS active learning method for HSI classification that introduced the mean shift method into the BT algorithm to improve the representativeness of the samples.
Based on the above discussion, in this paper, we propose a shape adaptive neighborhood information-based SSL (SANI-SSL) method to make full use of the adaptive spatial information and to select valuable unlabeled samples. The unlabeled sample selection is divided into two parts: (1) in the first part, samples are selected from the labeled samples' spatial neighborhoods, whereby the SANI and the BT algorithm are utilized to derive reliable and valuable unlabeled samples; and (2) in the second part, samples are selected from the unlabeled samples' spatial neighborhoods, whereby an adaptive interval strategy is utilized to ensure the informativeness of the unlabeled samples, and the SANI and a classifier combination strategy are utilized to ensure the confidence. The main contributions of this paper lie in two aspects: (1) Compared with the fixed-size window-based and superpixel-based spatial neighborhood information that most existing research methods adopt, the SA-based method can represent the neighborhood information in a pointwise adaptive manner. We exploit the SANI in our proposed SSL method in order to select new unlabeled samples, which can make the training samples more representative and valuable and, thus, achieve better classification accuracy. (2) The unlabeled sample selection makes full use of the SANI of the whole image, which avoids the restriction of the limited labeled samples. In addition, an adaptive interval strategy is proposed in our method to ensure the informativeness of the unlabeled samples selected from the unlabeled samples' neighborhoods. The proposed strategy utilizes the uncertainty information of the available training samples to select more diverse unlabeled samples.
The remainder of this paper is organized as follows. In Section 2, we briefly introduce the spatial information extraction and classification methods and the framework of our SANI-SSL method. Section 3 presents the experimental results on three public hyperspectral datasets. Finally, Section 4 discusses the results presented in Section 3, and Section 5 concludes this paper.

Methodology
First, we briefly define the basic notations used in this paper. Let x ≡ {x_1, · · ·, x_n} ∈ R^(d×n) denote an HSI, where d is the number of spectral bands and n is the number of samples; let κ ≡ {1, · · ·, K} denote the set of class labels, and let y ≡ {y_1, · · ·, y_n} be the image labels. The training set is composed of the initial labeled samples and the selected unlabeled samples and is represented as D_Tr.

Extended Morphological Attribute Profile
In this paper, we consider combining spectral and spatial information together in order to improve the feature discriminability of different classes. Mathematical morphology (MP) is a powerful framework for the spatial information analysis of remote sensing images [41]. To account for the important spectral information of the image, extended morphological attribute profiles (EMAPs) were proposed for HSI analysis in [20]. The EMAPs are extracted via morphological operators using different kinds of attributes, such as area, moment of inertia, and standard deviation. For a connected component C of the grayscale image f, if the attribute satisfies the predefined condition T(C) = attr(C) > λ, then the region is kept unchanged; otherwise, it is merged into the adjacent region with similar grayscale values. If the gray level of the adjacent region is brighter, the operation is called a thickening operation; if not, it is called a thinning operation. The APs can be obtained through a series of thickening and thinning operations with different attribute thresholds {λ_1, λ_2, · · ·, λ_n}, as shown in formula (1):

AP(f) = {φ_λn(f), · · ·, φ_λ1(f), f, γ_λ1(f), · · ·, γ_λn(f)}    (1)

where φ(f) and γ(f) represent the thickening and thinning operations, respectively. Subsequently, the APs of different grayscale images (e.g., the principal components PC_1, · · ·, PC_p) are computed and stacked together to construct the extended attribute profiles (EAPs):

EAP = {AP(PC_1), AP(PC_2), · · ·, AP(PC_p)}    (2)

Last, the EAPs of multiple attributes are computed and stacked together to construct the EMAPs. In this paper, the spectral information and EMAP information are exploited together to improve the classification performance. To avoid significant computational complexity, principal component analysis is first applied to the HSI, and the first four principal components, which contain more than 99% of the total variance of the HSI, are used to extract the EMAP features according to the different attributes.
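The dimensionality reduction step above can be sketched as follows (a minimal NumPy illustration, not the authors' implementation; the function name and the synthetic cube are our own):

```python
import numpy as np

def leading_pcs(cube, var_ratio=0.99):
    """Project an (H, W, B) hyperspectral cube onto the fewest principal
    components whose cumulative explained variance reaches `var_ratio`
    (0.99 in the EMAP preprocessing step described above)."""
    h, w, b = cube.shape
    X = cube.reshape(-1, b).astype(np.float64)
    X -= X.mean(axis=0)                       # center each band
    # SVD of the centered data yields the PCs without forming the covariance
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)
    k = int(np.searchsorted(np.cumsum(explained), var_ratio) + 1)
    pcs = X @ Vt[:k].T                        # scores on the first k PCs
    return pcs.reshape(h, w, k), k
```

The returned component images are then used as the grayscale inputs f for the attribute profiles.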

Shape Adaptive Neighborhood Information
In this paper, we adopt the shape adaptive (SA) method proposed in [42] in order to extract neighborhood information with an adaptive size that is multidirectional for every pixel. The SA method has been successfully used in high-quality image denoising and has been shown to be an efficient method for extracting homogeneous regions [43,44]. According to [42], an SA region is determined by the central pixel and the positions of its eight polygon vertices. The eight directions are denoted as {θ | θ_k = kπ/4}, k = 1, 2, · · ·, 8, and are known in advance. The eight lengths h_θk (k = 1, 2, · · ·, 8) from the central pixel to the vertices need to be computed. With a predefined candidate length set H = {h_1, h_2, · · ·, h_m}, a varying-scale family of directional local polynomial approximation (LPA) convolution kernels {g_(h,θk)}_(h∈H) is applied to the first principal component (PC1) of the HSI in order to obtain a set of directional varying-scale estimates {y_(h,θk)}_(h∈H), as shown in formula (3):

y_(h,θk) = PC1 ⊗ g_(h,θk), h ∈ H    (3)

where ⊗ denotes the convolution operation. More specifically, the LPA kernels are determined by the polynomial order and the window size in each of the eight directions, which allows them to perceive the spatial characteristics of the center pixel in different directions. The main reason that we adopt PC1 to obtain the local estimates y_(h,θk) is that the principal component analysis (PCA) algorithm is simple and quite effective for extracting the relevant information in an HSI; PC1 contains more than 90% of the total variance of the HSI, which saves calculation time while preserving the original information as much as possible.
The estimation values of the pixels located in the local area can be calculated according to the polynomial, and the estimation error can be measured in order to determine the corresponding confidence interval at each scale. For pixel x, the corresponding confidence intervals are computed as follows:

D_(h,θk)(x) = [y_(h,θk)(x) − µ · std(y_(h,θk)), y_(h,θk)(x) + µ · std(y_(h,θk))]    (4)

where µ is the threshold parameter and std(y_(h,θk)) represents the standard deviation of y_(h,θk). Subsequently, the scale in direction θ_k of point x can be determined by the intersection of confidence intervals (ICI) rule:

h_θk(x) = max{h_j ∈ H : D_(h1,θk)(x) ∩ · · · ∩ D_(hj,θk)(x) ≠ ∅}    (5)

The final SA regions are obtained through a convex combination of the corresponding eight directional scale estimates. The SA regions determined by LPA-ICI barely cross the boundaries in the HSI, since pixels on both sides of an edge are usually dissimilar, which guarantees the category consistency of the pixels in the SA region to a great extent. Figure 1 shows examples of neighborhoods determined by the SA method.
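The ICI rule described above can be sketched for a single direction as follows (a simplified NumPy illustration under our own variable names, not the authors' implementation; the candidate scales are assumed to be ordered from smallest to largest window):

```python
import numpy as np

def ici_scale(estimates, stds, mu=2.0):
    """Intersection of confidence intervals (ICI) rule for one direction.

    estimates[i], stds[i]: LPA estimate and its standard deviation at
    candidate scale h_i.  Returns the index of the largest scale whose
    confidence interval still intersects all of the smaller-scale intervals.
    """
    lo, hi = -np.inf, np.inf
    best = 0
    for i, (y, s) in enumerate(zip(estimates, stds)):
        lo = max(lo, y - mu * s)   # running intersection: lower bound
        hi = min(hi, y + mu * s)   # running intersection: upper bound
        if lo > hi:                # intersection became empty: stop
            break
        best = i
    return best
```

Repeating this for the eight directions yields the eight vertex lengths that define the SA polygon.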


Sparse Multinomial Logistic Regression
In this paper, we consider using the kernel-based sparse multinomial logistic regression (SMLR) model to predict the label and estimate the posterior probability of every pixel. The main reasons are as follows. First, the kernel transformation method can accomplish the separation of the classes in a higher-dimensional space, which helps to solve the high-dimensional problem of HSIs. Second, the SMLR method can perform efficiently in HSI classification, especially when the labeled training samples are limited. In addition, the SMLR method performs very well in combination with the kernel method. Many experiments have shown that SMLR performs much better than conventional classification methods when dealing with the HSI classification problem [8,45-47]. The SMLR method was also successfully applied to semi-supervised classification [30,33,48].
The SMLR model is built based on the MLR classifier, which is formally given by [49]:

p(y_i = k | x_i, ω) = exp(ω^(k) h(x_i)) / Σ_(j=1)^K exp(ω^(j) h(x_i))    (6)

where ω ≡ [ω^(1), · · ·, ω^(K−1)] denotes the logistic regression parameters, and h(x) = [h_1(x), · · ·, h_l(x)]^T is a linear or nonlinear transformation of the input feature vectors (i.e., the extracted EMAPs and spectral features) composed of l fixed functions. Relevant research has shown that nonlinear features, such as kernel-based features, are more conducive to improving the data discriminability for HSI classification. In this paper, the Gaussian radial basis function (RBF) [50] based kernel transformation is used to improve the data discriminability, which is written as:

K(x_i, x_j) = exp(−‖[x_i^Spe, x_i^Spa] − [x_j^Spe, x_j^Spa]‖² / (2σ²))    (7)

where x^Spe and x^Spa represent the spectral and EMAP features of pixel x, and σ represents the kernel width. The transformed spectral-spatial features of the training samples are fed to the MLR model to solve for the logistic regression parameters ω. To reduce the computational complexity of calculating ω, the SMLR method was proposed in [36] to impose a sparsity constraint on the regression parameters ω. According to the SMLR algorithm, the parameters ω can be modeled as a random vector with a Laplacian prior density p(ω) ∝ exp(−γ‖ω‖₁), where γ is the regularization parameter that controls the sparsity level of ω. With the training samples, ω is estimated as follows:

ω̂ = arg max_ω [ Σ_(i=1)^L log p(y_i | x_i, ω) + log p(ω) ]    (8)

In [36], the logistic regression via variable splitting and augmented Lagrangian algorithm (LORSAL) [51] was adopted to solve the optimization problem in (8). After solving for the parameters ω, the label and the posterior probability of every pixel can be calculated. The labels of the testing samples are assigned to the class with the maximum probability.
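The MLR posterior described above can be sketched as follows (a minimal NumPy illustration; the feature matrix and parameter matrix here are placeholders, not the trained LORSAL solution):

```python
import numpy as np

def mlr_posterior(H, W):
    """Multinomial logistic posterior p(y = k | x) for each sample.

    H: (n, l) transformed features h(x_i) (e.g., RBF kernel features).
    W: (l, K) regression parameters; fixing the last column to zero
       mirrors the K-1 free parameter vectors of the MLR model.
    Returns an (n, K) matrix of class probabilities.
    """
    scores = H @ W
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    exps = np.exp(scores)
    return exps / exps.sum(axis=1, keepdims=True)
```

Each test pixel is then assigned to the class with the largest row entry, as in the last step above.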

Adaptive Sparse Representation
In this paper, another classification method, called adaptive sparse representation (ASR) [52], is adopted in order to guarantee the confidence of the unlabeled samples. This method utilizes an adaptive sparse strategy to allow every test pixel to adaptively choose its own appropriate dictionary atoms within each class, based on the sparse representation principle that unlabeled samples can be represented as a linear combination of dictionary atoms constructed from the labeled samples of all classes. The adaptive norm used in this method can exploit the strong correlations among the training dictionary atoms while still preserving their diversity in a flexible way. In this way, the test pixels can be expressed more accurately, and the classification performance has shown great superiority when compared with SVM [53] and LORSAL-MLL [36]. However, the classification result does not contain the posterior probability of every class, and the decomposition and classification of the whole image are quite time-consuming. Therefore, we take this classifier as an auxiliary classification method that only classifies the candidate unlabeled samples to ensure high confidence in the unlabeled samples.
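To convey the underlying principle of sparse-representation classification, the following simplified sketch assigns a label by per-class least-squares residuals; this is a stand-in for illustration only, since the actual ASR solves an adaptive-norm sparse optimization across classes, which is omitted here, and all names are our own:

```python
import numpy as np

def src_label(x, dictionaries):
    """Simplified sparse-representation-style classification.

    x: (d,) test pixel feature vector.
    dictionaries: list of (d, n_k) arrays, one sub-dictionary of training
    atoms per class.  The pixel is assigned to the class whose atoms
    reconstruct it with the smallest residual.
    """
    residuals = []
    for D in dictionaries:
        coef, *_ = np.linalg.lstsq(D, x, rcond=None)   # class-wise fit
        residuals.append(np.linalg.norm(x - D @ coef))
    return int(np.argmin(residuals))
```

In the proposed method, the label predicted this way serves only as an auxiliary confidence check against the SMLR prediction.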

Unlabeled Samples Selection Method in Proposed SANI-SSL
In the proposed method, we accomplish the classification improvement by selecting unlabeled samples from two parts. We first consider the samples from the initial labeled samples' SA neighborhoods (LSAN), which are more reliable and contain abundant unused information. However, because the number of labeled training samples is very small, the unlabeled samples in their SA neighborhoods are trapped in a limited area, and the available information is restricted. Subsequently, the information of the unlabeled samples' SA neighborhoods (uLSAN) is also utilized to select more valuable samples. Figure 2 illustrates the flowchart of the unlabeled sample selection of our SANI-SSL method.

Selecting Unlabeled Samples from LSAN
The selection of unlabeled samples from the LSAN is based on two steps. In the first step, a set of candidate unlabeled samples with high confidence is chosen from the SA neighborhoods of the labeled samples. In the second step, the BT algorithm is applied to the previously constructed candidate set, in such a way that the most valuable samples from the candidate set can be automatically selected. The specific procedure is shown in Figure 3 for illustrative purposes.

(1) The construction of candidate training samples

In HSI classification, there are two basic assumptions for the generation of unlabeled samples for SSL. The first assumption is that samples that have similar spectral characteristics likely belong to the same class. The second assumption is that spatially adjacent pixels likely share the same class. Therefore, we integrate the spatial and spectral information into the SSL process in order to construct the candidate training samples.
First, we train the spectral-spatial information-based SMLR classifier using the initial labeled training samples to produce a classification map that contains the class probabilities, as mentioned in Section 2.2.1. Second, we extract the SA neighborhoods of the labeled samples. Finally, based on the classification map, we select the neighboring unlabeled samples whose predicted class labels are the same as the labels of the corresponding center pixels to constitute the candidate training samples D_C^LSAN.
In this way, an initial pool of candidate training samples with high confidence is established.
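The candidate construction described above can be sketched as follows (NumPy; `neighborhoods` maps each labeled pixel to the flat indices of its SA region, and all names are illustrative, not the authors' code):

```python
import numpy as np

def build_candidates(pred_labels, labeled_idx, labeled_y, neighborhoods):
    """Collect high-confidence candidates from labeled samples' SA regions.

    pred_labels: (n,) class map predicted by the current SMLR classifier.
    labeled_idx: indices of the labeled training pixels.
    labeled_y:   their ground-truth labels.
    neighborhoods: dict mapping a labeled pixel index to an array of
                   pixel indices inside its shape-adaptive region.
    A neighbor enters the candidate pool only if its predicted label
    agrees with the true label of the central labeled pixel.
    """
    candidates = {}
    for idx, y in zip(labeled_idx, labeled_y):
        for nb in neighborhoods[idx]:
            if nb not in candidates and pred_labels[nb] == y:
                candidates[nb] = y          # pseudo-label = center's label
    return candidates
```

The resulting pool is then ranked by the BT criterion in the next step.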
(2) Active learning

The candidate training samples usually contain a large amount of redundant information, since the neighboring pixels carry very similar information to the central pixel. Such redundancy, although it does not reduce the quality of the classifier, slows down the training phase considerably. AL methods are usually adopted to determine the unlabeled training samples that have high uncertainty in order to reduce the redundancies and select the most informative samples from the candidate sets. In this paper, we adopt the BT algorithm to evaluate the uncertainty of every pixel by calculating the difference between the two highest probabilities, which is formulated as

BT(x_i) = p(y_i = k_M | x_i) − max_(k∈κ\{k_M}) p(y_i = k | x_i)    (9)

where k_M represents the class label with the largest posterior probability for x_i, and κ\{k_M} represents all of the class labels excluding k_M. The value of BT lies in (0, 1), and a smaller value indicates more uncertainty. However, it is important to emphasize that the original BT algorithm selects only the single sample that minimizes this value; taking the computational complexity into consideration, we select the µ_1 most informative samples from D_C^LSAN at every iteration. The unlabeled samples selected from the LSAN are represented as D_u^LSAN.
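The BT criterion and the µ_1-sample selection can be sketched as follows (NumPy; names are illustrative):

```python
import numpy as np

def breaking_ties_select(P, mu1):
    """Select the mu1 most uncertain samples under the BT criterion.

    P: (n, K) posterior probability matrix.
    Returns (selected indices, BT values), where BT is the gap between
    the largest and second-largest class probabilities per sample;
    smaller gaps mean more uncertainty, so the smallest values are taken.
    """
    part = np.sort(P, axis=1)              # ascending per row
    bt = part[:, -1] - part[:, -2]         # top-1 minus top-2 probability
    order = np.argsort(bt)                 # most uncertain first
    return order[:mu1], bt
```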

Selecting Unlabeled Samples from uLSAN
The unlabeled samples selected from the uLSAN are added to D_Tr in order to make full use of the information in the unlabeled samples and to select more representative samples. This part of the training samples (denoted as D_u^uLSAN) is determined according to three strategies, which are implemented to guarantee the informativeness and confidence. The specific procedures are shown in Figure 4 for illustrative purposes.
(1) Strategy to ensure informativeness

The BT values of the whole image are computed based on the classification probabilities obtained with the available training samples. To utilize the feedback information of the available training samples, the most informative sample x_u of D_Tr is used as a benchmark to select new unlabeled samples. More specifically, we define an adaptive interval of uncertainty that takes the BT value of x_u as the lower bound, since a lower BT value indicates more uncertainty. The minimum BT value of D_Tr is denoted as BT_(x_u), and the adaptive interval is presented as [BT_(x_u), BT_(x_u) + η], where η is the length of the interval. Because the training samples are updated at every iteration, BT_(x_u) and the interval change at every iteration. Subsequently, the samples whose BT values fall in the interval are selected as the candidate samples D_C^uLSAN.
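The adaptive interval selection described above can be sketched as follows (NumPy; names are illustrative):

```python
import numpy as np

def adaptive_interval_candidates(bt_all, bt_train, eta):
    """Adaptive interval strategy for informativeness.

    bt_all:   (n,) BT values of all unlabeled pixels in the image.
    bt_train: BT values of the current training samples; their minimum
              (the most uncertain training sample) anchors the interval.
    eta:      interval length.
    Returns indices of pixels whose BT value lies in
    [min(bt_train), min(bt_train) + eta].
    """
    lower = np.min(bt_train)
    mask = (bt_all >= lower) & (bt_all <= lower + eta)
    return np.flatnonzero(mask)
```

Because the training set changes at every iteration, the lower bound (and hence the selected candidates) adapts automatically.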
(2) Strategies to ensure confidence The neighboring pixels in an HSI usually follow the same spectral signatures or features, and the labels of neighboring pixels are highly correlated, which usually arises in the manner of label consistency or label smoothness. In order to ensure high confidence in the label of D uLSAN C , the SANI of D uLSAN C is utilized, and the principle of spatial consistency (SC) is implemented, which demands that the labels of samples are consistent with the mode of the labels of their SA neighborhood. Subsequently, samples that do not meet the SC requirement are excluded from D uLSAN C . Except for the SC strategy, the ASR method is introduced to predict the labels of the candidate samples with a different mechanism. With the available D Tr , the training dictionary is constructed by the EMAP and spectral features of training samples, and the labels of unlabeled samples can be predicted using the ASR. Only candidate samples that satisfy both of the first two conditions are predicted by the ASR to reduce the operation time. Subsequently, samples whose labels assigned by ASR are not same as those under SMLR are excluded from D uLSAN
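As a rough illustration, the two selection steps above can be sketched as follows, assuming a probability matrix P (samples × classes) from the SMLR classifier and a per-pixel list of shape-adaptive neighborhood indices; all function names here are illustrative, not the authors' implementation:

```python
import numpy as np

def breaking_ties(P):
    # BT value: difference between the two largest class probabilities;
    # a small BT value means the classifier is uncertain about the sample.
    part = np.sort(P, axis=1)
    return part[:, -1] - part[:, -2]

def select_candidates(P, eta):
    # Adaptive interval: the lower bound is the minimum BT over all samples
    # (the most uncertain one); keep samples whose BT falls in
    # [BT_min, BT_min + eta].
    bt = breaking_ties(P)
    lo = bt.min()
    return np.flatnonzero((bt >= lo) & (bt <= lo + eta))

def spatial_consistency(candidates, pred, neighborhoods):
    # Keep a candidate only if its predicted label equals the mode of the
    # predicted labels inside its shape-adaptive neighborhood.
    keep = []
    for i in candidates:
        labels = pred[neighborhoods[i]]
        if pred[i] == np.bincount(labels).argmax():
            keep.append(i)
    return np.array(keep, dtype=int)
```

A candidate surviving both this SC filter and the ASR/SMLR agreement check would then be added to the training set.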

Datasets Used in the Experiments
In this paper, three public hyperspectral datasets are considered in order to evaluate the performance of the proposed approach.
(1) The first hyperspectral dataset used in this paper was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over a mixed agricultural/forest area in northwestern Indiana in 1992. The dataset contains 145 lines by 145 samples with a spatial resolution of 20 m. It is composed of 224 spectral reflectance bands in the wavelength range from 0.4 um-2.5 um at 10 nm intervals. After an initial screening, the spectral bands were reduced to 200 by removing the bands that cover the region of water absorption. The available ground-truth map contains 10,366 labeled samples with 16 different classes. Figure 5 shows the false color composition image of Indian Pines datasets and ground-truth map.

Experimental Setup
(1) Spatial information extraction: in this paper, the parameters of the EMAP are determined with reference to related work in [20,47]. The thresholds of the area attribute are determined according to the scale of the objects, and the thresholds of the moment of inertia and standard deviation attributes are determined according to the geometry of the objects and the homogeneity of the intensity values of the pixels. Although a finer threshold division is conducive to extracting detailed spatial information, the amount of calculation increases accordingly.
(2) Classifier parameter: the kernel width σ = 0.6, and the degree of sparsity γ = 0.00001 for SMLR.
(3) Training sets: to evaluate the performance of the proposed SANI-SSL method in scenarios with limited labeled samples, only five truly labeled samples per class were randomly chosen from the ground truth for all of the methods in this paper. The number of unlabeled samples is set to µ1 + µ2 = 32, µ1 + µ2 = 18, and µ1 + µ2 = 32 for the three datasets, respectively.
(4) Other settings: the parameter η, which represents the length of the uncertainty interval, is set to 0.25 because we have empirically found that this setting provides better performance; a more detailed discussion of η is given in Section 3.4.3. The SSL process is executed for 30 iterations. In order to evaluate the performance of our SANI-SSL method quantitatively, the overall accuracy (OA), average accuracy (AA), class-specific accuracies (CAs), and kappa coefficient are computed by averaging ten Monte Carlo runs that correspond to independent initial labeled sets.
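For reference, the reported accuracy measures can all be derived from a confusion matrix; a minimal sketch (the helper name is illustrative):

```python
import numpy as np

def accuracy_metrics(y_true, y_pred, n_classes):
    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    n = cm.sum()
    oa = np.trace(cm) / n                 # overall accuracy (OA)
    ca = np.diag(cm) / cm.sum(axis=1)     # class-specific accuracies (CAs)
    aa = ca.mean()                        # average accuracy (AA)
    # Kappa: agreement corrected for chance agreement p_e.
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, ca, kappa
```

In the experiments these values would additionally be averaged over the ten Monte Carlo runs.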

Experimental Results with Unlabeled Samples Selected from Only LSAN
In the first experiment, we evaluate the SANI-SSL performance with unlabeled samples selected from only LSAN. In this part, the superpixel (SP) and fixed-size (FS) neighborhood information-based SSL methods are also adopted to illustrate the performance of our SANI-SSL method. More specifically, the superpixel information is extracted using an oversegmentation algorithm, the entropy rate superpixel (ERS) method [54], and the numbers of superpixels for the three datasets are 800, 8000, and 4000, respectively. The fixed-size neighborhood information is extracted using a 5 × 5 window for the three datasets. Similar to the SANI-SSL method, the unlabeled samples are selected from the SP-based and FS-based neighborhoods of the labeled samples. Figure 8 shows the OA results as a function of the number of unlabeled samples obtained with the different spatial neighborhood information for the three datasets. As shown in Figure 8, the OAs increase as the number of unlabeled samples increases, which reveals the clear advantage of using spatial neighborhood information to select unlabeled samples. It can also be seen in Figure 8 that the accuracies obtained by the SA-based sample selection method are much higher than those obtained by the SP-based and FS-based methods. The OAs grow rapidly at first, since the most informative samples are selected first, and then level off as the valuable unlabeled samples in the neighborhood run out. In order to show the classification results in more detail, Tables 1-3 report the average CAs, OAs, AAs, kappa, and running time statistics obtained with the different spatial neighborhood information for the three datasets, where the best results are highlighted in bold. As seen in Tables 1-3, the proposed SA-based method yields better performance than the other methods.
This finding is expected, since the neighborhood information determined by the SA method can exploit the pixels' spatial information more accurately and comprehensively. For illustrative purposes, Figures 9-11 display the best classification maps obtained using the different spatial neighborhood information after 30 iterations for the three datasets, along with the corresponding OAs. This experiment reflects the importance of the spatial neighborhood information for selecting valuable unlabeled samples and the superiority of the SA method for representing the spatial neighborhood information.
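As a point of comparison with the FS baseline above, the 5 × 5 fixed-size neighborhood of a pixel can be extracted as in the following sketch (NumPy assumed; a shape-adaptive method would replace the rectangular window with a per-pixel region, which is exactly the advantage discussed here):

```python
import numpy as np

def fixed_size_neighborhood(row, col, shape, half=2):
    # Flat indices of the (2*half+1) x (2*half+1) window centred on
    # (row, col), clipped at the image border; half=2 gives the 5x5
    # window used for the FS baseline.
    r0, r1 = max(row - half, 0), min(row + half + 1, shape[0])
    c0, c1 = max(col - half, 0), min(col + half + 1, shape[1])
    rr, cc = np.meshgrid(np.arange(r0, r1), np.arange(c0, c1), indexing="ij")
    return np.ravel_multi_index((rr.ravel(), cc.ravel()), shape)
```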

Experimental Results with Unlabeled Samples Selected from LSAN and uLSAN
In the second experiment, we evaluate the performance of SANI-SSL with unlabeled samples selected from both LSAN and uLSAN. As Figure 8 shows, the classification performance using unlabeled samples selected from LSAN converges after several iterations, which is expected, since the available information in LSAN is limited. Therefore, in our SANI-SSL method, the unlabeled samples selected from uLSAN are also added to the training samples. In this part, we conducted several comparison experiments to illustrate some key issues.

The Influence of Unlabeled Samples from uLSAN
To demonstrate the effectiveness of D_u^uLSAN, we adopted exactly the same number of initial training samples (five per class) and unlabeled samples selected from both LSAN and uLSAN in this part. Considering that the labels of pixels in the LSAN are more reliable, µ1 is set to 32, 18, and 32 for the three datasets at the beginning. In addition, considering that the valuable information in LSAN is limited, µ1 decreases by one at each iteration and µ2 increases by one correspondingly, to enable the unlabeled samples to be mostly selected from uLSAN once the valuable information in LSAN runs out. Figure 12 shows the OAs of the classification results using unlabeled samples selected from both LSAN and uLSAN, that is, our proposed SANI-SSL method. The OAs of the classification results using unlabeled samples only from LSAN are also shown for comparison purposes.
As seen in Figure 12, the OAs of the two cases are very close to each other at first, since the unlabeled samples are mostly selected from LSAN in both cases; the difference between the two methods then increases with an increasing number of unlabeled samples, since there are more informative samples in uLSAN than in LSAN. Finally, the OAs increase by 1.32%, 1.96%, and 1.21% compared with the results in Section 3.3 for the three datasets, respectively. This experiment shows that the unlabeled samples from uLSAN can improve the classification performance by improving the training samples' representativeness.
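The shifting balance between LSAN and uLSAN described above amounts to a simple per-iteration schedule; a sketch (the function name is illustrative):

```python
def sample_schedule(mu1_init, total, n_iters):
    # Start with mu1 samples from LSAN and (total - mu1) from uLSAN;
    # at each iteration shift one sample from LSAN to uLSAN until
    # mu1 reaches zero, after which all samples come from uLSAN.
    schedule = []
    mu1 = mu1_init
    for _ in range(n_iters):
        schedule.append((mu1, total - mu1))
        mu1 = max(mu1 - 1, 0)
    return schedule
```

For example, with µ1 = 32 and 32 samples per iteration, the first three iterations draw (32, 0), (31, 1), and (30, 2) samples from (LSAN, uLSAN).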

The Influence of the Strategy to Ensure Confidence
In order to further illustrate the importance of the strategies adopted to ensure confidence, four groups of comparison experiments were performed, as follows: (1) selecting unlabeled samples using both the SC and ASR strategies, that is, our proposed SANI-SSL method; (2) selecting unlabeled samples using only the SC strategy; (3) selecting unlabeled samples using only the ASR strategy; and (4) selecting unlabeled samples using neither the SC nor the ASR strategy. Figure 13 shows the OAs of the classification results as a function of the number of unlabeled samples obtained with the different strategies for the three datasets. As seen in Figure 13, when the SC and ASR strategies are employed separately, both perform quite well, and better than using no confidence strategy, for the Indian Pines and Salinas Valley datasets. However, for the Pavia University data, the combination of SC and ASR greatly outperforms the other cases, which reveals that both the SC and ASR strategies play an important part in the unlabeled samples' selection. The main reason for the performance difference among the three datasets is that Pavia University is characterized by a complex spatial structure with stronger heterogeneity, and a single strategy might not effectively guarantee the correctness of the unlabeled samples' labels.

Table 4 shows the detailed OAs, AAs, kappa, and running time results obtained with the different strategies to ensure confidence for the three datasets, with the best results highlighted in bold. It can be seen that the combination of the two confidence strategies provides more satisfactory and stable results; the ASR and SC made almost equal contributions to the confidence of the unlabeled samples, but the ASR was more time-consuming than the SC strategy. Figures 14-16 show the best classification maps obtained using the different strategies after 30 iterations, along with the corresponding OAs, in order to intuitively illustrate the differences among these methods. As can be seen from Figures 14-16, the classification results obtained without the confidence strategies show more salt-and-pepper noise than those obtained with both the ASR and SC strategies. It can also be seen that the noise in Figures 14-16 is reduced significantly when compared with that in Figures 9-11 in Section 3.3. This experiment shows that the adopted strategies are effective in selecting reliable unlabeled samples and that the time consumed is acceptable.
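The ASR details are given in the cited work; as a rough stand-in, residual-based label prediction over a class-wise dictionary can be illustrated with a simplified sparse-representation-style classifier (a least-squares sketch, not the authors' ASR):

```python
import numpy as np

def residual_classifier(D, labels, x):
    # D: dictionary whose columns are training feature vectors
    # (e.g. stacked EMAP and spectral features); labels: class of each
    # column; x: test sample. Reconstruct x by least squares over each
    # class's atoms and assign the label with the smallest residual.
    best, best_res = None, np.inf
    for c in np.unique(labels):
        Dc = D[:, labels == c]
        coef, *_ = np.linalg.lstsq(Dc, x, rcond=None)
        res = np.linalg.norm(x - Dc @ coef)
        if res < best_res:
            best, best_res = c, res
    return best
```

In the paper, a candidate is kept only when this dictionary-based prediction agrees with the SMLR label, which is the classifier-combination confidence check.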

The Influence of the Strategy to Ensure Informativeness
We conducted experiments using different values of η in order to explore the influence of this parameter on the classification accuracy. In this part, both strategies to ensure confidence are used. Recall that the BT value is in the range (0, 1); we set η in the range (0.05, 0.55) with a step size of 0.05. Figure 17 shows the OAs of the classification results obtained with different η for the three datasets. As we can see from Figure 17, the classification OAs of the three datasets first increase, then reach a peak value, and finally decrease or level off. It can also be seen that the Indian Pines and Salinas Valley datasets are robust to the parameter η, while the Pavia University dataset is more sensitive to η. This finding is expected, since the larger the value of η, the more uncertain samples will be included in the candidate sets; the scene in Pavia University is more complex, which makes the labels of the uncertain samples more prone to error and, thus, affects the final accuracy.
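The sweep described above amounts to counting, for each tested η, how many samples fall in the adaptive interval; a small sketch (illustrative names) of why larger η admits more, and more uncertain, candidates:

```python
import numpy as np

def candidates_per_eta(bt, etas):
    # Number of samples whose BT value falls in [BT_min, BT_min + eta]
    # for each tested interval length; the count grows monotonically
    # with eta, pulling increasingly uncertain samples into the
    # candidate set.
    lo = bt.min()
    return [int(np.sum((bt >= lo) & (bt <= lo + e))) for e in etas]

etas = np.arange(0.05, 0.56, 0.05)  # the grid tested in the paper
```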

Discussion
We reveal some important issues about the proposed SSL methods by conducting several comparison experiments on three hyperspectral datasets.
(1) The comparison experiment between the SA-, SP-, and FS-based SSL methods shows that our proposed SA-based method performs better than the other two methods. This finding can be explained by the fact that the SA spatial information is beneficial for constructing a more representative neighborhood for every pixel. In homogeneous regions of the image, SA has the advantage of representing the neighborhood information comprehensively, while in heterogeneous regions, SA has the advantage of representing the neighborhood information more accurately.
(2) The comparison experiment between the selection of unlabeled samples from LSAN and uLSAN shows that the uLSAN has great potential for finding valuable samples. This potential can be attributed to two reasons. First, compared with the limited and highly correlated information in LSAN, the uLSAN contains abundant undiscovered and unused information, which is helpful in improving the classifier. Second, taking the characteristics of both LSAN and uLSAN into full consideration, the unlabeled samples are selected from these two regions at generally appropriate times by adjusting the number of unlabeled samples from each region.
(3) The influence of the different strategies to ensure confidence on the classification accuracy is analyzed in Section 3.4.2, and the results show that the two strategies can ensure that the selected samples are assigned correct labels, with the SC strategy showing better computational efficiency. The influence of the informativeness parameter η on the classification accuracy is analyzed by conducting experiments using different values of η, and the best performance is acquired by setting η to 0.25, according to the results on the three datasets. As for the differences in performance across the three datasets, the main reason is that the spatial characteristics of the datasets are different.
The main ground objects of the Indian Pines and Salinas Valley datasets are cropland, which has a larger spatial scale and a regular distribution. However, for the urban scene in the Pavia University dataset, the spatial distribution is more complicated; therefore, more stringent requirements on the strategies need to be satisfied in order to achieve satisfactory results.
To further evaluate the performance of the proposed method, we compare the proposed SANI-SSL method with the following spectral-spatial-based SSL methods for HSI classification: (1) tri-training-based semi-supervised learning (TT-SSL), proposed in [55]; (2) generative adversarial network-based semi-supervised learning (GANs-SSL), proposed in [25]; and (3) superpixel and density peak-based semi-supervised learning (SDP-SSL), proposed in [30]. Table 5 shows the classification OAs, AAs, and kappa coefficients of our proposed SANI-SSL method in comparison with those of the above methods using five truly labeled samples per class for the Indian Pines and Pavia University datasets (some of the above methods did not use the Salinas Valley dataset). It can be seen from Table 5 that the performance of SANI-SSL is better than that of the other spectral-spatial-based SSL methods, which confirms the effectiveness of our proposed method.
Table 5. Comparison of OAs, AAs, and kappa (as a percentage) between the proposed SANI-SSL method and other spectral-spatial information-based SSL methods.

Conclusions
In this paper, we have presented a SANI-based SSL method for HSI classification, which exploits the SANI of both labeled and unlabeled samples in order to make full use of the spectral-spatial information of the whole image. First, the EMAP-based spatial information is extracted and fused with the spectral information to enhance the discriminability of the different classes, and the SA-based spatial neighborhood information is then used to select unlabeled samples. The novelty of this paper is that we select unlabeled samples not only using the SANI of truly labeled samples, but also utilizing the SANI of unlabeled samples. For the unlabeled samples selected from LSAN, the SANI is utilized to guarantee the confidence of the unlabeled samples, and the BT algorithm is adopted to guarantee their informativeness. For the unlabeled samples selected from uLSAN, the SANI and classifier combination methods are utilized to guarantee the confidence, and a novel adaptive interval method is proposed to guarantee the informativeness. The proposed method was tested on the Indian Pines, Pavia University, and Salinas Valley datasets, and the comparison with other spectral-spatial-based SSL methods (i.e., TT-SSL, GANs-SSL, and SDP-SSL) demonstrated the superiority and effectiveness of the proposed SANI-based SSL method.
In future work, we will consider combining the SA-based spatial information with the spectral information in order to improve the performance of the classifier, and we will investigate the influence of the scale of the spatial features on HSI classification.