Semi-Supervised Classiﬁcation for Hyperspectral Images Based on Multiple Classiﬁers and Relaxation Strategy

Abstract: Hyperspectral image (HSI) classification is a fundamental and challenging problem in remote sensing and its various applications. However, it is difficult to perfectly classify remotely sensed hyperspectral data by directly applying classification techniques developed in pattern recognition, partly owing to a multitude of noise points and the limited number of training samples. Based on multinomial logistic regression (MLR), the local mean-based pseudo nearest neighbor (LMPNN) rule, and the discontinuity preserving relaxation (DPR) method, a semi-supervised method for HSI classification is proposed in this paper. The DPR strategy is adopted in pre-processing to denoise the original hyperspectral data and in post-processing to improve the classification accuracy. The application of two classifiers, MLR and LMPNN, automatically acquires more labeled samples from a few labeled instances per class; this is termed the pre-classification procedure. The final classification result of the HSI is obtained by employing the MLRsub approach. The effectiveness of the proposal is experimentally evaluated on two real hyperspectral datasets that are widely used to test the performance of HSI classification algorithms. Comparison with several competing methods confirms that the proposed method is effective, even for limited training samples.


Introduction
Owing to the special advantages of a wide spectral range, high spectral resolution, and continuous spectral curves, hyperspectral remote-sensing images have been widely applied in earth observation [1,2]. As a fundamental and challenging problem in remote sensing, hyperspectral image (HSI) classification and its various applications have attracted increasing attention in recent years. Many methods have been introduced to classify HSIs, attempting to obtain a perfect classification result for a specific HSI. According to whether the class labels of the samples are used in the classification process, the existing classification approaches are generally partitioned into three categories: unsupervised/clustering [3,4], supervised [5][6][7], and semi-supervised [8,9]. Although a satisfactory classification result can be obtained by supervised classification methods, the acquisition of labeled samples is laborious, time consuming, and dependent on expert knowledge. In contrast, sufficient unlabeled samples can be acquired easily. Unsupervised classification does not require prior knowledge of the hyperspectral dataset; however, the cluster purity is often unsatisfactory owing to the corresponding absence of discriminative information. Compared with unsupervised and supervised classification, semi-supervised classification applies partially labeled samples together with a large number of unlabeled instances, aiming at a better classification effect. The classification results provided by semi-supervised methods generally have a close relationship to the size of the labeled sample set. A number of studies show that semi-supervised HSI classification is a powerful and promising technique. Nevertheless, it remains a challenging problem to propose more powerful semi-supervised methods for classifying remotely sensed hyperspectral data in cases of limited training samples.
Unlike other datasets, a hyperspectral dataset is characterized by both spectral information and definite spatial information. Tobler's first law [10] states that all attribute values on a geographic surface are related to each other, but closer values are more strongly related than those farther away. Tobler's first law implies that spatially neighboring pixels in remote-sensing images are likely to share the same class label. Therefore, it is natural to consider the spectral and spatial information together in HSI classification. Plenty of spatial-spectral-based classification methods have been developed in the past decades [11][12][13][14][15].
Hyperspectral data are known to contain plenty of noise points. These noise points can evidently affect the final classification accuracy. The spectral-spatial technique is also used to reduce noise points in the HSI pre-processing task, or to improve the classification accuracy of an obtained classification result in post-processing. As a typical application of this technique, probabilistic relaxation (PR) uses the local relationship among spatially adjacent pixels to correct spectral or spatial distortion. In the probability sense, PR is applied to exploit the continuity of neighboring labels [16].
Perhaps the most sophisticated PR technology is based on Markov random fields (MRFs). In particular, MRFs have achieved prominent performance in refining classification results by characterizing the spatial information [17][18][19][20][21]. For instance, Yu et al. proposed a novel classification method by integrating an adaptive MRF with a subspace-based support vector machine (SVM) classifier [17]. The class labels predicted in the pre-classification process were altered through an adaptive MRF approach. In [18], an adaptive MRF was applied to optimize the classified image provided by the threshold exponential spectral angle map classifier, which improved the classification performance markedly. To an extent, this proved that the incorporation of a classifier and a spatial smoothing technique can yield the ideal result. A novel probabilistic label relaxation strategy was proposed in [22], which combined the evidence of adjacent label assignments to effectively eliminate label ambiguity. Nevertheless, it is often observed that the effect of using spatial information as a relaxation is not ideal. On the one hand, it significantly improves the classification accuracy of smooth image regions. On the other hand, the boundaries between classes are blurred by the forced smoothness. Therefore, a more powerful PR strategy to handle this problem is urgently required. Wang et al. proposed a novel two-step MRF regularization method [23], which addressed the problem of over-smoothing at image boundary areas based on the spatial regularizing methodology of the MRF. By detecting the discontinuities of a band image in advance, the results could be locally smoothed without crossing the boundaries by the discontinuity preserving relaxation (DPR) method [24]. More recently, a new relaxation method, DPR [16], was introduced to logically smooth the original hyperspectral image or the classification results, using both spatial and spectral information while maintaining the discontinuities extracted from the hyperspectral data cube.
Currently, many machine-learning algorithms are widely used in HSI classification, such as SVM [25], multinomial logistic regression (MLR) [26][27][28], k-nearest neighbor (KNN) [29,30], the local mean-based pseudo nearest neighbor (LMPNN) method [31], and artificial neural networks (ANNs) [32,33]. Using unlabeled samples actively selected from the hyperspectral dataset, Prabhakar et al. extended the MLR algorithm to a semi-supervised learning of the posterior class distribution, promoting classification results with smaller training datasets and less complexity [26]. Based on the integration of MLR with the subspace projection method, MLRsub was proposed in [27]. It assumes that the samples in each class can be located approximately in a lower-dimensional subspace. The use of the subspace projection method can effectively avoid the spectral confusion caused by mixed pixels [28]. To further enhance the KNN-based classification performance, LMPNN [31] was presented in 2014, building upon the concepts of the local mean-based k-nearest neighbor (LMKNN) rule [34] and the pseudo nearest neighbor (PNN) rule [35]. Additionally, a novel semi-supervised classification approach for hyperspectral images was introduced by Tan et al. [29]. To further improve the classification accuracy, the authors combined KNN and MLR to determine the class labels of the selected unlabeled samples.
For limited training samples, it is typically difficult to provide a satisfactory classification result for an HSI. Motivated by work on classifier combination, a pre-classification technique has been developed, attempting to address the problem mentioned above. Based on the pre-classification, the relaxation strategy, and the MLRsub algorithm, a semi-supervised method for HSI classification consisting of four steps is proposed in this work.
The primary contributions of this paper are as follows:
1. In pre-processing and post-processing, the DPR strategy combined with the Roberts cross operator is adopted to denoise the original hyperspectral data and to improve the classification accuracy, respectively.
2. A new classifier combination for the pre-classification of HSIs is proposed, which addresses the problem of automatically labeling samples based on a small training set. Two classifiers, MLRsub and LMPNN, are used together to perform the pre-classification, automatically predicting more labeled samples from a few labeled instances per class.
3. A novel semi-supervised classification scheme is built in four steps: pre-processing, pre-classification, classification, and post-processing.
The remainder of this paper is organized as follows. In Section 2, we first briefly introduce the related classifiers and the DPR algorithm and subsequently present the proposed classification method. Section 3 evaluates the performance of the proposal. Some summarizing remarks and conclusions are presented in the last section.

The Proposed Semi-Supervised Classification Method
We first define the notations used in this study and subsequently introduce the DPR method and the MLRsub and LMPNN classifiers. Finally, the proposed semi-supervised classification method is described in detail. Let X = (x_1, x_2, ..., x_n) be a hyperspectral dataset with n pixels, where x_i = (x_i1, x_i2, ..., x_id)^T indicates the d-dimensional spectral vector associated with pixel i. y = (y_1, y_2, ..., y_n) denotes an image of class labels, with y_i ∈ {1, 2, ..., K}, assuming that K classes exist in X.
Let N(x_i) denote the collection of spatial neighbors of pixel i. We herein adopt the Moore neighborhood, which is defined as follows:

N(x_i) = { x_j : max(|i_1 − j_1|, |i_2 − j_2|) ≤ 1, j ≠ i },    (1)

where (i_1, i_2) and (j_1, j_2) are the spatial coordinates of pixels i and j, respectively.
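The Moore neighborhood test above can be sketched in a few lines. The function names below are our own; the rule itself is just a Chebyshev-distance check on the spatial coordinates.

```python
# Minimal sketch of the Moore neighborhood: pixel j is a neighbor of pixel i
# when the Chebyshev distance between their spatial coordinates is at most 1
# (the 8 pixels surrounding i).

def is_moore_neighbor(i_coord, j_coord):
    """Return True if pixel j lies in the Moore neighborhood N(x_i)."""
    (i1, i2), (j1, j2) = i_coord, j_coord
    if (i1, i2) == (j1, j2):
        return False  # a pixel is not its own neighbor
    return max(abs(i1 - j1), abs(i2 - j2)) <= 1

def moore_neighbors(i_coord, height, width):
    """Enumerate the spatial neighbors of pixel i inside an H x W image."""
    i1, i2 = i_coord
    return [(r, c)
            for r in range(max(0, i1 - 1), min(height, i1 + 2))
            for c in range(max(0, i2 - 1), min(width, i2 + 2))
            if (r, c) != (i1, i2)]

# An interior pixel has 8 Moore neighbors; a corner pixel has only 3.
n_interior = len(moore_neighbors((1, 1), 3, 3))
n_corner = len(moore_neighbors((0, 0), 3, 3))
```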

Relaxation Method
Relaxation is a technique that utilizes the local spatial relationship among neighboring pixels to denoise hyperspectral data and enhance the spatial texture information in pre-processing, and to improve classification accuracy in post-processing. The relaxation strategy is in fact a moving-window smoothing technology widely used to reduce noise in a time series. Recently, combinations of SVM and MRF [36] or MLR and MRF [28] have shown outstanding performance in HSI classification. Although the use of spatial information in the relaxation strategy can effectively refine the classification result in smooth image areas, it also blurs the boundaries between classes, rendering it difficult to obtain a better classification. To preserve the advantage and overcome the disadvantage of relaxation-based methods, Li et al. [16] proposed a DPR strategy for HSI classification. This method attempts to accurately preserve the edges of class boundaries while smoothing remotely sensed hyperspectral data.
The DPR strategy can be described as follows.
For a given hyperspectral dataset X, let p_i = (p_i(1), p_i(2), ..., p_i(K))^T, where p_i(j) is the probability of pixel i belonging to the j-th class, and let q_i = (q_i(1), q_i(2), ..., q_i(K))^T be the final K-dimensional vector of probabilities provided by the relaxation process. The relaxation result is obtained by solving the following optimization problem:

min_{q_1,...,q_n} Σ_{i=1}^{n} ( ||q_i − p_i||² + γ Σ_{x_j ∈ N(x_i)} δ_j ||q_i − q_j||² ),    (2)

where γ is a parameter balancing the relative impact of both terms in Equation (2) and δ_j is the value at pixel j of the edge image δ. In Equation (3), δ is derived by applying the Sobel filter, which detects the discontinuities in a band image, to every band X^(i) of the original data cube X and aggregating the responses across the d bands; the aggregate edge magnitude is then inverted so that δ is large in smooth regions and small at discontinuities. More details of this method can be found in [16].
In Equation (2), the first term measures the misfit of the data, and the second term facilitates smooth solutions weighted by the parameter δ. When no discontinuity exists between a pixel and the adjacent pixels to which it is connected, δ is large. Conversely, when discontinuities exist, δ is small [16].
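The optimization problem above admits a simple fixed-point iteration: setting the derivative with respect to q_i to zero while holding its neighbors fixed gives q_i = (p_i + γ Σ_j δ_j q_j) / (1 + γ Σ_j δ_j). The sketch below, with function names of our own choosing, applies this Jacobi-style update to a probability cube; it illustrates the relaxation idea and is not the authors' implementation.

```python
import numpy as np

def dpr_relax(p, delta, gamma=0.9, iters=30):
    """Jacobi-style fixed-point iteration for the DPR objective (sketch).

    p     : (H, W, K) initial class-probability vectors p_i
    delta : (H, W) discontinuity weights (large = smooth, small = edge)
    gamma : balance between the data-fit and smoothness terms
    """
    q = p.copy()
    H, W, K = p.shape
    for _ in range(iters):
        q_new = np.empty_like(q)
        for r in range(H):
            for c in range(W):
                acc = np.zeros(K)
                wsum = 0.0
                # Moore neighborhood of pixel (r, c)
                for rr in range(max(0, r - 1), min(H, r + 2)):
                    for cc in range(max(0, c - 1), min(W, c + 2)):
                        if (rr, cc) == (r, c):
                            continue
                        acc += delta[rr, cc] * q[rr, cc]
                        wsum += delta[rr, cc]
                # minimizer of ||q_i - p_i||^2 + gamma * sum_j delta_j ||q_i - q_j||^2
                q_new[r, c] = (p[r, c] + gamma * acc) / (1.0 + gamma * wsum)
        q = q_new
    return q

# A smooth 3x3 region with one noisy pixel: relaxation flips its label.
p = np.tile([0.9, 0.1], (3, 3, 1))
p[1, 1] = [0.2, 0.8]
q = dpr_relax(p, np.ones((3, 3)))
```

Because each update is a convex combination of probability vectors, the relaxed vectors still sum to one.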
The Roberts cross operator, initially proposed by Roberts in 1963 [37], is one of the first edge detectors used in image processing for edge detection. The basic idea of the Roberts cross operator is to approximate the gradient of an image by discrete differentiation, realized by computing the squared sum of the differences between diagonally adjacent pixels. Compared with other edge detectors, the Roberts cross operator is simple, easy to compute, and accurate in locating edges. Considering these merits, we replace the Sobel filter with the Roberts cross operator in Equation (3). The experimental results on the hyperspectral datasets show that this substitution can achieve the ideal classification results.
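The two diagonal differences described above can be written directly as array slices; the following is a minimal sketch (our own function name), returning the root of the squared sum of the two diagonal gradients.

```python
import numpy as np

def roberts_cross(band):
    """Edge magnitude of a single band image via the Roberts cross operator.

    The two diagonal differences approximate the image gradient; the edge
    strength is the root of their squared sum.
    """
    band = np.asarray(band, dtype=float)
    gx = band[:-1, :-1] - band[1:, 1:]   # difference along one diagonal
    gy = band[:-1, 1:] - band[1:, :-1]   # difference along the other diagonal
    return np.sqrt(gx ** 2 + gy ** 2)

# A vertical step edge: strong response along the step, zero in flat areas.
img = np.zeros((4, 4))
img[:, 2:] = 1.0
edges = roberts_cross(img)
```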

Multinomial Logistic Regression (MLR)
Unlike hard classification methods, MLR, a probabilistic classifier suggested by Böhning [38], is a soft classification technique that calculates the posterior class density p(y_i | x_i). It indicates that each pixel of the HSI belongs to different classes with different probabilities. In fact, the classification result provided by MLR can more accurately reflect the relationship between pixels and classes.
The MLR classifier is modeled by the following:

p(y_i = k | x_i, ω) = exp(ω^(k)T h(x_i)) / Σ_{j=1}^{K} exp(ω^(j)T h(x_i)),    (4)

where h(x_i) = (h_1(x_i), ..., h_t(x_i))^T is a vector of t fixed functions of the input, often termed features, and ω^(k) is the set of logistic regressors for class k, which can be obtained by LORSAL (logistic regression via variable splitting and augmented Lagrangian) [39]. LORSAL promotes the sparsity of the weights through a sparsity-inducing l_1 prior.
The function h can be linear or nonlinear to handle different problems. In [2,39], the Gaussian radial basis function kernel was adopted to compute the function h. The kernel technique is often used in machine learning to handle linear inseparability in low-dimensional space and to classify remotely sensed hyperspectral data [2,39]. In this work, we prefer the following input function h(x_i) in Equation (4) [27,40]:

h(x_i) = (||U^(1)T x_i||², ||U^(2)T x_i||², ..., ||U^(K)T x_i||²)^T,    (5)

where U^(j) = {u_1^(j), ..., u_{r^(j)}^(j)} is a set of r^(j)-dimensional orthonormal basis vectors for the subspace associated with class j (r^(j) ≪ d).
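A small sketch of the subspace feature map and the MLR posterior follows. The function names are ours, the weight matrix `omega` is simply given (in the paper it would be learned by LORSAL), and the constant bias feature prepended to h(x) is our own addition for illustration.

```python
import numpy as np

def subspace_features(x, subspaces):
    """MLRsub-style input h(x): a constant bias term (our addition) followed
    by the squared norms of the projections of x onto each class subspace
    U^(j), given as matrices with orthonormal columns."""
    feats = [1.0]
    for U in subspaces:
        proj = U.T @ x
        feats.append(float(proj @ proj))
    return np.asarray(feats)

def mlr_posterior(x, subspaces, omega):
    """Softmax posterior p(y = k | x) of the MLR model over features h(x).
    omega holds one row of logistic regressors per class."""
    h = subspace_features(x, subspaces)
    scores = omega @ h
    scores -= scores.max()            # numerical stability
    e = np.exp(scores)
    return e / e.sum()

# Two toy 1-D subspaces in R^3: class 0 spans e1, class 1 spans e3.
subspaces = [np.array([[1.0], [0.0], [0.0]]),
             np.array([[0.0], [0.0], [1.0]])]
omega = np.array([[0.0, 1.0, -1.0],   # class 0 rewards projection onto U^(1)
                  [0.0, -1.0, 1.0]])  # class 1 rewards projection onto U^(2)
p = mlr_posterior(np.array([2.0, 0.0, 0.1]), subspaces, omega)
```

A sample lying almost entirely in the first subspace receives nearly all of the posterior mass for class 0.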

Local Mean-Based Pseudo Nearest Neighbor (LMPNN)
As a popular algorithm in machine learning, the KNN rule [41] is a simple and effective classification method. It yields a classification decision by computing the k-nearest neighbors of an unlabeled sample and applying a simple voting principle. However, the classification result of the KNN classifier is sometimes dependent on the choice of the k value. Furthermore, it does not always provide an optimal solution for an unbalanced dataset. To handle these issues, a multitude of improvements to KNN-based approaches have been suggested in recent years, such as the weighted k-nearest neighbor rule [42], the pseudo nearest neighbor rule [35], the local mean-based k-nearest neighbor rule [34], and the local mean-based pseudo nearest neighbor rule [31]. The basic idea behind these methods is to assign a greater weight to the nearest neighbor or to replace real neighbors with pseudo neighbors.
The LMPNN rule can be described by the following steps [31]:
(1) For the unlabeled sample x, search for the k-nearest neighbors from each class of the training set; let x_1^i, x_2^i, ..., x_k^i denote the k-nearest neighbors selected from the i-th class, arranged in ascending order of their distances from x.
(2) Compute the local mean x̄_j^i of the first j nearest neighbors of sample x from the i-th class:

x̄_j^i = (1/j) Σ_{m=1}^{j} x_m^i.    (6)

(3) Assign weight 1/j to the local mean value x̄_j^i (j = 1, 2, ..., k).
(4) Calculate the distance from the sample x to the i-th class by Equation (7):

d(x, x^i) = Σ_{j=1}^{k} (1/j) d(x, x̄_j^i),    (7)

where d(x, x̄_j^i) denotes the distance from x to x̄_j^i and d(x, x^i) the distance from x to the i-th class.
(5) Determine the class label of sample x: the sample x belongs to the c-th class if the following is true:

c = argmin_i d(x, x^i).    (8)
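The five steps above can be sketched directly; this is a minimal illustration with our own function name, using Euclidean distance throughout.

```python
import numpy as np

def lmpnn_classify(x, class_samples, k=2):
    """Local mean-based pseudo nearest neighbor rule (sketch of steps 1-5).

    class_samples: dict mapping class label -> (n_i, d) array of training
    samples. Returns the label minimizing the weighted distance of
    Equations (7)-(8).
    """
    best_label, best_dist = None, np.inf
    for label, samples in class_samples.items():
        # Step 1: k nearest neighbors of x within this class, ascending.
        d = np.linalg.norm(samples - x, axis=1)
        nn = samples[np.argsort(d)[:k]]
        # Steps 2-4: local means of the first j neighbors, weighted by 1/j.
        dist = 0.0
        for j in range(1, len(nn) + 1):
            local_mean = nn[:j].mean(axis=0)
            dist += (1.0 / j) * np.linalg.norm(x - local_mean)
        # Step 5: keep the class with the smallest pseudo-nearest distance.
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

train = {0: np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.2]]),
         1: np.array([[1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])}
pred = lmpnn_classify(np.array([0.05, 0.05]), train, k=2)
```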

The Proposed Method
It is known that classification results obtained by semi-supervised approaches generally depend on the size of the training set. For limited training samples, certain semi-supervised classification methods do not always provide satisfactory classification results. Moreover, acquiring a large number of labeled samples is difficult and sometimes even impossible.
In this subsection, we suggest a semi-supervised technique for HSI classification based on the DPR strategy. The proposal is introduced in the following three steps.
Step 1: Data Pre-Processing
ISPRS Int. J. Geo-Inf. 2018, 7, 284
In this step, the DPR strategy is used to denoise the hyperspectral data. Unlike the DPR technique proposed by Li et al. [16], in this study the Sobel filter in Equation (3) is replaced by the Roberts cross operator, i.e., the edge image δ is rewritten with Roberts(·) in place of Sobel(·) (Equation (9)). For each band of the dataset X, Equation (10) is used to update it iteratively:

x_is^(t+1) = ( x_is^(0) + γ Σ_{x_j ∈ N(x_i)} δ_j x_js^(t) ) / ( 1 + γ Σ_{x_j ∈ N(x_i)} δ_j ),    (10)

where x_is^(t) denotes the value of the s-th band of the pixel x_i in the t-th iteration. The update process terminates when Equation (11) is met:

||x_s^(t+1) − x_s^(t)|| / ||x_s^(t)|| ≤ ε,    (11)

where x_s^(t) indicates the s-th band image of the HSI at iteration t and ε is a predetermined threshold.
The experimental results show that the substitution of the edge detection operator can improve the classification accuracy.
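As an illustration of the band-wise smoothing in Step 1, the following sketch (our own simplified implementation) iterates the weighted-average update until the relative change between successive band images falls below a threshold ε, in the spirit of the stopping rule of Equation (11). Note that `np.roll` wraps at image borders, a simplification that a faithful implementation would replace with proper border handling.

```python
import numpy as np

def relax_band(band, delta, gamma=0.9, eps=1e-4, max_iter=100):
    """Iteratively smooth one band image with discontinuity weights delta,
    stopping once the relative change drops below eps (sketch; the exact
    norm used for the convergence test is our choice)."""
    x0 = band.astype(float)
    x = x0.copy()
    for _ in range(max_iter):
        # weighted sum over the Moore neighborhood, computed by shifting
        acc = np.zeros_like(x)
        wsum = np.zeros_like(x)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                acc += np.roll(np.roll(delta * x, dr, axis=0), dc, axis=1)
                wsum += np.roll(np.roll(delta, dr, axis=0), dc, axis=1)
        x_new = (x0 + gamma * acc) / (1.0 + gamma * wsum)
        change = np.linalg.norm(x_new - x) / max(np.linalg.norm(x), 1e-12)
        x = x_new
        if change <= eps:
            break
    return x

# A flat band with one noise spike: the spike is strongly attenuated.
band = np.ones((5, 5))
band[2, 2] = 5.0
smoothed = relax_band(band, np.ones((5, 5)))
```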
Step 2: Classification
The proposed semi-supervised classification method includes two stages: pre-classification and classification.
In the pre-classification stage, two classifiers, MLRsub and LMPNN, are used to predict the class labels of unlabeled pixels from the limited training samples per class. Pre-classification can also be regarded as a technique for automatically labeling samples, i.e., for expanding the training set from a few labeled instances.
Specifically, for any unlabeled pixel x_i, we apply the MLRsub classifier and the LMPNN classifier together to obtain its class label. Suppose that pixel x_i belongs to the k-th class according to the MLRsub algorithm, denoted as y_MLR = k, and to the s-th class according to the LMPNN method (i.e., y_LMPNN = s). If y_MLR = y_LMPNN = k, then pixel x_i is assigned the label of the k-th class (i.e., y_i = k). Otherwise, sample x_i is placed into a set in which the class labels of all samples are to be further determined. This procedure is termed pre-classification in our study.
We empirically found that this is an effective way to automatically acquire more labeled samples from a few labeled instances per class. It is inevitable that some pixels are mislabeled during pre-classification; therefore, this procedure is not executed iteratively, to avoid more samples being misclassified.
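The agreement rule described above can be sketched as follows. The predictor arguments are stand-ins for the trained classifiers (in the paper, MLRsub and LMPNN); all names here are hypothetical.

```python
def pre_classify(unlabeled, mlrsub_predict, lmpnn_predict):
    """Pre-classification by classifier agreement (sketch): a pixel receives
    a label only when both classifiers agree; otherwise its label is
    deferred to the later classification stage."""
    newly_labeled, deferred = {}, []
    for pixel_id, x in unlabeled.items():
        y_mlr = mlrsub_predict(x)
        y_lmpnn = lmpnn_predict(x)
        if y_mlr == y_lmpnn:
            newly_labeled[pixel_id] = y_mlr   # agreement: accept the label
        else:
            deferred.append(pixel_id)          # disagreement: decide later
    return newly_labeled, deferred

# Toy stand-ins: two threshold classifiers that disagree near the boundary.
mlr = lambda x: int(x > 0.5)
lmp = lambda x: int(x > 0.6)
labeled, rest = pre_classify({'a': 0.2, 'b': 0.55, 'c': 0.9}, mlr, lmp)
```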
For samples that have not been labeled during pre-classification, we apply the MLRsub method again to label them, based on the labeled samples obtained in pre-classification together with the initial training set. At this point, the classification result of the HSI is obtained.
Step 3: Post-Processing
To improve the classification accuracy, it is necessary to reprocess the obtained classification result. In this step, we employ the DPR strategy introduced in Section 2.1 to correct the class labels of some misclassified samples.

Experimental Results
We conducted experiments to validate the performance of our proposal. Two airborne visible/infrared imaging spectrometer (AVIRIS) HSI datasets (i.e., Indian Pines and Salinas) were chosen to illustrate our method. As benchmarks, these two datasets are often used to test the performance of classification algorithms for HSIs. The performance of the proposed algorithm was also compared with four competing classification techniques.
In our experiments, the initial small-size training sets were obtained by randomly selecting 5, 10, and 15 samples from each class, respectively. To reduce the deviation caused by random labeling, all experiments were performed over 10 independent trials with a random choice of the training set. The final result is reported as the mean value over the trials together with its standard deviation. To properly assess the classification results of the HSI, two popular indices, the overall accuracy (OA) and the kappa coefficient (KC), were adopted in this study.
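Both indices can be computed directly from the predicted and reference labels; the sketch below uses our own function names. OA is the fraction of correctly labeled samples, and KC corrects the observed agreement for the agreement expected by chance from the confusion-matrix marginals.

```python
import numpy as np

def overall_accuracy(y_true, y_pred):
    """Overall accuracy (OA): fraction of samples labeled correctly."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def kappa_coefficient(y_true, y_pred):
    """Cohen's kappa (KC): agreement corrected for chance agreement."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.unique(np.concatenate([y_true, y_pred]))
    n = len(y_true)
    po = np.mean(y_true == y_pred)              # observed agreement (= OA)
    pe = sum((np.sum(y_true == c) / n) * (np.sum(y_pred == c) / n)
             for c in classes)                  # chance agreement
    return float((po - pe) / (1.0 - pe))

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
oa = overall_accuracy(y_true, y_pred)
kc = kappa_coefficient(y_true, y_pred)
```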
Based on the initial training sets and the proposed method, the classification results on the Indian Pines and Salinas datasets are reported in Table 1. For the Indian Pines dataset, the best classification accuracy over OA is 91.18% on average, in the case of randomly selecting 15 instances from each class. In other words, this result is obtained from approximately 2.3% of the total labeled samples. The classification accuracy of this dataset may not be as high as anticipated because of its serious data disequilibrium. The perfect classification result on the Salinas dataset is obtained even if only 0.15% of the samples are labeled (5 labeled samples per class). This demonstrates that the proposal is valid for HSI classification with very small training sets. Table 1 also indicates that the classification result provided using the LMPNN classifier in pre-classification is better than that achieved by applying the KNN method. It is noteworthy that the application of the post-processing technology has significantly improved the classification result. The classification accuracy and the standard deviation of each class in the Indian Pines and Salinas datasets are recorded in Tables 2 and 3, respectively. In Figure 3, we present the classification maps obtained by PMLMP with different numbers of labeled samples.

Comparative Tests
To objectively compare our method with other competitive approaches, it is necessary to select the same datasets, labeling proportion, and methods related to the MLR technique. Therefore, we performed comparative experiments on the Indian Pines and Salinas datasets for the case of randomly selecting 15 instances from each class. Table 4 shows the comparison results provided by the MLR, MLR-MLL (multilevel logistic prior) [43], ppMLR, and ppMLRpr [16] approaches and the proposed method. For the Indian Pines dataset, no significant difference is shown between the classification accuracies obtained by ppMLRpr and PMLMP. A similar classification result (OA = 90.44%) has also been achieved by the MLR + KNN + SNI method [2]. However, for the Salinas dataset, the proposed method provides a perfect classification result. The classification accuracy of this study is approximately 4% higher than that of the ppMLRpr algorithm. Tables 1 and 4 show that the results obtained by the PMLM and PMLMP techniques are still higher than those acquired by the ppMLRpr method, even for the random selection of five samples per category. This fully demonstrates the validity of the proposed method for remote-sensing data classification, even in the case of a small sample set.
Table 5 shows the superiority of using the Roberts cross operator over the Sobel filter in the proposed algorithm. For the random selection of 15 samples per class, the classification accuracies on both datasets improved by at least 1%. This explains, to some extent, the rationality of the replacement scheme in pre-processing and post-processing.

Parameter Analysis
It is arduous to select the optimal parameter values for a classification algorithm. Our experience shows that the classification result generally depends on the selection of the parameter values. Figure 4 reflects the correlation between the classification result and the parameter γ used in the relaxation strategy. The classification accuracy improves with the increase in γ. The best classification result is achieved with γ = 0.9, which is consistent with Li's selection [16]. This shows that the selection of the parameter γ is independent of the selection of the edge operator. γ = 1 means that the boundary operator has complete control while the smoothness is ignored. Figure 3a shows that the boundary operator alone cannot determine all the boundaries accurately; therefore, the classification accuracy is reduced.
Generally, the choice of the k value has a direct effect on the classification result provided by the LMPNN classifier or algorithms related to it. To achieve the ideal classification result, different k values need to be prespecified for different datasets. Figure 5 shows the relationship between the classification accuracy and the k value on the two datasets. Although the classification result changes only slightly with the increase in k, we can still select the best result. Thus, in our algorithm, k is 2 for the Indian Pines dataset and 3 for the Salinas dataset.

Conclusions
In this study, we developed a novel semi-supervised method for HSI classification based on the DPR strategy and a pre-classification technique. Using the DPR strategy in pre-processing and post-processing achieves the purposes of denoising and improving the classification accuracy, respectively. The pre-classification technique is aimed at addressing the problem of limited training samples in semi-supervised classification. Our experimental results on the Indian Pines and Salinas datasets show that, in the DPR strategy, replacing the Sobel filter with the Roberts cross operator can yield better classification results. The comparative test results show that the proposed method is superior to several existing methods. In particular, the proposal can provide perfect classification results for limited training samples.

Figure 3.
Figure 3. Discontinuity map and classification map obtained by pre-processing + MLRsub + the local mean-based pseudo nearest neighbor (LMPNN) rule + multinomial logistic regression with subspace projection (MLRsub) + post-processing (PMLMP) with different numbers of labeled samples. The top row is for the Indian Pines dataset and the bottom row for the Salinas dataset. (a) Discontinuity map. (b-d) Classification maps with 5, 10, and 15 labeled samples per class, respectively.

Figure 4.
Figure 4. A line chart of classification results varying with the relaxation parameter on two hyperspectral images (HSIs). (a) Indian Pines dataset; (b) Salinas dataset.

Figure 5.
Figure 5. The relation between the classification result and the parameter k. (a) Indian Pines; (b) Salinas.

Table 2 .
Classification accuracy over overall accuracy (OA (%)) and standard deviations for each class in the Indian Pines dataset.

Table 3 .
Classification accuracy over OA (%) and its standard deviations for each class in the Salinas dataset.

Table 1 .
Classification results of the Indian Pines and Salinas datasets by different methods with different label proportions.

Table 4 .
Comparison results of several methods on two hyperspectral datasets.

Table 5 .
Comparison results of using the Roberts cross operator and Sobel filter in our proposal.