A Constrained Graph-Based Semi-Supervised Algorithm Combined with Particle Cooperation and Competition for Hyperspectral Image Classification

Abstract: Semi-supervised learning (SSL) focuses on improving learning efficiency through the concurrent use of labeled and unlabeled samples. However, recent research indicates that classification performance may be deteriorated by the unlabeled samples. Here, we propose a novel graph-based semi-supervised algorithm combined with particle cooperation and competition, which can effectively improve model performance by using unlabeled samples. First, to reduce the generation of label noise, we used an efficient constrained graph construction approach to calculate the affinity matrix, which is capable of constructing a highly correlated similarity relationship between the graph and the samples. Then, we introduced a particle competition and cooperation mechanism into label propagation, which can dynamically detect and re-label misclassified samples, thereby stopping the propagation of wrong labels and allowing the overall model to obtain better classification performance from the predicted labels. Finally, we applied the proposed model to hyperspectral image classification. The experiments used three real hyperspectral datasets to verify and evaluate the performance of our proposal. The results obtained on these three public datasets show that our proposal achieves great hyperspectral image classification performance compared to traditional graph-based SSL algorithms.


Introduction
With the rapid growth of information acquisition, more and more open-source data are available. In real-world classification tasks, a large portion of the samples in a dataset are unlabeled, and obtaining their labels is costly and time-consuming. How to fully utilize unlabeled data and explore their potential value is a key issue in machine learning. SSL is capable of improving learning performance by using both a large proportion of unlabeled samples and a handful of labeled samples, and was therefore proposed to alleviate the scarcity of labeled samples [1,2]. Recently, various SSL algorithms have been proposed, such as transductive support vector machines (TSVM) [3], co-training [4], the label propagation algorithm (LPA) [5], MixMatch [6], FixMatch [7], etc. Additionally, SSL is broadly applied in many real-world areas, for instance, object detection [8][9][10], remote sensing [11][12][13][14][15][16][17][18][19][20], and data mining [21,22].
Graph-based SSL (GSSL) is one of the important branches of SSL, benefiting from its low time complexity and simple framework. Theoretically, the graph constructs a map that indicates the relationship between the labeled data and unlabeled data. Label propagation is a typical graph-based SSL method, which obtains the predicted labels of unlabeled samples by propagating label information on the graph. A similarity graph is used to encode the structure of the samples. In the graph, each vertex represents a sample, and the similarity between a pair of vertices is represented by an edge. Such similarity graphs can either be derived from actual relational samples or be constructed from sample features using k-nearest neighbors [23,24], ε-neighborhoods, or Gaussian random fields. However, the larger the dataset, the larger the scale of the constructed graph, resulting in extremely high computational and space complexity. On the other hand, the larger the number of labeled samples, the higher the model accuracy, so the number of labeled samples is closely related to the model accuracy. How to obtain high-precision classification performance with as few labeled samples as possible is therefore worthy of research. Our work is motivated by the constrained Laplacian rank (CLR) algorithm proposed by Nie et al. in 2016 [25]. The main innovation of the CLR model is to construct a k-connected graph whose structure is identical to that of the data. Unlike other common methods, CLR can obtain cluster indicators directly from the graph without any post-processing. Thus, we followed the CLR method to construct an efficient, constrained graph for label propagation. This graph construction approach only needs one parameter, k, which represents the number of nearest neighbors. Another advantage is that it yields a neatly normalized graph without normalizing the data.
Therefore, the proposed algorithm has the power to enhance the effectiveness of label propagation, benefiting from the advantages of the constrained graph construction approach. Additionally, we can obtain higher accuracy than the comparative algorithms (TSVM, LPA, et al.) because our graph is distance-consistent and scale-invariant, which constructs a highly correlated similarity relationship between the graph and the samples.
Semi-supervised learning is expected to improve performance, when labeled samples are limited, through the massive and easily obtained unlabeled samples, compared to supervised algorithms that only use a small number of labeled samples for training. However, it has been found that the performance of current semi-supervised learning approaches may be seriously deteriorated when the noise of misclassified samples is added into the iteration. Thus, how to avoid the adverse impact of label noise is a key issue for LPA when spreading label information among samples. Recently, a novel method combined with particle competition and cooperation (PCC) was presented in [26][27][28]. Simply put, this approach propagates labels to the whole network by the random-greedy walk of particles based on the PCC mechanism. Hence, aiming at the issue of misclassified label noise in label propagation, the particle competition and cooperation mechanism is adopted in LPA.
Generally, satisfactory classification results on hyperspectral images are usually obtained only when a large number of labeled samples are used. Many supervised approaches, for example, support vector machines (SVM) [29][30][31], Bayesian approaches [32,33], random forests [34][35][36], and k-nearest neighbors (kNN) [37,38], have demonstrated great effectiveness by using a large set of labeled samples. However, obtaining enough labeled samples is time-consuming, expensive, and laborious. If the classification process relies too heavily on a small labeled sample set, the trained classifier often suffers from overfitting. How to enhance the learning efficiency of an algorithm by making the most of the massive unlabeled hyperspectral samples has become a concern for hyperspectral image classification. As a very active research direction, SSL has achieved superior results in many application fields, and it has also attracted strong interest in the interpretation of remote sensing images, as it can overcome the shortage of labeled samples. For example, Zhao et al. [39] adopted superpixels to construct a weighted connectivity graph and divided the graph by the discrete potential method on hyperspectral images (HSIs). A new active multi-view multi-learner framework based on a genetic algorithm (GA-MVML AL) was proposed by Jamshidpour et al. [40], which used the unique high dimensionality of hyperspectral data to construct multi-view algorithms and obtain a more accurate data distribution via multiple learners; the GA-MVML AL model showed its superiority on hyperspectral datasets. Zhang et al. [41] presented an active semi-supervised approach based on random forests (ASSRF) for HSIs, which manually assigns labels to selected samples and performs pseudo-labeling via a novel query function and a new pseudo-labeling strategy, respectively. Based on regularized local discriminant embedding (RLDE), Ou et al.
[42] proposed a novel semi-supervised tri-training approach for HSI classification, which extracts the optimal features by RLDE and selects the most informative candidate set by active learning. A semi-supervised approach named ELP-RGF was presented by Cui et al. [43] for HSI classification, which combines extended label propagation (ELP) with rolling guidance filtering (RGF). Xue et al. [44] presented a novel semi-supervised hyperspectral image classification algorithm via dictionary learning. Xia et al. [45] proposed a semi-supervised graph fusion approach. Aiming to solve the problem of data deficiency in HSIs, Cao et al. [46] proposed a structure named the three-dimensional convolutional adversarial autoencoder. Zhao et al. [47] presented a cluster-based conditional generative adversarial net to increase the size and quality of the training dataset for hyperspectral image classification. A semi-supervised hyperspectral unmixing solution that incorporates the spatial information between neighboring pixels in the abundance estimation procedure was proposed by Fahime et al. [48].
In this paper, a new constrained graph-based semi-supervised algorithm called constrained label propagation with particle competition and cooperation (CLPPCC) is presented for HSI classification. First, to eliminate redundant and noisy information simultaneously, the three hyperspectral datasets used in this paper were preprocessed by the image fusion and recursive filtering feature (IFRF) approach [49]. Second, we constructed a constrained graph on the labeled and unlabeled sets, then propagated label information and obtained the predicted labels of the unlabeled samples with our algorithm. Finally, we introduced the particle competition and cooperation mechanism into the training process, which can dynamically mark and correct the misclassified samples in the unlabeled dataset in order to achieve excellent classification accuracy. The main innovative ideas and contributions of our work can be summed up in the following three points.
(1) A novel constrained affinity matrix construction method is introduced for initial graph construction, which has a powerful ability to excavate the complicated structure in the data.
(2) To prevent the performance deterioration of graph-based SSL caused by the label noise of predicted labels, the PCC mechanism is adopted to mitigate the adverse impact of label noise in LPA.
(3) Considering the high cost of obtaining labeled samples and the rich supply of unlabeled samples in real-world hyperspectral datasets, we applied our semi-supervised CLPPCC algorithm to HSI classification using only a small number of labeled samples, and the results demonstrated that our proposal was superior to the alternatives.
The remainder of this paper is arranged as follows. The related work on the original approaches proposed in [5,20,22] is briefly described in Section 2. Then, an innovative graph-based semi-supervised algorithm with PCC is introduced and presented in detail in Section 3. Next, in Section 4, three real-world hyperspectral datasets are used in experiments to evaluate and verify the performance of our proposal and the selected comparative algorithms. Finally, Section 5 summarizes this paper and looks forward to some possible future research directions.

Label Propagation
Label propagation is one of the classic graph-based semi-supervised algorithms, presented by Zhu et al. in 2002 [5]. Many experts and scholars have focused on LPA due to its simple framework, short execution time, high classification performance, and wide range of image applications. LPA can be briefly described as follows.
Given a dataset X = {x_1, ..., x_l, x_{l+1}, ..., x_{l+u}}, where x_i ∈ R^d and d represents the dimension of the samples, the first l samples and their corresponding labels define the labeled dataset D_L = {(x_1, y_1), ..., (x_l, y_l)}, and the remaining samples without labels define the unlabeled dataset D_U = {x_{l+1}, ..., x_{l+u}} ⊂ R^d. Here, n = l + u is the total number of labeled and unlabeled samples, y_i ∈ C, and C = {1, ..., c} denotes the class set, where c is the total number of classes in the dataset. The goal of LPA is to obtain the predicted labels y_j (l + 1 ≤ j ≤ l + u) of the unlabeled dataset.
Graph construction and label propagation are the key components of GSSL. GSSL assumes that the closer two samples are, the more likely their labels are the same. In short, the steps of the typical LPA in Zhou's method can be summarized below.
Step 1 Construct the affinity matrix A_{n×n}, with entries defined by the sample similarities.
Step 2 Calculate the probability transition matrix T_{n×n} from the affinity matrix of Step 1 through Equation (1).
Step 3 Define a label matrix Y n×c by Equation (2), and calculate the probability distribution matrix by Equation (3).
Step 4 Clamp the labeled samples by Equation (4).
Step 5 Repeat Step 3 and Step 4, until the Y converges.
Step 6 Label assignment. Assign each unlabeled sample x_i ∈ D_U (l + 1 ≤ i ≤ n) the label y_i = argmax_{j≤c} F_{ij}.
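The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the paper; the function name and the simple row normalization of A into T are our own assumptions, following the classical iterative scheme of Zhu et al.

```python
import numpy as np

def label_propagation(A, y, n_labeled, n_classes, max_iter=1000, tol=1e-6):
    """Minimal sketch of iterative label propagation.

    A         : (n, n) affinity matrix (Step 1, construction not shown here)
    y         : (n_labeled,) integer labels in {0, ..., n_classes - 1}
    n_labeled : the first n_labeled rows of A are the labeled samples
    """
    n = A.shape[0]
    # Step 2: row-normalize A into a probability transition matrix T.
    T = A / A.sum(axis=1, keepdims=True)
    # Step 3: initialize the label matrix Y (one-hot rows for labeled samples).
    Y = np.zeros((n, n_classes))
    Y[np.arange(n_labeled), y] = 1.0
    Y_clamp = Y[:n_labeled].copy()
    for _ in range(max_iter):
        Y_new = T @ Y                      # propagate label information
        Y_new[:n_labeled] = Y_clamp        # Step 4: clamp the labeled samples
        if np.abs(Y_new - Y).max() < tol:  # Step 5: check convergence
            Y = Y_new
            break
        Y = Y_new
    # Step 6: assign each unlabeled sample its most probable class.
    return Y[n_labeled:].argmax(axis=1)
```

For example, on a toy 4-node affinity matrix whose first two nodes are labeled with different classes, the two unlabeled nodes inherit the labels of their closest cluster.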

Particle Cooperation and Competition
The particle competition and cooperation mechanism was proposed by Breve et al. [20][21][22] and is designed to process datasets contaminated with misclassified samples. This mechanism propagates labels using competitive and cooperative teams of walking particles; it is nature-inspired and highly robust to misclassified samples and label noise.
The PCC mechanism classifies the samples at the graph nodes through the corresponding particles that walk in the graph. The goal of each particle is to control most of the unlabeled nodes, propagate its label, and prevent the invasion of enemy particles. Roughly speaking, the main steps of the PCC mechanism are as follows:
Step 1 Convert the vector-based dataset into a non-weighted, undirected graph. Each sample is represented by a node in the graph. An edge between a pair of nodes is created by the k-nearest neighbor approach if the Euclidean distance between them is below a certain threshold.
Step 2 Create a particle corresponding to each labeled graph node, and define teams of particles corresponding to graph nodes with the same label. The unlabeled nodes are dominated by the combined efforts of the particles in a team. Furthermore, particles corresponding to graph nodes with different labels compete against one another by occupying those nodes. The ambition of each particle team is to control most of the unlabeled nodes, propagate the label of its team, and avoid the invasion of the opponents' particles.
Step 3 The particles walk in the graph following random-greedy rules until the process is over. The territory frontiers of the particles tend to settle near the edges between classes. Therefore, the PCC mechanism obtains high classification accuracy.
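Step 1 of the PCC mechanism can be sketched as follows. This is an illustrative implementation under our own assumptions (the function name and the optional distance threshold parameter are ours), not the code of Breve et al.

```python
import numpy as np

def build_knn_graph(X, k=3, threshold=None):
    """Sketch of PCC Step 1: an unweighted, undirected k-NN graph.

    X is an (n, d) array of samples; an edge links each node to its k
    nearest neighbours, optionally only when their Euclidean distance
    falls below `threshold`. Returns a boolean adjacency matrix.
    """
    n = X.shape[0]
    # pairwise squared Euclidean distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)            # no self-loops
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]
        if threshold is not None:
            nbrs = nbrs[np.sqrt(d2[i, nbrs]) < threshold]
        adj[i, nbrs] = True
    return adj | adj.T                      # symmetrize: undirected graph
```

Symmetrizing at the end makes the graph undirected even when the k-NN relation itself is not symmetric.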

Graph Construction
Initializing an affinity matrix A_{n×n} is a critical step for LPA. The graph should satisfy that the smaller the distance ||x_i − x_j||_2^2 between x_i and x_j, the larger the edge weight a_ij. Our construction is motivated by the constrained Laplacian rank (CLR) algorithm proposed by Nie et al. in 2016 [25]. The process of calculating A is as follows. Here, the L2-norm of each row of A is used as the regularization to learn the affinity values of A. Let the dataset X = {x_1, ..., x_l, x_{l+1}, ..., x_n} contain all nodes in the graph, where n = l + u. The initial affinity matrix A is obtained by solving the following problem, where a_i represents the vector of edge weights a_ij of the i-th node. To achieve efficient and high performance, we learn a sparse affinity matrix A. In order to make Problem (5) have exactly k nonzero values, i.e., to constrain the L0-norm of a_i to k, we select the affinities with the maximal γ. Therefore, we have Problem (6) as follows.
where â is the optimal solution of Problem (5). Let D ∈ R^{n×n} denote the distance matrix of the dataset. In this paper, we use the squared Euclidean distance to compute the entries d^x_ij = D(i, j).
where d^x_i denotes the vector whose j-th element is d^x_ij. Assuming the constructed affinity matrix A satisfies that the number of neighbors of each sample is k, each row of A should have exactly k nonzero values. Each row of A is solved one by one in the manner of Problem (5). Then, the optimization problem for each row of A can be described as follows.
Solving Problem (8) with the Lagrangian function, we get Problem (9) as follows, where η and β_i ≥ 0 are the Lagrange multipliers. After a series of algebraic operations, we can obtain the optimal affinities â_ij as follows:
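Under the assumption that the optimal affinities take the closed-form sparse solution reported for the CLR/adaptive-neighbours family of models (each row keeps only its k nearest neighbours, with weights proportional to how far each falls below the (k+1)-th neighbour distance, normalized to sum to one), the construction can be sketched as follows. The function name is ours, and this sketch omits the regularization parameter tuning of the original method.

```python
import numpy as np

def constrained_affinity(X, k=5):
    """Hedged sketch of a CLR-style constrained graph construction.

    Each row of the returned affinity matrix A has exactly k nonzero
    entries and sums to 1 (a "neatly normalized" graph, with no data
    normalization required).
    """
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    np.fill_diagonal(d2, np.inf)                          # exclude self
    A = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[: k + 1]   # k nearest plus the (k+1)-th
        d = d2[i, idx]
        # closer neighbours get larger weights; anything beyond the k-th
        # neighbour gets exactly zero; the denominator normalizes the row
        A[i, idx[:k]] = (d[k] - d[:k]) / (k * d[k] - d[:k].sum())
    return A
```

Note that each row of the resulting A already sums to one, which is the normalized-graph property mentioned above, and that k is the only parameter.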

Label Propagation
In this section, we perform label propagation on the labeled and unlabeled samples to obtain the predicted labels of the unlabeled samples. First, we use Equation (11) to calculate the probability transition matrix T_{n×n} based on the affinity matrix A computed by the graph construction approach in Section 3.1.
where T_ij represents the propagation probability from node i to node j. Before label propagation, we need to construct a label matrix Y to track label changes during the propagation process. Therefore, we define a label matrix YL ∈ R^{l×c}, where the i-th row represents the label probability of node y_i, and the j-th column represents the corresponding label class. If YL_ij = 1, then the label of node y_i is j, where 1 ≤ j ≤ c.
Meanwhile, a matrix YU ∈ R^{u×c} is given for the unlabeled samples. Through probability propagation, the distribution is concentrated on the given classes, and the node labels are propagated through the weights of the edges.
Each node accumulates the label information propagated by its surrounding nodes according to the propagation probabilities, and updates its probability distribution Y by Equation (13), where Y_ij is the probability of the i-th sample belonging to class j.
To prevent the ground-truth labels of the labeled samples from being overwritten, the probability distribution of each labeled sample is re-assigned to its corresponding initial value, as shown by the following formula. The steps above are repeated until Y converges.
Finally, the predicted labels of the unlabeled samples are obtained by Equation (15) as follows.
Now, we summarize the process of the proposed method in Section 3.2 in Algorithm 1.

The Proposed Graph-Based Semi-Supervised Model Combined with Particle Cooperation and Competition
According to Sections 3.1 and 3.2, we can obtain the unlabeled set together with its corresponding predicted labels. The whole dataset includes the labeled set of size l, the unlabeled set of size u, and the testing set of size t. Our goal is to correct the misclassified labels in the unlabeled set and to calculate the predicted labels of the testing set using the labeled set and the unlabeled set with its predicted labels. The main ideas of the proposed CLPPCC model can be summarized in the following five parts:
• Initial configuration
An undirected graph G = (V, E) is constructed based on the dataset, where V is the vertex set and E is the edge set, each edge corresponding to a pair of vertices. Every vertex v_i corresponds to a sample x_i. E can be represented by a similarity matrix A, which is calculated by the approach in Section 3.1.
For every labeled sample x_i and its corresponding node v_i ∈ {v_1, ..., v_n}, where n = l + u, a particle ρ_i ∈ {ρ_1, ρ_2, ..., ρ_n} is generated; the labeled vertices on which particles are placed are called home vertices. Each node v_i is associated with a vector of domination levels, one per class team. Particles increase the domination level of their own team at a node while reducing the levels of the other teams, so the total domination level of each node is always constant. For each particle, the initial position is also called its home node, and the initial strength of each particle is set to ρ^ω_j(0) = 1. The initial distance table of each particle is defined as in Equation (18).

• Node and particle dynamics
During a visit by a particle, the domination levels of the node are updated by Equation (19), and the updated particle strength ρ^ω_j(t) is defined as in Equation (20),
and the updated distance table is given as follows:

• Random-Greedy walk
There are two types of walk by which a particle selects a nearby node to visit: the random walk and the greedy walk. In the random walk, the particle ρ_j moves to another node according to the probabilities defined as follows,
where q is the index of the current node of ρ_j. In the greedy walk, the particle ρ_j moves to another node according to the new probabilities, which are given as follows, where q is the index of the current node of ρ_j, ζ = ρ^f_j, and ρ^f_j is the label class of ρ_j.
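Since the exact probability expressions of Breve et al. are not reproduced above, the following sketch only illustrates the structure of one random-greedy move. The greedy weighting of each neighbour by the team's domination level with an inverse-square penalty on the particle's distance table is our own assumption, as are the function and variable names.

```python
import numpy as np

def choose_next_node(adj, q, label, domination, dist_table,
                     p_greedy=0.6, rng=None):
    """Illustrative sketch of one random-greedy move of a particle.

    adj        : (n, n) boolean adjacency matrix of the graph
    q          : index of the particle's current node
    label      : the particle's class label
    domination : (n, c) per-class domination levels of each node
    dist_table : (n,) the particle's distance table
    """
    rng = np.random.default_rng() if rng is None else rng
    nbrs = np.flatnonzero(adj[q])
    if rng.random() < p_greedy:
        # greedy walk: favour neighbours dominated by the particle's own
        # team and close to its home node (assumed weighting, see lead-in)
        w = domination[nbrs, label] / (1.0 + dist_table[nbrs]) ** 2
        p = w / w.sum()
    else:
        # random walk: pick a neighbour uniformly at random
        p = np.full(len(nbrs), 1.0 / len(nbrs))
    return rng.choice(nbrs, p=p)
```

In either case the particle only ever moves to a direct neighbour of its current node, which is what confines the territory frontiers near the class boundaries.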

• Stop Criterion
For each node, the average maximum domination level is recorded, and the overall model stops when this quantity no longer increases. When the stop criterion is met, each node is labeled, and some are relabeled to the class with the highest domination level over them.
Overall, according to the sections above, we can build an innovative graph-based semi-supervised model on the grounds of constrained label propagation and particle cooperation and competition.

Experiments and Analysis
Several experiments were performed to evaluate the learning efficiency of our proposal and demonstrate its superiority when compared to alternatives in this section. In our work, all experiments were carried out in a MATLAB R2019b environment on a computer with Intel Core i7-10750H CPU 2.6 GHz and 16 GB RAM.

• Indian Pines image
The Indian Pines image was the first and most famous test dataset for hyperspectral image classification, obtained by the AVIRIS sensor in 1992. The image includes 145 × 145 pixels and 220 spectral bands covering the range of 0.4 to 2.5 µm, and shows a mixed agricultural/forest area in northwestern Indiana, USA. There are 16 classes in the available ground-truth map, such as alfalfa, corn-mintill, and soybean-notill, denoted C1-C16. Figure 1a-c shows the false color image, ground-truth image, and reference map, respectively.

• Salinas image
The Salinas image was collected over Salinas Valley, California, USA by AVIRIS. The image includes 512 × 217 pixels and 224 spectral bands; the water-absorption bands were discarded in our experiments. The false color scene is shown in Figure 3a. As shown in Figure 3b,c, the ground-truth map and reference map include 16 classes such as fallow, stubble, and celery, denoted C1-C16.

Evaluation Criteria
Three evaluation criteria were adopted to judge the effectiveness of each model.

• Overall Accuracy (OA)
The overall classification accuracy visually displays the classification results by counting the number of correctly classified samples. It is calculated as OA = (Σ_i m_ii) / N, where N represents the total number of samples and m_ii represents the number of samples correctly classified into class i.
• Average Accuracy (AA)
AA is the mean of the per-class accuracies P_CA = m_ii / N_i, where N_i is the number of samples of class i; that is, AA measures, for each class, the proportion of its samples that are successfully classified. AA is equal to OA when all classes are of equal size.
• Kappa Coefficient (K)
The larger the value of K, the higher the consistency between the classified image and the original image.
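The three criteria can be computed directly from a confusion matrix; the function below is a straightforward sketch (the name and signature are ours), using the standard chance-corrected form of the kappa coefficient.

```python
import numpy as np

def classification_scores(conf):
    """Compute OA, AA, and the kappa coefficient from a confusion matrix.

    conf[i, j] counts the samples of true class i predicted as class j,
    so the diagonal entries are the m_ii of the text.
    """
    conf = np.asarray(conf, dtype=float)
    N = conf.sum()
    oa = np.trace(conf) / N                        # overall accuracy
    per_class = np.diag(conf) / conf.sum(axis=1)   # P_CA = m_ii / N_i
    aa = per_class.mean()                          # average accuracy
    # expected agreement by chance, from the row/column marginals
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / N ** 2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```

For example, the confusion matrix [[40, 10], [5, 45]] gives OA = 0.85, AA = 0.85, and kappa = 0.7.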

Comparative Algorithms
• TSVM: the transductive support vector machine algorithm [3].
• LGC: the local and global consistency graph-based algorithm [50].
• LPA: the original label propagation algorithm [5].
• LPAPCC: the original label propagation algorithm combined with particle competition and cooperation, without the novel graph construction described in Section 3.1.

Classification of Hyperspectral Images
To estimate the performance of our proposed CLPPCC algorithm, several related algorithms were used for comparative experiments, namely TSVM, LGC, and LPA. To highlight the contribution of the novel graph construction approach to the increased classification performance, we also compared our proposal with the label propagation algorithm combined with particle competition and cooperation (LPAPCC), which does not use the novel graph construction; that is, LPA optimized only by particle competition and cooperation. Before classification, to eliminate redundant and noisy information simultaneously, the three hyperspectral datasets were preprocessed by the image fusion and recursive filtering feature (IFRF) approach. IFRF is an effective and powerful feature extraction method. Its basic ideas are as follows: First, the hyperspectral image is divided into several subsets of adjacent spectral bands. Then, the bands in each subset are fused together by averaging, one of the simplest image fusion methods. Finally, the fused bands are filtered recursively in the transform domain to obtain the final features for classification. For each hyperspectral dataset used in the experiments, the training set was composed of 30% of the samples randomly selected from each class, and the rest of the available samples made up the testing set. The training set was further divided into the labeled set and the unlabeled set. For the Indian Pines dataset, the labeled set was made up of 10% of the samples randomly selected from each class in the training set, and the remaining samples were defined as the unlabeled set without labels. For the Pavia University scene dataset, the percentage of the labeled set was set to 5%, and for the Salinas dataset, to 2%.
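The fusion step of IFRF described above can be sketched as follows. This toy function (the name is ours) covers only the split-and-average part; the subsequent edge-preserving recursive filtering in the transform domain is deliberately omitted, so it is not a full IFRF implementation.

```python
import numpy as np

def fuse_bands(cube, n_subsets):
    """Split the spectral bands of a hyperspectral cube (H, W, B) into
    n_subsets contiguous subsets and average the bands within each
    subset, producing a reduced (H, W, n_subsets) cube of fused bands."""
    groups = np.array_split(np.arange(cube.shape[2]), n_subsets)
    return np.stack([cube[:, :, g].mean(axis=2) for g in groups], axis=2)
```

A 220-band Indian Pines cube fused into, say, 20 subsets would thus shrink to 20 fused bands before the filtering stage.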
Considering that some classes contain only a few samples, a minimum threshold of five samples per class was set for both the training set and the testing set. That is, regardless of the percentage of selected samples, each class had at least five samples in the training set and the labeled set. Every experiment obeyed the above sample division rules. Table 1 shows the distribution of each sample set; the labeled set and unlabeled set are represented by L and U, respectively. AA, OA, and the kappa coefficient were introduced as evaluation criteria to assess the effectiveness of the models. To reduce the impact of randomness on the results, the final recorded result is the average of 10 repeated runs for each hyperspectral dataset and each model.
The class-specific accuracies are given in Tables 2-4, where we can observe the results obtained by each algorithm, including the AAs, OAs, and kappa coefficients on the three hyperspectral datasets. To analyze the impact of the neighborhood size k, which plays an important role in calculating the affinity matrix A, we set up a comparative experiment for each comparison algorithm and our proposal with k set to 5 and 10, respectively (see Tables 2-4). The best classification results among the various algorithms are in bold. To show and compare the classification performance intuitively, the predicted classification maps of the Indian Pines, Pavia University, and Salinas Valley datasets can be observed in Figures 4-6, respectively, where the neighborhood size k was set to five. From the experimental results on the Indian Pines dataset in Table 2, we can observe that TSVM, LGC, LPA (k = 5), LPAPCC (k = 5), CLPPCC (k = 5), LPA (k = 10), LPAPCC (k = 10), and CLPPCC (k = 10) achieved 0, 1, 2, 4, 3, 1, 1, and 9 best class-specific accuracies, respectively; notably, CLPPCC (k = 10) achieved 9 of them. From Table 3, we can observe from the results on the Pavia University dataset that TSVM, LGC, LPA (k = 5), LPAPCC (k = 5), CLPPCC (k = 5), LPA (k = 10), LPAPCC (k = 10), and CLPPCC (k = 10) obtained 0, 0, 2, 5, 0, 0, 1, and 1 best class-specific accuracies, respectively. Here, our proposal performed well on classes with both large and small numbers of samples. For the Salinas dataset, Table 4 shows that TSVM, LGC, LPA (k = 5), LPAPCC (k = 5), CLPPCC (k = 5), LPA (k = 10), LPAPCC (k = 10), and CLPPCC (k = 10) achieved 1, 3, 7, 5, 5, 2, 4, and 6 best class-specific accuracies, respectively. Our proposal CLPPCC performed well for both k = 5 and k = 10.
The classification results in Figures 4-6 show that CLPPCC was more satisfactory than the other compared algorithms when the parameter k was set to five.
From the experimental results shown in Tables 2-4, several conclusions can be drawn. First, CLPPCC (k = 10) achieved the best OAs, AAs, and kappa coefficients on all three hyperspectral datasets. Second, the classification performance of TSVM, LGC, LPA (k = 5), and LPAPCC (k = 5) was not stable across datasets, as they obtained low class-specific accuracies on some small classes. Third, among LPA, LPAPCC, and CLPPCC, CLPPCC had the best OAs, AAs, and kappa coefficients for both k = 5 and k = 10 on all datasets. LPA and LPAPCC were significantly affected by the value of k: when k was set to five, they performed poorly on small classes such as C1 in the Indian Pines dataset; when k was set to 10, LPA, LPAPCC, and CLPPCC all improved. CLPPCC was only slightly affected by the value of k and kept good performance regardless of its value. The reason is that the graph construction approach we used builds a graph closely related to the samples; unlike traditional kNN graph construction, our approach is less affected by the value of k. Fourth, comparing LPA and LPAPCC, LPAPCC performed better for both k = 5 and k = 10 on all datasets, as it benefits from the particle competition and cooperation mechanism, which can re-label the misclassified samples among the unlabeled data and achieve great effectiveness even when some misclassified labels enter the training process.

Running time
We recorded and analyzed the average running time of each model in seconds on the three hyperspectral datasets in Table 5. The running times in Table 5 show that the execution times of LPA (k = 10), LPAPCC (k = 10), and CLPPCC (k = 10) were slightly longer than those of LPA (k = 5), LPAPCC (k = 5), and CLPPCC (k = 5). It can be seen from Tables 2-4 that the larger the value of k, the higher the overall accuracy obtained; however, Table 5 shows that the running time increased as the value of k grew from five to 10. Next, although our proposal CLPPCC was slower than LPA, its classification performance was better. This is because the PCC mechanism needs to detect and correct the misclassified labels among the unlabeled samples, and the cost in running time is exchanged for an increase in class-specific accuracy, OAs, AAs, and kappa coefficients. It is also worth noting that although the OA of TSVM was not the best, its running time was the fastest among the compared algorithms. The running time of LGC was as fast as that of TSVM, but the performance of LGC was worse.

Robustness of the Proposed Method
The effectiveness of our proposed CLPPCC algorithm was appraised in the experiments above. In this section, we analyze the robustness of our proposal and the other comparison algorithms in order to evaluate its superiority. We compared our method with the alternatives under different sizes of labeled samples and under noise, and the experimental results indicate the robustness of our proposal. Here, the parameter k was set to five.

Labeled Size Robustness
Several experiments with different sizes of labeled samples were performed. To reduce randomness, we took the average of 10 repeated runs as the final recorded result for each hyperspectral dataset. Figure 7 shows the overall classification accuracy of our proposal and the four alternatives for different sizes of labeled samples. The labeled sample percentages were 5%, 10%, 12%, 14%, and 16% for the Indian Pines dataset; 3%, 5%, 7%, 9%, and 11% for the Pavia University scene; and 1%, 2%, 5%, 8%, and 10% for the Salinas image. From Figure 7, it can be seen that the higher the ratio of labeled samples, the better the classification accuracy (OA) achieved by each algorithm. Simultaneously, the other evaluation criteria, such as AAs, kappa coefficients, and class-specific accuracies, were enhanced as the labeled ratio increased. Note that in most cases, our CLPPCC obtained the best OAs on all three datasets, which indicates the robustness of CLPPCC across various sizes of labeled samples. It is worth noting that our proposal achieved high classification results even with a small set of labeled samples, whereas the compared algorithms rarely obtained high classification results in that setting.

Noise Robustness
The noise robustness was analyzed in this section. We added 20 dB of Gaussian noise to each hyperspectral dataset. We took the average of 10 repeated runs as the final recorded result for each hyperspectral dataset to reduce randomness. Figure 8a,b shows the clean and noisy false-color images of the Indian Pines dataset, respectively. To show the impact of the noise on the overall accuracy, the overall accuracy results on the three hyperspectral datasets are given in Table 6. The classification results on the three noisy hyperspectral datasets and their corresponding OA values are shown in Figures 9-11. Here, the parameter k was set to five for all approaches and all three HSIs. From Table 6 and Figures 9-11, the performance of every algorithm was affected by the Gaussian noise added to the datasets, but the performance degradation of our algorithm was the lowest. In addition, the proposed method still kept high classification results, which indicates that our method is robust.

Conclusions
In this paper, we considered the graph-based semi-supervised problem that the usage of unlabeled samples might deteriorate model performance in HSI classification. In addition, the IFRF method was used to remove redundant feature information from the HSIs. Several conclusions can be summarized based on the experiments we performed. First, our proposed CLPPCC is effective for HSI classification: its classification results were superior to those of the comparison algorithms on the three hyperspectral datasets. Second, the constrained graph construction approach plays an important role in CLPPCC, helping it keep a high overall accuracy when the percentage of labeled hyperspectral samples is low. Third, the PCC mechanism used in CLPPCC decreases the impact of label noise in the unlabeled hyperspectral data. Finally, CLPPCC is robust to noise in HSIs, and also keeps high overall accuracy with varying labeled ratios.
There are many interesting directions for future work. For instance, the label propagation in our current proposal is based on a classical learning method in which some prior strategies and knowledge may not be leveraged. More flexible methods that can incorporate domain knowledge are worth trying in the future.

Conflicts of Interest:
The authors declare no conflict of interest.