Remote Sensing
  • Article
  • Open Access

8 January 2021

A Constrained Graph-Based Semi-Supervised Algorithm Combined with Particle Cooperation and Competition for Hyperspectral Image Classification

1 School of Electronic and Information Engineering, Hebei University of Technology, Tianjin 300401, China
2 School of Mechanical Engineering, Hebei University of Technology, Tianjin 300401, China
3 Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
* Author to whom correspondence should be addressed.

Abstract

Semi-supervised learning (SSL) focuses on improving learning efficiency by using labeled and unlabeled samples concurrently. However, recent research indicates that classification performance can be degraded by the unlabeled samples. Here, we propose a novel graph-based semi-supervised algorithm combined with particle cooperation and competition, which can effectively improve model performance by using unlabeled samples. First, to reduce the generation of label noise, we use an efficient constrained graph construction approach to calculate the affinity matrix, which is capable of constructing a highly correlated similarity relationship between the graph and the samples. Then, we introduce a particle competition and cooperation mechanism into label propagation, which can detect and re-label misclassified samples dynamically, thus stopping the propagation of wrong labels and allowing the overall model to obtain better classification performance from the predicted labeled samples. Finally, we apply the proposed model to hyperspectral image classification. The experiments used three real hyperspectral datasets to verify and evaluate the performance of our proposal. The results obtained on these three public datasets show that our proposal achieves excellent hyperspectral image classification performance compared to traditional graph-based SSL algorithms.

1. Introduction

With the ever-increasing speed and efficiency of information acquisition, more and more data are openly available. In real-world classification tasks, a large portion of the samples in a dataset are unlabeled, and obtaining their labels is costly and time-consuming. How to fully utilize unlabeled data and explore their potential value is a key issue in machine learning. SSL is capable of improving learning performance by using both a large proportion of unlabeled samples and a handful of labeled samples, and was therefore proposed to address the scarcity of labeled samples [,]. Recently, various SSL algorithms have been proposed, such as transductive support vector machines (TSVM) [], co-training [], the label propagation algorithm (LPA) [], MixMatch [], FixMatch [], etc. Additionally, SSL is broadly applied to many real-world tasks, for instance, object detection [,,], remote sensing [,,,,,,,,,], and data mining [,].
Graph-based SSL (GSSL) is an important branch of SSL that benefits from low time complexity and a simple framework. Theoretically, the graph constructs a map that indicates the relationship between the labeled and unlabeled data. Label propagation is a typical graph-based SSL method, which obtains the predicted labels of unlabeled samples by propagating label information over the graph. The similarity graph is used to encode the structure of the samples: each vertex represents a sample, and the similarity between a pair of vertices is represented by an edge. Such similarity graphs can either be derived from actual relational samples or be constructed from sample features using k-nearest neighbors [,], ε-neighborhoods, or Gaussian random fields. However, the larger the dataset, the larger the scale of the constructed graph, resulting in extremely high computational and space complexity. On the other hand, the larger the number of labeled samples, the higher the model accuracy; the number of labeled samples is thus closely related to the model accuracy. How to use as few labeled samples as possible to obtain high-precision classification is therefore worthy of research. Our work is motivated by the constrained Laplacian rank (CLR) algorithm, proposed by Nie et al. in 2016 []. The main innovation of the CLR model is to construct a k-connected graph whose structure matches that of the data. Unlike other common methods, CLR can obtain cluster indicators directly from the graph without any post-processing. We therefore follow the CLR method to construct an efficient, constrained graph for label propagation. This graph construction approach needs only one parameter k, the number of nearest neighbors. Another advantage is that it yields a neatly normalized graph without normalizing the data. The proposed algorithm can thus enhance the effectiveness of label propagation, benefiting from the advantages of the constrained graph construction approach. Additionally, we can obtain higher accuracy than the comparative algorithms (TSVM, LPA, etc.) because our graph is distance-consistent and scale-invariant, which allows it to construct a highly correlated similarity relationship between the graph and the samples.
Semi-supervised learning is expected, when labeled samples are limited, to improve performance over supervised algorithms that train on only a small number of labeled samples by exploiting massive, easily obtained unlabeled samples. However, it has been found that the performance of current semi-supervised learning approaches may be seriously degraded when the noise of misclassified samples enters the iteration. Thus, how to avoid the adverse impact of label noise is a key issue for LPA when spreading label information among samples. Recently, a novel method combining particle competition and cooperation (PCC) was presented in [,,]. Briefly, this approach propagates labels through the whole network by a random-greedy walk of particles based on the PCC mechanism. Hence, to address the issue of misclassified label noise in label propagation, the particle competition and cooperation mechanism is adopted in LPA.
Generally, satisfactory classification results on hyperspectral images are usually obtained only when a large number of labeled samples are used. Many supervised approaches, for example, support vector machines (SVM) [,,], Bayesian approaches [,], random forests [,,], and k-nearest neighbors (kNN) [,], have demonstrated great effectiveness when given a large set of labeled samples. However, obtaining enough labeled samples is time-consuming, expensive, and laborious. If the classification process relies too heavily on a small labeled set, the trained classifier often suffers from overfitting. How to enhance the learning efficiency of an algorithm by making the most of the massive unlabeled hyperspectral samples has therefore become a concern for hyperspectral image classification. As a very active research direction, SSL has achieved superior results in many application fields, and it has also attracted strong interest in the interpretation of remote sensing images, as it can overcome the shortage of labeled samples. For example, Zhao et al. [] adopted a superpixel graph to construct a weighted connectivity graph and partitioned the graph with a discrete potential method on hyperspectral images (HSIs). A new active multi-view multi-learner framework based on a genetic algorithm (GA-MVML AL) was proposed by Jamshidpour et al. [], which exploits the unique high dimensionality of hyperspectral data to construct multi-view algorithms and obtains a more accurate data distribution through multiple learners; the GA-MVML AL model showed its superiority on hyperspectral datasets. Zhang et al. [] presented an active semi-supervised approach based on random forest (ASSRF) for HSIs, which manually assigns labels to selected samples and performs pseudo-labeling via a novel query function and a new pseudo-labeling strategy, respectively. Based on regularized local discriminant embedding (RLDE), Ou et al. [] proposed a novel semi-supervised tri-training approach for HSI classification, which extracts the optimal features by RLDE and selects the most informative candidate set by active learning. A semi-supervised approach named ELP-RGF, combining extended label propagation (ELP) and rolling guidance filtering (RGF), was presented by Cui et al. [] for HSI classification. Xue et al. [] presented a novel semi-supervised hyperspectral image classification algorithm via dictionary learning. Xia et al. [] proposed a semisupervised graph fusion approach. To address the problem of data deficiency in HSIs, Cao et al. [] proposed a structure named the three-dimensional convolutional adversarial autoencoder. Zhao et al. [] presented a cluster-based conditional generative adversarial net to increase the size and quality of the training dataset for hyperspectral image classification. A semisupervised hyperspectral unmixing solution that incorporates the spatial information between neighboring pixels into the abundance estimation procedure was proposed by Fahime et al. [].
In this paper, a new constrained graph-based semi-supervised algorithm called constrained label propagation with particle competition and cooperation (CLPPCC) is presented for HSI classification. First, to eliminate redundant and noisy information simultaneously, the three hyperspectral datasets used in this paper were preprocessed by the image fusion and recursive filtering feature (IFRF) approach []. Second, we construct a constrained graph on the labeled and unlabeled sets, then propagate label information and obtain the predicted labels of the unlabeled samples with our algorithm. Finally, we introduce the particle competition and cooperation mechanism into the training process, which can dynamically mark and correct the misclassified samples in the unlabeled dataset in order to achieve excellent classification accuracy. The main innovative ideas and contributions of our work can be summarized in the following three points.
(1)
A novel constrained affinity matrix construction method is introduced for initial graph construction, which has the powerful ability to excavate the intrinsic and complicated structure of the data.
(2)
For the purpose of preventing the performance deterioration of graph-based SSL caused by the label noise of predicted labels, the PCC mechanism was adopted to mitigate the adverse impact of label noise in LPA.
(3)
Given the high cost of obtaining labeled samples and the rich supply of unlabeled samples in real-world hyperspectral datasets, we applied our semi-supervised CLPPCC algorithm to HSI classification using only a small number of labeled samples, and the results demonstrated that our proposal is superior to the alternatives.
The remainder of this paper is arranged as follows. The related work on the original approaches proposed in [,,] is briefly described in Section 2. Then, an innovative graph-based semi-supervised algorithm with PCC is introduced and presented in detail in Section 3. Next, in Section 4, three real-world hyperspectral datasets are used in experiments to evaluate and verify the performance of our proposal and the selected comparative algorithms. Finally, Section 5 summarizes this paper and looks forward to some possible future research directions.

3. The Proposed Method

3.1. Graph Construction

How to initialize an affinity matrix $A \in \mathbb{R}^{n \times n}$ is a critical step for LPA. The graph should satisfy the property that the smaller the distance $\|x_i - x_j\|_2^2$ between $x_i$ and $x_j$, the larger the edge weight $a_{ij}$. Motivated by the constrained Laplacian rank (CLR) algorithm proposed by Nie et al. in 2016 [], the process of calculating $A$ is as follows.
Here, the L2-norm of each row of $A$ is used as the regularization to learn the affinity values of $A$. Let the dataset $X = \{x_1, \ldots, x_l, x_{l+1}, \ldots, x_n\}$ be the set of all nodes in the graph, where $n = l + u$. The initial affinity matrix $A$ is found by solving the following problem:
$$\min_{a_i} \; \sum_{j=1}^{n} \|x_i - x_j\|_2^2 \, a_{ij} + \gamma \sum_{j=1}^{n} a_{ij}^2 \quad \text{s.t.}\ a_i^{\mathrm{T}} \mathbf{1} = 1,\ a_i \ge 0,\ a_{ii} = 0 \tag{5}$$
where $a_i$ denotes the vector of edge weights $a_{ij}$ of the $i$-th node.
To achieve efficiency and high performance, we learn a sparse affinity matrix $A$. In order to make the solution of Problem (5) have exactly $k$ nonzero values, i.e., to constrain the L0-norm of $a_i$ to $k$, we select the affinities with the maximal $\gamma$. Therefore, we have Problem (6) as follows.
$$\max \gamma \quad \text{s.t.}\ \|\hat{a}_i\|_0 = k \tag{6}$$
where $\hat{a}_i$ is the optimal solution of Problem (5).
Let $D \in \mathbb{R}^{n \times n}$ denote the distance matrix of the dataset. In this paper we use the squared Euclidean distance to compute $d_{ij}^{x} = D(i, j)$:
$$d_{ij}^{x} = \|x_i - x_j\|_2^2 \tag{7}$$
where $d_i^{x}$ denotes the vector whose $j$-th element is $d_{ij}^{x}$.
Assuming the constructed affinity matrix $A$ satisfies the property that each sample has exactly $k$ neighbors, each row of $A$ should have exactly $k$ nonzero values. Each row of $A$ is solved one by one in the manner of Problem (5); the optimization problem for each row can then be written as follows.
$$\min_{a_i} \; \frac{1}{2} \left\| a_i + \frac{d_i^{x}}{2\gamma} \right\|_2^2 \quad \text{s.t.}\ a_i^{\mathrm{T}} \mathbf{1} = 1,\ a_i \ge 0,\ a_{ii} = 0 \tag{8}$$
Solving Problem (8) with the Lagrangian function, we obtain Problem (9) as follows:
$$\mathcal{L}(a_i, \eta, \beta_i) = \frac{1}{2} \left\| a_i + \frac{d_i^{x}}{2\gamma} \right\|_2^2 - \eta \left( a_i^{\mathrm{T}} \mathbf{1} - 1 \right) - \beta_i^{\mathrm{T}} a_i \tag{9}$$
where $\eta$ and $\beta_i \ge 0$ are the Lagrange multipliers.
After a series of algebraic operations, we can obtain the optimal affinities a ^ i j as follows:
$$\hat{a}_{ij} = \begin{cases} \dfrac{d_{i,k+1}^{x} - d_{ij}^{x}}{k\, d_{i,k+1}^{x} - \sum_{h=1}^{k} d_{ih}^{x}} & j \le k \\[1ex] 0 & j > k \end{cases} \tag{10}$$
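To make the construction concrete, the following is a minimal Python sketch of Equations (7) and (10). This is illustrative code, not the authors' implementation; the function name `constrained_affinity` and the small guard against a zero denominator are our own additions.

```python
import numpy as np

def constrained_affinity(X, k=5):
    """Sketch of the constrained graph construction (Equation (10)).
    X: (n, d) array of samples; k: number of nearest neighbors.
    Returns an (n, n) affinity matrix with exactly k nonzero
    entries per row, each row summing to 1."""
    n = X.shape[0]
    # Squared Euclidean distance matrix (Equation (7)).
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(D, np.inf)            # enforce a_ii = 0
    A = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(D[i])[:k + 1]     # k nearest plus the (k+1)-th
        d = D[i, idx]
        # Closed-form affinities from Equation (10); d[k] is d_{i,k+1}.
        den = k * d[k] - d[:k].sum()
        A[i, idx[:k]] = (d[k] - d[:k]) / max(den, 1e-12)
    return A
```

Each row of the returned matrix has exactly $k$ nonzero affinities summing to one, so no separate normalization step is needed, which is the property highlighted above.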

3.2. Label Propagation

In this section, we perform label propagation over the labeled and unlabeled samples to obtain the predicted labels of the unlabeled samples. First, we use Equation (11) to calculate the probability transition matrix $T \in \mathbb{R}^{n \times n}$ from the affinity matrix $A$ computed by the graph construction approach of Section 3.1.
$$T_{ij} = P(i \to j) = \frac{a_{ij}}{\sum_{q=1}^{n} a_{iq}} \tag{11}$$
where $T_{ij}$ represents the propagation probability from node $i$ to node $j$.
Before label propagation, we need to construct a label matrix $Y$ to record the change of labels during propagation. We therefore define a label matrix $Y_L \in \mathbb{R}^{l \times c}$, where the $i$-th row holds the label probability of node $y_i$ and the $j$-th column represents the label class of the corresponding sample. If $Y_{L,ij} = 1$, then the label of node $y_i$ is $j$, where $1 \le j \le c$.
$$Y_{L,ij} = \begin{cases} 1 & \text{if } y_i = j \\ 0 & \text{otherwise} \end{cases} \tag{12}$$
where $1 \le i \le l$, $1 \le j \le c$.
Meanwhile, define an unlabeled matrix $Y_U \in \mathbb{R}^{u \times c}$ for the unlabeled samples with $Y_{U,ij} = 0$, where $l+1 \le i \le n$, $1 \le j \le c$. Finally, define the label matrix $Y \in \mathbb{R}^{(l+u) \times c}$, $Y = (Y_L; Y_U)$. Through probability propagation, the distribution is concentrated on the given classes, and node labels are then propagated through the edge weights.
Each node sums the label information propagated by its surrounding nodes, weighted according to the propagation probabilities $T$, and updates its probability distribution $Y$ by Equation (13), where $Y_{ij}$ is the probability of the $i$-th sample belonging to class $j$.
$$Y_{ij} = \sum_{r=1}^{n} T_{ir} Y_{rj}, \quad 1 \le i \le n,\ 1 \le j \le c \tag{13}$$
To prevent the ground-truth labels of the labeled samples from being overwritten, the probability distribution of each labeled sample is re-assigned to its corresponding initial value, as shown by the following formula:
$$Y_{ij} = Y_{L,ij}, \quad 1 \le i \le l,\ 1 \le j \le c \tag{14}$$
Repeat the steps above until $Y$ converges.
Finally, obtain the predicted labels of unlabeled samples by Equation (15) as follows.
$$y_i = \arg\max_{j \le c} Y_{ij}, \quad l+1 \le i \le n \tag{15}$$
Now, we summarize the process of the proposed method in Section 3.2 in Algorithm 1.
Algorithm 1 Constrained Label Propagation Algorithm
Input: Dataset $D_L \cup D_U = \{x_1, \ldots, x_l, x_{l+1}, \ldots, x_n\}$, where $D_L = \{(x_1, y_1), \ldots, (x_l, y_l)\} \subset \mathbb{R}^d$ is the labeled dataset with its labels and $D_U = \{x_{l+1}, \ldots, x_{l+u}\} \subset \mathbb{R}^d$ is the unlabeled dataset.
1: Initialization: compute the affinity matrix $A \in \mathbb{R}^{n \times n}$ through Equation (10);
2: Compute the probability transition matrix $T \in \mathbb{R}^{n \times n}$ based on the affinity matrix $A$ using Equation (11);
3: Define a labeled matrix $Y_L \in \mathbb{R}^{l \times c}$ using Equation (12) and $Y = (Y_L; Y_U) \in \mathbb{R}^{n \times c}$;
4: repeat
5: Propagate labels: update $Y$ using Equation (13): $Y \leftarrow TY$;
6: Clamp the labeled data: update $Y$ using Equation (14);
7: until $Y$ converges;
8: Calculate the labels $y_i$ of the unlabeled data by Equation (15);
Output: The predicted label set $Y_U$ of the unlabeled dataset $D_U$
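Algorithm 1 can be rendered in a few lines of Python. This is an illustrative sketch, not the authors' code; the tolerance, iteration cap, and function name are our assumptions.

```python
import numpy as np

def constrained_label_propagation(A, y, l, c, tol=1e-6, max_iter=1000):
    """Sketch of Algorithm 1. A: (n, n) affinity from Section 3.1;
    y: (l,) integer class labels in {0, ..., c-1} of the first l samples."""
    n = A.shape[0]
    # Step 2: probability transition matrix (Equation (11)).
    T = A / A.sum(axis=1, keepdims=True)
    # Step 3: label matrix (Equation (12)); unlabeled rows start at zero.
    Y = np.zeros((n, c))
    Y[np.arange(l), y] = 1.0
    YL = Y[:l].copy()
    for _ in range(max_iter):
        Y_new = T @ Y                       # Step 5: propagate (Equation (13))
        Y_new[:l] = YL                      # Step 6: clamp labels (Equation (14))
        if np.abs(Y_new - Y).max() < tol:   # Step 7: convergence check
            Y = Y_new
            break
        Y = Y_new
    # Step 8: predicted labels of the unlabeled samples (Equation (15)).
    return Y[l:].argmax(axis=1)
```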

3.3. The Proposed Graph-Based Semi-Supervised Model Combined with Particle Cooperation and Competition

According to Sections 3.1 and 3.2 above, we can obtain a dataset $X = \{x_1, \ldots, x_l, x_{l+1}, \ldots, x_{l+u}\}$ with $x_i \in \mathbb{R}^d$, a labeled set $D_L = \{(x_1, y_1), \ldots, (x_l, y_l)\}$, and an unlabeled set with its corresponding predicted labels $D_U = \{(x_{l+1}, y_{l+1}), \ldots, (x_{l+u}, y_{l+u})\}$. Let $S = \{x_1, \ldots, x_l, x_{l+1}, \ldots, x_{l+u}, x_{l+u+1}, \ldots, x_{l+u+t}\} \subset \mathbb{R}^d$ denote the d-dimensional dataset of size $s = l + u + t$, comprising the labeled set of size $l$, the unlabeled set of size $u$, and the testing set of size $t$. Our goal is to correct the misclassified labels in the unlabeled set and predict the labels of the testing set by using the labeled set and the unlabeled set with its predicted labels. The main ideas of the proposed CLPPCC model can be summarized in the following five parts:
  • Initial configuration
Let $V = \{v_1, \ldots, v_s\}$ represent the vertex set of size $s$, and $E = \{(v_i, v_j) \mid v_i, v_j \in V, i \ne j\}$ the edge set, each element corresponding to a pair of vertices. An undirected graph $G = (V, E)$ is constructed based on the dataset $S$. Every vertex $v_i$ corresponds to a sample $x_i$. $E$ can be represented by a similarity matrix $A$, which is calculated by the approach in Section 3.1.
For every labeled sample $x_i \in \{x_1, \ldots, x_l, x_{l+1}, \ldots, x_n\}$, or its corresponding node $v_i \in \{v_1, \ldots, v_n\}$, where $n = l + u$, a particle $\rho_i \in \{\rho_1, \rho_2, \ldots, \rho_n\}$ is generated. The labeled vertices on which particles are placed are called home vertices. Each node $v_i$ corresponds to a vector variable $v_i^{\omega}(t) = \{v_i^{\omega_1}(t), v_i^{\omega_2}(t), \ldots, v_i^{\omega_c}(t)\}$, where $v_i^{\omega_\zeta}(t) \in [0, 1]$, $1 \le i \le n$, is associated with the domination level of team $\zeta \in \{1, \ldots, c\}$ over node $v_i$. The initial domination level vector $v_i^{\omega}$ can be computed as follows:
$$v_i^{\omega_\zeta}(0) = \begin{cases} 1 & \text{if } y_i = \zeta \\ 0 & \text{if } y_i \ne \zeta \text{ and } y_i \in L \\ \dfrac{1}{c} & \text{if } y_i = \varnothing \end{cases} \tag{16}$$
Among them, particles increase the domination level of their own team at a node while reducing those of the other teams, so the total domination level of each node remains constant, $\sum_{\zeta=1}^{c} v_i^{\omega_\zeta} = 1$.
For each particle, the initial position is also called the home node. The initial strength of each particle is given as follows:
$$\rho_j^{\omega}(0) = 1 \tag{17}$$
The initial distance table for each particle is defined as Equation (18).
$$\rho_j^{d_i}(0) = \begin{cases} 0 & \text{if } i = j \\ n - 1 & \text{if } i \ne j \end{cases} \tag{18}$$
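A small Python sketch of this initial configuration (Equations (16)-(18)) may help; the function name and array layout are our assumptions, with one particle per labeled node.

```python
import numpy as np

def init_particles(n, l, y, c):
    """n: total nodes; l: labeled nodes (the first l, which in CLPPCC
    carry ground-truth or predicted labels); y: (l,) labels; c: classes."""
    # Domination levels (Equation (16)): labeled nodes fully dominated by
    # their own class; unlabeled nodes shared equally among all teams.
    omega = np.full((n, c), 1.0 / c)
    omega[:l] = 0.0
    omega[np.arange(l), y] = 1.0
    # Particle strengths (Equation (17)): every particle starts at full strength.
    strength = np.ones(l)
    # Distance tables (Equation (18)): 0 at the home node, n - 1 elsewhere.
    dist = np.full((l, n), n - 1)
    dist[np.arange(l), np.arange(l)] = 0
    return omega, strength, dist
```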
  • Nodes and particles dynamics
During a visit, the domination level $v_i^{\omega_\zeta}(t)$ is updated as follows:
$$v_i^{\omega_\zeta}(t+1) = \begin{cases} \max\left\{0,\; v_i^{\omega_\zeta}(t) - \dfrac{\Delta v\, \rho_j^{\omega}(t)}{c-1}\right\} & \text{if } y_i = \varnothing \text{ and } \zeta \ne \rho_j^{f} \\ v_i^{\omega_\zeta}(t) + \sum_{q \ne \zeta} \left( v_i^{\omega_q}(t) - v_i^{\omega_q}(t+1) \right) & \text{if } y_i = \varnothing \text{ and } \zeta = \rho_j^{f} \\ v_i^{\omega_\zeta}(t) & \text{if } y_i \in L \end{cases} \tag{19}$$
The updated particle strength $\rho_j^{\omega}(t)$ is defined as Equation (20).
$$\rho_j^{\omega}(t+1) = v_i^{\omega_\zeta}(t+1) \tag{20}$$
and the updated distance table is given as follows:
$$\rho_j^{d_k}(t+1) = \begin{cases} \rho_j^{d_i}(t) + 1 & \text{if } \rho_j^{d_i}(t) + 1 < \rho_j^{d_k}(t) \\ \rho_j^{d_k}(t) & \text{otherwise} \end{cases} \tag{21}$$
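The three updates above can be sketched as a single visit step. This is illustrative code under our own naming, with `delta_v` standing in for the $\Delta v$ parameter of Equation (19).

```python
import numpy as np

def visit(omega, strength, dist, labeled_mask, j, cur, nxt, team, delta_v=0.1):
    """Particle j, with class `team`, moves from node `cur` to node `nxt`."""
    c = omega.shape[1]
    if not labeled_mask[nxt]:
        # Equation (19): lower the rival teams' domination levels ...
        rivals = [q for q in range(c) if q != team]
        old = omega[nxt, rivals].copy()
        omega[nxt, rivals] = np.maximum(0.0, old - delta_v * strength[j] / (c - 1))
        # ... and transfer the freed amount to the particle's own team,
        # keeping the node's total domination constant.
        omega[nxt, team] += np.sum(old - omega[nxt, rivals])
    # Equation (20): the particle's strength tracks its team's domination.
    strength[j] = omega[nxt, team]
    # Equation (21): shortest-path style update of the distance table.
    if dist[j, cur] + 1 < dist[j, nxt]:
        dist[j, nxt] = dist[j, cur] + 1
```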
  • Random-Greedy walk
There are two types of walk for selecting a nearby node to visit: the random walk and the greedy walk. In a random walk, the particle $\rho_j$ moves to another node according to the following probabilities.
$$p(v_i \mid \rho_j) = \frac{W_{qi}}{\sum_{\mu=1}^{s} W_{q\mu}} \tag{22}$$
where $q$ is the index of the current node of $\rho_j$.
In a greedy walk, the particle $\rho_j$ moves to another node according to new probabilities, given as follows:
$$p(v_i \mid \rho_j) = \frac{W_{qi}\, v_i^{\omega_\zeta}\, \dfrac{1}{(1 + \rho_j^{d_i})^2}}{\sum_{\mu=1}^{s} W_{q\mu}\, v_\mu^{\omega_\zeta}\, \dfrac{1}{(1 + \rho_j^{d_\mu})^2}} \tag{23}$$
where $q$ is the index of the current node of $\rho_j$, $\zeta = \rho_j^{f}$, and $\rho_j^{f}$ is the label class of $\rho_j$.
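Both walk rules can be sketched together. The mixing probability `p_greedy` is our assumption (the PCC literature typically mixes the two walks with a fixed probability), and the function name is illustrative.

```python
import numpy as np

def choose_next(W, omega, dist, j, cur, team, p_greedy=0.5, rng=None):
    """Pick the next node for particle j (class `team`) at node `cur`."""
    rng = rng or np.random.default_rng()
    w = W[cur].astype(float)
    if rng.random() < p_greedy:
        # Greedy walk (Equation (23)): prefer nodes dominated by the
        # particle's own team and close to its home node.
        w = w * omega[:, team] / (1.0 + dist[j]) ** 2
    # Otherwise a random walk (Equation (22)) over the edge weights alone.
    p = w / w.sum()
    return rng.choice(len(p), p=p)
```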
  • Stop Criterion
For each node, record the average of the maximum domination levels ($v_i^{\omega_\zeta}$, $\zeta = \arg\max_q v_i^{\omega_q}$). Then, stop the overall model when this quantity no longer increases. When the stop criterion is met, each node is labeled, and some are relabeled, with the class that has the highest domination level over it:
$$y_i = \arg\max_{\zeta} v_i^{\omega_\zeta} \tag{24}$$
Overall, according to the sections above, we can build an innovative graph-based semi-supervised model on the basis of constrained label propagation and particle cooperation and competition; a minimal end-to-end sketch is given below.
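For orientation, here is a minimal driver under our assumptions, reusing the sketches above (`constrained_affinity`, `init_particles`, `choose_next`, `visit`). The paper stops when the average maximum domination level no longer increases; for simplicity this sketch uses a fixed iteration budget instead, and `n_steps`, `p_greedy`, and `seed` are our own parameters.

```python
import numpy as np

def pcc_relabel(X, y, l, c, k=5, n_steps=10000, p_greedy=0.5, seed=0):
    """X: (n, d) samples; the first l nodes carry integer labels y
    (ground-truth plus predicted, in CLPPCC); returns a label per node."""
    rng = np.random.default_rng(seed)
    W = constrained_affinity(X, k)            # graph from Section 3.1
    n = X.shape[0]
    labeled_mask = np.zeros(n, dtype=bool)
    labeled_mask[:l] = True
    omega, strength, dist = init_particles(n, l, y, c)
    pos = np.arange(l)                        # particles start at home nodes
    for _ in range(n_steps):
        for j in range(l):                    # each particle takes one step
            nxt = choose_next(W, omega, dist, j, pos[j], y[j], p_greedy, rng)
            visit(omega, strength, dist, labeled_mask, j, pos[j], nxt, y[j])
            pos[j] = nxt
    return omega.argmax(axis=1)               # Equation (24)
```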

4. Experiments and Analysis

Several experiments were performed to evaluate the learning efficiency of our proposal and demonstrate its superiority when compared to alternatives in this section. In our work, all experiments were carried out in a MATLAB R2019b environment on a computer with Intel Core i7-10750H CPU 2.6 GHz and 16 GB RAM.

4.1. Experimental Setup

4.1.1. Hyperspectral Images

  • Indian Pines image
The Indian Pines image was among the first and most famous test data for hyperspectral image classification; it was acquired by the AVIRIS sensor in 1992. The image comprises 145 × 145 pixels and 220 spectral bands covering the range of 0.4 to 2.5 μm, and shows a mixed agricultural/forest area in northwestern Indiana, USA. There are 16 classes in the available ground-truth map, such as alfalfa, corn-mintill, and soybean-notill, which are denoted C1–C16. Figure 1a–c shows the false color image, ground-truth image, and reference map, respectively.
  • Pavia University scene
Figure 1. Indian Pines image. (a) False color image. (b) Ground truth image. (c) Reference map.
In 2003, the ROSIS sensor collected the Pavia University scene. The scene comprises 610 × 340 pixels and 103 spectral bands with wavelengths varying from 0.43 to 0.86 μm. There are nine available classes in the ground-truth map, including asphalt, meadows, and gravel, which are denoted C1–C9 in the experimental results. Figure 2a–c shows the false color scene, ground-truth image, and reference map of the Pavia University scene, respectively.
  • Salinas image
Figure 2. Pavia University scene. (a) False color image. (b) Ground truth image. (c) Reference map.
The Salinas image was collected over Salinas Valley, California, USA, by AVIRIS. The image comprises 512 × 217 pixels and 224 spectral bands; the bands absorbed by water were discarded in our experiments. The false color scene is shown in Figure 3a. As shown in Figure 3b,c, the ground-truth map and reference map include 16 classes, such as fallow, stubble, and celery, denoted C1–C16.
Figure 3. Salinas image. (a) False color image. (b) Ground truth image. (c) Reference map.

4.1.2. Evaluation Criteria

Three evaluation criteria were adopted to judge the effectiveness of each model.
  • Overall Accuracy, OA
The overall classification accuracy directly reflects the classification results by counting the number of correctly classified samples. It is calculated by the following expression:
$$P_{\mathrm{OA}} = \frac{1}{N} \sum_{i=1}^{n} m_{ii} \tag{25}$$
where $N$ represents the total number of samples and $m_{ii}$ represents the number of samples correctly classified into class $i$.
  • Average Accuracy, AA
$$P_{\mathrm{AA}} = \frac{1}{n} \sum_{i=1}^{n} P_{\mathrm{CA},i} \tag{26}$$
where $P_{\mathrm{CA},i} = m_{ii} / N_i$ is the classification accuracy of a single class, with $N_i$ the number of samples of class $i$. AA is the average, over the $n$ classes, of the proportion of samples in each class that are correctly classified. AA equals OA when all classes contain the same number of samples.
  • Kappa Coefficient
$$K = \frac{N \sum_{i=1}^{n} m_{ii} - \sum_{i=1}^{n} N_i\, m_i}{N^2 - \sum_{i=1}^{n} N_i\, m_i} \tag{27}$$
where $N_i$ and $m_i$ denote the number of ground-truth samples and predicted samples of class $i$, respectively. The larger the value of $K$, the higher the consistency between the classified image and the original image.
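All three criteria follow directly from the confusion matrix; a brief sketch follows (our own helper, with the chance agreement of Equation (27) written through the row and column marginals).

```python
import numpy as np

def oa_aa_kappa(conf):
    """conf[i, j]: number of class-i samples predicted as class j."""
    conf = np.asarray(conf, dtype=float)
    N = conf.sum()                               # total number of samples
    diag = np.diag(conf)                         # correctly classified counts
    oa = diag.sum() / N                          # Equation (25)
    aa = np.mean(diag / conf.sum(axis=1))        # Equation (26)
    pe = np.sum(conf.sum(axis=1) * conf.sum(axis=0)) / N**2  # chance agreement
    kappa = (oa - pe) / (1 - pe)                 # Equation (27)
    return oa, aa, kappa
```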

4.1.3. Comparative Algorithms

  • TSVM: Transductive Support Vector Machine algorithm [].
  • LGC: The Local and Global Consistency graph-based algorithm [].
  • LPA: the original Label Propagation Algorithm [].
  • LPAPCC: the original Label Propagation Algorithm combined with Particle Cooperation and Competition without the novel graph construction mentioned in Section 3.1.

4.2. Classification of Hyperspectral Images

To estimate the performance of our proposed CLPPCC algorithm, several related algorithms were used for comparative experiments, namely TSVM, LGC, and LPA. To highlight the contribution of the novel graph construction approach to the increased classification performance, we also compared our proposal with the label propagation algorithm combined with particle competition and cooperation (LPAPCC), which does not use the novel graph construction, that is, LPA optimized only by particle competition and cooperation. Before classification, to eliminate redundant and noisy information simultaneously, the three hyperspectral datasets were preprocessed by the image fusion and recursive filtering feature (IFRF) approach. IFRF is an effective and powerful feature extraction method. Its basic ideas are as follows: First, the hyperspectral image is divided into several subsets of adjacent spectral bands. Then, the bands in each subset are fused by averaging, one of the simplest image fusion methods. Finally, the fused bands are filtered recursively in the transform domain to obtain the final features for classification. For each hyperspectral dataset used in the experiments, the training set was composed of 30% of the samples randomly selected from each class, and the remaining available samples made up the testing set. The training set was further divided into a labeled set and an unlabeled set. For the Indian Pines dataset, the labeled set was made up of 10% of the samples randomly selected from each class of the training set, and the remaining samples formed the unlabeled set without labels. For the Pavia University scene, the percentage of the labeled set was set to 5%; for the Salinas image, it was set to 2%. To account for classes with very few samples, a minimum threshold of five samples per class was set for both the training set and testing set. That is, regardless of the percentage of selected samples, each class had at least five samples in the training set and labeled set. Every experiment obeyed these sample division rules, as sketched after Table 1. Table 1 shows the distribution of each sample set, where the labeled set and unlabeled set are denoted $L$ and $U$, respectively. AA, OA, and the kappa coefficient were used as evaluation criteria to assess the effectiveness of the models. To reduce the impact of randomness on the results, the final recorded result is the average of 10 repeated runs for each hyperspectral dataset and each model.
Table 1. Training and testing samples for the three hyperspectral images.
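The sample-division rule described above can be sketched as follows; the function name and rounding choices are our assumptions, and `labeled_frac` is set per dataset (10%, 5%, and 2% for Indian Pines, Pavia University, and Salinas, respectively).

```python
import numpy as np

def split_hsi(labels, train_frac=0.30, labeled_frac=0.10, min_per_class=5, rng=None):
    """Per class: 30% of the samples for training, a fraction of the
    training set labeled, and at least `min_per_class` samples per set."""
    rng = rng or np.random.default_rng()
    train_parts, labeled_parts = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.where(labels == cls)[0])
        n_train = max(min_per_class, int(round(train_frac * len(idx))))
        tr = idx[:n_train]
        n_lab = max(min_per_class, int(round(labeled_frac * len(tr))))
        train_parts.append(tr)
        labeled_parts.append(tr[:n_lab])       # labeled subset of training
    train_idx = np.concatenate(train_parts)
    test_idx = np.setdiff1d(np.arange(len(labels)), train_idx)
    return np.concatenate(labeled_parts), train_idx, test_idx
```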
The class-specific accuracies are given in Table 2, Table 3 and Table 4. We can observe the results obtained by each algorithm including AAs, OAs, and Kappa coefficients in three hyperspectral datasets, respectively. For the purpose of analyzing the impact of the neighborhood size k , which plays an important role in calculating the affinity matrix A , we set up a comparative experiment for each comparison algorithm and our proposal when k was 5 and 10, respectively (see Table 2, Table 3 and Table 4). The best classification results among the various algorithms are in bold. In order to show and compare the classification performance intuitively, the predicted classification maps of the Indian Pines, Pavia University, and Salinas Valley datasets can be clearly observed in Figure 4, Figure 5 and Figure 6, respectively. In Figure 4, Figure 5 and Figure 6, the neighborhood size k was set to five.
Table 2. Indian Pines image classification results.
Table 3. Pavia University scene classification results.
Table 4. Salinas image classification results.
Figure 4. Indian Pines image classification results: (a) TSVM; (b) LGC; (c) LPA; (d) LPAPCC; (e) CLPPCC. OA, Overall Accuracy.
Figure 5. Pavia University scene classification results: (a) TSVM; (b) LGC; (c) LPA; (d) LPAPCC; (e) CLPPCC. OA, Overall Accuracy.
Figure 6. Salinas image classification results: (a) TSVM; (b) LGC; (c) LPA; (d) LPAPCC; (e) CLPPCC. OA, Overall Accuracy.
From the experimental results on the Indian Pines dataset in Table 2, we can observe that TSVM, LGC, LPA ($k = 5$), LPAPCC ($k = 5$), CLPPCC ($k = 5$), LPA ($k = 10$), LPAPCC ($k = 10$), and CLPPCC ($k = 10$) achieved 0, 1, 2, 4, 3, 1, 1, and 9 best class-specific accuracies, respectively; in particular, CLPPCC ($k = 10$) achieved nine best class-specific accuracies. From Table 3, for the Pavia University dataset, TSVM, LGC, LPA ($k = 5$), LPAPCC ($k = 5$), CLPPCC ($k = 5$), LPA ($k = 10$), LPAPCC ($k = 10$), and CLPPCC ($k = 10$) obtained 0, 0, 2, 5, 0, 0, 1, and 1 best class-specific accuracies, respectively. Here, our proposal performed well not only on classes with many samples but also on classes with few samples. For the Salinas dataset, Table 4 shows that TSVM, LGC, LPA ($k = 5$), LPAPCC ($k = 5$), CLPPCC ($k = 5$), LPA ($k = 10$), LPAPCC ($k = 10$), and CLPPCC ($k = 10$) achieved 1, 3, 7, 5, 5, 2, 4, and 6 best class-specific accuracies, respectively. Our proposal CLPPCC performed well for both $k = 5$ and $k = 10$. The classification results in Figure 4, Figure 5 and Figure 6 show that CLPPCC was more satisfactory than the other compared algorithms when the parameter $k$ was set to five.
From the experimental results shown in Table 2, Table 3 and Table 4, several conclusions can be drawn. First, CLPPCC ($k = 10$) achieved the best OAs, AAs, and kappa coefficients on all three hyperspectral datasets. Second, the classification performance of TSVM, LGC, LPA ($k = 5$), and LPAPCC ($k = 5$) was not stable across the datasets, as they obtained low class-specific accuracies on some small classes. Third, among LPA, LPAPCC, and CLPPCC, CLPPCC had the best OAs, AAs, and kappa coefficients for both $k = 5$ and $k = 10$ on all datasets. LPA and LPAPCC were significantly affected by the value of $k$: when $k$ was set to five, they performed poorly on classes with few samples, such as C1 in the Indian Pines dataset, and when $k$ was set to 10, LPA, LPAPCC, and CLPPCC all improved. CLPPCC was only slightly affected by the value of $k$ and kept good performance regardless of its value. The reason is that the graph construction approach we used builds a graph closely related to the samples; unlike traditional kNN graph construction, our approach is less affected by the value of $k$. Fourth, LPAPCC performed better than LPA for both $k = 5$ and $k = 10$ on all datasets, as it benefits from the particle competition and cooperation mechanism, which can re-label the misclassified unlabeled samples and achieve great effectiveness even when some misclassified labels are present in the training process.

4.3. Running time

We recorded and analyzed the average running time of each model, in seconds, on the three hyperspectral datasets in Table 5. The running times show that LPA ($k = 10$), LPAPCC ($k = 10$), and CLPPCC ($k = 10$) took slightly longer than LPA ($k = 5$), LPAPCC ($k = 5$), and CLPPCC ($k = 5$). As seen from Table 2, Table 3 and Table 4, a larger value of $k$ yielded higher overall accuracy; however, Table 5 shows that the running time increased as $k$ grew from five to 10. Next, although our proposal CLPPCC was slower than LPA, its classification performance was better. This is because the PCC mechanism needs to re-mark and modify the misclassified labels in the unlabeled samples, and the additional running time is exchanged for increases in class-specific accuracy, OA, AA, and kappa coefficient. It is also worth noting that although the OAs of TSVM were not the best among the compared algorithms, its running time was the shortest. The running time of LGC was as short as that of TSVM, but the performance of LGC was worse than that of TSVM.
Table 5. The running time (in seconds) of CLPPCC and the comparative algorithms on HSIs datasets.

4.4. Robustness of the Proposed Method

The effectiveness of our proposed CLPPCC algorithm was appraised in the experiments above. In this section, we analyze the robustness of our proposal and the comparison algorithms in order to evaluate the superiority of our proposal. We compared our method with the alternatives under different labeled-sample sizes and noise conditions, and the experimental results indicate the robustness of our proposal. Here, the parameter $k$ was set to five.

4.4.1. Labeled Size Robustness

Several experiments with different sizes of labeled samples were performed in this section. To reduce randomness, we took the average results of 10 repeated runs as the final recorded result for each hyperspectral dataset. Figure 7 shows the overall classification accuracy of our proposal and the four alternatives for different sizes of labeled samples. The labeled sample percentages were 5%, 10%, 12%, 14%, and 16% for the Indian Pines dataset; 3%, 5%, 7%, 9%, and 11% for the Pavia University scene; and 1%, 2%, 5%, 8%, and 10% for the Salinas image.
Figure 7. HSIs Classification accuracy with varying labeled ratio: (a) Indian Pines image; (b) Pavia University scene; and (c) Salinas image.
From Figure 7, it can readily be seen that the higher the ratio of labeled samples, the better the classification accuracy (OA) each algorithm achieved. Simultaneously, the other evaluation criteria, such as AA, kappa coefficient, and class-specific accuracy, improved as the labeled ratio increased. Note that in most cases our CLPPCC obtained the best OA on all three datasets, which indicates the robustness of CLPPCC across various sizes of labeled samples. It is worth noticing that our proposal achieved a high classification result even with a small labeled set, whereas the compared algorithms rarely obtained high classification results with few labeled samples.

4.4.2. Noise Robustness

The noise robustness is analyzed in this section. We added 20 dB of Gaussian noise to each hyperspectral dataset. We took the average results of 10 repeated runs as the final recorded result for each hyperspectral dataset to reduce the effect of randomness. Figure 8a,b shows the clean and noisy images, respectively. To show the impact of noise on overall accuracy, the overall accuracy results for the three hyperspectral datasets are given in Table 6. Figure 8 shows the false-color clean and noisy images of the Indian Pines image. The classification results for the three noisy hyperspectral datasets and their corresponding OA values are shown in Figure 9, Figure 10 and Figure 11. Here, the parameter $k$ was set to five for all approaches and all three HSIs.
Figure 8. False-color clean and noisy image of the Indian Pines image: (a) clean image; (b) noisy image with Gaussian noise.
Table 6. Classification results with Gaussian noise on three HSIs.
Figure 9. Noisy Indian Pines image classification results: (a) TSVM; (b) LGC; (c) LPA; (d) LPAPCC; (e) CLPPCC. OA, Overall Accuracy.
Figure 10. Noisy Pavia University scene classification results: (a) TSVM; (b) LGC; (c) LPA; (d) LPAPCC; (e) CLPPCC. OA, Overall Accuracy.
Figure 11. Noisy Salinas image classification results: (a) TSVM; (b) LGC; (c) LPA; (d) LPAPCC; (e) CLPPCC. OA, Overall Accuracy.
As shown in Table 6 and Figure 9, Figure 10 and Figure 11, the performances of all algorithms were affected by the Gaussian noise added to the datasets. The performance degradation of our algorithm was the lowest among the compared methods. In addition, the proposed method still maintained a high classification result, which indicates that our method is robust.

5. Conclusions

In this paper, we considered a graph-based semi-supervised problem in which the use of unlabeled samples might deteriorate model performance in HSI classification. In addition, the IFRF method was used to remove redundant feature information from the HSIs. Several conclusions can be drawn from the experiments we performed. First, our proposed CLPPCC is effective for HSI classification: its classification results were superior to those of the comparison algorithms on the three hyperspectral datasets. Second, the constrained graph construction approach plays an important role in CLPPCC, helping it keep a high overall accuracy when the percentage of labeled hyperspectral samples is low. Third, the PCC mechanism used in CLPPCC decreases the impact of label noise in the unlabeled hyperspectral dataset. Finally, CLPPCC is robust to noisy HSIs and also keeps a high overall accuracy under varying labeled ratios.
There are many interesting directions for future work. For instance, the label propagation in our current proposal is based on a classical learning method in which several prior strategies and kinds of knowledge may not be leveraged. More flexible methods that are able to incorporate domain knowledge are worth exploring in the future.

Author Contributions

Conceptualization, Z.H.; Data curation, Z.H. and B.Z.; Formal analysis, Z.H. and B.Z.; Funding acquisition, Z.H., K.X. and T.L.; Investigation, Z.H.; Methodology, Z.H. and K.X.; Project administration, Z.H., K.X. and T.L.; Resources, Z.H. and B.Z.; Software, Z.H.; Supervision, K.X. and T.L.; Validation, Z.H.; Visualization, Z.H.; Writing—original draft, Z.H.; Writing—review & editing, Z.H., K.X., B.Z., Z.Y. and J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No. U1813222); the Tianjin Natural Science Foundation (No. 18JCYBJC16500); the Key Research and Development Project from Hebei Province (No.19210404D); and the Other Commissions Project of Beijing (No. Q6025001202001).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: [http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes].

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Deng, C.; Ji, R.; Liu, W.; Tao, D.; Gao, X. Visual Reranking through Weakly Supervised Multi-graph Learning. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 2600–2607. [Google Scholar]
  2. Vanegas, J.A.; Escalante, H.J.; González, F.A. Scalable multi-label annotation via semi-supervised kernel semantic embedding. Pattern Recognit. Lett. 2019, 123, 97–103. [Google Scholar] [CrossRef]
  3. Joachims, T. Transductive inference for text classification using support vector machines. In Proceedings of the Sixteenth International Conference on Machine Learning, Bled, Slovenia, 27–30 June 1999; pp. 200–209. [Google Scholar]
  4. Blum, A.; Mitchell, T. Combining labeled and unlabeled data with co-training. In Proceedings of the 11th Annual Conference on Computational Learning Theory, Madison, WI, USA, 1998; pp. 92–100. [Google Scholar]
  5. Zhu, X.J.; Ghahramani, Z. Learning from Labeled and Unlabeled Data with Label Propagation; Carnegie Mellon University: Pittsburgh, PA, USA, 2002. [Google Scholar]
  6. Berthelot, D.; Carlini, N.; Goodfellow, L.; Oliver, A.; Papernot, N.; Raffel, C. Mixmatch: A holistic approach to semi-supervised learning. arXiv 2020, arXiv:1905.02249. [Google Scholar]
  7. Sohn, K.; Berthelot, D.; Li, C.; Zhang, Z.; Carlini, N.; Cubuk, E.; Kurakin, A.; Zhang, H.; Raffel, C. Fixmatch: Simplifying semi-supervised learning with consistency and confidence. arXiv 2020, arXiv:2001.07685. [Google Scholar]
  8. Igor, S. Semi-supervised neural network training method for fast-moving object detection. In Proceedings of the 2018 14th Symposium on Neural Networks and Applications (NEUREL), Belgrade, Serbia, 20–21 November 2018; pp. 1–6. [Google Scholar]
  9. Hoang, T.; Engin, Z.; Lorenza, G.; Paolo, D. Detecting mobile traffic anomalies through physical control channel fingerprinting: A deep semi-supervised approach. IEEE Access 2019, 7, 152187–152201. [Google Scholar]
  10. Tokuda, E.K.; Ferreira, G.B.A.; Silva, C.; Cesar, R.M. A novel semi-supervised detection approach with weak annotation. In Proceedings of the 2018 IEEE Southwest Symposium on Image Analysis and Interpretation (SSIAI), Las Vegas, NV, USA, 8–10 April 2018; pp. 129–132. [Google Scholar]
  11. Chen, G.; Liu, L.; Hu, W.; Pan, Z. Semi-Supervised Object Detection in Remote Sensing Images Using Generative Adversarial Networks. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; Institute of Electrical and Electronics Engineers: Piscataway, NJ, USA, 2018; pp. 2503–2506. [Google Scholar]
  12. Zu, B.; Xia, K.; Du, W.; Li, Y.; Ali, A.; Chakraborty, S. Classification of Hyperspectral Images with Robust Regularized Block Low-Rank Discriminant Analysis. Remote Sens. 2018, 10, 817. [Google Scholar] [CrossRef]
  13. Liu, C.; Li, J.; He, L. Superpixel-Based Semisupervised Active Learning for Hyperspectral Image Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 12, 1–14. [Google Scholar] [CrossRef]
  14. Shi, C.; Lv, Z.; Yang, X.; Xu, P.; Bibi, I. Hierarchical Multi-View Semi-supervised Learning for Very High-Resolution Remote Sensing Image Classification. Remote Sens. 2020, 12, 1012. [Google Scholar] [CrossRef]
  15. Zhou, S.; Xue, Z.; Du, P. Semisupervised Stacked Autoencoder With Cotraining for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 3813–3826. [Google Scholar] [CrossRef]
  16. Ahmadi, S.; Mehreshad, N. Semisupervised classification of hyperspectral images with low-rank representation kernel. J. Opt. Soc. Am. A 2020, 37, 606–613. [Google Scholar] [CrossRef] [PubMed]
  17. Mukherjee, S.; Cui, M.; Prasad, S. Spatially Constrained Semisupervised Local Angular Discriminant Analysis for Hyperspectral Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 11, 1203–1212. [Google Scholar] [CrossRef]
  18. Mohanty, R.; Happy, S.L.; Routray, A. A Semisupervised Spatial Spectral Regularized Manifold Local Scaling Cut with HGF for Dimensionality Reduction of Hyperspectral Images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 3423–3435. [Google Scholar] [CrossRef]
  19. Wu, Y.; Mu, G.; Qin, C.; Miao, Q.-G.; Ma, W.; Zhang, X. Semi-Supervised Hyperspectral Image Classification via Spatial-Regulated Self-Training. Remote Sens. 2020, 12, 159. [Google Scholar] [CrossRef]
  20. Hu, Y.; An, R.; Wang, B.; Xing, F.; Ju, F. Shape Adaptive Neighborhood Information-Based Semi-Supervised Learning for Hyperspectral Image Classification. Remote Sens. 2020, 12, 2976. [Google Scholar] [CrossRef]
  21. Triguero, I.; García, S.; Herrera, F. Self-labeled techniques for semi-supervised learning: Taxonomy, software and empirical study. Knowl. Inf. Syst. 2015, 42, 245–284. [Google Scholar] [CrossRef]
  22. Wang, D.; Nie, F.; Huang, H. Large-scale adaptive semi-supervised learning via unified inductive and transductive model. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 482–491. [Google Scholar]
  23. Abu-Aisheh, Z.; Raveaux, R.; Ramel, J.-Y. Efficient k-nearest neighbors search in graph space. Pattern Recognit. Lett. 2020, 134, 77–86. [Google Scholar] [CrossRef]
  24. Yang, X.; Deng, C.; Liu, X.; Nie, F. New l2,1-norm relaxation of multi-way graph cut for clustering. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
  25. Nie, F.; Wang, X.; Jordan, M.; Huang, H. The constrained Laplacian rank algorithm for graph-based clustering. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; AAAI Press: New Orleans, LA, USA, 2016. [Google Scholar]
  26. Breve, F.; Zhao, L.; Quiles, M.; Pedrycz, W.; Liu, C.-M. Particle Competition and Cooperation in Networks for Semi-Supervised Learning. IEEE Trans. Knowl. Data Eng. 2011, 24, 1686–1698. [Google Scholar] [CrossRef]
  27. Breve, F.A.; Zhao, L. Particle Competition and Cooperation to Prevent Error Propagation from Mislabeled Data in Semi-supervised Learning. In Proceedings of the 2012 Brazilian Symposium on Neural Networks, Curitiba, Brazil, 20–25 October 2012; pp. 79–84. [Google Scholar]
  28. Breve, F.A.; Zhao, L.; Quiles, M.G. Particle competition and cooperation for semi-supervised learning with label noise. Neurocomputing 2015, 160, 63–72. [Google Scholar] [CrossRef]
  29. Tan, K.; Zhang, J.; Du, Q.; Wang, X. GPU Parallel Implementation of Support Vector Machines for Hyperspectral Image Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 4647–4656. [Google Scholar] [CrossRef]
  30. Gao, L.; Plaza, L.; Khodadadzadeh, M.; Plaza, J.; Zhang, B.; He, Z.; Yan, H. Subspace-Based Support Vector Machines for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2014, 12, 349–353. [Google Scholar] [CrossRef]
  31. Liu, L.; Huang, W.; Liu, B.; Shen, L.; Wang, C. Semisupervised Hyperspectral Image Classification via Laplacian Least Squares Support Vector Machine in Sum Space and Random Sampling. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 4086–4100. [Google Scholar] [CrossRef]
  32. Haut, J.M.; Paoletti, M.E.; Plaza, J.; Plaza, L.; Plaza, A. Active Learning with Convolutional Neural Networks for Hyperspectral Image Classification Using a New Bayesian Approach. IEEE Trans. Geosci. Remote Sens. 2018, 56, 6440–6461. [Google Scholar] [CrossRef]
  33. Priya, T.; Prasad, S.; Wu, H. Superpixels for Spatially Reinforced Bayesian Classification of Hyperspectral Images. IEEE Geosci. Remote Sens. Lett. 2015, 12, 1071–1075. [Google Scholar] [CrossRef]
  34. Ham, J.; Chen, Y.; Crawford, M.; Ghosh, J. Investigation of the random forest framework for classification of hyperspectral data. IEEE Trans. Geosci. Remote Sens. 2005, 43, 492–501. [Google Scholar] [CrossRef]
  35. Zhang, Y.; Cao, G.; Li, X.; Wang, B. Cascaded Random Forest for Hyperspectral Image Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 1082–1094. [Google Scholar] [CrossRef]
  36. Peerbhay, K.Y.; Mutanga, O.; Ismail, R. Random Forests Unsupervised Classification: The Detection and Mapping of Solanum mauritianum Infestations in Plantation Forestry Using Hyperspectral Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 3107–3122. [Google Scholar] [CrossRef]
  37. Tu, B.; Huang, S.; Fang, L.; Zhang, G.; Wang, J.; Zheng, B. Hyperspectral Image Classification via Weighted Joint Nearest Neighbor and Sparse Representation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 4063–4075. [Google Scholar] [CrossRef]
  38. Blanzieri, E.; Melgani, F. Nearest Neighbor Classification of Remote Sensing Images with the Maximal Margin Principle. IEEE Trans. Geosci. Remote Sens. 2008, 46, 1804–1811. [Google Scholar] [CrossRef]
  39. Zhao, Y.; Su, F.; Fengqin, Y. Novel Semi-Supervised Hyperspectral Image Classification Based on a Superpixel Graph and Discrete Potential Method. Remote Sens. 2020, 12, 1528. [Google Scholar] [CrossRef]
  40. Jamshidpour, N.; Safari, A.; Homayouni, S. A GA-Based Multi-View, Multi-Learner Active Learning Framework for Hyperspectral Image Classification. Remote Sens. 2020, 12, 297. [Google Scholar] [CrossRef]
  41. Zhang, Y.; Cao, G.; Li, X.; Wang, B.; Fu, P. Active Semi-Supervised Random Forest for Hyperspectral Image Classification. Remote Sens. 2019, 11, 2974. [Google Scholar] [CrossRef]
  42. Ou, D.; Tan, K.; Du, Q.; Zhu, J.; Wang, X.; Chen, Y. A Novel Tri-Training Technique for the Semi-Supervised Classification of Hyperspectral Images Based on Regularized Local Discriminant Embedding Feature Extraction. Remote Sens. 2019, 11, 654. [Google Scholar] [CrossRef]
  43. Cui, B.; Xie, X.; Hao, S.; Cui, J.; Lu, Y. Semi-Supervised Classification of Hyperspectral Images Based on Extended Label Propagation and Rolling Guidance Filtering. Remote Sens. 2018, 10, 515. [Google Scholar] [CrossRef]
  44. Xue, Z.; Du, P.; Su, H.; Zhou, S. Discriminative Sparse Representation for Hyperspectral Image Classification: A Semi-Supervised Perspective. Remote Sens. 2017, 9, 386. [Google Scholar] [CrossRef]
  45. Xia, J.; Liao, W.; Du, P. Hyperspectral and LiDAR Classification with Semisupervised Graph Fusion. IEEE Geosci. Remote Sens. Lett. 2020, 17, 666–670. [Google Scholar] [CrossRef]
  46. Cao, Z.; Li, X.; Zhao, L. Semisupervised hyperspectral imagery classification based on a three-dimensional convolutional adversarial autoencoder model with low sample requirements. J. Appl. Remote Sens. 2020, 14, 024522. [Google Scholar] [CrossRef]
  47. Zhao, W.; Chen, X.; Bo, Y.; Chen, J. Semisupervised Hyperspectral Image Classification with Cluster-Based Conditional Generative Adversarial Net. IEEE Geosci. Remote Sens. Lett. 2020, 17, 539–543. [Google Scholar] [CrossRef]
  48. Fahime, A.; Mohanmad, K. Improving semisupervised hyperspectral unmixing using spatial correlation under a polynomial postnonlinear mixing model. J. Appl. Remote Sens. 2019, 13, 036512. [Google Scholar]
  49. Kang, X.; Li, S.; Benediktsson, J.A. Feature Extraction of Hyperspectral Images with Image Fusion and Recursive Filtering. IEEE Trans. Geosci. Remote Sens. 2013, 52, 3742–3752. [Google Scholar] [CrossRef]
  50. Zhou, D.; Olivier, B.; Thomas, N. Learning with Local and Global Consistency; MIT Press: Cambridge, MA, USA, 2003; pp. 321–328. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
