Hyperspectral Image Classiﬁcation Based on Sparse Superpixel Graph

: Hyperspectral image (HSI) classiﬁcation is one of the major problems in the ﬁeld of remote sensing. Particularly, graph-based HSI classiﬁcation is a promising topic and has received increasing attention in recent years. However, graphs with pixels as nodes generate large size graphs, thus increasing the computational burden. Moreover, satisfactory classiﬁcation results are often not obtained without considering spatial information in constructing graph. To address these issues, this study proposes an efﬁcient and effective semi-supervised spectral-spatial HSI classiﬁcation method based on sparse superpixel graph (SSG). In the constructed sparse superpixels graph, each vertex represents a superpixel instead of a pixel, which greatly reduces the size of graph. Meanwhile, both spectral information and spatial structure are considered by using superpixel, local spatial connection and global spectral connection. To verify the effectiveness of the proposed method, three real hyperspectral images, Indian Pines, Pavia University and Salinas, are chosen to test the performance of our proposal. Experimental results show that the proposed method has good classiﬁcation completion on the three benchmarks. Compared with several competitive superpixel-based HSI classiﬁcation approaches, the method has the advantages of high classiﬁcation accuracy (>97.85%) and rapid implementation (<10 s). This clearly favors the application of the proposed method in practice.


Introduction
Although continuous development of hyperspectral sensors makes it easy to collect a large amount of hyperspectral data, labeling the acquired data is expensive [1]. As a result of this fact, semi-supervised hyperspectral image (HSI) classification methods have been extensively investigated in the past decades [2][3][4][5]. In turn, the successful application of HSI classification in urban mapping [6], environment monitoring [7], precision agriculture [8] and other fields promotes the improvement of classification methods. Currently, it still remains an issue worthy of further study to develop new semi-supervised methods to classify an HSI fast and accurately, due to its characteristics of big data [9,10].
Unlike big data recorded in other fields, one of the remarkable features of hyperspectral data is that it clearly contains spatial structure information in addition to spectral information. This means that classification results may be improved by integrating spatial and spectral information based on classification techniques or classifier. Satisfactory classification results of plenty of existing spectral-spatial HSI classification methods have demonstrated the feasibility of this integration [11][12][13][14][15][16]. Using fixed-size window technique, the methods of Markov random field [17], guided filter [18], discontinuity preserving relaxation [19,20], and recursive filtering [21] successfully adopted spatial information to smooth the noisy pixels contained in the HSI. The use of these smoothing techniques in An effective semi-supervised algorithm was developed to detect the community structure in complex networks through discrete potential theory [54]. The potential theory on directed graph was discussed to discover factors of the network formation [55]. The difference between different applications makes it meaningful to further extend this theory to more fields, especially to remote sensing.
Inspired by the works on superpixel-level classifier, label propagation on graph and discrete potential theory, in this work, a novel superpixel graph-based scheme is proposed to classify HSI efficiently. In this proposal, a novel and computationally simple distance is first defined to measure the similarity of two superpixels. A sparse superpixel graph (with superpixels as vertices) is then constructed by local spatial connection and global spectral connection. Based on the constructed superpixel graph and discrete potential theory, the HSI classification is converted into an optimization problem that can be solved by a system of linear algebraic equations. We here adopt the conjugate gradient descent method (CGDM) to find the approximate solution because of its rapid convergence and strong robustness. Finally, according to the greatest potential of each vertex computed by the CGDM, superpixels can be properly labeled. A range of experimental and comparative results on three benchmarks extensively validate our scheme.
The main technical contributions of the proposal include: • An efficient HSI classification scheme is suggested based on sparse superpixel graph. • A computationally simple but effective distance between superpixels is newly defined. • A sparse superpixel graph is constructed by using spectral-spatial connection strategy.

•
The use of CGDM in the proposal speeds up the process of label propagation on graph.
The rest of this paper is organized as follows: Section 2 details the process of the proposed method, including superpixel segmentation, the new definition of distance between superpixels, the construction of sparse superpixel graph and label propagation on graph. Experimental and comparative results are reported in Section 3. Section 4 analyzes the influence of the parameters used in our method. Finally, discussions and conclusions are given in Sections 5 and 6.

Methods
In this section, we describe the proposed method in detail. The classification framework includes dimension reduction, superpixel segmentation, construction of sparse superpixel graph and label propagation on graph, as shown in Figure 1.

Superpixel Segmentation
Superpixel segmentation is often understood as the problem of region localization of an image. Generally, the number of pixels contained in each superpixel is automatically generated in terms of local image structure by using superpixel segmentation algorithms. Thus, different superpixels may have different shapes. These adaptive features of superpixels are obviously propitious to good homogeneity of the generated superpixels.
Among many segmentation algorithms, the graph-based entropy rate superpixel segmentation (ERS) [56] is commonly used in remote sensing because of its excellent performance in good effectiveness and high efficiency. However, the high-dimension property of hyperspectral data does not allow to divide an HSI into superpixels by using classical ERS straightly. Therefore, we need to execute principal component analysis approach in advance and take the first principal component to generate the base image used for segmentation.
The ERS method is an optimization algorithm based on edge clustering. For userdefined number of superpixel p and a set of edges M, the ERS approach splits the base image into superpixels through optimizing the following objective function with restrict conditions according to set M [56]: where R(M) and B(M) are entropy rate term and balancing term, respectively. N M indicates the number of the obtained superpixels; ω is a balancing parameter, and E is an edge set. In Equation (1), the introduction of entropy rate term is to include spatial neighbor pixels with similar spectral information as much as possible in a superpixel. However, this may cause significant differences in superpixel size. The use of the balance term and balancing parameter effectively avoids this phenomenon. Interested readers can consult with Liu's work for more details of ERS method [56].

Distance between Superpixels
It is one of the key issues to properly define the distance between superpixels in the superpixel-level HSI methods. A desired distance should not only better measure the intimate relationship between two superpixels but also should be computationally simple. However, this is a nontrivial task because superpixels have different sizes. In [34,[37][38][39][40], the distance/similarity of a pair of superpixels was defined by using various techniques. Although the similarity of two superpixels can be better measured by these designed distances, it is time-consuming to calculate them.
To address this problem, a simple but effective distance between superpixels is defined in this work. In the suggested distance, we carefully consider the statistical features of a superpixel, that is, mean, median and mode. Specifically, for a generated superpixels S i containing n i pixels, we first construct the mean vector α i 1 , the median vector α i 2 and the mode vector α i 3 by orderly computing the mean, the median and the mode of each band of these n i pixels. The superpixel S i is then approximately represented by a newly defined sample with the same dimension as the original HSI, where w 1 and w 2 are the weights that control the importance of each component. Finally, a novel distance between superpixels S i and S j can be defined as where B is the number of bands.
In Equation (3), distance between the vectors α i and α j may choose the desired distance, for example, the commonly used Euclidean distance. Furthermore, the adoption of the median vector and the mode vector in Equation (2) effectively reduces the impact of noise pixels on distance computation. The advantage of the distance defined here is simple to calculate because it is independent of the shape and size of the superpixels. The validity of the above definition has been widely verified in the experiments carried out in this work.

The Construction of Sparse Superpixel Graph
A majority of existing work on HSI classification demonstrates that the graph-based methods have a promising future, partially due to amazing data presentation ability of graphs. In the field of remote sensing, a vertex of the graphs may represent a pixel or a superpixel. We term a graph as a superpixel graph if the vertices in a graph represent superpixels. It is clear that superpixel graph has a smaller size than the one with pixels as vertices. This means that it is possible to take less time to partition a graph. Considering this advantage of superpixel graph, we use a superpixel graph to represent an HSI for classification.
It is known that there is abundant spatial information in a given HSI, besides rich spectral information. Notably, the effective fusion of spectral and spatial information is helpful to improve the classification results. Therefore, we adopt the method of global spectral connection and local spatial connection to construct desired superpixel graph by using KNN rule twice.
During the spectral connection, we firstly calculate the distances of the superpixel S i to the rest according to Equation (3), and then arrange them in an ascending order. For user-specified k 1 , we finally connect the superpixel S i to its first k 1 superpixels. In this stage, we only consider the spectral information of superpixels and search the nearest neighbors of the superpixel in a global range. This connection approach attempts to deal with the problem that the superpixels belonging to the same class are disconnected or far from each other in space.
In the process of local connection, for each superpixel, one connects it to k 2 nearest neighbors in all its spatial adjacent superpixels in terms of Equation (3). This link strategy aims to maintain local consistency by density connections.
The constructed graph is an efficient combination of spectral-based graph and spatialbased graph. Please note that a superpixel may be linked twice in global and local connections. It has no effect on the constructed superpixel graph because we only record the link regardless of the number of links.
The superpixel graph constructed here is an unweighted and undirected sparse graph. This provides the basis to predict the classes of label-free vertices using a variety of classical or popular label propagation techniques. In next subsection, we would like to use the discrete potential theory and conjugate gradient descent algorithm to predict the class of each unlabeled vertex in the constructed graph.
Let G = (V, E) be the constructed superpixel graph, where V = v 1 , v 2 , · · · , v p is the set of vertices and E ⊆ V × V is the collection of edges. Commonly, we use an adjacent matrix A to describe the graph G, where the element a i,j of A is equal to 1 if there is an edge between vertices v i and v j and 0 otherwise.
The Laplace matrix of graph G is usually defined as where the diagonal matrix D is the degree matrix of A. The matrix L is obviously symmetry and positive semi-definite.

Label Propagation on Graph
A graph can be regarded as an electrostatic field if one or more vertices are assigned to an electric potential 1 and some vertices to 0. Particularly, the electrostatic field is termed as m-label generation if we assign an unit potential to the vertices with label m and zero to other labeled vertices with class label rather than m. This electrostatic field is simply denoted by m-EF. The potentials of the reminding unlabeled vertices can be calculated by solving a system of linear algebraic equations derived from combinatorial Dirichlet problem with the boundary conditions [52]. In this sense, the process of potential transmission in the electrostatic field may be considered as the procedure of label propagation on the graph.
In a continuous situation, for a field u and a region Ω, the Dirichlet integral is defined as [52,54] The Dirichlet problem is to find a harmonic function satisfying Laplace equation and its boundary condition and minimize the Dirichlet integral.
The graph based discrete combinatorial formulation of Equation (5) can be expressed as [54] Min : where p-dimensional vector X denotes the potentials of all vertices. Now, the aim is to search for a vector X to minimize Equation (6). It is easy to divide the set of vertices V into two groups, V L (labeled vertices) and V U (unlabeled vertices) such that V L ∩ V U = ∅ and V L ∪ V U = V. Without loss of generality, we put the labeled vertices in front, then label-free vertices. In light of matrix theory, Equation (6) may also be decomposed into the following form where the vectors X L and X U denote the potentials of labeled vertices and unlabeled vertices, respectively. Differentiating D(X U ) with respect to X U and setting it to zero yield an algebraic system [54] L U X U = −H T X L .
In Equation (8), the matrix L U is sparse, symmetric and positive definite. The vector X L is composed of 1 or 0 which can be easily determined by an m-EF (regardless of the order of the labeled vertices). Among many methods to solve a system of linear algebraic equations, the conjugate gradient descent method (CGDM) has been widely used in various engineering problems because of its advantages of rapid convergence, less space requirement and strong robustness [54,57]. In this work, we would like to adopt the CGDM to solve the above equation. Therefore, the potentials of unlabeled vertices can be obtained by the solution of Equation (8).
Assume that the given HSI consists of C different classes and at least one pixel in each class is labeled. The superpixel will be assigned a class label if one or more pixels in this superpixel are labeled. Thus, based on this assumption, C such systems are deduced by using the proposed method. Solving these C systems, for each unlabeled vertex v i , one can obtain its C potentials, that is, Under the restriction condition ∑ C m=1 x m i = 1 (for any v i ), the potential x m i may be understood as the probability that vertex v i belongs to the m-th class. Accordingly, we use the following rule to label the unlabeled vertices In Figure 2, we take a graph with nine vertices as an example to explain our method. It is clear that this graph contains two apparent groups. Marking the vertex v 1 and v 6 as labeled vertices, one can derive two algebraic systems like Equation (8). Using CGDM to solve them, the electrostatic field shown in Figure 2b is generated. Similarly, we can get another electromagnetic field illustrated in Figure 2c. As a result, one can easily classify these vertices into two classes according to Equation (9). The above labeling rule also means that we only consider the approximate solution of Equation (8). In other words, the iteration process of the CGDM will be terminated early by specifying a relatively large threshold, for example 10 −2 . Additionally, the realization of finding the solutions to C systems can also be carried out in parallel. These considerations are conducive to the fast implementation of our method.
The proposed algorithm can be summarized as follows Initialize A Assign parameters to variables Call for PCA to generate the base image Execute ERS to segment the base image into p superpixels for i = 1 to p for j = 1 to p (8) from the m-EF Execute CGDM to solve Equation (8) Use Equation (9) to assign the class label to each unlabeled vertex end procedure The main computation time of the Algorithm 1 is spent on computing PCA, ERS, global spectral connection and CGDM. Suppose the given HSI has n pixels, B bands and C classes and is split into p superpixels. It will take about O(B 2 ), O(nlogn + p 2 ), O(p 2 B + p 2 logp) and O(p 2 C) to implement above four steps, respectively. Therefore, the complexity of the proposed method is approximately O(nlogn + p 2 B + p 2 C). Compared with pixel-level classification methods with at least O(n 2 B) complexity (n p), this means that our superpixel-level classification scheme is computationally efficient.

Description of Three Datsets
The Indian Pines scene, recorded by the AVIRIS sensor in June 1992, contains 16 different types of crops planted in Indian Pines test site in Northwestern Indiana. The image has a spatial resolution of 20 m per pixel and 200 bands of size 145 × 145 (after 20 water absorption bands removing).
Different from Indian Pines image, Pavia University scene is an urban scene around Pavia University, Pavia, Italy. This image acquired by ROSIS satellite sensor in 2001 has 103 bands (after 12 most noisy channels were abandoned) and is of size 610 × 340, with a spatial resolution of 1.3m. It contains nine different urban ground objects.

Experimental Setup
To validate the performance of our proposal, we have tested the proposed classification method SSG on three common hyperspectral benchmarks, that is, Indian Pines, Pavia University and Salinas scenes. In our experiments, weighted parameters w 1 and w 2 are 0.5 and 0.4, respectively; k 1 is equal to 2 for three images; the connection parameters k 2 is 5 for Pavia University and Salinas images and 6 for Indian Pines image. In our experiments, each class is randomly labeled, and the number of labeled samples and label-free samples are listed in Table 1. The training set consists of all labeled samples. The remaining label-free pixels make up the test set. In order to objectively show the performance of the proposed method, all experiments are reported using an average and standard deviation of ten independent tests. Like in many exiting work, three evaluation criteria, Overall Accuracy (OA), Average Accuracy (AA) and Kappa Coefficient (κ) are employed to evaluate the whole performance of classification methods.
Additionally, we compared the proposed method with several competitors to prove the superiority of our proposal. These comparative algorithms include graph convolutional network (GCN) [48], edge-preserving filters (EPF) [58], image fusion and recursive filtering (IFRF) [21], superpixel-based classification via multiple kernels (SCMK) [22], spectralspatial HSI classification method at superpixel level (SSC-SL) [37], multiscale dynamic graph convolutional network (MDGCN) [50] and superpixel pooling convolutional neural network (SPCNN) [10]. We re-implemented these seven comparison methods (the authors have shared their codes) which use the same parameter settings as the original. The methods of GCN, MDGCN and SPCNN are based on deep learning; SCMK, SSC-SL, MDGCN and SPCNN are superpixel level HSI classification methods; EPF and IFRF are pixel-wise spectral-spatial HSI classification methods.

Classification Results
The classification results of seven competitors and our method on Indian Pines image are tabulated in Table 2. The classification accuracy obtained by the five superpixel-level classification methods is better than that of the other three pixel-wise classification algorithms. This may be due to the fact that the spatial spectral information of this image can be more fully explored by using adaptive size and shape of superpixels in the classification. Compared with IFRF, the accuracy of EPF method is relatively low (82.59%), probably because the edge-preserving filtering technique is only used in the classification maps to correct misclassified pixels. The classification accuracy of our method is improved by at least 2% (97.85% vs. 95.83%) contrast to other four superpixel-level competitors, SCMK, SSC-SL, MDGCN and SPCNN. Furthermore, the proposed SSG method outperforms other competitive algorithms in terms of the three evaluation indicators. The visualization of classification results of eight different methods are shown in Figure 3.
For Indian Pines image, there are 145 × 145 (21,025) pixels. Among them, the number of pixels to be classified is 10,249. The classification accuracy 97.83% means that there are about 223 misclassification pixels. Assuming that this image is divided into 1000 superpixels, each one contains approximately 20 pixels. Therefore, at least 21 superpixels are misclassified because of many misclassified small volume superpixels produced after removing the background pixels. This is why the visuals are unsatisfactory.   Table 3 reports the statistical results of eight algorithms on Pavia University scene. With the exception of the GCN method, the classification results of all spatial-spectral methods are satisfactory (greater than 95%) because of the high resolution of the image and the good spatial separation between classes. Both MDGCN, SPCNN and our method have a classification accuracy of more than 98.5%. There is no significant difference in the classification results of these three methods. This shows that the proposed method is comparable to that based on deep learning. Compared with EPF, IFRF, SCMK and SSC-SL methods, the classification accuracy raised by at least 2.6%. Of the eight approaches, the GCN method has the lowest classification accuracy. The reason should be that, on the one hand, this method utilizes only spectral information for pixel-wise classification. On the other hand, the neural network classifier adopted in GCN requires a large number of training samples to achieve satisfactory result. The classification maps of these methods are presented in Figure 4.   Table 4 lists the classification results of various algorithms for Salinas image. The classification accuracy of seven methods is greater than 90% in the case of 1% pixels randomly labeled per class. The classification accuracy of EPF and IFRF methods is 91.55% and 92.06%, respectively, which is lower than that of the other five superpixel based methods. It may be due to the use of dimensionality reduction technique in these two methods, resulting in the loss of some information. There was no significant difference in classification results (from 95. 88% to 96.83%) among of SCMK, SSC-SL and SPCNN methods since they all used spatial adaptation technique. Our method is slightly better than the MDGCN method, which reaches 98.97%. In particular, for the two classes easily misclassified, i.e., the 8-th class and the 15-th class, the average accuracy of 99.52% and 99.26% indicates the superior performance of our method locally. As can be seen from Figure 5, our method shows good classification performance in both the global and local regions.

Effect of the Number of Superpixels and Different Number of Training Samples
The classification results of our proposed scheme depend on the choice of superpixel number and labeled sample number. As with many existing superpixel-based methods, the optimal parameter value was obtained as an experimental result, not as a pre-specified one. Therefore, the effect of different superpixel number and different number of training samples on the performance of the proposed SSG approach will be analyzed in this section.
The OA values recorded in our experiments are the average of ten independent runs.
In our experiments, the test superpixel number was from 700 to 1800. The number of training samples and test samples is the same as that listed in Table 1. In Figure 6, with the increase of the number of superpixels, the change of OA values shows a similar trend on three scenes, that is, rising first and then decreasing. The difference between them is that the superpixel number corresponding to the optimal OA value and the magnitude of the change are different. The reason may be that a small superpixel number usually means that the generated superpixels have large sizes. This increases the risk that the objects belonging to different classes are located in the same superpixel. In addition, although the superpixel with small size has better homogeneity, it weakens the role of spatial information in classification. The experimental results show that when the number of superpixels is equal to 1000 for Indian Pines and Pavia University images and 1500 for Salinas image, our method provides satisfactory classification results on three scenes. The performance of semi-supervised classification methods generally relies on the number of training samples. Although excellent classification results can be obtained using a large number of training samples, the marking of samples is expensive. Thus, using a small number of labeled samples to obtain satisfactory classification results can often verify the performance of the classifier. The number of superpixels is identical to that determined in the previous part. Figure 7 displays the impact of different numbers of training sample on several classification methods. The number of randomly labeled pixels per class varies from 5 to 30 for these three scenes. It is easy to see that the classification accuracy of all methods gradually improves as the increase of training pixels. Due to lack of spatial information and the use of neural network in classification, the classification results of GCN were unsatisfactory. Compared with other six spectral-spatial classification algorithms, the proposed scheme can always acquire better performances for different numbers of training samples for different images. Additionally, we further compared the running time of these classification methods. All experiments were implemented using MATLAB 2018a on the computer with TM i7-6700 CPU, 3.40GHz, 16GB memory and NVIDIA GEFORCE RTX 1660 GPU. The running time here includes all the computational times for PCA, ERS, graph construction, label propagation and classification. Table 5 reports the running time (average of ten runs) of these methods on Indian Pines (IP), Pavia University (PU) and Salinas (SA) images with 30 labeled pixels per class. Three deep learning-based classification framework, GCN, MDGCN and SPCNN, are time-consuming since there are massive parameters to be optimized. One can observe from Table 5 that the computation time of our method is less than 10 seconds on three images and much less than that of other competitors. This satisfactory result benefits from the computational efficiency of the ERS algorithm, the abstract representation of superpixels, the computational simplicity of the graph construction and the rapid implementation of solving algebraic equations by CGDM approach. The results in Tables 2-5 confirm that the proposed method can accurately and quickly realize the classification of hyperspectral data.
One is how to determine the optimal number of superpixels. As shown in Figure 5, different superpixel segmentation scales will result in different classification results. To date, the determination of the optimal number of superpixels still depends on the experimental results [10,20,22,27,32,[34][35][36][37][38][39][40]. Although the multiscale-based approach can partially address this problem [18,33], the problem of how to choose segmentation scales still remains. There may be two ways to address this problem: (1) define an indicator to determine superpixel segmentation scale for a given hyperspectral image; (2) eliminate or alleviate the effect of segmentation scale on the classification results by merging superpixels.
The other is how to properly measure the similarity between two superpixels and pixels. Generally, it is difficult to properly define the distance between the two superpixels due to its characteristic of adaptive shape and size. In [32,[37][38][39][40], the authors try to use various techniques to address this problem, such as affine hull model [32,38], covariance matrix [40], KNN-based [37,39]. The computation complexity of the distance between two superpixels is at least O (n i × n j × B) in these studies. In our work, the complexity of calculating the distance between the two superpixels is approximately O (n i + n j ) × B. As expected, calculating the defined distance is more efficient than existing methods.
In Equation (2), we only consider three commonly used statistical features of superpixels to construct a new sample to approximately represent a superpixel. Perhaps, more factors should be considered in such a representation, for example, the entropy, texture features or variance of superpixels. In addition, it is also important to choose an appropriate metric when calculating the distance between superpixels defined in Equation (3). For example, if the Euclidean distance (ED) in Equation (3) is replaced with spectral angular mapping (SAM) [59], different classification results and running times are obtained, as shown in Table 6. In general, using different metrics may yield different classification results for the same classification algorithm. Thus, it is necessary to choose the appropriate metrics for different data types. It is also important to select the appropriate metrics when calculating the distance between the superpixels defined in Formula (3).
Additionally, the sparse superpixel graph constructed here is an unweighted graph. Thus, some popular graph partitioning algorithms can be used directly. Theoretically, weighted directed graphs can better represent the close relationship between vertices. However, the intimacy between the two vertices is often asymmetric. In this case, the proposed method cannot work well, and a redesign of new label propagation method is required. Therefore, it is still worth to further explore how to represent hyperspectral data with a weighted directed superpixel graph and study its division.

Conclusions
Based on the constructed superpixel graph and discrete potential theory, in this work, we present a superpixel-level semi-supervised HSI classification method. The merits of the proposal are the following: (i) Unlike the existing definition of distance between superpixels, the only use of a vector to approximately represent a superpixel makes it easy and fast to calculate the defined distance between a pair of superpixels. (ii) The classification results of three scenes demonstrate that the strategy of global spectral connection and local spatial connection can better preserve the spectral and spatial relations of the HSI in the constructed superpixel graph. (iii) In the constructed superpixel graph, taking each superpixel as a vertex instead of a pixel actually reduces the hyperspectral data spatially, so that our method can be realized quickly. (iv) The label propagation procedure based on discrete potential theory, the CGDM and approximate solution speeds up the implementation of the scheme again. Experimental and comparative results in this paper, respectively, confirm the validity of the proposed classification scheme and outperform other state-of-the-art methods in terms of classification accuracy and running time.