Novel Semi-Supervised Hyperspectral Image Classification Based on a Superpixel Graph and Discrete Potential Method

Abstract: Hyperspectral image (HSI) classification plays an important role in the automatic interpretation of remotely sensed data. However, it is a non-trivial task to classify an HSI accurately and rapidly because of its large data volume and numerous noisy points. To address this problem, a novel, semi-supervised, superpixel-level classification method for an HSI is proposed in this work based on a graph and discrete potential (SSC-GDP). The key idea of the proposed scheme is the construction of a weighted connectivity graph and the division of this weighted graph. Based on the superpixel segmentation, a weighted connectivity graph is constructed using the weighted connections between a superpixel and its spatial neighbors. The generated graph is then divided into different communities/subgraphs by using a discrete potential and the improved semi-supervised Wu-Huberman (ISWH) algorithm. Each community in the weighted connectivity graph represents a class in the HSI. The local connection strategy, together with the linear complexity of the ISWH algorithm, ensures the fast implementation of the suggested SSC-GDP method. To demonstrate the effectiveness of the proposed spectral-spatial method, two public benchmarks, Indian Pines and Salinas, were used to test its performance. The comparative test results confirmed that the proposed method was superior to several other state-of-the-art methods.


Introduction
With the rapid development of hyperspectral remote sensing, computers, and communication technology, a large amount of hyperspectral data containing hundreds of narrow spectral bands and rich spatial information has been collected in the past few decades. Compared with multispectral images, a hyperspectral image (HSI), with its detailed spectral information and abundant spatial texture, significantly improves the ability to identify land cover and is thus widely applied in various fields, such as ocean exploration, environmental monitoring, and urban planning [1][2][3]. However, classifying an HSI quickly and accurately remains a challenging problem in remote sensing because of its high dimensionality, large data volume, and numerous noisy pixels.
In the remote sensing literature, feature extraction and feature selection are two groups of effective dimensionality-reduction techniques for coping with a high-dimensional HSI. Feature extraction methods include principal component analysis (PCA) and its various improved versions [4][5][6], locally linear embedding [7], and linear discriminant analysis [8]. Typical feature selection approaches include techniques based on optimization algorithms [9,10], as well as geometry-based [11] and clustering-based [12] techniques. As an important preprocessing step for HSI classification, these methods not only speed up the classification algorithm but also avoid the Hughes phenomenon (the classification accuracy first rises and then falls as the number of bands increases). At the same time, the volume of the HSI decreases sharply when the dimensionality of the pixels is reduced.
To manage the problem of noisy data, the spatial structure information of pixels is considered in the classification process. In the discontinuity-preserving relaxation [13], guided filter [14], and recursive filtering [15] methods, spatial information is used to de-noise the raw HSI while attempting to preserve class boundaries. The Markov random field technique [16,17] adopts a fixed-size window centered on the tested pixel to smooth the data. A large number of experimental results demonstrate that these methods can significantly improve the classification accuracy.
Given the large data volume of an HSI, it is natural to classify hyperspectral images using machine learning methods, such as support vector machines [18,19] and multinomial logistic regression [13]. Generally speaking, spectral information alone does not yield satisfactory classification results. As such, the spatial information provided by the HSI itself has gradually been integrated into classifiers. In multiple kernel techniques [20,21], the similarity between two pixels is measured more reliably by combining the spatial information with the spectra of the pixels. Additionally, some spectral-spatial classification methods incorporate spatial information into other techniques, such as sparse representation [22,23] and low-rank representation [24,25], to improve the classification performance.
Among the various spectral-spatial classification approaches, superpixel-based HSI classification methods have recently attracted increasing attention in remote sensing [26][27][28][29][30]. Superpixels are homogeneous regions containing a set of spatially adjacent pixels with similar textures and colors. Moreover, the shape and size of a superpixel adapt to the local structures of the image. Superpixel segmentation algorithms include entropy rate superpixel segmentation (ERS) [31], simple linear iterative clustering (SLIC) [32], SEEDS [33], and watersheds [34]. Superpixel homogeneity implies that the pixels within a superpixel are more likely to belong to the same class. A majority of superpixel-based HSI classifiers exploit this homogeneity to correct the pre-classification results obtained using pixel-wise classification algorithms. As a result, the classification accuracy is greatly improved. Recently, based on an affine hull model and singular value decomposition, Lu et al. [35] defined a set-to-set distance and proposed a spectral-spatial classification method for an HSI at the superpixel level. Tu et al. [36] suggested another superpixel-level HSI classification process, in which the distance between two superpixels is calculated by selecting a subset of pixels from each superpixel. Very recently, Xie et al. proposed an effective superpixel-level classification method by introducing a new similarity between superpixels and using the k-nearest neighbor (KNN) rule [37]. In superpixel-level HSI classifiers, each superpixel is regarded as a new sample, so the volume of hyperspectral data can be greatly reduced. This implies that the classification process will likely be completed in a shorter time. Thus, it is worth developing new superpixel-level HSI classification methods in the field of remote sensing.
A graph is a powerful tool for data representation. In remote sensing, graph-based methods have previously been investigated to classify hyperspectral data [38][39][40][41][42]. Camps-Valls et al. first introduced a semi-supervised, graph-based classification method for an HSI [43]. They adopted different spectral and spatial kernels to construct pixel-based weighted graphs. Gao et al. presented a bi-layer, graph-based learning method to classify an HSI with a limited number of labeled pixels [44]. Cui et al. [45] utilized extended label propagation and a rolling guidance filter to develop a semi-supervised HSI classification method. This method adopted a graph-based label propagation algorithm to predict the pixel labels and corrected some mislabeled pixels using superpixels. Taking each superpixel as a node, Sellars et al. [46] proposed a graph-based learning framework for HSI classification, in which the final classification was completed using a local and global consistency algorithm. In most existing graph-based classification methods, the graph is constructed at the pixel level; the method proposed by Sellars et al. [46] is an exception. Generally speaking, the classification accuracy and speed of these graph-based methods depend largely on how the graph is constructed and partitioned. Therefore, it is of great importance to put forward new superpixel-based methods for graph construction and fast graph partitioning techniques.
Deep-learning-based HSI classification methods have been extensively studied in recent years [47]. Early work on HSI classification using deep learning techniques can be traced back to a deep model based on stacked auto-encoders [48]. Adopting convolutional kernels automatically learned from the HSI, Ding et al. [49] introduced an HSI classification approach using convolutional neural networks. Based on a convolutional neural network and a generative adversarial network, Chen et al. designed a semi-supervised, fine-grained classifier and demonstrated the effectiveness of their method on the Indian Pines dataset [50]. Combined with band selection, Sellami et al. proposed a semi-supervised HSI classification method based on a 3D convolutional auto-encoder [51].
Graphs are also known as complex networks in computer science and physics. Considerable evidence shows that complex networks commonly have a significant feature, namely, a community structure. Although there is no universally accepted formal definition of a community at present, a community is generally considered to be a sub-network whose nodes are densely linked to each other, while the connections between communities are sparse [52]. For HSI classification, such a community can be regarded as a class in a hyperspectral dataset. Among the many community detection methods [53][54][55][56], the unsupervised Wu-Huberman algorithm (UWH), based on a discrete potential [55], is a powerful algorithm for discovering communities in complex networks because of its speed. However, several assumptions of the UWH limit its application. Therefore, we extend the UWH into a semi-supervised method to accomplish the task of community detection in weighted connected networks.
To address the aforementioned problems, this study proposes a semi-supervised superpixel-level HSI classification method based on a graph and discrete potential (SSC-GDP), aiming to classify HSIs accurately and quickly. In the proposed classification scheme, each superpixel is viewed as a node in a graph. This leads to a significant reduction in the volume of the hyperspectral data to be classified. Subsequently, we link each superpixel to its spatial neighbors to construct a weighted connectivity graph. The weight of each edge represents the affinity between a superpixel and one of its neighbors. Communities in the constructed weighted connectivity graph are detected using the improved semi-supervised Wu-Huberman algorithm (ISWH). Finally, the HSI classification is completed by mapping each community in the generated graph to a class of the HSI. The sparseness of the generated graph and the linear complexity of the ISWH ensure the fast implementation of the proposed classification scheme. This method is a novel attempt at applying community detection techniques to the problem of remote sensing data classification. Experimental results confirm the effectiveness of the proposed classification scheme. In summary, this work makes the following contributions:
1) A novel, semi-supervised, superpixel-level classification scheme for HSIs is proposed based on a discrete potential technique, which is an effective superpixel graph-based method.
2) It is the first time that the discrete potential technique has been adopted to classify HSIs.
3) The unsupervised Wu-Huberman algorithm is extended into a semi-supervised method to better detect communities hidden in weighted connectivity graphs.
4) Unlike existing superpixel-based methods, the proposed classification algorithm works on the whole HSI, not just the reference data.

Methodology
In this section, we describe the proposed classification framework in detail. It involves three main parts: the superpixel segmentation, the construction of the weighted graph, and the community detection in the generated graph. The proposed method is illustrated in Figure 1.

Superpixel Segmentation
Let $\mathrm{HSI} = \{x_1, x_2, \cdots, x_n\} \subset \mathbb{R}^B$ be a hyperspectral dataset with n pixels and B bands. Superpixel segmentation partitions the HSI into non-overlapping regions of adaptive size and shape such that, in each region, the spectra of the pixels are as similar as possible and the pixels are as spatially close as possible. It can be mathematically expressed as:
$$\mathrm{HSI} = \bigcup_{i=1}^{m} S_i, \quad S_i \cap S_j = \emptyset \ (i \neq j), \qquad (1)$$
where $S_i$ is a superpixel containing $n_i$ pixels and m is the number of superpixels. Hyperspectral data with hundreds of spectral bands cannot be partitioned into superpixels by directly using the classical superpixel segmentation algorithms. This is because the classical superpixel segmentation methods were originally designed to segment color images and operate on data with no more than three dimensions. Therefore, to split an HSI into superpixels, it is necessary to carry out a dimension reduction on the HSI in advance. In this study, we chose the popular unsupervised PCA method to reduce the dimension of an HSI and took the first principal component to generate the base image, since the first principal component contains abundant information about the original hyperspectral data.
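The base-image step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the HSI is held as a NumPy array of shape (H, W, B), the function name `first_principal_component` is hypothetical, and the rescaling to [0, 1] is an added convenience that the text does not prescribe. The ERS segmentation itself is not included.

```python
import numpy as np

def first_principal_component(hsi):
    """Project an (H, W, B) hyperspectral cube onto its first principal component.

    Returns an (H, W) base image rescaled to [0, 1] (the rescaling is an
    assumption made here for convenience, not part of the described method).
    """
    h, w, b = hsi.shape
    pixels = hsi.reshape(-1, b).astype(np.float64)  # one row per pixel
    pixels -= pixels.mean(axis=0)                   # center each band
    # SVD of the centered data: the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(pixels, full_matrices=False)
    scores = pixels @ vt[0]                         # first-component scores
    lo, hi = scores.min(), scores.max()
    if hi > lo:
        scores = (scores - lo) / (hi - lo)
    else:
        scores = np.zeros_like(scores)
    return scores.reshape(h, w)
```

The resulting single-band image is what a color-image segmentation algorithm such as ERS or SLIC would then receive as input.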
Many studies have shown that the graph-based ERS method [31] is commonly used for the superpixel segmentation of HSIs [20,27,36] because of its efficiency and effectiveness. Thus, the ERS segmentation method was used in our proposal to generate a superpixel map. Alternatively, the popular SLIC algorithm [26,30], one of the six practical algorithms recommended in [57], can also be adopted to replace the ERS in our proposal because of its good adherence to class boundaries. However, this would involve more computational costs. There is no significant difference between the classification results obtained using the ERS and SLIC methods in our proposed classification scheme.
In the graph-based ERS method, the base image generated by the first principal component is first mapped to a graph. Each pixel in the base image is treated as a node and the similarity between a pair of pixels is considered as the weight of an edge. For the user-predefined number of superpixels m, the ERS method partitions the graph into at least m disjoint subsets $S_i$ by clustering and optimizing the following objective function with respect to the selected edge set A:
$$\max_{A} \; H(A) + \lambda B(A) \quad \text{s.t.} \quad A \subseteq E, \; N_A \geq m, \qquad (2)$$
where H(A) is the entropy rate term that is responsible for the compactness and homogeneity of the obtained subsets, B(A) is the balancing function that encourages subsets of approximately the same size, E is the edge set of the graph, $N_A$ denotes the number of connected subgraphs in the graph, and $\lambda$ is the weight of the balancing term. A greedy algorithm can be used to solve the optimization problem in Equation (2). A detailed description of the ERS method can be found in Liu et al. [31]. The procedure for the generation of the superpixels of an HSI is illustrated in Figure 1.

The Construction of the Weighted Connectivity Graph
A large body of work has shown that graph-based methods are popular and effective techniques in HSI classification [38][39][40][41][42][43][44][45][46] because graphs offer a flexible data representation. In graph-based HSI classification methods, nodes in the generated graph can represent pixels [38][39][40][41][42][43][44][45] or superpixels [46]. Treating superpixels as nodes implies a sharp reduction in the size of the constructed graph, which in turn makes it possible to quickly divide the graph into non-overlapping subgraphs. Accordingly, the superpixels obtained using the ERS method are regarded as nodes in the constructed graph.
In graph-based classification methods, the construction of the graph is essential because the classification result is closely related to it. Two commonly used ways to generate graphs in machine learning are the k-nearest neighbors method and the ε-neighborhood technique. Notably, unlike other datasets, hyperspectral datasets contain massive amounts of spatial information in addition to a large amount of spectral information. This means that we can use the spectral features of pixels, together with the spatial information provided by superpixels, to construct graphs. Additionally, the closeness between nodes and the sparseness of the constructed graph are favorable for fast and accurate graph division. Starting from this idea, we introduce the following method for generating the desired graph.
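Because edges are drawn only between spatially adjacent superpixels, the first ingredient of the graph is a neighbor list read off the segmentation map. The sketch below shows one way to obtain it; it assumes the segmentation is an (H, W) integer label map and that adjacency means 4-connectivity, neither of which is specified in the text, and the function name `superpixel_adjacency` is hypothetical.

```python
import numpy as np

def superpixel_adjacency(labels):
    """Return {superpixel id: set of spatially adjacent superpixel ids}.

    `labels` is an (H, W) integer map; 4-connectivity is assumed.
    """
    labels = np.asarray(labels)
    neighbors = {int(s): set() for s in np.unique(labels)}
    # Compare every pixel with its right neighbor and with its lower neighbor.
    pairs = [(labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])]
    for a, b in pairs:
        boundary = a != b  # True where two different superpixels touch
        for p, q in zip(a[boundary], b[boundary]):
            neighbors[int(p)].add(int(q))
            neighbors[int(q)].add(int(p))
    return neighbors
```

Each superpixel typically touches only a handful of others, so the number of edges grows roughly in proportion to the number of superpixels, which is the sparseness the method relies on.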
Suppose that G = (V, E, W) is a weighted graph, where $V = \{v_1, v_2, \cdots, v_m\}$ is the set of nodes such that each node $v_i$ represents a superpixel $S_i$, E is the collection of edges, each connecting a pair of nodes, and W is the set of edge weights. In this work, we took the following steps to construct the required weighted connectivity graph:

i) Computation of the distance between two spatially adjacent superpixels. Assume that superpixels $S_i$ and $S_j$ are spatially adjacent. Based on the B-band spectral information of a pixel and the spatial information of superpixel $S_j$, calculate the Euclidean distance $d(x_k, S_j)$ from each pixel $x_k \in S_i$ to superpixel $S_j$ using the local mean-based pseudo-nearest-neighbor (LMPNN) rule [37,58]. Then, sort these distances in ascending order, denoted as $d(x_{(1)}, S_j) \leq d(x_{(2)}, S_j) \leq \cdots \leq d(x_{(n_i)}, S_j)$. The distance from superpixel $S_i$ to superpixel $S_j$ is then given by Equation (3).
ii) Calculation of weights. Calculate the weights $\hat{w}_{i,j}$ between a superpixel $S_i$ and all its spatial neighbors $S_{i,1}, S_{i,2}, \cdots, S_{i,k_i}$ using Equation (4).
iii) Construction of the weighted graph. Connect each superpixel $S_i$ to its spatial neighbors $S_j$ with the weights:
$$w_{i,j} = \begin{cases} \max\{\hat{w}_{i,j}, \hat{w}_{j,i}\}, & \text{if } S_i \text{ and } S_j \text{ are closest neighbors to each other}, \\ \hat{w}_{i,j} \times \hat{w}_{j,i}, & \text{otherwise}. \end{cases} \qquad (5)$$
Here, $S_j$ is the closest neighbor of $S_i$ if (1) $S_j$ is one of the neighbors of $S_i$, and (2) after sorting the weights $\hat{w}_{i,l}$ ($l = 1, 2, \cdots, k_i$) and $\hat{w}_{j,l}$ ($l = 1, 2, \cdots, k_j$) obtained from Equation (4) in descending order, $\hat{w}_{i,j}$ is located in the first half of the $\hat{w}_{i,l}$ sequence and $\hat{w}_{j,i}$ is also in the first half of the $\hat{w}_{j,l}$ sequence. The graph generated using Equation (5) is a weighted connectivity graph over the whole image because, when connecting nodes, we do not consider whether a superpixel contains only reference data. This means that the reference data, together with the background pixels, are involved in the classification process. Although doing so may affect the classification results and increase the computation time, from an application perspective, it makes the proposed method easy to use for solving other practical problems in remote sensing.
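The symmetrization rule of Equation (5) can be sketched as below. This is an illustrative reading only: the pre-symmetrization weights from Equation (4) are taken as given in a nested dictionary `directed[i][j]`, the names `in_first_half` and `symmetric_weights` are hypothetical, and rounding the "first half" up for an odd number of neighbors is an assumption, since the text does not say how an odd count is split.

```python
import math

def in_first_half(directed, i, j):
    """True if directed[i][j] ranks in the first half of S_i's weights (descending)."""
    ranked = sorted(directed[i], key=lambda k: directed[i][k], reverse=True)
    half = math.ceil(len(ranked) / 2)  # assumption: round up for odd counts
    return j in ranked[:half]

def symmetric_weights(directed):
    """Combine per-direction weights into undirected edge weights as in Equation (5).

    `directed[i][j]` is the weight from superpixel i to its spatial neighbor j;
    the neighbor relation is assumed symmetric, so directed[j][i] also exists.
    """
    edges = {}
    for i in directed:
        for j in directed[i]:
            if i < j:  # visit each undirected edge once
                a, b = directed[i][j], directed[j][i]
                mutual = in_first_half(directed, i, j) and in_first_half(directed, j, i)
                # Mutually close pairs keep the larger weight; others get the product.
                edges[(i, j)] = max(a, b) if mutual else a * b
    return edges
```

For example, with `directed = {0: {1: 0.9, 2: 0.2}, 1: {0: 0.8, 2: 0.5}, 2: {0: 0.3, 1: 0.4}}`, only superpixels 0 and 1 rank each other in their first halves, so that edge receives the maximum of its two weights while the other two edges receive products.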

The Discrete Potential and the Improved Semi-Supervised Wu-Huberman Algorithm
For a given graph G, by connecting one node in the graph to the positive pole of a battery and another node to the negative pole, the graph G can be represented as an electric circuit, with each edge acting as a resistor. In the UWH method [55], two nodes (at a distance greater than 2) are chosen randomly and attached to a battery with a fixed voltage. The potential (voltage) of each node can be calculated by solving Kirchhoff's equations under the assumption that each edge has the same resistance and that the fixed voltages are 1 and 0. The voltage of each node is updated a user-specified number of times. These voltages stay between 0 and 1. Supposing that the communities are approximately equal in size, two communities are detected by using a certain threshold or the maximum voltage gap. To correct errors caused by the random selection, this process is repeated many times and the final result is determined by the majority voting rule. Multiple communities are found one by one in the same way. The complexity of the UWH algorithm is about O(r(|V|+|E|)), where r is the number of repetitions. For a detailed description of the UWH method, interested readers may consult Wu and Huberman [55].
Note that the assumptions that every edge has the same resistance and that every community has approximately the same size limit the application of the UWH method. Additionally, it is difficult to select an optimal threshold or find an ideal voltage gap, especially for a graph with a fuzzy or diverse community structure. To address these drawbacks, we extend the UWH method into a semi-supervised Wu-Huberman method such that the communities hidden in the weighted connectivity graph (WCG) can be detected better.
Suppose that there are C communities in a WCG and that b nodes are labeled in each community. The basic idea of the improvement is to first generate C electric circuits and then compute C potentials for each unlabeled node in turn using the ISWH. By comparing these C potentials, we finally decide which community the unlabeled node belongs to. Specifically, the graph is called the electric circuit generated by the c-label when the potentials of the b nodes with label c are set to 1 and the potentials of the remaining labeled nodes (with labels other than c) are set to 0. The current flows from high potential to low potential through each edge in the WCG.
According to Kirchhoff's node equation and Ohm's law:
$$\sum_{j=1}^{k_i} I_{i,j} = \sum_{j=1}^{k_i} \frac{P_i^c - P_j^c}{w_{i,j}^{-1}} = 0, \qquad (6)$$
where the sums run over the neighbors $v_j$ of node $v_i$, $I_{i,j}$ is the current flowing from $v_i$ to $v_j$, $k_i$ is the number of neighbors of node $v_i$, $w_{i,j}$ denotes the weight between $v_i$ and $v_j$, and $P_i^c$ represents the potential of node $v_i$ in the electric circuit generated by the c-label. In Equation (6), $w_{i,j}^{-1}$ is used as the resistance instead of $w_{i,j}$ because a larger $w_{i,j}$ shows that nodes $v_i$ and $v_j$ have a close neighbor relationship and are therefore more likely to belong to the same community. From a potential perspective, this means that there should be a small voltage drop between them.
In the electric circuit generated by the c-label, the potential of each unlabeled node is obtained by rewriting Equation (6) in the following form:
$$P_i^c = \frac{\sum_{j=1}^{k_i} w_{i,j} P_j^c}{\sum_{j=1}^{k_i} w_{i,j}}. \qquad (7)$$
That is, the potential of an unlabeled node is the weighted average of the potentials of its neighbors.
Starting from the b nodes with label c, we apply the breadth-first search algorithm on the graph, together with Equation (7), to calculate the potential $P_i^c$ of each unlabeled node $v_i$ one by one in the electric circuit generated by the c-label. $P_i^c$ is then updated by repeating this process many times. As a result, the potentials of unlabeled nodes belonging to the cth community gradually approach 1 due to the influence of the b nodes with a unit potential. When c varies from 1 to C, we obtain C ordered potentials for each unlabeled node $v_i$, namely $P_i^1, P_i^2, \cdots, P_i^C$. Each $P_i^c$ lies between 0 and 1 and reflects the relationship between node $v_i$ and the cth community. Finally, the label of node $v_i$ is assigned according to the following equation:
$$\mathrm{label}(v_i) = \arg\max_{1 \leq c \leq C} P_i^c. \qquad (8)$$
The process of community detection using the ISWH method is shown in Figure 2. For simplicity, each edge in the network is assigned the same weight. There are two apparent communities in this network. Taking the gray node as an example, it is assigned to the left community by comparing its two potentials, 0.77 and 0.11, according to Equation (8). Similarly, the remaining unlabeled nodes are classified one by one, and the two communities are effectively detected.
The complexity of the ISWH algorithm is about O(rC(|V|+|E|)). The number of nodes |V| in graph G is the number of superpixels m. The number of edges |E| is approximately a constant multiple of m because we only connect a superpixel to its spatial neighbors. Therefore, the ISWH takes approximately O(rCm) time to complete the community discovery.
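The potential propagation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' code: the graph is a dictionary `adj[v][u] = weight`, unlabeled nodes start at potential 0, updates are applied in place in breadth-first order, classes are indexed from 0, and the names `bfs_order` and `iswh_labels` are hypothetical.

```python
from collections import deque

def bfs_order(adj, sources):
    """Breadth-first visiting order starting from all source nodes at once."""
    seen, queue, order = set(sources), deque(sources), []
    while queue:
        v = queue.popleft()
        order.append(v)
        for u in adj[v]:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return order

def iswh_labels(adj, seeds, n_classes, repetitions=50):
    """adj: {node: {neighbor: weight}}; seeds: {node: class index}.

    Returns (labels, potentials), where potentials[c][v] is the potential of
    node v in the circuit generated by label c.
    """
    potentials = []
    for c in range(n_classes):
        # Circuit for label c: seeds of class c are held at 1, other seeds at 0.
        p = {v: 0.0 for v in adj}  # assumption: unlabeled nodes start at 0
        for v, lab in seeds.items():
            p[v] = 1.0 if lab == c else 0.0
        order = bfs_order(adj, [v for v, lab in seeds.items() if lab == c])
        for _ in range(repetitions):
            for v in order:
                if v in seeds:
                    continue  # labeled nodes keep their fixed potential
                total = sum(adj[v].values())
                # Weighted average of the neighbors' potentials (Equation (7)).
                p[v] = sum(w * p[u] for u, w in adj[v].items()) / total
        potentials.append(p)
    labels = {}
    for v in adj:
        if v in seeds:
            labels[v] = seeds[v]
        else:
            # Largest potential over the C circuits decides the label (Equation (8)).
            labels[v] = max(range(n_classes), key=lambda c: potentials[c][v])
    return labels, potentials
```

Each repetition visits every node and edge once per circuit, which matches the O(rC(|V|+|E|)) cost quoted above. On a six-node path with unit weights and one seed at each end, the potentials converge to a linear ramp and the path splits in the middle.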

The Proposed SSC-GDP Method
To objectively evaluate the validity of the proposed method, we randomly labeled b pixels in each class instead of b superpixels because pixel-wise labeling is more feasible in practical applications. By the homogeneity of a superpixel, a superpixel acquires a label once one or more of its pixels are labeled. Because superpixel segmentation is not always perfect, two or more pixels within the same superpixel may carry different labels. In this case, we adopted the majority voting rule to label the superpixel. Furthermore, a superpixel may also contain two or more labeled pixels with the same label. In other words, the actual number of labeled superpixels in each class is less than or equal to b (the number of labeled nodes per community is not greater than b).
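The transfer of pixel labels to superpixels by majority vote can be sketched as follows. The dictionary-based inputs and the function name `label_superpixels` are illustrative assumptions; ties are broken in favor of the class encountered first, which the text does not specify.

```python
from collections import Counter

def label_superpixels(segments, labeled_pixels):
    """Assign a class to every superpixel that contains a labeled pixel.

    segments: {pixel id: superpixel id}; labeled_pixels: {pixel id: class}.
    Returns {superpixel id: class}, decided by majority vote within each superpixel.
    """
    votes = {}
    for pixel, cls in labeled_pixels.items():
        votes.setdefault(segments[pixel], Counter())[cls] += 1
    # most_common(1) returns the class with the most votes in that superpixel.
    return {sp: counter.most_common(1)[0][0] for sp, counter in votes.items()}
```

Because several labeled pixels can fall into one superpixel, the returned dictionary can hold fewer than b entries per class, as noted above.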
By converting the problem of label propagation in the graph into the potential transmission process in the electric circuit, the proposed method realizes HSI classification with the help of the community detection technique in complex networks. The proposed classification method is summarized below.
Input: HSI; b, the number of labeled pixels per class; m, the number of superpixels; r, the number of repetitions; C, the number of classes.
Step 1: Call the ERS algorithm to segment the HSI into superpixels.
Step 2: Construct the weighted connectivity graph using Equations (3)-(5).
Step 3: For c = 1 to C: generate the electric circuit using the c-label and set t = 0; then do {update the potential of each unlabeled node in the generated electric circuit through the breadth-first search algorithm of the graph, as well as Equation (7); t = t + 1} while t < r.
Step 4: Assign the label of each unlabeled node according to Equation (8).

Experimental Setup
To evaluate the effectiveness of the proposed SSC-GDP algorithm, we used two Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) datasets, namely the Indian Pines image and the Salinas image, in our experiments. These two benchmark images are widely used to test the performance of HSI classification methods.

Data Description
Regarding the Indian Pines dataset, the Indian Pines image, acquired using the AVIRIS sensor in June 1992, covers the agricultural Indian Pines test site in Northwestern Indiana. The image has 220 bands of size 145 × 145, with a spatial resolution of 20 m per pixel and a spectral range from 0.4 to 2.5 μm. In our experiments, 20 water absorption bands were discarded. The ground truth contains 16 classes covering different types of crops and a total of 10,249 labeled pixels (reference data). This is an imbalanced dataset in which the classes vary greatly in size. The false-color composite image and the corresponding reference data of the Indian Pines dataset are shown in Figure 3 and Table 1, respectively.
Regarding the Salinas dataset, the Salinas image was also acquired using the AVIRIS sensor, over Salinas Valley, CA, USA. The image is of size 512 × 217 × 224, with a spatial resolution of 3.7 m per pixel. As with the Indian Pines image, 20 water absorption spectral bands were removed. It contains 16 different classes and 54,129 labeled pixels (reference data). The main challenge for this dataset is the accurate separation of the 8th class and the 15th class because their spectral features are very similar. Figure 4 and Table 1 show the false-color composite image and the corresponding ground truth data, respectively, of the Salinas dataset.

Evaluation Protocol
Each experiment carried out in this work was independently repeated 10 times; the average and standard deviation are reported in Tables 2 and 3. Three commonly used evaluation criteria, namely overall accuracy (OA), average accuracy (AA), and the kappa coefficient (κ), were adopted to evaluate the performance of each HSI classification method.
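For reference, the three criteria can be computed from a confusion matrix as in the sketch below, which uses their standard definitions. The function name `classification_scores` is hypothetical, classes are assumed to be indexed from 0, and every class is assumed to occur at least once among the true labels.

```python
import numpy as np

def classification_scores(y_true, y_pred, n_classes):
    """Return (OA, AA, kappa) for integer class labels in [0, n_classes)."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    # Rows index the true class, columns the predicted class.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    total = cm.sum()
    oa = np.trace(cm) / total                   # fraction of correctly labeled samples
    per_class = np.diag(cm) / cm.sum(axis=1)    # accuracy within each true class
    aa = per_class.mean()                       # unweighted mean over classes
    # Agreement expected by chance, from the row and column marginals.
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```

OA weights every sample equally, whereas AA weights every class equally, so the two can differ noticeably on an imbalanced dataset such as Indian Pines.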
The classification results of the proposed SSC-GDP method were visually and quantitatively compared with those provided by several state-of-the-art HSI classification approaches, i.e., edge-preserving filters (EPF) [59], image fusion and recursive filtering (IFRF) [60], logistic regression via variable splitting and augmented Lagrangian (LORSAL) [61], sparse multinomial logistic regression with a spatially adaptive total variation regularization (SMLR-S) [62], superpixel-based classification via multiple kernels (SCMK) [20], superpixelwise PCA (SuperPCA) [63], and a support vector machine (SVM) based on superpixel and discontinuity-preserving relaxation (SVM-SD) [63]. The EPF, IFRF, SMLR-S, SCMK, SuperPCA, and SVM-SD algorithms are spectral-spatial classifiers, whereas the LORSAL method does not consider the spatial information of pixels in the classification. Table 2 reports the classification results provided by the eight methods on the Indian Pines dataset, in which the number of superpixels m was equal to 700 and 20 pixels per class were labeled. As seen in Table 2, compared with the seven other pixel-wise approaches, the proposed SSC-GDP method achieved the best classification result (96.19%), at least 13% higher than the others. Furthermore, our proposal correctly classified 7 of the 16 classes. The LORSAL classifier provided the lowest classification accuracy (59.63%) because the spatial context of the HSI was not taken into account in the classification. The use of the SVM and the multiple kernels technique made SCMK superior to the other five spectral-spatial HSI methods, namely EPF, IFRF, SMLR-S, SuperPCA, and SVM-SD. The EPF classifier did not produce a satisfactory classification result (74.95%), which was partially due to the complexity of the class boundaries and the application of a PCA technique.
Table 2. Classification accuracy (in percent) of the eight methods used on the Indian Pines image. EPF: edge-preserving filters, IFRF: image fusion and recursive filtering, LORSAL: logistic regression via variable splitting and augmented Lagrangian, SMLR-S: sparse multinomial logistic regression with a spatially adaptive total variation regularization, SCMK: superpixel-based classification via multiple kernels, SuperPCA: superpixelwise PCA, SVM-SD: support vector machine based on superpixel and discontinuity-preserving relaxation. OA: overall accuracy, AA: average accuracy, κ: kappa coefficient.
The visual classification maps produced by the different classifiers are shown in Figure 5. One can see from Figure 5 that, unlike several other classification results, misclassification in our classification map occurred only between spatially adjacent classes. This was because, on the one hand, only spatially adjacent superpixels were interconnected in the constructed weighted graph; on the other hand, class labels were incorrectly transmitted in the ISWH method through mixed superpixels containing both reference data and background pixels, as well as through those containing only background pixels. Although the removal of background pixels and the reprocessing of mixed superpixels may improve the classification accuracy, this is not convenient for practical applications.
Table 3 lists the statistical results obtained by using the various classifiers on the Salinas image, where the number of superpixels m was equal to 1000 and 20 pixels per class were labeled.
Table 3 shows that the proposed classification scheme was superior to the other seven competitive methods according to all three indices. For the 8th class and the 15th class, which are easily confused, our proposal achieved satisfactory classification results (98.98% and 98.15%, respectively). As can be seen from Figure 6h, a small number of pixels located near their class boundaries were misclassified and the remaining pixels were correctly classified. This was because some mixed superpixels were misclassified during the classification process. Additionally, the participation of background data in the classification process was a cause of misclassification. The classification accuracy of the LORSAL method was nearly 20% lower than that of our proposal. Compared with the SCMK and SMLR-S classifiers, the proposed method improved the classification results by about 5% in terms of the OA. Among the seven spectral-spatial methods, the classification accuracy of the SuperPCA method was unsatisfactory. This may have been due to the use of PCA, which resulted in the loss of some information. From a quantitative and visual perspective, our results confirmed that the proposed superpixel-level classification approach performed well in the case of limited labeled pixels.

Effect of the Number of Superpixels
The classification results obtained by the proposed method varied with the number of superpixels, as shown in Figure 7. Here, ratio = 3 means that three pixels per class were labeled. As the number of superpixels m changed from 300 to 1000, the classification accuracy on the Indian Pines dataset rose at first and then decreased. This may have been due to the following facts: (i) A small number of superpixels means a large superpixel size. Accordingly, the probability that pixels with different class labels fall in one superpixel increases, which affects the classification results of the proposed superpixel-level classification algorithm. (ii) In general, a small superpixel has better homogeneity, which helps to group its pixels correctly. However, similar to the case of pixels, the separability between superpixel blocks is reduced. For a fixed labeled ratio, the spread of the classification results across segmentation scales gradually narrowed as the labeled ratio increased from 3 to 20. In particular, for ratio = 10, 15, and 20, the difference between the classification accuracies was less than 1% when the number of superpixels changed from 500 to 900. This indicates that the proposed classification framework was not sensitive to the number of superpixels in this range. The experimental results show that the OA reached its maximum for m = 700 on this dataset. As can be observed in Figure 7b, all OA values for the Salinas image were greater than 94.5%, even for ratio = 3. This result supports the view that the proposed method is a successful attempt at superpixel-level HSI classification.
For the same labeled ratio, a small classification difference (not greater than 1.5%) means that the proposal was insensitive to segmentation scales (from 500 to 1200) on this dataset. It should be pointed out that the optimal number of superpixels used in this work was determined experimentally. In a variety of applications, it remains a challenging problem to determine the optimal segmentation scale without reference data.

Comparison of Several Competitive Methods
The comparison results for the competitive classifiers on the Indian Pines and Salinas datasets are shown in Figure 8. Figure 8 indicates that our classification results were significantly better than those of the other seven state-of-the-art algorithms for different labeled ratios. For the Indian Pines image shown in Figure 8a, unlike the other seven competitive methods, the classification accuracy provided by the proposed method showed a slow upward trend with the increase in the number of labeled pixels. This may have been because two or more pixels within the same superpixel were labeled, which does not help improve the classification results of our method. The LORSAL method did not obtain a satisfactory classification result because of the lack of spatial information in the classification. The fact that the classification accuracy of the SMLR-S and SVM-SD approaches increased steadily with the increase in the number of labeled pixels implies that these algorithms were dependent on the labeled proportion. The classification results of the SCMK, IFRF, EPF, and SuperPCA methods exhibited a similar tendency. As a whole, SCMK and IFRF outperformed the other two. As can be seen from Figure 8b, our method achieved satisfactory classification results (higher than 95%) on the Salinas dataset. The proposed method had about a 5% advantage over the other seven classifiers. Because the regular class shape of this dataset enhanced the smoothing effect of the spatial adaptive regularization technique, the SMLR-S method was superior to the other five spectral-spatial classification algorithms. The SuperPCA algorithm, which is closely related to the superpixel segmentation scale, did not achieve the desired classification effect on the image. Experimental results on these two datasets confirmed that the proposed superpixel-level classification scheme was preferable to the other seven pixel-wise classification algorithms.

Impact of the Number of Updates on the Classification Results
In this experiment, 20 pixels per class were randomly labeled, and the number of superpixels m was 700 and 1000 for the Indian Pines and Salinas datasets, respectively. In the ISWH method, the potential of each unlabeled node is continually updated using Equation (7) in the electric circuit generated by the c-label. As the number of updates increases, the difference between the potentials of the nodes belonging to the cth community and the remaining nodes increases. This favors the detection of the cth community. This fact was demonstrated for the Indian Pines dataset, as shown in Figure 9. The classification results for the Indian Pines image improved steadily as the number of updates increased from 5 to 20, and then tended to be stable (about 0.5% difference). For the Salinas image, after 10 updates, the classification accuracy reached 98.01%, and then slightly increased by about 0.2%. Afterward, it remained steady when the number of updates was greater than or equal to 15. This may be explained by the fact that this dataset had a high spatial resolution and a good separation between classes. The fast convergence of the updating process illustrates the stability of the ISWH algorithm. At the same time, this is also one of the reasons why the proposed algorithm could realize fast classification. For convenience, the number of updates was uniformly taken as 20 in our experiments.
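The updating process described above can be illustrated with a minimal sketch. Equation (7) is not reproduced in this section, so the code assumes the standard Wu–Huberman rule: nodes labeled with class c are held at potential 1, nodes labeled with any other class are held at 0, and every unlabeled node is repeatedly replaced by the weighted average of its neighbors' potentials. The function names (`update_potentials`, `classify`), the toy graph, and its edge weights are hypothetical and are not taken from the ISWH implementation.

```python
import numpy as np

def update_potentials(W, source_nodes, sink_nodes, n_updates=20):
    """Assumed Wu-Huberman-style update on a weighted graph.

    W is a symmetric, non-negative weight matrix (zero where two nodes
    are not connected). Source nodes are fixed at 1, sink nodes at 0.
    """
    n = W.shape[0]
    V = np.zeros(n)
    V[source_nodes] = 1.0
    fixed = np.zeros(n, dtype=bool)
    fixed[source_nodes] = True
    fixed[sink_nodes] = True
    degree = W.sum(axis=1)
    for _ in range(n_updates):
        for i in range(n):
            if fixed[i] or degree[i] == 0:
                continue
            # Weighted average of the neighbors' current potentials.
            V[i] = W[i] @ V / degree[i]
    return V

def classify(W, labeled, n_classes, n_updates=20):
    """Run one circuit per class and assign each node to the class
    whose circuit gives it the highest potential."""
    potentials = np.zeros((n_classes, W.shape[0]))
    for c in range(n_classes):
        sources = [i for i, y in labeled.items() if y == c]
        sinks = [i for i, y in labeled.items() if y != c]
        potentials[c] = update_potentials(W, sources, sinks, n_updates)
    return potentials.argmax(axis=0), potentials

# Toy chain of six nodes: two tightly connected groups (0-1-2 and 3-4-5)
# joined by one weak edge (2-3). Node 0 is labeled class 0, node 5 class 1.
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 0.1), (3, 4, 1.0), (4, 5, 1.0)]:
    W[i, j] = W[j, i] = w

labels, potentials = classify(W, labeled={0: 0, 5: 1}, n_classes=2, n_updates=20)
print(labels)
```

In this toy graph the weak edge between nodes 2 and 3 separates the two groups, so each unlabeled node ends up with the class of the labeled node on its side. Each update visits every node once and touches only its neighbors, which is consistent with the linear complexity attributed to the ISWH algorithm when the graph is sparse; the dense matrix used here is only for brevity.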

Execution Time
In this work, all the experiments were run in Matlab 2015b. The laptop used had an AMD 2600 CPU at 3.5 GHz, 16 GB of memory, and a Windows system. We report the running times of the eight HSI classification algorithms on the Indian Pines and Salinas datasets in Table 4. Each method was independently tested 10 times, with 20 labeled pixels per class. The number of superpixels was 700 and 1000 for the Indian Pines and Salinas datasets, respectively. The LORSAL algorithm was the fastest because it considers only spectral information in the classification. Because of the adoption of sparse multinomial logistic regression, edge detection, and a weighted Markov random field, as well as the optimization of iterative updates, the SMLR-S method required more time to complete the classification task. The use of a probability optimization strategy in the EPF and SVM-SD algorithms led to their relatively high computational complexity.
Most of the computing time of the proposed method was spent on constructing the weighted connectivity graph because the distance between each superpixel and its spatial neighbors had to be calculated. Unlike the SCMK, SuperPCA, and SVM-SD methods, the proposed SSC-GDP method used the whole HSI, not just the reference data. Note that the Indian Pines dataset consisted of 10,249 reference pixels and 10,766 background pixels, and the Salinas image was made up of 54,129 reference pixels and 56,975 background pixels. Thus, our algorithm was the most time-efficient of the four superpixel-related approaches. In terms of the classification results and computing time, our proposal is a satisfactory classification framework.
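As a rough worked comparison, using only the pixel counts quoted above, the ratio of all pixels to reference pixels indicates how much more data a whole-image method handles than a method restricted to the reference data:

$$\frac{10{,}249 + 10{,}766}{10{,}249} = \frac{21{,}015}{10{,}249} \approx 2.05, \qquad \frac{54{,}129 + 56{,}975}{54{,}129} = \frac{111{,}104}{54{,}129} \approx 2.05$$

On both datasets, a whole-image method therefore handles about twice as many pixels as one that uses only the reference data. This is only an input-size comparison and does not by itself predict the running times reported in Table 4.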

Discussion
The basis of the proposed SSC-GDP method was the assumption of superpixel homogeneity, i.e., pixels in a superpixel will share the same class label with a high probability. This assumption allowed us to represent a superpixel as a new sample in HSI classification. As a result, the volume of hyperspectral data is reduced greatly. This implies that classification is likely to be completed in a short time. Different from dimensionality reduction, this is a data reduction technique from a pixel perspective. A high homogeneity of superpixels and an appropriate number of superpixels are key factors affecting the classification results. In existing superpixel-based, pixel-wise HSI classifiers and the four superpixel-level HSI classification approaches [35][36][37][46], the optimal number of superpixels is determined using the classification results of reference data. However, the hyperspectral data obtained in practice have little reference data. Therefore, the development of methods that can find the optimal number of superpixels for an HSI without reference data is worthy of discussion.
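The data reduction described above can be illustrated with a minimal sketch. It assumes the simplest possible representative for a superpixel, namely the plain mean spectrum of its pixels; the SSC-GDP method itself measures distances between superpixels with a weighting technique and Equation (3), which this sketch does not reproduce. The function name `superpixel_mean_spectra` and the hand-made segmentation map are hypothetical.

```python
import numpy as np

def superpixel_mean_spectra(cube, segments):
    """Collapse an HSI cube of shape (H, W, B) into one mean spectrum per
    superpixel. `segments` is an (H, W) integer map with labels 0..m-1,
    and every label is assumed to occur at least once."""
    n_bands = cube.shape[2]
    flat = cube.reshape(-1, n_bands)           # one row per pixel
    seg = segments.ravel()
    m = seg.max() + 1
    counts = np.bincount(seg, minlength=m)     # pixels per superpixel
    sums = np.zeros((m, n_bands))
    np.add.at(sums, seg, flat)                 # accumulate spectra per label
    return sums / counts[:, None]

# Toy 4 x 4 image with 3 bands, split into four 2 x 2 superpixels.
cube = np.arange(48, dtype=float).reshape(4, 4, 3)
segments = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1],
                     [2, 2, 3, 3],
                     [2, 2, 3, 3]])

means = superpixel_mean_spectra(cube, segments)
print(means.shape)  # (4, 3): 16 pixels reduced to 4 samples
```

The number of spectral bands is unchanged; only the number of samples shrinks, which is what distinguishes this kind of data reduction from dimensionality reduction.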
To the best of our knowledge, there are four superpixel-level HSI classification methods [35][36][37][46]. Compared with the approaches proposed in References [35,46], the advantage of the proposed SSC-GDP classification framework is that it is nonparametric and easy to calculate. The use of a local connection strategy in the graph construction, as well as the linear complexity of the ISWH algorithm, makes our proposal superior to the method developed in Xie et al. [37]. Additionally, these four methods only consider the reference data in the process of classification. However, in our proposal, both the reference data and background data are involved in the classification because we did not check whether superpixels contain only reference data when constructing graphs.
For superpixel-level HSI classification methods, it is of fundamental importance to properly measure the similarity between two superpixels. The similarities defined in these four methods, which are based on an affine hull model and singular value decomposition [35], KNN [36], the extended LMPNN [37], and a covariance matrix and kernel technique [46], may measure the distance between two superpixels better. However, the calculation of these similarities is complicated, multi-parametric, or not fast enough. Unlike in the case of pixels, the adaptive shapes and sizes of superpixels also make it difficult to properly define and quickly compute the similarity between superpixels. To classify HSI quickly and accurately, it is necessary to design new methods to address this problem in our future work.
It is well known that superpixel segmentation is not always perfect. Therefore, a superpixel may contain a large number of pixels belonging to the same class and a few heterogeneous pixels. Furthermore, massive noisy pixels in the HSI will affect the similarity between superpixels. To weaken their influence on the calculation of the distance between superpixels, this problem is handled using a weighting technique and the local average pseudo-nearest neighbor method defined in Equation (3). However, recalculating the distance from a pixel to each local average pseudo-nearest neighbor incurs additional computational cost. This reduces the efficiency of our method and is a weakness of our proposal. Additionally, our method works well on hyperspectral datasets whose class distribution is spatially continuous because the potential of labeled nodes can be transmitted effectively. For hyperspectral datasets composed of fragmented classes, the potential of a labeled node needs to pass through different classes or background data to reach the node to be classified, and thus it will be significantly attenuated. This means that the probability of misclassification will increase, which limits the applicability of our method. In our classification results, the misclassified pixels were mainly located near the class boundaries. Our future work will investigate how to avoid or reduce the misclassified pixels near the class boundaries.

Conclusions
This paper proposes a novel, semi-supervised, superpixel-level classification method for HSI based on a graph and discrete potential method. The proposal aimed to classify an HSI rapidly and accurately. The advantages of the introduced classification algorithm are: (i) Data reduction, since taking each superpixel as a new sample greatly reduced the volume of the original hyperspectral dataset and thus saved classification time. (ii) Denoising, where the assumption of homogeneity of superpixels played an important role in removing noise pixels in the classification process. (iii) Good classification performance, where in the case of only a few labeled pixels per class, our method still showed a good classification ability. Additionally, the proposed classification method ran on the whole HSI, not just the reference data. This implies that the proposal has a better application prospect in the field of remote sensing. This scheme provides a new idea for HSI classification by introducing a community discovery technique. Experimental results and comparison results have demonstrated the effectiveness of the method.