Spectral and Spatial Classification of Hyperspectral Images Based on Random Multi-Graphs

Hyperspectral image classification has been acknowledged as the fundamental and challenging task of hyperspectral data processing. The abundance of spectral and spatial information has provided great opportunities to effectively characterize and identify ground materials. In this paper, we propose a spectral and spatial classification framework for hyperspectral images based on Random Multi-Graphs (RMGs). The RMG is a graph-based ensemble learning method, which is rarely considered in hyperspectral image classification. It is empirically verified that the semi-supervised RMG deals well with small sample setting problems. This kind of problem is very common in hyperspectral image applications. In the proposed method, spatial features are extracted based on linear prediction error analysis and local binary patterns; spatial features and spectral features are then stacked into high dimensional vectors. The high dimensional vectors are fed into the RMG for classification. By randomly selecting a subset of features to create a graph, the proposed method can achieve excellent classification performance. The experiments on three real hyperspectral datasets have demonstrated that the proposed method exhibits better performance than several closely related methods.


Introduction
With the advance of earth observation programs, many hyperspectral sensors with high spectral resolution have been developed, such as NASA's Airborne Visible/Infrared Imaging Spectrometer (AVIRIS), and NASA's EO-1 with its hyperspectral instrument Hyperion. The AVIRIS can acquire image data in 224 bands of 10 nm spectral resolution in the reflected visible and near infrared spectrum. The Hyperion can acquire image data in 242 spectral bands at approximately 10 nm spectral resolution. In China, hyperspectral sensors include FY-3A with the Medium-Resolution Spectral Imager (MERSI), Chang'E-1 with the Interferometric Imaging Spectrometer (IIS), and the upcoming GaoFen-5 with the Advanced Hyperspectral Imager (AHSI). More and more remote sensing hyperspectral images are available. Hyperspectral images can obtain hundreds of narrowband spectral channels for the same area, and can provide richer spectral information to support the fine recognition of various land-cover materials. Therefore, hyperspectral images have drawn increasing attention and opened up new remote sensing application fields, such as hydrocarbon detection [1], lake sediment analysis [2], oil reservoir exploration [3], and diseased wheat detection [4], etc. Among these applications, classification of hyperspectral images is well acknowledged as the fundamental and challenging task of hyperspectral of Greenland is investigated. The proposed method obtains good classification performance in Arctic sea ice classification, and it may contribute to the Polar research communities.
The remainder of this paper is organized as follows. Section 2 reviews the related work focusing on hyperspectral classification. In Section 3, we present the proposed classification framework based on spectral-spatial features. Section 4 shows the experimental results on three real hyperspectral images. These experiments demonstrate the effectiveness of the proposed method. Finally, Section 5 gives the concluding remarks, together with some hints for plausible future research.

Related Work
Researchers have studied hyperspectral image classification for decades. This section discusses the existing feature extraction methods for hyperspectral image classification. We first review spatial-spectral classification methods based on handcrafted features, then we review classification methods based on deep learning and ensemble learning models, respectively.
Spatial-spectral classification methods based on handcrafted features. Inspired by the phenomenon where spatially neighboring pixels carry correlated information, jointly exploiting both spatial and spectral information becomes an attractive field of hyperspectral image classification [11]. Most of the previously proposed spatial-spectral classification methods have focused on using handcrafted features, which are designed based on the experts' prior knowledge, such as principle component analysis, Gabor wavelet filters, and morphological profiles. One simple yet effective method for spatial-spectral classification is by applying adaptive filters or moving windows to the spectral bands. Benediktsson et al. [12] proposed a classification method based on an extended morphological profile. The extended morphological profile is performed at many scales, and the obtained features are then fed into a classifier for classification. Jia et al. [6] used Gabor wavelet filters with different scales on hyperspectral data to extract spectral-spatial-combined features. In addition, some statistical tools are used to model the spatial relationship between neighboring pixels. Li et al. [13] utilized multi-modal logistic regression and Markov random fields to model the contextual information among neighboring pixels. Tarabalka et al. [14] utilized Markov random fields to refine the classification result generated by probabilistic support vector machines. Wang et al. [15] proposed a locality adaptive discriminant analysis method, and applied the method to spatial-spectral classification of hyperspectral images. Makantasis et al. [16] presented a tensor based method for hyperspectral image classification. The method retains the spatial and spectral coherency of the input samples by utilizing tensor algebra operations. It is empirically verified that when the size of the training set is small, the tensor based method presents superior classification performance.
Classification methods based on a deep learning model. Deep learning methods, which contain two or more hidden layers, tend to extract the discriminant and invariant features of the input data. These methods have been actively studied in image classification [42,43], natural language processing [44], and speech recognition [45], etc. Recently, deep learning methods have attracted great interest in remote sensing communities [17][18][19][20][21][22][23][24][25][26][27][28][29][30]. Detailed surveys of deep learning methods for processing of remote sensing data can be found in [46][47][48]. Chen et al. [17] introduce deep learning to hyperspectral image classification for the first time. A deep model based on stacked autoencoders was designed for feature extraction, and the model obtained better classification accuracy compared with some shallow classification models. Later, Chen et al. [18] propose a classification strategy based on deep belief networks (DBN). The multilayer DBN model is designed to learn the deep features of hyperspectral data, and the learned features are then classified by logistic regression. Ding et al. [21] propose a hyperspectral image classification method based on convolutional neural networks (CNNs), where the convolutional kernels can be automatically learned from the data through clustering. Wu et al. [22] propose a convolutional recurrent neural network (CRNN) for classification of hyperspectral data. The convolutional layers are utilized to extract locally invariant features, which are then fed to a few recurrent layers to additionally extract the contextual information among different spectral bands. Li et al. [23] propose a CNN-based pixel-pairs feature extraction framework for hyperspectral image classification. A pixel-pair model is designed to exploit the similarity between pixels and ensure a sufficient amount of data for the CNN. Pan et al. [26] design a Vertet Component Analysis Network (VCANet) for deep features extraction from hyperspectral images smoothed by a rolling guided filter. Zhang et al. [29] propose a diverse region-based CNN for hyperspectral image classification which can encode semantic context-aware representations to obtain promising features.
Classification methods based on an ensemble learning model. Ensemble learning models use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Ceamanos et al. [33] propose a hyperspectral classification method based on the fusion of multiple SVM classifiers. The method relies on the decision fusion of individual SVM classifiers which are trained in different feature subspaces. Huang et al. [34] present a multifeature classification model, aiming to construct a SVM ensemble combining multiple spectral and spatial features. Gu et al. [35] propose a multiple kernel learning framework which employs a boosting strategy for screening the limited training samples. The multiple kernel learning framework exploits the boosting trick to try different combinations of the limited training samples and adaptively determine the optimal weights of base kernels. Qi et al. [36] propose a multiple kernel learning method which can leverage the feature selection and particle swarm optimizations.
Our work is related to the ensemble learning model. A graph-based ensemble learning model has rarely been considered in hyperspectral image classification. In particular, our method introduces RMG into hyperspectral classification. RMG is a graph-based ensemble method, in which the classifier consists of an arbitrary number of trees for classification. We also show that the utilization of RMG can alleviate the phenomenon of over fitting and can effectively obtain satisfactory classification results.

Methodology
The framework of the proposed SS-RMG method is illustrated in Figure 1. It consists of two main steps: (1) Extraction of the spatial and spectral features; (2) integrating the spatial and spectral information into the random multi-graphs for classification. In the remainder of this section, we will describe in more details the strategies adopted for feature extraction and spatial-spectral classification.

Neighborhood region
LBP feature extraction Spatial vector . . .

Multi-Graphs
Output class labels

Spectral and Spatial Feature Extraction
The feature extraction step includes two parallel modules: Spectral feature extraction and spatial feature extraction. The spectral and spatial features of each pixel are stacked into a one-dimensional vector. The feature vectors are then fed into random multi-graphs for classification. Here, we introduce how spectral and spatial features are extracted. Then, we will give brief descriptions of how spectral and spatial features are combined in the proposed framework.
In spectral feature extraction, we use the raw data of all the spectral bands as input. In spatial feature extraction, linear prediction error (LPE) [40] is first utilized to select a subset of spectral bands with distinctive and informative features. LPE is a simple yet effective band selection method, based on band similarity measurement. Assuming that there are two initial bands B 1 and B 2 , for every other band B, an approximation can be expressed as B = a 0 + a 1 B 1 + a 2 B 2 . Here, a 0 , a 1 and a 2 are the parameters to minimize the LPE: e = B − B 2 . The parameter vector can be denoted by a = [a 0 , a 1 , a 2 ] T . A least square solution can be employed to obtain the parameter vector as follows: where M is a matrix with three columns whose first column is with all ones, second column is the B 1 -band, and third column is the B 2 -band. The number of rows of M is the total number of pixels in each spectral band. m is the B-spectral band. The band that produces the maximum error e is considered as the most dissimilar band to B 1 and B 2 , and will be selected. Thus, the band combination can be subsequently augmented to five, six, seven, and so on, until the desired number of bands are obtained. After band selection, the local binary pattern (LBP) [41] feature extraction process is applied to each selected band. LBP is a non-parametric method, and it summarizes local structures of images efficiently by comparing each pixel with its neighboring pixels. The most important properties of LBP are its tolerance regarding monotonic illumination changes and its computational simplicity. Given a center pixel t c , each neighbor of a local region is assigned a binary label, which is either "0" or "1" depending on whether the center pixel has a larger intensity value or not. Specifically, the k neighboring pixels are generated from a circle of radius r centered at t c . Along with the selected k neighbors, the LBP code for the center pixel t c can be given by: where The output of LBP code reflects the texture orientation and smoothness in a local region of the size w × w. After obtaining the LBP code of all pixels, an occurrence histogram is computed over a local patch centered at the pixel of interest, as shown in Figure 2. Then, all bands of LBP histograms are concatenated to form the spatial feature vector. It is well worth noting that the patch size w is a user-defined parameter, and classification performance with different patch sizes will be examined in the experimental section.
It should be noted that in this paper we use an extension of the original LBP, which is called the uniform pattern. The uniform pattern can effectively reduce the feature vector and implement a simple rotation invariant operator. A LBP is called uniform if the binary pattern contains at most two 0-1 or 1-0 transitions. In the computation of LBP histograms of each spectral band, all non-uniform patterns are assigned to a single bin. Then, the feature vector for one spectral band reduces from 256 to 59.

LBP feature extraction
Interested pixel center LBP image LBP image patch LBP histogram The spectral features contain important information for discriminating different kinds of ground categories. The spatial features decrease the intra-class variance and can lead to improved classification performance. The combination of spectral and spatial features provides more reliable classification results. The integration of spectral and spatial features is addressed by using a vector stacking approach, as shown in Figure 1. Specifically, for each pixel, the spatial feature vector is added to the end of the spectral vector. Then, these features are fed into the Random Multi-Graphs for classification. The detailed classification model will be described in the following subsection.

Classification Based on Random Multi-Graphs
The combined spectral and spatial features are fed into the Random Multi-Graphs for classification. The Random Multi-Graph (RMG) [38] is originally designed to solve the problem of face recognition. It tries to achieve two goals: The first is to avoid the curse of dimensionality and over fitting by injecting randomness into the graph. The second is to provide a new learning framework to handle high dimensionality and large-scale-data problems.
Given a dataset comprised of labeled data X lab = [x 1 , x 2 , . . . , x l ] and unlabeled data X unlab = [x l+1 , x l+2 , . . . , x l+u ], we can obtain a weighted graph. In the graph, the vertices consist of N = l + u data points. The edges in the graph with weights represent the similarity between the affiliated nodes, and these edges can be represented by a weight matrix W ∈ R N×N . Once the graph is built, the label information is injected into the graph and propagated throughout the whole graph to obtain the labels for the unlabeled data. Specifically, if the weight w ij is large, then the labels of the adjacent vertices x i and x j are considered to have the same label.
For a c-class classification problem, the graph-based learning methods can be considered as the following quadratic optimization problem: where tr(·) is the trace function. C ∈ R N×N is a diagonal matrix, and its i-th diagonal element c i is computed as c i = C l > 0 for 1 ≤ i ≤ l, and c i = C u > 0 for l + 1 < i ≤ N, where C l and C u are two parameters. Y = (y 1 , . . . , y l , 0, . . . , 0) T ∈ R N×c . f ∈ R N×c denotes the predicted labels. L ∈ R N×N is the regularization matrix. L is the graph Laplacian, and it is defined as L = D − W, where W is the weight matrix of the graph and computed by the Gaussian kernel as: where σ is the kernel width parameter, which needs to be tuned. D is the row sum of W. More detailed information can be found in Zhang's work [38]. In order to automatically discover the neighborhood structure inherent in the graphs to learn appropriate compact representations, researchers proposed the Anchor Graphs algorithm in handwritten digit recognition [49] and image classification [50]. The Anchor Graph algorithm allows constant time hashing of a new data point by extrapolating graph Laplacian eigenvectors to eigenfunctions. Then, a hierarchical threshold learning procedure is applied in which each eigenfunction yields multiple bits, leading to higher search accuracy. In the Anchor Graphs algorithm, the label prediction function can be represented as: where P ij is the data-adaptive weight. A = {a j } m j=1 in which each a j is an anchor point. This formula reduces the solution space of unknown labels from the larger space to a smaller space. K-means clustering centers are selected as anchors, since these centers have strong representation power to cover the full dataset. Liu et al. [49,51] proposed the Local Anchor Embedding (LAE) algorithm to obtain the anchors. In this paper, we use the LAE algorithm to compute the anchor points. Figure 3 illustrates the flowchart of the RMG algorithm. The whole framework of RMG can be described as follows: • Step 1: Randomly select k f features from all the high dimensional features of each sample.

•
Step 2: Select m anchor points to cover the data manifold denoted by an anchors matrix, and then compute the mapping matrix P to represent the rest of the data points via the selected anchors.

•
Step 3: Run semi-supervised inference on this graph by using graph Laplacian Regularization.

•
Step 4: Repeat the above steps to get k g graphs. • Step 5: k g graphs are voted to obtain the labels for the unlabeled data points.

Input dataset
Randomly choose k f features to construct graph 1 Randomly choose k f features to construct graph 2 Randomly choose k f features to construct graph 3 Randomly choose k f features to construct graph k g .

Graph-based inference to get label prediction F1
Graph-based inference to get label prediction F2 Graph-based inference to get label prediction F3 Graph-based inference to get label prediction The critical part of the proposed SS-RMG is injecting randomness into the graphs. This strategy can help to alleviate the problem of overfitting. That is, the learning model can fit the training set very well, but fails to generalize to new samples. The most common solution to overfitting is regularization. By using regularization, all the features are maintained and the magnitudes of the parameters are reduced. In this sense, the proposed SS-RMG can be considered as a kind of regularization. Specifically, we select a small subset of features to construct a graph, and unselected features' weights are penalized to become zero in the graph. This kind of regularization can contribute a lot to alleviate the phenomenon of overfitting. There are some similar statements from previous work by other researchers. In Reference [52], it is confirmed that the randomness in the classification model can be viewed as a kind of regularization. In Reference [53], Breiman noted that injecting the right kind of randomness can help to alleviate overfitting.
The proposed method is suitable for scenarios where a small number of ground truth samples are selected, and based on these samples, the whole scene will be labeled. In such semi-supervised applications, RMGs can lead good classification performance.

Experimental Results and Analysis
In order to evaluate the performance of the proposed method, three hyperspectral datasets are employed. As mentioned before, besides two widely used hyperspectral datasets, we use the Baffin Bay dataset to evaluate the performance of the proposed method. In this section, we first give a brief introduction to the datasets. Then, the influence of parameters is analyzed. Finally, the experimental results are shown and discussed by comparing with some closely related methods.

Dataset Description
The first dataset is the Indian Pines dataset. This dataset is widely used in hyperspectral classification, and it is captured by the visible/infrared imaging spectrometer (AVIRIS) in Northwestern Indiana. It covers the wavelength ranges from 0.4 to 2.5 µm with 20 m spatial resolution. The size of the dataset is 145 × 145 pixels, and 10,249 pixels are labeled. The labeled pixels are classified into 16 classes. There are 200 bands available after removing the water absorption channels. A false composite image (R-G-B = band 36-17-11) and the corresponding ground truth are shown in Figure 4a,b. The number of training and testing samples is summarized in Table 1.   The second dataset is called the Pavia University dataset. It is an urban site over the University of Pavia, Italy. The dataset was captured by the reflective optics system imaging spectrometer (ROSIS-3). The size of the image is 610 × 340 with 1.3 m spatial resolution. The image has 103 spectral bands prior to water-band removal. It has a spectral coverage of 0.43-0.86 µm. A false composite image (R-G-B = band 10-27-46) and the corresponding ground truth are shown in Figure 5a,b. The number of training and testing samples is summarized in Table 2.
The third dataset is the Baffin Bay dataset. It was acquired on 12 April in 2014, from the Hyperion sensor of EO-1. The Hyperion sensor collects 220 spectral bands ranging from 0.4 to 2.5 µm. The sensor operates in a push broom fashion, with a spatial resolution of 30 m for all bands. There are mainly four classes from the ground truth map: Land, sea water, gray ice and white ice. A false composite image and the corresponding ground truth are shown in Figure 6a,b. The number of training and testing samples is listed in Table 3. It should be noted that this dataset is very challenging for hyperspectral image classification. There are many small pieces of white ice in the dataset, and it is very hard to classify these white ice pieces correctly.

Analysis of Parameters
In the Random Multi-Graphs algorithm [38], the authors have provided a detailed analysis of C l and C u . They have demonstrated that the influence of C l and C u is limited. In Zhang's work [38], C l is set as 0.1, and C u is set to a fixed small value, 10 −6 . We use the same C l and C u values in the proposed SS-RMG. Here, we mainly focus on the discussion about the particular parameters used in the proposed SS-RMG: The number of graphs, the number of spectral bands, and the patch size in LBP feature extraction. In this subsection, OA is selected as the metric. All the results are obtained by averaging the accuracy results of 30 runs.
In the proposed SS-RMG, the number of graphs k g is an important parameter. We present the experiments about the effect of k g . Figure 7 illustrates the influence of graph numbers k g on three datasets. We can see that there is an increase when k g ≤ 20 . When k g is above 20, the OA values on three datasets tend to be stable. Although utilizing more graphs may contribute to a better result, the improvement is slight. Therefore, we may draw the conclusion that k g = 20 is an available setting, since continuously increasing the number of graphs contributes little to the improvement of accuracy. The number of bands used in the proposed SS-RMG also affects the final classification results. Here, we set the number of bands from 1 to 12 to analyze the influence on the three datasets, as illustrated in Figure 8. It can be observed that after an increase from 1 to 7, the value of OA on the Indian Pines dataset presents a steady tendency. However, on the Pavia University dataset, there is a sharp decline when the number of bands is above 5. Similarly, on the Baffin Bay dataset, there is a slow decline when the number of bands is above 6. In our implementations, we set the number of bands as 5. Using 5 spectral bands in the proposed SS-RMG may not be the best choice for all the experimental datasets. Here, we choose a relatively small number, within allowable hardware resources, for better analysis. On the Indian Pines dataset, the spatial and spectral feature vector dimension is 495. 295 elements correspond to spatial features and 200 elements correspond to spectral features. On the Pavia University dataset, the spatial and spectral feature vector dimension is 398. 295 elements correspond to spatial features and 103 elements correspond to spectral features. On the Baffin Bay dataset, the spatial and spectral feature vector dimension is 515. 295 elements correspond to spatial features and 220 elements correspond to spectral features. Besides the number of graphs and the number of spectral bands, the patch size w in LBP feature extraction is also an important parameter. We set w to 7, 11, 15, 19, 23, 27, 31, 35, 39 and 43 to analyze the influence on the three datasets, as illustrated in Figure 9. We notice that there is a sharp increase in classification accuracy when w ranges from 7 to 19. The accuracy tends to be stable when w is 19 or larger on the Indian Pines dataset. On the Pavia University and Baffin Bay datasets, the classification accuracies decline slightly when w ranges from 19 to 43. A larger patch size would take pixels of different classes into account, and therefore have negative effects in classification accuracy. If the patch size is too small, the extracted features may not be representative for the center pixel's spatial characteristics. Hence, in our experiments, we set the patch size in LBP feature extraction to 19 × 19 pixels.

Classification Results
The performance of the proposed SS-RMG is shown in Tables 4-6 for the three datasets with different methods. We compare the proposed method with some closely related hyperspectral classification methods: EPF-G [54], IFRF [55], LBP-ELM [39], and R-VCANet [26]. EPF-G generates pixel-wise classification maps, and handles these maps by edge-preserving filtering. Then, the class of each pixel is selected based on the maximum probability. IFRF combines spatial and spectral information via image fusion and recursive filtering. IFRF does not directly extract patches' features, and it uses two parameters δ s and δ r = 0.3 to extract spatial features. In LBP-ELM, LBP is implemented to represent the spectral and texture features, and a soft-decision fusion process of extreme learning machines was used to merge the probability outputs of spectral and texture features. R-VCANet is a simplified deep learning model. It is comprised of the input layer, two convolutional layers, and the output layer. In the input layer, a rolling guidance filter is used to explore the contextual structure features and remove small details.  We run the above methods 30 times with randomly selected training and testing samples, and the average accuracies and the corresponding standard deviations are reported. Overall accuracy (OA) and kappa coefficient (K) are selected as criterions to give quantitative evaluations.
(1) Results on the Indian Pines dataset: In this dataset, 10% of the pixels are selected as the training set and the rest of the pixels in the image are selected for testing. Experimental results on this dataset show that nearly all the methods work. We can observe from Figure 10 that the spatial consistency is roughly preserved by all these methods. The reason for this phenomenon is that these methods have utilized joint spatial-spectral features. The proposed SS-RMG achieves a 1-7% advantage over the other methods. The experimental results on this dataset demonstrate that the Random Multi-Graphs algorithm is effective in hyperspectral image classification, especially when the number of training samples is limited.
(2) Results on the Pavia University dataset: In this dataset, 1% of the pixels are selected as the training set and the rest of the pixels in the image are selected for testing. All the methods show close results. The proposed SS-RMG surpasses LBP-ELM by 6% in OA. It is evident that the proposed SS-RMG outperforms LBP-ELM by randomly selecting subsets of features. In addition, the proposed SS-RMG outperforms EPF-G and IFRF, which means that the application of LBP features can improve the classification performance. Moreover, the proposed SS-RMG surpasses R-VCANet by 1.4% of OA. This result indicates that ensemble learning models can achieve competitive performance compared with deep learning methods in hyperspectral image classification. The results on this dataset indicate that compared with some state-of-the-art methods, the proposed method is predominant. The randomness in the proposed SS-RMG can be viewed as a kind of regularization technique, and it may alleviate the phenomenon of over fitting.
(3) Results on the Baffin Bay dataset: As mentioned before, previous studies mainly focus on ground cover classification, and a sea ice dataset is rarely considered. The Baffin Bay dataset covers a region between the Baffin Island and the southwest coast of Greenland. In this dataset, 1% of the pixels are selected as the training set and the rest of the pixels in the image are selected for testing. From Figure 11, we can observe that in the results generated by EPF-G and IFRF, a lot of small white ice is classified incorrectly into gray ice. Therefore, the classification accuracies of EPF-G and IFRF are lower than the proposed method. From Table 6, we can observe that the proposed SS-RMG surpasses LBP-ELM by 2.2% in OA. Furthermore, in comparison with R-VCANet, the proposed method obtains a 1.5% improvement in OA. This indicates that the proposed SS-RMG is effective in feature extraction and classification. The experimental results on this dataset indicate that the proposed method can achieve good accuracy in sea ice classification by capturing the intrinsic inter-class discriminative patterns.

Analysis and Discussion
Figures 10-12 illustrates the classification results on the three datasets. The classification maps generated by the proposed SS-RMG are obviously less noisy than the other methods, e.g., the regions of Land in Figure 11. The visual results are consistent with those in Tables 4-6. The number of training samples is an important concern in hyperspectral image classification, since the number of training samples is often limited. In some studies, 50% of all the labeled pixels are selected as training samples [18,27]. Figure 13 shows the influences of training samples on the Indian Pines dataset. The number of training samples is large enough to depict the tendency. SS-RMG, R-VCANet, and IFRF present the best classification results among all the methods, therefore they are displayed in Figure 13. These results are obtained by averaging the accuracy results of 30 runs. When the ratio of training samples is above 10%, the classification accuracies of SS-RMG tend to be stable. Therefore, we draw the conclusion that 10% is enough to learn a representative model for this dataset. Moreover, it can be observed from Figure 13 that, compared with R-VCANet and IFRF, the proposed SS-RMG can achieve good classification results with less training samples.
Some recently proposed deep learning models, such as pixel-pair features learned by deep convolutional neural networks (CNN-PPF) [23], have attracted considerable attention. CNN-PPF is a CNN-based classification method based on deep pixel-pair features. The pixel-pair model is used to exploit the similarity between pixels and ensure a sufficient amount of input data to learn a large number of parameters in the CNN. The model is comprised of ten convolutional layers and three max-pooling layers. In the implementations of CNN-PPF, 200 labeled pixels per class are needed for training, and the other pixels are used for testing. To compare the classification performance between CNN-PPF and the proposed SS-RMG, we use the same number of training and testing samples as mentioned in [23].  Table 7 lists the class-specific accuracy and OA for the Indian Pines dataset. It should be noted that the classification result of CNN-PPF is directly obtained from Li's work [23], and the result of SS-RMG is obtained by the average value of running 30 times. We can observe that the proposed SS-RMG is superior to CNN-PPF, and yields over 3% higher accuracy. Particularly for Soybean-mintill, the class-specific accuracy of the proposed SS-RMG is about 10.4% higher than CNN-PPF. The comparison with CNN-PPF demonstrates that graph-based ensemble learning methods can obtain competitive classification results. Sometimes, the graph-based ensemble model can even generate better classification results. The computational complexity of the proposed method and other closely related methods is reported in Table 8. All experiments were implemented on the Intel Xeon E5-1620 platform. The computational cost of the proposed SS-RMG is higher than EPF-G, IFRF, and LBP-ELM, due to the fact that SS-RMG carries the burden of graph construction. It is worth noting that the graph constructions are performed independently, which means that this procedure can potentially be performed in parallel. Thus, the speed of the proposed SS-RMG can be further improved. Moreover, compared with R-VCANet, the proposed SS-RMG is superior, which means that the speed of SS-RMG is quite competitive to some deep learning models.
Overall, the experimental results on two popular datasets and one sea ice dataset could imply that the proposed SS-RMG is an effective graph-based ensemble learning model for hyperspectral image classification. It can capture the intrinsic inter-class discriminative patterns, and only limited training samples are needed.

Conclusions and Future Work
In this paper, we propose a spectral and spatial classification framework based on Random Multi-Graphs for hyperspectral image classification. First, spatial features are extracted based on LBP. Then, spatial and spectral features are stacked into high dimensional vectors. Second, the high dimensional vectors are fed into Random Multi-Graphs for classification. By randomly selecting a subset of features to create a graph, the proposed method can achieve satisfying classification performance. Compared with closely related methods, the proposed method exhibits good performance. It can be concluded that the proposed method can handle the hyperspectral classification task with limited training samples.
Internal variability has enhanced the greenhouse gas forced Arctic sea ice decline in the past decades [56]. The observed decline in Arctic sea ice opens shorter trade routes across the Arctic Ocean. These routes will allow swifter deliveries between Europe and Asia. Therefore, it is important to monitoring the sea ice along these routes for safe navigation. Hence, in the future, we will focus on developing Arctic sea ice classification methods for hyperspectral images.