Higher Order Support Vector Random Fields for Hyperspectral Image Classiﬁcation

This paper addresses the problem of contextual hyperspectral image (HSI) classification. A novel conditional random fields (CRFs) model, known as higher order support vector random fields (HSVRFs), is proposed for HSI classification. By incorporating higher order potentials into the support vector random fields with a Mahalanobis distance boundary constraint (SVRFMC) model, the HSVRFs model not only takes advantage of the support vector machine (SVM) classifier and the Mahalanobis distance boundary constraint, but also captures higher level contextual information to depict complicated details in HSI. The higher order potentials are defined on image segments, which are created by a fast unsupervised over-segmentation algorithm. The higher order potentials consider the spectral vectors of all pixels in a segment jointly, and weight these pixels with the output probability of the SVM classifier in our framework. Therefore, the higher order potentials can model higher level contextual information, which is useful for describing challenging complex structures and boundaries in HSI. Experimental results on two publicly available HSI datasets show that the HSVRFs model outperforms traditional and state-of-the-art methods in HSI classification, especially for datasets containing complicated details.


Introduction
With the development of hyperspectral imaging technology, hyperspectral image (HSI) classification has attracted increasing attention in various fields such as disaster monitoring, precision agriculture, and the military. The high spectral resolution of HSI greatly facilitates classification, which requires discriminating small differences among ground cover classes. However, the high-dimensional data space poses challenges to classification methods. The most difficult issue in supervised HSI classification is the Hughes effect, which occurs when the feature dimension is high and training samples are limited [1]. Different methods have been proposed to address the Hughes effect, among which the support vector machine (SVM) [2] performs promisingly because it is more robust to the Hughes effect than traditional HSI classification methods [3-5]. However, the SVM classifier considers only pixel-level spectral information and overlooks spatial contextual information, even though spatially adjacent pixels in HSI are highly correlated. Thus, the SVM classifier usually produces classification maps with salt-and-pepper noise. Many spectral-spatial classification methods have been proposed for HSI classification [6,7], and a recent review of HSI classification concludes that using both spectral and spatial information is advantageous [8]. Among spectral-spatial classification methods, conditional random fields (CRFs) have been extensively studied for HSI classification because of their strong spatial context modeling ability [9,10].
CRFs were first proposed by Lafferty et al. for labeling sequence data in 2001 [11]. Kumar et al. [12] applied CRFs to man-made structure detection in natural images. Shotton et al. [13,14] employed the CRFs model for multi-class object recognition and segmentation. He et al. [15,16] proposed multiscale CRFs for image labeling. Torralba et al. [17] proposed boosted random fields based on CRFs for object detection. Gould et al. [18,19] proposed combining a relative location prior with the CRFs model for multi-class image labeling. Ladicky et al. [20,21] utilized associative hierarchical CRFs for object class image segmentation and dense stereo reconstruction. Huang et al. [22] used a hierarchical CRF for labeling and segmenting street scene images. Li et al. [23] implemented multi-class object segmentation using superpixel-based CRFs. Kohli et al. [24,25] proposed adding robust higher order potentials to pairwise CRFs to enforce label consistency in image labeling tasks. All these applications in object detection, image labeling, recognition, and segmentation benefit from the strong spatial context modeling ability and feature integration flexibility of CRFs. In recent years, these advantages have attracted the attention of researchers in remote sensing image processing, and CRFs have been introduced to model the spatial context in remote sensing images [9,10,26-32].
The SVM classifier has been proven effective for HSI classification and has been combined with other methods that incorporate spatial information to improve classification accuracy [6,7,33,34]. Support vector random fields (SVRFs) [35] improve CRFs by embedding the SVM classifier into the CRF framework, thus taking advantage of the powerful discriminant properties of SVM while maintaining the spatial context modeling ability of CRFs. Zhong et al. [26] employed the SVRFs model for high spatial resolution remote sensing image classification. In that work, the SVM classifier was used as the spectral term and a Mahalanobis distance boundary constraint model was defined as the spatial term in CRFs, and the resulting model was called the support vector random fields classifier with a Mahalanobis distance boundary constraint (SVRFMC) [26]. The Mahalanobis distance boundary constraint models both spatial and spectral contextual information. Thus, the SVRFMC model outperforms the earlier BC-CRF [36], which used multinomial logistic regression (MLR) [37] as the spectral term and, as the spatial term, a boundary constraint that captures only the spatial context. However, like classical pairwise CRFs, SVRFMC captures only pairwise interactions and ignores the higher level context that is widespread in HSI and potentially useful for HSI classification.
In [24], Kohli et al. incorporated higher order potentials into pairwise CRFs to model higher level properties in natural images. Later, CRFs with higher order potentials (CRF-Hs) were applied to HSI classification by Zhong et al. [27], who showed that the higher order potential is an effective term for modeling complicated structures and boundaries in HSI. However, CRF-H uses the multinomial logistic regression (MLR) [37] classifier as the unary term, whose discriminative ability for hyperspectral features is considerably weaker than that of SVM. The MLR classifier has limited generalization ability when there are not enough training samples [26]. Moreover, CRF-H defines the pairwise term with a generalization of the Ising model, which does not preserve edges between different classes as well as the Mahalanobis distance boundary constraint in SVRFMC. In addition, Zhong et al. [27] integrated the class information and the situation information of each segment's constituting pixels into the weight parameters of the higher order potentials, which yielded only a limited improvement over pairwise CRF in the overall accuracy of HSI classification (0.11% and 1.13% on the two datasets tested in their work).
In this work, we propose a novel model for HSI classification, known as higher order support vector random fields (HSVRFs), which incorporates higher order potentials into the SVRFMC model. In the HSVRFs model, we use a multi-class probabilistic SVM classifier [38] as the unary potential and the Mahalanobis distance boundary constraint model [26] as the pairwise potential; these components are similar to the two corresponding parts of the SVRFMC model. Besides the unary and pairwise potentials, we incorporate higher order potentials into the HSVRFs to enforce label consistency in image segments, which are obtained using a recently proposed efficient unsupervised over-segmentation algorithm called entropy rate (ER) [39]. The higher order potentials adjust the label inconsistency cost of each segment with two important pieces of information: the first is the segment homogeneity, which is measured by the variance of the spectral vectors of the segment's constituting pixels; the second is the sum of the weights of inconsistent pixels (pixels not taking the dominant label) in the segment, which is based on the label map obtained by the SVM classifier in our framework. Because the SVM classifier assigns each pixel's class label a probability, the pixels in a segment are labeled with different levels of confidence. Therefore, the weight of each pixel is set to its probability of taking the label, as provided by the SVM classifier. This weighting strategy improves the expressiveness of the higher order potentials for complex structures and boundaries in HSI with very little extra computation. By integrating the higher order potentials into the SVRFMC model, the HSVRFs model not only takes advantage of the SVM classifier and the pairwise Mahalanobis distance boundary constraint, but also captures higher level context in HSI.
The main contributions of this paper are as follows. First, we propose a new conditional random field model, HSVRFs, that exploits both spectral and spatial information and their context for HSI classification, based on the SVM classifier, the Mahalanobis distance boundary constraint model, and higher order potentials. Second, by incorporating higher order potentials into our framework, higher order dependencies among the spectral vectors of the pixels in image segments, rather than only pairwise relations between neighboring pixels, are taken into consideration, so that spatial contextual information is utilized more effectively to improve HSI classification accuracy. Third, a weighting strategy for pixel labeling in each segment is proposed to compute the label inconsistency cost of the higher order potentials. This strategy enables the higher order potentials to capture the complex details in HSI more effectively. Fourth, we evaluate the performance of the HSVRFs model on two HSI datasets with different spectral and spatial resolutions. Experimental results show that the HSVRFs model outperforms traditional and state-of-the-art methods for HSI classification, and its advantages are more obvious for HSI with high spatial resolution and more complicated details.
The rest of this paper is organized as follows: Section 2 describes the proposed HSVRFs model. The experimental results and analysis for two HSI datasets are presented in Section 3. Finally, Section 4 concludes our work.

Problem Formulation
In the HSI classification problem, an observed image with $V$ pixels is denoted by a set of spectral vectors $x = (x_1, x_2, \ldots, x_i, \ldots, x_V)$, where the pixel index $i$ ranges from 1 to $V$. $x_i = [x_{i1}, x_{i2}, \ldots, x_{id}]^T$ represents the spectral vector of pixel $i$, and $d$ is the number of bands. The task is to infer the labeling of the image $y = (y_1, y_2, \ldots, y_i, \ldots, y_V)$, where each variable $y_i$ is the label of pixel $i$ and takes a value from the set $C = \{1, 2, \ldots, L\}$, and $L$ is the number of classes. Thus, the labeling $y$ takes values from the set $C^V$. The HSI classification problem is formulated as finding the field of class labels with the maximum posterior probability, i.e., $y^* = \arg\max_{y \in C^V} P(y \mid x)$.
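To make the search space $C^V$ concrete, the following Python sketch finds $y^*$ by exhaustive enumeration for a toy problem. The scoring function is a hypothetical stand-in for $\log P(y \mid x)$ (per-pixel log-probabilities plus a bonus for equal adjacent labels on a 1-D chain), not the HSVRFs posterior. Because $|C^V| = L^V$, brute force is only feasible for a handful of pixels, which is why approximate inference such as graph-cut move-making is used in practice.

```python
import math
from itertools import product


def map_labeling(num_pixels, classes, log_score):
    """Return the labeling in C^V with the largest score (brute force).

    Enumerates all len(classes) ** num_pixels labelings, so it is only
    usable for toy problems.
    """
    return max(product(classes, repeat=num_pixels), key=log_score)


def make_toy_score(probs, beta):
    """Hypothetical toy log-score for a 1-D chain of pixels.

    probs[i][k - 1] is an assumed probability of class k for pixel i;
    beta rewards each pair of equal adjacent labels (a smoothness prior).
    """
    def log_score(labels):
        unary = sum(math.log(probs[i][y - 1]) for i, y in enumerate(labels))
        smooth = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
        return unary + beta * smooth
    return log_score


probs = [[0.9, 0.1], [0.4, 0.6], [0.8, 0.2]]
print(map_labeling(3, (1, 2), make_toy_score(probs, beta=0.0)))  # (1, 2, 1)
print(map_labeling(3, (1, 2), make_toy_score(probs, beta=1.0)))  # (1, 1, 1)
```

With the smoothness bonus switched on, the middle pixel is relabeled to agree with its neighbors, a miniature version of the salt-and-pepper correction that contextual models aim for.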

Higher Order Support Vector Random Fields
The proposed HSVRFs model incorporates higher order potentials into the pairwise CRF model SVRFMC [26] and is formulated in Equation (1), where $Z(x, \eta)$ is a normalizing constant known as the partition function, $N_i$ is the set of neighbors of pixel $i$ in its 8-neighborhood, $S$ is the set of all image segments, $y_c$ is the set of labels over the higher order clique $c$, and $x_c$ is the set of spectral vectors over clique $c$. $\phi_i(\cdot)$, $\phi_{ij}(\cdot)$, and $\phi_c(\cdot)$ are the unary, pairwise, and higher order potentials, respectively. $\vartheta$ and $\xi$ are the parameters of the unary and higher order potentials, respectively. The pairwise potentials, which are defined as the Mahalanobis distance boundary constraints [26], have no parameters. $\eta = \{\vartheta, \xi\}$ is the vector of model parameters.
The unary potential $\phi_i(\cdot)$ describes the tendency of each pixel to belong to a certain class based on its spectral vector, and we use the multi-class probabilistic support vector machine (SVM) classifier [38] to define this potential. As a discriminative classifier, SVM has been shown to perform well for multispectral and hyperspectral remote sensing image classification [4,40,41]. SVM performs better than multinomial logistic regression (MLR) [37] for HSI classification because its kernel function implicitly maps spectral signatures that are not linearly separable into a higher-dimensional space where they can become separable. Moreover, owing to the structural risk minimization (SRM) principle, SVM needs fewer training samples than MLR [26]. The unary potential $\phi_i(\cdot)$ in (1) is defined as follows:

$$\phi_i(y_i, x_i, \vartheta) = \sum_{k=1}^{L} \delta(y_i = k)\, P(y_i = k \mid x_i), \tag{2}$$

where $\phi_i(y_i, x_i, \vartheta)$ represents the probability that pixel $i$ belongs to class $y_i$ given its spectral vector $x_i$, $\vartheta$ is the parameter vector of this potential, $\delta(\cdot)$ is the zero-one indicator function, $P(y_i = k \mid x_i)$ is the probability of pixel $i$ taking the class label $k$ based on the feature vector $x_i$ (given by the multi-class probabilistic SVM model [38]), and $L$ is the number of classes.
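The multi-class probabilistic SVM model estimates the multi-class probabilities $p_k = P(y = k \mid x)$, $k = 1, \ldots, L$, by combining all the pairwise class probabilities [42,43]. The objective function of the probability estimation [42,43] is

$$\min_{p} \sum_{k=1}^{L} \sum_{l \neq k} \left( r_{lk}\, p_k - r_{kl}\, p_l \right)^2 \quad \text{subject to} \quad \sum_{k=1}^{L} p_k = 1,\; p_k \geq 0,\ \forall k, \tag{3}$$

where $r_{kl} \approx P(y = k \mid y \in \{k, l\}, x)$, $k, l = 1, \ldots, L$, is the estimated pairwise class probability. The goal is to estimate $p_k$ using all $r_{kl}$. The objective function can be rewritten as

$$\min_{p}\; 2\, p^T Q p, \tag{4}$$

where

$$Q_{kl} = \begin{cases} \sum_{s \neq k} r_{sk}^2, & k = l, \\ -r_{lk}\, r_{kl}, & k \neq l. \end{cases}$$

Instead of directly pursuing the optimal solution of the objective function in Equation (4), a simple iterative method was proposed [42,43], based on the fact that the optimal solution $p$ satisfies

$$Q_{tt}\, p_t + \sum_{j \neq t} Q_{tj}\, p_j = p^T Q p, \quad t = 1, \ldots, L.$$

The iterative algorithm can be described as follows: Step 1. Start with some initial $p_k \geq 0,\ \forall k$, and $\sum_{k=1}^{L} p_k = 1$.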
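The iterative estimation described above can be sketched in pure Python as follows. It follows the pairwise-coupling update of Wu, Lin, and Weng (2004), which LIBSVM uses for its multi-class probability outputs: each $p_t$ is re-solved from the optimality condition $(Qp)_t = p^T Q p$ and $p$ is then renormalized. This is an illustrative sketch, not LIBSVM's implementation; the tolerance, iteration cap, and function names are assumptions. The last helper evaluates the unary potential $\phi_i = \sum_k \delta(y_i = k) P(y_i = k \mid x_i)$, which simply picks out the SVM probability of the candidate label.

```python
def coupling_matrix(r):
    """Coupling matrix: Q_kk = sum_{s != k} r_sk^2 and Q_kl = -r_lk * r_kl."""
    L = len(r)
    Q = [[0.0] * L for _ in range(L)]
    for k in range(L):
        for l in range(L):
            if k == l:
                Q[k][k] = sum(r[s][k] ** 2 for s in range(L) if s != k)
            else:
                Q[k][l] = -r[l][k] * r[k][l]
    return Q


def pairwise_coupling(r, tol=1e-12, max_iter=10000):
    """Estimate p_k = P(y = k | x) from pairwise estimates r[k][l]."""
    L = len(r)
    Q = coupling_matrix(r)

    def quad(p):  # p^T Q p
        return sum(p[a] * Q[a][b] * p[b] for a in range(L) for b in range(L))

    p = [1.0 / L] * L  # Step 1: p_k >= 0 and sum_k p_k = 1
    for _ in range(max_iter):
        pQp = quad(p)
        # Stop when the optimality condition (Q p)_t = p^T Q p holds for all t.
        err = max(abs(sum(Q[t][j] * p[j] for j in range(L)) - pQp) for t in range(L))
        if err < tol:
            break
        for t in range(L):
            off_diag = sum(Q[t][j] * p[j] for j in range(L) if j != t)
            p[t] = (quad(p) - off_diag) / Q[t][t]  # solve condition t for p_t
            total = sum(p)
            p = [v / total for v in p]  # renormalize so the p_k sum to 1
    return p


def unary_potential(label, p):
    """Unary potential: sum_k delta(label == k) * P(y_i = k | x_i), labels 1..L."""
    return sum(pk for k, pk in enumerate(p, start=1) if label == k)


# Pairwise estimates that are exactly consistent with p = (0.5, 0.3, 0.2)
p_true = [0.5, 0.3, 0.2]
r = [[p_true[k] / (p_true[k] + p_true[l]) if k != l else 0.0 for l in range(3)]
     for k in range(3)]
p = pairwise_coupling(r)
print([round(v, 6) for v in p])  # [0.5, 0.3, 0.2]
print(round(unary_potential(2, p), 6))  # 0.3
```

When the pairwise estimates are mutually consistent, as in this usage example, the objective reaches zero and the iteration recovers the underlying class probabilities; with real SVM outputs the $r_{kl}$ are noisy and the result is a least-squares compromise.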
In HSI, there are usually complex interactions within spectral and spatial neighborhoods, and a single SVM classifier that considers only pixel-level spectral information produces noisy classification maps. Therefore, a pairwise potential is needed to model the contextual information, correct misclassified pixels, and obtain smoother classification results. $\phi_{ij}(\cdot)$ employs a Mahalanobis distance boundary constraint model [26] that captures both the spectral and spatial context, as formulated in Equation (9), where $D(x_i, x_j)$ is a modified Mahalanobis distance that measures the similarity between the neighboring spectral vectors $x_i$ and $x_j$. The detailed formulation of $D(x_i, x_j)$ can be found in [26]. $\sigma^2$ is the mean value of $(x_i - x_j)^T (x_i - x_j)$ over the whole image. Because the boundary constraint with the modified Mahalanobis distance captures both the spectral and spatial context, it is superior to CRFs using the Potts model [9] and to a simple boundary constraint model [36] that considers only spatial correlation [26].
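The normalizer $\sigma^2$ can be computed in a single pass over the image. The Python sketch below assumes that the image-wide mean is taken over all unordered pairs of 8-neighbors, matching the neighborhood $N_i$ used in the model; the modified Mahalanobis distance $D(x_i, x_j)$ of [26] is not reproduced here. Function names and the nested-list image layout are illustrative assumptions.

```python
def neighbors8(r, c, height, width):
    """8-neighborhood N_i of pixel (r, c), clipped at the image border."""
    return [
        (r + dr, c + dc)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0) and 0 <= r + dr < height and 0 <= c + dc < width
    ]


def mean_squared_neighbor_distance(image):
    """sigma^2: mean of (x_i - x_j)^T (x_i - x_j) over all 8-neighbor pairs.

    image[r][c] is the spectral vector (list of floats) of pixel (r, c).
    Each unordered pair is counted once by using only four 'forward' offsets.
    """
    height, width = len(image), len(image[0])
    total, count = 0.0, 0
    for r in range(height):
        for c in range(width):
            for dr, dc in ((0, 1), (1, -1), (1, 0), (1, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < height and 0 <= cc < width:
                    diff = [a - b for a, b in zip(image[r][c], image[rr][cc])]
                    total += sum(d * d for d in diff)
                    count += 1
    return total / count


# 2 x 2 single-band toy image: six neighbor pairs with squared distances summing to 20
toy = [[[0.0], [1.0]], [[2.0], [3.0]]]
print(mean_squared_neighbor_distance(toy))
```

Using only the four forward offsets visits each neighbor pair exactly once, so no pair is double-counted in the mean.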
The higher order potential $\phi_c$ takes the form of a Robust $P^n$ model [24] to capture high-level contextual information. These potentials are defined on image segments, which are generated by the ER over-segmentation algorithm [39]. The major goal of the higher order potentials is to enforce label consistency in image segments while appropriately preserving structural details. However, image segments are not all equally homogeneous. Some segments may contain more than one class, which leads to incorrect classification if label consistency is enforced rigidly. Therefore, the higher order potentials first modulate the label inconsistency cost by measuring the segment homogeneity, which is based on the variance of the pixel spectral vectors in the segment. Second, because the SVM classifier assigns each pixel's class label a probability, the confidence in the labeling differs from pixel to pixel within a segment. For this reason, we weight each pixel with its probability of taking the label, and the label inconsistency cost is further modulated by accumulating the weights of the inconsistent pixels in a segment. With this weighting strategy, the higher order potentials can describe the complicated details in HSI more accurately. We define the higher order potentials as in Equations (10) and (11), where $P = \sum_{i \in c} w_i$, and $w_i$ represents the label confidence for pixel $i$, given by the multi-class probabilistic SVM classifier as the probability of pixel $i$ taking its label. $W_i(y_c)$ measures the inconsistency cost by accumulating the weights (label confidences) of the inconsistent pixels. In this way, pixels with higher label confidence are given more importance, while pixels with lower label confidence are given less importance. This weighting strategy lets the higher order potentials better model the challenging complex details in HSI with very little extra computation. $T$ is the threshold parameter controlling the rigidity of the higher order potentials. Following [24], $T$ satisfies the constraint $T < 0.5$, and we tune its value experimentally. $\lambda_{\max}$ is a function of the homogeneity of each segment, and the inconsistency cost is positively correlated with $\lambda_{\max}$: the more homogeneous the segment, the higher the inconsistency cost. When $W_i(y_c) \leq T \cdot P$, the inconsistency cost is also positively correlated with $W_i(y_c)$: the larger the accumulated weight of inconsistent pixels, the higher the inconsistency cost. The function $\lambda_{\max}$ is defined in Equation (12) to measure the homogeneity of each segment:

$$\lambda_{\max} = |c|^{\theta_\alpha} \left( \theta_p + \theta_v H(c) \right), \tag{12}$$

where $H(c)$ models the homogeneity of segment $c$ using the variance of the spectral vectors of the pixels in $c$, and $\theta_\alpha$, $\theta_p$, and $\theta_v$ are parameters. $H(c)$ is defined in Equation (13):

$$H(c) = \exp\left( -\theta_\beta \frac{\left\| \sum_{i \in c} (x_i - \mu)^2 \right\|}{|c|} \right), \tag{13}$$

where $\|\cdot\|$ is the $l_2$ norm, $\mu = \sum_{i \in c} x_i / |c|$ is the mean spectral vector, and $\theta_\beta$ is a parameter. Therefore, a segment containing multiple classes has low homogeneity in Equation (13) and thus a low inconsistency cost in Equation (10), which allows some pixels in an inhomogeneous segment to take inconsistent labels. In this way, the higher order potentials can reduce the oversmoothing effect caused by rigid consistency enforcement.
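The following Python sketch evaluates the weighted higher order cost of one segment. It assumes the truncated linear Robust $P^n$ form of [24] for Equations (10) and (11): the cost is $W_i(y_c)\,\lambda_{\max}/(T \cdot P)$ while $W_i(y_c) \leq T \cdot P$ and is capped at $\lambda_{\max}$ otherwise. It further assumes that $W_i(y_c)$ is the total weight of the pixels not taking the dominant (highest accumulated weight) label, that $\lambda_{\max} = |c|^{\theta_\alpha}(\theta_p + \theta_v H(c))$ with $H(c) = \exp(-\theta_\beta \|\sum_{i\in c}(x_i-\mu)^2\|/|c|)$ in the form of [24], and that the square in $H(c)$ is taken element-wise. The parameter values in the tests are arbitrary.

```python
import math


def homogeneity(xs, theta_beta):
    """H(c): exp(-theta_beta * ||sum_i (x_i - mu)^2|| / |c|).

    xs holds the spectral vectors of the segment's pixels; the square is
    assumed to be element-wise before taking the l2 norm.
    """
    n, d = len(xs), len(xs[0])
    mu = [sum(x[b] for x in xs) / n for b in range(d)]
    sq = [sum((x[b] - mu[b]) ** 2 for x in xs) for b in range(d)]
    return math.exp(-theta_beta * math.sqrt(sum(v * v for v in sq)) / n)


def lambda_max(xs, theta_alpha, theta_p, theta_v, theta_beta):
    """Maximum inconsistency cost: |c|^theta_alpha * (theta_p + theta_v * H(c))."""
    return len(xs) ** theta_alpha * (theta_p + theta_v * homogeneity(xs, theta_beta))


def higher_order_cost(labels, weights, xs, T, thetas):
    """Weighted Robust P^n-style cost (assumed truncated linear form).

    labels: current labels y_c of the segment's pixels
    weights: SVM label confidences w_i of those pixels
    thetas: (theta_alpha, theta_p, theta_v, theta_beta)
    """
    P = sum(weights)
    accumulated = {}
    for y, w in zip(labels, weights):
        accumulated[y] = accumulated.get(y, 0.0) + w
    # W_i(y_c): total weight of pixels that do not take the dominant label
    W = P - max(accumulated.values())
    lam = lambda_max(xs, *thetas)
    return min(W * lam / (T * P), lam)  # linear up to T * P, then truncated


flat = [[1.0, 2.0]] * 4          # perfectly homogeneous segment, so H(c) = 1
th = (0.0, 0.5, 0.5, 0.1)        # arbitrary illustrative parameters
print(higher_order_cost([0, 0, 0, 1], [1.0] * 4, flat, 0.4, th))
```

Because $\lambda_{\max}$ shrinks as the spectral spread of a segment grows, an inhomogeneous segment is charged less for mixed labels, which is exactly the mechanism the text relies on to avoid oversmoothing.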

Parameter Learning and Inference
Since there are many parameters in the HSVRFs model, an exhaustive search for their optimal values is impractical. We determined the parameter values of the HSVRFs model under a piecewise training framework [44], in which the model is divided into pieces and each piece is trained independently. It has been shown in [44] that piecewise training is an effective training method for graphical models such as CRFs: it performs much better than pseudolikelihood [45] and is often competitive with global training. In this paper, we divided the model according to the types of cliques (i.e., unary, pairwise, and higher order potentials), in a manner similar to [27]. However, piecewise training may lead to over-counting during inference in the combined model [44]. To compensate for this over-counting, scalar powers were introduced for each of the three potentials in HSVRFs; each power acts as a weight on the corresponding potential [14]. Then, by combining the separately learned potentials, the posterior probability in (1) can be obtained as in Equation (14), where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the fixed powers of the unary, pairwise, and higher order potentials, respectively. Similar to Zhong et al. [27], we fixed $\lambda_1$ at 1 and only tuned $\lambda_2$ and $\lambda_3$.
Under the piecewise training framework, we selected the model parameters in a way similar to that used in [24]. We first learned the parameters of the SVM classifier (unary potential) using the LibSVM toolbox [43]. The radial basis function (RBF) [42] was used as the SVM kernel, and the unary parameters $\vartheta = \{C, \gamma\}$ ($C$ controls the penalty during optimization, and $\gamma$ is the spread of the RBF) were tuned by five-fold cross-validation with grid search [43] within the ranges $[2^{-5}, 2^{15}]$ and $[2^{-15}, 2^{5}]$, respectively. Then, we kept the learned SVM parameters constant and tuned the pairwise potential. Since the Mahalanobis distance boundary constraint has no parameters, we only needed to adjust the weight of the pairwise potential. After that, we tuned the higher order parameters, first in an HSVRFs model with only the unary and higher order potentials, and then together with the ratio between the unary and pairwise potentials. Five-fold cross-validation was used in all of the above steps.
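The five-fold grid search over $(C, \gamma)$ can be sketched as follows. The sketch assumes an exponent step of 2 on both axes (so the grid contains the values $C = 512 = 2^9$ and $\gamma = 0.125 = 2^{-3}$ reported later) and uses a hypothetical `fit_and_score` callback in place of actual LibSVM training; the shuffling seed is arbitrary.

```python
import random


def kfold_indices(n, k=5, seed=0):
    """Shuffle sample indices 0..n-1 and split them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def grid_search_cv(n_samples, fit_and_score, k=5, seed=0):
    """Pick (C, gamma) maximizing the mean k-fold validation score.

    fit_and_score(train_idx, val_idx, C, gamma) is a hypothetical callback
    that trains an RBF-kernel SVM on train_idx and returns accuracy on val_idx.
    """
    c_grid = [2.0 ** e for e in range(-5, 16, 2)]      # 2^-5, 2^-3, ..., 2^15
    gamma_grid = [2.0 ** e for e in range(-15, 6, 2)]  # 2^-15, 2^-13, ..., 2^5
    folds = kfold_indices(n_samples, k, seed)
    best = None
    for C in c_grid:
        for gamma in gamma_grid:
            scores = []
            for f in range(k):
                train = [i for g, fold in enumerate(folds) if g != f for i in fold]
                scores.append(fit_and_score(train, folds[f], C, gamma))
            mean_score = sum(scores) / k
            if best is None or mean_score > best[0]:
                best = (mean_score, C, gamma)
    return best[1], best[2]
```

Each candidate pair is scored on all five held-out folds, so the selected parameters reflect average validation performance rather than a single lucky split.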
With the selected parameters, we inferred the optimal classification result (i.e., the labeling that maximizes the probability in Equation (1)) using graph-cut-based move-making algorithms [46], which were shown in [24] to be efficient for inference in CRFs with higher order potentials.

Experimental Setting
Two hyperspectral datasets were used to evaluate the performance of HSVRFs: the Indian Pines [47] and the Pavia University [47] datasets. Both are popular hyperspectral datasets and have been widely used in classification studies. The Indian Pines image was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over the Indian Pines test site in northwestern Indiana. This image covers 145 × 145 pixels with a 20-m spatial resolution and a 0.4 to 2.5-µm wavelength range. Two hundred spectral bands were retained after removing 20 water absorption bands. In this dataset, 10,249 pixels are labeled and the rest are not. There are 16 classes in the original ground truth; 7 were discarded in our experiments because they contained too few samples. The remaining nine classes contain 9345 labeled pixels. A three-band false color image and the ground truth image of this dataset are shown in Figure 1a,d (in Section 3.2). The Pavia University image was collected by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor over an urban area of the University of Pavia in northern Italy. This dataset contains more complex structural information than Indian Pines. The image covers 610 × 340 pixels with a very high spatial resolution of 1.3 m, and 103 spectral bands were preserved after removing 12 noisy bands. There are 42,776 labeled pixels in the ground truth, belonging to nine classes. Figure 2a,g (in Section 3.2) present a three-band false color image and the ground truth for this dataset.
We conducted two groups of experiments on the Indian Pines dataset. All compared methods were run five times with different randomly selected training/testing sets, and the average accuracies are reported. In the first group of experiments, we randomly selected 200 training samples for each of the nine classes from the ground truth, and the remaining samples were used for testing. The class descriptions and the training and testing sizes of each class are shown in Table 1 (see Section 3.2). We compared the classification results of HSVRFs with those of MLR [37,48], SVM [2], CRFs, and SVRFMC [26]. In the second group of experiments, we used the same training/testing split of the reference data as Zhong et al. [27] and took the classification results of MLR and CRF-H on Indian Pines directly from their work for comparison. The details of the classes and the training/testing split are given in Table 2 (see Section 3.2). We also report the classification results of CRFs, SVM, SVRFMC, and HSVRFs for comparison in this group of experiments.
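The random sampling protocol above (a fixed number of training samples drawn per class, the remaining labeled pixels used for testing, and accuracies averaged over five runs) can be sketched as follows. The `evaluate` argument is a hypothetical callback that trains a classifier and returns its accuracy; the seeds 0 to 4 are an arbitrary choice.

```python
import random
from collections import defaultdict


def split_per_class(labels, n_train, seed):
    """Randomly pick n_train samples per class for training; the rest are test.

    labels[i] is the reference class of labeled pixel i.
    """
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)
    train, test = [], []
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return sorted(train), sorted(test)


def average_over_runs(labels, n_train, evaluate, runs=5):
    """Mean of evaluate(train, test) over `runs` random splits (seeds 0..runs-1)."""
    scores = [evaluate(*split_per_class(labels, n_train, seed)) for seed in range(runs)]
    return sum(scores) / runs
```

Sampling per class, rather than from the pooled labels, keeps small classes represented in every training set.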
One group of experiments was conducted on the Pavia University dataset. As with Indian Pines, we ran all compared methods five times with different randomly selected training/testing sets and report the average accuracies. We randomly picked 70 training samples for each class, and the rest were used for testing. Table 3 (see Section 3.2) shows the class descriptions and the numbers of training/testing samples for each class. We again compared the HSVRFs model with MLR, CRFs, SVM, and SVRFMC on this dataset.
In all experiments, the radial basis function (RBF) kernel [38] was used for the SVM classifiers, as it has proven effective for classifying complicated nonlinear spectral signatures. For both datasets, the optimal parameter values of the SVM classifier were selected as $C = 512$ and $\gamma = 0.125$. The MLR classifier was trained using the backpropagation algorithm [49,50], and the weight decay parameter $\lambda$ was tuned by five-fold cross-validation; $\lambda$ was set to $4.4 \times 10^{-6}$ for the Indian Pines dataset and $5 \times 10^{-6}$ for the Pavia University dataset. In CRFs, the MLR classifier was used as the unary term and the Mahalanobis distance boundary constraint model as the pairwise term, which is the same as the corresponding part of SVRFMC. This setting allows a fair comparison between MLR and SVM as the unary term in the CRF framework.

Classification Performance
Figures 1 and 2 show the experimental results on the Indian Pines and Pavia University datasets, obtained with the tuned parameter values mentioned above. The classification maps and difference images show that the results of MLR and SVM contain abundant salt-and-pepper noise. The result of SVM is better than that of MLR, which is consistent with the conclusion in the literature that SVM performs robustly with high feature dimensions and limited training samples, whereas MLR has limited generalization ability when training samples are insufficient [26]. CRFs, SVRFMC, and HSVRFs produce visually better results: the salt-and-pepper noise is greatly reduced and the maps are smoother, because these three models consider the spatial context. However, there is an obvious over-smoothing effect in the classification maps of CRFs, which is illustrated more clearly in the corresponding difference images. In the difference image of CRFs on Indian Pines (Figure 1j), several patches are entirely misclassified (shown in circles in Figure 1j). For example, comparing the classification map and difference image of CRFs (Figure 1g,j) shows that the corn-notill patch (yellow circle) in the top-left corner is misclassified as corn-mintill, and the soybean-mintill patch (purple circle) in the top-right is misclassified as soybean-clean. Meanwhile, the soybean-clean patch (white circle) on the left of the image is misclassified as corn-mintill, and the soybean-notill patch (blue circle) at the bottom of the image is misclassified as soybean-mintill. In contrast, the results of SVRFMC and HSVRFs are much better. For example, the difference images of SVRFMC and HSVRFs on the Indian Pines image (Figure 1k,l) show that the four patches misclassified by CRFs (Figure 1j) are generally classified correctly. This demonstrates the advantage of SVM over MLR as the unary potential in the CRF framework.
Furthermore, compared to SVRFMC, the HSVRFs model obtains better classification maps. Some pixels in the yellow-circled region of the SVRFMC difference image on the Indian Pines dataset are still misclassified, whereas those pixels are correctly classified by HSVRFs. The advantage of the HSVRFs model is more obvious on the Pavia University dataset. For example, comparing the difference images of SVRFMC and HSVRFs for Pavia University shows that the misclassified pixels of the Meadow patches (yellow circles), the Bitumen patches, the Bare Soil patches (white circles), and the Brick patches (purple circles) in the result of SVRFMC (Figure 2k) are assigned to the correct classes by HSVRFs (Figure 2l). This is because integrating higher order potentials enables the HSVRFs model to capture high-level contextual information, so it can better depict the complicated details in HSI, especially in urban images that contain many complex structures. In conclusion, compared to the other four methods, the HSVRFs model achieves competitive classification maps, with appropriate smoothing and well-preserved boundary information. This is shown most clearly by comparing the circled regions in the difference images of the five methods. For quantitative evaluation, Tables 1 and 2 present the class-specific accuracies (percentage of correctly classified pixels for each class), overall accuracy (OA; percentage of correctly classified pixels), and Kappa [51] of the compared methods in the two groups of experiments on the Indian Pines dataset. Table 3 shows the accuracies of the compared methods on the Pavia University dataset. Each experiment was run five times with different randomly chosen training/testing sets, and the mean classification accuracies are reported.
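The accuracy measures in the tables can be computed from a confusion matrix as in the following sketch. The matrix orientation (rows are reference classes, columns are predictions) is an assumption, Kappa is computed as the standard (Cohen's) kappa coefficient, and all values are returned as fractions rather than percentages.

```python
def accuracy_metrics(confusion):
    """OA, kappa, and per-class accuracy from a confusion matrix.

    confusion[k][l] counts test pixels of reference class k predicted as class l.
    """
    L = len(confusion)
    n = sum(sum(row) for row in confusion)
    oa = sum(confusion[k][k] for k in range(L)) / n  # observed agreement
    row_sums = [sum(confusion[k]) for k in range(L)]
    col_sums = [sum(confusion[k][l] for k in range(L)) for l in range(L)]
    p_e = sum(row_sums[k] * col_sums[k] for k in range(L)) / (n * n)  # chance agreement
    kappa = (oa - p_e) / (1 - p_e)
    per_class = [confusion[k][k] / row_sums[k] for k in range(L)]
    return oa, kappa, per_class


oa, kappa, per_class = accuracy_metrics([[45, 5], [10, 40]])
print(oa, round(kappa, 6), per_class)  # 0.85 0.7 [0.9, 0.8]
```

Kappa discounts the agreement expected by chance from the class marginals, so it penalizes classifiers that score well mainly by favoring large classes.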
In the first group of experiments on the Indian Pines dataset, HSVRFs obtained the highest OA and Kappa, at 94.83% and 93.81%, respectively, which were 1.36% and 1.61% higher than the corresponding values for SVRFMC; HSVRFs also achieved the highest class-specific accuracies for most classes. The OA of SVM was 85.52%, about 8.9% higher than that of MLR. The OA of SVRFMC was about 6.44% higher than that of CRFs, which demonstrates the advantage of SVM over MLR as the unary classifier in the CRF framework. In the second group of experiments on this dataset, HSVRFs again achieved the highest OA, at 98.50%, which was 1.07% and 4.81% higher than those of SVRFMC and CRF-H [27], respectively. Notably, even with the higher order potential, the OA of CRF-H is still lower than that of SVRFMC. This can be ascribed to the advantage of SVM in SVRFMC over MLR in CRF-H, and to the advantage of the Mahalanobis distance boundary constraint in SVRFMC over the Ising model in CRF-H.
On the Pavia University dataset, which has more complex structures, HSVRFs obtained the highest OA and Kappa, at 96.67% and 97.10%, respectively. These values were 2.02% and 3.98% higher than the corresponding values for SVRFMC. The OA of SVRFMC was 6.23% higher than that of CRFs. Over the two datasets, the OAs of CRFs, CRF-H, SVRFMC, and HSVRFs were higher than those of the non-contextual MLR and SVM, which shows the importance of contextual information. HSVRFs achieved the best performance among all six methods. Furthermore, the advantage of HSVRFs over SVRFMC is more obvious on the Pavia University dataset than on the Indian Pines dataset, which validates the suitability of the proposed higher order potentials for modeling urban HSI containing more complex structures.

Parameter Analysis
In the proposed HSVRFs model, three main factors influence the final classification accuracy: the threshold parameter $T$, which controls the rigidity of the higher order potentials; the number of superpixels (i.e., segments) in the ER over-segmentation algorithm; and the number of training samples. In this section, we analyze the classification performance of the model for various values of these three parameters.
Figure 3 shows the OA curves of the HSVRFs model for different values of $T$. Each OA in the curves is the average over five experiments with different randomly chosen training sets, and the other parameters were kept the same as in Section 3.1. For both datasets, we varied $T$ from 0.01 to 0.49 with a step size of 0.01. The two curves show that the highest OA was obtained at $T = 0.27$ for the Indian Pines dataset and $T = 0.43$ for Pavia University; these are the selected parameter values used in Section 3.1.
Figure 4 shows the OA curves of the HSVRFs model for different numbers of superpixels in the ER algorithm. Each OA in the curves is again the average over five experiments with different randomly chosen training sets, and the other parameters were set as in Section 3.1. For the Indian Pines dataset, we varied the number of superpixels from 100 to 2500 with a step size of 100; for the Pavia University dataset, from 300 to 3000 with a step size of 100. The two curves show that the highest OA was obtained with 500 superpixels for the Indian Pines dataset and 900 for Pavia University.
Figure 5 shows the OA curves of MLR, CRFs, SVM, SVRFMC, and HSVRFs for different numbers of training samples. Each OA is the average over five runs, and all other parameters were set as in Section 3.1. For the Indian Pines dataset, the number of training samples varied from 10 to 250, with a step size of 50 except between the first and second points of the curves; for the Pavia University dataset, it varied from 10 to 90 with a step size of 20. On both datasets, the OAs of all compared methods increased with the number of training samples, but the gains diminished beyond a certain number of samples. The HSVRFs model performed best among the five methods on both datasets. Finally, the HSVRFs model obtained the highest OA with 200 training samples for the Indian Pines dataset and 70 for the Pavia University dataset, which are the settings reported in Section 3.1.
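Each point in these curves requires randomly drawing a fixed number of training samples from each class and testing on the remaining labeled pixels. The following sketch shows one way to do this. It assumes that label 0 marks pixels without ground truth, and that a class with fewer labeled pixels than requested contributes all of its pixels. The paper does not state how such small classes are handled, so that rule is our assumption.

```python
import numpy as np

def sample_per_class(gt, n_per_class, seed=None, ignore_label=0):
    """Randomly draw up to n_per_class training pixels per class.

    Returns flat indices of training pixels and of the remaining labeled
    (test) pixels. ignore_label marks pixels without ground truth (assumed 0).
    """
    rng = np.random.default_rng(seed)
    flat = np.asarray(gt).ravel()
    train = []
    for c in np.unique(flat):
        if c == ignore_label:
            continue
        idx = np.flatnonzero(flat == c)
        k = min(n_per_class, idx.size)  # assumption: cap at the class size
        train.append(rng.choice(idx, size=k, replace=False))
    train = np.concatenate(train)
    test_mask = flat != ignore_label
    test_mask[train] = False            # test on all labeled pixels not used for training
    return train, np.flatnonzero(test_mask)

gt = np.array([[0, 1, 1, 2],
               [2, 2, 1, 0]])
train_idx, test_idx = sample_per_class(gt, n_per_class=2, seed=0)
print(train_idx, test_idx)
```

Passing a different seed for each of the five runs gives the "different randomly chosen training sets" that the averages are taken over.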

Conclusions
In this work, we propose a novel CRF model, named HSVRFs, for HSI classification. By incorporating higher order potentials into the SVRFMC model, the HSVRFs model not only takes advantage of the SVM classifier and the Mahalanobis distance boundary constraint of SVRFMC, but can also capture higher-level contextual information. Moreover, the pixels in each segment on which the higher order potentials are defined are weighted by their label confidences given by the SVM classifier, which further enhances the descriptive ability of the higher order potentials. Experiments on two real HSI datasets show that the HSVRFs model performs better than the traditional MLR, SVM, and CRFs methods, and also outperforms the recently proposed SVRFMC and CRF-H models. The experiments also show that the HSVRFs model is particularly effective for urban HSI, which has high spatial resolution and contains complicated structures and boundaries.
Currently, the pairwise potentials in the HSVRFs model are defined on neighboring pixels in the 8-neighborhood. A further step is to define pairwise potentials on neighboring superpixels (i.e., superpixels that share an edge), which would incorporate longer-range spatial context and may further improve the classification results; this is part of our future work on exploiting the spatial contextual information in HSI. In addition, the higher order potentials in HSVRFs introduce extra computation for feature value calculation and model inference, so HSVRFs require more computation time than second-order CRF frameworks. Improving the efficiency of the model is therefore another direction for our future work.
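Defining pairwise potentials on neighboring superpixels, as proposed above, requires a region adjacency graph in place of the 8-neighborhood pixel grid. The following sketch assumes that two superpixels "share an edge" when at least one pair of 4-connected pixels straddles their boundary. This is our interpretation, and the function name is hypothetical.

```python
import numpy as np

def superpixel_adjacency(seg_map):
    """Sorted list of (a, b) segment-id pairs, a < b, whose pixels touch 4-connectedly."""
    seg_map = np.asarray(seg_map)
    pairs = set()
    # Compare each pixel with its right neighbour, then with its lower neighbour.
    for a, b in ((seg_map[:, :-1], seg_map[:, 1:]),
                 (seg_map[:-1, :], seg_map[1:, :])):
        boundary = a != b
        for u, v in zip(a[boundary].tolist(), b[boundary].tolist()):
            pairs.add((min(u, v), max(u, v)))
    return sorted(pairs)

seg = np.array([[0, 0, 1],
                [0, 2, 1],
                [2, 2, 1]])
print(superpixel_adjacency(seg))
```

Each resulting pair would carry one pairwise term. This shrinks the graph from one node per pixel to one node per superpixel while linking regions that are farther apart in the image.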

Figure 1. Classification results of the Indian Pines dataset (with 200 training samples for each class). (a) Three-band false color image. (d) Ground truth. (b,c,g-i) Classification maps obtained by multinomial logistic regression (MLR), support vector machines (SVMs), conditional random fields (CRFs), support vector random fields with a Mahalanobis distance boundary constraint (SVRFMC), and higher order support vector random fields (HSVRFs). (e,f,j-l) The corresponding difference images of (b,c,g-i) compared with the ground truth in (d). Black regions represent pixels without ground truth; in the remaining areas, green regions represent correctly classified pixels and red regions represent misclassified pixels.
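The difference images in Figures 1 and 2 follow a simple color rule, and the following sketch shows one way to produce them. It assumes that the ground-truth map uses label 0 for pixels without ground truth. The pure RGB values chosen for green and red are ours, not the paper's.

```python
import numpy as np

def difference_image(pred, gt, unlabeled=0):
    """RGB difference map: black = no ground truth, green = correct, red = wrong."""
    pred = np.asarray(pred)
    gt = np.asarray(gt)
    rgb = np.zeros(gt.shape + (3,), dtype=np.uint8)  # black everywhere by default
    labeled = gt != unlabeled
    rgb[labeled & (pred == gt)] = (0, 255, 0)        # correctly classified
    rgb[labeled & (pred != gt)] = (255, 0, 0)        # misclassified
    return rgb

gt = np.array([[0, 1], [2, 2]])
pred = np.array([[1, 1], [2, 1]])
print(difference_image(pred, gt).reshape(-1, 3).tolist())
```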

Figure 2. Classification results of the Pavia University dataset. (a) Three-band false color image. (g) Ground truth. (b-f) Classification maps obtained by MLR, SVMs, CRFs, SVRFMC, and HSVRFs. (h-l) The corresponding difference images of (b-f) compared with the ground truth in (g). Black regions represent pixels without ground truth; in the remaining areas, green regions represent correctly classified pixels and red regions represent misclassified pixels.

Figure 3. Overall classification accuracies for different values of T. (a,b) show the OA versus different values of T on the Indian Pines and Pavia University datasets, respectively.

Figure 4. Overall classification accuracies for different numbers of superpixels. (a,b) show the OA versus different numbers of superpixels on the Indian Pines and Pavia University datasets, respectively.

Figure 5. Overall classification accuracies with different numbers of training samples. (a,b) show the OA versus different numbers of training samples on the Indian Pines and Pavia University datasets, respectively.

Table 1. Classification accuracies of the different algorithms for the Indian Pines dataset (%). OA: overall accuracy.

Table 2. Classification accuracies of the different algorithms for the Indian Pines dataset (%).

Table 3. Classification accuracies of the different algorithms for the Pavia University dataset (%).