Hyperspectral Image Classification Based on Fusion of Curvature Filter and Domain Transform Recursive Filter

In recent decades, obtaining the spatial information of hyperspectral images by various methods has become a research hotspot for enhancing the performance of hyperspectral image classification. This work proposes a new classification method based on the fusion of two types of spatial information, which are classified by a large margin distribution machine (LDM). First, spatial texture information is extracted by a curvature filter (CF) from the top principal components of the hyperspectral image. Second, spatial correlation information of the hyperspectral image is extracted with a domain transform recursive filter (DTRF). Last, the spatial texture information and correlation information are fused and classified with LDM. The experimental results of hyperspectral image classification demonstrate that the proposed curvature filter and domain transform recursive filter with LDM (CFDTRF-LDM) method is superior to other classification methods.

Many scholars around the world have successfully studied various classification methods for HSI, including sparse representation-based techniques [12], Bayesian estimation [13], K-means [14], maximum likelihood [15], multinomial logistic regression [16] and deep learning [17]. More specifically, the support vector machine (SVM) has been fruitfully applied to HSI classification and has achieved respectable results [18]. Zhang et al. adopted the idea of maximizing the margin mean and minimizing the margin variance to improve the maximum margin model of SVM, proposing the large margin distribution machine (LDM) [19]. In addition, Zhan et al. applied LDM to HSI classification [20].
To improve classification accuracy, many classification methods with spatial information extraction have been successfully investigated. Some scholars have attempted to obtain spatial information by segmentation to improve HSI classification. One method constructed a minimum spanning forest from region markers gained from the initial classification results [21]. Ghamisi et al. proposed a classification method based on two segmentation methods, fractional-order Darwinian particle swarm optimization and mean shift segmentation, and classified the integration of the two with SVM [22]. Other studies acquired spatial information with morphological profile features. Multiple morphological profiles were proposed for synthesizing the spectral-spatial information extracted from multicomponent base images and were interpreted with decision fusion and a sparse classifier based on multinomial logistic regression [23]. The method proposed by Xue et al. for HSI classification was performed via a morphological component analysis-based image separation rationale in sparse representation [24]. Liao et al. applied the morphological profile filter and the domain transform normalized convolution filter (DTNCF) to extract spatial information [25], which was combined and fed into SVM, with a two-step optimization in the classification process [26]. Moreover, some scholars attempted to improve classification performance with Markov random fields [27]. For instance, Sun et al. proposed an HSI classification method including a spectral data fidelity term and a spatially adaptive Markov random field prior in the hidden field, based on a maximum a posteriori framework with sparse multinomial logistic regression [28]. Zhang et al. used an extended Markov random field model to combine multiple features with local and nonlocal spatial constraints in the semantic space with a probabilistic SVM for HSI classification [29].
In order to obtain full spatial features of HSI, many classification methods for spatial information extraction have been investigated. For example, the integration of spectral and spatial context is an effective approach for HSI classification, and many researchers extract spatial information with different filters, such as the guided filter (GDF) [30], bilateral filter (BF) [31] and Gabor filter (GF) [32]. Wang et al. suggested a filtering framework named discriminatively guided image filtering, which integrates SVM and linear discriminative analysis via GDF to enhance classification performance [33]. A k-nearest neighbor method with GDF was presented by Guo et al. to extract spatial information and optimize classification accuracy [34]. Wang et al. proposed a spectral-spatial HSI classification method based on joint BF and graph cut segmentation with the SVM classifier [35]. Sahadevan et al. integrated the spatial texture information obtained with BF into the spectral domain to improve SVM performance [36]. A hyperspectral classification method was proposed based on sparse representation of spatial features, which were extracted by joint BF with the first principal component as the guidance image [37]. Edge-preserving filter (EPF) and principal component analysis (PCA) [38]-based EPFs (PCA-EPFs) methods, with GDF or BF and a recursive filter, were adopted to improve SVM classification performance in [39] and [40], respectively. Moreover, a feature extraction method based on image fusion with multiple subsets of adjacent bands and a recursive filter (IFRF) was developed by Kang et al. to increase the accuracy of HSI classification [41]. In addition, a spectral-spatial Gabor surface feature fusion method with an SVM classifier was developed for HSI classification, in which the magnitude pictures of Gabor features were extracted by a two-dimensional GF [42]. Li et al. projected Gabor features of the hyperspectral image obtained with GF into the kernel-induced space through a composite kernel technique [43]. Chen et al. combined GF with deep convolutional neural networks to mitigate the overfitting problem and increase classification accuracy for HSI [44]. Tu et al. proposed an HSI classification method based on non-local means filtering with maximum probability and SVM, which uses spatial context information and non-local means filtering on the first principal component to obtain the optimized probability image of the spatial structure [45].
A filter can be used to extract spatial texture features, but it is difficult to obtain complete spatial features using a single filter. In this paper, we first used the curvature filter (CF) to extract spatial texture features [46,47], and then applied DTRF [25] to obtain spatial correlation features, enriching the spatial characteristics for more effective hyperspectral image classification. Finally, LDM is adopted to classify the fusion of the two types of spatial information, forming a new classification method that combines the curvature filter and domain transform recursive filter with LDM (CFDTRF-LDM). The work of this paper can be summarized as follows: (1) CF with the minimal projection operator has the superior characteristics of a small calculation amount and fast convergence [46], and can efficiently extract the spatial features of a hyperspectral image. The spatial correlation information obtained by DTRF complements the spatial texture information to improve classification accuracy. (2) The effective fusion of the two types of spatial information is conducive to LDM classification and is superior to other methods.
The rest of this article is organized as follows. The methodology is presented in Section 2. Hyperspectral image datasets are used in Section 3 to test the effectiveness of the proposed method, with analysis of the experimental results of CFDTRF-LDM. Finally, conclusions are drawn in Section 4.

Classification Method for HSI
LDM improves the SVM classification performance by simultaneously maximizing the margin mean and minimizing the margin variance. A training set is defined as {(x_i, y_i)}, i = 1, ..., m, where m is the number of training samples. SVM predicts the unlabeled data with the hyperplane that maximizes the minimum margin [48]:

min_a (1/2)‖a‖² + C Σ_i ξ_i, s.t. y_i aᵀϕ(x_i) ≥ 1 − ξ_i, ξ_i ≥ 0, (1)

where a is the weight vector of the decision function g(x) = aᵀϕ(x), and ϕ(·) is the mapping induced by a kernel k, such that k(x_i, x_j) = ϕ(x_i)ᵀϕ(x_j). The margin of instance (x_i, y_i) can be formulated as

γ_i = y_i aᵀϕ(x_i). (3)

For the inseparable case, the soft-margin LDM can be expressed as

min_a (1/2)‖a‖² + α_1 η̂ − α_2 η̄ + C Σ_i ξ_i, s.t. y_i aᵀϕ(x_i) ≥ 1 − ξ_i, ξ_i ≥ 0, (4)

where α_1 and α_2 are the parameters trading off the margin variance and the margin mean. The margin mean η̄ and the margin variance η̂ can be characterized as Equations (5) and (6), respectively:

η̄ = (1/m) Σ_i y_i aᵀϕ(x_i), (5)

η̂ = (1/m) Σ_i (y_i aᵀϕ(x_i) − η̄)². (6)
Since the hyperplane of LDM maximizes the margin mean and minimizes the margin variance, LDM can achieve more effective performance for hyperspectral image classification with a small amount of training data [20,26].
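As a concrete illustration of the two quantities LDM optimizes, the sketch below computes the per-instance margins of a linear model together with their mean and variance. This is a toy setup with hypothetical data, not the LDM solver itself:

```python
import numpy as np

def margin_statistics(w, X, y):
    """Margins gamma_i = y_i * w^T x_i, plus their mean and variance --
    the two quantities LDM maximizes and minimizes, respectively."""
    gamma = y * (X @ w)           # per-instance margins
    return gamma.mean(), gamma.var()
```

Two hyperplanes with the same minimum margin can have very different margin distributions; LDM prefers the one with a larger mean and smaller variance.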

Spatial Information Extraction
In order to obtain fully spatial information, CF and DTRF were used to extract the spatial texture features and spatial correlation features, and the principle of the CF and DTRF will be analyzed in the following.

Curvature Filter
The curvature filter first studies the surfaces corresponding to the curvature and then selects, among all surfaces, the one that best approximates the data. As a unique optimization algorithm, the curvature filter optimizes the regularization energy and implicitly uses known differential geometry surfaces in the filtering process.

A. Optimization of energy functional
The basic idea of the variational regularization method is first to define the energy function of the image processing problem; the smaller the energy function, the closer the variable is to the expected result. The model to be optimized satisfies E(M) = E_D(M, I) + λ E_R(M), where the data-fitting energy E_D(M, I) measures how well M fits the image data I, the regularization energy E_R(M) formalizes prior knowledge about M, and λ is a scalar regularization coefficient weighing the contribution of the two energies. The evolution of the energy function in the variational model is shown in Figure 1: the data-fitting energy E_D(M, I) always increases, while the regularization energy E_R(M) decreases. Since the overall energy E(M) decreases, the regularization energy is the main part of the optimization process. Therefore, curvature filtering optimizes the regularization energy: as long as the reduction of the regularization energy is greater than the increase of the data-fitting energy, the decline of the overall energy is guaranteed. The curvature filter thus optimizes the variational model by reducing the curvature regularization energy to a minimum, minimizing the regularization energy by minimizing the curvature from the perspective of differential geometry [46].



B. Domain decomposition
There is a dependency between adjacent pixels, which hinders local minimization of the principal curvature.A domain decomposition algorithm was proposed here to circumvent the problem.
As shown in Figure 2, the discrete domain Ω of image U is decomposed into four subsets: red triangle R_T, red circle R_C, purple triangle P_T and purple circle P_C. The advantages of this decomposition are as follows: (1) the dependence between adjacent pixels is eliminated, improving filtering efficiency; (2) the independence of the updated subsets ensures convergence; (3) all the tangent planes can be enumerated in a 3 × 3 local window [46].

C. Projection to the tangent plane
Assuming that a pixel is x, constructing the surface amounts to projecting the current pixel value M(x) of the hyperspectral image onto the ideal pixel value M̂(x), which lies on the optimal tangent plane of the neighboring pixels [46]: M̂(x) = M(x) + d, where d is the projection distance.
To find the optimal tangent plane of the neighborhood of M(x), all possible triangles are enumerated in the 3 × 3 neighborhood of x (as shown in Figure 3, excluding x as the vertex). Among them, four pass through the red field R, four through the purple field P, and four through the red/purple mixed field RP. As shown in Figure 4, since the 12 triangular sections share common edges through x and projection onto a shared edge suffices, there are only eight distinct projection distances d_i: two common edges in R, two common edges in P, and four mixed tangent planes.


D. Minimal Projection Operator (P_g)
According to Euler's theorem, the normal curvature along the angle θ_i to the principal plane is k(θ_i) = k_1 cos²θ_i + k_2 sin²θ_i, where k_1 and k_2 are the principal curvatures. If the angular samples θ_i are sufficiently dense within (−π, π) and k_1, k_2 ≥ 0, then d_m ≈ min{k_i}. For the pixel at (i, j), the distance d_m can be obtained from the tangent planes of the neighborhood pixels in the 3 × 3 window [46]. Therefore, the distance with the minimum absolute value, d_m, is taken as the minimal projection of M(x) onto the tangent plane of its neighborhood.

E. Gaussian curvature filter
The minimal projection operator is iterated over all pixels of R_T, R_C, P_T and P_C, which generates the Gaussian curvature filter. As a unique optimization algorithm, Gaussian curvature filtering is an edge-preserving image smoothing algorithm: it assumes that the surface formed by the ideal noise-free image is piecewise developable, with Gaussian curvature zero everywhere. The pixel values are adjusted directly so that the tangent planes of the neighborhood pixels satisfy this assumption, avoiding explicit computation of the Gaussian curvature. Thus, the image surface is no longer required to be twice differentiable, allowing abrupt edges and corners in the image and thereby ideally preserving its edges.
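The decomposition-and-projection procedure above can be sketched as follows. This is a simplified, unoptimized rendering of one Gaussian curvature filter iteration: the image is swept in four independent subsets, and each pixel moves by the smallest-magnitude distance to a neighborhood tangent plane. The exact set of eight projection distances follows the construction in [46], but treat this sketch as an assumption rather than the authors' implementation:

```python
import numpy as np

def gc_filter(u, iterations=5):
    """Sketch of the Gaussian curvature filter with domain decomposition."""
    u = u.astype(float).copy()
    for _ in range(iterations):
        # Four independent subsets (cf. R_T, R_C, P_T, P_C): pixels within a
        # subset are two apart, so their updates do not interact.
        for (r0, c0) in [(1, 1), (1, 2), (2, 1), (2, 2)]:
            for i in range(r0, u.shape[0] - 1, 2):
                for j in range(c0, u.shape[1] - 1, 2):
                    c = u[i, j]
                    # Eight candidate projection distances in the 3x3 window.
                    d = np.array([
                        (u[i-1, j] + u[i+1, j]) / 2 - c,
                        (u[i, j-1] + u[i, j+1]) / 2 - c,
                        (u[i-1, j-1] + u[i+1, j+1]) / 2 - c,
                        (u[i-1, j+1] + u[i+1, j-1]) / 2 - c,
                        u[i-1, j] + u[i, j-1] - u[i-1, j-1] - c,
                        u[i-1, j] + u[i, j+1] - u[i-1, j+1] - c,
                        u[i, j-1] + u[i+1, j] - u[i+1, j-1] - c,
                        u[i, j+1] + u[i+1, j] - u[i+1, j+1] - c,
                    ])
                    # Minimal projection: move by the smallest |d_i|.
                    u[i, j] = c + d[np.argmin(np.abs(d))]
    return u
```

On a constant region every distance is zero, so the pixel is untouched; near an edge the smallest-magnitude projection avoids crossing the edge, which is the edge-preserving behavior described above.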
A hyperspectral image contains hundreds of spectral bands, with a large amount of data and high correlation between adjacent bands, which leads to redundant information. In order to obtain more comprehensive spatial information with CF, we first use PCA to reduce the dimensionality of the hyperspectral image. The CF validation test can be found in Section 3.3.

Domain Transform Recursive Filter (DTRF)
DTRF, proposed by Gastal et al., is used for image filtering, converting two-dimensional image filtering into one-dimensional filtering [25]. For the hyperspectral band image R_k, the recursive filter in the transformed domain can be represented as

J(y_n) = (1 − a^d) f(y_n) + a^d J(y_{n−1}), (14)

where J(y_{n−1}) is the result of the (n−1)-th recursive step, a = exp(−√2/σ_H) is the feedback coefficient, and d is the distance between the neighboring samples y_n and y_{n−1} in the transformed domain Ω_w. The input f(y_n) is obtained through the domain transform, which is calculated by integrating the partial derivative of the band image R_k and is therefore an increasing function. Here, σ_s is the spatial standard deviation, σ_r is the range standard deviation, σ_{H_t} is the filter scale at the t-th iteration, and N is the total number of iterations. DTRF has an infinite impulse response with exponential decay. Briefly, as d increases, a^d goes to zero, which stops the propagation chain, indicating that the neighboring pixels lie on different sides of an edge; for small d, propagation continues within the same ground object. Equation (14) is an asymmetric causal filter and depends on both input and output information. To obtain filtering symmetry, the equation is executed twice: first from left to right and then from right to left, or from top to bottom and then from bottom to top [25].
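The recursion is easiest to see in one dimension. The sketch below is a minimal rendering of the domain transform recursive filter on a 1-D signal; the σ_H schedule follows Gastal et al. [25], and the parameter values are illustrative assumptions:

```python
import numpy as np

def dtrf_1d(signal, sigma_s=60.0, sigma_r=0.4, iterations=3):
    """1-D domain transform recursive filter (sketch after Gastal et al. [25])."""
    I = signal.astype(float)
    # Domain transform distances: large across strong edges, ~1 in flat areas.
    d = 1.0 + (sigma_s / sigma_r) * np.abs(np.diff(I))
    J = I.copy()
    for t in range(1, iterations + 1):
        # Per-iteration filter scale sigma_H_t.
        sigma_h = sigma_s * np.sqrt(3.0) * 2 ** (iterations - t) \
                  / np.sqrt(4 ** iterations - 1)
        a = np.exp(-np.sqrt(2.0) / sigma_h)
        w = a ** d  # feedback weight a^d between neighboring samples
        # Causal left-to-right pass, then right-to-left for symmetry.
        for n in range(1, len(J)):
            J[n] += w[n - 1] * (J[n - 1] - J[n])
        for n in range(len(J) - 2, -1, -1):
            J[n] += w[n] * (J[n + 1] - J[n])
    return J
```

Across a step edge the weight a^d collapses toward zero, so the two sides are smoothed almost independently, which is the edge-stopping behavior described above.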
In general, the ground distribution of hyperspectral images is fairly uniform, so there is always a strong spatial correlation between pixels in a hyperspectral image. Spatial correlation refers to the association of the reflection intensity between a pixel and its adjacent pixels. However, spatial correlation information is often ignored in texture information extraction.
To examine the spatial correlation features of CF and DTRF, Moran's I [49,50] is employed to test the spatial correlation of hyperspectral images before and after filtering, calculated by the following formula:

I = (n / Σ_i Σ_j α_ij) · [Σ_i Σ_j α_ij (Y_i − Ȳ)(Y_j − Ȳ)] / [Σ_i (Y_i − Ȳ)²],

where Y_i and Y_j are the reflection intensities of two hyperspectral pixels, Ȳ is the average of Y, n is the number of pixels in one band, and α_ij is the spatial weight. The larger I is, the stronger the spatial correlation, and vice versa. Section 3.4 describes validation tests for spatial correlation information extraction with DTRF.
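As an illustration, Moran's I for a single band can be computed as below. This is a minimal sketch with binary rook-adjacency (4-neighbor) weights α_ij, which is an assumption of the sketch, since the weighting scheme is not specified above:

```python
import numpy as np

def morans_i(band):
    """Moran's I of a 2-D band with binary rook (4-neighbour) weights."""
    z = band.astype(float) - band.mean()   # deviations from the band mean
    n = z.size
    num = 0.0       # sum_ij alpha_ij * z_i * z_j
    w_sum = 0.0     # sum_ij alpha_ij
    # Horizontal neighbour pairs (factor 2 counts both directions).
    num += 2 * np.sum(z[:, :-1] * z[:, 1:])
    w_sum += 2 * z[:, :-1].size
    # Vertical neighbour pairs.
    num += 2 * np.sum(z[:-1, :] * z[1:, :])
    w_sum += 2 * z[:-1, :].size
    return (n / w_sum) * num / np.sum(z ** 2)
```

A smooth gradient yields I close to +1 (strong positive spatial correlation), while a checkerboard pattern yields a negative I, matching the interpretation given above.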

CFDTRF-LDM
Based on CF and DTRF, a new classification approach (CFDTRF-LDM) is proposed. CF and DTRF are applied to extract spatial texture information and spatial correlation information, respectively. In order to obtain rich spatial correlation features, the spatial correlation information is extracted from the original spectral images. In addition, to avoid the Hughes phenomenon, the spatial correlation information and spatial texture information are obtained from the original image and the PCA components respectively, so that the total number of images is suitable for LDM classification. The implementation process is as follows.
Step 1: normalization. Formula (21) normalizes the hyperspectral image R as H = (R − µ)/σ, where µ and σ are the mean and standard deviation of R.
Step 2: dimensionality reduction. Since most of the information is concentrated in the leading principal components after PCA, the dimensionality of the normalized image H is further reduced by PCA, and the top 10% of the principal components are selected as E for CF.
Step 3: spatial texture information extraction. CF extracts the spatial texture information D_t on each band of E by Equation (13).
Step 4: spatial correlation information extraction. DTRF extracts the spatial correlation information D_c from the normalized image H.
Step 5: fusion. Equation (23) linearly fuses D_t and D_c into the final feature D.
Step 6: classification. The training set is randomly selected in proportion from D, and the test set is formed from the remaining samples; classification is performed with the LDM classifier.
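The steps above can be sketched as a feature pipeline. In this sketch, `curvature_filter` and `dtrf` are placeholders for the per-band filters, and `beta` is a hypothetical fusion weight shown as weighted stacking, since the exact form of Equation (23) is not reproduced here:

```python
import numpy as np

def cfdtrf_features(R, n_pc_ratio=0.10, beta=0.5,
                    curvature_filter=None, dtrf=None):
    """CFDTRF feature pipeline sketch (Steps 1-5) for an H x W x B cube R."""
    H, W, B = R.shape
    X = R.reshape(-1, B)
    # Step 1: normalization by global mean and standard deviation.
    X = (X - X.mean()) / X.std()
    # Step 2: PCA via SVD, keeping the top 10% of components.
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = max(1, int(np.ceil(n_pc_ratio * B)))
    E = (Xc @ Vt[:k].T).reshape(H, W, k)
    # Step 3: spatial texture information from the principal components.
    Dt = np.stack([curvature_filter(E[:, :, i]) for i in range(k)], axis=-1)
    # Step 4: spatial correlation information from the normalized bands.
    Dc = np.stack([dtrf(X.reshape(H, W, B)[:, :, i]) for i in range(B)], axis=-1)
    # Step 5: fusion of the two feature stacks (weighted stacking, since the
    # two stacks have different band counts).
    return np.concatenate([beta * Dt, (1 - beta) * Dc], axis=-1)
```

The fused feature D then goes to Step 6, where a proportion of its pixels is drawn as the LDM training set.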
The flow of the CFDTRF-LDM is shown in Figure 5.

Hyperspectral Data Description
Three hyperspectral image datasets were used to verify the effectiveness of CFDTRF-LDM. The first dataset was Indian Pines [51], acquired in 1992 by the airborne visible/infrared imaging spectrometer (AVIRIS) sensor over the Indian Pines region of Northwestern Indiana. It contains 220 spectral bands with a spatial size of 145 × 145 pixels. Due to noise and water absorption, 20 spectral bands were removed, leaving 200 bands. This image includes 16 classes, and the specific types and numbers of each class are shown in Table 1.

The third dataset was Kennedy Space Center, acquired by the NASA airborne visible/infrared imaging spectrometer (AVIRIS) at the Kennedy Space Center in Florida on 23 March 1996. AVIRIS collected 224 bands of 10 nm width with center wavelengths from 400 to 2500 nm. The Kennedy Space Center data were acquired from an altitude of approximately 20 km with a spatial resolution of 18 m. After removal of water-absorption and low-SNR bands, 176 bands were used for the analysis. This image includes 13 classes, and the specific types and numbers of each class are shown in Table 3.

Parameter Setting
To demonstrate the superiority of the proposed method, the following methods were compared with CFDTRF-LDM:
(1) SVM [18]: SVM applied to the raw features of hyperspectral images with the Gaussian radial basis function kernel.
(2) PCA-SVM (PCA with SVM): PCA reduced the hyperspectral dimension, and the top 10% of components were fed to SVM.
(3) LDM: LDM with the Gaussian radial basis function kernel applied to the raw features of hyperspectral images.
(4) PCA-LDM (PCA with LDM): PCA reduced the hyperspectral dimension, and the top 10% of components were fed to LDM.
(5) EPF [39]: SVM first classified the hyperspectral image; an edge-preserving filter was then applied to each probabilistic map, and the class of every pixel was selected by maximum probability.
(6) IFRF [41]: classification with SVM based on image fusion and a recursive filter.
(7) PCA-EPFs [40]: spatial features constructed by edge-preserving filters were stacked into a fused feature, whose dimension was reduced by PCA before SVM classification.
(8) LDM and feature learning-based (LDM-FL) [20]: classification with LDM on recursive-filter features.
(9) CF-SVM: PCA reduced the hyperspectral dimension, and the first 10% of principal components were filtered by CF and classified by SVM.
(10) CF-LDM: as (9), but classified by LDM.
(11) DTRF-SVM: PCA reduced the hyperspectral dimension, and the first 10% of principal components were filtered by DTRF and classified by SVM.
(12) DTRF-LDM: as (11), but classified by LDM.
(13) CFDTRF-LDM: the method proposed in this paper.
(14) CFDTRF-SVM: the proposed feature pipeline classified by SVM instead of LDM.
In this paper, the overall accuracy (OA), average accuracy (AA) and kappa statistic (Kappa) were adopted to assess classification accuracy. To avoid biased estimation, twelve independent tests were carried out in Matlab R2012b on a computer with an i7-6700 CPU and 8 GB RAM.

The Validation Test of CF and DTRF
To verify the CF validation, the 10th, 60th, 130th and 180th bands of Indian Pines were processed with CF. As shown in Figure 6, CF can extract good boundary features of hyperspectral images and has great advantages in obtaining smooth edges when smoothing hyperspectral images. DTRF, in turn, has good spatial correlation preserving characteristics.

Test of Spatial Correlation Information
To compare the spatial correlation of CF and DTRF, we calculated the mean of Moran's I for each band of the Indian Pines, Salinas Valley and Kennedy Space Center datasets. The average Moran's I of the two filters is shown in Figure 7. The average Moran's I obtained with DTRF is higher than that of CF and of the raw spectral features. In addition, the average Moran's I acquired by CF is lower than that of the spectral images, suggesting that its spatial correlation information is weak. This illustrates that DTRF extracts good spatial correlation information and effectively compensates for the deficiency of CF.


Optimization of DTRF
The total number of iterations N, the spatial standard deviation σ_s and the range standard deviation σ_r of DTRF influence the filtering effect of the image. Therefore, a classification test was conducted on the Indian Pines dataset to verify the effectiveness of parameter optimization. From the entire dataset, 4% and 96% of the samples were randomly selected for training and testing, respectively, and an exhaustive search was employed to establish the three optimal parameters yielding the most satisfactory LDM classification results. To reduce the complexity of the algorithm, we first set the total number of iterations to N = 10. Then, σ_r ∈ {0.10, 0.11, ..., 0.50} and σ_s ∈ {10, 15, ..., 500} were set for the experiments, and the classification was performed sequentially for all 41 × 99 = 4059 parameter combinations. According to the results, the best classification is obtained with σ_r = 0.43 and σ_s = 260, giving an optimal OA = 90.23%. Therefore, to achieve better classification, σ_r = 0.43 and σ_s = 260 are adopted in the following experiments.
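The exhaustive search described above can be sketched as a simple grid search; `evaluate(sigma_r, sigma_s)` is a placeholder for one train-and-test run returning OA, and the grid reproduces the 41 × 99 = 4059 combinations:

```python
import numpy as np

def grid_search(evaluate):
    """Exhaustive search over the DTRF parameter grid:
    sigma_r in {0.10, ..., 0.50} (step 0.01, 41 values) and
    sigma_s in {10, ..., 500} (step 5, 99 values)."""
    sigma_rs = np.round(np.arange(0.10, 0.50 + 1e-9, 0.01), 2)
    sigma_ss = np.arange(10, 500 + 1, 5)
    best = (-np.inf, None, None)       # (OA, sigma_r, sigma_s)
    for sr in sigma_rs:
        for ss in sigma_ss:
            oa = evaluate(sr, ss)
            if oa > best[0]:
                best = (oa, sr, ss)
    return best
```

Note the `+ 1e-9` guard on the float-stepped `arange`, which keeps the endpoint 0.50 in the grid despite floating-point accumulation.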

Experiment of Indian Pines
To evaluate the performance of CFDTRF-LDM, the fourteen methods listed above were used to classify the Indian Pines data. The distribution of the Indian Pines dataset is shown in Figure 8a. All 16 categories were selected, of which 5% (about 533) of the samples were employed as the training set and the rest as the test set; for three types of Indian Pines ground objects with insufficient samples, 20% were used for training. Tables 1 and 2 show the classification accuracies generated by the fourteen classification methods, and the classification results are shown in Figure 8.
The classification results for Indian Pines are shown in Figure 8, while Tables 1 and 2 show the accuracies of OA, AA and Kappa for each class of the different methods, indicating that CFDTRF-LDM achieved the best accuracy, with OA = 96.64%, AA = 96.04% and Kappa = 96.18%. Furthermore, the accuracies of six classes exceed 99% for CFDTRF-LDM. This experiment demonstrates that the classification performance was improved compared to the other classification methods.

Experiment of Salinas Valley
Similarly, the distribution of the Salinas Valley dataset is shown in Figure 9a: all 16 classes were selected, with 0.8% (about 433) of the samples as the training set and the remaining 99.2% as the test set. Tables 3 and 4 list the classification accuracies of the Salinas Valley dataset for the different methods, and the classification effects are shown in Figure 9.
The classification results for Salinas Valley are shown in Figure 9, while Tables 3 and 4 list the OA, AA, and Kappa accuracies for each class of the different methods. CFDTRF-LDM achieved the best accuracy, with OA = 99.16%, AA = 98.71%, and Kappa = 99.06%. Furthermore, the accuracies of four classes reached 100% for CFDTRF-LDM. This experiment demonstrates that the classification performance was improved compared to the other classification methods.
Experiment of Kennedy Space Center
Likewise, the distribution of the Kennedy Space Center dataset is shown in Figure 10a: all 13 classes were selected, of which 4% (about 208) of the samples were employed as the training set, and the remaining 96% were used as the test set. The classification results for Kennedy Space Center are shown in Figure 10, while Tables 5 and 6 list the OA, AA, and Kappa accuracies for each class of the various methods, with CFDTRF-LDM achieving the best accuracy: OA = 97.33%, AA = 96.13%, and Kappa = 97.03%. Furthermore, the accuracies of six classes exceeded 99% for CFDTRF-LDM. This experiment shows that the classification performance was enhanced compared to the other classification methods.

Analysis
First, the classification results are shown in Figure 11. The OA values of LDM and PCA-LDM for Indian Pines were 79.85% and 78.49%, which were 2.38% and 0.68% higher than those of SVM and PCA-SVM, respectively. Likewise, the OA values of LDM and PCA-LDM for Salinas Valley were 88.96% and 89.19%, which were 0.98% and 1.74% higher than those of SVM and PCA-SVM. Furthermore, the OA values of LDM and PCA-LDM for Kennedy Space Center were 85.11% and 80.65%, which were 2.66% and 1.16% higher than those of SVM and PCA-SVM. It can be concluded that LDM is superior to SVM owing to its maximization of the margin mean and minimization of the margin variance.

Second, as shown in Figure 12, the OA values of CF-SVM and CF-LDM on Indian Pines were 10.62% and 9.79% higher than those of SVM and LDM, respectively, and the OA values of CF-SVM and CF-LDM on Salinas Valley were 1.21% and 1.60% higher than those of SVM and LDM. In addition, the OA values of CF-SVM and CF-LDM on Kennedy Space Center were 7.62% and 5.91% higher than those of SVM and LDM. This finding indicates that the spatial texture information extracted by CF was effective for enhancing the classification performance of SVM and LDM.

Fourth, as shown in Figure 14, the OA values of CFDTRF-LDM on Indian Pines, Salinas Valley, and Kennedy Space Center were 96.64%, 99.16%, and 97.33%, respectively. All of these OA values were larger than those of EPF, IFRF, PCA-EPFs, and LDM-FL. Therefore, the spatial texture information and spatial correlation information obtained by CF and DTRF improve the performance of LDM more than the edge-preserving-filter and recursive-filter methods and the LDM-based methods do.

To examine the effect of the training ratio on the classification, the three datasets were classified with different training ratios, as shown in Figure 15. As can be seen from the figure, with a training ratio of 2% on the Indian Pines dataset, the OA of the proposed method reached 90.41%; when the ratio increased to 7%, the OA exceeded 97%. Likewise, with a training ratio of 0.2% on the Salinas Valley dataset, the OA reached 90%, and when the ratio increased to 0.8%, it reached 99%. Also, with training ratios of 2% and 9%, the OA on Kennedy Space Center reached 93% and 99%, respectively. Thus, the proposed CFDTRF-LDM can obtain satisfactory classification with a small training set and remains stable across different training ratios with optimal classification performance.
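The per-class random sampling used in the training-ratio experiments can be sketched as follows (the function name and the at-least-one-sample guard are our assumptions):

```python
import numpy as np

def stratified_split(labels, train_ratio, rng=None):
    """Randomly split sample indices into train/test sets per class.

    Mirrors the experimental protocol: for each class, `train_ratio` of the
    labeled samples are drawn for training and the rest are kept for testing.
    """
    rng = np.random.default_rng(rng)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        n_train = max(1, int(round(train_ratio * idx.size)))  # at least one sample
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)
```

Repeating this split for each ratio in Figure 15 and re-running the classifier yields the OA-versus-ratio curves.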
to effectively classify with LDM and obtain excellent classification performance for HSI. Furthermore, the proposed method can obtain satisfactory classification with a small training set and remains stable across various training ratios with optimal classification performance. For future work, more efficient spatial information should be explored for SVM or LDM classification.
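As a rough illustration of the fusion step, assuming the CF and DTRF feature cubes are simply stacked band-wise before classification (the exact fusion operator is not specified in this excerpt, so stacking is an assumption):

```python
import numpy as np

def fuse_features(cf_cube, dtrf_cube):
    """Fuse spatial texture (CF) and spatial correlation (DTRF) features.

    Both inputs are (rows, cols, bands) cubes; fusion here is band-wise
    stacking, after which each pixel's fused vector would be fed to the
    classifier (LDM in the paper; any margin classifier in this sketch).
    """
    fused = np.concatenate([cf_cube, dtrf_cube], axis=-1)
    # Flatten to (n_pixels, n_features) for a vector classifier
    return fused.reshape(-1, fused.shape[-1])
```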

Figure 1. The evolution process of the energy functional.


Figure 3. All possible triangles in the x neighborhood: (a) down; (b) down and left (P); (c) mix.

Figure 4. Eight types of the triangular tangent planes through x: (a) two of the common edges from the four tangent planes (R); (b) two of the common edges from the four tangent planes (P); (c) four of the tangent planes through mixed neighbors.

Figure 6. Curvature filter (CF) and domain transform recursive filter (DTRF) comparison for Indian Pines: (a) the 10th band of the spectrum; (b) the 60th band of the spectrum; (c) the 130th band of the spectrum; (d) the 180th band of the spectrum; (e) the 10th band filtered by CF; (f) the 60th band filtered by CF; (g) the 130th band filtered by CF; (h) the 180th band filtered by CF; (i) the 10th band filtered by DTRF; (j) the 60th band filtered by DTRF; (k) the 130th band filtered by DTRF; (l) the 180th band filtered by DTRF.

Figure 7. Average of Moran's I for hyperspectral images (HSI): (a) Indian Pines; (b) Salinas Valley; (c) Kennedy Space Center.

Figure 12. Comparison of SVM, CF-SVM, LDM and CF-LDM on the three datasets.

Third, from Figure 13, the OA values of DTRF-SVM and DTRF-LDM on Indian Pines were 14.82% and 14.75% larger than the OA values of SVM and LDM, respectively. Correspondingly, the OA values of DTRF-SVM and DTRF-LDM on Salinas Valley were 8.72% and 9.56% higher than the SVM and LDM OA values. Similarly, the OA values of DTRF-SVM and DTRF-LDM on Kennedy Space Center were 10.17% and 10.13% higher than the SVM and LDM OA values, respectively. Thus, the spatial correlation information extracted by DTRF was effective for improving the hyperspectral classification in this work.

Table 1. Comparison of classification accuracies (in percent) provided by seven methods for Indian Pines (part A).

(Table columns: Sum, Train, SVM, PCA-SVM, LDM, PCA-LDM, EPF, IFRF, PCA-EPFs.)

The second dataset was Salinas Valley [52], collected by AVIRIS in the Salinas Valley, Southern California, in 1998. It has a high spatial resolution of 3.7 m, a spatial size of 512 × 217 pixels, and 206 spectral bands. Similarly, 200 bands were retained after removing noisy and water-absorption bands. The image also includes 16 classes, and the specific types and numbers of each class are shown in Table 2.

Table 2. Comparison of classification accuracies (in percent) provided by seven methods for Indian Pines (part B).

Table 3. Comparison of classification accuracies (in percent) provided by seven methods for Salinas Valley (part A).

Table 4. Comparison of classification accuracies (in percent) provided by seven methods for Salinas Valley (part B).

Table 5. Comparison of classification accuracies (in percent) provided by seven methods for Kennedy Space Center (part A).

Table 6. Comparison of classification accuracies (in percent) provided by seven methods for Kennedy Space Center (part B).