A Spectral-Spatial Classification of Hyperspectral Images Based on the Algebraic Multigrid Method and Hierarchical Segmentation Algorithm

The algebraic multigrid (AMG) method is used to solve linear systems of equations on a series of progressively coarser grids and has recently attracted significant attention for image segmentation due to its high efficiency and robustness. In this paper, a novel spectral-spatial classification method for hyperspectral images based on the AMG method and hierarchical segmentation (HSEG) algorithm is proposed. Our method consists of the following steps. First, the AMG method is applied to hyperspectral imagery to construct a multigrid structure of fine-to-coarse grids based on the anisotropic diffusion partial differential equation (PDE). The vertices in the multigrid structure are then considered as the initial seeds (markers) for growing regions and are clustered to obtain a sequence of segmentation results. In the next step, a maximum vote decision rule is employed to combine the pixel-wise classification map and the segmentation maps. Finally, a final classification map is produced by choosing the optimal grid level to extract representative spectra. Experiments based on three different types of real hyperspectral datasets with different resolutions and contexts demonstrate that our method can obtain 3.84%–13.81% higher overall accuracies than the SVM classifier. The performance of our method was further compared to several marker-based spectral-spatial classification methods using objective quantitative measures and a visual qualitative evaluation.


Introduction
Hyperspectral imaging systems can acquire numerous contiguous spectral bands throughout the electromagnetic spectrum. Therefore, hyperspectral imaging techniques are widely used for many applications, including environmental monitoring, mineralogy, astronomy, surveillance and defense [1]. Nevertheless, the high dimensionality of the pixels, undesirable noise, high spectral redundancy and spectral and spatial variabilities, in conjunction with limited ground truth data, present challenges for the analysis of hyperspectral imagery. In recent decades, many hyperspectral image classification methods have been proposed, including ensemble learning [2][3][4], Bayesian approaches [5][6][7], kernel-based methods [8,9], neural networks [10], sparse representations [11,12] and manifold learning [13].
To further improve classification performance, many contributions have been dedicated to spectral-spatial classification, which combines spatial contextual information with spectral information for hyperspectral image classification. The approaches include using co-occurrence [14], extended morphological profiles [15,16], extended morphological attribute profiles [17], Gabor filtering [18] and multihypothesis prediction [19] to extract spatial features, and Markov random fields [20,21], graph kernels [22] and composite kernels [23] to perform spectral-spatial classification.
Alternatively, spatial information can also be integrated into the spectral-spatial classification process by performing image segmentation. For that purpose, many segmentation techniques have been proposed, including watershed, partitional clustering, the HSEG algorithm, and minimum spanning forest (MSF), to segment hyperspectral imagery into homogeneous regions according to a homogeneity criterion. A maximum vote decision rule is then employed to classify all the homogeneous regions according to a pixel-wise classification result. Because the technique can define large-scale neighborhoods for large homogeneous regions without missing small regions, more accurate classification results can be achieved than by traditional spectral-spatial methods. However, the automatic segmentation of hyperspectral imagery is still an open problem. To remedy this problem, a marker-controlled segmentation method was proposed to automatically select a single hierarchical segmentation level [24]. The idea behind this approach is to select at least one pixel for each spatial object and to grow regions from the selected seeds (also called markers) to guarantee that each region is associated with one marker in the segmentation maps. Meanwhile, the experimental results in [24,25] demonstrated that the classification accuracies of the HSEG and MSF segmentation algorithms using automatically selected markers can greatly outperform the SVM classifier. Nevertheless, the drawbacks of these approaches are twofold. First, the marker selection methods are based on the performance of pixel-wise classifiers, i.e., the performance of different pixel-wise classifiers leads to different markers and uncertainties in the classification results. Even if the same classifier is applied, using different parameter settings may cause a similar problem. Second, the randomness of the training samples in the pixel-wise classification procedure always generates stochastic markers, which results in unstable classification accuracies.
To solve these problems, a novel spectral-spatial classification method for hyperspectral imagery based on the AMG method and HSEG algorithm is presented. Our method includes four main steps. First, a multiscale representation of the hyperspectral imagery is obtained using the AMG method to solve an anisotropic diffusion PDE. Recently, PDE-based techniques have been applied to image processing and computer vision due to their outstanding edge-preserving smoothing properties and explicit accounting of intrinsic geometries. The scale-space representation of hyperspectral imagery can be obtained by solving a classical anisotropic diffusion PDE using the AMG method for higher accuracy, efficiency and scalability. Following this idea, we applied the AMG method to construct a multigrid structure with different grid levels, from the finest grid (which corresponds to the original image) to the coarsest one. The vertices in the multigrid structure are then considered as markers for region growing and are clustered to obtain a sequence of segmentation results with different numbers of homogeneous regions. It should be remarked that the markers that are extracted from the multigrid structure are only determined by the structures of the hyperspectral imagery rather than training samples or the performance of pixel-wise classifiers, which is more robust than the traditional methods. In addition, by combining the pixel-wise SVM classification map and the unsupervised segmentation maps using a maximum vote decision rule, a series of spectral-spatial classification maps is obtained. Finally, a final classification map is produced by choosing the optimal grid level to extract representative pixels for our segmentation and classification. Multiscale representations for hyperspectral imagery using the AMG method always contain a series of grid levels. In fact, each grid level can be utilized to extract the required representative pixels. As a consequence, a sequence of segmentation maps is produced for the subsequent classification process. In this paper, we will discuss an optimization problem to automatically select an optimal grid level to achieve the best classification accuracies.
The remainder of this paper is organized as follows. Section 2 presents the proposed spectral-spatial classification of hyperspectral imagery based on the AMG method and HSEG algorithm. Section 3 describes the experimental results, Section 4 is a discussion of our method, and Section 5 presents our concluding remarks.
Remote Sens. 2016, 8, 296

Materials and Methods
A flow-chart of the proposed classification method is illustrated in Figure 1. Because the AMG method is more robust for PDE-based image analysis [26], we utilized it to solve the hyperspectral anisotropic diffusion PDE. First, the hyperspectral anisotropic diffusion PDE is briefly discussed in this section. The AMG method constructs a multigrid structure, which is visualized in Figure 1 as a pyramid, with the finest grid (which corresponds to the original image) at its base and the coarsest grid at its top. The detailed introduction of the AMG algorithm is then presented. Once the multigrid structure based on the AMG method is constructed for the hyperspectral image, we use the representative spectra in Figure 1 as markers for region growing in the HSEG algorithm. A step-by-step procedure of the AMG-derived M-HSEG method is also provided. As shown in Figure 1, a multiscale representation of hyperspectral imagery using the AMG method always contains a series of grid levels, each of which can be utilized to extract the required representative pixels. The selection of an optimal grid level to achieve the best classification accuracies is finally introduced.

Hyperspectral Anisotropic Diffusion PDE
It is well known that the linear scale-space can be represented by the heat diffusion equation. However, the disadvantage of the equation is that the edge features are smeared and distorted after a few iterations of the diffusion evolution. To overcome the problem of the linear scale-space, Perona and Malik [27] presented an anisotropic diffusion PDE to encourage intraregional smoothing while preventing interregional smoothing. Let $u_0 : \Omega \subset \mathbb{R}^2 \to \mathbb{R}$ denote an original grey-level image, where $\Omega$ is the image domain. The anisotropic diffusion PDE is

$$\frac{\partial u}{\partial t} = \operatorname{div}\left( g\left( |\nabla u| \right) \nabla u \right) \qquad (1)$$

where $u$ is the smoothed image at time $t$, and $\nabla$ and $\operatorname{div}$ are the gradient and divergence operators, respectively. $g(s)$ is the diffusion coefficient, which is defined as a non-negative monotonically decreasing function of the local gradient magnitude $s = |\nabla u|$, and several forms of $g(\cdot)$ have been widely exploited for Equation (1) [28][29][30]. We can obtain a series of smoothed images by iteratively evolving Equation (1) starting from the observed image at $t = 0$, which constitutes a nonlinear scale space. To construct a scale-space representation of the hyperspectral imagery, a vector-valued anisotropic diffusion PDE is used:

$$\frac{\partial u_i}{\partial t} = \operatorname{div}\left( g\left( |\nabla u_\sigma| \right) \nabla u_i \right), \quad i = 1, \ldots, N \qquad (2)$$

where $N$ is the number of bands, $u = (u_1, u_2, \ldots, u_N)$, and $u_\sigma$ is obtained by convolving $u$ with a Gaussian kernel of standard deviation $\sigma$. $g(\cdot)$ is the diffusion function of $|\nabla u_\sigma|$. In our computations, the diffusion coefficient proposed in [31] is utilized because it can produce segmentation-like images:

$$g(\theta) = \begin{cases} 1, & \theta = 0 \\ 1 - \exp\left( -\dfrac{3.31488}{(\theta / K)^{8}} \right), & \theta > 0 \end{cases} \qquad (3)$$

where $\theta$ is a measure of the image edge strength and can be discretized as the Euclidean distance (ED) or the spectral angle mapper (SAM) between two pixel vectors. $K$ denotes a threshold that modulates the amount of diffusion with respect to $\theta$. A simple example of the hyperspectral anisotropic diffusion process is illustrated in Figure 2. It can be observed that sensor noise in the hyperspectral image is effectively removed, and edge features are better preserved. In addition, the homogeneous regions are very smooth, which is greatly preferable for segmentation and classification.
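To give a feel for how the diffusion coefficient separates homogeneous regions from edges, the following Python sketch evaluates a Weickert-type coefficient on the four-neighbour edges of a hyperspectral cube, with θ discretized as the ED between neighbouring pixel vectors. The function names, the (rows, columns, bands) array layout and the exponent 8 in the coefficient are assumptions made for this illustration, and the Gaussian pre-smoothing that yields u_σ is omitted for brevity.

```python
import numpy as np

def diffusion_coefficient(theta, K):
    """Assumed Weickert-type form: 1 at theta = 0, else 1 - exp(-3.31488 / (theta / K)**8)."""
    theta = np.asarray(theta, dtype=float)
    safe = np.where(theta > 0, theta, 1.0)  # placeholder value where theta == 0
    # Very small theta makes the quotient overflow; the limit g = 1 is still returned.
    with np.errstate(divide="ignore", over="ignore", under="ignore"):
        g = 1.0 - np.exp(-3.31488 / (safe / K) ** 8)
    return np.where(theta > 0, g, 1.0)

def edge_weights(cube, K):
    """Weights on horizontal and vertical 4-neighbour edges of a (H, W, N) cube.

    theta is taken as the Euclidean distance between neighbouring pixel vectors.
    Returns gh with shape (H, W-1) and gv with shape (H-1, W).
    """
    dh = np.linalg.norm(cube[:, 1:, :] - cube[:, :-1, :], axis=2)
    dv = np.linalg.norm(cube[1:, :, :] - cube[:-1, :, :], axis=2)
    return diffusion_coefficient(dh, K), diffusion_coefficient(dv, K)
```

With a small threshold such as K = 0.01, weights inside a spectrally uniform region stay at 1, while weights across a sharp spectral boundary drop to nearly 0, which is what blocks diffusion across edges.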
An explicit (or Euler forward difference) scheme is widely used to solve Equation (2) and control the diffusion by the diffusion coefficients computed from the previous time step using

$$\frac{u_i^{n+1} - u_i^n}{\tau} = \operatorname{div}\left( g\left( \theta\left( \nabla u_\sigma^n \right) \right) \nabla u_i^n \right) \qquad (4)$$

The solution $u^{n+1}$ is obtained explicitly from $u^n$ by simply rearranging Equation (4) as

$$u_i^{n+1} = u_i^n + \tau \cdot \operatorname{div}\left( g\left( \theta\left( \nabla u_\sigma^n \right) \right) \nabla u_i^n \right) \qquad (5)$$

Using space discretization, the explicit scheme can be given by

$$u_i^{n+1} = \left( I + \tau A\left( u_\sigma^n \right) \right) u_i^n \qquad (6)$$

where $I$ and $A$ are the identity matrix and the matrix of diffusion coefficients, respectively. However, this scheme restricts the time step size to $\tau \le 1/4$ to ensure its stability, which means the computation efficiency is very low. An alternative discretization of Equation (2) employs semi-implicit schemes, which are stable for all time step sizes. The traditional semi-implicit scheme of the diffusion PDE can be calculated by a series of linear equations,

$$\left( I - \tau A\left( u_\sigma^n \right) \right) u_i^{n+1} = u_i^n \qquad (7)$$

The tridiagonal matrix $I - \tau A(u_\sigma^n)$ can be inverted using the Gauss-Seidel or preconditioned conjugate gradient method. However, it is necessary to solve large linear systems at each iteration step. As shown in [32], semi-implicit schemes such as additive operator splitting (AOS) and alternating direction implicit (ADI) schemes can significantly speed up the evolution of the PDE Equation (2).
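The difference between the explicit and semi-implicit schemes can be sketched for a single band as follows. This is a minimal illustration rather than the solver used in the paper: the operator A is taken to be the weighted four-neighbour graph Laplacian, so that (A u) at pixel p equals the sum over neighbours q of g_pq (u_q - u_p), and the semi-implicit system is solved approximately by plain Gauss-Seidel sweeps. The function names and the arrays `gh` (horizontal edge weights, shape (H, W-1)) and `gv` (vertical edge weights, shape (H-1, W)) are assumptions of this example.

```python
import numpy as np

def apply_A(u, gh, gv):
    """Return A u, where (A u)_p = sum over 4-neighbours q of g_pq * (u_q - u_p)."""
    out = np.zeros(u.shape, dtype=float)
    fh = gh * (u[:, 1:] - u[:, :-1])  # flux across each horizontal neighbour pair
    fv = gv * (u[1:, :] - u[:-1, :])  # flux across each vertical neighbour pair
    out[:, :-1] += fh
    out[:, 1:] -= fh
    out[:-1, :] += fv
    out[1:, :] -= fv
    return out

def explicit_step(u, gh, gv, tau=0.25):
    """One explicit step u + tau * A u; stable for tau <= 1/4 when weights are <= 1."""
    return u + tau * apply_A(u, gh, gv)

def semi_implicit_step(u, gh, gv, tau=1.0, sweeps=50):
    """Approximately solve (I - tau * A) x = u by Gauss-Seidel sweeps, starting at x = u."""
    H, W = u.shape
    x = u.astype(float).copy()
    for _ in range(sweeps):
        for r in range(H):
            for c in range(W):
                nbrs = []  # (weight, current neighbour value) pairs
                if c > 0:
                    nbrs.append((gh[r, c - 1], x[r, c - 1]))
                if c < W - 1:
                    nbrs.append((gh[r, c], x[r, c + 1]))
                if r > 0:
                    nbrs.append((gv[r - 1, c], x[r - 1, c]))
                if r < H - 1:
                    nbrs.append((gv[r, c], x[r + 1, c]))
                diag = 1.0 + tau * sum(w for w, _ in nbrs)
                rhs = u[r, c] + tau * sum(w * v for w, v in nbrs)
                x[r, c] = rhs / diag
    return x
```

Because the system matrix is strictly diagonally dominant, the sweeps converge for any positive time step, which is the practical advantage of the semi-implicit form; the price is a linear solve per step, which motivates faster solvers such as multigrid.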

Multiscale Representation of Hyperspectral Imagery
Following the work of [33], we construct a multigrid structure that is obtained by applying the AMG method to solve Equation (7). The multiscale representation for hyperspectral imagery by the AMG method is based on graph theory. Initially, the first graph $(V^0, E^0)$ is built from the original hyperspectral image, where $V^0$ denotes the set of vertices and has the same size as the hyperspectral image, and $E^0$ represents the set of edges that connect each vertex to its four neighbors with weights. In our method, the initial weights $g^0_{ij}$ of $(i, j) \in E^0$ are computed using the diffusion coefficient Equation (3). To build an AMG multigrid structure, the authors of [33] introduced a mass $m_i$ for each vertex, which is a measure of the number of pixels assigned to a given vertex selected for the next grid and can be initialized as $m^0_i = 1$. The construction of the multigrid structure mainly consists of the two following steps.

• The first step is the consecutive selection of a new set of vertices $V^{l+1}$ from $V^l$, where $l$ denotes the current grid level with $0 \le l \le S$, and $S$ is the coarsest grid. The set of vertices $V^l$ is sorted in decreasing order according to $m^l_i$. Then, the first vertex of $V^{l+1}$ is initialized as the vertex in $V^l$ that has the highest mass. Finally, the set of vertices at grid $l+1$ can be obtained:

[Equation (8)]

where $0 < \upsilon < 1$ is a threshold value, and $V^l \setminus V^{l+1}$ denotes the set difference between $V^l$ and $V^{l+1}$. Note that $m^0_i$, $g^0_{ij}$ and $E^0$ in the finest grid are initialized as before, and we can compute the coarser grid according to the previous algorithm. Once the vertices of the finest coarse grid are constructed, we compute the masses in the next grid and the dependence degrees of the vertices in $V^l \setminus V^{l+1}$ to the vertices in $V^{l+1}$ as

[Equation (9)]

and

[Equation (10), defining $w^l_{ij} = w^l_{ji}$ for all $i \in V^l \setminus V^{l+1}$, $j \in V^{l+1}$]

where $w^l_{ij}$ is a measure of how much vertex $i \in V^l \setminus V^{l+1}$ depends on vertex $j \in V^{l+1}$.

• The second step is performed by connecting the vertices in $V^{l+1}$ to obtain $E^{l+1}$, which can be realized first by computing $g^{l+1}_{ij}$ for all the vertices in grid level $l+1$ and then connecting the vertices to obtain $E^{l+1}$. Based on the Galerkin principle [34], the matrix of diffusion coefficients can be defined as the Galerkin operator $G^{l+1} = I^c_f G^l I^f_c$, where $I^c_f$ denotes the restriction operator that maps vectors in a fine grid into a coarser one and is given by

[Equation (11)]

and $I^f_c$ is the interpolation operator, which is used to interpolate the intensity back to the finer grids and is given by

[Equation (12)]

By combining Equations (11) and (12) into the Galerkin operator, we obtain

[Equation (13)]

We can then connect the vertices to obtain $E^{l+1}$:

[Equation (14)]

Finally, the set of vertices and edges in grid level $l+1$ are constructed. Therefore, the representative spectra in the coarse grid are determined by the position of the vertices that correspond to grid level $l+1$.
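The first step above can be sketched in a few lines of Python. The selection test and the dependence weights used here are assumptions: a vertex is promoted to the coarse set when its coupling to the already selected coarse vertices is at most υ times its total coupling (a common weak-coupling rule in AMG-style coarsening), and the dependence of a fine vertex on a coarse vertex is taken as its edge weight normalized over all coarse neighbours. These choices may differ in detail from the rule of [33]; the function names and the dense weight matrix `G` are likewise hypothetical.

```python
import numpy as np

def select_coarse_vertices(G, mass, upsilon=0.2):
    """Pick coarse vertices from a graph with symmetric weight matrix G (zero diagonal).

    Vertices are visited in decreasing order of mass; the heaviest one seeds the
    coarse set. ASSUMED rule: vertex i joins the coarse set when its coupling to
    the current coarse set is at most upsilon times its total coupling.
    """
    order = np.argsort(-np.asarray(mass, dtype=float), kind="stable")
    total = G.sum(axis=1)
    coarse = [int(order[0])]
    for i in order[1:]:
        if G[i, coarse].sum() <= upsilon * total[i]:
            coarse.append(int(i))
    return coarse

def aggregate_masses(G, mass, coarse):
    """Distribute the mass of each fine vertex onto the coarse vertices.

    ASSUMED weights: w_ij = g_ij / (sum of g_ik over coarse k), so the total
    mass is conserved from one grid level to the next.
    """
    mass = np.asarray(mass, dtype=float)
    coarse_set = set(coarse)
    fine = [i for i in range(len(mass)) if i not in coarse_set]
    W = G[np.ix_(fine, coarse)]
    W = W / W.sum(axis=1, keepdims=True)
    return mass[coarse] + W.T @ mass[fine]
```

On a six-vertex path graph with unit weights and unit masses, this rule keeps every other vertex, and the aggregated masses still sum to six, reflecting that each coarse vertex now stands for a group of original pixels.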
Anisotropic diffusion PDEs are widely used for multiscale image analyses to construct nonlinear scale spaces. As time elapses, noise in the hyperspectral image is effectively removed, and edge features are better preserved. In addition, homogeneous regions become very smooth, which is greatly preferable for segmentation and classification, as shown in Figure 2. However, the solution of the hyperspectral anisotropic diffusion PDE imposes a heavy computation burden. The AMG method can construct a hierarchical representation of the PDE solver from the finest grid (which corresponds to the original image) to the coarsest grid, i.e., the linear system can be solved in the coarser grid with higher accuracy, efficiency and scalability. On the other hand, with respect to graph $(V^l, E^l)$, the diffusion coefficient Equation (3) is used as a similarity function to assign a weight to each edge in $E^l$. According to the definition of the diffusion coefficient, if $g_{ij}$ is close to 1, the pixel is inside a homogeneous region, whereas if $g_{ij}$ is close to 0, the pixel is near the edges. As a consequence, when the set of vertices $V^l$ is sorted in decreasing order of mass, the first vertex can correspond to a pixel inside a spectrally uniform region. Therefore, the anisotropic diffusion PDE is suitable for marker selection, and the acquired representative spectra are more reliable for the HSEG algorithm.
In the multigrid structure, each vertex selected by the AMG method can be exploited as a representative for a certain area in the hyperspectral image. The main problem is the use of the selected vertices for marker-based segmentation. In fact, we can use vertices in any grid level as the initial seeds for region growing. In [24], two automatic marker selection techniques were introduced to obtain the most reliable classified pixels as markers, i.e., marker selection approaches using morphological filtering (Morpho-MS) and probabilistic SVM (Proba-MS). To compare those two classification-derived marker selection methods with ours, flow-charts of the two schemes are illustrated in Figure 3. It can be observed in Figure 3a that the choice of markers in the classification-derived methods is always strongly dependent on the performance of the pixel-wise classifiers. Even if the same classifier is used, using different parameter settings can also produce different classification maps, which impacts the final markers. In addition, the selected markers can be very different due to the randomness of the training samples, which causes uncertainty in the classification maps. However, the final markers produced by our method are only determined by the structure of the hyperspectral image, as shown in Figure 3b, which is more robust than the traditional marker selection methods. Furthermore, the AMG-derived marker selection method requires fewer adjustment parameters than classification-derived marker selection techniques.
AMG-Derived M-HSEG Method
Once the multigrid structures are constructed by the AMG method, we integrate the multiscale representation with the HSEG algorithm to obtain unsupervised segmentation maps. The basic idea of our segmentation method is that vertices in the multigrid structure can be considered as initial seeds (markers) for region growing and are clustered to obtain a sequence of segmentation results. The AMG-derived M-HSEG segmentation algorithm is briefly described as follows.

Algorithm 1: AMG-derived M-HSEG Segmentation Algorithm
Input: An original hyperspectral image $u$ and the coarsest grid level $S$.
Output: Segmentation maps
1. Input a hyperspectral image and construct an undirected graph as the finest grid.
2. AMG Relaxation:
• At the finest grid level, perform a Gauss-Seidel relaxation to solve $(I - \tau G^0) X^0 = u$ with an initial guess image $u$ and compute the error $X^0 = (I - \tau G^0) X^0 - u$.
• At the coarser grid level $l$ ($0 < l \le S$), perform a Gauss-Seidel relaxation to solve the residual equation $(I - \tau G^l) X^l = F^l$ with an initial guess 0, and compute the error $X^l = F^l - (I - \tau G^l) X^l$ and then the residual $F^l = (I - \tau G^l) X^l$.
3. AMG Coarse-Grid Correction:
(1) Select the set of vertices in $V^l$ for $V^{l+1}$ to obtain $F^{l+1}$ for the coarser grid level $l+1$.
(2) Compute $G^{l+1}$ and connect the nodes in $V^{l+1}$ to obtain $E^{l+1}$.
(3) If $l \le S$, go to step 2; otherwise, go to the next step.
4. Initialize the vertices in grid $l$ as markers for the subsequent HSEG algorithm by assigning each vertex a non-zero marker label and each pixel as a separate region.
5. Perform the M-HSEG algorithm by using the markers obtained from grid $l$ of the hyperspectral image. Stop when there is no more merging, which means that the DC value is NaN.
6. Obtain the resultant segmentation maps for the subsequent classification.
The AMG-derived M-HSEG algorithm can be divided into three procedures. (i) In the coarsening grid procedure, the error $X^l$ ($0 < l \le S$) is first estimated by solving $(I - \tau G^l) X^l = F^l$ using the Gauss-Seidel relaxation, where $F^l$ is the residual. The masses, the matrix of diffusion coefficients and the set of edges are then updated for grid level $l+1$. We perform this procedure until the $S$-level multigrid structure is constructed. (ii) In the marker selection procedure, all the vertices in grid $l$ are considered as the initial seeds (markers) for region growing. Because the scale-space can provide several coarse grid levels, we can obtain a series of segmentation maps. (iii) In the HSEG algorithm procedure, we only compute the dissimilarity criterion (DC) value between a marked pixel and an unmarked pixel and merge the pair of adjacent pixels that has the smallest DC value, which can reduce the computational burden. To further improve the computational efficiency, the RHSEG strategy [35] is also employed. As a consequence, the number of regions in the resultant segmentation map is equal to the number of marker sets.
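The marker-driven merging in procedure (iii) can be illustrated with a much simplified seeded region-growing sketch. It is not the HSEG or RHSEG implementation: each marker starts a region, the DC is taken as the spectral angle between a region's current mean spectrum and an adjacent unlabelled pixel, and the pair with the smallest DC is merged first through a priority queue. DC values already in the queue are not refreshed when a region mean changes, which is a shortcut of this example. The function names and the (rows, columns, bands) layout are assumptions.

```python
import heapq
import numpy as np

def sam(a, b):
    """Spectral angle (radians) between two spectra; a tiny constant avoids 0/0."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def grow_from_markers(cube, markers):
    """Grow one region per marker on a (H, W, N) cube with 4-connectivity.

    markers: list of (row, col) seeds. Returns an (H, W) integer label map with
    labels 1..len(markers), so the region count equals the marker count.
    """
    H, W, _ = cube.shape
    labels = np.zeros((H, W), dtype=int)
    sums, counts, heap = {}, {}, []

    def push_neighbours(r, c, lab):
        mean = sums[lab] / counts[lab]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < H and 0 <= cc < W and labels[rr, cc] == 0:
                heapq.heappush(heap, (sam(mean, cube[rr, cc]), rr, cc, lab))

    for lab, (r, c) in enumerate(markers, start=1):
        labels[r, c] = lab
        sums[lab] = cube[r, c].astype(float).copy()
        counts[lab] = 1
    for lab, (r, c) in enumerate(markers, start=1):
        push_neighbours(r, c, lab)

    # Always merge the (region, unlabelled pixel) pair with the smallest DC.
    while heap:
        _, r, c, lab = heapq.heappop(heap)
        if labels[r, c] != 0:
            continue  # pixel was claimed by another region in the meantime
        labels[r, c] = lab
        sums[lab] += cube[r, c]
        counts[lab] += 1
        push_neighbours(r, c, lab)
    return labels
```

Restricting comparisons to pairs of one labelled region and one unlabelled pixel is what keeps the number of DC evaluations small compared with merging arbitrary region pairs.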

Selection of the Optimal Grid Level
With respect to the AMG-based scale space, if we choose the finest grid for marker extraction, a large number of representative spectra will be selected. Consequently, a segmentation map is produced with many small regions, which makes it very difficult to extract large homogeneous areas. On the contrary, if we choose a very coarse grid, the number of markers is very small, and only a few large regions exist in the segmentation map. The key problem is choosing the optimal grid level from the multigrid structure for classification. According to the above analyses, we can achieve the best classification accuracies by choosing a certain grid level $l$ ($0 \le l \le S$) for marker extraction from the AMG multigrid structure. All of our experiments on hyperspectral imagery, including those not reported here, confirm that the classification accuracy achieved by our method can be considered as a concave function of the number of markers. Inspired by this rule, we can provide a strategy for automatically selecting an optimal value of $l$. Therefore, the parameter tuning can be converted into an optimization problem:

$$l_{opt} = \underset{0 \le l \le S}{\arg\max}\; OA(l) \qquad (15)$$

where $l_{opt}$ is the index of the optimal grid level, and OA denotes the overall accuracy of our classification. It is possible to compare the resultant images by different values of $l$ to obtain a solution to Equation (15) with the best OA value, which corresponds well with a parameter tuning procedure for marker selection.
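Since the number of grid levels is small, the grid level that maximizes OA can be found by exhaustive comparison. The sketch below assumes that one classification map per grid level is already available and that a reference label map marks unlabelled pixels with 0; both conventions, and the function names, are assumptions of this example.

```python
import numpy as np

def overall_accuracy(pred, truth, ignore=0):
    """Fraction of labelled reference pixels whose predicted class is correct."""
    mask = truth != ignore
    return float(np.mean(pred[mask] == truth[mask]))

def select_optimal_level(maps_by_level, truth):
    """Return the grid level whose classification map has the highest OA.

    maps_by_level: dict mapping grid level l to an (H, W) class map.
    Also returns the per-level OA scores for inspection.
    """
    scores = {l: overall_accuracy(m, truth) for l, m in maps_by_level.items()}
    l_opt = max(scores, key=scores.get)
    return l_opt, scores
```

Returning the per-level scores alongside the selected level makes it easy to check how OA varies with the number of markers.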

Parameter Settings and Evaluation Measures
To enable a better understanding of the maximum vote decision rule in our method, an illustrative example of the integration process using the majority voting step is depicted in Figure 4. As mentioned in Section 2.3, the AMG-derived M-HSEG algorithm is used to segment the hyperspectral image into different regions with region labels, as shown in Figure 5b. To assign each region an information label, we integrate the unsupervised segmentation map and the pixel-wise SVM classification map by applying majority voting within this region in the segmentation map. For each region in Figure 5b, its class label is assigned to the most frequent class in the pixel-wise classification map in Figure 5a within this region. In this way, the advantages of both the pixel-wise SVM classification and the AMG-derived M-HSEG algorithm are combined; the resultant classification map is shown in Figure 5c. It should be noted that a marker may be assigned to the wrong class by most of the marker-based classification methods. Therefore, all pixels within the region grown from this marker are at risk of being wrongly classified. To tackle this problem, the majority voting step is widely used. Nevertheless, the purpose of the step in our method is to perform spectral-spatial classification on all the homogeneous regions in the unsupervised segmentation map by combining spectral and spatial information because we have no idea which class a marker should belong to with respect to the ground truth.
In this section, several hyperspectral classification methods are compared to the proposed AMG-derived M-HSEG (AMG-M-HSEG) classification method.
(1) In the pixel-wise SVM classification method, the parameters are optimally set for each data set.
(2) Two marker-based spectral-spatial classification methods proposed by Tarabalka et al. [36] used an HSEG algorithm following the classification-derived marker selection methods; these methods were used both without and with the optional majority voting under the rule that the class label of each region is given to the class with the largest number of pixels within this region in the classification map. These methods are named "Morph-M-HSEG", "Morph-M-HSEG + MV", "Proba-M-HSEG" and "Proba-M-HSEG + MV", respectively.
(3) Another marker-based spectral-spatial classification method proposed by Tarabalka et al. [25] uses an MSF construction following the Proba-MS marker selection method, which are also used without and with the optional majority voting step.These methods are named "Proba-M-MSF" and "Proba-M-MSF + MV", respectively.
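The majority voting rule described above (each segmentation region takes the most frequent class of the pixel-wise map inside it) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `majority_vote`, the toy 3 × 3 maps and the class names are hypothetical, and ties are broken in favour of the class encountered first in the region, which is an assumption the text does not specify.

```python
from collections import Counter


def majority_vote(segmentation, classification):
    """Relabel every segmentation region with its most frequent pixel-wise class.

    Both arguments are 2-D lists of equal shape: `segmentation` holds region
    labels, `classification` holds pixel-wise class labels.
    """
    votes = {}  # region label -> Counter of class labels inside that region
    for seg_row, cls_row in zip(segmentation, classification):
        for region, cls in zip(seg_row, cls_row):
            votes.setdefault(region, Counter())[cls] += 1
    # Assumption: ties go to the class seen first in the region
    # (Counter.most_common keeps insertion order for equal counts).
    winner = {region: counts.most_common(1)[0][0]
              for region, counts in votes.items()}
    return [[winner[region] for region in seg_row] for seg_row in segmentation]


# Toy example: three regions, pixel-wise map with two isolated errors.
segmentation = [[1, 1, 2],
                [1, 2, 2],
                [3, 3, 2]]
classification = [["corn", "corn", "oats"],
                  ["woods", "oats", "oats"],
                  ["woods", "woods", "corn"]]
result = majority_vote(segmentation, classification)
```

In this toy case the isolated "woods" pixel in region 1 and the isolated "corn" pixel in region 2 are overruled by their regions, which is the smoothing effect the combination of the two maps is meant to provide.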
(1) Because the merging of spatially non-adjacent regions always creates a large computational burden, the optional parameter S_wght is set to S_wght = 0.0 to improve the computational efficiency for all the hyperspectral images in our experiments, which means that only spatially adjacent regions are merged in the HSEG step.
(2) To increase the computational efficiency, four-neighborhood connectivity is exploited in the HSEG algorithm and the MSF construction algorithm.
(3) Because two different similarity measures are commonly used for hyperspectral images to discretize θ in Equation (3), i.e., the ED and the SAM between spectral vectors, we apply those two measures for computing the DCs between the regions for the HSEG and the weights of the edges for the MSF construction, respectively. It should be remarked that, in our experiments on the images used in this paper, both the Proba-M-HSEG (+MV) and Morph-M-HSEG (+MV) methods using the ED measure always resulted in inaccurate or false segmentation and classification maps, because the ED measure cannot provide a satisfactory dissimilarity measure between the region mean vectors for the M-HSEG algorithm. Therefore, these two classification methods using the ED measure, both without and with the optional majority voting step, are not considered for comparison in the following experiments. In addition, other similarity measures such as the L1 vector norm and the spectral information divergence (SID) can be used as well.
(4) The parameters for our method are set according to former research on the AMG method [33]. In our experiments, we set τ = 1, υ = 0.2, and K = 0.01. In addition, we used the method described in [33] to determine the coarsest grid S: if the number of vertices in any grid, starting with the finest grid, is less than or equal to log2 U (where U is the original image size), the construction of the multigrid structure stops, and the coarsest grid S is obtained automatically.
(5) The multiclass one-vs.-one SVM classifier with a Gaussian radial basis function (RBF) kernel is used on the hyperspectral data sets. SVM has been the most frequently used method and can achieve higher classification accuracies than traditional pixel-wise techniques when only a limited number of training samples are available. Refer to [37][38][39] for details on SVM. As a consequence, information classes are defined for the hyperspectral image, and each pixel is given a unique class label. The performance of the methods is objectively evaluated in terms of global accuracy (GA) measures, which include the OA, the average accuracy (AA) and the kappa coefficient κ [40], and in terms of class-specific accuracy (CA). Note that these objective measures can be obtained from the confusion matrix.
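Since all of these measures follow from the confusion matrix, the computation can be sketched as below. This is a minimal sketch under stated assumptions: rows are taken to be the true classes and columns the predicted classes (the text does not fix a convention), CA is computed as the fraction of each true class that is correctly labelled, every class is assumed to have at least one test pixel, and the function name `global_accuracies` and the 2 × 2 matrix are hypothetical.

```python
def global_accuracies(confusion):
    """Return (OA, AA, kappa, per-class accuracies) from a square confusion matrix.

    Assumed convention: confusion[i][j] is the number of test pixels of
    true class i that were predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    diagonal = [confusion[i][i] for i in range(n_classes)]
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(n_classes))
                for j in range(n_classes)]

    oa = sum(diagonal) / total                                  # overall accuracy
    ca = [diagonal[i] / row_sums[i] for i in range(n_classes)]  # class-specific
    aa = sum(ca) / n_classes                                    # average accuracy
    # Agreement expected by chance, used by Cohen's kappa.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    kappa = (oa - expected) / (1 - expected)
    return oa, aa, kappa, ca


# Hypothetical two-class confusion matrix with 100 test pixels.
confusion = [[45, 5],
             [10, 40]]
oa, aa, kappa, ca = global_accuracies(confusion)
```

Because AA averages the per-class accuracies with equal weight, a single small class whose CA falls to 0 lowers AA far more than it lowers OA, which is worth keeping in mind when reading the sensitivity plots.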

The Indian Pines Image (AVIRIS)
The first hyperspectral image, the Indian Pines image acquired with the AVIRIS sensor, has 145 × 145 pixels and 220 bands in the 400-2500 nm range, and covers a 2 mile by 2 mile area with a spatial resolution of 20 m. A spectral subset of 185 bands was used for our experiments. Sixteen classes of interest, which are shown in Table 1, were used in our experiments. To perform the supervised classification, 10% of the labeled pixels in each class in the ground truth data were randomly selected as training samples, and the remaining 90% were used as test samples. It can be observed from the ground truth data that some classes only include a very small number of samples, such as Alfalfa, Grass/pasture-mowed and Oats. For each of those classes, we randomly selected 10 training samples, and the remainder of the samples were used for testing. In addition, the SVM classifier parameters C and γ were obtained using five-fold cross-validation, yielding C = 8192 and γ = 0.5. In the Proba-MS method, there are three adjustable parameters (M, P and T). Because the maximum size of the connected components for Oats in the SVM classification map was 19, we set M = 19 for the Proba-MS procedure in the Proba-M-MSF (+MV) and Proba-M-HSEG (+MV) methods. In the Proba-M-MSF (+MV) method, the parameter P was computed as P = 6%, given the condition that each marker for a large region should have at least one pixel, and the last parameter, T, was set equal to the lowest probability within the highest 2% of the probability estimates. In the Proba-M-HSEG (+MV) method, the parameters were set to P = 40% and T = 50%. In addition, the size of the structuring element in the Morph-M-HSEG (+MV) method was 3 × 3.
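The per-class sampling scheme described above (a fixed fraction of each class for training, but a fixed count of 10 for the very small classes) can be sketched as follows. This is only an illustration: the function name `split_indices`, the explicit `small_classes` argument, the seed and the toy class sizes are hypothetical, and the text does not state how the small classes were identified beyond naming them.

```python
import random


def split_indices(labels, fraction, fixed_count, small_classes, seed=0):
    """Return (train, test) index lists, drawn class by class.

    Classes listed in `small_classes` contribute `fixed_count` training
    samples; every other class contributes round(fraction * class size).
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, test = [], []
    for label, indices in by_class.items():
        if label in small_classes:
            n_train = fixed_count
        else:
            n_train = round(fraction * len(indices))
        n_train = min(n_train, len(indices))  # never request more than exist
        chosen = set(rng.sample(indices, n_train))
        train.extend(i for i in indices if i in chosen)
        test.extend(i for i in indices if i not in chosen)
    return train, test


# Toy labels (hypothetical sizes): one large class and one very small class.
labels = ["Woods"] * 200 + ["Oats"] * 20
train, test = split_indices(labels, fraction=0.10, fixed_count=10,
                            small_classes={"Oats"})
```

With these toy sizes the large class contributes 20 training pixels and the small class 10, so the small class is represented far better than a plain 10% draw (2 pixels) would allow.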
To build the multigrid structure, 10 coarse grid levels (S = 10) were constructed according to Section 3.1. Figure 6 shows the impact on the objective quantitative assessments of the GAs caused by varying l from 0 to 10. We can observe from these plots the high robustness of the results with respect to values of l from 1 to 5 when compared to the SVM classification result (l = 0). In addition, it can be observed that the shapes of the plots share a similar global behavior. As l is increased, all the GAs rise gradually until reaching a peak. After the maximum, the OA and κ values drop from 96.32% and 95.80% (l = 5) to 51.05% and 42.61% (l = 10), respectively. However, the AA value drops more quickly than the other two measures, from a high value of 95.95% to 28.6%. This can be explained by the fact that, for l ≥ 6, all the pixels in the classification maps that should belong to Grass/pasture-mowed in the ground truth data (refer to Figure 7b) are assimilated into their neighboring structures. As a consequence, the CA of this class is 0, and the corresponding AA is lower. In this experiment, which had a coarse grid level of l = 5 estimated using Equation (15), 397 vertices, which occupy 1.9% of the total number of pixels in the image, are used as markers. It is worth noting that other GAs such as AA and κ can also be used in Equation (15), and the same value of l_opt (l_opt = 5) will be obtained.
Table 1 lists the number of training and test samples for each class in the ground truth data and the classification accuracies of the SVM classification, and Table 2 lists the classification accuracies of all the marker-based classification methods used here. The RGB composite map from bands 47, 23 and 13 of the AVIRIS image and its ground truth data are depicted in Figure 7a,b, respectively. Figure 7c-m illustrates the corresponding classification maps. From those results, we reached the following conclusions.
(1) The Morph-M-HSEG and Proba-M-HSEG methods, both with and without the optional majority voting step, can achieve better GAs when compared with the SVM classification. Meanwhile, the highest CAs for 6 of the 16 classes were achieved when using those four methods, including Corn-min till, Grass/trees, Grass/pasture-mowed, Hay-windrowed, Soybeans-clean till and Stone-steel towers. However, those methods always resulted in a slight under-segmentation in the HSEG step. For example, it can be observed in Figure 7 that some small regions of the Corn-no till, Oats and Grass/pasture classes were merged into adjacent regions that belonged to other classes, or some small regions of other classes were merged into large regions of the Corn-min and Soybean-clean classes. In contrast, in the results of our method shown in Figure 7l,m, most of the small regions that belonged to different classes were better preserved. In addition, the majority voting step did not improve the GAs and CAs of the Morph-M-HSEG method using the SAM distance because almost all the pixels in each region in Figure 7f have the same class label.
(2) The Proba-M-MSF method, both with and without the optional majority voting step, can obtain better GAs than the SVM classifier. However, the highest OA achieved by our method is 2%-5% higher than that of this method. To clearly demonstrate the difference between the two methods, one region at the top-middle of the image was used for comparison. This region should be classified as Bldg-Grass-Trees-Drives according to the ground truth data, but a large number of pixels in that region were classified as Woods by the Proba-M-MSF method, as shown in Figure 7j,k. By comparison, our method can achieve more accurate classification maps. Another small region in the top left of the image was also used for comparison. It can be observed that this region was correctly classified as Grass/pasture by our method, which is consistent with the ground truth data. In contrast, the Proba-M-MSF method merged the entire region into its spatially adjacent region, which belonged to Bldg-Grass-Trees-Drives.
(3) The GAs achieved by our method using the SAM distance were the best among all the classification methods used for comparison. In this case, the OA and κ increased by 13.81% and 15.84%, respectively. Meanwhile, the highest CAs for 8 of the 16 classes were achieved when using our method, and the AA was improved by 15.34%. It is very important to preserve material boundaries and edge structures in classification maps. From the classification maps, we can observe that our method was better than the other marker-based classification methods in terms of region homogenization and edge preservation.
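The two dissimilarity measures compared in these results, the ED and the SAM between spectral vectors, can be sketched as below. The sketch only illustrates one well-known difference between them: the spectral angle is unchanged when a spectrum is multiplied by a positive factor (for example, a brightness change), whereas the Euclidean distance is not. It is not offered as the explanation of the accuracy differences reported here, and the function names and toy spectra are hypothetical.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance (ED) between two spectral vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def spectral_angle(x, y):
    """Spectral angle (SAM) in radians between two non-zero spectral vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.acos(cosine)


spectrum = [0.10, 0.30, 0.50, 0.20]      # hypothetical 4-band spectrum
brighter = [2.0 * v for v in spectrum]   # same shape, doubled intensity
other = [0.50, 0.30, 0.10, 0.20]         # different spectral shape

ed = euclidean_distance(spectrum, brighter)   # clearly non-zero
sam = spectral_angle(spectrum, brighter)      # numerically zero
sam_other = spectral_angle(spectrum, other)   # clearly non-zero
```

Either function could serve as the dissimilarity criterion between region mean vectors or as an edge weight; which one is appropriate depends on whether intensity differences between otherwise similar spectra should count as dissimilarity.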

The Washington DC Image (HYDICE)
In the next example, the benchmark Washington DC image from the HYDICE sensor contains 1208 scan lines with 307 pixels in each scan line and 224 bands, and it has a spatial resolution of approximately 2.8 m. A sub-image was produced for our experiments by spatially and spectrally subsetting to include 200 × 225 pixels and 191 bands. Because the image also has a high spatial resolution, we obtained a ground truth image with six labeled classes by identifying the different materials. In the SVM classification algorithm, 5% of the labeled pixels for each class in the ground truth data were randomly chosen for training, and the remaining labeled pixels were used for testing. The optimal parameters for the SVM classifier were estimated as C = 2084 and γ = 2 by five-fold cross-validation. The GAs and CAs of the classification of the data set using the SVM classification are listed in Table 1. The parameters for the Proba-MS algorithm were fixed as follows: M = 20, P = 5% and T = 2% for the Proba-M-MSF (+MV) method and M = 20, P = 40% and T = 50% for the Proba-M-HSEG (+MV) method. Additionally, a 3 × 3 structuring element was used in the Morph-M-HSEG (+MV) method.
In our method, 12 coarse grid levels were constructed in the AMG structure, and the optimal coarse grid level l = 5 can be obtained from Figure 8 using Equation (15). In this grid, 1125 vertices, which occupy 2.5% of the total number of pixels in the image, were utilized as markers. To objectively compare the classification results, the GAs and CAs of the Washington DC image are shown in Table 3, and all the corresponding classification maps are illustrated in Figure 9. From these results, we can draw conclusions similar to those for the Indian Pines image. In particular, it can be observed from Table 3 that most of the spectral-spatial classification methods used here achieved better GAs than the pixel-wise SVM, except for the Proba-M-MSF method, which could not effectively differentiate the Street, Roofs and Path classes. Furthermore, the best GAs among all the classification methods were obtained by our method with the SAM distance. In this case, the OA and κ were better by 3.84% and 4.83%, respectively, compared with the SVM results. In addition, the highest CAs for 3 of the 6 classes were achieved by our method. In this example, the increase in AA was as large as 3.58%.

The Centre of Pavia Image (ROSIS)
The third hyperspectral data set was the Centre of Pavia image, which was acquired by the ROSIS-03 optical sensor. The image has 400 × 400 pixels, 102 spectral channels and a spatial resolution of 1.3 m. To evaluate our method, we manually generated ground truth data for the image by visual interpretation that included ten material classes of interest, and 2% of the labeled pixels of each class from the ground truth data were selected as training samples. The remaining ones were used for testing. A pixel-wise SVM classification was performed on the image, and the following parameters were chosen by five-fold cross-validation: C = 1.31072 × 10^5 and γ = 2. The training and test samples for each class and the corresponding classification accuracies by the SVM classifier are reported in Table 1. The Proba-MS algorithm parameters for the Proba-M-MSF (+MV), Proba-M-HSEG (+MV) and Morph-M-HSEG (+MV) methods were the same as for the second hyperspectral data set.
To obtain accurate markers for the following segmentation, 14 coarse grid levels of the Centre of Pavia image were constructed for building the AMG structure, in which 3105 vertices, which occupied 1.9% of the total number of pixels in the image for the optimal grid level of l = 6, were utilized as markers, as shown in Figure 10. For comparison, we show the GAs and CAs of the classification methods used here in Table 4, and the corresponding classification maps are displayed in Figure 11. It can be observed that almost all the spectral-spatial classification methods achieved higher GAs when compared with the pixel-wise SVM classification, except for the Proba-M-MSF method with the SAM distance. As shown in Figure 11i, most of the building shadows were not appropriately recognized by the Proba-M-MSF method with the SAM distance, and the CAs of this method confirm that conclusion. For example, the Shadow CA was only 27.15%, which was much lower than the 95.94% achieved by the SVM classifier. In addition, the best GAs were achieved using our method. For example, the increases in OA and κ were as large as 7.06% and 7.91%, respectively, compared with the SVM results. Apart from that, the highest CAs for 5 of the 10 classes were achieved by our method, and the improvement in AA was as large as 5.96%.

Discussion
In our method, the coarse grid level l critically determines the AMG-M-HSEG classifier. In this section, the influence of this parameter on the classification accuracies of our method is investigated experimentally for the Indian Pines image. To clearly demonstrate the difference in the visual interpretation, the segmentation and corresponding classification maps with respect to l from 4 to 6 are shown in Figure 12. The GAs and CAs are also listed in Table 5. From these results, we can draw the following conclusions.
(1) At the coarse grid level of l = 5, our algorithm achieved the best GAs.
(2) As the coarse grid level l increased, the number of regions in the HSEG maps was reduced, mainly due to the decrease in the number of selected markers. In addition, the CAs of classes with large connected regions were improved, such as the Bldg-Grass-Tree-Drives and Oats classes (refer to the red solid line rectangles in Figure 12). On the other hand, the CAs of the classes with small connected regions were greatly reduced, such as the Grass/pasture-mowed, Soybeans-no till and Alfalfa classes (refer to the green solid line ellipses in Figure 12). Notably, different values of l produced homogeneous regions with different sizes. From this point of view, the adjustability of the grid level in the multigrid structure can provide flexibility for generating different segmentation maps to satisfy various applications.
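The choice of grid level examined in this section can be sketched as a simple sweep over l that keeps the level with the highest accuracy. Equation (15) is not reproduced in this section, so the sketch assumes it amounts to taking the level that maximizes a chosen GA; the function name `select_optimal_level`, the tie-breaking rule (finest level wins) and the OA values are hypothetical and are not the reported results.

```python
def select_optimal_level(accuracy_by_level):
    """Return the grid level with the highest accuracy.

    Assumption: ties go to the finest (smallest) level, because max() keeps
    the first maximal item and the levels are visited in ascending order.
    """
    return max(sorted(accuracy_by_level),
               key=lambda level: accuracy_by_level[level])


# Hypothetical OA values (percent) per grid level; level 0 stands for the
# pixel-wise classification map without any segmentation.
oa_by_level = {0: 80.0, 1: 85.0, 2: 88.0, 3: 90.0,
               4: 92.0, 5: 93.0, 6: 89.0, 7: 70.0}
l_opt = select_optimal_level(oa_by_level)
```

The same function can be applied to AA or κ values per level; whether the selected level coincides across measures depends on the data.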

Conclusions
In this paper, we proposed a spectral-spatial classification method for hyperspectral imagery based on the AMG method and the HSEG algorithm. Specifically, a multiscale representation for hyperspectral imagery is obtained using the AMG method to solve the hyperspectral anisotropic diffusion PDE. Once the multigrid structure is constructed, vertices in any grid level can be considered as markers for HSEG and clustered to obtain a sequence of unsupervised segmentation results. A maximum vote decision rule is then employed to classify all the homogeneous regions in the segmentation maps according to a pixel-wise classification result. Finally, to obtain the best classification accuracies, a final classification map is produced by choosing the optimal grid level to extract representative spectra. The major advantage of the proposed method is that it can obtain reliable markers that are determined only by the structures in the hyperspectral imagery, not by the performance of the pixel-wise classifiers or the training samples in the pixel-wise classification step. The proposed classification method, i.e., AMG-M-HSEG, was compared with the pixel-wise SVM classifier and marker-based spectral-spatial classification methods. The results of the experiments demonstrated that our method can achieve improvements of 13.81%, 3.84% and 7.06% in OA over the pixel-wise SVM classifier for the Indian Pines, Washington DC and Centre of Pavia datasets, respectively. Furthermore, for all three datasets, the OA and κ achieved by our method were the best among all the spectral-spatial classification methods compared. It can be concluded that the proposed method can effectively improve classification accuracy for real hyperspectral datasets with different resolutions and contexts. Therefore, our method is useful in real-world applications. In the future, further improvement may be achieved by exploring more valuable information from the AMG-based scale-space of hyperspectral imagery.

Figure 1. Spectral-spatial classification of hyperspectral images based on the AMG method and marker-based HSEG algorithm (M-HSEG). The original Indian Pines data set is visualized by an RGB composite of bands 47, 23, and 13; the marker maps are visualized by an RGB composite of bands 47, 23, and 13 using the representative spectra, which are used as the initial seeds (markers) for the HSEG algorithm. The segmentation maps correspond to the marker maps, and the different grey levels represent different region labels.

Figure 2. Example of the hyperspectral anisotropic diffusion PDE process for the Indian Pines data set: (a) original noisy band 185; and (b) diffused result.

Figure 3. Flow-charts of the classification-derived marker selection method and our method: (a) the classification-derived marker selection scheme; and (b) the AMG-derived marker selection scheme.
(a) Calculate the dissimilarity criterion (DC) values between all pairs of spatially adjacent regions. It should be noted that the DC value is calculated only between a marked pixel and an unmarked pixel.
(b) Merge the pair of adjacent pixels that has the smallest DC value.

Figure 4. Schematic example of using majority voting within segmentation regions.

Figure 5. Spectral-spatial classification for the Indian Pines image using majority voting within segmentation regions: (a) the pixel-wise SVM classification map; (b) the AMG-derived M-HSEG segmentation map; and (c) the final classification map.

Figure 6 .Figure 6 .
Figure 6.Sensitivity analysis of the GAs using the AMG-M-HSEG method with the Indian Pines image for l from 0 to 10.

Figure 8. Sensitivity analysis of the GAs achieved by the AMG-M-HSEG method using the Washington DC image for l from 0 to 12.

Figure 10. Sensitivity analysis of the GAs achieved by the AMG-M-HSEG method using the center of the Pavia image for l from 0 to 14.

Table 4. The GAs and CAs (percent) for the center of the Pavia image using all the spectral-spatial classification methods for comparison. The highest accuracies are indicated in bold in each category.

To solve the PDE in Equation (2), a first-order discretization in time is used to approximate $\partial u_i / \partial t$ by $(u_i^{n+1} - u_i^n)/\tau$, where $\tau$ is the time step size and $u_i^n$ is the solution of Equation (2) at time $t_n = n \cdot \tau$.
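
The effect of this time discretization can be illustrated on a one-dimensional signal. The sketch below rests on several assumptions that the text here does not confirm: the diffusion term is evaluated at the new time level with the diffusivity frozen at level $n$ (a semi-implicit scheme), the diffusivity has the form $1/(1 + (\Delta u/\kappa)^2)$, and the resulting linear system is solved directly with the Thomas algorithm rather than with AMG. The function name `semi_implicit_step` and the parameter `kappa` are hypothetical.

```python
def semi_implicit_step(u, tau=1.0, kappa=0.1):
    """One time step (u_new - u) / tau = -L(u) u_new on a 1-D signal u."""
    n = len(u)
    # Assumed edge-stopping weights between samples k and k+1, frozen at level n.
    g = [1.0 / (1.0 + ((u[k + 1] - u[k]) / kappa) ** 2) for k in range(n - 1)]
    # Tridiagonal system (I + tau * L) u_new = u, with L a weighted graph Laplacian.
    lower = [0.0] + [-tau * w for w in g]   # sub-diagonal, lower[0] unused
    upper = [-tau * w for w in g] + [0.0]   # super-diagonal, upper[-1] unused
    diag = [1.0 - lower[k] - upper[k] for k in range(n)]
    # Thomas algorithm: forward elimination ...
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = u[0] / diag[0]
    for k in range(1, n):
        denom = diag[k] - lower[k] * c[k - 1]
        c[k] = upper[k] / denom
        d[k] = (u[k] - lower[k] * d[k - 1]) / denom
    # ... followed by back substitution.
    u_new = [0.0] * n
    u_new[-1] = d[-1]
    for k in range(n - 2, -1, -1):
        u_new[k] = d[k] - c[k] * u_new[k + 1]
    return u_new
```

Under these assumptions the step preserves the mean of the signal and keeps the result within the range of the input for any $\tau > 0$, which is the usual motivation for treating the diffusion term implicitly.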

Table 1. Number of training and test samples and the GAs and CAs (percent) for all the hyperspectral data sets using the SVM classification method.

Table 2. The GAs and CAs (percent) for the Indian Pines image using all the spectral-spatial classification methods for comparison. The highest accuracies are indicated in bold in each category.

Table 5. The GAs and CAs (percent) produced by our method using different values of l from 4 to 6 for the Indian Pines image. The highest accuracies are indicated in bold in each category.
