3.1. Parameter Settings and Evaluation Measures
To enable a better understanding of the maximum vote decision rule in our method, an illustrative example of the integration process using the majority voting step is depicted in Figure 4. As mentioned in Section 2.3, the AMG-derived M-HSEG algorithm is used to segment the hyperspectral image into regions with region labels, as shown in Figure 5b. To assign each region an information label, we integrate the unsupervised segmentation map and the pixel-wise SVM classification map by applying majority voting within each region of the segmentation map. Each region in Figure 5b is assigned the most frequent class within that region in the pixel-wise classification map in Figure 5a. In this way, the advantages of both the pixel-wise SVM classification and the AMG-derived M-HSEG algorithm are combined; the resultant classification map is shown in Figure 5c. It should be noted that most marker-based classification methods may assign a marker to the wrong class, in which case all pixels within the region grown from this marker are at risk of being wrongly classified. The majority voting step is widely used to tackle this problem. In our method, however, the purpose of the step is different: because we do not know which class a marker should belong to with respect to the ground truth, majority voting is used to perform spectral-spatial classification on all the homogeneous regions in the unsupervised segmentation map by combining spectral and spatial information.
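The majority voting rule described above can be illustrated with a minimal sketch. It assumes that the segmentation map and the pixel-wise classification map are integer arrays of the same shape; the function name majority_vote and the toy arrays are hypothetical and are not taken from the original implementation.

```python
import numpy as np

def majority_vote(segmentation, classification):
    """Give every region the most frequent class label found inside it."""
    segmentation = np.asarray(segmentation)
    classification = np.asarray(classification)
    result = np.empty_like(classification)
    for region in np.unique(segmentation):
        mask = segmentation == region
        labels, counts = np.unique(classification[mask], return_counts=True)
        # Ties are broken in favour of the smallest label (an assumption).
        result[mask] = labels[np.argmax(counts)]
    return result

# Toy example: three regions (0, 1, 2) and a noisy pixel-wise class map.
seg = np.array([[0, 0, 1],
                [0, 1, 1],
                [2, 2, 1]])
cls = np.array([[3, 3, 5],
                [4, 5, 5],
                [7, 7, 4]])
voted = majority_vote(seg, cls)
print(voted)
```

In the toy example, the single pixel labeled 4 inside region 0 and the single pixel labeled 4 inside region 1 are both overruled by the dominant class of their region.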
In this section, several hyperspectral classification methods are compared to the proposed AMG-derived M-HSEG (AMG-M-HSEG) classification method. (1) In the pixel-wise SVM classification method, the parameters are optimally set for each data set. (2) Two marker-based spectral-spatial classification methods proposed by Tarabalka et al. [36] use an HSEG algorithm following the classification-derived marker selection methods; these methods are used both without and with the optional majority voting step, under the rule that each region is given the class label with the most pixels within this region in the classification map. These methods are named “Morph-M-HSEG”, “Morph-M-HSEG + MV”, “Proba-M-HSEG” and “Proba-M-HSEG + MV”, respectively. (3) Another marker-based spectral-spatial classification method proposed by Tarabalka et al. [25] uses an MSF construction following the Proba-MS marker selection method; it is also used without and with the optional majority voting step. These variants are named “Proba-M-MSF” and “Proba-M-MSF + MV”, respectively.
(1) Because the merging of spatially non-adjacent regions always creates a large computational burden, the optional parameter Swght is set as Swght = 0.0 to improve the computational efficiency for all the hyperspectral images in our experiments, which means that only spatially adjacent regions are merged in the HSEG step.
(2) To increase the computational efficiency, four-neighborhood connectivity is exploited in the HSEG algorithm and the MSF construction algorithm.
(3) Because two different similarity metric measures are commonly used for hyperspectral images to discretize θ in Equation (3), i.e., the ED and the SAM between spectral vectors, we apply those two measures for computing the DCs between the regions for the HSEG and the weights of the edges for the MSF construction, respectively. It should be remarked that our experiments on the images used in the paper demonstrate that both the Proba-M-HSEG (+MV) and Morph-M-HSEG (+MV) methods using the ED measure always result in inaccurate or false segmentation and classification maps because the ED measure cannot provide a satisfactory dissimilarity measure between the region mean vectors for the M-HSEG algorithm. Therefore, these two classification methods using the ED measure, both without and with the optional majority voting step, are not considered in our following experiments for comparison. In addition, other similarity metric measures such as the L1 vector norm and the spectral information divergence (SID) can be used as well.
(4) The parameters for our method are set according to former research on the AMG method [33]. In our experiments, we set τ = 1, υ = 0.2, and K = 0.01. In addition, we used the method described in [33] to determine the coarsest grid S, i.e., if the number of vertices in any grid, starting with the finest grid, is equal to or less than log₂ U (where U is the original image size), the construction of the multigrid structure stops, and the coarsest grid S is obtained automatically.
(5) The multiclass one-vs.-one SVM classifier with a Gaussian radial basis function (RBF) kernel is used on the hyperspectral data sets. SVM has been the most frequently used method and can achieve higher classification accuracies than traditional pixel-wise techniques when only a limited number of training samples is available. Refer to [37, 38, 39] for details on SVM. As a consequence, information classes are defined for the hyperspectral image, and each pixel is given a unique class label. The performance of the methods is objectively evaluated in terms of global accuracy (GA) measures that include the OA, average accuracy (AA) and the kappa coefficient κ [40], and in terms of class-specific accuracy (CA). Note that these objective measures can be obtained from the confusion matrix.
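As an illustration of how the measures in item (5) follow from the confusion matrix, a minimal sketch is given below. It assumes that rows index the ground truth classes and columns index the predicted classes, so that the CA of a class is its diagonal entry divided by its row sum; the function name accuracy_measures and the toy matrix are hypothetical.

```python
import numpy as np

def accuracy_measures(confusion):
    """Return OA, AA, kappa and per-class accuracies (CA).

    Assumes confusion[i, j] counts test pixels of true class i
    that were assigned to class j.
    """
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    row_sums = confusion.sum(axis=1)   # test pixels per true class
    col_sums = confusion.sum(axis=0)   # pixels per predicted class
    ca = np.diag(confusion) / row_sums
    oa = np.trace(confusion) / total
    aa = ca.mean()
    # Agreement expected by chance, used by the kappa coefficient.
    expected = (row_sums * col_sums).sum() / total ** 2
    kappa = (oa - expected) / (1.0 - expected)
    return oa, aa, kappa, ca

# Toy two-class confusion matrix (not data from the experiments).
cm = np.array([[45, 5],
               [10, 40]])
oa, aa, kappa, ca = accuracy_measures(cm)
print(oa, aa, kappa, ca)
```

Because the AA is the unweighted mean of the CAs, a single class whose CA falls to 0 lowers the AA much more strongly than the OA when that class has few test pixels.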
3.2. The Indian Pines Image (AVIRIS)
The first hyperspectral image, the Indian Pines image, was acquired with the AVIRIS sensor; it has 145 × 145 pixels and 220 bands in the 400–2500 nm range and represents a 2 mile by 2 mile area with a spatial resolution of 20 m. A spectral subset of 185 bands was used for our experiments. Sixteen classes of interest, which are shown in Table 1, were used in our experiments. To perform the supervised classification, 10% of the labeled pixels in each class in the ground truth data were randomly selected as training samples, and the remaining 90% were used as test samples. It can be observed from the ground truth data that some classes only include a very small number of samples, such as Alfalfa, Grass/pasture-mowed and Oats. For each of those classes, we randomly selected 10 training samples, and the remainder of the samples were used for testing. In addition, the SVM classifier parameters C and γ were optimally obtained using five-fold cross-validation: C = 8192 and γ = 0.5.
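The sampling scheme described above can be sketched as follows. The sketch assumes that labels holds the class identifiers of the labeled pixels only. Treating a class as "very small" whenever 10% of it amounts to fewer than ten samples is an assumption made for illustration, since the text names the small classes explicitly; the function name split_train_test and the fixed random seed are likewise hypothetical.

```python
import numpy as np

def split_train_test(labels, fraction=0.10, min_train=10, seed=0):
    """Return (train_idx, test_idx) index arrays over the labeled pixels."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train, test = [], []
    for cls in np.unique(labels):
        # Shuffle the pixel indices of this class.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        # 10% of the class, but at least min_train samples (assumed rule).
        n_train = max(int(round(fraction * idx.size)), min_train)
        n_train = min(n_train, idx.size)  # never exceed the class size
        train.append(idx[:n_train])
        test.append(idx[n_train:])
    return np.concatenate(train), np.concatenate(test)

# Toy labels: one large class (200 pixels) and one small class (20 pixels).
labels = np.repeat([1, 2], [200, 20])
train_idx, test_idx = split_train_test(labels)
print(len(train_idx), len(test_idx))
```

In the toy example, the large class contributes 20 training samples (10%) and the small class contributes 10, leaving 190 pixels for testing.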
In the Proba-MS method, there are three adjustable parameters (M, P and T). Because the maximum size of the connected components for Oats in the SVM classification map was 19, we set M = 19 for the Proba-MS procedure in the Proba-M-MSF (+MV) and Proba-M-HSEG (+MV) methods. In the Proba-M-MSF (+MV) method, the parameter P was computed as P = 6%, given the condition that each marker for a large region should have at least one pixel, and the last parameter, T, was set equal to the lowest probability within the highest 2% of the probability estimates. In the Proba-M-HSEG (+MV) method, the parameters were set to P = 40% and T = 50%. In addition, the size of the structuring element in the Morph-M-HSEG (+MV) method was 3 × 3.
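The choice of T as the lowest probability within the highest 2% of the probability estimates can be sketched as below. This is only one reading of that rule: the function name top_fraction_threshold is hypothetical, the probabilities are synthetic, and rounding the number of retained estimates upwards is an assumption.

```python
import numpy as np

def top_fraction_threshold(probabilities, fraction=0.02):
    """Lowest probability among the highest `fraction` of the estimates."""
    # Sort all estimates in descending order.
    p = np.sort(np.asarray(probabilities, dtype=float).ravel())[::-1]
    # Number of estimates in the top fraction, at least one (assumption).
    k = max(1, int(np.ceil(fraction * p.size)))
    return p[k - 1]

# Synthetic estimates 0.01, 0.02, ..., 1.00 (not data from the experiments).
probs = np.linspace(0.01, 1.0, 100)
T = top_fraction_threshold(probs)
print(T)
```

With 100 synthetic estimates, the highest 2% are the two largest values, so T is the second largest estimate.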
To build the multigrid structure, 10 coarse grid levels (S = 10) were constructed according to Section 3.1. Figure 6 shows the impact on the objective quantitative assessments of the GAs caused by varying l from 0 to 10. We can observe from these plots the high robustness of the results with respect to values of l from 1 to 5 when compared to the SVM classification result (l = 0). In addition, it can be observed that the shapes of the plots share a similar global behavior. As l is increased, all the GAs rise gradually until reaching a peak. After the maximum, the OA and κ values drop from 96.32% and 95.80% (l = 5) to 51.05% and 42.61% (l = 10), respectively. However, the AA value drops more quickly than the other two measures, from a high value of 95.95% to 28.6%. This can be explained by the fact that, for l ≥ 6, all the pixels in the classification maps that should belong to Grass/pasture-mowed in the ground truth data (refer to Figure 7b) are assimilated into their neighboring structures. As a consequence, the CA of this class is 0, and the corresponding AA is lower. In this experiment, in which the coarse grid level l = 5 was estimated using Equation (15), 397 vertices, which occupy 1.9% of the total number of pixels in the image, are used as markers. It is worth noting that other GAs such as AA and κ can also be used in Equation (15), and the same value of l_opt (l_opt = 5) will be obtained.
Table 1 lists the number of training and test samples for each class in the ground truth data and the classification accuracies of the SVM classification, and Table 2 lists the classification accuracies of all the marker-based classification methods used here. The RGB composite map from bands 47, 23 and 13 of the AVIRIS image and its ground truth data are depicted in Figure 7a,b, respectively. Figure 7c–m illustrates the corresponding classification maps. From those results, we reached the following conclusions.
(1) The Morph-M-HSEG and Proba-M-HSEG methods, both with and without the optional majority voting step, achieve better GAs than the SVM classification. Meanwhile, the highest CAs for 6 of the 16 classes were achieved when using those four methods, namely Corn-min till, Grass/trees, Grass/pasture-mowed, Hay-windrowed, Soybeans-clean till and Stone-steel towers. However, those methods always resulted in a slight under-segmentation in the HSEG step. For example, it can be observed in Figure 7 that some small regions of the Corn-no till, Oats and Grass/pasture classes were merged into adjacent regions that belonged to other classes, and some small regions of other classes were merged into large regions of the Corn-min and Soybean-clean classes. In contrast, in the results of our method shown in Figure 7l,m, most of the small regions that belonged to different classes were better preserved. In addition, the majority voting step did not improve the GAs and CAs of the Morph-M-HSEG method using the SAM distance because almost all the pixels in each region in Figure 7f have the same class label.
(2) The Proba-M-MSF method, both with and without the optional majority voting step, obtains better GAs than the SVM classifier. However, the highest OA achieved by our method is 2%–5% higher than that of this method. To clearly demonstrate the difference between the two methods, one region at the top-middle of the image was used for comparison. This region should be classified as Bldg-Grass-Trees-Drives according to the ground truth data, but a large number of pixels in that region were classified as Woods by the Proba-M-MSF method, as shown in Figure 7j,k. By comparison, our method achieves more accurate classification maps. In addition, another small region in the top left of the image was used for comparison. It can be observed that this region was correctly classified as Grass/pasture by our method, which is consistent with the ground truth data. In contrast, the Proba-M-MSF method merged the entire region into its spatially adjacent region, which belonged to Bldg-Grass-Trees-Drives.
(3) The GAs achieved by our method using the SAM distance were the best among all the classification methods used for comparison. In this case, the OA and κ increased by 13.81% and 15.84%, respectively, and the AA was improved by 15.34%. Meanwhile, the highest CAs for 8 of the 16 classes were achieved when using our method. It is very important to preserve material boundaries and edge structures in classification maps. From the classification maps, we can observe that our method was better than the other marker-based classification methods in terms of region homogenization and edge preservation.
3.3. The Washington DC Image (HYDICE)
In the next example, the benchmark Washington DC image from the HYDICE sensor contains 1208 scan lines with 307 pixels in each scan line and 224 bands, and it has a spatial resolution of approximately 2.8 m. A sub-image was produced for our experiments by spatially and spectrally subsetting to include 200 × 225 pixels and 191 bands. Because the image also has a high spatial resolution, we obtained a ground truth image with six labeled classes by identifying the different materials. In the SVM classification algorithm, 5% of the labeled pixels for each class in the ground truth data were randomly chosen for training, and the remaining labeled pixels were used for testing. The optimal parameters for the SVM classifier were estimated as C = 2084 and γ = 2 by five-fold cross-validation. The GAs and CAs of the classification of the data set using the SVM classification are listed in Table 1. The parameters for the Proba-MS algorithm were fixed as follows: M = 20, P = 5% and T = 2% for the Proba-M-MSF (+MV) method and M = 20, P = 40% and T = 50% for the Proba-M-HSEG (+MV) method. Additionally, a 3 × 3 structuring element was used in the Morph-M-HSEG (+MV) method.
In our method, 12 coarse grid levels were constructed in the AMG structure, and the optimal coarse grid level l = 5 can be obtained from Figure 8 using Equation (15). In this grid, 1125 vertices, which occupy 2.5% of the total number of pixels in the image, were utilized as markers. To objectively compare the classification results, the GAs and CAs of the Washington DC image are shown in Table 3, and all the corresponding classification maps are illustrated in Figure 9. These results lead to conclusions similar to those for the Indian Pines image. In particular, it can be observed from Table 3 that most of the spectral-spatial classification methods used here achieve better GAs than the pixel-wise SVM, except for the Proba-M-MSF method, which could not effectively differentiate the Street, Roofs and Path classes. Furthermore, the best GAs among all the classification methods were obtained by our method with the SAM distance. In this case, the OA and κ were better by 3.84% and 4.83%, respectively, compared with the SVM results. In addition, the highest CAs for 3 of the 6 classes were achieved by our method. In this example, the increase in the AA value was as large as 3.58%.
3.4. The Centre of Pavia Image (ROSIS)
The third hyperspectral data set was the Centre of Pavia image, which was acquired by the ROSIS-03 optical sensor. The image has 400 × 400 pixels, 102 spectral channels and a spatial resolution of 1.3 m. To evaluate our method, we manually generated ground truth data for the image by visual interpretation, including ten material classes of interest; 2% of the labeled pixels of each class from the ground truth data were selected as training samples, and the remaining ones were used for testing. A pixel-wise SVM classification was performed on the image, and the following parameters were chosen by five-fold cross-validation: C = 1.31072 × 10⁵ and γ = 2. The training and test samples for each class and the corresponding classification accuracies by the SVM classifier are reported in Table 1. The Proba-MS algorithm parameters for the Proba-M-MSF (+MV), Proba-M-HSEG (+MV) and Morph-M-HSEG (+MV) methods were the same as for the second hyperspectral data set.
To obtain accurate markers for the following segmentation, 14 coarse grid levels of the Centre of Pavia image were constructed for building the AMG structure. At the optimal grid level of l = 6, 3105 vertices, which occupied 1.9% of the total number of pixels in the image, were utilized as markers, as shown in Figure 10. For comparison, we show the GAs and CAs of the classification methods used here in Table 4, and the corresponding classification maps are displayed in Figure 11. It can be observed that almost all the spectral-spatial classification methods achieved higher GAs when compared with the pixel-wise SVM classification, except for the Proba-M-MSF method with the SAM distance. As shown in Figure 11i, most of the building shadows were not appropriately recognized by the Proba-M-MSF method with the SAM distance, and the CAs of this method confirm that conclusion. For example, the Shadow CA was only 27.15%, which was much lower than the 95.94% achieved by the SVM classifier. In addition, the best GAs were achieved using our method. For example, the increases in OA and κ were as large as 7.06% and 7.91%, respectively, compared with the SVM results. Apart from that, the highest CAs for 5 of the 10 classes were achieved by our method, and the improvement in AA was as large as 5.96%.