Remote Sensing Image Classification Using the Spectral-Spatial Distance Based on Information Content

Among the many efforts to improve the accuracy of remote sensing image classification, using spatial information is an effective strategy. Classification methods that integrate spatial information with spectral information, called spectral-spatial classification approaches, perform better than traditional classification methods. Constructing a spectral-spatial distance for classification is a common way to combine spatial and spectral information. To improve the performance of spectral-spatial classification based on spectral-spatial distance, we introduce the information content (IC) that two pixels share to measure the spatial relation between them and propose a novel spectral-spatial distance measure. The IC shared by two pixels is computed from the hierarchical tree constructed by statistical region merging (SRM) segmentation. The proposed distance was applied to two distance-based contextual classifiers, the k-nearest neighbors-statistical region merging (k-NN-SRM) and the optimum-path forest-statistical region merging (OPF-SRM), to obtain two new contextual classifiers, the k-NN-SRM-IC and the OPF-SRM-IC. The classifiers with the novel distance were evaluated on four land cover images. The classifiers based on our spectral-spatial distance outperformed all the other competing contextual classifiers, which demonstrates the validity of the proposed distance measure.


Introduction
Classification is an important technique for extracting useful information from remote sensing images, so improving classification accuracy remains an active research topic. A recent direction focuses on exploiting spectral-spatial features to classify remote sensing images [1], namely spectral-spatial classification, also called contextual classification. Spectral-spatial classifiers are more accurate than traditional classifiers because they take the contextual spatial characteristics into account [2]. The techniques commonly used in contextual classifiers to extract spatial features include the random field (RF) [3], mathematical morphology [4], segmentation and clustering [5], and deep learning [6].
information within segmentation results. In our novel spectral-spatial distance measure, additional IC is also considered. The measure takes full advantage of the spatial information in the segmentation results, which makes the distance computation more accurate. Exploiting the proposed spectral-spatial distance in classification could reduce the effect of the segmentation results on classification to some extent, and thus mitigate the shortcoming that segmentation-based spectral-spatial classification is highly dependent on the segmentation. As far as we know, this is the first time that IC has been introduced into remote sensing image classification. Our improved spectral-spatial distance measure is applied to the OPF-SRM and k-NN-SRM to obtain two extended contextual classifiers, the OPF-SRM-IC and k-NN-SRM-IC. The classification results using the improved measure on four real land cover remote sensing images outperformed the original classifiers and other competing contextual classifiers, which illustrates the validity of the proposed spectral-spatial distance function. This paper is organized in five sections. Section 2 describes the proposed spectral-spatial distance measure that considers the detailed IC. Section 3 presents the datasets and the parameters used in the experiments. Section 4 provides a discussion, and Section 5 concludes the paper.

Principle of SRM Segmentation
The SRM [33] is a commonly used hierarchical multiscale segmentation method that is also region based. The main idea of the SRM is to start from a pixel of an image and then determine whether a neighboring pixel should be added according to a particular order and merging criterion. The region growing stops and the segmentation is complete when a certain condition is satisfied. The key steps in the SRM are the calculation of the merging predicate and of the merging sequence.
Let $I^*$ be the real scene; the statistical regions within $I^*$ represent the theoretical true objects. Image $I$ denotes the observation of $I^*$. Let $|\cdot|$ denote cardinality, so that $I$ contains $|I|$ pixels in total. Each pixel of an image contains red-green-blue (RGB) values, each of the three belonging to the set $\{1, 2, \ldots, g\}$. In 8 bit images, $g = 256$. Each color channel of the pixels in $I^*$ can then be described by a set of $Q$ independent random variables within the range $[0, g/Q]$, which ensures that the sum of the $Q$ independent random variables belongs to $\{1, 2, \ldots, g\}$. The merging criterion is that the pixels inside a region are homogeneous if they have the same expected value for any given color channel, and the pixels that belong to different regions are separate if the expected values differ for at least one color channel. The merging predicate can be written as follows:

$$P(R, R') = \begin{cases} \text{true} & \text{if } \forall \alpha \in \{R, G, B\},\ \left|\bar{R}'_{\alpha} - \bar{R}_{\alpha}\right| \le \sqrt{b^2(R) + b^2(R')} \\ \text{false} & \text{otherwise} \end{cases}$$

where $R$ and $R'$ are two adjacent candidate regions to be merged and $\bar{R}_{\alpha}$ is the mean value of region $R$ in channel $\alpha$. $b(R)$ is calculated as follows:

$$b(R) = g \sqrt{\frac{1}{2Q|R|} \ln \frac{\left|S_{|R|}\right|}{\delta}}$$

where $S_{|R|}$ denotes the set of regions with $|R|$ pixels and $\delta = 1/(6|I|^2)$. The homogeneous regions $R$ and $R'$ are merged if $P(R, R')$ is true. A pixel $P$ and one of its 4-connected neighbors (up, down, left and right), $P'$, constitute a pixel pair $(P, P')$. There are a total of $N < 2|I|$ pixel pairs. The SRM defines the merging sequence as the ascending order of dissimilarity of these pixel pairs, so the pixel pair with the smallest dissimilarity is tested first. The dissimilarity is measured by $f(P, P') = \max_{\alpha \in \{R, G, B\}} |P_{\alpha} - P'_{\alpha}|$, in which $P_{\alpha}$ and $P'_{\alpha}$ are the values of pixels $P$ and $P'$ in channel $\alpha$, respectively. Thus, $Q$ is the only parameter in SRM, and it determines the statistical complexity of $I^*$. Different segmentation granularities can be achieved by adjusting $Q$.
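The merging test and the pair ordering described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the experiments. It assumes the standard SRM predicate, in which two regions merge when, in every channel, the squared difference of their mean values is at most $b^2(R) + b^2(R')$ with $b(R) = g\sqrt{\ln(|S_{|R|}|/\delta)/(2Q|R|)}$. The function names are hypothetical, and the term $\ln|S_{|R|}|$ is replaced by the upper bound $\min(g, |R|)\ln(|R|+1)$, which is an assumption of this sketch rather than something stated in the text.

```python
import math


def srm_b_squared(region_size, n_pixels, Q, g=256):
    """Squared deviation bound b(R)^2 for a region with `region_size` pixels."""
    # Assumption of this sketch: ln|S_|R|| is replaced by the upper bound
    # min(g, |R|) * ln(|R| + 1).
    log_num_regions = min(g, region_size) * math.log(region_size + 1.0)
    # ln(1 / delta) with delta = 1 / (6 |I|^2)
    log_inv_delta = math.log(6.0) + 2.0 * math.log(n_pixels)
    return g * g * (log_num_regions + log_inv_delta) / (2.0 * Q * region_size)


def merging_predicate(mean_a, size_a, mean_b, size_b, n_pixels, Q, g=256):
    """True if the two regions are close enough in every colour channel."""
    bound = (srm_b_squared(size_a, n_pixels, Q, g)
             + srm_b_squared(size_b, n_pixels, Q, g))
    return all((ma - mb) ** 2 <= bound for ma, mb in zip(mean_a, mean_b))


def pair_dissimilarity(pixel_a, pixel_b):
    """Largest absolute channel difference; pairs are tested in ascending order of it."""
    return max(abs(a - b) for a, b in zip(pixel_a, pixel_b))
```

Sorting all 4-connected pixel pairs by `pair_dissimilarity` and testing each pair's current regions with `merging_predicate` gives the overall SRM loop. A larger `Q` shrinks the bound, so fewer merges are accepted and the segmentation becomes finer.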

Spectral-Spatial Distance Based on SRM
As mentioned above, the merging predicate of the SRM segmentation method is evaluated on adjacent regions, and thus the segmentation results of the SRM method reflect the spatial relation among the samples of an image. Moreover, the segmentation results of SRM can be used to measure the spatial distance between samples of an image.
In the segmentation results of SRM, the regions become smaller and more numerous as Q increases, and larger and fewer as Q decreases. We can obtain segmentation results at different scales by tuning Q, and putting all these results together builds a hierarchical tree with hierarchical level (HL) h. For example, in Figure 1, the hierarchical tree is constructed from the segmentation results of four different Q values for one image. Each level in the hierarchical tree is a different scale of segmentation. As Q increases, the hierarchical level decreases, the number of regions increases, and the size of the regions decreases. The highest hierarchical level is H, which is the initial undivided image. The structure of the hierarchical tree can be used to measure the spatial relation between pixels. Huo et al. exploited the levels, namely the heights of segments, to compute the spatial similarity between pixels. The segment with the minimum level among all common segments of two pixels is called their first common segment. The hierarchical level of the first common segment of pixels $x_i$ and $x_j$ is taken as their hierarchical level distance $d_{HL}$. The similarity formula is then defined as follows:

$$d_{HL}(x_i, x_j) = \arg\min_{h > 0} \left\{ seg_i^h = seg_j^h \right\}$$

where $seg_i^h$ denotes the segment to which pixel $x_i$ belongs at level $h$. According to the above formula, the hierarchical level distance $d_{HL}$ of the green and yellow objects in Figure 1 is 3. The lower the hierarchical level h of the first common segment of two pixels, the more spatially similar they are. The spatial similarity is minimal when the first common segment of two pixels is the original undivided image at the highest level H.
The traditional distance function is computed in the spectral feature space when applied to remote sensing image classification, which ignores the spatial characteristics useful for classification and leads to a lower classification accuracy. Inspired by Huo, Chen et al. proposed a spectral-spatial distance based on SRM, which considers both the spatial and the spectral characteristics of the remote sensing image during classification.
They used $1/0.9^{d_{HL}(x_i, x_j)}$ to denote the spatial distance $f_{spatial}$ between pixels $x_i$ and $x_j$ and combined it with the traditional spectral feature to get a hybrid spectral-spatial distance function $d_{\text{spectral-spatial}}$, where $d_{spectral}$ is the distance between pixels based on the spectral feature space, such as the Euclidean distance that was used.
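The hierarchical level distance and the spatial distance derived from it can be illustrated with a small Python sketch. It assumes the hierarchical tree is stored as a list of label maps, one per level, ordered from level 1 (finest) to level H (the undivided image), and it reads the spatial distance as $1/0.9^{d_{HL}}$. The function names and the toy data are hypothetical.

```python
def hierarchical_level_distance(levels, i, j):
    """Smallest level h (1-based) at which pixels i and j fall in the same segment.

    `levels[h - 1][p]` is the segment label of pixel p at level h.
    """
    for h, labels in enumerate(levels, start=1):
        if labels[i] == labels[j]:
            return h
    raise ValueError("the top level must be a single segment covering the image")


def spatial_distance(d_hl, base=0.9):
    """Spatial distance 1 / base**d_hl; it grows with the hierarchical level distance."""
    return 1.0 / base ** d_hl


# Toy image with four pixels (flattened) and three levels.
toy_levels = [
    [0, 0, 1, 2],  # level 1: finest segmentation (largest Q)
    [0, 0, 1, 1],  # level 2
    [0, 0, 0, 0],  # level 3 = H: the undivided image
]
```

In this toy tree, pixels 0 and 1 meet at level 1, pixels 2 and 3 at level 2, and pixels 0 and 3 only at the top level, so their spatial distance is the largest of the three pairs.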

Novel Spectral-Spatial Distance Based SRM Considering IC
The accuracy of classification based on the spectral-spatial distance proposed by Chen et al. is highly dependent on the quality of the SRM segmentation results, so we should make full use of the information available in the segmentation results and the corresponding hierarchical tree. However, Chen et al. used only the height information of segments in the hierarchical tree, although the tree holds much other information that can be used to capture richer spatial characteristics and more accurate spatial relations. In this section, we extend the work of Chen et al. by proposing an improved spectral-spatial distance measure that introduces the IC shared by two pixels.
The notion of IC has been widely applied in concept similarity measures in data mining. Following the standard argumentation of information theory, the IC of a concept can be quantified by its probability: the higher the probability, the more abstract the concept and the lower its IC. The same argument applies to segments in a segmented image. The more pixels a segment contains, the higher its probability and the lower its IC. In the hierarchical tree obtained by SRM, from the low hierarchical levels to the high hierarchical levels, the segments go from small to large and from specific to abstract, and their IC goes from large to small. According to Resnik [34], the similarity between concepts depends on the information that the two concepts share, which can be computed from the IC of their common ancestor. Inspired by Resnik, we obtain a new way to measure pixel distances through the quantitative characterization of information. The distance between two pixels depends on the information that they share: the less information two pixels share, the more dissimilar they are and the greater the distance.
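As a worked illustration of this argument, Resnik's IC is the negative logarithm of a probability. The block below applies it to segments under the assumption, implied but not written as a formula in the text, that the probability of a segment $s$ is its share of the image pixels. The symbols $p(s)$ and $|s|$ are introduced here for illustration only.

```latex
\[
  IC(s) = -\log p(s), \qquad p(s) = \frac{|s|}{N}
\]
% Example with N = 10000 pixels:
%   a segment of 100 pixels:     IC = -\log(100 / 10000) = \log 100
%   the undivided image (|s|=N): IC = -\log 1 = 0
\[
  |s_1| < |s_2| \;\Longrightarrow\; IC(s_1) > IC(s_2)
\]
```

The undivided image at level H therefore carries no information, which matches the statement that two pixels whose only common segment is the whole image share the least information.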
The IC-based distance between pixels, $d_{IC}(x_i, x_j)$, is defined by function (6), where $count(FCS(x_i, x_j))$ is the number of pixels in the first common segment of $x_i$ and $x_j$ and $N$ is the total number of pixels in the image. The fraction normalizes the $d_{IC}$ value to between 0 and 1. According to function (6), the distance of a pixel from itself is 0. The distance reaches its maximum of 1 when the first common segment of two pixels is the initial undivided image at the maximum hierarchical level H. When the first common segment of two pixels contains more pixels, the two pixels share less information, and the distance between them is greater. Conversely, if the first common segment of two pixels contains fewer pixels, the two pixels share more information, and the distance is smaller. For example, in Figure 1, the first common segment of the red and blue pixels and the first common segment of the yellow and green pixels are both at hierarchical level 3. However, the region sizes of these two first common segments differ: the first common segment of the red and blue pixels contains four pixels, and that of the yellow and green pixels contains three pixels. Therefore, the spatial similarity of the two pairs is different. We combine our IC-based distance with the $d_{\text{spectral-spatial}}$ proposed by Chen et al. to obtain a comprehensive distance formula. Figure 2 is the flowchart of the proposed SRM-based spectral-spatial distance considering IC. The proposed comprehensive distance formula measures the distance between samples at both the spatial level and the spectral level. Both the height information and the IC are taken into account in the spatial distance measurement, which could increase the accuracy of the spatial distance and of the final distance.
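The behaviour of an IC-based distance can be sketched as follows. The exact expression of function (6) is not reproduced here, so the sketch uses a hypothetical normalization, $\log(count)/\log(N)$, chosen only because it satisfies the properties stated above: it is 0 for a pixel compared with itself, 1 when the first common segment is the whole image, and it increases with the size of the first common segment. The function names and the toy tree are assumptions for illustration.

```python
import math


def first_common_segment_size(levels, i, j):
    """Pixel count of the lowest-level segment that contains both i and j.

    `levels[h - 1][p]` is the segment label of pixel p at level h (1 = finest).
    """
    for labels in levels:
        if labels[i] == labels[j]:
            return labels.count(labels[i])
    raise ValueError("the top level must be a single segment covering the image")


def ic_distance(levels, i, j):
    """Hypothetical normalized IC distance: log(count) / log(N), in [0, 1]."""
    if i == j:
        return 0.0  # the text states that a pixel is at distance 0 from itself
    n_pixels = len(levels[0])
    count = first_common_segment_size(levels, i, j)
    return math.log(count) / math.log(n_pixels)


# Toy tree with eight pixels; at level 3 one segment has four pixels, another three.
toy_tree = [
    [0, 1, 2, 3, 4, 5, 6, 7],  # level 1
    [0, 0, 1, 1, 2, 2, 3, 4],  # level 2
    [0, 0, 0, 0, 1, 1, 1, 2],  # level 3
    [0, 0, 0, 0, 0, 0, 0, 0],  # level 4 = H
]
```

Pixels 0 and 2 and pixels 4 and 6 both meet at level 3, so their hierarchical level distance is identical, yet `ic_distance` separates them because their first common segments hold four and three pixels, respectively. This is the situation described in the red/blue versus yellow/green example.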
Figure 2. Flowchart of the proposed comprehensive spectral-spatial distance.

Application of the Proposed Comprehensive Distance Measure
The advantage of distance-based remote sensing image classification algorithms is that they are simple and easy to execute, but their accuracy needs to be improved. In particular, the traditional distance function in this type of classifier uses only the spectral characteristics and depends only on the spectral distance between samples during classification, which ignores the spatial characteristics of remote sensing data. This can be avoided by applying a hybrid spectral-spatial distance.
The k-NN and OPF algorithms are two commonly used distance-based methods owing to their good performance, and they are closely related: the OPF is equivalent to the 1-NN when all training samples are used as prototypes in the OPF [35]. Chen et al. applied their spectral-spatial distance to these two classifiers and obtained two contextual classifiers, the OPF-SRM and k-NN-SRM. The comprehensive distance proposed in this paper, which considers the IC, can also be applied to these two distance-based classifiers. We applied it to the k-NN and OPF to obtain two contextual classifiers, named the k-NN-SRM-IC and OPF-SRM-IC, respectively, in which the comprehensive distance formula is used to measure the distance between samples.

k-NN-SRM-IC
The idea of k-NN is that each test sample can be represented and classified according to its k nearest neighbors in the training set. The classifier assigns a test sample to the class to which most of its k nearest neighbors belong. In the k-NN-SRM-IC, the novel comprehensive distance formula that considers IC is used to find the k nearest neighbors.
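Because k-NN only needs a way to rank training samples by distance, any distance function can be plugged in. The sketch below shows this with a generic `distance` callable. The function name and the toy data are hypothetical, and the absolute difference used in the usage lines stands in for the comprehensive spectral-spatial distance, which is not implemented here.

```python
from collections import Counter


def knn_predict(test_item, train_items, train_labels, distance, k=3):
    """Label `test_item` by majority vote among its k nearest training items.

    `distance(a, b)` can be any dissimilarity function.
    """
    ranked = sorted(range(len(train_items)),
                    key=lambda idx: distance(test_item, train_items[idx]))
    votes = Counter(train_labels[idx] for idx in ranked[:k])
    return votes.most_common(1)[0][0]


# Usage with a placeholder distance on one-dimensional toy samples.
train_items = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
train_labels = ["grass", "grass", "grass", "road", "road", "road"]
placeholder_distance = lambda a, b: abs(a - b)
```

Swapping `placeholder_distance` for a function that returns a spectral-spatial distance between two pixels turns this plain k-NN into a contextual classifier without changing the voting logic.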

OPF-SRM-IC
In the training stage, OPF constructs a complete graph A in which each pair of nodes is connected by a single edge. The nodes of A are the samples of the training set $Z_1$, and the weight of each edge is the distance between the nodes it connects. The adjacent nodes with different labels in the minimum spanning tree (MST) of graph A are chosen as the prototype set S; each prototype represents one class. $\pi_t$ denotes a path consisting of a sequence of adjacent samples that starts from a root $R(t)$ and ends at sample $t$. The maximum distance between adjacent samples along $\pi_t$ is defined as the path-cost $f(\pi_t)$, which reflects the connection strength of the nodes on the path. The connection strength between $R(t)$ and $t$ is maximum, and the path $\pi_t$ is optimum, when $f(\pi_t) \le f(\tau_t)$, where $\tau_t$ is any other path ending at node $t$. All the other samples in the training set are connected to a prototype with the minimum path-cost $f(\pi_t)$.
The minimum of $f(\pi_t)$ is assigned to sample $t$ as the minimum cost $C_t$ used in the testing stage. A prototype together with all the samples connected to it constitutes an optimum tree, and all the optimum trees constitute the optimum forest, which is used to classify the testing set.
For a given testing sample $v$ in the testing set $Z_2$, OPF connects it to all the samples $t \in Z_1$. Then we find the optimum path $P(v)$ according to the minimum cost $C_v$:

$$C_v = \min_{t \in Z_1} \left\{ \max \left\{ C_t, d(t, v) \right\} \right\}$$

where $d(t, v)$ is the distance between $t$ and $v$. The prototype included in $P(v)$ is the one most strongly connected with $v$, and the two are most likely in the same class, so $v$ is labeled with the class of $R(v)$. In the OPF-SRM-IC, the novel comprehensive distance formula that considers IC is used to compute the edge weights and the cost function.
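The training and testing stages described above can be condensed into a short Python sketch that works on a precomputed distance matrix. It is a simplified illustration under the usual OPF formulation (prototypes from the MST, path-cost equal to the largest edge on the path, test cost $\min_t \max\{C_t, d(t, v)\}$) and is not LibOPF. The function names are hypothetical, and any distance, including a spectral-spatial one, can be used to fill `dist`.

```python
import heapq
import math


def opf_train(dist, labels):
    """Return (cost, root_label) for every training sample.

    `dist[i][j]` is the distance between training samples i and j.
    If all labels are equal, no prototype exists and every cost stays infinite.
    """
    n = len(labels)
    # Prim's algorithm on the complete graph; MST edges that join two
    # differently labelled samples mark both endpoints as prototypes.
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0] = 0.0
    prototypes = set()
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0 and labels[parent[u]] != labels[u]:
            prototypes.update((parent[u], u))
        for t in range(n):
            if not in_tree[t] and dist[u][t] < best[t]:
                best[t], parent[t] = dist[u][t], u
    # Grow optimum paths from the prototypes; path-cost = largest edge on the path.
    cost = [0.0 if i in prototypes else math.inf for i in range(n)]
    root_label = list(labels)
    heap = [(0.0, s) for s in sorted(prototypes)]
    heapq.heapify(heap)
    done = [False] * n
    while heap:
        c, s = heapq.heappop(heap)
        if done[s]:
            continue  # stale heap entry
        done[s] = True
        for t in range(n):
            new_cost = max(c, dist[s][t])
            if not done[t] and new_cost < cost[t]:
                cost[t], root_label[t] = new_cost, root_label[s]
                heapq.heappush(heap, (new_cost, t))
    return cost, root_label


def opf_classify(dist_to_train, cost, root_label):
    """Label a test sample with the class of the root that minimizes max(C_t, d(t, v))."""
    path_costs = [max(c, d) for c, d in zip(cost, dist_to_train)]
    return root_label[path_costs.index(min(path_costs))]


# Usage on six one-dimensional toy samples with absolute-difference distances.
points = [0, 1, 2, 10, 11, 12]
toy_labels = [0, 0, 0, 1, 1, 1]
toy_dist = [[abs(a - b) for b in points] for a in points]
toy_cost, toy_roots = opf_train(toy_dist, toy_labels)
```

In the toy data, the only MST edge joining different classes links the samples at positions 2 and 10, so these two become the prototypes and every other sample reaches its prototype with a path-cost of 1.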

Datasets
The datasets used in this paper are land cover images obtained from CBERS-2B and Landsat-5 TM covering the Itatinga area of Brazil and land cover images obtained from Ikonos-2 MS and Geoeye covering the Duque de Caxias area of Brazil. The Green, Red and Near IR bands of CBERS-2B, Ikonos-2 MS and Geoeye were used in the experiment, as were the Near IR, Red and SW IR bands of Landsat-5 TM. These four images are shown in Figure 3. The RGB values were used as features to describe each pixel. The ground truth images of the datasets were annotated by Rodrigo José Pisani [12] and are shown in Figure 4. Tables 1 and 2 describe the detailed information of the land cover classes included in the datasets.

Parameter Settings
In addition to comparisons with the original k-NN-SRM and OPF-SRM, the contextual classifiers using the proposed comprehensive distance (k-NN-SRM-IC and OPF-SRM-IC) were also compared with the k-NN, OPF and SVM, with spectral-spatial classifiers based on watershed (WH) segmentation, and with spectral-spatial classifiers based on MRF. For the MRF-based classifiers, we used the OPF-MRF proposed by Nakamura [10] and the SVM-MRF proposed by Osaku [12]. The original spectral-spatial classifier based on WH and majority voting (MV) is the SVM-WH-MV [17], which first applies the SVM as the pixel-wise classifier to the original hyperspectral image. Then, for every watershed region, all the pixels are assigned to the most frequent class within this region by the MV approach. Other classifiers can also be used as the pixel-wise classifier. Thus, we built another two WH-MV-based spectral-spatial classifiers, the k-NN-WH-MV and the OPF-WH-MV, with the k-NN and OPF as pixel-wise classifiers, respectively.
We used the accuracy measure proposed by Papa to assess the classification results [36]. For the OPF, we used LibOPF [37]. For the MRF-based classifiers, we used the code obtained from [12], and for the k-NN and WH-MV-based methods we used our own implementation. The training and testing sets were constructed using the holdout method, with 5% of the samples used for training and 95% for testing. Considering that the randomness of the holdout method may affect the results, the classification of each image was executed five times with different partitions, and the final accuracy is the mean of the five results. The number of iterations for the MRF-based methods is ten. For the SVM-based methods, the kernel is the radial basis function (RBF), and the relevant parameters were optimized. The value of k was chosen from the interval [1, 10] as the one giving the maximum accuracy. We set fourteen different Q values for the SRM segmentation of each image. They are presented in Table 3 with the corresponding hierarchical levels. When Q increases beyond a certain extent, the segmentation result tends to stabilize, which determines the maximum value of Q.
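The evaluation protocol (a 5%/95% holdout repeated over five random partitions and averaged) can be sketched as follows. The function names are hypothetical, the split is a plain random one because the text does not say whether it was stratified by class, and `evaluate` is a placeholder for training a classifier on the training indices and scoring it on the testing indices.

```python
import random


def holdout_split(n_samples, train_fraction=0.05, seed=0):
    """Randomly split sample indices into a training part and a testing part."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # a different seed gives a different partition
    n_train = max(1, round(train_fraction * n_samples))
    return indices[:n_train], indices[n_train:]


def mean_score_over_partitions(n_samples, evaluate, n_runs=5, train_fraction=0.05):
    """Average `evaluate(train_idx, test_idx)` over `n_runs` random partitions."""
    scores = []
    for run in range(n_runs):
        train_idx, test_idx = holdout_split(n_samples, train_fraction, seed=run)
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / len(scores)
```

Selecting k for a k-NN-based classifier then amounts to calling `mean_score_over_partitions` once for each k from 1 to 10 and keeping the value with the highest mean score.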

Quantitative Results
The accuracy of the classification methods using the proposed distance measure and of the comparison methods for all images is shown in Table 4. The values in brackets are the parameter values for the relevant classifiers: β for the two MRF-based classifiers and the k that gave the best results for the k-NN-based classifiers. The spectral-spatial classifiers using the proposed comprehensive distance that considers the IC show a clear improvement: the accuracy of the SRM-based methods that consider IC is higher than that of the SRM-based methods without IC. Among all the methods, the k-NN-SRM-IC obtained the highest accuracy. For each pixel-wise classifier, the SRM-based methods performed better than the WH-MV-based methods, which indicates that SRM is more effective than WH in capturing spatial information. The k-NN-SRM outperformed the SVM-MRF even though the k-NN has similar performance to the SVM, and the OPF-SRM outperformed the OPF-MRF. This suggests that the spatial information obtained by the SRM is more accurate than that obtained by the MRF.
Figure 5 shows the classification results for the CBERS-2B image. The k-NN-SRM-IC and k-NN-SRM obtained similar results and classified all the classes correctly, and the k-NN-SRM-IC is more accurate than the k-NN-SRM for culture. The OPF-SRM and OPF-SRM-IC classifiers confused reforesting and bushes. The OPF failed to identify culture, reforesting and bushes. All the other methods failed to recognize culture completely. There are many discontinuous fragments in the results of the methods using the WH and MV mechanisms.

Visual Results
As shown in Figure 6, the k-NN-SRM-IC and k-NN-SRM recognized all the classes correctly for the Landsat-5 image, and the k-NN-SRM-IC is more accurate than the k-NN-SRM for culture. The OPF, OPF-SRM, OPF-SRM-IC, and OPF-WH-MV confused parts of grasslands and reforesting. The results of the other methods are similar in that they misclassified a portion of reforesting and culture.
As shown in Figure 7, the result of the k-NN-SRM-IC is still the best for the Ikonos-2 MS image, and the k-NN-SRM is second. The OPF-SRM and OPF-SRM-IC misclassified bare soil as grassland. All the other methods misidentified tree coverage, clear tonal signatures and the covering of dark tonal signatures as road.
As shown in Figure 8,

Discussion
The drawback of segmentation-based spectral-spatial classification methods is that they are highly dependent on the performance of the segmentation method used, which has two aspects: one is the accuracy of the segmentation results; the other is how much information the segmentation results can express or capture. The strategy based on MV belongs to the first case, because the segmentation results are used directly to determine the final classification of the samples. Once there are mistakes in the segmentation results, the final classification results are affected. For example, the methods combining WH and MV can improve the original classification to a certain extent, but many discontinuous fragments without practical significance remain, because the WH segmentation results are over-segmented.
In the methods that construct the spectral-spatial distance from segmentation results, the quality of the spatial information captured from the segmentation results affects the final classification results. Simple and insufficient spatial information captured from the segmentation results may reduce the classification accuracy. In our proposed method, the capability to capture spatial information from the segmentation results is enhanced. The method considers both the height information and the IC of samples to measure the spatial distance, which takes full advantage of the segmentation results and captures more complex spatial information. The k-NN-SRM-IC performed better than the k-NN-SRM, and the OPF-SRM-IC outperformed the OPF-SRM in both quantitative and visual assessments, which illustrates that the proposed comprehensive spectral-spatial distance that considers IC is more effective than the spectral-spatial distance that uses only the hierarchical level to capture spatial information. The results also indicate that using the IC shared by two pixels to measure the spatial distance between them is feasible and effective.
The classification results based on the MRF are complementary with the classification results based on the SRM. The classes that are correctly recognized by the MRF-based methods cannot be identified by SRM-based methods. Conversely, those classes that are identified correctly by the SRM-based methods are misclassified by the MRF-based methods. Thus, we can consider combining the MRF and SRM together to improve the classification in the future.

Conclusions
In this paper, we introduce IC to measure the spectral-spatial distance. We use the number of pixels contained in a segment to measure its IC. The more pixels a segment includes, the more abstract it is and the less information it has. The fewer pixels a segment includes, the more specific it is and the more information it has. The spatial distance between pixels can be computed from the IC of their first common segment in a hierarchical tree constructed from the results of SRM multiscale segmentation. By integrating the spatial distance computed from the IC with the spectral-spatial distance that uses the hierarchical level and spectral features, we obtain a novel spectral-spatial distance measure that can capture more spatial information. The novel spectral-spatial distance can be applied to traditional distance-based classification methods, such as the k-NN and OPF, to improve their performance. The contextual classifiers using our novel spectral-spatial distance function, named the k-NN-SRM-IC and OPF-SRM-IC, were evaluated on four land cover images. The k-NN-SRM-IC obtained the highest accuracy among all methods. The classification results support the validity of the IC-based distance measure and show that the novel spectral-spatial distance function can capture more spatially related information between pixels. In the future, we will explore applying the proposed spectral-spatial distance to other contextual classifiers to improve classification performance.