Superpixel Segmentation of Hyperspectral Images Based on Entropy and Mutual Information

Abstract: Superpixel segmentation (SS) methods have been proven feasible for improving the performance of hybrid algorithms on hyperspectral images (HSIs). In this paper, a superpixel segmentation algorithm based on information measures with color histogram driving (IM-CHD) is proposed. First, Shannon entropy was applied to measure the image information and preliminarily select spectral bands. Mutual information (MI), which is derived from the concept of entropy, measures the statistical dependence between two random variables and can effectively identify redundant spectral bands. Therefore, both MI and color matching functions (CMF) were used to select the most useful spectral bands. Second, the selected spectral bands were combined into a false color image containing the main spectral information, and a local optimization algorithm named “hill climbing” was used to achieve the superpixel segmentation. Finally, parameter selection experiments and comparative experiments were performed on two hyperspectral data sets. The experimental results showed that the IM-CHD method was more efficient and accurate than other state-of-the-art methods.


Introduction
Hyperspectral images (HSIs) contain full spectral and spatial information, and the resulting three-dimensional data block can effectively reflect much ground object information that cannot be detected in wide-band multi-spectral images [1]. In recent years, the classification of hyperspectral images has received extensive attention in the field of remote sensing image processing. In addition, hyperspectral images are characterized by high dimensionality and high information redundancy. Therefore, dimensionality reduction and image preprocessing (such as image segmentation) are important.
At present, there are mainly two types of dimensionality reduction methods for hyperspectral images, i.e., feature extraction and band selection. Band selection is based on information measures and has been intensively studied. In band selection methods, Shannon entropy or its variations are typically used to evaluate image information. Shannon proposed a supervised band-selection algorithm, in which only the known class signatures were used while neither the original band nor a training sample was required [2]. Based on mutual information, Martínez-Usó et al. used a clustering method to select bands and achieve dimensionality reduction [3]. Based on spatial entropy, Wang et al. proposed a hyperspectral band selection algorithm for supervised classification [4]. On the other hand, many methods perform feature extraction on hyperspectral images to achieve dimensionality reduction. For example, Chavez and Kwarteng proposed a selection method using principal component analysis (PCA) [5]. Wang et al. proposed a non-linear extraction method based on manifold learning [6], in which the dimensionality of the data is reduced by finding, for the points in the high-dimensional space, the corresponding coordinates on a low-dimensional manifold. Chen et al. proposed a dimensionality reduction method for hyperspectral images based on sparse representation [7]. Jacobson et al. proposed a band transformation method that extends the color matching functions (CMF) to the entire image spectra, which provided a foundation for subsequent band selection [8].
In addition to dimensionality reduction, hyperspectral image segmentation can also reduce the complexity of subsequent image processing. Acito et al. proposed an HSIs segmentation algorithm based on a statistical approach [9]. The proposed algorithm was completely unsupervised and relied only on the spectral information. Veganzones et al. proposed a hyperspectral image segmentation method using a new spectral unmixing-based binary partition tree representation [10]. Rundo et al. proposed an intelligent image analysis framework for image enhancement, automatic global thresholding and segmentation [11]. However, the above traditional object-level segmentation algorithms share the problem of under-segmentation, which has a significant impact on post-processing. In recent years, some researchers have proposed an over-segmentation scale called the superpixel, which has adaptive sizes and shapes for different spatial structures. In the hyperspectral image classification process, segmentation using the superpixel method can effectively reduce the impact of the segmentation scale on the classification results [12]. Therefore, superpixel segmentation (SS) has become an important preprocessing method for HSIs. The concept of the superpixel was first proposed by Ren et al. [13]. In recent years, new superpixel segmentation algorithms have been continuously proposed. Liu proposed an entropy rate-based objective function for superpixel segmentation. However, the proposed method was very computationally intensive and time consuming [14]. Bergh et al. proposed a superpixel segmentation algorithm via energy-driven sampling. However, the disadvantage of this method was that it could not handle sudden color changes [15]. Zhang et al. proposed to pre-cluster the hyperspectral image for the non-linear graphs into Simple Linear Iterative Clustering (SLIC) superpixels, and then use the superpixels as the input of the dimensionality reduction algorithm [16]. In this method, only the first principal component map was used for segmentation, which may cause the loss of important information. Based on the region growth method in [17], Xu et al. proposed a superpixel segmentation method [18]. In this method, on the basis of fast region growth, low-dimensional representations of hyperspectral images were achieved without sacrificing subsequent classification performance. Rodarmel and Shan used principal component analysis as a preprocessing technique to classify hyperspectral images [19]. Based on superpixels, Zhang et al. proposed a hyperspectral image classification method to effectively identify the type of land cover [20]. Steven et al. proposed a band selection method to effectively visualize the spectral images [21]. The results showed that the superpixel segmentation scale had sufficient potential in hyperspectral image segmentation. Therefore, although superpixel segmentation is still under study, it has been shown to have promising prospects in HSIs segmentation.
In this paper, a new hyperspectral image superpixel segmentation algorithm (IM-CHD) was proposed, in which the new techniques and methods, such as band selection method based on information measures, CMF, and color histogram driving function, were used to achieve more accurate and effective HSIs segmentation. In our previous work [22], we have proposed an HSIs classification method based on information measure, and proved the effectiveness of obtaining the main spectra of HSIs through information measure. In this paper, we synthesize the false color image based on the spectral components selected by information measure, and realize the segmentation of hyperspectral image based on this. The results of this study have significant reference value in two major aspects.
(1) The information measures theory, including entropy, mutual information, and normalized mutual information, was applied to superpixel segmentation for HSIs. When information measures were used in the superpixel segmentation, CMF was applied to HSIs at the same time. The combination of information measures and CMF achieved dimensionality reduction of hyperspectral images. Thus, the superpixel segmentation based on information measures is different from the existing spectral band selection methods based on information measures.
(2) As one of the most innovative and effective methods, superpixel segmentation based on color histogram driving and hill climbing optimization was used to segment the false color image obtained in previous steps.

Dimensionality Reduction of HSIs Based on Information Measures
Band selection refers to the selection of band images which have relevant or irrelevant information [23]. The entropy and mutual information were generally used as the criterion for the band selection [21]. The entropy of each spectral band was calculated to evaluate the amount of contained information in this band. A threshold was set to exclude the irrelevant spectral bands. Then CMF was used for spectrum segmentation and the mutual information was used for the final band selection. The process can be mainly divided into the following two steps.

Preliminary Selection of Spectral Bands Based on Entropy and Color Matching Function
Shannon first introduced the concepts of entropy and mutual information [24]. It has been proven that Shannon's information measures theory is very effective in reducing the dimensionality of high-dimensional data. In hyperspectral images, each channel is considered to be equivalent to a random variable $X$, and all the pixels of the channel are considered to be the events of $X$.
The entropy of a channel can be expressed as:
$$H(X) = -\sum_{i} p(x_i) \log_b p(x_i) \qquad (1)$$
where $p(x_i)$ is the probability density of $X$, and $b$ is the base of the logarithm.
$X$ and $Y$ are two random variables with $n$ and $m$ values, respectively. The entropy of the joint event is:
$$H(X,Y) = -\sum_{i=1}^{n} \sum_{j=1}^{m} P(X_i, Y_j) \log_b P(X_i, Y_j) \qquad (2)$$
where $P(X_i, Y_j)$ is the probability of the joint occurrence of first $X_i$ and then $Y_j$.
The mutual information between the two random variables $X$ and $Y$ is:
$$I(X;Y) = H(X) + H(Y) - H(X,Y) \qquad (3)$$
where $H(X,Y)$ is the joint entropy of the two random variables, $X$ and $Y$. Bell proposed the concept of the co-information [25] of three random variables, $X$, $Y$ and $Z$, which can be expressed as follows:
$$I(X;Y;Z) = H(X) + H(Y) + H(Z) - H(X,Y) - H(X,Z) - H(Y,Z) + H(X,Y,Z) \qquad (4)$$
where $H(X,Y,Z)$ stands for the joint entropy of the three random variables $X$, $Y$ and $Z$. This principle remains feasible for HSIs. The information of one channel can increase the mutual information between the other two channels. Thus, when the co-information is smaller, the amount of the shared information is larger. In the preliminary selection, the low-informative channels are removed. First, the entropy of each spectral band of the HSIs is calculated. Second, the local average value is defined by Equation (5). In Equation (5), $m$ stands for the window size, which represents the size of the neighborhood. All the bands meeting the condition in Equation (6) can be retained.
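The entropy, joint entropy, and mutual information defined above can be estimated from grey-level histograms of the band images. The following is a minimal sketch in Python with NumPy, not the authors' implementation; the function names and the choice of 256 histogram bins are assumptions made for illustration.

```python
import numpy as np

def band_entropy(band, bins=256, base=2.0):
    """Shannon entropy of one band, estimated from its histogram."""
    hist, _ = np.histogram(np.asarray(band).ravel(), bins=bins)
    p = hist[hist > 0] / hist.sum()          # drop empty bins: 0 * log 0 = 0
    return float(-(p * np.log(p)).sum() / np.log(base))

def joint_entropy(band_x, band_y, bins=256, base=2.0):
    """Joint entropy of two bands, estimated from their 2-D histogram."""
    hist, _, _ = np.histogram2d(np.asarray(band_x).ravel(),
                                np.asarray(band_y).ravel(), bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log(p)).sum() / np.log(base))

def mutual_information(band_x, band_y, bins=256, base=2.0):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return (band_entropy(band_x, bins, base)
            + band_entropy(band_y, bins, base)
            - joint_entropy(band_x, band_y, bins, base))
```

Two identical bands give a mutual information equal to their entropy, while two statistically independent bands give a value close to zero, which is why a low MI is read as low redundancy.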
In Equation (6), $\sigma$ is the threshold factor. The bands with higher entropy than their local average values are distinguished by the threshold factor, and these bands are considered to be irrelevant, as shown in Figure 1. The window size $m$ and threshold factor $\sigma$ were selected based on the smoothness of the entropy curve. When the entropy curve was smoother, the probability of an uncorrelated band was lower and fewer bands fell outside the correlation range. In this case, both $\sigma$ and $m$ were set to small values in order to improve the precision. In contrast, a sharper curve indicated that the neighboring channels were more different. In this case, a larger window size should be used. Then, the CIE 1931 supplementary standard colorimetric observer, i.e., CMF [26], was used to make a second selection of the spectral bands. Based on experiments, the CMF for a particular wavelength was derived by determining the mixing ratio of the three primary colors of light (red, green, and blue) that generates the same impression as a monochromatic light at that wavelength. By applying the CIE color matching envelopes to the HSIs in the visible range, the HSIs can be visualized as a colorimetrically correct image [8].
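The entropy-based preliminary selection can be sketched as follows. The exact forms of Equations (5) and (6) are not available in this text, so the rule used here is an assumption: the local average is a centered moving mean over roughly $m$ neighboring bands, and a band is retained when its entropy deviates from that local average by at most a fraction $\sigma$. The function name `preliminary_band_mask` is hypothetical.

```python
import numpy as np

def preliminary_band_mask(entropies, m=11, sigma=0.05):
    """Boolean mask of bands kept by an assumed entropy rule.

    Assumption: keep band i when |H_i - local_mean_i| <= sigma * local_mean_i,
    where local_mean_i is a centered moving average of the band entropies.
    """
    h = np.asarray(entropies, dtype=float)
    half = m // 2
    window = 2 * half + 1                      # window forced to an odd length
    padded = np.pad(h, half, mode="edge")      # repeat edge values at both ends
    local_mean = np.convolve(padded, np.ones(window) / window, mode="valid")
    return np.abs(h - local_mean) <= sigma * local_mean
```

With this rule, a band whose entropy drops sharply relative to its neighbors is rejected, and its immediate neighbors may be rejected as well because the outlier pulls their local average away; a larger window reduces that side effect.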
The wavelength was set to $\lambda = 360$ nm in the first valid AVIRIS band and $\lambda = 830$ nm in the last valid AVIRIS band. Then, linear interpolation was performed between the two bands to achieve the HSIs stretch, as shown in Figure 2. By thresholding the CMF coefficients with a threshold $t$, three sets of spectral bands were obtained, corresponding to the red, blue, and green primary channels, respectively. The impact of $t$ is illustrated in Figure 3. In the figure, the horizontal lines represent the spectrum thresholds in two different cases. The spectral bands with above-threshold CMF coefficients are retained. In this paper, an automatic threshold method was adopted to determine $t$. The optimal threshold was defined as the value at which the amount of discarded information was maximized. The optimal value of $t$ can then be obtained from Equation (7).
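The wavelength stretch and the CMF thresholding described above can be sketched as follows. This is an illustration only: the CMF coefficients are assumed to be supplied by the caller, already sampled at the stretched band wavelengths (one column per primary channel), and the automatic choice of $t$ from Equation (7) is not implemented. The function names are hypothetical.

```python
import numpy as np

def band_wavelengths(n_bands, first_nm=360.0, last_nm=830.0):
    """Linearly spaced wavelengths assigned to the valid bands."""
    return np.linspace(first_nm, last_nm, n_bands)

def threshold_cmf(cmf, t):
    """Return, per primary channel, the indices of bands whose CMF
    coefficient is above the threshold t.

    cmf: array of shape (n_bands, 3), one column per primary channel
         (assumed to be provided by the caller).
    """
    cmf = np.asarray(cmf, dtype=float)
    return [np.flatnonzero(cmf[:, c] > t) for c in range(cmf.shape[1])]
```

A larger $t$ keeps fewer bands per channel, so the three sets shrink toward the peaks of the matching curves.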

Band Selection Algorithm Based on Mutual Information
The dimensionality reduction of the HSIs was achieved by optimizing two criteria, i.e., maximizing the amount of information and minimizing the redundancy. These two criteria need to be considered simultaneously.
Pla et al. used the normalized mutual information for band selection [27]. In this paper, the $k$th-order normalized information (NI) of the band set was calculated, where $H(B_i)$ is the entropy of band $B_i$.
From the above step, three segments can be obtained, i.e., $Set_t^R$ and its counterparts for the other two primary channels. Finally, the $3n$ spectral bands are selected to composite the false color images. This method is not only convenient to configure, but also produces images containing the main spectral information. The optimal value of $n$ was determined by experiments.
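For the case $n = 1$, the idea of picking one band per segment with minimal redundancy can be sketched as an exhaustive search. Note the assumption: the sum of pairwise mutual information is used here as a simple stand-in for the NI criterion, whose exact form is not reproduced in this text. The function name and the precomputed matrix `mi` are hypothetical.

```python
from itertools import product

import numpy as np

def least_redundant_triplet(mi, set_r, set_g, set_b):
    """Pick one band from each of the three sets so that the summed
    pairwise mutual information is smallest.

    mi: symmetric matrix, mi[i, j] = mutual information of bands i and j.
    """
    mi = np.asarray(mi, dtype=float)
    best, best_cost = None, np.inf
    for r, g, b in product(set_r, set_g, set_b):
        cost = mi[r, g] + mi[r, b] + mi[g, b]
        if cost < best_cost:
            best, best_cost = (r, g, b), cost
    return best, best_cost
```

The search costs one evaluation per combination of candidate bands, which is affordable once the entropy and CMF steps have already reduced each segment to a few tens of bands.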

Superpixel Segmentation Based on Color Histogram Driving
In the superpixel segmentation method based on information measures, the spectral information in HSIs is converted to its own color information. Thus, the image color can be used as the basis to achieve the final segmentation of HSIs. Utilizing the feature of HSIs, the color histogram was used to evaluate the false color composition image, and the hill climbing algorithm was used to obtain the superpixel segmentation of images [22].

Metric Function of Color Uniformity
In this study, the CHD function was used as the evaluation function [28], which consisted of two parts, i.e., the metric function of color uniformity and the edge smoothness function.
(1) Metric function of color uniformity
To evaluate the color density distribution of each superpixel, we constructed a histogram and discretized the color space. $\lambda$ is the label of the color space, $G$ is the number of discrete histogram bins, which is set by the user, and $\psi(c_{A_k})$ is used to evaluate the distance between colors and the concentration degree of colors in a histogram, as shown in Equation (11).
(2) Edge smoothness function
... boundaries, thus not all the pixels can reach the maximum. However, by analyzing the neighboring regions which contain more than one superpixel label, the pixels close to boundaries can be reduced and the function can have more regular shapes.
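A color uniformity term of this kind can be sketched as below. Since Equation (11) is not reproduced in this text, the sketch assumes the formulation used in energy-driven sampling [15], where the quality of a superpixel is the sum of its squared normalized histogram entries; whether the CHD function in [28] matches it exactly is an assumption. The function name is hypothetical, and the pixels are assumed to be already quantized into histogram bin indices.

```python
import numpy as np

def color_uniformity(quantized, labels, n_bins):
    """Sum over superpixels of the squared normalized color histogram.

    quantized: integer bin index (0 .. n_bins - 1) of every pixel.
    labels:    superpixel label of every pixel (same shape).
    Each superpixel contributes a value in (0, 1]; it reaches 1 only
    when all its pixels fall in a single histogram bin.
    """
    q = np.asarray(quantized).ravel()
    lab = np.asarray(labels).ravel()
    total = 0.0
    for k in np.unique(lab):
        hist = np.bincount(q[lab == k], minlength=n_bins).astype(float)
        hist /= hist.sum()
        total += float((hist ** 2).sum())
    return total
```

Because the term rewards histograms concentrated in few bins, moving a pixel into a superpixel of matching color raises the value, which is what makes it usable as an objective for local optimization.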

Superpixel Segmentation Algorithm Based on Hill Climbing Optimization
During the segmentation process, a one-by-one superpixel adjustment strategy through gradual local optimization can be used to obtain superpixel regions that are sufficiently segmented and uniform in color. Hill climbing optimization was used as this adjustment strategy. Hill climbing is an iterative algorithm that starts with an arbitrary solution and then attempts to improve it by incrementally changing a single element of the solution. The segmentation includes two important operations, which are described below [28]:
(1) The initialization of superpixels
First, the method proposed in [15] was used to manually set the segmentation amplitude threshold $U$. Using this method, the maximum number of superpixels was obtained. Then, the side length $L$ of the initial square superpixel was obtained, as shown in Equation (14), where $N$ is the total number of pixels, and $f(\cdot)$ represents the function that returns the upper even number.
(2) The adjustment of superpixels
First, a block adjustment was performed on the square area formed by adjacent pixels centred at the superpixels, which sped up the segmentation process. Then, the superpixels were adjusted pixel by pixel, and the pixels on the adjusted superpixel edge were moved to neighboring superpixels. Each superpixel was guaranteed to remain unchanged after the modification.
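The pixel-level adjustment can be illustrated with a deliberately simplified one-dimensional toy, which is not the two-dimensional block-level and pixel-level procedure of the paper. Boundary pixels are tentatively moved to the neighboring superpixel, and a move is kept only if an energy increases. The energy used here (sum of squared normalized histograms) and all names are assumptions for illustration.

```python
import numpy as np

def energy(q, lab, n_bins):
    """Sum over superpixels of the squared normalized color histogram."""
    total = 0.0
    for k in np.unique(lab):
        members = q[lab == k]
        hist = np.bincount(members, minlength=n_bins) / members.size
        total += float((hist ** 2).sum())
    return total

def hill_climb_1d(q, lab, n_bins, n_iter=10):
    """Greedy boundary refinement on a 1-D row of quantized pixels."""
    q = np.asarray(q)
    lab = np.array(lab)                        # work on a copy
    best = energy(q, lab, n_bins)
    for _ in range(n_iter):
        improved = False
        for i in range(len(lab)):
            for j in (i - 1, i + 1):
                if not (0 <= j < len(lab)) or lab[j] == lab[i]:
                    continue                   # not a boundary with this neighbor
                if np.count_nonzero(lab == lab[i]) == 1:
                    continue                   # never empty a superpixel
                old = lab[i]
                lab[i] = lab[j]                # tentative move across the boundary
                e = energy(q, lab, n_bins)
                if e > best:
                    best, improved = e, True   # keep the move
                    break
                lab[i] = old                   # revert: no improvement
        if not improved:
            break                              # local optimum reached
    return lab, best
```

For example, with pixel colors `[0, 0, 0, 1, 1, 1]` and an initial labeling `[0, 0, 1, 1, 1, 1]`, the misplaced third pixel is moved to the first superpixel and the search then stops, since no further single move raises the energy. Like any hill climbing scheme, the result is a local optimum that depends on the initialization.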
Through the above two steps, the final segmentation results were obtained. The overall segmentation process of HSIs is shown in Figure 4. The spectral bands were preliminarily selected based on entropy and CMF, and then $3n$ spectral bands were selected to synthesize false color images. Next, the false color images were segmented based on hill climbing optimization, in which $E(s)$ was used to evaluate the segmentation results. Subsequently, both block-level and pixel-level adjustments were performed for the pre-set number of iterations, $N$. Finally, the segmentation result was obtained and output. Hill climbing is a local optimization method; a global optimization method might be added in future work, since it might further improve the results. One reference point is the method proposed by Masra et al., which combines conventional HE technology with the particle swarm optimization (PSO) algorithm to naturally enhance distorted images [29].

Experiments and Analyses
Two series of experiments were designed. In the first series of experiments, the best value of $n$ (number of selected bands) was analyzed, and the impact of the window size $m$ and threshold $\sigma$ on the segmentation performance of the proposed IM-CHD algorithm was investigated. The results can provide references for the parameter selection. In the second series of experiments, our algorithm was compared with the existing superpixel segmentation algorithms for hyperspectral images. Two datasets with different characteristics, i.e., the Indian Pines dataset (Indian Pines) and the Pavia Center dataset (PaviaC), were selected. Indian Pines was acquired on 12 June 1992 at the Purdue University Agricultural College in northwestern West Lafayette with an airborne visible/infrared imaging spectrometer (AVIRIS) sensor system, and contains 145 × 145 pixels. After removing the noise, the image contains 200 spectral bands with a resolution of 20 m. There are 16 labeled real objects in the image [30], as shown in Figure 5. PaviaC was obtained with the Reflective Optics System Imaging Spectrometer (ROSIS) hyperspectral sensor system in the city center near Pavia University in Pavia, northern Italy. After removing the noise, the data contain 1096 × 175 pixels. The image has a resolution of 1.3 m and a total of 102 spectral bands. This image contains 9 types of objects [31], as shown in Figure 6. In these experiments, under-segmentation error (UE), achievable segmentation accuracy (ASA), and boundary recovery (BR) were used to evaluate the performance of the proposed algorithm [32]. UE measures the fraction of pixel leakage across the ground-truth boundaries. A smaller value of UE indicates better performance. In the ideal situation, the segmentation is entirely correct, so UE is 0. BR estimates the percentage of true natural boundaries recovered by the superpixel boundaries. In the ideal situation, the superpixel boundaries are the same as the real boundaries and BR is 1. A larger value of BR indicates better segmentation performance. ASA measures the upper bound of the achievable performance, i.e., the highest segmentation precision attainable when superpixels are used as units. A larger ASA indicates better performance. In the ideal situation, ASA is 1.
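The three indicators can be computed from a superpixel label map and a ground-truth label map as sketched below. Several variants of these metrics exist in the literature, and the definitions in [32] may differ in detail; the formulas here (including the boundary tolerance `tol`) are common choices adopted as assumptions, and the function names are hypothetical. Labels are assumed to be non-negative integers.

```python
import numpy as np

def asa(sp, gt):
    """Achievable segmentation accuracy: each superpixel takes the
    ground-truth label it overlaps most."""
    sp, gt = np.asarray(sp).ravel(), np.asarray(gt).ravel()
    hits = sum(np.bincount(gt[sp == k]).max() for k in np.unique(sp))
    return hits / sp.size

def undersegmentation_error(sp, gt):
    """Pixel leakage: total size of the superpixels touching each
    ground-truth segment, minus the image size, normalized."""
    sp, gt = np.asarray(sp).ravel(), np.asarray(gt).ravel()
    sizes = np.bincount(sp)
    covered = sum(sizes[np.unique(sp[gt == g])].sum() for g in np.unique(gt))
    return (covered - sp.size) / sp.size

def boundary_map(lab):
    """True where a pixel differs from its right or lower neighbor."""
    lab = np.asarray(lab)
    b = np.zeros(lab.shape, dtype=bool)
    b[:, :-1] |= lab[:, :-1] != lab[:, 1:]
    b[:-1, :] |= lab[:-1, :] != lab[1:, :]
    return b

def boundary_recall(sp, gt, tol=1):
    """Fraction of ground-truth boundary pixels that lie within `tol`
    pixels of a superpixel boundary."""
    gt_b, sp_b = boundary_map(gt), boundary_map(sp)
    h, w = sp_b.shape
    padded = np.pad(sp_b, tol, mode="constant")
    near = np.zeros_like(sp_b)
    for dy in range(2 * tol + 1):              # dilate by OR-ing shifted copies
        for dx in range(2 * tol + 1):
            near |= padded[dy:dy + h, dx:dx + w]
    n_gt = gt_b.sum()
    return float((gt_b & near).sum() / n_gt) if n_gt else 1.0
```

When the superpixel map equals the ground truth, these functions return ASA = 1, UE = 0, and BR = 1, matching the ideal values stated above; a single superpixel covering a two-class image gives the opposite extreme for UE and BR.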
In this paper, the generation of false color images was implemented in MATLAB 2014a, and the superpixel segmentation of hyperspectral images was implemented in Visual Studio 2008 with the extended OpenCV library. The specific experimental design is described as follows.

Parameters Selection Experiments
The IM-CHD algorithm has three key parameters, i.e., the number of selected spectral bands $n$, the window size $m$, and the threshold $\sigma$. These three parameters need to be adjusted manually. They decide the exclusion of spectral bands and have a great influence on the dimensionality of HSIs.

Number of Selected Bands
First of all, the number of selected bands was set to $n$ = 1, 2, 3, 4, 5, 8, 10. When $n > 10$, more than 30 spectral bands would be selected, which makes dimensionality reduction meaningless. The window size was fixed for this experiment. In the resulting plots, the red line is at the highest point and the value of ASA is the largest. Meanwhile, the red line is more stable than the orange line. Therefore, when $n = 1$, the best segmentation results can be obtained. From the results, when $n = 1$, UE was relatively low, and BR and ASA were relatively high. At $n = 1$, the segmentation results were more satisfactory than the results at other values of $n$. When $n = 1$, the three least correlated spectral bands were selected, and they contained the largest amount of information and minimum redundancy. Therefore, the performance was optimal when the value of the parameter $n$ was 1.

Threshold Parameter σ
Based on information measures, the threshold parameter $\sigma$ has a great influence on the dimensionality of HSIs. In the experiment, three bands ($n = 1$) were selected and the window size was set to $m = 11$. Using the Indian Pines dataset, the influences of the threshold $\sigma$ on UE, BR, and ASA are shown in Table 1. At $\sigma = 0.05$, BR and ASA were simultaneously maximal; even though UE was slightly high, the segmentation performance at $\sigma = 0.05$ can still be considered the most satisfactory. In the following experiments, the value of $\sigma$ was set to 0.05.

Window Size m
The window size $m$ has a great influence on the dimensionality of HSIs based on information measures. In the experiment, three bands ($n = 1$) were selected and the threshold was set to $\sigma = 0.1$. Using the Indian Pines dataset, the effects of different window sizes $m$ on UE, BR, and ASA are shown in Table 2. As shown in Figure 1, in the Indian Pines image, the curvature of the entropy curve changed steeply, thus selecting a large window size $m$ and a moderate $\sigma$ was more suitable. In addition, the window size at which the segmentation results were optimal can be identified from Table 2. Under this condition, both BR and ASA were high; even though UE was a little high, the segmentation performance was still considered to be satisfactory. Through the above two parameter selection experiments, the most suitable parameter values for the comparison experiments were determined.

Comparison Experiments
In this section, the performance of the proposed IM-CHD algorithm was compared with similar existing algorithms. First, IM-CHD was compared with the conventional image color visualization method [21] based on information measures (CV-CHD). Then, the proposed IM-CHD was compared with six other combined algorithms. Specifically, the conventional first principal component (FPC) extraction algorithm [33] and the principal component (PC) weighted false color composition (FCC) algorithm [19] were each combined with the conventional SLIC algorithm [16], the new LSC algorithm [34], and the CHD superpixel segmentation algorithm [19].

Experiment Using Indian Pines Dataset
First, the comparison experiments were performed on the Indian Pines dataset. The obtained experimental results are shown in Figure 8 and Table 3. In the experiment, the results were averaged over ten runs. Figure 8a shows the UE results of the eight algorithms. A smaller UE value indicated a smaller error and better segmentation performance. From the figure, the UE values of CV-CHD and IM-CHD are clearly lower than those of the other six algorithms. In detail, from Table 3, compared with FCC-CHD, the UE of IM-CHD was reduced by more than 15%; compared with CV-CHD, the UE of IM-CHD was reduced by more than 8%. Figure 8b shows the BR results of the eight algorithms. A larger BR value indicated that the boundary obtained by superpixel segmentation was closer to the natural boundary. From the results, CV-CHD and IM-CHD were superior to FPC-CHD and FCC-CHD. In addition, all four of these algorithms (CV-CHD, IM-CHD, FPC-CHD, and FCC-CHD) were much better than the other algorithms. From Table 3, compared with FCC-CHD, the missed boundaries in the IM-CHD algorithm were reduced by more than 5%. Compared with CV-CHD, the missed boundaries were reduced by 8%. Figure 8c shows the ASA results of the eight algorithms. A larger ASA value indicated higher accuracy of the segmentation result. From the figure, the ASA values obtained using different methods were relatively stable. Based on ASA, CV-CHD, IM-CHD, FCC-CHD, and FPC-CHD had relatively better performance. The segmentation accuracy of CV-CHD was similar to that of FCC-CHD. The ASA value of IM-CHD was slightly lower, but compared with the maximum ASA value, obtained by the FCC-CHD method, it was reduced by less than 2%.

By comprehensively considering the values of the three evaluation indicators, it can be concluded that the segmentation results obtained by the IM-CHD method are the best. Figure 9 shows the segmentation results of the Indian Pines dataset under different segmentation methods. Under the same number of iterations, the number of superpixels obtained by different algorithms was different. A larger number of obtained superpixels indicated that the segmentation algorithm was more efficient. From the results, the IM-CHD method can better capture the boundary information and can allocate more superpixels to represent complex areas.
In the three methods FPC-SLIC, FPC-LSC, and FPC-CHD, the images were segmented according to the first principal component map of the HSIs. Therefore, the colors obtained using these three methods were slightly different from those of the other five methods. From Figure 9, the superpixel results using different segmentation methods were slightly different. The CV-CHD and IM-CHD methods can better segment the real boundaries of ground objects, such as the areas within the red circle. In both algorithms, more superpixel blocks were assigned to represent the regions with complex colors, while fewer superpixels were used for the regions with simple colors. Therefore, more precise results can be obtained using both CV-CHD and IM-CHD.
Figure 10 and Table 4 show the experimental results on the PaviaC dataset. Figure 10 shows the best results generated by each method. Table 4 shows the detailed experimental results. Figure 10a shows the UE results of the eight algorithms. The UE histogram bars of IM-CHD and CV-CHD were lower than those of the other six algorithms. From Table 4, the UE value obtained using IM-CHD was the same as that obtained using FCC-CHD and FPC-CHD. Therefore, the IM-CHD method had the second best UE performance (the CV-CHD method resulted in the lowest UE value). Figure 10b shows the BR results of the eight algorithms. FCC-CHD, FPC-CHD, CV-CHD, and IM-CHD were superior to FCC-LSC and FPC-LSC, while these six algorithms were much better than FCC-SLIC and FPC-SLIC. From Table 4, among all eight methods, the IM-CHD method provided the second largest BR value. Figure 10c shows the ASA results of the eight algorithms. IM-CHD was superior to all the other seven methods, i.e., FCC-CHD, FPC-CHD, FCC-LSC, FPC-LSC, FCC-SLIC, FPC-SLIC, and CV-CHD. In Table 4, compared with CV-CHD, the ASA value obtained by IM-CHD was increased by 0.2%. Therefore, based on ASA, IM-CHD exhibited the best performance.

Among all the segmentation methods of this experiment, according to the comprehensive consideration of the three evaluation indicators, our proposed IM-CHD method had the best performance and the highest classification accuracy. Figure 11 shows the example segmentation results on PaviaC dataset. Due to the same reason as in Figure 9, the segmentation result pictures using different methods had different colors. From Figure 11, using IM-CHD, the density of the obtained superpixel blocks is more reasonable, which is consistent with the observations from Figure 9.
By comprehensively analyzing the performance of eight algorithms on Indian Pines and PaviaC datasets, it can be concluded that the segmentation results obtained by IM-CHD are more reasonable and accurate. The experiment results proved the effectiveness of the proposed method.

Conclusions
In this paper, first, the superpixel segmentation algorithm based on information measures with color histogram driving was introduced. The spectral bands were selected based on entropy and CMF. In these selected bands, the redundant information was removed while the significant information was retained for further superpixel segmentation. Then, the mutual information was used to select the final three groups of bands, i.e., 3n bands, to composite false color images. The smaller value of the mutual information indicated the less redundancy between the spectral bands and the greater amount of contained information. In our study, the segmentation results were obtained using the hill climbing optimization algorithm. In addition, parameter selection experiments and comparative experiments were designed and implemented in this study. Under the optimal parameter, the proposed IM-CHD was compared with other similar existing methods. The experimental results demonstrated the effectiveness and superiority of the proposed IM-CHD method. Therefore, the proposed method achieved the purpose of efficient preprocessing of hyperspectral images.
Previously, we proved the validity of obtaining the main spectra of HSIs through information measure. In the future, we will study the application of superpixel segmentation based on information measurement in the classification of hyperspectral images. That is, the IM-CHD method proposed in this paper is used to preprocess HSIs and classify the processed HSIs. The classification results are compared with the previously proposed HSIs classification method, and it is discussed whether IM-CHD method can improve the classification accuracy of HSIs.