Unsupervised Clustering for Hyperspectral Images

Abstract: Hyperspectral images are becoming a valuable tool widely used in agriculture, mineralogy, and other fields. The challenge is to successfully classify the materials found in the field that are relevant for different applications. Due to the large amount of data corresponding to a big number of spectral bands, classification programs require a long time to analyze and classify the data. Our purpose is to find a better method for reducing the classification time. We apply various algorithms to real hyperspectral data sets to find out which algorithm is more effective. This paper presents a comparison of unsupervised hyperspectral image classification methods such as K-means, hierarchical clustering, and Parafac decomposition, which allows model reduction and feature extraction. The results showed that the method most useful for big data is the classification of the data after Parafac decomposition.


Introduction
Hyperspectral images are a challenging area due to the unfavorable resolution of the images, high dimensionality, and insufficient ground-truth data during training. Hyperspectral systems acquire images in a hundred or more contiguous spectral bands [see Figure 1]. At different wavelengths, materials reflect and absorb differently; if the detection system has sufficient spectral and spatial resolution, the materials can be identified from their spectral signatures [1]. Unsupervised classification discovers the information on its own, leaving the number of classes to be chosen by the user.
Hyperspectral image classification is complex and difficult because of image noise, complex backgrounds, varied characteristics [2], and high dimensionality [3]. Each pixel is assigned to a class, leaving the user to find the meaning of each class. Also, existing classifiers do not always offer a better classification for all types of data, which is why studying the classification of hyperspectral images can provide general information about classification accuracy. Hyperspectral images are an important tool for identifying spectrally unique materials, as they provide sufficient information for classifying data from different areas. Furthermore, hyperspectral images can be used to capture the different phases of the eruptive activity of volcanoes, to obtain a detailed chronology of lava flow emplacement and radiative power estimation [4]. In Reference [6], unsupervised algorithms (K-means and ISODATA) were applied to hyperspectral images and, although both gave accurate results, K-means performed better. Another approach to hyperspectral classification is described in Reference [7], where the authors used Maximum Likelihood (ML), Spectral Angle Mapper (SAM), and Support Vector Machine (SVM), with SVM proving more effective than the others.
Studies such as that in Reference [8] proposed a hierarchical SVM which achieved an acceptable accuracy value. Reference [9], which experimented with supervised and unsupervised classification methods, concluded that the best result was obtained by the hierarchical tree. Reference [10] presented an approach using principal component analysis (PCA), K-means, and Multi-Class Support Vector Machine (M-SVM), which achieved accuracy values over 77%; this confirms that hyperspectral image classification is a challenging task.
Other approaches were developed to realize the full potential of hyperspectral images: Reference [11] proposed a compression-based non-negative tensor canonical polyadic (CP) decomposition to analyze hyperspectral images, and Reference [12] proposed extraction and classification algorithms based on orthogonal or non-negative tensor decomposition and higher-order discriminant analysis.
One approach to analyzing and processing hyperspectral images is the Cellular Nonlinear Network (CNN), which is mostly used in recognition problems to improve the run-time. The authors of Reference [13] proved that the CNN is an appropriate tool for discovering recurrent patterns and for creating a link between circuits, art, and neuroscience. Hyperspectral analysis is not an easy task due to the high dimensionality and the spatial and spectral resolution of hyperspectral images.

Results of Unsupervised Clustering
In this section, we explore the efficacy of K-means, hierarchical clustering, and classification after tensor decomposition. The ground truths available for the studied data sets were provided by direct observation. To obtain the results, we used Python on a machine running Windows 10 with 8 GB of memory. The data sets used in our study were taken from the Grupo de Inteligencia Computational [14]. The data set of the Danube Delta was acquired from the U.S. Geological Survey [15]. The tensor dimensions of the data sets were: 103 × 101 × 204 (Set 1), 401 × 146 × 204 (Set 2), 181 × 111 × 103 (Set 3), 191 × 151 × 105 (Set 4), and 181 × 151 × 53 (Set 5). The geometric resolutions of our images were 3.7 m for Sets 1 and 2, 1.3 m for Sets 3 and 4, and 30 m for Set 5.
We prepared the analysis by building the matricized tensor, constructed by reordering the tensor elements into a matrix. To compare the final results, we chose to classify the training data, building a matrix that contained only the spectral bands whose class we knew. We also classified the entire data set, including the spectral bands whose class we did not know. Before applying the classifiers, the data were normalized to values between 0 and 1.
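As an illustration, the matricization and normalization steps can be sketched in NumPy as follows; the tensor shape and data below are illustrative assumptions, not one of the paper's data sets:

```python
import numpy as np

def matricize_and_normalize(tensor):
    """Reorder a (rows x cols x bands) cube into a (pixels x bands) matrix
    and min-max normalize the values into [0, 1]."""
    n_rows, n_cols, n_bands = tensor.shape
    # Each pixel becomes one row of the matrix.
    matrix = tensor.reshape(n_rows * n_cols, n_bands).astype(float)
    # Scale every value into [0, 1].
    mn, mx = matrix.min(), matrix.max()
    return (matrix - mn) / (mx - mn)

cube = np.random.rand(10, 12, 5) * 4095.0  # synthetic cube with raw sensor-like values
X = matricize_and_normalize(cube)
print(X.shape)  # (120, 5)
```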

Hierarchical Clustering
The agglomerative hierarchical clustering technique starts by treating each data point as an individual cluster. Similar clusters then merge with other clusters iteratively until all similar data points form a single cluster [16].
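A minimal sketch of agglomerative clustering on synthetic spectra follows; the number of clusters, the linkage choice, and the data are illustrative assumptions, not the paper's configuration:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two well-separated groups of synthetic normalized "spectra".
rng = np.random.default_rng(0)
spectra = np.vstack([
    rng.normal(0.2, 0.02, size=(50, 20)),  # group 1
    rng.normal(0.8, 0.02, size=(50, 20)),  # group 2
])

# Merge points bottom-up until the requested number of clusters remains.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(spectra)
print(len(set(labels)))  # 2
```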
Table 1 shows the confusion matrix M obtained for Set 1, where the entry M_{i,j} counts the observations known to be in class i that were predicted to be in class j. As we can see, the table demonstrates a good classification [see Figure 2], except for class 4, which was classified as class 2. Figure 3 shows the output of the classification for Set 2. In Figure 4, it can be seen that the classification is quite well defined.

K-Means Clustering
The K-means algorithm classifies the data by grouping the items into k different clusters. The results are obtained by using the Euclidean distance [17].
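This grouping can be sketched as follows; the value of k and the synthetic data are illustrative assumptions, not the paper's data sets:

```python
import numpy as np
from sklearn.cluster import KMeans

# Three well-separated groups of synthetic normalized "spectra".
rng = np.random.default_rng(1)
spectra = np.vstack([
    rng.normal(0.1, 0.02, size=(60, 30)),
    rng.normal(0.5, 0.02, size=(60, 30)),
    rng.normal(0.9, 0.02, size=(60, 30)),
])

# K-means assigns each sample to the nearest centroid (Euclidean distance).
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(spectra)
print(km.labels_.shape)  # (180,)
```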
As previously, the confusion matrix is shown in Table 2. As we can see, the table demonstrates a good classification [see Figure 5], except for class 4, which was classified as class 2. Figure 6 shows the output of the classification for Set 2.

Parafac Classification
Parallel Factor Analysis (Parafac) is a method to decompose multi-way data and analyze it using unsupervised classification tools. Figure 8 shows the mean of the spectral bands for each class for the Danube Delta, Romania (Set 5). The number of Parafac components was 100, the algorithm converged after 2 iterations, and the explained variation was 98.73%. Figure 9 shows the estimated abundance maps for each class, from blue (low contribution) to yellow (high contribution). We can also observe that some classes are not distinguished very well; one reason for this could be the poor resolution of the hyperspectral image. In the second image on the fourth row, it can be easily seen that the areas are very well delimited, as confirmed by Figure 7 (left).
The pseudo-code that describes the construction of an abundance map is presented in Algorithm 1. Figure 9 shows the abundance maps for each class, constructed using the algorithm described previously. We used the same algorithm for all the data sets discussed in this paper. In Section 5, we discuss the results presented previously.
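Since Algorithm 1 itself is not reproduced here, the following is only a minimal NumPy sketch of a CP (Parafac) decomposition via alternating least squares, with a rank-1 spatial map built from factor columns playing the role of an abundance-map-like image; all names, shapes, and data are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def unfold(T, mode):
    """Matricize tensor T along the given mode (rows index that mode)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (I x R) and B (J x R) -> (I*J x R)."""
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

def cp_als(T, rank, n_iter=100, seed=0):
    """Alternating least squares for a third-order CP decomposition."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((s, rank)) for s in T.shape)
    for _ in range(n_iter):
        # Each factor is updated by least squares with the others fixed.
        A = unfold(T, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = unfold(T, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = unfold(T, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Demo on an exactly rank-2 synthetic "rows x cols x bands" tensor.
rng = np.random.default_rng(42)
A0, B0, C0 = rng.random((6, 2)), rng.random((5, 2)), rng.random((4, 2))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)

A, B, C = cp_als(T, rank=2, n_iter=200)
T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
rel_err = np.linalg.norm(T - T_hat) / np.linalg.norm(T)

# Spatial map for component 0, analogous to an abundance map: a_r b_r^T.
spatial_map_0 = np.outer(A[:, 0], B[:, 0])
print(spatial_map_0.shape, rel_err)
```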

Discussion
The results of the three classification methods are summarized in Tables 3-5. Because we started from the assumption that the ground truth is unknown, we evaluated the classification using the following measures: the Silhouette Coefficient and the Davies-Bouldin Index. We then calculated the accuracy and the confusion matrix using the ground truth. Tables 3 and 4 contain the evaluation results of the classification (in these tables, 3 marks the execution time of the classification expressed in seconds (s), 4 the accuracy using the ground truth, and 5 the number of classes from the ground truth). The good classification performance is demonstrated by accuracy values higher than 84% for both K-means and hierarchical clustering. We can see that the number of classes from the ground truth does not match the number of classes proposed by the Silhouette score for all data sets; one reason can be the resolution of the images and the quality of the ground truth. In the case of Set 5 (the Danube Delta), the ground truth is missing, so we have neither an accuracy value nor a number of classes. In Table 5, the rows correspond to the data sets and the columns to the classes. For each set, we calculated the Root Mean Squared Error (RMSE) between the mean of the spectral bands for each class from the ground truth and the mean of the spectral bands for each class from the classification result, and we highlighted the pairs with the lowest RMSE. Even though we classified the spectral bands into 7 clusters for this data set, the RMSE associates one class resulting from the Parafac decomposition with more than one class from the ground truth [see Table 5].
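The RMSE-based pairing between ground-truth classes and predicted classes can be sketched as follows; the class mean spectra below are illustrative assumptions:

```python
import numpy as np

def pair_classes_by_rmse(gt_means, pred_means):
    """RMSE between every ground-truth class mean spectrum and every
    predicted class mean spectrum; pair each predicted class with its
    closest ground-truth class."""
    # gt_means: (n_gt, n_bands); pred_means: (n_pred, n_bands)
    diffs = gt_means[:, None, :] - pred_means[None, :, :]
    rmse = np.sqrt((diffs ** 2).mean(axis=2))   # (n_gt, n_pred)
    return rmse, rmse.argmin(axis=0)            # closest gt class per predicted class

# Toy mean spectra: three ground-truth classes, two predicted classes.
gt = np.array([[0.1] * 4, [0.5] * 4, [0.9] * 4])
pred = np.array([[0.52] * 4, [0.12] * 4])
rmse, pairing = pair_classes_by_rmse(gt, pred)
print(pairing)  # [1 0]
```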
1 The execution time expressed in seconds (s); 2 the execution time expressed in microseconds (µs); 3 the execution time expressed in seconds (s) using K-means for the entire data set; 4 the execution time expressed in seconds (s) using hierarchical clustering for the entire data set.
Also, Table 5 shows the execution times for the classification of spectral bands after Parafac decomposition, K-means, and hierarchical clustering. Given that the run time after decomposition is noticeably shorter, this result is promising for big data analysis [see Tables 3 and 4].

Methods
Given the high dimensionality of hyperspectral images, dimension reduction is an important step in processing hyperspectral information. Many techniques have been tried to improve the performance of dimension reduction in order to detect, recognize, and classify materials. Next, we describe the algorithms addressed in this paper.

Parafac Decomposition
In practice, hyperspectral images are third-order tensors [see Figure 10]. Given a hyperspectral tensor T ∈ R^(N×M×P), N × M is the number of pixels and P is the number of spectral bands. A tensor T ∈ R^(I1×I2×...×IN) is a multidimensional array called an Nth-order tensor or N-way tensor, where N > 0 [18]. The CANDECOMP/Parafac decomposition of a higher-order tensor is a decomposition into a minimal number of rank-1 tensors [see Equation (1)]. For a third-order tensor T ∈ R^(I1×I2×I3), the Parafac decomposition is written as [see Figure 11] [12,19]:

T ≈ Σ_{r=1}^{R} a_r ∘ b_r ∘ c_r. (1)

In Equation (1), the vectors a_r, b_r, and c_r are the columns of the loading matrices A, B, and C, R is a positive integer, and the symbol ∘ denotes the outer product of vectors. Figure 11 shows the Parafac decomposition of a third-order tensor [18].
Spectral unmixing assumes "the decomposition of the measured spectrum of a mixed pixel into a set of pure spectral signatures (endmembers) and their corresponding abundances (fractional area coverage of each endmember present in the pixel)" [20]. Mixed pixels contain a combination of different materials. Equation (2) gives the linear mixture model for each pixel:

x = Ms + e, (2)

where x ∈ R^(L×1) is the spectrum of the pixel and L is the number of bands, M ∈ R^(L×P) is a matrix in which each column corresponds to the spectrum of an endmember, P is the number of endmembers, s ∈ R^(P×1) contains the fractional abundances of the endmembers, and e is additive noise [20]. Equation (3) gives the mixing model for N pixels:

X = MS + E, (3)

where the matrix X ∈ R^(L×N) contains all pixels of the hyperspectral data, S ∈ R^(P×N) contains the fractional abundances of all pixels on all endmembers, and E is the additive noise matrix [21]. Making a connection between the block term decomposition and the linear spectral mixture model (LSMM), the previous equation can be seen as a matrix factorization problem. According to the LSMM, each pixel is expected to be the product of a matrix and a vector: the vector contains the endmembers, and the matrix contains the abundances of the corresponding endmembers for every pixel, together with the spatial positions of the pixels [21].
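Under stated assumptions (synthetic endmember spectra and Dirichlet-distributed abundances, neither taken from the paper), the linear mixing model X = MS + E and an unconstrained least-squares abundance estimate can be sketched as:

```python
import numpy as np

L_bands, P_end, N_pix = 8, 3, 100
rng = np.random.default_rng(2)

# Endmember spectra as columns of M, and abundances that sum to 1 per pixel.
M = rng.random((L_bands, P_end))
S_true = rng.dirichlet(np.ones(P_end), size=N_pix).T   # (P_end, N_pix)

# Mix the endmembers and add a small amount of noise: X = M S + E.
X = M @ S_true + 0.001 * rng.standard_normal((L_bands, N_pix))

# Unconstrained least-squares estimate of the fractional abundances.
S_hat, *_ = np.linalg.lstsq(M, X, rcond=None)
print(S_hat.shape)  # (3, 100)
```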
The rank-(Lr, Lr, 1) block term decomposition (BTD) approximates a third-order tensor as a sum of R terms, each term being the outer product of a rank-Lr matrix and a nonzero vector [see Figure 12]. Equation (4) gives the rank-(Lr, Lr, 1) BTD of a tensor T [22]:

T = Σ_{r=1}^{R} (A_r B_r^T) ∘ c_r, (4)

where A_r ∈ R^(I1×Lr) and B_r ∈ R^(I2×Lr) have rank Lr, E_r = A_r B_r^T is called the abundance map, and c_r ∈ R^(I3) is nonzero. Spectral unmixing makes it possible to decompose a hyperspectral image and obtain the corresponding abundance maps [23].

K-means and Hierarchical Clustering
Hierarchical clustering and K-means are unsupervised classifiers that look for similarities among the data, and both use the same approaches to decide the number of clusters [24].
To evaluate the performance of the classification, we used the Davies-Bouldin Index, whose value signifies the average similarity between clusters. This similarity compares the distance between clusters with the size of the clusters themselves; a value closer to 0 indicates a better partition. We also used the Silhouette Coefficient, where a higher score indicates better-defined clusters [25]. Using the ground truth, we could calculate the contingency matrix and the accuracy value [25].
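Both internal measures are available in scikit-learn; the following minimal sketch evaluates a K-means partition of synthetic data (the data and k are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

# Two well-separated synthetic groups, so both scores should be favorable.
rng = np.random.default_rng(3)
X = np.vstack([
    rng.normal(0.2, 0.03, size=(80, 10)),
    rng.normal(0.8, 0.03, size=(80, 10)),
])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

sil = silhouette_score(X, labels)        # higher is better
dbi = davies_bouldin_score(X, labels)    # closer to 0 is better
print(sil, dbi)
```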

Conclusions
This study has addressed the problem of the classification of hyperspectral images using K-means, Parafac decomposition, and hierarchical clustering, and provides a comparative analysis of these three classification methods. The run-time is the total time from when an algorithm starts to when it finishes [26]. The execution time depends on many factors, the most important being the memory used, the percentage of CPU utilized by the algorithm, and the dimensions of the data set. Considering our results, Parafac decomposition is very useful for big data, because it is important to reduce the execution time and to use memory efficiently. According to the results in Section 3, for small data sets, K-means or hierarchical classification can be used. For big data, Parafac decomposition can be used because it reduces the data, makes the data easier to interpret, and considerably reduces the execution time. Additionally, the abundance maps are very well defined, considering that the fraction of each class within a pixel is estimated by an abundance map. In other words, an abundance map could be seen as ground truth, noting that possible human errors in class assignment are avoided. Regarding K-means and hierarchical clustering, we can see that they offered very good accuracy values but, at the same time, we must take into account that, for big data, the execution time will grow considerably.
Comparing supervised and unsupervised classification, supervised classification yielded better results, but it requires ground truth [27]. However, the results of unsupervised classification are not far from those of supervised classification. The weaker results of the classification after Parafac decomposition can be caused by the spatial and spectral resolution. Finally, the studied algorithms have been shown to be effective for our problems and are very useful in applications such as model reduction, pattern recognition, and classification.