Unsupervised Identification of Targeted Spectra Applying Rank 1-NMF and FCC Algorithms in Long-Wave Hyperspectral Infrared Imagery

Abstract: Clustering methods unequivocally show considerable influence on many recent algorithms and play an important role in hyperspectral data analysis. Here, we challenge the clustering for mineral identification using two different strategies in hyperspectral long-wave infrared (LWIR, 7.7–11.8 µm). For that, we compare two algorithms that perform the mineral identification in a unique dataset. The first algorithm uses spectral comparison techniques for all the pixel-spectra and creates RGB false color composites (FCC). Then, a color-based clustering is used to group the regions (called FCC-clustering). The second algorithm clusters all the pixel-spectra to directly group the spectra. Then, the first rank of non-negative matrix factorization (NMF) extracts the representative of each cluster and compares results with the spectral library of JPL/NASA. These techniques give the comparison values as features, which are converted into RGB-FCC as the results (called clustering-rank 1 NMF). We applied K-means as the clustering approach, which can be replaced by any other similar clustering approach. The results of the clustering-rank 1 NMF algorithm indicate significant computational efficiency (more than 20 times faster than the previous approach) and promising performance for mineral identification, with up to 75.8% and 84.8% average accuracies for the FCC-clustering and clustering-rank 1 NMF algorithms (using spectral angle mapper (SAM)), respectively. Furthermore, several spectral comparison techniques are also used for both algorithms, such as adaptive matched subspace detector (AMSD), orthogonal subspace projection (OSP) algorithm, principal component analysis (PCA) local matched filter (PLMF), SAM, and normalized cross correlation (NCC), and most of them show a similar range in accuracy. However, SAM and NCC are preferred due to their computational simplicity.
Our algorithms strive to identify eleven different mineral grains (biotite, diopside, epidote, goethite, kyanite, scheelite, smithsonite, tourmaline, pyrope, olivine, and quartz).


Introduction
Hyperspectral infrared imagery provides the spectral and spatial information from the material's surface and has many applications in different fields including in geology [1][2][3][4][5]. The proposed approach challenges clustering strategies for the purpose of mineral identification in a unique hyperspectral infrared imagery dataset, which increases the novelty of this research.
For several decades [6], spectral analysis technology has attracted considerable interest for airborne systems [6][7][8][9][10][11][12], portable instruments, and core logging [1,2]. The need for an automatic system to analyze hyperspectral imagery led to many investigations in the field of data mining (e.g., the Spectral Image Processing System (SIPS) [13], an expert system for mineral mapping [14], and many other similar examples [15][16][17][18][19][20]). Classification methods (e.g., support vector machines (SVM) [21][22][23][24][25][26][27] and neural networks [28][29][30]) are supervised learning techniques that involve manually annotated examples in a training step. Supervised learning depends heavily on the quality of the training set (number of training examples, labeling of samples [27,28][31][32][33][34][35][36]) and requires the tedious task of human data annotation. On the other hand, clustering approaches do not need training to ultimately label samples, using techniques such as K-means clustering [23,31][37][38][39][40], Fuzzy C-means [41][42][43][44][45], and other types of clustering [21,33,46]. However, the choice of clustering scheme matters in terms of computation requirements and similarity measurements. Here, the strategy for using clustering is investigated for ground-based spectra (laboratory conditions) to identify minerals for portable instrument applications. We present in this paper a brief overview of the application of hyperspectral imagery to portable instruments with applications in geology. One very good example of using hyperspectral thermal infrared (TIR) for core logging was presented by Kruse in 1994: the Portable Infrared Mineral Analyzer II (PIMA II), which operated as a short-wave infrared spectrometer (SWIR, 1.3–2.5 µm). It applied a Spectral Angle Mapper (SAM) to split drill core at 1 cm intervals in both the cross-core and along-core directions [1].
Two other comparable approaches were presented by Yajima et al. for POSAM (Portable Spectroradiometer for Mineral identification), which was developed by the Metal Mining Agency of Japan (the former organization of JOGMEC [2]) in 1993 [3,4], and by Coulter et al., who reviewed the airborne hyperspectral system from visible infrared spectroscopy [5]. Hecker et al. (2008) analyzed the influence of reference spectra on the classification of minerals (e.g., kaolinite) using SAM. They used synthetic images of three mineral endmembers and classified them with reference libraries derived from ground spectra (portable infrared analyzer), the United States Geological Survey, and airborne imagery, which increased the classification accuracy. Moreover, Hecker et al. (2008) used preprocessing methods to suppress the influence of different referencing sources, such as two types of continuum removal (hull subtraction and hull quotient) and a combination thereof [47]. The classification in this approach was efficient, but it was a matching process between the targeted and reference spectra rather than one relying on any type of clustering or machine learning. Its similarity to the proposed approach lies in the ground-based spectra and the matching algorithm used in the method. Continuous Wavelet Analysis (CWA) is one of the feature extraction algorithms that is known to increase the processing time during the identification process in spectral imagery. Bruce & Li (2001) applied wavelet analysis to hyperspectral data [48], followed by Rivard et al. (2008), who used it to create a better representation of spectral libraries and to minimize the viewing and illumination measurement disparities [49]. Moreover, the estimation of oil sands was done by applying Gaussian singlets and derivative of Gaussian wavelets [50].
CWA was also applied to hyperspectral long-wave infrared (3-14 µm) imagery of rock-encrusting lichens, acquired with the airborne SEBASS (spatially enhanced broadband array spectrograph system, Aerospace Corp.), based on finding the peaks of maximum reflectance in the minerals' spectra [51]. These approaches increased the level of processing, but they do not turn the process into an automatic identification system. Another research effort on portable instruments, operating in the shortwave infrared (SWIR) and visible near-infrared (VNIR) wavelengths, was proposed for spectropolarimetric imaging. This system was based on acousto-optic tunable filter (AOTF) technology for desert soil analysis (Gupta (2014)). The wavelength bands covered 450-800 nm and 1000-1600 nm and enabled a tuned optical wavelength and Radio Frequency (RF) (for the piezoelectric transducer of the AOTFs) along with the spectral band [52].
Unsupervised classification (clustering) methods are used in the hyperspectral field due to their simplicity and because they avoid the labeling step of the learning procedure that supervised learning approaches often require. Moreover, clustering offers the ability to group the spectra into an adjustable initial number of classes (a property shared with supervised learning approaches). This provides better performance than spectrally dependent approaches, such as those relying on specific wavelength bands (e.g., [14]).
Clustering was employed for various purposes such as improving matched filter [22,37], mixed agriculture and forestry application [53], anomaly finding in target detection [54], endmember identification [55], and urban area [56] for Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) data [21,24,38,42,43,57,58]. A semi-supervised band clustering on AVIRIS Indian pines [31] with Non-Negative Least Squares (NNLS) was used for endmember estimation in Hyperion and AVIRIS data [59], which was a supervised classification approach (a combination of Hyperion and Landsat for leaf area index estimation in SWIR, and NIR is presented in [60]). A color based clustering was used for mapping Kaolinite and was presented by Tyo et al. (2003) [61]. A Gustafson-Kessel clustering and fuzzy clustering [41] with Multi-Objective Particle Swarm Optimization (MOPSO) framework was used for AVIRIS and ROSIS sensor data [39]. Clustering Signal Subspace Projection (CSSP) and Maximum Correlation Band Clustering (MCBC) were employed based on PCA for AVIRIS data [46]. A Neighborhood Homogeneity Index (NHI) for spectral-spatial clustering [62] and Spectral Angle Mapper (SAM) based clustering were used employing k-means, CLUES, and SVM analysis for AVIRIS [23]. The proposed approach, unlike previously proposed hybrid methods, deploys comparison clustering analysis by changing the hierarchy of grouping in the mineral identification algorithm and strives to increase this efficacy.
Spectral comparison techniques such as SAM [1,47,63] or the matched filter (for target detection applications [22,37,64]) can be combined with machine learning approaches [22,23,37,64] to increase their performance. Another way is to combine them with other preprocessing approaches such as Continuous Wavelet Analysis (CWA) [48][49][50][51][65,66] in order to increase the performance and efficiency of decision-making processes (deployed in supervised or unsupervised ways [22,23,64,65]). A combination of Dynamic Self-Organizing Maps (DSOM) and the Fuzzy ART algorithm has been presented for sonar image segmentation [33] in ocean research, and mineral mapping has been performed based on a blind spectral unmixing method and on sparse component analysis (BSUSCA) [67]. These approaches are considered more advanced in terms of data mining analysis.
An application of RGB false color composites (FCC), with a wavelet transform used for noise and continuum separation and combined with PCA, was employed for natural oil seepage identification, based on the effects that the concentration of oil in the soil causes on the spectral signatures of vegetation [65] (see also the eigenvector application for anomaly detection [68] and unmixing in Single-Pixel Independent Component Analysis (SP-ICA) [69]). This last approach is similar to ours in its use of RGB-FCC in the mapping.
Here, we comparatively propose two algorithms to perform the mineral identification. The first algorithm first applies spectral comparison techniques to all the pixel-spectra and creates RGB false color composites (FCC). Then, a color-based clustering is used to group the regions. The second algorithm clusters all the pixel-spectra to directly group the spectra. Then, the first rank of Non-negative Matrix Factorization (NMF) extracts the representative of each cluster and compares results with the spectral library of JPL/NASA. We compare these two clustering strategies to identify minerals in hyperspectral imagery. This is the first time such a comparison is performed to challenge the clustering strategy and to provide a road map for such analyses. Besides that, the hyperspectral dataset is newly obtained for this specific purpose and is unique (in LWIR). Moreover, the methods have considerable flexibility and can adopt any other form of clustering approach. The main contributions of this paper are summarized in the following points:
(1) Challenging clustering strategies by comparative analysis of two algorithms: RGB-FCC with color-based clustering versus rank-one NMF based clustering.
(2) Creating a unique dataset by applying hyperspectral infrared imagery in laboratory conditions, using the spectra obtained from a portable instrument with an adjusted lens, and comparing these spectra with another dataset (NASA/JPL) as validation.
(3) Providing a brief review of similar methods and comparing them with the proposed strategies.
(4) Creating another acquisition in a different modality, Micro-XRF, to verify the ground truth and our quantitative assessments.
Figures 1 and 2 depict the block diagram of each algorithm to illustrate the comparison between the two. Table 1 compares the proposed approach with the state-of-the-art methods.
The rest of this paper is organized as follows: the next section (Section 2) briefly describes the methodology, including the different spectral comparison approaches, and presents the two algorithms. The experimental and computational results and the discussion are presented afterward in Sections 3 and 4, respectively. Finally, the conclusions (Section 5) state that automatic mineral identification in LWIR through the Clustering-rank 1 NMF algorithm has lower computational complexity and considerably better accuracy than the FCC-Clustering algorithm, which fulfills the objectives of this research.

Contributions versus Prevalent State-of-the-Art Approaches
| Approach | Topic of the Approach | Comparison to the Proposed Research |
|---|---|---|
| Kruse (1996) | Identification and mapping of minerals | PIMA II with limited absorption band-depth mapping and spectral classification. |
| Yajima (2004) | Mineral mapping using the POSAM method | Spectral correction, normalization (spectral enhancement), and Hull (baseline correction). |
| Zhang et al. (2014) | Subpixel target detection metric learning | Supervised metric learning approach with labeling. |
|  | Spectral Image Processing System (SIPS) | SAM without any machine learning technique. |
|  | Expert system-based mineral mapping | Application-specific band false color mapping. |
| Gillespie et al. (1986) | Color-based correlation analysis | FCC-PCA, which is relatively sensitive to outliers and noise. |

Methods
The methodology is summarized by comparing two algorithms for the identification of minerals. Both involve spectral comparison techniques and clustering approaches, and both compare the spectra of the targeted minerals to the ASTER-JPL NASA spectral library (as reference spectra). The two algorithms differ in the stage at which clustering is applied (Figure 2). Here, a brief summary of the spectral techniques used for both algorithms is provided.
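As a minimal illustration of what a spectral comparison against a reference spectrum computes, the sketch below implements two of the measures named in this paper, SAM and NCC, in their textbook forms. The function names and the 5-band spectra are hypothetical and chosen only for illustration; they are not measured data and not the toolbox used in this work.

```python
import numpy as np

def spectral_angle(x, ref):
    """Spectral angle (radians) between a pixel spectrum and a reference.

    0 means identical spectral shape, regardless of overall brightness.
    """
    x = np.asarray(x, dtype=float)
    ref = np.asarray(ref, dtype=float)
    cos = np.dot(x, ref) / (np.linalg.norm(x) * np.linalg.norm(ref))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def normalized_cross_correlation(x, ref):
    """Zero-mean normalized cross correlation in [-1, 1]; 1 means identical shape."""
    x = np.asarray(x, dtype=float)
    ref = np.asarray(ref, dtype=float)
    xc = x - x.mean()
    rc = ref - ref.mean()
    return float(np.dot(xc, rc) / (np.linalg.norm(xc) * np.linalg.norm(rc)))

# Hypothetical 5-band spectra (illustrative values only).
reference = np.array([0.2, 0.4, 0.9, 0.5, 0.3])
pixel = 2.0 * reference                      # same shape, different brightness
other = np.array([0.9, 0.7, 0.2, 0.4, 0.8])  # clearly different shape

angle_same = spectral_angle(pixel, reference)
angle_other = spectral_angle(other, reference)
ncc_same = normalized_cross_correlation(pixel, reference)
```

Both measures ignore a pure brightness scaling of the spectrum, which is one reason they are cheap and convenient for matching against a library.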

Matched Filter
Matched Filter (MF) is a technique used for spectral mapping between the targeted and reference spectra. Particularly, MF applies the maximization of the targeted spectrum responses that theoretically suppress the cluttered background [70]. One feature of the MF is that it normalizes every component in the space of principal component analysis (PCA), applying the maximum between the global and local eigenvalue to the pixels. The

Orthogonal Subspace Projection (OSP) Algorithm
Orthogonal subspace projection (OSP) first designs an orthogonal subspace projector to eliminate the non-target response, and then applies a matched filter to match the desired target from the data. OSP is a method that applies a structured subspace model to describe spectral variability [72,73]. The original OSP is described as $T_{\mathrm{OSP}}(x) = d^{T} P_{U}^{\perp} x$. This form of OSP is appropriate for classification, but is not suitable for spectral unmixing and abundance map estimation. Thus, another form of OSP has been introduced, which is a scaled version of the OSP classifier and can estimate an abundance map. Let $P_{U}^{\perp} = I_{L \times L} - U U^{*}$ be the orthogonal subspace projector, $x$ the pixel spectrum, and $d$ the target spectrum. $U$ is the matrix of non-target spectra, $U^{*} = (U^{T} U)^{-1} U^{T}$ is the pseudo-inverse of $U$, and $L$ is the number of bands. OSP needs the spectra matrix of the non-targeted area (a region in the image), and the non-target spectral signatures can be extracted directly, as endmembers, from the hyperspectral image [72,73].
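The OSP expression above can be sketched in a few lines of NumPy. The detector follows the formula given in the text; the scaled variant divides by $d^{T} P_{U}^{\perp} d$, which is the usual normalization in the OSP literature and is an assumption here, since the text does not state the scaling. The matrices and the function names are hypothetical toy values for illustration.

```python
import numpy as np

def osp_projector(U):
    """P = I - U U*, with U* the pseudo-inverse of the non-target matrix U (L x p)."""
    L = U.shape[0]
    # np.linalg.pinv(U) equals (U^T U)^{-1} U^T when U has full column rank.
    return np.eye(L) - U @ np.linalg.pinv(U)

def osp_detector(x, d, U):
    """Classifier form: T_OSP(x) = d^T P x."""
    return float(d @ osp_projector(U) @ x)

def osp_abundance(x, d, U):
    """Scaled form (assumed normalization by d^T P d) that estimates target abundance."""
    P = osp_projector(U)
    return float(d @ P @ x) / float(d @ P @ d)

# Toy example: 4 bands, 2 non-target spectra (columns of U), one target d.
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [0.0, 0.0]])
d = np.array([0.2, 0.1, 0.4, 0.9])
background = U @ np.array([0.5, 0.2])   # pixel made of non-target spectra only
x = 0.3 * d + background                # pixel with 30% target abundance

estimated = osp_abundance(x, d, U)
background_response = osp_detector(background, d, U)
```

Because the projector annihilates anything in the span of `U`, the background-only pixel gives a response near zero, and the scaled form recovers the target fraction of the noise-free mixed pixel.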

Adaptive Matched Subspace Detector (AMSD) Algorithm
The AMSD assumes a linear mixing model for a pixel, its endmembers, and their abundances, where the endmembers are representative of the materials spectrally present in the hyperspectral image (HSI). In this model, $M$ is the number of endmembers in the image, $E$ is an $L \times M$ matrix whose $i$th column is the $i$th endmember, $a$ is an $M \times 1$ vector whose $i$th entry is the abundance value $a_i$, and $x$ is the $L \times 1$ spectral signature of the current pixel. The linear mixing model also assumes two constraints on the abundance values: sum-to-one and non-negativity. Considering the interaction of the spectra within a pixel (or the region in this study), a set of hypotheses can be formed to discriminate pixels that contain the target from pixels that consist entirely of background spectra. In these hypotheses, $T$ is the matrix of target endmembers of interest, $B$ is the matrix of endmembers considered as background, $n$ is zero-mean white Gaussian noise with variance $\sigma^{2}$, $a_b$ denotes the abundances of those endmembers (here, targeted minerals), and $x$ is the pixel under test (targeted pixel-spectrum). The AMSD algorithm uses the generalized likelihood ratio test (GLRT) as a statistical test [74], without enforcing the sum-to-one and non-negativity constraints on the abundance estimates. As a result, the AMSD has a closed-form solution with the advantage of the Constant False Alarm Rate (CFAR) property.
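For orientation, the linear mixing model and the two competing hypotheses described above are commonly written as follows in the AMSD literature. The equations themselves are not reproduced in this text, so the exact form below, and in particular the symbol $a_t$ for the target abundances, should be read as an assumption consistent with the definitions of $E$, $T$, $B$, $a$, $a_b$, $n$, and $x$ given here.

```latex
\begin{align}
  x &= E a + n, \qquad \sum_{i=1}^{M} a_i = 1, \quad a_i \ge 0, \\
  H_0 &: \; x = B a_b + n \quad \text{(target absent)}, \\
  H_1 &: \; x = T a_t + B a_b + n = E a + n \quad \text{(target present)}.
\end{align}
```

Under this reading, $E = [\,T \;\; B\,]$ stacks the target and background endmembers, and the abundance vector $a$ stacks $a_t$ and $a_b$.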
Since the AMSD algorithm follows the GLRT, the first stage is the calculation of the unknown parameters through Maximum Likelihood Estimates (MLE), employing the log-likelihood and solving for every unknown parameter. This gives the MLE of the abundances, $\hat{a}_b = (B^{T} B)^{-1} B^{T} x$, and the MLE of the noise variance. The GLRT then considers the ratio of the likelihood functions under the two hypotheses. As $B$ and $E$ are associated, it is not easy to identify the distribution of this detection statistic; consequently, a new detection statistic is used, which makes the numerator and denominator independent. Moreover, it does not depend on the estimates of the variance and abundance under the null hypothesis, so the detector possesses the CFAR property [74,75].
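A minimal sketch of an AMSD-style statistic is given below. The detection statistic is not written out in this text, so the form used here, $x^{T}(P_{B}^{\perp} - P_{E}^{\perp})x \,/\, x^{T} P_{E}^{\perp} x$ with $E = [T \; B]$, is the one commonly published for the AMSD and is an assumption rather than a transcription of this paper's equations. The toy endmembers (unit vectors) and the function names are hypothetical and only meant to make the behaviour easy to follow.

```python
import numpy as np

def orth_projector(A):
    """Projector onto the orthogonal complement of the column space of A."""
    return np.eye(A.shape[0]) - A @ np.linalg.pinv(A)

def amsd_statistic(x, T, B):
    """Assumed AMSD form: energy explained by the target beyond the background,
    divided by the residual energy left after fitting target and background."""
    E = np.hstack([T, B])          # full endmember matrix [T B]
    P_B = orth_projector(B)        # removes the background subspace
    P_E = orth_projector(E)        # removes target + background subspace
    return float(x @ (P_B - P_E) @ x) / float(x @ P_E @ x)

# Toy example with 4 bands: two background endmembers and one target endmember.
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0],
              [0.0, 0.0]])
T = np.array([[0.0], [0.0], [1.0], [0.0]])

# Small deterministic perturbation standing in for noise.
noise = np.array([0.0, 0.0, 0.01, 0.02])
x_background = B @ np.array([0.6, 0.4]) + noise
x_target = 0.5 * T[:, 0] + B @ np.array([0.3, 0.2]) + noise

score_background = amsd_statistic(x_background, T, B)
score_target = amsd_statistic(x_target, T, B)
```

The pixel that contains the target yields a much larger score than the background-only pixel, and a threshold on this score would separate the two hypotheses.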

Clustering and Proposed Algorithms
Clustering is an unsupervised learning approach (unlike classification, which is a supervised learning approach) used here to discriminate the spectra for mineral identification in hyperspectral imagery. A clustering method is fast and reduces the typical difficulties of supervised approaches, such as training and labeling, that particularly occur when the number of mineral samples is limited [30]. The proposed approach applies K-means clustering in both algorithms. However, the FCC-Clustering algorithm uses HSV [76] (color) based clustering to group the RGB-FCC, whereas the clustering-rank 1 NMF algorithm directly groups the spectra. The FCC-clustering and clustering-rank 1 NMF algorithms are presented in Tables 2 and 3, respectively. The spectral comparison techniques used in these two algorithms are referred to as $M_j(x, y)$, where $j$ changes with the technique used for the comparison assessment in both algorithms.
Parameters involved in the clustering in the proposed approach:
(1) The main concern when using the clustering algorithm is the initial number of clusters. Here, this number strictly depends on the number of targeted minerals. There are also several ways to perform clustering without specifying the number of clusters in advance (e.g., the Elbow method [77]), but these are beyond the scope of this manuscript.
(2) In the case of direct clustering of spectra, the hyperspectral acquisition parameters were fixed for all samples (and remained unchanged) so that they cause no variation in the grouping result.
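To make the role of the cluster number concrete, the sketch below runs a plain K-means on the pixel spectra of a small synthetic cube, with k set to the number of materials expected in the scene. It is a minimal illustration and not the implementation used in this work: the deterministic farthest-point initialization, the function names, and the synthetic spectra are all assumptions made so that the example is reproducible.

```python
import numpy as np

def farthest_point_init(X, k):
    """Deterministic initial centers: start at X[0], then repeatedly add the
    sample farthest from the centers chosen so far (an assumed, simple choice)."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(d2.argmax())])
    return np.array(centers, dtype=float)

def kmeans(X, k, n_iter=100):
    """Plain K-means on the rows of X (one spectrum per row)."""
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        # Squared Euclidean distance of every spectrum to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic cube (rows x cols x bands): left half one material, right half another.
n_rows, n_cols, n_bands = 4, 6, 5
base_a = np.array([0.1, 0.2, 0.8, 0.2, 0.1])
base_b = np.array([0.7, 0.6, 0.1, 0.6, 0.7])
cube = np.empty((n_rows, n_cols, n_bands))
cube[:, :3, :] = base_a
cube[:, 3:, :] = base_b
cube += 0.01 * np.sin(np.arange(cube.size)).reshape(cube.shape)  # small ripple

k = 2  # e.g., one targeted mineral plus quartz
labels, centers = kmeans(cube.reshape(-1, n_bands), k)
label_map = labels.reshape(n_rows, n_cols)
```

Reshaping the cube to one spectrum per row is what "directly grouping the spectra" amounts to in practice; the label map is recovered by reshaping the labels back to the spatial dimensions.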

FCC-K-Means ALGORITHM
Given: Input data $I(x,y,z) \in \mathbb{R}^{N \times M \times Z}$ is continuum removed spectral data, where $I(x,y) \in \mathbb{R}^{N \times M}$ is the spatial dimension of the RoI (in pixel units) and $z$ is the spectral resolution.
Step 1: Calculation of the spectral comparison techniques: $ST_j$ represents the spectral technique corresponding to $j$ (e.g., $j = 1 \rightarrow M_1 = \mathrm{NCC}$). $\Phi_i \in \mathbb{R}^{z}$ denotes the reference spectra (i.e., ASTER/JPL) of targeted mineral $i$.
Step 2: Generating the FCC, $\Psi_{RGB}$, using $M_j$ (for every $j$) by applying thresholding.
Step 3: Let $\Psi_{HSV}$ be a representation of the FCC in the HSV color system,

Table 3. The clustering-rank 1 NMF algorithm for directly grouping the spectra.

K-Means-Rank 1 NMF ALGORITHM
Given: Input data $I(x,y,z) \in \mathbb{R}^{N \times M \times Z}$ is continuum removed spectral data, where $I(x,y) \in \mathbb{R}^{N \times M}$ is the spatial dimension of the RoI (in pixel units) and $z$ is the spectral resolution.
Step 1: Clustering $X(p, z)$, $p \in \mathbb{R}^{N \times M}$, into $k$ categories. The clustering is based on the spectral difference among the clusters ($0 \le J \le k$).
Step 2: $h_1^{q}$ is the rank-one NMF ($i = 1$) of each cluster $C_i^{q}$ after clustering.
Step 3: Calculate the spectral comparison techniques: $ST_j$ represents the spectral technique corresponding to $j$ (e.g., $j = 1 \rightarrow M_1 = \mathrm{NCC}$). $\Phi_i \in \mathbb{R}^{z}$ denotes the reference spectra (i.e., ASTER/JPL) of targeted mineral $i$.
Output: Generating the FCC, $\Psi_{RGB}$, using $M_j$ (for every $j$) through thresholding.

FCC-Clustering
The FCC-Clustering algorithm is presented in Table 2. The input data of the algorithm, $I(x, y, z) \in \mathbb{R}^{N \times M \times Z}$, is continuum removed spectral data [78,79], where $I(x, y) \in \mathbb{R}^{N \times M}$ is the spatial dimension of the selected RoI (which contains the targeted pixels) and $z$ is the spectral resolution. In the calculation of the spectral comparison techniques, $ST_j$ represents the spectral technique and $j$ indexes the techniques exploited (e.g., $j = 1 \rightarrow M_1 = \mathrm{NCC}$). $\Phi_i \in \mathbb{R}^{z}$ denotes the reference spectra (i.e., ASTER/JPL), and $i$ indexes the spectra, whose number corresponds to the number of targeted minerals. The generation of the FCC, $\Psi_{RGB}$, depends on the values of $M_j$ (for every $j$): the three lowest $M_j$ create the (R, G, B) channels by applying a thresholding criterion. The output is $\Psi_{RGB}$, an image in which the materials are marked by different colors. Let $\Psi_{HSV}$ be a representation of the FCC in the HSV color system, $\Psi_{HSV}(x, y, 3) = \Psi_{RGB}(x, y, 3)$. The clustering method groups $\Psi_{HSV}(p, 3)$, $p \in \mathbb{R}^{N \times M}$, into $k$ categories. Here, a K-means clustering technique has been used, where $J_k$ represents the targeted mineral grains clustered from other minerals by different colors.
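The first steps of the FCC-clustering pipeline can be sketched as follows: a comparison map is computed per reference spectrum, three maps are thresholded into the R, G, and B channels, and the composite is converted to HSV. This is only one plausible reading of the thresholding step, which the text does not spell out; the use of SAM, the linear mapping from angle to channel intensity, the threshold value, the function names, and the toy spectra are all assumptions.

```python
import colorsys
import numpy as np

def sam_map(cube, ref):
    """Spectral angle (radians) between every pixel spectrum and one reference."""
    dots = cube @ ref
    norms = np.linalg.norm(cube, axis=2) * np.linalg.norm(ref)
    return np.arccos(np.clip(dots / norms, -1.0, 1.0))

def false_color_composite(cube, refs, threshold):
    """One color channel per reference: bright where the angle is below the
    threshold (good match), zero elsewhere. Expects exactly three references."""
    channels = []
    for ref in refs:
        angle = sam_map(cube, ref)
        channels.append(np.where(angle < threshold, 1.0 - angle / threshold, 0.0))
    return np.stack(channels, axis=2)

def rgb_to_hsv_image(rgb):
    """Convert an (N, M, 3) RGB image with values in [0, 1] to HSV, pixel by pixel."""
    flat = rgb.reshape(-1, 3)
    hsv = np.array([colorsys.rgb_to_hsv(*px) for px in flat])
    return hsv.reshape(rgb.shape)

# Hypothetical 4-band reference spectra for three materials.
refs = [np.array([1.0, 0.1, 0.1, 0.1]),
        np.array([0.1, 1.0, 0.1, 0.1]),
        np.array([0.1, 0.1, 1.0, 0.1])]

# Toy 2 x 3 cube: each column holds a scaled copy of one reference.
cube = np.empty((2, 3, 4))
cube[:, 0, :] = 1.5 * refs[0]
cube[:, 1, :] = refs[1]
cube[:, 2, :] = 0.5 * refs[2]

fcc = false_color_composite(cube, refs, threshold=0.1)
hsv = rgb_to_hsv_image(fcc)
```

The HSV pixels, reshaped with `hsv.reshape(-1, 3)`, are what a color-based clustering such as K-means would then group into k categories.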

Clustering-Rank 1 NMF
The clustering-rank 1 NMF algorithm is presented in Table 3. The input data of the algorithm, $I(x, y, z) \in \mathbb{R}^{N \times M \times Z}$, is continuum removed spectral data, where $I(x, y) \in \mathbb{R}^{N \times M}$ is the spatial dimension of the selected RoI (which contains the targeted pixels) and $z$ is the spectral resolution. The spectra $x(p, z)$, $p \in \mathbb{R}^{N \times M}$, are clustered into $k$ categories based on the spectral difference among the clusters ($0 \le J \le k$). $h_1^{q}$ is the rank-one NMF ($i = 1$) of each cluster $C_i^{q}$ after clustering. In the calculation of the spectral comparison techniques, $ST_j$ represents the spectral technique and $j$ indexes the techniques exploited (e.g., $j = 1 \rightarrow M_1 = \mathrm{NCC}$). $\Phi_i \in \mathbb{R}^{z}$ denotes the reference spectra (i.e., ASTER/JPL), and $i$ indexes the spectra, whose number corresponds to the number of targeted minerals. The generation of the FCC, $\Psi_{RGB}$, depends on the values of $M_j$ (for every $j$). By thresholding, the three lowest $M_j$ create (R, G, B). The output is $\Psi_{RGB}$, an image in which the materials are marked by different colors.
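The step that extracts one representative spectrum per cluster can be illustrated with a rank-one NMF computed by multiplicative updates, followed by a comparison of that representative with a small reference library. This is a sketch under assumptions: the update rule is the standard Frobenius-norm multiplicative update, the initialization is arbitrary, and the library entries `mineral_a` and `mineral_b` are hypothetical spectra, not ASTER/JPL data.

```python
import numpy as np

def rank1_nmf(V, n_iter=200, eps=1e-12):
    """Rank-one NMF, V ~ outer(w, h), via multiplicative updates.

    V holds one non-negative spectrum per row; h is the representative spectrum
    and w the per-spectrum scaling.
    """
    w = np.ones(V.shape[0])
    h = V.mean(axis=0) + eps
    for _ in range(n_iter):
        h = h * (V.T @ w) / (h * (w @ w) + eps)
        w = w * (V @ h) / (w * (h @ h) + eps)
    return w, h

def spectral_angle(a, b):
    """Spectral angle (radians) between two spectra."""
    cos = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# Toy cluster: four spectra with the same shape but different brightness.
base = np.array([0.3, 0.5, 0.9, 0.4, 0.2])
cluster = np.outer([0.8, 1.0, 1.2, 0.9], base)

w, h = rank1_nmf(cluster)

# Hypothetical reference library (placeholder names and values).
library = {
    "mineral_a": 3.0 * base,
    "mineral_b": np.array([0.9, 0.6, 0.2, 0.5, 0.8]),
}
angles = {name: spectral_angle(h, ref) for name, ref in library.items()}
best = min(angles, key=angles.get)
```

Only one comparison per cluster and per reference is needed here, instead of one per pixel, which is consistent with the lower computational load reported for the clustering-rank 1 NMF algorithm.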

Accuracy of the Proposed Approach
The accuracy of the algorithms is based on counting the correctly detected pixels in the hyperspectral images (Tables 4 and 5). For that, a ground truth is used, obtained by rigorous manual labeling of the known locations of the mineral grains in the samples and verified by the results of µX-ray fluorescence (µXRF) (Figure 3). The number of ground truth pixels in each case is also given for all the samples, together with the spatial resolution of the ROI. In every mineral mixture, the targeted mineral grains are mixed with quartz grains; hence, for each mixed sample, accuracy is estimated as the accuracy of discrimination between these two types of grains. Two types of errors are calculated in each case: false positives (FP), which represent wrong mineral identification (one mineral instead of another), and false negatives (FN), which reveal misidentification of the mineral grains. The total accuracy of each algorithm is calculated by subtraction of the correct identification and the two errors for every sample, where the local accuracy (ACC) is calculated by:

$$\mathrm{ACC}(\%) = \frac{\text{Correctly detected pixels}}{\text{Total pixels of mineral (ROI)}} \times 100$$

FN stands for the false negative error, and the False Positive (FP) error counts the wrongly classified pixels.

[Figure caption fragment] ... (rows a-f, column 1), the second column from the left presents FCC images using NCC (any other spectral comparison technique could be used) before applying the clustering (rows a-f, column 2), and the two right-hand columns show the results of segmented grains (rows a-f, columns 3 and 4).
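The pixel-counting accuracy described in this section can be sketched from two boolean masks, one for the detected pixels and one for the ground truth of a mineral. Only the ACC formula is given in the text; expressing FN and FP as percentages of the ground-truth pixel count is an assumption made here for illustration, as are the function name and the toy masks.

```python
import numpy as np

def pixel_scores(detected, truth):
    """ACC(%) = correctly detected pixels / total ground-truth pixels * 100.

    FN and FP are reported relative to the same total (an assumed normalization).
    """
    detected = np.asarray(detected, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = truth.sum()
    correct = np.logical_and(detected, truth).sum()
    missed = np.logical_and(~detected, truth).sum()   # mineral pixels not detected
    wrong = np.logical_and(detected, ~truth).sum()    # pixels wrongly assigned
    return {
        "ACC": 100.0 * correct / total,
        "FN": 100.0 * missed / total,
        "FP": 100.0 * wrong / total,
    }

# Toy 3 x 4 region of interest: 4 ground-truth pixels of the targeted mineral.
truth = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 0, 0]])
detected = np.array([[1, 1, 0, 0],
                     [1, 0, 1, 0],
                     [0, 0, 0, 0]])

scores = pixel_scores(detected, truth)
```

In this toy case three of the four mineral pixels are found, one is missed, and one quartz pixel is wrongly assigned to the mineral.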

Mineral Grains and Experimental Set Up
The experiment was conducted in a laboratory environment with a lightweight Hyper-Camera imaging spectroradiometer (HYPER-CAM LW) [80] operating in the long-wave infrared (LWIR) band (from 7.7 to 11.8 µm). It has a spatial resolution of 320 × 256 pixels with an LWIR PV-MCT focal plane array detector, and a spectral resolution of 4 cm⁻¹, which gives spectra from 868 to 1270 cm⁻¹. The individual spectra are gathered using the Fourier-Transform Infrared Spectrometer (FT-IR) for every pixel with an instantaneous field-of-view of 0.35 mrad [80]. There is a heating source between the hyperspectral camera and the grain samples (Figure 4 depicts the experimental setup) to closely and uniformly radiate the samples. Having the heating source located to the side of the sample and the camera enables radiating the heat more uniformly. There are eight mineral grains targeted to be automatically identified using the spectral analysis. Figure 5 shows the spectra from the mineral grains used in the experiments together with the spectra from the ASTER JPL-NASA spectral library to demonstrate the qualitative similarity between experimental and reference spectra. A brief description of the targeted minerals is presented in Table 6.

The experiments in the 7.7 µm to 11.8 µm wavelength range took place twice, with the heating source turned on and switched off, to calculate the continuum removed spectra [78]. The image shows the experimental setup and depicts the location of the hyper-camera, heating source, infragold plate, and mineral grains in the experiment.

Figure 5. The box plot of the spectral angle (SAM) between the spectra in every cluster and their first rank NMF for every mineral segmentation using the clustering-rank 1 NMF algorithm (similarity per cluster by spectral angle difference). The box and whisker plots represent the similarity between the best representation of each cluster using NMF and the entire spectra of the cluster itself (to show how the NMF functions). The higher the median line in the whisker plot, the more it shows the number of spectral similarity in the clusters. In general, bigger box and whisker plots represent a higher variation of similarity between the best representative spectrum of every mineral and the spectra of the cluster.

Table 6. Characteristics of minerals studied by LWIR.

Minerals Chemical Formula
SiO2

Properties of Hyperspectral Image
The image acquisitions were performed while the minerals were attached to a carbon substrate (shown in Figure 3a-e) with an infragold plate in the background. The infragold plate reflects all the radiation and is used to calculate the overall amount of radiation for Continuum Removal (CR) [78]. The experiment was performed with the heating source first on and then off. This is required to perform CR and avoids calculating the black-body temperature [78,79]. The hyperspectral images have a spatial size of 180 × 300 pixels and 122 spectral channels.
Mineral identification using hyperspectral technology depends on the spectral comparison techniques and the mineral spectral signatures. The spectra of some of these minerals (e.g., Figure 5) are characterized by a maximum or minimum at a particular wavelength. The location and type of these features within the wavelength band play an important role in the accuracy of identification. The minerals in this research have reasonable signatures in the 7.7-11.8 µm band. Smithsonite, scheelite, and goethite have spectra that are similar to each other in terms of spectral shape (location of extrema). This causes a problem for identifying these minerals once they are combined in a mixture, as their spectral features are similar. In this example, only scheelite has a maximum peak before 11.8 µm, which makes scheelite detectable. In contrast, goethite and smithsonite have a peak beyond 11.8 µm (outside the range of our hyper-camera), which makes them undetectable for both algorithms with the system used (goethite example in Figures 3c and 5).

The Results of Spectral Comparison Techniques
The results of the proposed FCC-clustering algorithm are shown using the RGB-FCC (Figures 5 and 6). This provides a better visualization of the spectral differences among the minerals by assigning each mineral target a certain weight in the colors. FCC provides a good difference criterion that can easily be discriminated by a clustering approach (the results of the FCC-clustering algorithm are shown in Figure 6). Some of the spectral techniques applied to the images are not strictly spectral comparison approaches in the way that NCC or SAM are. However, these techniques have been used to investigate the strength of the method with respect to the extraction of spectral differences. In order to apply these techniques, a MATLAB hyperspectral image index analysis toolbox [81] was used. The quantitative results of the FCC-clustering algorithm are shown in Table 2.

Results of the Two Algorithms
The results of the spectral comparison techniques were presented in the previous section; in this section, the clustering results are shown in Figures 5 and 6 for both algorithms. The results of the FCC-clustering strategy are presented in comparison with the second hyperspectral method. Applying the clustering at a different level of the hierarchy relative to the spectral analysis techniques creates two approaches with similar outcomes. Figure 6 shows the performance of the color-based clustering approach for the algorithms. Besides the computational load, which differs significantly between these methods (Table 5), clustering after spectral analysis (in the FCC-clustering algorithm) is more sensitive, because the clustering depends on the generated color (RGB-FCC). Low performance of a spectral comparison technique creates misclassification in the algorithm; for instance, diopside with NCC has lower performance than with SAM, which created more false negatives after clustering (Table 2).
However, the sensitivity of the clustering-rank 1 NMF algorithm lies in the application of the clustering technique. In the same example, diopside has more false negatives because of the poor performance of the clustering (Table 4). Sometimes, because the spectral curve does not have significant extrema in the band of the hyperspectral information, the clustering method cannot discriminate the clusters from each other (e.g., in the case of goethite and smithsonite). In such a situation, increasing the number of groups in the clustering initialization partially solves the problem. Even though this solves the problem in the clustering, the clusters selected as different categories might have similar material content. This is corrected by the spectral comparison techniques, which merge all such similar groups into one category. However, in the case of similar mineral spectra or unspecified spectral extrema in minerals, the problem remains unsolved. Applying these algorithms provides an opportunity to compare them through the mineral identification task and the resulting computational load. The FCC-clustering algorithm is heavier in computation time than the clustering-rank 1 NMF algorithm because it applies the spectral analysis to each spectrum. On the other hand, the clustering-rank 1 NMF algorithm clusters all the spectra, which also results in heavy computations. Table 4 presents the accuracy of both algorithms, and Table 5 indicates the computational load of each algorithm with the different spectral analysis techniques. Averaging and factor analysis can provide better outcomes; in particular, factor analysis provides more statistical information for the selection of the spectral representative in the algorithm. We applied NMF to select a spectral representative for each cluster.
Figures 5 and 7 show some examples of NMF results and box-plots for different categories of minerals, giving a qualitative and a quantitative representation of the application of these techniques, respectively. A higher number of initial clusters allows the same mineral spectrum to be selected again in two or more different categories. This difficulty can be resolved at the later stage of the hierarchy by the spectral analysis, which assigns these selections to the same mineral category.
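The merging step described above can be illustrated with a minimal sketch. It assumes SAM as the comparison technique and uses hypothetical 4-band toy spectra; the function names `spectral_angle` and `merge_clusters_by_library` and all numeric values are illustrative assumptions, not the implementation or the data used in this study.

```python
import math

def spectral_angle(a, b):
    """Spectral angle (radians) between two spectra of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)

def merge_clusters_by_library(representatives, library):
    """Group cluster ids by the library mineral with the smallest spectral angle."""
    merged = {}
    for cluster_id, spectrum in representatives.items():
        best = min(library, key=lambda name: spectral_angle(spectrum, library[name]))
        merged.setdefault(best, []).append(cluster_id)
    return merged

# Hypothetical 4-band toy spectra (not measured data).
library = {"quartz": [0.2, 0.9, 0.3, 0.1], "diopside": [0.1, 0.2, 0.8, 0.6]}
representatives = {
    0: [0.4, 1.8, 0.6, 0.2],      # scaled copy of the toy quartz spectrum
    1: [0.22, 0.85, 0.33, 0.12],  # slightly perturbed toy quartz spectrum
    2: [0.1, 0.25, 0.75, 0.6],    # close to the toy diopside spectrum
}
merged = merge_clusters_by_library(representatives, library)
print(merged)
```

In this toy case, clusters 0 and 1 are both assigned to the same library entry, so an over-segmented mineral collapses back into a single category. Two minerals whose library spectra are nearly identical would still be confused, which matches the limitation noted above.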
It should be noted that the location of the detected pixels is vital for identifying the mineral grains. In other words, we would like the correctly detected pixels to be located on the surface of the mineral grains; even one or two pixels detected on the surface of a mineral indicate the grain content and can ultimately yield an accurate identification (grain-based identification). We also provide pixel-based accuracy percentages (in Table 4 or Table 7). For example, kyanite and scheelite have been detected with a very limited number of pixels but very accurately, because these pixels lie on the grain's surface. In terms of the accuracy of the spectral comparison techniques, SAM and NCC provided better accuracy than OSP and AMSD, which require the background spectra in their calculations. The matched filter (PLMF) did not succeed in detecting the minerals and was omitted from the accuracy calculations. Table 4 shows the accuracy of the spectral comparison techniques using the clustering-rank 1 NMF algorithm. The accuracies for biotite and quartz, epidote and quartz, goethite and quartz, and kyanite and quartz are the same for NCC and SAM, which indicates the dependency of the clustering-rank 1 NMF algorithm on the clustering. This is also shown in Table 3 for the clustering-rank 1 NMF algorithm. However, the accuracies of NCC and SAM in the FCC-clustering algorithm are not similar, which indicates that the identification process in the FCC-clustering algorithm depends more on the performance of the spectral comparison techniques than on the clustering.

a,b, diopside c,d, tourmaline e,f and mixed with quartz grains are shown. In addition, the µXRF images of the samples are also shown in the image to verify the ground truth images and labeling. g1-g3 and h1-h3 depict the images of diopside-quartz and epidote-quartz samples using micro X-ray fluorescence (µXRF), respectively. The presence of magnesium and aluminum elements in diopside and epidote is shown in g4 and h4, respectively. g5,h5 represent the mapping of diopside and epidote binocular images to the automatic identification results using ArcGIS. i1-i4 show a point in the grains of diopside and epidote. Moreover, the lower row in the figure shows SEM images of diopside, epidote, tourmaline, and pyrope to indicate the surface of these grains.

Automatic Identification Process
Mineral identification has been studied for several decades, and most of these approaches fall under hyperspectral remote sensing, airborne, portable instrument [82], and core logging [83] applications. The proposed research addressed the application of hyperspectral infrared imagery in the LWIR (7.7-11.8 µm) for automated mineral identification, applying two algorithms that involve unsupervised segmentation and spectral comparison techniques. It has been previously shown that clustering techniques are more suitable for categorizing minerals than classification (supervised) approaches, because there is not enough data to properly train a classifier and labeling is expensive [30,78]. Applying spectral comparison techniques and clustering approaches gave the opportunity to identify the minerals using two algorithms. The difference between these two approaches lies in the order in which these techniques are used in the hierarchy. Applying clustering or spectral comparison to all the spectral data points is the bottleneck of both algorithms because of the categorization task. Whichever technique is applied to all the data points determines whether the overall algorithm has a high or low computational complexity. The algorithms have reasonable performance for the identification of the minerals. Some minerals, such as goethite and smithsonite, have no specific extrema in their LWIR spectra, and their identification failed regardless of which algorithm was applied (as shown in Figure 5 and Table 4). In contrast, biotite, diopside, epidote, tourmaline, scheelite, quartz, and kyanite have been identified more clearly due to their distinctive spectral signatures in the LWIR.

Computational Complexity of the Algorithms
We analyzed two different algorithms that apply spectral comparison and clustering techniques at different levels of the hierarchy. The FCC-clustering algorithm calculates the spectral comparison techniques for all the pixel spectra of the hyperspectral image, assigns false colors to these features, and ultimately segments the false color regions by applying clustering. In contrast, the clustering-rank 1 NMF algorithm directly applies the clustering to all the pixels in the hyperspectral image. Then, the first rank of NMF is selected as the spectral representative for each cluster (the statistical relationship among the clusters is shown by box-plots in Figure 7). These spectra are compared with the reference spectra (ASTER spectral library), which finalizes the segmentation process. Selecting the representative of each group using low-rank NMF (here rank-1 NMF) alleviates the influence of misclassification (caused by clustering) on the final spectral comparison. Previous work [79,84] showed that averaging the spectra is not as effective as decomposition/factorization methods. Moreover, NMF shows better performance than PCA, which might suffer from sensitivity to outliers and noise.
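As an illustration of the representative-selection step, the following minimal sketch extracts a rank-1 NMF factor from the spectra of one cluster. It assumes a Frobenius-norm objective with alternating updates (for rank 1, the standard multiplicative updates reduce to these closed-form ratios); the function names, the iteration count, and the toy spectra are hypothetical and are not taken from the implementation used in this study.

```python
def rank1_nmf(V, iterations=50, eps=1e-12):
    """Approximate a nonnegative matrix V (spectra x bands) by the outer product w h^T."""
    n_spectra, n_bands = len(V), len(V[0])
    w = [1.0] * n_spectra  # strictly positive start keeps every update nonnegative
    h = [1.0] * n_bands
    for _ in range(iterations):
        # Update the spectral factor h with the weights w held fixed.
        ww = sum(x * x for x in w) + eps
        h = [sum(V[i][j] * w[i] for i in range(n_spectra)) / ww
             for j in range(n_bands)]
        # Update the per-spectrum weights w with h held fixed.
        hh = sum(x * x for x in h) + eps
        w = [sum(V[i][j] * h[j] for j in range(n_bands)) / hh
             for i in range(n_spectra)]
    return w, h

def cluster_representative(cluster_spectra):
    """Return the rank-1 spectral factor h, scaled so that its largest value is 1."""
    _, h = rank1_nmf(cluster_spectra)
    peak = max(h)
    return [x / peak for x in h] if peak > 0 else h

# Hypothetical 3-band toy cluster (not measured data): two scaled copies of one
# spectrum plus a slightly perturbed member.
toy_cluster = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.1, 1.9, 3.2]]
print(cluster_representative(toy_cluster))
```

The returned factor can then be passed to any spectral comparison technique (for example SAM or NCC) against the reference library, so that only one spectrum per cluster needs to be compared rather than one per pixel.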
The FCC-clustering algorithm is more computationally costly than the clustering-rank 1 NMF algorithm because it applies the spectral comparison approaches to every pixel spectrum of the hyperspectral images. Nevertheless, the results presented in Table 4 indicate that the clustering-rank 1 NMF algorithm is also computationally costly, because it clusters all the pixel spectra directly. For fixed k and d (dimension), the computational complexity of the clustering (here K-means) is $O(n^{dk+1} \log n)$, where n is the number of entities to be clustered [85]. Heuristic algorithms such as Lloyd's algorithm have a complexity of $O(nkdi)$, where k is the number of clusters, i is the number of iterations, and n is the number of d-dimensional vectors [86]. On the other hand, algorithms such as SAM involve a cosine function, whose computational complexity is approximately $O(M(n) \log n)$ (using arithmetic-geometric mean iteration), where M(n) stands for the complexity of the chosen multiplication algorithm [87]. SAM also contains a division inside the cosine argument, and the $O(n^2)$ complexity of the division itself increases the overall complexity of the SAM function. This difference in computational complexity between clustering and spectral comparison techniques is the reason why the clustering-rank 1 NMF algorithm shows considerably lower complexity than the FCC-clustering algorithm.
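For concreteness, the two preferred spectral comparison measures can be sketched as follows. This is a minimal illustration that assumes the usual textbook definitions (SAM as the angle between two spectra, NCC as the zero-mean normalized correlation); the exact formulation used in this study may differ, and the function names `sam` and `ncc` are hypothetical.

```python
import math

def sam(a, b):
    """Spectral angle mapper: angle in radians between spectra a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # The division feeds the inverse cosine; clamp to guard against rounding error.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)

def ncc(a, b):
    """Zero-mean normalized cross correlation in [-1, 1] (assumed definition)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den

# Hypothetical 4-band toy spectra (not measured data).
pixel = [0.2, 0.9, 0.3, 0.1]
reference = [0.4, 1.8, 0.6, 0.2]  # same shape as pixel, twice the amplitude
print(sam(pixel, reference), ncc(pixel, reference))
```

Each call makes a single pass over the bands of one pixel-reference pair. In the FCC-clustering algorithm such a call is needed for every pixel spectrum and every reference spectrum, whereas the clustering-rank 1 NMF algorithm needs it only once per cluster representative and reference spectrum.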
The sensitivity of each algorithm depends on the spectral difference calculations. For example, the FCC-clustering algorithm is sensitive to the spectral techniques used to extract the RGB-FCC that is subsequently clustered. However, many alternative approaches exist for the visualization of spectra in hyperspectral imagery (e.g., [88][89][90][91]) that could replace the current FCC method. In the clustering-rank 1 NMF algorithm, the sensitivity of the system lies in the clustering approach and, in particular, in the number of initial clusters. Besides this, other factors are nominally involved in the sensitivity of the proposed approaches, such as the initialization of the clustering in the FCC-clustering approach and the spectral analysis in the clustering-rank 1 NMF method. In the clustering-rank 1 NMF algorithm, a spectrum from each selected cluster should be compared to reference spectra from the ASTER JPL-NASA spectral library. Several methods can be applied to select the best representative spectrum from each cluster, such as randomly selecting one spectrum, spectral averaging, or factor analysis. Random selection or averaging might not be an efficient way to select the spectral representative of each cluster, because both are sensitive to bad spectra (wrongly grouped or noisy spectra) that may occur in the process. Some noisy spectra might be clustered into a different mineral category, and random selection (or averaging) may not suppress this effect, which ultimately influences the mineral identification.

Conclusions
The proposed approach addressed geological hyperspectral infrared imagery (in the 7.7-11.8 µm LWIR range) under laboratory conditions. This paper presented quantitative and qualitative assessments of two algorithms for the identification of several minerals and challenged the application of clustering. The FCC-clustering algorithm applied the spectral comparison techniques to all the pixel-spectra of the input data cube and the spectral library of JPL/NASA. It generated the spectral difference, presented in RGB-FCC form, and a clustering step grouped the different composites. The clustering-rank 1 NMF algorithm clustered all the pixel-spectra into different categories. Then, the rank-1 factor extracted by NMF as the representative of each cluster was compared to the spectral library of JPL/NASA through spectral comparison techniques that generated RGB-FCC results. The results of the clustering-rank 1 NMF algorithm indicated significantly higher computational efficiency (more than 20 times faster) than the FCC-clustering algorithm. The clustering-rank 1 NMF algorithm showed more dependency on the clustering, whereas the FCC-clustering algorithm was more sensitive to the spectral comparison techniques. Both algorithms had promising performance for mineral identification, with accuracy even exceeding 90% using the clustering-rank 1 NMF algorithm. Several spectral techniques were used, such as AMSD, OSP, PLMF, SAM, and NCC, and most of them showed a similar accuracy range (although PLMF exhibited a lower accuracy). Eleven different mineral grains (biotite, diopside, epidote, goethite, kyanite, scheelite, smithsonite, tourmaline, pyrope, olivine, and quartz) were studied. Future work can focus more on clustering approaches and on the effect of noise in mineral identification to increase the performance of the system. The study of minerals with poorly shaped spectra is another important avenue for future research.