Data Fusion Using a Multi-Sensor Sparse-Based Clustering Algorithm

The increasing amount of information acquired by imaging sensors in Earth Sciences results in the availability of a multitude of complementary data (e.g., spectral, spatial, elevation) for monitoring of the Earth's surface. Many studies have investigated the use of multi-sensor data sets to improve the performance of supervised learning-based approaches at various tasks (i.e., classification and regression), while unsupervised learning-based approaches have received less attention. In this paper, we propose a new approach to fuse multiple data sets from imaging sensors using a multi-sensor sparse-based clustering algorithm (Multi-SSC). A technique for the extraction of spatial features (i.e., morphological profiles (MPs) and invariant attribute profiles (IAPs)) is applied to high spatial-resolution data to derive the spatial and contextual information. This information is then fused with spectrally rich data such as multi- or hyperspectral data. In order to fuse multi-sensor data sets, a hierarchical sparse subspace clustering approach is employed. More specifically, a lasso-based binary algorithm is used to fuse the spectral and spatial information prior to automatic clustering. The proposed framework ensures that the generated clustering map is smooth and preserves the spatial structures of the scene. In order to evaluate the generalization capability of the proposed approach, we investigate its performance not only on diverse scenes but also on different sensors and data types. The first two data sets are geological data sets, which consist of hyperspectral and RGB data. The third data set is the well-known benchmark Trento data set, including hyperspectral and LiDAR data. Experimental results indicate that this novel multi-sensor clustering algorithm can provide an accurate clustering map compared to the state-of-the-art sparse subspace-based clustering algorithms.

Author Contributions: validation, K.R.S. and R.J.; formal analysis, K.R.S.; investigation, K.R.S.; writing—original draft preparation, K.R.S.; writing—review and editing, all authors; visualization, K.R.S.; supervision, P.G., P.S. and R.G.; project administration, R.G.


Introduction
The recent advances in imaging technology have resulted in the acquisition of high spectral-spatial resolution imaging data [1]. Among the designed imaging sensors, hyperspectral cameras have become the main source of fine spectral-resolution data in hundreds of narrow spectral bands [2]. The spectral channels (bands) of hyperspectral imaging data (HSI) cover the electromagnetic spectrum ranging from the visible to the near-infrared (VNIR, 0.4-1 µm) and shortwave infrared (SWIR, 1-2.5 µm). An HSI can be employed to identify and discriminate different materials and biological targets [2]. The rich spectral information derived from an HSI has been widely used in remote sensing applications such as land-cover- [3], urban- [4] and mineral-mapping [5,6], or the estimation of above ground biomass [7]. For instance, in the mineral exploration stage, information that is obtained by an HSI can be used as a useful support for geologists to distinguish and track mineralogical features of interest [8].
The analysis of HSI data can be challenging due to (1) their high dimensionality and the often limited availability of representative samples (also known as training samples), which results in the so-called curse of dimensionality [9], and (2) the highly mixed nature of pixels [10]. In order to overcome these challenges, different machine learning algorithms have been developed over the last decades [11]. Supervised learning algorithms have become popular for HSI analysis because of their empirical success and adjustability to solve classification and regression problems [12,13]. However, these algorithms strongly rely on training samples, which remain challenging to obtain. This issue has been tackled by developing unsupervised algorithms (also known as clustering algorithms) [14]. Clustering algorithms use unlabeled data to group the data into meaningful clusters. Conventional clustering algorithms (e.g., K-means [15], Fuzzy C-means [16]) distribute the data around several cluster centers, the so-called centroids. By using different distance measures, data points are assigned to the cluster with the closest centroid [15]. These clustering algorithms are iterative and strongly depend on the random initialization of the centroids [17]. Moreover, conventional clustering algorithms are pixel-based and do not include any spatial information. Additionally, they are only applicable to single-sensor data.
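As a concrete illustration of this centroid-based procedure, the sketch below implements a minimal K-means in NumPy. It is an illustration only, not any of the implementations compared later; for reproducibility it uses a deterministic spread initialization, whereas the conventional algorithm starts from random centroids (the initialization the text notes results strongly depend on).

```python
import numpy as np

def kmeans(X, k, n_iter=20):
    """Minimal K-means on X of shape (n_pixels, n_bands); returns labels.

    Deterministic spread initialization is used here for reproducibility;
    conventional K-means initializes the centroids randomly.
    """
    idx = np.linspace(0, len(X) - 1, k).astype(int)
    centroids = X[idx].astype(float)
    for _ in range(n_iter):
        # Assign every pixel to the closest centroid (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned pixels.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels
```

On two well-separated groups of pixels, the two clusters are recovered after a handful of iterations.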
In the past decades, HSIs have been extensively used to detect and characterize different materials, as they provide rich spectral information. At the same time, RGB sensors can generate very detailed high spatial resolution images, while light detection and ranging (LiDAR) sensors can yield geometrical and elevation data [18,19]. All the above-mentioned multi-sensor data sets provide complementary information (i.e., spectral, spatial, elevation), and the fusion of these data sets has received much attention. In many studies (to name a few, [4,5,19–24]), the integration of spectral, spatial and elevation information extracted from multiple sensors has led to significant improvements in the performance of supervised learning-based approaches. In [19], a low-rank subspace-based fusion technique (SubFus) is proposed to incorporate contextual features from multi-sensor data for classification. In [5], the authors proposed to fuse HSI and fine spatial-resolution RGB images using orthogonal total variation component analysis (OTVCA) as a spectral-spatial feature extraction technique. In [21], WorldView 2/3 and LiDAR images were integrated using a deep learning algorithm, the so-called dense convolutional network, to map individual tree species in urban areas. A comprehensive review on the integration of multi-sensor and multi-temporal data sets is given in [4].
It has been demonstrated in different studies that high dimensional data can lie in a union of lower dimensional subspaces [14,17,25–28]. In this way, data points can be grouped into meaningful clusters using subspace information rather than the complete high dimensional data, thus reducing the complexity and computational cost [17]. Among the subspace-based clustering algorithms, sparse representation-based algorithms have received much attention in many fields (e.g., feature fusion [29] and image segmentation [14,17,30,31]). The main motivation to use sparsity-based clustering is that the sparse subspace-based clustering algorithm (SSC) was proven to be robust against noise [32]. In addition, the complexity of the optimization with sparse representations is lower than with other representations such as low-rank representations. SSC is a well-known clustering algorithm that benefits from the self-expressiveness property. This property states that data points of an HSI can be written as a linear combination of other data points from the same subspace [17]. To solve the SSC optimization problem, the alternating direction method of multipliers (ADMM) is employed. Nevertheless, SSC has some drawbacks: (1) it uses the entire data set for the clustering process, which makes it hard to handle large data sets (this is known as the scalability issue), and (2) it only uses spectral information and does not account for the spatial continuity between the data points [14].
During the last decades, several studies have been conducted to cope with the scalability issue. In [33], the authors proposed a landmark-based spectral clustering method (LSC) to handle large data sets. In LSC, spectral clustering is applied on a few representative points (also known as landmark points), which are selected from the data by clustering or random sampling. In [34], a sparse subspace clustering method using orthogonal matching pursuit (SSC-OMP) was proposed. In [14], an exemplar-based subspace clustering method (ESC) was proposed, which searches for a subset of representative samples, the so-called exemplars, that can describe the entire data. Such an approach allows ESC to reduce the computational cost and to analyze larger data sets. However, the random initialization in the search for the exemplars of a data set results in a less robust performance of ESC. We recently proposed a hierarchical sparse subspace-based clustering (HESSC) algorithm for fast and robust HSI clustering [28]. In HESSC, the entire data set is described at different levels of detail through a hierarchical structure, built with a lasso-based binary tree and a criterion to split nodes that automatically determines the number of clusters. Nonetheless, all the introduced advanced sparse subspace-based clustering algorithms merely utilize spectral information.
On the other hand, several methods were developed to incorporate spatial information in the sparse subspace-based clustering procedure. In [35], a spatial subspace-based clustering algorithm (SpatSC) was proposed, which employs spatial constraints to exploit information derived from neighboring pixels that have similar features in mineralogical and geological structures. In [26,36], spatial information is exploited by applying a mean or median filter on the computed sparse coefficient matrix in the update step of ADMM. The above-mentioned algorithms use the entire HSI for clustering, and do not cope with the scalability issue. Additionally, all the mentioned spectral-spatial clustering algorithms use information derived from single-sensor data.
In recent years, several morphological-based algorithms were developed that exploit spatial information for image processing. In [37], it was proposed for the first time to use morphological profiles (MPs), built with opening and closing by reconstruction operators with different structuring elements (SEs), to segment high resolution satellite images. As a follow-up, in [38], extended MPs (EMPs) were introduced, i.e., MPs extracted from the first principal components of an HSI. MPs and EMPs are powerful tools to extract spatial information, but suffer from a few limitations: (1) the SEs have fixed shapes, and (2) they are not suitable to map all the different characteristics of spatial information. In [39], morphological attribute profiles (APs) were proposed to extract structural information of very high resolution images. In APs, morphological attribute filters (i.e., attribute thinning and thickening) are applied on connected components of a gray-scale image to extract different spatial characteristics of a scene. Therefore, APs can be considered as a generalization of MPs that are better able to capture region-based attributes. In order to handle HSIs with APs, extended morphological APs (EMAPs) were proposed [40]. Additionally, in [41,42], the effect of different feature extraction techniques (e.g., kernel PCA, multi-linear PCA, and the like) on the performances of EMPs and EMAPs was investigated.
Nevertheless, APs may suffer from an under-segmentation problem, as they may group regions belonging to several different objects. To overcome this problem, invariant attribute profiles (IAPs) were proposed recently [43]. IAPs are built in two phases. The first phase is a spatial invariant feature extraction, where isotropic filtering (by convolutional kernels) is employed to extract the convolutional features of the HSI. This process is followed by a simple linear iterative clustering to spatially aggregate the filtered features. In the second phase, continuous APs are obtained by modeling the variant behaviors of pixels of the HSI. The aforementioned procedure is applied in a Fourier polar coordinate system to avoid discretization artifacts and capture frequency features [44].
In this paper, we propose a spectral-spatial multi-sensor-based clustering algorithm. The proposed algorithm extracts information from both HSI and fine spatial-resolution images. Spectral information is extracted from the HSI, while spatial features (i.e., MPs and IAPs) contain the spatial and elevation information from RGB and LiDAR data. In order to fuse and cluster the obtained information from the different sensors, the concept of HESSC is adopted, in which a fusion between the spectral and spatial features is accomplished at each hierarchical level of the binary tree structure.
The following items summarize the proposed contribution of this paper:

1.
We propose a novel multi-sensor sparse-based clustering algorithm that describes the data at different levels of detail.

2.
To the best of our knowledge, this is the first attempt to incorporate spatial information in the form of morphological-based profiles extracted from multi-sensor data sets in a hierarchical sparse-based clustering algorithm.

3.
In the proposed algorithm, both spectral and spatial features equally contribute at each level of the tree.

4.
The proposed algorithm is able to cluster large-scale data sets.

5.
The proposed algorithm can be adapted to different ancillary remote sensing data sets (e.g., RGB, multi-spectral images, HSI, LiDAR, synthetic aperture radar).

Methodology
In this section, we provide a brief overview of the proposed methodology. The workflow of the proposed algorithm is presented in Figure 1. As can be observed from the figure, spatial features are extracted from high spatial-resolution images (i.e., RGB, LiDAR). The process is then followed by establishing a hierarchical (tree) structure to describe the data at various levels of detail. In order to build the tree structure and fuse the spectral and spatial features at the level of the nodes, a lasso-based binary clustering is employed. Finally, if an end-node cannot be divided further, all data points corresponding to this end-node are assigned to one cluster.

Spatial Feature Extraction
We start by describing the spatial features that are obtained from the auxiliary high spatial-resolution image. As spatial features, we propose to employ MPs and IAPs, the concepts of which are described in detail in the following paragraphs.

Morphological Profiles
In [37], extracted MPs rely on two mathematical morphology operators (i.e., opening and closing). These operators are applied on an image with a series of structuring elements (SEs), which can have different shapes (e.g., disc, square, cube) with a user-defined size r. With morphological opening, an eroded image is dilated to capture the bright structures in a scene, while with morphological closing, a dilated image is eroded to filter out the dark structures in the scene. Additionally, to preserve the original image structure, a reconstruction filter is applied after the aforementioned morphological operators [41].
Let us assume that X = [x_1, x_2, ..., x_N]^T ∈ R^{b×N} expresses a high spatial-resolution image, where b is the number of spectral channels (e.g., 3 for an RGB image), and N represents the number of pixels in the image. In addition, the location of each pixel in X is denoted by (x, y) in Cartesian coordinates. Let γ_SE(X) and δ_SE(X) denote the opening and closing by reconstruction operators on X with a set of SEs, respectively. In this study, only disk-shaped SEs with variable size are used. MPs are then computed as:

MP_γ(X) = {γ_SE^r(X) | r = 1, ..., n}, MP_δ(X) = {δ_SE^r(X) | r = 1, ..., n}, (1)

where r is the radius of a disk-shaped SE, and MP_γ and MP_δ are the extracted opening and closing MPs by reconstruction [38]; n is a non-negative integer. For more information, we refer to [37,38]. MPs have been widely applied to extract spatial information in remote sensing applications, although they cannot provide all characteristics of the spatial information [11].
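To illustrate how such a profile is assembled, the sketch below stacks openings and closings of one band with disk-shaped SEs of increasing radius in plain NumPy. It is a simplification: it uses ordinary opening/closing rather than the opening/closing *by reconstruction* of [37,38], and the function names are illustrative.

```python
import numpy as np

def disk(r):
    # Boolean disk-shaped structuring element of radius r.
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    return x * x + y * y <= r * r

def erode(img, se):
    # Grayscale erosion: minimum over the SE neighborhood of each pixel.
    r = se.shape[0] // 2
    padded = np.pad(img, r, mode='edge')
    out = np.full_like(img, np.inf, dtype=float)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if se[dy + r, dx + r]:
                shifted = padded[r + dy:r + dy + img.shape[0],
                                 r + dx:r + dx + img.shape[1]]
                out = np.minimum(out, shifted)
    return out

def dilate(img, se):
    # Grayscale dilation is erosion of the negated image.
    return -erode(-img, se)

def morphological_profile(band, radii):
    """Stack of plain openings and closings of one band, per SE radius.

    Sketch only: omits the reconstruction step used by MPs in [37,38].
    """
    profile = []
    for r in radii:
        se = disk(r)
        profile.append(dilate(erode(band, se), se))   # opening: bright structures
        profile.append(erode(dilate(band, se), se))   # closing: dark structures
    return np.stack(profile)
```

On a band containing a single bright pixel, the opening removes the isolated peak while the closing preserves it, matching the bright/dark behavior described above.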

Invariant Attribute Profiles
In [43], invariant attribute profiles (IAPs) were defined as the concatenation of two different types of invariant features, i.e., spatial invariant features (SIFs) and frequency invariant features (FIFs).
In order to extract the SIFs, initially, robust convolutional features F_RCF are obtained by isotropic filtering of X. For this, each band of the image is convolved with a convolutional kernel in order to aggregate the local spatial information. Then, a simple linear iterative clustering (SLIC), a well-known super-pixel segmentation approach, is applied on X [45]. Eventually, the extracted convolutional features F_RCF are spatially aggregated by the super-pixels. For each pixel i, the SIFs are defined as:

F_SIFs(i) = (1/N_q) ∑_{j ∈ φ_{i,q}} F_RCF(j), (2)

where φ_{i,q} is the pixel set of the q-th super-pixel, containing the i-th targeted pixel and a total of N_q pixels.
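The spatial aggregation step can be sketched as follows: every pixel receives the mean convolutional feature of the superpixel it belongs to. The SLIC segmentation and the convolutional features are assumed given, and `aggregate_by_superpixels` is an illustrative helper, not code from the paper.

```python
import numpy as np

def aggregate_by_superpixels(features, segments):
    """Average the convolutional features inside each superpixel.

    features: (H, W, C) robust convolutional features F_RCF;
    segments: (H, W) integer superpixel labels (e.g., from SLIC).
    Every pixel receives the mean feature of the N_q pixels of the
    superpixel it belongs to -- the spatial aggregation of the SIFs.
    """
    H, W, C = features.shape
    flat_seg = segments.ravel()
    flat_feat = features.reshape(-1, C)
    n_seg = flat_seg.max() + 1
    counts = np.bincount(flat_seg, minlength=n_seg)   # N_q per superpixel
    sums = np.zeros((n_seg, C))
    for c in range(C):
        sums[:, c] = np.bincount(flat_seg, weights=flat_feat[:, c],
                                 minlength=n_seg)
    means = sums / counts[:, None]
    return means[flat_seg].reshape(H, W, C)           # broadcast back to pixels
```

On a 2 × 2 image with two superpixels, each pixel ends up carrying the mean of its segment.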
To derive the frequency invariant features (FIFs), first a polarized Fourier transformation is applied on X. In this way, rotation invariance is obtained by accessing the angular information. From this transformation, three groups of rotation-invariant features are calculated: F_m^1(x, y) are magnitude features corresponding to m different Fourier orders; F_m^2(x, y) are absolute rotation-invariant features, obtained by removing the phase information; F_m^3(x, y) are relative rotation-invariant features, which compensate for the loss of rich phase information. The FIFs are obtained by concatenating these rotation-invariant features and aggregating them into region-based features by using isotropic convolutional filters.
Finally, IAPs are produced by stacking the SIFs and the FIFs: F_IAPs = [F_SIFs ∪ F_FIFs]. More detailed information on the generation of the IAPs can be found in [43].

Sparse Subspace Clustering (SSC)
Let Y denote the input HSI with size D × N, where D is the number of spectral bands. y_i represents the i-th column vector of Y, where i ∈ {1, 2, ..., N} is the pixel index in the HSI, and y_ji, the element in row j and column i, denotes band j of that pixel.
The main assumption in sparse subspace clustering methods is that each data point can be described as a linear combination of a dictionary of data points from the same subspace, and that all pixels are drawn from a union of L linear subspaces {S_l}_{l=1}^L of dimension {d_l}_{l=1}^L [17]. Therefore, Y can be partitioned as Y = [Y_1, Y_2, ..., Y_L], where Y_l ∈ R^{D×N_l} collects the N_l pixels lying in subspace S_l. Remark that the number of pixels of each subspace should be larger than the dimension of the subspace (N_l > d_l). In most subspace-based clustering algorithms, the number and the dimensions of the subspaces need to be known a priori.
SSC addresses the aforementioned issue by employing the data itself as a dictionary. Accordingly, the following sparse optimization problem is obtained:

arg min_C ||C||_1 + (λ/2) ||Y − YC||_F^2  s.t.  diag(C) = 0, (3)

where ||·||_F represents the Frobenius norm [17] and λ is a trade-off parameter between the sparsity and fidelity terms. The constraint diag(C) = 0 is required to avoid a point being represented as a linear combination of itself. The alternating direction method of multipliers (ADMM) solver is employed to solve the sparse optimization problem of Equation (3) [46]. Consequently, the non-zero elements (c_ji) of C identify all data points j that are from the same subspace as i. From C, a symmetrical similarity matrix W = |C| + |C|^T is obtained, to assure that all pixels from the same subspace are connected to each other.
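To make the self-expressiveness step concrete, the sketch below computes the sparse coefficient matrix column by column and forms W = |C| + |C|^T. It substitutes a simple ISTA loop for the ADMM solver used in the paper; `ssc_similarity`, `lam`, and the iteration counts are illustrative choices, not the paper's settings.

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of the L1 norm.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ssc_similarity(Y, lam=0.1, n_iter=200):
    """Sparse self-representation C of the columns of Y (pixels) and the
    symmetric similarity matrix W = |C| + |C|^T.

    Each pixel y_i is written as a sparse combination of the other pixels,
    with diag(C) = 0.  The paper solves this with ADMM; a plain ISTA loop
    per column is used here instead, purely as an illustration.
    """
    D, N = Y.shape
    step = 1.0 / (np.linalg.norm(Y, 2) ** 2 + 1e-12)   # 1 / Lipschitz constant
    C = np.zeros((N, N))
    for i in range(N):
        c = np.zeros(N)
        for _ in range(n_iter):
            grad = Y.T @ (Y @ c - Y[:, i])             # gradient of fidelity term
            c = soft_threshold(c - step * grad, step * lam)
            c[i] = 0.0      # a point may not be represented by itself
        C[:, i] = c
    return C, np.abs(C) + np.abs(C).T
```

On two orthogonal one-dimensional subspaces, the similarity matrix connects points within each subspace and leaves cross-subspace entries at zero.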
As the final step, to reduce the dimensionality of the problem, a spectral clustering algorithm [47] is applied on W. For this, the normalized Laplacian matrix is computed as L = I − D^{−1/2} W D^{−1/2}, where D is the diagonal degree matrix of W. Then, singular-value decomposition (SVD) is applied on L to obtain the eigenvectors V ∈ R^{N×A} corresponding to the A smallest non-zero eigenvalues of L. Finally, clustering (we used K-means clustering) is applied on V to generate the clustering map [17,47].
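A minimal sketch of this spectral clustering step follows, assuming the symmetric normalized Laplacian and using an eigendecomposition (equivalent to SVD for a symmetric matrix). For brevity it keeps the A smallest eigenvalues including zeros, and the final K-means on the rows of V is omitted.

```python
import numpy as np

def spectral_embedding(W, A):
    """Spectral embedding of a similarity matrix W.

    Computes the symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}
    and returns the eigenvectors V (N x A) for the A smallest eigenvalues;
    K-means on the rows of V then yields the clustering map.
    """
    d = W.sum(axis=1)                                   # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)                # ascending eigenvalues
    return eigvecs[:, :A]
```

For a similarity matrix with two disconnected pairs of pixels, the embedding rows are identical within each pair and differ across pairs, so K-means on the rows separates the two groups.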
The main advantage of SSC is that the clustering is applied on the similarity matrix of the sparse coefficient matrix, rather than directly on the data. In this way, the subspace structure of the data is accounted for, and makes the clustering more robust to the initialization compared to the conventional clustering algorithms (e.g., K-means, FCM) [48].
Although SSC performs well to cluster HSIs, it also has shortcomings. SSC has a scalability issue, in which it requires to use all data points to solve the optimization problem in Equation (3), which can be computationally expensive in the case of large data sets. Another shortcoming of SSC is that the algorithm only uses spectral information and does not take into account the spatial context of the HSI.

Hierarchical Sparse Subspace Clustering (HESSC)
Several sparse subspace-based clustering algorithms have been proposed in order to tackle the scalability issue [14,27,28,49]. In our earlier work [28], we proposed an automatic hierarchical sparse subspace-based clustering (HESSC) algorithm to analyze HSIs. HESSC is able to describe an HSI at different hierarchical levels in a robust and automatic manner by employing a binary tree structure.
Starting from all data points in the HSI, at each level, nodes are split into two groups. Let us denote Y_{l_i} as the collection of all the data points in one of the parental nodes at level l_i. Then,

Y_{l_i} = Y_{1,l_{i+1}} ∪ Y_{2,l_{i+1}},

where Y_{r,l_{i+1}} (r = 1, 2) denotes the collection of all data points in each of the two child nodes at level l_{i+1}. The criterion to split the nodes is based on the generation of a binary matrix, obtained by the following sparse optimization problem:

arg min_{C_{l_i}} ||y_j − Y_{l_i} C_{l_i}||_2^2 + λ ||C_{l_i}||_1, (4)

where C_{l_i} is the computed sparse coefficient vector derived from Y_{l_i}, and y_j is the j-th column vector of Y_{l_i}, with j randomly selected. To solve the above optimization problem, the lasso version of the least angle regression algorithm is used [50]. Subsequently, the computed C_{l_i} is normalized and thresholded (with a user-defined threshold) to generate a binary matrix (BM) (Figure 2). In order to assure the robustness of HESSC, the aforementioned procedure is repeated 100 times, with randomly selected y_j, each time generating a binary matrix. Subsequently, the generated BMs are fed to an entropy-based consensus clustering algorithm (ECC) [51] to produce the final binary matrix that eventually splits the current node into two child nodes. A stopping criterion is applied to decide whether a node should be further split. The stopping criterion is either manual, when the required number of clusters is known a priori and pre-defined by the user, or automatic, based on a singular value decomposition of the data [52]. When a node is not split any further, its corresponding data points are labeled as one cluster.
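The node-splitting procedure can be sketched as follows. This is an illustration under stated simplifications: `lasso_code` uses an ISTA loop rather than the LARS-based lasso solver of the actual algorithm, and a plain majority vote stands in for the ECC consensus step; both function names are hypothetical.

```python
import numpy as np

def lasso_code(Y, j, lam=0.1, n_iter=100):
    # ISTA solution of the lasso problem min 0.5*||y_j - Y c||^2 + lam*||c||_1.
    step = 1.0 / (np.linalg.norm(Y, 2) ** 2 + 1e-12)
    c = np.zeros(Y.shape[1])
    for _ in range(n_iter):
        g = c - step * (Y.T @ (Y @ c - Y[:, j]))
        c = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)
    return c

def binary_split(Y, n_rep=50, tau=0.5, seed=0):
    """Split the pixels of one tree node into two candidate groups.

    Each repetition sparsely codes a random pivot pixel y_j against all
    pixels of the node, normalizes the coefficients to [0, 1], and
    thresholds them at tau into one binary-matrix row.  A majority vote
    combines the rows -- a simplified stand-in for the entropy-based
    consensus clustering (ECC) used by the actual algorithm.
    """
    rng = np.random.default_rng(seed)
    N = Y.shape[1]
    votes = np.zeros(N)
    for _ in range(n_rep):
        c = np.abs(lasso_code(Y, rng.integers(N)))
        votes += (c / (c.max() + 1e-12)) > tau   # one thresholded BM row
    return (votes >= n_rep / 2).astype(int)      # 0/1 child-node assignment
```

For pixels drawn from two orthogonal subspaces, pixels of the same subspace always receive identical votes, so they end up in the same child node.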
Although HESSC is able to cluster an HSI in a robust and fast manner, only spectral information is utilized. In this work, we will extend HESSC to include spatial information that is derived from high spatial-resolution data from the same scene as the HSI.

Multi-Sensor Spectral-Spatial Sparse-Based Clustering (Multi-SSC)
One way of fusing spectral and spatial information is to extract spatial features from an HSI, and combining (e.g., concatenating) the extracted spatial features with the spectral channels [53]. However, such a single-sensor integration approach might not contain sufficient spatial and contextual information. We therefore propose to explicitly incorporate the spatial information derived from high spatial-resolution data sets (i.e., high spatial-resolution RGB and LiDAR images) [19].
In the proposed Multi-SSC, a hierarchical tree approach similar to the HESSC algorithm is used. However, at each level of the binary tree, a parental node is split based on binary matrices, each of which is generated from either spectral or spatial information. To generate a BM from spectral information, the sparse optimization problem of Equation (4) is solved and the obtained sparse coefficient matrix is thresholded. To generate a BM from spatial information, the following sparse optimization problem is defined:

arg min_{C_{l_i}} ||F_SFE_j − F_SFE_{l_i} C_{l_i}||_2^2 + λ ||C_{l_i}||_1, (5)

where F_SFE_{l_i} denotes the extracted spatial features (either MPs or IAPs) from all data points at the specific parental node of level l_i, and F_SFE_j is the j-th column vector of F_SFE_{l_i}, which has been randomly selected. The obtained sparse coefficient matrix is then thresholded to generate the BM.
To cope with uncertainties of the model [54] that may be introduced by the random selection of both F SFE j and y j , we apply the ECC algorithm [51]. Additionally, to include both spectral and spatial information in a balanced way, an equal number of BMs are generated from the optimizations of Equations (4) and (5). Depending on the specific application, this ratio can be adapted. In this work, we choose to apply each of the optimizations 50 times. The total of 100 BMs are then processed by the ECC algorithm to produce the final binary matrix, which is used for the further division of the current node. The stopping criterion is the same as applied in Section 2.3. The final clustering map is generated by labeling the nodes of the tree, which are not further split.
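The balanced fusion of the two sources can be sketched in a few lines: equal numbers of binary-matrix rows from the spectral and spatial optimizations are stacked and combined. `fused_split` is an illustrative helper with a majority vote standing in for the ECC consensus step, not the paper's implementation.

```python
import numpy as np

def fused_split(spectral_bms, spatial_bms):
    """Fuse equal numbers of spectral and spatial binary-matrix rows.

    spectral_bms / spatial_bms: lists of 0/1 arrays of length N (e.g., 50
    rows each, from Equations (4) and (5)).  Stacking equally many rows
    from both sources lets spectral and spatial information contribute
    equally to the split of the node; the majority vote below is a
    simplified stand-in for the ECC consensus step.
    """
    assert len(spectral_bms) == len(spatial_bms), "keep the sources balanced"
    bms = np.vstack(list(spectral_bms) + list(spatial_bms))
    return (bms.mean(axis=0) >= 0.5).astype(int)
```

A pixel is assigned to a child node when at least half of all rows, spectral and spatial combined, vote for it.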

Data Acquisition and Description
In order to evaluate the generalization capability of the proposed method, we take into account two important factors for the selection of data sets: (1) the investigated data sets cover different scene types (i.e., geological and rural areas) and (2) different sensor combinations and types are investigated (i.e., HSI, RGB, and LiDAR). For this reason, we chose the following data sets: (1) Czech data: The first data set was acquired near the Litov tailing lake and its adjacent waste heap, situated in the Sokolov district of the Czech Republic. Both HSI and RGB images were acquired. The HSI was acquired by a hyperspectral frame-based camera (0.6 Mp Rikola Hyperspectral Imager), which was deployed on a hexacopter unmanned aerial vehicle (UAV; Aibotix Aibot X6v2) with a pre-programmed stop-scan-motion flight plan to capture a complete set of HSIs for the subsequent image mosaicking. The RGB image was captured by a senseFly S.O.D.A. RGB camera, deployed on a fixed-wing UAV. The spatial resolution of the captured RGB image is 1.5 cm; it was downsampled to the size of the HSI data (1066 × 909 pixels), which has a spatial resolution of 3.3 cm and is composed of 50 spectral bands ranging from 0.50 to 0.90 µm. Figure 3 illustrates the acquired RGB image of the Czech data set.
Finland data: The second data set was captured over an outcrop of the Archean Siilinjärvi glimmerite-carbonatite complex in Finland, which is currently mined for large phosphate-rich apatite occurrences used in fertilizer production [55]. In the Finland data set, the same instruments were employed to acquire the HSI and RGB data. The HSI and downsampled RGB images are composed of 718 × 1848 pixels. Figure 4 displays the acquired RGB image of the Finland data set.
Trento data: The third data set was captured over a rural area in the south of the city of Trento, Italy. It consists of LiDAR and HSI data that are composed of 600 × 166 pixels with a spatial sampling of 1 m. The HSI was acquired by the AISA Eagle sensor, and contains 63 spectral bands ranging between 0.40 and 0.98 µm. The LiDAR data were captured by the Optech ALTM 3100EA sensor. The color-composite image of the HSI from the Trento data is shown in Figure 5.
For the fusion procedure, the co-registered high spatial-resolution RGB/LiDAR images have already been downsampled to their corresponding HSIs. In the case of the geological data sets, the resampling was carried out by using the nearest-neighbor method in Quantum GIS (version 3.4, QGIS development team, Open Source Geospatial Foundation). The internal image co-registration of the hyperspectral scans was done by registering all bands on a distinct master band per hypercube, using the MEPHySTo image processing toolbox [56], which was used for pre-processing of the hypercubes. Georeferencing of all images was done on high-resolution drone-based RGB orthophotos, created by structure-from-motion photogrammetry in Agisoft Photoscan (version 1.4, Agisoft LLC, St. Petersburg, Russia). The UAV-based, GPS-tagged orthophotos were acquired over the target area and their positioning was refined by using ground control points. Further data acquisition details are referenced in [57] for the Czech data set and in [58] for the Finland data set.

Experimental Setup
Quantitative and qualitative analyses were conducted to evaluate the performance of our proposed algorithm. The clustering results obtained by Multi-SSC with MPs and IAPs were compared with the results of two conventional clustering algorithms, K-means and FCM, along with the scalable clustering algorithms LSC [33], ESC [14], and HESSC [28]. Additionally, to allow a fair comparison, the concatenation of the HSI and RGB/LiDAR images (e.g., the combination of HSI and RGB has D + 3 features) is fed into all of the applied clustering algorithms to evaluate their performances. Moreover, to assess the effect of extracted IAPs on the clustering performance, IAPs are stacked with the spectral bands of the HSI and used as the input for the competitive clustering algorithms. In the remainder, the notation AL_sensor is used, where AL is the name of the applied clustering algorithm and the subscript refers to the applied data (HSI: only spectral data; HSI+RGB and HSI+LiDAR: the concatenation of spectral bands and RGB/LiDAR bands; HSI+IAPs: the concatenation of spectral bands and IAPs).
All utilized parameters in ESC were set to the default parameters proposed by the developers [14]. In LSC, the K-means sampling strategy was used to produce the landmark points. In HESSC, the threshold τ in the lasso-based binary clustering was set to 0.5, the default of the developers [28]. For the MPs, suitable radii r for the SEs were selected for the first and second data sets by visual inspection. In the first data set, r = [1, 5, 10, 15, 20, 40, 60, 80, 100] was chosen; in the second data set, r = [10, 20, 40, 60, 80, 100]; and for the last data set, r = [10, 20, 40, 60] was used, as in [19]. Regarding the parameters used in extracting the IAPs, we followed the same setup as suggested by the developers in [43], where the number of scaled convolutions and their radii were set to 3 and [2, 4, 6], respectively. The Fourier orders to derive the frequency features were set to m = [0, 1, 2], and for all data sets, we used the first three principal components to generate the IAPs.
All quantitative results are reported as an average of 10 runs. All the applied algorithms in this study were implemented in MATLAB version 2019b on a computer with an Intel Core i9-7900X CPU and 128 GB RAM.

Evaluation Metrics
Three commonly used evaluation metrics are employed: overall accuracy (OA), average accuracy (AA), and kappa coefficient (κ). The ground truth data set is represented by G = [g 1 , g 2 , ..., g N ].
The clustering result can be presented as M = [m_1, m_2, ..., m_N], where m_i ∈ {1, ..., L} is the index of the class label assigned to each pixel. To assess the clustering performance, m_i needs to be matched with g_i. We thus employed m_i = bestMap(g_i, m_i) as a matching function, based on the Hungarian algorithm, where m_i is the clustering map for which the best match between g_i and m_i is produced by bestMap(.) [59]. Consequently, OA is defined as

OA = (1/N) ∑_{i=1}^{N} δ(m_i, g_i), (6)

where δ(a, b) = 1 if a = b and 0 otherwise. In addition to the three aforementioned metrics, two generally applied clustering indices are employed: (1) The adjusted rand index (ARI) is used as a similarity (or dissimilarity) measure between two partitions and is an adjusted version of the original rand index [60]. According to [60], ARI can be formulated as:

ARI = [∑_{ij} C(n_ij, 2) − ∑_i C(n_i+, 2) ∑_j C(n_+j, 2) / C(n, 2)] / [(1/2)(∑_i C(n_i+, 2) + ∑_j C(n_+j, 2)) − ∑_i C(n_i+, 2) ∑_j C(n_+j, 2) / C(n, 2)], (7)

where C(n, 2) = n(n − 1)/2, n_ij = |m_i ∩ g_j|, and n_i+ and n_+j are equal to ∑_j n_ij and ∑_i n_ij, respectively. The range of ARI is usually between 0 and 1. An ARI closer to one means that M and G highly agree. However, it is also possible that the ARI becomes negative, indicating that the agreement between M and G is even less than what is expected from a random result.
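The ARI computation reduces to pair counts over the contingency table between the clustering map and the ground truth; a minimal sketch (illustrative, not the evaluation code used in the experiments):

```python
import numpy as np

def adjusted_rand_index(m, g):
    """ARI between a clustering map m and ground truth g (1-D label arrays).

    Built from the contingency table n_ij, its row sums n_i+ and column
    sums n_+j, using pair counts C(n, 2) = n(n - 1)/2.
    """
    m = np.asarray(m)
    g = np.asarray(g)

    def pairs(x):
        return x * (x - 1) / 2.0

    # Contingency table: n_ij = number of pixels in cluster i and class j.
    _, m_inv = np.unique(m, return_inverse=True)
    _, g_inv = np.unique(g, return_inverse=True)
    table = np.zeros((m_inv.max() + 1, g_inv.max() + 1))
    np.add.at(table, (m_inv, g_inv), 1)

    sum_ij = pairs(table).sum()
    sum_i = pairs(table.sum(axis=1)).sum()    # from row sums n_i+
    sum_j = pairs(table.sum(axis=0)).sum()    # from column sums n_+j
    expected = sum_i * sum_j / pairs(len(m))  # expected index under chance
    max_index = 0.5 * (sum_i + sum_j)
    return (sum_ij - expected) / (max_index - expected)
```

A perfect (label-permuted) clustering scores 1, while a partition at odds with the ground truth can score below 0, matching the behavior described above.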
(2) The normalized mutual information (NMI) is calculated based on the mutual information between M and G, normalized so that NMI ranges between 0 and 1 [61]. NMI can be calculated as:

NMI = ∑_{ij} n_ij log(n n_ij / (n_i+ n_+j)) / sqrt((∑_i n_i+ log(n_i+/n)) (∑_j n_+j log(n_+j/n))), (8)

Similar to ARI, the agreement between M and G is higher when NMI is closer to 1.
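The NMI can be computed from the same contingency counts as the ARI; again a sketch for illustration, not the evaluation code of the experiments:

```python
import numpy as np

def normalized_mutual_information(m, g):
    """NMI between clustering map m and ground truth g, from the
    contingency counts n_ij, n_i+, n_+j."""
    m = np.asarray(m)
    g = np.asarray(g)
    n = len(m)
    _, m_inv = np.unique(m, return_inverse=True)
    _, g_inv = np.unique(g, return_inverse=True)
    table = np.zeros((m_inv.max() + 1, g_inv.max() + 1))
    np.add.at(table, (m_inv, g_inv), 1)

    ni = table.sum(axis=1)                    # n_i+
    nj = table.sum(axis=0)                    # n_+j
    nz = table > 0                            # skip empty cells (0 log 0 = 0)
    mutual = (table[nz] * np.log(n * table[nz] / np.outer(ni, nj)[nz])).sum()
    hm = (ni * np.log(ni / n)).sum()          # negated entropy of M
    hg = (nj * np.log(nj / n)).sum()          # negated entropy of G
    return mutual / np.sqrt(hm * hg)
```

A label-permuted perfect clustering yields NMI 1, and a clustering statistically independent of the ground truth yields NMI 0.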

Results and Discussion
Results of all experiments are summarized in Tables 1-3 for the Czech, Finland, and Trento data sets, respectively.

The Czech Data Set
The Czech data set contains 8 classes: (1) Vegetation, (2) Acid mine drainage-stream, (3) Acid mine drainage-solid, (4) Soil-slipped, (5) Clay-high, (6) Clay-flood zone, (7) Old-stream bed, and (8) Artificial objects. Table 1 summarizes all clustering results. The proposed method outperforms all other clustering approaches. The clustering performance of HESSC_HSI is weak compared to K-means_HSI, FCM_HSI, and LSC_HSI, which may be due to the high similarity between the spectral signatures of the different classes. The concatenation of the extracted IAPs improves the accuracy of HESSC_HSI+IAPs by 5% over HESSC_HSI. Another 12% improvement is observed when fusing the spectral features and IAPs in the proposed Multi-SSC, compared to concatenating them in HESSC_HSI+IAPs. With respect to the clustering results of individual classes, one can observe that classes 2 and 8 are accurately mapped using Multi-SSC (IAPs), while classes 4 and 6 are better mapped using Multi-SSC (MPs). ESC_HSI+RGB is able to capture class 2 perfectly, while both versions of LSC perform very poorly on class 2. Figure 6 shows the obtained clustering maps (results with concatenated IAPs are not shown). It can be observed that K-means, FCM, and LSC capture the general structure, yet both their single- and multi-sensor versions generate heterogeneous clustering maps. The clustering maps obtained by Multi-SSC with MPs and IAPs are more homogeneous.
In general, the clustering results on the Czech data show that incorporating spatial information by stacking the derived IAPs with the spectral bands of the HSI improves the single-sensor versions of the algorithms. However, fusing the IAPs with the HSI using the proposed Multi-SSC leads to the most accurate result. These results can be attributed to the structure of Multi-SSC: in its fusion strategy, spectral and spatial features contribute equally, whereas in the competing algorithms, i.e., K-means, FCM, LSC, and ESC, it is not possible to balance the contribution of spectral and spatial features. On the Czech data, the obtained ARI and NMI values show that in K-means, FCM, and LSC, the mere concatenation of the IAPs extracted from the HSI does not improve the results. In ESC, however, stacking the extracted IAPs with the HSI improves ARI and NMI by almost 0.31 and 0.21, respectively. This can be attributed to the ability of ESC to select exemplars that are representative of the entire data set.

The Finland Data Set
With respect to Table 2, Multi-SSC shows the highest accuracy, both with MPs and with IAPs. HESSC_HSI performs better than the other considered spectral-based clustering algorithms; however, the proposed algorithm (using IAPs) surpasses the original HESSC_HSI by almost 17% in terms of OA. The inclusion of a high spatial-resolution RGB image in the K-means algorithm improves the clustering accuracy by 10%; replacing the RGB bands with IAPs does not further improve the results. With respect to the clustering results for individual classes, ESC_HSI+IAPs outperforms the other approaches on class 2 (Glimmerite-Carbonatite). Overall, however, ESC_HSI+IAPs performs poorly compared to its competitors, which can be attributed to its dependence on initial conditions, i.e., the random selection of the first exemplars. HESSC_HSI+IAPs and ESC_HSI capture class 3 (Glimmerite) better than the other applied algorithms. Figure 7 presents the obtained clustering maps. Multi-SSC, both with MPs and with IAPs, generates homogeneous regions and preserves the structures within the data set. In particular, for both geological data sets, it is desirable to obtain homogeneous clusters corresponding to individual geological features of interest. As can be observed from Table 2, the concatenation of the extracted IAPs with the HSI in K-means, FCM, LSC, and ESC considerably increases the calculated ARI and NMI. This shows that the extracted IAPs are informative and increase the performance of the clustering algorithms. Nevertheless, the fusion strategy of Multi-SSC plays a decisive role: the equal contribution of the extracted IAPs and the HSI results in the highest ARI and NMI.

The Trento Data Set
The Trento data set contains six classes: (1) Apple trees, (2) Buildings, (3) Ground, (4) Wood, (5) Vineyard, and (6) Roads. As shown in Table 3, Multi-SSC with IAPs also performs best on this data set (OA = 71.90%), and Multi-SSC with MPs (OA = 65.12%) is the second-best clustering algorithm. The clustering maps are presented in Figure 8. Overall, Multi-SSC (IAPs) generates a smooth clustering map compared to the other considered clustering algorithms. The clustering results in Table 3 show that the proposed method with IAPs mis-clusters class 1 (Apple trees) as class 5 (Vineyard), while Multi-SSC with MPs is able to cluster class 1 correctly. This mis-clustering in Multi-SSC with IAPs can be attributed to the parameters used in the spatial feature extraction phase. More specifically, the radii used in the scaled convolutions lead to over-smoothed clustering results, as shown in Figure 8m. This can be alleviated by modifying the radii of the scaled convolutions; such modifications may, however, decrease the overall accuracy.

Discussion
Experimental results on all three data sets confirm that spectral-based clustering algorithms using single-sensor data sets obtain less distinct clustering results for the different classes of interest (e.g., Wood, Vineyard). The obtained values of OA, AA, κ, ARI, NMI, and t (seconds) of the applied clustering algorithms show that including multi-sensor data sets in the clustering procedure improves the final clustering results. Moreover, the incorporation of spatial information was investigated using extracted spatial features, comparing IAPs with MPs. The use of IAPs significantly improves the clustering performance, indicating that the extracted IAPs contain more informative spatial features than MPs, mainly because they capture spatial information from both the spatial and frequency domains.
Multi-SSC was designed to fuse the rich spectral, spatial, and contextual information derived from multi-sensor data sets, and it can be adapted to process other data types. Multi-SSC benefits from a hierarchical structure, which enables the algorithm to describe multi-sensor data sets at different levels of detail. In addition, Multi-SSC employs the sparse representation matrix to cluster data sets; this information allows the algorithm to explore and identify the relations between data points from the same subspace. Due to its efficient structure, Multi-SSC is able to handle large-scale data sets using a limited number of data points. Multi-SSC utilizes both spectral and spatial information to generate the final clustering map. As can be seen in Tables 1-3, the best results are obtained with the proposed algorithm. Multi-SSC is thus able to fuse and exploit spatial information more accurately than simply concatenating two data sets. In addition, Multi-SSC ensures a balanced use of the spectral and spatial information in its clustering procedure.
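The balanced-fusion idea can be illustrated with a toy sparse-subspace example. The sketch below is a simplified stand-in for the Multi-SSC fusion step, under assumed components (lasso self-representation per view, equally weighted affinities, and off-the-shelf spectral clustering from scikit-learn); it is not the authors' lasso-based binary hierarchy or ECC step.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

def sparse_affinity(X, lam=0.01):
    """Self-representation affinity: each column of X (one data point) is
    expressed as a sparse combination of the other columns, as in sparse
    subspace clustering; the coefficients form the affinity matrix."""
    d, n = X.shape
    C = np.zeros((n, n))
    for i in range(n):
        idx = [j for j in range(n) if j != i]  # exclude the point itself
        model = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
        model.fit(X[:, idx], X[:, i])
        C[idx, i] = model.coef_
    return np.abs(C) + np.abs(C).T  # symmetrize

def fuse_and_cluster(X_spec, X_spat, k):
    """Balanced fusion: spectral and spatial affinities contribute equally,
    then spectral clustering assigns the final labels."""
    A = 0.5 * sparse_affinity(X_spec) + 0.5 * sparse_affinity(X_spat)
    sc = SpectralClustering(n_clusters=k, affinity="precomputed",
                            random_state=0, assign_labels="discretize")
    return sc.fit_predict(A)
```

On two synthetic views whose points lie on two one-dimensional subspaces each, the fused affinity is block-diagonal and the two groups are recovered; in contrast, concatenating the views would let whichever view has the larger dynamic range dominate.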
Therefore, it can be summarized that Multi-SSC, both with MPs and with IAPs, performs well in capturing the classes of interest in both rural and geological areas. Multi-SSC consistently performs better with IAPs than with MPs. This can be attributed to the fixed shape of the SEs in MPs, which cannot fully exploit the spatial characteristics of the regions within a scene. Additionally, due to their fixed parameters, IAPs can be extracted automatically, which makes them more efficient than MPs for unsupervised learning purposes.

Table 3. Quantitative assessment of the performances of the clustering algorithms applied to the Trento data. The clustering accuracy is reported using OA, AA, κ, ARI, and NMI. In addition, the processing time is reported as well.

Processing times are reported in Tables 1-3. Overall, K-means and LSC are fast compared to the other applied clustering algorithms. K-means is a distance-based algorithm that measures the Euclidean distance between each data point and its corresponding centroid, which does not demand high computational power. In LSC, landmark points are identified by adopting the K-means algorithm, leading to similar processing times. Based on the reported ARIs and NMIs on the different data sets, K-means and LSC also performed similarly. According to the reported processing times for all data sets, Multi-SSC, both with MPs and with IAPs, takes more time than K-means, FCM, LSC, and HESSC. This can be attributed to (1) the spatial feature extraction step, which can be time-consuming when extracting spatial information from larger data sets, and (2) the ECC step, which generates the final clustering result from a binary matrix composed of 100 individual binary clustering results [28,51]. Nonetheless, with respect to the calculated ARI and NMI, the most accurate performance on all data sets is obtained by Multi-SSC with IAPs.
This observation shows that the proposed algorithm is able to effectively fuse the spatial/contextual and spectral information from multi-sensor data sets. As another observation, among all the applied clustering algorithms, ESC is the slowest, which can be attributed to the high number of exemplars it selects. As the number of exemplars increases, not only does the processing time increase, but the capability of ESC to handle large-scale data sets also drops drastically.

Conclusions
In this paper, we proposed Multi-SSC, a novel multi-sensor clustering algorithm that integrates high spectral- and spatial-resolution images. In Multi-SSC, invariant attribute profiles (IAPs) are extracted to incorporate the spatial information derived from a fine spatial-resolution image (RGB/LiDAR). Then, the hierarchical sparse subspace clustering algorithm (HESSC) is adopted to fuse the spectral information derived from an HSI with the extracted IAPs and produce the final clustering map. The clustering performance of Multi-SSC was validated on three different data sets. Two data sets were acquired over mining sites and used to identify the minerals of interest; the third is the well-known Trento benchmark data set, containing HSI and LiDAR data over a rural area. Multi-SSC, along with several conventional and state-of-the-art subspace clustering algorithms, was applied to all three data sets.
The experiments show that, in most cases, concatenating IAPs with the spectral bands of an HSI is superior to merely stacking the HSI bands with RGB or LiDAR bands. Furthermore, fusing the multi-sensor data sets using Multi-SSC considerably increases the clustering accuracies. The clustering maps obtained by Multi-SSC contain homogeneous areas and preserve the structures present in the scene. The proposed multi-sensor algorithm evenly balances the contribution of spectral and spatial information and allows the ratio of spectral to spatial information to be adapted to the application at hand.