Hyperspectral Image Super-Resolution Based on Spatial Group Sparsity Regularization Unmixing

Abstract: A hyperspectral image (HSI) contains many narrow spectral channels and therefore carries rich information in the spectral domain. However, high spectral resolution usually comes at the cost of lower spatial resolution, as a result of sensor limitations. Hyperspectral super-resolution aims to fuse a low spatial resolution HSI with a conventional high spatial resolution image, producing an HSI with high resolution in both the spectral and spatial dimensions. In this paper, we propose a spatial group sparsity regularization unmixing-based method for hyperspectral super-resolution. The HSI is pre-clustered using an improved Simple Linear Iterative Clustering (SLIC) superpixel algorithm to make full use of the spatial information. A robust sparse hyperspectral unmixing method is then used to unmix the input images. Finally, the endmembers extracted from the HSI and the abundances extracted from the conventional image are fused, ensuring that the method makes full use of both the spatial structure and the spectra of the images. The proposed method is compared with several related methods on public HSI data sets. The results demonstrate that the proposed method has superior performance compared to the existing state of the art.


Introduction
Hyperspectral imagery (HSI) can profile materials and organisms through hundreds of narrow spectral bands corresponding to different wavelengths, information that traditional images cannot provide. Hyperspectral imaging has developed significantly in recent years and has been utilized in many computer vision tasks, including remote sensing [1], medical imaging [2], object classification [3,4], and target tracking [5,6].
To obtain an HSI with a good signal-to-noise ratio, the hyperspectral camera must use a long exposure time to ensure that enough photons are captured. Constrained by sensor technology, current HSIs usually have low spatial resolution [7] and suffer from the spectral mixing effect, which causes every single pixel to contain spectra from several materials. This makes the exploitation of hyperspectral images a big challenge, difficult to overcome due to physical limits. HSI super-resolution is a fundamental and essential problem in the field of hyperspectral imaging, based on the simple idea of fusing the spectral information of hyperspectral images with the spatial information of conventional images. In order to avoid misunderstandings, the term "conventional images" is used in this paper to represent RGB images and multispectral images (MSIs), which have at least an order of magnitude fewer spectral bands than HSIs. Thus, HSI super-resolution can also be referred to as hyperspectral and multispectral image fusion (HS-MS image fusion).
There have been many HSI super-resolution approaches in recent years. Spatial and spectral regularization is an essential and widely used idea [8]. Most HSI super-resolution methods seek breakthroughs in making full use of spatial and spectral information. From the point of view of image processing, the problem is a special case of image fusion. In particular, the HSI super-resolution problem can be seen as a generalisation of pan-sharpening [9][10][11]; however, pan-sharpening only improves the spatial resolution to some extent. To further improve the quality of super-resolution, other fusion methods, such as matrix factorization-based methods [12,13], maintain the spatial structures of the HSI. Unmixing-based methods [12,13] unmix the latent HSI into endmember and abundance matrices and then impose appropriate structures on those factored matrices to regularize the super-resolution problem. Tensor factorization-based [14,15] and deep learning-based methods have also been developed in the literature [16,17]. A brief review of the related works is given in Section 2.
An HSI can equivalently be defined as a stack of images, each representing the radiance in a particular spectral band. Pixels in an HSI often contain a mixture of multiple materials. Hence, HSIs can also be seen as three-dimensional data cubes, with two spatial dimensions and one spectral dimension. This observation links hyperspectral super-resolution to the well-known problem of hyperspectral unmixing. Hyperspectral unmixing characterizes the complex mixture in a spectral signal as a combination of pure spectral compositions (called endmembers) and their corresponding weights (called abundances) [18]. There are two types of mixing models for HSIs: the linear mixing model and various nonlinear mixing models. The simplest model that fits the data is usually the most effective. The linear mixing model has been widely used, as it is simple and has a clear physical meaning [19]. It assumes that each pixel of the HSI is a linear combination of endmembers [19], which leads to computationally efficient solutions.
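The linear mixing model described above can be sketched in a few lines of NumPy. The dimensions, endmember matrix, and abundances below are synthetic illustrations, not data from the paper:

```python
import numpy as np

# Linear mixing model: each pixel spectrum s (L bands) is a weighted sum
# of P endmember spectra, s = E @ a, with non-negative weights summing to 1.
L, P = 100, 4                         # bands, endmembers (illustrative)
rng = np.random.default_rng(0)
E = rng.random((L, P))                # columns: pure-material spectra
a = np.array([0.5, 0.3, 0.15, 0.05])  # abundances: a >= 0, sum(a) == 1
s = E @ a                             # observed mixed pixel spectrum
assert s.shape == (L,)
```

Because the model is linear, a whole image of pixels stacked as columns mixes in one matrix product, `S = E @ A`, which is the matrix form used throughout the paper.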
In the HSI super-resolution method proposed in this paper, we unmix the low spatial resolution HSI and the high spatial resolution conventional image separately, following which the high spatial resolution HSI is estimated. The hyperspectral unmixing method introduced in this paper is based on spatial group sparsity regularization and the ℓ2,1 norm. Simple Linear Iterative Clustering (SLIC) superpixels are introduced to build a spatial group sparsity regularizer, which is combined with the sparse unmixing framework to exploit the spatial structure and sparsity. Furthermore, the ℓ2,1 norm is adopted in the loss function to restrain noise and outliers. Finally, the alternating direction method of multipliers (ADMM) algorithm is utilized to solve the robust sparse unmixing problem.
In summary, this work mainly contributes in the following aspects:
• We propose an HSI super-resolution approach that simultaneously unmixes the two input images into their endmembers and associated abundances.
• The advantage of the proposed unmixing lies in taking the spatial correlation, as well as noise and outliers, into consideration, providing a solution that combines SLIC superpixels and robust sparse unmixing.
• We test our approach on a widely used standard benchmark, the "Harvard data set", and several remotely sensed hyperspectral images. The experimental results demonstrate that the proposed approach is superior to other related state-of-the-art methods.
The rest of this paper is organized as follows. Section 2 presents some related works. Section 3 details the proposed method. Section 4 gives the experimental settings. The results and analysis are presented in Section 5, and the conclusions are given in Section 6.

Related Work
Hyperspectral imaging has been widely adopted in the field of remote sensing over the past two decades, and has become popular in the field of traditional computer vision. It is difficult to obtain a high spatial resolution HSI, due to hardware limitations [20], which has led to a notable amount of research into HSI super-resolution. HSI super-resolution is also referred to as HS-MS image fusion as it fuses the high-resolution spectral information of an HSI and high-resolution spatial information of an MSI. We roughly divide the related work into the following categories: (1) Pan-sharpening-based methods; (2) Matrix factorization-based methods; (3) Tensor factorization-based methods; and (4) Deep Learning-based methods. The recent development of HS-MS image fusion is illustrated as follows.
Pan-sharpening fuses an HSI with a high spatial resolution panchromatic image. It is considered a special case of HS-MS image fusion. Gomez et al. [9] first introduced a pan-sharpening-based fusion method by using a wavelet transform. Then, Zhang et al. [10] proposed an improved method using a 3D wavelet transform. However, the performance of wavelet transform-based methods is highly restricted by spectral resampling. With the development of pan-sharpening techniques, more advanced attempts have been made to solve HS-MS image fusion problems. For example, a framework has been proposed in [11] that uses pan-sharpening to divide the spectrum of the hyperspectral data into several regions. A synthetic image resampled from an MSI is then used as a high-resolution image in the spectral range covered by the hyperspectral bands. Selva et al. [21] proposed a new framework, termed hypersharpening. It adapts multi-resolution analysis (MRA)-based pan-sharpening methods to HS-MS fusion. In this work, a high-resolution image is synthesized for each hyperspectral band, as a linear combination of MSIs, by linear regression.
Matrix factorization-based methods are mainly based on the assumption that an HSI can be approximated by a basis and corresponding coefficients. Depending on the basis, matrix factorization-based methods can be divided into two sub-categories. Methods based on a spectral basis, also known as spectral unmixing-based methods, transform the HS-MS fusion problem into a coupled matrix factorization problem: the spectral basis and coefficients are updated using the HS-MS images with specific priors, and the HSI is unmixed into endmembers and abundances under suitable spatial and spectral constraints. For example, Berné et al. [12] fused HS and MS images into a super-resolution image by spectral unmixing; in this work, Non-negative Matrix Factorization (NMF) and least-squares regression were separately used to unmix the input HSI and MSI. An improved method, termed coupled NMF (CNMF), was proposed by Yokoya et al. [13]. Its main improvement was in conducting spectral unmixing based on NMF under the constraints of an observation model combining both spectral response functions (SRFs) and point spread functions (PSFs). Akhtar et al. [22] proposed a method that uses non-parametric Bayesian sparse representations for HS-MS image fusion, which contributed to inferring distributions for the spectra of endmembers in the scene. Lanaras et al. [23] addressed HSI super-resolution as a coupled matrix factorization problem, jointly solving HSI super-resolution and unmixing.
The other sub-category of matrix factorization-based methods tries to make full use of the spatial structures of the HSI, in order to obtain better super-resolution performance. For example, Akhtar et al. [24] proposed a method in which G-SOMP+ dictionary learning and sparse coding were used with the extracted spectra to reconstruct the super-resolution HSI. Wei et al. [25] presented an unsupervised spectral unmixing-based HS-MS image fusion algorithm, formulating the problem as an inverse problem with non-negativity and sum-to-one constraints. Zhang et al. [26] exploited the clustering manifold structure in HS-MS image fusion, based on the knowledge that the manifold structure is well preserved in the spatial domain of the MSI. Some matrix factorization-based methods consider both the spectral basis and spatial structures; for example, Chen et al. [27] proposed a joint spatial-spectral resolution method with spectral matrix factorization and spatial sparsity constraints. The property that real-world HSIs are locally low-rank has been used to partition the hyperspectral image into patches and assist the optical computing of HSIs [28].
Tensor factorization-based methods are based on the prior that an HSI can be described by a 3D tensor. Dian et al. [14] proposed a non-local sparse tensor factorization for HS-MS image fusion, where an HSI cube can be approximated by core tensor multiplication. It transforms the problem into the estimation of a sparse core tensor and dictionaries. Li et al. [15] proposed a coupled sparse tensor factorization-based super-resolution method which formulated the HS-MS image fusion problem as the estimation of a core tensor and dictionaries of three modes, which is solved by coupled tensor factorization of the input HSI and MSI. Zhang et al. [29] proposed a spatial-spectral graph-regularized low-rank tensor decomposition-based method, which infers the spectral smoothness of the HSI and spatial consistency of the MSI. The low tensor-train rank (LTTR) prior has been shown to outperform Tucker rank-based methods [30]. In the method proposed in [31], the input HSI and MSI are first clustered according to their structure, constituting a highly correlated 4D tensor. Then, a new LTTR prior is introduced to learn the correlations among the spatial, spectral, and non-local modes. Finally, the regularized optimization problem is solved using the alternating direction method of multipliers (ADMM) algorithm.
Deep learning has become a popular tool in many fields, including HS-MS image fusion. Qu et al. [16] made the first attempt to solve the HSI image super-resolution problem using an unsupervised sparse Dirichlet-Net, which reduces the amount of training set data. Xie et al. [17] provided a new HS-MS image fusion network, which was based on observation models and unfolding the algorithm into an optimization-inspired deep network. Zhang et al. [32] presented an effective convolutional neural network (CNN)-based method which learns the deep prior from the external data set, as well as the internal information of input coded image with spatial-spectral constraints.

Proposed Method
The proposed HSI super-resolution method is detailed in this section. First, we formulate the problem. Then, a solution based on spatial group sparsity regularization unmixing is presented.

Problem Formulation
Our final goal is to estimate a high spatial resolution hyperspectral image S_3D ∈ R^{M×N×L} from a low resolution hyperspectral image H_3D ∈ R^{m×n×L} and a high spatial resolution conventional image C_3D ∈ R^{M×N×l}, where M × N and m × n represent the spatial dimensions, and L and l denote the numbers of spectral bands. For the proposed problem, m × n is generally several times lower than M × N, L for the hyperspectral image is in the range of 100 to 200 bands, and l ≪ L. This makes the problem severely ill-posed.
HSIs and MSIs are 3D image cubes, which are resized into 2D matrices: each column of a matrix corresponds to the spectrum of a pixel, and each row corresponds to the complete image in a specific spectral band. Thus, S ∈ R^{L×N_c}, H ∈ R^{L×N_h}, and C ∈ R^{l×N_c}, where N_c = M × N and N_h = m × n. The number of endmembers in an HSI scene is lower than the total number of spectral bands. The linear mixing model indicates that the mixed spectrum of a given HSI pixel is a linear combination of endmembers weighted by abundances. Thus, a pixel s ∈ R^L of the HSI S can be approximated as:

s = Σ_{j=1}^{P} e_j a_j, (1)

where e_j ∈ R^L represents the reflectance of the j-th endmember, a_j is the corresponding abundance of that endmember, and P is the highest possible number of endmembers in the scene. Equation (1) can be written, in matrix form, as follows:

S = EA, (2)

where E = [e_1, e_2, · · · , e_P] and A = [a_1, a_2, · · · , a_{N_c}]. The endmember matrix E is a non-orthogonal basis which can represent S in a lower-dimensional space R^{L×P}. The input HSI H and the conventional image C describe the same scene as S. However, H has a smaller spatial size and can be seen as a spatially downsampled version of S. Similarly, C is a downsampled version of S in the spectral dimension. They are formulated as follows:

H = EÂ, with Â = AT_h,
C = ÊA, with Ê = T_c E,

where Â is the abundance matrix with lower spatial resolution, T_h ∈ R^{N_c×N_h} is the downsampling operator for spatial pixels, Ê contains the spectrally downsampled endmembers, and T_c ∈ R^{l×L} is the downsampling operator for spectral bands. T_h and T_c are determined by the respective cameras. If E and A are known, the super-resolution hyperspectral image can be estimated, which means that the HSI super-resolution problem can be transformed into an unmixing problem. Two physical constraints should be considered: the abundance vector obeys the non-negativity and sum-to-one constraints, given by:

a_i ≥ 0, Σ_i a_i = 1,

where a_i are the elements of a column of A. These constraints induce the desired sparsity of the abundances.
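The two degradation processes (spatial downsampling of the HSI and spectral downsampling of the conventional image) can be simulated in a few lines. This is a sketch with synthetic shapes, a uniform block-averaging PSF, and a hypothetical spectral response matrix; none of these values come from the paper:

```python
import numpy as np

# Observation model sketch: H spatially downsamples the latent HSI S
# (L x Nc -> L x Nh); C = T_c @ S spectrally downsamples it (-> l x Nc).
L, l = 31, 3                 # hyperspectral / conventional band counts
M = N = 16                   # high spatial resolution (illustrative)
f = 4                        # spatial downsampling factor, so m = n = 4
Nc, Nh = M * N, (M // f) * (N // f)
rng = np.random.default_rng(1)
S = rng.random((L, Nc))      # latent high-resolution HSI, flattened

# Spatial downsampling: average each f x f block (uniform PSF assumption)
S3 = S.reshape(L, M, N)
H = S3.reshape(L, M // f, f, N // f, f).mean(axis=(2, 4)).reshape(L, Nh)

# Spectral downsampling: a hypothetical response matrix, rows summing to 1
T_c = rng.random((l, L))
T_c /= T_c.sum(axis=1, keepdims=True)
C = T_c @ S
assert H.shape == (L, Nh) and C.shape == (l, Nc)
```

Block averaging is only one possible choice for the spatial operator; in practice the PSF of the hyperspectral sensor determines its exact form.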

Problem Solution
As suggested above, if E (of the HSI) and A (of the conventional image) are known, a super-resolution HSI can be estimated. This subsection introduces a solution for sparse unmixing based on spatial group sparsity regularization.
Suppose Y ∈ R^{L×N} denotes an HSI with L spectral bands and N pixels, E = [e_1, · · · , e_P] ∈ R^{L×P} denotes the endmember matrix, and A = [a_1, · · · , a_N] ∈ R^{P×N} denotes the abundance matrix, where P is the number of endmembers and n represents the additive noise. Then, the unmixing can be formulated as follows:

Y = EA + n. (6)

In the sparse unmixing model, hyperspectral vectors can be approximated by a linear combination of a small number of spectral signatures in the spectral library, so the non-zero abundances should appear in only a few rows, which indicates sparsity along the pixels of an HSI [33]. Traditional sparse unmixing approaches usually adopt the least-squares error function; however, it is vulnerable to noise and outliers, as the error term is squared, which means that even small disturbances play a significant role in the objective function. Furthermore, the abundance matrix has a collaborative sparse property [13], which means it can be characterized by the ℓ2,1 norm. This also imposes sparsity, and is more rotation invariant than the ℓ1 norm. Sparse unmixing is then equivalent to solving the following optimization problem:

min_A ||Y − EA||_{2,1} + λ||A||_{2,1} s.t. A ≥ 0, (7)

where the ℓ2,1 norm is described as:

||X||_{2,1} = Σ_i ||x^i||_2,

with x^i denoting the i-th row of X. According to geography theory, the pixels in a local spatial group follow a similar sparse model. This section presents the process of taking advantage of the spatial information and sparsity priors by coupling a spatial group sparsity regularizer with sparse unmixing.
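The ℓ2,1 norm used above sums the ℓ2 norms of the rows of a matrix, which is what makes it favor entire zero rows (few active endmembers shared across all pixels). A minimal sketch:

```python
import numpy as np

# l_{2,1} norm: sum of the l_2 norms of the rows. Applied to the abundance
# matrix A (P x N), whole rows of zeros mean endmembers that are inactive
# in every pixel -- the "collaborative" sparsity described in the text.
def l21_norm(X):
    return np.sum(np.linalg.norm(X, axis=1))

A = np.array([[3.0, 4.0],    # active row: ||.||_2 = 5
              [0.0, 0.0],    # inactive row contributes nothing
              [0.0, 1.0]])   # row norm 1
assert l21_norm(A) == 6.0    # 5 + 0 + 1
```

Squaring the error term, by contrast, would let a single outlying pixel dominate the objective; measuring the residual Y − EA with the same ℓ2,1 norm is what gives the loss its robustness.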
To improve the dependence between adjacent pixels and make full use of the spatial information, the original HSI is transformed to a superpixel scale before the unmixing process. Let T ∈ R^{N×K} denote a spatial transform operator for the HSI and the abundance matrix, and let Y_s = YT ∈ R^{L×K} and A_s = AT ∈ R^{P×K} be the superpixel-scale approximations of the original HSI Y and the original abundance matrix A, respectively. Substituting the transformed quantities into Equation (7) yields:

min_A ||Y_s − EA_s||_{2,1} + λ||A_s||_{2,1} s.t. A ≥ 0.

The goal of the superpixel transformation is to cluster spatially adjacent pixels, which are structurally similar and share the same sparse model. We introduce a representative gradient ascent-based algorithm, SLIC [34], to obtain superpixels, which also has the advantage of preserving the structures and features of non-clustered pixels.
The SLIC superpixel algorithm was originally designed for color images, but has been applied in HSI analysis in recent years. SLIC is similar to k-means clustering, but operates in a feature domain consisting of spatial and color features. To adapt it to hyperspectral data, the color features are replaced by spectral features. We used an improved SLIC method to produce the superpixels, which is summarized as follows:
Step 1: Create a feature vector Φ(x, y):

Φ(x, y) = [λx, λy, I(x, y)]^T,

where (x, y) is the spatial location of the pixel, λ = m/S is a parameter denoting the trade-off between the spatial and spectral priors, and I(x, y) contains the values of each spectral band. In λ, m is a parameter related to superpixel regularity and S is the size of the superpixels.
Step 2: Construct a set of cluster centers C_k = Φ(x_k, y_k). Traditional SLIC for a color image uses a square grid overlapping the whole image. However, considering that hexagonal grids have more non-diagonal neighbors than square grids, which leads to less distance distortion when processing boundary pixels, a hexagonal grid is introduced to promote the accuracy of clustering. In addition, as the number of spectral bands in an HSI is much larger than the number of channels in a conventional color image, we directly use the spectral channels in the clustering process, whereas for color images the color channels are used after a color space transformation. This has the benefit of minimizing the loss of spectral information. The cluster center is defined by the following formula:

C_i = [x_{1i}, · · · , x_{Li}, m_i, n_i]^T,

where [x_{1i}, · · · , x_{Li}]^T is the average spectral reflectance of the i-th cluster and [m_i, n_i]^T denotes the location of the cluster center.
Step 3: Suppose d = ||Φ(x, y) − Φ(x k , y k )|| is the distance between a pixel (x, y) and the cluster center. Assign the pixel to the closest cluster center.
Step 4: Update cluster centers based on the spatial centroid of the clustered pixels under the current iteration.
Step 5: Repeat Steps 3 and 4 until the change in distance is less than a pre-determined threshold.
Step 6: Assign disjoint segments to the largest nearby cluster.
The Alternating Direction Method of Multipliers (ADMM) [9] algorithm is used to solve the optimization problem shown in Equation (7):

min_A ||Y − EA||_{2,1} + λ||A||_{2,1} + ι_{R+}(A), (12)

where ι_{R+}(A) is the indicator function for the non-negative orthant R+ and A_i is the i-th atom of the abundance matrix. It should be noted that ι_{R+}(A_i) = 0 when A_i is in the non-negative orthant, and +∞ otherwise.
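As a rough illustration, the superpixel clustering in Steps 1-6 can be sketched with plain NumPy. This is a simplified square-grid, k-means-style variant: the hexagonal grid, the threshold-based stopping rule, and the Step 6 merging are omitted, and all names and values are illustrative rather than the paper's implementation:

```python
import numpy as np

# Minimal SLIC-style clustering sketch. Pixel feature (Step 1):
# [lambda * x, lambda * y, spectrum], with lambda = m / S.
def simple_slic(cube, S=4, m=1.0, n_iter=5):
    H, W, L = cube.shape
    lam = m / S
    ys, xs = np.mgrid[0:H, 0:W]
    feat = np.concatenate(
        [lam * xs[..., None], lam * ys[..., None], cube], axis=2
    ).reshape(-1, L + 2)
    # Step 2: seed cluster centers on a regular grid of spacing S
    cy, cx = np.mgrid[S // 2:H:S, S // 2:W:S]
    centers = feat.reshape(H, W, L + 2)[cy.ravel(), cx.ravel()]
    for _ in range(n_iter):
        # Step 3: assign each pixel to its nearest center in feature space
        d = ((feat[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Step 4: recompute each center as the mean of its cluster
        for k in range(len(centers)):
            if np.any(labels == k):
                centers[k] = feat[labels == k].mean(axis=0)
    return labels.reshape(H, W)

# Toy cube: left and right halves have distinct flat spectra
cube = np.zeros((8, 8, 3)); cube[:, 4:, :] = 1.0
labels = simple_slic(cube)
assert labels.shape == (8, 8)
```

In a real pipeline, a library implementation (e.g. scikit-image's `slic`) would normally replace this loop; the sketch only shows how the spatial/spectral trade-off parameter λ enters the distance.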
Adding three auxiliary matrices V_1, V_2, and V_3, the problem in Equation (12) can be reformulated as follows:

min_{A, V_1, V_2, V_3} ||Y − V_1||_{2,1} + λ||V_2||_{2,1} + ι_{R+}(V_3)
s.t. V_1 = EA, V_2 = A, V_3 = A. (13)

Stacking the auxiliary matrices and constraints, Equation (13) can be written compactly as:

min_{A, V} g(V) s.t. GA + HV = J.

Thus, the augmented Lagrangian function is:

L(A, V, Λ) = g(V) + (µ/2)||GA + HV − J + Λ/µ||_F²,

where Λ/µ are the scaled Lagrange multipliers and µ > 0 is the Lagrange multiplier regularization parameter. L can be optimized by alternately updating V_1, V_2, A, and V_3.
(2) To update V_2, a formula similar to Equation (17) needs to be solved, with ξ replaced accordingly. (4) To update A, the corresponding subproblem is solved. The termination condition of the iteration is ||GV_3^k + HI^k − J||_F² < ε × (r × c)^{1/2}, with error tolerance ε > 0, where r and c are the numbers of rows and columns of J, respectively. Once A is acquired, E can be computed as described by Equation (6).
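In ADMM schemes of this kind, the updates of the ℓ2,1-regularized variables typically reduce to the proximal operator of the ℓ2,1 norm, which has a closed form: row-wise soft-thresholding. The sketch below shows that generic operator under this standard assumption; it is not the paper's exact update formula, which is omitted in the text:

```python
import numpy as np

# Proximal operator of tau * ||.||_{2,1}: shrink each row of X toward zero
# by tau in l_2 length; rows with norm below tau vanish entirely. This is
# the standard closed form behind V-updates in l_{2,1} ADMM schemes.
def prox_l21(X, tau):
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * X

X = np.array([[3.0, 4.0],    # row norm 5 -> shrunk to norm 4
              [0.3, 0.0]])   # row norm 0.3 < tau -> set to zero
Y = prox_l21(X, 1.0)
assert np.allclose(Y, [[2.4, 3.2], [0.0, 0.0]])
```

The non-negativity update for V_3 is even simpler: the proximal operator of the indicator ι_{R+} is an element-wise projection, `np.maximum(X, 0.0)`.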
We apply the unmixing approach detailed above to both the input HSI and conventional image, such that E of the HSI and A of the conventional image are obtained, leading to an HSI with a high resolution both in spectral and spatial domains.

Experiments
To verify the superiority of our HSI super-resolution method, we compared the proposed method with the following state-of-the-art works: (1) FUSE [35]; (2) Bayesian Sparse Representation (BSR) [22]; (3) Coupled Spectral Unmixing Hyperspectral Super-Resolution (CSU) [23]; (4) Clustering Manifold Structure (CMS) [26]; and (5) Learning a Low Tensor-Train Rank Representation (LTTR) [31]. The experiments were carried out using two open-source HSI data sets. All experiments in this paper were implemented in MATLAB R2018b on a server with 24 2.20 GHz Intel Xeon CPUs and 60.0 GB of RAM. In this section, we introduce the data sets used in the experiments, followed by the results and discussion.

Data Sets and Quantitative Metrics
We used real-world images from standard HSI data sets, namely, the "Harvard" data set [36] and remote sensing image data sets. The remote sensing data we used were "CUPRITE", "CHIKUSEI", "INDIAN PINES", "UNIVERSITY OF PAVIA", and "UNIVERSITY OF HOUSTON". Images in the "Harvard" data set have a spatial resolution of 1392 × 1040 and contain 31 spectral bands ranging from 420 nm to 720 nm. RGB color images from the "Harvard" data set are shown in Figure 1. In our experiments, the original images in the data sets served as ground truth. The observed low spatial resolution HSIs were downsampled by a factor of 8. Similarly, the conventional images (RGB color images) were downsampled in the spectral domain with a spectral response matrix R derived from a Nikon D700 digital camera (Nikon D700 Study, http://www.maxmax.com/nikon_d700_study.htm). The details of the remote sensing data sets are summarized in Table 1. It must be noted that we used cropped images with selected spectral bands, instead of the original images. Lines 4 and 6 in Table 1 show the number of selected bands and the size of the cropped images, which are essential parameters for the algorithms.
Six widely accepted quantitative metrics were used in this work to quantitatively measure the results. The basic idea of Structural Similarity (SSIM) [1] is to evaluate the similarity of two images through three aspects: luminance, contrast, and structure. It is formulated as follows:

SSIM = ((2µ_S µ_Ŝ + c_1)(2σ_SŜ + c_2)) / ((µ_S² + µ_Ŝ² + c_1)(σ_S² + σ_Ŝ² + c_2)),

where µ_S and µ_Ŝ represent the average values of the estimated image and the ground truth, σ_S and σ_Ŝ are the standard deviations, σ_SŜ is the covariance of S and Ŝ, and c_1 and c_2 are small stabilizing constants. An SSIM value close to one indicates high image quality.
The root mean square error (RMSE) [1] measures the absolute error between the estimated image and the ground truth; a smaller RMSE value indicates better quality:

RMSE = ( ||S − Ŝ||_F² / (L × N_c) )^{1/2},

where L is the number of bands and N_c is the total number of pixels. The peak signal-to-noise ratio (PSNR) [1] is used to measure the numerical similarity of the spatial reconstruction between the result and the ground truth. It is defined as follows:

PSNR = (1/L) Σ_{i=1}^{L} 10 log_10( max(S_i)² / MSE(S_i, Ŝ_i) ),

where S_i and Ŝ_i denote the i-th spectral band of the reconstructed image and the ground truth, respectively, and L is the total number of bands. The larger the PSNR value, the closer the spatial quality of the reconstructed image is to that of the reference image. The spectral angle mapper (SAM) [1] is a metric which evaluates the spectral preservation quality, indicating the spectral similarity between the result and the ground truth. The SAM of the j-th pixel is defined as the angle between the vectors of the estimated and the ground truth spectrum, as follows:

SAM_j = arccos( (ŝ_j^T s_j) / (||ŝ_j||_2 ||s_j||_2) ).

A smaller SAM value indicates higher spectral quality. The relative dimensionless global error in synthesis (ERGAS) [1] is a global statistical metric, which is defined as follows:

ERGAS = (100/d) ( (1/L) Σ_{i=1}^{L} MSE(S_i, Ŝ_i) / mean(Ŝ_i)² )^{1/2},

where MSE(S_i, Ŝ_i) represents the mean square error, d is the downsampling scale factor, and mean(Ŝ_i) is the mean pixel value of Ŝ_i. Hypercomplex quality assessment (Q2^n) [37] is another metric used to evaluate HSIs. Q2^n improves on the universal image quality index (UIQI), which has been widely used in many other related works. The UIQI is defined as:

Q(x, y) = (σ_xy / (σ_x σ_y)) · (2 x̄ ȳ / (x̄² + ȳ²)) · (2 σ_x σ_y / (σ_x² + σ_y²)),

where x and y are a band of the ground truth and the corresponding band of the reconstructed HSI. The three components of Q(x, y) respectively represent the correlation, luminance distortion, and contrast distortion of the image. However, the UIQI was specifically designed for monochromatic images.
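Three of these metrics are straightforward to implement from their standard definitions. The sketch below follows those definitions; normalization conventions (e.g. per-band vs. global peak in PSNR) vary slightly between papers, so treat it as one reasonable reading rather than the paper's exact code:

```python
import numpy as np

# RMSE, PSNR, and SAM for matrices S, S_hat of shape (L bands, Nc pixels).
def rmse(S, S_hat):
    return np.sqrt(np.mean((S - S_hat) ** 2))

def psnr(S, S_hat):
    # averaged over bands; assumes pixel values scaled to [0, 1]
    mse = np.mean((S - S_hat) ** 2, axis=1)
    return np.mean(10 * np.log10(1.0 / np.maximum(mse, 1e-12)))

def sam(S, S_hat):
    # mean spectral angle (radians) over pixels; columns are spectra
    num = np.sum(S * S_hat, axis=0)
    den = np.linalg.norm(S, axis=0) * np.linalg.norm(S_hat, axis=0)
    return np.mean(np.arccos(np.clip(num / np.maximum(den, 1e-12), -1, 1)))

S = np.random.default_rng(2).random((31, 100))
assert rmse(S, S) == 0.0 and sam(S, S) < 1e-6
```

A perfect reconstruction gives RMSE and SAM of zero; PSNR is typically reported in dB, larger being better.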
Q2^n overcomes this problem by modeling each pixel's spectrum vector x_j as a hypercomplex number, which is formulated as follows: x_j = x_{j,0} + x_{j,1} i_1 + x_{j,2} i_2 + · · · + x_{j,2^n−1} i_{2^n−1}.
Thus, Q2 n is a universal image metric, a larger value of which indicates higher quality of the image.

Experimental Setting
The basic procedure of the experiments was to first obtain the input images from the ground truth image by simulation, then to run the various super-resolution algorithms to obtain high spatial resolution HSIs, and finally to compare the results with the ground truth. The data sets provide an original HSI cube. As no ground truth exists in the real world, the literature typically uses the HSI in the data set as a reference. The input low spatial resolution HSIs and high spatial resolution conventional images were obtained by downsampling these references.
The source code of all the related works used for comparison is open-source. The code for FUSE is available at http://wei.perso.enseeiht.fr/publications.html; the code for BSR is available at http://openremotesensing.net/wp-content/uploads/2016/12/Supplementary.zip; the code for CSU is available at https://github.com/lanha/SupResPALM; the code for CMS is available at https://sites.google.com/site/leizhanghyperspectral/publications; and the code for LTTR is available at https://github.com/renweidian/LTTR.
We kept the default parameters for the related works. For the unmixing-based methods, the number of endmembers (P in Equation (1)) was set to 30, which was sufficient for all data sets. As for our spatial group sparsity regularization-based unmixing, the Lagrange multiplier regularization parameter µ, the iteration number r, and the error tolerance ε were respectively set to µ = 10 −2 , r = 1000, and ε = 10 −6 . Furthermore, the average width of the superpixels for SLIC was set to 10. The regularization parameter λ plays an important role related to the trade-off between accuracy and sparsity, and was set to 10 −2 .
For images in the "Harvard" data set, all downsampling scale factors were set to 8. The conventional images had three spectral bands, which means that we estimated images with dimensions of 31 × 1392 × 1040 from low spatial resolution hyperspectral images with dimensions of 31 × 174 × 130 and conventional images with dimensions of 3 × 1392 × 1040. For images in the remote sensing data sets, the downsampling scale factors and the numbers of conventional image bands varied, as shown in Table 2.

Results and Analysis
We present and analyze the performance of the proposed HSI super-resolution method in this section. Due to space limitations, not all results are presented. First, we give a visual result of our unmixing and super-resolution method. Then, examples of visual comparison results of images in the "Harvard" and "UNIVERSITY OF PAVIA" data sets are shown. Finally, we present the tables of objective image quality metrics of all the compared methods.

Hyperspectral Unmixing
We selected a popular benchmark named "Jasper Ridge" as an example to qualitatively demonstrate the performance of the proposed hyperspectral unmixing method. It contains 100 × 100 pixels, with bands ranging from 380 nm to 2500 nm and a spectral resolution of up to 9.46 nm. Channels 1-3, 108-112, 154-166, and 220-224 were removed, in order to eliminate water absorption and atmospheric effects, leaving 198 bands. There are four latent endmembers in these data: "Water", "Tree", "Soil", and "Road". The SLIC superpixels of the "Jasper Ridge" benchmark are illustrated in Figure 3. The superpixels are shown in Figure 3a; their shape and size are adaptive and related to the spectral similarity of the neighboring pixels. Figure 3b shows the ground truth spectral reflectances, while Figure 3c gives two examples indicating that pixels within a superpixel share similar spectral properties. The abundances were well estimated by the proposed method and were close to the ground truth. Figure 4 shows the estimated abundances. To verify the efficiency of the proposed Spatial Group Sparsity regularization-based Unmixing (SGSU) algorithm, simulated experiments were performed. Synthetic HSIs with 100 × 100 pixels contained six endmembers, which were randomly selected from the USGS library and randomly distributed. The proposed unmixing was compared with several typical sparsity-based unmixing algorithms, namely SUnSAL [38], CLSUnSAL [39], and SUnSAL-TV [40]. The synthetic data were formed from spectra randomly selected from the digital spectral library of the United States Geological Survey (USGS), which is available at https://www.usgs.gov/labs/spec-lab. We adopted two metrics to quantitatively evaluate the unmixing performance: the spectral angle difference (SAD) [41] and the root mean square error (RMSE) [41], which are typically used to evaluate the performance of hyperspectral unmixing.
SAD is similar to SAM, as in Equation (20), but changes the input variable to endmembers. The input variable for RMSE is abundances. SAD was applied to evaluate the spectral angle between the estimated endmember and the ground truth endmember. RMSE is an evaluation metric for signal reconstruction, which represents the difference between the reconstructed signal and the original signal. Generally, smaller SAD and RMSE indicate better unmixing performance.
We conducted an experiment with the simulated data to analyze the sensitivity of the proposed unmixing method with regard to the number of endmembers, which was varied from 3 to 18 in steps of 3. As Figure 5 shows, the values of SAD and RMSE increase with the number of endmembers, meaning that unmixing becomes more difficult as the number of endmembers grows. Noise and outliers also have significant impacts on hyperspectral unmixing. To study how the level of noise affects the unmixing performance and to demonstrate the superiority of the proposed unmixing method, we added Gaussian white noise with a signal-to-noise ratio (SNR) varying from 5 dB to 50 dB, and compared the proposed SGSU with several classical unmixing methods. Figure 6 shows the SAD and RMSE values as functions of the SNR of the Gaussian white noise. We can see, from Figure 6a, that the SAD of all methods decreased as the SNR increased, which indicates that noise degrades hyperspectral unmixing. The proposed unmixing method obtained the best RMSE under most conditions; Figure 6b indicates that it performed better than the other state-of-the-art methods.
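Adding Gaussian white noise at a prescribed SNR, as in the robustness experiment above, follows directly from the definition SNR(dB) = 10 log10(P_signal / P_noise). A minimal sketch (synthetic data, not the paper's):

```python
import numpy as np

# Add Gaussian white noise at a target SNR (dB):
# noise power = signal power / 10^(SNR/10).
def add_noise(Y, snr_db, rng=np.random.default_rng(0)):
    p_signal = np.mean(Y ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    return Y + rng.normal(0.0, np.sqrt(p_noise), Y.shape)

Y = np.random.default_rng(3).random((198, 1000))  # toy "HSI" matrix
Yn = add_noise(Y, 30.0)

# empirical SNR should be close to the 30 dB target
err = Yn - Y
snr_emp = 10 * np.log10(np.mean(Y ** 2) / np.mean(err ** 2))
assert abs(snr_emp - 30.0) < 1.0
```

Sweeping `snr_db` from 5 to 50 reproduces the setup of Figure 6's noise-sensitivity curves.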

Super-Resolution
We show an example of the estimated super-resolution results in Figure 7, produced by applying the proposed method to imgd2 of the "Harvard" data set, as shown in Figure 1i. Figure 7a-c show the input conventional (RGB) image, the low spatial resolution hyperspectral image, and the super-resolution result, respectively, where (b) and (c) are pseudo-color images with the RGB bands set as (30, 17, 7). A region of interest is magnified in each reconstructed image for better visual comparison. As can be seen from the magnified regions, Figure 7c shows that the reconstructed image had a good visual result and the same high spatial resolution as the RGB image. We also randomly selected three pixel points in the reconstructed image and the corresponding ground truth. The spectral responses of all the pixel points are shown in Figure 7d, where the solid lines correspond to the ground truth and the dashed lines to the reconstructed image. The mean square error (MSE) values between the reconstructed spectral reflectance and the ground truth reflectance are presented in colors corresponding to the curves. The MSE values are very small, indicating that the reconstructed data were close to the ground truth. In fact, beyond the three example pixel points, the spectral reflectances of the other pixels were also reconstructed with high accuracy. Overall, these results show that our method achieves effective reconstruction in both the spatial and spectral domains. Figure 8 shows the estimated spectral reflectance of three randomly selected pixel points and their corresponding ground truths. The left image is the pseudo-color image of the super-resolution HSI of image1 in the "Harvard" data set. It has the same high spatial resolution as the input conventional image, which is presented in Figure 1a. The estimated reflectance curves visually match the ground truth well.
The MSE between the estimated reflectances and the ground truth was very small, which indicates that the spectra were efficiently reconstructed.
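The per-pixel spectral evaluation above can be sketched as follows: compute the MSE along the spectral axis for every pixel of the reconstructed cube, yielding an error map in which low values mark accurately reconstructed spectra (a generic sketch, assuming H x W x B cubes; not the exact evaluation code of the paper):

```python
import numpy as np

def spectral_mse(recon, truth):
    """Per-pixel MSE between reconstructed and ground-truth spectra.

    recon, truth: H x W x B cubes. Returns an H x W error map; low
    values indicate pixels whose spectra were reconstructed accurately.
    """
    diff = np.asarray(recon, dtype=float) - np.asarray(truth, dtype=float)
    return np.mean(diff ** 2, axis=-1)
```

Averaging this map over all pixels recovers the scalar MSE, while inspecting it spatially reveals which regions (e.g., texture boundaries) are hardest to reconstruct.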
To compare our method with the other state-of-the-art methods, Figure 9 shows a visual comparison of all of the methods for image1 in the "Harvard" data set. In each sub-figure, the image on top shows the super-resolution result of the corresponding method in band 21 (620 nm), and the image at the bottom is the error map, constructed by comparison with the ground truth. For ease of comparison, a region of interest is magnified in each reconstructed image; the smokestacks on the roof are highlighted by red squares. As can be seen from Figure 9b,c, the results of FUSE were blurrier than the ground truth, and BSR had the worst error map. Figure 9d-h show that CSU, CMS, and LTTR produce visual results close to ours. However, examining the region in the red square, the smokestack in the error map of sub-figure (g) is closer to zero than in (d), (e), and (f), indicating that our method performed better in the reconstruction of detailed textures. Visual comparison by the human eye is inevitably subjective; therefore, quantitative objective evaluation is necessary. For an objective comparison, we applied all methods to construct high resolution hyperspectral images from the input low spatial resolution hyperspectral images and conventional images. All quality metrics mentioned in Section 4.1 were applied to the resulting images and the corresponding ground truth images. In particular, the images in the "Harvard" data set are all ground-based hyperspectral images, which have similar spatial and spectral properties. We give the mean and standard deviation of each quality metric in Table 3, with the best results shown in bold.
Note that Table 3 gives an evaluation of the overall image quality, while Figure 9 shows a visual comparison in a single band. Hence, it is reasonable that the subjective evaluation of Figure 9 might be inconsistent with some of the metrics in Table 3; this inconsistency does not affect the conclusions of the quantitative quality evaluation. The numerical results in Table 3 show that the proposed method had superior performance to the compared methods in terms of all quality metrics. All of the methods performed well in terms of SSIM. The superiority in SSIM values demonstrates that the spatial structures of the estimated hyperspectral images were well-recovered by the proposed method. Our method achieved a significantly lower average RMSE than all of the other compared methods: it reduced the error by about 6.4% compared to the best baseline, and by at least 56.3% compared to all others. The superiority in RMSE indicates that our estimated image had the closest pixel values to the ground truth across all spectral bands. The average PSNR of the proposed method was 0.93-11.01 dB (2.22% to 37.25%) higher than that of the other methods, indicating that the super-resolution results obtained by the proposed method were closer to the ground truth in terms of spatial numerical similarity. SSIM, RMSE, and PSNR all mainly evaluate performance in terms of spatial information. This improvement is a result of using the SLIC superpixel algorithm in our method: spatially adjacent pixels are clustered, which helps to preserve structures and features with similar characteristics. SAM evaluates the quality of spectral reconstruction. The average SAM value of the proposed method was lower than that of the compared methods, reduced by 7.48% to 46.38% compared to the other baselines.
The superiority in spectral similarity of the proposed method comes from the sparsity priors introduced by coupling a spatial group sparsity regularizer with sparse unmixing. ERGAS and Q2n are global quality metrics, which evaluate not only the spatial structure but also the spectral similarity. As shown in Table 3, the ERGAS produced by our method was 4.92% lower than that of the second-best baseline and 68.85% lower than that of the worst-performing baseline.
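The quality metrics discussed above can be sketched in numpy as follows (generic reference formulations, assuming H x W x B cubes with reflectance in [0, 1] and, for ERGAS, a spatial resolution ratio `ratio`; these are illustrative implementations, not the exact evaluation code used for Tables 3 and 4):

```python
import numpy as np

def psnr(recon, truth, data_range=1.0):
    """Mean PSNR (dB) over the spectral bands of two H x W x B cubes."""
    mse_b = np.mean((recon - truth) ** 2, axis=(0, 1))
    # Floor avoids log10(0) for perfectly reconstructed bands
    return float(np.mean(10.0 * np.log10(data_range ** 2 / np.maximum(mse_b, 1e-12))))

def sam(recon, truth):
    """Mean spectral angle (radians) over all pixels."""
    r = recon.reshape(-1, recon.shape[-1])
    t = truth.reshape(-1, truth.shape[-1])
    cos = np.sum(r * t, axis=1) / (
        np.linalg.norm(r, axis=1) * np.linalg.norm(t, axis=1) + 1e-12)
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))

def ergas(recon, truth, ratio=4):
    """Relative dimensionless global error (ERGAS); `ratio` is the
    spatial resolution ratio between the HSI and the conventional image."""
    rmse_b = np.sqrt(np.mean((recon - truth) ** 2, axis=(0, 1)))
    mean_b = np.mean(truth, axis=(0, 1))
    return float(100.0 / ratio * np.sqrt(np.mean((rmse_b / (mean_b + 1e-12)) ** 2)))
```

PSNR rewards per-band numerical fidelity, SAM isolates spectral shape errors independently of magnitude, and ERGAS aggregates band-wise relative RMSE into a single global score.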
It should be noted that, in Table 3, the number before the symbol "±" is the average value and the number after the symbol "±" is the standard deviation. The standard deviation indicates how the quality metric values differ from the average value. We can observe that our method obtained the best average values and the smallest standard deviations (except for ERGAS), which means our method not only performed better but was also more stable than the other compared methods.
Similarly, in Figure 10, the estimated spectral responses and the ground truth for the "UNIVERSITY OF PAVIA" data set are shown. The pseudo-color image of the super-resolution HSI is presented on the left side. It was visually well-reconstructed, compared to the input conventional image shown in Figure 2d. Our method also obtained a satisfactory result in terms of spectrum reconstruction: as shown in the reflectance curves on the right side, the spectra of three randomly selected pixel points were well-reconstructed. As an example of visual results for the remote sensing data sets considered, Figure 11 shows a visual comparison of the results of all the considered methods for the "UNIVERSITY OF PAVIA" data set. The CSU results show an obvious difference from the ground truth. A careful check of the building in the red square reveals that the error maps of FUSE, BSR, and CSU show an obvious shadow of the building, whereas the error maps of CMS, LTTR, and ours were closer to the ground truth. The bottom three error maps were very close to each other, but we can still see that our method achieved a smaller reconstruction error, indicating that the proposed method can reconstruct detailed spatial structures more accurately. Considering the visual deviation of the human eye, a quantitative comparison for the remote sensing data sets is presented in the following discussion.
Similar to the "Harvard" data set, Table 4 shows the results for all of the remote sensing data sets. The best values are highlighted in bold. In addition, if our method did not achieve the best result in a certain metric, but the second-best, it is highlighted and underlined.
All quality metrics of the proposed method on the "CUPRITE" data set outperformed the other methods. For the "CHIKUSEI" data set, the values of PSNR and Q2 n for the proposed method were the second-best in all methods, while SSIM, RMSE, SAM, and ERGAS were all the best. For the "INDIAN PINES" data set, our method obtained the best SSIM, PSNR, SAM, and ERGAS, and the second-best RMSE. For the "UNIVERSITY OF PAVIA" data set, our method obtained the best SSIM, RMSE, and SAM, and the second-best Q2 n . For the "UNIVERSITY OF HOUSTON" data set, all quality metrics were superior, except for RMSE (which was the second-best).

Impact on Classification
HSI classification is one of the most relevant topics in HSI processing. The quality of the fused images can be indirectly validated by pixel-wise classification. The classification performance was quantitatively validated using the overall accuracy (OA). The classification method we used is based on the K-Nearest Neighbors (KNN) classifier [42]. As the absolute OA value of the classification is not the main concern of this paper, we set the parameter k to 5 and the ratio of training data to 0.1, for lower processing time. This led to OA results that were not very high, but it did not affect the verification of our super-resolution method. The "UNIVERSITY OF PAVIA" data set has also been widely used in HSI classification, due to the availability of ground truth data. The classification results of our method are shown in Figure 12, where sub-figure (a) shows the ground truth classification map, sub-figure (b) shows the reference classification map (which is the result of the original high spatial resolution HSI), and sub-figure (c) shows the classification map of our super-resolution HSI. Table 5 shows the quantitative numerical results of all methods for comparison. From the results, we can observe that our super-resolution HSI obtained an OA closer to that of the original HSI than any of the other considered methods.
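The validation procedure above reduces to two simple components: a KNN classifier over pixel spectra and the OA score. A numpy-only sketch (majority vote over Euclidean nearest neighbors; this is a generic illustration, not the exact KNN implementation of [42]):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Plain k-nearest-neighbors classification: Euclidean distance,
    majority vote among the k closest training pixels."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    preds = []
    for x in np.asarray(X_test, dtype=float):
        d = np.linalg.norm(X_train - x, axis=1)
        nn = y_train[np.argsort(d)[:k]]
        vals, counts = np.unique(nn, return_counts=True)
        preds.append(vals[counts.argmax()])
    return np.array(preds)

def overall_accuracy(y_pred, y_true):
    """Overall accuracy (OA): fraction of correctly classified pixels."""
    return float(np.mean(np.asarray(y_pred) == np.asarray(y_true)))
```

In the setting of this paper, `X_train` would hold the spectra of the 10% labeled training pixels, `X_test` the remaining pixels of the (fused or original) HSI, and k = 5.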

Computational Cost
To evaluate the computational cost of the proposed work, we report the runtime of all methods in Tables 3 and 4, obtained by reconstructing the HSIs from the corresponding data sets. BSR was the most time-consuming method, as it needs to learn a dictionary for sparse coding. FUSE uses a simple fusion strategy, making it the fastest method; however, it had poor super-resolution performance. Compared with the remaining methods, the time consumed by our method was moderate and acceptable. For the "Harvard" data set, our method did not have superior run-time performance, due to the superpixel procedure in our method, whose run-time is mainly influenced by image size. For all remote sensing data sets, the run-time of our method was superior to that of LTTR but worse than that of CSU. Considering both HSI super-resolution performance and computational cost, the proposed method can be considered effective.

Conclusions
This paper proposed an HSI super-resolution method based on spatial group sparsity regularization unmixing. It constructs a high spatial and spectral resolution HSI from a low spatial resolution HSI and a high spatial resolution conventional image by solving the spectral unmixing problem for both input images. The unmixing-based super-resolution of the HSI helps to alleviate the constraints derived from the imaging process. Under the linear mixing model, the proposed unmixing method integrates the advantages of both spatial group sparsity and the ℓ2,1 norm to produce a more accurate and robust unmixing estimation. The SLIC superpixel algorithm is used to transform the original HSI into spatial groups, making full use of the spatial information. Then, the distribution of the endmembers is estimated based on the spatial and sparsity priors. The proposed method uses this distribution estimate as a regularizer for sparse unmixing and solves for the abundances using the ADMM algorithm. The spatial group sparsity regularization unmixing makes full use of the spatial structure and the spectra of the images. In simulations on public data sets (the ground-based "Harvard" data set and five satellite remote sensing data sets), the proposed method showed superior performance when compared to state-of-the-art methods.