Self-Dictionary Regression for Hyperspectral Image Super-Resolution

Due to sensor limitations, hyperspectral images (HSIs) are acquired by hyperspectral sensors with high-spectral-resolution but low-spatial-resolution. It is difficult for sensors to acquire images with high-spatial-resolution and high-spectral-resolution simultaneously. Hyperspectral image super-resolution tries to enhance the spatial resolution of HSI by software techniques. In recent years, various methods have been proposed to fuse HSI and multispectral image (MSI) from an unmixing or a spectral dictionary perspective. However, these methods extract the spectral information from each image individually, and therefore ignore the cross-correlation between the observed HSI and MSI. It is difficult to achieve high-spatial-resolution while preserving the spatial-spectral consistency between low-resolution HSI and high-resolution HSI. In this paper, a self-dictionary regression based method is proposed to utilize cross-correlation between the observed HSI and MSI. Both the observed low-resolution HSI and MSI are simultaneously considered to estimate the endmember dictionary and the abundance code. To preserve the spectral consistency, the endmember dictionary is extracted by performing a common sparse basis selection on the concatenation of observed HSI and MSI. Then, a consistent constraint is exploited to ensure the spatial consistency between the abundance code of low-resolution HSI and the abundance code of high-resolution HSI. Extensive experiments on three datasets demonstrate that the proposed method outperforms the state-of-the-art methods.


Introduction
With the development of hyperspectral sensors, hyperspectral images (HSIs) have been widely used in numerous applications [1][2][3], such as remote sensing classification [4][5][6], change detection [7] and target detection [8]. A hyperspectral sensor captures high-spectral-resolution information by constructing a continuous radiance spectrum for every pixel in the HSI. However, due to instrument limitations, it is difficult for hyperspectral sensors to acquire HSIs that also have high spatial resolution. Moreover, the low spatial resolution of HSIs results in mixed pixels and greatly degrades subsequent processing in remote sensing applications [9,10]. Therefore, enhancing the spatial resolution of HSIs has become an important issue in the remote sensing community [11][12][13].
To mitigate this issue, HSI super-resolution [14,15] has been investigated to enhance the spatial resolution of HSIs. As a software technique, HSI super-resolution does not modify the sensor array or the imaging optics. HSI super-resolution is usually posed as an inverse problem [11,16,17]: the original high-spatial-resolution HSI is recovered from the low-resolution observations [14,18,19]. Fundamentally, the missing spatial information in the low-resolution HSI can be compensated by utilizing the prior knowledge in a high-resolution coincident image of the same scene [20], such as a panchromatic image (PAN), an RGB image or a multispectral image (MSI).
Recently, many HSI super-resolution methods [12,21,22,23] have been proposed to fuse a low-resolution HSI with a high-resolution coincident image. An overview of recent state-of-the-art hyperspectral and multispectral image fusion methods can be found in [23,24]. A general trend in the existing methods is to exploit the spatial information by spectral mixture analysis (hyperspectral unmixing) [25]. In spectral mixture analysis, the HSI is described as a mixture of some "pure" spectral signatures (the so-called endmembers). These spectral signatures are the spectra of the underlying materials present in the observed scene. Based on spectral mixture analysis, the original high-resolution HSI can be unmixed into an endmember dictionary (or endmember spectra) and an abundance code (or abundance matrix). The endmember dictionary denotes the "pure" spectral signatures, while the abundance code indicates the proportions of the endmember spectra within each pixel. The high-resolution HSI can be estimated by combining the endmember dictionary and the abundance code. In this way, HSI super-resolution is transformed into the problem of estimating the endmember dictionary and the abundance code. The main limitation of these super-resolution methods is that their performance largely depends on the accuracy of the estimated endmember dictionary and abundance code [26,27].
In the past several decades, much research has been devoted to the efficient estimation of the endmember dictionary and the abundance code. Because the MSI captures the same scene as the HSI, the endmember dictionary or the abundance code should be the same [27,28]. The endmember dictionary is extracted from the low-resolution HSI and the abundance code is estimated from the spatial fractional abundances of the MSI. For example, Yokoya et al. [31] proposed coupled nonnegative matrix factorization (CNMF) to estimate the endmember dictionary and the abundance code from the HSI and MSI, respectively. To estimate the endmember dictionary and the abundance code efficiently, various constraints are considered, such as spatial smoothness, nonnegativity and sparsity. Simoes et al. [26] proposed a convex subspace-based formulation by considering a total variation abundance regularization. Zou et al. [29] proposed a double regularization HSI super-resolution method by introducing spatial structure information and nonnegative factorization. Veganzones et al. [27] proposed local dictionary learning to exploit the locally low rank property for HSI super-resolution. Zhao et al. [19] proposed a joint spatial and spectral regularization for HSI super-resolution. Dong et al. [30] proposed a non-negative structured sparse representation to exploit the spatial correlation among the learned sparse codes. However, most methods are inspired by a simple assumption [26,27,31]: the spectral information extracted from one of the images should also be able to explain the other one. The super-resolved HSI is then reconstructed by combining the endmember dictionary and the abundance code. In these conventional dictionary-based methods, only the observed low-resolution HSI is used to estimate the endmember dictionary, and only the observed MSI is used to estimate the abundance code. These methods therefore usually ignore the cross-correlation between the observed HSI and MSI, which is helpful in obtaining high quality reconstructed images.
In this paper, a self-dictionary sparse regression (SDSR) method is proposed to fuse the HSI and MSI. Both the observed low-resolution HSI and the MSI are simultaneously considered to estimate the endmember dictionary and the abundance code. The proposed method first extracts the endmember dictionary by self-dictionary sparse regression, and then estimates the abundance code by constrained least squares with a consistent constraint. (1) To extract the endmember dictionary, the proposed method performs a common sparse basis selection on the concatenation of the observed HSI and MSI. The endmember dictionary is learned by finding the smallest (sparsest) subset of spectral signatures that represents the whole set of spectral signatures. In particular, the learned endmember dictionary preserves the spectral consistency between the observed HSI and MSI, since the HSI and MSI share the same sparse basis. (2) Although the abundance code can be estimated based on the learned endmember dictionary, directly estimating the abundance code of the MSI is difficult. Because the number of multispectral bands is usually lower than the number of induced endmembers, estimating the abundance of the MSI is an ill-posed problem [27]. To improve the estimation of the abundance code, a consistent constraint is exploited to ensure the spatial consistency between the abundance code of the low-resolution HSI and the abundance code of the high-resolution HSI. This means that the spatial fractional abundances at corresponding spatial positions should coincide.
Generally, the conventional dictionary-based HSI super-resolution method consists of four steps (see Figure 1). Given the observed HSI, (1) the first step is to learn the endmember dictionary of the HSI from the observed HSI. (2) Then, the endmember dictionary is projected onto the multispectral domain by a spectral response to estimate the endmember dictionary of the MSI. (3) The projected endmember dictionary is used to estimate the abundance code from the observed MSI. (4) Finally, the endmember dictionary of the HSI and the abundance code of the MSI are combined to reconstruct the high-resolution HSI. In these conventional methods, the spectral responses of the sensors are assumed known. For example, Wei et al. [32] proposed HSI super-resolution by jointly estimating the endmember signatures and abundances from the observed HSIs. The spectral response of the multispectral sensor is assumed to be known and available. Ghasrodashti et al. [33] proposed an HSI super-resolution method by combining spectral unmixing and Bayesian sparse representation. The spectral response is estimated from the observed images using the method presented in [26]. Guerra et al. [34] proposed a computationally efficient algorithm for fusing multispectral and hyperspectral images by incorporating the spatial details of the MSI into the HSI. It is also assumed that the spectral response of the multispectral sensor is known. The common point of these methods is that the endmember dictionary of the HSI and the endmember dictionary of the MSI are estimated one after the other. In this paper, the endmember dictionaries are estimated by performing a common sparse basis selection on the concatenation of the observed HSI and MSI. Both the endmember dictionary of the HSI and the endmember dictionary of the MSI are estimated jointly. In other words, the endmember dictionary of the MSI can be estimated without requiring a spectral response.
Figure 1. The conventional dictionary-based HSI super-resolution method: (1) The first step is to learn the dictionary from the observed hyperspectral image. (2) Then, the dictionary is projected onto the multispectral domain by a spectral response. (3) The projected dictionary is used to estimate the code from the observed MSI. (4) Finally, the dictionary and the code are combined to reconstruct the hyperspectral super-resolution image.
Consequently, the proposed method includes two main contributions:
(1) A self-dictionary regression is proposed to identify the endmember dictionary on the concatenation of the observed HSI and MSI. The learned endmember dictionary preserves the spectral consistency between the observed HSI and MSI, since the HSI and MSI share the same sparse basis.
(2) A consistent constraint is proposed to preserve the spatial consistency between the low-resolution HSI and the high-resolution HSI. The abundance code is estimated by constrained least squares with the consistent constraint.
This paper is organized as follows: Section 2 overviews the conventional HSI super-resolution methods. The HSI super-resolution problem is formulated in Section 3. Then, Section 4 presents the proposed method. Section 5 details the experimental results. Section 6 concludes this paper.

Related Work
In recent years, many HSI super-resolution methods have been proposed to enhance the spatial resolution of HSIs. A general solution is to fuse the low-resolution HSI with a high-resolution coincident image of the same scene. According to the type of coincident image, most methods can be categorized into two groups: the pan-sharpening methods and the HSI-MSI fusion methods.
The pan-sharpening methods enhance the spatial resolution of an HSI by fusing the low-resolution HSI with a corresponding high-spatial-resolution panchromatic image (PAN). Owing to their high spatial resolution, PANs have been widely used to enhance the resolution of MSIs, which is known as MSI pan-sharpening. With the increasing availability of hyperspectral sensors, PANs are now also used to enhance the resolution of HSIs, which is known as HSI pan-sharpening [23]. Naturally, HSI pan-sharpening can be handled by the popular MSI pan-sharpening methods, such as component substitution (CS) [35] and multiresolution analysis (MRA) [36]. To obtain the high-resolution HSI, the CS methods substitute the spatial component of the HSI with the PAN, whereas the MRA methods inject the spatial details of the PAN into the HSI. In the CS methods, the HSI is first separated into spatial and spectral components. Subsequently, the fused image is obtained by substituting the spatial component with the PAN. The MRA methods first generate the spatial details by a multiresolution decomposition of the PAN. Then, the generated details are injected into the HSI. The CS methods may provide high spatial quality, but at the cost of spectral distortion. The MRA methods can preserve spectral consistency, but at the cost of spatial quality.
The HSI-MSI fusion methods obtain the high-resolution HSI by combining an HSI with a high-spatial-resolution MSI, because the MSI provides both spatial and spectral information. The HSI-MSI fusion problem is significantly more difficult than pan-sharpening, but also more advantageous. The methods proposed to fuse the HSI and MSI in recent decades can be divided into frequency-based methods and dictionary-based methods (unmixing or spectral mixture analysis). The frequency-based methods transform the original images (HSI and MSI) into frequency components, and combine the wavelet coefficients in the transform domain. More recently, the dictionary-based methods use spectral mixture analysis to enhance the spatial resolution of the HSI. Based on spectral mixture analysis, the original high-resolution HSI can be unmixed into an endmember dictionary (or endmember spectra) and an abundance code (or abundance matrix). In these methods, the high-resolution HSI is obtained by combining the estimated endmember dictionary and abundance code. Much research has been conducted on the efficient estimation of the endmember dictionary and the abundance code. Furthermore, various constraints [19,27,29,30,37] are considered to estimate the endmember dictionary and the abundance code, such as spatial smoothness [26], nonnegativity and sparsity [37]. Simoes et al. [26] proposed a convex subspace-based method by considering a total variation abundance regularization. Akhtar et al. [37] presented a constrained sparse representation by imposing non-negativity and the spatial structure for HSI super-resolution. Zou et al. [29] proposed a double regularization HSI super-resolution method by introducing spatial structure information and nonnegative factorization. Lanaras et al. [38] proposed a hyperspectral super-resolution method by jointly unmixing the two input images into the pure reflectance spectra of the observed materials. Veganzones et al. [27] proposed a local dictionary learning method to exploit the low rank property for HSI super-resolution. Zhao et al. [19] proposed a joint spatial-spectral regularization to utilize the nonlocal similarities for HSI super-resolution. Dong et al. [30] proposed a non-negative structured sparse representation to exploit the spatial correlation among the learned sparse codes. Mei et al. [21] proposed a novel three-dimensional fully convolutional neural network for hyperspectral super-resolution. The main advantage of the dictionary-based methods is that they are physically reasonable and effective for HSI-MSI fusion.

Problem Formulation
Let $X \in \mathbb{R}^{L \times N}$ denote an HSI with $L$ spectral bands (rows of $X$) and $N$ pixels (columns of $X$). In this paper, the goal of HSI super-resolution is to recover the original high-resolution HSI $X \in \mathbb{R}^{L \times N}$ from two degraded observations of $X$: a low-resolution HSI $Y_h \in \mathbb{R}^{L \times n}$ and a high-resolution MSI $Y_m \in \mathbb{R}^{l \times N}$. For these observations, $n \ll N$ and $l \ll L$, which makes the estimation of a super-resolution HSI $X$ severely ill-posed. In particular, both observations $Y_h$ and $Y_m$ can be considered to be mappings of the original HSI:

$$Y_h = \phi_h(X), \qquad Y_m = \phi_m(X), \tag{1}$$

where $\phi_h : \mathbb{R}^{L \times N} \to \mathbb{R}^{L \times n}$ and $\phi_m : \mathbb{R}^{L \times N} \to \mathbb{R}^{l \times N}$. Generally, the MSI $Y_m$ can be approximated as:

$$Y_m \approx TX, \tag{2}$$

where $T \in \mathbb{R}^{l \times L}$ is a transformation matrix or spectral response. It describes the spectral quantization from the high-spectral-resolution image (HSI) to the low-spectral-resolution image (MSI).
In experiments, the spectral response is often assumed to be known [26,31]. The HSI can be described by a mixture of some pure spectral signatures [8,9]. If the spectrum at each pixel is assumed to be a linear combination of several endmember spectra, the high-resolution HSI is formulated as:

$$X = UV, \tag{3}$$

where $U \in \mathbb{R}^{L \times p}$ is the spectral dictionary and denotes the spectral signatures of the endmembers, $V \in \mathbb{R}^{p \times N}$ is the abundance code and denotes the proportions of the endmember spectra at each pixel, and $p$ represents the number of endmember spectra.
If $U$ and $V$ are known or estimated, an estimate of $X$ can be obtained by Equation (3). This reasoning led to the conventional dictionary-based method for HSI super-resolution [27,31] (see Figure 1). Given the observed HSI $Y_h$, (1) the first step is to learn the dictionary $U$ from the observed HSI. (2) Then, the dictionary is projected onto the multispectral domain by a spectral response $T$. (3) The projected dictionary $TU$ is used to estimate the code $V$ from the observed MSI $Y_m$. (4) Finally, the dictionary $U$ and the code $V$ are combined to reconstruct the high-resolution HSI $X$.
As shown in Figure 1, the success of the dictionary-based methods depends fundamentally on the first three steps. Many efforts have been made in the past few years to estimate $U$, $T$ and $V$ efficiently in these three steps: for example, Refs. [39,40] for $U$, Refs. [26,27] for $T$, and Refs. [27,30] for $V$.
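The four-step conventional pipeline above can be sketched on synthetic data. This is only an illustration under stated assumptions: the dictionary `U` is taken as already learned (step 1 is skipped), the spectral response `T` is random, the data are noiseless, and an unconstrained least-squares solve stands in for the constrained unmixing used in practice. The names `conventional_fusion` and `make_scene` are hypothetical helpers, not part of any cited method.

```python
import numpy as np

rng = np.random.default_rng(0)
L, l, N = 30, 4, 200  # HSI bands, MSI bands, high-resolution pixels (toy sizes)

def conventional_fusion(U, T, Y_m):
    """Steps 2-4 of the conventional pipeline, with U assumed already learned."""
    TU = T @ U                                    # step 2: project dictionary to MSI domain
    V, *_ = np.linalg.lstsq(TU, Y_m, rcond=None)  # step 3: codes from MSI (unconstrained)
    return U @ V                                  # step 4: reconstruct X = U V

def make_scene(p):
    """Random toy scene with p endmembers (hypothetical helper)."""
    U = rng.random((L, p))
    V = rng.dirichlet(np.ones(p), size=N).T       # nonnegative, columns sum to one
    T = rng.random((l, L))
    return U, V, T

# Fewer endmembers than MSI bands: the codes are identifiable from Y_m = T X.
U, V, T = make_scene(p=3)
X = U @ V
X_hat = conventional_fusion(U, T, T @ X)
err_well_posed = np.linalg.norm(X - X_hat) / np.linalg.norm(X)

# More endmembers than MSI bands: T U is rank deficient, so step 3 is ill-posed.
U6, V6, T6 = make_scene(p=6)
rank_TU = np.linalg.matrix_rank(T6 @ U6)
```

With `p = 3 < l = 4` the reconstruction is exact up to rounding, whereas with `p = 6 > l` the projected dictionary has rank at most `l`, so the codes cannot be uniquely determined from the MSI alone.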

Proposed Method
The available dictionary-based HSI super-resolution methods usually obtain the dictionary $U$ from the observed low-resolution HSI and estimate the code $V$ from the observed MSI. However, these methods usually ignore the cross-correlation between the observed HSI and MSI, which is helpful in obtaining high quality reconstructed images. In this section, a novel HSI super-resolution method is presented from the perspective of self-dictionary sparse regression [41]. A sparsity regularization is introduced to recover the spectral dictionary (or endmember signatures) from both the observed low-resolution HSI and the MSI. In order to preserve the spectral consistency between the observed HSI and MSI, the spectral dictionary is estimated by performing a common sparse basis selection on the concatenation of the observed HSI and MSI. Meanwhile, the dictionary of the MSI can also be estimated without requiring a spectral response. Once the dictionary is acquired, the corresponding code can be recovered by standard non-blind unmixing methods (e.g., [42]). Here, the code of the high-resolution HSI is estimated by ensuring the spatial consistency between the code of the low-resolution HSI and the code of the high-resolution HSI. Therefore, the proposed method contains two parts: (1) learning the dictionary with a common sparsity on the concatenation of the observed HSI and MSI, and (2) estimating the corresponding code by exploiting a consistent constraint to preserve the spatial consistency between the low-resolution HSI and the high-resolution HSI. The flowchart of the proposed method is shown in Figure 2.
Figure 2. The proposed method (1) first learns the dictionary with a common sparsity on the concatenation of the observed HSI and MSI, and (2) then estimates the corresponding code by exploiting a consistent constraint to preserve the spatial consistency between the low-resolution HSI and the high-resolution HSI.

Self-Dictionary Sparse Regression
The performance of HSI super-resolution is strongly influenced by the estimation of the spectral dictionary [27]. Since the low-resolution HSI and the corresponding high-resolution HSI capture the same scene, their endmembers should be the same [27,28]. In the general dictionary-based methods, the dictionary $U_h \in \mathbb{R}^{L \times p}$ is learned from the observed low-resolution HSI alone, which ignores the spectral consistency between the observed HSI and MSI. In particular, the elements of the dictionary $U_h$ should be consistent with the dictionary $U_m$ of the MSI $Y_m$. For this reason, a novel method is proposed to obtain the spectral dictionary entries from both the observed HSI and MSI. The spectral dictionary (the underlying materials or endmembers) of the high-resolution HSI, $U \in \mathbb{R}^{L \times p}$, is assumed to be the same as the dictionary of the low-resolution HSI, $U_h$. First, a spatially upscaled version of the low-resolution HSI (the pre-HSI) is obtained by:

$$\tilde{Y}_h = f(Y_h), \tag{4}$$

where $f(\cdot)$ is an upscaling function, for example bicubic interpolation. Then, the spectral dictionary is learned from the concatenation of the pre-HSI and the observed MSI by an endmember induction algorithm:

$$\begin{bmatrix} \tilde{Y}_h \\ Y_m \end{bmatrix} = \begin{bmatrix} U_h \\ U_m \end{bmatrix} S, \tag{5}$$

where $U_m \in \mathbb{R}^{l \times p}$ is the dictionary of the MSI and $S \in \mathbb{R}^{p \times N}$ is the coefficient matrix. In this paper, self-dictionary sparse regression [43,44] is used to extract the spectral dictionary (the spectra of the underlying endmembers). In this way, the dictionary is learned by finding the smallest subset of measurement vectors that represents the whole set of measurement vectors. Therefore, the dictionary is learned by performing a common sparse basis selection:

$$\min_{C} \; \|C\|_{\text{row-}0} \quad \text{s.t.} \quad \begin{bmatrix} \tilde{Y}_h \\ Y_m \end{bmatrix} = \begin{bmatrix} \tilde{Y}_h \\ Y_m \end{bmatrix} C, \quad C \ge 0, \quad \mathbf{1}^T C = \mathbf{1}^T, \tag{6}$$

where $C \in \mathbb{R}^{N \times N}$ is the abundance matrix and two constraints are imposed: the abundance non-negativity constraint (ANC) $C \ge 0$ and the abundance sum-to-one constraint (ASC) $\mathbf{1}^T C = \mathbf{1}^T$. $\|C\|_{\text{row-}0}$ denotes the number of nonzero rows of $C$.
Under the pure pixel assumption, Equation (6) may be formulated as identifying a complete pure pixel index set $\Lambda$ [45]. $\Lambda$ lists all indices of nonzero rows of $C$ and contains a pure pixel index of every endmember:

$$\Lambda = \{\, i \mid C(i,:) \neq 0 \,\}.$$

In this paper, Equation (6) is tackled by greedy pursuit [41,46] with an unknown number of endmembers. In this case, both the pure pixels and the number of endmembers are identified simultaneously.
Once $\Lambda$ is identified, we have

$$U_h = \tilde{Y}_h(:, \Lambda), \qquad U_m = Y_m(:, \Lambda). \tag{9}$$

The intuition behind the above observation is that, by solving Equation (6), we may recover a complete pure pixel index set $\Lambda$ and, consequently, identify the true endmember signature matrices $U_h$ and $U_m$.
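To make the idea of selecting a pure pixel index set Λ from the stacked data concrete, the sketch below uses a successive-projection style greedy search: it repeatedly picks the column with the largest residual norm and projects that direction out, stopping when the residual vanishes, so the number of endmembers does not need to be known in advance. This is a simplified stand-in chosen for brevity, not necessarily the greedy pursuit solver of [41,46]. It also assumes noiseless data with exact pure pixels and an ideally upscaled HSI; the function name `greedy_pure_pixel_search` and the random matrix `T`, used only to synthesise a toy MSI, are hypothetical.

```python
import numpy as np

def greedy_pure_pixel_search(Z, tol=1e-8):
    """Greedy selection of pure-pixel columns of the stacked data Z (bands x pixels).

    Successive-projection sketch: pick the column with the largest residual
    norm, project it out of every column, and stop once the residual is
    negligible. Simplified stand-in, not the solver of the cited references.
    """
    R = np.array(Z, dtype=float)               # residual, same shape as Z
    scale = np.linalg.norm(R, axis=0).max()
    selected = []
    while len(selected) < min(Z.shape):
        norms = np.linalg.norm(R, axis=0)
        j = int(np.argmax(norms))
        if norms[j] <= tol * scale:            # nothing left to explain
            break
        selected.append(j)
        u = R[:, j] / norms[j]
        R -= np.outer(u, u @ R)                # remove the selected direction
    return selected

# Toy data: p endmembers, pure pixels placed at columns 0..p-1 (assumption).
rng = np.random.default_rng(0)
L, l, p, N = 20, 4, 3, 100
U_h = rng.random((L, p))
T = rng.random((l, L))                         # only used to build the toy MSI
U_m = T @ U_h
S = rng.dirichlet(np.ones(p), size=N).T        # nonnegative, sum-to-one codes
S[:, :p] = np.eye(p)                           # make the first p pixels pure
Z = np.vstack([U_h @ S, U_m @ S])              # concatenation of HSI and MSI

Lam = sorted(greedy_pure_pixel_search(Z))
U_h_est, U_m_est = Z[:L, Lam], Z[L:, Lam]      # dictionaries read off the data
```

Because the selection runs on the concatenated matrix, the same index set yields both dictionaries, which is how the MSI dictionary is obtained here without using the spectral response in the estimation itself.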

Sparse Codes Estimation via Constrained Least Squares and Consistent Constraints
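Once the spectral dictionaries $U_h$ and $U_m$ are estimated, the corresponding codes $V_h$ and $V_m$ are estimated by solving the following constrained least squares (CLS) optimization problems:

$$\min_{V_h} \; \|Y_h - U_h V_h\|_F^2 \quad \text{s.t.} \quad V_h \ge 0, \tag{10}$$

$$\min_{V_m} \; \|Y_m - U_m V_m\|_F^2 \quad \text{s.t.} \quad V_m \ge 0, \tag{11}$$

where the codes $V_h \in \mathbb{R}^{p \times n}$ and $V_m \in \mathbb{R}^{p \times N}$ represent the abundances of materials, and $\|\cdot\|_F$ denotes the Frobenius norm. The constraints $V_h \ge 0$ and $V_m \ge 0$ mean that the codes are element-wise non-negative [47]. It is possible to add the abundance sum-to-one constraints [47] $V_h^T \mathbf{1}_{p \times 1} = \mathbf{1}_{n \times 1}$ and $V_m^T \mathbf{1}_{p \times 1} = \mathbf{1}_{N \times 1}$. However, these constraints are usually dropped due to possible scale model mismatches [48]. Multiplicative update rules have been developed to minimize the residual errors in Equations (10) and (11). The multiplicative update rules for Equations (10) and (11) are given by:

$$V_h \leftarrow V_h \;.\!*\; (U_h^T Y_h) \;./\; (U_h^T U_h V_h), \tag{12}$$

$$V_m \leftarrow V_m \;.\!*\; (U_m^T Y_m) \;./\; (U_m^T U_m V_m), \tag{13}$$

where $.*$ and $./$ denote element-wise multiplication and division, respectively, and $(A)^T$ denotes the transpose of the matrix $A$.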
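The multiplicative update for a fixed dictionary can be sketched in a few lines of NumPy. The function name `nonneg_codes`, the random initialisation, the iteration count and the small constant `eps` that guards against division by zero are assumptions of this sketch, and the data and dictionary are assumed to be element-wise non-negative.

```python
import numpy as np

def nonneg_codes(Y, U, n_iter=500, eps=1e-12, seed=0):
    """Non-negative codes V with Y ~ U V for a fixed non-negative dictionary U.

    Applies the element-wise rule V <- V .* (U^T Y) ./ (U^T U V).
    """
    rng = np.random.default_rng(seed)
    V = rng.random((U.shape[1], Y.shape[1])) + eps   # strictly positive start
    UtY = U.T @ Y
    UtU = U.T @ U
    for _ in range(n_iter):
        V *= UtY / (UtU @ V + eps)                   # stays non-negative
    return V

# Toy check on noiseless synthetic data.
rng = np.random.default_rng(1)
U = rng.random((30, 4))
V_true = rng.dirichlet(np.ones(4), size=50).T
Y = U @ V_true

V_few = nonneg_codes(Y, U, n_iter=1)
V_many = nonneg_codes(Y, U, n_iter=500)
err_few = np.linalg.norm(Y - U @ V_few)
err_many = np.linalg.norm(Y - U @ V_many)
```

The same function would be called once with the HSI pair and once with the MSI pair. Since every factor in the update is non-negative, the non-negativity constraint is maintained without an explicit projection step.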
In the conventional dictionary-based methods, the code $V$ of the desired high-resolution HSI is approximated by the code $V_m$ estimated from the observed MSI: $V \approx V_m$. However, a main limitation of these methods is that the code $V_m$ cannot be estimated reliably from the MSI [27]. Since the number of multispectral bands is usually lower than the number of entries in the spectral dictionary, estimating the abundance code of the MSI is an ill-posed problem [27]. To address this issue, the spatial consistency between $V_h$ and $V_m$ is considered to improve the estimation of $V$:

$$\min_{V} \; \|V - V_m\|_F^2 + \lambda \|V_h F_L - V F_H\|_F^2, \tag{14}$$

where $\lambda$ is a weighting factor that balances the importance of the two terms in Equation (14). When $\lambda$ is very small ($\lambda \to 0$), the second term $\lambda \|V_h F_L - V F_H\|_F^2$ has little effect, and $V$ depends only on $V_m$. When $\lambda$ is very large ($\lambda \to \infty$), the first term $\|V - V_m\|_F^2$ has little effect, and $V$ depends only on $V_h$. Therefore, an appropriate parameter $\lambda$ balances the contributions of the codes $V_m$ and $V_h$. $F_L \in \mathbb{R}^{n \times q}$ and $F_H \in \mathbb{R}^{N \times q}$ represent the corresponding spatial locations in the low-resolution HSI $Y_h$ and the high-resolution HSI $X$, respectively. $F_H = [f_{H1}, \ldots, f_{Hq}]$ indicates the $q$ samples on the high-resolution HSI $X$, and $F_L = [f_{L1}, \ldots, f_{Lq}]$ indicates the set of corresponding samples on the low-resolution HSI $Y_h$. Each column of $F_L$ and $F_H$ is an indicator vector that contains one at the selected sample and zeros elsewhere. In other words, 1 means that the pixel is selected, and 0 means that the pixel is not selected. In addition, there is a positional correspondence between the selected pixels on the low-resolution HSI $Y_h$ and on the high-resolution HSI $X$. HSI super-resolution assumes that a pixel in the low-resolution HSI can be obtained by averaging the pixels of the high-resolution HSI belonging to the same area. This assumption is referred to as pixel aggregation [32]. Given the $q$ samples $F_H$ on the high-resolution HSI $X$ and the corresponding samples $F_L$ on the low-resolution HSI $Y_h$, the codes at corresponding positions should coincide, i.e., $V_h F_L \approx V F_H$. Taking the derivative of Equation (14) with respect to $V$ and setting it to zero yields the solution:

$$V = \left(V_m + \lambda V_h F_L F_H^T\right)\left(I + \lambda F_H F_H^T\right)^{-1}, \tag{15}$$

where $I$ is an $N \times N$ identity matrix. The desired high-resolution HSI can be recovered by Equation (3) after estimating the spectral dictionary $U$ and the abundance code $V$. The overall procedure for estimating the high-resolution HSI is given in Algorithm 1.
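The closed-form code estimate obtained by setting the gradient of the consistency objective to zero can be checked numerically. In the sketch below, pixels are indexed along one flattened axis, `F_L` is the identity (q = n) and `F_H` selects every `ratio`-th high-resolution pixel; these choices, the toy sizes and the function name `fuse_codes` are assumptions made for illustration only.

```python
import numpy as np

def fuse_codes(V_m, V_h, F_L, F_H, lam=1.0):
    """Minimiser of ||V - V_m||_F^2 + lam * ||V_h F_L - V F_H||_F^2 over V."""
    N = V_m.shape[1]
    A = np.eye(N) + lam * F_H @ F_H.T       # symmetric positive definite
    B = V_m + lam * V_h @ F_L @ F_H.T
    return np.linalg.solve(A, B.T).T        # V = B A^{-1}, using A = A^T

rng = np.random.default_rng(0)
p, n, ratio = 3, 16, 4
N = n * ratio
V_m = rng.random((p, N))                    # codes estimated from the MSI
V_h = rng.random((p, n))                    # codes estimated from the low-res HSI
F_L = np.eye(n)                             # all low-resolution pixels selected
F_H = np.zeros((N, n))
F_H[np.arange(n) * ratio, np.arange(n)] = 1.0   # uniform sub-sampling of X

lam = 1.0
V = fuse_codes(V_m, V_h, F_L, F_H, lam)

# Gradient of the objective (up to a factor of 2) should vanish at V.
grad = (V - V_m) - lam * (V_h @ F_L - V @ F_H) @ F_H.T
```

With indicator matrices of this kind, `F_H @ F_H.T` is diagonal, so the fused code at a sampled position is the weighted average `(V_m + lam * V_h) / (1 + lam)` of the two estimates, while unsampled positions keep the MSI-based code.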

Algorithm 1 The proposed method
Input: $Y_h \in \mathbb{R}^{L \times n}$, the observed hyperspectral image; $Y_m \in \mathbb{R}^{l \times N}$, the observed multispectral image; parameters $\lambda = 1$ and $q = n$.
1. Identify a complete pure pixel index set $\Lambda$ by Equation (6);
2. Estimate the spectral dictionaries $U_h$ and $U_m$ by Equation (9), and set the spectral dictionary $U = U_h$;
3. Calculate the code of the low-resolution HSI $V_h$ by Equation (10);
4. Calculate the code of the MSI $V_m$ by Equation (11);
5. Estimate the code of the high-resolution HSI $V$ by Equation (15);
6. Reconstruct $X$ by $X = UV$.
Output: $X$, the high-resolution hyperspectral image.

Experiments
To verify the performance of the proposed method, four subsections are presented in this section: (1) Section 5.1 details three public datasets; (2) Section 5.2 introduces the competing methods and evaluation indexes; (3) Section 5.3 describes the implementation details of the proposed method; and, (4) finally, Section 5.4 displays the experiments and comparisons.

Datasets
To evaluate the performance of the proposed method, several datasets are considered, including ground based and remote sensing imagery. These datasets are commonly used in HSI super-resolution, so experiments on them allow a fair evaluation of the proposed method. We consider the HSIs in the datasets as the original high-resolution HSIs [37]. As is common practice [23,37,49], the low-resolution HSI is simulated by first blurring the high-resolution HSI and then down-sampling the result by a factor of 3 in each direction. In order to facilitate the experiments, we use the Norm(•) defined in Equation (16) to normalize the HSIs in the datasets to the range [0, 1] in advance, where X denotes the HSIs in the datasets. For evaluating the proposed method, we converted the resulting image into an 8-bit image.
Ground based dataset. The ground based dataset is CAVE (http://www.cs.columbia.edu/CAVE/databases/multispectral/). CAVE [37,38,49] comprises 32 HSIs of size 512 × 512 × 31, where 512 × 512 is the image size and 31 is the number of bands. Each HSI is acquired at a wavelength interval of 10 nm in the range 400-700 nm. The high-spatial-resolution MSI is created by integrating a high-resolution HSI over the spectral dimension, using the Nikon D700 (Sendai, Japan) spectral response (https://www.maxmax.com/spectral_response.htm). Here, the HSI of "Balloons" is selected to give a visual illustration, as shown in Figure 3. Three bands (3, 25 and 30) are selected as a pseudo color view.
Remote sensing dataset. The first remote sensing dataset is the Pavia University scene (http://www.ehu.eus/ccwintco/uploads/e/e3/Pavia.mat), which is widely used in hyperspectral classification with nine kinds of samples [26,28]. This image was acquired by the reflective optics system imaging spectrometer (ROSIS), a sensor of DLR that has 115 spectral bands. The ROSIS sensor captured the image over Pavia University in northern Italy in 2002. This HSI contains 610 × 340 pixels and 103 spectral bands, after removing water absorption bands. A part of size 200 × 200 × 93 containing abundant detail is selected from the original image, as shown in Figure 4. This image is used as the high-resolution HSI. In the HSI super-resolution problem, the spectral responses of the sensors are assumed known. The spectral response of the IKONOS satellite is used to create the observed MSI. The second remote sensing dataset is the Paris scene [26]. This dataset was taken above Paris by two instruments on board the Earth Observing-1 Mission (EO-1) satellite: the Hyperion instrument and the Advanced Land Imager (ALI) instrument. The Hyperion instrument provides an HSI with a spatial resolution of 30 m, while the ALI instrument provides an MSI at a resolution of 30 m, as shown in Figure 5. In HSI-MSI fusion experiments, the observed HSI should have a lower resolution than the observed MSI. Therefore, the low-resolution HSI is simulated by blurring and down-sampling the original high-resolution HSI. Moreover, to further verify the effectiveness of the proposed method, the Pavia dataset is used for hyperspectral classification.
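The data preparation described in this subsection can be sketched as follows. Several details are assumptions of the sketch rather than the exact protocol: min-max scaling is used as one plausible form of Norm(·), the blur and down-sampling step is implemented as block averaging (pixel aggregation) with factor 3, the 8-bit conversion rounds `255 * value`, and the spectral response `T` is a random row-normalised matrix standing in for a real sensor response. All function names are hypothetical.

```python
import numpy as np

def normalize01(X):
    """Min-max scaling to [0, 1] (assumed form of the normalisation)."""
    return (X - X.min()) / (X.max() - X.min())

def to_uint8(X01):
    """Convert a [0, 1] image to 8 bits (assumed rounding convention)."""
    return np.round(255 * X01).astype(np.uint8)

def simulate_low_res_hsi(cube, factor=3):
    """Blur and down-sample by averaging factor x factor blocks of pixels."""
    H, W, B = cube.shape
    H, W = H - H % factor, W - W % factor          # crop to a multiple of factor
    blocks = cube[:H, :W].reshape(H // factor, factor, W // factor, factor, B)
    return blocks.mean(axis=(1, 3))

def simulate_msi(cube, T):
    """Integrate the HSI over the spectral dimension with response T (l x L)."""
    return cube @ T.T

rng = np.random.default_rng(0)
cube = normalize01(rng.random((12, 9, 31)) * 4000.0)   # toy 31-band HSI
T = rng.random((3, 31))
T /= T.sum(axis=1, keepdims=True)                      # each response sums to one

Y_h = simulate_low_res_hsi(cube, factor=3)             # shape (4, 3, 31)
Y_m = simulate_msi(cube, T)                            # shape (12, 9, 3)
cube8 = to_uint8(cube)
```

Any other blur kernel, for example a Gaussian, could be substituted before the sub-sampling step; block averaging is used here only because it matches the pixel aggregation assumption and keeps the sketch short.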

Competitors and Evaluation Indexes
Competing methods. To demonstrate the superiority of the proposed method, six state-of-the-art methods are selected as competing methods. All of these competing methods can be obtained from the corresponding authors' MATLAB+MEX (R2014a, MathWorks, Natick, MA, USA) implementations. All the parameters involved in the competing methods are automatically chosen as described in the references. For a fair comparison, we use the same number of endmembers in all experiments. Furthermore, the spectral responses of the sensors can be estimated by HySure [26]. The estimated spectral responses are used as the basis for all the competing methods:
• Coupled nonnegative matrix factorization (CNMF) [31] achieves the high-resolution HSI by alternately unmixing the observed HSI and MSI. The endmember dictionary and abundance code are estimated from the HSI and MSI, respectively.
• Hyperspectral SuperResolution (HySure) (https://github.com/alfaiate/HySure) [26] formulates HSI super-resolution as a convex subspace-based optimization problem. A total variation abundance regularization is considered to promote piecewise smoothness. In addition, the spectral responses of the sensors are estimated by assuming that they are relatively smooth.
• Convolutional neural network collaborative nonnegative matrix factorization (CNNCNMF) [28] employs a convolutional neural network to learn the spatial mapping between the observed HSI and MSI. In addition, collaborative nonnegative matrix factorization is introduced to explore the spectral characteristics shared by the observed HSI and MSI.
Evaluation indexes. To assess the performance of all competing methods, six quantitative indices [26,50] are employed, including root mean square error (RMSE), mean peak signal-to-noise ratio (MPSNR), mean structure similarity index (MSSIM), erreur relative globale adimensionnelle de synthèse (ERGAS), universal image quality index (UIQI), and spectral angle mapper (SAM). The similarity between the target image and the reference image is evaluated by RMSE and MPSNR, both based on the mean square error. The structural consistency between the target image and the reference image is evaluated by MSSIM. The fidelity is evaluated by ERGAS, based on the weighted sum of the MSE of each band. UIQI is a universal image quality index that evaluates image reconstruction quality. The fidelity of the spectral reflectance is described by SAM. The larger MPSNR, MSSIM, and UIQI are, the better the image reconstruction quality is. The smaller RMSE, ERGAS, and SAM are, the better the image reconstruction quality is.
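Four of the six indices can be written compactly for image cubes of shape (rows, columns, bands). The sketch below follows the commonly used definitions; the peak value of 1 (images normalised to [0, 1]), SAM reported in degrees, and ERGAS with the spatial down-sampling factor as `scale` are assumptions about conventions that vary between papers. MSSIM and UIQI are omitted because they require windowed statistics.

```python
import numpy as np

def rmse(ref, est):
    """Root mean square error over all pixels and bands."""
    return float(np.sqrt(np.mean((ref - est) ** 2)))

def mpsnr(ref, est, peak=1.0):
    """Peak signal-to-noise ratio per band, averaged over bands (in dB)."""
    mse = np.mean((ref - est) ** 2, axis=(0, 1))
    return float(np.mean(10 * np.log10(peak ** 2 / mse)))

def sam_degrees(ref, est, eps=1e-12):
    """Mean spectral angle between reference and estimated spectra, in degrees."""
    dot = np.sum(ref * est, axis=2)
    denom = np.linalg.norm(ref, axis=2) * np.linalg.norm(est, axis=2) + eps
    angles = np.arccos(np.clip(dot / denom, -1.0, 1.0))
    return float(np.degrees(angles).mean())

def ergas(ref, est, scale=3):
    """ERGAS: 100/scale * sqrt(mean over bands of MSE_b / mean_b^2)."""
    mse = np.mean((ref - est) ** 2, axis=(0, 1))
    mean_sq = np.mean(ref, axis=(0, 1)) ** 2
    return float(100.0 / scale * np.sqrt(np.mean(mse / mean_sq)))

rng = np.random.default_rng(0)
ref = rng.random((8, 8, 5)) + 0.5          # toy reference cube
est = ref + 0.1                            # constant error of 0.1 everywhere
```

Note that SAM is insensitive to a per-pixel scaling of the spectrum, which is why it is reported alongside the error-based indices rather than instead of them.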

Parameter Determination
The proposed method has two parameters, as delineated in Algorithm 1: the regularization parameter λ and the number of consistency samples q. The impact of the parameters is evaluated on the Balloons data, the Pavia dataset, and the Paris dataset. Experimental results for various λ and q are shown in Figure 6. For the CAVE dataset, the parameters are set to λ = 1, q = n. For the Pavia dataset, the parameters are set to λ = 0.7, q = n. For the Paris dataset, the parameters are set to λ = 10, q = n.
The parameter λ is the regularization parameter that balances the contributions of the consistency constraint term and the fidelity term. Figure 6a,d,g show the experimental results for λ from 0 to 10. For both RMSE and MPSNR, the performance initially improves as λ increases. However, on the "Balloons" data, the performance degrades once λ exceeds 1. On the Pavia dataset, the performance degrades once λ exceeds 0.7.
Experimental results for various q/n (from 0 to 1) are shown in Figure 6b,e,h. It is observed that the performance improves slightly as the spatial bandwidth q/n increases. When the number of consistency samples is equal to n, the number of pixels in the low-resolution HSI, we achieve the best performance in terms of RMSE and MPSNR. In this case, all the pixels in the low-resolution HSI are considered: F_L becomes an n × n identity matrix, and F_H accounts for a uniform sub-sampling of the image that yields the lower spatial resolution of the HSI. Furthermore, we analyze the effect of the number of endmembers p on the performance of the proposed method, as shown in Figure 6c,f,i. It is observed that a proper number of endmembers is important and differs across datasets. For the CAVE dataset, the Pavia dataset, and the Paris dataset, the number of endmembers is set to 5, 10, and 20, respectively.
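The q = n case described above, where F_L is the identity and F_H performs a uniform sub-sampling, can be illustrated with a small NumPy sketch. The orientation of the matrices is an assumption made here for illustration: abundances are stored with one pixel per column in row-major order, so that right-multiplying a p × N high-resolution abundance matrix by F_H keeps every `factor`-th pixel in each direction, starting at offset zero. The function name and the sampling offset are hypothetical.

```python
import numpy as np


def uniform_subsampling_matrix(height, width, factor):
    """Build an (N x n) selection matrix F_H for an image of N = height * width pixels.

    Assumption: pixels are indexed in row-major order and every `factor`-th
    row and column is kept, starting from row 0 and column 0.
    """
    rows = np.arange(0, height, factor)
    cols = np.arange(0, width, factor)
    # Linear (row-major) indices of the retained high-resolution pixels.
    keep = (rows[:, None] * width + cols[None, :]).ravel()
    F_H = np.zeros((height * width, keep.size))
    F_H[keep, np.arange(keep.size)] = 1.0
    return F_H, keep


# Toy example: a 4 x 6 high-resolution grid sub-sampled by a factor of 2.
F_H, keep = uniform_subsampling_matrix(4, 6, 2)
n = keep.size
F_L = np.eye(n)  # with q = n, every low-resolution pixel is a consistency sample

p = 3  # hypothetical number of endmembers
A_H = np.random.default_rng(0).random((p, 4 * 6))  # placeholder abundances
A_H_sampled = A_H @ F_H  # abundances at the pixels compared with the low-resolution code
```

A dense matrix is used only for readability. For realistic image sizes, indexing with `keep` (or a sparse matrix) gives the same result without storing N × n entries.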

Comparison to the State-of-the-Art Methods
The experimental results on three datasets are discussed in this subsection. The proposed method is compared with six state-of-the-art methods in the experiments: CNMF, GSOMP, BSR, HySure, CNNCNMF, and PALM. For the sake of comparison, we present three types of evaluation: (a) quantitative assessments, (b) visual results, and (c) spectral differences between the original and estimated HSIs. Tables 1-3 show the HSI super-resolution performance of the different methods on the CAVE, Pavia, and Paris datasets, respectively.

For qualitative analysis, Figures 7-9 show the pseudo-color images for the three datasets. It is clearly observed that the proposed method achieves better visual quality than the other competing methods. As can be observed from Figures 7-9, the proposed method produces much sharper edges than the other methods without any obvious artifacts across the image. Notably, the proposed method achieves this visual performance without requiring the spectral response, which makes it suitable for complex real-world super-resolution tasks.

To gain further intuition, the average squared errors between the original spectra and the obtained spectra are given. Figure 10 shows the spectral residuals of several pixels after super-resolution. In Figure 10, six typical pixels are selected from the datasets: Pixel (50,50) and Pixel (300,300) in the Balloons data, Pixel (50,50) and Pixel (100,100) in the Pavia dataset, and Pixel (50,50) and Pixel (70,70) in the Paris dataset. Compared with the other competing methods, Figure 10 illustrates that the proposed method preserves the useful spectral information of the original HSI. Furthermore, the average squared errors between the obtained and ground truth spectra are given in Table 4.

In the proposed method, the cross-correlation between the observed HSI and MSI is utilized. The endmember dictionaries are learned by performing a common sparse basis selection on the concatenation of the observed HSI and MSI, so that the endmember dictionaries U_h and U_m can be estimated simultaneously. We call the proposed method with the concatenation operation the concatenation proposed method. In contrast, there are two other ways to estimate the endmember dictionaries without concatenating the observed HSI and MSI: the coupling proposed method and the uncoupling proposed method.

• The concatenation proposed method simultaneously estimates the endmember dictionaries U_h and U_m from the concatenation of the observed HSI and MSI.

• The coupling proposed method first estimates the endmember dictionary U_h from the observed HSI, and then uses the estimated U_h and the spectral response to calculate U_m.

• The uncoupling proposed method estimates the endmember dictionaries U_h and U_m from the observed HSI and MSI, respectively.
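The difference between the three variants above can be sketched in NumPy. This is only an illustration under loud assumptions: the paper's self-dictionary sparse regression is replaced by the successive projection algorithm (SPA), a simple greedy pure-pixel selector used here as a stand-in; pixels are stored as columns; the MSI is assumed to have already been brought to the same pixel grid as the HSI so that the two can be stacked; and all function names are hypothetical.

```python
import numpy as np


def spa_select(Y, p, tol=1e-10):
    """Stand-in basis selection (SPA): greedily pick up to p column indices of Y.

    Stops early when the residual is numerically zero, i.e. when Y has
    fewer than p linearly independent directions.
    """
    R = np.asarray(Y, dtype=np.float64).copy()
    scale = np.sqrt(np.max(np.sum(R ** 2, axis=0)))
    picked = []
    for _ in range(p):
        norms = np.sqrt(np.sum(R ** 2, axis=0))
        j = int(np.argmax(norms))
        if norms[j] <= tol * scale:
            break
        picked.append(j)
        u = R[:, j] / norms[j]
        R -= np.outer(u, u @ R)  # project out the selected direction
    return picked


def concatenation_dictionaries(Y_h, Y_m, p):
    """Select common pixels from the stacked data; read off U_h and U_m together."""
    idx = spa_select(np.vstack([Y_h, Y_m]), p)
    return Y_h[:, idx], Y_m[:, idx]


def coupling_dictionaries(Y_h, spectral_response, p):
    """Select from the HSI only; U_m follows from the (assumed known) spectral response."""
    U_h = Y_h[:, spa_select(Y_h, p)]
    return U_h, spectral_response @ U_h


def uncoupling_dictionaries(Y_h, Y_m, p):
    """Select from the HSI and the MSI independently."""
    return Y_h[:, spa_select(Y_h, p)], Y_m[:, spa_select(Y_m, p)]
```

With this stand-in, the uncoupling variant returns fewer than p atoms for U_m whenever the MSI has fewer bands than p, because the MSI spectra cannot span p independent directions. This mirrors, in a simplified way, why estimating U_m from the MSI alone is ill-posed.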
A comparison of the proposed SDSR methods is shown in Table 5. Among the proposed methods, the concatenation proposed method achieves impressive performance. This is because the concatenation operation can exploit the cross-correlation between the observed HSI and MSI. Furthermore, the coupling proposed method obtains performance similar to that of the concatenation proposed method, but requires a spectral response. In the coupling proposed method, the spectral response is assumed to be known and provides the spectral relationship between the HSI and MSI. In the uncoupling proposed method, the endmember dictionary U_m is estimated from the MSI, which is an ill-posed problem: the number of spectral bands in the MSI is usually lower than the number of endmembers, which leads to the worst performance.

Based on the multiplicative updates, the overall cost of NMF is O(tNLp). The computational complexity of the proposed method is O(tN(L + l)p), where t is the number of iterations in Algorithm 1. It can be seen that the computational complexity depends linearly on the HSI resolution. This means that the proposed method enhances the resolution in an acceptable computational time.
Additionally, the running time is shown in Tables 1-3. The experiments for the proposed method are implemented in MATLAB R2014a on a 3.4 GHz Intel CPU (Santa Clara, CA, USA) with 64 GB of memory. In the experiments, the running time of the proposed method is comparable to that of the competing methods except for Bicubic. Bicubic interpolation is frequently used for super-resolution because of its low computational complexity. The proposed method and CNMF have similar running times, which are almost linear in the image size. Although the proposed method does not outperform all the competing methods in running time, it obtains the high-resolution HSI in an acceptable computational time.
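To make the complexity argument concrete, the sketch below implements the standard Lee-Seung multiplicative updates for NMF under the Frobenius norm and annotates the dominant matrix products. It is not Algorithm 1 of the proposed method. The symbols are read here as N pixels, L HSI bands, l MSI bands, and p endmembers, which is an assumption about the notation, and the helper `dominant_cost` is hypothetical.

```python
import numpy as np


def nmf_multiplicative(Y, p, iters=50, seed=0, eps=1e-12):
    """Standard multiplicative-update NMF, Y (bands x N) ~ U (bands x p) @ A (p x N)."""
    rng = np.random.default_rng(seed)
    bands, N = Y.shape
    U = rng.random((bands, p)) + eps
    A = rng.random((p, N)) + eps
    errors = []
    for _ in range(iters):
        # U.T @ Y costs O(N * bands * p): the dominant term of the A update.
        A *= (U.T @ Y) / (U.T @ U @ A + eps)
        # Y @ A.T costs O(N * bands * p): the dominant term of the U update.
        U *= (Y @ A.T) / (U @ (A @ A.T) + eps)
        errors.append(float(np.linalg.norm(Y - U @ A)))
    return U, A, errors


def dominant_cost(t, N, bands, p):
    """Leading-order operation count t * N * bands * p (constants dropped)."""
    return t * N * bands * p


# Stacking l MSI bands under L HSI bands changes `bands` from L to L + l,
# so the leading cost grows by the factor (L + l) / L.
L_bands, l_bands = 30, 4  # hypothetical band counts, for illustration only
growth = dominant_cost(10, 100, L_bands + l_bands, 5) / dominant_cost(10, 100, L_bands, 5)
```

Because the number of MSI bands is typically much smaller than the number of HSI bands, this factor stays close to one, and the cost remains linear in the number of pixels N.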

Hyperspectral Classification
Hyperspectral classification is further performed on the Pavia dataset to verify the effectiveness of the proposed method. The estimated super-resolution HSI is classified with a support vector machine (SVM). The ground truth used in classification is shown in Figure 11. To obtain a reliable evaluation, we estimate the overall accuracy (OA) of the SVM using ten-fold cross-validation, in which the dataset is randomly divided into ten equal parts.
In this experiment, two types of evaluation are presented: (1) the average cross-validation classification accuracies are presented in Table 6; and (2) the classification maps are presented in Figure 12. Specifically, the overall accuracy (OA) is selected as the classification assessment index. It is clearly seen from Table 6 and Figure 12 that the classification results are improved by several HSI super-resolution methods: Bicubic, HySure, and the proposed SDSR method. Bicubic and HySure obtain a better index than the proposed method. This is because Bicubic and HySure produce a smooth HSI. Bicubic interpolates the data points on a two-dimensional regular grid, so images obtained by Bicubic interpolation are smoother and have fewer interpolation artifacts. In HySure, total variation is used as a regularizer. Total variation imposes sparsity on the distribution of the absolute gradient of an image, so the transitions between pixels are encouraged to be smooth in the spatial dimension. Compared with Bicubic and HySure, the proposed method gives a comparable classification result without smoothing the HSI. The reasons are mainly as follows: (1) the endmembers in the HSI are learned by performing a common sparse basis selection on the concatenation of the observed HSI and MSI, which preserves the spectral consistency between the observed HSI and MSI; and (2) the spatial consistency between the low-resolution HSI and the high-resolution HSI is considered.
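The ten-fold protocol can be sketched as follows. This is a minimal NumPy illustration of the fold splitting and the averaged overall accuracy, not the experimental code. The classifier is passed in as a callable so that an SVM implementation could be supplied; the `nearest_centroid` function is only a placeholder classifier that keeps the sketch self-contained, and all names are hypothetical.

```python
import numpy as np


def ten_fold_overall_accuracy(X, y, fit_predict, n_folds=10, seed=0):
    """Average overall accuracy over a random split into n_folds parts.

    X: (samples, bands) pixel spectra; y: (samples,) class labels.
    fit_predict(X_train, y_train, X_test) must return predicted labels.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(float(np.mean(pred == y[test_idx])))  # OA on this fold
    return float(np.mean(scores)), scores


def nearest_centroid(X_train, y_train, X_test):
    """Placeholder classifier (NOT an SVM): assign each sample to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dist = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]
```

In use, the labeled pixels of the super-resolved HSI would be reshaped to (samples, bands) and an SVM-based `fit_predict` would replace the placeholder.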

Conclusions
In this paper, we propose a self-dictionary sparse regression method to enhance the spatial resolution of HSIs. In the proposed method, spatial-spectral consistency is considered when fusing the HSI and MSI. A notable difference between existing HSI super-resolution methods and the proposed method is that the observed HSI and MSI are considered simultaneously to estimate the endmember dictionary and the abundance code. Specifically, the endmember dictionary is extracted by performing self-dictionary sparse regression on the concatenation of the observed HSI and MSI. Then, a consistency constraint between the low-resolution HSI and the high-resolution HSI is exploited to improve the estimation of the abundance code. Experimental results on three datasets validate that the proposed method outperforms conventional HSI super-resolution methods.

Figure 1. The conventional dictionary-based method for HSI super-resolution. (1) The first step is to learn the dictionary from the observed hyperspectral image. (2) Then, the dictionary is projected onto the multispectral domain by a spectral response. (3) The projected dictionary is used to estimate the code from the observed MSI. (4) Finally, the dictionary and the code are combined to reconstruct the hyperspectral super-resolution image.

Figure 3. Balloons data from the CAVE dataset (color image composed of bands 3, 25 and 30 for the red, green and blue channels, respectively). (a) low-resolution hyperspectral image; (b) multispectral image; (c) high-resolution hyperspectral image.

Figure 4. Pavia dataset and its auxiliary images (color image composed of bands 20, 16 and 5 for the red, green and blue channels, respectively). (a) low-resolution hyperspectral image; (b) multispectral image with four bands (blue, green, red, and near-infrared); (c) high-resolution hyperspectral image.

Figure 5. Paris dataset and its auxiliary images (color image composed of bands 28, 13 and 3 for the red, green and blue channels, respectively). (a) low-resolution hyperspectral image; (b) multispectral image with four bands (blue, green, red, and near-infrared); (c) high-resolution hyperspectral image.

Figure 6. (a) evaluation of the parameter λ on the Balloons data; (b) evaluation of the parameter q on the Balloons data; (c) evaluation of the parameter p on the Balloons data; (d) evaluation of the parameter λ on the Pavia dataset; (e) evaluation of the parameter q on the Pavia dataset; (f) evaluation of the parameter p on the Pavia dataset; (g) evaluation of the parameter λ on the Paris dataset; (h) evaluation of the parameter q on the Paris dataset; (i) evaluation of the parameter p on the Paris dataset.

Figure 11. Pavia dataset and its ground truth for classification. (a) high-resolution hyperspectral image; (b) ground truth for classification.

Table 1. Comparison with the competing methods on the CAVE dataset. The reported evaluation values are calculated on the 8-bit resulting images.

Table 2. Comparison with the competing methods on the Pavia dataset. The reported evaluation values are calculated on the 8-bit resulting images.

Table 3. Comparison with the competing methods on the Paris dataset. The reported evaluation values are calculated on the 8-bit resulting images.

Table 4. Average squared error between obtained and ground truth spectra of some pixels.

Table 5. The effectiveness of concatenation in the proposed method. The reported RMSE is calculated on the 8-bit resulting images.

Table 6. Classification assessment index on the Pavia dataset.