Joint Spatial-spectral Resolution Enhancement of Multispectral Images with Spectral Matrix Factorization and Spatial Sparsity Constraints

: This paper presents a joint spatial-spectral resolution enhancement technique to improve the resolution of multispectral images in the spatial and spectral domain simultaneously. Reconstructed hyperspectral images (HSIs) from an input multispectral image represent the same scene in higher spatial resolution, with more spectral bands of narrower wavelength width than the input multispectral image. Many existing improvement techniques focus on spatial- or spectral-resolution enhancement, which may cause spectral distortions and spatial inconsistency. The proposed scheme introduces virtual intermediate variables to formulate a spectral observation model and a spatial observation model. The models alternately solve spectral dictionary and abundances to reconstruct desired high-resolution HSIs. An initial spectral dictionary is trained from prior HSIs captured in different landscapes. A spatial dictionary trained from a panchromatic image and its sparse coefficients provide high spatial-resolution information. The sparse coefficients are used as constraints to obtain high spatial-resolution abundances. Experiments performed on simulated datasets from AVIRIS/Landsat 7 and a real Hyperion/ALI dataset demonstrate that the proposed method outperforms the state-of-the-art spatial- and spectral-resolution enhancement methods. The proposed method also worked well for combination of exiting spatial- and spectral-resolution enhancement methods.


Introduction
Hyperspectral imaging has demonstrated its usefulness in various earth observation applications, such as landscape classification [1], object detection [2], and environmental monitoring [3]. Hyperspectral sensors collect contiguous spectral bands over the visible to the infrared wavelength ranges. Rich spectral characteristics in hyperspectral image (HSI) are beneficial in identifying and classifying different materials and landscapes [4]. In hyperspectral imaging, there is a trade-off between spatial and spectral resolutions. The spatial resolution of spaceborne hyperspectral images (HSIs) is always lower than multispectral images (MSIs) and panchromatic (PAN) images. Instantaneous field of view (IFoV) should be enlarged to achieve acceptable signal-tonoise ratio (SNR) with narrow spectral bandwidth [5]. For example, the maximum spatial resolution of Sentinel 2 MSI is 10 m while the spatial resolution of Hyperion HSI is only 30 m. Low spatial resolution of HSI always results in mixed pixels and degrades the performance of subsequent applications, while spectral understanding and analysis are usually limited by low spectral resolution of MSIs or PAN images. Although earth observation remote sensing data have become increasingly available and the corresponding applications have attracted wide interests, there are existing challenges in acquiring images with simultaneously high spatial resolution and high spectral resolution [6]. Therefore, many research focuses on recovering high quality synthetic image from low resolution (LR) inputs, including spatial resolution improvement approaches [7][8][9][10][11][12][13][14][15][16][17][18][19] and spectral resolution enhancement techniques [20][21][22][23][24][25][26][27].

Spatial Resolution Improvement of HSI
Over past decades, several spatial improvement methods have been proposed based on the fusion of MSI and PAN images. Hyperspectral pan-sharpening improves spatial resolution of HSI by fusing LR HSI with a high resolution (HR) PAN image covering a spectral range from the visible to the near infrared ranges [7]. Except for the classical hyperspectral pan-sharpening approaches such as component substitution (CS) methods [8], multi-resolution analysis (MRA) methods [9], and model based optimization methods [10] [11], deep learning, particularly convolutional neural network (CNN), has been widely exploited in pan-sharpening tasks. In [12], a residual CNN model is designed to describe the mapping between LR/HR MSI pairs and PAN image. Yuan et al. [13] proposed a multi-scale and multi-depth CNN (MSDCNN) for pan-sharpening, in which multi-scale feature extraction is used to reconstruct the HR MSI.
HSI fusion is another popular approach to spatial resolution improvement. A HR HSI is recovered by fusing a LR HSI with a HR MSI, where the HSI and MSI are assumed to be captured simultaneously for the same landscapes. Typical fusion methods based on the transform domain including the 2-D and 3-D wavelet transforms [14] can always result in heavy spectral distortions. Statistical methods are proposed by using a stochastic mixing model [15]. Another effective fusion method based on spectral unmixing recovers HR HSI using the endmembers/spectral dictionary from a LR HSI and the abundances from a HR MSI [16]. For example, Yokoya et al. [17] proposed a coupled non-negative matrix factorization (CNMF) method based on unsupervised spectral unmixing, where MS image and HSI are alternately unmixed using non-negative matrix factorization. In [18], a nonparametric Bayesian model is proposed using dictionary learning and sparse coding to achieve HR hyperspectral result. Dong et al. designed a non-negative structured sparse representation (NSSR) method [19]. The reconstructed HR HSI is formulated through estimating a non-negative spectral dictionary and its sparse coefficients. However, these fusion methods are limited to the same spatial coverage as the input image, therefore only HR HSI of the same spatial coverage as the input LR HSI can be obtained.

Spectral Resolution Enhancement Techniques
Despite various approaches to spatial resolution improvement techniques, only a few methods focus on spectral super-resolution. Spectral resolution enhancement reconstructs a HSI from a MSI or RGB image, where the recovered HSI and the input MSI/RGB image have the same spatial resolution and coverage [20]. Existing spectral resolution enhancement algorithms are categorized into hardware methods and reconstruction methods. The hardware methods achieve HSI from RGB/MSI through active light [21], tunable filters [22], and variable spectral responses provide by different RGB cameras [23]. Due to the requirement of modifying optics instruments or adding extra hardware equipment, such methods may not be suitable for remote sensing tasks. Several spectral reconstruction methods are proposed to improve spectral resolution. For example, a spectral resolution enhancement method (SREM) [24] recovers a HSI with wide swath by estimating spectral response matrix. Arad et al. [25] proposed a spectral resolution enhancement algorithm based on sparse representation, where a spectral dictionary is learned from prior HSIs through K-means singular value decomposition (K-SVD), and HSI is recovered from the input RGB image. The performance of this method is further boosted in [26] by exploiting with a shallow sparse representation framework. In [27], a convolutional neural network was introduced to learn an endto-end spectral mapping between RGB image and HSI, which is proved to achieve better spectral resolution improvement results. Most spectral resolution improvement methods mentioned above only recover HSI from a RGB image, covering only the visible or part of the near-infrared spectral regions. Our previous work [20] proposed a spectral super-resolution approach over the full spectral range from 400 nm to 2500 nm. Spectral improvement strategy and spatial preservation strategy are introduced respectively to estimate spectral response relationship by prior MSI/HSI pairs, learn spectral dictionaries from HS priors and ensure consistency using spatial constraints.
Despite many applications requiring high-resolution HSIs with relatively short time intervals, existing spaceborne HSIs are only available in low spatial-and temporal-resolution as well as low revisit frequency. PAN images have high spatial resolution and MSIs come in high temporal resolution and revisit frequency [20]. Therefore, we can easily acquire high spatial resolution PAN images or high temporal resolution MSIs at different time or over different landscapes, which is challenging for spaceborne HS sensors. Most scenes captured using multispectral or panchromatic sensors do not have their corresponding HSIs. This paper proposes a joint spatial-spectral resolution enhancement method using PAN images and prior HSIs. The reconstructed HSI has the same spatial resolution as PAN image and the same spectral resolution as the prior HSIs. The proposed method not only improves the spatial-and spectral-resolution of the input image, but also creates a new HSI at different time or over different scenes, which provides new opportunities for earth observation.
Existing techniques aim at improving either spatial resolution or spectral resolution, not HR HSI by improving spatial-and spectral-resolution simultaneously. If resolution enhancement is performed step-by-step in the spatial-or the spectral domain, any defects and distortions transferred from the previous step will pose difficulties in eliminating the artifacts in the current step. So, the performance of their subsequent processing tasks will be degraded. Additionally, it will be challenging to unify spatial and spectral enhancement techniques into one framework to improve spatial-and spectral-resolution simultaneously. A low spatial resolution MSI with a few spectral bands are required to recover a HSI of high spatial resolution and of more than 200 bands, which is obviously a highly ill-posed converse problem. The proposed method reduces artifacts and distortions using spatial features from PAN image and spectral signatures from prior HSIs. Combined spectral unmixing for unified spatial and spectral improvements ensures accurate spectral characteristics for subsequent applications.
The proposed scheme recovers a high spatial resolution HSI from an input low spatial resolution MSI based on high spatial-spectral correlation in an input low resolution MSI and the desired high quality HSI. Each pixel in the desired HSI can be represented by a linear combination of a few pure spectral signatures extracted from a spectral dictionary [28]. The pixels in the input LR MSI are assumed to be highly correlated with the HR ones in the desired high quality HSI [29]. Figure 1 shows a joint spatial-spectral resolution enhancement algorithm proposed in this paper to simultaneously improve spatial-and spectral resolution. Spectral and spatial improvement are combined into a unified framework based on formulating spectral observation model and spatial observation model. In this work, an input LR MSI and the desired HR HSI are denoted as LSpaLSpe and HSpaHSpe, respectively. Two virtual intermediate variables are designed for a more comprehensive description of the spatial and the spectral observation models. These two virtual intermediate variables are denoted as HSpaLSpe and LSpaHSpe, respectively. HSpaLSpe represents high spatial resolution but low spectral resolution MSI, and LSpaHSpe indicates low spatial resolution but high spectral resolution HSI. HSpaLSpe has the same spatial resolution as the desired HR HSI, considered as the spatial improvement result of the input LR MSI. While LSpaHSpe has the same spectral resolution as the desired HSI, considered as the spectral enhancement result of the input MSI. Desired HSI is decomposed by a linear combination of spectral signatures from spectral dictionary, where abundances describe the fractions of each spectral signature [28]. We assume that high spectral resolution dictionary is extracted from the LSpaHSpe HSI while the high spatial resolution abundances are extracted from HSpaLSpe MSI. So the spectral observation model, which describes spectral relationship scheme between the input MSI and the desired HSI, can be formulated via the virtual intermediate variable LSpaHSpe. The spatial observation model to describe the relationship between input MSI and desired HSI is built using HSpaLSpe. LSpaHSpe and HSpaLSpe are just virtual intermediate variables in this work, not necessary to be solved.
In the spectral observation model, spectral characteristics in the desired HSI can be extracted from LSpaHSpe. An initial spectral dictionary is acquired by adapting prior HSIs, captured using the same sensor as the desired HSI and cover different landscapes from the input image. The spectral dictionary is acquired from LSpaHSpe HSI. In the spatial observation model, a spatial dictionary is learned from a HR PAN image having the same spatial resolution as the desired HSI. The trained spatial dictionary is shared by the desired HSI and the HSpaLSpe MSI, while the relationship between their sparse coefficients is then exploited as spatial constraint to obtain high spatial resolution abundances. The sparse coefficients and abundances are solved using the feedback scheme in [11] to achieve more accurate results. In this paper, spectral-and spatial observation model are unified into a joint framework to alternately solve spectral dictionary and abundances. Spectral dictionary is updated using abundances while abundances are solved by the spectral dictionary in the previous iteration. These two steps can be used as constraints for each other to finally achieve joint spatialspectral enhancement without solving for the virtual intermediate variables. The contributions of the proposed algorithm are summarized as follows: 1. Improves spatial and spectral resolution simultaneously, which unifies the spatial-and spectral enhancement steps in one framework taking high resolution spectral features and spatial information as the constraints for each other in an alternate solving process. To our best knowledge, this is the first attempt.
2. Designed spectral and spatial observation models for the joint spatial and spectral enhancement problem. Virtual intermediate variables LSpaHSpe and HSpaLSpe are introduced in spectral and spatial observation models to find spectral/spatial relationships between input LR MSI and the desired HR HSI. The high spectral resolution dictionary and the corresponding high spatial resolution abundances are alternately solved to recover a high spatial and spectral resolution HSIs.
LSpaHSpe and HSpaLSpe are only virtual intermediate variables without having to solve them in the proposed method.
3. The proposed joint spatial-spectral enhancement algorithm is applied to real remote sensing data, such as ALI/Hyperion (30 m, 9 bands/ 30 m, 242 bands), for a target scene, the PAN image of ALI (10 m) is used to provide high spatial resolution features, while prior Hyperion HSIs (30 m, 242 bands) over different scenes are used to train spectral dictionary with high spectral resolution characteristics. So high spatial-and spectral resolution Hyperion data (10 m, 242 bands) of the target scene is achieved from the input LR ALI data (30 m, 9 bands).
The remainder of the paper is organized as follows. In Section 2, spatial observation and spectral observation models are introduced, respectively. The proposed joint spatial-spectral resolution enhancement method is presented in Section 3. In Section 4, we give the experimental results on both simulated and real datasets. Analyses and discussion are shown in Section 4 and finally conclusions are drawn in Section 5.

Spatial Observation Model and Spectral Observation Model
This section introduces spatial-and spectral observation models and two virtual intermediate variables LSpaHSpe HSI and HSpaLSpe MSI to alternately solve high resolution spectral dictionary and high spatial resolution abundances to recover desired HR HSIs.

Spatial Observation Model
In this paper, the virtual variable HSpaLSpe is assumed to have the same spatial resolution as the desired HSI and the same spectral resolution as the input MSI. HSpaLSpe is just an intermediate variable and is not necessary to be solved. PAN image contains abundant high-resolution spatial details and structures [30], used to train an over-completed spatial dictionary via sparse representation framework [31]. In Figure 2, virtual variable HSpaLSpe has the same spatial resolution as the desired HSI. HSpaLSpe and the desired HSI are assumed to share the same spatial dictionary. A spectral degradation mapping between HSpaLSpe and the desired HSI exists. The relationship between sparse coefficients of HSpaLSpe and the desired HSI obeys the spectral degradation mapping [5]. An interactive feedback framework [11] is also utilized in the spatial observation model to simultaneously deal with spatial processing and spectral unmixing.

Spectral Observation Model
Desired HSI consists of a few spectral signatures in a spectral dictionary [32]. In Figure 3, the virtual variable LSpaHSpe has the same spectral resolution as the desired HSI and the same spatial resolution as the input MSI. Like HSpaLSpe, LSpaHSpe is also considered as an intermediate variable and therefore not be solved in the proposed process. When the high spectral resolution image covering the same scene as the input image is not available, prior HSIs with different landscapes are used to estimate an initial spectral dictionary. The utilized prior HSIs have the same spatial resolution as LSpaHSpe and they are captured by the same sensor as the desired HSI. Due to the same spectral resolution, LSpaHSpe and the desired HSI share the same high-resolution spectral dictionary, and there is a spatial mapping scheme between them. Figure 3 shows that the mapping scheme of their abundances should be in accordance with the mapping scheme between two images.

Proposed Joint Spatial-spectral Enhancement Algorithm
The proposed joint spatial-spectral enhancement method combines spatial and spectral observation models into a unified framework to alternately solve spectral dictionary and abundances, from which the high spatial resolution and high spectral resolution image is achieved. HSIs and MSIs are denoted as 2-D matrices where each column represents an image of one spectral band. Input low spatial and spectral resolution MSI is denoted as denote LSpaHSpe and HSpaLSpe, respectively. M and m are respectively the number of pixels of high and low spatial images. L and l represent the number of spectral bands with l L  . The LSpaHSpe HSI is denoted as X  , treated as the spatially degraded version of the desired HR HSI Z . Input MSI Y is treated as spectrally degraded version of LSpaHSpe. X  and Y can be formulated as: where m M R   T denotes the spatial degradation matrix, which includes blurring and spatial down-sampling factor between the desired HSI Z and LSpaHSpe HSI X  .
is the spectral response transform matrix between input MSI Y and LSpaHSpe HSI X  . In general, T and M are used as prior knowledge according to the sensors parameters. For real data, T can be estimated using Gaussian point spread function and M can be acquired using cross calibration [33]. 1 μ and 2 μ represent modeling errors and the noise. Another virtual intermediate variable HSpaLSpe image X and input MSI Y can be considered as spectrally degraded version of the desired HSI Z and spatially degraded version of HSpaLSpe, respectively: where 1 ν and 2 ν are modeling errors.

Formulation of Spectral Observation Model to Solve Spectral Dictionary
The spectral observation model is to obtain high spectral resolution dictionary. The spectrum of each pixel in the desired HSI Z can be represented by a linear combination of spectral signatures, which is formulated as [32]: where i z is the spectrum of pixel i in Z .
is the spectral dictionary containing K pure spectral signatures, i a represents fraction abundance which is assumed sparse, i n is modeling error. Since all images are used in 2-D forms [34], Eq. (5) can be rewritten as a compact matrix form: where A denotes high spatial resolution fraction abundances, and n is modeling error.
Due to the fact that Z has the same spectral resolution as the LSpaHSpe image X  , their underlying spectral signatures should be the same. Substituting (6) to (1), the spectral observation model can be As a result, spectral dictionary training can be solved by the following sparse non-negative matrix decomposition problem [19]:  describes data fidelity and  is a positive parameter balancing the trade-off between sparseness and data fidelity terms. Both i ã and the atoms in dictionary spe D are regularized to be non-negative, and the non-negative dictionary learning algorithm mentioned in [19] is employed to achieve the high resolution spectral dictionary spe D of the desired image Z .

Formulation of Spatial Observation Model to Solve abundances
Spatial observation model aims at estimating high spatial resolution abundances. The virtual intermediate variable HSpaLSpe image X is used in this step. Assume that represented as a linear combination of atoms via an over-completed spatial dictionary, which is expressed as follows based on sparse representation [35]: where spa D is the spatial dictionary trained by high spatial resolution PAN image using Kmeans SVD (K-SVD) algorithm [36]. j  is a sparse coefficient vector. When the sparse formulation of one patch can be generalized to the patch matrix, The HSpaLSpe image X has the same spatial resolution as Z , so the patch matrix can be represented: Substituting (3) into (10) and (11), the relationship between z α and x α can be obtained: Input image Y is the spatial degradation version of X , the patch matrix 1 2 { , ,..., } l y y y y  , which covers the same scene as the patch matrix x , formulated in (13) by involving the spatial degradation relationship presented in (4): Although HSpaLSpe image X is not available in this work, the corresponding sparse coefficient matrix x α can be acquired by the input image Y using the spatial degradation scheme between Y and X . The sparse coefficient matrix z α can be obtained by (12). In Section 3.1, the desired image Z can also be represented in spectral domain by a few spectral signatures weighted by corresponding fraction abundances. Spatial expression of Z should be approximated to its spectral expression [11]: where R represents the extraction operator used to extract overlapping patch matrix from Z .
The sparse coefficient matrix is used as spatial constraint to achieve high spatial resolution abundance A of the desired HSI Z , where x α and A are iteratively solved to obtain more accurate results with less distortions.

Joint Spatial-spectral Enhancement Algorithm
The proposed method recovers high resolution HSI Z from the low spatial resolution and low spectral resolution input MSI Y , through which the spatial and spectral resolution are jointly improved. LSpaHSpe image X  and HSpaLSpe image X are introduced as virtual variables which describes low spatial resolution but high spectral resolution image (HSI) and high spatial resolution but low spectral resolution image (MSI), respectively. X  is used to build the spectral observation model and X is exploited to describe the spatial observation model, from which high resolution spectral dictionary spe D and high spatial resolution abundance A are estimated alternately to achieve the desired HSI Z with both high spatial and spectral resolution and less distortions.
As described in Section 3.1, the high spectral resolution dictionary spe D can be solved using X  via the non-negative dictionary learning algorithm, where X  is assumed to cover the same landscapes as the input MSI and the desired HSI. However, X  is a virtual variable that is unavailable in real cases. To obtain the high spectral resolution dictionary, prior HSIs, covering different landscapes, are used to train an initial spectral dictionary 0 spe D ， in the first iteration. These prior HSIs are assumed to have the same spatial resolution as the input MSI and the same spectral resolution as the desired HSI. So, the initial spectral dictionary can be trained: where H represents the prior HSI, is the sparse abundance, and  is a positive parameter.
In Section 3.2, HSpaLSpe image X is used to establish the spatial observation model to estimate high spatial resolution abundance A , where spatial sparse information of Z is used as constraints to achieve more accurate result with less distortions. The sparse coefficient matrix z α of Z is obtained by combing (12) and (13).
where y is the patch matrix in the input MSI Y , spa D is spatial dictionary trained by high spatial resolution PAN image ( Z should have the same spatial resolution as the PAN image). x α and z α are sparse coefficient matrices of the X and Z ,respectively. Parameter 2  balances sparsity and representation error and  controls the spectral degradation of x α and z α . The patch matrix y is directly extracted from Y , and it is assumed to be accurate. So it helps to achieve more accurate spatial sparse information of the desired image.
The spectral expression and spatial representation of Z should be approximated. The solved sparse coefficient matrix z α is exploited as spatial constraint to obtain high resolution abundance A and simultaneously improve the accuracy of z α and A , the idea of which has already been proposed in our previous work [11]. The objective function of A is written by using (14): where 1 R  represents the patch matrix extraction factor.  is the weight of abundance sparsity and  is the parameter of spatial sparsity.
In this paper, spectral dictionary spe D and abundance A are alternately solved. Using (14)-(15), the objective function of spe D is defined as (except the first iteration): where  is the weight of abundance sparsity.

Solver
There are four objective functions that need to be solved, including the solution of initial spectral dictionary ,0 spe D in (15), the solution of spatial sparse information z α in (16), the solution of high spatial resolution abundance A in (17) and the solution of high resolution spectral dictionary spe D in (18). These multiple variable objective functions are convex and unique solutions can be solved by variable splitting and alternate optimization [37].

Solution of Initial Spectral Dictionary
,0 spe D The objective function of initial spectral dictionary ,0 spe D in (15) can be solved using the nonnegative dictionary learning algorithm proposed in [19], where each dictionary atom is updated via a closed-form solution. So, the optimization problem in (15) is divided into two sub-problems: (1) Solve B with respect to a fixed Eq. (19) is solved by ADMM [38] for a fast convergence rate [39]. ( Eq. (20) is solved via block coordinate descent which updates one atom in each iteration using the non-negative constraint [40].

Solution of Spatial Sparse Information z α
The objective function of spatial sparse information z α in (16) can be divided into following two sub-problems: (1) Solve spatial sparse coefficient matrix x It is a sparse coding problem that can be solved by alternately learning spa D from the high spatial resolution PAN image using K-SVD [36] and estimating x α using OMP [41].
(2) Solve spatial sparse coefficient matrix z α with respect to a fixed x α 2 arg min || || The optimization function in (22) can be directly solved via least square method [42].

Solution of High Spatial Resolution Abundance A
The objective function of high spatial resolution abundance A in (17) is divided into following two sub-problems.
(1) Solve A with respect to a fixed z α Both of them are sparse coding optimization problems. Eq. (23) is solved via ADMM and Eq. (24) is solved by OMP. (1) Update spe D with respect to a fixed A 1 2 arg min || ||

Solution of High Resolution Spectral Dictionary
Eq. (25) can be solved by block coordinate descent [40], which is similar to the solution of (20).
(2) Solve A with respect to a fixed spe D Being similar to the solution of (19), the solution in this problem is obtained using ADMM [38].
The proposed joint spatial-spectral resolution enhancement algorithm is implemented by alternately obtaining spectral dictionary spe D and abundance A , which are formulated in (19) and (20), respectively. Table 1 Table 1. The optimization procedures of the proposed algorithm.
Step 3: Solve the high spatial resolution abundance Step 4: Solve the high resolution spectral dictionary , spe i D with Eq. (18).

Return
Output: HR HSI Z .

Experiment Results
Experiments on both simulated and real datasets are presented to verify the effectiveness of the proposed joint spatial-spectral resolution enhancement algorithm (denoted as J-SpeSpaRE). No algorithms exist that simultaneously improve spatial and spectral resolution. The proposed method is compared to the state-of-art spatial resolution enhancement methods and spectral resolution enhancement methods, respectively. The input of our proposed method is a MSI with low spatial and spectral resolution, and the inputs of all compared methods should be the same to make fair comparison. Several PAN-sharpening methods are exploited to compare the performance of spatial resolution enhancement. The result of our proposed method J-SpeSpaRE is a HSI with high spatial and spectral resolution, which should be spectrally degraded in the comparison of spatial enhancement. Spectral resolution enhancement methods are used to compare spectral performance. The result of our proposed method should be spatially degraded when comparing with spectral enhancement methods. In this paper, spatial resolution enhancement algorithms for comparison (PAN-sharpening algorithms) include GSA method [43], Indusion method [44], and SparseFI method [35]. Arad's method [25] and SREM method [24] are used to compare the performance of spectral resolution enhancement.
The input of our method is LR MSI, HR PAN image, and LR prior HSIs, the output HR HSI has the same spatial resolution as the input PAN image and the same spectral resolution as the HSI priors. Since the reconstructed HSI covers the same scene as the input MSI, there is no pixel shift and unmatched pixels. PAN image provides high spatial resolution information. In fact, PAN images do not need to cover the same scene nor be well registered with the input MSI. There are plenty of structural primitives (such as object edges, line segments, and other elementary features) contained in PAN image and these primitives are qualitatively similar in similar type of scenes [11]. The HSI prior images are utilized for training initial spectral dictionary. Therefore, the proposed method is independent with image registration between low resolution images and high resolution images.
The assessments used to evaluate the performance of compared methods are Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM), Feature Similarity (FSIM), Correlation Coefficient (CC), Root-Mean-Square Error (RMSE), Spectral Angle Mapper (SAM), and Per-pixel Deviation (PD). PSNR, the ratio between the maximum power of a signal and the power of residual errors, evaluates the spatial quality of each band [16]. Higher PSNR values indicate better reconstruction results. SSIM measures the similarity of spatial structures in each band by using various windows. It is combined by the comparison of luminance, contrast and structure [45,46]. The value of SSIM ranges between 0 and 1. FSIM shows the spatial similarity in each band with a full reference image. It works by jointly utilizing phase congruency and gradient magnitude [46]. The value of FSIM also varies between 0 and 1. CC shows the spectral similarity between the recovered HR HSI and the ground truth [35]. The value of CC ranges from -1 to 1. RMSE is applied for the evaluation of reconstruction accuracy by measuring the root mean square error between the recovered HR HSI and the ground truth [47]. The smaller the RMSE value, the better reconstruction performance. SAM is used to assess spectral distortions by calculating the absolute value of the angle between the ground truth and the recovered spectra [16]. It ranges from 0 to 1, and the lower value close to 0 indicates less spectral distortions. With PD, the recovered image is first spatial degraded to the resolution of the original image and is then subtracted from the original image on a per-pixel basis. Then, the average deviation per pixel was calculated using the number of pixels [48]. The minimum if PD is 0 and lower value shows less differences.

Experiments on Simulated Datasets
Two datasets are used in simulated experiments to compare the performance of our proposed method with spatial resolution enhancement methods and spectral resolution enhancement methods, respectively. Both of simulated datasets Indian pines and Cuprite are acquired by AVIRIS sensor. The spatial resolution is 20 m and there are 224 spectral bands covering from 400 nm to 2,500 nm. Indian pines dataset is captured over Northwest of Indiana, USA. Cuprite dataset is obtained over the Cuprite mine district in Nevada. Sub-scenes with the size of 256 × 256 are extracted in both Indian pines and Cuprite datasets. The spectral bands with serious noises and water vapor absorption are discarded, so 162 bands remained.
The above data is considered as ground truth HR HSI with high spectral-and spatial-resolution. Input LR MSI is generated by spatial down-sampling and spectral down-sampling, respectively. A Gaussian filter and a spatial down-sampling factor are applied for spatial down-sampling [49]. For Indian pines dataset and Cuprite dataset, spectral down-sampling is implemented using the given spectral response function between Landsat TM and AVIRIS. The simulated MSI is generated with uniform spectral response functions corresponding to Landsat TM bands 1-5 and 7, which cover the wavelength of 450-520, 520-600, 630-690, 760-900, 1550-1750, and 2080-2350 nm, respectively [17]. Spatial dictionary is trained by HR PAN image simulated by the bands of ground truth over the visible wavelength. The spatial resolutions of PAN image and ground truth HSI are the same. Spectral dictionary is learned from LR prior HSIs which are generated by spatially down-sampling different AVIRIS HSIs with the same spectral configuration as ground truth HSI. In the experiment on Indian pines, the prior HSIs are Moffett Field HSI and Cuprite HSI. While Indian pines and Moffett Field are treated as prior HSIs in the experiment on Cuprite dataset.
The input of our proposed method is a MSI with low spatial and spectral resolution, which should be the same as all of compared methods to make fair comparison. Several PAN-sharpening methods are exploited to compare the performance of spatial resolution enhancement. The result of our proposed method J-SpeSpaRE is the HSI with high spatial and spectral resolution, which should be spectrally degraded in the comparison of spatial enhancement.
Spatial assessment comparison on Indian pines dataset and Cuprite dataset are respectively presented in Table 2 and Table 3. The best evaluations are listed in bold letters. The pan-sharpening methods GSA, Indusion and SparseFI are used to compare the performance of spatial resolution enhancement. From Table 2 to Table 3, MPSNR, MFSIM, and MSSIM represent the average values of all spectral bands while SAM denotes the average results of all pixels. It can be found from Table 2 and Table 3 that our proposed J-SpeSpaRE method achieves better spatial performance than other compared methods. GSA is a component substitution method and Indusion is a multi-resolution analysis method, both of which are conventional pan-sharpening techniques. SparseFI proves superior spatial enhancement results than GSA and Indusion due to the high spatial resolution dictionary trained by PAN image. However, spectral correlation and constraints are ignored in SparseFI, which leads to more spectral distortions than our proposed method. The proposed J-SpeSpaRE method can simultaneously improve spatial resolution and spectral resolution. PAN images are trained to generate spatial dictionary with high spatial resolution textures and structures via sparse representation, which is assumed to be benefited for acquiring high spatial resolution abundances. On the other hand, abundances are updated in each iteration by using the spectral dictionary trained in the previous iteration. Spectral dictionary and abundances are alternately solved to ensure more accurate spatial information and less spectral distortions. Except from the assessment evaluation, visual comparisons of each method on Indian pines dataset and Cuprite dataset are also represented in Figure 4 and Figure 5, where the reconstructed results are shown as red, green and blue bands. We found that all of the compared methods can recover high spatial resolution MSIs. However, spatial distortions exist in Indusion method and SparseFI method, while GSA method leads to serious spectral distortions and blurring. As indicated from Figures 4(e) and 5(e), our proposed method proves better visual performance and less distortions than other compared methods.  Spectral super-resolution methods are used to evaluate the effectiveness of spectral resolution enhancement. For fair comparison, the result of our proposed method J-SpeSpaRE should be spatially degraded. The spectral assessments on both Indian pines dataset and Cuprite dataset are listed in Table  4 and Table 5, respectively. The best evaluation results are written in bold letters. The compared spectral resolution enhancement methods include Arad and SREM. In Tables 4 and 5, our proposed method achieves better or more competitive results than the compared methods. Figures 6 and 7 show visual comparison of Indian pines dataset and Cuprite dataset, where the spectral enhancement results of band 550 nm, 750 nm, and 1,500 nm are presented. Arad's method is designed based on sparse representation framework where a high-resolution spectral dictionary is trained using hyperspectral priors. Spatial constraints are ignored in Arad's method, so it suffers from serious spatial and spectral distortions. A transformation matrix between input MSI and the desired HSI is estimated in SREM method, where spatial correlation is not used because of the pixel by pixel processing. The proposed J-SpeSpaRE method shows superior comparison results than Arad' method and SREM. Spatial observation model and spectral observation model are designed to acquire high spectral resolution dictionary and high spatial resolution abundances in an alternative approach.
These two steps are used as constraint for each other, so spectral distortions are reduced, and spatial features are maintained.

Effectiveness of Joint Spatial and Spectral Enhancement Framework
In this paper, spectral-and spatial enhancement are implemented in a unified framework where spectral dictionary and abundances are iteratively solved to achieve a high resolution HSI from the input MSI with low spatial and low spectral resolutions.
To validate the effectiveness of the proposed joint spatial and spectral enhancement framework, the spatial evaluation values of PSNR and RMSE, as well as the spectral evaluation values of SAM in each iteration are given in Figure 8, where assessments on Indian pines dataset and Cuprite dataset are presented by blue solid curves and black dashed curves, respectively. From Figure 8, the reconstruction performance increases dramatically at the first and second iterations, and then improves very slowly when iterating. In this paper, 4 and 5 are the best iteration times for Indian pines dataset and Cuprite dataset, respectively. So, the increasing performance in each iteration indicates the superiority of the proposed joint spatial and spectral enhancement framework.

Experiment on Real Dataset
ALI/Hyperion real datasets captured by EO-1 satellite are applied to evaluate the performance of the proposed method. EO-1 satellite simultaneously carries hyperspectral Hyperion sensor and multispectral ALI sensor, where Hyperion sensor captures 242 hyperspectral bands and ALI sensor captures one PAN band and nine multispectral bands [50]. The spectral resolution of Hyperion image is 10 nm. The spatial resolutions of ALI image and Hyperion image are both 30 m. The spatial resolution of PAN image is 10 m. PAN image (10 m) and MSI (30 m) can be directly acquired to recover the corresponding HSI (10 m) using spatial dictionary trained from PAN image and spectral dictionary trained from prior HSIs (30 m) over various scenes. The selected Hyperion/ALI images are obtained in June of 2002 over the city of Paris, France. Figure 9 (a) shows the PAN image, and RGB compositions of ALI and Hyperion images are presented in Figure 9 (b) and (c). The prior hyperspectral data used for training initial spectral dictionary are Hyperion hyperspectral data captured over the cities of Berlin, Germany, and Xi'an, China. Atmospheric correction and registration are firstly performed. Fast Line-of-sight Atmospheric Analysis in ENVI 5.3 is used for atmospheric correction [24]. The polynomial algorithm is exploited to register Hyperion image to ALI image by selecting more than 50 pairs of tie points. Combinations of state-of-the-art spatial resolution enhancement methods and spectral resolution enhancement methods are used in this section to compare with our proposed J-SpeSpaRE method. Figure 10 shows the RGB compositions of the proposed method and all of combined methods. The input is ALI MSI with spatial resolution of 30 m, spatial resolution enhancement method (Indusion or SparseFI) and spectral resolution enhancement method (Arad's method or SREM method) are sequentially performed to achieve the recovered Hyperion HSI with spatial resolution of 10 m, which is used to compare with the reconstruction result of our proposed J-SpeSpaRE method. There are four combinations of spatial resolution enhancement methods and spectral resolution enhancement methods, which are denoted as Indusion+Arad, Indusion+SREM, SparseFI+Arad and SparseFI+SREM. The high-resolution hyperspectral ground truth is not available in real case, so only the overlapped regions of ALI and Hyperion images could be used to evaluate the reconstruction performance of all compared methods. Comparisons of sub-scenes at band 550 nm, band 900 nm, and band 1,600 nm are presented in Figure 11. From visual comparison, our proposed joint spatial and spectral enhancement method achieves better results than other combined methods. Clearer boundaries and less color distortions exist in our method, which proves the effectiveness of simultaneous improvement in the spatial and spectral domain. In the compared combination methods, the artifacts and errors are transformed from spatial enhancement to spectral enhancement through the sequential approach. The Indusion+Arad method and Indusion+SREM method suffer from spatial blurring and distortions due to the drawback of Indusion method. SREM method takes advantage of underlying spectral materials so less spectral distortions are presented in the combination SparseFI+SREM method. Our proposed method obtains overall better results than other compared methods, exploits spectral observation model and spatial observation model, and also unifies them into a joint framework to iteratively solve spectral dictionary and abundances. Spectral and spectral resolutions are simultaneously improved with less errors and distortions.

Spectral Unmixing on Real Dataset
Spectral unmixing is applied to the recovered HR HSIs of the real dataset to evaluate the performance of the proposed method. Endmembers indicate spectral reflectance of each landscape and abundances prove proportion of each endmember. VCA [51] is utilized as endmember extraction algorithm and SUnSAL [28] is exploited for abundances estimation. The sub-scenes in Figure 11 are used in spectral unmixing of the proposed and the methods for comparison. The false color composite is given in Figure 12 where five landscapes (Woods, Lawn, Residence, Sand land, and Crop land) are selected to evaluate accuracy of spectral signatures. Spectral unmixing result of the original LR Hyperion Paris HSI is also given to verify effectiveness of the proposed joint spatial and spectral resolution enhancement method.  Figure 13 shows that the reflectance of the five selected endmembers of all compared methods (Indusion+Arad, Indusion+SREM, SparseFI+Arad, and SparseFI+SREM) and the proposed method. The abundances maps of each endmember are also listed in Figure 14. In Figure 13, our proposed method achieves better performance of endmember extraction than other combined methods. Spectral reflectance of our proposed method has the lowest difference from the original HSI. More accurate spectral endmembers can be extracted after improving spatial-and spectral-resolution simultaneously than applying spatial enhancement and spectral enhancement step by step. Figure 14 shows the superiority of spatial information achieved by our method, where high spatial resolution abundance maps have more clear structures, sharp edges and spatial details than other combined methods. Reliable spectral signatures and high-resolution spatial features can be acquired through joint spatial-and spectral-resolution enhancement to verify the accuracy and effectiveness of the proposed method.

Conclusions
This paper proposes a joint spatial-spectral resolution enhancement algorithm based on spectral factorization and spatial sparsity to achieve an HR HSI from an input LR MSI. Spectral and spatial observation models are formulated to describe spectral and spatial relationship schemes between the input MSI and the desired HSI. Virtual intermediate variables, LSpaHSpe and HSpaLSpe, are introduced to make more comprehensive descriptions in spectral and spatial observation models which are used to respectively solve high spectral resolution dictionary and high spatial resolution abundances. Spectral dictionary can be trained by adapting prior LSpaHSpe HSIs, and abundances are obtained by using sparse coefficients as spatial constraint. A spatial observation model and a spectral observation model are unified into a joint framework to alternately solve spectral dictionary and abundances. These two steps can be used as constraints for each other and finally achieve the result of joint spatial-spectral enhancement without solving the virtual intermediate variables. The proposed joint spatial-spectral enhancement framework overcomes the drawback of a sequential implementation of spatial improvement and spectral enhancement steps to achieve more accurate reconstruction results and low distortions.
Author Contributions: Conceptualization, methodology, and writing-original draft preparation, Chen Yi.; writing-reviewing and editing, Jonathan Cheung-Wai Chan, Yong-qiang Zhao and Seong Kong. All authors have read and agreed to the published version of the manuscript.