Domain Transfer Learning for Hyperspectral Image Super-Resolution

A Hyperspectral Image (HSI) contains a large number of spectral bands for each pixel; however, its spatial resolution is low. Hyperspectral image super-resolution uses software techniques to enhance the spatial resolution while preserving the high spectral resolution. Recent methods fuse HSI and Multispectral Images (MSI) under the assumption that an MSI of the same scene is available along with the observed HSI, which limits the super-resolution reconstruction quality. In this paper, a new framework based on domain transfer learning is proposed to enhance the spatial resolution of HSI by learning knowledge from general-purpose optical images (natural scene images) and by exploiting the cross-correlation between the observed low-resolution HSI and the high-resolution MSI. First, the relationship between low- and high-resolution images is learned by a single convolutional super-resolution network and then transferred to the HSI domain. Second, the resulting pre-high-resolution HSI (pre-HSI), the observed low-resolution HSI, and the high-resolution MSI are considered jointly to estimate the endmember matrix and the abundance code, which capture the spectral characteristics. Experimental results on ground-based and remote sensing datasets demonstrate that the proposed method achieves competitive performance and outperforms the existing HSI super-resolution methods in most cases.


Introduction
The hyperspectral imaging technique can acquire images with hundreds of spectral bands for each pixel. Hyperspectral Images (HSI) have been widely used in numerous applications, such as urban planning, precision agriculture, and land-cover classification [1][2][3][4][5][6]. However, due to the limitations of imaging spectrometers, it is difficult to acquire hyperspectral images with both high spectral resolution and high spatial resolution. As a result, the spatial resolution of HSI data is often low, which makes it hard to capture land-cover details and severely degrades subsequent processing in remote sensing. Hence, enhancing the spatial resolution of HSI data has attracted increasing attention in the remote sensing community.
To achieve this goal, HSI super-resolution techniques [7,8] have been investigated to enhance the spatial resolution of HSI data in software. Existing HSI super-resolution methods assume that, in addition to the observed low-resolution HSI, other observations of the same scene are available. These auxiliary observations, such as RGB images, panchromatic (PAN) images, and Multispectral Images (MSI), can be used to estimate the missing spatial information in the observed HSI data. A comparative review of existing hyperspectral and multispectral image fusion approaches can be found in [9].
On the basis of spectral mixture analysis [10], HSI data can be formulated as an endmember matrix multiplied by the corresponding abundance matrix. The endmember matrix contains the pure spectral signatures, while the abundance matrix gives the proportions of the endmember spectra at each pixel. Therefore, the problem of HSI super-resolution reduces to obtaining the optimal endmember and abundance matrices of the original high-spatial-resolution HSI.
Recently, many HSI super-resolution approaches have been presented to enhance the spatial resolution of the observed HSI by exploiting auxiliary observations. Since the observed HSI and the target HSI capture the same scene, their endmember matrices should be the same [11]. Moreover, the abundance matrix is often estimated from the observed MSI. Aiazzi et al. [12] presented an HSI super-resolution method based on a Generalized Laplacian Pyramid (GLP). Zhao et al. [13] proposed an HSI super-resolution approach that introduces spatial-spectral joint nonlocal similarity into the reconstruction. Simões et al. [11] proposed a Hyperspectral Super-resolution (HySure) method with a convex subspace-based formulation. HySure introduces total variation regularization into the estimation of the abundance matrix, which preserves edges while suppressing noise in homogeneous regions. Generally, the subspace transformation is derived from the low-spatial-resolution HSI by an endmember extraction technique, e.g., Vertex Component Analysis (VCA) [14]. Lanaras et al. [15] proposed a Proximal Alternating Linearized Minimization (PALM) method for HSI super-resolution, which jointly unmixes the observed HSI and MSI into the spectra of endmembers and the corresponding fractional abundances. A nonnegative structural sparse representation-based HSI super-resolution method was presented in [16]. Gao et al. [17] proposed a Self-Dictionary Sparse Regression (SDSR) method that combines the observed HSI and MSI to estimate the endmember matrix and the abundance matrix.
Compared with auxiliary observations, general-purpose optical images (natural scene images) are easy to acquire and contain richer information. The relationship between low- and high-resolution HSIs is assumed to be the same as that between low- and high-resolution natural scene images [18], because the optical information in remote sensing images is strongly similar to that in natural images [11]. In [18], the target high-spatial-resolution HSI was estimated by introducing a Collaborative Nonnegative Matrix Factorization (CNMF) between the observed HSI and the transferred high-spatial-resolution HSI, without requiring any auxiliary observations. However, this approach ignores the abundance information that can be estimated from the observed MSI, which is effective for improving the final super-resolution reconstruction quality. To effectively exploit both the cross-correlation between the observed HSI and MSI and the nonlinear mapping learned in the natural image domain, a new framework based on domain transfer learning is proposed to enhance the spatial similarity and preserve the spectral consistency of the reconstructed HSIs. The main contributions of this paper can be summarized as follows:
(1) To reduce the blurring effect in the observed HSI, the proposed method first trains a Convolutional Super-resolution Network (CSN) model on natural scene images and then transfers the trained CSN model to the HSI domain.
(2) The observed low-spatial-resolution HSI and the transferred pre-high-spatial-resolution HSI are used jointly to estimate the endmember matrix with higher precision than the baselines.
(3) Considering both spatial information and spectral consistency, a new optimization function with two regularization parameters is proposed to obtain the optimal endmember and abundance matrices of the target HSI, enhancing its spatial resolution while preserving its high spectral resolution.
The remainder of this paper is organized as follows. Section 2 briefly introduces related works. Section 3 describes the proposed method in detail. In Section 4, experimental results are presented to demonstrate the effectiveness of the proposed method compared with state-of-the-art baselines. Finally, Section 5 concludes this work.

Related Works
Over the past decades, many image super-resolution methods have been presented. According to whether auxiliary observations are required, existing image super-resolution methods can be divided into two categories: single-image-based and auxiliary-based. Single-image-based approaches (e.g., [19,20]) try to improve the spatial resolution of each band image in the HSI data. However, these methods ignore the spectral information in the reconstruction process.
Auxiliary-based HSI super-resolution methods estimate the high-spatial-resolution HSI by fusing the observed HSI with high-spatial-resolution auxiliary observations, such as PAN [21], RGB, and MSI [22] images. Thomas et al. [23] introduced a component substitution scheme for HSI super-resolution: the observed HSI is first decomposed into spatial and spectral components, and the estimated HSI is then obtained by substituting the spatial component with the observed PAN image. Based on spectral mixture analysis, Yokoya et al. [24] proposed an HSI super-resolution method that constructs a coupled feature space between the observed HSI and the MSI of the same scene. Subsequently, many HSI super-resolution approaches have been proposed to estimate the endmember and abundance matrices efficiently under various constraints [11][12][13][15][16][17][18]. For example, in [11], a spatial smoothness constraint was imposed in the abundance optimization. In [25], nonnegativity and sparsity constraints were introduced into a constrained sparse representation for HSI super-resolution. Fang et al. [26] proposed a superpixel-based sparse representation model to fuse the observed HSI and MSI. Yi et al. [27] presented a regularization model that exploits spatial and spectral correlations to achieve HSI-MSI fusion. These methods directly use the observed HSI, or its bicubic interpolation, to estimate the sparse dictionary or the endmember matrix. However, severe blurring in the observed HSI may reduce the estimation accuracy of the endmember matrix.
In summary, it is important to capture both the spatial similarity of each band image and the cross-correlation between the observed HSI and the auxiliary observations. An effective approach should therefore take full advantage of both single-image-based and auxiliary-based methods.

Proposed Method
To take advantage of the additional information from natural images and the spatial-spectral information shared by the observed HSI and MSI, a new Super-Resolution network based on Domain Transfer Learning (SRDTL) is proposed to enhance the spatial resolution while preserving the abundant spectral information of the reconstructed hyperspectral remote sensing images. The overall flowchart of the proposed method is shown in Figure 1.

Notation
Let {X_s, Y_s} denote the high- and low-resolution images in the natural image domain. Similarly, {X, Y} denotes the high- and low-resolution images in the HSI domain, where X ∈ ℝ^{M×N×L} and Y ∈ ℝ^{m×n×L} are the original high-spatial-resolution HSI and the observed low-spatial-resolution HSI, respectively, M × N and m × n are the corresponding spatial sizes, and L is the number of spectral bands. X and Y have the same high spectral resolution. Because m ≪ M and n ≪ N, the super-resolution problem, i.e., the estimation of X, is severely ill-posed. For convenience, the HSI data are converted from 3D to 2D by vectorizing each spectral band image, i.e., X ∈ ℝ^{L×MN} and Y ∈ ℝ^{L×mn}. In addition, Z ∈ ℝ^{l×MN} denotes the observed MSI with high spatial resolution but low spectral resolution, where l is the number of spectral bands in the MSI. Both the observed HSI Y and the observed MSI Z are degraded versions of the original high-spatial-resolution, high-spectral-resolution HSI X [11,24]. Therefore, the observed low-spatial-resolution HSI Y and low-spectral-resolution MSI Z can be represented as:
$$\mathbf{Y} = \Psi(\mathbf{X}), \qquad \mathbf{Z} = \Phi(\mathbf{X}), \tag{1}$$
where Ψ: ℝ^{L×MN} → ℝ^{L×mn} and Φ: ℝ^{L×MN} → ℝ^{l×MN} are two mapping functions, which may be linear or nonlinear. Generally, the MSI Z is approximated as:
$$\mathbf{Z} \approx \mathbf{H}\mathbf{X}, \tag{2}$$
where H ∈ ℝ^{l×L} is the spectral response matrix, which is often assumed to be known [11,28]. That is, the MSI Z can be obtained by spectrally degrading the original high-spectral-resolution HSI X. As in [11,17,28,29], the spectrum at each pixel is assumed to be a linear combination of several endmember spectra. Therefore, the high-spatial-resolution, high-spectral-resolution HSI X is formulated as:
$$\mathbf{X} = \mathbf{U}\mathbf{V}, \tag{3}$$
where U ∈ ℝ^{L×C} is the endmember matrix, V ∈ ℝ^{C×MN} is the abundance matrix, and C is the total number of endmembers. U contains the spectral signatures of the underlying materials, while V gives the proportions of the endmember spectra at each spatial location of the scene.
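A minimal NumPy sketch of the observation model above, assuming small hypothetical sizes, random nonnegative U and V, and a random row-normalized spectral response H (in the experiments, H comes from a real sensor response such as the Nikon D700 or IKONOS). It shows the 2D/3D reshaping of X and that Z = HX = (HU)V, i.e., the MSI can be explained by the same abundances V.

```python
import numpy as np

# Hypothetical sizes chosen for illustration only
M, N, L, C, l = 6, 6, 31, 4, 3
rng = np.random.default_rng(0)

U = rng.random((L, C))            # endmember matrix (L x C), nonnegative
V = rng.random((C, M * N))        # abundance matrix (C x MN), nonnegative
X = U @ V                         # linear mixing model: 2D HSI (L x MN)

# 2D (L x MN) <-> 3D (M x N x L): each row of X is one vectorized band image
X_cube = X.reshape(L, M, N).transpose(1, 2, 0)
X_back = X_cube.transpose(2, 0, 1).reshape(L, M * N)

# Spectral degradation with a hypothetical row-normalized response H (l x L)
H = rng.random((l, L))
H /= H.sum(axis=1, keepdims=True)
Z = H @ X                         # MSI (l x MN)

# Because Z = H U V, the MSI is explained by endmembers H @ U and the same abundances V
print(X_cube.shape, Z.shape, np.allclose(Z, (H @ U) @ V))  # (6, 6, 31) (3, 36) True
```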

Domain Transfer Learning
Inspired by transfer learning [18,30], the mapping between low- and high-resolution images can be learned in the natural image domain and then transferred to the hyperspectral image domain. Moreover, deep learning offers strong generalization ability and representation power, which makes it well suited to domain transfer learning. Recently, many deep convolutional neural networks (CNNs) for image super-resolution have been presented, e.g., [20,31]. Effective CNNs can capture the nonlinear mapping between low- and high-resolution images.
To better learn the nonlinear mapping between low- and high-resolution images, a deep Convolutional Super-resolution Network (CSN) [20] is constructed to handle multiple and even spatially variant degradations, which improves its applicability to real-world images. At test time, the CSN takes a low-resolution natural image and the degradation maps as input and produces the corresponding high-resolution natural image. The degradation maps contain warping knowledge, which gives the super-resolution network a spatial transformation ability [32]. Similar to [20,33], a cascade of 3 × 3 convolutional layers is used to perform the nonlinear mapping between low- and high-resolution images. Each layer consists of three operations: Convolution (Conv), Batch Normalization (BN) [34], and Rectified Linear Units (ReLU) [35]. Specifically, "Conv + BN + ReLU" is applied in every convolutional layer except the last one, which performs only a "Conv" operation. Then, an additional sub-pixel convolutional layer converts several high-resolution subimages of size m × n × r²K into a single high-resolution image of size M × N × K, where K is the number of channels and r = M/m = N/n is the magnification factor. Since the CSN operates on RGB channels rather than on the luminance channel (i.e., the Y channel in the YCbCr color space), K = 3.
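The sub-pixel layer only rearranges values; it does not compute anything new. A minimal NumPy sketch of that rearrangement (often called pixel shuffle) for a channels-last m × n × r²K array follows. The channel ordering used here, c = (i·r + j)·K + k, is an assumption for illustration; PyTorch's `torch.nn.PixelShuffle`, for instance, uses a channels-first layout in which the r² sub-positions of each output channel are contiguous.

```python
import numpy as np

def pixel_shuffle(feat, r):
    """Rearrange an m x n x (r*r*K) array into an (r*m) x (r*n) x K image.
    Assumed ordering: input channel (i*r + j)*K + k goes to output pixel
    (r*y + i, r*x + j), channel k."""
    m, n, c = feat.shape
    K = c // (r * r)
    if c != r * r * K:
        raise ValueError("channel count must be divisible by r^2")
    out = feat.reshape(m, n, r, r, K)      # axes: (y, x, i, j, k)
    out = out.transpose(0, 2, 1, 3, 4)     # axes: (y, i, x, j, k)
    return out.reshape(m * r, n * r, K)

# r = 3 and K = 3 (RGB), as in the CSN setting described above
feat = np.arange(2 * 2 * 27).reshape(2, 2, 27)
img = pixel_shuffle(feat, r=3)
print(img.shape)  # (6, 6, 3)
```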
The CSN directly learns the nonlinear mapping between the low-resolution image Y_s ∈ ℝ^{m×n×3} and the corresponding high-resolution image X_s ∈ ℝ^{M×N×3}. Let F(·) denote the deep CSN, which takes Y_s as input and outputs the estimated high-resolution image. The nonlinear mapping function F(·) is learned by minimizing the loss between the original high-resolution natural image X_s and the estimated high-resolution image F(Y_s, ℳ; Θ), where ℳ represents the degradation maps of the input image Y_s and Θ denotes the CSN model parameters. Given a large training set of natural image pairs {(Y_s^i, X_s^i)}, i = 1, …, N_s, the model parameters Θ of the CSN are estimated by solving the following minimization problem with the Adam method [36]:
$$\hat{\Theta} = \arg\min_{\Theta} \frac{1}{2N_s}\sum_{i=1}^{N_s}\left\| F(\mathbf{Y}_s^i, \mathcal{M}; \Theta) - \mathbf{X}_s^i \right\|_F^2, \tag{4}$$
where ‖·‖_F denotes the Frobenius norm and N_s is the number of training sample pairs. Once the CSN model parameters Θ̂ have been learned, it is reasonable to transfer the model from the natural image domain to the HSI domain band by band. In this paper, each band image of the low-spatial-resolution HSI Y is first copied to the three RGB channels to form the input of the learned CSN model. Then, the estimated high-resolution band image is obtained by averaging the three output RGB channels. In this way, the transferred high-spatial-resolution HSI X_h ∈ ℝ^{L×MN} is predicted as:
$$\mathbf{X}_h = \begin{bmatrix} E\big(F(\Gamma(\mathbf{Y}^1), \mathcal{M}; \hat{\Theta})\big) \\ \vdots \\ E\big(F(\Gamma(\mathbf{Y}^L), \mathcal{M}; \hat{\Theta})\big) \end{bmatrix}, \tag{5}$$
where Y^i ∈ ℝ^{1×mn} denotes the ith band (row) of Y. In addition, Γ(·) is the composite function that first reshapes a vector from 1D to 2D, i.e., ℝ^{1×mn} → ℝ^{m×n}, and then copies the resulting matrix to the three RGB channels, while E(·) is the composite function that first averages the three output RGB channels and then reshapes the result from 2D to 1D, i.e., ℝ^{M×N} → ℝ^{1×MN}.
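A minimal sketch of the band-by-band transfer step (Γ, then the network, then E). The trained CSN is replaced here by a hypothetical stand-in, `csn_stand_in`, that performs nearest-neighbour upsampling and ignores the degradation maps ℳ; in practice this function would wrap the learned model. The function name `transfer_hsi` is likewise illustrative.

```python
import numpy as np

def csn_stand_in(rgb, r):
    """HYPOTHETICAL placeholder for the trained CSN F(., M; Theta): nearest-neighbour
    upsampling of an m x n x 3 image to (r*m) x (r*n) x 3. Degradation maps are ignored."""
    return np.kron(rgb, np.ones((r, r, 1)))

def transfer_hsi(Y, m, n, r, model=csn_stand_in):
    """Band-by-band transfer: Y is L x (m*n); returns X_h of shape L x (r*m * r*n)."""
    rows = []
    for band in Y:                                     # band = Y^i, shape (m*n,)
        img = band.reshape(m, n)                       # Gamma: 1D -> 2D ...
        rgb = np.repeat(img[:, :, None], 3, axis=2)    # ... then copy to R, G, B
        out = model(rgb, r)                            # network output, (r*m) x (r*n) x 3
        rows.append(out.mean(axis=2).reshape(-1))      # E: mean over RGB, then 2D -> 1D
    return np.stack(rows)

# Usage on a tiny random HSI with L = 5 bands, m = n = 2, r = 3
Y = np.random.default_rng(0).random((5, 4))
X_h = transfer_hsi(Y, m=2, n=2, r=3)
print(X_h.shape)  # (5, 36)
```

Because every band is fed through the same RGB network independently, this step on its own cannot enforce spectral consistency across bands; that is left to the subsequent unmixing-based optimization.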

Optimization
Since the low-spatial-resolution HSI Y, the corresponding high-spatial-resolution HSI X, and the transferred high-spatial-resolution HSI X_h capture the same scene, they share the same underlying materials, i.e., the same endmember matrix. In addition, the abundance matrix of the transferred HSI X_h is approximated by that of the estimated HSI X, i.e., V_h ≈ V. Therefore, the low-spatial-resolution HSI Y and the transferred HSI X_h can be approximated as:
$$\mathbf{Y} \approx \mathbf{U}\mathbf{W}, \tag{6}$$
$$\mathbf{X}_h \approx \mathbf{U}\mathbf{V}, \tag{7}$$
where W ∈ ℝ^{C×mn} is the abundance matrix of the observed low-spatial-resolution HSI Y. Similarly, the observed MSI Z and the desired high-spatial-resolution HSI X share the same abundance matrix [17]. Hence, Equation (2) becomes:
$$\mathbf{Z} \approx \mathbf{U}_m\mathbf{V}, \tag{8}$$
where U_m ∈ ℝ^{l×C} is the endmember matrix of the observed MSI Z.
Combining Equations (6)-(8), the HSI super-resolution problem is solved by minimizing the optimization problem in Equation (9), which combines the residual errors of Equations (6)-(8) through two nonnegative regularization parameters α and β. The constraints U ≥ 0, V ≥ 0, W ≥ 0, and U_m ≥ 0 require these matrices to be element-wise nonnegative [37]; in the linear mixing model, the abundance matrices V and W represent the abundances of the underlying materials, which are necessarily nonnegative. Recent studies [29,38] have shown that multiplicative update rules do not increase the residual errors in Equation (9). Therefore, U, V, W, and U_m are updated with the multiplicative rules of Equations (10)-(13), where (·)^T denotes transposition and .* and ./ denote element-wise multiplication and division, respectively. The overall procedure of the proposed method is summarized in Algorithm 1. Specifically, the matrices U, V, W, and U_m are first initialized randomly and then updated with Equations (10)-(13) until convergence, which in this paper is declared when the relative change of the loss function falls below a threshold of 10⁻⁸. The resulting endmember matrix Û and abundance matrix V̂ are combined to reconstruct the high-spatial-resolution HSI X̂.
1. Obtain the transferred high-spatial-resolution HSI X_h using Equation (5);

Initialize:
2. Set t = 0 and ε_0 = 0;
3. Initialize U, U_m, W, and V randomly.
4. Update the endmember matrix U of the estimated HSI using Equation (10);
5. Update the endmember matrix U_m of the observed MSI using Equation (11);
6. Update the abundance matrix W of the observed HSI using Equation (12);
7. Update the abundance matrix V of the estimated HSI using Equation (13);
end
10. Obtain the optimal matrices Û = U and V̂ = V;
11. Estimate the high-spatial-resolution HSI X̂ by computing X̂ = ÛV̂.
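The alternating loop can be sketched in NumPy as follows. The sketch assumes one plausible form of Equation (9), ‖Y − UW‖²_F + α‖X_h − UV‖²_F + β‖Z − U_mV‖²_F with U, V, W, U_m ≥ 0, which may differ from the authors' exact weighting, and applies standard Lee-Seung-style multiplicative updates to it in the order of Algorithm 1 (U, U_m, W, V), stopping when the relative change of the loss falls below a threshold. The function name `srdtl_style_nmf` and the synthetic data are hypothetical.

```python
import numpy as np

def srdtl_style_nmf(Y, Xh, Z, C, alpha, beta, T=500, tol=1e-8, seed=0, eps=1e-12):
    """Multiplicative updates for the ASSUMED objective
    ||Y - U W||^2 + alpha ||Xh - U V||^2 + beta ||Z - Um V||^2 (all factors >= 0)."""
    rng = np.random.default_rng(seed)
    U = rng.random((Y.shape[0], C))
    Um = rng.random((Z.shape[0], C))
    W = rng.random((C, Y.shape[1]))
    V = rng.random((C, Xh.shape[1]))

    def loss():
        return (np.linalg.norm(Y - U @ W) ** 2
                + alpha * np.linalg.norm(Xh - U @ V) ** 2
                + beta * np.linalg.norm(Z - Um @ V) ** 2)

    history = [loss()]
    for _ in range(T):
        # U: terms 1 and 2 share U (stacked data [Y, sqrt(alpha) Xh] ~ U [W, sqrt(alpha) V])
        U *= (Y @ W.T + alpha * Xh @ V.T) / (U @ (W @ W.T + alpha * V @ V.T) + eps)
        # Um: only the MSI term depends on Um
        Um *= (Z @ V.T) / (Um @ (V @ V.T) + eps)
        # W: only the low-resolution HSI term depends on W
        W *= (U.T @ Y) / ((U.T @ U) @ W + eps)
        # V: terms 2 and 3 share V (stacked [sqrt(alpha) U; sqrt(beta) Um] V)
        V *= (alpha * U.T @ Xh + beta * Um.T @ Z) / ((alpha * U.T @ U + beta * Um.T @ Um) @ V + eps)
        history.append(loss())
        # relative change of the loss as the stopping rule
        if abs(history[-2] - history[-1]) / max(history[-2], eps) < tol:
            break
    return U, V, W, Um, history

# Usage on a tiny synthetic problem (illustrative only)
rng = np.random.default_rng(1)
L, C, MN = 20, 3, 36
X_true = rng.random((L, C)) @ rng.random((C, MN))
Y = X_true[:, ::9]                              # crude stand-in for spatial degradation
Z = rng.random((4, L)) @ X_true                 # Z = H X with a random 4-band H
Xh = X_true + 0.01 * rng.random(X_true.shape)   # stand-in for the transferred HSI
U, V, W, Um, hist = srdtl_style_nmf(Y, Xh, Z, C, alpha=1.0, beta=1.0, T=200)
X_hat = U @ V                                   # final step: X_hat = U_hat V_hat
```

Each update is the classical multiplicative rule for a stacked nonnegative factorization, so the loss history is non-increasing (up to the tiny ε added to the denominators), which is the property the text relies on.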

Experiments
This section presents the HSI super-resolution experiments that verify the effectiveness of the proposed method. Three public HSI datasets are described in Section 4.1. Section 4.2 introduces the state-of-the-art competing methods and the evaluation indexes used in this paper. Section 4.3 presents the parameter analysis of the proposed method. Finally, Section 4.4 reports the experimental results.

HSI Datasets
In the experiments, three HSI datasets, i.e., CAVE (http://www.cs.columbia.edu/CAVE/databases/multispectral/), Pavia (http://www.ehu.eus/ccwintco/uploads/e/e3/Pavia.mat), and Paris (https://github.com/alfaiate/HySure), were used to evaluate the performance of the proposed method and of state-of-the-art approaches. The HSIs in these datasets were regarded as the original high-spatial-resolution and high-spectral-resolution HSIs. The simulated low-spatial-resolution HSIs were obtained by first blurring the original HSIs and then downsampling the results by a magnification factor r in each band. In addition, because the raw HSI values are often large, the original HSIs were normalized to [0, 1]. For evaluation, all output images were converted into 8-bit images.
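A minimal NumPy sketch of this simulation pipeline. The blur kernel is not specified in this section, so the separable 5 × 5 Gaussian with σ = 1, the reflect padding, and the min-max form of the [0, 1] normalization are all assumptions; only the blur-then-downsample order, the factor r, and the 8-bit conversion come from the text.

```python
import numpy as np

def normalize01(X):
    """Min-max scaling to [0, 1] (ASSUMED form of the normalization)."""
    return (X - X.min()) / (X.max() - X.min())

def gaussian_kernel(size=5, sigma=1.0):
    ax = np.arange(size) - (size - 1) / 2
    g = np.exp(-ax ** 2 / (2 * sigma ** 2))
    return g / g.sum()

def simulate_lr_hsi(X_cube, r, size=5, sigma=1.0):
    """Blur every band of an M x N x L cube with a separable Gaussian (ASSUMED kernel),
    then keep every r-th pixel in both spatial directions."""
    g = gaussian_kernel(size, sigma)
    p = size // 2
    Xp = np.pad(X_cube, ((p, p), (p, p), (0, 0)), mode="reflect")

    def conv(v):
        return np.convolve(v, g, mode="valid")

    blurred = np.apply_along_axis(conv, 0, Xp)        # filter along axis 0 (vertical)
    blurred = np.apply_along_axis(conv, 1, blurred)   # filter along axis 1 (horizontal)
    return blurred[::r, ::r, :]

def to_uint8(X01):
    """Quantize a [0, 1] image to 8 bits for evaluation."""
    return np.clip(np.round(X01 * 255), 0, 255).astype(np.uint8)

X = normalize01(np.random.default_rng(0).random((12, 12, 31)) * 4000)
Y = simulate_lr_hsi(X, r=3)
print(Y.shape)  # (4, 4, 31)
```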
The CAVE dataset [15,39] contains 32 HSIs of size 512 × 512 × 31, i.e., each band image is 512 × 512 and there are 31 spectral bands. Similar to [28], the high-spatial-resolution, low-spectral-resolution MSI was created by using the Nikon D700 spectral response (https://www.maxmax.com/spectral_response.htm), i.e., the spectral response matrix H in Equation (2). Therefore, the auxiliary images of the same scene used in the CAVE dataset are RGB images. For visual illustration, the Face image from the CAVE dataset is shown in Figure 2, where spectral bands 3, 25, and 30 were chosen to provide a pseudo-color view. The Pavia dataset [11,18] covers the urban area of the University of Pavia and is commonly used in hyperspectral classification. This dataset consists of a single HSI of size 610 × 340 × 103, of which 10 spectral bands are noninformative. A 200 × 200 region of interest containing valuable information was selected from the original Pavia image. After cropping the original Pavia image and removing the first ten spectral bands, i.e., X = X(411:610, 141:340, 11:103), a 200 × 200 × 93 subimage was used as the reference high-spatial-resolution HSI. The Paris dataset [11] is often used for real-data experiments; the size of the Paris image is 72 × 72 × 128. For the two remote sensing datasets, i.e., Pavia and Paris, the MSIs were created by using the spectral response of the IKONOS satellite [11]. These MSIs contain four bands: blue, green, red, and near-infrared. Figures 3 and 4 show the Pavia and Paris images, respectively. For the Pavia HSI data, spectral bands 20, 16, and 5 were selected for the pseudo-color view. For the Paris HSI data, the color image is composed of bands 28, 13, and 3 for the red, green, and blue channels, respectively.
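The crop X(411:610, 141:340, 11:103) is written in MATLAB notation (1-based, inclusive at both ends). For readers reproducing the setup in Python, a small sketch of the equivalent NumPy slice follows; the zero array is a placeholder with the Pavia University shape, and loading `Pavia.mat` is not shown.

```python
import numpy as np

# Placeholder with the Pavia University shape; loading Pavia.mat is not shown
pavia = np.zeros((610, 340, 103), dtype=np.float32)

# MATLAB: X(411:610, 141:340, 11:103) is 1-based and inclusive on both ends.
# NumPy is 0-based and end-exclusive: subtract 1 from each start, keep each end.
X_ref = pavia[410:610, 140:340, 10:103]
print(X_ref.shape)  # (200, 200, 93)
```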
All of the competing methods were run with the authors' MATLAB+MEX implementations, with the parameters set for the best performance as described in the corresponding references. For a fair comparison, the number of endmembers C was set to 10 in this paper. All experiments were performed on a personal computer with an NVIDIA GeForce GTX 1080 Ti GPU, 64 GB of memory, and 64-bit Windows 7, using MATLAB R2017b.
Evaluation indexes: There are many indexes for evaluating the quality of reconstructed images, e.g., qualitative [40,41] and quantitative [42][43][44][45] ones. In this paper, six quantitative indexes were employed to evaluate the quality of the reconstructed HSIs.
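The first two quantitative indexes are the Mean Root-Mean-Squared Error (MRMSE) and the Mean Peak Signal-to-Noise Ratio (MPSNR). Given the original high-spatial-resolution HSI X ∈ ℝ^{L×MN} and the estimated high-spatial-resolution HSI X̂ ∈ ℝ^{L×MN}, with X_i and X̂_i denoting their ith band images (rows), MRMSE and MPSNR are defined as:
$$\mathrm{MRMSE} = \frac{1}{L}\sum_{i=1}^{L}\sqrt{\frac{1}{MN}\left\|\mathbf{X}_i - \hat{\mathbf{X}}_i\right\|_2^2}, \tag{15}$$
$$\mathrm{MPSNR} = \frac{1}{L}\sum_{i=1}^{L}10\log_{10}\frac{\max(\mathbf{X}_i)^2}{\frac{1}{MN}\left\|\mathbf{X}_i - \hat{\mathbf{X}}_i\right\|_2^2}, \tag{16}$$
where max(·) is the maximum value of the input vector. Both MRMSE and MPSNR are based on the mean squared error. A smaller MRMSE value and a larger MPSNR value indicate better reconstruction quality of the estimated HSI X̂.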
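A minimal NumPy sketch of the two band-averaged indexes, assuming both HSIs are stored as L × MN arrays with one band per row: the per-band RMSE and the per-band PSNR (peak = the band's maximum) are computed and then averaged over the L bands. The function name `mrmse_mpsnr` is illustrative.

```python
import numpy as np

def mrmse_mpsnr(X, X_hat):
    """X, X_hat: L x MN arrays, one band per row.
    Returns (band-averaged RMSE, band-averaged PSNR with the band maximum as peak)."""
    X = np.asarray(X, dtype=np.float64)
    X_hat = np.asarray(X_hat, dtype=np.float64)
    mse = np.mean((X - X_hat) ** 2, axis=1)                          # per-band MSE
    mrmse = float(np.mean(np.sqrt(mse)))
    mpsnr = float(np.mean(10 * np.log10(X.max(axis=1) ** 2 / mse)))  # inf if a band is exact
    return mrmse, mpsnr

X = np.array([[255, 0, 0, 0], [100, 100, 100, 100]])
mrmse, mpsnr = mrmse_mpsnr(X, X + 1)
print(mrmse, round(mpsnr, 2))  # 1.0 44.07
```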
The third quantitative index is the Mean Structure Similarity (MSSIM), which measures the structural consistency between the estimated image and the reference image. The MSSIM index of HSI data is formulated as:
$$\mathrm{MSSIM} = \frac{1}{L}\sum_{i=1}^{L}\mathrm{SSIM}\big(\phi(\mathbf{X}_i), \phi(\hat{\mathbf{X}}_i)\big), \tag{17}$$
where X_i ∈ ℝ^{1×MN} and X̂_i ∈ ℝ^{1×MN} denote the ith band images of the original HSI and the estimated HSI, respectively, and ϕ(·) converts a vector from ℝ^{1×MN} to ℝ^{M×N}. The details of the function SSIM(·, ·) are described in [42]. The value of MSSIM lies in [0, 1], and a larger MSSIM value indicates that the estimated image is structurally more similar to the reference image. The fourth index is the relative dimensionless global error in synthesis (ERGAS) [43], defined as:
$$\mathrm{ERGAS} = \frac{100}{r}\sqrt{\frac{1}{L}\sum_{i=1}^{L}\left(\frac{\mathrm{RMSE}_i}{\mu_{2,i}}\right)^2}, \tag{18}$$
where r is the magnification factor, RMSE_i is the RMSE of the ith band, and µ_{2,i} denotes the mean value of X̂_i. ERGAS computes the band-wise RMSE normalized by the band mean and then divides it by the spatial resolution ratio between the high- and low-resolution HSIs.
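A minimal NumPy sketch of ERGAS for L × MN arrays. It normalizes each band's RMSE by the mean of the estimated band, reading µ_{2,i} with the same subscript convention as UIQI (index 2 for the estimated image); many implementations use the reference-band mean instead, which gives slightly different values. The function name `ergas` is illustrative.

```python
import numpy as np

def ergas(X, X_hat, r):
    """X, X_hat: L x MN arrays; r: magnification factor.
    Band-wise RMSE is normalized by the band mean of X_hat (assumed reading of mu_{2,i})."""
    X = np.asarray(X, dtype=np.float64)
    X_hat = np.asarray(X_hat, dtype=np.float64)
    rmse = np.sqrt(np.mean((X - X_hat) ** 2, axis=1))   # per-band RMSE
    mu = X_hat.mean(axis=1)                              # per-band mean of the estimate
    return float(100.0 / r * np.sqrt(np.mean((rmse / mu) ** 2)))

X = np.full((2, 4), 10.0)
e = ergas(X, X + 1.0, r=3)
print(round(e, 4))  # 3.0303, i.e., 100/3 * 1/11
```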
The fifth quantitative index is the Universal Image Quality Index (UIQI) [44]. UIQI is computed on a sliding window of size 32 × 32 and averaged over all window positions and all spectral bands. Denote by X_{i,j} and X̂_{i,j} the jth window region of the ith band image in the original and estimated HSIs, respectively. UIQI is then given by:
$$\mathrm{UIQI} = \frac{1}{L}\sum_{i=1}^{L}\frac{1}{P}\sum_{j=1}^{P}\frac{4\rho_{ij}\,\mu_{1,ij}\,\mu_{2,ij}}{\left(\sigma_{1,ij}^2+\sigma_{2,ij}^2\right)\left(\mu_{1,ij}^2+\mu_{2,ij}^2\right)}, \tag{19}$$
where P is the number of window positions and ρ_{ij} is the covariance between the subimages X_{i,j} and X̂_{i,j}. In addition, µ_{1,ij} and σ_{1,ij} are the mean value and standard deviation of the reference subimage X_{i,j}, while µ_{2,ij} and σ_{2,ij} are the mean value and standard deviation of the estimated subimage X̂_{i,j}. The sixth quantitative index is the Spectral Angle Mapper (SAM) [45], which is commonly used to evaluate how well the spectral information is preserved at each pixel. SAM measures spectral similarity by computing the angle between the reference pixel X_k ∈ ℝ^{L×1} and the estimated pixel X̂_k ∈ ℝ^{L×1}, averaged over all pixels:
$$\mathrm{SAM} = \frac{1}{MN}\sum_{k=1}^{MN}\arccos\frac{\mathbf{X}_k^{T}\hat{\mathbf{X}}_k}{\|\mathbf{X}_k\|_2\,\|\hat{\mathbf{X}}_k\|_2}. \tag{20}$$
The SAM value is given in degrees. A SAM value close to zero indicates high spectral quality of the estimated HSI.
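A minimal NumPy sketch of SAM and UIQI. SAM takes L × MN arrays (one pixel spectrum per column) and returns the mean angle in degrees. UIQI uses the per-window quantity 4ρµ₁µ₂/((σ₁² + σ₂²)(µ₁² + µ₂²)) of Wang and Bovik, averaged over 32 × 32 windows and then over bands of M × N × L cubes; the window `step` parameter and the function names are illustrative assumptions, and windows with zero variance or zero mean are not handled.

```python
import numpy as np

def sam_degrees(X, X_hat, eps=1e-12):
    """Mean spectral angle (degrees) over all pixels; X, X_hat are L x MN."""
    num = np.sum(X * X_hat, axis=0)
    den = np.linalg.norm(X, axis=0) * np.linalg.norm(X_hat, axis=0) + eps
    return float(np.degrees(np.mean(np.arccos(np.clip(num / den, -1.0, 1.0)))))

def uiqi_band(a, b, win=32, step=1):
    """Per-window Q index averaged over all win x win windows of two M x N bands."""
    scores = []
    for y in range(0, a.shape[0] - win + 1, step):
        for x in range(0, a.shape[1] - win + 1, step):
            p = a[y:y + win, x:x + win].ravel()
            q = b[y:y + win, x:x + win].ravel()
            mp, mq = p.mean(), q.mean()
            cov = np.mean((p - mp) * (q - mq))
            scores.append(4 * cov * mp * mq / ((p.var() + q.var()) * (mp ** 2 + mq ** 2)))
    return float(np.mean(scores))

def uiqi(X_cube, X_hat_cube, win=32, step=1):
    """Average of uiqi_band over all bands of two M x N x L cubes."""
    return float(np.mean([uiqi_band(X_cube[:, :, i], X_hat_cube[:, :, i], win, step)
                          for i in range(X_cube.shape[2])]))

rng = np.random.default_rng(0)
cube = rng.random((40, 40, 3)) + 0.5
print(round(uiqi(cube, cube, step=8), 6))                                    # 1.0
print(round(sam_degrees(np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])), 6))  # 90.0
```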
In summary, larger MPSNR, MSSIM, and UIQI values indicate that the estimated HSI is more similar to the reference HSI, whereas smaller MRMSE, ERGAS, and SAM values indicate better reconstruction quality of the estimated HSI.

Parameter Analysis
As shown in Algorithm 1, the two main parameters of the proposed method are the regularization parameters α and β. The value of α affects the estimation of the endmember matrix, while the value of β controls the contribution of the abundance matrix estimated from the observed MSI. The regularization parameter α was tuned over the set {0.0001, 0.0002, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 1}, and β was varied over the set {1, 10, 100, 500, 1000, 1500, 2000, 3000, 4000, 5000, 10000}. Figure 5 shows the MRMSEs and MSSIMs as functions of α and β on the CAVE, Pavia, and Paris datasets.
Figure 5 shows that, with α fixed, the MRMSEs and MSSIMs changed only slightly as β increased, because the abundance matrix estimated from the observed MSI made a similar contribution to the final result. For the CAVE dataset, the surface reached its peak at α = 10⁻⁴ and β = 1500. The surfaces for the Pavia and Paris datasets were similar to each other: with α fixed, the MSSIMs increased slowly as β continued to increase. If β is large, the estimation of the abundance matrix from the observed MSI will be limited. Therefore, α and β were set to 10⁻⁴ and 10⁴, respectively, for both the Pavia and Paris datasets in the experiments.

Experimental Results
The experimental results on the three HSI datasets are summarized in this subsection. Tables 1-3 report the quantitative results (computed with Equations (15)-(20)) and the computation times of the proposed method and the baselines on the CAVE, Pavia, and Paris HSI datasets, respectively, with a magnification factor of three. The reported values were computed on the 8-bit output images. Although the bicubic, GLP, CNMF, and SDSR methods take less time, their reconstruction quality is limited. The proposed method had the longest computation time among the compared methods, because it relies on constructing the transferred HSI, which is obtained band by band with the existing SRMDNF method. For the CAVE dataset, the values in Table 1 are the averages of the six quantitative indexes over the 32 hyperspectral images. Compared with state-of-the-art approaches, the proposed method performed better in most cases, which indicates that the information transferred from the natural image domain can enhance HSI super-resolution. In particular, the proposed method obtained the best MSSIM values in most cases, which shows that its estimated HSIs have better structural similarity and higher spatial resolution.
For visualization, Figures 6-8 show the pseudo-color images obtained by the different methods on the Balloons, Pavia, and Paris images, respectively, with a magnification factor of three, and Table 4 lists the corresponding MSSIM values. Although bicubic interpolation enlarges the image, the reconstructed images are often blurred. GLP performs better than bicubic interpolation; however, it tends to produce ghosting artifacts along edges. HySure can learn high-spatial-resolution information from the observed MSI, but it may introduce noise into the final result. PALM performed worse in structured regions than in smooth regions. CNMF uses only the knowledge learned from natural images and therefore misses some complex structural information. SDSR uses the bicubic interpolation result as the pre-HSI, which reduces the estimation accuracy of the endmember matrix. SRMD is a single-image super-resolution method that learns valuable information from a high-resolution training image dataset; however, it ignores spectral consistency when applied to HSI super-resolution. From the perspective of domain transfer learning, the knowledge transferred from the natural image domain can improve HSI super-resolution reconstruction to a certain extent. For this reason, the proposed method combines the transferred HSI, the observed HSI, and the observed MSI to estimate the optimal endmember and abundance matrices, and it outperforms the state-of-the-art approaches.
To further compare the performance of the different methods in each spectral band, Figure 9 shows the RMSE between the original and estimated spectra on the Balloons, Face, Pavia, and Paris hyperspectral images, and Table 5 lists the corresponding average RMSE values. Compared with the baselines, the proposed method preserved the spectral consistency of the observed HSI and achieved better performance in most cases.
In addition, to validate the effectiveness of the proposed method for different magnification factors, we repeated the experiments with a magnification factor of four. Tables 6-8 show the quantitative

Conclusions
In this paper, a novel domain transfer learning-based HSI super-resolution method was proposed to improve the spatial resolution of the observed HSI on the basis of spatial similarity and spectral consistency. First, the proposed method obtains the transferred high-spatial-resolution HSI by exploiting knowledge from natural images. Then, the transferred high-spatial-resolution HSI, the observed low-spatial-resolution HSI, and the high-spatial-resolution MSI are unified to estimate the endmember and abundance matrices through spectral mixture analysis. Finally, the optimal endmember and abundance matrices are used to construct the target high-spatial-resolution HSI.
In experiments on three real-world HSI datasets (CAVE, Pavia, and Paris), the proposed method achieved better super-resolution reconstruction performance than the competing approaches. Specifically, the proposed method outperformed the baselines for magnification factors of both three and four, which suggests that it is suitable for complex real-world super-resolution applications. Compared with bicubic interpolation at a magnification factor of three, the average MSSIM values obtained by the proposed method increased by at least 4.4%, 40.1%, and 55.5% on the CAVE, Pavia, and Paris datasets, respectively. The proposed method captures the nonlinear mapping between low- and high-resolution natural images and transfers this knowledge to the HSI domain. In addition, it enhances spatial similarity and preserves spectral consistency when optimizing the endmember and abundance matrices. As a result, the proposed method achieved better qualitative and quantitative results than the competing methods in the experiments.
In the future, the focus of our work will be on how to estimate the endmember matrix or the abundance matrix with higher precision under some effective constraints.

Figure 1. The flowchart of the proposed method. First, the Convolutional Super-Resolution Network (CSN) model parameters are learned on low- and high-resolution natural images. Then, the transferred HSI is obtained by applying the trained CSN model to the observed HSI. Finally, the endmember matrix and the abundance matrix of the estimated HSI are optimized by using the observed HSI, the observed Multispectral Image (MSI), and the transferred HSI. BN, Batch Normalization.

Algorithm 1: The proposed framework for HSI super-resolution.
Input: the observed low-spatial-resolution HSI Y; the observed high-spatial-resolution MSI Z; the CSN model parameters Θ; regularization parameters α and β; number of endmembers C; maximum number of iterations T; convergence threshold τ.
Output: the estimated high-spatial-resolution HSI X̂.
Transferring:

Figure 2. Face image from the CAVE dataset, where the color image consists of spectral bands 3, 25, and 30 for the red, green, and blue channels, respectively. (a) Bicubic interpolation of the observed low-spatial-resolution HSI with a magnification factor of three; (b) RGB image after applying the Nikon D700 spectral response; (c) the original high-spatial-resolution HSI, considered as the ground truth.

Figure 3. Pavia image, where the color image consists of spectral bands 20, 16, and 5 for the red, green, and blue channels, respectively. (a) Bicubic interpolation of the observed low-spatial-resolution HSI with a magnification factor of three; (b) MSI after applying the IKONOS spectral response (only the red, green, and blue components are displayed); (c) the original high-spatial-resolution HSI, considered as the ground truth.

Figure 4. Paris image, where the color image consists of spectral bands 28, 13, and 3 for the red, green, and blue channels, respectively. (a) Bicubic interpolation of the observed low-spatial-resolution HSI with a magnification factor of three; (b) MSI after applying the IKONOS spectral response (only the red, green, and blue components are displayed); (c) the original high-spatial-resolution HSI, considered as the ground truth.

Figure 5. Mean Root-Mean-Squared Errors (MRMSEs) and Mean Structure Similarities (MSSIMs) versus the values of parameters α and β for the proposed method with a magnification factor of three. (a) MRMSEs versus α and β on the CAVE dataset. (b) MSSIMs (%) versus α and β on the CAVE dataset. (c) MRMSEs versus α and β on the Pavia dataset. (d) MSSIMs (%) versus α and β on the Pavia dataset. (e) MRMSEs versus α and β on the Paris dataset. (f) MSSIMs (%) versus α and β on the Paris dataset.

Figure 9. Spectral errors in terms of the RMSE for the compared methods. (a) Balloons image; (b) Face image; (c) Pavia image; (d) Paris image.

Table 2. Quantitative results on the Pavia dataset with a magnification factor of 3.

Table 3. Quantitative results on the Paris dataset with a magnification factor of 3.

Table 5. The average values and standard deviations of the RMSEs corresponding to Figure 9 (in %).

Table 6. Quantitative results on the CAVE dataset with a magnification factor of 4.

Table 7. Quantitative results on the Pavia dataset with a magnification factor of 4.

Table 8. Quantitative results on the Paris dataset with a magnification factor of 4.