Hyperspectral Image Super-Resolution by Deep Spatial-Spectral Exploitation

: Limited by the existing imagery sensors, hyperspectral images are characterized by high spectral resolution but low spatial resolution. The super-resolution (SR) technique aiming at enhancing the spatial resolution of the input image is a hot topic in computer vision. In this paper, we present a hyperspectral image (HSI) SR method based on a deep information distillation network (IDN) and an intra-fusion operation. Speciﬁcally, bands are ﬁrstly selected by a certain distance and super-resolved by an IDN. The IDN employs distillation blocks to gradually extract abundant and efﬁcient features for reconstructing the selected bands. Second, the unselected bands are obtained via spectral correlation, yielding a coarse high-resolution (HR) HSI. Finally, the spectral-interpolated coarse HR HSI is intra-fused with the input HSI to achieve a ﬁner HR HSI, making further use of the spatial-spectral information these unselected bands convey. Different from most existing fusion-based HSI SR methods, the proposed intra-fusion operation does not require any auxiliary co-registered image as the input, which makes this method more practical. Moreover, contrary to most single-based HSI SR methods whose performance decreases signiﬁcantly as the image quality gets worse, the proposal deeply utilizes the spatial-spectral information and the mapping knowledge provided by the IDN, which achieves more robust performance. Experimental data and comparative analysis have demonstrated the effectiveness of this method.


Introduction
Hyperspectral imagery sensors usually collect reflectance information of objects in hundreds of contiguous bands over a certain electromagnetic spectrum [1], and the hyperspectral image (HSI) can simultaneously obtain a set of two-dimensional images (or bands) [2]. These rich bands play an important role in discriminating different objects by their spectral signatures [3], and making them widely applicable in classification [4] and anomaly detection [5]. However, limited by the existing imaging sensor technologies, HSIs are characterized by low spatial resolution, which results in limitation of their applications' performance. As a type of signal post-processing technique, HSI super-resolution (SR) can improve the spatial resolution of the HSI without modifying the imagery hardware, which is a hot issue in computer vision.
HSI SR has been studied for a long time in remote sensing and many methods have been proposed to improve the spatial resolution of the HSIs. According to the number of input images, these methods can be roughly classified into two types: the fusion-based HSI SR methods and the single-based ones.
The fusion-based approaches are based on the assumption that multiple fully-registered observations of the same scene are accessible. Dong et al. [6] proposed a nonnegative structured spatial and spectral information of the input LR HSI is fully-utilized, which contributes to the robust and acceptable performance.The main contributions of this work are summarized as follows: 1. We adopt a scalable SR strategy for super-resolving the HSI. Firstly, an IDN is used for super-resolving the interval-selected bands individually, a process exploiting their spatial information and the mapping learned by the IDN. Secondly, the unselected bands are fast interpolated via cubic Hermit spline method, which uses the high spectral correlation in the HSI to obtain a coarse HR HSI. Both spatial and spectral information is utilized. Meanwhile, contrary to super-resolving the HSI band by band via some certain methods, this scalable SR strategy achieves a tradeoff between high quality and high efficiency.
2. Most existing single-based methods super-resolve bands in the HSI individually, which neglects the spectral information. In this way, their performance is highly correlated to the images' spatial quality. The proposed method deeply utilizes both the spatial and spectral information in the HSI, and its performance is more robust. 3. To deeply use the information the input LR HSI conveys, intra-fusion is made between the spectral-interpolated coarse HR HSI and the input LR HSI. Different from most fusion methods, which require another co-registered image as the input, the other input of the proposed intra-fusion is an intermediate outcome of the SR processing, which fully exploits the information the LR HSI conveys in a subtle way.
The remainder of the paper is organized as follows: Section 2 describes the proposed method. We present the experimental results and data analysis in Section 3. Conclusions are drawn in Section 4.

Proposed Method
In this section, we present the four main parts of the proposed method: framework overview, bands' selection and super-resolution by IDN, unselected bands' super-resolution, and intra-fusion. Detailed descriptions of these four parts are presented in the following subsections. Figure 1 illustrates the workflow of the proposed HSI SR method. The input data are one LR HSI. Bands are first selected by a certain distance and super-resolved via IDN with respect to their spatial information. Then, unselected bands are super-resolved by utilizing their spectral correlation to obtain a coarse but integrated HR HSI. Furthermore, this coarse HR HSI is intra-fused with the input LR HSI to further use the information these unselected bands convey. To facilitate discussion, we clarify the notations of some frequently used terms. The input LR HSI and the desired HR HSI are represented as L ∈ R w×h×n and H ∈ R sw×sh×n , where w and h denote the width and height of the input LR HSI. s and n denote the scaling factor and the number of bands, respectively.H andĤ denote the coarse HR HSI and the intra-fused fine HR HSI, respectively.

Bands' Selection and Super-Resolution by IDN
This part firstly analyzes the correlation between the bands in the HSI and depicts the rationality of interval setting, and then super-resolves the selected bands via IDN.

Correlation Analysis
For HSIs, their neighboring bands are highly correlated in the spectral domain [22]. Figure 2 plots the correlation coefficient curves of the Pavia university scene HSI, a remote-sensing HSI widely applied in classification [23]. The value on the x-axis is the index of the current band H i . The legend denotes the interval between H i and the other neighboring band H i+d . The value on the y-axis is the correlation coefficient between H i and H i+d . According to the three curves in Figure 2, although correlation decreases as the band gets further from the current one, most bands are highly correlated to their neighboring bands, whose correlation coefficients are larger than 0.95. Moreover, as the image quality decreases, most high-frequency in the images are damaged, the correlation coefficient between neighboring bands is supposed to be higher (related experiments have been described in Section 3). Hence, to achieve a high efficiency without performance loss, it is rational and necessary to firstly select some bands by certain distance and super-resolve the selected bands by utilizing their spatial information. Contrary to super-resolving each bands in the HSI, this operation will highly reduce the computational complexity with negligible performance degradation. Figure 3 has shown the general architecture of the IDN, a process consisting of three parts, i.e., a feature extraction block, multiple stacked information distillation blocks and a reconstruction block. The feature extraction block applies two 3 × 3 convolution layers to extract the feature maps from the original LR images. The extracted features maps act as the input of the information distillation blocks. Several information distillation blocks are composed by using chained mode, in which each block contains an enhancement unit and a compression unit with stacked style. Finally, a transposed convolution layer without activation function acts as the reconstruction block to obtain the HR image. Compared with the other networks, the IDN extracts feature maps directly from the LR images and utilizes multiple DBlocks to generate the residual representations in HR space. The enhancement unit in each DBlock gathers as much information as possible, and the remaining compression unit distills more useful information, which achieves competitive results with a concise structure.

Super-Resolution via IDN
It should be noted that the network considered two loss functions during the training process. The first one is the widely used mean square error (MSE): (1) in which N, I i , andÎ i represent the number of training samples, the ith input image patch and the label of the ith input image patch, respectively. Meanwhile, mean absolute error (MAE) is also applied to train the IDN model. The MAE is formulated as: Specifically, the IDN is first trained with loss MAE , and then fine-tuned by the loss MSE .
Having the trained IDN, the spatial resolution of the selected bands in the LR HSI can be enhanced in a fast and efficient way, and the IDN super-resolved HR bands can be denoted as where the unselected bands are temporarily missing.

Spectral-Interpolation for the Unselected Bands
Given the super-resolved interval-selected bands, the proposal applies a cubic Hermite spline method f (x) to achieve a continuous and smooth entire HR HSI. Cubic Hermite splines are typically used for interpolation of numeric data specified at given argument values, to obtain a smooth continuous function. Compared with the linearity, it can better hold and analyze the mean of the the dependent variables and capture the nature of their relationships [24].
Suppose that the following nodes in the cubic Hermite spline function and their values are given: x :a = x 0 < x 1 < ... < x n = b y : y 0 y 1 y n (4) in which n denotes the number of the given nodes minus 1, thus it starts from 0. The proposed cubic Hermite spline function f (x) describes the mapping between x and y, and is a partitiondefined formula. With these IDN-super-resolved HR bands acting as the given nodes, the cubic Hermite spline function f (x) can be applied to obtain one coarse but integrated HR HSI, which can be denoted as:

Intra-Fusion
According to the above descriptions, the HR HSIH is reconstructed by utilizing the spatial information of the selected bands, the mapping learned by the IDN model and the spectral correlation. The information these unselected bands convey is directly neglected. If further utilization of these information is made, it is supposed that a spatial enhancement will be gained. In this way, we propose to get an HR HSIĤ through intra-fusing the spectral-interpolatedH with the input LR HSI L via the non-negative matrix factorization (NMF) method. Different from most existing fusion methods, the input of the proposed fusion is an intermediate output of the super-resolution process, which is why it is named as intra-fusion. This intra-fusion is more flexible and more practical.
Because of the coarse spatial resolution, pixels in the HSI are usually mixed by different materials. Spectral curves of the HSI usually are mixtures of different pure materials' reflectance, and these pure materials are called endmembers. Considering the mathematics simplicity and physical effectiveness, each spectral curve can be modeled by a linear mixture model. Let matrix H α ∈ R n×s 2 wh represent the desired HR HSI by concatenating the pixels of HSI H: H ∈ R sw×sh×n → H α ∈ R n×s 2 wh . The same operation is implemented on the spectral-interpolated HR HSIH ∈ R sw×sh×n and the input LR HSI L ∈ R w×h×n to obtain theH α ∈ R n×s 2 wh and L α ∈ R n×wh . In this way, the desired HR HSI H α can be denoted as where W ∈ R n×D is the endmember matrix, and C ∈ R D×s 2 wh is the abundance matrix. N denotes the residual. D is the number of endmembers and each column in W represents the spectrum of an endmember. Here, D is obtained by the Neyman-Pearson lemma [25]. Given an HSI with n bands, eigenvalues of its correlation matrix R n×n and covariance matrix K n×n are computed and sorted as {λ 1 ≥ λ 2 ≥ λ 3 ≥ · · · ≥ λ n } and {ξ 1 ≥ ξ 2 ≥ ξ 3 ≥ · · · ≥ ξ n }, respectively. According to the binary hypothesis testing, assume H0: According to the preset false alarm rate, a threshold α can be computed to maximize the detection rate. In this way, when λ i − ξ i is greater than α, a signal signature is considered to exist. The number of endmembers is obtained by counting the number of eigenvalues that satisfy λ i − ξ i > α. In the proposal, the given false alarm rate is set as 5 × 10 −2 . Meanwhile, all elements in the matrix W and C are nonnegative. The spectral-interpolated HR HSIH α and the input LR HSI L α can be denoted as: in which Q is the spectral transform matrix, and P ∈ R s 2 wh×wh is the spatial spread transform matrix. E q and E p are the residual matrices. When substituting Equation (6) into Equations (7) and (8),H α and L α can be reformulated asH where W T and C T denote the spatial degraded abundance matrix and the spectrally degraded matrix, respectively. During the HSI unmixing procedure, it is expected that the HSI reconstructed by the endmember and coefficient matrices should be close to input image. In this way, the cost functions about the input LR HSI L α and spectral-interpolatedH α are formulated as ||H α − W T C|| 2 F and ||L α − WC T || 2 F , respectively. NMF was developed to decompose a nonnegative matrix into a product of nonnegative matrices [26]. When applied to theH α , the cost function is formulated as arg min During the solving process, both W T and C are firstly initialized as nonnegative matrices. If W T is smaller than the desired matrix W Td , a variable k whose value is larger than 1 will be multiplied by W T to make it next to the W Td . On the other hand, if W T is larger than W Td , k should be larger than 0 but smaller than 1 to make sure all the elements in W T are nonnegative. In this way, k is defined as where (.) T denotes the transposition of the matrix. "./" denotes the element-wise division. According to Equation (7), it is noted that the relation between k and constant 1 changes with that between W T and W Td . Hence, W T can be updated by the following expression: The same update strategy is operated on C, W and C T .
When W and C are obtained, the HR HSIĤ intra-fused byH and L is also achieved, which contains both the conveyed spatial information L and the IDN learned spatial mapping correlation. Moreover, this fusion operation does not require any auxiliary HR image as the input, which is more practical. The complete algorithm is summarized in Algorithm 1.

Algorithm 1: Pseudocode of the proposal.
Input: LR HSI L, bands' selection interval d, bands' number n Output: reconstructed HR HSIĤ up-sample L i →H i via cubic Hermite spline function; 8 for i = t(d + 1) + 2; i ≤ n; i + + do 9 up-sample L i →H i via IDN; 10 returnH; 11 reshapeH →H α , and L → L α ; 12 Initialize W and H T via vertex component analysis method and randomly, respectively; 13 Update C T via Equation (16), with W fixed; 14 Optimize W and C T via Equation (15) and Equation (16); 15 for i = 1; i ≤ q; i + + do 16 Set W T = W; 17 Update C via Equation (14), with W T fixed; 18 Optimize W T and C via Equation (13) and Equation (14); 19 Update W via Equation (15), with C T fixed; 20 Optimize W and C T via Equation (15) and Equation (16);

Experimental Setup
The performance of this proposed method was mainly evaluated on outdoor HSIs obtained via airborne, spaceborne and the ground-based platforms. The outdoor HSIs utilized in the experiments are the Pavia University, Washington DC Mall, Salinas, Botswana and Scene02 datasets. They were acquired by the Reflective Optics Imaging Spectrometer (ROSIS), the Hyperspectral Digital Image Collection Experiment (HYDICE), the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS), the Hyperion sensor on the NASA EO-1 satellite and the Spectral Imagery (SPECIM) hyperspectral camera, respectively. The geometric resolution of the ROSIS and HYDICE are 1.3 m and 1 m, respectively. There are 610 × 610 × 103 pixels in the original Pavia University, but we selected a 200 × 200 × 100 region with rich detail to validate the performance. The Washington DC Mall acquisition measured the pixel response from 400 nm to 2400 nm region. Bands in the 900 nm and 1400 nm region where the atmosphere is opaque were omitted from the dataset, leaving 191 bands. The Salinas dataset was collected by the 224-band AVIRIS sensor and 20 water absorption bands were abandoned. The Botswana dataset was acquired at 30 m pixel resolution over a 7.7 km strip in 242 bands covering the 400-2500 nm portion of the spectrum in 10 nm windows. The scene02 HSI consists of 1312 × 1174 pixels and 396 spectral reflectance bands in the wavelength range from 378 nm to 1003 nm. It is noted that these outdoor HSIs cover three different types of surface. Both the Pavia University and Washington DC Mall datasets refer to the building typologies, the Salinas and Botswana datasets refer to the land cover typologies, and the scene02 dataset refers to the man-made material typologies.
Moreover, to validate the proposal's robustness to the image quality, we also validated the performance on the ground-based HSIs with finer spatial detail. We randomly selected two HSIs from the CAVE database: flowers and f ake_and_real_ f ood. Their spectral resolution is from 400 nm to 700 nm at 10 nm steps (31 bands in total). The low-resolution HSIs were simulated by bicubic down-sampling the original HSIs with the scaling factor s = 2, 4 and 8. To comprehensively evaluate the performance of the proposed method, comparisons were made with several state-of-the-art deep learning-based single SR methods, including SRCNN [27], VDSR [28], LapSRN [29], and IDN [19]. The parameters in the competing methods were chosen as described in their corresponding references. The single-based methods provide a solid baseline with no auxiliary images.
In [30], Lim et al. experimentally demonstrated that training with loss MSE is not a good choice. Thus, the IDNs were first trained with the loss MAE , and then fine-tuned by the loss MSE . During the training process, the learning rate was initially set to 10 −4 and decreased by a factor of 10 during fine-tuning phase.
The following six quantitative measurements were employed to evaluate the performance of reconstructed HR HSIs: Correlation Coefficient (CC), Root Mean Square Error (RMSE), Erreur Relative Globale Adimensionnelle de Synthese (ERGAS) [31], Peak Signal-to-Noise Ratio (PSNR), Structure Similarity Index Measurement (SSIM), and Spectral Angle Mapper (SAM) [32]. CC, RMSE, PSNR and SSIM are universal measurement matrices for image quality assessment, and their detailed descriptions are omitted in this paper. ERGAS is a global statistical measure of the super-resolution quality with the best value at 0. The ERGAS of H andĤ is calculated by Here, n is the number of pixel in any band of the HR HSI H, and µ k is the sample mean of the kth band of H. SAM evaluates the spectral information preservation at each pixel. The SAM at the kth pixel is determined by (18) in which < a, b >= a T b is an inner product between a and b, and ||.|| is the l 2 norm. The optimal value of SAM is 0. The values of SAM reported in our experiments were obtained by averaging the values obtained for all the image pixels [33]. Among these six indices, larger CC, PSNR and SSIM indicate better quality, while SAM, RMSE and ERGAS are the opposite. All experiments were run on a desktop with Intel core i5 2.8 GHZ CPU and 16.0 GB RAM using MATLAB(R2015b). It is noted that the computation time of the bicubic method was excluded in the comparison process for its poor performance. Thus, all times of the bicubic method are marked with * .

Data Analysis
Correlation coefficients between neighboring bands in the LR HSIs were firstly measured to select the most proper interval. Figure 5a describes the variation of CC with the interval for the two-times down-sampled Pavia. It is noticed the CC decreased as the interval increased. This validated that the neighboring bands were highly correlated. However, as their interval became wider, their correlation coefficients decreased. Meanwhile, as the scaling factor increased, more high frequency details in the input LR HSIs were damaged, and the differences between different bands narrowed. Figure 5b plots the difference between the CC of the four-times down-sampled Pavia and that of the two-times down-sampled one, when interval was set as 1 and 2. It is noticed that the values of most differences were larger than 0. This validated that the bands in the four-times down-sampled HSIs were more correlated than those of the two-times down-sampled ones at the same interval. Thus, a larger interval can be set for the larger down-sampling factor, which would reduce the computational complexity with negligible performance loss.  Table 1 is obtained by comparing the quality indices of the spectral-interpolated coarse HR HSIs with those of the IDN band-by-band super-resolved HR HSIs at different intervals. It is noticed that, at the same scaling factor, as the interval grew, the quality indices' gap became bigger, which is consistent with what Figure 5a describes. Table 2 depicts the tendency of quality indices' gap between the spectral-interpolated HR HSI and those of the IDN band-by-band super-resolved HR HSI at the same interval but different scaling factor. It is noticed that, with the same interval, the performance of the coarse HR HSI at scaling factor of 8 was closer to that of the IDN super-resolved than at the scaling factor of 4, and scaling factor of 4 was closer than scaling factor of 2, a phenomenon consistent with what shows in Figure 5b, which further validated the rationality of larger-interval setting for larger scale from a vertical perspective. In this way, setting for the "d" follows two principles: (1) for the HSIs with coarser spectral resolution, it is better to set a smaller "d"; and (2) for the HSIs with coarser spatial resolution, most high frequency differences between neighboring bands are lost, and their spectral correlation would be higher than the HSIs with finer spatial detail. A larger "d" can be set for these HSIs. Figure 6 shows the variation of PSNR and computation time with the band's selection interval "d" for the Pavia University at the scaling factor of 2. It is noticed, as the "d" became larger, the computation time became less, but the PSNR of the reconstructed HSI became smaller, which validated our proposal's tradeoff between high quality and high efficiency.
For the Pavia University HSI, there are 115 bands in the 430-860 nm, of which 12 bands were neglected and the remaining 103 bands were used for the experiment. There are 31 bands in the spectral range of 400-700 nm for the HSIs from the CAVE database, and 210 spectral bands covers the 400-2400 nm for the original Washington DC Mall HSI. In this way, from a horizontal perspective, under the same scaling factor, a larger interval was set for the spectrally more correlated Pavia University, and smaller interval is set for the Washington DC Mall and the HSIs from the CAVE database. Detailed interval settings for the experimental HSIs are displayed in Table 3. Table 3. Interval d set for the HSI under different intervals. 2×  3  2  2  2  2  2  4×  9  3  3  3  3  3  8×  9  3  3  3  3  3

Pavia University
In Figure 7, we present a visual exhibition of the fourth band of the reconstructed Pavia University HSIs via various single based methods, with the scaling factor of 4.
As shown in Figure 7, the proposed method achieved a better spatial enhancement. Table 4 shows the averaged performance on the Pavia University HSI with comparison to the single-based methods. When compared with the SRCNN, VDSR and LapSRN, the proposed method achieved a better spatial enhancement of nearly 0.3-1 dB at the scaling factor of 2. In addition, the proposal only required about half the time of the IDN to achieve a comparable or better performance.  The asterisk * in the tables denote the best performance for the time measurement among all the methods. However, limited by its poor performance, we eliminated it from the comparison and make another time comparison among the other methods, in which we emphases the next least time in a bold format. The proposed method achieved the best performance when the scaling factors were 4 and 8. The reason is that, as the scaling factor grew, less spatial information was contained in the input HSI. The performance of methods that super-resolve the LR HSI band by band would be severely damaged. The proposed method fully exploits the spatial-spectral information, achieving more acceptable performance. Figure 8a depicts the PSNRs of different bands in the 8× reconstructed Pavia University HSIs via different single based methods. It can be seen that the performance of the proposed method was superior to the other methods for each band.

Washington DC Mall
In Figure 9, we present a visual exhibition about the 90th band of the reconstructed Washington DC Mall via various single based methods, with the scaling factor of 4. As shown in Figure 9, the proposed method achieved an HR HSI whose features are more distinct than the HSIs reconstructed by the other methods. Meanwhile, we randomly selected one point from the boundary of a region in the image and plotted its spectral curves ( Figure 10). in Figure 10 "ref" represents the ground truth spectral curves. Because the spectral curve reconstructed by the IDN method seriously deviated from the reference one, we eliminated it in Figure 10 to have a better comparison with the other methods. The proposed method well preserved the spectral information. Contrary to the spectral curves reconstructed by the other methods, the locations of almost all the reflectance peaks and the reflectance troughs in the the spectral curve generated by the proposal are consistent with those of the reference curve, which ensures the correctness of their future applications.  Figure 8b depicts the PSNRs of different bands in the 8× reconstructed Washington DC Mall HSIs via different single-based methods. It can also be seen that the performance of the proposed method was superior to the other methods for each band. Table 5 shows the averaged performance on the reconstructed Washington DC Mall with comparison to the other single-based methods. When compared with the SRCNN, VDSR and LapSRN, the proposal achieved a better spatial enhancement of nearly 0-0.5 dB at the scaling factor of 2. The asterisk * in the tables denote the best performance for the time measurement among all the methods. However, limited by its poor performance, we eliminated it from the comparison and make another time comparison among the other methods, in which we emphases the next least time in a bold format.
At the scaling factors of 4 and 8, the proposal also achieved the best performance. As the selection interval for Washington DC Mall is smaller than that of Pavia University, the computational complexity's superiority over the IDN was smaller than that for Pavia University, but still reduced about 40% at the scaling factor of 8.

Salinas
In Figure 11, visual exhibition about the 99th band of the reconstructed Salinas HSIs is presented, with the scaling factor set to 4. As the land cover in Salinas is quite smooth, no big visual difference in the main body between different methods can be noticed. However, for the junction of different land covers, its geometric shape reconstructed by the proposal is much more distinct than that of the other methods. Figure 12a depicts the PSNRs of each band in the 4× reconstructed Salinas via different methods. According to the data in Figure 12a, the proposal achieved a better spatial enhancement performance on most bands for super-resolving the 4× down-sampled Salinas HSI. Table 6 shows the averaged performance of the Salinas HSI at different scaling factors.  The asterisk * in the tables denote the best performance for the time measurement among all the methods. However, limited by its poor performance, we eliminated it from the comparison and make another time comparison among the other methods, in which we emphases the next least time in a bold format.
When compared with the SRCNN, VDSR, and LapSRN method, the proposal achieved a better spatial quality of nearly 0.06-0.74 dB at the scaling factor of 2. When it comes to the scaling factors of 4 and 8, the proposal achieved the best performance by utilizing the least time. It is demonstrated that the proposal also achieved an acceptable performance on the remote-sensing HSI with land cover typology, besides for the building typology. Figure 12b depicts the PSNRs of each band in the 8× reconstructed Botswana via different methods. The proposal achieveD the best PSNRs for most bands. Table 7 depicts the performance comparison with the alternative methods on the spaceborne Botswana. According to the data in Table 7, the IDN's performance superiority over the proposal gradually diminishED. When IT comes to the scaling factor of 8, the proposal outperformED the IDN in less time. When combining Figure 12b with the data in Table 7, it is convincing that the proposal achieved high performance with high efficiency. In addition, from the data in Table 7, the computational complexity of VDSR and SRCNN was always stable for scaling factors 2, 4 and 8. For the IDN and the proposal, as the size of the input image decreased, less computational complexity was required, which is more practical. Both the experimental data of Salinas and Botswana demonstrated the proposal's effectiveness on the HSIs of land cover typology. The asterisk * in the tables denote the best performance for the time measurement among all the methods. However, limited by its poor performance, we eliminated it from the comparison and make another time comparison among the other methods, in which we emphases the next least time in a bold format.

Scene02
Although the spectral resolution of the Scene02 is finer than 10 nm per band, we still set small and conservative band interval for this HSI to obtain a better spatial enhancement. Figure 13 presents the visual exhibition about the 198th band for the 8× reconstructed Scene02. We amplified a square region that contains both the letter and the grid to make a comparison with the reconstructed HSIs via different methods. Both the contour of the letter and the boundary of the grid in the HSI reconstructed by the proposal are sharper than that of the other methods. Figure 14 depicts the PSNRs of each band in the 8× reconstructed Scene02. The proposal achieved the best performance for most of the single bands, especially for the bands locating at the middle wavelengths. Table 8 shows the averaged performance on the reconstructed Scene02 via different methods. Limited by the memory, the running time of this HSI was much longer than the others, and the proposal achieved a significant computational reduction. This demonstrated that, as the image size grows, the proposal will achieve a notable computational advantage over the IDN.  The asterisk * in the tables denote the best performance for the time measurement among all the methods. However, limited by its poor performance, we eliminated it from the comparison and make another time comparison among the other methods, in which we emphases the next least time in a bold format.

CAVE Database
For the ground-based remote sensing HSIs in the CAVE database, their spatial quality is much finer than that of the other HSIs. In this case, according to the data in Table 9, the performance of super-resolving the HSI band by band via the IDN method was slightly better than that of the proposed method. The asterisk * in the tables denote the best performance for the time measurement among all the methods. However, limited by its poor performance, we eliminated it from the comparison and make another time comparison among the other methods, in which we emphases the next least time in a bold format.
However, considering the input HSIs were generated by down-sampling from the original HSIs via bicubic interpolation, as the scaling factor grew, the spatial quality of the input HSIs decreased. Hence, it is rational to note that, as the scaling factor grew, the IDN's superiority to the proposed method gradually diminished. Figure 15 plots the IDN's superiority to the proposal on fake_and_real_food. It is noted that, as the scaling factor grew, the superiority gradually diminished and was next to 0, validating once again our proposal's efficiency on remote sensing HSIs with poor spatial information. According to Kwan et al. [34], resolutions of most HSIs are limited to tens of meters, for which the proposed method would achieve a more appealing performance.

Conclusions
In this paper, an HSI super-resolution method is proposed by deeply utilizing the spatial-spectral information to reconstruct a high-resolution HSI from a low-resolution one. This method firstly incorporates different strategies for super-resolving the selected bands and the unselected ones. Compared with most of the existing single-based super-resolution methods, this scalable super-resolution operations greatly reduces the computational complexity with little performance degradation. Moreover, intra-fusion is operated to further improve the performance, which is more practical and does not require any auxiliary images. Through the proposed method, spatial-spectral information is fully exploited, and spatial details are well recovered. Experimental results and data analyses on both ground based and remote-sensing HSIs have demonstrated that the proposed method is especially suitable for dealing with the remote-sensing HSIs with poor spatial information.