Pansharpening by Complementing Compressed Sensing with Spectral Correction

Abstract: Pansharpening (PS) is a process used to generate high-resolution multispectral (MS) images from a high-spatial-resolution panchromatic (PAN) image and high-spectral-resolution multispectral images. In this paper, we propose a pansharpening method that focuses on a compressed sensing (CS) technique. The spectral reproducibility of the CS technique is high due to its image reproducibility, but the reproduced image is blurry. Although methods of complementing this incomplete reproduction have been proposed, the existing methods are known to cause ringing artifacts. On the other hand, component substitution is another technique used for pansharpening. The spatial resolution of the images generated by this technique is expected to be as high as that of the high-resolution PAN image, because the technique uses the corrected intensity calculated from the PAN image. Based on these facts, the proposed method fuses the intensity obtained by the component substitution method and the intensity obtained by the CS technique to bring the spatial resolution of the reproduced image close to that of the PAN image while reducing the spectral distortion. Experimental results showed that the proposed method can reduce spectral distortion and maintain spatial resolution better than the existing methods.


Introduction
The optical sensors installed in satellites acquire panchromatic (PAN) and multispectral (MS) images. A PAN sensor observes a wide range of the visible and near-infrared (NIR) regions as one band with a high spatial resolution, and an MS sensor observes multiple bands. Pansharpening (PS) is a method used to generate high-resolution MS images from these two types of data. Due to physical constraints [1], MS sensors are not designed to acquire high-resolution images; in general, high-resolution MS images are obtained via the pansharpening process. PS techniques are used for change detection, target recognition, classification, backgrounds for map applications, visual image analysis, etc.
PS has been studied for decades [2,3], and its methodologies can roughly be classified into four types. The first consists of methods that generate a pansharpened image by replacing the intensity component of the MS images with that of the PAN image via component substitution. The intensity-hue-saturation (IHS) transform [4], the generalized IHS method (GIHS) [5], the Brovey transform [6], principal component analysis [6], and the Gram-Schmidt transform [7] belong to this group. The second group contains methods that extract the high-frequency components of PAN images via multiresolution analysis (MRA) and then add them to the MS images. To extract the high-frequency components, high-pass filtering, decimated wavelet transforms [8], the à trous wavelet transform [9], Laplacian pyramids [10,11], and the non-subsampled contourlet transform [12] are used. The third group contains methods that use machine-learning techniques such as compressed sensing [13] and deep learning [14,15]. The fourth group consists of hybrid methods that combine multiple methods described above. The methods proposed by Vivone et al. [16], Fei et al. [17], and Yin [18] combine the component substitution method and the MRA. The combination of the MRA, a convolutional neural network (CNN), and sparse modeling was proposed by Wang et al. [19], and the combination of the MRA and a CNN was proposed by He et al. [20].
Recently, proposals for hybrid methods using machine-learning techniques have been increasing. Many methods based on compressed sensing (CS) have been proposed following the CS theory published in 2006 [21]. Since Yang et al. [13] proposed a CS-based method for super-resolution, the technique has frequently been applied to PS. Li et al. proposed a method using a learning-free dictionary and CS [22], followed by a method using dictionary learning and CS [23]. Although they showed the possibility of using a dictionary without learning, the feasibility of these methods was low because high-resolution MS images that are not realistically available were necessary for dictionary construction. Jiang et al. proposed a method using a high-resolution dictionary generated from a set of pairs of low-resolution MS images and high-resolution PAN images [24]. They then proposed a method for reconstruction by calculating sparse representations in two stages using a learning-free dictionary [25]. Zhu et al. constructed a pair of high-resolution and low-resolution learning-free dictionaries from PAN images [26,27]. Guo et al. proposed a method called online coupled dictionary learning (OCDL) [28], which iteratively performs dictionary learning and reconstruction until the reconstructed image becomes stable, based on the sparse representation (SR) theory described by Elad [29]. SR theory shows that better reconstruction results are obtained when the dictionary's atoms are highly related to the reconstructed image. Vicinanze et al. proposed the generation of a learning-free dictionary using the high-resolution and low-resolution dictionaries as the detailed image information [30]. Ghahremani et al. proposed a learning-free dictionary with low-resolution PAN images and high-resolution detailed information extracted via the ripplet transform from the edges and textures of high-resolution PAN images [31]. Zhang et al.
introduced non-negative matrix factorization and proposed estimation of a high-resolution matrix by solving an optimization problem of decomposing the matrix into basis and coefficient matrices [32]. Ayas et al. created a dictionary of high-resolution MS image features by incorporating the tradeoff parameter [24,26] and back-projection [13,33]. It was shown that the spectral distortion was reduced by incorporating back-projection. Yin proposed the cross-resolution projection and the offset [18]. The cross-resolution projection generates high-resolution MS images by assuming that the position estimated by CS is the same for high-resolution and low-resolution images. The offset is used to adjust the reconstructed image.
In these studies, the results of PS depended on the model and dictionary selection. Various studies on the structure and construction of dictionaries, model construction, and optimization processes have been conducted. In addition, it has been pointed out that CS-based reconstruction does not guarantee the reproduction of the original image [13,29,34], which means that the spatial characteristics may not be incorporated accurately. The fact that the reconstruction is not exact should be considered when the method is used for spectral analysis. The process of back-projection was introduced by Yang et al. [13] to improve the reconstructed image. This process was also incorporated by Ayas et al. [34] to reduce the spectral distortion. On the other hand, it is also known that ringing artifacts occur when back-projection is performed. In the PS process, low spectral distortion is as important as high resolution. It is therefore important to consider how to achieve fidelity of the reproduced image to the ideal image in terms of both spectral and spatial resolution.
In this paper, we focus on reducing the spectral distortion more effectively than back-projection does for the resolution enhancement of visible-light images. To this end, we propose a pansharpening method that combines the CS technique with a component substitution method that calculates an intensity with high spatial resolution and low spectral distortion, thereby enhancing the reproducibility. As the component substitution method, spectrum correction using modeled panchromatic image (SCMP) [35] is introduced. The observation band of the PAN sensor of IKONOS, an earth observation satellite, was 526-929 nm. The red, green, blue, and NIR bands of the MS sensor of IKONOS were 632-698 nm, 505-595 nm, 445-516 nm, and 757-853 nm, respectively. Therefore, the PAN sensor covered the observation bands of the MS sensor, including NIR. The NIR information needs to be included when generating PS images using component substitution in order to avoid spectral distortion, and SCMP is a model that can correct this distortion. On the other hand, processing using CS is expected to have high data reproducibility because it reflects the characteristics of the input image, although the resolution is lower than that generated by SCMP. In order to improve the fidelity of the reproduced image to the original image, a tradeoff process was applied to the high-resolution intensity images obtained by the CS and the SCMP. It was found that a spatial resolution equivalent to that of the PAN image was obtained and that the spectral distortion was reduced by the proposed method.

Table 1 shows the two image datasets used for the experiments. The first was collected in May 2008 and covers the city of Nihonmatsu, Japan. The second was collected in May 2006 and covers the city of Yokohama, Japan. The two IKONOS image datasets were provided by the Japan Space Imaging Corporation, Japan. The spatial resolutions of the PAN and the MS images in these datasets were 1 m and 4 m, respectively.
The original dataset contained PAN images with 1024 × 1024 pixels and MS images with 256 × 256 pixels for Nihonmatsu, and PAN images with 1792 × 1792 pixels and MS images with 448 × 448 pixels for Yokohama. To evaluate the quality of the PS images, we experimented with the test images and original images according to the Wald protocol [36]. The test images were used to evaluate the numerical image quality, and the original images were used as reference images for the numerical and visual evaluation; we regarded the original images as ground-truth images. The spatial resolution of the test PAN images was reduced from 1 to 4 m, and that of the test MS images was reduced from 4 to 16 m. Hence, the test image datasets had PAN images with 256 × 256 pixels and MS images with 64 × 64 pixels for Nihonmatsu, and PAN images with 448 × 448 pixels and MS images with 112 × 112 pixels for Yokohama. The training and test images were downsampled from the original images with bicubic spline interpolation.

Compressed Sensing
Compressed sensing (CS) is a technique used to reconstruct unknown data from a small number of observed data. In theory, the original data can be estimated when the data is sparse [28].
We consider the problem of reconstructing an image z_h ∈ ℝ^{n_0} with a higher resolution than the observed low-resolution image z_l ∈ ℝ^{m_0} (m_0 < n_0). The relationship between the high-resolution image and the low-resolution image can be expressed by Equation (1):

z_l = S H z_h, (1)

where S is a downsampling operator and H is a filter that lowers the resolution. Since the dimensionality of z_l is smaller than that of z_h, the solution cannot be uniquely determined. Based on the compressed sensing theory, the high-resolution image z_h is estimated by Equation (2) from the dictionary D_h and the sparse representation a:

z_h = D_h a. (2)

The element for reproducing the image is called an atom, d_i ∈ ℝ^{n_0}, and the set of atoms is called the dictionary, D_h ∈ ℝ^{n_0 × N_d}. Using Equation (2), Equation (1) can be expressed as Equation (3):

z_l = S H D_h a. (3)

Thus, z_h can be reproduced from z_l using the sparse representation a obtained from Equation (4), in which the sparsity constraint is added to Equation (3):

â = argmin_a ||S H D_h a − z_l||_2^2 + λ||a||_1. (4)
Equation (4) can be solved using optimization methods.
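One standard optimization method for a problem of this form is iterative soft-thresholding (ISTA) on the ℓ1-penalized least-squares objective. The following is a minimal sketch, not the solver used in the paper; the combined operator S H D_h is represented by a single matrix D, and the function name is illustrative:

```python
import numpy as np

def ista(D, z, lam=0.01, n_iter=200):
    # Iterative soft-thresholding for the L1-penalized problem
    #   min_a 0.5*||z - D a||_2^2 + lam*||a||_1,
    # the relaxed form of the sparse recovery in Equation (4).
    step = 1.0 / np.linalg.norm(D, 2) ** 2       # 1 / Lipschitz constant
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = a - step * (D.T @ (D @ a - z))       # gradient step on the LS term
        a = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # shrinkage
    return a
```

Each iteration takes a gradient step on the data-fidelity term and then shrinks small coefficients toward zero, which is what produces the sparse representation a.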

Proposed Method
Our proposed method combines the advantages of super-resolution based on the theory of compressed sensing and component substitution. In this method, the high-resolution intensity obtained by the SCMP and the high-resolution intensity obtained by the CS-based method are linearly combined using the tradeoff process, and the obtained high-resolution intensity images are fused via the GIHS method to generate PS images. The flow of the proposed method is shown in Figure 1.

Notation
In the proposed method, four features are extracted from low-resolution images, and the set of features is called the feature map. We used the gradient map proposed by Yang et al. [13] as the feature map. Let x_high_i ∈ ℝ^{p²×1} be a patch of size p × p extracted from a high-resolution image, and x_low_i ∈ ℝ^{4p²×1} be a set of four patches of size p × p extracted from the feature map; the pair (x_high_i, x_low_i) is the ith training data patch. X_high = [x_high_1, ..., x_high_{N_t}] ∈ ℝ^{p²×N_t} and X_low = [x_low_1, ..., x_low_{N_t}] ∈ ℝ^{4p²×N_t} represent the sets of patches of the high-resolution and low-resolution training data, respectively, where N_t is the number of training data.
d_high_i ∈ ℝ^{p²×1} is a high-resolution dictionary atom of size p × p, and d_low_i ∈ ℝ^{4p²×1} is a low-resolution dictionary atom of size p × p. All the atoms are arranged in raster-scan order. The high-resolution dictionary is D_high ∈ ℝ^{p²×N_d} and the low-resolution dictionary is D_low ∈ ℝ^{4p²×N_d}, where N_d is the number of atoms. Y_low ∈ ℝ^{4p²×N_patch} indicates the feature map of the low-resolution input image to be reconstructed, where N_patch is the number of patches of the input image. I_sr ∈ ℝ^{p²×N_patch} represents the reconstructed high-resolution image, and Ī_sr(j) represents the mean of the intensity values of the jth reconstructed high-resolution image patch. The sparse representations are denoted as α ∈ ℝ^{N_d×N_patch} and β ∈ ℝ^{N_d×N_t}, and λ represents the sparsity regularization parameter.

Estimation of Coefficients for Intensity Correction
The coefficients estimated by SCMP are used for the intensity correction in the proposed method. With SCMP, it is possible to obtain an intensity correction value with little spectral distortion. The coefficients are calculated by solving the minimization problem of Equation (5), where k indicates the pixel position, N is the number of pixels, and PAN_high ∈ ℝ^N is the downsampled PAN test image, whose size is the same as that of I_low obtained via bicubic interpolation.
The suffixes nir, blu, grn, and red represent the NIR, blue, green, and red components of the MS image, respectively. For MS_low_nir, MS_low_blu, MS_low_grn, and MS_low_red, the test MS images are used. I_low is calculated by Equation (6). Note that Equation (6) does not include NIR because it is the intensity of the RGB image.
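Assuming that Equation (5) amounts to a least-squares fit of a weighted sum of the low-resolution MS bands to the downsampled PAN image over all pixel positions k, the coefficient estimation can be sketched as follows; the function and variable names are illustrative, and the exact SCMP model (offsets, constraints) may differ:

```python
import numpy as np

def estimate_scmp_coeffs(pan_low, ms_bands):
    # Least-squares fit of the modeled PAN image, sum_b c_b * MS_b, to the
    # observed (downsampled) PAN image. A sketch of the minimization in
    # Equation (5) under the stated assumption.
    A = np.stack([band.ravel() for band in ms_bands], axis=1)  # (N, bands)
    coeffs, *_ = np.linalg.lstsq(A, pan_low.ravel(), rcond=None)
    return coeffs
```

Given the NIR, blue, green, and red test MS bands and the downsampled PAN image, this returns one correction coefficient per band.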

Dictionary Learning
The high-resolution dictionary and the low-resolution dictionary were constructed via Equation (7) using the corresponding pair of training images; the two problems are jointly shown as Equation (8).
where β ∈ ℝ^{N_d×N_t} represents the sparse representation, X = [X_high; X_low], and D = [D_high; D_low]. The dictionary was obtained by solving the optimization problem of Equation (9), in which regularization conditions and constraints are added to Equation (8).
where λ is the sparsity regularization parameter. The training data used for dictionary learning were the training PAN images for the high-resolution dictionary and the feature maps obtained from the corresponding low-resolution training PAN images for the low-resolution dictionary, as shown in Table 1. X_high and X_low were obtained from a high-resolution training PAN image and its corresponding low-resolution training PAN image. Given a high-resolution training PAN image, it was divided into regions of size p × p to obtain the high-resolution patches x_high_i. Given the low-resolution training PAN image, the feature map was calculated with four filters of the first derivative and the second derivative, f_1 = [-1, 0, 1], f_2 = f_1^T, f_3 = [1, 0, -2, 0, 1], and f_4 = f_3^T, where T indicates transposition. The feature map was divided into patches of p × p, and each patch was normalized. Since the feature map was calculated from the entire image, each patch contained information from its adjacent patches. X_low was obtained by arranging the normalized feature map patches in raster-scan order. The algorithm of the dictionary learning is described as follows.
(1) Obtain X_high and X_low from the training data. Each column of X = [X_high; X_low] is then normalized.
(2) Set the initial value of the dictionary D. Random numbers following a Gaussian distribution with mean 0 and variance 1 are generated and normalized for each patch region.
(3) Estimate the sparse representation β by solving the optimization problem of Equation (10) by fixing the dictionary D.
(4) Estimate the dictionary D by solving the optimization problem of Equation (11) by fixing the sparse representation β.
(5) Steps (3) and (4) are repeated (in the experiment, 40 times).
(6) The obtained dictionary D is normalized for each patch and used as the final trained dictionary.
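The alternating procedure above can be sketched as follows. This is a minimal illustration with a simple iterative soft-thresholding sparse-coding step and a least-squares dictionary update; the paper uses the solver of Lee et al. [40], so the inner solvers, names, and parameters here are assumptions:

```python
import numpy as np

def soft(v, t):
    # soft-thresholding (proximal operator of the L1 norm)
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def learn_dictionary(X, n_atoms=32, lam=0.1, n_outer=10, n_inner=50):
    # Steps (2)-(6): Gaussian random init with per-atom normalization, then
    # alternate a sparse-coding step (D fixed) with a least-squares
    # dictionary update (beta fixed), renormalizing atoms each round.
    rng = np.random.default_rng(0)
    D = rng.standard_normal((X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)                      # step (2)
    B = np.zeros((n_atoms, X.shape[1]))
    for _ in range(n_outer):                            # step (5): repeat
        step = 1.0 / np.linalg.norm(D, 2) ** 2
        B = np.zeros((n_atoms, X.shape[1]))
        for _ in range(n_inner):                        # step (3): code
            B = soft(B - step * (D.T @ (D @ B - X)), step * lam)
        D = X @ np.linalg.pinv(B)                       # step (4): update D
        norms = np.linalg.norm(D, axis=0)
        D /= np.where(norms > 0, norms, 1.0)            # step (6): normalize
    return D, B
```

With X = [X_high; X_low] as input, the learned D stacks the coupled high- and low-resolution atoms that share the sparse representation β.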

Reconstruction Process
Assuming that the low-resolution image and the high-resolution image have the same sparse representation, the sparse representation obtained from the low-resolution image can be used to reconstruct the high-resolution image. The sparse representation α was estimated by solving the optimization problem of Equation (12).
The reconstruction process is performed as follows. In this study, the resolution of the RGB image was increased.
(1) The low-resolution RGB images are upsampled via bicubic interpolation to the size of the PAN image. The upsampled low-resolution intensity I low up is calculated using Equation (6) with the upsampled RGB image.
(2) The feature map Y_low is obtained from the upsampled low-resolution intensity I_low_up. After applying the four filters shown in Section 2.6 to I_low_up, the feature map Y_low ∈ ℝ^{4p²×N_patch} is obtained, where the overlaps of adjacent patches are p × 1 and 1 × p pixels in the horizontal and vertical directions, respectively. I_low_patch ∈ ℝ^{p²×N_patch} is then obtained from I_low_up with the same overlaps of adjacent patches.
(3) The sparse representation α is calculated using Equation (12).
(4) The high-resolution intensity image I_sr is obtained from the sparse representation α and the high-resolution dictionary D_high by Equation (13), and it is normalized for each patch. The average value of the jth patch of the upsampled low-resolution intensity, I_low_patch(j), is calculated, and it is added to the patch of the high-resolution intensity I_sr(j) using Equation (14).
(5) Using the patches of the obtained high-resolution intensity Î_sr(j), the image is reconstructed. The mean value of the overlapped pixels is used as the value of the pixel in the adjacent overlapping patches.
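Step (5) can be sketched as follows; a minimal illustration of overlap-averaged patch assembly, where the raster-scan grid layout and one-pixel overlap (step = p − 1) are assumptions matching the patch settings described above:

```python
import numpy as np

def assemble_patches(patches, img_shape, p=4, step=3):
    # Place the p-by-p intensity patches back on the pixel grid in
    # raster-scan order, then take the mean over overlapped pixels.
    H, W = img_shape
    acc = np.zeros((H, W))
    cnt = np.zeros((H, W))
    k = 0
    for y in range(0, H - p + 1, step):
        for x in range(0, W - p + 1, step):
            acc[y:y + p, x:x + p] += np.asarray(patches[k]).reshape(p, p)
            cnt[y:y + p, x:x + p] += 1
            k += 1
    return acc / np.maximum(cnt, 1)   # mean of the overlapped pixels
```

Accumulating a sum image and a count image makes the overlap averaging a single element-wise division at the end.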

Tradeoff Process
The intensity of SCMP is calculated using Equation (15) after obtaining the intensity I low up via Equation (6).
where k indicates the pixel position. The low-resolution PAN image, PAN low , is obtained using Equation (16).
where I_low_up, MS_low_up,nir, MS_low_up,blu, MS_low_up,grn, and MS_low_up,red are the intensity of the low-resolution RGB image and the NIR, blue, green, and red bands, respectively, all upsampled to the same size as the PAN image.
If the PAN image is corrected using the intensity I_scmp obtained via SCMP, there is little loss of spatial information. Since the intensity correction is performed appropriately on the image, the spectral distortion of SCMP is smaller than that of the other component substitution methods. Figure 2 shows the intensity of the original RGB image, the intensity image generated by SCMP, and the intensity image obtained by CS for the Nihonmatsu and Yokohama images. From this figure, it can be seen that the intensity obtained by CS had less spatial information than that obtained by SCMP. In order to increase the quality of the intensity reproduced by CS using SCMP, the intensities obtained by the CS and the SCMP were combined linearly using the tradeoff parameter τ. In the tradeoff process, the high-resolution intensity images obtained by SCMP and CS were linearly combined by Equation (17).

Generalized IHS Method
For the fusion process, the generalized IHS method (GIHS) [5] proposed by Tu et al. was used. The intensity was calculated using Equation (6), and Equation (18) was applied to red-, green-, and blue-band RGB images.

where MS_high_red, MS_high_grn, MS_high_blu, MS_low_red, MS_low_grn, and MS_low_blu are the red, green, and blue bands of the high-resolution and low-resolution RGB images, respectively.
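The GIHS fusion of Equation (18) amounts to adding the intensity difference to every band. A minimal sketch, assuming the low-resolution intensity is the mean of the RGB bands as in the RGB intensity of Equation (6); names are illustrative:

```python
import numpy as np

def gihs_fuse(ms_up, i_high):
    # GIHS-style fusion: inject the intensity detail into each band.
    # ms_up: (H, W, B) upsampled low-resolution bands; i_high: (H, W)
    # fused high-resolution intensity.
    i_low = ms_up.mean(axis=2)              # assumed intensity definition
    return ms_up + (i_high - i_low)[..., None]
```

Because the same detail term is added to all bands, the band ratios of the upsampled MS image are largely preserved, which is why GIHS is a convenient final fusion step.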

Quality Evaluation
Visual and numerical evaluations were performed according to the Wald protocol [36]. The original MS image was used as the reference image. The correlation coefficient (CC), universal image quality index (UIQI) [37], erreur relative globale adimensionnelle de synthèse (ERGAS) [38], and spectral angle mapper (SAM) [39] were used for the numerical evaluation. These are major evaluation criteria used in almost all PS-related research [2]. The CC is given by Equation (19), where a value closer to 1.0 implies a smaller loss of the intensity correlation and a better result. N and |B| are the total number of pixels in the entire image for each band and the number of bands in the PS image, respectively. O_b(i) and Ō_b denote the ith pixel value of the b-band reference image and its mean value, respectively, and PS_b(i) and P̄S_b denote the ith pixel value of the b-band PS image and its mean value, respectively. The UIQI is an index for measuring the loss of intensity correlation, intensity distortion, and contrast distortion and is given by Equation (20), where σ_O_b and σ_PS_b are the standard deviations of the reference and PS images in the b-band, respectively, and σ_O_b,PS_b denotes their covariance. A value closer to 1.0 implies that these losses are small. The size of the UIQI sliding window was 8 × 8.
The ERGAS is given by Equation (21), where h and l denote the spatial resolutions of the PAN and MS images, respectively. The smaller the ERGAS value, the better the image quality. The SAM is an index for measuring spectral distortion and is given by Equation (22). If the value is closer to 0.0, the spectrum ratio of each band is closer to that of the reference image.
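The metrics above can be sketched as follows. This follows the standard definitions of CC, ERGAS, and SAM (UIQI is omitted for brevity); h/l = 1/4 corresponds to the IKONOS PAN/MS resolution ratio, and the function names are illustrative:

```python
import numpy as np

def cc(ref, ps):
    # Correlation coefficient, averaged over bands; inputs are (H, W, B).
    return np.mean([np.corrcoef(ref[..., b].ravel(), ps[..., b].ravel())[0, 1]
                    for b in range(ref.shape[-1])])

def ergas(ref, ps, h_over_l=0.25):
    # ERGAS = 100*(h/l)*sqrt(mean_b(RMSE_b^2 / mean(O_b)^2)); smaller is better.
    terms = [np.mean((ps[..., b] - ref[..., b]) ** 2) / np.mean(ref[..., b]) ** 2
             for b in range(ref.shape[-1])]
    return 100.0 * h_over_l * np.sqrt(np.mean(terms))

def sam(ref, ps):
    # Spectral angle mapper, averaged over pixels (radians); closer to 0 is better.
    r = ref.reshape(-1, ref.shape[-1])
    p = ps.reshape(-1, ps.shape[-1])
    cosang = np.sum(r * p, axis=1) / (np.linalg.norm(r, axis=1)
                                      * np.linalg.norm(p, axis=1) + 1e-12)
    return float(np.mean(np.arccos(np.clip(cosang, -1.0, 1.0))))
```

For a PS image identical to the reference, CC is 1.0 and both ERGAS and SAM are (numerically) 0.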

Dictionary Learning
We used PAN images from the Nihonmatsu and Yokohama datasets as training images for dictionary learning. The number of training iterations was 40, the number of atoms of the dictionary D was 1024, and the size of each atom was p = 4. The number of patches was 6433 (Nihonmatsu) and 10,000 (Yokohama). The training dataset was a set of 1000 patches randomly selected from these patches. As the training image X, a training PAN image for the high-resolution dictionary was used as the high-resolution data X_high, and a training PAN image for the low-resolution dictionary was used as the low-resolution data X_low. The low-resolution image was generated by downsampling and upsampling via bicubic interpolation. The sparsity regularization parameter was λ = 0.1. The sparse representation β was calculated using Equation (10) by solving the L1-norm regularized least-squares problem, and the dictionary D was obtained using Equation (11) by solving the least-squares problem with quadratic constraints. We used the code provided by Lee et al. [40]. The learned high-resolution dictionaries are shown in Figure 3. For the Nihonmatsu images, the dictionary created using the Nihonmatsu images was used; for the Yokohama images, the dictionary created using the Yokohama images was used.

Reconstruction Process
Bicubic interpolation was used to upsample the intensity I_low of the low-resolution RGB image. The size of the patches of the input low-resolution data Y_low was p = 4, and the overlapping regions of adjacent patches were p × 1 pixels in the horizontal direction and 1 × p pixels in the vertical direction. The filter size used for the back-projection process was 5 × 5. The sparse representation α was calculated using Equation (12) by solving the L1-norm regularized least-squares problem. We used the code provided by Lee et al. [40].

Coefficients for Intensity Correction
The correction coefficients estimated by SCMP are shown in Table 2. These values were used for the fusion process.

Tradeoff Parameter
In the tradeoff process, the intensity images obtained by SCMP and CS were combined by Equation (17) using the tradeoff parameter τ (0 ≤ τ ≤ 1). The tradeoff parameter is a hyperparameter that should be determined in advance. In general, experimental validation is carried out via cross-validation when there are hyperparameters. Since we used two kinds of satellite images (Nihonmatsu and Yokohama), the value of τ was determined using one of the datasets, and the other dataset was processed using the determined value. In our experiment, the tradeoff parameter was determined using the correlation coefficient (CC) and ERGAS [38]. The results of applying the tradeoff parameter with a step size of 0.1 for image quality evaluation are shown in Table 3. From this result, it was found that the resolution and numerical evaluation were improved when the sum of squares of 1 − CC and ERGAS was the smallest. This relationship is shown in Equations (23) and (24).
The tradeoff parameter was determined as the value that minimizes Equation (24). Figure 4 displays the results of evaluating the intensity of the RGB images with various tradeoff parameters. The red line shows Nihonmatsu, the blue line shows Yokohama, the solid lines are the results of CC, and the dotted lines are the results of ERGAS. Figure 5 shows three images with different tradeoff parameter values. The lower the evaluation index S, the better the visual appearance. From these results, the value that minimized S(τ) was τ = 0.4 for Nihonmatsu and τ = 0.3 for Yokohama. Therefore, following the cross-validation scheme described above, τ = 0.3 (determined on the Yokohama dataset) was used for Nihonmatsu, and τ = 0.4 (determined on the Nihonmatsu dataset) was used for Yokohama in the following experiments.

Table 4 shows the effect of the back-projection (BP). The results of the intensity I_sr obtained using the sparse representation (SR) and the intensity obtained by repeating BP ten times with a 5 × 5 filter are shown. These were compared by CC, UIQI, and ERGAS, and applying BP was better in all cases. Table 5 shows the effect of the tradeoff process (TP). The intensity images generated by SCMP, SR, and TP are shown. TP was the best in all evaluations. Table 6 shows the comparison of BP and TP. TP was better in every case. Tables 7 and 8 show numerical evaluations of the existing methods and the proposed method.
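The tradeoff-parameter selection can be sketched as a grid search over τ that minimizes S(τ) = (1 − CC)² + ERGAS². This is an illustrative single-band sketch; the direction of τ in the linear combination of Equation (17) and the ERGAS resolution ratio (h/l = 1/4 for IKONOS) are assumptions:

```python
import numpy as np

def select_tau(i_scmp, i_sr, i_ref, taus=None):
    # Grid search (step 0.1) for the tradeoff parameter: fuse the SCMP and
    # CS intensities and pick the tau minimizing S = (1 - CC)^2 + ERGAS^2.
    if taus is None:
        taus = np.arange(0.0, 1.01, 0.1)
    def cc(a, b):
        return np.corrcoef(a.ravel(), b.ravel())[0, 1]
    def ergas(a, ref, h_over_l=0.25):
        return 100.0 * h_over_l * np.sqrt(np.mean((a - ref) ** 2)
                                          / np.mean(ref) ** 2)
    best_tau, best_s = None, np.inf
    for tau in taus:
        fused = tau * i_scmp + (1.0 - tau) * i_sr     # Equation (17) form
        s = (1.0 - cc(fused, i_ref)) ** 2 + ergas(fused, i_ref) ** 2
        if s < best_s:
            best_tau, best_s = float(tau), s
    return best_tau
```

In the cross-validation scheme described above, the τ selected on one dataset would then be applied to the other dataset.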
The existing methods include fast IHS [5], the Gram-Schmidt method (GS) [7], band-dependent spatial detail (BDSD) [41], the weighted least-squares (WLS)-filter-based method (WLS) [42], multiband images with adaptive spectral-intensity modulation (MDSIm) [43], spectrum correction using modeled panchromatic image (SCMP) [35], image super-resolution via sparse representation (ISSR) [13] using natural images (Dict-natural) and corresponding images (Dict-self), the sparse representation of injected details (SR-D) [30], and the sparse-representation method described by Ayas et al. (SRayas) [34].

Table 4. Numerical evaluation of image intensities with or without the back-projection process of the intensity obtained via sparse representation. The best values are given in bold.

The size of the local estimation of the distinct block of BDSD was 256 × 256 for Nihonmatsu and 448 × 448 for Yokohama. For the ISSR settings, the training images for the dictionary included training PAN images and natural images, the number of training iterations was 40, the number of atoms in the dictionary was 1024, the atom size of the dictionary was 4 × 4 × 5, the sparsity regularization parameter was 0.1, 1000 randomly selected training image patches were used, the upscale factor was 4 (the ratio of the resolutions of the PAN and MS images of IKONOS), the overlaps of the patches in the reconstruction process were 1 × p pixels in the horizontal direction and p × 1 pixels in the vertical direction, the size of the back-projection filter was 5 × 5, and the number of iterations was 10. For SR-D, high-resolution and low-resolution dictionaries were constructed from the original PAN images without training.
The atom size of the high-resolution dictionary was 28 × 28 × 4 and the overlapping areas of the adjacent atoms were 16 × p and p × 16 pixels; the atom size of the low-resolution dictionary was 7 × 7 × 4 and the overlapping areas of the adjacent atoms were 4 × p and p × 4 pixels. For the SRayas setting, the original IKONOS MS images were used as the training images for the dictionary, the number of training iterations was 20, the number of atoms in the dictionary was 4096, the size of the dictionary atom was 8 × 8 × 4, the sparsity regularization parameter was λ = 0.15, the number of training image patches was 2000, the upscale factor was 4 (the ratio of the resolutions of the PAN and MS images of IKONOS), β = 0.25, the weight of each spectral band of IKONOS was w = [0.1071, 0.2646, 0.2696, 0.3587], the patch overlap in the reconstruction process was 0, the back-projection filter size was 3 × 3, and the number of repetitions was 20.
In Tables 7 and 8, the highest scores are printed in bold, and the second-highest scores are underlined. GS was generally not good except for the SAM of Yokohama. The results of BDSD were unremarkable but stable. SCMP was stable and gave good results. For ISSR, differences in the training images had little effect on the results. SR-D did not perform as well overall as ISSR and SRayas. Although some results, such as the CC of SR-D in Table 7, MDSIm in Table 8, and the SAM of GS, were partly better than those of the proposed method, in many other cases they were less accurate. The results of the proposed method were generally good, although there were some differences due to the tradeoff parameter. Figure 6 shows the ranking of the quality metrics of the numerical evaluations in Tables 7 and 8. For each test, the best result was worth three points, the second-best result two points, and the third-best result one point; the highest possible score was 24 points. This figure shows that only three methods, WLS, SCMP, and the proposed method, performed well on both images, and the proposed method obtained the highest score.
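The three-two-one scoring used for the ranking in Figure 6 can be written compactly as follows; the method names and metric values in the example are hypothetical placeholders, not the actual evaluation results.

```python
def ranking_points(results, higher_is_better=True):
    """Award 3/2/1 points to the top three methods of one test.

    results: dict mapping method name -> metric value for a single test.
    Returns a dict mapping method name -> points earned in this test.
    """
    ranked = sorted(results, key=results.get, reverse=higher_is_better)
    return {m: pts for m, pts in zip(ranked[:3], (3, 2, 1))}

# Hypothetical example: one correlation-coefficient test over four methods.
scores = {"GS": 0.91, "WLS": 0.95, "SCMP": 0.96, "proposed": 0.97}
print(ranking_points(scores))  # → {'proposed': 3, 'SCMP': 2, 'WLS': 1}
```

Summing these per-test points over all eight tests (two images, four metrics) yields the 24-point maximum stated above.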

Appl. Sci. 2020, 10, x FOR PEER REVIEW

In Figure 7, the green color in GS (f) was darker in the rice field area (indicated by the green arrow), while the forest area was whitish. BDSD (g) was more blurred than the other images. In MDSIm (i), the color of the forest area was also whitish (indicated by the yellow arrow). In Figure 8, the vegetation area was whitish in GS (f) and MDSIm (i) (indicated by the red arrow). WLS (h) looked hazy. SR-D (m) had lower resolution than the other methods using sparse representation with back-projection. In both Figures 7 and 8, the results of ISSR (k), (l) and SRayas (n) had ringing artifacts. The other images appeared to be reproduced without problems.

Discussion
From the results in Table 4, it was found that back-projection (BP) is effective in reducing spectral distortion. From the results in Table 5, it was found that the tradeoff process (TP) is also effective in reducing spectral distortion. Furthermore, since the TP was better than the individual methods of sparse representation (SR) and SCMP, it is clear that these methods complement each other in reducing spectral distortion. From the results in Table 6, it was found that the TP reduced spectral distortion more than BP. In the results shown in Tables 7 and 8, the methods based on GIHS (WLS and SCMP) were better than the existing methods based on SR. In addition, the proposed method, a linear combination of SR and SCMP, gave better results than SCMP alone. One of the problems to be solved in PS processing is independence from the processed image; in other words, it is important to obtain stable, good results rather than good results only on a specific image. Although some methods in Tables 7 and 8 gave better results than the proposed method on individual tests, comparing the other evaluation results shows that their results were inconsistent. The reason the evaluation results varied so widely could be that these methods depend on the processed image.
As shown in Figures 7 and 8, the reproduction of the vegetation areas by GS was unstable. Since the image quality of BDSD differs depending on the size of the local estimation on distinct blocks, we evaluated visibly good images with good numerical values, but the resolution of the resulting image was low. WLS gave good numerical results for both images, but the images were blurred. Both ISSR and SRayas, which use BP, generated ringing artifacts. The other images appeared to be reproduced without problems in resolution or color. Among all methods, the proposed method gave the best results in the numerical evaluation.
In this method, resolution enhancement is achieved by using the visible and NIR regions. On the other hand, the method can be applied only to the resolution enhancement of RGB images, not to that of NIR images.

Conclusions
In this paper, we proposed a method for pansharpening based on CS theory. In the proposed method, the intensity obtained from the component substitution method and the intensity obtained via the method based on CS theory are fused to reproduce an intensity close to the original. We introduced SCMP as the component substitution method and used the tradeoff process for image fusion. Experimental results showed that the proposed method outperformed existing methods in terms of numerical and visual evaluation. The proposed method was also effective for satellites like IKONOS, whose panchromatic sensor observes the visible and NIR regions and whose multispectral sensor observes the red, blue, green, and NIR bands.
Generally, the intensity image generated by a CS-based method is blurrier than the intensity image generated by the component substitution method, because component substitution captures the intensity of the PAN image directly. On the other hand, the spectral distortion of the intensity image generated by the CS-based method is expected to be lower than that of the image generated by the component substitution method. Since complete restoration is not guaranteed, back-projection methods can be used to bring the image closer to a complete reproduction; however, they may cause ringing artifacts. Based on these considerations, our proposed method combines the intensities generated by the CS-based method and the component-substitution-based method via the tradeoff process instead of back-projection, achieving both an improvement in spatial resolution and a reduction in spectral distortion. Experimental results showed that the tradeoff process was more effective than back-projection in generating a pansharpened image whose spatial resolution is equivalent to that of the PAN image while reducing spectral distortion. Improving accuracy through parameter tuning is important future work.
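Since the proposed method is described as a linear combination of the SR and SCMP intensities, the tradeoff process can be sketched as a convex combination of the two estimates. The function name and the weight `alpha` below are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def tradeoff_intensity(i_cs, i_scmp, alpha=0.5):
    """Fuse the CS-based intensity (low spectral distortion but blurry)
    with the SCMP intensity (sharp, derived from the PAN image) by a
    convex combination controlled by the tradeoff parameter alpha."""
    assert i_cs.shape == i_scmp.shape
    return alpha * i_scmp + (1.0 - alpha) * i_cs
```

At alpha = 1 the fusion reduces to the SCMP intensity alone, and at alpha = 0 to the CS-based intensity alone; intermediate values trade spatial sharpness against spectral fidelity, consistent with the parameter sensitivity noted in the results.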

Acknowledgments:
The authors thank the Japan Space Imaging Corporation and Space Imaging, LLC for providing the images.

Conflicts of Interest:
The authors declare no conflict of interest.