An IHS-Based Pan-Sharpening Method for Spectral Fidelity Improvement Using Ripplet Transform and Compressed Sensing

Pan-sharpening aims at integrating spectral information from a multi-spectral (MS) image and spatial information from a panchromatic (PAN) image in a fused image with both high spectral and spatial resolutions. Numerous pan-sharpening methods are based on intensity-hue-saturation (IHS) transform, which may cause evident spectral distortion. To address this problem, an IHS-based pan-sharpening method using ripplet transform and compressed sensing is proposed. Firstly, the IHS transform is applied to the MS image to separate intensity components. Secondly, discrete ripplet transform (DRT) is implemented on the intensity component and the PAN image to obtain multi-scale sub-images. High-frequency sub-images are fused by a local variance algorithm and, for low-frequency sub-images, compressed sensing is introduced for the reconstruction of the intensity component so as to integrate the local information from both the intensity component and the PAN image. The specific fusion rule is defined by local difference. Finally, the inverse ripplet transform and inverse IHS transform are coupled to generate the pan-sharpened image. The proposed method is compared with five state-of-the-art pan-sharpening methods and also the Gram-Schmidt (GS) method through visual and quantitative analysis of WorldView-2, Pleiades and Triplesat datasets. The experimental results reveal that the proposed method achieves relatively higher spatial resolution and more desirable spectral fidelity.


Introduction
Remote sensing (RS) is a general approach for extracting knowledge of the Earth's surface structure and content by acquiring and interpreting spectral characteristics from a great distance [1]. However, the well-known trade-off between spatial resolution and spectral resolution has always precluded the further application of RS products. Pan-sharpening is a desirable solution to this dilemma. In fact, most Earth observation satellites, such as IKONOS, QuickBird, and the WorldView family (including WorldView-2/3/4), can only provide a panchromatic (PAN) image with high spatial resolution but low spectral resolution, together with a multi-spectral (MS) image with low spatial resolution but high spectral resolution [2]. Pan-sharpening, a special case of image fusion, is capable of obtaining an image with both high spatial and high spectral resolutions. In other words, the technique provides a fused image with the same spectral response as the MS image and the spatial resolution of the PAN image. In some sparse-representation-based methods, dictionaries are directly extracted from each spectral channel of an MS image, and the dictionary construction takes the correlation between PAN and MS images into consideration to transform pan-sharpening into image restoration.
However, most IHS-based algorithms utilize MRA transforms to improve spectral fidelity without the aid of a compressed sensing technique, which limits the local information preservation of the pan-sharpened image [16,18,45,46]. A few methods utilize both MRA tools and compressed sensing, e.g., the method based on wavelet transform and compressed sensing proposed by Cheng et al. [47]. Nevertheless, most existing studies possess intrinsic limitations in the extraction and reconstruction of spatial details using conventional MRA tools. Conventional MRA tools (e.g., the discrete wavelet transform and the à trous wavelet transform) have limited capability to resolve one-dimensional (1D) singularities or two-dimensional (2D) isotropic singularities. Hence, the pan-sharpened images usually suffer from the Gibbs phenomenon. Furthermore, the curvelet transform has been suggested to represent 2D singularities more efficiently; however, its scaling rule for achieving anisotropic directionality has not yet been thoroughly demonstrated.
In recent years, the shift-invariant ripplet transform (RT) [48] has been proposed and extensively adopted for feature extraction. RT is able to represent 2D singularities (e.g., edges and textures) more efficiently at multiple scales and in multiple directions along a family of curves. Several studies [23,48,49] demonstrate that RT is superior to wavelet- and curvelet-based MRA transforms in terms of the effectiveness and efficiency of edge and texture extraction and reconstruction. In view of this, an IHS-based pan-sharpening method combining the high spectral fidelity of the discrete ripplet transform (DRT) [49] and the desirable local information preservation of compressed sensing [42,50] is proposed in this paper. Specifically, the IHS transform is first performed on the MS image to obtain the intensity component. Then, smoothing filter-based intensity modulation (SFIM) [51] is introduced to modulate the spatial information contained in the intensity component without altering its spectral characteristics. The ripplet transform [23] is implemented on the intensity component and the PAN image to extract high-frequency (HF) and low-frequency (LF) sub-images. Subsequently, the compressed sensing technique is applied to reconstruct the LF sub-images. Different fusion rules are adopted for the HF and LF sub-images according to their specific characteristics: the local variance algorithm [39] and the local difference weighted algorithm are selected as the fusion rules for the HF and LF coefficients, respectively. Finally, the pan-sharpened image is obtained by inverse ripplet transform and inverse IHS transform. Experiments on WorldView-2, Pleiades and Triplesat datasets corroborate that the proposed method ensures superior spectral fidelity compared to several state-of-the-art approaches.
The remainder of the paper is structured as follows. Section 2 briefly reviews the related work. The proposed method is introduced in Section 3. The experimental results on multiple RS images, together with a discussion, are provided in Section 4. Section 5 concludes the paper.

Related Work
The relevant techniques utilized in pan-sharpening are briefly reviewed here. Section 2.1 discusses the principle of the IHS-based pan-sharpening method and the cause of spectral distortion. Section 2.2 introduces the discrete ripplet transform. Sparse representation of images and compressed sensing theory are briefly illustrated in Section 2.3.

IHS Transform and Spectral Distortion
The framework of IHS and IHS-like pan-sharpening methods was first presented in [12] and thoroughly illustrated in [52] as:

M_F^b = M_UP^b + G^b · (P_HM − I), b = 1, 2, ..., B,

where M_F is the pan-sharpened image, M denotes the original MS image, I is the intensity component separated by IHS transform, G represents the vector modulating the gains of spatial detail injection, B is the number of bands in the MS image, and b denotes the bth band of the pan-sharpened or MS image. The subscript "UP" denotes the up-sampling process. In practice, the up-sampled image is produced by cubic convolution interpolation for operational convenience. It is noteworthy that this process can be implemented more properly by integrating up-sampling with a low-pass filter and the modulation transfer function (MTF) [53,54]. P_HM is the histogram-matched PAN image. The up-sampled I component is generated by:

I = Σ_{b=1}^{B} w_b · M_UP^b,

where w is the weight vector, which can be empirically set as w_b = 1/B [13] or experimentally chosen to weigh the spectral overlap between the PAN and MS images [24].
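In code, the intensity computation and the detail-injection framework above can be sketched as follows (a minimal NumPy sketch under assumed conventions: bands-first arrays, equal weights w_b = 1/B, and unit injection gains G^b = 1; real implementations tune w and G per sensor):

```python
import numpy as np

def intensity_component(ms_up, weights=None):
    """I = sum_b w_b * M_UP^b; ms_up has shape (B, H, W)."""
    b = ms_up.shape[0]
    if weights is None:
        weights = np.full(b, 1.0 / b)  # empirical choice w_b = 1/B
    return np.tensordot(weights, ms_up, axes=1)  # shape (H, W)

def ihs_pansharpen(ms_up, pan_hm, gains=None):
    """Generalized IHS fusion: M_F^b = M_UP^b + G^b * (P_HM - I)."""
    intensity = intensity_component(ms_up)
    if gains is None:
        gains = np.ones(ms_up.shape[0])  # assumed unit gains
    detail = pan_hm - intensity          # spatial details to inject
    return ms_up + gains[:, None, None] * detail
```

Note that when P_HM equals I, no detail is injected and the MS image passes through unchanged, which mirrors the spectral-distortion argument developed next.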
According to reference [12], in the inverse IHS transform for obtaining a pan-sharpened image, the input hue (H) and saturation (S) components remain unchanged, while the input intensity (I) component is essentially replaced by P_HM. The intensity variation δ that occurs in IHS-based pan-sharpening can therefore be measured as:

δ = P_HM − I.

Due to the sensor type, local dissimilarities, and spectral mismatch between the PAN and MS images [12,29,30], δ is not equal to zero. Furthermore, the variation of I leads to a distorted S component in the pan-sharpened image. Thus, under the prerequisite of instrumentally and temporally identical pan-sharpening, spectral distortion can be mitigated by decreasing δ, i.e., by reconstructing a new intensity component that combines the spectral characteristics of I and the PAN image. The proposed method is developed based on this concept. Remarkably, maximizing the image quality score index Q4 [52] is another ideal approach to distortion minimization, which has proved effective in extensive practice.

Discrete Ripplet Transform (DRT)
Ripplet transform [49] is proposed to address the problem that 2-D singularities along arbitrarily shaped curves cannot be resolved by conventional wavelet transform. RT generalizes the curvelet by introducing two parameters, i.e., support, c, and degree, d. The two parameters empower RT with the anisotropic capability to represent singularities along arbitrarily shaped curves.
The discretized representation of RT (i.e., DRT) is based on the discretization of the ripplet parameters, similar to the discrete curvelet transform [55]. In references [48,49], DRT is defined at scale a_j = 2^(−j), orientation θ_l = (2π/c) · 2^(−⌊j(1−1/d)⌋) · l, and position b_k = [c · 2^(−j) · k_1, 2^(−j/d) · k_2]^T, where k = [k_1, k_2]^T, j, k_1, k_2, l ∈ Z, d ∈ R, m, n ∈ N, and d = n/m. In the frequency domain, the frequency response of the ripplet function can be represented as:

ρ̂_j(r, ω) = (1/√c) · a^((1+1/d)/2) · W(a · r) · V((a^(1/d)/c) · ω − l),

where the radial window W and the angular window V satisfy the following admissible conditions for given c, d and j:

Σ_j |W(2^(−j) · r)|² = 1,  Σ_l |V((a^(1/d)/c) · ω − l)|² = 1.

The "wedge" corresponding to the ripplet function in the frequency domain is:

H_{j,l}(r, θ) = {(r, θ) : 2^j ≤ r ≤ 2^(2j), |θ − (π/c) · 2^(−⌊j(1−1/d)⌋) · l| ≤ (π/2) · 2^(−j)}.

The DRT of an M × N image f(n_1, n_2) takes the form:

R_{j,k,l} = Σ_{n_1=0}^{M−1} Σ_{n_2=0}^{N−1} f(n_1, n_2) · ρ̄_{j,k,l}(n_1, n_2),

where R_{j,k,l} are the ripplet coefficients and ρ̄ denotes the complex conjugate of the ripplet function. The image can be reconstructed through the inverse discrete ripplet transform (IDRT):

f̃(n_1, n_2) = Σ_j Σ_k Σ_l R_{j,k,l} · ρ_{j,k,l}(n_1, n_2).

Compressed Sensing and Sparse Representation
The basic assumption of sparse representation (SR) of images is that each original image can be represented as a linear combination of a small number of atoms from a specific dictionary [42,56,57]. The main information, including the local spatial features and internal structure of original images, can be concentrated in the atoms using SR. Assuming that an original image patch s has a consistent structure, the sparse representation α of the patch over a specific dictionary D can be modeled as:

α̂ = argmin_α ‖α‖_0  subject to  s = Dα,

where ‖·‖_d denotes the L_d-norm and ‖α‖_0 is the number of non-zero components in α. In practice, directed by compressed sensing theory [42,50], we take a few measurements y of s using a decimation matrix M as:

y = Ms,

where M ∈ R^(n×k), k is the number of atoms in the dictionary, and n ≪ k. Combining the sparse representation of images with compressed sensing [27], the sparsest α ensures that the image patch s can be recovered from y through the optimization process:

α̂ = argmin_α ‖α‖_0  subject to  ‖y − MDα‖_2 ≤ ε,   (12)

where s = Dα and ε is a fairly small constant that denotes the error tolerance. The optimization of Equation (12) is a Non-deterministic Polynomial (NP)-hard problem [26,27]. According to references [26,58], Equation (12) can be converted into the L_1-norm minimization problem:

α̂ = argmin_α ‖α‖_1  subject to  ‖y − MDα‖_2 ≤ ε.   (13)

The basis pursuit (BP) algorithm [59] is the best-known and most commonly used optimization method for solving Equation (13). Image patch reconstruction using the BP algorithm can be represented in the standard linear programming form:

min c^T θ  subject to  Aθ = b, θ ≥ 0,   (14)

where θ = (α⁺, α⁻), c = (1; 1), A = [MD, −MD], b = y, and α⁺ and α⁻ hold the positive and negative entries of α, respectively. The solution α̂ of Equation (12) is obtained as α̂ = α⁺ − α⁻, and the sparsely reconstructed image is obtained as ŝ = Dα̂.
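To make the recovery step concrete, the sketch below recovers a sparse α from decimated measurements y ≈ MDα. Basis pursuit proper solves the linear program in Equation (14); as a lighter-weight illustration, this sketch uses greedy orthogonal matching pursuit, which is a stand-in for BP, not the solver the paper uses:

```python
import numpy as np

def omp(A, y, sparsity, tol=1e-8):
    """Greedily pick atoms of A until y is approximated: y ~ A @ alpha."""
    residual = y.astype(float).copy()
    support, alpha = [], np.zeros(A.shape[1])
    coef = np.zeros(0)
    for _ in range(sparsity):
        idx = int(np.argmax(np.abs(A.T @ residual)))  # most correlated atom
        if idx not in support:
            support.append(idx)
        # Re-fit coefficients on the selected atoms by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break
    alpha[support] = coef
    return alpha

# Toy compressed-sensing setup: dictionary D, Gaussian decimation M.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))  # 64-dim patches, 128 atoms
M = rng.standard_normal((32, 64))   # n = 32 measurements, n << k
A = M @ D                           # effective sensing matrix MD
```

The effective sensing matrix A = MD plays the same role here as in Equations (12)-(14); swapping `omp` for a true BP solver changes only the optimization step.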

Proposed Method
Based on the concept introduced in Section 2.1, the proposed method focuses on reducing the spectral distortion δ by obtaining a reconstructed I component with spatial details and spectral characteristics from the PAN and MS images, respectively. We first perform SFIM on the I component and the PAN image, and then utilize DRT to obtain HF and LF sub-images of the I component. The local variance algorithm is adopted as the fusion rule for the HF coefficients, which contain spatial information such as textures and edges. The LF coefficients are reconstructed using a compressed sensing technique and subsequently fused according to local differences.

Spectral Fidelity Improvement of Intensity Component Using SFIM
In the proposed method, SFIM is performed on the PAN image and the I component to generate the new I component (I_SFIM) with modulated spatial details while the spectral diversity is maintained. SFIM has two advantages which encourage its adoption for improving the spectral fidelity of I_SFIM. On one hand, the smoothing filter kernel of SFIM reduces the sensitivity to co-registration accuracy. On the other hand, SFIM can be performed on individual color components. The smoothing filter kernel is an averaging function that computes a local mean at each pixel of the high-resolution PAN image. A smoothing filter kernel of size s × s can be represented as:

K = (1/s²) · [1]_{s×s},

i.e., an s × s matrix whose entries are all 1/s². SFIM can then be represented as:

I_SFIM(x, y) = I(x, y) · P(x, y) / P_Mean(x, y),

where I(x, y) is a pixel of the I component, P(x, y) is the corresponding pixel in the PAN image, and P_Mean is the smoothed mean image of the PAN image. The size of the kernel depends on the ratio of resolution between the PAN and MS images, which generally equals 4 for most high-resolution satellite images [51], e.g., WorldView-2, Pleiades-1A, or Triplesat (Table 1). The kernel size of the smoothing filter is not constant but varies with the experimental data. The optimization of the SFIM kernel size is experimentally discussed in Section 4.3.1.
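The kernel and modulation equations above can be sketched as follows (a minimal sketch; edge padding at image borders and the small epsilon guard are implementation assumptions, since SFIM itself only specifies the s × s averaging):

```python
import numpy as np

def box_mean(img, s):
    """Local mean with an s x s averaging kernel (edge-padded borders)."""
    pad = s // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(s):          # sum the s*s shifted copies
        for dx in range(s):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (s * s)

def sfim(intensity, pan, s=5, eps=1e-12):
    """I_SFIM(x, y) = I(x, y) * P(x, y) / P_Mean(x, y)."""
    p_mean = box_mean(pan, s)
    return intensity * pan / (p_mean + eps)  # eps avoids division by zero
```

Where the PAN image is locally flat, P/P_Mean ≈ 1 and the intensity passes through unchanged, which is exactly the spectral-preservation property SFIM is chosen for.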

Intensity Component Reconstruction Using DRT and Compressed Sensing
DRT and compressed sensing are combined to obtain a reconstructed I component (I R ) so as to adequately integrate the local spatial details and spectral information of the I component and the PAN image while minimizing the local dissimilarities and spectral mismatch between the two [12,29,30]. Specifically, DRT is advantageous to preserve the spectral features and spatial structures of source images. Compressed sensing is effective in extracting local spatial information by using image pairs [47]. In this study, two-level DRT with c = 1 and d = 4 is applied to obtain the HF and LF sub-images of the I component and the PAN image; note that c = 1 and d = 4 are shown to be sufficiently high for DRT to capture 2-D anisotropic details in high spatial resolution satellite images, such as IKONOS and QuickBird [23].

High-Frequency Sub-Images Fusion
The HF sub-images decomposed by DRT contain sufficient spatial details of the original images. The local variation of wavelet energy can represent the abundance of local information contained in the original images. The local variance of wavelet energy is utilized to fuse the high-frequency sub-images of the I component (HFI) and the high-frequency sub-images of the PAN image (HFP) without losing spatial details or distorting spectral characteristics [60][61][62]. Following the set-up in [61], we define the local region Q in the image with size 3 × 3, whose centre is (x, y). For every region Q in HFI or HFP, the local variance is defined as:

V(x, y) = (1/|Q|) · Σ_{(i,j)∈Q} [H(i, j) − H̄]²,

where V is the local variance value of the pixel, H̄ is the mean value of the specific local region in HFI or HFP, and H(x, y) is the central pixel value. The fused high-frequency sub-images (HFF) can be obtained as:

HFF(x, y) = HFI(x, y), if V_HFI(x,y) > V_HFP(x,y);
HFF(x, y) = HFP(x, y), if V_HFI(x,y) ≤ V_HFP(x,y),

where V_HFI(x,y) and V_HFP(x,y) are the local variance values of the specific pixels HFI(x, y) and HFP(x, y), respectively.
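A direct, if unoptimized, rendering of this pixel-wise selection rule (3 × 3 regions, clipped at image borders, which is an assumption about boundary handling):

```python
import numpy as np

def local_variance(img, x, y, r=1):
    """Variance over the (2r+1) x (2r+1) region Q centred at (x, y)."""
    h, w = img.shape
    q = img[max(0, x - r):min(h, x + r + 1),
            max(0, y - r):min(w, y + r + 1)]
    return float(np.var(q))

def fuse_hf(hfi, hfp):
    """HFF(x,y) = HFI(x,y) if V_HFI > V_HFP, else HFP(x,y)."""
    hff = np.empty_like(hfi, dtype=float)
    h, w = hfi.shape
    for x in range(h):
        for y in range(w):
            if local_variance(hfi, x, y) > local_variance(hfp, x, y):
                hff[x, y] = hfi[x, y]
            else:
                hff[x, y] = hfp[x, y]
    return hff
```

In production the per-pixel variances would be computed with a sliding-window filter rather than two explicit loops; the loop form is kept here to match the rule one-to-one.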

Low-Frequency Sub-Images Fusion
The low-frequency sub-images of the I component (LFI) and the PAN image (LFP) decomposed by DRT contain the majority of the energy and spectral information of the I component and the PAN image, which are essential for pan-sharpening. Sparse representation and compressed sensing are introduced to get the representation of low-frequency sub-images efficiently and precisely. Predefined dictionaries are usually adopted to reconstruct images in multiple existing studies of image sparse representation and compressed sensing [43,56,63]. However, spectral characteristics of satellite images vary according to the sensor type, imaging location and imaging time [47]. Therefore, many pan-sharpening methods based on sparse representation and compressed sensing establish the dictionaries through image patches extracted from the original satellite images [23,27,30,43,44,64].
We construct the dictionary D using image patches of the LFP sub-images. To combine the local information of LFI and LFP, we divide LFI and LFP into K small, partially overlapped image patches of size l × l with overlapping ratio σ using a patch-by-patch strategy. Following the set-up in reference [27], all K patches from the LFP sub-images are normalized and the mean of all patches is subtracted to form the dictionary D. Let D = {d_1, d_2, ..., d_K} denote the dictionary to be calculated; the Gaussian random matrix is adopted as the decimation matrix M. Thus, each patch of LFI (i.e., LFI^k, k ∈ {1, ..., K}) should be reconstructed into LFI_R^k according to Equation (13) as:

α̂_LFI^k = argmin_α ‖α‖_1  subject to  ‖y_LFI^k − MDα‖_2 ≤ ε,   (18)

where y_LFI^k is the column vector lexicographically rearranged from LFI^k. According to Section 2.3, the estimated kth patch is generated as LFI_R^k = Dα̂_LFI^k via the BP algorithm. To ensure the fused patch is capable of integrating both the local wavelet energy and the structures from LFI and LFP, the local difference weighted algorithm [62,65] is adopted to fuse LFI_R^k and the corresponding kth LFP patch (LFP^k). The local difference weighted value calculated in the patch of size l × l can be represented as:

W_(x,y) = Σ_{i=x−k'}^{x+k'} Σ_{j=y−k'}^{y+k'} |LFI_R^k(i, j) − LFP^k(i, j)|,

where k' = ⌊l/2⌋, LFI_R^k(i, j) and LFP^k(i, j) denote the pixel values of a specific LFI_R patch and the corresponding LFP^k patch, W_(x,y) is the local difference weighted value, and (x, y) is the central coordinate of the patch. The fusion rule combines LFI_R^k and LFP^k as a weighted sum controlled by W_(x,y), where LFF_R^k(x, y) is the kth fused low-frequency patch and λ ∈ (0, 1] is the credibility of the pixel value; the weights a and b and the threshold µ are derived from W_(x,y) and λ. The fused low-frequency patches LFF not only contain the spectral information in LFI but also preserve the characteristics of LFP, which distinguishes the proposed method from several pan-sharpening methods based on sparse representation and compressed sensing that have been recognized as excellent [23,26,47].
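The local difference weighted value is straightforward to compute per patch pair. Since the exact definitions of the weights a, b and the threshold µ come from equations not reproduced above, the blend in `fuse_lf_patch` below is a hypothetical stand-in that only illustrates the idea of shifting weight between the two patches according to the local difference:

```python
import numpy as np

def local_difference_weight(lfi_r, lfp):
    """W_(x,y): summed absolute difference over the two l x l patches."""
    return float(np.abs(lfi_r - lfp).sum())

def fuse_lf_patch(lfi_r, lfp, lam=0.8, mu=None):
    """Hypothetical blend: weight the reconstructed LFI patch by the
    credibility lam when the local difference is small, else favour LFP.
    The threshold mu here is an assumed heuristic, not the paper's rule."""
    w = local_difference_weight(lfi_r, lfp)
    if mu is None:
        mu = lam * lfi_r.size  # assumed threshold
    if w <= mu:
        return lam * lfi_r + (1.0 - lam) * lfp
    return (1.0 - lam) * lfi_r + lam * lfp
```

Identical patches give W = 0 and pass through unchanged; increasingly dissimilar patches hand more weight to the LFP structure.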
Subsequently, LFF is obtained by rearranging the fused patches LFF_R into image patches of the same size as LFI and LFP. The procedure of low-frequency sub-image fusion is represented by the flowchart in Figure 1. The four main steps of low-frequency sub-image fusion can be briefly summarized as: (1) The LFI and LFP sub-images are separately divided into K small, partially overlapped image patches. All the image patches are normalized before the mean of each patch is subtracted. (2) All K patches from LFP are utilized for the dictionary composition. (3) For each patch from LFI, compressed sensing is performed by solving Equation (18) to acquire the reconstructed LFI patch. (4) The LFF patch is fused using the local difference weighted algorithm.


The Over-All Pan-Sharpening Method
The technical flow of the proposed method is shown in Figure 2. The basic steps of the improved IHS pan-sharpening method based on ripplet transform and compressed sensing are as follows:

1. Data pre-processing: The MS and PAN images are co-registered precisely at first. In the meantime, the MS image is up-sampled to the same pixel size and spatial scale as the PAN image using cubic convolution. This step ensures the MS and PAN images have the same pixel size and geographic coordinates.

2. IHS transform: The IHS transform is applied to the up-sampled MS image to separate the intensity (I), hue (H) and saturation (S) components.

3. SFIM: SFIM is performed on the I component and the PAN image, as described in Section 3.1, to obtain the spatially modulated intensity component I_SFIM.

4. DRT decomposition: Two-level DRT is implemented on I_SFIM and the PAN image to obtain the high-frequency (HFI, HFP) and low-frequency (LFI, LFP) sub-images.

5. High-frequency sub-image fusion: HFI and HFP are fused to obtain HFF according to the local variance algorithm in Section 3.2.1.

6. Low-frequency sub-image fusion: Compressed sensing and the local difference weighted algorithm discussed in Section 3.2.2 are performed on LFI and LFP to obtain LFF.

7. IDRT: IDRT is performed on HFF and LFF to obtain the reconstructed intensity component (I_R) following Equation (9).

8. Inverse IHS transform: The inverse IHS transform is applied to I_R together with the original H and S components to generate the pan-sharpened image.


Results and Discussion
In this study, the proposed method, together with five state-of-the-art methods, is implemented on three datasets (WorldView-2, Pleiades-1A, and TripleSat), each with size 1024 × 1024, for comparison. The specific spatial resolution, spectral response, quantization value, and data source of the three datasets are listed in Table 1. The datasets cover land objects such as buildings, roads, bridges, farmlands, rivers, and shorelines. To guarantee the generality of the experiment, we only fuse three spectral channels in the MS image: red (R), green (G), and blue (B). In fact, the proposed method, on the basis of IHS transform, has the potential to be extended to fuse four or more spectral bands. The proposed method is compared with five state-of-the-art pan-sharpening methods, including adaptive IHS (AIHS) [13], wavelet transform and sparse representation (WT-SR) [66], wavelet-based IHS (WT-IHS) [19], curvelet transform and independent component analysis (CT-ICA) [67], and ripplet transform based on injected details (CSI) [23]. Moreover, GS sharpening [15] is a well-known method shown to be outstanding on many satellite images for its generality and effectiveness. For fair comparison, the GS method is utilized as a standard benchmark in this study. The corresponding performances are evaluated by six quality indices. All the algorithms and indices used in the experiments are implemented in MATLAB 2015b on a macOS 10.13 laptop.
In order to balance the performance and computation time of the proposed method, we empirically set the averaging filter kernel size to 5 × 5 when performing SFIM, and set the image patch size to 7 × 7 with an overlapping ratio of 15% in low-frequency sub-image fusion. In Section 4.3.1, we briefly discuss the selection of the averaging filter kernel size in SFIM. The effects of the image patch size and overlap ratio on WorldView-2, Pleiades and Triplesat are analyzed in Section 4.3.2.
To ensure the fairness of the comparison, the overlapping ratio of the CSI method is set to be 50% as recommended by [27].


Qualitative Comparison
The source MS and PAN images from the three datasets and the corresponding pan-sharpened images produced by the seven individual approaches are presented in Figures 3-5, respectively. The up-sampled MS images are regarded as ground truth and, together with the GS-sharpened images, serve as benchmarks. For fairer comparison, zoomed images of a local area are provided at the bottom corner of each image. The pan-sharpening results reveal that the GS method possesses ideal universality for the three experimental datasets. The GS method is able to enrich spatial information significantly with tolerable spectral distortion. Furthermore, in general, the proposed method outperforms all the other methods (including the GS method) on the three datasets. It can be seen in Figures 3-5 that the pan-sharpened image of the proposed method has the closest spectral characteristics to the MS images and possesses more abundant (or approximately equivalent) spatial information than the other pan-sharpened images.
From the perspective of spatial detail enhancement, the AIHS and WT-IHS methods perform better than WT-SR and CT-ICA. The spatial structures and textures contained in the pan-sharpened images output by AIHS and WT-IHS are more visually recognizable. The AIHS method causes more obvious spectral distortion on the three datasets, as can be deduced from the markedly higher brightness of its pan-sharpened images. From the perspective of spectral information preservation, however, the WT-SR and CT-ICA results reveal no obvious visual degradation of spectral diversity; yet the zoomed images suggest these methods (especially WT-SR) perform poorly in improving spatial details. It can also be seen that the CSI method performs remarkably well on the WorldView-2 and Triplesat datasets.
Since human vision is not sensitive enough to perceive spectral distortion, we calculate the absolute difference values between the fusion results and the original MS image according to Equation (24):

D(x, y) = (1/B) · Σ_{b=1}^{B} |F_b(x, y) − R_b(x, y)|,   (24)

where (x, y) is the coordinate of a specific pixel, B is the number of bands, M × N is the size of the images, and F and R denote the pan-sharpened image and the reference MS image, respectively. Figures 6-8 are difference images, which show the absolute value of the spectral difference between each fused image and the reference MS image at every pixel for the three datasets. Blue represents smaller differences and red suggests larger differences in spectral information.
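Computed per pixel, the difference image reduces to a band-averaged absolute difference (a direct sketch; colormapping to blue/red is left to the plotting layer):

```python
import numpy as np

def difference_image(fused, reference):
    """Per-pixel spectral difference, averaged over the B bands.
    Both inputs have shape (B, M, N); the result has shape (M, N)."""
    return np.mean(np.abs(fused.astype(float) - reference.astype(float)),
                   axis=0)
```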
Figures 6-8 demonstrate that the proposed method is superior to the other methods in spectral fidelity improvement. However, when implemented on the Triplesat dataset, the proposed method (Figure 8g) causes a slight spectral distortion. The difference images of the GS method are generally fine for all datasets when compared with other methods. In particular, the difference images reveal that the AIHS and WT-IHS methods lead to more serious spectral degradation, which is in accordance with the visual analyses.

Quantitative Comparison
In the quantitative comparison, the correlation coefficient (CC) [10], root mean square error (RMSE) [47], relative global synthesis error (ERGAS) [10], spectral angle mapper (SAM) [10], image quality score index (Q4) [68], and quality with no reference (QNR) [69] are adopted to evaluate the performance of the individual pan-sharpening methods. Specifically, CC, RMSE, ERGAS, SAM, Q4, and the QNR component Dλ assess the spectral correlation between the original MS image and the pan-sharpened image, while the standard deviation (STD) and the QNR component Ds reflect the richness of spatial details in the MS and pan-sharpened images. The ideal values of CC, Q4 and QNR are 1; the smaller the values of RMSE, ERGAS and SAM, the better. The quantitative results of the different methods on the three datasets are reported in Tables 2-4, respectively. For better observation, the optimal value of each quality metric is labeled in bold.

The quantitative comparison results in Tables 2-4 reveal the proposed method to be superior to the other methods, as it achieves the optimum value for most quality indices. This result is in accordance with the preceding qualitative comparison. The CSI method achieves comparable values on the WorldView-2 dataset (specifically for the STD of the red band and SAM) and the Triplesat dataset (specifically for CC, RMSE, the STD of the blue and red bands, and Ds), but fails to achieve the same overall (i.e., average) score on the WorldView-2 and Pleiades datasets. The AIHS method has relatively high STD values on the three datasets, and even achieves the highest STD of the red band on the Pleiades dataset. Similar to the WT-IHS method, however, the AIHS method is not capable of maintaining spectral information, with inferior values for CC, RMSE, ERGAS, SAM and Q4. After the proposed method, the CSI method performs best on the WorldView-2 and Triplesat datasets.
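Three of the metrics above (CC, RMSE, and SAM) are simple enough to sketch directly; a hedged reference implementation is shown below. The function names and the band-first array layout are illustrative assumptions; ERGAS, Q4, and QNR involve additional scale and cross-band terms and are omitted here.

```python
import numpy as np

def cc(a, b):
    # Correlation coefficient between two single-band images (ideal: 1).
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

def rmse(a, b):
    # Root mean square error between two single-band images (ideal: 0).
    return float(np.sqrt(np.mean((a - b) ** 2)))

def sam(fused, reference, eps=1e-12):
    # Mean spectral angle in radians over all pixels (ideal: 0).
    # fused, reference: arrays of shape (bands, rows, cols).
    f = fused.reshape(fused.shape[0], -1)
    r = reference.reshape(reference.shape[0], -1)
    cos = (f * r).sum(axis=0) / (np.linalg.norm(f, axis=0)
                                 * np.linalg.norm(r, axis=0) + eps)
    return float(np.arccos(np.clip(cos, -1.0, 1.0)).mean())
```

Note that SAM is invariant to per-pixel intensity scaling: multiplying every spectrum by a constant leaves the angle at zero, which is why it isolates spectral (rather than radiometric) distortion.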

Discussion on Independent Factors and Contributions
In this section, we discuss the three independent factors of the proposed method and briefly summarize its contributions. In Section 4.3.1, the influence of the smoothing filter kernel size in SFIM is discussed. The dependence on image patch size and overlapping ratio in low-frequency sub-image fusion is investigated in Section 4.3.2. The contributions are summarized in Section 4.3.3.

Discussion on Averaging Filter Kernel Size in SFIM
As mentioned in Section 3.1, the choice of the averaging filter kernel size in SFIM depends on the scale ratio between the PAN and MS images. The scale ratio of WorldView-2, Pleiades and Triplesat is 4 (Table 1), but the kernel size is theoretically odd [51]. Hence, 3 × 3 and 5 × 5 are the two candidate optimal kernel sizes for the experimental datasets. Following the guidance in [51], we derive the CC between the pan-sharpened images and the original PAN and MS images for the three datasets. The specific results, as reported in Table 5 (where "PS" represents the pan-sharpened images), reveal 5 × 5 as the optimal kernel size for this study, as the sums of the CCs of PS with PAN and PS with MS are higher than those for 3 × 3. Furthermore, we increase the smoothing kernel size from 7 × 7 to 13 × 13 to discuss the influence of kernel size in SFIM. It is noteworthy that the CCs of PS with PAN and of PS with MS gradually converge as the kernel size increases. This trend reveals that an overly large kernel size excessively modulates the MS image towards the PAN image.
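The SFIM modulation that the kernel-size discussion refers to can be sketched as follows, assuming the standard SFIM form (MS band multiplied by the ratio of the PAN image to its low-pass version). The function name and the small epsilon guard against division by zero are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sfim(ms_band, pan, kernel_size=5, eps=1e-6):
    # Smoothing-Filter-based Intensity Modulation of one MS band.
    # ms_band and pan are co-registered float arrays of the same shape
    # (the MS band already upsampled to the PAN grid). A k x k averaging
    # filter estimates the low-pass PAN; the band is then modulated by
    # the ratio of PAN to its low-pass version, injecting spatial detail
    # while leaving flat regions (where PAN ~ low-pass PAN) unchanged.
    pan_low = uniform_filter(pan.astype(np.float64), size=kernel_size)
    return ms_band * pan / (pan_low + eps)
```

With a spatially constant PAN image the ratio is 1 everywhere and the MS band passes through unchanged, which matches the intuition that SFIM only transfers local PAN detail; conversely, as the kernel grows, the low-pass estimate flattens and the ratio drifts towards the raw PAN values, reproducing the over-modulation trend reported above.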

Dependence on Image Patch Size and Overlapping Ratio
According to Section 3.2.2, the image patch size l × l and the overlapping ratio σ are the only two parameters in LF sub-image fusion. The formation of the dictionary D depends on both. There is a trade-off between the completeness (i.e., the number of image patches) and the compactness (i.e., the size of image patches) of the dictionary [70]. On the one hand, an overly large patch size accelerates computation but weakens the ability of compressed sensing to reconstruct the LF component. On the other hand, an overly small patch size leads to high coherence among the atoms in the dictionary.
We increase the image patch size from 5 × 5 to 11 × 11 and the overlapping ratio from 5% to 20%. The experimental results on the WorldView-2, Pleiades-1A and Triplesat data are reported in Tables 6-8 (where "Avg" denotes the average value of the index over all spectral channels), respectively. According to this statistical evidence, we can draw the following conclusions:
1. The increase of the overlapping ratio σ is beneficial to performance but leads to large increases in computation time.
2. In our experiments, σ = 15% is the most desirable value to balance performance and computation time.
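The way the patch size and overlapping ratio shape the dictionary can be sketched as below. The function name and the stride rule (stride derived from l and σ by rounding) are assumptions for illustration; the paper's exact patch-ordering convention is not specified here.

```python
import numpy as np

def extract_patches(image, patch_size, overlap_ratio):
    # Slide an l x l window over the image and return the vectorized
    # patches as columns of a dictionary-style matrix.
    # overlap_ratio in [0, 1): fraction of the window shared between
    # neighbouring patches; stride = round(l * (1 - overlap_ratio)).
    l = patch_size
    stride = max(1, int(round(l * (1.0 - overlap_ratio))))
    rows, cols = image.shape
    patches = [image[r:r + l, c:c + l].ravel()
               for r in range(0, rows - l + 1, stride)
               for c in range(0, cols - l + 1, stride)]
    return np.stack(patches, axis=1)   # shape: (l*l, n_patches)

img = np.arange(100, dtype=float).reshape(10, 10)
D = extract_patches(img, patch_size=5, overlap_ratio=0.15)
```

Raising σ shrinks the stride, so more (and more redundant) patches enter the dictionary: completeness improves, but every extra column also enlarges the sparse-coding problem, which is exactly the performance/computation-time trade-off noted in the conclusions above.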

Contributions
In general, comprehensive evaluations, both visually and by objective indices, reveal the proposed method as optimal for datasets from WorldView-2, Pleiades, and Triplesat, when compared with AIHS [13], WT-SR [66], WT-IHS [19], CT-ICA [67], CSI [23] and GS [15]. Notably, the CSI method also achieved comparable results with the exception of the Pleiades dataset. The results indicate that the proposed method achieves better spectral fidelity while yielding greater (or equivalent) spatial information improvement than the other state-of-the-art methods. Furthermore, the results demonstrate that the integration of DRT and compressed sensing can ensure the balance of spatial detail reconstruction and spectral diversity preservation.

Conclusions
In this paper, an improved IHS-based pan-sharpening method using ripplet transform and compressed sensing is proposed, which overcomes the spectral distortion that occurs in the original IHS method. Firstly, SFIM is performed on the PAN image and the I component separated from the MS image to improve spectral fidelity. The proposed method adopts different fusion rules according to the individual characteristics of the HF and LF coefficients obtained by DRT. The fusion rule based on the local variance of wavelet energy is utilized for HF sub-images. For LF information, we use a compressed sensing technique to maintain the local structures and spectral information. The LF coefficients of the I component and the PAN image are divided into small patches, and a dictionary is constructed using the patches of the low-frequency components of PAN. The reconstructed low-frequency components of the I component and the PAN image are generated using the BP algorithm, and the local difference weighted algorithm is used to generate the new LF components. Through the combination of the IHS transform, DRT and compressed sensing, the proposed algorithm can produce pan-sharpened images with both high spatial and high spectral resolution efficiently and effectively.
The performance of the proposed method, five state-of-the-art methods and the GS method is comprehensively evaluated using the WorldView-2, Pleiades-1A and Triplesat datasets. The specific evaluation indices include CC, RMSE, STD, ERGAS, SAM, Q4 and QNR. The experimental results demonstrate that the proposed method performs better than the other fusion methods and overcomes spectral distortion while improving spatial resolution significantly. In future work, more effort should be dedicated to improving the efficiency of the proposed method, as the BP algorithm is time-consuming and could potentially be replaced by other reconstruction algorithms. In addition, the practicability of the algorithm on images captured by other remote sensors has yet to be demonstrated.