Pansharpening with a Gradient Domain GIF Based on NSST

In order to improve the fusion quality of multispectral (MS) and panchromatic (PAN) images, a pansharpening method with a gradient domain guided image filter (GIF) that is based on non-subsampled shearlet transform (NSST) is proposed. First, multi-scale decomposition of MS and PAN images is performed by NSST. Second, different fusion rules are designed for highand low-frequency coefficients. A fusion rule that is based on morphological filter-based intensity modulation (MFIM) technology is proposed for the low-frequency coefficients, and the edge refinement is carried out based on a gradient domain GIF to obtain the fused low-frequency coefficients. For the high-frequency coefficients, a fusion rule based on an improved pulse coupled neural network (PCNN) is adopted. The gradient domain GIF optimizes the firing map of the PCNN model, and then the fusion decision map is calculated to guide the fusion of the high-frequency coefficients. Finally, the fused highand low-frequency coefficients are reconstructed with inverse NSST to obtain the fusion image. The proposed method was tested using the WorldView-2 and QuickBird data sets; the subjective visual effects and objective evaluation demonstrate that the proposed method is superior to the state-of-the-art pansharpening methods, and it can efficiently improve the spatial quality and spectral maintenance.


Introduction
With the development of satellite technology and imaging systems, we are able to obtain remote sensing images with higher resolution.High-resolution remote sensing images can describe target information more accurately, which is of great significance in the development of numerous applications, such as environmental monitoring, land and resource planning, military mapping, object recognition, and scene interpretation.Most commercial satellites, such as QuickBird, IKONOS, GeoEye, and WorldView, can currently jointly obtain panchromatic (PAN) and multispectral (MS) images.Due to imaging physical constraints and transmission bandwidth limit, it has been difficult to obtain images with the characteristics of both high spatial resolution and high spectral resolution.Fusion is one of the most important and effective methods in providing better interpretation ability of remote sensing images.How to combine complementary information of PAN and MS images is an urgent problem to be solved.
MS images have the advantage of high spectral resolution, while PAN images have the advantage of high spatial resolution.The purpose of fusion is to combine the complementary characteristics of the original images and provide enough information for image interpretation.The component substitution (CS) method is a traditional pansharpening model, which includes intensity-hue-saturation transformation (IHS) [1], principle component analysis (PCA) [2], the Gram-Schmidt process (GS) [3], and so on.These methods offer outstanding spatial quality but they have the problem of serious spectral distortion in the fused image.The CS method has been extended and improved based on new theories over time, because of low computational complexity, [4,5].In [6], an adaptive image fusion method that is based on the concept of partial replacement of intensity component is proposed, which is known as partial replacement adaptive CS (PRACS).A context-adaptive (CA) pansharpening method that is based on image segmentation is proposed in [7], which is integrated into the GS scheme in order to achieve a better estimation of the injection coefficients.The band-dependent spatial-detail (BDSD) model is also known as adaptive CS [8].Model-based methods have gathered increasing interest in recent studies.This kind of method based on complex models can achieve a better pansharpening effect in some cases, but the time complexity is high due to the optimization process.Within this family, many contributions that are based on Bayesian methods rely on the sparse representations of signals [9,10] and total variation penalization [11,12] terms.Essentially, this can be regarded as a strategy of image repair, which consists of the reconstruction of the high-resolution MS image from the original data [13].Multi-resolution analysis (MRA) methods, which decompose the image into different frequency coefficients, can harmonize the injection of spatial details and maintenance of spectral information.The MRA scheme is based on the injection of detailed information that is extracted from the decomposition coefficients of PAN images into the low-resolution MS bands.Wavelet transform is one of the MRA methods and it is an important milestone in the field of image processing as a mathematical tool [14].The fusion results of wavelet transform provide certain improvements in preserving spectral information, but they also have shortcomings, such as direction limitation, shift, and aliasing.When compared with discrete wavelet transform (DWT), dual-tree complex wavelet transform (DTCWT) [15] has the advantages of shift-invariance and directional selectivity, but its limited number of directional textures and edges of wavelet families make it difficult to represent two-dimensional (2-D) images.To solve this problem, a number of multi-scale geometric analysis tools, such as curvelet transform [16], contourlet transform [17], and shearlet transform [18], have been developed and successfully applied to the pansharpening problem.The main motivation of multi-scale geometric analysis methods is to pursue a "true" 2-D transform [19], which can effectively capture the geometric structure of an image, so that the fusion quality can be further improved.However, without the shift-invariant property, the contourlet and shearlet transform may suffer the frequency alias problem.The non-subsampled contourlet transform (NSCT) [20,21] is an effective solution to this problem, but its application is limited by the finite decomposition directions and high computational complexity.Non-subsampled shearlet transform (NSST) is a shift-invariant version of shearlet transform, which attains a low computational cost and good image sparse representation performance [22].For the current study, the appearance of NSST has provided a new solution for the pansharpening issue.As a novel multi-scale geometric analysis method, NSST is one of the topics that are currently being studied by many researchers.Moonon [23] proposed a remote sensing image fusion method based on NSST and sparse representation.Wu proposed a method that is based on improved non-negative matrix decomposition in the NSST domain [24] and a fusion method using chaotic bee colony optimization in the NSST domain [25].Yang [26] proposed a pansharpening framework based on the matting model and multi-scale transform.These methods have good effects on pansharpening, although they all are subject to their own limitations.
The lack of an anti-aliasing feature in multi-scale decomposition tends to cause decision bias in the boundary region of objects.The bias will result in an artificial texture and image non-uniformity, and it therefore causes a bad influence on visual effects and image interpretation.In order to solve this problem, some spatial techniques and optimization strategies have been introduced into the fusion method, such as bilateral filter [27], cross bilateral filter (CBF) [28], weighted least squares filter [29], and guided image filter (GIF) [30,31].The GIF is one of the fastest edge-preserving local filters and it is superior to bilateral filters in avoiding gradient reversal.Meng [32] proposed a pansharpening method with an edge-preserving guided filter based on three-layer decomposition and the decomposed PAN image is injected into the MS image within this method.However, due to the fixed regularized values in the GIF, the edges will inevitably be smoothed.Li [33] proposed a weighted GIF, which avoided the edge blurring problem to some extent.However, these two methods do not have explicit constraints in processing the edges of images and the filtering process is usually accompanied by image coarsening.When both edge preservation and filtering are considered, the problem of edge blurring will inevitably occur.Moreover, in some cases, these methods still cannot maintain the edges well, which leads to the degradation of fusion quality.Kou [34] proposed a gradient domain GIF, in which the introduction of explicit first-order edge condition constraints defines a new edge-perception weight, so that the edges of an image can be better preserved.
Based on a gradient domain GIF with excellent edge-preserving properties, a new fusion method of MS and PAN images in the NSST domain is proposed.The MS and PAN images are decomposed by NSST to obtain the coefficients of a different frequency.For the high-frequency coefficients in the NSST domain, an improved pulse coupled neural network (PCNN) model is used to obtain the initial firing map.Unlike previous methods that directly calculate the fusion decision map, the gradient domain GIF is used to optimize the firing map, and then the fusion decision map is calculated in order to guide the fusion of high-frequency coefficients.For the low-frequency coefficients in the NSST domain, a fusion strategy that is based on morphological filter-based intensity modulation (MFIM) technology is adopted.The gradient domain GIF is used to perform the edge refinement on the modulated low-frequency coefficients to obtain the fusion result of the low-frequency coefficients.The experimental results show that, in the proposed method, the detailed information and spatial continuity can be effectively improved while still maintaining excellent spectral information.

NSST
Guo and Labate [18] constructed the shearlet transform by combining geometry and multi-scale method through classical theory of the affine system.When the dimension n = 2, the affine system is: where, ψ ∈ L 2 ( 2 ), L represents the square integrable space, A and B are both 2 × 2 invertible matrices, and |detB| = 1.If for any f ∈ L 2 ( 2 ), ψ AB (ψ) constitutes the following tight support framework, and then the element of ψ AB (ψ) is called the synthetic wavelet: where,A j is associated with scale transformation and B l is associated with geometric transformation.
In Equation ( 1), A = a 0 0 a 1/2 denotes an anisotropic matrix and B = 1 s 0 1 denotes a shearlet matrix; at this point, the synthetic wavelet is called a shearlet and the values usually are taken as . Figure 1 shows the frequency decomposition and supports of a shearlet.The frequency supported base of each element ψj,k,l is a trapezoid region, the size is approximately 2 2j × 2 j , and the slope is l2 −j .As the shearlet does not have a shift-invariant character, it is easy to introduce the Gibbs phenomenon in image fusion.NSST [22] is an improved form of the shearlet with directional selectivity and shift-invariance, high computational efficiency, and simple structure.NSST is mainly divided into multi-scale decomposition and multi-directional decomposition.Non-subsampled Laplacian pyramid (NSP) performs multi-scale decomposition.J sub-band images of the high-frequency and one sub-band image of the low-frequency can be obtained by the J level NSP decomposition, and each sub-band image has the same size as the source image.The shearlet filter (SF) implements multi-directional decomposition.
J ∑ j=1 2 l j + 1 sub-band images with the same size as the source image can be obtained by the j level decomposition of a certain scale.J is the image decomposition layer and l j is the direction decomposition levels under scale j. Figure 2 illustrates a three-level NSST decomposition model.NSST has the characteristics of approximate optimal sparse representation for 2-D images.Applying it to the fusion of MS and PAN images can provide more useful information for the fused result.
Electronics 2019, 8, x FOR PEER REVIEW 4 of 28 As the shearlet does not have a shift-invariant character, it is easy to introduce the Gibbs phenomenon in image fusion.NSST [22] is an improved form of the shearlet with directional selectivity and shift-invariance, high computational efficiency, and simple structure.NSST is mainly divided into multi-scale decomposition and multi-directional decomposition.Non-subsampled Laplacian pyramid (NSP) performs multi-scale decomposition.J sub-band images of the high-frequency and one sub-band image of the low-frequency can be obtained by the J level NSP decomposition, and each sub-band image has the same size as the source image.The shearlet filter (SF) implements multi-directional decomposition. the source image can be obtained by the j level decomposition of a certain scale.J is the image decomposition layer and j l is the direction decomposition levels under scale j .Figure 2 illustrates a three-level NSST decomposition model.NSST has the characteristics of approximate optimal sparse representation for 2-D images.Applying it to the fusion of MS and PAN images can provide more useful information for the fused result.As the shearlet does not have a shift-invariant character, it is easy to introduce the Gibbs phenomenon in image fusion.NSST [22] is an improved form of the shearlet with directional selectivity and shift-invariance, high computational efficiency, and simple structure.NSST is mainly divided into multi-scale decomposition and multi-directional decomposition.Non-subsampled Laplacian pyramid (NSP) performs multi-scale decomposition.J sub-band images of the high-frequency and one sub-band image of the low-frequency can be obtained by the J level NSP decomposition, and each sub-band image has the same size as the source image.The shearlet filter (SF) implements multi-directional decomposition.

GIF
The GIF is a new kind of local linear filter.It has the edge-preserving property and its time complexity is independent of the size of filter window.It has fast calculation speed, high efficiency, and time complexity of O(n), which has been successfully applied to the field of image fusion [35].The GIF can transfer the structural information in the guided image to the filter output, which makes

GIF
The GIF is a new kind of local linear filter.It has the edge-preserving property and its time complexity is independent of the size of filter window.It has fast calculation speed, high efficiency, and time complexity of O(n), which has been successfully applied to the field of image fusion [35].The GIF can transfer the structural information in the guided image to the filter output, which makes the filter output more structured, so as to complete the edge correction in the fusion decision.However, the local linear model that is used in the GIF cannot represent the image well around some edges, which may result in halo artifacts.Li [33] proposed the weighted GIF, which can effectively reduce artifacts by introducing the edge-aware factors, but there is no explicit constraint to deal with edges, so in some cases the edges cannot be well preserved.Kou [34] proposed a gradient domain GIF by introducing explicit first-order edge condition constraints, so that the edges can be better preserved.
Figure 3 shows the processing flowchart of the GIF.The GIF assumes that ω k is a square window whose size is (2r 1 + 1) × (2r 1 + 1) centered at a pixel k.I is the guidance image, Z is the output image, and X is an image to be filtered.Equation ( 3) is a local linear model between the guided image and the filtered output image: where, a k and b k are the linear coefficients in the window ω k and have fixed values.a k and b k are calculated by minimizing the cost function E(a k , b k ), which is defined as: where, λ is a regularization term, which is an important parameter in adjusting the ambiguity of the filter to penalize the too-large values of a k .According to linear regression analysis, the solution of Equation ( 4) can be obtained: where, is the dot products of two matrices, µ I X,r 1 (k), µ I,r 1 (k), and µ X,r 1 (k) represent the means of I X, I, and X in the local window ω k , respectively.Since the value of λ in the GIF is fixed, the edges are inevitably smoothed, and in some cases the edges cannot be well preserved.
output image, and X is an image to be filtered.Equation ( 3) is a local linear model between the guided image and the filtered output image: where, k k E a b , which is defined as: where,  is a regularization term, which is an important parameter in adjusting the ambiguity of the filter to penalize the too-large values of k a .According to linear regression analysis, the solution of Equation ( 4) can be obtained: ) (5) where, is the dot products of two matrices, is fixed, the edges are inevitably smoothed, and in some cases the edges cannot be well preserved. is defined, as follows:

Edge-Aware Weighting
In order to better preserve the edges, Kou defines a new edge-aware weighting Γ I (k), which can determine the importance of each pixel with respect to the global guidance image.Γ I (k) is defined, as follows: where, χ(k) is defined as σ I,1 (k)σ I,r 1 (k), σ I,1 (k), and σ I,r 1 (k) are the variances of I in window 3 × 3 and (2r 1 + 1) × (2r 1 + 1), respectively.r 1 is the window size of the filter, M is the pixel number in the image.

Gradient Domain GIF
∇Z(i) = a k ∇I(i) can be obtained according to the linear model in Equation (3).Obviously, the smoothness of Z in local window depends on the value of a k .If the pixel k is at the edge area and a k = 1, the edge can be preserved well.If the pixel k is in a smooth region, it is expected that the a k = 0, so the region will be smoothed.Based on the observation, a new cost function is defined, as follows: where the definition of γ k is where, µ χ,∞ is the mean value of χ(i), η = 4/(µ χ,∞ − min(χ(i))).It can be seen that, if the pixel k is at the edge area, the value of γ k approaches 1.On the contrary, the value of γ k approaches 0 if the pixel is in the smooth region.The filter is less sensitive to the selection of parameter λ.
The optimal values of a k and b k are calculated, as follows The final value of Z is: In the calculation of the linear coefficients of the local window, a pixel may be contained in multiple windows, so a k and b k need to be calculated by mean filtering, that is: where a i and b i are the means of a k and b k in the window ω k , respectively.
, and |ω| is the number of pixels in the local window.

Proposed Method
The factors affecting the fusion quality of PAN and MS images mainly include two aspects: the integration of spatial details and the preservation of spectral information.The main purpose of pansharpening is to establish a good tradeoff between the injection of spatial details and the preservation of spectral information.It is assumed that PAN and MS images that are used for fusion have been geometrically registered.NSST is utilized to decompose the source images with multi-scale and multi-direction into different frequency coefficients.The high-frequency coefficients represent the detailed features of the image, which is the reflection of spatial information.The selection of high-frequency coefficients plays an important role in the maintenance of detailed spatial information and improving the spatial resolution of the fused image.The low-frequency coefficient image is the approximate version of the original image, which contains the most energy and spectral information.The selection of low-frequency coefficients is of great importance for maintaining the spectral signatures and reducing the spectral distortion of the fused result.According to the different significant features that need to be preserved in different frequency domains, the corresponding fusion rules of high-and low-frequency coefficients are proposed.Figure 4 illustrates the flowchart of the multi-scale image fusion procedure that is based on NSST, and the fusion steps are as follows:

NSST decomposition
The PAN and MS images are decomposed by NSST to obtain the corresponding low-and high-frequency sub-band coefficients {S 0,d P , S l,d P } and {S 0,d M , S l,d M }. S 0,d P is the low-frequency coefficients of the PAN image, and S l,d P is the lth scale, dth directional high-frequency coefficients of the PAN image.Similarly, S 0,d M and S l,d M are the corresponding coefficients of the MS image.

Fusion of low-frequency coefficients in the NSST domain
The low frequency coefficients of the PAN and MS images are processed with fusion rule that is based on MFIM technology.The high resolution version of the low-frequency coefficients can be obtained while using low-frequency coefficients of the PAN image to modulate the low-frequency coefficients of the MS image.The gradient domain GIF is used to optimize the modulated sub-band coefficients and the fusion results of low-frequency coefficients {S 0,d F } are obtained.

Fusion of high-frequency coefficients in the NSST domain
The high-frequency coefficients of the PAN and MS images are processed by the improved PCNN-based fusion rule.The initial firing map is obtained by a PCNN model, the original PAN image is subsequently used as a guidance image, and then the edge preserving gradient domain GIF is performed on the firing map.According to the optimized firing map, the fusion decision map is calculated to guide the fusion of high-frequency coefficients to obtain the fusion result {S l,d F }. coefficient image is the approximate version of the original image, which contains the most energy and spectral information.The selection of low-frequency coefficients is of great importance for maintaining the spectral signatures and reducing the spectral distortion of the fused result.
According to the different significant features that need to be preserved in different frequency domains, the corresponding fusion rules of high-and low-frequency coefficients are proposed.Figure 4 illustrates the flowchart of the multi-scale image fusion procedure that is based on NSST, and the fusion steps are as follows: are obtained.

Fusion of high-frequency coefficients in the NSST domain
The high-frequency coefficients of the PAN and MS images are processed by the improved PCNN-based fusion rule.The initial firing map is obtained by a PCNN model, the original PAN image is subsequently used as a guidance image, and then the edge preserving gradient domain GIF is performed on the firing map.According to the optimized firing map, the fusion decision map is calculated to guide the fusion of high-frequency coefficients to obtain the fusion result

Fusion Rule of the Low-Frequency Coefficients
The low-frequency coefficients that were decomposed by the NSST process are the inheritance of the original MS and PAN images, which are the approximate components.They contain the most energy of the original images.The fusion rule of these coefficients is designed to combine the available spatial information from the low-frequency coefficients of the PAN image and the available spectral information from the low-frequency coefficients of the MS image.

Fusion Rule of the Low-Frequency Coefficients
The low-frequency coefficients that were decomposed by the NSST process are the inheritance of the original MS and PAN images, which are the approximate components.They contain the most energy of the original images.The fusion rule of these coefficients is designed to combine the available spatial information from the low-frequency coefficients of the PAN image and the available spectral information from the low-frequency coefficients of the MS image.
In this paper, the morphological filter-based intensity modulation (MFIM) technique [36] and gradient domain GIF based fusion rule of the low-frequency coefficients is designed.The low-frequency coefficient image serves as an approximate version of the original image, which contains the main information regarding the original image and also contains some detailed spatial information.Traditional fusion strategies for the low-frequency coefficients of the MS and PAN images usually adopt a simple weighted averaging method or the region energy analysis based method.These methods are easy to implement but they have problems in contrast reduction and detailed information loss.Furthermore, the coefficient selection or superposition procedure may result in decision bias and spectral distortion.The preservation of the spectral content of the original MS image is very important.Thus, we adopt the MFIM technique to reconstruct the spatial details that are missing in the low-frequency sub-bands of the MS image and preserve the spectral information.The gradient-domain GIF is used to refine the modulated low-frequency coefficients image in order to properly inject the coefficients of spatial details and avoid halo artifacts in the fusion results.
First, by using the low-frequency coefficients image MS k of the MS image as a reference image, the image histogram equalization on the low-frequency coefficient image S 0,d P of the PAN image is performed.In other words, the image P 0 k on the band k can be obtained by the equalization step between S 0,d P and MS k .The low-resolution version P low k of the low-frequency coefficients image of the PAN image can be obtained by half-gradient operator that is based morphological filtering.Setting the structural element the corresponding morphological filter is as follows: where, ε B (F) and δ B (F) are the erosion and dilation operators, respectively.P low k is derived through the morphological filter and a pyramidal decomposition.
Subsequently, the spatial details can be extracted and injected.The fusion result { MS k } k=1,...,N can be obtained by intensity modulation of the low-frequency coefficients of the MS image { MS k } k=1,...,N , as follows: The definition of the contrast by Weber [37] is given by C = ∆L/L b , where ∆L= L − L b , L b is the background luminance, L is the pixel luminance [38], and ∆L can be seen as the foreground luminance.Equation ( 15) can be written as Finally, the gradient domain GIF is used to refine the edges in the modulated coefficients image MS = { MS k } k=1,...,N .The low-frequency coefficients image of the PAN image serves as the guidance image.The final fusion result of the low-frequency coefficients can be obtained by the following equation:

Fusion Rule of the High-Frequency Coefficients
A PCNN is used to extract detailed information in the high-frequency coefficients, which is a new type of neural network with global pulse synchronization and pulsecoupling.It can realize the simultaneous firing of pixels with proximal space and similar features.The firing number of the PCNN model can effectively reflect the detailed spatial information of an image.The more time that the firing occurs, the more information that the corresponding pixel area will have in the image.The PCNN consists of several neurons, each of which consists of three parts: the receiving domain, the modulation domain, and the pulse generator.In this paper, a simplified PCNN model is adopted [39] and its expression is as follows: where, n denotes the iteration steps, F ij (n), D ij , and L ij (n) are the neuron feedback input, exterior input, and linking input, respectively, the parameter β is the connecting weight between the neurons, U ij (n) represents the inner activity of the neuron, θ ij (n) is the dynamic threshold, ω ij,pq denotes the synaptic links of the neuron, V L is the amplification factor of the linking input, V θ is the threshold amplification factor, α L and α θ denote the attenuation time constant, and called a single firing.After n iterations, the firing map that is generated by the total numbers of firings of each neuron becomes the PCNN model output.
Based on the simplified PCNN model, a soft limited amplitude sigmoid function is used to calculate the firing output amplitude of the model during the iterative process: After n iterations of the PCNN, the sum of the firing output amplitude is: The gradient domain GIF is used to optimize the firing map that is exported by the PCNN model, which aims to satisfy the spatial consistency and therefore reduce the artificial texture and image non-uniformity produced by the multi-scale fusion methods.W P and W M denote the firing maps of the PAN and MS images that were optimized by gradient domain GIF, respectively: Based on observation, the fusion decision matrix can be obtained as: The fusion result of the high-frequency coefficients can then be calculated according to the decision matrix: where, S M (i, j) and S P (i, j) represent the high-frequency coefficients of the MS and PAN images, respectively.

Data Set
In our experiment, the data sets that were acquired by WorldView-2 and QuickBird satellites are used to evaluate the proposed fusion method.The main features of these two satellites are given in Tables 1 and 2. The MS and PAN images that were used in the experiment have been registered.
The WorldView-2 data set was collected by the Beijing Key Laboratory of Digital Media.The sizes of the PAN and MS images are 512×512 and 128×128, respectively.Eight multi-spectral wave bands, including four standard bands and four new bands are contained in the WorldView-2 satellite, but the data set only provides three bands of 5 (R), 3 (G), and 2 (B) for the MS image.The land-cover types of the MS and PAN images include home, seaside, urban, house, and bridge.Given space limitations, four pairs of MS and PAN images are considered for the comparison analysis of our proposed method, two performed at reduced resolution and two evaluated at the original resolution.The second data set was captured by the QuickBird satellite on November 21, 2002, which covers the national forest park of Sundarbans, India.The sizes of the PAN and MS images are 256×256 and 64×64, respectively.This data set provides the MS image with four bands (R, G, B, and NIR).Two pairs of MS and PAN images are selected for the result analysis as an example, one for full resolution assessment and the other for reduced resolution assessment.

Quality Assessment of Fusion Results
Generally, the quality assessment of an image fusion method includes subjective and objective evaluations.Subjective evaluation that is based on individuals may have differences, meaning that the assessment faces a challenge of reliability, so an objective evaluation is needed to provide a complementary and quantitative evaluation system.
Because of the absence of reference images, two strategies have been proposed in order to measure the fusion quality.One is evaluated on the degraded images that are based on Wald's protocol; the other is performed on the full resolution without the reference images.For the degraded evaluation, the original MS and PAN images are processed by a low-pass filter with a down-sampling factor of four to obtain the reduced resolution images, and the original MS image is taken as the reference image.The other strategy calculates quality indices by operating on the relationship between the original MS and PAN images and the fused images.
As detailed below, nine well-known indices are utilized to evaluate the spatial and spectral qualities of the fused results.The correlation coefficient (CC) reflects the similarity of spectral features, and structural similarity (SSIM) is a structural similarity index.The spectral angle mapper (SAM) is utilized to measure the spectral distortion.Root mean square error (RMSE), erreur relative global adimensionnelle de synthèse (ERGAS), and the universal image quality index (UIQI) measures the global quality of spectral and spatial features.The "quality with no reference" (QNR) indicator measures the overall quality of the fused image and it is composed of two separate indices, the spatial distortion index D s and the spectral distortion index D λ .

Reduced-Resolution Assessment
Follow Wald's protocol, the reduced resolution assessment is based on the scale invariance assumption.Six indices have been selected for the evaluation procedure, which consider the available reference images.The indices are described, as follows: 1. Correlation Coefficient (CC) CC [40] indicates the presentation ability of the spectral characteristics.It reflects the correlation degree of spectral features between each band of the fusion MS image and the reference image, and the ideal value is 1.The definition is as follows: where, CC k is the correlation coefficient of the band k, F k , and R k denote pixel means of the k band of the fusion MS image and the reference image, respectively.

Structural Similarity (SSIM)
SSIM [41] reflects the structural similarity between the fused image and the reference image.It quantifies the degradation of image quality.A higher value of SSIM indicates a higher structural similarity between the two images.
where µ R and µ F denote the mean values of the reference image R and the fused image F, respectively.
σ R and σ F are the variances of R and F. σ RF calculates the covariance value of R and F. C 1 and C 2 are constants, which exist to avoid the denominator equaling to 0 and to maintain stability.In order to simplify the model, the value is set to 0.

Spectral Angle Mapper (SAM)
A simple index, SAM [42], reflects the distortion in spectral information between the fused MS image and the reference image, and the ideal value is 0. Its definition is as follows: where u F and u R are the spectral vectors constructed by the fused image and the reference image.

Root Mean Square Error (RMSE)
RMSE [43] is the index accounts for the difference between the reference image and the fused image; the ideal value is 0. RMSE is defined as

Erreur Relative Global Adimensionnelle de Synthèse (ERGAS)
ERGAS [13] reflects the distortion degree of the spectral and spatial information, and it is an index for the overall evaluation of the fused image.The optimal value of ERGAS is 0. The definition is as follows: where h and l are the spatial resolution of the PAN and MS images, respectively.RMSE k is the mean square error between the k band of the fused image and the k band of the reference image, indicating the difference between the two images, and Mean(R k ) denotes the mean value of the band k of the reference image.
6. Universal Image Quality Index (UIQI) UIQI [44] is a performance index that measures the spatial details of the fused MS images.The optimal value is 1; so the closer that UIQI value is to 1, the better the fusion quality.It is defined as: where, σ R and σ F are the standard deviations of the reference image R and the fused image F, respectively.σ RF calculates the covariance value of R and F.

Full-Resolution Assessment
In order to operate the assessment directly on the data at the original resolution, the quality with no reference (QNR) index was proposed [45].This index measures the correlation, luminance, and contrast between the two images.QNR is composed of two separate indices,D λ and D s , which denote the distortion of spectral and spatial features, respectively.
1. Quality with No Reference (QNR) QNR, which consists of D λ and D s , reflects the overall quality of the fused image, and the ideal value is 1.The definition of QNR is as follows: 2. Spectral Distortion Index (D λ ) D λ represents the spectral distortion degree between the fused image and the MS image, with an ideal value of 0. The definition of D λ is as follows: where, p is the difference enhancement parameter that isusually set to 1, U IQI(MS i , MS j ) and U IQI(F i , F j ) are the generalized image quality indicators of the MS image bands and the fused image bands, respectively; U IQI(MS i , MS j ) = , where σ MS i MS j is the image covariance, σ MS i and σ MS j represent image variances, and MS i and MS j denote the image mean values.

Spatial Distortion Index (D s )
D s indicates the information preservation of the fused image in the spatial domain, with an ideal value of 0. Its definition is as follows: where, q is the difference enhancement parameter that is usually set to 1, U IQI(MS i , PAN) is the generalized image quality indicator between the k band of the MS image and the PAN image, and U IQI(F i , PAN) is the generalized image quality indicator between the k band of the fused image and the PAN image.

Implementation Details
The implementation details for the fusion methods are provided and analyzed.The experimental hardware environment is an Intel Core i5-4200U, 1.60GHz CPU, and 4GB memory.The software environment is MATLAB R2014a.The parameters of the fusion methods are set, as follows: the NSST decomposition filter is "maxflat", the scale decomposition layer is set to 3, and the corresponding directions to {16, 8, 4}.In the PCNN model, n = 200, α L = 1.0, α θ = 0.2, β = 3.0, V L = 1.0, and V θ = 20.The two parameters r and ε of the gradient domain GIF affect the fusion performance of the proposed method to some extent.The parameter r is the size of the filter window, which determines the significant difference of the guidance image in local windows.ε denotes the normalized parameter, which determines the blur degree of guided filter.
The effects of the two sets of parameters, r 1 , ε 1 and r 2 , ε 2 , in the gradient domain GIF on fusion performance will be discussed.Four indices, CC, SAM, ERGAS, and UIQI, are adopted to analyze the performance impact of the four parameters.Figure 5a,b,d,e represent two groups of the MS and PAN images from the WorldView-2 data set, which are the plant and house areas, respectively.Figure 5c,f is the fusion results that were obtained with the optimal guided filter parameters.
Electronics 2019, 8, x FOR PEER REVIEW 13 of 28 where, q is the difference enhancement parameter that is usually set to 1, ( , ) i UIQI MS PAN is the generalized image quality indicator between the k band of the MS image and the PAN image, and ( , ) i UIQI F PAN is the generalized image quality indicator between the k band of the fused image and the PAN image.

Implementation Details
The implementation details for the fusion methods are provided and analyzed.The experimental hardware environment is an Intel Core i5-4200U, 1.60GHz CPU, and 4GB memory.The software environment is MATLAB R2014a.The parameters of the fusion methods are set, as follows: the NSST decomposition filter is "maxflat", the scale decomposition layer is set to 3, and the corresponding directions to .The two parameters r and  of the gradient domain GIF affect the fusion performance of the proposed method to some extent.The parameter r is the size of the filter window, which determines the significant difference of the guidance image in local windows. denotes the normalized parameter, which determines the blur degree of guided filter.
The effects of the two sets of parameters, 1 r , 1  and 2 r , 2  , in the gradient domain GIF on fusion performance will be discussed.Four indices, CC , SAM , ERGAS , and UIQI , are adopted to analyze the performance impact of the four parameters.Figure 5a,b,d,e represent two groups of the MS and PAN images from the WorldView-2 data set, which are the plant and house areas, respectively.Figure 5c, f is the fusion results that were obtained with the optimal guided filter parameters.3 shows the results of the fusion performance on the two pairs of test images, in terms of four selected indices ( CC , SAM , ERGAS , and UIQI ) versus different window sizes 1 r .Figure 6 shows the change trend of each index intuitively, in which the indices SAM and ERGAS have been normalized.It can be seen that the performance is optimal when 1 = 2 r .When the parameter r 1 is analyzed, the other three parameters are set to fixed values: ε 1 = 1, r 2 = 3, and ε 2 = 10 −6 .Table 3 shows the results of the fusion performance on the two pairs of test images, in terms of four selected indices (CC, SAM, ERGAS, and UIQI) versus different window sizes r 1 .Figure 6 shows the change trend of each index intuitively, in which the indices SAM and ERGAS have been normalized.It can be seen that the performance is optimal when r 1 = 2.When analyzing the parameter 1  , the other three parameters are also set to fixed values: r , and  4, statistics of all evaluation indices of fusion images that were obtained by different values of 1  are presented.Figure 7 shows the relevant changing curve of each index; as can be seen, better performance results came from setting the parameter   When analyzing the parameter ε 1 , the other three parameters are also set to fixed values: r 1 = 2, r 2 = 3, and ε 2 = 10 −6 .As shown in Table 4, statistics of all evaluation indices of fusion images that were obtained by different values of ε 1 are presented.Figure 7 shows the relevant changing curve of each index; as can be seen, better performance results came from setting the parameter ε 1 = 10 −6 . rr are presented in Table 5. Figure 8 shows the change curve of each evaluation index and it can be seen that the performance is optimal when 2 = 2 r .When the parameter r 2 is analyzed, the other three parameters are set to: r 1 = 2, ε 1 = 10 −6 , and ε 2 = 10 −6 , respectively.The statistical results of each evaluation index that are obtained by different window sizes r 2 × r 2 are presented in Table 5. Figure 8 shows the change curve of each evaluation index and it can be seen that the performance is optimal when r 2 = 2.   rr are presented in Table 5. Figure 8 shows the change curve of each evaluation index and it can be seen that the performance is optimal when 2 = 2 r .When analyzing the parameter ε 2 , the other three parameters are also set to fixed values: r 1 = 2, ε 1 = 10 −6 , and r 2 = 2, respectively.As shown in Table 6, the performance results with different values of ε 2 for the proposed and other comparison methods are presented.Figure 9 shows the relevant changing curve of each index and the best performance results came from setting the parameter ε 1 = 10 −6 .When analyzing the parameter 2  , the other three parameters are also set to fixed values:  − , and 2 = 2 r , respectively.As shown in Table 6, the performance results with different values of 2  for the proposed and other comparison methods are presented.Figure 9 shows the relevant changing curve of each index and the best performance results came from setting the parameter   By analyzing the performance with different parameter settings in the gradient domain GIF for the proposed method on WorldView-2 data set, we find that the proposed method achieves better results when the four parameters are set as 1 = 2 r , it can be observed that the gradient domain GIF is less sensitive to the selection of the blur degree parameter.From our experiment, the performance with the no-reference indices for the proposed method also achieves the best results under the same parameter settings.The proposed method is independent of the precise parameter selection of the gradient domain GIF.These parameters of the filter have little effect on the overall fusion performance.Therefore, a satisfactory effect can be obtained by the proposed method with fixed parameter values.

Results Analysis and Discussion
In this section, the experiments are performed on the real data and the degraded data, respectively.Six image pairs of the experimental data set are from different observation scenes of the WorldView-2 and QuickBird satellites: three groups are used as the real data and the other three are applied to the performance comparison on the degraded data.By analyzing the performance with different parameter settings in the gradient domain GIF for the proposed method on WorldView-2 data set, we find that the proposed method achieves better results when the four parameters are set as r 1 = 2, ε 1 = 10 −6 , r 2 = 2, and ε 2 = 10 −6 .At the same time, it can be observed that the gradient domain GIF is less sensitive to the selection of the blur degree parameter.From our experiment, the performance with the no-reference indices for the proposed method also achieves the best results under the same parameter settings.The proposed method is independent of the precise parameter selection of the gradient domain GIF.These parameters of the filter have little effect on the overall fusion performance.Therefore, a satisfactory effect can be obtained by the proposed method with fixed parameter values.

Results Analysis and Discussion
In this section, the experiments are performed on the real data and the degraded data, respectively.Six image pairs of the experimental data set are from different observation scenes of the WorldView-2 and QuickBird satellites: three groups are used as the real data and the other three are applied to the performance comparison on the degraded data.

Performance Comparison with State-of-the-Art methods on Real Data
In this experiment, the proposed fusion method is compared with IHS [1], NSCT-PCNN [20], NSST-SR [23], multi-resolution singular value decomposition (MSVD) [46], PRACS [6], Gram-Schmidt adaptive (GSA) [7], and morphological filter-half gradients (MF-HG) [36] on the real remote sensing data set without a reference image.The codes for the seven comparison methods were downloaded from the homepage that was published by the authors and the parameter settings in the experiment are consistent with those in the literature.As shown in Figure 10, three image pairs of experimental data sets are selected as examples for the result analysis without a reference image.The first two image pairs came from two different observation scenes of the WorldView-2 satellite and the third one was collected from the QuickBird satellite.Figures 11-13 show the fusion results of the proposed method and the comparison methods.For better comparison, a rectangular sub-image area is magnified and then displayed at the upper left of the fusion image.In this experiment, the proposed fusion method is compared with IHS [1], NSCT-PCNN [20], NSST-SR [23], multi-resolution singular value decomposition (MSVD) [46], PRACS [6], Gram-Schmidt adaptive (GSA) [7], and morphological filter-half gradients (MF-HG) [36]on the real remote sensing data set without a reference image.The codes for the seven comparison methods were downloaded from the homepage that was published by the authors and the parameter settings in the experiment are consistent with those in the literature.As shown in Figure 10, three image pairs of experimental data sets are selected as examples for the result analysis without a reference image.The first two image pairs came from two different observation scenes of the WorldView-2 satellite and the third one was collected from the QuickBird satellite.Figures 11-13 show the fusion results of the proposed method and the comparison methods.For better comparison, a rectangular sub-image area is magnified and then displayed at the upper left of the fusion image.

WorldView-2 Data Set
For the group 1 image pair of the WorldView-2 data set, the corresponding fused products are given in Figure 11a-h.The fusion results of IHS, NSCT-PCNN, NSST-SR, MSVD, PRACS, GSA, and MF-HG are shown in Figure 11a-g, respectively; Figure 11h is the result that is generated by our method.It can be seen that the fused image that was obtained by the proposed method achieves a satisfactory visual effect in terms of the preservation of spectral information and spatial details.When compared with the other methods, there is a severe spectral distortion in the fused image that is obtained by the IHS method.The spatial and spectral quality of fusion result of the NSCT-PCNN method is relatively good.The fused result of NSST-SR suffers from some degrees of color distortion, and from the enlarged rectangle sub-image we can see that there are not enough details of the cars.The fusion results of the MSVD and GSA methods have some blurring at the edges of the cars.The PRACS method results in some noise in the dark area next to the cars.The MF-HG method shows the quality of the spectral information preservation and the spatial details injection is enhanced.For the group 2 image pair of the WorldView-2 data set, Figure 12 shows the corresponding experimental results.By comparing the visual effects with the other comparison methods, we can find that the proposed method has significant improvement on the spatial quality and the spectral details and it preserves more spectral information than the IHS method.The NSST-SR method has higher spatial resolution, but the color of the fusion result is quite different from the original MS image, and there is a certain degree of spectral distortion.The fusion result of the MSVD has little blurring.The PRACS, GSA, and MF-HG methods result in better spectral information maintenance, as well as improved spatial quality.It is difficult to tell the difference between the results that were obtained by these methods through visual observation.10e,f, which came from the QuickBird satellite, the corresponding fusion products are shown in Figure 13.The IHS method has certain limitations; namely, it can only be applied to the MS image with three bands.This data set from the QuickBird satellite provides the MS image with four bands (R, G, B, and NIR), so the methods other than IHS are used for comparison.When compared with the visual effects of the fusion results, there is a relatively serious spectral and spatial distortion in the results that were achieved by the NSCT-PCNN and NSST-SR methods.The fusion result of the MSVD method suffers from some blurring and it has an obviously serrated border.The fusion result of the GSA method has some improvement in terms of the spatial and spectral quality, but spectral inhomogeneity exists in local areas.The PRACS, MF-HG, and our proposed methods obtain better visual effects, which have higher spatial quality and better spectral information maintenance ability.Table 9 shows the corresponding quantitative results of Figure 13, in which we can see that the proposed method outperforms the six compared methods in the QNR and s D indices, while the best value of D  is given by the PRACS method.Table 9. Objective evaluation of the experimental results shown in Figure 13.In this experiment, the performance of the proposed fusion method is compared with seven state-of-the-art methods, including IHS [1], NSCT-PCNN [20], NSST-SR [23], MSVD [46], PRACS [6], GSA [7], and MF-HG [36], on the degraded remote sensing data set with the reference image.As shown in Figure 14, three image pairs are selected for the reduced resolution evaluation.The first two image pairs came from two different observation scenes of the WorldView-2 satellite and the

WorldView-2 Data Set
For the group 1 image pair of the WorldView-2 data set, the corresponding fused products are given in Figure 11a-h.The fusion results of IHS, NSCT-PCNN, NSST-SR, MSVD, PRACS, GSA, and MF-HG are shown in Figure 11a-g, respectively; Figure 11h is the result that is generated by our method.It can be seen that the fused image that was obtained by the proposed method achieves a satisfactory visual effect in terms of the preservation of spectral information and spatial details.When compared with the other methods, there is a severe spectral distortion in the fused image that is obtained by the IHS method.The spatial and spectral quality of fusion result of the NSCT-PCNN method is relatively good.The fused result of NSST-SR suffers from some degrees of color distortion, and from the enlarged rectangle sub-image we can see that there are not enough details of the cars.The fusion results of the MSVD and GSA methods have some blurring at the edges of the cars.The PRACS method results in some noise in the dark area next to the cars.The MF-HG method shows the quality of the spectral information preservation and the spatial details injection is enhanced.
For the group 2 image pair of the WorldView-2 data set, Figure 12 shows the corresponding experimental results.By comparing the visual effects with the other comparison methods, we can find that the proposed method has significant improvement on the spatial quality and the spectral fidelity of the fused image.Figure 12a shows the fused image of the IHS method.It can be seen that this method generates serious spectral distortion.The NSCT-PCNN method injects more spatial details and it preserves more spectral information than the IHS method.The NSST-SR method has higher spatial resolution, but the color of the fusion result is quite different from the original MS image, and there is a certain degree of spectral distortion.The fusion result of the MSVD has little blurring.The PRACS, GSA, and MF-HG methods result in better spectral information maintenance, as well as improved spatial quality.It is difficult to tell the difference between the results that were obtained by these methods through visual observation.
Tables 7 and 8 provide the quantitative evaluation results of Figures 11 and 12. From the comparison of each index, we can see that the proposed method provides the best value in terms of QNR.In Table 7, the D s value of our proposed method is the smallest, but the MF-HG method gives the best value of D λ .In Table 8, our proposed method provides the best value of D λ , and the best value of D s is given by the GSA method; the result of our method is the third smallest.In general, however, the proposed method achieves better pansharpening performance than the comparison methods.

QuickBird Data Set
For the group 3 image pair in Figure 10e,f, which came from the QuickBird satellite, the corresponding fusion products are shown in Figure 13.The IHS method has certain limitations; namely, it can only be applied to the MS image with three bands.This data set from the QuickBird satellite provides the MS image with four bands (R, G, B, and NIR), so the methods other than IHS are used for comparison.When compared with the visual effects of the fusion results, there is a relatively serious spectral and spatial distortion in the results that were achieved by the NSCT-PCNN and NSST-SR methods.The fusion result of the MSVD method suffers from some blurring and it has an obviously serrated border.The fusion result of the GSA method has some improvement in terms of the spatial and spectral quality, but spectral inhomogeneity exists in local areas.The PRACS, MF-HG, and our proposed methods obtain better visual effects, which have higher spatial quality and better spectral information maintenance ability.Table 9 shows the corresponding quantitative results of Figure 13, in which we can see that the proposed method outperforms the six compared methods in the QNR and D s indices, while the best value of D λ is given by the PRACS method.In this experiment, the performance of the proposed fusion method is compared with seven state-of-the-art methods, including IHS [1], NSCT-PCNN [20], NSST-SR [23], MSVD [46], PRACS [6], GSA [7], and MF-HG [36], on the degraded remote sensing data set with the reference image.As shown in Figure 14, three image pairs are selected for the reduced resolution evaluation.The first two image pairs came from two different observation scenes of the WorldView-2 satellite and the third was collected from the QuickBird satellite.The low resolution images can be generated by modulation transfer functions (MTF) filtering and decimation.The original MS images are used as the reference images for performance evaluation.Figure 15, Figure 17, and Figure 19 show the fusion results of the proposed method and the compared methods.
(e) (f) (g) Table 9. Objective evaluation of the experimental results shown in Figure 13.In this experiment, the performance of the proposed fusion method is compared with seven state-of-the-art methods, including IHS [1], NSCT-PCNN [20], NSST-SR [23], MSVD [46], PRACS [6], GSA [7], and MF-HG [36], on the degraded remote sensing data set with the reference image.As shown in Figure 14, three image pairs are selected for the reduced resolution evaluation.The first two image pairs came from two different observation scenes of the WorldView-2 satellite and the third was collected from the QuickBird satellite.The low resolution images can be generated by modulation transfer functions (MTF) filtering and decimation.The original MS images are used as the reference images for performance evaluation.Figures 15,17

WorldView-2 Data Set
For the group 1 image set, which is the degraded data set of WorldView-2 on the bridge area, the corresponding fused results are shown in Figure 15. Figure 15a is the reference MS image.Figure 15b-i demonstrate the fusion results of IHS, NSCT-PCNN, NSST-SR, MSVD, PRACS, GSA, MF-HG, and our proposed method, respectively.It can be seen that serious spectral distortion is produced by the IHS method.The fusion result of the NSCT-PCNN method behaves relatively well in terms of spectral maintenance and spatial information injection.The spatial detailed information in the fused result of NSST-SR is well maintained, but there are some color distortions.The fusion results of the MSVD and PRACS also behave well in terms of spectral and spatial information.From Figure 15g, we can see that some detailed information is lost in the result of the GSA method, with some blurring of the outlines of the bridge.In the fusion result that was obtained by the MF-HG method, the contrast in local areas is slightly higher than that in the reference image.By contrast, the fusion image of the proposed method is closer to the reference image, and it has higher spatial resolution than the comparison methods.
MF-HG, and our proposed method, respectively.It can be seen that serious spectral distortion is produced by the IHS method.The fusion result of the NSCT-PCNN method behaves relatively well in terms of spectral maintenance and spatial information injection.The spatial detailed information in the fused result of NSST-SR is well maintained, but there are some color distortions.The fusion results of the MSVD and PRACS also behave well in terms of spectral and spatial information.From Figure 15g, we can see that some detailed information is lost in the result of the GSA method, with some blurring of the outlines of the bridge.In the fusion result that was obtained by the MF-HG method, the contrast in local areas is slightly higher than that in the reference image.By contrast, the fusion image of the proposed method is closer to the reference image, and it has higher spatial resolution than the comparison methods.For better analysis of the spectral preservation ability of the fused images in Figure 15, the horizontal profiles of the column means for R, G, and B bands of the fusion images that were obtained by different methods are displayed in Figure 16.The profile that was formed by black dots is the reference image.The closer to this dotted profile, the better the spectral characteristics are preserved.It can be seen that the profile of the IHS method has large deviations from the reference images in most areas.In contrast, the fused image that was obtained by the proposed method is most similar to the horizontal spectral profile of the reference image, and it has the highest fidelity for the spectral information.For better analysis of the spectral preservation ability of the fused images in Figure 15, the horizontal profiles of the column means for R, G, and B bands of the fusion images that were obtained by different methods are displayed in Figure 16.The profile that was formed by black dots is the reference image.The closer to this dotted profile, the better the spectral characteristics are preserved.It can be seen that the profile of the IHS method has large deviations from the reference images in most areas.In contrast, the fused image that was obtained by the proposed method is most similar to the horizontal spectral profile of the reference image, and it has the highest fidelity for the spectral information.
Group 2 image pair in Figure 14 is the degraded data set of WorldView-2 on the urban area, and Figure 17 shows the corresponding fusion results of the compared methods and the proposed method.Figure 17a is the original MS image, which is used as the reference image for evaluation.There is serious spectral distortion in the fused result that was obtained by the IHS method.The color of the fusion result of the NSST-SR method has a certain degree of spectral distortion.The fusion result that was obtained by the MSVD method is blurred.Fusion results yielded by the NSCT-PCNN, PRACS, GSA, MF-HG, and our proposed methods achieve better visual effects and have a more similar appearance to the reference MS image.Group 2 image pair in Figure 14 is the degraded data set of WorldView-2 on the urban area, and Figure 17 shows the corresponding fusion results of the compared methods and the proposed method.Figure 17a is the original MS image, which is used as the reference image for evaluation.There is serious spectral distortion in the fused result that was obtained by the IHS method.The color of the fusion result of the NSST-SR method has a certain degree of spectral distortion.The fusion result that was obtained by the MSVD method is blurred.Fusion results yielded by the NSCT-PCNN, PRACS, GSA, MF-HG, and our proposed methods achieve better visual effects and have a more similar appearance to the reference MS image.Group 2 image pair in Figure 14 is the degraded data set of WorldView-2 on the urban area, and Figure 17 shows the corresponding fusion results of the compared methods and the proposed method.Figure 17a is the original MS image, which is used as the reference image for evaluation.There is serious spectral distortion in the fused result that was obtained by the IHS method.The color of the fusion result of the NSST-SR method has a certain degree of spectral distortion.The fusion result that was obtained by the MSVD method is blurred.Fusion results yielded by the NSCT-PCNN, PRACS, GSA, MF-HG, and our proposed methods achieve better visual effects and have a more similar appearance to the reference MS image.Figure 18 shows the horizontal spectral profiles of the fused images on urban area.It can be seen that the IHS profile has a large degree of deviation from the reference image.The horizontal profiles of NSCT-PCNN, NSST-SR, GSA, and MF-HG methods are close to the reference image, with only some local sections having differences.The horizontal spectral profile of the proposed method is closest to the reference image.
(g) (h) (i) Figure 18 shows the horizontal spectral profiles of the fused images on urban area.It can be seen that the IHS profile has a large degree of deviation from the reference image.The horizontal profiles of NSCT-PCNN, NSST-SR, GSA, and MF-HG methods are close to the reference image, with only some local sections having differences.The horizontal spectral profile of the proposed method closest to the reference image.In this experiment, six indices are selected for the evaluation procedure on the degraded data, including CC , SSIM , SAM , RMSE , ERGAS , and UIQI .From Tables 10 and 11we can see that the largest CC values are achieved by the proposed method, which are significantly higher than other methods, indicating that our method has excellent spectral preservation ability.The proposed method also obtains the best values in RMSE , ERGAS and SSIM .It demonstrates that the fused image that was obtained by our method maintains good spatial information and has the most similar structure to the reference image.The optimal value of the UIQI index shows that the spatial details preservation of the proposed method is the best.The value of SAM is close to the optimal value of the compared methods.According to the subjective visual effect and objective In this experiment, six indices are selected for the evaluation procedure on the degraded data, including CC, SSIM, SAM, RMSE, ERGAS, and UIQI.From Tables 10 and 11 we can see that the largest CC values are achieved by the proposed method, which are significantly higher than other methods, indicating that our method has excellent spectral preservation ability.The proposed method also obtains the best values in RMSE, ERGAS and SSIM.It demonstrates that the fused image that was obtained by our method maintains good spatial information and has the most similar structure to the reference image.The optimal value of the UIQI index shows that the spatial details preservation of the proposed method is the best.The value of SAM is close to the optimal value of the compared methods.According to the subjective visual effect and objective evaluation using the WorldView-2 data set, the proposed method not only improves the spatial continuity, but also effectively maintains the spectral information and improves the spatial resolution of the fused image.The performance of the proposed method is obviously superior to the other seven state-of-the-art methods.

QuickBird Data Set
Figure 19 shows the fusion results of the group 3 image pair in Figure 14.This data set, from the QuickBird satellite, provides the MS image with four bands (R, G, B, and NIR).It uses the six compared methods excluding IHS for comparison in this part.The fusion result of the NSCT-PCNN method behaves well in terms of spatial detail enhancement and spectral preservation, but it results in some noise in local areas.The results of the NSST-SR and MSVD methods have a certain degree of color distortion and the fusion result that was achieved by the MSVD method has the problem of edge blurring.The spectral characteristics in the fusion results of the PRACS and GSA methods are preserved well, but there is some structural information loss.In contrast, the fusion results of the MF-HG method and the proposed method achieve better visual effects.19 shows the fusion results of the group 3 image pair in Figure 14.This data set, from the QuickBird satellite, provides the MS image with four bands (R, G, B, and NIR).It uses the six compared methods excluding IHS for comparison in this part.The fusion result of the NSCT-PCNN method behaves well in terms of spatial detail enhancement and spectral preservation, but it results in some noise in local areas.The results of the NSST-SR and MSVD methods have a certain degree of color distortion and the fusion result that was achieved by the MSVD method has the problem of edge blurring.The spectral characteristics in the fusion results of the PRACS and GSA methods are preserved well, but there is some structural information loss.In contrast, the fusion results of the MF-HG method and the proposed method achieve better visual effects.The quantitative evaluation results of Figure 19 are given in Table 12, in which we can see that the proposed method outperforms the six compared methods in most indices, except the RMSE index.The best values in CC and SAM indicate better spectral preservation ability of the proposed method.Values of SSIM , ERGAS and UIQI also achieve the optimal value, which demonstrates that the fused image obtained by the proposed method achieves better preservation ability of spatial and spectral information and has the most similar structure to the reference image.The RMSE value of our proposed method is close to the optimum, which is given by the PRACS method.According to the subjective visual effect and objective evaluation using the QuickBird data set, our proposed method can harmonize the spectral information preservation and the spatial details injection well, and it is better than the other six state-of-the-art methods.The quantitative evaluation results of Figure 19 are given in Table 12, in which we can see that the proposed method outperforms the six compared methods in most indices, except the RMSE index.The best values in CC and SAM indicate better spectral preservation ability of the proposed method.Values of SSIM, ERGAS and UIQI also achieve the optimal value, which demonstrates that the fused image obtained by the proposed method achieves better preservation ability of spatial and spectral information and has the most similar structure to the reference image.The RMSE value of our proposed method is close to the optimum, which is given by the PRACS method.According to the subjective visual effect and objective evaluation using the QuickBird data set, our proposed method can harmonize the spectral information preservation and the spatial details injection well, and it is better than the other six state-of-the-art methods.

Conclusions
In this paper, a new pansharpening method that is based on the gradient domain GIF and NSST is proposed.The spatial continuity of the fused image is improved by using a gradient domain GIF with good edge preserving properties.For low-frequency coefficients in the NSST domain, the MFIM technology is adopted to complete the injection of the detailed spatial information.The modulated low-frequency coefficients are refined based on the gradient-domain GIF, which reduces the blurring of the edges.For the high-frequency coefficients, an improved PCNN is utilized to calculate the model firing output amplitude.The gradient domain GIF is then used to optimize the firing map, which reduces the decision deviation at the boundary region of the object.It also effectively suppresses the artificial texture and image non-uniformity phenomenon.The real and simulated data sets from the WorldView-2 and QuickBird satellites were utilized in order to verify the pansharpening performance of the proposed method.The data sets consist of three channels and four channels, respectively.Experiments conducted on the WorldView-2 data set adopted seven state-of-the-art methods for comparison: (1) IHS; (2) NSCT-PCNN; (3) NSST-SR; (4) MSVD; (5) PRACS; (6) GSA; and, (7) MF-HG.Experiments that were conducted using the QuickBird data set adopted these same methods with the exception of the IHS method.The experimental results show that our proposed method can achieve better performance in terms of spectral preservation and spatial detail injection, which verifies the method's effectiveness and superiority.
Although the proposed method performs well in spectral information preservation and spatial quality improvement, the spatial quality of the fused image still needs to be further improved due to the blurring of some edge details that is caused by the gradient domain GIF.In addition, the WorldView-2 satellite data set that was used in the paper only provides three channels of R, G, and B, while the satellite image itself contains more bands.Combinations of different bands have different applications in practice.Spectral fidelity is particularly important for WorldView-2 satellite images.In future work, we would like to extend the pansharpening method to MS images with more bands, and we will work to further improve the spatial quality of the fused images while maintaining spectral information and spatial continuity.
images with the same size as
images with the same size as the source image can be obtained by the j level decomposition of a certain scale.J is the image decomposition layer and j l is the direction decomposition levels under scale j .Figure2illustrates a three-level NSST decomposition model.NSST has the characteristics of approximate optimal sparse representation for 2-D images.Applying it to the fusion of MS and PAN images can provide more useful information for the fused result.
minimizing the cost function ( , ) of I X , I , and X in the local window k  , respectively.Since the value of  in the GIF

Figure 3 .
Figure 3.The flow chart of the guided image filter (GIF).

2. 2 . 2 .
Edge-Aware Weighting In order to better preserve the edges, Kou defines a new edge-aware weighting ( ) importance of each pixel with respect to the global guidance image.( ) I k

Figure 3 .
Figure 3.The flow chart of the guided image filter (GIF).

4 .
Inverse transform of NSSTThe inverse transform of NSST is performed on the fused low-frequency coefficients {S 0,d F } and the fused high-frequency coefficients {S l,d F } to obtain the final fused image F.

4 .Figure 4 .
Figure 4.The flow chart of the proposed fusion method.

Figure 4 .
Figure 4.The flow chart of the proposed fusion method.

Figure 5 .
Figure 5. Two pairs of the multispectral (MS) and panchromatic (PAN) example images: (a) PAN image 1; (b) MS image 1; (c) the fusion result of PAN image 1 and MS image 1; (d) PAN image 2; (e) MS image 2; and, (f) the fusion result of PAN image 2 and MS image 2.

Figure 5 .
Figure 5. Two pairs of the multispectral (MS) and panchromatic (PAN) example images: (a) PAN image 1; (b) MS image 1; (c) the fusion result of PAN image 1 and MS image 1; (d) PAN image 2; (e) MS image 2; and, (f) the fusion result of PAN image 2 and MS image 2.

Figure 6 .
Figure 6.Analysis of parameter r 1 used in the algorithm.Correlation Coefficient (CC), Spectral Angle Mapper (SAM), Erreur Relative Global Adimensionnelle de Synthèse (ERGAS), and Universal Quality Index (UIQI) are observed when ε 1 = 1, r 2 = 3, and ε 2 = 10 −6 (which are set as default values): (a) the effect of r 1 on fusion performance of group 1 images; and, (b) the effect of r 1 on fusion performance of group 2 images.

Figure 8 .(
Figure 8. Analysis of parameter 2 r used in the algorithm.CC , SAM , ERGAS , and UIQI are observed when 1 = 2 r

Figure 7 .
Figure 7. Analysis of parameter ε 1 used in the algorithm.CC, SAM, ERGAS, and UIQI are observed when r 1 = 2, r 2 = 3, and ε 2 = 10 −6 (which are set as default values): (a) the effect of ε 1 on fusion performance of group 1 images; and, (b) the effect of ε 1 on fusion performance of group 2 images.

Figure 8 .
Figure 8. Analysis of parameter 2 r used in the algorithm.CC , SAM , ERGAS , and UIQI are observed when 1 = 2 r

Figure 8 .
Figure 8. Analysis of parameter r 2 used in the algorithm.CC, SAM, ERGAS, and UIQI are observed when r 1 = 2, ε 1 = 10 −6 , and ε 2 = 10 −6 (which are set as default values): (a) the effect of r 2 on fusion performance of group 1 images; and, (b) the effect of r 2 on fusion performance of group 2 images.

Figure 9 .
Figure 9. Analysis of parameter ε 2 used in the algorithm.CC, SAM, ERGAS, and UIQI are observed when r 1 = 2, r 2 = 2, and ε 1 = 10 −6 (which are set as default values): (a) the effect of ε 2 on fusion performance of group 1 images; and, (b) the effect of ε 2 on fusion performance of group 2 images.

Figure 10 .
Figure 10.Threeimage pairs of experimental data sets are used for assessment without a reference.The first two came from WorldView-2 satellite, and the third was collected from the QuickBird satellite: (a) test MS image 1; (b) test PAN image 1; (c) test MS image 2; (d) test PAN image 2; (e) test MS image 3; and, (f) test PAN image 3.

Figure 10 .Figure 11 .
Figure 10.Threeimage pairs of experimental data sets are used for assessment without a reference.The first two came from WorldView-2 satellite, and the third was collected from the QuickBird satellite: (a) test MS image 1; (b) test PAN image 1; (c) test MS image 2; (d) test PAN image 2; (e) test MS image 3; and, (f) test PAN image 3. Electronics 2019, 8, x FOR PEER REVIEW 18 of 28

Figure 14 .
Figure 14.Three image pairs of experimental data sets used for assessment with reference.The first two came from the WorldView-2 satellite, and the third was collected from the QuickBird satellite: (a) test MS image 1; (b) test PAN image 1; (c) test MS image 2; (d) test PAN image 2; (e) test MS image 3; and, (f) test PAN image 3.

Figure 14 .
Figure 14.Three image pairs of experimental data sets used for assessment with reference.The first two came from the WorldView-2 satellite, and the third was collected from the QuickBird satellite: (a) test MS image 1; (b) test PAN image 1; (c) test MS image 2; (d) test PAN image 2; (e) test MS image 3; and, (f) test PAN image 3.

Figure 16 .
Figure 16.Horizontal spectral profiles of the fusion results for the WorldView-2 data set on the bridge area: (a) band R; (b) band G; and, (c) band B.

Figure 16 .Figure 16 .
Figure 16.Horizontal spectral profiles of the fusion results for the WorldView-2 data set on the bridge area: (a) band R; (b) band G; and, (c) band B.

Figure 18
Figure18shows the horizontal spectral profiles of the fused images on urban area.It can be seen that the IHS profile has a large degree of deviation from the reference image.The horizontal profiles of NSCT-PCNN, NSST-SR, GSA, and MF-HG methods are close to the reference image, with only some local sections having differences.The horizontal spectral profile of the proposed method is closest to the reference image.

Figure 18 .
Figure 18.Horizontal spectral profiles of the fusion results for the WorldView-2 data set on the urban area: (a) band R; (b) band G; and, (c) band B.

Figure 18 .
Figure 18.Horizontal spectral profiles of the fusion results for the WorldView-2 data set on the urban area: (a) band R; (b) band G; and, (c) band B.

Figure 20 shows
Figure20shows the horizontal spectral profiles of the column means for R, G, B, and NIR bands of the fusion images and the reference image.The profiles of the NSST-SR and MSVD methods have large deviations from the reference image.The profiles of the GSA, PRACS, and MF-HG methods agree relatively well with the reference image.It can be seen that the profile that was obtained by our proposed method is the closest to the reference profile.

Figure 20
Figure20shows the horizontal spectral profiles of the column means for R, G, B, and NIR bands of the fusion images and the reference image.The profiles of the NSST-SR and MSVD methods have large deviations from the reference image.The profiles of the GSA, PRACS, and MF-HG methods agree relatively well with the reference image.It can be seen that the profile that was obtained by our proposed method is the closest to the reference profile.

Figure 20 .
Figure 20.Horizontal spectral profiles of the fusion results for the QuickBird data set: (a) band R; (b) band G; (c) band B; and, (d) band NIR.

Figure 20 .
Figure 20.Horizontal spectral profiles of the fusion results for the QuickBird data set: (a) band R; (b) band G; (c) band B; and, (d) band NIR.
The high resolution version of the low-frequency coefficients can be obtained while using low-frequency coefficients of the PAN image to modulate the low-frequency coefficients of the MS image.The gradient domain GIF is used to optimize the modulated sub-band coefficients and the fusion results of low-frequency coefficients

Table 1 .
The spectral and spatial resolution of each band of the WorldView-2 satellite.

Table 2 .
The spectral and spatial resolution of each band of the QuickBird satellite.

Table 3 .
The performance evaluation results with different window sizes r 1 × r 1 .

Table 3 .
The performance evaluation results with different window sizes 11  rr .

Table 4 .
The performance evaluation results with different values of 1

Table 4 .
The performance evaluation results with different values of ε 1 .

Table 5 .
The performance evaluation results with different window sizes 22  rr .

Table 5 .
The performance evaluation results with different window sizes r 2 × r 2 .

Table 5 .
The performance evaluation results with different window sizes 22  rr .

Table 6 .
The performance evaluation results with different values of ε 2 .

Table 6 .
The performance evaluation results with different values of 2  .

Table 8 .
Objective evaluation of the experimental results shown in Figure12.

Table 7 .
Objective evaluation of the experimental results shown in Figure11.

Table 8 .
Objective evaluation of the experimental results shown in Figure12.

Table 9 .
Objective evaluation of the experimental results shown in Figure13.

Table 10 .
Objective evaluation of the experimental results shown in Figure15.

Table 11 .
Objective evaluation of the experimental results shown in Figure17.

Table 10 .
Objective evaluation of the experimental results shown in Figure15.

Table 11 .
Objective evaluation of the experimental results shown in Figure17.

Table 12 .
Objective evaluation of the experimental results shown in Figure19.

Table 12 .
Objective evaluation of the experimental results shown in Figure19.